What do Office 365 DLP Policies, Communication Compliance, Data Investigations, Advanced e-Discovery, Cloud App Security, Insider Risk, and automatic Sensitivity Labels all have in common?
They can all leverage Sensitive Information Types.
Sensitive Information Types (SI Types) are fundamental objects for identifying information in Office/Microsoft 365. When you need to define what “it” is, an SI Type is the definition.
Office 365 comes with over a hundred SI Types built-in; these cannot be edited, but can be used for initial assessments of your environment.
SI Types are found under Data Classification in the Compliance Center (compliance.microsoft.com). You can also navigate to them through the old Security & Compliance Center.
Most organizations will need to create their own SI Types, so let’s take a look at that process.
Creating a Sensitive Information Type
SI Types can be broad or specific. Defining an SI Type in itself will not affect your environment. Since they are leveraged by other solutions, they don’t do anything until tied to some kind of active policy.
Navigate to SI Types, then click on New. The first step is to provide a name and description.
Next we’ll get into the meat of defining an SI Type: adding one or more elements. Be sure to click on “Add an Element”; it’s easy to overlook.
At a minimum, an SI Type will have a matching element. This is the definition of what “it” is. Below that are Supporting elements, which can help define the SI Type more specifically. You can only have one matching element, but you can have multiple supporting elements.
SI Type elements can be defined in one of three ways: keywords, a dictionary, or a regular expression.
Keywords are just that: enter a bunch of keywords. A dictionary is a large set of keywords, which can be re-used in other SI Types (see below). Regular expressions are formulas for patterns; for example, something that looks like a US social security number, or something that looks like a credit card number.
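To make the regular expression idea a bit more concrete, here is a rough Python sketch of "something that looks like" a US social security number or a credit card number. The two patterns and the find_patterns helper are my own illustrative assumptions; they are not the definitions Microsoft uses for its built-in SI Types, and a real pattern would need more validation than this.

```python
import re

# Illustrative patterns only (assumptions, not Microsoft's built-in definitions).
# SSN-like: three digits, two digits, four digits, separated by hyphens.
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Card-like: sixteen digits in groups of four, optionally separated by a space or hyphen.
CARD_LIKE = re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b")


def find_patterns(text):
    """Return a dict mapping each pattern name to the list of strings it matched."""
    return {
        "ssn_like": SSN_LIKE.findall(text),
        "card_like": CARD_LIKE.findall(text),
    }


sample = "Employee SSN 123-45-6789, card 4111 1111 1111 1111 on file."
print(find_patterns(sample))
```

Patterns this loose will also match plenty of numbers that are not SSNs or card numbers, which is exactly why supporting elements are worth adding.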
For example, you might want to have a definition for insider trading information. You might have a dictionary that gets periodically updated, but also include some supporting elements such as things that look like account numbers. Or, if you’re a manufacturer, there might be some patent numbers that you list as keywords, along with some other key terms that are usually found with them.
Dictionaries are easy to add. You can also add and edit them via PowerShell. In the GUI though, choose to detect content containing a dictionary, then click on Add a dictionary.
I have some dictionaries already defined in my environment: LoremIpsum, and some variations on insider trading. To create a new dictionary though, I’d click on “Create new keyword dictionaries”.
Dictionaries are pretty simple: a name, and a list of words/phrases.
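If it helps to picture it, a dictionary can be modeled as nothing more than a name plus a list of terms. The Python sketch below is a conceptual model only: the KeywordDictionary class, its field names, and the sample terms are hypothetical, and the case-insensitive substring matching is an assumption rather than a description of how the service actually matches.

```python
from dataclasses import dataclass, field


@dataclass
class KeywordDictionary:
    """Conceptual model of a keyword dictionary: a name and a list of words/phrases."""
    name: str
    terms: list = field(default_factory=list)

    def matches(self, text):
        """Return the terms found in the text (simple case-insensitive substring check)."""
        lowered = text.lower()
        return [term for term in self.terms if term.lower() in lowered]


# Hypothetical dictionary and terms, for illustration only.
insider = KeywordDictionary(
    name="InsiderTradingExample",
    terms=["material nonpublic", "blackout period", "tip off"],
)
found = insider.matches("Remember, the Blackout Period starts Monday.")
print(found)
```

Because the dictionary is its own object, the same list can be reused by several SI Types and updated in one place.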
Juggling Requirements
As I mentioned, an SI Type can have three kinds of element: keywords, dictionaries, or regular expressions. You can have one matching element (must have) and one or more supporting elements (helps to have).
Concept: Mix the element types, e.g. a Regular Expression for the matching element, with Keywords and Dictionaries for supporting elements.
As with all configuration wizards, review and finalize.
Accuracy is used by certain services to trigger specific actions; for example, if a match on an SI Type is 80% accurate, prevent download. Proximity defines how far, in characters, a supporting element can be from the matching element; the minimum is 50.
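Here is a small Python sketch of how a matching element, supporting elements, proximity, and accuracy fit together conceptually. It is a toy model under stated assumptions, not the service's actual algorithm: the evaluate function, the eight-digit "account number" pattern, and the 65/85 accuracy values are all made up for illustration. The only number taken from above is the 50-character proximity.

```python
import re


def evaluate(text, matching_pattern, supporting_terms, proximity=50,
             base_accuracy=65, supported_accuracy=85):
    """Return (matched_text, accuracy) for each hit of the matching element.

    A hit gets the higher (hypothetical) accuracy when any supporting term
    appears within `proximity` characters before or after it.
    """
    results = []
    lowered = text.lower()
    for hit in re.finditer(matching_pattern, text):
        start = max(0, hit.start() - proximity)
        end = min(len(text), hit.end() + proximity)
        window = lowered[start:end]
        supported = any(term.lower() in window for term in supporting_terms)
        results.append((hit.group(0), supported_accuracy if supported else base_accuracy))
    return results


# The first number sits next to the word "Account"; the second is far from any supporting term.
text = "Account 12345678 flagged for review. " + "x" * 200 + " Ref 87654321"
hits = evaluate(text, r"\b\d{8}\b", supporting_terms=["account"])
print(hits)
```

A policy could then act only on hits at or above some accuracy threshold, which is the role accuracy plays in the example of preventing a download.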
Using Sensitive Information Types
As I mentioned earlier, SI Types can be used in a variety of compliance solutions in Office 365. My personal favorite is automatically labeling content in Office apps.
SI Types can also be used as inspection methods for Cloud App Security. Wherever you are looking, CAS can reference an SI Type to check if the file has matching content, and take action.
You can also use SI Types in Communication Compliance. While the new classifiers bring machine learning into the mix, SI Types are tried and true, and perfectly capable of flagging content based on a variety of criteria: bad words, phrases indicating malfeasance, or making sure employees aren’t sending sensitive information outside the company.
SI Types can be used in DLP policies as well. For example, making sure that a spreadsheet of customer PII isn’t downloaded to an employee’s PC. Another example would be in a Teams DLP policy, preventing employees from discussing sensitive information in non-sensitive Teams chats.
Finally, Insider Risk can leverage SI Types to prioritize which content to keep an eye on; for example, when an employee starts downloading or deleting content that contains confidential or proprietary language.
Summary
I’ve been meaning to write this post for a while, because many of the solutions I’ve written about include SI Types. I can’t write about other things without writing about SI Types, but they’re enough of their own “thing” that it is more than a detour to try to explain them. So, here, at least, Sensitive Information Types, explained on their own terms.
The biggest challenge to implementing SI Types is how to do so in a programmatic way. Typically, even a well-organized customer has various teams organized around the workflows or solutions that SI Types support, e.g. communication compliance, data security, and so on. They’re not organized around what a dynamic taxonomy would look like.
Thinking out loud, it’s very much a librarian’s job: defining what these fundamental types of information are. This is not the same as defining a labeling nomenclature, because properties A+B+C might = Label 1, but perhaps A+B+# = Label 2, and so on.
One thing is for certain: Sensitive Information Types provide a highly configurable way to identify the content your users work with, and that identification can then be leveraged to protect both the content and your organization across a variety of locations.