In high-compliance industries like finance and healthcare, sensitive data flows into, out of, and throughout the infrastructure all day. Because of this, many of the files need to comply with strict standards like HIPAA for storage and transfer.
But, often, not all of them need to.
Nowadays, it's not uncommon for a business to have a combination of files meant to be shared among the wider team for collaboration, along with sensitive data that needs to be kept private. The issue is how to split these different types so that they go to the right people, and so the wrong people don't have access to files they shouldn't.
This is one of the main use cases for Data Loss Protection (DLP). DLP covers several different features that, when combined, can substantially reduce the risk of data loss.
In this article, we'll focus on one key component of DLP: scanning files to detect ones that might be sensitive. For our purposes, we'll refer to this as DLP monitoring.
We cover other types of DLP such as filtering and scrubbing in a high-level overview at What is DLP?
What is DLP Monitoring?
DLP monitoring is the process of scanning data stores or data in transit for sensitive data. In order to do this, the tool running the scan needs to have access to not only view the file metadata, but also the contents of the file itself. Essentially, it needs full permission to see all the files and what they contain while the files are stored and/or in transit.
This doesn't necessarily mean that the tool comprehends what the files are about. More often than not, DLP monitoring uses machine learning for advanced pattern recognition.
For example, a Version 4 UUID is a 128-bit number written as 32 hexadecimal characters in 5 hyphen-separated groups of 8, 4, 4, 4, and 12 characters. Since this is a very specific pattern, if the tool finds ee478ce7-67eb-4aa6-8537-adb0895fb3fc, it would flag it as a potential UUID, since it is unlikely to be anything else.
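As a minimal sketch of that kind of pattern check, the Python below uses a regular expression for the Version 4 UUID layout. The names UUID_V4 and find_uuid4 are made up for illustration, and a real DLP engine would do considerably more than a single regex.

```python
import re

# Version 4 UUID: hex groups of 8-4-4-4-12 characters. The third group
# starts with the version digit "4" and the fourth group starts with
# one of 8, 9, a, or b (the variant bits).
UUID_V4 = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}\b",
    re.IGNORECASE,
)


def find_uuid4(text: str) -> list[str]:
    """Return every substring of text that has the Version 4 UUID shape."""
    return UUID_V4.findall(text)


sample = "session=ee478ce7-67eb-4aa6-8537-adb0895fb3fc user=guest"
matches = find_uuid4(sample)
print(matches)
```

Because the pattern is so specific, ordinary prose or numbers are very unlikely to match it by accident, which is what makes this data type easy to flag with confidence.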
So what about more common patterns? Take, for example, the string George Washington Lane. That could very well be a street name, but it could also be a person's name (including a somewhat famous one). Both of those could also be identifiers for PII (Personally Identifiable Information).
How does DLP monitoring reconcile this? That will depend on the settings put in place. A common approach is to flag this as potential PII, which can then trigger more actions like requiring someone to manually review the file. Another is to simply block the file from being moved to that location.
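To make the ambiguity concrete, here is a rough Python sketch. The two detectors are deliberately naive and entirely hypothetical, as are the label names and the on_match setting; real tools use far richer models. The point is only that one string can match several detectors, and that a setting decides what happens next.

```python
import re

# Toy detectors, hypothetical and deliberately naive.
STREET = re.compile(r"\b(?:[A-Z][a-z]+ )+(?:Lane|Street|Avenue|Road)\b")
PERSON = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")


def classify(text: str) -> list[str]:
    """Return the label of every toy detector that matches the text."""
    labels = []
    if STREET.search(text):
        labels.append("STREET_ADDRESS")
    if PERSON.search(text):
        labels.append("PERSON_NAME")
    return labels


def decide(labels: list[str], on_match: str = "review") -> str:
    """Pick an outcome: allow clean text, otherwise apply the configured setting."""
    if on_match not in ("review", "deny"):
        raise ValueError("on_match must be 'review' or 'deny'")
    return on_match if labels else "allow"


labels = classify("George Washington Lane")
print(labels, decide(labels), decide(labels, on_match="deny"))
```

Here the same string is labeled as both a street address and a person's name, so the tool cannot say which it is, only that it is potential PII. Whether that leads to a manual review or a denied transfer is a policy choice rather than a detection result.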
What kinds of data can DLP tools monitor for?
DLP tools can monitor for essentially anything in a pattern. This doesn't mean that all tools can recognize all types of data, however. For instance, some might be focused on healthcare data and protected health information for HIPAA compliance.
There are hundreds of different types of data that DLP tools can monitor for, like:
- Personal information like passports, driver's licenses, and SSNs
- Protected Health Information (PHI) like patient profiles, admission and discharge dates, and medical record numbers
- Device and System identifiers like serial numbers and MAC addresses
- Keys and authenticators like passwords, certificates, and access keys
- Business data like lease numbers, bank accounts, and credit card info
So why not scan for every potential data type just to be sure?
There are a few reasons why this isn't standard practice. One is that it greatly inhibits speed and performance. Checking every file against hundreds of data types requires more time and resources. If you never expect something like court numbers, and don't care whether something with that pattern comes through, then there's no point in monitoring for and flagging that type.
This leads to the other main issue: too many alerts. When there are hundreds of data types to check against, a lot of alerts are going to come up. Someone will need to handle those alerts or let an automation handle them, and with so many data types to consider, either approach can miss data. A massive number of alerts can also lead to alert fatigue, so even when important data is flagged, it can easily slip through a cursory review.
The better method is to know what kind of data your organization is expecting, flag files with sensitive data that could reasonably be expected to come through, and then base your next steps on what data is flagged.
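One way to picture this is a registry of detectors where each scan runs only the ones you have enabled. The sketch below is an assumption-laden illustration: the detector names and the simplified regexes are invented for the example and are not taken from any particular product.

```python
import re

# A tiny, hypothetical detector registry with simplified patterns.
DETECTORS = {
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "MAC_ADDRESS": re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b"),
    "EMAIL_ADDRESS": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def scan(text: str, enabled: list[str]) -> dict[str, list[str]]:
    """Run only the enabled detectors and return their matches by type."""
    findings = {}
    for name in enabled:
        hits = DETECTORS[name].findall(text)
        if hits:
            findings[name] = hits
    return findings


text = "Patient SSN 123-45-6789, device 00:1A:2B:3C:4D:5E"
print(scan(text, enabled=["US_SSN"]))
```

With only the SSN detector enabled, the MAC address in the same text produces no alert at all. That means less work per file and fewer alerts for someone to review.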
Flagging custom data types
Most DLP tools have built-in capabilities to recognize common data types like identifying documents, social security numbers, and financial information. But what if you need to check for a very specific or even proprietary data pattern?
In more advanced tools, you can create custom detectors to monitor data that your organization deems important or sensitive. Google Cloud's Sensitive Data Protection, for instance, lets you create a custom regex detector to define your own patterns. However, as this is a custom pattern, you're also responsible for its accuracy, for keeping it up to date, and for ensuring the regex doesn't unexpectedly match other data types.
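Since accuracy is on you, it can help to test a custom pattern against examples that should and should not match before deploying it. The following is plain Python rather than any vendor's API, and the lease number format (LSE, a four-digit year, then six digits) is invented purely for illustration.

```python
import re

# Hypothetical internal format: "LSE-" + 4-digit year + "-" + 6 digits.
LEASE_NUMBER = re.compile(r"\bLSE-(?:19|20)\d{2}-\d{6}\b")

SHOULD_MATCH = ["LSE-2024-000123", "Ref: LSE-1999-424242."]
SHOULD_NOT_MATCH = [
    "123-45-6789",                           # shaped like an SSN
    "LSE-2024-123",                          # too few digits
    "ee478ce7-67eb-4aa6-8537-adb0895fb3fc",  # a UUID
]


def check_pattern(pattern, positives, negatives):
    """Return (missed, false_hits): the examples the pattern gets wrong."""
    missed = [s for s in positives if not pattern.search(s)]
    false_hits = [s for s in negatives if pattern.search(s)]
    return missed, false_hits


missed, false_hits = check_pattern(LEASE_NUMBER, SHOULD_MATCH, SHOULD_NOT_MATCH)
print(missed, false_hits)
```

Two empty lists mean the pattern behaves as intended on these samples. Keeping the sample lists alongside the pattern also gives you something to rerun whenever the format changes.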
Types of DLP monitoring
There are two primary types of DLP monitoring: scanning files as they attempt to move between folders and scanning folders at rest. Both are relatively straightforward.
DLP monitoring for data at rest and data in transit
DLP monitoring in transit scans files as they move and flags files that match specific types during that stage. It won't catch files that are stored at rest, and it's not intended to. Instead, the primary purpose is to stop sensitive data from ending up in the wrong place and to warn you when sensitive files are trying to move somewhere they shouldn't.
DLP monitoring at rest scans the data where it's stored. These scans use the DLP engine to check data against rules that you set, such as rules for detecting personal information.
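As a rough illustration of an at-rest scan, the sketch below walks a folder and flags any file whose text matches a single rule. The function name scan_at_rest and the simplified SSN rule are assumptions for the example, and the demo folder is a temporary one created on the spot.

```python
import re
import tempfile
from pathlib import Path

# One simplified rule standing in for "rules that you set".
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def scan_at_rest(root) -> list[Path]:
    """Return the files under root whose text content matches the rule."""
    flagged = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="ignore")
            if SSN.search(text):
                flagged.append(path)
    return flagged


# Demo on a throwaway folder with one sensitive and one harmless file.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "intake.txt").write_text("SSN 123-45-6789", encoding="utf-8")
    Path(folder, "notes.txt").write_text("nothing here", encoding="utf-8")
    flagged_names = [p.name for p in scan_at_rest(folder)]

print(flagged_names)
```

An in-transit check would apply the same kind of rule, but at the moment a file is about to move rather than on a sweep of what is already stored.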
However, this returns to one of the original issues: what if some locations should have these sensitive files, while others shouldn't? This is where monitoring folders vs the entire tenant can come into play.
Monitoring folders vs entire tenants
Depending on what the business needs DLP for, it may need to monitor the entire tenant or only specific folders. Monitoring the whole tenant would generally only be needed for the most secure or sensitive environments. In these cases, everything that comes in needs to be closely monitored for security and compliance risks.
More commonly, organizations only need to monitor a subset of folders where sensitive data may be flowing through. This subset is tightly watched and controlled: files are flagged before entering or exiting the folder so that data isn't overshared.
For instance, one SharePoint site might be for accepting client records, so DLP monitoring is used on the entire site, while a different site is for collaborative marketing where those client records are never expected to be part of data flows.
This often leads to different folders having different data monitoring settings. Some folders might flag potential PHI, while others might check for passwords or authentication data. Some might flag files coming in, like client details, while others might flag files going out, like confidential company information. Connections with external parties can be monitored at the ingestion point to make sure they aren't sending data that shouldn't be transferred to those locations.
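Per-folder settings like these can be thought of as a small policy table. The sketch below is hypothetical throughout: the folder paths, data type names, and the rule that the nearest monitored parent folder decides are all assumptions made for the example.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class FolderPolicy:
    data_types: frozenset
    direction: str  # "inbound", "outbound", or "both"


# Hypothetical folders and data type names.
POLICIES = {
    "/clients/intake": FolderPolicy(frozenset({"PHI", "US_SSN"}), "inbound"),
    "/engineering": FolderPolicy(frozenset({"PASSWORD", "ACCESS_KEY"}), "both"),
    "/company/confidential": FolderPolicy(frozenset({"CONFIDENTIAL"}), "outbound"),
}


def types_to_check(file_path: str, direction: str) -> frozenset:
    """Return the data types to scan for when a file moves in this direction.

    The nearest parent folder that has a policy decides; files outside
    any monitored folder are not scanned.
    """
    for folder in PurePosixPath(file_path).parents:
        policy = POLICIES.get(str(folder))
        if policy is not None:
            if policy.direction in (direction, "both"):
                return policy.data_types
            return frozenset()
    return frozenset()


print(types_to_check("/clients/intake/2024/record.pdf", "inbound"))
print(types_to_check("/marketing/brochure.pdf", "inbound"))
```

The intake folder is scanned for PHI and SSNs on the way in, while the marketing folder has no policy and so triggers no scan at all.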
Because it's common for organizations to have different policies for different kinds of data, the next steps can vary considerably. Fortunately, many DLP tools can take actions depending on what kind of data is found based on the policies that have been put in place.
What can you do if DLP tools find sensitive data?
The monitoring stage of data loss protection simply identifies data that may be sensitive. From there, it will be up to your organization's policies to decide what to do with it, and there is no shortage of options.
One method is to have completely different storage platforms. For example, you could use SharePoint for important, sensitive files and Dropbox for files that should be shared. The problem with this approach is that in many cases, there isn't such a clear-cut line. Sensitive files might need to be shared occasionally or sent to other platforms for analysis, and in some workflows files are expected to contain sensitive content.
These kinds of situations quickly devolve into a complex series of processes and permissions where data slips through the cracks and users can't get access to the data they need. Plus, it makes for an additional platform that needs to be set up, paid for, and monitored.
A better approach is to use a Data Loss Protection tool that monitors file contents for information that's marked as sensitive. Then, depending on what's inside the file, a different action can follow, like deleting it, holding it for approval, routing it to another location, or automatically adjusting its permissions.
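In code terms, this amounts to mapping each flagged data type to an action. The sketch below assumes a made-up rule table and assumes that, when several types are found in one file, the most restrictive action should win. Neither assumption comes from a specific product.

```python
# Actions ordered from least to most restrictive (an assumption of this sketch).
SEVERITY = ["allow", "restrict_permissions", "route", "hold_for_approval", "delete"]

# Hypothetical mapping from flagged data type to action.
RULES = {
    "EMAIL_ADDRESS": "restrict_permissions",
    "PHI": "route",
    "US_SSN": "hold_for_approval",
    "ACCESS_KEY": "delete",
}


def choose_action(found_types: list[str]) -> str:
    """Return the most restrictive action among the flagged data types."""
    actions = [RULES.get(data_type, "allow") for data_type in found_types]
    return max(actions, key=SEVERITY.index, default="allow")


print(choose_action(["PHI", "US_SSN"]))
print(choose_action([]))
```

A file containing both PHI and an SSN is held for approval, because that outranks routing in this ordering. A file with no findings is simply allowed through.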
Many DLP tools allow you to monitor, route, and scrub data using a single tool so that the entire flow is protected.
Monitor, route, and scrub data with Transfer Shield
Transfer Shield is a Couchdrop feature that provides data loss protection specifically for file transfers. When accepting files from external parties, you don't have control over what they send you, so files could potentially contain data that you don't want in your platforms.
When you enable Transfer Shield on a folder, Couchdrop scans the file for data types that you specify. Then, depending on what the data is, it can perform various actions, such as stopping the transfer or requiring manual approval from a specific user. You can either exclude by data type or include by data type if you're expecting a specific kind of data to be sent to you.
If you're interested in adding Transfer Shield to your Couchdrop account, get in touch with sales@couchdrop.io or request access directly from our Transfer Shield access page.