Use Case: Data De-identification

Learn how Screen generates secure datasets

The problem

Many valuable datasets are difficult or risky to use because they contain sensitive data. For example, it is often inappropriate to give internal teams broad access to this data, use it to train machine learning models, or share it with external vendors or APIs.

This creates a significant barrier to unlocking data for business needs. Even if you know that a dataset contains sensitive data, whether through sensitive data discovery or otherwise, de-identifying it and generating secure datasets remains difficult unless you can identify sensitive data accurately and at scale, and then remove or obfuscate it.

How we help

Granica Screen provides integrated tools to automatically redact, remove, or otherwise obfuscate sensitive data detected during the sensitive data discovery process. This process creates a secure copy of the original data with the desired transformations applied, which can then be used broadly with the risk of sensitive data exposure mitigated.

Screen operates through the Granica Platform, which enables you to screen incoming data in the background immediately after objects land in your buckets. For more details on the Granica architecture and its approach to background processing, see the reference for how Screening works. Screen can be configured to apply a variety of transformations to sensitive data based on the type of sensitive data identified. See the configuration reference for more details.
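To make the idea of type-based transformations concrete, here is a minimal Python sketch. It is not Granica Screen's API or configuration format: the Finding class, the POLICY mapping, the entity type labels, and the three transformation functions are all hypothetical names chosen for illustration. It assumes detection has already produced non-overlapping character spans, each labeled with a type.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Finding:
    """Hypothetical detection result: [start, end) offsets plus a type label."""
    start: int
    end: int
    entity_type: str


def redact(value: str, entity_type: str) -> str:
    # Replace the value entirely with a placeholder naming its type.
    return f"[{entity_type}]"


def mask(value: str, entity_type: str) -> str:
    # Hide everything except the last four characters.
    # Note: values of four characters or fewer are left fully visible.
    return "*" * max(len(value) - 4, 0) + value[-4:]


def hash_token(value: str, entity_type: str) -> str:
    # Deterministic token, so equal inputs map to equal outputs.
    # Caveat: an unsalted hash of a guessable value can be reversed by
    # dictionary attack; a real system would likely use a keyed scheme.
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]
    return f"{entity_type}_{digest}"


Transform = Callable[[str, str], str]

# Illustrative policy: which transformation to apply to each type.
POLICY: Dict[str, Transform] = {
    "NAME": redact,
    "EMAIL": hash_token,
    "PHONE": mask,
}


def deidentify(text: str, findings: List[Finding],
               policy: Dict[str, Transform],
               default: Transform = redact) -> str:
    """Return a copy of text with every finding transformed per policy."""
    out = text
    # Work right to left so that earlier offsets remain valid after
    # replacements change the string length. Assumes no overlapping spans.
    for f in sorted(findings, key=lambda f: f.start, reverse=True):
        transform = policy.get(f.entity_type, default)
        out = out[:f.start] + transform(out[f.start:f.end], f.entity_type) + out[f.end:]
    return out


text = "Contact Jane Doe at jane@example.com or 555-123-4567."
findings = [
    Finding(text.index("Jane Doe"), text.index("Jane Doe") + 8, "NAME"),
    Finding(text.index("jane@"), text.index("jane@") + 16, "EMAIL"),
    Finding(text.index("555"), text.index("555") + 12, "PHONE"),
]
print(deidentify(text, findings, POLICY))
```

The original text is never modified; the function returns a transformed copy, mirroring the idea of producing a secure copy alongside the source data. Falling back to redaction for unknown types is a conservative default chosen for this sketch.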

Why we're the best solution

Best-in-class accuracy

Highly accurate sensitive data detection is the key to successfully de-identifying a dataset. High recall is required because undetected sensitive data cannot be obfuscated and will remain in cleartext in the de-identified data. High precision is also vital so that non-sensitive data is minimally disturbed, preserving the value of the dataset.
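As an illustration of the two metrics, the following Python sketch computes precision and recall by comparing detected spans against labeled ground truth. It is a simplification and not a description of how Granica benchmarks Screen: the Span tuple layout and the precision_recall function are hypothetical, and a span counts as correct only if its offsets and type match exactly.

```python
from typing import Set, Tuple

# Hypothetical span representation: (start, end, entity_type).
Span = Tuple[int, int, str]


def precision_recall(predicted: Set[Span], actual: Set[Span]) -> Tuple[float, float]:
    """Exact-match precision and recall over labeled spans."""
    true_positives = len(predicted & actual)
    # Precision: what fraction of detections were truly sensitive.
    # Recall: what fraction of truly sensitive spans were detected.
    # Returning 1.0 for an empty denominator is a convention chosen here.
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(actual) if actual else 1.0
    return precision, recall


# Ground truth: three sensitive spans in a document.
actual = {(8, 16, "NAME"), (20, 36, "EMAIL"), (40, 52, "PHONE")}

# Detector output: two correct spans, two false alarms, one miss (PHONE).
predicted = {(8, 16, "NAME"), (20, 36, "EMAIL"), (0, 7, "NAME"), (37, 39, "NAME")}

precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this example the missed PHONE span lowers recall and would remain in cleartext after de-identification, while the two false alarms lower precision and would needlessly alter non-sensitive text.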

Granica Screen provides demonstrably superior classification accuracy in terms of both recall and precision. We benchmark our performance on a variety of synthetic data, such as data generated by the Presidio Research library, as well as on real datasets across a range of file types and industries.

Simple integration

Since data de-identification first requires information from sensitive data detection, Granica Screen's integrated solution is the simplest way to implement comprehensive data privacy. Alternative detection-only approaches require the purchase, integration, and maintenance of a separate de-identification solution, increasing infrastructure costs, management burden, and the risk of error.

Scalable to petabytes of data and billions of objects

Granica Screen is built on top of Granica's data processing platform, which serves customers storing petabytes of data and billions of objects. Many vendors cannot handle both detection and transformation of data at scale, but Granica's platform has been built to efficiently handle large volumes of data.

See also