Configuration

Configuring Granica Screen

Once you've installed the Granica platform with Granica Screen enabled, it's time to start monitoring and protecting your data. You manage this through the Granica CLI.

1. Identify data to be protected by Granica Screen

Granica Screen scans existing data stored in data lakes such as Amazon S3 and Google Cloud Storage (GCS). The first step is to identify the buckets or data of interest, which might be every bucket in your organization. You can then configure that data for scanning in the Granica policy.

The following file types are currently supported. Unsupported files are skipped and do not affect the scanning process.

| File Type | Extensions | Scan Method | Available Now |
| --- | --- | --- | --- |
| Big Data | .parquet, .snappy.parquet | Structured Parsing | Yes |
| Comma/tab separated | .csv, .tsv | Structured Parsing | Yes |
| Text | .json, .txt, .html, etc. | Intelligent Parsing | Yes |
| Email | .eml | Intelligent Parsing | Yes |
| Archived/Compressed | .gz, .zip | Decompress and Parse | Yes |
| Image | .jpeg, .png, .tiff | OCR | In progress, contact us |
| Document | .pdf, .doc, .xlsx, .pptx | Intelligent Parsing | In progress, contact us |
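To estimate ahead of time which objects in a bucket would be scanned and which would be skipped, the table above can be turned into a simple lookup. The following is a minimal sketch, not part of Granica Screen: `SCAN_METHODS` and `scan_method_for` are hypothetical names, the mapping only covers the extensions listed as available (the "etc." in the Text row means the real list is longer), and matching on the final extension is an assumption about how files are classified.

```python
from pathlib import PurePosixPath
from typing import Optional

# Extension -> scan method, copied from the rows marked "Yes" in the table above.
# Hypothetical helper for planning purposes; not a Granica API.
SCAN_METHODS = {
    ".parquet": "Structured Parsing",   # also matches .snappy.parquet
    ".csv": "Structured Parsing",
    ".tsv": "Structured Parsing",
    ".json": "Intelligent Parsing",
    ".txt": "Intelligent Parsing",
    ".html": "Intelligent Parsing",
    ".eml": "Intelligent Parsing",
    ".gz": "Decompress and Parse",
    ".zip": "Decompress and Parse",
}


def scan_method_for(obj_key: str) -> Optional[str]:
    """Return the expected scan method for an object key, or None if unlisted."""
    # Assumption: only the last extension decides the method.
    suffix = PurePosixPath(obj_key).suffix.lower()
    return SCAN_METHODS.get(suffix)
```

For example, `scan_method_for("logs/part-0001.snappy.parquet")` returns `"Structured Parsing"`, while an image key such as `"scans/id.png"` returns `None` because image support is still in progress.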

2. Specify types of sensitive data to identify

You configure the set of sensitive data types to identify within the Granica policy.

Currently, the following types of sensitive data are supported by standard classifiers. Custom classifiers can also be specified in addition to these, and Granica is continuously adding support for additional types of sensitive data. Note: If data can be interpreted as multiple PII types, we report the most likely type.

3. Specify report format and location

After the data is scanned, Granica Screen generates reports for each instance of sensitive data identified. The format and location of this report can be customized as follows within the Granica policy.

| Configuration | Options |
| --- | --- |
| Output format | json, csv, Parquet |
| Output compression | none, gzip, snappy (Parquet only) |
| Output location | An AWS S3 or GCS location. If unspecified, a bucket will automatically be created. |
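The one constraint in these options that is easy to miss is that snappy compression applies to Parquet output only. The sketch below checks a proposed combination before it goes into a policy. It is illustrative only: `ReportOutput` and its field names are hypothetical and do not reflect the actual Granica policy schema, which this page does not show.

```python
from dataclasses import dataclass
from typing import Optional

FORMATS = {"json", "csv", "parquet"}
COMPRESSIONS = {"none", "gzip", "snappy"}


@dataclass(frozen=True)
class ReportOutput:
    """Hypothetical container for the report options listed above."""
    output_format: str
    compression: str
    location: Optional[str] = None  # None: a bucket is created automatically

    def __post_init__(self) -> None:
        fmt = self.output_format.lower()
        comp = self.compression.lower()
        if fmt not in FORMATS:
            raise ValueError(f"unsupported output format: {self.output_format}")
        if comp not in COMPRESSIONS:
            raise ValueError(f"unsupported compression: {self.compression}")
        # snappy is listed as Parquet only.
        if comp == "snappy" and fmt != "parquet":
            raise ValueError("snappy compression is only valid with Parquet output")
```

With this sketch, `ReportOutput("Parquet", "snappy")` is accepted, while `ReportOutput("csv", "snappy")` raises a `ValueError`.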

The generated report includes the following information for each instance of sensitive data:

| Column | Type | Description |
| --- | --- | --- |
| n | bigint | Index of result within result file |
| obj_key | string | The cloud object containing this instance of sensitive data |
| classification_type | string | The type of sensitive data identified |
| offset | bigint | The offset location within an unstructured file |
| classified_size | bigint | The length of the result within an unstructured file |
| row | bigint | The row number of a result within a tabular file |
| col | bigint | The column number of a result within a tabular file |
| column_name | bigint | The column name of a result within a tabular file, when available |
| data | string | The sensitive data identified (optional via policy) |
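For unstructured files, `offset` and `classified_size` together locate a result inside the object. The sketch below shows how a single JSON report record might be used to recover the flagged span from a file's contents. The record values, the `EMAIL` type name, and the helper `flagged_span` are made up for illustration, and the code assumes zero-based character offsets, which this page does not confirm (they could equally be byte offsets).

```python
import json


def flagged_span(text: str, record: dict) -> str:
    """Slice the span described by a report record out of the file contents.

    Assumption: offset is zero-based and counted in characters.
    """
    start = record["offset"]
    return text[start:start + record["classified_size"]]


# Hypothetical record using the column names from the table above.
record = json.loads(
    '{"n": 0, "obj_key": "notes/a.txt", "classification_type": "EMAIL",'
    ' "offset": 12, "classified_size": 15}'
)

text = "Contact me: john@granica.ai today"
span = flagged_span(text, record)  # "john@granica.ai"
```

For tabular files, the same lookup would use `row` and `col` (or `column_name`) instead of the offset pair.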

4. Specify the redacted output format

In addition to generating a detection report, Granica Screen can directly redact sensitive data from a file and create a sanitized copy of the data at a separately configured cloud location. Appropriately redacted data can then be used in broader contexts to enable additional use cases while managing privacy risk.

A variety of redaction formats are supported, along with additional customization options.

| Transformation Type | Description |
| --- | --- |
| Redaction | Removal of sensitive data without replacement, e.g. "My name is John Smith" to "My name is" |
| Replacement | Replacement of sensitive data with a fixed value, e.g. [REDACTED] |
| Size-preserving replacement | Replacement of sensitive data with a value of equal length, e.g. XXXXX |
| Named replacement | Replacement of sensitive data with a label identifying the type of sensitive data, e.g. [EMAIL] |
| Numbered replacement | Replacement of sensitive data with a label identifying each unique instance of sensitive data, e.g. [EMAIL_1] and [EMAIL_2] |
| Encrypted | Replacement of sensitive data with an encrypted value, e.g. [EMAIL_encryptedemailaddress] |
| Format preserving encrypted | Replacement of sensitive data with an encrypted value, preserving the original format, e.g. john@granica.ai to siek@jtiwoei.qb |
| Synthetic data replacement | Replacement of sensitive data with a similar synthetic value of the same type, e.g. replacing John with Evan |
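To make the differences between the simpler transformation types concrete, the sketch below applies five of them to the same text. It is an illustration of the behavior described in the table, not Granica's implementation: `transform`, the style names, and the span tuples are hypothetical, and the encrypted and synthetic variants are left out. It also assumes that numbered replacement gives repeated occurrences of the same value the same number, which is one reading of "each unique instance".

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (offset, length, type label), non-overlapping


def transform(text: str, spans: List[Span], style: str) -> str:
    """Rewrite each span in `text` according to the chosen style."""
    out: List[str] = []
    cursor = 0
    numbers: Dict[Tuple[str, str], int] = {}  # (label, value) -> instance number
    counts: Dict[str, int] = {}               # label -> unique values seen so far
    for offset, length, label in sorted(spans):
        value = text[offset:offset + length]
        out.append(text[cursor:offset])       # keep text before the span
        if style == "redaction":
            repl = ""
        elif style == "replacement":
            repl = "[REDACTED]"
        elif style == "size_preserving":
            repl = "X" * length
        elif style == "named":
            repl = f"[{label}]"
        elif style == "numbered":
            key = (label, value)
            if key not in numbers:
                counts[label] = counts.get(label, 0) + 1
                numbers[key] = counts[label]
            repl = f"[{label}_{numbers[key]}]"
        else:
            raise ValueError(f"unknown style: {style}")
        out.append(repl)
        cursor = offset + length
    out.append(text[cursor:])                 # keep text after the last span
    return "".join(out)
```

Given `"Mail a@x.io or b@y.io, then a@x.io again"` with three `EMAIL` spans, the `numbered` style yields `"Mail [EMAIL_1] or [EMAIL_2], then [EMAIL_1] again"`, while `named` yields `[EMAIL]` in all three places. Note that plain redaction in this sketch leaves the surrounding whitespace untouched.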

If you need further assistance with redaction formats, contact us for details.

See also