How Screen Works
Understand how Screen protects your data
Background
Data warehouses typically follow a standard architecture where data flows from sources through an ETL process, populates core and derived data tables, and ultimately feeds online systems. However, many data sources contain private or sensitive information, and ensuring its protection throughout this process is crucial.
Challenges in Data Privacy
Sensitive data can be exposed at two key points:
- ETL process: Before data lands in the core data lake/warehouse, it's essential to clean sensitive data to prevent unauthorized access.
- Derived data generation: While allowing sensitive data in core tables might be necessary for specific use cases, direct access to such data should be restricted. When generating derived data sources, it's vital to clean/omit sensitive data beforehand to prevent exposure.
Requirements for a Solution
An ideal solution for this scenario should address the following:
- Accurate identification and handling of private/sensitive data: The solution should effectively pinpoint and manage sensitive data across the data warehouse.
- Performance and scalability: The data cleaning process must handle large data volumes efficiently, especially when applied during ETL.
- Cost-effectiveness: Inefficient data processing can significantly impact data warehouse operational costs due to the large data volumes involved.
Where Granica Screen Fits In
Granica Screen offers a highly accurate, scalable, and efficient data classification engine combined with a flexible system for cleaning, redacting, or otherwise obfuscating sensitive data as needed. Screen is deployed into your VPC.
Granica Screen Deployment Modes
Continuous scan
Monitors a specified cloud storage path (e.g., table, namespace, entire data warehouse) for new and existing data, generating reports on detected sensitive data.
On-demand scan
Scans an existing dataset once and generates reports on sensitive data findings.
API
Provides programmatic access to Granica Screen's data classification and transformation capabilities for custom integrations.
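The difference between the continuous and on-demand modes can be sketched in a few lines of Python. This is an illustration of the general pattern only, not Granica Screen's implementation: the names `on_demand_scan`, `ContinuousScan`, and `poll` are hypothetical, and a plain list of object keys stands in for a cloud storage listing.

```python
from typing import Iterable, List, Set


def on_demand_scan(object_keys: Iterable[str]) -> List[str]:
    """One-shot mode: every object in the dataset is scanned exactly once."""
    return sorted(object_keys)


class ContinuousScan:
    """Continuous mode: remembers what was scanned so each poll handles only new objects."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def poll(self, object_keys: Iterable[str]) -> List[str]:
        """Return keys not seen in earlier polls and record them as scanned."""
        new_keys = sorted(set(object_keys) - self._seen)
        self._seen.update(new_keys)
        return new_keys


if __name__ == "__main__":
    scan = ContinuousScan()
    # The first poll covers existing data; later polls pick up only new arrivals.
    print(scan.poll(["a.log", "b.log"]))
    print(scan.poll(["a.log", "b.log", "c.log"]))
```

In this sketch the first poll of a continuous scan behaves like an on-demand scan, which matches the description of continuous mode covering both existing and new data.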
Screening Modes
Detection Reports
Granica Screen scans data automatically, generating reports on sensitive data findings without impacting existing workflows. You can choose which data to scan anywhere in the warehouse (staging during ETL, core tables, derived data) and gain insights into the location of sensitive data.
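To make the idea of a detection report concrete, here is a minimal sketch that counts findings per column and category while leaving the scanned rows untouched. The regular expressions are simple stand-ins chosen for illustration and are not Screen's classification engine; the category names, the `build_report` function, and the report shape are all assumptions.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Mapping

# Stand-in detectors for illustration; a real classifier would be far more robust.
DETECTORS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def build_report(rows: Iterable[Mapping[str, str]]) -> Dict[str, Dict[str, int]]:
    """Count findings per column and category; the input rows are only read."""
    counts: Counter = Counter()
    for row in rows:
        for column, value in row.items():
            for category, pattern in DETECTORS.items():
                counts[(column, category)] += len(pattern.findall(value))

    # Keep only the (column, category) pairs that had at least one finding.
    report: Dict[str, Dict[str, int]] = {}
    for (column, category), n in counts.items():
        if n:
            report.setdefault(column, {})[category] = n
    return report
```

Because the function only reads its input, running it alongside an existing pipeline does not change what that pipeline produces, which is the property the detection-only mode relies on.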
Data Transformation (PII removal) and Detection Reports
Granica Screen integrates into the data transformation pipeline. Configure a destination location and a transformation configuration, and Granica Screen writes the transformed data to that destination. The setup depends on your access control needs:
- No sensitive data in the warehouse: The ETL process outputs data to a staging table, where Granica Screen redacts it before transferring it to the desired core tables.
- Sensitive data in core tables, but not derived tables: Similar to above, a dedicated staging process can be set up for specific derived data tables, ensuring sensitive data remains excluded.
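The staging pattern in both cases can be sketched as follows: rows land in a staging area, are redacted, and only then are moved to the destination table. This is a simplified illustration under stated assumptions, not Screen's transformation engine: in-memory lists stand in for tables, a single email regex stands in for classification, and `redact`, `promote`, and the `[REDACTED]` placeholder are hypothetical.

```python
import re
from typing import Dict, List

# Stand-in detector for illustration; real classification covers many more categories.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(value: str) -> str:
    """Replace every detected email address with a fixed placeholder."""
    return EMAIL.sub("[REDACTED]", value)


def promote(staging: List[Dict[str, str]], destination: List[Dict[str, str]]) -> None:
    """Redact each staged row, append it to the destination, then empty staging."""
    for row in staging:
        destination.append({column: redact(value) for column, value in row.items()})
    staging.clear()


if __name__ == "__main__":
    staging = [{"user": "alice@example.com", "action": "login"}]
    core: List[Dict[str, str]] = []
    promote(staging, core)
    print(core)
```

The same `promote` step applies whether the destination is a core table (first case) or a derived table (second case); only the position of the staging step in the pipeline changes.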
Typical Scenarios
Self-managed data warehouse (S3/GCS storage)
- Scenario: Logs are exported to s3://my-org-logs/.
- Policy options:
  - Add s3://my-org-logs to Granica policy include filter.
  - Run granica crunch s3://my-org-logs for non-crunch data.
- Detection: Reports are generated to s3://n-hawkeye-report-... and displayed in the Console dashboard.
- Transformation: Cleaned logs are exported to s3://my-org-logs-cleaned/ with corresponding object names.
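"Corresponding object names" means that each cleaned object keeps the key of its source object and only the bucket changes. A minimal sketch of that mapping, assuming the cleaned bucket is the source bucket name plus a `-cleaned` suffix as in the example above (the `cleaned_uri` helper is hypothetical and not part of any Granica tooling):

```python
def cleaned_uri(source_uri: str, source_bucket: str = "my-org-logs",
                suffix: str = "-cleaned") -> str:
    """Map a source object URI to the URI of its cleaned counterpart.

    The object key is preserved; only the bucket name gets the suffix.
    """
    prefix = f"s3://{source_bucket}/"
    if not source_uri.startswith(prefix):
        raise ValueError(f"{source_uri!r} is not under {prefix!r}")
    key = source_uri[len(prefix):]
    return f"s3://{source_bucket}{suffix}/{key}"


if __name__ == "__main__":
    print(cleaned_uri("s3://my-org-logs/2024/01/app.log"))
```

A mapping like this lets downstream jobs find the cleaned copy of any log object without a lookup table.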
External tables in managed data warehouse (S3/GCS storage)
Scenario: Consider the table creation statement:
CREATE TABLE logs LOCATION 's3://depts/finance/my-org-logs';
Granica setup: The same setup as above applies. However, the external table needs to be configured to use the new output location containing the cleaned data:
CREATE TABLE logs_cleaned LOCATION 's3://depts/finance/my-org-logs-cleaned';
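If many external tables need to be repointed, the second statement can be derived from the first table's name and location. The helper below is a hypothetical sketch that follows the naming used in the example above (a `_cleaned` table suffix and a `-cleaned` location suffix); the exact DDL syntax depends on your warehouse.

```python
def cleaned_table_ddl(table: str, location: str, suffix: str = "-cleaned") -> str:
    """Build a CREATE TABLE statement pointing at the cleaned output location.

    Uses plain string formatting, so pass only trusted table names and paths.
    """
    cleaned_location = location.rstrip("/") + suffix
    return f"CREATE TABLE {table}_cleaned LOCATION '{cleaned_location}';"


if __name__ == "__main__":
    print(cleaned_table_ddl("logs", "s3://depts/finance/my-org-logs"))
```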
Managed tables in data warehouse (Snowflake, BigQuery, Redshift, Databricks)
- API integration (future release): Granica Screen will provide API connectors to:
  - Get notified of new data.
  - Read data.
  - Write data (if transforming).