Architecture
Understand the Granica Crunch platform architecture, deployment models, and the separation between control plane and data plane.
Granica Crunch is built on a two-plane architecture that separates orchestration from data processing. Understanding this separation is important for evaluating security, data residency, and operational boundaries.
Control plane and data plane
Data plane — The data plane runs in your cloud environment and is responsible for all actual data processing. This includes the Spark clusters that read, optimize, and write your Parquet and Iceberg files. Your data is processed in place inside your cloud; it does not need to travel anywhere for Crunch to work.
Control plane — The control plane hosts the Granica Console, API, Airflow scheduler, and PostgreSQL state store. Depending on your deployment model, the control plane runs either in Granica's cloud (Hybrid) or entirely within your cloud (On-Premises).
The two planes communicate through a secure tunnel maintained by the Tunnel Agent running in your data plane. Granica uses this tunnel for job scheduling, operational access, and software upgrades — no inbound network access to your environment is required.
Deployment models
Granica supports three deployment models. The right choice depends on your data residency, compliance, and operational requirements. See Deployment Models for a detailed comparison.
| Model | Control plane location | Data plane location | Table data leaves your cloud? |
|---|---|---|---|
| Granica Hosted | Granica's cloud | Granica's cloud | Yes |
| Hybrid | Granica's cloud | Your cloud | No — only metadata and metrics |
| On-Premises | Your cloud | Your cloud | No |
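The comparison above can also be expressed as a small lookup, for example in a compliance checklist script. This is an illustrative sketch only: `DeploymentModel`, `MODELS`, and `models_keeping_table_data_in_cloud` are hypothetical names and not part of any Granica API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentModel:
    name: str
    control_plane: str  # "granica" or "customer" (hypothetical labels)
    data_plane: str     # "granica" or "customer" (hypothetical labels)
    table_data_leaves_cloud: bool


# Values transcribed from the deployment model table.
MODELS = [
    DeploymentModel("Granica Hosted", "granica", "granica", True),
    DeploymentModel("Hybrid", "granica", "customer", False),
    DeploymentModel("On-Premises", "customer", "customer", False),
]


def models_keeping_table_data_in_cloud(models=MODELS):
    """Return the names of models where table data stays in the customer's cloud."""
    return [m.name for m in models if not m.table_data_leaves_cloud]


print(models_keeping_table_data_in_cloud())
# ['Hybrid', 'On-Premises']
```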
Hybrid (recommended)
In the Hybrid model, the control plane (Console, API, Airflow, PostgreSQL) runs in Granica's cloud. The data plane (Spark, Tunnel Agent, Granica Worker) runs inside your cloud environment. This is the most common deployment model.
What leaves your cloud in this model:
- Spark job progress metrics (e.g. completed task counts, durations)
- Job status signals (e.g. RUNNING, FAILED)
- Aggregated table metadata used for reporting (e.g. table names, partition counts, DRR)
Your actual table data never leaves your cloud.
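One way to picture this boundary is an allow-list applied to anything sent to the control plane. The sketch below is not Granica's implementation: the field names and the `outbound_payload` helper are assumptions chosen to mirror the categories listed above.

```python
# Hypothetical field names mirroring the three categories above:
# job progress metrics, job status signals, and aggregated table metadata.
ALLOWED_FIELDS = {
    "job_id",
    "status",
    "completed_tasks",
    "duration_seconds",
    "table_name",
    "partition_count",
    "drr",
}


def outbound_payload(record: dict) -> dict:
    """Keep only allow-listed metadata fields and drop everything else."""
    return {key: value for key, value in record.items() if key in ALLOWED_FIELDS}


record = {
    "job_id": "job-123",
    "status": "RUNNING",
    "completed_tasks": 42,
    "sample_rows": [{"user": "alice"}],  # table data: must never be sent
}
print(outbound_payload(record))
# {'job_id': 'job-123', 'status': 'RUNNING', 'completed_tasks': 42}
```

An allow-list fails closed: a newly added field is dropped until it is approved explicitly.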
On-Premises
In the On-Premises model, both the control plane and the data plane run entirely within your cloud environment. This model is designed for customers with strict data residency or compliance requirements where no traffic of any kind can cross cloud boundaries.
Granica Hosted
In the Granica Hosted model, Granica manages the entire platform including the data processing infrastructure. Table data and catalog metadata flow in and out of your cloud as part of Crunch operations. This model requires the least customer infrastructure but involves data leaving your cloud environment.
Key architectural properties
Data plane isolation
In the Hybrid and On-Premises models, the Spark workers that process your data run in your cloud VPC. They read from and write to your own cloud storage (S3, GCS, Azure Blob) and at no point transmit file contents outside your environment. The Tunnel Agent carries only control signals and metadata, never file data.
Single-tenant data plane
Each customer's data plane is a dedicated, single-tenant deployment: a Kubernetes cluster (such as EKS on AWS) in your own cloud account or project. Granica does not run multi-tenant compute for data processing, so your data is never co-mingled with another customer's data on shared infrastructure.
Components and data flow

Crunch data integrity
Granica implements multiple levels of data integrity to ensure your data is always protected during optimization.
Object integrity

- Pre-Crunch file validation. Before crunching, Granica reads the source file to verify it is consistent with the native format — for example, confirming a Parquet file is structurally valid before processing begins.
- Post-Crunch integrity validation. Immediately after a Crunch job completes, Granica performs logical data validation by comparing the source with the optimized output and verifying that the row counts match.
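The two checks above can be sketched roughly as follows. This is not Granica's validator. It relies only on the Parquet file layout, in which a file starts with the 4-byte magic `PAR1` and ends with a 4-byte little-endian footer length followed by `PAR1`. The helper names are hypothetical, and a real validator would also parse the footer metadata and compare contents, not just framing bytes and counts.

```python
import struct

PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(data: bytes) -> bool:
    """Shallow pre-check: verify the magic bytes and a plausible footer length."""
    if len(data) < 12:  # leading magic + footer length + trailing magic
        return False
    if data[:4] != PARQUET_MAGIC or data[-4:] != PARQUET_MAGIC:
        return False
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    # The footer must fit between the leading magic and the trailing 8 bytes.
    return 0 < footer_len <= len(data) - 12


def row_counts_match(source_rows: int, output_rows: int) -> bool:
    """Post-check: the optimized output must hold exactly as many rows as the source."""
    return source_rows == output_rows


# Synthetic bytes with valid framing (not a real Parquet file).
fake = PARQUET_MAGIC + b"x" * 10 + struct.pack("<I", 10) + PARQUET_MAGIC
print(looks_like_parquet(fake))           # True
print(looks_like_parquet(b"not parquet")) # False
print(row_counts_match(1000, 1000))       # True
```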
Integrity failure handling
In the unlikely event of an integrity failure, Crunch stops processing new objects and the Granica team is alerted immediately. Processing resumes only after the failure is investigated and resolved.
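The fail-stop behavior described here can be illustrated with a short loop. This is a minimal sketch under assumed names: `process_until_failure` and its `crunch`, `validate`, and `alert` callbacks are hypothetical, and how processing resumes after investigation is left out.

```python
from typing import Callable, Iterable, List


def process_until_failure(
    objects: Iterable[str],
    crunch: Callable[[str], object],
    validate: Callable[[str, object], bool],
    alert: Callable[[str], None],
) -> List[str]:
    """Crunch objects in order and halt at the first integrity failure."""
    committed: List[str] = []
    for obj in objects:
        output = crunch(obj)
        if not validate(obj, output):
            alert(f"integrity failure on {obj!r}; processing halted")
            break  # do not pick up any new objects until the failure is resolved
        committed.append(obj)
    return committed


alerts: List[str] = []
done = process_until_failure(
    ["a.parquet", "b.parquet", "c.parquet"],
    crunch=lambda name: name.upper(),
    validate=lambda name, out: name != "b.parquet",  # simulate a failure on b
    alert=alerts.append,
)
print(done)    # ['a.parquet']
print(alerts)  # ["integrity failure on 'b.parquet'; processing halted"]
```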
High availability
Granica provides >99.99% availability, built on cloud-native primitives such as AWS EKS with multi-AZ node groups.
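To put the availability figure in concrete terms, an availability percentage converts to a maximum downtime budget over a period. The helper below is plain arithmetic rather than a statement about how Granica measures availability, and `max_downtime_minutes` is a hypothetical name.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def max_downtime_minutes(availability_pct: float,
                         period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Downtime budget implied by an availability percentage over a period."""
    return period_minutes * (1 - availability_pct / 100)


print(round(max_downtime_minutes(99.99), 2))
# 52.56
```

Availability above 99.99% therefore corresponds to less than about 53 minutes of downtime per year.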

All Crunch services run as Kubernetes pods across a cluster of compute instances. A minimum of two on-demand nodes ensures baseline availability. As workload increases, Granica automatically provisions additional spot instances and scales service pods to match. Service pods are distributed across nodes and availability zones, and a Broker pod manages routing to them.
Elastic scaling
Granica Crunch is a background service and is not in the read or write path of your query engines. It operates independently, reading from and writing to cloud storage without affecting query latency.

Compute resources are fully elastic. Crunch uses autoscaling Kubernetes clusters that scale from zero to as many nodes as needed, based on the volume of data queued for optimization, and back to zero during idle periods. Processing throughput reaches up to 150 MB/s per node.
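As a rough capacity illustration, the per-node throughput quoted above gives a lower bound on the number of nodes needed to drain a queue within a target window. The actual autoscaling policy is not described here, so `nodes_needed` is a hypothetical back-of-the-envelope helper. Because 150 MB/s is a peak figure, real node counts may be higher.

```python
import math

PEAK_MB_PER_SECOND_PER_NODE = 150  # peak per-node throughput quoted above


def nodes_needed(queued_mb: float, target_seconds: float,
                 per_node_mb_s: float = PEAK_MB_PER_SECOND_PER_NODE) -> int:
    """Minimum nodes to process `queued_mb` within `target_seconds` at peak throughput."""
    if queued_mb <= 0:
        return 0  # nothing queued: scale to zero
    return math.ceil(queued_mb / (per_node_mb_s * target_seconds))


print(nodes_needed(1_000_000, 3600))   # 1 TB in one hour  -> 2
print(nodes_needed(10_000_000, 3600))  # 10 TB in one hour -> 19
print(nodes_needed(0, 3600))           # idle              -> 0
```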
Non-disruptive upgrades
Granica upgrades are transparent to your applications and do not require downtime or application changes.

Granica uses a rolling upgrade approach across all service pods, containers, and Kubernetes cluster infrastructure. In the Hybrid deployment model, Granica can perform upgrades through the control plane tunnel without requiring direct access to your cloud environment. On-Premises customers can trigger upgrades manually or configure automatic updates.
See also
How crunching works
Understand how Crunch optimizes compression for your lakehouse data.
Crunch compatibility
Supported platforms and file formats for Granica Crunch.