Architecture

Understand the Granica Crunch platform architecture, deployment models, and the separation between control plane and data plane.

Granica Crunch is built on a two-plane architecture that separates orchestration from data processing. Understanding this separation is important for evaluating security, data residency, and operational boundaries.

Control plane and data plane

Data plane — The data plane runs in your cloud environment and is responsible for all actual data processing. This includes the Spark clusters that read, optimize, and write your Parquet and Iceberg files. Your data is processed in place inside your cloud; it does not need to travel anywhere for Crunch to work.

Control plane — The control plane hosts the Granica Console, API, Airflow scheduler, and PostgreSQL state store. Depending on your deployment model, the control plane runs either in Granica's cloud (Hybrid) or entirely within your cloud (On-Premises).

The two planes communicate through a secure tunnel maintained by the Tunnel Agent running in your data plane. Granica uses this tunnel for job scheduling, operational access, and software upgrades — no inbound network access to your environment is required.

Deployment models

Granica supports three deployment models. The right choice depends on your data residency, compliance, and operational requirements. See Deployment Models for a detailed comparison.

Model	Control plane location	Data plane location	Table data leaves your cloud?
Granica Hosted	Granica's cloud	Granica's cloud	Yes
Hybrid	Granica's cloud	Your cloud	No — only metadata and metrics
On-Premises	Your cloud	Your cloud	No
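
The comparison above can also be expressed as a small lookup, which may help when scripting an evaluation. This is only an illustrative sketch: `DeploymentModel` and `candidate_models` are hypothetical names and are not part of any Granica API. The values are transcribed from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentModel:
    name: str
    control_plane: str  # where the control plane runs
    data_plane: str     # where the data plane runs
    table_data_leaves_cloud: bool


# Values transcribed from the comparison table above.
MODELS = [
    DeploymentModel("Granica Hosted", "Granica's cloud", "Granica's cloud", True),
    DeploymentModel("Hybrid", "Granica's cloud", "Your cloud", False),
    DeploymentModel("On-Premises", "Your cloud", "Your cloud", False),
]


def candidate_models(data_must_stay: bool, control_plane_must_stay: bool) -> list[str]:
    """Return the model names that satisfy the given residency constraints."""
    result = []
    for m in MODELS:
        if data_must_stay and m.table_data_leaves_cloud:
            continue
        if control_plane_must_stay and m.control_plane != "Your cloud":
            continue
        result.append(m.name)
    return result


print(candidate_models(data_must_stay=True, control_plane_must_stay=False))
# prints ['Hybrid', 'On-Premises']
```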

Hybrid (recommended)

In the Hybrid model, the control plane (Console, API, Airflow, PostgreSQL) runs in Granica's cloud. The data plane (Spark, Tunnel Agent, Granica Worker) runs inside your cloud environment. This is the most common deployment model.

What leaves your cloud in this model:

Spark job progress metrics (e.g. completed task counts, durations)
Job status signals (e.g. RUNNING, FAILED)
Aggregated table metadata used for reporting (e.g. table names, partition counts, DRR)

Your actual table data never leaves your cloud.
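
One way to picture this boundary is as an allowlist applied to anything sent to the control plane. The Python sketch below is a conceptual illustration only and does not describe how the Tunnel Agent is implemented. The field names and the `outbound_payload` helper are assumptions made up for the example.

```python
# Hypothetical field names; the real payload schema is not documented here.
ALLOWED_FIELDS = {
    "job_id", "job_status", "completed_tasks", "duration_seconds",
    "table_name", "partition_count", "drr",
}


def outbound_payload(event: dict) -> dict:
    """Keep only allowlisted metrics and metadata; drop everything else."""
    return {k: v for k, v in event.items() if k in ALLOWED_FIELDS}


event = {
    "job_id": "job-123",
    "job_status": "RUNNING",
    "completed_tasks": 42,
    "table_name": "sales.orders",
    "rows": [("alice", 10), ("bob", 20)],  # table data: must stay in your cloud
}
print(outbound_payload(event))
```

An allowlist fails closed: a field that nobody has explicitly approved is dropped rather than forwarded.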

On-Premises

In the On-Premises model, both the control plane and the data plane run entirely within your cloud environment. This model is designed for customers with strict data residency or compliance requirements where no traffic of any kind can cross cloud boundaries.

Granica Hosted

In the Granica Hosted model, Granica manages the entire platform including the data processing infrastructure. Table data and catalog metadata flow in and out of your cloud as part of Crunch operations. This model requires the least customer infrastructure but involves data leaving your cloud environment.

Key architectural properties

Data plane isolation

The Spark workers that process your data run in your cloud VPC. They read from and write to your own cloud storage (S3, GCS, Azure Blob). At no point do they transmit file contents outside your environment. The Tunnel Agent only carries control signals and metadata — never file data.

Single-tenant data plane

Each customer's data plane is a dedicated deployment: a single-tenant Kubernetes cluster (for example, EKS on AWS) in your cloud account or project. Granica does not run multi-tenant compute for data processing. Your data is never co-mingled with another customer's data on shared infrastructure.

Components and data flow

Crunch Data integrity

Granica implements multiple levels of data integrity to ensure your data is always protected during optimization.

Object integrity

Pre-Crunch file validation. Before crunching, Granica reads the source file to verify it is consistent with the native format — for example, confirming a Parquet file is structurally valid before processing begins.
Post-Crunch integrity validation. Immediately after a Crunch job completes, Granica performs logical data validation, comparing the source with the optimized output and verifying that the row counts match.
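
The two checks can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not Granica's validation code. The pre-check relies only on the fact that a Parquet file starts and ends with the 4-byte magic `PAR1`; a real validator would also parse the footer metadata. The post-check assumes rows are available as hashable tuples and that row order is not significant. The function names are hypothetical.

```python
import os
from collections import Counter

PARQUET_MAGIC = b"PAR1"  # Parquet files start and end with these 4 bytes


def looks_like_parquet(path: str) -> bool:
    """Cheap structural pre-check: magic bytes at both ends of the file."""
    size = os.path.getsize(path)
    if size < 12:  # 4 (magic) + 4 (footer length) + 4 (magic) at minimum
        return False
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


def logically_equal(source_rows, output_rows) -> bool:
    """Post-check: same row count and same rows, ignoring row order."""
    source_rows, output_rows = list(source_rows), list(output_rows)
    if len(source_rows) != len(output_rows):
        return False
    # Counter compares the rows as multisets, so duplicates are respected.
    return Counter(source_rows) == Counter(output_rows)


source = [(1, "a"), (2, "b"), (2, "b")]
optimized = [(2, "b"), (1, "a"), (2, "b")]  # same rows, different order
print(logically_equal(source, optimized))
```

Checking the row count first is a fast rejection path; the multiset comparison then catches cases where counts match but contents differ.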

Integrity failure handling

In the unlikely event of an integrity failure, Crunch stops processing new objects and the Granica team is alerted immediately. Processing resumes only after the failure is investigated and resolved.
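
The stop-on-failure behavior described above can be sketched as a loop that halts at the first object that fails validation. This is a minimal illustration, assuming sequential processing; `process_queue` and its callback parameters (`crunch`, `validate`, `alert`) are hypothetical names, not Granica interfaces.

```python
def process_queue(objects, crunch, validate, alert):
    """Process objects in order and stop at the first integrity failure.

    Returns (done, failed): the objects processed successfully, and the
    object that failed validation (or None if everything passed).
    """
    done = []
    for obj in objects:
        output = crunch(obj)
        if not validate(obj, output):
            alert(f"integrity failure on {obj}")
            return done, obj  # halt: leave the rest of the queue untouched
        done.append(obj)
    return done, None


alerts = []
done, failed = process_queue(
    ["a.parquet", "b.parquet", "c.parquet"],
    crunch=lambda o: o.upper(),                    # stand-in for optimization
    validate=lambda src, out: src != "b.parquet",  # simulate a failure on b
    alert=alerts.append,
)
print(done, failed, alerts)
# prints ['a.parquet'] b.parquet ['integrity failure on b.parquet']
```

In this sketch `c.parquet` is never touched, which mirrors the rule that processing resumes only after the failure has been investigated.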

High availability

Granica provides >99.99% availability, built on cloud-native primitives such as AWS EKS with multi-AZ node groups.

All Crunch services run as Kubernetes pods across a cluster of compute instances. A minimum of two on-demand nodes ensures baseline availability. As workload increases, Granica automatically provisions additional spot instances and scales service pods to match. Pods are distributed across nodes and availability zones, with a Broker pod managing routing to the distributed service pods.

Elastic scaling

Granica Crunch is a background service and is not in the read or write path of your query engines. It operates independently, reading from and writing to cloud storage without affecting query latency.

Compute resources are fully elastic. Crunch uses autoscaling Kubernetes clusters that scale from zero to as many nodes as needed based on the volume of data queued for optimization, and back to zero during idle periods. Processing throughput reaches 150 MBps per node.
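
As a rough sizing aid, the per-node figure can be turned into a node-count estimate. The sketch below assumes that "MBps" means megabytes per second, that throughput scales linearly with node count, and that you pick a target drain time yourself; `nodes_needed` is a hypothetical helper and not the actual autoscaling policy.

```python
import math

NODE_THROUGHPUT_MB_PER_S = 150  # per-node figure quoted above


def nodes_needed(queued_mb: float, target_seconds: float) -> int:
    """Nodes required to drain the queue within target_seconds; 0 when idle."""
    if queued_mb <= 0:
        return 0  # scale to zero during idle periods
    per_node_capacity_mb = NODE_THROUGHPUT_MB_PER_S * target_seconds
    return math.ceil(queued_mb / per_node_capacity_mb)


# 1 TB (1,000,000 MB) queued, to be drained within one hour:
# one node handles 150 * 3600 = 540,000 MB, so two nodes are needed.
print(nodes_needed(1_000_000, 3600))
# prints 2
```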

Non-disruptive upgrades

Granica upgrades are transparent to your applications and do not require downtime or application changes.

Granica uses a rolling upgrade approach across all service pods, containers, and Kubernetes cluster infrastructure. In the Hybrid deployment model, Granica can perform upgrades through the control plane tunnel without requiring direct access to your cloud environment. On-Premises customers can trigger upgrades manually or configure automatic updates.
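
The general idea of a rolling upgrade is to replace pods a few at a time so the remaining replicas keep serving. The following is a generic sketch of that pattern, not Granica's upgrade tooling; `rolling_upgrade`, the pod names, and the version strings are all hypothetical.

```python
def rolling_upgrade(pods: dict, new_version: str, batch_size: int = 1):
    """Upgrade pods in small batches so most replicas keep serving.

    `pods` maps pod name -> current version and is updated in place.
    Yields the list of pod names upgraded in each batch.
    """
    pending = [name for name, version in pods.items() if version != new_version]
    for i in range(0, len(pending), batch_size):
        batch = pending[i:i + batch_size]
        for name in batch:
            # Stand-in for "replace the pod and wait until it is ready".
            pods[name] = new_version
        yield batch


pods = {"broker-0": "1.0", "worker-0": "1.0", "worker-1": "1.0"}
batches = list(rolling_upgrade(pods, "1.1", batch_size=1))
print(batches)
# prints [['broker-0'], ['worker-0'], ['worker-1']]
```

A smaller `batch_size` keeps more capacity online during the upgrade at the cost of a longer rollout.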
