How crunching works
Understand how Crunch optimizes compression for your lakehouse data.
Once you've deployed Granica into your cloud environment, it's time to get crunching. "Crunching" is our shorthand for the data processing behind compression optimization: we "crunch" the data down to its most information-dense state.
The Granica Crunch lakehouse compression optimizer works at the bucket level. You first use policy to specify which buckets are eligible to be crunched, and then run granica crunch <bucket_name> to begin crunching.
Production lakehouse data
Production lakehouse data is the columnar data your teams are generating and working with every day. Crunch offers two mechanisms to compress and optimize that data:
| | Runtime Crunch | Background Crunch |
|---|---|---|
| Optimizes incoming data (as written) | Yes | No |
| Optimizes existing data | No | Yes |
| Continuously learns from data | Yes | No |
| Compatibility | Any application using open source Parquet writer | Specific qualified platforms |
| Availability | Early access | Now |
Crunch lexicon
- Crunched buckets — those under active management, processing and monitoring by Crunch
- Crunched objects — objects evaluated by Crunch for compression optimization
- Vanilla buckets — those which have not been crunched
- Vanilla objects — those which have not been crunched
- Ingested objects — crunched and reduced (background mode only)
- Analyzed objects — crunched and analyzed to generate optimal recipes (runtime mode only)
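The lexicon above can be read as a small state model. The following is a minimal sketch of that reading; the names `Mode`, `ObjectState`, `state_after_crunch` and `is_crunched` are hypothetical and exist only for illustration. They are not part of any Granica API.

```python
from enum import Enum


class Mode(Enum):
    RUNTIME = "runtime"
    BACKGROUND = "background"


class ObjectState(Enum):
    VANILLA = "vanilla"    # not yet crunched
    INGESTED = "ingested"  # crunched and reduced (background mode only)
    ANALYZED = "analyzed"  # crunched and analyzed for recipes (runtime mode only)


def state_after_crunch(mode: Mode) -> ObjectState:
    """Return the state a vanilla object reaches once Crunch has evaluated it."""
    return ObjectState.ANALYZED if mode is Mode.RUNTIME else ObjectState.INGESTED


def is_crunched(state: ObjectState) -> bool:
    """Ingested and analyzed objects both count as crunched."""
    return state is not ObjectState.VANILLA
```

Under this reading, "crunched" is the umbrella term, and which specific state an object lands in depends only on the mode that evaluated it.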
Runtime crunch write workflow
In this mode Crunch has two main components:
- An ML-powered adaptive compression control system which analyzes your existing columnar files to create compression optimization recipes. The control system continuously learns from and adapts to your data over time.
- A runtime optimizer SDK which integrates into your data platform and is invoked transparently by any applications utilizing an open source Apache Parquet writer, without any code changes.

1. User runs granica crunch to initiate crunching on an eligible vanilla source bucket. The Controller retrieves vanilla objects using LIST and GET operations.
2. The Controller routes vanilla objects to a Compression recipe generator, which analyzes the unique characteristics and structure of the columnar files.
3. The Compression recipe generator updates the Compression recipe store with new or updated recipes.
4. Spark-based applications initiate columnar writes using standard commands. The Granica runtime SDK intercepts the write and applies the best available recipe.
5. The SDK writes data out in standard, lakehouse-native format (typically Parquet).
6. The runtime SDK notifies the compression system to analyze newly created files, creating a continuous feedback loop.
Crunch is not in the read path. Reading Granica Crunch compressed files is transparent — any application using the open source Parquet reader can read them normally.
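To make the feedback loop in the steps above more concrete, here is a toy sketch in Python. It is an illustration under heavy assumptions, not the Granica SDK: `Recipe`, `generate_recipe` and `RuntimeWriter` are hypothetical names, and a zlib compression level stands in for a real compression and encoding recipe. The point it shows is that recipes only influence how data is written, so a standard reader never needs to know about them.

```python
import zlib
from typing import Dict, NamedTuple


class Recipe(NamedTuple):
    """Hypothetical recipe: a zlib level standing in for real writer settings."""
    level: int


DEFAULT_RECIPE = Recipe(level=1)
CANDIDATES = [Recipe(level=n) for n in (1, 6, 9)]


def generate_recipe(sample: bytes) -> Recipe:
    """Steps 2-3: analyze a sample and keep the candidate with the smallest output."""
    return min(CANDIDATES, key=lambda r: len(zlib.compress(sample, r.level)))


class RuntimeWriter:
    """Steps 4-6: apply the best available recipe on write, then feed data back."""

    def __init__(self) -> None:
        self.recipe_store: Dict[str, Recipe] = {}

    def write(self, dataset: str, payload: bytes) -> bytes:
        recipe = self.recipe_store.get(dataset, DEFAULT_RECIPE)
        written = zlib.compress(payload, recipe.level)  # output stays in a standard format
        # Feedback loop: the newly written data is analyzed to refresh the recipe.
        self.recipe_store[dataset] = generate_recipe(payload)
        return written


writer = RuntimeWriter()
rows = b"user_id,country\n" + b"42,DE\n" * 1000
first = writer.write("events", rows)   # no recipe yet, so the default is used
second = writer.write("events", rows)  # uses the recipe learned from the first write

# A standard reader (plain zlib here) needs no knowledge of the recipe.
assert zlib.decompress(first) == rows
assert zlib.decompress(second) == rows
```

In this sketch the second write can never be larger than the first, because the learned recipe is chosen from a candidate set that includes the default.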
Background crunch write workflow
In this mode Crunch monitors your buckets for incoming columnar files and crunches them in the background. Once your data is crunched you'll see immediate savings in your lakehouse storage costs.

1. User runs granica crunch to initiate crunching on an eligible vanilla source bucket. The Controller retrieves vanilla objects.
2. When an application writes to the Crunched bucket, the Controller receives notifications via real-time SQS pub/sub events.
3. The Controller sends vanilla objects to a load balanced Compression optimizer.
4. The Compression optimizer validates policy eligibility, optimizes the compression and encoding, and swaps the original object with the newly optimized version — initiating a reduction in your monthly cloud storage bill.
Crunch is not in the read path in background mode either. Crunch swaps the original files with smaller, compression-optimized versions. Compatible applications will then begin reading the reduced files normally.
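The background steps above follow a notify, check, optimize and swap pattern, which the sketch below imitates entirely in memory. Everything in it is an assumption made for illustration: the `bucket` dict and `events` queue stand in for object storage and SQS notifications, the suffix-based `is_eligible` check stands in for a real policy, and zlib stands in for the optimizer. In the real product the swapped object remains a readable columnar file, which plain zlib output would not be.

```python
import zlib
from collections import deque
from typing import Deque, Dict

# Hypothetical in-memory stand-ins; these are not Granica, S3 or SQS APIs.
bucket: Dict[str, bytes] = {}
events: Deque[str] = deque()
ELIGIBLE_SUFFIXES = (".parquet",)  # assumed policy: only columnar files qualify


def put_object(key: str, body: bytes) -> None:
    """An application write: store the object and emit a notification (step 2)."""
    bucket[key] = body
    events.append(key)


def is_eligible(key: str) -> bool:
    return key.endswith(ELIGIBLE_SUFFIXES)


def optimize(body: bytes) -> bytes:
    """Stand-in for the compression optimizer; only the size reduction is modeled."""
    return zlib.compress(body, 9)


def drain_events() -> int:
    """Steps 3-4: process pending notifications and return the bytes saved."""
    saved = 0
    while events:
        key = events.popleft()
        if not is_eligible(key):
            continue  # policy check failed, leave the object untouched
        original = bucket[key]
        candidate = optimize(original)
        if len(candidate) < len(original):
            bucket[key] = candidate  # swap only when the new version is smaller
            saved += len(original) - len(candidate)
    return saved


put_object("data/part-0.parquet", b"a" * 10_000)
put_object("notes.txt", b"a" * 10_000)
bytes_saved = drain_events()
```

Swapping only when the candidate is strictly smaller keeps the loop safe to run on already compact objects: the worst case is that an object is left as it was.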
See also
Granica Crunch overview
Lakehouse-native compression optimization for analytics, ML and AI.
Crunch compatibility
Supported platforms and file formats for Granica Crunch.