Granica Crunch overview

Lakehouse-native compression optimization for analytics, ML and AI.

Granica Crunch is the industry's first cloud cost optimization solution purpose-built for lakehouse data. It applies lakehouse-native compression optimization to high-volume datasets heavily used in analytics, ML and AI. Such data is stored in columnar formats, especially Apache Parquet, which typically already use a general-purpose codec such as Snappy, zlib or zstd.

Crunch optimizes the compression used inside these formats, dramatically shrinking the physical size of the columnar files. This in turn lowers the cost to store and transfer them while also speeding up queries run against them.
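To make this concrete, the sketch below rewrites a Parquet file with a different internal codec using Python and pyarrow. This is purely illustrative (the file names are hypothetical, and Crunch's own optimizer is proprietary and far more sophisticated), but it demonstrates the property Crunch relies on: the compression inside a Parquet file can change losslessly while the file remains standard Parquet.

```python
import pyarrow.parquet as pq

# Read an existing Parquet file; the logical data never changes.
table = pq.read_table("events.parquet")  # hypothetical input file

# Rewrite it with a different codec. Parquet allows the codec
# (and its level) to change without breaking the format.
pq.write_table(
    table,
    "events_optimized.parquet",
    compression="zstd",
    compression_level=9,
)

# Any standard Parquet reader can open the result, so applications
# and query engines need no changes.
reopened = pq.read_table("events_optimized.parquet")
assert reopened.equals(table)  # lossless: identical logical contents
```

Crunch goes far beyond a simple codec swap, adapting compression to each file's structure, but the standards-compliance shown here is what keeps it out of the read path.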

[Diagram: Crunch simple architecture]

The problem

Organizations of all sizes recognize that they need to use their resources far more efficiently in order to keep investing in their strategic priorities. For most organizations today, AI is already such a priority, and it becomes more strategic every day.

Modern data lakes, lakehouses and AI systems, with their large volumes of data in the public cloud, represent a significant source of data inefficiency. The potential savings from applying Crunch to optimize that data and eliminate those inefficiencies are dramatic.

How Granica Crunch helps

Analogous to how a query optimizer speeds up existing SQL queries, Crunch is a compression optimizer: it further reduces the size (and increases the efficiency) of existing columnar files such as Apache Parquet while remaining standards-compliant.

Granica Crunch typically reduces the cost to store and transfer petabyte-scale lakehouse data by 15-60%, depending on the structure and pre-existing compression of your columnar files. If you're storing 10 petabytes of Parquet in an Amazon S3 or Google Cloud Storage-backed lakehouse, that translates into roughly $1.2M per year of gross cash savings from reduced at-rest storage costs alone.
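As a rough back-of-the-envelope check on that figure (all numbers below are illustrative assumptions, not quoted prices):

```python
# Illustrative estimate only; pricing and reduction are assumptions.
gb_stored = 10 * 1_000_000           # 10 PB expressed in (decimal) GB
price_per_gb_month = 0.021           # approx. S3 Standard first-tier rate
reduction = 0.48                     # a mid-range Crunch reduction

monthly_cost = gb_stored * price_per_gb_month      # ~ $210,000 / month
annual_savings = monthly_cost * 12 * reduction     # ~ $1.2M / year
print(f"${annual_savings:,.0f}")                   # -> $1,209,600
```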

Use cases

Lower costs to store & move data

Crunch compression optimization shrinks the physical size of your columnar files, lowering at-rest storage and data transfer costs by up to 60% for large-scale lakehouse datasets.

Faster cross-region replication

Smaller files also make data transfers and replication up to 60% faster, addressing AI-related compute scarcity, compliance, disaster recovery and other use cases.

Faster processing

Smaller files are also faster files: they accelerate any process bottlenecked by network or IO bandwidth, from queries to data loading for model training, running up to 56% faster in TPC-DS benchmarks.
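The reasoning behind these speedups is simple: for any IO-bound step, time is roughly bytes divided by effective bandwidth, so shrinking the bytes shrinks the time proportionally. A quick sketch with assumed, illustrative numbers:

```python
# Illustrative only: dataset size, bandwidth and reduction are assumptions.
def transfer_hours(terabytes: float, gbit_per_s: float = 10.0) -> float:
    """Hours to move or scan a dataset at a given effective throughput."""
    bits = terabytes * 1e12 * 8
    return bits / (gbit_per_s * 1e9) / 3600

before = transfer_hours(100)         # 100 TB at 10 Gbit/s: ~22.2 hours
after = transfer_hours(100 * 0.44)   # 56% smaller:         ~ 9.8 hours
print(f"{before:.1f}h -> {after:.1f}h")
```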

Key characteristics and features

  • Multi-Cloud — Supports Amazon S3 and Google Cloud Storage
  • Lakehouse-Native IO — Data remains in standard, open columnar formats. Supports Apache Parquet with other formats coming soon. Not in the read path — requires no changes to applications.
  • Structure-Adaptive Compression Optimizer — Advanced ML-based algorithms dynamically control, adapt and tune lossless compression to each file's unique structure, lowering costs by up to 60%.
  • Data-Driven Query Boost — Optimized files enhance query execution and efficiency, accelerating query performance by up to 50% based on TPC-DS benchmarks.
  • Zero-Copy Architecture — Compresses and updates files in place, without creating copies.
  • Exabyte-Scale — Background data processing handles arbitrary volumes of data.
  • Secure — Data never leaves your environment. Crunch runs entirely within your VPC.
  • Resilient — Highly available clusters ensure always-on data access.
  • Powerfully simple — Start crunching your data in place and cutting costs within 30 minutes, with a single command.
