Crunch FAQ
Get answers to common questions.
General
What is Granica Crunch?
Granica Crunch is the industry's first cloud cost optimization solution purpose-built for lakehouse data. It applies lakehouse-native compression optimization to the high-volume datasets heavily used in analytics, ML, and AI. These datasets typically consist of columnar files, such as Apache Parquet, stored in Amazon S3 and Google Cloud Storage. Granica Crunch optimizes the compression of these files, lowering at-rest storage and data transfer costs by up to 60% and speeding queries by up to 56%.
How much cost reduction does Crunch typically deliver?
Granica Crunch typically reduces columnar data-related storage costs by 15-60%, depending on your data. In TPC-DS benchmark tests using Snappy-compressed Parquet as the input data, Crunch delivered an average 43% file-size reduction. The reduction rate translates directly into gross annual savings. If you're storing 10 petabytes of AI data in Amazon S3 or Google Cloud Storage, that translates into >$1.2M per year (growing as your data grows).
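The >$1.2M figure can be reproduced with back-of-the-envelope arithmetic. A minimal sketch, where the storage price is a rough S3 Standard list price (it varies by region and tier) and the reduction rate is the midpoint of the 15-60% range — both are illustrative assumptions, not quoted Granica figures:

```python
# All inputs are assumptions for illustration only.
source_gb = 10_000_000          # 10 PB of source data (decimal units)
price_per_gb_month = 0.021      # assumed storage price, $/GB-month
reduction = 0.50                # assumed compression reduction rate

annual_savings = source_gb * price_per_gb_month * 12 * reduction
print(f"${annual_savings:,.0f} per year")  # → $1,260,000 per year
```

Your actual savings scale with your data volume, your negotiated storage rates, and the reduction rate Crunch achieves on your specific datasets.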
How does Crunch compare to alternatives?
AI data is hot data, and Crunch delivers the savings and efficiency you need without the trade-offs:
- Vs. DIY compression and optimization: Granica Crunch automatically adapts optimization based on the unique structure of each file, eliminating data pipeline complexity, risk, and latency/throughput challenges.
- Vs. archival and tiering to cooler/colder storage classes: Granica Crunch speeds up queries and data loading rather than slowing them down, and doesn't incur data access or transfer charges.
How does Granica Crunch compress and optimize columnar files?
Granica Crunch uses a proprietary compression control system that leverages the columnar nature of modern analytics formats like Parquet and ORC to achieve high compression ratios through underlying OSS compression algorithms such as zstd. The compression control system combines techniques like run-length encoding, dictionary encoding, and delta encoding to optimally compress data on a per-column basis.
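To illustrate two of the underlying OSS techniques named above — this is a minimal pure-Python sketch of dictionary and run-length encoding applied to a single repetitive column, not Granica's proprietary control system:

```python
import zlib
from itertools import groupby

def dictionary_encode(column):
    """Replace each distinct value with a small integer code."""
    codes = {}
    encoded = [codes.setdefault(v, len(codes)) for v in column]
    return list(codes), encoded

def run_length_encode(values):
    """Collapse runs of repeated values into (value, run_length) pairs."""
    return [(v, sum(1 for _ in g)) for v, g in groupby(values)]

# A low-cardinality string column, typical of columnar analytics data.
column = ["us-east-1"] * 4000 + ["eu-west-1"] * 4000 + ["us-east-1"] * 2000

dictionary, codes = dictionary_encode(column)
runs = run_length_encode(codes)          # 10,000 values collapse to 3 runs

# The encoded form compresses far better than the raw strings
# (zlib stands in here for zstd, which is not in the stdlib).
raw_bytes = zlib.compress(",".join(column).encode())
rle_bytes = zlib.compress(repr((dictionary, runs)).encode())
print(len(runs), len(raw_bytes), len(rle_bytes))
```

Columnar formats like Parquet apply this idea per column: each column's encoding is chosen to fit its value distribution before the general-purpose compressor (e.g., zstd) runs over the result.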
Does Crunch affect query performance and read/write latency?
Granica Crunch is NOT in the read or write path. It works in the background to adaptively optimize the OSS compression and encoding of your columnar lakehouse files stored in Amazon S3 and/or Google Cloud Storage. Apache Spark and Trino-based applications continue accessing the newly-optimized files normally. Query performance actually improves as there are fewer physical bits to read and transfer from object storage to the query engine. TPC-DS benchmarks show an improvement of 2-56% depending on the query.
Pricing, Billing and ROI
How does Granica charge for Crunch?
Granica Crunch is licensed and priced based on the volume of compressed data stored per month, measured in uncompressed source bytes. Granica charges a minimum monthly commitment, which allows you to crunch up to 1 PB of source data. Thereafter, each incremental terabyte of source data crunched is billed at a per-TB rate. Please contact us for a custom quote based on your environment.
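The pricing model above can be sketched as a simple commit-plus-overage function. The base fee, included volume, and per-TB rate below are placeholders for illustration, not Granica list prices:

```python
def monthly_bill(source_tb, base_fee, included_tb=1024, rate_per_tb=10.0):
    """Minimum monthly commitment covers up to ~1 PB (1024 TB) of source
    data; each incremental TB is billed at a per-TB rate. All dollar
    figures are placeholder assumptions, not Granica pricing."""
    overage_tb = max(0, source_tb - included_tb)
    return base_fee + overage_tb * rate_per_tb

# 1,500 TB crunched: 476 TB of overage above the 1 PB commitment.
print(monthly_bill(1500, base_fee=5000))
```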
How does Granica handle data deletions in billing?
Granica calculates its monthly bill based on uncompressed source bytes at the end of each month. If data is deleted, the subsequent month's bill will reflect the reduced data volume.
How does Granica's monthly reconciliation process work?
At month's end, Granica measures the total number of uncompressed source bytes that have been crunched in that month. The invoice is then calculated and distributed by the 5th of the following month for payment.
How should we budget for Crunch if product usage can vary monthly based on data volumes?
We recommend allocating a budget with a buffer percentage (e.g., 10-20% above or below the average estimated monthly cost). Crunch policies can be configured to respect monthly and annual budget targets.
How is overall ROI calculated?
Your Net Savings ROI is calculated as the annual direct bottom-line savings generated by Granica Crunch (e.g., at-rest storage savings + compute savings + network savings) minus Granica licensing fees.
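The formula above reduces to a one-line calculation. The figures in the example are illustrative only, not quoted pricing or measured savings:

```python
def net_savings_roi(storage_savings, compute_savings, network_savings, license_fees):
    """Net Savings ROI: annual direct bottom-line savings minus licensing fees."""
    return storage_savings + compute_savings + network_savings - license_fees

# Illustrative annual figures (assumptions, not actual customer numbers).
print(net_savings_roi(
    storage_savings=1_000_000,
    compute_savings=150_000,
    network_savings=100_000,
    license_fees=400_000,
))  # → 850000
```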
Are there additional benefits beyond direct cost savings?
Yes, Crunch also delivers value through developer productivity gains and potential topline revenue increases due to positive impacts on query performance. While these are not included in the direct cost savings ROI for conservatism, they should be considered when evaluating Crunch's full impact on your operations.
Are there volume-based or term-based discounts available?
Yes, Granica applies select volume-based discounts for higher data volume tiers and term-based discounts for 24- and 36-month contracts. With higher data volumes, the overall cost per TB trends downward as the impact of the discounts increases.
Can we use existing AWS or Google Marketplace commitments to pay for Granica Crunch?
Not yet; however, we are actively working to list Crunch in those marketplaces. Please contact us for specifics.
Implementation
Is Granica Crunch compatible with my existing data formats, tools, and queries?
Granica Crunch is fully compatible with Apache Spark and Trino query engines utilizing Apache Parquet, including any BI tools querying those engines. Granica-optimized tables can be queried using standard SQL without any modification.
We plan to enhance Granica Crunch with support for other major query engines (such as Presto, Hive, BigQuery, EMR, and Databricks), support for other columnar formats such as ORC, and integration with popular data catalogs and schema registries.
Does Crunch require any changes to our production applications?
No, your production applications continue to read and write data as they normally would. Crunch's Zero-Copy Architecture optimizes, compresses and updates files in place, without the need to create copies. This lowers costs and reduces data management overhead while ensuring seamless integration and zero disruption to your existing workflows.
How do I decide which datasets to compress?
In general, the largest datasets with the highest storage costs are the best candidates for compression. You should also consider datasets that are infrequently updated and have a high query volume, as these will benefit the most from compression. Granica provides tools such as Chronicle AI to analyze your data landscape and recommend datasets for compression based on size, usage, and query patterns.
Can I compress streaming or real-time data with Granica Crunch?
Yes, Granica Crunch can be used to compress streaming data that has landed in a data lakehouse or warehouse. The compression process is typically run as a micro-batch job on a regular interval (e.g. every 5 minutes), allowing freshly ingested data to be compressed and made available for querying with minimal latency.
What is the typical timeline for implementing Granica Crunch in production?
For most customers we recommend a phased implementation approach spanning 8-12 weeks. This includes an initial pilot on a subset of data to validate the compression ratios and performance benefits, followed by a production rollout across the entire dataset. The exact timeline depends on the size and complexity of your data environment.
How does Crunch support our cloud strategy?
Crunch is designed to work seamlessly across major cloud platforms, giving you the flexibility to move data and workloads as you see fit. Crunched data is also faster and cheaper to transfer across regions and clouds, opening up use cases such as disaster recovery and access to GPU compute. Our internal FinOps optimizations, such as intelligent caching and data placement, help minimize cross-AZ traffic and optimize the resource utilization associated with our deployments, balancing performance and cost in your cloud environment.
How do I "undo" crunching of my data, and/or the entire Granica deployment?
At any time you can simply uncrunch your AI data to return it to its original, unreduced form, and then tear down your deployment to return your environment to its pre-Granica state.