How policies work
Learn how to manage Granica using policies.
Granica policies give you additional controls around crunching and managing data in your cloud object stores. They enable you to centrally manage Granica behind the scenes without impacting developer workflows or applications. Granica policies support a range of use cases, enabling you to:
- Automatically crunch new buckets
- Control crunching while you Granica-enable your custom applications
- Crunch cold data without making any changes to your custom applications
- Use Crunch with 1st and 3rd party applications that are not Granica-enabled
- Restrict modifications or deletions of objects for retention and compliance
- Maintain your existing object deletion policies for GDPR and compliance
- Maintain your existing lifecycle policies to tier crunched data to archival storage classes (e.g. S3 Glacier, GCS Coldline)
- Protect against accidental deletion of objects
- Control which objects within a bucket are crunch-eligible
- Automate object removal
Unlike S3/GCS policies, which are managed at the individual bucket level, Granica policies are applied and managed globally, making them simple to administer.
1. Policy Management
How to manage policies
Use the `granica policy edit` command to view, change, apply and delete Granica policies. Granica policies will not be in effect for a given bucket until that bucket has been discovered and crunched. This occurs when you manually run the `granica crunch <bucket>`, `granica execute-policy`, and/or `granica crunch universe` commands, or when you enable `auto-crunch` for the universe.
Assuming the bucket matches the policy filters, these actions place the bucket under ongoing management by Crunch. In other words, unmanaged buckets are not affected by policy edits, but once a bucket is managed, any subsequent policy updates take effect immediately on save, with no need to re-crunch.
Executing `granica policy edit` opens your existing policy in the editor defined by the `VISUAL` or `EDITOR` environment variable on your Granica Admin Server. If neither variable is set, the command opens the policy using vi (vim).
Granica policies are a powerful tool. As a best practice, test your policies on non-critical data before applying them to production buckets, to avoid unintended consequences and potential data loss from incorrectly configured policies.
The first time you execute the `granica policy edit` command, your editor opens the default Granica policy. If you modify and commit (write) the policy and exit the editor, the modified policy is automatically applied and takes effect immediately. If you commit (write) an empty file and exit the editor, the existing policy is deleted and replaced with the default policy. If you exit the editor without committing (writing) any changes, the existing policy remains in place.
You can change the default editor for `granica policy edit` by setting the environment variable `VISUAL` (or `EDITOR` if `VISUAL` is unset). For example, to always use `nano`, add an export to the `~/.bashrc` file on the Granica Admin Server: `echo 'export VISUAL=nano' >> ~/.bashrc && source ~/.bashrc`.
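The editor lookup order described above can be sketched in a few lines of Python. This is only an illustration of the fallback order (`VISUAL`, then `EDITOR`, then vi), not Granica's actual implementation; the function name `resolve_editor` is hypothetical, and treating an empty value as unset is an assumption.

```python
import os


def resolve_editor(env=None):
    """Pick the editor in the order described above: VISUAL, then EDITOR, then vi."""
    env = os.environ if env is None else env
    # Assumption: an empty value is treated the same as an unset variable.
    return env.get("VISUAL") or env.get("EDITOR") or "vi"


print(resolve_editor({"VISUAL": "nano", "EDITOR": "emacs"}))  # nano
print(resolve_editor({"EDITOR": "emacs"}))                    # emacs
print(resolve_editor({}))                                     # vi
```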
2. Bucket Discovery (the "universe")
Introduction
The `universe` section operates at the account level, not the bucket level, and specifies how and when Crunch discovers buckets and projects in your account. Discovered buckets are then filtered, crunched and managed using the relevant policies.
`auto-crunch`
If enabled, Crunch automatically and periodically runs `granica crunch universe` to discover and crunch newly created buckets in your account. Also, every time you save the policy with `auto-crunch` enabled, Crunch runs `granica crunch universe`.
If not defined or not enabled, bucket discovery (and crunching) only occurs when you manually run `granica crunch universe`.
Use cases
- Automatically crunch new buckets. The `auto-crunch` policy makes it easy to capture savings from newly created buckets in your account.
The `auto-crunch` policy facilitates large-scale adoption of Granica and assumes your app/dev workflow involves Granica-enabling your applications as a standard practice. This ensures your data is always fully accessible to your users and applications.
3. Filters
Introduction
Crunch policy consists of three main filters:
- standard
- include
- exclude
`standard`
Specifies the parameters that apply to all crunch-eligible buckets (more on eligibility below). Standard settings can be overridden by the settings listed in the `include` section for a particular bucket, or for buckets matching the glob patterns. See Customizing Crunch Policy.
Use cases
- Create default policies. The `standard` filter makes it easy to define default crunch behavior.
`crunch-enable`
Specifies whether to enable Crunch. When set to “true”, Crunch is enabled according to the policy settings. When set to “false”, Crunch is disabled.
`exclude`
Specifies which buckets are not crunch-eligible. The `exclude` pattern overrides the `include` pattern. If a bucket is listed here or matches the glob pattern, it will NOT be crunched, regardless of whether it is listed in the `include` section or an explicit crunch command (`granica crunch <bucket>`) is issued.
Use cases
- Prevent crunching while you Granica-enable your custom applications. If you have a small number of buckets to exclude (vs. include), you can easily exclude them with the `exclude` policy. If you have buckets that are accessed by multiple applications, then all those applications must be Granica-enabled before you remove the exclusion and queue crunching via the CLI.
If include/exclude filters are not set, any bucket within the organization is eligible. If only an exclusion filter is set, then any buckets in the organization except those excluded are eligible.
`include`
Specifies a list of bucket glob patterns that define which buckets are crunch-eligible. Bucket customization parameters are also defined here. If there are no buckets listed in this section, no buckets will be crunched when crunching the universe (`granica crunch universe`). If the `include` section is populated, only buckets listed or buckets matching the glob pattern will be eligible for crunching. Buckets not explicitly listed in this section can still be crunched with the `granica crunch <bucket>` command.
For bucket customization parameters, see Customization.
Use cases
- Control the scope of buckets which Crunch will crunch. The `include` filter makes it easy to expand or reduce the scope of the `granica crunch universe` command.
- Create exceptions for specific buckets and applications. The `include` filter makes it easy to add exceptions to the `standard` policies.
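The interaction between `include`, `exclude` and an explicit `granica crunch <bucket>` can be sketched as a small decision function. This is a reading of the rules above, not Granica's implementation: the function names are hypothetical, glob matching is assumed to behave like Python's `fnmatch`, and an empty `include` list is handled as the `include` description states (nothing is picked up by `granica crunch universe`).

```python
from fnmatch import fnmatchcase


def matches_any(bucket, patterns):
    """True if the bucket name matches at least one glob pattern."""
    return any(fnmatchcase(bucket, pattern) for pattern in patterns)


def can_crunch(bucket, include, exclude, explicit=False):
    """Decide whether a bucket may be crunched.

    explicit=True models `granica crunch <bucket>`;
    explicit=False models `granica crunch universe`.
    """
    if matches_any(bucket, exclude):
        return False  # exclude always wins, even for an explicit command
    if explicit:
        return True   # buckets outside include can still be crunched explicitly
    return matches_any(bucket, include)


include = ["logs-*", "analytics"]  # hypothetical bucket names
exclude = ["logs-legacy"]

print(can_crunch("logs-2024", include, exclude))                   # True
print(can_crunch("logs-legacy", include, exclude))                 # False
print(can_crunch("logs-legacy", include, exclude, explicit=True))  # False
print(can_crunch("scratch", include, exclude))                     # False
print(can_crunch("scratch", include, exclude, explicit=True))      # True
```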
4. Customization
Introduction
Crunch policy allows you to easily customize crunching for individual buckets or a group of buckets. To customize a specific bucket, simply list it in the `include` section along with the desired settings. If you have a glob pattern defined there, then adding the desired policy setting will apply it to all matching buckets. The custom policy settings for a bucket or buckets override the standard settings defined in the `standard` section. If a parameter is left unspecified in the custom policy settings, the value from the `standard` section applies.
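The override behavior amounts to a per-key merge: start from the `standard` values and replace only the keys that the matching `include` entry sets. The sketch below illustrates that idea with plain dictionaries. The data layout (`pattern` and `settings` keys), the function name, and the "first matching entry wins" rule are assumptions made for illustration, not the actual policy file schema.

```python
from fnmatch import fnmatchcase


def effective_settings(bucket, standard, include):
    """Return the standard settings with per-bucket overrides applied."""
    settings = dict(standard)  # copy so the defaults are not mutated
    for entry in include:
        if fnmatchcase(bucket, entry["pattern"]):
            settings.update(entry.get("settings", {}))
            break  # assumption: the first matching entry wins
    return settings


standard = {"crunch-after": "30d", "freeze-for": "7d"}
include = [{"pattern": "backup-*", "settings": {"crunch-after": "90d"}}]

print(effective_settings("backup-eu", standard, include))
# {'crunch-after': '90d', 'freeze-for': '7d'}
print(effective_settings("web-assets", standard, include))
# {'crunch-after': '30d', 'freeze-for': '7d'}
```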
Several of the following policies involve time durations, which must happen in a specific, logical order.
`freeze-for`
Specifies how long objects in a bucket must be retained before they can be updated, moved or deleted. The duration can be specified in seconds (`s`), minutes (`m`), hours (`h`), or days (`d`). This is the same as the bucket retention policies you may be familiar with. Crunch will crunch both your incoming and existing data as per the policies set, but will not allow the crunched objects to be updated, moved or deleted until the `freeze-for` period expires.
Configure the `freeze-for` period to align with your retention requirements.
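As a rough illustration of the duration units and the freeze window, the following sketch parses values such as `7d` or `36h` and checks whether an object is still frozen. It assumes a duration is a whole number followed by a single unit letter, and that the freeze window is measured from the object's creation time; both are assumptions, and the helper names are hypothetical.

```python
import re
from datetime import datetime, timedelta

_UNIT_TO_KWARG = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_duration(text):
    """Convert a string such as '90m' or '7d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNIT_TO_KWARG[unit]: int(amount)})


def is_frozen(created_at, freeze_for, now):
    """True while the object may not be updated, moved or deleted."""
    return now < created_at + parse_duration(freeze_for)


created = datetime(2024, 1, 1)
print(parse_duration("36h"))                            # 1 day, 12:00:00
print(is_frozen(created, "7d", datetime(2024, 1, 5)))   # True
print(is_frozen(created, "7d", datetime(2024, 1, 8)))   # False
```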
Use cases
- Restrict modifications or deletions of objects for retention and compliance. U.S.-based financial service institutions such as banks, broker-dealers and record keepers are required to comply with a number of regulations specifying requirements for electronic records retention, including the Securities and Exchange (SEC) Rule 17a-4(f), Commodity Futures Trading Commission (CFTC) Rule 1.31(c)-(d), and Financial Industry Regulatory Authority (FINRA) Rule 4511(c).
`crunch-after`
Specifies how long Crunch waits to crunch an object after it has been created. After you queue crunching via the CLI, Crunch continuously monitors your source buckets for new objects. However, instead of immediately crunching existing objects and/or new objects when they land, Crunch waits to crunch the objects until the `crunch-after` duration has passed.
Set the `crunch-after` duration to be greater than or equal to your data access window for your specified buckets.
Use cases
- Crunch data without making any changes to your own applications. This use case requires that your applications not read data after a known period of time (say 30 days), i.e. that the data is cold after this timeframe. The benefit is that you do not need to make any changes whatsoever to your applications, so you can start seeing storage savings immediately; however, there are trade-offs. First, your savings are reduced, as you will pay full storage costs for all data inside the `crunch-after` window. Second, it requires close coordination between appdev teams to ensure existing (or more likely new) applications either (a) do not attempt to read the crunched data outside the `crunch-after` window or (b) are Granica-enabled before attempting to access the data. You can also use the Granica CLI plugin to access crunched data.
- Use Crunch with 1st and 3rd party applications that are not Granica-enabled. This use case requires that you wait to crunch your data until those 3rd party applications have stopped accessing it. For example, you could have a SaaS application like Snowflake or Redshift ingest your data (and thus make their own copy), and once the ingestion is complete you can crunch your data and continue to use it with your own Granica-enabled applications or the Granica CLI plugin.
The `crunch-after` delay does not apply to inline writes, since inline writes involve crunching the data immediately.
When `freeze-for` is used in combination with `crunch-after`, the `crunch-after` duration must be longer than the `freeze-for` duration, and the data will be frozen in the original source bucket. However, any incoming writes to Crunch will be processed directly and stored in a crunched format, which will obey the `freeze-for` policy set on the bucket.
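The ordering constraint between the two durations can be checked before a policy is saved. The sketch below is a hypothetical pre-flight check, not part of the Granica CLI: it takes already-parsed `timedelta` values and rejects a `crunch-after` that is not strictly longer than `freeze-for`.

```python
from datetime import timedelta


def validate_durations(freeze_for=None, crunch_after=None):
    """Raise ValueError if crunch-after is not longer than freeze-for."""
    if freeze_for is None or crunch_after is None:
        return  # the constraint only applies when both are set
    if crunch_after <= freeze_for:
        raise ValueError(
            f"crunch-after ({crunch_after}) must be longer than "
            f"freeze-for ({freeze_for})"
        )


# Accepted: 30 days is longer than 7 days.
validate_durations(freeze_for=timedelta(days=7), crunch_after=timedelta(days=30))

# Rejected: crunch-after is shorter than freeze-for.
try:
    validate_durations(freeze_for=timedelta(days=30), crunch_after=timedelta(days=7))
except ValueError as err:
    print(err)
```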
`tier`
If defined, specifies how long Crunch waits after objects have been modified (or previously tiered) before automatically moving them from their current storage class to a lower-cost storage class. `tier` takes a list of objects with `class` and `after` fields. Your crunched objects move from their current storage class into your specified `class` after the time period `after` expires. The `after` duration can be specified in seconds (`s`), minutes (`m`), hours (`h`), or days (`d`). Valid `class` options are:
- Amazon S3: `glacier`, `deep-archive`
- Google GCS: `coldline`, `archive`
Set your desired storage tier using the `class` field, and your desired tiering timeframe using the `after` field.
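One way to read the `tier` list is as a sequence of steps, where each `after` period is counted from the object's last modification or from the previous tiering step. The sketch below follows that reading, which is an interpretation of the "(or previously tiered)" wording rather than confirmed behavior; the function name and the dictionary layout are hypothetical.

```python
from datetime import datetime, timedelta


def next_transition(rules, tiers_applied, last_change):
    """Return (class, due_time) for the next tiering step, or None if done.

    rules:         list of {"class": ..., "after": timedelta} in policy order
    tiers_applied: how many of those steps have already happened
    last_change:   time of the last modification or the previous tiering step
    """
    if tiers_applied >= len(rules):
        return None
    rule = rules[tiers_applied]
    return rule["class"], last_change + rule["after"]


rules = [
    {"class": "glacier", "after": timedelta(days=90)},
    {"class": "deep-archive", "after": timedelta(days=180)},
]

modified = datetime(2024, 1, 1)
storage_class, due = next_transition(rules, 0, modified)
print(storage_class, due.date())  # glacier 2024-03-31

print(next_transition(rules, 2, due))  # None
```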
Use cases
- Maintain your existing lifecycle policies. Easily `tier` crunched data to archival storage classes (e.g. S3 Glacier, GCS Coldline).
- Further reduce storage costs for cold objects. Use `tier` for truly cold data such as old backups.
Lifecycle and tiering policies originated back when most data aged, cooled, and was infrequently accessed. Tiering policies existed in order to reduce the cost of storing that cold data, especially given how fast it continued to grow. But today most AI data is hot data (or at least warm), and maintaining fast access is ever more critical. And while these lower-cost storage classes reduce your storage costs, they add significant retrieval costs in addition to significantly increasing the latency to access your data. In practice this means that your data will likely cost you more to archive. Contact us to request a customized cost and savings analysis. With Granica Crunch you can achieve both your cost savings goals and provide fast access to your cloud object data. In other words, you no longer need to implement traditional tiering policies; however, we provide the `tier` option for any truly cold data (e.g. old backups in regulated industries). Recent (“fresh”) backups can and should be kept in cloud object storage so that you can recover quickly if necessary.
`expire-after`
If defined, specifies how long Crunch keeps objects after they have been modified before automatically deleting them. When `expire-after` triggers Crunch to delete an object, all copies are deleted immediately. This is the case regardless of whether the Recycle Bin is enabled, i.e. the Recycle Bin does not protect objects that are automatically deleted via `expire-after`. Crunch deletes the expired object whether it is currently in your S3/GCS store or in a tiered storage class. Set your desired expiration timeframe using the `expire-after` field.
Use cases
- Maintain your existing object deletion policies for GDPR and compliance. Crunch supports your existing deletion/expiration policies to ensure that when a crunched object expires it, and all copies, are deleted.
- Automatically clean up any temp data used for test/dev. Crunch helps you delete temporary data to reduce your storage costs. You can also use `expire-after` in combination with instant, free copies (Coming Soon) to further lower your costs as well as increase the speed of your development workflows.
- When you `DELETE` an object using the Granica API, Crunch follows the same procedure and deletes all copies of the object unless you have enabled the Recycle Bin, in which case Crunch waits to delete the object until the `expire-after` period has expired.
- Crunch preserves all object metadata, including `LastModified` time, as-is, ensuring your objects will expire on the proper future date.
`uncrunch-expire-after`
`uncrunch-expire-after` specifies how long Crunch keeps objects after they have been uncrunched before automatically deleting them from the source bucket. When `uncrunch-expire-after` triggers Crunch to delete an uncrunched object, the object is deleted immediately from the source bucket. Set your desired expiration timeframe using the `uncrunch-expire-after` field. If not defined, the `uncrunch-expire-after` field defaults to a value of 1 day (`1d`).
`object-include`
If defined, specifies an object regex pattern that defines which objects are crunch-eligible. Crunch applies the regex pattern on the entire key, and only objects that match this pattern will be crunched via `granica crunch <bucket>` and/or `granica crunch universe`.
Use cases
- Control which objects within a bucket are crunch-eligible. Allows you to have a wide range of objects in a single bucket yet have full control over which objects can be crunched.
The pattern `\.*.json$` would make all JSON files and objects with any prefix eligible to be crunched. Similarly, a complex regex pattern like `[20]{4}-[0-9]{2}-[0-9]{2}-[0-9]{2}-[0-9]{2}-[0-9]{2}_[012]{4}-[0-9]{2}-[0-9]{2}-[0-9]{2}-[0-9]{2}-[0-9]{2}` can detect a timestamp range in a key such as `sample/example/timestamp/2020-04-01-00-00-00_2020-04-01-23-59-59_timestamp_range.data`.
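The two patterns above can be tried out locally before they go into a policy. The sketch below uses Python's `re.search`, i.e. it assumes the pattern may match anywhere within the key (the timestamp example only works under that assumption); Granica's regex engine and exact matching semantics may differ. The JSON key used here is a made-up example.

```python
import re

JSON_PATTERN = r"\.*.json$"
TIMESTAMP_RANGE_PATTERN = (
    r"[20]{4}-[0-9]{2}-[0-9]{2}-[0-9]{2}-[0-9]{2}-[0-9]{2}"
    r"_[012]{4}-[0-9]{2}-[0-9]{2}-[0-9]{2}-[0-9]{2}-[0-9]{2}"
)


def key_matches(pattern, key):
    """True if the pattern is found anywhere in the object key."""
    return re.search(pattern, key) is not None


timestamp_key = (
    "sample/example/timestamp/"
    "2020-04-01-00-00-00_2020-04-01-23-59-59_timestamp_range.data"
)

print(key_matches(JSON_PATTERN, "data/events/part-0001.json"))  # True
print(key_matches(JSON_PATTERN, "data/events/part-0001.csv"))   # False
print(key_matches(TIMESTAMP_RANGE_PATTERN, timestamp_key))      # True
print(key_matches(TIMESTAMP_RANGE_PATTERN, "sample/plain.data"))  # False
```

Note that `\.*.json$` as written accepts any key ending in one arbitrary character followed by `json`, so a key such as `exports/notjson` also matches. If only a literal `.json` suffix should qualify, a stricter pattern such as `\.json$` does that.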
`object-exclude`
If defined, specifies an object regex pattern that defines which objects are not crunch-eligible. Crunch applies the regex pattern on the entire key. The `object-exclude` pattern overrides the `object-include` pattern, i.e. objects that match this pattern will NOT be crunched via `granica crunch <bucket>` and/or `granica crunch universe`, regardless of whether they match the `object-include` pattern.
Use cases
- Control which objects within a bucket are crunch-eligible. Allows you to have a wide range of objects in a single bucket yet have full control over which objects can be crunched.
The pattern `\.*.pdf$` would make all PDF files and objects with any prefix ineligible to be crunched.
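The precedence between the two object filters can be sketched as follows. This is an illustration under stated assumptions: patterns are applied with `re.search`, an undefined filter is represented as `None`, and an undefined `object-include` is taken to mean that every key is eligible. The function name and the `^exports/` include pattern are hypothetical.

```python
import re


def object_eligible(key, object_include=None, object_exclude=None):
    """Apply object-exclude first, then object-include."""
    if object_exclude is not None and re.search(object_exclude, key):
        return False  # exclude overrides include
    if object_include is not None:
        return re.search(object_include, key) is not None
    return True  # assumption: no include filter means every key is eligible


include = r"^exports/"   # hypothetical include pattern
exclude = r"\.*.pdf$"    # exclude pattern from the example above

print(object_eligible("exports/2024/report.json", include, exclude))  # True
print(object_eligible("exports/2024/report.pdf", include, exclude))   # False
print(object_eligible("tmp/report.json", include, exclude))           # False
print(object_eligible("tmp/report.json"))                             # True
```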
`cleaner`
Crunched objects are cleaned from the source bucket on a scheduled basis when certain conditions are met. The process responsible for cleaning the crunched objects, called the cleaner, can be disabled. When `cleaner` is disabled, the crunched objects will not be removed from the source bucket.
Use cases
- Efficiently integrate Crunch into your environment. The `cleaner` policy can be used to prevent the deletion of crunched objects from source buckets, thus allowing the re-use of the same test objects without potentially lengthy and costly object movement such as archiving and restoration.
- Initiate storage savings. With `cleaner` enabled, Crunch removes any original full-size objects and their associated costs from the environment.
Screen Policy documentation
5. Screen Filters
`screen`
Nested mapping which specifies screen-specific behaviors. It includes the following fields:
`enable`
Specifies whether to enable Screen. When set to “true”, Screen is enabled according to the policy settings. When set to “false” (default), Screen is disabled.
`classification-types`
An array of classification type specifications, specifying which classification types to enable, as well as the desired likelihood level. Each specification is a mapping with the following fields:
`type`
The name of the classification type. See [link](granica-screen-configuration) for a list of supported classification types.
`likelihood`
The minimum likelihood of a match. Lower likelihood threshold gives better recall, while higher likelihood threshold gives better precision. Accepted values: LOW, MEDIUM (default), HIGH
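The effect of the `likelihood` threshold can be pictured as a simple filter over detected findings: a lower threshold keeps more findings (better recall), a higher one keeps fewer (better precision). The sketch below assumes the ordering LOW < MEDIUM < HIGH and uses a made-up finding structure with placeholder type names; it does not reflect Screen's internal data model.

```python
LIKELIHOOD_ORDER = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


def filter_findings(findings, minimum="MEDIUM"):
    """Keep findings whose likelihood is at or above the minimum."""
    threshold = LIKELIHOOD_ORDER[minimum]
    return [f for f in findings if LIKELIHOOD_ORDER[f["likelihood"]] >= threshold]


# Placeholder classification type names, for illustration only.
findings = [
    {"type": "TYPE_A", "likelihood": "LOW"},
    {"type": "TYPE_B", "likelihood": "MEDIUM"},
    {"type": "TYPE_C", "likelihood": "HIGH"},
]

print(len(filter_findings(findings, "LOW")))   # 3
print(len(filter_findings(findings)))          # 2
print(len(filter_findings(findings, "HIGH")))  # 1
```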
`transformation-params`
A mapping of configurations specifying what transforms, if any, to apply to the data. This mapping includes the following fields:
`transformation-type`
The type of transformation to apply to sensitive data identified. See [link](granica-screen-configuration) for documentation of behaviors
- Accepted values: "NONE", "NAMED", "REDACTED", "SIZE_PRESERVING"
- Default: “NONE”
`redaction-char`
Character to use for redaction, if transformation type is SIZE_PRESERVING. Ignored otherwise.
Default: '#'
`redaction-string`
String to use for redaction, if transformation type is REDACTED. Ignored otherwise.
- Default: REDACTED
`transformation-output-path`
A cloud bucket/prefix to write transformed objects to. Cross-cloud writes are not supported, so the specification should be of the form “
Must be set if transformation-type is not NONE.
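To make the difference between `redaction-char` and `redaction-string` concrete, the sketch below applies both to a detected span of text. It reflects one plausible reading of the field descriptions (SIZE_PRESERVING replaces each character with `redaction-char`, REDACTED replaces the whole span with `redaction-string`) and is not Screen's implementation. The NAMED type is left out because its behavior is not described here, and the function name and span format are hypothetical.

```python
def apply_transformation(text, spans, transformation_type="NONE",
                         redaction_char="#", redaction_string="REDACTED"):
    """Rewrite the (start, end) spans of text that were flagged as sensitive.

    Assumes the spans do not overlap.
    """
    if transformation_type == "NONE":
        return text
    if transformation_type not in ("REDACTED", "SIZE_PRESERVING"):
        raise NotImplementedError(f"not sketched here: {transformation_type}")
    pieces, position = [], 0
    for start, end in sorted(spans):
        pieces.append(text[position:start])
        if transformation_type == "SIZE_PRESERVING":
            pieces.append(redaction_char * (end - start))  # keep the length
        else:
            pieces.append(redaction_string)
        position = end
    pieces.append(text[position:])
    return "".join(pieces)


text = "contact: jane@example.com today"
start = text.index("jane@example.com")
spans = [(start, start + len("jane@example.com"))]

print(apply_transformation(text, spans, "REDACTED"))
# contact: REDACTED today
print(apply_transformation(text, spans, "SIZE_PRESERVING"))
# same length as the input, with the address replaced by '#' characters
```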
`max-objs-per-day`
Specify maximum number of objects per day in a bucket to scan, for sampling purposes.
- Default: no limit.
`obj-sampling-percent`
Specify target percent of objects to be sampled in a bucket for scanning. This percentage of objects will be scanned, up to max-objs-per-day (if set).
- Default: 100%
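The interaction between the two sampling settings can be summarized as "take the percentage first, then apply the daily cap". The sketch below shows that arithmetic for a single bucket and day; rounding down to a whole object count is an assumption, and the function name is hypothetical.

```python
import math


def objects_to_scan(total_objects, obj_sampling_percent=100.0, max_objs_per_day=None):
    """Number of objects scanned in a bucket for one day."""
    target = math.floor(total_objects * obj_sampling_percent / 100)
    if max_objs_per_day is not None:
        target = min(target, max_objs_per_day)  # the cap applies after sampling
    return target


print(objects_to_scan(10_000))                                   # 10000
print(objects_to_scan(10_000, obj_sampling_percent=25))          # 2500
print(objects_to_scan(10_000, obj_sampling_percent=25,
                      max_objs_per_day=1_000))                   # 1000
```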