Disaster recovery

Learn the standard procedures for recovering from a disaster.

Granica is deployed as a Kubernetes cluster inside your AWS environment. The cluster itself is stateless with the following exceptions:

  • The Terraform state that reflects the infrastructure is stored in an S3 bucket
  • The ingested data is stored in S3 buckets
  • The cluster's database uses an EBS volume (backed up to S3 every 10 minutes)

Record your cluster configuration

Granica cluster details such as the EBS volumes, the SQS ARN, and the S3 bucket used for DB backup are listed in the output of granica deploy and granica ls. As a best practice, record this information and keep it in a safe place before disaster strikes. You can also contact the Granica team for this information, as we persist it as part of our telemetry.

Granica Admin Server details:

$ granica ls
--- AWS Deployments ---
[default]
{
  "remote_state_bucket": "n-tfst-t",
  "remote_state_bucket_path": "17f396",
  "remote_state_bucket_region": "us-east-1",
  ...
}

Granica cluster details:

$ granica deploy
Disaster recovery config: retain this in a safe location
{
  "volumes": {
    "db-persistent-storage-db-0": "aws://us-east-1d/vol-060aeebd1f7797a0b",
    "prometheus-pvc": "aws://us-east-1d/vol-0ad9ac3cb5d2175ad",
    "register-pvc": "aws://us-east-1d/vol-0ba6ff3fb74ab9e4c"
  },
  "buckets": {
    "meta_bucket": "n-meta-us-east-1-c2ed",
    "remote_state_bucket": "n-tfst-t"
  },
  "iam": "project-n-admin-b89c311b",
  "aws_sqs_arn": "arn:aws:sqs:us-east-1:<account>:project-n-<cluster-id>-sqs"
}
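One low-effort way to follow this advice is to save the JSON block to a file and sanity-check it before filing it away. A sketch, assuming jq is available; the dr-config.json filename is illustrative and the values are the examples shown above:

```shell
# Save the disaster recovery JSON printed by `granica deploy`
# (pasted here via a heredoc for illustration).
cat > dr-config.json <<'EOF'
{
  "volumes": {
    "db-persistent-storage-db-0": "aws://us-east-1d/vol-060aeebd1f7797a0b",
    "prometheus-pvc": "aws://us-east-1d/vol-0ad9ac3cb5d2175ad",
    "register-pvc": "aws://us-east-1d/vol-0ba6ff3fb74ab9e4c"
  },
  "buckets": {
    "meta_bucket": "n-meta-us-east-1-c2ed",
    "remote_state_bucket": "n-tfst-t"
  },
  "iam": "project-n-admin-b89c311b",
  "aws_sqs_arn": "arn:aws:sqs:us-east-1:<account>:project-n-<cluster-id>-sqs"
}
EOF

# Quick sanity check: every field needed for recovery should be present and non-empty.
jq -e '.volumes, .buckets.meta_bucket, .aws_sqs_arn' dr-config.json > /dev/null \
  && echo "DR config looks complete"
```

Store the file somewhere that survives the loss of the cluster and the Admin Server, such as a separate S3 bucket or your internal runbook repository.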

Recover from accidentally deleted Admin Server

If you accidentally delete your Granica Admin Server, you can create a new one using your previously recorded Terraform state:

  1. Create a new t2.micro instance in the same VPC and with the same IAM role as the original Admin Server
  2. Install the Granica CLI from RPM — contact Granica Support for the correct release
  3. Use aws configure to point to the correct region
  4. Run granica deploy using the bucket containing the Terraform state:
granica deploy --remote-state s3://n-tfst-mpar-a5f684/b6747c --custom-domain="xxxx.bolt.granica.ai"

If the initial deployment used projectn.tfvars to specify VPC, subnets, etc., pass it here as well:
granica deploy --remote-state s3://n-tfst-mpar-a5f684/b6747c --custom-domain="xxxx.bolt.granica.ai" --var-file=projectn.tfvars

This deploy command does not recreate any resources; it only restores the local state on the new Admin Server.

Recover from accidentally deleted cluster

The recovery procedure depends on whether the EBS volumes are still present.

Production database changes are always written to the EBS volume before they are finalized, and updates are also streamed to an S3 bucket every 10 minutes. If both the S3 bucket and the EBS volume are available, the database pod can start fresh and recover to its original state.

When EBS volumes are still present

  1. Create a projectn.tfvars file with the volume IDs and the database cloud bucket name (meta_bucket). Include the SQS ARN if it still exists:
meta_bucket = "n-meta-us-east-1-c2ed"
db_volume_id = "aws://us-east-1d/vol-060aeebd1f7797a0b"
prometheus_volume_id = "aws://us-east-1d/vol-0ad9ac3cb5d2175ad"
register_volume_id = "aws://us-east-1d/vol-0ba6ff3fb74ab9e4c"
aws_sqs_arn = "arn:aws:sqs:us-east-1:<account>:project-n-<cluster-id>-sqs"
  2. Create a new cluster and Admin Server using the setup.sh script with your projectn.tfvars file. Contact Granica Support for the package URL:
./setup.sh --var-file /Users/username/customer_deployment_arena/projectn.tfvars --package <url_granica_rpm>
  3. Configure the new cluster with your custom domain: granica deploy --custom-domain="xxx.bolt.granica.ai"
  4. Add the new DNS entries from the deployment output to your DNS provider (e.g. AWS Route53)
  5. Wait for the Kubernetes pods to come up: watch kubectl get pods
  6. Alter all existing source bucket policies (the DR cluster creates a different data cruncher role)
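If you recorded the disaster recovery JSON shown earlier, the projectn.tfvars file in step 1 can be generated from it rather than typed by hand. A sketch, assuming jq is available and the JSON uses the volume names from the example output; dr-config.json is a hypothetical local copy of that recorded JSON:

```shell
# dr-config.json holds the disaster recovery JSON recorded from `granica deploy`.
cat > dr-config.json <<'EOF'
{
  "volumes": {
    "db-persistent-storage-db-0": "aws://us-east-1d/vol-060aeebd1f7797a0b",
    "prometheus-pvc": "aws://us-east-1d/vol-0ad9ac3cb5d2175ad",
    "register-pvc": "aws://us-east-1d/vol-0ba6ff3fb74ab9e4c"
  },
  "buckets": { "meta_bucket": "n-meta-us-east-1-c2ed" },
  "aws_sqs_arn": "arn:aws:sqs:us-east-1:<account>:project-n-<cluster-id>-sqs"
}
EOF

# Map each recorded field onto the tfvars variable it corresponds to.
jq -r '
  "meta_bucket = \"\(.buckets.meta_bucket)\"",
  "db_volume_id = \"\(.volumes["db-persistent-storage-db-0"])\"",
  "prometheus_volume_id = \"\(.volumes["prometheus-pvc"])\"",
  "register_volume_id = \"\(.volumes["register-pvc"])\"",
  "aws_sqs_arn = \"\(.aws_sqs_arn)\""
' dr-config.json > projectn.tfvars

cat projectn.tfvars
```

Generating the file this way avoids typos in volume IDs, which would otherwise only surface when the deployment fails to attach the volumes.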

When EBS volumes have been accidentally deleted

A newly deployed cluster must have been running for at least 15 minutes before the n-meta- S3 bucket contains consistent data. If the cluster and EBS volumes are deleted before that 15-minute mark, the bucket data is unusable; contact Granica for assistance in this case.

  1. Create a projectn.tfvars file with the database cloud bucket name and SQS ARN:
meta_bucket = "n-meta-us-east-1-c2ed"
aws_sqs_arn = "arn:aws:sqs:us-east-1:<account>:project-n-<cluster-id>-sqs"
  2. Create a new Admin Server with --manual-confirm. Contact Granica Support for the package URL:
./setup.sh --package <url_granica_rpm> --manual-confirm
  3. When prompted, enter N so that the granica deploy command does not run automatically
  4. Deploy the new cluster in recovery mode:
granica deploy --recovery-mode --var-file projectn.tfvars
  5. Re-hydrate the register database. Contact Granica Support for the S3 bucket URI:
aws s3api get-object --bucket <logs-bucket> --key <uri-for-register.sql> register.sql
kubectl cp register.sql default/register-0:/register.sql
kubectl exec -ti register-0 -- psql -Un -h0.0.0.0 -dn < register.sql
  6. Take the cluster out of recovery mode: granica deploy --var-file projectn.tfvars

The new database is automatically rehydrated from the n-meta bucket. DO NOT PROCEED FURTHER until this recovery is complete.

  7. Configure your custom domain: granica deploy --custom-domain="xxx.bolt.granica.ai"
  8. Add DNS entries from the deployment output to your DNS provider
  9. Wait for pods: watch kubectl get pods
  10. Alter source bucket policies for the new data cruncher role
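The bucket-policy step amounts to replacing the old data cruncher role ARN with the one created by the DR deployment, in every source bucket policy. A sketch of that edit with jq; the role ARNs and bucket name below are illustrative, and the policy is fetched and pushed with the standard aws s3api get-bucket-policy and put-bucket-policy commands:

```shell
# Illustrative role ARNs; substitute the real old and new data cruncher roles.
OLD_ROLE="arn:aws:iam::111122223333:role/project-n-crunch-old"
NEW_ROLE="arn:aws:iam::111122223333:role/project-n-crunch-new"

# policy.json stands in for the output of:
#   aws s3api get-bucket-policy --bucket <source-bucket> --query Policy --output text
cat > policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "AWS": "$OLD_ROLE" },
    "Action": ["s3:GetObject", "s3:ListBucket"],
    "Resource": ["arn:aws:s3:::my-source-bucket", "arn:aws:s3:::my-source-bucket/*"]
  }]
}
EOF

# Swap every occurrence of the old role ARN for the new one, wherever it appears.
jq --arg old "$OLD_ROLE" --arg new "$NEW_ROLE" \
   'walk(if . == $old then $new else . end)' policy.json > policy-new.json

# Apply the updated policy with:
#   aws s3api put-bucket-policy --bucket <source-bucket> --policy file://policy-new.json
```

Repeat for each source bucket; using walk rather than a fixed JSON path means the substitution works regardless of where the role ARN appears in the policy.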

Validate the cluster

  1. Create a temporary, small EC2 instance in the same VPC as the new cluster (public subnet, project-n-admin role)
  2. Install the Granica CLI plugin
  3. Validate reads:
.local/bin/aws s3api get-object --bucket <bucket-name> --key <object-key> <object-name>