Disaster recovery

Learn the standard procedures for recovering from a disaster.

Granica is deployed as a Kubernetes cluster inside your AWS environment. The cluster itself is stateless with the following exceptions:

  • The Terraform state that reflects the infrastructure is stored in an S3 bucket
  • The ingested data is stored in S3 buckets
  • The cluster's database uses an EBS volume (backed up to S3 every 10 minutes)

Record your cluster configuration

Granica cluster details such as the EBS volumes, the SQS ARN, and the S3 bucket used for DB backup are listed in the output of granica deploy and granica ls. As a best practice, record this information and keep it in a safe place before disaster strikes. You can also contact the Granica team for this information, as we persist it as part of our telemetry.

Granica Admin Server details:

$ granica ls
--- AWS Deployments ---
[default]
{
  "remote_state_bucket": "n-tfst-t",
  "remote_state_bucket_path": "17f396",
  "remote_state_bucket_region": "us-east-1",
  ...
}

Granica cluster details:

$ granica deploy
Disaster recovery config: retain this in a safe location
{
  "volumes": {
    "db-persistent-storage-db-0": "aws://us-east-1d/vol-060aeebd1f7797a0b",
    "prometheus-pvc": "aws://us-east-1d/vol-0ad9ac3cb5d2175ad",
    "register-pvc": "aws://us-east-1d/vol-0ba6ff3fb74ab9e4c"
  },
  "buckets": {
    "meta_bucket": "n-meta-us-east-1-c2ed",
    "remote_state_bucket": "n-tfst-t"
  },
  "iam": "project-n-admin-b89c311b",
  "aws_sqs_arn": "arn:aws:sqs:us-east-1:<account>:project-n-<cluster-id>-sqs"
}
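One low-effort way to follow this advice is to save the JSON block to a file and sanity-check it before filing it away. A sketch, assuming jq is available; the dr-config.json filename is illustrative and the values are the examples shown above:

```shell
# Save the disaster recovery JSON printed by `granica deploy`
# (pasted here via a heredoc for illustration).
cat > dr-config.json <<'EOF'
{
  "volumes": {
    "db-persistent-storage-db-0": "aws://us-east-1d/vol-060aeebd1f7797a0b",
    "prometheus-pvc": "aws://us-east-1d/vol-0ad9ac3cb5d2175ad",
    "register-pvc": "aws://us-east-1d/vol-0ba6ff3fb74ab9e4c"
  },
  "buckets": {
    "meta_bucket": "n-meta-us-east-1-c2ed",
    "remote_state_bucket": "n-tfst-t"
  },
  "iam": "project-n-admin-b89c311b",
  "aws_sqs_arn": "arn:aws:sqs:us-east-1:<account>:project-n-<cluster-id>-sqs"
}
EOF

# Quick sanity check: every field needed for recovery should be present and non-empty.
jq -e '.volumes, .buckets.meta_bucket, .aws_sqs_arn' dr-config.json > /dev/null \
  && echo "DR config looks complete"
```

Store the file somewhere that survives the loss of the cluster and the Admin Server, such as a separate S3 bucket or your internal runbook repository.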

Recover from accidentally deleted Admin Server

If you accidentally delete your Granica Admin Server, you can create a new one using your previously recorded Terraform state:

  1. Create a new t2.micro instance in the same VPC and with the same IAM role as the original Admin Server
  2. Install the Granica CLI from RPM — contact Granica Support for the correct release
  3. Use aws configure to point to the correct region
  4. Run granica deploy using the bucket containing the Terraform state:
granica deploy --remote-state s3://n-tfst-mpar-a5f684/b6747c --custom-domain="xxxx.bolt.granica.ai"

If the initial deployment used projectn.tfvars to specify VPC, subnets, etc., pass it here as well:
granica deploy --remote-state s3://n-tfst-mpar-a5f684/b6747c --custom-domain="xxxx.bolt.granica.ai" --var-file=projectn.tfvars

This deploy command does not recreate any resources; it only restores the local state on the new Admin Server.

Recover from accidentally deleted cluster

The recovery procedure depends on whether the EBS volumes are still present.

Production database changes are always written to the EBS volume before they are finalized, and updates are also streamed to an S3 bucket every 10 minutes. If both the S3 bucket and the EBS volume are available, the database pod can start fresh and recover to its original state.

When EBS volumes are still present

  1. Create a projectn.tfvars file with the volume IDs and the database cloud bucket name (meta_bucket). Include the SQS ARN if it still exists:
meta_bucket = "n-meta-us-east-1-c2ed"
db_volume_id = "aws://us-east-1d/vol-060aeebd1f7797a0b"
prometheus_volume_id = "aws://us-east-1d/vol-0ad9ac3cb5d2175ad"
register_volume_id = "aws://us-east-1d/vol-0ba6ff3fb74ab9e4c"
aws_sqs_arn = "arn:aws:sqs:us-east-1:<account>:project-n-<cluster-id>-sqs"
  2. Create a new cluster and Admin Server using the setup.sh script with your projectn.tfvars file. Contact Granica Support for the package URL:
./setup.sh --var-file /Users/username/customer_deployment_arena/projectn.tfvars --package <url_granica_rpm>
  3. Configure the new cluster with your custom domain: granica deploy --custom-domain="xxx.bolt.granica.ai"
  4. Add the new DNS entries from the deployment output to your DNS provider (e.g. AWS Route53)
  5. Wait for the Kubernetes pods to come up: watch kubectl get pods
  6. Alter all existing source bucket policies (the DR cluster creates a different data cruncher role)
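If you recorded the disaster recovery JSON shown earlier, the projectn.tfvars file in step 1 can be generated from it rather than typed by hand. A sketch, assuming jq is available and the JSON uses the volume names from the example output; dr-config.json is a hypothetical local copy of that recorded JSON:

```shell
# dr-config.json holds the disaster recovery JSON recorded from `granica deploy`.
cat > dr-config.json <<'EOF'
{
  "volumes": {
    "db-persistent-storage-db-0": "aws://us-east-1d/vol-060aeebd1f7797a0b",
    "prometheus-pvc": "aws://us-east-1d/vol-0ad9ac3cb5d2175ad",
    "register-pvc": "aws://us-east-1d/vol-0ba6ff3fb74ab9e4c"
  },
  "buckets": { "meta_bucket": "n-meta-us-east-1-c2ed" },
  "aws_sqs_arn": "arn:aws:sqs:us-east-1:<account>:project-n-<cluster-id>-sqs"
}
EOF

# Map each recorded field onto the tfvars variable it corresponds to.
jq -r '
  "meta_bucket = \"\(.buckets.meta_bucket)\"",
  "db_volume_id = \"\(.volumes["db-persistent-storage-db-0"])\"",
  "prometheus_volume_id = \"\(.volumes["prometheus-pvc"])\"",
  "register_volume_id = \"\(.volumes["register-pvc"])\"",
  "aws_sqs_arn = \"\(.aws_sqs_arn)\""
' dr-config.json > projectn.tfvars

cat projectn.tfvars
```

Generating the file this way avoids typos in volume IDs, which would otherwise only surface when the deployment fails to attach the volumes.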

When EBS volumes have been accidentally deleted

A newly deployed cluster must have been running for at least 15 minutes before the n-meta- S3 bucket contains consistent data. If the cluster and EBS volumes are deleted before that 15-minute mark, the bucket data is unusable; contact Granica for assistance in this case.

  1. Create a projectn.tfvars file with the database cloud bucket name and SQS ARN:
meta_bucket = "n-meta-us-east-1-c2ed"
aws_sqs_arn = "arn:aws:sqs:us-east-1:<account>:project-n-<cluster-id>-sqs"
  2. Create a new Admin Server with --manual-confirm. Contact Granica Support for the package URL:
./setup.sh --package <url_granica_rpm> --manual-confirm
  3. When prompted, enter N so that the granica deploy command does not run automatically
  4. Deploy the new cluster in recovery mode:
granica deploy --recovery-mode --var-file projectn.tfvars
  5. Re-hydrate the register database. Contact Granica Support for the S3 bucket URI:
aws s3api get-object --bucket <logs-bucket> --key <uri-for-register.sql> register.sql
kubectl cp register.sql default/register-0:/register.sql
kubectl exec -ti register-0 -- psql -Un -h0.0.0.0 -dn < register.sql
  6. Take the cluster out of recovery mode: granica deploy --var-file projectn.tfvars

The new database is automatically rehydrated from the n-meta bucket. DO NOT PROCEED FURTHER until this recovery is complete.

  7. Configure your custom domain: granica deploy --custom-domain="xxx.bolt.granica.ai"
  8. Add DNS entries from the deployment output to your DNS provider
  9. Wait for pods: watch kubectl get pods
  10. Alter source bucket policies for the new data cruncher role
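The bucket-policy step amounts to replacing the old data cruncher role ARN with the one created by the DR deployment, in every source bucket policy. A sketch of that edit with jq; the role ARNs and bucket name below are illustrative, and the policy is fetched and pushed with the standard aws s3api get-bucket-policy and put-bucket-policy commands:

```shell
# Illustrative role ARNs; substitute the real old and new data cruncher roles.
OLD_ROLE="arn:aws:iam::111122223333:role/project-n-crunch-old"
NEW_ROLE="arn:aws:iam::111122223333:role/project-n-crunch-new"

# policy.json stands in for the output of:
#   aws s3api get-bucket-policy --bucket <source-bucket> --query Policy --output text
cat > policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "AWS": "$OLD_ROLE" },
    "Action": ["s3:GetObject", "s3:ListBucket"],
    "Resource": ["arn:aws:s3:::my-source-bucket", "arn:aws:s3:::my-source-bucket/*"]
  }]
}
EOF

# Swap every occurrence of the old role ARN for the new one, wherever it appears.
jq --arg old "$OLD_ROLE" --arg new "$NEW_ROLE" \
   'walk(if . == $old then $new else . end)' policy.json > policy-new.json

# Apply the updated policy with:
#   aws s3api put-bucket-policy --bucket <source-bucket> --policy file://policy-new.json
```

Repeat for each source bucket; using walk rather than a fixed JSON path means the substitution works regardless of where the role ARN appears in the policy.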

Validate the cluster

  1. Create a temporary, small EC2 instance in the same VPC as the new cluster (public subnet, project-n-admin role)
  2. Install the Granica CLI plugin
  3. Validate reads:
.local/bin/aws s3api get-object --bucket <bucket-name> --key <object-key> <object-name>