Deploy Screen in your Kubernetes Cluster (EKS)

You can run Granica Screen as part of your EKS cluster, allowing you to seamlessly integrate Screen into your AWS deployments.

System Requirements

We recommend deploying the pod on a g5.xlarge node running an amazon-eks-gpu-node AMI that matches your cluster's Kubernetes version, with at least 32 GB RAM and 128 GB of disk (see: System Requirements).
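To find the GPU-optimized AMI ID that matches your Kubernetes version, one option is to query the SSM parameter AWS publishes for EKS-optimized AMIs (shown here for Kubernetes 1.29 in us-west-2; adjust both to your setup):

```shell
# Look up the recommended EKS GPU AMI for a given Kubernetes version and region
aws ssm get-parameter \
  --name /aws/service/eks/optimized-ami/1.29/amazon-linux-2-gpu/recommended/image_id \
  --region us-west-2 \
  --query "Parameter.Value" \
  --output text
```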

Creating a Cluster

If you don't already have a Kubernetes cluster on EKS, you can use the following steps to bring up a cluster using eksctl. If you already have a cluster, you can skip to the next section.

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: screen-cluster
  region: us-west-2
nodeGroups:
  - name: screen-ng-1
    ami: ami-043de4ad25ed718c1 # amazon-eks-gpu-node-1.29-v20240117, replace this as needed
    amiFamily: AmazonLinux2
    instanceType: g5.xlarge
    minSize: 1
    maxSize: 1
    desiredCapacity: 1
    volumeSize: 128
    iam:
      instanceRoleARN: # insert your granica-screen-docker-role here
    overrideBootstrapCommand: |
      #!/bin/bash
      source /var/lib/cloud/scripts/eksctl/bootstrap.helper.sh
      /etc/eks/bootstrap.sh ${CLUSTER_NAME} --kubelet-extra-args "--node-labels=${NODE_LABELS}"
```
- You can then deploy the cluster with `eksctl create cluster -f screen_cluster.yaml`.
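Once the cluster is up, it's worth confirming that the node registered and that the NVIDIA device plugin on the GPU AMI is advertising the `nvidia.com/gpu` resource. A quick check:

```shell
# Confirm the node is Ready and advertises a GPU resource
kubectl get nodes
kubectl describe nodes | grep -A 2 "nvidia.com/gpu"
```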

Deploy Screen in Your Cluster

- Run the following command on the EC2 instance to log in to Granica's ECR repository:

```shell
aws ecr get-login-password --region us-east-2 | docker login --username AWS --password-stdin 809541265033.dkr.ecr.us-east-2.amazonaws.com
```

- Then, use the resulting Docker config to create a Kubernetes Secret that serves as credentials to pull the image:

```shell
kubectl create secret generic regcred \
  --from-file=.dockerconfigjson=<path/to/.docker/config.json> \
  --type=kubernetes.io/dockerconfigjson
```
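Alternatively (a sketch, not the documented flow), you can create the pull secret directly from a fresh ECR token with `kubectl create secret docker-registry`, skipping the intermediate Docker config file:

```shell
# Create the image pull secret straight from an ECR authorization token
kubectl create secret docker-registry regcred \
  --docker-server=809541265033.dkr.ecr.us-east-2.amazonaws.com \
  --docker-username=AWS \
  --docker-password="$(aws ecr get-login-password --region us-east-2)"
```

Note that ECR tokens expire after 12 hours, so a secret created this way must be refreshed before the next image pull that needs it.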
- We recommend running Granica Screen as a Deployment:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: screen
spec:
  replicas: 1
  selector:
    matchLabels:
      app: screen
  template:
    metadata:
      labels:
        app: screen
    spec:
      containers:
        - name: screen
          image: 809541265033.dkr.ecr.us-east-2.amazonaws.com/screen:latest
          imagePullPolicy: Always
          resources:
            limits:
              nvidia.com/gpu: "1"
          env:
            - name: NVIDIA_DRIVER_CAPABILITIES
              value: compute,utility
          ports:
            - name: screenapi
              containerPort: 8080
              protocol: TCP
            - name: metrics
              containerPort: 9092
              protocol: TCP
      imagePullSecrets:
        - name: regcred
```
Tip: Since each pod requires one GPU, you may run into GPU resource constraints when rolling out Screen deployments. If this happens, consider adjusting the Deployment's rollout strategy accordingly.

- Copy this into a file titled screen.yaml and create the deployment:

```shell
kubectl apply -f screen.yaml
```

- You probably want to expose the Screen API port as a Service:

```shell
kubectl expose deployment screen --type=NodePort --port=8080 --target-port=8080
```
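To see where the Service landed, or to poke the API from your own machine without going through the NodePort, two quick checks:

```shell
# Show the NodePort that Kubernetes assigned to the Screen API
kubectl get svc screen -o jsonpath='{.spec.ports[0].nodePort}'

# Or forward the API port locally for a quick smoke test
kubectl port-forward deploy/screen 8080:8080
```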

That's it! You should be ready to make requests to the service on the /screen endpoint. Refer here to see how you can use the endpoint. To find out about logging, health checks, and versioning, refer here.

Updating

To update the running Screen image, you can just update the Deployment spec and roll out your changes. For example, to change the image to v1.29.1-gpu:

```shell
kubectl set image deploy/screen screen=809541265033.dkr.ecr.us-east-2.amazonaws.com/screen:v1.29.1-gpu
kubectl rollout restart deploy/screen
```

- Recall from the tip above that your rollout strategy may need adjusting if you don't have the headroom for the default RollingUpdate strategy, which in this case requires at least one spare GPU node. You can switch the strategy to Recreate before running the commands above to get around this, at the cost of some downtime while the new pod comes back up:

```shell
kubectl patch deployment screen -p '{"spec":{"strategy":{"type":"Recreate", "rollingUpdate": null}}}'
```
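Either way, you can watch the rollout complete and confirm the new image tag is live:

```shell
# Wait for the rollout to finish, then confirm the running image
kubectl rollout status deploy/screen
kubectl get deploy screen -o jsonpath='{.spec.template.spec.containers[0].image}'
```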

System Performance

Performance was measured using the recommended system specs with a client running inside the Kubernetes cluster.

- With a single client sending requests of 100 tokens, average latency was 69 ms, with P50 at 70 ms and P90 at 74 ms.
- In steady state, one instance of the Screen container can handle up to three concurrent clients sending sustained 100-token requests with a P90 of 90 ms.
- With over 500 concurrent clients sending 1,000-word inputs, one instance can reach a throughput of 20,175 words/s.
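As a back-of-envelope figure derived from the numbers above (our arithmetic, not an additional measurement), the 1,000-word throughput works out to roughly 20 requests per second per instance:

```python
# Derive requests/s from the measured throughput and input size above
words_per_second = 20175    # one instance, >500 concurrent clients
words_per_request = 1000    # input size used in the benchmark

requests_per_second = words_per_second / words_per_request
print(f"~{requests_per_second:.1f} requests/s per instance")
```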

System Metrics

We expose Prometheus metrics on the metrics port (:9092) at /prom-metrics, documented here. We recommend using Prometheus Operator to scrape these metrics.

With Helm

- Copy the following snippet to kube-prometheus-stack-values.yaml:

```yaml
grafana:
  enabled: false
alertmanager:
  enabled: false
prometheus:
  prometheusSpec:
    scrapeInterval: 5s
    podMonitorSelector:
      matchLabels:
        prometheus: screen
  additionalPodMonitors:
    - name: screen
      additionalLabels:
        prometheus: screen
      namespaceSelector:
        matchNames:
          - default
      selector:
        matchLabels:
          app: screen
      podMetricsEndpoints:
        - path: /prom-metrics
          port: metrics
```
- Next, install the kube-prometheus-stack Helm chart with these values:

```shell
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install -f kube-prometheus-stack-values.yaml kube-prometheus-stack-release prometheus-community/kube-prometheus-stack
```
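To confirm that Prometheus is actually scraping the Screen pod, you can port-forward the Prometheus UI and check its targets page. The `prometheus-operated` headless Service below is created by Prometheus Operator for every Prometheus instance; adjust the name if your setup differs:

```shell
# Forward the Prometheus UI locally, then open http://localhost:9090/targets
kubectl port-forward svc/prometheus-operated 9090:9090
```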

With kubectl

  • Follow the tutorial to set up Prometheus Operator if you don't have it set up already.

- To create a PodMonitor for Screen metrics, apply the following YAML configurations to your Kubernetes cluster:

  - This sets up a PodMonitor for the Screen metrics:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: screen
  labels:
    app: screen
spec:
  selector:
    matchLabels:
      app: screen
  podMetricsEndpoints:
    - port: metrics
      path: /prom-metrics
```

Note: Even though specifying ports on the Deployment spec is usually optional in Kubernetes, it's necessary here so that the PodMonitor can match the port by name.

  - This sets up Prometheus Operator to use the PodMonitor:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus
  podMonitorSelector:
    matchLabels:
      app: screen
  resources:
    requests:
      memory: 400Mi
  scrapeInterval: 5s
  enableAdminAPI: false
```
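Assuming you saved the two manifests above as screen-podmonitor.yaml and screen-prometheus.yaml (our filenames, pick your own), apply them with:

```shell
kubectl apply -f screen-podmonitor.yaml
kubectl apply -f screen-prometheus.yaml
```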

Scaling Out

- If your application needs better throughput or latency than a single replica can provide, you can increase the number of replicas in your Screen Deployment.
- If your application has variable traffic and you want to provision replicas dynamically, you can configure Kubernetes to autoscale Screen using Prometheus. For that you may find it useful to configure Prometheus Adapter and autoscale with the Kubernetes HPA.

Autoscaling Tuning Recommendations

- A good starting point is to autoscale with a target of averageValue=3 for `n_screen_api_num_calls_outstanding`.
- If the data size of each request is roughly constant, you can also try a target of averageValue=1k for `n_screen_api_num_bytes_outstanding`.
- You can take the `avg_over_time` (say, over 1m) of these metrics if you don't want your autoscaling to be too sensitive.
- Note that you will need as many GPU-attached nodes in your NodeGroup as the maxReplicas you want to scale to.
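Putting the first recommendation into practice, an HPA might look like the sketch below. It assumes Prometheus Adapter has been configured to expose `n_screen_api_num_calls_outstanding` as a per-pod metric under the same name, and `maxReplicas: 4` is an arbitrary example; both are our assumptions, not tested configuration.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: screen
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: screen
  minReplicas: 1
  maxReplicas: 4   # must not exceed the number of GPU-attached nodes available
  metrics:
    - type: Pods
      pods:
        metric:
          name: n_screen_api_num_calls_outstanding
        target:
          type: AverageValue
          averageValue: "3"
```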