Fast API Deployment in K8s

June 28, 2023

Minikube is a tool that enables local Kubernetes development and testing by running a single-node Kubernetes cluster on a personal machine.

This post uses Minikube as the Kubernetes cluster. I have a trained ONNX model that takes an array of 5 values as input and outputs the maximum value. I’ll build a FastAPI service around this model, dockerize it, push the image to Docker Hub or a local registry, and lastly deploy it to Minikube.

Fast API

The provided FastAPI code implements an endpoint for performing inference using an ONNX model. The /inference endpoint accepts a list of numbers as input and calculates the maximum value from the array using the loaded ONNX model. The maximum value is then returned as the response. Additionally, there is a /health_check endpoint that can be used to verify the availability of the application.

# Imports
import numpy as np
import onnxruntime as ort
from pydantic import BaseModel
from config import config as cfg
from fastapi import FastAPI


# Define the FastAPI app
app = FastAPI(title="Hello World Fast",
              description="Hello World Fast API", version="1.0")

# Define the input request model
class InferenceRequest(BaseModel):
    numbers: list

# Load the ONNX model for inference
ort_session = ort.InferenceSession(cfg.onnx_file_path)

# Define the inference endpoint
@app.post("/inference")
def inference(request: InferenceRequest):
    numbers = request.numbers

    # Convert the input to a numpy array
    input_array = np.array(numbers).reshape(1, -1).astype(np.float32)

    # Run the ONNX model
    ort_inputs = {ort_session.get_inputs()[0].name: input_array}
    ort_output = ort_session.run(None, ort_inputs)

    # Get the maximum value and index from the predicted output
    max_value_index = np.argmax(ort_output[0])
    max_value = numbers[max_value_index]

    # Return the max value as the response
    return {"max_value": max_value}


@app.get("/health_check")
async def healthcheck():
    return {"status": "alive"}
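
The endpoint above accepts a list of any length, while the model described in this post expects exactly 5 values. A minimal sketch of a guard that could run before the reshape is shown below. The names validate_numbers and EXPECTED_LEN are hypothetical and not part of the original code, and the length of 5 is an assumption taken from the model description.

```python
from typing import List, Sequence

# Assumption: the model takes exactly 5 values, as described in the introduction.
EXPECTED_LEN = 5


def validate_numbers(numbers: Sequence, expected_len: int = EXPECTED_LEN) -> List[float]:
    """Return the input as a list of floats, or raise ValueError if it is unusable."""
    if len(numbers) != expected_len:
        raise ValueError(f"expected {expected_len} numbers, got {len(numbers)}")
    cleaned = []
    for value in numbers:
        # bool is a subclass of int, so it is rejected explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"non-numeric value: {value!r}")
        cleaned.append(float(value))
    return cleaned
```

Inside the endpoint, a ValueError raised here could be converted into an error response, for example with FastAPI's HTTPException, so that malformed input never reaches the ONNX session.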

You can start the FastAPI app locally by running
eval $(cat ../config/staging.env) bash main.sh

Dockerizing

This Dockerfile uses Python 3.9 as the base image. It sets the working directory to /app and installs the Python dependencies specified in requirements.txt. The application code is then copied to the working directory. Port 8980 is declared with EXPOSE (this documents the port; it is published at run time with -p), and bash main.sh is set as the container’s default command via CMD.

# Use Python 3.9 as the base image
FROM python:3.9

# Set the working directory inside the container
WORKDIR /app

# By copying over requirements first, we make sure that Docker will cache
# our installed requirements rather than reinstall them on every build
COPY requirements.txt /app/requirements.txt
RUN pip install -r requirements.txt

# Copy the application code to the working directory
COPY . .

EXPOSE 8980

CMD ["bash","main.sh"]

  • Build the image with
    • docker build -t fast_api_image .
  • Run the image with
    • docker run -it -p 8080:8980 --env-file config/staging.env --name fast_api_container fast_api_image
  • Test the endpoint with either of the following curl commands
    • curl -X POST -H "Content-Type: application/json" -d '{"numbers": [1.0, 2.0, 33.0, 4.0, 5.0]}' http://localhost:8080/inference
    • curl -X POST -H "Content-Type: application/json" -d @input.json http://localhost:8080/inference (here input.json is a JSON file containing the input data)
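
The same request can also be issued from Python. The sketch below uses only the standard library. The helper names build_inference_request and call_inference are hypothetical, and the default base URL assumes the container is published on local port 8080 as in the docker run command above.

```python
import json
import urllib.request


def build_inference_request(numbers, base_url="http://localhost:8080"):
    """Build (but do not send) the POST request that the curl command issues."""
    body = json.dumps({"numbers": list(numbers)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/inference",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_inference(numbers, base_url="http://localhost:8080", timeout=5.0):
    """Send the request and return the decoded JSON response."""
    request = build_inference_request(numbers, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, call_inference([1.0, 2.0, 33.0, 4.0, 5.0]) returns the JSON body produced by the /inference endpoint. Keeping request construction separate from sending makes the payload easy to inspect without a live server.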

Pushing to Registry

First, set up a local registry:

  • docker run -d -p 5000:5000 --restart always --name registry registry:2

Then tag the image with the registry and then push it

  • docker tag fast_api_image localhost:5000/fast_api_image
  • docker push localhost:5000/fast_api_image

Alternatively, you can push the image to Docker Hub:

docker tag local_image:tagname username/repository:tagname
docker push username/repository:tagname

Deployment

With the image pushed to the registry, most of the work is done. What remains is to set up Minikube, create a ConfigMap, and apply the Deployment, HPA, and Service manifests to the cluster.

Setup Minikube

NOTE: At the time of writing, the latest version of Kubernetes had bugs related to pulling images from Docker Hub, which caused difficulties with Minikube. To work around this, I used an earlier version by starting Minikube with --kubernetes-version=v1.22.9. Even in this version, a bug prevented the pods from reading the config map, so as a temporary solution I hard-coded the configuration variables instead.

Create Config Map

A ConfigMap is a resource used to store configuration data that can be consumed by containers running in pods. The following kubectl command creates a ConfigMap from a file:
kubectl create configmap <config_name> --from-file=<file_path>

In this post, the config map holds the following key-value pairs:

UVICORN_WORKERS=1
HOST_IP=0.0.0.0
HOST_PORT=8980
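
How main.sh consumes these values is not shown in this post. As a rough sketch, assuming the file contains plain KEY=VALUE lines, the settings could be parsed in Python as follows. The helper parse_env_text is hypothetical and not part of the original project.

```python
def parse_env_text(text):
    """Parse KEY=VALUE lines into a dict, skipping blank lines and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=VALUE line: {raw_line!r}")
        settings[key.strip()] = value.strip()
    return settings


# The three settings listed above, as they would appear in the env file
STAGING_ENV = """\
UVICORN_WORKERS=1
HOST_IP=0.0.0.0
HOST_PORT=8980
"""

settings = parse_env_text(STAGING_ENV)
# Values arrive as strings, so numeric settings need an explicit conversion
workers = int(settings["UVICORN_WORKERS"])
port = int(settings["HOST_PORT"])
```

Note that kubectl create configmap with --from-file stores the whole file under a single key named after the file, which is why the Deployment manifest mounts the key staging.env as a file rather than injecting individual variables.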

Deployment Manifest

deployment.yaml : This file describes the desired state of a Kubernetes Deployment. It specifies the container image, replicas, resource requirements, and other deployment-specific settings. It is used to create and manage the pods that run your application.

apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    app: fast-api
  name: fast-api
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fast-api
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: fast-api
    spec:
      containers:
      - image: rahul1024/fast-api
        name: fast-api
        imagePullPolicy: Always
        resources:
          requests:
            memory: "1000Mi"
            cpu: "1000m"
          limits:
            memory: "4000Mi"
            cpu: "4000m"
        volumeMounts:
          - mountPath: /root/staging.env
            subPath: staging.env
            name: env
      volumes:
        - name: env
          configMap:
            name: fast-api-config
            items:
              - key: staging.env
                path: staging.env
status: { }

HPA Manifest

hpa.yaml: This file defines a Horizontal Pod Autoscaler (HPA) in Kubernetes. The HPA automatically scales the number of pods based on observed CPU utilization or, in newer API versions, custom metrics. It specifies the minimum and maximum number of replicas, as well as the target that triggers scaling. The autoscaling/v1 API used here supports only a target CPU utilization.

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  creationTimestamp: null
  name: fast-api
spec:
  maxReplicas: 4
  minReplicas: 2
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: fast-api
  targetCPUUtilizationPercentage: 50
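
To make the scaling behaviour concrete, the sketch below approximates the replica calculation described in the Kubernetes HPA documentation: the desired count is the current count scaled by the ratio of observed to target utilization, rounded up, and clamped to the min/max bounds. It is a simplification that ignores stabilization windows, pod readiness, and missing metrics. The function name desired_replicas is hypothetical, the defaults mirror this manifest, and the 0.1 tolerance is the documented default that a cluster may override.

```python
import math


def desired_replicas(current_replicas, current_cpu_percent,
                     target_cpu_percent=50, min_replicas=2, max_replicas=4,
                     tolerance=0.1):
    """Approximate the replica count the HPA would aim for."""
    ratio = current_cpu_percent / target_cpu_percent
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current count
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the bounds set by minReplicas and maxReplicas
    return max(min_replicas, min(max_replicas, desired))
```

For example, with 2 pods averaging 75% CPU, desired_replicas(2, 75) gives 3, and with 4 mostly idle pods, desired_replicas(4, 10) gives 2 because of the minReplicas floor. Utilization is measured relative to the CPU request of the container (1000m in the Deployment manifest), so 50% corresponds to roughly 500m per pod.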

Service Manifest

service.yaml: This file defines a Kubernetes Service, which provides a stable network endpoint to access your application. The Service acts as a load balancer and distributes incoming traffic to the pods running your application. It defines the networking rules, such as the service type (ClusterIP, NodePort, LoadBalancer), port mappings, and selectors to match the pods.

apiVersion: v1
kind: Service
metadata:
  creationTimestamp: null
  labels:
    app: fast-api
  name: fast-api
spec:
  ports:
  - port: 8980
    protocol: TCP
    targetPort: 8980
  selector:
    app: fast-api
  type: ClusterIP
status:

Apply these manifests with the following commands:

  • kubectl apply -f <deployment_file_path>
  • kubectl apply -f <service_file_path>
  • kubectl apply -f <hpa_file_path>

Test k8s deployment

Forward a local port to the service or deployment running in k8s with one of the following commands, then test it with the same curl commands as before:

  • kubectl port-forward service/<service_name> <local_port>:<k8s_port>
  • kubectl port-forward deployment/<deployment_name> <local_port>:<k8s_port>
