FastAPI Deployment in K8s
Minikube is a tool that enables local Kubernetes development and testing by running a single-node Kubernetes cluster on a personal machine.
Minikube will be used in this post instead of a full Kubernetes cluster. I have a trained ONNX model that, given an input array of 5 values, outputs the maximum value. I'll build a FastAPI service around this model, dockerize it, push the image to Docker Hub or a local registry, and finally deploy it to Minikube.
FastAPI
The FastAPI code below implements an endpoint for performing inference with an ONNX model. The /inference endpoint accepts a list of numbers as input and uses the loaded ONNX model to find the maximum value in the array, which is then returned as the response. Additionally, there is a /health_check endpoint that can be used to verify that the application is up.
# Imports
import numpy as np
import onnxruntime as ort
from pydantic import BaseModel
from config import config as cfg
from fastapi import FastAPI

# Define the FastAPI app
app = FastAPI(title="Hello World Fast",
              description="Hello World Fast API", version="1.0")

# Define the input request model
class InferenceRequest(BaseModel):
    numbers: list

# Load the ONNX model for inference
ort_session = ort.InferenceSession(cfg.onnx_file_path)

# Define the inference endpoint
@app.post("/inference")
def inference(request: InferenceRequest):
    numbers = request.numbers
    # Convert the input to a numpy array
    input_array = np.array(numbers).reshape(1, -1).astype(np.float32)
    # Run the ONNX model
    ort_inputs = {ort_session.get_inputs()[0].name: input_array}
    ort_output = ort_session.run(None, ort_inputs)
    # Get the index of the maximum value from the predicted output
    max_value_index = np.argmax(ort_output[0])
    max_value = numbers[max_value_index]
    # Return the max value as the response
    return {"max_value": max_value}

@app.get("/health_check")
async def healthcheck():
    return {"status": "alive"}
You can run this FastAPI app by running
eval $(cat ../config/staging.env) bash main.sh
Dockerizing
This Dockerfile uses Python 3.9 as the base image. It sets the working directory to /app and installs the Python dependencies specified in requirements.txt. The application code is then copied into the image. Port 8980 is exposed, and bash main.sh is run as the container's startup command.
# Use Python 3.9 as the base image
FROM python:3.9
# Set the working directory inside the container
WORKDIR /app
# By copying over requirements first, we make sure that Docker will cache
# our installed requirements rather than reinstall them on every build
COPY requirements.txt /app/requirements.txt
RUN pip install -r requirements.txt
# Copy the application code to the working directory
COPY . .
EXPOSE 8980
CMD ["bash","main.sh"]
- Build the Docker image with
docker build -t fast_api_image .
- Run the image with
docker run -it -p 8080:8980 --env-file config/staging.env --name fast_api_container fast_api_image
- Test the endpoint with either of the following curl commands
curl -X POST -H "Content-Type: application/json" -d '{"numbers": [1.0, 2.0, 33.0, 4.0, 5.0]}' http://localhost:8080/inference
curl -X POST -H "Content-Type: application/json" -d @input.json http://localhost:8080/inference
(here input.json is a JSON file containing the input data)
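For reference, input.json just mirrors the request body from the first curl command and can be created like this:

```shell
# Create input.json with the same payload used in the inline curl example
cat > input.json <<'EOF'
{"numbers": [1.0, 2.0, 33.0, 4.0, 5.0]}
EOF
```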
Pushing to Registry
First, set up a local registry
docker run -d -p 5000:5000 --restart always --name registry registry:2
Then tag the image with the registry prefix and push it
docker tag your_docker_image localhost:5000/fast_api_image
docker push localhost:5000/fast_api_image
Alternatively, you can push to Docker Hub.
docker tag local_image:tagname username/repository:tagname
docker push username/repository:tagname
Deployment
With the image successfully pushed to the registry, most of the deployment process is complete. Dockerizing the application is a major step, and the remaining tasks mainly involve applying specific manifests in Kubernetes. Once these steps are completed, your deployment will be finalized.
Setup Minikube
- Follow https://minikube.sigs.k8s.io/docs/start/ for minikube installation
- Start your cluster
minikube start --memory 8192 --cpus 4 --driver=docker --kubernetes-version=v1.22.9
NOTE: At the time of writing, the latest Kubernetes version had bugs related to pulling images from Docker Hub, which caused difficulties in Minikube. To work around this, I pinned an earlier version, v1.22.9 (kubernetes-version=v1.22.9). Even in that version there was a bug that prevented pods from reading the ConfigMap, so as a temporary solution I hard-coded the configuration variables instead.
Create Config Map
A ConfigMap is a Kubernetes resource used to store configuration data that containers running in pods can consume. The following kubectl command creates a ConfigMap:
kubectl create configmap <config_name> --from-file=<file_path>
In this post, the ConfigMap holds the following key-value pairs
UVICORN_WORKERS=1
HOST_IP=0.0.0.0
HOST_PORT=8980
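The same ConfigMap can also be written declaratively, which keeps it under version control alongside the other manifests. A sketch, assuming the name fast-api-config and the key staging.env that the deployment manifest references:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fast-api-config
data:
  staging.env: |
    UVICORN_WORKERS=1
    HOST_IP=0.0.0.0
    HOST_PORT=8980
```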
Deployment Manifest
deployment.yaml: This file describes the desired state of a Kubernetes Deployment. It specifies the container image, replicas, resource requirements, and other deployment-specific settings. It is used to create and manage the pods that run your application.
apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    app: fast-api
  name: fast-api
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fast-api
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: fast-api
    spec:
      containers:
        - image: rahul1024/fast-api
          name: fast-api
          imagePullPolicy: Always
          resources:
            requests:
              memory: "1000Mi"
              cpu: "1000m"
            limits:
              memory: "4000Mi"
              cpu: "4000m"
          volumeMounts:
            - mountPath: /root/staging.env
              subPath: staging.env
              name: env
      volumes:
        - name: env
          configMap:
            name: fast-api-config
            items:
              - key: staging.env
                path: staging.env
status: {}
HPA Manifest
hpa.yaml: This file defines a Horizontal Pod Autoscaler (HPA). The HPA automatically scales the number of pods based on observed CPU utilization or custom metrics. It specifies the minimum and maximum number of replicas, as well as the target CPU utilization or metric thresholds that trigger scaling. Note that the HPA relies on the metrics-server to report CPU usage; in Minikube it can be enabled with minikube addons enable metrics-server.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  creationTimestamp: null
  name: fast-api
spec:
  maxReplicas: 4
  minReplicas: 2
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: fast-api
  targetCPUUtilizationPercentage: 50
Service Manifest
service.yaml: This file defines a Kubernetes Service, which provides a stable network endpoint to access your application. The Service acts as a load balancer and distributes incoming traffic to the pods running your application. It defines the networking rules, such as the service type (ClusterIP, NodePort, LoadBalancer), port mappings, and selectors to match the pods.
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: null
  labels:
    app: fast-api
  name: fast-api
spec:
  ports:
    - port: 8980
      protocol: TCP
      targetPort: 8980
  selector:
    app: fast-api
  type: ClusterIP
status: {}
Apply these manifests with the following commands:
kubectl apply -f <deployment_file_path>
kubectl apply -f <service_file_path>
kubectl apply -f <hpa_file_path>
Test k8s deployment
Port-forward the service (or its deployment) that has been deployed to k8s with one of the following commands:
kubectl port-forward service/<service_name> <local_port>:<k8s_port>
kubectl port-forward deployment/<deployment_name> <local_port>:<k8s_port>
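For example, with the service defined above (name fast-api, port 8980), the full round trip looks like this; if the model behaves as described, the inference response should contain 33.0, the maximum of the example input:

```shell
# Forward local port 8080 to the service's port 8980 (runs in the background)
kubectl port-forward service/fast-api 8080:8980 &

# Hit the health check and the inference endpoint through the tunnel
curl http://localhost:8080/health_check
curl -X POST -H "Content-Type: application/json" \
  -d '{"numbers": [1.0, 2.0, 33.0, 4.0, 5.0]}' \
  http://localhost:8080/inference
```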