Large Language Models are revolutionising the tech world as we know it!
In this episode, we’re about to embark on an exciting journey, merging LLMs with Kubernetes to create supercharged, easily maintainable systems while slashing recovery time to the bare minimum.
Outages can be complex; however, with a little AI goblin in your cluster, you can soothe your nerves and resolve all sorts of issues.


# Welcome to LocalAI!

LocalAI is at the heart of this whole thing. It is a brilliant drop-in replacement for the OpenAI API, allowing you to run models locally or on-prem, even on consumer-grade hardware.
This powerful feature opens up new possibilities for AI integration across a range of applications.

LocalAI is versatile, working with a variety of ggml-compatible model families and backends, such as llama.cpp, gpt4all, and rwkv.cpp.
Head over to HuggingFace to find the perfect model for your unique needs!

Today, we’ll be exploring two models from gpt4all: the commercially licensable ggml-gpt4all-j-v1.3-groovy and the LLaMA-based ggml-gpt4all-l13b-snoozy, which isn’t commercially licensable but promises oodles of fun!

First, we’ll clone the LocalAI repository, then download the GPT4All model we fancy, starting with the commercially viable one.

git clone https://github.com/go-skynet/LocalAI
cd LocalAI
wget https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin -O models/ggml-gpt4all-j-v1.3-groovy # download our model
cp prompt-templates/ggml-gpt4all-j.tmpl models/ggml-gpt4all-j-v1.3-groovy.tmpl # copy our prompt-template over
cat <<EOF > .env
THREADS=6
CONTEXT_SIZE=512
MODELS_PATH=/models
# DEBUG=true
# BUILD_TYPE=generic
EOF
docker compose up -d --build

Give it a few minutes to build and start the server, then we can query the API for available models and for chat or text completions.
The best part?
Since this API is compatible with the OpenAI API, custom ChatGPT-style apps can easily link up for a silky smooth chat experience.
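
Rather than guessing how long “a few minutes” is, a small polling loop can wait until the API answers. The sketch below is only an illustration: the helper name `wait_for_localai`, the retry count, and the delay are assumptions, while the default URL is the `/v1/models` endpoint queried in this walkthrough.

```shell
# Poll the models endpoint until it responds, or give up.
# Arguments (all optional): URL, number of attempts, seconds between attempts.
wait_for_localai() {
  url="${1:-http://localhost:8080/v1/models}"
  attempts="${2:-60}"
  delay="${3:-5}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl return non-zero on HTTP errors, -sS keeps it quiet
    if curl -fsS "$url" >/dev/null 2>&1; then
      echo "LocalAI is ready"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "LocalAI did not respond after $attempts attempts" >&2
  return 1
}
```

With no arguments, `wait_for_localai` checks every 5 seconds for up to 5 minutes, which should cover the first image build on most machines.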

curl http://localhost:8080/v1/models

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "ggml-gpt4all-j-v1.3-groovy",
     "messages": [{"role": "user", "content": "What is a cow?"}],
     "temperature": 0.1
   }'

curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
     "model": "ggml-gpt4all-j-v1.3-groovy",
     "prompt": "What is a cow?",
     "temperature": 0.1
   }'

curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
     "model": "ggml-gpt4all-j-v1.3-groovy",
     "prompt": "What is Kubernetes?",
     "temperature": 0.1
   }'

It’s always a treat when things just work.
But ask this model “What is Kubernetes?” and brace yourself for some bizarre responses about Turkish cuisine or Icelandic traditions!


# Meet GPT4ALL Snoozy!

Time to be amazed by the GPT4ALL Snoozy model, inspired by Meta’s incredible LLaMA model.
This gem of a model shines when it comes to dishing out valuable information!
However, do keep in mind that, as it’s based on the LLaMA model, it’s not suitable for commercial use.

But, don’t lose hope!
With the recent release of the OpenLLaMA project, a permissively licensed open reproduction of LLaMA, things could change soon!

For now, let’s switch gears, stop our docker server, and dive into the world of the Snoozy model from GPT4ALL.
And don’t forget to give this fantastic model its own prompt template; here we simply copy the GPT4All-J one under the new model’s name.

docker compose down

wget https://gpt4all.io/models/ggml-gpt4all-l13b-snoozy.bin -O models/ggml-gpt4all-l13b-snoozy
cp prompt-templates/ggml-gpt4all-j.tmpl models/ggml-gpt4all-l13b-snoozy.tmpl

docker compose up -d --build

curl http://localhost:8080/v1/models

curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
     "model": "ggml-gpt4all-l13b-snoozy",
     "prompt": "What is a cow?",
     "temperature": 0.1
   }'

curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
     "model": "ggml-gpt4all-l13b-snoozy",
     "prompt": "What is Kubernetes?",
     "temperature": 0.1
   }'

# Set Up Local Kubernetes

We’re about to integrate LocalAI with Kubernetes, and what’s more, we’re actually going to run LocalAI inside Kubernetes too.

We’ll kick things off by creating a local Kubernetes cluster using the brilliant K3D tool.

K3D works its magic with K3s, a nifty, lightweight version of Kubernetes that packs all the punch we need.
Our nodes will run as Docker containers, cleverly mimicking a real cluster, without any of the overhead.

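curl -s https://raw.githubusercontent.com/k3d-io/k3d/main/install.sh | bash # install k3d
sudo dnf install helm # Fedora shown here; use your own distro's package manager
k3d cluster create local # create a cluster named "local"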
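
Before installing anything, it can help to confirm that the new cluster is actually reachable. A minimal sketch, assuming `kubectl` is installed and that k3d has switched the current kubectl context to the new cluster, which it normally does; the helper name `check_cluster` and the default timeout are made up for illustration.

```shell
# Confirm the API server answers and wait for every node to report Ready.
# Optional argument: how long to wait, in kubectl's duration format.
check_cluster() {
  timeout="${1:-120s}"
  kubectl cluster-info || return 1
  kubectl wait --for=condition=Ready nodes --all --timeout="$timeout"
}
```

Running `check_cluster` right after `k3d cluster create local` returns non-zero if the nodes never become Ready within the timeout.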

# K8sGPT Magic!

Once Kubernetes initialises, we’ll install the Helm chart for the k8sgpt-operator!
K8sGPT is a game-changer, as it scans your Kubernetes clusters, diagnosing and triaging issues in plain English.
The operator saves its findings right inside the cluster as shiny new Result resources.

First, we’ll integrate it with OpenAI itself to showcase its functionality.
Then we’ll look at installing LocalAI into the cluster and integrating it with our model.

Let’s start by installing the operator and creating a quick secret containing our OpenAI API key.

helm repo add k8sgpt https://charts.k8sgpt.ai/
helm install k8sgpt-operator k8sgpt/k8sgpt-operator

kubectl create secret generic k8sgpt-sample-secret --from-literal=openai-api-key="$OPENAI_TOKEN" -n default # expects OPENAI_TOKEN in your shell

kubectl run test --image=nginx/test # deliberately broken: this image doesn't exist

Next, we’ll create a new K8sGPT resource that connects with the OpenAI API and processes error messages.

apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-sample
  namespace: default
spec:
  model: gpt-3.5-turbo
  backend: openai
  noCache: false
  version: v0.2.9
  enableAI: true
  secret:
    name: k8sgpt-sample-secret
    key: openai-api-key

Oh, and we can’t forget to run a cheeky, broken kubectl command that tries to start a pod from a non-existent container image.
This will be our fun dummy error.
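
To confirm the dummy error really is in place before K8sGPT looks at it, we can ask Kubernetes why the container is stuck. This is a sketch rather than part of the original walkthrough: the helper name `pod_waiting_reason` is hypothetical, and it assumes the pod has a single container.

```shell
# Print the reason the first container of a pod is stuck in the waiting state.
# Arguments: pod name, namespace (defaults to "default").
pod_waiting_reason() {
  pod="$1"
  ns="${2:-default}"
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.status.containerStatuses[0].state.waiting.reason}'
}
```

For our broken pod, `pod_waiting_reason test` would typically print an image pull failure such as `ErrImagePull` or `ImagePullBackOff`.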

Give it a moment, and you’ll see a Result resource being created.
It’s packed with helpful tips and tricks on how to resolve our error.
It might not be perfect advice, but it’s a lifesaver for anyone with less experience.
I’m excited for a future where we can fine-tune a model on the full cluster and codebase to give more accurate and surgical advice.
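
The results can be read straight from the cluster with kubectl. A minimal sketch, assuming the operator’s custom resource is exposed under the plural name `results` and lives in the same namespace as our K8sGPT resource; the helper name `show_results` is just for illustration.

```shell
# List the Result resources written by the operator, then dump them in full.
# Optional argument: namespace (defaults to "default").
show_results() {
  ns="${1:-default}"
  # Short overview, one line per finding
  kubectl get results -n "$ns" || return 1
  # Full detail, including the generated explanation
  kubectl get results -n "$ns" -o yaml
}
```

If nothing shows up yet, give the operator another moment and run `show_results` again.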


# K8sGPT & LocalAI Unite!

Let’s bring together the two awesome things we just explored, and run K8sGPT in Kubernetes using LocalAI as its backend.

First, we’ll delete our K8sGPT resource to avoid any unwanted interactions with OpenAI.
Next, we’ll create a new ConfigMap to hold the prompt template we’ve been copying around, which structures queries for our model.

apiVersion: v1
kind: ConfigMap
metadata:
  name: local-ai
data:
  model.tmpl: |
    The prompt below is a question to answer, a task to complete, or a conversation to respond to; decide which and write an appropriate response.
    ### Prompt:
      {{.Input}}
    ### Response:

Now, it’s time for a new deployment!
This deployment will create a pod with an init container that downloads our model.
Our pod’s main container will be LocalAI, and it’ll mount our template from the configmap.
To wrap it all up, we’ll create a service for K8sGPT to connect to.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: local-ai
spec:
  selector:
    matchLabels:
      app: local-ai
  replicas: 1
  template:
    metadata:
      name: local-ai
      labels:
        app: local-ai
    spec:
      initContainers:
        - name: download-model
          image: busybox
          command: ["/bin/sh", "-c"]
          args:
            - |
              url="https://gpt4all.io/models/ggml-gpt4all-l13b-snoozy.bin"
              model="ggml-gpt4all-l13b-snoozy"
              wget "${url}" -O "/models/${model}"
          volumeMounts:
            - mountPath: /models
              name: models

      containers:
        - name: local-ai
          image: quay.io/go-skynet/local-ai:master
          env:
            - name: THREADS
              value: "6"
            - name: CONTEXT_SIZE
              value: "512"
            - name: MODELS_PATH
              value: /models
          volumeMounts:
            - name: models
              mountPath: /models
            - name: template
              mountPath: /models/ggml-gpt4all-l13b-snoozy.tmpl
              subPath: model.tmpl
              readOnly: true
      volumes:
        - name: models
          emptyDir: {}
        - name: template
          configMap:
            name: local-ai
            items:
              - key: model.tmpl
                path: model.tmpl
---
apiVersion: v1
kind: Service
metadata:
  name: local-ai
spec:
  selector:
    app: local-ai
  type: ClusterIP
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080

By port-forwarding our new pod, we can query the API and confirm everything’s up and running.
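
One way to do that check is sketched below. It forwards to the `local-ai` Service from the manifest above rather than to the pod directly, and the helper name `check_localai_in_cluster`, the default local port, and the settle delay are all assumptions for illustration.

```shell
# Temporarily port-forward the local-ai Service and list its models.
# Arguments (optional): local port, seconds to wait for the tunnel.
check_localai_in_cluster() {
  local_port="${1:-8080}"
  settle="${2:-3}"
  # Forward a local port to port 8080 of the Service, in the background.
  kubectl port-forward svc/local-ai "${local_port}:8080" >/dev/null 2>&1 &
  pf_pid=$!
  # Give the tunnel a moment to come up.
  sleep "$settle"
  status=0
  curl -fsS "http://localhost:${local_port}/v1/models" || status=$?
  # Stop the port-forward again, whether or not the query worked.
  kill "$pf_pid" 2>/dev/null || true
  return "$status"
}
```

If the docker compose stack from earlier is still running, it will already occupy port 8080, so either stop it first or pass another local port, for example `check_localai_in_cluster 9090`.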

Finally, we’ll create a new K8sGPT resource, point its backend to LocalAI, specify our model, and wait for the magic to happen.

apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-local
spec:
  backend: localai
  model: ggml-gpt4all-l13b-snoozy
  baseUrl: http://local-ai.default.svc.cluster.local:8080/v1
  noCache: false
  version: v0.2.9
  enableAI: true

As expected, we receive a super helpful new Result resource.
K8sGPT and LocalAI, united at last!


# Tailor-made Prompts!

We can tweak our prompt template to suit our context.
By rewriting the prompt to explicitly state the context for the model’s queries, every response will be framed by that context.

apiVersion: v1
kind: ConfigMap
metadata:
  name: local-ai
data:
  model.tmpl: |
    The prompt below is an error message from inside a Kubernetes cluster; create a list of possible solutions and actions a software engineer could take to diagnose and resolve the issue.
    ### Prompt:
      {{.Input}}
    ### Response:

Let’s give our context a makeover: “The prompt below is an error message from inside a Kubernetes cluster; create a list of possible solutions and actions a software engineer could take to diagnose and resolve the issue.”

Once the pod has been restarted to pick up the new template (subPath mounts don’t receive ConfigMap updates), we can throw anything at the API, like “out of memory,” and the model will treat it as a Kubernetes error.
You could imagine all sorts of meta-information you could pass into this prompt template to further enhance the responses you get.

curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
     "model": "ggml-gpt4all-l13b-snoozy",
     "prompt": "out of memory",
     "temperature": 0.1
   }'
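
To see roughly what the model receives for a request like this, we can fill in the template by hand. The sketch below only imitates the substitution of the `{{.Input}}` placeholder; it is not LocalAI’s own rendering code, the helper name `render_prompt` is hypothetical, and it replaces just the first occurrence of the placeholder.

```shell
# Replace the first {{.Input}} placeholder in a template with the given text.
# Arguments: template text, input text.
render_prompt() {
  tmpl=$1
  input=$2
  placeholder='{{.Input}}'
  case $tmpl in
    *"$placeholder"*)
      # Split the template around the placeholder and stitch the input in.
      before=${tmpl%%"$placeholder"*}
      after=${tmpl#*"$placeholder"}
      printf '%s%s%s\n' "$before" "$input" "$after"
      ;;
    *)
      # No placeholder: return the template unchanged.
      printf '%s\n' "$tmpl"
      ;;
  esac
}

template='The prompt below is an error message from inside a Kubernetes cluster; create a list of possible solutions and actions a software engineer could take to diagnose and resolve the issue.
### Prompt:
  {{.Input}}
### Response:'

render_prompt "$template" "out of memory"
```

The bare “out of memory” arrives wrapped in the Kubernetes framing, which is why the answer stays on topic even though the request itself never mentions Kubernetes.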

# Outro

The world of engineering is evolving at breakneck speed!
With Large Language Models, we can simplify our administration work and conquer all sorts of challenges.
Imagine harnessing the power of LLMs for system alerts from Prometheus or other monitoring suites.
When we can embed the full scope of our organisation’s projects into these models, there’s no doubt that they’ll become an indispensable tool in every system.
Imagine asking the system, in plain text, how a tool or service operates, and getting a detailed and useful response.
Maybe humanity will finally be able to make sense of sprawling microservice architectures.