Les avancées dans l’Intelligence Artificielle Générative et sa mise en oeuvre simplifiée via certains outils, transforment profondément la gestion des clusters Kubernetes via ce concept d’AIOps (acronyme anglais de Artificial Intelligence for IT Operations) énoncé par Gartner, un processus dans lequel vous utilisez des techniques d’intelligence artificielle (IA) pour maintenir par exemple une infrastructure. L’un des domaines dans lesquels les ingénieurs DevOps et les opérateurs de cluster débutants sont souvent confrontés, est relatif aux défis de l’identification, de la compréhension et de la résolution de problèmes au sein d’un cluster Kubernetes …
Dans cet article, nous allons donc explorer comment configurer et utiliser K8sGPT, un outil open source basé sur l’IA générative, en combinaison avec Ollama et le modèle Falcon3, pour identifier et résoudre les problèmes dans un cluster Kubernetes.
Pour cela, je pars ici d’une instance Ubuntu 24.04 LTS dans DigitalOcean :
En y installant localement le moteur Docker …
(base) root@k8sgpt:~# curl -fsSL https://get.docker.com | sh -
# Executing docker install script, commit: 4c94a56999e10efcf48c5b8e3f6afea464f9108e
+ sh -c apt-get -qq update >/dev/null
+ sh -c DEBIAN_FRONTEND=noninteractive apt-get -y -qq install ca-certificates curl >/dev/null
Scanning processes...
Scanning candidates...
Scanning linux images...
+ sh -c install -m 0755 -d /etc/apt/keyrings
+ sh -c curl -fsSL "https://download.docker.com/linux/ubuntu/gpg" -o /etc/apt/keyrings/docker.asc
+ sh -c chmod a+r /etc/apt/keyrings/docker.asc
+ sh -c echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu noble stable" > /etc/apt/sources.list.d/docker.list
+ sh -c apt-get -qq update >/dev/null
+ sh -c DEBIAN_FRONTEND=noninteractive apt-get -y -qq install docker-ce docker-ce-cli containerd.io docker-compose-plugin docker-ce-rootless-extras docker-buildx-plugin >/dev/null
Scanning processes...
Scanning candidates...
Scanning linux images...
+ sh -c docker version
Client: Docker Engine - Community
Version: 27.4.1
API version: 1.47
Go version: go1.22.10
Git commit: b9d17ea
Built: Tue Dec 17 15:45:46 2024
OS/Arch: linux/amd64
Context: default
Server: Docker Engine - Community
Engine:
Version: 27.4.1
API version: 1.47 (minimum version 1.24)
Go version: go1.22.10
Git commit: c710b88
Built: Tue Dec 17 15:45:46 2024
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.7.24
GitCommit: 88bf19b2105c8b17560993bee28a01ddc2f97182
runc:
Version: 1.2.2
GitCommit: v1.2.2-0-g7cb3632
docker-init:
Version: 0.19.0
GitCommit: de40ad0
================================================================================
To run Docker as a non-privileged user, consider setting up the
Docker daemon in rootless mode for your user:
dockerd-rootless-setuptool.sh install
Visit https://docs.docker.com/go/rootless/ to learn about rootless mode.
To run the Docker daemon as a fully privileged service, but granting non-root
users access, refer to https://docs.docker.com/go/daemon-access/
WARNING: Access to the remote API on a privileged Docker daemon is equivalent
to root access on the host. Refer to the 'Docker daemon attack surface'
documentation for details: https://docs.docker.com/go/attack-surface/
================================================================================
Et je suis capable d’y lancer directmeent Ollama via son image officiellle pour exécuter des modèles de langage de grande taille (LLM) localement en y utilisant les CPU premium d’Intel présents dans l’instance Ubuntu :
(base) root@k8sgpt:~# docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:latest
Unable to find image 'ollama/ollama:latest' locally
latest: Pulling from ollama/ollama
6414378b6477: Pull complete
9423a26b200c: Pull complete
629da9618c4f: Pull complete
00b71e3f044c: Pull complete
Digest: sha256:18bfb1d605604fd53dcad20d0556df4c781e560ebebcd923454d627c994a0e37
Status: Downloaded newer image for ollama/ollama:latest
7b09d9fcdacff4319e553c41f741a15266eb5a5ec745959363e7754c53a203ef
(base) root@k8sgpt:~# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7b09d9fcdacf ollama/ollama:latest "/bin/ollama serve" About a minute ago Up About a minute 0.0.0.0:11434->11434/tcp, :::11434->11434/tcp ollama
(base) root@k8sgpt:~# netstat -tunlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:11434 0.0.0.0:* LISTEN 46227/docker-proxy
tcp 0 0 127.0.0.54:53 0.0.0.0:* LISTEN 744/systemd-resolve
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN 744/systemd-resolve
tcp6 0 0 :::22 :::* LISTEN 1/init
tcp6 0 0 :::11434 :::* LISTEN 46235/docker-proxy
udp 0 0 127.0.0.54:53 0.0.0.0:* 744/systemd-resolve
udp 0 0 127.0.0.53:53 0.0.0.0:* 744/systemd-resolve
Et en y chargeant le grand modèle Falcon 3 développé par le Technology Innovation Institute (TII) d’Abu Dhab et présent dans la bibliothèque de grands modèles d’Ollama :
(base) root@k8sgpt:~# docker exec -it ollama ollama pull falcon3
pulling manifest
pulling 3717a52b7aea... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 4.6 GB
pulling 803b5adc3448... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 218 B
pulling 58f83c52a4e3... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 13 KB
pulling 35e31ed4c388... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 101 B
pulling acb75345e14b... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 487 B
verifying sha256 digest
writing manifest
success
(base) root@k8sgpt:~# docker exec -it ollama ollama list
NAME ID SIZE MODIFIED
falcon3:latest 472ea1c89f64 4.6 GB 11 seconds ago
Je peux à cette étape y charger K8sGPT, un outil pour scanner vos clusters kubernetes, diagnostiquer et trier les problèmes en anglais simple. Il dispose d’une expérience SRE codifiée dans ses analyseurs et aide à extraire les informations les plus pertinentes pour les enrichir avec de l’IA générative.
K8sGPT fonctionne en trois étapes :
- Extraction : Récupération des détails de configuration de toutes les charges de travail déployées dans le cluster.
- Filtration : Un composant appelé « analyzer » filtre les données nécessaires.
- Génération : Les données filtrées sont traitées pour générer des insights et des rapports en anglais simple.
(base) root@k8sgpt:~# curl -LO https://github.com/k8sgpt-ai/k8sgpt/releases/download/v0.3.48/k8sgpt_amd64.deb
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 34.5M 100 34.5M 0 0 30.5M 0 0:00:01 0:00:01 --:--:-- 30.5M
(base) root@k8sgpt:~# dpkg -i k8sgpt_amd64.deb
(Reading database ... 74447 files and directories currently installed.)
Preparing to unpack k8sgpt_amd64.deb ...
Unpacking k8sgpt (0.3.48) over (0.3.48) ...
Setting up k8sgpt (0.3.48) ...
(base) root@k8sgpt:~# k8sgpt
Kubernetes debugging powered by AI
Usage:
k8sgpt [command]
Available Commands:
analyze This command will find problems within your Kubernetes cluster
auth Authenticate with your chosen backend
cache For working with the cache the results of an analysis
completion Generate the autocompletion script for the specified shell
custom-analyzer Manage a custom analyzer
dump Creates a dumpfile for debugging issues with K8sGPT
filters Manage filters for analyzing Kubernetes resources
generate Generate Key for your chosen backend (opens browser)
help Help about any command
integration Integrate another tool into K8sGPT
serve Runs k8sgpt as a server
version Print the version number of k8sgpt
Flags:
--config string Default config file (/root/.config/k8sgpt/k8sgpt.yaml)
-h, --help help for k8sgpt
--kubeconfig string Path to a kubeconfig. Only required if out-of-cluster.
--kubecontext string Kubernetes context to use. Only required if out-of-cluster.
Use "k8sgpt [command] --help" for more information about a command.
La ligne de commande est alors disponible pour y vérifier les fournisseurs disponibles localement dans cette instance :
(base) root@k8sgpt:~# k8sgpt auth list
Default:
> openai
Active:
Unused:
> openai
> localai
> ollama
> azureopenai
> cohere
> amazonbedrock
> amazonsagemaker
> google
> noopai
> huggingface
> googlevertexai
> oci
> ibmwatsonxai
Ollama sera utilisé comme fournisseur de backend IA pour K8sGPT au travers de LocalAI (qui agit comme l’API REST de remplacement compatible avec les spécifications de l’API OpenAI pour une inférence locale).
Voici la commande pour configurer K8sGPT avec Ollama et le modèle Falcon3 :
(base) root@k8sgpt:~# k8sgpt auth add --backend localai --model falcon3 --baseurl http://localhost:11434/v1
localai added to the AI backend provider list
Je lance un cluster Kubernetes managé sur DigitalOcean via DigitalOcean Kubernetes (DOKS) :
Récupération du fichier Kubeconfig depuis ce cluster pour l’insérer localement sur l’instance Ubuntu pour l’utiliser avec le client Kubectl :
(base) root@k8sgpt:~# curl -LO https://dl.k8s.io/release/v1.32.0/bin/linux/amd64/kubectl && chmod +x ./kubectl && mv kubectl /usr/local/bin/ && kubectl
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 138 100 138 0 0 1000 0 --:--:-- --:--:-- --:--:-- 1007
100 54.6M 100 54.6M 0 0 120M 0 --:--:-- --:--:-- --:--:-- 120M
kubectl controls the Kubernetes cluster manager.
Find more information at: https://kubernetes.io/docs/reference/kubectl/
Basic Commands (Beginner):
create Create a resource from a file or from stdin
expose Take a replication controller, service, deployment or pod and expose it as a new Kubernetes service
run Run a particular image on the cluster
set Set specific features on objects
Basic Commands (Intermediate):
explain Get documentation for a resource
get Display one or many resources
edit Edit a resource on the server
delete Delete resources by file names, stdin, resources and names, or by resources and label selector
Deploy Commands:
rollout Manage the rollout of a resource
scale Set a new size for a deployment, replica set, or replication controller
autoscale Auto-scale a deployment, replica set, stateful set, or replication controller
Cluster Management Commands:
certificate Modify certificate resources
cluster-info Display cluster information
top Display resource (CPU/memory) usage
cordon Mark node as unschedulable
uncordon Mark node as schedulable
drain Drain node in preparation for maintenance
taint Update the taints on one or more nodes
Troubleshooting and Debugging Commands:
describe Show details of a specific resource or group of resources
logs Print the logs for a container in a pod
attach Attach to a running container
exec Execute a command in a container
port-forward Forward one or more local ports to a pod
proxy Run a proxy to the Kubernetes API server
cp Copy files and directories to and from containers
auth Inspect authorization
debug Create debugging sessions for troubleshooting workloads and nodes
events List events
Advanced Commands:
diff Diff the live version against a would-be applied version
apply Apply a configuration to a resource by file name or stdin
patch Update fields of a resource
replace Replace a resource by file name or stdin
wait Experimental: Wait for a specific condition on one or many resources
kustomize Build a kustomization target from a directory or URL
Settings Commands:
label Update the labels on a resource
annotate Update the annotations on a resource
completion Output shell completion code for the specified shell (bash, zsh, fish, or powershell)
Subcommands provided by plugins:
Other Commands:
api-resources Print the supported API resources on the server
api-versions Print the supported API versions on the server, in the form of "group/version"
config Modify kubeconfig files
plugin Provides utilities for interacting with plugins
version Print the client and server version information
Usage:
kubectl [flags] [options]
Use "kubectl <command> --help" for more information about a given command.
Use "kubectl options" for a list of global command-line options (applies to all commands).
(base) root@k8sgpt:~# kubectl cluster-info
Kubernetes control plane is running at https://738af175-32d4-43e9-9e31-b7ae3058be3e.k8s.ondigitalocean.com
CoreDNS is running at https://738af175-32d4-43e9-9e31-b7ae3058be3e.k8s.ondigitalocean.com/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
(base) root@k8sgpt:~# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
pool-kuaxj3k47-ejvl3 Ready <none> 6m40s v1.31.1 10.110.0.3 159.65.200.52 Debian GNU/Linux 12 (bookworm) 6.1.0-27-amd64 containerd://1.6.31
pool-kuaxj3k47-ejvl8 Ready <none> 6m38s v1.31.1 10.110.0.2 164.92.147.176 Debian GNU/Linux 12 (bookworm) 6.1.0-27-amd64 containerd://1.6.31
(base) root@k8sgpt:~# kubectl get po,svc -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/cilium-4gxht 1/1 Running 0 6m46s
kube-system pod/cilium-lw8gd 1/1 Running 0 6m48s
kube-system pod/coredns-c5c6457c-bnzfc 0/1 Running 0 36s
kube-system pod/coredns-c5c6457c-nz6gr 0/1 Running 0 36s
kube-system pod/cpc-bridge-proxy-ebpf-7ncbq 1/1 Running 0 55s
kube-system pod/cpc-bridge-proxy-ebpf-qth8w 1/1 Running 0 55s
kube-system pod/hubble-relay-67597fb8-kmlw5 1/1 Running 1 (51s ago) 8m40s
kube-system pod/hubble-ui-79957d9f7b-4n9kj 2/2 Running 0 74s
kube-system pod/konnectivity-agent-7ml7p 1/1 Running 0 61s
kube-system pod/konnectivity-agent-tnf8j 1/1 Running 0 61s
kube-system pod/kube-proxy-ebpf-4gt2z 1/1 Running 0 6m48s
kube-system pod/kube-proxy-ebpf-ztjql 1/1 Running 0 6m46s
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.108.32.1 <none> 443/TCP 9m54s
kube-system service/hubble-peer ClusterIP 10.108.54.62 <none> 443/TCP 8m40s
kube-system service/hubble-relay ClusterIP 10.108.41.20 <none> 80/TCP 8m40s
kube-system service/hubble-ui ClusterIP 10.108.52.164 <none> 80/TCP 8m40s
kube-system service/kube-dns ClusterIP 10.108.32.10 <none> 53/UDP,53/TCP,9153/TCP 36s
Installation d’Headlamp, une interface web qui remplace le dashboard traditionnel de Kubernetes, facile à utiliser et extensible.
Headlamp a été créé pour combiner les fonctionnalités traditionnelles des autres interfaces web et tableaux de bord (c’est-à-dire pour lister et visualiser les ressources) avec des fonctionnalités supplémentaires.
(base) root@k8sgpt:~# kubectl apply -f https://raw.githubusercontent.com/kinvolk/headlamp/main/kubernetes-headlamp.yaml
service/headlamp created
deployment.apps/headlamp created
secret/headlamp-admin created
(base) root@k8sgpt:~# kubectl get po,svc -n kube-system
NAME READY STATUS RESTARTS AGE
pod/cilium-4gxht 1/1 Running 0 11m
pod/cilium-lw8gd 1/1 Running 0 11m
pod/coredns-c5c6457c-bnzfc 1/1 Running 0 5m7s
pod/coredns-c5c6457c-nz6gr 1/1 Running 0 5m7s
pod/cpc-bridge-proxy-ebpf-7ncbq 1/1 Running 0 5m26s
pod/cpc-bridge-proxy-ebpf-qth8w 1/1 Running 0 5m26s
pod/csi-do-node-gxlmb 2/2 Running 0 4m24s
pod/csi-do-node-swqfv 2/2 Running 0 4m24s
pod/do-node-agent-7bgsh 1/1 Running 0 4m11s
pod/do-node-agent-hwt6l 1/1 Running 0 4m11s
pod/headlamp-7dfd97b98b-wmn66 1/1 Running 0 48s
pod/hubble-relay-67597fb8-kmlw5 1/1 Running 1 (5m22s ago) 13m
pod/hubble-ui-79957d9f7b-4n9kj 2/2 Running 0 5m45s
pod/konnectivity-agent-7ml7p 1/1 Running 0 5m32s
pod/konnectivity-agent-tnf8j 1/1 Running 0 5m32s
pod/kube-proxy-ebpf-4gt2z 1/1 Running 0 11m
pod/kube-proxy-ebpf-ztjql 1/1 Running 0 11m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/headlamp ClusterIP 10.108.38.247 <none> 80/TCP 48s
service/hubble-peer ClusterIP 10.108.54.62 <none> 443/TCP 13m
service/hubble-relay ClusterIP 10.108.41.20 <none> 80/TCP 13m
service/hubble-ui ClusterIP 10.108.52.164 <none> 80/TCP 13m
service/kube-dns ClusterIP 10.108.32.10 <none> 53/UDP,53/TCP,9153/TCP 5m7s
en l’exposant localement et récupérant la clé nécessaire à l’accès à son interface web :
(base) root@k8sgpt:~# nohup kubectl port-forward -n kube-system service/headlamp 8080:80 &
(base) root@k8sgpt:~# cat nohup.out
Forwarding from 127.0.0.1:8080 -> 4466
Forwarding from [::1]:8080 -> 4466
(base) root@k8sgpt:~# kubectl -n kube-system create serviceaccount headlamp-admin
serviceaccount/headlamp-admin created
(base) root@k8sgpt:~# kubectl create clusterrolebinding headlamp-admin --serviceaccount=kube-system:headlamp-admin --clusterrole=cluster-admin
clusterrolebinding.rbac.authorization.k8s.io/headlamp-admin created
(base) root@k8sgpt:~# kubectl create token headlamp-admin -n kube-system
eyJhbGciOiJSUzI1NiIsImtpZCI6Ikd0UHltNHV6c1liSkY0VkhNRWFZMXJKYklqY1R6ckZrMDZQS281dEg3dUUifQ.eyJhdWQiOlsic3lzdGVtOmtvbm5lY3Rpdml0eS1zZXJ2ZXIiXSwiZXhwIjoxNzM1MzkyNDEwLCJpYXQiOjE3MzUzODg4MTAsImlzcyI6Imh0dHBzOi8va3ViZXJuZXRlcy5kZWZhdWx0LnN2Yy5jbHVzdGVyLmxvY2FsIiwianRpIjoiNTVkNjI1ZTItNjA2Yi00MTNhLTk2OTgtODFmYjdjZDU4MWY4Iiwia3ViZXJuZXRlcy5pbyI6eyJuYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsInNlcnZpY2VhY2NvdW50Ijp7Im5hbWUiOiJoZWFkbGFtcC1hZG1pbiIsInVpZCI6IjdmOGQwMTU0LWRiYWQtNGU2MS04NTUzLWU1NWI3ZWU0ZjhlOSJ9fSwibmJmIjoxNzM1Mzg4ODEwLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zeXN0ZW06aGVhZGxhbXAtYWRtaW4ifQ.jbrtuXS7uMP6HfwR3CbIbnpRTq4CDaacq0okwm_4tvmJNNcExi9-Dti3cGj1J3tteszpxVzurWPhrWgFlL4UkEacY9fD1TRH4GAZDCFldJ_jvyeaclzGeymrjEGAZ9TbBdoyuXtLeIVhApdICF1KNM-s8mfr1oOREDwlR9HzzrhoECozYxVS9uM1WIEZpum4FwMEl6cKPqOyNx1Rn5MtKPcc87JyK0FxuXzg9WC-cPSNOxu_rUFrZYHyrVapCDpl_XLymD3pFUUuB8XPVidVXcVOthH1Djwm8TRE6aAD4XlkHTcyTYchvN_CpOI2JQ6DVY60unSU8nq2pxfqLC6G2Q
Avant de lancer une analyse via K8sGPT, introduction d’un problème dans le cluster Kubernetes pour simuler une situation réelle. Vous pouvez utiliser des exemples de déploiements défectueux disponibles sur des répositories comme celui de Robusta :
Exemple avec ce déploiement d’un Pod cassé via :
apiVersion: apps/v1
kind: Deployment
metadata:
name: payment-processing-worker
spec:
replicas: 1
selector:
matchLabels:
app: payment-processing-worker
template:
metadata:
labels:
app: payment-processing-worker
spec:
containers:
- name: payment-processing-container
image: bash
command: ["/bin/sh"]
args: ["-c", "if [[ -z \"${DEPLOY_ENV}\" ]]; then echo Environment variable DEPLOY_ENV is undefined ; else while true; do echo hello; sleep 10;done; fi"]
base) root@k8sgpt:~# kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/crashpod/broken.yaml
deployment.apps/payment-processing-worker created
(base) root@k8sgpt:~# kubectl get po
NAME READY STATUS RESTARTS AGE
payment-processing-worker-747ccfb9db-dzjqx 0/1 CrashLoopBackOff 1 (11s ago) 17s
(base) root@k8sgpt:~# kubectl logs po/payment-processing-worker-747ccfb9db-dzjqx
Environment variable DEPLOY_ENV is undefined
Il s’est en effet crashé …
Lancement de l’analyse avec une sortie en JSON :
(base) root@k8sgpt:~# k8sgpt analyze -o json --explain --filter=Pod --backend localai | jq .
{
"provider": "localai",
"errors": null,
"status": "ProblemDetected",
"problems": 1,
"results": [
{
"kind": "Pod",
"name": "default/payment-processing-worker-747ccfb9db-dzjqx",
"error": [
{
"Text": "the last termination reason is Completed container=payment-processing-container pod=payment-processing-worker-747ccfb9db-dzjqx",
"KubernetesDoc": "",
"Sensitive": []
}
],
"details": "Error: The pod \"payment-processing-worker-747ccfb9db-dzjqx\" has completed its execution with a \"Completed\" termination reason, indicating the container \"payment-processing-container\" has finished successfully.\n\nSolution: Verify the logs for the container to ensure data integrity, then check related services for expected outcomes; if successful, mark the pod as ready in the cluster.",
"parentObject": "Deployment/payment-processing-worker"
}
]
}
ou en mode texte :
(base) root@k8sgpt:~# k8sgpt analyze --explain --backend localai --with-doc
100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (1/1, 7960 it/s)
AI Provider: localai
0: Pod default/payment-processing-worker-747ccfb9db-dzjqx(Deployment/payment-processing-worker)
- Error: the last termination reason is Completed container=payment-processing-container pod=payment-processing-worker-747ccfb9db-dzjqx
Error: The pod "payment-processing-worker-747ccfb9db-dzjqx" has completed its execution with a "Completed" termination reason, indicating the container "payment-processing-container" has finished successfully.
Solution: Verify the logs for the container to ensure data integrity, then check related services for expected outcomes; if successful, mark the pod as ready in the cluster.
Comme le montre cette sortie, K8sGPT a identifié et signalé le pod problématique et a également fourni des conseils sur les mesures potentielles à prendre pour comprendre et résoudre le problème.
Autre exemple avec ce Pod Nginx problématique :
apiVersion: v1
kind: Pod
metadata:
name: inventory-management-api
spec:
containers:
- name: nginx
image: nginx
ports:
- containerPort: 80
command:
- wge
- "-O"
- "/work-dir/index.html"
- https://home.robusta.dev
(base) root@k8sgpt:~# kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/refs/heads/main/crashloop_backoff/create_crashloop_backoff.yaml
pod/inventory-management-api created
(base) root@k8sgpt:~# kubectl get po
NAME READY STATUS RESTARTS AGE
inventory-management-api 0/1 ContainerCreating 0 5s
(base) root@k8sgpt:~# kubectl get po
NAME READY STATUS RESTARTS AGE
inventory-management-api 0/1 RunContainerError 1 (1s ago) 10s
Et une nouvelle analyse met le doigt sur le Pod problématique …
(base) root@k8sgpt:~# k8sgpt analyze --explain --backend localai --with-doc
100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (1/1, 2 it/min)
AI Provider: localai
0: Pod default/inventory-management-api()
- Error: the last termination reason is StartError container=nginx pod=inventory-management-api
Error: The Kubernetes error indicates that there was a StartError issue with the nginx container for the pod named inventory-management-api.
Solution:
1. Check the nginx configuration file for syntax errors.
2. Ensure all required resources and permissions are correctly set.
3. Verify network accessibility within the pod.
4. Confirm proper image pull secrets if using Docker images.
5. Review any recent changes to the deployment or service configurations.
Dans cet autre exemple il peut ne pas détecter de problématique avec cette simulation de faux positif via busybox :
apiVersion: batch/v1
kind: Job
metadata:
name: java-api-checker
spec:
template:
spec:
containers:
- name: java-beans
image: busybox
command: ["/bin/sh", "-c"]
args: ["echo 'Java Network Exception: \nAll host(s) tried for db query failed (tried: prod-db:3333) - no available connection and the queue has reached its max size 256 \nAll host(s) tried for db query failed (tried: prod-db:3333) - no available connection and the queue has reached its max size 256 \nAll host(s) tried for db query failed (tried: prod-db:3333) - no available connection and the queue has reached its max size 256 \nAll host(s) tried for db query failed (tried: prod-db:3333) - no available connection and the queue has reached its max size 256'; sleep 60; exit 1"]
restartPolicy: Never
backoffLimit: 1
(base) root@k8sgpt:~# kubectl delete -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/refs/heads/main/crashloop_backoff/create_crashloop_backoff.yaml
pod "inventory-management-api" deleted
(base) root@k8sgpt:~# kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/refs/heads/main/job_failure/job_crash.yaml
job.batch/java-api-checker created
(base) root@k8sgpt:~# kubectl get po
NAME READY STATUS RESTARTS AGE
java-api-checker-5s6dc 1/1 Running 0 7s
(base) root@k8sgpt:~# kubectl logs po/java-api-checker-5s6dc
Java Network Exception:
All host(s) tried for db query failed (tried: prod-db:3333) - no available connection and the queue has reached its max size 256
All host(s) tried for db query failed (tried: prod-db:3333) - no available connection and the queue has reached its max size 256
All host(s) tried for db query failed (tried: prod-db:3333) - no available connection and the queue has reached its max size 256
All host(s) tried for db query failed (tried: prod-db:3333) - no available connection and the queue has reached its max size 256
(base) root@k8sgpt:~# k8sgpt analyze --explain --backend localai --with-doc
AI Provider: localai
No problems detected
Pour une intégration plus complète, vous pouvez installer l’opérateur K8sGPT dans votre cluster Kubernetes. Cet opérateur surveille en continu les problèmes dans le cluster et génère des insights que vous pouvez consulter en interrogeant la ressource personnalisée (CR) de l'opérateur.
$ helm repo add k8sgpt https://charts.k8sgpt.ai/
$ helm repo update
$ helm install release k8sgpt/k8sgpt-operator -n k8sgpt-operator-system --create-namespace
# Installer l'opérateur k8sGPT
$ kubectl apply -n k8sgpt-operator-system -f - << EOF
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
name: k8sgpt-ollama
spec:
ai:
enabled: true
model: falcon3
backend: localai
baseUrl: http://localhost:11434/v1
noCache: false
filters: ["Pod"]
repository: ghcr.io/k8sgpt-ai/k8sgpt
version: v0.3.48
EOF
Il est également possible d’analyser plusieurs clusters Kubernetes en spécifiant le chemin vers le fichier Kubeconfig concerné :
$ k8sgpt analyze --explain --backend localai --with-doc --kubeconfig <chemin vers le fichier Kubeconfig>
L’opérateur recherchera les problèmes dans le cluster et générera des résultats d’analyse.
En fonction de la puissance de votre machine (pour accélerer les temps de réponse d’Ollama, des ressources GPU sont nécessaires), il faut un certain temps à l’opérateur pour appeler le LLM et générer les informations …
Pour conclure, K8sGPT en combinaison avec Ollama offre une solution puissante pour déboguer et gérer les clusters Kubernetes de manière efficace. Cette intégration utilise l'intelligence artificielle pour fournir des insights clairs et des recommandations pour résoudre les problèmes, simplifiant ainsi la vie des opérateurs de cluster. En suivant ces étapes, vous pouvez mettre en place une solution de diagnostic automatisée et basée sur l'IA pour votre environnement Kubernetes …
À suivre !