Chapter 6A: Kubernetes & Helm Deployment
After building and testing our multi-agent RAG system, we need a production-grade deployment strategy. In this chapter, I'll walk you through deploying ResearcherAI to Kubernetes using Helm charts - taking it from a development Docker Compose setup to cloud-native, auto-scaling infrastructure.
Why Kubernetes for ResearcherAI?
Before diving into implementation, let me explain why I chose Kubernetes for production deployment:
The Challenge
Our ResearcherAI system has complex requirements:
- Five distinct services: Application, Neo4j, Qdrant, Kafka (3 brokers), Zookeeper (3 nodes)
- Stateful data: Graph database and vector database need persistent storage
- Event streaming: Kafka cluster requires coordination and high availability
- Variable load: Research queries can spike unpredictably
- Resource management: LLM API calls are expensive, need cost controls
Docker Compose vs. Kubernetes
Docker Compose (Development)
version: '3.8'
services:
app:
build: .
ports:
- "8000:8000"
depends_on:
- neo4j
- qdrant
- kafka
This works great for development, but in production we need:
- Auto-scaling when traffic increases
- Automatic restarts when services crash
- Rolling updates with zero downtime
- Resource limits to control costs
- Health checks and monitoring
- Multiple replicas for high availability
Web Developer Analogy
Think of Kubernetes like moving from a single server to a managed platform:
Docker Compose = Deploying your Node.js app on a single VPS
- You SSH in and run docker-compose up
- Works great until traffic increases
- Manual scaling, manual recovery
Kubernetes = Deploying to a platform like Heroku or Vercel
- Platform handles scaling automatically
- Self-healing when things break
- Built-in load balancing
- But more complex to set up
Kubernetes Basics for Web Developers
If you're coming from web development, here's how Kubernetes concepts map to what you know:
Core Concepts
1. Pods (Like containers, but smarter)
// In Express, you run one server:
const app = express();
app.listen(3000);
// In Kubernetes, a Pod runs one or more containers together:
Pod {
containers: [app, sidecar] // Usually just one
shared_network: true
shared_storage: true
}
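In real YAML, the simplest Pod manifest is short; a minimal sketch (the pod name is illustrative, the image is the one we build for ResearcherAI):
apiVersion: v1
kind: Pod
metadata:
  name: researcherai-demo        # illustrative name
  labels:
    app: researcherai
spec:
  containers:
    - name: app
      image: researcherai/multiagent:2.0.0
      ports:
        - containerPort: 8000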
2. Deployments (Like PM2 or Forever)
// PM2 keeps your app running:
pm2 start app.js -i 4 // 4 instances
// Kubernetes Deployment does the same:
Deployment {
replicas: 4
auto_restart: true
rolling_updates: true
}
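The real manifest for this is a Deployment; a minimal sketch (illustrative name, same image as above):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: researcherai-demo
spec:
  replicas: 4                    # like pm2 start app.js -i 4
  selector:
    matchLabels:
      app: researcherai
  template:
    metadata:
      labels:
        app: researcherai
    spec:
      containers:
        - name: app
          image: researcherai/multiagent:2.0.0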
3. Services (Like Nginx reverse proxy)
# Nginx load balances to backends:
upstream backend {
server 10.0.0.1:8000;
server 10.0.0.2:8000;
}
# Kubernetes Service does this automatically:
Service {
type: LoadBalancer
selects: Pods with label "app=researcherai"
distributes: traffic to all selected Pods
}
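As a real manifest, a Service is just a selector plus a port mapping; a minimal sketch:
apiVersion: v1
kind: Service
metadata:
  name: researcherai-demo
spec:
  type: LoadBalancer
  selector:
    app: researcherai            # traffic goes to all Pods with this label
  ports:
    - port: 80                   # external port
      targetPort: 8000           # container port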
4. ConfigMaps (Like .env files)
# .env file
DATABASE_URL=postgres://localhost/db
API_KEY=secret123
# Kubernetes ConfigMap
ConfigMap {
data:
DATABASE_URL: postgres://localhost/db
# Secrets go in separate Secret resource
}
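In real manifests, the .env file splits into a ConfigMap for plain settings and a Secret for credentials; a minimal sketch (illustrative names and values):
apiVersion: v1
kind: ConfigMap
metadata:
  name: researcherai-demo-config
data:
  DATABASE_URL: postgres://localhost/db
---
apiVersion: v1
kind: Secret
metadata:
  name: researcherai-demo-secrets
type: Opaque
stringData:
  API_KEY: secret123             # stored base64-encoded by Kubernetes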
5. Persistent Volumes (Like mounted volumes)
# Docker Compose volume:
volumes:
- ./data:/app/data
# Kubernetes PersistentVolumeClaim:
PersistentVolumeClaim {
storage: 10Gi
accessMode: ReadWriteOnce
}
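And the real PersistentVolumeClaim manifest is equally short; a minimal sketch (the storage class depends on your cluster):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: researcherai-demo-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi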
Helm: Kubernetes Package Manager
Helm is to Kubernetes what npm is to Node.js:
npm (Node.js packages):
npm install express
npm install -g pm2
Helm (Kubernetes packages):
helm install my-app ./chart
helm upgrade my-app ./chart
Helm Chart Structure
researcherai/ # Like package.json + all code
├── Chart.yaml # Package metadata (like package.json)
├── values.yaml # Configuration (like .env.example)
├── templates/ # Kubernetes YAML files (like src/)
│ ├── deployment.yaml
│ ├── service.yaml
│ └── configmap.yaml
└── charts/ # Dependencies (like node_modules/)
Building the ResearcherAI Helm Chart
Now let's build our production deployment step by step.
Step 1: Chart Metadata
First, I created Chart.yaml to define the package:
apiVersion: v2
name: researcherai
description: Production-grade Multi-Agent RAG System
type: application
version: 2.0.0 # Chart version
appVersion: "2.0.0" # Application version
keywords:
- rag
- multi-agent
- knowledge-graph
- vector-search
# Dependencies (like npm dependencies)
dependencies:
- name: neo4j
version: 5.13.0
repository: https://helm.neo4j.com/neo4j
condition: neo4j.enabled
- name: qdrant
version: 0.7.0
repository: https://qdrant.github.io/qdrant-helm
condition: qdrant.enabled
- name: strimzi-kafka-operator
version: 0.38.0
repository: https://strimzi.io/charts/
condition: kafka.enabled
Why dependencies?
- Neo4j and Qdrant have official Helm charts
- No need to reinvent the wheel
- Automatic updates and best practices
- But we still control configuration through our values.yaml
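Before installing, Helm has to pull these dependencies into charts/ (much like npm install); a quick sketch of the commands:
# Fetch chart dependencies declared in Chart.yaml
# (adding the repos explicitly avoids resolution issues with some Helm versions)
helm repo add neo4j https://helm.neo4j.com/neo4j
helm repo add qdrant https://qdrant.github.io/qdrant-helm
helm repo add strimzi https://strimzi.io/charts/
helm dependency update ./k8s/helm/researcherai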
Step 2: Configuration Values
values.yaml is like .env.example - all configurable settings:
# Global settings
global:
namespace: researcherai
storageClass: standard # Cloud provider's default storage
imagePullSecrets: []
# Application configuration
app:
name: researcherai
replicaCount: 2 # Start with 2 replicas for high availability
image:
repository: researcherai/multiagent
tag: "2.0.0"
pullPolicy: IfNotPresent
# Resource limits (prevent runaway costs!)
resources:
requests:
cpu: 1000m # 1 CPU core
memory: 2Gi # 2GB RAM
limits:
cpu: 2000m # Max 2 CPU cores
memory: 4Gi # Max 4GB RAM
# Auto-scaling configuration
autoscaling:
enabled: true
minReplicas: 2
maxReplicas: 10
targetCPUUtilizationPercentage: 80
targetMemoryUtilizationPercentage: 80
# Health checks
health:
livenessProbe:
path: /health
initialDelaySeconds: 30
periodSeconds: 30
readinessProbe:
path: /health
initialDelaySeconds: 10
periodSeconds: 10
# Neo4j configuration
neo4j:
enabled: true
image:
tag: "5.13-community"
# Neo4j memory tuning
config:
dbms.memory.heap.initial_size: "2G"
dbms.memory.heap.max_size: "2G"
dbms.memory.pagecache.size: "1G"
# Enable APOC procedures
apoc:
core:
enabled: true
# Persistent storage
persistentVolume:
size: 10Gi
storageClass: standard
# Qdrant configuration
qdrant:
enabled: true
replicaCount: 1
resources:
requests:
cpu: 1000m
memory: 2Gi
limits:
cpu: 2000m
memory: 4Gi
persistence:
size: 10Gi
storageClass: standard
# Kafka configuration
kafka:
enabled: true
cluster:
name: rag-kafka
version: 3.6.0
replicas: 3 # 3 brokers for high availability
# Storage per broker
storage:
type: persistent-claim
size: 10Gi
storageClass: standard
# Kafka configuration
config:
offsets.topic.replication.factor: 3
transaction.state.log.replication.factor: 3
transaction.state.log.min.isr: 2
default.replication.factor: 3
min.insync.replicas: 2
# Zookeeper configuration
zookeeper:
replicas: 3
storage:
type: persistent-claim
size: 5Gi
storageClass: standard
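Any of these values can be overridden per environment at install time, either with an extra values file or with --set flags; for example:
# Override individual settings without editing values.yaml
helm upgrade --install researcherai ./k8s/helm/researcherai \
  --namespace researcherai --create-namespace \
  --set app.replicaCount=3 \
  --set qdrant.persistence.size=20Gi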
Step 3: Application Deployment
The core deployment template (templates/app-deployment.yaml):
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ include "researcherai.fullname" . }}
namespace: {{ .Values.global.namespace }}
labels:
{{- include "researcherai.labels" . | nindent 4 }}
spec:
replicas: {{ .Values.app.replicaCount }}
# Rolling update strategy
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1 # Add 1 extra pod during update
maxUnavailable: 0 # Keep all pods running during update
selector:
matchLabels:
{{- include "researcherai.selectorLabels" . | nindent 6 }}
template:
metadata:
labels:
{{- include "researcherai.selectorLabels" . | nindent 8 }}
annotations:
# Force restart when config changes
checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
checksum/secret: {{ include (print $.Template.BasePath "/secret.yaml") . | sha256sum }}
spec:
# Security context
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
# Init containers - wait for dependencies
initContainers:
# Wait for Neo4j
- name: wait-for-neo4j
image: busybox:1.35
command:
- sh
- -c
- |
until nc -z {{ include "researcherai.neo4jHost" . }} 7687; do
echo "Waiting for Neo4j..."
sleep 2
done
# Wait for Qdrant
- name: wait-for-qdrant
image: busybox:1.35
command:
- sh
- -c
- |
until nc -z {{ include "researcherai.qdrantHost" . }} 6333; do
echo "Waiting for Qdrant..."
sleep 2
done
# Wait for Kafka
- name: wait-for-kafka
image: busybox:1.35
command:
- sh
- -c
- |
until nc -z {{ include "researcherai.kafkaBootstrap" . }} 9092; do
echo "Waiting for Kafka..."
sleep 2
done
# Main application container
containers:
- name: researcherai
image: "{{ .Values.app.image.repository }}:{{ .Values.app.image.tag }}"
imagePullPolicy: {{ .Values.app.image.pullPolicy }}
# Container ports
ports:
- name: http
containerPort: 8000
protocol: TCP
# Environment variables from ConfigMap
envFrom:
- configMapRef:
name: {{ include "researcherai.fullname" . }}-config
- secretRef:
name: {{ include "researcherai.fullname" . }}-secrets
# Resource limits
resources:
{{- toYaml .Values.app.resources | nindent 12 }}
# Liveness probe - restart if unhealthy
livenessProbe:
httpGet:
path: {{ .Values.app.health.livenessProbe.path }}
port: http
initialDelaySeconds: {{ .Values.app.health.livenessProbe.initialDelaySeconds }}
periodSeconds: {{ .Values.app.health.livenessProbe.periodSeconds }}
timeoutSeconds: 5
failureThreshold: 3
# Readiness probe - don't send traffic if not ready
readinessProbe:
httpGet:
path: {{ .Values.app.health.readinessProbe.path }}
port: http
initialDelaySeconds: {{ .Values.app.health.readinessProbe.initialDelaySeconds }}
periodSeconds: {{ .Values.app.health.readinessProbe.periodSeconds }}
timeoutSeconds: 3
failureThreshold: 3
# Volumes
volumeMounts:
- name: logs
mountPath: /app/logs
- name: sessions
mountPath: /app/sessions
# Volume definitions
volumes:
- name: logs
emptyDir: {}
- name: sessions
emptyDir: {}
Key Features:
- Init Containers: Wait for dependencies before starting
- Rolling Updates: Zero-downtime deployments
- Health Checks: Automatic restart if unhealthy
- Security Context: Run as non-root user
- Resource Limits: Prevent runaway costs
- Configuration Management: Auto-restart on config changes
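Before deploying, it's worth rendering the chart locally to catch templating mistakes; for example:
# Render the chart and inspect only the generated Deployment manifest
helm template researcherai ./k8s/helm/researcherai \
  --namespace researcherai \
  --show-only templates/app-deployment.yaml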
Step 4: Auto-Scaling
Horizontal Pod Autoscaler (templates/hpa.yaml):
{{- if .Values.app.autoscaling.enabled }}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: {{ include "researcherai.fullname" . }}
namespace: {{ .Values.global.namespace }}
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: {{ include "researcherai.fullname" . }}
minReplicas: {{ .Values.app.autoscaling.minReplicas }}
maxReplicas: {{ .Values.app.autoscaling.maxReplicas }}
metrics:
# Scale based on CPU usage
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: {{ .Values.app.autoscaling.targetCPUUtilizationPercentage }}
# Scale based on memory usage
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: {{ .Values.app.autoscaling.targetMemoryUtilizationPercentage }}
# Scaling behavior
behavior:
scaleDown:
stabilizationWindowSeconds: 300 # Wait 5 minutes before scaling down
policies:
- type: Percent
value: 50 # Remove max 50% of pods at once
periodSeconds: 60
scaleUp:
stabilizationWindowSeconds: 0 # Scale up immediately
policies:
- type: Percent
value: 100 # Can double pods quickly
periodSeconds: 15
{{- end }}
How it works:
Traffic Spike:
CPU > 80% → Add pods (up to max 10)
CPU < 80% for 5min → Remove pods (down to min 2)
Memory Pressure:
Memory > 80% → Add pods
Memory < 80% → Remove pods
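Once deployed, you can inspect the autoscaler's current metrics and recent scaling decisions directly:
# Show targets, current utilization, and scaling events
kubectl describe hpa researcherai -n researcherai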
Step 5: Kafka Event Streaming
Kafka cluster configuration (templates/kafka-cluster.yaml):
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
name: {{ .Values.kafka.cluster.name }}
namespace: {{ .Values.global.namespace }}
spec:
# Kafka brokers
kafka:
version: {{ .Values.kafka.cluster.version }}
replicas: {{ .Values.kafka.cluster.replicas }}
listeners:
- name: plain
port: 9092
type: internal
tls: false
- name: tls
port: 9093
type: internal
tls: true
config:
# these values.yaml keys contain dots, so access them with index rather than dot-paths
offsets.topic.replication.factor: {{ index .Values.kafka.cluster.config "offsets.topic.replication.factor" }}
transaction.state.log.replication.factor: {{ index .Values.kafka.cluster.config "transaction.state.log.replication.factor" }}
transaction.state.log.min.isr: {{ index .Values.kafka.cluster.config "transaction.state.log.min.isr" }}
default.replication.factor: {{ index .Values.kafka.cluster.config "default.replication.factor" }}
min.insync.replicas: {{ index .Values.kafka.cluster.config "min.insync.replicas" }}
# Performance tuning
num.network.threads: 8
num.io.threads: 8
socket.send.buffer.bytes: 102400
socket.receive.buffer.bytes: 102400
socket.request.max.bytes: 104857600
# Storage
storage:
type: {{ .Values.kafka.cluster.storage.type }}
size: {{ .Values.kafka.cluster.storage.size }}
class: {{ .Values.kafka.cluster.storage.storageClass }}
deleteClaim: false # Keep data when deleting cluster
# Zookeeper ensemble
zookeeper:
replicas: {{ .Values.kafka.zookeeper.replicas }}
storage:
type: {{ .Values.kafka.zookeeper.storage.type }}
size: {{ .Values.kafka.zookeeper.storage.size }}
class: {{ .Values.kafka.zookeeper.storage.storageClass }}
deleteClaim: false
# Entity Operator (manages topics and users)
entityOperator:
topicOperator: {}
userOperator: {}
Kafka topics for our event-driven architecture (templates/kafka-topics.yaml):
# Query flow topics
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
name: query.submitted
namespace: {{ .Values.global.namespace }}
labels:
strimzi.io/cluster: {{ .Values.kafka.cluster.name }}
spec:
partitions: 3
replicas: 3
config:
retention.ms: 604800000 # 7 days
segment.bytes: 1073741824 # 1GB
compression.type: producer
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
name: data.collection.started
namespace: {{ .Values.global.namespace }}
labels:
strimzi.io/cluster: {{ .Values.kafka.cluster.name }}
spec:
partitions: 3
replicas: 3
config:
retention.ms: 604800000
segment.bytes: 1073741824
compression.type: producer
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
name: data.collection.completed
namespace: {{ .Values.global.namespace }}
labels:
strimzi.io/cluster: {{ .Values.kafka.cluster.name }}
spec:
partitions: 3
replicas: 3
config:
retention.ms: 604800000
segment.bytes: 1073741824
compression.type: producer
# ... (16 total topics for complete event flow)
Topics I created (see the loop sketch below):
- Query topics: query.submitted, query.validated
- Data collection: data.collection.{started,completed,failed}
- Graph processing: graph.processing.{started,completed,failed}
- Vector processing: vector.processing.{started,completed,failed}
- Reasoning: reasoning.{started,completed,failed}
- Health: agent.health.check, agent.error
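Rather than hand-writing 16 near-identical manifests, the topics can be generated with a Helm loop. A sketch, assuming a kafka.topics list is added to values.yaml (not part of the chart as shown above):
# values.yaml (addition)
kafka:
  topics:
    - query.submitted
    - query.validated
    - data.collection.started
    # ... remaining topics

# templates/kafka-topics.yaml
{{- range .Values.kafka.topics }}
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: {{ . }}
  namespace: {{ $.Values.global.namespace }}
  labels:
    strimzi.io/cluster: {{ $.Values.kafka.cluster.name }}
spec:
  partitions: 3
  replicas: 3
  config:
    retention.ms: 604800000
    segment.bytes: 1073741824
    compression.type: producer
{{- end }}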
Step 6: Neo4j StatefulSet
Neo4j needs persistent storage and a stable network identity:
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: {{ include "researcherai.fullname" . }}-neo4j
namespace: {{ .Values.global.namespace }}
spec:
serviceName: {{ include "researcherai.fullname" . }}-neo4j
replicas: 1 # Single instance for simplicity
selector:
matchLabels:
app: neo4j
template:
metadata:
labels:
app: neo4j
spec:
securityContext:
runAsNonRoot: true
runAsUser: 7474
fsGroup: 7474
containers:
- name: neo4j
image: neo4j:{{ .Values.neo4j.image.tag }}
ports:
- name: http
containerPort: 7474
- name: bolt
containerPort: 7687
env:
# Memory configuration
- name: NEO4J_dbms_memory_heap_initial__size
  value: {{ index .Values.neo4j.config "dbms.memory.heap.initial_size" | quote }}
- name: NEO4J_dbms_memory_heap_max__size
  value: {{ index .Values.neo4j.config "dbms.memory.heap.max_size" | quote }}
- name: NEO4J_dbms_memory_pagecache_size
  value: {{ index .Values.neo4j.config "dbms.memory.pagecache.size" | quote }}
# Enable APOC
- name: NEO4J_apoc_export_file_enabled
value: "true"
- name: NEO4J_apoc_import_file_enabled
value: "true"
- name: NEO4J_dbms_security_procedures_unrestricted
value: "apoc.*"
# Authentication (NEO4J_AUTH expects the form neo4j/<password>, so the secret value must include the "neo4j/" prefix)
- name: NEO4J_AUTH
valueFrom:
secretKeyRef:
name: {{ include "researcherai.fullname" . }}-secrets
key: NEO4J_PASSWORD
volumeMounts:
- name: data
mountPath: /data
- name: logs
mountPath: /logs
resources:
{{- toYaml .Values.neo4j.resources | nindent 12 }}
# Volume claim templates
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: ["ReadWriteOnce"]
storageClassName: {{ .Values.neo4j.persistentVolume.storageClass }}
resources:
requests:
storage: {{ .Values.neo4j.persistentVolume.size }}
- metadata:
name: logs
spec:
accessModes: ["ReadWriteOnce"]
storageClassName: {{ .Values.neo4j.persistentVolume.storageClass }}
resources:
requests:
storage: 1Gi
Why StatefulSet?
- Stable network identity: Pod always gets same DNS name
- Ordered deployment: Pods start/stop in order
- Persistent storage: Each pod gets its own persistent volume
- Data survives restarts: Storage persists even if pod is deleted
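You can see the effect of volumeClaimTemplates after deployment; each pod gets its own PVCs, named after the claim template and the pod:
kubectl get pvc -n researcherai
# Expect entries like (exact names depend on the release name):
#   data-researcherai-neo4j-0   Bound   10Gi
#   logs-researcherai-neo4j-0   Bound   1Gi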
Step 7: Service Discovery
Services make pods discoverable (templates/service.yaml):
# Application service
apiVersion: v1
kind: Service
metadata:
name: {{ include "researcherai.fullname" . }}
namespace: {{ .Values.global.namespace }}
spec:
type: ClusterIP
ports:
- port: 8000
targetPort: http
protocol: TCP
name: http
selector:
{{- include "researcherai.selectorLabels" . | nindent 4 }}
---
# Neo4j service
apiVersion: v1
kind: Service
metadata:
name: {{ include "researcherai.fullname" . }}-neo4j
namespace: {{ .Values.global.namespace }}
spec:
type: ClusterIP
ports:
- port: 7474
targetPort: 7474
name: http
- port: 7687
targetPort: 7687
name: bolt
selector:
app: neo4j
---
# Qdrant service
apiVersion: v1
kind: Service
metadata:
name: {{ include "researcherai.fullname" . }}-qdrant
namespace: {{ .Values.global.namespace }}
spec:
type: ClusterIP
ports:
- port: 6333
targetPort: 6333
name: http
- port: 6334
targetPort: 6334
name: grpc
selector:
app.kubernetes.io/name: qdrant
DNS names applications use:
# Automatic service discovery in Kubernetes
NEO4J_URI = "bolt://researcherai-neo4j:7687"
QDRANT_HOST = "researcherai-qdrant:6333"
KAFKA_BOOTSTRAP_SERVERS = "rag-kafka-kafka-bootstrap:9092"
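These values are what the application ConfigMap referenced by the Deployment's envFrom carries; the chart's configmap.yaml isn't shown above, but a sketch of it might look like this:
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "researcherai.fullname" . }}-config
  namespace: {{ .Values.global.namespace }}
data:
  NEO4J_URI: "bolt://{{ include "researcherai.fullname" . }}-neo4j:7687"
  QDRANT_HOST: "{{ include "researcherai.fullname" . }}-qdrant:6333"
  KAFKA_BOOTSTRAP_SERVERS: "{{ .Values.kafka.cluster.name }}-kafka-bootstrap:9092"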
Step 8: Ingress for External Access
The Ingress routes traffic from the internet to our application:
{{- if .Values.ingress.enabled }}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: {{ include "researcherai.fullname" . }}
namespace: {{ .Values.global.namespace }}
annotations:
# NGINX Ingress Controller
nginx.ingress.kubernetes.io/rewrite-target: /
nginx.ingress.kubernetes.io/ssl-redirect: "true"
nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
# cert-manager for Let's Encrypt SSL
cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
ingressClassName: nginx
tls:
- hosts:
- {{ .Values.ingress.host }}
secretName: {{ include "researcherai.fullname" . }}-tls
rules:
- host: {{ .Values.ingress.host }}
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: {{ include "researcherai.fullname" . }}
port:
number: 8000
{{- end }}
Traffic flow:
Internet
↓
Ingress Controller (nginx) → SSL termination
↓
Service (load balancer)
↓
Pods (2-10 replicas)
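For this template to render, values.yaml also needs an ingress block (it isn't part of the values shown earlier); a minimal sketch:
# values.yaml (addition)
ingress:
  enabled: false                 # turn on per environment
  host: researcherai.example.com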
Deploying to Kubernetes
Now let's deploy everything!
Prerequisites
# 1. Install kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
# 2. Install Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
# 3. Verify installations
kubectl version --client
helm version
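If you don't have a cluster yet, a local one is enough to follow along; for example with kind (minikube works too):
# Optional: local test cluster
kind create cluster --name researcherai
kubectl cluster-info --context kind-researcherai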
Option 1: Automated Deployment Script
I created a deployment script that handles everything:
#!/bin/bash
# k8s/scripts/deploy.sh
set -e
echo "ResearcherAI Kubernetes Deployment"
echo "=================================="
# Check prerequisites
if ! command -v kubectl &> /dev/null; then
echo "Error: kubectl not found"
exit 1
fi
if ! command -v helm &> /dev/null; then
echo "Error: helm not found"
exit 1
fi
# Check cluster connection
echo "Checking Kubernetes cluster connection..."
if ! kubectl cluster-info &> /dev/null; then
echo "Error: Cannot connect to Kubernetes cluster"
exit 1
fi
# Install Strimzi Kafka Operator
echo "Installing Strimzi Kafka Operator..."
kubectl create namespace kafka --dry-run=client -o yaml | kubectl apply -f -
helm repo add strimzi https://strimzi.io/charts/
helm repo update
helm upgrade --install strimzi-kafka-operator strimzi/strimzi-kafka-operator \
--namespace kafka \
--set watchAnyNamespace=true \
--wait
# Get configuration
read -p "Enter your Google API key: " -s GOOGLE_API_KEY
echo
read -p "Enter Neo4j password: " -s NEO4J_PASSWORD
echo
# Create values file
cat > /tmp/researcherai-values.yaml <<EOF
app:
secrets:
googleApiKey: "${GOOGLE_API_KEY}"
neo4jPassword: "${NEO4J_PASSWORD}"
autoscaling:
enabled: true
minReplicas: 2
maxReplicas: 10
neo4j:
enabled: true
qdrant:
enabled: true
kafka:
enabled: true
EOF
# Install/upgrade ResearcherAI
echo "Deploying ResearcherAI..."
helm upgrade --install researcherai ./k8s/helm/researcherai \
--namespace researcherai \
--create-namespace \
--values /tmp/researcherai-values.yaml \
--wait \
--timeout 10m
# Clean up values file
rm /tmp/researcherai-values.yaml
# Show deployment status
echo ""
echo "Deployment completed successfully!"
echo ""
echo "Checking pod status..."
kubectl get pods -n researcherai
echo ""
echo "Getting service information..."
kubectl get services -n researcherai
echo ""
echo "To watch pod status:"
echo " kubectl get pods -n researcherai --watch"
echo ""
echo "To view logs:"
echo " kubectl logs -n researcherai -l app.kubernetes.io/name=researcherai --follow"
echo ""
echo "To access the application:"
echo " kubectl port-forward -n researcherai svc/researcherai 8000:8000"
echo " Then visit: http://localhost:8000"
Run the deployment:
chmod +x k8s/scripts/deploy.sh
./k8s/scripts/deploy.sh
Option 2: Manual Step-by-Step Deployment
Step 1: Install Strimzi Operator
# Create Kafka namespace
kubectl create namespace kafka
# Add Strimzi Helm repo
helm repo add strimzi https://strimzi.io/charts/
helm repo update
# Install operator
helm install strimzi-kafka-operator strimzi/strimzi-kafka-operator \
--namespace kafka \
--set watchAnyNamespace=true
Step 2: Create Configuration File
# custom-values.yaml
app:
replicaCount: 2
secrets:
googleApiKey: "YOUR_GOOGLE_API_KEY"
neo4jPassword: "secure-neo4j-password"
autoscaling:
enabled: true
minReplicas: 2
maxReplicas: 10
targetCPUUtilizationPercentage: 80
neo4j:
enabled: true
config:
dbms.memory.heap.initial_size: "2G"
dbms.memory.heap.max_size: "2G"
qdrant:
enabled: true
persistence:
size: 20Gi # Adjust based on your needs
kafka:
enabled: true
cluster:
replicas: 3
ingress:
enabled: true
host: researcherai.yourdomain.com
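It's worth validating the chart and your values before the real install:
# Lint the chart and do a dry run with your values
helm lint ./k8s/helm/researcherai --values custom-values.yaml
helm install researcherai ./k8s/helm/researcherai \
  --namespace researcherai --create-namespace \
  --values custom-values.yaml \
  --dry-run --debug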
Step 3: Install ResearcherAI
helm install researcherai ./k8s/helm/researcherai \
--namespace researcherai \
--create-namespace \
--values custom-values.yaml \
--wait
Step 4: Verify Deployment
# Watch pods start up
kubectl get pods -n researcherai --watch
# Expected output:
NAME READY STATUS RESTARTS AGE
researcherai-6b8f7d9c5d-abcde 1/1 Running 0 2m
researcherai-6b8f7d9c5d-fghij 1/1 Running 0 2m
researcherai-neo4j-0 1/1 Running 0 3m
researcherai-qdrant-7d9b8c6f5-xyz 1/1 Running 0 3m
rag-kafka-kafka-0 1/1 Running 0 4m
rag-kafka-kafka-1 1/1 Running 0 4m
rag-kafka-kafka-2 1/1 Running 0 4m
rag-kafka-zookeeper-0 1/1 Running 0 5m
rag-kafka-zookeeper-1 1/1 Running 0 5m
rag-kafka-zookeeper-2 1/1 Running 0 5m
# Check services
kubectl get services -n researcherai
# Check Kafka topics
kubectl get kafkatopics -n researcherai
Step 5: Access the Application
# Port forward to local machine
kubectl port-forward -n researcherai svc/researcherai 8000:8000
# Visit in browser
open http://localhost:8000
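A quick smoke test against the health endpoint used by the probes:
# With the port-forward still running
curl http://localhost:8000/health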
Testing Auto-Scaling
Let's verify that auto-scaling works:
Generate Load
# Install Apache Bench
sudo apt-get install apache2-utils
# Generate load (100 concurrent requests, 10000 total)
# Note: ab sends GET requests by default; use -p payload.json -T application/json to POST a body if the endpoint requires it
ab -n 10000 -c 100 http://localhost:8000/api/query
Watch Scaling
# In another terminal, watch HPA
kubectl get hpa -n researcherai --watch
# Expected output:
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
researcherai Deployment/researcherai 45%/80%, 30%/80% 2 10 2 10m
researcherai Deployment/researcherai 85%/80%, 35%/80% 2 10 3 10m
researcherai Deployment/researcherai 92%/80%, 40%/80% 2 10 4 11m
# Watch pods being created
kubectl get pods -n researcherai --watch
Monitoring Deployment
View Logs
# All application pods
kubectl logs -n researcherai -l app.kubernetes.io/name=researcherai --follow
# Specific pod
kubectl logs -n researcherai researcherai-6b8f7d9c5d-abcde --follow
# Previous crashed container
kubectl logs -n researcherai researcherai-6b8f7d9c5d-abcde --previous
# Multiple containers in a pod
kubectl logs -n researcherai researcherai-6b8f7d9c5d-abcde -c researcherai
Check Resource Usage
# Pod resource usage
kubectl top pods -n researcherai
# Node resource usage
kubectl top nodes
# Detailed pod description
kubectl describe pod -n researcherai researcherai-6b8f7d9c5d-abcde
Check Events
# Namespace events
kubectl get events -n researcherai --sort-by='.lastTimestamp'
# Watch events in real-time
kubectl get events -n researcherai --watch
Updating the Application
Rolling Update
# Update image tag in values
app:
image:
tag: "2.1.0" # New version
# Apply upgrade
helm upgrade researcherai ./k8s/helm/researcherai \
--namespace researcherai \
--values custom-values.yaml
# Watch rolling update
kubectl rollout status deployment/researcherai -n researcherai
Rollback if Needed
# View revision history
helm history researcherai -n researcherai
# Rollback to previous revision
helm rollback researcherai -n researcherai
# Rollback to specific revision
helm rollback researcherai 3 -n researcherai
Production Considerations
1. Resource Planning
Calculate Total Resources:
# Application: 2-10 pods
CPU: 1000m request × 2 pods = 2 cores minimum
CPU: 2000m limit × 10 pods = 20 cores maximum
RAM: 2Gi × 2 = 4Gi minimum
RAM: 4Gi × 10 = 40Gi maximum
# Neo4j: 1 pod
CPU: 1 core
RAM: 4Gi (2G heap + 1G page cache + overhead)
Storage: 10Gi
# Qdrant: 1 pod
CPU: 1 core
RAM: 2Gi
Storage: 10Gi
# Kafka: 3 brokers
CPU: 3 cores
RAM: 6Gi
Storage: 30Gi
# Zookeeper: 3 nodes
CPU: 1.5 cores
RAM: 3Gi
Storage: 15Gi
# TOTAL MINIMUM:
CPU: ~10 cores
RAM: ~19Gi
Storage: ~65Gi
# TOTAL MAXIMUM (when auto-scaled):
CPU: ~30 cores
RAM: ~55Gi
Kubernetes Cluster Sizing:
For production, I recommend:
- 3-node cluster (high availability)
- Each node: 8 cores, 32GB RAM, 100GB SSD
- Total: 24 cores, 96GB RAM, 300GB storage
- Cost: ~$500-700/month (varies by cloud provider)
2. High Availability Checklist
- ✅ Multiple pod replicas (2-10)
- ✅ Pod Disruption Budget configured (see the sketch after this list)
- ✅ Kafka 3-broker cluster
- ✅ Zookeeper 3-node ensemble
- ✅ Rolling updates with maxUnavailable: 0
- ⚠️ Neo4j single replica (consider clustering)
- ⚠️ Qdrant single replica (configure replication)
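The Pod Disruption Budget from the checklist isn't among the templates shown above; a minimal sketch (templates/pdb.yaml) that keeps at least one application pod available during voluntary disruptions:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: {{ include "researcherai.fullname" . }}
  namespace: {{ .Values.global.namespace }}
spec:
  minAvailable: 1
  selector:
    matchLabels:
      {{- include "researcherai.selectorLabels" . | nindent 6 }}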
3. Backup Strategy
What to backup:
- Neo4j data (persistent volume)
- Qdrant collections (persistent volume)
- Kubernetes configurations (Git)
- Secrets (external secret management)
Backup tools:
# Install Velero for Kubernetes backups
helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts
helm install velero vmware-tanzu/velero \
--namespace velero \
--create-namespace \
--set configuration.provider=aws \
--set configuration.backupStorageLocation.bucket=researcherai-backups \
--set configuration.backupStorageLocation.config.region=us-east-1
# Schedule daily backups
velero schedule create daily-backup \
--schedule="0 2 * * *" \
--include-namespaces researcherai
4. Security Hardening
Network Policies:
# Restrict pod-to-pod communication
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: researcherai-network-policy
namespace: researcherai
spec:
podSelector:
matchLabels:
app.kubernetes.io/name: researcherai
policyTypes:
- Ingress
- Egress
ingress:
# Allow from ingress controller
- from:
- namespaceSelector:
matchLabels:
name: ingress-nginx
ports:
- protocol: TCP
port: 8000
egress:
# Allow to Neo4j
- to:
- podSelector:
matchLabels:
app: neo4j
ports:
- protocol: TCP
port: 7687
# Allow to Qdrant
- to:
- podSelector:
matchLabels:
app.kubernetes.io/name: qdrant
ports:
- protocol: TCP
port: 6333
# Allow to Kafka
- to:
- podSelector:
matchLabels:
strimzi.io/cluster: rag-kafka
ports:
- protocol: TCP
port: 9092
# Allow DNS
- to:
- namespaceSelector: {}
podSelector:
matchLabels:
k8s-app: kube-dns
ports:
- protocol: UDP
port: 53
# Allow internet (for API calls)
- to:
- namespaceSelector: {}
ports:
- protocol: TCP
port: 443
Pod Security Standards:
# Enforce restricted security standard
apiVersion: v1
kind: Namespace
metadata:
name: researcherai
labels:
pod-security.kubernetes.io/enforce: restricted
pod-security.kubernetes.io/audit: restricted
pod-security.kubernetes.io/warn: restricted
5. Cost Optimization
Tips to reduce costs:
- Use spot/preemptible instances for non-production
- Right-size resources - adjust requests/limits based on actual usage
- Scale to zero for dev/staging environments when not in use
- Use cluster autoscaler - add/remove nodes based on demand
- Monitor costs - use tools like Kubecost
Example: Development Environment
# dev-values.yaml
app:
replicaCount: 1 # Only 1 replica
resources:
requests:
cpu: 500m # Half resources
memory: 1Gi
autoscaling:
enabled: false # No auto-scaling in dev
neo4j:
config:
dbms.memory.heap.initial_size: "512M" # Less memory
dbms.memory.heap.max_size: "512M"
kafka:
cluster:
replicas: 1 # Single broker for dev
zookeeper:
replicas: 1 # Single zookeeper
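Deploying the trimmed-down profile is then just a matter of passing that file (the namespace name here is illustrative):
helm upgrade --install researcherai ./k8s/helm/researcherai \
  --namespace researcherai-dev --create-namespace \
  --values dev-values.yaml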
Troubleshooting
Pod Won't Start
# Check pod status
kubectl describe pod -n researcherai <pod-name>
# Common issues:
# 1. Image pull error - check image name/tag
# 2. Init container failing - check dependency services
# 3. Resource constraints - check node resources
# Check events
kubectl get events -n researcherai --sort-by='.lastTimestamp'
Application Crashes
# Check logs
kubectl logs -n researcherai <pod-name> --previous
# Check resource limits
kubectl top pod -n researcherai <pod-name>
# Common issues:
# - Out of memory (OOMKilled)
# - Unhandled exceptions
# - Database connection failures
Can't Connect to Database
# Test Neo4j connection
kubectl run -it --rm debug --image=busybox --restart=Never -n researcherai -- sh
# Inside the debug pod:
nc -zv researcherai-neo4j 7687
# Check Neo4j logs
kubectl logs -n researcherai researcherai-neo4j-0
# Verify service
kubectl get svc -n researcherai researcherai-neo4j
Kafka Issues
# Check Kafka cluster status
kubectl get kafka -n researcherai
# Check topics
kubectl get kafkatopics -n researcherai
# Connect to Kafka pod
kubectl exec -it -n researcherai rag-kafka-kafka-0 -- bash
# Inside the pod, list topics (Kafka scripts live under /opt/kafka/bin in Strimzi images)
/opt/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 --list
Next Steps
We've successfully deployed ResearcherAI to Kubernetes with:
- ✅ Auto-scaling (2-10 replicas)
- ✅ High availability (multiple brokers, rolling updates)
- ✅ Persistent storage (StatefulSets, PVCs)
- ✅ Service discovery (Kubernetes DNS)
- ✅ Event streaming (Kafka cluster)
- ✅ Health checks (liveness/readiness probes)
But we're not done yet! In the next chapter, we'll cover:
- Terraform - Infrastructure as Code for cloud resources
- Observability - Prometheus, Grafana, distributed tracing
- CI/CD - Automated deployments with GitHub Actions
- Production readiness - Security, backup, disaster recovery
Let's continue with Terraform deployment in the next chapter!