
Chapter 6A: Kubernetes & Helm Deployment

After building and testing our multi-agent RAG system, we need a production-grade deployment strategy. In this chapter, I'll walk you through deploying ResearcherAI to Kubernetes using Helm charts - taking it from a development Docker Compose setup to cloud-native, auto-scaling infrastructure.

Why Kubernetes for ResearcherAI?

Before diving into implementation, let me explain why I chose Kubernetes for production deployment:

The Challenge

Our ResearcherAI system has complex requirements:

  • Five core services: the application, Neo4j, Qdrant, Kafka (3 brokers), and Zookeeper (3 nodes)
  • Stateful data: Graph database and vector database need persistent storage
  • Event streaming: Kafka cluster requires coordination and high availability
  • Variable load: Research queries can spike unpredictably
  • Resource management: LLM API calls are expensive, need cost controls

Docker Compose vs. Kubernetes

Docker Compose (Development)

version: '3.8'
services:
  app:
    build: .
    ports:
      - "8000:8000"
    depends_on:
      - neo4j
      - qdrant
      - kafka

This works great for development, but in production we need:

  • Auto-scaling when traffic increases
  • Automatic restarts when services crash
  • Rolling updates with zero downtime
  • Resource limits to control costs
  • Health checks and monitoring
  • Multiple replicas for high availability

Web Developer Analogy

Think of Kubernetes like moving from a single server to a managed platform:

Docker Compose = Deploying your Node.js app on a single VPS

  • You SSH in, run docker-compose up
  • Works great until traffic increases
  • Manual scaling, manual recovery

Kubernetes = Deploying to a platform like Heroku or Vercel

  • Platform handles scaling automatically
  • Self-healing when things break
  • Built-in load balancing
  • But more complex to set up

Kubernetes Basics for Web Developers

If you're coming from web development, here's how Kubernetes concepts map to what you know:

Core Concepts

1. Pods (Like containers, but smarter)

// In Express, you run one server:
const app = express();
app.listen(3000);

// In Kubernetes, a Pod runs one or more containers together:
Pod {
  containers: [app, sidecar]  // Usually just one
  shared_network: true
  shared_storage: true
}

2. Deployments (Like PM2 or Forever)

// PM2 keeps your app running:
pm2 start app.js -i 4  // 4 instances

// Kubernetes Deployment does the same:
Deployment {
  replicas: 4
  auto_restart: true
  rolling_updates: true
}

3. Services (Like Nginx reverse proxy)

# Nginx load balances to backends:
upstream backend {
  server 10.0.0.1:8000;
  server 10.0.0.2:8000;
}

# Kubernetes Service does this automatically:
Service {
  type: LoadBalancer
  selects: Pods with label "app=researcherai"
  distributes: traffic to all selected Pods
}

4. ConfigMaps (Like .env files)

# .env file
DATABASE_URL=postgres://localhost/db
API_KEY=secret123

# Kubernetes ConfigMap
ConfigMap {
  data:
    DATABASE_URL: postgres://localhost/db
    # Secrets go in a separate Secret resource
}

5. Persistent Volumes (Like mounted volumes)

# Docker Compose volume:
volumes:
  - ./data:/app/data

# Kubernetes PersistentVolumeClaim:
PersistentVolumeClaim {
  storage: 10Gi
  accessMode: ReadWriteOnce
}

Helm: Kubernetes Package Manager

Helm is to Kubernetes what npm is to Node.js:

npm (Node.js packages):

npm install express
npm install -g pm2

Helm (Kubernetes packages):

helm install my-app ./chart
helm upgrade my-app ./chart
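
The day-to-day workflow maps over as well. A few standard Helm commands you'll use alongside install and upgrade (the chart and release names here are just examples):

# Add a chart repository (like configuring an npm registry)
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

# Search for charts (like npm search)
helm search repo kafka

# List installed releases (like npm ls)
helm list --all-namespaces

# Remove a release (like npm uninstall)
helm uninstall my-app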

Helm Chart Structure

researcherai/                # Like package.json + all code
├── Chart.yaml               # Package metadata (like package.json)
├── values.yaml              # Configuration (like .env.example)
├── templates/               # Kubernetes YAML files (like src/)
│   ├── deployment.yaml
│   ├── service.yaml
│   └── configmap.yaml
└── charts/                  # Dependencies (like node_modules/)

Building the ResearcherAI Helm Chart

Now let's build our production deployment step by step.

Step 1: Chart Metadata

First, I created Chart.yaml to define the package:

apiVersion: v2
name: researcherai
description: Production-grade Multi-Agent RAG System
type: application
version: 2.0.0        # Chart version
appVersion: "2.0.0"   # Application version
keywords:
  - rag
  - multi-agent
  - knowledge-graph
  - vector-search

# Dependencies (like npm dependencies)
dependencies:
  - name: neo4j
    version: 5.13.0
    repository: https://helm.neo4j.com/neo4j
    condition: neo4j.enabled

  - name: qdrant
    version: 0.7.0
    repository: https://qdrant.github.io/qdrant-helm
    condition: qdrant.enabled

  - name: strimzi-kafka-operator
    version: 0.38.0
    repository: https://strimzi.io/charts/
    condition: kafka.enabled

Why dependencies?

  • Neo4j and Qdrant have official Helm charts
  • No need to reinvent the wheel
  • Automatic updates and best practices
  • But we still control configuration through our values.yaml
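
Before the chart can be installed, Helm needs to pull those dependency charts down into charts/. A quick sketch of that step, run from the chart directory:

# Fetch the subcharts declared in Chart.yaml into charts/
cd k8s/helm/researcherai
helm dependency update

# Verify what was resolved
helm dependency list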

Step 2: Configuration Values

values.yaml is like .env.example - all configurable settings:

# Global settings
global:
  namespace: researcherai
  storageClass: standard     # Cloud provider's default storage
  imagePullSecrets: []

# Application configuration
app:
  name: researcherai
  replicaCount: 2            # Start with 2 replicas for high availability

  image:
    repository: researcherai/multiagent
    tag: "2.0.0"
    pullPolicy: IfNotPresent

  # Resource limits (prevent runaway costs!)
  resources:
    requests:
      cpu: 1000m             # 1 CPU core
      memory: 2Gi            # 2GB RAM
    limits:
      cpu: 2000m             # Max 2 CPU cores
      memory: 4Gi            # Max 4GB RAM

  # Auto-scaling configuration
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 10
    targetCPUUtilizationPercentage: 80
    targetMemoryUtilizationPercentage: 80

  # Health checks
  health:
    livenessProbe:
      path: /health
      initialDelaySeconds: 30
      periodSeconds: 30
    readinessProbe:
      path: /health
      initialDelaySeconds: 10
      periodSeconds: 10

# Neo4j configuration
neo4j:
  enabled: true
  image:
    tag: "5.13-community"

  # Neo4j memory tuning
  config:
    dbms.memory.heap.initial_size: "2G"
    dbms.memory.heap.max_size: "2G"
    dbms.memory.pagecache.size: "1G"

  # Enable APOC procedures
  apoc:
    core:
      enabled: true

  # Resources for the Neo4j StatefulSet (matches the sizing later in this chapter)
  resources:
    requests:
      cpu: 1000m
      memory: 4Gi
    limits:
      cpu: 1000m
      memory: 4Gi

  # Persistent storage
  persistentVolume:
    size: 10Gi
    storageClass: standard

# Qdrant configuration
qdrant:
  enabled: true
  replicaCount: 1

  resources:
    requests:
      cpu: 1000m
      memory: 2Gi
    limits:
      cpu: 2000m
      memory: 4Gi

  persistence:
    size: 10Gi
    storageClass: standard

# Kafka configuration
kafka:
  enabled: true
  cluster:
    name: rag-kafka
    version: 3.6.0
    replicas: 3              # 3 brokers for high availability

    # Storage per broker
    storage:
      type: persistent-claim
      size: 10Gi
      storageClass: standard

    # Kafka configuration
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      default.replication.factor: 3
      min.insync.replicas: 2

  # Zookeeper configuration
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 5Gi
      storageClass: standard
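
Any of these values can be overridden per environment at install time, either inline or with an extra values file layered on top; a quick sketch (the release name and file names below are examples):

# Override specific values inline
helm install researcherai ./k8s/helm/researcherai \
  --namespace researcherai --create-namespace \
  --set app.replicaCount=3 \
  --set app.image.tag=2.0.1

# Or layer a per-environment values file on top of values.yaml
helm install researcherai ./k8s/helm/researcherai \
  --namespace researcherai --create-namespace \
  -f values.yaml -f values-prod.yaml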

Step 3: Application Deployment

The core deployment template (templates/app-deployment.yaml):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "researcherai.fullname" . }}
  namespace: {{ .Values.global.namespace }}
  labels:
    {{- include "researcherai.labels" . | nindent 4 }}
spec:
  replicas: {{ .Values.app.replicaCount }}

  # Rolling update strategy
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1           # Add 1 extra pod during update
      maxUnavailable: 0     # Keep all pods running during update

  selector:
    matchLabels:
      {{- include "researcherai.selectorLabels" . | nindent 6 }}

  template:
    metadata:
      labels:
        {{- include "researcherai.selectorLabels" . | nindent 8 }}
      annotations:
        # Force restart when config changes
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
        checksum/secret: {{ include (print $.Template.BasePath "/secret.yaml") . | sha256sum }}

    spec:
      # Security context
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000

      # Init containers - wait for dependencies
      initContainers:
        # Wait for Neo4j
        - name: wait-for-neo4j
          image: busybox:1.35
          command:
            - sh
            - -c
            - |
              until nc -z {{ include "researcherai.neo4jHost" . }} 7687; do
                echo "Waiting for Neo4j..."
                sleep 2
              done

        # Wait for Qdrant
        - name: wait-for-qdrant
          image: busybox:1.35
          command:
            - sh
            - -c
            - |
              until nc -z {{ include "researcherai.qdrantHost" . }} 6333; do
                echo "Waiting for Qdrant..."
                sleep 2
              done

        # Wait for Kafka
        - name: wait-for-kafka
          image: busybox:1.35
          command:
            - sh
            - -c
            - |
              until nc -z {{ include "researcherai.kafkaBootstrap" . }} 9092; do
                echo "Waiting for Kafka..."
                sleep 2
              done

      # Main application container
      containers:
        - name: researcherai
          image: "{{ .Values.app.image.repository }}:{{ .Values.app.image.tag }}"
          imagePullPolicy: {{ .Values.app.image.pullPolicy }}

          # Container ports
          ports:
            - name: http
              containerPort: 8000
              protocol: TCP

          # Environment variables from ConfigMap
          envFrom:
            - configMapRef:
                name: {{ include "researcherai.fullname" . }}-config
            - secretRef:
                name: {{ include "researcherai.fullname" . }}-secrets

          # Resource limits
          resources:
            {{- toYaml .Values.app.resources | nindent 12 }}

          # Liveness probe - restart if unhealthy
          livenessProbe:
            httpGet:
              path: {{ .Values.app.health.livenessProbe.path }}
              port: http
            initialDelaySeconds: {{ .Values.app.health.livenessProbe.initialDelaySeconds }}
            periodSeconds: {{ .Values.app.health.livenessProbe.periodSeconds }}
            timeoutSeconds: 5
            failureThreshold: 3

          # Readiness probe - don't send traffic if not ready
          readinessProbe:
            httpGet:
              path: {{ .Values.app.health.readinessProbe.path }}
              port: http
            initialDelaySeconds: {{ .Values.app.health.readinessProbe.initialDelaySeconds }}
            periodSeconds: {{ .Values.app.health.readinessProbe.periodSeconds }}
            timeoutSeconds: 3
            failureThreshold: 3

          # Volumes
          volumeMounts:
            - name: logs
              mountPath: /app/logs
            - name: sessions
              mountPath: /app/sessions

      # Volume definitions
      volumes:
        - name: logs
          emptyDir: {}
        - name: sessions
          emptyDir: {}

Key Features:

  1. Init Containers: Wait for dependencies before starting
  2. Rolling Updates: Zero-downtime deployments
  3. Health Checks: Automatic restart if unhealthy
  4. Security Context: Run as non-root user
  5. Resource Limits: Prevent runaway costs
  6. Configuration Management: Auto-restart on config changes
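
Because the checksum annotations above pull in configmap.yaml and secret.yaml, it's worth rendering the chart locally before pushing it to a cluster; that catches missing template files and indentation mistakes early. A sketch of the checks I find useful (paths follow the chart layout from earlier):

# Static checks on the chart
helm lint ./k8s/helm/researcherai

# Render the templates locally and inspect the generated manifests
helm template researcherai ./k8s/helm/researcherai \
  --namespace researcherai > rendered.yaml

# Optionally let the API server validate them without applying
kubectl apply --dry-run=server -f rendered.yaml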

Step 4: Auto-Scaling

Horizontal Pod Autoscaler (templates/hpa.yaml):

{{- if .Values.app.autoscaling.enabled }}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: {{ include "researcherai.fullname" . }}
  namespace: {{ .Values.global.namespace }}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ include "researcherai.fullname" . }}

  minReplicas: {{ .Values.app.autoscaling.minReplicas }}
  maxReplicas: {{ .Values.app.autoscaling.maxReplicas }}

  metrics:
    # Scale based on CPU usage
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: {{ .Values.app.autoscaling.targetCPUUtilizationPercentage }}

    # Scale based on memory usage
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: {{ .Values.app.autoscaling.targetMemoryUtilizationPercentage }}

  # Scaling behavior
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # Wait 5 minutes before scaling down
      policies:
        - type: Percent
          value: 50                     # Remove max 50% of pods at once
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0     # Scale up immediately
      policies:
        - type: Percent
          value: 100                    # Can double pods quickly
          periodSeconds: 15
{{- end }}

How it works:

Traffic Spike:
CPU > 80% → Add pods (up to max 10)
CPU < 80% for 5min → Remove pods (down to min 2)

Memory Pressure:
Memory > 80% → Add pods
Memory < 80% → Remove pods
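
One prerequisite the HPA does not bring along: it reads CPU and memory from the Kubernetes metrics API, so the cluster needs metrics-server (most managed clusters ship it, many local ones do not). A quick way to check, and to watch the autoscaler's view of things:

# Is the resource metrics API available?
kubectl get apiservices | grep metrics.k8s.io

# If kubectl top works, the HPA has the data it needs
kubectl top pods -n researcherai

# Inspect the autoscaler's current targets and recent events
kubectl describe hpa researcherai -n researcherai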

Step 5: Kafka Event Streaming

Kafka cluster configuration (templates/kafka-cluster.yaml):

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: {{ .Values.kafka.cluster.name }}
  namespace: {{ .Values.global.namespace }}
spec:
  # Kafka brokers
  kafka:
    version: {{ .Values.kafka.cluster.version }}
    replicas: {{ .Values.kafka.cluster.replicas }}

    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true

    config:
      # These value keys contain dots, so look them up with `index` rather than dotted paths
      offsets.topic.replication.factor: {{ index .Values.kafka.cluster.config "offsets.topic.replication.factor" }}
      transaction.state.log.replication.factor: {{ index .Values.kafka.cluster.config "transaction.state.log.replication.factor" }}
      transaction.state.log.min.isr: {{ index .Values.kafka.cluster.config "transaction.state.log.min.isr" }}
      default.replication.factor: {{ index .Values.kafka.cluster.config "default.replication.factor" }}
      min.insync.replicas: {{ index .Values.kafka.cluster.config "min.insync.replicas" }}

      # Performance tuning
      num.network.threads: 8
      num.io.threads: 8
      socket.send.buffer.bytes: 102400
      socket.receive.buffer.bytes: 102400
      socket.request.max.bytes: 104857600

    # Storage
    storage:
      type: {{ .Values.kafka.cluster.storage.type }}
      size: {{ .Values.kafka.cluster.storage.size }}
      class: {{ .Values.kafka.cluster.storage.storageClass }}
      deleteClaim: false    # Keep data when deleting cluster

  # Zookeeper ensemble
  zookeeper:
    replicas: {{ .Values.kafka.zookeeper.replicas }}
    storage:
      type: {{ .Values.kafka.zookeeper.storage.type }}
      size: {{ .Values.kafka.zookeeper.storage.size }}
      class: {{ .Values.kafka.zookeeper.storage.storageClass }}
      deleteClaim: false

  # Entity Operator (manages topics and users)
  entityOperator:
    topicOperator: {}
    userOperator: {}

Kafka topics for our event-driven architecture (templates/kafka-topics.yaml):

# Query flow topics
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: query.submitted
  namespace: {{ .Values.global.namespace }}
  labels:
    strimzi.io/cluster: {{ .Values.kafka.cluster.name }}
spec:
  partitions: 3
  replicas: 3
  config:
    retention.ms: 604800000       # 7 days
    segment.bytes: 1073741824     # 1GB
    compression.type: producer

---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: data.collection.started
  namespace: {{ .Values.global.namespace }}
  labels:
    strimzi.io/cluster: {{ .Values.kafka.cluster.name }}
spec:
  partitions: 3
  replicas: 3
  config:
    retention.ms: 604800000
    segment.bytes: 1073741824
    compression.type: producer

---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: data.collection.completed
  namespace: {{ .Values.global.namespace }}
  labels:
    strimzi.io/cluster: {{ .Values.kafka.cluster.name }}
spec:
  partitions: 3
  replicas: 3
  config:
    retention.ms: 604800000
    segment.bytes: 1073741824
    compression.type: producer

# ... (16 total topics for complete event flow)

Topics I created:

  1. Query Topics: query.submitted, query.validated
  2. Data Collection: data.collection.{started,completed,failed}
  3. Graph Processing: graph.processing.{started,completed,failed}
  4. Vector Processing: vector.processing.{started,completed,failed}
  5. Reasoning: reasoning.{started,completed,failed}
  6. Health: agent.health.check, agent.error
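
Once the cluster and topics are up, a quick smoke test from one of the broker pods confirms events can flow end to end. The topic and pod names follow the configuration above; in Strimzi broker images the Kafka scripts typically live under /opt/kafka/bin:

# List the topics the Topic Operator has created
kubectl get kafkatopics -n researcherai

# Produce a test message to query.submitted ...
kubectl exec -n researcherai -i rag-kafka-kafka-0 -- \
  /opt/kafka/bin/kafka-console-producer.sh --bootstrap-server localhost:9092 \
  --topic query.submitted <<< '{"query": "hello"}'

# ... and read it back
kubectl exec -n researcherai rag-kafka-kafka-0 -- \
  /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic query.submitted --from-beginning --max-messages 1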

Step 6: Neo4j StatefulSet

Neo4j needs persistent storage and stable network identity:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: {{ include "researcherai.fullname" . }}-neo4j
  namespace: {{ .Values.global.namespace }}
spec:
  serviceName: {{ include "researcherai.fullname" . }}-neo4j
  replicas: 1    # Single instance for simplicity

  selector:
    matchLabels:
      app: neo4j

  template:
    metadata:
      labels:
        app: neo4j
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 7474
        fsGroup: 7474

      containers:
        - name: neo4j
          image: neo4j:{{ .Values.neo4j.image.tag }}

          ports:
            - name: http
              containerPort: 7474
            - name: bolt
              containerPort: 7687

          env:
            # Memory configuration (these value keys contain dots, hence `index`)
            - name: NEO4J_dbms_memory_heap_initial__size
              value: {{ index .Values.neo4j.config "dbms.memory.heap.initial_size" }}
            - name: NEO4J_dbms_memory_heap_max__size
              value: {{ index .Values.neo4j.config "dbms.memory.heap.max_size" }}
            - name: NEO4J_dbms_memory_pagecache_size
              value: {{ index .Values.neo4j.config "dbms.memory.pagecache.size" }}

            # Enable APOC
            - name: NEO4J_apoc_export_file_enabled
              value: "true"
            - name: NEO4J_apoc_import_file_enabled
              value: "true"
            - name: NEO4J_dbms_security_procedures_unrestricted
              value: "apoc.*"

            # Authentication
            - name: NEO4J_AUTH
              valueFrom:
                secretKeyRef:
                  name: {{ include "researcherai.fullname" . }}-secrets
                  key: NEO4J_PASSWORD

          volumeMounts:
            - name: data
              mountPath: /data
            - name: logs
              mountPath: /logs

          resources:
            {{- toYaml .Values.neo4j.resources | nindent 12 }}

  # Volume claim templates
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: {{ .Values.neo4j.persistentVolume.storageClass }}
        resources:
          requests:
            storage: {{ .Values.neo4j.persistentVolume.size }}

    - metadata:
        name: logs
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: {{ .Values.neo4j.persistentVolume.storageClass }}
        resources:
          requests:
            storage: 1Gi

Why StatefulSet?

  • Stable network identity: Pod always gets same DNS name
  • Ordered deployment: Pods start/stop in order
  • Persistent storage: Each pod gets its own persistent volume
  • Data survives restarts: Storage persists even if pod is deleted

Step 7: Service Discovery

Services make pods discoverable (templates/service.yaml):

# Application service
apiVersion: v1
kind: Service
metadata:
  name: {{ include "researcherai.fullname" . }}
  namespace: {{ .Values.global.namespace }}
spec:
  type: ClusterIP
  ports:
    - port: 8000
      targetPort: http
      protocol: TCP
      name: http
  selector:
    {{- include "researcherai.selectorLabels" . | nindent 4 }}

---
# Neo4j service
apiVersion: v1
kind: Service
metadata:
  name: {{ include "researcherai.fullname" . }}-neo4j
  namespace: {{ .Values.global.namespace }}
spec:
  type: ClusterIP
  ports:
    - port: 7474
      targetPort: 7474
      name: http
    - port: 7687
      targetPort: 7687
      name: bolt
  selector:
    app: neo4j

---
# Qdrant service
apiVersion: v1
kind: Service
metadata:
  name: {{ include "researcherai.fullname" . }}-qdrant
  namespace: {{ .Values.global.namespace }}
spec:
  type: ClusterIP
  ports:
    - port: 6333
      targetPort: 6333
      name: http
    - port: 6334
      targetPort: 6334
      name: grpc
  selector:
    app.kubernetes.io/name: qdrant

DNS names applications use:

# Automatic service discovery in Kubernetes
NEO4J_URI = "bolt://researcherai-neo4j:7687"
QDRANT_HOST = "researcherai-qdrant:6333"
KAFKA_BOOTSTRAP_SERVERS = "rag-kafka-kafka-bootstrap:9092"
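
These short names resolve through the cluster's internal DNS (they expand to <service>.<namespace>.svc.cluster.local). A quick way to confirm resolution from inside the namespace:

# Start a throwaway pod in the namespace
kubectl run dns-test -n researcherai --rm -it --image=busybox:1.35 --restart=Never -- sh

# Inside the pod:
nslookup researcherai-neo4j
nslookup researcherai-qdrant
nslookup rag-kafka-kafka-bootstrap
nc -zv researcherai-neo4j 7687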

Step 8: Ingress for External Access

An Ingress routes traffic from the internet to our application:

{{- if .Values.ingress.enabled }}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ include "researcherai.fullname" . }}
  namespace: {{ .Values.global.namespace }}
  annotations:
    # NGINX Ingress Controller
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"

    # cert-manager for Let's Encrypt SSL
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx

  tls:
    - hosts:
        - {{ .Values.ingress.host }}
      secretName: {{ include "researcherai.fullname" . }}-tls

  rules:
    - host: {{ .Values.ingress.host }}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: {{ include "researcherai.fullname" . }}
                port:
                  number: 8000
{{- end }}

Traffic flow:

Internet
  ↓
Ingress Controller (nginx) → SSL termination
  ↓
Service (load balancer)
  ↓
Pods (2-10 replicas)
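
The Ingress above assumes two cluster add-ons that the chart does not install: an NGINX ingress controller and cert-manager (which backs the letsencrypt-prod issuer). If your cluster doesn't have them yet, a sketch of installing both from their official charts:

# NGINX ingress controller
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx --create-namespace

# cert-manager (issues the certificate referenced by the annotation)
helm repo add jetstack https://charts.jetstack.io
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager --create-namespace \
  --set installCRDs=true

Note that the letsencrypt-prod ClusterIssuer itself still has to be created separately; cert-manager only provides the machinery.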

Deploying to Kubernetes

Now let's deploy everything!

Prerequisites

# 1. Install kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl

# 2. Install Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

# 3. Verify installations
kubectl version --client
helm version
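
If you don't have a cluster yet, you can exercise the chart locally first. One option is a disposable kind or minikube cluster (both installed separately); the resource limits in values.yaml will likely need to be lowered for a laptop:

# Option A: kind (Kubernetes in Docker)
kind create cluster --name researcherai-dev

# Option B: minikube, with enough headroom for the databases
minikube start --cpus 6 --memory 12288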

Option 1: Automated Deployment Script

I created a deployment script that handles everything:

#!/bin/bash
# k8s/scripts/deploy.sh

set -e

echo "ResearcherAI Kubernetes Deployment"
echo "=================================="

# Check prerequisites
if ! command -v kubectl &> /dev/null; then
    echo "Error: kubectl not found"
    exit 1
fi

if ! command -v helm &> /dev/null; then
    echo "Error: helm not found"
    exit 1
fi

# Check cluster connection
echo "Checking Kubernetes cluster connection..."
if ! kubectl cluster-info &> /dev/null; then
    echo "Error: Cannot connect to Kubernetes cluster"
    exit 1
fi

# Install Strimzi Kafka Operator
echo "Installing Strimzi Kafka Operator..."
kubectl create namespace kafka --dry-run=client -o yaml | kubectl apply -f -
helm repo add strimzi https://strimzi.io/charts/
helm repo update

helm upgrade --install strimzi-kafka-operator strimzi/strimzi-kafka-operator \
    --namespace kafka \
    --set watchAnyNamespace=true \
    --wait

# Get configuration
read -p "Enter your Google API key: " -s GOOGLE_API_KEY
echo
read -p "Enter Neo4j password: " -s NEO4J_PASSWORD
echo

# Create values file
cat > /tmp/researcherai-values.yaml <<EOF
app:
  secrets:
    googleApiKey: "${GOOGLE_API_KEY}"
    neo4jPassword: "${NEO4J_PASSWORD}"

  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 10

neo4j:
  enabled: true

qdrant:
  enabled: true

kafka:
  enabled: true
EOF

# Install/upgrade ResearcherAI
echo "Deploying ResearcherAI..."
helm upgrade --install researcherai ./k8s/helm/researcherai \
    --namespace researcherai \
    --create-namespace \
    --values /tmp/researcherai-values.yaml \
    --wait \
    --timeout 10m

# Clean up values file
rm /tmp/researcherai-values.yaml

# Show deployment status
echo ""
echo "Deployment completed successfully!"
echo ""
echo "Checking pod status..."
kubectl get pods -n researcherai

echo ""
echo "Getting service information..."
kubectl get services -n researcherai

echo ""
echo "To watch pod status:"
echo " kubectl get pods -n researcherai --watch"
echo ""
echo "To view logs:"
echo " kubectl logs -n researcherai -l app.kubernetes.io/name=researcherai --follow"
echo ""
echo "To access the application:"
echo " kubectl port-forward -n researcherai svc/researcherai 8000:8000"
echo " Then visit: http://localhost:8000"

Run the deployment:

chmod +x k8s/scripts/deploy.sh
./k8s/scripts/deploy.sh

Option 2: Manual Step-by-Step Deployment

Step 1: Install Strimzi Operator

# Create Kafka namespace
kubectl create namespace kafka

# Add Strimzi Helm repo
helm repo add strimzi https://strimzi.io/charts/
helm repo update

# Install operator
helm install strimzi-kafka-operator strimzi/strimzi-kafka-operator \
  --namespace kafka \
  --set watchAnyNamespace=true

Step 2: Create Configuration File

# custom-values.yaml
app:
  replicaCount: 2
  secrets:
    googleApiKey: "YOUR_GOOGLE_API_KEY"
    neo4jPassword: "secure-neo4j-password"

  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 10
    targetCPUUtilizationPercentage: 80

neo4j:
  enabled: true
  config:
    dbms.memory.heap.initial_size: "2G"
    dbms.memory.heap.max_size: "2G"

qdrant:
  enabled: true
  persistence:
    size: 20Gi    # Adjust based on your needs

kafka:
  enabled: true
  cluster:
    replicas: 3

ingress:
  enabled: true
  host: researcherai.yourdomain.com
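
One caution: custom-values.yaml now contains real credentials, so keep it out of Git. Two alternatives, sketched below, are to pass the sensitive values only at install time, or to create the Secret out of band; the secret and key names mirror what the templates reference, assuming the release is named researcherai:

# Pass secrets only on the command line (shell history caveats apply)
helm install researcherai ./k8s/helm/researcherai \
  --namespace researcherai --create-namespace \
  -f custom-values.yaml \
  --set app.secrets.googleApiKey="$GOOGLE_API_KEY"

# Or create the Secret directly and omit the keys from the values file
kubectl create secret generic researcherai-secrets \
  -n researcherai \
  --from-literal=GOOGLE_API_KEY="$GOOGLE_API_KEY" \
  --from-literal=NEO4J_PASSWORD="$NEO4J_PASSWORD"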

Step 3: Install ResearcherAI

helm install researcherai ./k8s/helm/researcherai \
--namespace researcherai \
--create-namespace \
--values custom-values.yaml \
--wait

Step 4: Verify Deployment

# Watch pods start up
kubectl get pods -n researcherai --watch

# Expected output:
NAME                                READY   STATUS    RESTARTS   AGE
researcherai-6b8f7d9c5d-abcde       1/1     Running   0          2m
researcherai-6b8f7d9c5d-fghij       1/1     Running   0          2m
researcherai-neo4j-0                1/1     Running   0          3m
researcherai-qdrant-7d9b8c6f5-xyz   1/1     Running   0          3m
rag-kafka-kafka-0                   1/1     Running   0          4m
rag-kafka-kafka-1                   1/1     Running   0          4m
rag-kafka-kafka-2                   1/1     Running   0          4m
rag-kafka-zookeeper-0               1/1     Running   0          5m
rag-kafka-zookeeper-1               1/1     Running   0          5m
rag-kafka-zookeeper-2               1/1     Running   0          5m

# Check services
kubectl get services -n researcherai

# Check Kafka topics
kubectl get kafkatopics -n researcherai

Step 5: Access the Application

# Port forward to local machine
kubectl port-forward -n researcherai svc/researcherai 8000:8000

# Visit in browser
open http://localhost:8000

Testing Auto-Scaling

Let's verify that auto-scaling works:

Generate Load

# Install Apache Bench
sudo apt-get install apache2-utils

# Generate load (100 concurrent requests, 10000 total)
ab -n 10000 -c 100 http://localhost:8000/api/query
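
Note that ab sends GET requests by default; if the query endpoint expects a JSON POST (as most query APIs do), pass a payload file instead. The endpoint path and payload shape here are assumptions about the application's API:

# POST a JSON body with ab (payload file and endpoint are illustrative)
echo '{"query": "What is retrieval-augmented generation?"}' > payload.json
ab -n 1000 -c 50 -p payload.json -T application/json \
  http://localhost:8000/api/query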

Watch Scaling

# In another terminal, watch HPA
kubectl get hpa -n researcherai --watch

# Expected output:
NAME           REFERENCE                 TARGETS            MINPODS   MAXPODS   REPLICAS   AGE
researcherai   Deployment/researcherai   45%/80%, 30%/80%   2         10        2          10m
researcherai   Deployment/researcherai   85%/80%, 35%/80%   2         10        3          10m
researcherai   Deployment/researcherai   92%/80%, 40%/80%   2         10        4          11m

# Watch pods being created
kubectl get pods -n researcherai --watch

Monitoring Deployment

View Logs

# All application pods
kubectl logs -n researcherai -l app.kubernetes.io/name=researcherai --follow

# Specific pod
kubectl logs -n researcherai researcherai-6b8f7d9c5d-abcde --follow

# Previous crashed container
kubectl logs -n researcherai researcherai-6b8f7d9c5d-abcde --previous

# Multiple containers in a pod
kubectl logs -n researcherai researcherai-6b8f7d9c5d-abcde -c researcherai

Check Resource Usage

# Pod resource usage
kubectl top pods -n researcherai

# Node resource usage
kubectl top nodes

# Detailed pod description
kubectl describe pod -n researcherai researcherai-6b8f7d9c5d-abcde

Check Events

# Namespace events
kubectl get events -n researcherai --sort-by='.lastTimestamp'

# Watch events in real-time
kubectl get events -n researcherai --watch

Updating the Application

Rolling Update

# Update image tag in values
app:
  image:
    tag: "2.1.0"   # New version

# Apply upgrade
helm upgrade researcherai ./k8s/helm/researcherai \
--namespace researcherai \
--values custom-values.yaml

# Watch rolling update
kubectl rollout status deployment/researcherai -n researcherai

Rollback if Needed

# View revision history
helm history researcherai -n researcherai

# Rollback to previous revision
helm rollback researcherai -n researcherai

# Rollback to specific revision
helm rollback researcherai 3 -n researcherai
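
Helm rollbacks revert the whole release (all manifests and values). If you only need to undo the application image change, kubectl's rollout commands work at the Deployment level:

# Inspect the Deployment's rollout history
kubectl rollout history deployment/researcherai -n researcherai

# Revert just the Deployment to its previous revision
kubectl rollout undo deployment/researcherai -n researcherai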

Production Considerations

1. Resource Planning

Calculate Total Resources:

# Application: 2-10 pods
CPU: 1000m request × 2 pods = 2 cores minimum
CPU: 2000m limit × 10 pods = 20 cores maximum
RAM: 2Gi × 2 = 4Gi minimum
RAM: 4Gi × 10 = 40Gi maximum

# Neo4j: 1 pod
CPU: 1 core
RAM: 4Gi (2G heap + 1G page cache + overhead)
Storage: 10Gi

# Qdrant: 1 pod
CPU: 1 core
RAM: 2Gi
Storage: 10Gi

# Kafka: 3 brokers
CPU: 3 cores
RAM: 6Gi
Storage: 30Gi

# Zookeeper: 3 nodes
CPU: 1.5 cores
RAM: 3Gi
Storage: 15Gi

# TOTAL MINIMUM:
CPU: ~10 cores
RAM: ~19Gi
Storage: ~65Gi

# TOTAL MAXIMUM (when auto-scaled):
CPU: ~30 cores
RAM: ~55Gi

Kubernetes Cluster Sizing:

For production, I recommend:

  • 3-node cluster (high availability)
  • Each node: 8 cores, 32GB RAM, 100GB SSD
  • Total: 24 cores, 96GB RAM, 300GB storage
  • Cost: ~$500-700/month (varies by cloud provider)

2. High Availability Checklist

  • ✅ Multiple pod replicas (2-10)
  • ✅ Pod Disruption Budget configured (example after this list)
  • ✅ Kafka 3-broker cluster
  • ✅ Zookeeper 3-node ensemble
  • ✅ Rolling updates with maxUnavailable: 0
  • ⚠️ Neo4j single replica (consider clustering)
  • ⚠️ Qdrant single replica (configure replication)
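
The chart shown above does not ship a PodDisruptionBudget template, so here is one way to add the budget the checklist refers to, generated imperatively with kubectl and matching the label the chart uses for the app pods:

# Keep at least one application pod available during voluntary disruptions
kubectl create poddisruptionbudget researcherai-pdb \
  -n researcherai \
  --selector=app.kubernetes.io/name=researcherai \
  --min-available=1

# Add --dry-run=client -o yaml to capture it as a manifest for the chart instead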

3. Backup Strategy

What to backup:

  • Neo4j data (persistent volume)
  • Qdrant collections (persistent volume)
  • Kubernetes configurations (Git)
  • Secrets (external secret management)

Backup tools:

# Install Velero for Kubernetes backups
helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts
helm install velero vmware-tanzu/velero \
--namespace velero \
--create-namespace \
--set configuration.provider=aws \
--set configuration.backupStorageLocation.bucket=researcherai-backups \
--set configuration.backupStorageLocation.config.region=us-east-1

# Schedule daily backups
velero schedule create daily-backup \
--schedule="0 2 * * *" \
--include-namespaces researcherai
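
Backups are only useful if they restore, so verify the schedule is producing backups and rehearse a restore; the backup name below is whatever velero backup get reports:

# Confirm the schedule exists and backups are completing
velero schedule get
velero backup get

# Rehearse a restore into the same cluster (or a scratch cluster)
velero restore create --from-backup <backup-name> \
  --include-namespaces researcherai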

4. Security Hardening

Network Policies:

# Restrict pod-to-pod communication
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: researcherai-network-policy
  namespace: researcherai
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: researcherai

  policyTypes:
    - Ingress
    - Egress

  ingress:
    # Allow from ingress controller
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx
      ports:
        - protocol: TCP
          port: 8000

  egress:
    # Allow to Neo4j
    - to:
        - podSelector:
            matchLabels:
              app: neo4j
      ports:
        - protocol: TCP
          port: 7687

    # Allow to Qdrant
    - to:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: qdrant
      ports:
        - protocol: TCP
          port: 6333

    # Allow to Kafka
    - to:
        - podSelector:
            matchLabels:
              strimzi.io/cluster: rag-kafka
      ports:
        - protocol: TCP
          port: 9092

    # Allow DNS
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53

    # Allow internet egress for LLM API calls (external IPs need an ipBlock, not a namespaceSelector)
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - protocol: TCP
          port: 443

Pod Security Standards:

# Enforce restricted security standard
apiVersion: v1
kind: Namespace
metadata:
  name: researcherai
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted

5. Cost Optimization

Tips to reduce costs:

  1. Use spot/preemptible instances for non-production
  2. Right-size resources - adjust requests/limits based on actual usage (see the check after this list)
  3. Scale to zero for dev/staging environments when not in use
  4. Use cluster autoscaler - add/remove nodes based on demand
  5. Monitor costs - use tools like Kubecost
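
A simple way to right-size: compare what pods actually use against what they request, then adjust values.yaml accordingly:

# Actual usage right now
kubectl top pods -n researcherai

# What the pods request, for comparison
kubectl get pods -n researcherai -o custom-columns=\
'NAME:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory'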

Example: Development Environment

# dev-values.yaml
app:
  replicaCount: 1              # Only 1 replica
  resources:
    requests:
      cpu: 500m                # Half resources
      memory: 1Gi
  autoscaling:
    enabled: false             # No auto-scaling in dev

neo4j:
  config:
    dbms.memory.heap.initial_size: "512M"   # Less memory
    dbms.memory.heap.max_size: "512M"

kafka:
  cluster:
    replicas: 1                # Single broker for dev
  zookeeper:
    replicas: 1                # Single zookeeper
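
The dev overrides are applied the same way as the production values, just layered on top; a sketch using a separate namespace so the two environments never collide:

helm upgrade --install researcherai-dev ./k8s/helm/researcherai \
  --namespace researcherai-dev --create-namespace \
  -f dev-values.yaml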

Troubleshooting

Pod Won't Start

# Check pod status
kubectl describe pod -n researcherai <pod-name>

# Common issues:
# 1. Image pull error - check image name/tag
# 2. Init container failing - check dependency services
# 3. Resource constraints - check node resources

# Check events
kubectl get events -n researcherai --sort-by='.lastTimestamp'

Application Crashes

# Check logs
kubectl logs -n researcherai <pod-name> --previous

# Check resource limits
kubectl top pod -n researcherai <pod-name>

# Common issues:
# - Out of memory (OOMKilled)
# - Unhandled exceptions
# - Database connection failures

Can't Connect to Database

# Test Neo4j connection from a throwaway pod
kubectl run -it --rm debug --image=busybox --restart=Never -n researcherai -- sh

# Then, inside the debug pod:
nc -zv researcherai-neo4j 7687

# Check Neo4j logs
kubectl logs -n researcherai researcherai-neo4j-0

# Verify service
kubectl get svc -n researcherai researcherai-neo4j

Kafka Issues

# Check Kafka cluster status
kubectl get kafka -n researcherai

# Check topics
kubectl get kafkatopics -n researcherai

# Connect to Kafka pod
kubectl exec -it -n researcherai rag-kafka-kafka-0 -- bash

# Inside pod, test Kafka (scripts live under /opt/kafka/bin in Strimzi images)
/opt/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 --list

Next Steps

We've successfully deployed ResearcherAI to Kubernetes with:

  • ✅ Auto-scaling (2-10 replicas)
  • ✅ High availability (multiple brokers, rolling updates)
  • ✅ Persistent storage (StatefulSets, PVCs)
  • ✅ Service discovery (Kubernetes DNS)
  • ✅ Event streaming (Kafka cluster)
  • ✅ Health checks (liveness/readiness probes)

But we're not done yet! In the next chapter, we'll cover:

  • Terraform - Infrastructure as Code for cloud resources
  • Observability - Prometheus, Grafana, distributed tracing
  • CI/CD - Automated deployments with GitHub Actions
  • Production readiness - Security, backup, disaster recovery

Let's continue with Terraform deployment in the next chapter!