
Chapter 7B: CI/CD & Automation

Manual deployments don't scale. Every code change should be automatically tested, security-scanned, and deployed to production. In this chapter, I'll show you how to build a complete CI/CD pipeline for ResearcherAI using GitHub Actions, with automated testing, security scanning, and GitOps-based deployments.

Why CI/CD for ML/AI Applications?

Traditional web apps need CI/CD, but LLM applications have additional challenges:

Traditional CI/CD Concerns:

  • Code quality (linting, formatting)
  • Unit tests
  • Integration tests
  • Security vulnerabilities
  • Build artifacts
  • Deployment automation

LLM-Specific Concerns:

  • Prompt regression testing
  • Model version management
  • Token usage regression
  • Cost regression testing
  • Response quality evaluation
  • Embedding drift detection
  • Vector database migrations

The Goal

git push
   ↓
Automated Pipeline
├─ Lint & format code
├─ Run unit tests (Python)
├─ Run LLM evaluation tests
├─ Check prompt regressions
├─ Build Docker images
├─ Scan for vulnerabilities
├─ Deploy to staging
├─ Run integration tests
├─ Deploy to production
└─ Monitor deployment

GitHub Actions Overview

GitHub Actions is like having a CI/CD server that runs automatically on every git push.

Web Developer Analogy

GitHub Actions = Automated deployment like Vercel/Netlify

  • Triggers on git push
  • Runs tests automatically
  • Builds and deploys
  • All configured in .github/workflows/

  • Workflow = Recipe for automation
  • Job = Group of related steps
  • Step = Individual command or action
  • Action = Reusable workflow component

Basic Structure

.github/workflows/ci.yml:

name: CI Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        run: pytest

Complete CI/CD Pipeline for ResearcherAI

Let's build a production-grade pipeline step by step.

Step 1: Code Quality Checks

.github/workflows/lint.yml:

name: Code Quality

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
          cache: 'pip'

      - name: Install dependencies
        run: |
          pip install black flake8 mypy pylint isort
          pip install -r requirements.txt

      - name: Check code formatting (Black)
        run: black --check .

      - name: Check import sorting (isort)
        run: isort --check-only .

      - name: Lint with flake8
        run: |
          flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
          flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics

      - name: Type checking (mypy)
        run: mypy agents/ utils/ --ignore-missing-imports

      - name: Lint with pylint
        run: pylint agents/ utils/ --disable=C0111,C0103

  yaml-lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Lint YAML files
        uses: ibiqlik/action-yamllint@v3
        with:
          file_or_dir: .github/workflows/*.yml k8s/ terraform/

  dockerfile-lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Lint Dockerfile
        uses: hadolint/hadolint-action@v3.1.0
        with:
          dockerfile: Dockerfile
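
The same checks can run locally before every commit via pre-commit, so CI rarely catches pure formatting issues. A minimal sketch; the hook versions are illustrative, so pin them to match whatever CI installs:

.pre-commit-config.yaml:

repos:
  - repo: https://github.com/psf/black
    rev: 24.3.0
    hooks:
      - id: black
  - repo: https://github.com/pycqa/isort
    rev: 5.13.2
    hooks:
      - id: isort
  - repo: https://github.com/pycqa/flake8
    rev: 7.0.0
    hooks:
      - id: flake8
        args: [--max-line-length=127]
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.9.0
    hooks:
      - id: mypy
        args: [--ignore-missing-imports]

Enable it once with pip install pre-commit && pre-commit install; after that the hooks run on every local commit.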

Step 2: Automated Testing

.github/workflows/test.yml:

name: Test Suite

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.10', '3.11', '3.12']

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          cache: 'pip'

      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest pytest-cov pytest-asyncio pytest-mock

      - name: Run unit tests
        run: |
          pytest tests/unit/ \
            --cov=agents \
            --cov=utils \
            --cov-report=xml \
            --cov-report=html \
            --junitxml=test-results.xml

      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage.xml
          flags: unittests
          name: codecov-umbrella

      - name: Upload test results
        uses: actions/upload-artifact@v3
        if: always()
        with:
          name: test-results-${{ matrix.python-version }}
          path: test-results.xml

  integration-tests:
    runs-on: ubuntu-latest
    services:
      neo4j:
        image: neo4j:5.13-community
        env:
          NEO4J_AUTH: neo4j/testpassword
        ports:
          - 7687:7687
          - 7474:7474

      qdrant:
        image: qdrant/qdrant:latest
        ports:
          - 6333:6333

      kafka:
        image: bitnami/kafka:latest
        env:
          KAFKA_ENABLE_KRAFT: yes
          KAFKA_CFG_PROCESS_ROLES: broker,controller
          KAFKA_CFG_CONTROLLER_LISTENER_NAMES: CONTROLLER
          KAFKA_CFG_LISTENERS: PLAINTEXT://:9092,CONTROLLER://:9093
          KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
          KAFKA_CFG_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
          KAFKA_BROKER_ID: 1
          KAFKA_CFG_CONTROLLER_QUORUM_VOTERS: 1@localhost:9093
          ALLOW_PLAINTEXT_LISTENER: yes
        ports:
          - 9092:9092

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
          cache: 'pip'

      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest pytest-asyncio

      - name: Wait for services
        run: |
          timeout 60 bash -c 'until nc -z localhost 7687; do sleep 1; done'
          timeout 60 bash -c 'until nc -z localhost 6333; do sleep 1; done'
          timeout 60 bash -c 'until nc -z localhost 9092; do sleep 1; done'

      - name: Run integration tests
        env:
          NEO4J_URI: bolt://localhost:7687
          NEO4J_PASSWORD: testpassword
          QDRANT_HOST: localhost:6333
          KAFKA_BOOTSTRAP_SERVERS: localhost:9092
        run: |
          pytest tests/integration/ -v

  llm-evaluation-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
          cache: 'pip'

      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install langsmith

      - name: Run LLM evaluation tests
        env:
          GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
          LANGSMITH_API_KEY: ${{ secrets.LANGSMITH_API_KEY }}
          LANGCHAIN_PROJECT: ResearcherAI-CI
        run: |
          pytest tests/llm_evaluation/ -v --tb=short

      - name: Check for prompt regressions
        run: |
          python scripts/check_prompt_regressions.py \
            --baseline=prompts/baseline.json \
            --current=prompts/current.json

      - name: Check token usage regression
        run: |
          python scripts/check_token_regression.py \
            --max-increase-percent=10
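
What does a test in tests/llm_evaluation/ actually look like? Approaches vary, but even a deterministic fact-coverage check catches many regressions. A minimal sketch; run_agent is a hypothetical stand-in for the real agent call and is stubbed here so the example runs standalone:

tests/llm_evaluation/test_answer_quality.py:

import pytest

def run_agent(query: str) -> str:
    # Hypothetical stand-in; in the real suite this would call the agent.
    return ("Retrieval-augmented generation (RAG) combines a retriever "
            "with an LLM so answers are grounded in external documents.")

CASES = [
    ("What is RAG?", ["retriever", "llm", "documents"]),
]

@pytest.mark.parametrize("query,required_terms", CASES)
def test_answer_covers_required_facts(query, required_terms):
    answer = run_agent(query).lower()
    missing = [term for term in required_terms if term not in answer]
    assert not missing, f"Answer is missing required terms: {missing}"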

Step 3: Security Scanning

.github/workflows/security.yml:

name: Security Scanning

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 0 * * 1'  # Weekly on Monday

jobs:
  dependency-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run Snyk to check for vulnerabilities
        uses: snyk/actions/python@master
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
        with:
          command: test
          args: --severity-threshold=high

      - name: Safety check (Python dependencies)
        run: |
          pip install safety
          safety check --json

  secret-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Full history for secret scanning

      - name: TruffleHog Secret Scanning
        uses: trufflesecurity/trufflehog@main
        with:
          path: ./
          base: main
          head: HEAD

      - name: GitLeaks Secret Scanning
        uses: gitleaks/gitleaks-action@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

  code-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Initialize CodeQL
        uses: github/codeql-action/init@v2
        with:
          languages: python

      - name: Perform CodeQL Analysis
        uses: github/codeql-action/analyze@v2

  container-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Build Docker image
        run: docker build -t researcherai:${{ github.sha }} .

      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: researcherai:${{ github.sha }}
          format: 'sarif'
          output: 'trivy-results.sarif'
          severity: 'CRITICAL,HIGH'

      - name: Upload Trivy results to GitHub Security
        uses: github/codeql-action/upload-sarif@v2
        with:
          sarif_file: 'trivy-results.sarif'

Step 4: Build and Push Docker Images

.github/workflows/build.yml:

name: Build and Push

on:
  push:
    branches: [main, develop]
    tags:
      - 'v*'

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ghcr.io/${{ github.repository }}
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}
            type=sha

      - name: Build and push
        id: build
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
          platforms: linux/amd64,linux/arm64

      - name: Image digest
        # The digest is an output of the build step, not the metadata step
        run: echo ${{ steps.build.outputs.digest }}
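
Because the metadata step includes type=sha, every build is also tagged with the short commit SHA (prefixed sha- by default), so any past build can be pulled by commit. For example (the SHA below is illustrative; ghcr.io image names are lowercase):

docker pull ghcr.io/sebuliba-adrian/researcherai:sha-1a2b3c4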

Step 5: Deploy to Staging

.github/workflows/deploy-staging.yml:

name: Deploy to Staging

on:
  push:
    branches: [develop]
  workflow_dispatch:

jobs:
  deploy-staging:
    runs-on: ubuntu-latest
    environment:
      name: staging
      url: https://staging.researcherai.com

    steps:
      - uses: actions/checkout@v4

      - name: Configure kubectl
        uses: azure/k8s-set-context@v3
        with:
          method: kubeconfig
          kubeconfig: ${{ secrets.KUBE_CONFIG_STAGING }}

      - name: Install Helm
        uses: azure/setup-helm@v3
        with:
          version: '3.12.0'

      - name: Deploy with Helm
        run: |
          helm upgrade --install researcherai ./k8s/helm/researcherai \
            --namespace staging \
            --create-namespace \
            --set app.image.tag=${{ github.sha }} \
            --set app.secrets.googleApiKey=${{ secrets.GOOGLE_API_KEY_STAGING }} \
            --set app.environment=staging \
            --wait \
            --timeout 10m

      - name: Verify deployment
        run: |
          kubectl rollout status deployment/researcherai -n staging
          kubectl get pods -n staging

      - name: Run smoke tests
        run: |
          sleep 30  # Wait for services to be ready
          curl -f https://staging.researcherai.com/health || exit 1
          python scripts/smoke_tests.py --environment=staging

      - name: Notify Slack
        if: always()
        uses: 8398a7/action-slack@v3
        with:
          status: ${{ job.status }}
          text: 'Staging deployment ${{ job.status }}'
          webhook_url: ${{ secrets.SLACK_WEBHOOK }}
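
The workflow leans on scripts/smoke_tests.py. A minimal sketch of what it might contain; the /api/query endpoint and response shape are assumptions, and the --target/--critical-only flags used later in production are omitted for brevity:

scripts/smoke_tests.py:

#!/usr/bin/env python3
import argparse
import sys

import requests

BASE_URLS = {
    "staging": "https://staging.researcherai.com",
    "production": "https://researcherai.com",
}

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--environment", choices=BASE_URLS, required=True)
    args = parser.parse_args()
    base = BASE_URLS[args.environment]

    # 1. The health endpoint must return 200
    requests.get(f"{base}/health", timeout=10).raise_for_status()

    # 2. A trivial query should return a non-empty answer (hypothetical endpoint)
    resp = requests.post(f"{base}/api/query", json={"query": "ping"}, timeout=60)
    resp.raise_for_status()
    if not resp.json().get("answer"):
        print("❌ Smoke test failed: empty answer")
        sys.exit(1)

    print("✅ Smoke tests passed")

if __name__ == "__main__":
    main()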

Step 6: Deploy to Production

.github/workflows/deploy-production.yml:

name: Deploy to Production

on:
  push:
    tags:
      - 'v*'
  workflow_dispatch:

jobs:
  deploy-production:
    runs-on: ubuntu-latest
    environment:
      name: production
      url: https://researcherai.com

    steps:
      - uses: actions/checkout@v4

      - name: Configure kubectl
        uses: azure/k8s-set-context@v3
        with:
          method: kubeconfig
          kubeconfig: ${{ secrets.KUBE_CONFIG_PRODUCTION }}

      - name: Install Helm
        uses: azure/setup-helm@v3

      - name: Blue-Green Deployment
        run: |
          # Deploy to the blue environment
          helm upgrade --install researcherai-blue ./k8s/helm/researcherai \
            --namespace production \
            --create-namespace \
            --set app.image.tag=${{ github.ref_name }} \
            --set app.secrets.googleApiKey=${{ secrets.GOOGLE_API_KEY_PROD }} \
            --set app.environment=production \
            --set app.deployment.color=blue \
            --wait \
            --timeout 10m

      - name: Run production smoke tests
        run: |
          python scripts/smoke_tests.py \
            --environment=production \
            --target=blue \
            --critical-only

      - name: Switch traffic to blue
        run: |
          kubectl patch service researcherai -n production \
            -p '{"spec":{"selector":{"color":"blue"}}}'

      - name: Monitor for 5 minutes
        run: |
          sleep 300
          python scripts/monitor_deployment.py \
            --duration=300 \
            --error-threshold=0.01

      - name: Scale down green
        if: success()
        run: |
          helm delete researcherai-green -n production || true

      - name: Rollback on failure
        if: failure()
        run: |
          kubectl patch service researcherai -n production \
            -p '{"spec":{"selector":{"color":"green"}}}'
          helm delete researcherai-blue -n production

      - name: Create GitHub Release
        if: success()
        uses: actions/create-release@v1
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          tag_name: ${{ github.ref }}
          release_name: Release ${{ github.ref_name }}
          body: |
            ## Changes
            ${{ github.event.head_commit.message }}

            ## Deployment
            - Deployed to production
            - Image: ghcr.io/${{ github.repository }}:${{ github.ref_name }}

      - name: Notify Slack
        if: always()
        uses: 8398a7/action-slack@v3
        with:
          status: ${{ job.status }}
          text: 'Production deployment ${{ job.status }}: ${{ github.ref_name }}'
          webhook_url: ${{ secrets.SLACK_WEBHOOK }}
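
scripts/monitor_deployment.py can be as simple as polling the health endpoint and failing the job when the error ratio crosses the threshold; a production version would query Prometheus instead. A sketch under that simpler assumption:

scripts/monitor_deployment.py:

#!/usr/bin/env python3
import argparse
import sys
import time

import requests

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--duration", type=int, default=300)
    parser.add_argument("--error-threshold", type=float, default=0.01)
    parser.add_argument("--url", default="https://researcherai.com/health")
    args = parser.parse_args()

    total = errors = 0
    deadline = time.time() + args.duration
    while time.time() < deadline:
        total += 1
        try:
            if requests.get(args.url, timeout=5).status_code != 200:
                errors += 1
        except requests.RequestException:
            errors += 1
        time.sleep(5)

    ratio = errors / total if total else 0.0
    print(f"Checked {total} times, error ratio: {ratio:.3f}")
    if ratio > args.error_threshold:
        sys.exit(1)  # Fails the workflow step and triggers the rollback

if __name__ == "__main__":
    main()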

GitOps with ArgoCD

For declarative deployments, ArgoCD watches your Git repository and automatically syncs changes to Kubernetes.

Step 1: Install ArgoCD

# Create namespace
kubectl create namespace argocd

# Install ArgoCD
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

# Get admin password
kubectl -n argocd get secret argocd-initial-admin-secret \
  -o jsonpath="{.data.password}" | base64 -d

# Port forward to access UI
kubectl port-forward svc/argocd-server -n argocd 8080:443

# Login
argocd login localhost:8080

Step 2: Create ArgoCD Application

argocd/application.yaml:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: researcherai
  namespace: argocd
spec:
  project: default

  source:
    repoURL: https://github.com/Sebuliba-Adrian/ResearcherAI.git
    targetRevision: HEAD
    path: k8s/helm/researcherai
    helm:
      valueFiles:
        - values.yaml
        - values-production.yaml

  destination:
    server: https://kubernetes.default.svc
    namespace: production

  syncPolicy:
    automated:
      prune: true     # Delete resources not in Git
      selfHeal: true  # Revert manual cluster changes to match Git
    syncOptions:
      - CreateNamespace=true

    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m

Apply:

kubectl apply -f argocd/application.yaml -n argocd

Step 3: Update GitHub Actions for GitOps

.github/workflows/gitops-deploy.yml:

name: GitOps Deployment

on:
  push:
    branches: [main]
    paths:
      - 'k8s/**'
      - 'Dockerfile'

jobs:
  update-manifests:
    runs-on: ubuntu-latest
    permissions:
      contents: write  # Required for the job to push the manifest commit

    steps:
      - uses: actions/checkout@v4

      - name: Update image tag
        run: |
          sed -i "s|tag: .*|tag: ${GITHUB_SHA}|" \
            k8s/helm/researcherai/values-production.yaml

      - name: Commit changes
        run: |
          git config user.name "GitHub Actions"
          git config user.email "actions@github.com"
          git add k8s/helm/researcherai/values-production.yaml
          git commit -m "Update production image to ${GITHUB_SHA}"
          git push

# ArgoCD automatically detects the change and syncs!
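
Once the manifest commit lands, you can watch ArgoCD pick it up from the CLI:

# Check sync and health status
argocd app get researcherai

# Trigger a sync manually (normally the automated policy handles this)
argocd app sync researcherai

# Review past syncs (useful when rolling back)
argocd app history researcherai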

LLM-Specific CI/CD Practices

1. Prompt Regression Testing

scripts/check_prompt_regressions.py:

#!/usr/bin/env python3
import argparse
import json
import sys

from langsmith import Client

# Assumed to be defined elsewhere in the project:
# generate_with_current_prompt, quality_evaluator, relevance_evaluator


def check_prompt_regressions(baseline_path):
    """
    Compare current prompts against the baseline.
    Fail if quality drops significantly.
    """
    client = Client()

    # Load baseline scores
    with open(baseline_path) as f:
        baseline = json.load(f)

    # Run the current evaluation
    results = client.evaluate(
        lambda x: generate_with_current_prompt(x),
        data="research-queries-test-set",
        evaluators=[quality_evaluator, relevance_evaluator],
    )

    # Compare scores metric by metric
    for metric in ["quality", "relevance"]:
        current_score = results[metric]
        baseline_score = baseline[metric]

        # Fail on more than a 5% drop
        if current_score < baseline_score * 0.95:
            print(f"❌ Regression detected in {metric}!")
            print(f"   Baseline: {baseline_score:.2f}")
            print(f"   Current:  {current_score:.2f}")
            print(f"   Drop:     {baseline_score - current_score:.2f}")
            sys.exit(1)

    print("✅ No prompt regressions detected")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--baseline", default="prompts/baseline.json")
    # --current is accepted to match the CI invocation; the evaluation
    # itself runs live against the current prompts.
    parser.add_argument("--current", default="prompts/current.json")
    args = parser.parse_args()
    check_prompt_regressions(args.baseline)

2. Token Usage Monitoring

scripts/check_token_regression.py:

#!/usr/bin/env python3
import argparse
import sys

from langfuse import Langfuse


def check_token_regression(max_increase_percent=10):
    """
    Ensure token usage doesn't increase unexpectedly.
    """
    langfuse = Langfuse()

    # Average tokens per trace over the last week
    last_week = langfuse.get_trace_stats(
        filter_time_start="7d ago",
        filter_time_end="now",
    )
    avg_tokens_last_week = last_week["avg_tokens_per_trace"]

    # Average tokens per trace over the last day
    last_day = langfuse.get_trace_stats(
        filter_time_start="1d ago",
        filter_time_end="now",
    )
    avg_tokens_last_day = last_day["avg_tokens_per_trace"]

    # Calculate the relative increase
    increase_percent = (
        (avg_tokens_last_day - avg_tokens_last_week) / avg_tokens_last_week
    ) * 100

    if increase_percent > max_increase_percent:
        print(f"❌ Token usage increased by {increase_percent:.1f}%!")
        print(f"   Last week: {avg_tokens_last_week:.0f} tokens")
        print(f"   Last day:  {avg_tokens_last_day:.0f} tokens")
        sys.exit(1)

    print(f"✅ Token usage stable (change: {increase_percent:.1f}%)")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--max-increase-percent", type=float, default=10)
    args = parser.parse_args()
    check_token_regression(args.max_increase_percent)

3. Cost Budget Checks

scripts/check_cost_budget.py:

#!/usr/bin/env python3
import sys

from langfuse import Langfuse


def check_cost_budget(daily_budget=100):
    """
    Fail deployment if the projected cost exceeds the budget.
    """
    langfuse = Langfuse()

    # Cost over the last hour
    stats = langfuse.get_trace_stats(
        filter_time_start="1h ago",
        filter_time_end="now",
    )

    hourly_cost = stats["total_cost"]
    projected_daily_cost = hourly_cost * 24

    if projected_daily_cost > daily_budget:
        print("❌ Projected daily cost exceeds budget!")
        print(f"   Budget:    ${daily_budget:.2f}")
        print(f"   Projected: ${projected_daily_cost:.2f}")
        print(f"   Hourly:    ${hourly_cost:.2f}")
        sys.exit(1)

    print(f"✅ Cost within budget (${projected_daily_cost:.2f}/day)")


if __name__ == "__main__":
    check_cost_budget()
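
Like the token check, this gate can run as a pipeline step. A sketch of the corresponding workflow step; the Langfuse secret names are assumptions, so use whatever your repository defines:

- name: Check cost budget
  env:
    LANGFUSE_PUBLIC_KEY: ${{ secrets.LANGFUSE_PUBLIC_KEY }}
    LANGFUSE_SECRET_KEY: ${{ secrets.LANGFUSE_SECRET_KEY }}
  run: |
    python scripts/check_cost_budget.py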

Deployment Strategies

1. Rolling Update (Default)

# k8s/helm/researcherai/templates/deployment.yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # Add 1 pod before removing old
      maxUnavailable: 0  # Always keep all pods running

Pros:

  • Zero downtime
  • Automatic rollback on failure
  • Resource efficient

Cons:

  • Gradual rollout (not instant)
  • Mixed versions running temporarily

2. Blue-Green Deployment

# Deploy blue
helm install researcherai-blue ./k8s/helm/researcherai \
  --set app.deployment.color=blue

# Test blue
curl https://blue.researcherai.com/health

# Switch traffic
kubectl patch service researcherai \
  -p '{"spec":{"selector":{"color":"blue"}}}'

# Remove green
helm delete researcherai-green

Pros:

  • Instant rollback
  • Full testing before switch
  • Clean separation

Cons:

  • Requires 2x resources
  • Database migrations complex

3. Canary Deployment

# Deploy canary with 10% of traffic
helm install researcherai-canary ./k8s/helm/researcherai \
  --set app.replicaCount=1 \
  --set ingress.weight=10

# Monitor
python scripts/monitor_canary.py --duration=600

# If successful, scale up
helm upgrade researcherai-canary ./k8s/helm/researcherai \
  --set app.replicaCount=10 \
  --set ingress.weight=100

Pros:

  • Gradual rollout
  • Early problem detection
  • User-based routing possible

Cons:

  • Complex traffic management
  • Requires a service mesh such as Istio for traffic splitting (see the sketch below)
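
With Istio, the split lives in a VirtualService. A sketch of a 90/10 split, assuming stable (researcherai) and canary (researcherai-canary) Services already exist:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: researcherai
spec:
  hosts:
    - researcherai.com
  http:
    - route:
        - destination:
            host: researcherai          # stable Service
          weight: 90
        - destination:
            host: researcherai-canary   # canary Service
          weight: 10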

Monitoring CI/CD Pipeline

GitHub Actions Dashboard

View all workflow runs:

https://github.com/Sebuliba-Adrian/ResearcherAI/actions

Pipeline Metrics to Track

  1. Build Time
     • Target: < 10 minutes
     • Alert if > 15 minutes
  2. Test Pass Rate
     • Target: 100%
     • Alert if < 95%
  3. Deployment Frequency
     • Target: Multiple times per day
     • Track trend
  4. Mean Time to Recovery (MTTR)
     • Target: < 1 hour
     • Track incidents
  5. Change Failure Rate
     • Target: < 5%
     • Track rollbacks
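
Build time is easy to spot-check from the command line with the GitHub CLI and jq; a sketch that assumes the workflow name above and GitHub's ISO-8601 timestamps:

# Print conclusion and duration (minutes) of the last 20 CI runs
gh run list --workflow "CI Pipeline" --limit 20 \
  --json conclusion,startedAt,updatedAt \
  | jq -r '.[] | "\(.conclusion)\t\(((.updatedAt | fromdate) - (.startedAt | fromdate)) / 60 | floor) min"'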

Pipeline Alerts

.github/workflows/alerts.yml:

name: Pipeline Alerts

on:
  workflow_run:
    workflows: ["CI Pipeline", "Deploy to Production"]
    types: [completed]

jobs:
  alert-on-failure:
    runs-on: ubuntu-latest
    if: ${{ github.event.workflow_run.conclusion == 'failure' }}
    steps:
      - name: Send PagerDuty alert
        uses: PagerDuty/pagerduty-github-action@v1
        with:
          pagerduty-token: ${{ secrets.PAGERDUTY_TOKEN }}
          service-id: ${{ secrets.PAGERDUTY_SERVICE_ID }}
          event-action: trigger
          summary: "CI/CD Pipeline failed: ${{ github.event.workflow_run.name }}"
          severity: high

Best Practices

1. Fast Feedback

# Run fast tests first
jobs:
  quick-tests:
    runs-on: ubuntu-latest
    steps:
      - run: pytest tests/unit/ -v  # < 2 minutes

  slow-tests:
    needs: quick-tests  # Only run if quick tests pass
    runs-on: ubuntu-latest
    steps:
      - run: pytest tests/integration/ -v  # 5-10 minutes

2. Fail Fast

jobs:
  test:
    strategy:
      fail-fast: true  # Stop all jobs on first failure
      matrix:
        python-version: ['3.10', '3.11', '3.12']

3. Caching

- name: Cache pip packages
  uses: actions/cache@v3
  with:
    path: ~/.cache/pip
    key: ${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}
    restore-keys: |
      ${{ runner.os }}-pip-

- name: Cache Docker layers
  uses: actions/cache@v3
  with:
    path: /tmp/.buildx-cache
    key: ${{ runner.os }}-buildx-${{ github.sha }}
    restore-keys: |
      ${{ runner.os }}-buildx-

4. Secure Secrets

# Never log secrets
- name: Deploy with secrets
  run: |
    echo "::add-mask::${{ secrets.API_KEY }}"
    deploy.sh --api-key="${{ secrets.API_KEY }}"
    # API_KEY is now masked in logs

5. Environment Protection

# Require approval for production
environment:
  name: production
  url: https://researcherai.com

# Configure in GitHub Settings → Environments:
# - Require reviewers: 2 approvals
# - Wait timer: 0 minutes
# - Deployment branches: main only

Troubleshooting CI/CD

Pipeline Fails on Dependencies

# Use exact versions
- name: Install dependencies
  run: |
    pip install -r requirements-lock.txt  # Pinned versions
    # Not: pip install -r requirements.txt  # Can break
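
One common way to produce requirements-lock.txt is pip-tools, which resolves the full dependency tree into pinned versions (the file names follow the convention above):

# Generate a fully pinned lock file from the loose requirements
pip install pip-tools
pip-compile requirements.txt --output-file requirements-lock.txt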

Tests Pass Locally, Fail in CI

# Use the same environment as local development
- name: Run tests in Docker
  run: |
    docker-compose -f docker-compose.test.yml up --abort-on-container-exit

Slow Pipelines

# Run jobs in parallel
jobs:
  unit-tests:
    runs-on: ubuntu-latest
  integration-tests:
    runs-on: ubuntu-latest
  security-scan:
    runs-on: ubuntu-latest
# Jobs without needs: between them all run simultaneously

Deployment Failures

# Add a rollback step
- name: Rollback on failure
  if: failure()
  run: |
    helm rollback researcherai -n production
    kubectl rollout undo deployment/researcherai -n production

Complete CI/CD Checklist

✅ Code Quality

  • Linting (black, flake8)
  • Type checking (mypy)
  • Import sorting (isort)
  • Complexity checks (pylint)

✅ Testing

  • Unit tests (>80% coverage)
  • Integration tests
  • LLM evaluation tests
  • Prompt regression tests
  • Token usage checks
  • Cost budget checks

✅ Security

  • Dependency scanning (Snyk)
  • Secret scanning (TruffleHog)
  • Code scanning (CodeQL)
  • Container scanning (Trivy)
  • License compliance

✅ Build & Deploy

  • Docker image building
  • Multi-arch builds (amd64, arm64)
  • Image signing
  • Helm chart validation
  • Kubernetes deployment
  • Smoke tests
  • Health checks

✅ Monitoring

  • Deployment tracking
  • Rollback automation
  • Slack/PagerDuty alerts
  • Metrics collection
  • Log aggregation

✅ GitOps (Optional)

  • ArgoCD setup
  • Declarative manifests
  • Automated sync
  • Self-healing

Conclusion

Congratulations! You now have a complete CI/CD pipeline that:

  • ✅ Automatically tests every code change
  • ✅ Checks for security vulnerabilities
  • ✅ Validates prompt quality
  • ✅ Monitors token usage and costs
  • ✅ Builds and pushes Docker images
  • ✅ Deploys to staging and production
  • ✅ Runs smoke tests
  • ✅ Monitors deployments
  • ✅ Automatically rolls back on failure
  • ✅ Sends alerts on issues

Your ResearcherAI system is now production-ready with:

  1. Kubernetes deployment - Auto-scaling, high availability
  2. Terraform infrastructure - Reproducible cloud resources
  3. Comprehensive observability - LangFuse, Prometheus, Grafana
  4. Automated CI/CD - GitHub Actions, security scanning
  5. Multiple environments - Dev, staging, production

In the final chapter, we'll wrap up with lessons learned and future enhancements!