Daniel Sy 5b3078217f
fix(deploy): 🚚 update sizer-url after namespace migration
Receiver moved from garm to ci-sizer namespace. Since webhook
is also in ci-sizer, use short service name (same namespace).
2026-04-29 10:47:11 +02:00

gitlab-webhook-edge-connect

Kubernetes MutatingAdmissionWebhook that intercepts GitLab Runner job pods and injects CI Sizer collector sidecars with resource sizing recommendations.

Overview

When GitLab Runner creates a Kubernetes pod to execute a CI job, this webhook intercepts the pod creation request, queries the CI Sizer receiver for historical sizing data, and mutates the pod spec to:

  1. Inject a collector sidecar — monitors CPU and memory usage via /proc and pushes metrics to the receiver on job completion
  2. Apply resource recommendations — sets Kubernetes resource requests/limits on the runner container based on historical data

This closes the feedback loop: every CI job is measured, and future jobs are right-sized based on actual usage. On cold start (no historical data), the webhook falls back to configurable defaults while the collector still gathers data for future runs. If the receiver returns an error or times out, the pod is allowed through unchanged, so the webhook never blocks CI execution.

The webhook supports two mutation backends:

  • kubernetes — mutates pod specs inline via the admission response (default)
  • edgeconnect — provisions resources via the Dynatrace Edge Connect SDK
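A minimal Go sketch of how the --backend choice might be resolved at startup. The Backend interface, its Name method, and newBackend are hypothetical names used for illustration; the real interface lives in internal/backend/backend.go and is not reproduced here. The only behavior taken from this README is the default (kubernetes) and the rule that the Edge Connect URL and token are required when backend=edgeconnect.

```go
package main

import (
	"errors"
	"fmt"
)

// Backend is a hypothetical stand-in for the interface in internal/backend.
type Backend interface {
	Name() string
}

type kubernetesBackend struct{}

func (kubernetesBackend) Name() string { return "kubernetes" }

type edgeConnectBackend struct{ url, token string }

func (edgeConnectBackend) Name() string { return "edgeconnect" }

// newBackend picks a backend by name. An empty name falls back to the
// documented default, "kubernetes".
func newBackend(kind, edgeURL, edgeToken string) (Backend, error) {
	switch kind {
	case "", "kubernetes":
		return kubernetesBackend{}, nil
	case "edgeconnect":
		if edgeURL == "" || edgeToken == "" {
			return nil, errors.New("edgeconnect backend requires --edge-connect-url and --edge-connect-token")
		}
		return edgeConnectBackend{url: edgeURL, token: edgeToken}, nil
	default:
		return nil, fmt.Errorf("unknown backend %q", kind)
	}
}

func main() {
	b, err := newBackend("kubernetes", "", "")
	fmt.Println(b.Name(), err)

	_, err = newBackend("edgeconnect", "", "")
	fmt.Println(err)
}
```

Failing fast on an incomplete edgeconnect configuration keeps misconfiguration visible at startup (the "Invalid configuration" CrashLoopBackOff case under Troubleshooting) rather than at admission time.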

Architecture

┌──────────────┐     ┌──────────────────┐     ┌─────────────────────────────────────────┐
│  GitLab CI   │     │  GitLab Runner   │     │  Kubernetes API Server                  │
│  Pipeline    │────▶│  (K8s executor)  │────▶│                                         │
└──────────────┘     └──────────────────┘     │  MutatingAdmissionWebhook               │
                                              │    ┌───────────────────────────────┐     │
                                              │    │ gitlab-sizer-webhook          │     │
                                              │    │                               │     │
                                              │    │ 1. Is this a Runner pod?      │     │
                                              │    │    (job.runner.gitlab.com/pod) │     │
                                              │    │                               │     │
                                              │    │ 2. Extract metadata from      │     │
                                              │    │    labels + annotations       │     │
                                              │    │                               │     │
                                              │    │ 3. GET /api/v1/sizing/...     │◄────┼──── ci-sizer receiver
                                              │    │    (fetch recommendation)     │     │
                                              │    │                               │     │
                                              │    │ 4. POST /api/v1/token         │     │
                                              │    │    (get scoped push token)    │     │
                                              │    │                               │     │
                                              │    │ 5. Return JSON patch:         │     │
                                              │    │    - Add collector sidecar     │     │
                                              │    │    - Set resource limits       │     │
                                              │    └───────────────────────────────┘     │
                                              └─────────────────────────────────────────┘
                                                              │
                                                              ▼
                                              ┌─────────────────────────────────────────┐
                                              │  CI Job Pod (mutated)                   │
                                              │                                         │
                                              │  ┌──────────────┐  ┌─────────────────┐  │
                                              │  │ runner       │  │ sizer-collector  │  │
                                              │  │ (CI job)     │  │ (sidecar)        │  │
                                              │  │              │  │                  │  │
                                              │  │ Sized with   │  │ Reads /proc      │  │
                                              │  │ historical   │  │ Pushes metrics   │──┼──▶ ci-sizer receiver
                                              │  │ data         │  │ on shutdown      │  │
                                              │  └──────────────┘  └─────────────────┘  │
                                              │  (shareProcessNamespace: true)           │
                                              └─────────────────────────────────────────┘

Prerequisites

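  • Kubernetes cluster (1.19+) with admission webhook support; sidecar injection uses an init container with restartPolicy: Always (a native sidecar), which needs Kubernetes 1.29+ (or 1.28 with the SidecarContainers feature gate enabled)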
  • cert-manager (recommended) or manual TLS certificate provisioning
  • CI Sizer receiver deployed and accessible from the webhook pod
  • GitLab Runner (v17+) with Kubernetes executor — v17+ is required because the job.runner.gitlab.com/pod label (used for pod targeting) was introduced in that version
  • Container image for the webhook, pushed to an accessible registry

Configuration

All flags support environment variable fallback with the WEBHOOK_ prefix.

| Flag | Env Var | Description | Default |
|------|---------|-------------|---------|
| --listen-addr | WEBHOOK_LISTEN_ADDR | HTTPS listen address | :8443 |
| --tls-cert-file | WEBHOOK_TLS_CERT_FILE | Path to TLS certificate | /etc/webhook/certs/tls.crt |
| --tls-key-file | WEBHOOK_TLS_KEY_FILE | Path to TLS private key | /etc/webhook/certs/tls.key |
| --sizer-url | WEBHOOK_SIZER_URL | Base URL of the ci-sizer receiver | (required) |
| --sizer-read-token | WEBHOOK_SIZER_READ_TOKEN | Bearer token for sizer read/admin endpoints | (required) |
| --sizer-push-token | WEBHOOK_SIZER_PUSH_TOKEN | Bearer token for sizer push endpoints | |
| --sizer-sidecar-image | WEBHOOK_SIZER_SIDECAR_IMAGE | Container image for the collector sidecar | (required for injection) |
| --backend | WEBHOOK_BACKEND | Mutation backend: kubernetes or edgeconnect | kubernetes |
| --edge-connect-url | WEBHOOK_EDGE_CONNECT_URL | Edge Connect API URL | (required if backend=edgeconnect) |
| --edge-connect-token | WEBHOOK_EDGE_CONNECT_TOKEN | Edge Connect auth token | (required if backend=edgeconnect) |
| --log-level | WEBHOOK_LOG_LEVEL | Log level: debug, info, warn, error | info |
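
The flag-to-environment fallback can be sketched in Go with the standard flag package. This is an illustration, not the code in cmd/webhook/main.go: the Config struct, parse, and envName are hypothetical names, only two of the flags are shown, and the precedence (explicit flag, then WEBHOOK_ variable, then default) is an assumption about what "fallback" means here.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strings"
)

// Config holds two of the documented settings (hypothetical struct).
type Config struct {
	ListenAddr string
	Backend    string
}

// envName derives the variable name for a flag,
// e.g. "listen-addr" becomes "WEBHOOK_LISTEN_ADDR".
func envName(flagName string) string {
	return "WEBHOOK_" + strings.ToUpper(strings.ReplaceAll(flagName, "-", "_"))
}

// parse reads flags from args and fills every flag that was NOT given
// explicitly from env, leaving the built-in default otherwise.
func parse(args []string, env map[string]string) (Config, error) {
	var cfg Config
	fs := flag.NewFlagSet("webhook", flag.ContinueOnError)
	fs.StringVar(&cfg.ListenAddr, "listen-addr", ":8443", "HTTPS listen address")
	fs.StringVar(&cfg.Backend, "backend", "kubernetes", "mutation backend")
	if err := fs.Parse(args); err != nil {
		return cfg, err
	}

	// Visit only walks flags that were set on the command line.
	explicit := map[string]bool{}
	fs.Visit(func(f *flag.Flag) { explicit[f.Name] = true })

	var setErr error
	fs.VisitAll(func(f *flag.Flag) {
		if explicit[f.Name] || setErr != nil {
			return
		}
		if v, ok := env[envName(f.Name)]; ok {
			setErr = fs.Set(f.Name, v)
		}
	})
	return cfg, setErr
}

func main() {
	// Snapshot the process environment into a map so parse stays testable.
	env := map[string]string{}
	for _, kv := range os.Environ() {
		if k, v, ok := strings.Cut(kv, "="); ok {
			env[k] = v
		}
	}
	cfg, err := parse(os.Args[1:], env)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Under these assumptions, running with WEBHOOK_LISTEN_ADDR=:9443 and --backend=edgeconnect yields a listen address of :9443 and the edgeconnect backend, because the explicit flag wins over the environment.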

Deployment

Quick Start

# 1. Create namespaces
kubectl apply -f deploy/manifests/namespace.yaml

# 2. Install cert-manager (if not already present)
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml

# 3. Create cert-manager issuer and certificate
kubectl apply -f deploy/manifests/cert-manager/issuer.yaml
kubectl apply -f deploy/manifests/cert-manager/certificate.yaml

# 4. Create the token secret
kubectl create secret generic gitlab-sizer-webhook-tokens \
  --namespace=ci-sizer \
  --from-literal=sizer-read-token=YOUR_READ_TOKEN \
  --from-literal=sizer-push-token=YOUR_PUSH_TOKEN

# 5. Deploy RBAC, webhook service, deployment, and webhook configuration
kubectl apply -f deploy/manifests/rbac.yaml
kubectl apply -f deploy/manifests/webhook-service.yaml
kubectl apply -f deploy/manifests/webhook-deployment.yaml
kubectl apply -f deploy/manifests/mutatingwebhookconfiguration.yaml

Verification

# Check the webhook pod is running
kubectl get pods -n ci-sizer -l app=gitlab-sizer-webhook

# Check the certificate was issued
kubectl get certificate -n ci-sizer gitlab-sizer-webhook-cert

# Check the webhook configuration has a CA bundle
kubectl get mutatingwebhookconfiguration gitlab-sizer-webhook -o jsonpath='{.webhooks[0].clientConfig.caBundle}' | head -c 20
# Should output base64 data (not empty)

# Verify the webhook is reachable (from inside the cluster)
kubectl run test-curl --rm -it --image=curlimages/curl -- \
  curl -sk https://gitlab-sizer-webhook.ci-sizer.svc:443/healthz
# Should return: ok

TLS Options

Option A: cert-manager (Default)

The default deployment uses cert-manager with a self-signed issuer. cert-manager automatically:

  • Generates the TLS certificate and key
  • Creates the gitlab-sizer-webhook-tls Secret
  • Injects the CA bundle into the MutatingWebhookConfiguration (via the cert-manager.io/inject-ca-from annotation)
  • Renews certificates before expiry (30 days before the end of the 1-year validity period)

For production, replace the self-signed issuer with a ClusterIssuer backed by a real CA.

Option B: Self-Signed (No cert-manager)

If cert-manager is not available:

# Generate certs, create Secret, and patch the webhook caBundle
bash deploy/manifests/self-signed/setup-webhook.sh

# Or step by step:
bash deploy/manifests/self-signed/generate-certs.sh ./certs
kubectl create secret tls gitlab-sizer-webhook-tls \
  --cert=./certs/tls.crt --key=./certs/tls.key -n ci-sizer
# Then manually patch the MutatingWebhookConfiguration with the CA bundle

Note: Self-signed certificates require manual rotation. The setup-webhook.sh script handles the full flow including CA bundle injection.

Namespace Labeling

The webhook only intercepts pods in namespaces labeled with ci-sizer.devfw.io/watch: "true". The runner namespace must have this label:

apiVersion: v1
kind: Namespace
metadata:
  name: gitlab-runner
  labels:
    ci-sizer.devfw.io/watch: "true"

This is included in deploy/manifests/namespace.yaml.

GitLab Runner Setup

Helm Deployment

Deploy a GitLab Runner with the Kubernetes executor using Helm:

helm repo add gitlab https://charts.gitlab.io
helm install gitlab-runner gitlab/gitlab-runner \
  -n gitlab-runner \
  -f deploy/runner-values.yaml

Example runner-values.yaml:

gitlabUrl: https://gitlab.example.com
runnerToken: "glrt-YOUR_RUNNER_TOKEN"
runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        namespace = "gitlab-runner"
        image = "alpine:3.20"
        privileged = false
        poll_timeout = 600
        [runners.kubernetes.pod_labels]
          "ci-sizer.devfw.io/managed" = "true"
  tags: "sizer-test"
  runUntagged: false
rbac:
  create: true
  rules:
    - apiGroups: [""]
      resources: ["pods", "pods/exec", "pods/attach", "pods/log", "secrets", "configmaps", "serviceaccounts"]
      verbs: ["get", "list", "watch", "create", "delete", "update", "patch"]
    - apiGroups: [""]
      resources: ["events"]
      verbs: ["get", "list", "watch"]

Required RBAC

The GitLab Runner service account needs broad pod management permissions because the Kubernetes executor creates, attaches to, and cleans up job pods:

| Resource | Verbs | Why |
|----------|-------|-----|
| pods | get, list, watch, create, delete, update, patch | Job pod lifecycle |
| pods/exec | get, list, watch, create, delete, update, patch | Execute commands in job containers |
| pods/attach | get, list, watch, create, delete, update, patch | Attach to running containers for log streaming |
| pods/log | get, list, watch, create, delete, update, patch | Retrieve job logs |
| secrets | get, list, watch, create, delete, update, patch | Pull image credentials, CI variables |
| configmaps | get, list, watch, create, delete, update, patch | Runner configuration |
| serviceaccounts | get, list, watch, create, delete, update, patch | Pod identity |
| events | get, list, watch | Debugging pod scheduling issues |

Runner Tags

Configure your .gitlab-ci.yml jobs to use the runner via tags:

my-job:
  tags:
    - sizer-test   # Must match the runner's tags configuration
  script:
    - echo "This job will be measured by ci-sizer"

How It Works

1. Pod Detection

The webhook uses a Kubernetes objectSelector to match only pods with the label job.runner.gitlab.com/pod (operator: Exists). This label is automatically set by GitLab Runner v17+ on every job pod. Non-runner pods are never intercepted.

Additionally, the namespaceSelector requires the label ci-sizer.devfw.io/watch: "true" on the namespace, providing a second layer of scoping.
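
The combined effect of the two selectors can be sketched in Go. The API server evaluates them before the webhook is ever called, so this is only a model of the matching rule, not code from this repository; wouldIntercept is a hypothetical helper.

```go
package main

import "fmt"

const (
	runnerPodLabel = "job.runner.gitlab.com/pod" // objectSelector: operator Exists
	watchLabel     = "ci-sizer.devfw.io/watch"   // namespaceSelector: must equal "true"
)

// wouldIntercept models both selectors: the pod label must be present
// (its value is irrelevant) and the namespace label must be exactly "true".
func wouldIntercept(podLabels, nsLabels map[string]string) bool {
	_, isRunnerPod := podLabels[runnerPodLabel]
	return isRunnerPod && nsLabels[watchLabel] == "true"
}

func main() {
	watched := map[string]string{watchLabel: "true"}

	// Runner pod in a watched namespace.
	fmt.Println(wouldIntercept(map[string]string{runnerPodLabel: "runner-abc"}, watched)) // true
	// Ordinary pod in a watched namespace.
	fmt.Println(wouldIntercept(map[string]string{"app": "nginx"}, watched)) // false
	// Runner pod in a namespace without the watch label.
	fmt.Println(wouldIntercept(map[string]string{runnerPodLabel: ""}, nil)) // false
}
```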

2. Metadata Extraction

The webhook extracts CI context from GitLab Runner's native pod labels and annotations:

| Source | Key | Maps To | Example |
|--------|-----|---------|---------|
| Label | project.runner.gitlab.com/root-namespace | Organization (Owner) | my-group |
| Label | project.runner.gitlab.com/name | Repository | my-project |
| Annotation | job.runner.gitlab.com/name | Job name | build |
| Annotation | job.runner.gitlab.com/id | Run/Pipeline ID | 12345 |
| Annotation | job.runner.gitlab.com/ref | Workflow (branch ref) | main |

These are set automatically by GitLab Runner — no pipeline configuration is needed.
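
The mapping in the table can be sketched as a small Go function. JobMeta and extractMeta are hypothetical names, and treating organization, repository, workflow, and job as mandatory is an assumption made for this example; the handler in internal/webhook may validate differently.

```go
package main

import (
	"errors"
	"fmt"
)

// JobMeta carries the CI context read from the pod (hypothetical type).
type JobMeta struct {
	Org      string // project.runner.gitlab.com/root-namespace (label)
	Repo     string // project.runner.gitlab.com/name (label)
	Job      string // job.runner.gitlab.com/name (annotation)
	RunID    string // job.runner.gitlab.com/id (annotation)
	Workflow string // job.runner.gitlab.com/ref (annotation)
}

// extractMeta reads the keys listed in the table above. Lookups on nil
// maps are safe in Go and simply yield empty strings.
func extractMeta(labels, annotations map[string]string) (JobMeta, error) {
	m := JobMeta{
		Org:      labels["project.runner.gitlab.com/root-namespace"],
		Repo:     labels["project.runner.gitlab.com/name"],
		Job:      annotations["job.runner.gitlab.com/name"],
		RunID:    annotations["job.runner.gitlab.com/id"],
		Workflow: annotations["job.runner.gitlab.com/ref"],
	}
	// Assumption: these four fields are needed to address a sizing record.
	if m.Org == "" || m.Repo == "" || m.Workflow == "" || m.Job == "" {
		return m, errors.New("incomplete GitLab Runner metadata on pod")
	}
	return m, nil
}

func main() {
	meta, err := extractMeta(
		map[string]string{
			"project.runner.gitlab.com/root-namespace": "my-group",
			"project.runner.gitlab.com/name":           "my-project",
		},
		map[string]string{
			"job.runner.gitlab.com/name": "build",
			"job.runner.gitlab.com/id":   "12345",
			"job.runner.gitlab.com/ref":  "main",
		},
	)
	fmt.Printf("%+v %v\n", meta, err)
}
```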

3. Sizing Recommendation Lookup

The webhook calls GET /api/v1/sizing/repo/{org}/{repo}/{workflow}/{job} on the CI Sizer receiver. Three outcomes:

| Response | Behavior |
|----------|----------|
| 200 OK with sizing data | Apply historical recommendations to pod resources |
| 404 Not Found (cold start) | Fall back to config defaults; the collector will still gather data for future runs |
| Error / timeout | Allow pod through unchanged (failurePolicy: Ignore) |
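
The three outcomes can be sketched with Go's standard HTTP client. This is not the generated client in ci-sizer/pkg/client: Outcome, sizingURL, classify, and lookup are hypothetical names, escaping each path segment is an assumption about how the receiver expects refs such as feature/x, and the response body is kept as raw JSON because its schema is not described here.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
	"time"
)

// Outcome mirrors the three rows of the table above.
type Outcome string

const (
	ApplySizing Outcome = "apply-sizing"
	UseDefaults Outcome = "use-defaults"
	PassThrough Outcome = "pass-through"
)

// sizingURL builds GET /api/v1/sizing/repo/{org}/{repo}/{workflow}/{job}.
func sizingURL(base, org, repo, workflow, job string) string {
	return fmt.Sprintf("%s/api/v1/sizing/repo/%s/%s/%s/%s",
		strings.TrimRight(base, "/"),
		url.PathEscape(org), url.PathEscape(repo),
		url.PathEscape(workflow), url.PathEscape(job))
}

// classify maps a transport error or HTTP status to an outcome.
func classify(status int, err error) Outcome {
	switch {
	case err != nil:
		return PassThrough
	case status == http.StatusOK:
		return ApplySizing
	case status == http.StatusNotFound:
		return UseDefaults
	default:
		return PassThrough
	}
}

// lookup performs the request with a bearer token and returns the raw
// sizing document only when the receiver answered 200 with valid JSON.
func lookup(ctx context.Context, c *http.Client, endpoint, token string) (Outcome, json.RawMessage) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, endpoint, nil)
	if err != nil {
		return PassThrough, nil
	}
	req.Header.Set("Authorization", "Bearer "+token)
	resp, err := c.Do(req)
	if err != nil {
		return classify(0, err), nil
	}
	defer resp.Body.Close()
	out := classify(resp.StatusCode, nil)
	if out != ApplySizing {
		return out, nil
	}
	var body json.RawMessage
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return PassThrough, nil
	}
	return ApplySizing, body
}

func main() {
	// Stand-in receiver with no data yet, i.e. a cold start.
	srv := httptest.NewServer(http.NotFoundHandler())
	defer srv.Close()

	// Stay below the webhook's own 5-second admission timeout.
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()

	endpoint := sizingURL(srv.URL, "my-group", "my-project", "main", "build")
	outcome, _ := lookup(ctx, srv.Client(), endpoint, "read-token")
	fmt.Println(outcome) // use-defaults
}
```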

4. Sidecar Injection

When a sidecar image is configured, the webhook:

  1. Requests a scoped HMAC push token from POST /api/v1/token
  2. Adds a sizer-collector init container with restartPolicy: Always (a native Kubernetes sidecar container)
  3. Sets shareProcessNamespace: true so the collector can read /proc for all containers
  4. Configures the sidecar with:
    • COLLECTOR_PUSH_TOKEN — scoped to this org/repo/workflow/job
    • GitLab CI environment variables (GITLAB_CI, CI_PIPELINE_ID, CI_PROJECT_NAMESPACE, etc.)
    • CGROUP_PROCESS_MAP — maps process names to container identifiers
    • CGROUP_LIMITS — resource limits for cgroup telemetry
    • APPLIED_SIZING — the raw sizing response for audit/debugging
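
The injection steps above can be sketched as an RFC 6902 JSON patch built with the standard library. This is a simplified illustration rather than the AddCollectorSidecar implementation in ci-sizer/pkg/inject: the types and sidecarPatch are hypothetical, only COLLECTOR_PUSH_TOKEN is set, and the remaining environment variables are omitted because their formats are not specified here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// patchOp is one RFC 6902 operation.
type patchOp struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value,omitempty"`
}

type envVar struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

// container holds only the fields this sketch needs.
type container struct {
	Name          string   `json:"name"`
	Image         string   `json:"image"`
	RestartPolicy string   `json:"restartPolicy,omitempty"`
	Env           []envVar `json:"env,omitempty"`
}

// sidecarPatch returns operations that enable the shared process
// namespace and add the collector as a restartable init container.
// "add" on /spec/initContainers/- appends to an existing array; when the
// pod has no init containers the array itself has to be created.
func sidecarPatch(image, pushToken string, hasInitContainers bool) []patchOp {
	collector := container{
		Name:          "sizer-collector",
		Image:         image,
		RestartPolicy: "Always",
		Env:           []envVar{{Name: "COLLECTOR_PUSH_TOKEN", Value: pushToken}},
	}
	ops := []patchOp{{Op: "add", Path: "/spec/shareProcessNamespace", Value: true}}
	if hasInitContainers {
		return append(ops, patchOp{Op: "add", Path: "/spec/initContainers/-", Value: collector})
	}
	return append(ops, patchOp{Op: "add", Path: "/spec/initContainers", Value: []container{collector}})
}

func main() {
	// Image name and token are placeholders.
	ops := sidecarPatch("registry.example.com/ci-sizer-collector:latest", "scoped-token", false)
	out, err := json.MarshalIndent(ops, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Branching on whether initContainers already exists matters because a JSON patch "add" to a path under a missing parent fails, which would otherwise turn a harmless pod into a patch error.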

5. Failure Safety

The webhook is designed to never block CI execution:

  • failurePolicy: Ignore — if the webhook is down, pods are created normally
  • timeoutSeconds: 5 — fast timeout prevents slow API calls from blocking pod creation
  • Every error path in the handler returns Allowed: true — parse failures, backend errors, and missing data all result in the pod being allowed through
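
The fail-open rule can be sketched in Go using only encoding/json. The structs below are trimmed copies of the AdmissionReview wire format, not the k8s.io/api types the project depends on, and review, mutateFunc, and brokenBackend are hypothetical names. The point is structural: the response starts as allowed, and a patch is attached only when every step succeeded.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

type admissionRequest struct {
	UID    string          `json:"uid"`
	Object json.RawMessage `json:"object"`
}

type admissionResponse struct {
	UID       string  `json:"uid"`
	Allowed   bool    `json:"allowed"`
	PatchType *string `json:"patchType,omitempty"`
	Patch     []byte  `json:"patch,omitempty"` // encoding/json emits base64
}

type admissionReview struct {
	APIVersion string             `json:"apiVersion"`
	Kind       string             `json:"kind"`
	Request    *admissionRequest  `json:"request,omitempty"`
	Response   *admissionResponse `json:"response,omitempty"`
}

// mutateFunc stands in for backend delegation: raw pod JSON in, JSON patch out.
type mutateFunc func(pod []byte) ([]byte, error)

// brokenBackend simulates a backend outage.
func brokenBackend([]byte) ([]byte, error) {
	return nil, errors.New("backend unavailable")
}

// review never denies: parse failures, backend errors, and empty patches
// all yield Allowed: true without a patch.
func review(body []byte, mutate mutateFunc) admissionReview {
	out := admissionReview{
		APIVersion: "admission.k8s.io/v1",
		Kind:       "AdmissionReview",
		Response:   &admissionResponse{Allowed: true},
	}
	var in admissionReview
	if err := json.Unmarshal(body, &in); err != nil || in.Request == nil {
		return out
	}
	out.Response.UID = in.Request.UID

	patch, err := mutate([]byte(in.Request.Object))
	if err != nil || len(patch) == 0 {
		return out
	}
	patchType := "JSONPatch"
	out.Response.PatchType = &patchType
	out.Response.Patch = patch
	return out
}

func main() {
	body := []byte(`{"apiVersion":"admission.k8s.io/v1","kind":"AdmissionReview","request":{"uid":"abc","object":{}}}`)
	out, err := json.Marshal(review(body, brokenBackend))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```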

Telekom MCICD Specifics

When deploying on Telekom Magenta CICD (MCICD) infrastructure:

Proxy Configuration

The webhook deployment supports an HTTP proxy via an optional ConfigMap:

kubectl create configmap gitlab-sizer-webhook-config \
  --namespace=ci-sizer \
  --from-literal=HTTP_PROXY=http://proxy.devops.telekom.de:3128 \
  --from-literal=HTTPS_PROXY=http://proxy.devops.telekom.de:3128 \
  --from-literal=NO_PROXY=.svc,.svc.cluster.local,10.0.0.0/8

Registry Mirror

If the cluster uses a registry mirror, update the sidecar image reference in the deployment:

--sizer-sidecar-image=registry-mirror.example.com/devfw-cicd/ci-sizer-collector:latest

Runner Version

GitLab Runner v17+ is required. Earlier versions do not set the job.runner.gitlab.com/pod label, and the webhook's objectSelector will not match their pods.

Troubleshooting

Webhook Not Intercepting Pods

| Symptom | Cause | Fix |
|---------|-------|-----|
| No sidecar injected | Runner namespace missing ci-sizer.devfw.io/watch: "true" label | kubectl label ns gitlab-runner ci-sizer.devfw.io/watch=true |
| No sidecar injected | Runner version < v17 (no job.runner.gitlab.com/pod label) | Upgrade GitLab Runner to v17+ |
| No sidecar injected | Webhook pod not running | kubectl get pods -n ci-sizer -l app=gitlab-sizer-webhook |
| Webhook pod CrashLoopBackOff | TLS certificate not found | Check kubectl get secret gitlab-sizer-webhook-tls -n ci-sizer |
| Webhook pod CrashLoopBackOff | Invalid configuration | Check logs: kubectl logs -n ci-sizer -l app=gitlab-sizer-webhook |

TLS / Certificate Issues

| Symptom | Cause | Fix |
|---------|-------|-----|
| x509: certificate signed by unknown authority | CA bundle not injected into webhook config | Verify cert-manager annotation or run setup-webhook.sh |
| connection refused on port 443 | Service not routing to webhook pod | Check kubectl get endpoints -n ci-sizer gitlab-sizer-webhook |
| Certificate expired | cert-manager not renewing | Check kubectl get certificate -n ci-sizer for Ready status |

Sidecar Issues

| Symptom | Cause | Fix |
|---------|-------|-----|
| Collector sidecar present but no metrics pushed | Push token invalid or receiver unreachable | Check collector logs in the job pod |
| SIGPIPE errors in collector | Runner container exited before collector finished pushing | Non-fatal; metrics are pushed on graceful shutdown |
| Volume mount errors | Pod spec references volumes that don't exist | The webhook only mounts the runner volume if it exists in the pod spec |

RBAC Issues

| Symptom | Cause | Fix |
|---------|-------|-----|
| Runner can't create pods | Missing RBAC rules | Apply the RBAC from deploy/runner-values.yaml |
| Runner can't attach to pods | Missing pods/attach permission | Add pods/attach to the runner ClusterRole |

Development

Building Locally

make all    # fmt + vet + lint + build
make test   # run tests

The binary is built to ./webhook.

Docker Build

Two Dockerfiles are provided:

| File | Use Case | Build Context |
|------|----------|---------------|
| Dockerfile | Development: uses replace directives to local ci-sizer and edge-connect-client repos | Parent directory (requires sibling repos) |
| Dockerfile.standalone | CI/CD: uses published module versions | This repo only |

# Development build (from parent directory containing all repos)
docker build -f gitlab-webhook-edge-connect/Dockerfile -t webhook:dev .

# Standalone build
docker build -f Dockerfile.standalone -t webhook:latest .

Package Layout

| Package | Purpose |
|---------|---------|
| cmd/webhook/main.go | Entry point, flag parsing, TLS server, graceful shutdown |
| internal/webhook/handler.go | AdmissionReview decode → label check → backend delegation → JSON patch |
| internal/webhook/config.go | Config struct, backend type constants |
| internal/backend/backend.go | Backend interface |
| internal/backend/kubernetes.go | Inline pod spec mutation (sidecar + resource limits) |
| internal/backend/edgeconnect.go | Edge Connect SDK provisioning |

Dependencies

| Module | Purpose |
|--------|---------|
| ci-sizer/pkg/inject | Shared injection library: GetRecommendation, AddCollectorSidecar, ApplyRecommendation |
| ci-sizer/pkg/client | Generated OpenAPI client for the sizer receiver API |
| edge-connect-client/v2/sdk/edgeconnect | Dynatrace Edge Connect SDK |
| k8s.io/api, k8s.io/apimachinery | Kubernetes API types |

License

EUPL-1.2 — see LICENSE.