# Forgejo Runner Resource Collector

A lightweight metrics collector for CI/CD workloads in shared PID namespace environments. It reads `/proc` to collect CPU and memory metrics, groups them by container/cgroup, and pushes run summaries to a receiver service for storage and querying.

## Architecture

The system has two independent binaries:

```
┌─────────────────────────────────────────────┐       ┌──────────────────────────┐
│ CI/CD Pod (shared PID namespace)            │       │ Receiver Service         │
│                                             │       │                          │
│ ┌───────────┐  ┌────────┐  ┌───────────┐    │       │  POST /api/v1/metrics    │
│ │ collector │  │ runner │  │ sidecar   │    │       │           │              │
│ │           │  │        │  │           │    │       │           ▼              │
│ │ reads     │  │        │  │           │    │ push  │  ┌────────────┐          │
│ │ /proc for │  │        │  │           │    │──────▶│  │   SQLite   │          │
│ │ all PIDs  │  │        │  │           │    │       │  └────────────┘          │
│ └───────────┘  └────────┘  └───────────┘    │       │         │                │
│                                             │       │         ▼                │
└─────────────────────────────────────────────┘       │ GET /api/v1/metrics/...  │
                                                      └──────────────────────────┘
```

### Collector

Runs as a sidecar alongside CI workloads. On a configurable interval, it reads `/proc` to collect CPU and memory usage for all visible processes, groups them by container using cgroup paths, and accumulates samples. On shutdown (SIGINT/SIGTERM), it computes run-level statistics (peak, avg, percentiles) and pushes a single summary to the receiver.

```bash
./collector --interval=2s --top=10 --push-endpoint=http://receiver:8080/api/v1/metrics
```

**Flags:** `--interval`, `--proc-path`, `--log-level`, `--log-format`, `--top`, `--push-endpoint`

**Environment variables:**

| Variable                  | Description                           | Example             |
| ------------------------- | ------------------------------------- | ------------------- |
| `GITHUB_REPOSITORY_OWNER` | Organization name                     | `my-org`            |
| `GITHUB_REPOSITORY`       | Full repository path                  | `my-org/my-repo`    |
| `GITHUB_WORKFLOW`         | Workflow filename                     | `ci.yml`            |
| `GITHUB_JOB`              | Job name                              | `build`             |
| `GITHUB_RUN_ID`           | Unique run identifier                 | `run-123`           |
| `CGROUP_PROCESS_MAP`      | JSON: process name → container name   | `{"node":"runner"}` |
| `CGROUP_LIMITS`           | JSON: per-container CPU/memory limits | See below           |

**CGROUP_LIMITS example:**

```json
{
  "runner": { "cpu": "2", "memory": "1Gi" },
  "sidecar": { "cpu": "500m", "memory": "256Mi" }
}
```

CPU supports Kubernetes notation (`"2"` = 2 cores, `"500m"` = 0.5 cores). Memory supports `Ki`, `Mi`, `Gi`, `Ti` (binary) or `K`, `M`, `G`, `T` (decimal). A parsing sketch appears at the end of this section.

### Receiver

An HTTP service that stores metric summaries in SQLite (via GORM) and exposes a query API.

```bash
./receiver --addr=:8080 --db=metrics.db --read-token=my-secret-token
```

**Flags:**

| Flag           | Environment Variable  | Description                                     | Default      |
| -------------- | --------------------- | ----------------------------------------------- | ------------ |
| `--addr`       | —                     | HTTP listen address                             | `:8080`      |
| `--db`         | —                     | SQLite database path                            | `metrics.db` |
| `--read-token` | `RECEIVER_READ_TOKEN` | Pre-shared token for read endpoints (optional)  | —            |

**Endpoints:**

- `POST /api/v1/metrics` — receive and store a metric summary
- `GET /api/v1/metrics/repo/{org}/{repo}/{workflow}/{job}` — query stored metrics (protected if `--read-token` is set)

**Authentication:** When `--read-token` is configured, the GET endpoint requires a Bearer token:

```bash
TOKEN=my-secret-token # gitleaks:allow
curl -H "Authorization: Bearer $TOKEN" \
  http://localhost:8080/api/v1/metrics/repo/org/repo/workflow/job
```

If no token is configured, the endpoint remains open.
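The `CGROUP_LIMITS` quantity notation is small enough to parse by hand. As a reference for the grammar above, here is a minimal, self-contained Go sketch; the function names are illustrative, and the project's actual parser lives in `internal/cgroup` and may differ in detail.

```go
// A sketch of parsing the CGROUP_LIMITS quantity notation. Function
// names are illustrative; the project's parser lives in internal/cgroup.
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPU converts Kubernetes CPU notation to cores:
// "2" -> 2.0 cores, "500m" -> 0.5 cores.
func parseCPU(s string) (float64, error) {
	if strings.HasSuffix(s, "m") {
		milli, err := strconv.ParseFloat(strings.TrimSuffix(s, "m"), 64)
		if err != nil {
			return 0, err
		}
		return milli / 1000, nil
	}
	return strconv.ParseFloat(s, 64)
}

// parseMemory converts memory notation to bytes: Ki/Mi/Gi/Ti are
// powers of 1024, K/M/G/T powers of 1000, no suffix means plain bytes.
func parseMemory(s string) (uint64, error) {
	multipliers := map[string]uint64{
		"Ki": 1 << 10, "Mi": 1 << 20, "Gi": 1 << 30, "Ti": 1 << 40,
		"K": 1_000, "M": 1_000_000, "G": 1_000_000_000, "T": 1_000_000_000_000,
	}
	// Binary suffixes are checked first so "256Mi" is not misread as "M".
	for _, suffix := range []string{"Ki", "Mi", "Gi", "Ti", "K", "M", "G", "T"} {
		if strings.HasSuffix(s, suffix) {
			n, err := strconv.ParseUint(strings.TrimSuffix(s, suffix), 10, 64)
			if err != nil {
				return 0, err
			}
			return n * multipliers[suffix], nil
		}
	}
	return strconv.ParseUint(s, 10, 64)
}

func main() {
	cores, _ := parseCPU("500m")   // 0.5
	mem, _ := parseMemory("256Mi") // 268435456
	fmt.Println(cores, mem)
}
```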
## How Metrics Are Collected

The collector reads `/proc/[pid]/stat` for every visible process to get CPU ticks (`utime` + `stime`) and `/proc/[pid]/status` for memory (RSS). It takes two samples per interval and computes the delta to derive CPU usage rates; a sketch of this computation appears at the end of this section.

Processes are grouped into containers by reading `/proc/[pid]/cgroup` and matching cgroup paths against the `CGROUP_PROCESS_MAP`. This is necessary because in shared PID namespace pods, `/proc/stat` only shows host-level aggregates, so per-container metrics must be built up from individual process data.

Container CPU is reported in **cores** (not percentage) for direct comparison with Kubernetes resource limits. System-level CPU is reported as a percentage (0-100%).

Over the course of a run, the `summary.Accumulator` tracks every sample and on shutdown computes:

| Stat                       | Description                    |
| -------------------------- | ------------------------------ |
| `peak`                     | Maximum observed value         |
| `p99`, `p95`, `p75`, `p50` | Percentiles across all samples |
| `avg`                      | Arithmetic mean                |

These stats are computed for CPU, memory, and per-container metrics; a sketch of the accumulator also follows the API example below.

## API Response

```
GET /api/v1/metrics/repo/my-org/my-repo/ci.yml/build
```

```json
[
  {
    "id": 1,
    "organization": "my-org",
    "repository": "my-org/my-repo",
    "workflow": "ci.yml",
    "job": "build",
    "run_id": "run-123",
    "received_at": "2026-02-06T14:30:23.056Z",
    "payload": {
      "start_time": "2026-02-06T14:30:02.185Z",
      "end_time": "2026-02-06T14:30:22.190Z",
      "duration_seconds": 20.0,
      "sample_count": 11,
      "cpu_total_percent": { "peak": ..., "avg": ..., "p50": ... },
      "mem_used_bytes": { "peak": ..., "avg": ... },
      "containers": [
        {
          "name": "runner",
          "cpu_cores": { "peak": 2.007, "avg": 1.5, "p50": 1.817, "p95": 2.004 },
          "memory_bytes": { "peak": 18567168, "avg": 18567168 }
        }
      ],
      "top_cpu_processes": [ ... ],
      "top_mem_processes": [ ... ]
    }
  }
]
```

**CPU metric distinction:**

- `cpu_total_percent` — system-wide, 0-100%
- `cpu_cores` (containers) — cores used (e.g. `2.0` = two full cores)
- `peak_cpu_percent` (processes) — per-process, where 100% = 1 core

All memory values are in **bytes**.
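The delta computation referenced above can be illustrated in a few lines. This sketch is hypothetical rather than the project's code: it samples a single PID instead of every visible process, and hard-codes the common Linux tick rate of 100 Hz (the authoritative value comes from `sysconf(_SC_CLK_TCK)`).

```go
// A self-contained sketch of deriving CPU cores from two readings of
// utime+stime in /proc/[pid]/stat. Field positions follow proc(5).
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// readTicks returns utime+stime in clock ticks for one PID.
func readTicks(pid int) (uint64, error) {
	data, err := os.ReadFile(fmt.Sprintf("/proc/%d/stat", pid))
	if err != nil {
		return 0, err
	}
	// comm (field 2) is parenthesised and may contain spaces, so split
	// only after the closing ')'. utime and stime are fields 14 and 15,
	// which land at indices 11 and 12 of the remainder.
	s := string(data)
	rest := strings.Fields(s[strings.LastIndexByte(s, ')')+1:])
	utime, err := strconv.ParseUint(rest[11], 10, 64)
	if err != nil {
		return 0, err
	}
	stime, err := strconv.ParseUint(rest[12], 10, 64)
	if err != nil {
		return 0, err
	}
	return utime + stime, nil
}

func main() {
	const ticksPerSec = 100.0 // sysconf(_SC_CLK_TCK); 100 on typical Linux

	pid := os.Getpid()
	before, err := readTicks(pid)
	if err != nil {
		panic(err)
	}
	start := time.Now()

	time.Sleep(2 * time.Second) // one collector --interval

	after, err := readTicks(pid)
	if err != nil {
		panic(err)
	}
	elapsed := time.Since(start).Seconds()

	// Cores = CPU-seconds consumed / wall-clock seconds elapsed.
	cores := float64(after-before) / ticksPerSec / elapsed
	fmt.Printf("%.3f cores\n", cores)
}
```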
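The shutdown statistics are equally compact. This sketch assumes a nearest-rank percentile convention; the real accumulator in `internal/summary` may use a different method, and tracks CPU, memory, and per-container series rather than a single slice.

```go
// A sketch of the run-level statistics table above: accumulate samples,
// then compute peak/avg/percentiles at shutdown. The type is an
// illustrative stand-in for the project's summary.Accumulator.
package main

import (
	"fmt"
	"math"
	"sort"
)

type Accumulator struct{ samples []float64 }

func (a *Accumulator) Add(v float64) { a.samples = append(a.samples, v) }

// percentile returns the q-th percentile (0 < q <= 1) of sorted values
// using the nearest-rank method.
func percentile(sorted []float64, q float64) float64 {
	rank := int(math.Ceil(q*float64(len(sorted)))) - 1
	if rank < 0 {
		rank = 0
	}
	return sorted[rank]
}

// Stats computes the summary pushed to the receiver on shutdown.
func (a *Accumulator) Stats() map[string]float64 {
	n := len(a.samples)
	if n == 0 {
		return nil
	}
	sorted := append([]float64(nil), a.samples...)
	sort.Float64s(sorted)

	sum := 0.0
	for _, v := range sorted {
		sum += v
	}
	return map[string]float64{
		"peak": sorted[n-1],
		"avg":  sum / float64(n),
		"p99":  percentile(sorted, 0.99),
		"p95":  percentile(sorted, 0.95),
		"p75":  percentile(sorted, 0.75),
		"p50":  percentile(sorted, 0.50),
	}
}

func main() {
	var acc Accumulator
	for _, cores := range []float64{0.2, 1.5, 2.0, 1.8, 0.4} {
		acc.Add(cores)
	}
	fmt.Println(acc.Stats()) // map[avg:1.18 p50:1.5 ... peak:2]
}
```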
## Running

### Docker Compose

```bash
docker compose -f test/docker/docker-compose-stress.yaml up -d

# Wait for collection, then trigger the shutdown summary:
docker compose -f test/docker/docker-compose-stress.yaml stop collector

# Query results:
curl http://localhost:9080/api/v1/metrics/repo/test-org/test-org%2Fstress-test/stress-test-workflow/heavy-workload
```

### Local

```bash
go build -o collector ./cmd/collector
go build -o receiver ./cmd/receiver

./receiver --addr=:8080 --db=metrics.db
./collector --interval=2s --top=10 --push-endpoint=http://localhost:8080/api/v1/metrics
```

## Internal Packages

| Package              | Purpose                                                              |
| -------------------- | -------------------------------------------------------------------- |
| `internal/proc`      | Low-level `/proc` parsing (stat, status, cgroup)                     |
| `internal/metrics`   | Aggregates process metrics from `/proc` into system/container views  |
| `internal/cgroup`    | Parses `CGROUP_PROCESS_MAP` and `CGROUP_LIMITS` env vars             |
| `internal/collector` | Orchestrates the collection loop and shutdown                        |
| `internal/summary`   | Accumulates samples, computes stats, pushes to receiver              |
| `internal/receiver`  | HTTP handlers and SQLite store                                       |
| `internal/output`    | Metrics output formatting (JSON/text)                                |

## Background

Technical reference on the Linux primitives this project builds on:

- [Identifying process cgroups by PID](docs/background/identify-process-cgroup-by-pid.md) — how to read `/proc/[pid]/cgroup` to determine which container a process belongs to
- [/proc/stat behavior in containers](docs/background/proc-stat-in-containers.md) — why `/proc/stat` shows host-level data in containers, and how to aggregate per-process stats from `/proc/[pid]/stat` instead, including CPU tick conversion and cgroup limit handling
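As a concrete companion to the first background document, here is a minimal sketch of reading a process's cgroup path. The helper name is illustrative; the project's real parsing lives in `internal/proc`.

```go
// A sketch of reading /proc/[pid]/cgroup, the first step in mapping a
// process to its container. Illustrative only; see internal/proc for
// the project's implementation.
package main

import (
	"fmt"
	"os"
	"strings"
)

// cgroupPath extracts the cgroup path for a PID. Each line of the file
// has the form "hierarchy-ID:controllers:path"; on cgroup v2 there is a
// single "0::" entry, so the first parseable line is returned here.
func cgroupPath(pid int) (string, error) {
	data, err := os.ReadFile(fmt.Sprintf("/proc/%d/cgroup", pid))
	if err != nil {
		return "", err
	}
	for _, line := range strings.Split(strings.TrimSpace(string(data)), "\n") {
		if parts := strings.SplitN(line, ":", 3); len(parts) == 3 {
			return parts[2], nil
		}
	}
	return "", fmt.Errorf("no cgroup entry for pid %d", pid)
}

func main() {
	path, err := cgroupPath(os.Getpid())
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// In a container this is the path the collector matches against its
	// container-mapping rules.
	fmt.Println(path)
}
```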