# Forgejo Runner Resource Collector

A lightweight metrics collector for CI/CD workloads in shared PID namespace environments. It reads `/proc` to collect CPU and memory metrics, groups them by container/cgroup, and pushes run summaries to a receiver service for storage and querying.
## Architecture

The system has two independent binaries:
```
┌─────────────────────────────────────────────┐        ┌──────────────────────────┐
│ CI/CD Pod (shared PID namespace)            │        │ Receiver Service         │
│                                             │        │                          │
│ ┌───────────┐  ┌────────┐  ┌───────────┐    │        │  POST /api/v1/metrics    │
│ │ collector │  │ runner │  │ sidecar   │    │        │    │                     │
│ │           │  │        │  │           │    │        │    ▼                     │
│ │ reads     │  │        │  │           │    │  push  │  ┌────────────┐          │
│ │ /proc for │  │        │  │           │    │───────▶│  │ SQLite     │          │
│ │ all PIDs  │  │        │  │           │    │        │  └────────────┘          │
│ └───────────┘  └────────┘  └───────────┘    │        │    │                     │
│                                             │        │    ▼                     │
└─────────────────────────────────────────────┘        │  GET /api/v1/metrics/... │
                                                       └──────────────────────────┘
```
## Collector

Runs as a sidecar alongside CI workloads. On a configurable interval, it reads `/proc` to collect CPU and memory usage for all visible processes, groups them by container using cgroup paths, and accumulates samples. On shutdown (SIGINT/SIGTERM), it computes run-level statistics (peak, avg, percentiles) and pushes a single summary to the receiver.

```sh
./collector --interval=2s --top=10 --push-endpoint=http://receiver:8080/api/v1/metrics
```

Flags: `--interval`, `--proc-path`, `--log-level`, `--log-format`, `--top`, `--push-endpoint`
Environment variables:

| Variable | Description | Example |
|---|---|---|
| `GITHUB_REPOSITORY_OWNER` | Organization name | `my-org` |
| `GITHUB_REPOSITORY` | Full repository path | `my-org/my-repo` |
| `GITHUB_WORKFLOW` | Workflow filename | `ci.yml` |
| `GITHUB_JOB` | Job name | `build` |
| `GITHUB_RUN_ID` | Unique run identifier | `run-123` |
| `CGROUP_PROCESS_MAP` | JSON: process name → container name | `{"node":"runner"}` |
| `CGROUP_LIMITS` | JSON: per-container CPU/memory limits | See below |
`CGROUP_LIMITS` example:

```json
{
  "runner": { "cpu": "2", "memory": "1Gi" },
  "sidecar": { "cpu": "500m", "memory": "256Mi" }
}
```

CPU supports Kubernetes notation (`"2"` = 2 cores, `"500m"` = 0.5 cores). Memory supports `Ki`, `Mi`, `Gi`, `Ti` (binary) or `K`, `M`, `G`, `T` (decimal).
## Receiver

HTTP service that stores metric summaries in SQLite (via GORM) and exposes a query API.

```sh
./receiver --addr=:8080 --db=metrics.db --read-token=my-secret-token
```
Flags:

| Flag | Environment Variable | Description | Default |
|---|---|---|---|
| `--addr` | — | HTTP listen address | `:8080` |
| `--db` | — | SQLite database path | `metrics.db` |
| `--read-token` | `RECEIVER_READ_TOKEN` | Pre-shared token for read endpoints (optional) | — |
Endpoints:

- `POST /api/v1/metrics` — receive and store a metric summary
- `GET /api/v1/metrics/repo/{org}/{repo}/{workflow}/{job}` — query stored metrics (protected if `--read-token` is set)
Authentication:

When `--read-token` is configured, the GET endpoint requires a Bearer token:

```sh
curl -H "Authorization: Bearer my-secret-token" \ #gitleaks:allow
  http://localhost:8080/api/v1/metrics/repo/org/repo/workflow/job
```

If no token is configured, the endpoint remains open.
## How Metrics Are Collected

The collector reads `/proc/[pid]/stat` for every visible process to get CPU ticks (`utime` + `stime`) and `/proc/[pid]/status` for memory (RSS). It takes two samples per interval and computes the delta to derive CPU usage rates.
Processes are grouped into containers by reading `/proc/[pid]/cgroup` and matching cgroup paths against the `CGROUP_PROCESS_MAP`. This is necessary because in shared PID namespace pods, `/proc/stat` only shows host-level aggregates — per-container metrics must be built up from individual process data.
Container CPU is reported in cores (not percentage) for direct comparison with Kubernetes resource limits. System-level CPU is reported as a percentage (0-100%).
Over the course of a run, the `summary.Accumulator` tracks every sample and on shutdown computes:

| Stat | Description |
|---|---|
| `peak` | Maximum observed value |
| `p99`, `p95`, `p75`, `p50` | Percentiles across all samples |
| `avg` | Arithmetic mean |
These stats are computed for CPU, memory, and per-container metrics.
## API Response

```
GET /api/v1/metrics/repo/my-org/my-repo/ci.yml/build
```

```json
[
  {
    "id": 1,
    "organization": "my-org",
    "repository": "my-org/my-repo",
    "workflow": "ci.yml",
    "job": "build",
    "run_id": "run-123",
    "received_at": "2026-02-06T14:30:23.056Z",
    "payload": {
      "start_time": "2026-02-06T14:30:02.185Z",
      "end_time": "2026-02-06T14:30:22.190Z",
      "duration_seconds": 20.0,
      "sample_count": 11,
      "cpu_total_percent": { "peak": ..., "avg": ..., "p50": ... },
      "mem_used_bytes": { "peak": ..., "avg": ... },
      "containers": [
        {
          "name": "runner",
          "cpu_cores": { "peak": 2.007, "avg": 1.5, "p50": 1.817, "p95": 2.004 },
          "memory_bytes": { "peak": 18567168, "avg": 18567168 }
        }
      ],
      "top_cpu_processes": [ ... ],
      "top_mem_processes": [ ... ]
    }
  }
]
```
CPU metric distinction:

- `cpu_total_percent` — system-wide, 0-100%
- `cpu_cores` (containers) — cores used (e.g. `2.0` = two full cores)
- `peak_cpu_percent` (processes) — per-process, where 100% = 1 core
All memory values are in bytes.
## Running

### Docker Compose

```sh
docker compose -f test/docker/docker-compose-stress.yaml up -d

# Wait for collection, then trigger shutdown summary:
docker compose -f test/docker/docker-compose-stress.yaml stop collector

# Query results:
curl http://localhost:9080/api/v1/metrics/repo/test-org/test-org%2Fstress-test/stress-test-workflow/heavy-workload
```
### Local

```sh
go build -o collector ./cmd/collector
go build -o receiver ./cmd/receiver

./receiver --addr=:8080 --db=metrics.db
./collector --interval=2s --top=10 --push-endpoint=http://localhost:8080/api/v1/metrics
```
## Internal Packages

| Package | Purpose |
|---|---|
| `internal/proc` | Low-level `/proc` parsing (stat, status, cgroup) |
| `internal/metrics` | Aggregates process metrics from `/proc` into system/container views |
| `internal/cgroup` | Parses `CGROUP_PROCESS_MAP` and `CGROUP_LIMITS` env vars |
| `internal/collector` | Orchestrates the collection loop and shutdown |
| `internal/summary` | Accumulates samples, computes stats, pushes to receiver |
| `internal/receiver` | HTTP handlers and SQLite store |
| `internal/output` | Metrics output formatting (JSON/text) |
## Background

Technical reference on the Linux primitives this project builds on:

- **Identifying process cgroups by PID** — how to read `/proc/<PID>/cgroup` to determine which container a process belongs to
- **/proc/stat behavior in containers** — why `/proc/stat` shows host-level data in containers, and how to aggregate per-process stats from `/proc/[pid]/stat` instead, including CPU tick conversion and cgroup limit handling