# Forgejo Runner Sizer
A resource sizer for CI/CD workloads in shared PID namespace environments. The collector reads /proc to gather CPU and memory metrics grouped by container/cgroup, and pushes run summaries to the receiver. The receiver stores metrics and exposes a sizer API that computes right-sized Kubernetes resource requests and limits from historical data.
## Architecture
The system has two binaries — a collector and a receiver (which includes the sizer):
```
┌─────────────────────────────────────────────┐        ┌──────────────────────────┐
│ CI/CD Pod (shared PID namespace)            │        │ Receiver Service         │
│                                             │        │                          │
│ ┌───────────┐  ┌────────┐  ┌───────────┐    │        │ POST /api/v1/metrics     │
│ │ collector │  │ runner │  │ sidecar   │    │        │   │                      │
│ │           │  │        │  │           │    │        │   ▼                      │
│ │ reads     │  │        │  │           │    │  push  │ ┌────────────┐           │
│ │ /proc for │  │        │  │           │    │──────▶ │ │ SQLite     │           │
│ │ all PIDs  │  │        │  │           │    │        │ └────────────┘           │
│ └───────────┘  └────────┘  └───────────┘    │        │   │                      │
│                                             │        │   ▼                      │
└─────────────────────────────────────────────┘        │ GET /api/v1/metrics/...  │
                                                       │ GET /api/v1/sizing/...   │
                                                       │ (sizer)                  │
                                                       └──────────────────────────┘
```
## Collector
Runs as a sidecar alongside CI workloads. On a configurable interval, it reads /proc to collect CPU and memory for all visible processes, groups them by container using cgroup paths, and accumulates samples. On shutdown (SIGINT/SIGTERM), it computes run-level statistics (peak, avg, percentiles) and pushes a single summary to the receiver.
```shell
./collector --interval=2s --top=10 --push-endpoint=http://receiver:8080/api/v1/metrics
```

Flags: `--interval`, `--proc-path`, `--log-level`, `--log-format`, `--top`, `--push-endpoint`, `--push-token`
Environment variables:

| Variable | Description | Example |
|---|---|---|
| `GITHUB_REPOSITORY_OWNER` | Organization name | `my-org` |
| `GITHUB_REPOSITORY` | Full repository path | `my-org/my-repo` |
| `GITHUB_WORKFLOW` | Workflow filename | `ci.yml` |
| `GITHUB_JOB` | Job name | `build` |
| `GITHUB_RUN_ID` | Unique run identifier | `run-123` |
| `COLLECTOR_PUSH_TOKEN` | Bearer token for push endpoint auth | — |
| `CGROUP_PROCESS_MAP` | JSON: process name → container name | `{"node":"runner"}` |
| `CGROUP_LIMITS` | JSON: per-container CPU/memory limits | See below |
`CGROUP_LIMITS` example:

```json
{
  "runner": { "cpu": "2", "memory": "1Gi" },
  "sidecar": { "cpu": "500m", "memory": "256Mi" }
}
```
CPU supports Kubernetes notation ("2" = 2 cores, "500m" = 0.5 cores). Memory supports Ki, Mi, Gi, Ti (binary) or K, M, G, T (decimal).
## Receiver (with sizer)
HTTP service that stores metric summaries in SQLite (via GORM), exposes a query API, and provides a sizer endpoint that computes right-sized Kubernetes resource requests and limits from historical run data.
```shell
./receiver --addr=:8080 --db=metrics.db --read-token=my-secret-token --hmac-key=my-hmac-key
```
Flags:

| Flag | Environment Variable | Description | Default |
|---|---|---|---|
| `--addr` | — | HTTP listen address | `:8080` |
| `--db` | — | SQLite database path | `metrics.db` |
| `--read-token` | `RECEIVER_READ_TOKEN` | Pre-shared token for read/admin endpoints (required) | — |
| `--hmac-key` | `RECEIVER_HMAC_KEY` | Secret key for push token generation/validation (required) | — |
### Web UI
The receiver includes a web UI for viewing collected metrics.
- URL: `/ui`
- Authentication: The UI uses the same `--read-token` as the API. Enter the token in the UI to load metrics.
Endpoints:

- `POST /api/v1/metrics` — receive and store a metric summary (requires scoped push token)
- `POST /api/v1/token` — generate a scoped push token (requires read token auth)
- `GET /api/v1/metrics/repo/{org}/{repo}/{workflow}/{job}` — query stored metrics (requires read token auth)
- `GET /api/v1/debug/metrics` — return all metric rows from the database (requires read token auth)
- `GET /api/v1/sizing/repo/{org}/{repo}/{workflow}/{job}` — compute container sizes from historical data (requires read token auth)
Authentication:
All metrics endpoints require authentication via `--read-token`:

- The GET endpoints require a Bearer token matching the read token
- The POST metrics endpoint requires a scoped push token (generated via `POST /api/v1/token`)
- The token endpoint itself requires the read token
Token flow (the `#gitleaks:allow` markers now sit at the end of complete commands, since a comment after a backslash continuation would break the line):

```shell
# 1. Admin generates a scoped push token using the read token
curl -X POST http://localhost:8080/api/v1/token \
  -H "Authorization: Bearer my-secret-token" \
  -H "Content-Type: application/json" \
  -d '{"organization":"my-org","repository":"my-repo","workflow":"ci.yml","job":"build"}'
# → {"token":"<hex-encoded HMAC>"}

# 2. Collector uses the scoped token to push metrics
./collector --push-endpoint=http://localhost:8080/api/v1/metrics \
  --push-token=<token-from-step-1>

# 3. Query metrics with the read token
curl -H "Authorization: Bearer my-secret-token" http://localhost:8080/api/v1/metrics/repo/my-org/my-repo/ci.yml/build #gitleaks:allow

# 4. Debug endpoint: dump all stored metrics
curl -H "Authorization: Bearer my-secret-token" http://localhost:8080/api/v1/debug/metrics #gitleaks:allow
```
Push tokens are HMAC-SHA256 digests derived from --hmac-key and the scope (org/repo/workflow/job). They are stateless — no database storage is needed. The HMAC key is separate from the read token so that compromising a push token does not expose the admin credential.
## How Metrics Are Collected
The collector reads /proc/[pid]/stat for every visible process to get CPU ticks (utime + stime) and /proc/[pid]/status for memory (RSS). It takes two samples per interval and computes the delta to derive CPU usage rates.
Processes are grouped into containers by reading /proc/[pid]/cgroup and matching cgroup paths against the CGROUP_PROCESS_MAP. This is necessary because in shared PID namespace pods, /proc/stat only shows host-level aggregates — per-container metrics must be built up from individual process data.
Container CPU is reported in cores (not percentage) for direct comparison with Kubernetes resource limits. System-level CPU is reported as a percentage (0-100%).
Over the course of a run, the summary.Accumulator tracks every sample and on shutdown computes:
| Stat | Description |
|---|---|
| `peak` | Maximum observed value |
| `p99`, `p95`, `p75`, `p50` | Percentiles across all samples |
| `avg` | Arithmetic mean |
These stats are computed for CPU, memory, and per-container metrics.
### API Response

```
GET /api/v1/metrics/repo/my-org/my-repo/ci.yml/build
```
```
[
  {
    "id": 1,
    "organization": "my-org",
    "repository": "my-org/my-repo",
    "workflow": "ci.yml",
    "job": "build",
    "run_id": "run-123",
    "received_at": "2026-02-06T14:30:23.056Z",
    "payload": {
      "start_time": "2026-02-06T14:30:02.185Z",
      "end_time": "2026-02-06T14:30:22.190Z",
      "duration_seconds": 20.0,
      "sample_count": 11,
      "cpu_total_percent": { "peak": ..., "avg": ..., "p50": ... },
      "mem_used_bytes": { "peak": ..., "avg": ... },
      "containers": [
        {
          "name": "runner",
          "cpu_cores": { "peak": 2.007, "avg": 1.5, "p50": 1.817, "p95": 2.004 },
          "memory_bytes": { "peak": 18567168, "avg": 18567168 }
        }
      ],
      "top_cpu_processes": [ ... ],
      "top_mem_processes": [ ... ]
    }
  }
]
```
CPU metric distinction:

- `cpu_total_percent` — system-wide, 0-100%
- `cpu_cores` (containers) — cores used (e.g. `2.0` = two full cores)
- `peak_cpu_percent` (processes) — per-process, where 100% = 1 core
All memory values are in bytes.
## How Sizing Works
The sizer computes Kubernetes resource requests and limits by aggregating historical run data for a given workflow/job combination.
### Algorithm
1. Collect the N most recent runs (default: 5, configurable via `?runs=`).

2. Per container, across runs:
   - CPU request — take the selected percentile (default: p95) of each run's CPU usage, then take the maximum across runs.
   - Memory request — take the peak memory of each run, then take the maximum across runs.

3. Apply a buffer to add headroom above observed values:
   - CPU uses a flat configurable buffer (default: 20%, via `?buffer=`).
   - Memory uses a staircase buffer — larger allocations are inherently more stable and over-provisioning them wastes more cluster resources:

     | Observed peak | Buffer |
     |---|---|
     | < 1 GiB | 20% |
     | 1 – 4 GiB | 10% |
     | > 4 GiB | 5% |

4. Apply floor values — ensure every container gets a minimum viable allocation even if it was completely idle in all observed runs:

   | Resource | Request floor | Limit floor |
   |---|---|---|
   | CPU | `10m` | `500m` |
   | Memory | `32Mi` | `128Mi` |

   Request and limit floors are intentionally asymmetric: a low request allows efficient scheduling bin-packing, while a higher limit prevents OOM kills or severe throttling if a previously-idle container becomes active.

5. Apply a memory ceiling — a single container cannot be recommended more memory than the entire pod ever consumed across all observed runs, plus 20%. This caps outlier recommendations without hardcoding a node-size-specific value; the ceiling adapts automatically as more runs are collected.

6. Round limits to clean values: CPU limits round up to the nearest 0.5 cores; memory limits round up to the next power of 2 in Mi.
### Query parameters

| Parameter | Default | Description |
|---|---|---|
| `runs` | `5` | Number of recent runs to analyse (1–100) |
| `buffer` | `20` | CPU headroom percentage (memory uses the staircase above) |
| `cpu_percentile` | `p95` | CPU stat to use: `peak`, `p99`, `p95`, `p75`, `p50`, `avg` |
### Sizing response

```
GET /api/v1/sizing/repo/my-org/my-repo/ci.yml/build?runs=10&buffer=20&cpu_percentile=p95
```
```json
{
  "containers": [
    {
      "name": "runner",
      "cpu": { "request": "960m", "limit": "1" },
      "memory": { "request": "615Mi", "limit": "1024Mi" }
    },
    {
      "name": "buildkitd",
      "cpu": { "request": "10m", "limit": "500m" },
      "memory": { "request": "32Mi", "limit": "128Mi" }
    }
  ],
  "total": {
    "cpu": { "request": "970m", "limit": "1500m" },
    "memory": { "request": "647Mi", "limit": "1024Mi" }
  },
  "meta": {
    "runs_analyzed": 10,
    "buffer_percent": 20,
    "cpu_percentile": "p95"
  }
}
```
The `total` fields sum requests and limits across all containers and can be used to size the pod as a whole.
Note: For per-container sizing to work correctly, the collector must have `CGROUP_PROCESS_MAP` configured so that processes are grouped under stable container names. Runs collected without this mapping use raw cgroup paths as container identifiers, which change every run and will never accumulate history.
## Running

### Docker Compose
```shell
# Start the receiver (builds image if needed):
docker compose -f test/docker/docker-compose-stress.yaml up -d --build receiver

# Generate a scoped push token for the collector:
PUSH_TOKEN=$(curl -s -X POST http://localhost:9080/api/v1/token \
  -H "Authorization: Bearer dummyreadtoken" \
  -H "Content-Type: application/json" \
  -d '{"organization":"test-org","repository":"test-org/stress-test","workflow":"stress-test-workflow","job":"heavy-workload"}' \
  | jq -r .token)

# Start the collector and stress workloads with the push token:
COLLECTOR_PUSH_TOKEN=$PUSH_TOKEN \
  docker compose -f test/docker/docker-compose-stress.yaml up -d --build collector

# ... Wait for data collection ...

# Trigger shutdown summary:
docker compose -f test/docker/docker-compose-stress.yaml stop collector

# Query results with the read token:
curl -H "Authorization: Bearer dummyreadtoken" \
  http://localhost:9080/api/v1/metrics/repo/test-org/test-org%2Fstress-test/stress-test-workflow/heavy-workload
```
### Local
```shell
go build -o collector ./cmd/collector
go build -o receiver ./cmd/receiver

# Start receiver with both keys:
./receiver --addr=:8080 --db=metrics.db \
  --read-token=my-secret-token --hmac-key=my-hmac-key

# Generate a scoped push token:
PUSH_TOKEN=$(curl -s -X POST http://localhost:8080/api/v1/token \
  -H "Authorization: Bearer my-secret-token" \
  -H "Content-Type: application/json" \
  -d '{"organization":"my-org","repository":"my-repo","workflow":"ci.yml","job":"build"}' \
  | jq -r .token)

# Run collector with the push token:
./collector --interval=2s --top=10 \
  --push-endpoint=http://localhost:8080/api/v1/metrics \
  --push-token=$PUSH_TOKEN
```
## Internal Packages

| Package | Purpose |
|---|---|
| `internal/proc` | Low-level `/proc` parsing (stat, status, cgroup) |
| `internal/metrics` | Aggregates process metrics from `/proc` into system/container views |
| `internal/cgroup` | Parses `CGROUP_PROCESS_MAP` and `CGROUP_LIMITS` env vars |
| `internal/collector` | Orchestrates the collection loop and shutdown |
| `internal/summary` | Accumulates samples, computes stats, pushes to receiver |
| `internal/receiver` | HTTP handlers, SQLite store, and sizer logic |
| `internal/output` | Metrics output formatting (JSON/text) |
## Dependency Updates (Renovate)

This repository includes a scheduled Renovate workflow at `.github/workflows/renovate.yaml`.

Create a repository secret named `RENOVATE_TOKEN` from a dedicated Forgejo bot account PAT.

Required Forgejo token scopes for `RENOVATE_TOKEN`:

- `repo` (Read and Write)
- `user` (Read)
- `issue` (Read and Write)
- `organization` (Read)

If Renovate needs to read Forgejo packages, also add `read:packages`.
## Background

Technical reference on the Linux primitives this project builds on:

- Identifying process cgroups by PID — how to read `/proc/<PID>/cgroup` to determine which container a process belongs to
- `/proc/stat` behavior in containers — why `/proc/stat` shows host-level data in containers, and how to aggregate per-process stats from `/proc/[pid]/stat` instead, including CPU tick conversion and cgroup limit handling