Forgejo Runner Resource Collector
A lightweight resource metrics collector designed to run alongside CI/CD workloads in shared PID namespace environments. It collects CPU and memory metrics, groups them by container/cgroup, and pushes summaries to a receiver service.
Components
- Collector: Gathers system and per-process metrics at regular intervals, computes run-level statistics, and pushes a summary on shutdown.
- Receiver: HTTP service that stores metric summaries in SQLite and provides a query API.
Receiver API
POST /api/v1/metrics
Receives metric summaries from collectors.
GET /api/v1/metrics/repo/{org}/{repo}/{workflow}/{job}
Retrieves all stored metrics for a specific workflow and job.
Example request:
GET /api/v1/metrics/repo/my-org/my-repo/ci.yml/build
Example response:
[
  {
    "id": 1,
    "organization": "my-org",
    "repository": "my-org/my-repo",
    "workflow": "ci.yml",
    "job": "build",
    "run_id": "run-123",
    "received_at": "2026-02-06T14:30:23.056Z",
    "payload": {
      "start_time": "2026-02-06T14:30:02.185Z",
      "end_time": "2026-02-06T14:30:22.190Z",
      "duration_seconds": 20.0,
      "sample_count": 11,
      "cpu_total_percent": { ... },
      "mem_used_bytes": { ... },
      "mem_used_percent": { ... },
      "top_cpu_processes": [ ... ],
      "top_mem_processes": [ ... ],
      "containers": [
        {
          "name": "runner",
          "cpu_cores": {
            "peak": 2.007,
            "p99": 2.005,
            "p95": 2.004,
            "p75": 1.997,
            "p50": 1.817,
            "avg": 1.5
          },
          "memory_bytes": {
            "peak": 18567168,
            "p99": 18567168,
            "p95": 18567168,
            "p75": 18567168,
            "p50": 18567168,
            "avg": 18567168
          }
        }
      ]
    }
  }
]
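The response above can be consumed from Go roughly as follows. This is a minimal sketch, not the project's own client: the type names (`StoredMetric`, `Payload`, `Container`, `Stats`) and the helpers `metricsURL` and `decodeMetrics` are hypothetical, and only fields that appear in full in the example response are modeled. Whether `{repo}` expects the short name or the full `org/repo` path is not settled here; the helper simply escapes whatever it is given, so a value containing `/` becomes `%2F`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strings"
)

// Stats mirrors the statistical fields shown in the example response.
type Stats struct {
	Peak float64 `json:"peak"`
	P99  float64 `json:"p99"`
	P95  float64 `json:"p95"`
	P75  float64 `json:"p75"`
	P50  float64 `json:"p50"`
	Avg  float64 `json:"avg"`
}

// Container holds per-container metrics: cpu_cores in cores, memory_bytes in bytes.
type Container struct {
	Name        string `json:"name"`
	CPUCores    Stats  `json:"cpu_cores"`
	MemoryBytes Stats  `json:"memory_bytes"`
}

// Payload models only the fields spelled out in the example; others are ignored.
type Payload struct {
	DurationSeconds float64     `json:"duration_seconds"`
	SampleCount     int         `json:"sample_count"`
	Containers      []Container `json:"containers"`
}

// StoredMetric is one element of the returned JSON array (hypothetical name).
type StoredMetric struct {
	ID           int64   `json:"id"`
	Organization string  `json:"organization"`
	Repository   string  `json:"repository"`
	Workflow     string  `json:"workflow"`
	Job          string  `json:"job"`
	RunID        string  `json:"run_id"`
	Payload      Payload `json:"payload"`
}

// metricsURL builds the query URL, escaping each path segment separately.
func metricsURL(base, org, repo, workflow, job string) string {
	return strings.TrimRight(base, "/") + "/api/v1/metrics/repo/" +
		url.PathEscape(org) + "/" + url.PathEscape(repo) + "/" +
		url.PathEscape(workflow) + "/" + url.PathEscape(job)
}

// decodeMetrics parses a response body into a slice of StoredMetric.
func decodeMetrics(data []byte) ([]StoredMetric, error) {
	var out []StoredMetric
	err := json.Unmarshal(data, &out)
	return out, err
}

func main() {
	body := []byte(`[{"id":1,"organization":"my-org","repository":"my-org/my-repo",
		"workflow":"ci.yml","job":"build","run_id":"run-123",
		"payload":{"duration_seconds":20.0,"sample_count":11,"containers":[
		{"name":"runner","cpu_cores":{"peak":2.007,"avg":1.5},
		"memory_bytes":{"peak":18567168,"avg":18567168}}]}}]`)
	metrics, err := decodeMetrics(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(metricsURL("http://localhost:8080", "my-org", "my-repo", "ci.yml", "build"))
	for _, m := range metrics {
		for _, c := range m.Payload.Containers {
			fmt.Printf("%s/%s: peak %.3f cores, peak %.0f bytes\n",
				m.RunID, c.Name, c.CPUCores.Peak, c.MemoryBytes.Peak)
		}
	}
}
```

In a real client the body would come from an `http.Get` on the URL returned by `metricsURL`; the inline sample keeps the sketch runnable without a receiver.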
Understanding the Metrics
CPU Metrics
There are two different CPU metric formats in the response:
1. System and Process CPU: Percentage (cpu_total_percent, peak_cpu_percent)
These values express CPU utilization as a percentage; the two fields use different scales.
- cpu_total_percent: Overall system CPU usage (0-100%)
- peak_cpu_percent (in process lists): Per-process CPU usage where 100% = 1 full CPU core
2. Container CPU: Cores (cpu_cores)
Important: The cpu_cores field in container metrics represents CPU usage in number of cores, not percentage.
| Value | Meaning |
|---|---|
| 0.5 | Half a CPU core |
| 1.0 | One full CPU core |
| 2.0 | Two CPU cores |
| 2.5 | Two and a half CPU cores |
This allows direct comparison with Kubernetes resource limits (e.g., cpu: "2" or cpu: "500m").
Example interpretation:
{
  "name": "runner",
  "cpu_cores": {
    "peak": 2.007,
    "avg": 1.5
  }
}
This means the "runner" container used a peak of ~2 CPU cores and averaged 1.5 CPU cores during the run.
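To put the two CPU scales side by side, the arithmetic can be sketched in Go. The helper names `percentToCores` and `limitUtilization` are hypothetical, and the 2-core limit is an assumed value chosen for illustration, not something read from the response.

```go
package main

import "fmt"

// percentToCores converts a per-process CPU percentage (100% = 1 core) to cores.
func percentToCores(percent float64) float64 {
	return percent / 100
}

// limitUtilization returns used cores as a fraction of a limit given in cores.
// A non-positive limit is treated as "no limit" and yields 0.
func limitUtilization(usedCores, limitCores float64) float64 {
	if limitCores <= 0 {
		return 0
	}
	return usedCores / limitCores
}

func main() {
	peak, avg := 2.007, 1.5 // cpu_cores values from the example above
	limit := 2.0            // assumed container limit of 2 cores

	fmt.Printf("peak uses %.2f of the limit\n", limitUtilization(peak, limit))
	fmt.Printf("avg uses %.2f of the limit\n", limitUtilization(avg, limit))
	fmt.Printf("a process at 150%% CPU uses %.2f cores\n", percentToCores(150))
}
```

A ratio slightly above 1.0 for the peak is plausible with sampled data, since usage is averaged over each sampling interval.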
Memory Metrics
All memory values are in bytes:
- mem_used_bytes: System memory usage
- memory_bytes (in containers): Container RSS memory usage
- peak_mem_rss_bytes (in processes): Process RSS memory
Statistical Fields
Each metric includes summary statistics computed across all samples:
| Field | Description |
|---|---|
| peak | Maximum value observed |
| p99 | 99th percentile |
| p95 | 95th percentile |
| p75 | 75th percentile |
| p50 | Median (50th percentile) |
| avg | Arithmetic mean |
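As an illustration of how these fields relate to the raw samples, here is a minimal Go sketch. It assumes the nearest-rank percentile definition; the collector may use a different method (for example, linear interpolation), so results can differ slightly from what the receiver stores. The names `Stats`, `percentile`, and `summarize` are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Stats holds the six summary fields listed in the table above.
type Stats struct {
	Peak, P99, P95, P75, P50, Avg float64
}

// percentile returns the nearest-rank percentile of an ascending-sorted slice.
// Assumption: nearest-rank is used here for simplicity only.
func percentile(sorted []float64, p float64) float64 {
	if len(sorted) == 0 {
		return 0
	}
	rank := int(math.Ceil(p * float64(len(sorted)) / 100))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

// summarize computes peak, percentiles, and mean without modifying the input.
func summarize(samples []float64) Stats {
	if len(samples) == 0 {
		return Stats{}
	}
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	sum := 0.0
	for _, v := range sorted {
		sum += v
	}
	return Stats{
		Peak: sorted[len(sorted)-1],
		P99:  percentile(sorted, 99),
		P95:  percentile(sorted, 95),
		P75:  percentile(sorted, 75),
		P50:  percentile(sorted, 50),
		Avg:  sum / float64(len(sorted)),
	}
}

func main() {
	// Hypothetical per-sample CPU readings in cores.
	samples := []float64{0.4, 1.8, 2.0, 1.9, 2.0, 1.2, 2.0, 1.7, 0.9, 2.0, 1.6}
	fmt.Printf("%+v\n", summarize(samples))
}
```

With few samples (the example run has 11), the high percentiles collapse onto the peak or its nearest neighbors, which is why p95, p99, and peak sit so close together in the example response.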
Configuration
Collector Environment Variables
| Variable | Description | Example |
|---|---|---|
| GITHUB_REPOSITORY_OWNER | Organization name | my-org |
| GITHUB_REPOSITORY | Full repository path | my-org/my-repo |
| GITHUB_WORKFLOW | Workflow filename | ci.yml |
| GITHUB_JOB | Job name | build |
| GITHUB_RUN_ID | Unique run identifier | run-123 |
| CGROUP_PROCESS_MAP | JSON mapping process names to container names | {"node":"runner"} |
| CGROUP_LIMITS | JSON with CPU/memory limits per container | See below |
CGROUP_LIMITS example:
{
  "runner": {"cpu": "2", "memory": "1Gi"},
  "sidecar": {"cpu": "500m", "memory": "256Mi"}
}
CPU values support Kubernetes notation: "2" = 2 cores, "500m" = 0.5 cores.
Memory values accept the suffixes Ki, Mi, Gi, Ti (binary) or K, M, G, T (decimal).
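The notation described above can be parsed along these lines. This is a sketch under stated assumptions, not the collector's actual parser: `Limit`, `parseCPU`, and `parseMemory` are hypothetical names, a bare number is assumed to mean cores (for CPU) or bytes (for memory), and fractional byte counts are truncated.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// Limit matches one entry of the CGROUP_LIMITS JSON shown above.
type Limit struct {
	CPU    string `json:"cpu"`
	Memory string `json:"memory"`
}

// parseCPU converts Kubernetes-style CPU notation to cores: "2" -> 2, "500m" -> 0.5.
func parseCPU(s string) (float64, error) {
	s = strings.TrimSpace(s)
	if strings.HasSuffix(s, "m") {
		milli, err := strconv.ParseFloat(strings.TrimSuffix(s, "m"), 64)
		if err != nil {
			return 0, fmt.Errorf("invalid CPU value %q: %w", s, err)
		}
		return milli / 1000, nil
	}
	cores, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid CPU value %q: %w", s, err)
	}
	return cores, nil
}

// memoryUnits lists the binary suffixes first, then the decimal ones.
var memoryUnits = []struct {
	suffix string
	factor float64
}{
	{"Ki", 1 << 10}, {"Mi", 1 << 20}, {"Gi", 1 << 30}, {"Ti", 1 << 40},
	{"K", 1e3}, {"M", 1e6}, {"G", 1e9}, {"T", 1e12},
}

// parseMemory converts a memory quantity such as "1Gi" or "256M" to bytes.
func parseMemory(s string) (int64, error) {
	s = strings.TrimSpace(s)
	factor := 1.0 // assumption: no suffix means plain bytes
	for _, u := range memoryUnits {
		if strings.HasSuffix(s, u.suffix) {
			s = strings.TrimSuffix(s, u.suffix)
			factor = u.factor
			break
		}
	}
	n, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid memory value %q: %w", s, err)
	}
	return int64(n * factor), nil
}

func main() {
	raw := `{"runner": {"cpu": "2", "memory": "1Gi"}, "sidecar": {"cpu": "500m", "memory": "256Mi"}}`
	var limits map[string]Limit
	if err := json.Unmarshal([]byte(raw), &limits); err != nil {
		panic(err)
	}
	for name, l := range limits {
		cores, err := parseCPU(l.CPU)
		if err != nil {
			panic(err)
		}
		mem, err := parseMemory(l.Memory)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %.2f cores, %d bytes\n", name, cores, mem)
	}
}
```

Converting limits to cores and bytes up front puts them in the same units as the cpu_cores and memory_bytes fields, so they can be compared directly.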
Receiver Environment Variables
| Variable | Description | Default |
|---|---|---|
| DB_PATH | SQLite database path | metrics.db |
| LISTEN_ADDR | HTTP listen address | :8080 |
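Reading these variables with their documented defaults might look like the sketch below. The helper `envOr` is hypothetical, and treating an empty value the same as an unset one is an assumption; the receiver's actual handling, including how the variables interact with its command-line flags, may differ.

```go
package main

import (
	"fmt"
	"os"
)

// envOr returns the value of key, or fallback when the variable is unset or empty.
func envOr(key, fallback string) string {
	if v, ok := os.LookupEnv(key); ok && v != "" {
		return v
	}
	return fallback
}

func main() {
	dbPath := envOr("DB_PATH", "metrics.db")
	listenAddr := envOr("LISTEN_ADDR", ":8080")
	fmt.Printf("db=%s listen=%s\n", dbPath, listenAddr)
}
```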
Running
Docker Compose (stress test example)
docker compose -f test/docker/docker-compose-stress.yaml up -d
# Wait for metrics collection...
docker compose -f test/docker/docker-compose-stress.yaml stop collector
# Query results
curl http://localhost:9080/api/v1/metrics/repo/test-org/test-org%2Fstress-test/stress-test-workflow/heavy-workload
Local Development
# Build
go build -o collector ./cmd/collector
go build -o receiver ./cmd/receiver
# Run receiver
./receiver --listen=:8080 --db=metrics.db
# Run collector
./collector --interval=2s --top=10 --push-endpoint=http://localhost:8080/api/v1/metrics