Forgejo Runner Resource Collector

A lightweight resource metrics collector designed to run alongside CI/CD workloads in shared PID namespace environments. It collects CPU and memory metrics, groups them by container/cgroup, and pushes summaries to a receiver service.

Components

  • Collector: Gathers system and per-process metrics at regular intervals, computes run-level statistics, and pushes a summary on shutdown.
  • Receiver: HTTP service that stores metric summaries in SQLite and provides a query API.

Receiver API

POST /api/v1/metrics

Receives metric summaries from collectors.

GET /api/v1/metrics/repo/{org}/{repo}/{workflow}/{job}

Retrieves all stored metrics for a specific workflow and job.

Example request:

GET /api/v1/metrics/repo/my-org/my-repo/ci.yml/build

Example response:

[
  {
    "id": 1,
    "organization": "my-org",
    "repository": "my-org/my-repo",
    "workflow": "ci.yml",
    "job": "build",
    "run_id": "run-123",
    "received_at": "2026-02-06T14:30:23.056Z",
    "payload": {
      "start_time": "2026-02-06T14:30:02.185Z",
      "end_time": "2026-02-06T14:30:22.190Z",
      "duration_seconds": 20.0,
      "sample_count": 11,
      "cpu_total_percent": { ... },
      "mem_used_bytes": { ... },
      "mem_used_percent": { ... },
      "top_cpu_processes": [ ... ],
      "top_mem_processes": [ ... ],
      "containers": [
        {
          "name": "runner",
          "cpu_cores": {
            "peak": 2.007,
            "p99": 2.005,
            "p95": 2.004,
            "p75": 1.997,
            "p50": 1.817,
            "avg": 1.5
          },
          "memory_bytes": {
            "peak": 18567168,
            "p99": 18567168,
            "p95": 18567168,
            "p75": 18567168,
            "p50": 18567168,
            "avg": 18567168
          }
        }
      ]
    }
  }
]

Understanding the Metrics

CPU Metrics

There are two different CPU metric formats in the response:

1. System and Process CPU: Percentage (cpu_total_percent, peak_cpu_percent)

These values represent CPU utilization as a percentage of total available CPU time.

  • cpu_total_percent: Overall system CPU usage (0-100%)
  • peak_cpu_percent (in process lists): Per-process CPU usage where 100% = 1 full CPU core

2. Container CPU: Cores (cpu_cores)

Important: The cpu_cores field in container metrics represents CPU usage in number of cores, not percentage.

Value  Meaning
0.5    Half a CPU core
1.0    One full CPU core
2.0    Two CPU cores
2.5    Two and a half CPU cores

This allows direct comparison with Kubernetes resource limits (e.g., cpu: "2" or cpu: "500m").

Example interpretation:

{
  "name": "runner",
  "cpu_cores": {
    "peak": 2.007,
    "avg": 1.5
  }
}

This means the "runner" container peaked at roughly 2 CPU cores and averaged 1.5 cores over the run.

Memory Metrics

All memory values are in bytes:

  • mem_used_bytes: System memory usage
  • memory_bytes (in containers): Container RSS memory usage
  • peak_mem_rss_bytes (in processes): Process RSS memory

Statistical Fields

Each metric includes percentile statistics across all samples:

Field  Description
peak   Maximum value observed
p99    99th percentile
p95    95th percentile
p75    75th percentile
p50    Median (50th percentile)
avg    Arithmetic mean

Configuration

Collector Environment Variables

Variable                 Description                                    Example
GITHUB_REPOSITORY_OWNER  Organization name                              my-org
GITHUB_REPOSITORY        Full repository path                           my-org/my-repo
GITHUB_WORKFLOW          Workflow filename                              ci.yml
GITHUB_JOB               Job name                                       build
GITHUB_RUN_ID            Unique run identifier                          run-123
CGROUP_PROCESS_MAP       JSON mapping process names to container names  {"node":"runner"}
CGROUP_LIMITS            JSON with CPU/memory limits per container      See below

CGROUP_LIMITS example:

{
  "runner": {"cpu": "2", "memory": "1Gi"},
  "sidecar": {"cpu": "500m", "memory": "256Mi"}
}

CPU values support Kubernetes notation: "2" = 2 cores, "500m" = 0.5 cores.

Memory values support: Ki, Mi, Gi, Ti (binary) or K, M, G, T (decimal).

Receiver Environment Variables

Variable     Description           Default
DB_PATH      SQLite database path  metrics.db
LISTEN_ADDR  HTTP listen address   :8080

Running

Docker Compose (stress test example)

docker compose -f test/docker/docker-compose-stress.yaml up -d
# Wait for metrics collection...
docker compose -f test/docker/docker-compose-stress.yaml stop collector
# Query results
curl http://localhost:9080/api/v1/metrics/repo/test-org/test-org%2Fstress-test/stress-test-workflow/heavy-workload

Local Development

# Build
go build -o collector ./cmd/collector
go build -o receiver ./cmd/receiver

# Run receiver
./receiver --listen=:8080 --db=metrics.db

# Run collector
./collector --interval=2s --top=10 --push-endpoint=http://localhost:8080/api/v1/metrics