Commit graph

17 commits

Author SHA1 Message Date
6770cfcea7
feat(summary): add per-container metrics with extended percentiles
All checks were successful
ci / build (push) Successful in 34s
- Extend StatSummary with p99, p75, p50 percentiles (in addition to peak, p95, avg)
- Add ContainerSummary type for per-container CPU cores and memory bytes stats
- Track container metrics from Cgroups map in Accumulator
- Include containers array in RunSummary sent to receiver
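
The extended percentiles could be computed along these lines. This is a minimal sketch, not the repository's code: the field names and JSON tags of `StatSummary`, the `summarize` and `percentile` helpers, and the choice of the nearest-rank method are all assumptions.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// StatSummary mirrors the statistics named in the commit message.
// Field names and JSON tags are assumptions for illustration.
type StatSummary struct {
	Peak float64 `json:"peak"`
	P99  float64 `json:"p99"`
	P95  float64 `json:"p95"`
	P75  float64 `json:"p75"`
	P50  float64 `json:"p50"`
	Avg  float64 `json:"avg"`
}

// percentile returns the nearest-rank percentile of an ascending-sorted slice.
func percentile(sorted []float64, p float64) float64 {
	if len(sorted) == 0 {
		return 0
	}
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// summarize copies and sorts the samples, then derives all statistics.
func summarize(samples []float64) StatSummary {
	if len(samples) == 0 {
		return StatSummary{}
	}
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	var sum float64
	for _, v := range sorted {
		sum += v
	}
	return StatSummary{
		Peak: sorted[len(sorted)-1],
		P99:  percentile(sorted, 99),
		P95:  percentile(sorted, 95),
		P75:  percentile(sorted, 75),
		P50:  percentile(sorted, 50),
		Avg:  sum / float64(len(sorted)),
	}
}

func main() {
	// Hypothetical CPU-core samples for one container.
	cpuCores := []float64{0.2, 0.4, 0.3, 1.8, 0.5}
	fmt.Printf("%+v\n", summarize(cpuCores))
}
```

Nearest-rank always returns an observed sample, which keeps the reported values easy to trace back to raw measurements. An interpolating method would be an equally reasonable choice.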

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 15:01:01 +01:00
5e470c33a5
feat(collector): group CPU and memory metrics by cgroup
All checks were successful
ci / build (push) Successful in 30s
Add cgroup-based process grouping to the resource collector. Processes are
grouped by their cgroup path, with container names resolved via configurable
process-to-container mapping.

New features:
- Read cgroup info from /proc/[pid]/cgroup (supports v1 and v2)
- Parse K8s resource notation (500m, 1Gi, etc.) for CPU/memory limits
- Group metrics by container using CGROUP_PROCESS_MAP env var
- Calculate usage percentages against limits from CGROUP_LIMITS env var
- Output cgroup metrics with CPU cores used, memory RSS, and percentages
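
Reading the cgroup path might look roughly like the sketch below. Each line of /proc/[pid]/cgroup has the form `hierarchy-ID:controller-list:path`; cgroup v2 uses a single `0::/path` entry. The function name `cgroupPath` and the rule of preferring the v2 entry and falling back to a v1 `cpu` or `memory` controller are assumptions, not the collector's actual logic.

```go
package main

import (
	"fmt"
	"strings"
)

// cgroupPath extracts a cgroup path from the contents of /proc/[pid]/cgroup.
// It prefers the cgroup v2 unified entry ("0::/path") and otherwise falls
// back to the first v1 entry that lists the cpu or memory controller.
func cgroupPath(content string) (string, bool) {
	var v1 string
	for _, line := range strings.Split(strings.TrimSpace(content), "\n") {
		parts := strings.SplitN(line, ":", 3)
		if len(parts) != 3 {
			continue // skip malformed lines
		}
		if parts[0] == "0" && parts[1] == "" {
			return parts[2], true // cgroup v2
		}
		if v1 != "" {
			continue
		}
		for _, controller := range strings.Split(parts[1], ",") {
			if controller == "cpu" || controller == "memory" {
				v1 = parts[2]
				break
			}
		}
	}
	return v1, v1 != ""
}

func main() {
	// Sample file contents; a real collector would read /proc/<pid>/cgroup.
	v2 := "0::/kubepods/pod1/abc\n"
	v1 := "12:pids:/x\n5:cpu,cpuacct:/docker/abc\n3:memory:/docker/abc\n"
	fmt.Println(cgroupPath(v2))
	fmt.Println(cgroupPath(v1))
}
```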

Environment variables:
- CGROUP_PROCESS_MAP: Map process names to container names for discovery
- CGROUP_LIMITS: Define CPU/memory limits per container in K8s notation
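
Parsing the K8s resource notation and turning it into a usage percentage could be sketched as follows. This is a simplified illustration: the helper names are hypothetical, only a subset of suffixes is handled, and the layout of the CGROUP_LIMITS value itself is not shown in the commit, so the sketch parses single quantities only. A complete parser exists as `resource.ParseQuantity` in `k8s.io/apimachinery`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPU converts K8s CPU notation to cores: "500m" -> 0.5, "2" -> 2.
func parseCPU(s string) (float64, error) {
	if strings.HasSuffix(s, "m") {
		milli, err := strconv.ParseFloat(strings.TrimSuffix(s, "m"), 64)
		if err != nil {
			return 0, fmt.Errorf("parse cpu %q: %w", s, err)
		}
		return milli / 1000, nil
	}
	return strconv.ParseFloat(s, 64)
}

// memoryUnits lists the suffixes this sketch understands (a subset).
var memoryUnits = []struct {
	suffix string
	factor float64
}{
	{"Ki", 1 << 10}, {"Mi", 1 << 20}, {"Gi", 1 << 30}, {"Ti", 1 << 40},
	{"k", 1e3}, {"M", 1e6}, {"G", 1e9}, {"T", 1e12},
}

// parseMemory converts K8s memory notation to bytes: "1Gi" -> 1073741824.
func parseMemory(s string) (int64, error) {
	for _, u := range memoryUnits {
		if strings.HasSuffix(s, u.suffix) {
			n, err := strconv.ParseFloat(strings.TrimSuffix(s, u.suffix), 64)
			if err != nil {
				return 0, fmt.Errorf("parse memory %q: %w", s, err)
			}
			return int64(n * u.factor), nil
		}
	}
	return strconv.ParseInt(s, 10, 64) // plain byte count
}

// percentOfLimit reports usage as a percentage of the limit (0 if no limit).
func percentOfLimit(used, limit float64) float64 {
	if limit <= 0 {
		return 0
	}
	return used / limit * 100
}

func main() {
	cpuLimit, _ := parseCPU("500m")
	memLimit, _ := parseMemory("1Gi")
	fmt.Println(cpuLimit, memLimit, percentOfLimit(0.25, cpuLimit))
}
```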

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 14:50:36 +01:00
0bf7dfee38
test: add integration tests for collector-receiver interaction
All checks were successful
ci / build (push) Successful in 38s
Add integration tests that verify the push client can successfully
send metrics to the receiver and they are stored correctly in SQLite.

Tests:
- TestPushClientToReceiver: Direct HTTP POST verification
- TestPushClientIntegration: Full PushClient with env vars
- TestMultiplePushes: Multiple pushes and filtering (parallel-safe)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 12:12:36 +01:00
7da7dc138f
refactor(receiver): change query endpoint to filter by workflow and job
All checks were successful
ci / build (push) Successful in 32s
Replace /api/v1/metrics/run/{runID} and /api/v1/metrics/repo/{org}/{repo}
with /api/v1/metrics/repo/{org}/{repo}/{workflow}/{job} for more precise
filtering by workflow and job name.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 12:00:22 +01:00
d99cd1dd56
feat(docker): multi-stage build for collector and receiver
All checks were successful
ci / build (push) Successful in 1m55s
Add multi-stage Dockerfile that can build both images:
- `docker build --target collector` for the collector
- `docker build --target receiver` for the receiver (with CGO for SQLite)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 11:54:20 +01:00
cfe583fbc4
feat(collector): add HTTP push for metrics to receiver
All checks were successful
ci / build (push) Successful in 46s
Add push client that sends run summary to a configurable HTTP endpoint
on shutdown. Execution context is read from GitHub Actions style
environment variables (with Gitea fallbacks).

New flag: -push-endpoint <url>
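
A push on shutdown might be sketched as below. The `GITHUB_*` names are standard GitHub Actions variables, but the `GITEA_*` fallback names, the `ExecutionContext` type, and the payload shape are assumptions; the commit does not list them.

```go
package main

import (
	"bytes"
	"encoding/json"
	"flag"
	"fmt"
	"net/http"
	"os"
)

// ExecutionContext is a hypothetical payload header; field names are assumed.
type ExecutionContext struct {
	Repository string `json:"repository"`
	Workflow   string `json:"workflow"`
	Job        string `json:"job"`
	RunID      string `json:"run_id"`
}

// firstNonEmpty returns the first non-empty value among the given keys.
func firstNonEmpty(lookup func(string) string, keys ...string) string {
	for _, k := range keys {
		if v := lookup(k); v != "" {
			return v
		}
	}
	return ""
}

// contextFromEnv prefers GitHub Actions variables and falls back to
// Gitea-prefixed ones (the GITEA_* names are assumptions).
func contextFromEnv(lookup func(string) string) ExecutionContext {
	return ExecutionContext{
		Repository: firstNonEmpty(lookup, "GITHUB_REPOSITORY", "GITEA_REPOSITORY"),
		Workflow:   firstNonEmpty(lookup, "GITHUB_WORKFLOW", "GITEA_WORKFLOW"),
		Job:        firstNonEmpty(lookup, "GITHUB_JOB", "GITEA_JOB"),
		RunID:      firstNonEmpty(lookup, "GITHUB_RUN_ID", "GITEA_RUN_ID"),
	}
}

// push sends the payload as JSON and treats any non-2xx status as an error.
func push(endpoint string, payload any) error {
	body, err := json.Marshal(payload)
	if err != nil {
		return fmt.Errorf("encode payload: %w", err)
	}
	resp, err := http.Post(endpoint, "application/json", bytes.NewReader(body))
	if err != nil {
		return fmt.Errorf("push metrics: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
		return fmt.Errorf("push metrics: unexpected status %s", resp.Status)
	}
	return nil
}

func main() {
	endpoint := flag.String("push-endpoint", "", "URL that receives the run summary")
	flag.Parse()
	ctx := contextFromEnv(os.Getenv)
	if *endpoint == "" {
		fmt.Printf("no endpoint set, context: %+v\n", ctx)
		return
	}
	if err := push(*endpoint, ctx); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```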

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 11:44:20 +01:00
c309bd810d
feat(receiver): add HTTP metrics receiver with SQLite storage
All checks were successful
ci / build (push) Successful in 2m33s
Add a new receiver application under cmd/receiver that accepts metrics
via HTTP POST and stores them in SQLite using GORM. The receiver expects
GitHub Actions style execution context (org, repo, workflow, job, run_id).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 11:40:03 +01:00
c5c872a373
fix(output): correct JSON serialization of top process metrics
All checks were successful
ci / build (push) Successful in 1m56s
slog.Any() does not properly serialize slices of slog.Group() attributes,
resulting in broken output like {"Key":"","Value":{}}. Fixed by passing
structs with JSON tags directly to slog.Any() instead.
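
The fix described above can be illustrated with a small sketch. The `ProcessMetric` type, its fields, and the `topProcessesJSON` helper are hypothetical; only the pattern of passing tagged structs to `slog.Any()` comes from the commit message.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log/slog"
)

// ProcessMetric is a hypothetical top-process entry with JSON tags.
type ProcessMetric struct {
	PID        int     `json:"pid"`
	Name       string  `json:"name"`
	CPUPercent float64 `json:"cpu_percent"`
}

// topProcessesJSON logs the slice through a JSON handler and returns the
// raw JSON that ended up under the "top_processes" key.
func topProcessesJSON(top []ProcessMetric) (string, error) {
	var buf bytes.Buffer
	logger := slog.New(slog.NewJSONHandler(&buf, nil))
	// Passing the tagged structs directly lets the handler marshal them as JSON.
	logger.Info("run summary", slog.Any("top_processes", top))

	var record map[string]json.RawMessage
	if err := json.Unmarshal(buf.Bytes(), &record); err != nil {
		return "", fmt.Errorf("decode log record: %w", err)
	}
	return string(record["top_processes"]), nil
}

func main() {
	out, err := topProcessesJSON([]ProcessMetric{
		{PID: 1, Name: "go", CPUPercent: 12.5},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```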

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 11:00:34 +01:00
7201a527d8
feat(collector): summarize metrics at the end of the process
All checks were successful
ci / build (push) Successful in 1m39s
2026-02-04 16:21:17 +01:00
54269e8a0e
chore: remove darwin from build targets
All checks were successful
ci / build (push) Successful in 1m1s
2026-02-04 15:10:07 +01:00
d6cf8e5f39
fix(ci): unset GITHUB_TOKEN via shell before running goreleaser
Some checks failed
ci / build (push) Successful in 1m20s
ci / goreleaser (push) Failing after 57s
2026-02-04 14:57:56 +01:00
d1cc6e3c15
fix(ci): unset GITHUB_TOKEN to avoid conflict with GITEA_TOKEN
Some checks failed
ci / goreleaser (push) Failing after 28s
ci / build (push) Successful in 1m7s
2026-02-04 14:54:51 +01:00
24adf4d642
fix(ci): use goreleaser config version 1
Some checks failed
ci / build (push) Successful in 1m12s
ci / goreleaser (push) Failing after 17s
2026-02-04 14:50:11 +01:00
820ebb72ac
fix(ci): use compatible action versions for Forgejo runner
Some checks failed
ci / build (push) Failing after 1m4s
- Downgrade actions/setup-go from v6 to v5 (node24 not supported)
- Downgrade goreleaser-action from v6 to v5
- Add CI workflow with goreleaser snapshot on push to main
- Add .goreleaser.yaml configuration
2026-02-04 14:46:51 +01:00
ddaf5fbd0f
chore: update module path to DevFW-CICD org
Some checks failed
ci / goreleaser (push) Failing after 1s
Rename module from edp.buildth.ing/DevFW/forgejo-runner-resource-collector
to edp.buildth.ing/DevFW-CICD/forgejo-runner-resource-collector
2026-02-04 14:25:50 +01:00
74ac653e58
chore: add Dockerfile with Go 1.25
Add multi-stage Dockerfile for building minimal container image
using golang:1.25-alpine base.
2026-02-04 14:19:47 +01:00
219d26959f
feat: add resource collector for Forgejo runners
Add Go application that collects CPU and RAM metrics from /proc filesystem:
- Parse /proc/[pid]/stat for CPU usage (user/system time)
- Parse /proc/[pid]/status for memory usage (RSS, VmSize, etc.)
- Aggregate metrics across all processes
- Output via structured logging (JSON/text)
- Continuous collection with configurable interval
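
The /proc parsing steps above can be sketched as follows. The helper names are hypothetical and this is not the collector's actual code. In /proc/[pid]/stat, utime and stime are fields 14 and 15 and are measured in clock ticks; the command name in field 2 is wrapped in parentheses and may contain spaces, so the sketch splits after the last `)`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseStat extracts the command name, utime and stime (clock ticks)
// from one line of /proc/[pid]/stat.
func parseStat(line string) (comm string, utime, stime uint64, err error) {
	start := strings.IndexByte(line, '(')
	end := strings.LastIndexByte(line, ')')
	if start < 0 || end < start {
		return "", 0, 0, fmt.Errorf("malformed stat line")
	}
	comm = line[start+1 : end]
	// fields[0] is field 3 (state), so field 14 is index 11 and field 15 is index 12.
	fields := strings.Fields(line[end+1:])
	if len(fields) < 13 {
		return "", 0, 0, fmt.Errorf("stat line too short: %d fields", len(fields))
	}
	if utime, err = strconv.ParseUint(fields[11], 10, 64); err != nil {
		return "", 0, 0, fmt.Errorf("parse utime: %w", err)
	}
	if stime, err = strconv.ParseUint(fields[12], 10, 64); err != nil {
		return "", 0, 0, fmt.Errorf("parse stime: %w", err)
	}
	return comm, utime, stime, nil
}

// parseVmRSS returns the resident set size in kB from /proc/[pid]/status.
func parseVmRSS(status string) (uint64, bool) {
	for _, line := range strings.Split(status, "\n") {
		if !strings.HasPrefix(line, "VmRSS:") {
			continue
		}
		fields := strings.Fields(line) // e.g. ["VmRSS:", "1234", "kB"]
		if len(fields) < 2 {
			return 0, false
		}
		kb, err := strconv.ParseUint(fields[1], 10, 64)
		return kb, err == nil
	}
	return 0, false
}

func main() {
	// Sample contents; a real collector would read the files under /proc/<pid>/.
	stat := "1234 (my proc) S 1 1234 1234 0 -1 4194304 100 0 0 0 250 75 0 0 20 0 1 0 12345"
	status := "Name:\tmy proc\nVmSize:\t  20480 kB\nVmRSS:\t   1234 kB\n"
	fmt.Println(parseStat(stat))
	fmt.Println(parseVmRSS(status))
}
```

CPU usage over an interval then follows from the difference in utime + stime between two reads, divided by the system's clock tick rate and the elapsed time.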

Designed for monitoring pipeline runner resource utilization to enable
dynamic runner sizing.
2026-02-04 14:13:24 +01:00