Commit graph

30 commits

Author SHA1 Message Date
e1a4e9c579
ci: add docker build to release
Some checks failed
ci / build (push) Failing after 22s
ci / goreleaser (push) Successful in 1m3s
2026-02-12 11:26:29 +01:00
d713c25fa5
Rename repo from forgejo-runner-resource-collector
All checks were successful
ci / build (push) Successful in 26s
ci / goreleaser (push) Successful in 23s
2026-02-12 09:55:04 +01:00
d0dd209bc9
Add token-based authentication for receiver
All checks were successful
ci / build (push) Successful in 28s
2026-02-11 15:18:03 +01:00
042ce77ddc
feat: added pre-shared-key for read endpoints
All checks were successful
ci / build (push) Successful in 30s
2026-02-10 12:02:15 +01:00
90c89583a0
Ignore data with no delta (first datapoint or underflow)
All checks were successful
ci / build (push) Successful in 28s
2026-02-09 17:37:53 +01:00
2a4c64bfb0
Update and extend documentation
2026-02-09 17:36:59 +01:00
addab99e5d
docs: add CLAUDE.md with development guidance
All checks were successful
ci / build (push) Successful in 26s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 15:42:43 +01:00
fd02242d5e
docs: move background documentation to docs/background
All checks were successful
ci / build (push) Successful in 31s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 15:40:38 +01:00
52f1b8b64d
docs: add README with API documentation
All checks were successful
ci / build (push) Successful in 34s
Document receiver API endpoints and response format. Clarify that
container cpu_cores values are in number of cores (not percentage),
while system/process CPU values are percentages.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 15:38:22 +01:00
d624d46822
test(stress): increase cpu-stress to 3 workers with 2.0 CPU limit
All checks were successful
ci / build (push) Successful in 26s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 15:32:55 +01:00
eb01c1c842
fix(receiver): return Payload as JSON object instead of string
All checks were successful
ci / build (push) Successful in 1m49s
Changed the API response to embed Payload as a JSON object using
json.RawMessage instead of returning it as a JSON-encoded string
inside the JSON response. Added MetricResponse type with ToResponse
converter method.
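
A minimal sketch of the idea, assuming a stored row whose Payload column holds JSON text. The Metric struct, its fields, and the encode helper are illustrative assumptions; only the MetricResponse and ToResponse names come from the message above, and their real definitions may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Metric is a hypothetical stored row; Payload holds JSON text.
type Metric struct {
	ID      uint
	Payload string
}

// MetricResponse embeds the payload as raw JSON rather than as a string.
type MetricResponse struct {
	ID      uint            `json:"id"`
	Payload json.RawMessage `json:"payload"`
}

// ToResponse converts the stored row into its API representation.
func (m Metric) ToResponse() MetricResponse {
	return MetricResponse{ID: m.ID, Payload: json.RawMessage(m.Payload)}
}

// encode marshals the response form of a metric (helper for illustration).
func encode(m Metric) (string, error) {
	b, err := json.Marshal(m.ToResponse())
	return string(b), err
}

func main() {
	out, err := encode(Metric{ID: 1, Payload: `{"cpu":1.5}`})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"id":1,"payload":{"cpu":1.5}}
}
```

With a plain string field the same payload would be emitted as an escaped JSON string, which clients would have to decode a second time.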

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 15:22:27 +01:00
0af8c28bc2
fix(aggregator): prevent CPU cores overflow when processes restart
All checks were successful
ci / build (push) Successful in 28s
Guard against unsigned integer underflow in cgroup CPU calculation.
When processes exit and new ones start, totalTicks can be less than
the previous value, causing the subtraction to wrap around to a huge
positive number.

Now checks totalTicks >= prev before calculating delta, treating
process churn as 0 CPU usage for that sample.
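
The guard described above can be sketched as follows. The function name cpuDelta is an assumption made for illustration; only the totalTicks and prev names and the comparison come from the message.

```go
package main

import "fmt"

// cpuDelta returns the tick delta between two samples. When the counter
// went backwards (for example because processes exited and new ones
// started), it returns 0 instead of letting the unsigned subtraction wrap.
func cpuDelta(totalTicks, prev uint64) uint64 {
	if totalTicks < prev {
		return 0
	}
	return totalTicks - prev
}

func main() {
	fmt.Println(cpuDelta(150, 100)) // 50
	fmt.Println(cpuDelta(40, 100))  // 0; an unguarded 40-100 would wrap to 18446744073709551556
}
```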

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 15:15:30 +01:00
5b983692c8
test: add stress test with receiver integration
All checks were successful
ci / build (push) Successful in 38s
Docker Compose setup that:
- Runs metrics receiver with SQLite storage
- Spawns CPU and memory stress workloads using stress-ng
- Uses shared PID namespace (pid: service:cpu-stress) for proper isolation
- Collector gathers metrics and pushes summary on shutdown

Known issue: Container CPU summary may show overflow values on first
sample due to delta calculation - to be fixed in accumulator.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 15:11:22 +01:00
6770cfcea7
feat(summary): add per-container metrics with extended percentiles
All checks were successful
ci / build (push) Successful in 34s
- Extend StatSummary with p99, p75, p50 percentiles (in addition to peak, p95, avg)
- Add ContainerSummary type for per-container CPU cores and memory bytes stats
- Track container metrics from Cgroups map in Accumulator
- Include containers array in RunSummary sent to receiver
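
As a rough illustration of the extended summary, the sketch below computes the listed statistics with the nearest-rank percentile method. The field names, the summarize and percentile helpers, and the choice of nearest-rank are assumptions; the commit does not state how the percentiles are calculated.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// StatSummary mirrors the statistics named in the commit message.
// The field layout here is an assumption.
type StatSummary struct {
	Peak, P99, P95, P75, P50, Avg float64
}

// percentile returns the nearest-rank percentile of an ascending slice.
func percentile(sorted []float64, p float64) float64 {
	if len(sorted) == 0 {
		return 0
	}
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// summarize sorts a copy of the samples and derives all statistics.
func summarize(samples []float64) StatSummary {
	if len(samples) == 0 {
		return StatSummary{}
	}
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	sum := 0.0
	for _, v := range sorted {
		sum += v
	}
	return StatSummary{
		Peak: sorted[len(sorted)-1],
		P99:  percentile(sorted, 99),
		P95:  percentile(sorted, 95),
		P75:  percentile(sorted, 75),
		P50:  percentile(sorted, 50),
		Avg:  sum / float64(len(sorted)),
	}
}

func main() {
	s := summarize([]float64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10})
	fmt.Printf("%+v\n", s)
}
```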

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 15:01:01 +01:00
5e470c33a5
feat(collector): group CPU and memory metrics by cgroup
All checks were successful
ci / build (push) Successful in 30s
Add cgroup-based process grouping to the resource collector. Processes are
grouped by their cgroup path, with container names resolved via configurable
process-to-container mapping.

New features:
- Read cgroup info from /proc/[pid]/cgroup (supports v1 and v2)
- Parse K8s resource notation (500m, 1Gi, etc.) for CPU/memory limits
- Group metrics by container using CGROUP_PROCESS_MAP env var
- Calculate usage percentages against limits from CGROUP_LIMITS env var
- Output cgroup metrics with CPU cores used, memory RSS, and percentages

Environment variables:
- CGROUP_PROCESS_MAP: Map process names to container names for discovery
- CGROUP_LIMITS: Define CPU/memory limits per container in K8s notation
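
To make the K8s resource notation concrete, here is a small sketch that converts values such as 500m and 1Gi into cores and bytes. The parseCPU and parseMemory names are hypothetical, and the sketch handles only millicores and binary suffixes, not the full Kubernetes quantity grammar.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPU converts "500m" to 0.5 cores and "2" to 2.0 cores.
func parseCPU(s string) (float64, error) {
	if strings.HasSuffix(s, "m") {
		milli, err := strconv.ParseFloat(strings.TrimSuffix(s, "m"), 64)
		if err != nil {
			return 0, err
		}
		return milli / 1000, nil
	}
	return strconv.ParseFloat(s, 64)
}

// binarySuffixes lists the power-of-two memory units handled here.
var binarySuffixes = []struct {
	suffix string
	mult   uint64
}{
	{"Ki", 1 << 10}, {"Mi", 1 << 20}, {"Gi", 1 << 30}, {"Ti", 1 << 40},
}

// parseMemory converts "1Gi" to bytes; a bare number is taken as bytes.
func parseMemory(s string) (uint64, error) {
	for _, u := range binarySuffixes {
		if strings.HasSuffix(s, u.suffix) {
			n, err := strconv.ParseUint(strings.TrimSuffix(s, u.suffix), 10, 64)
			if err != nil {
				return 0, err
			}
			return n * u.mult, nil
		}
	}
	return strconv.ParseUint(s, 10, 64)
}

func main() {
	cpu, _ := parseCPU("500m")
	mem, _ := parseMemory("1Gi")
	fmt.Println(cpu, mem) // 0.5 1073741824
}
```

A usage percentage then follows by dividing the measured cores or RSS by the parsed limit.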

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 14:50:36 +01:00
0bf7dfee38
test: add integration tests for collector-receiver interaction
All checks were successful
ci / build (push) Successful in 38s
Add integration tests that verify the push client can successfully
send metrics to the receiver and they are stored correctly in SQLite.

Tests:
- TestPushClientToReceiver: Direct HTTP POST verification
- TestPushClientIntegration: Full PushClient with env vars
- TestMultiplePushes: Multiple pushes and filtering (parallel-safe)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 12:12:36 +01:00
7da7dc138f
refactor(receiver): change query endpoint to filter by workflow and job
All checks were successful
ci / build (push) Successful in 32s
Replace /api/v1/metrics/run/{runID} and /api/v1/metrics/repo/{org}/{repo}
with /api/v1/metrics/repo/{org}/{repo}/{workflow}/{job} for more precise
filtering by workflow and job name.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 12:00:22 +01:00
d99cd1dd56
feat(docker): multi-stage build for collector and receiver
All checks were successful
ci / build (push) Successful in 1m55s
Add multi-stage Dockerfile that can build both images:
- `docker build --target collector` for the collector
- `docker build --target receiver` for the receiver (with CGO for SQLite)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 11:54:20 +01:00
cfe583fbc4
feat(collector): add HTTP push for metrics to receiver
All checks were successful
ci / build (push) Successful in 46s
Add push client that sends run summary to a configurable HTTP endpoint
on shutdown. Execution context is read from GitHub Actions style
environment variables (with Gitea fallbacks).

New flag: -push-endpoint <url>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 11:44:20 +01:00
c309bd810d
feat(receiver): add HTTP metrics receiver with SQLite storage
All checks were successful
ci / build (push) Successful in 2m33s
Add a new receiver application under cmd/receiver that accepts metrics
via HTTP POST and stores them in SQLite using GORM. The receiver expects
GitHub Actions style execution context (org, repo, workflow, job, run_id).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 11:40:03 +01:00
c5c872a373
fix(output): correct JSON serialization of top process metrics
All checks were successful
ci / build (push) Successful in 1m56s
slog.Any() does not properly serialize slices of slog.Group() attributes,
resulting in broken output like {"Key":"","Value":{}}. Fixed by passing
structs with JSON tags directly to slog.Any() instead.
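
A minimal sketch of the fixed approach: a slice of structs with JSON tags is passed straight to slog.Any and the JSON handler marshals it. The ProcessMetric type, its fields, the top_cpu key, and the renderTop helper are assumptions for illustration; the timestamp is dropped only to keep the output deterministic.

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
	"strings"
)

// ProcessMetric is a hypothetical shape for one "top process" entry.
type ProcessMetric struct {
	PID        int     `json:"pid"`
	Name       string  `json:"name"`
	CPUPercent float64 `json:"cpu_percent"`
}

// dropTime removes the top-level timestamp attribute from each record.
func dropTime(groups []string, a slog.Attr) slog.Attr {
	if len(groups) == 0 && a.Key == slog.TimeKey {
		return slog.Attr{}
	}
	return a
}

// renderTop logs the slice through a JSON handler and returns the line.
func renderTop(top []ProcessMetric) string {
	var buf bytes.Buffer
	h := slog.NewJSONHandler(&buf, &slog.HandlerOptions{ReplaceAttr: dropTime})
	slog.New(h).Info("top processes", slog.Any("top_cpu", top))
	return strings.TrimSpace(buf.String())
}

func main() {
	fmt.Println(renderTop([]ProcessMetric{{PID: 1, Name: "init", CPUPercent: 0.5}}))
}
```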

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 11:00:34 +01:00
7201a527d8
feat(collector): summarize metrics at the end of the process
All checks were successful
ci / build (push) Successful in 1m39s
2026-02-04 16:21:17 +01:00
54269e8a0e
chore: remove darwin from build targets
All checks were successful
ci / build (push) Successful in 1m1s
2026-02-04 15:10:07 +01:00
d6cf8e5f39
fix(ci): unset GITHUB_TOKEN via shell before running goreleaser
Some checks failed
ci / build (push) Successful in 1m20s
ci / goreleaser (push) Failing after 57s
2026-02-04 14:57:56 +01:00
d1cc6e3c15
fix(ci): unset GITHUB_TOKEN to avoid conflict with GITEA_TOKEN
Some checks failed
ci / goreleaser (push) Failing after 28s
ci / build (push) Successful in 1m7s
2026-02-04 14:54:51 +01:00
24adf4d642
fix(ci): use goreleaser config version 1
Some checks failed
ci / build (push) Successful in 1m12s
ci / goreleaser (push) Failing after 17s
2026-02-04 14:50:11 +01:00
820ebb72ac
fix(ci): use compatible action versions for Forgejo runner
Some checks failed
ci / build (push) Failing after 1m4s
- Downgrade actions/setup-go from v6 to v5 (node24 not supported)
- Downgrade goreleaser-action from v6 to v5
- Add CI workflow with goreleaser snapshot on push to main
- Add .goreleaser.yaml configuration
2026-02-04 14:46:51 +01:00
ddaf5fbd0f
chore: update module path to DevFW-CICD org
Some checks failed
ci / goreleaser (push) Failing after 1s
Rename module from edp.buildth.ing/DevFW/forgejo-runner-resource-collector
to edp.buildth.ing/DevFW-CICD/forgejo-runner-resource-collector
2026-02-04 14:25:50 +01:00
74ac653e58
chore: add Dockerfile with Go 1.25
Add multi-stage Dockerfile for building minimal container image
using golang:1.25-alpine base.
2026-02-04 14:19:47 +01:00
219d26959f
feat: add resource collector for Forgejo runners
Add Go application that collects CPU and RAM metrics from /proc filesystem:
- Parse /proc/[pid]/stat for CPU usage (user/system time)
- Parse /proc/[pid]/status for memory usage (RSS, VmSize, etc.)
- Aggregate metrics across all processes
- Output via structured logging (JSON/text)
- Continuous collection with configurable interval

Designed for monitoring pipeline runner resource utilization to enable
dynamic runner sizing.
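
As an illustration of the /proc/[pid]/stat parsing step, the sketch below extracts the user and system CPU ticks (fields 14 and 15 of the stat line). The function name parseStatTicks and the sample line are assumptions, not the collector's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// parseStatTicks extracts utime and stime from one /proc/[pid]/stat line.
// The command name (field 2) may contain spaces or parentheses, so parsing
// starts after the last ')'.
func parseStatTicks(line string) (utime, stime uint64, err error) {
	end := strings.LastIndexByte(line, ')')
	if end < 0 {
		return 0, 0, errors.New("malformed stat line: missing ')'")
	}
	// fields[0] is field 3 (state), so utime is fields[11], stime fields[12].
	fields := strings.Fields(line[end+1:])
	if len(fields) < 13 {
		return 0, 0, errors.New("malformed stat line: too few fields")
	}
	if utime, err = strconv.ParseUint(fields[11], 10, 64); err != nil {
		return 0, 0, err
	}
	if stime, err = strconv.ParseUint(fields[12], 10, 64); err != nil {
		return 0, 0, err
	}
	return utime, stime, nil
}

func main() {
	// Shortened, made-up sample line for demonstration.
	line := "1234 (my proc) S 1 1234 1234 0 -1 4194304 100 0 0 0 250 75 0 0 20 0 1 0"
	u, s, err := parseStatTicks(line)
	fmt.Println(u, s, err) // 250 75 <nil>
}
```

The values are in clock ticks; dividing a tick delta by the clock tick rate and the sampling interval gives CPU usage for that interval.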
2026-02-04 14:13:24 +01:00