Document the receiver API endpoints and response format. Clarify that
container cpu_cores values are expressed in CPU cores (not percentages),
while system- and process-level CPU values are percentages.
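For illustration, the convention in Go terms (these field names are
assumptions, not the receiver's actual types):
```go
// Illustrative only: field names are assumed, not taken from the receiver.
type ContainerStats struct {
	CPUCores    float64 `json:"cpu_cores"`    // CPU cores in use, e.g. 0.5 = half a core (not a percentage)
	MemoryBytes uint64  `json:"memory_bytes"` // RSS in bytes
}

type SystemStats struct {
	CPUPercent float64 `json:"cpu_percent"` // percentage of total system CPU, 0-100
}
```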
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Change the API response to embed Payload as a JSON object using
json.RawMessage instead of returning it as a JSON-encoded string inside
the JSON response. Add a MetricResponse type with a ToResponse converter
method.
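The shape of the change, roughly (fields other than Payload are assumptions):
```go
import "encoding/json"

// Metric is the stored record; Payload holds the metric body as JSON text.
type Metric struct {
	ID      uint   `json:"id"`
	Payload string `json:"-"` // stored as a string in the database
}

// MetricResponse embeds the payload as a JSON object rather than a quoted string.
type MetricResponse struct {
	ID      uint            `json:"id"`
	Payload json.RawMessage `json:"payload"`
}

// ToResponse converts a stored Metric into its API representation.
func (m Metric) ToResponse() MetricResponse {
	return MetricResponse{
		ID:      m.ID,
		Payload: json.RawMessage(m.Payload),
	}
}
```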
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Guard against unsigned integer underflow in cgroup CPU calculation.
When processes exit and new ones start, totalTicks can be less than
the previous value, causing the subtraction to wrap around to a huge
positive number.
Now checks totalTicks >= prev before calculating delta, treating
process churn as 0 CPU usage for that sample.
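A minimal sketch of the guard, using the names from the description:
```go
// computeDelta returns the CPU tick delta for a sample, treating a drop in
// totalTicks (process churn) as zero usage instead of letting the unsigned
// subtraction wrap around.
func computeDelta(totalTicks, prev uint64) uint64 {
	if totalTicks < prev {
		return 0
	}
	return totalTicks - prev
}
```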
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a Docker Compose setup that:
- Runs metrics receiver with SQLite storage
- Spawns CPU and memory stress workloads using stress-ng
- Uses a shared PID namespace (pid: service:cpu-stress) so the collector
  can observe the stress processes via /proc
- Collector gathers metrics and pushes summary on shutdown
Known issue: the container CPU summary may show overflow values on the
first sample due to the delta calculation; to be fixed in the accumulator.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Extend StatSummary with p99, p75, p50 percentiles (in addition to peak, p95, avg)
- Add a ContainerSummary type for per-container CPU core and memory byte stats (see the sketch below)
- Track container metrics from Cgroups map in Accumulator
- Include containers array in RunSummary sent to receiver
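Roughly, the extended types look like this (field names are assumed):
```go
// StatSummary aggregates a single metric series over the run.
type StatSummary struct {
	Peak float64 `json:"peak"`
	Avg  float64 `json:"avg"`
	P99  float64 `json:"p99"`
	P95  float64 `json:"p95"`
	P75  float64 `json:"p75"`
	P50  float64 `json:"p50"`
}

// ContainerSummary holds per-container stats tracked from the Cgroups map.
type ContainerSummary struct {
	Name        string      `json:"name"`
	CPUCores    StatSummary `json:"cpu_cores"`
	MemoryBytes StatSummary `json:"memory_bytes"`
}
```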
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add cgroup-based process grouping to the resource collector. Processes are
grouped by their cgroup path, with container names resolved via a
configurable process-to-container mapping.
New features:
- Read cgroup info from /proc/[pid]/cgroup (supports v1 and v2)
- Parse K8s resource notation (500m, 1Gi, etc.) for CPU/memory limits
  (see the sketch below)
- Group metrics by container using CGROUP_PROCESS_MAP env var
- Calculate usage percentages against limits from CGROUP_LIMITS env var
- Output cgroup metrics with CPU cores used, memory RSS, and percentages
Environment variables:
- CGROUP_PROCESS_MAP: Map process names to container names for discovery
- CGROUP_LIMITS: Define CPU/memory limits per container in K8s notation
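A simplified sketch of parsing the K8s notation used by CGROUP_LIMITS
(the real parser may accept more forms, e.g. decimal K/M/G suffixes):
```go
import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPU converts K8s CPU notation to cores: "500m" -> 0.5, "2" -> 2.0.
func parseCPU(s string) (float64, error) {
	if strings.HasSuffix(s, "m") {
		milli, err := strconv.ParseFloat(strings.TrimSuffix(s, "m"), 64)
		if err != nil {
			return 0, err
		}
		return milli / 1000, nil
	}
	return strconv.ParseFloat(s, 64)
}

// parseMemory converts K8s memory notation to bytes: "1Gi" -> 1073741824.
func parseMemory(s string) (uint64, error) {
	units := map[string]uint64{
		"Ki": 1 << 10, "Mi": 1 << 20, "Gi": 1 << 30, "Ti": 1 << 40,
	}
	for suffix, mult := range units {
		if strings.HasSuffix(s, suffix) {
			n, err := strconv.ParseUint(strings.TrimSuffix(s, suffix), 10, 64)
			if err != nil {
				return 0, err
			}
			return n * mult, nil
		}
	}
	n, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("unsupported memory value %q: %w", s, err)
	}
	return n, nil
}
```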
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add integration tests that verify that the push client can send metrics
to the receiver and that they are stored correctly in SQLite.
Tests:
- TestPushClientToReceiver: Direct HTTP POST verification
- TestPushClientIntegration: Full PushClient with env vars
- TestMultiplePushes: Multiple pushes and filtering (parallel-safe)
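In outline, the direct-POST check looks something like this; the
TEST_RECEIVER_URL variable, ingest path, and JSON field names here are
placeholders, and the real tests also verify the stored rows in SQLite:
```go
import (
	"net/http"
	"os"
	"strings"
	"testing"
)

func TestPushClientToReceiver(t *testing.T) {
	// Assumes a receiver is already listening at TEST_RECEIVER_URL
	// (hypothetical env var for this sketch).
	base := os.Getenv("TEST_RECEIVER_URL")
	if base == "" {
		t.Skip("TEST_RECEIVER_URL not set")
	}
	body := strings.NewReader(`{"org":"acme","repo":"widgets","workflow":"ci","job":"build","run_id":"42","payload":{}}`)
	resp, err := http.Post(base+"/api/v1/metrics", "application/json", body)
	if err != nil {
		t.Fatalf("POST failed: %v", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		t.Fatalf("unexpected status %d", resp.StatusCode)
	}
}
```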
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace /api/v1/metrics/run/{runID} and /api/v1/metrics/repo/{org}/{repo}
with /api/v1/metrics/repo/{org}/{repo}/{workflow}/{job} for more precise
filtering by workflow and job name.
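If routed with Go 1.22's pattern-matching ServeMux (an assumption; the
service may use a different router), the new endpoint registers roughly as:
```go
mux := http.NewServeMux()
// Go 1.22+ wildcard patterns; path parameters are read via PathValue.
mux.HandleFunc("GET /api/v1/metrics/repo/{org}/{repo}/{workflow}/{job}",
	func(w http.ResponseWriter, r *http.Request) {
		filter := map[string]string{
			"org":      r.PathValue("org"),
			"repo":     r.PathValue("repo"),
			"workflow": r.PathValue("workflow"),
			"job":      r.PathValue("job"),
		}
		// ...query stored metrics using filter, then write the JSON response...
		_ = json.NewEncoder(w).Encode(filter)
	})
```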
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add multi-stage Dockerfile that can build both images:
- `docker build --target collector` for the collector
- `docker build --target receiver` for the receiver (with CGO for SQLite)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a push client that sends the run summary to a configurable HTTP
endpoint on shutdown. Execution context is read from GitHub Actions-style
environment variables (with Gitea fallbacks).
New flag: -push-endpoint <url>
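The context lookup reduces to a first-non-empty fallback helper along
these lines (the GITEA_* variable name is illustrative):
```go
import "os"

// firstEnv returns the first non-empty value among the given environment
// variables, allowing GitHub Actions names with Gitea-flavoured fallbacks.
func firstEnv(keys ...string) string {
	for _, k := range keys {
		if v := os.Getenv(k); v != "" {
			return v
		}
	}
	return ""
}

// Example usage (variable names illustrative):
// repo := firstEnv("GITHUB_REPOSITORY", "GITEA_REPOSITORY")
```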
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a new receiver application under cmd/receiver that accepts metrics
via HTTP POST and stores them in SQLite using GORM. The receiver expects
GitHub Actions-style execution context (org, repo, workflow, job, run_id).
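In outline (model fields, ingest path, and port are assumptions, not the
actual implementation):
```go
package main

import (
	"encoding/json"
	"net/http"

	"gorm.io/driver/sqlite"
	"gorm.io/gorm"
)

// Metric is a sketch of the stored record; the real model has more fields.
type Metric struct {
	ID       uint `gorm:"primaryKey"`
	Org      string
	Repo     string
	Workflow string
	Job      string
	RunID    string
	Payload  string // raw JSON payload stored as text
}

func main() {
	db, err := gorm.Open(sqlite.Open("metrics.db"), &gorm.Config{})
	if err != nil {
		panic(err)
	}
	if err := db.AutoMigrate(&Metric{}); err != nil {
		panic(err)
	}

	http.HandleFunc("POST /api/v1/metrics", func(w http.ResponseWriter, r *http.Request) {
		var in struct {
			Org      string          `json:"org"`
			Repo     string          `json:"repo"`
			Workflow string          `json:"workflow"`
			Job      string          `json:"job"`
			RunID    string          `json:"run_id"`
			Payload  json.RawMessage `json:"payload"`
		}
		if err := json.NewDecoder(r.Body).Decode(&in); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		m := Metric{Org: in.Org, Repo: in.Repo, Workflow: in.Workflow,
			Job: in.Job, RunID: in.RunID, Payload: string(in.Payload)}
		if err := db.Create(&m).Error; err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusCreated)
	})
	_ = http.ListenAndServe(":8080", nil)
}
```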
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
slog.Any() does not properly serialize slices of slog.Group() attributes,
resulting in broken output like {"Key":"","Value":{}}. Fixed by passing
structs with JSON tags directly to slog.Any() instead.
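The before/after amounts to something like this (type and field names are
illustrative):
```go
import "log/slog"

type cgroupMetrics struct {
	Name     string  `json:"name"`
	CPUCores float64 `json:"cpu_cores"`
}

func logCgroups() {
	// Before: a slice of attrs built with slog.Group() is serialized via
	// slog.Attr's own fields, yielding output like {"Key":"","Value":{}}.
	broken := []slog.Attr{
		slog.Group("cpu-stress", slog.Float64("cpu_cores", 0.5)),
	}
	slog.Info("cgroup metrics", slog.Any("cgroups", broken))

	// After: pass plain structs with JSON tags, which the JSON handler
	// marshals as expected.
	slog.Info("cgroup metrics", slog.Any("cgroups", []cgroupMetrics{
		{Name: "cpu-stress", CPUCores: 0.5},
	}))
}
```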
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Downgrade actions/setup-go from v6 to v5 (node24 not supported)
- Downgrade goreleaser-action from v6 to v5
- Add CI workflow with goreleaser snapshot on push to main
- Add .goreleaser.yaml configuration
Add a Go application that collects CPU and RAM metrics from the /proc filesystem:
- Parse /proc/[pid]/stat for CPU usage (user/system time)
- Parse /proc/[pid]/status for memory usage (RSS, VmSize, etc.); see the sketch below
- Aggregate metrics across all processes
- Output via structured logging (JSON/text)
- Continuous collection with configurable interval
Designed for monitoring pipeline runner resource utilization to enable
dynamic runner sizing.
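For example, RSS can be read from /proc/[pid]/status roughly like this
(simplified; the real collector parses more fields):
```go
import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// readRSSKiB returns the VmRSS value (in KiB) from /proc/<pid>/status.
func readRSSKiB(pid int) (uint64, error) {
	f, err := os.Open(fmt.Sprintf("/proc/%d/status", pid))
	if err != nil {
		return 0, err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "VmRSS:") {
			// Line looks like: "VmRSS:     1234 kB"
			fields := strings.Fields(line)
			if len(fields) < 2 {
				break
			}
			return strconv.ParseUint(fields[1], 10, 64)
		}
	}
	return 0, fmt.Errorf("VmRSS not found for pid %d", pid)
}
```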