Document the receiver API endpoints and response format. Clarify that
container cpu_cores values are expressed in CPU cores (not percentages),
while system- and process-level CPU values are percentages.
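For illustration, the convention in Go terms (these field names are
assumptions, not the receiver's actual types):
```go
// Illustrative only: field names are assumed, not taken from the receiver.
type ContainerStats struct {
	CPUCores    float64 `json:"cpu_cores"`    // CPU cores in use, e.g. 0.5 = half a core (not a percentage)
	MemoryBytes uint64  `json:"memory_bytes"` // RSS in bytes
}

type SystemStats struct {
	CPUPercent float64 `json:"cpu_percent"` // percentage of total system CPU, 0-100
}
```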
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Change the API response to embed Payload as a JSON object using
json.RawMessage instead of returning it as a JSON-encoded string inside
the JSON response. Add a MetricResponse type with a ToResponse converter
method.
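The shape of the change, roughly (fields other than Payload are assumptions):
```go
import "encoding/json"

// Metric is the stored record; Payload holds the metric body as JSON text.
type Metric struct {
	ID      uint   `json:"id"`
	Payload string `json:"-"` // stored as a string in the database
}

// MetricResponse embeds the payload as a JSON object rather than a quoted string.
type MetricResponse struct {
	ID      uint            `json:"id"`
	Payload json.RawMessage `json:"payload"`
}

// ToResponse converts a stored Metric into its API representation.
func (m Metric) ToResponse() MetricResponse {
	return MetricResponse{
		ID:      m.ID,
		Payload: json.RawMessage(m.Payload),
	}
}
```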
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Guard against unsigned integer underflow in cgroup CPU calculation.
When processes exit and new ones start, totalTicks can be less than
the previous value, causing the subtraction to wrap around to a huge
positive number.
Now checks totalTicks >= prev before calculating delta, treating
process churn as 0 CPU usage for that sample.
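A minimal sketch of the guard, using the names from the description:
```go
// computeDelta returns the CPU tick delta for a sample, treating a drop in
// totalTicks (process churn) as zero usage instead of letting the unsigned
// subtraction wrap around.
func computeDelta(totalTicks, prev uint64) uint64 {
	if totalTicks < prev {
		return 0
	}
	return totalTicks - prev
}
```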
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a Docker Compose setup that:
- Runs metrics receiver with SQLite storage
- Spawns CPU and memory stress workloads using stress-ng
- Uses a shared PID namespace (pid: service:cpu-stress) so the collector
  can observe the stress processes via /proc
- Collector gathers metrics and pushes summary on shutdown
Known issue: the container CPU summary may show overflow values on the
first sample due to the delta calculation; to be fixed in the accumulator.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Extend StatSummary with p99, p75, p50 percentiles (in addition to peak, p95, avg)
- Add a ContainerSummary type for per-container CPU core and memory byte stats (see the sketch below)
- Track container metrics from Cgroups map in Accumulator
- Include containers array in RunSummary sent to receiver
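Roughly, the extended types look like this (field names are assumed):
```go
// StatSummary aggregates a single metric series over the run.
type StatSummary struct {
	Peak float64 `json:"peak"`
	Avg  float64 `json:"avg"`
	P99  float64 `json:"p99"`
	P95  float64 `json:"p95"`
	P75  float64 `json:"p75"`
	P50  float64 `json:"p50"`
}

// ContainerSummary holds per-container stats tracked from the Cgroups map.
type ContainerSummary struct {
	Name        string      `json:"name"`
	CPUCores    StatSummary `json:"cpu_cores"`
	MemoryBytes StatSummary `json:"memory_bytes"`
}
```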
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add cgroup-based process grouping to the resource collector. Processes are
grouped by their cgroup path, with container names resolved via a
configurable process-to-container mapping.
New features:
- Read cgroup info from /proc/[pid]/cgroup (supports v1 and v2)
- Parse K8s resource notation (500m, 1Gi, etc.) for CPU/memory limits
  (see the sketch below)
- Group metrics by container using CGROUP_PROCESS_MAP env var
- Calculate usage percentages against limits from CGROUP_LIMITS env var
- Output cgroup metrics with CPU cores used, memory RSS, and percentages
Environment variables:
- CGROUP_PROCESS_MAP: Map process names to container names for discovery
- CGROUP_LIMITS: Define CPU/memory limits per container in K8s notation
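A simplified sketch of parsing the K8s notation used by CGROUP_LIMITS
(the real parser may accept more forms, e.g. decimal K/M/G suffixes):
```go
import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPU converts K8s CPU notation to cores: "500m" -> 0.5, "2" -> 2.0.
func parseCPU(s string) (float64, error) {
	if strings.HasSuffix(s, "m") {
		milli, err := strconv.ParseFloat(strings.TrimSuffix(s, "m"), 64)
		if err != nil {
			return 0, err
		}
		return milli / 1000, nil
	}
	return strconv.ParseFloat(s, 64)
}

// parseMemory converts K8s memory notation to bytes: "1Gi" -> 1073741824.
func parseMemory(s string) (uint64, error) {
	units := map[string]uint64{
		"Ki": 1 << 10, "Mi": 1 << 20, "Gi": 1 << 30, "Ti": 1 << 40,
	}
	for suffix, mult := range units {
		if strings.HasSuffix(s, suffix) {
			n, err := strconv.ParseUint(strings.TrimSuffix(s, suffix), 10, 64)
			if err != nil {
				return 0, err
			}
			return n * mult, nil
		}
	}
	n, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("unsupported memory value %q: %w", s, err)
	}
	return n, nil
}
```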
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add integration tests that verify that the push client can send metrics
to the receiver and that they are stored correctly in SQLite.
Tests:
- TestPushClientToReceiver: Direct HTTP POST verification
- TestPushClientIntegration: Full PushClient with env vars
- TestMultiplePushes: Multiple pushes and filtering (parallel-safe)
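In outline, the direct-POST check looks something like this; the
TEST_RECEIVER_URL variable, ingest path, and JSON field names here are
placeholders, and the real tests also verify the stored rows in SQLite:
```go
import (
	"net/http"
	"os"
	"strings"
	"testing"
)

func TestPushClientToReceiver(t *testing.T) {
	// Assumes a receiver is already listening at TEST_RECEIVER_URL
	// (hypothetical env var for this sketch).
	base := os.Getenv("TEST_RECEIVER_URL")
	if base == "" {
		t.Skip("TEST_RECEIVER_URL not set")
	}
	body := strings.NewReader(`{"org":"acme","repo":"widgets","workflow":"ci","job":"build","run_id":"42","payload":{}}`)
	resp, err := http.Post(base+"/api/v1/metrics", "application/json", body)
	if err != nil {
		t.Fatalf("POST failed: %v", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		t.Fatalf("unexpected status %d", resp.StatusCode)
	}
}
```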
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace /api/v1/metrics/run/{runID} and /api/v1/metrics/repo/{org}/{repo}
with /api/v1/metrics/repo/{org}/{repo}/{workflow}/{job} for more precise
filtering by workflow and job name.
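If routed with Go 1.22's pattern-matching ServeMux (an assumption; the
service may use a different router), the new endpoint registers roughly as:
```go
mux := http.NewServeMux()
// Go 1.22+ wildcard patterns; path parameters are read via PathValue.
mux.HandleFunc("GET /api/v1/metrics/repo/{org}/{repo}/{workflow}/{job}",
	func(w http.ResponseWriter, r *http.Request) {
		filter := map[string]string{
			"org":      r.PathValue("org"),
			"repo":     r.PathValue("repo"),
			"workflow": r.PathValue("workflow"),
			"job":      r.PathValue("job"),
		}
		// ...query stored metrics using filter, then write the JSON response...
		_ = json.NewEncoder(w).Encode(filter)
	})
```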
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add multi-stage Dockerfile that can build both images:
- `docker build --target collector` for the collector
- `docker build --target receiver` for the receiver (with CGO for SQLite)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a push client that sends the run summary to a configurable HTTP
endpoint on shutdown. Execution context is read from GitHub Actions-style
environment variables (with Gitea fallbacks).
New flag: -push-endpoint <url>
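The context lookup reduces to a first-non-empty fallback helper along
these lines (the GITEA_* variable name is illustrative):
```go
import "os"

// firstEnv returns the first non-empty value among the given environment
// variables, allowing GitHub Actions names with Gitea-flavoured fallbacks.
func firstEnv(keys ...string) string {
	for _, k := range keys {
		if v := os.Getenv(k); v != "" {
			return v
		}
	}
	return ""
}

// Example usage (variable names illustrative):
// repo := firstEnv("GITHUB_REPOSITORY", "GITEA_REPOSITORY")
```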
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a new receiver application under cmd/receiver that accepts metrics
via HTTP POST and stores them in SQLite using GORM. The receiver expects
GitHub Actions-style execution context (org, repo, workflow, job, run_id).
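In outline (model fields, ingest path, and port are assumptions, not the
actual implementation):
```go
package main

import (
	"encoding/json"
	"net/http"

	"gorm.io/driver/sqlite"
	"gorm.io/gorm"
)

// Metric is a sketch of the stored record; the real model has more fields.
type Metric struct {
	ID       uint `gorm:"primaryKey"`
	Org      string
	Repo     string
	Workflow string
	Job      string
	RunID    string
	Payload  string // raw JSON payload stored as text
}

func main() {
	db, err := gorm.Open(sqlite.Open("metrics.db"), &gorm.Config{})
	if err != nil {
		panic(err)
	}
	if err := db.AutoMigrate(&Metric{}); err != nil {
		panic(err)
	}

	http.HandleFunc("POST /api/v1/metrics", func(w http.ResponseWriter, r *http.Request) {
		var in struct {
			Org      string          `json:"org"`
			Repo     string          `json:"repo"`
			Workflow string          `json:"workflow"`
			Job      string          `json:"job"`
			RunID    string          `json:"run_id"`
			Payload  json.RawMessage `json:"payload"`
		}
		if err := json.NewDecoder(r.Body).Decode(&in); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		m := Metric{Org: in.Org, Repo: in.Repo, Workflow: in.Workflow,
			Job: in.Job, RunID: in.RunID, Payload: string(in.Payload)}
		if err := db.Create(&m).Error; err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusCreated)
	})
	_ = http.ListenAndServe(":8080", nil)
}
```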
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
slog.Any() does not properly serialize slices of slog.Group() attributes,
resulting in broken output like {"Key":"","Value":{}}. Fixed by passing
structs with JSON tags directly to slog.Any() instead.
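The before/after amounts to something like this (type and field names are
illustrative):
```go
import "log/slog"

type cgroupMetrics struct {
	Name     string  `json:"name"`
	CPUCores float64 `json:"cpu_cores"`
}

func logCgroups() {
	// Before: a slice of attrs built with slog.Group() is serialized via
	// slog.Attr's own fields, yielding output like {"Key":"","Value":{}}.
	broken := []slog.Attr{
		slog.Group("cpu-stress", slog.Float64("cpu_cores", 0.5)),
	}
	slog.Info("cgroup metrics", slog.Any("cgroups", broken))

	// After: pass plain structs with JSON tags, which the JSON handler
	// marshals as expected.
	slog.Info("cgroup metrics", slog.Any("cgroups", []cgroupMetrics{
		{Name: "cpu-stress", CPUCores: 0.5},
	}))
}
```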
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Downgrade actions/setup-go from v6 to v5 (node24 not supported)
- Downgrade goreleaser-action from v6 to v5
- Add CI workflow with goreleaser snapshot on push to main
- Add .goreleaser.yaml configuration
Add a Go application that collects CPU and RAM metrics from the /proc filesystem:
- Parse /proc/[pid]/stat for CPU usage (user/system time)
- Parse /proc/[pid]/status for memory usage (RSS, VmSize, etc.); see the sketch below
- Aggregate metrics across all processes
- Output via structured logging (JSON/text)
- Continuous collection with configurable interval
Designed for monitoring pipeline runner resource utilization to enable
dynamic runner sizing.
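For example, RSS can be read from /proc/[pid]/status roughly like this
(simplified; the real collector parses more fields):
```go
import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// readRSSKiB returns the VmRSS value (in KiB) from /proc/<pid>/status.
func readRSSKiB(pid int) (uint64, error) {
	f, err := os.Open(fmt.Sprintf("/proc/%d/status", pid))
	if err != nil {
		return 0, err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "VmRSS:") {
			// Line looks like: "VmRSS:     1234 kB"
			fields := strings.Fields(line)
			if len(fields) < 2 {
				break
			}
			return strconv.ParseUint(fields[1], 10, 64)
		}
	}
	return 0, fmt.Errorf("VmRSS not found for pid %d", pid)
}
```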