Update and extend documentation
This commit is contained in:
parent addab99e5d
commit 2a4c64bfb0

3 changed files with 212 additions and 571 deletions

README.md (251)

@@ -1,28 +1,100 @@
# Forgejo Runner Resource Collector

A lightweight resource metrics collector designed to run alongside CI/CD workloads in shared PID namespace environments. It collects CPU and memory metrics, groups them by container/cgroup, and pushes summaries to a receiver service.

A lightweight metrics collector for CI/CD workloads in shared PID namespace environments. Reads `/proc` to collect CPU and memory metrics, groups them by container/cgroup, and pushes run summaries to a receiver service for storage and querying.

## Components

## Architecture

- **Collector**: Gathers system and per-process metrics at regular intervals, computes run-level statistics, and pushes a summary on shutdown.
- **Receiver**: HTTP service that stores metric summaries in SQLite and provides a query API.

The system has two independent binaries:

## Receiver API

```
┌───────────────────────────────────────────┐       ┌──────────────────────────┐
│ CI/CD Pod (shared PID namespace)          │       │ Receiver Service         │
│                                           │       │                          │
│ ┌───────────┐  ┌────────┐  ┌───────────┐  │       │  POST /api/v1/metrics    │
│ │ collector │  │ runner │  │ sidecar   │  │       │            │             │
│ │           │  │        │  │           │  │       │            ▼             │
│ │ reads     │  │        │  │           │  │ push  │  ┌────────────┐          │
│ │ /proc for │  │        │  │           │  │──────▶│  │   SQLite   │          │
│ │ all PIDs  │  │        │  │           │  │       │  └────────────┘          │
│ └───────────┘  └────────┘  └───────────┘  │       │            │             │
│                                           │       │            ▼             │
└───────────────────────────────────────────┘       │ GET /api/v1/metrics/...  │
                                                     └──────────────────────────┘
```

### POST `/api/v1/metrics`

### Collector

Receives metric summaries from collectors.

Runs as a sidecar alongside CI workloads. On a configurable interval, it reads `/proc` to collect CPU and memory for all visible processes, groups them by container using cgroup paths, and accumulates samples. On shutdown (SIGINT/SIGTERM), it computes run-level statistics (peak, avg, percentiles) and pushes a single summary to the receiver.

### GET `/api/v1/metrics/repo/{org}/{repo}/{workflow}/{job}`

```bash
./collector --interval=2s --top=10 --push-endpoint=http://receiver:8080/api/v1/metrics
```

Retrieves all stored metrics for a specific workflow and job.

**Flags:** `--interval`, `--proc-path`, `--log-level`, `--log-format`, `--top`, `--push-endpoint`
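
The interval-driven loop and shutdown push described above can be sketched as follows. This is an illustrative sketch only (the 2-second ticker mirrors `--interval=2s`); the real loop lives in `internal/collector` and its structure may differ.

```go
// loop.go: sketch of the collect-then-push-on-shutdown loop described above.
// The ticker interval and the push step are placeholders, not the project's code.
package main

import (
	"context"
	"fmt"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	// Stop on SIGINT/SIGTERM, the signals mentioned above.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer stop()

	ticker := time.NewTicker(2 * time.Second) // --interval
	defer ticker.Stop()

	samples := 0
	for {
		select {
		case <-ticker.C:
			samples++ // the real collector reads /proc here and accumulates a sample
		case <-ctx.Done():
			// Compute the run summary once and push it to the receiver, then exit.
			fmt.Printf("pushing summary built from %d samples\n", samples)
			return
		}
	}
}
```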

**Environment variables:**

| Variable | Description | Example |
|----------|-------------|---------|
| `GITHUB_REPOSITORY_OWNER` | Organization name | `my-org` |
| `GITHUB_REPOSITORY` | Full repository path | `my-org/my-repo` |
| `GITHUB_WORKFLOW` | Workflow filename | `ci.yml` |
| `GITHUB_JOB` | Job name | `build` |
| `GITHUB_RUN_ID` | Unique run identifier | `run-123` |
| `CGROUP_PROCESS_MAP` | JSON: process name → container name | `{"node":"runner"}` |
| `CGROUP_LIMITS` | JSON: per-container CPU/memory limits | See below |

**CGROUP_LIMITS example:**

```json
{
  "runner": {"cpu": "2", "memory": "1Gi"},
  "sidecar": {"cpu": "500m", "memory": "256Mi"}
}
```

CPU supports Kubernetes notation (`"2"` = 2 cores, `"500m"` = 0.5 cores). Memory supports `Ki`, `Mi`, `Gi`, `Ti` (binary) or `K`, `M`, `G`, `T` (decimal).
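
This notation can be interpreted with a few lines of parsing. The sketch below is illustrative only (function names are hypothetical); the project's actual parsing lives in `internal/cgroup` and may differ.

```go
// quantity.go: illustrative parser for the CPU/memory notation shown above.
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPU turns "2" into 2.0 cores and "500m" into 0.5 cores.
func parseCPU(s string) (float64, error) {
	if strings.HasSuffix(s, "m") {
		milli, err := strconv.ParseFloat(strings.TrimSuffix(s, "m"), 64)
		if err != nil {
			return 0, err
		}
		return milli / 1000, nil
	}
	return strconv.ParseFloat(s, 64)
}

// parseMemory turns "1Gi" into bytes using binary suffixes, "1G" using decimal ones.
func parseMemory(s string) (uint64, error) {
	suffixes := []struct {
		unit string
		mult uint64
	}{
		// Binary suffixes are checked first so "Mi" is not matched as "M".
		{"Ki", 1 << 10}, {"Mi", 1 << 20}, {"Gi", 1 << 30}, {"Ti", 1 << 40},
		{"K", 1000}, {"M", 1000000}, {"G", 1000000000}, {"T", 1000000000000},
	}
	for _, sfx := range suffixes {
		if strings.HasSuffix(s, sfx.unit) {
			n, err := strconv.ParseUint(strings.TrimSuffix(s, sfx.unit), 10, 64)
			if err != nil {
				return 0, err
			}
			return n * sfx.mult, nil
		}
	}
	return strconv.ParseUint(s, 10, 64)
}

func main() {
	cores, _ := parseCPU("500m")
	bytes, _ := parseMemory("1Gi")
	fmt.Println(cores, bytes) // 0.5 1073741824
}
```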

### Receiver

HTTP service that stores metric summaries in SQLite (via GORM) and exposes a query API.

```bash
./receiver --addr=:8080 --db=metrics.db
```
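
As context for how a summary might be persisted, here is a hedged sketch using GORM's SQLite driver. The model fields are assumptions for illustration; the real schema is defined in `internal/receiver`.

```go
// store.go: hedged sketch of storing a summary row with GORM and SQLite.
package main

import (
	"gorm.io/driver/sqlite"
	"gorm.io/gorm"
)

// MetricSummary is an illustrative model, not the project's actual schema.
type MetricSummary struct {
	ID       uint `gorm:"primaryKey"`
	Org      string
	Repo     string
	Workflow string
	Job      string
	RunID    string
	Payload  string // full summary JSON as received
}

func main() {
	db, err := gorm.Open(sqlite.Open("metrics.db"), &gorm.Config{})
	if err != nil {
		panic(err)
	}
	// Create the table if it does not exist yet.
	if err := db.AutoMigrate(&MetricSummary{}); err != nil {
		panic(err)
	}
	db.Create(&MetricSummary{Org: "my-org", Repo: "my-org/my-repo", Workflow: "ci.yml", Job: "build", RunID: "run-123"})
}
```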

| Variable | Description | Default |
|----------|-------------|---------|
| `DB_PATH` | SQLite database path | `metrics.db` |
| `LISTEN_ADDR` | HTTP listen address | `:8080` |

**Endpoints:**

- `POST /api/v1/metrics` — receive and store a metric summary
- `GET /api/v1/metrics/repo/{org}/{repo}/{workflow}/{job}` — query stored metrics
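
A summary can be pushed with a plain HTTP POST. The payload below is a trimmed, hypothetical example based on the response fields shown later in this README; the collector's real payload contains more fields.

```go
// push.go: hedged example of pushing a summary to the receiver endpoint above.
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	payload := []byte(`{
	  "duration_seconds": 20.0,
	  "sample_count": 11,
	  "cpu_total_percent": {"peak": 42.0, "avg": 17.3},
	  "mem_used_bytes": {"peak": 18567168, "avg": 18567168}
	}`)

	resp, err := http.Post("http://localhost:8080/api/v1/metrics", "application/json", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("receiver responded with", resp.Status)
}
```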

## How Metrics Are Collected

The collector reads `/proc/[pid]/stat` for every visible process to get CPU ticks (`utime` + `stime`) and `/proc/[pid]/status` for memory (RSS). It takes two samples per interval and computes the delta to derive CPU usage rates.

Processes are grouped into containers by reading `/proc/[pid]/cgroup` and matching cgroup paths against the `CGROUP_PROCESS_MAP`. This is necessary because in shared PID namespace pods, `/proc/stat` only shows host-level aggregates — per-container metrics must be built up from individual process data.
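
A minimal sketch of that per-process read (assuming cgroup v2's `0::` entry; the project's real parsing is in `internal/proc` and `internal/metrics`):

```go
// procscan.go: minimal sketch of reading ticks and the cgroup path per PID.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

type procSample struct {
	pid    int
	ticks  uint64 // utime + stime from /proc/[pid]/stat
	cgroup string // cgroup path from /proc/[pid]/cgroup ("0::" entry on cgroup v2)
}

func readSample(pid int) (procSample, error) {
	stat, err := os.ReadFile(filepath.Join("/proc", strconv.Itoa(pid), "stat"))
	if err != nil {
		return procSample{}, err
	}
	// Fields after the closing ')' start at field 3 (state); utime and stime are
	// fields 14 and 15 overall, i.e. indexes 11 and 12 after this split.
	rest := string(stat[strings.LastIndexByte(string(stat), ')')+2:])
	f := strings.Fields(rest)
	utime, _ := strconv.ParseUint(f[11], 10, 64)
	stime, _ := strconv.ParseUint(f[12], 10, 64)

	cg, err := os.ReadFile(filepath.Join("/proc", strconv.Itoa(pid), "cgroup"))
	if err != nil {
		return procSample{}, err
	}
	path := ""
	for _, line := range strings.Split(string(cg), "\n") {
		if strings.HasPrefix(line, "0::") { // cgroup v2 unified entry
			path = strings.TrimPrefix(line, "0::")
		}
	}
	return procSample{pid: pid, ticks: utime + stime, cgroup: path}, nil
}

func main() {
	entries, _ := os.ReadDir("/proc")
	for _, e := range entries {
		if pid, err := strconv.Atoi(e.Name()); err == nil {
			if s, err := readSample(pid); err == nil {
				fmt.Printf("pid=%d ticks=%d cgroup=%s\n", s.pid, s.ticks, s.cgroup)
			}
		}
	}
}
```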

Container CPU is reported in **cores** (not percentage) for direct comparison with Kubernetes resource limits. System-level CPU is reported as a percentage (0-100%).

Over the course of a run, the `summary.Accumulator` tracks every sample and on shutdown computes:

| Stat | Description |
|------|-------------|
| `peak` | Maximum observed value |
| `p99`, `p95`, `p75`, `p50` | Percentiles across all samples |
| `avg` | Arithmetic mean |

These stats are computed for CPU, memory, and per-container metrics.
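
A minimal sketch of how these run-level statistics can be derived from accumulated samples. The nearest-rank percentile used here is an assumption; `summary.Accumulator` may use a different interpolation.

```go
// stats.go: sketch of computing the run-level statistics listed above.
package main

import (
	"fmt"
	"sort"
)

type Stats struct {
	Peak, P99, P95, P75, P50, Avg float64
}

func compute(samples []float64) Stats {
	if len(samples) == 0 {
		return Stats{}
	}
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)

	percentile := func(p float64) float64 {
		// Nearest-rank: smallest value covering p percent of samples.
		idx := int(p/100*float64(len(sorted))+0.5) - 1
		if idx < 0 {
			idx = 0
		}
		if idx >= len(sorted) {
			idx = len(sorted) - 1
		}
		return sorted[idx]
	}

	sum := 0.0
	for _, v := range sorted {
		sum += v
	}
	return Stats{
		Peak: sorted[len(sorted)-1],
		P99:  percentile(99), P95: percentile(95),
		P75: percentile(75), P50: percentile(50),
		Avg: sum / float64(len(sorted)),
	}
}

func main() {
	fmt.Printf("%+v\n", compute([]float64{1.2, 1.8, 2.0, 1.5, 1.9}))
}
```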

## API Response

**Example request:**

```
GET /api/v1/metrics/repo/my-org/my-repo/ci.yml/build
```

**Example response:**

```json
[
  {

@@ -38,151 +110,66 @@ GET /api/v1/metrics/repo/my-org/my-repo/ci.yml/build
"end_time": "2026-02-06T14:30:22.190Z",
|
||||
"duration_seconds": 20.0,
|
||||
"sample_count": 11,
|
||||
"cpu_total_percent": { ... },
|
||||
"mem_used_bytes": { ... },
|
||||
"mem_used_percent": { ... },
|
||||
"top_cpu_processes": [ ... ],
|
||||
"top_mem_processes": [ ... ],
|
||||
"cpu_total_percent": { "peak": ..., "avg": ..., "p50": ... },
|
||||
"mem_used_bytes": { "peak": ..., "avg": ... },
|
||||
"containers": [
|
||||
{
|
||||
"name": "runner",
|
||||
"cpu_cores": {
|
||||
"peak": 2.007,
|
||||
"p99": 2.005,
|
||||
"p95": 2.004,
|
||||
"p75": 1.997,
|
||||
"p50": 1.817,
|
||||
"avg": 1.5
|
||||
},
|
||||
"memory_bytes": {
|
||||
"peak": 18567168,
|
||||
"p99": 18567168,
|
||||
"p95": 18567168,
|
||||
"p75": 18567168,
|
||||
"p50": 18567168,
|
||||
"avg": 18567168
|
||||
}
|
||||
"cpu_cores": { "peak": 2.007, "avg": 1.5, "p50": 1.817, "p95": 2.004 },
|
||||
"memory_bytes": { "peak": 18567168, "avg": 18567168 }
|
||||
}
|
||||
]
|
||||
],
|
||||
"top_cpu_processes": [ ... ],
|
||||
"top_mem_processes": [ ... ]
|
||||
}
|
||||
}
|
||||
]
|
||||
```
|
||||
|
||||

## Understanding the Metrics

**CPU metric distinction:**

- `cpu_total_percent` — system-wide, 0-100%
- `cpu_cores` (containers) — cores used (e.g. `2.0` = two full cores)
- `peak_cpu_percent` (processes) — per-process, where 100% = 1 core

### CPU Metrics

There are two different CPU metric formats in the response:

#### 1. System and Process CPU: Percentage (`cpu_total_percent`, `peak_cpu_percent`)

These values represent **CPU utilization as a percentage** of total available CPU time.

- `cpu_total_percent`: Overall system CPU usage (0-100%)
- `peak_cpu_percent` (in process lists): Per-process CPU usage where 100% = 1 full CPU core

#### 2. Container CPU: Cores (`cpu_cores`)

**Important:** The `cpu_cores` field in container metrics represents **CPU usage in number of cores**, not percentage.

| Value | Meaning |
|-------|---------|
| `0.5` | Half a CPU core |
| `1.0` | One full CPU core |
| `2.0` | Two CPU cores |
| `2.5` | Two and a half CPU cores |

This allows direct comparison with Kubernetes resource limits (e.g., `cpu: "2"` or `cpu: "500m"`).

**Example interpretation:**

```json
{
  "name": "runner",
  "cpu_cores": {
    "peak": 2.007,
    "avg": 1.5
  }
}
```

This means the "runner" container used a peak of ~2 CPU cores and averaged 1.5 CPU cores during the run.
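
How a `cpu_cores` value falls out of the raw tick deltas can be shown in a few lines. This is an illustrative sketch; the constant and function name are assumptions, and the real conversion lives in `internal/metrics`.

```go
// cores.go: deriving a "cpu_cores" value from two tick samples.
package main

import "fmt"

const clkTck = 100.0 // typical USER_HZ; query with `getconf CLK_TCK`

// coresUsed converts a delta of (utime+stime) ticks over an interval into cores.
func coresUsed(deltaTicks uint64, elapsedSeconds float64) float64 {
	return float64(deltaTicks) / (clkTck * elapsedSeconds)
}

func main() {
	// 400 ticks over 2 seconds at 100 Hz means 2 cores were kept busy.
	fmt.Println(coresUsed(400, 2.0)) // 2
}
```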

### Memory Metrics

All memory values are in **bytes**:

- `mem_used_bytes`: System memory usage
- `memory_bytes` (in containers): Container RSS memory usage
- `peak_mem_rss_bytes` (in processes): Process RSS memory

### Statistical Fields

Each metric includes percentile statistics across all samples:

| Field | Description |
|-------|-------------|
| `peak` | Maximum value observed |
| `p99` | 99th percentile |
| `p95` | 95th percentile |
| `p75` | 75th percentile |
| `p50` | Median (50th percentile) |
| `avg` | Arithmetic mean |

## Configuration

### Collector Environment Variables

| Variable | Description | Example |
|----------|-------------|---------|
| `GITHUB_REPOSITORY_OWNER` | Organization name | `my-org` |
| `GITHUB_REPOSITORY` | Full repository path | `my-org/my-repo` |
| `GITHUB_WORKFLOW` | Workflow filename | `ci.yml` |
| `GITHUB_JOB` | Job name | `build` |
| `GITHUB_RUN_ID` | Unique run identifier | `run-123` |
| `CGROUP_PROCESS_MAP` | JSON mapping process names to container names | `{"node":"runner"}` |
| `CGROUP_LIMITS` | JSON with CPU/memory limits per container | See below |

**CGROUP_LIMITS example:**

```json
{
  "runner": {"cpu": "2", "memory": "1Gi"},
  "sidecar": {"cpu": "500m", "memory": "256Mi"}
}
```

CPU values support Kubernetes notation: `"2"` = 2 cores, `"500m"` = 0.5 cores.

Memory values support: `Ki`, `Mi`, `Gi`, `Ti` (binary) or `K`, `M`, `G`, `T` (decimal).

### Receiver Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `DB_PATH` | SQLite database path | `metrics.db` |
| `LISTEN_ADDR` | HTTP listen address | `:8080` |

All memory values are in **bytes**.

## Running

### Docker Compose (stress test example)

### Docker Compose

```bash
docker compose -f test/docker/docker-compose-stress.yaml up -d

# Wait for metrics collection...
# Wait for collection, then trigger shutdown summary:
docker compose -f test/docker/docker-compose-stress.yaml stop collector

# Query results
# Query results:
curl http://localhost:9080/api/v1/metrics/repo/test-org/test-org%2Fstress-test/stress-test-workflow/heavy-workload
```

### Local Development

### Local

```bash
# Build
go build -o collector ./cmd/collector
go build -o receiver ./cmd/receiver

# Run receiver
./receiver --listen=:8080 --db=metrics.db

# Run collector
./receiver --addr=:8080 --db=metrics.db
./collector --interval=2s --top=10 --push-endpoint=http://localhost:8080/api/v1/metrics
```

## Internal Packages

| Package | Purpose |
|---------|---------|
| `internal/proc` | Low-level `/proc` parsing (stat, status, cgroup) |
| `internal/metrics` | Aggregates process metrics from `/proc` into system/container views |
| `internal/cgroup` | Parses `CGROUP_PROCESS_MAP` and `CGROUP_LIMITS` env vars |
| `internal/collector` | Orchestrates the collection loop and shutdown |
| `internal/summary` | Accumulates samples, computes stats, pushes to receiver |
| `internal/receiver` | HTTP handlers and SQLite store |
| `internal/output` | Metrics output formatting (JSON/text) |

## Background

Technical reference on the Linux primitives this project builds on:

- [Identifying process cgroups by PID](docs/background/identify-process-cgroup-by-pid.md) — how to read `/proc/<PID>/cgroup` to determine which container a process belongs to
- [/proc/stat behavior in containers](docs/background/proc-stat-in-containers.md) — why `/proc/stat` shows host-level data in containers, and how to aggregate per-process stats from `/proc/[pid]/stat` instead, including CPU tick conversion and cgroup limit handling

docs/background/identify-process-cgroup-by-pid.md

@@ -1,26 +1,14 @@

# Identify Process Cgroup by PID

# Identifying a Process's Cgroup by PID

## TL;DR

Read `/proc/<PID>/cgroup` to find which cgroup (and therefore which container) a process belongs to.

You can identify a process's cgroup by reading `/proc/<PID>/cgroup`.

## Outcome

Multiple methods to find which cgroup a process belongs to using its PID.

## Conversation

**Question:** Can I identify the process's cgroup by having its PID?

**Answer:** Yes, here are the main methods:

### Method 1: Read `/proc/<PID>/cgroup`

## /proc/PID/cgroup

```bash
cat /proc/<PID>/cgroup
```

This shows all cgroup controllers the process belongs to. For example:

Shows all cgroup controllers the process belongs to:

```
12:blkio:/user.slice
11:memory:/user.slice/user-1000.slice

@@ -28,35 +16,23 @@ This shows all cgroup controllers the process belongs to. For example:
0::/user.slice/user-1000.slice/session-1.scope
```

### Method 2: Using `ps` with format options

On cgroup v2, the path after `::` is the cgroup path under `/sys/fs/cgroup/`.

## Other Methods

```bash
# ps format options
ps -o pid,cgroup -p <PID>
```

### Method 3: Using `systemd-cgls`

If you're on a systemd system:

```bash
# systemd systems
systemd-cgls --unit <unit-name>
# or to see the whole tree
systemd-cgls
systemd-cgls   # whole tree
```

### Method 4: Check cgroup v2 unified hierarchy

On cgroup v2 systems, check:

```bash
cat /proc/<PID>/cgroup
# Output will look like: 0::/user.slice/user-1000.slice/session-1.scope
```

The path after `::` is the cgroup path under `/sys/fs/cgroup/`.

### Quick one-liner

## Quick One-Liners

```bash
cat /proc/self/cgroup   # For current shell
cat /proc/$$/cgroup     # Also current shell
cat /proc/1234/cgroup   # For PID 1234
cat /proc/self/cgroup   # current shell
cat /proc/$$/cgroup     # also current shell
cat /proc/1234/cgroup   # specific PID
```

docs/background/proc-stat-in-containers.md

@@ -1,64 +1,36 @@
# /proc/stat Behavior in Containerized Environments

# /proc/stat behavior in containerised environments

## TL;DR

`/proc/stat` in containers shows **host-level** statistics, not container-specific data. To get container-aware CPU metrics when processes span multiple cgroups (e.g., sidecars sharing a PID namespace), aggregate `/proc/[pid]/stat` for all visible processes and use cgroup limits from `/sys/fs/cgroup` for normalization.

`/proc/stat` in containers shows host-level statistics, not container-specific data. To get accurate container CPU metrics when cgroups aren't an option (e.g., sidecars sharing PID namespace with different cgroups), aggregate `/proc/[pid]/stat` for all visible processes and calculate percentages using cgroup limits from `/sys/fs/cgroup`.

## Why /proc/stat is wrong in containers

## Outcome

`/proc/stat` reports host-wide values (CPU times, context switches, boot time, process count) because `/proc` is mounted from the host kernel, which has no namespace awareness for these metrics.

- Explained why `/proc/stat` doesn't provide container-aware metrics
- Provided methods to aggregate process CPU usage from `/proc/[pid]/stat`
- Showed how to determine parent/child process relationships
- Demonstrated converting raw ticks to CPU percentages
- Covered how to read cgroup CPU limits for other containers

This means:

- Tools reading `/proc/stat` (top, htop, etc.) show **host** CPU usage, not container usage
- Cgroup CPU limits (e.g., 2 CPUs) are not reflected — all host CPUs are visible
- In shared environments, containers see each other's aggregate impact

## Conversation

### How does /proc/stat behave in containerized environments?

By default, `/proc/stat` inside a container shows **host system statistics**, not container-specific data. This includes:

- **CPU times** (user, system, idle, etc.) - reflects all CPUs on the host
- **Context switches** - host-wide count
- **Boot time** - host's boot time
- **Processes created** - host-wide total

#### Why This Happens

The `/proc` filesystem is mounted from the host kernel. Containers share the host kernel, and `/proc/stat` is a kernel-level interface that doesn't have namespace awareness for most of its metrics.

#### Practical Implications

1. **CPU usage calculations are wrong** - Tools like `top`, `htop`, or custom monitoring that read `/proc/stat` will show host CPU usage, not container usage
2. **Resource limits aren't reflected** - Even if you limit a container to 2 CPUs via cgroups, `/proc/stat` shows all host CPUs
3. **Multi-tenant confusion** - In shared environments, containers see each other's aggregate impact

#### Solutions

### Alternatives

| Approach | Description |
|----------|-------------|
| **cgroups** | Read from `/sys/fs/cgroup/cpu/` for container-specific CPU accounting |
| **LXCFS** | A FUSE filesystem that provides container-aware `/proc` files |
| **Container runtimes** | Some runtimes (like Kata) use VMs with isolated kernels |
| **Metrics APIs** | Use Docker/Kubernetes APIs instead of `/proc/stat` |

#### Example: Getting Real Container CPU

| **cgroups** | Read `/sys/fs/cgroup/cpu/` for container-specific CPU accounting |
| **LXCFS** | FUSE filesystem providing container-aware `/proc` files |
| **Container runtimes** | Some (like Kata) use VMs with isolated kernels |
| **Metrics APIs** | Docker/Kubernetes APIs instead of `/proc/stat` |

```bash
# Instead of /proc/stat, use cgroups v2:
# cgroups v2:
cat /sys/fs/cgroup/cpu.stat

# Or cgroups v1:
# cgroups v1:
cat /sys/fs/cgroup/cpu/cpuacct.usage
```

---

## Aggregating per-Process CPU from /proc/[pid]/stat

### Can I aggregate process usage to get the usage of the container?

Yes, you can aggregate process usage, but there are caveats.

#### Aggregating from /proc/[pid]/stat

When cgroup-level reads aren't an option (sidecars sharing PID namespace with different cgroups), aggregate individual process stats:

```bash
# Fields 14 (utime) and 15 (stime) in /proc/[pid]/stat

@@ -67,137 +39,76 @@ for pid in /proc/[0-9]*; do
done | awk '{sum += $1} END {print sum}'
```

#### Problems with This Approach

### Caveats

1. **Race conditions** - Processes can spawn/die between reads, causing inconsistent totals
2. **Short-lived processes** - Processes that start and exit between samples are missed entirely
3. **Zombie/exited processes** - Their CPU time may not be captured
4. **Overhead** - Scanning all PIDs repeatedly is expensive
5. **Namespace visibility** - You only see processes in your PID namespace, but that's actually what you want in a container
6. **Children accounting** - When a process exits, its CPU time gets added to the parent's `cutime`/`cstime` fields, which can cause double-counting if not handled carefully

1. **Race conditions** — processes can spawn/die between reads
2. **Short-lived processes** — missed if they start and exit between samples
3. **Zombie/exited processes** — their CPU time may not be captured
4. **Overhead** — scanning all PIDs repeatedly is expensive
5. **Namespace visibility** — you only see processes in your PID namespace (which is what you want)
6. **Children accounting** — when a process exits, its CPU time is added to the parent's `cutime`/`cstime`, risking double-counting

#### Better Alternative: cgroups

Cgroups handle these edge cases natively, but **cannot be used when sidecars share the PID namespace with different cgroups** — in that case, per-process aggregation is the best option.

Cgroups track CPU usage at the group level and handle all these edge cases:

## Parent/Child Process Relationships

Field 4 in `/proc/[pid]/stat` is the PPID (parent process ID):

```bash
# cgroups v2 (modern)
cat /sys/fs/cgroup/cpu.stat
# Shows: usage_usec, user_usec, system_usec

# cgroups v1
cat /sys/fs/cgroup/cpu/cpuacct.usage   # total nanoseconds
cat /sys/fs/cgroup/cpu/cpuacct.stat    # user/system jiffies
awk '{print $4}' /proc/1234/stat       # PPID from stat
grep PPid /proc/1234/status            # more readable
```

**However**, cgroups cannot be used when sidecars share the PID namespace but have different cgroups. In that case, aggregating `/proc/[pid]/stat` is the best option.

---

### How do I determine child/parent processes?

#### From /proc/[pid]/stat

Field 4 is the **PPID** (parent process ID):

```bash
# Get PPID for a specific process
awk '{print $4}' /proc/1234/stat

# List all processes with their parents
for stat in /proc/[0-9]*/stat; do
  pid="${stat#/proc/}"
  pid="${pid%/stat}"
  ppid=$(awk '{print $4}' "$stat" 2>/dev/null) && \
    echo "$pid -> parent: $ppid"
done
```

#### From /proc/[pid]/status (more readable)

```bash
grep PPid /proc/1234/status
# PPid: 1
```

#### Building a Process Tree

### Building a Process Tree

```bash
#!/bin/bash
declare -A parent_of
declare -A children_of
declare -A parent_of children_of

for stat in /proc/[0-9]*/stat; do
  if read -r line < "$stat" 2>/dev/null; then
    pid="${stat#/proc/}"
    pid="${pid%/stat}"

    # Extract PPID (field 4, but handle comm with spaces)
    rest="${line##*) }"
    read -ra fields <<< "$rest"
    pid="${stat#/proc/}"; pid="${pid%/stat}"
    rest="${line##*) }"; read -ra fields <<< "$rest"
    ppid="${fields[1]}"   # 4th field overall = index 1 after state

    parent_of[$pid]=$ppid
    children_of[$ppid]+="$pid "
  fi
done

# Print tree from PID 1
print_tree() {
  local pid=$1
  local indent=$2
  local pid=$1 indent=$2
  echo "${indent}${pid}"
  for child in ${children_of[$pid]}; do
    print_tree "$child" " $indent"
  done
  for child in ${children_of[$pid]}; do print_tree "$child" " $indent"; done
}

print_tree 1 ""
```

#### For CPU Aggregation: Handling cutime/cstime

### Avoiding Double-Counting with cutime/cstime

To properly handle `cutime`/`cstime` without double-counting:

Only sum `utime` + `stime` per process. The `cutime`/`cstime` fields are cumulative from children that have already exited and been `wait()`ed on — those children no longer exist in `/proc`, so their time is only accessible via the parent.

```bash
#!/bin/bash
declare -A parent_of
declare -A utime stime

# First pass: collect all data
for stat in /proc/[0-9]*/stat; do
  if read -r line < "$stat" 2>/dev/null; then
    pid="${stat#/proc/}"
    pid="${pid%/stat}"
    rest="${line##*) }"
    read -ra f <<< "$rest"

    parent_of[$pid]="${f[1]}"
    utime[$pid]="${f[11]}"
    stime[$pid]="${f[12]}"
    # cutime=${f[13]} cstime=${f[14]} - don't sum these
    pid="${stat#/proc/}"; pid="${pid%/stat}"
    rest="${line##*) }"; read -ra f <<< "$rest"
    utime[$pid]="${f[11]}"; stime[$pid]="${f[12]}"
    # cutime=${f[13]} cstime=${f[14]} — don't sum these
  fi
done

# Sum only utime/stime (not cutime/cstime)
total=0
for pid in "${!utime[@]}"; do
  ((total += utime[$pid] + stime[$pid]))
done

for pid in "${!utime[@]}"; do ((total += utime[$pid] + stime[$pid])); done
echo "Total CPU ticks: $total"
echo "Seconds: $(echo "scale=2; $total / $(getconf CLK_TCK)" | bc)"
```

**Key insight:** Only sum `utime` + `stime` for each process. The `cutime`/`cstime` fields are cumulative from children that have already exited and been `wait()`ed on—those children no longer exist in `/proc`, so their time is only accessible via the parent's `cutime`/`cstime`.

## Converting Ticks to CPU Percentages

---

### How do I convert utime/stime to percentages?

You need **two samples** over a time interval. CPU percentage is a rate, not an absolute value.

#### The Formula

CPU percentage is a rate — you need **two samples** over a time interval.

```
CPU % = (delta_ticks / (elapsed_seconds * CLK_TCK * num_cpus)) * 100
```

@@ -205,20 +116,17 @@ CPU % = (delta_ticks / (elapsed_seconds * CLK_TCK * num_cpus)) * 100

- `delta_ticks` = difference in (utime + stime) between samples
- `CLK_TCK` = ticks per second (usually 100, get via `getconf CLK_TCK`)
- `num_cpus` = number of CPUs (omit for single-CPU percentage)

#### Two Common Percentage Styles

- `num_cpus` = number of CPUs (omit for per-core percentage)

| Style | Formula | Example |
|-------|---------|---------|
| **Normalized** (0-100%) | `delta / (elapsed * CLK_TCK * num_cpus) * 100` | 50% = half of total capacity |
| **Cores-style** (0-N*100%) | `delta / (elapsed * CLK_TCK) * 100` | 200% = 2 full cores busy |

#### Practical Script

### Sampling Script

```bash
#!/bin/bash

CLK_TCK=$(getconf CLK_TCK)
NUM_CPUS=$(nproc)

@@ -226,267 +134,94 @@ get_total_ticks() {
  local total=0
  for stat in /proc/[0-9]*/stat; do
    if read -r line < "$stat" 2>/dev/null; then
      rest="${line##*) }"
      read -ra f <<< "$rest"
      ((total += f[11] + f[12]))   # utime + stime
      rest="${line##*) }"; read -ra f <<< "$rest"
      ((total += f[11] + f[12]))
    fi
  done
  echo "$total"
}

# First sample
ticks1=$(get_total_ticks)
time1=$(date +%s.%N)

# Wait
ticks1=$(get_total_ticks); time1=$(date +%s.%N)
sleep 1
ticks2=$(get_total_ticks); time2=$(date +%s.%N)

# Second sample
ticks2=$(get_total_ticks)
time2=$(date +%s.%N)

# Calculate
delta_ticks=$((ticks2 - ticks1))
elapsed=$(echo "$time2 - $time1" | bc)

# Percentage of total CPU capacity (all cores)
pct=$(echo "scale=2; ($delta_ticks / ($elapsed * $CLK_TCK * $NUM_CPUS)) * 100" | bc)
echo "CPU usage: ${pct}% of ${NUM_CPUS} CPUs"

# Percentage as "CPU cores used" (like top's 200% for 2 full cores)
cores_pct=$(echo "scale=2; ($delta_ticks / ($elapsed * $CLK_TCK)) * 100" | bc)
echo "CPU usage: ${cores_pct}% (cores-style)"
```

#### Continuous Monitoring

## Respecting Cgroup CPU Limits

The above calculations use `nproc`, which returns the **host** CPU count. If a container is limited to 2 CPUs on an 8-CPU host, `nproc` returns 8 and the percentage is misleading.

### Reading Effective CPU Limit

```bash
#!/bin/bash
CLK_TCK=$(getconf CLK_TCK)
NUM_CPUS=$(nproc)
INTERVAL=1

get_total_ticks() {
  local total=0
  for stat in /proc/[0-9]*/stat; do
    read -r line < "$stat" 2>/dev/null || continue
    rest="${line##*) }"
    read -ra f <<< "$rest"
    ((total += f[11] + f[12]))
  done
  echo "$total"
}

prev_ticks=$(get_total_ticks)
prev_time=$(date +%s.%N)

while true; do
  sleep "$INTERVAL"

  curr_ticks=$(get_total_ticks)
  curr_time=$(date +%s.%N)

  delta=$((curr_ticks - prev_ticks))
  elapsed=$(echo "$curr_time - $prev_time" | bc)

  pct=$(echo "scale=1; $delta / ($elapsed * $CLK_TCK * $NUM_CPUS) * 100" | bc)
  printf "\rCPU: %5.1f%%" "$pct"

  prev_ticks=$curr_ticks
  prev_time=$curr_time
done
```

---

### Does this calculation respect cgroup limits?

No, it doesn't. The calculation uses `nproc` which typically returns **host CPU count**, not your cgroup limit.

#### The Problem

If your container is limited to 2 CPUs on an 8-CPU host:

- `nproc` returns 8
- Your calculation shows 25% when you're actually at 100% of your limit

#### Getting Effective CPU Limit

**cgroups v2:**

```bash
# cpu.max contains: $quota $period (in microseconds)
# "max 100000" means unlimited
read quota period < /sys/fs/cgroup/cpu.max
if [[ "$quota" == "max" ]]; then
  effective_cpus=$(nproc)
else
  effective_cpus=$(echo "scale=2; $quota / $period" | bc)
fi
echo "Effective CPUs: $effective_cpus"
```

**cgroups v1:**

```bash
quota=$(cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us)
period=$(cat /sys/fs/cgroup/cpu/cpu.cfs_period_us)

if [[ "$quota" == "-1" ]]; then
  effective_cpus=$(nproc)
else
  effective_cpus=$(echo "scale=2; $quota / $period" | bc)
fi
```

**Also Check cpuset Limits:**

```bash
# cgroups v2
cpuset=$(cat /sys/fs/cgroup/cpuset.cpus.effective 2>/dev/null)

# cgroups v1
cpuset=$(cat /sys/fs/cgroup/cpuset/cpuset.cpus 2>/dev/null)

# Parse "0-3,5,7" format to count CPUs
count_cpus() {
  local count=0
  IFS=',' read -ra ranges <<< "$1"
  for range in "${ranges[@]}"; do
    if [[ "$range" == *-* ]]; then
      start="${range%-*}"
      end="${range#*-}"
      ((count += end - start + 1))
    else
      ((count++))
    fi
  done
  echo "$count"
}
```

#### Updated Script Respecting Limits

```bash
#!/bin/bash
CLK_TCK=$(getconf CLK_TCK)

get_effective_cpus() {
  # Try cgroups v2 first
  # cgroups v2
  if [[ -f /sys/fs/cgroup/cpu.max ]]; then
    read quota period < /sys/fs/cgroup/cpu.max
    if [[ "$quota" != "max" ]]; then
      echo "scale=2; $quota / $period" | bc
      return
    fi
    [[ "$quota" != "max" ]] && echo "scale=2; $quota / $period" | bc && return
  fi

  # Try cgroups v1
  # cgroups v1
  if [[ -f /sys/fs/cgroup/cpu/cpu.cfs_quota_us ]]; then
    quota=$(cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us)
    period=$(cat /sys/fs/cgroup/cpu/cpu.cfs_period_us)
    if [[ "$quota" != "-1" ]]; then
      echo "scale=2; $quota / $period" | bc
      return
    fi
    [[ "$quota" != "-1" ]] && echo "scale=2; $quota / $period" | bc && return
  fi

  # Fallback to nproc
  nproc
  nproc   # fallback
}

EFFECTIVE_CPUS=$(get_effective_cpus)
echo "Effective CPU limit: $EFFECTIVE_CPUS"

# ... rest of your sampling logic using $EFFECTIVE_CPUS
```

#### Shared PID Namespace Consideration

Also check cpuset limits (`cpuset.cpus.effective` for v2, `cpuset/cpuset.cpus` for v1) which restrict which physical CPUs are available.

When sidecars share PID namespace but have different cgroups:

- Each container may have different CPU limits
- You're aggregating processes across those limits
- There's no single "correct" limit to use

### Shared PID Namespace Complication

When sidecars share a PID namespace but have different cgroups, there's no single "correct" CPU limit for normalization. Options:

**Options:**

1. **Use host CPU count** — percentage of total host capacity
2. **Sum the limits** — if you know each sidecar's cgroup, sum their quotas
3. **Report in cores** — skip normalization, just show `1.5 cores used` instead of percentage
3. **Report in cores** — skip normalization, show `1.5 cores used` instead of percentage

---

## Reading Cgroup Limits for Other Containers

### Can I get the cgroup limit for another cgroup?

Yes, if you have visibility into the cgroup filesystem.

#### 1. Find a Process's Cgroup

Every process exposes its cgroup membership:

Every process exposes its cgroup membership via `/proc/<PID>/cgroup`. If the cgroup filesystem is mounted, you can read any container's limits:

```bash
# Get cgroup for any PID you can see
cat /proc/1234/cgroup

# cgroups v2 output:
# 0::/kubepods/pod123/container456

# cgroups v1 output:
# 12:cpu,cpuacct:/docker/abc123
# 11:memory:/docker/abc123
# ...
```

#### 2. Read That Cgroup's Limits

If the cgroup filesystem is mounted and accessible:

```bash
#!/bin/bash

get_cgroup_cpu_limit() {
  local pid=$1

  # Get cgroup path for this PID
  cgroup_path=$(grep -oP '0::\K.*' /proc/$pid/cgroup 2>/dev/null)   # v2

  # cgroups v2
  cgroup_path=$(grep -oP '0::\K.*' /proc/$pid/cgroup 2>/dev/null)
  if [[ -n "$cgroup_path" ]]; then
    # cgroups v2
    limit_file="/sys/fs/cgroup${cgroup_path}/cpu.max"
    if [[ -r "$limit_file" ]]; then
      read quota period < "$limit_file"
      if [[ "$quota" == "max" ]]; then
        echo "unlimited"
      else
        echo "scale=2; $quota / $period" | bc
      fi
      [[ "$quota" == "max" ]] && echo "unlimited" || echo "scale=2; $quota / $period" | bc
      return
    fi
  fi

  # Try cgroups v1
  # cgroups v1
  cgroup_path=$(grep -oP 'cpu.*:\K.*' /proc/$pid/cgroup 2>/dev/null)
  if [[ -n "$cgroup_path" ]]; then
    quota_file="/sys/fs/cgroup/cpu${cgroup_path}/cpu.cfs_quota_us"
    period_file="/sys/fs/cgroup/cpu${cgroup_path}/cpu.cfs_period_us"
    if [[ -r "$quota_file" ]]; then
      quota=$(cat "$quota_file")
      period=$(cat "$period_file")
      if [[ "$quota" == "-1" ]]; then
        echo "unlimited"
      else
        echo "scale=2; $quota / $period" | bc
      fi
      quota=$(cat "$quota_file"); period=$(cat "$period_file")
      [[ "$quota" == "-1" ]] && echo "unlimited" || echo "scale=2; $quota / $period" | bc
      return
    fi
  fi

  echo "unknown"
}

# Example: get limit for PID 1234
get_cgroup_cpu_limit 1234
```

#### 3. Mount Visibility Requirements

### Mount Visibility

| Scenario | Can Read Other Cgroups? |
|----------|------------------------|

@@ -495,66 +230,9 @@ get_cgroup_cpu_limit 1234
| `/sys/fs/cgroup` mounted read-only from host | Yes (common in Kubernetes) |
| Only own cgroup subtree mounted | No |

Check what's visible:

### Fallbacks When Cgroups Aren't Accessible

```bash
mount | grep cgroup
ls /sys/fs/cgroup/
```

#### 4. Full Solution: Aggregate by Cgroup

```bash
#!/bin/bash
CLK_TCK=$(getconf CLK_TCK)

declare -A cgroup_ticks
declare -A cgroup_limit

for stat in /proc/[0-9]*/stat; do
  pid="${stat#/proc/}"
  pid="${pid%/stat}"

  # Get cgroup for this process
  cg=$(grep -oP '0::\K.*' /proc/$pid/cgroup 2>/dev/null)
  [[ -z "$cg" ]] && continue

  # Get CPU ticks
  if read -r line < "$stat" 2>/dev/null; then
    rest="${line##*) }"
    read -ra f <<< "$rest"
    ticks=$((f[11] + f[12]))

    ((cgroup_ticks[$cg] += ticks))

    # Cache the limit (only look up once per cgroup)
    if [[ -z "${cgroup_limit[$cg]}" ]]; then
      limit_file="/sys/fs/cgroup${cg}/cpu.max"
      if [[ -r "$limit_file" ]]; then
        read quota period < "$limit_file"
        if [[ "$quota" == "max" ]]; then
          cgroup_limit[$cg]="unlimited"
        else
          cgroup_limit[$cg]=$(echo "scale=2; $quota / $period" | bc)
        fi
      else
        cgroup_limit[$cg]="unknown"
      fi
    fi
  fi
done

echo "Ticks by cgroup:"
for cg in "${!cgroup_ticks[@]}"; do
  echo "  $cg: ${cgroup_ticks[$cg]} ticks (limit: ${cgroup_limit[$cg]} CPUs)"
done
```

#### If You Can't Access Other Cgroups

Fallback options:

1. **Mount the cgroup fs** — add volume mount for `/sys/fs/cgroup:ro`
2. **Use a sidecar with access** — one privileged container does the monitoring
1. **Mount the cgroup fs** — volume mount `/sys/fs/cgroup:ro`
2. **Use a sidecar with access** — one privileged container does monitoring
3. **Accept "unknown" limits** — report raw ticks/cores instead of percentages
4. **Kubernetes Downward API** — inject limits as env vars (only for your own container though)
4. **Kubernetes Downward API** — inject limits as env vars (own container only)