
---
title: Observability Client
linkTitle: Observability Client
weight: 60
description: Core observability components for metrics collection, log aggregation, and monitoring
---

Overview

The Observability Client stack provides essential monitoring and observability infrastructure for Kubernetes environments. As part of the Edge Developer Platform, it deploys client-side components that collect, process, and forward metrics and logs to centralized observability systems.

The stack integrates three core components: Kubernetes Metrics Server for resource metrics, Vector for log collection and forwarding, and Victoria Metrics for comprehensive metrics monitoring and alerting.

Key Features

  • Resource Metrics: Real-time CPU and memory metrics via Kubernetes Metrics Server
  • Log Aggregation: Unified log collection and forwarding with Vector
  • Metrics Monitoring: Comprehensive metrics collection, storage, and alerting with Victoria Metrics
  • Prometheus Compatibility: Full Prometheus protocol support for metrics scraping
  • Multi-Tenant Support: Configurable tenant isolation for metrics and logs
  • Automated Alerting: Pre-configured alert rules with Alertmanager integration
  • Grafana Integration: Built-in dashboard provisioning and datasource configuration

Repository

Code: Observability Client Stack Templates

Documentation:

Getting Started

Prerequisites

  • Kubernetes cluster with ArgoCD installed (provided by core stack)
  • cert-manager for certificate management (provided by otc stack)
  • Observability backend services for receiving metrics and logs

Quick Start

The Observability Client stack is deployed as part of the EDP installation process:

  1. Trigger Deploy Pipeline

    • Go to Infra Deploy Pipeline
    • Click on Run workflow
    • Enter a name under "Select environment directory to deploy". The name must be DNS-compatible.
    • Execute workflow
  2. ArgoCD Synchronization: ArgoCD then automatically deploys:

    • Metrics Server (Helm chart v3.12.2)
    • Vector agent (Helm chart v0.43.0)
    • Victoria Metrics k8s-stack (Helm chart v0.48.1)
    • ServiceMonitor resources for Prometheus scraping
    • Authentication secrets for remote write endpoints

Verification

Verify the Observability Client deployment:

# Check ArgoCD application status
kubectl get application -n argocd | grep -E "metrics-server|vector|vm-client"

# Verify Metrics Server is running
kubectl get pods -n observability -l app.kubernetes.io/name=metrics-server

# Test metrics API
kubectl top nodes
kubectl top pods -A

# Verify Vector pods are running
kubectl get pods -n observability -l app.kubernetes.io/name=vector

# Check Victoria Metrics components
kubectl get pods -n observability -l app.kubernetes.io/name=victoria-metrics-k8s-stack

# Verify ServiceMonitor resources
kubectl get servicemonitor -n observability

Architecture

Component Architecture

The Observability Client stack consists of three integrated components:

Metrics Server:

  • Collects resource metrics (CPU, memory) from kubelet
  • Provides Metrics API for kubectl top and HPA
  • Lightweight aggregator for cluster-wide resource usage
  • Exposes ServiceMonitor for Prometheus scraping

Vector Agent:

  • DaemonSet deployment for log collection across all nodes
  • Processes and transforms Kubernetes logs
  • Forwards logs to centralized Elasticsearch backend
  • Injects cluster metadata and environment information
  • Supports compression and bulk operations

Victoria Metrics Stack:

  • VMAgent: Scrapes metrics from Kubernetes components and applications
  • VMAlertmanager: Manages alert routing and notifications
  • VMOperator: Manages VictoriaMetrics CRDs and lifecycle
  • Integration with remote Victoria Metrics storage
  • Supports multi-tenant metrics isolation

Data Flow

Kubernetes Resources → Metrics Server → Metrics API
                                      ↓
                                ServiceMonitor → VMAgent → Remote VictoriaMetrics

Application Logs → Vector Agent → Transform → Remote Elasticsearch

Prometheus Exporters → VMAgent → Remote VictoriaMetrics → VMAlertmanager

Configuration

Metrics Server Configuration

Configured in stacks/observability-client/metrics-server/values.yaml:

metrics:
  enabled: true
serviceMonitor:
  enabled: true

Key Settings:

  • Enables metrics collection endpoint
  • Exposes ServiceMonitor for Prometheus-compatible scraping
  • Deployed via Helm chart from https://kubernetes-sigs.github.io/metrics-server/

Vector Configuration

Configured in stacks/observability-client/vector/values.yaml:

Role: Agent (DaemonSet deployment across nodes)

Authentication: Credentials sourced from simple-user-secret:

  • VECTOR_USER: Username for remote write authentication
  • VECTOR_PASSWORD: Password for remote write authentication
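
A minimal sketch of how these credentials can be wired into the Vector Helm values, assuming the secret exposes username and password keys (the actual key names in simple-user-secret may differ):

env:
  - name: VECTOR_USER
    valueFrom:
      secretKeyRef:
        name: simple-user-secret
        key: username   # assumed key name
  - name: VECTOR_PASSWORD
    valueFrom:
      secretKeyRef:
        name: simple-user-secret
        key: password   # assumed key name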

Data Sources:

  • k8s: Collects Kubernetes container logs
  • internal_metrics: Gathers Vector internal metrics

Log Processing:

The parser transform:

  • Parses JSON from log messages
  • Injects cluster environment metadata
  • Removes the original message field
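
A minimal remap transform implementing these steps might look like the sketch below; the input name follows the k8s source above, while the environment variable is an assumption rather than the shipped configuration:

transforms:
  parser:
    type: remap
    inputs: [k8s]
    source: |
      # Inject cluster environment metadata (variable name is an assumption)
      .cluster_environment = get_env_var("CLUSTER_ENVIRONMENT") ?? "unknown"
      # Parse the JSON payload carried in the message field
      parsed, err = parse_json(.message)
      if err == null {
        . = merge!(., parsed)
        # Drop the original message field
        del(.message)
      }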

Output Sink:

  • Elasticsearch bulk API (v8)
  • Basic authentication with environment variables
  • Gzip compression enabled
  • Custom headers: AccountID and ProjectID
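
Put together, the sink section could look roughly like the following sketch; the endpoint and header values are placeholders, not the values shipped with the stack:

sinks:
  elasticsearch:
    type: elasticsearch
    inputs: [parser]
    endpoints: ["https://elasticsearch.example.com"]   # placeholder endpoint
    api_version: v8
    mode: bulk
    compression: gzip
    auth:
      strategy: basic
      user: "${VECTOR_USER}"
      password: "${VECTOR_PASSWORD}"
    request:
      headers:
        AccountID: "<account-id>"   # placeholder
        ProjectID: "<project-id>"   # placeholder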

Victoria Metrics Stack Configuration

Configured in stacks/observability-client/vm-client-stack/values.yaml:

Operator Settings:

  • Enabled with admission webhooks
  • Managed by cert-manager for ArgoCD compatibility

VMAgent Configuration:

  • Basic authentication for remote write
  • Credentials from vm-remote-write-secret
  • Stream parsing enabled
  • Drop original labels to reduce memory footprint
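
A sketch of the corresponding vmagent section in the Helm values, assuming the secret exposes username and password keys and using a placeholder remote write URL:

vmagent:
  spec:
    extraArgs:
      promscrape.streamParse: "true"
      promscrape.dropOriginalLabels: "true"
    remoteWrite:
      - url: "https://victoriametrics.example.com/api/v1/write"   # placeholder URL
        basicAuth:
          username:
            name: vm-remote-write-secret
            key: username   # assumed key name
          password:
            name: vm-remote-write-secret
            key: password   # assumed key name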

Monitoring Targets:

  • Node exporter for hardware metrics
  • kube-state-metrics for Kubernetes object states
  • Kubelet metrics (cadvisor)
  • Kubernetes control plane components (API server, etcd, scheduler, controller manager)
  • CoreDNS metrics
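
In the victoria-metrics-k8s-stack chart these targets are toggled per component; a sketch of the relevant values (key names follow the upstream chart, and the enabled set may differ in this stack):

prometheus-node-exporter:
  enabled: true
kube-state-metrics:
  enabled: true
kubelet:
  enabled: true
kubeApiServer:
  enabled: true
kubeEtcd:
  enabled: true
kubeScheduler:
  enabled: true
kubeControllerManager:
  enabled: true
coreDns:
  enabled: true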

Alertmanager Integration:

  • Slack notification templates
  • Configurable routing rules
  • TLS support for secure communication
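
A minimal sketch of a Slack route in the Alertmanager configuration; the webhook URL, channel, and receiver name are placeholders:

alertmanager:
  config:
    route:
      receiver: slack-notifications
    receivers:
      - name: slack-notifications
        slack_configs:
          - api_url: "https://hooks.slack.com/services/XXXX"   # placeholder webhook
            channel: "#alerts"
            send_resolved: true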

Storage Options:

  • VMSingle: Single-node deployment
  • VMCluster: Distributed deployment with replication
  • Configurable retention period
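
For the single-node option, retention and storage size are set on the VMSingle spec; a sketch with illustrative values:

vmsingle:
  enabled: true
  spec:
    retentionPeriod: "30d"   # illustrative retention
    storage:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 20Gi      # illustrative size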

ArgoCD Application Configuration

Metrics Server Application (template/stacks/observability-client/metrics-server.yaml):

  • Name: metrics-server
  • Chart version: 3.12.2
  • Automated sync with self-heal enabled
  • Namespace: observability
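
The resulting Application looks roughly like the sketch below; the project name and sync options are assumptions, and the authoritative template lives in the stack repository:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: metrics-server
  namespace: argocd
spec:
  project: default                 # assumption
  source:
    repoURL: https://kubernetes-sigs.github.io/metrics-server/
    chart: metrics-server
    targetRevision: 3.12.2
  destination:
    server: https://kubernetes.default.svc
    namespace: observability
  syncPolicy:
    automated:
      selfHeal: true
    syncOptions:
      - CreateNamespace=true       # assumption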

Vector Application (template/stacks/observability-client/vector.yaml):

  • Name: vector
  • Chart version: 0.43.0
  • Automated sync with self-heal enabled
  • Namespace: observability

Victoria Metrics Application (template/stacks/observability-client/vm-client-stack.yaml):

  • Name: vm-client
  • Chart version: 0.48.1
  • Automated sync with self-heal enabled
  • Namespace: observability
  • References manifests from instance repository

Usage Examples

Querying Resource Metrics

Access resource metrics collected by Metrics Server:

# View node resource usage
kubectl top nodes

# View pod resource usage across all namespaces
kubectl top pods -A

# View pod resource usage in specific namespace
kubectl top pods -n observability

# Sort pods by CPU usage
kubectl top pods -A --sort-by=cpu

# Sort pods by memory usage
kubectl top pods -A --sort-by=memory

Using Metrics for Autoscaling

Create Horizontal Pod Autoscaler based on metrics:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Accessing Application Logs

Vector automatically collects logs from all containers. View logs in your centralized Elasticsearch/Kibana:

# Logs are automatically forwarded to Elasticsearch
# Access via Kibana dashboard or Elasticsearch API

# Example: Query logs via Elasticsearch API
curl -u $VECTOR_USER:$VECTOR_PASSWORD \
  -X GET "https://elasticsearch.example.com/_search" \
  -H 'Content-Type: application/json' \
  -d '{
    "query": {
      "match": {
        "kubernetes.namespace": "my-namespace"
      }
    }
  }'

Querying Victoria Metrics

Query metrics collected by Victoria Metrics:

# Access Victoria Metrics query API
# Metrics are forwarded to remote Victoria Metrics instance

# Example PromQL queries:
# - Container CPU usage: container_cpu_usage_seconds_total
# - Pod memory usage: container_memory_usage_bytes
# - Node disk I/O: node_disk_io_time_seconds_total

# Query via Victoria Metrics API
curl -X POST https://victoriametrics.example.com/api/v1/query \
  -d 'query=up' \
  -d 'time=2025-12-16T00:00:00Z'

Creating Custom ServiceMonitors

Expose application metrics for collection:

apiVersion: v1
kind: Service
metadata:
  name: myapp-metrics
  labels:
    app: myapp
spec:
  ports:
  - name: metrics
    port: 8080
    targetPort: 8080
  selector:
    app: myapp
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-monitor
  namespace: observability
spec:
  selector:
    matchLabels:
      app: myapp
  endpoints:
  - port: metrics
    path: /metrics
    interval: 30s
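
If the Service lives in a different namespace than the ServiceMonitor, add a namespaceSelector so the target is still discovered, for example:

spec:
  namespaceSelector:
    matchNames:
      - my-namespace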

Integration Points

  • Core Stack: Depends on ArgoCD for deployment orchestration
  • OTC Stack: Requires cert-manager for certificate management
  • Observability Stack: Forwards metrics and logs to centralized observability backend
  • All Application Stacks: Collects metrics and logs from all platform applications

Troubleshooting

Metrics Server Not Responding

Problem: kubectl top commands fail or return no data

Solution:

  1. Check Metrics Server pod status:

    kubectl get pods -n observability -l app.kubernetes.io/name=metrics-server
    kubectl logs -n observability -l app.kubernetes.io/name=metrics-server
    
  2. Verify the Metrics API is reachable:

    kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
    
  3. Check ServiceMonitor configuration:

    kubectl get servicemonitor -n observability -o yaml
    

Vector Not Forwarding Logs

Problem: Logs are not appearing in Elasticsearch

Solution:

  1. Check Vector agent status:

    kubectl get pods -n observability -l app.kubernetes.io/name=vector
    kubectl logs -n observability -l app.kubernetes.io/name=vector --tail=50
    
  2. Verify authentication secret:

    kubectl get secret simple-user-secret -n observability
    kubectl get secret simple-user-secret -n observability -o jsonpath='{.data.username}' | base64 -d
    
  3. Test Elasticsearch connectivity:

    kubectl exec -it -n observability $(kubectl get pod -n observability -l app.kubernetes.io/name=vector -o jsonpath='{.items[0].metadata.name}') -- \
      sh -c 'curl -u "$VECTOR_USER:$VECTOR_PASSWORD" https://elasticsearch.example.com/_cluster/health'
    
  4. Check Vector internal metrics:

    kubectl port-forward -n observability svc/vector 9090:9090
    curl http://localhost:9090/metrics
    

Victoria Metrics Not Scraping

Problem: Metrics are not being collected or forwarded

Solution:

  1. Check VMAgent status:

    kubectl get pods -n observability -l app.kubernetes.io/name=vmagent
    kubectl logs -n observability -l app.kubernetes.io/name=vmagent
    
  2. Verify remote write secret:

    kubectl get secret vm-remote-write-secret -n observability
    kubectl get secret vm-remote-write-secret -n observability -o jsonpath='{.data.username}' | base64 -d
    
  3. Check ServiceMonitor targets:

    kubectl get servicemonitor -n observability
    kubectl describe servicemonitor metrics-server -n observability
    
  4. Verify operator is running:

    kubectl get pods -n observability -l app.kubernetes.io/name=victoria-metrics-operator
    kubectl logs -n observability -l app.kubernetes.io/name=victoria-metrics-operator
    

High Memory Usage

Problem: Victoria Metrics or Vector consuming excessive memory

Solution:

  1. For Victoria Metrics, verify dropOriginalLabels is enabled:

    kubectl get vmagent -n observability -o yaml | grep dropOriginalLabels
    
  2. Reduce scrape intervals for high-cardinality metrics:

    # Edit ServiceMonitor
    spec:
      endpoints:
      - interval: 60s  # Increase from 30s
    
  3. Filter unnecessary logs in Vector:

    # Add a filter transform to the Vector configuration
    # (field name follows the kubernetes_logs source schema)
    transforms:
      filter:
        type: filter
        inputs: [k8s]
        condition: '.kubernetes.pod_namespace != "kube-system"'
    
  4. Check resource limits:

    kubectl describe pod -n observability -l app.kubernetes.io/name=vmagent
    kubectl describe pod -n observability -l app.kubernetes.io/name=vector
    

Certificate Issues

Problem: TLS certificate errors in logs

Solution:

  1. Verify cert-manager is running:

    kubectl get pods -n cert-manager
    
  2. Check certificate status:

    kubectl get certificate -n observability
    kubectl describe certificate -n observability
    
  3. Review webhook configuration:

    kubectl get validatingwebhookconfigurations | grep victoria-metrics
    kubectl get mutatingwebhookconfigurations | grep victoria-metrics
    
  4. Restart operator if needed:

    kubectl rollout restart deployment victoria-metrics-operator -n observability
    

Additional Resources