--- title: "Observability Client" linkTitle: "Observability Client" weight: 60 description: > Core observability components for metrics collection, log aggregation, and monitoring --- ## Overview The Observability Client stack provides essential monitoring and observability infrastructure for Kubernetes environments. As part of the Edge Developer Platform, it deploys client-side components that collect, process, and forward metrics and logs to centralized observability systems. The stack integrates three core components: Kubernetes Metrics Server for resource metrics, Vector for log collection and forwarding, and Victoria Metrics for comprehensive metrics monitoring and alerting. ## Key Features * **Resource Metrics**: Real-time CPU and memory metrics via Kubernetes Metrics Server * **Log Aggregation**: Unified log collection and forwarding with Vector * **Metrics Monitoring**: Comprehensive metrics collection, storage, and alerting with Victoria Metrics * **Prometheus Compatibility**: Full Prometheus protocol support for metrics scraping * **Multi-Tenant Support**: Configurable tenant isolation for metrics and logs * **Automated Alerting**: Pre-configured alert rules with Alertmanager integration * **Grafana Integration**: Built-in dashboard provisioning and datasource configuration ## Repository **Code**: [Observability Client Stack Templates](https://edp.buildth.ing/DevFW-CICD/stacks/src/branch/main/template/stacks/observability-client) **Documentation**: * [Kubernetes Metrics Server](https://github.com/kubernetes-sigs/metrics-server) * [Vector Documentation](https://vector.dev/docs/) * [Victoria Metrics Documentation](https://docs.victoriametrics.com/) ## Getting Started ### Prerequisites * Kubernetes cluster with ArgoCD installed (provided by `core` stack) * cert-manager for certificate management (provided by `otc` stack) * Observability backend services for receiving metrics and logs ### Quick Start The Observability Client stack is deployed as part of the EDP installation process: 1. **Trigger Deploy Pipeline** - Go to [Infra Deploy Pipeline](https://edp.buildth.ing/DevFW/infra-deploy/actions?workflow=deploy.yaml) - Click on Run workflow - Enter a name in "Select environment directory to deploy". This must be DNS Compatible. - Execute workflow 2. **ArgoCD Synchronization** ArgoCD automatically deploys: - Metrics Server (Helm chart v3.12.2) - Vector agent (Helm chart v0.43.0) - Victoria Metrics k8s-stack (Helm chart v0.48.1) - ServiceMonitor resources for Prometheus scraping - Authentication secrets for remote write endpoints ### Verification Verify the Observability Client deployment: ```bash # Check ArgoCD application status kubectl get application -n argocd | grep -E "metrics-server|vector|vm-client" # Verify Metrics Server is running kubectl get pods -n observability -l app.kubernetes.io/name=metrics-server # Test metrics API kubectl top nodes kubectl top pods -A # Verify Vector pods are running kubectl get pods -n observability -l app.kubernetes.io/name=vector # Check Victoria Metrics components kubectl get pods -n observability -l app.kubernetes.io/name=victoria-metrics-k8s-stack # Verify ServiceMonitor resources kubectl get servicemonitor -n observability ``` ## Architecture ### Component Architecture The Observability Client stack consists of three integrated components: **Metrics Server**: - Collects resource metrics (CPU, memory) from kubelet - Provides Metrics API for kubectl top and HPA - Lightweight aggregator for cluster-wide resource usage - Exposes ServiceMonitor for Prometheus scraping **Vector Agent**: - DaemonSet deployment for log collection across all nodes - Processes and transforms Kubernetes logs - Forwards logs to centralized Elasticsearch backend - Injects cluster metadata and environment information - Supports compression and bulk operations **Victoria Metrics Stack**: - VMAgent: Scrapes metrics from Kubernetes components and applications - VMAlertmanager: Manages alert routing and notifications - VMOperator: Manages VictoriaMetrics CRDs and lifecycle - Integration with remote Victoria Metrics storage - Supports multi-tenant metrics isolation ### Data Flow ``` Kubernetes Resources → Metrics Server → Metrics API ↓ ServiceMonitor → VMAgent → Remote VictoriaMetrics Application Logs → Vector Agent → Transform → Remote Elasticsearch Prometheus Exporters → VMAgent → Remote VictoriaMetrics → VMAlertmanager ``` ## Configuration ### Metrics Server Configuration Configured in `stacks/observability-client/metrics-server/values.yaml`: ```yaml metrics: enabled: true serviceMonitor: enabled: true ``` **Key Settings**: - Enables metrics collection endpoint - Exposes ServiceMonitor for Prometheus-compatible scraping - Deployed via Helm chart from `https://kubernetes-sigs.github.io/metrics-server/` ### Vector Configuration Configured in `stacks/observability-client/vector/values.yaml`: **Role**: Agent (DaemonSet deployment across nodes) **Authentication**: Credentials sourced from `simple-user-secret`: - `VECTOR_USER`: Username for remote write authentication - `VECTOR_PASSWORD`: Password for remote write authentication **Data Sources**: - `k8s`: Collects Kubernetes container logs - `internal_metrics`: Gathers Vector internal metrics **Log Processing**: ```yaml transforms: parser: - Parse JSON from log messages - Inject cluster environment metadata - Remove original message field ``` **Output Sink**: - Elasticsearch bulk API (v8) - Basic authentication with environment variables - Gzip compression enabled - Custom headers: AccountID and ProjectID ### Victoria Metrics Stack Configuration Configured in `stacks/observability-client/vm-client-stack/values.yaml`: **Operator Settings**: - Enabled with admission webhooks - Managed by cert-manager for ArgoCD compatibility **VMAgent Configuration**: - Basic authentication for remote write - Credentials from `vm-remote-write-secret` - Stream parsing enabled - Drop original labels to reduce memory footprint **Monitoring Targets**: - Node exporter for hardware metrics - kube-state-metrics for Kubernetes object states - Kubelet metrics (cadvisor) - Kubernetes control plane components (API server, etcd, scheduler, controller manager) - CoreDNS metrics **Alertmanager Integration**: - Slack notification templates - Configurable routing rules - TLS support for secure communication **Storage Options**: - VMSingle: Single-node deployment - VMCluster: Distributed deployment with replication - Configurable retention period ## ArgoCD Application Configuration **Metrics Server Application** (`template/stacks/observability-client/metrics-server.yaml`): - Name: `metrics-server` - Chart version: 3.12.2 - Automated sync with self-heal enabled - Namespace: `observability` **Vector Application** (`template/stacks/observability-client/vector.yaml`): - Name: `vector` - Chart version: 0.43.0 - Automated sync with self-heal enabled - Namespace: `observability` **Victoria Metrics Application** (`template/stacks/observability-client/vm-client-stack.yaml`): - Name: `vm-client` - Chart version: 0.48.1 - Automated sync with self-heal enabled - Namespace: `observability` - References manifests from instance repository ## Usage Examples ### Querying Resource Metrics Access resource metrics collected by Metrics Server: ```bash # View node resource usage kubectl top nodes # View pod resource usage across all namespaces kubectl top pods -A # View pod resource usage in specific namespace kubectl top pods -n observability # Sort pods by CPU usage kubectl top pods -A --sort-by=cpu # Sort pods by memory usage kubectl top pods -A --sort-by=memory ``` ### Using Metrics for Autoscaling Create Horizontal Pod Autoscaler based on metrics: ```yaml apiVersion: autoscaling/v2 kind: HorizontalPodAutoscaler metadata: name: myapp-hpa spec: scaleTargetRef: apiVersion: apps/v1 kind: Deployment name: myapp minReplicas: 2 maxReplicas: 10 metrics: - type: Resource resource: name: cpu target: type: Utilization averageUtilization: 70 ``` ### Accessing Application Logs Vector automatically collects logs from all containers. View logs in your centralized Elasticsearch/Kibana: ```bash # Logs are automatically forwarded to Elasticsearch # Access via Kibana dashboard or Elasticsearch API # Example: Query logs via Elasticsearch API curl -u $VECTOR_USER:$VECTOR_PASSWORD \ -X GET "https://elasticsearch.example.com/_search" \ -H 'Content-Type: application/json' \ -d '{ "query": { "match": { "kubernetes.namespace": "my-namespace" } } }' ``` ### Querying Victoria Metrics Query metrics collected by Victoria Metrics: ```bash # Access Victoria Metrics query API # Metrics are forwarded to remote Victoria Metrics instance # Example PromQL queries: # - Container CPU usage: container_cpu_usage_seconds_total # - Pod memory usage: container_memory_usage_bytes # - Node disk I/O: node_disk_io_time_seconds_total # Query via Victoria Metrics API curl -X POST https://victoriametrics.example.com/api/v1/query \ -d 'query=up' \ -d 'time=2025-12-16T00:00:00Z' ``` ### Creating Custom ServiceMonitors Expose application metrics for collection: ```yaml apiVersion: v1 kind: Service metadata: name: myapp-metrics labels: app: myapp spec: ports: - name: metrics port: 8080 targetPort: 8080 selector: app: myapp --- apiVersion: monitoring.coreos.com/v1 kind: ServiceMonitor metadata: name: myapp-monitor namespace: observability spec: selector: matchLabels: app: myapp endpoints: - port: metrics path: /metrics interval: 30s ``` ## Integration Points * **Core Stack**: Depends on ArgoCD for deployment orchestration * **OTC Stack**: Requires cert-manager for certificate management * **Observability Stack**: Forwards metrics and logs to centralized observability backend * **All Application Stacks**: Collects metrics and logs from all platform applications ## Troubleshooting ### Metrics Server Not Responding **Problem**: `kubectl top` commands fail or return no data **Solution**: 1. Check Metrics Server pod status: ```bash kubectl get pods -n observability -l app.kubernetes.io/name=metrics-server kubectl logs -n observability -l app.kubernetes.io/name=metrics-server ``` 2. Verify kubelet metrics endpoint: ```bash kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes ``` 3. Check ServiceMonitor configuration: ```bash kubectl get servicemonitor -n observability -o yaml ``` ### Vector Not Forwarding Logs **Problem**: Logs are not appearing in Elasticsearch **Solution**: 1. Check Vector agent status: ```bash kubectl get pods -n observability -l app.kubernetes.io/name=vector kubectl logs -n observability -l app.kubernetes.io/name=vector --tail=50 ``` 2. Verify authentication secret: ```bash kubectl get secret simple-user-secret -n observability kubectl get secret simple-user-secret -n observability -o jsonpath='{.data.username}' | base64 -d ``` 3. Test Elasticsearch connectivity: ```bash kubectl exec -it -n observability $(kubectl get pod -n observability -l app.kubernetes.io/name=vector -o jsonpath='{.items[0].metadata.name}') -- \ curl -u $VECTOR_USER:$VECTOR_PASSWORD https://elasticsearch.example.com/_cluster/health ``` 4. Check Vector internal metrics: ```bash kubectl port-forward -n observability svc/vector 9090:9090 curl http://localhost:9090/metrics ``` ### Victoria Metrics Not Scraping **Problem**: Metrics are not being collected or forwarded **Solution**: 1. Check VMAgent status: ```bash kubectl get pods -n observability -l app.kubernetes.io/name=vmagent kubectl logs -n observability -l app.kubernetes.io/name=vmagent ``` 2. Verify remote write secret: ```bash kubectl get secret vm-remote-write-secret -n observability kubectl get secret vm-remote-write-secret -n observability -o jsonpath='{.data.username}' | base64 -d ``` 3. Check ServiceMonitor targets: ```bash kubectl get servicemonitor -n observability kubectl describe servicemonitor metrics-server -n observability ``` 4. Verify operator is running: ```bash kubectl get pods -n observability -l app.kubernetes.io/name=victoria-metrics-operator kubectl logs -n observability -l app.kubernetes.io/name=victoria-metrics-operator ``` ### High Memory Usage **Problem**: Victoria Metrics or Vector consuming excessive memory **Solution**: 1. For Victoria Metrics, verify `dropOriginalLabels` is enabled: ```bash kubectl get vmagent -n observability -o yaml | grep dropOriginalLabels ``` 2. Reduce scrape intervals for high-cardinality metrics: ```yaml # Edit ServiceMonitor spec: endpoints: - interval: 60s # Increase from 30s ``` 3. Filter unnecessary logs in Vector: ```yaml # Add filter transform to Vector configuration transforms: filter: type: filter condition: '.kubernetes.namespace != "kube-system"' ``` 4. Check resource limits: ```bash kubectl describe pod -n observability -l app.kubernetes.io/name=vmagent kubectl describe pod -n observability -l app.kubernetes.io/name=vector ``` ### Certificate Issues **Problem**: TLS certificate errors in logs **Solution**: 1. Verify cert-manager is running: ```bash kubectl get pods -n cert-manager ``` 2. Check certificate status: ```bash kubectl get certificate -n observability kubectl describe certificate -n observability ``` 3. Review webhook configuration: ```bash kubectl get validatingwebhookconfigurations | grep victoria-metrics kubectl get mutatingwebhookconfigurations | grep victoria-metrics ``` 4. Restart operator if needed: ```bash kubectl rollout restart deployment victoria-metrics-operator -n observability ``` ## Additional Resources * [Kubernetes Metrics Server Documentation](https://github.com/kubernetes-sigs/metrics-server) * [Vector Documentation](https://vector.dev/docs/) * [Victoria Metrics Documentation](https://docs.victoriametrics.com/) * [Victoria Metrics Operator](https://docs.victoriametrics.com/operator/) * [Prometheus Operator API](https://prometheus-operator.dev/docs/operator/api/) * [ArgoCD Documentation](https://argo-cd.readthedocs.io/)