| title | linkTitle | weight | description |
|---|---|---|---|
| Observability Client | Observability Client | 60 | Core observability components for metrics collection, log aggregation, and monitoring |
## Overview
The Observability Client stack provides essential monitoring and observability infrastructure for Kubernetes environments. As part of the Edge Developer Platform, it deploys client-side components that collect, process, and forward metrics and logs to centralized observability systems.
The stack integrates three core components: Kubernetes Metrics Server for resource metrics, Vector for log collection and forwarding, and Victoria Metrics for comprehensive metrics monitoring and alerting.
## Key Features
- Resource Metrics: Real-time CPU and memory metrics via Kubernetes Metrics Server
- Log Aggregation: Unified log collection and forwarding with Vector
- Metrics Monitoring: Comprehensive metrics collection, storage, and alerting with Victoria Metrics
- Prometheus Compatibility: Full Prometheus protocol support for metrics scraping
- Multi-Tenant Support: Configurable tenant isolation for metrics and logs
- Automated Alerting: Pre-configured alert rules with Alertmanager integration
- Grafana Integration: Built-in dashboard provisioning and datasource configuration
## Repository
Code: Observability Client Stack Templates
Documentation:
## Getting Started

### Prerequisites

- Kubernetes cluster with ArgoCD installed (provided by corestack)
- cert-manager for certificate management (provided by otcstack)
- Observability backend services for receiving metrics and logs
### Quick Start

The Observability Client stack is deployed as part of the EDP installation process:

1. Trigger Deploy Pipeline
   - Go to the Infra Deploy Pipeline
   - Click on Run workflow
   - Enter a name in "Select environment directory to deploy". This must be DNS-compatible.
   - Execute the workflow
2. ArgoCD Synchronization

   ArgoCD automatically deploys:
   - Metrics Server (Helm chart v3.12.2)
   - Vector agent (Helm chart v0.43.0)
   - Victoria Metrics k8s-stack (Helm chart v0.48.1)
   - ServiceMonitor resources for Prometheus scraping
   - Authentication secrets for remote write endpoints
### Verification

Verify the Observability Client deployment:

```bash
# Check ArgoCD application status
kubectl get application -n argocd | grep -E "metrics-server|vector|vm-client"

# Verify Metrics Server is running
kubectl get pods -n observability -l app.kubernetes.io/name=metrics-server

# Test metrics API
kubectl top nodes
kubectl top pods -A

# Verify Vector pods are running
kubectl get pods -n observability -l app.kubernetes.io/name=vector

# Check Victoria Metrics components
kubectl get pods -n observability -l app.kubernetes.io/name=victoria-metrics-k8s-stack

# Verify ServiceMonitor resources
kubectl get servicemonitor -n observability
```
## Architecture
### Component Architecture
The Observability Client stack consists of three integrated components:
Metrics Server:
- Collects resource metrics (CPU, memory) from kubelet
- Provides Metrics API for kubectl top and HPA
- Lightweight aggregator for cluster-wide resource usage
- Exposes ServiceMonitor for Prometheus scraping
Vector Agent:
- DaemonSet deployment for log collection across all nodes
- Processes and transforms Kubernetes logs
- Forwards logs to centralized Elasticsearch backend
- Injects cluster metadata and environment information
- Supports compression and bulk operations
Victoria Metrics Stack:
- VMAgent: Scrapes metrics from Kubernetes components and applications
- VMAlertmanager: Manages alert routing and notifications
- VMOperator: Manages VictoriaMetrics CRDs and lifecycle
- Integration with remote Victoria Metrics storage
- Supports multi-tenant metrics isolation
### Data Flow

```
Kubernetes Resources → Metrics Server → Metrics API
                           ↓
                    ServiceMonitor → VMAgent → Remote VictoriaMetrics

Application Logs → Vector Agent → Transform → Remote Elasticsearch

Prometheus Exporters → VMAgent → Remote VictoriaMetrics → VMAlertmanager
```
## Configuration
### Metrics Server Configuration

Configured in `stacks/observability-client/metrics-server/values.yaml`:

```yaml
metrics:
  enabled: true
serviceMonitor:
  enabled: true
```
Key Settings:
- Enables metrics collection endpoint
- Exposes ServiceMonitor for Prometheus-compatible scraping
- Deployed via the Helm chart from https://kubernetes-sigs.github.io/metrics-server/
### Vector Configuration

Configured in `stacks/observability-client/vector/values.yaml`:

Role: Agent (DaemonSet deployment across nodes)

Authentication:

Credentials are sourced from the `simple-user-secret` secret:
- `VECTOR_USER`: Username for remote write authentication
- `VECTOR_PASSWORD`: Password for remote write authentication

Data Sources:
- `k8s`: Collects Kubernetes container logs
- `internal_metrics`: Gathers Vector internal metrics

Log Processing:

The `parser` transform:
- Parses JSON from log messages
- Injects cluster environment metadata
- Removes the original message field
Output Sink:
- Elasticsearch bulk API (v8)
- Basic authentication with environment variables
- Gzip compression enabled
- Custom headers: AccountID and ProjectID
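
Taken together, these settings correspond to a `customConfig` section roughly like the sketch below. This is not the shipped values file: the endpoint, the header values, and the exact VRL in the transform are assumptions used for illustration only.

```yaml
# Illustrative sketch of the Vector customConfig wiring (not the shipped values).
# Endpoint, header values, and the VRL body are placeholders.
customConfig:
  sources:
    k8s:
      type: kubernetes_logs
    internal_metrics:
      type: internal_metrics
  transforms:
    parser:
      type: remap
      inputs: ["k8s"]
      source: |
        # Parse JSON messages, add cluster metadata, drop the raw field
        .log = parse_json(string!(.message)) ?? .message
        .cluster_environment = "${CLUSTER_ENVIRONMENT:-unknown}"   # placeholder
        del(.message)
  sinks:
    elasticsearch:
      type: elasticsearch
      inputs: ["parser"]
      api_version: v8
      endpoints: ["https://elasticsearch.example.com"]             # placeholder
      auth:
        strategy: basic
        user: "${VECTOR_USER}"
        password: "${VECTOR_PASSWORD}"
      compression: gzip
      request:
        headers:
          AccountID: "${ACCOUNT_ID}"                               # placeholder
          ProjectID: "${PROJECT_ID}"                               # placeholder
```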
### Victoria Metrics Stack Configuration

Configured in `stacks/observability-client/vm-client-stack/values.yaml`:
Operator Settings:
- Enabled with admission webhooks
- Admission webhook certificates managed by cert-manager for ArgoCD compatibility
VMAgent Configuration:
- Basic authentication for remote write
- Credentials from `vm-remote-write-secret`
- Stream parsing enabled
- Drop original labels to reduce memory footprint
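
In the victoria-metrics-k8s-stack values these settings typically take the shape sketched below. The remote write URL is a placeholder and the password key name in `vm-remote-write-secret` is an assumption (the username key matches the one used in the troubleshooting section).

```yaml
# Sketch of the VMAgent settings in the vm-client-stack values; URL is a placeholder.
vmagent:
  spec:
    extraArgs:
      promscrape.streamParse: "true"           # stream parsing
      promscrape.dropOriginalLabels: "true"    # drop original labels to save memory
    remoteWrite:
      - url: https://victoriametrics.example.com/api/v1/write   # placeholder
        basicAuth:
          username:
            name: vm-remote-write-secret
            key: username
          password:
            name: vm-remote-write-secret
            key: password                      # assumed key name
```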
Monitoring Targets:
- Node exporter for hardware metrics
- kube-state-metrics for Kubernetes object states
- Kubelet metrics (cadvisor)
- Kubernetes control plane components (API server, etcd, scheduler, controller manager)
- CoreDNS metrics
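
These targets map to individual toggles in the chart values. The sketch below uses the key names from the upstream victoria-metrics-k8s-stack chart; check the values of the pinned chart version (0.48.1) before relying on the exact keys.

```yaml
# Sketch: per-target scrape toggles (key names per the upstream chart defaults).
kubelet:
  enabled: true            # includes cadvisor metrics
kubeApiServer:
  enabled: true
kubeControllerManager:
  enabled: true
kubeScheduler:
  enabled: true
kubeEtcd:
  enabled: true
coreDns:
  enabled: true
kube-state-metrics:
  enabled: true
prometheus-node-exporter:
  enabled: true
```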
Alertmanager Integration:
- Slack notification templates
- Configurable routing rules
- TLS support for secure communication
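
Routing and the Slack receiver live in the Alertmanager configuration section of the chart values. The sketch below is a minimal example with a placeholder webhook URL and channel, not the stack's actual routing rules.

```yaml
# Minimal sketch of Slack alert routing (webhook URL and channel are placeholders).
alertmanager:
  config:
    route:
      receiver: slack-notifications
      group_by: ["alertname", "namespace"]
    receivers:
      - name: slack-notifications
        slack_configs:
          - api_url: https://hooks.slack.com/services/XXX/YYY/ZZZ   # placeholder
            channel: "#alerts"                                      # placeholder
            send_resolved: true
```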
Storage Options:
- VMSingle: Single-node deployment
- VMCluster: Distributed deployment with replication
- Configurable retention period
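
Which storage mode is enabled, and with what retention, is controlled in the same values file. The snippet below is a sketch with an example retention period, not the stack's default.

```yaml
# Sketch: storage mode and retention (values are examples, not the defaults).
vmsingle:
  enabled: true
  spec:
    retentionPeriod: "30d"   # example retention period
vmcluster:
  enabled: false             # enable instead of vmsingle for replicated storage
```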
### ArgoCD Application Configuration

Metrics Server Application (`template/stacks/observability-client/metrics-server.yaml`):
- Name: `metrics-server`
- Chart version: 3.12.2
- Automated sync with self-heal enabled
- Namespace: `observability`

Vector Application (`template/stacks/observability-client/vector.yaml`):
- Name: `vector`
- Chart version: 0.43.0
- Automated sync with self-heal enabled
- Namespace: `observability`

Victoria Metrics Application (`template/stacks/observability-client/vm-client-stack.yaml`):
- Name: `vm-client`
- Chart version: 0.48.1
- Automated sync with self-heal enabled
- Namespace: `observability`
- References manifests from the instance repository
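
Each of these is a regular ArgoCD Application that pins a chart version and syncs it into the observability namespace. The manifest below is a simplified sketch rather than the actual template: the project, source layout, and sync options are assumptions chosen for illustration.

```yaml
# Simplified sketch of an Application such as metrics-server.yaml.
# project, source layout, and sync options are illustrative assumptions.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: metrics-server
  namespace: argocd
spec:
  project: default                      # assumption
  source:
    repoURL: https://kubernetes-sigs.github.io/metrics-server/
    chart: metrics-server
    targetRevision: 3.12.2              # pinned chart version
  destination:
    server: https://kubernetes.default.svc
    namespace: observability
  syncPolicy:
    automated:
      selfHeal: true                    # automated sync with self-heal
    syncOptions:
      - CreateNamespace=true            # assumption
```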
## Usage Examples

### Querying Resource Metrics

Access resource metrics collected by Metrics Server:

```bash
# View node resource usage
kubectl top nodes

# View pod resource usage across all namespaces
kubectl top pods -A

# View pod resource usage in a specific namespace
kubectl top pods -n observability

# Sort pods by CPU usage
kubectl top pods -A --sort-by=cpu

# Sort pods by memory usage
kubectl top pods -A --sort-by=memory
```
### Using Metrics for Autoscaling

Create a Horizontal Pod Autoscaler based on resource metrics:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
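
Once applied, the HPA reads CPU utilization through the Metrics API served by Metrics Server; the file name below is just an example:

```bash
kubectl apply -f myapp-hpa.yaml    # example file name
kubectl get hpa myapp-hpa          # shows current vs. target utilization
kubectl describe hpa myapp-hpa     # events explain scaling decisions
```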
### Accessing Application Logs

Vector automatically collects logs from all containers. View logs in your centralized Elasticsearch/Kibana:

```bash
# Logs are automatically forwarded to Elasticsearch
# Access them via the Kibana dashboard or the Elasticsearch API

# Example: Query logs via the Elasticsearch API
curl -u $VECTOR_USER:$VECTOR_PASSWORD \
  -X GET "https://elasticsearch.example.com/_search" \
  -H 'Content-Type: application/json' \
  -d '{
    "query": {
      "match": {
        "kubernetes.namespace": "my-namespace"
      }
    }
  }'
```
### Querying Victoria Metrics

Query metrics collected by Victoria Metrics:

```bash
# Metrics are forwarded to the remote Victoria Metrics instance,
# which exposes a Prometheus-compatible query API.
# Example PromQL metrics:
# - Container CPU usage: container_cpu_usage_seconds_total
# - Pod memory usage: container_memory_usage_bytes
# - Node disk I/O: node_disk_io_time_seconds_total

# Instant query via the Victoria Metrics API
curl -X POST https://victoriametrics.example.com/api/v1/query \
  -d 'query=up' \
  -d 'time=2025-12-16T00:00:00Z'
```
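
The same API also supports Prometheus-compatible range queries, which is what dashboards typically use. The host below is the same placeholder as above and the query is only an example:

```bash
# Range query: per-namespace CPU usage over the last hour (GNU date syntax)
curl -G https://victoriametrics.example.com/api/v1/query_range \
  --data-urlencode 'query=sum(rate(container_cpu_usage_seconds_total[5m])) by (namespace)' \
  --data-urlencode "start=$(date -u -d '1 hour ago' +%s)" \
  --data-urlencode "end=$(date -u +%s)" \
  --data-urlencode 'step=60s'
```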
### Creating Custom ServiceMonitors

Expose application metrics for collection:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp-metrics
  labels:
    app: myapp
spec:
  ports:
    - name: metrics
      port: 8080
      targetPort: 8080
  selector:
    app: myapp
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-monitor
  namespace: observability
spec:
  selector:
    matchLabels:
      app: myapp
  endpoints:
    - port: metrics
      path: /metrics
      interval: 30s
```
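
After the ServiceMonitor is applied, VMAgent should discover the new target automatically. One way to confirm this, assuming the default VMAgent HTTP port (8429) and looking up the generated service name rather than guessing it:

```bash
# Find the VMAgent service created by the operator (name depends on the release)
kubectl get svc -n observability | grep vmagent

# Port-forward and inspect the active scrape targets page
kubectl port-forward -n observability svc/<vmagent-service> 8429:8429
curl -s http://localhost:8429/targets | grep myapp
```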
## Integration Points
- Core Stack: Depends on ArgoCD for deployment orchestration
- OTC Stack: Requires cert-manager for certificate management
- Observability Stack: Forwards metrics and logs to centralized observability backend
- All Application Stacks: Collects metrics and logs from all platform applications
## Troubleshooting

### Metrics Server Not Responding

Problem: `kubectl top` commands fail or return no data

Solution:

1. Check Metrics Server pod status:

   ```bash
   kubectl get pods -n observability -l app.kubernetes.io/name=metrics-server
   kubectl logs -n observability -l app.kubernetes.io/name=metrics-server
   ```

2. Verify the Metrics API is serving data:

   ```bash
   kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
   ```

3. Check the ServiceMonitor configuration:

   ```bash
   kubectl get servicemonitor -n observability -o yaml
   ```
### Vector Not Forwarding Logs

Problem: Logs are not appearing in Elasticsearch

Solution:

1. Check Vector agent status:

   ```bash
   kubectl get pods -n observability -l app.kubernetes.io/name=vector
   kubectl logs -n observability -l app.kubernetes.io/name=vector --tail=50
   ```

2. Verify the authentication secret:

   ```bash
   kubectl get secret simple-user-secret -n observability
   kubectl get secret simple-user-secret -n observability -o jsonpath='{.data.username}' | base64 -d
   ```

3. Test Elasticsearch connectivity:

   ```bash
   kubectl exec -it -n observability $(kubectl get pod -n observability -l app.kubernetes.io/name=vector -o jsonpath='{.items[0].metadata.name}') -- \
     curl -u $VECTOR_USER:$VECTOR_PASSWORD https://elasticsearch.example.com/_cluster/health
   ```

4. Check Vector internal metrics:

   ```bash
   kubectl port-forward -n observability svc/vector 9090:9090
   curl http://localhost:9090/metrics
   ```
### Victoria Metrics Not Scraping

Problem: Metrics are not being collected or forwarded

Solution:

1. Check VMAgent status:

   ```bash
   kubectl get pods -n observability -l app.kubernetes.io/name=vmagent
   kubectl logs -n observability -l app.kubernetes.io/name=vmagent
   ```

2. Verify the remote write secret:

   ```bash
   kubectl get secret vm-remote-write-secret -n observability
   kubectl get secret vm-remote-write-secret -n observability -o jsonpath='{.data.username}' | base64 -d
   ```

3. Check ServiceMonitor targets:

   ```bash
   kubectl get servicemonitor -n observability
   kubectl describe servicemonitor metrics-server -n observability
   ```

4. Verify the operator is running:

   ```bash
   kubectl get pods -n observability -l app.kubernetes.io/name=victoria-metrics-operator
   kubectl logs -n observability -l app.kubernetes.io/name=victoria-metrics-operator
   ```
### High Memory Usage

Problem: Victoria Metrics or Vector consuming excessive memory

Solution:

1. For Victoria Metrics, verify `dropOriginalLabels` is enabled:

   ```bash
   kubectl get vmagent -n observability -o yaml | grep dropOriginalLabels
   ```

2. Increase scrape intervals (scrape less often) for high-cardinality metrics:

   ```yaml
   # Edit the ServiceMonitor spec:
   endpoints:
     - interval: 60s  # Increase from 30s
   ```

3. Filter unnecessary logs in Vector:

   ```yaml
   # Add a filter transform to the Vector configuration
   transforms:
     filter:
       type: filter
       inputs: ["k8s"]  # the log source defined in the values
       condition: '.kubernetes.namespace != "kube-system"'
   ```

4. Check resource limits:

   ```bash
   kubectl describe pod -n observability -l app.kubernetes.io/name=vmagent
   kubectl describe pod -n observability -l app.kubernetes.io/name=vector
   ```
### Certificate Issues

Problem: TLS certificate errors in logs

Solution:

1. Verify cert-manager is running:

   ```bash
   kubectl get pods -n cert-manager
   ```

2. Check certificate status:

   ```bash
   kubectl get certificate -n observability
   kubectl describe certificate -n observability
   ```

3. Review webhook configuration:

   ```bash
   kubectl get validatingwebhookconfigurations | grep victoria-metrics
   kubectl get mutatingwebhookconfigurations | grep victoria-metrics
   ```

4. Restart the operator if needed:

   ```bash
   kubectl rollout restart deployment victoria-metrics-operator -n observability
   ```