added obs-client stack docs

This commit is contained in:
Manuel Ganter 2025-12-16 11:03:37 +01:00
parent eb1aaec0bc
commit babd8df7b5

---
title: "Observability Client"
linkTitle: "Observability Client"
weight: 60
description: >
  Core observability components for metrics collection, log aggregation, and monitoring
---
{{% alert title="Draft" color="warning" %}}
**Editorial Status**: This page is currently being developed.
* **Jira Ticket**: [TBD]
* **Assignee**: [Name or Team]
* **Status**: Draft
* **Last Updated**: YYYY-MM-DD
* **TODO**:
  * [ ] Add detailed component description
  * [ ] Include usage examples and code samples
  * [ ] Add architecture diagrams
  * [ ] Review and finalize content
{{% /alert %}}
## Overview
The Observability Client stack provides essential monitoring and observability infrastructure for Kubernetes environments. As part of the Edge Developer Platform, it deploys client-side components that collect, process, and forward metrics and logs to centralized observability systems.
The stack integrates three core components: Kubernetes Metrics Server for resource metrics, Vector for log collection and forwarding, and Victoria Metrics for comprehensive metrics monitoring and alerting.
## Key Features
* **Resource Metrics**: Real-time CPU and memory metrics via Kubernetes Metrics Server
* **Log Aggregation**: Unified log collection and forwarding with Vector
* **Metrics Monitoring**: Comprehensive metrics collection, storage, and alerting with Victoria Metrics
* **Prometheus Compatibility**: Full Prometheus protocol support for metrics scraping
* **Multi-Tenant Support**: Configurable tenant isolation for metrics and logs
* **Automated Alerting**: Pre-configured alert rules with Alertmanager integration
* **Grafana Integration**: Built-in dashboard provisioning and datasource configuration
## Repository
**Code**: [Observability Client Stack Templates](https://edp.buildth.ing/DevFW-CICD/stacks/src/branch/main/template/stacks/observability-client)
**Documentation**:
* [Kubernetes Metrics Server](https://github.com/kubernetes-sigs/metrics-server)
* [Vector Documentation](https://vector.dev/docs/)
* [Victoria Metrics Documentation](https://docs.victoriametrics.com/)
## Getting Started
### Prerequisites
* Kubernetes cluster with ArgoCD installed (provided by `core` stack)
* cert-manager for certificate management (provided by `otc` stack)
* Observability backend services for receiving metrics and logs
### Quick Start
The Observability Client stack is deployed as part of the EDP installation process:
1. **Trigger Deploy Pipeline**
- Go to [Infra Deploy Pipeline](https://edp.buildth.ing/DevFW/infra-deploy/actions?workflow=deploy.yaml)
- Click on Run workflow
- Enter a name in "Select environment directory to deploy". The name must be DNS-compatible.
- Execute workflow
2. **ArgoCD Synchronization**
ArgoCD automatically deploys:
- Metrics Server (Helm chart v3.12.2)
- Vector agent (Helm chart v0.43.0)
- Victoria Metrics k8s-stack (Helm chart v0.48.1)
- ServiceMonitor resources for Prometheus scraping
- Authentication secrets for remote write endpoints
### Verification
Verify the Observability Client deployment:
```bash
# Check ArgoCD application status
kubectl get application -n argocd | grep -E "metrics-server|vector|vm-client"
# Verify Metrics Server is running
kubectl get pods -n observability -l app.kubernetes.io/name=metrics-server
# Test metrics API
kubectl top nodes
kubectl top pods -A
# Verify Vector pods are running
kubectl get pods -n observability -l app.kubernetes.io/name=vector
# Check Victoria Metrics components
kubectl get pods -n observability -l app.kubernetes.io/name=victoria-metrics-k8s-stack
# Verify ServiceMonitor resources
kubectl get servicemonitor -n observability
```
## Architecture
### Component Architecture
The Observability Client stack consists of three integrated components:
**Metrics Server**:
- Collects resource metrics (CPU, memory) from kubelet
- Provides Metrics API for kubectl top and HPA
- Lightweight aggregator for cluster-wide resource usage
- Exposes ServiceMonitor for Prometheus scraping
**Vector Agent**:
- DaemonSet deployment for log collection across all nodes
- Processes and transforms Kubernetes logs
- Forwards logs to centralized Elasticsearch backend
- Injects cluster metadata and environment information
- Supports compression and bulk operations
**Victoria Metrics Stack**:
- VMAgent: Scrapes metrics from Kubernetes components and applications
- VMAlertmanager: Manages alert routing and notifications
- VMOperator: Manages VictoriaMetrics CRDs and lifecycle
- Integration with remote Victoria Metrics storage
- Supports multi-tenant metrics isolation
### Data Flow
```
Kubernetes Resources → Metrics Server → Metrics API
ServiceMonitor → VMAgent → Remote VictoriaMetrics
Application Logs → Vector Agent → Transform → Remote Elasticsearch
Prometheus Exporters → VMAgent → Remote VictoriaMetrics → VMAlertmanager
```
## Configuration
### Metrics Server Configuration
Configured in `stacks/observability-client/metrics-server/values.yaml`:
```yaml
metrics:
  enabled: true
serviceMonitor:
  enabled: true
```
**Key Settings**:
- Enables metrics collection endpoint
- Exposes ServiceMonitor for Prometheus-compatible scraping
- Deployed via Helm chart from `https://kubernetes-sigs.github.io/metrics-server/`
### Vector Configuration
Configured in `stacks/observability-client/vector/values.yaml`:
**Role**: Agent (DaemonSet deployment across nodes)
**Authentication**:
Credentials sourced from `simple-user-secret`:
- `VECTOR_USER`: Username for remote write authentication
- `VECTOR_PASSWORD`: Password for remote write authentication
**Data Sources**:
- `k8s`: Collects Kubernetes container logs
- `internal_metrics`: Gathers Vector internal metrics
**Log Processing**:
The `parser` transform:
- Parses JSON from log messages
- Injects cluster environment metadata
- Removes the original message field
**Output Sink**:
- Elasticsearch bulk API (v8)
- Basic authentication with environment variables
- Gzip compression enabled
- Custom headers: AccountID and ProjectID
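
The snippet below is a minimal sketch of how these settings might be expressed in the chart's `values.yaml` (`role`, `env`, and `customConfig` are standard keys of the Vector Helm chart). The Elasticsearch endpoint, the secret key names, the header values, and the VRL transform body are illustrative assumptions, not the exact configuration shipped with the stack.

```yaml
# Sketch only: values below are placeholders, not the stack's actual configuration.
role: Agent                     # DaemonSet deployment across nodes

env:                            # credentials sourced from simple-user-secret
  - name: VECTOR_USER
    valueFrom:
      secretKeyRef:
        name: simple-user-secret
        key: username           # assumed key name
  - name: VECTOR_PASSWORD
    valueFrom:
      secretKeyRef:
        name: simple-user-secret
        key: password           # assumed key name

customConfig:
  sources:
    k8s:
      type: kubernetes_logs     # container logs from all nodes
    internal_metrics:
      type: internal_metrics    # Vector's own metrics
  transforms:
    parser:
      type: remap
      inputs: [k8s]
      source: |
        # Parse JSON payloads, inject cluster metadata, drop the raw message
        parsed, err = parse_json(string!(.message))
        if err == null {
          . = merge!(., parsed)
          del(.message)
        }
        .cluster_environment = "example"   # illustrative metadata value
  sinks:
    elasticsearch_out:
      type: elasticsearch
      inputs: [parser]
      api_version: v8
      endpoints: ["https://elasticsearch.example.com"]   # illustrative endpoint
      compression: gzip
      auth:
        strategy: basic
        user: "${VECTOR_USER}"
        password: "${VECTOR_PASSWORD}"
      request:
        headers:
          AccountID: "example-account"     # illustrative header values
          ProjectID: "example-project"
```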
### Victoria Metrics Stack Configuration
Configured in `stacks/observability-client/vm-client-stack/values.yaml`:
**Operator Settings**:
- Enabled with admission webhooks
- Managed by cert-manager for ArgoCD compatibility
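
A short sketch of how these operator settings typically appear in the chart values follows; the key names reflect the upstream victoria-metrics-k8s-stack chart and may differ slightly from the stack's actual file.

```yaml
# Sketch: enable the operator with cert-manager-managed admission webhooks.
victoria-metrics-operator:
  enabled: true
  admissionWebhooks:
    enabled: true
    certManager:
      enabled: true   # let cert-manager issue the webhook certificates (ArgoCD-friendly)
```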
**VMAgent Configuration**:
- Basic authentication for remote write
- Credentials from `vm-remote-write-secret`
- Stream parsing enabled
- Drop original labels to reduce memory footprint
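
A minimal sketch of what this might look like in the values file is shown below; the remote write URL is a placeholder, and while the `username` key matches the `vm-remote-write-secret` referenced elsewhere on this page, the `password` key name is an assumption.

```yaml
# Sketch: VMAgent remote write with basic auth and memory-saving scrape flags.
vmagent:
  spec:
    remoteWrite:
      - url: "https://victoriametrics.example.com/api/v1/write"   # placeholder endpoint
        basicAuth:
          username:
            name: vm-remote-write-secret
            key: username
          password:
            name: vm-remote-write-secret
            key: password          # assumed key name
    extraArgs:
      promscrape.streamParse: "true"         # stream parsing enabled
      promscrape.dropOriginalLabels: "true"  # drop original labels to reduce memory
```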
**Monitoring Targets**:
- Node exporter for hardware metrics
- kube-state-metrics for Kubernetes object states
- Kubelet metrics (cadvisor)
- Kubernetes control plane components (API server, etcd, scheduler, controller manager)
- CoreDNS metrics
**Alertmanager Integration**:
- Slack notification templates
- Configurable routing rules
- TLS support for secure communication
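
Alertmanager routing follows the standard Alertmanager configuration format; the sketch below shows a plausible Slack receiver, with the webhook URL, channel, and routing labels as illustrative placeholders rather than the stack's actual values.

```yaml
# Sketch: route warning/critical alerts to a Slack channel.
alertmanager:
  config:
    route:
      receiver: "slack-notifications"
      group_by: ["alertgroup", "job"]
      routes:
        - receiver: "slack-notifications"
          matchers:
            - severity =~ "warning|critical"
    receivers:
      - name: "slack-notifications"
        slack_configs:
          - api_url: "https://hooks.slack.com/services/placeholder"   # placeholder webhook
            channel: "#alerts"
            send_resolved: true
```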
**Storage Options**:
- VMSingle: Single-node deployment
- VMCluster: Distributed deployment with replication
- Configurable retention period
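
For example, a single-node deployment with a custom retention window could be selected roughly as follows; the key names follow the upstream chart, and the retention value is illustrative.

```yaml
# Sketch: single-node storage with a 30-day retention period.
vmsingle:
  enabled: true
  spec:
    retentionPeriod: "30d"   # illustrative retention window
vmcluster:
  enabled: false             # switch to the clustered topology when replication is needed
```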
## ArgoCD Application Configuration
**Metrics Server Application** (`template/stacks/observability-client/metrics-server.yaml`):
- Name: `metrics-server`
- Chart version: 3.12.2
- Automated sync with self-heal enabled
- Namespace: `observability`
**Vector Application** (`template/stacks/observability-client/vector.yaml`):
- Name: `vector`
- Chart version: 0.43.0
- Automated sync with self-heal enabled
- Namespace: `observability`
**Victoria Metrics Application** (`template/stacks/observability-client/vm-client-stack.yaml`):
- Name: `vm-client`
- Chart version: 0.48.1
- Automated sync with self-heal enabled
- Namespace: `observability`
- References manifests from instance repository
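
As an illustration, an ArgoCD Application for the Metrics Server matching the settings above might look roughly like this; the chart repository, version, and namespace come from the sections above, while the `project`, destination server, and sync options are assumptions.

```yaml
# Sketch: ArgoCD Application deploying the Metrics Server Helm chart.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: metrics-server
  namespace: argocd
spec:
  project: default                    # assumed project
  source:
    repoURL: https://kubernetes-sigs.github.io/metrics-server/
    chart: metrics-server
    targetRevision: 3.12.2
  destination:
    server: https://kubernetes.default.svc
    namespace: observability
  syncPolicy:
    automated:
      selfHeal: true
    syncOptions:
      - CreateNamespace=true          # assumed option
```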
## Usage Examples
### Querying Resource Metrics
Access resource metrics collected by Metrics Server:
```bash
# View node resource usage
kubectl top nodes
# View pod resource usage across all namespaces
kubectl top pods -A
# View pod resource usage in specific namespace
kubectl top pods -n observability
# Sort pods by CPU usage
kubectl top pods -A --sort-by=cpu
# Sort pods by memory usage
kubectl top pods -A --sort-by=memory
```
### Using Metrics for Autoscaling
Create a Horizontal Pod Autoscaler based on resource metrics:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
### Accessing Application Logs
Vector automatically collects logs from all containers. View logs in your centralized Elasticsearch/Kibana:
```bash
# Logs are automatically forwarded to Elasticsearch
# Access via Kibana dashboard or Elasticsearch API
# Example: Query logs via Elasticsearch API
curl -u $VECTOR_USER:$VECTOR_PASSWORD \
  -X GET "https://elasticsearch.example.com/_search" \
  -H 'Content-Type: application/json' \
  -d '{
    "query": {
      "match": {
        "kubernetes.namespace": "my-namespace"
      }
    }
  }'
```
### Querying Victoria Metrics
Query metrics collected by Victoria Metrics:
```bash
# Access Victoria Metrics query API
# Metrics are forwarded to remote Victoria Metrics instance
# Example PromQL queries:
# - Container CPU usage: container_cpu_usage_seconds_total
# - Pod memory usage: container_memory_usage_bytes
# - Node disk I/O: node_disk_io_time_seconds_total
# Query via Victoria Metrics API
curl -X POST https://victoriametrics.example.com/api/v1/query \
-d 'query=up' \
-d 'time=2025-12-16T00:00:00Z'
```
### Creating Custom ServiceMonitors
Expose application metrics for collection:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp-metrics
  labels:
    app: myapp
spec:
  ports:
    - name: metrics
      port: 8080
      targetPort: 8080
  selector:
    app: myapp
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-monitor
  namespace: observability
spec:
  selector:
    matchLabels:
      app: myapp
  endpoints:
    - port: metrics
      path: /metrics
      interval: 30s
```
## Integration Points
* **Core Stack**: Depends on ArgoCD for deployment orchestration
* **OTC Stack**: Requires cert-manager for certificate management
* **Observability Stack**: Forwards metrics and logs to centralized observability backend
* **All Application Stacks**: Collects metrics and logs from all platform applications
## Troubleshooting
### Metrics Server Not Responding
**Problem**: `kubectl top` commands fail or return no data
**Solution**:
1. Check Metrics Server pod status:
```bash
kubectl get pods -n observability -l app.kubernetes.io/name=metrics-server
kubectl logs -n observability -l app.kubernetes.io/name=metrics-server
```
2. Verify kubelet metrics endpoint:
```bash
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
```
3. Check ServiceMonitor configuration:
```bash
kubectl get servicemonitor -n observability -o yaml
```
### Vector Not Forwarding Logs
**Problem**: Logs are not appearing in Elasticsearch
**Solution**:
1. Check Vector agent status:
```bash
kubectl get pods -n observability -l app.kubernetes.io/name=vector
kubectl logs -n observability -l app.kubernetes.io/name=vector --tail=50
```
2. Verify authentication secret:
```bash
kubectl get secret simple-user-secret -n observability
kubectl get secret simple-user-secret -n observability -o jsonpath='{.data.username}' | base64 -d
```
3. Test Elasticsearch connectivity:
```bash
kubectl exec -it -n observability $(kubectl get pod -n observability -l app.kubernetes.io/name=vector -o jsonpath='{.items[0].metadata.name}') -- \
curl -u $VECTOR_USER:$VECTOR_PASSWORD https://elasticsearch.example.com/_cluster/health
```
4. Check Vector internal metrics:
```bash
kubectl port-forward -n observability svc/vector 9090:9090
curl http://localhost:9090/metrics
```
### Victoria Metrics Not Scraping
**Problem**: Metrics are not being collected or forwarded
**Solution**:
1. Check VMAgent status:
```bash
kubectl get pods -n observability -l app.kubernetes.io/name=vmagent
kubectl logs -n observability -l app.kubernetes.io/name=vmagent
```
2. Verify remote write secret:
```bash
kubectl get secret vm-remote-write-secret -n observability
kubectl get secret vm-remote-write-secret -n observability -o jsonpath='{.data.username}' | base64 -d
```
3. Check ServiceMonitor targets:
```bash
kubectl get servicemonitor -n observability
kubectl describe servicemonitor metrics-server -n observability
```
4. Verify operator is running:
```bash
kubectl get pods -n observability -l app.kubernetes.io/name=victoria-metrics-operator
kubectl logs -n observability -l app.kubernetes.io/name=victoria-metrics-operator
```
### High Memory Usage
**Problem**: Victoria Metrics or Vector consuming excessive memory
**Solution**:
1. For Victoria Metrics, verify `dropOriginalLabels` is enabled:
```bash
kubectl get vmagent -n observability -o yaml | grep dropOriginalLabels
```
2. Reduce scrape intervals for high-cardinality metrics:
```yaml
# Edit ServiceMonitor
spec:
  endpoints:
    - interval: 60s  # Increase from 30s
```
3. Filter unnecessary logs in Vector:
```yaml
# Add filter transform to Vector configuration
transforms:
  filter:
    type: filter
    condition: '.kubernetes.namespace != "kube-system"'
```
4. Check resource limits:
```bash
kubectl describe pod -n observability -l app.kubernetes.io/name=vmagent
kubectl describe pod -n observability -l app.kubernetes.io/name=vector
```
### Certificate Issues
**Problem**: TLS certificate errors in logs
**Solution**:
1. Verify cert-manager is running:
```bash
kubectl get pods -n cert-manager
```
2. Check certificate status:
```bash
kubectl get certificate -n observability
kubectl describe certificate -n observability
```
3. Review webhook configuration:
```bash
kubectl get validatingwebhookconfigurations | grep victoria-metrics
kubectl get mutatingwebhookconfigurations | grep victoria-metrics
```
4. Restart operator if needed:
```bash
kubectl rollout restart deployment victoria-metrics-operator -n observability
```
## Additional Resources
* [Kubernetes Metrics Server Documentation](https://github.com/kubernetes-sigs/metrics-server)
* [Vector Documentation](https://vector.dev/docs/)
* [Victoria Metrics Documentation](https://docs.victoriametrics.com/)
* [Victoria Metrics Operator](https://docs.victoriametrics.com/operator/)
* [Prometheus Operator API](https://prometheus-operator.dev/docs/operator/api/)
* [ArgoCD Documentation](https://argo-cd.readthedocs.io/)