diff --git a/content/en/docs/components/orchestration/stacks/observability-client.md b/content/en/docs/components/orchestration/stacks/observability-client.md index 09158c6..89382c2 100644 --- a/content/en/docs/components/orchestration/stacks/observability-client.md +++ b/content/en/docs/components/orchestration/stacks/observability-client.md @@ -1,127 +1,500 @@ --- -title: "Observability-Client" -linkTitle: "Observability-Client" +title: "Observability Client" +linkTitle: "Observability Client" weight: 60 -description: Observability-Client +description: > + Core observability components for metrics collection, log aggregation, and monitoring --- -{{% alert title="Draft" color="warning" %}} -**Editorial Status**: This page is currently being developed. - -* **Jira Ticket**: [TBD] -* **Assignee**: [Name or Team] -* **Status**: Draft -* **Last Updated**: YYYY-MM-DD -* **TODO**: - * [ ] Add detailed component description - * [ ] Include usage examples and code samples - * [ ] Add architecture diagrams - * [ ] Review and finalize content -{{% /alert %}} - ## Overview -[Detailed description of the component - what it is, what it does, and why it exists] +The Observability Client stack provides essential monitoring and observability infrastructure for Kubernetes environments. As part of the Edge Developer Platform, it deploys client-side components that collect, process, and forward metrics and logs to centralized observability systems. + +The stack integrates three core components: Kubernetes Metrics Server for resource metrics, Vector for log collection and forwarding, and Victoria Metrics for comprehensive metrics monitoring and alerting. 
## Key Features -* [Feature 1] -* [Feature 2] -* [Feature 3] - -## Purpose in EDP - -[Explain the role this component plays in the Edge Developer Platform and how it contributes to the overall platform capabilities] +* **Resource Metrics**: Real-time CPU and memory metrics via Kubernetes Metrics Server +* **Log Aggregation**: Unified log collection and forwarding with Vector +* **Metrics Monitoring**: Comprehensive metrics collection, storage, and alerting with Victoria Metrics +* **Prometheus Compatibility**: Full Prometheus protocol support for metrics scraping +* **Multi-Tenant Support**: Configurable tenant isolation for metrics and logs +* **Automated Alerting**: Pre-configured alert rules with Alertmanager integration +* **Grafana Integration**: Built-in dashboard provisioning and datasource configuration ## Repository -**Code**: [Link to source code repository] +**Code**: [Observability Client Stack Templates](https://edp.buildth.ing/DevFW-CICD/stacks/src/branch/main/template/stacks/observability-client) -**Documentation**: [Link to component-specific documentation] +**Documentation**: +* [Kubernetes Metrics Server](https://github.com/kubernetes-sigs/metrics-server) +* [Vector Documentation](https://vector.dev/docs/) +* [Victoria Metrics Documentation](https://docs.victoriametrics.com/) ## Getting Started ### Prerequisites -* [Prerequisite 1] -* [Prerequisite 2] +* Kubernetes cluster with ArgoCD installed (provided by `core` stack) +* cert-manager for certificate management (provided by `otc` stack) +* Observability backend services for receiving metrics and logs ### Quick Start -[Step-by-step guide to get started with this component] +The Observability Client stack is deployed as part of the EDP installation process: -1. [Step 1] -2. [Step 2] -3. [Step 3] +1. 
**Trigger Deploy Pipeline** + - Go to [Infra Deploy Pipeline](https://edp.buildth.ing/DevFW/infra-deploy/actions?workflow=deploy.yaml) + - Click on Run workflow + - Enter a name in "Select environment directory to deploy". This must be DNS Compatible. + - Execute workflow + +2. **ArgoCD Synchronization** + ArgoCD automatically deploys: + - Metrics Server (Helm chart v3.12.2) + - Vector agent (Helm chart v0.43.0) + - Victoria Metrics k8s-stack (Helm chart v0.48.1) + - ServiceMonitor resources for Prometheus scraping + - Authentication secrets for remote write endpoints ### Verification -[How to verify the component is working correctly] - -## Usage Examples - -### [Use Case 1] - -[Example with code/commands showing common use case] +Verify the Observability Client deployment: ```bash -# Example commands +# Check ArgoCD application status +kubectl get application -n argocd | grep -E "metrics-server|vector|vm-client" + +# Verify Metrics Server is running +kubectl get pods -n observability -l app.kubernetes.io/name=metrics-server + +# Test metrics API +kubectl top nodes +kubectl top pods -A + +# Verify Vector pods are running +kubectl get pods -n observability -l app.kubernetes.io/name=vector + +# Check Victoria Metrics components +kubectl get pods -n observability -l app.kubernetes.io/name=victoria-metrics-k8s-stack + +# Verify ServiceMonitor resources +kubectl get servicemonitor -n observability ``` -### [Use Case 2] - -[Another common scenario] - -## Integration Points - -* **[Component A]**: [How it integrates] -* **[Component B]**: [How it integrates] -* **[Component C]**: [How it integrates] - ## Architecture -[Optional: Add architectural diagrams and descriptions] +### Component Architecture -### Component Architecture (C4) +The Observability Client stack consists of three integrated components: -[Add C4 Container or Component diagrams showing the internal structure] +**Metrics Server**: +- Collects resource metrics (CPU, memory) from kubelet +- Provides Metrics 
API for kubectl top and HPA +- Lightweight aggregator for cluster-wide resource usage +- Exposes ServiceMonitor for Prometheus scraping -### Sequence Diagrams +**Vector Agent**: +- DaemonSet deployment for log collection across all nodes +- Processes and transforms Kubernetes logs +- Forwards logs to centralized Elasticsearch backend +- Injects cluster metadata and environment information +- Supports compression and bulk operations -[Add sequence diagrams showing key interaction flows with other components] +**Victoria Metrics Stack**: +- VMAgent: Scrapes metrics from Kubernetes components and applications +- VMAlertmanager: Manages alert routing and notifications +- VMOperator: Manages VictoriaMetrics CRDs and lifecycle +- Integration with remote Victoria Metrics storage +- Supports multi-tenant metrics isolation -### Deployment Architecture +### Data Flow -[Add infrastructure and deployment diagrams showing how the component is deployed] +``` +Kubernetes Resources → Metrics Server → Metrics API + ↓ + ServiceMonitor → VMAgent → Remote VictoriaMetrics + +Application Logs → Vector Agent → Transform → Remote Elasticsearch + +Prometheus Exporters → VMAgent → Remote VictoriaMetrics → VMAlertmanager +``` ## Configuration -[Key configuration options and how to set them] +### Metrics Server Configuration + +Configured in `stacks/observability-client/metrics-server/values.yaml`: + +```yaml +metrics: + enabled: true +serviceMonitor: + enabled: true +``` + +**Key Settings**: +- Enables metrics collection endpoint +- Exposes ServiceMonitor for Prometheus-compatible scraping +- Deployed via Helm chart from `https://kubernetes-sigs.github.io/metrics-server/` + +### Vector Configuration + +Configured in `stacks/observability-client/vector/values.yaml`: + +**Role**: Agent (DaemonSet deployment across nodes) + +**Authentication**: +Credentials sourced from `simple-user-secret`: +- `VECTOR_USER`: Username for remote write authentication +- `VECTOR_PASSWORD`: Password for remote 
write authentication + +**Data Sources**: +- `k8s`: Collects Kubernetes container logs +- `internal_metrics`: Gathers Vector internal metrics + +**Log Processing**: +```yaml +transforms: + parser: + - Parse JSON from log messages + - Inject cluster environment metadata + - Remove original message field +``` + +**Output Sink**: +- Elasticsearch bulk API (v8) +- Basic authentication with environment variables +- Gzip compression enabled +- Custom headers: AccountID and ProjectID + +### Victoria Metrics Stack Configuration + +Configured in `stacks/observability-client/vm-client-stack/values.yaml`: + +**Operator Settings**: +- Enabled with admission webhooks +- Managed by cert-manager for ArgoCD compatibility + +**VMAgent Configuration**: +- Basic authentication for remote write +- Credentials from `vm-remote-write-secret` +- Stream parsing enabled +- Drop original labels to reduce memory footprint + +**Monitoring Targets**: +- Node exporter for hardware metrics +- kube-state-metrics for Kubernetes object states +- Kubelet metrics (cadvisor) +- Kubernetes control plane components (API server, etcd, scheduler, controller manager) +- CoreDNS metrics + +**Alertmanager Integration**: +- Slack notification templates +- Configurable routing rules +- TLS support for secure communication + +**Storage Options**: +- VMSingle: Single-node deployment +- VMCluster: Distributed deployment with replication +- Configurable retention period + +## ArgoCD Application Configuration + +**Metrics Server Application** (`template/stacks/observability-client/metrics-server.yaml`): +- Name: `metrics-server` +- Chart version: 3.12.2 +- Automated sync with self-heal enabled +- Namespace: `observability` + +**Vector Application** (`template/stacks/observability-client/vector.yaml`): +- Name: `vector` +- Chart version: 0.43.0 +- Automated sync with self-heal enabled +- Namespace: `observability` + +**Victoria Metrics Application** (`template/stacks/observability-client/vm-client-stack.yaml`): +- 
Name: `vm-client` +- Chart version: 0.48.1 +- Automated sync with self-heal enabled +- Namespace: `observability` +- References manifests from instance repository + +## Usage Examples + +### Querying Resource Metrics + +Access resource metrics collected by Metrics Server: + +```bash +# View node resource usage +kubectl top nodes + +# View pod resource usage across all namespaces +kubectl top pods -A + +# View pod resource usage in specific namespace +kubectl top pods -n observability + +# Sort pods by CPU usage +kubectl top pods -A --sort-by=cpu + +# Sort pods by memory usage +kubectl top pods -A --sort-by=memory +``` + +### Using Metrics for Autoscaling + +Create Horizontal Pod Autoscaler based on metrics: + +```yaml +apiVersion: autoscaling/v2 +kind: HorizontalPodAutoscaler +metadata: + name: myapp-hpa +spec: + scaleTargetRef: + apiVersion: apps/v1 + kind: Deployment + name: myapp + minReplicas: 2 + maxReplicas: 10 + metrics: + - type: Resource + resource: + name: cpu + target: + type: Utilization + averageUtilization: 70 +``` + +### Accessing Application Logs + +Vector automatically collects logs from all containers. 
View logs in your centralized Elasticsearch/Kibana: + +```bash +# Logs are automatically forwarded to Elasticsearch +# Access via Kibana dashboard or Elasticsearch API + +# Example: Query logs via Elasticsearch API +curl -u $VECTOR_USER:$VECTOR_PASSWORD \ + -X GET "https://elasticsearch.example.com/_search" \ + -H 'Content-Type: application/json' \ + -d '{ + "query": { + "match": { + "kubernetes.namespace": "my-namespace" + } + } + }' +``` + +### Querying Victoria Metrics + +Query metrics collected by Victoria Metrics: + +```bash +# Access Victoria Metrics query API +# Metrics are forwarded to remote Victoria Metrics instance + +# Example PromQL queries: +# - Container CPU usage: container_cpu_usage_seconds_total +# - Pod memory usage: container_memory_usage_bytes +# - Node disk I/O: node_disk_io_time_seconds_total + +# Query via Victoria Metrics API +curl -X POST https://victoriametrics.example.com/api/v1/query \ + -d 'query=up' \ + -d 'time=2025-12-16T00:00:00Z' +``` + +### Creating Custom ServiceMonitors + +Expose application metrics for collection: + +```yaml +apiVersion: v1 +kind: Service +metadata: + name: myapp-metrics + labels: + app: myapp +spec: + ports: + - name: metrics + port: 8080 + targetPort: 8080 + selector: + app: myapp +--- +apiVersion: monitoring.coreos.com/v1 +kind: ServiceMonitor +metadata: + name: myapp-monitor + namespace: observability +spec: + selector: + matchLabels: + app: myapp + endpoints: + - port: metrics + path: /metrics + interval: 30s +``` + +## Integration Points + +* **Core Stack**: Depends on ArgoCD for deployment orchestration +* **OTC Stack**: Requires cert-manager for certificate management +* **Observability Stack**: Forwards metrics and logs to centralized observability backend +* **All Application Stacks**: Collects metrics and logs from all platform applications ## Troubleshooting -### [Common Issue 1] +### Metrics Server Not Responding -**Problem**: [Description] +**Problem**: `kubectl top` commands fail or return no 
data -**Solution**: [How to fix] +**Solution**: +1. Check Metrics Server pod status: + ```bash + kubectl get pods -n observability -l app.kubernetes.io/name=metrics-server + kubectl logs -n observability -l app.kubernetes.io/name=metrics-server + ``` -### [Common Issue 2] +2. Verify that the Metrics API responds: + ```bash + kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes + ``` -**Problem**: [Description] +3. Check ServiceMonitor configuration: + ```bash + kubectl get servicemonitor -n observability -o yaml + ``` -**Solution**: [How to fix] +### Vector Not Forwarding Logs -## Status +**Problem**: Logs are not appearing in Elasticsearch -**Maturity**: [Production / Beta / Experimental] +**Solution**: +1. Check Vector agent status: + ```bash + kubectl get pods -n observability -l app.kubernetes.io/name=vector + kubectl logs -n observability -l app.kubernetes.io/name=vector --tail=50 + ``` + +2. Verify authentication secret: + ```bash + kubectl get secret simple-user-secret -n observability + kubectl get secret simple-user-secret -n observability -o jsonpath='{.data.username}' | base64 -d + ``` + +3. Test Elasticsearch connectivity: + ```bash + # Run curl through sh inside the pod so the variables expand there, not in the local shell + # (assumes the Vector image ships sh and curl) + kubectl exec -it -n observability $(kubectl get pod -n observability -l app.kubernetes.io/name=vector -o jsonpath='{.items[0].metadata.name}') -- \ + sh -c 'curl -u "$VECTOR_USER:$VECTOR_PASSWORD" https://elasticsearch.example.com/_cluster/health' + ``` + +4. Check Vector internal metrics: + ```bash + kubectl port-forward -n observability svc/vector 9090:9090 + curl http://localhost:9090/metrics + ``` + +### Victoria Metrics Not Scraping + +**Problem**: Metrics are not being collected or forwarded + +**Solution**: +1. Check VMAgent status: + ```bash + kubectl get pods -n observability -l app.kubernetes.io/name=vmagent + kubectl logs -n observability -l app.kubernetes.io/name=vmagent + ``` + +2. 
Verify remote write secret: + ```bash + kubectl get secret vm-remote-write-secret -n observability + kubectl get secret vm-remote-write-secret -n observability -o jsonpath='{.data.username}' | base64 -d + ``` + +3. Check ServiceMonitor targets: + ```bash + kubectl get servicemonitor -n observability + kubectl describe servicemonitor metrics-server -n observability + ``` + +4. Verify operator is running: + ```bash + kubectl get pods -n observability -l app.kubernetes.io/name=victoria-metrics-operator + kubectl logs -n observability -l app.kubernetes.io/name=victoria-metrics-operator + ``` + +### High Memory Usage + +**Problem**: Victoria Metrics or Vector consuming excessive memory + +**Solution**: +1. For Victoria Metrics, verify `dropOriginalLabels` is enabled: + ```bash + kubectl get vmagent -n observability -o yaml | grep dropOriginalLabels + ``` + +2. Lower the scrape frequency for high-cardinality metrics by increasing the scrape interval: + ```yaml + # Edit ServiceMonitor + spec: + endpoints: + - interval: 60s # Increase from 30s + ``` + +3. Filter unnecessary logs in Vector: + ```yaml + # Add filter transform to Vector configuration (also set inputs to the upstream component) + transforms: + filter: + type: filter + condition: '.kubernetes.namespace != "kube-system"' + ``` + +4. Check resource limits: + ```bash + kubectl describe pod -n observability -l app.kubernetes.io/name=vmagent + kubectl describe pod -n observability -l app.kubernetes.io/name=vector + ``` + +### Certificate Issues + +**Problem**: TLS certificate errors in logs + +**Solution**: +1. Verify cert-manager is running: + ```bash + kubectl get pods -n cert-manager + ``` + +2. Check certificate status: + ```bash + kubectl get certificate -n observability + kubectl describe certificate -n observability + ``` + +3. Review webhook configuration: + ```bash + kubectl get validatingwebhookconfigurations | grep victoria-metrics + kubectl get mutatingwebhookconfigurations | grep victoria-metrics + ``` + +4. 
Restart operator if needed: + ```bash + kubectl rollout restart deployment victoria-metrics-operator -n observability + ``` ## Additional Resources -* [Link to external documentation] -* [Link to community resources] -* [Link to related components] - -## Documentation Notes - -[Instructions for team members filling in this documentation - remove this section once complete] +* [Kubernetes Metrics Server Documentation](https://github.com/kubernetes-sigs/metrics-server) +* [Vector Documentation](https://vector.dev/docs/) +* [Victoria Metrics Documentation](https://docs.victoriametrics.com/) +* [Victoria Metrics Operator](https://docs.victoriametrics.com/operator/) +* [Prometheus Operator API](https://prometheus-operator.dev/docs/operator/api/) +* [ArgoCD Documentation](https://argo-cd.readthedocs.io/)