added obs-client stack docs
parent
eb1aaec0bc
commit
babd8df7b5
1 changed files with 447 additions and 74 deletions
|
|
@ -1,127 +1,500 @@
|
|||
---
|
||||
title: "Observability-Client"
|
||||
linkTitle: "Observability-Client"
|
||||
title: "Observability Client"
|
||||
linkTitle: "Observability Client"
|
||||
weight: 60
|
||||
description: Observability-Client
|
||||
description: >
|
||||
Core observability components for metrics collection, log aggregation, and monitoring
|
||||
---
|
||||
|
||||
{{% alert title="Draft" color="warning" %}}
|
||||
**Editorial Status**: This page is currently being developed.
|
||||
|
||||
* **Jira Ticket**: [TBD]
|
||||
* **Assignee**: [Name or Team]
|
||||
* **Status**: Draft
|
||||
* **Last Updated**: YYYY-MM-DD
|
||||
* **TODO**:
|
||||
* [ ] Add detailed component description
|
||||
* [ ] Include usage examples and code samples
|
||||
* [ ] Add architecture diagrams
|
||||
* [ ] Review and finalize content
|
||||
{{% /alert %}}
|
||||
|
||||
## Overview
|
||||
|
||||
[Detailed description of the component - what it is, what it does, and why it exists]
|
||||
The Observability Client stack provides essential monitoring and observability infrastructure for Kubernetes environments. As part of the Edge Developer Platform, it deploys client-side components that collect, process, and forward metrics and logs to centralized observability systems.
|
||||
|
||||
The stack integrates three core components: Kubernetes Metrics Server for resource metrics, Vector for log collection and forwarding, and Victoria Metrics for comprehensive metrics monitoring and alerting.
|
||||
|
||||
## Key Features
|
||||
|
||||
* [Feature 1]
|
||||
* [Feature 2]
|
||||
* [Feature 3]
|
||||
|
||||
## Purpose in EDP
|
||||
|
||||
[Explain the role this component plays in the Edge Developer Platform and how it contributes to the overall platform capabilities]
|
||||
* **Resource Metrics**: Real-time CPU and memory metrics via Kubernetes Metrics Server
|
||||
* **Log Aggregation**: Unified log collection and forwarding with Vector
|
||||
* **Metrics Monitoring**: Comprehensive metrics collection, storage, and alerting with Victoria Metrics
|
||||
* **Prometheus Compatibility**: Full Prometheus protocol support for metrics scraping
|
||||
* **Multi-Tenant Support**: Configurable tenant isolation for metrics and logs
|
||||
* **Automated Alerting**: Pre-configured alert rules with Alertmanager integration
|
||||
* **Grafana Integration**: Built-in dashboard provisioning and datasource configuration
|
||||
|
||||
## Repository
|
||||
|
||||
**Code**: [Link to source code repository]
|
||||
**Code**: [Observability Client Stack Templates](https://edp.buildth.ing/DevFW-CICD/stacks/src/branch/main/template/stacks/observability-client)
|
||||
|
||||
**Documentation**: [Link to component-specific documentation]
|
||||
**Documentation**:
|
||||
* [Kubernetes Metrics Server](https://github.com/kubernetes-sigs/metrics-server)
|
||||
* [Vector Documentation](https://vector.dev/docs/)
|
||||
* [Victoria Metrics Documentation](https://docs.victoriametrics.com/)
|
||||
|
||||
## Getting Started
|
||||
|
||||
### Prerequisites
|
||||
|
||||
* [Prerequisite 1]
|
||||
* [Prerequisite 2]
|
||||
* Kubernetes cluster with ArgoCD installed (provided by `core` stack)
|
||||
* cert-manager for certificate management (provided by `otc` stack)
|
||||
* Observability backend services for receiving metrics and logs
|
||||
|
||||
### Quick Start
|
||||
|
||||
[Step-by-step guide to get started with this component]
|
||||
The Observability Client stack is deployed as part of the EDP installation process:
|
||||
|
||||
1. [Step 1]
|
||||
2. [Step 2]
|
||||
3. [Step 3]
|
||||
1. **Trigger Deploy Pipeline**
|
||||
- Go to [Infra Deploy Pipeline](https://edp.buildth.ing/DevFW/infra-deploy/actions?workflow=deploy.yaml)
|
||||
- Click on Run workflow
|
||||
- Enter a name in "Select environment directory to deploy". The name must be DNS-compatible.
|
||||
- Execute workflow
|
||||
|
||||
2. **ArgoCD Synchronization**
|
||||
ArgoCD automatically deploys:
|
||||
- Metrics Server (Helm chart v3.12.2)
|
||||
- Vector agent (Helm chart v0.43.0)
|
||||
- Victoria Metrics k8s-stack (Helm chart v0.48.1)
|
||||
- ServiceMonitor resources for Prometheus scraping
|
||||
- Authentication secrets for remote write endpoints
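The deploy step above requires a DNS-compatible environment name. As a minimal sketch, assuming "DNS compatible" means a valid RFC 1123 label (lowercase alphanumerics and hyphens, 1 to 63 characters, starting and ending with an alphanumeric), a name can be checked locally before running the workflow. The helper name `is_dns_label` is hypothetical and not part of the pipeline:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the argument is a valid RFC 1123 DNS label.
is_dns_label() {
  local LC_ALL=C   # force ASCII ranges so [a-z] never matches uppercase letters
  local name=$1
  local re='^[a-z0-9]([a-z0-9-]*[a-z0-9])?$'
  # Labels must be non-empty and at most 63 characters long.
  (( ${#name} >= 1 && ${#name} <= 63 )) || return 1
  # Lowercase alphanumerics and hyphens; first and last character alphanumeric.
  [[ $name =~ $re ]]
}

# Usage: is_dns_label "edge-dev-01" && echo "name is usable"
```

If the pipeline applies stricter rules (for example a shorter length limit), the regular expression would need to be adjusted accordingly.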
|
||||
|
||||
### Verification
|
||||
|
||||
[How to verify the component is working correctly]
|
||||
|
||||
## Usage Examples
|
||||
|
||||
### [Use Case 1]
|
||||
|
||||
[Example with code/commands showing common use case]
|
||||
Verify the Observability Client deployment:
|
||||
|
||||
```bash
|
||||
# Example commands
|
||||
# Check ArgoCD application status
|
||||
kubectl get application -n argocd | grep -E "metrics-server|vector|vm-client"
|
||||
|
||||
# Verify Metrics Server is running
|
||||
kubectl get pods -n observability -l app.kubernetes.io/name=metrics-server
|
||||
|
||||
# Test metrics API
|
||||
kubectl top nodes
|
||||
kubectl top pods -A
|
||||
|
||||
# Verify Vector pods are running
|
||||
kubectl get pods -n observability -l app.kubernetes.io/name=vector
|
||||
|
||||
# Check Victoria Metrics components
|
||||
kubectl get pods -n observability -l app.kubernetes.io/name=victoria-metrics-k8s-stack
|
||||
|
||||
# Verify ServiceMonitor resources
|
||||
kubectl get servicemonitor -n observability
|
||||
```
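Directly after a deployment, some of the checks above can fail simply because ArgoCD has not finished syncing. A minimal sketch of a retry wrapper, assuming that repeating a check with a fixed delay is acceptable; the function name `retry` and its argument order are illustrative:

```bash
#!/usr/bin/env bash
# Hypothetical helper: retry <attempts> <delay-seconds> <command> [args...]
# Runs the command until it succeeds or the attempts are used up.
retry() {
  local attempts=$1 delay=$2
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0          # command succeeded
    fi
    if ((i < attempts)); then
      sleep "$delay"    # wait before the next attempt
    fi
  done
  return 1              # every attempt failed
}

# Usage: retry 30 10 kubectl top nodes
```

The usage line polls the Metrics API for up to roughly five minutes, which is an arbitrary choice rather than a documented sync time.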
|
||||
|
||||
### [Use Case 2]
|
||||
|
||||
[Another common scenario]
|
||||
|
||||
## Integration Points
|
||||
|
||||
* **[Component A]**: [How it integrates]
|
||||
* **[Component B]**: [How it integrates]
|
||||
* **[Component C]**: [How it integrates]
|
||||
|
||||
## Architecture
|
||||
|
||||
[Optional: Add architectural diagrams and descriptions]
|
||||
### Component Architecture
|
||||
|
||||
### Component Architecture (C4)
|
||||
The Observability Client stack consists of three integrated components:
|
||||
|
||||
[Add C4 Container or Component diagrams showing the internal structure]
|
||||
**Metrics Server**:
|
||||
- Collects resource metrics (CPU, memory) from kubelet
|
||||
- Provides Metrics API for kubectl top and HPA
|
||||
- Lightweight aggregator for cluster-wide resource usage
|
||||
- Exposes ServiceMonitor for Prometheus scraping
|
||||
|
||||
### Sequence Diagrams
|
||||
**Vector Agent**:
|
||||
- DaemonSet deployment for log collection across all nodes
|
||||
- Processes and transforms Kubernetes logs
|
||||
- Forwards logs to centralized Elasticsearch backend
|
||||
- Injects cluster metadata and environment information
|
||||
- Supports compression and bulk operations
|
||||
|
||||
[Add sequence diagrams showing key interaction flows with other components]
|
||||
**Victoria Metrics Stack**:
|
||||
- VMAgent: Scrapes metrics from Kubernetes components and applications
|
||||
- VMAlertmanager: Manages alert routing and notifications
|
||||
- VMOperator: Manages VictoriaMetrics CRDs and lifecycle
|
||||
- Integration with remote Victoria Metrics storage
|
||||
- Supports multi-tenant metrics isolation
|
||||
|
||||
### Deployment Architecture
|
||||
### Data Flow
|
||||
|
||||
[Add infrastructure and deployment diagrams showing how the component is deployed]
|
||||
```
|
||||
Kubernetes Resources → Metrics Server → Metrics API
|
||||
↓
|
||||
ServiceMonitor → VMAgent → Remote VictoriaMetrics
|
||||
|
||||
Application Logs → Vector Agent → Transform → Remote Elasticsearch
|
||||
|
||||
Prometheus Exporters → VMAgent → Remote VictoriaMetrics → VMAlertmanager
|
||||
```
|
||||
|
||||
## Configuration
|
||||
|
||||
[Key configuration options and how to set them]
|
||||
### Metrics Server Configuration
|
||||
|
||||
Configured in `stacks/observability-client/metrics-server/values.yaml`:
|
||||
|
||||
```yaml
metrics:
  enabled: true
serviceMonitor:
  enabled: true
```
|
||||
|
||||
**Key Settings**:
|
||||
- Enables metrics collection endpoint
|
||||
- Exposes ServiceMonitor for Prometheus-compatible scraping
|
||||
- Deployed via Helm chart from `https://kubernetes-sigs.github.io/metrics-server/`
|
||||
|
||||
### Vector Configuration
|
||||
|
||||
Configured in `stacks/observability-client/vector/values.yaml`:
|
||||
|
||||
**Role**: Agent (DaemonSet deployment across nodes)
|
||||
|
||||
**Authentication**:
|
||||
Credentials sourced from `simple-user-secret`:
|
||||
- `VECTOR_USER`: Username for remote write authentication
|
||||
- `VECTOR_PASSWORD`: Password for remote write authentication
|
||||
|
||||
**Data Sources**:
|
||||
- `k8s`: Collects Kubernetes container logs
|
||||
- `internal_metrics`: Gathers Vector internal metrics
|
||||
|
||||
**Log Processing**:
|
||||
```yaml
transforms:
  parser:
    - Parse JSON from log messages
    - Inject cluster environment metadata
    - Remove original message field
```
|
||||
|
||||
**Output Sink**:
|
||||
- Elasticsearch bulk API (v8)
|
||||
- Basic authentication with environment variables
|
||||
- Gzip compression enabled
|
||||
- Custom headers: AccountID and ProjectID
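The sink authenticates with HTTP Basic authentication, which sends the credentials as a base64-encoded `user:password` pair in the `Authorization` header (RFC 7617). The sketch below shows how that header is formed, which can help when comparing requests in a proxy or debug log; the function name `basic_auth_header` is illustrative and the sample credentials are placeholders:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the Authorization header for HTTP Basic auth.
basic_auth_header() {
  local user=$1 password=$2
  local token
  # base64 may wrap long output, so strip any newlines it adds.
  token=$(printf '%s' "${user}:${password}" | base64 | tr -d '\n')
  printf 'Authorization: Basic %s\n' "$token"
}

# Usage: basic_auth_header "$VECTOR_USER" "$VECTOR_PASSWORD"
```

Base64 is an encoding, not encryption, so the header is only protected when the connection itself uses TLS.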
|
||||
|
||||
### Victoria Metrics Stack Configuration
|
||||
|
||||
Configured in `stacks/observability-client/vm-client-stack/values.yaml`:
|
||||
|
||||
**Operator Settings**:
|
||||
- Enabled with admission webhooks
|
||||
- Managed by cert-manager for ArgoCD compatibility
|
||||
|
||||
**VMAgent Configuration**:
|
||||
- Basic authentication for remote write
|
||||
- Credentials from `vm-remote-write-secret`
|
||||
- Stream parsing enabled
|
||||
- Drop original labels to reduce memory footprint
|
||||
|
||||
**Monitoring Targets**:
|
||||
- Node exporter for hardware metrics
|
||||
- kube-state-metrics for Kubernetes object states
|
||||
- Kubelet metrics (cadvisor)
|
||||
- Kubernetes control plane components (API server, etcd, scheduler, controller manager)
|
||||
- CoreDNS metrics
|
||||
|
||||
**Alertmanager Integration**:
|
||||
- Slack notification templates
|
||||
- Configurable routing rules
|
||||
- TLS support for secure communication
|
||||
|
||||
**Storage Options**:
|
||||
- VMSingle: Single-node deployment
|
||||
- VMCluster: Distributed deployment with replication
|
||||
- Configurable retention period
|
||||
|
||||
## ArgoCD Application Configuration
|
||||
|
||||
**Metrics Server Application** (`template/stacks/observability-client/metrics-server.yaml`):
|
||||
- Name: `metrics-server`
|
||||
- Chart version: 3.12.2
|
||||
- Automated sync with self-heal enabled
|
||||
- Namespace: `observability`
|
||||
|
||||
**Vector Application** (`template/stacks/observability-client/vector.yaml`):
|
||||
- Name: `vector`
|
||||
- Chart version: 0.43.0
|
||||
- Automated sync with self-heal enabled
|
||||
- Namespace: `observability`
|
||||
|
||||
**Victoria Metrics Application** (`template/stacks/observability-client/vm-client-stack.yaml`):
|
||||
- Name: `vm-client`
|
||||
- Chart version: 0.48.1
|
||||
- Automated sync with self-heal enabled
|
||||
- Namespace: `observability`
|
||||
- References manifests from instance repository
|
||||
|
||||
## Usage Examples
|
||||
|
||||
### Querying Resource Metrics
|
||||
|
||||
Access resource metrics collected by Metrics Server:
|
||||
|
||||
```bash
|
||||
# View node resource usage
|
||||
kubectl top nodes
|
||||
|
||||
# View pod resource usage across all namespaces
|
||||
kubectl top pods -A
|
||||
|
||||
# View pod resource usage in specific namespace
|
||||
kubectl top pods -n observability
|
||||
|
||||
# Sort pods by CPU usage
|
||||
kubectl top pods -A --sort-by=cpu
|
||||
|
||||
# Sort pods by memory usage
|
||||
kubectl top pods -A --sort-by=memory
|
||||
```
|
||||
|
||||
### Using Metrics for Autoscaling
|
||||
|
||||
Create Horizontal Pod Autoscaler based on metrics:
|
||||
|
||||
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
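The Kubernetes documentation gives the core HPA calculation as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. The sketch below applies that formula to the values in the manifest above (target 70, minimum 2, maximum 10). It is a simplification: the real controller also applies a tolerance band and stabilization windows, and the function name `desired_replicas` is illustrative:

```bash
#!/usr/bin/env bash
# Hypothetical helper:
#   desired_replicas <current-replicas> <current-utilization-%> <target-%> <min> <max>
desired_replicas() {
  local current=$1 utilization=$2 target=$3 min=$4 max=$5
  # Integer ceiling of (current * utilization / target).
  local desired=$(( (current * utilization + target - 1) / target ))
  # Clamp the result to the configured replica bounds.
  if (( desired < min )); then desired=$min; fi
  if (( desired > max )); then desired=$max; fi
  echo "$desired"
}

desired_replicas 4 90 70 2 10   # prints 6
```

With four replicas averaging 90% CPU against a 70% target, the formula asks for six replicas; very low or very high utilization is clamped to `minReplicas` and `maxReplicas`.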
|
||||
|
||||
### Accessing Application Logs
|
||||
|
||||
Vector automatically collects logs from all containers. View logs in your centralized Elasticsearch/Kibana:
|
||||
|
||||
```bash
# Logs are automatically forwarded to Elasticsearch
# Access via Kibana dashboard or Elasticsearch API

# Example: Query logs via Elasticsearch API
# (adjust the field name to match your index mapping)
curl -u "$VECTOR_USER:$VECTOR_PASSWORD" \
  -X GET "https://elasticsearch.example.com/_search" \
  -H 'Content-Type: application/json' \
  -d '{
    "query": {
      "match": {
        "kubernetes.namespace": "my-namespace"
      }
    }
  }'
```
|
||||
|
||||
### Querying Victoria Metrics
|
||||
|
||||
Query metrics collected by Victoria Metrics:
|
||||
|
||||
```bash
# Access Victoria Metrics query API
# Metrics are forwarded to remote Victoria Metrics instance

# Example PromQL queries (counters are wrapped in rate() to get per-second values):
# - Container CPU usage: rate(container_cpu_usage_seconds_total[5m])
# - Pod memory usage: container_memory_usage_bytes
# - Node disk I/O: rate(node_disk_io_time_seconds_total[5m])

# Query via Victoria Metrics API
curl -X POST https://victoriametrics.example.com/api/v1/query \
  -d 'query=up' \
  -d 'time=2025-12-16T00:00:00Z'
```
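`container_cpu_usage_seconds_total` and `node_disk_io_time_seconds_total` are cumulative counters, so their raw values only become meaningful as a rate of change. The sketch below computes the average per-second increase between two samples, which is the basic idea behind PromQL's `rate()`. It is a simplification that ignores counter resets and the extrapolation `rate()` performs, and the function name `counter_rate` is illustrative:

```bash
#!/usr/bin/env bash
# Hypothetical helper: counter_rate <value1> <time1-seconds> <value2> <time2-seconds>
# Prints the average per-second increase between two counter samples.
counter_rate() {
  local v1=$1 t1=$2 v2=$3 t2=$4
  # LC_ALL=C keeps the decimal separator a dot regardless of locale.
  LC_ALL=C awk -v v1="$v1" -v t1="$t1" -v v2="$v2" -v t2="$t2" \
    'BEGIN { printf "%.2f\n", (v2 - v1) / (t2 - t1) }'
}

counter_rate 100 0 130 60   # prints 0.50
```

A CPU counter that grows by 30 seconds over a 60-second window corresponds to half a core of average usage.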
|
||||
|
||||
### Creating Custom ServiceMonitors
|
||||
|
||||
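Expose application metrics for collection. A ServiceMonitor only selects Services in its own namespace unless `spec.namespaceSelector` is set, so either create the Service in `observability` or add a namespace selector: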
|
||||
|
||||
```yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp-metrics
  labels:
    app: myapp
spec:
  ports:
    - name: metrics
      port: 8080
      targetPort: 8080
  selector:
    app: myapp
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-monitor
  namespace: observability
spec:
  selector:
    matchLabels:
      app: myapp
  endpoints:
    - port: metrics
      path: /metrics
      interval: 30s
```
|
||||
|
||||
## Integration Points
|
||||
|
||||
* **Core Stack**: Depends on ArgoCD for deployment orchestration
|
||||
* **OTC Stack**: Requires cert-manager for certificate management
|
||||
* **Observability Stack**: Forwards metrics and logs to centralized observability backend
|
||||
* **All Application Stacks**: Collects metrics and logs from all platform applications
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Metrics Server Not Responding
|
||||
|
||||
**Problem**: `kubectl top` commands fail or return no data
|
||||
|
||||
**Solution**:
|
||||
1. Check Metrics Server pod status:
|
||||
```bash
|
||||
kubectl get pods -n observability -l app.kubernetes.io/name=metrics-server
|
||||
kubectl logs -n observability -l app.kubernetes.io/name=metrics-server
|
||||
```
|
||||
|
||||
2. Verify that the Metrics API responds:
|
||||
```bash
|
||||
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
|
||||
```
|
||||
|
||||
3. Check ServiceMonitor configuration:
|
||||
```bash
|
||||
kubectl get servicemonitor -n observability -o yaml
|
||||
```
|
||||
|
||||
### Vector Not Forwarding Logs
|
||||
|
||||
**Problem**: Logs are not appearing in Elasticsearch
|
||||
|
||||
**Solution**:
|
||||
1. Check Vector agent status:
|
||||
```bash
|
||||
kubectl get pods -n observability -l app.kubernetes.io/name=vector
|
||||
kubectl logs -n observability -l app.kubernetes.io/name=vector --tail=50
|
||||
```
|
||||
|
||||
2. Verify authentication secret:
|
||||
```bash
|
||||
kubectl get secret simple-user-secret -n observability
|
||||
kubectl get secret simple-user-secret -n observability -o jsonpath='{.data.username}' | base64 -d
|
||||
```
|
||||
|
||||
3. Test Elasticsearch connectivity:
|
||||
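```bash
# Single quotes make the variables expand inside the container, where Vector's credentials are set.
# Requires an image that ships a shell and curl; distroless images do not.
kubectl exec -n observability "$(kubectl get pod -n observability -l app.kubernetes.io/name=vector -o jsonpath='{.items[0].metadata.name}')" -- \
  sh -c 'curl -u "$VECTOR_USER:$VECTOR_PASSWORD" https://elasticsearch.example.com/_cluster/health'
```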
|
||||
|
||||
4. Check Vector internal metrics:
|
||||
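```bash
# port-forward blocks, so run it in the background (or use a second terminal)
kubectl port-forward -n observability svc/vector 9090:9090 &
sleep 2 && curl http://localhost:9090/metrics
kill %1  # stop the port-forward
```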
|
||||
|
||||
### Victoria Metrics Not Scraping
|
||||
|
||||
**Problem**: Metrics are not being collected or forwarded
|
||||
|
||||
**Solution**:
|
||||
1. Check VMAgent status:
|
||||
```bash
|
||||
kubectl get pods -n observability -l app.kubernetes.io/name=vmagent
|
||||
kubectl logs -n observability -l app.kubernetes.io/name=vmagent
|
||||
```
|
||||
|
||||
2. Verify remote write secret:
|
||||
```bash
|
||||
kubectl get secret vm-remote-write-secret -n observability
|
||||
kubectl get secret vm-remote-write-secret -n observability -o jsonpath='{.data.username}' | base64 -d
|
||||
```
|
||||
|
||||
3. Check ServiceMonitor targets:
|
||||
```bash
|
||||
kubectl get servicemonitor -n observability
|
||||
kubectl describe servicemonitor metrics-server -n observability
|
||||
```
|
||||
|
||||
4. Verify operator is running:
|
||||
```bash
|
||||
kubectl get pods -n observability -l app.kubernetes.io/name=victoria-metrics-operator
|
||||
kubectl logs -n observability -l app.kubernetes.io/name=victoria-metrics-operator
|
||||
```
|
||||
|
||||
### High Memory Usage
|
||||
|
||||
**Problem**: Victoria Metrics or Vector consuming excessive memory
|
||||
|
||||
**Solution**:
|
||||
1. For Victoria Metrics, verify `dropOriginalLabels` is enabled:
|
||||
```bash
|
||||
kubectl get vmagent -n observability -o yaml | grep dropOriginalLabels
|
||||
```
|
||||
|
||||
2. Increase scrape intervals (scrape less often) for high-cardinality metrics:
```yaml
# Edit ServiceMonitor
spec:
  endpoints:
    - interval: 60s # Increase from 30s
```
|
||||
|
||||
3. Filter unnecessary logs in Vector:
|
||||
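```yaml
# Add filter transform to Vector configuration
transforms:
  filter:
    type: filter
    inputs: [k8s] # assumes `k8s` is the kubernetes_logs source listed under Data Sources
    # The kubernetes_logs source exposes the namespace as pod_namespace
    condition: '.kubernetes.pod_namespace != "kube-system"'
```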
|
||||
|
||||
4. Check resource limits:
|
||||
```bash
|
||||
kubectl describe pod -n observability -l app.kubernetes.io/name=vmagent
|
||||
kubectl describe pod -n observability -l app.kubernetes.io/name=vector
|
||||
```
|
||||
|
||||
### Certificate Issues
|
||||
|
||||
**Problem**: TLS certificate errors in logs
|
||||
|
||||
**Solution**:
|
||||
1. Verify cert-manager is running:
|
||||
```bash
|
||||
kubectl get pods -n cert-manager
|
||||
```
|
||||
|
||||
2. Check certificate status:
|
||||
```bash
|
||||
kubectl get certificate -n observability
|
||||
kubectl describe certificate -n observability
|
||||
```
|
||||
|
||||
3. Review webhook configuration:
|
||||
```bash
|
||||
kubectl get validatingwebhookconfigurations | grep victoria-metrics
|
||||
kubectl get mutatingwebhookconfigurations | grep victoria-metrics
|
||||
```
|
||||
|
||||
4. Restart operator if needed:
|
||||
```bash
|
||||
kubectl rollout restart deployment victoria-metrics-operator -n observability
|
||||
```
|
||||
|
||||
## Additional Resources
|
||||
|
||||
* [Kubernetes Metrics Server Documentation](https://github.com/kubernetes-sigs/metrics-server)
|
||||
* [Vector Documentation](https://vector.dev/docs/)
|
||||
* [Victoria Metrics Documentation](https://docs.victoriametrics.com/)
|
||||
* [Victoria Metrics Operator](https://docs.victoriametrics.com/operator/)
|
||||
* [Prometheus Operator API](https://prometheus-operator.dev/docs/operator/api/)
|
||||
* [ArgoCD Documentation](https://argo-cd.readthedocs.io/)
|
||||
|
|
|
|||