kcp-cilium-mesh-provider
A kcp provider that watches networking custom resources and configures Cilium cluster mesh, global service export, and cross-cluster network policies on referenced Kubernetes clusters.
Overview
This provider follows the same pattern as kcp-karmada-provider:
- Discovers virtual workspace endpoints via kcp APIExportEndpointSlice
- Watches three networking CRs across all tenant workspaces
- Resolves cluster references to obtain kubeconfig access via management plane
- Reconciles desired state onto target clusters
- Reports status back to kcp
Networking CRDs
| CRD | Purpose | Depends On |
|---|---|---|
| ClusterMeshBinding | Establishes Cilium cluster mesh between 2+ clusters | KubernetesCluster CRs |
| GlobalService | Exports a Kubernetes Service across all mesh clusters | ClusterMeshBinding (Connected) |
| ConnectivityIntent | Declares cross-cluster Allow/Deny policy → Cilium network policies | ClusterMeshBinding (Connected) |
All three resources live in the networking.edge-portal.eu API group.
CRD: ClusterMeshBinding
apiVersion: networking.edge-portal.eu/v1alpha1
kind: ClusterMeshBinding
metadata:
  name: edge-mesh-eu
  namespace: default
spec:
  clusterRefs:
    - name: cluster-ffm-1 # KubernetesCluster CR in same namespace
    - name: cluster-ber-2
  meshConfig:
    serviceType: LoadBalancer # LoadBalancer or NodePort
    enableKvStoreMesh: false
    hubble:
      enabled: false
      ui: false
Status
status:
  phase: Connected # Pending | Connecting | Connected | Degraded | Error
  observedGeneration: 1
  lastSyncTime: "2026-04-21T10:00:00Z"
  meshStatus:
    - clusterName: cluster-ffm-1
      clusterId: 1
      connected: true
      readyNodes: 3
      totalNodes: 3
    - clusterName: cluster-ber-2
      clusterId: 2
      connected: true
      readyNodes: 2
      totalNodes: 2
  conditions:
    - type: MeshConnected
      status: "True"
      reason: MeshEstablished
      message: "Mesh established between 2 clusters"
CRD: GlobalService
Declares that a Kubernetes Service should be exported across the cluster mesh.
The reconciler annotates the Service with io.cilium/global-service: "true"
and optionally io.cilium/shared-service: "true" on target clusters.
Prerequisite: The referenced ClusterMeshBinding must be in Connected
or Degraded phase.
apiVersion: networking.edge-portal.eu/v1alpha1
kind: GlobalService
metadata:
  name: nginx-global
  namespace: default
spec:
  meshRef:
    name: edge-mesh-eu # ClusterMeshBinding in same namespace
  selector:
    namespace: production # Namespace of the Service on target clusters
    name: nginx # Service name to export
    # OR use labelSelector to match multiple services:
    # labelSelector:
    #   app: frontend
  targetClusters: # Optional: subset of mesh clusters (empty = all)
    - cluster-ffm-1
  globalAnnotations: # Extra annotations applied to ServiceExport
    service.cilium.io/shared: "true"
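The annotation step described above can be sketched as a pure merge over a Service's annotation map. This is a minimal illustration, not the provider's actual code: the helper name globalServiceAnnotations and its parameters are hypothetical, and only the two annotation keys come from this README.

```go
package main

import "fmt"

// Annotation keys as named in this README.
const (
	annGlobal = "io.cilium/global-service"
	annShared = "io.cilium/shared-service"
)

// globalServiceAnnotations returns a copy of existing with the export
// annotations merged in. Hypothetical helper for illustration only.
func globalServiceAnnotations(existing map[string]string, shared bool, extra map[string]string) map[string]string {
	out := make(map[string]string, len(existing)+len(extra)+2)
	for k, v := range existing {
		out[k] = v // keep whatever the Service already carries
	}
	out[annGlobal] = "true"
	if shared {
		out[annShared] = "true"
	}
	for k, v := range extra {
		out[k] = v // spec.globalAnnotations are applied last
	}
	return out
}

func main() {
	got := globalServiceAnnotations(
		map[string]string{"team": "web"},
		true,
		map[string]string{"service.cilium.io/shared": "true"},
	)
	for k, v := range got {
		fmt.Printf("%s=%s\n", k, v)
	}
}
```

Returning a new map rather than mutating the input keeps the sketch safe to call on cached objects. Whether the real reconciler patches or updates the Service is not stated here.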
GlobalService Status
status:
  phase: Exported # Pending | Exporting | Exported | Degraded | Error
  observedGeneration: 1
  lastSyncTime: "2026-04-21T10:00:00Z"
  exportedClusters:
    - cluster-ffm-1
    - cluster-ber-2
  endpointSummary:
    - clusterName: cluster-ffm-1
      endpoints: 3
      ready: true
    - clusterName: cluster-ber-2
      endpoints: 2
      ready: true
  conditions:
    - type: Accepted
      status: "True"
      reason: SpecValid
    - type: Exported
      status: "True"
      reason: AllClustersExported
    - type: Ready
      status: "True"
      reason: Exported
CRD: ConnectivityIntent
Declares a cross-cluster network policy. The reconciler translates the intent
into CiliumNetworkPolicy (namespaced) or CiliumClusterwideNetworkPolicy
(cluster-scoped) and applies them to all mesh clusters.
Prerequisite: The referenced ClusterMeshBinding must be in Connected
or Degraded phase.
apiVersion: networking.edge-portal.eu/v1alpha1
kind: ConnectivityIntent
metadata:
  name: frontend-to-backend
  namespace: default
spec:
  meshRef:
    name: edge-mesh-eu # ClusterMeshBinding in same namespace
  action: Allow # Allow | Deny
  policyMode: Namespaced # Namespaced | Clusterwide
  source:
    namespace: production
    name: frontend # Maps to k8s:app label in Cilium selector
    # OR selector: { "role": "web" }
  destination:
    namespace: production
    name: backend
  ports:
    - protocol: TCP
      port: 8080
    - protocol: TCP
      port: 443
  probe:
    enabled: false # Future: reachability probes
    intervalSeconds: 60
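As a rough sketch of the translation described above, the following builds a namespaced CiliumNetworkPolicy as a plain map and prints it as JSON. The types endpoint and port and the function buildPolicy are hypothetical. Using the ingressDeny rule list for Deny intents is an assumption about how the action might map, and Clusterwide mode is not covered.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// endpoint and port mirror the intent's source/destination and ports fields.
type endpoint struct{ Namespace, Name string }

type port struct {
	Protocol string
	Port     int
}

// selector maps name to the k8s:app label, as the sample comment describes,
// and pins the namespace via Cilium's pod-namespace label.
func selector(e endpoint) map[string]any {
	return map[string]any{"matchLabels": map[string]string{
		"k8s:app":                         e.Name,
		"k8s:io.kubernetes.pod.namespace": e.Namespace,
	}}
}

// buildPolicy is a hypothetical helper: Allow becomes an ingress rule on the
// destination, Deny becomes an ingressDeny rule (assumed mapping).
func buildPolicy(name, action string, src, dst endpoint, ports []port) map[string]any {
	portRules := make([]map[string]string, 0, len(ports))
	for _, p := range ports {
		portRules = append(portRules, map[string]string{
			"port":     strconv.Itoa(p.Port), // Cilium expects the port as a string
			"protocol": p.Protocol,
		})
	}
	rule := map[string]any{
		"fromEndpoints": []any{selector(src)},
		"toPorts":       []any{map[string]any{"ports": portRules}},
	}
	ruleKey := "ingress"
	if action == "Deny" {
		ruleKey = "ingressDeny"
	}
	return map[string]any{
		"apiVersion": "cilium.io/v2",
		"kind":       "CiliumNetworkPolicy",
		"metadata":   map[string]any{"name": name, "namespace": dst.Namespace},
		"spec": map[string]any{
			"endpointSelector": selector(dst),
			ruleKey:            []any{rule},
		},
	}
}

func main() {
	p := buildPolicy("frontend-to-backend", "Allow",
		endpoint{Namespace: "production", Name: "frontend"},
		endpoint{Namespace: "production", Name: "backend"},
		[]port{{Protocol: "TCP", Port: 8080}, {Protocol: "TCP", Port: 443}})
	out, err := json.MarshalIndent(p, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```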
ConnectivityIntent Status
status:
  phase: Applied # Pending | Applying | Applied | Degraded | Error
  observedGeneration: 1
  lastSyncTime: "2026-04-21T10:00:00Z"
  appliedClusters:
    - cluster-ffm-1
    - cluster-ber-2
  probeStatus: # Only present if probe.enabled=true
    lastProbeTime: "2026-04-21T10:05:00Z"
    success: true
    message: "probe not yet implemented; reporting apply status as proxy"
  conditions:
    - type: Accepted
      status: "True"
      reason: SpecValid
    - type: Applied
      status: "True"
      reason: AllClustersApplied
    - type: Ready
      status: "True"
      reason: Applied
Condition Types (all three CRDs)
| Condition | Used By | Meaning |
|---|---|---|
| Ready | All | Terminal: resource is fully reconciled |
| Accepted | GlobalService, ConnectivityIntent | Spec validated |
| Exported | GlobalService | Service annotations applied |
| Applied | ConnectivityIntent | Cilium policies applied |
| Reachable | ConnectivityIntent | Probe succeeded (future) |
| Degraded | All | Partial success (some clusters failed) |
| MeshConnected | ClusterMeshBinding | Legacy: mesh established |
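To illustrate how per-cluster results might roll up into a phase and conditions, here is a small sketch for a ConnectivityIntent. The function summarize, the condition struct, and the reasons PartialFailure, AllClustersFailed and NoTargetClusters are hypothetical. Only the phase names, condition types and the AllClustersApplied and Applied reasons appear in this README.

```go
package main

import (
	"errors"
	"fmt"
)

// condition is a trimmed-down stand-in for a Kubernetes status condition.
type condition struct{ Type, Status, Reason string }

// errUnreachable is a sample per-cluster failure used in the demo below.
var errUnreachable = errors.New("cluster unreachable")

// summarize derives a phase and conditions from per-cluster apply results.
// A nil error means the policy was applied on that cluster.
func summarize(results map[string]error) (string, []condition) {
	failed := 0
	for _, err := range results {
		if err != nil {
			failed++
		}
	}
	switch {
	case len(results) == 0:
		return "Pending", []condition{{"Ready", "False", "NoTargetClusters"}}
	case failed == 0:
		return "Applied", []condition{
			{"Applied", "True", "AllClustersApplied"},
			{"Ready", "True", "Applied"},
		}
	case failed < len(results):
		// Partial success: some clusters failed, some succeeded.
		return "Degraded", []condition{
			{"Degraded", "True", "PartialFailure"},
			{"Ready", "False", "PartialFailure"},
		}
	default:
		return "Error", []condition{{"Ready", "False", "AllClustersFailed"}}
	}
}

func main() {
	phase, conds := summarize(map[string]error{
		"cluster-ffm-1": nil,
		"cluster-ber-2": errUnreachable,
	})
	fmt.Println(phase, conds)
}
```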
Architecture
kcp (virtual workspace) Management Plane Target Clusters
┌──────────────────────┐ ┌──────────────────────┐ ┌──────────────────┐
│ ClusterMeshBinding │ │ cluster-operator │ │ cluster-ffm-1 │
│ GlobalService │ │ Cluster CRs │ │ ├─ cilium │
│ ConnectivityIntent │◄───│ kubeconfig Secrets │───►│ ├─ clustermesh │
│ KubernetesCluster │ │ │ │ ├─ svc exports │
│ (APIExport) │ │ │ │ └─ net policies │
└──────────────────────┘ └──────────────────────┘ ├──────────────────┤
│ watch │ read │ cluster-ber-2 │
│ │ kubeconfig │ ├─ cilium │
▼ ▼ │ ├─ clustermesh │
┌──────────────────────────────────────────────────┐ │ ├─ svc exports │
│ kcp-cilium-mesh-provider │ │ └─ net policies │
│ ┌─────────────────┐ ┌──────────────────────┐ │ └──────────────────┘
│ │ EndpointDiscovery│ │ MeshReconciler │───┼───── CA sync, secrets
│ │ Controller │ │ GlobalSvcReconciler │───┼───── svc annotations
│ │ StatusUpdater │ │ ConnIntentReconciler │───┼───── CiliumNetworkPolicy
│ └─────────────────┘ └──────────────────────┘ │
└──────────────────────────────────────────────────┘
Reconciler Pipeline
┌─────────────────────────┐
│ ClusterMeshBinding CR │──── MeshReconciler
│ phase: Connected │ ├─ Verify Cilium
└────────────┬────────────┘ ├─ Set cluster identity
│ referenced by ├─ Sync CA
│ ├─ Exchange secrets
▼ └─ Report meshStatus[]
┌─────────────────────────┐
│ GlobalService CR │──── GlobalServiceReconciler
│ meshRef: <binding> │ ├─ Resolve mesh binding
│ phase: Exported │ ├─ Find Service on clusters
└────────────┬────────────┘ ├─ Annotate io.cilium/global-service
│ same mesh └─ Report endpointSummary[]
▼
┌─────────────────────────┐
│ ConnectivityIntent CR │──── ConnectivityIntentReconciler
│ meshRef: <binding> │ ├─ Resolve mesh binding
│ phase: Applied │ ├─ Build CiliumNetworkPolicy
└─────────────────────────┘ ├─ Apply to target clusters
└─ Report appliedClusters[]
Mesh Setup Steps
The reconciler mirrors the logic from the existing cilium-cluster-mesh Helm job, but uses Kubernetes API calls instead of shell commands:
| Step | Shell Script | Provider |
|---|---|---|
| 1. Verify Cilium | cilium status | DaemonSet GET |
| 2. Set cluster identity | cilium install --set cluster.name/id | ConfigMap PATCH |
| 3. Sync CA | kubectl get/create secret cilium-ca | Secret GET + CREATE |
| 4. Enable clustermesh | cilium clustermesh enable | Deployment check (expects pre-installed) |
| 5. Connect clusters | cilium clustermesh connect | Secret exchange between clusters |
| 6. Verify | cilium clustermesh status | Deployment + Secret checks |
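Step 2 needs a unique cluster name and numeric ID per cluster. The sketch below shows one possible assignment scheme, giving IDs in clusterRefs order, which matches the clusterId values in the status example. That ordering, the helper names, the 255 upper bound (Cilium's default limit) and the cilium-config key names cluster-name and cluster-id are assumptions, not confirmed behaviour of this provider.

```go
package main

import (
	"fmt"
	"strconv"
)

// maxClusterID assumes Cilium's default upper bound for cluster IDs.
const maxClusterID = 255

type identity struct {
	Name string
	ID   int
}

// assignIdentities gives each cluster a unique ID starting at 1, in the
// order the clusters are listed. Hypothetical helper.
func assignIdentities(names []string) ([]identity, error) {
	if len(names) > maxClusterID {
		return nil, fmt.Errorf("%d clusters exceed the limit of %d", len(names), maxClusterID)
	}
	seen := make(map[string]bool, len(names))
	out := make([]identity, 0, len(names))
	for i, n := range names {
		if n == "" || seen[n] {
			return nil, fmt.Errorf("cluster name %q is empty or duplicated", n)
		}
		seen[n] = true
		out = append(out, identity{Name: n, ID: i + 1})
	}
	return out, nil
}

// configMapData returns the keys a ConfigMap PATCH could set
// (key names assumed from Cilium's cilium-config ConfigMap).
func configMapData(id identity) map[string]string {
	return map[string]string{
		"cluster-name": id.Name,
		"cluster-id":   strconv.Itoa(id.ID),
	}
}

func main() {
	ids, err := assignIdentities([]string{"cluster-ffm-1", "cluster-ber-2"})
	if err != nil {
		panic(err)
	}
	for _, id := range ids {
		fmt.Println(configMapData(id))
	}
}
```

Order-based assignment is simple but changes IDs if clusterRefs is reordered, so a real implementation would likely persist the assignment.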
Integration with cluster-operator
The kcp-cilium-mesh-provider does not provision clusters or install Cilium. It relies on cluster-operator and the Crossplane compositions to prepare clusters before mesh setup begins. The interaction is read-only from the perspective of the management plane — the provider only reads Secrets and Cluster CRs, never writes them.
Data flow
Management Plane (cluster-operator)
┌───────────────────────────────────────────────────────┐
│ │
│ Cluster CR kubeconfig Secret │
│ ┌──────────────────────┐ ┌────────────────────┐ │
│ │ metadata: │ │ metadata: │ │
│ │ name: cluster-ffm-1 │ │ name: (varies) │ │
│ │ namespace: default │ │ namespace: ... │ │
│ │ spec: │ │ data: │ │
│ │ provider: OTC | OSC │ │ kubeconfig: ... │ │
│ │ status: │ │ (or value: ...) │ │
│ │ conditions: │ └────────────────────┘ │
│ │ - type: Ready ✓ │ ▲ │
│ │ kubeconfigSecretRef: ├───────────────┘ │
│ │ name: <secret> │ │
│ │ key: kubeconfig │ │
│ │ capabilities: │ │
│ │ - name: cilium-cni │◄──── CapabilitiesReconciler │
│ │ details: │ (checks every 5 min) │
│ │ status: ready │ │
│ │ version: 1.16 │ │
│ └──────────────────────┘ │
└───────────────────────────────────────────────────────┘
│ read │ read
▼ ▼
┌───────────────────────────────────────────────────────┐
│ kcp-cilium-mesh-provider │
│ │
│ 1. Read Cluster CR → check conditions[Ready]=True │
│ 2. Read kubeconfigSecretRef → get Secret name + key │
│ 3. Fetch Secret → extract kubeconfig bytes │
│ 4. Build rest.Config → connect to target cluster │
└───────────────────────────────────────────────────────┘
Kubeconfig Secret conventions
cluster-operator stores kubeconfig access differently per provider:
| Provider | Secret Name | Data Key | Created By |
|---|---|---|---|
| KIND | kind-<clusterName>-kubeconfig | kubeconfig | Kind reconciler |
| OTC (CCE) | From CCE.Status.Kubeconfig.KubeconfigSecretRef.Name | From .Key | Crossplane CCE composition |
| OSC (Gardener) | From ShootClaim.Status.Kubeconfig.SecretRef.Name | kubeconfig | Gardener extension |
| IMPORTED | From spec.importedK8SConfig.secretRef | User-defined | User |
The mesh provider reads the Cluster CR's status.kubeconfigSecretRef (.name + .key)
and fetches the Secret from the same namespace as the Cluster CR. This is the same
retrieval logic used by cluster-operator's GetKubeconfigFromCluster() utility.
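The key lookup can be sketched as follows. It takes the Secret's data map and the key from kubeconfigSecretRef, then falls back to kubeconfig and value, the two keys shown in the data flow diagram. The function name and the fallback order are assumptions; the actual GetKubeconfigFromCluster() logic may differ.

```go
package main

import "fmt"

// kubeconfigFromSecret picks the kubeconfig bytes out of a Secret's data.
// It tries the referenced key first, then "kubeconfig", then "value"
// (assumed fallback order). Hypothetical helper for illustration.
func kubeconfigFromSecret(data map[string][]byte, key string) ([]byte, error) {
	for _, k := range []string{key, "kubeconfig", "value"} {
		if k == "" {
			continue // no key given in kubeconfigSecretRef
		}
		if v, ok := data[k]; ok && len(v) > 0 {
			return v, nil
		}
	}
	return nil, fmt.Errorf("no kubeconfig found under key %q or the default keys", key)
}

func main() {
	secretData := map[string][]byte{"value": []byte("apiVersion: v1\nkind: Config\n")}
	kc, err := kubeconfigFromSecret(secretData, "kubeconfig")
	if err != nil {
		panic(err)
	}
	fmt.Printf("found %d bytes of kubeconfig\n", len(kc))
}
```

With client-go, the returned bytes would typically be passed to clientcmd.RESTConfigFromKubeConfig to build the rest.Config mentioned in step 4 of the diagram.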
Capability detection: cilium-cni
cluster-operator's CapabilitiesReconciler periodically (every 5 minutes) connects to
each Ready cluster and runs registered checkers. The CiliumChecker queries the remote
cluster for a DaemonSet in kube-system with label k8s-app=cilium and writes:
status:
  capabilities:
    - name: cilium-cni
      details:
        status: ready # ready | not-installed | error
        version: "1.16.1"
        image: "quay.io/cilium/cilium:v1.16.1"
        desired-nodes: "3"
        ready-nodes: "3"
      lastTransitionTime: "2026-04-21T10:00:00Z"
The mesh provider can use this capability as a pre-flight check — if a referenced
cluster's cilium-cni capability is not-installed or error, the mesh reconciler
short-circuits with an Error status instead of timing out trying to reach the DaemonSet.
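A pre-flight check along these lines could look like the sketch below. The capability struct and preflightCilium are hypothetical names. Treating a missing or unrecognised capability as "proceed and let the DaemonSet check decide" is an assumption, since the text above only specifies the not-installed and error cases.

```go
package main

import "fmt"

// capability mirrors one entry of status.capabilities on the Cluster CR.
type capability struct {
	Name    string
	Details map[string]string
}

// preflightCilium returns an error if the cilium-cni capability reports
// not-installed or error, so the reconciler can fail fast.
func preflightCilium(cluster string, caps []capability) error {
	for _, c := range caps {
		if c.Name != "cilium-cni" {
			continue
		}
		switch status := c.Details["status"]; status {
		case "not-installed", "error":
			return fmt.Errorf("cluster %s: cilium-cni capability is %q", cluster, status)
		default:
			// "ready" or an unrecognised value: continue with mesh setup.
			return nil
		}
	}
	// Capability not reported yet (assumed non-fatal).
	return nil
}

func main() {
	caps := []capability{{Name: "cilium-cni", Details: map[string]string{"status": "not-installed"}}}
	if err := preflightCilium("cluster-ffm-1", caps); err != nil {
		fmt.Println("pre-flight failed:", err)
	}
}
```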
What cluster-operator owns vs. what this provider owns
| Concern | Owner | How |
|---|---|---|
| Cluster provisioning | cluster-operator | Creates Crossplane Claims (CCE, ShootClaim) |
| Cilium CNI installation | Crossplane composition | Gardener: networking.type: cilium; OTC: replaced via Helm |
| Kubeconfig Secret lifecycle | cluster-operator | Copies/creates Secret, writes kubeconfigSecretRef to status |
| Cilium capability detection | cluster-operator | CiliumChecker → status.capabilities[cilium-cni] |
| ClusterMesh setup | this provider | CA sync, identity, connection secrets |
| ClusterMesh status reporting | this provider | Writes to ClusterMeshBinding.status in kcp |
| ClusterMesh teardown | this provider | Finalizer-driven cleanup on CR deletion |
Prerequisites
Before a ClusterMeshBinding can be reconciled, all of the following must be true:
1. kcp infrastructure
- kcp is running with the edge-portal-networking APIExport registered:
  kubectl apply -f config/kcp-cilium-mesh-apiexport.yaml # in system workspace
  kubectl apply -f config/kcp-cilium-mesh-rbac.yaml
- Tenant workspace has the APIBinding for networking CRDs:
  kubectl apply -f config/kcp-cilium-mesh-apibinding.yaml # in tenant workspace
- The edge-portal-fleet APIExport (from kcp-karmada-provider) is also registered, because ClusterMeshBinding references KubernetesCluster CRs from that export
2. Cluster provisioning
- At least 2 KubernetesCluster CRs exist in the same kcp workspace namespace
- Each KubernetesCluster has status.phase: Ready (for kcp-karmada-provider) or the corresponding Cluster CR on the management plane has condition Ready=True
- Each management-plane Cluster CR has a valid status.kubeconfigSecretRef pointing to a Secret with a working kubeconfig
3. Cilium on target clusters
- Cilium CNI is installed and running on every referenced cluster
  - OSC/Gardener: Automatic via networking.type: cilium in the Shoot spec
  - OTC: Requires uninstalling yangtse-cilium and installing upstream Cilium (currently done by the kcpcrospplane/cilium-cluster-mesh Helm job; Phase 2 will have the Crossplane composition handle this)
  - KIND: Must install Cilium manually or via Helm (cilium install)
  - IMPORTED: User must ensure Cilium is pre-installed
- clustermesh-apiserver is enabled in Cilium's Helm values:
  # In Cilium Helm values (Crossplane composition or manual install):
  clustermesh:
    useAPIServer: true
    apiserver:
      service:
        type: LoadBalancer # or NodePort for dev
- cluster-operator's CapabilitiesReconciler reports cilium-cni.status=ready on the management-plane Cluster CR (this confirms the above is working)
4. Network connectivity
- Clusters can reach each other's clustermesh-apiserver Service endpoint (LoadBalancer IP or NodePort)
- For OTC: an ELB (Elastic Load Balancer) must be pre-provisioned and its ID annotated on the KubernetesCluster CR:
  metadata:
    annotations:
      networking.edge-portal.eu/elb-id: "41fae315-2964-49c4-afde-bc6d358cd031"
How to establish a mesh
Step 1: Provision clusters
Using the portal or directly in kcp, create KubernetesCluster CRs:
apiVersion: platform.edge-portal.eu/v1alpha1
kind: KubernetesCluster
metadata:
  name: cluster-ffm-1
  namespace: default
  annotations:
    networking.edge-portal.eu/elb-id: "c582b56c-2b4e-4c83-9d9e-29e12893c6f6" # OTC only
spec:
  providerRef:
    name: kind # or otc, osc
  region: eu-de
  nodeCount: 3
---
apiVersion: platform.edge-portal.eu/v1alpha1
kind: KubernetesCluster
metadata:
  name: cluster-ber-2
  namespace: default
  annotations:
    networking.edge-portal.eu/elb-id: "41fae315-2964-49c4-afde-bc6d358cd031"
spec:
  providerRef:
    name: kind
  region: eu-de
  nodeCount: 2
Wait for both clusters to reach status.phase: Ready.
Step 2: Verify Cilium is running
Check that cluster-operator has detected Cilium:
# On the management plane:
kubectl get cluster cluster-ffm-1 -o jsonpath='{.status.capabilities}' | jq '.[] | select(.name=="cilium-cni")'
Expected output:
{
  "name": "cilium-cni",
  "details": {
    "status": "ready",
    "version": "1.16.1",
    "ready-nodes": "3",
    "desired-nodes": "3"
  }
}
Step 3: Create the ClusterMeshBinding
In the kcp tenant workspace:
apiVersion: networking.edge-portal.eu/v1alpha1
kind: ClusterMeshBinding
metadata:
  name: edge-mesh-eu
  namespace: default
spec:
  clusterRefs:
    - name: cluster-ffm-1
    - name: cluster-ber-2
  meshConfig:
    serviceType: LoadBalancer
    hubble:
      enabled: true
      ui: false
Step 4: Monitor status
# Watch the binding status:
kubectl get clustermeshbinding edge-mesh-eu -w
# Detailed status:
kubectl get clustermeshbinding edge-mesh-eu -o yaml
The phase progression is:
Pending → Connecting → Connected
└→ Degraded (partial connectivity)
└→ Error (prerequisite not met)
Step 5: Export a global service
Once the binding reports phase: Connected, create a GlobalService to export services across the mesh:
apiVersion: networking.edge-portal.eu/v1alpha1
kind: GlobalService
metadata:
  name: my-service-global
  namespace: default
spec:
  meshRef:
    name: edge-mesh-eu
  selector:
    namespace: default
    name: my-service
The provider annotates my-service with io.cilium/global-service: "true" on all
mesh clusters. Pods on cluster-ber-2 can now reach my-service as if it were local.
Step 6: Define connectivity policies
Create a ConnectivityIntent to control cross-cluster traffic:
apiVersion: networking.edge-portal.eu/v1alpha1
kind: ConnectivityIntent
metadata:
  name: allow-frontend-backend
  namespace: default
spec:
  meshRef:
    name: edge-mesh-eu
  action: Allow
  source:
    namespace: production
    name: frontend
  destination:
    namespace: production
    name: backend
  ports:
    - protocol: TCP
      port: 8080
The provider creates CiliumNetworkPolicy resources on all mesh clusters.
Teardown
Delete resources in reverse dependency order. Each CR has a finalizer that cleans up on the target clusters before allowing deletion:
# 1. Remove policies (deletes CiliumNetworkPolicy from clusters)
kubectl delete connectivityintent allow-frontend-backend
# 2. Remove global services (removes annotations from clusters)
kubectl delete globalservice my-service-global
# 3. Remove the mesh (removes connection secrets from clusters)
kubectl delete clustermeshbinding edge-mesh-eu
This does not uninstall Cilium — it only removes the mesh configuration, service exports, and network policies created by this provider.
Troubleshooting
ClusterMeshBinding
| Symptom | Cause | Fix |
|---|---|---|
| Phase stuck at Pending | KubernetesCluster CRs not Ready | Check cluster provisioning in kcp-karmada-provider logs |
| Phase Error: "Cilium not found" | Cilium not installed on cluster | Check Crossplane composition or install Cilium manually |
| Phase Error: "clustermesh-apiserver not found" | Cilium installed without clustermesh | Upgrade Cilium Helm: --set clustermesh.useAPIServer=true |
| Phase Error: "reading kubeconfig secret" | Management plane Secret missing | Verify kubectl get cluster <name> -o jsonpath='{.status.kubeconfigSecretRef}' |
| Phase Degraded | Connectivity issue between clusters | Check that clustermesh-apiserver Service has an external IP/NodePort |
| Phase Connecting (stuck) | Secrets exchanged but not yet synced | Wait ~60s; Cilium agents need time to establish kvstore connection |
GlobalService
| Symptom | Cause | Fix |
|---|---|---|
| Phase Pending | Referenced ClusterMeshBinding not Connected | Wait for mesh to connect, check binding status |
| Phase Error: "ClusterMeshBinding not found" | Wrong meshRef.name | Fix the reference name |
| Phase Error: "Service not found" | Service doesn't exist on target clusters | Create the Service on the clusters first |
| Phase Degraded | Annotation failed on some clusters | Check cluster connectivity, kubeconfig secrets |
ConnectivityIntent
| Symptom | Cause | Fix |
|---|---|---|
| Phase Pending | Referenced ClusterMeshBinding not Connected | Wait for mesh to connect |
| Phase Error: "validation failed" | Missing required spec fields | Check source.namespace and destination.namespace |
| Phase Degraded | Policy apply failed on some clusters | Check target cluster access and Cilium CRD availability |
| Phase Applied but traffic blocked | Wrong selector or namespace | Verify source/destination labels match actual pods |
Usage
Build
make build
Run locally
export KCP_KUBECONFIG=~/.kcp/admin.kubeconfig
export MGMT_KUBECONFIG=~/.kube/config
export NODE_ID=dev-node
make run
Docker
make docker-build
kcp Setup
# In the kcp system workspace:
kubectl apply -f config/kcp-cilium-mesh-apiexport.yaml
kubectl apply -f config/kcp-cilium-mesh-rbac.yaml
# In tenant workspaces:
kubectl apply -f config/kcp-cilium-mesh-apibinding.yaml
Flags
| Flag | Env | Default | Description |
|---|---|---|---|
| --kcp-kubeconfig | KCP_KUBECONFIG | - | kcp kubeconfig (required) |
| --mgmt-kubeconfig | MGMT_KUBECONFIG | - | Management plane kubeconfig (required) |
| --mgmt-namespace | MGMT_NAMESPACE | default | Namespace for cluster kubeconfig secrets |
| --node-id | NODE_ID | - | Stable provider node ID (required) |
| --api-export-name | - | edge-portal-networking | kcp APIExport name |
| --kcp-external-url | KCP_EXTERNAL_URL | - | Override kcp host for dev setups |
| --reconcile-interval | - | 30s | Endpoint discovery interval |
| --mesh-check-interval | - | 5m | Mesh health check interval |
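For orientation, the flag and environment variable pairing above can be expressed with the standard library flag package roughly as follows. This is a sketch under assumptions, not the code in main.go: the options struct, parseOptions and envOr are hypothetical, and "flag value overrides environment variable" is an assumed precedence.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"time"
)

// options holds the settings from the flags table.
type options struct {
	KCPKubeconfig, MgmtKubeconfig, MgmtNamespace string
	NodeID, APIExportName, KCPExternalURL        string
	ReconcileInterval, MeshCheckInterval         time.Duration
}

// envOr returns the environment value for key, or fallback if unset.
func envOr(getenv func(string) string, key, fallback string) string {
	if v := getenv(key); v != "" {
		return v
	}
	return fallback
}

// parseOptions uses environment values as flag defaults, so an explicit
// flag wins over the environment (assumed precedence).
func parseOptions(args []string, getenv func(string) string) (options, error) {
	var o options
	fs := flag.NewFlagSet("kcp-cilium-mesh-provider", flag.ContinueOnError)
	fs.StringVar(&o.KCPKubeconfig, "kcp-kubeconfig", getenv("KCP_KUBECONFIG"), "kcp kubeconfig (required)")
	fs.StringVar(&o.MgmtKubeconfig, "mgmt-kubeconfig", getenv("MGMT_KUBECONFIG"), "management plane kubeconfig (required)")
	fs.StringVar(&o.MgmtNamespace, "mgmt-namespace", envOr(getenv, "MGMT_NAMESPACE", "default"), "namespace for cluster kubeconfig secrets")
	fs.StringVar(&o.NodeID, "node-id", getenv("NODE_ID"), "stable provider node ID (required)")
	fs.StringVar(&o.APIExportName, "api-export-name", "edge-portal-networking", "kcp APIExport name")
	fs.StringVar(&o.KCPExternalURL, "kcp-external-url", getenv("KCP_EXTERNAL_URL"), "override kcp host for dev setups")
	fs.DurationVar(&o.ReconcileInterval, "reconcile-interval", 30*time.Second, "endpoint discovery interval")
	fs.DurationVar(&o.MeshCheckInterval, "mesh-check-interval", 5*time.Minute, "mesh health check interval")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	if o.KCPKubeconfig == "" || o.MgmtKubeconfig == "" || o.NodeID == "" {
		return o, errors.New("--kcp-kubeconfig, --mgmt-kubeconfig and --node-id are required")
	}
	return o, nil
}

func main() {
	// Demo with a fake environment instead of os.Getenv.
	env := map[string]string{
		"KCP_KUBECONFIG":  "/tmp/kcp.kubeconfig",
		"MGMT_KUBECONFIG": "/tmp/mgmt.kubeconfig",
	}
	o, err := parseOptions([]string{"--node-id=dev-node"}, func(k string) string { return env[k] })
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", o)
}
```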
Relationship to Existing Components
- Replaces: kcpcrospplane/cilium-cluster-mesh/ (imperative Helm job)
- Reads from: cluster-operator kubeconfig Secrets on mgmt plane
- Watches: KubernetesCluster CRs (from kcp-karmada-provider's APIExport)
- Feeds into: platform/data-pipeline/ Hubble metrics (future)
- Portal integration: edge-connect-portal-poc renders GlobalService/ConnectivityIntent in the Networking page alongside ClusterMeshBinding
Source Layout
main.go                            Entry point, flag parsing, reconciler wiring
controller.go                      Main controller loop: informers, dispatch, finalizer handling
mesh_reconciler.go                 ClusterMeshBinding → Cilium mesh setup
globalservice_reconciler.go        GlobalService → io.cilium/global-service annotations
connectivity_reconciler.go         ConnectivityIntent → CiliumNetworkPolicy / CiliumClusterwideNetworkPolicy
conditions.go                      Shared condition constants and builder helpers
status_updater.go                  Status patch writer for all three CRDs
endpoint_discovery.go              VW endpoint discovery from APIExportEndpointSlice
conditions_test.go                 Unit tests for condition helpers
config/
  kcp-cilium-mesh-apiexport.yaml   APIResourceSchema + APIExport (3 schemas)
  kcp-cilium-mesh-apibinding.yaml  APIBinding for tenant workspaces
  kcp-cilium-mesh-rbac.yaml        RBAC for the provider
samples/
  mesh-binding.yaml                Example ClusterMeshBinding
  global-service.yaml              Example GlobalService
  connectivity-intent.yaml         Example ConnectivityIntent