
kcp-cilium-mesh-provider

A kcp provider that watches networking custom resources and configures Cilium cluster mesh, global service export, and cross-cluster network policies on referenced Kubernetes clusters.

Overview

This provider follows the same pattern as kcp-karmada-provider:

  1. Discovers virtual workspace endpoints via kcp APIExportEndpointSlice
  2. Watches three networking CRs across all tenant workspaces
  3. Resolves cluster references to obtain kubeconfig access via management plane
  4. Reconciles desired state onto target clusters
  5. Reports status back to kcp

Networking CRDs

| CRD | Purpose | Depends On |
|---|---|---|
| ClusterMeshBinding | Establishes Cilium cluster mesh between 2+ clusters | KubernetesCluster CRs |
| GlobalService | Exports a Kubernetes Service across all mesh clusters | ClusterMeshBinding (Connected) |
| ConnectivityIntent | Declares cross-cluster Allow/Deny policy → Cilium network policies | ClusterMeshBinding (Connected) |

All three resources live in the networking.edge-portal.eu API group.

CRD: ClusterMeshBinding

apiVersion: networking.edge-portal.eu/v1alpha1
kind: ClusterMeshBinding
metadata:
  name: edge-mesh-eu
  namespace: default
spec:
  clusterRefs:
    - name: cluster-ffm-1       # KubernetesCluster CR in same namespace
    - name: cluster-ber-2
  meshConfig:
    serviceType: LoadBalancer   # LoadBalancer or NodePort
    enableKvStoreMesh: false
    hubble:
      enabled: false
      ui: false

Status

status:
  phase: Connected              # Pending | Connecting | Connected | Degraded | Error
  observedGeneration: 1
  lastSyncTime: "2026-04-21T10:00:00Z"
  meshStatus:
    - clusterName: cluster-ffm-1
      clusterId: 1
      connected: true
      readyNodes: 3
      totalNodes: 3
    - clusterName: cluster-ber-2
      clusterId: 2
      connected: true
      readyNodes: 2
      totalNodes: 2
  conditions:
    - type: MeshConnected
      status: "True"
      reason: MeshEstablished
      message: "Mesh established between 2 clusters"
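
The relationship between the per-cluster meshStatus entries and the top-level phase can be sketched as follows. This is a minimal illustration, not the provider's actual logic: the ClusterStatus type and the derivePhase helper are hypothetical names, and the mapping rule (all connected → Connected, some → Degraded, none → Connecting) is an assumption based on the phase names listed above.

```go
package main

import "fmt"

// ClusterStatus mirrors one entry of status.meshStatus from the example above.
// The Go type itself is hypothetical.
type ClusterStatus struct {
	ClusterName string
	Connected   bool
	ReadyNodes  int
	TotalNodes  int
}

// derivePhase is a hypothetical helper: it maps per-cluster connectivity
// to one of the documented phase strings. The exact rule is an assumption.
func derivePhase(statuses []ClusterStatus) string {
	if len(statuses) == 0 {
		return "Pending"
	}
	connected := 0
	for _, s := range statuses {
		if s.Connected {
			connected++
		}
	}
	switch {
	case connected == len(statuses):
		return "Connected"
	case connected > 0:
		return "Degraded"
	default:
		return "Connecting"
	}
}

func main() {
	statuses := []ClusterStatus{
		{ClusterName: "cluster-ffm-1", Connected: true, ReadyNodes: 3, TotalNodes: 3},
		{ClusterName: "cluster-ber-2", Connected: true, ReadyNodes: 2, TotalNodes: 2},
	}
	fmt.Println(derivePhase(statuses))
}
```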

CRD: GlobalService

Declares that a Kubernetes Service should be exported across the cluster mesh. The reconciler annotates the Service with io.cilium/global-service: "true" and optionally io.cilium/shared-service: "true" on target clusters.

Prerequisite: The referenced ClusterMeshBinding must be in the Connected or Degraded phase.

apiVersion: networking.edge-portal.eu/v1alpha1
kind: GlobalService
metadata:
  name: nginx-global
  namespace: default
spec:
  meshRef:
    name: edge-mesh-eu           # ClusterMeshBinding in same namespace
  selector:
    namespace: production         # Namespace of the Service on target clusters
    name: nginx                   # Service name to export
    # OR use labelSelector to match multiple services:
    # labelSelector:
    #   app: frontend
  targetClusters:                 # Optional: subset of mesh clusters (empty = all)
    - cluster-ffm-1
  globalAnnotations:              # Extra annotations applied to the exported Service
    service.cilium.io/shared: "true"
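
As a rough sketch of what the spec above implies, the following shows how the target cluster list and the Service annotations might be computed. The function names desiredAnnotations and resolveTargets are hypothetical, and the decision to silently skip target names that are not mesh members is an assumption; only the annotation keys and the "empty = all" rule come from this README.

```go
package main

import "fmt"

const (
	annGlobal = "io.cilium/global-service"
	annShared = "io.cilium/shared-service"
)

// desiredAnnotations returns a copy of the Service's existing annotations
// with the export annotations set. Entries in extra (spec.globalAnnotations)
// are applied last. Hypothetical helper.
func desiredAnnotations(existing map[string]string, shared bool, extra map[string]string) map[string]string {
	out := make(map[string]string, len(existing)+len(extra)+2)
	for k, v := range existing {
		out[k] = v
	}
	out[annGlobal] = "true"
	if shared {
		out[annShared] = "true"
	}
	for k, v := range extra {
		out[k] = v
	}
	return out
}

// resolveTargets implements "empty targetClusters = all mesh clusters".
// Names that are not part of the mesh are dropped (an assumption).
func resolveTargets(meshClusters, targetClusters []string) []string {
	if len(targetClusters) == 0 {
		return append([]string(nil), meshClusters...)
	}
	inMesh := make(map[string]bool, len(meshClusters))
	for _, c := range meshClusters {
		inMesh[c] = true
	}
	var out []string
	for _, t := range targetClusters {
		if inMesh[t] {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	mesh := []string{"cluster-ffm-1", "cluster-ber-2"}
	fmt.Println(resolveTargets(mesh, []string{"cluster-ffm-1"}))
	fmt.Println(desiredAnnotations(nil, false, map[string]string{"service.cilium.io/shared": "true"}))
}
```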

GlobalService Status

status:
  phase: Exported               # Pending | Exporting | Exported | Degraded | Error
  observedGeneration: 1
  lastSyncTime: "2026-04-21T10:00:00Z"
  exportedClusters:
    - cluster-ffm-1
    - cluster-ber-2
  endpointSummary:
    - clusterName: cluster-ffm-1
      endpoints: 3
      ready: true
    - clusterName: cluster-ber-2
      endpoints: 2
      ready: true
  conditions:
    - type: Accepted
      status: "True"
      reason: SpecValid
    - type: Exported
      status: "True"
      reason: AllClustersExported
    - type: Ready
      status: "True"
      reason: Exported

CRD: ConnectivityIntent

Declares a cross-cluster network policy. The reconciler translates the intent into CiliumNetworkPolicy (namespaced) or CiliumClusterwideNetworkPolicy (cluster-scoped) and applies them to all mesh clusters.

Prerequisite: The referenced ClusterMeshBinding must be in the Connected or Degraded phase.

apiVersion: networking.edge-portal.eu/v1alpha1
kind: ConnectivityIntent
metadata:
  name: frontend-to-backend
  namespace: default
spec:
  meshRef:
    name: edge-mesh-eu           # ClusterMeshBinding in same namespace
  action: Allow                   # Allow | Deny
  policyMode: Namespaced          # Namespaced | Clusterwide
  source:
    namespace: production
    name: frontend                # Maps to k8s:app label in Cilium selector
    # OR selector: { "role": "web" }
  destination:
    namespace: production
    name: backend
  ports:
    - protocol: TCP
      port: 8080
    - protocol: TCP
      port: 443
  probe:
    enabled: false                # Future: reachability probes
    intervalSeconds: 60
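
To make the translation step more concrete, here is a minimal sketch of how such an intent could be turned into a CiliumNetworkPolicy document. It uses only the standard library and builds the object as a generic map. The Intent type and buildPolicy function are hypothetical, and several choices are assumptions rather than the provider's confirmed behavior: the policy is placed in the destination namespace, the namespace is matched via the k8s:io.kubernetes.pod.namespace label, and Deny is mapped to Cilium's ingressDeny rules. The k8s:app label mapping follows the comment in the spec above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// Hypothetical Go mirrors of the ConnectivityIntent spec fields used here.
type Endpoint struct{ Namespace, Name string }
type Port struct {
	Protocol string
	Port     int
}
type Intent struct {
	Name        string
	Action      string // "Allow" or "Deny"
	Source      Endpoint
	Destination Endpoint
	Ports       []Port
}

// selector builds a Cilium endpoint selector for a workload.
func selector(e Endpoint) map[string]any {
	return map[string]any{"matchLabels": map[string]string{
		"k8s:app":                         e.Name,
		"k8s:io.kubernetes.pod.namespace": e.Namespace,
	}}
}

// buildPolicy sketches the intent → CiliumNetworkPolicy translation.
func buildPolicy(in Intent) (map[string]any, error) {
	ruleKey := "ingress"
	switch in.Action {
	case "Allow":
	case "Deny":
		ruleKey = "ingressDeny"
	default:
		return nil, fmt.Errorf("unsupported action %q", in.Action)
	}
	rule := map[string]any{"fromEndpoints": []any{selector(in.Source)}}
	if len(in.Ports) > 0 {
		ports := make([]map[string]string, 0, len(in.Ports))
		for _, p := range in.Ports {
			// Cilium expects the port number as a string.
			ports = append(ports, map[string]string{"port": strconv.Itoa(p.Port), "protocol": p.Protocol})
		}
		rule["toPorts"] = []any{map[string]any{"ports": ports}}
	}
	return map[string]any{
		"apiVersion": "cilium.io/v2",
		"kind":       "CiliumNetworkPolicy",
		"metadata":   map[string]string{"name": in.Name, "namespace": in.Destination.Namespace},
		"spec": map[string]any{
			"endpointSelector": selector(in.Destination),
			ruleKey:            []any{rule},
		},
	}, nil
}

func main() {
	policy, err := buildPolicy(Intent{
		Name:        "frontend-to-backend",
		Action:      "Allow",
		Source:      Endpoint{Namespace: "production", Name: "frontend"},
		Destination: Endpoint{Namespace: "production", Name: "backend"},
		Ports:       []Port{{Protocol: "TCP", Port: 8080}, {Protocol: "TCP", Port: 443}},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	out, err := json.MarshalIndent(policy, "", "  ")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```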

ConnectivityIntent Status

status:
  phase: Applied                # Pending | Applying | Applied | Degraded | Error
  observedGeneration: 1
  lastSyncTime: "2026-04-21T10:00:00Z"
  appliedClusters:
    - cluster-ffm-1
    - cluster-ber-2
  probeStatus:                   # Only present if probe.enabled=true
    lastProbeTime: "2026-04-21T10:05:00Z"
    success: true
    message: "probe not yet implemented; reporting apply status as proxy"
  conditions:
    - type: Accepted
      status: "True"
      reason: SpecValid
    - type: Applied
      status: "True"
      reason: AllClustersApplied
    - type: Ready
      status: "True"
      reason: Applied

Condition Types (all three CRDs)

| Condition | Used By | Meaning |
|---|---|---|
| Ready | All | Terminal: resource is fully reconciled |
| Accepted | GlobalService, ConnectivityIntent | Spec validated |
| Exported | GlobalService | Service annotations applied |
| Applied | ConnectivityIntent | Cilium policies applied |
| Reachable | ConnectivityIntent | Probe succeeded (future) |
| Degraded | All | Partial success (some clusters failed) |
| MeshConnected | ClusterMeshBinding | Legacy: mesh established |
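
A condition helper in the spirit of the shared builders in conditions.go might look like the sketch below. It is not copied from the repository: the Condition type and setCondition function are hypothetical, and the update rule (only bump the transition time when the status value changes) is an assumption modeled on common Kubernetes condition handling.

```go
package main

import (
	"fmt"
	"time"
)

// Condition is a simplified, hypothetical stand-in for a status condition.
type Condition struct {
	Type               string
	Status             string // "True", "False" or "Unknown"
	Reason             string
	Message            string
	LastTransitionTime time.Time
}

// setCondition inserts or updates the condition with the same Type.
// LastTransitionTime only changes when Status actually flips.
func setCondition(conds []Condition, c Condition, now time.Time) []Condition {
	for i := range conds {
		if conds[i].Type != c.Type {
			continue
		}
		if conds[i].Status != c.Status {
			conds[i].LastTransitionTime = now
		}
		conds[i].Status = c.Status
		conds[i].Reason = c.Reason
		conds[i].Message = c.Message
		return conds
	}
	c.LastTransitionTime = now
	return append(conds, c)
}

func main() {
	now := time.Now()
	var conds []Condition
	conds = setCondition(conds, Condition{Type: "Accepted", Status: "True", Reason: "SpecValid"}, now)
	conds = setCondition(conds, Condition{Type: "Ready", Status: "False", Reason: "Applying"}, now)
	conds = setCondition(conds, Condition{Type: "Ready", Status: "True", Reason: "Applied"}, now.Add(time.Minute))
	for _, c := range conds {
		fmt.Printf("%s=%s (%s)\n", c.Type, c.Status, c.Reason)
	}
}
```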

Architecture

kcp (virtual workspace)          Management Plane              Target Clusters
┌──────────────────────┐    ┌──────────────────────┐    ┌──────────────────┐
│  ClusterMeshBinding  │    │  cluster-operator     │    │  cluster-ffm-1   │
│  GlobalService       │    │  Cluster CRs          │    │  ├─ cilium       │
│  ConnectivityIntent  │◄───│  kubeconfig Secrets    │───►│  ├─ clustermesh  │
│  KubernetesCluster   │    │                       │    │  ├─ svc exports  │
│  (APIExport)         │    │                       │    │  └─ net policies │
└──────────────────────┘    └──────────────────────┘    ├──────────────────┤
         │ watch                     │ read              │  cluster-ber-2   │
         │                           │ kubeconfig        │  ├─ cilium       │
         ▼                           ▼                   │  ├─ clustermesh  │
┌──────────────────────────────────────────────────┐    │  ├─ svc exports  │
│          kcp-cilium-mesh-provider                │    │  └─ net policies │
│  ┌─────────────────┐  ┌──────────────────────┐   │    └──────────────────┘
│  │ EndpointDiscovery│  │ MeshReconciler       │───┼───── CA sync, secrets
│  │ Controller       │  │ GlobalSvcReconciler  │───┼───── svc annotations
│  │ StatusUpdater    │  │ ConnIntentReconciler  │───┼───── CiliumNetworkPolicy
│  └─────────────────┘  └──────────────────────┘   │
└──────────────────────────────────────────────────┘

Reconciler Pipeline

┌─────────────────────────┐
│  ClusterMeshBinding CR  │──── MeshReconciler
│  phase: Connected       │        ├─ Verify Cilium
└────────────┬────────────┘        ├─ Set cluster identity
             │ referenced by       ├─ Sync CA
             │                     ├─ Exchange secrets
             ▼                     └─ Report meshStatus[]
┌─────────────────────────┐
│  GlobalService CR       │──── GlobalServiceReconciler
│  meshRef: <binding>     │        ├─ Resolve mesh binding
│  phase: Exported        │        ├─ Find Service on clusters
└────────────┬────────────┘        ├─ Annotate io.cilium/global-service
             │ same mesh           └─ Report endpointSummary[]
             ▼
┌─────────────────────────┐
│  ConnectivityIntent CR  │──── ConnectivityIntentReconciler
│  meshRef: <binding>     │        ├─ Resolve mesh binding
│  phase: Applied         │        ├─ Build CiliumNetworkPolicy
└─────────────────────────┘        ├─ Apply to target clusters
                                   └─ Report appliedClusters[]

Mesh Setup Steps

The reconciler mirrors the logic from the existing cilium-cluster-mesh Helm job, but uses Kubernetes API calls instead of shell commands:

| Step | Shell Script | Provider |
|---|---|---|
| 1. Verify Cilium | cilium status | DaemonSet GET |
| 2. Set cluster identity | cilium install --set cluster.name/id | ConfigMap PATCH |
| 3. Sync CA | kubectl get/create secret cilium-ca | Secret GET + CREATE |
| 4. Enable clustermesh | cilium clustermesh enable | Deployment check (expects pre-installed) |
| 5. Connect clusters | cilium clustermesh connect | Secret exchange between clusters |
| 6. Verify | cilium clustermesh status | Deployment + Secret checks |
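
The ordered, fail-fast nature of these steps can be illustrated with a small sketch. The step type, runSteps function, and the stubbed step bodies are all hypothetical; in the real provider each step would issue Kubernetes API calls against the target clusters rather than return canned results.

```go
package main

import (
	"errors"
	"fmt"
)

// step is a hypothetical unit of mesh setup work.
type step struct {
	name string
	run  func() error
}

// errNotFound is a stand-in error used by the stubbed steps below.
var errNotFound = errors.New("clustermesh-apiserver not found")

// runSteps executes steps in order and stops at the first failure,
// returning the names of the steps that completed.
func runSteps(steps []step) ([]string, error) {
	var completed []string
	for _, s := range steps {
		if err := s.run(); err != nil {
			return completed, fmt.Errorf("%s: %w", s.name, err)
		}
		completed = append(completed, s.name)
	}
	return completed, nil
}

func main() {
	ok := func() error { return nil }
	steps := []step{
		{name: "verify-cilium", run: ok},
		{name: "set-cluster-identity", run: ok},
		{name: "sync-ca", run: ok},
		{name: "check-clustermesh", run: func() error { return errNotFound }},
		{name: "connect-clusters", run: ok},
		{name: "verify", run: ok},
	}
	done, err := runSteps(steps)
	fmt.Println("completed:", done)
	fmt.Println("error:", err)
}
```

Stopping at the first failed step keeps the reported error specific to the prerequisite that is missing, which matches the per-step error messages listed under Troubleshooting.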

Integration with cluster-operator

The kcp-cilium-mesh-provider does not provision clusters or install Cilium. It relies on cluster-operator and the Crossplane compositions to prepare clusters before mesh setup begins. The interaction is read-only from the perspective of the management plane — the provider only reads Secrets and Cluster CRs, never writes them.

Data flow

                      Management Plane (cluster-operator)
                     ┌───────────────────────────────────────────────────────┐
                     │                                                       │
                     │  Cluster CR                    kubeconfig Secret       │
                     │  ┌──────────────────────┐      ┌────────────────────┐ │
                     │  │ metadata:             │      │ metadata:          │ │
                     │  │   name: cluster-ffm-1 │      │   name: (varies)  │ │
                     │  │   namespace: default   │      │   namespace: ...  │ │
                     │  │ spec:                  │      │ data:             │ │
                     │  │   provider: OTC | OSC  │      │   kubeconfig: ... │ │
                     │  │ status:                │      │   (or value: ...) │ │
                     │  │   conditions:          │      └────────────────────┘ │
                     │  │     - type: Ready ✓    │               ▲             │
                     │  │   kubeconfigSecretRef: ├───────────────┘             │
                     │  │     name: <secret>     │                             │
                     │  │     key: kubeconfig    │                             │
                     │  │   capabilities:        │                             │
                     │  │     - name: cilium-cni │◄──── CapabilitiesReconciler │
                     │  │       details:         │      (checks every 5 min)   │
                     │  │         status: ready  │                             │
                     │  │         version: 1.16  │                             │
                     │  └──────────────────────┘                               │
                     └───────────────────────────────────────────────────────┘
                                    │ read                    │ read
                                    ▼                         ▼
                     ┌───────────────────────────────────────────────────────┐
                     │            kcp-cilium-mesh-provider                    │
                     │                                                       │
                     │  1. Read Cluster CR → check conditions[Ready]=True    │
                     │  2. Read kubeconfigSecretRef → get Secret name + key  │
                     │  3. Fetch Secret → extract kubeconfig bytes           │
                     │  4. Build rest.Config → connect to target cluster     │
                     └───────────────────────────────────────────────────────┘

Kubeconfig Secret conventions

cluster-operator stores kubeconfig access differently per provider:

| Provider | Secret Name | Data Key | Created By |
|---|---|---|---|
| KIND | kind-<clusterName>-kubeconfig | kubeconfig | Kind reconciler |
| OTC (CCE) | From CCE.Status.Kubeconfig.KubeconfigSecretRef.Name | From .Key | Crossplane CCE composition |
| OSC (Gardener) | From ShootClaim.Status.Kubeconfig.SecretRef.Name | kubeconfig | Gardener extension |
| IMPORTED | From spec.importedK8SConfig.secretRef | User-defined | User |

The mesh provider reads the Cluster CR's status.kubeconfigSecretRef (.name + .key) and fetches the Secret from the same namespace as the Cluster CR. This is the same retrieval logic used by cluster-operator's GetKubeconfigFromCluster() utility.
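
The key lookup described above could be sketched like this. The SecretRef type and kubeconfigFromSecret function are hypothetical, and the fallback order (referenced key, then kubeconfig, then value) is an assumption drawn from the data-flow diagram rather than from cluster-operator's GetKubeconfigFromCluster() source. The Secret's data is modeled as a plain map so the example stays free of external dependencies.

```go
package main

import "fmt"

// SecretRef mirrors status.kubeconfigSecretRef (.name + .key). Hypothetical type.
type SecretRef struct {
	Name string
	Key  string
}

// kubeconfigFromSecret picks the kubeconfig bytes out of a Secret's data map.
// It tries the referenced key first, then the conventional fallbacks.
func kubeconfigFromSecret(data map[string][]byte, ref SecretRef) ([]byte, error) {
	var keys []string
	if ref.Key != "" {
		keys = append(keys, ref.Key)
	}
	keys = append(keys, "kubeconfig", "value")
	for _, k := range keys {
		if b, ok := data[k]; ok && len(b) > 0 {
			return b, nil
		}
	}
	return nil, fmt.Errorf("secret %q: no kubeconfig found under keys %v", ref.Name, keys)
}

func main() {
	data := map[string][]byte{"kubeconfig": []byte("apiVersion: v1\nkind: Config\n")}
	ref := SecretRef{Name: "kind-cluster-ffm-1-kubeconfig", Key: "kubeconfig"}
	b, err := kubeconfigFromSecret(data, ref)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("found %d bytes of kubeconfig\n", len(b))
}
```

With client-go available, the returned bytes would typically be handed to clientcmd.RESTConfigFromKubeConfig to obtain the rest.Config mentioned in step 4 of the data flow.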

Capability detection: cilium-cni

cluster-operator's CapabilitiesReconciler periodically (every 5 minutes) connects to each Ready cluster and runs registered checkers. The CiliumChecker queries the remote cluster for a DaemonSet in kube-system with label k8s-app=cilium and writes:

status:
  capabilities:
    - name: cilium-cni
      details:
        status: ready          # ready | not-installed | error
        version: "1.16.1"
        image: "quay.io/cilium/cilium:v1.16.1"
        desired-nodes: "3"
        ready-nodes: "3"
      lastTransitionTime: "2026-04-21T10:00:00Z"

The mesh provider can use this capability as a pre-flight check — if a referenced cluster's cilium-cni capability is not-installed or error, the mesh reconciler short-circuits with an Error status instead of timing out trying to reach the DaemonSet.
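
A pre-flight check along these lines might look like the following sketch. The Capability type and checkCiliumCapability function are hypothetical. Treating a missing cilium-cni entry as "not yet reported, proceed to the direct DaemonSet check" is an assumption; the README only specifies the short-circuit for not-installed and error.

```go
package main

import "fmt"

// Capability mirrors one entry of status.capabilities on the Cluster CR.
// Hypothetical type.
type Capability struct {
	Name    string
	Details map[string]string
}

// checkCiliumCapability returns an error when cluster-operator has already
// reported that Cilium is unusable, so the reconciler can fail fast.
func checkCiliumCapability(cluster string, caps []Capability) error {
	for _, c := range caps {
		if c.Name != "cilium-cni" {
			continue
		}
		switch status := c.Details["status"]; status {
		case "not-installed", "error":
			return fmt.Errorf("cluster %s: cilium-cni capability is %q", cluster, status)
		default:
			return nil
		}
	}
	// Capability not reported yet: let the regular DaemonSet check decide.
	return nil
}

func main() {
	caps := []Capability{{
		Name:    "cilium-cni",
		Details: map[string]string{"status": "not-installed"},
	}}
	if err := checkCiliumCapability("cluster-ffm-1", caps); err != nil {
		fmt.Println("pre-flight failed:", err)
	}
}
```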

What cluster-operator owns vs. what this provider owns

| Concern | Owner | How |
|---|---|---|
| Cluster provisioning | cluster-operator | Creates Crossplane Claims (CCE, ShootClaim) |
| Cilium CNI installation | Crossplane composition | Gardener: networking.type: cilium; OTC: replaced via Helm |
| Kubeconfig Secret lifecycle | cluster-operator | Copies/creates Secret, writes kubeconfigSecretRef to status |
| Cilium capability detection | cluster-operator | CiliumChecker → status.capabilities[cilium-cni] |
| ClusterMesh setup | this provider | CA sync, identity, connection secrets |
| ClusterMesh status reporting | this provider | Writes to ClusterMeshBinding.status in kcp |
| ClusterMesh teardown | this provider | Finalizer-driven cleanup on CR deletion |

Prerequisites

Before a ClusterMeshBinding can be reconciled, all of the following must be true:

1. kcp infrastructure

  • kcp is running with the edge-portal-networking APIExport registered
    kubectl apply -f config/kcp-cilium-mesh-apiexport.yaml   # in system workspace
    kubectl apply -f config/kcp-cilium-mesh-rbac.yaml
    
  • Tenant workspace has the APIBinding for networking CRDs
    kubectl apply -f config/kcp-cilium-mesh-apibinding.yaml  # in tenant workspace
    
  • The edge-portal-fleet APIExport (from kcp-karmada-provider) is also registered, because ClusterMeshBinding references KubernetesCluster CRs from that export

2. Cluster provisioning

  • At least 2 KubernetesCluster CRs exist in the same kcp workspace namespace
  • Each KubernetesCluster has status.phase: Ready (for kcp-karmada-provider) or the corresponding Cluster CR on the management plane has condition Ready=True
  • Each management-plane Cluster CR has a valid status.kubeconfigSecretRef pointing to a Secret with a working kubeconfig

3. Cilium on target clusters

  • Cilium CNI is installed and running on every referenced cluster

    • OSC/Gardener: Automatic via networking.type: cilium in the Shoot spec
    • OTC: Requires uninstalling yangtse-cilium and installing upstream Cilium (currently done by the kcpcrospplane/cilium-cluster-mesh Helm job; Phase 2 will have the Crossplane composition handle this)
    • KIND: Must install Cilium manually or via Helm (cilium install)
    • IMPORTED: User must ensure Cilium is pre-installed
  • clustermesh-apiserver is enabled in Cilium's Helm values:

    # In Cilium Helm values (Crossplane composition or manual install):
    clustermesh:
      useAPIServer: true
      apiserver:
        service:
          type: LoadBalancer    # or NodePort for dev
    
  • cluster-operator's CapabilitiesReconciler reports cilium-cni.status=ready on the management-plane Cluster CR (this confirms the above is working)

4. Network connectivity

  • Clusters can reach each other's clustermesh-apiserver Service endpoint (LoadBalancer IP or NodePort)
  • For OTC: an ELB (Elastic Load Balancer) must be pre-provisioned and its ID annotated on the KubernetesCluster CR:
    metadata:
      annotations:
        networking.edge-portal.eu/elb-id: "41fae315-2964-49c4-afde-bc6d358cd031"
    

How to establish a mesh

Step 1: Provision clusters

Using the portal or directly in kcp, create KubernetesCluster CRs:

apiVersion: platform.edge-portal.eu/v1alpha1
kind: KubernetesCluster
metadata:
  name: cluster-ffm-1
  namespace: default
  annotations:
    networking.edge-portal.eu/elb-id: "c582b56c-2b4e-4c83-9d9e-29e12893c6f6"  # OTC only
spec:
  providerRef:
    name: kind                    # or otc, osc
  region: eu-de
  nodeCount: 3
---
apiVersion: platform.edge-portal.eu/v1alpha1
kind: KubernetesCluster
metadata:
  name: cluster-ber-2
  namespace: default
  annotations:
    networking.edge-portal.eu/elb-id: "41fae315-2964-49c4-afde-bc6d358cd031"
spec:
  providerRef:
    name: kind
  region: eu-de
  nodeCount: 2

Wait for both clusters to reach status.phase: Ready.

Step 2: Verify Cilium is running

Check that cluster-operator has detected Cilium:

# On the management plane:
kubectl get cluster cluster-ffm-1 -o jsonpath='{.status.capabilities}' | jq '.[] | select(.name=="cilium-cni")'

Expected output:

{
  "name": "cilium-cni",
  "details": {
    "status": "ready",
    "version": "1.16.1",
    "ready-nodes": "3",
    "desired-nodes": "3"
  }
}

Step 3: Create the ClusterMeshBinding

In the kcp tenant workspace:

apiVersion: networking.edge-portal.eu/v1alpha1
kind: ClusterMeshBinding
metadata:
  name: edge-mesh-eu
  namespace: default
spec:
  clusterRefs:
    - name: cluster-ffm-1
    - name: cluster-ber-2
  meshConfig:
    serviceType: LoadBalancer
    hubble:
      enabled: true
      ui: false

Step 4: Monitor status

# Watch the binding status:
kubectl get clustermeshbinding edge-mesh-eu -w

# Detailed status:
kubectl get clustermeshbinding edge-mesh-eu -o yaml

The phase progression is:

Pending → Connecting → Connected
                    └→ Degraded (partial connectivity)
                    └→ Error (prerequisite not met)

Step 5: Export a global service

Once the binding reports phase: Connected, create a GlobalService to export a Service across the mesh:

apiVersion: networking.edge-portal.eu/v1alpha1
kind: GlobalService
metadata:
  name: my-service-global
  namespace: default
spec:
  meshRef:
    name: edge-mesh-eu
  selector:
    namespace: default
    name: my-service

The provider annotates my-service with io.cilium/global-service: "true" on all mesh clusters. Pods on cluster-ber-2 can now reach my-service as if it were local.

Step 6: Define connectivity policies

Create a ConnectivityIntent to control cross-cluster traffic:

apiVersion: networking.edge-portal.eu/v1alpha1
kind: ConnectivityIntent
metadata:
  name: allow-frontend-backend
  namespace: default
spec:
  meshRef:
    name: edge-mesh-eu
  action: Allow
  source:
    namespace: production
    name: frontend
  destination:
    namespace: production
    name: backend
  ports:
    - protocol: TCP
      port: 8080

The provider creates CiliumNetworkPolicy resources on all mesh clusters.

Teardown

Delete resources in reverse dependency order. Each CR has a finalizer that cleans up on the target clusters before allowing deletion:

# 1. Remove policies (deletes CiliumNetworkPolicy from clusters)
kubectl delete connectivityintent allow-frontend-backend

# 2. Remove global services (removes annotations from clusters)
kubectl delete globalservice my-service-global

# 3. Remove the mesh (removes connection secrets from clusters)
kubectl delete clustermeshbinding edge-mesh-eu

This does not uninstall Cilium — it only removes the mesh configuration, service exports, and network policies created by this provider.

Troubleshooting

ClusterMeshBinding

| Symptom | Cause | Fix |
|---|---|---|
| Phase stuck at Pending | KubernetesCluster CRs not Ready | Check cluster provisioning in kcp-karmada-provider logs |
| Phase Error: "Cilium not found" | Cilium not installed on cluster | Check Crossplane composition or install Cilium manually |
| Phase Error: "clustermesh-apiserver not found" | Cilium installed without clustermesh | Upgrade Cilium Helm: --set clustermesh.useAPIServer=true |
| Phase Error: "reading kubeconfig secret" | Management plane Secret missing | Verify kubectl get cluster <name> -o jsonpath='{.status.kubeconfigSecretRef}' |
| Phase Degraded | Connectivity issue between clusters | Check that clustermesh-apiserver Service has an external IP/NodePort |
| Phase Connecting (stuck) | Secrets exchanged but not yet synced | Wait ~60s; Cilium agents need time to establish kvstore connection |

GlobalService

| Symptom | Cause | Fix |
|---|---|---|
| Phase Pending | Referenced ClusterMeshBinding not Connected | Wait for mesh to connect, check binding status |
| Phase Error: "ClusterMeshBinding not found" | Wrong meshRef.name | Fix the reference name |
| Phase Error: "Service not found" | Service doesn't exist on target clusters | Create the Service on the clusters first |
| Phase Degraded | Annotation failed on some clusters | Check cluster connectivity, kubeconfig secrets |

ConnectivityIntent

| Symptom | Cause | Fix |
|---|---|---|
| Phase Pending | Referenced ClusterMeshBinding not Connected | Wait for mesh to connect |
| Phase Error: "validation failed" | Missing required spec fields | Check source.namespace and destination.namespace |
| Phase Degraded | Policy apply failed on some clusters | Check target cluster access and Cilium CRD availability |
| Phase Applied but traffic blocked | Wrong selector or namespace | Verify source/destination labels match actual pods |

Usage

Build

make build

Run locally

export KCP_KUBECONFIG=~/.kcp/admin.kubeconfig
export MGMT_KUBECONFIG=~/.kube/config
export NODE_ID=dev-node

make run

Docker

make docker-build

kcp Setup

# In the kcp system workspace:
kubectl apply -f config/kcp-cilium-mesh-apiexport.yaml
kubectl apply -f config/kcp-cilium-mesh-rbac.yaml

# In tenant workspaces:
kubectl apply -f config/kcp-cilium-mesh-apibinding.yaml

Flags

| Flag | Env | Default | Description |
|---|---|---|---|
| --kcp-kubeconfig | KCP_KUBECONFIG | - | kcp kubeconfig (required) |
| --mgmt-kubeconfig | MGMT_KUBECONFIG | - | Management plane kubeconfig (required) |
| --mgmt-namespace | MGMT_NAMESPACE | default | Namespace for cluster kubeconfig secrets |
| --node-id | NODE_ID | - | Stable provider node ID (required) |
| --api-export-name | - | edge-portal-networking | kcp APIExport name |
| --kcp-external-url | KCP_EXTERNAL_URL | - | Override kcp host for dev setups |
| --reconcile-interval | - | 30s | Endpoint discovery interval |
| --mesh-check-interval | - | 5m | Mesh health check interval |
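
For orientation, the flag and environment handling in the table could be wired up roughly as follows with the standard flag package. This is a sketch, not the contents of main.go: the Config type, envOr helper, and parseFlags function are hypothetical, and the precedence (explicit flag over environment variable over default) is an assumption.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"time"
)

// Config is a hypothetical holder for the flags in the table above.
type Config struct {
	KCPKubeconfig, MgmtKubeconfig, MgmtNamespace string
	NodeID, APIExportName, KCPExternalURL        string
	ReconcileInterval, MeshCheckInterval         time.Duration
}

// envOr returns the environment variable's value, or def when unset or empty.
func envOr(key, def string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return def
}

// parseFlags parses args; environment variables only supply defaults.
func parseFlags(args []string) (Config, error) {
	var c Config
	fs := flag.NewFlagSet("kcp-cilium-mesh-provider", flag.ContinueOnError)
	fs.StringVar(&c.KCPKubeconfig, "kcp-kubeconfig", envOr("KCP_KUBECONFIG", ""), "kcp kubeconfig (required)")
	fs.StringVar(&c.MgmtKubeconfig, "mgmt-kubeconfig", envOr("MGMT_KUBECONFIG", ""), "management plane kubeconfig (required)")
	fs.StringVar(&c.MgmtNamespace, "mgmt-namespace", envOr("MGMT_NAMESPACE", "default"), "namespace for cluster kubeconfig secrets")
	fs.StringVar(&c.NodeID, "node-id", envOr("NODE_ID", ""), "stable provider node ID (required)")
	fs.StringVar(&c.APIExportName, "api-export-name", "edge-portal-networking", "kcp APIExport name")
	fs.StringVar(&c.KCPExternalURL, "kcp-external-url", envOr("KCP_EXTERNAL_URL", ""), "override kcp host for dev setups")
	fs.DurationVar(&c.ReconcileInterval, "reconcile-interval", 30*time.Second, "endpoint discovery interval")
	fs.DurationVar(&c.MeshCheckInterval, "mesh-check-interval", 5*time.Minute, "mesh health check interval")
	if err := fs.Parse(args); err != nil {
		return c, err
	}
	required := []struct{ name, value string }{
		{"kcp-kubeconfig", c.KCPKubeconfig},
		{"mgmt-kubeconfig", c.MgmtKubeconfig},
		{"node-id", c.NodeID},
	}
	for _, r := range required {
		if r.value == "" {
			return c, fmt.Errorf("--%s is required", r.name)
		}
	}
	return c, nil
}

func main() {
	// Fixed sample arguments; a real binary would pass os.Args[1:].
	cfg, err := parseFlags([]string{
		"--kcp-kubeconfig=/tmp/kcp.kubeconfig",
		"--mgmt-kubeconfig=/tmp/mgmt.kubeconfig",
		"--node-id=dev-node",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```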

Relationship to Existing Components

  • Replaces: kcpcrospplane/cilium-cluster-mesh/ (imperative Helm job)
  • Reads from: cluster-operator kubeconfig Secrets on mgmt plane
  • Watches: KubernetesCluster CRs (from kcp-karmada-provider's APIExport)
  • Feeds into: platform/data-pipeline/ Hubble metrics (future)
  • Portal integration: edge-connect-portal-poc renders GlobalService/ConnectivityIntent in the Networking page alongside ClusterMeshBinding

Source Layout

main.go                      Entry point, flag parsing, reconciler wiring
controller.go                Main controller loop: informers, dispatch, finalizer handling
mesh_reconciler.go           ClusterMeshBinding → Cilium mesh setup
globalservice_reconciler.go  GlobalService → io.cilium/global-service annotations
connectivity_reconciler.go   ConnectivityIntent → CiliumNetworkPolicy / CiliumClusterwideNetworkPolicy
conditions.go                Shared condition constants and builder helpers
status_updater.go            Status patch writer for all three CRDs
endpoint_discovery.go        VW endpoint discovery from APIExportEndpointSlice
conditions_test.go           Unit tests for condition helpers
config/
  kcp-cilium-mesh-apiexport.yaml   APIResourceSchema + APIExport (3 schemas)
  kcp-cilium-mesh-apibinding.yaml  APIBinding for tenant workspaces
  kcp-cilium-mesh-rbac.yaml        RBAC for the provider
samples/
  mesh-binding.yaml           Example ClusterMeshBinding
  global-service.yaml         Example GlobalService
  connectivity-intent.yaml    Example ConnectivityIntent