December 16, 2021

Cilium Host Policies

At Puzzle, we are constantly looking for new bleeding-edge technologies that benefit our customers’ system and software architectures. This was also one of the reasons why Puzzle partnered with Isovalent a few months ago: they are the main company driving Cilium development and a major contributor to eBPF. We think the impact of these technologies will be massive in the near future, especially in the container platform ecosystem.

In order to strengthen our knowledge about Cilium even further, we recently gathered as a team of five system engineers to deep-dive into specific Cilium features for a full week. During this week, Isovalent organized multiple tech sessions with some of their Cilium engineers, giving us even more insight into the architecture and implementation of the following features:

  • Cilium Host Policies
  • Cilium Cluster Mesh
  • Kubernetes without Kube-Proxy
  • Cilium LoadBalancer

More Cilium-related content will follow in the future, but in this blog post we’ll stick with the first topic, a very handy feature of Cilium: Host Policies.

In the following guide, we would like to present a complete Cilium host policy example for an RKE2-based Kubernetes cluster. Using this feature, you can run and at the same time protect your RKE2 nodes without running any firewall daemon or maintaining iptables/nftables rules on the hosts. Because the host firewalling is performed directly in eBPF, this approach provides high performance and visibility right away. On top of that, you can manage the host policies directly inside Kubernetes with CiliumClusterwideNetworkPolicy custom resources.

Prerequisites

To benefit from all the mentioned features, some prerequisites need to be fulfilled:

  1. We recommend running the RKE2 cluster without any Kube-Proxy instance. It can be disabled with the disable-kube-proxy flag inside the RKE2 server config (/etc/rancher/rke2/config.yaml).
  2. Enable Cilium “Kube-Proxy free” mode by configuring kubeProxyReplacement=strict, k8sServiceHost=REPLACE_WITH_API_SERVER_IP and k8sServicePort=REPLACE_WITH_API_SERVER_PORT. (More details in the Cilium documentation; a configuration sketch follows after this list.)
  3. Supply an appropriate label to the RKE2 worker nodes, as RKE2 does not set one by default. Some host policies will be assigned to the workers based on this label:
    kubectl label node <worker-node-name> node-role.kubernetes.io/worker=true
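
For points 1 and 2, a minimal configuration sketch could look like the following. The exact mechanism depends on how Cilium is deployed; the snippet assumes RKE2’s bundled rke2-cilium Helm chart and a HelmChartConfig override, so treat the file locations, chart name and values as assumptions and adapt them to your own setup.

# Sketch: /etc/rancher/rke2/config.yaml on the RKE2 servers
# (assumption: Cilium is the CNI and Kube-Proxy is disabled)
cni: cilium
disable-kube-proxy: true

# Sketch: Helm values override for the bundled rke2-cilium chart,
# enabling the "Kube-Proxy free" mode from point 2
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-cilium
  namespace: kube-system
spec:
  valuesContent: |-
    kubeProxyReplacement: "strict"
    k8sServiceHost: REPLACE_WITH_API_SERVER_IP
    k8sServicePort: "REPLACE_WITH_API_SERVER_PORT"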

Configuration

Audit Mode Safeguard

First, set the Cilium policy enforcement mode for the host endpoints to audit. This is a crucial step before applying any host policy custom resources, because it’s easy to lock yourself out of your Kubernetes cluster/nodes by missing just a single port in your policies!

CILIUM_NAMESPACE=kube-system
for NODE_NAME in $(kubectl get nodes --no-headers=true | awk '{print $1}')
do
    CILIUM_POD_NAME=$(kubectl -n $CILIUM_NAMESPACE get pods -l "k8s-app=cilium" -o jsonpath="{.items[?(@.spec.nodeName=='$NODE_NAME')].metadata.name}")
    HOST_EP_ID=$(kubectl -n $CILIUM_NAMESPACE exec $CILIUM_POD_NAME -c cilium-agent -- cilium endpoint list -o jsonpath='{[?(@.status.identity.id==1)].id}')
    kubectl -n $CILIUM_NAMESPACE exec $CILIUM_POD_NAME -c cilium-agent -- cilium endpoint config $HOST_EP_ID PolicyAuditMode=Enabled
    kubectl -n $CILIUM_NAMESPACE exec $CILIUM_POD_NAME -c cilium-agent -- cilium endpoint config $HOST_EP_ID | grep PolicyAuditMode
done

To see the currently activated policy mode, use the following commands:

for NODE_NAME in $(kubectl get nodes --no-headers=true | awk '{print $1}')
do
    CILIUM_POD_NAME=$(kubectl -n $CILIUM_NAMESPACE get pods -l "k8s-app=cilium" -o jsonpath="{.items[?(@.spec.nodeName=='$NODE_NAME')].metadata.name}")
    kubectl -n $CILIUM_NAMESPACE exec $CILIUM_POD_NAME -c cilium-agent -- cilium endpoint list
done

The output should look something like this (search for the endpoints with the label reserved:host):

ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])                                                  IPv6   IPv4          STATUS  
           ENFORCEMENT        ENFORCEMENT
...
2152       Disabled (Audit)   Disabled          1          k8s:node-role.kubernetes.io/control-plane=true                                                    ready
                                                           k8s:node-role.kubernetes.io/etcd=true
                                                           k8s:node-role.kubernetes.io/master=true
                                                           k8s:node-role.kubernetes.io/worker=true
                                                           k8s:node.kubernetes.io/instance-type=rke2
                                                           reserved:host

Applying the Host Policies

Next, we recommend applying a suitable host policy for the master nodes first. You could theoretically lock yourself out of the cluster (unless policy audit mode is still enabled) as soon as another host policy is in place which does not explicitly allow the Kubernetes API (6443/TCP).

apiVersion: "cilium.io/v2"
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: "rke2-master-host-rule-set"
spec:
  description: "Cilium host policy set for RKE2 masters"
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/master: 'true'
      node.kubernetes.io/instance-type: rke2
  ingress:
  - fromEntities:
    - all
    toPorts:
    - ports:
        # Kubernetes API
      - port: "6443"
        protocol: TCP
        # RKE2 API for nodes to register
      - port: "9345"
        protocol: TCP

With the following set of base host policy rules, you retain SSH access and also allow some additional ports which are required by RKE2 and Cilium:

apiVersion: "cilium.io/v2"
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: "rke2-base-host-rule-set"
spec:
  description: "Cilium default host policy set for RKE2 nodes"
  nodeSelector:
    matchLabels:
      node.kubernetes.io/instance-type: rke2
  ingress:
  - fromEntities:
    - remote-node
    toPorts:
    - ports:
        # Cilium WireGuard
      - port: "51871"
        protocol: UDP
        # Cilium VXLAN
      - port: "8472"
        protocol: UDP
  - fromEntities:
    - health
    - remote-node
    toPorts:
    - ports:
        # Inter-node Cilium cluster health checks (please also have a look at the appendix for further details)
      - port: "4240"
        protocol: TCP
        # Cilium cilium-agent health status API
      - port: "9876"
        protocol: TCP
  - fromEntities:
    - cluster
    toPorts:
    - ports:
        # RKE2 Kubelet
      - port: "10250"
        protocol: TCP
        # Cilium cilium-agent Prometheus metrics
      - port: "9090"
        protocol: TCP
  - fromEntities:
    - all
    toPorts:
    - ports:
        # SSH
      - port: "22"
        protocol: TCP


Now it’s time for the etcd rules:

apiVersion: "cilium.io/v2"
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: "rke2-etcd-host-rule-set"
spec:
  description: "Cilium host policy set for RKE2 etcd nodes"
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/etcd: 'true'
      node.kubernetes.io/instance-type: rke2
  ingress:
  - fromEntities:
    - host
    - remote-node
    toPorts:
    - ports:
        # RKE2 etcd client port
      - port: "2379"
        protocol: TCP
        # RKE2 etcd peer communication port
      - port: "2380"
        protocol: TCP

And finally, the configuration can be completed by applying the rules which are required for worker nodes:

apiVersion: "cilium.io/v2"
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: "rke2-worker-host-rule-set"
spec:
  description: "Cilium host policy set for RKE2 workers"
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker: 'true'
      node.kubernetes.io/instance-type: rke2
  ingress:
  - fromEntities:
    - cluster
    toPorts:
    - ports:
        # Rancher monitoring Cilium operator metrics
      - port: "6942"
        protocol: TCP
        # Cilium Hubble relay
      - port: "4244"
        protocol: TCP
  - fromEntities:
    - all
    toPorts:
    - ports:
        # Ingress HTTP
      - port: "80"
        protocol: TCP
        # Ingress HTTPS
      - port: "443"
        protocol: TCP
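
Once all four rule sets are applied, you can quickly verify that they have been accepted; ccnp is the short name for CiliumClusterwideNetworkPolicy:

# List all applied Cilium cluster-wide policies
kubectl get ccnp
# Show the details of a single rule set, e.g. the worker one
kubectl describe ccnp rke2-worker-host-rule-set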

Policy Verification

Before you deactivate the policy audit mode, have a look at the packets which would have been dropped if the rules were already enforced:

# Set Cilium namespace
CILIUM_NAMESPACE=kube-system
# Print the Cilium pod names to the console:
kubectl -n $CILIUM_NAMESPACE get pods -l "k8s-app=cilium" -o jsonpath="{.items[*].metadata.name}"
# Find the Cilium endpoint ID of the host itself (again, search for the endpoint with the label "reserved:host"):
kubectl -n $CILIUM_NAMESPACE exec <cilium-pod-name> -c cilium-agent -- cilium endpoint list
# Output all connections (allowed & audited):
kubectl -n $CILIUM_NAMESPACE exec <cilium-pod-name> -c cilium-agent -- cilium monitor -t policy-verdict --related-to <host-endpoint-id>
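
If the output is too noisy, you can narrow it down to the packets that were only let through because of the audit mode. This is a simple grep sketch; the exact wording of the verdict messages may vary between Cilium versions:

# Only show audited policy verdicts for the host endpoint (sketch)
kubectl -n $CILIUM_NAMESPACE exec <cilium-pod-name> -c cilium-agent -- \
    cilium monitor -t policy-verdict --related-to <host-endpoint-id> | grep -i audit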

Activating the Host Policies

Once you’re confident that the rule sets are fine and no essential connections would be blocked, set the policy mode back to enforcing:

for NODE_NAME in $(kubectl get nodes --no-headers=true | awk '{print $1}')
do
    CILIUM_POD_NAME=$(kubectl -n $CILIUM_NAMESPACE get pods -l "k8s-app=cilium" -o jsonpath="{.items[?(@.spec.nodeName=='$NODE_NAME')].metadata.name}")
    HOST_EP_ID=$(kubectl -n $CILIUM_NAMESPACE exec $CILIUM_POD_NAME -c cilium-agent -- cilium endpoint list -o jsonpath='{[?(@.status.identity.id==1)].id}')
    kubectl -n $CILIUM_NAMESPACE exec $CILIUM_POD_NAME -c cilium-agent -- cilium endpoint config $HOST_EP_ID PolicyAuditMode=Disabled
done
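
To double-check, print the PolicyAuditMode of each host endpoint again; it should now report Disabled on every node:

for NODE_NAME in $(kubectl get nodes --no-headers=true | awk '{print $1}')
do
    CILIUM_POD_NAME=$(kubectl -n $CILIUM_NAMESPACE get pods -l "k8s-app=cilium" -o jsonpath="{.items[?(@.spec.nodeName=='$NODE_NAME')].metadata.name}")
    HOST_EP_ID=$(kubectl -n $CILIUM_NAMESPACE exec $CILIUM_POD_NAME -c cilium-agent -- cilium endpoint list -o jsonpath='{[?(@.status.identity.id==1)].id}')
    kubectl -n $CILIUM_NAMESPACE exec $CILIUM_POD_NAME -c cilium-agent -- cilium endpoint config $HOST_EP_ID | grep PolicyAuditMode
done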

That’s basically it! Your nodes are now firewalled via Cilium with eBPF in the background, while you can manage the required rules in the same easy way as any other “traditional” Kubernetes NetworkPolicy: via Kubernetes (custom) resources.

Appendix

Here are some more topics to consider before trying to work with Cilium host policies:

  • Cilium requires ICMP type 0/8 (or 4240/TCP as an alternative) for its internal health checks. As host policies do not yet support ICMP filtering without a feature flag (more details here), you will see dropped ICMP packets between the nodes in Hubble and in the cilium monitor output. Nevertheless, that’s nothing to worry about: 4240/TCP is allowed in the manifests shown above, so the health checks pass anyway.
  • Keep in mind that these rules only allow TCP/UDP filtering. ICMP, and especially ICMPv6, is not yet supported without the feature flag, hence IPv6 communication will not work: NeighborSolicitation (ICMPv6 type 135), NeighborAdvertisement (ICMPv6 type 136), RouterSolicitation (ICMPv6 type 133) and RouterAdvertisement (ICMPv6 type 134) packets will be dropped.
  • Finally, we would like to highlight that the newest Cilium release 1.11.0 brought “ICMP and ICMPv6 support for CNP and CCNP policies with a feature flag” (a sketch of such a rule follows below). Nevertheless, as the main issue regarding this topic is still open and some final work needs to be done, we think it’s better to wait before activating this feature on productive clusters until it’s fully supported, marked as stable and enabled by default in a future Cilium release.
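
For completeness, such an ICMP rule could roughly look like the snippet below once the feature flag is enabled. Treat it as a sketch based on the Cilium 1.11 policy documentation, not as something we currently run on productive clusters:

apiVersion: "cilium.io/v2"
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: "rke2-icmp-host-rule-set-example"
spec:
  description: "Example only: allow ICMP echo requests (ping) towards the nodes"
  nodeSelector:
    matchLabels:
      node.kubernetes.io/instance-type: rke2
  ingress:
  - fromEntities:
    - remote-node
    icmps:
    - fields:
        # ICMP echo request
      - type: 8
        family: IPv4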
