K3s Load Balancer with OpenBSD

· 7min · Dan F.

This post covers how I exposed a home-grown Kubernetes cluster running k3s on four Raspberry Pi 5s, using a Ubiquiti EdgeRouter and an OpenBSD relayd load balancer.

I will be brief, as I will probably have to use this post as a reference years down the road.

K3s

First, I built out a k3s cluster on four Raspberry Pis, connected to a separate network (192.168.1.0/24) from my normal home network (192.168.0.0/24). I followed this document for the installation of k3s on the nodes. I did some research on hosted Postgres databases, and ultimately went with Neon Serverless Postgres for the cluster datastore. I had initially tried out CockroachDB, but ran into compatibility issues with k3s. Nothing against Cockroach, but k3s requires Postgres versions 10.7, 11.5, or 14.2 at the time of this writing.
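
For reference, k3s is pointed at an external Postgres datastore at install time. A minimal sketch of the server install (the connection string is a placeholder for the one Neon provides, and the --token and --tls-san values are assumptions I'd want for a multi-server cluster behind the LB, not something from my notes):

curl -sfL https://get.k3s.io | sh -s - server \
  --datastore-endpoint="postgres://<user>:<password>@<neon host>/<dbname>" \
  --token <shared secret> \
  --tls-san 192.168.0.2   # so the API cert is valid when reached through the LB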

Once the k3s cluster was operational, I next installed ArgoCD, as well as cert-manager for certificate management. I won't go into details on the ArgoCD installation, but the Applications I used to configure cert-manager are below:

cert-manager.yaml

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cert-manager
  namespace: argocd
spec:
  project: argocd
  source:
    helm:
      values: |-
        installCRDs: true
    repoURL: https://charts.jetstack.io
    chart: cert-manager
    targetRevision: v1.13.3
  destination:
    server: https://kubernetes.default.svc
    namespace: cert-manager
  syncPolicy:
    automated:
      selfHeal: true
      prune: true
  revisionHistoryLimit: 5

cluster_issuer.yaml

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
  namespace: argocd
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: <my email>@findelabs.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: traefik
          serviceType: ClusterIP
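
Once ArgoCD synced both Applications, I could confirm that cert-manager was healthy and that the issuer had registered with Let's Encrypt. These are standard kubectl checks, nothing specific to this setup:

kubectl -n cert-manager get pods
kubectl get clusterissuer letsencrypt-prod
# The issuer should eventually report READY: True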

EdgeRouter

Next up, I had to make some tweaks to the router. I had created one interface for the 192.168.0.0/24 network, and one for the 192.168.1.0/24 network.

Then, I needed to set up port forwarding for incoming ports 80, 443, and 6443, sending them all to the OpenBSD load balancer's (LB) 192.168.0.2 IP. The LB has two network interfaces, announcing IPs 192.168.0.2 and 192.168.1.2. In the EdgeRouter's port forwarding configuration, because my kube network is on a different network than the home network, I also had to ensure that both LAN interfaces were configured for port forwarding.

Additionally, within the EdgeRouter's Firewall/NAT -> Port Forwarding page, I had to enable masquerade for the internal (192.168.0.1) interface so that we can get responses from inside the kube network.
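
The same port forwarding can be done from the EdgeRouter CLI. A sketch for port 443 only, assuming eth0 is the WAN interface and eth1/eth2 are the two LAN interfaces (rule numbers and interface names will differ per router):

configure
set port-forward wan-interface eth0
set port-forward lan-interface eth1
set port-forward lan-interface eth2
set port-forward rule 1 description "https to LB"
set port-forward rule 1 original-port 443
set port-forward rule 1 protocol tcp
set port-forward rule 1 forward-to address 192.168.0.2
set port-forward rule 1 forward-to port 443
commit
save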

I discovered that all cert-manager certificate requests were failing due to http -> https redirections occurring somewhere on my network. My initial suspicion was that Traefik was redirecting unencrypted calls to TLS, but I found that only requests coming in externally were being 301'd to https.

This turned out to be because the EdgeRouter was configured to have its GUI listen on all IPs, with https redirection enabled by default. This was disabled by SSH'ing into the router and running the following:

configure
set service gui listen-address 192.168.0.1
commit
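
Once committed, external http requests should no longer be redirected. A quick sanity check from outside the network (substitute your own public IP or hostname; the ACME path is just a probe):

curl -sI http://<public IP>/.well-known/acme-challenge/probe
# Expect a 404 from the cluster's ingress, not a 301 redirect to https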

OpenBSD

Finally, OpenBSD's relayd and pf needed to be configured. I did this with different settings for http, kubeapi, and https. The http and kubeapi listeners are both simple relays, but I couldn't for the life of me get a relay functional for https; I kept getting SSL termination errors. What I did end up getting to work is a Direct Server Return (DSR) load balancer, using a relayd redirection. Microsoft has a fairly good blog post which explains DSR better than I ever could.

Here is the relayd.conf I used in the OpenBSD load balancer:

local_addr="192.168.0.2" # This is interface em1

interval 10
timeout 200
prefork 5

log connection
log state changes

table <kube_api_hosts> { 192.168.1.3, 192.168.1.4, 192.168.1.5 }
table <kube_hosts> { 192.168.1.3, 192.168.1.4, 192.168.1.5, 192.168.1.6 }

######################
### Kube API Relay ###
######################

protocol "tcp_service" {
  match url log
  tcp { nodelay, socket buffer 65536 }
}

relay "kube_forwarder" {
  listen on $local_addr port 6443
  protocol "tcp_service"
  forward to <kube_api_hosts> port 6443 mode loadbalance check https "/healthz" code 401
}

############################################
### HTTP Relay (Needed for cert manager) ###
############################################

http protocol "http_service_lb" {
  match url log
  block
  tcp { nodelay, socket buffer 65536 }
  pass quick path "/.well-known/acme-challenge/*" forward to <kube_hosts>
}

relay "http_forwarder" {
  listen on $local_addr port 80
  protocol "http_service_lb"
  forward to <kube_hosts> port 80 mode loadbalance check tcp
}

######################
### HTTPS Redirect ###
######################

redirect "https_forwarder" {
  listen on $local_addr port 443 interface vether0
  sticky-address
  route to <kube_hosts> port 443 mode roundrobin check tcp interface em0 # em0 is the 192.168.1.2 IP
}
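
Before enabling relayd, the config can be syntax checked, and once it's running, relayctl will show whether the host checks are passing. This is the standard relayd workflow:

doas relayd -n -f /etc/relayd.conf   # parse the config without starting
doas rcctl enable relayd
doas rcctl start relayd
doas relayctl show summary           # backend hosts should report "up"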

And because I used a redirection, I had to include anchor "relayd/*" in the pf.conf file as well. However, because DSR is more difficult to troubleshoot, since the return flow doesn't pass through the LB, I added a few extra lines to my pf config, as shown below:

# em0 is 192.168.1.2
# em1 is 192.168.0.2

set loginterface em0
set state-defaults pflow

match in log on em1 inet proto tcp from any to port 443

anchor "relayd/*"

After a reboot (needed to create the pflog0 interface), I could then watch pflog rule evaluations on incoming connections. You can view historical flows with tcpdump -e -ttt -r /var/log/pflog, or tail active connections with tcpdump -e -ttt -i pflog0.

To further help with debugging pf, here are some useful commands I utilized:

# Show all anchors
doas pfctl -vsA

# Show anchor rules
doas pfctl -a "relayd/https_forwarder" -s rules

# Show how many packets have been let through for each rule
doas pfctl -a "relayd/https_forwarder" -sr -vv

K3s Linux Config

Finally, to complete the DSR circle, I had to configure a loopback IP on each of the k3s nodes matching the listening IP (192.168.0.2) of the OpenBSD load balancer. The sysctls below stop the nodes from answering ARP for that shared IP, which would otherwise conflict with the LB:

echo net.ipv4.conf.lo.arp_ignore=1 | sudo tee -a /etc/sysctl.conf
echo net.ipv4.conf.lo.arp_announce=2 | sudo tee -a /etc/sysctl.conf
sudo sysctl --system
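
To confirm the settings took effect on each node:

sysctl net.ipv4.conf.lo.arp_ignore net.ipv4.conf.lo.arp_announce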

We also need to disable Wi-Fi on the Pis. I did this by commenting out the default cloud-init netplan config (which we're replacing anyway) on each Pi:

sudo sed -i 's/^/# /g' /etc/netplan/50-cloud-init.yaml

Finally, I used the following netplan config on all of my Pis, found at /etc/netplan/99_config.yaml (adjust the addresses for your own network):

network:
  version: 2
  renderer: networkd
  ethernets:
    eth0:
      dhcp4: no
      addresses:
        - 192.168.1.XX/24
      routes: 
        - to: default
          via: 192.168.1.1
      nameservers:
        search:
          - "<MY DOMAIN>"
        addresses:
          - 192.168.1.1 # The edgerouter DNS service
    lo:
      match:
        name: lo
      addresses: 
      - 192.168.0.2/32 #  This must match the ingress IP of the OpenBSD relayd service

Note: be sure to correct file permissions: sudo chmod 600 /etc/netplan/99_config.yaml

You can then run sudo netplan try to test and apply the network configuration on each Pi.
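
If everything worked, the loopback should now carry the LB's ingress IP alongside 127.0.0.1:

ip addr show lo
# Look for "inet 192.168.0.2/32" in the output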

DNS Configuration

On to the last section. I am going to use proxima.findelabs.com as the example here, as it is one of the sites hosted on my k3s cluster. In order for cert-manager to be able to generate Let's Encrypt certs for this domain, the domain needs to point back at my network's public IP.

For ease of future changes, I set up two records: one A record for the hosting service, fhs.findelabs.com, and one CNAME for proxima.findelabs.com.

In my DNS provider for findelabs.com, I configured something that looks like the following:

fhs.findelabs.com      A     -> 172.10.37.13
proxima.findelabs.com  CNAME -> fhs.findelabs.com

This allows me to easily change the fhs.findelabs.com A record whenever my public IP changes in the future, without having to update every single domain hosted on my systems. All domains hosted on my k3s cluster should instead be CNAMEs pointing at fhs.findelabs.com.
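
The chain is easy to verify with dig, which follows the CNAME down to the A record:

dig +short proxima.findelabs.com
# fhs.findelabs.com.
# 172.10.37.13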

Final Thoughts

Thanks for reading, all!


This setup has been tested on OpenBSD 7.4.