Provision a Multi-Region k3s cluster on Google Cloud with Terraform
k3s is a lightweight, certified Kubernetes distribution developed at Rancher Labs, and one of the three most popular distributions on the CNCF Landscape. Because it ships as a single binary, it is easy to download and install, while still giving you the same bells and whistles as the other Kubernetes distributions.
With k3s, you can turn any VM, or even IoT and edge devices like a Raspberry Pi, into a functional Kubernetes cluster. A single-node cluster is fine to start with, but if that node crashes, your applications suffer a total outage.
In this post, I give some highlights of a Terraform module that builds a highly available k3s cluster, spread across multiple regions, on the Google Cloud Platform.
The complete Terraform configuration is available on GitHub if you want to try it yourself.
The HA k3s cluster is built with the following Google Cloud resources:
- a Cloud SQL instance as the external datastore
- a Managed Instance Group of server nodes that will serve the Kubernetes API and run other control plane services
- multiple Managed Instance Groups of agent nodes that will run our apps spread across multiple regions
- an Internal TCP Load Balancer in front of the server nodes to allow the agent nodes to register with the cluster
- an External TCP Load Balancer to expose the API server, allowing interaction with the cluster using e.g. kubectl
Cloud SQL as the External DB
Because we will be running more than one server node, an external datastore is needed to store the state of the cluster. Most Kubernetes distributions use etcd as this datastore; k3s also supports SQL databases.
Like many other cloud providers, GCP offers a managed MySQL or Postgres service, known as Cloud SQL, which is a perfect fit for our datastore.
Here is a Cloud SQL configuration in Terraform code:
resource "random_id" "k3s-db" {
prefix = "k3s-db-"
byte_length = 4
}
resource "google_sql_database_instance" "k3s-db" {
name = random_id.k3s-db.hex
region = var.region
database_version = "POSTGRES_11"
settings {
tier = var.db_tier
availability_type = "REGIONAL"
disk_size = 50
disk_type = "PD_SSD"
disk_autoresize = true
ip_configuration {
ipv4_enabled = "false"
private_network = var.network
}
backup_configuration {
enabled = true
start_time = "01:00"
}
maintenance_window {
day = 6
hour = 1
}
}
depends_on = [google_service_networking_connection.k3s-private-vpc-connection]
}
As you can see, high availability is enabled for this instance by marking it as REGIONAL, and from a security perspective, the public IP address is disabled, making the SQL service reachable only from within the Virtual Private Cloud network.
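The depends_on in the snippet above points at a private services access connection, which peers the Cloud SQL service into the VPC so the instance is reachable on its private address. That connection is not shown above; here is a minimal sketch of how it might look (the reserved range name and prefix length are assumptions):

resource "google_compute_global_address" "k3s-private-ip-range" {
  # Reserved range handed to the service networking peering (illustrative values)
  name          = "k3s-private-ip-range"
  purpose       = "VPC_PEERING"
  address_type  = "INTERNAL"
  prefix_length = 16
  network       = var.network
}

resource "google_service_networking_connection" "k3s-private-vpc-connection" {
  network                 = var.network
  service                 = "servicenetworking.googleapis.com"
  reserved_peering_ranges = [google_compute_global_address.k3s-private-ip-range.name]
}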
The k3s server nodes
For the k3s server nodes, we create a regional managed instance group:
Managed instance groups (MIGs) let you operate apps on multiple identical VMs. You can make your workloads scalable and highly available by taking advantage of automated MIG services, including: autoscaling, autohealing, regional (multiple zone) deployment, and automatic updating.
Using a regional MIG increases availability by spreading the created instances across multiple zones within the same region. Such a managed instance group needs an instance template, which defines what the instances should look like. For this setup, we will use the following configuration:
- k3s is installed at startup using a cloud-init startup script
- a dedicated service account for the k3s server nodes
- the instances only have a private IP address, so they are not directly exposed to the internet
- the instances will run in a dedicated subnet of our VPC
- for outgoing traffic, a Cloud NAT is configured for this subnet (see the sketch below)
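The Cloud NAT itself is a Cloud Router plus a NAT configuration scoped to the server subnet. A minimal sketch, assuming illustrative router and NAT names:

resource "google_compute_router" "k3s" {
  name    = "k3s-router"
  region  = var.region
  network = var.network
}

resource "google_compute_router_nat" "k3s" {
  name                               = "k3s-nat"
  router                             = google_compute_router.k3s.name
  region                             = var.region
  nat_ip_allocate_option             = "AUTO_ONLY"
  source_subnetwork_ip_ranges_to_nat = "LIST_OF_SUBNETWORKS"

  # Only the server subnet gets outbound internet access through this NAT
  subnetwork {
    name                    = google_compute_subnetwork.k3s-servers.id
    source_ip_ranges_to_nat = ["ALL_IP_RANGES"]
  }
}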
The Terraform manifest also contains an External and an Internal Load Balancer for the k3s server nodes: the first for an administrator using kubectl, the second for the agent registration.
The startup script template for the server nodes:
#!/bin/bash
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="v1.19.3+k3s3" sh -s - \
  --write-kubeconfig-mode 644 \
  --token "${token}" \
  --tls-san "${internal_lb_ip_address}" \
  --tls-san "${external_lb_ip_address}" \
  --node-taint "CriticalAddonsOnly=true:NoExecute" \
  --disable traefik \
  --datastore-endpoint "postgres://${db_user}:${db_password}@${db_host}:5432/${db_name}"
Creating the instance template with Terraform:
data "template_file" "k3s-server-startup-script" {
template = file("${path.module}/templates/server.sh")
vars = {
token = random_string.token.result
internal_lb_ip_address = google_compute_address.k3s-api-server-internal.address
external_lb_ip_address = google_compute_address.k3s-api-server-external.address
db_host = var.db_host
db_name = var.db_name
db_user = var.db_user
db_password = var.db_password
}
}
resource "google_compute_instance_template" "k3s-server" {
name_prefix = "k3s-server-"
machine_type = var.machine_type
tags = ["k3s", "k3s-server"]
metadata_startup_script = data.template_file.k3s-server-startup-script.rendered
metadata = {
block-project-ssh-keys = "TRUE"
enable-oslogin = "TRUE"
}
disk {
source_image = "debian-cloud/debian-10"
auto_delete = true
boot = true
}
network_interface {
network = var.network
subnetwork = google_compute_subnetwork.k3s-servers.id
}
shielded_instance_config {
enable_secure_boot = true
}
service_account {
email = var.service_account
scopes = [
"https://www.googleapis.com/auth/cloud-platform",
]
}
lifecycle {
create_before_destroy = true
}
}
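Two resources referenced above are worth sketching as well: the shared cluster token and the regional MIG manager that turns the template into running instances. A minimal sketch, with an assumed target size of three servers:

resource "random_string" "token" {
  # Shared secret used by servers and agents to join the cluster
  length  = 32
  special = false
}

resource "google_compute_region_instance_group_manager" "k3s-servers" {
  name               = "k3s-servers"
  region             = var.region
  base_instance_name = "k3s-server"
  target_size        = 3 # assumption: one server per zone

  version {
    instance_template = google_compute_instance_template.k3s-server.id
  }
}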
Exposing the server nodes with some load balancers:
resource "google_compute_region_backend_service" "k3s-api-server-internal" {
name = "k3s-api-server-internal"
region = var.region
load_balancing_scheme = "INTERNAL"
health_checks = [google_compute_health_check.k3s-health-check-internal.id]
backend {
group = google_compute_region_instance_group_manager.k3s-servers.instance_group
}
}
resource "google_compute_forwarding_rule" "k3s-api-server-internal" {
name = "k3s-api-server-internal"
region = var.region
load_balancing_scheme = "INTERNAL"
allow_global_access = true
ip_address = google_compute_address.k3s-api-server-internal.address
backend_service = google_compute_region_backend_service.k3s-api-server-internal.id
ports = [6443]
subnetwork = google_compute_subnetwork.k3s-servers.self_link
}
resource "google_compute_region_backend_service" "k3s-api-server-external" {
name = "k3s-api-server-external"
region = var.region
load_balancing_scheme = "EXTERNAL"
health_checks = [google_compute_region_health_check.k3s-health-check-external.id]
backend {
group = google_compute_region_instance_group_manager.k3s-servers.instance_group
}
}
resource "google_compute_forwarding_rule" "k3s-api-server-external" {
name = "k3s-api-server-external"
region = var.region
load_balancing_scheme = "EXTERNAL"
ip_address = google_compute_address.k3s-api-server-external.address
backend_service = google_compute_region_backend_service.k3s-api-server-external.id
port_range = "6443-6443"
}
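The reserved IP addresses and health checks referenced by these load balancers are not shown above; a sketch of what they might look like, assuming a plain TCP check on the API server port:

resource "google_compute_address" "k3s-api-server-internal" {
  name         = "k3s-api-server-internal"
  region       = var.region
  address_type = "INTERNAL"
  subnetwork   = google_compute_subnetwork.k3s-servers.id
}

resource "google_compute_address" "k3s-api-server-external" {
  name   = "k3s-api-server-external"
  region = var.region
}

resource "google_compute_health_check" "k3s-health-check-internal" {
  name = "k3s-health-check-internal"

  tcp_health_check {
    port = 6443
  }
}

resource "google_compute_region_health_check" "k3s-health-check-external" {
  name   = "k3s-health-check-external"
  region = var.region

  tcp_health_check {
    port = 6443
  }
}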
The k3s agent nodes
The setup of the agent nodes is pretty much the same as for the server nodes, except that we create multiple k3s agent pools in different subnets in different regions. All of them register themselves with the server nodes through the internal load balancer created earlier.
The startup script template for the agent nodes:
#!/bin/bash
curl -sfL https://get.k3s.io | K3S_TOKEN="${token}" INSTALL_K3S_VERSION="v1.19.3+k3s3" K3S_URL="https://${server_address}:6443" sh -s - \
  --node-label "svccontroller.k3s.cattle.io/enablelb=true"
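One way to stamp out the per-region agent pools is a for_each over the desired regions. A minimal sketch, assuming a hypothetical local module wrapping the agent instance template and MIG (the module path and its inputs are illustrative, not part of the actual module):

module "k3s-agent-pool" {
  # Hypothetical module encapsulating subnet, instance template and regional MIG
  source   = "./modules/agent-pool"
  for_each = toset(["europe-north1", "europe-west2", "europe-west4"])

  region         = each.value
  network        = var.network
  token          = random_string.token.result
  server_address = google_compute_address.k3s-api-server-internal.address
}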
Accessing the cluster
To access the cluster from the outside, we first need to download the kubeconfig file from one of the server nodes:
$ gcloud compute instances list --project=$PROJECT
NAME                  ZONE             MACHINE_TYPE  PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP  STATUS
k3s-agent-eu002-028c  europe-north1-b  e2-micro                   10.166.0.3                RUNNING
k3s-agent-eu002-zwjx  europe-north1-c  e2-micro                   10.166.0.2                RUNNING
k3s-server-rld4       europe-west1-b   e2-micro                   10.128.0.5                RUNNING
k3s-server-lzjs       europe-west1-c   e2-micro                   10.128.0.4                RUNNING
k3s-server-tz0w       europe-west1-d   e2-micro                   10.128.0.3                RUNNING
k3s-agent-eu003-j91b  europe-west2-b   e2-medium                  10.154.0.2                RUNNING
k3s-agent-eu003-bsc3  europe-west2-c   e2-medium                  10.154.0.3                RUNNING
k3s-agent-eu001-3pnp  europe-west4-b   e2-micro                   10.164.0.2                RUNNING
k3s-agent-eu001-39nt  europe-west4-c   e2-micro                   10.164.0.3                RUNNING
$ gcloud compute scp --zone "europe-west1-b" --tunnel-through-iap --project $PROJECT k3s-server-rld4:/etc/rancher/k3s/k3s.yaml ./kubeconfig
k3s.yaml 100% 2961 77.3KB/s 00:00
Now you should have a kubeconfig file in your current directory. Next, take the public IP address of the external TCP Load Balancer and replace 127.0.0.1 in the kubeconfig file with that address.
export IP=$(gcloud compute addresses list --project $PROJECT | grep k3s-api-server-external | tr -s ' ' | cut -d ' ' -f 2)
sed -i "s/127.0.0.1/$IP/g" kubeconfig
Test if you can reach the cluster:
$ kubectl --kubeconfig ./kubeconfig get nodes -o wide
NAME                  STATUS  ROLES   AGE  VERSION       INTERNAL-IP  EXTERNAL-IP  OS-IMAGE                      KERNEL-VERSION         CONTAINER-RUNTIME
k3s-server-lzjs       Ready   master  54m  v1.19.3+k3s3  10.128.0.4   <none>       Debian GNU/Linux 10 (buster)  4.19.0-12-cloud-amd64  containerd://1.4.1-k3s1
k3s-agent-eu001-3pnp  Ready   <none>  54m  v1.19.3+k3s3  10.164.0.2   <none>       Debian GNU/Linux 10 (buster)  4.19.0-12-cloud-amd64  containerd://1.4.1-k3s1
k3s-server-rld4       Ready   master  54m  v1.19.3+k3s3  10.128.0.5   <none>       Debian GNU/Linux 10 (buster)  4.19.0-12-cloud-amd64  containerd://1.4.1-k3s1
k3s-agent-eu002-zwjx  Ready   <none>  54m  v1.19.3+k3s3  10.166.0.2   <none>       Debian GNU/Linux 10 (buster)  4.19.0-12-cloud-amd64  containerd://1.4.1-k3s1
k3s-agent-eu003-bsc3  Ready   <none>  54m  v1.19.3+k3s3  10.154.0.3   <none>       Debian GNU/Linux 10 (buster)  4.19.0-12-cloud-amd64  containerd://1.4.1-k3s1
k3s-agent-eu002-028c  Ready   <none>  54m  v1.19.3+k3s3  10.166.0.3   <none>       Debian GNU/Linux 10 (buster)  4.19.0-12-cloud-amd64  containerd://1.4.1-k3s1
k3s-agent-eu003-j91b  Ready   <none>  54m  v1.19.3+k3s3  10.154.0.2   <none>       Debian GNU/Linux 10 (buster)  4.19.0-12-cloud-amd64  containerd://1.4.1-k3s1
k3s-agent-eu001-39nt  Ready   <none>  54m  v1.19.3+k3s3  10.164.0.3   <none>       Debian GNU/Linux 10 (buster)  4.19.0-12-cloud-amd64  containerd://1.4.1-k3s1
k3s-server-tz0w       Ready   master  54m  v1.19.3+k3s3  10.128.0.3   <none>       Debian GNU/Linux 10 (buster)  4.19.0-12-cloud-amd64  containerd://1.4.1-k3s1
Wrapping up
In this article, I gave some insights into building a highly available k3s cluster spread across multiple regions on the Google Cloud Platform.
High Availability is accomplished by:
- a regional Cloud SQL instance as the k3s datastore.
- a regional Managed Instance Group for the k3s server nodes, running multiple server nodes spread across multiple zones.
Not only is HA considered; we also put some security measures in place, such as:
- dedicated service accounts for the servers and the agents
- strict firewall rules, exposing only the ports required for a functional cluster (sketched below)
- private IP addresses only for the database, servers and agents
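As an illustration of that firewall posture, a single rule like the following is enough to let tagged k3s nodes reach the API server port while everything else stays closed. This is a sketch; the actual module may split the rules differently:

resource "google_compute_firewall" "k3s-api-server" {
  name    = "k3s-api-server"
  network = var.network

  # Only instances tagged "k3s" (servers and agents) may reach the API server port
  source_tags = ["k3s"]
  target_tags = ["k3s-server"]

  allow {
    protocol = "tcp"
    ports    = ["6443"]
  }
}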
The complete Terraform configuration is available on GitHub.
What’s next?
With this k3s cluster setup, you can start deploying your applications globally. What is not (yet) included in the Terraform module is a way to make the applications available to the public. This could be achieved by creating, for example, a Google L7 HTTP(S) Load Balancer that covers all the k3s agent instance groups.
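A rough sketch of that idea, with illustrative names and a hypothetical var.agent_instance_groups holding the self links of the agent MIGs:

resource "google_compute_health_check" "k3s-apps" {
  name = "k3s-apps"

  http_health_check {
    port = 80
  }
}

resource "google_compute_backend_service" "k3s-apps" {
  name          = "k3s-apps"
  protocol      = "HTTP"
  health_checks = [google_compute_health_check.k3s-apps.id]

  # One backend per regional agent instance group
  dynamic "backend" {
    for_each = var.agent_instance_groups
    content {
      group = backend.value
    }
  }
}

resource "google_compute_url_map" "k3s-apps" {
  name            = "k3s-apps"
  default_service = google_compute_backend_service.k3s-apps.id
}

resource "google_compute_target_http_proxy" "k3s-apps" {
  name    = "k3s-apps"
  url_map = google_compute_url_map.k3s-apps.id
}

resource "google_compute_global_forwarding_rule" "k3s-apps" {
  name       = "k3s-apps"
  target     = google_compute_target_http_proxy.k3s-apps.id
  port_range = "80"
}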
References:
- https://github.com/jsiebens/k3s-on-gcp
- https://k3s.io
- https://terraform.io
- https://rancher.com/blog/2020/k3s-high-availability
- https://cloud.google.com/compute/docs/instance-groups
- https://cloud.google.com/load-balancing/docs/load-balancing-overview
- https://cloud.google.com/sql/docs/postgres/private-ip