Google Kubernetes Engine (GKE) Autopilot is a fully managed, opinionated Kubernetes platform that abstracts away infrastructure management so teams can focus on deploying applications.
These fully managed platforms are designed to be locked down so you don’t shoot yourself or others in the foot. Unfortunately, this also means that the options for practicing chaos engineering on them are very limited.
By becoming an official partner for Google Autopilot, Steadybit is now able to lift some of the restrictions and enable better experimentation options for teams.
You can integrate chaos engineering practices directly into your fully managed Kubernetes environments, using the same container-level fault injections as you would for any other standard cluster.
The Steadybit platform consists of a central control plane and agents that run within your Kubernetes cluster. Since GKE Autopilot has some restrictions compared to standard GKE (like limited permissions and no direct access to nodes), the installation process leverages features compatible with its security model.
Here’s what you need to get started.
Step 1: Install the allowlist synchronizer on the cluster
GKE Autopilot requires an allowlist that defines exemptions from its restrictions for particular workloads. We maintain such an allowlist for our container extension. You can install it on your GKE Autopilot cluster (>= 1.32.1-gke.1729000) like this:
kubectl apply -f - <<'EOF'
apiVersion: auto.gke.io/v1
kind: AllowlistSynchronizer
metadata:
  name: steadybit-synchronizer
spec:
  allowlistPaths:
  - Steadybit/extension-container/*
EOF
kubectl wait --for=condition=Ready allowlistsynchronizer/steadybit-synchronizer --timeout=60s
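Before applying the synchronizer, it can help to confirm that the cluster meets the minimum version mentioned above. The following is a minimal sketch, not part of the official Steadybit instructions: the helper names `version_fields` and `version_ge` are hypothetical, and it assumes GKE version strings of the form `1.32.1-gke.1729000` (optionally prefixed with `v`).

```shell
# Minimum GKE Autopilot version stated in this post.
MIN_VERSION="1.32.1-gke.1729000"

# Normalize a version such as "v1.32.1-gke.1729000" to four numeric
# fields: "1 32 1 1729000". Missing fields are printed as 0.
version_fields() {
  printf '%s\n' "$1" |
    sed -e 's/^v//' -e 's/-gke\./ /' -e 's/\./ /g' |
    awk '{ printf "%d %d %d %d\n", $1, $2, $3, $4 }'
}

# Succeeds (exit status 0) when version $1 is >= version $2.
version_ge() {
  # Intentional word splitting: $1..$4 hold the first version,
  # $5..$8 hold the second.
  set -- $(version_fields "$1") $(version_fields "$2")
  if [ "$1" -ne "$5" ]; then [ "$1" -gt "$5" ]; return; fi
  if [ "$2" -ne "$6" ]; then [ "$2" -gt "$6" ]; return; fi
  if [ "$3" -ne "$7" ]; then [ "$3" -gt "$7" ]; return; fi
  [ "$4" -ge "$8" ]
}

# Example usage (assumes kubectl and jq are installed and the current
# context points at your Autopilot cluster):
#   current="$(kubectl version -o json | jq -r '.serverVersion.gitVersion')"
#   version_ge "$current" "$MIN_VERSION" || echo "Cluster is older than $MIN_VERSION"
```

Comparing the fields numerically avoids the pitfalls of plain string comparison, where `1.9` would sort after `1.32`.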
Step 2: Install the Steadybit Agent and extensions to the cluster
Use our Helm chart to install the agent and extensions on the cluster. Remember to replace the agent key and cluster name placeholders with your own values.
helm repo add steadybit https://steadybit.github.io/helm-charts
helm repo update
helm upgrade --install steadybit-agent --namespace steadybit-agent \
  --create-namespace \
  --set agent.key=<replace-with-agent-key> \
  --set global.clusterName=<replace-with-cluster-name> \
  --set extension-container.container.runtime=containerd \
  --set extension-container.platform=gke-autopilot \
  --set extension-host.enabled=false \
  --set agent.registerUrl=https://platform.steadybit.com \
  steadybit/steadybit-agent
When the agent and the discovered targets appear in the Steadybit Explorer, you can start creating and running experiments on your cluster.
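If the agent does not show up in the Explorer right away, checking the pods in the `steadybit-agent` namespace is a reasonable first step. The sketch below wraps two standard `kubectl` commands in a hypothetical helper, `wait_for_steadybit`. The 180s default timeout is an arbitrary assumption, and Autopilot may need extra time to provision nodes for new pods.

```shell
# Namespace used by the helm command in this post.
STEADYBIT_NAMESPACE="steadybit-agent"

# Wait until every pod in the namespace reports Ready, then list them.
# $1 is an optional timeout (default 180s, an assumed value).
wait_for_steadybit() {
  wait_timeout="${1:-180s}"
  kubectl wait --namespace "$STEADYBIT_NAMESPACE" \
    --for=condition=Ready pods --all --timeout="$wait_timeout" &&
    kubectl get pods --namespace "$STEADYBIT_NAMESPACE" -o wide
}

# Example usage:
#   wait_for_steadybit 300s
```

If the wait times out, `kubectl describe pod` on the affected pod usually shows whether it was rejected by an Autopilot restriction or is still waiting to be scheduled.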
These Container attacks are fully supported on GKE Autopilot:
Because GKE Autopilot restricts access to the underlying nodes and makes no exemptions from that restriction, host-level attacks (e.g. reboot) and node-level attacks (e.g. drain node) are not available and will not be.
Now, you can have the best of both worlds: the ease of use and best practices of GKE Autopilot, plus the ability to improve the reliability of your applications using Steadybit as a Partner of Google Autopilot.
We also recently introduced new extensions for Kafka and Splunk. No matter what tools and technologies your team is using, Steadybit is easy to plug in and start running tests.
Are you ready to experiment on your cluster and strengthen your resiliency?
Get started with our free trial or book a demo today.