Validate Tekton works with GKE Autopilot #3798
Issues go stale after 90d of inactivity. /lifecycle stale Send feedback to tektoncd/plumbing.
/remove-lifecycle stale
Need to verify.
🧑‍💻 🎉 In short,
@imjasonh @vdemeester @bobcatfish installation log
Thanks for trying it out, @nikhil-thomas! Do the e2e tests pass against the Autopilot cluster? In theory we could redo any webhook-initiated mutations on a resource's first reconciliation (which we should maybe do anyway, and may already do), which would make the mutating webhook a nice-to-have optimization rather than a requirement.
I haven't tried. I'll add it to my list and post an update soon. 🧑‍💻
I am also running into this issue. 👋🏾
I am also running into this issue. Happy to help out with testing if I can.
If the lack of webhooks is indeed the only issue (big if), I think this boils down to two issues:
And a secret third:
My background is Java and my Go is basic (still learning), but can I help with 1 or 2? I was using the Helm chart; I'm not sure whether that uses the operator under the covers or something slightly different.
Same here.
I thought I could get away with this Autopilot restriction, but it seems tekton-pipelines-webhook doesn't start up correctly because of the MutatingWebhookConfiguration. Other things seem to work okay, though.
The catch is that we use the mutating webhook to set default values and to auto-convert from v1alpha1 to v1beta1, ..
As a test, I created a kind cluster, installed the latest Tekton, and disabled webhooks:
...and ran the e2e tests with:
Logs attached: e2e-testlog.txt. Only a few tests failed, mainly because they expected the webhook to block an invalid request. If we loosened these tests to handle asynchronous validation on the first reconcile loop, they should pass.
tl;dr: it should be much easier to test on GKE Autopilot >1.21 now.
Issues go stale after 90d of inactivity. /lifecycle stale Send feedback to tektoncd/plumbing.
Tried Tekton (v0.33.0) on Autopilot (1.21.6-gke.1500) with a pipeline containing two tasks:
Generally I should also say with Autopilot scheduling and using
Stale issues rot after 30d of inactivity. /lifecycle rotten Send feedback to tektoncd/plumbing.
I've played with Tekton on GKE Autopilot, including with GKE Spot Pods, and everything in my tests seems to work fine. There might be room for an official GCP-authored and -hosted doc on how to make them work best together, but that's probably out of scope for Tekton. If someone finds a bug that keeps Tekton from working with GKE Autopilot, please let us know. Until then: /close
@imjasonh: Closing this issue. In response to this:
GKE Autopilot is a new mode for GKE which locks down certain aspects of the cluster, in exchange for a more managed environment, and billing based on pod resource requests instead of node reservations. Someone should make sure Tekton works well with it, or at least identify where it doesn't and document those.
Looking through the overview (https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview), there aren't many locked-down things that we might rely on, but a few might be problematic:
Webhook limitations:
Tekton uses mutating webhooks to set defaults. The overview also doesn't mention conversion webhooks, which Tekton uses.
Pod affinity and anti-affinity:
This might affect the Affinity Assistant.
Allowable resource ranges:
We should set reasonable resource requests for the controller and webhook deployments, especially if we think they should request less than the Autopilot default.
We might find that other things don't work as expected, too. If things already work fine with Autopilot, we should document that somewhere as well.
/kind documentation