Behind the Scenes on AWS Contributions to Cloud Native Open Source Projects
April 5, 2023Amazon Elastic Kubernetes Service (Amazon EKS) is well known in the Kubernetes community. But few realize that AWS engineers are closely involved and contributing upstream to Kubernetes and to many more cloud native open source projects.
In the past year alone, AWS contributed significantly to containerd, Cortex, etcd, Fluentd, nerdctl, Notary, OpenTelemetry, Thanos, and Tinkerbell. We employ maintainers and contributors on these projects and we will contribute more to these and other projects in the coming year. Here’s a behind-the-scenes look at our contributions and why we’re investing in the open source projects we support. You can also meet many of our contributors in the AWS booth at KubeCon Europe in Amsterdam, April 18-21, 2023 and hear from them in our virtual Container Day event 9 a.m. – 4 p.m. CEST on April 18.
“Amazon EKS is committed to open source and we are spending a lot of our cycles now focused on contributing back to the community. Kubernetes is part of a community that’s bigger than AWS and so we’re continuing to be committed to maintaining and helping that community to be successful because without it, we wouldn’t exist, either,” said Barry Cooks, Vice President, Kubernetes, at AWS and a Cloud Native Computing Foundation (CNCF) governing board member.
AWS contributes to Kubernetes and Etcd
Today, AWS is heavily involved in open source, cloud native projects. Consider, for example, some of our recent key contributions to Kubernetes and etcd, the underlying data store for Kubernetes.
“We’re building the AWS cloud provider, contributing to CAPI (cluster API), and serve as part of the security response committee. We helped implement gzip optimization which improves the performance of Kubernetes clients,” said Nathan Taber who leads the product team for Kubernetes at AWS, in a keynote at KubeCon North America 2022. “With etcd we’re bringing our operational learnings from running just so much etcd at scale, back into the community.”
The AWS cloud provider for Kubernetes is the open source interface between a Kubernetes cluster and AWS service APIs. This project allows a Kubernetes cluster to provision, monitor, and remove AWS resources necessary for operation of the cluster.
As of Kubernetes 1.27, AWS has just finished a multi-year effort to migrate our legacy cloud provider out of tree to an external cloud provider. The cloud provider migration reduces binary bloat in the main kubernetes/kubernetes (k/k) repository, as well as reduces dependency complexity and the surface area for security vulnerabilities.
AWS has also built a webhook framework that allows cloud providers to host webhooks in their cloud-controller-managers, which makes certain migration tasks easier. One use case for this is helping other cloud providers to migrate the persistent volume labeller admission controllers from the API server code, which is one of the last areas of cloud provider specific code that needs to be migrated out of core Kubernetes.
“We’ve included a lot of space in our planning for upstream open source work this year,” said Nick Turner, software developer on the AWS Kubernetes team and a chair in Kubernetes SIG-cloud-provider. “Expect us to keep up our contributions to the cloud provider and the load balancer controller as well as increase our investments in the AWS IAM authenticator for Kubernetes and KMS encryption provider.”
These and other Kubernetes contributions bring value to the entire Kubernetes community as well as to the EKS service and its customers.
Since KubeCon Detroit last fall, the EKS-etcd team has contributed numerous improvements to etcd. Chao Chen contributed to the effort to improve testing mechanisms for etcd by unifying the test frameworks used by etcd tests. Baoming Wang contributed an important metric to the Kubernetes API server code base which will help catch data corruption issues early. We’ve also worked on building a linearizability test suite, made various improvements to the core etcd database and the etcd backend database Bolt-DB, contributed to documentation, made helm more resilient to etcd side transient errors, and fixed an issue with the installation script for argo-cd-helmfile.
What’s driving AWS to contribute more to cloud native open source
Like most modern companies, AWS builds many of its services with open source components. There are several business and technical reasons we do this, which we’ve outlined in an article on The New Stack about why we invest in sustainable open source. We recognize that the success of our services depends on the success of those underlying open source projects.
Given that most of the open source projects that AWS supports underpin specific services, AWS tasks all engineers working in services, regardless of their assigned sub-service teams, to contribute in any way that they can to those upstream projects.
The result is a virtuous cycle that promotes mutually beneficial growth. As AWS services grow, so too do the open source projects upon which they are based because of AWS contributions and support. Conversely, as these open source projects grow from the contributions of other companies and developers, so do the benefits to the AWS services that depend upon them.
AWS contributions focus on performance and scale
AWS contributions to open source typically come as a practical matter in the form of bug fixes, code reviews, documentation, new features, or security enhancements. Like many developers working in the open source space, AWS engineers often work to address issues that arise in the course of their day jobs and then share the fixes with the rest of the open source community. Similarly, new features for an open source project are developed by AWS engineers to expand the project’s scale or performance which in turn increases the project’s usability, stability, and overall appeal.
Because AWS has a large number of Kubernetes clusters under management, it affords AWS a unique opportunity to test the limitations of open source software and build its edges stronger and further out from its initial core. So many of the contributions that our team members do for upstream Kubernetes, etcd, containerd, and other projects center on making sure that we provide insights to the upstream community on where things break down in scaling, production, and operational readiness.
The resulting insights provide value for the entire open source community as well as our own customers.
Take for example, the lag fix that curiously performed as a latency expander. AWS engineer Shyam Jeedigunta, was looking at the logs and metrics collected from thousands of production EKS clusters. He determined that Gzip compression is enabled inside the Kubernetes API server to reduce the demand on network bandwidth and to decrease latency. However, the compression was actually increasing the latency for large list requests made by clients to the Kubernetes API server. Shyam, who is also co-chair of the Kubernetes scalability special interest group (SIG), took a deep dive into the issue to investigate whether a particular compression level created the problem and if so, could the compression level be reduced? Could Gzip compression be disabled entirely? What impact would that have on latency and network bandwidth?
Answers to questions like this one lead to contributions upstream in etcd and core Kubernetes from AWS service teams. Customers and others often report these kinds of issues to the project as well, but the nature of the problem isn’t clear until it’s viewed on 1,000 nodes and 200,000 objects of a certain kind. AWS engineers diagnose what’s going on, put together troubleshooting information, and collate information into proposals on how to fix the problem(s) to upstream to Kubernetes. AWS likes to spearhead fixing issues that arise from running the projects at scale.
Key AWS contributions
AWS contributes to many Kubernetes sub projects and SIGs. For example, Micah Hausler and Sri Saran Balaji Vellore Rajakumar serve on the Kubernetes Security Response Committee (SRC), Davanum Srinivas (Dims) chairs SIG-Architecture and SIG-k8s-infra, and Nick Turner is a chair in SIG-cloud-provider. Key contributions have gone into projects including containerd, Cortex, cdk8s, CNI, nerdctl and Prometheus. Innovations have also been substantial and include TorchServe, improved ARM support through AWS Graviton, and the Virtual GPU plugin. However, this is not an exhaustive or complete list of AWS contributions and innovations in the cloud native community.
On containerd, for example, AWS employs two maintainers who contribute features and help ensure the project’s general health and security. Key contributions from AWS engineers to the containerd project include OpenTelemetry integration in the 1.7.0 release, improved tracing, and improved fuzzing integration.
“It’s been awesome to see the growth on the container runtime team here at AWS these past few years. I love to see the eagerness to learn not just *how* to contribute, but how to do it well and really benefit the broader community,” said Phil Estes, a principal engineer at AWS and a containerd maintainer.
Nerdctl, a Docker-compatible CLI for containerd and a containerd sub-project, is used by other open source projects Lima, Finch, and Rancher Desktop. AWS engineers significantly improved nerdctl’s compose support by adding 11 out of 13 missing compose commands. We enhanced nerdctl’s image signing/verification support by contributing cosign support for nerdctl compose, and notation support for nerdctl. And engineer Jin Dong recently became the first reviewer for the project from AWS.
AWS services are also standardizing on OpenTelemetry, a set of open source tools and standards for collecting metrics, logs, and traces to measure application performance. AWS Distro for OpenTelemetry (ADOT), OpenSearch, and CloudWatch are all building on OpenTelemetry and contribute back to the upstream project. All ADOT code is 100% open source and contributed upstream. Key contributions include: adding functionality to upstream observability components such as OpenTelemetry language SDKs, collectors, and agents.
“Amazon is the fourth largest contributor to OpenTelemetry with a dedicated maintainer and many contributors working on the project. A key contribution has been improving collector and metric stability, including improved Prometheus interoperability with OpenTelemetry,” said Taber.
A fourth example is Cortex where AWS is the top supporter of the project and employs three maintainers. As AWS runs this project at scale, engineers have the opportunity to identify and fix scaling cliffs before they become a problem for the rest of the community. Some of the key contributions are new features and performance improvements. Examples include partition compactor, Ring DynamoDB Multikey KV, out of order samples ingestion, snappy-block gRPC compression, ARM images, and Thanos PromQL engine integration.
We have also contributed bug fixes to Thanos, a tool for setting up highly available Prometheus instances with long term storage. Thanos is a CNCF incubating project which Cortex depends on. We participated in the development of the new Thanos PromQL engine and open sourced a tool that could use fuzzing for correctness testing which has already caught a few bugs.
AWS employs four maintainers on Tinkerbell, a cloud native open source bare metal provisioning engine for EKS Anywhere and a CNCF Sandbox project. Key contributions include organizing the project roadmap, VLAN support, a Kubernetes native backend, out-of-band management Kubernetes controller, Helm Chart deployment, and Cluster API provider updates.
“Our team has done a lot of work to update the Tinkerbell backend from Postgres to native Kubernetes,” said Taber.
AWS employs three maintainers in Notation, a sub project of Notary under the CNCF, and is the third largest code contributor to Notary. Notation enables the generation of cryptographic signatures for container images so users can verify that they come from a trusted source or process. AWS founded the sub project with other contributors to come up with specifications for signature format, generation, verification, and revocation. As part of this work we also defined a process for evaluating signature envelope formats like COSE ensuring that they met a high security bar before they were used in Notation.
AWS employees have either written or reviewed the majority of code contributions for the core Notation libraries and a CLI. AWS also employs a maintainer to Ratify so Kubernetes users can easily enable policies for signature verification with their existing admissions controllers. Similarly we also employ a maintainer to ORAS so signatures can easily be pushed to OCI registries. Notation enables users to define granular trust policies for defining which sources they want to trust, balance deployment safety and security needs, and flexibility on secure signing key storage options.
We have contributed to many other open source projects as well, including Crossplane, for which AWS added support for EKS IRSA in the China region and fixed Amazon Route 53 wildcard support, and Backstage, with AWS Proton and AWS Code Suite (AWS CodeBuild, AWS CodePipeline, and AWS CodeDeploy).
“We’re very excited about doing more development in the open, sharing that with our customers, and working directly in some cases with customers on their needs in open source projects and working together to make the community stronger in the Kubernetes space,” Cooks said.
AWS is open
We want to hear from you. AWS engineers are open to helping community members through collaboration and contribution opportunities. Tell us how we can help meet your needs.
AWS engineers, solutions architects, and product managers are hanging out on the Kubernetes community and the CNCF community Slack channels. Channels where you can reach out to us include the provider AWS channel and Karpenter channel, and the AWS controllers for Kubernetes channel on the Kubernetes Slack.
Find us and tell us what you’d like us to work on. Or if you have a particular issue that you found in one of these upstream projects that you think our engineers can help move the needle on. Come find us and talk to us in the CNCF’s AWS Slack channel and join us for our virtual Container Day on April 18, before KubeCon EU.