GitHub Actions Runner Manager (GARM)
Welcome to GARM!
GARM enables you to create and automatically maintain pools of self-hosted GitHub runners, with auto-scaling, that can be used inside your GitHub workflow runs.
The goal of GARM is to be simple to set up, simple to configure and simple to use. It is a single binary that can run on any GNU/Linux machine, with no requirements other than the providers it creates runners in. It is intended to be easy to deploy in any environment and can create runners in any system you can write a provider for. There is no complicated setup process and no extremely complex concepts to understand. Once set up, it's meant to stay out of your way.
GARM supports creating pools on either GitHub itself or on your own deployment of GitHub Enterprise Server. For instructions on how to use GARM with GHE, see the credentials section of the documentation.
Through the use of providers, GARM can create runners in a variety of environments using the same GARM instance. Want to create pools of runners in your OpenStack cloud, your Azure cloud and your Kubernetes cluster? No problem! Just install the appropriate providers, configure them in GARM and you're good to go. Create zero-runner pools for instances with high costs (large VMs, GPU-enabled instances, etc.) and have them spin up on demand, or create large pools of k8s-backed runners that can be used for your CI/CD pipelines at a moment's notice. You can mix them up and create pools in any combination of providers or resource allocations you want.
Join us on slack
Whether you're running into issues or just want to drop by and say "hi", feel free to join us on slack.
Installing
On virtual or physical machines
Check out the quickstart document for instructions on how to install GARM. If you'd like to build from source, check out the building from source document.
On Kubernetes
Thanks to the efforts of the amazing folks at @mercedes-benz, GARM can now be integrated into k8s via their operator. Check out the GARM operator for more details.
Supported providers
GARM has a built-in LXD provider that you can use out of the box to spin up runners on any machine that runs either a stand-alone LXD instance, or an LXD cluster. The quick start guide mentioned above will get you up and running with the LXD provider.
GARM also supports external providers for a variety of other targets.
Installing external providers
External providers are binaries that GARM calls into to create runners in a particular IaaS. There are currently three external providers available:
- OpenStack
- Azure
- Kubernetes - Thanks to the amazing folks at @mercedes-benz for sharing their awesome provider!
Follow the instructions in the README of each provider to install them.
Configuration
The GARM configuration is a simple TOML file. The sample config file in the testdata folder is fairly well commented and should be enough to get you started. The configuration file is split into several sections, each of which is documented in its own page.
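To give a feel for the shape of the file, here is a minimal outline assuming the section names used by the sample config (`[default]`, `[apiserver]`, `[database]`, `[[provider]]`, `[[github]]`). The individual key names and values below are illustrative assumptions and may differ between versions; the commented sample in the testdata folder is the authoritative reference.

```toml
# Illustrative outline only; key names are assumptions based on the
# sample config. Consult testdata for the authoritative, commented file.

[default]
# URL where runners can reach GARM to send status updates.
callback_url = "https://garm.example.com/api/v1/callbacks"

[apiserver]
bind = "0.0.0.0"
port = 9997

[database]
backend = "sqlite3"

  [database.sqlite3]
  db_file = "/etc/garm/garm.db"

# One [[provider]] block per provider you want GARM to use.
[[provider]]
name = "lxd_local"
provider_type = "lxd"

# Credentials GARM uses to talk to GitHub (or GHES).
[[github]]
name = "example_credentials"
oauth2_token = "ghp_exampletoken"
```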
Optimizing your runners
If you would like to optimize the startup time of new instances, take a look at the performance considerations page.
Write your own provider
The providers are interfaces between GARM and a particular IaaS in which we spin up GitHub Runners. These providers can be either native or external. The native providers are written in Go, and must implement the interface defined here. External providers can be written in any language, as they are in the form of an external executable that GARM calls into.
There is currently one native provider for LXD and several external providers linked above.
If you want to write your own provider, you can choose to write a native one or implement an external one. I encourage you to opt for an external provider, as those are the easiest to write and you don't need to merge it into GARM itself to be able to use it. Faster to write, faster to iterate. The LXD provider may at some point be split from GARM into its own external project, at which point we will remove the native provider interface and only support external providers.
Please see the Writing an external provider document for details. Also, feel free to inspect the two available sample external providers in this repository.
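To make the external interface concrete, here is a minimal sketch of an external provider written as a shell script. It assumes GARM passes the operation name via the `GARM_COMMAND` environment variable and bootstrap parameters as JSON on stdin, as described in the Writing an external provider document; the command dispatch, instance IDs and JSON fields shown are illustrative placeholders, not a working provider.

```shell
#!/bin/sh
# Hypothetical skeleton of a GARM external provider. GARM is assumed to
# invoke this executable with the operation name in GARM_COMMAND and, for
# CreateInstance, bootstrap parameters as JSON on stdin. All JSON fields
# below are illustrative placeholders.

handle_command() {
    case "$1" in
        CreateInstance)
            # A real provider would read the bootstrap params from stdin,
            # create the instance in the target IaaS, then print the
            # resulting instance as JSON on stdout.
            echo '{"provider_id": "sketch-instance-1", "status": "running"}'
            ;;
        DeleteInstance)
            # Tear down the instance; nothing is printed on success.
            : ;;
        *)
            echo "unsupported command: $1" >&2
            return 1
            ;;
    esac
}

handle_command "${GARM_COMMAND:-CreateInstance}"
```

The executable-plus-environment contract is what makes external providers easy to iterate on: any language that can read stdin and print JSON can participate, with no need to rebuild GARM.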