GitHub Actions Runner Manager (GARM)
Welcome to GARM!
GARM enables you to create and automatically maintain pools of self-hosted GitHub runners, with auto-scaling, that can be used inside your GitHub workflow runs.
The goal of GARM is to be simple to set up, simple to configure and simple to use. It is a single binary that can run on any GNU/Linux machine, with no requirements other than the providers it creates runners in. It is intended to be easy to deploy in any environment and can create runners in any system you can write a provider for. There is no complicated setup process and no extremely complex concepts to understand. Once set up, it's meant to stay out of your way.
GARM supports creating pools on either GitHub itself or on your own deployment of GitHub Enterprise Server. For instructions on how to use GARM with GHE, see the credentials section of the documentation.
Through the use of providers, GARM can create runners in a variety of environments using the same GARM instance. Want to create pools of runners in your OpenStack cloud, your Azure cloud and your Kubernetes cluster? No problem! Just install the appropriate providers, configure them in GARM and you're good to go. Create zero-runner pools for instances with high costs (large VMs, GPU-enabled instances, etc.) and have them spin up on demand, or create large pools of k8s-backed runners that can be used for your CI/CD pipelines at a moment's notice. You can mix them up and create pools in any combination of providers or resource allocations you want.
Join us on slack
Whether you're running into issues or just want to drop by and say "hi", feel free to join us on slack.
Installing
On virtual or physical machines
Check out the quickstart document for instructions on how to install GARM. If you'd like to build from source, check out the building from source document.
On Kubernetes
Thanks to the efforts of the amazing folks at @mercedes-benz, GARM can now be integrated into k8s via their operator. Check out the GARM operator for more details.
Supported providers
GARM has a built-in LXD provider that you can use out of the box to spin up runners on any machine that runs either a stand-alone LXD instance, or an LXD cluster. The quick start guide mentioned above will get you up and running with the LXD provider.
GARM also supports external providers for a variety of other targets.
Installing external providers
External providers are binaries that GARM calls into to create runners in a particular IaaS. There are currently three external providers available:
- OpenStack
- Azure
- Kubernetes - Thanks to the amazing folks at @mercedes-benz for sharing their awesome provider!
Follow the instructions in the README of each provider to install them.
Configuration
The GARM configuration is a simple TOML file. The sample config file in the testdata folder is fairly well commented and should be enough to get you started. The configuration file is split into several sections, each of which is documented in its own page.
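To give a feel for the shape of the file, here is a minimal outline assuming the section names used by the sample config (`[default]`, `[apiserver]`, `[database]`, `[[provider]]`, `[[github]]`). The individual key names and values below are illustrative assumptions and may differ between versions; the commented sample in the testdata folder is the authoritative reference.

```toml
# Illustrative outline only; key names are assumptions based on the
# sample config. Consult testdata for the authoritative, commented file.

[default]
# URL where runners can reach GARM to send status updates.
callback_url = "https://garm.example.com/api/v1/callbacks"

[apiserver]
bind = "0.0.0.0"
port = 9997

[database]
backend = "sqlite3"

  [database.sqlite3]
  db_file = "/etc/garm/garm.db"

# One [[provider]] block per provider you want GARM to use.
[[provider]]
name = "lxd_local"
provider_type = "lxd"

# Credentials GARM uses to talk to GitHub (or GHES).
[[github]]
name = "example_credentials"
oauth2_token = "ghp_exampletoken"
```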
Optimizing your runners
If you would like to optimize the startup time of new instances, take a look at the performance considerations page.
Write your own provider
The providers are interfaces between GARM and a particular IaaS in which we spin up GitHub Runners. These providers can be either native or external. The native providers are written in Go, and must implement the interface defined here. External providers can be written in any language, as they are in the form of an external executable that GARM calls into.
There is currently one native provider for LXD and several external providers linked above.
If you want to write your own provider, you can choose to write a native one or implement an external one. I encourage you to opt for an external provider, as those are the easiest to write and you don't need to merge it into GARM itself to be able to use it. Faster to write, faster to iterate. The LXD provider may at some point be split from GARM into its own external project, at which point we will remove the native provider interface and only support external providers.
Please see the Writing an external provider document for details. Also, feel free to inspect the two available sample external providers in this repository.
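To make the external interface concrete, here is a minimal sketch of an external provider written as a shell script. It assumes GARM passes the operation name via the `GARM_COMMAND` environment variable and bootstrap parameters as JSON on stdin, as described in the Writing an external provider document; the command dispatch, instance IDs and JSON fields shown are illustrative placeholders, not a working provider.

```shell
#!/bin/sh
# Hypothetical skeleton of a GARM external provider. GARM is assumed to
# invoke this executable with the operation name in GARM_COMMAND and, for
# CreateInstance, bootstrap parameters as JSON on stdin. All JSON fields
# below are illustrative placeholders.

handle_command() {
    case "$1" in
        CreateInstance)
            # A real provider would read the bootstrap params from stdin,
            # create the instance in the target IaaS, then print the
            # resulting instance as JSON on stdout.
            echo '{"provider_id": "sketch-instance-1", "status": "running"}'
            ;;
        DeleteInstance)
            # Tear down the instance; nothing is printed on success.
            : ;;
        *)
            echo "unsupported command: $1" >&2
            return 1
            ;;
    esac
}

handle_command "${GARM_COMMAND:-CreateInstance}"
```

The executable-plus-environment contract is what makes external providers easy to iterate on: any language that can read stdin and print JSON can participate, with no need to rebuild GARM.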