From 979c07adbec9d7ca59144f13d69752355b97d286 Mon Sep 17 00:00:00 2001
From: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Date: Fri, 9 May 2025 21:28:29 +0000
Subject: [PATCH] Add some info about scale sets

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
---
 README.md        | 38 ++++++++++++--------
 doc/scalesets.md | 93 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 117 insertions(+), 14 deletions(-)
 create mode 100644 doc/scalesets.md

diff --git a/README.md b/README.md
index 4411834c..69f4ee6f 100644
--- a/README.md
+++ b/README.md
@@ -4,16 +4,17 @@
 
 <!-- TOC -->
 
-- [About GARM](#about-garm)
-- [Join us on slack](#join-us-on-slack)
-- [Installing](#installing)
-    - [Quickstart](#quickstart)
-    - [Installing on Kubernetes](#installing-on-kubernetes)
-- [Using GARM](#using-garm)
-- [Supported providers](#supported-providers)
-    - [Installing external providers](#installing-external-providers)
-- [Optimizing your runners](#optimizing-your-runners)
-- [Write your own provider](#write-your-own-provider)
+- [GitHub Actions Runner Manager GARM](#github-actions-runner-manager-garm)
+    - [About GARM](#about-garm)
+    - [Join us on slack](#join-us-on-slack)
+    - [Installing](#installing)
+        - [Quickstart](#quickstart)
+        - [Installing on Kubernetes](#installing-on-kubernetes)
+    - [Using GARM](#using-garm)
+    - [Supported providers](#supported-providers)
+        - [Installing external providers](#installing-external-providers)
+    - [Optimizing your runners](#optimizing-your-runners)
+    - [Write your own provider](#write-your-own-provider)
 
 <!-- /TOC -->
 
@@ -23,19 +24,28 @@ Welcome to GARM!
 
 GARM enables you to create and automatically maintain pools of [self-hosted GitHub runners](https://docs.github.com/en/actions/hosting-your-own-runners/about-self-hosted-runners), with auto-scaling that can be used inside your github workflow runs.
 
-The goal of ```GARM``` is to be simple to set up, simple to configure and simple to use. The server itself is a single binary that can run on any GNU/Linux machine without any other requirements other than the providers you want to enable in your setup. It is intended to be easy to deploy in any environment and can create runners in virtually any system you can write a provider for. There is no complicated setup process and no extremely complex concepts to understand. Once set up, it's meant to stay out of your way.
+The goal of ```GARM``` is to be simple to set up, simple to configure and simple to use. The server itself is a single binary that can run on any GNU/Linux machine with no requirements other than the providers you want to enable in your setup. It is intended to be easy to deploy in any environment and can create runners in virtually any system you can write a provider for (if one does not already exist). There is no complicated setup process and no extremely complex concepts to understand. Once set up, it's meant to stay out of your way.
 
-GARM supports creating pools in either GitHub itself or in your own deployment of [GitHub Enterprise Server](https://docs.github.com/en/enterprise-server@3.10/admin/overview/about-github-enterprise-server). For instructions on how to use ```GARM``` with GHE, see the [credentials](/doc/github_credentials.md) section of the documentation.
+GARM supports creating pools and scale sets in either GitHub itself or in your own deployment of [GitHub Enterprise Server](https://docs.github.com/en/enterprise-server@3.10/admin/overview/about-github-enterprise-server). For instructions on how to use ```GARM``` with GHE, see the [credentials](/doc/github_credentials.md) section of the documentation.
 
-Through the use of providers, `GARM` can create runners in a variety of environments using the same `GARM` instance. Whether you want to create pools of runners in your OpenStack cloud, your Azure cloud or your Kubernetes cluster, that is easily achieved by just installing the appropriate providers, configuring them in `GARM` and creating pools that use them. You can create zero-runner pools for instances with high costs (large VMs, GPU enabled instances, etc) and have them spin up on demand, or you can create large pools of eagerly created k8s backed runners that can be used for your CI/CD pipelines at a moment's notice. You can mix them up and create pools in any combination of providers or resource allocations you want.
+Through the use of providers, `GARM` can create runners in a variety of environments using the same `GARM` instance. Whether you want to create runners in your OpenStack cloud, your Azure cloud or your Kubernetes cluster, that is easily achieved by installing the appropriate providers, configuring them in `GARM` and creating pools that use them. You can create zero-runner pools for instances with high costs (large VMs, GPU-enabled instances, etc.) and have them spin up on demand, or you can create large pools of eagerly created k8s-backed runners that can be used for your CI/CD pipelines at a moment's notice. You can mix them up and create pools in any combination of providers or resource allocations you want.
 
-Here is a brief architectural diagram of how GARM reacts to workflows triggered in GitHub (click the image to see a larger version):
+GARM supports two modes of operation:
+
+* Pools
+* Scale sets
+
+Here is a brief architectural diagram of how pools work and how GARM reacts to workflows triggered in GitHub (click the image to see a larger version):
 
 ![GARM architecture diagram](/doc/images/garm-light.drawio.svg?raw=true#gh-light-mode-only)
 ![GARM architecture diagram](/doc/images/garm-dark.drawio.svg?raw=true#gh-dark-mode-only)
 
+**Scale sets** work differently. Pools (as they are defined in GARM) rely on webhooks to know when a job was started, and GARM must decide internally which pool should handle that job. With scale sets, much of the scheduling and decision-making logic is done in GitHub itself.
+
 :warning: **Important note**: The README and documentation in the `main` branch are relevant to the not yet released code that is present in `main`. Following the documentation from the `main` branch for a stable release of GARM, may lead to errors. To view the documentation for the latest stable release, please switch to the appropriate tag. For information about setting up `v0.1.5`, please refer to the [v0.1.5 tag](https://github.com/cloudbase/garm/tree/v0.1.5).
 
+:warning: **Important note**: The `main` branch holds the latest code and is not guaranteed to be stable. If you are looking for a stable release, please check the releases page. If you plan to use the `main` branch, please do so on a new instance. Do not upgrade from a stable release to `main`.
+
 ## Join us on slack
 
 Whether you're running into issues or just want to drop by and say "hi", feel free to [join us on slack](https://communityinviter.com/apps/garm-hq/garm).
diff --git a/doc/scalesets.md b/doc/scalesets.md
new file mode 100644
index 00000000..2bbd1a8e
--- /dev/null
+++ b/doc/scalesets.md
@@ -0,0 +1,93 @@
+# Scale Sets
+
+<!-- TOC -->
+
+- [Scale Sets](#scale-sets)
+    - [Create a new scale set](#create-a-new-scale-set)
+    - [Scale Set vs Pool](#scale-set-vs-pool)
+
+<!-- /TOC -->
+
+GARM supports [scale sets](https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/deploying-runner-scale-sets-with-actions-runner-controller). This new mode of operation was added by GitHub to enable more efficient scheduling of runners using their own ARC (Actions Runner Controller) project. The APIs for enabling scale sets are not yet public and the scale set functionality itself is not well documented outside the context of ARC, but it can be implemented in third-party auto scalers.
+
+In this document we will focus on how scale sets work, how they differ from pools and how to manage them.
+
+We'll start with detailing how to create a scale set.
+
+## Create a new scale set
+
+Creating a scale set is nearly identical to [creating a pool](/doc/using_garm.md#creating-a-runner-pool), except that a scale set takes a name instead of labels. We'll assume you already have a provider enabled and you have added a repo, org or enterprise to GARM.
+
+```bash
+ubuntu@garm:~$ garm-cli repo ls
++--------------------------------------+-----------+--------------+------------+------------------+--------------------+------------------+
+| ID                                   | OWNER     | NAME         | ENDPOINT   | CREDENTIALS NAME | POOL BALANCER TYPE | POOL MGR RUNNING |
++--------------------------------------+-----------+--------------+------------+------------------+--------------------+------------------+
+| 84a5e82f-7ab1-427f-8ee0-4569b922296c | gsamfira  | garm-testing | github.com | gabriel-samfira  | roundrobin         | true             |
++--------------------------------------+-----------+--------------+------------+------------------+--------------------+------------------+
+```
+
+List providers:
+
+```bash
+ubuntu@garm:~$ garm-cli provider list
++--------------+---------------------------------+----------+
+| NAME         | DESCRIPTION                     | TYPE     |
++--------------+---------------------------------+----------+
+| incus        | Incus external provider         | external |
++--------------+---------------------------------+----------+
+| azure        | azure provider                  | external |
++--------------+---------------------------------+----------+
+| aws_ec2      | Amazon EC2 provider             | external |
++--------------+---------------------------------+----------+
+```
+
+Create a new scale set:
+
+```bash
+garm-cli scaleset add \
+    --repo  84a5e82f-7ab1-427f-8ee0-4569b922296c \
+    --provider-name incus \
+    --image ubuntu:22.04 \
+    --name garm-scale-set \
+    --flavor default \
+    --enabled true \
+    --min-idle-runners=0 \
+    --max-runners=20
++--------------------------+-----------------------+
+| FIELD                    | VALUE                 |
++--------------------------+-----------------------+
+| ID                       | 8                     |
+| Scale Set ID             | 14                    |
+| Scale Name               | garm-scale-set        |
+| Provider Name            | incus                 |
+| Image                    | ubuntu:22.04          |
+| Flavor                   | default               |
+| OS Type                  | linux                 |
+| OS Architecture          | amd64                 |
+| Max Runners              | 20                    |
+| Min Idle Runners         | 0                     |
+| Runner Bootstrap Timeout | 20                    |
+| Belongs to               | gsamfira/garm-testing |
+| Level                    | repo                  |
+| Enabled                  | true                  |
+| Runner Prefix            | garm                  |
+| Extra specs              |                       |
+| GitHub Runner Group      | Default               |
++--------------------------+-----------------------+
+```
+
+That's it. You now have a scale set created, ready to accept jobs.
+
+## Scale Set vs Pool
+
+Scale sets are a new way of managing runners. They were introduced by GitHub to enable more efficient scheduling of runners using their own Actions Runner Controller (ARC) project. Scale sets are meant to reduce API calls, improve reliability of message deliveries and improve efficiency of runner management. While webhooks work great most of the time, under heavy load, they may not fire or they may fire while the auto scaler is offline. If webhooks are fired while GARM is down, we will never know about those jobs unless we query the current workflow runs.
+
+Listing workflow runs is not feasible for orgs or enterprises, as that would mean listing all repos within an org and then, for each repository, listing all workflow runs. This gets worse for enterprises. Scale sets, on the other hand, allow GARM to subscribe to a message queue and get messages just for that scale set over HTTP long poll.
+
+Advantages of scale sets over pools:
+
+* No more need to install a webhook, reducing your security footprint.
+* Scheduling is done by GitHub. GARM receives runner requests from GitHub and GARM can choose to acquire those jobs or leave them for some other scaler.
+* Easier use of runner groups. While GARM supports runner groups, GitHub currently [does not send the group name](https://github.com/orgs/community/discussions/158000) as part of webhooks in `queued` state. This prevents GARM (or any other auto scaler) from efficiently scheduling runners to pools that have runner groups set. With scale sets, GitHub schedules the runners to the scale set itself, so we can efficiently create runners in specific runner groups.
+* Scale set names must be unique within a runner group.
\ No newline at end of file