The check for max runners was added to CreateInstance(), but we crete the
JIT runners before we run the function to add a runner to the DB. The defer
function to clean up the JIT runner was being run after the error return
generated by CreateInstance. So the cleanup code never ran. Additionally
we would know that max runners was reached only after creating the JIT
runner. Which kills rate limits.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
GARM cares about jobs in queued state for anything that requires
decision making. Anything else is purely informational.
This change cleans up all inactionable jobs and refuses to record jobs
that are not already in the database, have an inactionable state and
which do not have a runner we own handling them.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change adds a new "generation" field to pools, scalesets and
runners. The generation field is inherited by runners from scale sets
or pools at the time of creation.
The generation field on scalesets and pools is incremented when the
pool or scale set is updated in a way that might influence how runners
are created (flavor, image, specs, runner groups, etc).
Using this new field, we can determine if existing runners have diverged
from the settings of the pool/scale set that spawned them.
In the CLI we now have a new set of commands available for both
pools and scalesets that lists runners, with an optional --outdated
flag and a new "rotate" flag that removes all idle runners. Optionally
the --outdated flag can be passed to the rotate command to only remove
outdated runners.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change adds a new "agent mode" to GARM. The agent enables GARM to
set up a persistent websocket connection between the garm server and the
runners it spawns. The goal is to be able to easier keep track of state,
even without subsequent webhooks from the forge.
The Agent will report via websockets when the runner is actually online,
when it started a job and when it finished a job.
Additionally, the agent allows us to enable optional remote shell between
the user and any runner that is spun up using agent mode. The remote shell
is multiplexed over the same persistent websocket connection the agent
sets up with the server (the agent never listens on a port).
Enablement has also been done in the web UI for this functionality.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change caches jobs meant for an entity in the pool manager. This
allows us to avoid querying the db as much and allows us to better determine
when we should scale down.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* Add template api endpoints
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* Added template bypass
Pools and scale sets will automatically migrate to the new template
system for runner install scripts. If a pool or a scale set cannot be
migrate, it is left alone. It is expected that users set a runner install
template manually for scenarios we don't yet have a template for (windows
on gitea for example).
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* Integrate templates with pool create/update
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* Add webapp integration with templates
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* Add unit tests
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* Populate all relevant context fields
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* Update dependencies
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* Fix lint
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* Validate uint
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* Add CLI template management
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* Some editor improvements and bugfixes
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* Fix scale set return values post create
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* Fix template websocket events filter
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
---------
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change adds some more cache helper functions, additional tests,
vastly improves memory usage when loading instances and cleans up some
code.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
On github, attempt to use the scaleset API to list all runners without
pagination. This will avoid missing runners and accidentally removing them.
Fall back to paginated API if we can't use the scaleset API.
Add ability to retrieve all instances from cache, for an entity.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Github will remove inactive scale sets after 7 days. This change
ensures the scale set exists in github before spinning up the listener.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Although runner names are unique, we still have an ID on the model
which is used as a primary key. We should allow using that ID to
reference a runner in the API.
This change allows users to specify ID or runner name.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
There seems to be a change in the scale set message. It now includes
a jobID and sets the runner request ID to 0. This change adds separate
job ID fields for workflow jobs and scaleset jobs.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Do not look for a name when composing the scale set. Preload may not
have been called on an entity, but we still have the ID, which is the
only thing needed when GetEntity() is called.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* time.NewTicker will panic if the duration is 0. Make it return
early if duration is 0.
* Return a pre-closed channel in Wait() instead of nil. Ensures receiver
will not block forever.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change renames a lot of variables, types and functions to be more
generic. The goal is to allow GARM to add more forges in the future.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Adds a periodic cleanup function that cross checks runners between github,
the provider and the GARM database. If an inconsistency is found, GARM will
attempt to fix it.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
The locking logic was added to its own package as it may need to be used
by other parts of the code.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change moves the github client to a subpackage in utils
and adds the scaleset github client code.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change adds a backoff mechanism when deleting github runners.
If the delete operation fails, we record the event and retry with
a geometric progression of 1.5 starting from 5 seconds, which is the
pool consolidation timeout.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change scopes all github entities to a github endpoint, allowing
users to have the same repo/org/enterprise created for each endpoint.
This way, if your username is the same on github.com and on your GHES
server, and you have the same repository name or org in both places,
GARM can now handle that situation.
This change also fixes a leaky watcher in the pool manager.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Improper use of time.After can lead to memory leaks if the timer never
gets a chance to fire.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
GARM has a backoff interval when consuming queued jobs. This backoff
is intended to allow any potential idle runners to pick up a job before
GARM attempts to spin up a new one. This change allows users to set a
custom backoff interval or disable it altogether by setting it to 0.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change uses the database watcher to watch for changes to the
github entities, credentials and controller info.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Adds a simple database watcher. At this point it's just one process, but
the plan is to allow different implementations that inform the local running
workers of changes that have occured on entities of interest in the database.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>