Break the lock on a job if it's still queued and the runner that it
triggered was assigned to another job. This may cause leftover runners
to be created, but we scale those down in ~3 minutes.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* Removes completed jobs from the db
* Skip ensure min idle runners for pools with min idle runners set to 0
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* removes an extra loop. The fetch tools loop does the same job
* add a lot of log messages
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Lock operations per instance name. This should avoid go routines trying
to update the same instance when operations may be slow.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This commit adds:
* more granular loops for various operations
* update go-github to latest version
* skip trying to fetch runner info for canceled or skipped jobs
* loops use waitgroups to signal exit
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This commit:
* swaps WaitGroups with errgroups
* wraps errgroup.Wait() in a select to prevent situations in which an
operation takes a long time and prevents garm from being restarted.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* When a runner fails to set up the github agent, we reap it after the
pool timeout is reached.
* add a retry in the userdata when configuring the runner agent
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Providers may return only 3 possible statuses:
* InstanceRunning
* InstanceError
* InstanceStopped
Every other status is reserved for the controller to set. Provider
responses will be split from the instance response in a future commit.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Use an errgroup to wait for all instance deletion operations before
returning. Log any failure.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* cleanupOrphanedGithubRunners() now uses errgroup to parallelize and
report errors when removing runners from the provider.
* retryFailedInstancesForOnePool() now uses errgroup
* Removed some setPoolRunningState which should be treated in the loop
where those errors eventually bubble up and can be handled.
* Added a number of timeouts in the LXD provider for delete and list
instances. This provider should be converted into an external
provider.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
The external provider needs a simple way to indicate certain types of
errors. Duplicate error and not found error are such an example.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* Add Run() helper for external providers
* Make GARM_CONTROLLER_ID env var common to all commands
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This interface is similar to the common.Provider interface, but lacks
the AsParams() function. Decoupling the external provider interface from
the internal provider interface allows us to account for any
particularities there may appear between them.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
The execution package is a common package that can be used by external
providers to load environment variables and stdin, in a coherent struct
that can be consumed by the various commands that need to execute as
part of the provider.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>