For now, the aditional labels would only contain the job ID that triggered
the creation of the runner. It does not make sense to add this label to the
actual runner that registeres against github. We can simply use it internally
by fetching it from the DB.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* enable foreign key constraints on sqlite
* on delete cascade for addresses and status messages
* add debug server config option
* fix rr allocation
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Break the lock on a job if it's still queued and the runner that it
triggered was assigned to another job. This may cause leftover runners
to be created, but we scale those down in ~3 minutes.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* Removes completed jobs from the db
* Skip ensure min idle runners for pools with min idle runners set to 0
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* removes an extra loop. The fetch tools loop does the same job
* add a lot of log messages
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Lock operations per instance name. This should avoid go routines trying
to update the same instance when operations may be slow.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This commit adds:
* more granular loops for various operations
* update go-github to latest version
* skip trying to fetch runner info for canceled or skipped jobs
* loops use waitgroups to signal exit
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This commit:
* swaps WaitGroups with errgroups
* wraps errgroup.Wait() in a select to prevent situations in which an
operation takes a long time and prevents garm from being restarted.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* When a runner fails to set up the github agent, we reap it after the
pool timeout is reached.
* add a retry in the userdata when configuring the runner agent
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Use an errgroup to wait for all instance deletion operations before
returning. Log any failure.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* cleanupOrphanedGithubRunners() now uses errgroup to parallelize and
report errors when removing runners from the provider.
* retryFailedInstancesForOnePool() now uses errgroup
* Removed some setPoolRunningState which should be treated in the loop
where those errors eventually bubble up and can be handled.
* Added a number of timeouts in the LXD provider for delete and list
instances. This provider should be converted into an external
provider.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change renames the module from "garm" to "github.com/cloudbase/garm".
This will make it easier to consume public functions defined in garm, by
external applications, without having to resort to replace.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Add a grace period for idle runners of 5 minutes. A new idle runner will
not be taken into consideration for scale-down unless it's older than 5
minutes. This should prevent situations where the scaleDown() routine
that runs every minute will evaluate candidates for reaping and
erroneously count the new one as well. The in_progress hooks that
transitiones an idle runner to "active" may arive a long while after the
"queued" hook has spun up a runner.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
if we fail to cleanup failed instance, we return before retrying to
recreate it.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Extra specs is an opaque valid JSON that can be set on a pool and which
will be passed along to the provider as part of instance bootstrap params.
This field is meant to allow operators to send extra configuration values
to external or built-in providers. The extra specs is not interpreted or
useful in any way to garm itself, but it may be useful to the provider
which interacts with the IaaS.
The extra specs are not meant to be used for secrets. Adding sensitive
information to this field is highly discouraged. This field is meant as a
means to add fine tuning knobs to the providers, on a per pool basis.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
On some providers the default character set used by shortid may lead to
errors when creating runners, due to the fact that underscores are not
allowed in their names.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
There are several fields that are common among some of the data
structures in garm. The RunnerPrefix is just one of them. Perhaps we
should move some of the rest in a common type and embed that into the
types that share those fields.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>