Commit graph

75 commits

Author SHA1 Message Date
Gabriel Adrian Samfira
da13cec2de Move code to external package
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-21 15:34:18 +00:00
Gabriel Adrian Samfira
a41eeb6f1e Update comment on function
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-04 10:48:14 +00:00
Gabriel Adrian Samfira
6c06afb8e8 Don't add additional labels to GH runner
For now, the additional labels would only contain the ID of the job that
triggered the creation of the runner. It does not make sense to add this label
to the actual runner that registers against github. We can simply use it
internally by fetching it from the DB.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:48:22 +00:00
Gabriel Adrian Samfira
0ab8f73bb4 Use r.log()
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
f92ac2a74f Lower backoff timer to 1 minute
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
7f510ec40a Check if we have a recorded job
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
a526c1024c Various fixes
* enable foreign key constraints on sqlite
* on delete cascade for addresses and status messages
* add debug server config option
* fix rr allocation

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
f7cf6bb619 increase backoff to 30 seconds
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
3796c25228 Amend some log messages
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
bf90eb323a Add back update locks
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
b52f107bde Update log messages
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
c04a93dde9 Add basic round robin for pools
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
4b9c20e1be Reduce timeout to 10 seconds
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
28360fd662 Do not record jobs not meant for us
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
a15a91b974 Break lock and lower scale down timeout
Break the lock on a job if it's still queued and the runner that it
triggered was assigned to another job. This may cause leftover runners
to be created, but we scale those down in ~3 minutes.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
b6a02db446 Remove completed jobs and slight optimization
* Removes completed jobs from the db
* Skip ensure min idle runners for pools with min idle runners set to 0

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
5153738359 Small fixes
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
fbffd8157b Add job tracking
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
67b871488d Log the actual error
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-30 09:14:10 +03:00
Gabriel Adrian Samfira
0a27acd818 Remove extra loop and add logging
* removes an extra loop. The fetch tools loop does the same job
* add a lot of log messages

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-30 08:52:16 +03:00
Gabriel Adrian Samfira
7358beb2b9 Merge Unlock() and UnlockAndDelete()
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-30 08:48:29 +03:00
Gabriel Adrian Samfira
1edb9247a8 Add per instance mux
Lock operations per instance name. This should keep goroutines from trying
to update the same instance at the same time when operations are slow.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-23 15:43:31 +00:00
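A minimal sketch of the per-instance locking idea from commit 1edb9247a8, assuming one mutex per instance name stored in a `sync.Map`. The `keyMutex` type and its methods are hypothetical names chosen for illustration and are not taken from garm's source.

```go
package main

import (
	"fmt"
	"sync"
)

// keyMutex hands out one mutex per instance name.
// Hypothetical type for illustration; not garm's actual implementation.
type keyMutex struct {
	muxes sync.Map // maps instance name (string) to *sync.Mutex
}

// TryLock attempts to take the lock for name without blocking.
// It returns false if another goroutine already holds it.
func (k *keyMutex) TryLock(name string) bool {
	mux, _ := k.muxes.LoadOrStore(name, &sync.Mutex{})
	return mux.(*sync.Mutex).TryLock()
}

// Unlock releases the lock for name. It must only be called by the
// goroutine that successfully locked it.
func (k *keyMutex) Unlock(name string) {
	if mux, ok := k.muxes.Load(name); ok {
		mux.(*sync.Mutex).Unlock()
	}
}

func main() {
	var locks keyMutex
	fmt.Println(locks.TryLock("runner-1")) // first caller gets the lock
	fmt.Println(locks.TryLock("runner-1")) // second caller is turned away
	fmt.Println(locks.TryLock("runner-2")) // other instances are unaffected
	locks.Unlock("runner-1")
	fmt.Println(locks.TryLock("runner-1")) // free again after Unlock
}
```

A non-blocking `TryLock` lets a worker skip an instance that is already being updated instead of queueing behind a slow operation. Whether garm blocks or skips in that case is not stated in the commit message.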
Gabriel Adrian Samfira
a9cf5127a9 More granular loops, update go-github
This commit adds:

  * more granular loops for various operations
  * update go-github to latest version
  * skip trying to fetch runner info for canceled or skipped jobs
  * loops use waitgroups to signal exit

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-23 08:16:41 +00:00
Gabriel Adrian Samfira
4921692ee2 Wrap errgroup in select
This commit:

  * swaps WaitGroups with errgroups
  * wraps errgroup.Wait() in a select to prevent situations in which an
    operation takes a long time and prevents garm from being restarted.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-23 01:07:55 +03:00
Gabriel Adrian Samfira
0e637c10e3
Remove failed runner and some retries
* When a runner fails to set up the github agent, we reap it after the
  pool timeout is reached.
* add a retry in the userdata when configuring the runner agent

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-13 21:01:40 +03:00
Gabriel Adrian Samfira
d8ed55288f Small comment change
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-06 14:55:08 +00:00
Gabriel Adrian Samfira
829933559d Allow installing runners to finish
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-06 14:49:28 +00:00
Gabriel Adrian Samfira
bd2f103743
Wait for addPendingInstances to finish
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-06 16:40:27 +03:00
Gabriel Adrian Samfira
e9f66c2035
Wait for deletePendingInstances() to finish
Use an errgroup to wait for all instance deletion operations before
returning. Log any failure.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-06 16:36:08 +03:00
Gabriel Adrian Samfira
05a79d298c
Move some code around
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-06 16:31:30 +03:00
Gabriel Adrian Samfira
9efefc0d6a
Parallelization and LXD timeouts
* cleanupOrphanedGithubRunners() now uses errgroup to parallelize and
  report errors when removing runners from the provider.
* retryFailedInstancesForOnePool() now uses errgroup
* Removed some setPoolRunningState which should be treated in the loop
  where those errors eventually bubble up and can be handled.
* Added a number of timeouts in the LXD provider for delete and list
  instances. This provider should be converted into an external
  provider.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-06 16:07:07 +03:00
Gabriel Adrian Samfira
a433bede96 Return only enabled pools
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-31 14:47:27 +00:00
Gabriel Adrian Samfira
243ae75476 Properly handle stopped runners
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-27 15:02:25 +00:00
Gabriel Adrian Samfira
6b3ea50ca5 Add runner group option to pools
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-27 09:21:21 +00:00
Gabriel Adrian Samfira
e7f208367b
Error if we can't remove instance from provider
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-26 22:19:20 +03:00
Gabriel Adrian Samfira
91e2d7b029
Remove unused param, add OSType to bootstrap param
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-16 12:29:00 +02:00
Gabriel Adrian Samfira
829db87f15
Rename module
This change renames the module from "garm" to "github.com/cloudbase/garm".

This will make it easier to consume public functions defined in garm, by
external applications, without having to resort to replace.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-12 16:01:49 +02:00
Gabriel
24f61ceb8c
Merge pull request #78 from gabriel-samfira/add-scale-down-grace-period
Add grace period to scale-down
2023-02-08 15:16:28 +02:00
Gabriel Adrian Samfira
43d2fd8c2d
Add grace period to scale-down
Add a grace period of 5 minutes for idle runners. A new idle runner will
not be taken into consideration for scale-down unless it's older than 5
minutes. This should prevent situations where the scaleDown() routine
that runs every minute evaluates candidates for reaping and
erroneously counts the new one as well. The in_progress hook that
transitions an idle runner to "active" may arrive a long while after the
"queued" hook has spun up a runner.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-02-07 13:36:15 +02:00
Gabriel
439eeee479
Update runner/pool/pool.go
Co-authored-by: Michael Kuhnt <maigl@users.noreply.github.com>
2023-02-07 13:14:28 +02:00
Gabriel Adrian Samfira
77307998ea
Bail if we fail to cleanup failed instance
If we fail to clean up a failed instance, we return before retrying to
recreate it.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-02-06 14:53:23 +02:00
Michael Kuhnt
4eb8d905ab
fix: skip spawning new runners if enough idle runners are available
2023-01-31 16:38:04 +01:00
Gabriel Adrian Samfira
d00da32375 Deduplicate some code
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-01-30 14:29:55 +00:00
Gabriel Adrian Samfira
f25951decb Add extra specs on pools
Extra specs is an opaque, valid JSON value that can be set on a pool and that
will be passed along to the provider as part of instance bootstrap params.

This field is meant to allow operators to send extra configuration values
to external or built-in providers. The extra specs are not interpreted by
garm and are of no use to garm itself, but they may be useful to the provider
which interacts with the IaaS.

The extra specs are not meant to be used for secrets. Adding sensitive
information to this field is highly discouraged. This field is meant as a
means to add fine tuning knobs to the providers, on a per pool basis.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-01-30 13:10:21 +00:00
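One way to carry an opaque but valid JSON value, as described in commit f25951decb, is `json.RawMessage` combined with `json.Valid`. This is a sketch under assumptions: the `bootstrapParams` struct, its field names and JSON tags, and the `setExtraSpecs` helper are hypothetical and do not reflect garm's actual types.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// bootstrapParams is a hypothetical, trimmed-down stand-in for the
// parameters handed to a provider when an instance is created.
type bootstrapParams struct {
	Name string `json:"name"`
	// ExtraSpecs is kept as raw bytes: it is validated as JSON but
	// never decoded, so its contents stay opaque to the caller.
	ExtraSpecs json.RawMessage `json:"extra_specs,omitempty"`
}

// setExtraSpecs stores raw on p after checking that it is valid JSON.
func setExtraSpecs(p *bootstrapParams, raw []byte) error {
	if !json.Valid(raw) {
		return errors.New("extra specs must be valid JSON")
	}
	p.ExtraSpecs = json.RawMessage(raw)
	return nil
}

func main() {
	p := bootstrapParams{Name: "runner-1"}
	// The key below is made up; only the provider would know what it means.
	if err := setExtraSpecs(&p, []byte(`{"disk_size_gb": 50}`)); err != nil {
		fmt.Println("error:", err)
		return
	}
	out, err := json.Marshal(p)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```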
Gabriel Adrian Samfira
4d071b7d10 Return only alpha numeric characters as an ID
On some providers the default character set used by shortid may lead to
errors when creating runners, due to the fact that underscores are not
allowed in their names.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-01-27 14:57:25 +00:00
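To illustrate the constraint in commit 4d071b7d10, the sketch below generates an ID drawn only from letters and digits, so it can never contain an underscore. It uses `crypto/rand` from the standard library rather than the shortid package the commit refers to; `newAlphanumericID` and the alphabet constant are hypothetical names.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// alphanumeric is the only character set the ID may draw from.
const alphanumeric = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

// newAlphanumericID returns a random ID of the given length that contains
// only ASCII letters and digits.
func newAlphanumericID(length int) (string, error) {
	id := make([]byte, length)
	limit := big.NewInt(int64(len(alphanumeric)))
	for i := range id {
		// rand.Int returns a uniform value in [0, limit).
		n, err := rand.Int(rand.Reader, limit)
		if err != nil {
			return "", err
		}
		id[i] = alphanumeric[n.Int64()]
	}
	return string(id), nil
}

func main() {
	id, err := newAlphanumericID(12)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(id)
}
```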
Gabriel Adrian Samfira
e93b6d73e5
Sanitize log entries
While most of these log entries come from either github or our own
database, it's still a good idea to sanitize them.
2023-01-23 18:01:46 +02:00
Gabriel Adrian Samfira
b354cedf7e
Fixed a bunch of linting issues
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-01-20 22:21:22 +02:00
Gabriel Adrian Samfira
f2cf947c00
Move pool type in params
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-01-20 17:08:15 +02:00
Gabriel Adrian Samfira
abcc9569bd
Add a common RunnerPrefix type
There are several fields that are common among some of the data
structures in garm. The RunnerPrefix is just one of them. Perhaps we
should move some of the rest in a common type and embed that into the
types that share those fields.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-01-20 12:12:15 +02:00
Michael Kuhnt
49762d7f9e
fix: shortid generates correctly now
2023-01-19 14:23:40 +01:00