Commit graph

152 commits

Author SHA1 Message Date
Gabriel Adrian Samfira
28360fd662 Do not record jobs not meant for us
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
a15a91b974 Break lock and lower scale down timeout
Break the lock on a job if it's still queued and the runner that it
triggered was assigned to another job. This may cause leftover runners
to be created, but we scale those down in ~3 minutes.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
1287a93cf2 Add job list to API and CLI
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
b6a02db446 Remove completed jobs and slight optimization
  * Removes completed jobs from the db
  * Skips ensuring min idle runners for pools with min idle runners set to 0

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
5153738359 Small fixes
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
fbffd8157b Add job tracking
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
5bca63eeb1 Replace wait implementation with errgroup
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:40:57 +00:00
Gabriel Adrian Samfira
67b871488d Log the actual error
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-30 09:14:10 +03:00
Gabriel Adrian Samfira
0a27acd818 Remove extra loop and add logging
  * removes an extra loop. The fetch tools loop does the same job
  * adds a lot of log messages

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-30 08:52:16 +03:00
Gabriel Adrian Samfira
7358beb2b9 Merge Unlock() and UnlockAndDelete()
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-30 08:48:29 +03:00
Gabriel Adrian Samfira
1edb9247a8 Add per instance mux
Lock operations per instance name. This should prevent goroutines from
trying to update the same instance when operations may be slow.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-23 15:43:31 +00:00
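A per-instance lock like the one this commit describes can be sketched as a map of mutexes keyed by instance name. This is only an illustration and not garm's code: `keyedMutex`, `newKeyedMutex`, `Lock` and `TryLock` are hypothetical names, and the sketch never removes entries from the map.

```go
package main

import "sync"

// keyedMutex hands out one mutex per instance name.
// Hypothetical type for illustration; not taken from garm.
type keyedMutex struct {
	mu    sync.Mutex             // guards the map below
	locks map[string]*sync.Mutex // one lock per instance name
}

func newKeyedMutex() *keyedMutex {
	return &keyedMutex{locks: make(map[string]*sync.Mutex)}
}

// get returns the mutex for name, creating it on first use.
func (k *keyedMutex) get(name string) *sync.Mutex {
	k.mu.Lock()
	defer k.mu.Unlock()
	m, ok := k.locks[name]
	if !ok {
		m = &sync.Mutex{}
		k.locks[name] = m
	}
	return m
}

// Lock blocks until the caller holds the lock for name and
// returns the function that releases it.
func (k *keyedMutex) Lock(name string) (unlock func()) {
	m := k.get(name)
	m.Lock()
	return m.Unlock
}

// TryLock attempts to take the lock for name without blocking.
// ok is false if another goroutine is already working on that instance.
func (k *keyedMutex) TryLock(name string) (unlock func(), ok bool) {
	m := k.get(name)
	if !m.TryLock() {
		return nil, false
	}
	return m.Unlock, true
}
```

A goroutine that updates an instance would call `unlock := locks.Lock(instanceName)` followed by `defer unlock()`, so slow operations on one instance do not block work on the others.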
Gabriel Adrian Samfira
a9cf5127a9 More granular loops, update go-github
This commit adds:

  * more granular loops for various operations
  * update go-github to latest version
  * skip trying to fetch runner info for canceled or skipped jobs
  * loops use waitgroups to signal exit

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-23 08:16:41 +00:00
Gabriel Adrian Samfira
4921692ee2 Wrap errgroup in select
This commit:

  * swaps WaitGroups with errgroups
  * wraps errgroup.Wait() in a select to prevent situations in which an
    operation takes a long time and prevents garm from being restarted.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-23 01:07:55 +03:00
Mihaela Balutoiu
d698e2815e Fix runner/pools.go typo
Signed-off-by: Mihaela Balutoiu <mbalutoiu@cloudbasesolutions.com>
2023-06-21 10:56:51 +03:00
Mihaela Balutoiu
00c0ada0aa Add more runner/pools.go unit tests
Signed-off-by: Mihaela Balutoiu <mbalutoiu@cloudbasesolutions.com>
2023-06-19 23:39:07 +03:00
Mihaela Balutoiu
7ac2455379 Fix runner/pools.go typo
Signed-off-by: Mihaela Balutoiu <mbalutoiu@cloudbasesolutions.com>
2023-06-15 14:12:07 +03:00
Gabriel
05168ac994
Merge pull request #106 from gabriel-samfira/reap-failed-agent
Reap failed agent
2023-06-14 22:19:20 +03:00
Mihaela Balutoiu
e1ad300f79 Add test cases for the runner/pools.go
Signed-off-by: Mihaela Balutoiu <mbalutoiu@cloudbasesolutions.com>
2023-06-14 14:17:47 +03:00
Gabriel Adrian Samfira
0e637c10e3
Remove failed runner and some retries
  * When a runner fails to set up the github agent, we reap it after the
    pool timeout is reached.
  * add a retry in the userdata when configuring the runner agent

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-13 21:01:40 +03:00
Lea Waller
e3065d6951
Allow installing additional packages in lxd
2023-06-11 17:32:30 +02:00
Gabriel Adrian Samfira
06745eb88a
Validate the result returned by providers
Providers may return only 3 possible statuses:

  * InstanceRunning
  * InstanceError
  * InstanceStopped

Every other status is reserved for the controller to set. Provider
responses will be split from the instance response in a future commit.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-11 15:02:36 +03:00
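The check described in this commit amounts to an allow-list over the statuses a provider may report. In the sketch below, the three status names come from the commit message. The `InstanceStatus` type, the string values, the `InstancePendingCreate` constant and the `validateProviderStatus` function are assumptions made for illustration.

```go
package main

import "fmt"

// InstanceStatus and the string values below are assumed for this sketch.
type InstanceStatus string

const (
	// Statuses a provider is allowed to return (named in the commit message).
	InstanceRunning InstanceStatus = "running"
	InstanceError   InstanceStatus = "error"
	InstanceStopped InstanceStatus = "stopped"

	// Hypothetical example of a status reserved for the controller.
	InstancePendingCreate InstanceStatus = "pending_create"
)

// validateProviderStatus rejects any status a provider is not allowed to set.
func validateProviderStatus(s InstanceStatus) error {
	switch s {
	case InstanceRunning, InstanceError, InstanceStopped:
		return nil
	default:
		return fmt.Errorf("invalid status returned by provider: %q", s)
	}
}
```

Running this check on every provider response, before the result is stored, would keep controller-only statuses from being set by a provider.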
Gabriel Adrian Samfira
d8ed55288f Small comment change
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-06 14:55:08 +00:00
Gabriel Adrian Samfira
829933559d Allow installing runners to finish
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-06 14:49:28 +00:00
Gabriel Adrian Samfira
bd2f103743
Wait for addPendingInstances to finish
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-06 16:40:27 +03:00
Gabriel Adrian Samfira
e9f66c2035
Wait for deletePendingInstances() to finish
Use an errgroup to wait for all instance deletion operations before
returning. Log any failure.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-06 16:36:08 +03:00
Gabriel Adrian Samfira
05a79d298c
Move some code around
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-06 16:31:30 +03:00
Gabriel Adrian Samfira
9efefc0d6a
Parallelization and LXD timeouts
  * cleanupOrphanedGithubRunners() now uses errgroup to parallelize and
    report errors when removing runners from the provider.
  * retryFailedInstancesForOnePool() now uses errgroup
  * Removed some setPoolRunningState calls. Those errors should be handled
    in the loop where they eventually bubble up.
  * Added a number of timeouts in the LXD provider for delete and list
    instances. This provider should be converted into an external
    provider.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-06 16:07:07 +03:00
Gabriel Adrian Samfira
88a39220f5 Allow disabling updates
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-04 22:28:50 +00:00
Gabriel Adrian Samfira
132823b453 Let CreateInstance download and cache image
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-04 21:20:22 +00:00
Gabriel Adrian Samfira
234095e456
Fix pool ID check when listing instances
We should be looking for the poolIDKey in the extended LXD config.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-04 15:43:03 +03:00
Gabriel Adrian Samfira
a433bede96 Return only enabled pools
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-31 14:47:27 +00:00
Gabriel Adrian Samfira
6d4f297097 Fix error type passed into errors.As
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-31 04:36:16 +00:00
Gabriel Adrian Samfira
243ae75476 Properly handle stopped runners
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-27 15:02:25 +00:00
Gabriel Adrian Samfira
6b3ea50ca5 Add runner group option to pools
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-27 09:21:21 +00:00
Gabriel Adrian Samfira
80e8f6dc1e
Add some exit codes
The external provider needs a simple way to indicate certain types of
errors. Duplicate and not-found errors are two such examples.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-26 22:31:55 +03:00
Gabriel Adrian Samfira
e7f208367b
Error if we can't remove instance from provider
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-26 22:19:20 +03:00
Gabriel Adrian Samfira
c6366ab896
Add Windows userdata
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-20 20:10:48 +02:00
Gabriel Adrian Samfira
da5e8f9719
Fall back to info saved in tags
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-20 10:50:08 +02:00
Gabriel Adrian Samfira
92af290c27
Properly resolve OS tag
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-19 21:28:24 +02:00
Gabriel Adrian Samfira
a56ab9609a
Add Windows as a supported OS
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-19 19:29:01 +02:00
Gabriel Adrian Samfira
91e2d7b029
Remove unused param, add OSType to bootstrap param
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-16 12:29:00 +02:00
Gabriel Adrian Samfira
b17c921a7c
Add Run() function
  * Add Run() helper for external providers
  * Make GARM_CONTROLLER_ID env var common to all commands

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-15 01:35:42 +02:00
Gabriel Adrian Samfira
e29d5db72c
Properly detect if we have something on stdin
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-14 20:20:26 +02:00
Gabriel Adrian Samfira
223477c4dd
Define interface for external providers
This interface is similar to the common.Provider interface, but lacks
the AsParams() function. Decoupling the external provider interface from
the internal provider interface allows us to account for any
particularities that may appear between them.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-14 19:49:54 +02:00
Gabriel Adrian Samfira
882d07a0da
Add execution environment to external provider
The execution package is a common package that can be used by external
providers to load environment variables and stdin, in a coherent struct
that can be consumed by the various commands that need to execute as
part of the provider.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-14 18:28:31 +02:00
Gabriel Adrian Samfira
0074af9370
Move some defaults and types from config
The params package should not depend on config. The params package
should be consumable by external applications that wish to interact with
garm, and it makes no sense to pull in the config package just for some
constants. As such, the following changes have been made:

  * Moved some types from config to params
  * Moved defaults into a new leaf package called appdefaults

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-14 14:15:10 +02:00
Gabriel Adrian Samfira
829db87f15
Rename module
This change renames the module from "garm" to "github.com/cloudbase/garm".

This will make it easier for external applications to consume public
functions defined in garm, without having to resort to replace.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-12 16:01:49 +02:00
Gabriel
24f61ceb8c
Merge pull request #78 from gabriel-samfira/add-scale-down-grace-period
Add grace period to scale-down
2023-02-08 15:16:28 +02:00
Gabriel Adrian Samfira
43d2fd8c2d
Add grace period to scale-down
Add a grace period for idle runners of 5 minutes. A new idle runner will
not be taken into consideration for scale-down unless it's older than 5
minutes. This should prevent situations where the scaleDown() routine
that runs every minute will evaluate candidates for reaping and
erroneously count the new one as well. The in_progress hook that
transitions an idle runner to "active" may arrive a long while after the
"queued" hook has spun up a runner.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-02-07 13:36:15 +02:00
Gabriel
439eeee479
Update runner/pool/pool.go
Co-authored-by: Michael Kuhnt <maigl@users.noreply.github.com>
2023-02-07 13:14:28 +02:00