Commit graph

171 commits

Author SHA1 Message Date
Gabriel
3fe5d510fe Merge pull request #124 from gabriel-samfira/fix-entity-update
Fix entity update
2023-07-05 13:39:07 +03:00
Gabriel Adrian Samfira
fe2cb01528 Slight refactor and fix tests
Updating a pool will no longer try to create a pool manager if one does
not already exist. A pool manager must be started when a pool is created.
Updating an existing pool without a pool manager is an error condition.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-05 09:46:19 +00:00
Gabriel Adrian Samfira
86ed06d6ff Rename UpdateRepositoryParams to UpdateEntityParams
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-05 00:00:24 +00:00
Gabriel Adrian Samfira
c162bde6cb Set pool manager status
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-04 23:05:01 +00:00
Gabriel Adrian Samfira
3d26900d32 Set credentials in pool manager
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-04 22:11:45 +00:00
Gabriel Adrian Samfira
a41eeb6f1e Update comment on function
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-04 10:48:14 +00:00
Gabriel Adrian Samfira
6c06afb8e8 Don't add additional labels to GH runner
For now, the additional labels would only contain the job ID that triggered
the creation of the runner. It does not make sense to add this label to the
actual runner that registers against GitHub. We can simply use it internally
by fetching it from the DB.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:48:22 +00:00
Gabriel Adrian Samfira
0ab8f73bb4 Use r.log()
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
9101cdc0a2 Lower the tool update interval to 1 minute
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
f92ac2a74f Lower backoff timer to 1 minute
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
7f510ec40a Check if we have a recorded job
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
a526c1024c Various fixes
  * enable foreign key constraints on sqlite
  * on delete cascade for addresses and status messages
  * add debug server config option
  * fix rr allocation

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
f7cf6bb619 Increase backoff to 30 seconds
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
3796c25228 Amend some log messages
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
bf90eb323a Add back update locks
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
4a2ba68867 Use the same db connection in runner
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
b52f107bde Update log messages
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
c04a93dde9 Add basic round robin for pools
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
4b9c20e1be Reduce timeout to 10 seconds
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
28360fd662 Do not record jobs not meant for us
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
a15a91b974 Break lock and lower scale down timeout
Break the lock on a job if it's still queued and the runner that it
triggered was assigned to another job. This may cause leftover runners
to be created, but we scale those down in ~3 minutes.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
1287a93cf2 Add job list to API and CLI
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
b6a02db446 Remove completed jobs and slight optimization
  * Remove completed jobs from the DB
  * Skip ensuring min idle runners for pools with min idle runners set to 0

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
5153738359 Small fixes
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
fbffd8157b Add job tracking
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
5bca63eeb1 Replace wait implementation with errgroup
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:40:57 +00:00
Gabriel Adrian Samfira
67b871488d Log the actual error
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-30 09:14:10 +03:00
Gabriel Adrian Samfira
0a27acd818 Remove extra loop and add logging
  * remove an extra loop; the fetch tools loop does the same job
  * add a lot of log messages

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-30 08:52:16 +03:00
Gabriel Adrian Samfira
7358beb2b9 Merge Unlock() and UnlockAndDelete()
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-30 08:48:29 +03:00
Gabriel Adrian Samfira
1edb9247a8 Add per instance mux
Lock operations per instance name. This should keep goroutines from
updating the same instance concurrently when operations are slow.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-23 15:43:31 +00:00
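The per-instance locking described in commit 1edb9247a8 could look roughly like the sketch below. It is not taken from the garm source: `keyedMutex` and `bumpConcurrently` are hypothetical names, and the approach (one `sync.Mutex` per instance name, stored in a `sync.Map`) is only one plausible way to serialize operations per instance.

```go
package main

import (
	"fmt"
	"sync"
)

// keyedMutex hands out one mutex per key (here: an instance name).
// Hypothetical type for illustration only.
type keyedMutex struct {
	locks sync.Map // effectively map[string]*sync.Mutex
}

// Lock blocks until the mutex belonging to name has been acquired.
func (k *keyedMutex) Lock(name string) {
	m, _ := k.locks.LoadOrStore(name, &sync.Mutex{})
	m.(*sync.Mutex).Lock()
}

// Unlock releases the mutex belonging to name. Unknown names are ignored.
func (k *keyedMutex) Unlock(name string) {
	if m, ok := k.locks.Load(name); ok {
		m.(*sync.Mutex).Unlock()
	}
}

// bumpConcurrently increments one counter per instance name from many
// goroutines. Updates to the same name are serialized by the keyed mutex,
// while updates to different names can proceed in parallel.
func bumpConcurrently(names []string, perName int) map[string]int {
	var km keyedMutex
	counts := make(map[string]*int, len(names))
	for _, n := range names {
		counts[n] = new(int)
	}

	var wg sync.WaitGroup
	for _, n := range names {
		for i := 0; i < perName; i++ {
			wg.Add(1)
			go func(name string) {
				defer wg.Done()
				km.Lock(name)
				defer km.Unlock(name)
				*counts[name]++
			}(n)
		}
	}
	wg.Wait()

	out := make(map[string]int, len(counts))
	for n, c := range counts {
		out[n] = *c
	}
	return out
}

func main() {
	fmt.Println(bumpConcurrently([]string{"runner-a", "runner-b"}, 100))
}
```

A map of mutexes that is never pruned grows with the number of instance names seen, so a long-running service would likely also want to delete entries once an instance is gone.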
Gabriel Adrian Samfira
a9cf5127a9 More granular loops, update go-github
This commit:

  * adds more granular loops for various operations
  * updates go-github to the latest version
  * skips fetching runner info for canceled or skipped jobs
  * makes loops use waitgroups to signal exit

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-23 08:16:41 +00:00
Gabriel Adrian Samfira
4921692ee2 Wrap errgroup in select
This commit:

  * swaps WaitGroups with errgroups
  * wraps errgroup.Wait() in a select to prevent situations in which an
    operation takes a long time and prevents garm from being restarted.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-23 01:07:55 +03:00
Mihaela Balutoiu
d698e2815e Fix runner/pools.go typo
Signed-off-by: Mihaela Balutoiu <mbalutoiu@cloudbasesolutions.com>
2023-06-21 10:56:51 +03:00
Mihaela Balutoiu
00c0ada0aa Add more runner/pools.go unit tests
Signed-off-by: Mihaela Balutoiu <mbalutoiu@cloudbasesolutions.com>
2023-06-19 23:39:07 +03:00
Mihaela Balutoiu
7ac2455379 Fix runner/pools.go typo
Signed-off-by: Mihaela Balutoiu <mbalutoiu@cloudbasesolutions.com>
2023-06-15 14:12:07 +03:00
Gabriel
05168ac994 Merge pull request #106 from gabriel-samfira/reap-failed-agent
Reap failed agent
2023-06-14 22:19:20 +03:00
Mihaela Balutoiu
e1ad300f79 Add test cases for the runner/pools.go
Signed-off-by: Mihaela Balutoiu <mbalutoiu@cloudbasesolutions.com>
2023-06-14 14:17:47 +03:00
Gabriel Adrian Samfira
0e637c10e3 Remove failed runner and some retries
  * When a runner fails to set up the github agent, we reap it after the
    pool timeout is reached.
  * add a retry in the userdata when configuring the runner agent

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-13 21:01:40 +03:00
Lea Waller
e3065d6951 Allow installing additional packages in lxd
2023-06-11 17:32:30 +02:00
Gabriel Adrian Samfira
06745eb88a Validate the result returned by providers
Providers may return only 3 possible statuses:

  * InstanceRunning
  * InstanceError
  * InstanceStopped

Every other status is reserved for the controller to set. Provider
responses will be split from the instance response in a future commit.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-11 15:02:36 +03:00
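The check described in commit 06745eb88a, accepting only three statuses from a provider, might be sketched like this. The constant names come from the commit message. Everything else is an assumption made for illustration: the `InstanceStatus` type, the string values, the `InstancePendingCreate` example of a controller-only status, and the `validateProviderStatus` helper.

```go
package main

import "fmt"

// InstanceStatus is a hypothetical string-backed status type.
type InstanceStatus string

const (
	// Statuses a provider is allowed to return (names from the commit message;
	// the string values are assumed).
	InstanceRunning InstanceStatus = "running"
	InstanceError   InstanceStatus = "error"
	InstanceStopped InstanceStatus = "stopped"

	// Assumed example of a status reserved for the controller.
	InstancePendingCreate InstanceStatus = "pending_create"
)

// validateProviderStatus rejects any status a provider may not set.
func validateProviderStatus(s InstanceStatus) error {
	switch s {
	case InstanceRunning, InstanceError, InstanceStopped:
		return nil
	default:
		return fmt.Errorf("invalid status returned by provider: %q", s)
	}
}

func main() {
	for _, s := range []InstanceStatus{InstanceRunning, InstancePendingCreate} {
		fmt.Printf("%s: %v\n", s, validateProviderStatus(s))
	}
}
```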
Gabriel Adrian Samfira
d8ed55288f Small comment change
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-06 14:55:08 +00:00
Gabriel Adrian Samfira
829933559d Allow installing runners to finish
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-06 14:49:28 +00:00
Gabriel Adrian Samfira
bd2f103743 Wait for addPendingInstances to finish
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-06 16:40:27 +03:00
Gabriel Adrian Samfira
e9f66c2035 Wait for deletePendingInstances() to finish
Use an errgroup to wait for all instance deletion operations before
returning. Log any failure.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-06 16:36:08 +03:00
Gabriel Adrian Samfira
05a79d298c Move some code around
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-06 16:31:30 +03:00
Gabriel Adrian Samfira
9efefc0d6a Parallelization and LXD timeouts
  * cleanupOrphanedGithubRunners() now uses errgroup to parallelize and
    report errors when removing runners from the provider.
  * retryFailedInstancesForOnePool() now uses errgroup
  * Removed some setPoolRunningState which should be treated in the loop
    where those errors eventually bubble up and can be handled.
  * Added a number of timeouts in the LXD provider for delete and list
    instances. This provider should be converted into an external
    provider.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-06 16:07:07 +03:00
Gabriel Adrian Samfira
88a39220f5 Allow disabling updates
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-04 22:28:50 +00:00
Gabriel Adrian Samfira
132823b453 Let CreateInstance download and cache image
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-04 21:20:22 +00:00
Gabriel Adrian Samfira
234095e456 Fix pool ID check when listing instances
We should be looking for the poolIDKey in the extended LXD config.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-04 15:43:03 +03:00
Gabriel Adrian Samfira
a433bede96 Return only enabled pools
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-31 14:47:27 +00:00