Commit graph

209 commits

Author SHA1 Message Date
Gabriel Adrian Samfira
d09f12dfd8 Add force delete runner
This branch adds the ability to forcefully remove a runner from GARM.

When the operator wishes to manually remove a runner, the workflow is as
follows:

* Check that the runner exists in GitHub. If it does, attempt to
  remove it. An error here indicates that the runner may be processing
  a job. In this case, we don't continue and the operator gets immediate
  feedback from the API.
* Mark the runner in the database as pending_delete
* Allow the consolidate loop to reap it from the provider and remove it
  from the database.

Removing the instance from the provider is async. If the provider errs out,
GARM will keep trying to remove it in perpetuity until the provider succedes.

In situations where the provider is misconfigured, this will never happen, leaving
the instance in a permanent state of pending_delete.

A provider may fail for various reasons. Either credentials have expired, the
API endpoint has changed, the provider is misconfigured or the operator may just
have removed it from the config before cleaning up the runners. While some cases
are recoverable, some are not. We cannot have a situation in which we cannot clean
resources in garm because of a misconfiguration.

This change adds the pending_force_delete instance status. Instances marked with
this status, will be removed from GARM even if the provider reports an error.

The GARM cli has been modified to give new meaning to the --force-remove-runner
option. This option in the CLI is no longer mandatory. Instead, setting it will mark
the runner with the new pending_force_delete status. Omitting it will mark the runner
with the old status of pending_delete.

Fixes: #160

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-10-12 06:15:36 +00:00
Gabriel Adrian Samfira
26dbc3d8e5 Update garm-provider-common
This update pulls in the latest version of garm-provider-common which removes
its dependency on go-github, making future updates much less painful.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-10-09 10:55:11 +00:00
Gabriel Adrian Samfira
019948acbe Add JIT config as part of instance create
We must create the DB entry for a runner with a JIT config included. Adding it later
via an update runs the risk of having the consolidate loop pick up the incomplete instance.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-09-24 13:51:17 +00:00
Gabriel Adrian Samfira
8c507a9251 Run go generate
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-09-24 13:51:17 +00:00
Gabriel Adrian Samfira
4bedb1dd63 Fix URLs for enterprises
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-09-24 13:51:17 +00:00
Gabriel Adrian Samfira
5f2cb19503 Use accessors when getting response values
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-09-24 13:51:17 +00:00
Gabriel Adrian Samfira
e53c271337 Add metadata URLs
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-09-24 13:50:21 +00:00
Gabriel Adrian Samfira
1268507ce6 Add jit config routes
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-09-24 13:50:20 +00:00
Gabriel Adrian Samfira
5214aca228 Add jit config for new runner
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-09-24 13:49:57 +00:00
Gabriel Adrian Samfira
6dea1c1937 Add temporary replace to fork
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-09-24 13:49:56 +00:00
Gabriel Adrian Samfira
d5f8cf079e Ignore instances that are still being created from reaping
When using JIT runners, we register the runner on GitHub before we get
a chance to spin up the instance in the provider. In such cases, we end
up with a runner in "offline" state while we're creating the actual resource
that will embody the runner. This change will give runners a chance to come
online before garm tries to clean them up.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-09-24 13:49:06 +00:00
Gabriel Adrian Samfira
591641a8a3 Add temporary redirect to go-github fork
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-09-24 13:49:04 +00:00
Gabriel Adrian Samfira
09d2f1b061 Add Jit functions to GH client interface
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-09-24 13:48:09 +00:00
Gabriel Adrian Samfira
fc77a4b735 Update go-github and garm-provider-common
We need to abstract away the tools struct and not have garm-provider-common
depend on go-github just for that one struct. It makes it hard to update
go-github without updating garm-provider-common first and then all the rest
of the providers.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-09-24 07:56:56 +00:00
Gabriel Adrian Samfira
2caf25f18b Run go generate
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-08-28 09:58:13 +00:00
Gabriel Adrian Samfira
a26907fb91 Add root CA bundle metadata URL
Thic change adds a metadata endpoint that returns a list of root CA
certificates a runner must install in order to be able to validate all
relevant API endpoints it may require. This includes any GHES API that
runs on a self signed certificate.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-08-28 09:44:18 +00:00
Gabriel Adrian Samfira
d700b790ac Update garm-provider-common and go-github
* Updates the garm-provider-common and go-github packages.
* Update sqlToParamsInstance to return an error when unmarshaling

This change is needed to pull in the new Seal/Unseal functions in common.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-08-28 08:13:44 +00:00
Gabriel Adrian Samfira
891b6d3105 Run go generate
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-08-26 19:47:17 +00:00
Gabriel Adrian Samfira
baa7df65a4 Fix garm pool manager startup
If we fail to get the tools for one pool, garm fails to start due to pool
manager startup timeout. Launch the initial tools update function as a
goroutine and return from Start(). If it fails, it will retry, and we won't
block garm from starting.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-08-25 08:57:24 +00:00
Gabriel Adrian Samfira
93bfb6fe07
ping the webhook after creation
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-08-22 09:39:02 +03:00
Gabriel Adrian Samfira
7bd3e0b6f7
Add ping hook functions to client
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-08-22 09:39:02 +03:00
Gabriel Adrian Samfira
d57e488f12
Return details in case PAT does not have access
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-08-22 09:39:02 +03:00
Gabriel Adrian Samfira
1c0ff85a0d
Add flag to toggle webhook management
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-08-22 09:39:02 +03:00
Gabriel Adrian Samfira
bb6ee9c668
Add keepWebhook flag when deleting entities
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-08-22 09:39:01 +03:00
Gabriel Adrian Samfira
763eb705d8
Remove webhook when removing an entity & cmd fixes
* When removing a repo or org, we uninstall the webhook as well.
  * Upgrade cobra command and mark "webhook-secret" and "random-webhook-secret"
    as MarkFlagsOneRequired()

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-08-22 09:39:01 +03:00
Gabriel Adrian Samfira
779afe980e
Add webhook show, return info and some fixes
* Added a webhook show command. This gives us info about the webhook and
    if it is installed.
  * Return webhook info when installing the webhook
  * Small typo fixes.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-08-22 09:39:01 +03:00
Gabriel Adrian Samfira
6051fa016c
Return bad request if hook already installed
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-08-22 09:39:01 +03:00
Gabriel Adrian Samfira
dbd41f518d
Add CLI webhook enablement
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-08-22 09:39:01 +03:00
Gabriel Adrian Samfira
7ce3f007b0
Add functions to (un)install webhooks for orgs and repos
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-08-22 09:39:01 +03:00
Gabriel Adrian Samfira
99539edde7 Add controller info
This change adds a new controller info endpoint and associated client and
CLI command. The controller info endpoint returns information about controller
status and configuration.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-08-12 22:47:50 +00:00
Gabriel Adrian Samfira
3e446a8191 Update docs and deps
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-26 16:03:36 +00:00
Gabriel Adrian Samfira
72f45bba47 Remove unused code
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-24 07:14:29 +00:00
Gabriel Adrian Samfira
7e4dcea056 Update deps
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-23 15:50:49 +00:00
Gabriel Adrian Samfira
e33b64aacb Providers now return ProviderInstance{}
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-23 12:47:56 +00:00
Gabriel Adrian Samfira
2a6b453fc1 Move the exec helper
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-23 12:33:49 +00:00
Gabriel Adrian Samfira
e775c9c11d Move most of util package
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-22 22:39:17 +00:00
Gabriel Adrian Samfira
ed651bb7d0 Move errors to external package
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-22 22:26:47 +00:00
Gabriel Adrian Samfira
da13cec2de Move code to external package
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-21 15:34:18 +00:00
Gabriel
3fe5d510fe
Merge pull request #124 from gabriel-samfira/fix-entity-update
Fix entity update
2023-07-05 13:39:07 +03:00
Gabriel Adrian Samfira
fe2cb01528 Slight refactor and fix tests
Updating a pool will no longer try to create a pool manager if one does
not already exist. A pool manager must be started when a pool is created.
Updating an existing pool without a pool manager is an error condition.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-05 09:46:19 +00:00
Gabriel Adrian Samfira
86ed06d6ff Rename UpdateRepositoryParams to UpdateEntityParams
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-05 00:00:24 +00:00
Gabriel Adrian Samfira
c162bde6cb Set pool manager status
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-04 23:05:01 +00:00
Gabriel Adrian Samfira
3d26900d32 Set credentials in pool manager
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-04 22:11:45 +00:00
Gabriel Adrian Samfira
a41eeb6f1e Update comment on function
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-04 10:48:14 +00:00
Gabriel Adrian Samfira
6c06afb8e8 Don't add aditional labels to GH runner
For now, the aditional labels would only contain the job ID that triggered
the creation of the runner. It does not make sense to add this label to the
actual runner that registeres against github. We can simply use it internally
by fetching it from the DB.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:48:22 +00:00
Gabriel Adrian Samfira
0ab8f73bb4 Use r.log()
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
9101cdc0a2 Lower the tool update interval to 1 minute
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
f92ac2a74f Lower backoff timer to 1 minute
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
7f510ec40a Check if we have a recorded job
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
a526c1024c Various fixes
* enable foreign key constraints on sqlite
  * on delete cascade for addresses and status messages
  * add debug server config option
  * fix rr allocation

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00