Commit graph

59 commits

Author SHA1 Message Date
Gabriel Adrian Samfira
d09f12dfd8 Add force delete runner
This branch adds the ability to forcefully remove a runner from GARM.

When the operator wishes to manually remove a runner, the workflow is as
follows:

* Check that the runner exists in GitHub. If it does, attempt to
  remove it. An error here indicates that the runner may be processing
  a job. In this case, we don't continue and the operator gets immediate
  feedback from the API.
* Mark the runner in the database as pending_delete
* Allow the consolidate loop to reap it from the provider and remove it
  from the database.

Removing the instance from the provider is async. If the provider errs out,
GARM will keep trying to remove it in perpetuity until the provider succedes.

In situations where the provider is misconfigured, this will never happen, leaving
the instance in a permanent state of pending_delete.

A provider may fail for various reasons. Either credentials have expired, the
API endpoint has changed, the provider is misconfigured or the operator may just
have removed it from the config before cleaning up the runners. While some cases
are recoverable, some are not. We cannot have a situation in which we cannot clean
resources in garm because of a misconfiguration.

This change adds the pending_force_delete instance status. Instances marked with
this status, will be removed from GARM even if the provider reports an error.

The GARM cli has been modified to give new meaning to the --force-remove-runner
option. This option in the CLI is no longer mandatory. Instead, setting it will mark
the runner with the new pending_force_delete status. Omitting it will mark the runner
with the old status of pending_delete.

Fixes: #160

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-10-12 06:15:36 +00:00
Gabriel Adrian Samfira
fb2bb7fb08 Fix nil pointer dereference
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-08-23 11:21:15 +00:00
Gabriel Adrian Samfira
3b651afe07
Add optional --install-webhook flag when creating repo/org
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-08-22 09:39:02 +03:00
Gabriel Adrian Samfira
c641e1d596
Fixup field name
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-08-22 09:39:02 +03:00
Gabriel Adrian Samfira
c00048e128
Add optional keepWebhook flag when removing an entity
The user can opt to not delete the webhook (if installed) when removing
the entity from garm. Garm will only ever try to remove a webhook that
exactly matches the URL that is composed of the base webhook URL configured
in the config.toml file and the unique controller ID that is generated
when the controller is first installed. It should be safe to remove the
webhook when the entity is removed.

Of course, this behavior can be disabled.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-08-22 09:39:02 +03:00
Gabriel Adrian Samfira
763eb705d8
Remove webhook when removing an entity & cmd fixes
* When removing a repo or org, we uninstall the webhook as well.
  * Upgrade cobra command and mark "webhook-secret" and "random-webhook-secret"
    as MarkFlagsOneRequired()

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-08-22 09:39:01 +03:00
Gabriel Adrian Samfira
779afe980e
Add webhook show, return info and some fixes
* Added a webhook show command. This gives us info about the webhook and
    if it is installed.
  * Return webhook info when installing the webhook
  * Small typo fixes.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-08-22 09:39:01 +03:00
Gabriel Adrian Samfira
dbd41f518d
Add CLI webhook enablement
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-08-22 09:39:01 +03:00
Gabriel Adrian Samfira
7ce3f007b0
Add functions to (un)install webhooks for orgs and repos
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-08-22 09:39:01 +03:00
Gabriel Adrian Samfira
99539edde7 Add controller info
This change adds a new controller info endpoint and associated client and
CLI command. The controller info endpoint returns information about controller
status and configuration.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-08-12 22:47:50 +00:00
Mihaela Balutoiu
ff5abf1294 Implement API client functionality in the garm-cli
Signed-off-by: Mihaela Balutoiu <mbalutoiu@cloudbasesolutions.com>
2023-07-25 19:26:00 +03:00
Gabriel Adrian Samfira
e775c9c11d Move most of util package
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-22 22:39:17 +00:00
Gabriel Adrian Samfira
ed651bb7d0 Move errors to external package
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-22 22:26:47 +00:00
Gabriel Adrian Samfira
da13cec2de Move code to external package
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-21 15:34:18 +00:00
Gabriel Adrian Samfira
86ed06d6ff Rename UpdateRepositoryParams to UpdateEntityParams
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-05 00:00:24 +00:00
Gabriel Adrian Samfira
6c6c6636ba Add org and enterprise update
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-04 23:24:19 +00:00
Gabriel Adrian Samfira
cec1d59991 Add repo update command
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-04 22:27:25 +00:00
Gabriel Adrian Samfira
f92ac2a74f Lower backoff timer to 1 minute
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
45dceae88c Add requested labels to cli
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
a15a91b974 Break lock and lower scale down timeout
Break the lock on a job if it's still queued and the runner that it
triggered was assigned to another job. This may cause leftover runners
to be created, but we scale those down in ~3 minutes.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
1287a93cf2 Add job list to API and CLI
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-07-03 07:46:20 +00:00
Gabriel Adrian Samfira
66dc18c3ff
Don't merge the second column in the CLI
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-06-08 13:57:08 +03:00
Gabriel Adrian Samfira
2ebd95977f Limit width in pool view to 100 characters
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-31 07:44:23 +00:00
Gabriel Adrian Samfira
702937f636 Add github runner group in pool show
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-30 09:00:46 +00:00
Gabriel Adrian Samfira
6b3ea50ca5 Add runner group option to pools
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-27 09:21:21 +00:00
Gabriel Adrian Samfira
0074af9370
Move some defaults and types from config
The params package should not depend on config. The params packages
should be consumable by external applications that wish to interact with
garm, and it makes no sense to pull in the config package just for some
constants. As such, the following changes have been made:

  * Moved some types from config to params
  * Moved defaults in a new leaf package called appdefaults

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-14 14:15:10 +02:00
Gabriel Adrian Samfira
829db87f15
Rename module
This change renames the module from "garm" to "github.com/cloudbase/garm".

This will make it easier to consume public functions defined in garm, by
external applications, without having to resort to replace.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-03-12 16:01:49 +02:00
Gabriel Adrian Samfira
040cb0f5f6 Add extra-specs flag
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-01-30 16:03:50 +00:00
Gabriel Adrian Samfira
f25951decb Add extra specs on pools
Extra specs is an opaque valid JSON that can be set on a pool and which
will be passed along to the provider as part of instance bootstrap params.

This field is meant to allow operators to send extra configuration values
to external or built-in providers. The extra specs is not interpreted or
useful in any way to garm itself, but it may be useful to the provider
which interacts with the IaaS.

The extra specs are not meant to be used for secrets. Adding sensitive
information to this field is highly discouraged. This field is meant as a    
means to add fine tuning knobs to the providers, on a per pool basis.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-01-30 13:10:21 +00:00
Gabriel Adrian Samfira
77f96d2761 Drive-by fix: Get propper runner prefix for pool
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-01-27 15:00:06 +00:00
Michael Kuhnt
ee659f509f
feat: add prometheus metrics & endpoint 2023-01-26 14:15:16 +01:00
Gabriel Adrian Samfira
e93b6d73e5
Sanitize log entries
While most of these log entries come from either github or our own
database, it's still a good idea to sanitize them.
2023-01-23 18:01:46 +02:00
Gabriel Adrian Samfira
b354cedf7e
Fixed a bunch of linting issues
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-01-20 22:21:22 +02:00
Gabriel Adrian Samfira
abcc9569bd
Add a common RunnerPrefix type
There are several fields that are common among some of the data
structures in garm. The RunnerPrefix is just one of them. Perhaps we
should move some of the rest in a common type and embed that into the
types that share those fields.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-01-20 12:12:15 +02:00
Michael Kuhnt
6af3025743
feat: allow to configure the runner name 2023-01-19 11:13:36 +01:00
Gabriel Adrian Samfira
547eaeed28
Slightly better error handling 2022-12-31 17:18:59 +02:00
Gabriel Adrian Samfira
05057e37fd
Start pool managers in the background
Garm no longer fails on startup if a pool manager cannot be started. It
will attempt to start the pool manager in the background. If it fails
due to an unauthorized error, it will sleep for 3 hours. It is unlikely
it will work a second time if credentials are not updated in the config
and garm is restarted, so no point in getting rate limited.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2022-10-21 17:14:03 +03:00
Gabriel Adrian Samfira
3e3b91ee59
Add enterprise support to garm-cli
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2022-10-21 17:14:03 +03:00
Gabriel Adrian Samfira
f40420bfb6
Add ability to specify github enpoints for creds
The GitHub credentials section now allows setting some API endpoints
that point the github client and the runner setup script to the propper
URLs. This allows us to use garm with an on-prem github enterprise server.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2022-10-21 17:14:03 +03:00
Gabriel Adrian Samfira
16b56c89de
Format message on error 2022-10-21 11:38:16 +03:00
Gabriel Adrian Samfira
bad7398b6c
Convert byte array to string 2022-10-21 11:23:04 +03:00
Gabriel Adrian Samfira
a7f151e2d2
Add log streamer
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2022-10-21 11:13:42 +03:00
Gabriel Adrian Samfira
5566cde77f A few fixes
* CLI properly formats the IP addresses in runner show
  * LXD provider now waits for an IP address before returning on Create
  * Added a few mocks for testing

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2022-07-10 14:52:15 +00:00
Gabriel Adrian Samfira
280cad96e4 Make runner-bootstrap-timeout optional
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2022-07-06 17:37:18 +00:00
Gabriel Adrian Samfira
9e949d833f Update .gitignore
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2022-06-30 10:34:51 +00:00
Gabriel Adrian Samfira
bbbe67bf7c Vendor packages and add Makefile
* Vendors packages
  * Adds a Makefile that uses docker to build a static binary against musl
using alpine linux.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2022-06-30 10:20:32 +00:00
Gabriel Adrian Samfira
15a1308441 Add timeout functionality for pool runner bootstrap
Pools can now define a bootstrap timeout for runners. The timeout can
be defined per pool and indicates the amount of time after which a runner
is considered defunct and removed.

If a runner doesn't join github in the configured amount of time, and it
receives no updates indicating that it is installing the runner via instance
status updates, it is considered defunct.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2022-06-29 23:44:03 +00:00
Gabriel Adrian Samfira
5390efbaab Add manual runner removal
Runners can now be manually removed using the CLI. Some restrictions apply:

  * A runner must be idle in github. Github will not allow us to remove a runner
that is running a workflow.
  * The runner status must be "running"

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2022-06-29 16:23:01 +00:00
Gabriel Adrian Samfira
a8274dcc02 Add profile management in the CLI 2022-06-17 10:58:35 +00:00
Gabriel Adrian Samfira
dc04bca95c Retry failed runners
* retry adding runners for up to 5 times if they fail.
  * various fixes
2022-05-10 12:28:39 +00:00