This change uses the database watcher to watch for changes to the
github entities, credentials and controller info.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change allows users to bypass GitHub Unauthorized errors when removing
github runners. This means that removing runners will now be possible even
if the pool manager is stopped.
There is a new flag added to the runner rm command and to the API that
tells GARM to bypass pool being stopped and any 401 error returned by
GitHub.
This means you will be able to remove the runners from garm and your
provider, but will mean that the runner will still exist in github as
"offline" if the credentials are not updated or the runner manually removed.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change increases the tools refresh interval to 5 minutes, cleans
up the websocket code a bit, augments the error message that may be returned
when trying to delete a runner in an invalid state and removes a log message
that does not add much value.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change adds a flag on providers that allows users to disable JIT
configuration even when it's available. For context, JIT is available
on github.com and any GHES instance >=3.10.
This option is a stopgap measure for providers that have not yet been
updated to use JIT configs instead of runner registration tokens.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This branch adds the ability to forcefully remove a runner from GARM.
When the operator wishes to manually remove a runner, the workflow is as
follows:
* Check that the runner exists in GitHub. If it does, attempt to
remove it. An error here indicates that the runner may be processing
a job. In this case, we don't continue and the operator gets immediate
feedback from the API.
* Mark the runner in the database as pending_delete
* Allow the consolidate loop to reap it from the provider and remove it
from the database.
Removing the instance from the provider is async. If the provider errs out,
GARM will keep trying to remove it in perpetuity until the provider succedes.
In situations where the provider is misconfigured, this will never happen, leaving
the instance in a permanent state of pending_delete.
A provider may fail for various reasons. Either credentials have expired, the
API endpoint has changed, the provider is misconfigured or the operator may just
have removed it from the config before cleaning up the runners. While some cases
are recoverable, some are not. We cannot have a situation in which we cannot clean
resources in garm because of a misconfiguration.
This change adds the pending_force_delete instance status. Instances marked with
this status, will be removed from GARM even if the provider reports an error.
The GARM cli has been modified to give new meaning to the --force-remove-runner
option. This option in the CLI is no longer mandatory. Instead, setting it will mark
the runner with the new pending_force_delete status. Omitting it will mark the runner
with the old status of pending_delete.
Fixes: #160
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
We need to abstract away the tools struct and not have garm-provider-common
depend on go-github just for that one struct. It makes it hard to update
go-github without updating garm-provider-common first and then all the rest
of the providers.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Thic change adds a metadata endpoint that returns a list of root CA
certificates a runner must install in order to be able to validate all
relevant API endpoints it may require. This includes any GHES API that
runs on a self signed certificate.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* Updates the garm-provider-common and go-github packages.
* Update sqlToParamsInstance to return an error when unmarshaling
This change is needed to pull in the new Seal/Unseal functions in common.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* Added a webhook show command. This gives us info about the webhook and
if it is installed.
* Return webhook info when installing the webhook
* Small typo fixes.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This commit adds:
* more granular loops for various operations
* update go-github to latest version
* skip trying to fetch runner info for canceled or skipped jobs
* loops use waitgroups to signal exit
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change renames the module from "garm" to "github.com/cloudbase/garm".
This will make it easier to consume public functions defined in garm, by
external applications, without having to resort to replace.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Garm no longer fails on startup if a pool manager cannot be started. It
will attempt to start the pool manager in the background. If it fails
due to an unauthorized error, it will sleep for 3 hours. It is unlikely
it will work a second time if credentials are not updated in the config
and garm is restarted, so no point in getting rate limited.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
The GitHub credentials section now allows setting some API endpoints
that point the github client and the runner setup script to the propper
URLs. This allows us to use garm with an on-prem github enterprise server.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
In some cases, runner information is not sent via webhook by Github when
a workflow job transitions to in_progress. We need to know the runner
name in order to update the state in the database. Attempt to fetch the
runner from the API using the workflow ID.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
The instance JWT token expiration time was set at 15 minutes, regardless
of bootstrap timeout. This meant that instances that take longer than 15
minutes, would not be able to send their status updates and github agent
ID back to garm.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* CLI properly formats the IP addresses in runner show
* LXD provider now waits for an IP address before returning on Create
* Added a few mocks for testing
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* added interface for the github client. This will help mocking it
out for testing.
* removed some unused code
* moved some code around
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Runners can now be manually removed using the CLI. Some restrictions apply:
* A runner must be idle in github. Github will not allow us to remove a runner
that is running a workflow.
* The runner status must be "running"
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>