This change adds a new websocket endpoint for database events. The events
endpoint allows clients to stream events as they happen in GARM. Events
are defined as a structure containning the event type (create, update, delete),
the database entity involved (instances, pools, repos, etc) and the payload
consisting of the object involved in the event. The payload translates
to the types normally returned by the API and can be deserialized as one
of the types present in the params package.
The events endpoint is a websocket endpoint and it accepts filters as
a simple json send over the websocket connection. The filters allows the
user to specify which entities are of interest, and which operations should
be returned. For example, you may be interested in changes made to pools
or runners, in which case you could create a filter that only returns
update operations for pools. Or update and delete operations.
The filters can be defined as:
{
"filters": [
{
"entity_type": "instance",
"operations": ["update", "delete"]
},
{
"entity_type": "pool"
},
],
"send_everything": false
}
This would return only update and delete events for instances and all events
for pools. Alternatively you can ask GARM to send you everything:
{
"send_everything": true
}
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
The websocket client and hub interaction has been simplified a bit.
The hub now acts only as a tee writer to the various clients that
register. Clients must register and unregister explicitly. The hub
is no longer passed in to the client.
Websocket clients now watch for password changes or jwt token expiration
times. Clients are disconnected if auth token expires or if the password
is changed.
Various aditional safety checks have been added.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
GARM has a backoff interval when consuming queued jobs. This backoff
is intended to allow any potential idle runners to pick up a job before
GARM attempts to spin up a new one. This change allows users to set a
custom backoff interval or disable it altogether by setting it to 0.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change uses the database watcher to watch for changes to the
github entities, credentials and controller info.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Adds a simple database watcher. At this point it's just one process, but
the plan is to allow different implementations that inform the local running
workers of changes that have occured on entities of interest in the database.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change moves the callback_url, metadata_url and webhooks_url from
the config to the database. The goal is to move as much as possible from
the config to the DB, in preparation for a potential refactor that will
allow GARM to scale out. This would allow multiple nodes to share a single
source of truth.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Do not rely on the entity object to hold updated or detailed credentials,
fetch them from the DB every time.
This change also ensures that we pass in the user context instead of the
runner context to the DB methods.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Add database models that deal with github credentials. This change
adds models for github endpoints (github.com, GHES, etc). This change
also adds code to migrate config credntials to the DB.
Tests need to be fixed and new tests need to be written. This will come
in a later commit.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change fixes a potential nil pointer dereference when a call to
create a pool, fails.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change adds the ability to specify the pool balancing strategy to
use when processing queued jobs. Before this change, GARM would round-robin
through all pools that matched the set of tags requested by queued jobs.
When round-robin (default) is used for an entity (repo, org or enterprise)
and you have 2 pools defined for that entity with a common set of tags that
match 10 jobs (for example), then those jobs would trigger the creation of
a new runner in each of the two pools in turn. Job 1 would go to pool 1,
job 2 would go to pool 2, job 3 to pool 1, job 4 to pool 2 and so on.
When "stack" is used, those same 10 jobs would trigger the creation of a
new runner in the pool with the highest priority, every time.
In both cases, if a pool is full, the next one would be tried automatically.
For the stack case, this would mean that if pool 2 had a priority of 10 and
pool 1 would have a priority of 5, pool 2 would be saturated first, then
pool 1.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change allows users to bypass GitHub Unauthorized errors when removing
github runners. This means that removing runners will now be possible even
if the pool manager is stopped.
There is a new flag added to the runner rm command and to the API that
tells GARM to bypass pool being stopped and any 401 error returned by
GitHub.
This means you will be able to remove the runners from garm and your
provider, but will mean that the runner will still exist in github as
"offline" if the credentials are not updated or the runner manually removed.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
by adding the context from main and make auth.GetAdminContext accepting a
context we are now able to stop the metrics collection loop once the
context is canceled
Signed-off-by: Mario Constanti <mario.constanti@mercedes-benz.com>
refactoring is needed to make the metrics package usable from within the
runner package for further metrics.
This change also makes the metric-collector independent from requests to
the /metrics endpoint
Signed-off-by: Mario Constanti <mario.constanti@mercedes-benz.com>
I accidentally disabled the log streamer when I moved the config options
to their own section. This change fixes that.
This change also adds some safety checks and locking when cleaning up stale
clients. The websocket hub Write() function now copies the message before
sending it on the channel to the clients.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Pool managers will have 2 fields identifying which manager generated
the log line.
In the future, we will add tracking ids in various cases, allowing
us to track down issues faster.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change switches GARM to the new structured logging standard
library. This will allow us to set log levels and reduce some of
the log spam.
Given that we introduced new knobs to tweak logging, the number of
config options for logging now warrants it's own section.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This branch adds the ability to forcefully remove a runner from GARM.
When the operator wishes to manually remove a runner, the workflow is as
follows:
* Check that the runner exists in GitHub. If it does, attempt to
remove it. An error here indicates that the runner may be processing
a job. In this case, we don't continue and the operator gets immediate
feedback from the API.
* Mark the runner in the database as pending_delete
* Allow the consolidate loop to reap it from the provider and remove it
from the database.
Removing the instance from the provider is async. If the provider errs out,
GARM will keep trying to remove it in perpetuity until the provider succedes.
In situations where the provider is misconfigured, this will never happen, leaving
the instance in a permanent state of pending_delete.
A provider may fail for various reasons. Either credentials have expired, the
API endpoint has changed, the provider is misconfigured or the operator may just
have removed it from the config before cleaning up the runners. While some cases
are recoverable, some are not. We cannot have a situation in which we cannot clean
resources in garm because of a misconfiguration.
This change adds the pending_force_delete instance status. Instances marked with
this status, will be removed from GARM even if the provider reports an error.
The GARM cli has been modified to give new meaning to the --force-remove-runner
option. This option in the CLI is no longer mandatory. Instead, setting it will mark
the runner with the new pending_force_delete status. Omitting it will mark the runner
with the old status of pending_delete.
Fixes: #160
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
The user can opt to not delete the webhook (if installed) when removing
the entity from garm. Garm will only ever try to remove a webhook that
exactly matches the URL that is composed of the base webhook URL configured
in the config.toml file and the unique controller ID that is generated
when the controller is first installed. It should be safe to remove the
webhook when the entity is removed.
Of course, this behavior can be disabled.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* When removing a repo or org, we uninstall the webhook as well.
* Upgrade cobra command and mark "webhook-secret" and "random-webhook-secret"
as MarkFlagsOneRequired()
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* Added a webhook show command. This gives us info about the webhook and
if it is installed.
* Return webhook info when installing the webhook
* Small typo fixes.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>