* Close response body in scaleset client
* Wait for message listener loop to exit before attempting restart
* Add LastMessageID field to scaleset model and function to update it
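The second item can be pictured with the minimal sketch below. It assumes a done channel that the loop closes on exit; the listener type and its methods are hypothetical and are not the actual worker code.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// listener is a hypothetical stand-in for the scale set message listener.
type listener struct {
	mu      sync.Mutex
	quit    chan struct{} // closed to ask the loop to stop
	done    chan struct{} // closed by the loop once it has returned
	running bool
}

// Start launches the message loop unless one is already running.
func (l *listener) Start() {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.running {
		return
	}
	l.quit = make(chan struct{})
	l.done = make(chan struct{})
	l.running = true
	go l.loop(l.quit, l.done)
}

// loop polls until quit is closed, then reports its exit by closing done.
func (l *listener) loop(quit <-chan struct{}, done chan<- struct{}) {
	defer close(done)
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-quit:
			return
		case <-ticker.C:
			// A real loop would fetch and dispatch messages here.
		}
	}
}

// Stop signals the loop and blocks until it has actually exited.
func (l *listener) Stop() {
	l.mu.Lock()
	defer l.mu.Unlock()
	if !l.running {
		return
	}
	close(l.quit)
	<-l.done
	l.running = false
}

// Restart never overlaps two loops: the old one must exit first.
func (l *listener) Restart() {
	l.Stop()
	l.Start()
}

func main() {
	l := &listener{}
	l.Start()
	l.Restart()
	l.Stop()
	fmt.Println("running:", l.running)
}
```

Blocking on done inside Stop is what guarantees that Restart cannot start a second loop while the first one is still consuming messages.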
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This adds the workers needed to start listening for scale set messages.
There is no handling of messages yet.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change adds a --long option to most commands and includes the
CreatedAt and UpdatedAt fields in the output.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
The JSON that gets returned by the API is filled with empty values
which serve no purpose. Adding omitempty will skip empty values.
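As a small illustration of the effect, using a hypothetical poolInfo type rather than one of the actual API structs:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// poolInfo is a hypothetical API type, not one of GARM's actual structs.
type poolInfo struct {
	ID         string   `json:"id,omitempty"`
	Image      string   `json:"image,omitempty"`
	Tags       []string `json:"tags,omitempty"`
	MaxRunners int      `json:"max_runners,omitempty"`
}

// encode marshals p and returns an empty string if marshaling fails.
func encode(p poolInfo) string {
	data, err := json.Marshal(p)
	if err != nil {
		return ""
	}
	return string(data)
}

func main() {
	// Image and Tags are empty, so they are left out of the output.
	fmt.Println(encode(poolInfo{ID: "abc", MaxRunners: 2}))
	// prints {"id":"abc","max_runners":2}
}
```

Note that omitempty also drops meaningful zero values such as 0 and false, so it fits only fields where "empty" and "unset" mean the same thing.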
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
The websocket client and hub interaction has been simplified a bit.
The hub now acts only as a tee writer to the various clients that
register. Clients must register and unregister explicitly. The hub
is no longer passed in to the client.
Websocket clients now watch for password changes and for JWT expiration.
Clients are disconnected if the auth token expires or if the password
is changed.
Various additional safety checks have been added.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
GARM has a backoff interval when consuming queued jobs. This backoff
is intended to allow any potential idle runners to pick up a job before
GARM attempts to spin up a new one. This change allows users to set a
custom backoff interval or disable it altogether by setting it to 0.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change uses the database watcher to watch for changes to the
github entities, credentials and controller info.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change moves the callback_url, metadata_url and webhooks_url from
the config to the database. The goal is to move as much as possible from
the config to the DB, in preparation for a potential refactor that will
allow GARM to scale out. This would allow multiple nodes to share a single
source of truth.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Do not rely on the entity object to hold updated or detailed credentials,
fetch them from the DB every time.
This change also ensures that we pass in the user context instead of the
runner context to the DB methods.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
When updating credentials on an entity, we must ensure that the new credentials
belong to the same endpoint as the entity.
When an entity is created, the endpoint is determined by the credentials that
were used during the create operation. From that point forward the entity is
associated with an endpoint, and that cannot change.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Add database models that deal with github credentials. This change
adds models for github endpoints (github.com, GHES, etc). This change
also adds code to migrate config credentials to the DB.
Tests need to be fixed and new tests need to be written. This will come
in a later commit.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
We'll use GithubEntityType throughout the codebase to determine the
type of operation that is about to take place, so this won't be limited
to determining only the pool type. We'll also use this to dedupe the label
scope.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change adds the ability to specify the pool balancing strategy to
use when processing queued jobs. Before this change, GARM would round-robin
through all pools that matched the set of tags requested by queued jobs.
When round-robin (default) is used for an entity (repo, org or enterprise)
and you have 2 pools defined for that entity with a common set of tags that
match 10 jobs (for example), then those jobs would trigger the creation of
a new runner in each of the two pools in turn. Job 1 would go to pool 1,
job 2 would go to pool 2, job 3 to pool 1, job 4 to pool 2 and so on.
When "stack" is used, those same 10 jobs would trigger the creation of a
new runner in the pool with the highest priority, every time.
In both cases, if a pool is full, the next one would be tried automatically.
For the stack case, this would mean that if pool 2 had a priority of 10 and
pool 1 had a priority of 5, pool 2 would be saturated first, then
pool 1.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change adds the ability to use GitHub Apps to authenticate against the
GitHub API. This gives us a larger quota for API requests (15k vs 5k for PATs).
Also, each GitHub App has its own quota, whereas PATs share the same user quota.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Allow runners to update their own system information. Runners can now send
back os_name, os_version and agent_id as part of a POST to
CALLBACK_URL/system-info/.
The goal is to get better info in regard to the actual OS that's running
and to move the agent_id from the status updates to the system-info callback.
The status updates should be used only to send back info about the status of
the installation process.
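From the runner side, the call might look roughly like the sketch below. Only the three field names and the URL suffix come from this change. The Go types (agent_id as an integer in particular), the example base URL and the function name are assumptions, and authentication is left out.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// systemInfo mirrors the three fields named in this change. The Go types
// are assumptions; agent_id is assumed to be an integer.
type systemInfo struct {
	OSName    string `json:"os_name"`
	OSVersion string `json:"os_version"`
	AgentID   int64  `json:"agent_id"`
}

// newSystemInfoRequest builds the POST to CALLBACK_URL/system-info/.
// Authentication headers are deliberately omitted from this sketch.
func newSystemInfoRequest(callbackURL string, info systemInfo) (*http.Request, error) {
	body, err := json.Marshal(info)
	if err != nil {
		return nil, fmt.Errorf("encoding system info: %w", err)
	}
	url := strings.TrimRight(callbackURL, "/") + "/system-info/"
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, fmt.Errorf("building request: %w", err)
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// The base URL is a made-up example value for CALLBACK_URL.
	req, err := newSystemInfoRequest("https://garm.example.com/callbacks", systemInfo{
		OSName:    "Ubuntu",
		OSVersion: "22.04",
		AgentID:   42,
	})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```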
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
On large deployments with many jobs, we cannot check each job that
we recorded in the DB against the GH API.
Before this change, if a job was updated more than 10 minutes ago,
garm would check against the GH api if that job still existed. While
this approach allowed us to maintain a consistent view over which jobs
still exist and which are stale, it had the potential of spamming the
GH API, leading to rate limiting.
This change uses the scale-down loop as an indicator for job staleness.
If a job remains in queued state in our DB, but has disappeared from GH
or was serviced by another runner and we never got the hook (garm was down
or GH had an issue - happened in the past), then garm will spin up a new
runner for it. If that runner or any other runner is scaled down, we check
if we have jobs in the queue that should have matched that runner. If we did,
there is a high chance that the job no longer exists in GH and we can remove
the job from the queue.
Of course, there is a chance that GH is having issues and the job is never
pushed to the runner, but we can't really account for everything. In this case
I'd rather avoid rate limiting ourselves.
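The heuristic can be sketched as follows. It assumes that "should have matched" means every label requested by the job is present on the runner; the types and function names are hypothetical.

```go
package main

import "fmt"

// job is a hypothetical, trimmed-down queued job record.
type job struct {
	ID     int64
	Labels []string
}

// matches reports whether a runner carrying runnerLabels could have
// served a job requesting jobLabels (every requested label is present).
func matches(jobLabels, runnerLabels []string) bool {
	have := make(map[string]struct{}, len(runnerLabels))
	for _, l := range runnerLabels {
		have[l] = struct{}{}
	}
	for _, l := range jobLabels {
		if _, ok := have[l]; !ok {
			return false
		}
	}
	return true
}

// pruneQueue runs when a runner is scaled down. Queued jobs that the
// runner could have served are treated as stale and dropped, without
// asking the GH API about each one.
func pruneQueue(queue []job, runnerLabels []string) (kept, dropped []job) {
	for _, j := range queue {
		if matches(j.Labels, runnerLabels) {
			dropped = append(dropped, j)
		} else {
			kept = append(kept, j)
		}
	}
	return kept, dropped
}

func main() {
	queue := []job{
		{ID: 1, Labels: []string{"linux", "x64"}},
		{ID: 2, Labels: []string{"windows"}},
	}
	kept, dropped := pruneQueue(queue, []string{"linux", "x64", "gpu"})
	fmt.Println(len(kept), len(dropped)) // prints 1 1
}
```

The trade-off is the one described above: a job may occasionally be dropped while GH is merely slow to deliver it, in exchange for not issuing one API request per recorded job.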
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
We must create the DB entry for a runner with a JIT config included. Adding it later
via an update runs the risk of having the consolidate loop pick up the incomplete instance.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
We need to abstract away the tools struct and not have garm-provider-common
depend on go-github just for that one struct. It makes it hard to update
go-github without updating garm-provider-common first and then all the rest
of the providers.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change adds a metadata endpoint that returns a list of root CA
certificates a runner must install in order to be able to validate all
relevant API endpoints it may require. This includes any GHES API that
runs on a self-signed certificate.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* Update the garm-provider-common and go-github packages.
* Update sqlToParamsInstance to return an error when unmarshaling
This change is needed to pull in the new Seal/Unseal functions in common.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* Added a webhook show command. This gives us info about the webhook and
if it is installed.
* Return webhook info when installing the webhook
* Small typo fixes.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>