Commit graph

592 commits

Author SHA1 Message Date
Mario Constanti
215bd71855 feat: passthrough additional env vars to provider
Some provider binaries may need additional environment variables to be
set (e.g. Kubernetes, as client-go depends on the KUBERNETES_SERVICE_ vars).
It should be possible to define a list of environment variables that
are passed through to the provider binary when it is executed.
2023-12-01 11:54:34 +01:00
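A minimal sketch of the passthrough idea described in the commit above, assuming the allow-list holds exact variable names. The helper name `filterEnv` and the sample values are hypothetical and are not taken from the GARM code base; the real implementation may match variables differently.

```go
package main

import (
	"fmt"
	"strings"
)

// filterEnv returns the entries of environ (in "KEY=value" form) whose key
// appears in allowed. Hypothetical helper, exact-name matching is an assumption.
func filterEnv(environ []string, allowed []string) []string {
	allow := make(map[string]struct{}, len(allowed))
	for _, name := range allowed {
		allow[name] = struct{}{}
	}
	var out []string
	for _, kv := range environ {
		key, _, ok := strings.Cut(kv, "=")
		if !ok {
			// Skip malformed entries that have no "=" separator.
			continue
		}
		if _, found := allow[key]; found {
			out = append(out, kv)
		}
	}
	return out
}

func main() {
	// Sample environment; in practice this would come from os.Environ().
	environ := []string{
		"HOME=/root",
		"KUBERNETES_SERVICE_HOST=10.0.0.1",
		"KUBERNETES_SERVICE_PORT=443",
	}
	allowed := []string{"KUBERNETES_SERVICE_HOST", "KUBERNETES_SERVICE_PORT"}
	fmt.Println(filterEnv(environ, allowed))
}
```

The filtered slice could then be appended to the `Env` field of an `exec.Cmd` before the provider binary is started.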
Gabriel
05e179604d
Merge pull request #183 from gabriel-samfira/force-remove-runner
Add force delete runner
2023-11-21 20:56:24 +02:00
Gabriel Adrian Samfira
b9c7c93f7f
Fix linting issues
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-11-21 20:52:40 +02:00
Gabriel
70501ffc78
Merge pull request #3 from mihaelabalutoiu/add-more-integration-tests
Add more integration tests
2023-11-21 20:40:29 +02:00
Mihaela Balutoiu
c563ace750 Add integration tests for test external provider
Signed-off-by: Mihaela Balutoiu <mbalutoiu@cloudbasesolutions.com>
2023-11-06 13:03:25 +02:00
Gabriel Adrian Samfira
d09f12dfd8 Add force delete runner
This branch adds the ability to forcefully remove a runner from GARM.

When the operator wishes to manually remove a runner, the workflow is as
follows:

* Check that the runner exists in GitHub. If it does, attempt to
  remove it. An error here indicates that the runner may be processing
  a job. In this case, we don't continue and the operator gets immediate
  feedback from the API.
* Mark the runner in the database as pending_delete
* Allow the consolidate loop to reap it from the provider and remove it
  from the database.

Removing the instance from the provider is async. If the provider errs out,
GARM will keep trying to remove it in perpetuity until the provider succeeds.

In situations where the provider is misconfigured, this will never happen, leaving
the instance in a permanent state of pending_delete.

A provider may fail for various reasons. Either credentials have expired, the
API endpoint has changed, the provider is misconfigured or the operator may just
have removed it from the config before cleaning up the runners. While some cases
are recoverable, some are not. We cannot have a situation in which we cannot clean
resources in garm because of a misconfiguration.

This change adds the pending_force_delete instance status. Instances marked with
this status will be removed from GARM even if the provider reports an error.

The GARM CLI has been modified to give new meaning to the --force-remove-runner
option. This option in the CLI is no longer mandatory. Instead, setting it will mark
the runner with the new pending_force_delete status. Omitting it will mark the runner
with the old status of pending_delete.

Fixes: #160

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-10-12 06:15:36 +00:00
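The difference between the two statuses in the commit above can be sketched as a single decision made after the provider delete call. This is an illustration only: the status strings come from the commit message, while the type and the function `shouldRemoveFromDB` are hypothetical names and do not reflect how GARM structures its consolidate loop.

```go
package main

import (
	"errors"
	"fmt"
)

// InstanceStatus is a hypothetical type holding the statuses named in the commit.
type InstanceStatus string

const (
	PendingDelete      InstanceStatus = "pending_delete"
	PendingForceDelete InstanceStatus = "pending_force_delete"
)

// shouldRemoveFromDB reports whether the database record may be dropped after
// a provider delete attempt that returned providerErr.
func shouldRemoveFromDB(status InstanceStatus, providerErr error) bool {
	if providerErr == nil {
		// The provider removed the instance, so the record can go.
		return true
	}
	// On provider failure, only a forced delete drops the record;
	// a regular pending_delete is retried on the next pass.
	return status == PendingForceDelete
}

func main() {
	errProvider := errors.New("provider misconfigured")
	fmt.Println(shouldRemoveFromDB(PendingDelete, errProvider))
	fmt.Println(shouldRemoveFromDB(PendingForceDelete, errProvider))
}
```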
Gabriel
7f4f4bd7e1
Merge pull request #182 from gabriel-samfira/update-common
Update garm-provider-common
2023-10-09 14:02:09 +03:00
Gabriel Adrian Samfira
26dbc3d8e5 Update garm-provider-common
This update pulls in the latest version of garm-provider-common which removes
its dependency on go-github, making future updates much less painful.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-10-09 10:55:11 +00:00
Gabriel
7fda604a37
Merge pull request #180 from mercedes-benz/additional_metrics
feat: add new metrics
2023-10-06 11:43:00 +03:00
Mario Constanti
58e8b3454c feat: add new metrics
add info metrics about providers, enterprises, organizations,
repositories and pools.

Also expose most of the configurable pool information as metrics,
e.g. max runners as garm_pool_max_runners.

Signed-off-by: Mario Constanti <mario.constanti@mercedes-benz.com>
2023-10-06 10:21:56 +02:00
Gabriel
a48ec0c0a8
Merge pull request #163 from gabriel-samfira/add-jit-config
Add jit config
2023-09-24 17:15:38 +03:00
Gabriel Adrian Samfira
019948acbe Add JIT config as part of instance create
We must create the DB entry for a runner with a JIT config included. Adding it later
via an update runs the risk of having the consolidate loop pick up the incomplete instance.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-09-24 13:51:17 +00:00
Gabriel Adrian Samfira
8c507a9251 Run go generate
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-09-24 13:51:17 +00:00
Gabriel Adrian Samfira
4bedb1dd63 Fix URLs for enterprises
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-09-24 13:51:17 +00:00
Gabriel Adrian Samfira
5f2cb19503 Use accessors when getting response values
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-09-24 13:51:17 +00:00
Gabriel Adrian Samfira
e238f84781 update modules
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-09-24 13:51:16 +00:00
Gabriel Adrian Samfira
e53c271337 Add metadata URLs
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-09-24 13:50:21 +00:00
Gabriel Adrian Samfira
1268507ce6 Add jit config routes
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-09-24 13:50:20 +00:00
Gabriel Adrian Samfira
5214aca228 Add jit config for new runner
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-09-24 13:49:57 +00:00
Gabriel Adrian Samfira
6dea1c1937 Add temporary replace to fork
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-09-24 13:49:56 +00:00
Gabriel Adrian Samfira
d5f8cf079e Ignore instances that are still being created from reaping
When using JIT runners, we register the runner on GitHub before we get
a chance to spin up the instance in the provider. In such cases, we end
up with a runner in "offline" state while we're creating the actual resource
that will embody the runner. This change will give runners a chance to come
online before garm tries to clean them up.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-09-24 13:49:06 +00:00
Gabriel Adrian Samfira
591641a8a3 Add temporary redirect to go-github fork
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-09-24 13:49:04 +00:00
Gabriel Adrian Samfira
09d2f1b061 Add Jit functions to GH client interface
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-09-24 13:48:09 +00:00
Gabriel Adrian Samfira
de17fb04b4 Add helper functions for marshaling and sealing
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-09-24 13:48:09 +00:00
Gabriel Adrian Samfira
034cc47185 Add jitconfig model field
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-09-24 13:48:09 +00:00
Gabriel
6089f17b08
Merge pull request #179 from gabriel-samfira/update-dependencies
Update go-github and garm-provider-common
2023-09-24 11:10:54 +03:00
Gabriel Adrian Samfira
90c954e0e5 Replace deprecated function call
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-09-24 08:04:00 +00:00
Gabriel Adrian Samfira
fc77a4b735 Update go-github and garm-provider-common
We need to abstract away the tools struct and not have garm-provider-common
depend on go-github just for that one struct. It makes it hard to update
go-github without updating garm-provider-common first and then all the rest
of the providers.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-09-24 07:56:56 +00:00
Gabriel
b1dd54c07e
Merge pull request #175 from mihaelabalutoiu/add-cleanup-webhooks
Cleaning up leftover GitHub webhooks for `org/repo`
2023-08-29 17:54:55 +03:00
Mihaela Balutoiu
f9c3f30ae4 Cleaning up leftover GitHub webhooks for org/repo
Signed-off-by: Mihaela Balutoiu <mbalutoiu@cloudbasesolutions.com>
2023-08-29 11:10:11 +03:00
Gabriel
0da5f106a0
Merge pull request #174 from gabriel-samfira/add-ca-cert-bundle-metadata
Add root CA bundle metadata URL
2023-08-28 13:10:40 +03:00
Gabriel Adrian Samfira
2caf25f18b Run go generate
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-08-28 09:58:13 +00:00
Gabriel Adrian Samfira
a26907fb91 Add root CA bundle metadata URL
This change adds a metadata endpoint that returns a list of root CA
certificates a runner must install in order to be able to validate all
relevant API endpoints it may require. This includes any GHES API that
runs on a self-signed certificate.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-08-28 09:44:18 +00:00
Gabriel
f463a41ce2
Merge pull request #173 from gabriel-samfira/switch-to-seal-unseal
Switch to seal unseal
2023-08-28 11:42:25 +03:00
Gabriel Adrian Samfira
4d1acdcaab Switch to util.Seal and util.Unseal
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-08-28 08:15:50 +00:00
Gabriel Adrian Samfira
d700b790ac Update garm-provider-common and go-github
* Updates the garm-provider-common and go-github packages.
* Update sqlToParamsInstance to return an error when unmarshaling

This change is needed to pull in the new Seal/Unseal functions in common.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-08-28 08:13:44 +00:00
Gabriel
4348999cb1
Merge pull request #172 from gabriel-samfira/add-relation-to-jobs
Add relation to jobs
2023-08-26 23:19:22 +03:00
Gabriel Adrian Samfira
f2100f7c91 Fix tests
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-08-26 20:13:48 +00:00
Gabriel Adrian Samfira
891b6d3105 Run go generate
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-08-26 19:47:17 +00:00
Gabriel Adrian Samfira
59e6fb28c2 Create relation between WorkflowJobs and Instances
Ensure that there is a foreign key constraint between runners and jobs.
Once a runner is associated with a job, we want the job to be removed along
with the runner.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-08-26 19:45:59 +00:00
Gabriel
d3479790d7
Merge pull request #171 from ionutbalutoiu/optimize-wait-pool-running-idle-func
Optimize `waitPoolRunningIdleInstances` func
2023-08-25 17:37:15 +03:00
Ionut Balutoiu
93d290df47 Optimize waitPoolRunningIdleInstances func
Instead of asking for all the GARM instances from the API, and then
filtering them by `poolID`, we can ask the API to return only the
instances that are in the `poolID` pool.

Therefore, we only need to count the instances that are running and idle.

Signed-off-by: Ionut Balutoiu <ibalutoiu@cloudbasesolutions.com>
2023-08-25 17:14:11 +03:00
Gabriel
4f8ca6082c
Merge pull request #170 from gabriel-samfira/fix-runner-group-on-create-pool
Properly set runner group when creating a pool
2023-08-25 16:39:26 +03:00
Gabriel Adrian Samfira
7b6f51c032 Properly set runner group when creating a pool
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-08-25 13:29:26 +00:00
Gabriel
b44f04be5d
Merge pull request #169 from gabriel-samfira/fix-pool-manager-start
Fix garm pool manager startup
2023-08-25 12:04:14 +03:00
Gabriel Adrian Samfira
baa7df65a4 Fix garm pool manager startup
If we fail to get the tools for one pool, garm fails to start due to pool
manager startup timeout. Launch the initial tools update function as a
goroutine and return from Start(). If it fails, it will retry, and we won't
block garm from starting.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-08-25 08:57:24 +00:00
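The startup fix above follows a common Go pattern: run the fallible initial step in a goroutine that retries, and return to the caller immediately. A minimal sketch of that pattern follows; `retryInBackground`, `errNotReady`, and the retry interval are hypothetical and are not the names or values used in GARM.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNotReady is a placeholder error used by the example below.
var errNotReady = errors.New("tools not available yet")

// retryInBackground runs fn in a goroutine until it succeeds, sleeping for
// interval between attempts. It returns immediately; the returned channel is
// closed once fn has succeeded.
func retryInBackground(fn func() error, interval time.Duration) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		for {
			if err := fn(); err == nil {
				return
			}
			time.Sleep(interval)
		}
	}()
	return done
}

func main() {
	attempts := 0
	// Simulated tools update that fails twice before succeeding.
	updateTools := func() error {
		attempts++
		if attempts < 3 {
			return errNotReady
		}
		return nil
	}

	done := retryInBackground(updateTools, 10*time.Millisecond)
	fmt.Println("startup is not blocked by the tools update")
	<-done // Wait here only so the example can report the attempt count.
	fmt.Println("attempts:", attempts)
}
```

A production version would normally also accept a context or quit channel so the loop stops when the manager shuts down.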
Gabriel
e314688ad7
Merge pull request #168 from ionutbalutoiu/fix-e2e-tests
Fix `waitPoolRunningIdleInstances` function
2023-08-24 16:43:16 +03:00
Ionut Balutoiu
4e3ad41c0b Fix waitPoolRunningIdleInstances function
The variable `runningIdleCount` would get incremented for instances
on every pool, instead of only for the pool we are interested in.
This change fixes this.

Also, adjust the log message for the error raised when the timeout
is exceeded.

Signed-off-by: Ionut Balutoiu <ibalutoiu@cloudbasesolutions.com>
2023-08-24 16:12:00 +03:00
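The bug fixed above comes down to a missing pool filter in a counting loop. A minimal sketch of the corrected counting logic, assuming a trimmed-down `Instance` type: the field names and the status strings "running" and "idle" are illustrative assumptions, not the definitions used in the GARM test code.

```go
package main

import "fmt"

// Instance is a hypothetical, reduced view of a runner instance.
type Instance struct {
	PoolID       string
	Status       string // instance status, e.g. "running"
	RunnerStatus string // runner status, e.g. "idle"
}

// countRunningIdle counts instances that belong to poolID and are both
// running and idle. Instances from other pools are skipped.
func countRunningIdle(instances []Instance, poolID string) int {
	count := 0
	for _, inst := range instances {
		if inst.PoolID != poolID {
			continue
		}
		if inst.Status == "running" && inst.RunnerStatus == "idle" {
			count++
		}
	}
	return count
}

func main() {
	instances := []Instance{
		{PoolID: "pool-a", Status: "running", RunnerStatus: "idle"},
		{PoolID: "pool-b", Status: "running", RunnerStatus: "idle"},
		{PoolID: "pool-a", Status: "running", RunnerStatus: "active"},
	}
	fmt.Println(countRunningIdle(instances, "pool-a"))
}
```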
Gabriel
385a00ef9d
Merge pull request #167 from ionutbalutoiu/refactor-e2e-tests
Refactor integration E2E tests
2023-08-24 15:56:12 +03:00
Ionut Balutoiu
318bc52b57 Refactor integration E2E tests
* General cleanup of the integration tests Golang code. Move the
  `e2e.go` codebase into its own package and separate files.
* Reduce the overall log spam from the integration tests output.
* Add a final GitHub workflow step that stops the GARM server and
  cleans up any orphaned GitHub resources.
* Add `TODO` to implement cleanup of the orphaned GitHub webhooks.
  This is useful if uninstalling the webhooks failed.
* Add `TODO` for extra missing checks on the GitHub webhooks
  install / uninstall logic.

Signed-off-by: Ionut Balutoiu <ibalutoiu@cloudbasesolutions.com>
2023-08-24 15:22:46 +03:00