Canonical have relicensed the LXD project to AGPLv3. This means that we can
no longer update the Go LXD client without re-licensing GARM as AGPLv3. That
is neither desirable nor possible.
The existing code seems to be Apache 2.0 and all code that has already been
contributed seems to stay as Apache 2.0, but new contributions from Canonical
employees will be AGPLv3.
We cannot risk including AGPLv3 code now or in the future, so we will separate
the LXD provider into its own project which can be AGPLv3. GARM will simply
execute the external provider.
If the LXD client code is ever split from the main project and re-licensed
as Apache 2.0 or a compatible license, we will reconsider adding it back as a
native provider. In the long run, though, I believe external providers will
be the only option, as they are easier to write, easier to maintain and safer to
ship (a bug in the provider does not crash GARM itself).
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This branch adds the ability to forcefully remove a runner from GARM.
When the operator wishes to manually remove a runner, the workflow is as
follows:
* Check that the runner exists in GitHub. If it does, attempt to
remove it. An error here indicates that the runner may be processing
a job. In this case, we don't continue and the operator gets immediate
feedback from the API.
* Mark the runner in the database as pending_delete
* Allow the consolidate loop to reap it from the provider and remove it
from the database.
Removing the instance from the provider is asynchronous. If the provider returns
an error, GARM will keep retrying indefinitely until the provider succeeds.
If the provider is misconfigured, this will never happen, leaving
the instance permanently in the pending_delete state.
A provider may fail for various reasons: credentials may have expired, the
API endpoint may have changed, the provider may be misconfigured, or the operator
may have removed it from the config before cleaning up the runners. Some of these
cases are recoverable, others are not. A misconfiguration must not prevent us from
cleaning up resources in GARM.
This change adds the pending_force_delete instance status. Instances marked with
this status will be removed from GARM even if the provider reports an error.
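The difference between the two statuses can be sketched roughly as follows. This is a minimal illustration, not GARM's actual code: the `InstanceStatus` type, the `reap` function, the `errProvider` value and the `deleteFn` callback are all hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// InstanceStatus is a hypothetical stand-in for GARM's instance status type.
type InstanceStatus string

const (
	PendingDelete      InstanceStatus = "pending_delete"
	PendingForceDelete InstanceStatus = "pending_force_delete"
)

// errProvider simulates a provider that cannot delete the instance.
var errProvider = errors.New("provider misconfigured")

// reap reports whether the instance record may be removed from the database.
// deleteFn stands in for the provider's delete call (an assumption).
func reap(status InstanceStatus, deleteFn func() error) (bool, error) {
	err := deleteFn()
	if err == nil {
		return true, nil
	}
	if status == PendingForceDelete {
		// The provider failed, but the operator asked for a forced removal:
		// drop the record anyway and return the error so it can be logged.
		return true, err
	}
	// Regular delete: keep the record so a later loop iteration retries.
	return false, err
}

func main() {
	broken := func() error { return errProvider }

	removed, err := reap(PendingDelete, broken)
	fmt.Println(removed, err)

	removed, err = reap(PendingForceDelete, broken)
	fmt.Println(removed, err)
}
```

With a failing provider, only the forced status lets the record go away; the regular status keeps it around for another attempt.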
The GARM CLI has been modified to give new meaning to the --force-remove-runner
option. The option is no longer mandatory. Setting it marks
the runner with the new pending_force_delete status; omitting it marks the runner
with the old pending_delete status.
Fixes: #160
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This update pulls in the latest version of garm-provider-common which removes
its dependency on go-github, making future updates much less painful.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
We need to abstract away the tools struct and not have garm-provider-common
depend on go-github just for that one struct. It makes it hard to update
go-github without updating garm-provider-common first and then all the rest
of the providers.
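One way to picture this kind of decoupling is a small local struct plus a conversion at the boundary. The sketch below is only an illustration under assumptions: `upstreamTool` is a local stand-in for a third-party type with pointer fields, and `RunnerTool`, `toRunnerTool` and the field names are hypothetical, not the actual types in garm-provider-common.

```go
package main

import "fmt"

// upstreamTool mimics the shape of a third-party API type with optional
// (pointer) fields. It is a local stand-in, not a real library type.
type upstreamTool struct {
	OS           *string
	Architecture *string
	DownloadURL  *string
	Filename     *string
}

// RunnerTool is a hypothetical provider-facing struct with plain fields.
// Code that only sees this type needs no third-party import.
type RunnerTool struct {
	OS           string
	Architecture string
	DownloadURL  string
	Filename     string
}

// deref returns the pointed-to string, or "" for a nil pointer.
func deref(s *string) string {
	if s == nil {
		return ""
	}
	return *s
}

// toRunnerTool converts at the boundary, so only the caller that talks to
// the upstream API needs to know about the upstream type.
func toRunnerTool(in upstreamTool) RunnerTool {
	return RunnerTool{
		OS:           deref(in.OS),
		Architecture: deref(in.Architecture),
		DownloadURL:  deref(in.DownloadURL),
		Filename:     deref(in.Filename),
	}
}

func main() {
	osName, arch := "linux", "x64"
	tool := toRunnerTool(upstreamTool{OS: &osName, Architecture: &arch})
	fmt.Printf("%+v\n", tool)
}
```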
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* Update the garm-provider-common and go-github packages.
* Update sqlToParamsInstance to return an error when unmarshaling fails
This change is needed to pull in the new Seal/Unseal functions in common.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* When removing a repo or org, we now uninstall the webhook as well.
* Upgrade cobra and mark the "webhook-secret" and "random-webhook-secret"
flags with MarkFlagsOneRequired()
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This commit:
* adds more granular loops for various operations
* updates go-github to the latest version
* skips fetching runner info for canceled or skipped jobs
* makes loops use waitgroups to signal exit
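The waitgroup pattern mentioned in the last item can be sketched as below. This is a generic illustration using only the standard library; `startLoops`, `demo` and the intervals are made-up names and values, not the loops GARM actually runs.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// startLoops runs each work function on its own ticker until ctx is
// canceled. Every goroutine registers with wg, so the caller can block
// until all loops have exited.
func startLoops(ctx context.Context, wg *sync.WaitGroup, interval time.Duration, loops ...func()) {
	for _, work := range loops {
		wg.Add(1)
		go func(work func()) {
			defer wg.Done()
			ticker := time.NewTicker(interval)
			defer ticker.Stop()
			for {
				select {
				case <-ctx.Done():
					return
				case <-ticker.C:
					work()
				}
			}
		}(work)
	}
}

// demo starts two loops, lets them tick for a while, then shuts them down
// and returns how many ticks were observed in total.
func demo() int64 {
	var ticks int64
	ctx, cancel := context.WithCancel(context.Background())
	var wg sync.WaitGroup

	count := func() { atomic.AddInt64(&ticks, 1) }
	startLoops(ctx, &wg, 10*time.Millisecond, count, count)

	time.Sleep(200 * time.Millisecond)
	cancel()
	wg.Wait() // returns only after every loop goroutine has exited
	return atomic.LoadInt64(&ticks)
}

func main() {
	fmt.Println("all loops exited, saw ticks:", demo() > 0)
}
```

Waiting on the group after canceling the context gives the caller a clear point at which no loop is still touching shared state.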
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
* cleanupOrphanedGithubRunners() now uses errgroup to parallelize and
report errors when removing runners from the provider.
* retryFailedInstancesForOnePool() now uses errgroup
* Removed some setPoolRunningState calls. Those errors should be handled in
the loop where they eventually bubble up.
* Added a number of timeouts in the LXD provider for delete and list
instances. This provider should be converted into an external
provider.
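A timeout around a slow provider call, as in the last item, might look like the sketch below. It uses only the standard library `context` package; `withTimeout`, `demo` and the durations are illustrative assumptions and do not reflect the LXD client API.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// withTimeout runs op with a deadline of d. If op does not finish in time,
// the deadline error is returned and op is told to stop through its context.
func withTimeout(parent context.Context, d time.Duration, op func(ctx context.Context) error) error {
	ctx, cancel := context.WithTimeout(parent, d)
	defer cancel()

	// Buffered so the goroutine can finish even if nobody reads the result.
	done := make(chan error, 1)
	go func() { done <- op(ctx) }()

	select {
	case err := <-done:
		return err
	case <-ctx.Done():
		return ctx.Err()
	}
}

// demo runs one fast and one slow operation under the same timeout and
// reports whether the fast one succeeded and the slow one timed out.
func demo() (fastOK bool, slowTimedOut bool) {
	fast := func(ctx context.Context) error { return nil }
	slow := func(ctx context.Context) error {
		select {
		case <-time.After(2 * time.Second):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}

	fastOK = withTimeout(context.Background(), 50*time.Millisecond, fast) == nil
	err := withTimeout(context.Background(), 50*time.Millisecond, slow)
	slowTimedOut = errors.Is(err, context.DeadlineExceeded)
	return fastOK, slowTimedOut
}

func main() {
	fastOK, slowTimedOut := demo()
	fmt.Println("fast ok:", fastOK, "slow timed out:", slowTimedOut)
}
```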
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Extra specs is an opaque, valid JSON document that can be set on a pool and
will be passed along to the provider as part of the instance bootstrap params.
This field is meant to allow operators to send extra configuration values
to external or built-in providers. The extra specs are not interpreted by,
or useful in any way to, garm itself, but they may be useful to the provider
which interacts with the IaaS.
The extra specs are not meant to be used for secrets. Adding sensitive
information to this field is highly discouraged. This field is meant as a
means to add fine tuning knobs to the providers, on a per pool basis.
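As a rough sketch of the idea, the JSON can be validated for well-formedness and then carried untouched until a provider decodes it. The `BootstrapParams` struct here is a trimmed-down, hypothetical payload, and `setExtraSpecs`, `providerSpecs` and `disk_size_gb` are made-up names for illustration only.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// BootstrapParams is a hypothetical, trimmed-down bootstrap payload.
// ExtraSpecs is kept as raw bytes so the sender never interprets it.
type BootstrapParams struct {
	Name       string          `json:"name"`
	ExtraSpecs json.RawMessage `json:"extra_specs,omitempty"`
}

// setExtraSpecs checks only that raw is well-formed JSON, then stores it
// unchanged. The content stays opaque to the caller.
func (b *BootstrapParams) setExtraSpecs(raw string) error {
	if !json.Valid([]byte(raw)) {
		return errors.New("extra specs must be valid JSON")
	}
	b.ExtraSpecs = json.RawMessage(raw)
	return nil
}

// providerSpecs is what one imaginary provider might read from the extra
// specs. The field is invented for this example.
type providerSpecs struct {
	DiskSizeGB int `json:"disk_size_gb"`
}

func main() {
	params := BootstrapParams{Name: "runner-1"}
	if err := params.setExtraSpecs(`{"disk_size_gb": 50}`); err != nil {
		fmt.Println("error:", err)
		return
	}

	// Provider side: decode only the keys this provider cares about.
	var specs providerSpecs
	if err := json.Unmarshal(params.ExtraSpecs, &specs); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("disk size:", specs.DiskSizeGB)
}
```

Keeping the field as raw JSON means each provider can define its own knobs without any change to the code that forwards them.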
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Add more test cases for the SQL database interactions.
Also add the `go-sqlmock` dependency, which is used to mock SQL database calls.
Signed-off-by: Mihaela Balutoiu <mbalutoiu@cloudbasesolutions.com>
* CLI properly formats the IP addresses in runner show
* LXD provider now waits for an IP address before returning on Create
* Added a few mocks for testing
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
When deleting a VM, we try to force stop it. If the VM is already stopped,
LXD will return an error. Unfortunately, we can't import the drivers package
from LXD without also pulling in a bunch of Linux-specific CGO dependencies,
which we want to avoid.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>