640 commits

Author SHA1 Message Date
Gabriel
3b9e822ee0
Merge pull request #207 from gabriel-samfira/update-readme
Update readme
2024-02-12 11:04:33 +02:00
Gabriel Adrian Samfira
8efdcba359 Update README
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2024-02-12 09:01:13 +00:00
Gabriel Adrian Samfira
d4a9a821cf Add some more info
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2024-02-12 08:44:26 +00:00
Gabriel Adrian Samfira
1ffa562581 Add using garm doc
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2024-02-12 01:06:25 +00:00
Gabriel Adrian Samfira
d17629efd4 Add logging section info
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2024-02-11 12:47:19 +00:00
Gabriel Adrian Samfira
53264528ee Add Equinix and EC2 in the README
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2024-02-11 12:28:49 +00:00
Gabriel
7c44b8125a
Merge pull request #206 from gabriel-samfira/more-strict-token-checks
More strict instance token checks
2024-01-30 13:57:14 +02:00
Gabriel Adrian Samfira
9031a4029e More strict instance token checks
This change invalidates tokens based on more parameters. Tokens that were
generated for previous attempts of spinning up an instance will be invalidated.

Also, only instances that are in Running or Creating will be able to authenticate.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2024-01-30 11:07:55 +00:00
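The check described in 9031a4029e can be sketched roughly as follows. This is a minimal illustration, not GARM's actual code: the commit does not list which parameters are compared, so the `Attempt` counter, the type names and the status constants are all assumptions made for this example.

```go
package main

import "fmt"

// InstanceStatus and its constants are hypothetical names for this sketch.
type InstanceStatus string

const (
	StatusCreating InstanceStatus = "creating"
	StatusRunning  InstanceStatus = "running"
	StatusStopped  InstanceStatus = "stopped"
)

// Instance is a simplified stand-in for a runner record.
type Instance struct {
	Name    string
	Status  InstanceStatus
	Attempt int // assumed counter, bumped on every new attempt to spin up the instance
}

// TokenClaims holds the values assumed to be baked into an instance token.
type TokenClaims struct {
	InstanceName string
	Attempt      int
}

// tokenIsValid rejects tokens issued for another instance or for an earlier
// creation attempt, and only lets Creating or Running instances authenticate.
func tokenIsValid(c TokenClaims, inst Instance) bool {
	if c.InstanceName != inst.Name || c.Attempt != inst.Attempt {
		return false
	}
	switch inst.Status {
	case StatusCreating, StatusRunning:
		return true
	default:
		return false
	}
}

func main() {
	inst := Instance{Name: "runner-1", Status: StatusRunning, Attempt: 2}
	fmt.Println(tokenIsValid(TokenClaims{InstanceName: "runner-1", Attempt: 2}, inst)) // true
	fmt.Println(tokenIsValid(TokenClaims{InstanceName: "runner-1", Attempt: 1}, inst)) // false
}
```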
Gabriel
23c9ed6b6d
Merge pull request #205 from gabriel-samfira/add-some-logging
Add some logging
2024-01-30 11:41:53 +02:00
Gabriel Adrian Samfira
43b96c543d Add some logging
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2024-01-30 09:37:26 +00:00
Gabriel
8e0456c83a
Merge pull request #203 from gabriel-samfira/small-fixes
Small adjustments
2024-01-12 22:07:28 +02:00
Gabriel Adrian Samfira
5b735eaaf4 Small adjustments
This change increases the tools refresh interval to 5 minutes, cleans
up the websocket code a bit, augments the error message that may be returned
when trying to delete a runner in an invalid state and removes a log message
that does not add much value.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2024-01-12 19:53:27 +00:00
Gabriel
523237ca18
Merge pull request #202 from gabriel-samfira/cleanup-websocket-code
Fix log streamer and cleanup code
2024-01-06 16:22:09 +02:00
Gabriel Adrian Samfira
4d7fcbe23a Safely close the quit channel
Prevent accidental closure of an already closed channel.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2024-01-06 14:15:52 +00:00
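One common way to prevent closing an already closed channel, as 4d7fcbe23a sets out to do, is to guard the close with `sync.Once`. The sketch below shows that pattern. The `Client` type and its methods are hypothetical, and the commit may well use a different guard (for example a mutex and a flag).

```go
package main

import (
	"fmt"
	"sync"
)

// Client is a hypothetical websocket client with a quit channel.
type Client struct {
	quit      chan struct{}
	closeOnce sync.Once
}

func NewClient() *Client {
	return &Client{quit: make(chan struct{})}
}

// Stop closes the quit channel exactly once, no matter how often it is called.
// Calling close() twice on the same channel would panic.
func (c *Client) Stop() {
	c.closeOnce.Do(func() { close(c.quit) })
}

// Stopped reports whether the quit channel has been closed.
func (c *Client) Stopped() bool {
	select {
	case <-c.quit:
		return true
	default:
		return false
	}
}

func main() {
	c := NewClient()
	c.Stop()
	c.Stop() // safe: the second call is a no-op
	fmt.Println(c.Stopped())
}
```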
Gabriel Adrian Samfira
70bfff96e0 Fix log streamer and cleanup code
I accidentally disabled the log streamer when I moved the config options
to their own section. This change fixes that.

This change also adds some safety checks and locking when cleaning up stale
clients. The websocket hub Write() function now copies the message before
sending it on the channel to the clients.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2024-01-06 14:05:38 +00:00
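The reason for copying the message in Write() is that callers (a log writer, for instance) often reuse their buffer after the call returns. A minimal sketch of the idea, with a hypothetical `Hub` type and a non-blocking send that are assumptions of this example rather than GARM's implementation:

```go
package main

import (
	"errors"
	"fmt"
)

// ErrHubBusy is a hypothetical error returned when the hub cannot keep up.
var ErrHubBusy = errors.New("hub buffer full")

// Hub is a simplified stand-in for a websocket hub.
type Hub struct {
	broadcast chan []byte
}

func NewHub(buffer int) *Hub {
	return &Hub{broadcast: make(chan []byte, buffer)}
}

// Write copies msg before queueing it, so later changes to the caller's
// slice cannot alter what the clients eventually receive.
func (h *Hub) Write(msg []byte) (int, error) {
	tmp := make([]byte, len(msg))
	copy(tmp, msg)
	select {
	case h.broadcast <- tmp:
		return len(tmp), nil
	default:
		return 0, ErrHubBusy
	}
}

func main() {
	h := NewHub(1)
	buf := []byte("hello")
	if _, err := h.Write(buf); err != nil {
		fmt.Println("write failed:", err)
		return
	}
	buf[0] = 'j'                       // the caller reuses its buffer
	fmt.Println(string(<-h.broadcast)) // prints "hello"
}
```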
Gabriel
0aaeebde28
Merge pull request #201 from gabriel-samfira/switch-to-slog
Switch to slog
2024-01-06 03:04:09 +02:00
Gabriel Adrian Samfira
d44d64dbfd Use log_file from logging config
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2024-01-06 00:32:39 +00:00
Gabriel Adrian Samfira
61e97f0896 Append pool_type and pool_mgr info to logs
Pool managers will have 2 fields identifying which manager generated
the log line.

In the future, we will add tracking ids in various cases, allowing
us to track down issues faster.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2024-01-06 00:21:50 +00:00
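With log/slog, fixed fields such as the pool_type and pool_mgr mentioned in 61e97f0896 can be attached once with `Logger.With`, and every later log line carries them. The sketch below assumes string values for both fields; the example values and the helper names are made up for illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log/slog"
)

// newPoolLogger returns a logger that adds the two identifying fields
// to every record it emits.
func newPoolLogger(base *slog.Logger, poolType, poolMgr string) *slog.Logger {
	return base.With(
		slog.String("pool_type", poolType),
		slog.String("pool_mgr", poolMgr),
	)
}

// logLine emits one record through a JSON handler and returns the decoded
// fields. It exists only to make the result easy to inspect.
func logLine(poolType, poolMgr, msg string) (map[string]any, error) {
	var buf bytes.Buffer
	base := slog.New(slog.NewJSONHandler(&buf, nil))
	newPoolLogger(base, poolType, poolMgr).Info(msg)

	var fields map[string]any
	err := json.Unmarshal(buf.Bytes(), &fields)
	return fields, err
}

func main() {
	// "repository" and "example-owner/example-repo" are placeholder values.
	fields, err := logLine("repository", "example-owner/example-repo", "runner created")
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(fields["pool_type"], fields["pool_mgr"], fields["msg"])
}
```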
Gabriel Adrian Samfira
e441b6ce89 Switch to log/slog
This change switches GARM to the new structured logging standard
library. This will allow us to set log levels and reduce some of
the log spam.

Given that we introduced new knobs to tweak logging, the number of
config options for logging now warrants its own section.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2024-01-05 23:46:40 +00:00
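To illustrate how log levels cut down on log spam with log/slog, the sketch below builds a logger from a level given as a string, the way a config option might supply it. The function names are hypothetical and the sketch does not reflect GARM's actual config keys.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log/slog"
)

// newLogger builds a text logger that drops records below the given level.
// The level string ("debug", "info", "warn", "error") is parsed by slog itself.
func newLogger(w io.Writer, level string) (*slog.Logger, error) {
	var lvl slog.Level
	if err := lvl.UnmarshalText([]byte(level)); err != nil {
		return nil, fmt.Errorf("invalid log level %q: %w", level, err)
	}
	handler := slog.NewTextHandler(w, &slog.HandlerOptions{Level: lvl})
	return slog.New(handler), nil
}

// countLines logs one debug, one info and one error message and reports
// how many of them made it past the level filter.
func countLines(level string) (int, error) {
	var buf bytes.Buffer
	logger, err := newLogger(&buf, level)
	if err != nil {
		return 0, err
	}
	logger.Debug("polling for tools")
	logger.Info("pool manager started")
	logger.Error("failed to create runner")
	return bytes.Count(buf.Bytes(), []byte("\n")), nil
}

func main() {
	for _, level := range []string{"debug", "info", "error"} {
		n, err := countLines(level)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("level=%s lines=%d\n", level, n)
	}
}
```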
Gabriel
f72e97209f
Merge pull request #200 from gabriel-samfira/add-systeminfo-callback
Add system-info instance callback
2024-01-04 17:38:02 +02:00
Gabriel Adrian Samfira
2a5e2409b2 Add system-info instance callback
Allow runners to update their own system information. Runners can now send
os_name, os_version and agent_id as part of a POST to
CALLBACK_URL/system-info/.

The goal is to get better info in regard to the actual OS that's running
and to move the agent_id from the status updates to the system-info callback.

The status updates should be used only to send back info about the status of
the installation process.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2024-01-04 15:23:43 +00:00
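From the runner's side, the system-info callback in 2a5e2409b2 amounts to a JSON POST to CALLBACK_URL/system-info/. The sketch below sends the three fields named in the commit to a local test server. The field types, the bearer-token header and all Go identifiers are assumptions made for this example.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// SystemInfo carries the fields named in the commit message.
// The Go types (string, int64) are assumed.
type SystemInfo struct {
	OSName    string `json:"os_name"`
	OSVersion string `json:"os_version"`
	AgentID   int64  `json:"agent_id"`
}

// sendSystemInfo POSTs info to <callbackURL>/system-info/.
// Passing the instance token as a bearer token is an assumption.
func sendSystemInfo(client *http.Client, callbackURL, token string, info SystemInfo) error {
	body, err := json.Marshal(info)
	if err != nil {
		return err
	}
	url := strings.TrimRight(callbackURL, "/") + "/system-info/"
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+token)
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("system-info callback failed: %s", resp.Status)
	}
	return nil
}

// roundTrip sends info to a throwaway local server and returns what the
// server saw: the request path and the decoded payload.
func roundTrip(info SystemInfo) (string, SystemInfo, error) {
	type received struct {
		path string
		info SystemInfo
	}
	ch := make(chan received, 1)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var got SystemInfo
		if err := json.NewDecoder(r.Body).Decode(&got); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		ch <- received{path: r.URL.Path, info: got}
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	if err := sendSystemInfo(srv.Client(), srv.URL, "example-token", info); err != nil {
		return "", SystemInfo{}, err
	}
	rec := <-ch
	return rec.path, rec.info, nil
}

func main() {
	path, got, err := roundTrip(SystemInfo{OSName: "ubuntu", OSVersion: "22.04", AgentID: 42})
	fmt.Println(path, got, err)
}
```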
Gabriel
8fe4f17e1c
Merge pull request #198 from gabriel-samfira/create-dirs
Create needed folders before use
2023-12-18 18:50:22 +02:00
Gabriel Adrian Samfira
912371cf57 Create needed folders before use
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-12-18 16:50:02 +00:00
Gabriel Adrian Samfira
6c7c5a913f Define variable before use
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-12-18 16:43:30 +00:00
Gabriel
6dbdd5e9b0
Merge pull request #197 from gabriel-samfira/copy-config
Copy provider config in garm folder
2023-12-18 18:39:49 +02:00
Gabriel Adrian Samfira
5d596aa94c Copy provider config in garm folder
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-12-18 16:39:30 +00:00
Gabriel
a8f468b4a9
Merge pull request #196 from gabriel-samfira/fix-e2e-tests
Export required variables
2023-12-18 18:32:40 +02:00
Gabriel Adrian Samfira
e6eed93546 Export required variables
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-12-18 16:31:52 +00:00
Gabriel
ca62deab8d
Merge pull request #195 from gabriel-samfira/update-deps
Update deps
2023-12-18 18:25:58 +02:00
Gabriel Adrian Samfira
0dd4f38691 Update go-github
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-12-18 16:20:44 +00:00
Gabriel Adrian Samfira
66bf762cd6 Update to latest jwt
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-12-18 16:20:44 +00:00
Gabriel Adrian Samfira
3ec6aeace2 Update dependencies
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-12-18 16:20:44 +00:00
Gabriel
a13c5db1a7
Merge pull request #194 from gabriel-samfira/remove-lxd-provider
Remove the LXD internal provider
2023-12-18 18:15:54 +02:00
Gabriel Adrian Samfira
ff5b9d22a7 Fix k8s path
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-12-18 15:24:52 +00:00
Gabriel Adrian Samfira
c4b2a3cd1f Update Dockerfile
Add new providers to Dockerfile:

* k8s
* lxd
* incus

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-12-18 14:56:05 +00:00
Gabriel Adrian Samfira
d1d8bfa703 Update docs
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-12-18 14:49:36 +00:00
Gabriel Adrian Samfira
affb56f9a0 Remove the LXD internal provider
Canonical have relicensed the LXD project to AGPLv3. This means that we can
no longer update the go LXD client without re-licensing GARM as AGPLv3. This
is not desirable or possible.

The existing code seems to be Apache 2.0 and all code that has already been
contributed seems to stay as Apache 2.0, but new contributions from Canonical
employees will be AGPLv3.

We cannot risk including AGPLv3 code now or in the future, so we will separate
the LXD provider into its own project which can be AGPLv3. GARM will simply
execute the external provider.

If the client code of LXD is ever split from the main project and re-licensed
as Apache 2.0 or a compatible license, we will reconsider adding it back as a
native provider. Although in the long run, I believe external providers will
be the only option as they are easier to write, easier to maintain and safer to
ship (a bug in the provider does not crash GARM itself).

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-12-18 12:16:48 +00:00
Gabriel
fc7a7dde35
Merge pull request #193 from gabriel-samfira/prevent-api-spam
Prevent abusing the GH API
2023-12-16 14:06:30 +02:00
Gabriel Adrian Samfira
459906d97e Prevent abusing the GH API
On large deployments with many jobs, we cannot check each job that
we recorded in the DB against the GH API.

Before this change, if a job was updated more than 10 minutes ago,
garm would check against the GH API if that job still existed. While
this approach allowed us to maintain a consistent view over which jobs
still exist and which are stale, it had the potential of spamming the
GH API, leading to rate limiting.

This change uses the scale-down loop as an indicator for job staleness.

If a job remains in queued state in our DB, but has disappeared from GH
or was serviced by another runner and we never got the hook (garm was down
or GH had an issue - happened in the past), then garm will spin up a new
runner for it. If that runner or any other runner is scaled down, we check
if we have jobs in the queue that should have matched that runner. If we did,
there is a high chance that the job no longer exists in GH and we can remove
the job from the queue.

Of course, there is a chance that GH is having issues and the job is never
pushed to the runner, but we can't really account for everything. In this case
I'd rather avoid rate limiting ourselves.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-12-15 22:41:50 +00:00
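The staleness heuristic from 459906d97e can be sketched as follows: when a runner is scaled down, any queued job that the runner could have serviced is treated as stale and dropped from the queue, without asking the GH API. The `Job` type, the function names and the exact label-matching rule (every job label must be present on the runner) are assumptions of this sketch.

```go
package main

import "fmt"

// Job is a simplified stand-in for a queued job recorded in the DB.
type Job struct {
	ID     int64
	Labels []string
}

// runnerCanService reports whether a runner with runnerLabels would have
// matched the job. Note that a job without labels matches any runner.
func runnerCanService(job Job, runnerLabels []string) bool {
	have := make(map[string]struct{}, len(runnerLabels))
	for _, l := range runnerLabels {
		have[l] = struct{}{}
	}
	for _, l := range job.Labels {
		if _, ok := have[l]; !ok {
			return false
		}
	}
	return true
}

// pruneOnScaleDown is called when an idle runner is removed. Queued jobs
// that should have been picked up by that runner are assumed to be stale.
func pruneOnScaleDown(queued []Job, runnerLabels []string) (kept, removed []Job) {
	for _, j := range queued {
		if runnerCanService(j, runnerLabels) {
			removed = append(removed, j)
		} else {
			kept = append(kept, j)
		}
	}
	return kept, removed
}

func main() {
	queue := []Job{
		{ID: 1, Labels: []string{"self-hosted", "linux"}},
		{ID: 2, Labels: []string{"self-hosted", "windows"}},
	}
	kept, removed := pruneOnScaleDown(queue, []string{"self-hosted", "linux", "x64"})
	fmt.Println(len(kept), len(removed)) // 1 1
}
```

The trade-off is the one the commit message names: a job that GH is merely slow to deliver may be dropped by mistake, in exchange for not spending API calls on every recorded job.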
Gabriel
46ac1b8166
Merge pull request #191 from gabriel-samfira/update-readme
Add some more info to README
2023-12-11 17:53:32 +02:00
Gabriel Adrian Samfira
71c741c43a Add some more info to README
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-12-11 15:48:28 +00:00
Gabriel
c6ec83a7c6
Merge pull request #190 from gabriel-samfira/update-readme
Add the k8s provider to the list
2023-12-11 17:22:18 +02:00
Gabriel Adrian Samfira
71657bd06b Add the k8s provider to the list
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-12-11 15:17:12 +00:00
Gabriel
c712366663
Merge pull request #189 from gabriel-samfira/add-option-to-disable-jit-config
Add option to disable JIT config
2023-12-11 16:15:45 +02:00
Gabriel Adrian Samfira
49e06efdf8 Update sample config
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-12-11 14:05:10 +00:00
Gabriel Adrian Samfira
85968598b0 Add option to disable JIT config
This change adds a flag on providers that allows users to disable JIT
configuration even when it's available. For context, JIT is available
on github.com and any GHES instance >=3.10.

This option is a stopgap measure for providers that have not yet been
updated to use JIT configs instead of runner registration tokens.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2023-12-11 12:37:33 +00:00
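The effect of the flag added in 85968598b0 can be pictured as a small decision: use JIT configuration only when the GitHub endpoint supports it and the provider has not opted out. The struct and field names below are hypothetical and only illustrate the logic.

```go
package main

import "fmt"

// Provider is a simplified stand-in for a provider's configuration.
type Provider struct {
	Name             string
	DisableJITConfig bool // hypothetical field name for the new flag
}

// useJITConfig returns true when the runner should be set up with a JIT
// config instead of a runner registration token.
func useJITConfig(p Provider, endpointSupportsJIT bool) bool {
	return endpointSupportsJIT && !p.DisableJITConfig
}

func main() {
	legacy := Provider{Name: "not-yet-updated", DisableJITConfig: true}
	current := Provider{Name: "updated"}
	fmt.Println(useJITConfig(legacy, true))   // false: falls back to registration tokens
	fmt.Println(useJITConfig(current, true))  // true
	fmt.Println(useJITConfig(current, false)) // false: endpoint has no JIT support
}
```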
Gabriel
0e36eb7056
Merge pull request #188 from mercedes-benz/bypass_additional_environment_variables
feat: passthrough additional env vars to provider bin
2023-12-01 13:06:27 +02:00
Mario Constanti
927a1a4308 feat: build garm with go 1.21
Signed-off-by: Mario Constanti <mario.constanti@mercedes-benz.com>
2023-12-01 11:56:06 +01:00
Mario Constanti
215bd71855 feat: passthrough additional env vars to provider
Some provider binaries need additional environment variables to be set
(e.g. Kubernetes, as client-go depends on the KUBERNETES_SERVICE_ vars), so
it should be possible to define a list of environment variables that are
passed through to the provider binary when it is executed.
2023-12-01 11:54:34 +01:00
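A minimal sketch of the passthrough idea in 215bd71855: take the parent process environment and keep only the variables that appear in a configured list, so they can be handed to the provider binary. Treating the list entries as name prefixes is an assumption here (it fits the KUBERNETES_SERVICE_ example), and `filterEnv` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// filterEnv returns the "NAME=value" entries from environ whose name
// starts with one of the allowed prefixes. Order is preserved.
func filterEnv(environ, allowed []string) []string {
	var out []string
	for _, kv := range environ {
		name, _, ok := strings.Cut(kv, "=")
		if !ok {
			continue // skip malformed entries
		}
		for _, prefix := range allowed {
			if strings.HasPrefix(name, prefix) {
				out = append(out, kv)
				break
			}
		}
	}
	return out
}

func main() {
	// A fixed slice stands in for os.Environ() to keep the output stable.
	environ := []string{
		"HOME=/root",
		"KUBERNETES_SERVICE_HOST=10.0.0.1",
		"KUBERNETES_SERVICE_PORT=443",
	}
	passthrough := filterEnv(environ, []string{"KUBERNETES_SERVICE_"})
	fmt.Println(passthrough)
}
```

In a real caller the result would typically be appended to the `Env` field of the `exec.Cmd` used to run the provider binary.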
Gabriel
05e179604d
Merge pull request #183 from gabriel-samfira/force-remove-runner
Add force delete runner
2023-11-21 20:56:24 +02:00