This change adds workflow job ID, scaleset job ID and workflow run ID
to the metrics.
This change also attempts to fix how jobs are recorded when a workflow
is posted by a webhook, but the job is handled by a scale set.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
It seems that on some systems, such as k8s, RFC 1123 compliance is a hard
requirement and validation fails if hostnames contain any uppercase letters,
leading to nodes not being able to join.
This change makes all runner names lowercase, hopefully fixing this.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
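As a minimal sketch of the normalization described above (normalizeRunnerName is a hypothetical helper name, not necessarily what GARM uses), lowercasing can be applied wherever a runner name is generated:

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeRunnerName lowercases a runner name so it can double as a
// hostname on systems that reject uppercase letters.
// Hypothetical helper for illustration; not GARM's actual code.
func normalizeRunnerName(name string) string {
	return strings.ToLower(name)
}

func main() {
	fmt.Println(normalizeRunnerName("GARM-AbCdEf123456"))
}
```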
When AgentURL is not explicitly configured, derive it automatically
from MetadataURL (or CallbackURL as fallback) by taking the base URL
and setting the path to /agent.
This makes AgentURL effectively optional for users who use the standard
URL structure, reducing configuration burden when upgrading to versions
that require agent mode support.
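A rough sketch of the derivation described above follows. The function name deriveAgentURL and the example hostnames are assumptions, and "base URL" is interpreted here as scheme plus host; the actual implementation may differ.

```go
package main

import (
	"fmt"
	"net/url"
)

// deriveAgentURL returns the explicitly configured agent URL if there is one.
// Otherwise it takes MetadataURL (or CallbackURL as a fallback), keeps only
// the scheme and host, and sets the path to /agent.
// Hypothetical helper; names and exact behavior are assumptions.
func deriveAgentURL(explicit, metadataURL, callbackURL string) (string, error) {
	if explicit != "" {
		return explicit, nil
	}
	base := metadataURL
	if base == "" {
		base = callbackURL
	}
	if base == "" {
		return "", fmt.Errorf("no metadata or callback URL to derive the agent URL from")
	}
	u, err := url.Parse(base)
	if err != nil {
		return "", fmt.Errorf("parsing base URL: %w", err)
	}
	if u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("base URL %q has no scheme or host", base)
	}
	derived := url.URL{Scheme: u.Scheme, Host: u.Host, Path: "/agent"}
	return derived.String(), nil
}

func main() {
	agentURL, err := deriveAgentURL("", "https://garm.example.com/api/v1/metadata", "")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(agentURL)
}
```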
The check for max runners was added to CreateInstance(), but we create the
JIT runners before we call the function that adds a runner to the DB. The
deferred function that cleans up the JIT runner was only reached after the
error return generated by CreateInstance(), so the cleanup code never ran.
Additionally, we would only learn that max runners had been reached after
creating the JIT runner, which eats into rate limits.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
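The ordering problem described above can be illustrated with a simplified sketch. All names here (pool, addRunner, the callback parameters) are hypothetical stand-ins rather than GARM's real types: the limit is checked before any JIT runner is created, and the cleanup is registered before the DB call that may fail.

```go
package main

import (
	"errors"
	"fmt"
)

var errMaxRunners = errors.New("max runners reached")

// pool is a hypothetical stand-in for a pool or scale set.
type pool struct {
	maxRunners int
	runners    int
}

// addRunner checks the limit first, then creates the JIT runner, then saves
// the runner to the DB. The JIT runner is removed if any later step fails.
func addRunner(p *pool, createJIT func() (string, error), removeJIT func(string), saveToDB func(string) error) (err error) {
	// Check the limit before doing anything that costs API calls.
	if p.runners >= p.maxRunners {
		return errMaxRunners
	}
	jitID, err := createJIT()
	if err != nil {
		return fmt.Errorf("creating JIT runner: %w", err)
	}
	// Registered before the DB call, so it runs on every later error return.
	defer func() {
		if err != nil {
			removeJIT(jitID)
		}
	}()
	if err = saveToDB(jitID); err != nil {
		return fmt.Errorf("saving runner: %w", err)
	}
	p.runners++
	return nil
}

func main() {
	p := &pool{maxRunners: 1}
	create := func() (string, error) { return "jit-1", nil }
	remove := func(id string) { fmt.Println("cleaned up", id) }
	failingSave := func(string) error { return errors.New("db unavailable") }
	fmt.Println(addRunner(p, create, remove, failingSave))
}
```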
- Add nexthop-ai/garm-provider-cloudstack to Dockerfile provider build
- Add CloudStack to the list of supported providers in README.md
- Add CloudStack to provider lists in docs, also sort them alphabetically
If GARM is killed or restarted while creating a runner, there is a chance
that runners remain in creating or deleting state. We've started checking
state transitions in GARM and only allow a transition when the new state
makes sense in normal circumstances. However, when recovering from a crash,
we may be in an inconsistent state from which we need to recover.
This change adds a ForceUpdateInstance() function that ignores state
transition inconsistencies. For now, we only use it when spinning up a
scale set and checking instance states.
This change also fixes a locking issue.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
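To illustrate the difference between a validated update and a forced one, here is a simplified sketch. The status names, the transition table and the lowercase function names are assumptions made for this example; they do not mirror GARM's actual state machine or the signature of ForceUpdateInstance().

```go
package main

import "fmt"

type status string

// Hypothetical status values for illustration only.
const (
	statusPendingCreate status = "pending_create"
	statusCreating      status = "creating"
	statusRunning       status = "running"
	statusDeleting      status = "deleting"
	statusDeleted       status = "deleted"
)

// allowedTransitions lists which transitions make sense in normal operation.
var allowedTransitions = map[status][]status{
	statusPendingCreate: {statusCreating, statusDeleting},
	statusCreating:      {statusRunning, statusDeleting},
	statusRunning:       {statusDeleting},
	statusDeleting:      {statusDeleted},
}

type instance struct {
	Name   string
	Status status
}

// updateInstance refuses transitions that are not in the table.
func updateInstance(inst *instance, next status) error {
	for _, s := range allowedTransitions[inst.Status] {
		if s == next {
			inst.Status = next
			return nil
		}
	}
	return fmt.Errorf("invalid transition from %q to %q", inst.Status, next)
}

// forceUpdateInstance skips validation; meant for crash recovery only.
func forceUpdateInstance(inst *instance, next status) {
	inst.Status = next
}

func main() {
	// A runner left in "creating" by a crash needs to be reset.
	inst := &instance{Name: "runner-1", Status: statusCreating}
	if err := updateInstance(inst, statusPendingCreate); err != nil {
		fmt.Println("normal update rejected:", err)
		forceUpdateInstance(inst, statusPendingCreate)
	}
	fmt.Println("status:", inst.Status)
}
```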
The RepoID, OrgID and EnterpriseID are the entities that generated
the webhook which notified us of the job running in the repo.
The RepositoryName is the actual repository that started the job.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
GARM cares about jobs in queued state for anything that requires
decision making. Anything else is purely informational.
This change cleans up all inactionable jobs and refuses to record jobs
that are not already in the database, have an inactionable state and
which do not have a runner we own handling them.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
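The recording rule above can be written as a small predicate. This is a sketch under assumptions: the job struct, the shouldRecord name and the ownsRunner callback are hypothetical, and "queued" is treated as the only actionable state, as the message states.

```go
package main

import "fmt"

// job is a hypothetical, trimmed-down view of a workflow job.
type job struct {
	Status     string
	RunnerName string
}

// shouldRecord returns false only when all three conditions hold: the job is
// not already in the database, its state is not actionable, and no runner we
// own is handling it.
func shouldRecord(j job, inDB bool, ownsRunner func(name string) bool) bool {
	if inDB {
		return true
	}
	if j.Status == "queued" {
		return true
	}
	return j.RunnerName != "" && ownsRunner(j.RunnerName)
}

func main() {
	owned := map[string]bool{"garm-runner-1": true}
	owns := func(name string) bool { return owned[name] }

	fmt.Println(shouldRecord(job{Status: "queued"}, false, owns))
	fmt.Println(shouldRecord(job{Status: "in_progress", RunnerName: "garm-runner-1"}, false, owns))
	fmt.Println(shouldRecord(job{Status: "completed", RunnerName: "someone-else"}, false, owns))
}
```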
This change adds metrics for rate limits. Rate limits are still recorded
via a rate limit check loop (as before), but in addition, we now take the
rate limit info returned in all GitHub responses and record it as it
happens, as opposed to every 30 seconds.
The loop remains to update rate limits even for credentials that are
used rarely.
This change also adds a credentials details page in the webUI.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
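For context, GitHub reports rate limit state on API responses through the X-RateLimit-Limit, X-RateLimit-Remaining and X-RateLimit-Reset headers. The sketch below reads them directly from an http.Header; the rateLimit type and function names are hypothetical, and GARM may well obtain the same values through its GitHub client library instead.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// rateLimit is a hypothetical container for the values we want to export.
type rateLimit struct {
	Limit     int
	Remaining int
	Reset     time.Time
}

// rateLimitFromHeader extracts rate limit info from response headers.
func rateLimitFromHeader(h http.Header) (rateLimit, error) {
	limit, err := strconv.Atoi(h.Get("X-RateLimit-Limit"))
	if err != nil {
		return rateLimit{}, fmt.Errorf("parsing limit: %w", err)
	}
	remaining, err := strconv.Atoi(h.Get("X-RateLimit-Remaining"))
	if err != nil {
		return rateLimit{}, fmt.Errorf("parsing remaining: %w", err)
	}
	// The reset value is a UTC epoch timestamp in seconds.
	reset, err := strconv.ParseInt(h.Get("X-RateLimit-Reset"), 10, 64)
	if err != nil {
		return rateLimit{}, fmt.Errorf("parsing reset: %w", err)
	}
	return rateLimit{Limit: limit, Remaining: remaining, Reset: time.Unix(reset, 0)}, nil
}

// sampleHeader builds example headers with made-up values.
func sampleHeader() http.Header {
	h := http.Header{}
	h.Set("X-RateLimit-Limit", "5000")
	h.Set("X-RateLimit-Remaining", "4987")
	h.Set("X-RateLimit-Reset", "1700000000")
	return h
}

func main() {
	rl, err := rateLimitFromHeader(sampleHeader())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(rl.Limit, rl.Remaining, rl.Reset.Unix())
}
```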
* The "CA Certificate Bundle (Optional)" showed no indication of a
  certificate being selected. This change fixes that.
* The Gitea tools cache worker should not fall back to the default releases
  page if the custom page set by the user returned an error.
* Selecting "Use Internal Tools Metadata" in the Gitea endpoint edit modal
  now greys out the "Tools Metadata URL (optional)" text field.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change adds a new "generation" field to pools, scalesets and
runners. The generation field is inherited by runners from scale sets
or pools at the time of creation.
The generation field on scalesets and pools is incremented when the
pool or scale set is updated in a way that might influence how runners
are created (flavor, image, specs, runner groups, etc).
Using this new field, we can determine if existing runners have diverged
from the settings of the pool/scale set that spawned them.
In the CLI we now have a new set of commands available for both
pools and scalesets that list runners, with an optional --outdated
flag, and a new "rotate" command that removes all idle runners. Optionally,
the --outdated flag can be passed to the rotate command to only remove
outdated runners.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
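A simplified sketch of how the generation comparison could work is shown below. The struct fields and function names are hypothetical and only model the behavior described above: a runner is outdated when its generation differs from that of its pool, and rotation only ever touches idle runners.

```go
package main

import "fmt"

// pool and runner are hypothetical, trimmed-down models.
type pool struct {
	Generation uint64
}

type runner struct {
	Name       string
	Generation uint64 // copied from the pool or scale set at creation time
	Idle       bool
}

// isOutdated reports whether the runner was created from older settings.
func isOutdated(p pool, r runner) bool {
	return r.Generation != p.Generation
}

// rotationCandidates returns the names of idle runners to remove. With
// onlyOutdated set, idle runners that are still current are kept.
func rotationCandidates(p pool, runners []runner, onlyOutdated bool) []string {
	var names []string
	for _, r := range runners {
		if !r.Idle {
			continue
		}
		if onlyOutdated && !isOutdated(p, r) {
			continue
		}
		names = append(names, r.Name)
	}
	return names
}

func main() {
	p := pool{Generation: 3}
	runners := []runner{
		{Name: "r1", Generation: 3, Idle: true},
		{Name: "r2", Generation: 2, Idle: true},
		{Name: "r3", Generation: 2, Idle: false},
	}
	fmt.Println(rotationCandidates(p, runners, false))
	fmt.Println(rotationCandidates(p, runners, true))
}
```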
Users and instances now have different endpoints for listing tools.
Moreover, users can now use a flag to see what tools are available
upstream if sync is off:
    garm-cli controller tools list --upstream
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change adds the ability to manage garm-agent tools downloads. Users
can:
* Set an upstream releases page (GitHub releases API)
* Enable sync from upstream. In this case, GARM will automatically download
  garm-agent tools from the releases page and save them in the internal
  object store
* Manually upload tools. Manually uploaded tools for an OS/arch combination
  will never be overwritten by auto-sync. Users will need to delete manually
  uploaded tools to enable sync for that OS/arch release.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change adds a new "agent mode" to GARM. The agent enables GARM to
set up a persistent websocket connection between the garm server and the
runners it spawns. The goal is to make it easier to keep track of state,
even without subsequent webhooks from the forge.
The Agent will report via websockets when the runner is actually online,
when it started a job and when it finished a job.
Additionally, the agent allows us to enable optional remote shell between
the user and any runner that is spun up using agent mode. The remote shell
is multiplexed over the same persistent websocket connection the agent
sets up with the server (the agent never listens on a port).
This functionality has also been enabled in the web UI.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Bumps the go_modules group with 1 update in the / directory: [golang.org/x/crypto](https://github.com/golang/crypto).
Updates `golang.org/x/crypto` from 0.43.0 to 0.45.0
- [Commits](https://github.com/golang/crypto/compare/v0.43.0...v0.45.0)
---
updated-dependencies:
- dependency-name: golang.org/x/crypto
dependency-version: 0.45.0
dependency-type: direct:production
dependency-group: go_modules
...
Signed-off-by: dependabot[bot] <support@github.com>
Added a loop around the installdependencies.sh call so that if a parallel
process is using dpkg, we can wait and try again.
The wait between attempts is set to 15 seconds, and the max number of attempts is 5.
When checking if a pool has required labels, we need to make sure the
search is case insensitive.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
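A minimal sketch of a case insensitive label check follows; hasRequiredLabels is a hypothetical helper name and the label values are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// hasRequiredLabels reports whether every required label is present in
// poolLabels, ignoring case.
func hasRequiredLabels(poolLabels, required []string) bool {
	have := make(map[string]struct{}, len(poolLabels))
	for _, l := range poolLabels {
		have[strings.ToLower(l)] = struct{}{}
	}
	for _, r := range required {
		if _, ok := have[strings.ToLower(r)]; !ok {
			return false
		}
	}
	return true
}

func main() {
	poolLabels := []string{"Self-Hosted", "Linux", "x64"}
	fmt.Println(hasRequiredLabels(poolLabels, []string{"self-hosted", "linux"}))
	fmt.Println(hasRequiredLabels(poolLabels, []string{"self-hosted", "arm64"}))
}
```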
The runner metadata URL is meant to give runner install scripts an
easier way to get instance specific metadata, needed for the setup
process. We can use this URL to expand installation metadata more easily,
as opposed to having to change the cloud config InstallRunnerParams{}.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Use temporary storage while the client is streaming the file to GARM.
This ensures that while uploading, we don't lock the blob database. On slow
connections this would mean that no readers would be able to access the db
while data was being written to it via the upload process.
By saving the file to a temporary location and only adding it to the DB
after we receive the entire thing, we significantly reduce the time we need
to keep the DB locked.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
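The staging approach described above can be sketched as follows. The stageUpload function and its commit callback are hypothetical; the callback stands in for the short, locked write to the blob database, which here only starts once the whole upload is on local disk.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// stageUpload streams src (for example a slow client upload) into a temporary
// file, then hands the fully received data to commit. Only commit needs to
// hold the database lock. The temp file is removed afterwards in all cases.
func stageUpload(src io.Reader, commit func(staged io.Reader, size int64) error) error {
	tmp, err := os.CreateTemp("", "upload-*")
	if err != nil {
		return fmt.Errorf("creating temp file: %w", err)
	}
	// Deferred calls run in reverse order: close first, then remove.
	defer os.Remove(tmp.Name())
	defer tmp.Close()

	size, err := io.Copy(tmp, src)
	if err != nil {
		return fmt.Errorf("staging upload: %w", err)
	}
	if _, err := tmp.Seek(0, io.SeekStart); err != nil {
		return fmt.Errorf("rewinding temp file: %w", err)
	}
	return commit(tmp, size)
}

// runExample stages a small in-memory "upload" and returns what commit saw.
func runExample() (string, int64, error) {
	var stored string
	var storedSize int64
	err := stageUpload(strings.NewReader("hello world"), func(staged io.Reader, size int64) error {
		data, err := io.ReadAll(staged)
		if err != nil {
			return err
		}
		stored, storedSize = string(data), size
		return nil
	})
	return stored, storedSize, err
}

func main() {
	stored, size, err := runExample()
	fmt.Println(stored, size, err)
}
```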
This change adds the API endpoints, the CLI commands and the web UI elements
needed to manage objects in GARM's internal storage.
This storage system is meant to be used to distribute the garm-agent and as a
single source of truth for provider binaries, once we add the ability for GARM
to scale out.
Potentially, we can also use this in air gapped systems to distribute the runner binaries
for forges that don't have their own internal storage system (like GHES).
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This commit adds the DB models and functions needed to create, read,
search through, update and delete files within sqlite3.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Use uncompressed tools for gitea. Gitea compresses using .xz, including for
Windows, which does not have a native, built-in tool to uncompress that
format.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>