This change scopes all github entities to a github endpoint, allowing
users to have the same repo/org/enterprise created for each endpoint.
This way, if your username is the same on github.com and on your GHES
server, and you have a repository or org with the same name in both places,
GARM can now tell the two apart.
This change also fixes a leaky watcher in the pool manager.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Improper use of time.After inside a loop can leak memory: each call
allocates a new timer, which is not garbage collected until it fires.
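A minimal sketch of the pattern being fixed (not GARM's actual code): in a
select inside a loop, every time.After call allocates a fresh timer that
lingers until it fires; reusing a single timer avoids that.

```go
package queue

import "time"

// Leaky: each loop iteration allocates a new timer via time.After. If the
// select completes on another channel first, that timer lives on until it
// fires, accumulating garbage under load.
func consumeLeaky(events <-chan int, stop <-chan struct{}) {
	for {
		select {
		case <-events:
		case <-time.After(time.Minute):
		case <-stop:
			return
		}
	}
}

// Fixed: reuse one timer, stopping and resetting it between iterations.
func consume(events <-chan int, stop <-chan struct{}) {
	timer := time.NewTimer(time.Minute)
	defer timer.Stop()
	for {
		select {
		case <-events:
		case <-timer.C:
		case <-stop:
			return
		}
		if !timer.Stop() {
			// Drain the channel if the timer already fired.
			select {
			case <-timer.C:
			default:
			}
		}
		timer.Reset(time.Minute)
	}
}
```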
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
GARM has a backoff interval when consuming queued jobs. This backoff
is intended to allow any potential idle runners to pick up a job before
GARM attempts to spin up a new one. This change allows users to set a
custom backoff interval or disable it altogether by setting it to 0.
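A hypothetical sketch of how such a knob behaves (function and parameter
names are illustrative, not GARM's actual API); setting the interval to 0
skips the wait entirely:

```go
package pool

import (
	"context"
	"time"
)

// waitBackoff blocks for the configured backoff before a new runner is
// created, giving idle runners a chance to pick the job up first.
// A backoff of 0 disables the wait.
func waitBackoff(ctx context.Context, backoff time.Duration) error {
	if backoff == 0 {
		return nil
	}
	timer := time.NewTimer(backoff)
	defer timer.Stop()
	select {
	case <-timer.C:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}
```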
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change uses the database watcher to watch for changes to the
github entities, credentials and controller info.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Adds a simple database watcher. At this point it's just one process, but
the plan is to allow different implementations that inform the locally
running workers of changes that have occurred on entities of interest in
the database.
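A rough sketch of the shape this could take (all names are illustrative, not
GARM's actual types): a producer publishes change events, and workers
register consumers that receive them over channels.

```go
package watcher

// ChangeType describes what happened to an entity.
type ChangeType string

const (
	CreateOperation ChangeType = "create"
	UpdateOperation ChangeType = "update"
	DeleteOperation ChangeType = "delete"
)

// ChangePayload carries one change to an entity of interest.
type ChangePayload struct {
	EntityType string // e.g. "repository", "credentials", "controller"
	Operation  ChangeType
	Payload    interface{}
}

// Watcher fans database changes out to registered consumers. Alternative
// implementations could ship events between processes.
type Watcher interface {
	RegisterConsumer(id string, filter func(ChangePayload) bool) (<-chan ChangePayload, error)
	Notify(ChangePayload)
	Close()
}
```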
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
There are only a few cases where we get job information from github with
the runner name not set.
In all of these cases we do not need to check the github API at all,
because these jobs never get scheduled to a runner:
job.Action is:
* queued:
a queued job is just queued and not yet scheduled to a runner, so we do
not get a runner name from the GH API
* completed:
when conclusion=cancelled|failure, github never scheduled the job to a
runner, so we do not get a runner name from the GH API
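A sketch of the resulting short-circuit (illustrative names): for these two
action/conclusion combinations there is nothing to look up, so the GH API
call is skipped.

```go
package jobs

// skipRunnerLookup reports whether a job event that arrived without a
// runner name can skip the GH API check entirely.
func skipRunnerLookup(action, conclusion, runnerName string) bool {
	if runnerName != "" {
		return false
	}
	switch action {
	case "queued":
		// Queued jobs have not been scheduled to a runner yet.
		return true
	case "completed":
		// Cancelled or failed before ever being scheduled to a runner.
		return conclusion == "cancelled" || conclusion == "failure"
	}
	return false
}
```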
Signed-off-by: Mario Constanti <mario.constanti@mercedes-benz.com>
github sends job events where conclusion=cancelled is spelled in British
English (double "l").
Signed-off-by: Mario Constanti <mario.constanti@mercedes-benz.com>
Remove code that was just wrapping other functions at this point, and
move some code around. We need a better idea of what is actually still
needed in the pool manager before we can refactor it into something that
can scale out.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Change instance DB functions from querying by ID to querying by name. Names
are unique in GARM, so we might as well use the name instead of the ID and
spare ourselves the extra query to get the ID when a workflow comes in.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
When no runner group is set, do not attempt to resolve the runner group.
Looking up an empty runner group will just return a not found error, which
will make GARM fall back to a registration token.
This change fixes that.
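A minimal sketch of the guard (names are illustrative, and the default
group's ID of 1 is an assumption about GitHub's conventions): resolution
only happens when a group is actually configured.

```go
package pool

// resolveRunnerGroup returns the ID of the configured runner group, falling
// back to the default group when none is set instead of asking GitHub to
// find a group with an empty name.
func resolveRunnerGroup(groupName string, lookup func(string) (int64, error)) (int64, error) {
	if groupName == "" {
		// Assumed: GitHub's default runner group carries ID 1.
		return 1, nil
	}
	return lookup(groupName)
}
```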
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
We'll use GithubEntityType throughout the codebase to determine the
type of operation that is about to take place, so it won't be limited
to determining only the pool type. We'll also use it to dedupe the label
scope.
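A sketch of what such a type looks like (the exact definition in GARM may
differ):

```go
package params

// GithubEntityType identifies the scope an operation targets.
type GithubEntityType string

const (
	GithubEntityTypeRepository   GithubEntityType = "repository"
	GithubEntityTypeOrganization GithubEntityType = "organization"
	GithubEntityTypeEnterprise   GithubEntityType = "enterprise"
)
```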
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change adds the ability to specify the pool balancing strategy to
use when processing queued jobs. Before this change, GARM would round-robin
through all pools that matched the set of tags requested by queued jobs.
When round-robin (default) is used for an entity (repo, org or enterprise)
and you have 2 pools defined for that entity with a common set of tags that
match 10 jobs (for example), then those jobs would trigger the creation of
a new runner in each of the two pools in turn. Job 1 would go to pool 1,
job 2 would go to pool 2, job 3 to pool 1, job 4 to pool 2 and so on.
When "stack" is used, those same 10 jobs would trigger the creation of a
new runner in the pool with the highest priority, every time.
In both cases, if a pool is full, the next one would be tried automatically.
For the stack case, this would mean that if pool 2 had a priority of 10 and
pool 1 had a priority of 5, pool 2 would be saturated first, then pool 1.
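A sketch of how the two strategies order candidate pools (illustrative
types, not GARM's implementation); in both cases each pool in the resulting
list is simply tried in turn until one has capacity.

```go
package pool

import "sort"

type Pool struct {
	Name     string
	Priority int
}

// stackOrder always tries the highest-priority pool first; lower-priority
// pools only receive runners once the pools above them are full.
func stackOrder(pools []Pool) []Pool {
	ordered := append([]Pool(nil), pools...)
	sort.SliceStable(ordered, func(i, j int) bool {
		return ordered[i].Priority > ordered[j].Priority
	})
	return ordered
}

// roundRobinOrder rotates the starting pool per job, so consecutive jobs
// land on consecutive pools.
func roundRobinOrder(pools []Pool, jobIndex int) []Pool {
	ordered := make([]Pool, 0, len(pools))
	for i := range pools {
		ordered = append(ordered, pools[(jobIndex+i)%len(pools)])
	}
	return ordered
}
```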
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change allows users to bypass GitHub Unauthorized errors when removing
github runners. This means that removing runners will now be possible even
if the pool manager is stopped.
A new flag has been added to the runner rm command and to the API that
tells GARM to ignore the fact that the pool manager is stopped, as well
as any 401 error returned by GitHub.
This means you will be able to remove the runners from garm and your
provider, but the runner will still show up in github as "offline" unless
the credentials are updated or the runner is manually removed.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change adds the ability to use GitHub Apps to authenticate against the
GitHub API. This gives us a larger quota for API requests (15k vs 5k for PATs).
Also, each GitHub App has its own quota, whereas PATs share the same user quota.
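A minimal sketch of authenticating as a GitHub App installation via the
ghinstallation helper library, a common companion to go-github (the
go-github version in the import and GARM's actual wiring are assumptions
here):

```go
package auth

import (
	"net/http"

	"github.com/bradleyfalzon/ghinstallation/v2"
	"github.com/google/go-github/v57/github"
)

// newAppClient builds a go-github client that authenticates as a GitHub App
// installation, signing requests with the app's private key.
func newAppClient(appID, installationID int64, privateKeyPath string) (*github.Client, error) {
	itr, err := ghinstallation.NewKeyFromFile(http.DefaultTransport, appID, installationID, privateKeyPath)
	if err != nil {
		return nil, err
	}
	return github.NewClient(&http.Client{Transport: itr}), nil
}
```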
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change switches GARM to the standard library's new structured logging
package (log/slog). This will allow us to set log levels and reduce some
of the log spam.
Given that we introduced new knobs to tweak logging, the number of
config options for logging now warrants its own section.
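A minimal sketch of wiring up log/slog with a tunable level (the actual
config knobs and handler choice in GARM will differ):

```go
package logging

import (
	"log/slog"
	"os"
)

// setupLogging installs a default structured logger at the given level,
// optionally emitting JSON instead of plain text.
func setupLogging(level slog.Level, asJSON bool) {
	opts := &slog.HandlerOptions{Level: level}
	var handler slog.Handler = slog.NewTextHandler(os.Stderr, opts)
	if asJSON {
		handler = slog.NewJSONHandler(os.Stderr, opts)
	}
	slog.SetDefault(slog.New(handler))
}
```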
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
On large deployments with many jobs, we cannot check each job that
we recorded in the DB against the GH API.
Before this change, if a job was updated more than 10 minutes ago,
garm would check against the GH api if that job still existed. While
this approach allowed us to maintain a consistent view over which jobs
still exist and which are stale, it had the potential of spamming the
GH API, leading to rate limiting.
This change uses the scale-down loop as an indicator for job staleness.
If a job remains in queued state in our DB, but has disappeared from GH
or was serviced by another runner and we never got the hook (garm was down
or GH had an issue - it has happened in the past), then garm will spin up a
new runner for it. If that runner or any other runner is scaled down, we
check whether we have jobs in the queue that should have matched that
runner. If we do, there is a high chance that the job no longer exists in
GH and we can remove it from the queue.
Of course, there is a chance that GH is having issues and the job is never
pushed to the runner, but we can't really account for everything. In this case
I'd rather avoid rate limiting ourselves.
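A rough sketch of the heuristic (illustrative types and names): when a
runner is reaped for being idle, any queued job whose labels that runner
satisfied is assumed stale and dropped.

```go
package pool

type Job struct {
	ID     int64
	Labels []string
}

// reapStaleJobs removes queued jobs that the scaled-down runner's labels
// would have matched: the runner sat idle long enough to be reaped without
// ever picking them up, so they most likely no longer exist in GH.
func reapStaleJobs(runnerLabels map[string]bool, queued []Job, remove func(Job)) {
	for _, job := range queued {
		matched := true
		for _, label := range job.Labels {
			if !runnerLabels[label] {
				matched = false
				break
			}
		}
		if matched {
			remove(job)
		}
	}
}
```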
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This change adds a flag on providers that allows users to disable JIT
configuration even when it's available. For context, JIT is available
on github.com and any GHES instance >=3.10.
This option is a stopgap measure for providers that have not yet been
updated to use JIT configs instead of runner registration tokens.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This branch adds the ability to forcefully remove a runner from GARM.
When the operator wishes to manually remove a runner, the workflow is as
follows:
* Check that the runner exists in GitHub. If it does, attempt to
remove it. An error here indicates that the runner may be processing
a job. In this case, we don't continue and the operator gets immediate
feedback from the API.
* Mark the runner in the database as pending_delete
* Allow the consolidate loop to reap it from the provider and remove it
from the database.
Removing the instance from the provider is async. If the provider errs out,
GARM will keep trying to remove it in perpetuity until the provider succeeds.
In situations where the provider is misconfigured, this will never happen, leaving
the instance in a permanent state of pending_delete.
A provider may fail for various reasons. Either credentials have expired, the
API endpoint has changed, the provider is misconfigured or the operator may just
have removed it from the config before cleaning up the runners. While some cases
are recoverable, some are not. We cannot have a situation in which we cannot clean
resources in garm because of a misconfiguration.
This change adds the pending_force_delete instance status. Instances marked
with this status will be removed from GARM even if the provider reports an
error.
The GARM cli has been modified to give new meaning to the --force-remove-runner
option. This option in the CLI is no longer mandatory. Instead, setting it will mark
the runner with the new pending_force_delete status. Omitting it will mark the runner
with the old status of pending_delete.
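A sketch of the status selection (the two statuses come from this change;
the surrounding names are illustrative):

```go
package runner

// deletionStatus picks the lifecycle status a runner is marked with when an
// operator removes it.
func deletionStatus(forceRemove bool) string {
	if forceRemove {
		// Reaped from GARM even if the provider keeps erroring out.
		return "pending_force_delete"
	}
	// Retried against the provider until deletion succeeds.
	return "pending_delete"
}
```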
Fixes: #160
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
This update pulls in the latest version of garm-provider-common which removes
its dependency on go-github, making future updates much less painful.
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
We must create the DB entry for a runner with a JIT config included. Adding it later
via an update runs the risk of having the consolidate loop pick up the incomplete instance.
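A sketch of the ordering fix (illustrative store interface, not GARM's
actual DB layer): the JIT config travels with the create call itself, so
the consolidate loop can never observe an instance without it.

```go
package db

// CreateInstanceParams carries everything a runner row needs, including the
// JIT config, so a single insert fully initializes the instance.
type CreateInstanceParams struct {
	Name      string
	JitConfig map[string]string
}

type Store interface {
	CreateInstance(CreateInstanceParams) error
}

// createRunner persists the runner in one call; there is no window in which
// the consolidate loop could observe the instance without its JIT config.
func createRunner(s Store, name string, jit map[string]string) error {
	return s.CreateInstance(CreateInstanceParams{Name: name, JitConfig: jit})
}
```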
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>