The external provider enables you to write a fully functional provider in any scripting or programming language. Garm will call your executable to manage the lifecycle of the instances hosting the runners. This document describes the API that an executable must implement to be usable by `garm`.
When `garm` calls your executable, a number of environment variables are set, depending on the operation. There are three environment variables that will always be set regardless of operation. Those variables are:
The `GARM_COMMAND` environment variable will be set to one of the operations defined in the interface. When your executable is called, you'll need to inspect this variable to know which operation you need to execute.
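As a minimal sketch of that inspection step, the Python below maps the value of `GARM_COMMAND` to a handler function. The handler names and their return values are hypothetical placeholders, and only the commands named in this document are listed; the full set is defined by the provider interface.

```python
# Hypothetical handlers; a real provider would call its cloud API here.
def create_instance(env):
    return "create"

def delete_instance(env):
    return "delete"

def remove_all_instances(env):
    return "remove-all"

# Only commands named in this document; the interface defines the full set.
HANDLERS = {
    "CreateInstance": create_instance,
    "DeleteInstance": delete_instance,
    "RemoveAllInstances": remove_all_instances,
}

def dispatch(env):
    """Look up GARM_COMMAND in env and run the matching handler."""
    command = env.get("GARM_COMMAND")
    if not command:
        raise ValueError("GARM_COMMAND is not set")
    handler = HANDLERS.get(command)
    if handler is None:
        raise ValueError(f"unsupported command: {command}")
    return handler(env)

# Demo with an explicit mapping; a real executable would pass os.environ.
print(dispatch({"GARM_COMMAND": "DeleteInstance"}))
```

Passing the environment as a plain mapping keeps the dispatcher easy to exercise without touching the real process environment.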
The `GARM_PROVIDER_CONFIG_FILE` variable will contain a path on disk to a file that can contain whatever configuration your executable needs. For example, in the case of the [OpenStack external provider](https://github.com/cloudbase/garm-provider-openstack), this file is a toml which contains provider specific configuration options. The provider author decides what this file needs to contain for the provider to function properly.
The OpenStack provider mentioned above is written in Go, but it doesn't need to be. For example, if your provider is written in BASH, handling the config file could look something like this:
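In Python, a comparable minimal sketch might look like the following. It assumes the config file holds simple `KEY=value` lines, which is purely an illustrative choice since the format is up to the provider author; the keys `REGION` and `NETWORK_ID` are hypothetical.

```python
def parse_config(text):
    """Parse KEY=value lines, skipping blank lines and '#' comments."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed config line: {line!r}")
        # Surrounding double quotes are dropped, shell-style.
        config[key.strip()] = value.strip().strip('"')
    return config

def load_config(env):
    """Read the file named by GARM_PROVIDER_CONFIG_FILE."""
    path = env.get("GARM_PROVIDER_CONFIG_FILE")
    if not path:
        raise ValueError("GARM_PROVIDER_CONFIG_FILE is not set")
    with open(path, encoding="utf-8") as handle:
        return parse_config(handle.read())

# Hypothetical file content, used only to demonstrate the parser.
sample = 'REGION="example-region"\n# a comment\nNETWORK_ID=net-1\n'
print(parse_config(sample))
```

In a real executable, `load_config(os.environ)` would be called once at startup, before dispatching on the command.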
When garm first starts up, it generates a unique ID that identifies it as an instance. This ID is passed to the provider and should always be used to tag resources in whichever cloud you write your provider for. This ensures that if you have multiple garm installations, one particular deployment of garm will never touch any resources it did not create.
As with the `GARM_CONTROLLER_ID`, the pool ID **must** also be attached as a tag, or through whichever mechanism your target cloud supports, to identify the pool to which the resources (in most cases the VMs) belong.
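The tagging requirement could be sketched as below. The tag keys `garm-controller-id` and `garm-pool-id` are hypothetical names chosen for illustration, and the sketch assumes the pool ID arrives in an environment variable named `GARM_POOL_ID`; how tags are actually attached depends on the target cloud.

```python
def build_tags(env):
    """Build the tags to attach to every resource the provider creates."""
    controller_id = env.get("GARM_CONTROLLER_ID")
    pool_id = env.get("GARM_POOL_ID")  # assumed variable name
    if not controller_id or not pool_id:
        raise ValueError("controller ID and pool ID are both required")
    # Tag keys are hypothetical; use whatever your cloud supports.
    return {
        "garm-controller-id": controller_id,
        "garm-pool-id": pool_id,
    }

def belongs_to_pool(resource_tags, env):
    """True only if the resource carries both of our IDs."""
    wanted = build_tags(env)
    return all(resource_tags.get(k) == v for k, v in wanted.items())

demo_env = {"GARM_CONTROLLER_ID": "controller-a", "GARM_POOL_ID": "pool-1"}
print(build_tags(demo_env))
```

Checking both IDs together means a resource from another garm deployment, or from another pool in the same deployment, never matches.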
The `GARM_INSTANCE_ID` variable contains the `provider_id` of the instance. The `provider_id` is a unique identifier, specific to the IaaS in which the compute resource was created. In OpenStack, it's a `UUID4`, while in LXD, it's the virtual machine's name.
The operations that a provider must implement are described in the `Provider` [interface available here](https://github.com/cloudbase/garm/blob/223477c4ddfb6b6f9079c444d2f301ef587f048b/runner/providers/external/execution/interface.go#L9-L27). The external provider implements this interface, and delegates each operation to your external executable. [These operations are](https://github.com/cloudbase/garm/blob/223477c4ddfb6b6f9079c444d2f301ef587f048b/runner/providers/external/execution/commands.go#L5-L13):
The `CreateInstance` command has the most moving parts. The ideal external provider creates all resources required for a fully functional instance and starts the instance. Waiting for the instance to start is not necessary. If the instance can reach the `callback_url` configured in `garm`, it will update its own status when it starts running the userdata script.
Aside from creating resources, the ideal external provider is also idempotent and cleans up after itself in case of failure. If the executable fails to create the instance for any reason, any dependency it has created up to the point of failure should be cleaned up before returning an error code.
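One way to structure that cleanup is to record an undo action for every resource as it is created and replay them in reverse when a later step fails. The sketch below is a generic pattern under that assumption, not something garm prescribes; the step and resource names are hypothetical.

```python
def create_with_rollback(steps):
    """Run (create, cleanup) pairs; undo completed steps if one fails."""
    undo = []
    try:
        for create, cleanup in steps:
            resource = create()
            undo.append((cleanup, resource))
    except Exception:
        # Remove what was created so far, newest first, then re-raise.
        for cleanup, resource in reversed(undo):
            try:
                cleanup(resource)
            except Exception:
                pass  # best effort: keep cleaning the remaining resources
        raise
    return [resource for _, resource in undo]

# Hypothetical steps that only record what they would do.
log = []

def demo_create(name):
    def create():
        log.append("create " + name)
        return name
    return create

def demo_cleanup(resource):
    log.append("delete " + resource)

def failing_step():
    raise RuntimeError("quota exceeded")

try:
    create_with_rollback([
        (demo_create("port"), demo_cleanup),
        (demo_create("volume"), demo_cleanup),
        (failing_step, demo_cleanup),
    ])
except RuntimeError as err:
    print("failed:", err, "| actions:", log)
```

The caller still sees the original exception, so it can report the failure and exit with a non-zero code after the cleanup has run.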
At the very least, it must be able to clean up those resources, if it is called with the `DeleteInstance` command by `garm`. Garm will retry creating a failed instance. Before it tries again, it will attempt to run a `DeleteInstance` using the `provider_id` returned by your executable.
If your executable failed before a `provider_id` could be supplied, `garm` will send the name of the instance as a `GARM_INSTANCE_ID` environment variable.
Your external provider will need to be able to handle both. The instance name generated by `garm` will be unique, so it's fairly safe to use when deleting instances.
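Handling both cases can be sketched as a lookup that tries the `provider_id` first and falls back to the instance name. The in-memory list of instances and the `delete` callback stand in for real cloud API calls and are hypothetical. Treating an instance that is already gone as success is a design choice of this sketch, made so that retries stay safe.

```python
def find_instance(instances, identifier):
    """Match the identifier against provider_id first, then the name."""
    for instance in instances:
        if instance["provider_id"] == identifier:
            return instance
    for instance in instances:
        if instance["name"] == identifier:
            return instance
    return None

def delete_instance(instances, env, delete):
    """Delete the instance named by GARM_INSTANCE_ID, if it exists."""
    identifier = env.get("GARM_INSTANCE_ID")
    if not identifier:
        raise ValueError("GARM_INSTANCE_ID is not set")
    instance = find_instance(instances, identifier)
    if instance is None:
        return False  # nothing to delete; not treated as an error
    delete(instance)
    return True

# Hypothetical inventory standing in for a cloud API listing.
inventory = [
    {"provider_id": "id-0001", "name": "runner-one"},
    {"provider_id": "id-0002", "name": "runner-two"},
]
deleted = []
print(delete_instance(inventory, {"GARM_INSTANCE_ID": "runner-two"}, deleted.append))
```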
The `CreateInstance` command is the only command that needs to handle standard input. Garm will send the runner bootstrap information in stdin. The environment variables set for this command are:
The information sent in via standard input is a `json` serialized instance of the [BootstrapInstance structure](https://github.com/cloudbase/garm/blob/6b3ea50ca54501595e541adde106703d289bb804/params/params.go#L164-L217).
Then you can easily parse it. If you're using `bash`, you can use the amazing [jq json processor](https://stedolan.github.io/jq/). Other programming languages have suitable libraries that can handle `json`.
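In Python, for instance, the standard `json` module is enough. The sketch below reads from any text stream so it can be demonstrated without a real standard input. The field names in the sample (`name`, `image`, `os_type`, `arch`) are assumptions for illustration and should be checked against the linked structure.

```python
import io
import json

def read_bootstrap(stream):
    """Decode the bootstrap JSON document sent on standard input."""
    params = json.load(stream)
    if not isinstance(params, dict):
        raise ValueError("bootstrap params must be a JSON object")
    return params

# Hypothetical payload; field names are assumptions, not a full schema.
sample = io.StringIO(
    '{"name": "runner-example", "image": "example-image",'
    ' "os_type": "linux", "arch": "amd64"}'
)
params = read_bootstrap(sample)
print(params["name"], params.get("flavor", "<unset>"))
```

A real executable would call `read_bootstrap(sys.stdin)` when handling `CreateInstance`.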
You will have to parse the bootstrap params, verify that the requested image exists, and gather operating system and CPU architecture information. Using that information, you will need to select the appropriate tools for the arch/OS combination you are deploying.
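The tool selection step might be sketched as follows. It assumes each tools entry carries `os`, `architecture` and `download_url` keys, and that the instance's OS and architecture names need translating to the names used in those entries. Both the key names and the alias tables are assumptions to verify against the real bootstrap payload.

```python
# Assumed translations from instance naming to tools-entry naming.
ARCH_ALIASES = {"amd64": "x64", "x86_64": "x64", "aarch64": "arm64"}
OS_ALIASES = {"windows": "win"}

def select_tools(tools, os_type, arch):
    """Return the first tools entry matching the OS/arch combination."""
    want_os = OS_ALIASES.get(os_type, os_type)
    want_arch = ARCH_ALIASES.get(arch, arch)
    for tool in tools:
        if tool.get("os") == want_os and tool.get("architecture") == want_arch:
            return tool
    raise LookupError(f"no tools found for {os_type}/{arch}")

# Hypothetical tools list with placeholder URLs.
tools = [
    {"os": "linux", "architecture": "x64", "download_url": "https://example.invalid/linux-x64"},
    {"os": "linux", "architecture": "arm64", "download_url": "https://example.invalid/linux-arm64"},
    {"os": "win", "architecture": "x64", "download_url": "https://example.invalid/win-x64"},
]
print(select_tools(tools, "linux", "amd64")["download_url"])
```

Raising when nothing matches lets the provider fail early, before any cloud resources are created.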
Refer to the OpenStack or Azure providers available in the [providers.d](../contrib/providers.d/) folder. Of particular interest are the [cloudconfig folders](../contrib/providers.d/openstack/cloudconfig/), where the instance user data templates are stored. These templates are used to generate the needed automation for the instances to download the GitHub runner agent, send back status updates (including the final GitHub runner agent ID), and download the GitHub runner registration token from garm.
On success, your executable is expected to print to standard output a json that can be deserialized into an `Instance{}` structure [defined here](https://github.com/cloudbase/garm/blob/6b3ea50ca54501595e541adde106703d289bb804/params/params.go#L90-L154).
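A minimal sketch of producing that output is shown below. Only `provider_id`, `status` and `provider_fault` are named in this document; the `name` key and the `running` status value are assumptions, and the exact encoding expected for `provider_fault` should be checked against the linked structure. The same helper covers the failure case, where the result is paired with a non-zero exit code.

```python
import json

def instance_json(provider_id, name, status, provider_fault=None):
    """Serialize the result that is printed to standard output."""
    result = {"provider_id": provider_id, "name": name, "status": status}
    if provider_fault is not None:
        result["provider_fault"] = provider_fault
    return json.dumps(result)

def exit_code_for(status):
    """Non-zero exit code for errors, zero otherwise."""
    return 1 if status == "error" else 0

# Hypothetical identifiers, used only for the demonstration.
ok = instance_json("example-provider-id", "runner-example", "running")
failed = instance_json(
    "example-provider-id", "runner-example", "error",
    provider_fault="image not found",
)
print(ok, exit_code_for("running"))
print(failed, exit_code_for("error"))
```

A real executable would print one of these documents and then call `sys.exit` with the matching code.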
In case of error, `garm` expects at the very least to see a non-zero exit code. Where it can, your executable should also return as much information as possible via the above `json`, with the `status` field set to `error` and the `provider_fault` set to a meaningful error message describing what has happened. That information will be visible when doing a:
On success, this command is expected to return a valid `json` that can be deserialized into an `Instance{}` structure (see CreateInstance). If possible, IP addresses allocated to the VM should be returned in addition to the sample `json` printed above.
The `RemoveAllInstances` operation will remove all resources created in a cloud that have been tagged with the `GARM_CONTROLLER_ID`. External providers should tag all resources they create with the garm controller ID. That tag can then be used to identify all resources when attempting to delete all instances.
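Identifying resources by tag could be sketched like this. The resource dictionaries, the `garm-controller-id` tag key and the `delete` callback are hypothetical stand-ins for a real cloud API. The sketch refuses to run without a controller ID so that untagged resources can never match.

```python
TAG_KEY = "garm-controller-id"  # hypothetical tag key

def owned_resources(resources, controller_id):
    """Keep only resources tagged with this controller's ID."""
    return [
        r for r in resources
        if r.get("tags", {}).get(TAG_KEY) == controller_id
    ]

def remove_all(resources, env, delete):
    """Delete every resource tagged with GARM_CONTROLLER_ID."""
    controller_id = env.get("GARM_CONTROLLER_ID")
    if not controller_id:
        raise ValueError("GARM_CONTROLLER_ID is not set")
    removed = []
    for resource in owned_resources(resources, controller_id):
        delete(resource)
        removed.append(resource["id"])
    return removed

# Hypothetical inventory: two deployments plus one untagged resource.
cloud = [
    {"id": "vm-1", "tags": {TAG_KEY: "controller-a"}},
    {"id": "vm-2", "tags": {TAG_KEY: "controller-b"}},
    {"id": "vm-3", "tags": {}},
    {"id": "vm-4", "tags": {TAG_KEY: "controller-a"}},
]
gone = []
print(remove_all(cloud, {"GARM_CONTROLLER_ID": "controller-a"}, gone.append))
```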