docs(routing): more general information on http-routing #13

Merged
stephan.lo merged 1 commit from routing-excerpt into main 2024-11-08 12:30:31 +00:00
4 changed files with 121 additions and 0 deletions

@ -55,3 +55,124 @@ if necessary.
DNS solutions like `nip.io` or the already used `localtest.me` mitigate the
need for path based routing
## Excerpt
HTTP is a cornerstone of the internet, in large part because of its
flexibility. Starting with HTTP/1.1, each request carries, among other things,
a path in its request line and a hostname in its `Host` header. While an HTTP
request is sent to a single IP address / server, these two pieces of data allow
(distributed) systems to handle requests in various ways.
```shell
$ curl -v http://google.com/something > /dev/null
* Connected to google.com (2a00:1450:4001:82f::200e) port 80
* using HTTP/1.x
> GET /something HTTP/1.1
> Host: google.com
> User-Agent: curl/8.10.1
> Accept: */*
...
```
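To make the two routing inputs concrete, here is a minimal Python sketch that extracts the path and the `Host` header from the raw bytes of the request shown above. The function name `parse_request_head` is made up for this illustration, and the parser skips most of what a real HTTP server has to handle.

```python
def parse_request_head(raw: bytes):
    """Return (method, path, headers) from the head of an HTTP/1.1 request."""
    # The head ends at the first empty line; any body after it is ignored here.
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    request_line, *header_lines = head.split("\r\n")
    method, path, _version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lower-cased.
        headers[name.strip().lower()] = value.strip()
    return method, path, headers


raw = (
    b"GET /something HTTP/1.1\r\n"
    b"Host: google.com\r\n"
    b"User-Agent: curl/8.10.1\r\n"
    b"Accept: */*\r\n"
    b"\r\n"
)
method, path, headers = parse_request_head(raw)
print(method, path, headers["host"])  # GET /something google.com
```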
### Path-Routing
Imagine requesting `http://myhost.foo/some/file.html`. In a simple setup, the
web server that `myhost.foo` resolves to would serve static files from some
directory, e.g. `/<some_dir>/some/file.html`.
In more complex systems, one might have multiple services that fulfill various
roles, for example a service that generates HTML pages for articles from a CMS
and a service that converts images into various formats. Using path-routing,
both services are available on the same host from the user's point of view.
An article served from `http://myhost.foo/articles/news1.html` would be
generated by the article service and reference an image
`http://myhost.foo/images/pic.jpg`, which in turn is generated by the image
converter service. When a user sends an HTTP request to `myhost.foo`, they hit
a reverse proxy, which forwards the request to some other system based on the
requested path, waits for a response, and subsequently returns that response to
the user.
![Path-Routing Example](../path-routing.png)
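The routing decision such a reverse proxy makes can be sketched in a few lines of Python. The routing table, the upstream URLs, and the name `pick_upstream` are assumptions made for this example; real proxies offer far richer matching rules.

```python
from typing import Optional

# Hypothetical routing table: path prefix -> upstream base URL.
ROUTES = {
    "/articles/": "http://article-service.internal:8080",
    "/images/": "http://image-converter.internal:8080",
}


def pick_upstream(path: str, routes=ROUTES) -> Optional[str]:
    """Return the upstream for the longest matching path prefix, if any."""
    matches = [prefix for prefix in routes if path.startswith(prefix)]
    if not matches:
        return None  # a real proxy would typically answer with 404 here
    # Prefer the most specific (longest) prefix when several match.
    return routes[max(matches, key=len)]


print(pick_upstream("/articles/news1.html"))  # http://article-service.internal:8080
print(pick_upstream("/images/pic.jpg"))       # http://image-converter.internal:8080
print(pick_upstream("/unknown"))              # None
```

Choosing the longest prefix keeps the result independent of the order in which routes are declared.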
Such a setup hides the complexity from the user and allows the creation of
large, distributed, scalable systems that act as a unified entity from the
outside. Since everything is served from the same host, the browser treats all
downstream services as a single host. This allows for easier 'communication'
between services through the browser. For example, cookies can be valid for
the entire host, so authentication data is forwarded to the requested
downstream services without the user having to explicitly re-authenticate.
Furthermore, services 'know' their user-facing location, i.e. their own path
and the paths of other services, as paths are usually set by convention and /
or hard-coded. In practice, this makes configuration of the entire system
somewhat easier, especially if you have various environments for testing,
development, and production. The hostname of the system does not matter, as one
can use hostname-relative URLs, e.g. `/some/service`.
Load balancing is also easy to achieve by running multiple instances of a
service. Most reverse proxies can apply various load balancing strategies when
forwarding traffic to downstream systems.
Problems might arise if downstream systems are not built with path-routing in
mind. Some systems must be served from the root of a domain; see, for example,
the container registry spec.
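The effect of hostname-relative URLs can be checked with Python's standard `urllib.parse.urljoin`, which resolves a link against the URL of the page that contains it. The second base URL below is a made-up staging hostname, used only to show that the same link works unchanged in another environment.

```python
from urllib.parse import urljoin

# A hostname-relative link as it might appear inside an article page.
link = "/images/pic.jpg"

# The same link resolves against whichever host served the page.
print(urljoin("http://myhost.foo/articles/news1.html", link))
# http://myhost.foo/images/pic.jpg
print(urljoin("http://staging.localtest.me/articles/news1.html", link))
# http://staging.localtest.me/images/pic.jpg
```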
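As an illustration of one such strategy, the following sketch implements plain round-robin over a list of service instances. The class name and the instance URLs are hypothetical, and the sketch ignores health checks and weighting, which production proxies usually provide.

```python
from itertools import cycle


class RoundRobin:
    """Hand out service instances in a fixed, repeating order."""

    def __init__(self, instances):
        instances = list(instances)
        if not instances:
            raise ValueError("at least one instance is required")
        self._instances = cycle(instances)

    def next_instance(self):
        return next(self._instances)


balancer = RoundRobin(["http://article-1:8080", "http://article-2:8080"])
picked = [balancer.next_instance() for _ in range(3)]
print(picked)
# ['http://article-1:8080', 'http://article-2:8080', 'http://article-1:8080']
```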
### Hostname-Routing
With hostname-routing, each downstream service in a distributed system is
served from a different host, typically a subdomain, e.g. `serviceA.myhost.foo`
and `serviceB.myhost.foo`. This gives services full control over their
respective host, and even allows them to do path-routing within each system.
Moreover, hostname-routing allows the entire system to create more flexible and
powerful routing schemes in terms of scalability. Intra-system communication
becomes somewhat harder, as the browser treats each subdomain as a separate
host, shielding cookies, for example, from one another.
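The cookie isolation can be illustrated with a deliberately simplified version of the domain matching that browsers apply. The function `cookie_is_sent` is an invention for this sketch; it ignores paths, the `Secure` flag, IP-address hosts, and public-suffix checks. It also shows the usual way around the isolation: a cookie set with a `Domain` attribute naming the parent domain is shared with sibling subdomains.

```python
from typing import Optional


def cookie_is_sent(request_host: str, set_by_host: str,
                   domain_attr: Optional[str] = None) -> bool:
    """Simplified check whether a cookie accompanies a request to request_host."""
    request_host = request_host.lower()
    if domain_attr is None:
        # Host-only cookie: sent back only to the exact host that set it.
        return request_host == set_by_host.lower()
    domain = domain_attr.lstrip(".").lower()
    # Domain cookie: sent to the named domain and all of its subdomains.
    return request_host == domain or request_host.endswith("." + domain)


# Without a Domain attribute, serviceB never sees serviceA's cookie.
print(cookie_is_sent("serviceB.myhost.foo", "serviceA.myhost.foo"))  # False
# With Domain=myhost.foo, the cookie is shared across subdomains.
print(cookie_is_sent("serviceB.myhost.foo", "serviceA.myhost.foo", "myhost.foo"))  # True
```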
Each host that serves a service requires a DNS entry that has to be published
to the clients (by some DNS server). Depending on the environment, this can
become quite tedious, as DNS resolution on the internet and on intranets may
differ. This applies to intra-cluster communication as well, as seen with the
idpbuilder's platform. In this case, external DNS resolution has to be
replicated within the cluster to be able to use the same URLs to address, for
example, gitea.
The following example depicts DNS-only routing. By defining separate DNS
entries for each service / subdomain, requests are resolved to the respective
servers. In theory, no additional infrastructure is necessary to route user
traffic to each service. However, as services are completely separated, other
infrastructure such as authentication may have to be duplicated.
![DNS-only routing](../hostname-routing.png)
When using hostname-based routing, one does not have to set different IPs for
each hostname. Instead, having multiple DNS entries point to the same set of
IPs allows re-using existing infrastructure. As shown below, a reverse proxy is
able to forward requests to downstream services based on the `Host` request
header. This way, requests for a specific hostname can be forwarded to a
defined service.
![Hostname Proxy](../hostname-routing-proxy.png)
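A proxy's lookup by `Host` header could look roughly like the following sketch. The table `VIRTUAL_HOSTS`, the upstream addresses, and both function names are assumptions for illustration only.

```python
from typing import Optional

# Hypothetical table: hostname (lower-case) -> upstream base URL.
VIRTUAL_HOSTS = {
    "servicea.myhost.foo": "http://10.0.0.11:8080",
    "serviceb.myhost.foo": "http://10.0.0.12:8080",
}


def normalise_host(host_header: str) -> str:
    """Lower-case the Host value and drop an optional ':port' suffix."""
    # IPv6 literals such as '[::1]:8080' are not handled in this sketch.
    return host_header.strip().lower().partition(":")[0]


def pick_by_host(host_header: str, table=VIRTUAL_HOSTS) -> Optional[str]:
    """Return the upstream configured for the requested hostname, if any."""
    return table.get(normalise_host(host_header))


print(pick_by_host("serviceA.myhost.foo"))     # http://10.0.0.11:8080
print(pick_by_host("serviceb.myhost.foo:80"))  # http://10.0.0.12:8080
print(pick_by_host("other.myhost.foo"))        # None
```

Hostnames are compared case-insensitively, which is why the table keys are stored in lower case.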
At the same time, one could imagine a multi-tenant system that differentiates
customer systems by name, e.g. `tenant-1.cool.system` and
`tenant-2.cool.system`. Configured as a wildcard-style domain, `*.cool.system`
could point to a reverse proxy that forwards requests to a tenant's instance of
a system, allowing re-use of central infrastructure while still hosting
separate systems per tenant.
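Behind such a wildcard entry, the proxy has to derive the tenant from the requested hostname. A minimal sketch, assuming tenants are identified by exactly one label in front of `cool.system` and that each tenant's instance is reachable under a made-up internal name:

```python
from typing import Optional

BASE_DOMAIN = "cool.system"


def tenant_for(host_header: str) -> Optional[str]:
    """Extract the tenant label from '<tenant>.cool.system', else None."""
    host = host_header.strip().lower().partition(":")[0]
    suffix = "." + BASE_DOMAIN
    if not host.endswith(suffix):
        return None
    tenant = host[: -len(suffix)]
    # Assumption of this sketch: a tenant is a single, non-empty label.
    return tenant if tenant and "." not in tenant else None


def upstream_for(host_header: str) -> Optional[str]:
    """Map a tenant hostname to its (hypothetical) internal instance."""
    tenant = tenant_for(host_header)
    return f"http://{tenant}.tenants.internal:8080" if tenant else None


print(tenant_for("tenant-1.cool.system"))    # tenant-1
print(upstream_for("tenant-2.cool.system"))  # http://tenant-2.tenants.internal:8080
print(tenant_for("cool.system"))             # None
```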
The implicit dependency on DNS resolution generally makes this kind of routing
more complex and error-prone, as not everyone is able or permitted to change
DNS server entries. Also, local changes to your `/etc/hosts` file are a
constant pain and should be seen as a dirty hack. As mentioned above, wildcard
DNS services like `nip.io` are often helpful in this case.
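Services like `nip.io` avoid manual DNS entries by encoding the target IP address in the hostname itself, e.g. `gitea.127.0.0.1.nip.io` resolves to `127.0.0.1`. The sketch below mimics that mapping offline for the dotted IPv4 form only; other notations the service may accept are not covered, and `embedded_ip` is a name chosen for this example.

```python
import ipaddress
import re
from typing import Optional

# Matches '<anything>.<a>.<b>.<c>.<d>.nip.io' or '<a>.<b>.<c>.<d>.nip.io'.
_NIP_IO = re.compile(r"(?:^|\.)(\d{1,3}(?:\.\d{1,3}){3})\.nip\.io$")


def embedded_ip(hostname: str) -> Optional[str]:
    """Return the IPv4 address embedded in a nip.io-style hostname, if any."""
    match = _NIP_IO.search(hostname.lower())
    if match is None:
        return None
    try:
        # Validate the octets; values such as 999 are rejected here.
        return str(ipaddress.IPv4Address(match.group(1)))
    except ipaddress.AddressValueError:
        return None


print(embedded_ip("gitea.127.0.0.1.nip.io"))  # 127.0.0.1
print(embedded_ip("10.0.0.1.nip.io"))         # 10.0.0.1
print(embedded_ip("gitea.nip.io"))            # None
```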
### Conclusion
Path-based and hostname-based routing are the two most common methods of HTTP
traffic routing. They can be used separately, but more often they are used in
conjunction. Due to HTTP's versatility, other forms of HTTP routing, for
example based on the `Content-Type` header, are also in use.
