Thursday, September 13, 2007

Apache Is My Service Hub

OK, so I nicked the idea for this post from mnot's Squid is My Service Bus and tried to extend the concept based on ways I've used Apache HTTPD in the past.

In general, I'm a big fan of Apache HTTPD. It's immensely powerful and has been my "Swiss Army Knife" of the web for some time now: proxying this, caching that, dynamically serving things, all the while never once becoming the primary constraint in any distributed system. That's saying a lot for a perimeter service responsible for all ingress traffic.

It occurred to me in discussions with Mike and Patrick that Apache's utility makes it an ideal Service Bus of sorts. Service Hub might be a better term. Most of Apache's default functionality is geared towards moving resource requests around and not protocol mediation (a key criterion of a bus). However, nothing stops the Apache developer from making the server more bus-like if the requirements so dictate.

The idea goes like this:

  • Service consumers, RESTful preferably (although SOAP will work too), desire to perform operations on a resource. The consumer has a client that is smart - it knows that it can rely on finding one or more Service Hubs in the local vicinity, but doesn't care exactly where that hub is.
  • The client performs some measure of discovery: a DNS lookup backed by BIND zones, a Zeroconf search for the Service Hub, a broadcast search, anything that lets the client discover, at runtime, its friendly local Service Hub. The important part is that all clients search for their local Service Hub in the same way, and in a manner that lets us have one or twenty hubs depending on our needs. We can scale up or down and the client behavior doesn't change.
  • When the service client finds a hub, it makes a request of that hub. Next time around, it might use a different hub, or it might hold a keep-alive for the HTTP connection to that hub.
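The discovery steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hub name `servicehub.example.internal`, the port, and the function names are all made up for the example, and plain DNS address lookup stands in for whichever discovery mechanism (Zeroconf, broadcast, and so on) you actually choose. The point is that the client asks "who are my hubs?" the same way every time and then picks one, so the number of hubs can change without touching the client.

```python
import random
import socket
from typing import Callable, List, Optional, Tuple

Hub = Tuple[str, int]  # (address, port)

# Hypothetical well-known name and port; neither comes from the post.
HUB_NAME = "servicehub.example.internal"
HUB_PORT = 80


def dns_resolver(name: str, port: int) -> List[Hub]:
    """Return every address DNS publishes for the hub name."""
    infos = socket.getaddrinfo(name, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr
    # starts with (address, port) for both IPv4 and IPv6.
    return sorted({(info[4][0], info[4][1]) for info in infos})


def pick_hub(resolver: Callable[[str, int], List[Hub]] = dns_resolver,
             name: str = HUB_NAME,
             port: int = HUB_PORT,
             rng: Optional[random.Random] = None) -> Hub:
    """Discover the local hubs and choose one at random."""
    hubs = resolver(name, port)
    if not hubs:
        raise LookupError(f"no Service Hub found for {name!r}")
    return (rng or random).choice(hubs)
```

Because the resolver is just a callable, swapping DNS for a Zeroconf or broadcast lookup means passing a different function to `pick_hub`; the rest of the client stays the same.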


So, what makes Apache particularly well suited for this role?
  1. Apache HTTPD is ubiquitous - Apache is capable of acting as an origin, a proxy or a gateway for resources, all in one set of configuration, while being 99% HTTP/1.1 compliant. It runs on all Unixes I know of, Windows and other platforms. I generally don't think of Apache as middleware, but if you abstract Apache up a layer and treat it as a RESTful gateway, it does fill a role traditionally reserved for middleware.
  2. Apache HTTPD can reverse proxy - Apache has the added advantage that it can perform crude load balancing in a reverse proxy configuration. This means service clients can consume HTTP resources without knowing whether they are talking to one or twenty back-end origin servers. Sure, F5s have been doing this for some time now, but try installing thirty F5s in a large integration, one for each broadcast domain, and see what it does for your budget. Additionally, Apache can reverse proxy for loads of protocols out of the box: HTTP(S), FTP and AJP open worlds of opportunities for the systems integrator.
  3. Performance - Apache HTTPD builds on the Apache Portable Runtime (APR), which is a fairly low-level abstraction on top of most OS system calls. This gives it tremendous performance benefits (like using scalable non-blocking I/O à la epoll on Linux) while still remaining relatively platform-agnostic.
  4. Extensible - Not long ago, writing Apache modules was a poorly-documented black art. That is improving substantially with books and better reference documentation. But writing a module isn't necessary at all. With dynamic language alternatives like Python and Ruby that can run in-process with Apache workers, you can extend Apache as a Service Hub however you like, without writing C.
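To make the last point concrete, here is a rough sketch of hub logic written in Python rather than C. It assumes the code is hosted in-process by something like mod_wsgi, which is my choice for the example and not something the points above depend on. The routing table, the addresses and the function names are hypothetical. To keep it short, the hub answers with a redirect to the chosen origin instead of proxying the request itself, and it rotates through the origins in the same crude round-robin spirit as point 2.

```python
from itertools import cycle
from typing import Dict, Iterator, List, Optional

# Hypothetical routing table: resource prefix -> back-end origin servers.
BACKENDS: Dict[str, List[str]] = {
    "/orders": ["http://10.0.1.10:8080", "http://10.0.1.11:8080"],
    "/customers": ["http://10.0.2.10:8080"],
}

# One endless round-robin iterator per prefix.
_rotations: Dict[str, Iterator[str]] = {
    prefix: cycle(origins) for prefix, origins in BACKENDS.items()
}


def choose_backend(path: str) -> Optional[str]:
    """Return the next origin for the longest matching prefix, or None."""
    matches = [p for p in BACKENDS if path == p or path.startswith(p + "/")]
    if not matches:
        return None
    return next(_rotations[max(matches, key=len)])


def application(environ, start_response):
    """WSGI entry point: send the client to the chosen origin."""
    path = environ.get("PATH_INFO", "/")
    backend = choose_backend(path)
    if backend is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"no service registered for this resource\n"]
    # Redirecting (rather than proxying) is a simplification for this sketch.
    start_response("307 Temporary Redirect", [("Location", backend + path)])
    return [b""]
```

With this in place, a request for `/orders/42` would be sent to the first orders origin and the next one to the second, while anything outside the table gets a 404.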


I don't expect to see wide adoption of this type of approach any time soon. If RESTful architectures catch on and people start to see the web as a series of resources, Apache HTTPD's role will continue to grow. It's high time we stopped looking at Apache HTTPD as just a web server and started viewing it as an HTTP Gateway.
