December 28, 2002

Emulating a load balancer with Apache

I spent the past week trying to emulate certain aspects of a hardware load balancer using Apache. It wasn't actually load balancing I was interested in, but the ability of load balancers to pick URLs apart, and redirect the client to different servers depending on the contents of the URL.

At Georgia Tech, we're working with a large Java-based dynamic web application (Campus Pipeline's Luminis portal software). From our point of view, one problem with this application is that it's monolithic: we can't scale it except by buying larger boxes, and that single server is also a single point of failure. To alleviate this problem, we'd like to front the Java server with static web servers, which could handle the bits that don't have to be generated dynamically. We're doing this without the cooperation of the central application, so the easiest approach is URL inspection by a device sitting between the client and that server. URLs that indicate dynamic requests get passed on to the Java-backed web server, while URLs that can be served statically are redirected to static web servers. It's something hardware load balancers do very well, but I'm working at a state-funded university, and state budgets aren't too good now. Perhaps we'll get what we need in the new year, but right now, I needed to prototype something that would show that the application would even work under these circumstances.
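To make the URL inspection idea concrete, here is a minimal Python sketch of the routing decision. It is only an illustration, not the configuration I actually used: the host names and the list of "static" file suffixes are hypothetical, and a real setup would need rules matched to the application's own URL layout.

```python
from urllib.parse import urlsplit

# Hypothetical values, chosen purely for illustration.
STATIC_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".html")
STATIC_BACKEND = "static.example.edu"
DYNAMIC_BACKEND = "portal.example.edu"


def pick_backend(url):
    """Return the host that should serve this URL.

    Only the path is inspected; the query string is ignored so that a
    dynamic request such as /login?next=logo.gif is not mistaken for a
    static file.
    """
    path = urlsplit(url).path.lower()
    if path.endswith(STATIC_SUFFIXES):
        return STATIC_BACKEND
    return DYNAMIC_BACKEND
```

Under these assumptions, a request for `/images/logo.gif` would go to the static server, while anything else, including a servlet-style path with a query string, would fall through to the Java-backed server.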

In any case, I was able to do the URL redirection tricks I needed using Apache's mod_rewrite module. (I'll write that up another time.) But I had another problem: the central application serves some things via HTTPS/SSL. Since those are a small percentage of the bits served by the application, I didn't need to pick them apart, but I did need to be able to pass them on to the Java-based server.

I had a devil of a time figuring out how to redirect HTTPS/SSL connections. The mod_rewrite approach doesn't work, because mod_rewrite works by examining each HTTP request and changing it or forwarding it. Once an HTTPS connection is set up, a machine in the middle can't examine the requests: they're encrypted inside an SSL connection, which is the whole point of HTTPS.

In the end, what I needed was a port-forwarder. A port-forwarder accepts connections on a TCP port on one machine and passes the traffic off to a TCP port on another machine. In this case, I needed to take everything coming into port 443 on my pseudo load balancer and pass it on to the same port on the application server.
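For readers who want to see what a port-forwarder boils down to, here is a minimal Python sketch. It is not how portfwd or any other particular tool is implemented; the function names are my own, and it leaves out the logging, connection limits, and error reporting a real forwarder would need. It simply accepts a TCP connection, opens a second connection to the target, and copies bytes in both directions without looking inside them, which is why it works for SSL traffic.

```python
import socket
import threading


def pipe(src, dst):
    """Copy bytes from src to dst until src closes, then half-close dst."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass  # the other side went away; just stop copying
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def handle(client, target_host, target_port):
    """Relay one client connection to the target, in both directions."""
    try:
        upstream = socket.create_connection((target_host, target_port))
    except OSError:
        client.close()
        return
    t = threading.Thread(target=pipe, args=(client, upstream), daemon=True)
    t.start()
    pipe(upstream, client)
    t.join()
    client.close()
    upstream.close()


def start_forwarder(listen_host, listen_port, target_host, target_port):
    """Listen on one address and forward each connection to the target.

    Returns the listening socket; accepting happens in a background thread.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((listen_host, listen_port))
    listener.listen()

    def accept_loop():
        while True:
            try:
                client, _ = listener.accept()
            except OSError:
                break  # listener was closed
            threading.Thread(
                target=handle,
                args=(client, target_host, target_port),
                daemon=True,
            ).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return listener
```

In my situation the call would look something like `start_forwarder("0.0.0.0", 443, "appserver.example.edu", 443)`, where the host name is a placeholder. Binding to port 443 normally requires root privileges on Linux, and the calling program has to stay alive, since the worker threads are daemons.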

I was building all of this under Linux, and Linux has very strong facilities for routing and forwarding TCP/IP, so I thought that would be the way to go. I spent many hours chasing that mirage. In the end, I was convinced that it would be easier to get a Ph.D. in physics than it would be to figure out all the details of Linux routing.

For a time, I was convinced that I could forward HTTPS connections using Apache's proxying facilities. I found an intriguing note that suggested it should be possible. That might well work, but I wasn't able to figure it out.

In the end, I settled on an open source package called portfwd. It works on Linux, and claims to work on Solaris as well. portfwd does exactly what it says it does: given a simple config file, it forwards all packets arriving at one port on to another port, much like a wormhole out of a Star Trek show.

With portfwd in place, my SSL connections were quickly sped on to the application server, and all was well.

In the end, through a lot of dead ends, I was able to get what I wanted done. I have a feeling that it would have been a lot easier with a piece of hardware. If we can't get a new hardware load balancer in the budget, perhaps we'll take up a collection and get one on eBay.