In my travels at work, I recently came across an interesting situation using the Apache Commons HTTPClient library. For a project, I'm using the Commons HTTPClient to open a TCP-based tunnel through a web-proxy, because my employer forces all outgoing HTTP traffic through one. In other words, when I need the HTTPClient to connect to a secure web-site outside of the corporate firewall with HTTPS, it must first open a TCP tunnel through my employer's web-proxy.
In doing so, I see a ton of these casual INFO messages from the HTTPClient library in my log files:
INFO: Response content length is not known
Apr 10, 2009 1:12:26 AM org.apache.commons.httpclient.HttpMethodBase readResponseBody
INFO: Response content length is not known
Apr 10, 2009 1:12:29 AM org.apache.commons.httpclient.HttpMethodBase readResponseBody
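It looks like the message is coming from the readResponseBody() method of the HttpMethodBase class. What gives, man? For reference, here's roughly how opening a tunnel through a web-proxy is supposed to go, with my notes on where my proxy differs: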
:: browser sends a CONNECT request to the configured web-proxy
CONNECT server.example.com:443 HTTP/1.1
User-Agent: Mozilla/5.0
:: web-proxy establishes TCP tunnel to server.example.com:443
:: web-proxy returns Connection established to browser
HTTP/1.1 200 Connection established
Content-Length: 0
Connection: Keep-Alive
:: the Content-Length header above is missing in the
:: actual response. Plus, the proxy is returning a
:: Connection: Keep-Alive header which supposedly violates
:: the protocol here.
:: browser starts sending data through tunnel to secure server
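To make the exchange above a bit more concrete, here's a minimal Java sketch that builds the CONNECT request and picks apart the proxy's reply. It uses only the JDK, and the class and method names (ConnectExchangeSketch, buildConnectRequest, parseStatusCode, parseHeaders) are made up for illustration; none of this is Commons HTTPClient code. The Host header is my addition, since HTTP/1.1 requests are supposed to carry one.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Commons HTTPClient.
class ConnectExchangeSketch {

    // Builds the request head a client sends to the proxy to ask for a tunnel.
    static String buildConnectRequest(String host, int port) {
        String target = host + ":" + port;
        return "CONNECT " + target + " HTTP/1.1\r\n"
             + "Host: " + target + "\r\n"
             + "\r\n";
    }

    // Pulls the status code out of a status line such as
    // "HTTP/1.1 200 Connection established".
    static int parseStatusCode(String responseHead) {
        String statusLine = responseHead.split("\r\n", 2)[0];
        String[] parts = statusLine.split(" ", 3);
        return Integer.parseInt(parts[1]);
    }

    // Collects the response headers into a map keyed by lower-cased name.
    static Map<String, String> parseHeaders(String responseHead) {
        Map<String, String> headers = new LinkedHashMap<>();
        String[] lines = responseHead.split("\r\n");
        for (int i = 1; i < lines.length; i++) {   // line 0 is the status line
            if (lines[i].isEmpty()) {
                break;                             // blank line ends the head
            }
            int colon = lines[i].indexOf(':');
            if (colon <= 0) {
                continue;                          // skip malformed lines
            }
            String name = lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = lines[i].substring(colon + 1).trim();
            headers.put(name, value);
        }
        return headers;
    }

    public static void main(String[] args) {
        System.out.print(buildConnectRequest("server.example.com", 443));

        // What my proxy actually sends back: no Content-Length at all.
        String reply = "HTTP/1.1 200 Connection established\r\n"
                     + "Connection: Keep-Alive\r\n"
                     + "\r\n";
        System.out.println("status = " + parseStatusCode(reply));
        System.out.println("has Content-Length = "
                + parseHeaders(reply).containsKey("content-length"));
    }
}
```

Feeding it the reply my proxy sends shows a 200 status with a Connection header but no Content-Length header in the map.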
So, the problem here appears to be a missing "Content-Length" header and a possibly invalid "Connection: Keep-Alive" header on the HTTP/1.1 response. The corporate web-proxy I'm using does NOT return a Content-Length header when it establishes a connection. My guess is that when the HTTPClient processes the response (tries to read the response body), it doesn't know how many bytes to read, because the Content-Length header is missing. Or, it sees the Connection: Keep-Alive header (perhaps expecting a Connection: close instead) and kinda freaks out.
In this case, it's possible that one of two things is happening:
- The web-proxy is violating HTTP by not returning a Content-Length header on the response when it should. However, I couldn't find any official specs or documentation on opening TCP tunnels through a web-proxy other than this draft spec dated August 1998. This spec does not state whether an HTTP/1.1 200 Connection established response must include a Content-Length header.
- The HTTPClient is wrong, and lazily checks every response for a Content-Length header, even one following a successful HTTP/1.1 200 Connection established. If the Content-Length header doesn't exist, then it logs the message.
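To spell out the second possibility, here's a small Java sketch of the kind of check I'm imagining. To be clear, this is my guess written as code, NOT the HTTPClient's actual source: the class name ContentLengthCheckSketch and the method wouldLogUnknownLength are invented, and I'm assuming the headers arrive in a map keyed by lower-cased name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of my guess; NOT the HTTPClient's real logic.
class ContentLengthCheckSketch {

    // Returns true when a client behaving the way I suspect would log
    // "Response content length is not known".
    // Assumes header names in the map are already lower-cased.
    static boolean wouldLogUnknownLength(Map<String, String> headers) {
        if (headers.containsKey("content-length")) {
            return false;                      // length is known, nothing to log
        }
        // No length given: only a "Connection: close" tells the client
        // that the end of the body is marked by the connection closing.
        String connection = headers.get("connection");
        return !"close".equalsIgnoreCase(connection);
    }

    public static void main(String[] args) {
        // The reply from my proxy: Keep-Alive, no Content-Length.
        Map<String, String> fromMyProxy = new HashMap<>();
        fromMyProxy.put("connection", "Keep-Alive");
        System.out.println(wouldLogUnknownLength(fromMyProxy));

        // The reply as I'd expect it: Content-Length: 0 is present.
        Map<String, String> expected = new HashMap<>();
        expected.put("content-length", "0");
        expected.put("connection", "Keep-Alive");
        System.out.println(wouldLogUnknownLength(expected));
    }
}
```

If the real check looks anything like this, either a Content-Length: 0 or a Connection: close from the proxy would be enough to make the message go away.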
That's about all I know about this problem, and I'm not immediately sure who is wrong: the proxy, or the HTTPClient? If you know more about this than I do, please let me know.
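In the meantime, if the messages are just noise in your logs, one option is to raise the log level for that one class. The sketch below makes two assumptions: that the HTTPClient's logging ends up in java.util.logging (the timestamp format of my log lines suggests it does in my setup), and that the logger is named after the class, org.apache.commons.httpclient.HttpMethodBase. The class name QuietHttpClientLogging is made up. If your setup routes logging through log4j or something else, this won't have any effect.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper; assumes logging is routed to java.util.logging
// and that the logger name matches the HttpMethodBase class name.
class QuietHttpClientLogging {

    // Keep a strong reference: java.util.logging only holds loggers weakly,
    // so an unreferenced logger can be collected and lose its level setting.
    static final Logger HTTP_METHOD_LOG =
            Logger.getLogger("org.apache.commons.httpclient.HttpMethodBase");

    // Raises the threshold so INFO messages from this logger are dropped,
    // while WARNING and above still get through.
    static void quiet() {
        HTTP_METHOD_LOG.setLevel(Level.WARNING);
    }

    public static void main(String[] args) {
        quiet();
        System.out.println("INFO enabled: " + HTTP_METHOD_LOG.isLoggable(Level.INFO));
        System.out.println("WARNING enabled: " + HTTP_METHOD_LOG.isLoggable(Level.WARNING));
    }
}
```

Call quiet() once at startup, before the HTTPClient makes its first request. This only hides the symptom, of course; it doesn't settle who is actually wrong.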
Cheers.

