Oct 21 2009 Wed
5:28 pm PHT
Warning: This is an ultra-geeky article that I am writing down for reference, and also so that others having the same (maybe rare) problem can find it online. You can stop reading now if you want to.
The Apache web server modules mod_gzip (or mod_deflate), mod_rewrite, and mod_include apparently don’t interact very well. Specifically, mod_gzip returns to the web browser an incorrectly compressed file when that file is constructed using includes (via mod_include) with virtual URLs rewritten using mod_rewrite. The apparent workaround is to avoid rewriting include URLs or to disable mod_gzip.
As background, mod_gzip is the Apache web server module (i.e. plugin) that handles gzip compression so that text and HTML files can be served to the browser compressed, saving network bandwidth (at the cost of processing time on both the server and browser sides). mod_include is a popular module for including common, changing, or conditional content in served documents. This enables things like a common footer in all HTML pages that can be changed by modifying just one file. mod_rewrite is a must if you want to have pretty (and SEO-friendly) URLs: it maps the URL namespace to the filesystem using various powerful options.
In mod_include, the recommended way to include files is the include element with its “virtual” attribute. Whenever the server encounters this element in a document to be served, it issues a “subrequest”, as if it were the browser, for the URL given in the attribute. The content returned by that subrequest is inserted into the original document in place of the element, and the whole thing is then served to the actual web browser. Since these included files are retrieved via an Apache subrequest, the specified URL can in principle be rewritten using mod_rewrite. This actually works, except when mod_gzip is enabled. In that case the served document is compressed incorrectly, leading to mysterious “Content Encoding Error” messages in browsers.
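To make the include mechanism concrete, here is a toy Python model of it. It is only a sketch under loud assumptions and not how Apache is actually implemented: the names INCLUDE_RE, REWRITES, FILES, subrequest, and expand_includes, as well as the URLs /inc/footer and /includes/footer.html, are all made up for illustration.

```python
import re

# Matches an SSI directive such as <!--#include virtual="/inc/footer" -->
INCLUDE_RE = re.compile(r'<!--#include\s+virtual="([^"]+)"\s*-->')

# Hypothetical rewrite table standing in for mod_rewrite rules.
REWRITES = {"/inc/footer": "/includes/footer.html"}

# Hypothetical in-memory "filesystem" standing in for files on disk.
FILES = {"/includes/footer.html": "<p>Footer</p>"}


def subrequest(url):
    """Resolve a URL roughly the way a very simplified subrequest might."""
    path = REWRITES.get(url, url)  # apply the rewrite, if one exists
    return FILES[path]


def expand_includes(document):
    """Replace each virtual include with the body its subrequest returns."""
    return INCLUDE_RE.sub(lambda m: subrequest(m.group(1)), document)


page = '<body><h1>Hi</h1><!--#include virtual="/inc/footer" --></body>'
expanded = expand_includes(page)
print(expanded)
```

In this model the pretty URL /inc/footer is only resolved because the subrequest goes through the rewrite table first, which mirrors the combination of rewriting and includes described above. The compression step, where the real problem shows up, is not modeled here.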
I am currently developing a website and ran into this problem, which confounded me for two days. I first thought it had to do with my forcing all HTML content to be served as UTF-8, but disabling that did not solve the problem. I next tried disabling gzip compression on the browser side (in both Firefox and a web app), and the pages were retrieved correctly. However, I didn’t want to disable gzip on the server side, so I looked for another solution. I tried removing the URL rewriting for included files and the problem disappeared! I consider the loss of rewriting (for included files only) a very minor cost compared to losing compression. mod_gzip is also enabled by default on my host, Dreamhost, which I take to mean they prefer bandwidth savings to CPU time savings. So include URL rewriting had to go. I only encountered this problem on the live web server and not on my development setup (using XAMPP) because mod_gzip apparently wasn’t enabled there.
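The browser-side check described above can also be scripted. The following Python sketch fetches a page while asking for gzip and then tests whether the returned body really is a valid gzip stream. The function names fetch_raw and gzip_body_is_valid are hypothetical, and the "bad" sample only simulates one possible kind of corruption (uncompressed text where the gzip trailer should be); exactly how the server mangled the real responses is not known.

```python
import gzip
import urllib.request
import zlib


def fetch_raw(url):
    """Fetch `url` asking for gzip; return (Content-Encoding, raw body).

    urllib does not decompress responses itself, so the body is returned
    exactly as the server sent it.
    """
    req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(req) as resp:
        return resp.headers.get("Content-Encoding"), resp.read()


def gzip_body_is_valid(body):
    """Return True if `body` decompresses cleanly as a gzip stream."""
    try:
        gzip.decompress(body)
        return True
    except (OSError, EOFError, zlib.error):
        return False


# Simulated samples, so the check can be tried without a server.
good = gzip.compress(b"<html>hello</html>")
# Assumption: replace the 8-byte gzip trailer with uncompressed text.
bad = good[:-8] + b"<p>uncompressed footer</p>"

print(gzip_body_is_valid(good))
print(gzip_body_is_valid(bad))
```

Against a live site, one would call fetch_raw on a page that uses includes and pass the body to gzip_body_is_valid whenever the reported Content-Encoding is gzip. A False result there would correspond to the “Content Encoding Error” a browser shows.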
I tried researching the issue on the web but failed to find information on my exact problem. However, I found forum postings and bug reports surrounding Apache subrequest handling and mod_gzip. It turns out that Apache’s subrequest mechanism is quite intricate, so it shouldn’t be surprising that various modules can have problems in this way. It’s like the Apache equivalent of unforeseen multiple drug interactions. Since I couldn’t be bothered to file a bug report (and rewriting include URLs seems counterproductive in hindsight), I just went with my workaround.
Hopefully this blog post can help other poor web developers confounded by the same problem.