vaes9

mod_gzip, mod_include, and mod_rewrite Don’t Mix

5:28 pm PHT

Warning: This is an ultra-geeky article that I am writing down for reference, and also so that others having the same (maybe rare) problem can find it online. You can stop reading now if you want to.  :-)

The Apache web server modules mod_gzip (or mod_deflate), mod_rewrite, and mod_include apparently don’t interact very well. Specifically, mod_gzip returns to the web browser an incorrectly compressed file when that file is constructed using includes (via mod_include) with virtual URLs rewritten using mod_rewrite. The apparent workaround is to avoid rewriting include URLs or to disable mod_gzip.

As background, mod_gzip is the Apache web server module (i.e. plugin) that handles gzip compression so that text and HTML files can be served to the browser compressed, saving network bandwidth (at the cost of processing time on both the server and browser sides). mod_include is a popular module for including common, changing, or conditional content into served documents. This enables things such as a common footer in all HTML pages that can be changed by just modifying one file. mod_rewrite is a must if you want to have pretty (and SEO-friendly) URLs; it maps the URL namespace to the filesystem using various powerful options.

In mod_include, the recommended way to include files is via the “virtual” attribute of the include element. Whenever the server encounters this element in a document to be served, the server issues a “subrequest”, as if it were the browser, using the URL specified in the element. The file returned by that subrequest is inserted into the original document in place of the include element, and the whole thing is then served to the actual web browser. Since these included files are retrieved via an Apache subrequest, the specified URL can theoretically be rewritten using mod_rewrite. This actually works, except when mod_gzip is enabled. What happens is that the served document is compressed wrongly, leading to mysterious “Content Encoding Error” messages in browsers.
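
As a rough illustration of the kind of setup described above, a configuration along these lines combines an include with a rewritten URL. This is only a sketch: the paths (/inc/footer, /includes/footer.html) and the .shtml extension are hypothetical, and the directives assume Apache 2.x syntax in an .htaccess file.

```apache
# .htaccess (sketch; all paths and extensions are hypothetical)

# Let mod_include parse .shtml files for SSI directives.
Options +Includes
AddOutputFilter INCLUDES .shtml

# Map a "pretty" URL onto the real file with mod_rewrite.
RewriteEngine On
RewriteRule ^inc/footer$ /includes/footer.html [L]

# A page would then pull in the footer through the pretty URL:
#   <!--#include virtual="/inc/footer" -->
# Apache resolves the include with a subrequest for /inc/footer,
# and mod_rewrite maps that subrequest to /includes/footer.html
# before the result is spliced into the page.
```

With compression turned off, a page built this way is served normally; the trouble only shows up once the assembled page is also passed through mod_gzip.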

I am currently developing a website and ran into this problem, which confounded me for two days. I thought the problem had to do with my forcing all HTML content to be served as UTF-8, but disabling that did not solve it. I next tried disabling gzip compression on the browser side (in both Firefox and a web app) and the pages were retrieved correctly. However, I didn’t want to disable gzip on the server side, so I looked for another solution. I tried removing the URL rewriting for included files and the problem disappeared! I consider the loss of rewriting (for included files only) a very minor cost compared to losing compression. And since mod_gzip is enabled by default on my host, Dreamhost, I trust that they prefer bandwidth savings to CPU time savings. So include URL rewriting had to go. I only encountered this problem on the live web server and not on my development setup (using XAMPP) because mod_gzip apparently wasn’t enabled there.
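
For reference, the two workarounds mentioned in this post could look roughly like the following. Treat it as a sketch under assumptions: the path and the .shtml pattern are hypothetical, and whether a given host allows these directives in an .htaccess file is something to check rather than take for granted.

```apache
# Sketch only; the path and the .shtml pattern are hypothetical.

# Workaround 1 (the one used here): drop the rewrite rule for
# includes and point the SSI directive straight at the real file:
#   <!--#include virtual="/includes/footer.html" -->

# Workaround 2: keep the rewriting but switch compression off.
# With mod_gzip, the module's on/off switch is:
<IfModule mod_gzip.c>
    mod_gzip_on No
</IfModule>

# With mod_deflate, setting the no-gzip environment variable for
# the affected pages has a similar effect:
<IfModule mod_deflate.c>
    SetEnvIfNoCase Request_URI \.shtml$ no-gzip dont-vary
</IfModule>
```

The first option costs nothing but pretty URLs that no visitor ever sees, which is why it is the cheaper of the two.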

I tried researching my problem on the web but I failed to find information on my exact problem. However, I found forum postings and bug reports surrounding Apache subrequest handling and mod_gzip. It turns out that Apache’s subrequest mechanism is quite intricate and so it shouldn’t be surprising that various modules can have problems in this way. It’s like the Apache equivalent of unforeseen multiple drug interactions. Since I couldn’t be bothered to file a bug report (and rewriting include URLs seems counterproductive in hindsight), I just went with my workaround.

Hopefully this blog posting can help other poor web developers confounded by the same problem.  :-)


Comments

Comment times are in Philippine time (+0800).

1

On 10:47 p.m., 21 Oct 2009, markku wrote:

Quite the contrary: Dreamhost would rather that you waste bandwidth instead of CPU cycles. They have the habit of disabling accounts that run an un-optimized WordPress that eats so much CPU when traffic spikes. This is exactly why I moved out of DH and got a VPS instead.  ;)

On mod_include, it is good practice to use local paths to save the server from more work.

If you were on your own server, I’d recommend you use Nginx instead of Apache to maximize server resources.

So what’s this project you’re working on?  =)

2

On 2:20 p.m., 24 Oct 2009, seav wrote:

Hey Markku! I ended up researching Dreamhost’s CPU minutes. They apparently lifted the automatic suspension a few years ago, but I enabled the CPU usage monitoring just the same and I’ll be monitoring it if the site gets popular.  :-)

I’ll tell you about it sometime in November.  ;-)
