Monday, December 27, 2004

Anchors and Cruft-Free URLs

I’m trying to clean up some URLs using mod_rewrite. I have .shtml files on the server, but I want browsers to be able to request them without the file extension. If a browser does request a URL ending in .shtml, it should be redirected to the same URL without the extension. I have this much working; here are the contents of the .htaccess file:

RewriteEngine On
RewriteBase /

# remove .shtml; use THE_REQUEST to prevent infinite loops
RewriteCond %{THE_REQUEST} ^GET\ (.*)\.shtml\ HTTP
RewriteRule ^(.*)\.shtml$ $1 [R=301,L]

# remove index
RewriteRule ^(.*)/index$ $1/ [R=301,L]

# remove slash if not directory
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} /$
RewriteRule ^(.*)/$ $1 [R=301,L]

# add .shtml to access file, but don't redirect
RewriteCond %{REQUEST_FILENAME}.shtml -f
RewriteCond %{REQUEST_URI} !/$
RewriteRule ^(.*)$ $1.shtml [L]
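
To make the intent of the redirect rules easier to follow, here is a rough Python model of the URL that a request should end up at. It is only a sketch: `canonical_path` and its `is_dir` callback are hypothetical names standing in for the `-d` filesystem check, it works on full URL paths rather than the per-directory paths mod_rewrite sees in .htaccess, and it does not model the final internal rewrite that adds .shtml back.

```python
import re

def canonical_path(path, is_dir=lambda p: False):
    """Approximate the extension-free path the redirect rules aim for.

    `is_dir` is a stand-in for the `-d` test in the RewriteCond lines;
    by default nothing is treated as a directory.
    """
    # "remove .shtml": strip a trailing .shtml extension
    path = re.sub(r"\.shtml$", "", path)
    # "remove index": /dir/index becomes /dir/
    path = re.sub(r"/index$", "/", path)
    # "remove slash if not directory" (the site root is left alone)
    if path.endswith("/") and path != "/" and not is_dir(path):
        path = path[:-1]
    return path

print(canonical_path("/foo.shtml"))   # /foo
print(canonical_path("/foo/"))        # /foo
print(canonical_path("/docs/index.shtml", is_dir=lambda p: p == "/docs/"))  # /docs/
```

Each step mirrors one commented block in the .htaccess file, so the three redirects collapse into a single function call here, whereas the server may issue them as separate 301 responses.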

The problem is that this doesn’t work when the URL has an anchor. If I type:

http://www.example.com/foo#bar

into the browser, it works fine. But:

http://www.example.com/foo.shtml#bar

gets redirected to:

http://www.example.com/foo

instead of:

http://www.example.com/foo#bar

As far as I know, the fragment (#bar) stays in the browser; it is never sent to the server, so I don’t think the rewrite rules can preserve it. Is there any way to get the browser to append #bar itself after it receives the 301 from the server?
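
The point about #bar never reaching the server can be illustrated with a small Python sketch. It uses the standard library’s `urlsplit` to separate the fragment from the rest of the URL and then builds the request line a client would typically send. The `request_line` helper is a hypothetical name for illustration, not part of any library, and it simplifies what a real browser does.

```python
from urllib.parse import urlsplit

def request_line(url, method="GET"):
    """Build the HTTP request line a client would send for `url`.

    The fragment is parsed out and deliberately left off, mirroring
    how browsers keep it on the client side.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return f"{method} {target} HTTP/1.1"

url = "http://www.example.com/foo.shtml#bar"
print(urlsplit(url).fragment)  # bar
print(request_line(url))       # GET /foo.shtml HTTP/1.1
```

Because the request line is all that `%{THE_REQUEST}` contains, the rules in the .htaccess file have nothing to match #bar against.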