Monday, December 27, 2004

Anchors and Cruft-Free URLs

I’m trying to clean up some URLs using mod_rewrite. I have .shtml files on the server, but I want browsers to be able to access them without the file extension. And if the browser does send a URL with .shtml, it should be redirected to the URL without the file extension. I have this much working, and here are the contents of the .htaccess file:

RewriteEngine On
RewriteBase /

# remove .shtml; use THE_REQUEST to prevent infinite loops
RewriteCond %{THE_REQUEST} ^GET\ (.*)\.shtml\ HTTP
RewriteRule (.*)\.shtml$ $1 [R=301]

# remove index
RewriteRule (.*)/index$ $1/ [R=301]

# remove slash if not directory
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} /$
RewriteRule (.*)/ $1 [R=301]

# add .shtml to access file, but don't redirect
RewriteCond %{REQUEST_FILENAME}.shtml -f
RewriteCond %{REQUEST_URI} !/$
RewriteRule (.*) $1.shtml [L]
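
The reason the first condition tests THE_REQUEST can be sketched outside Apache. THE_REQUEST holds the request line exactly as the browser sent it, and internal rewrites do not change it, so the redirect fires only when the browser itself asked for a .shtml URL. The snippet below is only an illustration in Python, not part of the server configuration; `needs_redirect` is a hypothetical helper name, and the pattern is the RewriteCond regex transcribed into Python syntax.

```python
import re

# The RewriteCond pattern from the .htaccess file, as a Python regex.
# (Apache writes the spaces as "\ "; a plain space means the same here.)
THE_REQUEST_PATTERN = re.compile(r"^GET (.*)\.shtml HTTP")


def needs_redirect(request_line: str) -> bool:
    """Return True if the original request line asked for a .shtml URL."""
    return THE_REQUEST_PATTERN.search(request_line) is not None


# The browser asked for the .shtml URL directly: redirect to the clean URL.
print(needs_redirect("GET /foo.shtml HTTP/1.1"))  # True

# The browser asked for the clean URL. The last rule maps it to foo.shtml
# internally, but the request line is unchanged, so no redirect and no loop.
print(needs_redirect("GET /foo HTTP/1.1"))  # False
```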

The problem is that this doesn’t work when the URL has an anchor. If I type:

http://www.example.com/foo#bar

into the browser, it works fine. But:

http://www.example.com/foo.shtml#bar

gets redirected to:

http://www.example.com/foo

instead of:

http://www.example.com/foo#bar

As far as I know, #bar stays in the browser; it isn’t sent to the server. So I don’t think there’s any way to preserve it using the rewrite rules. Is there any way to get the browser to tack on the #bar after it gets the 301 from the server?
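
To make that concrete, here is a small Python sketch of how a client splits a URL before sending the request. It is an illustration only: `request_target` is a hypothetical helper, not anything Apache or a browser exposes. It shows that the fragment is a separate URL component that never appears in the request line, which is why the rewrite rules have nothing to match against.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path (plus query, if any) that goes in the request line."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


url = "http://www.example.com/foo.shtml#bar"

# What the server sees in "GET ... HTTP/1.1":
print(request_target(url))  # /foo.shtml

# What stays behind in the browser:
print(urlsplit(url).fragment)  # bar
```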

9 Comments

AFAIK, Safari indeed removes the anchor on a 301 result code, but Camino doesn't.

Wouldn't this be easier with mod_speling?

I just tried it with http://localhost/index?go#2 and it pointed me perfectly to /index.php?go#2. I haven't tried it with shtml files though.

Harold: I want it to redirect to the URL without the .shtml, even though the file on the server is .shtml. So I don't think mod_speling will help, since it changes the request to match the file on the server.

You are right. It isn't mod_speling, it's mod_negotiation that does what you want, though you might need mod_speling as well.

An article about mod_negotiation.

Harold: It's not clear to me how mod_negotiation solves the problem of redirecting the browser to the new (cruft-free) URL.

Michael: well, it doesn't redirect, but if you start pointing only to extensionless files eventually all the old ones will phase out.
This doesn't help in the short term, but then the server won't have to process all the requests. And it keeps the Apache conf more maintainable because of the lack of mod_rewrite voodoo.

This is the way I would go, but if redirecting is a must, obviously you will have to use mod_rewrite, or mod_alias with something like this (untested!):
RedirectMatch 301 ^([a-z0-9\.\/_-]+)\.shtml$ http://mjtsai.com$1

Michael - you obviously got it to work with anchors. But how?

David: I didn't get it to work. For instance, the anchor gets stripped from:

http://c-command.com/spamsieve/manual.shtml#faq

You can't redirect based on anchors. You can, though, pass them on by grabbing everything with (.*) or with [QSA].

The # symbol has one special function, acting as an anchor, and that's the only thing you can do with it in a URL (in my experience).
