1. Some handy rack-rewrite rules

    Every decision you make for a client should be based on sound, reasoned advice. But what if you can’t give that advice because your facts are not accurate?

    We ran into this recently while looking at some page stats in Google Analytics: one page was appearing twice, once with a trailing slash and once without. I also saw in Google Webmaster Tools that other sites were still linking to index.html pages (the convention on the old PHP site), which meant anyone who followed those links got a 404.

    This might not sound like a big problem, but it is. If your page view stats are off, your facts are off, and then your advice is off. Page views and navigation are a big part of advising a client, whether you are looking at current trends or at pages that could be improved, so analytics should be as accurate as possible.

    That story aside, the problem is not hard to fix. In fact, you have three possible solutions to choose from.

    1. Apache rewrite rules
    2. Analytics filters
    3. Rack middleware / before filter

    Apache rewrite rules are certainly a good option, but if you are on production infrastructure you do not control, like Heroku, they are not available to you. They can also be time-consuming to test, since you have to restart each time you change a rule. Not a big problem, but one to be aware of.

    Using Analytics filters is also a very good option but, to be honest, I personally could not figure out how the filter system worked. Also, SEO-wise, it’s better to redirect than to just fix the stats reporting, as you don’t want Google marking your pages down just because a trailing-slash version exists.

    That brings me to the third option, middleware and/or before filters. 

    If you are familiar with Rails, you would instantly think ‘just shove that in a before_filter in the application controller’, much as I did. But be warned: having a request go through the whole stack just to be redirected is slow. Also, once you have more than a couple of checks, you may find the separation of concerns gets a bit muddled.
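
    To make the difference concrete, here is a rough sketch of a redirect done at the Rack layer, before routing or controllers get involved. The class name TrailingSlashRedirect and the stand-in inner app are made up for illustration; this is not how rack-rewrite is implemented, just the general shape of a middleware that short-circuits a request.

    ```ruby
    # Hypothetical middleware: answers "/something/" with a 301 to "/something"
    # without ever calling the app further down the stack.
    class TrailingSlashRedirect
      def initialize(app)
        @app = app
      end

      def call(env)
        path  = env["PATH_INFO"].to_s
        query = env["QUERY_STRING"].to_s

        # Leave the root path alone; only act on paths ending in a slash.
        if path.length > 1 && path.end_with?("/")
          target = path.chomp("/")
          target += "?#{query}" unless query.empty?
          # Short-circuit: return the redirect, the inner app is never called.
          return [301, { "Location" => target, "Content-Type" => "text/html" }, []]
        end

        @app.call(env)
      end
    end

    # Stand-in for the Rails app that would normally sit below the middleware.
    inner_app = ->(env) { [200, { "Content-Type" => "text/plain" }, ["hello"]] }
    stack = TrailingSlashRedirect.new(inner_app)

    redirect = stack.call("PATH_INFO" => "/about/", "QUERY_STRING" => "ref=home")
    passthru = stack.call("PATH_INFO" => "/about", "QUERY_STRING" => "")
    ```

    Writing one of these per rule gets tedious quickly, which is where a rule-based middleware earns its keep.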

    Instead, I recommend rack-rewrite.

    Rack-Rewrite is built to hook into the middleware config like so:

    config.middleware.insert_before("Rack::Lock", "Rack::Rewrite") do
      r301 ....
    end

    Then you can set up your redirects:

    r301 %r{^\/(.+)\/(\?.*)?$}, '/$1$2'
    

    The redirect above captures everything between the leading slash and the trailing slash in $1 and the query string, if there is one, in $2, then redirects to /$1$2.
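
    If you want to check what the pattern does before deploying it, you can try it in plain Ruby. This is only a sketch: strip_trailing_slash is a made-up helper that imitates the $1/$2 substitution by hand, and it assumes the rule is matched against the path with the query string attached.

    ```ruby
    TRAILING_SLASH = %r{^\/(.+)\/(\?.*)?$}

    # Hypothetical helper: builds the redirect target from the two captures,
    # or returns nil when the rule would not fire.
    def strip_trailing_slash(path)
      match = TRAILING_SLASH.match(path)
      return nil unless match
      "/#{match[1]}#{match[2]}"
    end

    strip_trailing_slash("/about/")            # => "/about"
    strip_trailing_slash("/blog/2012/?page=2") # => "/blog/2012?page=2"
    strip_trailing_slash("/about")             # => nil (no trailing slash)
    strip_trailing_slash("/")                  # => nil (root is left alone)
    ```

    The (.+) needs at least one character, which is why the root path never matches and never gets redirected.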

    We also added the following redirect:

    r301 %r{^[\/]?(.*)\/index\.html(\?.*)?$}, '/$1$2'

    Very similar to the one above, this removes the /index.html and issues a 301 (Moved Permanently).
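
    The same kind of quick check works for the index.html rule. Again a sketch only: strip_index_html is a made-up helper, and the pattern here escapes the dot in index\.html so that it only matches a literal dot.

    ```ruby
    INDEX_HTML = %r{^[\/]?(.*)\/index\.html(\?.*)?$}

    # Hypothetical helper: imitates the '/$1$2' substitution by hand,
    # returning nil when the rule would not fire.
    def strip_index_html(path)
      match = INDEX_HTML.match(path)
      return nil unless match
      "/#{match[1]}#{match[2]}"
    end

    strip_index_html("/index.html")                  # => "/"
    strip_index_html("/products/index.html")         # => "/products"
    strip_index_html("/products/index.html?ref=old") # => "/products?ref=old"
    strip_index_html("/products")                    # => nil
    ```

    Note that a bare /index.html ends up at the root, which is usually what you want for an old home page link.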

    The first redirect is useful for anyone. The second is a good reminder for any site that goes through a rebuild or URL structure change: Google might catch up over time, but old blog posts, reviews, referral links and so on won’t.

    If you have any good rewrite rules you would like to share, please post them on the blog, or link to a gist containing them.

    Happy Easter,

    Josh