Ernie Miller

No, I don't work in NYC, DC, or the valley, and I'm cool with that.

Prevent GoogleBot Overload with Default Nofollow

Posted by Ernie on September 26, 2011 at 12:15 pm

Here’s a quick tip to exert greater control over which parts of your site a search engine should crawl: modify your link_to helper to make links rel="nofollow" by default. It’s easy:

The Code

Just add this to your app’s application_helper:

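  # Overrides Rails' link_to so that links carry rel="nofollow" unless the
  # caller passes :follow => true in html_options. Block-form calls are
  # passed through unchanged.
  def link_to(*args, &block)
    unless block_given?
      # Copy the hash so the caller's options are not mutated by delete below.
      html_options = (args[2] || {}).dup

      # :follow is removed here so it never ends up as an HTML attribute.
      unless html_options.delete(:follow)
        if html_options[:rel]
          html_options[:rel] += ' nofollow'
        else
          html_options[:rel] = 'nofollow'
        end
      end

      args[2] = html_options
    end

    super(*args, &block)
  end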

Use WillPaginate? You may also want to add the following (adapted from this StackOverflow post):

  def will_paginate(*args)
    options = args.extract_options!
    options[:renderer] = PaginationNoFollow unless options[:renderer] || options.delete(:follow)
    super(args.first, options)
  end

Set PaginationNoFollow up with an initializer:

require 'will_paginate/view_helpers/link_renderer'
 
class PaginationNoFollow < WillPaginate::ViewHelpers::LinkRenderer
  def rel_value(page)
    case page
    when @collection.previous_page; 'prev' + (page == 1 ? ' start' : '') + ' nofollow'
    when @collection.next_page; 'next nofollow'
    when 1; 'start nofollow'
    else
      'nofollow'
    end
  end
end

Now all of your app’s links default to rel="nofollow". You will still want some of them to be crawled, so pass :follow => true to opt specific links in: for link_to it goes in the html_options hash (the third argument), and for will_paginate it goes in the options hash.
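
To check the opt-in behavior without booting Rails, the rel handling from the link_to override can be pulled out into a plain Ruby method. This is only a sketch: apply_default_nofollow is a hypothetical name, not part of Rails or of the helper above, and it works on a bare Hash rather than on a real link_to call.

```ruby
# Hypothetical helper (not part of Rails or the original post): mirrors the
# rel handling of the link_to override on a plain Hash.
def apply_default_nofollow(html_options = nil)
  # Work on a copy so the caller's hash is left untouched.
  html_options = (html_options || {}).dup

  # :follow => true opts the link in to crawling; the key is always removed
  # so it never reaches the rendered HTML.
  unless html_options.delete(:follow)
    # Append nofollow to any existing rel value, or set it if there is none.
    html_options[:rel] = [html_options[:rel], 'nofollow'].compact.join(' ')
  end

  html_options
end

default  = apply_default_nofollow({})
external = apply_default_nofollow({ :rel => 'external' })
followed = apply_default_nofollow({ :follow => true, :class => 'nav' })

puts default[:rel]          # nofollow
puts external[:rel]         # external nofollow
puts followed.key?(:rel)    # false
puts followed.key?(:follow) # false
```

In a view, the matching call would look something like link_to 'About', about_path, :follow => true, where about_path stands in for whatever route helper your app defines.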

Filed under Blog
  • Chris White

    I’d recommend looking into robots.txt as well ( http://www.robotstxt.org/ ). The nice thing about it is that it’s very flexible as to which parts of the site automated crawlers can crawl, and you can even assign specific rules to specific bots. Some automated tools, such as wget in recursive mode, respect robots.txt rules out of the box, though they can be configured to ignore them.

  • Ernie Miller ( http://metautonomo.us )

    Yep — there’s definitely room/need for the use of multiple strategies to control the indexing of a site. The trick above is just one part of a more comprehensive plan, albeit an unbeatably flexible one, in my experience.

About

I'm Ernie Miller. But then, you probably knew that by looking at the page title, or the URL. I'm a Ruby programmer in Louisville, Kentucky. This blog used to be called "metautonomo.us", which I thought was kind of clever, but nobody, including me, could type it. Lesson learned.