1. only index when you need to

    a.k.a. tricks to be aware of when indexing a user model


    Thinking Sphinx has been a godsend for us: it brings the power of full-text searching with a beautiful DSL. It’s powerful, configurable, flexible and every other kind of ful or ble you can think of.

    I could go on for paragraphs about my love for Thinking Sphinx, and to some degree Sphinx, but let’s get to the point, otherwise I will waffle on for ages.

    When you save an indexed model, let’s say a Location, two hook methods are called: a before save to toggle the delta and an after commit to run the delta index. As you would expect, if the delta isn’t toggled then the delta index isn’t run. But if you are saving a model, wouldn’t you always want to toggle the delta and reindex it?

    Thinking Sphinx is smart and likes to try to reduce the need to index by only toggling the delta if data in the indexed columns has changed. This works fine if you only index simple information (fields or relationships), but doesn’t work so well when you group two fields together in a custom SQL statement.

    For example, this works fine:

    define_index do
      indexes :name
      indexes :city
      indexes :description
    end
    

    Works fine:

    define_index do
      indexes :name
      indexes [address_line_one,
               address_line_two,
               suburb, town_city], :as => :address
      indexes :type, :as => :location_type
    end
    

    Does not work fine:

    define_index do
      indexes "LOWER(`locations`.`name`)", :as => :name, :sortable => true
      ...
    end
    

    The problem here is that TS cannot check whether something has changed when the definition contains SQL, so the delta is always toggled and the model is always reindexed, even if nothing changed.
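
    To make that concrete, here is a toy model of the decision. It is a minimal sketch and not Thinking Sphinx’s actual implementation: IndexedField and needs_delta? are hypothetical names made up for illustration. A plain column can be compared against the list of changed columns, whereas a raw SQL string gives nothing to compare against, so the only safe answer is "changed".

```ruby
# Simplified model of a change check. NOT Thinking Sphinx's real code:
# IndexedField and needs_delta? are hypothetical names for illustration.
IndexedField = Struct.new(:column, :sql) do
  # A field defined by raw SQL has no single column to compare against,
  # so the only safe answer is "assume it changed".
  def changed?(changed_columns)
    return true if sql
    changed_columns.include?(column.to_s)
  end
end

# The delta is needed as soon as any one indexed field reports a change.
def needs_delta?(fields, changed_columns)
  fields.any? { |field| field.changed?(changed_columns) }
end

PLAIN_FIELDS = [IndexedField.new(:name, nil), IndexedField.new(:city, nil)]
SQL_FIELDS   = [IndexedField.new(nil, "LOWER(`locations`.`name`)")]

# Only last_request_at changed, and it is not part of either index.
puts needs_delta?(PLAIN_FIELDS, ["last_request_at"]) # false
puts needs_delta?(SQL_FIELDS, ["last_request_at"])   # true
```

    The second result is the behaviour described above: with SQL in the definition, every save looks like a change to indexed data.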

    But is this really an issue? Does it really matter?

    Over the last couple of months I saw an interesting problem occurring only in production, and I only got to see it because of the volume of requests. In an app I manage we have indexed our users for easy searching and filtering, but for some reason the user delta index was being reindexed on each request from a logged-in user, even though the information indexed for a user wasn’t changing.

    To understand the problem, cause and solution, we need to understand the infrastructure a little.

    We use Authlogic for our authentication but the important part to note is that we also have user tracking turned on, which means information about last request date and time, as well as last login etc. is updated on each request.

    When a user is updated Thinking Sphinx uses a before save hook to toggle the delta if any of the indexed information has changed. In our setup we don’t use the standard field definitions due to inconsistent sorting, so instead we use custom definitions which LOWER all the values, making sure sorting is standardized. But as explained above, this is where the problem lies.

    We can correct this on a model-by-model basis by overriding the handy method indexed_data_changed? with code similar to this:

    # Only toggle the delta when a column that feeds the index has changed.
    def indexed_data_changed?
      indexed = [:first_name, :last_name, :email, :role]
      changed.any? { |col| indexed.include?(col.to_sym) }
    end
    

    Now when a user is updated on each request, the delta won’t be toggled and the delta index won’t be run unless an indexed column has actually changed, reducing server load on Delayed Job and correcting a small bug which only reared its head under the right conditions.
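
    To see the effect without a database, here is a small sketch using a stand-in class. FakeUser is hypothetical and mimics only the `changed` method an ActiveRecord model provides (the names of attributes modified since the last save); the tracking column names assume Authlogic’s usual last_request_at and current_login_ip columns.

```ruby
# FakeUser is a hypothetical stand-in for an ActiveRecord model. It mimics
# only `changed`: the names of attributes modified since the last save.
class FakeUser
  INDEXED_COLUMNS = [:first_name, :last_name, :email, :role].freeze

  attr_reader :changed

  def initialize(changed)
    @changed = changed
  end

  # Same check as the override above: true only if an indexed column changed.
  def indexed_data_changed?
    changed.any? { |col| INDEXED_COLUMNS.include?(col.to_sym) }
  end
end

# A normal request touches only the tracking columns.
TRACKING_ONLY = FakeUser.new(["last_request_at", "current_login_ip"])
# A profile edit touches an indexed column as well.
PROFILE_EDIT  = FakeUser.new(["email", "last_request_at"])

puts TRACKING_ONLY.indexed_data_changed? # false
puts PROFILE_EDIT.indexed_data_changed?  # true
```

    The list of indexed columns has to be kept in step with the define_index block by hand, which is the price of using SQL in the definitions.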

    So what should you take away from this?

    If you use SQL in your define_index blocks, make sure you override indexed_data_changed?

    Side note: Thinking Sphinx is the bee’s knees. Read the source, understand the code, and spread the indexing love.

     