#120 Thinking Sphinx (revised)

Dec 09, 2011 | 10 minutes | Active Record, Plugins, Search

Sphinx is a full-text search engine for use with MySQL or PostgreSQL. Learn how to add Thinking Sphinx by defining an index on your model and searching with various options.

Below is a page from a Rails application that shows a list of articles. We want to add a search feature to this page and, as these articles are text-based, we’ll use a full-text search engine instead of stringing together SQL queries.

Installing Thinking Sphinx

There are a variety of solutions for adding full-text searching to a Rails application and in this episode we’ll use Thinking Sphinx. You should consider all of the options before deciding which one to use and we have already covered one alternative using Sunspot and Solr in episode 278.

To get started we’ll need to install Sphinx. Sphinx communicates directly with an SQL database so there are limitations as to which databases it works with. It currently supports only MySQL and PostgreSQL, so if you’re using SQLite or MongoDB as the database for your Rails application you’ll need to find an alternative solution.

The easiest way to install Sphinx on OS X is to use Homebrew. With it we can install Sphinx by running

          terminal
        
$ brew install sphinx

If you’re not running OS X or you don’t want to install Homebrew there are other download options available. We’ll need to have MySQL or PostgreSQL installed and we can do that through Homebrew too if need be.

Rails uses SQLite by default, but we can create an application that uses a different database by using the -d option.

          terminal
        
$ rails new foo -d mysql

Our Rails application is already running with MySQL as its database so we can start to use Thinking Sphinx with it. Thinking Sphinx is an older gem but it’s still well maintained. Before starting with it, it’s worth reading the documentation, especially the pages on indexing and searching. To add Thinking Sphinx to our application we need to add its gem to the Gemfile and run bundle.

          /Gemfile
        
source 'http://rubygems.org'

gem 'rails', '3.1.3'

# Bundle edge Rails instead:
# gem 'rails',     :git => 'git://github.com/rails/rails.git'

gem 'mysql2'


# Gems used only for assets and not required
# in production environments by default.
group :assets do
  gem 'sass-rails',   '~> 3.1.4'
  gem 'coffee-rails', '~> 3.1.1'
  gem 'uglifier', '>= 1.0.3'
end

gem 'jquery-rails'
gem 'thinking-sphinx'

Adding Indexes

Next we need to define the index in the model we want to search, in this case the Article model. We do this by calling define_index and passing it a block. Inside the block we call indexes and pass it the name of the column we want to index.

          /app/models/article.rb
        
class Article < ActiveRecord::Base
  belongs_to :author
  has_many :comments
  
  define_index do
    indexes content
  end
end

Note that we don’t pass the column as a string or a symbol but as a method call as this is the way that the Thinking Sphinx DSL works.

We also want to index the name column, but there’s a trap here. The methods name and id are reserved so if we want to index one of these columns we do need to pass it in as a symbol. If we want to make a column sortable we can use the sortable option. We can then use the sort option in the search itself to sort the matching articles.

          /app/models/article.rb
        
define_index do
  indexes content
  indexes :name, sortable: true
end

It’s also possible to index columns through associations. Our Article model has many comments. If we want the comments to be included in the index we can do so like this:

          /app/models/article.rb
        
define_index do
  indexes content
  indexes :name, sortable: true
  indexes comments.content, as: :comment_content
end

When we index an association it’s a good idea to use the as option to assign a name to it and we’ve done so here.

If we want to index multiple columns at once we can pass in an array. For example, we want to index the author’s name, but it’s stored in separate first_name and last_name fields. We can index it like this:

          /app/models/article.rb
        
define_index do
  indexes content
  indexes :name, sortable: true
  indexes comments.content, as: :comment_content
  indexes [author.first_name, author.last_name], as: :author_name
end

Adding a Search Form

Our Article model now has the indexes we want but before we can search the articles we need to build the index which we do by running a Rake task.

          terminal
        
$ rake ts:index

Once Thinking Sphinx has finished indexing we can start up the Sphinx server.

          terminal
        
$ rake ts:start

Next we need to add a search field to the articles page and hook it up to Thinking Sphinx. This form will submit to the articles page, the same page that the form itself is on, and use GET.

          /app/views/articles/index.html.erb
        
<h1>Articles</h1>

<%= form_tag articles_path, method: :get do %>
  <div class="field">
    <%= text_field_tag :search, params[:search] %>
    <%= submit_tag "Search", name: nil %>
  </div>
<% end %>

<!-- #Rest of page omitted -->

The ArticlesController’s index action is triggered when the user makes a search so we need to search against the Article model there. Thinking Sphinx adds a search method to model classes and we can use that here. All we need to do is pass it the search parameter from the form.

          /app/controllers/articles_controller.rb
        
def index
  @articles = Article.search(params[:search])
end

When we reload the articles page now the search form is there and we can use it to search for, say, “Batman”.

Only articles matching “Batman” are returned.

Only the article that contains the text we’ve searched for has been returned so it looks like this is working. We can also search by author name or even combine the two and search for part of the title and part of the author’s name and the matching articles will be correctly returned.

Customizing The Search

There are a number of options that we can pass to search. One of these is order which we can use with any column that’s been marked as sortable. Earlier we made the name column sortable so we can use this here and any searches we make now will be sorted by name, rather than by the default, which is to sort by relevance.

          /app/controllers/articles_controller.rb
        
def index
  @articles = Article.search(params[:search], order: :name)
end

If we want to paginate the results we can pass in pagination options with the page and per_page options, like this:

          /app/controllers/articles_controller.rb
        
def index
  @articles = Article.search(params[:search], page: 1, per_page: 20)
end

This works if we have will_paginate installed; Kaminari support is still being worked on.
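The example above hard-codes page 1, so it always returns the first page of results. A minimal sketch of deriving the pagination options from the request instead, assuming the page number arrives as params[:page]; the helper name pagination_options and the fallback values are hypothetical and not part of Thinking Sphinx.

```ruby
# Hypothetical helper: builds the :page and :per_page options for
# Article.search from request parameters. Plain Ruby, so it can be
# tried outside Rails.
def pagination_options(params, per_page = 20)
  page = params[:page].to_i   # nil.to_i and "abc".to_i are both 0
  page = 1 if page < 1        # fall back to the first page
  { page: page, per_page: per_page }
end

# Possible use in the controller (an assumption, not from the episode):
#   @articles = Article.search(params[:search], pagination_options(params))
```

Clamping the value keeps a missing or malformed page parameter from reaching the search call.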

If we want to filter the results based on the value in a specific field we can use a conditions hash. For example if we want to return only articles that have “Batman” in their name we can do this:

          /app/controllers/articles_controller.rb
        
def index
  @articles = Article.search(params[:search], conditions: { name: "Batman" })
end

What, though, if we want to filter the results on a non text-based field such as the author_id? We might expect to be able to do something like this.

          /app/controllers/articles_controller.rb
        
def index
  @articles = Article.search(params[:search], conditions: { author_id: 2 })
end

This won’t work as we haven’t indexed the author_id field. There are a number of ways to index something in Sphinx. One is to use a field, which is what the indexes method that we’ve been using does, but this is meant for text fields. For numeric, date or time fields it’s better to use an attribute. Thinking Sphinx handles these with a has method we can pass a list of fields to.

          /app/models/article.rb
        
define_index do
  indexes content
  indexes :name, sortable: true
  indexes comments.content, as: :comment_content
  indexes [author.first_name, author.last_name], as: :author_name
    
  has author_id, published_at
end

Since we’ve changed the index’s definition we need to rebuild it. We can do that by running

          terminal
        
$ rake ts:rebuild

This will stop Sphinx, rebuild the index and then start it again.

In the controller we’re using the conditions hash to filter by the author_id but again this is meant for fields rather than attributes. To filter by attributes we should use with.

          /app/controllers/articles_controller.rb
        
def index
  @articles = Article.search(params[:search], with: { author_id: 2 })
end

When we reload the page now it will filter the articles so that only those written by the author with an id of 2 are shown.

The articles are filtered by the author id.

We also added the published_at column to the index so we can filter by this attribute too. We can use a range here to find all the articles published in, say, the last three weeks.

          /app/controllers/articles_controller.rb
        
def index
  @articles = Article.search(params[:search], with: { published_at: 3.weeks.ago..Time.zone.now })
end
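The filters shown so far use hard-coded values. As a rough sketch, the hash passed to with could be built from request parameters instead. The parameter names params[:author_id] and params[:weeks] and the helper name attribute_filters are hypothetical; the only things taken from the examples above are the attribute names and the fact that with accepts an integer or a range.

```ruby
# Hypothetical helper: builds the hash for the :with option from request
# parameters. Values in params arrive as strings, so they are converted
# to the integer and range forms used in the examples above.
def attribute_filters(params, now = Time.now)
  filters = {}

  author_id = params[:author_id].to_s
  filters[:author_id] = author_id.to_i unless author_id.empty?

  weeks = params[:weeks].to_i
  if weeks > 0
    seconds = weeks * 7 * 24 * 60 * 60   # plain Ruby stand-in for weeks.ago
    filters[:published_at] = (now - seconds)..now
  end

  filters
end

# Possible use in the controller (an assumption, not from the episode):
#   @articles = Article.search(params[:search], with: attribute_filters(params))
```

When neither parameter is present the helper returns an empty hash, which here is meant to stand for "no attribute filtering".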

There are a couple of other options we can use when searching. If we want to give results from certain fields more weight than others we can use field_weights.

          /app/controllers/articles_controller.rb
        
def index
  @articles = Article.search(params[:search], field_weights: {name: 20, content: 10, author_name: 5})
end

The default weight for a field is 1. We’ve given the name field the highest weighting above, followed by content and author_name, so matches in the articles’ names will be treated as the most relevant when showing the results.

Match Modes

We’ll finish off by showing the match_mode option. There are various modes we can switch to, for example boolean. If this mode is set and we pass in a query with more than one keyword, say “Superman Krypton”, only the articles which match both words will be returned. If we separate the keywords with a pipe and search for “Superman | Krypton” then we’ll see articles that match either keyword returned.

We can also use a minus sign to return articles that match one keyword but which don’t include the other. A search for “Superman -Krypton” will return only those articles that contain “Superman” but not “Krypton”.
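To make the three query forms concrete, here is a simplified plain-Ruby model of them. It is not Sphinx code and says nothing about how Sphinx is implemented: the method boolean_match? and the sample articles are hypothetical, and it ignores stemming, phrases and grouping. In the application itself the mode would be selected with an option on the search call, which in versions of Thinking Sphinx that use define_index is typically written as match_mode: :boolean.

```ruby
# Simplified model of boolean matching; NOT how Sphinx is implemented.
# Space-separated terms must all match, "a | b" matches either term,
# and "-term" excludes texts containing that term.
def boolean_match?(query, text)
  words = text.downcase.scan(/\w+/)
  # Join "a | b" into the single token "a|b" so each token is one AND clause.
  clauses = query.downcase.gsub(/\s*\|\s*/, "|").split
  clauses.all? do |clause|
    if clause.start_with?("-")
      !words.include?(clause[1..-1])
    else
      clause.split("|").any? { |term| words.include?(term) }
    end
  end
end

# Hypothetical sample data.
articles = [
  "Superman was born on Krypton",
  "Superman works at the Daily Planet",
  "Krypton is a distant planet"
]

both    = articles.select { |a| boolean_match?("Superman Krypton", a) }
either  = articles.select { |a| boolean_match?("Superman | Krypton", a) }
without = articles.select { |a| boolean_match?("Superman -Krypton", a) }
```

Here both holds only the first article, either holds all three, and without holds only the second.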

As we’ve shown in this episode, Thinking Sphinx has some nice options for adding searching to Rails applications. It can fall a little short when it comes to reindexing, however. If we create, update or delete an article in our application the index won’t be automatically updated and so will be out-of-date. To pick up any database changes we have to run

          terminal
        
$ rake ts:reindex

which will reindex everything. Thinking Sphinx reindexes quite quickly but this command still needs to be triggered. One way to do this is to set up a cron task to update the index every so often and we can use the Whenever gem which was covered in episode 164 to do this.
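As a rough illustration of the cron approach, a Whenever schedule file might look like the fragment below. The 30-minute interval is an arbitrary assumption rather than a recommendation from the episode, and the file is read by the Whenever gem rather than run directly.

```ruby
# config/schedule.rb (interpreted by the Whenever gem, not run on its own)
# The interval is an assumption; pick one that matches how stale the
# search results are allowed to become.
every 30.minutes do
  rake "ts:reindex"
end
```

Running whenever --update-crontab should then write the corresponding entry to the crontab.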

If you have a large database and need to update the index frequently you should read up on Delta Indexes. These allow you to reindex only the records that have changed rather than everything.