#120 Thinking Sphinx (revised)
Below is a page from a Rails application that shows a list of articles. We want to add a search feature to this page and, as these articles are text-based, we’ll use a full-text search engine instead of stringing together SQL queries.
Installing Thinking Sphinx
There are a variety of solutions for adding full-text searching to a Rails application and in this episode we’ll use Thinking Sphinx. You should consider all of the options before deciding which one to use and we have already covered one alternative using Sunspot and Solr in episode 278.
To get started we’ll need to install Sphinx. Sphinx communicates directly with an SQL database so there are limitations as to which databases it works with. Sphinx currently only supports MySQL and PostgreSQL, so if you’re using SQLite or MongoDB as the database for your Rails application you’ll need to find an alternative solution.
The easiest way to install Sphinx on OS X is to use Homebrew. With it we can install Sphinx by running
$ brew install sphinx
If you’re not running OS X or you don’t want to install Homebrew there are other download options available. We’ll need to have MySQL or PostgreSQL installed and we can do that through Homebrew too if need be.
Rails uses SQLite by default, but we can create an application that uses a different database by using the -d option.
$ rails new foo -d mysql
Our Rails application is already running with MySQL as its database so we can start to use Thinking Sphinx with it. Thinking Sphinx is an older gem but it’s still well maintained. Before starting, it’s worth reading the documentation, especially the pages on indexing and searching. To add Thinking Sphinx to our application we need to add its gem to the Gemfile and run bundle.
source 'http://rubygems.org'

gem 'rails', '3.1.3'

# Bundle edge Rails instead:
# gem 'rails', :git => 'git://github.com/rails/rails.git'

gem 'mysql2'

# Gems used only for assets and not required
# in production environments by default.
group :assets do
  gem 'sass-rails', '~> 3.1.4'
  gem 'coffee-rails', '~> 3.1.1'
  gem 'uglifier', '>= 1.0.3'
end

gem 'jquery-rails'
gem 'thinking-sphinx'
Adding Indexes
Next we need to define the index in the model we want to search, in this case the Article model. We do this by calling define_index and passing it a block. Inside the block we call indexes and pass it the name of the column we want to index.
class Article < ActiveRecord::Base
  belongs_to :author
  has_many :comments

  define_index do
    indexes content
  end
end
Note that we don’t pass the column as a string or a symbol but as a method call as this is the way that the Thinking Sphinx DSL works.
We also want to index the name column, but there’s a trap here. The methods name and id are reserved so if we want to index one of these columns we do need to pass it in as a symbol. If we want to make a column sortable we can use the sortable option. We can then use the order option in the search itself to sort the matching articles.
define_index do
  indexes content
  indexes :name, sortable: true
end
It’s also possible to index columns through associations. Our Article model has many comments. If we want the comments to be included in the index we can do so like this:
define_index do
  indexes content
  indexes :name, sortable: true
  indexes comments.content, as: :comment_content
end
When we index an association it’s a good idea to use the as option to assign a name to it, as we’ve done here.
If we want to index multiple columns at once we can pass in an array. For example, we want to index the author’s name, but it’s stored in separate first_name and last_name fields. We can index it like this:
define_index do
  indexes content
  indexes :name, sortable: true
  indexes comments.content, as: :comment_content
  indexes [author.first_name, author.last_name], as: :author_name
end
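To make the combined field more concrete, here is a minimal plain-Ruby sketch of the text that ends up in author_name. It assumes that the two columns are joined with a single space and that missing values are skipped; the exact behaviour may differ between database adapters. The Author struct and the author_name_field helper are hypothetical and exist only for illustration; they are not part of Thinking Sphinx.

```ruby
# Hypothetical stand-in for the Author model, for illustration only.
Author = Struct.new(:first_name, :last_name)

# Hypothetical helper: approximates the text indexed for the author_name
# field, assuming the columns are joined with a space and nils are skipped.
def author_name_field(author)
  [author.first_name, author.last_name].compact.join(" ")
end

puts author_name_field(Author.new("Bruce", "Wayne"))
```

Because both names land in one field, a search for either the first or the last name can match the same article.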
Adding a Search Form
Our Article model now has the indexes we want, but before we can search the articles we need to build the index, which we do by running a Rake task.
$ rake ts:index
Once Thinking Sphinx has finished indexing we can start up the Sphinx server.
$ rake ts:start
Next we need to add a search field to the articles page and hook it up to Thinking Sphinx. This form will submit to the articles page, the same page that the form itself is on, and use GET.
<h1>Articles</h1>

<%= form_tag articles_path, method: :get do %>
  <div class="field">
    <%= text_field_tag :search, params[:search] %>
    <%= submit_tag "Search", name: nil %>
  </div>
<% end %>

<!-- Rest of page omitted -->
The ArticlesController’s index action is triggered when the user makes a search so we need to search against the Article model there. Thinking Sphinx adds a search method to model classes and we can use that here. All we need to do is pass it the search parameter from the form.
def index
  @articles = Article.search(params[:search])
end
When we reload the articles page now the search form is there and we can use it to search for, say, “Batman”.
Only the article that contains the text we’ve searched for has been returned so it looks like this is working. We can also search by author name or even combine the two and search for part of the title and part of the author’s name and the matching articles will be correctly returned.
Customizing The Search
There are a number of options that we can pass to search. One of these is order, which we can use with any column that’s been marked as sortable. Earlier we made the name column sortable so we can use this here and any searches we make now will be sorted by name, rather than by the default, which is to sort by relevance.
def index
  @articles = Article.search(params[:search], order: :name)
end
If we want to paginate the results we can pass in the page and per_page options, like this:
def index
  @articles = Article.search(params[:search], page: 1, per_page: 20)
end
This will work if we have will_paginate installed; Kaminari support is being worked on.
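In practice the page number usually comes from the request rather than being hard-coded. The following is a small sketch of one way to build those options in plain Ruby. pagination_options is a hypothetical helper, and reading the page from params[:page] is an assumption based on will_paginate’s default parameter name.

```ruby
# Hypothetical helper (not part of Thinking Sphinx): builds the pagination
# options for Article.search from the request parameters.
def pagination_options(params, per_page = 20)
  page = params[:page].to_i # nil and non-numeric strings become 0
  page = 1 if page < 1      # fall back to the first page
  { page: page, per_page: per_page }
end

# Possible use in the controller (assumes the Article model from above):
#   @articles = Article.search(params[:search], pagination_options(params))
```

Falling back to page 1 keeps a missing or malformed page parameter from reaching the search call.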
If we want to filter the results based on the value in a specific field we can use a conditions hash. For example, if we want to return only articles that have “Batman” in their name we can do this:
def index
  @articles = Article.search(params[:search], conditions: { name: "Batman" })
end
What, though, if we want to filter the results on a non text-based field such as the author_id? We might expect to be able to do something like this.
def index
  @articles = Article.search(params[:search], conditions: { author_id: 2 })
end
This won’t work as we haven’t indexed the author_id field. There are a number of ways to index something in Sphinx. One is to use a field, which is what the indexes method that we’ve been using does, but this is meant for text fields. For numeric, date or time fields it’s better to use an attribute. Thinking Sphinx handles these with a has method, to which we can pass a list of columns.
define_index do
  indexes content
  indexes :name, sortable: true
  indexes comments.content, as: :comment_content
  indexes [author.first_name, author.last_name], as: :author_name
  has author_id, published_at
end
Since we’ve changed the index’s definition we need to rebuild it. We can do that by running
$ rake ts:rebuild
This will stop Sphinx, rebuild the index and then start it again.
In the controller we’re using the conditions hash to filter by the author_id but again this is meant for fields rather than attributes. To filter by attributes we should use with.
def index
  @articles = Article.search(params[:search], with: { author_id: 2 })
end
When we reload the page now the articles are filtered so that only those written by the author with an id of 2 are shown.
We also added the published_at column to the index so we can filter by this attribute too. We can use a range here to find all the articles published in, say, the last three weeks.
def index
  @articles = Article.search(params[:search], with: { published_at: 3.weeks.ago..Time.zone.now })
end
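Both attribute filters can be combined in a single with hash. The sketch below builds such a hash in plain Ruby so it can be tried outside Rails. attribute_filters is a hypothetical helper, the three-week window mirrors the example above, and converting the author id to an integer is an assumption made because request parameters arrive as strings while the attribute is numeric.

```ruby
THREE_WEEKS = 3 * 7 * 24 * 60 * 60 # in seconds

# Hypothetical helper (not part of Thinking Sphinx): builds the hash passed
# to the with option from the request parameters.
def attribute_filters(params, now = Time.now)
  filters = { published_at: (now - THREE_WEEKS)..now }
  author_id = params[:author_id].to_s
  # Only filter by author when a purely numeric id was supplied.
  filters[:author_id] = author_id.to_i if author_id.match?(/\A\d+\z/)
  filters
end

# Possible use in the controller:
#   @articles = Article.search(params[:search], with: attribute_filters(params))
```

Keeping the filter construction in one place makes it easier to add further attributes later without growing the controller action.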
There are a couple of other options we can use when searching. If we want to give results from certain fields more weight than others we can use field_weights.
def index
  @articles = Article.search(params[:search], field_weights: { name: 20, content: 10, author_name: 5 })
end
The default weight for a field is 1. We’ve given the name field the highest weighting above, followed by content and author_name, so matches in the articles’ names will be treated as the most relevant when showing the results.
Match Modes
We’ll finish off by showing the match_mode option. There are various modes we can switch to, for example boolean. If this option is set and we pass in a query with more than one keyword, say “Superman Krypton”, only the articles which match both words will be returned. If we separate the keywords with a pipe and search for “Superman | Krypton” then we’ll see articles that match either keyword returned.
We can also use a minus sign to return articles that match one keyword but which don’t include the other. A search for “Superman -Krypton” will return only those articles that contain “Superman” but not “Krypton”.
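The three behaviours described above can be summarised with a toy matcher in plain Ruby. This is only an illustration of the semantics, not how Sphinx evaluates queries: it ignores stemming, phrases, grouping and ranking, and boolean_match? is a hypothetical helper. In the application itself the mode would be selected by passing the match_mode option to search, as noted in the comment.

```ruby
# In the controller the mode is selected with the match_mode option, e.g.
#   @articles = Article.search(params[:search], match_mode: :boolean)
#
# Toy illustration of the boolean semantics (hypothetical helper, not Sphinx):
#   "a b"   matches text containing both a and b
#   "a | b" matches text containing a or b
#   "a -b"  matches text containing a but not b
def boolean_match?(text, query)
  words = text.downcase.scan(/\w+/)
  # Remove the spaces around pipes so that "a | b" becomes one OR group.
  terms = query.downcase.gsub(/\s*\|\s*/, "|").split
  terms.all? do |term|
    if term.start_with?("-")
      !words.include?(term[1..-1])
    else
      term.split("|").any? { |alternative| words.include?(alternative) }
    end
  end
end

puts boolean_match?("Superman was born on Krypton", "Superman -Krypton")
```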
As we’ve shown in this episode, Thinking Sphinx has some nice options for adding searching to Rails applications. It can fall a little short when it comes to reindexing, however. If we create, update or delete an article in our application the index won’t be automatically updated and so will be out-of-date. To pick up any database changes we have to run
$ rake ts:reindex
which will reindex everything. Thinking Sphinx reindexes quite quickly but this command still needs to be triggered. One way to do this is to set up a cron task to update the index every so often and we can use the Whenever gem which was covered in episode 164 to do this.
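As a rough sketch, a Whenever schedule for this could look like the following. The file location config/schedule.rb is Whenever’s convention; the 30 minute interval is an arbitrary assumption and should be tuned to how often the data changes.

```ruby
# config/schedule.rb
# Rebuild the Sphinx index periodically; the interval is an assumption.
every 30.minutes do
  rake "ts:reindex"
end
```

Running whenever --update-crontab should then write the matching entry to the crontab.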
If you have a large database and need to update the index frequently you should read up on Delta Indexes. These allow you to reindex only the records that have changed rather than everything.