#306 ElasticSearch Part 1
Below is a page from a Rails application that shows a list of articles. We want to add a search feature to this page and, as these articles are text-based, we’ll use a full-text search engine instead of stringing together SQL queries.
We’ve covered this topic in previous episodes. In episode 120 we used Thinking Sphinx and in episode 278 we used Sunspot with Solr. In this episode we’ll be using Elasticsearch to add full-text searching to our application.
Elasticsearch is a full-featured search engine built on top of Apache Lucene, much like Solr. It has a REST API and communicates over JSON. Elasticsearch isn’t Ruby-specific so we’ll be using a gem called Tire to communicate with it. Tire can be used in any Ruby project, but it also has some nice model functionality that makes it easy to integrate into a Rails application. There’s even a Rails template we can use to set up a new app with Elasticsearch.
$ rails new searchapp -m https://raw.github.com/karmi/tire/master/examples/rails-application-template.rb
This will create a new Rails application and configure Elasticsearch and Tire. If the template doesn’t detect that Elasticsearch is running it will download and install it automatically and set it up just for this new application. Once the command has finished we can visit http://localhost:3000/ where we’ll see a basic application that lets us use Elasticsearch to search through a small number of records. The source code for the template is worth taking a look at as it’s a good example of what can be done with a Rails application template.
Adding Elasticsearch to Our Application
The example application is interesting but how do we use Elasticsearch in our own application? The first step is to install it. If you’re running Homebrew on OS X this is simple; if not, the Elasticsearch website has details on how to download it.
$ brew install elasticsearch
Once Elasticsearch has installed it gives us instructions on how to get it running. We start it up with this command (note that this might be different based on the version of Elasticsearch you’re running):
$ elasticsearch -f -D es.config=/usr/local/Cellar/elasticsearch/0.18.1/config/elasticsearch.yml
This command starts up the server on port 9200 and we can talk to this server manually if we want through the JSON REST API. We’re going to use Tire, though, so the next thing to do is install that. As usual this is done by adding the gem to the application’s Gemfile and running bundle.
source 'http://rubygems.org'

gem 'rails', '3.1.3'

# Bundle edge Rails instead:
# gem 'rails', :git => 'git://github.com/rails/rails.git'

gem 'sqlite3'

# Gems used only for assets and not required
# in production environments by default.
group :assets do
  gem 'sass-rails', '~> 3.1.4'
  gem 'coffee-rails', '~> 3.1.1'
  gem 'uglifier', '>= 1.0.3'
end

gem 'jquery-rails'
gem 'tire'
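To give a feel for the JSON REST API mentioned above, here is a minimal sketch that builds a search request using only Ruby’s standard library. It does not send anything, so it runs without a server. The helper name build_search_request and the index name "articles" are assumptions made for illustration; the index name on your machine may differ.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Builds (but does not send) a search request for a local Elasticsearch node.
# ASSUMPTIONS: the helper name and the "articles" index are illustrative only.
def build_search_request(term, base_url = 'http://localhost:9200')
  uri = URI.join(base_url, '/articles/_search')
  request = Net::HTTP::Post.new(uri.path, 'Content-Type' => 'application/json')
  # A query_string query searches the indexed text for the given term.
  request.body = JSON.generate({ query: { query_string: { query: term } } })
  [uri, request]
end

uri, request = build_search_request('superman')
puts "#{request.method} #{uri}"
puts request.body
```

With Elasticsearch running, the request could be sent with Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) } and the JSON response parsed with JSON.parse. Tire takes care of this plumbing for us.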
We can add Tire to any model we want to search against by including two modules. We want to search articles so we’ll add them to our application’s Article model.
class Article < ActiveRecord::Base
  belongs_to :author
  has_many :comments

  include Tire::Model::Search
  include Tire::Model::Callbacks
end
The first of these modules adds various searching and indexing methods while the second one adds callbacks so that when an article is created, updated or destroyed the index is automatically updated.
We already have some articles in our application’s database and these won’t be included in the index. All of the records are defined in the application’s seeds file, though, so we can run the setup task again and the records will be indexed automatically when they are reloaded.
$ rake db:setup
Adding The Search Form
Now that our articles have been indexed we can add a form for searching them on the articles page. This page’s template looks like this:
<h1>Articles</h1>

<div id="articles">
  <% @articles.each do |article| %>
    <h2>
      <%= link_to article.name, article %>
      <span class="comments">(<%= pluralize(article.comments.size, 'comment') %>)</span>
    </h2>
    <div class="info">
      by <%= article.author.name %>
      on <%= article.published_at.strftime('%b %d, %Y') %>
    </div>
    <div class="content"><%= article.content %></div>
  <% end %>
</div>
We’ll add this simple form below the page’s header. This form will submit to the articles page, the same page that the form itself is on, and use GET.
<%= form_tag articles_path, method: :get do %>
  <p>
    <%= text_field_tag :query, params[:query] %>
    <%= submit_tag "Search", name: nil %>
  </p>
<% end %>
When the search form is submitted it triggers the ArticlesController’s index action and this action currently returns all of the articles. We’ll add a check in the code so that if the query parameter from the form is present Tire’s search method is called instead.
def index
  if params[:query].present?
    @articles = Article.search(params[:query])
  else
    @articles = Article.all
  end
end
When we reload the page now we’ll see the search form. When we enter a search term and submit the form, however, we’ll get an error.
The error is caused by calling article.comments.size to display the number of comments that each article has, so it seems that the associations aren’t working on the articles that are returned by Tire.
Tire tries to minimize access to the database and when we call Article.search what’s returned is not the actual ActiveRecord models but instead a result set from Tire with attributes based on what’s stored in the search index. The index doesn’t know about the comments association so it doesn’t know how to set it up. To fix this we can add a load option to the call to search to tell Tire to load the actual records from the database.
def index
  if params[:query].present?
    @articles = Article.search(params[:query], load: true)
  else
    @articles = Article.all
  end
end
When we make a search now the page loads and shows the correct results.
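The failure described above can be reproduced in miniature without Tire. In this sketch an OpenStruct stands in for a search hit that carries only the attributes stored in the index. This is purely an illustration and an assumption about the general shape of such an object, not Tire’s actual result class.

```ruby
require 'ostruct'

# Stand-in for a search hit: it only carries attributes stored in the index.
# ASSUMPTION: OpenStruct is used for illustration; it is not Tire's result class.
hit = OpenStruct.new(name: 'Superman', content: 'Lorem ipsum')

puts hit.name              # an indexed attribute is available
puts hit.comments.inspect  # nothing about comments was indexed, so this is nil

begin
  hit.comments.size        # mirrors the call made in the view
rescue NoMethodError => e
  puts "NoMethodError: #{e.message}"
end
```

Loading the real records with load: true avoids the problem because the view then receives ActiveRecord objects with working associations, at the cost of a database query.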
It would be better if all of the data we need was inside the search index so that we don’t need to use load: true to fetch the records from the database. We can do this, but we’ll leave it for the next episode. What we will show next is how to further customize the query by passing in additional options. We’ll do this by redefining the search method in the Article model so that it accepts the params hash that the user passes in.
def self.search(params)
  tire.search(load: true) do
    query { string params[:query] } if params[:query].present?
  end
end
As we’re overriding Tire’s search method we use tire.search to call Tire’s original method and, as we want to fetch the actual models, we’ve used the load: true option. Instead of passing the search parameters directly to this method we’ve used a block so that we can further customize the query with more options. In this block we call query and pass it another block. In that inner block we pass the query parameter to the string method, but only if the parameter is present.
We can simplify the ArticlesController now so that it just calls our custom search method and passes it the params hash.
def index
  @articles = Article.search(params)
end
If we reload the articles page now it will still work and if we clear the search box and click “Search” we’ll see all of the results returned.
There’s an article showing in the results that we don’t want there as it has a publication date in the future. We’ll change the search so that it doesn’t show articles that haven’t yet been published. To do this we just need to add a filter to the search block in the model.
def self.search(params)
  tire.search(load: true) do
    query { string params[:query] } if params[:query].present?
    filter :range, published_at: { lte: Time.zone.now }
  end
end
The first argument is the type of filter we want, in this case a range filter. Next we pass a hash containing the attributes we want to filter by. In this case we filter by published_at and include only those articles with a published_at time less than or equal to the current time.
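To make the shape of this search more concrete, here is a plain-Ruby sketch that builds a hash resembling the JSON body such a search corresponds to: an optional query_string query plus a range filter. The helper name search_payload is hypothetical and the exact payload that Tire generates may differ, so treat this as an approximation rather than Tire’s real output.

```ruby
require 'json'
require 'time'

# Hypothetical helper: approximates the search body for the query and
# range filter described above. ASSUMPTION: not Tire's exact output.
def search_payload(params, now = Time.now.utc)
  payload = {}
  term = params[:query].to_s.strip
  # Only add the query when the user actually typed something.
  payload[:query] = { query_string: { query: term } } unless term.empty?
  # Always exclude articles whose published_at lies in the future.
  payload[:filter] = { range: { published_at: { lte: now.iso8601 } } }
  payload
end

puts JSON.pretty_generate(search_payload({ query: 'superman' }))
puts JSON.pretty_generate(search_payload({}))
```

The second call shows what happens when the search box is empty: the query key is omitted and only the filter remains, which is why clearing the box returns every published article.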
You might be wondering what other options you can pass to a search to customize it. There is some documentation on this topic but it’s rather scattered. A good place to start is Tire’s README file, although the beginning of the file may be a little confusing as it discusses indexing and mapping which you don’t need to worry about as we’re doing dynamic mapping. There’s some additional documentation available that Tire provides which is also worth reading.
Most of Tire’s options map one-to-one with Elasticsearch’s, so it’s a good idea to look at the Elasticsearch documentation too. The page on the Query DSL has a whole section on filtering which includes the range filter we used earlier and which lists all of the options we can use. The code snippets are written in JSON but it’s easy to convert them for use with Tire.