#399 Autocomplete Search Terms pro

Dec 31, 2012 | 17 minutes | Performance, Caching

Learn how to add autocompletion to a search form and improve performance using Rack middleware, caching and switching from SQL to Redis.

In the example application below we have a list of products, each of which has a name, a price and a category. We also have a way to search these products by either their name or category. What we’d like to do is add auto-completion to the search field so that when a user starts typing, a list of suggestions is shown containing products that match the search term. Doing this presents some interesting problems, especially with regard to performance. Let’s get started.

The products page of our application has a search box.

There are a variety of ways that we can handle this on the client-side but we’ll keep it simple and use jQueryUI, which makes it easy to add an auto-completion dropdown to a text field. We’ll step through this part fairly quickly as it was covered in more detail back in episode 102. To include jQueryUI in our application we just need to add a line to our application’s JavaScript manifest file.

          /app/assets/javascripts/application.js
        
//= require jquery
//= require jquery-ui
//= require jquery_ujs
//= require_tree .

We can add auto-complete behaviour to the search text field in the products CoffeeScript file. We’ll do this when the DOM loads, although you might want to change this behaviour if you’re using Turbolinks. All we need to do is call autocomplete on the text field. This takes several options and we’ll use source to specify where the auto-complete data comes from. This will be a URL in our application as there’s too much data to load it all on the client at once. We’ll use a search_suggestions path.

          /app/assets/javascripts/products.js.coffee
        
jQuery ->
  $('#search').autocomplete
    source: "/search_suggestions"

Our application will need to be able to respond to this path. It doesn’t currently so we’ll need a new controller. We’ll actually create a whole new resource with a model and a controller so that we have a convenient location to store our search suggestions and look them up quickly. There are some performance concerns with storing this kind of data in a SQL database but we’ll look at this later. We’ll give our new resource a term field, which will contain the search suggestion term that’s returned to the user, and a popularity field which will determine how the results are sorted.

          terminal
        
$ rails g resource search_suggestion term popularity:integer
$ rake db:migrate

The /search_suggestions path now routes to the index action of our new SearchSuggestionsController. jQueryUI expects some JSON to be returned from this action and an array of data returned here will display the results to the user. To test that this is working we’ll add some test data here.

          /app/controllers/search_suggestions_controller.rb
        
class SearchSuggestionsController < ApplicationController
  def index
    render json: %w[foo bar]
  end
end

We’ve already added some CSS to make the auto-complete list look good so we can test our auto-complete behaviour.

Entering text into the search box now shows our test autocomplete data.

Instead of displaying dummy data the list should show common keywords from the products that match the text that has been entered. We’ll write this behaviour in the SearchSuggestion model and have the controller call a method on the model.

          /app/controllers/search_suggestions_controller.rb
        
class SearchSuggestionsController < ApplicationController
  def index
    render json: SearchSuggestion.terms_for(params[:term])
  end
end

This method should return an array of the terms that we want to suggest to the user but how should it do that? We need to look up a list of suggestions that start with the text that has been entered into the text box. Ideally these would already be in the database and we could look them up with a query, ordered by their popularity, and return up to, say, 10 results. The code to do that would look something like this:

          /app/models/search_suggestion.rb
        
def self.terms_for(prefix)
  suggestions = where("term like ?", "#{prefix}_%")
  suggestions.order("popularity desc").limit(10).pluck(:term)
end
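One thing to keep in mind with this query is that % and _ are wildcards in a LIKE pattern, so a prefix typed by a user that contains either character will match more than intended. The sketch below shows one way this could be handled. The escape_like and like_pattern helpers are hypothetical and aren’t part of the episode’s code, and using a backslash as the escape character is an assumption: some databases, SQLite among them, only honour an escape character when the query includes an ESCAPE clause.

```ruby
# Hypothetical helper, not part of the episode's code: escapes the LIKE
# wildcards % and _ (and the escape character itself) in user input.
# Assumes the database treats a backslash as the LIKE escape character.
def escape_like(prefix, escape_char = "\\")
  prefix.gsub(/[#{Regexp.escape(escape_char)}%_]/) { |char| escape_char + char }
end

# Builds the pattern that terms_for passes to "term like ?". The trailing
# "_%" means "at least one more character after the prefix".
def like_pattern(prefix)
  "#{escape_like(prefix)}_%"
end

puts like_pattern("foo")  # prints foo_%
puts like_pattern("50%")  # prints 50\%_%
```

With a helper like this, terms_for could pass like_pattern(prefix) as the bound value instead of interpolating the raw prefix.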

To get this to work we’d just need to fill up our table by indexing the products data. We’ll create a Rake task called search_suggestions:index to do this.

          /lib/tasks/search_suggestions.rake
        
namespace :search_suggestions do
  desc "Generate search suggestions from products"
  task :index => :environment do
    SearchSuggestion.index_products
  end
end

This Rake task calls an index_products method on the SearchSuggestion model which we’ll need to write.

          /app/models/search_suggestion.rb
        
def self.index_products
  Product.find_each do |product|
    index_term(product.name)
    product.name.split.each { |t| index_term(t) }
    index_term(product.category)
  end
end

def self.index_term(term)
  where(term: term.downcase).first_or_initialize.tap do |suggestion|
    suggestion.increment! :popularity
  end
end

Here we loop through all the products, using find_each so that a batch find is used and all the products aren’t loaded into memory at once. We then call an index_term method on each name and also split each product’s name at the spaces and index each separate word in the name too. We also index each product’s category. There’s probably a more efficient way to do this but what we’ve got will work well here. In the index_term method we look for a SearchSuggestion with the term that’s passed in and use first_or_initialize to find the matching record or build a new one if there isn’t one. We then call increment! on its popularity, which also saves the record, so that terms that are found more often appear nearer the top of the list. Instead of using this as a metric for popularity we could measure the popularity of products and have more popular products show up at the top of the list, or keep track of the search terms that are used most often and order the list based on this. We’ll run this Rake task now to index our products.

          terminal
        
$ rake search_suggestions:index
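To get a feel for what the task writes to the table, here is a rough sketch of the same counting logic in plain Ruby, with a Hash standing in for the search_suggestions table. The Product struct and the product data are made up for illustration.

```ruby
# Stand-in for the Product model; the names and categories are made up.
Product = Struct.new(:name, :category)

# Mirrors index_products and index_term, but counts into a Hash
# (term => popularity) instead of saving SearchSuggestion records.
def index_products(products)
  popularity = Hash.new(0)
  products.each do |product|
    terms = [product.name] + product.name.split + [product.category]
    terms.each { |term| popularity[term.downcase] += 1 }
  end
  popularity
end

products = [
  Product.new("Red Shirt", "Clothing"),
  Product.new("Blue Shirt", "Clothing"),
  Product.new("Food Mixer", "Kitchen")
]

# Most popular first, which is how terms_for orders its results.
index_products(products).sort_by { |term, count| [-count, term] }.each do |term, count|
  puts "#{term}: #{count}"
end
```

Note that a product with a single-word name is counted twice for that word, once as the full name and once as its only word.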

Now it’s the moment of truth. When we start typing in the search box we should see the matching suggestions and we do.

This has worked. We can see a list of search suggestions with the most popular terms at the top of the list.

Increasing Performance

Our search box now works but how well does it perform? The request to fetch matching search terms will be triggered frequently as users type search terms and we might be getting hundreds of searches made per second if our site gets busy. How can we measure this and how can we ensure that the results are returned as quickly as possible? One way is to use the Rack Mini-Profiler gem which we covered in episode 368. This is installed by adding it to the gemfile and running bundle.

          /Gemfile
        
gem 'rack-mini-profiler'

When we restart the server now each request will report the time it took to process and this even works for AJAX requests. As we type into the search field each request’s time will be added to the list that’s shown on the page.

Mini-Profiler now shows us how long each request takes to complete.

Each request only takes a few milliseconds, which isn’t bad, but let’s see if we can improve it. If we want to shave milliseconds off a request a good place to look is in the Rack middleware. Each request that comes in to our application goes through an entire middleware stack which can add overhead. To get around this we can apply the technique we demonstrated in episode 150. Instead of responding with a traditional Rails controller we can add some middleware near the top of the stack which will intercept requests for search suggestions. This way the request won’t go through the entire stack, although we will lose some functionality such as logging and cookies. We’ll put our custom middleware in an app/middleware directory.

          /app/middleware/search_suggestions.rb
        
class SearchSuggestions
  def initialize(app)
    @app = app
  end

  def call(env)
    if env["PATH_INFO"] == "/search_suggestions"
      request = Rack::Request.new(env)
      terms = SearchSuggestion.terms_for(request.params["term"])
      [200, {"Content-Type" => "application/json"}, [terms.to_json]]
    else
      @app.call(env)
    end
  end
end

Middleware can just be a plain Ruby class with an initialize method that takes a Rack application, which is what we’ve done here. In initialize we store the app in an instance variable for later use. We also have a call method which accepts a Rack environment and in here we intercept requests that match the search_suggestions path. Requests that don’t match are passed on to the application. This makes our middleware act as a kind of endpoint as it can handle certain types of request directly. For those requests that do match we return a 200 OK response, setting the content type to application/json and setting the body to the output from SearchSuggestion.terms_for for the term passed in. We don’t have access to the request params in the usual way so we build a Rack::Request object from the environment and use that to get the term param. Now we just need to add this middleware to our app in its configuration file. By using insert_before and passing in 0 as the first argument and the name of our middleware class as the second, our middleware will be added at the top of the stack.

          /config/application.rb
        
config.middleware.insert_before 0, "SearchSuggestions"
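Because a piece of middleware is just an object that responds to call, its behaviour can be checked outside Rails. The sketch below is a standalone imitation of the class above rather than the episode’s code: FakeSuggestions stands in for the SearchSuggestion model with made-up terms, a lambda stands in for the rest of the stack, and the query string is parsed with URI from the standard library instead of Rack::Request so that it runs without any gems.

```ruby
require "json"
require "uri"

# Stand-in for the SearchSuggestion model; the terms are made up.
class FakeSuggestions
  TERMS = %w[food football foot].freeze

  def self.terms_for(prefix)
    TERMS.select { |term| term.start_with?(prefix.to_s) && term != prefix }
  end
end

class SearchSuggestionsSketch
  def initialize(app)
    @app = app
  end

  def call(env)
    if env["PATH_INFO"] == "/search_suggestions"
      # Rack::Request would normally do this parsing for us.
      params = URI.decode_www_form(env["QUERY_STRING"].to_s).to_h
      terms = FakeSuggestions.terms_for(params["term"])
      [200, {"Content-Type" => "application/json"}, [terms.to_json]]
    else
      @app.call(env) # everything else carries on down the stack
    end
  end
end

# The rest of the stack, reduced to a lambda that always answers 404.
fallback = ->(env) { [404, {"Content-Type" => "text/plain"}, ["Not Found"]] }
stack = SearchSuggestionsSketch.new(fallback)

status, _headers, body = stack.call("PATH_INFO" => "/search_suggestions", "QUERY_STRING" => "term=foo")
puts status      # prints 200
puts body.first  # prints ["food","football","foot"]
```

A request for any other path falls through to the lambda, just as it would fall through to the Rails application in the real stack.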

After we restart the server again and try entering a search term now the suggestions should come back more quickly.

Search requests now complete around twice as quickly now that we’re using our middleware.

The search suggestions are now taking less than 2ms to return so we’ve decreased the response time quite substantially with our middleware. Keep in mind that we’re profiling our application in the development environment so these numbers may well be different in production. To get more accurate figures we could set up a staging environment with hardware similar to what we’ll be using in production. That said, a decrease in response time in development will generally be reflected in production, too.

The next performance concern that we’ll take a look at is the call to the database that returns the suggestions. This is where most of the time will be spent for these requests. The queries are currently fairly quick because the dataset is small, but as it grows this might become an issue. One way to solve this is through caching. We could set up a Memcached store like we did in episode 380 and cache the results of the terms_for method, like this:

          /app/models/search_suggestion.rb
        
def self.terms_for(prefix)
  Rails.cache.fetch(["search-terms", prefix]) do 
    suggestions = where("term like ?", "#{prefix}_%")
    suggestions.order("popularity desc").limit(10).pluck(:term)
  end
end

We use a combination of the string “search-terms” and the entered search phrase to create a unique key for each search term. The results are then cached using whatever store we specify. If we try this out now by searching for the same term more than once we now see response times of under a millisecond.

Caching makes our searches complete even more quickly.
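The behaviour we’re relying on here is that fetch only runs its block when the key is missing. The sketch below imitates that with a small in-memory class. SketchCache is a made-up stand-in for Rails.cache, and the array appended to QUERY_LOG stands in for the SQL query, so this only illustrates the idea rather than how Rails implements it.

```ruby
# A tiny in-memory stand-in for Rails.cache, written for illustration only.
class SketchCache
  def initialize
    @store = {}
  end

  # Returns the cached value for key, running the block only on a miss.
  def fetch(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end
end

CACHE = SketchCache.new
QUERY_LOG = [] # records each time the "database" is hit

def terms_for(prefix)
  CACHE.fetch(["search-terms", prefix]) do
    QUERY_LOG << prefix # stands in for running the SQL query
    %w[food football].select { |term| term.start_with?(prefix) }
  end
end

terms_for("foo")  # miss: runs the block
terms_for("foo")  # hit: served from the cache
terms_for("foot") # different key, so another miss
puts QUERY_LOG.inspect # prints ["foo", "foot"]
```

Each distinct prefix gets its own cache entry, so the first request for any given prefix still pays for the query.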

Using Redis To Improve Performance

This caching technique can be effective but if the user’s searching for a term that isn’t cached the response will take longer as not only does the database query need to be made, the results also need to be written to the cache store. At this point we might start looking for a different storage engine for the autocompletion to replace the SQL database. One such option is Redis, an advanced in-memory key-value store which is very fast and which has some features that will suit our needs perfectly, such as sorted sets. With these we can add members to a set and assign a score to each one so that the set is kept sorted. We can then use ZRANGE to fetch the members in order of score.

Let’s say that we want to index the term “food”. We could make a separate set for each partial match of that word (“f”, “fo” and “foo”) and add “food” to each one. When a user starts typing the word “food” we can look up the set named after what they’ve typed so far. If we index multiple terms that start with “foo” these would also be returned, sorted by their score. We can use ZREVRANGE to return them in reverse order of score so that the most popular are returned first. To add an item to a set we use ZADD but we can also use ZINCRBY to increment the score for an item that already exists. This will be useful to us as a way to increase the popularity of a given term.
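To make the idea of one sorted set per prefix more concrete, here is a rough in-memory imitation of the two commands we’ll rely on. SketchSortedSets is a made-up class that only mimics how ZINCRBY and ZREVRANGE behave for this use case. It isn’t a Redis client, and the terms being indexed are invented.

```ruby
# In-memory imitation of the Redis sorted-set commands used for the
# suggestions. It mirrors their behaviour for illustration only.
class SketchSortedSets
  def initialize
    # set name => { member => score }
    @sets = Hash.new { |hash, key| hash[key] = Hash.new(0) }
  end

  # ZINCRBY: adds the member with the given score, or raises its score.
  def zincrby(key, increment, member)
    @sets[key][member] += increment
  end

  # ZREVRANGE: members from highest to lowest score, between two indexes.
  def zrevrange(key, start, stop)
    return [] unless @sets.key?(key)
    ranked = @sets[key].sort_by { |member, score| [score, member] }.reverse
    ranked.map(&:first)[start..stop] || []
  end
end

sets = SketchSortedSets.new

# Index each term under every prefix that is shorter than the term itself.
%w[food food foot fool food foot].each do |term|
  1.upto(term.length - 1) { |n| sets.zincrby(term[0, n], 1, term) }
end

p sets.zrevrange("foo", 0, 9) # prints ["food", "foot", "fool"]
```

“food” was indexed three times, “foot” twice and “fool” once, so that is the order in which they come back for the prefix “foo”.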

To try this out we’ll need to install Redis. If you’re running OS X the easiest way to do this is to use Homebrew. Redis can then be installed by running brew install redis. We can then start it up by running this command.

          terminal
        
$ redis-server /usr/local/etc/redis.conf

We can now set up our application to work with Redis. First we’ll add the redis gem to our gemfile then run bundle to install it.

          /Gemfile
        
gem 'redis'

Next we’ll create an initializer file where we’ll set up our Redis connection. We’ll store this in a global variable so that we can access it easily in the rest of our app.

          /config/initializers/redis.rb
        
$redis = Redis.new

Now we just need to configure our SearchSuggestion model so that it no longer uses ActiveRecord but Redis instead.

          /app/models/search_suggestion.rb
        
class SearchSuggestion
  def self.terms_for(prefix)
    $redis.zrevrange "search-suggestions:#{prefix.downcase}", 0, 9
  end 

  def self.index_products
    Product.find_each do |product|
      index_term(product.name)
      product.name.split.each { |t| index_term(t) }
      index_term(product.category)
    end
  end

  def self.index_term(term)
    1.upto(term.length - 1) do |n|
      prefix = term[0, n]
      $redis.zincrby "search-suggestions:#{prefix.downcase}", 1, term.downcase
    end
  end
end

We fetch matching terms by calling $redis.zrevrange, passing in the name of a set and asking for the first ten items. The set’s name is made up from the string “search-suggestions” and the search term. In a production application we’d do some escaping here to make sure that the key is sanitized, but we won’t do that here. To do the indexing we make a separate set for each portion of the search term, starting with the first letter then adding a letter each time. We call $redis.zincrby to increment the score for each term, using the same set name we used to read the terms before.
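As a rough idea of what cleaning up the key might involve, the sketch below normalises the prefix before building the set name. The suggestion_key helper and the 30-character cap are assumptions made for illustration and aren’t part of the episode’s code. What counts as properly sanitized will depend on the application.

```ruby
MAX_PREFIX_LENGTH = 30 # hypothetical cap on how much typed text becomes part of a key

# Hypothetical helper: lowercases the prefix, trims it, collapses runs of
# whitespace and cuts it to a maximum length before building the set name.
def suggestion_key(prefix)
  cleaned = prefix.to_s.downcase.strip.gsub(/\s+/, " ")[0, MAX_PREFIX_LENGTH]
  "search-suggestions:#{cleaned}"
end

puts suggestion_key("  Red   SHIRT ") # prints search-suggestions:red shirt
```

If a helper like this were used in terms_for, index_term would need to build its keys the same way so that reads and writes agree on the set names.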

We’ll need to run rake search_suggestions:index again to index our records in Redis. Once we’ve restarted our server we can test our auto-completion. When we enter a search term now we get similar results to before and the speed is similar to what we were getting with caching.

To learn more about auto-completion with Redis take a look at the Soulmate gem. The source code was a big help in building the Redis solution for this episode. You might even consider using this gem in your applications if it suits your needs.