Daniel Errante's Profile

GitHub User: danoph

Site: http://graphicleftovers.com

Comments by Daniel Errante

@Jarron,

In Sphinx, delta indexing acts like a secondary index: you index a smaller number of documents (such as new documents added today) and your site searches the main and delta indexes together. You need to merge the delta index into the main index regularly (at least once a day or once a week) using a cron job or another periodically running task. The delta index doesn't imply real-time indexing either; you still have to rebuild the delta index periodically as well. When I last used Sphinx, about a year ago, they were experimenting with an update API that lets you update a document in the index directly, which would be real time.
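To make the main + delta idea concrete, here is a toy in-memory model in Python. It is only a sketch of the concept, not Sphinx's API: the class name `DeltaIndexedSearch` and all of its methods are made up for illustration, and the "index" is just a dictionary with naive whitespace tokenizing. (In real Sphinx, the merge step is done with the `indexer` tool's `--merge` option.)

```python
class DeltaIndexedSearch:
    """Toy model (hypothetical, not Sphinx's API) of a main index plus a delta index."""

    def __init__(self):
        self.main = {}     # doc_id -> text; large index, rebuilt rarely
        self.delta = {}    # doc_id -> text; small index, rebuilt often
        self.pending = {}  # documents saved but not yet in any index

    def save(self, doc_id, text):
        # Saving does NOT make a document searchable; it waits for the next delta rebuild.
        self.pending[doc_id] = text

    def reindex_delta(self):
        # Frequent periodic job: pull everything saved since the last run into the delta index.
        self.delta.update(self.pending)
        self.pending.clear()

    def merge(self):
        # Less frequent periodic job: fold the delta index into main, then empty the delta.
        self.main.update(self.delta)
        self.delta.clear()

    def search(self, word):
        # A search consults main + delta; the delta copy wins if both hold the same id.
        combined = {**self.main, **self.delta}
        word = word.lower()
        return sorted(
            doc_id for doc_id, text in combined.items()
            if word in text.lower().split()
        )


index = DeltaIndexedSearch()
index.save(1, "Sphinx delta indexing")
print(index.search("delta"))  # [] because the delta index has not been rebuilt yet
index.reindex_delta()
print(index.search("delta"))  # [1]
index.merge()
print(index.search("delta"))  # [1], now served from the main index
```

The first empty result is the point: a delta index narrows the delay, but the document only shows up after the next periodic rebuild.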

No problem @Dom. Ryan gives a nice overview in the screencast, but there are some awesome features that aren't covered, like date histogram facets and percolate queries, that are worth looking into. The date histogram facet can group a field's total by month, week, day, etc. For example, if you have a website with items that belong to users, you can group a user's items by month with the date histogram facet. And, from ES's website, percolate queries...

"Think of it as the reverse operation of indexing and then searching. Instead of sending docs, indexing them, and then running queries. One sends queries, registers them, and then sends docs and finds out which queries match that doc.

"As an example, a user can register an interest (a query) on all tweets that contain the word “elasticsearch”. For every tweet, one can percolate the tweet against all registered user queries, and find out which ones matched."
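The "reverse search" idea in that quote can be illustrated with a few lines of plain Python. This is a toy sketch only, not ElasticSearch's percolate API: `ToyPercolator`, `register`, and `percolate` are hypothetical names, and a "query" here is simply a set of words that must all appear in the document.

```python
class ToyPercolator:
    """Toy illustration (hypothetical, not the ES API): store queries, then match docs."""

    def __init__(self):
        self.queries = {}  # query name -> set of words that must all appear

    def register(self, name, *words):
        # Step 1: queries are registered up front, before any document arrives.
        self.queries[name] = {word.lower() for word in words}

    def percolate(self, doc_text):
        # Step 2: a document is checked against every registered query.
        # Naive whitespace tokenizing; punctuation is not handled.
        tokens = set(doc_text.lower().split())
        return sorted(
            name for name, words in self.queries.items()
            if words <= tokens  # every required word is present in the doc
        )


percolator = ToyPercolator()
percolator.register("alice", "elasticsearch")
percolator.register("bob", "sphinx", "delta")

print(percolator.percolate("Trying out elasticsearch today"))  # ['alice']
print(percolator.percolate("sphinx delta index merged"))       # ['bob']
print(percolator.percolate("nothing relevant here"))           # []
```

In the tweet example from the quote, each incoming tweet would be passed to `percolate` and the returned names are the users to notify.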

Also, the documentation for the tire gem is somewhat lacking/confusing in my opinion, so I had to do some extra research to find out how to add date boosting for queries and use different stemmers like KStem (KStem is less aggressive than Snowball and the other stemmers, if you need stemming). It's really easy to customize your index settings to optimize for faster queries or faster indexing, set up custom analyzers, and change your index schema.
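As a rough sketch of what a custom analyzer with KStem can look like, the Python snippet below builds the JSON body you would send when creating an index. The `kstem` token filter, `standard` tokenizer, and `lowercase` filter are real ElasticSearch names; the function name `build_index_settings` and the analyzer name `kstem_text` are made up for this example, and this is not necessarily the setup used on any particular site.

```python
import json


def build_index_settings(analyzer_name="kstem_text"):
    """Return an index-creation body defining a custom analyzer that stems with KStem.

    `analyzer_name` is an arbitrary, hypothetical label chosen for this sketch.
    """
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    analyzer_name: {
                        "type": "custom",
                        "tokenizer": "standard",
                        # Lowercase first, then apply the less aggressive KStem stemmer.
                        "filter": ["lowercase", "kstem"],
                    }
                }
            }
        }
    }


print(json.dumps(build_index_settings(), indent=2))
```

Fields then opt in to the analyzer by name in the index mapping; the exact mapping syntax depends on the ElasticSearch version, so it is left out here.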

@Thomas

Not that I know of. Maybe the WebSolr guys would be interested in providing an ElasticSearch solution?

@Thomas: I've used Sphinx, Solr and ElasticSearch in production and my favorite is ElasticSearch for a few reasons:

  1. Sphinx and Solr both index documents periodically (with a large delay or manual re-index by default), so it's more difficult to index documents in near real time. ElasticSearch has a default delay of one second.
  2. ElasticSearch has built-in cluster support for High Availability solutions. This is possible in Sphinx/Solr as well, but again more difficult to set up.
  3. ElasticSearch and Solr both have built-in "More Like This" queries, which is the main reason I had to stop using Sphinx. Sphinx doesn't have a built-in solution for providing "similar" documents.
  4. ElasticSearch is the easiest to set up and get running for development AND production. Sphinx comes close because you only have to edit one configuration file. Solr is the hardest because you need to optimize it for production (and not use the built-in Solr package provided with Sunspot, for example). There are lots of configuration files with Solr, and usually you would want to use Tomcat to serve the search engine in production, which requires a lot of configuration by itself. It's also extremely simple to set up multiple indexes (or version your indexes) in ElasticSearch.
  5. The ElasticSearch API is JSON-based, so you can integrate the search engine easily with any application. You don't NEED a wrapper library to get up and running fast.
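To illustrate point 5, here is a minimal sketch of a search request built with nothing but the Python standard library. The `_search` endpoint and the `query_string` query are part of ElasticSearch's JSON API; the index name `articles`, the `localhost:9200` address, and the helper name `build_search_request` are assumptions for the example.

```python
import json
import urllib.request


def build_search_request(query_text, index="articles", base_url="http://localhost:9200"):
    """Build (but do not send) a JSON search request for a hypothetical index."""
    body = {"query": {"query_string": {"query": query_text}}}
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("delta indexing")
print(request.get_method(), request.full_url)
# POST http://localhost:9200/articles/_search

# Sending it needs a running node, so that part is left commented out:
# with urllib.request.urlopen(request) as response:
#     hits = json.load(response)["hits"]["hits"]
```

The same request body works from any language that can send HTTP and encode JSON, which is why a wrapper library is a convenience rather than a requirement.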

After working with different search engines for a while now, I've found that most of them require lots of time tweaking and configuring to fit your needs. The biggest advantage of ElasticSearch is its built-in functionality, which in other engines usually requires lots of configuration. Less configuring means fewer opportunities for things to break and more time to spend concentrating on more important things like building your website! Also check out ElasticSearch's percolate queries...another cool feature that you may find useful.

Murat: Sunspot is a Ruby wrapper for Solr. Another search engine to check out is ElasticSearch (and the Tire gem for it). It has some nice features like distributed indices for high availability, and it's a lot easier to set up than Solr.

Hi Ryan... Are you using this on railscasts.com? I noticed there is a bug: when you are on an episode page, click on one of the tabs like comments (I'm assuming this is using pjax). Then type google.com into the address bar, and press the back button. It shows the pjax request code instead of reloading the full page.