
Nick Zadrozny's Profile

GitHub User: nz

Site: http://nick.zadrozny.com/

Comments by Nick Zadrozny


There are definite feature, performance, and scaling tradeoffs versus a dedicated search engine like Solr or Sphinx. I'll compare to Solr, since that's where my expertise lies (I'm a cofounder of Websolr).

Relative advantages for Postgres:

  • Reuse an existing service that you're already running instead of setting up and maintaining something new.
  • Much better search performance than SQL LIKE.

Relative advantages for Lucene/Solr:

  • Scale your indexing and search load separately from your regular database load.
  • More efficient indexing-time performance (microlithic segments vs. monolithic B-tree).
  • More flexible term analysis for things like accent normalizing, linguistic stemming, N-grams, markup removal.
  • Probably faster search performance for common terms or complicated queries.
  • Much better term relevancy ranking -- faster, and cheap to customize.
  • More flexible data model and better tolerance for a changing data model.
  • Phrase search.

Other Postgres TODOs that Lucene/Solr handle just fine: http://www.sai.msu.su/~megera/wiki/FTS_Todo

Clearly I think a dedicated search engine is the better option here. But at least if you're using LIKE, then Postgres full-text search is a clear upgrade :)
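
To make the LIKE comparison concrete, here's a rough ActiveRecord sketch of the difference, assuming a hypothetical Article model with a text body column:

# Substring match with LIKE: the leading wildcard forces a sequential scan,
# so every row's body gets compared against the pattern.
Article.where("body LIKE ?", "%#{params[:q]}%")

# Postgres full-text search: matches a tsquery against a tsvector, and can
# be backed by a GIN index on to_tsvector('english', body).
Article.where("to_tsvector('english', body) @@ plainto_tsquery('english', ?)", params[:q])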


Thanks for the plug, Linus! We're web developers at Websolr, and we also felt the same pain of setting up and monitoring Solr servers for our client projects. Hence the birth of Websolr.

For those interested in trying out Websolr, you can use the coupon RAILSCAST278 at signup for your first month free of our Silver plan. (Or $25 off any other.)


Real-time indexing is available in recent versions of Lucene, and it can be accessed in Solr if you are willing to roll up your sleeves and write some Java. We're beta testing our own flavor of it over at Websolr.


Sphinx is faster for indexing, certainly, because Solr has a lot more overhead built into that process. The 'client' software (Sunspot) has to fetch data from the database, format it into XML, then HTTP POST that XML back to Solr.
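
For context, here's roughly what drives that round trip on the Rails side (a sketch assuming a typical sunspot_rails setup with a Post model):

class Post < ActiveRecord::Base
  # Declares which fields get serialized into the documents sent to Solr
  searchable do
    text :title, :body
  end
end

# Reindexing loads records from the database in batches, builds the XML,
# and POSTs it to Solr over HTTP, then commits so the documents become searchable.
Post.solr_reindex   # or: rake sunspot:reindex
Sunspot.commit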

I like how Mat Brown (author of Sunspot) put it:

"In my unbiased opinion, Solr is better than Sphinx in every way, except Sphinx is faster at reindexing the entire data set, which you pretty much never need to do. Unless you use Sphinx."


There's some ongoing work to support the new, official Solr 3 spatial search APIs.


You're looking for the highlight method:

Post.search do
  keywords params[:q] do
    highlight :title, :body
  end
end
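
And to pull the highlighted fragments back out of the results, something like this should work (a quick sketch, untested):

search = Post.search do
  keywords params[:q] do
    highlight :title, :body
  end
end

search.hits.each do |hit|
  hit.highlights(:body).each do |highlight|
    # format yields each matched term so you can wrap it in your own markup
    puts highlight.format { |word| "<em>#{word}</em>" }
  end
end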