
Nick Zadrozny's Profile

GitHub User: nz

Site: http://nick.zadrozny.com/

Comments by Nick Zadrozny


There are definite feature, performance, and scaling tradeoffs versus a dedicated search engine like Solr or Sphinx. I'll compare to Solr, since that's where my expertise lies (I'm a cofounder of Websolr).

Relative advantages for Postgres:

  • Reuse an existing service that you're already running instead of setting up and maintaining something new.
  • Much better search performance than SQL LIKE.

Relative advantages for Lucene/Solr:

  • Scale your indexing and search load separately from your regular database load.
  • More efficient indexing-time performance (microlithic segments vs. monolithic B-tree).
  • More flexible term analysis for things like accent normalizing, linguistic stemming, N-grams, markup removal.
  • Probably faster search performance for common terms or complicated queries.
  • Much better term relevancy ranking -- faster, and cheap to customize.
  • More flexible data model and better tolerance for a changing data model.
  • Phrase search.

Other Postgres TODOs that Lucene/Solr handle just fine: http://www.sai.msu.su/~megera/wiki/FTS_Todo

Clearly I think a dedicated search engine is the better option here. But at least if you're using LIKE, then Postgres full-text search is a clear upgrade :)
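
To make the LIKE comparison concrete, here's a rough ActiveRecord sketch of the difference, assuming a hypothetical Article model with a text body column:

# Substring match with LIKE: the leading wildcard forces a sequential scan,
# so every row's body gets compared against the pattern.
Article.where("body LIKE ?", "%#{params[:q]}%")

# Postgres full-text search: matches a tsquery against a tsvector, and can
# be backed by a GIN index on to_tsvector('english', body).
Article.where("to_tsvector('english', body) @@ plainto_tsquery('english', ?)", params[:q])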


Thanks for the plug, Linus! We're web developers at Websolr, and we also felt the same pain of setting up and monitoring Solr servers for our client projects. Hence the birth of Websolr.

For those interested in trying out Websolr, you can use the coupon RAILSCAST278 at signup for your first month free of our Silver plan. (Or $25 off any other.)


Real-time indexing is available in recent versions of Lucene, and it can be accessed in Solr if you are willing to roll up your sleeves and write some Java. We're beta testing our own flavor of it over at Websolr.


Sphinx is faster for indexing, certainly, because Solr has a lot more overhead built into that process. The 'client' software (Sunspot) has to fetch data from the database, format it into XML, then HTTP POST that XML back to Solr.
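
For context, here's roughly what drives that round trip on the Rails side (a sketch assuming a typical sunspot_rails setup with a Post model):

class Post < ActiveRecord::Base
  # Declares which fields get serialized into the documents sent to Solr
  searchable do
    text :title, :body
  end
end

# Reindexing loads records from the database in batches, builds the XML,
# and POSTs it to Solr over HTTP, then commits so the documents become searchable.
Post.solr_reindex   # or: rake sunspot:reindex
Sunspot.commit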

I like how Mat Brown (author of Sunspot) put it:

"In my unbiased opinion, Solr is better than Sphinx in every way, except Sphinx is faster at reindexing the entire data set, which you pretty much never need to do. Unless you use Sphinx."


There's some ongoing work to support the new, official Solr 3 spatial search APIs.


You're looking for the highlight method:

Post.search do
  keywords params[:q] do
    highlight :title, :body
  end
end
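
And to pull the highlighted fragments back out of the results, something like this should work (a quick sketch, untested):

search = Post.search do
  keywords params[:q] do
    highlight :title, :body
  end
end

search.hits.each do |hit|
  hit.highlights(:body).each do |highlight|
    # format yields each matched term so you can wrap it in your own markup
    puts highlight.format { |word| "<em>#{word}</em>" }
  end
end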