#343 Full-Text Search in PostgreSQL pro
Postgres offers full-text searching right out of the box. This episode shows how to write queries from scratch, apply tools like Texticle and pg_search, and optimize performance through indexes.
- Download:
- source code
- mp4
- m4v
- webm
- ogv
How does each episode come out right at the time I need it? It's happened so many times it's a little frightening.
This is because right now you need tens of things, so it is not hard to match one of them. If today Ryan released another topic, probably you would tell the same. :) No mystics.
The veil is lifted.
(But yes, that's probably the right answer)
Alexander
I will respectfully challenge you on this one. Ryan's timing is absolutely uncanny! I've been agonizing over how to get rid of elastic search as I move my app to heroku which has only beta support via Bonsai.io for last week.
It also happened to be my Birthday on the 17th so the timing is especially auspicious!
Thank you Ryan for these amazing screencasts. When someone finally creates the RoR Hall of Fame, you will surely be one of the top nominees!
Nice!
As an aside, a tend to create a tsv column for any models that I'm going to be having a full text search on and using before update / insert triggers for maintenance.
This way, the expense of the
to_tsvector
happens only on writes and I don't need to have multiple indexes, just the one index on the whole vector.Obviously this is not the solution for a volatile schema as you'll need to recreate the triggers and recreate the tsv column if you want to search for more things, but one where the schema and search is fairly stable its ideal.
Thanks for the tip. One side effect of this approach is that you will not be able to weight certain columns AFAIK. Unless maybe you repeat them when building the tsvector?
The example I have is something like:
http://omarqureshi.net/articles/2010-12-25-tsearch2-for-rails-applications
I re-use the same function for inserts that i do for updates and just call it differently inside the trigger. The only real bit of duplication is the refresh function - which i have for existing content only.
Awesome!!! Thanks for the blog post.
Thanks for another informative screencast.
Not sure if this is only happening to me, but it appeared to clip the end by maybe only a few seconds. I don't think anything substantive was cut, just a heads-up
Same here
Yes - this happened to me too. It is also happening on older railscasts. I think that may important in solving this little issue. :)
Thank you for letting me know about this issue. Which browser are you using? Also try right-clicking the video and switching between HTML 5 and Flash, does that change it?
It seems to be working okay here in Safari.
Ubuntu 12.04 with Chromium 18.0.1025.151:
Works fine with HTML5 and cuts with Flash, so no worries :)
Works in HTML5, is cut off with Flash.
OSX Lion with Chrome 19.
Same OS & Browser, same results.
Happening for me in Chrome (both Lion and Snow Leopard). Switching to html5 seems to resolve the problem.
thanks Ryan
small correction, @14:42 you actually mean 5ms not 5s
Can't play on ipad :/
Excellent. Ryan, i have notice you havent touched integrating rails with social networks, any hopes for seen that in a future episode?
Yes, I hope to cover Facebook and Twitter integration in future episodes.
This is insane. No more MySQL for me!
Thank you Ryan. This is really amazing. PostgreSQL and Pg_search are life-saving for more web applications.
+1
At the end search gives 3 same articles named "Batman", a bit strange.
Not so much as Ryan seeded his database a thousand times... so there will be multiple instances of the articles he has, so probably a hundred or so "Batman" articles...
Good one. Under what circumstances would you recommend this over elastic search? The obvious benefit is that you don't need to install and keep up another piece of software just for indexing your content, but I'm concerned about the speed. Have anyone done a speed test between these two search systems? For example, could this handle searching data for a site like pinterest? To think of it, one could start using this and add elastic search later...but any suggestions? Thanks-
I recommend using Postgres for full text search to start off, and then you can always switch to something like Elastic Search later if you need to. It is just so convenient to keep the full text search in SQL.
You may run into problems if you combine Texticle with eager loading (as I have done). Depending on your particular queries, you could end up with an error like this:
The gory details (sans a simple solution) are here https://github.com/tenderlove/texticle/issues/71, here https://github.com/Casecommons/pg_search/issues/14, and here https://github.com/rails/rails/issues/950
Thanks for another great episode. I'm new to Rails, and I'm learning alot from RailsCasts.
This episode gets deeper in the weeds than I've ever gone with a database. I'm not familiar with such SQL commands as tsvector. Any suggestions as to a good book or online tutorial to help me get up to speed?
BTW, the problem I noted above does not afflict me if I use
pg_search
(so, thanks for the alternative, Ryan).Wow, I moved to pg over a year ago so I could use Heroku. I never even knew it would help me with search. I have some code to update. Thanks!
Great screencast, I was using texticle already but this has given me much more flexibility.
Has anybody else ran into an error when creating the unaccent extension? I'm running rails 3.2, I've had no problems elsewhere in this screencast for some reason or other its not liking this migration, I've triple check the syntax
I second this. I'm getting this exact same error running rails 3.21 and Ruby 1.9.2.
Brilliant screencast, BTW, Ryan. Extremely helpful stuff.
Make sure you are using the latest homebrew install of postgres (9.1.x) on Mac.
On Ubuntu/Debian you will probably need the postgres-contrib package installed for extensions.
Agh, that could be it, I'm running 9.0.4, I'll update any confirm whether or not this is the problem. Thanks
Confirmed, it was the postgres version, thanks for your help.
Ah but it's not going to work on Heroku until they upgrade from 8.3
Heroku has released a Public Beta of their Shared Database at Postgres 9.1 - you just need to add it to your app through the Heroku command. See https://devcenter.heroku.com/articles/heroku-shared-postgresql if you're able to use beta versions it should work.
http://www.postgresql.org/docs/9.1/interactive/contrib.html
"When building from the source distribution, these modules are not built automatically, unless you build the "world" target (see step 2)."
unaccent is not built by default for a postgres install. this might be the issue.
Does this mean if we switch to postgres from mysql, we no longer need solr or sphinx anymore?
There are definite feature, performance and scaling tradeoffs versus a dedicated search engine like Solr or Sphinx. I'll compare to Solr, since that's where my expertise lies (I'm a cofounder of websolr)
Relative advantages for Postgres:
SQL LIKE
.Relative advantages for Lucene/Solr:
Other Postgres TODOs that Lucene/Solr handle just fine: http://www.sai.msu.su/~megera/wiki/FTS_Todo
Clearly I think a dedicated search engine is the better option here. But at least if you're using LIKE, then Postgres full-text search is a clear upgrade :)
Hey Nick, thanks, this is useful to know!
Hwaaaa! Awsome timing. I just started adding Sphinx to my application, and I'm already using Postgres. For my particular needs, Postgres apparently offers more than enough. Thanks again very much Ryan.
Is it possible to have condition clause in searching like has clause in thinking sphinx ?
Such an amazong epsiode. I was reading on this full text searching for PostgreSQL before, but I didn't understand quite a lot of things, and this really cleared things up. Thank you so much :)
I was wrong, above. Even with
pg_search
, I run into problems with the ActiveRecord-generated queries. e.g.:https://github.com/Casecommons/pg_search/issues/14
From trolling the web, it looks like the problem in ActiveRecord won't actually be fixed until Rails 4.0.
Thanks, Brian. I was just banging my head against my keyboard. I guess I'll have to stick with crafting my search queries myself for now.
Do you think this solution could handle a system with a single model but millions (around 4M) of records?
I don't think so. Unless you only index fields with only a few words such as title. If you want to also index content field, I think you need a dedicated searching software such as sphinx or solr.
I might be missing something obvious here, but I ran through this tutorial on my own app and then downloaded and setup the Blog After from this screencast. In both cases, when I do a search with more than one word... say for example "Lex Luthor"...
I get the following. So then, one word search no problem.. words with spaces between =
PG::Error: ERROR: syntax error in tsquery: "Lex Luthor"
LINE 1: ...articles" WHERE (to_tsvector('english', name) @@ 'Lex Lutho...
^
: SELECT "articles".* FROM "articles" WHERE (to_tsvector('english', name) @@ 'Lex Luthor' or to_tsvector('english', content) @@ 'Lex Luthor') ORDER BY ts_rank(to_tsvector(name), plainto_tsquery('Lex Luthor'))
desc LIMIT 3 OFFSET 0
I get the same error on multi-word search too aesthetica
Did you ever figure this out? I've got the same issue?
Did you guys figure this out?
Ok. I think I've finally figured it out. You have to wrap all of the query strings in plainto_tsquery, so wherever you have 'Lex Luthor', you need it to be plainto_tsquery('Lex Luthor').
So I have a scope called organization_name I use like this:
HTH
Thank god, I was not getting any results back.
It's not a solution. You still losing features like :prefix and searching multiwords trought multicolumns. Searching results show you "Lex Luthor" where both words are in same column. But not, when "Lex" is in one and "Luthor" in another. Ryan show faster soulution, but with limitations.
I hate to say it but all the other solutions limit your functionality with using prefixes. The proper way is to follow the documentation here: http://www.postgresql.org/docs/9.1/static/textsearch-controls.html
Replace all spaces with "&"
You can do this in your search method:
query = query.split(" ").join(" & ")
This is an interesting tip. Super cool.
It's okay to start using the
search_field_tag
now. You'll get some benefits from mobile browsers and the correct styling in most desktops.We also can start using
phone_
andtelephone_
,number_
,url_
andemail_
helpers. On mobile device, for instance, these helpers queue the browser to pop up the correct virtual keyboard for the context.I was having issues with upper/lower case searching with pg_search. I then went and changed my query to match something like
to_tsvector('english', title) @@ to_tsquery('english', :q)
and it worked. Do you have any ideas why this would be the case?Ryan,
It doesn't seem the index updates when you update the data.
Has anyone seen how to do associated_against with UUIDs? I am getting this error.
PG::Error: ERROR: operator does not exist: uuid = character varying
LINE 1: ...M "articles" INNER JOIN "forums" ON "forums"."id" = "article...
^
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
Is anyone else getting odd behaviour?
For example, when I search 'police' I don't get any results, but when I search 'polic' I do! (Police exists in the content I am searching). It is also not returning some results at all - yet I know the word exists, whether the word is large or small.
When I start rails dbconsole it says I am using 9.1.4, so that seems ok, and the rest of the site seems to be functioning as usual too.
Do I need to rebuild indexes or something? Can anyone suggest anything else?
I had this problem too. Before, my queries looked like:
to fix it:
A big thank you - worked for me
+1 but I still cant make my search find any row when I search for "Police" writing only "polic". I have to write the full word exactly how it is. Any help?
A Newbie question. If a user typed a misspelled or mistype word e.g graphc rather than graphic, will it still give some result. What's the best way to handle that kind of situation as i can't find any plugin for "did you mean? graphic" functionality.
You might want to look at fuzzy matching
http://stackoverflow.com/questions/7730027/how-to-create-simple-fuzzy-search-with-postgresql-only
http://fuzzytolerance.info/using-fuzzy-search-in-postgresql/
another thing that people might want to play around with is partial match. searching for 'jun' with the blog-after code will get
where as the following will get Superman (content has the word june) and Robin (junior) as a result
also, there is an additional contrib, pg_trgm, that can help with partial searching:
http://www.postgresql.org/docs/9.1/interactive/pgtrgm.html
and can also do similarity.
like unaccent, pg_trgm is not built by default for a postgres install.
Hey Ryan,
Thanks for the great screencast! The solution that I found to your problem with slow rank or weighted search on multiple columns is to create a column to actually use for searching:
Example:
I've tested this solution with 200K records and it makes a significant difference.
Solution was found on "http://linuxgazette.net/164/sephton.html"
Thank you, thank you, thank you for this comment!
I was trying to index multiple columns without seeing any kind of performance gains using the pg_search gem. Sped up my full text query of 2000+ records from 1700 ms down to 60 ms.
Thanks!
Very useful, thanks. I also borrowed some code from this answer on Stack Overflow to make the updates lazier.
Any thought on how to do this in a multi tenant environment? Ideally something that would filter the search results by the current tenant.
?? I used the above solution from ziemekwolski and I still get nothing back when i search for say "anky" but I get results back when doing:
@dr_wallaches = DrWallach.where("callers like ? or monologue like ?","%#{params[:search][:terms]}%","%#{params[:search][:terms]}%")
It seems that indexing columns doesn´t have any effect when you are ALSO searching against associated tables
From PGSearch :
It is possible to search columns on associated models. Note that if you do this, it will be impossible to speed up searches with database indexes. However, it is supported as a quick way to try out cross-model searching.
Anyone knows about this?
I trying to use the dynamics search scope like this
But I'm getting wrong number of arguments (3 for 4) some help please
Could someone please help me? I'm stuck early on in this Railscast. I cannot seem to get searching with multiple words working (as at approx 02:15 in the video) even though I am using the exact same code as below:
Single word searches work fine, just not two or more. I'm not getting any errors, just no results are returned. Any ideas why this could possibly be?
Try bbarton's suggestion - it worked for us.
Also a couple of gotcha's was default scope and multiple includes.
We had to run the text_search as 'unscoped.where' and remove a couple of includes before any search results would displayed.
This app had one search query only on strings such as names, usernames, and email addresses (no full text searches). so we incorporated Ryan's Auto Completion:
http://railscasts.com/episodes/102-auto-complete-association-revised
It works great. Thank's Ryan!
Regarding the ranking and performance. Based on Postgres documentation you can store the results of the to_tsvector calls in a column, complete with weighting and then put your index on that specific column. Use a database trigger to keep that column up to date (containing as many search fields as you like and any other relevant data you'd like to throw in, author names, etc).
Calculating rank against that column will give you a relevance rank against everything you're searching for. I just implemented this approach on a huge site and the performance was lightning fast.
http://www.postgresql.org/docs/8.3/static/textsearch-features.html
See the "triggers and automatic updates" section.
Does somebody has any idea how to use multiple dictionaries?
Yea, you just specify the dictionary in your tsvector and tsquery calls. tsquery('english','my search terms') or tsquery('simple','search this stuff'). Same with tsvector('simple',field).
Thanks for this excellent railscast, Ryan!
pg_search is great, but setting up indexes and triggers can be tricky. Here's how I set it up, works great (like, 40 ms to search on 200K documents)
https://github.com/Casecommons/pg_search/issues/15#issuecomment-21884006
Thanks! This works great.
Here is a presentation that says instead of
use
But the >< operator is not recognized by PostgreSQL? (Version 9.2)
EDIT: ah, later on it says it's going to be a 9.3 feature
I'm running postgresl 9.2.1 on Mac OS X, installed by homebrew and Rails 3.2.12
The postrgresql lib directory has unaccent.so
But it is not installed by default and Ryan's migration did not seem to do anything.
What worked for me was:
I have a site using pg_search and GIN indexes per this screencast. Recently I changed the pg_search_scope on a model to use the "simple" dictionary instead of "english". So then, do I need to write a new migration to update the GIN indexes as well since those indexes refer to_tsvector "english"?
Simplified example below.
Hey, is it possible for pgSearch to search whole phrases? If I have a text called "On Hold" and a text called "On Hold - later", I want the search result to return only "On Hold" if the search term is "On Hold".
Is this possible? Thanks in advance!
So according to comments on this pull request
https://github.com/texticle/texticle/pull/25
Texticle is now called Textacular.
https://github.com/textacular/textacular
Where is your method "scoped" defined at in your def.self.text_search(query) ?
And exactly how did you define it?
I had the same problem. Turns out that
scoped
has been deprecated and the replacement iswhere(nil)
.This StackOverflow posting has more details.
Important note. Database extensions and index creation (at least for Postgres ones like those used in this episode) are supported since Rails 4 so there should be no need to switch to
structure.sql
as advised in this episode anymore.See this Pull Request merged over a year ago.