You are actually a mind reader !!! Thanks..
I would have to agree with elad. You must have a crystal ball for rails developers. Thanks for the episode.
I like this episode, it inspires me a lot, really thanks!
Firstly, thanks! I look forward to each week's episode.
I'm not sure what goes wrong, but this is the output of scrapitest.rb:
/usr/lib/ruby/gems/1.8/gems/scrapi-1.2.0/lib/scraper/reader.rb:216:in `parse_page': Scraper::Reader::HTMLParseError: Unable to load /usr/lib/ruby/gems/1.8/gems/scrapi-1.2.0/lib/scraper/../tidy/libtidy.dylib (Scraper::Reader::HTMLParseError)
from /usr/lib/ruby/gems/1.8/gems/scrapi-1.2.0/lib/scraper/base.rb:865:in `document'
from /usr/lib/ruby/gems/1.8/gems/scrapi-1.2.0/lib/scraper/base.rb:749:in `scrape'
from /usr/lib/ruby/gems/1.8/gems/scrapi-1.2.0/lib/scraper/base.rb:347:in `scrape'
gem list scrapi gives: 1.2.0
I will try to fix it and post my solution here.
It's a 64-bit problem.
This guy has a quick and dirty fix for it. I did not use it. I will wait until the gem is improved to not include tidy/tidylib.dll tidy/tidylib.so as they hopefully are in the middle of removing tidy/.
Regardless, it's quite a nice gem and another good episode.
For me, Nokogiri (http://github.com/tenderlove/nokogiri/tree/master) does the job pretty well.
If you do not want to replace FireBug with FireQuark, you can use the http://www.selectorgadget.com/ bookmarklet to interactively build a unique CSS selector for any element on a page. This also works in Safari.
Hey Ryan, I would actually suggest taking a look at Hpricot... I've done a few applications that required quite a bit of scraping (legal, of course :) ) and found Hpricot to be a stable, good solution. The Hpricot API also uses CSS selectors, which makes it feel familiar... Unless I'm missing something, is there anything that ScrAPI offers that Hpricot doesn't?
Thanks for another great 'cast. I have been scraping with mechanize/nokogiri and like it (except that installing is painful). I used (and still occasionally use) watir to scrape. As always, it is good to see a new tool.
@Henning, thanks for the selectorgadget link, just what I needed, because somehow FireQuark doesn't work on the latest Firefox, version 3.5.1.
This episode seems to freeze both audio and video around 3:12. Just thought you might want to know!
It seems to work fine on site, but it wasn't working when I tried to download from the RSS feed.
Excellent screencast, scraping with Ruby and scrAPI seems like so much fun. Can't wait to try it out tomorrow! Big thanks!
I get a problem, though, when running Product.fetch_prices:
You have a nil object when you didn't expect it!
You might have expected an instance of ActiveRecord::Base.
The error occurred while evaluating nil.
I'm with @RORgasm on Hpricot. It uses CSS or XPath selectors and has great block handling for multiple elements. It behaves similarly to jQuery on the traversal end.
As always, thanks for the great screencast!
@elad A new version of firequark (compatible with ff3.5) is here: http://www.quarkruby.com/assets/2009/8/4/firequark-3.5.2.xpi
I have the latest version of scrapi installed, however for some reason when I try running the scrapitest.rb code, I receive the following error:
NameError: uninitialized constant Scraper
at top level in scrapitest1.rb at line 5
Program exited with code #1 after 0.14 seconds.
I've been playing with scRUBYt and FireWatir lately, they've given me much joy. I'll be looking forward to your screencast on scRUBYt when you do get it to run. Salute!
Yes, how does hpricot compare to ScrAPI? How about their speeds in comparison?
And of course: THANK YOU very much for these ultra high quality screencasts. I am so glad that I have this very convenient source of know how.
One question I have:
So far I am not very comfortable with the concept of the Ruby symbols. Most of the time I know how to modify existing code, but so far I was not able to find a text explaining the concept of Ruby symbols sufficiently.
This text that I just found helps somewhat: http://glu.ttono.us/articles/2005/08/19/understanding-ruby-symbols
and comments 1 and 12 on that page indicate that there are special Rails aspects of Ruby symbols, but the article is not intended to cover Rails.
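On the symbols question, a small plain-Ruby sketch may help (no Rails involved). It illustrates the one property that usually makes symbols click: every occurrence of the same symbol is the same object, while equal strings are normally separate objects. The method names and the `product` hash are made up for illustration.

```ruby
# Two separately built strings with the same characters are equal,
# but they are two different objects in memory.
def same_string_object?
  String.new("price").equal?(String.new("price"))
end

# A symbol with a given name exists only once, however it is obtained.
def same_symbol_object?
  :price.equal?("price".to_sym)
end

# Symbols are commonly used as hash keys (made-up example data).
product = { :title => "Example", :price => 9.99 }

puts same_string_object?  # false
puts same_symbol_object?  # true
puts product[:price]      # 9.99
```

That is why symbols tend to be used as names or labels (hash keys, option names), and strings for text that gets displayed or modified.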
If you try it on Windows and get an error related to "libtidy.so", just delete the libtidy.so file in folder "ruby/lib/ruby/gems/1.8/gems/scrapi-1.2.0/lib/tidy". This will force scrapi to use "libtidy.dll" in the same folder...
@Ryan what is the advantage of using scrAPI? Why not just use Hpricot?
if you're getting this error:
./scrapi.rb:5: uninitialized constant Scraper (NameError)
from /opt/ruby-1.8.7-p72/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `gem_original_require'
then add this: gem 'scrapi'
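A minimal sketch of where that gem line might sit at the top of a script, assuming scrapi is installed through RubyGems. `load_scrapi` is a hypothetical helper used here only so that a missing gem gives a readable message. If the script itself is named scrapi.rb, as in the trace above, a plain require 'scrapi' may pick up the script instead of the gem; that reading is a guess, not something confirmed in this thread.

```ruby
require 'rubygems' # needed explicitly on Ruby 1.8; harmless on later versions

# Hypothetical helper: activate the scrapi gem first, then require it.
def load_scrapi
  gem 'scrapi'      # puts the installed gem's lib directory on the load path
  require 'scrapi'  # should now resolve to the gem's own scrapi.rb
  true
rescue LoadError => e # Gem::LoadError is a subclass of LoadError
  warn "scrapi could not be loaded: #{e.message}"
  false
end

puts load_scrapi ? "scrapi loaded" : "scrapi not available"
```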
Hi Ryan, love the screencasts, don't love the spam getting through your filters in the comments.
Hopefully you can improve this and share how you did it.
Interesting topic, even though I am not really impressed by scrapi. I hope to see some alternatives (maybe hpricot).
Thanks for another great screencast!
Is it possible to scrape password protected pages, or will ScRUBYt! be required?
In my case it works all fine.
And pls kill this spam :-)
Not bad, the API is very interesting for the internet.
Guys having problems with libtidy.dylib:
Scrapi works well, but there is a problem with UTF-8 characters, e.g. German "Umlaute" like ö, ä, ü.
Scrapi messes them up.
In the scrape cheat sheet there is a hint that one can call:
where :parse_options should have something to do with tidy, i.e. scrapi should be able to deal with utf-8 characters.
Has anybody done this?
I don't see how to use :parse_options.
Please post an example of working code which uses those :parse_options! Thanks.
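This is not the :parse_options example asked for, only a plain-Ruby sketch (Ruby 1.9 or later) of one common way umlauts get mangled: UTF-8 bytes being decoded as ISO-8859-1. Whether that is what happens inside scrapi or Tidy here is an assumption. Both helper names are made up.

```ruby
# Hypothetical helpers, not part of scrAPI or Tidy.

# Simulate the damage: treat UTF-8 bytes as if they were ISO-8859-1 text.
def misread_as_latin1(text)
  text.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

# Undo it: re-encode to ISO-8859-1 to get the original bytes back,
# then label those bytes as UTF-8 again.
def repair_latin1_misread(text)
  text.encode(Encoding::ISO_8859_1).force_encoding(Encoding::UTF_8)
end

original = "\u00F6\u00E4\u00FC" # ö, ä, ü
garbled  = misread_as_latin1(original)

puts garbled                                    # each umlaut becomes two characters
puts repair_latin1_misread(garbled) == original # true
```

If the scraped text shows this two-characters-per-umlaut pattern, the page was probably read with the wrong encoding somewhere along the way.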
use tidy_ffi, works like a charm
I can't get scrapi running under Snow Leopard, as it seems that not only do you need a new libtidy.dylib, you also need a new .so, which I can't seem to find anywhere!
I am not sure why scrapi requires all these binaries and doesn't just use what is installed.
Excellent screencast, thanks!
Eduardo M. - Internal Development
very interesting once again!