I would have to agree with elad. You must have a crystal ball for rails developers. Thanks for the episode.
I like this episode; it inspires me a lot. Thanks!
Hi,
Firstly, thanks! I look forward to each week's episode.
I'm not sure what's going wrong, but this is the output of scrapitest.rb:
/usr/lib/ruby/gems/1.8/gems/scrapi-1.2.0/lib/scraper/reader.rb:216:in `parse_page': Scraper::Reader::HTMLParseError: Unable to load /usr/lib/ruby/gems/1.8/gems/scrapi-1.2.0/lib/scraper/../tidy/libtidy.dylib (Scraper::Reader::HTMLParseError)
from /usr/lib/ruby/gems/1.8/gems/scrapi-1.2.0/lib/scraper/base.rb:865:in `document'
from /usr/lib/ruby/gems/1.8/gems/scrapi-1.2.0/lib/scraper/base.rb:749:in `scrape'
from /usr/lib/ruby/gems/1.8/gems/scrapi-1.2.0/lib/scraper/base.rb:347:in `scrape'
from scrapitest.rb:10
gem list scrapi gives: 1.2.0
I will try to fix it and post my solution here.
Back again,
it's a 64-bit problem.
This guy has a quick-and-dirty fix for it, though I did not use it myself. I will wait until the gem no longer bundles tidy/libtidy.dll and tidy/libtidy.so; hopefully the maintainers are in the middle of removing tidy/.
http://anti.teamidiot.de/nusse/2009/05/scrapi_libtidyso_fail/
Regardless, it's quite a nice gem and another good episode.
For me, Nokogiri (http://github.com/tenderlove/nokogiri/tree/master) does the job pretty well.
If you do not want to replace Firebug with FireQuark, you can use the http://www.selectorgadget.com/ bookmarklet to interactively build a unique CSS selector for any element on a page. It also works in Safari.
Hey Ryan, I would actually suggest taking a look at Hpricot. I've built a few applications that required quite a bit of scraping (legal, of course :) ) and found Hpricot to be a stable, good solution. The Hpricot API also uses familiar CSS selectors for convenience. Unless I'm missing something, is there anything ScrAPI offers that Hpricot doesn't?
Thanks for another great 'cast. I have been scraping with Mechanize/Nokogiri and like it (except that installing is painful). I used to use Watir for scraping, and still occasionally do. As always, it is good to see a new tool.
I would love to see a 'cast where you navigate and scrape a site that uses JavaScript.
@Henning, thanks for the SelectorGadget link, just what I needed, because somehow FireQuark doesn't work on the latest Firefox version (3.5.1).
Hey Ryan,
This episode seems to freeze both audio and video around 3:12. Just thought you might want to know!
Clarification:
It seems to work fine on the site, but it wasn't working when I downloaded it from the RSS feed.
Excellent screencast; scraping with Ruby and scrAPI seems like so much fun. Can't wait to try it out tomorrow! Big thanks!
Great screencast.
I get an error, though, when running Product.fetch_prices:
You have a nil object when you didn't expect it!
You might have expected an instance of ActiveRecord::Base.
The error occurred while evaluating nil.[]
Any clues?
Bates, you listened .... :)
One more request: I hope you can do another installment with Nokogiri and Mechanize.
Thanks
Chetan
I'm with @RORgasm on Hpricot. It supports CSS and XPath selectors and has great block handling for multiple elements. It behaves similarly to jQuery when traversing.
As always, thanks for the great screencast!
@elad A new version of Firequark (compatible with FF 3.5) is here: http://www.quarkruby.com/assets/2009/8/4/firequark-3.5.2.xpi
I have the latest version of scrapi installed; however, for some reason, when I try to run the scrapitest.rb code I receive the following error:
NameError: uninitialized constant Scraper
at top level in scrapitest1.rb at line 5
Program exited with code #1 after 0.14 seconds.
Any ideas?
I've been playing with scRUBYt and FireWatir lately, they've given me much joy. I'll be looking forward to your screencast on scRUBYt when you do get it to run. Salute!
Yes, how does Hpricot compare to ScrAPI? And how do their speeds compare?
And of course: THANK YOU very much for these ultra high quality screencasts. I am so glad that I have this very convenient source of know how.
One question I have:
So far I am not very comfortable with the concept of Ruby symbols. Most of the time I know how to modify existing code, but I haven't been able to find a text that explains Ruby symbols well enough.
...
This article, which I just found, helps somewhat: http://glu.ttono.us/articles/2005/08/19/understanding-ruby-symbols
Comments 1 and 12 on that page indicate that Ruby symbols have some Rails-specific aspects, but the article is not intended to cover Rails.
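On the symbols question, here is a minimal plain-Ruby sketch (no Rails needed; the `product` hash and its keys are made up for illustration) showing the two properties that matter most day to day: a symbol such as `:price` is always the same object, and a symbol key is a different hash key from the matching string. One Rails-specific wrinkle, possibly related to those comments, is that Rails gives `params` indifferent access (via ActiveSupport's `HashWithIndifferentAccess`), so there `params[:id]` and `params["id"]` return the same value; a plain Ruby hash does not.

```ruby
# Every occurrence of :price is the very same object,
# while each String.new call builds a brand-new string.
symbol_identical = :price.equal?(:price)
string_identical = String.new("price").equal?(String.new("price"))
p symbol_identical    # prints true
p string_identical    # prints false

# Symbols and strings convert back and forth, but never compare equal.
p :price.to_s         # prints "price"
p "price".to_sym      # prints :price
p(:price == "price")  # prints false

# That difference matters for hash keys: :title and "title" are distinct keys.
product = { :title => "Widget", :price => 9.99 }
p product[:title]     # prints "Widget"
p product["title"]    # prints nil
```

Because a symbol is created once and reused, it is a cheap, unambiguous label, which is why Ruby code tends to use symbols for hash keys and option names.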
If you try it on Windows and get an error related to "libtidy.so", just delete the libtidy.so file in the folder "ruby/lib/ruby/gems/1.8/gems/scrapi-1.2.0/lib/tidy". This forces scrapi to use "libtidy.dll" in the same folder.
@Ryan, what is the advantage of using scrAPI? Why not just use Hpricot?
If you're getting this error:
./scrapi.rb:5: uninitialized constant Scraper (NameError)
from /opt/ruby-1.8.7-p72/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `gem_original_require'
then add the gem 'scrapi' line, so the top of your script reads:
require 'rubygems'
gem 'scrapi'
require 'scrapi'
-mark
Hi Ryan, love the screencasts, don't love the spam getting through your filters in the comments.
Hopefully you can improve this and share how you did it.
Thanks
Interesting topic, even though I am not really impressed by scrapi. I hope to see some alternatives (maybe Hpricot).
Thanks for another great screencast!
Is it possible to scrape password-protected pages, or will ScRUBYt! be required?
Thanks.
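For pages behind HTTP Basic authentication, a minimal sketch with Ruby's standard `net/http` might look like this. `basic_auth_request` and `fetch_protected_html` are hypothetical helper names, the approach does not cover HTML login forms (those need cookie handling, e.g. Mechanize), and whether scrapi will accept the fetched HTML string instead of a URI is an assumption to check against its documentation.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: build (but don't send) a GET request carrying
# HTTP Basic credentials for the given URL.
def basic_auth_request(url, user, password)
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri.request_uri)
  request.basic_auth(user, password) # sets the "Authorization: Basic ..." header
  [uri, request]
end

# Hypothetical helper: send the request and return the page's HTML as a string.
def fetch_protected_html(url, user, password)
  uri, request = basic_auth_request(url, user, password)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    response = http.request(request)
    response.value # raises unless the server answered with a 2xx status
    response.body
  end
end
```

Building the request separately from sending it makes the credential handling easy to check without touching the network.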
For anyone having problems with libtidy.dylib:
http://exceptionz.wordpress.com/2009/11/03/scrapi-on-snow-leopard/
Scrapi works well, but there is a problem with UTF-8 characters, e.g. German umlauts ("Umlaute") like ö, ä, ü.
Scrapi mangles them.
In the scrape cheat sheet there is a hint that one can call:
myscraper=scraper.scrape(uri, :parse_options)
where :parse_options apparently has something to do with Tidy, which suggests scrapi should be able to handle UTF-8 characters.
Has anybody done this ?
I don't see how to use :parse_options.
Please post an example of working code which uses those :parse_options! Thanks.
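I can't confirm what `:parse_options` accepts, so this is a different, swapped-in approach rather than scrapi's API: re-label the raw bytes with the page's real charset and transcode them to UTF-8 before handing the HTML to the parser. It assumes Ruby 1.9 or later (`String#encode`; Ruby 1.8, as in the paths above, would need Iconv instead), and `charset_from_content_type` and `to_utf8` are hypothetical helpers.

```ruby
# Hypothetical helper: pull the charset out of a Content-Type header value,
# falling back to UTF-8 when none is declared.
def charset_from_content_type(content_type, default = "UTF-8")
  content_type.to_s[/charset=["']?([\w.:-]+)/i, 1] || default
end

# Hypothetical helper: tag the raw bytes with their real charset, then
# transcode to UTF-8 (Ruby 1.9+). Bytes that cannot be converted become "?".
def to_utf8(raw_html, source_charset)
  raw_html.dup
          .force_encoding(source_charset)
          .encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "?")
end

# ISO-8859-1 bytes for "Grüße aus München", as a server might send them.
latin1_bytes = "Gr\xFC\xDFe aus M\xFCnchen".b
charset = charset_from_content_type("text/html; charset=ISO-8859-1")
puts to_utf8(latin1_bytes, charset) # prints Grüße aus München
```

The key point is that umlauts usually break when ISO-8859-1 bytes are treated as UTF-8 (or vice versa); converting once, up front, means everything downstream sees valid UTF-8.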
I can't get scrapi running under Snow Leopard: it seems that not only do you need a new libtidy.dylib, you also need a new .so, which I can't find anywhere!
I am not sure why scrapi requires all these binaries and doesn't just use what is installed.






