#191
Dec 07, 2009

Mechanize

Mechanize extends the power of Nokogiri, allowing you to interact with multiple pages on a site: click links, submit forms, and so on.
Tags: tools
Download (22.8 MB, 10:16)
alternative download for iPod & Apple TV (14.8 MB, 10:16)

Note: I’ve changed the password I used in the Ta-Da List account for this episode, so the supplied code will not work without modification.

Resources

sudo gem install mechanize
# script/console
# Array#split is an ActiveSupport extension, which script/console loads
puts Readline::HISTORY.entries.split("exit").last[0..-2].join("\n")

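# lib/tasks/product_prices.rake
desc "Import wish list"
task :import_list => :environment do
  require 'mechanize'
  # Mechanize 1.0 and later name the class Mechanize; older versions
  # (current when this episode was recorded) used WWW::Mechanize.
  agent = Mechanize.new
  
  # Log in: fill in the password on the first form of the login page.
  agent.get("http://railscasts.tadalist.com/session/new")
  form = agent.page.forms.first
  form.password = "secret"
  form.submit
  
  # Follow the "Wish List" link and create a Product for each item.
  # page.search is Nokogiri's search, so any CSS selector works here.
  agent.page.link_with(:text => "Wish List").click
  agent.page.search(".edit_item").each do |item|
    Product.create!(:name => item.text.strip)
  end
end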

36 comments

1. Ryan Dec 07, 2009 at 00:19

I've been wanting to use Mechanize for a while! Great to see a screencast on it!

Thanks ryanb!


2. Steve Dec 07, 2009 at 00:21

Sweet! Thanks again for your consistent screen casts!
Is that your real wish-list?


3. chris Dec 07, 2009 at 00:23

Hi Ryan,
Love your work! Small thing, though: you promised to put the line for getting the console history in the show notes, and I don't see it...


4. Ryan Bates Dec 07, 2009 at 00:48

@Steve, nope, not a real wish list, so don't get me anything from it. ;)

@Chris, oops, it's up there now.


5. Richard Dec 07, 2009 at 02:27

Looks awesome, and I think this will save me a lot of time. Does this work with JavaScript at all, or only with plain websites? Some stupid websites submit their forms through JavaScript only; if it could handle that, it would be amazing...


6. Sam Millar Dec 07, 2009 at 03:43

That irb copy function is pretty neat, thanks for the tip!

It was very cool when it all came together and you showed how easy it was to add products to your site and update their prices almost instantly. It put a smile on my face.


7. Godfrey Chan Dec 07, 2009 at 04:46

@Sam

That was exactly my reaction. It reminds me of how I felt when I first watched the "create a blog in 15 minutes" screencast :)


8. Jeff Tucker Dec 07, 2009 at 06:48

@Sam, @Godfrey
I felt the same way! I couldn't help but laugh -- I worked with Mechanize on Python a while back and it just did not seem that easy :)

@Ryan
You have a real talent for presenting this stuff. I really appreciate the time you put into it!


9. mrbrdo Dec 07, 2009 at 07:22

I'll just leave this here as well (I already posted it in #190).

If anyone needs to scrape AJAX-heavy sites where HTML parsers just don't cut it, you might want to take a look at the HtmlUnit library. Sadly, it's only available for Java, but it's the only JavaScript-capable library that I found.
Most of the time you wouldn't need this, but if a site uses a lot of AJAX and some obfuscated JavaScript, and changes a lot, it might be the only way.


10. adrenally fatigued Dec 07, 2009 at 08:24

Ryan does it again. This is exactly what I need for an app I'm working on now.

Is there any way to deal with captchas in Mechanize? It seems like more and more pages, particularly anything involving form submission, have captchas.

Because someone is sure to bring up the black hat type stuff that can be done with mechanize, I assure you my intentions are purely white hat.


11. Chip Castle Dec 07, 2009 at 08:39

I put this at the bottom of my ~/.irbrc file to quickly access this command history:

def hist
  puts Readline::HISTORY.entries.split("exit").last[0..-2].join("\n")
end

HTH,
Chip

Need to invoice? http://invoicethat.com


12. eltados Dec 07, 2009 at 14:47

Maybe this is a stupid question, but wouldn't this be possible to do with Webrat?


13. lmjabreu Dec 07, 2009 at 17:37

Awesome screencasts, thanks a lot.


14. Kieran P Dec 07, 2009 at 20:40

I'm trying to use the history command but I'm getting:

NoMethodError: private method `split' called for #<Array:0x10122ea28>

This commit from your dotfiles works somewhat better, but it lists everything in my .irb_history. Is that because the history doesn't contain any exit lines to split at?

http://github.com/ryanb/dotfiles/commit/78c149fb7e9ac1f2d89ed3a7518aee293b63b747


15. Peter Dec 07, 2009 at 20:47

How would you get access to the Nokogiri object of a given web page once you are "authenticated", so you can scrape data other than forms and links from the page?


16. Peter D Dec 08, 2009 at 09:18

I've been using Mechanize to scrape web content for a while and it's extremely convenient.

What I noticed, though, is that if you keep the agent alive for multiple requests (like looping through pagination), it starts consuming more and more memory. My guess is that the Mechanize agent is storing the previously loaded pages even if you clear the variable holding a given page.

Anyone know how to deal with this? Can you clear the 'cache' so to speak?


17. Simon Cookie Lover Dec 09, 2009 at 03:17

Thanks Ryan, great one !

I'd like to ask for some help...

Actually, I need to forge a cookie. My Rails application is a kind of proxy between the user and another webapp, and I need to preserve the session with that webapp throughout the entire user session on MY Rails app.

Hence, I'm creating a new agent object for each new request, and I need to re-create the cookie with the previous session ID.

I'm struggling with Mechanize::CookieJar and stuff, but no luck yet...

Any ideas?


18. Simon Cookie Lover Dec 09, 2009 at 05:59

Wow, lots of spam despite your Rails captcha...

Anyway, I found a way to hack around my problem:

agent.cookie_jar.jar['mydomain'] = {'/' => {'PHPSESSID' => WWW::Mechanize::Cookie.new('PHPSESSID', previous_session_id)}}

However, #jar is not documented... I wonder if it will keep working after upgrades...


19. Susann Dec 09, 2009 at 06:18

Thanks. That's cool


20. Peter D Dec 09, 2009 at 08:54

For anyone struggling with Mechanize's memory usage like I was, you can limit the maximum number of pages it retains in memory by setting the agent's "max_history", for example:

agent = WWW::Mechanize.new
agent.max_history = 20


21. Juan Medín Piñeiro Dec 09, 2009 at 21:06

Great. You are showing how powerful Mechanize is and a "first touch" of it in a clear way.

Thanks for the screencasts. The quality is top-notch.


22. pankaj Dec 10, 2009 at 09:25

Great work, Ryan.
It would be great if you could do a screencast on making screencasts. Until then, could you share the tools you use to make them?

Thanks


23. Kieran P Dec 10, 2009 at 17:45

Looks like SPAM is taking over again :-( Was nice there for a while.

Have you considered a plugin like Rakismet?

I'm going to fork the Railscasts repo, make some changes to hide comments if they appear spammy, until you flag them as ok. That way, even if spam still gets entered, at least it won't show up.


24. sh Dec 11, 2009 at 12:41

Thanks


25. Lin He Dec 11, 2009 at 17:12

Great episode once again! Thanks so much!


26. for-sec Jan 02, 2010 at 06:25

Thanks a lot


27. Nils Jan 04, 2010 at 05:32

Well done, I like it. Nice to get acquainted with what Mechanize is and how to use it!

Thanks a lot!


28. austin_web_developer Jan 05, 2010 at 09:09

Just a note ...

If you're having problems getting a form to submit, try using the click_button form method instead.

http://mechanize.rubyforge.org/mechanize/WWW/Mechanize/Form.html


29. Chong Km Jan 05, 2010 at 21:22

I get an error when I try out the history code.

    Readline::HISTORY.entries.split("exit").last[0..-2]

which results in:

NoMethodError: private method `split' called for #<Array:0x1500554>

But it works if I do this:

    Readline::HISTORY.entries[0..-2].join("\n")


30. AaronH Jan 08, 2010 at 15:46

Nice history tip, Ryan.

I created a method in my .irbrc that lets me easily get the history of this or previous sessions, and limit it to a certain number of lines, like so:

hits :all, 25

You can get the code at http://gist.github.com/272588


32. Branden Jan 13, 2010 at 02:05

Is it possible to use Mechanize to fetch images and such from a website, or is there a better gem/plugin suited for this?


35. Steve Feb 06, 2010 at 12:21

Is there a way to use Mechanize to direct users to their email provider, plug their username into the login form, and tab down to the password field? Essentially, they sign up, and I redirect them to their email account, where they just enter their password since I already provided the username.


37. Michael Sepcot Feb 14, 2010 at 11:04

Array#split is an ActiveSupport extension. If you want to use the history code in plain irb, be sure to require 'active_support/core_ext/array'.
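Without ActiveSupport, the history one-liner can also be reproduced in plain Ruby. A minimal sketch: take the entries after the last "exit", then drop the final entry (the command that printed the history). The method name current_session_commands and the sample history are hypothetical.

```ruby
# Plain-Ruby equivalent of entries.split("exit").last[0..-2]:
# the commands typed since the last "exit", minus the final (current) command.
def current_session_commands(entries)
  last_exit = entries.rindex("exit")
  session = last_exit ? entries[(last_exit + 1)..-1] : entries
  session[0..-2]
end

# Made-up history: one earlier session, then the current one.
history = ["Product.count", "exit", "agent = Mechanize.new", "agent.page.forms", "puts history"]
puts current_session_commands(history).join("\n")
```

Inside irb, something like puts current_session_commands(Readline::HISTORY.entries).join("\n") would then print the current session's commands.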


38. Horace Ho Mar 30, 2010 at 20:12

On an ASP.NET-generated page, the form's submit code automatically fills in two form variables before submitting:

theForm.__EVENTTARGET.value = eventTarget;
theForm.__EVENTARGUMENT.value = eventArgument;

How do I set them in Mechanize? Thanks!

additional info: http://www.xefteri.com/articles/show.cfm?id=18


39. liechtenstein May 25, 2010 at 03:00

Thanks, cool Mechanize

