Show HN: MechanicalSoup, Python library for automating interaction with websites

[+] flexd|11 years ago|reply

An alternative to this is RoboBrowser [1], which is also based on requests + BeautifulSoup4 and seems a lot more mature.

[1] http://robobrowser.readthedocs.org/en/latest/readme.html

[+] rahimnathwani|11 years ago|reply

Note: this fails when you try to install using pip 1.1, the version that comes with Debian wheezy. If you have this problem, it's easy to work around it for your specific virtualenv, without changing the rest of the system:

- mkvirtualenv whatever

- pip install pip --upgrade

- pip install robobrowser

[+] mattme|11 years ago|reply

Haha! If I'd known, I would have used that myself.

[+] wc-|11 years ago|reply

Mechanize is a core part in quite a few of my projects lately and the fact that it hasn't been modified in over 2 years has been very worrisome.

There are lots of edge cases out there on websites. Mechanize has built up years of fixes and workarounds for these, I hope that MechanicalSoup can learn from these the easy way rather than waiting to make the same mistakes again.

I also hope that this repo grows into a bigger community of support, not just one person contributing (who could leave / get bored at any time). Looking forward to following this!

[+] danso|11 years ago|reply

Er...that's just the Python Mechanize, right? Ruby's Mechanize has been regularly updated and patched, though I can't say I've used it to the extent that I've run into infuriating edge cases: https://github.com/sparklemotion/mechanize

[+] lazerwalker|11 years ago|reply

I personally find Capybara[0] to be the happy medium for web scraping, if Python isn't a hard requirement. It has a simple API, like MechanicalSoup, but it can also easily be configured to use Selenium, node-webkit, or any other browser you want for full proper JS evaluation.

[0]http://github.com/jnicklas/capybara

[+] actionscripted|11 years ago|reply

Unfortunately there isn't a Capybara library for Python. There are comparable packages like Lettuce and now MechanicalSoup (to a certain extent).

[+] Deusdies|11 years ago|reply

This is fantastic. I've used python mechanize in some very large projects and it was very frustrating - their lack of documentation and, well, the fact that it's complete "abandonware".

I've had the mechanize repository cloned for a year now, planning to do something with it, but never got around to it. Looks like MechanicalSoup just got themselves a new contributor!

[+] diminoten|11 years ago|reply

Sell me MechanicalSoup over Selenium.

[+] wc-|11 years ago|reply

Selenium seems very heavy-weight to me (granted I have only used selenium server). If you don't need to interpret javascript after a page loads then you might be able to use mechanize. In my experience I've gotten better performance and a higher ease of development with mechanize over selenium or other "full" headless browsers. Different tools for different jobs I suppose, I just tend to go for the smallest tool first.

edit: replace mechanize with mechanicalsoup in the above paragraph, they are aiming to solve the same problems in the same way.

[+] seanp2k2|11 years ago|reply

Personally I really like selenium + PhantomJS (headless WebKit browser) since it allows you to do things like automate real user interactions in your tests, then run the test suite on your CI: http://www.realpython.com/blog/python/headless-selenium-test...

Also interesting if you're going down this road is CasperJS which is basically the same thing for JavaScript, and Velocity ( https://github.com/xolvio/velocity ) which is a test runner for Meteor that seems to run all your tests constantly and give you ~real-time feedback for TDD.

Lastly, there are many things to tie in Cucumber-style testing specs with Capybara and these other tools if you're into that.

[+] auvrw|11 years ago|reply

they're for different use cases.

mechanicalsoup is more like zombie.js ( http://zombie.labnotes.org/ ) than selenium, chromedriver, or some other webdriver ( https://dvcs.w3.org/hg/webdriver/raw-file/default/webdriver-... ) implementation in that it emulates browser functionality from within a runtime, making http requests and parsing the response directly from the python (or node or ruby or whatever) runtime rather than communicating with a browser that, in turn, makes requests to the website.

the advantages of these "emulated" browsers are that tests run faster and are easier to set up. the disadvantage is that they don't fully duplicate browser functionality, particularly for client-side javascript. i think zombie might be able to run some javascript since it's in node, but mechanical soup appears ( https://github.com/hickford/MechanicalSoup/blob/master/mecha... ) not to execute javascript at all.
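
to make the "emulated browser" idea concrete, here is a rough sketch using only the python standard library. it is not MechanicalSoup's actual code or api: the names FormParser, build_submission and PAGE are made up for illustration, and a real tool would fetch the page with Requests and parse it with beautifulsoup instead. it reads a form out of an html string and builds the request a browser would send, without ever running the page's script.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormParser(HTMLParser):
    """Hypothetical helper: records the first form's action, method and inputs."""

    def __init__(self):
        super().__init__()
        self.action = ""
        self.method = "get"
        self.fields = {}
        self.in_form = False
        self.done = False  # set once the first form has been closed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self.in_form and not self.done:
            self.in_form = True
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "get").lower()
        elif tag == "input" and self.in_form and attrs.get("name"):
            # Keep the default value from the markup (e.g. hidden CSRF tokens).
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self.in_form:
            self.in_form = False
            self.done = True


def build_submission(page_url, html, overrides):
    """Return (method, absolute_url, encoded_body) for the first form in html."""
    parser = FormParser()
    parser.feed(html)
    data = {**parser.fields, **overrides}
    return parser.method, urljoin(page_url, parser.action), urlencode(data)


# Made-up page; the <script> body is treated as inert text and never executed.
PAGE = """
<html><body>
<script>document.title = "never runs here";</script>
<form action="/login" method="post">
  <input type="hidden" name="csrf" value="abc123">
  <input type="text" name="user">
  <input type="password" name="password">
</form>
</body></html>
"""

method, url, body = build_submission(
    "http://example.com/start", PAGE, {"user": "alice", "password": "s3cret"}
)
print(method, url, body)
```

the triple it produces is all an http client needs to submit the form. anything the page's javascript would have changed (extra fields, a rewritten action) is simply missing, which is the tradeoff described above.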

this is a nice little library that, as the README explains, fills a spot in the python ecosystem that had apparently become somewhat stagnant, but there's really not much to it other than combining Requests with beautifulsoup in order to provide a drop-in replacement for some existing api. i think this would mainly be useful for scraping rather than testing. the emulated browser and custom unittest module that ship with Django are probably better for the latter.

[+] smellf|11 years ago|reply

If you don't care what browser is making the requests, use MechanicalSoup. If you do care, use Selenium.

[+] goorpyguy|11 years ago|reply

Does it have a javascript engine? Because we had to abandon BeautifulSoup/Mechanize over this a couple years ago and switch to HTMLUnit (Java).

[+] jdnier|11 years ago|reply

There's not a lot to it so far (a single class, three tests). I wonder if the author has a road map for the project.

[+] webmaven|11 years ago|reply

How does MechanicalSoup (or RoboBrowser, for that matter; this is the first I've heard of either) compare to Scrapy? http://scrapy.org/

[+] rhgraysonii|11 years ago|reply

Do you have any documentation on a roadmap for things as they go forward? Would love to send some PRs your way :)

[+] supsep|11 years ago|reply

This is exactly what I was looking for for my next project. I was trying to do this with Node.js to no avail. Thanks!

27 comments