Note: this fails when you try to install using pip 1.1, the version that comes with Debian wheezy. If you have this problem, it's easy to work around it for your specific virtualenv, without changing the rest of the system:
Mechanize has been a core part of quite a few of my projects lately, and the fact that it hasn't been modified in over two years is very worrisome.
There are lots of edge cases out there on websites. Mechanize has built up years of fixes and workarounds for these; I hope that MechanicalSoup can learn from them the easy way rather than having to make the same mistakes again.
I also hope that this repo grows into a bigger community of support, not just one person contributing (who could leave / get bored at any time). Looking forward to following this!
Er...that's just the Python Mechanize, right? Ruby's Mechanize has been regularly updated and patched, though I can't say I've used it to the extent that I've run into infuriating edge cases: https://github.com/sparklemotion/mechanize
I personally find Capybara[0] to be the happy medium for web scraping, if Python isn't a hard requirement. It has a simple API, like MechanicalSoup, but it can also easily be configured to use Selenium, node-webkit, or any other browser you want for full proper JS evaluation.
This is fantastic. I've used Python mechanize in some very large projects and it was very frustrating: the lack of documentation and, well, the fact that it's complete "abandonware".
I've had the mechanize repository cloned for a year now, planning to do something with it, but I never got around to it. Looks like MechanicalSoup just got themselves a new contributor!
Selenium seems very heavy-weight to me (granted, I have only used Selenium Server). If you don't need to interpret JavaScript after a page loads, then you might be able to use mechanize. In my experience I've gotten better performance and found development easier with mechanize than with Selenium or other "full" headless browsers. Different tools for different jobs, I suppose; I just tend to reach for the smallest tool first.
edit: replace mechanize with mechanicalsoup in the above paragraph, they are aiming to solve the same problems in the same way.
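For a sense of what that "smallest tool" workflow looks like, here is a stdlib-only sketch of the fetch-parse-fill-encode cycle that libraries like mechanize and MechanicalSoup wrap for you. The page, form, and field names below are invented for illustration, and the real libraries' APIs differ:

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# A toy page standing in for a fetched response (invented for illustration).
PAGE = """
<form action="/search" method="get">
  <input type="hidden" name="source" value="hn">
  <input type="text" name="q">
  <input type="submit" value="Search">
</form>
"""

class FormParser(HTMLParser):
    """Collect the first form's action URL and its named input fields."""
    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = attrs.get("action")
        elif tag == "input" and attrs.get("type") != "submit":
            name = attrs.get("name")
            if name:
                self.fields[name] = attrs.get("value", "")

parser = FormParser()
parser.feed(PAGE)

# Fill in a field, then encode the submission URL, as a browser would.
parser.fields["q"] = "MechanicalSoup"
url = parser.action + "?" + urlencode(parser.fields)
print(url)  # /search?source=hn&q=MechanicalSoup
```

The point is just how little machinery the non-JavaScript case needs, which is why a thin wrapper over an HTTP client plus an HTML parser covers so much scraping work.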
Personally I really like selenium + PhantomJS (headless WebKit browser) since it allows you to do things like automate real user interactions in your tests, then run the test suite on your CI: http://www.realpython.com/blog/python/headless-selenium-test...
Also interesting if you're going down this road is CasperJS which is basically the same thing for JavaScript, and Velocity ( https://github.com/xolvio/velocity ) which is a test runner for Meteor that seems to run all your tests constantly and give you ~real-time feedback for TDD.
Lastly, there are plenty of ways to tie Cucumber-style testing specs in with Capybara and these other tools, if you're into that.
mechanicalsoup is more like zombie.js ( http://zombie.labnotes.org/ ) than selenium, chromedriver, or some other webdriver ( https://dvcs.w3.org/hg/webdriver/raw-file/default/webdriver-... ) implementation in that it emulates browser functionality from within a runtime, making http requests and parsing the response directly from the python (or node or ruby or whatever) runtime rather than communicating with a browser that, in turn, makes requests to the website.
the advantages of these "emulated" browsers are that tests run faster and are easier to set up. the disadvantage is that they don't fully duplicate browser functionality, particularly for client-side javascript. i think zombie might be able to run some javascript since it's in node, but MechanicalSoup appears ( https://github.com/hickford/MechanicalSoup/blob/master/mecha... ) not to execute javascript at all.
this is a nice little library that, as the README explains, fills a spot in the python ecosystem that had apparently become somewhat stagnant, but there's really not much to it other than combining Requests with beautifulsoup in order to provide a drop-in replacement for some existing api. i think this would mainly be useful for scraping rather than testing. the emulated browser and custom unittest module that ship with Django are probably better for the latter.
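The distinction above can be made concrete with a stdlib-only sketch of the "emulated" approach: the HTTP request and the HTML parsing both happen inside the Python process, with no external browser involved. To keep the example runnable offline, it serves an invented page from a local server rather than hitting a real site:

```python
import threading
from html.parser import HTMLParser
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Serve a tiny made-up page locally so the example needs no network access.
class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(b'<html><body><a href="/next">continue</a></body></html>')

    def log_message(self, *args):  # silence per-request logging
        pass

class LinkParser(HTMLParser):
    """Collect href attributes from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.append(dict(attrs).get("href"))

server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0 = pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# The whole "browser" round trip happens inside this process: one HTTP
# request goes out, and the response bytes are parsed right here.
html = urlopen(f"http://127.0.0.1:{server.server_port}/").read().decode()
parser = LinkParser()
parser.feed(html)
print(parser.links)  # ['/next']
server.shutdown()
```

A webdriver setup replaces the `urlopen`/`feed` pair with remote-control messages to a real browser, which is what buys full JavaScript support at the cost of speed and setup complexity.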
flexd | 11 years ago:
[1] http://robobrowser.readthedocs.org/en/latest/readme.html
rahimnathwani | 11 years ago:
- mkvirtualenv whatever
- pip install pip --upgrade
- pip install robobrowser
lazerwalker | 11 years ago:
[0]http://github.com/jnicklas/capybara