Does anyone know whether someone is working on a "flexible" (error-tolerant) HTML/XHTML parser à la BeautifulSoup, Nokogiri, or TagSoup? If one existed, Node could become a very useful base for building scrapers.
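To make "flexible" concrete, here is a minimal sketch in Python rather than Node, using only the standard library (it is not BeautifulSoup, Nokogiri, TagSoup, or libxml.js). A strict XML parser rejects typical tag soup outright, while an error-tolerant HTML tokenizer still pulls out the links a scraper wants. The sample markup and the helper names `strict_parse_ok` and `lenient_links` are hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical "tag soup": unclosed <p> and <li>, an unquoted attribute, a bare <br>.
TAG_SOUP = '<p>Links:<ul><li><a href=a.html>A</a><li><a href="b.html">B</a></ul><br>'


def strict_parse_ok(markup: str) -> bool:
    """Return True only if a strict XML parser accepts the markup as well-formed."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags; HTMLParser does not require well-formed input."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def lenient_links(markup: str) -> list[str]:
    """Tokenize the markup leniently and return every href, in document order."""
    collector = LinkCollector()
    collector.feed(markup)
    collector.close()
    return collector.links


if __name__ == "__main__":
    print(strict_parse_ok(TAG_SOUP))
    print(lenient_links(TAG_SOUP))
```

Here `strict_parse_ok(TAG_SOUP)` returns `False`, while `lenient_links(TAG_SOUP)` returns `['a.html', 'b.html']`. Note that `HTMLParser` only tokenizes; unlike the libraries named above, it does not repair the input into a navigable document tree.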
I've been trying to model libxml.js after Nokogiri, but I wanted to get something built and working first. The next step is to expose libxml2's HTML parser.
Someone else has started working on find-by-CSS à la Nokogiri. I'll merge that into libxml.js when it's ready.
BTW, I'm looking for more help on this project. A new job has diminished the amount of time I can spend on OSS projects.
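Nokogiri implements its CSS search by translating selectors into XPath, so find-by-CSS can be layered on top of an XPath engine such as the one libxml2 already provides. The Python sketch below shows the idea for a deliberately tiny subset (tag, `#id`, `.class`, and the descendant combinator). `css_to_xpath` is a hypothetical helper written for illustration; it is not the code being merged into libxml.js, and it rejects anything outside that subset.

```python
import re

# Hypothetical helper. Supported per-step syntax: optional tag, optional #id,
# then zero or more .class parts (in that order). Steps are separated by
# whitespace, which is treated as the descendant combinator.
_STEP = re.compile(r"^([a-zA-Z][\w-]*)?(?:#([\w-]+))?((?:\.[\w-]+)*)$")


def css_to_xpath(selector: str) -> str:
    """Translate a tiny subset of CSS selectors into an equivalent XPath expression."""
    steps = []
    for part in selector.split():
        match = _STEP.match(part)
        if match is None:
            raise ValueError(f"unsupported selector part: {part!r}")
        tag, element_id, classes = match.groups()
        predicates = []
        if element_id:
            predicates.append(f"@id='{element_id}'")
        for cls in filter(None, classes.split(".")):
            # Match cls as a whole whitespace-separated token of @class.
            predicates.append(
                f"contains(concat(' ', normalize-space(@class), ' '), ' {cls} ')"
            )
        steps.append((tag or "*") + "".join(f"[{p}]" for p in predicates))
    if not steps:
        raise ValueError("empty selector")
    # "//" before each step = descendant-or-self, matching CSS descendant semantics.
    return "//" + "//".join(steps)


if __name__ == "__main__":
    print(css_to_xpath("div.item a"))
```

For example, `css_to_xpath("div.item a")` yields `//div[contains(concat(' ', normalize-space(@class), ' '), ' item ')]//a`. The `concat`/`normalize-space` idiom matches `item` only as a whole class token, so `class="item featured"` matches but `class="items"` does not.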
Why use libxml when JavaScript already has a standard XML API, E4X (ECMAScript for XML), as specified in ECMA-357? At the very least, libxml should use the faster native XML support behind the scenes when it is available.
Maciek416 | 16 years ago
sprsquish | 16 years ago
olegp | 16 years ago
simonw | 16 years ago
Sephr | 16 years ago
Maciek416 | 16 years ago
http://code.google.com/p/v8/issues/detail?id=235
"There are currently no plans for implementing E4X in V8."
This doesn't seem to have moved forward, and the thread ends in a link to libxmljs :)