We just started out one and a half weeks ago, joining the Pragmatic Programmer's writing month. We thought a 'release early, release often' approach would be best, which is why there are just a few in-progress chapters.
We will keep you posted, and thanks for the encouragement!
Thanks for your work, I am really enjoying it so far. As a counterpoint to some of the comments regarding the choice of language, I had the opposite response: oh neat, I get to learn Haskell and natural language processing at the same time.
If I could make one request, could you remove the mouseover from the paragraph text that shows the topic heading? It is really distracting for those of us who like to use our mouse pointer as a finger when reading.
You seem to know a lot about NLP and I've asked this question in various places and never even found anyone who knew just a little, so I hope you don't mind that I ask you a small question on whether my problem can even be solved with NLP at all.
I'm looking for a way to extract addresses from web pages, where these addresses are immediately recognizable as such by people but are not in a standard format (zip codes before the city or after, no zip codes at all, P.O. box instead of street name, ...). All in text format (no graphics, no OCR problem) but inside HTML tags, in various forms (as a row in a table, inside one or multiple <div>'s, as a <ul>, etc).
- Is this an NLP problem?
- If so, where do I start reading/learning? Most NLP seems to be about understanding free-flowing texts of all sorts of subjects. I'm looking for 98% solutions in what I think is a restricted problem space. Is this a reasonable expectation?
It's an interesting paper that I intend to dig into more carefully, but I kind of wish that a paper "for the Working Programmer" used a language like Python rather than Haskell. I'm aware that Haskell has a very nice type system for doing things like this -- and I'm a language nerd myself, so it's not that -- but it just seems like it would be more practical in something more "mainstream."
That said, it is interesting from what I've read so far.
I may try to translate the examples into JavaScript (CommonJS platform, not client-side .. on second thought, I may simply do it on Node.js with CoffeeScript) - just so I can learn the topic better. I find that, much like taking notes during a lecture, it helps me retain the knowledge better.
Maybe you should try to do the same with Python? :)
Agreed, I saw Haskell in the TOC and stopped reading. Still, an interesting project - and about as appropriate for "working" programmers as Larry Paulson's book, so no issues with the title.
It'd be easy enough to rewrite most of the examples in another language anyway (I'd hope), even if elegance is lost in the process...
I've posted this link before, but these NLP posts keep popping up on HN, so I'll keep posting.
Over at http://www.repustate.com, we're taking the more common functions that NLTK performs (and the ones it should) and porting them over as web services. NLTK is kind of buggy here & there, and it's not too great if you're dealing with big data sets. Our API, with the obvious handicap of network latency, is lightning fast because we ported many NLTK functions down to raw C.
Our API is free, so have at it. Let me know if you want to see us add anything.
While I'm sure something nice will come out of it, you may wish to temporarily disable the NER feature, because it seems to amount to "select capitalized words", at least on the few pages I tried (Wikipedia, Bitcoin, NYTimes).
I'll have to put this on my 'to read' list, it looks really interesting. I think natural language processing/understanding may become one of those next 'big things' like mobile and social media simply because understanding what a user is trying to do will become very important.
I am not a very good Haskell programmer, but I spend an occasional evening with it, and I am also interested in NLP (I have been working off and on on NLP since the early 1980s).
From skimming through the book, it looks like a nice read, and it just went on my reading list.
This is pretty neat. At the risk of sounding childish, here I go -- I wish books like these could be given life like tryruby.org where you could try out examples and learn along the way. That would be wicked cool.
It's interesting to note that a lot of natural language processing is English-centered. It's clear that English natural language processing is way ahead of the curve, but based on the quality of Chinese results on Google Translate, I take it Asian languages don't do so hot when it comes to natural language processing?
Translation is a lot harder for language pairs that are less related. Most of the European languages are fairly close cousins, so translation between, say, English and French isn't that hard.
That said, it's generally true that for most NLP tasks, we're doing much better on languages similar to English.
Also note that translation is a very different task than parsing, or part-of-speech tagging, for example. Summarization and translation are both open research topics in NLP from what I understand, and aren't really 'solved' in any language.
Translation is also a lot harder for data-poor pairs of languages. Machine translation relies on training on a parallel corpus (the same text in different languages), and gets better the bigger that corpus is.
[+] [-] microtonal|15 years ago|reply
[+] [-] angrycoder|15 years ago|reply
[+] [-] roel_v|15 years ago|reply
[+] [-] jimmyjim|15 years ago|reply
I actually remember reading your Slackware book a few years back. I've no doubt that the quality of this text will be as superb as that one's! Cheers!
[+] [-] hvs|15 years ago|reply
[+] [-] albertsun|15 years ago|reply
http://www.nltk.org/book
[+] [-] 1331|15 years ago|reply
http://www.cl.cam.ac.uk/~lp15/MLbook/
[+] [-] microtonal|15 years ago|reply
[+] [-] warfangle|15 years ago|reply
[+] [-] jlees|15 years ago|reply
[+] [-] dons|15 years ago|reply
[+] [-] sabat|15 years ago|reply
[+] [-] waterside81|15 years ago|reply
[+] [-] riffraff|15 years ago|reply
It is blazing fast though :)
[+] [-] LeBlanc|15 years ago|reply
If anyone is interested in playing around with a robust natural language processing tool, I built an API for the Stanford Parser. http://nlp.naturalparsing.com/browserparser/parse
[+] [-] mark_l_watson|15 years ago|reply
[+] [-] samratjp|15 years ago|reply
For now, OpenStudy will do the trick. I created a "StudyPad" if anyone wants to go through this book together. http://openstudy.com/studypads/Natural-Language-Processing-f...
[+] [-] jasonjei|15 years ago|reply
[+] [-] syllogism|15 years ago|reply
[+] [-] brettbender|15 years ago|reply
[+] [-] _corbett|15 years ago|reply