Show HN: Natural Language Processing Demystified (Part One)
166 points | mothcamp | 3 years ago | nlpdemystified.org
I published part one of my free NLP course. The course is intended to help anyone who knows Python and a bit of math go from the very basics all the way to today's mainstream models and frameworks.
I strive to balance theory and practice, so every module pairs detailed explanations and slides with a Colab notebook (in most modules) that puts the theory into practice.
In part one, we cover text preprocessing, how to turn text into numbers, and multiple ways to classify and search text using "classical" approaches. And along the way, we'll pick up useful bits on how to use tools such as spaCy and scikit-learn.
No registration required: https://www.nlpdemystified.org/
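As a taste of the "classical" search approach covered in part one, here is a minimal sketch (my own illustration, not taken from the course) that turns text into numbers with scikit-learn's TF-IDF vectorizer and ranks documents against a query by cosine similarity:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# A tiny corpus; each document becomes a TF-IDF vector.
docs = [
    "the cat sat on the mat",
    "dogs are loyal companions",
    "the mat was sat on by a cat",
]
vec = TfidfVectorizer()
X = vec.fit_transform(docs)

# Embed the query in the same vector space and rank by cosine similarity.
query = vec.transform(["cat on a mat"])
scores = cosine_similarity(query, X).ravel()
best = int(scores.argmax())
print(docs[best], scores[best])
```

The document about dogs shares no terms with the query, so its score is exactly zero; either cat/mat document can win depending on term weighting.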
jll29 | 3 years ago
What saddens me is that too many people dive into NLP without trying to understand language and linguistics first. For example, you can run a part-of-speech (POS) tagger in three lines of Python, but you will still not know much about what parts of speech are, which languages have which ones, or what function they serve in linguistic theory or practical applications.
What are the advantages of using the C7 tagset over the C5 or Penn Treebank tagsets?
Why is AT sometimes called DET?
etc.
I recommend people spend a bit of time reading an(y) introduction-to-linguistics textbook before diving into NLP; the second investment will then be worth so much more.
mywaifuismeta | 3 years ago
I don't think you necessarily need a linguistics background for NLP, but you do need either a strong linguistics or a strong ML background, so that you know what's going on under the hood and can make connections. Anyone can call into Hugging Face; you don't need a course for that.
amitport | 3 years ago
(also an NLP researcher. Knows nothing about linguistics)
screye | 3 years ago
Transformers and scaling laws have made it so that the only thing that truly matters is your ability to build a model that scales computationally and parametrically. The second is figuring out how to make more data viable for use within such a data-hungry model's encoding.
Look at the authors of the last 20 seminal papers in NLP: almost none of them have a strong background in linguistics. Vision went through a similar period of forced obsolescence during the 2012-2016 AlexNet -> VGG -> Inception -> ResNet transition.
It is unfortunate. But, time is limited and most researchers can only spare enough time to learn a few new things. Unfortunately for linguistics, it does not rank that high.
true_religion | 3 years ago
You are right that a lack of fundamental knowledge is problematic, especially since tools let you produce a greater quantity of solutions and therefore also a greater quantity of mistakes.
However, at least the problem is still being solved.
For example, a few months ago I wanted to organize my media collection by tagging files with artist names. I had a list of artist names, but it wasn't comprehensive, so I wired together a bunch of Python NLP libraries to automatically pull proper nouns out of filenames, recognize English names, and then annotate the files.
I know almost nothing about parts of speech or anything else, so I made mistakes. About 10% of the results were errors in the first run, but after tuning it was down to about 1% which was good enough to run over the entire media library.
If not for the tools, I would have never been able to finish that chore in a single day. To me, it was worth it despite my amateur mistakes.
I view the library just like any other tool: a screwdriver, a hammer, a wrench. I'm not a plumber, a carpenter, or an NLP researcher, but I still want to use tools to fix my leaky faucets, remount my leaning cabinet doors, and organize my media collection as weekend projects.
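As a toy sketch of that kind of pipeline (using a naive known-name match in place of the real NLP libraries the parent used; `ARTIST_LIST` is a made-up seed list):

```python
import re

ARTIST_LIST = {"Miles Davis", "Nina Simone"}  # hypothetical seed list

def artists_in(filename):
    # Drop the extension, normalize separators to spaces, then look
    # for known artist names in the cleaned-up filename.
    stem = re.sub(r"\.[^.]+$", "", filename)
    text = re.sub(r"[_\-.]+", " ", stem).lower()
    return [a for a in ARTIST_LIST if a.lower() in text]

print(artists_in("Miles_Davis-So_What.mp3"))  # → ['Miles Davis']
```

The real version would additionally use proper-noun extraction to surface candidate names that aren't already on the seed list, which is where an error rate like the 10% the parent describes creeps in.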
xtiansimon | 3 years ago
Linguistics is a broad area of study. Can you be more specific, such as grammar and syntax?
ad404b8a372f2b9 | 3 years ago
"Every time I fire a linguist, the performance of the speech recognizer goes up." - Frederick Jelinek
PainfullyNormal | 3 years ago
Do you have a favorite you can recommend?
jasfi | 3 years ago
What are the toughest NLP problems you know of that aren't being solved satisfactorily?
Der_Einzige | 3 years ago
Think extractive QA, but where the answer size is configurable, the answer can potentially consist of multiple spans, and the spans need not be contiguous.
If you've got a solution, I'd love to see it; you could even beat the baselines for the only dataset that exists for it: https://paperswithcode.com/sota/extractive-document-summariz...
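To make the task shape concrete, here is a hypothetical baseline for the selection step only: given per-sentence relevance scores (from any upstream model), greedily pick the highest-scoring sentences under a word budget, so the extracted answer is configurable in size and its spans need not be contiguous:

```python
def select_spans(sentences, scores, word_budget):
    """Greedy extractive baseline: take sentences in descending score
    order while they fit the budget; emit them in document order."""
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in ranked:
        n = len(sentences[i].split())
        if used + n <= word_budget:
            chosen.append(i)
            used += n
    return [sentences[i] for i in sorted(chosen)]

sents = ["A B C", "D E", "F G H I", "J K"]
scores = [0.9, 0.1, 0.8, 0.7]
print(select_spans(sents, scores, word_budget=6))  # → ['A B C', 'J K']
```

Note the result skips the middle sentences, illustrating a non-contiguous multi-span extraction under a size constraint.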