jonathanbgn | 4 years ago | on: Wav2vec Overview: Semi and Unsupervised Speech Recognition
jonathanbgn's comments
jonathanbgn | 6 years ago | on: Show HN: Machine Wisdom – Generated Inspiration from Deep Learning (GPT-2)
jonathanbgn | 6 years ago | on: Show HN: Visualize how HN/Reddit talk about your company and products
The easiest library to do that would probably be scikit-learn with their ComplementNB class: https://scikit-learn.org/stable/modules/generated/sklearn.na...
For the data, you can use the SemEval 2017 Task4-A dataset (around 10K labeled tweets): https://github.com/cbaziotis/datastories-semeval2017-task4/t...
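As a rough sketch of what that could look like with scikit-learn: the snippet below trains a ComplementNB classifier on TF-IDF features. The handful of example texts and labels are made up for illustration and stand in for the SemEval tweets; the vectorizer settings are an assumption, not necessarily what the app uses.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import ComplementNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for the labeled tweets.
train_texts = [
    "I love this car, best purchase ever",
    "Great service and a really nice product",
    "This is awful, I want a refund",
    "Terrible experience, would not recommend",
    "The package arrived on Tuesday",
    "They released a new version today",
]
train_labels = [
    "positive", "positive",
    "negative", "negative",
    "neutral", "neutral",
]

# TF-IDF features are non-negative, which ComplementNB requires.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
    ComplementNB(),
)
model.fit(train_texts, train_labels)

predictions = model.predict(["I really love it", "what an awful product"])
```

With real data you would replace the two lists with the tweets and labels loaded from the dataset and hold out part of it to measure accuracy.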
jonathanbgn | 6 years ago | on: Show HN: Visualize how HN/Reddit talk about your company and products
To prevent those words from appearing, I was thinking of implementing a dictionary check that only allows meaningful words. However, this approach also has a drawback: it restricts the vocabulary and can miss important new concepts.
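A minimal sketch of such a dictionary check, assuming the word cloud is built from simple word counts. The function name, the `vocabulary` set and the `allow_list` parameter are all hypothetical; the allow list is one possible way to let brand names or new terms through despite the dictionary.

```python
import re
from collections import Counter


def word_cloud_counts(messages, vocabulary, allow_list=frozenset()):
    """Count only words found in the dictionary or in the extra allow list."""
    allowed = {w.lower() for w in vocabulary} | {w.lower() for w in allow_list}
    counts = Counter()
    for message in messages:
        # Lowercase and split into simple word tokens (letters and apostrophes).
        for token in re.findall(r"[a-z']+", message.lower()):
            if token in allowed:
                counts[token] += 1
    return counts


counts = word_cloud_counts(
    ["Buy a Mazda, you won't regret it", "asdfgh regret"],
    vocabulary={"buy", "regret"},
    allow_list={"Mazda"},
)
```

Here "asdfgh" is dropped because it is in neither set, while "mazda" survives through the allow list.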
Thanks for the feedback.
jonathanbgn | 6 years ago | on: Show HN: Visualize how HN/Reddit talk about your company and products
For example, in the case of Mazda where you say that "regret" is classified as positive, if you look into which message it comes from you can see the original sentence: "Buy a Mazda, you won't regret it :)"
I agree with you that the word cloud is not useful on its own, and this is why you can click on a word to see the actual messages. Think of the word cloud as merely an entry point into a more detailed analysis by a human.
Thanks for the feedback.
jonathanbgn | 6 years ago | on: Show HN: Visualize how HN/Reddit talk about your company and products
Also consider the lack of labeled data for HN and Reddit messages: I had to use Twitter messages to train the classifiers.
This is why I experimented with BERT, to see if I could get a model to generalize well from only Twitter messages. From my experiments, if you activate BERT (which makes the app much slower), you should be able to get 60-70% accuracy.
It's not perfect, but it's not too bad either if you are averaging over a large number of messages.
Overall it's still a work in progress; I expect to greatly improve the accuracy over the coming weeks!
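To illustrate the point about averages, here is a small worked example under strong simplifying assumptions that are not from the app itself: only two labels (positive/negative), and a classifier that is wrong equally often on both. In that case the observed share of positive messages is a linear function of the true share, so comparisons between averages keep their order even at 65% accuracy.

```python
def expected_observed_positive_rate(true_rate, accuracy):
    """Expected share of messages labeled positive by a noisy classifier.

    Assumes binary labels and the same accuracy on both classes.
    """
    return accuracy * true_rate + (1 - accuracy) * (1 - true_rate)


def corrected_positive_rate(observed_rate, accuracy):
    """Invert the relation above to estimate the true positive share."""
    if accuracy == 0.5:
        raise ValueError("a 50% accurate binary classifier carries no signal")
    return (observed_rate - (1 - accuracy)) / (2 * accuracy - 1)


# Two hypothetical companies with 80% and 40% truly positive messages.
observed_a = expected_observed_positive_rate(0.8, accuracy=0.65)
observed_b = expected_observed_positive_rate(0.4, accuracy=0.65)
recovered_a = corrected_positive_rate(observed_a, accuracy=0.65)
```

The observed gap between the two companies is smaller than the true gap, but it does not flip, and the sampling noise around each average shrinks as more messages are included.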
jonathanbgn | 6 years ago | on: Show HN: Visualize how HN/Reddit talk about your company and products
As for the subreddit, it's already on my next features list :)
jonathanbgn | 6 years ago | on: Show HN: Visualize how HN/Reddit talk about your company and products
The back-end is just Python/Flask, and I use the free Algolia and Pushshift.io APIs to source the messages from HN and Reddit (big thanks to them!).
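For reference, a sketch of how the search requests to those two APIs could be built. The endpoints and query parameters below are the publicly documented ones as I remember them (the HN Search API by Algolia, and Pushshift's comment search); the helper names are hypothetical, and Pushshift access rules have changed over time, so treat that URL as an assumption.

```python
from urllib.parse import urlencode

HN_SEARCH_URL = "https://hn.algolia.com/api/v1/search"
# Assumed endpoint; availability depends on Pushshift's current access policy.
PUSHSHIFT_COMMENT_URL = "https://api.pushshift.io/reddit/search/comment/"


def hn_comment_search_url(query, hits_per_page=50):
    """Build a URL that searches HN comments mentioning the query."""
    params = {"query": query, "tags": "comment", "hitsPerPage": hits_per_page}
    return f"{HN_SEARCH_URL}?{urlencode(params)}"


def reddit_comment_search_url(query, size=50):
    """Build a URL that searches Reddit comments mentioning the query."""
    params = {"q": query, "size": size}
    return f"{PUSHSHIFT_COMMENT_URL}?{urlencode(params)}"


hn_url = hn_comment_search_url("mazda")
reddit_url = reddit_comment_search_url("mazda")
```

A Flask view would then fetch these URLs with an HTTP client and pass the returned comment texts to the classifier.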
The Illustrated Wav2vec - https://jonathanbgn.com/2021/06/29/illustrated-wav2vec.html