wrath | 11 years ago
2. Create a validation set by manually labeling each review as positive or negative. Each time you modify your algorithm, run it against the validation set and note the results in a spreadsheet. If you don't, you'll never know whether (and by how much) you've improved the results. The bigger the validation set, the better. Similarly, you can use part of your labeled data as a training set for a classifier.
3. Find a scale that biases your score in a useful way. For example, I would try biasing the negative score with a log scale: the fewer negative words a review has, the more each one is worth; the more it has, the less each additional one is worth.
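To make the two points above concrete, here's a minimal sketch of both ideas: a lexicon scorer whose negative count is damped by a log scale, plus an accuracy check against a hand-labeled validation set. The word lists and function names are my own illustration, not anything from a particular library:

```python
import math

# Hypothetical tiny lexicons -- in practice you'd use a real sentiment lexicon.
POSITIVE_WORDS = {"great", "excellent", "good", "fun"}
NEGATIVE_WORDS = {"terrible", "awful", "bad", "boring"}

def score_review(text):
    """Score a review; negative words get log-scaled diminishing weight."""
    tokens = text.lower().split()
    pos = sum(1 for t in tokens if t in POSITIVE_WORDS)
    neg = sum(1 for t in tokens if t in NEGATIVE_WORDS)
    # log1p gives diminishing returns: the first negative word counts most,
    # so a rant with ten negative words isn't ten times more negative.
    return pos - math.log1p(neg)

def evaluate(classify, validation_set):
    """Accuracy of `classify` over hand-labeled (text, label) pairs."""
    correct = sum(1 for text, label in validation_set
                  if classify(text) == label)
    return correct / len(validation_set)

# Usage: threshold the score at zero and log the accuracy each time you
# tweak the algorithm.
classify = lambda text: "pos" if score_review(text) > 0 else "neg"
labeled = [("great fun", "pos"), ("terrible and boring", "neg")]
accuracy = evaluate(classify, labeled)
```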
markovbling | 11 years ago
Interesting reflection on society if there are more 1-gram ways of communicating negativity than positivity, e.g. I'm more inclined to say 'terrible' for something very bad, while it feels more natural to say 'very good' than 'excellent'. If that makes any sense :)
namecast | 11 years ago
http://arxiv.org/pdf/1305.6143v2.pdf
and the lead author's GitHub repos are:
https://github.com/vivekn/sentiment https://github.com/vivekn/sentiment-web
He's implemented 'negative bi-gram detection' (my phrasing, not his) with this function:
https://github.com/vivekn/sentiment/blob/master/info.py#L26-...
...which I found useful as a jumping-off point. Good luck!
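For anyone who doesn't want to chase the link: I can't vouch for his exact code, but the usual way this kind of negation handling works is to glue a `not_` prefix onto tokens that follow a negation word, so 'not good' becomes one feature instead of a misleading standalone 'good'. A rough sketch (scope rules and word list are my own simplification):

```python
NEGATION_WORDS = {"not", "no", "never"}
SCOPE_ENDERS = {".", ",", "!", "?"}  # punctuation ends the negation scope

def mark_negation(tokens):
    """Prefix tokens following a negation word with 'not_'
    until punctuation closes the scope."""
    out = []
    negated = False
    for tok in tokens:
        if tok in NEGATION_WORDS or tok.endswith("n't"):
            negated = True
            out.append(tok)
        elif tok in SCOPE_ENDERS:
            negated = False
            out.append(tok)
        else:
            out.append("not_" + tok if negated else tok)
    return out

# Usage: feed the marked tokens into whatever unigram/bigram
# feature extractor you're already using.
features = mark_negation("this is not good".split())
```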