Show HN: Analyzes your social media comments to get psychometrics on personality

2 points | testitouter | 10 years ago | emotize.co

2 comments

Nadya | 10 years ago

"Okay" is all I can think. I would try to provide more information, such as how people can use these results.

I found it was accurate for me, but these sorts of things tend to be broad enough to seem vaguely accurate for anyone.

>natural language processing and sentiment analysis of your last 25 reddit comments!

Perhaps expand on this or give a page on how you're doing the processing/analysis?

FWIW: http://i.imgur.com/xOY3owP.png

testitouter | 10 years ago

Sure thing! It creates a machine learning classifier from a corpus of data, using a unique algorithm that allows for a higher accuracy rate on natural language dialogue. The text classification algorithm I employ is my own derivative of Naive Bayes, with added tokenization, named entity recognition, and Laplace smoothing. I have plotted the true/false positive rates of each of these machine learning classifiers on a ROC curve, along with the AUC probability derived from it.

Here are the categories into which Emotize’s text classification algorithms sort the user's chat input:

Mood and Sentiment Polarity Classifier: True / False

Personality Analysis Conducted with the Five-Factor Model (FFM):

Openness to experience: inventive/curious vs. consistent/cautious
Conscientiousness: efficient/organized vs. easy-going/careless
Extraversion: outgoing/energetic vs. solitary/reserved
Agreeableness: friendly/compassionate vs. analytical/detached
Neuroticism: sensitive/nervous vs. secure/confident

Mood and Sentiment Polarity Classifier

Since Emotize’s sentiment analysis is polar (either Negative or Positive), the algorithm classifies each phrase either correctly or incorrectly. It categorized 77.9% of the corpus phrases correctly. The corpus this text classification algorithm was trained on was introduced by Pang/Lee and contains 3,800 phrases. About 1/3 of them (1,280 sentences) were excluded from the classifier's training set, and the ROC curve was calculated with those held-out sentences. Because their true labels are known, the accuracy rate can be calculated by comparing the predicted outcomes with the true ones. This data illustrates that Emotize’s sentiment analysis algorithm has an accuracy rate of 77.9%, which is 1.1 percentage points below the average human accuracy at detecting positive/negative sentiment in text, which is 79%.
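As a rough illustration of the approach described above, here is a minimal Python sketch of a plain multinomial Naive Bayes classifier with Laplace (add-one) smoothing, scored on held-out examples. It is not Emotize's actual code: the tokenizer, the function names (`tokenize`, `train`, `predict`, `accuracy`), and the toy sentences are all assumptions made for illustration, and named entity recognition is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase words only.
    return re.findall(r"[a-z']+", text.lower())


def train(examples, alpha=1.0):
    """Count words per label. `alpha` is the Laplace smoothing constant."""
    word_counts = {}
    doc_counts = Counter()
    vocab = set()
    for text, label in examples:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            vocab.add(tok)
    return {"word_counts": word_counts, "doc_counts": doc_counts,
            "vocab": vocab, "alpha": alpha}


def log_scores(model, text):
    """Return the log prior plus smoothed log likelihood for each label."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    scores = {}
    for label, n_docs in model["doc_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)
        for tok in tokenize(text):
            if tok not in model["vocab"]:
                continue  # words never seen in training carry no evidence
            # Laplace smoothing: a word unseen for this label never gets probability 0.
            score += math.log((counts[tok] + alpha) /
                              (total_words + alpha * vocab_size))
        scores[label] = score
    return scores


def predict(model, text):
    scores = log_scores(model, text)
    return max(scores, key=scores.get)


def accuracy(model, held_out):
    # Fraction of held-out examples whose predicted label matches the true one.
    correct = sum(predict(model, text) == label for text, label in held_out)
    return correct / len(held_out)


# Toy data (made up); the held-out examples are never seen during training.
train_set = [
    ("a great and moving film", "pos"),
    ("wonderful acting great fun", "pos"),
    ("a dull and boring mess", "neg"),
    ("terrible plot boring acting", "neg"),
]
held_out = [("great fun", "pos"), ("boring mess", "neg")]

model = train(train_set)
print(predict(model, "great fun"), accuracy(model, held_out))
```

Keeping the held-out sentences out of `train` is what makes the accuracy figure meaningful: the classifier is scored only on text it has not seen.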

Personality Analysis Conducted with the Five-Factor Model (FFM)

The corpus for the 5-factor psychometric machine learning algorithms was collected through a survey of 1,741 participants on the Emotize website, and Emotize built the 5-factor psychometric algorithms from this data. As in the Mood and Sentiment Polarity Analysis, 1/3 of the classifier's original training corpus was held out and used for the ROC curve.
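For readers unfamiliar with the AUC mentioned above, the following Python sketch shows one standard way to compute it from held-out labels and classifier scores: the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as half. The function name `roc_auc` and the sample numbers are assumptions for illustration, not values from Emotize.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for positive, 0 for negative; scores: higher means "more positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Made-up held-out labels and classifier scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(roc_auc(labels, scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect one, which is why it can be read as a probability.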