Sure thing! It builds a machine learning classifier from a corpus of data, using my own algorithm, which is designed to give a higher accuracy rate on natural language dialogue. The text classification algorithm I employ is my own derivative of Naive Bayes, with added tokenization, named entity recognition, and Laplace smoothing. I have plotted the true/false positive rates of each of these machine learning classifiers on a ROC curve and computed the AUC from it.
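For readers unfamiliar with the building blocks named above, here is a minimal sketch of a plain multinomial Naive Bayes classifier with a simple tokenizer and Laplace smoothing. It is not Emotize's actual derivative (the post does not show that code, and named entity recognition is left out here); the class name, the tokenizer regex, and the toy training phrases are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Deliberately simple tokenizer: lowercase, keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Hypothetical multinomial Naive Bayes; alpha=1.0 is Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # documents seen per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def log_scores(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore tokens never seen during training
                count = self.word_counts[label][tok]
                # Smoothed likelihood: unseen (class, token) pairs never get probability 0.
                score += math.log(
                    (count + self.alpha) / (total_words + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy data, invented for this example only.
train_texts = [
    "what a great fun movie",
    "great acting and a fun plot",
    "boring and awful",
    "an awful boring mess",
]
train_labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("a fun movie"))
```

The smoothing term is what keeps a word that never appeared with a given class from zeroing out that class's whole score.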
Here are the categories into which Emotize's text classification algorithms sort the user's chat input:
Mood and Sentiment Polarity Classifier:
  True
  False
Personality Analysis Conducted with the Five-Factor Model (FFM):
  Openness to experience: inventive/curious vs. consistent/cautious
  Conscientiousness: efficient/organized vs. easy-going/careless
  Extraversion: outgoing/energetic vs. solitary/reserved
  Agreeableness: friendly/compassionate vs. analytical/detached
  Neuroticism: sensitive/nervous vs. secure/confident
Mood and Sentiment Polarity Classifier
Since Emotize's sentiment analysis is polar (either Negative or Positive), each phrase is either classified correctly or incorrectly. The algorithm categorized 77.9% of the corpus phrases correctly. The corpus this text classification algorithm was trained on for use in Emotize was introduced by Pang/Lee and contains 3,800 phrases. About a third of them (1,280 sentences) were held out of the classifier's training set, and the ROC curve was calculated on those held-out sentences. Because their true labels were known, the accuracy rate could be calculated by comparing the predicted outcomes with the true ones.
This data shows that Emotize's sentiment analysis algorithm has an accuracy rate of 77.9%, which is 1.1 percentage points below the average human accuracy rate at detecting positive/negative sentiment in text, which is 79%.
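The hold-out evaluation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script used for Emotize: the function names, the fixed random seed, and the placeholder phrases are hypothetical, and the Pang/Lee data itself is not loaded here.

```python
import random


def holdout_split(examples, held_out_fraction=1 / 3, seed=0):
    # Shuffle a copy, then set aside a fraction that the classifier never trains on.
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_held_out = round(len(shuffled) * held_out_fraction)
    return shuffled[n_held_out:], shuffled[:n_held_out]


def accuracy(true_labels, predicted_labels):
    # Fraction of held-out examples whose predicted label matches the known label.
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Placeholder corpus with the same size as the one described in the post.
examples = [(f"phrase {i}", i % 2 == 0) for i in range(3800)]
train, held_out = holdout_split(examples)
print(len(train), len(held_out))

# Accuracy on four made-up predictions: three of four match.
print(accuracy([True, False, True, True], [True, False, False, True]))

# Gap to the quoted human baseline, in percentage points.
print(round(79.0 - 77.9, 1))
```

Note that an exact third of 3,800 is 1,267 phrases, so the 1,280 quoted in the post corresponds to a slightly larger held-out share.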
The raw, unparsed, and unorganized survey data, part of which was used as corpus data when training this algorithm, is available here in the form of a ZIP file.
Personality Analysis Conducted with the Five-Factor Model (FFM) Classifier
The corpus for the five-factor psychometric machine learning algorithms was collected through a survey of 1,741 participants on the Emotize website, and Emotize built the five-factor algorithms from this data. As in the Mood and Sentiment Polarity analysis, 1/3 of the classifier's original corpus was held out and used for the ROC curve.
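As a rough illustration of how a ROC curve and its AUC can be computed from held-out examples, here is a small sketch. It assumes each classifier outputs a numeric score where higher means "more likely positive" and that the held-out set contains both classes; the function names and the six example scores are invented, and this is not Emotize's plotting code.

```python
def roc_points(labels, scores):
    # labels: True for the positive class; scores: higher = more likely positive.
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # (false positive rate, true positive rate)
    tp = fp = 0
    i = 0
    while i < len(pairs):
        # Lower the threshold one distinct score at a time, handling ties together.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def auc(points):
    # Area under the curve by the trapezoidal rule.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up held-out labels and classifier scores.
labels = [True, True, False, True, False, False]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
print(curve)
print(auc(curve))
```

For a five-factor model, one such curve would be drawn per trait, treating one pole of the trait as the positive class.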
Nadya | 10 years ago
I found it was accurate for me - but these sorts of things tend to be broad enough to be vaguely accurate for anyone.
>natural language processing and sentiment analysis of your last 25 reddit comments!
Perhaps expand on this or give a page on how you're doing the processing/analysis?
FWIW: http://i.imgur.com/xOY3owP.png
testitouter | 10 years ago
Since Emotize's sentiment analysis is polar (either Negative or Positive), each phrase is either classified correctly or incorrectly. The algorithm categorized 77.9% of the corpus phrases correctly. The corpus this text classification algorithm was trained on for use in Emotize was introduced by Pang/Lee and contains 3,800 phrases. About a third of them (1,280 sentences) were held out of the classifier's training set, and the ROC curve was calculated on those held-out sentences. Because their true labels were known, the accuracy rate could be calculated by comparing the predicted outcomes with the true ones. This data shows that Emotize's sentiment analysis algorithm has an accuracy rate of 77.9%, which is 1.1 percentage points below the average human accuracy rate at detecting positive/negative sentiment in text, which is 79%.