Abstract
We define a new, fully automated, and domain-independent method for building feature vectors from a Twitter text corpus for machine-learning sentiment analysis, based on a fuzzy thesaurus and sentiment replacement. Instead of using term-presence or term-frequency feature vectors, the proposed method measures the semantic similarity of tweets to the features in the feature space. Thus, we account for the sentiment of the context instead of just counting sentiment words. We use sentiment replacement to reduce the dimensionality of the feature space and a fuzzy thesaurus to incorporate semantics. Experimental results show that sentiment replacement reduces the dimensionality of the feature space by up to 35%. Moreover, feature vectors built from the fuzzy thesaurus improve sentiment classification performance with multinomial naïve Bayes and support vector machine classifiers, which reach accuracies of 83% and 85%, respectively, on the Stanford testing dataset. Incorporating the fuzzy thesaurus yielded the best accuracy, exceeding the baselines by more than 3%. Comparable results were obtained on a larger dataset, STS-Gold, indicating the robustness of the proposed method. Furthermore, a comparison with previous work shows that the proposed method outperforms other methods reported in the literature on the same benchmark data.
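The two ideas in the abstract, sentiment replacement and fuzzy-thesaurus feature vectors, can be illustrated with a minimal sketch. The abstract does not specify the lexicon, the relatedness measure, or how a tweet's terms are aggregated, so everything below is an assumption for illustration only: the tiny lexicon and the names `POS_TOKEN`, `NEG_TOKEN`, `sentiment_replace`, `build_fuzzy_thesaurus`, and `fuzzy_vector` are hypothetical; relatedness is taken as a co-occurrence ratio (a common way to build a fuzzy thesaurus); and a tweet's membership in each feature is the maximum relatedness over its terms (fuzzy OR). This is not the authors' implementation.

```python
from itertools import combinations

# Hypothetical toy sentiment lexicon (assumption, not the paper's lexicon).
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "awful", "hate", "sad"}


def sentiment_replace(tokens):
    """Replace lexicon words with a shared polarity token to shrink the vocabulary."""
    out = []
    for tok in tokens:
        if tok in POSITIVE:
            out.append("POS_TOKEN")
        elif tok in NEGATIVE:
            out.append("NEG_TOKEN")
        else:
            out.append(tok)
    return out


def build_fuzzy_thesaurus(docs):
    """Assumed relatedness r(a, b) = |docs containing a and b| / |docs containing a|.

    The relation is asymmetric, lies in [0, 1], and r(a, a) = 1.
    Returns the sorted vocabulary (used as the feature space) and the relation.
    """
    doc_sets = [set(d) for d in docs]
    vocab = sorted(set().union(*doc_sets))
    df = {t: sum(t in s for s in doc_sets) for t in vocab}
    co = {}
    for s in doc_sets:
        for a, b in combinations(sorted(s), 2):
            co[(a, b)] = co.get((a, b), 0) + 1
            co[(b, a)] = co.get((b, a), 0) + 1
    rel = {}
    for a in vocab:
        for b in vocab:
            rel[(a, b)] = 1.0 if a == b else co.get((a, b), 0) / df[a]
    return vocab, rel


def fuzzy_vector(tokens, features, rel):
    """Membership of a tweet in each feature: max relatedness over its terms (fuzzy OR).

    Unknown terms contribute 0, so a tweet with no known terms maps to all zeros.
    """
    terms = set(tokens)
    return [max((rel.get((t, f), 0.0) for t in terms), default=0.0) for f in features]


corpus = [
    ["i", "love", "this", "phone"],
    ["great", "phone"],
    ["battery", "is", "awful"],
    ["bad", "battery"],
]
replaced = [sentiment_replace(t) for t in corpus]
features, rel = build_fuzzy_thesaurus(replaced)
print(len({t for d in corpus for t in d}), "->", len(features))  # 9 -> 7
print(fuzzy_vector(["phone"], features, rel))
```

In this toy corpus, replacing four sentiment words with two polarity tokens shrinks the vocabulary from 9 to 7 terms. A term-presence vector for the tweet "phone" would be nonzero only in the `phone` column. The fuzzy vector, by contrast, also assigns membership to `POS_TOKEN` (1.0), `i` (0.5), and `this` (0.5), because those terms co-occur with "phone" in the corpus.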
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 6011-6024 |
Number of pages | 14 |
Journal | Soft Computing |
Volume | 22 |
Issue number | 18 |
Publication status | Published - Sept 1 2018 |
Keywords
- Fuzzy thesaurus
- Semantic analysis
- Text context
- Text mining
- Twitter sentiment analysis
ASJC Scopus subject areas
- Software
- Theoretical Computer Science
- Geometry and Topology