Semantic Twitter sentiment analysis based on a fuzzy thesaurus

Heba M. Ismail, Boumediene Belkhouche, Nazar Zaki

Research output: Contribution to journal › Article › peer-review

22 Citations (Scopus)

Abstract

We define a new, fully automated, domain-independent method for building feature vectors from a Twitter text corpus for machine-learning sentiment analysis, based on a fuzzy thesaurus and sentiment replacement. Instead of using term-presence or term-frequency feature vectors, the proposed method measures the semantic similarity of tweets to the features in the feature space. Thus, we account for the sentiment of the context instead of just counting sentiment words. We use sentiment replacement to reduce the dimensionality of the feature space and a fuzzy thesaurus to incorporate semantics. Experimental results show that sentiment replacement yields up to a 35% reduction in the dimensionality of the feature space. Moreover, feature vectors built with the fuzzy thesaurus improve sentiment classification performance with multinomial naïve Bayes and support vector machine classifiers, which reach accuracies of 83% and 85%, respectively, on the Stanford testing dataset. Incorporating the fuzzy thesaurus gave the best accuracy, an increase of more than 3% over the baselines. Comparable results were obtained with a larger dataset, STS-Gold, indicating the robustness of the proposed method. Furthermore, a comparison with previous work shows that the proposed method outperforms other methods reported in the literature on the same benchmark data.
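The abstract does not give the formulas behind the two steps, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a toy sentiment lexicon (hypothetical), a co-occurrence-based fuzzy thesaurus using the Jaccard-style correlation n_ab / (n_a + n_b - n_ab), and an algebraic-sum aggregation for a tweet's membership in each feature. All function and variable names are illustrative.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy lexicon; a real system would use a full sentiment lexicon.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def replace_sentiment(tokens):
    """Map lexicon words to a shared placeholder, shrinking the vocabulary."""
    out = []
    for t in tokens:
        if t in POSITIVE:
            out.append("POS")
        elif t in NEGATIVE:
            out.append("NEG")
        else:
            out.append(t)
    return out


def build_thesaurus(docs):
    """Return a similarity function based on document co-occurrence counts."""
    term_df = Counter()  # number of documents containing each term
    pair_df = Counter()  # number of documents containing both terms of a pair
    for doc in docs:
        terms = sorted(set(doc))
        term_df.update(terms)
        pair_df.update(combinations(terms, 2))

    def similarity(a, b):
        if a == b:
            return 1.0 if a in term_df else 0.0
        n_ab = pair_df[tuple(sorted((a, b)))]
        denom = term_df[a] + term_df[b] - n_ab
        return n_ab / denom if denom else 0.0

    return similarity


def feature_vector(tokens, features, similarity):
    """Membership of a tweet in each feature: 1 - prod(1 - sim(term, feature))."""
    vec = []
    for f in features:
        miss = 1.0
        for t in set(tokens):
            miss *= 1.0 - similarity(t, f)
        vec.append(1.0 - miss)
    return vec


tweets = [
    ["i", "love", "this", "phone"],
    ["great", "phone"],
    ["i", "hate", "this", "battery"],
    ["awful", "battery"],
]
docs = [replace_sentiment(t) for t in tweets]
sim = build_thesaurus(docs)

# A tweet with no sentiment word still receives a graded membership in the
# sentiment features through the terms it co-occurs with in the corpus.
vec = feature_vector(["phone"], ["POS", "NEG"], sim)
```

In this toy corpus, replacement merges "love" and "great" into one feature and "hate" and "awful" into another, which is the kind of dimensionality reduction the abstract attributes to sentiment replacement. The resulting vectors could then be passed to a standard classifier such as multinomial naïve Bayes or a support vector machine.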

Original language: English
Pages (from-to): 6011-6024
Number of pages: 14
Journal: Soft Computing
Volume: 22
Issue number: 18
Publication status: Published - Sept 1 2018

Keywords

  • Fuzzy thesaurus
  • Semantic analysis
  • Text context
  • Text mining
  • Twitter sentiment analysis

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Geometry and Topology
