A Generalized Method for Sentiment Analysis across Different Sources

M. Ashir, Abubakar (2021) A Generalized Method for Sentiment Analysis across Different Sources. Applied Computational Intelligence and Soft Computing.

Official URL: https://www.hindawi.com/journals/acisc/2021/252998...

Abstract

Sentiment analysis is widely used in a variety of applications, such as online opinion gathering for policy directives in government, monitoring of customer and staff satisfaction in corporate bodies, and public tension monitoring in politics and security structures. In recent times, the field has met a new set of challenges, where new algorithms have to contend with highly unstructured sources of sentiment expression emanating from online social media fora. In this study, a rule- and lexical-based procedure is proposed together with unsupervised machine learning to implement sentiment analysis with improved generalization ability across different sources. To deal with sources devoid of syntactic and grammatical structure, the approach incorporates a rule-based technique for emoticon detection, word contraction expansion, and noise removal, together with lexicon-based text preprocessing using lexical features such as part of speech (POS), stop words, and lemmatization for local context analysis. A text is broken into a number of tokens, each representing a sentence, and lexicon-dependent features are then extracted from each token. The features are merged using a combining function for a given text before being used to train a machine learning classifier. The proposed combining functions leverage averaging and information gain concepts. Experimental results with different machine learning classifiers indicate that improved performance, with a great deal of generalization capacity across both structured and nonstructured sources, can be realized. The findings show that carefully designed lexical features reinforce the learning process in unsupervised learning more than using word embeddings alone as features. Experimental results obtained from the movie review dataset (recall = 74.9%, precision = 70.9%, F1-score = 72.9%, and accuracy = 72.0%) and the Twitter samples datasets (recall = 93.4%, precision = 89.5%, F1-score = 91.4%, and accuracy = 91.1%) show the efficacy of the proposed approach in comparison with other state-of-the-art research studies.
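
The pipeline outlined in the abstract (rule-based cleanup, sentence-level tokens, lexicon-dependent features, and an averaging combining function) might be sketched roughly as follows. This is only an illustrative sketch, not the author's implementation: the lookup tables `EMOTICONS`, `CONTRACTIONS`, and `LEXICON`, the two-value feature vector, and all function names are assumptions made for the example, and the information-gain combining function and the classifier are left out.

```python
import re

# Hypothetical, deliberately tiny lookup tables (assumptions for illustration).
EMOTICONS = {":)": "happy", ":(": "sad", ":D": "happy"}
CONTRACTIONS = {"can't": "cannot", "it's": "it is", "i'm": "i am"}
LEXICON = {"good": 1.0, "great": 1.0, "happy": 1.0,
           "bad": -1.0, "boring": -1.0, "sad": -1.0}


def preprocess(text):
    """Rule-based cleanup: noise removal, emoticon mapping, contraction expansion."""
    # Noise removal first, so URLs are gone before emoticon matching.
    text = re.sub(r"https?://\S+|[@#]\w+", " ", text)
    for emoticon, word in EMOTICONS.items():
        text = text.replace(emoticon, f" {word} ")
    text = text.lower()
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text):
    """Break a cleaned text into sentence-level tokens."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def sentence_features(sentence):
    """Assumed feature vector: [positive lexicon mass, negative lexicon mass]."""
    scores = [LEXICON[w] for w in re.findall(r"[a-z']+", sentence) if w in LEXICON]
    positive = sum(s for s in scores if s > 0)
    negative = sum(-s for s in scores if s < 0)
    return [positive, negative]


def combine_by_average(feature_vectors):
    """Averaging combining function: element-wise mean over sentence vectors."""
    if not feature_vectors:
        return [0.0, 0.0]
    count = len(feature_vectors)
    return [sum(column) / count for column in zip(*feature_vectors)]


def text_features(text):
    """Full sketch: preprocess, split into sentences, extract, then combine."""
    sentences = split_sentences(preprocess(text))
    return combine_by_average([sentence_features(s) for s in sentences])


if __name__ == "__main__":
    print(text_features("The acting was great :). The ending was boring and bad :("))
```

The combined vector would then be the input to whichever classifier is trained; averaging keeps the feature length fixed regardless of how many sentences a text contains, which is one plausible reason to use it across sources of very different lengths.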

Item Type: Article
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Depositing User: ePrints deposit
Date Deposited: 02 Mar 2022 08:59
Last Modified: 02 Mar 2022 08:59
URI: http://eprints.tiu.edu.iq/id/eprint/843
