
Sentiment Analysis

Sentiment analysis determines the affective orientation of text, that is, whether the expressed opinion is positive, negative, or neutral. It uses machine learning and lexicon-based methods to quantify subjective language at the document, sentence, and aspect levels.

sentiment(d) = argmax_{s ∈ {pos, neg, neu}} P(s | d)

Sentiment analysis, also called opinion mining, is the computational study of people's opinions, sentiments, evaluations, attitudes, and emotions expressed in written text. The task is typically formulated as a classification problem: given a text unit (document, sentence, or phrase), determine whether it expresses a positive, negative, or neutral sentiment. More fine-grained formulations predict sentiment on ordinal scales (e.g., 1 to 5 stars) or assign continuous valence scores. Sentiment analysis has become one of the most commercially important NLP applications, with uses spanning product review analysis, brand monitoring, financial market prediction, and political opinion tracking.

Lexicon-Based Approaches

Lexicon-Based Sentiment Scoring

score(d) = ∑_{w ∈ d} polarity(w) × modifier(w)

polarity(w) ∈ {-1, 0, +1}, taken from a sentiment lexicon
modifier(w) accounts for negation, intensification, and diminution

Classification: positive if score(d) > θ, negative if score(d) < -θ, neutral otherwise (θ ≥ 0 is a decision threshold)

Lexicon-based methods compute sentiment by aggregating the polarities of individual words using a sentiment dictionary such as SentiWordNet, VADER, or the MPQA lexicon. Each word is assigned a polarity score, and the document's overall sentiment is derived by summing or averaging these scores, with adjustments for valence shifters such as negation ("not good"), intensifiers ("very good"), and diminishers ("somewhat good"). VADER (Valence Aware Dictionary and sEntiment Reasoner) incorporates grammatical and syntactic heuristics, handling capitalisation, punctuation, and degree modifiers to achieve performance competitive with machine learning approaches on social media text.
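The scoring rule can be illustrated with a minimal Python sketch. The word lists, the modifier weights, the threshold value, and the rule that a valence shifter affects only the next non-shifter word are all assumptions made for illustration; they are not taken from SentiWordNet, VADER, or the MPQA lexicon, which use larger vocabularies and more elaborate heuristics.

```python
# Toy lexicon and modifier tables. Every entry and weight here is an
# illustrative assumption, not a value from any published lexicon.
POLARITY = {"good": 1, "great": 1, "love": 1, "bad": -1, "terrible": -1, "hate": -1}
NEGATORS = {"not", "never", "no"}
INTENSIFIERS = {"very": 1.5, "extremely": 2.0}
DIMINISHERS = {"somewhat": 0.5, "slightly": 0.5}


def lexicon_score(text):
    """Return the sum over words of polarity(w) * modifier(w)."""
    score = 0.0
    modifier = 1.0  # combined effect of the shifters seen since the last scored word
    for token in text.lower().split():
        word = token.strip(".,!?\"'")
        if word in NEGATORS:
            modifier *= -1.0
        elif word in INTENSIFIERS:
            modifier *= INTENSIFIERS[word]
        elif word in DIMINISHERS:
            modifier *= DIMINISHERS[word]
        else:
            score += POLARITY.get(word, 0) * modifier
            modifier = 1.0  # shifters apply only to the next non-shifter word


    return score


def classify(text, theta=0.5):
    """Map the score to a label using the threshold theta."""
    score = lexicon_score(text)
    if score > theta:
        return "pos"
    if score < -theta:
        return "neg"
    return "neu"


if __name__ == "__main__":
    for sentence in ["not good", "very good", "somewhat good"]:
        print(sentence, lexicon_score(sentence), classify(sentence))
```

Resetting the modifier after every non-shifter word keeps the sketch short, but it also means a shifter separated from its target by another word (as in "not a good film") is lost. Practical systems typically use a wider window or syntactic scope for negation.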

Machine Learning and Deep Learning Approaches

Supervised machine learning approaches treat sentiment analysis as a standard text classification problem, training classifiers on labelled datasets of reviews or opinions. Pang, Lee, and Vaithyanathan (2002) established this paradigm by showing that machine learning classifiers (Naive Bayes, support vector machines, and maximum entropy) trained on bag-of-words features could classify movie review sentiment with accuracies around 80%, comparable to or exceeding lexicon-based methods. Subsequent work improved performance through feature engineering, subjectivity detection, and the use of document structure.
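As a rough illustration of the supervised setup, the following sketch trains a multinomial Naive Bayes classifier on bag-of-words counts with add-one (Laplace) smoothing. It is not the experimental configuration of Pang, Lee, and Vaithyanathan (2002): the four training sentences, the whitespace tokenisation, and the function names are assumptions chosen to keep the example self-contained.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training set of (label, text) pairs.
TRAIN = [
    ("pos", "a wonderful and moving film"),
    ("pos", "great acting and a wonderful script"),
    ("neg", "a dull and boring film"),
    ("neg", "boring script and terrible acting"),
]


def train_naive_bayes(examples):
    """Collect class counts, per-class word counts, and the vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, text in examples:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def posterior(model, text, alpha=1.0):
    """Return P(label | text) for every label, using add-alpha smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        log_p = math.log(class_counts[label] / total_docs)  # log prior
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            count = word_counts[label][word]
            log_p += math.log((count + alpha) / (total_words + alpha * len(vocab)))
        log_scores[label] = log_p
    # Normalise in log space so long texts do not underflow.
    max_log = max(log_scores.values())
    unnormalised = {k: math.exp(v - max_log) for k, v in log_scores.items()}
    z = sum(unnormalised.values())
    return {k: v / z for k, v in unnormalised.items()}


def predict(model, text):
    """Return the label with the highest posterior probability."""
    probs = posterior(model, text)
    return max(probs, key=probs.get)


if __name__ == "__main__":
    model = train_naive_bayes(TRAIN)
    print(posterior(model, "wonderful acting"))
    print(predict(model, "boring film"))
```

The final step in `predict` is the argmax over class posteriors, which is the same decision rule as sentiment(d) = argmax P(s | d).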

Deep learning has substantially advanced sentiment analysis. Recursive neural networks (Socher et al., 2013) model compositional semantics by building representations bottom-up over parse trees, capturing how negation and other modifiers interact with sentiment-bearing words. Convolutional neural networks extract sentiment-relevant local features, while attention-based models learn which parts of the text are most indicative of sentiment. Pretrained transformers such as BERT have pushed state-of-the-art results on standard benchmarks like SST (Stanford Sentiment Treebank), achieving accuracies above 95% on binary sentiment classification. However, domain adaptation remains challenging: a model trained on movie reviews may perform poorly on restaurant reviews or financial text.

The Challenge of Sarcasm and Irony

Sarcasm and irony pose fundamental challenges for sentiment analysis because the surface-level sentiment of individual words contradicts the intended sentiment of the utterance. The sentence "Oh great, another meeting" uses the positive word "great" to express a negative sentiment. Detecting sarcasm requires pragmatic reasoning that goes beyond lexical and even syntactic analysis, involving context, speaker intent, and common-sense knowledge. Despite significant research, sarcasm detection remains an open problem, with the best systems achieving F1 scores well below human performance.
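To make the sarcasm problem in this section concrete, the short sketch below sums word polarities for "Oh great, another meeting" while ignoring context. The three-word lexicon and the function name are hypothetical and exist only for this illustration.

```python
# Hypothetical three-word lexicon, for illustration only.
POLARITY = {"great": 1, "awful": -1, "boring": -1}


def surface_polarity(sentence):
    """Sum the lexicon polarity of each word, ignoring context entirely."""
    words = [w.strip(".,!?\"'").lower() for w in sentence.split()]
    return sum(POLARITY.get(w, 0) for w in words)


if __name__ == "__main__":
    print(surface_polarity("Oh great, another meeting"))
```

The only polar word the scorer finds is "great", so the sentence receives a positive score even though a human reader would take it as a complaint. No change to the word list fixes this, because the cue lies in context and speaker intent, not in the words themselves.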

References

  1. Pang, B., Lee, L., & Vaithyanathan, S. (2002). Thumbs up? Sentiment classification using machine learning techniques. Proceedings of EMNLP, 79–86. doi:10.3115/1118693.1118704
  2. Liu, B. (2012). Sentiment Analysis and Opinion Mining. Morgan & Claypool.
  3. Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C. D., Ng, A., & Potts, C. (2013). Recursive deep models for semantic compositionality over a sentiment treebank. Proceedings of EMNLP, 1631–1642.
  4. Hutto, C. J., & Gilbert, E. (2014). VADER: A parsimonious rule-based model for sentiment analysis of social media text. Proceedings of ICWSM, 216–225.
