Opinion mining extends sentiment analysis by extracting structured representations of opinions from text. While sentiment analysis typically classifies the overall polarity of a document or sentence, opinion mining aims to identify the full structure of an expressed opinion: who holds the opinion (the opinion holder), what entity or aspect the opinion is about (the target), what sentiment is expressed (the polarity), and when the opinion was expressed (the time). This structured extraction enables applications such as comparative opinion mining, opinion summarisation, and fine-grained analysis of public discourse about products, policies, or public figures.
Opinion Components and Extraction
Formally, an opinion can be represented as a quintuple (h, t, a, s, time), where:
h = opinion holder (the entity expressing the opinion)
t = opinion target (the entity being evaluated)
a = the aspect of the target under discussion
s = sentiment polarity (positive, negative, or neutral)
time = the time at which the opinion was expressed
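The quintuple maps naturally onto a simple record type. A minimal sketch in Python (the field names and example values are illustrative, not taken from any particular system):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Opinion:
    """One extracted opinion as an (h, t, a, s, time) quintuple."""
    holder: str              # h: entity expressing the opinion
    target: str              # t: entity being evaluated
    aspect: Optional[str]    # a: aspect of the target, if any
    sentiment: str           # s: "positive", "negative", or "neutral"
    time: Optional[str]      # temporal reference, if any

# A hypothetical extraction result for one review sentence:
op = Opinion(holder="reviewer_42", target="Camera X",
             aspect="image quality", sentiment="positive", time="2023-05")
```

Downstream components such as comparative mining and summarisation can then operate over collections of these records rather than raw text.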
Extracting opinion components from unstructured text requires a combination of NLP techniques. Opinion holder identification draws on named entity recognition and coreference resolution to determine who is expressing an opinion, particularly important in news text where journalists report others' opinions. Target extraction identifies the entities or topics under discussion, often using dependency parsing to link sentiment expressions to their syntactic arguments. The sentiment classification component determines the polarity of the opinion, which may require resolving the scope of negation and understanding comparative constructions ("X is better than Y").
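The negation-scope issue mentioned above can be illustrated with a toy lexicon-based scorer: a negation cue flips the polarity of sentiment words within a fixed window of following tokens. The lexicon and window size here are illustrative assumptions, not values from the text:

```python
# Toy sentiment lexicons and negation cues (illustrative, deliberately tiny).
POSITIVE = {"good", "great", "excellent", "better"}
NEGATIVE = {"bad", "poor", "terrible", "worse"}
NEGATORS = {"not", "never", "no", "n't"}
SCOPE = 3  # tokens after a negator that count as inside its scope

def sentence_polarity(tokens):
    score = 0
    negated_until = -1
    for i, tok in enumerate(tokens):
        t = tok.lower()
        if t in NEGATORS:
            negated_until = i + SCOPE
            continue
        val = 1 if t in POSITIVE else -1 if t in NEGATIVE else 0
        if val and i <= negated_until:
            val = -val  # flip polarity inside the negation scope
        score += val
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentence_polarity("the screen is not good".split()))  # negative
```

Real systems replace the fixed window with syntactic scope from a dependency parse, but the flipping logic is the same in spirit.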
Comparative and Contrastive Opinions
Comparative opinion mining identifies sentences that express preferences or comparisons between entities and extracts the comparative relations. Comparative sentences take forms such as "Camera X has better image quality than Camera Y" or "I prefer restaurant A to restaurant B." Jindal and Liu (2006) developed methods to identify comparative sentences using class sequential rules derived from labelled training data and to extract the entities being compared, the aspect of comparison, and the preferred entity. This structured extraction enables applications in consumer decision support and competitive analysis.
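A heavily simplified version of comparative relation extraction can be sketched with a single hand-written pattern. Jindal and Liu (2006) learned class sequential rules from labelled data; the one regular expression below is only an illustration of the extraction target, not their method:

```python
import re

# One illustrative pattern: "<entity1> has better/worse <aspect> than <entity2>".
PATTERN = re.compile(
    r"(?P<e1>[\w ]+?) has (?P<cmp>better|worse) (?P<aspect>[\w ]+?) than (?P<e2>[\w ]+)",
    re.IGNORECASE,
)

def extract_comparison(sentence):
    m = PATTERN.search(sentence)
    if not m:
        return None  # not a comparative sentence (under this toy pattern)
    preferred = m["e1"] if m["cmp"].lower() == "better" else m["e2"]
    return {
        "entity1": m["e1"].strip(),
        "entity2": m["e2"].strip(),
        "aspect": m["aspect"].strip(),
        "preferred": preferred.strip(),
    }

rel = extract_comparison("Camera X has better image quality than Camera Y")
```

The output mirrors the structured relation described above: the two entities, the aspect of comparison, and the preferred entity.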
The commercial value of online opinions has given rise to opinion spam — fake reviews written to artificially promote or damage products. Ott et al. (2011) created the first gold-standard dataset of deceptive opinion spam and showed that human judges perform only slightly better than chance at detecting fake reviews, while machine learning classifiers achieve around 90% accuracy using psycholinguistic features. Detecting opinion spam remains an active research area, as the sophistication of spam generation has increased with the availability of large language models.
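Classifiers of the kind described above operate on feature vectors extracted from review text. The function below sketches a few surface features of the general flavour used in spam detection; the specific feature set is a toy assumption, not the one from Ott et al. (2011):

```python
# Illustrative surface-feature extractor for a review-spam classifier.
FIRST_PERSON = {"i", "me", "my", "mine", "we", "our"}
SUPERLATIVES = {"best", "worst", "amazing", "perfect", "horrible"}

def review_features(text):
    tokens = text.lower().split()
    n = max(len(tokens), 1)
    return {
        "first_person_rate": sum(t.strip(".,!") in FIRST_PERSON for t in tokens) / n,
        "superlative_rate": sum(t.strip(".,!") in SUPERLATIVES for t in tokens) / n,
        "exclamations": text.count("!"),
        "length": len(tokens),
    }

feats = review_features("My stay was amazing! Best hotel we ever booked!")
```

A downstream classifier (e.g. logistic regression or naive Bayes) would be trained on such vectors against the gold-standard deceptive/truthful labels.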
Opinion summarisation aggregates individual opinions into concise summaries that capture the distribution of sentiments across different aspects of a target entity. Aspect-based opinion summarisation might produce output such as "Most reviewers praised the camera's image quality (85% positive) but criticised its battery life (60% negative)." Generating such summaries requires aspect extraction, sentiment classification for each aspect, and natural language generation to produce readable output. Both extractive approaches, which select representative opinion sentences, and abstractive approaches, which generate novel summary text, have been explored for this task.
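The aggregation step behind such summaries can be sketched in a few lines: count polarity labels per aspect and report the majority view with its share. The input format (a list of aspect/polarity pairs) is an assumption for illustration:

```python
from collections import Counter

def summarise(opinions):
    """opinions: iterable of (aspect, polarity) pairs from per-aspect classification."""
    by_aspect = {}
    for aspect, polarity in opinions:
        by_aspect.setdefault(aspect, Counter())[polarity] += 1
    lines = []
    for aspect, counts in by_aspect.items():
        top, n = counts.most_common(1)[0]          # majority polarity
        share = 100 * n // sum(counts.values())    # its share in percent
        lines.append(f"{aspect}: mostly {top} ({share}% {top})")
    return lines

# 20 hypothetical opinions per aspect, matching the worked example above:
ops = ([("image quality", "positive")] * 17 + [("image quality", "negative")] * 3
       + [("battery life", "negative")] * 12 + [("battery life", "positive")] * 8)
print("\n".join(summarise(ops)))
```

This produces the 85%-positive / 60%-negative breakdown from the example; a full abstractive summariser would additionally feed these statistics and representative sentences into a generation model.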