SentimentIQ

Methodology

How we compute sentiment from Amazon product reviews

01

Dataset

The analysis is based on the McAuley Amazon Cell Phones & Accessories 5-core dataset, a publicly available corpus compiled by Julian McAuley's lab at UC San Diego. The 5-core variant retains only users and items with at least five reviews each, which filters out one-off reviewers and sparsely reviewed products.

Source: McAuley Lab, Amazon Review Data (2018)
Category: Cell Phones & Accessories
Scope: 18 products across 6 brands (top 3 by review count per brand)
Brands: Apple · Samsung · Google · OnePlus · Motorola · Nokia
Reviews: ~500k–1M sampled from the full dataset
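
The product selection in the Scope row (top 3 by review count per brand) can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes each review record has already been joined with product metadata and exposes the hypothetical keys "brand" and "asin".

```python
from collections import Counter, defaultdict

BRANDS = ["Apple", "Samsung", "Google", "OnePlus", "Motorola", "Nokia"]


def top_products_per_brand(reviews, brands=BRANDS, k=3):
    """Return {brand: [product ids]} with the k most-reviewed products per brand.

    `reviews` is an iterable of dicts with the hypothetical keys
    "brand" and "asin" (assumed here, not confirmed by the source).
    """
    counts = defaultdict(Counter)
    for review in reviews:
        if review["brand"] in brands:
            counts[review["brand"]][review["asin"]] += 1
    # Brands with no matching reviews are omitted from the result.
    return {
        brand: [asin for asin, _ in counts[brand].most_common(k)]
        for brand in brands
        if brand in counts
    }
```

Counting per brand first and then truncating keeps the selection to a single pass over the reviews.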

02

VADER Sentiment Scoring

VADER (Valence Aware Dictionary and sEntiment Reasoner) is the primary sentiment engine. Tuned for social media and informal text, it outputs a compound score in [-1.0, +1.0].

Label | Condition | Meaning
Positive | compound ≥ 0.05 | Clearly positive tone
Neutral | -0.05 < compound < 0.05 | No strong signal
Negative | compound ≤ -0.05 | Clearly negative tone
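
The labelling rule above is small enough to show directly. The sketch below assumes the `vaderSentiment` Python package is the VADER implementation in use (the source does not say which one); the helper names are illustrative.

```python
from functools import lru_cache

POSITIVE_THRESHOLD = 0.05
NEGATIVE_THRESHOLD = -0.05


def label_from_compound(compound: float) -> str:
    """Map a VADER compound score to the three labels in the table."""
    if compound >= POSITIVE_THRESHOLD:
        return "positive"
    if compound <= NEGATIVE_THRESHOLD:
        return "negative"
    return "neutral"


@lru_cache(maxsize=1)
def _analyzer():
    # Imported lazily so label_from_compound works without the dependency.
    # Assumption: the vaderSentiment package provides the analyzer.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    return SentimentIntensityAnalyzer()


def score_review(text: str) -> dict:
    """Score one review and attach its label."""
    compound = _analyzer().polarity_scores(text)["compound"]
    return {"compound": compound, "label": label_from_compound(compound)}
```

Caching the analyzer avoids reloading the lexicon for every review, which matters at the stated sample size.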

03

TextBlob — Secondary Signal

TextBlob provides two supplementary metrics that complement VADER's output:

Polarity

A float in [-1.0, +1.0]. Used as a cross-check against the VADER compound score to flag inconsistencies.

Subjectivity

A float in [0.0, 1.0]. Reviews with very low subjectivity (e.g. shipping confirmations) are excluded from aspect scoring.
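
One way the two TextBlob signals could be applied is sketched below. The exact rules are not given in this document, so both the disagreement rule (opposite signs outside a small deadband) and the 0.1 subjectivity cutoff are assumptions chosen for illustration.

```python
def scores_disagree(vader_compound, textblob_polarity, deadband=0.05):
    """True when the two scores point in opposite directions.

    Both scores must lie outside the deadband; this rule is an assumption.
    """
    return (
        (vader_compound >= deadband and textblob_polarity <= -deadband)
        or (vader_compound <= -deadband and textblob_polarity >= deadband)
    )


def is_subjective_enough(subjectivity, min_subjectivity=0.1):
    """Hypothetical cutoff for keeping a review in aspect scoring."""
    return subjectivity >= min_subjectivity


def textblob_signals(text, vader_compound):
    """Compute TextBlob polarity/subjectivity and the derived flags."""
    # Imported lazily so the pure helpers above work without the dependency.
    from textblob import TextBlob

    sentiment = TextBlob(text).sentiment  # namedtuple: (polarity, subjectivity)
    return {
        "polarity": sentiment.polarity,
        "subjectivity": sentiment.subjectivity,
        "inconsistent": scores_disagree(vader_compound, sentiment.polarity),
        "use_for_aspects": is_subjective_enough(sentiment.subjectivity),
    }
```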

04

Aspect Extraction

Aspect-level sentiment is extracted by matching aspect keywords and then using spaCy's dependency parser to find the words that describe them. Six aspects are tracked:

Battery · Camera · Screen · Price · Build Quality · Delivery

For each aspect, predefined seed keywords are matched against review tokens. spaCy's dependency graph is then walked to collect nearby adjectives and adverbs. VADER scores the extracted snippet, and results are aggregated into positive/neutral/negative percentages.
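
The matching and graph-walking steps could look roughly like this. It is a sketch under assumptions: the seed lists are invented for illustration (the real ones are not given here), the walk is limited to one hop around the matched token, and `nlp` is assumed to be a loaded spaCy pipeline with a parser and tagger.

```python
from typing import Optional

# Hypothetical seed keywords; the actual lists are not part of this document.
ASPECT_SEEDS = {
    "battery": {"battery", "charge", "charging"},
    "camera": {"camera", "photo", "lens"},
    "screen": {"screen", "display"},
    "price": {"price", "cost", "value"},
    "build_quality": {"build", "durability", "casing"},
    "delivery": {"delivery", "shipping", "package"},
}


def aspect_for(lemma: str) -> Optional[str]:
    """Return the aspect whose seed list contains this lemma, if any."""
    for aspect, seeds in ASPECT_SEEDS.items():
        if lemma in seeds:
            return aspect
    return None


def nearby_modifiers(token):
    """Collect adjectives and adverbs one hop away in the dependency graph."""
    neighbours = list(token.children) + [token.head] + list(token.head.children)
    found = {t.i: t for t in neighbours if t.pos_ in {"ADJ", "ADV"}}
    # Also pick up adverbs that modify a collected adjective ("really bad").
    for modifier in list(found.values()):
        for child in modifier.children:
            if child.pos_ == "ADV":
                found[child.i] = child
    return [found[i] for i in sorted(found)]


def extract_aspect_snippets(text, nlp):
    """Return (aspect, snippet) pairs for one review, in document order."""
    snippets = []
    for token in nlp(text):
        aspect = aspect_for(token.lemma_.lower())
        if aspect is None:
            continue
        words = sorted(nearby_modifiers(token) + [token], key=lambda t: t.i)
        snippets.append((aspect, " ".join(t.text for t in words)))
    return snippets
```

Each snippet would then be passed to VADER and labelled with the same thresholds used for whole reviews.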

05

Aggregation & Output

After per-review scoring, results are aggregated per product into three outputs:

Overall sentiment

Positive/neutral/negative review counts as percentages of total.

Aspect scores

Per-aspect sentiment percentages plus mention count. The three most praised and the three most criticized aspects are ranked.

Monthly sentiment

Reviews bucketed by calendar month into a time-series used in the Trends section.

All results are serialised into a single data.json file and bundled statically — no runtime API calls are made.
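
The aggregation steps above can be sketched with the standard library alone. The record keys ("label", "date" as an ISO "YYYY-MM-DD" string), the ranking criterion, and the shape of the output are assumptions for illustration; the actual data.json schema is not documented here.

```python
import json
from collections import Counter, defaultdict

LABELS = ("positive", "neutral", "negative")


def sentiment_percentages(labels):
    """Turn a list of labels into percentages of the total."""
    counts = Counter(labels)
    total = sum(counts[label] for label in LABELS)
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: round(100 * counts[label] / total, 1) for label in LABELS}


def monthly_sentiment(reviews):
    """Bucket reviews by calendar month ("YYYY-MM") in chronological order."""
    buckets = defaultdict(list)
    for review in reviews:
        buckets[review["date"][:7]].append(review["label"])
    return {month: sentiment_percentages(buckets[month]) for month in sorted(buckets)}


def rank_aspects(aspect_scores, k=3):
    """Rank aspects by positive and by negative share (assumed criterion)."""
    names = list(aspect_scores)
    praised = sorted(names, key=lambda a: aspect_scores[a]["positive"], reverse=True)
    criticized = sorted(names, key=lambda a: aspect_scores[a]["negative"], reverse=True)
    return {"praised": praised[:k], "criticized": criticized[:k]}


def write_data_json(products, path="data.json"):
    """Serialise the per-product results into a single static file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(products, handle, indent=2)
```

Because everything is precomputed into one file, the front end only needs to load data.json at build time.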