Methodology

How we compute sentiment from Amazon product reviews

Dataset

The analysis is based on the McAuley Amazon Cell Phones & Accessories 5-core dataset, a publicly available corpus compiled by Julian McAuley at UC San Diego. The 5-core variant retains only users and items with at least five reviews each, which removes sparsely reviewed products and one-off reviewers that would otherwise add noise to per-product statistics.
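To illustrate what "5-core" means, here is a minimal sketch of a k-core filter. The dataset is already distributed in 5-core form, so this step is not part of the pipeline described here; the function name `k_core` and the `(user_id, item_id)` tuple layout are assumptions made for the example.

```python
from collections import Counter


def k_core(reviews, k=5):
    """Drop reviews until every remaining user and item has at least k reviews.

    `reviews` is a list of (user_id, item_id) pairs (an assumed layout).
    Removing one review can push another user or item below k, so the
    filter repeats until nothing changes.
    """
    kept = list(reviews)
    while True:
        user_counts = Counter(user for user, _ in kept)
        item_counts = Counter(item for _, item in kept)
        filtered = [
            (user, item)
            for user, item in kept
            if user_counts[user] >= k and item_counts[item] >= k
        ]
        if len(filtered) == len(kept):
            return filtered
        kept = filtered
```

The loop matters: a single pass can leave users or items that only dropped below the threshold because of that pass.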

Source: McAuley Lab, Amazon Review Data (2018)

Category: Cell Phones & Accessories

Scope: 18 products across 6 brands (top 3 by review count per brand)

Brands: Apple · Samsung · Google · OnePlus · Motorola · Nokia

Reviews: ~500k–1M sampled from the full dataset
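The "top 3 by review count per brand" selection could be sketched as follows. This assumes each review has already been joined to a brand and a product identifier; the function name and the `(brand, product_id)` row layout are hypothetical.

```python
from collections import Counter, defaultdict


def top_products_per_brand(review_rows, per_brand=3):
    """Return {brand: [product_id, ...]}, most-reviewed products first.

    `review_rows` is an iterable of (brand, product_id) pairs, one per
    review (an assumed layout, not the dataset's native schema).
    """
    counts = defaultdict(Counter)
    for brand, product_id in review_rows:
        counts[brand][product_id] += 1
    return {
        brand: [pid for pid, _ in counter.most_common(per_brand)]
        for brand, counter in counts.items()
    }
```

With 6 brands and `per_brand=3`, this yields at most the 18 products listed in the scope.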

VADER Sentiment Scoring

VADER (Valence Aware Dictionary and sEntiment Reasoner) is the primary sentiment engine. Tuned for social media and informal text, it outputs a compound score in [-1.0, +1.0].

Label	Condition	Meaning
Positive	`compound ≥ 0.05`	Clearly positive tone
Neutral	`-0.05 < compound < 0.05`	No strong signal
Negative	`compound ≤ -0.05`	Clearly negative tone
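The thresholds in the table translate directly into code. The sketch below keeps the labelling logic free of dependencies and imports VADER lazily; it assumes the `vaderSentiment` package, and the function names are illustrative rather than taken from the project.

```python
def label_from_compound(compound: float) -> str:
    """Map a VADER compound score to the label defined in the table above."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


def score_review(text: str) -> dict:
    """Score one review with VADER (requires the `vaderSentiment` package)."""
    # Imported lazily so label_from_compound works without the dependency.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    # A real pipeline would likely create the analyzer once and reuse it.
    analyzer = SentimentIntensityAnalyzer()
    compound = analyzer.polarity_scores(text)["compound"]
    return {"compound": compound, "label": label_from_compound(compound)}
```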

TextBlob — Secondary Signal

TextBlob provides two supplementary metrics that complement VADER's output:

Polarity

A float in [-1.0, +1.0]. Used as a cross-check against the VADER compound score to flag inconsistencies.

Subjectivity

A float in [0.0, 1.0]. Reviews with very low subjectivity (e.g. shipping confirmations) are excluded from aspect scoring.
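A rough sketch of how the two TextBlob signals might be used. The exact cut-offs are not stated here, so the `threshold` and `min_subjectivity` defaults below are assumptions, as are the function names.

```python
def signs_disagree(vader_compound: float, textblob_polarity: float,
                   threshold: float = 0.05) -> bool:
    """True when one score is clearly positive and the other clearly negative.

    `threshold` reuses the VADER cut-off; applying it to TextBlob polarity
    is an assumption made for this sketch.
    """
    return (
        (vader_compound >= threshold and textblob_polarity <= -threshold)
        or (vader_compound <= -threshold and textblob_polarity >= threshold)
    )


def is_opinionated(subjectivity: float, min_subjectivity: float = 0.1) -> bool:
    """True when a review is subjective enough for aspect scoring.

    The 0.1 default is a placeholder, not a documented value.
    """
    return subjectivity >= min_subjectivity


def textblob_signals(text: str) -> dict:
    """Return TextBlob polarity and subjectivity (requires `textblob`)."""
    # Imported lazily so the helpers above work without the dependency.
    from textblob import TextBlob

    sentiment = TextBlob(text).sentiment
    return {"polarity": sentiment.polarity, "subjectivity": sentiment.subjectivity}
```

Requiring both scores to clear the threshold in opposite directions avoids flagging reviews where one engine is merely close to neutral.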

Aspect Extraction

Aspect-level sentiment is extracted using keyword co-occurrence powered by spaCy's dependency parser. Six aspects are tracked:

Battery · Camera · Screen · Price · Build Quality · Delivery

For each aspect, predefined seed keywords are matched against review tokens. spaCy's dependency graph is then walked to collect nearby adjectives and adverbs. VADER scores the extracted snippet, and results are aggregated into positive/neutral/negative percentages.
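The matching and dependency walk could look roughly like the sketch below. The seed keywords are hypothetical (the real lists are not given here), and the walk is one plausible reading of "nearby": modifiers of the keyword, of its syntactic head, and adverbs attached to those modifiers.

```python
# Hypothetical seed lemmas; the project's actual keyword lists are not shown.
ASPECT_SEEDS = {
    "Battery": {"battery", "charge", "charger"},
    "Camera": {"camera", "photo", "lens"},
    "Screen": {"screen", "display"},
    "Price": {"price", "cost", "value"},
    "Build Quality": {"build", "durability", "material"},
    "Delivery": {"delivery", "shipping", "package"},
}
MODIFIER_POS = {"ADJ", "ADV"}


def aspect_for_lemma(lemma):
    """Return the aspect whose seed set contains `lemma`, or None."""
    lemma = lemma.lower()
    for aspect, seeds in ASPECT_SEEDS.items():
        if lemma in seeds:
            return aspect
    return None


def nearby_modifiers(token):
    """Collect adjectives/adverbs close to a spaCy `token` in the parse tree."""
    head = token.head  # for the root token, spaCy sets head to the token itself
    candidates = [*token.children, *head.children, head]
    found = {t.i: t for t in candidates
             if t.pos_ in MODIFIER_POS and t.i != token.i}
    # One extra hop: adverbs attached to a collected modifier ("really great").
    for modifier in list(found.values()):
        for child in modifier.children:
            if child.pos_ == "ADV" and child.i != token.i:
                found[child.i] = child
    return [found[i] for i in sorted(found)]


def extract_aspect_snippets(doc):
    """Yield (aspect, snippet) pairs from a parsed spaCy Doc."""
    for token in doc:
        aspect = aspect_for_lemma(token.lemma_)
        if aspect is None:
            continue
        words = sorted(nearby_modifiers(token) + [token], key=lambda t: t.i)
        yield aspect, " ".join(t.text for t in words)


# Usage (requires spaCy and an English model such as en_core_web_sm):
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   for aspect, snippet in extract_aspect_snippets(nlp("The battery is great.")):
#       ...  # pass `snippet` to VADER
```

This is a heuristic: in coordinated phrases a keyword can pick up a modifier that belongs to a neighbouring aspect, and negation particles are not collected, so a production version would need extra handling for both.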

Aggregation & Output

After per-review scoring, results are aggregated per product into three outputs:

Overall sentiment

Positive/neutral/negative review counts as percentages of total.

Aspect scores

Per-aspect sentiment percentages plus the number of mentions. The top 3 most praised and top 3 most criticized aspects are also ranked.

Monthly sentiment

Reviews bucketed by calendar month into a time-series used in the Trends section.
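The three outputs above can be sketched with a few small helpers. The input shapes, the UTC bucketing, the one-decimal rounding, and the ranking criterion (highest positive or negative percentage) are all assumptions; the dataset's `unixReviewTime` field is one candidate source for the timestamps.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone

LABELS = ("positive", "neutral", "negative")


def sentiment_percentages(labels):
    """Return each label's share of `labels` as a percentage (1 decimal)."""
    counts = Counter(labels)
    total = sum(counts[label] for label in LABELS)
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: round(100 * counts[label] / total, 1) for label in LABELS}


def monthly_sentiment(reviews):
    """Bucket (unix_timestamp, label) pairs by calendar month, in UTC."""
    buckets = defaultdict(list)
    for unix_ts, label in reviews:
        month = datetime.fromtimestamp(unix_ts, tz=timezone.utc).strftime("%Y-%m")
        buckets[month].append(label)
    return {month: sentiment_percentages(buckets[month]) for month in sorted(buckets)}


def rank_aspects(aspect_stats, top_n=3):
    """Rank aspects by positive and by negative percentage (assumed criterion).

    `aspect_stats` maps aspect name -> {"positive": pct, "negative": pct, ...}.
    """
    praised = sorted(aspect_stats, key=lambda a: aspect_stats[a]["positive"],
                     reverse=True)[:top_n]
    criticized = sorted(aspect_stats, key=lambda a: aspect_stats[a]["negative"],
                        reverse=True)[:top_n]
    return {"praised": praised, "criticized": criticized}
```

Sorting the month keys as `YYYY-MM` strings keeps the time series in chronological order without extra date handling.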

All results are serialised into a single data.json file and bundled statically — no runtime API calls are made.
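Writing the bundle could be as simple as the sketch below. Only the file name data.json comes from the description above; the `{"products": [...]}` layout and the function name are hypothetical.

```python
import json
from pathlib import Path


def write_data_json(products, path="data.json"):
    """Serialise per-product results to one JSON file and return its path.

    `products` is a list of JSON-serialisable dicts; the top-level
    {"products": [...]} wrapper is an assumed layout.
    """
    out_path = Path(path)
    payload = {"products": products}
    out_path.write_text(
        json.dumps(payload, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
    return out_path
```

Because the file is generated ahead of time, the front end only needs to load a static asset.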