Methodology
How we compute sentiment from Amazon product reviews
Dataset
The analysis uses the McAuley Amazon Cell Phones & Accessories 5-core dataset, a publicly available corpus compiled by Julian McAuley at UC San Diego. The 5-core variant retains only users and items with at least five reviews each, so every product in the analysis has enough reviews to support per-product aggregates.
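To make the 5-core condition concrete, here is a minimal sketch of k-core filtering over (user, item) pairs. The dataset already ships pre-filtered, so this is not a step in the pipeline; the function name and the pair representation are assumptions made for illustration only.

```python
from collections import Counter

def k_core(reviews, k=5):
    """Repeatedly drop reviews whose user or item has fewer than k reviews.

    `reviews` is an iterable of (user_id, item_id) pairs (hypothetical shape).
    Removing one review can push another user or item below k, so the
    filter is repeated until nothing changes.
    """
    reviews = list(reviews)
    while True:
        users = Counter(user for user, _ in reviews)
        items = Counter(item for _, item in reviews)
        kept = [(u, i) for u, i in reviews if users[u] >= k and items[i] >= k]
        if len(kept) == len(reviews):
            return kept
        reviews = kept
```

The loop matters: a single pass can leave behind users or items that only fell below the threshold because of an earlier removal.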
VADER Sentiment Scoring
VADER (Valence Aware Dictionary and sEntiment Reasoner) is the primary sentiment engine. Tuned for social media and informal text, it outputs a compound score in [-1.0, +1.0].
| Label | Condition | Meaning |
|---|---|---|
| Positive | compound ≥ 0.05 | Clearly positive tone |
| Neutral | -0.05 < compound < 0.05 | No strong signal |
| Negative | compound ≤ -0.05 | Clearly negative tone |
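The thresholds in the table can be sketched as a small labeling function. The function name is an assumption; the compound score itself would typically come from the `vaderSentiment` package, as noted in the comment.

```python
def label_from_compound(compound: float) -> str:
    """Map a VADER compound score in [-1.0, +1.0] to a sentiment label."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"

# With the vaderSentiment package installed, the score is usually obtained as:
#   from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
#   compound = SentimentIntensityAnalyzer().polarity_scores(text)["compound"]
```

Note that both boundaries are inclusive, so a score of exactly 0.05 counts as positive and exactly -0.05 counts as negative.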
TextBlob — Secondary Signal
TextBlob provides two supplementary metrics that complement VADER's output:
Polarity
A float in [-1.0, +1.0]. Used as a cross-check against the VADER compound score to flag inconsistencies.
Subjectivity
A float in [0.0, 1.0]. Reviews with very low subjectivity (e.g. shipping confirmations) are excluded from aspect scoring.
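One way the two TextBlob metrics could be applied is sketched below. Both cut-off values and the function name are hypothetical, since the exact thresholds are not stated here. With TextBlob installed, the inputs would come from `TextBlob(text).sentiment.polarity` and `TextBlob(text).sentiment.subjectivity`.

```python
# Hypothetical cut-offs, chosen only for illustration.
MIN_SUBJECTIVITY = 0.1   # below this, a review is treated as purely factual
MAX_DISAGREEMENT = 0.5   # gap between the two scores that counts as a conflict

def cross_check(compound: float, polarity: float, subjectivity: float) -> dict:
    """Compare VADER's compound score with TextBlob's polarity and
    decide whether the review should feed into aspect scoring."""
    signs_differ = compound * polarity < 0
    gap = abs(compound - polarity)
    return {
        # Flag only when the engines disagree on direction and by a wide margin.
        "inconsistent": signs_differ and gap >= MAX_DISAGREEMENT,
        # Very low subjectivity suggests little opinion to extract.
        "use_for_aspects": subjectivity >= MIN_SUBJECTIVITY,
    }
```

Requiring both a sign flip and a large gap keeps near-zero scores, which flip sign easily, from being flagged as conflicts.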
Aspect Extraction
Aspect-level sentiment is extracted through keyword co-occurrence, using spaCy's dependency parser to find the words that accompany each keyword. Six aspects are tracked.
For each aspect, predefined seed keywords are matched against review tokens. spaCy's dependency graph is then walked to collect nearby adjectives and adverbs. VADER scores the extracted snippet, and results are aggregated into positive/neutral/negative percentages.
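The seed-match-then-walk step can be sketched as follows. To stay runnable without a spaCy model, the sketch uses a stand-in `Token` class; in spaCy the same information is available through `token.pos_`, `token.head`, and `token.children`. The aspect name, seed keywords, hop limit, and function names are all hypothetical, and the resulting snippet would then be passed to VADER.

```python
from dataclasses import dataclass

@dataclass
class Token:
    text: str
    pos: str   # coarse part-of-speech tag, e.g. "NOUN", "ADJ", "ADV"
    head: int  # index of the syntactic head; the root points to itself

# Hypothetical seed keywords for one hypothetical aspect.
SEEDS = {"battery": {"battery", "charge"}}

def neighbours(tokens, i):
    """Indices directly linked to token i in the tree: its children and its head."""
    linked = {j for j, t in enumerate(tokens) if t.head == i and j != i}
    if tokens[i].head != i:
        linked.add(tokens[i].head)
    return linked

def aspect_snippets(tokens, seeds=SEEDS, max_hops=2):
    """For each seed keyword hit, walk up to max_hops edges of the dependency
    tree and keep the adjectives and adverbs found along the way."""
    snippets = {}
    for aspect, keywords in seeds.items():
        for i, tok in enumerate(tokens):
            if tok.text.lower() not in keywords:
                continue
            seen, frontier = {i}, {i}
            for _ in range(max_hops):
                frontier = {j for f in frontier for j in neighbours(tokens, f)} - seen
                seen |= frontier
            words = [tokens[j].text for j in sorted(seen)
                     if j == i or tokens[j].pos in {"ADJ", "ADV"}]
            snippets.setdefault(aspect, []).append(" ".join(words))
    return snippets
```

For the sentence "The battery dies really fast", a two-hop walk from "battery" reaches "fast" through the verb, and a three-hop walk also picks up "really". The hop limit is the knob that controls how "nearby" a modifier must be.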
Aggregation & Output
After per-review scoring, results are aggregated per product into three outputs:
Overall sentiment
Positive/neutral/negative review counts as percentages of total.
Aspect scores
Per-aspect sentiment percentages plus mention count. The three most praised and the three most criticized aspects are ranked.
Monthly sentiment
Reviews bucketed by calendar month into a time-series used in the Trends section.
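The percentage and monthly bucketing steps might look like the sketch below. The record fields `label` and `unix_time`, the function names, and the rounding to one decimal place are assumptions; the real review records may use different field names.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone

LABELS = ("positive", "neutral", "negative")

def percentages(labels):
    """Share of each sentiment label as a percentage of all labeled reviews."""
    counts = Counter(labels)
    total = sum(counts[label] for label in LABELS)
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: round(100 * counts[label] / total, 1) for label in LABELS}

def monthly_sentiment(reviews):
    """Bucket reviews by calendar month (UTC) and compute percentages per bucket.

    Each review is a dict with hypothetical keys "unix_time" and "label".
    """
    buckets = defaultdict(list)
    for review in reviews:
        when = datetime.fromtimestamp(review["unix_time"], tz=timezone.utc)
        buckets[when.strftime("%Y-%m")].append(review["label"])
    return {month: percentages(labels) for month, labels in sorted(buckets.items())}
```

Using `YYYY-MM` strings as bucket keys keeps the time series sortable as plain text, which is convenient once the result is written to JSON.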
All results are serialised into a single data.json file and bundled statically, so no runtime API calls are made.
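A minimal sketch of the serialisation step, assuming the per-product results are held in a plain dictionary. The function name and the layout of the dictionary are hypothetical; only the data.json filename comes from the description above.

```python
import json
from pathlib import Path

def write_data_json(products: dict, path: str = "data.json") -> Path:
    """Write all per-product results to one JSON file for static bundling.

    `products` maps a product id to its results (hypothetical layout).
    """
    target = Path(path)
    # sort_keys gives a stable file, so rebuilds with the same data produce no diff
    target.write_text(json.dumps(products, indent=2, sort_keys=True), encoding="utf-8")
    return target
```

Because the file is produced once at build time, the front end can load it like any other static asset.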