We watch what AI says
We don’t test products. We watch what leading AI shopping assistants recommend when buyers ask them ordinary questions, and we publish the patterns we see.
Methodology
A single answer from a single AI on a single day is a coin flip. Patterns across several frontier assistants, repeated week after week, are what turn that noise into a signal. Where the assistants agree, we say so. Where they don’t, we say that too — the disagreement is often the more interesting story.
A high score on this site means an AI assistant is likely to name this product or brand when a buyer asks about its category. It is not a product review, not an editorial endorsement, and not a verdict from us. Rankings are derived from the AI responses alone, independent of any commercial relationship: affiliate status never affects rank.
Glossary · what each number names
Every number on the site links back here. The entry names what the metric measures — not how we compute it.
The mean reviewer score across the reviewed products in a lineup — each product’s score (1 to 5), distilled from its video reviews, averaged over the products that carry one. Reported as “X.X / 5 across N reviewed”; unreviewed products are excluded, never counted as zero.
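As a rough illustration of the entry above, the averaging can be sketched in Python. The function name `lineup_reviewer_mean` and the use of `None` to mark an unreviewed product are assumptions made for this example, not a description of how the site computes the figure.

```python
from typing import Optional, Sequence


def lineup_reviewer_mean(scores: Sequence[Optional[float]]) -> Optional[str]:
    """Format the mean reviewer score for a lineup.

    Each entry is a product's reviewer score (1 to 5), or None when the
    product has no reviewer score (assumed encoding for this sketch).
    """
    # Unreviewed products are dropped, never treated as zero.
    reviewed = [s for s in scores if s is not None]
    if not reviewed:
        return None  # nothing to report for a lineup with no reviewed product
    mean = sum(reviewed) / len(reviewed)
    return f"{mean:.1f} / 5 across {len(reviewed)} reviewed"
```

For a lineup with scores 4.5, 3.5 and 4.0 plus one unreviewed product, the mean is taken over the three reviewed products only.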
Whether the AIs and the reviewer room rank a lineup the same way. We rank a brand’s reviewed products two ways, by AI rank and by reviewer score, and read the per-product gap. A lineup that is mostly in step reads as “aligned”; a product reviewers place far above its AI rank is a hidden gem, and one they place far below it means the AIs are running generous.
A brand’s average product rating across the products we track in a section — blended from what video reviewers and shoppers give those products. Reported as “X.X ★ · avg of N”, with the creators / people split shown beneath. A brand with no rated product shows nothing — never counted as zero, never invented.
How deep a brand plays in a section: how many of its products rank here and how high its best one reaches — “N products · best #k”. Across a department we also note how many subcategories it competes in. It reads a brand as a portfolio, not a single product.
The summary position a brand or product holds across an entire section — pulled together from how every AI ranked it in every question that lives under that section.
The small line next to a row plots the same metric as the number to its left, week by week. The right edge is the current value.
A chip like #1 on a per-AI row means at least one of that AI’s answers in the current section placed the subject there. It can differ from the aggregate rank above — broad presence and peak placement are different signals.
Whether several AI assistants name the same subject when asked the same buyer question. Full dots mean every AI we watch included it; partial dots mean some did and some didn’t.
The arrow + number next to a rank compares the current weekly snapshot with the one before. Up means the subject moved closer to the top; down means it slid.
How many of the AI assistants put this product or brand at #1 on at least one of the section’s buyer questions — a quick read on how often it tops the panel rather than merely places.
The number of distinct advantage tags (e.g. “battery life”, “value”) the AIs attached to this product or brand across the section’s questions. Wider breadth means it gets recommended for more different reasons.
A coarse confidence ladder that filters out noise — brands and products that have shown up once and never again sit at the bottom; subjects with broad, repeated presence sit at the top. Tier governs what appears in listings and what feeds editorial blocks.
A brand or product that was outside the visible top in the previous snapshot but is inside it in this one. A move from #45 to #28 is a riser, not a newcomer.
A count of questions whose top answer changed between the previous snapshot and this one. Bigger number = noisier week.
How much a category’s top ten reshuffled over the tracked window. Week to week we compare the set of leaders; the share that changed places, averaged across the window, is the turnover. We open each deep-dive on whichever category moved most.
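One way to read the turnover entry above, sketched in Python. The entry does not say exactly how "changed places" is counted, so this example assumes a slot-by-slot comparison: a slot counts as changed when its occupant differs from the previous week. The function name `turnover` and the list-of-lists input are hypothetical.

```python
from typing import Sequence


def turnover(weekly_top: Sequence[Sequence[str]]) -> float:
    """Average share of leaderboard slots whose occupant changed week to week.

    weekly_top holds one ordered top list per week, oldest first.
    Assumption for this sketch: a slot has changed when the name in that
    position differs from the name in the same position a week earlier.
    """
    shares = []
    for prev, curr in zip(weekly_top, weekly_top[1:]):
        slots = max(len(prev), len(curr))
        if slots == 0:
            continue
        # zip stops at the shorter list, so unmatched slots count as changed.
        same = sum(1 for a, b in zip(prev, curr) if a == b)
        shares.append((slots - same) / slots)
    # A window with fewer than two snapshots has no movement to measure.
    return sum(shares) / len(shares) if shares else 0.0
```

A set-based reading (share of this week's leaders that were absent last week) would be equally consistent with the wording; the slot-based version is shown only because it also registers reshuffles among the same names.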
Where Google review ratings (the crowd) and the AI rankings (the models) disagree. A brand the crowd rates highly but the models rank low is a crowd-favourite the models lagged on; a brand the models keep near the top despite cooler reviews is one they run hot on. Ratings and volume come from Google reviews of the brand’s products.
For each press item we mark whether the brand it’s about rose or fell in the rankings over the same window (or whether the story moved the whole category). It’s a correlation we surface so the headlines can be read as a possible cause, not a proven causal claim.
How often the AIs we watch agreed on the same top pick for the same buyer question this week. Low agreement means the answer depends on which AI you happen to ask.
A question this week where the AIs we watch each named a different top pick. We feature one of these as the week’s contested intent.
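The agreement and contested-question entries above can be illustrated together. This is a minimal Python sketch under stated assumptions: the nested mapping (question, then assistant, then top pick) and the names `agreement_rate` and `contested_questions` are invented for the example, and "agreed" is taken to mean that every assistant named the same top pick.

```python
from typing import Dict, List, Mapping


def agreement_rate(top_picks: Mapping[str, Mapping[str, str]]) -> float:
    """Share of questions on which every assistant named the same top pick."""
    if not top_picks:
        return 0.0
    unanimous = sum(
        1 for picks in top_picks.values() if len(set(picks.values())) == 1
    )
    return unanimous / len(top_picks)


def contested_questions(top_picks: Mapping[str, Mapping[str, str]]) -> List[str]:
    """Questions on which each assistant named a different top pick."""
    return [
        question
        for question, picks in top_picks.items()
        # Needs at least two assistants, and no two of them may share a pick.
        if len(picks) > 1 and len(set(picks.values())) == len(picks)
    ]


# Hypothetical week: three assistants, four buyer questions.
week: Dict[str, Dict[str, str]] = {
    "q1": {"ai_a": "x", "ai_b": "x", "ai_c": "x"},
    "q2": {"ai_a": "x", "ai_b": "y", "ai_c": "z"},
    "q3": {"ai_a": "x", "ai_b": "x", "ai_c": "y"},
    "q4": {"ai_a": "k", "ai_b": "k", "ai_c": "k"},
}
```

In this made-up week, two of the four questions are unanimous and only `q2` is contested; `q3` is neither, because two assistants still share a pick.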
How many times a brand was named across all the buyer questions in a given week. Rewards broad presence across categories rather than peak rank in any one of them.
A coarse positive / mixed / critical label derived from how human reviewers talked about the product. Independent of what AI says about it.
How recent news coverage of a brand reads, over the last 30 days. We pull the latest articles from Google News, classify each one as positive, neutral, or critical, and tally them — the bar and the net word are computed straight from those counts, never edited by hand. It’s a measure of press mood, independent of what AI or reviewers say; a brand with no recent coverage simply shows no press block. We re-check the most visible brands on a rolling cadence — the better-known and faster-moving ones more often — and once a read passes two months old we stop showing it rather than present stale coverage as current. On a brand hub (a category or subcategory rail), the same read rolls up across the ranked brands there — the “Brands in the news” strip lists the ones drawing coverage, and the hub’s net mood is tallied the same way, one brand per vote.
What the wider buying public makes of a product, pooled from Google Shopping’s aggregated reviews across retailers. We match the product to its catalogue listing, then read the average rating, the star histogram, and the owner-aspect breakdown. The headline score, the histogram bars, and the “trust” stamp are computed straight from the numbers; the aspect scores and the two quotes are derived from the real review text, never invented. It’s a far wider but blunter jury than the video critics — a product with no confidently-matched listing simply shows no buyer block.
How a product’s official marketing claims hold up against independent reality. We lift the brand’s headline claims off its product page, then check each against what owners and expert reviewers report — pooling our buyer-review aspects, the video reviewers’ verdicts, and their stored verbatim quotes. Every claim gets a plain verdict (Holds up, Mixed, Overstated, or Unverified), a one-sentence read of the evidence, and — where one fits — a single real, sourced quote (always picked from quotes we already store, never written by us). The marketing-honesty score and the single widest promise→reality gap are computed straight from those verdicts: the score is the average of per-claim points — Holds up 100, Mixed 50, Overstated 0 — and a claim the reviews can’t speak to is marked Unverified and left out of that average.
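The scoring rule stated in the entry above (Holds up 100, Mixed 50, Overstated 0, Unverified left out) is simple enough to sketch in Python. The names `CLAIM_POINTS` and `honesty_score`, and the choice to return `None` when no claim could be scored, are assumptions for this example.

```python
from typing import Optional, Sequence

# Per-claim points as stated in the entry; Unverified carries no points.
CLAIM_POINTS = {"Holds up": 100, "Mixed": 50, "Overstated": 0}


def honesty_score(verdicts: Sequence[str]) -> Optional[float]:
    """Average the per-claim points, leaving Unverified claims out."""
    # An unknown verdict string raises KeyError rather than being guessed at.
    points = [CLAIM_POINTS[v] for v in verdicts if v != "Unverified"]
    if not points:
        return None  # assumed behaviour: no score when nothing was verifiable
    return sum(points) / len(points)
```

A product with one claim of each verdict scores 50: the three scored claims sum to 150 and the Unverified one does not enter the denominator.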
The approximate price range a product currently sells for, read from Google Shopping at capture time. We match the product to its catalogue page and prefer Google’s own typical price range for that item; when Google doesn’t publish one, we derive the span from the new-condition retailer listings, trimming obvious outliers (accessories, bundles, knockoff listings) — computed straight from the listed prices, never edited by hand. It is informational, not an offer: prices move daily and vary by region, so treat it as a ballpark, refreshed roughly monthly with an honest “as of” date. A product with no confidently-matched listing simply shows no price.
Where a product’s street price sits within its own field — all the priced products currently ranked in the same subcategory. We take the midpoint of each product’s price range and split the field’s midpoints into thirds; the two boundary numbers are stored on the subcategory and every page inherits them, so a product’s tier never disagrees with its field’s spectrum. Tiers are relative to the field, not absolute — a “premium” kettle costs less than a “budget” laptop — and they describe price position only, never quality. Recomputed monthly with the price refresh; fields with fewer than five priced products show no tier blocks at all. The field reflects the current ranking window, so boundaries shift honestly as products enter and leave the board.
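A minimal Python sketch of the tiering described above, under several assumptions: the two boundaries are taken as tercile cut points from the standard library's `statistics.quantiles`, the middle tier is called "mid-range" (the entry names only "budget" and "premium"), and a midpoint that lands exactly on a boundary falls into the lower tier. The function names are hypothetical.

```python
from statistics import quantiles
from typing import Optional, Sequence, Tuple

PriceRange = Tuple[float, float]
MIN_FIELD_SIZE = 5  # fields with fewer priced products show no tiers


def midpoint(price_range: PriceRange) -> float:
    low, high = price_range
    return (low + high) / 2


def field_boundaries(field: Sequence[PriceRange]) -> Optional[Tuple[float, float]]:
    """Two cut points splitting the field's midpoints into thirds."""
    if len(field) < MIN_FIELD_SIZE:
        return None
    low_cut, high_cut = quantiles([midpoint(r) for r in field], n=3)
    return low_cut, high_cut


def price_tier(
    price_range: PriceRange, boundaries: Optional[Tuple[float, float]]
) -> Optional[str]:
    """Place one product against boundaries computed once for its field."""
    if boundaries is None:
        return None
    low_cut, high_cut = boundaries
    m = midpoint(price_range)
    if m <= low_cut:
        return "budget"
    if m <= high_cut:
        return "mid-range"  # assumed label for the middle third
    return "premium"


# Hypothetical field of six priced products.
field = [(5, 15), (15, 25), (25, 35), (35, 45), (45, 55), (55, 65)]
cuts = field_boundaries(field)
```

Computing the boundaries once per field and passing them to every product mirrors the point made in the entry: the tier is inherited from the field, so a product cannot disagree with its own spectrum.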
Where a brand’s pricing sits among the brands it competes with, derived from its ranked products’ street prices — each product keeps the tier its own subcategory assigned it, and the brand’s median is the median of those products’ price midpoints across the category. A brand earns a label only when at least three of its ranked products carry prices and at least 60% of them fall in one tier; brands spread across tiers show no label at all. On brand hubs each brand is placed by its median within that one subcategory. Like the product tier, the label describes price position only, never quality, and recomputes monthly with the price refresh.
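The labelling rule above (at least three priced products, at least 60% of them in one tier) can be sketched as follows. The function name `brand_price_label` and the use of `None` for an unpriced product are assumptions; the tier strings are whatever the product-level tiering produced.

```python
from collections import Counter
from typing import Optional, Sequence

MIN_PRICED_PRODUCTS = 3
MIN_SHARE_PERCENT = 60


def brand_price_label(product_tiers: Sequence[Optional[str]]) -> Optional[str]:
    """Return the brand's dominant price tier, or None if it earns no label.

    Each entry is the tier of one ranked product, or None when the product
    carries no price (assumed encoding for this sketch).
    """
    priced = [t for t in product_tiers if t is not None]
    if len(priced) < MIN_PRICED_PRODUCTS:
        return None
    tier, count = Counter(priced).most_common(1)[0]
    # Integer comparison avoids floating-point edge cases at exactly 60%.
    if count * 100 >= MIN_SHARE_PERCENT * len(priced):
        return tier
    return None  # spread across tiers: no label at all
```

A brand with three premium products and one budget product is labelled premium (75%); a brand with one product in each of three tiers shows no label.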
How well a brand’s marketing holds up across all of its products we’ve checked. We take the brand’s products that have a Claim Check and average their honesty scores, each product weighted equally, to get one brand-wide number, then rank that number against the other brands competing in the category. A brand needs at least two checked products before it earns a score, and a category needs at least three such brands before it shows a ranking. Like Claim Check itself, it measures only whether promises match independent reality, never overall quality, and recomputes weekly as more products are checked.
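A short Python sketch of the two thresholds in the entry above: a brand needs two checked products to earn a score, and a category needs three scored brands to show a ranking. The names `brand_honesty` and `category_ranking`, and the `None` marker for a product without a Claim Check, are assumptions for this example.

```python
from typing import List, Mapping, Optional, Sequence, Tuple

MIN_CHECKED_PRODUCTS = 2
MIN_SCORED_BRANDS = 3


def brand_honesty(product_scores: Sequence[Optional[float]]) -> Optional[float]:
    """Equal-weight average of a brand's per-product honesty scores."""
    checked = [s for s in product_scores if s is not None]
    if len(checked) < MIN_CHECKED_PRODUCTS:
        return None
    return sum(checked) / len(checked)


def category_ranking(
    brands: Mapping[str, Sequence[Optional[float]]]
) -> List[Tuple[str, float]]:
    """Rank scored brands, best first; empty when too few brands qualify."""
    scored = {}
    for name, product_scores in brands.items():
        score = brand_honesty(product_scores)
        if score is not None:
            scored[name] = score
    if len(scored) < MIN_SCORED_BRANDS:
        return []
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical category: brand "C" has only one checked product.
category = {
    "A": [100, 50],
    "B": [50, 50, 0],
    "C": [100],
    "D": [75, 25, None],
    "E": [100, 100],
}
```

Here brand "C" is left out despite its perfect single score, and the remaining four brands are enough for the category to show a ranking.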
Whether human video reviewers broadly agree with where AI ranked a product. We distil a reviewer score from the product’s review videos, order the ranked products by that score, and compare each one’s reviewer position with its AI rank. Close positions read as agree; a few slots apart, differ; far apart, oppose — flagged as “AI over-rates” when AI ranks a product well above the reviewers, or a “sleeper” when reviewers rate it well above AI. The column only appears once enough products on the page carry reviewer coverage.
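To make the comparison above concrete, here is a hedged Python sketch. The entry gives no numeric thresholds for "close", "a few slots apart" or "far apart", so `differ_gap` and `oppose_gap` are purely hypothetical parameters. The sketch also assumes that both positions are taken within the set of reviewed products.

```python
from typing import Dict, Sequence, Tuple

# (product name, AI rank, reviewer score)
Product = Tuple[str, int, float]


def reviewer_vs_ai(
    products: Sequence[Product], differ_gap: int = 2, oppose_gap: int = 4
) -> Dict[str, str]:
    """Label each reviewed product by how far reviewers and AI disagree."""
    by_ai = sorted(products, key=lambda p: p[1])
    ai_pos = {p[0]: i + 1 for i, p in enumerate(by_ai)}
    by_reviewers = sorted(products, key=lambda p: p[2], reverse=True)
    reviewer_pos = {p[0]: i + 1 for i, p in enumerate(by_reviewers)}

    labels = {}
    for name, ai_position in ai_pos.items():
        # Positive gap: reviewers place the product lower than the AI does.
        gap = reviewer_pos[name] - ai_position
        if abs(gap) < differ_gap:
            labels[name] = "agree"
        elif abs(gap) < oppose_gap:
            labels[name] = "differ"
        elif gap > 0:
            labels[name] = "AI over-rates"
        else:
            labels[name] = "sleeper"
    return labels


# Hypothetical page of five reviewed products.
page = [("A", 1, 3.0), ("B", 2, 4.5), ("C", 3, 4.8), ("D", 4, 3.5), ("E", 5, 5.0)]
```

On this made-up page, "A" tops the AI ranking but sits last with reviewers, so it is flagged as over-rated, while "E" is the reverse case and reads as a sleeper.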
A single read on how strong the case to buy is right now, blending the signals already on the product’s page: where the AI panel ranks it, how the video critics scored it, what the wider buying public makes of it, whether the brand’s marketing claims hold up to reality (the Claim Check honesty score — our “lovemarks” signal), and how many of the AI models agree. The signals are weighed against each other; when one is missing — no critics yet, no matched buyer listing, no claim check — it is left out and the rest reweighted, never counted as zero. The gauge needle and the word (Reconsider, Check closely, Trust) are read straight off that blend, and the caption notes how many of the five signals fed it — a synthesis of what this page already holds, not a product review or a verdict from us.
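The reweighting idea in the entry above (a missing signal is left out and the rest carry its share, never counted as zero) can be sketched in Python. Everything numeric here is an assumption: the entry does not publish the weights, so the sketch uses equal hypothetical weights and assumes each signal has already been put on a common 0 to 100 scale. The signal keys and the function name `blend_signals` are also invented.

```python
from typing import Mapping, Optional, Tuple

# Hypothetical equal weights; the real weighting is not published.
SIGNAL_WEIGHTS = {
    "ai_rank": 1.0,
    "critics": 1.0,
    "buyers": 1.0,
    "claim_check": 1.0,
    "ai_agreement": 1.0,
}


def blend_signals(
    signals: Mapping[str, Optional[float]]
) -> Tuple[Optional[float], int]:
    """Weighted mean of the signals that are present, plus how many fed it."""
    present = {
        name: value
        for name, value in signals.items()
        if name in SIGNAL_WEIGHTS and value is not None
    }
    if not present:
        return None, 0
    # Dividing by the weight of present signals only is the reweighting step.
    total_weight = sum(SIGNAL_WEIGHTS[name] for name in present)
    score = sum(SIGNAL_WEIGHTS[name] * value for name, value in present.items())
    return score / total_weight, len(present)
```

With critics and claim check missing, a product scoring 80, 60 and 100 on the other three signals blends to 80 from three of five signals; treating the missing ones as zero would have dragged it down to 48.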
Each ranking is captured weekly. The time-machine stepper rewinds a leaderboard to a recent past snapshot: the rank and the snapshot-over-snapshot change are recomputed from that week’s data, while the AI score, per-model and reviewer columns are current-only and are dimmed to show it (we never fabricate a historical score). We surface only the last few snapshots; deeper history is retained but not published here. Get in touch if you need it.
A one-glance read of how a slice’s human reviewers line up with the AI ranking. Within each slice we re-rank the reviewed picks by their reviewer score and compare that order with the AI order. Tight agreement reads in step; when reviewers consistently rate the picks higher than the AIs did, reviewers rate higher; when the AIs rank them above where reviewers land them, AIs run high; and lots of disagreement with no clear direction, split. The chip only appears once a slice has at least a few reviewed picks — otherwise it is left off rather than guessed.
We watch a small panel of leading AI shopping assistants. The exact lineup is held constant within a measurement period and rotated only when the broader market changes.
The questions in a page’s FAQ are the real ones buyers ask about that exact topic or product — drawn from the “People Also Ask” box Google shows for the query, when available. They appear on category, subcategory and buyer-question pages as well as on individual brand and product profiles, and each level answers at its own altitude: a category page helps you choose between types, a product page answers about that one product. The answers are written only from material we already hold for the page: what independent video reviewers say and the AI ranking itself (the top picks and the traits the assistants value). We never invent specs, prices, or claims, and a question we cannot answer from that material is left off rather than padded.
Methodology evolves quietly. Principles stay public.