How it works

How we ask. How we count.

The chart — what AI recommends

We don’t test products. We chart what leading AI shopping assistants recommend when buyers ask them ordinary questions — the same buyer question, put to every assistant on the panel, every week.

A single answer from a single AI on a single day is a coin flip. Patterns across several frontier assistants, repeated week after week, are what turn that noise into a signal. A high rank means the assistants are likely to name this product — it is not a product review, not editorial endorsement, not a verdict from us. Rankings derive from the AI responses alone, independent of any commercial relationship — affiliate status never affects rank.

The reality check — what people say back

Then we check the leaders against the people who actually live with them: owner reviews, expert testers, the professional press, the maker’s own marketing claims, and safety data. The check speaks in its own words — Holds up, Divided, Overturned — and stays a separate signal: it never moves a product’s AI rank, and the rank never softens the check. Where the machines and the people disagree, we print the disagreement — that is often the most useful thing on the page.

The evidence — how much stands behind the check

Every check also says how much independent material backs it: reviewer videos, named press outlets, and retailer review clusters, counted as sources and banded Strong / Moderate / Thin. When there isn’t enough, the page says “not enough evidence yet” instead of pretending. Thin evidence is a description, not a penalty, and a new release starts there honestly. No blended score anywhere: three signals, each with its own name, each auditable on the page that shows it.

Glossary · what each number names

A short glossary, not a spec sheet.

Every number on the site links back here. The entry names what the metric measures — not how we compute it.

Reviewer score (aggregate)

The mean reviewer score across the reviewed products in a lineup — each product’s score (1 to 5), distilled from its video reviews, averaged over the products that carry one. Reported as “X.X / 5 across N reviewed”; unreviewed products are excluded, never counted as zero.
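
The exclusion rule above can be illustrated with a minimal sketch. The function name `aggregate_reviewer_score` and the use of `None` to mark an unreviewed product are assumptions made for this example, not a description of the production code.

```python
from typing import Optional, Sequence


def aggregate_reviewer_score(scores: Sequence[Optional[float]]) -> Optional[str]:
    """Mean of the scores that exist; None marks an unreviewed product (assumed encoding)."""
    reviewed = [s for s in scores if s is not None]
    if not reviewed:
        return None  # nothing reviewed: show nothing rather than a zero
    mean = sum(reviewed) / len(reviewed)
    return f"{mean:.1f} / 5 across {len(reviewed)} reviewed"
```

With scores of 4.0 and 5.0 plus one unreviewed product, the sketch reports the mean over two products, not three.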

Reviewer-checked (coverage)

How many of the ranked entries on this page carry an independent human cross-check — a distilled summary of their video reviews and/or aggregated buyer reviews — reported as “N of M reviewer-checked”. It measures coverage, not score: a higher share means more of the lineup has been read against real-world use, not just the AI panel.

Evidence behind the ranking

The human sources we cross-check the AI ranking against, shown in a section’s header: the AI panel (how many models were queried), reviewers (the count of review videos behind the ranked products, split by how positive they were), buyers (pooled shopping-review count and the review-count-weighted average rating), and press (how many outlets recently covered the ranked brands). Each source appears only when the section has real data for it — never a fabricated number — so an AI-only section shows just the panel.

Evidence strength

How much independent material stands behind one product’s reality check, counted as SOURCES: each reviewer video behind the “what reviewers say” summary, each named press outlet in the per-outlet consensus, and each retailer whose buyer reviews Google aggregated for the listing — one voice each. Banded Strong (15 sources or more), Moderate (6–14) and Thin (1–5); zero sources reads “not enough evidence yet”. Thin is a description, not a penalty — new releases start there. The count re-tallies material the product page already shows, so every number can be audited by scrolling the profile; it says how MUCH backs the check, never how positive the check is.
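
The banding above is simple enough to sketch directly. The thresholds come from the entry itself; the function names `count_sources` and `evidence_band` are hypothetical and chosen only for illustration.

```python
def count_sources(reviewer_videos: int, press_outlets: int, retailers: int) -> int:
    """One voice per reviewer video, per named press outlet, per retailer."""
    return reviewer_videos + press_outlets + retailers


def evidence_band(sources: int) -> str:
    """Band a source count using the thresholds stated in the glossary entry."""
    if sources < 0:
        raise ValueError("source count cannot be negative")
    if sources == 0:
        return "not enough evidence yet"
    if sources <= 5:
        return "Thin"
    if sources <= 14:
        return "Moderate"
    return "Strong"
```

Note that the band depends only on the count, never on how positive the sources are.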

AI vs reviewers alignment

Whether the AIs and the reviewer room rank a lineup the same way. We rank a brand’s reviewed products two ways, by AI rank and by reviewer score, and read the per-product gap. A lineup that is mostly in step reads “aligned”; a product reviewers place far above its AI rank is a hidden gem, and one they place far below means the AIs are running generous.

Portfolio rating

A brand’s average product rating across the products we track in a section — blended from what video reviewers and shoppers give those products. Reported as “X.X ★ · avg of N”, with the creators / people split shown beneath. A brand with no rated product shows nothing — never counted as zero, never invented.

Lineup

How deep a brand plays in a section: how many of its products rank here and how high its best one reaches — “N products · best #k”. Across a department we also note how many subcategories it competes in. It reads a brand as a portfolio, not a single product.

Aggregate rank

The summary position a brand or product holds across an entire section — pulled together from how every AI ranked it in every question that lives under that section.

A score is measured against the answers that were actually available, not against a fixed number of AI systems. One of our panelists reads Google’s shopping results rather than answering as a chatbot, so on questions where those results don’t exist — digital services like VPNs or web hosting, and health topics where Google withholds them — it has nothing to say. For those questions we divide by the panel that could answer, so a brand isn’t marked down for a verdict nobody was ever going to give. Every other question is scored exactly as before.

This applies only where a panelist can never weigh in on a question. One that simply missed a given week still counts, and its most recent answer keeps its place — an outage doesn’t quietly lift everyone’s score.
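
The denominator rule can be sketched as follows. The scoring formula itself is not published here, so "share of the eligible panel that named the subject" is a stand-in chosen for illustration; `question_score`, `cannot_answer`, and the panelist names are all hypothetical.

```python
from typing import Set


def question_score(named_by: Set[str], panel: Set[str], cannot_answer: Set[str]) -> float:
    """Share of the eligible panel that named the subject on one question.

    `cannot_answer` holds panelists that can never weigh in on this question
    (assumed encoding). A panelist that merely missed a week stays in the
    denominator, with its most recent answer carried in `named_by`.
    """
    eligible = panel - cannot_answer
    if not eligible:
        raise ValueError("no panelist is able to answer this question")
    return len(named_by & eligible) / len(eligible)
```

With a four-member panel where one member can never answer, a subject named by the other three scores 3 of 3 instead of 3 of 4.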

Trend line

The small line next to a row plots the same metric as the number to its left, week by week. The right edge is the current value.

Per-AI chip

A chip like #1 on a per-AI row means at least one of that AI’s answers in the current section placed the subject there. It can differ from the aggregate rank above — broad presence and peak placement are different signals.

AI consensus

Whether several AI assistants name the same subject when asked the same buyer question. Full dots mean every AI we watch included it; partial dots mean some did and some didn’t.

Brand-only questions

Some buyer questions get brands but never a settled answer at the model level — asked for the best socks or the best soy candle, the assistants name the same houses but each invents a different product. Where that holds across a whole area, we publish the brand race and no product ranking, and the product view for that area points back at the brand one. It is a statement about how much agreement exists, not a judgement on the brands: the brand signal there is often among the strongest we measure.

When we rank our own judges

A few areas we chart contain the companies that build the assistants we ask. Where one of them lands near the top of a board, the page says so and names which of our sources that company builds. We do not discount it for that: the whole point of the ranking is to report what the assistants actually said, so correcting the number quietly would be the dishonest move. The disclosure is derived from the board each week, not written by hand, so it appears exactly where and when the conflict is real — and disappears when the company slips down.

Week-over-week movement

The arrow + number next to a rank compares the current weekly snapshot with the one before. Up means the subject moved closer to the top; down means it slid.

Intents won

How many of the AI assistants put this product or brand at #1 on at least one of the section’s buyer questions — a quick read on how often it tops the panel rather than merely places.

Tag breadth

The number of distinct advantage tags (e.g. “battery life”, “value”) the AIs attached to this product or brand across the section’s questions. Wider breadth means it gets recommended for more different reasons.

What the panel weighs

For each subcategory we tally the advantage tags the AI assistants attach most often to that field’s top picks — the recurring traits (≥ two of the leaders carry each). The short line beside a field reads those tags as a buying lens: what the consensus optimises for there. Because each field is tallied on its own ranking, the priorities differ field to field rather than averaging into one category-wide statement.

Why a brand leads

A short, grounded read on why the AI consensus ranks one brand #1 in a field. It draws only on signals already on the page — the advantages the panel rewards across that field’s top brands, what reviewers say about them, and the brand it edges out — and explains the company-level reason (lineup range, reputation, a halo product, value, reviewer backing), not a single product’s spec sheet.

Top product vs top brand

Every buyer question is ranked twice, on two separate boards: one of individual products and one of the companies that make them. They are scored independently, so the top product is often not made by the top brand — and that is a real answer, not a mismatch. A field can have one outstanding model from a smaller maker while a different company is the one the panel trusts across its whole lineup.

So when a question shows both, read them as two answers to two questions: “which single thing should I buy?” and “whose products hold up across the board?” Neither is a summary of the other.

Tier

A coarse confidence ladder that filters out noise — brands and products that have shown up once and never again sit at the bottom; subjects with broad, repeated presence sit at the top. Tier governs what appears in listings and what feeds editorial blocks.

Newcomer

A brand or product that was outside the visible top in the previous snapshot but inside the visible top this snapshot. A move up from #45 to #28 is a riser, not a newcomer.
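
A minimal sketch of the distinction, assuming the "visible top" is a fixed depth passed in as `visible_depth` (a hypothetical parameter; the actual depth is not stated here) and that `None` means the subject was not ranked at all in the previous snapshot.

```python
from typing import Optional


def classify_entry(prev_rank: Optional[int], curr_rank: int, visible_depth: int) -> str:
    """Label a subject's movement between two snapshots."""
    if curr_rank > visible_depth:
        return "outside"  # not in the visible top this snapshot
    if prev_rank is None or prev_rank > visible_depth:
        return "newcomer"  # entered the visible top this snapshot
    if curr_rank < prev_rank:
        return "riser"  # was already visible and moved up
    return "unchanged or down"
```

Under an assumed depth of 100, a move from #45 to #28 is classified as a riser, while a move from #120 to #28 is a newcomer.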

Leaders changed

A count of questions whose top answer changed between the previous snapshot and this one. Bigger number = noisier week.

Category turnover

How much a category’s top ten reshuffled over the tracked window. Week to week we compare the set of leaders; the share that changed places, averaged across the window, is the turnover. We open each deep-dive on whichever category moved most.
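
One possible reading of this measure, sketched under stated assumptions: for each pair of consecutive weeks, take the share of the current leaders whose position differs from the week before (including new entrants), then average those shares across the window. The function names and the position-based interpretation of "changed places" are assumptions, not the published method.

```python
from typing import Sequence


def weekly_turnover(prev: Sequence[str], curr: Sequence[str]) -> float:
    """Share of this week's leaders that sit in a different slot than last week."""
    if not curr:
        raise ValueError("current leaderboard is empty")
    prev_pos = {name: i for i, name in enumerate(prev)}
    moved = sum(1 for i, name in enumerate(curr) if prev_pos.get(name) != i)
    return moved / len(curr)


def category_turnover(weeks: Sequence[Sequence[str]]) -> float:
    """Average weekly turnover across a window of ordered leaderboards."""
    pairs = list(zip(weeks, weeks[1:]))
    if not pairs:
        raise ValueError("need at least two weeks to measure turnover")
    return sum(weekly_turnover(p, c) for p, c in pairs) / len(pairs)
```

A membership-only reading (counting just the brands that entered or left the top ten) would be an equally plausible variant.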

Segment share

Each band’s share of a category’s top ten, week by week. We sort brands into three bands from signals we already track — a brand that debuted inside the window is “fresh / upstart”, a long-standing established leader is an “incumbent”, the rest are “premium”. It’s a read on the structural drift under the #1, not a market-research taxonomy, and it grows sharper as more weeks accumulate.

Crowd vs models

Where Google review ratings (the crowd) and the AI rankings (the models) disagree. A brand the crowd rates highly but the models rank low is a crowd-favourite the models lagged on; a brand the models keep near the top despite cooler reviews is one they run hot on. Ratings and volume come from Google reviews of the brand’s products.

Creator coverage share

Each brand’s share of the YouTube creator reviews we track in a category, split into an earlier window and a recent one. A rising share means cameras are turning toward that brand — attention often leads the rankings.

News impact

For each press item we mark whether the brand it’s about rose or fell in the rankings over the same window (or whether the story moved the whole category). It’s a correlation we surface so the headlines can be read as a possible cause, not a proven causal claim.

AI agreement

How often the AIs we watch agreed on the same top pick for the same buyer question this week. Low agreement means the answer depends on which AI you happen to ask.

AI split

A question this week where the AIs we watch each named a different top pick. We feature one of these as the week’s contested intent.

Mentions

How many times a brand was named across all the buyer questions in a given week. Rewards broad presence across categories rather than peak rank in any one of them.

Reviewer sentiment

A coarse positive / mixed / critical label derived from how human reviewers talked about the product. Independent of what AI says about it.

Press sentiment

How recent news coverage of a brand reads, over the last 30 days. We pull the latest articles from Google News, classify each one as positive, neutral, or critical, and tally them — the bar and the net word are computed straight from those counts, never edited by hand. It’s a measure of press mood, independent of what AI or reviewers say; a brand with no recent coverage simply shows no press block. We re-check the most visible brands on a rolling cadence — the better-known and faster-moving ones more often — and once a read passes two months old we stop showing it rather than present stale coverage as current. On a brand hub (a category or subcategory rail), the same read rolls up across the ranked brands there — the “Brands in the news” strip lists the ones drawing coverage, and the hub’s net mood is tallied the same way, one brand per vote.

Buyer reviews (Google)

What the wider buying public makes of a product, pooled from Google Shopping’s aggregated reviews across retailers. We match the product to its catalogue listing, then read the average rating, the star histogram, and the owner-aspect breakdown. The headline score, the histogram bars, and the “trust” stamp are computed straight from the numbers; the aspect scores and the two quotes are derived from the real review text, never invented. It’s a far wider but blunter jury than the video critics — a product with no confidently-matched listing simply shows no buyer block.

Consensus spec (attribute matrix)

For each trait that decides a subcategory, the one product that leads across independent source types (buyer reviews, reviewer videos, the AI panel’s own tags). A “leads” call, like the verdict for every product shown beside it, always carries a two-source quorum. Single-source or no-read verdicts are held back rather than shown as noise. Verdicts are qualitative by design; no composite scores are invented.

Voice of the street

What real people say in fifteen seconds — a candid read distilled from short-form video clips (TikTok and YouTube Shorts) about a product, leaning toward the freshest coverage; each read states the window it actually spans. Where a product is thinly covered on those platforms, we top the read up with Instagram reels, reading the reel's own caption — where creators reliably write the review that the audio, set to music, often doesn't carry. Clips with a disclosed sponsorship ("#ad", "paid partnership", a brand's own account) are excluded from the read entirely — nothing on the page is shown or computed from them. It sits next to the considered video reviewers as a different voice: raw, high-volume, unpaid, where the recurring gripe or delight is the signal, not any one clip. We group the repeated themes, mark each as praise, a gripe, or genuinely split, and compute the net lean straight from those counts; a read built on only a few clips says so ("early read"). Themes that concern a concrete feature also appear as the Street column in the by-feature table. From the same clips we derive — arithmetically, from the real view counts — the reach test (a theme's share of all views, not just its clip count), the rival products the clips name, and the spoken usage contexts. The one quote we show is verbatim from a real clip, and the clips the read was built from are shown beneath it. When the crowd contradicts the pros or the star rating, we say so. A product with no usable short-form material simply shows no block.

Press consensus (per product)

How the established review press lands on a single product — distinct from the per-aspect critics digest above it, and from brand-level press mood. We regroup the named publications that reviewed the product (the outlets Google surfaces in its review insights, kept to real publications with sources under 18 months old) into one row per outlet, and read each outlet’s lean — recommends, mixed, or critical — from the balance of what it praised versus faulted. The “N of M recommend” tally, the agree-or-split badge, and the flagged dissenter are all computed from those counts, never written by a model; each quote is verbatim from the source review. A consensus shows only when at least three independent outlets cover the product — one opinion is not a consensus — so a thinly covered product simply shows no press block. When the AI models themselves split on a product, the block leads with that, as a tie-breaker.
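
The tally and the three-outlet minimum can be sketched as below. The input shape (outlet name mapped to a lean string) and the rule that the badge reads "agree" only when every outlet shares one lean are assumptions for illustration; the flagged dissenter is left out because its rule is not spelled out above.

```python
from typing import Dict, Optional


def press_consensus(leans: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Tally per-outlet leans; each lean is 'recommends', 'mixed' or 'critical'."""
    total = len(leans)
    if total < 3:
        return None  # fewer than three outlets: no press block
    recommend = sum(1 for lean in leans.values() if lean == "recommends")
    badge = "agree" if len(set(leans.values())) == 1 else "split"
    return {"tally": f"{recommend} of {total} recommend", "badge": badge}
```

Everything returned is derived from the counts, which mirrors the point that the tally is computed rather than written.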

Owner signals (forum threads)

What long-term owners keep flagging about a product, grouped from the public forum threads where people post after the honeymoon wears off — distinct from launch-day video reviews. For each recurring complaint we read four things straight from the threads: how long owners had used it, which feature they fault, how often the complaint recurs, and how serious it is. A theme that the aggregate star rating hides is flagged as such. The themes and counts come from real thread text, never invented; where we have only thread titles (not full bodies) the ownership window often can’t be established and reads as “—” rather than a guess. A product with no recurring owner signal simply shows no owner block.

Safety recall (the override)

The one signal that overrides any ranking. We check each product against the official recall registers — the U.S. Consumer Product Safety Commission, the FDA’s enforcement reports (food, supplements, drugs, devices), and the National Highway Traffic Safety Administration — matching strictly by brand and model, so we attach a recall only when we are confident it is the same product (better to skip than to wrongly accuse). When a product has an active recall, its buyer verdict is forced to “Overturned” no matter how the AIs rank it or how buyers rate it, and an alert names the hazard, the recall ID, and the date, linking out to the regulator. No confident match means nothing is shown and the verdict is left untouched. We never recommend something that is being pulled from the market right now.

Claim Check (Promise vs. proof)

How a product’s official marketing claims hold up against independent reality. We lift the brand’s headline claims off its official product page — or, when that page is unreachable or generic, off the maker’s own description in Google’s product catalogue — then check each against what owners and expert reviewers report — pooling our buyer-review aspects, the video reviewers’ verdicts, and their stored verbatim quotes. Every claim gets a plain verdict (Holds up, Mixed, Overstated, or Unverified), a one-sentence read of the evidence, and — where one fits — a single real, sourced quote (always picked from quotes we already store, never written by us). The marketing-honesty score and the single widest promise→reality gap are computed straight from those verdicts: the score is the average of per-claim points — Holds up 100, Mixed 50, Overstated 0 — and a claim the reviews can’t speak to is marked Unverified and left out of that average.
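
The averaging described above can be written out as a short sketch. The point values come from the entry; the function name `honesty_score` and the choice to return `None` when every claim is Unverified are assumptions.

```python
from typing import Optional, Sequence

# Per-claim points as stated in the entry; Unverified carries no points.
POINTS = {"Holds up": 100, "Mixed": 50, "Overstated": 0}


def honesty_score(verdicts: Sequence[str]) -> Optional[float]:
    """Average the per-claim points, leaving Unverified claims out of the average."""
    scored = [POINTS[v] for v in verdicts if v != "Unverified"]
    if not scored:
        return None  # assumed behaviour when nothing could be verified
    return sum(scored) / len(scored)
```

For one claim that holds up, one mixed, one overstated and one unverified, the average is taken over three claims, giving 50.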

Street price

The approximate price range a product currently sells for, read from Google Shopping at capture time. We match the product to its catalogue page and prefer Google’s own typical price range for that item; when Google doesn’t publish one, we derive the span from the new-condition retailer listings, trimming obvious outliers (accessories, bundles, knockoff listings) — computed straight from the listed prices, never edited by hand. It is informational, not an offer: prices move daily and vary by region, so treat it as a ballpark, refreshed roughly monthly with an honest “as of” date. A product with no confidently-matched listing simply shows no price.

Price tier (budget / mid-range / premium)

Where a product’s street price sits within its own field — all the priced products currently ranked in the same subcategory. We take the midpoint of each product’s price range and split the field’s midpoints into thirds; the two boundary numbers are stored on the subcategory and every page inherits them, so a product’s tier never disagrees with its field’s spectrum. Tiers are relative to the field, not absolute — a “premium” kettle costs less than a “budget” laptop — and they describe price position only, never quality. Recomputed monthly with the price refresh; fields with fewer than five priced products show no tier blocks at all. The field reflects the current ranking window, so boundaries shift honestly as products enter and leave the board.
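
A rough sketch of the tiering, assuming "thirds" means tercile cut points over the field's midpoints. It uses `statistics.quantiles` from the Python standard library with its default method; the real boundary calculation may differ, and the function names are hypothetical.

```python
import statistics
from typing import Optional, Sequence, Tuple

MIN_PRICED = 5  # fields with fewer priced products show no tiers


def tier_boundaries(ranges: Sequence[Tuple[float, float]]) -> Optional[Tuple[float, float]]:
    """Two cut points that split the field's price midpoints into thirds."""
    if len(ranges) < MIN_PRICED:
        return None
    midpoints = [(low + high) / 2 for low, high in ranges]
    low_cut, high_cut = statistics.quantiles(midpoints, n=3)
    return low_cut, high_cut


def price_tier(price_range: Tuple[float, float], boundaries: Tuple[float, float]) -> str:
    """Place one product against its field's stored boundaries."""
    midpoint = (price_range[0] + price_range[1]) / 2
    low_cut, high_cut = boundaries
    if midpoint <= low_cut:
        return "budget"
    if midpoint <= high_cut:
        return "mid-range"
    return "premium"
```

Because the boundaries are computed once per field and then reused, every product in that field is judged against the same two numbers.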

Brand price tier (Value / Mid / Premium)

Where a brand’s pricing sits among the brands it competes with, derived from its ranked products’ street prices — each product keeps the tier its own subcategory assigned it, and the brand’s median is the median of those products’ price midpoints across the category. A brand earns a label only when at least three of its ranked products carry prices and at least 60% of them fall in one tier; brands spread across tiers show no label at all. On brand hubs each brand is placed by its median within that one subcategory. Like the product tier, the label describes price position only, never quality, and recomputes monthly with the price refresh.
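
The two thresholds in this entry (at least three priced products, at least 60% in one tier) can be sketched as below. The mapping from product tier names to brand labels is an assumption, as is the function name.

```python
from collections import Counter
from typing import Optional, Sequence

# Assumed mapping from product tier names to brand labels.
BRAND_LABEL = {"budget": "Value", "mid-range": "Mid", "premium": "Premium"}


def brand_price_label(product_tiers: Sequence[str]) -> Optional[str]:
    """Label a brand only when enough priced products concentrate in one tier."""
    if len(product_tiers) < 3:
        return None  # too few priced products
    tier, count = Counter(product_tiers).most_common(1)[0]
    # Integer form of count / total >= 0.6, avoiding float rounding at the edge.
    if count * 5 >= len(product_tiers) * 3:
        return BRAND_LABEL[tier]
    return None  # spread across tiers: no label
```

A brand with two premium products and one mid-range product clears the 60% bar; one with a product in each tier does not.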

Brand honesty (Marketing vs. reality)

How well a brand’s marketing holds up across everything we’ve checked from it. We take the brand’s products that have a Claim Check and average their honesty scores — each product weighted equally — to get one brand-wide number, then rank that number against the other brands competing in the category. A brand needs at least two checked products before it earns a score, and a category needs at least three such brands before it shows a ranking. Like Claim Check itself, it measures only whether promises match independent reality, never overall quality, and recomputes weekly as more products are checked.
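
The two minimums described above (two checked products per brand, three scored brands per category) can be sketched like this. The names and the input shape, a mapping of brand name to its products' honesty scores, are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def brand_honesty(product_scores: Sequence[float]) -> Optional[float]:
    """Equal-weight average of a brand's per-product honesty scores."""
    if len(product_scores) < 2:
        return None  # fewer than two checked products: no brand score
    return sum(product_scores) / len(product_scores)


def category_honesty_ranking(
    brands: Dict[str, Sequence[float]],
) -> Optional[List[Tuple[str, float]]]:
    """Rank scored brands, best first; None when fewer than three brands qualify."""
    scored = []
    for name, scores in brands.items():
        value = brand_honesty(scores)
        if value is not None:
            scored.append((name, value))
    if len(scored) < 3:
        return None
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A brand with a single checked product is simply absent from the ranking rather than scored on thin evidence.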

Brand trust (Lovemark scale)

A single 0–100 reading of how much a brand is trusted, used on the brand-vs-brand comparison. It blends the two reputation signals we hold per brand: its marketing-honesty score and its press sentiment (a balance of positive over critical coverage, centred at 50). The two are averaged; when one is missing the reading uses whichever is present rather than penalising the gap, and a brand with neither shows no reading. The result places each brand on a skeptical→loved scale, with the raw inputs shown beside it. It measures reputation, not product quality, and moves as honesty and press refresh.
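
The fallback behaviour can be shown in a few lines. This sketch assumes both inputs already arrive on a 0 to 100 scale and uses `None` for a missing input; how the press balance is converted to that scale is not covered here, and the function name is hypothetical.

```python
from typing import Optional


def brand_trust(honesty: Optional[float], press: Optional[float]) -> Optional[float]:
    """Average the inputs that are present; no reading when both are missing."""
    present = [value for value in (honesty, press) if value is not None]
    if not present:
        return None
    return sum(present) / len(present)
```

With an honesty score of 80 and no press read, the reading is 80 rather than being pulled toward zero by the gap.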

Archetype fit

How well a product suits each of seven fixed buyer archetypes: the Value-Maximizer (price first), Quality Perfectionist (best-in-class), Premium Connoisseur (brand & status), Early Adopter (newest first), Reliability-Seeker (proven & safe), Simplifier (just works) and Enthusiast (joy & aesthetics). For each one, a single pass reads the signals we already hold for the product (buyer reviews, reviewer videos, street price, whether a newer model exists, and brand standing) and grounds a fit on them, shown as Good fit, Could fit or Not for you with the recognisable situation and one plain reason (a poor fit names a better pick). It judges suitability for a kind of buyer, never overall quality; an archetype the signals can’t honestly place is left off rather than guessed, so thin products show fewer cards. Higher up the catalog the same fit is rolled up: a subcategory shows the single product that tops the most of its buyer-questions for each type, and a category routes each type to the subcategory it’s most at home in. The proof is the share of questions won, never a fresh number. On the brand pages it rolls up one more level, to the brand: a coverage map and positioning map place each brand by the buyer types its products serve best, and a podium names the leading brands for each type. Every read is still the same per-product fit, aggregated, never a new score.

Reviewer verdict

Whether human video reviewers broadly agree with where AI ranked a product. We distil a reviewer score from the product’s review videos, order the ranked products by that score, and compare each one’s reviewer position with its AI rank. Close positions read as agree; a few slots apart, differ; far apart, oppose — flagged as “AI over-rates” when AI ranks a product well above the reviewers, or a “sleeper” when reviewers rate it well above AI. The column only appears once enough products on the page carry reviewer coverage.
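
The comparison can be sketched as follows. The entry does not say how many slots count as "close" or "far apart", so the `close` and `far` defaults below are placeholders, and the function names are hypothetical.

```python
from typing import Dict


def reviewer_positions(scores: Dict[str, float]) -> Dict[str, int]:
    """Order products by reviewer score, best first; positions start at 1."""
    ordered = sorted(scores, key=lambda name: scores[name], reverse=True)
    return {name: i for i, name in enumerate(ordered, start=1)}


def reviewer_verdict(ai_position: int, reviewer_position: int,
                     close: int = 1, far: int = 4) -> str:
    """Compare positions; `close` and `far` are placeholder thresholds."""
    gap = reviewer_position - ai_position  # positive: AI places it higher than reviewers
    if abs(gap) <= close:
        return "agree"
    if abs(gap) < far:
        return "differ"
    return "AI over-rates" if gap > 0 else "sleeper"
```

A product the AI ranks first but reviewers place sixth would read "AI over-rates" under these placeholder thresholds; the reverse case reads "sleeper".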

Buy-confidence

A single read on how strong the case to buy is right now, blending the signals already on the product’s page: where the AI panel ranks it, how the video critics scored it, what the wider buying public makes of it, whether the professional press recommends it, whether the brand’s marketing claims hold up to reality (the Claim Check honesty score — our “lovemarks” signal), and how many of the AI models agree. The signals are weighed against each other; when one is missing — no critics yet, no matched buyer listing, no claim check — it is left out and the rest reweighted, never counted as zero. The gauge needle and the word (Overturned, Divided, Holds up) are read straight off that blend, and the caption notes how many of the six signals fed it — a synthesis of what this page already holds, not a product review or a verdict from us. A verdict needs corroboration: when only one voice has weighed in the gauge reads “Not enough evidence yet” rather than a buy-or-skip word (in either direction), and when nothing has weighed in yet it says so plainly instead of showing a needle. The one exception: an active safety recall pins the needle to Overturned regardless of the blend — we never point you at a product being pulled from the market.
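
The reweighting and the corroboration rule can be sketched under loud assumptions: each signal is taken to be already normalised to a 0 to 1 scale, the weights are supplied by the caller because the real ones are not published here, and the mapping from the blend to a word is left out because its thresholds are not stated. All names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def buy_confidence(
    signals: Dict[str, Optional[float]],
    weights: Dict[str, float],
    active_recall: bool = False,
) -> Tuple[str, Optional[float]]:
    """Blend the present signals; missing ones (None) are dropped, not zeroed."""
    if active_recall:
        return "Overturned", None  # a recall overrides the blend
    present = {name: value for name, value in signals.items() if value is not None}
    if not present:
        return "nothing has weighed in yet", None
    if len(present) == 1:
        return "Not enough evidence yet", None  # one voice is not corroboration
    # Reweight over the signals that exist (weights assumed positive).
    total_weight = sum(weights[name] for name in present)
    blend = sum(weights[name] * value for name, value in present.items()) / total_weight
    return f"blend of {len(present)} of {len(signals)} signals", blend
```

Dividing by the weight of the present signals is what keeps a missing critic score or buyer listing from dragging the blend toward zero.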

Time machine (weekly snapshots)

Each ranking is captured weekly, and the top of every week’s board is stored as it stood. On the Top 100 leaderboards and on a buyer question’s full ranking, stepping back loads that stored board rather than re-sorting the entries ranked today, so a brand or product that has since fallen off reappears in the weeks it actually ranked. The change beside each row compares its position with the previous stored week, and a note under the stepper names which of the two reconstructions is on screen. On the remaining pages, and on any week we hold no stored board for, the table is re-ranked from the entries ranked today, which cannot show anything that has since dropped out.

A stored week covers the top of the board, not every entry the models named, so a row marked new that week was outside the stored depth the week before rather than absent from the catalog. How many AI models agreed is recorded for the live board only, so a rewound board leaves that column blank for every row instead of borrowing today’s number. Reviewer scores, evidence counts and photos are current values shown beside a past rank, and dim where the layout allows it; the price band is dropped entirely on a rewound board rather than shown undimmed. We never fabricate a historical score. We surface only the last few weeks; deeper history is retained but not published here — get in touch if you need it.

Reviewer lean (per slice)

A one-glance read of how a slice’s human reviewers line up with the AI ranking. Within each slice we re-rank the reviewed picks by their reviewer score and compare that order with the AI order. Tight agreement reads in step; when reviewers consistently rate the picks higher than the AIs did, reviewers rate higher; when the AIs rank them above where reviewers land them, AIs run high; and lots of disagreement with no clear direction, split. The chip only appears once a slice has at least a few reviewed picks — otherwise it is left off rather than guessed.

The AI panel

We watch a small panel of leading AI shopping assistants. The exact lineup is held constant within a measurement period and rotated only when the broader market changes.

FAQ questions

The questions in a page’s FAQ are the real ones buyers ask about that exact topic or product — drawn from the “People Also Ask” box Google shows for the query, when available. They appear on category, subcategory and buyer-question pages as well as on individual brand and product profiles, and each level answers at its own altitude: a category page helps you choose between types, a product page answers about that one product. The answers are written only from material we already hold for the page: what independent video reviewers say and the AI ranking itself (the top picks and the traits the assistants value). We never invent specs, prices, or claims, and a question we cannot answer from that material is left off rather than padded.

Methodology evolves quietly. Principles stay public.
