Research

Why 70% of B2B sites fail the agent-readiness test — and what the top 5% do differently

Across 14,208 scans this quarter, the median B2B supplier scores 38/100. We broke down what separates the top five percent from everyone else — and it isn't what most people expect.

Mira Sato · Apr 18, 2026 · 12 min read

When we started Helo, we assumed the failure mode was structured data: missing schema.org, no JSON-LD, no llms.txt. Twelve thousand scans in, the data says something different. The cliff isn't at the data layer; it's at the protocol layer.

The shape of the distribution

Across 14,208 domains scanned between January and April 2026, scores cluster tightly around the median with long tails on both ends. The top 5% — 710 sites — all score above 78. The bottom 10% — 1,421 sites — all score below 15.

The middle of the distribution is where most of the story lives. These sites have some structured data, enough to prove they've heard of schema.org, but fail the harder tests.

Four findings

1. The top 5% all support at least one commerce protocol

Every domain scoring above 78 implements either AP2 endpoints, MCP-over-HTTPS, or an OpenAPI 3.1 description with a live /.well-known entry. This isn't a coincidental correlation: protocol support is the single highest-weighted criterion in our scorecard.
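To make one of those three signals concrete, here is a minimal sketch of an OpenAPI 3.1 check. It is not how our scorer is implemented. It assumes the description is served as JSON (YAML is ignored) and relies only on the top-level "openapi" version field that every OpenAPI document carries. The function name is made up for illustration.

```python
import json


def is_openapi_31(document_text: str) -> bool:
    """Return True if the text is a JSON object declaring OpenAPI 3.1.x.

    Assumption: the description is JSON. A YAML description would need
    a YAML parser, which this sketch deliberately leaves out.
    """
    try:
        doc = json.loads(document_text)
    except json.JSONDecodeError:
        return False
    if not isinstance(doc, dict):
        return False
    version = doc.get("openapi")
    if not isinstance(version, str):
        return False
    # Accept "3.1" as well as patch releases such as "3.1.0".
    return version == "3.1" or version.startswith("3.1.")


sample = '{"openapi": "3.1.0", "info": {"title": "Catalog", "version": "1"}, "paths": {}}'
print(is_openapi_31(sample))
```

A real check would also confirm that the document is reachable from the site's /.well-known entry, which depends on the protocol in question and is left out here.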

“Protocol support isn't a feature an agent asks for politely. It's the thing the crawler tries first, and when it's absent, your site is effectively read-only.”
— internal note, Jan 2026

2. Middle-of-pack sites have schema.org but no product feed

Sites scoring 30–60 almost always implement Product schema on marketing pages — but have no machine-readable catalog, no sitemap beyond HTML pages, and no programmatic way to get a current price. For an agent, this is a brochure, not a store.

No products.json or feed endpoint — 82% of the middle band
Prices only available after a gated quote form — 61%
Sitemap limited to marketing pages, excludes SKUs — 74%
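For a sense of how small the missing piece is, the sketch below builds a machine-readable catalog from a list of products. The field names (sku, name, price, currency, url) and the top-level layout are assumptions chosen for illustration, not a format our scanner requires or one that any particular agent expects.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Product:
    # Hypothetical fields; adapt them to whatever your catalog holds.
    sku: str
    name: str
    price: str  # decimal string, to avoid float rounding
    currency: str
    url: str


def build_catalog(products: list[Product], generated_at: str) -> str:
    """Serialize products into a single JSON document."""
    payload = {
        "generated_at": generated_at,  # lets a reader judge freshness
        "count": len(products),
        "products": [asdict(p) for p in products],
    }
    return json.dumps(payload, indent=2, sort_keys=True)


items = [
    Product("SKU-1001", "Hex bolt M8", "0.42", "USD", "https://example.com/p/sku-1001"),
    Product("SKU-1002", "Hex nut M8", "0.11", "USD", "https://example.com/p/sku-1002"),
]
print(build_catalog(items, "2026-04-18T00:00:00Z"))
```

Publishing the output at a stable URL and listing SKU pages in the sitemap would address the first and third items above. The gated-quote problem is a pricing-policy decision, not a code change.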


3. Freshness is the quiet killer

The correlation between Last-Modified response headers older than 180 days and a score drop of 15 or more points is one of the cleanest we've seen. Agents penalize stale content heavily, more than we expected when we first calibrated the scorer.
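The 180-day threshold is easy to check against your own pages. The sketch below parses a Last-Modified header value with Python's standard library and compares it to a reference time. The function name is hypothetical, and treating a timezone-less value as UTC is an assumption of this sketch; it is not a description of how our scorer weighs freshness.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

STALE_AFTER = timedelta(days=180)  # threshold taken from the finding above


def is_stale(last_modified: str, now: datetime) -> bool:
    """Return True if the header value is more than 180 days before `now`."""
    modified = parsedate_to_datetime(last_modified)
    if modified.tzinfo is None:
        # Assumption: treat a value without timezone info as UTC.
        modified = modified.replace(tzinfo=timezone.utc)
    return now - modified > STALE_AFTER


reference = datetime(2026, 4, 18, tzinfo=timezone.utc)
print(is_stale("Wed, 01 Oct 2025 12:00:00 GMT", reference))
print(is_stale("Sun, 01 Mar 2026 00:00:00 GMT", reference))
```

Passing the reference time in explicitly keeps the check deterministic, which makes it easier to test than a version that reads the clock itself.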

4. Trust signals are over-indexed on visual proof

Site owners tend to invest heavily in trust badges, testimonial carousels, and press logos. Agents can't see any of it. What they can see — SSL posture, WHOIS age, verified organization schema — is often missing even on sites with rich visual trust surfaces.
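As an illustration of a trust signal a crawler can read, the sketch below emits a schema.org Organization description as JSON-LD. The function name and example values are hypothetical. It covers only a few common properties (name, url, sameAs), and it does not by itself make an organization "verified", since verification depends on what the consuming agent cross-checks.

```python
import json


def organization_jsonld(name: str, url: str, same_as: list[str]) -> str:
    """Return a JSON-LD string describing an Organization."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        # sameAs points at other profiles that identify the same entity.
        "sameAs": list(same_as),
    }
    return json.dumps(data, indent=2)


print(
    organization_jsonld(
        "Example Fasteners Ltd",
        "https://example.com",
        ["https://www.linkedin.com/company/example-fasteners"],
    )
)
```

The resulting JSON is typically embedded in the page inside a script element of type application/ld+json.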

What to do Monday morning

If your site scores in the middle band, the highest-leverage change is publishing a single machine-readable catalog at a stable URL. That one change, shipped across twelve sites in our pilot cohort, moved the median score from 41 to 58 — a full grade bump — within two weeks.

Everything else can wait. Schema tweaks, trust badges, even a proper llms.txt — they matter, but they're marginal until the protocol layer exists.
