🔎 Focus: Crawling & Indexation
🔴 Impact: High
🟠 Difficulty: Medium

Sponsored by Ahrefs
Stop wondering if AI is talking about your brand. This post is brought to you by Ahrefs, the leading SEO & marketing intelligence platform.
With their new Brand Radar tool, you can finally monitor your visibility across major AI engines: from ChatGPT to Google’s AI Overviews. Secure your brand as the top choice in the era of AI.
The Day I Discovered 1.2 Billion Pages
Back in 2019, I was doing a technical SEO audit for one of the largest media companies in Poland.
Among their properties was a massive furniture aggregator ecommerce site with around a million products.
Big site. But nothing unusual.
Then I opened their indexation report.
Google had 1.2 billion pages from this site in its database.
Not millions.
Billions with a B.

Insane Coverage Report
At first, I thought the data was wrong. But it wasn’t.
Google was crawling the site like a machine possessed. It was downloading over a terabyte of data from their servers every single day.
Every hour, the crawling looked worse.
The site structure simply didn’t have that many pages.
So where were they coming from?
The Hidden SEO Disaster Most Ecommerce Sites Never Notice
The culprit was filtering.
Thousands of categories.
Dozens of filters per category.
Color × size × material × brand × price × style × availability.
Each combination generated a new URL.
Multiply that across thousands of categories, and suddenly you don’t have thousands of pages.
You have billions of crawlable combinations.
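To see how fast this explodes, here is a back-of-the-envelope sketch. The counts are hypothetical, not the audited site's real numbers, but the math is the same:

```python
# Illustrative only: how filter combinations explode into crawlable URLs.
# Every count below is a made-up example, not data from the real site.
categories = 2_000
# options per filter: color, size, material, brand, price, style, availability
options_per_filter = [12, 8, 10, 40, 6, 9, 2]

combos_per_category = 1
for n in options_per_filter:
    combos_per_category *= n + 1  # +1 for "filter not applied"

total_urls = categories * combos_per_category
print(f"{total_urls:,} crawlable URL combinations")
```

With these modest-looking numbers you already land north of 20 billion URL combinations, which is why a site with "only" a million products can show up in Google's index with over a billion pages.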
Google was wasting its crawl budget exploring an infinite maze of useless filter pages.
Meanwhile, the pages that actually mattered, the products and the valuable category pages, were barely getting any attention.
Organic growth stalled.
And no one inside the company realized why.
How We Removed 600 Million Pages From Google
We had to move fast.
First, we analyzed which filter combinations actually had search demand.
Those were allowed to stay indexable and remained in the internal linking structure.
Everything else?
We shut it down.
We deindexed the low-value filters,
blocked many of them in robots.txt,
and removed the crawlable links by hiding them behind JavaScript interactions.
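As a rough sketch, the robots.txt rules might look something like this. The parameter names here (`color`, `material`, `availability`, `brand`) are hypothetical; the real ones depend on how your platform builds filter URLs:

```
# Block crawling of low-value filter parameters (hypothetical names)
User-agent: *
Disallow: /*?*color=
Disallow: /*?*material=
Disallow: /*?*availability=

# Keep filters with proven search demand crawlable, e.g. brand pages
Allow: /*?brand=
```

One caveat worth knowing: robots.txt blocks crawling, not indexing, and a page blocked from crawling can never show Google its noindex tag. That's why the order of operations matters — deindex the URLs first, then block them.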
Within six months, we removed roughly 600 million pages from Google’s index.
Crawl efficiency improved.
Important pages started getting crawled again.
And the site’s growth came back.

Traffic growth after fixing the filtering crawling strategy
Of course, about two years later, Google absolutely nuked aggregator sites in Poland with the May 2022 Core Update and the ones that followed, and the site has lost a ton of traffic since.

Traffic declines after core updates
Still, it's a cool case study in how filtering can quietly kill your site's performance.
Your Ecommerce Site Might Have the Same Problem
Most ecommerce founders never look at their crawl data.
But filtering systems quietly generate millions of useless URLs that search engines keep discovering forever.
It’s one of the fastest ways to destroy SEO performance without realizing it.
Reply to this email with “Filter” and your ecommerce domain.
I’ll take a look at your ecommerce site and check if your technical SEO or filtering setup is silently killing your rankings.
How I analyze Technical SEO on Fortune 100 Stores.
Here I audited Lowe’s, a huuuuuge US-based hardware store. Before you ask: yes, they also struggle with indexation.
Until next time 👋
oh that’s a human
—
