
🔎 Focus: Technical SEO
🔴 Impact: High
🟠 Difficulty: Mid-High

Sponsored By Peec AI

AI search is the fastest-growing discovery channel. Your customers ask ChatGPT, Perplexity, Claude, and Gemini for recommendations daily.

Is your brand the answer?

Peec AI shows you exactly where you stand:

Track your visibility and sentiment across all major LLMs
Benchmark against competitors to see your share of voice
Get step-by-step guidance on improving your AI visibility

Turn AI search from a black box into a measurable growth channel with clear metrics and a scalable strategy.

Dear Tech SEO 👋

Let’s talk today about SERVER LOGS, and how to analyze them to get real results.

How to get your server logs

Most of us spend 90% of our time in Google Search Console (GSC). Don’t get me wrong, GSC is the one tool we cannot live without, but it's the "filtered" version of reality (and quite slow, tbh):

  • GSC is sampled data

  • GSC is delayed by 48 hours (or more)

  • GSC tells you what Google wants you to know.

If you want the Ground Truth, you need Server Logs.

On a shared server? Then it's as easy as getting the log files from cPanel:

How to get the server log files from cPanel

The most advanced cases we manage (large sites, 100k+ pages) run on dedicated infrastructure on Cloudflare, Vercel or AWS, which have platform-specific logging systems that deliver log data in structured JSON formats.

Whatever the case, the raw logs don't tell you much on their own…

Sample of a server log file in a shared server

I analyze server logs myself, and I always check 5 critical points. Today, I will share those critical server checks.

Ready? Let’s goooooooo

Quick Log Analysis Checklist

| Check | What to look for | Action |
| --- | --- | --- |
| 1. Orphan Discovery | URLs with hits in logs but zero hits in your site crawl. | Link to high-value pages; 301 or 410 garbage URLs. |
| 2. AI Bot Access | 403 Forbidden or 404 errors for agents like GPTBot or ChatGPT-User. | Adjust server/WAF settings to allow bots if you want AI visibility. |
| 3. Negative Audit | "Money Pages" in your sitemap with 0 hits in the last 30 days. | Add these pages to main navigation, footers, and internal linking. |
| 4. Rendering Gap | High HTML hits vs. significantly lower JS bundle (e.g., main.js) hits. | Monitor weekly to ensure Google is triggering the 2nd wave of rendering. |
| 5. Wasted Budget | High hits on ?search=, ?filter=, ?sort=, or endless pagination. | Remove links from HTML, use noindex,follow, and block patterns in robots.txt. |

1. Orphan Pages

I know, “you do not need a server log to audit orphan pages”. Most of us do this:

  • Get the sitemap URLs

  • Set our Screaming Frog (or similar tool) to crawl our site

  • Connect GA4, GSC and the crawling tool

  • Compare the URLs across the reports to identify orphan pages

Sample of the Screaming Frog report on orphan pages

Using crawl tools + sitemaps to identify orphan pages has a major flaw:

  • Sitemaps can be incomplete.

In some cases, I've worked with sites that had no sitemap file of any kind…

  • The Server Log Way: Export your log file URLs and compare them to your site crawl (a short script sketch follows this list). If a URL has hits in the logs but zero hits in your crawl, it's an orphan.

    • If it's high-value, link to it

    • If it’s garbage, 301 or 410
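
If you want to script that comparison instead of eyeballing exports, here is a minimal sketch. It assumes a combined-format access log (access.log) and a one-URL-per-line crawl export (crawl_urls.txt, e.g. from Screaming Frog); both file names are placeholders.

```python
# orphan_check.py - compare log URLs against a crawl export (minimal sketch).
# Assumes a combined-format access log and a one-URL-per-line crawl export;
# adjust the file names to your own setup.
import re
from urllib.parse import urlparse

LOG_FILE = "access.log"
CRAWL_FILE = "crawl_urls.txt"

# The request path is inside the quoted request: "GET /path HTTP/1.1"
request_re = re.compile(r'"[A-Z]+ (\S+) HTTP/[^"]+"')

log_paths = set()
with open(LOG_FILE, encoding="utf-8", errors="ignore") as f:
    for line in f:
        m = request_re.search(line)
        if m:
            log_paths.add(m.group(1).split("?")[0])  # drop query strings

crawl_paths = set()
with open(CRAWL_FILE, encoding="utf-8") as f:
    for line in f:
        url = line.strip()
        if url:
            crawl_paths.add(urlparse(url).path or "/")

# Hits in the logs but never found by the crawler -> orphan candidates
orphans = sorted(log_paths - crawl_paths)
for path in orphans:
    print(path)
print(f"{len(orphans)} orphan candidates")
```

Treat the output as candidates only: static assets and other legitimately non-crawled endpoints will show up too, so review the list before linking or redirecting anything.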

2. AI Crawler Visibility (Training vs. Agents)

By 2030, 50% of website traffic will be AI agents

That was a prediction I made 9 months ago.

Well, it turns out that OpenAI's ChatGPT crawler is now the leading crawler, ahead of Google's, according to one study.

In retail, Agentic Commerce will be the first move towards letting AI buy on our behalf. So my prediction might come true before 2030.

My "audience" has expanded quite a bit the last 3 years.

Apart from regular human visitors, I now get training bots like GPTBot and user-triggered agents (citations or search functionalities) like OAI-SearchBot or Claude-SearchBot. If these bots aren't in my logs, my brand doesn't exist to AI.

Below is an example of one of my websites with ZERO AI crawler activity but well visited by regular search engines:

Example from the Screaming Frog Server Log Analyzer, showing that no AI bot visits this particular website (zero AI visibility).

  • 🔎 What to check: Filter your logs by User-Agent. Look for these bots in your server logs (see the sketch below the list):

    • GPTBot, ClaudeBot, PerplexityBot, and Applebot-Extended.
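
If you don't have a log analyzer at hand, a rough sketch like the one below counts hits and status codes per AI crawler. The user-agent substrings are the ones listed above plus the agent bots; access.log is a placeholder for your own log file.

```python
# ai_bot_check.py - count AI crawler hits and their status codes (minimal sketch).
# Assumes a combined-format access log where the status code follows the quoted request.
import re
from collections import Counter

BOTS = ["GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot",
        "Claude-SearchBot", "PerplexityBot", "Applebot-Extended"]

status_re = re.compile(r'" (\d{3}) ')  # status code right after the closing quote

hits = Counter()
statuses = Counter()
with open("access.log", encoding="utf-8", errors="ignore") as f:
    for line in f:
        for bot in BOTS:
            if bot in line:
                hits[bot] += 1
                m = status_re.search(line)
                if m:
                    statuses[(bot, m.group(1))] += 1
                break

for bot in BOTS:
    print(f"{bot}: {hits[bot]} hits")
for (bot, code), n in sorted(statuses.items()):
    print(f"  {bot} -> {code}: {n}")
```

Zero hits across the board points to Problem 1 below; plenty of hits answered with 403/404 points to Problem 2.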

Problem 1: you don't get any AI crawler because of a lack of relevant content. Your site has literally nothing of interest to the millions of users of AI chatbots.

  • How to fix: match what your market is searching for and create something of value for your human visitors

Problem 2: If AI bots are visiting your site but your response is a 403 Forbidden or a 404 Not Found, you are basically blocking the bots from getting to your site even though they want to.

  • How to fix: unless you actively want to block AI bots, adjust your server/WAF settings to allow them

Hallucination note: it could be that AI bots make up URLs that never existed. Evaluate carefully and create those 301s, even though you did nothing to create this issue…

3. The Negative Audit

I learned this one from Pedro Dias. He calls it the "absence of signal". A page being crawled is the expected behavior; absence of signal is when a page that should be crawled isn't.

Map your sitemap URLs against your log files (a short script sketch follows the fixes below).

  • How to test: If 30% of your "Money Pages" haven't been touched by Googlebot in 30 days, but your /category/page/99/ is getting hit daily, you have a structural internal linking crisis. Google can't index what it doesn't fetch.

  • How To Fix: All pages that should be crawled but are not must be added to navigation, footers, and overall internal linking sections. If they are money pages, they have to be present all across the site.

    • Overload of links per page: it can happen that each page has thousands of links. Many of them might never be reached by bots because there are just too many. Optimize internal links for crawlability and keep 200-500 internal links per page to stay safe. One common issue is mega-menus that load ALL of the site's navigation at once. Remove that functionality and only show the relevant links per page.

    • JS-rendered links: it can happen that the navigation is loaded with JavaScript only on user interaction. Bots won't interact with the site as humans do, so all JS-loaded navigation is pretty much invisible to them. Remove any JS rendering on navigation; raw HTML is preferred.
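
To run that test, a sketch like this compares your sitemap URLs against Googlebot hits from the last 30 days. It assumes a plain (non-index) sitemap.xml saved locally and a combined-format access.log with the usual [day/month/year:…] timestamps; both paths are placeholders.

```python
# negative_audit.py - find sitemap URLs with zero Googlebot hits in the last 30 days
# (minimal sketch). Assumes a local, non-index sitemap.xml and a combined-format access.log.
import re
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta
from urllib.parse import urlparse

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
sitemap_paths = set()
for loc in ET.parse("sitemap.xml").getroot().findall(".//sm:loc", NS):
    sitemap_paths.add(urlparse(loc.text.strip()).path or "/")

cutoff = datetime.now() - timedelta(days=30)
line_re = re.compile(r'\[(\d{2}/\w{3}/\d{4})[^\]]*\] "[A-Z]+ (\S+) ')

hit_paths = set()
with open("access.log", encoding="utf-8", errors="ignore") as f:
    for line in f:
        if "Googlebot" not in line:
            continue
        m = line_re.search(line)
        if not m:
            continue
        when = datetime.strptime(m.group(1), "%d/%b/%Y")
        if when >= cutoff:
            hit_paths.add(m.group(2).split("?")[0])

never_crawled = sorted(sitemap_paths - hit_paths)
print(f"{len(never_crawled)} of {len(sitemap_paths)} sitemap URLs had no Googlebot hit in 30 days")
for path in never_crawled:
    print(path)
```

If your money pages show up in that list while deep paginated URLs get hit daily, that is the structural internal linking crisis described above.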

4. The Rendering Gap (Static vs. JS)

Google uses a two-wave indexing process. First, it fetches the static HTML. Later, it renders the JavaScript. Nothing new here.

Your logs can show you if Google is getting "stuck" in wave one.

I am focusing on Googlebot (or similar) here. Google can render JS, but it is not guaranteed that it will. The least I can do is ensure the bot comes back for the second wave of JS rendering.

  • How to test: As Google might take days or even weeks to come back for the 2nd wave, it's best to monitor on a weekly or monthly timeframe (a sketch for tracking this follows the examples below).

    • Bad signal: I have 10,000 requests/week for my Product Pages (HTML) but only 1,000 requests/week for the main JavaScript bundle (main.js).

      • JavaScript is not being crawled enough.

    • Good signal: I have 10,000 requests/week for my Product Pages (HTML) and 12,000 requests/week for the main JavaScript bundle (main.js).

      • Google not only triggered 100% of the JS on each crawled page, it also came back to render JS on previously crawled pages that were in the queue. This is good.
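
A minimal sketch for tracking that ratio week by week, assuming Googlebot requests in a combined-format access.log, HTML pages identified by a path prefix (here /product/, a placeholder) and a single main JS bundle called main.js:

```python
# rendering_gap.py - weekly ratio of Googlebot HTML hits vs. main JS bundle hits
# (minimal sketch). /product/ and main.js are placeholders for your own patterns.
import re
from collections import defaultdict
from datetime import datetime

line_re = re.compile(r'\[(\d{2}/\w{3}/\d{4})[^\]]*\] "[A-Z]+ (\S+) ')

weekly = defaultdict(lambda: {"html": 0, "js": 0})
with open("access.log", encoding="utf-8", errors="ignore") as f:
    for line in f:
        if "Googlebot" not in line:
            continue
        m = line_re.search(line)
        if not m:
            continue
        week = datetime.strptime(m.group(1), "%d/%b/%Y").strftime("%G-W%V")
        path = m.group(2).split("?")[0]
        if path.startswith("/product/"):
            weekly[week]["html"] += 1
        elif path.endswith("main.js"):
            weekly[week]["js"] += 1

for week in sorted(weekly):
    html, js = weekly[week]["html"], weekly[week]["js"]
    ratio = js / html if html else 0.0
    print(f"{week}: {html} HTML hits, {js} main.js hits, ratio {ratio:.2f}")
```

A ratio that stays well below 1 for several weeks is the bad signal above; a ratio around or above 1 is the good one.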

5. Wasted Crawl Budget

Finding wasted crawling resources is the easiest issue to spot. What I always search for (a sketch that buckets these patterns follows the list):

  • Search Parameters: /?q , /?s , /?search or similar that point to search pages that should not be crawled (let alone indexed).

  • Faceted Navigation: /?filter , /?color , /?topic or similar from filters. Every extra filter creates a new URL string. If your site has 10 filters, you could easily end up with millions of combinations.

  • Excessive Pagination: /category/products/page/999/ and similar endless pagination, usually created by silly HTML issues.

  • Tracking & Session IDs: /?utm_source=, /?sessionid=, or /?gclid=. These are often leaked from feeds, campaigns, social media, etc.
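
Here is a minimal sketch that buckets Googlebot hits by those patterns so you can see where the budget is leaking. The parameter names mirror the list above and are assumptions; tune them to your own URL structure (deep pagination here means /page/10/ or higher).

```python
# wasted_budget.py - bucket Googlebot hits by crawl-waste URL patterns (minimal sketch).
# Pattern names mirror the checklist above; adjust them to your own URL structure.
import re
from collections import Counter

PATTERNS = {
    "search params":    re.compile(r"[?&](q|s|search)="),
    "faceted nav":      re.compile(r"[?&](filter|color|topic|sort)="),
    "deep pagination":  re.compile(r"/page/\d{2,}/"),
    "tracking/session": re.compile(r"[?&](utm_source|sessionid|gclid)="),
}
request_re = re.compile(r'"[A-Z]+ (\S+) HTTP')

buckets = Counter()
total = 0
with open("access.log", encoding="utf-8", errors="ignore") as f:
    for line in f:
        if "Googlebot" not in line:
            continue
        m = request_re.search(line)
        if not m:
            continue
        total += 1
        url = m.group(1)
        for name, pattern in PATTERNS.items():
            if pattern.search(url):
                buckets[name] += 1
                break

for name, count in buckets.most_common():
    share = 100 * count / total if total else 0
    print(f"{name}: {count} hits ({share:.1f}% of Googlebot requests)")
```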

We should not aim for a perfect crawling setup. I like to keep at least a 50% ratio between the pages that should exist (valid pages for the index) and the ones that should not (not valid for the index).

Here is an example of a bad valid/not-valid ratio that you can easily check in your GSC:

  • My website has 94k pages known to Google

    • 90k pages are not valid, that is ~95% not-valid pages

    • Only 3k pages are actually suitable for the index, i.e. ~5% valid pages

This issue alone is costing this client roughly 40% of the traffic they used to have.

GSC helps me understand the impact, and the server logs tell me all of the affected pages.

Bad ratio of non-valid pages for the Google Index

  • How To Fix: there is no one-size-fits-all solution for this. Each case is different, but the general sequence looks like this:

    • Remove the links entirely from the HTML

    • Tag the pages with noindex,follow (not always needed)

    • Wait for them to disappear from the Googlebot hits in your logs

    • Once they are gone, block the URL patterns (e.g., in robots.txt, example below) so they don't come back
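
For that last step, the blocking can look something like this in robots.txt. These patterns are hypothetical and simply mirror the examples above; wildcard rules like these are honored by the major crawlers, but match them against what your own logs actually show, and only add them once the URLs have dropped out of the logs.

```
User-agent: *
Disallow: /*?*search=
Disallow: /*?*filter=
Disallow: /*?*sort=
Disallow: /*?*sessionid=
```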

Do you want me to check your Server Logs?

Reply SERVER LOG with your domain name and I'll review your current server status.

(Also if you reply to this email Google will know I’m legit and not label me as spam 🙏 )

That’s it for today :)

Until next time 👋