Indexing

Indexing is the stage where a search engine stores and organizes a crawled page in its database so it can be retrieved for relevant searches. A page must be indexed before it can rank.

Once a page has been crawled, the search engine has to decide whether to keep it. Indexing is that decision plus the storage that follows. The engine analyzes the content, figures out what the page is about, and files it away in a massive database so it can be pulled up later when someone searches for something relevant.

People throw the words crawling and indexing around like they mean the same thing. They do not. Crawling is fetching the page. Indexing is keeping it and understanding it. Google can crawl a page and then decide not to index it, which is one of the most common and most frustrating situations you will run into.

Crawled is not indexed

Google Search Console has a status called "Crawled - currently not indexed". It means the bot fetched your page, looked at it, and chose not to file it. That is usually a quality or duplication signal, not a bug.

How a page gets indexed

  1. The page is crawled and the raw HTML is fetched.
  2. If the page relies on scripts, it goes into a rendering queue so the engine can see the final content.
  3. The engine extracts text, links, structured data, and signals about topic and quality.
  4. It checks for duplicates and picks a canonical version to represent the content.
  5. If the page clears the bar, it gets stored in the index and becomes eligible to rank.

Being in the index does not guarantee a good ranking, but being out of the index guarantees no ranking at all.
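
To make the steps above concrete, here is a toy model in Python. Treat it as a sketch of the flow only, not as how Google or any other engine actually works. `CrawledPage`, `MIN_WORDS`, the exact-match fingerprint, and the "first URL seen wins" canonical rule are all assumptions made up for illustration; real engines use far richer duplicate and quality signals.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class CrawledPage:
    url: str
    text: str  # text extracted after fetching and rendering (steps 1 to 3)


MIN_WORDS = 50  # hypothetical quality bar, purely illustrative


def fingerprint(text: str) -> str:
    """Hash of the lowercased, whitespace-normalized text (exact duplicates only)."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_index(pages):
    """Return (index, statuses) for a list of CrawledPage objects."""
    index = {}     # fingerprint -> canonical URL stored in the index
    statuses = {}  # url -> what happened to that page
    for page in pages:
        fp = fingerprint(page.text)
        # Step 4: duplicate check. The first URL seen acts as the canonical.
        if fp in index:
            statuses[page.url] = f"duplicate of {index[fp]}"
            continue
        # Step 5: the page has to clear the (made-up) bar to be stored.
        if len(page.text.split()) < MIN_WORDS:
            statuses[page.url] = "crawled, not indexed"
            continue
        index[fp] = page.url
        statuses[page.url] = "indexed"
    return index, statuses
```

Every page in the input was crawled, but only the ones that end up with the status "indexed" are eligible to rank. That gap is the one this section is about.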

Common reasons a page is not indexed

  • The page carries a noindex meta tag or X-Robots-Tag header.
  • A canonical tag points to a different URL, so this version is treated as a duplicate.
  • The content is thin, near-duplicate, or judged too low value to keep.
  • The page is blocked in robots.txt, so the engine cannot fetch the content and has too little to index it confidently.
  • The site is new or has weak internal linking, so the engine has not gotten around to it yet.
<!-- This tag tells search engines not to index the page -->
<meta name="robots" content="noindex, follow">

<!-- The same instruction can be sent as an HTTP header -->
X-Robots-Tag: noindex
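
The first two reasons in the list are easy to check yourself. Below is a minimal Python sketch, using only the standard library, that looks for a noindex directive (meta tag or header) and for a canonical tag pointing somewhere else. The function name `indexing_blockers` and the example URLs are hypothetical, and the matching is deliberately simple: it treats any `X-Robots-Tag` value containing `noindex` as a hit and compares URLs after trimming a trailing slash.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _HeadSignals(HTMLParser):
    """Collects robots meta directives and the first rel=canonical link."""

    def __init__(self):
        super().__init__()
        self.robots = []
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = {name: (value or "") for name, value in attrs}
        if tag == "meta" and a.get("name", "").lower() in ("robots", "googlebot"):
            self.robots.append(a.get("content", "").lower())
        elif tag == "link" and "canonical" in a.get("rel", "").lower().split():
            if self.canonical is None and a.get("href"):
                self.canonical = a["href"]


def indexing_blockers(url, html, headers=None):
    """Return a list of reasons this page might be kept out of the index."""
    headers = {k.lower(): v for k, v in (headers or {}).items()}
    parser = _HeadSignals()
    parser.feed(html)

    reasons = []
    if any("noindex" in directive for directive in parser.robots):
        reasons.append("noindex meta tag")
    if "noindex" in headers.get("x-robots-tag", "").lower():
        reasons.append("noindex X-Robots-Tag header")
    if parser.canonical:
        # Resolve relative canonicals against the page URL before comparing.
        target = urljoin(url, parser.canonical)
        if target.rstrip("/") != url.rstrip("/"):
            reasons.append(f"canonical points to {target}")
    return reasons
```

An empty list does not mean the page will be indexed. It only means these particular blockers are absent.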

Watch out

Leftover noindex tags from a staging site are one of the most common ways real pages silently drop out of Google. After any migration or launch, scan your live site for noindex and confirm only the pages you intend to hide are carrying it.
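
A post-launch scan like that can be a short script. Here is a rough Python sketch under some assumptions: you already have a list of live URLs and a set of URLs you intend to hide, and the names `audit_noindex`, `fetch_live`, and `has_noindex` are mine, not part of any tool. The meta tag detection is a crude pattern match, so treat hits as leads to verify rather than final answers.

```python
import re
from urllib.request import urlopen

META_TAG = re.compile(r"<meta\b[^>]*>", re.IGNORECASE)
ROBOTS_NAME = re.compile(r"name\s*=\s*[\"']?robots\b", re.IGNORECASE)


def has_noindex(headers, html):
    """True if the X-Robots-Tag header or a robots meta tag mentions noindex."""
    headers = {k.lower(): v for k, v in headers.items()}
    if "noindex" in headers.get("x-robots-tag", "").lower():
        return True
    for tag in META_TAG.findall(html):
        if ROBOTS_NAME.search(tag) and "noindex" in tag.lower():
            return True
    return False


def fetch_live(url):
    """Fetch a URL and return (headers, html). Raises on HTTP errors."""
    with urlopen(url, timeout=10) as resp:
        headers = dict(resp.headers.items())
        html = resp.read().decode("utf-8", errors="replace")
    return headers, html


def audit_noindex(urls, intended_hidden, fetch=fetch_live):
    """Compare what is actually noindexed against what you meant to hide.

    Returns (unexpected, missing):
      unexpected: pages carrying noindex that are not on the intended list
      missing:    pages on the intended list that carry no noindex
    """
    unexpected, missing = [], []
    for url in urls:
        headers, html = fetch(url)
        flagged = has_noindex(headers, html)
        if flagged and url not in intended_hidden:
            unexpected.append(url)
        elif not flagged and url in intended_hidden:
            missing.append(url)
    return unexpected, missing
```

The `fetch` parameter is there so you can swap in your own crawler output or cached pages instead of hitting the live site. The `unexpected` list is the one that catches leftover staging tags.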

How to check coverage

Use the Pages report under Indexing in Google Search Console. It groups your URLs into indexed and not indexed, and gives you a reason for every exclusion. Work the not-indexed list from the top: fix the reasons that affect the most important pages first.

There is also a timing factor people forget about. Indexing is not instant. A brand new page on an established, frequently crawled site might get indexed within hours. The same page on a small, rarely crawled site might wait days or weeks. You can nudge the process by requesting indexing in the URL Inspection tool, but do not treat that as a magic button. If a page keeps getting refused, the answer is almost always to improve the page or its internal links, not to keep hitting request.

If you want indexing to go smoothly, give the engine a clean path: crawlable pages, a clear canonical for every piece of content, real substance on the page, and tidy internal links. The biggest lever for borderline pages is usually internal linking. A page that important sections of your site point to looks more valuable than one floating alone, and the engine treats it that way. For the deeper mechanics and the rest of the technical foundation, see my technical SEO guide.
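
To act on the internal linking point, it helps to know which pages barely have any links pointing at them. Here is a small Python sketch that counts inbound internal links from a link graph. It assumes you already have that graph, for example from a crawler export, as a dictionary mapping each page to the internal URLs it links to. The function names and the `min_inbound` cutoff are hypothetical, and the count is a rough proxy of my own, not a metric any search engine publishes.

```python
from collections import Counter


def inbound_link_counts(link_graph):
    """Count how many distinct pages link to each page (self-links ignored)."""
    counts = Counter({page: 0 for page in link_graph})
    for source, targets in link_graph.items():
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return counts


def weakly_linked(link_graph, min_inbound=1):
    """Pages with fewer than min_inbound inbound internal links, sorted by URL."""
    counts = inbound_link_counts(link_graph)
    return sorted(page for page, n in counts.items() if n < min_inbound)
```

With the default cutoff this returns pure orphans. Raising `min_inbound` widens the net to borderline pages that could use a few more links from important sections.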
