
Noindex

Noindex is a directive that tells search engines to keep a specific page out of their index, so it never shows up in search results. The page can still be crawled and its links still followed, but it stays invisible to searchers.

Not every page on your site deserves to be in search. Your internal search results pages, thank-you pages, thin tag archives, login screens, and duplicate filter URLs add nothing to a searcher and can actively drag your site down by spreading crawl budget and ranking signals across junk. Noindex is how you tell Google to leave those pages out of the index while keeping them live for the people who reach them directly.

The directive is simple. The discipline is knowing what to point it at. Apply noindex to the wrong pages and you bury content you wanted ranking. Forget it on the right pages and you let low-value URLs dilute your site in the eyes of the engine.

How to apply noindex

There are two clean ways to do it. The first is a meta tag in the head of the HTML page.

<meta name="robots" content="noindex, follow">

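Note the follow. You almost always want noindex, follow rather than noindex, nofollow. That combination keeps the page out of search while still letting crawlers follow its links, so ranking signals can flow through to the pages you do want indexed. Using nofollow here strands those signals for no good reason. Follow is also the default, so content="noindex" on its own behaves the same way. Spelling out follow simply makes the intent obvious.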
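To make the follow default concrete, here is a minimal sketch using only the Python standard library. It reads the robots meta tags in an HTML string and reports whether the page is indexable and whether its links are followed. It assumes the whole page arrives as one string and deliberately ignores crawler-specific tags such as `<meta name="googlebot">`. The class and function names are illustrative and do not belong to any SEO tool.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; valueless attributes come through as None.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            # Directives are comma-separated and case-insensitive.
            for token in attributes.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.append(token)


def robots_meta_status(html: str) -> dict[str, bool]:
    """Report whether a page is indexable and whether its links are followed.

    Anything not explicitly restricted falls back to the defaults: index, follow.
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    found = set(parser.directives)
    # "none" is shorthand for "noindex, nofollow".
    noindex = "noindex" in found or "none" in found
    nofollow = "nofollow" in found or "none" in found
    return {"indexable": not noindex, "follow": not nofollow}
```

Run it on `<meta name="robots" content="noindex">` and on `<meta name="robots" content="noindex, follow">` and both report the same result: not indexable, links followed.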

The second method is the X-Robots-Tag HTTP response header, which works for HTML and non-HTML files alike, such as PDFs and images.

X-Robots-Tag: noindex
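On many sites this header is set in the web server configuration. If your files are served from a Python web app, a small WSGI middleware is one way to attach it. The sketch below assumes a WSGI application and a hypothetical policy that every PDF, DOCX, or XLSX file should carry `X-Robots-Tag: noindex`. `NOINDEX_EXTENSIONS`, `robots_headers_for`, and `demo_app` are illustrative names.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical policy: non-HTML file types to keep out of the index.
NOINDEX_EXTENSIONS = (".pdf", ".docx", ".xlsx")

Headers = List[Tuple[str, str]]


def robots_headers_for(path: str) -> Headers:
    """Return the extra headers a response for `path` should carry."""
    if path.lower().endswith(NOINDEX_EXTENSIONS):
        return [("X-Robots-Tag", "noindex")]
    return []


def noindex_files_middleware(app: Callable) -> Callable:
    """Wrap a WSGI app so matching file responses get X-Robots-Tag: noindex."""

    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        extra = robots_headers_for(environ.get("PATH_INFO", ""))

        def start_with_robots(status: str, headers: Headers, exc_info=None):
            # Append our header to whatever the wrapped app already set.
            return start_response(status, list(headers) + extra, exc_info)

        return app(environ, start_with_robots)

    return wrapped


def demo_app(environ, start_response):
    """Stand-in application that serves every path as the same bytes."""
    start_response("200 OK", [("Content-Type", "application/octet-stream")])
    return [b"file body"]


app = noindex_files_middleware(demo_app)
```

Served with `wsgiref.simple_server.make_server("", 8000, app)`, a request for a path such as /files/report.pdf should come back with `X-Robots-Tag: noindex`, while an HTML page like /about is left alone.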

Good candidates for noindex

  • Internal search result pages, which can spawn thousands of thin URLs.
  • Thank-you and confirmation pages that only make sense after an action.
  • Faceted or filtered URLs that duplicate a main category page.
  • Login, account, and admin pages with no search value.
  • Thin tag or archive pages that hold little unique content.
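The candidates above can be screened mechanically before a human decides. This sketch flags URLs that look like noindex candidates. The path prefixes and filter parameters are hypothetical conventions, so map them to your own site's structure. A tag archive is only flagged for review, because whether it is actually thin depends on its content, not its URL.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical URL conventions; map them to your own site's structure.
PATH_RULES = {
    "/search": "internal search results",
    "/thank-you": "thank-you or confirmation page",
    "/login": "login, account, or admin page",
    "/account": "login, account, or admin page",
    "/admin": "login, account, or admin page",
    "/tag": "tag archive (check it is actually thin)",
}
# Hypothetical query parameters that only filter or sort a category page.
FILTER_PARAMS = {"color", "size", "sort", "price"}


def noindex_reason(url: str) -> Optional[str]:
    """Return why a URL looks like a noindex candidate, or None if it does not."""
    parts = urlsplit(url)
    path = parts.path.lower().rstrip("/") or "/"
    for prefix, reason in PATH_RULES.items():
        # Match the section itself or anything below it, but not "/searchlight".
        if path == prefix or path.startswith(prefix + "/"):
            return reason
    params = parse_qs(parts.query, keep_blank_values=True)
    if any(name.lower() in FILTER_PARAMS for name in params):
        return "faceted or filtered duplicate"
    return None
```

For example, `https://example.com/search?q=boots` is reported as internal search results, `https://example.com/boots?color=red` as a filtered duplicate, and the plain category page `https://example.com/boots` is left alone.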

Noindex removes a page from results but does not stop it being crawled. Keeping a page from being crawled is a different job handled by a different tool, robots.txt, and the two do not combine the way you might expect.

The catch that keeps pages stuck

A noindex only works if the engine can actually read it. If you have also blocked the URL in robots.txt, the crawler never fetches the page, never sees the directive, and the page can linger in the index. So when you want something gone, leave it crawlable, apply the noindex, and wait. Google needs to recrawl the page to notice the new instruction, which can take days or weeks depending on how often it visits.

WATCH OUT

Do not noindex a page and disallow it in robots.txt at the same time. The block prevents Google from seeing the noindex, and the page may stay in results as a link with no snippet.
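You can catch the blocked-noindex trap before it bites by checking each noindexed URL against your robots.txt. This sketch uses Python's standard `urllib.robotparser`. Its matching is simpler than Google's (for example, it does not expand `*` wildcards inside paths), so treat a hit as a prompt to double-check rather than a verdict. `page_has_noindex` is a flag you supply yourself, for instance from a crawl export, and the robots.txt content is a made-up example.

```python
from urllib.robotparser import RobotFileParser


def blocked_noindex(robots_txt: str, url: str, page_has_noindex: bool,
                    user_agent: str = "Googlebot") -> bool:
    """True when a page carries noindex but robots.txt stops crawlers reading it."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    crawlable = rules.can_fetch(user_agent, url)
    # The noindex only takes effect if the crawler is allowed to fetch the page.
    return page_has_noindex and not crawlable


# Made-up robots.txt for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""
```

With these rules, a noindexed page at /private/old-offer is flagged because Googlebot can never fetch it to see the directive. A noindexed /thank-you page passes because it stays crawlable.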

Noindex versus 410 versus canonical

Reach for noindex when a page should stay live for users but out of search. Use a 410 Gone when the page is truly dead and should be dropped entirely. Use a canonical tag when two similar pages both need to stay live but you want search engines to consolidate ranking onto one preferred URL. Three different problems, three different fixes.
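The three-way choice can be written down as a tiny decision helper. This is a sketch with hypothetical parameter names. The order of the checks (dead pages first, then pages that do not belong in search, then duplicates of a preferred page) is one reasonable reading of the rules above, not a standard. Each enum value holds what you would actually serve, and the canonical URL is a placeholder.

```python
from enum import Enum


class Fix(Enum):
    KEEP_INDEXED = "leave it indexable"
    NOINDEX = '<meta name="robots" content="noindex, follow">'
    GONE_410 = "HTTP/1.1 410 Gone"
    # Placeholder: point this at your real preferred URL.
    CANONICAL = '<link rel="canonical" href="https://example.com/preferred-page">'


def choose_fix(is_live_for_users: bool, wanted_in_search: bool,
               duplicates_preferred_page: bool) -> Fix:
    """Pick noindex, 410, or canonical following the three cases in this section."""
    if not is_live_for_users:
        return Fix.GONE_410  # truly dead: drop it entirely
    if not wanted_in_search:
        return Fix.NOINDEX  # live for users, but out of search
    if duplicates_preferred_page:
        return Fix.CANONICAL  # consolidate ranking onto the preferred URL
    return Fix.KEEP_INDEXED
```

A thank-you page, for instance, is live for users but not wanted in search, so `choose_fix(True, False, False)` lands on `Fix.NOINDEX`, and `.value` gives the meta tag to add.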

Crawlable beats blocked

For a noindex to take effect, the page must stay crawlable so the engine can read the directive. Pair noindex with follow, do not disallow the URL in robots.txt, and be patient while the recrawl happens.

For the full rundown on indexing control and crawl hygiene, see my guide on technical SEO fundamentals.

Want this handled by someone who has measured search for 20 years?

Work with me