Meta Robots Tag

The meta robots tag is an HTML directive in the head of a page that tells search engine crawlers how to treat that specific page: whether to index it, follow its links, show a snippet, and more. It is your per-page control panel for crawler behavior.

Robots.txt controls whether a crawler is allowed onto a page. The meta robots tag controls what the crawler does once it is on that page. That difference trips people up constantly, so lock it in now. One lives in a file at the root of your site. The other lives in the head of a single HTML page and speaks only for that page.

The tag is one line, and it carries a comma-separated list of instructions. Each instruction is a directive. The major engines honor the core ones, such as noindex, though support for some of the others varies by engine. Get the directives right and you steer indexing with surgical precision. Get them wrong and you can quietly wipe a page, or a whole section, out of search.

What the tag looks like

<meta name="robots" content="index, follow">

That example says: index this page and follow the links on it. It is also the default behavior, so writing it out does nothing you were not already getting. The tag earns its keep when you want to say no. The two directives you will reach for most are noindex, which keeps a page out of the index, and nofollow, which tells the engine not to pass signals through the links on the page.

The directives worth knowing

index / noindex: allow or block this page from appearing in search results.
follow / nofollow: pass or withhold ranking signals through the links on the page.
noarchive: tell the engine not to store a cached copy of the page.
nosnippet: prevent any text snippet or preview from showing in results.
max-snippet, max-image-preview, max-video-preview: cap how much of your content shows in the result.
noimageindex: keep images on the page out of image search.

You can also target a specific crawler instead of all of them by swapping the name value. Use name="googlebot" to speak only to Google while leaving other engines on their default behavior. That is how you hand one engine a different instruction from the rest.

<meta name="googlebot" content="noindex, follow">
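To check which instructions a page actually hands to each crawler, you can read the tags back out of the HTML. Here is a minimal Python sketch using only the standard library. The class name, the function name, and the short list of recognized crawler names are assumptions made for illustration, not part of any official tool.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect robots directives from <meta> tags, keyed by crawler name."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # e.g. {"robots": {"index", "follow"}}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").strip().lower()
        # Assumption: a short, hypothetical list of crawler names to look for.
        if name not in {"robots", "googlebot", "bingbot"}:
            return
        # The content attribute is a comma-separated list of directives.
        parts = {p.strip().lower() for p in attr.get("content", "").split(",")}
        parts.discard("")
        self.directives.setdefault(name, set()).update(parts)


def robots_directives(html):
    """Return {crawler_name: set_of_directives} for one HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = """
<html><head>
<meta name="robots" content="index, follow">
<meta name="googlebot" content="noindex, follow">
</head><body></body></html>
"""

result = robots_directives(page)
print(result)
```

On this sample page, the generic robots entry carries index and follow, while the googlebot entry carries noindex and follow, which is the "different instruction for one engine" setup described above.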

If the page is blocked in robots.txt, the crawler never reads your noindex tag. To deindex a page, leave it crawlable and let the engine see the noindex.

The mistake that bites everyone

People reach for robots.txt and a noindex tag at the same time, thinking two locks are safer than one. They are not. The block defeats the tag. If you disallow a URL in robots.txt, the crawler is not allowed to fetch it, which means it never sees the noindex sitting in the head. The page can then hang around in the index as a bare link with no description. Pick one tool for the job. To remove a page from search, leave it crawlable and use noindex.
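You can catch this conflict before it bites. The sketch below uses Python's standard urllib.robotparser to test whether a URL is disallowed, then flags the case where that same URL is relying on a noindex tag. The function name, the sample robots.txt, and the example.com URLs are hypothetical, and the standard-library parser follows the basic robots.txt rules, so it may not match every engine's handling of wildcards.

```python
from urllib.robotparser import RobotFileParser


def blocked_noindex_conflict(robots_txt, url, page_has_noindex, user_agent="*"):
    """Return True when the URL is disallowed AND relies on a noindex tag."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    crawlable = rules.can_fetch(user_agent, url)
    # A noindex tag only works if the crawler is allowed to fetch the page.
    return page_has_noindex and not crawlable


# Hypothetical robots.txt for illustration.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

conflict = blocked_noindex_conflict(
    robots_txt, "https://example.com/private/old.html", page_has_noindex=True
)
no_conflict = blocked_noindex_conflict(
    robots_txt, "https://example.com/blog/post.html", page_has_noindex=True
)
print(conflict, no_conflict)
```

The first URL is the problem case: it is disallowed, so its noindex is never read. The second is crawlable, so its noindex can do its job.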

WATCH OUT

A sitewide noindex left in a staging template is one of the fastest ways to vanish from Google. After any launch or migration, view source on key pages and confirm the robots tag says what you think it says.

X-Robots-Tag for non-HTML files

You cannot put a meta tag inside a PDF or an image. For those, use the X-Robots-Tag HTTP response header instead. It carries the same directives, sent at the server level, so you can noindex a PDF or block an image from indexing without touching any HTML.
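In practice this header is usually set in your web server or CDN configuration. To show the idea in runnable form, here is a minimal Python sketch of a WSGI app that attaches X-Robots-Tag to certain file types. The extension-to-directive policy, the function names, and the file path are assumptions chosen for the example.

```python
import os

# Assumption: a hypothetical policy mapping file extensions to directives.
X_ROBOTS_POLICY = {".pdf": "noindex", ".png": "noindex"}


def x_robots_header(path):
    """Return an ("X-Robots-Tag", value) tuple for the path, or None."""
    ext = os.path.splitext(path)[1].lower()
    value = X_ROBOTS_POLICY.get(ext)
    return ("X-Robots-Tag", value) if value else None


def app(environ, start_response):
    """Minimal WSGI app that adds the header; the body is a placeholder."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "application/octet-stream")]
    extra = x_robots_header(path)
    if extra:
        headers.append(extra)
    start_response("200 OK", headers)
    return [b""]


# Call the app directly, without a server, to see the headers it would send.
captured = {}


def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)


app({"PATH_INFO": "/files/report.pdf"}, fake_start_response)
print(captured["headers"].get("X-Robots-Tag"))
```

The PDF response goes out with X-Robots-Tag: noindex, while a path with an extension outside the policy gets no such header and stays on default behavior.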

One page, one set of instructions

The meta robots tag governs how a single page is indexed and how it appears in search results. Use it for precise control, never alongside a robots.txt block of the same URL, and always verify it after a launch.

For how this fits the bigger picture of crawl control and indexing, read my guide on technical SEO fundamentals.
