Glossary | Updated July 4, 2026 | 1 min read

Robots.txt

By Acadia Marketing

Robots.txt is a small file with outsized power — it guides crawlers around your site. A single wrong line can accidentally hide your whole business from Google.

Key Takeaways

  • Robots.txt is a text file that tells crawlers which parts of your site they may crawl.
  • It controls crawling, not indexing — it is not a reliable way to hide a page from search.
  • A stray Disallow rule can accidentally block your entire site from Google.

Figure: How Google Search works: crawling, indexing, and ranking. A three-stage pipeline: Googlebot crawls pages by following links, the pages are analyzed and stored in the Google index, then the most relevant pages are ranked and served in the search results.

  1. Crawl: Googlebot follows links and discovers your pages
  2. Index: Google analyzes and stores the page in its index
  3. Rank: The best-matching pages are served for a query

What robots.txt does

Robots.txt is a plain text file at the root of your site (yoursite.com/robots.txt) that tells search engine crawlers which parts of your site they are and are not allowed to crawl. Crawlers like Googlebot check this file before crawling.

It uses simple rules. For example, Disallow: /admin/ tells crawlers to stay out of your admin area, while an Allow: rule re-opens a specific path inside a blocked area. You can also point crawlers to your XML sitemap from here with a Sitemap: line. It is commonly used to keep crawlers away from areas that offer no value in search, such as login pages, internal search results, or duplicate parameter URLs.
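
To make these rules concrete, here is a minimal Python sketch that uses the standard library's urllib.robotparser to read a sample robots.txt and ask what Googlebot may crawl. The file contents, the yoursite.com URLs, and the /admin/help/ path are illustrative assumptions, not taken from a real site. One caveat: Python's parser applies rules in file order, while Google picks the most specific matching rule, so the Allow line is placed first here to get the same answer under both.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Allow: /admin/help/
Disallow: /admin/
Disallow: /search

Sitemap: https://yoursite.com/sitemap.xml
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(SAMPLE_ROBOTS_TXT)

# Ask whether a given crawler may fetch a given URL.
admin_allowed = rules.can_fetch("Googlebot", "https://yoursite.com/admin/settings")
help_allowed = rules.can_fetch("Googlebot", "https://yoursite.com/admin/help/faq")
services_allowed = rules.can_fetch("Googlebot", "https://yoursite.com/services")

# Sitemap lines are collected separately (site_maps() needs Python 3.8+).
sitemaps = rules.site_maps()

print("admin area crawlable:", admin_allowed)
print("admin help crawlable:", help_allowed)
print("services page crawlable:", services_allowed)
print("sitemaps:", sitemaps)
```

In this sketch the admin area is blocked, the help pages inside it are re-opened by the Allow rule, and ordinary pages such as /services are untouched because no rule mentions them.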

The critical thing robots.txt cannot do

Here is the misunderstanding that causes real damage: robots.txt controls crawling, not indexing. Blocking a page in robots.txt tells Google not to crawl it, but if that page is linked from elsewhere, Google can still index the URL (often with no description, since it never read the content). If you truly want a page kept out of search results, use a noindex tag on the page itself, not a robots.txt block. The page must also stay crawlable, because Google can only obey a noindex tag on a page it is allowed to read.
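
The difference between the two controls can be checked mechanically. The Python sketch below is one possible approach, not an official tool: it looks for a robots meta tag containing noindex in a page's HTML and compares that with what robots.txt allows. The function name noindex_status, the sample rules, and the yoursite.com URL are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from any <meta name="robots" content="..."> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "meta" and (attr_map.get("name") or "").lower() == "robots":
            content = attr_map.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]


def noindex_status(robots_txt, page_url, page_html, user_agent="Googlebot"):
    """Report how a page's noindex tag interacts with robots.txt (hypothetical helper)."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    crawlable = rules.can_fetch(user_agent, page_url)

    finder = RobotsMetaFinder()
    finder.feed(page_html)
    has_noindex = "noindex" in finder.directives

    if has_noindex and crawlable:
        return "noindex can be read"
    if has_noindex and not crawlable:
        return "noindex is hidden behind a robots.txt block"
    return "no noindex tag found"


# Assumed sample data: the page carries noindex but its folder is disallowed.
sample_rules = "User-agent: *\nDisallow: /private/"
sample_html = '<html><head><meta name="robots" content="noindex"></head></html>'
status = noindex_status(sample_rules, "https://yoursite.com/private/offer", sample_html)
print(status)
```

The second outcome is the trap described above: the page asks to be left out of search, but the crawler is never allowed in to see the request.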

Even more important is the catastrophic mistake it can cause. A two-line rule like this:

  • User-agent: *
  • Disallow: /

tells every crawler to stay off your entire site. This rule is common on staging sites during development, and every so often it gets pushed live by accident, quietly wiping a business off Google. If your rankings suddenly vanish, checking robots.txt is one of the first things to do.
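
A check for this mistake is easy to automate before a site goes live. The following is a minimal Python sketch, assuming the robots.txt text is already available as a string; the function name blocks_entire_site and the yoursite.com URL are hypothetical. It treats a blocked homepage as a stand-in for "the whole site is blocked", which is a simplification.

```python
from urllib.robotparser import RobotFileParser


def blocks_entire_site(robots_txt, site_url="https://yoursite.com/", user_agent="Googlebot"):
    """Return True if the given crawler may not fetch the homepage.

    Assumption: a blocked homepage is used as a proxy for a blanket block.
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return not rules.can_fetch(user_agent, site_url)


# The staging-style rule from the text above.
staging_rules = "User-agent: *\nDisallow: /"

# A narrower, hypothetical rule set that only hides the admin area.
live_rules = "User-agent: *\nDisallow: /admin/"

staging_blocked = blocks_entire_site(staging_rules)
live_blocked = blocks_entire_site(live_rules)

print("staging rules block the site:", staging_blocked)
print("live rules block the site:", live_blocked)
```

A deploy script could run a check like this against the file about to be published and stop the release when it returns True.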

Because the stakes are high and the syntax is unforgiving, robots.txt is worth handling carefully. It is a core part of technical SEO basics, and it works hand in hand with indexing controls.

Frequently Asked Questions

Does robots.txt stop a page from appearing in Google?

Not reliably. Robots.txt blocks crawling, but a blocked URL can still be indexed if other pages link to it. To keep a page out of search results, use a noindex tag on the page rather than a robots.txt disallow.

What happens if my robots.txt blocks everything?

A rule like "Disallow: /" for all user agents tells crawlers to avoid your entire site, which can remove you from Google. This often happens when a staging site's robots.txt is accidentally pushed live — always check it if rankings disappear.

Where is my robots.txt file?

It lives at the root of your domain: yoursite.com/robots.txt. You can view any site's file by visiting that URL. Most website platforms generate a sensible default, but it is worth confirming yours is not blocking anything important.
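
Since the file always sits at the root, its address can be worked out from any page on the same site. Here is a small Python sketch of that idea; the function name robots_txt_url and the example page URL are made up for illustration, and the input is assumed to be a full URL including https://.

```python
from urllib.parse import urljoin


def robots_txt_url(any_page_url: str) -> str:
    """Build the robots.txt address for the site that hosts the given page."""
    # A leading slash makes urljoin replace the whole path, query included.
    return urljoin(any_page_url, "/robots.txt")


location = robots_txt_url("https://yoursite.com/services/seo?ref=nav")
print(location)
```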
