Robots.txt in Plain English: What It Does, When It Matters, and Copy-Paste Templates

There exists, in the root directory of nearly every website on the internet, a small text file that most business owners have never seen, never edited, and -- in many cases -- never even heard of. It is called robots.txt, and despite its unassuming name, it wields a remarkable amount of power over whether search engines can actually find and index your website.

The truth is, this file is simultaneously one of the simplest and one of the most misunderstood elements of technical SEO. Simple because it is, quite literally, a plain text file with a handful of basic commands. Misunderstood because the misconceptions surrounding it are so widespread that they have become a kind of folk wisdom in the SEO world -- and much of that folk wisdom is, to put it diplomatically, wrong.

This guide will explain what robots.txt actually does, what it does not do (a critically important distinction), when you need to care about it, and when you can safely leave it alone. And because we believe in practical value over theoretical knowledge, we have included copy-paste templates for the most common scenarios. Let us begin.

What Robots.txt Actually Is -- In Words a Business Owner Would Use

Imagine your website is a building with many rooms. Search engines like Google send automated programs -- called crawlers or bots -- to visit your building and look in every room. The robots.txt file is essentially a sign posted at the entrance that says: "Welcome, but please do not go into rooms 3 and 7."

That is it. That is all it does.

More technically, robots.txt is a plain text file that lives at yoursite.com/robots.txt. When a search engine bot arrives at your website, the very first thing it does -- before looking at any of your pages -- is check this file for instructions. The file tells the bot which parts of your site it is allowed to crawl and which parts it should skip.

The key word here is "should." And this brings us to the single most important misconception about robots.txt.

The Misconception That Could Cost You: Robots.txt Does NOT Hide Pages

This cannot be stated firmly enough, because the misunderstanding is genuinely dangerous: robots.txt does not prevent your pages from appearing in Google search results. It merely asks search engine bots not to crawl those pages.

Why does this distinction matter? Because if another website links to a page you have blocked in robots.txt, Google may still index that page. It will appear in search results with a message like "No information is available for this page" -- but it will appear. Google knows the page exists; it simply has not been allowed to read it.

If you genuinely need to prevent a page from appearing in search results, robots.txt is the wrong tool. You need either a noindex meta tag or a login requirement. Robots.txt is about controlling crawling, not about controlling indexing, and the difference between these two concepts is, in the end, the difference between asking someone not to visit your house and asking them not to talk about your house.

The Syntax -- In Plain Language

The robots.txt file uses a remarkably simple syntax. There are really only four commands you need to understand:

User-agent

This specifies which crawler the following rules apply to. Think of it as addressing a specific delivery driver by name.

User-agent: * means "these rules apply to all crawlers"
User-agent: Googlebot means "these rules apply only to Google's crawler"
User-agent: Bingbot means "these rules apply only to Bing's crawler"

For most small business websites, User-agent: * is all you need, because you want the same rules for everyone.

Disallow

This tells the specified crawler not to access a particular path on your site.

Disallow: /admin/ means "do not crawl anything in the admin directory"
Disallow: /private-page.html means "do not crawl this specific page"
Disallow: / means "do not crawl anything at all" -- the entire site

An empty disallow -- Disallow: with nothing after it -- means "everything is allowed." This is an important nuance.
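To see how these Disallow rules behave, here is a minimal Python sketch using the standard library's urllib.robotparser. The rules and the yoursite.com URLs are the illustrative examples from this guide, and Python's parser is only a rough stand-in for a real crawler, not a copy of Google's logic.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only; the paths are the examples used in this guide.
RULES = """\
User-agent: *
Disallow: /admin/
Disallow: /private-page.html
"""


def is_allowed(rules_text: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may crawl `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "https://yoursite.com/admin/settings",
        "https://yoursite.com/private-page.html",
        "https://yoursite.com/about",
    ):
        print(url, "->", "allowed" if is_allowed(RULES, url) else "blocked")
```

The first two URLs are reported as blocked and the third as allowed. Swapping RULES for a file containing only an empty Disallow: line makes every URL allowed.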

Allow

This is used to make exceptions within a broader disallow rule.

If you have blocked /admin/ but want Google to see /admin/public-page.html, you would use:

Disallow: /admin/
Allow: /admin/public-page.html
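The exception works because Google, following the robots.txt standard (RFC 9309), applies the most specific matching rule, meaning the one with the longest path, and prefers Allow when two rules tie. The sketch below illustrates that precedence in Python. The decide function is a hypothetical helper written for this guide, and it deliberately ignores wildcards (* and $), so treat it as an illustration rather than a full parser.

```python
def decide(rules: list[tuple[str, str]], path: str) -> bool:
    """Return True if `path` may be crawled.

    `rules` holds ("allow" | "disallow", path_prefix) pairs for a single
    user-agent group. Wildcards (* and $) are deliberately not handled.
    """
    best_len = -1
    allowed = True  # no matching rule means the path is allowed
    for kind, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # empty rules and non-matching prefixes are ignored
        is_allow = kind == "allow"
        # The longer prefix wins; on equal length, allow beats disallow.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


# The two rules from the example above.
RULES = [("disallow", "/admin/"), ("allow", "/admin/public-page.html")]
```

With these rules, decide(RULES, "/admin/public-page.html") returns True, while decide(RULES, "/admin/settings") returns False. Some simpler parsers apply rules in file order instead, so results can differ from crawler to crawler.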

Sitemap

This tells crawlers where your XML sitemap is located. It is not technically a "rule" -- it is more like a helpful note at the bottom of the file.

Sitemap: https://yoursite.com/sitemap.xml

That is the entire vocabulary. Four words. If you understand these four commands, you understand robots.txt.

How to Check Your Current Robots.txt

Before changing anything, you should see what you currently have. This takes approximately ten seconds:

Open your web browser
Type your website address followed by /robots.txt
For example: https://yoursite.com/robots.txt
Press Enter

You will see one of three things:

A text file with rules -- Your site has a robots.txt file. Read on to understand whether the rules make sense.

A 404 error page -- Your site does not have a robots.txt file. This is perfectly fine for most small websites. Google will simply crawl everything, which is usually what you want.

A blank page -- Your site has an empty robots.txt file. Also fine -- same effect as not having one.
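If you look after several sites, the same check can be scripted. The following is a minimal Python sketch using only the standard library. The function names classify and fetch_robots are our own, and https://yoursite.com is the placeholder domain used throughout this guide.

```python
import urllib.error
import urllib.request


def classify(status: int, body: str) -> str:
    """Map an HTTP status and response body to one of the three outcomes."""
    if status == 404:
        return "no robots.txt file (crawlers will crawl everything)"
    if status == 200 and not body.strip():
        return "empty robots.txt (same effect as having none)"
    if status == 200:
        return "robots.txt with rules (review them)"
    return f"unexpected HTTP status {status}"


def fetch_robots(site: str) -> tuple[int, str]:
    """Fetch <site>/robots.txt and return (status, body)."""
    url = site.rstrip("/") + "/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; keep the status code, drop the body.
        return err.code, ""
```

Calling classify(*fetch_robots("https://yoursite.com")) with your own domain prints one of the three outcomes described above. Network failures such as an unreachable host are not handled here and will raise an exception.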

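Google Search Console also helps here. Its robots.txt report shows the robots.txt file Google last fetched for your site, along with any errors or warnings, and the URL Inspection tool tells you whether a specific URL is blocked. (Google retired its older standalone robots.txt Tester in 2023.) If you have Search Console set up (and you should), this is the most reliable way to verify your configuration.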

Copy-Paste Templates for Common Scenarios

Here is the part that, I suspect, most readers came for. Below are ready-to-use robots.txt templates for the most common situations. Choose the one that matches your setup, copy it, and paste it into your robots.txt file.

Template 1: Basic Business Website (Allow Everything)

This is the right choice for most small business websites. You want Google to crawl and index everything.

User-agent: *
Disallow:

Sitemap: https://yoursite.com/sitemap.xml

That is it. The empty Disallow: line explicitly tells all crawlers that nothing is blocked. The sitemap line helps them find your content efficiently.

Template 2: WordPress Website

WordPress creates certain directories that you generally do not want cluttering up Google's crawl -- the admin area, login pages, and internal search results.

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-login.php
Disallow: /?s=
Disallow: /search/

Sitemap: https://yoursite.com/sitemap.xml

The Allow: /wp-admin/admin-ajax.php line is important -- this file is needed for many WordPress themes and plugins to function correctly, and blocking it can cause rendering issues for Google.

Template 3: Shopify Store

Shopify generates certain internal pages that are not useful for search engines.

User-agent: *
Disallow: /admin
Disallow: /cart
Disallow: /orders
Disallow: /checkouts/
Disallow: /checkout
Disallow: /account
Disallow: /collections/*+*
Disallow: /blogs/*+*
Disallow: /*?*variant=*
Disallow: /search

Sitemap: https://yoursite.com/sitemap.xml

Note that Shopify actually manages your robots.txt file automatically in most cases. Check before overriding.

Template 4: Website Under Construction

If your website is being built and you do not want Google crawling incomplete pages (for a build that must stay out of search results entirely, put it behind a login instead):

User-agent: *
Disallow: /

Important: Remember to remove this before you launch. We have seen businesses lose months of potential visibility because someone forgot to update robots.txt after their site went live. Set a calendar reminder.

Template 5: Block Specific AI Crawlers

If you want to allow regular search engines but block certain AI crawlers from training on your content:

User-agent: *
Disallow:

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

Sitemap: https://yoursite.com/sitemap.xml

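This allows Google and Bing to crawl everything while blocking the listed AI crawlers. User-agent names differ by vendor and change over time (ChatGPT-User, for instance, fetches pages on behalf of a user rather than for training), so confirm the current names in each vendor's documentation. Whether this is the right choice for your business depends on your perspective on AI training, but the option exists.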
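Before publishing a file with several User-agent groups, it helps to confirm that each bot ends up in the group you intended. Here is a minimal Python sketch using the standard library's urllib.robotparser on a shortened version of the template. The blocked_agents helper is hypothetical, and Python's matching of bot names is an approximation of how each real crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# Shortened version of Template 5, for illustration.
TEMPLATE = """\
User-agent: *
Disallow:

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""


def blocked_agents(rules_text: str, agents: list[str],
                   url: str = "https://yoursite.com/") -> list[str]:
    """Return the agents from `agents` that may NOT fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    print(blocked_agents(TEMPLATE, ["Googlebot", "Bingbot", "GPTBot", "CCBot"]))
```

Running it prints ['GPTBot', 'CCBot'], which shows that the general-purpose search crawlers fall through to the User-agent: * group while the named bots get their own rules.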

Common Mistakes That Genuinely Hurt

In our experience auditing thousands of small business websites, these are the robots.txt errors we encounter most frequently -- and each one has real consequences.

Mistake 1: Blocking CSS and JavaScript Files

Some older robots.txt configurations include lines like Disallow: /wp-content/themes/ or Disallow: /*.css$. This was common advice a decade ago. Today, it actively harms your SEO because Google needs to render your page fully -- CSS, JavaScript, images, and all -- to understand it properly. If you block these resources, Google sees a broken, unstyled page and may rank it accordingly.

The fix: Remove any rules that block CSS, JavaScript, or image files. Google should see your pages exactly as your visitors do.
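A quick scan can flag rules of this kind. The Python sketch below is a rough heuristic: the RISKY pattern list is our own assumption based on the examples in this section, not an exhaustive audit, and it may flag harmless rules (for example, anything containing ".js" also matches ".json").

```python
import re

# Patterns that suggest a rule blocks CSS, JavaScript, or theme assets.
# This list is an assumption for illustration, not an exhaustive audit.
RISKY = re.compile(r"\.css|\.js|/wp-content/themes/", re.IGNORECASE)


def risky_disallows(rules_text: str) -> list[str]:
    """Return the Disallow lines that appear to block page resources."""
    found = []
    for raw in rules_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and RISKY.search(value):
            found.append(line)
    return found
```

Any line this returns deserves a second look. Whether it is actually harmful depends on what lives at that path on your site.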

Mistake 2: Blocking the Entire Site Accidentally

The line Disallow: / blocks everything. It is a single character -- a forward slash -- that can make your entire website invisible to search engines. We have seen this left in place after site migrations, after testing, after a developer forgot to revert a temporary change. The consequences are severe and can take weeks to recover from.

The fix: If you see Disallow: / under User-agent: * rather than under a User-agent line that targets a specific bot (like an AI crawler), something is almost certainly wrong. Change it to Disallow: (with nothing after it) or remove the line entirely.
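This is an easy mistake to catch automatically, for instance as a pre-launch check. Here is a minimal Python sketch with the standard library's urllib.robotparser. The blocks_everyone function is a hypothetical helper, and it only tests the homepage path for the generic "*" agent.

```python
from urllib.robotparser import RobotFileParser


def blocks_everyone(rules_text: str) -> bool:
    """Return True if the generic '*' agent is not allowed to crawl the homepage."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return not parser.can_fetch("*", "/")


if __name__ == "__main__":
    live_rules = "User-agent: *\nDisallow: /\n"  # the dangerous configuration
    if blocks_everyone(live_rules):
        print("WARNING: robots.txt blocks the whole site for all crawlers")
```

A file that blocks only a named bot, such as a GPTBot group with Disallow: /, does not trigger the warning, because the check asks about the generic agent.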

Mistake 3: Using Robots.txt Instead of Noindex

As discussed above, if your goal is to keep a page out of Google's search results, robots.txt is the wrong tool. Use a <meta name="robots" content="noindex"> tag in the page's HTML instead. If you block the page with robots.txt, Google cannot see the noindex tag, which means Google cannot obey the noindex instruction, which means the page might appear in search results anyway. It is, as they say, a catch-22.
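The catch-22 can be detected when you have both the robots.txt text and the page's HTML to hand. The Python sketch below uses only the standard library. The noindex_is_hidden helper and the sample inputs are our own, and it looks only at the robots meta tag, not at HTTP headers.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class _RobotsMeta(HTMLParser):
    """Records whether a <meta name="robots"> tag contains noindex."""

    def __init__(self) -> None:
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            if "noindex" in (attr.get("content") or "").lower():
                self.noindex = True


def noindex_is_hidden(rules_text: str, url: str, html: str,
                      agent: str = "Googlebot") -> bool:
    """True when the page carries noindex but robots.txt stops `agent` reading it."""
    finder = _RobotsMeta()
    finder.feed(html)
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return finder.noindex and not parser.can_fetch(agent, url)
```

When this returns True, the usual remedy is to remove the Disallow rule for that page so the crawler can read the noindex tag.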

Mistake 4: Different Robots.txt on HTTP and HTTPS

If your site is accessible on both http://yoursite.com and https://yoursite.com, each version has its own robots.txt. Make sure they are identical, or -- better yet -- ensure that HTTP redirects to HTTPS so there is only one version to worry about.

Mistake 5: Forgetting the Sitemap Line

The Sitemap: directive in robots.txt is an easy, passive way to ensure every search engine that visits your site knows where your sitemap is. Not including it means relying entirely on Search Console and Bing Webmaster Tools for sitemap discovery. Adding it costs nothing and takes five seconds.
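To confirm the line is present and readable, the standard library's urllib.robotparser can list the sitemap URLs it finds (its site_maps method requires Python 3.8 or later). This is a minimal sketch, and the sitemap_urls wrapper is our own naming.

```python
from urllib.robotparser import RobotFileParser


def sitemap_urls(rules_text: str) -> list[str]:
    """Return every Sitemap URL declared in the robots.txt text (may be empty)."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    # site_maps() returns None when no Sitemap line exists.
    return parser.site_maps() or []


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow:\n\nSitemap: https://yoursite.com/sitemap.xml\n"
    print(sitemap_urls(rules))
```

An empty list means the file declares no sitemap, which is the situation this section suggests fixing.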

When You Genuinely Need to Edit Robots.txt

Not every website needs a custom robots.txt file. In fact, for many small business sites, the default configuration -- either no file at all, or the one your site builder generates automatically -- is perfectly adequate.

You should consider editing robots.txt when:

Your site has internal search pages that create infinite URL combinations Google could try to crawl
You have staging or development areas accessible on the live domain that should not be crawled
Your CMS generates duplicate content paths (like tag pages, author archives, or date-based archives that add no unique value)
You want to manage crawl budget because your site has thousands of pages and Google is spending too much time on unimportant ones
You want to block specific AI training crawlers from accessing your content

You probably do not need to edit robots.txt when:

Your site has fewer than 50 pages and you want everything indexed
Your site builder (Wix, Squarespace, Shopify) manages it automatically
You are not experiencing any crawling or indexing issues in Search Console

The principle here is simple: if it is not broken, do not fix it. And if it is broken, the fix is almost always a two-minute edit to a text file.

How to Edit Your Robots.txt File

The process depends on your platform:

WordPress: Install the Yoast SEO or Rank Math plugin. Both provide a robots.txt editor in their settings panel -- no FTP access or coding required.

Shopify: Navigate to Online Store > Themes > Actions > Edit code. Look for the robots.txt.liquid file in the Templates section; if it is not there, use "Add a new template" and choose robots.txt to create it.

Squarespace: Squarespace handles robots.txt automatically and does not provide direct editing. This is usually fine; their default configuration is sensible.

Wix: Wix generates robots.txt automatically. You can view it at yoursite.com/robots.txt, and Wix's SEO settings include a robots.txt editor if you need to change it.

Custom website: Upload or edit the robots.txt file in the root directory of your web server via FTP, cPanel, or your hosting provider's file manager.

After making changes, always verify using Google Search Console's URL Inspection tool to confirm that your important pages are not accidentally blocked.

Testing Your Robots.txt Changes

Before and after any changes, use these free tools to verify everything is working correctly:

Google Search Console: The URL Inspection tool will tell you whether Google can access any specific page, including whether it is blocked by robots.txt
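Google Search Console's robots.txt report: Found under Settings, this shows which robots.txt files Google found for your site, when it last fetched them, and any warnings or errors (it replaced the older robots.txt Tester)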
Your browser: Simply visit yoursite.com/robots.txt to confirm the file looks correct

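One additional note: Google typically picks up robots.txt changes within about a day, since it caches the file for up to 24 hours. If Google has already indexed pages you are now blocking, those pages will not disappear from search results just because they are blocked; as explained earlier, a blocked URL can stay in the index without a description. To remove a page, leave it crawlable and add a noindex tag, then block it (if you still want to) only after it has dropped out of the results, which can take several weeks.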

The Connection to Your Broader SEO Strategy

Robots.txt is one piece of a larger technical SEO picture. It works in concert with your XML sitemap (which tells Google what to crawl), your canonical tags (which tell Google which version of a page to prefer), and your meta robots tags (which tell Google whether to index specific pages). Together, these tools give you precise control over how search engines interact with your website.

If you are looking for a broader understanding of technical SEO -- what matters, what does not, and where to focus your limited time -- our Technical SEO for Business Owners hub covers the complete landscape in the same plain-English approach.

And if you would rather start with a clear picture of where your website stands today -- including any robots.txt issues -- our free SEO check will scan your site in thirty seconds and explain everything it finds in language designed for business owners, not developers.

The Bottom Line

Robots.txt is a small file with an outsized reputation. For most small business websites, it requires little or no attention. But when something goes wrong -- an accidental block, a misconfigured rule, a forgotten development restriction -- the consequences can be significant.

The practical advice is this: check your robots.txt once. Make sure it is not blocking anything important. Add your sitemap URL if it is not there. Then, unless you encounter specific crawling issues in Google Search Console, leave it alone and focus your energy on the things that actually move rankings -- your content, your Google Business Profile, and the experience you provide to the people who find you.

Because in the end, robots.txt is not about mastering technical complexity. It is about making sure the door is open when Google comes knocking. And for most businesses, that door is already open. You just need to verify it.