How to Prevent Search Engines from Indexing a Page
I recently helped a client delete an old YouTube video from their channel. They hadn’t intended to make it publicly available, and didn’t realize that it was until they searched for themselves on Google.
Deleting the video won’t immediately remove the page from Google’s index, but it got me thinking about reasons that someone might want to prevent search engines from indexing a page before any damage is done.
3 Reasons to Block Google from Indexing a Page
There are very few pressing reasons to learn how to prevent Google from indexing a page, but here are some of the marketing reasons for doing so.
1. Improve Your Tracking and Goal Attribution
For many webmasters and marketers, goals for form completions are tracked by visits to a Thank You page. In order to prevent your Thank You page from accidentally receiving organic traffic, you’ll want to learn how to prevent Google from indexing the page entirely.
If you have organic traffic landing on your page in addition to users who have filled out your forms, your goals and goal conversion rate won’t be accurate.
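To make the distortion concrete, here is a small sketch with made-up numbers (the session and submission counts are purely hypothetical, not from any real account). It assumes the conversion rate is simply goal completions divided by sessions, and that every visit to the Thank You page counts as a goal completion.

```python
def conversion_rate(goal_completions, sessions):
    """Goal completions divided by sessions, as a fraction."""
    return goal_completions / sessions


# Hypothetical numbers for illustration only.
sessions = 1000                  # total visits to the site
form_submissions = 20            # visitors who really filled out the form
organic_thank_you_landings = 10  # visitors who landed on the Thank You page from search

true_rate = conversion_rate(form_submissions, sessions)
# Every Thank You page visit is counted as a goal, so stray landings inflate the total.
reported_rate = conversion_rate(form_submissions + organic_thank_you_landings, sessions)

print(f"True conversion rate:     {true_rate:.1%}")      # 2.0%
print(f"Reported conversion rate: {reported_rate:.1%}")  # 3.0%
```

In this example, ten stray organic visits are enough to overstate the conversion rate by half.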
2. Reduce Pages with No User Value
While it’s an overly simplistic model, you can almost imagine that your site has a pool of SEO value.
For a site with 10 pages, each page gets approximately 1/10th of the SEO value. If the site owner has learned how to do keyword research and optimized all of their pages, all of those pages will be efficient and effective at generating organic traffic.
Conversely, imagine a site with 100 pages. There are four pages that actually talk about a business’s services, and the other 96 pages are “blog posts” that are really just the owner dumping information onto their site. These pages aren’t addressing known audience needs, and are not optimized for any relevant keyword groups.
In our simplified model, the pool of SEO value is spread thin. Each of the four services pages receives 1/100th of the site’s aggregate SEO value, making them very weak even though they are relatively optimized. The other 96 pages are receiving 96/100ths of the value, but they are dead ends that trap and waste your website’s ranking potential.
Learning how to prevent search engines from indexing a page (or 96) is a great way to keep your site’s SEO value from being spread too thin. You can hide most of your website from search engines so that Google only knows about the useful and relevant pages that deserve to be found.
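The arithmetic behind this simplified model can be sketched in a few lines. This is only the toy model described above, where the value is split evenly across indexed pages; it is not how search engines actually score a site, and the function name is my own.

```python
from fractions import Fraction


def share_per_page(indexed_pages):
    """Even split of the site's 'SEO value' across indexed pages (toy model only)."""
    return Fraction(1, indexed_pages)


service_pages = 4

# All 100 pages indexed: the four services pages hold 4/100 of the pool combined.
before = service_pages * share_per_page(100)

# The 96 low-value pages are kept out of the index: only the four services pages remain.
after = service_pages * share_per_page(4)

print(share_per_page(100), before)  # 1/100 1/25
print(share_per_page(4), after)     # 1/4 1
```

Under these assumptions, each services page goes from 1/100th of the pool to 1/4 of it.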
3. Avoid Duplicate Content Issues
Publishing a page that is identical or nearly identical to another page on the internet can cause some unnecessary decision-making for Google.
Which page is the original? Even if one of the pages was published first, is the duplicate page that followed the more authoritative source? If both pages are on your website, which one did you intend to be included in Google’s search results? You may not always like the outcome.
To avoid causing duplicate content issues, you may try to stop bots from crawling certain pages on your site.
How to Keep Google from Indexing a Page
The easiest and most common method to prevent search engines from indexing a page is to include the meta noindex tag.
Include the Noindex Tag
The noindex meta tag is used in between the <head></head> HTML tags on a web page to prevent search engine bots from including that page in their index. This still allows crawlers to read through your pages, but it suggests that they don’t include a copy of it to serve up in their search results.
The noindex tag to prevent search engines from indexing a page looks like this:
<meta name="robots" content="noindex">
If you’re only worried about preventing Google from indexing a page, you can use the following code:
<meta name="googlebot" content="noindex">
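If you want to double-check that the tag actually made it into a page’s markup, a short script can do it. The sketch below uses only Python’s standard library; the class and function names are my own, and it inspects an HTML string you supply rather than fetching a live page.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in robots/googlebot meta tags."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # e.g. {"robots": {"noindex", "nofollow"}}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {key: (value or "") for key, value in attrs}
        name = attr_map.get("name", "").strip().lower()
        if name in ("robots", "googlebot"):
            content = attr_map.get("content", "")
            values = {part.strip().lower() for part in content.split(",") if part.strip()}
            self.directives.setdefault(name, set()).update(values)


def has_noindex(html, bot="robots"):
    """Return True if the HTML contains a noindex meta tag addressed to `bot`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives.get(bot, set())


page = '<html><head><meta name="robots" content="noindex"></head><body></body></html>'
print(has_noindex(page))                  # True
print(has_noindex(page, bot="googlebot")) # False
```

Note that this only looks at meta tags in the markup you pass in. It says nothing about whether a search engine has recrawled the page yet.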
If you’re using WordPress as your CMS (which I highly recommend), then you may want to use the Yoast SEO plugin (which I also highly recommend). With a couple clicks of your mouse, you can add the noindex tag to any page that you desire.
In the backend of any page, scroll down to your Yoast SEO box. Then click the gear icon, and change the drop down field that says “Allow search engines to show this Post in search results?” to say “No.”
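Google honors the noindex tag and will drop the page from its results once it recrawls the page, but less reputable bots can choose to ignore it. The tag also only works if crawlers are able to read the page, so a page that is blocked in robots.txt won’t have its noindex tag seen. If your goal is instead to keep bots from crawling a page in the first place, you can use your robots.txt file.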
Disallow Bots in Your Robots.txt
If you want to be sure that bots like Googlebot and Bingbot can’t crawl your pages at all, you can add directives to your robots.txt file.
Robots.txt is a plain-text file in the root of your website that tells bots which URLs they are not allowed to crawl. It’s important to note that compliance is voluntary: some bots are built to ignore your robots.txt file, so you can really only block the “good” bots with this technique. Also keep in mind that robots.txt controls crawling rather than indexing, so a disallowed URL can still show up in search results if other pages link to it.
Let’s use a page on your site, https://www.mysite.com/example-page/, as an example. To disallow all bots from accessing this page, you would use the following code in your robots.txt:
User-agent: *
Disallow: /example-page/
Notice that you don’t have to use your full URL, just the path that comes after your domain name. If you only want to block Googlebot from crawling the page, you could use the following code:
User-agent: Googlebot
Disallow: /example-page/
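Before uploading a robots.txt change, it can help to test the rules locally. Here is a minimal sketch using Python’s built-in urllib.robotparser. The rule sets are my assumption of what the two cases described above would look like (a standard User-agent line followed by a Disallow line for /example-page/), and Python’s parser is only an approximation of how any particular search engine reads the file.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt):
    """Parse robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Assumed rules: block every bot from the example page.
ALL_BOTS = """\
User-agent: *
Disallow: /example-page/
"""

# Assumed rules: block only Googlebot from the example page.
GOOGLEBOT_ONLY = """\
User-agent: Googlebot
Disallow: /example-page/
"""

url = "https://www.mysite.com/example-page/"
all_rules = build_parser(ALL_BOTS)
google_rules = build_parser(GOOGLEBOT_ONLY)

print(all_rules.can_fetch("Googlebot", url))     # False
print(all_rules.can_fetch("Bingbot", url))       # False
print(google_rules.can_fetch("Googlebot", url))  # False
print(google_rules.can_fetch("Bingbot", url))    # True
```

The last line shows the difference between the two versions: with the Googlebot-only rules, other well-behaved bots are still allowed to crawl the page.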
Stop Bots from Crawling Your Site with .htaccess
I personally don’t know any clients who would ever need to use this, but you can use your .htaccess file to block any user-agent from crawling your site.
Unlike robots.txt, this block is enforced by your server, so a bot can’t simply choose to ignore it. It only works as long as the bot identifies itself with the user-agent string you are matching, though, and a “bad” bot can send a different one. The other caveat is that this is more of a sweeping solution, and less targeted to a specific page. Managing the targeted denial of access for several pages inside of your .htaccess file would be a nightmare.
The code to block Googlebot only would look like this:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteRule .* - [F,L]
If you want to block several bots at a time, you can set your code up like this:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^.*(Googlebot|Bingbot|Baiduspider).*$ [NC]
RewriteRule .* - [F,L]
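If you’d like to see which visitors the multi-bot pattern above would catch before touching a live server, you can try the same regular expression in Python. This is only a rough stand-in: Apache uses its own regex engine, re.IGNORECASE plays the role of the [NC] flag, and the function name is mine. The sample user-agent strings are typical examples, not an exhaustive list.

```python
import re

# Same pattern as the RewriteCond; re.IGNORECASE stands in for [NC].
BLOCKED_BOTS = re.compile(r"^.*(Googlebot|Bingbot|Baiduspider).*$", re.IGNORECASE)


def would_be_blocked(user_agent):
    """Return True if the user-agent string matches the blocked-bot pattern."""
    return BLOCKED_BOTS.search(user_agent) is not None


samples = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
]

for user_agent in samples:
    print(would_be_blocked(user_agent), user_agent)
```

The first two strings match and would receive a 403 Forbidden response from the rule, while the ordinary browser string does not match.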
Learning how to prevent search engines from indexing one of your pages is sometimes necessary, and not very difficult depending on how you choose to do it.