How To Fix the “Indexed Though Blocked by robots.txt” Error?

If content that you intended to block using the robots.txt file has still been indexed by search engines, there could be a few reasons for this.

In this blog post, I’ll go through a step-by-step guide on how to fix it.

Read When Should You Use a Robots.txt file?

What is “Indexed Though Blocked by robots.txt”?

When Google and other search engines crawl your website, they refer to the robots.txt file to determine which pages or directories of your website they can or cannot crawl.

In most cases, pages that you block via robots.txt do not appear in search engine results, because search engines cannot crawl their content.

However, the “Indexed Though Blocked by robots.txt” error means that a page has been indexed by Google even though it’s disallowed in your robots.txt file. In this case, Google has indexed the URL without being able to crawl its content.

Why Does it Happen?

There are several reasons why this can occur. Here are some common ones:

  1. Prior Indexing: If the page was indexed before being blocked, Google may continue to show it in search results.
  2. External Links: Even if a page is blocked by robots.txt, search engines may still index it if they find a link to that page on another website.
  3. Inaccurate robots.txt Rules: The blocking rule may be incorrect or not specific enough.
  4. Search Engine Issue: Sometimes search engines make mistakes.

Implications for SEO

While the impact can vary, it’s generally not a good sign for your SEO for the following reasons:

  • Inconsistent Signals: It sends inconsistent signals to search engines, leading to inefficient crawling and indexing.
  • Wasted Crawl Budget: Search engines have a limited crawl budget for each website. Blocked but indexed pages can waste this budget.
  • Quality and Relevance: Pages that you intended to block may not be the best content to represent your website.

Fix the “Indexed Though Blocked by robots.txt” Error

Here are steps you can take to address the issue:

Step 1: Verify the error:

Before taking any corrective measures, make sure to verify the error. In your Google Search Console account, use the URL Inspection tool to check whether the affected pages have actually been indexed.

Copy and paste the URL into the inspection field and let Google retrieve data from the Google index.

If the result is “URL is not on Google”, the page is not indexed. But if the result is “URL is on Google”, the page is already indexed and you need to fix this.
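
If you have many URLs to check, Google also offers a URL Inspection API in the Search Console API that returns the same verdict programmatically. Below is a rough Python sketch under a few assumptions: you already have an OAuth 2.0 access token with Search Console access, yourdomain.com is a placeholder for your verified property, and the exact endpoint and field names should be confirmed against Google's current API documentation.

```python
import json
from urllib import request

# Assumptions: ACCESS_TOKEN is a valid OAuth 2.0 token with Search Console
# access, and the site/URL values are placeholders for your own property.
ACCESS_TOKEN = "ya29.your-oauth-token"
body = json.dumps({
    "inspectionUrl": "https://yourdomain.com/private/page.html",
    "siteUrl": "https://yourdomain.com/",
}).encode("utf-8")

req = request.Request(
    "https://searchconsole.googleapis.com/v1/urlInspection/index:inspect",
    data=body,
    headers={
        "Authorization": f"Bearer {ACCESS_TOKEN}",
        "Content-Type": "application/json",
    },
)

with request.urlopen(req) as resp:
    result = json.loads(resp.read())

# The index status (e.g. "Indexed, though blocked by robots.txt") is reported
# under inspectionResult -> indexStatusResult in the response.
print(result.get("inspectionResult", {}).get("indexStatusResult", {}))
```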

Step 2: Check robots.txt file placement:

Ensure that your robots.txt file is placed in the root directory of your website and is accessible via the correct URL https://yourdomain.com/robots.txt.

If the file is not in the right location or if it’s not accessible, web crawlers won’t be able to read it.
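
As a quick sanity check, you can fetch the file yourself and confirm that it returns an HTTP 200 status. A minimal sketch using Python's standard library, with yourdomain.com standing in for your own site:

```python
from urllib import error, request

# Placeholder domain; replace with your own site.
ROBOTS_URL = "https://yourdomain.com/robots.txt"

try:
    with request.urlopen(ROBOTS_URL) as response:
        # A 200 status means crawlers can retrieve the file.
        print("Status:", response.status)
        print(response.read().decode("utf-8"))
except error.HTTPError as e:
    # A 404 (or any other error) here means crawlers cannot read your rules.
    print("robots.txt could not be fetched:", e.code)
```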

Step 3: Check the robots.txt file:

Double-check your robots.txt file to ensure that the URLs you want to block are correctly specified in the Disallow directives.

Make sure that you haven’t made any typos or errors in the syntax, and remember that paths in robots.txt rules are case-sensitive.
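
One way to test a rule before relying on it is Python's built-in robots.txt parser. The sketch below uses a hypothetical /private/ directory and shows how case sensitivity affects matching:

```python
from urllib import robotparser

# Hypothetical rules; in practice, load the contents of your own robots.txt.
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Paths are matched case-sensitively, so only the lowercase URL is blocked.
print(rp.can_fetch("*", "https://yourdomain.com/private/page.html"))  # False
print(rp.can_fetch("*", "https://yourdomain.com/Private/page.html"))  # True
```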

Step 4: Use noindex meta tags:

If you want to prevent specific pages from being indexed, you can use the <meta name="robots" content="noindex"> meta tag in the HTML of those pages.

This tag tells search engines not to index the content. Keep in mind that crawlers can only see the tag if they are allowed to crawl the page, so for Google to process a noindex directive on a blocked URL, you may first need to remove the corresponding Disallow rule from robots.txt and let the page be recrawled.
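
To confirm that the tag is actually being served, you can fetch the page and look for a robots meta tag. A minimal sketch using Python's standard library, with a placeholder URL:

```python
from html.parser import HTMLParser
from urllib import request

class RobotsMetaFinder(HTMLParser):
    """Collects the content of any <meta name="robots"> tags on a page."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.directives.append(attrs.get("content", ""))

# Placeholder URL; replace with the page you want to keep out of the index.
with request.urlopen("https://yourdomain.com/private/page.html") as resp:
    html = resp.read().decode("utf-8", errors="replace")

finder = RobotsMetaFinder()
finder.feed(html)
print(finder.directives)  # e.g. ['noindex'] if the tag is present
```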

Step 5: Use the Removals tool:

Some search engines provide webmaster tools that allow you to request the removal of specific URLs from their index.

Google, for example, has a “Remove URLs” tool in Google Search Console that can be used to request the removal of indexed URLs.

Keep in mind that this is a manual process and might not be suitable for large-scale removals. Also note that the tool only hides URLs temporarily (roughly six months), so combine it with noindex or another permanent measure. If you only want a few pages removed from search results quickly, it’s a great tool to use.

Step 6: Wait for recrawling:

After you’ve updated your robots.txt file or added noindex tags, you’ll need to wait for search engines to recrawl your website.

It might take some time for them to process the changes and reflect them in their indexes.

Step 7: Check for redirects:

Sometimes, content might still appear in search results due to redirects that lead to blocked content.

Check for any redirects that might be inadvertently allowing access to blocked content.
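
One quick way to spot such a redirect is to request the URL and compare where you end up. A small sketch using Python's standard library, again with a placeholder URL:

```python
from urllib import request

# Placeholder URL; replace with a page that is supposed to stay blocked.
start_url = "https://yourdomain.com/old-page"

# urlopen follows redirects automatically; geturl() reports the final URL.
with request.urlopen(start_url) as response:
    final_url = response.geturl()

if final_url != start_url:
    print(f"Redirect found: {start_url} -> {final_url}")
else:
    print("No redirect: the URL resolves to itself.")
```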

Step 8: Monitor your pages:

Continue to monitor Google Search Console to ensure the issue is resolved. It may take some time for changes to be reflected.

Conclusion

In some situations, fixing the “Indexed Though Blocked by robots.txt” error is crucial for the SEO health of your website.

It involves diagnosing the problem through Google Search Console, then accurately adjusting your robots.txt file or updating meta tags.

By taking these steps, you can make sure that only the pages you want to be indexed show up in search engine results.

Remember that search engines might take some time to reflect changes, and not all search engines handle the robots.txt file in the same way.

Be patient and monitor the situation over time to ensure that the content you want to block is no longer appearing in search results.

Read When Should You Use a Robots.txt file?
