Robots.txt Issues in Google Search Console
Hello SEOs,
I found multiple robots.txt URLs in my Google Search Console that I didn't even create. I don't know where GSC is fetching these URLs from:
1. https://example.com/robots.txt
2. https://subdomain.example.com/robots.txt
3. http://www.example.com/robots.txt
4. https://www.example.com/robots.txt
5. http://example.com/robots.txt
The main version of my website is the first one (https://example.com/robots.txt). I don't know how to remove the other robots.txt URLs. I need help with this.
Moreover, in Google Search Console >> Settings >> Crawl stats >> Hosts, I can see three different hosts for my site:
1. example.com
2. subdomain.example.com
3. www.example.com
The website is on WordPress. I have worked on a lot of websites and never faced such issues. Can anybody tell me whether these are technical issues? The website has more than 900 pages and only 10 are indexed. Google is not crawling my site's pages. The content on the website is related to healthcare and it's 100% AI-generated.
What should I do in order to make Google crawl my website and index its pages?
Hi @ahxnamc,
It sounds like you're dealing with a few common issues related to robots.txt and site configuration. Here are some key steps to fix the problem:
- Check for Duplicate Robots.txt Files: Ensure that only one robots.txt file is accessible, at the correct URL (e.g., https://example.com/robots.txt). The multiple entries might be due to your site resolving under different versions (www, non-www, subdomain). Use 301 redirects to point all variations to the preferred version; see the first sketch after this list.
- Canonicalization: Set a canonical URL for your main site version (e.g., https://example.com) in your WordPress settings and in the header of each page using a `<link rel="canonical">` tag.
- Check Crawl Stats: In Google Search Console, review your Crawl Stats and URL Parameters settings. Ensure Googlebot can access your site's pages without restrictions; the second sketch below is a quick way to test this.
- Content Quality: Since your content is AI-generated, ensure it's original, valuable, and well-optimized. Google may struggle with low-quality or duplicate content, which could affect crawling and indexing.
- Submit a Sitemap: Submit your XML sitemap to Google Search Console to help Google crawl and index your pages.
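Here's a minimal sketch of a redirect check (assuming Python with the `requests` library installed, and with example.com standing in for your real domain) that fetches each robots.txt variant without following redirects and reports the first hop:

```python
# Sketch: verify that every robots.txt variant 301-redirects to the
# preferred URL. Assumes `requests` is installed (pip install requests)
# and example.com is a placeholder for the real domain.
import requests

PREFERRED = "https://example.com/robots.txt"
VARIANTS = [
    "https://subdomain.example.com/robots.txt",
    "http://www.example.com/robots.txt",
    "https://www.example.com/robots.txt",
    "http://example.com/robots.txt",
]

for url in VARIANTS:
    # allow_redirects=False exposes the first response instead of the
    # final destination, so we see the actual status code and target.
    resp = requests.get(url, allow_redirects=False, timeout=10)
    location = resp.headers.get("Location", "<no Location header>")
    verdict = "OK" if resp.status_code == 301 and location == PREFERRED else "NEEDS FIX"
    print(f"{verdict}: {url} -> {resp.status_code} {location}")
```

If any variant answers 200 instead of 301, that host is still serving its own copy of robots.txt and is worth fixing first.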
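And to rule out the robots.txt rules themselves blocking Googlebot, here's a quick check using only Python's standard-library `urllib.robotparser` (the sample paths are hypothetical placeholders):

```python
# Sketch: ask the live robots.txt whether Googlebot may fetch sample URLs.
# Uses only the Python standard library; the paths below are hypothetical.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser("https://example.com/robots.txt")
parser.read()  # downloads and parses the live robots.txt

SAMPLE_URLS = [
    "https://example.com/",
    "https://example.com/sample-page/",  # substitute real URLs from the site
]

for url in SAMPLE_URLS:
    allowed = parser.can_fetch("Googlebot", url)
    print(f"{'allowed' if allowed else 'BLOCKED'}: {url}")
```

If pages come back BLOCKED here, it's the robots.txt rules, not the duplicate URLs, that are stopping the crawl.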
Once you've addressed these, Google should be able to crawl and index your pages more efficiently. Let me know if you need more help!
Hi @Khaldoon Al Mubarak
Thanks for the reply.
The redirects have been placed and there is only one robots.txt file. But multiple robots.txt URLs can still be seen in GSC, and in Crawl stats GSC is still crawling three hosts for the website:
1. example.com
2. subdomain.example.com
3. www.example.com
Google has fetched the Sitemap. It's been more than 20 days, but Google is not crawling the website's pages.
Fix the canonicalization and robots.txt issues immediately. Redirect non-preferred domain versions to the preferred one. Rewrite the AI-generated content to improve its quality. Regularly monitor GSC and crawl stats for changes. A quick way to spot-check the canonical tags is sketched below.
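Here's a small sketch for that spot-check (assuming Python with `requests` installed; the page URL is a placeholder) that extracts the `<link rel="canonical">` href from a page's HTML:

```python
# Sketch: print the canonical URL a page declares in its <head>.
# Assumes `requests` is installed; the page URL below is a placeholder.
from html.parser import HTMLParser
import requests

class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag seen."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "canonical" and self.canonical is None:
            self.canonical = attrs.get("href")

page = "https://example.com/"  # check a handful of real pages the same way
finder = CanonicalFinder()
finder.feed(requests.get(page, timeout=10).text)
print(f"{page} declares canonical: {finder.canonical}")
```

Every indexable page should declare the https://example.com version of itself; anything pointing at the www or subdomain host will keep sending Google mixed signals.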
Canonicalization has been done, and robots.txt is fixed. In short, everything has been done from my end.