Robots.txt Issues in Google Search Console
Hello SEOs,
I found multiple robots.txt URLs in my Google Search Console that I didn't even create. I don't know where GSC is fetching these URLs from:
1. https://example.com/robots.txt
2. https://subdomain.example.com/robots.txt
3. http://www.example.com/robots.txt
4. https://www.example.com/robots.txt
5. http://example.com/robots.txt
The main version of my website is the first one (https://example.com/robots.txt). I don't know how to remove the other robots.txt URLs. I need help with this.
Moreover, in Google Search Console >> Settings >> Crawl stats >> Hosts, I can see three different hosts for my site:
1. example.com
2. subdomain.example.com
3. www.example.com
The website is on WordPress. I have worked on a lot of websites and never faced such issues. Can anybody tell me whether these are technical issues? The website has more than 900 pages and only 10 are indexed. Google is not crawling my site's pages. The content on the website is related to healthcare and it's 100% AI-generated.
What should I do in order to make Google crawl my website and index its pages?
Hi @ahxnamc,
It sounds like you're dealing with a few common issues related to robots.txt and site configuration. Here are some key steps to fix the problem:
- Check for Duplicate Robots.txt Files: Ensure that only one robots.txt file is accessible, at the correct URL (e.g., https://example.com/robots.txt). The multiple entries might be due to your site resolving under different versions (www, non-www, subdomain). Use 301 redirects to point all variations to the preferred version; see the first sketch after this list.
- Canonicalization: Set a canonical URL for your main site version (e.g., https://example.com) in your WordPress settings and in the header of each page using a `<link rel="canonical">` tag.
- Check Crawl Stats: In Google Search Console, review your Crawl Stats and URL Parameters settings. Ensure Googlebot can access your site's pages without restrictions; the second sketch below is a quick way to test this.
- Content Quality: Since your content is AI-generated, ensure it's original, valuable, and well-optimized. Google may struggle with low-quality or duplicate content, which could affect crawling and indexing.
- Submit a Sitemap: Submit your XML sitemap to Google Search Console to help Google crawl and index your pages.
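Here's a minimal sketch of a redirect check (assuming Python with the `requests` library installed, and with example.com standing in for your real domain) that fetches each robots.txt variant without following redirects and reports the first hop:

```python
# Sketch: verify that every robots.txt variant 301-redirects to the
# preferred URL. Assumes `requests` is installed (pip install requests)
# and example.com is a placeholder for the real domain.
import requests

PREFERRED = "https://example.com/robots.txt"
VARIANTS = [
    "https://subdomain.example.com/robots.txt",
    "http://www.example.com/robots.txt",
    "https://www.example.com/robots.txt",
    "http://example.com/robots.txt",
]

for url in VARIANTS:
    # allow_redirects=False exposes the first response instead of the
    # final destination, so we see the actual status code and target.
    resp = requests.get(url, allow_redirects=False, timeout=10)
    location = resp.headers.get("Location", "<no Location header>")
    verdict = "OK" if resp.status_code == 301 and location == PREFERRED else "NEEDS FIX"
    print(f"{verdict}: {url} -> {resp.status_code} {location}")
```

If any variant answers 200 instead of 301, that host is still serving its own copy of robots.txt and is worth fixing first.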
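And to rule out the robots.txt rules themselves blocking Googlebot, here's a quick check using only Python's standard-library `urllib.robotparser` (the sample paths are hypothetical placeholders):

```python
# Sketch: ask the live robots.txt whether Googlebot may fetch sample URLs.
# Uses only the Python standard library; the paths below are hypothetical.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser("https://example.com/robots.txt")
parser.read()  # downloads and parses the live robots.txt

SAMPLE_URLS = [
    "https://example.com/",
    "https://example.com/sample-page/",  # substitute real URLs from the site
]

for url in SAMPLE_URLS:
    allowed = parser.can_fetch("Googlebot", url)
    print(f"{'allowed' if allowed else 'BLOCKED'}: {url}")
```

If pages come back BLOCKED here, it's the robots.txt rules, not the duplicate URLs, that are stopping the crawl.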
Once you've addressed these, Google should be able to crawl and index your pages more efficiently. Let me know if you need more help!
Hi @Khaldoon Al Mubarak
Thanks for the reply.
The redirects have been placed and there is only one robots.txt file. But multiple robots.txt URLs can still be seen in GSC, and in Crawl stats GSC is still crawling three hosts for the website:
1. example.com
2. subdomain.example.com
3. www.example.com
Google has fetched the Sitemap. It's been more than 20 days, but Google is not crawling the website's pages.
Fix the canonicalization and robots.txt issues immediately. Redirect non-preferred domain versions to the preferred one. Rewrite the AI-generated content to improve its quality. Regularly monitor GSC and crawl stats for changes. A quick way to spot-check the canonical tags is sketched below.
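Here's a small sketch for that spot-check (assuming Python with `requests` installed; the page URL is a placeholder) that extracts the `<link rel="canonical">` href from a page's HTML:

```python
# Sketch: print the canonical URL a page declares in its <head>.
# Assumes `requests` is installed; the page URL below is a placeholder.
from html.parser import HTMLParser
import requests

class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag seen."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "canonical" and self.canonical is None:
            self.canonical = attrs.get("href")

page = "https://example.com/"  # check a handful of real pages the same way
finder = CanonicalFinder()
finder.feed(requests.get(page, timeout=10).text)
print(f"{page} declares canonical: {finder.canonical}")
```

Every indexable page should declare the https://example.com version of itself; anything pointing at the www or subdomain host will keep sending Google mixed signals.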
Canonicalization has been done, and robots.txt is fixed. In short, everything has been done from my end.