Robots.txt and sitemap.xml files are used by all popular search engines. Both files are subject to specific rules and protocols, so there are a few limitations you need to know about.
- The robots.txt file must be at the top level of your domain. If you run websites in subdirectories of the main domain, they can’t have their own robots.txt; they must share the robots.txt file at the root of the domain.
- There is no guarantee that a robot or crawler visiting your website will use your robots.txt file. Most search engines read and obey robots.txt, but malicious robots will not.
- Some elements of the robots.txt file can be ignored by some search engines. For example, Google ignores the Crawl-delay directive.
- The sitemap's location determines the set of URLs it can include. A sitemap located at http://www.example.com/test/sitemap.xml can contain only URLs that start with http://www.example.com/test/.
- Only submit the main sitemap.xml index to the search engines; do not submit the individual sitemaps listed in the index. The index references all the individual sitemaps, and search engines know how to process them.
- Do not add URLs from other domains to your sitemap; the rule above about sitemap location applies here as well.
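
The two location rules above can be checked mechanically. Below is a minimal Python sketch using only the standard library. The function names `robots_txt_url` and `sitemap_can_contain` are hypothetical helpers made up for this illustration, and the checks are simplified: they compare scheme, host, and path prefix only, and do not cover exceptions such as cross-host sitemap submission.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the single robots.txt location that applies to page_url."""
    parts = urlsplit(page_url)
    # robots.txt lives at the root of the scheme + host,
    # no matter how deep the page sits in subdirectories.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def sitemap_can_contain(sitemap_url: str, page_url: str) -> bool:
    """Check whether page_url falls inside the scope of the sitemap's location."""
    sitemap = urlsplit(sitemap_url)
    page = urlsplit(page_url)
    # URLs on another scheme or host are out of scope.
    if (sitemap.scheme, sitemap.netloc.lower()) != (page.scheme, page.netloc.lower()):
        return False
    # The directory that holds the sitemap defines the allowed path prefix.
    base_dir = sitemap.path.rsplit("/", 1)[0] + "/"
    return page.path.startswith(base_dir)


if __name__ == "__main__":
    sitemap = "http://www.example.com/test/sitemap.xml"
    print(robots_txt_url("http://www.example.com/shop/page.html"))
    print(sitemap_can_contain(sitemap, "http://www.example.com/test/page.html"))
    print(sitemap_can_contain(sitemap, "http://www.example.com/other/page.html"))
```

Run as a script, this prints the shared robots.txt URL for the host, then `True` for the page under /test/ and `False` for the page outside it. A check like this could run before a sitemap is published to catch out-of-scope URLs early.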