In #9955, @brson added a robots.txt file (thanks!) to prevent search engines from crawling non-current docs. However, I just noticed that this doesn't really address the issue: I searched Google for 'Rust pointers', and the third result was static.rust-lang.org/doc/0.6/tutorial-borrowed-ptr.html, although the robots.txt did stop Google from providing a description of the result ;)
Apparently robots.txt will stop Google and other search engines from crawling a file, but not from indexing it if anyone on the internet has linked to it: https://support.google.com/webmasters/answer/156449?hl=en
I'm not sure what web server rust-lang runs on, but in Apache, for example, you can use .htaccess to set an X-Robots-Tag response header with noindex/nofollow on entire directories, instead of having to add a meta tag to the head of each page: http://perishablepress.com/taking-advantage-of-the-x-robots-tag/
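As a rough sketch (assuming the server really is Apache with mod_headers enabled, and that old docs live under directories like doc/0.6/, which I'm guessing at), an .htaccess dropped into each old-docs directory could look like:

```apache
# Hypothetical .htaccess placed in an old-docs directory (e.g. doc/0.6/).
# Sends an X-Robots-Tag header telling crawlers not to index or follow
# anything served from this directory. Requires Apache's mod_headers module.
<IfModule mod_headers.c>
  Header set X-Robots-Tag "noindex, nofollow"
</IfModule>
```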
One note: for this approach to work, I believe you must not block crawling of those pages in robots.txt, or else the crawlers will never fetch them and see the X-Robots-Tag :)
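To illustrate the interaction (hypothetical paths, not necessarily what our robots.txt actually contains): a rule like the one below would defeat the header, because crawlers obeying it never fetch the pages and so never see the noindex response.

```txt
# Hypothetical robots.txt rule that would hide the X-Robots-Tag:
# crawlers never request anything under /doc/0.6/, so they never
# receive the noindex header. It would need to be removed.
User-agent: *
Disallow: /doc/0.6/
```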
Thanks!