robots.txt should steer search engines away from old docs #94
Comments
I couldn't find …
@berkerpeksag I think we can achieve this via Fastly: https://docs.fastly.com/guides/basic-configuration/creating-and-customizing-a-robots-file
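For context, the pattern in that Fastly guide is to intercept requests for /robots.txt at the edge and answer with a synthetic response, so no file needs to exist on the origin. A minimal sketch of that approach in Fastly's custom VCL (the 900 status code and the single disallow rule here are illustrative assumptions, not the service's actual configuration):

    sub vcl_recv {
      # Intercept robots.txt before the request reaches the backend.
      if (req.url.path == "/robots.txt") {
        error 900 "robots";
      }
    }

    sub vcl_error {
      if (obj.status == 900) {
        set obj.status = 200;
        set obj.response = "OK";
        set obj.http.Content-Type = "text/plain";
        # Body of the synthetic robots.txt; the rules below are placeholders.
        synthetic {"User-agent: *
    Disallow: /3.0/
    "};
        return(deliver);
      }
    }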
@MarkMangoba good point! Could you please check whether https://docs.python.org/robots.txt is created on Fastly? I don't have a Fastly account, so I can't check it myself. Also, I think we can safely add 3.2 and 3.3 to the list Skip shared in https://github.com/python/pythondotorg/issues/1030#issue-187084143.
@ewdurbin Is this something we could talk about this week, as we work to close out Python 2 sunsetting communications tasks?
Hi @JulienPalard -- could you confirm that this is OK to do?
We already have proper canonical links in some builds, like the one shown below.
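A canonical link in a rendered page looks roughly like this (the timeit page is used here purely as an illustrative example):

    <link rel="canonical" href="https://docs.python.org/3/library/timeit.html" />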
They are placed there by Doc/tools/templates/layout.html. This should be enough to get a /3/ instead of a /3.6/ URL in search results. But we don't have it for really old docs like 3.4.
See also #51. Using a Disallow in robots.txt seems too aggressive here.
I agree with @JulienPalard: a disallow seems heavy-handed, and some years from now I may still want to be able to find the version 2 docs if I need them. We should let search engines do their job, but maybe we can help them a little.
I defer to the Docs team to make a decision on how to move forward here. Personally, I don't see an immediate action that should be taken. Search is hard, and I think there are enough little traps described above that this should be approached gradually and with a plan.
I agree, @ewdurbin -- should we perhaps move this issue to the CPython repo, or to https://github.com/python/docsbuild-scripts?
Here are 95 links on English Wikipedia that begin with https://docs.python.org/2 and 15 that begin with http://docs.python.org/2. Not all need to be or should be changed, but I'll do a few now. (Edit: now 80 and 15.)
I'd gladly take the issue on docsbuild-scripts, but in any case I won't add disallows to robots.txt. Thanks a lot @hugovk for trying to fix the external links; it's a good way to go, as it also helps users, not only search engines 👍 Also note that the situation has already improved since 2016: I'm seeing /fr/3/library/timeit.html first and /3/library/timeit.html second on Google, and /3/library/timeit.html first on DuckDuckGo. I haven't experimented further. This issue can be closed; reopen it on docsbuild-scripts as needed if you see SERPs clearly lagging behind.
We have another report showing links to Python 2 docs in python/pythondotorg#1619. I was going to transfer this issue to docsbuild-scripts, but I couldn't find it in the repository list.
Not sure why you were unable to transfer it, @berkerpeksag, but it has been completed.
@ewdurbin thank you very much!
See https://mail.python.org/pipermail/pydotorg-www/2016-November/003921.html for original discussion.
When searching Google for "Python timeit" recently, the first hit was for
https://docs.python.org/2/library/timeit.html
The second hit, unfortunately, was for
https://docs.python.org/3.0/library/timeit.html
The first page of results didn't mention
https://docs.python.org/3/library/timeit.html
at all. It seems the robots.txt file should be tweaked to strongly discourage search-engine crawlers from traversing outdated documentation, at least anything older than 3.2 or 2.6. It's been a long while since I messed with a robots.txt file (so I won't pretend I could submit a proper PR), but something like
User-agent: *
Disallow: /3.0/
Disallow: /3.1/
Disallow: /2.5/
Disallow: /2.4/
Disallow: /2.3/
Disallow: /2.2/
Disallow: /2.1/
Disallow: /2.0/
should steer well-behaved crawlers away from obsolete documentation.
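To see what crawlers are actually being served today, one can fetch the live file directly (a quick check, assuming curl is available):

    curl -s https://docs.python.org/robots.txt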