
email.utils.make_msgid return ids that break email messages with related content #100293


Open
ostefano opened this issue Dec 16, 2022 · 14 comments
Labels
docs (Documentation in the Doc dir), topic-email, type-bug (An unexpected behavior, bug, or error)

Comments

@ostefano

ostefano commented Dec 16, 2022

Bug report

I have been trying to replicate the examples listed here: https://docs.python.org/3/library/email.examples.html

For some reason, the one about "creating an HTML message with an alternative plain text version" produces an email message that Thunderbird (and other email readers) does not display correctly: the images are not displayed and are marked as broken.

The example uses make_msgid() to generate content ids.

Python 3.10.9 (main, Dec  7 2022, 03:14:04) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from email import utils
>>> utils.make_msgid()
'<167119948916.50921.14529814791249370642@1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa>'
>>>

It turns out the string is too long: if I either remove the domain part or deliberately shorten it, e.g., make_msgid(domain="0.0.0.ip6.arpa"), everything works again and the resulting email is displayed correctly in Thunderbird/Outlook.
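The report can be reproduced without building a whole message. The sketch below uses only the standard library and compares the default ID (whose domain part comes from socket.getfqdn()) with one generated with an explicit short domain. The helper fits_on_one_line is hypothetical and only a rough heuristic: it checks whether "Content-ID: <id>" fits within the 78-character max_line_length of email.policy.default, past which the header has to be folded.

```python
import socket
from email import policy
from email.utils import make_msgid


def fits_on_one_line(msgid, header_name="Content-ID",
                     max_len=policy.default.max_line_length):
    """Hypothetical heuristic: True if the header line needs no folding."""
    return len(f"{header_name}: {msgid}") <= max_len


# Without a domain argument, make_msgid() falls back to socket.getfqdn(),
# so the ID length depends on how the local host resolves its own name.
default_id = make_msgid()
# An explicit, short domain keeps the ID compact regardless of hostname.
short_id = make_msgid(domain="0.0.0.ip6.arpa")

print("FQDN:", socket.getfqdn())
for msgid in (default_id, short_id):
    print(len(msgid), fits_on_one_line(msgid), msgid)

# An ID shaped like the one in the report: the 72-character reverse-DNS
# domain pushes "Content-ID: ..." well past 78 characters.
reported_id = ("<167119948916.50921.14529814791249370642@"
               + "1." + "0." * 31 + "ip6.arpa>")
print(len(reported_id), fits_on_one_line(reported_id))
```

With the short domain the whole header line stays around 70 characters, which matches the observation that shortening the domain makes the problem go away.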

Your environment

  • CPython versions tested on: 3.10.9
  • Operating system and architecture: OSX/M1

Linked PRs

ostefano added the type-bug label Dec 16, 2022
@sobolevn
Member

sobolevn commented Jan 8, 2023

make_msgid by default uses socket.getfqdn() to get the domain part. On my machine it is short enough. So, you have two options:

  1. Change your hostname
  2. Use explicit domain name
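Option 2 applied to the documentation's HTML-with-inline-image pattern might look like the sketch below. The domain example.com, the addresses, the subject, and the placeholder PNG bytes are all assumptions for illustration; substitute a domain you control. Passing domain= means the Content-ID no longer depends on socket.getfqdn().

```python
from email.message import EmailMessage
from email.utils import make_msgid


def build_related_message(domain="example.com"):
    """Sketch: plain text + HTML alternative with one inline image."""
    msg = EmailMessage()
    msg["Subject"] = "Related content demo"
    msg["From"] = "sender@example.com"
    msg["To"] = "recipient@example.com"
    msg.set_content("Plain-text fallback for clients without HTML.\n")

    # Explicit domain: the ID stays short whatever the local hostname is.
    image_cid = make_msgid(domain=domain)
    # The HTML refers to the image by its Content-ID minus the angle brackets.
    html = (
        "<html><body>\n"
        "<p>Inline image:</p>\n"
        f'<img src="cid:{image_cid[1:-1]}">\n'
        "</body></html>\n"
    )
    msg.add_alternative(html, subtype="html")

    # Placeholder bytes standing in for a real image file (assumption).
    fake_png = b"\x89PNG\r\n\x1a\n"
    # Turn the HTML part into multipart/related and attach the image to it.
    msg.get_payload()[1].add_related(fake_png, "image", "png", cid=image_cid)
    return msg, image_cid


msg, cid = build_related_message()
print(msg.as_string())
```

Only the make_msgid call differs from the pattern in the documentation example; the rest of the structure (add_alternative, then add_related on the HTML part) is unchanged.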

I don't think that there's anything we can do from our side.

sobolevn added the topic-email and pending (the issue will be closed if no feedback is provided) labels Jan 8, 2023
@ostefano
Author

ostefano commented Jan 8, 2023

@sobolevn I am perfectly fine implementing that workaround in my code. The problem is that this issue is not documented at all, and users reading the official documentation here https://docs.python.org/3/library/email.examples.html might try to implement the example and find themselves completely stumped.

I think we should at least add what you say in your reply to the documentation page linked above.
What do you think?

@sobolevn
Member

sobolevn commented Jan 8, 2023

Looks like it is documented here: https://docs.python.org/3/library/email.utils.html?highlight=make_msgid#email.utils.make_msgid

I don't think that adding implementation details of make_msgid to the multi-alternatives example is a good idea.

However, making docs better is always a good thing, so - if you have some specific suggestions, please feel free to post them! :)

@ostefano
Author

ostefano commented Jan 8, 2023

What about something like: "Note that modern email clients might not correctly display emails containing resources with a Message-ID longer than XX characters"?

@dtrodrigues
Contributor

While Thunderbird doesn't display messages with a long msgid correctly, Apple Mail does. Which other email clients are not working?

@ostefano
Author

ostefano commented Jan 8, 2023

Outlook 365 (latest on the stable channel)

@sobolevn
Member

sobolevn commented Jan 8, 2023

Something like "Note that some email clients might not correctly display emails containing resources with a long Message-Id, which usually happens because of a long domain part" sounds like a reasonable note to add! 👍

AlexWaygood added the docs label Jan 8, 2023
@ostefano
Author

ostefano commented Jan 8, 2023

@sobolevn 👍 If you point me to the right documentation file, I'd be happy to create the PR.

sobolevn removed the pending label Jan 8, 2023
@sobolevn
Member

sobolevn commented Jan 8, 2023

@dtrodrigues
Contributor

FWIW, the Thunderbird bug report is here: https://bugzilla.mozilla.org/show_bug.cgi?id=1612465

The longer domain causes Python to encode the Content-ID value so that it can be split across multiple lines, but Thunderbird doesn't seem to support that part of the spec.
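To check whether a serialized message was hit by this, one can look for RFC 2047 encoded words (the =?charset?Q?...?= or =?charset?B?...?= form) inside its Content-ID headers. The function below is a hypothetical diagnostic helper, not part of the email package; the regular expression is a simplified approximation of the encoded-word syntax, and the two sample messages are made up.

```python
import email
import re

# Simplified RFC 2047 encoded-word pattern: =?charset?encoding?text?=
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[QqBb]\?[^?\s]*\?=")


def has_encoded_content_id(raw_message):
    """Return True if any Content-ID header contains an encoded word."""
    # The default (compat32) policy returns header values as raw strings,
    # so encoded words are left undecoded and can be matched directly.
    msg = email.message_from_string(raw_message)
    for part in msg.walk():
        for value in part.get_all("Content-ID", []):
            if ENCODED_WORD.search(value):
                return True
    return False


ok = "Content-ID: <1.2.3@example.com>\n\nbody\n"
bad = "Content-ID: =?utf-8?q?=3C1=2E2=2E3=40example=2Ecom=3E?=\n\nbody\n"
print(has_encoded_content_id(ok), has_encoded_content_id(bad))  # prints: False True
```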

@ostefano
Author

ostefano commented Jan 8, 2023

@sobolevn done 👍

@bitdancer
Member

bitdancer commented Jan 10, 2023

At the risk of muddying the waters, I think this is actually a bug. I don't believe message-id headers are technically allowed to be encoded using encoded words. The spec is pretty clear that addr-specs are not to be RFC 2047 encoded, and a message-id is composed of addr-spec-like things. More directly on point, it is a structured field and its content is not a phrase. The email package should probably default to not doing encoding except where it is permitted; instead, I went with preventing it on demand (encode_as_ew = False, but the default is True). I believe I did that because X-headers can contain encoded words, and I wanted encoding of X-headers to be the default. I now think that was an incorrect design decision, as it has resulted in several bug reports like this one, including one, if I recall correctly, that involved an X-header.

Now, I could be wrong about encoding of message-id headers. After all, I was much more cognizant of the RFCs when I was writing the code than I am now, years later ;)

If I'm right this raises the question of how you comply with the RFC line length requirements while also not using encoded words. The answer, I think, is that you don't. Long lines are handled correctly by far more mail clients than encoding-where-it-doesn't-belong is.
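Following the "emit a long line instead" reasoning, one possible sender-side workaround is to serialize with a larger folding limit. This is a sketch under assumptions: it clones email.policy.default with max_line_length=998 (the RFC 5322 hard limit, versus the recommended 78), which affects every header in the message, and LONG_DOMAIN is a hypothetical value built to resemble the domain in the report.

```python
from email import policy
from email.message import EmailMessage
from email.utils import make_msgid

# Hypothetical long domain resembling the reverse-DNS name in the report.
LONG_DOMAIN = "1." + "0." * 31 + "ip6.arpa"

# RFC 5322 allows lines of up to 998 characters (excluding CRLF); raising
# the folding limit to that value lets a long Content-ID stay on one line.
LONG_LINES = policy.default.clone(max_line_length=998)


def serialize_with_long_lines(msg):
    """Serialize msg, folding only headers longer than 998 characters."""
    return msg.as_string(policy=LONG_LINES)


msg = EmailMessage()
cid = make_msgid(domain=LONG_DOMAIN)
msg["Content-ID"] = cid

content_id_lines = [
    line for line in serialize_with_long_lines(msg).splitlines()
    if line.startswith("Content-ID:")
]
print(content_id_lines)
```

The trade-off is that any header in the message may now exceed 78 characters, which RFC 5322 permits but discourages.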

@ostefano
Copy link
Author

@sobolevn @bitdancer what is the consensus here? Shall we merge the PR in the meantime?

@blaisep
Contributor

blaisep commented May 20, 2024

There is also a related docs PR: #100856


6 participants