So, after staring at much Python and sending dozens of e-mail messages
to myself, here's what I've learned about sending non-ASCII characters.
The fundamental problem is that Python's smtplib.SMTP.sendmail() method
raises an exception if non-ASCII characters are present in the e-mail.
Mailman therefore executes msgtext = msgtext.encode('ascii',
'replace').decode('ascii') before passing anything to smtplib. This is
what (I surmise) converts right-apostrophes, Æ's, and the like into
question marks.
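
To make the effect concrete, here is a small Python sketch of that 'replace' step. The sample string is my own, not anything taken from Mailman, and I'm assuming that sendmail() encodes a str message with the ascii codec, which would be where the exception comes from.

```python
# Sample text (made up for illustration): a right-apostrophe and an AE ligature.
msgtext = "Sea Dragon\u2019s Picnic, \u00c6"

# A strict ASCII encode fails on the first non-ASCII character.
try:
    msgtext.encode("ascii")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# With the 'replace' error handler, each character that cannot be
# encoded is swapped for '?' instead of raising an exception.
asciified = msgtext.encode("ascii", "replace").decode("ascii")

print(strict_ok)   # False
print(asciified)   # Sea Dragon?s Picnic, ?
```

Note that the substitution is lossy: once the text has been through this step there is no way to recover the original characters.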
Normal e-mail clients don't trigger the conversion because they
variously encode non-ASCII characters using quoted-printable encoding
(=C3=86 etc.), convert the non-ASCII characters into HTML entities, or
encode the message as a base64 MIME attachment. I think Mailman is
trying to do the last of these but my e-mail client (at least) displays
the ASCII-ified part of the message rather than the attachment.
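
For comparison, this is roughly what quoted-printable encoding does to the same kind of text. It's only a sketch using Python's standard quopri module on a sample string of my own; a real mail client would also set a Content-Transfer-Encoding: quoted-printable header so the recipient knows to decode the body.

```python
import quopri

# Sample text (made up for illustration).
text = "Sea Dragon\u2019s Picnic, \u00c6"

# Quoted-printable works on bytes, so encode to UTF-8 first.
raw = text.encode("utf-8")
encoded = quopri.encodestring(raw)
print(encoded)  # b'Sea Dragon=E2=80=99s Picnic, =C3=86'

# Unlike the ASCII 'replace' trick, this round-trips without loss.
decoded = quopri.decodestring(encoded).decode("utf-8")
print(decoded == text)  # True
```

Each non-ASCII byte becomes an =XX triplet, so the result is pure ASCII and should be safe to hand to anything that insists on 7-bit text.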
On the other hand, PHP's mail() function simply sends whatever bytes the
programmer gives it. This works as long as your SMTP server accepts
UTF-8 characters, but, as already discussed, Python doesn't.
I sent the message below using:
$to = "selenetest(a)lochac.sca.org";
$from = "aelfred(a)nps.id.au";
$subject = "Sea Dragons Picnic";
$body = "Testing <i>Sea Dragon’s Picnic</i> with an apostrophe, from
$header = "From: $from\r\n" .
"Content-Type: text/html; charset=utf-8\r\n" .
mail($to, $subject, quoted_printable_encode($body), $header);
where quoted_printable_encode() is part of the standard PHP library.
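
For what it's worth, a rough Python counterpart of the PHP above could look like the sketch below. It uses the standard email.message.EmailMessage class rather than anything from Mailman; the addresses are placeholders, the body is a shortened sample, and the commented-out SMTP host is hypothetical. I haven't run this through Mailman itself.

```python
from email.message import EmailMessage

# Placeholder addresses and a shortened sample body (assumptions).
msg = EmailMessage()
msg["From"] = "sender@example.org"
msg["To"] = "list@example.org"
msg["Subject"] = "Sea Dragons Picnic"

# Ask the email package to produce a text/html part, UTF-8 charset,
# with the body quoted-printable encoded.
msg.set_content(
    "Testing <i>Sea Dragon\u2019s Picnic</i> with an apostrophe",
    subtype="html",
    cte="quoted-printable",
)

# The serialized message is pure ASCII, so it should not trip an
# ASCII-only check on the way to the SMTP server.
wire = msg.as_bytes()
print(wire.decode("ascii"))

# Sending would then be something like (hypothetical host):
# import smtplib
# with smtplib.SMTP("localhost") as smtp:
#     smtp.send_message(msg)
```

The point of going through EmailMessage is that the Content-Type and Content-Transfer-Encoding headers are generated to match the encoded body, rather than being typed by hand.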
When we first tested Mailman3, I recall that some webmail clients (e.g.
Gmail) had trouble with the quoted-printable characters. The message
below appears correctly in Roundcube, though.
The other two approaches I know of are:
Convert non-ASCII characters to HTML entities. This is what Roundcube
does, taking advantage of a function provided by the TinyMCE editor.
Encode the message body with Base64 and set Content-Transfer-Encoding:
base64. I haven't tested this.
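
Both of these approaches can be sketched in Python with the standard library. This is an illustration only: the body text is a sample of mine, the entity conversion uses numeric character references (which may differ from the named entities Roundcube or TinyMCE would emit), and I haven't tried either against Mailman.

```python
from email.message import EmailMessage

# Sample HTML body (made up for illustration).
body = "Testing <i>Sea Dragon\u2019s Picnic</i>"

# Approach 1: replace each non-ASCII character with a numeric HTML
# character reference, leaving a plain ASCII string.
entity_body = body.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(entity_body)  # Testing <i>Sea Dragon&#8217;s Picnic</i>

# Approach 2: keep the text as UTF-8 but base64-encode the body and
# let the email package set Content-Transfer-Encoding to match.
msg = EmailMessage()
msg.set_content(body, subtype="html", cte="base64")
print(msg["Content-Transfer-Encoding"])  # base64

# The serialized message contains only ASCII bytes.
wire = msg.as_bytes()
```

The entity approach only makes sense for HTML bodies, whereas base64 works for any content type at the cost of making the raw message unreadable to a human.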
Finally, all of this applies only to the e-mail body. The subject line
can't contain raw UTF-8 characters because most SMTP servers reject
them. Messages I've sent from my desktop client use a kind of
quoted-printable encoding that looks like this:
...but I haven't looked into what the relevant standard is or what PHP
functions might produce such a thing.
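
As far as I can tell, the relevant standard is RFC 2047 ("encoded words" in message headers), and on the PHP side mb_encode_mimeheader() looks like the function to try, though I haven't tested it. The Python sketch below shows the idea with the standard email.header module; the subject text is just a sample.

```python
from email.header import Header, decode_header, make_header

# Sample subject with a right-apostrophe (made up for illustration).
subject = "Sea Dragon\u2019s Picnic"

# Header.encode() wraps the text in an RFC 2047 encoded word of the
# form =?charset?encoding?data?=, which is pure ASCII.
encoded = Header(subject, charset="utf-8").encode()
print(encoded)

# A mail client reverses the process when displaying the subject.
decoded = str(make_header(decode_header(encoded)))
print(decoded == subject)  # True
```

The encoded word carries its own charset label, so the subject line does not depend on the Content-Type header that describes the body.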
------ Original Message ------
From "aelfred--- via Selenetest" <selenetest(a)lochac.sca.org>
Date 25/02/2023 3:41:20 PM
Subject [Selenetest] Sea Dragons Picnic
Testing Sea Dragon’s Picnic with an apostrophe, from