So, after staring at much Python and sending dozens of e-mail messages
to myself, here's what I've learned about sending non-ASCII characters
through Mailman:
The fundamental problem is that the sendmail() method of Python's
smtplib.SMTP class raises an exception (UnicodeEncodeError) if it is
given a message string containing non-ASCII characters.
Mailman therefore executes
msgtext = msgtext.encode('ascii', 'replace').decode('ascii')
before passing anything to smtplib. This is what (I surmise) converts
right-apostrophes, Æ's, and the like into question marks.
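To see the effect in isolation, here is a minimal Python sketch of that encode/decode round trip. It uses only built-in string methods and the text of my test message; it is not Mailman's actual code path, just the one expression quoted above.

```python
# Test message body; \u2019 is the right apostrophe, \u00c6 is the letter AE.
body = "Testing Sea Dragon\u2019s Picnic with an apostrophe, from \u00c6lfred."

# Strict ASCII encoding fails as soon as a non-ASCII character is present.
try:
    body.encode("ascii")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# The 'replace' error handler swaps each unencodable character for '?'.
flattened = body.encode("ascii", "replace").decode("ascii")

print(strict_ok)   # False
print(flattened)   # Testing Sea Dragon?s Picnic with an apostrophe, from ?lfred.
```

The characters are lost for good at this point: nothing downstream can tell which question marks were originally apostrophes.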
Normal e-mail clients don't trigger the conversion because they
variously encode non-ASCII characters using quoted-printable encoding
(=C3=86 etc.), convert the non-ASCII characters into HTML entities, or
encode the message as a base64 MIME attachment. I think Mailman is
trying to do the last of these but my e-mail client (at least) displays
the ASCII-ified part of the message rather than the attachment.
On the other hand, PHP's mail() function simply sends whatever bytes the
programmer gives it. This works as long as your SMTP server accepts
UTF-8 characters, but, as already discussed, Python's smtplib doesn't.
I sent the message below using
$to = "selenetest(a)lochac.sca.org";
$from = "aelfred(a)nps.id.au";
$subject = "Sea Dragons Picnic";
$body = "Testing <i>Sea Dragon’s Picnic</i> with an apostrophe, from Ælfred.";
$header = "From: $from\r\n" .
"Content-Type: text/html; charset=utf-8\r\n" .
"Content-Transfer-Encoding: quoted-printable";
mail($to, $subject, quoted_printable_encode($body), $header);
where quoted_printable_encode() is part of the standard PHP library.
When we first tested Mailman3, I recall that some webmail clients (e.g.
Gmail) had trouble with the quoted-printable characters. The message
below appears correctly in Roundcube, though.
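For comparison, here is a rough Python equivalent of the PHP above, using the standard library's email package to produce a quoted-printable UTF-8 HTML body. This is a sketch only: the example.org addresses are placeholders, nothing is actually sent, and I am not claiming this is how Mailman builds its messages.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

body = "Testing <i>Sea Dragon\u2019s Picnic</i> with an apostrophe, from \u00c6lfred."

msg = EmailMessage()
msg["From"] = "sender@example.org"   # placeholder address
msg["To"] = "list@example.org"       # placeholder address
msg["Subject"] = "Sea Dragons Picnic"

# Ask for text/html, UTF-8, with a quoted-printable transfer encoding.
msg.set_content(body, subtype="html", charset="utf-8", cte="quoted-printable")

# The serialised message is pure ASCII, so an ASCII-only transport can carry it.
wire = msg.as_bytes()
print(wire.decode("ascii"))

# Parsing it back recovers the original characters.
parsed = message_from_bytes(wire, policy=policy.default)
recovered = parsed.get_content().rstrip("\n")
```

Because the bytes in wire are already ASCII, they could be handed to smtplib without triggering the exception described earlier.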
The other two approaches I know of are:
1. Convert non-ASCII characters to HTML entities. This is what Roundcube
   does, taking advantage of a function provided by the TinyMCE editor
   (JavaScript). I don't know of any ready-made PHP function that will do
   this.
2. Encode the message body with Base64 and set Content-Transfer-Encoding:
   base64. I haven't tested this.
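Both of these approaches can be sketched in a few lines of Python using only built-ins and the standard base64 module. This is an illustration of the two encodings themselves, not of what Roundcube or Mailman does internally, and I haven't tested either in a real message sent through the list.

```python
import base64

body = "Testing <i>Sea Dragon\u2019s Picnic</i> with an apostrophe, from \u00c6lfred."

# Approach 1: replace each non-ASCII character with a numeric HTML entity.
# ASCII characters, including the <i> markup, are left untouched.
entity_body = body.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(entity_body)
# Testing <i>Sea Dragon&#8217;s Picnic</i> with an apostrophe, from &#198;lfred.

# Approach 2: Base64-encode the UTF-8 bytes. encodebytes() inserts a newline
# after every 76 output characters, which suits a MIME body.
b64_body = base64.encodebytes(body.encode("utf-8")).decode("ascii")
print(b64_body)

# Decoding reverses the Base64 step exactly.
round_trip = base64.decodebytes(b64_body.encode("ascii")).decode("utf-8")
```

The entity form only makes sense for an HTML body, whereas the Base64 form works for any content type as long as the message also carries Content-Transfer-Encoding: base64 and a charset=utf-8 parameter.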
Finally, all of this applies only to the e-mail body. The subject line
can't contain raw UTF-8 characters because most SMTP servers reject
them. Messages I've sent from my desktop client use a kind of
quoted-printable encoding that looks like this:
Subject: [Selenetest]
=?utf-8?q?Message_with_an_=C3=86_in_it?=
...but I haven't looked into what the relevant standard is or what PHP
functions might produce such a thing.
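For what it's worth, the =?utf-8?q?...?= form is the "encoded-word" syntax defined in RFC 2047, and Python's email.header module can generate and decode it. The sketch below is in Python rather than PHP; on the PHP side, the mbstring extension's mb_encode_mimeheader() looks like the closest counterpart, though I haven't tried it here.

```python
from email.header import Header, decode_header, make_header

subject = "Message with an \u00c6 in it"

# Encode the subject as an RFC 2047 encoded-word in UTF-8. The result is
# plain ASCII and can be placed directly after "Subject: ".
encoded = Header(subject, "utf-8").encode()
print(encoded)

# Decoding the encoded-word gives back the original text.
decoded = str(make_header(decode_header(encoded)))
print(decoded)
```

Python chooses between the Q (quoted-printable-like) and B (Base64) variants by itself, so the exact encoded text may differ from what a desktop client produces while still decoding to the same subject.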
Ælfred
------ Original Message ------
From "aelfred--- via Selenetest" <selenetest(a)lochac.sca.org>
To selenetest(a)lochac.sca.org
Date 25/02/2023 3:41:20 PM
Subject [Selenetest] Sea Dragons Picnic
Testing Sea Dragon’s Picnic with an apostrophe, from
Ælfred.