So, after staring at much Python and sending dozens of e-mail messages to myself, here's what I've learned about sending non-ASCII characters through Mailman:
- The fundamental problem is that Python's smtplib sendmail() method raises an exception (a UnicodeEncodeError) if it is handed a message string with non-ASCII characters in the e-mail body.
- Mailman therefore executes msgtext = msgtext.encode('ascii', 'replace').decode('ascii') before passing anything to smtplib. This is what (I surmise) converts right-apostrophes, Æ's, and the like into question marks.
- Normal e-mail clients don't trigger the conversion because they variously encode non-ASCII characters using quoted-printable encoding (=C3=86 etc.), convert the non-ASCII characters into HTML entities, or encode the message as a base64 MIME attachment. I think Mailman is trying to do the last of these, but my e-mail client (at least) displays the ASCII-ified part of the message rather than the attachment.
- On the other hand, PHP's mail() function simply sends whatever bytes the programmer gives it. This works as long as your SMTP server accepts UTF-8 characters, but, as already discussed, Python doesn't.
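The effect of that encode/decode step can be reproduced outside Mailman. The following is a minimal Python sketch, not Mailman's actual code path: the variable names and the sample text are made up for illustration. It shows that a strict ASCII encode fails on such text and that the 'replace' error handler substitutes one question mark per non-ASCII character.

```python
# Sample text containing a right apostrophe (U+2019) and an Æ (U+00C6).
body = "Testing Sea Dragon’s Picnic, from Ælfred."

# A strict ASCII encode refuses the text outright.
try:
    body.encode("ascii")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# The 'replace' handler swaps each non-ASCII character for a '?'.
flattened = body.encode("ascii", "replace").decode("ascii")

print(strict_ok)   # False
print(flattened)   # Testing Sea Dragon?s Picnic, from ?lfred.
```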
I sent the message below using
$to = "selenetest@lochac.sca.org";
$from = "aelfred@nps.id.au";
$subject = "Sea Dragons Picnic";
$body = "Testing <i>Sea Dragon’s Picnic</i> with an apostrophe, from Ælfred.";
$header = "From: $from\r\n" .
"Content-Type: text/html; charset=utf-8\r\n" .
"Content-Transfer-Encoding: quoted-printable";
mail($to, $subject, quoted_printable_encode($body), $header);
where quoted_printable_encode() is part of the standard PHP library.
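For comparison, the same encoding scheme is available in Python's standard quopri module. This is only a sketch for inspecting what quoted-printable does to the non-ASCII bytes; it is not a claim that its output matches PHP's byte for byte, and the sample text is shortened so that no soft line breaks come into play.

```python
import quopri

body = "Sea Dragon’s Picnic, from Ælfred."

# Quoted-printable operates on bytes, so encode the text as UTF-8 first.
encoded = quopri.encodestring(body.encode("utf-8"))
print(encoded.decode("ascii"))

# Decoding reverses the process and recovers the original text.
decoded = quopri.decodestring(encoded).decode("utf-8")
print(decoded == body)
```

The apostrophe becomes =E2=80=99 and the Æ becomes =C3=86, which are simply the UTF-8 bytes of each character written in hexadecimal.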
When we first tested Mailman3, I recall that some webmail clients (e.g. Gmail) had trouble with the quoted-printable characters. The message below appears correctly in Roundcube, though.
The other two approaches I know of are:
- Convert non-ASCII characters to HTML entities. This is what Roundcube does, taking advantage of a function provided by the TinyMCE editor (JavaScript). I don't know of any ready-made PHP function that will do this.
- Encode the message body with Base64 and set Content-Transfer-Encoding: base64. I haven't tested this.
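Both of these approaches can be tried out quickly in Python. The sketch below is illustrative only: it is not what Roundcube or TinyMCE does internally, it produces numeric character references rather than named entities, and the variable names are invented. The 'xmlcharrefreplace' error handler touches only non-ASCII characters, so any HTML tags in the body are left alone.

```python
import base64

body = "Sea Dragon’s Picnic, from Ælfred."

# Approach 1: replace each non-ASCII character with a numeric HTML
# character reference such as &#198;.
as_entities = body.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(as_entities)   # Sea Dragon&#8217;s Picnic, from &#198;lfred.

# Approach 2: base64-encode the UTF-8 bytes. encodebytes() also breaks
# the output into lines, as a base64 message body needs to be.
as_base64 = base64.encodebytes(body.encode("utf-8")).decode("ascii")

# Decoding the base64 text gives back the original message.
round_trip = base64.decodebytes(as_base64.encode("ascii")).decode("utf-8")
print(round_trip == body)
```

A base64 body would still need the Content-Transfer-Encoding: base64 header mentioned above, together with a charset in the Content-Type header, so that the receiving client knows how to decode it.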
Finally, all of this applies only to the e-mail body. The subject line can't contain raw UTF-8 characters because most SMTP servers reject them. Messages I've sent from my desktop client use a kind of quoted-printable encoding that looks like this:
Subject: [Selenetest] =?utf-8?q?Message_with_an_=C3=86_in_it?=
...but I haven't looked into what the relevant standard is or what PHP functions might produce such a thing.
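For reference, the =?utf-8?q?...?= form is the "encoded-word" syntax defined in RFC 2047. Python's standard email.header module can produce and parse it, as in the sketch below. This is a Python illustration rather than a PHP answer; the library chooses between the "q" (quoted-printable-like) and "b" (base64) variants itself, so the exact output may differ from what a desktop client sends. On the PHP side, mb_encode_mimeheader() from the mbstring extension looks like the corresponding function, though that is untested here.

```python
from email.header import Header, decode_header, make_header

subject = "Message with an Æ in it"

# Encode the subject as an RFC 2047 encoded-word using UTF-8.
encoded = Header(subject, charset="utf-8").encode()
print(encoded)

# Parsing the encoded-word recovers the original subject text.
decoded = str(make_header(decode_header(encoded)))
print(decoded == subject)
```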
Ælfred
------ Original Message ------
Date 25/02/2023 3:41:20 PM
Subject [Selenetest] Sea Dragons Picnic
Testing Sea Dragon’s Picnic with an apostrophe, from Ælfred.