Great detective work!

I had a look into the subject encoding, and it seems to align with https://www.rfc-editor.org/info/rfc2047 - as you pointed out, similar to quoted-printable.

Looks like PHP has both https://www.php.net/manual/en/function.iconv-mime-encode.php and https://www.php.net/manual/en/function.mb-encode-mimeheader.php, though the latter looks a little easier to use. I will give this a test!

Cheers,
Dylan

On Sat, 25 Feb 2023, 06:24 Ælfred se leof via Selenetest, <selenetest@lochac.sca.org> wrote:
So, after staring at much Python and sending dozens of e-mail messages to myself, here's what I've learned about sending non-ASCII characters through Mailman:
1. The fundamental problem is that Python's smtplib.sendmail() function throws an exception if non-ASCII characters are present in the e-mail body.
2. Mailman therefore executes msgtext = msgtext.encode('ascii', 'replace').decode('ascii') before passing anything to smtplib. This is what (I surmise) converts right-apostrophes, Æ's, and the like into question marks.
3. Normal e-mail clients don't trigger the conversion because they variously encode non-ASCII characters using quoted-printable encoding (=C3=86 etc.), convert the non-ASCII characters into HTML entities, or encode the message as a base64 MIME attachment. I think Mailman is trying to do the last of these, but my e-mail client (at least) displays the ASCII-ified part of the message rather than the attachment.
4. On the other hand, PHP's mail() function simply sends whatever bytes the programmer gives it. This works as long as your SMTP server accepts UTF-8 characters, but, as already discussed, Python's smtplib doesn't.
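As a rough illustration of point 2, the replacement step can be reproduced in a few lines of Python. This is only a sketch of the one expression quoted above, not Mailman's actual code path, and the sample string is made up for the demonstration.

```python
# Sketch of the ASCII "replace" round trip described in point 2.
# The sample text is illustrative; only the encode/decode expression
# comes from the discussion above.
msgtext = "Sea Dragon\u2019s Picnic, from \u00c6lfred"  # right apostrophe and Æ

# Each character that has no ASCII equivalent becomes a single "?".
asciified = msgtext.encode("ascii", "replace").decode("ascii")

print(asciified)  # Sea Dragon?s Picnic, from ?lfred
```

The original characters are gone after this step, so nothing downstream of it can recover them.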
I sent the message below using
    $to = "selenetest@lochac.sca.org";
    $from = "aelfred@nps.id.au";
    $subject = "Sea Dragons Picnic";
    $body = "Testing <i>Sea Dragon’s Picnic</i> with an apostrophe, from Ælfred.";
    $header = "From: $from\r\n"
        . "Content-Type: text/html; charset=utf-8\r\n"
        . "Content-Transfer-Encoding: quoted-printable";

    mail($to, $subject, quoted_printable_encode($body), $header);
where quoted_printable_encode() is part of the standard PHP library.
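For comparison, here is roughly what the quoted-printable step does to the body, sketched in Python with the standard quopri module rather than PHP. It is an assumption that PHP's quoted_printable_encode() produces the same escapes for this input; the shortened sample string is made up for the example.

```python
import quopri

# Illustrative body: \u2019 is the right apostrophe, \u00c6 is Æ.
body = "Sea Dragon\u2019s Picnic, from \u00c6lfred."

# Encode the UTF-8 bytes; each non-ASCII byte is written as "=XX",
# so the result is pure ASCII.
encoded = quopri.encodestring(body.encode("utf-8"))
print(encoded.decode("ascii"))

# Decoding restores the original text, which is what a mail client
# does when it sees Content-Transfer-Encoding: quoted-printable.
decoded = quopri.decodestring(encoded).decode("utf-8")
print(decoded)
```

Because the encoded body contains only ASCII, an ASCII-only pipeline should have nothing left to replace with question marks.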
When we first tested Mailman3, I recall that some webmail clients (e.g. Gmail) had trouble with the quoted-printable characters. The message below appears correctly in Roundcube, though.
The other two approaches I know of are:
1. Convert non-ASCII characters to HTML entities. This is what Roundcube does, taking advantage of a function provided by the TinyMCE editor (JavaScript). I don't know of any ready-made PHP function that will do this.
2. Encode the message body with Base64 and set Content-Transfer-Encoding: base64. I haven't tested this.
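Both alternatives are easy to try in isolation. The sketch below uses Python purely for illustration: numeric character references via the xmlcharrefreplace error handler, and a base64 body via the standard base64 module. On the PHP side, mb_encode_numericentity() and base64_encode() look like plausible starting points, but that is an assumption and has not been tested against Mailman.

```python
import base64

# Illustrative HTML body: \u2019 is the right apostrophe, \u00c6 is Æ.
body = "Testing <i>Sea Dragon\u2019s Picnic</i>, from \u00c6lfred."

# Approach 1: replace each non-ASCII character with a numeric HTML
# entity, leaving the ASCII markup untouched.
as_entities = body.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(as_entities)

# Approach 2: base64-encode the UTF-8 bytes. encodebytes() wraps the
# output into short lines, as is usual for a base64 mail body.
as_base64 = base64.encodebytes(body.encode("utf-8")).decode("ascii")
print(as_base64)

# Decoding the base64 text gives back the original body.
round_trip = base64.b64decode(as_base64).decode("utf-8")
```

The entity approach only makes sense for HTML bodies, whereas base64 works for any content type as long as the header declares it.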
Finally, all of this applies only to the e-mail body. The subject line can't contain raw UTF-8 characters because most SMTP servers reject them. Messages I've sent from my desktop client use a kind of quoted-printable encoding that looks like this:
Subject: [Selenetest] =?utf-8?q?Message_with_an_=C3=86_in_it?=
...but I haven't looked into what the relevant standard is or what PHP functions might produce such a thing.
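The Subject line above has the shape of an RFC 2047 "encoded-word" using the Q encoding: =?charset?q?...?=, with spaces written as underscores and other bytes as =XX. A minimal Python sketch of that format follows. q_encode_word is a hypothetical helper written for this example, and it ignores the length limit the RFC places on encoded-words, so long subjects would need splitting.

```python
import string
from email.header import decode_header, make_header

# Characters left unescaped here. RFC 2047 permits a few more in some
# contexts, so this set is deliberately conservative.
SAFE = set(string.ascii_letters + string.digits)

def q_encode_word(text, charset="utf-8"):
    """Hypothetical helper: build one Q-encoded word (no length check)."""
    parts = []
    for byte in text.encode(charset):
        ch = chr(byte)
        if ch == " ":
            parts.append("_")             # space is written as underscore
        elif ch in SAFE:
            parts.append(ch)              # plain ASCII letter or digit
        else:
            parts.append("=%02X" % byte)  # everything else as =XX
    return "=?%s?q?%s?=" % (charset, "".join(parts))

encoded = q_encode_word("Message with an \u00c6 in it")
print(encoded)  # =?utf-8?q?Message_with_an_=C3=86_in_it?=

# The standard library can decode the result back to readable text.
decoded = str(make_header(decode_header(encoded)))
print(decoded)
```

In practice a library routine is a safer choice than hand-rolling this, since it should also take care of folding long headers.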
Ælfred
------ Original Message ------
From "aelfred--- via Selenetest"
To selenetest@lochac.sca.org
Date 25/02/2023 3:41:20 PM
Subject [Selenetest] Sea Dragons Picnic

Testing *Sea Dragon’s Picnic* with an apostrophe, from Ælfred.