Monday, June 27, 2022

Exporting/Archiving E-mail From Apple Mail

Miles Wolbe:

Mail’s built-in export (and import) methods suffer from various limitations and bugs; the following tests were run under macOS 12.4 and Mail 16.0[…]

Of course, I recommend EagleFiler for this, which offers a variety of ways of importing from Apple Mail.

1. Mailbox → Export Mailbox...

I’m not sure what’s causing the importing problem that he mentions, but historically the Export Mailbox… command has not generated valid mbox files. I have not seen the “some messages could not be imported” error when importing properly formed mbox files, such as those generated by EagleFiler.

2. Click and drag multiple messages from Mail to Finder

You can also do this to export messages from EagleFiler, if you want to convert them from mbox to .eml format. Unlike Mail it doesn’t fail for messages with long subjects.

3. AppleScript

Many scripts, such as the mentioned Export Selected Mail Messages, will corrupt message data because they don’t properly handle a bug/oddity in Mail’s AppleScript support. Mail types the source AppleScript property as text when really it should be treated as raw data. If the script retrieves the source as a Unicode string and then writes it to a file, the non-ASCII bytes will be altered.

26 Comments

Thanks very much for sharing this post here with your insights, Michael.

> "If the script retrieves the source as a Unicode string and then writes it to a file, the non-ASCII bytes will be altered."

I haven't been able to reproduce this bug; can you please let me know how? I tried reproducing it by exporting messages containing Unicode characters (mainly in Japanese) via the aforementioned AppleScript and did not have any trouble with mojibake or the like with the exported EML files.

However, no matter whether I used AppleScript, click and drag from Mail to Finder, or messages imported from Mail into EagleFiler, the encoding for Unicode emails was always reported as us-ascii (though it did not appear to affect the data):

% file -bI exported-via-script.eml exported-via-drag.eml copied-from-eaglefiler.eml
message/rfc822; charset=us-ascii
message/rfc822; charset=us-ascii
message/rfc822; charset=us-ascii

Running diff3 showed all 3 EML files to be identical.

Beatrix Willius

There are other problems.

A Mail Archiver user told me that creating mbox files that are larger than 2 GB fails.

I still have an open Radar case about importing Sent Messages failing. There is an error "no valid mbox files found".

AppleScripts for Mail are painfully slow. That started in Crapolina and still isn't really good in Monterey.

Beatrix Willius

@Miles Wolbe: the first email I tried with Japanese characters ended up as mojibake.

@Beatrix Willius thank you for your report.

What did the headers look like in your EML files?

All 3 of my tests showed UTF-8 encoding as expected in the Subject and Content-Type fields.

@Miles This applies to e-mails where the raw message source contains non-ASCII characters, for example if the content has a charset but is not transfer-encoded with QP or Base64. There are really two issues here. One is that AppleScript receives the wrong bytes, i.e. the message may be in UTF-8 and AppleScript receives those bytes stuck into UTF-16 slots. The other is that, even if Mail didn’t do that, it’s not possible to represent an e-mail message (which might have multiple parts with different encodings) as a Unicode string.
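A minimal Python sketch of the first issue described above. It assumes that each raw byte arrives as one UTF-16 code unit with the same numeric value, and that the script then writes the resulting string out as UTF-8 (the output encoding of any particular script is an assumption here). `simulate_text_export` is a hypothetical helper, not part of Mail or of any script mentioned in this thread.

```python
def simulate_text_export(raw: bytes) -> bytes:
    """Model a script that reads the message source as text and writes it as UTF-8.

    Assumption: each raw byte arrives as one UTF-16 code unit with the same
    numeric value, which is what decoding as Latin-1 produces.
    """
    as_received = raw.decode("latin-1")  # byte 0xE2 becomes code point U+00E2
    return as_received.encode("utf-8")   # U+00E2 is written as the two bytes C3 A2


# A subject protected by a MIME encoded-word: the raw source is pure ASCII.
ascii_only = b"Subject: =?UTF-8?B?8J+YgA==?=\r\n"
# The same emoji placed in the header as bare UTF-8 bytes (out of spec).
raw_8bit = "Subject: \U0001F600\r\n".encode("utf-8")

print(simulate_text_export(ascii_only) == ascii_only)  # True: ASCII survives unchanged
print(simulate_text_export(raw_8bit) == raw_8bit)      # False: the 8-bit bytes were altered
```

Under this model, messages whose source is entirely base64, quoted-printable, or 7bit would export identically by every method, because no byte above 0x7F is ever present to be altered.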

@Michael Thanks for your reply. Here is the result of some cursory testing:

* Created a new mailbox in Mail and copied 172 messages containing Japanese characters from other Mail mailboxes into it.

* Exported via 4 methods:

1. Mail > Export Mailbox to MBOX

2. Click and drag from Mail to Finder

3. Tom Floeren's "Export Selected Mail Messages" AppleScript (four duplicate messages, i.e., those containing the same Message ID and date/time stamp as other messages, were not exported)

4. Initially tried clicking and dragging from EagleFiler to Finder, which worked for up to 50 messages at a time; attempting more than that resulted in a textClipping file containing the subject lines instead (dragging in batches of 50 is not ideal, especially as filename collisions result in only Skip, Stop, or Replace options). Worked around by highlighting all 172 messages in EagleFiler > Record > Reveal in Finder, which created an EML file for each message stored in the MBOX created by EF when messages were imported from Mail (two were missing from the output, both of which had subjects ending in "..."; was able to go back and run Reveal in Finder for each, which produced the missing EML files).

* Of the 172 messages, 97 were base64, 55 were quoted-printable, 18 were 7bit, and 2 were 8bit. 97 were plain text and 75 contained HTML.

* Comparing EML files derived from methods 2, 3, and 4, only EagleFiler's output differed from the others: in most cases (all but 17), it had appended 0x0a to the end of each file. Otherwise, no differences were found, and no mojibake appeared for any of the exported messages.

@Miles Here’s an example message that demonstrates one of the issues. You can open it in Mail and then use the Move To command in the Message menu to add it to a mailbox in preparation for exporting. It starts out with 2 emoji in the subject, but they get turned into a combination of ASCII and mojibake by the script.

Sorry about the 50-messages limit for dragging messages from an mbox to Finder. You can raise it if necessary using the MaxMessagesToDragAndDrop user default. This was originally in place for performance reasons because there was a bug that prevented EagleFiler from using promised files for the drag. I suspect that with some other changes in the interim I may be able to work around that and support dragging an arbitrary number of messages. You can also avoid the limit by dragging to a folder within EagleFiler, and then reveal that folder.

@Michael Thank you very much for the sample EML - have not come across another like it in hundreds of thousands of emails:

After importing it into Mail, all methods but Mail's own Mailbox > Export Mailbox... (to MBOX) failed to preserve the proper formatting on first blush:

* Clicking and dragging from Mail to Finder caused Finder to hang - had to Force Quit.

* The AppleScript-generated EML file did not preserve the Subject correctly.

* Importing from Mail into EagleFiler via F1 did not preserve the Subject correctly in EF's list view, nor in the filename when exported to EML. Happily, the exported EML's Subject line displayed properly when imported back into Mail (though it continued to cause Finder to crash when attempting to click and drag from Mail to Finder).

How do users end up with emails like this exactly?

Thanks also for the MaxMessagesToDragAndDrop and MBOX to Folder/EML within EF tips - very handy!

@Miles Hmm, I wonder why. They are definitely uncommon for a lot of people, but I probably get tens of them per day. It’s not just emoji—could be as simple as a bare 8-bit smart apostrophe in the subject. It also seems to happen a lot with Japanese text.

My primary concern with EagleFiler is that the file on disk has the correct data, which I believe it does. That makes it possible to adjust and improve the processing of the message later. Display of the message is a trickier issue because these messages are out-of-spec, so it has to guess how to interpret them. In this case, EagleFiler assumes an encoding of MacRoman for unmarked text because a lot of people have older messages (e.g. from Eudora) using that encoding. You can use the DefaultMessageEncoding setting in the esoteric preferences to change this, e.g. to UTF-8. (This will immediately fix the message display; fixing the cached list display would require rebuilding the mailbox’s table of contents or re-importing a .eml file.) I’ve got some code written for an upcoming version of SpamSieve that will try to auto-detect the proper encoding, and that will eventually go into EagleFiler, too.
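One common heuristic for guessing the encoding of unmarked text can be sketched in Python as follows. This is not EagleFiler's or SpamSieve's actual detection code, which is not shown in this thread; `decode_unmarked` is a hypothetical helper that tries strict UTF-8 first and falls back to MacRoman only when the bytes are not valid UTF-8.

```python
from typing import Tuple


def decode_unmarked(data: bytes, fallback: str = "mac_roman") -> Tuple[str, str]:
    """Decode bytes that carry no charset label.

    Returns the decoded text and the name of the encoding that was used.
    Strict UTF-8 is tried first because legacy 8-bit text rarely happens to
    be valid UTF-8; anything that fails is decoded with the fallback.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return data.decode(fallback), fallback


# A smart apostrophe stored as UTF-8 and as MacRoman decodes to the same text.
print(decode_unmarked("It\u2019s".encode("utf-8")))
print(decode_unmarked("It\u2019s".encode("mac_roman")))
```

The order matters: MacRoman assigns a character to every byte value, so trying it first would never fail and would turn UTF-8 text into mojibake.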

@Michael Thank you for your reply. I imported 2022-06-28-original.eml into Mail, copied the Subject, created a new email, and pasted the Subject into it; the new message had no trouble exporting normally. Here is the raw source of both messages compared side by side (sadly, Mail seems to ignore the Plain Text setting in Preferences > Composing > Composing: Message format:): https://tinyapps.org/screenshots/20220629-email-sbs.png .

Interestingly, if I clicked and dragged your message along with others out of Mail into Finder, not only did Finder not hang, but your message was exported successfully.

Did you create 2022-06-28-original.eml using a text editor? Or were you able to craft a message like that in Mail (or any other email client) natively? The other emails I could find containing Japanese or emoji with 7bit or 8bit Content-Transfer-Encoding did not suffer from the same issue.

@Miles That’s not surprising because I think Mail is good about sending properly formed e-mails. The problem messages usually seem to be sent by a mass-email program rather than a regular mail client.

To create the original.eml file I exported an actual e-mail that I received (and which failed to export properly using the script) and deleted most of the irrelevant parts using BBEdit.

@Michael I'm curious if the original email had an X-Mailer entry in the header, and if so, what it might've been? I'd like to scan for it in my own archive.

I've updated the AppleScript con section of the blog post with a link to this page - thanks for taking the time to discuss it so thoroughly.

@Miles There was no X-Mailer.

@Michael Thanks for checking. Of the tens of messages with the same issue you mentioned getting on a daily basis, can you please share a few of those X-Mailer headers? I'm eager to dig through my own archive and find similar cases, as I have several decades' worth and have been thoroughly testing long-term mail storage options on Linux, macOS, and Windows.

@Miles Here are some X-Mailers from my recent messages with this issue:

188899562862.01.51
188899562862.06.51
606
AOL 7.0 for Windows US sub 118
ActiveCampaign Mailer
DreamHost Mailing Lists
Episerver
Foxmail 4.1 [cn]
Foxmail 5.0 beta2 [cn]
Foxmail 6, 13, 102, 15 [cn]
Internet Mail Service (5.5.2650.21)
JiXing mailer V1.75 Design By JohnnieHuang
MIME-tools 5.503 (Entity 5.501)
MailMate (1.14r5900)
Maillink Scheduler v3.5
Microsoft Outlook Express 5.00.2919.6700
Microsoft Outlook Express 5.50.4133.2400
Microsoft Outlook Express 5.50.4522.1200
Microsoft Outlook Express 6.00.2462.0000
Microsoft Outlook Express 6.00.2600.0000
Microsoft Outlook Express 6.00.2800.1106
Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0)
Microsoft Outlook, Build 10.0.2616
Microsoft Outlook, Build 10.0.2627
PHPMailer 5.2.28 (https://github.com/PHPMailer/PHPMailer)
QUALCOMM Windows Eudora Version 5.1
The Bat! (v1.52f) Business
WebService/1.1.19266 mail.backend.jedi.jws.acl:role.jedi.acl.token.atz.jws.hermes.yahoo
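For anyone wanting to search their own archive for similar cases, here is a rough Python sketch. It assumes the archive is a folder of .eml files and treats any message whose raw source contains a byte above 0x7F as a candidate, which is a broader test than "out of spec" since properly labeled 8bit bodies also match. `tally_x_mailers` and `read_eml_files` are hypothetical names, and this is not how the list above was produced.

```python
from collections import Counter
from email.parser import BytesHeaderParser
from pathlib import Path
from typing import Iterable, Iterator


def tally_x_mailers(messages: Iterable[bytes]) -> Counter:
    """Count X-Mailer values among messages whose raw source has non-ASCII bytes."""
    parser = BytesHeaderParser()
    counts: Counter = Counter()
    for raw in messages:
        if raw.isascii():
            continue  # fully 7-bit source, not affected by the issue discussed here
        mailer = parser.parsebytes(raw).get("X-Mailer")
        label = str(mailer).strip() if mailer is not None else "(no X-Mailer)"
        counts[label] += 1
    return counts


def read_eml_files(folder: str) -> Iterator[bytes]:
    """Yield the raw bytes of every .eml file under the given folder."""
    for path in sorted(Path(folder).rglob("*.eml")):
        yield path.read_bytes()
```

A call such as `tally_x_mailers(read_eml_files("/path/to/exported-eml"))` (the path is a placeholder) returns a `Counter` whose `most_common()` method lists the most frequent senders first.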
eGroups Message Poster

@Michael - Awesome! Thanks so much. Looking forward to some digging.

I received a kind email from Beatrix Willius (developer of Mail Archiver X and earlier commenter in this thread) who shared that changing "source of theMsg" to "source of theMsg as Unicode text" in Tom's script would prevent mojibake. However, the double emoji in your sample EML was still not preserved intact by the script.

@Miles I actually have a script posted that shows how to do this properly. Writing as Unicode text (UTF-16) is half of it. That will put the raw data into every other byte, i.e. it won’t mangle it by trying to change the encoding. But what you end up with is not actually Unicode, which is why it doesn’t look right. You can then delete the NULL half of each pair to end up with the raw data.
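The second half of that approach, dropping the NULL byte of each pair, can be sketched in Python as a post-processing step. This is not the posted AppleScript. It assumes the file was written as UTF-16 and, when no byte order mark is present, that it is big-endian; both are assumptions to verify against real output. `recover_raw_bytes` is a hypothetical helper.

```python
def recover_raw_bytes(utf16_data: bytes, default_byteorder: str = "big") -> bytes:
    """Recover raw message bytes from a file in which each byte was stored
    as one UTF-16 code unit (value 0x00 to 0xFF).

    Raises UnicodeEncodeError if any code unit is above 0xFF, which would
    mean the file holds real Unicode text and not byte-per-unit data.
    """
    if utf16_data.startswith(b"\xfe\xff"):
        codec, body = "utf-16-be", utf16_data[2:]
    elif utf16_data.startswith(b"\xff\xfe"):
        codec, body = "utf-16-le", utf16_data[2:]
    else:
        # No byte order mark: fall back to the assumed byte order.
        codec = "utf-16-be" if default_byteorder == "big" else "utf-16-le"
        body = utf16_data
    text = body.decode(codec)
    # Latin-1 maps code points 0x00 to 0xFF straight back to single bytes,
    # which has the same effect as deleting the NULL half of each pair.
    return text.encode("latin-1")
```

Going through Latin-1 instead of slicing out every other byte gives a built-in sanity check: the conversion fails loudly if the discarded half of any pair is not actually NULL.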

P.S. @Michael What an epic range of recent correspondents! Amazing that ancient clients like AOL 7, OE 5 & 6, and The Bat! v1.5 are still being leveraged by users. Reminded me of this video:

Signing on to AOL 3.0 (1996) in 2022
https://www.youtube.com/watch?v=3OTr1pAPhIg

@Miles Some of them are from spams and so were probably not actually generated by those clients.

@Michael Thanks very much for the additional info and link to your script.

@Michael: Just tried compiling your raw script and received the following syntax error with "viewers" in the third line of code highlighted:

> Expected "given", "with", "without", other parameter name, etc. but found identifier.

@Miles I’m not sure why. It’s working for me. Did you try downloading one of the script files instead of copying/pasting?

@Michael Yes, I downloaded the raw text version from here using Safari 15.5:

https://c-command.com/scripts/eaglefiler/Import%20From%20Apple%20Mail.applescript

pasted it into Script Editor 2.11 and clicked Compile. Just tried it again this moment with the same result.

For good measure, downloaded via curl, opened "Import%20From%20Apple%20Mail.applescript" in Script Editor, and received the same error after clicking Compile.

@Michael I thought I'd try testing in a 10.15 VM; other than an apparent encoding issue with the greater than or equal sign in the raw text version, the script compiled without error. So I tried again in Monterey using the same version I tried hours ago - it compiled without error! As a sanity check, I started recording the screen and managed to capture the same syntax error as earlier while using the exact same code: https://tinyapps.org/screenshots/20220630-script-editor.mov . Beatrix also reported (via email) a syntax error when compiling. Not sure what to make of it, but grateful for you sharing your code. Sorry for the trouble.

@Michael
First time venturing onto your blog (as distinct from the c-command forums). Hope this is the right place to ask a fairly nooby question, which seems relevant to this post & thread.
In my noob way I've been trying to script the fortnightly downloading of a certain file for which I receive a download link in an email. I have some pieces of the workflow, but the glaring fail is how to extract the email's raw source (to input to my shell script that should extract the URL).
I've taken the liberty of playing with one of your EagleFiler AppleScripts in hope of achieving the above. But (a) I'm too nooby to properly understand the scope of the script and (b) I'm not sure it does produce the data or format I need.
Is there a script somewhere that might do it for me? Or is there a better forum for asking? Thanks.

@Lance In this case, perhaps you don’t want the raw source data. That’s useful if you want to save a copy of the full message. You could instead get the message’s content via AppleScript and that way let Mail handle the encoding issues, and then you would just extract the piece you want from the Unicode text.
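Once the message content is available as Unicode text, pulling out the link is a small text-processing step. A minimal Python sketch, assuming the content has already been obtained from Mail and passed in as a string; `extract_urls` and the pattern are illustrative only and will not cover every possible URL form.

```python
import re
from typing import List

# Match http(s) URLs up to the next whitespace, quote, or angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(content: str) -> List[str]:
    """Return the URLs found in decoded message text, in order of appearance."""
    # Trailing punctuation usually belongs to the sentence, not the link.
    return [match.rstrip(".,;:!?)") for match in URL_PATTERN.findall(content)]


sample = "Your file is ready: https://example.com/dl?id=42. Thanks!"
print(extract_urls(sample))  # ['https://example.com/dl?id=42']
```

Working from the decoded content sidesteps the transfer-encoding and charset problems entirely, since a link inside a base64 or quoted-printable body would not be visible in the raw source at all.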
