Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

October 3, 2016

“Just the texts, Ma’am, just the texts” – Colin Powell Emails Sans Attachments

Filed under: Colin Powell Emails,Government,Politics,Uncategorized — Patrick Durusau @ 7:55 pm

As I reported in Bulk Access to the Colin Powell Emails – Update, I was looking for a host for the complete Colin Powell emails at 2.5 GB, but I failed on that score.

I can’t say whether that result reflects a lack of interest in making the full emails easily available or whether I didn’t ask the right people. Please circulate my request when you have time.

In the meantime, I have been jumping from one “easy” solution to another, most of which involved parsing the .eml files.

But my requirement is to separate the attachments from the emails, quickly and easily, not to parse the .eml files in preparation for further processing.

How does a 22-character, command-line sed expression sound?

Do you know of an “easier” solution?

sed -i '/base64/,$d' *

My reasoning: the first attachment (in the event of multiple attachments) will include the string “base64”, so I pass a range expression that starts at the first line containing that string and ends at the last line of the message (“$”), delete that range with “d”, and write the files in place with “-i”.
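For anyone who wants to try the expression before running it on the real files, here is a minimal sketch on a throwaway copy. The sample message, the boundary string “XYZ” and the temporary directory are hypothetical and made up for illustration. The only changes from the one-liner are restricting the glob to .eml files and giving “-i” a backup suffix, which both GNU and BSD sed accept.

```shell
# Hypothetical demo: build a tiny multipart message in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/sample.eml" <<'EOF'
From: someone@example.com
Subject: test
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain

Body text here.
--XYZ
Content-Type: application/pdf; name="a.pdf"
Content-Transfer-Encoding: base64

JVBERi0xLjQK
--XYZ--
EOF

# Same range expression as above: delete from the first line containing
# "base64" through the last line of each file. The .bak suffix keeps the
# untouched original next to the stripped copy.
sed -i.bak '/base64/,$d' "$workdir"/*.eml

# Show what survives.
cat "$workdir/sample.eml"
```

Two things to watch for when checking the result. The deletion starts at the line that contains “base64”, so the attachment’s own Content-Type header line is left behind at the end of the file. And the match is on the bare string, so a message whose text body is itself base64-encoded would lose that body as well.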

There are far more sophisticated solutions to this problem but as crude as this may be, I have reduced the 2.5 GB archive file that includes all the emails and their attachments down to 63 megabytes.

Attachments are important too but my first steps were to make these and similar files more accessible.

Obtaining more than 29K files through the drinking straw at DCLeaks, or waiting until I find a host for a consolidated 2.5 GB file, doesn’t make these files more accessible.

A 63 MB download of the Colin Powell Emails With No Attachments may.

Please feel free to mirror these files.

PS: One oddity I noticed in testing the download. With Chrome, the file size inflates to 294MB. With Mozilla, the file size is 65MB. Both unpack properly. Suggestions?

PPS: More sophisticated processing of the raw emails and other post-processing to follow.
