Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

September 26, 2016

Bulk Access to the Colin Powell Emails

Filed under: Colin Powell Emails,Government,Politics — Patrick Durusau @ 7:26 pm

The Colin Powell Email leak is important, but if you visit the DCLeaks page for Powell emails, June, July and August of 2014, this is what you find:

[Image: dc-leaks-search-460]

If you attempt to use the “search” box, you discover that your search is limited to June, July and August of 2014.

Then you remember the main page:

[Image: dcleaks-powell-contents-460]

Which means every search must be repeated thirteen (13) times to find all relevant emails.

The phone is ringing, your pager is going off, emails and IMs are piling up and you're on deadline. How useful is this interface to you as a reporter?

Have your own methods for processing large leaks of documents?

Not relevant here, because access to the Powell emails is one email at a time.

Put your drinking straw into a lake of 29,641 emails.

Best of luck with that drinking straw approach.

I’m suggesting a different approach.

What if someone automated that drinking straw and created a mirrored set of those 29,641 emails? Along with correcting the twelve (12) emails that choked a .eml to .mbox converter.
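For readers wondering what an .eml to .mbox conversion looks like, here is a minimal sketch using only the Python standard library. It is not the converter used for this data set. The function name `eml_dir_to_mbox` and the assumption that the mirrored emails sit in a directory as individual `.eml` files are mine, for illustration only. It records the files that fail instead of stopping, which is one way to find the handful of messages that need hand correction.

```python
import mailbox
from email import message_from_bytes
from pathlib import Path


def eml_dir_to_mbox(eml_dir, mbox_path):
    """Append every .eml file under eml_dir to an mbox file.

    Hypothetical helper: returns (converted_count, failed_paths) so that
    messages which cannot be converted can be inspected by hand later.
    """
    converted = 0
    failed = []
    box = mailbox.mbox(str(mbox_path))  # created if it does not exist
    box.lock()
    try:
        for path in sorted(Path(eml_dir).rglob("*.eml")):
            try:
                # Parse raw bytes; nothing in the message is executed.
                msg = message_from_bytes(path.read_bytes())
                box.add(msg)
                converted += 1
            except Exception:
                # Keep going and remember which file caused trouble.
                failed.append(path)
        box.flush()
    finally:
        box.unlock()
        box.close()
    return converted, failed
```

Calling `eml_dir_to_mbox("powell-eml", "powell.mbox")` (both paths are placeholders) would return the number of messages written and a list of files to examine manually.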

Interested?

Hosting Request: The full data set runs 2.5 GB, which, if popular, is far more traffic than I can support.

Requirements for hosting:

  1. Distribute the file as delivered to you.
  2. Distribute the file for free.

If you are interested, drop me a line at: patrick@durusau.net.

Warning: I have not checked the files or their attachments for malware, hostile links, etc. Open untrusted files in VMs without network connections. At a minimum.

Test your interest against the emails for March-April of 2016: powell-sample.tar.gz (roughly 108MB).
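With the emails on local disk, a search runs once over everything rather than once per three-month page. The following is a rough sketch, assuming the messages have already been gathered into a single mbox file; the layout of the sample archive is not described here, and `search_mbox` is a hypothetical name of my own. It does a case-insensitive substring match on a few headers only and does not open bodies or attachments.

```python
import mailbox


def search_mbox(mbox_path, term, headers=("From", "To", "Subject")):
    """Return (key, header_name, header_value) for each message in which
    one of the given headers contains term (case-insensitive)."""
    term = term.lower()
    hits = []
    # create=False: raise an error rather than create an empty mailbox
    box = mailbox.mbox(str(mbox_path), create=False)
    try:
        for key, msg in box.iteritems():
            for name in headers:
                value = msg.get(name)
                if value and term in str(value).lower():
                    hits.append((key, name, str(value)))
                    break  # one hit per message is enough
    finally:
        box.close()
    return hits
```

As the warning above says, the files are unchecked, so even header-only processing like this is best done inside a VM without a network connection.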

Manipulation, enhancement and analysis of samples and the full set to follow.
