Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

November 1, 2016

How To DeDupe Clinton/Weiner/Abedin Emails….By Tomorrow

Filed under: FBI,Hillary Clinton,Politics — Patrick Durusau @ 1:43 pm

The report by Halimah Abdullah, FBI Working to Winnow Through Emails From Anthony Weiner’s Laptop, casts serious doubt on the technical prowess of the FBI when it says:


Officials have been combing through the emails since Sunday night — using a program designed to find only the emails to and from Abedin within the time when Clinton was secretary of state. Agents will compare the latest batch of messages with those that have already been investigated to determine whether any classified information was sent from Clinton’s server.

This process will take some time, but officials tell NBC News that they hope that they will wrap up the winnowing process this week.

Since Sunday night?

Here’s how the FBI, using standard Unix tools, could have finished the “winnowing” in time for the Monday evening news cycle:

  1. Transform (if not already) all the emails into .eml format (to give you separate files for each email).
  2. Grep the resulting file set for emails that contain the Clinton email server by name or address.
  3. Save the result of #2 to a file and copy all those messages to a separate directory.
  4. Extract the digital signature from each of the copied messages (see below) and save each digital signature, plus the name of the file where it was found, to the Abedin file.
  5. Extract the digital signatures from previously reviewed Clinton email server emails, save digital signatures only to the prior-Clinton-review file.
  6. Search for each digital signature in the Abedin file in the prior-Clinton-review file. If found, reviewed. If not found, new email.

The digital signatures are unique to each email and can therefore be used to dedupe or, in this case, to identify previously reviewed emails.
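Steps 4 and 5 could be sketched in Python instead of grep and friends. This is only a minimal sketch, assuming each message is a plain RFC 2822 text file that actually carries a DKIM-Signature header. The function name dkim_b_tag and the sample message are made up for illustration, and mail without a DKIM header simply yields None and would need some other fingerprint.

```python
import re
from email import message_from_string
from typing import Optional

# Hypothetical sample message, built around the example signature in this post.
SAMPLE = (
    "From: user@eng.example.com\n"
    "To: someone@example.org\n"
    "Subject: test\n"
    "DKIM-Signature: a=rsa-sha1; q=dns;\n"
    " d=example.com;\n"
    " s=jun2005.eng; c=relaxed/simple;\n"
    " h=from:to:subject:date;\n"
    " b=dzdVyOfAKCdLXdJOc9G2q8LoXSlEniSb\n"
    " av+yuU4zGeeruD00lszZVoG4ZHRNiYzR\n"
    "\n"
    "body text\n"
)


def dkim_b_tag(raw_message: str) -> Optional[str]:
    """Return the b= value of the first DKIM-Signature header, or None."""
    header = message_from_string(raw_message).get("DKIM-Signature")
    if header is None:
        return None  # unsigned mail: no DKIM signature to extract
    # Match "b=" at the start of the value or right after a ";" (so "bh=" is skipped).
    match = re.search(r"(?:^|;)\s*b=([^;]*)", str(header))
    if match is None:
        return None
    # Folded header lines leave whitespace inside the base64 value; drop it.
    return re.sub(r"\s+", "", match.group(1))


print(dkim_b_tag(SAMPLE))
```

Run over every file in the copied directory, writing the signature and file name on one line each, this would produce the Abedin file from step 4.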

Here’s a DKIM example signature:

How can I read the DKIM header?

Here is an example DKIM signature (recorded as an RFC2822 header field) for the signed message:

DKIM-Signature: a=rsa-sha1; q=dns;
    d=example.com;
    i=user@eng.example.com;
    s=jun2005.eng; c=relaxed/simple;
    t=1117574938; x=1118006938;
    h=from:to:subject:date;
    b=dzdVyOfAKCdLXdJOc9G2q8LoXSlEniSb
    av+yuU4zGeeruD00lszZVoG4ZHRNiYzR

Let’s take this piece by piece to see what it means. Each “tag” is associated with a value.

  • b = the actual digital signature of the contents (headers and body) of the mail message
  • bh = the body hash
  • d = the signing domain
  • s = the selector
  • v = the version
  • a = the signing algorithm
  • c = the canonicalization algorithm(s) for header and body
  • q = the default query method
  • l = the length of the canonicalized part of the body that has been signed
  • t = the signature timestamp
  • x = the expire time
  • h = the list of signed header fields, repeated for fields that occur multiple times

We can see from this email that:

  • The digital signature is dzdVyOfAKCdLXdJOc9G2q8LoXSlEniSbav+yuU4zGeeruD00lszZVoG4ZHRNiYzR.
    This signature is matched with the one stored at the sender’s domain.
  • The body hash is not listed.
  • The signing domain is example.com.
    This is the domain that sent (and signed) the message.
  • The selector is jun2005.eng.
  • The version is not listed.
  • The signing algorithm is rsa-sha1.
    This is the algorithm used to generate the signature.
  • The canonicalization algorithm(s) for header and body are relaxed/simple.
  • The default query method is DNS.
    This is the method used to look up the key on the signing domain.
  • The length of the canonicalized part of the body that has been signed is not listed.
    The signing domain can generate a key based on the entire body or only some portion of it. That portion would be listed here.
  • The signature timestamp is 1117574938.
    This is when it was signed.
  • The expire time is 1118006938.
    Because an already signed email can be reused to “fake” the signature, signatures are set to expire.
  • The list of signed header fields includes from:to:subject:date.
    This is the list of fields that have been “signed” to verify that they have not been modified.

From: What is DKIM? Everything You Need to Know About Digital Signatures by Geoff Phillips.
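The tag-by-tag reading quoted above can be mimicked with a few lines of Python. This is a hedged sketch, not a full DKIM parser: parse_dkim_tags is a hypothetical helper, it takes the header value (everything after the header name) as a string, and it only splits the "tag=value" pairs on semicolons without verifying anything.

```python
# The example header value from the quoted DKIM explanation.
EXAMPLE = """a=rsa-sha1; q=dns;
d=example.com;
i=user@eng.example.com;
s=jun2005.eng; c=relaxed/simple;
t=1117574938; x=1118006938;
h=from:to:subject:date;
b=dzdVyOfAKCdLXdJOc9G2q8LoXSlEniSb
av+yuU4zGeeruD00lszZVoG4ZHRNiYzR"""


def parse_dkim_tags(header_value: str) -> dict:
    """Split a DKIM-Signature value into a {tag: value} dictionary."""
    tags = {}
    for part in header_value.split(";"):
        name, sep, value = part.partition("=")
        if not sep:
            continue  # empty piece, e.g. after a trailing semicolon
        name = name.strip()
        if name in ("b", "bh"):
            # Base64 data may be folded across lines; remove all whitespace.
            value = "".join(value.split())
        else:
            value = value.strip()
        tags[name] = value
    return tags


tags = parse_dkim_tags(EXAMPLE)
print(tags["d"], tags["s"], tags["a"])
# Signature lifetime in days: expire time minus timestamp.
print((int(tags["x"]) - int(tags["t"])) // 86400)
```

Tags that are "not listed" in the example (bh, v, l) are simply absent from the dictionary, which matches the reading in the quoted list.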

Altogether now, to eliminate previously reviewed emails we need only compare:

dzdVyOfAKCdLXdJOc9G2q8LoXSlEniSbav+yuU4zGeeruD00lszZVoG4ZHRNiYzR (example, use digital signatures from Abedin file)

to the digital signatures in the prior-Clinton-review file.

Those that don’t match are new files to review.
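The comparison itself is a set lookup. Here is a minimal Python sketch, assuming the Abedin file holds one "signature, tab, file name" pair per line and the prior-Clinton-review file holds one signature per line. Both layouts, the function name and the sample file names and second signature are my own inventions for illustration.

```python
def split_new_and_reviewed(abedin_lines, reviewed_lines):
    """Return (new_files, already_reviewed_files) based on signature matches."""
    # Signatures from the prior review, one per line.
    reviewed = {line.strip() for line in reviewed_lines if line.strip()}
    new_files, seen_files = [], []
    for line in abedin_lines:
        if not line.strip():
            continue  # skip blank lines
        signature, _, filename = line.rstrip("\n").partition("\t")
        if signature in reviewed:
            seen_files.append(filename)
        else:
            new_files.append(filename)
    return new_files, seen_files


# Hypothetical contents of the two files.
abedin_file = [
    "dzdVyOfAKCdLXdJOc9G2q8LoXSlEniSbav+yuU4zGeeruD00lszZVoG4ZHRNiYzR\tmsg0001.eml\n",
    "AAAAexampleSignatureNotSeenBefore\tmsg0002.eml\n",
]
prior_review_file = [
    "dzdVyOfAKCdLXdJOc9G2q8LoXSlEniSbav+yuU4zGeeruD00lszZVoG4ZHRNiYzR\n",
]

new_files, seen_files = split_new_and_reviewed(abedin_file, prior_review_file)
print("new:", new_files)
print("already reviewed:", seen_files)
```

Because the reviewed signatures sit in a set, each lookup takes roughly constant time, so even hundreds of thousands of messages are a matter of seconds, not days.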

Why the news media hasn’t pressed the FBI on its extremely poor data processing performance is a mystery to me.

You?

PS: FBI field agents with data mining questions, I do off-your-books freelance consulting. Apologies but on-my-books for the tax man. If they don’t tell, neither will I.

