October 6, 2016

517 Excel Files Led The Guccifer2.0 Parade (October 6, 2016)

Filed under: Government,Hillary Clinton,Politics — Patrick Durusau @ 8:36 pm

As of today, the data dumps by Guccifer2.0 have contained 517 Excel files.

The vehemence of posts dismissing these dumps makes me wonder two things:

  1. How many of the Excel files have these commentators reviewed?
  2. What is it that you might find in them that worries them so?

I don’t know the answer to #1 and I won’t speculate on their diligence in examining these files. You can reach your own conclusions in that regard.

Nor can I give you an answer to #2, but I may be able to help you explore these spreadsheets.

The old-fashioned way, opening each file by hand at one Excel file per minute (assuming normal Office performance ;-)), would take 517 minutes, or about 8 hours and 37 minutes. That is longer than an eight-hour day just to open them all.

Even then, you still must understand and compare the spreadsheets.

To make 517 Excel files more than a number, here’s a list of all the Guccifer2.0 released Excel files as of today: guccifer2.0-excel-files-sorted.txt.

(I do have an unfair advantage in that I am willing to share the files I generate, enabling you to check my statements for yourself. A personal preference for fact-based pleading as opposed to conclusory hand waving.)

If you think of each line in the spreadsheets as a record, this sounds like a record linkage problem. Except that the spreadsheets have no uniform number of fields, no common headers, and so on.

With record linkage, we would munge all the records into a single record format and then, and only then, match up records to see which ones have data about the same subjects.
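To make the conventional approach concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the header names, the `HEADER_MAP` table, the sample rows, and the choice of `name` as the matching key are assumptions, not anything taken from the Guccifer2.0 files.

```python
# Hypothetical header names; real spreadsheets would need a far larger map.
HEADER_MAP = {
    "donor name": "name",
    "full name": "name",
    "name": "name",
    "amt": "amount",
    "amount": "amount",
    "contribution": "amount",
}


def to_master(record):
    """Rename known headers to the master format, dropping unknown ones."""
    master = {}
    for header, value in record.items():
        field = HEADER_MAP.get(header.strip().lower())
        if field is not None:
            master[field] = value
    return master


def link(records_a, records_b, key="name"):
    """Pair master-format records whose key matches, ignoring case and outer whitespace."""
    index = {}
    for rec in records_b:
        if key in rec:
            index.setdefault(str(rec[key]).strip().lower(), []).append(rec)
    pairs = []
    for rec in records_a:
        if key in rec:
            for match in index.get(str(rec[key]).strip().lower(), []):
                pairs.append((rec, match))
    return pairs


# Made-up rows from two imaginary sheets with different headers.
sheet_a = [{"Donor Name": "Jane Roe", "Amt": 500}]
sheet_b = [{"Full Name": "jane roe ", "Contribution": 250, "Notes": "called twice"}]

master_a = [to_master(r) for r in sheet_a]
master_b = [to_master(r) for r in sheet_b]
pairs = link(master_a, master_b)
print(pairs)
```

Note what the munging step costs: the `Notes` column has no slot in the master format, so it is silently dropped before any matching happens. Multiply that decision by every distinct header across 517 files.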

Thinking about that, the number 517 looms large because all the formats must be reconciled to one master format before we start getting useful comparisons.

I think we can do better than that.

First step, let’s consider how to create a master record set that keeps all the data as it exists now in the spreadsheets, but as a single file.
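One possible shape for such a file, offered only as a sketch: one JSON line per spreadsheet row, recording the file, sheet, and row it came from, with the sheet's own headers left untouched. The function names, the JSON Lines layout, and the sample rows are my assumptions for illustration. The sketch also assumes the sheets have already been read into lists of rows by some Excel reader (not shown) and that the first row of each sheet holds the headers, which will not always be true.

```python
import io
import json
from itertools import zip_longest


def master_records(filename, sheet_name, rows):
    """Yield one record per data row, keeping the sheet's own headers.

    `rows` is a list of lists. The first row is treated as the header row
    (an assumption; real sheets may have title rows or no headers at all).
    """
    if not rows:
        return
    headers = rows[0]
    # Spreadsheet-style numbering: the header is row 1, data starts at row 2.
    for row_number, row in enumerate(rows[1:], start=2):
        yield {
            "file": filename,
            "sheet": sheet_name,
            "row": row_number,
            # Positional [header, value] pairs, so duplicate or blank headers
            # and rows of uneven length lose nothing (gaps become None).
            "cells": [[h, v] for h, v in zip_longest(headers, row)],
        }


def write_master(sheets, out):
    """Write every record from every sheet to `out` as JSON Lines; return the count."""
    count = 0
    for filename, sheet_name, rows in sheets:
        for record in master_records(filename, sheet_name, rows):
            out.write(json.dumps(record) + "\n")
            count += 1
    return count


# Made-up sheets with different headers and different numbers of fields.
sheets = [
    ("a.xlsx", "Sheet1", [["Name", "Amt"], ["Jane Roe", 500]]),
    ("b.xlsx", "Donors", [["Full Name", "City", "Notes"], ["John Doe", "Springfield"]]),
]

buf = io.StringIO()
count = write_master(sheets, buf)
print(buf.getvalue())
```

Nothing is reconciled here. Each record carries its original headers and its provenance, so any later mapping between headers can be checked against, and redone from, the unaltered data.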

See you tomorrow!

