Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

October 7, 2016

Avoiding Viruses in DNC/DCCC/CF Excel Files

Filed under: Cybersecurity,Excel,Government,Hillary Clinton,Politics — Patrick Durusau @ 4:36 pm

I hope you haven’t opened any of the DNC/DCCC/CF Excel files outside of a VM. 517 Excel Files Led The Guccifer2.0 Parade (October 6, 2016)

Yes?

Files from trusted sources can contain viruses. Files from unknown or rogue sources even more so. However tempting (and easy) it is to open up alleged purloined files on your desktop, minimal security conscious users will resist the temptation.

Warning: I did NOT scan the Excel files for viruses. The best way to avoid Excel viruses is to NOT open Excel files.

I used ssconvert, one of the utilities included with gnumeric to bulk convert the Excel files to csv format. (Comma Separate Values is documents in RFC 4780.

Tip: If you are looking for a high performance spreadsheet application, take a look at gnumeric.

Ssconvert relies on file extensions (although other options are available) so I started with:

ssconvert -S donors.xlsx donors.csv

The -S option takes care of workbooks with multiple worksheets. You need a later version of ssconvert (mine is 1.12.9-1 (2013) and the current version of gnumeric and ssconvert is 1.12.31 (August 2016), to convert the .xlsx files without warning.

I’m upgrading to Ubuntu 16.04 soon so it wasn’t worth the trouble trying to stuff a later version of gnumeric/ssconvert onto my present Ubuntu 14.04.

Despite the errors, the conversion appears to have worked properly:

donors-01-460

to its csv output:

donor-03-460

I don’t see any problems.

I’m checking a sampling of the other conversions as well.

BTW, do notice the confirmation of reports from some commentators that they contacted donors who confirmed donating, but could not recall the amounts.

Could be true. If you pay protection money often enough, I’m sure it’s hard to recall a specific payment.

Sorry, I got distracted.

So, only 516 files to go.

I don’t recommend you do:

ssconvert -S filename.xlsx filename.csv

516 times. That will be tedious and error prone.

At least for Linux, I recommend:

for f in *.xls*; do
   ssconvert -S $f $f.csv
done

The *.xls* captures both .xsl and .xslx files, then invokes ssconvert -S on the file and then saves the output file with the original name, plus the extension .csv.

The wc -l command reports 1030828 lines in the consolidated csv file for these spreadsheets.

That’s a lot of lines!

I have some suggestions on processing that file, see: DNC/DCCC/CF Excel Files, As Of October 7, 2016.

No Comments

No comments yet.

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress