Archive for the ‘CIA’ Category

Shadow Brokers Compilation Dates

Wednesday, April 19th, 2017

ShadowBrokers EquationGroup Compilation Timestamp Observation

From the post:

I looked at the IOCs @GossiTheDog posted, looked each up in VirusTotal and dumped the compilation timestamp into a spreadsheet.

To step back a second, the Microsoft Windows compiler embeds the date and time that the given .exe or .dll was compiled. Compilation time is a very useful characteristic of Portable Executable. Malware authors could zero it or change it to a random value, but I’m not sure there is any indication of that here. If the compilation timestamps are real, then there’s an interesting observation in this dataset.

A very clever observation! Check time stamps for patterns!
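The quoted post relies on the TimeDateStamp field that the linker embeds in every PE (.exe/.dll) header. A minimal sketch of reading that field with nothing but Python's standard library (the file path is whatever binary you point it at; offsets are per the PE/COFF layout):

```python
import struct
from datetime import datetime, timezone

def pe_timestamp(path):
    """Read the TimeDateStamp field from a PE (EXE/DLL) file header."""
    with open(path, "rb") as f:
        data = f.read(1024)  # headers live near the start of the file
    if data[:2] != b"MZ":
        raise ValueError("not a PE file (missing MZ signature)")
    # Offset 0x3C of the DOS header holds the offset of the PE signature.
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_off:pe_off + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The COFF header follows the signature: Machine (2 bytes),
    # NumberOfSections (2 bytes), then TimeDateStamp (4 bytes),
    # expressed as seconds since the Unix epoch.
    (stamp,) = struct.unpack_from("<I", data, pe_off + 8)
    return datetime.fromtimestamp(stamp, tz=timezone.utc)
```

Run over a directory of samples, the results can be dumped into a spreadsheet for exactly the kind of pattern-spotting described above. Remember the caveat from the quote: authors can zero or randomize this field.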

Enables an attentive reader to ask:

  1. Were the Shadow Brokers exploits stolen prior to 2013-08-22?
  2. If no to #1, where are the exploits post 2013-08-22?

Have the dumps so far been distant lightning preceding much closer thunderclaps?

Imagine compilation timestamps in 2014, 2015, or even 2016.

Listen for Shadow Brokers to roar!

Shadow Brokers Level The Playing Field

Monday, April 17th, 2017

The whining and moaning from some security analysts over Shadow Broker dumps is a mystery to me.

Apologies for the pie chart, but the blue area represents the widely vulnerable population pre-Shadow Brokers leak:

I’m sorry, you can’t really see the 0.01% or less who weren’t vulnerable pre-Shadow Brokers leak. Try this enlargement:

Shadow Brokers, especially if they leak more current tools, are leveling the playing field for the average user/hacker.

Instead of 99.99% of users being in danger from people who buy or sell zero-day exploits, and from some governments and corporations, now closer to 100% of all users are in danger.

Listen to them howl!

What was not a big deal, since people with power could hack the other 99.99% of us, certainly is a really big deal now.

Maybe we will see incentives for more secure software when everyone, and I mean everyone, is at equal risk.

Help Shadow Brokers level the security playing field.

A post on discovery policy for vulnerabilities promotes user equality.

Do you favor user equality or some other social regime?

The Line Between Safety and Peril – (patched) “Supported Products”

Saturday, April 15th, 2017

Dan Goodin in NSA-leaking Shadow Brokers just dumped its most damaging release yet reports in part:


Friday’s release—which came as much of the computing world was planning a long weekend to observe the Easter holiday—contains close to 300 megabytes of materials the leakers said were stolen from the NSA. The contents (a convenient overview is here) included compiled binaries for exploits that targeted vulnerabilities in a long line of Windows operating systems, including Windows 8 and Windows 2012. It also included a framework dubbed Fuzzbunch, a tool that resembles the Metasploit hacking framework that loads the binaries into targeted networks.

Independent security experts who reviewed the contents said it was without question the most damaging Shadow Brokers release to date.
“It is by far the most powerful cache of exploits ever released,” Matthew Hickey, a security expert and co-founder of Hacker House, told Ars. “It is very significant as it effectively puts cyber weapons in the hands of anyone who downloads it. A number of these attacks appear to be 0-day exploits which have no patch and work completely from a remote network perspective.”

News of the release has been fanned by non-technical outlets, such as CNN Tech with NSA’s powerful Windows hacking tools leaked online by Selena Larson.

Microsoft has responded with: Protecting customers and evaluating risk:

Today, Microsoft triaged a large release of exploits made publicly available by Shadow Brokers. Understandably, customers have expressed concerns around the risk this disclosure potentially creates. Our engineers have investigated the disclosed exploits, and most of the exploits are already patched. Below is our update on the investigation.

Code Name Solution
EternalBlue Addressed by MS17-010
EmeraldThread Addressed by MS10-061
EternalChampion Addressed by CVE-2017-0146 & CVE-2017-0147
ErraticGopher Addressed prior to the release of Windows Vista
EskimoRoll Addressed by MS14-068
EternalRomance Addressed by MS17-010
EducatedScholar Addressed by MS09-050
EternalSynergy Addressed by MS17-010
EclipsedWing Addressed by MS08-067

Of the three remaining exploits, “EnglishmanDentist”, “EsteemAudit”, and “ExplodingCan”, none reproduces on supported platforms, which means that customers running Windows 7 and more recent versions of Windows or Exchange 2010 and newer versions of Exchange are not at risk. Customers still running prior versions of these products are encouraged to upgrade to a supported offering.
… (emphasis in original)

You are guaranteed to be in peril if you are not running patched, supported Microsoft products.

Even if you are running a supported product, know that a large share of successful attacks exploit vulnerabilities for which patches already exist but were never applied.

Hackers who may be in your system right now are beyond your reach. Liability, whether of vendors for unreasonably poor coding practices or of your company for data breaches caused by your own practices (such as failure to apply patches), would be a real incentive for more secure software and better security practices.

If you are serious about cybersecurity, focus on people you can reach and not those you encounter at random (hackers).

Happy Easter From Shadow Brokers!

Friday, April 14th, 2017

Shadow Brokers Release New Files Revealing Windows Exploits, SWIFT Attacks by Catalin Cimpanu.

From the post:

On Good Friday and ahead of the Easter holiday, the Shadow Brokers have dumped a new collection of files, containing what appears to be exploits and hacking tools targeting Microsoft’s Windows OS and evidence the Equation Group had gained access to servers and targeted the SWIFT banking system of several banks across the world.

The tools were dumped via the Shadow Brokers Twitter account and were accompanied by a blog post, as the group did in the past.

Called “Lost in Translation,” the blog post contains the usual indecipherable ramblings the Shadow Brokers have published in the past, and a link to a Yandex Disk file storage repo.

Cimpanu has a partial list of some of the more interesting hacking tools in the release.

Encouragement to grab a copy of the archive for yourself.

Assuming any, some, or all of these tools are genuine, you can now start peeling banks, corporations, and governments like an orange.

The only thing that’s missing is you.

Transparency anyone?

CIA To Silence Wikileaks? Donate/Leak to Wikileaks

Thursday, April 13th, 2017

CIA chief targets WikiLeaks and Julian Assange as ‘hostile,’ vows to take action by Tim Johnson.

From the post:

CIA Director Mike Pompeo on Thursday called the anti-secrecy group WikiLeaks a hostile intelligence service and said the group would soon face decisive U.S. action to stifle its disclosures of leaked material.

“It ends now,” Pompeo said in his first public remarks after 10 weeks on the job, indicating that President Donald Trump will take undefined but forceful action.

Pompeo lashed out aggressively against Julian Assange, the Australian founder of WikiLeaks – who has been holed up in the Ecuadorean embassy in London for nearly five years – calling him a narcissist and “a fraud, a coward hiding behind a screen.”

Really?

Given the perennial failure of the CIA to discover terror attacks before they happen, recognize when governments are about to fall, and maintain their own security, I can’t imagine Assange and Wikileaks are shaking in their boots.

I disagree with Wikileaks on their style of leaking; I prefer faster, unedited leaking. But that's a question of style, not of whether to leak.

If, and it’s a big if, Wikileaks is silenced, the world will grow suddenly darker. Much of what Wikileaks has published would never be published by mainstream media, much to the detriment of citizens around the world.

Two things you need to do:

The easy one, donate to support WikiLeaks. As often as you can.

The harder one, leak secrets to Wikileaks.

Repressive governments are pressing WikiLeaks; help WikiLeaks push them back with a fire hose of leaks.

Wikileaks Vault 7 “Grasshopper” – A Value Added Listing

Friday, April 7th, 2017

Wikileaks has released Vault 7 “Grasshopper.”

As I have come to expect, the release:

  • Is in no particular order
  • Requires loading an HTML page before obtaining a PDF file

Here is a value-added listing that corrects both of those problems (and includes page numbers):

  1. GH-Drop-v1_0-UserGuide.pdf 2 pages
  2. GH-Module-Bermuda-v1_0-UserGuide.pdf 9 pages
  3. GH-Module-Buffalo-Bamboo-v1_0-UserGuide.pdf 7 pages
  4. GH-Module-Crab-v1_0-UserGuide.pdf 6 pages
  5. GH-Module-NetMan-v1_0-UserGuide.pdf 6 pages
  6. GH-Module-Null-v2_0-UserGuide.pdf 5 pages
  7. GH-Module-Scrub-v1_0-UserGuide.pdf 6 pages
  8. GH-Module-Wheat-v1_0-UserGuide.pdf 5 pages
  9. GH-Module-WUPS-v1_0-UserGuide.pdf 6 pages
  10. GH-Run-v1_0-UserGuide.pdf 2 pages
  11. GH-Run-v1_1-UserGuide.pdf 2 pages
  12. GH-ScheduledTask-v1_0-UserGuide.pdf 3 pages
  13. GH-ScheduledTask-v1_1-UserGuide.pdf 4 pages
  14. GH-ServiceDLL-v1_0-UserGuide.pdf 4 pages
  15. GH-ServiceDLL-v1_1-UserGuide.pdf 5 pages
  16. GH-ServiceDLL-v1_2-UserGuide.pdf 5 pages
  17. GH-ServiceDLL-v1_3-UserGuide.pdf 6 pages
  18. GH-ServiceProxy-v1_0-UserGuide.pdf 4 pages
  19. GH-ServiceProxy-v1_1-UserGuide.pdf 5 pages
  20. Grasshopper-v1_1-AdminGuide.pdf 107 pages
  21. Grasshopper-v1_1-UserGuide.pdf 53 pages
  22. Grasshopper-v2_0_1-UserGuide.pdf 134 pages
  23. Grasshopper-v2_0_2-UserGuide.pdf 134 pages
  24. Grasshopper-v2_0-UserGuide.pdf 134 pages
  25. IVVRR-Checklist-StolenGoods-2_0.pdf 2 pages
  26. StolenGoods-2_0-UserGuide.pdf 11 pages
  27. StolenGoods-2_1-UserGuide.pdf 22 pages

If you notice that the Grasshopper-*****-UserGuide.pdf appears in four different versions, good for you!

I suggest you read only Grasshopper-v2_0_2-UserGuide.pdf.

The differences between Grasshopper-v1_1-UserGuide.pdf at 53 pages and Grasshopper-v2_0-UserGuide.pdf at 134 pages, are substantial.

However, across Grasshopper-v2_0-UserGuide.pdf, Grasshopper-v2_0_1-UserGuide.pdf, and Grasshopper-v2_0_2-UserGuide.pdf, the only differences are these:

diff Grasshopper-v2_0-UserGuide.txt Grasshopper-v2_0_1-UserGuide.txt

4c4
< Grasshopper v2.0 
---
> Grasshopper v2.0.1 
386a387,389
> 
> Payloads arguments can be added with the optional -a parameter when adding a 
> payload component. 


diff Grasshopper-v2_0_1-UserGuide.txt Grasshopper-v2_0_2-UserGuide.txt

4c4
< Grasshopper v2.0.1 
---
> Grasshopper v2.0.2 
1832c1832
< winxppro-sp0 winxppro-sp1 winxppro-sp2 winxppro-sp3 
---
> winxp-x64-sp0 winxp-x64-sp1 winxp-x64-sp2 winxp-x64-sp3 
1846c1846
< winxppro win2003 
---
> winxp-x64 win2003 

Unless you are preparing a critical edition for the CIA and/or you are just exceptionally anal, the latest version, Grasshopper-v2_0_2-UserGuide.pdf, should be sufficient for most purposes.

Not to mention saving you 321 pages of duplicated reading.

Enjoy!

Wikileaks Marble – 676 Source Code Files – Would You Believe 295 Unique (Maybe)

Friday, March 31st, 2017

Wikileaks released Marble Framework, described as:

Today, March 31st 2017, WikiLeaks releases Vault 7 “Marble” — 676 source code files for the CIA’s secret anti-forensic Marble Framework. Marble is used to hamper forensic investigators and anti-virus companies from attributing viruses, trojans and hacking attacks to the CIA.

Effective leaking doesn’t seem to have recommended itself to Wikileaks.

Marble-Framework-ls-lRS-devworks.txt, is an ls -lRS listing of the devworks directory.

After looking for duplicate files and starting this post, I discovered entirely duplicated directories:

Compare:

devutils/marbletester/props with devutils/marble/props.

devutils/marbletester/props/internal with devutils/marble/props/internal

devutils/marbleextensionbuilds/Marble/Deobfuscators with devutils/marble/Shared/Deobfuscators

That totals 182 entirely duplicated files.

In Marble-Framework-ls-lRS-devworks-annotated.txt I separated files on the basis of file size. Groups of duplicate files are separated from other files with a blank line and headed by the number of duplicate copies.

I marked only exact file size matches as duplicates, even though files close in size could be the result of insignificant whitespace.
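The size-based grouping described above can be sketched in Python (the directory path is whatever you unpacked the release into). Exact-size matches flag candidates, as in the annotated listing, and hashing the contents then confirms which are true duplicates rather than coincidental size matches:

```python
import hashlib
import os
from collections import defaultdict

def duplicate_groups(root):
    """Group files under root by exact size, then confirm duplicates
    by SHA-256 so coincidental size matches are not counted."""
    by_size = defaultdict(list)
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # unique size, cannot be a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            with open(path, "rb") as f:
                by_hash[hashlib.sha256(f.read()).hexdigest()].append(path)
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups
```

Summing `len(g) for g in groups` gives a duplicate count comparable to the hand tally below.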

After removing the entirely duplicated directories, there remain 199 duplicate files.

The 182 files in entirely duplicated directories plus the 199 remaining duplicates bring the grand total to 381 duplicate files.

Or the quicker way to say it: Vault 7 Marble — 295 unique source code files for the CIA’s secret anti-forensic Marble Framework.

Wikileaks may be leaking the material just as it was received, but that's a very poor use of your time and resources.

Leak publishers should polish leaks until they have a fire-hardened point.

Other Methods for Boiling Vault 7: CIA Hacking Tools Revealed?

Friday, March 24th, 2017

You may have other methods for boiling content out of the Wikileaks Vault 7: CIA Hacking Tools Revealed.

To that end, here is the list of deduped files.

Warning: The Wikileaks pages I have encountered are so malformed that repair will always be necessary before using XQuery.

Enjoy!

Efficient Querying of Vault 7: CIA Hacking Tools Revealed

Friday, March 24th, 2017

This week we have covered:

  1. Fact Checking Wikileaks’ Vault 7: CIA Hacking Tools Revealed (Part 1) Eliminated duplication and editorial artifacts, 1134 HTML files out of 7809 remain.
  2. Fact Checking Wikileaks’ Vault 7: CIA Hacking Tools Revealed (Part 2 – The PDF Files) Eliminated public and placeholder documents, 114 arguably CIA files remain.
  3. CIA Documents or Reports of CIA Documents? Vault7 All of the HTML files are reports of possibly CIA material but we do know HTML file != CIA document.
  4. Boiling Reports of CIA Documents (Wikileaks CIA Vault 7 CIA Hacking Tools Revealed) The HTML files contain a large amount of cruft, which can be extracted using XQuery and common tools.

Interesting, from a certain point of view, but aside from highlighting bloated leaking from Wikileaks, why should anyone care?

Good question!

Let’s compare the de-duped but raw document set with the de-duped and boiled document set.

De-duped but raw document set:

De-duped and boiled document set:

In raw count, boiling took us from 2,131,135 words/tokens to 665,202 words/tokens.

Here’s a question for busy reporters/researchers:

Has the CIA compromised the Tor network?

In the raw files, Tor occurs 22660 times.

In the boiled files, Tor occurs 4 times.

Here’s a screen shot of the occurrences:

With TextSTAT, select the occurrence in the concordance, and another select (a mouse click, for non-specialists) takes you to:

In a matter of seconds, you can answer: as far as the HTML documents of Vault7 Part 1 show, the CIA is talking about top of rack (ToR), a network switching architecture, not the Tor Project.
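The case of the match matters here ("Tor" vs. "ToR"). A minimal concordance sketch in Python, with character-based left/right context like TextSTAT's (function name and context width are mine):

```python
import re

def concordance(text, term, context=30):
    """Return (left, match, right) context tuples for each exact,
    case-sensitive occurrence of term in text."""
    hits = []
    for m in re.finditer(re.escape(term), text):
        left = text[max(0, m.start() - context):m.start()]
        right = text[m.end():m.end() + context]
        hits.append((left, m.group(), right))
    return hits
```

Run over the boiled files, this reproduces the four-hit result in seconds; lowercasing both text and term first would instead mimic a case-insensitive search.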

What other questions do you want to pose to the boiled Vault 7: CIA Hacking Tools Revealed document set?

Tooling up for efficient queries

First, you need: Boiled Content of Unique CIA Vault 7 Hacking Tools Revealed Files.

Second, grab a copy of: TextSTAT – Simple Text Analysis Tool runs on Windows, GNU/Linux and MacOS. (free)

When you first open TextSTAT, it will invite you to create a corpus.

The barrel icon to the far left creates a new corpus. Select it and:

Once you save the new corpus, this reminder about encodings pops up:

I haven’t explored loading Windows files while on a Linux box but will, and report back. Interesting to see the inclusion of PDF. Something we need to explore after we choose which of the 114 possibly CIA PDF files to import.

Finally, you are at the point of navigating to where you have stored the unzipped Boiled Content of Unique CIA Vault 7 Hacking Tools Revealed Files:

Select the first file, scroll to the end of the list, press shift and select the last file. Then choose OK. It takes a minute or so to load but it has a progress bar to let you know it is working.

Observations on TextSTAT

As far as I can tell, TextSTAT doesn’t use the traditional stop list of words but enables you to set maximum and minimum occurrences in the Word Form window, along with wildcards. More flexible than the old stop-list practice.

BTW, the context right/left on the Concordance window refers to characters, not words/tokens. Another departure from my experience with concordances. Not a criticism, just an observation of something that puzzled me at first.

Conclusion

The benefit of making secret information available, à la Wikileaks, cannot be overstated.

But making secret information available isn’t the same as making it accessible.

Investigators, reporters, researchers, the oft-mentioned public, all benefit from accessible information.

Next week look for a review of the probably CIA PDF files to see which ones I would incorporate into the corpora. (You may include more or less.)

PS: I’m looking for telecommuting work, editing, research (see this blog), patrick@durusau.net.

Boiling Reports of CIA Documents (Wikileaks CIA Vault 7 CIA Hacking Tools Revealed)

Thursday, March 23rd, 2017

Before you read today’s installment on the Wikileaks CIA Vault 7 CIA Hacking Tools Revealed, you should check out the latest drop from Wikileaks: CIA Vault 7 Dark Matter. Five documents and they all look interesting.

I started with a fresh copy of the HTML files in a completely new directory and ran Tidy first, plus fixed:

page_26345506.html:<declarations><string name="½ö"></string></declarations><p>›<br>

which I described in: Fact Checking Wikileaks’ Vault 7: CIA Hacking Tools Revealed (Part 1).

So with a clean and well-formed set of files, I modified the XQuery to collect all of the references to prior versions, reasoning that any file that is a prior version can be ditched, leaving only the latest files.


for $doc in collection('collection.xml')//a[matches(.,'^\d+$')]
return ($doc/@href/string(), ' ')

Unlike the count function we used before, this returns the value of the href attribute and appends a new line, after each one.

I saved that listing to priors.txt and then (your experience may vary on this next step):

xargs rm < priors.txt

WARNING: If your file names have spaces in them, you may delete files unintentionally. My data had no such spaces so this works in this case.
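If your filenames might contain spaces, a safer sketch than the xargs pipeline is to read priors.txt line by line (function name is mine; it assumes the same one-filename-per-line listing):

```python
import os

def remove_listed(listing_path):
    """Remove each file named (one per line) in listing_path.

    Reading line by line avoids the word-splitting pitfall of
    `xargs rm` when filenames contain spaces.
    """
    with open(listing_path) as listing:
        for line in listing:
            name = line.rstrip("\n")
            if name and os.path.exists(name):
                os.remove(name)
```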

Once I had the set of files without those representing “previous versions,” I’m down to the expected 1134.

That’s still a fair number of files and there is a lot of cruft in them.

For variety I did look at XSLT; here are the empty XSLT template statements needed to clean these files:

<xsl:template match="style"/>

<xsl:template match="//div[@id = 'submit_wlkey']" />

<xsl:template match="//div[@id = 'submit_help_contact']" />

<xsl:template match="//div[@id = 'submit_help_tor']" />

<xsl:template match="//div[@id = 'submit_help_tips']" />

<xsl:template match="//div[@id = 'submit_help_after']" />

<xsl:template match="//div[@id = 'submit']" />

<xsl:template match="//div[@id = 'submit_help_buttons']" />

<xsl:template match="//div[@id = 'efm-button']" />

<xsl:template match="//div[@id = 'top-navigation']" />

<xsl:template match="//div[@id = 'menu']" />

<xsl:template match="//footer" />

<xsl:template match="//script" />

<xsl:template match="//li[@class = 'comment']" />

Compare the XQuery query, on the command line no less:

for file in *.html; do
java -cp /home/patrick/saxon/saxon9he.jar net.sf.saxon.Query -s:"$file"
-qs:"/html/body//div[@id = 'uniquer']" -o:"$file.new"
done

(The line break in front of -qs: is strictly for formatting for this post.)

The files generated here will not be valid HTML.

Easy enough to fix with another round of Tidy.

After running Tidy, I was surprised to see a large number of very small files. Or at least I interpret 296 files of less than 1K in size to be very small files.

I created a list of them, linked back to the Wikileaks originals (296 Under 1K Wikileaks CIA Vault 7 Hacking Tools Revealed Files) so you can verify that I capture the content reported by Wikileaks. Oh, and here are the files I generated as well, Boiled Content of Unique CIA Vault 7 Hacking Tools Revealed Files.

In case you are interested, boiling the 1134 files took them from 38.6 MB to 8.8 MB of actual content for indexing, searching, concordances, etc.

Using the content only files, tomorrow I will illustrate how you can correlate information across files. Stay tuned!

When To Worry About CIA’s Zero-Day Exploits

Wednesday, March 22nd, 2017

Chris McNab’s Alexsey’s TTPs (.. Tactics, Techniques, and Procedures) post on Alexsey Belan provides a measure for when to worry about Zero-Day exploits held by the CIA.

McNab lists:

  • Belan’s 9 offensive characteristics
  • 5 defensive controls
  • WordPress hack – 12 steps
  • LinkedIn targeting – 11 steps
  • Third victim – 11 steps

McNab observes:


Consider the number of organizations that provide services to their users and employees over the public Internet, including:

  • Web portals for sales and marketing purposes
  • Mail access via Microsoft Outlook on the Web and Google Mail
  • Collaboration via Slack, HipChat, SharePoint, and Confluence
  • DevOps and support via GitHub, JIRA, and CI/CD utilities

Next, consider how many enforce 2FA across their entire attack surface. Large enterprises often expose domain-joined systems to the Internet that can be leveraged to provide privileged network access (via Microsoft IIS, SharePoint, and other services supporting NTLM authentication).

Are you confident 2FA is being enforced over your entire attack surface?

If not, don’t worry about potential CIA held Zero-Day exploits.

You’re in danger from script kiddies, not the CIA (necessarily).

Alexsey Belan made the Most Wanted list at the FBI.

Crimes listed:

Conspiring to Commit Computer Fraud and Abuse; Accessing a Computer Without Authorization for the Purpose of Commercial Advantage and Private Financial Gain; Damaging a Computer Through the Transmission of Code and Commands; Economic Espionage; Theft of Trade Secrets; Access Device Fraud; Aggravated Identity Theft; Wire Fraud

His FBI poster runs two pages but you could edit off the bottom of the first page to make it suitable for framing.

😉

Try hanging that up in your local university computer lab to test their support for free speech.

CIA Documents or Reports of CIA Documents? Vault7

Wednesday, March 22nd, 2017

As I tool up to analyze the 1134 non-duplicate/artifact HTML files in Vault 7: CIA Hacking Tools Revealed, it occurred to me those aren’t “CIA documents.”

Take Weeping Angel (Extending) Engineering Notes as an example.

Caveat: My range of experience with “CIA documents” is limited to those obtained by Michael Best and others using Freedom of Information Act requests. But that should be sufficient to identify “CIA documents.”

Some things I notice about Weeping Angel (Extending) Engineering Notes:

  1. A Wikileaks header with donation button.
  2. “Vault 7: CIA Hacking Tools Revealed”
  3. Wikileaks navigation
  4. reported text
  5. More Wikileaks navigation
  6. Ads for Wikileaks, Tor, Tails, Courage, bitcoin

I’m going to say that the 1134 non-duplicate/artifact HTML files in Vault7, Part1, are reports of portions (which portions is unknown) of some unknown number of CIA documents.

A distinction that influences searching, indexing, concordances, word frequency, just to name a few.

What I need is the reported text, minus:

  1. A Wikileaks header with donation button.
  2. “Vault 7: CIA Hacking Tools Revealed”
  3. Wikileaks navigation
  4. More Wikileaks navigation
  5. Ads for Wikileaks, Tor, Tails, Courage, bitcoin

Check in tomorrow when I boil 1134 reports of CIA documents to get something better suited for text analysis.

Fact Checking Wikileaks’ Vault 7: CIA Hacking Tools Revealed (Part 1)

Monday, March 20th, 2017

Executive Summary:

If you reported Vault 7: CIA Hacking Tools Revealed as containing:

8,761 documents and files from an isolated, high-security network situated inside the CIA’s Center for Cyber Intelligence in Langley, Virginia…. (Vault 7: CIA Hacking Tools Revealed)

you failed to check your facts.

I detail my process below but in terms of numbers:

  1. Of 7809 HTML files, 6675 are duplicates or Wikileaks artifacts
  2. Of 357 PDF files, 134 are Wikileaks artifacts (for materials not released). Of the remaining 223 PDF files, 109 are public information, the GNU Make Manual for instance. Out of the 357 PDF files, Wikileaks has delivered 114 arguably from the CIA, and some of those are doubtful. (Part 2, forthcoming)

Wikileaks haters will find little solace here. My criticisms of Wikileaks are for padding the leak and not enabling effective use of the leak. Padding the leak is obvious from the inclusion of numerous duplicate and irrelevant documents. Effective use of the leak is impaired by the padding but also by teases of what could have been released but wasn’t.

Getting Started

To start on common ground, fire up a torrent client, obtain and decompress: Wikileaks-Year-Zero-2017-v1.7z.torrent.

Decompression requires this password: SplinterItIntoAThousandPiecesAndScatterItIntoTheWinds

The root directory is year0.

When I run a recursive ls from above that directory:

ls -R year0 | wc -l

My system reports: 8820

Change to the year0 directory and ls reveals:

bootstrap/ css/ highlighter/ IMG/ localhost:6081@ static/ vault7/

Checking the files in vault7:

ls -R vault7/ | wc -l

returns: 8755

Change to the vault7 directory and ls shows:

cms/ files/ index.html logo.png

The directory files/ has only one file, org-chart.png, an organization chart of the CIA, but sub-departments are listed with acronyms and “???.” Did the author of the chart not know the names of those departments? I point that out as the first of many file anomalies.

Some 7809 HTML files are found under cms/.

The cms/ directory has a sub-directory files, plus main.css and 7809 HTML files (including the index.html file).

Duplicated HTML Files

I discovered duplication of the HTML files quite by accident. I had prepared the files with Tidy for parsing with Saxon and compressed a set of those files for uploading.

The 7808 files I compressed started at 296.7 MB.

The compressed size, using 7z, was approximately 3.5 MB.

That’s almost two orders of magnitude of compression. 7z is good, but it’s not quite that good. 😉

Checking my file compression numbers

You don’t have to take my word for the file compression experience. If you select all the page_*, space_* and user_* HTML files in a file browser, it should report a total size of 296.7 MB.

Create a sub-directory of year0/vault7/cms/, say mkdir htmlfiles, and then:

cp *.html htmlfiles

Then: cd htmlfiles

and,

7z a compressedhtml.7z *.html

Run: ls -l compressedhtml.7z

Result: 3488727 Mar 16 16:31 compressedhtml.7z

Tom Harris, in How File Compression Works, explains that:


Most types of computer files are fairly redundant — they have the same information listed over and over again. File-compression programs simply get rid of the redundancy. Instead of listing a piece of information over and over again, a file-compression program lists that information once and then refers back to it whenever it appears in the original program.

If you don’t agree the HTML files are highly repetitive, check the next section, where one source of duplication is demonstrated.

Demonstrating Duplication of HTML files

Let’s start with the same file as we look for a source of duplication. Load Cinnamon Cisco881 Testing at Wikileaks into your browser.

Scroll to near the bottom of the file where you see:

Yes! There are 136 prior versions of this alleged CIA file in the directory.

Cinnamon Cisco881 Testing has the most prior versions, but all of them have prior versions.

Are we now in agreement that duplicated versions of the HTML pages exist in the year0/vault7/cms/ directory?

Good!

Now we need to count how many duplicated files there are in year0/vault7/cms/.

Counting Prior Versions of the HTML Files

You may or may not have noticed but every reference to a prior version takes the form:

<a href="filename.html">integer</a>

That’s going to be an important fact, but let’s clean up the HTML so we can process it with XQuery/Saxon.

Preparing for XQuery

Before we start crunching the HTML files, let’s clean them up with Tidy.

Here’s my Tidy config file:

output-xml: yes
quote-nbsp: no
show-warnings: no
show-info: no
quiet: yes
write-back: yes

In htmlfiles I run:

tidy -config tidy.config *.html

Tidy reports two errors:


line 887 column 1 - Error: <declarations> is not recognized!
line 887 column 15 - Error: <string> is not recognized!

Grepping for “declarations>”:

grep "declarations" *.html

Returns:

page_26345506.html:<declarations><string name="½ö"></string></declarations><p>›<br>

The string element is present as well so we open up the file and repair it with XML comments:

<!-- <declarations><string name="½ö"></string></declarations><p>›<br> -->
<!-- prior line commented out to avoid Tidy error, pld 14 March 2017-->

Rerun Tidy:

tidy -config tidy.config *.html

Now Tidy returns no errors.

XQuery Finds Prior Versions

Our files are ready to be queried but 7809 is a lot of files.

There are a number of solutions but a simple one is to create an XML collection of the documents and run our XQuery statements across the files as a set.

Here’s how I created a collection file for these files:

I did an ls in the directory and piped that to collection.xml. Opening the file I deleted index.html, started each entry with <doc href=" and ended each one with "/>, inserted <collection> before the first entry and </collection> after the last entry and then saved the file.

Your version should look something like:

<collection>
  <doc href="page_10158081.html"/>
  <doc href="page_10158088.html"/>
  <doc href="page_10452995.html"/>
...
  <doc href="user_7995631.html"/>
  <doc href="user_8650754.html"/>
  <doc href="user_9535837.html"/>
</collection>
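The ls-and-hand-edit steps above can be scripted. A sketch (function and output names are mine) that writes the same collection file for every .html file in a directory except index.html:

```python
import os

def write_collection(directory, out_path="collection.xml"):
    """Write a Saxon-style collection file listing every .html file
    in directory except index.html, sorted by name."""
    names = sorted(n for n in os.listdir(directory)
                   if n.endswith(".html") and n != "index.html")
    with open(out_path, "w") as out:
        out.write("<collection>\n")
        for name in names:
            out.write('  <doc href="%s"/>\n' % name)
        out.write("</collection>\n")
```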

The prior versions in Cinnamon Cisco881 Testing from Wikileaks, have this appearance in HTML source:

<h3>Previous versions:</h3>
<p>| <a href="page_17760540.html">1</a> <span class="pg-tag"><i>empty</i></span>
| <a href="page_17760578.html">2</a> <span class="pg-tag"></span>

…..

| <a href="page_23134323.html">135</a> <span class="pg-tag">[Xetron]</span>
| <a href="page_23134377.html">136</a> <span class="pg-tag">[Xetron]</span>
|</p>
</div>

You will need to spend some time with the files (I have, obviously) to satisfy yourself that <a> elements containing only numbers are used exclusively for prior references. If you come across any counter-examples, I would be very interested to hear about them.

To get a file count on all the prior references, I used:

let $count := count(collection('collection.xml')//a[matches(.,'^\d+$')])
return $count

Run that script to find: 6514 previous editions of the base files

Unpacking the XQuery

Rest assured that’s not how I wrote the first XQuery on this data set! 😉

Without exploring all the by-ways and alleys I traversed, I will unpack that query.

First, the goal of the query is to identify every <a> element that contains only digits, recalling that previous-version links have digits only in their <a> elements.

A shout out to Jonathan Robie, Editor of XQuery, for reminding me that string expressions match substrings unless they have beginning and ending line anchors. Here:

'^\d+$'

The \d matches only digits, the + enables matching one or more digits, and the beginning ^ and ending $ eliminate any <a> elements that start with one or more digits but also contain text, like links to files, etc.

Expanding out a bit more, [matches(.,'^\d+$')], the [ ] enclose a predicate that consists of the matches function, which takes two arguments. The . here represents the content of an <a> element, followed by a comma as a separator and then the regex that provides the pattern to match against.

Although talked about as a “code smell,” the //a in //a[matches(.,'^\d+$')] enables us to pick up the <a> elements wherever they are located. We did have to repair these HTML files and I don’t want to spend time debugging ephemeral HTML.

Almost there! The collection function, collection('collection.xml'), applies the XQuery to every file listed in the collection file.

Finally, we surround all of the foregoing with the count function: count(collection('collection.xml')//a[matches(.,'^\d+$')]) and declare a variable to capture the result of the count function: let $count :=

So far so good? I know, tedious for XQuery jocks but not all news reporters are XQuery jocks, at least not yet!

Then we produce the results: return $count.

But 6514 files aren’t 6675 files, you said 6675 files

Yes, you're right! Thanks for paying attention!

I said at the top, 6675 are duplicates or Wikileaks artifacts.

Where are the others?

If you look at User #71477, which has the file name user_40828931.html, you will find it's not a CIA document but part of Wikileaks' administration of these documents. There are 90 such pages.

If you look at Marble Framework, which has the file name space_15204359.html, you will find that, while it carries a CIA document title, the page itself is a form of indexing created by Wikileaks. There are 70 such pages.

Don’t forget the index.html page.

Adding these together, 6514 (duplicates) + 90 (user pages) + 70 (space pages) + 1 (index.html) = 6675 duplicates or Wikileaks artifacts.
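The user and space page counts can themselves be checked with XQuery, keying off the filename prefixes (the prefix patterns are my reading of the file names above, not a documented convention):

```
(: Count Wikileaks artifact pages by filename prefix (sketch). :)
let $uris := collection('collection.xml') ! base-uri(.)
return (
  count($uris[matches(., 'user_\d+\.html$')]),   (: expect 90 :)
  count($uris[matches(., 'space_\d+\.html$')])   (: expect 70 :)
)
```

If your counts differ, the discrepancy itself is worth reporting.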

What’s your total?


Tomorrow:

In Fact Checking Wikileaks’ Vault 7: CIA Hacking Tools Revealed (Part 2), I look under year0/vault7/cms/files to discover:

  1. Arguably CIA files (maybe) – 114
  2. Public documents – 109
  3. Wikileaks artifacts – 134

I say “Arguably CIA” because there are file artifacts and anomalies that warrant your attention in evaluating those files.

XQuery Ready CIA Vault7 Files

Friday, March 10th, 2017

I have extracted the HTML files from WikiLeaks Vault7 Year Zero 2017 V 1.7z, processed them with Tidy (see note on correction below), and uploaded the “tidied” HTML files to: Vault7-CIA-Clean-HTML-Only.

Beyond the usual activities of Tidy, I did have to correct the file page_26345506.html by commenting out one line of content:

<!-- <declarations><string name="½ö"></string></declarations><p>›<br> -->

Otherwise, the files are only corrected HTML markup with no other changes.

The HTML compresses well, 7811 files coming in at 3.4 MB.

Demonstrate the power of basic XQuery skills!

Enjoy!

That CIA exploit list in full: … [highlights]

Wednesday, March 8th, 2017

That CIA exploit list in full: The good, the bad, and the very ugly by Iain Thomson.

From the post:

We’re still going through the 8,761 CIA documents published on Tuesday by WikiLeaks for political mischief, although here are some of the highlights.

First, though, a few general points: one, there’s very little here that should shock you. The CIA is a spying organization, after all, and, yes, it spies on people.

Two, unlike the NSA, the CIA isn’t mad keen on blanket surveillance: it targets particular people, and the hacking tools revealed by WikiLeaks are designed to monitor specific persons of interest. For example, you may have seen headlines about the CIA hacking Samsung TVs. As we previously mentioned, that involves breaking into someone’s house and physically reprogramming the telly with a USB stick. If the CIA wants to bug you, it will bug you one way or another, smart telly or no smart telly. You’ll probably be tricked into opening a dodgy attachment or download.

That’s actually a silver lining to all this: end-to-end encrypted apps, such as Signal and WhatsApp, are so strong, the CIA has to compromise your handset, TV or computer to read your messages and snoop on your webcam and microphones, if you’re unlucky enough to be a target. Hacking devices this way is fraught with risk and cost, so only highly valuable targets will be attacked. The vast, vast majority of us are not walking around with CIA malware lurking in our pockets, laptop bags, and living rooms.

Thirdly, if you’ve been following US politics and WikiLeaks’ mischievous role in the rise of Donald Trump, you may have clocked that Tuesday’s dump was engineered to help the President pin the hacking of his political opponents’ email server on the CIA. The leaked documents suggest the agency can disguise its operations as the work of a foreign government. Thus, it wasn’t the Russians who broke into the Democrats’ computers and, by leaking the emails, helped swing Donald the election – it was the CIA all along, Trump can now claim. That’ll shut the intelligence community up. The President’s pet news outlet Breitbart is already running that line.

Iain does a good job of picking out some of the more interesting bits from the CIA (alleged) file dump. No, you will have to read Iain’s post for those.

I mention Iain's post primarily as a way to entice you into reading all the files in hopes of discovering more juicy tidbits.

Read the files. Your security depends on the indifference of the CIA and similar agencies. Is that your model for privacy?

Confirmation: Internet of Things As Hacking Avenue

Tuesday, March 7th, 2017

I mentioned the Internet of Things (IoT) in Reading the Unreadable SROM: Inside the PSOC4 [Hacking Leader In Internet of Things Suppliers] as a growing source of cyber insecurity, with a "Compound Annual Growth Rate (CAGR) of 33.3%."

Today, Bill Brenner writes:

WikiLeaks’ release of 8,761 pages of internal CIA documents makes this much abundantly clear: the agency has built a monster hacking operation – possibly the biggest in the world – on the backs of the many internet-connected household gadgets we take for granted.

That’s the main takeaway among security experts Naked Security reached out to after the leak went public earlier Tuesday.

I appreciate the confirmation!

Yes, the IoT can and is being used for government surveillance.

At the same time, the IoT is a tremendous opportunity to level the playing field against corporations and governments alike.

If the IoT isn’t being used against corporations and governments, whose fault is that?

That’s my guess too.

You can bulk download the first drop from: https://archive.org/details/wikileaks.vault7part1.tar.