Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

March 10, 2017

XQuery Ready CIA Vault7 Files

Filed under: CIA,Government,XML,XQuery — Patrick Durusau @ 11:26 am

I have extracted the HTML files from WikiLeaks Vault7 Year Zero 2017 V 1.7z, processed them with Tidy (see note on correction below), and uploaded the “tidied” HTML files to: Vault7-CIA-Clean-HTML-Only.

Beyond the usual activities of Tidy, I did have to correct one file, page_26345506.html, by wrapping one line of content in a comment:

<!-- <declarations><string name="½ö"></string></declarations><p>›<br> -->

Otherwise, the only changes to the files are corrections to the HTML markup.

The HTML compresses well: 7811 files come in at 3.4 MB.

Demonstrate the power of basic XQuery skills!
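As a starting point, here is a minimal sketch of a query that lists each file with its page title. It rests on assumptions that are not part of the upload: the collection name "vault7" is hypothetical, and how a collection gets defined depends on your XQuery processor. The wildcard name test `*:title` is used so the query works whether or not the tidied files place elements in the XHTML namespace.

```xquery
(: Assumption: the tidied files are available as a collection named "vault7".
   The name is hypothetical; configure the collection in your own processor. :)
for $doc in collection("vault7")
(: First title element in the document, in any namespace or none. :)
let $title := normalize-space(($doc//*:title)[1])
order by $title
return concat(base-uri($doc), " : ", $title)
```

From there, swapping `*:title` for `*:a/@href` or `*:table` is an easy way to start exploring links and tabular content across all the files.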

Enjoy!
