I have extracted the HTML files from WikiLeaks Vault7 Year Zero 2017 V 1.7z, processed them with Tidy (see note on correction below), and uploaded the “tidied” HTML files to: Vault7-CIA-Clean-HTML-Only.
Beyond Tidy's usual cleanup, I had to correct one file, page_26345506.html, by wrapping one line of its content in a comment:
<!-- <declarations><string name="½ö"></string></declarations><p>›<br> -->
Otherwise, the files are only corrected HTML markup with no other changes.
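For anyone repeating that kind of correction on other files, here is a minimal sketch in Python of wrapping an offending line in an HTML comment. The function name `comment_out_lines` and the search string are illustrative assumptions, not the procedure I actually used on page_26345506.html.

```python
def comment_out_lines(html: str, needle: str) -> str:
    """Wrap every line that contains `needle` in an HTML comment.

    Hypothetical helper for illustration only. Lines that already
    contain "--" are left untouched, because "--" is not allowed
    inside an HTML comment.
    """
    fixed = []
    for line in html.splitlines():
        if needle in line and "--" not in line:
            fixed.append(f"<!-- {line.strip()} -->")
        else:
            fixed.append(line)
    return "\n".join(fixed)
```

Calling `comment_out_lines(text, "<declarations>")` on the text of a page returns the same text with only the matching lines commented out; every other line passes through unchanged.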
The HTML compresses well: the 7811 files come in at 3.4 MB.
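To check a file count and compressed size of this kind on your own copy, something like the following sketch may help. It uses ZIP/deflate from the Python standard library, which is an assumption made for convenience and not necessarily the archiver behind the figure above, so the numbers will differ. The function name `zip_html_stats` is hypothetical.

```python
import io
import zipfile
from pathlib import Path


def zip_html_stats(root: str) -> tuple[int, int]:
    """Return (number of .html files, size in bytes of an in-memory ZIP).

    Illustrative only: deflate is used here because it ships with
    Python; another archiver would give a different size.
    """
    buffer = io.BytesIO()
    count = 0
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Walk the directory tree and add every HTML file to the archive.
        for path in sorted(Path(root).rglob("*.html")):
            archive.write(path, arcname=path.relative_to(root).as_posix())
            count += 1
    # The archive is finalized once the with-block exits.
    return count, len(buffer.getvalue())
```

Point it at whatever directory holds the extracted files, for example `count, size = zip_html_stats("html")`, where `"html"` is a placeholder path.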
Demonstrate the power of basic XQuery skills!