Archive for May, 2015

External Metadata Management? A Riff for Topic Maps?

Sunday, May 31st, 2015


I was reading a white paper on M-Files when I encountered the following passage:

And to get to the “what” that’s behind metadata, many are turning to a best practice approach of separate metadata management. This approach takes into account the entire scope of enterprise content, including addressing the idea of metadata associated with information for which no file exists. For instance, an audit or a deviation is not a file, but an object for which metadata exists, so by definition, to support this, the system must manage metadata separately from the file itself.

And when metadata is not embedded in files, but managed separately, IT administrators gain more flexibility to:

  • Manage metadata structure using centralized tools.
  • Support adding metadata to all documents regardless of file format.
  • Add metadata to documents (or objects) that do not contain files (or that contain multiple files). This is useful when a document is actually a single paper copy that needs to be incorporated into the ECM system. Some ECM providers often refer to records management when discussing this capability, and others simply provide it as another way to manage a document.
  • Export files from the ECM platform without metadata tags.

Separate metadata management in ECM helps to ensure that all enterprise information is searchable, available and exportable – regardless of file type, format or object type — underscoring again the idea that data is not valuable to an organization unless it can be found.

Topic maps offer other advantages as well but external metadata management may be a key riff to introducing topic maps to big data.
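To make the idea concrete, here is a minimal sketch of metadata kept apart from files, so that an object with no file at all (the audit in the passage above) is as searchable as one with several files. Every name in it (ManagedObject, MetadataStore, the kind and status keys) is my own invention for illustration and has nothing to do with how M-Files or any other ECM product is actually built.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ManagedObject:
    """Hypothetical object record: it may have zero, one, or many files."""
    object_id: str
    files: List[str] = field(default_factory=list)          # file names, possibly none
    metadata: Dict[str, str] = field(default_factory=dict)  # stored outside the files


class MetadataStore:
    """Hypothetical central store that owns all metadata."""

    def __init__(self) -> None:
        self._objects: Dict[str, ManagedObject] = {}

    def add(self, object_id: str, files: Optional[List[str]] = None,
            **metadata: str) -> ManagedObject:
        obj = ManagedObject(object_id, list(files or []), dict(metadata))
        self._objects[object_id] = obj
        return obj

    def search(self, **criteria: str) -> List[ManagedObject]:
        # Search looks only at metadata, so file format (or absence) is irrelevant.
        return [o for o in self._objects.values()
                if all(o.metadata.get(k) == v for k, v in criteria.items())]

    def export_files(self, object_id: str) -> List[str]:
        # Files leave the store without any metadata attached.
        return list(self._objects[object_id].files)


store = MetadataStore()
store.add("audit-2015-05", kind="audit", status="open")  # no file exists
store.add("contract-17", files=["contract.pdf", "annex.docx"],
          kind="contract", status="open")

open_items = [o.object_id for o in store.search(status="open")]
print(open_items)
```

The point of the sketch is only that search runs over the store, never over file contents, which is why the fileless audit and the two-file contract are found the same way.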

I have uncovered some of the research and other literature on external metadata management. More to follow!

How the TPP Amounts to a Corporate Takeover

Sunday, May 31st, 2015

How the TPP Amounts to a Corporate Takeover by Joseph Stiglitz.

From the post:

The United States and the world are engaged in a great debate about new trade agreements. Such pacts used to be called “free-trade agreements”; in fact, they were managed trade agreements, tailored to corporate interests, largely in the US and the European Union. Today, such deals are more often referred to as “partnerships,” as in the Trans-Pacific Partnership (TPP). But they are not partnerships of equals: the US effectively dictates the terms. Fortunately, America’s “partners” are becoming increasingly resistant.

It is not hard to see why. These agreements go well beyond trade, governing investment and intellectual property as well, imposing fundamental changes to countries’ legal, judicial, and regulatory frameworks, without input or accountability through democratic institutions.

Perhaps the most invidious – and most dishonest – part of such agreements concerns investor protection. Of course, investors have to be protected against the risk that rogue governments will seize their property. But that is not what these provisions are about. There have been very few expropriations in recent decades, and investors who want to protect themselves can buy insurance from the Multilateral Investment Guarantee Agency, a World Bank affiliate (the US and other governments provide similar insurance). Nonetheless, the US is demanding such provisions in the TPP, even though many of its “partners” have property protections and judicial systems that are as good as its own.

The real intent of these provisions is to impede health, environmental, safety, and, yes, even financial regulations meant to protect America’s own economy and citizens. Companies can sue governments for full compensation for any reduction in their future expected profits resulting from regulatory changes.

Assuming that the effort to declassify the TPP succeeds, it will be interesting to compare all the provisions of TPP with the reassurances offered while the text was still secret.

The peer review drugs don’t work [Faith Based Science]

Sunday, May 31st, 2015

The peer review drugs don’t work by Richard Smith.

From the post:

It is paradoxical and ironic that peer review, a process at the heart of science, is based on faith not evidence.

There is evidence on peer review, but few scientists and scientific editors seem to know of it – and what it shows is that the process has little if any benefit and lots of flaws.

Peer review is supposed to be the quality assurance system for science, weeding out the scientifically unreliable and reassuring readers of journals that they can trust what they are reading. In reality, however, it is ineffective, largely a lottery, anti-innovatory, slow, expensive, wasteful of scientific time, inefficient, easily abused, prone to bias, unable to detect fraud and irrelevant.

As Drummond Rennie, the founder of the annual International Congress on Peer Review and Biomedical Publication, says, “If peer review was a drug it would never be allowed onto the market.”

Cochrane reviews, which gather systematically all available evidence, are the highest form of scientific evidence. A 2007 Cochrane review of peer review for journals concludes: “At present, little empirical evidence is available to support the use of editorial peer review as a mechanism to ensure quality of biomedical research.”

We can see before our eyes that peer review doesn’t work because most of what is published in scientific journals is plain wrong. The most cited paper in Plos Medicine, which was written by Stanford University’s John Ioannidis, shows that most published research findings are false. Studies by Ioannidis and others find that studies published in “top journals” are the most likely to be inaccurate. This is initially surprising, but it is to be expected as the “top journals” select studies that are new and sexy rather than reliable. A series published in The Lancet in 2014 has shown that 85 per cent of medical research is wasted because of poor methods, bias and poor quality control. A study in Nature showed that more than 85 per cent of preclinical studies could not be replicated, the acid test in science.

I used to be the editor of the BMJ, and we conducted our own research into peer review. In one study we inserted eight errors into a 600 word paper and sent it to 300 reviewers. None of them spotted more than five errors, and a fifth didn’t detect any. The median number spotted was two. These studies have been repeated many times with the same result. Other studies have shown that if reviewers are asked whether a study should be published there is little more agreement than would be expected by chance.

As you might expect, the humanities are lagging far behind the sciences in acknowledging that peer review is an exercise in social status rather than quality:

One of the changes I want to highlight is the way that “peer review” has evolved fairly quietly during the expansion of digital scholarship and pedagogy. Even though some scholars, such as Kathleen Fitzpatrick, are addressing the need for new models of peer review, recognition of the ways that this process has already been transformed in the digital realm remains limited. The 2010 Center for Studies in Higher Education (hereafter cited as Berkeley Report) comments astutely on the conventional role of peer review in the academy:

Among the reasons peer review persists to such a degree in the academy is that, when tied to the venue of a publication, it is an efficient indicator of the quality, relevance, and likely impact of a piece of scholarship. Peer review strongly influences reputation and opportunities. (Harley, et al 21)

These observations, like many of those presented in this document, contain considerable wisdom. Nevertheless, our understanding of peer review could use some reconsideration in light of the distinctive qualities and conditions associated with digital humanities.
…(Living in a Digital World: Rethinking Peer Review, Collaboration, and Open Access by Sheila Cavanagh.)

Can you think of another area where something akin to peer review is being touted?

What about internal guidelines of the CIA, NSA, FBI and secret courts reviewing actions by those agencies?

How do those differ from peer review, which is an acknowledged failure in science and should be acknowledged in the humanities?

They are quite similar in the sense that some secret group is empowered to make decisions that impact others, and members of those groups don’t want to relinquish those powers. Surprise, surprise.

Peer review should be scrapped across the board and replaced by tracked replication and use by others, both in the sciences and the humanities.

Government decisions should be open to review by all its citizens and not just a privileged few.

Mathematics: Best Sets of Lecture Notes and Articles

Sunday, May 31st, 2015

Mathematics: Best Sets of Lecture Notes and Articles by Alex Youcis.

From the post:

Let me start by apologizing if there is another thread on that subsumes this.

I was updating my answer to the question here during which I made the claim that “I spend a lot of time sifting through books to find [the best source]”. It strikes me now that while I love books (I really do) I often find that I learn best from sets of lecture notes and short articles. There are three particular reasons that make me feel this way.

1. Lecture notes and articles often times take on a very delightful informal approach. They generally take time to bring to the reader’s attention some interesting side fact that would normally be left out of a standard textbook (lest it be too big). Lecture notes and articles are where one generally picks up on historical context, overarching themes (the “birds eye view”), and neat interrelations between subjects.

2. It is the informality that often allows writers of lecture notes or expository articles to mention some “trivial fact” that every textbook leaves out. Whenever I have one of those moments where a definition just doesn’t make sense, or a theorem just doesn’t seem right it’s invariably a set of lecture notes that sets everything straight for me. People tend to be more honest in lecture notes, to admit that a certain definition or idea confused them when they first learned it, and to take the time to help you understand what finally enabled them to make the jump.

3. Often times books are very outdated. It takes a long time to write a book, to polish it to the point where it is ready for publication. Notes often times are closer to the heart of research, closer to how things are learned in the modern sense.

It is because of reasons like this that I find myself more and more carrying around a big thick manila folder full of stapled together articles and why I keep making trips to Staples to get the latest set of notes bound.

So, if anyone knows of any set of lecture notes, or any expository articles that fit the above criteria, please do share!

I’ll start:

Fascinating collections of lecture notes and articles, in no particular order with duplication virtually guaranteed. Still, as a browsing resource or if you want to clean it up for others, it is a great resource.


Hand Drawn Map Association

Sunday, May 31st, 2015

Hand Drawn Map Association

The homepage promises:

The Hand Drawn Map Association (HDMA) is an ongoing archive of user submitted maps and other interesting diagrams created by hand.

From Here to There: A Curious Collection from the Hand Drawn Map Association.

From the page about the book:

The situation is as familiar as it is mundane: planning to visit friends in an unfamiliar part of the city, you draw yourself a basic map with detailed directions. In 2008, artist and designer Kris Harzinski founded the Hand Drawn Map Association to collect simple drawings of the everyday. Fascinated by these accidental records of a moment in time, he soon amassed a wide variety of maps, ranging from simple directions to maps of fictional locations, found maps, and maps of unusual places (such as a map of a high school locker), including examples by such well-known luminaries as Abraham Lincoln, Ernest Shackleton, and Alexander Calder.

From Here to There celebrates these ephemeral documents—usually forgotten or tossed after having served their purpose—and gives them their due as everyday artifacts. The more than 140 maps featured in this book, including, among many others, maps of an imaginary country for ants, of a traffic island in Australia, of a childhood fort, and of the Anne Frank House in Amsterdam, are as varied and touching as the stories they tell.

Does the ephemeral nature of these maps have a lesson for topic maps? While long lived topic maps are one use case, are temporary, even very temporary topic maps another?

Are there topic maps that I don’t need/want to persist beyond their immediate use?

The time and resources invested would differ from those for topic maps meant for long-term use, but should the doctrines for subject identity differ as well?

I don’t have a copy of the book, yet, but there is much to be learned here.

20 tools and resources every journalist should experiment with

Sunday, May 31st, 2015

20 tools and resources every journalist should experiment with by Alastair Reid.

From the post:

Tools have always come from the need to carry out a specific task more effectively. It’s one of the main differences between human beings and the rest of the animal kingdom. We may still be slaves to the same old evolutionary urges but we sure know how to eat noodles in style.

In journalism, an abstract tool for uncovering the most interesting and insightful information about society, we can generally boil the workflow down to four stages: finding, reporting, producing and distributing stories.

So with that in mind, here are a range of tools which will – hopefully – help you carry out your journalism tasks more effectively.

The resources range from advanced Google and Twitter searching to odder items and even practical advice:

Funny story: Glenn Greenwald received an anonymous email in early 2013 from a source wishing to discuss a potential tip, but only if communications were encrypted. Greenwald didn’t have encryption. The source emailed a step-by-step video with instructions to install encryption software. Greenwald ignored it.

The same source, a now slightly frustrated Edward Snowden, contacted film-maker Laura Poitras about the stack of NSA files burning a hole in his hard drive. Poitras persuaded Greenwald that he might want to listen, and the resulting revelations of government surveillance is arguably the story of the decade so far.

The lesson? Learn how to encrypt your email. Mailvelope is a good option with a worthwhile tutorial for PGP encryption, the same as the NSA use, and Knight Fellow Christopher Guess has a great step-by-step guide for setting it up.

In addition to the supporting encryption advice, the other lesson is that major stories can break from new sources.

Oh, the post also mentions:

Unfortunately for reporters, one of the internet’s favourite pastimes is making up rumours and faking photos.

Sounds like a normal function of government to me.

Many journalists have reported something along the lines of:

Iraq’s Defense Ministry said Wednesday an airstrike by the U.S.-led coalition killed a senior Islamic State commander and others near the extremist-held city of Mosul, though the country’s Interior Ministry later said it wasn’t clear if he even was wounded.

The Defense Ministry said the strike killed Abu Alaa al-Afari and others who were in a meeting inside a mosque in the northern city of Tal Afar, 72 kilometers (45 miles) west of Mosul. (Senior ISIS Commander Alaa Al-Afari Killed In U.S. Airstrike: Iraqi Officials)

rather than:

A communique from the Iraq Defense Ministry claimed credit for killing a senior Islamic State commander and others near the city of Mosul last Wednesday.

The attack focused on a mosque inside the northern city of Tal Afar, 72 kilometers (45 miles) west of Mosul. How many people were inside the mosque at the time of this cowardly attack, along with Abu Alaa al-Afari, is unknown.

Same “facts,” but a very different view of them. I mention this because an independent press, or even one that wants to pretend at independence, should not be cheerfully reporting government propaganda.

Quitting the UK

Sunday, May 31st, 2015

Another tech firm says it has quit the UK over government internet surveillance plans by Graham Cluley.

From the post:

When British technology firm announced it was quitting the UK because of the government’s plans to widen mass internet surveillance through a Snooper’s charter, and to block messaging services unless they have a government backdoor, I predicted that they weren’t going to be the last.

Turns out I was right.

Eris Industries has announced it has told its staff to leave the country and, at least temporarily, moved its headquarters to New York.

The firm says it will only come back if the Communications Data Bill (the UK government’s preferred name for the Snooper’s Charter) has its offending legislation amended.

A blog post by Eris Industries’ COO Preston Byrne, explains the company’s position – it simply cannot engage in its business if it is forced to incorporate cryptographic backdoors that can be accessed by MI5 and GCHQ:

If you have flown into any UK airport, but particularly those for London, you will realize that the UK is now a fully functional police state. Not as unpleasant as, say, the former German Democratic Republic, but that is a matter of degree and not kind.

Don’t get me wrong, the UK is bursting with cultural and human wealth but is being oppressed by a fearful few. How and when that oppression will end is unknown but one hopes with severe consequences for the fearful few.

Exaggerating the Chinese Cyber Threat

Sunday, May 31st, 2015

Exaggerating the Chinese Cyber Threat by Jon R. Lindsay. (Policy Brief, Belfer Center for Science and International Affairs, Harvard Kennedy School.)

From the post:

Bottom Lines

Inflated Threats and Growing Mistrust. The United States and China have more to gain than lose through their intensive use of the internet, even as friction in cyberspace remains both frustrating and inevitable. Threat misperception heightens the risks of miscalculation in a crisis and of Chinese backlash against competitive U.S. firms.

The U.S. Advantage. For every type of Chinese cyber threat—political, espionage, and military—there are also serious Chinese vulnerabilities and countervailing U.S. strengths.

Protection of Internet Governance. To ensure the continued high performance of information technology firms and the mutual benefits of globalization, the United States should preserve liberal norms of open interconnection and the multistakeholder system—the loose network of academic, corporate, and governmental actors managing global technical protocols.

A welcome antidote to the rumor mongering of both U.S. and Chinese policy makers.

That government statements on cyber threats and terrorism are exaggerations is never in doubt. The only real questions are how the individuals making the statements benefit and how much exaggeration is in play.

GNU Octave 4.0

Sunday, May 31st, 2015

GNU Octave 4.0

From the webpage:

GNU Octave is a high-level interpreted language, primarily intended for numerical computations. It provides capabilities for the numerical solution of linear and nonlinear problems, and for performing other numerical experiments. It also provides extensive graphics capabilities for data visualization and manipulation. Octave is normally used through its interactive command line interface, but it can also be used to write non-interactive programs. The Octave language is quite similar to Matlab so that most programs are easily portable.

Version 4.0.0 has been released and is now available for download. Octave 4.0 is a major new release with many new features, including a graphical user interface, support for classdef object-oriented programming, better compatibility with Matlab, and many new and improved functions.

An official Windows binary installer is also available from

A list of important user-visible changes is available at, by selecting the Release Notes item in the News menu of the GUI, or by typing news at the Octave command prompt.

In terms of documentation:

Reference Manual

Octave is fully documented by a comprehensive 800 page manual.

The on-line HTML and PDF versions of the manual are generated directly from the Texinfo source files that are distributed along with every copy of the Octave source code. The complete text of the manual is also available at the Octave prompt using the doc command.

A printed version of the Octave manual may be ordered from Network Theory, Ltd. Any money raised from the sale of this book will support the development of free software. For each copy sold $1 will be donated to the GNU Octave Development Fund.

Yemen Cyber Army will release 1M of records per week to stop Saudi Attacks

Sunday, May 31st, 2015

Yemen Cyber Army will release 1M of records per week to stop Saudi Attacks by Pierluigi Paganini.

From the post:

Hackers of the Yemen Cyber Army (YCA) had dumped another 1,000,000 records obtained by violating systems at the Saudi Ministry of Foreign Affairs.

The hacking crew known as the Yemen Cyber Army is continuing its campaign against the Government of Saudi Arabia.

The Yemen Cyber Army (YCA) has released other data from the stolen archive belonging to the Saudi Ministry of Foreign Affairs. The data breach was confirmed by the authorities, Osama bin Ahmad al-Sanousi, a senior official at the kingdom’s Foreign Ministry, made the announcement last week.

Now the hackers have released a new data dump containing 1,000,000 Records of Saudi VISA Database, they also announced that every week they will release a new lot of 1M records. The Yemen Cyber Army have also shared secret documents of the Private Saudi MOFA with Wikileaks.

The hackers of the Yemen Cyber Army have released 10 records from the archive including a huge amount of data.

The Website has published a detailed analysis of the dump published by the Yemen Cyber Army. reports that the latest dump is mostly visa data.

Good to know that the Yemen Cyber Army is backing up their data with Wikileaks but I don’t think of Wikileaks as a transparent source of government documents. For reasons best known to themselves, Wikileaks has taken on the role of government censor with regard to the information it releases. Acknowledging the critical role Wikileaks has played in recent public debates doesn’t blind me to their arrogation of the role of public censor.

Speaking of data dumps, where are the diplomatic records from Iraq? Before or since becoming a puppet government for the United States?

In the meantime, keep watching for more data dumps from the Yemen Cyber Army.


Saturday, May 30th, 2015


From the readme file:

XSL(T) stylesheets to translate non topic map sources and Topic Maps syntaxes

Currently supported:

* TM/XML -> CTM 1.0, XTM 1.0, XTM 2.0, XTM 2.1

* XTM 1.0 -> CTM 1.0, XTM 2.0, XTM 2.1

* XTM 2.x -> CTM 1.0, XTM 2.1, JTM 1.0, JTM 1.1, XTM 1.0

* Atom 1.0 -> XTM 2.1

* RSS -> XTM 2.1

* OpenDocument Metadata -> TM/XML (experimental)

License: BSD

Lars Heuer has updated TMXSL!
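If you want to check quickly whether a given translation is covered, the readme's list can be written down as a lookup table. This is only a sketch of the list quoted above: the names SUPPORTED, can_convert and sources_for are mine, not part of TMXSL, and the format labels (including "XTM 2.x") are copied as the readme spells them.

```python
from typing import Dict, List

# Direct conversions as listed in the TMXSL readme quoted above.
SUPPORTED: Dict[str, List[str]] = {
    "TM/XML": ["CTM 1.0", "XTM 1.0", "XTM 2.0", "XTM 2.1"],
    "XTM 1.0": ["CTM 1.0", "XTM 2.0", "XTM 2.1"],
    "XTM 2.x": ["CTM 1.0", "XTM 2.1", "JTM 1.0", "JTM 1.1", "XTM 1.0"],
    "Atom 1.0": ["XTM 2.1"],
    "RSS": ["XTM 2.1"],
    "OpenDocument Metadata": ["TM/XML"],  # marked experimental in the readme
}


def can_convert(source: str, target: str) -> bool:
    """True if the readme lists a direct source -> target translation."""
    return target in SUPPORTED.get(source, [])


def sources_for(target: str) -> List[str]:
    """All listed source formats that translate directly to the target."""
    return sorted(src for src, targets in SUPPORTED.items() if target in targets)


print(can_convert("RSS", "XTM 2.1"))
print(sources_for("CTM 1.0"))
```

The table only records direct translations; whether chaining two stylesheets gives a faithful result is a separate question the readme does not answer.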

The need for robust annotation of data grows daily, and every new solution that I have seen is “Another Do It My Way (ADIMW),” which involves loss of data “Done The Old Way (DTOW)” and changing software. And the cycle repeats itself in small and large ways with every new generation.

Topic maps could change that. Even topic maps with the syntactic cruft from early designs could do better. Reconsidered, topic maps can do far better.

More on that topic anon!


Saturday, May 30th, 2015


From the webpage:

NLP4L is a natural language processing tool for Apache Lucene written in Scala. The main purpose of NLP4L is to use the NLP technology to improve Lucene users’ search experience. Lucene/Solr, for example, already provides its users with auto-complete and suggestion functions for search keywords. Using NLP technology, NLP4L development members may be able to present better keywords. In addition, NLP4L provides functions to collaborate with existing machine learning tools, including one to directly create document vector from a Lucene index and write it to a LIBSVM format file.

As NLP4L processes document data registered in the Lucene index, you can directly access a word database normalized by powerful Lucene Analyzer and use handy search functions. Being written in Scala, NLP4L excels at trying ad hoc interactive processing as well.

The documentation is currently in Japanese with a TOC for the English version. Could be interesting if you want to try your hand either at translation and/or working from the API Docs.
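While waiting on a translation, the LIBSVM file format mentioned above is easy to picture: one line per document, a label followed by index:value pairs with ascending feature indices. The sketch below is not NLP4L's API; the function names are made up, and the token lists stand in for terms that would really come from a Lucene index after analysis.

```python
from collections import Counter
from typing import Dict, Iterable, List


def build_vocabulary(docs: Iterable[List[str]]) -> Dict[str, int]:
    """Give each distinct term a 1-based feature index, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for tokens in docs:
        for term in tokens:
            if term not in vocab:
                vocab[term] = len(vocab) + 1
    return vocab


def to_libsvm_line(label: int, tokens: List[str], vocab: Dict[str, int]) -> str:
    """Render one document as 'label index:count ...' with ascending indices."""
    counts = Counter(vocab[t] for t in tokens if t in vocab)
    features = " ".join(f"{idx}:{n}" for idx, n in sorted(counts.items()))
    return f"{label} {features}".rstrip()


# Hypothetical labelled documents, already tokenized.
docs = [
    (1, ["lucene", "search", "lucene"]),
    (0, ["scala", "search"]),
]
vocab = build_vocabulary(tokens for _, tokens in docs)
lines = [to_libsvm_line(label, tokens, vocab) for label, tokens in docs]
print("\n".join(lines))
```

Raw term counts are used as feature values here for simplicity; a real export might prefer TF-IDF or some other weighting.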


Congress Can — and Should — Declassify the TPP

Saturday, May 30th, 2015

Congress Can — and Should — Declassify the TPP by Robert Naiman.

From the post:

One of the most controversial aspects of the proposed Trans-Pacific Partnership (TPP) is the fact that the Obama administration has tried to impose a public blockade on the text of the draft agreement.

When Congress votes on whether to grant the president “fast-track authority” to negotiate the TPP — which would bar Congress from making any changes to the secret pact after it’s negotiated — it will effectively be a vote to pre-approve the TPP itself.

Although the other negotiating countries and “cleared” corporate advisers to the US Trade Representative have access to the draft TPP agreement, the American people haven’t been allowed to see it before Congress votes on fast track. Members of Congress can read the draft agreement under heavy restrictions, but they can’t publicly discuss or consult on what they have read.

Correction: The Obama administration hasn’t “tried to impose a public blockade on the text of the draft agreement,” it has succeeded in imposing a public blockade on the text of TPP.

The question is: What is Congress going to do to break the current blockade on the text of the TPP?

Robert has a great writeup of all the reasons why the American public should be allowed to see the text of the TPP. For example, the other parties to the agreement already know what it says, so why not the American people? Interim texts of agreements get published all the time, so why not this one?

The United States Senate or the House of Representatives can declassify the TPP text. I would say to write to your Senators and Representatives, but not this time. Starting Monday, June 1, 2015, I am going to call both of my Senators and my Representative until I have spoken with each one of them personally to express my concern that the TPP text should be available to the American public before it is submitted to Congress for approval. Including any additional or revised versions.

I will be polite and courteous but will keep calling until contact is made. I suggest you do the same. Leave your request that the TPP be declassified (including later versions) by (appropriate body) for the American public with every message.

BTW, keep count of how many calls it takes to speak to your Senator or Representative. It may give you a better understanding of how effective democracy is in the United States.

I first saw this in a tweet by

Of History & Hashes:…

Saturday, May 30th, 2015

Of History & Hashes: A Brief History of Password Storage, Transmission, & Cracking by Adrian Crenshaw.

From the post:

A while back Jeremy Druin asked me to be a part of a password cracking class along with Martin Bos. I was to cover the very basics, things like “What is a password hash?”, “What types are there?”, and “What is the history of passwords, hashes and cracking them?”. This got me thinking about a paper I read in school that pretty much outlines most of the mistakes made in the handling of passwords and crypto over the almost four decades since it was written. I think a lot of academic InfoSec papers end up being self-indulgent navel gazing, but if this paper, “Password Security: A Case History – Robert Morris & Ken Thompson”, published on April 3, 1978 had been read by more people, many password storage problems would have been avoided. A great deal of people think of information security as being an ever moving field where you have to constantly catch up, and it does have those aspects, but many problems and concepts go way back and people make the same sorts of mistakes over and over again. The way I like to put it is, “Software vulnerabilities generally get patched, but bad design decisions and recurring configuration mistakes are forever”. Were this Sunday School, I’d reference Ecclesiastes 1:9. In this post (and an upcoming talk at ShowMeCon) I’m going to pontificate about password history and mistakes in password handling that people might not have made if they read up on password history.

I’m biased because I like computer history and design issues but I think this is a great read. Just enough detail to keep your interest but not so much that you can’t keep track of the story line.

Adrian finishes up with a set of links to other resources on password history.

Do you want to avoid prior password design issues or no?
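Two of the ideas usually credited to the Morris and Thompson paper are per-user salts and deliberately slow hashing. Here is a minimal sketch of both using only the Python standard library. PBKDF2 is a modern stand-in I have chosen for illustration, not the scheme from the paper, and the iteration count and function names are assumptions rather than recommendations.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

ITERATIONS = 200_000  # assumed work factor; pick one suited to your hardware


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is drawn for every password."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)


salt, digest = hash_password("correct horse")
print(verify_password("correct horse", salt, digest))
print(verify_password("wrong guess", salt, digest))
```

Because of the salt, two users with the same password get different stored digests, so one precomputed table cannot be reused against every account. The iteration count makes each guess cost more.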

I first saw this in a tweet by InfoSec Taylor Swift.

PS: Ecclesiastes 1:9.

Announcing KeystoneML

Saturday, May 30th, 2015

Announcing KeystoneML

From the post:

We’ve written about machine learning pipelines in this space in the past. At the AMPLab Retreat this week, we released (live, on stage!) KeystoneML, a software framework designed to simplify the construction of large scale, end-to-end, machine learning pipelines in Apache Spark. KeystoneML is alpha software, but we’re releasing it now to get feedback from users and to collect more use cases.

Included in the package is a type-safe API for building robust pipelines and example operators used to construct them in the domains of natural language processing, computer vision, and speech. Additionally, we’ve included and linked to several scalable and robust statistical operators and machine learning algorithms which can be reused by many workflows.

Also included in the code are several example pipelines that demonstrate how to use the software to reproduce recent academic results in computer vision, natural language processing, and speech processing….

In case you don’t have plans for the rest of the weekend! 😉

Being mindful of Emmett McQuinn’s post, “Amazon Machine Learning is not for your average developer – yet,” doesn’t mean you have to remain an “average” developer.

You can wait for a cookie cutter solution from Amazon or you can get ahead of the curve. Your call.

Market Research

Saturday, May 30th, 2015

The products most Googled in every country of the world in one crazy map by Drake Baer.

If you are looking to successfully market goods or services, it’s helpful to know what people are interested in buying.

Some of the products are quite surprising:

Mauritania: Slaves.

Japan: Watermelon.

Russia: Fly a MIG.

How do you do your market research?

I first saw this in a post to Facebook by Jamie Clark.

WikiLeaks releases more than half a million US diplomatic cables from 1978

Saturday, May 30th, 2015

WikiLeaks releases more than half a million US diplomatic cables from 1978 by Julian Assange.

From the post:

Today WikiLeaks has released more than half a million US State Department cables from 1978. The cables cover US interactions with, and observations of, every country.

1978 was an unusually important year in geopolitics. The year saw the start of a great many political conflicts and alliances which continue to define the present world order, as well as the rise of still-important personalities and political dynasties.

The cables document the start of the Iranian Revolution, leading to the stand-off between Iran and the West (1979 – present); the Second Oil Crisis; the Afghan conflict (1978 – present); the Lebanon–Israel conflict (1978 – present); the Camp David Accords; the Sandinista Revolution in Nicaragua and the subsequent conflict with US proxies (1978 – 1990); the 1978 Vietnamese invasion of Cambodia; the Ethiopian invasion of Eritrea; Carter’s critical decision on the neutron bomb; the break-up of the USSR’s nuclear-powered satellite over Canada, which changed space policy; the US “playing the China card” against Russia; Brzezinski’s visit to China, which led to the subsequent normalisation of relations and a proxy war in Cambodia; with the US, UK, China and Cambodia on one side and Vietnam and the USSR on the other.

Through 1978, Zbigniew “Zbig” Brzezinski was US National Security Advisor. He would become the architect of the destabilisation of Soviet backed Afghanistan through the use of Islamic militants, elements of which would later become known as al-Qaeda. Brzezinski continues to affect US policy as an advisor to Obama. He has been especially visible in the recent conflict between Russia and the Ukraine.

WikiLeaks’ Carter Cables II comprise 500,577 US diplomatic cables and other diplomatic communications from and to US embassies and missions in nearly every country. It follows on from the Carter Cables (368,174 documents from 1977), which WikiLeaks published in April 2014.

The Carter Cables II bring WikiLeaks total published US diplomatic cable collection to 2.7 million documents.

The Public Library of US Diplomacy has an impressive search interface:

  • The Kissinger Cables : 1,707,500 diplomatic cables from 1973 to 1976
  • The Carter Cables : 367,174 diplomatic cables from 1977
  • The Carter Cables 2 : 500,577 diplomatic cables from 1978
  • Cablegate : 251,287 diplomatic cables, nearly all from 2003 to 2010
  • Keywords : Search for a word in the document text or its header
  • Subject:only Keywords Search for a word in the document subject line
  • Concepts Keywords of subjects dealt with in the document
  • Traffic Analysis by Geography and Subject (TAGS) : There are geographic, organization and subject “TAGS” : the classification system implemented by the Department of State for its central files in 1973
  • From : Who/where sent the document
  • To : Who/where received the document
  • Office Origin : Which State Department office or bureau sent the document
  • Office Action : Which State Department office or bureau received the document
  • Original Classification : Classification the document was originally given when produced
  • Handling Restrictions : All handling restrictions governing the document distribution that have been used to date
  • Advanced Search Features
    • Current Classification: Classification the document currently holds
    • Markings: Markings of declassification/release review of the document
    • Type: Correspondence type or format of original document
    • Enclosure: Attachments or other items sent with the original document These are not necessarily currently held in this library
    • Archive Status: Original documents not deleted or lost by State Department after review are available in one of four formats:
    • Locator: Where the original document is now held online or on microfilm, or remains in “ADS” (State Department’s 1973 Automated Data System of indexing by TAGS of electronic telegrams and Preels) with the text either garbled, not converted or unretrievable
  • Character Count : The number of characters, including spaces, in the document
  • Date : Document date range of the search
  • Sort by : Date, oldest first; Date, newest first; Relevance; Random; Length, largest first; Length, smallest first

Great for historical research into yesteryear’s disputes.

Not so great for current government transparency.

The crimes and poor decision making of elected officials and their appointees need to be disclosed in time to hold them accountable. (Say, dumps every ninety (90) days, uncensored by WikiLeaks or the New York Times.)

Web Page Structure, Without The Semantic Web

Saturday, May 30th, 2015

Could a Little Startup Called Diffbot Be the Next Google?

From the post:

Diffbot founder and CEO Mike Tung started the company in 2009 to fix a problem: there was no easy, automated way for computers to understand the structure of a Web page. A human looking at a product page on an e-commerce site, or at the front page of a newspaper site, knows right away which part is the headline or the product name, which part is the body text, which parts are comments or reviews, and so forth.

But a Web-crawler program looking at the same page doesn’t know any of those things, since these elements aren’t described as such in the actual HTML code. Making human-readable Web pages more accessible to software would require, as a first step, a consistent labeling system. But the only such system to be seriously proposed, Tim Berners-Lee’s Semantic Web, has long floundered for lack of manpower and industry cooperation. It would take a lot of people to do all the needed markup, and developers around the world would have to adhere to the Resource Description Framework prescribed by the World Wide Web Consortium.

Tung’s big conceptual leap was to dispense with all that and attack the labeling problem using computer vision and machine learning algorithms—techniques originally developed to help computers make sense of edges, shapes, colors, and spatial relationships in the real world. Diffbot runs virtual browsers in the cloud that can go to a given URL; suck in the page’s HTML, scripts, and style sheets; and render it just as it would be shown on a desktop monitor or a smartphone screen. Then edge-detection algorithms and computer-vision routines go to work, outlining and measuring each element on the page.

Using machine-learning techniques, this geometric data can then be compared to frameworks or “ontologies”—patterns distilled from training data, usually by humans who have spent time drawing rectangles on Web pages, painstakingly teaching the software what a headline looks like, what an image looks like, what a price looks like, and so on. The end result is a marked-up summary of a page’s important parts, built without recourse to any Semantic Web standards.

The irony here, of course, is that much of the information destined for publication on the Web starts out quite structured. The WordPress content-management system behind Xconomy’s site, for example, is built around a database that knows exactly which parts of this article should be presented as the headline, which parts should look like body text, and (crucially, to me) which part is my byline. But these elements get slotted into a layout designed for human readability—not for parsing by machines. Given that every content management system is different and that every site has its own distinctive tags and styles, it’s hard for software to reconstruct content types consistently based on the HTML alone.

There are several themes here that are relevant to topic maps.

First, it is true that most data starts with some structure, styles if you will, before it is presented for user consumption. Imagine an authoring application that automatically, and unknown to its user, captures metadata that can then provide semantics for its data.

Second, the structure-recognition approach used by Diffbot is promising in the large but should be promising in the small as well. Local documents of a particular type are unlikely to have the variance of documents across the web, meaning that with far less effort you can build recognition systems that enable more powerful searching of local document repositories.

Third, and perhaps most importantly, while the results may not be 100% accurate, the question for any such project should be: how much accuracy is required? If I am mining social commentary blogs, a 5% error rate on recognition of speakers might be acceptable, because for popular threads or speakers, those errors are going to be quickly corrected. As for unpopular threads or authors no one follows, do errors there come under no harm/no foul?
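To make the second and third points concrete, here is a minimal sketch of geometry-based labeling for a narrow, local collection, with a measured error rate. It is not Diffbot's algorithm: the `Box` fields, the thresholds in `label`, and the sample page are all hypothetical, standing in for whatever a renderer would report about your own documents.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """A rendered page element (hypothetical fields, not Diffbot's model)."""
    text: str
    top: float        # distance from the top of the page, in pixels
    font_size: float  # rendered font size, in pixels


def label(box: Box, body_size: float) -> str:
    """Guess an element's role from geometry alone, using hand-picked thresholds."""
    if box.font_size >= 1.5 * body_size and box.top < 300:
        return "headline"
    if box.font_size < 0.9 * body_size:
        return "byline"
    return "body"


def error_rate(boxes, truth, body_size):
    """Fraction of elements whose guessed label disagrees with a hand-made label."""
    wrong = sum(label(b, body_size) != t for b, t in zip(boxes, truth))
    return wrong / len(boxes)


# Hypothetical sample page with hand-assigned "true" labels.
boxes = [
    Box("Web Page Structure", top=40, font_size=32),
    Box("by A. Writer", top=90, font_size=12),
    Box("Diffbot runs virtual browsers...", top=140, font_size=16),
    Box("Pull quote in large type", top=900, font_size=28),
    Box("Photo caption", top=500, font_size=12),
]
truth = ["headline", "byline", "body", "body", "body"]

labels = [label(b, body_size=16) for b in boxes]
rate = error_rate(boxes, truth, body_size=16)
print(labels, rate)
```

The small-type photo caption is mislabeled as a byline, so the sketch reports one error in five. Whether that is tolerable is exactly the "how much accuracy is required?" question, and it can only be answered per project.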

Highly recommended for reading/emulation.

Government and “legitimate secrets?”

Saturday, May 30th, 2015

Benjamin Wittes in An Approach to Ameliorating Press-IC Tensions Over Classified Information gives a good set of pointers to the recent dispute between the intelligence community and the New York Times:

I’ve been thinking about the exchange over the past couple of weeks—much of which took place on Lawfare—between the New York Times and the intelligence community over the naming of CIA undercover officers in a Times story. (A brief recap in links: here are Bob Litt’s original comments, the 20 former intelligence officers’ letter, Jack’s interview with Dean Baquet, my comments in response, Mark Mazzetti’s comments, and Jack’s comments.)

I want to float an idea for a mechanism that might ameliorate tensions over this sort of issue in the future. It won’t eliminate those tensions, which are inherent in the relationship between a government that has legitimate secrets to keep and a press that rightly wants to report on the activities of government, but it might give the public a lens through which to view individual disputes, and it might create a means through which the two sides can better and more fully communicate in high-stakes situations.

The basic problem when the press has an undoubtedly newsworthy item that involves legitimately sensitive information is two-fold: the government will tend to err on the side of overstating the sensitivity of the information, because it has to protect against the biggest risk, and the government often cannot disclose to the newspaper the full reasons for its concerns.

I can’t say that I care for Wittes’ proposal because it begins with the assumption that a democratically elected government can have “legitimate secrets.” Once that point is conceded, the only argument is about the degree of ignorance of the electorate. That the electorate will never know, hopefully in the eyes of some, the truth about government activities is taken as a given.

For example, did you know that the United States government supported the Pol Pot regime in Cambodia? A regime that reduced the population of Cambodia by 25%, deaths coming from executions, forced labor, starvation, etc.

Question for readers who vote in the United States:

Would United States support for a genocidal regime affect your vote in an election?

Assuming that one candidate promised continued support for such a regime and their opponent promised to discontinue support.

That seems pretty obvious, but those are exactly the sort of secrets that the government keeps from the voters.

How do I know the United States government supported Pol Pot? Good question! Sources? Would you believe diplomatic cables from the relevant time period? Recently published by WikiLeaks?

The Pol Pot dilemma by Charles Parkinson, Alice Cuddy and Daniel Pye, reads in part:

A trove of more than 500,000 US diplomatic cables from 1978 released by WikiLeaks on Wednesday includes hundreds that paint a vivid picture of a US administration torn between revulsion at the brutality of Pol Pot’s government and fear of Vietnamese influence should it collapse.

“We believe a national Cambodia must exist even though we believe the Pol Pot regime is the world’s worst violator of human rights,” reads a cable sent by the State Department to six US embassies in Asia on October 11, 1978. “We cannot support [the] Pol Pot government, but an independent Kampuchea must exist.”

They are the second batch of cables to be released by the whistle-blowing website from Jimmy Carter’s presidency, which was marked by a vocal emphasis on human rights. That focus shines through in much of the correspondence, even to the point of wishing success on the Khmer Rouge in repelling Vietnamese incursions during the ongoing war between the two countries, in the hope it would, paradoxically, prevent more of the worst excesses of the government in Phnom Penh.

“While the Pol Pot government has few, if any, redeeming features, the cause of human rights is not likely to be served by the continuation of fighting between the Vietnamese and the government,” reads a cable sent by the US Embassy in Thailand to the State Department on October 17. “A negotiated settlement of [Vietnamese-Cambodian] differences might reduce the purges.”

Read also: SRV-KHMER CONFLICT PRESENTS BENEFITS AND POTENTIAL PROBLEMS FOR MOSCOW to get a feel for the proxy war status of the conflict between Cambodia and Vietnam during this period.

Although the government keeps a copy of your financial information, social security number, etc., that is your information that it holds in trust. Secrecy of that information should not be in doubt.

However, when we are talking about information that is generated in the course of government relations to other governments or in carrying out government policy, we aren’t talking about information that belongs to individuals. At least in a democracy, we are talking about information that belongs to the general public.

In your next debate about government secrecy, challenge the presumption of a need for government secrecy. History is on your side.

Roll-Your-Own Ransomware

Saturday, May 30th, 2015

Malware is going mainstream. Along with surveys, machine learning, and predictive analytics, there is now an online service for developing your own ransomware malware.

Swati Khandelwal writes in: ‘Tox’ Offers Free build-your-own Ransomware Malware Toolkit:

“Ransomware” threat is on the rise, but the bad news is that Ransomware campaigns are easier to run, and now a Ransomware kit is being offered by hackers for free for anyone to download and distribute the threat.

Ransomware is a type of computer virus that infects a target computer, encrypts their sensitive documents and files, and locks them out until the victim pays a ransom amount, most often in Bitcoins.

Sometimes even the best security experts aren’t able to unlock them and end up paying off ransom to crooks in order to get their important files back.

Tox — Free Ransomware Kit

Now, to spread this creepy threat more easily by even a non-tech user, one dark web hacker has released a ransomware-as-a-service kit, dubbed “Tox,” for anyone to download and set up their own ransomware for free.

Yes, believe it or not, but Tox is completely free to use. The developers of the online software make money by taking a cut (20%) of any successful ransomware campaigns its users run.

Tox, which runs on TOR, requires not much technical skills to use and is designed in such a way that almost anyone can easily deploy ransomware in three simple steps, according to security researchers at McAfee who discovered the kit.

Before you leap onto the Dark Web to sign up, consider this:

Anyone willing to assist you in cheating others is capable of cheating you.

Swati has good advice on avoiding ransomware and points to Free Ransomware Decryption and Malware Removal ToolKit and How to protect your computer from ransomware malware? as additional resources.

Ponemon Data Breach Report Has No Business Intelligence

Friday, May 29th, 2015

Study: Average cost of data breach is $6.5M by Ashley Carman.

From the post:

In a year already characterized by data breaches at recognizable healthcare organizations, such as CareFirst BlueCross BlueShield, and at major government entities, including the IRS, it’s no surprise that victims’ personal information is a hot commodity.

An annual study from the Ponemon Institute and IBM released on Wednesday found that the average per capita cost in a data breach increased to $217 in 2015 from $201 in 2014. Plus, the average total cost of a data breach increased to $6.5 million from $5.8 million the prior year.

The U.S. study looked at 62 companies in 16 industry sectors after they experienced the loss or theft of protected personal data and then had to notify victims.

The Ponemon data breach study has no business intelligence. Despite a wealth of detail on expenses of data breaches, not a word on the corresponding costs to avoid those breaches.

Reminds me of saying “…solar panels provide renewable energy…,” which makes sense, if you ignore the multi-decade cost of recovering your investment. No sane business person would take that flyer.

But many will read anxiously that the “average” data breach cost is $6.5 million. If that were the cost to CareFirst BlueCross BlueShield, its charitable giving of $50,959,000 was nearly eight (8) times that amount, on total revenue of $7.2 billion in 2011. Depending on the cost of greater security, $6.5 million may be a real steal.

Data breach reports should contain business intelligence. Business intelligence requires not only the cost of data breaches but the costs of reducing data breaches. And some methodology for determining which security measures reduce data breach costs by what percentage.
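A rough sketch of the kind of arithmetic I have in mind follows. Only the $6.5 million average and the $217 per-record figure come from the report as quoted above. The breach probability, the percentage reduction, and the cost of the security measure are made-up inputs, which is precisely the data the report does not supply.

```python
AVG_BREACH_COST = 6_500_000  # average total cost, from the report as quoted above
COST_PER_RECORD = 217        # average per capita cost, from the report as quoted above


def implied_records(total=AVG_BREACH_COST, per_record=COST_PER_RECORD):
    """Number of records implied by dividing total cost by per-record cost."""
    return total / per_record


def net_benefit(breach_probability, reduction, measure_cost,
                breach_cost=AVG_BREACH_COST):
    """Expected annual breach cost avoided, minus the annual cost of the measure.

    breach_probability: assumed chance of a breach in a year (0..1)
    reduction: assumed fraction of breach cost the measure eliminates (0..1)
    measure_cost: assumed annual cost of the security measure, in dollars
    """
    avoided = breach_probability * breach_cost * reduction
    return avoided - measure_cost


# Hypothetical inputs, for illustration only.
worth_it = net_benefit(breach_probability=0.10, reduction=0.40, measure_cost=200_000)
not_worth_it = net_benefit(breach_probability=0.10, reduction=0.40, measure_cost=400_000)

print(round(implied_records()))  # records behind an "average" breach
print(round(worth_it), round(not_worth_it))
```

With these invented inputs the same measure pays for itself at $200,000 a year and loses money at $400,000. Without real numbers for the two assumed parameters, nobody can say which side of that line a given security investment falls on.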

Without numbers and a methodology for determining the cost of security improvements, file the Ponemon data breach report with 1970s marketing literature on solar panels.

PS: Solar panels have become much more attractive in recent years but the point is that all business decisions should be made on the basis of cost versus benefit. The Ponemon report is just noise until there is a rational basis for business decisions in this area.

USB Modem Vulnerability

Friday, May 29th, 2015

Like routers, most USB modems also vulnerable to drive-by hacking by Lucian Constantin.

From the post:

The majority of 3G and 4G USB modems offered by mobile operators to their customers have vulnerabilities in their Web-based management interfaces that could be exploited remotely when users visit compromised websites.

The flaws could allow attackers to steal or manipulate text messages, contacts, Wi-Fi settings or the DNS (Domain Name System) configuration of affected modems, but also to execute arbitrary commands on their underlying operating systems. In some cases, the devices can be turned into malware delivery platforms, infecting any computers they’re plugged into.

Russian security researchers Timur Yunusov and Kirill Nesterov presented some of the flaws and attacks that can be used against USB modems Thursday at the Hack in the Box security conference in Amsterdam.

USB modems are actually small computers, typically running Linux or Android-based operating systems, with their own storage and Wi-Fi capability. They also have a baseband radio processor that’s used to access the mobile network using a SIM card.

Many modems have an embedded Web server that powers a Web-based dashboard where users can change settings, see the modem’s status, send text messages and see the messages they receive. These dashboards are often customized or completely developed by the mobile operators themselves and are typically full of security holes, Yunusov and Nesterov said.

The researchers claim to have found remote code execution vulnerabilities in the Web-based management interfaces of more than 90 percent of the modems they tested. These flaws could allow attackers to execute commands on the underlying operating systems.

Unlike CNN, the authors report real security issues with USB modems. (It’s entirely possible some CNN stories are accurate, useful, but I don’t know the odds.)

I particularly liked the lines on slide 56:

Please don’t plug computers into your USB

Is it safe to plug USB devices on 220v wall sockets?

(I assume “on” = “into.” 😉 )

I don’t know if there will be a video but you can obtain the presentation materials.

I didn’t see any videos from prior events but there are presentation materials and white papers at: Hack In The Box Security Conference.

Friday, May 29th, 2015

From the webpage:

In the spirit of OpenCourseWare and the Khan Academy, is dedicated to sharing training material for computer security classes, on any topic, that are at least one day long.

All material is licensed with an open license like CreativeCommons, allowing anyone to use the material however they see fit, so long as they share modified works back to the community.

We highly encourage people who already know these topic areas to take the provided material and pursue paid and unpaid teaching opportunities.

Those who can, teach.

There are twenty-eight (28) local classes and thirteen (13) video classes.

I haven’t viewed any of the courses in their entirety but this certainly sounds like a good idea!

“Bake Cake” = “Build a Bomb”?

Friday, May 29th, 2015

CNN never misses an opportunity to pollute the English language when it issues vague, wandering alerts about social media and terrorists.

In its coverage of an FBI terror bulletin, FBI issues terror bulletin on ISIS social media reach (video), CNN displays a tweet allegedly using “bake cake” for “build a bomb” at time mark 1:42.

The link pointed to is obscured and due to censorship of my Twitter feed, I cannot confirm the authenticity of the tweet, nor to what location the link pointed.

The FBI bulletin was issued on May 21, 2015 and the tweet in question was dated May 27, 2015. Its relevance to the FBI bulletin is highly questionable.

The tweet in its entirety reads:

want to bake cake but dont know how?>

for free cake baking training>

Why is this part of the CNN story?

What better way to stoke fear than to make common phrases into fearful ones?

Hearing the phrase “bake a cake” isn’t going to send you diving under the couch but as CNN pushes this equivalence, you will become more and more aware of it.

Not unlike being in the Dallas/Ft. Worth airport for hours listening to: “Watch out for unattended packages!” Whether there is danger or not, it wears on your psyche.

XML Calabash 1.1.4

Thursday, May 28th, 2015

XML Calabash 1.1.4 by Norm Walsh.

XML Calabash implements XProc: An XML Pipeline Language.

Time to update again!

Writing this reminds me I owe Norm responses on comments. 😉 Coming!

Who Created ISIS?

Thursday, May 28th, 2015

2012 Defense Intelligence Agency document: West will facilitate rise of Islamic State “in order to isolate the Syrian regime” by Brad Hoff.

Amazing what you can find with Freedom of Information Act lawsuits. Judicial Watch, a conservative group, obtained Defense Intelligence Agency documents saying that an Islamic State was a desired result:


See the full DIA report.

Brad has a full workup of this issue with pointers into the recently released documents.

This does a lot to explain the government paranoia over ISIS and the absurd claims of government lackeys about the influence of ISIS in social media, etc.

Despite it being a creature of Western policy at its outset, members of ISIS have strayed from the path desired by the United States. The lesson of Saddam Hussein, who committed the same error, has yet to be learned.

Arab groups/governments, for some inexplicable reason, don’t realize they are lackeys of the West and behave accordingly. Personally I am betting in the long run on a pan-Arab state that shakes off Western manipulation and controls its own destiny.

Update: A very good but lengthy followup by Brad Hoff: The DIA Gives an Official Response to Article Alleging the West Backed ‘Islamic State’.

Content Recommendation From Links Shared on Twitter Using Neo4j and Python

Thursday, May 28th, 2015

Content Recommendation From Links Shared on Twitter Using Neo4j and Python by William Lyon.

From the post:


I’ve spent some time thinking about generating personalized recommendations for articles since I began working on an iOS reading companion for the bookmarking service. One of the features I want to provide is a feed of recommended articles for my users based on articles they’ve saved and read. In this tutorial we will look at how to implement a similar feature: how to recommend articles for users based on articles they’ve shared on Twitter.


The main tools we will use are Python and Neo4j, a graph database. We will use Python for fetching the data from Twitter, extracting keywords from the articles shared and for inserting the data into Neo4j. To find recommendations we will use Cypher, the Neo4j query language.

Very clear and complete!
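For a feel of the idea without installing anything, here is a small in-memory sketch of keyword-overlap recommendation. It is not William Lyon's code and uses neither Neo4j nor Cypher. The `shared` and `keywords` dictionaries are hypothetical stand-ins for the user-to-article and article-to-keyword relationships a graph database would hold.

```python
from collections import Counter

# Hypothetical data: user -> articles shared, article -> extracted keywords.
shared = {"alice": {"a1", "a2"}, "bob": {"a3"}}
keywords = {
    "a1": {"neo4j", "graph"},
    "a2": {"python", "twitter"},
    "a3": {"graph", "python", "neo4j"},
    "a4": {"cooking"},
    "a5": {"python"},
}


def recommend(user, shared, keywords, limit=3):
    """Rank unseen articles by how many keywords they share with the user's articles."""
    seen = shared.get(user, set())
    # Count how often each keyword occurs across the articles the user shared.
    profile = Counter(k for article in seen for k in keywords.get(article, ()))
    scores = {}
    for article, kws in keywords.items():
        if article in seen:
            continue
        score = sum(profile[k] for k in kws)
        if score > 0:
            scores[article] = score
    # Highest score first; ties broken by article id for a stable order.
    return sorted(scores, key=lambda a: (-scores[a], a))[:limit]


recs = recommend("alice", shared, keywords)
print(recs)
```

A graph database earns its keep when the same traversal (user to articles to keywords to other articles) has to run over far more data than fits comfortably in a pair of dictionaries.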


Attempt to Screw Security Researchers and You Too

Thursday, May 28th, 2015

I mentioned efforts by the U.S. to make changes to the Wassenaar arrangement in: Beyond TPP: An International Agreement to Screw Security Researchers and Average Citizens.

The Electronic Frontier Foundation (Nate Cardozo and Eva Galperin) has blogged on this topic in: What Is the U.S. Doing About Wassenaar, and Why Do We Need to Fight It?.

Deadline for comments: July 20, 2015.

Nate and Eva close with this plea:

BIS [Bureau of Industry and Security] has posted a request for comments on this proposed rule and the comment period is open through July 20, 2015. BIS is specifically asking for information about the negative effects the proposed rule would have on “vulnerability research, audits, testing or screening and your company’s ability to protect your own or your client’s networks.” We encourage independent researchers, academics, the security community, and companies both inside and outside the U.S. to answer BIS’ call and submit formal comments. Researchers and companies whose work has been hindered by the European regulations, which are notably less restrictive than the U.S. proposal, are also encouraged to submit comments about their experience.

EFF will be submitting our own comments closer to the July 20 deadline, but in the meantime, we’d love it if those of you who are submitting comments to copy us so that we can collect and highlight the best arguments both in our own comments and on this blog.

Take special note of:

BIS is specifically asking for information about the negative effects the proposed rule would have on “vulnerability research, audits, testing or screening and your company’s ability to protect your own or your client’s networks.”

Here is your chance to be specific. The more detail you can provide, the stronger the case will be against the proposed changes. If all the concerns seem like idle hand waving, bad things could happen.

I don’t do security research but if you need a second pair of eyes on your comments, I can make the time to review a very limited number of comments.

No promises that efforts to oppose these changes will be successful but if we skip opportunities to influence the process, we have only ourselves to blame for the outcome.

Understanding Map Projections

Thursday, May 28th, 2015

Understanding Map Projections by Tiago Veloso.

From the post:

Few subjects are so controversial – or at least, misunderstood – in cartography as map projections, especially if you’re taking your first steps in this field. And that’s simply because every flat map misrepresents the surface of the Earth in some way. So, in this matter, your work in map-making is basically to choose the best projection that suits your needs and reduces the distortion of the most important features you are trying to show/highlight.

But it’s not because you don’t have enough literature about it. There are actually a bunch of great resources and articles that will help you choose the correct projection for your map, so we decided to bring together a quick reference list.

Hope you enjoy it!

I rather like the remark:

…reduces the distortion of the most important features you are trying to show/highlight.

In part because I read it as a concession that all projections are distortions, including those that suit our particular purposes.
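As one worked illustration of that concession, take the familiar Mercator projection (my choice of example, not one singled out in the post). On a spherical Earth its linear scale factor at latitude φ is 1/cos(φ), so areas are inflated by the square of that. A minimal sketch:

```python
import math


def mercator_scale(lat_deg):
    """Linear scale factor of the spherical Mercator projection at a latitude."""
    return 1.0 / math.cos(math.radians(lat_deg))


def mercator_area_inflation(lat_deg):
    """Factor by which Mercator exaggerates areas at a latitude (scale squared)."""
    return mercator_scale(lat_deg) ** 2


for lat in (0, 30, 60, 75):
    print(f"{lat:>2} deg: lengths x{mercator_scale(lat):.2f}, "
          f"areas x{mercator_area_inflation(lat):.2f}")
```

At the equator nothing is stretched. At 60 degrees lengths are doubled and areas quadrupled. The projection suits navigation, where constant compass bearings matter, and badly misleads anyone comparing the sizes of countries.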

I would argue that all maps are at their inception distortions. They never represent every detail of what is being mapped and that implies a process of selective omission. Someone will consider what was omitted important, but it was less important than some other detail to the map maker.

Would the equivalent of projections for topic maps be choice of associations between topics or choices of subjects? Or both?

I lean towards the choice of associations and subjects because graphical rendering of associations creates impressions of the existence and strengths of relationships. Subjects because they are the anchors of the associations.

Speaking of distortion, I would consider any topic map about George H. W. Bush that doesn’t list his war crimes and members of his administration who were also guilty of war crimes as incomplete. There are other opinions on that topic (or at least so I am told).

Suggestions on how to spot “tells” of omission? What can be left out of a map that clues you in that something is missing? Varies from subject to subject but even a rough list would be helpful.

How journals could “add value”

Thursday, May 28th, 2015

How journals could “add value” by Mark Watson.

From the post:

I wrote a piece for Genome Biology, you may have read it, about open science. I said a lot of things in there, but one thing I want to focus on is how journals could “add value”. As brief background: I think if you’re going to make money from academic publishing (and I have no problem if that’s what you want to do), then I think you should “add value”. Open science and open access is coming: open access journals are increasingly popular (and cheap!), preprint servers are more popular, green and gold open access policies are being implemented etc etc. Essentially, people are going to stop paying to access research articles pretty soon – think 5-10 year time frame.

So what can journals do to “add value”? What can they do that will make us want to pay to access them? Here are a few ideas, most of which focus on going beyond the PDF:

Humanities journals and their authors should take heed of these suggestions.

Not applicable in every case but certainly better than “journal editorial board as resume padding.”