Archive for July, 2015

600 Terabytes/30,000 Instances Exposed (Not OPM or Sony)

Tuesday, July 21st, 2015

Graeme Burton writes in Check your NoSQL database – 600 terabytes of MongoDB data publicly exposed on the internet, that the inventor of Shodan, John Matherly, claims 595.2 terabytes (TB) of data are exposed in MongoDB instances without authentication.

For the story on how the data on MongoDB instances was collected, see: It’s the Data, Stupid! by John Matherly.

The larger item of interest is Shodan: “Shodan is the world’s first search engine for Internet-connected devices.”

Considering how well everyone has done with computer security to date, being able to search Internet-connected devices should not be a problem. Yes?

From the homepage:

Explore the Internet of Things

Use Shodan to discover which of your devices are connected to the Internet, where they are located and who is using them.

See the Big Picture

Websites are just one part of the Internet. There are power plants, Smart TVs, refrigerators and much more that can be found with Shodan!

Monitor Network Security

Keep track of all the computers on your network that are directly accessible from the Internet. Shodan lets you understand your digital footprint.

Get a Competitive Advantage

Who is using your product? Where are they located? Use Shodan to perform empirical market intelligence.

My favorite is the last one, “…to perform empirical market intelligence.” You bet!

There are free accounts so I signed up for one to see what I could see. 😉

Here are some of the popular saved searches (that you can see with a free account):

  • Webcam – best ip cam search I have found yet.
  • Cams – admin admin
  • Netcam – Netcam
  • dreambox – dreambox
  • default password – Finds results with “default password” in the banner; the named defaults might work!
  • netgear – user: admin pass: password
  • – Trendnet IP Cam
  • ssh – ssh
  • Router w/ Default Info – Routers that give their default username/ password as admin/1234 in their banner.
  • SCADA – SCADA systems search

With the free account, you can only see the first fifty (50) results for a search.

I’m not sure I agree that the pricing is “simple” but it is attractive. Note the difference between query credits and scan credits. The first applies to searches of the Shodan database and the second applies to networks you have targeted.

The 20K+ routers w/ default info could be a real hoot!

You know, this might be a cost effective alternative for the lower level NSA folks.

BTW, in case you are looking for it: the API documentation.
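If you want to try the "Monitor Network Security" use case from a script rather than the web page, here is a minimal sketch. It assumes the REST search endpoint described in the API documentation, a placeholder API key, and a placeholder network range (203.0.113.0/24 is reserved for documentation); the function names are mine, not Shodan's.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the host search endpoint as given in the Shodan API documentation.
SEARCH_ENDPOINT = "https://api.shodan.io/shodan/host/search"


def build_search_url(api_key, query):
    """Return the search URL for a query string, with parameters URL-encoded."""
    params = urllib.parse.urlencode({"key": api_key, "query": query})
    return f"{SEARCH_ENDPOINT}?{params}"


def search(api_key, query):
    """Run the query and return the decoded JSON response (uses query credits)."""
    with urllib.request.urlopen(build_search_url(api_key, query)) as response:
        return json.load(response)
```

Calling search("YOUR_API_KEY", "net:203.0.113.0/24") with your own key and your own address range should return a dictionary; check the API documentation for the exact fields before relying on them.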

Definitely worth a bookmark and blog entries about your experiences with it.

It could well be that being insecure in large numbers is a form of cyberdefense.

Who is going to go after you when numerous larger, better-known targets are out there for the taking? And with IoT, the number of targets is going to increase geometrically.

North Korean Security Advisers for the FBI?

Tuesday, July 21st, 2015

US trade restrictions on North Korea have prevented FBI Director James Comey from securing cybersecurity advice from North Korea.

Due to recent data breaches, users want to replace the cyber vulnerability known as Windows XP. If Director Comey could make a deal with North Korea, he could secure distribution and labeling rights to North Korea’s Red Star Linux operating system.

Darren Pauli reports in North Korea’s Red Star Linux inserts sneaky serial content tracker that:

ERNW security analyst Florian Grunow says North Korea’s Red Star Linux operating system is tracking users by tagging content with unique hidden tags.

The operating system, developed from 2002 as a replacement for Windows XP, was relaunched with a Mac-like interface in 2013’s version three. The newest version emerged in January 2015.

Grunow says files including Microsoft Word documents and JPEG images connected to but not necessarily executed in Red Star will have a tag introduced into its code that includes a number based on hardware serial numbers.

It’s not a perfect solution for the problems faced by Director Comey because it doesn’t track files created using OpenOffice.

There is an added bonus to using North Korean security advisers: they have been trained not to contradict management goals with technical objections. See: Have You Really Tried? (FBI to Encryption Experts Opposing Back Doors).

If you know anyone with the FBI please pass this tip along.

I suspect Director Comey and his staff are still working their way through Hoover era bedroom tapes and don’t have time to follow my blog personally.

Life is Short. Have an Affair [Or, get a divorce]

Monday, July 20th, 2015

Hackers Gain Access to Extramarital Dating Databases by Chris DiMarco.

From the post:

Few things in life are as private as our romantic entanglements. So with hackers announcing they’ve made off with as many as 37 million records from the parent company of extramarital dating site, you can be sure there are plenty of people sweating over the potential fallout.

The hackers, “The Impact Team,” have demanded the extramarital site shut down or all the data will be released. That’s an odd use of pirated data.

A better strategy would be to complete the profiles to discover spouses and sell the resulting list to divorce lawyers. The profiles of the offending partners would be extra.

There are any number of countries that want to help the United States police its loose morals. They could legalize the sales of the data (not its acquisition). You would not even have to launder the money.

The other upside would be a giant lesson to many users in protecting their own privacy.

No one else is going to.

Is that clear?

Why Are Data Silos Opaque?

Monday, July 20th, 2015

As I pointed out in In Praise of the Silo [Listening NSA?], quoting Neil Ward-Dutton:

Every organisational feature – including silos – is an outcome of some kind of optimisation. By talking about trying to destroy silos, we’re denying the sometimes very rational reasons behind their creation.

While working on a presentation for Balisage 2015, it occurred to me to ask: Why Are Data Silos Opaque?

A popular search engine reports that sans duplicates, there were three hundred and thirty-three (333) “hits” on “data silo” that were updated in the last year. Far more reports than I want to list or that you want to read.

The common theme, of course, is the difficulty of accessing data silos.

OK, I’ll bite, why are data silos opaque?

Surely, if our respective data silos are based on relational database technology (still a likely bet, even with NoSQL), our programmers know about JDBC drivers? Doesn’t connecting to the data silo solve the problem?

Can we assume that data silos are not opaque due to accessibility? That is, drivers exist for accessing data stores, modulo the necessity for system security. Yes?

Data silos aren’t opaque to the users who use them or the DBAs who maintain them. So opacity isn’t something inherent in the data silo itself because we know of people who successfully use what we call a data silo.

What do you think makes data silos opaque?

If we knew where the problem comes from, it might be possible to discuss solutions.


Sunday, July 19th, 2015


Whether you are tracking the latest outrageous statements from the Republicans for U.S. President Clown Car or have more serious mapping purposes in mind, you need to take a look at MapFig. There are plugins for WordPress, Drupal, Joomla, and Omeka, along with a host of useful features.

There is one feature in particular I want to call to your attention: “Create highly customized leaflet maps quickly and easily.”

I stumbled over that sentence because I have never encountered “leaflet” maps before. Street, terrain, weather, historical, geological, archaeological, astronomical, etc., but no “leaflet” maps. Do they mean a format size? As in a leaflet for distribution? Seems unlikely because it is delivered electronically.

FAQ was no help. No hits at all.

Of course, you are laughing at this point because you know that “Leaflet” (note the uppercase “L”) is a JavaScript library developed by Vladimir Agafonkin.

So a “leaflet map” is one created using the Leaflet JavaScript library.

Clearer to say “Create highly customized maps quickly and easily using the Leaflet JS library.”




Sunday, July 19th, 2015

Aho-Corasick – Java implementation of the Aho-Corasick algorithm for efficient string matching.

From the webpage:

Nowadays most free-text searching is based on Lucene-like approaches, where the search text is parsed into its various components. For every keyword a lookup is done to see where it occurs. When looking for a couple of keywords this approach is great. But what about it if you are not looking for just a couple of keywords, but a 100,000 of them? Like, for example, checking against a dictionary?

This is where the Aho-Corasick algorithm shines. Instead of chopping up the search text, it uses all the keywords to build up a construct called a Trie. There are three crucial components to Aho-Corasick:

  • goto
  • fail
  • output

Every character encountered is presented to a state object within the goto structure. If there is a matching state, that will be elevated to the new current state.

However, if there is no matching state, the algorithm will signal a fail and fall back to states with less depth (ie, a match less long) and proceed from there, until it found a matching state, or it has reached the root state.

Whenever a state is reached that matches an entire keyword, it is emitted to an output set which can be read after the entire scan has completed.

The beauty of the algorithm is that it is O(n). No matter how many keywords you have, or how big the search text is, the performance will decline in a linear way.

Some examples you could use the Aho-Corasick algorithm for:

  • looking for certain words in texts in order to URL link or emphasize them
  • adding semantics to plain text
  • checking against a dictionary to see if syntactic errors were made

This library is the Java implementation of the afore-mentioned Aho-Corasick algorithm for efficient string matching. The algorithm is explained in great detail in the white paper written by Aho and Corasick:

The link to the Aho-Corasick paper timed out on me. Try: Efficient String Matching: An Aid to Bibliographic Search (CiteSeer).
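The goto, fail, and output components described in the quoted passage can be sketched in a few lines. This is my own minimal Python illustration of the technique, not the code of the Java library; the class and method names are made up for the example.

```python
from collections import deque


class AhoCorasick:
    """Minimal keyword matcher built from goto, fail, and output tables."""

    def __init__(self, keywords):
        self.goto = [{}]      # goto[state] maps a character to the next state
        self.fail = [0]       # fail[state] is the state to fall back to
        self.output = [[]]    # output[state] lists keywords ending at state
        for word in keywords:
            self._add(word)
        self._build_failure_links()

    def _add(self, word):
        # Walk the trie from the root, creating states as needed.
        state = 0
        for ch in word:
            nxt = self.goto[state].get(ch)
            if nxt is None:
                nxt = len(self.goto)
                self.goto[state][ch] = nxt
                self.goto.append({})
                self.fail.append(0)
                self.output.append([])
            state = nxt
        self.output[state].append(word)

    def _build_failure_links(self):
        # Breadth-first, so shallower states are finished before deeper ones.
        queue = deque(self.goto[0].values())  # depth-1 states fall back to root
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                fallback = self.fail[state]
                while fallback and ch not in self.goto[fallback]:
                    fallback = self.fail[fallback]
                self.fail[nxt] = self.goto[fallback].get(ch, 0)
                # Inherit keywords that end at the fallback state.
                self.output[nxt] = self.output[nxt] + self.output[self.fail[nxt]]

    def search(self, text):
        """Return (start_index, keyword) pairs for every match in text."""
        state = 0
        matches = []
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]          # the "fail" step
            state = self.goto[state].get(ch, 0)   # the "goto" step
            for word in self.output[state]:       # the "output" step
                matches.append((i - len(word) + 1, word))
        return matches
```

For example, AhoCorasick(["he", "she", "his", "hers"]).search("ushers") scans the text once and reports "she", "he", and "hers" with their start positions.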

The line that caught my eye was “adding semantics to plain text.”

Apologies for the “lite” posting over the past several days. I have just completed a topic maps paper for the Balisage conference in collaboration with Sam Hunting. My suggestion is that you register before all the seats are gone.

Linux still rules supercomputing

Saturday, July 18th, 2015

Linux still rules supercomputing by Steven J. Vaughan-Nichols.

Cutting to the chase, 486 out of the top 500 computers are running Linux.

You already knew that. You’re reading this post on a Linux box. 😉

Now if we can just get it routinely onto desktops!


Saturday, July 18th, 2015


From the webpage:

These code examples accompany the O’Reilly video course “Intermediate d3.js: Charts, Layouts, and Maps”.

This video is preceded by the introductory video course “An Introduction to d3.js: From Scattered to Scatterplot”. I recommend watching and working through that course before attempting this one.

Some of these examples are adapted from the sample code files for Interactive Data Visualization for the Web (O’Reilly, March 2013).

If you have been looking to step up your d3 skills, here’s the opportunity to do so!


“Your mission Dan/Jim, should you decide to accept it” [Interior Department]

Friday, July 17th, 2015

Security of the U.S. Department of the Interior’s Publicly Accessible Information Technology Systems by Office of the Inspector General, U.S. Department of the Interior.

A sanitized version of a report that found:

Specifically, we found nearly 3,000 critical and high-risk vulnerabilities in hundreds of publicly accessible computers operated by these three Bureaus. If exploited, these vulnerabilities would allow a remote attacker to take control of publicly accessible computers or render them unavailable. More troubling, we found that a remote attacker could then use a compromised computer to attack the Department’s internal or non-public computer networks. The Department’s internal networks host computer systems that support mission-critical operations and contain highly sensitive data. A successful cyber attack against these internal computer networks could severely degrade or even cripple the Department’s operations, and could also result in the loss of sensitive data. These deficiencies occurred because the Department did not: 1) effectively monitor its publicly accessible systems to ensure they were free of vulnerabilities, or 2) isolate its publicly accessible systems from its internal computer networks to limit the potential adverse effects of a successful cyber attack.

It is hard to imagine anyone needing a vulnerability list in order to crack into the Interior Department. Rather than sanitize its reports, the Inspector General should publish a vulnerability by vulnerability listing. Years of concealing that type of information hasn’t improved the behavior of the Interior Department.

Time to see what charging upper management with criminal negligence can do after data breaches.

The title is from Mission Impossible, which in this case should be renamed: Mission Possible.

Mapping the Medieval Countryside

Thursday, July 16th, 2015

Mapping the Medieval Countryside – Places, People, and Properties in the Inquisitions Post Mortem.

From the webpage:

Mapping the Medieval Countryside is a major research project dedicated to creating a digital edition of the medieval English inquisitions post mortem (IPMs) from c. 1236 to 1509.

IPMs recorded the lands held at their deaths by tenants of the crown. They comprise the most extensive and important body of source material for landholding in medieval England. Describing the lands held by thousands of families, from nobles to peasants, they are a key source for the history of almost every settlement in England and many in Wales.

This digital edition is the most authoritative available. It is based on printed calendars of the IPMs but incorporates numerous corrections and additions: in particular, the names of some 48,000 jurors are newly included.

The site is currently in beta phase: it includes IPMs from 1418-1447 only, and aspects of the markup and indexing are still incomplete. An update later this year will make further material available.

The project is funded by the Arts and Humanities Research Council and is a collaboration between the University of Winchester and the Department of Digital Humanities at King’s College London. The project uses five volumes of the Calendars of Inquisitions Post Mortem, gen. ed. Christine Carpenter, xxii-xxvi (The Boydell Press, Woodbridge, 2003-11) with kind permission from The Boydell Press. These volumes are all in print and available for purchase from Boydell, price £195.

One of the more fascinating aspects of the project is the list of eighty-nine (89) place types, which can be used for filtering. Just scanning the list I happened across “rape” as a place type, with four (4) instances recorded thus far.

The term “rape” in this context refers to a subdivision of the county of Sussex in England. The origin of this division is unknown but it pre-dates the Norman Conquest.

The “rapes of Sussex” and the eighty-eight (88) other place types are a great opportunity to explore place distinctions that may or may not be noticed today.


Increase Multi-Language Productivity with Document Translator

Wednesday, July 15th, 2015

Increase Multi-Language Productivity with Document Translator

From the post:

The Document Translator app and the associated source code demonstrate how Microsoft Translator can be integrated into enterprise and business workflows. The app allows you to rapidly translate documents, individually or in batches, with full fidelity—keeping formatting such as headers and fonts intact, and allowing you to continue editing if necessary. Using the Document Translator code and documentation, developers can learn how to incorporate the functionality of the Microsoft Translator cloud service into a custom workflow, or add extensions and modifications to the batch translation app experience. Document Translator is a showcase for use of the Microsoft Translator API to increase productivity in a multi-language environment, released as an open source project on GitHub.

Whether you are writing in Word, pulling together the latest numbers into Excel, or creating presentations in PowerPoint, documents are at the center of many of your everyday activities. When your team speaks multiple languages, quick and efficient translation is essential to your organization’s communication and productivity. Microsoft Translator already brings the speed and efficiency of automatic translation to Office, Yammer, as well as a number of other apps, websites and workflows. Document Translator uses the power of the Translator API to accelerate the translation of large numbers of Word, PDF*, PowerPoint, or Excel documents into all the languages supported by Microsoft Translator.

How many languages does your topic map offer?

That many?

The Translator FAQ lists these languages for the Document Translator:

Microsoft Translator supports languages that cover more than 95% of worldwide gross domestic product (GDP)…and one language that is truly out of this world: Klingon.

Arabic, English, Hungarian, Maltese, Slovak, Yucatec Maya
Bosnian (Latin), Estonian, Indonesian, Norwegian, Slovenian
Bulgarian, Finnish, Italian, Persian, Spanish
Catalan, French, Japanese, Polish, Swedish
Chinese Simplified, German, Klingon, Portuguese, Thai
Chinese Traditional, Greek, Klingon (plqaD), Queretaro Otomi, Turkish
Croatian, Haitian Creole, Korean, Romanian, Ukrainian
Czech, Hebrew, Latvian, Russian, Urdu
Danish, Hindi, Lithuanian, Serbian (Cyrillic), Vietnamese
Dutch, Hmong Daw, Malay, Serbian (Latin), Welsh

I have never looked for a topic map in Klingon but a translation could be handy at DragonCon.

Fifty-one languages by my count. What did you say your count was? 😉

(beta)

Wednesday, July 15th, 2015

(beta)

From the announcement post:

In February this year we announced that we will be iteratively improving the user experience. Today we are launching the new Beta site. There are many changes and we hope you will like them.

  • Dataset pages have been greatly simplified so that you can get to your data within two clicks.
  • We have re-written many of the descriptions to simply explanations.
  • We have launched which is aimed at non-developers to search and then download data.
  • We have also greatly improved and revised our API documentation. For example have a look here
  • We have added content from our blog and twitter feeds into the home page and I hope you agree that we are now presenting a more cohesive offering.

We are still working on datasets, and those in the pipeline waiting for release imminently are

  • Bills meta-data for bills going through Parliamentary process.)
  • Commons Select Committee meta-data.
  • Deposited Papers
  • Lords Attendance data

Let us know what you think.

There could be some connection between what the government says publicly and what it does privately. As they say, “anything is possible.”

Curious, what do you make of the Thesaurus?

Typing the “related” link to say how the terms are related would be a step in the right direction. Apparently there is an organization named “‘Sdim Curo Plant!” (other sources report it is Welsh for “Children are Unbeatable”), and that name turns out to be the preferred label.

The entire set has 107,337 records and can be downloaded, albeit in 500 record chunks. That should improve over time according to: Downloading data from data.parliment.
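To get a rough sense of what the 500 record limit means in practice, the sketch below computes how many requests a full download would take. Only the two numbers come from the post; the function name and the (offset, size) pairs are illustrative and are not parameters of the data.parliament API.

```python
def chunk_ranges(total_records, chunk_size):
    """Yield (offset, size) pairs that cover total_records in order."""
    for offset in range(0, total_records, chunk_size):
        yield offset, min(chunk_size, total_records - offset)


TOTAL_RECORDS = 107_337  # record count reported above
CHUNK_SIZE = 500         # download limit reported above

chunks = list(chunk_ranges(TOTAL_RECORDS, CHUNK_SIZE))
print(len(chunks), "requests; last chunk (offset, size):", chunks[-1])
```

That works out to 215 requests, the last one holding the remaining 337 records.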

I have always been interested in what terms other people use and this looks like an interesting data set, that is part of a larger interesting data set.


Cybersecurity Poverty Index 2015

Wednesday, July 15th, 2015

Cybersecurity Poverty Index 2015

A great survey graphic of the state of cybersecurity poverty by RSA.

The entire survey is worth a look but the Key Takeaways are real treasures:

Organizations still prioritize protection over detection and response, despite the fact that protection is fundamentally incapable of stopping today’s greatest cyber threats.

The biggest weakness of surveyed organizations is the ability to measure, assess, and mitigate cybersecurity risk, which makes it difficult or impossible to prioritize security activity and investment.

It is nice to have RSA confirm my adding cybersecurity protection graphic:


Software, including security software, is so broken that even attempting add-on security is worthless.

That doesn’t mean better software practices should not be developed but in the meantime, you are better off monitoring and responding to specific threats.

I don’t know of anyone who would disagree that being unable to “measure, assess, and mitigate cybersecurity risk,” makes setting security priorities impossible.

But, why do organizations lack those capabilities?

Do you know of any surveys/studies that address the “why” issue?

I suspect it is a lack of incentive. Consider the following paragraph from CardHub on credit card fraud:

What consumers generally do not know is that they are shielded from liability for unauthorized transactions made with their credit cards via the combination of federal law issuer/card network policy. As a result, financial institutions and merchants assume responsibility for most of the money lost as a result of fraud. For example, card issuers bore a 63% share of fraudulent losses in 2012 and merchants assumed the other 37% of liability, according to the Nilson Report, August 2013.

With credit card fraud at $11.2 billion in 2012, you would think card issuers and merchants would have plenty of incentive for reducing this loss.

Simple steps, like requiring a second form of identification, a slight delay as the transaction goes through fraud prevention, etc., could make a world of difference. But, they would also impact the convenience of using credit cards.

Do you care to guess what strategy credit card issuers chose? Credit card holders are exhorted to prevent credit card fraud, which has no impact on them in most events.

Does that offer a clue to the reason for the lack of proper preparation for cybersecurity?

Yes, breaches occur, yes, we sustain losses, yes, those losses are regrettable, but, we have no ROI measure for an investment in effective cybersecurity.

Unless and until there are financial incentives and an ROI to be associated with cybersecurity, it is unlikely we will see significant progress on that front.

Clojure At Scale @WalmartLabs

Wednesday, July 15th, 2015

From the description:

There are many resources to help you build Clojure applications. Most however use trivial examples that rarely span more than one project. What if you need to build a big clojure application comprising many projects? Over the three years that we’ve been using Clojure at WalmartLabs, we’ve had to figure this stuff out. In this session, I’ll discuss some of the challenges we’ve faced scaling our team and code base as well as our experience using Clojure in the enterprise.

I first saw this mentioned by Marc Phillips in a post titled: Walmart Runs Clojure At Scale. Marc mentions a tweet from Anthony Marcar that reads:

Our Clojure system just handled its first Walmart black Friday and came out without a scratch.

“Black Friday” is the Friday after the Thanksgiving holiday in the United States. Since 2005, it has been the busiest shopping day of the year and in 2014, $50.9 billion was spent on that one day. (Yes, billions with a “b.”)

Great narrative of issues encountered as this system was built to scale.

ProxyHam’s early demise… [+ an alternative]

Wednesday, July 15th, 2015

ProxyHam’s early demise gives way to new and improved privacy devices by Dan Goodin.

From the post:

Privacy advocates disappointed about the sudden and unexplained demise of the ProxyHam device for connecting to the Internet have reason to cheer up: there are two similarly low-cost boxes that do the same thing or even better.

The more impressive of the two is the ProxyGambit, a $235 device that allows people to access an Internet connection from anywhere in the world without revealing their true location or IP address. One-upping the ProxyHam, its radio link can offer a range of up to six miles, more than double the 2.5 miles of the ProxyHam. More significantly, it can use a reverse-tunneled GSM bridge that connects to the Internet and exits through a wireless network anywhere in the world, a capability that provides even greater range.

A bit pricey and 2.5 miles doesn’t sound like a lot to me.

Using Charter Communications as my cable provider, my location is shown by router to be twenty (20) miles from my physical location. Which makes for odd results when sites try to show a store “nearest to” my physical location.

Of course, Charter knows the actual service address and I have no illusions about my cable provider throwing themselves on a grenade to save me. Or a national security letter.

With a little investigation you can get distance from your physical location for free in some instances, bearing in mind that if anyone knows where you are, physically, then you can be found.

Think of security as a continuum that runs from being broadcast live at a public event to lesser degrees of openness. The question always is how much privacy is useful to you at what cost?

+300 Latin American Investigations

Wednesday, July 15th, 2015

Database Launched with +300 Latin American Investigations by Gabriela Manuli.

A unique database of more than 300 investigative journalism reports from across Latin America is now available from The Institute for Press and Society (Instituto Prensa y Sociedad, or IPYS). Called BIPYS (Banco de Investigaciones Periodísticas, or Bank of Investigative Journalism) the UNESCO-backed initiative was announced July 6 at the annual conference of Abraji, Brazil’s investigative journalism association.

BIPYS is a repository of many of the best examples of investigative journalism in the region, comprised largely of winners of the annual Latin American Investigative Journalism Awards that IPYS and Transparency International have given out for the past 13 years.

Investigations cover a wide range of topics, including corruption, malfeasance, organized crime, environment, national security, and human rights.

See Gabriela’s post for more but in summary the site is still under development and fees are being discussed.

An admirable effort considering that words in Latin America can and do have real consequences.

Unlike some places where disagreement can be quite heated but when the broadcast ends, the participants slip away for drinks together. Meanwhile, the subjects of their disagreement continue to struggle and die due to policy decisions made far, far away.

Google Data Leak!

Wednesday, July 15th, 2015

Google accidentally reveals data on ‘right to be forgotten’ requests by Sylvia Tippmann and Julia Powles.

From the post:

Less than 5% of nearly 220,000 individual requests made to Google to selectively remove links to online information concern criminals, politicians and high-profile public figures, the Guardian has learned, with more than 95% of requests coming from everyday members of the public.

The Guardian has discovered new data hidden in source code on Google’s own transparency report that indicates the scale and flavour of the types of requests being dealt with by Google – information it has always refused to make public. The data covers more than three-quarters of all requests to date.

Previously, more emphasis has been placed on selective information concerning the more sensational examples of so-called right to be forgotten requests released by Google and reported by some of the media, which have largely ignored the majority of requests made by citizens concerned with protecting their personal privacy.

It is a true data leak but not nearly as exciting as it sounds. If you follow the Explore the data link, you will find a link to “snapshots on WayBack Machine” that will provide access to the data now scrubbed from Google transparency reports. Starting about three months ago the data simply disappeared from the transparency reports.

Here is an example from the February 4th report as saved by the WayBack Machine:

"GB": {
  "name": "United Kingdom",
  "requests": {
    "all": {"rejected": 11308, "total": 26979, "pending": 989, "complied": 8527, "need_more_info": 4050},
    "issues": {
      "serious_crime": {"rejected": 483, "total": 694, "pending": 28, "complied": 93, "need_more_info": 90},
      "cp": {"rejected": 260, "total": 339, "pending": 11, "complied": 29, "need_more_info": 39},
      "political": {"rejected": 83, "total": 117, "pending": 4, "complied": 19, "need_more_info": 11},
      "private_personal_info": {"rejected": 10185, "total": 23217, "pending": 934, "complied": 8201, "need_more_info": 3857},
      "public_figure": {"rejected": 156, "total": 220, "pending": 12, "complied": 38, "need_more_info": 13}
    }
  },
  "urls": {
    "all": {"rejected": 55731, "total": 105337, "pending": 3677, "complied": 29148, "need_more_info": 15429},
    "issues": {
      "serious_crime": {"rejected": 2413, "total": 3249, "pending": 81, "complied": 298, "need_more_info": 455},
      "cp": {"rejected": 1160, "total": 1417, "pending": 22, "complied": 90, "need_more_info": 144},
      "political": {"rejected": 345, "total": 482, "pending": 17, "complied": 58, "need_more_info": 59},
      "private_personal_info": {"rejected": 49926, "total": 97413, "pending": 3442, "complied": 28118, "need_more_info": 14603},
      "public_figure": {"rejected": 1430, "total": 1834, "pending": 115, "complied": 190, "need_more_info": 95}
    }
  }
},
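The Guardian's "95%" headline is for all countries, but the same kind of figure can be checked for the United Kingdom alone from the snapshot. A small sketch, with the request totals copied by hand from the data above; the variable names are mine.

```python
# Request totals per issue type for the United Kingdom, from the snapshot above.
uk_request_totals = {
    "serious_crime": 694,
    "cp": 339,
    "political": 117,
    "private_personal_info": 23217,
    "public_figure": 220,
}
UK_ALL_REQUESTS = 26979  # "requests" -> "all" -> "total" in the snapshot

categorized = sum(uk_request_totals.values())
private = uk_request_totals["private_personal_info"]

# Share among requests that carry an issue label, and among all requests.
share_of_categorized = 100 * private / categorized
share_of_all = 100 * private / UK_ALL_REQUESTS

print(f"categorized requests: {categorized} of {UK_ALL_REQUESTS}")
print(f"private/personal share of categorized: {share_of_categorized:.1f}%")
print(f"private/personal share of all: {share_of_all:.1f}%")
```

Note that the five issue totals add up to less than the overall total, so which denominator you choose changes the percentage noticeably.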

The post concludes with:

Dr Paul Bernal, lecturer in technology and media law at the UEA School of Law, argues that the data reveals that the right to be forgotten seems to be a legitimate piece of law. “If most of the requests are private and personal ones, then it’s a good law for the individuals concerned. It seems there is a need for this – and people go for it for genuine reasons.”

On the contrary, consider this chart (from the Guardian explore the data page):


The data shows that 96% of the requests are likely to have one searcher, the person making the request.

If the EU wants to indulge such individuals, it should create a traveling “Board of the Right to Be Forgotten,” populate it with judges, clerks, transcribers, translators, etc. that visits every country in the EU on some regular schedule and holds televised hearings for every applicant and publishes written decisions (in all EU languages) on which links should be delisted from Google.

That would fund the travel, housing and entertainment industries in the EU, a perennial feature of EU funding, and relieve Google of the distraction of such cases. It would establish a transparent record of the self-obsessed who request delisting of facts from a search engine and the facts deleted.

Decisions by a “Board of the Right to Be Forgotten” would also enable the monetization of requests to be forgotten, by easing the creation of search engines that only report facts “forgotten” by Google. Winners all the way around!

Blue Light Special: Windows Server 2003

Wednesday, July 15th, 2015

“Blue light special” is nearly a synonym for Kmart. If you search for “blue light special” at Wikipedia, you will be redirected to the entry for Kmart.

A “blue light special” consisted of a blue police light being turned on and a Kmart employee announcing the special to all shoppers in the store.

As of Tuesday, July 14, 2015, there are now blue light specials on Windows Server 2003. Well, sans the blue police light and the Kmart employee. But hackers will learn of vulnerabilities in Windows Server 2003 and there will be no patches to close off those opportunities.

The last patches for Windows Server 2003 were issued on Tuesday and are described at: Microsoft releases 14 bulletins on Patch Tuesday, ends Windows Server 2003 support.

You can purchase, from Microsoft, special support contracts but as the experience of the US Navy has shown, that can be an expensive proposition ($9.1 million per year).

That may sound like a lot of income, and it is to a small to medium company, but remember that $9.1 million is roughly 0.01% of Microsoft’s revenue as shown in its 2014 Annual Report.
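The ratio is easy to check for yourself. The sketch below assumes fiscal 2014 revenue of about $86.8 billion; that figure is an assumption added for illustration and is not quoted in this post, so substitute the number from the Annual Report. The result is about 0.0001 as a plain fraction, which is about 0.01 when expressed as a percentage.

```python
# Assumed figure (not from this post): Microsoft fiscal 2014 revenue, ~$86.8 billion.
ASSUMED_REVENUE_USD = 86.8e9
# Annual cost of the US Navy support contract, as cited above.
NAVY_CONTRACT_USD = 9.1e6


def share_of_revenue(part: float, whole: float) -> float:
    """Return part / whole as a plain fraction (not a percentage)."""
    return part / whole


fraction = share_of_revenue(NAVY_CONTRACT_USD, ASSUMED_REVENUE_USD)
print(f"as a fraction: {fraction:.5f}")
print(f"as a percent:  {fraction * 100:.3f}%")
```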

I don’t know who to ask at Microsoft but they should consider making Windows XP, Windows Server 2003, etc. into open source projects.

Some 61% of businesses are reported to still be using Windows Server 2003. Support beyond the end of life for Windows Server 2003 will be $600 per server, for the first year with higher fees to follow.

Although open sourcing Windows Server 2003 might cut into some of the maintenance contract income, it would greatly increase the pressure on businesses to migrate off of Windows Server 2003 as hackers get first hand access to this now ancient code base.

In some ways, open sourcing Windows XP, Windows Server 2003 could be a blue light special that benefits all shoppers.

Microsoft obtains the obvious benefits of greater demand, initially, for formal support contracts and in the long run, the decreasing costs of maintaining ancient code bases, plus new income from migrations.

People concerned with the security, or lack thereof, in ancient systems gain first hand knowledge of those systems and of bugs to avoid in the future.

IT departments benefit from having stronger grounds to argue that long delayed migrations must be undertaken or face the coming tide of zero-day vulnerabilities based on source code access.

Users benefit in the long run from the migration to modern computing architectures and their features. A jump comparable to going from a transistor radio to a smart phone.

Beyond Code

Tuesday, July 14th, 2015

From the description:

To understand large legacy systems we need to look beyond the current structure of the code. We need to understand both how the system evolves and how the people building it collaborate. In this session you’ll learn about a Clojure application to mine social information such as communication paths, developer knowledge and hotspots from source code repositories (including Clojure itself). It’s information you use to improve both the design and the people-side of your codebase. We’ll also look at some interesting libraries like Incanter and Instaparse before we discuss the pros and cons of writing the tool in Clojure.

From the presentation:

“Laws” of Software Evolution

Continuing Change

“a system must be continually adapted or it becomes progressively less satisfactory”

Increasing Complexity

“as a system evolves, its complexity increases unless work is done to maintain or reduce it.”

Those two “laws” can be claimed for software but they are applicable to any system, including semantics.

Adam develops code that uses source control logs to identify “hot spots” in code, which is available at: Code Maat.
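To make the “hot spot” idea concrete, here is a minimal sketch in Python that counts how often each file appears in a version control log. It is not Code Maat’s implementation (Code Maat is written in Clojure and does far more). The function name, the sample log and the file names are all hypothetical, and the input format assumes output shaped like `git log --name-only --pretty=format:--%h`.

```python
from collections import Counter


def change_frequency(log_text: str) -> Counter:
    """Count how many commits touched each file.

    Assumes a log where each commit starts with a line beginning "--"
    (the abbreviated hash) followed by the paths changed in that commit.
    """
    counts = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        if not line or line.startswith("--"):
            continue  # skip blank separators and commit header lines
        counts[line] += 1
    return counts


# Hypothetical sample log with three commits.
SAMPLE_LOG = """\
--a1b2c3
src/core.clj
src/parser.clj

--d4e5f6
src/core.clj

--0718ab
src/core.clj
README.md
"""

# Files changed most often are the candidate hot spots.
hotspots = change_frequency(SAMPLE_LOG).most_common(2)
print(hotspots)
```

Change frequency alone is a crude signal; it only tells you where to start looking.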

You are likely to also be interested in Adam’s book: Your Code as a Crime Scene: Use Forensic Techniques to Arrest Defects, Bottlenecks, and Bad Design in Your Programs

Early in his presentation Adam mentions that the majority of a programmer’s time isn’t spent programming but rather “…making changes to existing code and the majority of that, trying to understand what the code does….” Imagine capturing your “understanding” of existing code using a topic map.

That increases your value-add to the organization. Yes?

Iranian Nuclear Arm Deal (full text)

Tuesday, July 14th, 2015

Here’s the full text of the Iran nuclear deal via Max Fisher.

From the post:

Here is the full text of the Iran nuclear deal. The “E3/EU+3” is a reference to the world powers that negotiated the deal with Iran (three European Union states of UK, France, and Germany, plus three others of China, the US, and Russia). A lot of the text is highly technical, but it’s still surprisingly readable for an international arms control agreement that was hammered out in many past-midnight sessions.

Kudos to Max and Vox for making the primary text available.

Suggest you selectively print it out and keep a copy close at hand while watching reporting/commentary on the document in your hand.

I say selectively because the full document runs one-hundred and fifty-nine (159) pages with some pages being entirely lists of entities that are no longer sanctioned and similar details.

Courthouse High Club

Tuesday, July 14th, 2015

You have no doubt heard of the “mile high club.” Well, now there is an even more exclusive club for the sexually adventuresome: the Courthouse High Club, as reported in: US Marshals Employee Caught Having Sex On Courthouse Roof in Pennsylvania.

From the post:

A resident of a nearby apartment building who was concerned that there was a security breach snapped the pictures this week and sent them to WHTM-TV in Harrisburg, which alerted authorities.

Is this another instance of see something, say something?

The inability to filter duplicates of this story prevents estimating the membership of the courthouse high club.

Visualising Geophylogenies in Web Maps Using GeoJSON

Monday, July 13th, 2015

Visualising Geophylogenies in Web Maps Using GeoJSON by Roderic Page.


This article describes a simple tool to display geophylogenies on web maps including Google Maps and OpenStreetMap. The tool reads a NEXUS format file that includes geographic information, and outputs a GeoJSON format file that can be displayed in a web map application.

From the introduction (with footnotes omitted):

The increasing number of georeferenced sequences in GenBank [ftnt omitted] and the growth of DNA barcoding [ftnt omitted] means that the raw material to create geophylogenies [ftnt omitted] is readily available. However, constructing visualisations of phylogenies and geography together can be tedious. Several early efforts at visualising geophylogenies focussed on using existing GIS software [ftnt omitted], or tools such as Google Earth [ftnt omitted]. While the 3D visualisations enabled by Google Earth are engaging, it’s not clear that they are easy to interpret. Another tool, GenGIS [ftnt omitted], supports 2D visualisations where the phylogeny is drawn flat on the map, avoiding some of the problems of Google Earth visualisations. However, like Google Earth, GenGIS requires the user to download and install additional software on their computer.

By comparison, web maps such as Google Maps [ftnt omitted] are becoming ubiquitous and work in most modern web browsers. They support displaying user-supplied data, including geometrical information encoded in formats such as GeoJSON, making them a light weight alternative to 3D geophylogeny viewers. This paper describes a tool that makes use of the GeoJSON format and the capabilities of web maps to create quick and simple visualisations of geophylogenies.

Whether you are interested in geophylogenies or in the use of GeoJSON, this is a post for you.
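If GeoJSON is new to you, the following Python sketch shows the general shape of the output such a tool produces: points for sampled taxa and lines for branches, wrapped in a FeatureCollection. It is not Roderic Page’s tool and does not read NEXUS files. The taxon names and coordinates are hypothetical, and placing a single root at the centroid of the tips (a star tree) is a simplification made only to keep the example short.

```python
import json

# Hypothetical tips: taxon name -> (longitude, latitude).
# GeoJSON orders coordinates as longitude first, then latitude.
TIPS = {
    "taxon_A": (174.8, -41.3),
    "taxon_B": (147.3, -42.9),
    "taxon_C": (151.2, -33.9),
}


def centroid(points):
    """Return the arithmetic mean of a list of (lon, lat) pairs."""
    lons, lats = zip(*points)
    return (sum(lons) / len(lons), sum(lats) / len(lats))


def feature(geometry_type, coordinates, **properties):
    """Build one GeoJSON Feature."""
    return {
        "type": "Feature",
        "geometry": {"type": geometry_type, "coordinates": coordinates},
        "properties": properties,
    }


def star_geophylogeny(tips):
    """Draw each tip as a Point and join it to a shared root with a LineString."""
    root = centroid(list(tips.values()))
    features = []
    for name, (lon, lat) in tips.items():
        features.append(feature("Point", [lon, lat], name=name))
        features.append(feature("LineString", [list(root), [lon, lat]], child=name))
    return {"type": "FeatureCollection", "features": features}


collection = star_geophylogeny(TIPS)
print(json.dumps(collection, indent=2))
```

The printed JSON can be pasted into any web map or viewer that accepts GeoJSON.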


Building News Apps Quickly?

Monday, July 13th, 2015

Want to make it easier to build news apps quickly? Vox Media has opensourced its solution, Autotune by Justin Ellis.

From the post:

Making a beautiful app for news is great; making a beautiful reusable app for news is better. At least that’s the thinking behind a new project released by Vox Media today: Autotune is a system meant to simplify the creation and duplication of things like data visualizations, graphics, or games.

Autotune was designed by members of the Vox Media product team to cut down on the repetitive work of taking one project — say, an image slider — and making it easy to use elsewhere. It’s “a centralized management system for your charts, graphics, quizzes and other tools, brought to you by the Editorial Products team at Vox Media,” according to the project’s GitHub page. And, yes, that means Autotune is open source.

Sounds like a great project but I will have to get a cellphone to pass judgement on apps. 😉 I would have to get a Faraday cage to keep it in when not testing apps.

The impact of fast networks on graph analytics, part 1

Monday, July 13th, 2015

The impact of fast networks on graph analytics, part 1 by Frank McSherry.

From the post:

This is a joint post with Malte Schwarzkopf, cross-blogged here and at the CamSaS blog.

tl;dr: A recent NSDI paper argued that data analytics stacks don’t get much faster at tasks like PageRank when given better networking, but this is likely just a property of the stack they evaluated (Spark and GraphX) rather than generally true. A different framework (timely dataflow) goes 6x faster than GraphX on a 1G network, which improves by 3x to 15-17x faster than GraphX on a 10G network.

I spent the past few weeks visiting the CamSaS folks at the University of Cambridge Computer Lab. Together, we did some interesting work, which we – Malte Schwarzkopf and I – are now going to tell you about.

Recently, a paper entitled “Making Sense of Performance in Data Analytics Frameworks” appeared at NSDI 2015. This paper contains some surprising results: in particular, it argues that data analytics stacks are limited more by CPU than they are by network or disk IO. Specifically,

“Network optimizations can only reduce job completion time by a median of at most 2%. The network is not a bottleneck because much less data is sent over the network than is transferred to and from disk. As a result, network I/O is mostly irrelevant to overall performance, even on 1Gbps networks.” (§1)

The measurements were done using Spark, but the authors argue that they generalize to other systems. We thought that this was surprising, as it doesn’t match our experience with other data processing systems. In this blog post, we will look into whether these observations do indeed generalize.

One of the three workloads in the paper is the BDBench query set from Berkeley, which includes a “page-rank-like computation”. Moreover, PageRank also appears as an extra example in the NSDI slide deck (slide 38-39), used there to illustrate that at most a 10% improvement in job completion time can be had even for a network-intensive workload.

This was especially surprising to us because of the recent discussion around whether graph computations require distributed data processing systems at all. Several distributed systems get beat by a simple, single-threaded implementation on a laptop for various graph computations. The common interpretation is that graph computations are communication-limited; the network gets in the way, and you are better off with one machine if the computation fits.[footnote omitted]

The authors introduce Rust and timely dataflow to achieve rather remarkable performance gains. That is if you think a 4x-16x speedup over GraphX on the same hardware is a performance gain. (Most do.)

Code and instructions are available so you can test their conclusions for yourself. Hardware is your responsibility.
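For a sense of the computation being benchmarked, here is a minimal single-threaded PageRank sketch in Python. It is not the authors’ Rust/timely dataflow code and says nothing about their performance numbers. The damping factor, iteration count and the tiny example graph are assumptions chosen for illustration.

```python
def pagerank(edges, damping=0.85, iterations=20):
    """Power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_degree = {node: 0 for node in nodes}
    for src, _ in edges:
        out_degree[src] += 1

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if out_degree[node] == 0)
        base = (1.0 - damping) / n + damping * dangling / n
        next_rank = {node: base for node in nodes}
        # Each node passes an equal share of its rank along every out-edge.
        for src, dst in edges:
            next_rank[dst] += damping * rank[src] / out_degree[src]
        rank = next_rank
    return rank


# Hypothetical four-edge graph.
EDGES = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")]
ranks = pagerank(EDGES)
for node, score in sorted(ranks.items()):
    print(node, round(score, 4))
```

The inner loop over edges is where a distributed system has to exchange data between machines, which is why network speed is the question at issue.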

While you are waiting for part 2 to arrive, try Frank’s homepage for some fascinating reading.

Nominations by the U.S. President

Monday, July 13th, 2015

Nominations by the U.S. President

The Library of Congress created this resource which enables you to search for nominations by U.S. Presidents starting in 1981. There is information about the nomination process, the records and related nomination resources at About Nominations of the U.S. Congress.

Unfortunately I did not find a link to bulk data for presidential nominations nor an API for the search engine behind this webpage.

I say that because matching up nominees and/or their sponsors with campaign contributions would help get a price range on becoming the ambassador to Uruguay, etc.

I wrote to Ask a Law Librarian to check on the status of bulk data and/or an API. Will amend this post when I get a response.

Oh, there will be a response. For all the ills and failures of the U.S. government, which are legion, it is capable of assembling vast amounts of information and training people to perform research on it. Not in every case but if it falls within the purview of the Law Library of Congress, I am confident of a useful answer.

22 Amazing Sites and Tools

Monday, July 13th, 2015

22 of The Most Cool and Amazingly Useful Sites (and Tools) You Are Not Using by Thomas Oppong.

From the post:

The Internet is full of fascinating information. While getting lost can sometimes be entertaining, most people still prefer somewhat guided surfing excursions. It’s always great to find new websites that can help out and be useful in one way or another.

Here are a few of our favorites.

Thomas does a great job of collecting. The links/tools run from the odd, photos taken at the same location but years later (haven’t they heard of Photoshop?), to notes that self destruct after being read.

Remember for the self-destroying notes that bytes did cross the Internet to arrive. Capturing them sans the self-destruction method is probably a freshman exercise at better CS programs.

My problem is that my bookmark list probably has hundreds of useful links, if I could just remember which subjects they were associated with and was able to easily retrieve them. Yes, the cobbler’s child with no shoes.

Still, it is an interesting list.

What’s yours?

Democracy or Bankocracy? (+ Offsetting Greek Debt)

Monday, July 13th, 2015

If you are topic mapping the Greek debt crisis, have you considered what form of government Greece will have in the event that a recent agreement is approved by the Greek legislature?

The agreement, hammered out in what would be considered a coercive environment for criminal confessions, makes it clear that Greece is no longer a democracy.

Webster’s defines a democracy as:

a government in which the supreme power is vested in the people and exercised by them directly or indirectly through a system of representation usually involving periodically held free elections

Now read the statement from the Euro Summit.

Although the entire document is offensive to any notion of sovereignty and democracy, on page 4 you will find:

On top of that, the Greek authorities shall take the following actions:

  • to develop a significantly scaled up privatisation programme with improved governance; valuable Greek assets will be transferred to an independent fund that will monetize the assets through privatisations and other means. The monetization of the assets will be one source to make the scheduled repayment of the new loan of ESM and generate over the life of the new loan a targeted total of EUR 50bn of which EUR 25bn will be used for the repayment of recapitalization of banks and other assets and 50% of every remaining euro (i.e. 50% of EUR 25bn) will be used for decreasing the debt to GDP ratio and the remaining 50% will be used for investments.

    This fund would be established in Greece and managed by the Greek authorities under the supervision of relevant European Institutions. In agreement with Institutions and building on best international practices, a legislative framework should be adopted to ensure transparent procedures and adequate asset sale pricing, according to OECD principles and standards on the management of State Owned Enterprises (SOEs);

So a “democratic” government is being forced to submit to supervision of its debt repayment, by sales which will undoubtedly benefit investors of the states doing the forcing?

The agreement contains other provisions that make it clear the Greek government will be operating under the supervision of its creditors.

If the term bankocracy is unfamiliar, it is a form of government where banks and creditors dictate government “reform,” legislation, social policy, etc. to the citizens of a country.

Disagreement by the public (the sovereign in democracies) is followed by smothering the public’s economy with financial measures. People get to vote but creditors have the only votes that count.

Creditors strive for that position vis-a-vis debtors in the United States and to no small degree have turned bankruptcy courts into their collection agencies.

Should Greece “accept” this agreement, it will mean Greece is no longer a democracy but in fact a bankocracy, surviving only at the indulgence of its creditors.

Offsetting Greek Debt

For all of the discussion about Greek “debt,” there is an unmentioned solution that would resolve the Greek debt and leave Greece with a positive cash flow.

The EU nations, and others, can repay Greece and/or cancel debts owed by Greece, in an amount equivalent to the current value (plus accrued interest) of items looted from Greece.

Since “trust” is a legendary issue with the rest of the EU, they should take immediate steps to purge themselves of their status of being thieves.

One advantage of my proposal is that its likely popularity with the Greek people will mean that passing the necessary legislation will not be difficult.

Another advantage of my suggestion is Greece would be spared the transition from democracy to bankocracy.

The “trust” I would offer Greece’s creditors is the promise that democracy will not become a bankocracy, not at the birth place of democratic thought.

Greece’s creditors are desperate to avoid any people realizing their scribbling has no value without the self-enslavement of people and their governments. Greece should denounce the current “indebtedness” as a control mechanism, meant to benefit the few at the expense of the many. Part of the price of freedom is the courage to believe yourself to be free.

Radically slashing the principal and interest on Greek debt, so as to enable a comfortable repayment plan, as authorized by the Greek people, is more than enough to offer EU usurers in exchange for honoring their fiction.

EU creditors are trying to conquer Greece without the use of military force. That could be considered an act of war and Greece may start looking for allies to resist EU aggression. There are less pleasant alternatives than EU usurers being unhappy as a result of this “crisis.”

OEWatch (Operational Environment Watch)

Sunday, July 12th, 2015

OEWatch (Operational Environment Watch)

From the webpage:

FMSO’s Operational Environment Watch provides translated selections and analysis from a diverse range of foreign articles and other media that our analysts believe will give military and security experts an added dimension to their critical thinking about the Operational Environment.

The Foreign Military Studies Office (FMSO) at Fort Leavenworth, Kansas, is an open source research organization of the U.S. Army. Founded as the Soviet Army Studies Office in 1986, it was an innovative program that brought together military specialists and civilian academics to focus on military and security topics derived from unclassified, foreign media. The results were unclassified articles and papers that provided new understandings and broad access to information from a base of expertise in the U.S. Army, Department of Defense, and foreign and U.S. defense communities and universities.

Today FMSO maintains this research tradition of special insight and highly collaborative work. FMSO conducts unclassified research of foreign perspectives of defense and security issues that are understudied or unconsidered but that are important for understanding the environments in which the U.S. military operates. FMSO’s work today is still aimed at publication in unclassified journals and its research findings are taught in both military and civilian venues in the United States and around the world. FMSO is organized in the U.S. Army Training and Doctrine Command under the TRADOC G-2.

If you are working in open source intelligence, OEWatch is already familiar.

If OEWatch isn’t familiar and you are interested in foreign perspectives, you should add it to your reading list.

Granting that OEWatch has a perspective, it does collect, collate and dispense high quality information that would be difficult to collect for yourself.

BTW, OEWatch does visually separate its commentary from the content it is reporting from other sources. A far cry from U.S. media treatment of foreign news.

Reddit Archive! 1 TB of Comments

Sunday, July 12th, 2015

You can now download a dataset of 1.65 billion Reddit comments: Beware the Redditor AI by Mic Wright.

From the post:

Once our species’ greatest trove of knowledge was the Library of Alexandria.

Now we have Reddit, a roiling mass of human ingenuity/douchebaggery that has recently focused on tearing itself apart like Tommy Wiseau in legendarily awful flick ‘The Room.’

But unlike the ancient library, the fruits of Reddit’s labors, good and ill, will not be destroyed in fire.

In fact, thanks to Jason Baumgartner of (aided by The Internet Archive), a dataset of 1.65 billion comments, stretching from October 2007 to May 2015, is now available to download.

The data – pulled using Reddit’s API – is made up of JSON objects, including the comment, score, author, subreddit, position in the comment tree and a range of other fields.

The uncompressed dataset weighs in at over 1TB, meaning it’ll be most useful for major research projects with enough resources to really wrangle it.

Technically, the archive is incomplete, but not significantly. After 14 months of work and many API calls, Baumgartner was faced with approximately 350,000 comments that were not available. In most cases that’s because the comment resides in a private subreddit or was simply removed.

If you don’t have a spare TB of space at the moment, you will also be interested in:, where you will find several BigQueries already.

The full data set certainly makes an interesting alternative to the Turing test for AI. Can your AI generate, without assistance or access to this data set, the responses that appear therein? Is that a fair test for “intelligence?”

If you want updated data, consult the Reddit API.
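If you do download the archive, you will not want to load a terabyte into memory. Here is a minimal Python sketch that streams comments one at a time and tallies them by subreddit. It assumes one JSON object per line, which I have not verified against the actual dump. The `author`, `score` and `subreddit` keys follow the fields named in the quoted post, while the `body` key and all of the sample values are hypothetical.

```python
import json
from collections import Counter


def comments_per_subreddit(lines):
    """Tally comments by subreddit from an iterable of JSON lines.

    Accepts any iterable of strings (a list, or an open file handle),
    so the whole dataset never has to be held in memory at once.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        comment = json.loads(line)
        counts[comment.get("subreddit", "<unknown>")] += 1
    return counts


# Hypothetical sample records standing in for the real dump.
SAMPLE_LINES = [
    '{"author": "alice", "subreddit": "programming", "score": 12, "body": "Nice post"}',
    '{"author": "bob", "subreddit": "datasets", "score": 3, "body": "Thanks!"}',
    '{"author": "carol", "subreddit": "programming", "score": -1, "body": "Disagree"}',
]

totals = comments_per_subreddit(SAMPLE_LINES)
print(totals.most_common())
```

To run it against a real file, pass an open file handle in place of `SAMPLE_LINES`.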

Hacking Team Email Archive

Saturday, July 11th, 2015

Hacking Team Email Archive (Wikileaks)

Wikileaks has created a searchable version of over one (1) million emails from Hacking Team.