Archive for April, 2016

Similar Pages for Wikipedia – Lateral – Indexing Practices

Saturday, April 23rd, 2016

Similar Pages for Wikipedia (Chrome extension)

I started looking at this software with a mis-impression that I hope you can avoid.

I installed the extension and, as advertised, if I am on a Wikipedia page, it recommends “similar” Wikipedia pages.

Unless I’m billing time, plowing through page after page of tangentially related material isn’t my idea of a good time.

Ah, but I confused “document” with “page.”

I discovered that error while reading Adding Documents at Lateral, which gives the following example:


Ah! So “document” means as much or as little text as I choose to use when I add the document.

Which means if I were creating a document store of graph papers, I would capture only the new material and not the inevitable “a graph consists of nodes and edges….”

There are pre-populated data sets: News, 350,000+ news and blog articles, updated every 15 mins; arXiv, 1M+ papers (all), updated daily; PubMed, 6M+ medical journals from before July 2014; SEC, 6,000+ yearly financial reports / 10-K filings from 2014; Wikipedia, 463,000 pages which had 20+ page views in 2013.

I suspect the granularity of the pre-populated data sets is “document” in the usual sense.

Glad to see the option to define a “document” to be an arbitrary span of text.

I don’t need to find more “documents” (in the usual sense) but more relevant snippets that are directly on point.

Hmmm, perhaps indexing at the level of paragraphs instead of documents (usual sense)?

Which makes me wonder why we index at the level of documents (usual sense) anyway? Is it simply tradition from when indexes were prepared by human indexers? And indexes were limited by physical constraints?
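To make the paragraph idea concrete, here is a minimal sketch of an inverted index whose unit is the paragraph rather than the whole document. It has nothing to do with Lateral's API; the function names, the sample "papers" and the blank-line paragraph rule are all my own assumptions for illustration.

```python
import re
from collections import defaultdict

def split_paragraphs(text):
    """Split text on blank lines; each paragraph becomes its own 'document'."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def build_index(docs):
    """Map each lowercase word to the set of (doc_id, paragraph_no) pairs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for para_no, para in enumerate(split_paragraphs(text)):
            for word in re.findall(r"[a-z0-9]+", para.lower()):
                index[word].add((doc_id, para_no))
    return index

def search(index, *terms):
    """Return the (doc_id, paragraph_no) pairs that contain every term, sorted."""
    hits = [index.get(t.lower(), set()) for t in terms]
    return sorted(set.intersection(*hits)) if hits else []

# Hypothetical sample data: two papers sharing the same boilerplate opening.
docs = {
    "paper-1": "A graph consists of nodes and edges.\n\nWe introduce a new traversal for hypergraphs.",
    "paper-2": "A graph consists of nodes and edges.\n\nOur result bounds edge colorings.",
}
index = build_index(docs)
print(search(index, "traversal"))
```

A hit now points at the one paragraph that is on point, and the boilerplate opening paragraph is easy to spot because it matches the same terms in every paper.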

300 Terabytes of Raw Collider Data

Saturday, April 23rd, 2016

CERN Just Dropped 300 Terabytes of Raw Collider Data to the Internet by Andrew Liptak.

From the post:

Yesterday, the European Organization for Nuclear Research (CERN) dropped a staggering amount of raw data from the Large Hadron Collider on the internet for anyone to use: 300 terabytes worth.

The data includes a 100 TB “of data from proton collisions at 7 TeV, making up half the data collected at the LHC by the CMS detector in 2011.” The release follows another infodump from 2014, and you can take a look at all of this information through the CERN Open Data Portal. Some of the information released is simply the raw data that CERN’s own scientists have been using, while another segment is already processed, with the anticipated audience being high school science courses.

It’s not the same as having your own cyclotron in the backyard with a bubble chamber but it’s the next best thing!

If you have been looking for “big data” to stretch your limits, this fits the bill nicely.

Peer Review Fails, Again.

Saturday, April 23rd, 2016

One in 25 papers contains inappropriately duplicated images, screen finds by Cat Ferguson.

From the post:

Elisabeth Bik, a microbiologist at Stanford, has for years been a behind-the-scenes force in scientific integrity, anonymously submitting reports on plagiarism and image duplication to journal editors. Now, she’s ready to come out of the shadows.

With the help of two editors at microbiology journals, she has conducted a massive study looking for image duplication and manipulation in 20,621 published papers. Bik and co-authors Arturo Casadevall and Ferric Fang (a board member of our parent organization) found 782 instances of inappropriate image duplication, including 196 published papers containing “duplicated figures with alteration.” The study is being released as a pre-print on bioArxiv.
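A quick check of the arithmetic behind the headline, using only the figures quoted above. The one assumption, labeled in the code, is that the 782 instances correspond to 782 distinct papers.

```python
papers_screened = 20_621
flagged = 782    # assumption: one flagged instance per paper
altered = 196    # papers with "duplicated figures with alteration"

rate = flagged / papers_screened
print(f"{rate:.1%} of papers flagged, about 1 in {papers_screened / flagged:.0f}")
print(f"{altered / flagged:.0%} of flagged papers had altered duplicates")
```

On those figures the rate is a bit under 4 percent, which the headline rounds to one in 25.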

I don’t know which is the sadder news about this paper: the refusal of three (3) journals to date to publish this work, or that peer reviewers of the original papers missed the duplication.

For journals in the business of publishing, not in the business of publishing correct results, the refusal to publish an article that establishes the poor quality of those publications is perhaps understandable. Not acceptable but understandable.

Unless the joke is on the reading public and other researchers. Publications are just that, publications. May or may not resemble any experiment or experience that can be duplicated by others. Rely on published results at your own peril.

Transparent access to all data, not peer review, is the only path to solving this problem.

Doom as a tool for system administration (1999) – Pen Testing?

Saturday, April 23rd, 2016

Doom as a tool for system administration by Dennis Chao.

From the webpage:

As I was listening to Anil talk about daemons spawning processes and sysadmins killing them, I thought, “What a great user interface!” Imagine running around with a shotgun blowing away your daemons and processes, never needing to type kill -9 again.
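For anyone who has not typed it lately, `kill -9` sends SIGKILL to a process. The sketch below does the same from Python. It is not Chao's code; the `shoot` helper and the sleeping child process are hypothetical stand-ins, and it assumes a POSIX system.

```python
import signal
import subprocess
import sys

def shoot(proc):
    """Send SIGKILL (what `kill -9 <pid>` does) and wait for the process to die."""
    proc.send_signal(signal.SIGKILL)
    return proc.wait()

# Hypothetical "monster": a child process that would otherwise sleep for a minute.
monster = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exit_code = shoot(monster)
print(exit_code)  # negative signal number on POSIX
```

A Doom-style interface is a layer over exactly this call: the shotgun picks the process and the signal does the rest.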

In Doom: The Aftermath you will find some later references, the most recent being from 2004.

You will have better luck at the ACM Digital Library entry for Doom as an interface for process management, which lists 29 subsequent papers citing Chao’s work on Doom. The latest is from 2015.

If system administration with a Doom interface sounds cool, imagine a Doom hacking interface.

I can drive a car but I don’t set the timing, adjust the fuel injection, program the exhaust controls to beat inspectors, etc.

A higher level of abstraction for tools carries a cost but advantages as well.

Imagine cadres of junior high/high school students competing in pen testing contests.

Learning a marketable skill and helping cash-strapped IT departments with security testing.

Isn’t that a win-win situation?

Nine Inch Gremlins

Saturday, April 23rd, 2016

Nine Inch Gremlins


Stephen Mallette writes:

On the back of TinkerPop 3.1.2-incubating comes TinkerPop 3.2.0-incubating. Yes – a dual release – an unprecedented and daring move you’ve come to expect and not expect from the TinkerPop clan! Be sure to review the upgrade documentation in full as you may find some changes that introduce some incompatibilities.

The release artifacts can be found at this location:

The online docs can be found here: (user docs) (upgrade docs) (core javadoc) (full javadoc)

The release notes are available here:

The Central Maven repo has sync’d as well:

Another impressive release!

In reading the documentation I discovered that Ketrina Yim is responsible for drawing Gremlin and his TinkerPop friends.

I was relieved to find that Marko was only responsible for the Gremlin/TinkerPop code/prose and not the graphics as well. That would be too much talent for any one person! 😉


How to Use Excel

Friday, April 22nd, 2016

At the other end of the software universe from Erlang is Excel. 😉

Sometimes you don’t need a hand-rolled death ray but something more prosaic.

But you need skills even with simple tools. This tutorial will get you past a number of gotchas that are “obvious” to experienced Excel users.

How to Use Excel

From the post:

Ever find yourself elbows deep in an Excel worksheet with seemingly no end in sight? You’re manually replicating columns and scribbling down long-form math on a scrap of paper, all while thinking to yourself, “There has to be a better way to do this.”

Truth be told, there probably is … you just don’t know it yet. In a world where being proficient in Excel is often regarded as no more impressive than being proficient at breathing, there are still plenty of tips and tricks that remain unknown to the majority of office workers.

Mastering Excel specifically for marketing is another beast in its own right. More than likely, you’ve already been tasked with analyzing data from an NPS survey, performing a content topic analysis, or pulling in sales data to calculate return on marketing investment — all of which require a bit more Excel knowledge than a simple SUM formula.

Here’s where this guide comes in. Whether you’d like to speed up your chart formatting, finally understand pivot tables, or complete a VLOOKUP (I promise it’s not as scary as it sounds), we’ll teach you everything you need to know to call yourself a master of Excel — and truly mean it.

Since we all know that reading about Excel may not be the most captivating topic, we’ve tried to cater the training to your unique learning style. At the start of each advanced topic, you’ll find a short video to dip your toe in the water — a perfect solution for those pressed for time and in search of a quick answer. Next, the deep dive. Read along for a few extra functions and reporting insight. Finally, for those who learn best by doing, we’ve included Test Your Skills questions at the close of each chapter for you to complete with our Excel practice document.


SOAP and ODBC Erlang Libraries!

Friday, April 22nd, 2016

Bet365 donates Erlang libraries to GitHub by Cliff Saran.

From the post:

Online bookie Bet365 has released code into the GitHub open-source library to encourage enterprise developers to use the Erlang functional programming language.

The company has used Erlang since 2012 to overcome the challenges of using higher performance hardware to support ever-increasing volumes of web traffic.

“Erlang is a precision tool for developing distributed systems that demand scale, concurrency and resilience. It has been a superb technology choice in a business such as ours that deals in high traffic volumes,” said Chandru Mullaparthi, head of software architecture at Bet365.

I checked: the SOAP library is out and the ODBC library is forthcoming.

Cliff’s post ends with this cryptic sentence:

These releases represent the first phase of a support programme that will aim to address each of the major issues surrounding the uptake of Erlang.

That sounds promising!

Following @cmullaparthi to catch developing news.

Where You Look – Determines What You See

Friday, April 22nd, 2016

Mapping an audience-centric World Wide Web: A departure from hyperlink analysis by Harsh Taneja.


This article argues that maps of the Web’s structure based solely on technical infrastructure such as hyperlinks may bear little resemblance to maps based on Web usage, as cultural factors drive the latter to a larger extent. To test this thesis, the study constructs two network maps of 1000 globally most popular Web domains, one based on hyperlinks and the other using an “audience-centric” approach with ties based on shared audience traffic between these domains. Analyses of the two networks reveal that unlike the centralized structure of the hyperlink network with few dominant “core” Websites, the audience network is more decentralized and clustered to a larger extent along geo-linguistic lines.

Apologies but the article is behind a paywall.

A good example of what you look for determining your results. And an example of how paywalls prevent meaningful discussion of such research.

Unless you know of a site like of course.


PS: This is what an audience-centric web mapping looks like:


Impressive work!

Weekend Hacking Practice – WIN-T

Friday, April 22nd, 2016


U.S. Army Finds Its New Communications Network Is Vulnerable to Hackers by Aaron Pressman.

From the post:

The U.S. Army’s new $12 billion mobile communications system remains vulnerable to hackers, according to a recent assessment by outside security experts, prompting a series of further improvements.

Already in use in Iraq and Afghanistan, the Warfighter Information Network-Tactical Increment 2, or WIN-T, system is supposed to allow for protected voice, video, and data communications by troops on the move. In June, General Dynamics won a $219 million order for communications systems to go in more than 300 vehicles.

Government overseers have regularly criticized cyber security features of WIN-T in reports over the past few years, prompting an outside review by Johns Hopkins University and the Army Research Laboratory. The public reports do not disclose specific vulnerabilities, however.

Do you appreciate the use of “finds” rather than “admits to” flaws in their $12 billion mobile communication center?

Public reports not “…disclos[ing] specific vulnerabilities” was very likely in the interest of saving space in the reports.

Or as noted in the DOT&E report on the WIN-T:

WIN-T Increment 2 is not survivable. Although improved, WIN-T Increment 2 continues to demonstrate cybersecurity vulnerabilities. This is a complex challenge for the Army since WIN-T is dependent upon the cyber defense capabilities of all mission command systems connected to the network. (Emphasis added.) at page WIN-T 156.

Listing all the vulnerabilities of the WIN-T Increment 2 or Increment 3 would be equivalent to detailing all the vulnerabilities of the Sony network.

Interesting in a cataloging sort of way but only just.

Besides, it’s more sporting to challenge hackers to find vulnerabilities in WIN-T Increment 2 or Increment 3 without a detailed listing.

PS: Talk about an attack surface: General Dynamics Receives $219 Million for U.S. Army’s WIN-T Increment 2 Systems

General Dynamics Mission Systems and more than 500 suppliers nationwide will continue to work together to build and deliver WIN-T Increment 2 systems, the Army’s “Digital Guardian Angel.”

That doesn’t include all the insecure systems that tie into the WIN-T.

Maybe they will change the acronym to RDS – Rolling Digital Sieve?

Cybersecurity Via Litigation

Friday, April 22nd, 2016

Ex-Hacker: If You Get Hacked, Sue Somebody by Frank Konkel.

From the post:

Jeff Moss, the hacker formerly known as Dark Tangent and founder of Black Hat and DEFCON computer security conferences, has a message for the Beltway tech community: If you get owned, sue somebody.

Sue the hackers, the botnet operators that affect your business or the company that developed insecure software that let attackers in, Moss said. The days of software companies having built-in legal “liability protections” are about to come to an end, he argued.

“When the Internet-connected toaster burns down the kitchen, someone is going to get sued,” said Moss, speaking Wednesday at the QTS Information Security and Compliance Forum in Washington, D.C. “The software industry is the only industry with liability protection. Nobody else has liability protection for some weird reason. Do you think that is going to last forever?”

Some customer and their law firm will be the first ones to tag a major software company for damages.

Will that be your company/lawyers?

The only way to dispel the aura of invulnerability from around software companies is by repeated assaults by people damaged by their negligence.

Tort (think liability for civil damages) law has a long and complex history. A history that would not have developed had injured people been content to simply be injured with no compensation.

On torts in general, see: Elements of Torts in the USA by Robert B. Standler.

I tried to find an online casebook that had edited versions of some of the more amusing cases from tort history but to no avail.

You would be very surprised at what conduct has been shielded from legal liability over the years. But times do change and sometimes in favor of the injured party.

If you want to donate a used tort casebook, I’ll post examples of changing liability as encouragement for suits against software vendors. Stripped of all the legalese, facts of cases can be quite amusing/outraging.

UTF-8 encoding table and Unicode characters

Friday, April 22nd, 2016

UTF-8 encoding table and Unicode characters

The mapping between UTF-8 and binary representations doesn’t come up often, but it did today.

Rather than hunting through bookmarks in the future, I am capturing this resource here.
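If the table is not at hand, a row can be regenerated with a few lines of Python. This is a minimal sketch using only the standard `str.encode`; the `utf8_row` helper and the sample characters are my own choices, not taken from the linked page.

```python
def utf8_row(ch):
    """Return (code point, hex bytes, binary bytes) for one character."""
    encoded = ch.encode("utf-8")
    return (
        f"U+{ord(ch):04X}",
        " ".join(f"{b:02x}" for b in encoded),
        " ".join(f"{b:08b}" for b in encoded),
    )

# One-, two- and three-byte examples.
for ch in "Aé€":
    print(ch, *utf8_row(ch))
```

The binary column makes the encoding pattern visible: a single-byte character starts with 0, a lead byte starts with 110 or 1110, and every continuation byte starts with 10.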

Corporate Bribery/Corruption – Poland/U.S./Russia – A Trio

Friday, April 22nd, 2016

GIJN (Global Investigative Journalism Network) tweeted a link to Corporate misconduct – individual consequences, 14th Global Fraud Survey this morning.

From the foreword by David L. Stulb:

In the aftermath of recent major terrorist attacks and the revelations regarding widespread possible misuse of offshore jurisdictions, and in an environment where geopolitical tensions have reached levels not seen since the Cold War, governments around the world are under increased pressure to face up to the immense global challenges of terrorist financing, migration and corruption. At the same time, certain positive events, such as the agreement by the P5+1 group (China, France, Russia, the United Kingdom, the United States, plus Germany) with Iran to limit Iran’s sensitive nuclear activities are grounds for cautious optimism.

These issues contribute to volatility in financial markets. The banking sector remains under significant regulatory focus, with serious stress points remaining. Governments, meanwhile, are increasingly coordinated in their approaches to investigating misconduct, including recovering the proceeds of corruption. The reason for this is clear. Bribery and corruption continue to represent a substantial threat to sluggish global growth and fragile financial markets.

Law enforcement agencies, including the United States Department of Justice and the United States Securities and Exchange Commission, are increasingly focusing on individual misconduct when investigating impropriety. In this context, boards and executives need to be confident that their businesses comply with rapidly changing laws and regulations wherever they operate.

For this, our 14th Global Fraud Survey, EY interviewed senior executives with responsibility for tackling fraud, bribery and corruption. These individuals included chief financial officers, chief compliance officers, heads of internal audit and heads of legal departments. They are ideally placed to provide insight into the impact that fraud and corruption is having on business globally.

Despite increased regulatory activity, our research finds that boards could do significantly more to protect both themselves and their companies.

Many businesses have failed to execute anti-corruption programs to proactively mitigate their risk of corruption. Similarly, many businesses are not yet taking advantage of rich seams of information that would help them identify and mitigate fraud, bribery and corruption issues earlier.

Between October 2015 and January 2016, we interviewed 2,825 individuals from 62 countries and territories. The interviews identified trends, apparent contradictions and issues about which boards of directors should be aware.

Partners from our Fraud Investigation & Dispute Services practice subsequently supplemented the Ipsos MORI research with in-depth discussions with senior executives of multinational companies. In these interviews, we explored the executives’ experiences of operating in certain key business environments that are perceived to expose companies to higher fraud and corruption risks. Our conversations provided us with additional insights into the impact that changing legislation, levels of enforcement and cultural behaviors are having on their businesses. Our discussions also gave us the opportunity to explore pragmatic steps that leading companies have been taking to address these risks.

The executives to whom we spoke highlighted many matters that businesses must confront when operating across borders: how to adapt market-entry strategies in countries where cultural expectations of acceptable behaviors can differ; how to get behind a corporate structure to understand a third party’s true ownership; the potential negative impact that highly variable pay can have on incentives to commit fraud and how to encourage whistleblowers to speak up despite local social norms to the contrary, to highlight a few.

Our survey finds that many respondents still maintain the view that fraud, bribery and corruption are other people’s problems despite recognizing the prevalence of the issue in their own countries. There remains a worryingly high tolerance or misunderstanding of conduct that can be considered inappropriate — particularly among respondents from finance functions. While companies are typically aware of the historic risks, they are generally lagging behind on the emerging ones, for instance the potential impact of cybercrime on corporate reputation and value, while now well publicized, remains a matter of varying priority for our respondents. In this context, companies need to bolster their defenses. They should apply anti-corruption compliance programs, undertake appropriate due diligence on third parties with which they do business and encourage and support whistleblowers to come forward with confidence. Above all, with an increasing focus on the accountability of the individual, company leadership needs to set the right tone from the top. It is only by taking such steps that boards will be able to mitigate the impact should the worst happen.

This survey is intended to raise challenging questions for boards. It will, we hope, drive better conversations and ongoing dialogue with stakeholders on what are truly global issues of major importance.

We acknowledge and thank all those executives and business leaders who participated in our survey, either as respondents to Ipsos MORI or through meeting us in person, for their contributions and insights. (emphasis in original)

Apologies for the long quote but it was necessary to set the stage for the significance of:

…increasingly focusing on individual misconduct when investigating impropriety.

That policy grants a “bye” to corporations who benefit from individual misconduct, in favor of punishing individual actors within a corporation.

I grant the legitimacy of punishing individuals, since corporations cannot act except by their agents. But failing to punish corporations enables their shareholders to continue to benefit from illegal behavior.

Another point of significance: the listing of countries on page 44 gives the percentage of respondents that agree “…bribery/corrupt practices happen widely…” as follows (in part):

Rank  Country  % Agree
30    Poland   34
31    Russia   34
32    U.S.     34

When the Justice Department gets hoity-toity about law and corruption, keep those figures in mind.

If the Justice Department representative you are talking to isn’t corrupt (it happens), there’s one on either side of them that probably is.

Topic maps can help ferret out or manage “corruption,” depending upon your point of view. Even structural corruption: take the U.S. political campaign donation process, for example.

Uniting Journalists and Hackers?

Friday, April 22nd, 2016

Kevin Gosztola’s post, US News Editors Find It Increasingly Difficult to Defend First Amendment, is very sad, especially where he covers the inability to obtain records/information:

Forty-four percent of editors indicated their news organization was less able to go on the offense and sue to open up access to information.

“Newspaper-based (and especially TV-based) companies have tougher budgets and are less willing to spend on lawyers to challenge sunshine and public records violations,” one editor acknowledged.

Another editor declared, “The loss of journalist jobs and publishers’ declining profits means there’s less opportunity to pursue difficult stories and sue for access to information.” The costs of litigation constrain organizations.

“Government agencies are well aware that we do not have the money to fight. More and more, their first response to our records request is, ‘Sue us if you want to get the records,’” one editor stated.

What if the journalism and hacker communities can unite to change:

‘Sue us if you want to get the records’

into a crowd-sourced:

‘Hack us if you want to get the records’

The effectiveness of crowd-sourcing requires no documentation.

Public service hacking by crowds of hackers would greatly reduce the legal fees expended to obtain records.

There are two elements missing for effective crowd-sourced hacking in support of journalists:

  1. Notice of what records journalists want.
  2. Disconnecting hackers from journalists.

Both elements could be satisfied by a public records request board that enables journalists to anonymously request records and allows anonymous responses with pointers to the requested records.

If subpoenaed, give the authorities the records that were posted anonymously. (One assumes hackers won’t leave their fingerprints on them.)

There may be such a notice board already frequented by journalists and hackers so please pardon my ignorance if that is the case.

From Kevin’s post I got the impression that isn’t the case.

PS: If you have ethical qualms about this approach, recall the executive branch decided to lie at will to judicial fact-finders, thereby rendering judicial review a farce. They have no one but themselves to blame for suggestions to by-pass that process.

Cosmic Web

Thursday, April 21st, 2016

Cosmic Web

From the webpage:

Immerse yourself in a network of 24,000 galaxies with more than 100,000 connections. By selecting a model, panning and zooming, and filtering different, you can delve into three distinct models of the cosmic web.

Just one shot from the gallery:


I’m not sure if the display is accurate enough for inter-galactic navigation but it is certainly going to give you ideas about more effective visualization.


Scope Rules!

Thursday, April 21st, 2016

I was reminded of the power of scope (in the topic map sense) when I saw John D. Cook’s Quaternions in Paradise Lost.


See John’s post for the details but in summary, Kuipers’ Quaternions and Rotation Sequences quoted a passage from Milton that used the term quaternion.

Your search appliance and most if not all of the public search engines will happily return all uses of quaternion without distinction. (Yes, I am implying there is more than one meaning for quaternion. See John’s post for the details.)

In addition to distinguishing between usages in Milton and Kuipers, scope can cleanly separate terms by agency, activity, government or other distinctions.

Or you can simply wade through search glut.

Your call.
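As a rough illustration of what scope buys you, here is a minimal sketch of a scoped lookup. It is not a topic map engine or any standard API; the `Occurrence` class, the scope labels and the `search` function are hypothetical names invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Occurrence:
    term: str
    source: str
    scope: frozenset  # contexts in which this usage holds (hypothetical labels)

occurrences = [
    Occurrence("quaternion", "Paradise Lost", frozenset({"milton", "poetry"})),
    Occurrence("quaternion", "Quaternions and Rotation Sequences", frozenset({"mathematics"})),
]

def search(term, scope=None):
    """Return sources using `term`; if `scope` is given, keep only usages valid in it."""
    return [
        o.source for o in occurrences
        if o.term == term and (scope is None or scope in o.scope)
    ]

print(search("quaternion"))                 # unscoped: every usage comes back
print(search("quaternion", "mathematics"))  # scoped: only the mathematical usage
```

The unscoped call is the search-glut case; the scoped call is the distinction a plain string match cannot make.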

News Flash: Only “Customary” Speakers Protected From Prior Restraint

Thursday, April 21st, 2016

National Security Letters Upheld As Constitutional

From the post:

A federal judge has unsealed her ruling that National Security Letter (NSL) provisions in federal law—as amended by the USA FREEDOM Act—don’t violate the Constitution. The ruling allows the FBI to continue to issue the letters with accompanying gag orders that silence anyone from disclosing they have received an NSL, often for years. The Electronic Frontier Foundation (EFF) represents two service providers in challenging the NSL statutes, who will appeal this decision to the United States Court of Appeals for the Ninth Circuit.

“Our heroic clients want to talk about the NSLs they received from the government, but they’ve been gagged—one of them since 2011,” said EFF Deputy Executive Director Kurt Opsahl. “This government silencing means the service providers cannot issue open and honest transparency reports and can’t share their experiences as part of the ongoing public debate over NSLs and their potential for abuse. Despite this setback, we will take this fight to the appeals court, again, to combat USA FREEDOM’s unconstitutional NSL provisions.”

This long-running battle started in 2011, after one of EFF’s clients challenged an NSL and the gag order it received. In 2013, U.S. District Court Judge Susan Illston issued a groundbreaking decision, ruling that the NSL power was unconstitutional. However, the government appealed, and the Ninth Circuit found that changes made by the USA FREEDOM Act passed by Congress last year required a new review by the District Court.

In the decision unsealed this week, the District Court found that the USA FREEDOM Act sufficiently addressed the facial constitutional problems with the NSL law. However, she also ruled that the FBI had failed to provide a sufficient justification for one of our client’s challenges to the NSLs. After reviewing the government’s justification, the court found no “reasonable likelihood that disclosure … would result in danger to the national security of the United States,” or other asserted dangers, and prohibited the government from enforcing that gag. However, the client still cannot identify itself because the court stayed this portion of the decision pending appeal.

The district court’s decision has many low points; perhaps the lowest is its quoting of the Second Circuit in John Doe, Inc. v. Mukasey:

Although the nondisclosure requirement is in some sense a prior restraint,… it is not a typical example of such a restriction for it is not a restraint imposed on those who customarily wish to exercise rights of free expression, such as speakers in public fora, distributors of literature, or exhibitors of movies. And although the nondisclosure requirement is triggered by the content of a category of information, that category, consisting of the fact of the receipt of an NSL and some related details, is far more limited than the broad categories of information that have been at issue with respect to typical content-based restrictions.

In the court’s judgment, since customary speakers weren’t at issue, there’s no protection from prior restraint.

What a bizarre concept.

Are you a speaker in a public forum, a distributor of literature, an exhibitor of movies?

Well, I don’t qualify as an exhibitor of movies.

Nor do I qualify as a distributor of literature, at least in the sense of a traditional publisher.

Hmmm, do you think I qualify as a speaker in a public forum?

Perhaps, perhaps, but considering the tortured lengths the court went to reach its decision, what do you think the odds are that Wolf Blitzer is a speaker in a public forum and I’m not?

Or you for that matter?

Support the EFF in this fight; it’s your right to be informed about FBI excesses and to raise those with your elected representatives that is at stake.

Cory Doctorow on Librarians

Thursday, April 21st, 2016

Just in case you missed Cory’s tweet on April 21, 2016:

Saying “Librarians are obsolete now that we have the Internet” is like saying “Doctors are obsolete now that we have the plague”

If that doesn’t make sense to you:

  1. Time yourself finding relevant information about any topic.
  2. Ask your local librarian for relevant information on the same topic.
  3. Compare the results of 1 and 2.

Do you get it now?

Writing Challenge: Short Science!

Wednesday, April 20th, 2016

Welcome to Short Science!

From the webpage:

Sometimes research papers are hard to understand because of nomenclature, unclear writing, unnecessary formalisms, etc… It is useful to ask an expert in the field to summarize the intuition behind a paper to help you understand it. Have you every finally understood a paper and then said “That’s it?” The goal is that you have those “That’s it?” moments without having to spend days reading the paper!

Short Science allows researchers to publish paper summaries that are voted on and ranked until the best and most accessible summary has been found! The goal is to make all the seminal ideas in science accessible to the people that want to understand them.

Everyone can write summaries for any paper that exists in our database. It is easy to think you know enough about a paper until you try to write something public about it. You may discover that you don’t understand the concept when you try to write a short accessible summary about it.

Short Science is where Einstein puts you to the test “If I can’t explain it simply, I don’t understand it well enough.” Do you understand a paper well enough? Prove it to yourself by writing a summary for it! Also, write summaries of your own papers to help people to understand it and gain impact!

If you are looking for an authoring/writing challenge, you have found the place!

Not that I need another authoring venue but this looks like fun. Not to mention a way to get external validation.


Indigo is the new Blue

Wednesday, April 20th, 2016

Letter from Carl Malamud to Mr. Michael Zuckerman, Harvard Law Review Association.

You can read Carl’s letter for yourself.

Recommend to law students, law professors, judges, lawyers, people practicing for Jeopardy appearances, etc., The Indigo Book: An Open and Compatible Implementation of A Uniform System of Citation.

In any pleadings, briefs, essays, cite this resource as:

Sprigman et al., The Indigo Book: A Manual of Legal Citation, Public Resource (2016).

Every download of the Indigo Book saves someone $25.89 over a competing work on Amazon, which I won’t name for copyright reasons.

Adobe vs. Department of HomeLand Security (DHS) – Who To Trust?

Wednesday, April 20th, 2016

Windows Users Warned to Dump QuickTime Pronto by David Jones.

From the post:

The U.S. Department of Homeland Security on Thursday issued a warning to remove Apple’s QuickTime for Windows. The alert came in response to Trend Micro‘s report of two security flaws in the software, which will never be patched because Apple has ended support for QuickTime for Windows.

Computers running QuickTime are open to increased risk of malicious attack or data loss, US-CERT warned, and remote attackers could take control of a victim’s computer system. US-CERT is part of DHS’ National Cybersecurity and Communications Integration Center.

“We alerted DHS because we felt the situation was broad enough that people having unpatched vulnerabilities on their system needed to be made aware,” said Christopher Budd, global threat communication manager at Trend Micro.

Then, a few days later, Adobe clouded that warning, as reported in: Don’t be too quick to uninstall QuickTime for Windows warns Adobe by Graham Cluley.

Adobe issued this notice, pointing out that removing QuickTime for Windows may bite you if you are a Creative Cloud user.

Along with no date for removing QuickTime dependencies.

BTW, Apple has announced that the vulnerabilities that led to these conflicting announcements will not be fixed.

In terms of impact, I did find these statistics on website usage of QuickTime but wasn’t able to locate statistics on user installations of QuickTime on Windows boxes.

My guess is that QuickTime on Windows machines at government installations resembles a rash. Diaper rash that is.

This is a story to keep in mind when planning dependencies on software or data that is not under your control.

PS: I would uninstall it but then I don’t run Flash either. No one is completely safe but sleeping outside, naked, is just inviting trouble. That’s the analog equivalent of running either QuickTime or Flash.

Are Your Users Idiots?

Wednesday, April 20th, 2016

Here is a flowchart designed to prove to the news industry their audiences aren’t idiots:


Apologies, you will have to select the image to see a legible view of it.

The article, A serious problem the news industry does not talk about by Jennifer Barndel, has this section heading:

The culture of journalism breeds disdain for the people we’re meant to be serving, i.e., the audience.

I recommend this article to journalists, but programmers, lawyers, judges, agency heads, etc., could all profit from reading it and substituting their own stories and vocabularies into it.

It really isn’t the case that your “audience” is stupid. In the overwhelming majority of cases they don’t know what you know: either specific facts or the context in which to understand those facts.

Of course I should listen to my own advice when I chide the EU, but I despise provincialism almost as much as I do willful ignorance. That’s an excuse, not a justification.

Read Jennifer’s post as though it is speaking of your audience/users. It may present opportunities for growth you (and I) have overlooked.

EU Too Obvious With Wannabe A Monopoly Antics

Wednesday, April 20th, 2016

If you ever had any doubts (I didn’t) that the EU is as immoral as any other government, recent moves by the EU in the area of software will cure those.

EU hits Google with second antitrust charge by Foo Yun Chee reports:

EU antitrust regulators said that by requiring mobile phone manufacturers to pre-install Google Search and the Google Chrome browser to get access to other Google apps, the U.S. company was harming consumers by stifling competition.

Show of hands. How many of you think the EU gives a sh*t about consumers?

Yeah, that’s what I thought as well.

Or as Chee quotes European Competition Commissioner Margrethe Vestager:

“We believe that Google’s behavior denies consumers a wider choice of mobile apps and services and stands in the way of innovation by other players,” she said.

Hmmm, “other players.” Those don’t sound like consumers, those sound like people who will be charging consumers.

If you need confirmation of that reading, consider Anti-innovation: EU excludes open source from new tech standards by Glyn Moody.

From the post:

“Open” is generally used in the documents to denote “open standards,” as in the quotation above. But the European Commission is surprisingly coy about what exactly that phrase means in this context. It is only on the penultimate page of the ICT Standardisation Priorities document that we finally read the following key piece of information: “ICT standardisation requires a balanced IPR [intellectual property rights] policy, based on FRAND licensing terms.”

It’s no surprise that the Commission was trying to keep that particular detail quiet, because FRAND licensing—the acronym stands for “fair, reasonable, and non-discriminatory”—is incompatible with open source, which will therefore find itself excluded from much of the EU’s grand new Digital Single Market strategy. That’s hardly a “balanced IPR policy.”

Glyn goes on to say that FRAND licensing is the result of lobbying by American technical giants, but that seems unlikely.

The EU has attempted to favor EU-origin “allegedly” competitive software for years.

I say “allegedly” because the EU never points to competitive software in its antitrust proceedings that was excluded, only to the speculation that but for those evil American monopolists, there would be this garden of commercial and innovative European software. You bet.

There is a lot of innovative European software, but it hasn’t been produced in the same mindset that afflicts officials at the EU. They are fixated on an out-dated software sales/licensing model. Consider the rising number of companies based on nothing but open source if you want a sneak peek at the market of the future.

Being mired in market models from the past, the EU sees only protectionism (the Google complaint) and out-dated notions of software licensing (FRAND) as foundations for promoting a software industry in Europe.

Not to mention that the provincialism of the EU makes it the enemy of a growing software industry in Europe. Did you know that EU funded startups are limited to hiring EU residents? (Or so I have been told, by EU startups.) It certainly works that way with EU awards.

There is nothing inconsistent with promoting open source and a vibrant EU software industry, so long as you know something about both. Knowing nothing about either has led the EU astray.

Searching for Subjects: Which Method is Right for You?

Wednesday, April 20th, 2016

Leaving to one side how to avoid re-evaluating the repetitive glut of materials from any search, there is the more fundamental problem: how do you search for a subject?

This is a back-of-the-envelope sketch that I will be expanding, but here goes:

Basic Search

At its most basic, a search consists of a <term>, and the search seeks strings that match that <term>.

Even allowing for Boolean operators, the matches against <term> are only and forever string matches.
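A minimal sketch of that limitation, assuming a tiny invented corpus (the document texts and function names here are hypothetical, not from any real system). An OR of several terms is still nothing more than repeated string matching:

```python
# Hypothetical mini-corpus; the texts are invented for illustration only.
DOCS = {
    "d1": "Patient reports toothache after cold drinks.",
    "d2": "Odontalgia persisted for three days.",
    "d3": "Tongue pain following a burn.",
}


def basic_search(term, docs):
    """Return ids of documents whose text contains term (case-insensitive)."""
    needle = term.lower()
    return sorted(doc_id for doc_id, text in docs.items() if needle in text.lower())


def boolean_or(terms, docs):
    """OR of several terms: the union of plain string matches, nothing more."""
    hits = set()
    for term in terms:
        hits.update(basic_search(term, docs))
    return sorted(hits)


# d2 concerns the same subject but uses a different string, so it is missed.
print(basic_search("toothache", DOCS))
# The searcher has to supply the second string to pick it up.
print(boolean_or(["toothache", "odontalgia"], DOCS))
```

The search itself never learns that the two strings name one subject; that knowledge stays in the head of whoever typed the second term.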

Basic Search + Synonyms

Of course, as skilled searchers you will try not only one <term>, but several <synonym>s for the term as well.

A good example of that strategy is used at PubMed:

If you enter an entry term for a MeSH term the translation will also include an all fields search for the MeSH term associated with the entry term. For example, a search for odontalgia will translate to: “toothache”[MeSH Terms] OR “toothache”[All Fields] OR “odontalgia”[All Fields] because Odontalgia is an entry term for the MeSH term toothache. [PubMed Help]

The expansion from the entry term Odontalgia to the MeSH term toothache is useful, but how do you maintain it?

A reader can see “toothache” and “Odontalgia” are treated as synonyms, but why remains elusive.

This is the area of owl:sameAs, the mapping of multiple subject identifiers/locators to a single topic, etc. You know that “sameness” exists, but why isn’t clear.
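The query translation in the PubMed Help example can be imitated with a lookup table. This is only a sketch of the idea, not PubMed's implementation: the `ENTRY_TERMS` table and `expand_query` function are hypothetical, and only the odontalgia/toothache pair comes from the quoted help text.

```python
# Hypothetical entry-term table: entry term -> preferred (MeSH-style) term.
# Only the odontalgia/toothache pair comes from the PubMed Help example.
ENTRY_TERMS = {"odontalgia": "toothache"}


def expand_query(term, entry_terms=ENTRY_TERMS):
    """Build an OR query in the shape of the PubMed Help example."""
    key = term.lower()
    preferred = entry_terms.get(key)
    if preferred is None:
        # No mapping recorded: fall back to a plain all-fields search.
        return f'"{key}"[All Fields]'
    return (
        f'"{preferred}"[MeSH Terms] OR "{preferred}"[All Fields] '
        f'OR "{key}"[All Fields]'
    )


print(expand_query("odontalgia"))
```

Note what the table does not hold: any reason for the mapping. Whoever inherits `ENTRY_TERMS` can see that the two terms are equated but not on what basis.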

Subject Identity Properties

In order to maintain a PubMed or similar mapping, you either need people who “know” the basis for the mappings or you need the mappings documented. That is, you can say on what basis the mapping happened and what properties were present.

For example:


Key                Value
symptom            pain
general-location   mouth
specific-location  tooth

So if we are mapping terms to other terms and the specific location value reads “tongue,” then we know that isn’t a mapping to “toothache.”
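A minimal sketch of a documented mapping, assuming each term carries its identity properties as key/value pairs. The toothache properties are the ones listed above; the function names and the candidate terms are hypothetical.

```python
# Identity properties for "toothache", taken from the key/value listing above.
TOOTHACHE = {
    "symptom": "pain",
    "general-location": "mouth",
    "specific-location": "tooth",
}


def differing_keys(subject, candidate):
    """Keys on which the candidate lacks or disagrees with the subject's value."""
    return sorted(k for k, v in subject.items() if candidate.get(k) != v)


def same_subject(subject, candidate):
    """A candidate maps to the subject only if every identity property agrees."""
    return not differing_keys(subject, candidate)


# Hypothetical candidate terms with their recorded properties.
odontalgia = {"symptom": "pain", "general-location": "mouth", "specific-location": "tooth"}
tongue_pain = {"symptom": "pain", "general-location": "mouth", "specific-location": "tongue"}

print(same_subject(TOOTHACHE, odontalgia))
print(differing_keys(TOOTHACHE, tongue_pain))
```

Unlike a bare synonym table, a rejected mapping here comes with its reason: the list of properties that failed to agree.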

How Far Do You Need To Go?

Of course for every term that we use as a key or value, there can be an expansion into key/value pairs, such as for tooth:


Key               Value
general-location  mouth
composition       enamel coated bone
use               biting, chewing


Each step towards more precise gathering of information increases your pre-search costs but decreases your post-search cost of casting out irrelevant material.

Moreover, precise gathering of information will help you avoid missing data simply due to data glut returns.

If maintenance of your mapping across generations is a concern, doing more than mapping synonyms for a reason or reasons unknown may be in order.

The point being that your current retrieval or lack thereof of current and correct information has a cost. As does improving your current retrieval.

The question of improved retrieval isn’t ideological but an ROI driven one.

  • If you have better mappings will that give you an advantage over N department/agency?
  • Will better retrieval reduce (never eliminate) the time wasted by staff on voluminous search results?
  • Will more precision focus your limited resources (always limited) on highly relevant materials?

Formulate your own ROI questions and means of measuring them. Then reach out to topic maps to see how they improve (or not) your ROI.

Properly used, I think you are in for a pleasant surprise with topic maps.

Databases are categories

Tuesday, April 19th, 2016

Databases are categories by David I. Spivak.

Slides from a presentation 2010/06/03.

If you are more comfortable with databases than category theory, you may want to give these a spin.

I looked but was unable to locate video of the presentation. That would be a nice addition.


Hacking Target for Week of April 18 – 25, 2016

Monday, April 18th, 2016

3.2 Million Machines Found Vulnerable to Ransomware Campaign by David Bisson

From the post:

Researchers have found 3.2 million machines that are vulnerable of being targeted in a ransomware campaign.

According to a post published by the Cisco Talos Security Intelligence and Research Group, attackers can leverage vulnerabilities found in WildFly, an application server that also goes by the name JBoss, as an initial point of compromise to target upwards of 3.2 million machines.

Once they have established a foothold, bad actors can download malware onto the compromised machines and move laterally across the network to infect other computers.

Such was the case in a recent Samsam ransomware campaign, where attackers used a tool known as “JexBoss” to exploit JBoss application servers.

Further investigation by the Cisco Talos research team has uncovered 2,100 JBoss backdoors that have already been installed on 1,600 unique IP addresses.

There are far more than 3.2 million systems vulnerable to ransomware campaigns but here you have the advantage of targeting information and good odds of finding one of those targets.

Not that I advocate the use of ransomware, but increases in cyberattacks drive the need for better information management of hacking information for “white,” “gray,” and “black” hats alike.

Or as they say:

It’s an ill wind indeed that doesn’t blow anyone good.

Ask yourself: how much prose do you have to sift every day, day in, day out, just to remain partially current on security issues?

No, I’m not interested in fostering yet another meta-collection, rather a view into all existing collections, meta or not. Build upon what already exists and is useful.


PS: I’m not concerned with your hat color. That’s between you and your local law enforcement officials.

Dictionary of Fantastic Vocabulary [Increasing the Need for Topic Maps]

Monday, April 18th, 2016

Dictionary of Fantastic Vocabulary by Greg Borenstein.

Alexis Lloyd tweeted this link along with:

This is utterly fantastic.

Well, it certainly increases the need for topic maps!

From the bot description on Twitter:

Generating new words with new meanings out of the atoms of English.

Ahem, are you sure about that?

Is a bot generating meaning?

Or are readers conferring meaning on the new words as they are read?

If, as I contend, readers confer meaning, the utterance of every “new” word opens up as many new meanings as there are readers of the “new” word.

Example of people conferring different meanings on a term?

Ask a dozen people what is meant by “shot” in:

It’s just a shot away

When Lisa Fischer breaks into her solo in:

(Best played loud.)

Differences in meanings make for funny moments, awkward pauses, blushes, in casual conversation.

What if the stakes are higher?

What if you need to produce (or destroy) all the emails by “bobby1”?

Is it enough to find some of them?

What have you looked for lately? Did you find all of it? Or only some of it?

New words appear every day.

You are already behind. You will get further behind using search.

Clojure/west 2016 – Videos! [+ Unix Sort Trick]

Monday, April 18th, 2016

I started seeing references to Clojure/west 2016 videos and, to marginally increase your access to them, I have sorted them by author and, with a Unix sort trick, by title.

Unix Sort Trick (truthfully, just a new switch to me)

Having the videos in author order is useful but other people may remember a title and not the author.

I want to sort the already created <li> elements with sort, but you can see the obvious problem.

By default, sort uses the entire line for sorting, which given the urls, isn’t going to give the order I want.

To the rescue, the -k switch for sort, which allows you to define which field and character offset in that field to use for sorting.

In this case, I used 1, the default field, and then character offset 74, the first character following the > of the <a> element.

Resulted in:

In full: sort -k 1.74 sort-file.txt > sorted-file.txt
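A fixed character offset only lines up when every line has the same number of characters before the title. As a hedged alternative, here is a Python sketch that sorts on the text inside the <a> element instead. It is not the method used above, and the <li> lines, URLs and titles are invented placeholders, not the real video list.

```python
import re

# Hypothetical <li> lines; URLs of differing lengths would defeat a fixed offset.
LINES = [
    '<li><a href="https://example.com/v/3">Zippers in Practice</a> - A. Author</li>',
    '<li><a href="https://example.com/video/1">Macros by Example</a> - B. Author</li>',
    '<li><a href="https://example.com/2">parsing with Transducers</a> - C. Author</li>',
]

# Capture the text between the first <a ...> and its closing </a>.
ANCHOR_TEXT = re.compile(r"<a\b[^>]*>(.*?)</a>")


def title_key(line):
    """Sort key: the link text, lower-cased; the whole line if no <a> is found."""
    match = ANCHOR_TEXT.search(line)
    return match.group(1).lower() if match else line.lower()


for line in sorted(LINES, key=title_key):
    print(line)
```

The one-line sort command stays the quicker tool when the prefixes are uniform; the regular expression only earns its keep when they are not.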

SICP [In Modern HTML]

Monday, April 18th, 2016

SICP by Andres Raba.

Poorly formatted or styled HTML for CS texts is a choice, not a necessity.

As proof I offer this new HTML5 and EPUB3 version of “Structure and Interpretation of Computer Programs” by Abelson, Sussman, and Sussman.

From the webpage:

Modern solutions such as scalable vector graphics, mathematical markup with MathML and MathJax, embedded web fonts, and syntax highlighting are used. Rudimentary scaffolding for responsive design is in place, which adapts the page for viewing on pocket devices and tablets. More tests on small screens are needed to adjust the font size and formatting, so I encourage feedback from smartphone and tablet owners.


Keeping Panama Papers Secret? Law Firms, Journalists and Privacy

Sunday, April 17th, 2016

Panama battle looms as HMRC demands leaked data is handed over to pursue tax fraudsters by Alex Hawkes.

From the post:

British tax authorities are heading for a showdown with the media groups behind the Panama Papers exposé, demanding they hand over the cache of documents.

The Government has pledged £10million for a task-force to investigate the Panama data, but it is understood the authorities have so far only managed to get hold of some of the 11.5million documents.

Revenue & Customs told The Mail on Sunday it was ‘determined’ to get hold of the leaked information to pursue criminal investigations against tax fraudsters and would ‘explore every avenue, nationally and internationally’.

It said: ‘While we appreciate that the media is not an arm of law enforcement, given the seriousness of the allegations that they have published and the calls they have made for action to be taken, we would reasonably expect them to co-operate in giving us access to the Panama data.’

The documents were leaked from Panamanian law firm Mossack Fonseca to organisations including The Guardian and the BBC, leading to headlines around the world alleging widespread tax evasion and to calls for a crackdown.

I hate to side with tax authorities anywhere, for any reason, but the Panama Papers should be publicly posted for everyone to see.

The claim that the estimated 400 or so journalists who worked on the Panama Papers have some right to withhold the papers to protect the privacy of those named therein is entirely specious.

If you credit that claim, then you also have to credit the privacy claim of Mossack Fonseca on behalf of its clients, in which case, the journalists should have deleted the files upon receipt.

The point is that secrecy of the files, whether in the hands of Mossack Fonseca or the 400 or so journalists, inures to the financial benefit of those making the claim.

Mossack Fonseca, with privacy intact, could continue to make arrangements beneficial to both its clients and to Mossack Fonseca.

If the journalists are successful in withholding the leaked files, they benefit from mining this treasure trove for whomever they deem worthy of exposure and, not incidentally, from making money for their media outlets.

All without anyone holding their reporting accountable as compared to the leaked documents.

Privacy, at least in the Panama Papers case, is about power and benefits from having that power.

So long as Suddeutsche Zeitung and others can dole out snippets and tidbits of information from the Panama Papers, they enjoy the same status as those holding, say, the Snowden leaks.

Information is power!

The journalists in question merit kudos for their hard work, but being the chance recipient of a leak should not translate into a lifelong privilege.

Or the ability to dictate unaccountable media coverage based on information the reading public can never see.

Personally I don’t trust governments, based on long experience of journalists demonstrating governments lie, cover up, etc.

Why should I suddenly turn into a gullible Gus when given unaccountable reporting from a news outlet?

If you are going to leak, leak as widely as possible!

I would say leak to Wikileaks, but they are as bad as the media about holding information back from the public for undisclosed reasons.

Where can/should people leak to assure unfettered access to all leaked materials?

I’m asking because I honestly don’t know and have no clue where to start looking.

I have exactly zero interest in empowering anyone as the censor of leaked information.

It leaks, it flows.