Archive for February, 2016

All Talk and No Buttons: The Conversational UI

Wednesday, February 24th, 2016

All Talk and No Buttons: The Conversational UI by Matty Mariansky.

From the post:

We’re witnessing an explosion of applications that no longer have a graphical user interface (GUI). They’ve actually been around for a while, but they’ve only recently started spreading into the mainstream. They are called bots, virtual assistants, invisible apps. They can run on Slack, WeChat, Facebook Messenger, plain SMS, or Amazon Echo. They can be entirely driven by artificial intelligence, or there can be a human behind the curtain.

Not to put too sharp a point on it, but they used to be called sales associates or sales clerks, if you imagine a human being behind the curtain.

Since they are no longer visible in distinctive clothing, you have the task of creating a UI that isn't quite as full-bandwidth as human-to-human proximity but is still useful.

A two-part series that will have you thinking more seriously about what a conversational UI might look like.


Visualizing the Clinton Email Network in R

Wednesday, February 24th, 2016

Visualizing the Clinton Email Network in R by Bob Rudis.

From the post:

This isn’t a post about politics. I do have opinions about the now infamous e-mail server (which will no doubt come out here), but when the WSJ folks made it possible to search the Clinton email releases I thought it would be fun to get the data into R to show how well the igraph and ggnetwork packages could work together, and also show how to use svgPanZoom to make it a bit easier to poke around the resulting hairball network.

NOTE: There are a couple “Assignment” blocks in here. My Elements of Data Science students are no doubt following the blog by now so those are meant for you 🙂 Other intrepid readers can ignore them.

A great walk-through of importing, analyzing, and visualizing any email archive, not just Hillary’s.
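Bob works in R with igraph and ggnetwork. For readers who would rather start in Python, here is a rough analog of the first step for any email archive: turning messages into a weighted sender-to-recipient edge list. It is a minimal sketch, assuming you already have raw RFC 822 message text (released archives often arrive in other formats and need converting first); the function names and sample addresses are hypothetical, and this is not Bob's code.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses


def email_edges(raw_messages):
    """Count directed sender -> recipient edges across raw messages."""
    edges = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        senders = [addr.lower() for _, addr in getaddresses(msg.get_all("From", []))]
        # To and Cc recipients both count as edges from each sender.
        recipients = [addr.lower() for _, addr in
                      getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))]
        for s in senders:
            for r in recipients:
                if s and r:
                    edges[(s, r)] += 1
    return edges


def degree(edges):
    """Weighted degree (in + out) per address."""
    deg = Counter()
    for (s, r), weight in edges.items():
        deg[s] += weight
        deg[r] += weight
    return deg


sample = [
    "From: Alice <alice@example.org>\nTo: bob@example.org\nCc: carol@example.org\n\nHi",
    "From: bob@example.org\nTo: Alice <alice@example.org>\n\nRe: Hi",
]
edges = email_edges(sample)
print(edges[("alice@example.org", "bob@example.org")])  # 1
print(degree(edges).most_common(1))  # [('alice@example.org', 3)]
```

Weighted degree is the crudest measure of who is "connected" to whom; it counts edges and says nothing about what the messages contain.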

You will quickly find that “…connecting the dots…” isn’t as useful as the intelligence community would have you believe.

Yes, yes! There is a call to Papa John’s! Oh, that’s not a code name, that’s a pizza place. (Even suspected terrorists have to eat.)

Great to have the dots. Great to have connections. Not so great if that is all that you have.

I found a number of other interesting posts at Bob’s blog:

Including: Dairy-free Parker House Rolls! I bake fairly often, so I am susceptible to this sort of posting. Looks very good!

Open-Source Sequence Clustering Methods Improve the State Of the Art

Wednesday, February 24th, 2016

Open-Source Sequence Clustering Methods Improve the State Of the Art by Evguenia Kopylova et al.


Sequence clustering is a common early step in amplicon-based microbial community analysis, when raw sequencing reads are clustered into operational taxonomic units (OTUs) to reduce the run time of subsequent analysis steps. Here, we evaluated the performance of recently released state-of-the-art open-source clustering software products, namely, OTUCLUST, Swarm, SUMACLUST, and SortMeRNA, against current principal options (UCLUST and USEARCH) in QIIME, hierarchical clustering methods in mothur, and USEARCH’s most recent clustering algorithm, UPARSE. All the latest open-source tools showed promising results, reporting up to 60% fewer spurious OTUs than UCLUST, indicating that the underlying clustering algorithm can vastly reduce the number of these derived OTUs. Furthermore, we observed that stringent quality filtering, such as is done in UPARSE, can cause a significant underestimation of species abundance and diversity, leading to incorrect biological results. Swarm, SUMACLUST, and SortMeRNA have been included in the QIIME 1.9.0 release.

IMPORTANCE Massive collections of next-generation sequencing data call for fast, accurate, and easily accessible bioinformatics algorithms to perform sequence clustering. A comprehensive benchmark is presented, including open-source tools and the popular USEARCH suite. Simulated, mock, and environmental communities were used to analyze sensitivity, selectivity, species diversity (alpha and beta), and taxonomic composition. The results demonstrate that recent clustering algorithms can significantly improve accuracy and preserve estimated diversity without the application of aggressive filtering. Moreover, these tools are all open source, apply multiple levels of multithreading, and scale to the demands of modern next-generation sequencing data, which is essential for the analysis of massive multidisciplinary studies such as the Earth Microbiome Project (EMP) (J. A. Gilbert, J. K. Jansson, and R. Knight, BMC Biol 12:69, 2014,

Bioinformatics has specialized clustering issues, but improvements in clustering algorithms are likely to have benefits for others.

Not to mention garage gene hackers, who may benefit more directly.
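For readers new to OTU picking, the basic idea behind greedy, centroid-based clustering (the general family UCLUST-style tools belong to) can be sketched in a few lines. This is a simplification under stated assumptions, not any of the benchmarked tools: reads are assumed to be of equal length, so "identity" is just the fraction of matching positions rather than an alignment score, and the 97% default is the conventional OTU threshold, not a value taken from the paper.

```python
from collections import Counter


def identity(a, b):
    """Fraction of positions where two equal-length reads agree.

    A crude stand-in for alignment-based identity; real tools align first.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length reads")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads, threshold=0.97):
    """Greedy centroid clustering of reads into OTUs.

    Unique reads are visited from most to least abundant. Each joins the
    first existing centroid it matches at >= threshold identity; otherwise
    it becomes a new centroid. Returns {centroid: [member reads]}.
    """
    clusters = {}
    for read, count in Counter(reads).most_common():
        for centroid in clusters:
            if identity(read, centroid) >= threshold:
                clusters[centroid].extend([read] * count)
                break
        else:
            clusters[read] = [read] * count
    return clusters


reads = ["ACGTACGTAC"] * 3 + ["ACGTACGTAA"] + ["TTTTTTTTTT"] * 2
# 10-base toy reads, so a looser threshold than the 97% default.
clusters = greedy_otus(reads, threshold=0.85)
print({c: len(m) for c, m in clusters.items()})  # {'ACGTACGTAC': 4, 'TTTTTTTTTT': 2}
```

Visiting the most abundant reads first means common sequences become centroids and rare near-duplicates (often sequencing errors) are absorbed into them rather than spawning OTUs of their own, which is one reason abundance sorting is a common choice.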

Import Table Into Google Spreadsheet – Worked Example – Baby Blue’s

Wednesday, February 24th, 2016

I encountered a post by Zach Klein with the title: You can automatically scrape and import any table or list from any URL into Google Spreadsheets.

As an image of his post:


Despite it having 1,844 likes and 753 retweets, I had to test it before posting it here. 😉

An old habit born of not citing anything I haven’t personally checked. It means more reading, but you get to commit your own mistakes and are not limited to the mistakes made by others.

Anyway, I thought of the HTML version of Baby Blue’s Manual of Legal Citation as an example.

After loading that URL, view the source of the page, because we want to search for table elements in the markup. Some display artifacts that look like tables are actually lists, etc.

The table I chose was #11, which appears in Baby Blue’s as:


So I opened up a blank Google Spreadsheet and entered:

=IMPORTHTML("…BabyBlue.20160205.html", "table", 11)

in the top left cell.

The results:


I’m speculating, but Google Spreadsheets appears to have choked on the character entities used around “name” in the entry for Borough court.

If you’re not fluent with XSLT or XQuery, importing tables and lists into Google Spreadsheets is an easy way to capture information.
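For comparison, here is roughly what an IMPORTHTML-style "table, N" lookup does, using only Python's standard library. It is a minimal sketch, assuming well-formed markup with explicit closing cell and row tags and no nested tables; the class and function names are hypothetical. The index is 1-based to mirror the spreadsheet call, and entities such as &ldquo; are decoded by the parser, which is where I suspect Google Spreadsheets stumbled.

```python
from html.parser import HTMLParser


class TableGrabber(HTMLParser):
    """Collect every <table> on a page as a list of rows of cell text.

    Assumes well-formed markup: explicit </td>/</th> and </tr> tags,
    and no nested tables.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decodes &amp;, &ldquo; etc.
        self.tables = []   # one list of rows per <table>, in page order
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def import_table(html, index):
    """Return the index-th table on the page (1-based, like IMPORTHTML)."""
    grabber = TableGrabber()
    grabber.feed(html)
    grabber.close()
    return grabber.tables[index - 1]


page = ("<p>intro</p>"
        "<table><tr><th>A</th><th>B</th></tr></table>"
        "<table><tr><td>x &amp; y</td><td>&ldquo;name&rdquo;</td></tr></table>")
print(import_table(page, 2))  # [['x & y', '“name”']]
```

To run it against a live page, fetch the HTML first, e.g. with urllib.request.urlopen(url).read().decode("utf-8"), and pass the result to import_table(html, 11).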

Apple Refuses to “Unlock” – False Meme – FBI Attempts To Press Gang Apple

Tuesday, February 23rd, 2016

The meme that Apple has refused to “unlock” an iPhone in the San Bernardino shooting case is demonstrably false.

There is no magic key which Apple has refused to release to the FBI. (full stop)

Every media outlet or person describing the request to Apple as a request to “unlock”:

  1. Is ignorant of the facts of the FBI request,
  2. Is deliberately spreading disinformation for the FBI,
  3. Or both.

The falseness of the “unlock” meme isn’t hard to demonstrate.

The original court order reads in part:

1. Apple shall assist in enabling the search of a cellular telephone, Apple make: iPhone 5C, Model: A1532, P/N:MGF2LL/A, S/N:FFMNQ3MTG2DJ, IMEI:358820052301412, on the Verizon Network, (the “SUBJECT DEVICE”) pursuant to a warrant of this Court by providing reasonable technical assistance to assist law enforcement agents in obtaining access to the data on the SUBJECT DEVICE.

2. Apple’s reasonable technical assistance shall accomplish the following three important functions: (1) it will bypass or disable the auto-erase function whether or not it has been enabled; (2) it will enable the FBI to submit passcodes to the SUBJECT DEVICE for testing electronically via the physical device port, Bluetooth, Wi-Fi, or other protocol available on the SUBJECT DEVICE; and (3) it will ensure that when the FBI submits passcodes to the SUBJECT DEVICE, software running on the device will not purposefully introduce any additional delay between passcode attempts beyond what is incurred by Apple hardware.

3. Apple’s reasonable technical assistance may include, but is not limited to: providing the FBI with a signed iPhone Software file, recovery bundle, or other Software Image File (“SIF”) that can be loaded onto the SUBJECT DEVICE. The SIF will load and run from Random Access Memory (“RAM”) and will not modify the iOS on the actual phone, the user data partition or system partition on the device’s flash memory. The SIF will be coded by Apple with a unique identifier of the phone so that the SIF would only load and execute on the SUBJECT DEVICE. The SIF will be loaded via Device Firmware Upgrade (“DFU”) mode, recovery mode, or other applicable mode available to the FBI. Once active on the SUBJECT DEVICE, the SIF will accomplish the three functions specified in paragraph 2. The SIF will be loaded on the SUBJECT DEVICE at either a government facility, or alternatively, at an Apple facility; if the latter, Apple shall provide the government with remote access to the SUBJECT DEVICE through a computer allowing the government to conduct passcode recovery analysis.

4. If Apple determines that it can achieve the three functions stated above in paragraph 2, as well as the functionality set forth in paragraph 3, using alternative technological means from that recommended by the government, and the government concurs, Apple may comply with this Order in that way.

Does that sound like “unlock” to you?
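What paragraph 2 asks for is the ability to guess passcodes quickly, and simple arithmetic shows why the auto-erase and delay provisions matter: with auto-erase enabled, iOS erases the device after ten failed attempts, so a search would stop there. A rough sketch, assuming a numeric passcode and a hardware cost of about 80 ms per attempt (the figure Apple's iOS security documentation gave for passcode key derivation at the time; treat it as an assumption).

```python
def worst_case_seconds(digits, per_attempt_ms, extra_delay_ms=0):
    """Seconds needed to try every numeric passcode of the given length.

    per_attempt_ms: cost of one guess imposed by the hardware (an assumption).
    extra_delay_ms: any further delay software adds between guesses.
    """
    attempts = 10 ** digits
    return attempts * (per_attempt_ms + extra_delay_ms) / 1000


for digits in (4, 6):
    secs = worst_case_seconds(digits, per_attempt_ms=80)
    print(f"{digits}-digit passcode: {secs:,.0f} s (about {secs / 3600:.1f} h)")
```

At 80 ms per attempt, trying every 4-digit code takes 800 seconds and every 6-digit code 80,000 seconds (a little over 22 hours). Add even a one-minute software delay per attempt (extra_delay_ms=60_000) and the 4-digit search grows to 600,800 seconds, roughly a week.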

Yet media outlets as diverse as NPR, the New York Times and the Pew Foundation have all repeated the false “unlock” meme, as have many others.

The court order is an attempt to force Apple to undertake a custom programming project at the behest of the government.

Do you think the government can whistle up Apple, IBM or Microsoft, or even you, for a custom programming job?

Whether you want to participate or not?

That’s what FBI Director Comey and his media allies want to hide under the false “unlock” meme.

Spread the true meme – this is slavery on the software seas and should be denounced as such.

The freedom you save may well be your own.

Anti-Encryption National Commission News 24 February 2016

Tuesday, February 23rd, 2016

Shedding Light on ‘Going Dark’: Practical Approaches to the Encryption Challenge.

WHEN: Wednesday, February 24, 2016 12:00 p.m. to 1:30 p.m. ET
WHERE: Bipartisan Policy Center, 1225 Eye Street NW, Suite 1000, Washington, DC, 20005


From the post:

The spate of terrorist attacks last year, especially those in Paris and San Bernardino, raised the specter of terrorists using secure digital communications to evade intelligence and law enforcement agencies and, in the words of FBI Director James Comey, “go dark.” The same technologies that companies use to keep Americans safe when they shop online and communicate with their friends and family on the Internet are the same technologies that terrorists and criminals exploit to disguise their illicit activity.

In response to this challenge, House Homeland Security Committee Chairman Michael McCaul (R-TX) and Sen. Mark Warner (D-VA), a member of the Senate Intelligence Committee, have proposed a national commission on security and technology challenges in the digital age. The commission would bring together experts who understand the complexity and the stakes to develop viable recommendations on how to balance competing digital security priorities.

Please join the Bipartisan Policy Center on February 24 for a conversation with the two lawmakers as they roll out their legislation creating the McCaul-Warner Digital Security Commission followed by a panel discussion highlighting the need to take action on this critical issue.

Ironically, I won’t be able to watch the live streaming of this event because:

The video you are trying to watch is using the HTTP Live Streaming protocol which is only support in iOS devices.

Any discussion of privacy or First Amendment rights must begin without the presumption that any balancing or trade-off is necessary.

While it is true that some trade-offs have been made in the past, the question that should begin the anti-encryption discussion is whether terrorism is anything more than a fictional threat.

Since 9/11, it has been 5278 days without a terrorist being sighted at a U.S. airport.
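The day count is easy to check. Counting from September 11, 2001 to the date of this post with Python's datetime module (the two dates are the only inputs):

```python
from datetime import date


def days_since(start, end):
    """Whole days elapsed between two calendar dates."""
    return (end - start).days


print(days_since(date(2001, 9, 11), date(2016, 2, 23)))  # 5278
print(days_since(date(2001, 9, 11), date(2016, 2, 22)))  # 5277, one day earlier
```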

One explanation for that number is that the number of terrorists in the United States is extremely small.

The FBI routinely takes advantage of people suffering from mental illness to create terrorist “threats,” which the FBI then eliminates. So those arrests should be removed from any count of “terrorists” in our midst.

Before any discussion of “balancing” takes place, challenge the need for balancing at all.

PS: Find someone with an unhacked iOS device on which to watch this presentation.

I first saw this in a post by Cory Doctorow, U.S. lawmakers expected to introduce major encryption bill.

OrientDB 2.2 Beta – But Is It FBI Upgrade Secure?

Monday, February 22nd, 2016

OrientDB 2.2 Beta

From the post:

Our engineers have been working hard to make OrientDB even better and we’re now ready to share it with you! We’re pleased to announce OrientDB 2.2 Beta. We’re really proud of this release as it includes multiple enhancements in both the Community Edition and the Enterprise Edition. Please note that this is still a beta release, so the software and documentation still have some rough edges to iron out.

We’ve focused on five main themes for this release: Security, Operations, Performance, APIs and Teleporter.

Security for us is paramount as customers are storing their valuable, critical and confidential data in OrientDB. With improved auditing and authentication, password salt and data-at-rest encryption, OrientDB is now one of the most secure DBMSs ever.


Our users have been asking for incremental backup, so we delivered. Now it’s ready to be tested! That’s not the only addition to version 2.2, as the new OrientDB Studio now replaces the workbench, adding a new architecture and new modules.


We’ve done multiple optimizations to the core engine and, in many cases, performance increased tenfold. Our distributed capabilities are also constantly improving and we’ve introduced fast-resync of nodes. This release supports the new configurable Graph Consistency to speed up change operations against graphs.


“SQL is the English of databases” and we’re constantly improving our SQL access layer to simplify graph operations. New additions include Pattern Matching, Command Cache, Automated Parallel Queries and Live Query graduating from experimental to stable. OrientJS, the official Node driver, now supports native unmarshalling.


You may have heard rumors about a new, easy way to convert your relational databases to OrientDB. It’s time to formally release Teleporter: a new tool to sync with other databases and simplify migrations to OrientDB.
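One of the features listed above, password salt, is a standard technique worth seeing in miniature. A generic sketch using PBKDF2 from Python's hashlib, not OrientDB's implementation (the announcement does not give its algorithm, salt length or iteration count; the values below are illustrative assumptions):

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not OrientDB's setting


def hash_password(password, salt=None):
    """Derive a key from a password with PBKDF2-HMAC-SHA256.

    A fresh 16-byte random salt is generated unless one is supplied;
    store the salt next to the derived key.
    """
    if salt is None:
        salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password, salt, expected_key):
    """Recompute the key with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)


salt, key = hash_password("correct horse")
print(verify_password("correct horse", salt, key))  # True
print(verify_password("wrong guess", salt, key))    # False
```

Because each password gets its own random salt, two users with the same password end up with different stored keys, and precomputed tables of hashes are useless against them.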

With the FBI attempting to enslave Apple to breach the security of all iPhones, I have to ask: are the security features of OrientDB FBI-upgrade secure?

The same question applies to other software as well. OrientDB happens to be the first software release I have seen since the FBI decided it can press gang software vendors to subvert their own software security.

Thoughts on how to protect systems from upgrade removal of security measures?

Unlike the fictional terrorists being pursued by the FBI (ask the TSA: not one terrorist has been seen at a U.S. airport in the 5277 days since 9/11), the FBI poses a clear and present danger to the American public.

Tim Cook (Apple) and others should stop conceding the terrorist issue to the FBI. It just encourages their delusion to the detriment of us all.

Physics, Topology, Logic and Computation: A Rosetta Stone

Monday, February 22nd, 2016

Physics, Topology, Logic and Computation: A Rosetta Stone by John C. Baez and Mike Stay.


In physics, Feynman diagrams are used to reason about quantum processes. In the 1980s, it became clear that underlying these diagrams is a powerful analogy between quantum physics and topology. Namely, a linear operator behaves very much like a ‘cobordism’: a manifold representing spacetime, going between two manifolds representing space. This led to a burst of work on topological quantum field theory and ‘quantum topology’. But this was just the beginning: similar diagrams can be used to reason about logic, where they represent proofs, and computation, where they represent programs. With the rise of interest in quantum cryptography and quantum computation, it became clear that there is an extensive network of analogies between physics, topology, logic and computation. In this expository paper, we make some of these analogies precise using the concept of ‘closed symmetric monoidal category’. We assume no prior knowledge of category theory, proof theory or computer science.

While this is an “expository” paper, at some 66 pages (sans the references), you had best set aside some of your best thinking/reading time to benefit from it.
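As a taste of the analogy: in a closed monoidal category, morphisms X ⊗ Y → Z correspond one-to-one with morphisms X → (Y ⊸ Z), where Y ⊸ Z is the internal hom, and on the computation side that correspondence is currying. A small Python illustration, with plain functions standing in for morphisms and a pair of arguments standing in for ⊗ (a loose reading for intuition, not the paper's formalism):

```python
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def curry(f: Callable[[A, B], C]) -> Callable[[A], Callable[[B], C]]:
    """From a function of two inputs (A, B) -> C to A -> (B -> C)."""
    return lambda a: lambda b: f(a, b)


def uncurry(g: Callable[[A], Callable[[B], C]]) -> Callable[[A, B], C]:
    """The inverse direction: A -> (B -> C) back to (A, B) -> C."""
    return lambda a, b: g(a)(b)


def add(x: int, y: int) -> int:
    return x + y


add3 = curry(add)(3)              # a function still waiting for y
print(add3(4))                    # 7
print(uncurry(curry(add))(3, 4))  # 7: the round trip recovers add
```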


U.S. Patents Requirements: Novel/Non-Obvious or Patent Fee?

Monday, February 22nd, 2016

IBM brags about its ranking in patents granted, IBM First in Patents for 23rd Consecutive Year, and is particularly proud of patent 9087304, saying:

We’ve all been served up search results we weren’t sure about, whether they were for “the best tacos in town” or “how to tell if your dog has eaten chocolate.” With IBM Patent no. 9087304, you no longer have to second-guess the answers you’re given. This new tech helps cognitive machines find the best potential answers to your questions by thinking critically about the trustworthiness and accuracy of each source. Simply put, these machines can use their own judgment to separate the right information from wrong. (From:

Did you notice that the “First in Patents for 23rd Consecutive Year” post did not have a single link to any of the patents mentioned?

You would think IBM would be proud enough to link to its new patents, especially 9087304, the one that “…separate[s] right information from wrong.”

But if you follow the link for 9087304, you get a sense of one reason IBM didn’t include the link.

The abstract for 9087304 reads:

Method, computer program product, and system to perform an operation for a deep question answering system. The operation begins by computing a concept score for a first concept in a first case received by the deep question answering system, the concept score being based on a machine learning concept model for the first concept. The operation then excludes the first concept from consideration when analyzing a candidate answer and an item of supporting evidence to generate a response to the first case upon determining that the concept score does not exceed a predefined concept minimum weight threshold. The operation then increases a weight applied to the first concept when analyzing the candidate answer and the item of supporting evidence to generate the response to the first case when the concept score exceeds a predefined maximum weight threshold.

I will spare you further recitations from the patent.
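Stripped of the claim language, the operation in that abstract comes down to two threshold comparisons. A sketch of one reading of the abstract (not the claims), assuming the concept score has already been computed by some model; the thresholds and the boost factor are hypothetical, since the abstract does not say by how much the weight increases:

```python
from typing import Optional


def concept_weight(score: float, weight: float, min_threshold: float,
                   max_threshold: float, boost: float) -> Optional[float]:
    """One reading of the abstract of U.S. Patent 9087304.

    Returns None if the concept is excluded from consideration, otherwise
    the weight to apply when scoring candidate answers and evidence.
    """
    if score <= min_threshold:   # score "does not exceed" the minimum: exclude
        return None
    if score > max_threshold:    # score "exceeds" the maximum: increase weight
        return weight * boost
    return weight                # in between: leave the weight unchanged


for s in (0.1, 0.5, 0.9):
    print(s, concept_weight(s, weight=1.0, min_threshold=0.2,
                            max_threshold=0.8, boost=2.0))
```

With those made-up values, a score of 0.1 drops the concept, 0.5 leaves its weight at 1.0, and 0.9 doubles it.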

Show of hands: do U.S. patents always require:

  1. novel/non-obvious ideas
  2. patent fee
  3. #2 but not #1


Judge rankings by number of patents granted accordingly.

Photo-Reconnaissance For Your Revolution

Sunday, February 21st, 2016

Using Computer Vision to Analyze Aerial Big Data from UAVs During Disasters by Patrick Meier.

From the post:

Recent scientific research has shown that aerial imagery captured during a single 20-minute UAV flight can take more than half-a-day to analyze. We flew several dozen flights during the World Bank’s humanitarian UAV mission in response to Cyclone Pam earlier this year. The imagery we captured would’ve taken a single expert analyst a minimum 20 full-time workdays to make sense of. In other words, aerial imagery is already a Big Data problem. So my team and I are using human computing (crowdsourcing), machine computing (artificial intelligence) and computer vision to make sense of this new Big Data source.
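The crowdsourcing half of that approach generally means several volunteers label the same piece of imagery and their answers are combined. A minimal sketch of majority-vote aggregation with a consensus threshold; the tile ids, labels and thresholds are hypothetical, and this is not the pipeline Meier's team used:

```python
from collections import Counter


def aggregate_labels(votes, min_votes=3, min_agreement=0.6):
    """Combine volunteer labels per image tile by majority vote.

    votes: {tile_id: [label, label, ...]} from different volunteers.
    Returns {tile_id: winning label}, or None where there are too few
    votes or the top label falls short of min_agreement.
    """
    results = {}
    for tile, labels in votes.items():
        if len(labels) < min_votes:
            results[tile] = None
            continue
        label, count = Counter(labels).most_common(1)[0]
        results[tile] = label if count / len(labels) >= min_agreement else None
    return results


votes = {
    "tile-001": ["damaged", "damaged", "intact"],
    "tile-002": ["damaged", "intact", "unclear"],
    "tile-003": ["intact"],
}
print(aggregate_labels(votes))
# {'tile-001': 'damaged', 'tile-002': None, 'tile-003': None}
```

Tiles without consensus (None) are the natural ones to route to an expert analyst or a second round of volunteers.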

Revolutionaries are chronically understaffed, so Meier’s advice for natural disasters is equally applicable to disasters known as governments.

Imagine the Chicago police riot, Watts, or the Rodney King riot if NGO leadership had had real-time data on government forces.

Meier’s book, Digital Humanitarians, is a good advocacy book for the use of technology during “disasters.” It is written for non-specialists, so you will have to look to other resources to build up your technical infrastructure.

PS: With the advent of cheap drones, imagine stitching together images from multiple drones with overlapping coverage. That could provide better real-time combat intelligence than more expensive options.

I first saw this in a tweet by Kirk Borne.

Satellites in Global Development [How Do You Verify Satellite Images?]

Sunday, February 21st, 2016

Satellites in Global Development

From the webpage:

We have better systems to capture, analyze, and distribute data about the earth. This is fundamentally improving, and creating, opportunities for impact in global development.

This is an exploratory overview of current and upcoming sources of data, processing pipelines and data products. It is aimed to offer non GIS experts an exploration of the unfolding revolution of earth observation, with an emphasis on development. See footer for license and contributors.

A great overview of Earth satellite data for the non-specialist.

The impressive imagery at 0.31 m resolution calls to mind the danger of relying on such data without confirmation.

The image of Fortaleza “shows” (at 0.31 m) what appears to be a white car parked near the intersection of two highways. What if instead of a white car that was a mobile missile launch platform? It’s not much bigger than a car, so it would show up on this image.
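To put 0.31 m resolution in perspective, here is a quick calculation of how many pixels a car-sized object covers. The car dimensions (about 4.5 m by 1.8 m) are an assumption for illustration:

```python
def footprint_pixels(length_m, width_m, gsd_m):
    """Approximate pixel extent of an object at a given ground sample distance."""
    return length_m / gsd_m, width_m / gsd_m


# Assumed dimensions of a typical passenger car: about 4.5 m x 1.8 m.
px_long, px_wide = footprint_pixels(4.5, 1.8, 0.31)
print(f"~{px_long:.0f} x {px_wide:.0f} pixels, ~{px_long * px_wide:.0f} pixels in all")
# ~15 x 6 pixels, ~84 pixels in all
```

Roughly 84 pixels is enough to see that something is there, not to say with confidence what it is.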

Would you target that location based on that information alone?

Or consider the counter-case: what reassurance do you have that the object reported to you on the image as a white car at that intersection is not, in fact, a mobile missile launcher?

Or, in either case, what if the image shows an inflatable object placed there to deceive remote imaging applications?

As with all data, satellite data is presented to you for a reason.

A reason that may or may not align with your goals and purposes.

I first saw this in a tweet by Kirk Borne.

FBI Must Reveal Its Hack – Maybe

Sunday, February 21st, 2016

Judge Rules FBI Must Reveal Malware It Used to Hack Over 1,000 Computers by Joseph Cox.

From the post:

On Wednesday, a judge ruled that defense lawyers in an FBI child pornography case must be provided with all of the code used to hack their client’s computer.

When asked whether the code would include the exploit used to bypass the security features of the Tor Browser, Colin Fieman, a federal public defender working on the case, told Motherboard in an email, simply, “Everything.”

“The declaration from our code expert was quite specific and comprehensive, and the order encompasses everything he identified,” he continued.

Fieman is defending Jay Michaud, a Vancouver public schools administration worker. Michaud was arrested after the FBI seized ‘Playpen’, a highly popular child pornography site on the dark web, and then deployed a network investigative technique (NIT)—the agency’s term for a hacking tool.

This NIT grabbed suspects’ real IP address, MAC address, and pieces of other technical information, and sent them to a government controlled server.

The case has drawn widespread attention from civil liberties activists because, from all accounts, one warrant was used to hack the computers of unknown suspects all over the world. On top of this, the defense has argued that because the FBI kept the dark web site running in order to deploy the NIT, that the agency, in effect, distributed child pornography. Last month, a judge ruled that the FBI’s actions did not constitute “outrageous conduct.”

If that sounds like a victory for those trying to protect users from government overreach, consider the Department of Justice response to questions about the ruling:

“The court has granted the defense’s third motion to compel, subject to the terms of the protective order currently in place,” Carr wrote to Motherboard in an email.

I’m just guessing, but I suspect “…the terms of the protective order currently in place,…” means that post-arrest the public may find out about the FBI hack but not before.

bAbI – Facebook Datasets For Automatic Text Understanding And Reasoning

Sunday, February 21st, 2016

The bAbI project

Four papers and datasets on text understanding and reasoning from Facebook.

Jason Weston, Antoine Bordes, Sumit Chopra, Alexander M. Rush, Bart van Merriënboer, Armand Joulin and Tomas Mikolov. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks. arXiv:1502.05698.

Felix Hill, Antoine Bordes, Sumit Chopra and Jason Weston. The Goldilocks Principle: Reading Children’s Books with Explicit Memory Representations. arXiv:1511.02301.

Jesse Dodge, Andreea Gane, Xiang Zhang, Antoine Bordes, Sumit Chopra, Alexander Miller, Arthur Szlam, Jason Weston. Evaluating Prerequisite Qualities for Learning End-to-End Dialog Systems. arXiv:1511.06931.

Antoine Bordes, Nicolas Usunier, Sumit Chopra and Jason Weston. Large-scale Simple Question Answering with Memory Networks. arXiv:1506.02075.
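The first paper's 20 toy tasks ship as plain text, and a small parser is the usual first step. A sketch, assuming the layout as I understand it: each line begins with a line number that resets to 1 when a new story starts, and question lines carry a tab-separated answer and supporting-fact line numbers. Check it against the files you download; the other datasets (children's books, dialog) use different layouts.

```python
def parse_babi(lines):
    """Parse bAbI toy-task lines into (story, question, answer, support) tuples.

    Assumed layout: "<n> <sentence>" for story lines and
    "<n> <question>\t<answer>\t<supporting line numbers>" for questions,
    with n restarting at 1 at the start of each new story.
    """
    examples = []
    story = {}  # line number -> sentence, for the story being read
    for raw in lines:
        raw = raw.rstrip("\n")
        if not raw.strip():
            continue
        num_str, text = raw.split(" ", 1)
        num = int(num_str)
        if num == 1:
            story = {}  # a new story begins
        if "\t" in text:
            question, answer, support = text.split("\t")
            context = [story[k] for k in sorted(story)]
            examples.append((context, question.strip(), answer,
                             [int(s) for s in support.split()]))
        else:
            story[num] = text
    return examples


sample = [
    "1 Mary moved to the bathroom.\n",
    "2 John went to the hallway.\n",
    "3 Where is Mary? \tbathroom\t1\n",
    "1 Sandra went to the garden.\n",
    "2 Where is Sandra? \tgarden\t1\n",
]
for example in parse_babi(sample):
    print(example)
```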


Streaming 101 & 102 – [Stream Processing with Batch Identities?]

Sunday, February 21st, 2016

The world beyond batch: Streaming 101 by Tyler Akidau.

From part 1:

Streaming data processing is a big deal in big data these days, and for good reasons. Amongst them:

  • Businesses crave ever more timely data, and switching to streaming is a good way to achieve lower latency.
  • The massive, unbounded data sets that are increasingly common in modern business are more easily tamed using a system designed for such never-ending volumes of data.
  • Processing data as they arrive spreads workloads out more evenly over time, yielding more consistent and predictable consumption of resources.

Despite this business-driven surge of interest in streaming, the majority of streaming systems in existence remain relatively immature compared to their batch brethren, which has resulted in a lot of exciting, active development in the space recently.

Since I have quite a bit to cover, I’ll be splitting this across two separate posts:

  1. Streaming 101: This first post will cover some basic background information and clarify some terminology before diving into details about time domains and a high-level overview of common approaches to data processing, both batch and streaming.
  2. The Dataflow Model: The second post will consist primarily of a whirlwind tour of the unified batch + streaming model used by Cloud Dataflow, facilitated by a concrete example applied across a diverse set of use cases. After that, I’ll conclude with a brief semantic comparison of existing batch and streaming systems.

The world beyond batch: Streaming 102

In this post, I want to focus further on the data-processing patterns from last time, but in more detail, and within the context of concrete examples. The arc of this post will traverse two major sections:

  • Streaming 101 Redux: A brief stroll back through the concepts introduced in Streaming 101, with the addition of a running example to highlight the points being made.
  • Streaming 102: The companion piece to Streaming 101, detailing additional concepts that are important when dealing with unbounded data, with continued use of the concrete example as a vehicle for explaining them.

By the time we’re finished, we’ll have covered what I consider to be the core set of principles and concepts required for robust out-of-order data processing; these are the tools for reasoning about time that truly get you beyond classic batch processing.
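The core move, assigning each record to a window by when it happened rather than when it arrived, fits in a few lines of plain Python. This is a sketch of fixed event-time windows only, not the Dataflow/Beam API, and it leaves out watermarks and triggers (deciding when a window is complete enough to emit), which is much of what Streaming 102 covers; the events are hypothetical (event time in integer seconds, value) pairs:

```python
from collections import defaultdict


def fixed_window_sums(events, window_size):
    """Sum values per fixed event-time window.

    events: (event_time, value) pairs in arrival order, which need not
    match event-time order. Each event goes to the window
    [start, start + window_size) that contains its event time.
    """
    sums = defaultdict(int)
    for event_time, value in events:
        start = event_time - (event_time % window_size)
        sums[start] += value
    return dict(sorted(sums.items()))


# Arrival order: the event stamped 58 shows up after the one stamped 63.
events = [(5, 1), (63, 2), (58, 3), (121, 4), (61, 5)]
print(fixed_window_sums(events, 60))  # {0: 4, 60: 7, 120: 4}
```

The record stamped 58 arrives late but still lands in the window starting at 0, which is the out-of-order handling a processing-time approach would get wrong.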

You should also catch the paper by Tyler and others, The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing.

Cloud Dataflow, known as Beam at the Apache incubator, offers a variety of operations for combining and/or merging collections of values in data.

I mention that because I would hate to hear of you doing stream processing with batch identities. You know, where you decide on some fixed set of terms and those are applied across dynamic data.

Hmmm, fixed terms applied to dynamic data. Doesn’t even sound right does it?

Sometimes, fixed terms (read schema, ontology) are fine, but in linguistically diverse environments (read real life), they aren’t always adequate.

Enjoy the benefits of stream processing but don’t artificially limit them with batch identities.

I first saw this in a tweet by Bob DuCharme.

Saturday Night And You Ain’t Got Nobody?

Saturday, February 20th, 2016

If you are spending Saturday night alone, take a look at More IoT insecurity: The surveillance camera that anyone can log into by Paul Ducklin.

It won’t help your social life, but you are likely to see people who do have social lives.

The message here should be clear:

Your security is your responsibility, no one else’s.

No Patriotic Senators, Senate Staffers, Agency Heads – CIA Torture Report

Saturday, February 20th, 2016

I highly recommend your reading The CIA torture report belongs to the public by Lauren Harper.

I have quoted from Lauren’s introduction below as an inducement to read the article in full, but she fails to explore why a patriotic senator, staffer or agency head has not already leaked the CIA Torture Report.

It has already been printed, bound, etc., and who knows how many people were involved in every step of that process.

Do you seriously believe that report has gone unread except for its authors?

So far as I know, members of Congress, that “other” branch of the government, are entitled to make their own decisions about the handling of their reports.

What America needs now is a Senator or even a Senate staffer with more loyalty to the USA than to the bed wetters and torturers (same group) in the DoJ.

If you remember a part of the constitution that grants the DoJ the role of censor for the American public, please point it out in comments below.

From the post:

The American public’s ability to read the Senate Intelligence Committee’s full, scathing report on the Central Intelligence Agency’s torture program is in danger because David Ferriero, the archivist of the United States, will not call the report what it is, a federal record. He is refusing to use his clear statutory authority to label the report a federal record, which would be subject to Freedom of Information Act (FOIA) disclosure requirements, because the Justice Department has told the National Archives and Records Administration (NARA) not to. The DOJ has a long history of breaking the law to avoid releasing information in response to FOIA requests. The NARA does not have such a legacy and should not allow itself to be bullied by the DOJ.

The DOJ instructed the NARA not to make any determination on the torture report’s status as a federal record, ostensibly because it would jeopardize the government’s position in a FOIA lawsuit seeking the report’s release. The DOJ, however, has no right to tell the NARA not to weigh in on the record’s status, and the Presidential and Federal Records Act Amendments of 2014 gives the archivist of the United States the binding legal authority to make precisely that determination.

Democratic Sens. Patrick Leahy of Vermont and Dianne Feinstein of California revealed the DOJ’s insistence that the archivist of the United States not faithfully fulfill his duty in a Nov. 5, 2015, letter to Attorney General Loretta Lynch. They protested the DOJ’s refusal to allow its officials as well as those of the Defense Department, the CIA and the State Department to read the report. Leahy and Feinstein’s letter notes that “personnel at the National Archives and Records Administration have stated that, based on guidance from the Department of Justice, they will not respond to questions about whether the study constitutes a federal record under the Federal Records Act because the FOIA case is pending.” Rather than try to win the FOIA case on a technicality and step on the NARA’s statutory toes, the DOJ should allow the FOIA review process to determine on the case’s merits whether the document may be released.

Not even officials with security clearances may read the document while its status as a congressional or federal record is debated. The New York Times reported in November 2015 that in December of the previous year, a Senate staffer delivered envelopes containing the 6,700-page top secret report to the DOJ, the State Department, the Federal Bureau of Investigation and the Pentagon. Yet a year later, none of the envelopes had been opened, and none of the country’s top officials had read the report’s complete findings. This is because the DOJ, the Times wrote, “prohibited officials from the government agencies that possess it from even opening the report, effectively keeping the people in charge of America’s counterterrorism future from reading about its past.” The DOJ contends that if any agency officials read the report, it could alter the outcome of the FOIA lawsuit.

American war criminals who are identified or who can be discovered because of the CIA Torture Report should be prosecuted to the full extent of national and international law.

Anyone who has participated in attempts to conceal those events or to prevent disclosure of the CIA Torture Report should be tried as an accomplice after the fact to those war crimes.

The facility at Guantanamo Bay can be converted into a holding facility for DoJ staffers who tried to conceal war crimes. Poetic justice I would say.

Cybersecurity and Business ROI

Saturday, February 20th, 2016

Cybersecurity is slowing down my business, say majority of chief execs by Kat Hall.

From the post:

Cisco Live Chief execs polled in a major survey have little time for their cybersecurity folk and believe complying with security regulations hampers business.

Some 71 per cent of 1,000 top bosses surveyed by Cisco feel that efforts to shore up IT defences slows the pace of commerce. The study is due to be published next month.

Big cheeses cheesed off with security staff getting in the way of profit may well rid themselves of their troublesome priests, though: Craig Williams, senior technical leader at Cisco’s security biz Talos, believes quite a few bods working in computer security will not be in the sector in the next five years.

The profit motive is responsible for vulnerable software. Fittingly, the profit motive is also responsible for the lack of effective efforts to protect against vulnerable software.

Does it seem odd that the business community weighs cybersecurity, both original software vulnerabilities and efforts to guard against them, on the balance sheet of profit and loss?

That is, even though data breaches can and do occur, if they are reasonable in scope and cost, it is easier to simply roll on and keep making a profit.

If you think about it, only the government and the uninformed (are those different groups?) think cybersecurity should be free and that it should never fail.

Neither one of those is the case nor will they ever be the case.

Security is always a question of how much security you can afford, and for what purpose.

At the next report of a data breach, ask: how do the costs of the breach compare to the cost of preventing it?

And costs to whom? If a business suffers a data breach but the primary cost falls on its customers, how does ROI work in that situation for the business? Or for the consumer? Am I going to move because the State of Georgia suffers data breaches?

I don’t recall that question ever being asked. Do you?

OMOP Common Data Model V5.0

Friday, February 19th, 2016

OMOP Common Data Model V5.0

From the webpage:

The Observational Medical Outcomes Partnership (OMOP) was a public-private partnership established to inform the appropriate use of observational healthcare databases for studying the effects of medical products. Over the course of the 5-year project and through its community of researchers from industry, government, and academia, OMOP successfully achieved its aims to:

  1. Conduct methodological research to empirically evaluate the performance of various analytical methods on their ability to identify true associations and avoid false findings
  2. Develop tools and capabilities for transforming, characterizing, and analyzing disparate data sources across the health care delivery spectrum, and
  3. Establish a shared resource so that the broader research community can collaboratively advance the science.

The results of OMOP's research has been widely published and presented at scientific conferences, including annual symposia.

The OMOP Legacy continues…

The community is actively using the OMOP Common Data Model for their various research purposes. Those tools will continue to be maintained and supported, and information about this work is available in the public domain.

The OMOP Research Lab, a central computing resource developed to facilitate methodological research, has been transitioned to the Reagan-Udall Foundation for the FDA under the Innovation in Medical Evidence Development and Surveillance (IMEDS) Program, and has been re-branded as the IMEDS Lab. Learn more at

Observational Health Data Sciences and Informatics (OHDSI) has been established as a multi-stakeholder, interdisciplinary collaborative to create open-source solutions that bring out the value of observational health data through large-scale analytics. The OHDSI collaborative includes all of the original OMOP research investigators, and will develop its tools using the OMOP Common Data Model. Learn more at

The OMOP Common Data Model will continue to be an open-source, community standard for observational healthcare data. The model specifications and associated work products will be placed in the public domain, and the entire research community is encouraged to use these tools to support everybody's own research activities.

One of the many data models that will no doubt be in play as work begins on searching for a common cancer research language.

Every data model has a constituency; the trick is to find two or more where cross-mapping has semantic and, hopefully, financial ROI.

I first saw this in a tweet by Christophe Lalanne.

How I build up a ggplot2 figure [Class Response To ggplot2 criticism]

Friday, February 19th, 2016

How I build up a ggplot2 figure by John Muschelli.

From the post:

Recently, Jeff Leek at Simply Statistics discussed why he does not use ggplot2. He notes “The bottom line is for production graphics, any system requires work.” and describes a default plot that needs some work:

John responds to perceived issues with using ggplot2 by walking through each issue and providing you with examples of how to solve it.

That doesn’t mean that you will switch to ggplot2, but it does mean you will be better informed of your options.

An example to be copied!

Motion Forcing Apple to comply with FBI [How Does Baby Blue’s Make Law More Accessible?]

Friday, February 19th, 2016

The DoJ is trying to force Apple to comply with FBI by Nicole Lee.

I mention this because Nicole includes a link to: Case 5:16-cm-00010-SP Document 1 Filed 02/19/16 Page 1 of 35 Page ID #:1, which is the GOVERNMENT’S MOTION TO COMPEL APPLE INC. TO COMPLY WITH THIS COURT’S FEBRUARY 16, 2016 ORDER COMPELLING ASSISTANCE IN SEARCH; EXHIBIT.

Whatever the Justice Department wants to contend to the contrary, a hearing date of March 22, 2016 on this motion is ample evidence that the government has no “urgent need” for information, if any, on the cell phone in question. The government’s desire to waste more hours and resources on dead suspects is quixotic at best.

Now that Baby Blue’s Manual of Legal Citation (Baby Blue’s) is online and legal citations are no longer captives of the Bluebook® gang, tell me again: how has Baby Blue’s increased public access to the law?

This is, after all, a very important public issue and the public should be able to avail itself of the primary resources.

You will find Baby Blue’s doesn’t help much in that regard.

Contrast Baby Blue’s citation style advice with adding hyperlinks to the authorities cited in the Department of Justice’s “memorandum of points and authorities:”

Federal Cases

Central Bank of Denver v. First Interstate Bank of Denver, 511 U.S. 164 (1994).

General Construction Company v. Castro, 401 F.3d 963 (9th Cir. 2005)

In re Application of the United States for an Order Directing a Provider of Communication Services to Provide Technical Assistance to the DEA, 2015 WL 5233551, at *4-5 (D.P.R. Aug. 27, 2015)

In re Application of the United States for an Order Authorizing In-Progress Trace of Wire Commc’ns over Tel. Facilities (Mountain Bell), 616 F.2d 1122 (9th Cir. 1980)

In re Application of the United States for an Order Directing X to Provide Access to Videotapes (Access to Videotapes), 2003 WL 22053105, at *3 (D. Md. Aug. 22, 2003) (unpublished)

In re Order Requiring [XXX], Inc., to Assist in the Execution of a Search Warrant Issued by This Court by Unlocking a Cellphone (In re XXX) 2014 WL 5510865, at *2 (S.D.N.Y. Oct. 31, 2014)

Konop v. Hawaiian Airlines, Inc., 302 F.3d 868 (9th Cir. 2002)

Pennsylvania Bureau of Correction v. United States Marshals Service, 474 U.S. 34 (1985)

Plum Creek Lumber Co. v. Hutton, 608 F.2d 1283 (9th Cir. 1979)

Riley v. California, 134 S. Ct. 2473 (2014) [For some unknown reason, local rules must allow varying citation styles for U.S. Supreme Court decisions.]

United States v. Catoggio, 698 F.3d 64 (2nd Cir. 2012)

United States v. Craft, 535 U.S. 274 (2002)

United States v. Fricosu, 841 F.Supp.2d 1232 (D. Co. 2012)

United States v. Hall, 583 F. Supp. 717 (E.D. Va. 1984)

United States v. Li, 55 F.3d 325, 329 (7th Cir. 1995)

United States v. Navarro, No. 13-CR-5525, ECF No. 39 (W.D. Wa. Nov. 13, 2013)

United States v. New York Telephone Co., 434 U.S. 159 (1977)

Federal Statutes

18 U.S.C. 2510

18 U.S.C. 3103

28 U.S.C. 1651

47 U.S.C. 1001

47 U.S.C. 1002

First, I didn’t format a single one of these citations. I copied them “as is” into a search engine, so Baby Blue’s played no role in those searches.

Second, I added hyperlinks to a variety of sources for both the case law and statutes to make the point that one citation can resolve to a multitude of places.

Some places are commercial and have extra features while others are non-commercial and may have fewer features.

If instead of individual hyperlinks, I had a nexus for each case, perhaps using its citation as its public name, then I could attach pointers to a multitude of resources that all offer the same case or statute.

If you have Westlaw, LexisNexis or some other commercial vendor, you could choose to find the citation there. If you prefer non-commercial access to the same material, you could choose one of those access methods.

That nexus is what we call a topic in topic maps (“proxy” in the TMRM) and it would save every user, commercial or non-commercial, the sifting of search results that I performed this afternoon.

The hyperlinks I used above make some of the law more accessible but not as accessible as it could be.

Creating a nexus/topic/proxy for each of these citations would enable users to pick pre-formatted citations (good-bye to formatting manuals for most of us) and the law material most accessible to them.
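To make the nexus idea concrete, here is a minimal Python sketch, assuming a simple in-memory index keyed by citation. The names `CitationTopic`, `Resource` and `topic_for`, the vendor labels and the example.com URLs are all hypothetical. The normalization step only collapses whitespace; a real topic map would need much richer rules for deciding that two citation strings name the same authority.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    label: str        # name of a source offering the text (hypothetical here)
    url: str          # placeholder URL, not a real location
    commercial: bool  # True if access requires a subscription


@dataclass
class CitationTopic:
    """One nexus (topic/proxy) per authority, publicly named by its citation."""
    name: str
    resources: list = field(default_factory=list)
    formats: dict = field(default_factory=dict)  # style name -> preformatted citation

    def pick(self, commercial_ok: bool):
        """Return only the resources this user is able to use."""
        return [r for r in self.resources if commercial_ok or not r.commercial]


def normalize(citation: str) -> str:
    # Only collapses runs of whitespace; a stand-in for real identity rules.
    return " ".join(citation.split())


index = {}


def topic_for(citation: str) -> CitationTopic:
    """Return the single topic for a citation, creating it on first use."""
    key = normalize(citation)
    return index.setdefault(key, CitationTopic(name=key))


t = topic_for("United States v. New York Telephone Co., 434 U.S. 159 (1977)")
t.resources.append(Resource("Commercial vendor (hypothetical)",
                            "https://vendor.example.com/434us159", True))
t.resources.append(Resource("Free archive (hypothetical)",
                            "https://free.example.org/434/159", False))
t.formats["short"] = "434 U.S. 159"

# A trivially different spelling resolves to the same topic.
same = topic_for("United States v.  New York Telephone Co., 434 U.S. 159 (1977)")
print([r.label for r in same.pick(commercial_ok=False)])
# ['Free archive (hypothetical)']
```

Every user starts from the same topic: a subscriber calls `pick(commercial_ok=True)`, someone relying on free sources calls `pick(commercial_ok=False)`, and both can copy a preformatted citation from `formats`.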

That sounds like greater public access to the law to me.


Read the government’s “Memorandum of Points and Authorities” with a great deal of care.

For example:

The government is also aware of multiple other unpublished orders in this district and across the country compelling Apple to assist in the execution of a search warrant by accessing the data on devices running earlier versions of iOS, orders with which Apple complied.5

Be careful! Footnote 5 refers to a proceeding in the Eastern District of New York where the court sua sponte raised the issue of its authority under the All Writs Act. Footnote 5 recites no sources or evidence for the prosecutor’s claim of “…multiple other unpublished orders in this district and across the country….” None.

My impression is that the government’s argument is mostly bluster and speculation, plus repeated reminders that Apple has betrayed its customers in the past and professed puzzlement at its reluctance now. Business choices are not subject to government approval, or at least they weren’t the last time I read the U.S. Constitution.


How to find breaking news on Twitter

Friday, February 19th, 2016

How to find breaking news on Twitter by Ruben Bouwmeester, Julia Bayer, and Alastair Reid.

From the post:

By its very nature, breaking news happens unexpectedly. Simply waiting for something to start trending on Twitter is not an option for journalists – you’ll have to actively seek it out.

The most important rule is to switch perspectives with the eyewitness and ask yourself, “What would I tweet if I were an eyewitness to an accident or disaster?”

To find breaking news on Twitter you have to think like a person who’s experiencing something out of the ordinary. Eyewitnesses tend to share what they see unfiltered and directly on social media, usually by expressing their first impressions and feelings. Eyewitness media can include very raw language that reflects the shock felt as a result of the situation. These posts often include misspellings.

In this article, we’ll outline some search terms you can use in order to find breaking news. The list is not intended as exhaustive, but a starting point on which to build and refine searches on Twitter to find the latest information.

A great collection of starter search terms, but those are going to vary depending on your domain of “breaking” news.

A good illustration of the use of Twitter search operators.
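As a rough sketch of combining starter terms with search operators, the Python below assembles one query string and a search URL. The phrases are hypothetical stand-ins, not terms from the article, and the operators used (quoted phrases, `OR`, `-filter:retweets`, `lang:`, `since:`) are assumed to behave as described in Twitter's advanced search help; check them against current documentation before relying on them.

```python
from urllib.parse import quote


def build_query(phrases, exclude_retweets=True, lang=None, since=None):
    """Combine eyewitness-style phrases into a single Twitter search query."""
    # Quote multi-word phrases so they match exactly, then OR them together.
    terms = [f'"{p}"' if " " in p else p for p in phrases]
    parts = ["(" + " OR ".join(terms) + ")"]
    if exclude_retweets:
        parts.append("-filter:retweets")  # eyewitnesses post originals, not retweets
    if lang:
        parts.append(f"lang:{lang}")      # restrict to one language code
    if since:
        parts.append(f"since:{since}")    # date in YYYY-MM-DD form
    return " ".join(parts)


def search_url(query):
    """Percent-encode the query for the public search page."""
    return "https://twitter.com/search?q=" + quote(query)


# Hypothetical eyewitness phrases, for illustration only.
q = build_query(["just saw", "oh my god", "explosion"], lang="en")
print(q)
# ("just saw" OR "oh my god" OR explosion) -filter:retweets lang:en
print(search_url(q))
```

Keeping the phrase list separate from the operators makes it easy to swap in the vocabulary of your own “breaking” news domain.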

Other collections of Twitter search terms?

Law Library Blogs

Friday, February 19th, 2016

Law Library Blogs by Aaron Kirschenfeld.

A useful collection of fifty-four (54) institutional law library blogs on Feedly.

Law library blogs are one of the online resources you should be following if you are interested in legal informatics.

Bluebook® vs. Baby Blue’s (Or, Bleak House “Lite”)

Friday, February 19th, 2016

The suspense over what objections The Bluebook® A Uniform System of Citation® could have to the publication of Baby Blue’s Manual of Legal Citation ended with a whimper and not a bang on the publication of Baby Blue’s.

You may recall I have written in favor of Baby Blue’s, sight unseen, in Bloggers! Help Defend The Public Domain – Prepare To Host/Repost “Baby Blue” and Oxford Legal Citations Free, What About BlueBook?

Of course, then Baby Blue’s Manual of Legal Citation was published.

I firmly remain of the opinion that legal citations are in the public domain. Moreover, the use of legal citations is the goal of any citation originator so assertion of copyright on the same would be self-defeating, if not insane.

Having said that, Baby Blue’s Manual of Legal Citation is more of a Bleak House “Lite” than a useful re-imagining of legal citation in a modern context.

I don’t expect you to take my word for that judgment so I have prepared mappings from Bluebook® to Baby Blue’s and Baby Blue’s to Bluebook®.

Caveat 1: Baby Blue’s is still subject to revision and may, for example, tinker with its table numbering to further demonstrate its “originality,” so consider these mappings provisional and subject to change.

Caveat 2: The mappings are pointers to equivalent subject matter and not strictly equivalent content.

How closely the content of these two publications track each other is best resolved by automated comparison of the two.
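One crude way to begin such an automated comparison, sketched in Python under the assumption that both texts have already been split into named sections: score every pair with `difflib.SequenceMatcher` and keep the best match above a threshold. The section names and texts below are invented placeholders, not quotations from either manual. A character-level ratio is only a first pass; it will undercount rules that were reordered or paraphrased.

```python
from difflib import SequenceMatcher


def best_matches(sections_a, sections_b, threshold=0.5):
    """For each section of A, find the most similar section of B.

    Returns (name_in_a, best_name_in_b or None, score) tuples, where the
    score is SequenceMatcher.ratio() on lower-cased text (0.0 to 1.0).
    """
    results = []
    for name_a, text_a in sections_a.items():
        best_name, best_score = None, 0.0
        for name_b, text_b in sections_b.items():
            score = SequenceMatcher(None, text_a.lower(), text_b.lower()).ratio()
            if score > best_score:
                best_name, best_score = name_b, score
        # Report no match at all when even the best candidate is weak.
        matched = best_name if best_score >= threshold else None
        results.append((name_a, matched, round(best_score, 2)))
    return results


# Placeholder sections; real input would be the full text of each rule.
manual_a = {"A1": "italicize case names in citations",
            "A2": "abbreviate reporter names"}
manual_b = {"B7": "abbreviate the names of reporters",
            "B3": "italicize case names in citations"}

for row in best_matches(manual_a, manual_b):
    print(row)
```

Pairs that fall below the threshold are exactly the places where a human reader should compare the two texts directly.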

As general assistance, pages 68-191 (out of 198) of Baby Blue’s are in substantial accordance with pages 233-305 and 491-523 of the Bluebook®. Foreign citations, covered by pages 307-490 in the Bluebook®, merit a scant two pages, 192-193, in Baby Blue’s.

The substantive content of Baby Blue’s doesn’t begin until page 10 and continues to page 67, with tables beginning on page 68. In terms of non-table content, there are only 57 pages of material for comparison to the Bluebook®. As you can see from the mappings, the ordering of rules has been altered from the Bluebook®, no doubt as a showing of “originality.”

The public does need greater access to primary legal resources, but treating the ability to cite Tucker and Clephane (District of Columbia, 1892-1893) [Baby Blue’s page 89] on a par with the Federal Reporter [Baby Blue’s page 67] is not a step in that direction.

PS: To explore the issues and possibilities at hand, you will need a copy of The Bluebook® A Uniform System of Citation®.

Some starter questions:

  1. What assumptions underlie the rules reported in the Bluebook®?
  2. How would you measure the impact of changing the rules it reports?
  3. What technologies drove its form and organization?
  4. What modern technologies could alter its form and organization?
  5. How could modern technologies display content that uses its citations differently?

A more specific question could be: do we need 123 pages of abbreviations (Baby Blue’s) or 113 pages of abbreviations (Bluebook®) when software can display the expanded form of an abbreviation to any user, even where the text was originally written with the abbreviation?

Abbreviations are both a means of restricting access and understanding and, in part, a product of the printed page, into which we sought to squeeze as much information as possible.
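A minimal sketch of that kind of display in Python, assuming a small hand-built table: each abbreviation is left exactly as written and its expansion is shown beside it. The table covers only a few common reporters and codes and is illustrative, not complete; `annotate` is a hypothetical helper name.

```python
import re

# A small sample of reporter/code abbreviations, not a full table.
ABBREVIATIONS = {
    "U.S.": "United States Reports",
    "U.S.C.": "United States Code",
    "S. Ct.": "Supreme Court Reporter",
    "F.2d": "Federal Reporter, Second Series",
    "F.3d": "Federal Reporter, Third Series",
    "F. Supp.": "Federal Supplement",
    "WL": "Westlaw",
}

# Longest keys first so "U.S.C." is tried before "U.S."; the lookarounds
# stop a key from matching inside a longer token.
_PATTERN = re.compile(
    r"(?<![A-Za-z.])("
    + "|".join(re.escape(k) for k in sorted(ABBREVIATIONS, key=len, reverse=True))
    + r")(?![A-Za-z0-9])"
)


def annotate(citation: str) -> str:
    """Keep each abbreviation as written and append its expansion in brackets."""
    return _PATTERN.sub(
        lambda m: f"{m.group(1)} [{ABBREVIATIONS[m.group(1)]}]", citation
    )


print(annotate("United States v. New York Telephone Co., 434 U.S. 159 (1977)"))
print(annotate("18 U.S.C. 2510"))
```

The source text never changes; only the display does, so a reader who knows the abbreviations loses nothing and a reader who doesn’t no longer needs a code book.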

Should anyone raise the issue of “governance” with you in regard to the Bluebook®, they are asking for a seat at the citation rule table for themselves, not you. My preference is to turn the table over in favor of modern mechanisms for citations that result in access, not promises of access if you learn a secret code.

PS: I use Bleak House as a pejorative above but it is one of my favorite novels. Bear in mind that I also enjoy reading the Bluebook and the Chicago Manual of Style. 😉

John McAfee As Unpaid Intern?

Friday, February 19th, 2016

I read with disappointment John McAfee’s JOHN MCAFEE: I’ll decrypt the San Bernardino phone free of charge so Apple doesn’t need to place a back door on its product.

McAfee writes:

So here is my offer to the FBI. I will, free of charge, decrypt the information on the San Bernardino phone, with my team. We will primarily use social engineering, and it will take us three weeks. If you accept my offer, then you will not need to ask Apple to place a back door in its product, which will be the beginning of the end of America.

I don’t object to McAfee breaking the security on the San Bernardino phone, but I do object to him doing it for free.

McAfee donating services to governments with budgets in the $trillions sets a bad precedent.

First, it enables and encourages the government to continue hiring from the shallow end of the talent/gene pool for technical services. When it is stymied by ROT-13, some prince or princess will come riding in to save the day.

Second, we know that the use of unpaid interns damages labor markets; see Unpaid internships: A scourge on the labor market.

Third, and perhaps most importantly, “free” services lead governments and others to regard those services as having little or no value. “Free” services degrade the value of those services in the future.

McAfee’s estimate of breaking the encryption on the San Bernardino phone in three weeks seems padded to me. I suspect there will be eighteen days of drunken debauchery concluded by three (3) actual days of work when the encryption is broken. For a total of twenty-one (21) days.

Open request to John McAfee: Please withdraw your offer to break the encryption on the San Bernardino phone for free. Charge at least $1 million on the condition that it is tax free. The bar you set for the hacker market will benefit everyone in that market.

The FBI’s interest in breaking the encryption on the San Bernardino phone has nothing to do with that incident. Both shooters in that spur-of-the-moment incident are dead, and no amount of investigation is going to change that. What the FBI wants is a routine method of defeating such encryption in the future.

To that end, sell the FBI the decrypted phone and not the decryption method. That way you can maintain a market for phone decryption on a case-by-case basis. Coupled with a high enough price for the service, that will help keep FBI intrusions into iPhones to a minimum.

Anti-Terrorism By Quota?

Thursday, February 18th, 2016

U.S. Department of Justice, Office of the Attorney General, FY 2015 Annual Performance Report and FY 2017 Annual Performance Plan

Did you know that the FBI has terrorist plots quotas?


That may help explain why so many mentally ill people are duped into terrorist-like activities, assisted by the FBI. Have to make those quotas.

You may want to crib some of the definition of “disruption” for your next performance review:

A disruption is defined as interrupting or inhibiting a threat actor from engaging in criminal or national security related activity. A disruption is the result of direct actions and may include, but is not limited to, the arrest; seizure of assets; or impairing the operational capabilities of key threat actors.

That’s so, ah, vague. Anything could qualify as “impairing the operational capabilities….”

It does occur to me that if there were 440 “disruptions” up to September 30, 2015, then there should be some roughly equivalent number of arrests, indictments, etc. Yes?

Anyone have a quick handle on a source for such records?


How Much Can paragraph -> subparagraph mean? Lots under TPP!

Thursday, February 18th, 2016

Sneaky Change to the TPP Drastically Extends Criminal Penalties by Jeremy Malcolm.

From the post:

What does this surreptitious change from “paragraph” to “subparagraph” mean? Well, in its original form the provision exempted a country from making available any of the criminal procedures and penalties listed above, except in circumstances where there was an impact on the copyright holder’s ability to exploit their work in the market.

In its revised form, the only criminal provision that a country is exempted from applying in those circumstances is the one to which the footnote is attached—namely, the ex officio action provision. Which means, under this amendment, all of the other criminal procedures and penalties must be available even if the infringement has absolutely no impact on the right holder’s ability to exploit their work in the market. The only enforcement provision that countries have the flexibility to withhold in such cases is the authority of state officials to take legal action into their own hands.

Sneaky, huh?

The United States Trade Representative (USTR) isn’t representing your interests or mine in the drafting of the TPP.

If you had any doubt in that regard, Jeremy’s post on this change and others should remove it.


Wednesday, February 17th, 2016

Johan Oosterman tweets:

Look, the next page begins with these words! Very helpful man makes clear how catchwords work. HuntingtonHM1048



Catchwords were originally used to keep pages in order for binding. You won’t encounter them in post-19th century materials, but they are still interesting from a markup perspective.

The catchword, in this case accompanied by a graphic, appears on the page, and the next page does begin with these words. Do you capture the catchword? Its graphic? The relationship between the catchword and the opening words of the next page? What if there is an error?
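One possible way to capture all of those, sketched in Python: record the catchword, any accompanying graphic and the page text, then derive the relationship by comparing the catchword with the opening words of the next page, so that errors surface as mismatches instead of being silently corrected. The `Page` fields and the sample pages are hypothetical, and the exact, case-insensitive comparison ignores the spelling variation and abbreviation real manuscripts are full of.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    folio: str                               # page/leaf identifier, e.g. "1r"
    text: str                                # transcribed text of the page body
    catchword: Optional[str]                 # catchword at the foot, if any
    catchword_graphic: Optional[str] = None  # description of any drawing with it


def check_catchwords(pages):
    """Compare each catchword with the opening words of the following page.

    Returns (folio, next_folio, catchword, opening_words, matches) tuples.
    """
    report = []
    for page, nxt in zip(pages, pages[1:]):
        if not page.catchword:
            continue
        n = len(page.catchword.split())
        opening = " ".join(nxt.text.split()[:n])
        report.append((page.folio, nxt.folio, page.catchword, opening,
                       opening.lower() == page.catchword.lower()))
    return report


# Hypothetical pages; the second catchword deliberately disagrees.
pages = [
    Page("1r", "and so he went forth", "to the king", catchword_graphic="drawing"),
    Page("1v", "to the king he said", None),
    Page("2r", "then spake the queen", "and then"),
    Page("2v", "but then the knight rode", None),
]

for folio, next_folio, cw, opening, ok in check_catchwords(pages):
    print(folio, "->", next_folio, repr(cw), "vs", repr(opening),
          "OK" if ok else "MISMATCH")
```

Keeping the catchword as transcribed and computing the relationship, rather than storing a “correct” value, preserves the scribe’s error as evidence.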

You Can Back Up OrientDB Databases on Ubuntu 14.04

Wednesday, February 17th, 2016

How To Back Up Your OrientDB Databases on Ubuntu 14.04

From the post:

OrientDB is a multi-model, NoSQL database with support for graph and document databases. It is a Java application and can run on any operating system; it’s also fully ACID-compliant with support for multi-master replication.

An OrientDB database can be backed up using a backup script and also via the command line interface, with built-in support for compression of backup files using the ZIP algorithm.

By default, backing up an OrientDB database is a blocking operation — writes to the database are locked until the end of the backup operation, but if the operating system was installed on an LVM partitioning scheme, the backup script can perform a non-blocking backup. LVM is the Linux Logical Volume Manager.

In this article, you’ll learn how to backup your OrientDB database on an Ubuntu 14.04 server.

I don’t know if it is still true, given the rate of data breaches, but failure to maintain useful backups was the #1 cause for sysadmins being fired.

If that is still true today (and it should be), pay attention to proper backup processes! Yes, it’s unimaginative, tedious, routine, etc., but a life saver when the system crashes.

Don’t buy into the replicas, RAID5, etc., rant. Yes, do all those things, plus have physical backups that are stored off-site on a regular rotation schedule.
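For the rotation schedule itself, a common scheme is grandfather-father-son: keep recent dailies, a few weekly copies and a longer run of monthly copies, and rotate the older sets off-site. The Python sketch below only decides which dated backups to retain. The counts (7, 4, 12), the choice of Sunday for weeklies and the 1st of the month for monthlies are assumptions, and it neither calls OrientDB nor moves anything off-site.

```python
from datetime import date, timedelta


def retained(backup_dates, today, daily=7, weekly=4, monthly=12):
    """Grandfather-father-son retention: which dated backups to keep."""
    # Newest first, ignoring anything dated after "today".
    dates = sorted((d for d in backup_dates if d <= today), reverse=True)
    keep = set(dates[:daily])                                 # sons: latest dailies
    keep.update([d for d in dates if d.weekday() == 6][:weekly])  # fathers: Sundays
    keep.update([d for d in dates if d.day == 1][:monthly])       # grandfathers: 1st
    return sorted(keep)


# Ninety nightly backups ending on a sample date.
today = date(2016, 2, 17)
nightly = [today - timedelta(days=i) for i in range(90)]
keep = retained(nightly, today)
print(len(keep), "of", len(nightly), "backups retained")
# 13 of 90 backups retained
```

Everything not in `keep` is a candidate for reuse; the weekly and monthly sets are the natural ones to carry off-site.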

The job you save may well be your own.

Challenges of Electronic Dictionary Publication

Wednesday, February 17th, 2016

Challenges of Electronic Dictionary Publication

From the webpage:

April 8-9th, 2016

Venue: University of Leipzig, GWZ, Beethovenstr. 15; H1.5.16

This April we will be hosting our first Dictionary Journal workshop. At this workshop we will give an introduction to our vision of „Dictionaria“, introduce our data model and current workflow and will discuss (among others) the following topics:

  • Methodology and concept: How are dictionaries of „small“ languages different from those of „big“ languages and what does this mean for our endeavour? (documentary dictionaries vs. standard dictionaries)
  • Reviewing process and guidelines: How to review and evaluate a dictionary database of minor languages?
  • User-friendliness: What are the different audiences and their needs?
  • Submission process and guidelines: reports from us and our first authors on how to submit and what to expect
  • Citation: How to cite dictionaries?

If you are interested in attending this event, please send an e-mail to dictionary.journal[AT]

Workshop program

Our workshop program can now be downloaded here.

See the webpage for a list of confirmed participants, some with submitted abstracts.

Any number of topic map related questions arise in a discussion of dictionaries.

  • How to represent dictionary models?
  • What properties should be used to identify the subjects that represent dictionary models?
  • On what basis, if any, should dictionary models be considered the same or different? And for what purposes?
  • What data should be captured by dictionaries and how should it be identified?
  • etc.

Those are only a few of the questions that could be refined into dozens, if not hundreds, more when you reach the details of constructing a dictionary.

I won’t be attending, but I await the output from this workshop with great anticipation!

Support Apple and Tim Cook!

Wednesday, February 17th, 2016

Tim Cook’s open letter summarizes the demand being made on Apple by the FBI:

Specifically, the FBI wants us to make a new version of the iPhone operating system, circumventing several important security features, and install it on an iPhone recovered during the investigation. In the wrong hands, this software — which does not exist today — would have the potential to unlock any iPhone in someone’s physical possession.

The FBI may use different words to describe this tool, but make no mistake: Building a version of iOS that bypasses security in this way would undeniably create a backdoor. And while the government may argue that its use would be limited to this case, there is no way to guarantee such control.

That’s insane.

First, unless it has gone unreported, the FBI hasn’t offered to pay Apple for such an operating system. Last time I looked, involuntary servitude was prohibited by the U.S. Constitution.

Second, Apple is free to choose the products it wishes to create and cannot be forced to create any particular product, even if paid.

The FBI’s position runs counter to any principled notion of liberty. The FBI would have our liberty subject to the whim and caprice of law enforcement agencies.

I don’t use any Apple products but if a defense fund for Apple is created to resist this absurdity, I will certainly be in line to contribute to it.

So should we all.