ODI – Access To Legal Data News

January 13th, 2017

Strengthening our legal data infrastructure by Amanda Smith.

Amanda recounts a joint effort by the Open Data Institute (ODI) and Thomson Reuters to improve access to legal data.

From the post:

Paving the way for a more open legal sector: discovery workshop

In September 2016, Thomson Reuters and the ODI gathered publishers of legal data, policy makers, law firms, researchers, startups and others working in the sector for a discovery workshop. Its aims were to explore important data types that exist within the sector, and map where they sit on the data spectrum, discuss how they flow between users and explore the opportunities that taking a more open approach could bring.

The notes from the workshop explore current mechanisms for collecting, managing and publishing data, benefits of wider access and barriers to use. There are certain questions that remain unanswered – for example, who owns the copyright for data collected in court. The notes are open for comments, and we invite the community to share their thoughts on these questions, the data types discussed, how to make them more open and what we might have missed.

Strengthening data infrastructure in the legal sector: next steps

Following this workshop we are working in partnership with Thomson Reuters to explore data infrastructure – datasets, technologies and processes and organisations that maintain them – in the legal sector, to inform a paper to be published later in the year. The paper will focus on case law, legislation and existing open data that could be better used by the sector.

The Ministry of Justice have also started their own data discovery project, which the ODI have been contributing to. You can keep up to date on their progress by following the MOJ Digital and Technology blog and we recommend reading their data principles.

Get involved

We are looking to the legal and data communities to contribute opinion pieces and case studies to the paper on data infrastructure for the legal sector. If you would like to get involved, contact us.
…(emphasis in original)

Encouraging news, especially for those interested in building value-added tools on top of data that is made available publicly. At least they can avoid the cost of collecting data already collected by others.

Take the opportunity to comment on the notes and participate as you are able.

If you think you have seen use cases for topic maps before, consider that the Code of Federal Regulations (US), as of December 12, 2016, has 54938 separate, but not unique, definitions of “person.” The impact of each regulation depends upon its definition of that term.

Other terms have similar semantic difficulties both in the Code of Federal Regulations as well as the US Code.
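
To make “separate, but not unique” concrete, here is a minimal Python sketch. The sample definitions and section labels are invented for illustration (they are not quotations from the CFR), and the `normalize` helper is only my assumption about what should count as “the same” definition.

```python
from collections import defaultdict


def normalize(text):
    """Collapse case and whitespace so trivially different copies compare equal."""
    return " ".join(text.lower().split())


def group_definitions(definitions):
    """Map each distinct (normalized) definition to the sections that use it."""
    groups = defaultdict(list)
    for section, text in definitions:
        groups[normalize(text)].append(section)
    return dict(groups)


# Hypothetical sample data: (section label, definition text) pairs.
SAMPLE = [
    ("section A", "Person means an individual or a corporation."),
    ("section B", "Person  means an individual or a corporation."),
    ("section C", "Person means any individual, partnership, or agency."),
]

groups = group_definitions(SAMPLE)
print(len(SAMPLE), "separate definitions,", len(groups), "unique")
```

Grouping on normalized text only catches verbatim repeats. Deciding when two differently worded definitions mean the same thing is the harder question, and the one topic maps are meant to address.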

Cellebrite Hacked (Crowd-Funding for Tools?)

January 13th, 2017

Phone-Hacking Firm Cellebrite Got Hacked; 900GB of Data Stolen by Swati Khandelwal.

From the post:

Israeli firm Cellebrite, the popular company that provides digital forensics tools and software to help law enforcement access mobile phones in investigations, has had 900 GB of its data stolen by an unknown hacker.

But the hacker has not yet publicly released anything from the stolen data archive, which includes its customer information, user databases, and a massive amount of technical data regarding its hacking tools and products.

Instead, attackers are looking for possible opportunities to sell the access to Cellebrite system and data on a few selected IRC chat rooms, the hacker told Joseph Cox, contributor at Motherboard, who was contacted by the hacker and received a copy of the stolen data.

I can understand the hacker’s desire to make money. If the price is a reasonable one, unlike that of TheShadowBrokers, who are still pricing themselves out of a sale (approximately $8,230,000), crowd-funding might be a useful approach to purchasing the tools for public release.

I can’t afford to bid on the tools as an individual, but would contribute to a crowd-funded effort to secure a public release of the tools.

Why? The more hacking tools that are available, the less secure governments become.

People become less secure as well but governments are a far greater threat to people than cyber-criminals will ever be.

Cyber-criminals want your money, governments want your freedom.

Humanities Digital Library [A Ray of Hope]

January 13th, 2017

Humanities Digital Library (Launch Event)

From the webpage:

17 Jan 2017, 18:00 to 17 Jan 2017, 19:00


IHR Wolfson Conference Suite, NB01/NB02, Basement, IHR, Senate House, Malet Street, London WC1E 7HU


6-7pm, Tuesday 17 January 2017

Wolfson Conference Suite, Institute of Historical Research

Senate House, Malet Street, London, WC1E 7HU


About the Humanities Digital Library

The Humanities Digital Library is a new Open Access platform for peer reviewed scholarly books in the humanities.

The Library is a joint initiative of the School of Advanced Study, University of London, and two of the School’s institutes—the Institute of Historical Research and the Institute of Advanced Legal Studies.

From launch, the Humanities Digital Library offers scholarly titles in history, law and classics. Over time, the Library will grow to include books from other humanities disciplines studied and researched at the School of Advanced Study. Partner organisations include the Royal Historical Society whose ‘New Historical Perspectives’ series will appear in the Library, published by the Institute of Historical Research.

Each title is published as an open access PDF, with copies also available to purchase in print and EPUB formats. Scholarly titles come in several formats—including monographs, edited collections and longer and shorter form works.
(emphasis in the original)

Timely evidence that not everyone in the UK is barking mad! “Barking mad” being the only explanation I can offer for the Investigatory Powers Bill.

I won’t be attending but if you can, do, and support the Humanities Digital Library after it opens.

The People vs the Snoopers’ Charter [No Input = No Surveillance, Of Gaff Hooks]

January 13th, 2017

The People vs the Snoopers’ Charter

From the webpage:

Ever googled something personal?

Who you text, email or call. Your social media activity. Which websites you visit.

Who you bank with. Where your kids go to school. Your sexual preferences, health worries, religious and political beliefs.

Since November, the Snoopers’ Charter – the Investigatory Powers Act – has let the Government access all this intimate information, building up an incredibly detailed picture of you, your family and friends, your hobbies and habits – your entire life.

And it won’t just be accessed by the Home Secretary. Dozens of agencies – the Department for Work and Pensions, HMRC and 46 others – can now see sensitive details of your personal life.

Over 200,000 people signed a petition to stop the Snoopers’ Charter, the Government didn’t listen so we’re taking them to court and we need your help.

There’s no opt-out and you don’t need to be suspected of anything. It will just happen all the time, to every one of us.

The Investigatory Powers Act lets Government keep records of and monitor your private emails, texts and phone calls – that’s where you are, who you speak to, what you say – and all without any suspicion of wrongdoing.

It forces internet companies like Sky, BT and TalkTalk to log every website you visit or app you have used, creating a vast database of deeply sensitive and revealing information. At a time when companies and governments are under increasingly frequent attack from hackers, this will create a goldmine for criminals and foreign spies.

Your support will help us clear the first hurdle, being granted permission by the Court to proceed with our case against the Government.

It’s time we all took a stand. We’ve told the Government we’ll see them in court and we need your help to make that happen. Please donate whatever you can to fund this vital case.
… (emphasis in original)

In case you are missing the background, see: Investigatory Powers Act 2016, which is now law in the UK.

The text as originally enacted.

The true extent of surveillance in the United States is unknown so it isn’t clear if the UK was playing “catch up” with this draconian measure or trying to beat the United States in a race to the least civil society.

Either way, it is an unfortunate milestone in the legal history of a country that gave us the common law.


From a data science perspective, I would point out that no input = no surveillance.

Your eyes may be better than mine but in the surveillance camera image, I count at least three vulnerabilities that would render the camera useless.

Ordinary wire cutters won’t be useful but a gaff hook could be quite effective in creating a no input state.

The same principle applies whether you choose a professionally made gaff hook or some DIY version of the same instrument.

A gaff hook won’t stop surveillance of ISPs, etc., but disabling a surveillance camera could be seen as poking the government in the eye.

That’s an image I can enjoy. You?

PS: I’m not intimate with UK criminal law. Is possession of a gaff hook legal in the UK?

Applied Computational Genomics Course at UU: Spring 2017

January 12th, 2017

Applied Computational Genomics Course at UU: Spring 2017 by Aaron Quinlan.

I initially noticed this resource from posts on the two-part Introduction to Unix (part 1) and Introduction to Unix (part 2).

Both are too elementary for you but are something you can pass on to others. They do give you an idea of the Unix skill level required for the rest of the course.

From the GitHub page:

This course will provide a comprehensive introduction to fundamental concepts and experimental approaches in the analysis and interpretation of experimental genomics data. It will be structured as a series of lectures covering key concepts and analytical strategies. A diverse range of biological questions enabled by modern DNA sequencing technologies will be explored including sequence alignment, the identification of genetic variation, structural variation, and ChIP-seq and RNA-seq analysis. Students will learn and apply the fundamental data formats and analysis strategies that underlie computational genomics research. The primary goal of the course is for students to be grounded in theory and leave the course empowered to conduct independent genomic analyses. (emphasis in the original)

I take it successful completion will also enable you to intelligently question genomic analyses by others.

The explosive growth of genomics makes that a valuable skill in public discussions as well as something nice for your toolbox.

Stanford CoreNLP – a suite of core NLP tools (3.7.0)

January 12th, 2017

Stanford CoreNLP – a suite of core NLP tools

The beta is over and Stanford CoreNLP 3.7.0 is on the street!

From the webpage:

Stanford CoreNLP provides a set of natural language analysis tools. It can give the base forms of words, their parts of speech, whether they are names of companies, people, etc., normalize dates, times, and numeric quantities, mark up the structure of sentences in terms of phrases and word dependencies, indicate which noun phrases refer to the same entities, indicate sentiment, extract particular or open-class relations between entity mentions, get quotes people said, etc.

Choose Stanford CoreNLP if you need:

  • An integrated toolkit with a good range of grammatical analysis tools
  • Fast, reliable analysis of arbitrary texts
  • The overall highest quality text analytics
  • Support for a number of major (human) languages
  • Available interfaces for most major modern programming languages
  • Ability to run as a simple web service

Stanford CoreNLP’s goal is to make it very easy to apply a bunch of linguistic analysis tools to a piece of text. A tool pipeline can be run on a piece of plain text with just two lines of code. CoreNLP is designed to be highly flexible and extensible. With a single option you can change which tools should be enabled and which should be disabled. Stanford CoreNLP integrates many of Stanford’s NLP tools, including the part-of-speech (POS) tagger, the named entity recognizer (NER), the parser, the coreference resolution system, sentiment analysis, bootstrapped pattern learning, and the open information extraction tools. Moreover, an annotator pipeline can include additional custom or third-party annotators. CoreNLP’s analyses provide the foundational building blocks for higher-level and domain-specific text understanding applications.

What stream of noise, sorry, news are you going to pipe into the Stanford CoreNLP framework?


Imagine a web service that offers levels of analysis alongside news text.

Or does the same with leaked emails and/or documents?
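
As a starting point for such a service, here is a minimal Python sketch that talks to CoreNLP running as a web service. It assumes you have already started a CoreNLP server yourself and that it is listening on localhost:9000 (the host and port are my assumptions). The helper names `build_url` and `annotate` are hypothetical, not part of CoreNLP. The list of annotators plays the role of the “single option” that switches tools on and off.

```python
import json
import urllib.parse
import urllib.request


def build_url(annotators, host="http://localhost:9000"):
    """Build the request URL; the annotators list selects which tools run."""
    props = {"annotators": ",".join(annotators), "outputFormat": "json"}
    return host + "/?properties=" + urllib.parse.quote(json.dumps(props))


def annotate(text, annotators, host="http://localhost:9000"):
    """POST text to an already running CoreNLP server and return parsed JSON."""
    request = urllib.request.Request(
        build_url(annotators, host), data=text.encode("utf-8")
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, something like `annotate("Some news text.", ["tokenize", "ssplit", "pos"])` should return the analysis as a dictionary. Adding or dropping names in the annotator list changes the level of analysis without touching the rest of the code.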

Interactive Color Wheel

January 12th, 2017

Interactive Color Wheel


You will need to visit this interactive color wheel to really appreciate its capabilities.

What I find most helpful is the display of hex codes for the colors. I can distinguish colors but getting the codes right can be a real challenge.
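
If you only need the code for a position on the wheel, a few lines of Python will do the conversion. This is a minimal sketch using the standard library’s `colorsys` module. The function name `hue_to_hex` is my own, and it assumes the wheel is laid out in HSV with hue measured in degrees, which may not match how the linked wheel is built.

```python
import colorsys


def hue_to_hex(hue_degrees, saturation=1.0, value=1.0):
    """Convert a position on an HSV color wheel to a #rrggbb hex code."""
    r, g, b = colorsys.hsv_to_rgb((hue_degrees % 360) / 360.0, saturation, value)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


# Print the codes for the three primary positions on the wheel.
for hue in (0, 120, 240):
    print(hue, hue_to_hex(hue))
```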


Inaugural Ball Cancellation!

January 12th, 2017

Your antipathy towards upcoming inaugural balls and work on possible blockades is having an impact!

The Arkansas Inaugural Ball has been cancelled due to “low demand.”

The listing of inauguration balls I pointed to yesterday has plenty of other places for blockades and other mischief.

Looking forward to the least attended inauguration in history, longest and largest traffic snarl in history and complete social disasters at the inauguration balls.

Flashing/Mooning For Inauguration Forecast

January 11th, 2017

You can find an updated weather forecast for January 20, 2017, updating my speculations in Blockading Washington – #DisruptJ20 – Unusual Tactic – Nudity, in Angela Friz’s Here’s the first of what will surely be many inauguration weather forecasts.

Angela isn’t reporting sun-bathing weather but warm enough that a heavy coat over your birthday suit may be sufficient.

Of course, you could always build a fire in a trash barrel, something we are likely to see a lot of during the Trump presidency.

I’m sure other protesters, in the buff or not, will appreciate the extra warmth.

Missing The Beltway Blockade? Considering Blockading A Ball?

January 11th, 2017

For one reason or another, you may not be able to participate in a Beltway Blockade January 20, 2017, see:

Don’t Panic!

You can still enjoy a non-permitted protest and contribute to the least attended inauguration in history!

2017 Presidential Inaugural Balls

The list is short on location information for many of the scheduled balls but the Commander in Chief’s Ball, Presidential Inaugural Ball, Mid-Atlantic Inauguration Ball, Midwest Inaugural Ball, Western Inaugural Ball, and the Neighborhood Inaugural Ball are all being held at the Walter E. Washington Convention Center.

Apologies, I haven’t looked up prior attendance records, but based on known scheduling alone, disruption in the area of the Walter E. Washington Convention Center looks like it will pay the highest returns.

For the balls with location information and/or location information that I can discover, I will post a fuller list with Google Map links tomorrow.

Oh, for inside protesting, here are floor plans of the Walter E. Washington Convention Center.

Those are the official, posted floor plans.

Should that link go dark, let me know. I have a backup copy of them. ;-)

Overcoming Congressional Provincialism

January 10th, 2017

While doing hard core data collection on members of Congress, I kept encountering:

Regrettably, I am unable to reply to any email from individuals residing outside of my congressional district.

The problem, of course, is that you may have an opinion on national intelligence but your representative, for example, isn’t on the intelligence committee.

What if you could identify and reach across congressional boundaries?

More on that tomorrow, along with news of the data set that has distracted me for several days!

ANSWER Secures More “Permitted” Protest Space

January 9th, 2017


The ANSWER Coalition has:

…secured another permitted assembly area for an even larger gathering site on the parade route, the Navy Memorial (8th St. and Pennsylvania Ave. NW).

See: ANSWER Coalition for details and ways to support.

If you’re not interested in “permitted” protesting, it could be the case that attendees and protesters alike find it difficult if not impossible to attend the inauguration. See protests for an ongoing series of speculations in that direction.

PS: I remember the Constitution reading:

Congress shall make no law … abridging…the right of the people peaceably to assemble…

and not:


Congress shall make no law … abridging…the right of the people to be permitted to assemble…

Do you?

> 3000 for 2017? – Defining Blockade Success

January 7th, 2017

Trump Inauguration Planners Unveil Tickets, Map says:

About 3,000 people holding purple tickets got stuck on foot in the Interstate 395 tunnel when trying to attend President Barack Obama’s first inauguration, causing many of them to miss the ceremony. The tunnel was later nicknamed the Purple Tunnel of Doom.

Data science projects should define criteria for success, before the project starts.

That helps prevent management from moving the goal posts to claim victory where none exists and protects you from “but it doesn’t ….” when that feature wasn’t included in the criteria for success.

In efforts to #DisruptJ20 the Trump inauguration, it appears that at least 3K people must be prevented from reaching the inauguration.

For your planning purposes, the 2017 SWEARING IN CEREMONY INFORMATION (FAQ) advises:

What time should I get to the US Capitol for the ceremony?

The gates to the mall will most likely open at 5:00am, and the ticketed areas are usually filled by 8:00am. The ceremony will begin around 11:30am with a musical performance prior to that time.

Blockaders are in for a long night!

The rate of removal of cars that intentionally or unintentionally run out of gas, disabled vehicles (think flat tires), etc., is unknown.

As a guesstimate, I would say gridlock conditions starting around 3 AM and persisting until NOON, EST, would result in an inauguration that a majority of the ticket holders did not attend.

I wonder if the news channels will focus more on protesters or empty bleachers? Guesses?

Implementing Indivisible – Early Difficulties

January 7th, 2017

Indivisible (Indivisible: A Practical Guide for Resisting the Trump Agenda) recommends appearing at every public appearance of your representative.

The same logic applies to other representatives since their committees vote on issues that impact you.

Question: How do you find all the offices of members of the US House or Senate?

Answer: Not easily.

Take Senator Dianne Feinstein for example.

First Source

http://bioguide.congress.gov/scripts/biodisplay.pl?index=F000062 gives:


Second Source

GPO Congressional Directory provides:


Third Source

https://www.congress.gov/member/dianne-feinstein/F000062 lists:


Local office information appears at Senator Dianne Feinstein, but its listing varies from page to page, making automated extraction an iffy proposition.

Empowering Indivisible

Congress is sorely in need of a topic map for its members, that much is obvious.
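
The core of such a topic map is merging on a shared identifier instead of on names. Here is a minimal Python sketch under stated assumptions: the Bioguide ID F000062 comes from the URLs above, but the record layout, the `merge_by_id` function and the office strings are hypothetical placeholders, not data taken from any of the three sources. The second spelling of the name is a common misspelling, included to show name variants being collected.

```python
def merge_by_id(records):
    """Merge records sharing a Bioguide ID, keeping every name and office seen."""
    merged = {}
    for record in records:
        entry = merged.setdefault(
            record["bioguide_id"],
            {"names": set(), "offices": set(), "sources": []},
        )
        entry["names"].add(record["name"])
        entry["offices"].update(record.get("offices", []))
        entry["sources"].append(record["source"])
    return merged


# Hypothetical records: office strings are placeholders, not real addresses.
RECORDS = [
    {"source": "bioguide.congress.gov", "bioguide_id": "F000062",
     "name": "Dianne Feinstein"},
    {"source": "GPO Congressional Directory", "bioguide_id": "F000062",
     "name": "Diane Feinstein", "offices": ["PLACEHOLDER DC OFFICE"]},
    {"source": "congress.gov", "bioguide_id": "F000062",
     "name": "Dianne Feinstein", "offices": ["PLACEHOLDER LOCAL OFFICE"]},
]

merged = merge_by_id(RECORDS)
print(sorted(merged["F000062"]["names"]))
print(sorted(merged["F000062"]["offices"]))
```

Because the merge keys on the identifier, spelling differences between sources stop mattering and every office any source lists ends up in one place.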

What does the lack of an easy way to find local office information suggest to you?

Would local office information improve your odds of contacting your own representative and others?

Online Database of “Verified” Twitter Accounts (Right On!)

January 6th, 2017

The WikiLeaks Task Force tweeted on 6 Jan. 2017:

We are thinking of making an online database with all “verified” twitter accounts & their family/job/financial/housing relationships.

There are a number of comments on this tweet; the ones containing “dox,” “doxx,” “doxing,” “creepy,” “evil,” etc. should be ignored.

Ignored because intelligence agencies, news organizations, merchants, banks, etc. are all collecting and organizing that data and more.

Ignored because the public should not preemptively disarm itself.

If anything, the WikiLeaks Task Force should start with “verified” Twitter accounts and expand outwards, rapidly.

The public should be able to rapidly find relationships of individuals nominated for office, who contribute money to candidates, who profit from contracts, who launder public money. The public should have the same advantages intelligence agencies enjoy today.

To the nay-sayers to the WikiLeaks Task Force proposal:

Why do you seek to prevent putting the public on a better footing vis-a-vis government?

Question to my readers: What do the nay-sayers gain from a disarmed public?

Three More Reasons To Learn R

January 6th, 2017

Three reasons to learn R today by David Smith.

From the post:

If you're just getting started with data science, the Sharp Sight Labs blog argues that R is the best data science language to learn today.

The blog post gives several detailed reasons, but the main arguments are:

  1. R is an extremely popular (arguably the most popular) data programming language, and ranks highly in several popularity surveys.
  2. Learning R is a great way of learning data science, with many R-based books and resources for probability, frequentist and Bayesian statistics, data visualization, machine learning and more.
  3. Python is another excellent language for data science, but with R it's easier to learn the foundations.

Once you've learned the basics, Sharp Sight also argues that R is also a great data science language to master, even though it's an old language compared to some of the newer alternatives. Every tool has a shelf life, but R isn't going anywhere and learning R gives you a foundation beyond the language itself.

If you want to get started with R, Sharp Sight Labs offers a data science crash course. You might also want to check out the Introduction to R for Data Science course on EdX.

Sharp Sight Labs: Why R is the best data science language to learn today, and Why you should master R (even if it might eventually become obsolete)

If you need more reasons to learn R:

  • Unlike Facebook, R isn’t a sinkhole of non-testable propositions.
  • Unlike Instagram, R is rarely NSFW.
  • Unlike Twitter, R is a marketable skill.

Glad to hear you are learning R!

ANSWER Protest Permit – Least Attended Inauguration in History

January 5th, 2017

Permits secured for Jan. 20 Mass Protest at the Inauguration!

From the post:

The ANSWER Coalition has a permit for 14th street and Pennsylvania Ave NW and a portion of Freedom Plaza, beginning at 7:00am on Jan. 20 for a protest that will continue throughout the day.

We are also continuing our long legal battle for additional permitted space along Pennsylvania Ave. on Inauguration Day. The National Park Service has stonewalled the issuance of additional permits in an attempt to sanitize the most visible and primary locations along the route from dissent and free speech activity. Additionally, we are continuing to challenge the illegal, unconstitutional system whereby NPS reserves large portions of the route and other parts of D.C. on Inauguration Day, and in the days and weeks prior to and after, on behalf of Trump’s Presidential Inaugural Committee.

We think it’s critically important for the people to not be intimidated, to not be silent and to use all public spaces to express themselves.

(emphasis in original)

All of that is true, but attending the ANSWER protest means you will be counted as “attending the inauguration.”

What if you took ANSWER’s later advice:

use all public spaces to express themselves.

With tailgate parties on the Beltway (Tailgating @DisruptJ20), and blockades of the same (Low Risk Blockading of the DC Beltway).

Secure President-elect Trump’s place in the history books, with the least attended inauguration in history.

You can be part of that historical event, while sitting on the Beltway out of gas.

Which is it? Do you want to be “in attendance” or “truant” for Trump’s inauguration?

Beall’s List of Predatory Publishers 2017 [Avoiding “fake” scholarship, journalists take note]

January 5th, 2017

Beall’s List of Predatory Publishers 2017 by Jeffrey Beall.

From the webpage:

Each year at this time I formally announce my updated list of predatory publishers. Because the publisher list is now very large, and because I now publish four, continuously-updated lists, the annual releases do not include the actual lists but instead include statistical and explanatory data about the lists and links to them.

Jeffrey maintains four lists of highly questionable publishers/publications:

Beall’s list should be your first stop when an article arrives from an unrecognized publication.

Not that being published in Nature and/or Science is a guarantee of quality scholarship, but a publication on Beall’s list should raise publication-stopping red flags.

Such a publication could be true, but bears the burden of proving itself to be so.

BaseX – Bad News, Good News

January 4th, 2017

Good news and bad news from BaseX. Christian Grün posted to the BaseX discussion list today:

Dear all,

This year, there will be no BaseX Preconference Meeting in XML Prague. Sorry for that! The reason is simple: We were too busy in order to prepare ourselves for the event, look for speakers, etc.

At least we are happy to announce that BaseX 8.6 will finally be released this month. Stay tuned!

All the best,


That’s disappointing but understandable. Console yourself with watching presentations from 2013 – 2016 and reviewing issues for 8.6.

Just a guess on my part, ;-), but I suspect more testing of BaseX builds would not go unappreciated.

Something to keep yourself busy while waiting for BaseX 8.6 to drop.

Eight Years of the Republican Weekly Address

January 4th, 2017

We looked at eight years of the Republican Weekly Address by Jesse Rifkin.

From the post:

Every week since Ronald Reagan started the tradition in 1982, the president delivers a weekly address. And every week, the opposition party delivers an address as well.

What can the Weekly Republican Addresses during the Obama era reveal about how the GOP has attempted to portray themselves to the American public, by the public policy topics they discussed and the speakers they featured? To find out, GovTrack Insider analyzed all 407 Weekly Republican Addresses for which we could find data during the Obama era, the first such analysis of the weekly addresses as best we can tell. (See the full list of weekly addresses here.)

Sometimes they discuss the same topic as the president’s weekly address — particularly common if a noteworthy event occurs in the news that week — although other times it’s on an unrelated topic of the party’s choosing. It also features a rotating cast of Republicans delivering the speech, most of them congressional, unlike the White House which has almost always featured President Obama, with Vice President Joe Biden occasionally subbing in.

On the issues, we found that Republicans have almost entirely refrained from discussing such inflammatory social issues as abortion, guns, or same-sex marriage in their weekly addresses, despite how animating such issues are to their base. They also were remarkably silent on Donald Trump until the week before the election.

We also find that while Republicans often get slammed on women’s rights and minority issues, Republican congressional women and African Americans are at least proportionally represented in the weekly addresses, compared to their proportions in Congress, if not slightly over-represented — but Hispanics are notably under-represented.

You have seen credible claims, such as On Predicting Social Unrest Using Social Media by Rostyslav Korolov, et al., and less credible claims from others, such as CIA claims it can predict some social unrest up to 5 days ahead.

Rumor has it that the CIA has a Word template named, appropriately enough: theRussiansDidIt. I can neither confirm nor deny that rumor.

Taking credible actors at their word, are you aware of any parallel research on weekly addresses by Congress and following congressional action?

A very light skimming of the literature on predicting Supreme Court decisions comes up with: Competing Approaches to Predicting Supreme Court Decision Making by Andrew D. Martin, Kevin M. Quinn, Theodore W. Ruger, and Pauline T. Kim (2004), Algorithm predicts US Supreme Court decisions 70% of time by David Kravets (2014), Fantasy Scotus (a Supreme Court fantasy league with cash prizes).

Congressional voting has been studied as well, for instance, Predicting Congressional Voting – Social Identification Trumps Party. (Now there’s an unfortunate headline for searchers.)

Congressional votes are important but so is the progress of bills, the order in which issues are addressed, etc., and it is the reflection of those less formal aspects in weekly addresses from Congress that could be interesting.

The weekly speeches may be as divorced from any shared reality as comments inserted in the Congressional Record. On the other hand, a partially successful model, other than the timing of donations, may be possible.

Q&A Cathy O’Neil…

January 4th, 2017

Q&A Cathy O’Neil, author of ‘Weapons of Math Destruction,’ on the dark side of big data by Christine Zhang.

From the post:

Cathy O’Neil calls herself a data skeptic. A former hedge fund analyst with a PhD in mathematics from Harvard University, the Occupy Wall Street activist left finance after witnessing the damage wrought by faulty math in the wake of the housing crash.

In her latest book, “Weapons of Math Destruction,” O’Neil warns that the statistical models hailed by big data evangelists as the solution to today’s societal problems, like which teachers to fire or which criminals to give longer prison terms, can codify biases and exacerbate inequalities. “Models are opinions embedded in mathematics,” she writes.

Great interview that hits enough high points to leave you wanting to learn more about Cathy and her analysis.

On that score, try:

Read her mathbabe blog.

Follow @mathbabedotorg.

Read Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy.

Try her new business: ORCAA [O’Neil Risk Consulting and Algorithmic Auditing].

From the ORCAA homepage:

ORCAA’s mission is two-fold. First, it is to help companies and organizations that rely on time and cost-saving algorithms to get ahead of this wave, to understand and plan for their litigation and reputation risk, and most importantly to use algorithms fairly.

The second half of ORCAA’s mission is this: to develop rigorous methodology and tools, and to set rigorous standards for the new field of algorithmic auditing.

There are bright line cases (sentencing, housing, hiring discrimination) where “fair” has a binding legal meaning, and where there is legal liability for not being “fair.”

Outside such areas, the search for “fairness” seems quixotic. Clients are entitled to their definitions of “fair” in those areas.

FEMA – HOW-TO Demonize Your Opponents

January 4th, 2017

Beryl Lipton writes in FEMA Field Force manual offers protesters insights into the future of crowd control:

Though construction on the Dakota Access Pipeline has halted for now, the lessons for law enforcement and protesters are still percolating. For the former, they’ll likely find themselves one day studying the event as they prepare for future mass gatherings, maybe in a guide just like the one distributed by the Federal Emergency Management Agency (FEMA) to North Dakota law enforcement in September.

… (graphic of DHS omitted)

Obtained by Unicorn Riot via a request to the North Dakota Department of Corrections, an agency with far fewer individuals in its custody than attended the protest at Standing Rock, the manual is a Field Force Operations training program for students, a crash course in eight parts on how to deal with a mixed crowd of lawful and unlawful dissenters.

I extracted the Field Force Operations PER-200 manual from the zip file posted at MuckRock, for your reading/access convenience.

As a government training document, allegedly “our” government, the manual fails in a number of aspects.

Consider its efforts to demonize protesters:

b. Protesters. Not every protester is the same nor should be viewed the same by law enforcement. By better understanding protesters, law enforcement officers can make better choices on how to respond. A small group of unruly protesters can stand out from the peaceful majority—often comprised of others who just want to be there along with innocent bystanders accidentally caught in the melee.

(1) Everyday citizens. Most protests include everyday citizens gathering through their First Amendment right to peaceably make their voices heard (Driscoll, 2003).

(2) Professional protesters. These people train or are trained in protester tactics often by direct action organizations that promote two universal messages: First, intervention demands responsibility. Second, a smaller harm is acceptable if it prevents a greater harm. One interpretation of this second message is that it is acceptable for protesters to break laws they consider less important like vandalism to prevent a greater harm like environmental damage. Some activism organizations may produce booklets that demonstrate use or construction of devices, including the infamous Road Raging – Top Tips for Wrecking Roadbuilding (Road Alert!, 1997).

(3) Anarchists. These people aim to disrupt, often seeking to challenge authority and capitalism at any cost. They are frequently young college students who express themselves through the destruction of property. Anarchists may mix into peaceful protests despite the efforts of the nonviolent protesters to limit destructive activities—leading to fighting sometimes between protesters. One common anarchist technique is the black bloc (violent, destructive activity), demonstrated at the Occupy Seattle protests.
… (at page 106)

If you think that lacks a charitable attitude towards ordinary people outraged at some government misconduct, consider this listing of the “types” of individual protesters:

(1) Impulsive. These short-tempered people are the kind who are always spoiling for a fight and only need a fancied insult or a slight provocation to excite them to violence or incite others to violence.

(2) Suggestible. People who get into the action early and are easily influenced to follow the lead of the more violent.

(3) Cautious. Individuals who wait for the cloak of anonymity to give them courage by hiding their identity.

(4) Yielders. Those who do not join the action until a large number of participants give the impression of universality. In other words, “Everyone is taking part, so why shouldn’t I?”

(5) Supportive. People who do not actively join the mob but who enjoy the show and even shout encouragement.

(6) Resisters. Persons whose standards of judgment are not swayed by the emotional frenzy of the mob but who maintain level heads. They can disagree with the actions of the majority.

(7) Psychopathic. Individuals with a pathological personality structure are angry at the world and seek to use a riotous situation as a means of getting even with society (FBI, 1967, p. 21).
… (page 108)

How’s that for a rhetorical move?

In three pages the reader is drawn from “everyday citizens” to a catalog of personality disorders ranging up to the “psychopathic.”

Any reader instinctively feels a gathering of protesters is a boiling pot of crazy ready to explode.

A false worldview but one promoted by the FEMA manual.

Imagine you are a local law enforcement officer, with little or no personal experience with civil disobedience, being told by FEMA that protesters are the harbingers of chaos. What’s your reaction going to be?

It’s only one example but Julia Carrie Wong and Sam Levin report in: Standing Rock protesters hold out against extraordinary police violence:

Harkening back to an earlier era, when police in Birmingham, Alabama, attacked African American schoolchildren with dogs and high-pressure water hoses, North Dakota officers trained water cannons on hundreds of Dakota Access pipeline protesters.

On the night of 20 November, though, the temperature was below freezing and the protesters, who call themselves “water protectors”, were camping outdoors for the evening.

Water is just one of many “less-than-lethal” munitions that have been trained against the activists.

“They seem to have almost an infinite arsenal of different types of weapons,” said Rachel Lederman, attorney for the National Lawyers Guild (NLG). “I don’t think local law enforcement understands how dangerous they are.”

Police have acknowledged using sponge rounds, bean bag rounds, stinger rounds, teargas grenades, pepper spray, Mace, Tasers and a sound weapon. The explosive teargas grenades in use at Standing Rock have been banned by some US law enforcement agencies because they indiscriminately spray people, Lederman said.

More than two dozen people were hospitalized and 300 injured during the conflict, according to the medic and healer council. One woman’s arm was nearly blown off, according to her father, and the complaint alleges that another woman was shot in the eye, resulting in the detachment of her retina and possible permanent blindness.

Question: Should “everyday citizens” be sprayed with water cannon in below-freezing weather and assaulted with sponge rounds, bean bag rounds, stinger rounds, teargas grenades, pepper spray, Mace, Tasers and a sound weapon?

That’s not a hard question is it?

I suspect every non-psychotic law enforcement officer at Standing Rock would answer no, just like you.

But Morton County sheriff Kyle Kirchmeier confirms the FEMA-schooled view of law enforcement:

On Thanksgiving, Morton County sheriff Kyle Kirchmeier released a statement condemning the actions of “paid agitators and protesters” without offering any evidence that people were being paid to fight the pipeline. The department has not responded to requests to substantiate the claim.

In another statement that week, the sheriff said activists were not engaged in “civil disobedience” but were acting like “evil agitators”. The Mandan, North Dakota, police chief, Jason Ziegler, has asserted that law enforcement agencies “can use whatever force necessary to maintain peace”.

To be fair, numerous law enforcement agencies have declined to subscribe to this FEMA-inspired madness; see Sheriffs Across US Refusing to Send Police and Equipment to DAPL as Outrage and Costs Grow by Claire Bernish.

At least in this instance. The real test of whether law enforcement can avoid the FEMA “…evil agitators….” psychosis comes when protests are closer to home.

Government training manuals that humanize protesters are less likely to result in protests being used as proving grounds for “less lethal” weapons.

Teaching police officers to see protesters as their kith and kin will make major strides in the humane treatment of protesters.

Police officers may realize they have more in common with protesters than with players far removed from consequences on the streets. (Is that what FEMA is trying to avoid?)

Sharpening Your Hacking Skills!

January 3rd, 2017

40+ Intentionally Vulnerable Websites To Practice Your Hacking Skills.

From the post:

Attack is definitely the best form of defense and this also applies to Cyber Security.

Companies are now hacking their own websites and even hiring ethical hackers in an attempt to find vulnerabilities before the bad guys do. As such ethical hacking is now a much sought after skill but hacking websites without permission can get you on the wrong side of the law, even if you’re just practising.

So how do practice your hacking skills whilst staying on the right side of the law? Well there are a number of deliberately vulnerable websites out there designed to allow you to practise and hone your hacking skills, without fear of prosecution. So we’ve decided to compile a list of over forty of them, each with short description.

Once you feel comfortable finding vulnerabilities, the next step could be a job as a penetration tester or participation in one of the bug bounty programmes where companies reward you based on the severity of the bugs that you find, which could be very lucrative. Facebook is one such company offering a bug bounty programme and has paid out more than a million dollars to date.

So without further ado, here’s the list. If you know of a good hacking website that’s not on this list, let me know and I’ll add it. Oh, and don’t forget to bookmark this page! :)

Yes! Not only bookmark this page but visit the sites it lists!

My only disappointment was that the Office of Personnel Management wasn’t listed. I guess the OPM site is requiring permission for hacking now. ;-)

The GRU-Ukraine Artillery Hack That May Never Have Happened

January 3rd, 2017

The GRU-Ukraine Artillery Hack That May Never Have Happened by Jeffrey Carr.

From the post:

Crowdstrike’s latest report regarding Fancy Bear contains its most dramatic and controversial claim to date; that GRU-written mobile malware used by Ukrainian artillery soldiers contributed to massive artillery losses by the Ukrainian military. “It’s pretty high confidence that Fancy Bear had to be in touch with the Russian military,” Dmitri Alperovich told Forbes. “This is exactly what the mission is of the GRU.”

Crowdstrike’s core argument has three premises:

  1. Fancy Bear (APT28) is the exclusive developer and user of X-Agent [1]
  2. Fancy Bear developed an X-Agent Android variant specifically to compromise an Android ballistic computing application called Попр-Д30.apk for the purpose of geolocating Ukrainian D-30 Howitzer artillery sites[2]
  3. The D-30 Howitzers suffered 80% losses since the start of the war.[3]

If all of these premises were true, then Crowdstrike’s prior claim that Fancy Bear must be affiliated with the GRU [4] would be substantially supported by this new finding. Dmitri referred to it in the PBS interview as “DNA evidence”.

In fact, none of those premises are supported by the facts. This article is a summary of the evidence that I’ve gathered during hours of interviews and background research with Ukrainian hackers, soldiers, and an independent analysis of the malware by CrySys Lab. My complete findings will be presented in Washington D.C. next week on January 12th at Suits and Spooks.

Sadly I won’t be in attendance but am looking forward to reports of Carr’s details on the alleged GRU-Ukraine hack.

Not that I am expecting the New York Times to admit the Russian hacking of the 2016 election is a tissue of self-serving lies.

Disappointing but not unexpected.

Achieving a 300% speedup in ETL with Apache Spark

January 3rd, 2017

Achieving a 300% speedup in ETL with Apache Spark by Eric Maynard.

From the post:

A common design pattern often emerges when teams begin to stitch together existing systems and an EDH cluster: file dumps, typically in a format like CSV, are regularly uploaded to EDH, where they are then unpacked, transformed into optimal query format, and tucked away in HDFS where various EDH components can use them. When these file dumps are large or happen very often, these simple steps can significantly slow down an ingest pipeline. Part of this delay is inevitable; moving large files across the network is time-consuming because of physical limitations and can’t be readily sped up. However, the rest of the basic ingest workflow described above can often be improved.

Campaign finance data suffers more from complexity and obscurity than volume.

However, there are data problems where volume and not deceit is the issue. In those cases, you may find Eric’s advice quite helpful.

Expiring Patents

January 3rd, 2017

Expatents returns a list of patents expiring that day and you can sign up for a weekly digest of expiring patents.

The site claims that over 80% of patents are never commercially exploited.

Are patents that expire without commercial exploitation like articles that are never cited by anyone?

Potential shareholder litigation over the not-so-trivial cost of patents that never resulted in commercial exploitation?

Was it inside or outside counsel that handled the patent filings?

There’s an interesting area for tracing relationships (associations) and expenses.

A 3-Second Blockading Proposal

January 3rd, 2017

Nearly everyone I know has read Steal This Book by Abbie Hoffman at some point but despite its being on the internet, younger readers may have missed it.

To set the background for the 3-second blockading proposal, consider what Abbie has to say about anti-tire weapons:

Don’t believe all those bullshit tire ads that make tires seem like the Superman of the streets. Roofing nails spread out on the street are effective in stopping a patrol car. A nail sticking out from a strong piece of wood wedged under a rear tire will work as effectively as a bazooka. An ice pick will do the trick repeatedly but you’ve got to have a strong arm to strike home…. (page 122 of the pdf of Steal This Book; I can’t say how that corresponds to other copies.)

Everything Abbie says is true, but I see problems with each of his suggestions:

  1. Roofing nails: Roofing nails work, are easy to purchase and not expensive. At the same time, they are an indiscriminate weapon, not unlike carpet bombing when the objective is interdicting a single road.
  2. Nail in wood: The comparison of a “strong piece of wood” and “bazooka” makes me think of nails in the end of a 2 x 4 board. Works but even TSA agents trained to spot bottled water can spot someone sporting a 2 x 4 on one shoulder. (Not what Abbie meant but a humorous image.)
  3. Ice Pick: Like the man says, requires “a strong arm to strike home.” If reduced to using an ice pick, you do know to go for the thinner sidewalls. Yes?

Other tire weapon methods include: flattening tires with bayonets, shooting out tires, and the current fad with spike strips:


Pictured is the Stinger Spike System, which is advertised online for $889.20 (not including shipping and tax).

Blockading with tire weapons sounds indiscriminate (roofing nails), obvious (2 x 4 with nails), difficult (ice pick), unlikely (bayonets/guns), and/or expensive (police spikes).

But that’s not necessarily so.

What if you had the opportunity to use this truck as part of a highway blockade:


Impressive. Yes?

Look at all those tires! That just seems way too difficult. But, perhaps not.

How many of those tires would have to be disabled to make that semi-train part of a road blockade?

Here’s an image to help with that question:


Out of all those tires, only one of the front two steering tires must be disabled. Disable either one and the truck becomes a fixture unless and until someone can clear enough traffic from around it and repair the tire.

BTW, the same lesson applies to school buses, tour buses, garbage trucks, dump trucks, Metro buses (includes links to schedules in case you want to wait for one), in short, anything that is big and difficult to move until repaired.

A 3-Second Blockading Proposal

Large, difficult to repair vehicles make great elements of a roadway blockade. If they lose either one of their front tires, there they sit until repaired.

So how does either one of the front tires get flattened?

What fact about tires did Abbie Hoffman overlook in Steal This Book?

You’re ahead of me. Yes, valve stems.

Valve stems are nearly obscured on tractor trailer rigs by the wheel housing:


Valve stems vary depending on the type of vehicle and by design are not easy to cut.

The ideal (and unproven) scenario would be:

  1. Spot blockade target’s valve stem
  2. Cut valve stem
  3. Be on your way

all in 3 seconds or less.

But see the next section:

Lack of Practical Experience – Variety Intervenes

My first impulse was to recommend using robust cutters:


for severing valve stems but your success with those will depend upon your arm strength and the tires you encounter.

Quite frankly, the variety of wheels and tires is too large to make a judgment about tools until reconnaissance on the tires you are likely to encounter.

Add to that my lack of tractor trailer tires immediately available for trials, and further research is indicated.

Any research/experience you can point to and/or contribute concerning cutting valve stems, specifying tire model(s) and the tool(s) used, would be greatly appreciated.

Steal This Book is still a great read but is sorely in need of an update. It does have my favorite paragraph from all counter-culture literature:

If you are around a military base, you will find it relatively easy to get your hands on an M-79 grenade launcher, which is like a giant shotgun and is probably the best self-defense weapon of all time.

It’s not clear what experience Abbie had with the M-79, but you have to admit it is one hell of an image:


I understand that ammunition for the M-79 is hard to find. You?

News Bubble Bursting – World Newspapers and Magazines Online

January 2nd, 2017

World Newspapers and Magazines Online

Newspaper and magazine listings for one hundred and ninety-nine (199) countries.

At the rate of one country per week, it would take 3.8 years to work your way through this listing.
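The arithmetic behind that estimate, as a quick check. The only inputs are the 199 countries in the listing and 52 weeks per year; the variable names are purely illustrative.

```python
# Back-of-the-envelope check of the reading schedule.
COUNTRIES = 199        # countries in the listing
WEEKS_PER_YEAR = 52    # pace: one country per week

years = COUNTRIES / WEEKS_PER_YEAR
print(f"{years:.1f} years")  # 199 / 52 rounds to 3.8
```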

Considering the depth of government and corporate deception, don’t you owe it to yourself, if not your readers, to sample that deception widely?

In an age of automatic, if not always smooth and correct, translation, do you have a good excuse for doing any less?

Russian Hackers – Repeating History?

January 2nd, 2017

Maybe there is something to reading accounts of recent history. (A fascination with markup/computer and ANE languages doesn’t lead to much reading in “recent” history.)

But I was reading Manufacturing Consent by Edward S. Herman and Noam Chomsky (2002), when I encountered an earlier instance of the currently popular meme, “Russian hackers hacked the DNC.” (Despite the Podesta emails being obtained due to user carelessness that is hard to characterize as a “hack.”)

History Repeating (Not for the first time)

Set your wayback machine for 1981, another time when Russia (then the USSR) was an “evil empire.” (Or so claimed by people with particular agendas.)

A Turkish fascist and member of a violent anti-left party in Turkey, one Mehmet Ali Agca, attempted to assassinate Pope John Paul II in May 1981. After being interrogated for 17 months, Agca “confessed” that he was an agent of the KGB and Bulgarians.

Herman and Chomsky walk through the unraveling of this fantasy of the Reagan era political elites (pages xxvii-xxix), only to conclude:

The New York Times, which had been consistently supportive of the connection in both news and editorials, not only failed to report Weinstein’s negative findings from the search of the Bulgarian files, it also excluded Goodman’s statements on the CIA penetration of the Bulgarian secret services from their excerpts of his testimony. The Times had long maintained that the CIA and the Reagan administration “recoiled from the devastating implication that Bulgarian agents were bound to have acted only on a signal from Moscow.” 58 But Goodman’s and Ford’s testimony show that this was the reverse of the truth, and that CIA heads William Casey and Robert Gates overrode the views of CIA professionals and falsified evidence to support a Soviet linkage. The Times was not alone in following a misleading party line, but it is notable that this paper of record has yet to acknowledge its exceptional gullibility and propaganda service.


“…recoiled from the devastating implication that Bulgarian agents were bound to have acted only on a signal from Moscow.”

Does that sound similar to anything you have read recently or have heard repeated by the outgoing US president?

December 11, 2016

Jump forward now to December 11, 2016 and you can read the New York Times reporting:

“This is why I hate the term ‘we speak truth to power,’” said Mark M. Lowenthal, a former senior C.I.A. analyst. “We don’t have truth. We have really good ideas.”

Mr. Lowenthal said that determining the motives of foreign leaders — in this case, what drove President Vladimir V. Putin of Russia to order the hacking — was one of the most important missions for C.I.A. analysts. In 2002, one of the critical failures of American spy agencies was their inability to understand Saddam Hussein’s goals and motives.

A simple search reveals the internet is replete with such trash talking by the CIA, DHS, FBI and an assortment of agencies that rearrange conclusions but offer no facts in support of those conclusions.

A Final Blow as 2016 Closes

With the same credibility I would accord the now discredited NYTimes fable about Russian backing for the attempt on the life of Pope John Paul II, and the hacking of the Democratic National Committee at the direction of Vladimir Putin, comes this final shot from Russian hackers:


Since the Islamic State hasn’t claimed credit, it must be those damned Russian hackers! (Caution: That is “fake news.” Carey may have been sabotaged by someone but it wasn’t Russian hackers.)

A Case For Topic Maps & Subject Identity Anyone?

I haven’t worked out the details but these repeated charades by the US government, among others, offer an opportunity to put subject identity as defined by topic maps to work for true journalists.

The particulars of any given case vary but they all have:

  1. Accusations sans evidence by one or more agencies of the US government
  2. Chest-thumping by the New York Times (and others) in both reporting (sic) and editorial columns
  3. Articles/editorials relying on unnamed government sources or financially interested contractors
  4. Months without any evidence but more chest-thumping by US government agencies and their familiars

When all four of those properties are found, you are at least part way to identifying yet another repetition of the attempted assassination of Pope John Paul II fable.

Although, quite honestly, it needs a catchier moniker than that one.
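As a rough starting point, the four properties above can be treated as a checklist. The sketch below is not the topic maps data model, only a hypothetical structure for recording which properties a story exhibits. Every class, field and function name is invented for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class StoryTraits:
    """One flag per property in the list above (field names are illustrative)."""
    accusation_without_evidence: bool    # 1. agency accusations sans evidence
    press_chest_thumping: bool           # 2. reporting and editorials amplify the claim
    unnamed_or_interested_sources: bool  # 3. unnamed officials or interested contractors
    months_without_evidence: bool        # 4. time passes, still no evidence


def matched_traits(story: StoryTraits) -> list[str]:
    """Return the names of the properties a story exhibits."""
    return [f.name for f in fields(story) if getattr(story, f.name)]


def is_repetition(story: StoryTraits) -> bool:
    """True only when all four properties are present."""
    return len(matched_traits(story)) == len(fields(story))


# Hypothetical stories, not real data.
full_match = StoryTraits(True, True, True, True)
partial_match = StoryTraits(True, True, False, False)

print(is_repetition(full_match))
print(matched_traits(partial_match))
```

A real subject identity test would need evidence for each flag (sources, dates, quotations) rather than bare booleans, but even a checklist like this makes the pattern explicit enough to argue about.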


Historic American Newspapers (Bulk OCR Data Find!)

January 1st, 2017

Historic American Newspapers

From the webpage:

Search America’s historic newspaper pages from 1789-1924 or use the U.S. Newspaper Directory to find information about American newspapers published between 1690-present.

A total of 2,134 newspapers, digitized (images) and searchable. Some 11,520,159 pages for searching and review.

Quite a treasure trove for genealogy types, writers of primary/secondary research papers, people trying to escape the smoothing influence of history books over historical events, and others.

Did I mention the site has an API?

Or that it offers access to all of its OCR data in bulk?

It’s not “big data” in the sense of the astronomy community but creating sub-sets for local communities of “their papers” would have a certain cachet.
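To make the local sub-set idea concrete, here is a minimal sketch that only builds a search URL restricted to one state and a range of years. It does not send the request. The endpoint and the parameter names (`andtext`, `state`, `dateFilterType`, `date1`, `date2`, `format`) are my recollection of the site's search API and should be treated as assumptions; check the current API documentation before relying on them. The function name and the example search are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the site's current API documentation.
SEARCH_URL = "https://chroniclingamerica.loc.gov/search/pages/results/"


def page_search_url(text: str, state: str, first_year: int, last_year: int) -> str:
    """Build (but do not send) a JSON page-search request for one state."""
    params = {
        "andtext": text,                # words that must appear on the page
        "state": state,                 # restrict to papers from one state
        "dateFilterType": "yearRange",  # interpret date1/date2 as years
        "date1": first_year,
        "date2": last_year,
        "format": "json",               # ask for JSON rather than HTML
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


# Hypothetical query: pages mentioning "railroad" in Georgia papers, 1900-1910.
url = page_search_url("railroad", "Georgia", 1900, 1910)
print(url)
```

For anything beyond sampling, the bulk OCR downloads are probably the better route, since paging through search results for a whole community's papers would be slow.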