Global Muckraking: Investigative Journalism and Global Media – Starts February 8, 2017

January 21st, 2017

Global Muckraking: Investigative Journalism and Global Media by Anya Schiffrin. (Free Columbia University MOOC)

From the webpage:

About this course

Using examples of investigative and crusading journalism from Asia, Africa, Latin America and Europe, this course will help you understand how raising public awareness can create political and social change.

This course is a fast-paced introduction to global muckraking, past and present, and includes penetrating interviews with historians and investigative journalists.

Join us to discover the vital role that journalism has played in fighting injustice and wrongdoing over the last 100 years and delve into the current trends reshaping investigative reporting in the digital age.

What you’ll learn

  • How journalists can act as government and corporate watchdogs
  • The hard and soft pressures on investigative journalism
  • Stories of prominent reporters uncovering injustice from the late 19th century to today
  • Trends in media innovation

American Exceptionalism is mainstream in US journalism.

Consider the first line of this course description:

Using examples of investigative and crusading journalism from Asia, Africa, Latin America and Europe, ….

What? No mention of the class-based corruption, which is preferred by American policy makers over the “corrupt” quid-pro-quo corruption of other countries?

No mention of quid-pro-quo corruption in the US, which resulted in four of the last seven governors of Illinois going to jail. (As of 2012. It hasn’t been long enough to convict another governor of Illinois. Question of when, not if.)

Journalists in foreign countries deserve all the support they can be given.

At the same time, “injustice and wrongdoing” aren’t limited to “over there.”

Anyone who chooses to look will see injustice and wrongdoing much closer to home.

That said, a history of investigative and crusading journalism may inspire you to take up the banner.

Enjoy!

Actionable Reporting – An Example

January 21st, 2017

Republican Lawmakers in Five States Propose Bills to Criminalize Peaceful Protests by Spencer Woodman.

I don’t mind prosy reporting but I should not be forced to recover information that was (or should have been) known to the reporter.

Quick summary of Woodman’s post: Iowa, imagined future law; Michigan, proposal that died last year; Minnesota, two pending bills; North Dakota, one pending bill; Washington, one pending bill. So, three states with pending bills and not five.

The scattered links aren’t ones to help the reader track the current status of legislation, if it exists. Nor are the authors of these offenses against the common good identified.

Actionable reporting appends links to prose that enable readers to go beyond the text. In this case, links to legislatures, current bill status and authors of the legislation.

Here’s an actionable appendix for Woodman’s post:

Iowa Legislature

Imagined future bill, “suck it up, buttercup bill,” to be proposed by Representative Bobby Kaufman.

Michigan Legislature

HOUSE BILL No. 4643 – “An act to create a commission relative to labor disputes, and to prescribe its powers and duties; to provide for the mediation and arbitration of labor disputes, and the holding of elections thereon; to regulate the conduct of parties to labor disputes and to require the parties to follow certain procedures; to regulate and limit the right to strike and picket; to protect the rights and privileges of employees, including the right to organize and engage in lawful concerted activities; to protect the rights and privileges of employers; to make certain acts unlawful; to make appropriations; and to prescribe means of enforcement and penalties for violations of this act,” by amending section 9f (MCL 423.9f).

Referred to Senate Committee on Commerce – 12/8/2016 (died)

Authors:

Gary Glenn – (primary), Amanda Price, Michael McCready, Joseph Graves.

Unlike the Michigan legislature page, I substituted links to member webpages instead of bills they have sponsored. Sponsorship data is interesting but not helpful for contacting them. BTW, the link for Amanda Price is to her Wikipedia page. She doesn’t have a member page at the legislature.

Minnesota Legislature

Two bills:

  1. A bill for an act relating to public safety; increasing penalties for obstructing a highway; amending Minnesota Statutes 2016, sections 160.2715; 609.74.

    Authors:
    Lohmer; Fenton; Zerwas; Rarick; Miller; Runbeck; Albright; Green; Daudt; Lueck; Uglem; Dettmer; Daniels

  2. A bill for an act relating to public safety; creating the Minnesota Public Safety Personnel Protection Act; increasing penalties for obstructing emergency responders; amending Minnesota Statutes 2016, section 609.50.

    Authors:

    Garofalo; Newberger; Lohmer; Uglem.

North Dakota Legislature

HOUSE BILL NO. 1203 A BILL for an Act to create and enact section 32-03.2-02.2 of the North Dakota Century Code, relating to the liability exemption of a motor vehicle driver; and to amend and reenact section 39-10-33 of the North Dakota Century Code, relating to pedestrians on roadways. PDF text as introduced.

Authors: Representatives Kempenich, Brandenburg, Laning, Oliver, Rohr; Senators Cook, Schaible.

Washington Legislature

SB 5009 – 2017-18 Concerning offenses involving economic disruption.

Authors: Ericksen, Sheldon

Known as Preventing Economic Disruption Act (PEDA) in the 2017 legislative session.
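
An appendix like this is easier to keep current if it is held as structured records rather than prose. The sketch below is only an illustration: the `Bill` record, its field names and the status labels are my assumptions, and the entries simply restate what is listed above rather than anything pulled from a legislature’s site.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Bill:
    state: str
    label: str    # bill number or short description, as given in the appendix
    status: str   # "pending", "died" or "not filed" (illustrative labels)


# Entries restate the appendix above; nothing here comes from a live source.
BILLS = [
    Bill("Iowa", "suck it up, buttercup bill", "not filed"),
    Bill("Michigan", "HOUSE BILL No. 4643", "died"),
    Bill("Minnesota", "penalties for obstructing a highway", "pending"),
    Bill("Minnesota", "Public Safety Personnel Protection Act", "pending"),
    Bill("North Dakota", "HOUSE BILL NO. 1203", "pending"),
    Bill("Washington", "SB 5009", "pending"),
]


def pending_by_state(bills):
    """Count pending bills per state, ignoring dead or unfiled proposals."""
    return Counter(b.state for b in bills if b.status == "pending")


if __name__ == "__main__":
    counts = pending_by_state(BILLS)
    for state, n in sorted(counts.items()):
        print(f"{state}: {n} pending")
    print(f"{len(counts)} states with pending bills")
```

Run against these records, the tally comes to four pending bills in three states, which is the count behind “three states and not five.”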


Actionable reporting lowers the bar for readers to act on what they have read.

Trump Inauguration Police Tactics/Blockades – 10:30 AM EST

January 20th, 2017

Unicorn Riot is live streaming protests, including checkpoint blockades, from Washington, D.C.

An interesting variation on the police formation I detailed in Defeating Police Formations – Parallel Distributed Protesting: the police are breaching the blockade single file to create a path for people who want to attend the inauguration.

An odd reverse of the “surge and arrest” tactic to “surge and enable passage.”

The inauguration is still two hours out.

Join Unicorn Riot, Democracy Now! or one of the other live streams covering protests.

Personally I have no interest in the “official” ceremonies and will be skipping those.

PS: A tweet as of 35 minutes ago reports (unconfirmed) that 6 of 12 inauguration entrances have been completely shut down and traffic at others slowed to a “trickle.”

Why I Tweet by Donald Trump

January 19th, 2017

David Uberti and Pete Vernon in The coming storm for journalism under Trump capture why Donald Trump tweets:


As Trump explained the retention of his personal Twitter handle to the Sunday Times recently: “I thought I’d do less of it, but I’m covered so dishonestly by the press—so dishonestly—that I can put out Twitter…I can go bing bing bing and I just keep going and they put it on and as soon as I tweet it out—this morning on television, Fox: Donald Trump, we have breaking news.”

In order for Trump tweets to become news, two things are required:

  1. Trump tweets (quite common)
  2. Media evaluates the tweets to be newsworthy (should be less common)

Tweets reported as newsworthy are unlikely to match the sheer volume of Trump’s tweeting.

You have all read:

[Image: Trump tweet about Saturday Night Live]

Is Trump’s opinion, to which he is entitled, about Saturday Night Live newsworthy?

Trump on television is as trustworthy as the “semi-literate one-legged man” Dickens quoted for the title “Our Mutual Friend” is on English grammar. (Modern American Usage by Wilson Follett, edited by Jacques Barzun. Under the entry for “mutual friend.”)

Other examples abound but suffice it to say the media needs to make its own judgments about what is newsworthy and what is not.

Otherwise the natters of another semi-literate become news by default for the next four years.

ScriptSource [Fonts but so much more]

January 19th, 2017

ScriptSource

From the about page:

ScriptSource is a dynamic, collaborative reference to the writing systems of the world, with detailed information on scripts, characters, languages – and the remaining needs for supporting them in the computing realm. It is sponsored, developed and maintained by SIL International. It currently contains only a skeleton of information, and so depends on your participation in order to grow and assist others.

The need for information on Writing Systems

In today’s expanding global community, designers, linguists and computer professionals are called upon more frequently to support the myriad writing systems around the world. A key to this development is consistent, trustworthy, complete and organised information on the alphabets and scripts used to write the world’s languages. The development of Writing System Implementations (WSIs) depends on the availability of this information, so a lack of it can hinder the cultural, economic and intellectual development of communities that communicate in minority languages and scripts.

The information needed varies widely, and can include:

  • Design information and guidelines – both for alphabets and for specific letters/glyphs
  • Linguistic information – how the script is used for specific languages
  • Encoding details – particularly Unicode, including new Unicode proposals
  • Script behaviour – how letters change shape and position in context
  • Keyboarding conventions – including information on data entry tools
  • Testing tools and sample texts – so developers can test their software, fonts, keyboards

Some of this information is available, but is scattered around among a variety of web sites that have different purposes and structures, and often lies undocumented in the minds of individual script experts, or hidden in library books.

This information is also often segregated by audience. A font designer may be frustrated to find that available resources on a script address the spoken/written language relationship, but not the background and visual rules of the letterforms. A linguist may find information on encoding the script – such as the information in The Unicode Standard – but not important details of which languages use which symbols. An application developer may find a long writeup on the development and use of the script, but nothing to tell them what script behaviours are required.

There are also relatively few opportunities for experts from these fields to cooperate and work together. What interaction does exist often happens at conferences, on various mailing lists and forums, and through personal email. There are few experts who have the time to participate in these exchanges, and those that do may be frustrated to find that the same questions keep coming up again and again. Until now, there has been no place where this knowledge can be captured, organised and maintained.

The purpose of ScriptSource

ScriptSource exists to provide this information and bridge the gap between the designer, developer, linguist and user. It seeks to document the writing systems of the world and help those wanting to implement them on computers and other devices.

The initial content is relatively sparse, but includes basic information on all scripts in the ISO 15924 standard. It will grow dynamically through public submissions, expert content development and live linkages with other web sites. Rather than being just another web site about writing systems, ScriptSource provides a single hub of information where both old and new content can be found.

A truly remarkable resource on writing systems by SIL International.

You can think of ScriptSource as a way to locate fonts, but you may be drawn into complexities others rarely see!

Enjoy!

Permitted Trump Protesters Will Be Ignored

January 19th, 2017

I wish my headline were some of the “fake news” Democrats complain about, but Alexandra Rosemann proves the truth of that headline in:

Ignoring anti-Trumpers: Why we can expect media blackout of protests against Trump’s inauguration.

Not ignored by just anybody, ignored by the media.

On Jan. 20 — 16 years ago — thousands of protesters lined the inauguration parade route of the incoming Republican president. “Not my president,” they chanted. But despite the enormity of the rally, it was largely ignored. Instead, pundits marveled over how George W. Bush “filled out the suit” and confirmed authority.

“The inauguration of George W. Bush was certainly a spectacle on Inauguration Day,” marvels Robin Andersen, the director of Peace and Justice studies at Fordham University, in the 2001 short documentary “Not My President: Voices From the Counter Coup.”

It’s nearly impossible not to anticipate the eerie parallels between George W. Bush’s inauguration and that of Donald Trump.

“Forty percent of the public still believed that Bush had not been legitimately elected, yet there’s almost no discussion of these electoral problems or the constitutional crisis,” Andersen explains in the film. “Instead, Bush undergoes a kind of transformation where he fills out the suit and becomes a leader. Forgotten are any of the questions about his ability, his experience or his mangling of the English language. His transformation is almost magical,” she adds.

Andersen estimated the inauguration protests, which occurred throughout the country, garnered approximately 10 minutes of total coverage on all the major networks.

“When we did see images of protesters, there was no explanation as to why. We were asked to be passive spectators in this ritual of legitimation when the real democratic issues that should have been being discussed were ignored,” Andersen says in the film, reflecting on the “real democracy” in the streets of Washington, D.C.

Your choice. Ten minutes of coverage out of over 24 hours of permitted protesting, or the media covering a 24 hour blockade of the DC Beltway.

[Image: Fox 5 DC map]

Which one do you think draws more attention to your issues?

A new president will be inaugurated on January 20, 2017, but it’s your choice whether it’s him, his wife and a few cronies in attendance or hundreds of thousands.

See protests for more ideas on that possibility.

Empirical Analysis Of Social Media

January 19th, 2017

How the Chinese Government Fabricates Social Media Posts for Strategic Distraction, not Engaged Argument by Gary King, Jennifer Pan, and Margaret E. Roberts. American Political Science Review, 2017. (Supplementary Appendix)

Abstract:

The Chinese government has long been suspected of hiring as many as 2,000,000 people to surreptitiously insert huge numbers of pseudonymous and other deceptive writings into the stream of real social media posts, as if they were the genuine opinions of ordinary people. Many academics, and most journalists and activists, claim that these so-called “50c party” posts vociferously argue for the government’s side in political and policy debates. As we show, this is also true of the vast majority of posts openly accused on social media of being 50c. Yet, almost no systematic empirical evidence exists for this claim, or, more importantly, for the Chinese regime’s strategic objective in pursuing this activity. In the first large scale empirical analysis of this operation, we show how to identify the secretive authors of these posts, the posts written by them, and their content. We estimate that the government fabricates and posts about 448 million social media comments a year. In contrast to prior claims, we show that the Chinese regime’s strategy is to avoid arguing with skeptics of the party and the government, and to not even discuss controversial issues. We infer that the goal of this massive secretive operation is instead to regularly distract the public and change the subject, as most of the these posts involve cheerleading for China, the revolutionary history of the Communist Party, or other symbols of the regime. We discuss how these results fit with what is known about the Chinese censorship program, and suggest how they may change our broader theoretical understanding of “common knowledge” and information control in authoritarian regimes.

I differ from the authors on some of their conclusions but this is an excellent example of empirical as opposed to wishful analysis of social media.

Wishful analysis of social media includes the farcical claims that social media is an effective recruitment tool for terrorists. That claim is made too often to dignify with a citation, but never with empirical evidence, only an author’s repetition of the common “wisdom.”

In contrast, King et al. are careful to say what their analysis does and does not support, finding that, in a number of cases, the evidence contradicts commonly held thinking about the role of the Chinese government in social media.

One example I found telling was the lack of evidence that anyone is paid for pro-government social media comments.

In the authors’ words:


We also found no evidence that 50c party members were actually paid fifty cents or any other piecemeal amount. Indeed, no evidence exists that the authors of 50c posts are even paid extra for this work. We cannot be sure of current practices in the absence of evidence but, given that they already hold government and Chinese Communist Party (CCP) jobs, we would guess this activity is a requirement of their existing job or at least rewarded in performance reviews.
… (at pages 10-11)

Here I differ from the authors’ “guess”:

…this activity is a requirement of their existing job or at least rewarded in performance reviews.

Kudos to the authors for labeling this a “guess,” although one expects the mainstream press and members of Congress to take it as written in stone.

However, the authors presume positive posts about the government of China can only result from direct orders or pressure from superiors.

That’s a major weakness in this paper and similar analysis of social media postings.

The simpler explanation of pro-government posts is that a poster is reporting the world as they see it. (Think Occam’s Razor.)

As for sharing them with the so-called “propaganda office,” perhaps they are attempting to curry favor. The small number of posters makes it difficult to credit their motives (unknown) and behavior (partially known) as representative for the estimated 2 million posters.

Moreover, out of a population that nears 1.4 billion, the existence of 2 million individuals with a positive view of the government isn’t difficult to credit.
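
The proportion is easy to check. A back-of-the-envelope sketch, using only the figures quoted above (as many as 2 million posters, a population near 1.4 billion, about 448 million posts a year); the variable names are mine and the inputs are the paper’s estimates, not measurements of my own:

```python
# Figures as quoted in the abstract and in the post above.
posters = 2_000_000            # upper estimate of 50c posters
population = 1_400_000_000     # approximate population of China
posts_per_year = 448_000_000   # estimated fabricated posts per year

share = posters / population
posts_each = posts_per_year / posters

print(f"Posters as share of population: {share:.2%}")
print(f"Average posts per poster per year: {posts_each:.0f}")
```

That works out to roughly 0.14% of the population, and an average of 224 posts per poster per year if the posts were spread evenly.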

This is an excellent paper that will repay a close reading several times over.

Take it also as a warning about ideologically based assumptions that can mar or even invalidate otherwise excellent empirical work.

PS:

Additional reading:

From Gary King’s webpage on the article:

This paper follows up on our articles in Science, “Reverse-Engineering Censorship In China: Randomized Experimentation And Participant Observation”, and the American Political Science Review, “How Censorship In China Allows Government Criticism But Silences Collective Expression”.

GNU Unifont Glyphs [Good News/Bad News]

January 19th, 2017

GNU Unifont Glyphs 9.0.06.

From the webpage:

GNU Unifont is part of the GNU Project. This page contains the latest release of GNU Unifont, with glyphs for every printable code point in the Unicode 9.0 Basic Multilingual Plane (BMP). The BMP occupies the first 65,536 code points of the Unicode space, denoted as U+0000..U+FFFF. There is also growing coverage of the Supplemental Multilingual Plane (SMP), in the range U+010000..U+01FFFF, and of Michael Everson’s ConScript Unicode Registry (CSUR).
… (red highlight in original)

That’s the good news.

The bad news is shown by the coverage mapping:

0.0%  U+012000..U+0123FF  Cuneiform*
0.0%  U+012400..U+01247F  Cuneiform Numbers and Punctuation*
0.0%  U+012480..U+01254F  Early Dynastic Cuneiform*
0.0%  U+013000..U+01342F  Egyptian Hieroglyphs*
0.0%  U+014400..U+01467F  Anatolian Hieroglyphs*

These scripts will require a 32-by-32 pixel grid:

*Note: Scripts such as Cuneiform, Egyptian Hieroglyphs, and Bamum Supplement will not be drawn on a 16-by-16 pixel grid. There are plans to draw these scripts on a 32-by-32 pixel grid in the future.
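
To get a feel for how much drawing work those 0.0% rows represent, the sizes of the ranges can be computed directly. This is a rough sketch: the ranges are copied from the coverage listing above, and a range size counts every code point in the block, including any that Unicode has not assigned, so the number of glyphs actually needed may be smaller.

```python
# Block ranges copied from the coverage listing above (inclusive end points).
BLOCKS = {
    "Cuneiform": (0x12000, 0x123FF),
    "Cuneiform Numbers and Punctuation": (0x12400, 0x1247F),
    "Early Dynastic Cuneiform": (0x12480, 0x1254F),
    "Egyptian Hieroglyphs": (0x13000, 0x1342F),
    "Anatolian Hieroglyphs": (0x14400, 0x1467F),
}


def block_size(first, last):
    """Number of code points in an inclusive range."""
    return last - first + 1


sizes = {name: block_size(*bounds) for name, bounds in BLOCKS.items()}
total = sum(sizes.values())

for name, size in sizes.items():
    print(f"{size:5d}  {name}")
print(f"{total:5d}  code points in blocks at 0.0% coverage")
print(f"{block_size(0x0000, 0xFFFF):5d}  code points in the BMP, for comparison")
```

The five blocks span 3,072 code points in total, against the 65,536 of the BMP.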

One additional resource on creating cuneiform fonts:

Creating cuneiform fonts with MetaType1 and FontForge by Karel Píška:

Abstract:

A cuneiform font collection covering Akkadian, Ugaritic and Old Persian glyph subsets (about 600 signs) has been produced in two steps. With MetaType1 we generate intermediate Type 1 fonts, and then construct OpenType fonts using FontForge. We describe cuneiform design and the process of font development.

On creating fonts more generally with FontForge, see: Design With FontForge.

Enjoy!

Do You Have Big Brass Ones*? FOIA The President

January 18th, 2017

Join our project to FOIA the Trump administration by Michael Morisy.

From the post:

Since June 2015, MuckRock users have been filing FOIA requests regarding a possible Trump presidency. In fact, so far there’s been over 160 public Trump-related requests filed through the site, all of which you can browse here.

We’ve also put together a number of guides and articles on the upcoming administration, ranging from what you can and can’t file regarding Trump to deep dives into what’s already out there:

We’ve launched a new project page for users to showcase their requests, find new documents regarding the Trump administration, or get inspiration for their own requests, and we’ve created a special Slack channel for you to join in and strategize on future requests, or help share big league FOIA stories that shed light on the President Elect’s team.

We’ve had a few users join us there already and they’ve helped file some really fun requests, so we’re excited about what else the transparency community can come up with.

An effort worthy of both your time and support!

Once answered, remember that availability isn’t the same thing as meaningful access.

OCR, indexing, entity extraction, in short any skill you have is important in this effort.

* No longer a gender specific reference as you well know.

PS: I’ve signed up and need suggestions on what to ask for. Suggestions?

The CIA’s Secret History Is Now Online [Indexing, Mapping, NLP Anyone?]

January 18th, 2017

The CIA’s Secret History Is Now Online by Jason Leopold.

From the post:

Decades ago, the CIA declassified a 26-page secret document cryptically titled “clarifying statement to Fidel Castro concerning assassination.”

It was a step toward greater transparency for one of the most secretive of all federal agencies. But to find out what the document actually said, you had to trek to the National Archives in College Park, Maryland, between the hours of 9 a.m. and 4:30 p.m. and hope that one of only four computers designated by the CIA to access its archives would be available.

But today the CIA posted the Castro record on its website along with more than 12 million pages of the agency’s other declassified documents that have eluded the public, journalists, and historians for nearly two decades. You can view the documents here.

The title of the Castro document, as it turns out, was far more interesting than the contents. It includes a partial transcript of a 1977 transcript between Barbara Walters and Fidel Castro in which she asked the late Cuban dictator whether he had “proof” of the CIA’s last attempt to assassinate him. The transcript was sent to Adm. Stansfield Turner, the CIA director at the time, by a public affairs official at the agency with a note highlighting all references to CIA.

But that’s just one of the millions documents, which date from the 1940s to 1990s, are wide-ranging, covering everything from Nazi war crimes to mind-control experiments to the role the CIA played in overthrowing governments in Chile and Iran. There are also secret documents about a telepathy and precognition program known as Star Gate, files the CIA kept on certain media publications, such as Mother Jones, photographs, more than 100,000 pages of internal intelligence bulletins, policy papers, and memos written by former CIA directors.

Michael Best, @NatSecGeek, has pointed out that the “CIA de-OCRed at least some of the CREST files before they uploaded them.”

Spy agency class petty. Grant public access but force the restoration of text search.

Work on restoring text search is underway, so the next steps will be indexing, NLP, mapping, etc.
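
As a rough idea of what the indexing step involves once text has been recovered, here is a minimal inverted index sketch. The document IDs and sample text are invented for illustration (they are not CREST identifiers), and the tokenizer is deliberately naive; a real effort would hand this job to a proper search engine.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents containing every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


# Hypothetical OCR output keyed by made-up document IDs.
SAMPLE_DOCS = {
    "doc-001": "Clarifying statement to Fidel Castro concerning assassination",
    "doc-002": "Internal intelligence bulletin on Chile",
    "doc-003": "Memo on Star Gate telepathy program",
}

if __name__ == "__main__":
    idx = build_index(SAMPLE_DOCS)
    print(sorted(search(idx, "castro assassination")))
    print(sorted(search(idx, "chile")))
```

Multi-word queries are treated as AND queries here, which is the simplest choice; ranking, phrase search and tolerance for OCR errors are exactly the “interesting search issues” a data set of this size raises.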

A great set of documents to get ready for future official and unofficial leaks of CIA documents.

Enjoy!

PS: Curious if any of the search engine vendors will use CREST as demonstration data? Non-trivial size, interesting search issues, etc.

Ask at the next search conference.

Resistance Manual / Indivisible

January 18th, 2017

Resistance Manual

An essential reference for the volatile politics of the Trump presidency.

Indivisible

Four former congressional staffers banded together to write: “A practical guide to resisting the Trump Agenda.”

Both are shaped by confidence in current political and social mechanisms, to say nothing of a faith in non-violence.

Education is seen as the key to curing bigotry/prejudice and moving towards a more just society.

You will not find links to:

Steal this Book or the Anarchist Cookbook, 2000 edition for example.

There are numerous examples cited as “successful” non-violent protests, such as the elimination of de jure segregation in the American South. (Resource includes oral histories of the time.)

But de facto segregation in schools is greater than it was in the 1960s.

How do you figure that into the “success” of non-violent protests?

Read both Resistance Manual and Indivisible for what may be effective techniques.

But ask yourself, do non-violent protests comfort the victims of violence?

Or just the non-violent protesters?

Quantum Computer Resistant Encryption

January 18th, 2017

Irish Teen Introduces New Encryption System Resistant to Quantum Computers by Joseph Young.

From the post:


… a 16-year-old student was named as Ireland’s top young scientist and technologist of 2017, after demonstrating the application of qCrypt, which offers higher levels of protection, privacy and encryption in comparison to other innovative and widely-used cryptographic systems.

BT Young Scientist Judge John Dunnion, the associate professor at University of College Dublin, praised Curran’s project that foresaw the impact quantum computing will have on current cryptographic and encryption methods.

“qCrypt is a novel distributed data storage system that provides greater protection for user data than is currently available. It addresses a number of shortfalls of current data encryption systems; in particular, the algorithm used in the system has been demonstrated to be resistant to attacks by quantum computers in the future,” said Dunnion.

While it may be too early to predict whether technologies like qCrypt can protect existing encryption methods and data protection systems from quantum computers, Curran and the judges of the competition saw promising potential in the technology.

Word is spreading rapidly.

qCrypt has a place-holder website, Post-Quantum Cryptography for the Masses.

A YouTube video:

Shane’s GitHub repository (no qCrypt, yet)

Not to mention Shane’s website.

qCrypt has the potential to provide safety from government surveillance for everyone, everywhere.

Looking forward to this!

Top considerations for creating bioinformatics software documentation

January 18th, 2017

Top considerations for creating bioinformatics software documentation by Mehran Karimzadeh and Michael M. Hoffman.

Abstract

Investing in documenting your bioinformatics software well can increase its impact and save your time. To maximize the effectiveness of your documentation, we suggest following a few guidelines we propose here. We recommend providing multiple avenues for users to use your research software, including a navigable HTML interface with a quick start, useful help messages with detailed explanation and thorough examples for each feature of your software. By following these guidelines, you can assure that your hard work maximally benefits yourself and others.

Introduction

You have written a new software package far superior to any existing method. You submit a paper describing it to a prestigious journal, but it is rejected after Reviewer 3 complains they cannot get it to work. Eventually, a less exacting journal publishes the paper, but you never get as many citations as you expected. Meanwhile, there is not even a single day when you are not inundated by emails asking very simple questions about using your software. Your years of work on this method have not only failed to reap the dividends you expected, but have become an active irritation. And you could have avoided all of this by writing effective documentation in the first place.

Academic bioinformatics curricula rarely train students in documentation. Many bioinformatics software packages lack sufficient documentation. Developers often prefer spending their time elsewhere. In practice, this time is often borrowed, and by ducking work to document their software now, developers accumulate ‘documentation debt’. Later, they must pay off this debt, spending even more time answering user questions than they might have by creating good documentation in the first place. Of course, when confronted with inadequate documentation, some users will simply give up, reducing the impact of the developer’s work.
… (emphasis in original)

Take to heart the authors’ observation on automatic generation of documentation:


The main disadvantage of automatically generated documentation is that you have less control of how to organize the documentation effectively. Whether you used a documentation generator or not, however, there are several advantages to an HTML web site compared with a PDF document. Search engines will more reliably index HTML web pages. In addition, users can more easily navigate the structure of a web page, jumping directly to the information they need.

I would replace “…less control…” with “…virtually no meaningful control…” over the organization of the documentation.

Think about it for a second. You write short comments, sometimes even incomplete sentences as thoughts occur to you in a code or data context.

An automated tool gathers those comments, even incomplete sentences, rips them out of their original context and strings them one after the other.

Do you think that provides a meaningful narrative flow for any reader? Including yourself?
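
To make the point concrete, here is a small sketch of what a bare-bones generator does. The sample source and its docstrings are invented, and the `extract_docstrings` and `naive_reference` helpers are my own illustration, not the behavior of any particular documentation tool; it uses only Python’s standard `ast` module.

```python
import ast

# Hypothetical source file with terse, context-dependent docstrings.
SOURCE = '''
def load(path):
    """Same format as above."""
    return open(path).read()

def trim(reads):
    """Drop the bad ones. See note in load()."""
    return [r for r in reads if r]

def score(reads):
    """TODO: explain weights."""
    return len(reads)
'''


def extract_docstrings(source):
    """Collect (function name, docstring) pairs in source order."""
    tree = ast.parse(source)
    pairs = []
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            pairs.append((node.name, ast.get_docstring(node) or ""))
    return pairs


def naive_reference(source):
    """String the docstrings together, one after the other."""
    return "\n".join(f"{name}: {doc}" for name, doc in extract_docstrings(source))


if __name__ == "__main__":
    print(naive_reference(SOURCE))
```

The result is a list of fragments such as “Same format as above.” with nothing above it. Every comment made sense where it was written; none of them adds up to a narrative.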

Your documentation doesn’t have to be great literature but, as Karimzadeh and Hoffman point out, good documentation can make the difference between your hard work being used and adopted and its being ignored.

Ping me if you want to take your documentation to the next level.

Online tracking: A 1-million-site measurement and analysis [Leaving False Trails]

January 17th, 2017

Online tracking: A 1-million-site measurement and analysis by Steven Englehardt and Arvind Narayanan.

From the webpage:

Tracking Results

During our January 2016 measurement of the top 1 million sites, our tool made over 90 million requests, assembling the largest dataset (to our knowledge) used for studying web tracking. With this scale we can answer many web tracking questions: Who are the largest trackers? Which sites embed the largest number of trackers? Which tracking technologies are used, and who is using them? and many more.

Findings

The total number of third parties present on at least two first parties is over 81,000, but the prevalence quickly drops off. Only 123 of these 81,000 are present on more than 1% of sites. This suggests that the number of third parties that a regular user will encounter on a daily basis is relatively small. The effect is accentuated when we consider that different third parties may be owned by the same entity. All of the top 5 third parties, as well as 12 of the top 20, are Google-owned domains. In fact, Google, Facebook, and Twitter are the only third-party entities present on more than 10% of sites.
… (emphasis in original)

Impressive research based upon an impressive tool, OpenWPM.

The Github page for OpenWPM reads in part:

OpenWPM is a web privacy measurement framework which makes it easy to collect data for privacy studies on a scale of thousands to millions of site. OpenWPM is built on top of Firefox, with automation provided by Selenium. It includes several hooks for data collection, including a proxy, a Firefox extension, and access to Flash cookies. Check out the instrumentation section below for more details.

Just a point of view but I’m more interested in specific privacy tracking data for some given set of servers than general privacy statistics.

Specific privacy tracking data that enables planning the use of remote browsers to leave false trails.
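As a rough illustration of what per-site tracking data could look like once collected, here is a minimal sketch that tallies which third parties appear on a chosen set of sites. The data layout, function names, and domains are all invented for the example; this is not OpenWPM's schema or API.

```python
from collections import Counter

def tracker_counts(crawl):
    """Count, for each third-party domain, how many sites embed it.

    `crawl` maps a first-party site to the third-party domains it loaded.
    This layout is an assumption for illustration, not OpenWPM's schema.
    """
    counts = Counter()
    for third_parties in crawl.values():
        counts.update(set(third_parties))  # count each tracker once per site
    return counts

def prevalent_trackers(crawl, min_share):
    """Return trackers present on more than `min_share` of the sites."""
    total = len(crawl)
    return sorted(
        domain for domain, n in tracker_counts(crawl).items()
        if n / total > min_share
    )

# Invented sample data: three sites of interest and what they load.
SAMPLE_CRAWL = {
    "news.example": ["tracker-a.example", "tracker-b.example"],
    "shop.example": ["tracker-a.example"],
    "blog.example": ["tracker-a.example", "tracker-c.example"],
}

if __name__ == "__main__":
    print(tracker_counts(SAMPLE_CRAWL).most_common())
    print(prevalent_trackers(SAMPLE_CRAWL, 0.5))
```

The same tally, run over a specific set of servers rather than the top 1 million, is the kind of targeted view I have in mind.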

Kudos to the project, however you choose to use the software.

The Political Librarian (volume 2, issue 2)

January 17th, 2017

The Political Librarian

From the webpage:

The Political Librarian is dedicated to expanding the discussion of, promoting research on, and helping to re-envision locally focused advocacy, policy, and funding issues for libraries.

We want to bring in a variety of perspectives to the journal and do not limit our contributors to just those working in the field of library and information science. We seek submissions from researchers, practitioners, community members, or others dedicated to furthering the discussion, promoting research, and helping to re-envision tax policy and public policy on the extremely local level.

Grab the entire volume 2, issue 2 (December 2016) for reading while stopped on the DC Beltway, January 20, 2017.

Libraries need your help to survive and prosper during the rapidly approaching winter of ignorance.

#DisruptJ20 – 3 inch resolution aerial imagery Washington, DC @J20protests

January 17th, 2017

3 inch imagery resolution for Washington, DC by Jacques Tardie.

From the post:

We updated our basemap in Washington, DC with aerial imagery at 3 inch (7.5 cm) resolution. The source data is openly licensed by DC.gov, thanks to the District’s open data initiative.

If you aren’t familiar with Mapbox, there is no time like the present!

If you are interested in just the 3 inch resolution aerial imagery, see: http://opendata.dc.gov/datasets?keyword=imagery.

Enjoy!

Raw SIGINT Locations Expanded

January 17th, 2017

President Obama has issued new rules for sharing information under Executive Order 12333, with the ungainly title: (U) Procedures for the Availability or Dissemination of Raw Signals Intelligence Information by the National Security Agency Under Section 2.3 of Executive Order 12333 (Raw SIGINT Availability Procedures).

Kate Tummarello, in Obama Expands Surveillance Powers On His Way Out, sees a threat to “innocent persons:”

With mere days left before President-elect Donald Trump takes the White House, President Barack Obama’s administration just finalized rules to make it easier for the nation’s intelligence agencies to share unfiltered information about innocent people.

New rules issued by the Obama administration under Executive Order 12333 will let the NSA—which collects information under that authority with little oversight, transparency, or concern for privacy—share the raw streams of communications it intercepts directly with agencies including the FBI, the DEA, and the Department of Homeland Security, according to a report today by the New York Times.

That’s a huge and troubling shift in the way those intelligence agencies receive information collected by the NSA. Domestic agencies like the FBI are subject to more privacy protections, including warrant requirements. Previously, the NSA shared data with these agencies only after it had screened the data, filtering out unnecessary personal information, including about innocent people whose communications were swept up the NSA’s massive surveillance operations.

As the New York Times put it, with the new rules, the government claims to be “reducing the risk that the N.S.A. will fail to recognize that a piece of information would be valuable to another agency, but increasing the risk that officials will see private information about innocent people.”

All of which is true, but the new rules have other impacts as well.

Who is an “IC element?”

The new rules make numerous references to an “IC element,” but come up short in defining it:

L. (U) IC element is as defined in section 3.5(h) of E.O. 12333.
(emphasis in original)

Great.

Searching for E.O. 12333 isn’t enough. You need Executive Order 12333 United States Intelligence Activities (As amended by Executive Orders 13284 (2003), 13355 (2004) and 13470 (2008)). The National Archives version of Executive Order 12333 is not amended and hence is misleading.

From the amended E.O. 12333:

3.5 (h) Intelligence Community and elements of the Intelligence Community 
        refers to:
(1) The Office of the Director of National Intelligence;
(2) The Central Intelligence Agency;
(3) The National Security Agency;
(4) The Defense Intelligence Agency;
(5) The National Geospatial-Intelligence Agency;
(6) The National Reconnaissance Office; 
(7) The other offices within the Department of Defense for the collection of 
    specialized national foreign intelligence through reconnaissance programs;
(8) The intelligence and counterintelligence elements of the Army, the Navy,
    the Air Force, and the Marine Corps;
(9) The intelligence elements of the Federal Bureau of Investigation;
(10) The Office of National Security Intelligence of the Drug Enforcement
     Administration;
(11) The Office of Intelligence and Counterintelligence of the Department
      of Energy;
(12) The Bureau of Intelligence and Research of the Department of State;
(13) The Office of Intelligence and Analysis of the Department of the Treasury;
(14) The Office of Intelligence and Analysis of the Department of Homeland 
     Security;
(15) The intelligence and counterintelligence elements of the Coast Guard; and
(16) Such other elements of any department or agency as may be designated by 
     the President, or designated jointly by the Director and the head of the 
     department or agency concerned, as an element of the Intelligence Community. 

The Office of the Director of National Intelligence has an incomplete list of IC elements:

  • Air Force Intelligence
  • Army Intelligence
  • Central Intelligence Agency
  • Coast Guard Intelligence
  • Defense Intelligence Agency
  • Department of Energy
  • Department of Homeland Security
  • Department of State
  • Department of the Treasury
  • Drug Enforcement Administration
  • Federal Bureau of Investigation
  • Marine Corps Intelligence
  • National Geospatial-Intelligence Agency
  • National Reconnaissance Office
  • National Security Agency
  • Navy Intelligence

I say “incomplete” because from E.O. 12333, it is missing (with original numbers for reference):

...
(7) The other offices within the Department of Defense for the collection of 
    specialized national foreign intelligence through reconnaissance programs;
(8) The intelligence and counterintelligence elements of ..., and the 
    Marine Corps;
...
(16) Such other elements of any department or agency as may be designated by 
     the President, or designated jointly by the Director and the head of the 
     department or agency concerned, as an element of the Intelligence Community.

Under #7 and #16, there are other IC elements that are unnamed and unlisted by the Office of the DNI. I suspect the Marines were omitted for stylistic reasons.

Where to Find Raw SIGINT?

Identified IC elements are important because the potential presence of “Raw SIGINT,” beyond the NSA, has increased their value as targets.

P. (U) Raw SIGINT is any SIGINT and associated data that has not been evaluated for foreign intelligence purposes and/or minimized.
… (emphasis in original, from the new rules.)

Tummarello is justly concerned about “innocent people” but there are less than innocent people, any number of appointed/elected officials or barons of industry, who may be captured on the flypaper of raw SIGINT.

Happy hunting!

PS:

Warning: It’s very bad OPSEC to keep a trophy chart on your wall. ;-)

IC_Circle-460

You will, despite this warning, but I had to try.

The original image is here at Wikipedia.

Never Allow Your Self-Worth To Depend Upon A Narcissist

January 16th, 2017

The White House press corps has failed, again, in its relationship with President Trump.

The latest debacle is described in Defiant WH Press Corps “won’t go away” if ejected, says Major Garrett.

From the post:

There have been rumblings about kicking the press out of the White House almost since Donald Trump won the presidency, culminating with a report in Esquire last week that the Trump administration has in fact been giving the idea “serious consideration.”

“If they do so, we’ll still cover him. The White House press corps won’t go away,” CBS News Chief White House Correspondent Major Garrett told CBSN’s Josh Elliott Monday. “You can shove us a block away, two blocks away, a mile away. We will be on top of this White House — as we’ve been on top of every White House.”

Mr. Trump and several on his communications team have had a stormy relationship with the press, both during his presidential campaign and during his transition.

“I would not be surprised if they moved us out. I really do think there is something about the Trump administration and those closest to him who want the symbolism of driving reporters out of the White House, moving the elites out farther away from this president,” Garrett said.

Does the self-worth of the White House press corps depend upon where they are located by a known narcissist?

If so, they are in for a long four years.

That is doubly true for Trump’s denigration of reporters and others.

A fundamental truth to remember for the next four years:

Trump’s comments about you, favorable or unfavorable, are smelly noise. They will dissipate, unless repeated over and over, as though it matters if a narcissist denies or affirms your existence.

It doesn’t.

XML.com Relaunch!

January 16th, 2017

XML.com

Lauren Wood posted this note about the relaunch of XML.com recently:

I’ve relaunched XML.com (for some background, Tim Bray wrote an article here: https://www.xml.com/articles/2017/01/01/xmlcom-redux/). I’m hoping it will become part of the community again, somewhere for people to post their news (submit your news here: https://www.xml.com/news/submit-news-item/) and articles (see the guidelines at https://www.xml.com/about/contribute/). I added a job board to the site as well (if you’re in Berlin, Germany, or able to move there, look at the job currently posted; thanks LambdaWerk!); if your employer might want to post XML-related jobs please email me.

The old content should mostly be available but some articles were previously available at two (or more) locations and may now only be at one; try the archive list (https://www.xml.com/pub/a/archive/) if you’re looking for something. Please let me know if something major is missing from the archives.

XML is used in a lot of areas, and there is a wealth of knowledge in this community. If you’d like to write an article, send me your ideas. If you have comments on the site, let me know that as well.

Just in time as President Trump is about to stir, vigorously, that big pot of crazy known as federal data.

Mapping, processing, transformation demands will grow at an exponential rate.

Notice the emphasis on demand.

Taking two weeks to write custom software to sort files (you know the Weiner/Abedin laptop story, yes?) won’t be acceptable quite soon.

How are your on-demand XML chops?

Defeating Police Formations – Parallel Distributed Protesting

January 16th, 2017

If you haven’t read FEMA’s Field Force Operations PER-200, then you are unprepared for #DisruptJ20 or any other serious protest effort.

It’s a real snore in parts, but knowing police tactics will:

  1. Eliminate the element of surprise and fear of the unexpected
  2. Enable planning of protective clothing and other measures
  3. Enable planning of protests to eliminate police advantages
  4. Enable protesters to respond with their own formations

among other things.

One Common Police Formation

While reading Field Force Operations PER-200, I encountered several police formations you are likely to see at #DisruptJ20.

The crossbow arrest formation is found at pages 48-49 and illustrated with:

[Images: three diagrams of the crossbow arrest formation]

A number of counter tactics suggest themselves, depending upon your views on non-violence. Passive resistance by anyone who is arrested, thereby consuming more police personnel to secure their arrest. Passively preventing the retreat of the arrest team and its security circle. Breaching the skirmish line on either side of the column, just before the column surges forward, exposing the flank of the column.

Requirements for the crossbow arrest formation

What does the crossbow arrest formation require more than anything else?

You peeked! ;-)

Yes, the police formations in Field Force Operations PER-200, including the crossbow arrest formation, all require a crowd.

Don’t get me wrong, crowds can be a good thing and sometimes the only solution. Standing Rock is a great example of taking and holding a location against all odds.

But a great tactic for one protest and its goals, may be a poor tactic for another protest, depending upon goals, available tactics, resources, etc.

Consider the planned and permitted protests for #DisruptJ20.

All are subject to the police formations detailed by FEMA and the use of “less lethal” force by police forces.

How can #DisruptJ20 demonstrate the anger of the average citizen and at the same time defeat police formations?

Parallel Distributed Protesting

Instead of massing in a crowd, where police formations and “less lethal” force are options, what if protesters stopped, ran out of gas, or had flat tires on the 64-mile DC Beltway?

I mention the length of the Beltway, 64 miles, because it is ten miles longer than the marches from Selma to Montgomery, Alabama. You may remember one of those marches; it’s documented at The incident at the Edmund Pettus Bridge.

On March 7, 1965, Representative John Lewis, Hosea Williams and other protesters marched across the Pettus bridge knowing that brutality and perhaps death awaited them.

Protesters who honor Lewis, Williams and other great civil rights leaders can engage in parallel distributed protesting on January 20, 2017.

Each car slowing, stopping, having a flat tire, is a distributed protest point. With distributed protest points occurring in parallel, the Beltway grinds to a halt. No one enters or leaves Washington, D.C. for a day.

Not the same as the footage from the Pettus Bridge, but shutting down the D.C. Beltway will be a news story for months and years to come.

fox5dc-map-460

Lewis, Williams and others were willing to march into the face of violence and evil. Are you willing to drive to the D.C. Beltway to stop, run out of gas or have a flat tire in their honor?

PS: Beltway blockaders should always be respectful of police officers. They probably don’t like what is happening any more than you do. Besides, their police cruisers are also blocking traffic so their presence is contributing to the gridlock as well.

Password Advice For Leakers

January 16th, 2017

What the Most Common Passwords of 2016 List Reveals [Research Study] by Keeper.

As a prospective leaker, if your password is any of the ones listed below, congratulations! (“123456” leads with 17%.)

Your password is in the top 50% of 10 million passwords analyzed by Keeper in 2016.

Extremely plausible evil hackers “discovered” your login and then “cracked” your password.

No longer a “leak,” but a theft and the thief isn’t you. (How’s that for protecting leakers?)

Rank Password
1. 123456
2. 123456789
3. qwerty
4. 12345678
5. 11111
6. 1234567890
7. 1234567
8. password
9. 123123
10. 987654321
11. qwertyuiop
12. mynoob
13. 123321
14. 666666
15. 18atcsk2w
16. 7777777
17. 1q2w3e4r
18. 654321
19. 555555
20. 3rjs1la7qe
21. google
22. 1q2w3e4r5t
23. 123qwe
24. zxcvbnm
25. 1q2w3e
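Checking a password against the list is a one-liner in spirit. The sketch below copies the 25 entries exactly as they appear in the table above; the function name is my own invention.

```python
# The 25 passwords from the table above, in rank order.
COMMON_PASSWORDS = [
    "123456", "123456789", "qwerty", "12345678", "11111",
    "1234567890", "1234567", "password", "123123", "987654321",
    "qwertyuiop", "mynoob", "123321", "666666", "18atcsk2w",
    "7777777", "1q2w3e4r", "654321", "555555", "3rjs1la7qe",
    "google", "1q2w3e4r5t", "123qwe", "zxcvbnm", "1q2w3e",
]

def common_password_rank(password):
    """Return the 1-based rank of `password` in the list, or None if absent."""
    try:
        return COMMON_PASSWORDS.index(password) + 1
    except ValueError:
        return None

if __name__ == "__main__":
    for candidate in ("qwerty", "correct horse battery staple"):
        print(candidate, "->", common_password_rank(candidate))
```

Any rank at all means your password qualifies for the plausible "evil hackers" story.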

You must follow the leaking instructions at: https://theintercept.com/leak/, but leak only your login, password and network URL.

No guarantees that The Intercept will take the initiative but they aren’t the only game in town.

Highly Effective Gmail Phishing

January 16th, 2017

Wide Impact: Highly Effective Gmail Phishing Technique Being Exploited by Mark Maunder.

From the post:

As you know, at Wordfence we occasionally send out alerts about security issues outside of the WordPress universe that are urgent and have a wide impact on our customers and readers. Unfortunately this is one of those alerts. There is a highly effective phishing technique stealing login credentials that is having a wide impact, even on experienced technical users.

I have written this post to be as easy to read and understand as possible. I deliberately left out technical details and focused on what you need to know to protect yourself against this phishing attack and other attacks like it in the hope of getting the word out, particularly among less technical users. Please share this once you have read it to help create awareness and protect the community.

Mark’s omission of the “technical details” makes this more of an advertisement for phishing with Gmail than a how-to guide.

Still, the observation that even “experienced technical users” are trapped by this technique should encourage journalists in particular to consider adding phishing, voluntary or otherwise, to their data gathering toolkit.

As I pointed out yesterday, Phishing As A Public Service – Leak Access, Not Data, enabling leakers to choose to receive phishing emails can result in greater access to documents by reporters at less risk to leakers.

With the daily hype about data breaches, who can blame some mid-level management type for their computer being breached? Oh, it could result in loss of employment, maybe, but greatly reduces the odds of being fingered as a leaker.

Unlike plain brown paper wrappers with Glenn Greenwald‘s address on them. ;-)

If phishing sounds a bit exotic, consider listing software/versions with known vulnerabilities that users can install and then visit a website for an innocent registration that captures their details.

Journalism as active information gathering as opposed to consuming leaks and government hand-outs.

Phishing As A Public Service – Leak Access, Not Data

January 15th, 2017

The Intercept tweeted today:

[Image: screenshot of The Intercept’s tweet]

Kudos to The Intercept for reaching out to (US) federal employees to encourage safe leaking.

On the other hand, have you thought about the allocation of risks for leaking?

Take Edward Snowden for example. If caught, Snowden is going to jail, NOT Glenn Greenwald or other reporters who used the Snowden leak.

The Intercept has a valid point when it says:


Without leaks, journalists would have never connected the Watergate scandal to President Nixon, or discovered that the Reagan White House illegally sold weapons to Iran. In the past 15 years alone, inside sources played a vital role in uncovering secret prisons, abuses at Abu Ghraib, atrocities in Afghanistan and Iraq, and mass surveillance by the NSA.

At least historically speaking. Back in the days when hard copy was the norm.

Hard copy isn’t the norm now and leaking guidelines need to catch up to the present day.

Someone could have leaked a portion of the Office of Personnel Management records but in a modern age, digital was far more powerful. (That was a straight hack but it illustrates the difference between sweaty smuggling of hard copy versus giving others the key to a vault.)

Instead of leaking documents/data, imagine following these instructions:

The best option is to use our SecureDrop server, which has the advantage of allowing us to send messages back to you, while allowing you to remain totally anonymous — even to us, if that is what you prefer.

  • Begin by bringing your personal computer to a Wi-Fi network that isn’t associated with you or your employer, like one at a coffee shop. Download the Tor Browser. (Tor allows you to go online while concealing your IP address from the websites you visit.)
  • You can access our SecureDrop server by going to http://y6xjgkgwj47us5ca.onion/ in the Tor Browser. This is a special kind of URL that only works in Tor. Do NOT type this URL into a non-Tor Browser. It won’t work — and it will leave a record.
  • If that is too complicated, or you don’t wish to engage in back-and-forth communication with us, a perfectly good alternative is to simply send mail to P.O. Box 65679, Washington, D.C., 20035, or to The Intercept, 114 Fifth Avenue, 18th Floor, New York, New York, 10011. Drop it in a mailbox (do not send it from home, work or a post office) with no return address.

And you send the following:

  1. Your email address
  2. Screen shots of legitimate emails you get on a regular basis
  3. What passwords are the most important

That’s it.

The receiver constructs a phishing email and sends it to your address.

Like John Podesta and numerous other public figures, you are taken in by this scam.

Evil doers use your present password for access and you have system recorded evidence that you were duped.

How does that allocation of risk look to you, as a potential leaker?

PS: Some, but not all, journalists will be quick to point out what I suggest is, drum roll, illegal. OK, and the question?

Those journalists are being very brave on behalf of leakers, knowing they will never share the fate of a leaker.

I make an exception for all the very brave journalists writing outside of the United States and a few other areas at great personal risk. But then they are unlikely to be concerned with the niceties of the law when dealing with a rogue government.

Update: Apologies but I forgot to include a link to the original post: Attention Federal Employees: If You See Something, Leak Something.

Thoughts on Blockading Metro Rail Stops

January 15th, 2017

A recent news report mentioned the potential for blockades of DC Metro Rail stops.

Curbed Washington DC posted a list of those stops, but like many reporters, did not provide links to the stops.

:-(

Here’s their list:

[Image: Curbed Washington DC’s list of Metro stops by ticket color]

Metro Stops with Hyperlinks

Here’s my version, in the same ticket color order:

Presented as the original, the list leaves the impression of more Metro stops than require blockading. The “apparent” count of Metro stops is twelve (12).

Discovering Duplicate Metro Rail Stops

Rearrangement by Metro Rail stops reveals duplicates:

Deduped Metro Stops and Priority Map

If we remove the duplicate stops and sort by stop name, we find only eight (8) Metro Stops for blockading.

  1. Capitol South Green Ticket Holders
  2. Eastern Market Green Ticket Holders
  3. Federal Center SW Orange Ticket Holders, Silver Ticket Holders
  4. Gallery Place-Chinatown Blue Ticket Holders, Red Ticket Holders
  5. Judiciary Square Blue Ticket Holders, Red Ticket Holders
  6. L’Enfant Plaza Orange Ticket Holders, Silver Ticket Holders
  7. NoMa-Gallaudet U Yellow Ticket Holders
  8. Union Station Yellow Ticket Holders

All of this is public information and with a little rearrangement, it becomes easier to focus resources on any potential blockading of those stops.
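The rearrangement is a simple group-by. A minimal sketch, assuming the twelve (ticket color, stop) pairs implied by the deduplicated list above; the variable and function names are mine.

```python
from collections import defaultdict

# (ticket color, Metro stop) pairs, as implied by the deduplicated list above.
ASSIGNMENTS = [
    ("Green", "Capitol South"),
    ("Green", "Eastern Market"),
    ("Orange", "Federal Center SW"),
    ("Orange", "L'Enfant Plaza"),
    ("Silver", "Federal Center SW"),
    ("Silver", "L'Enfant Plaza"),
    ("Blue", "Gallery Place-Chinatown"),
    ("Blue", "Judiciary Square"),
    ("Red", "Gallery Place-Chinatown"),
    ("Red", "Judiciary Square"),
    ("Yellow", "NoMa-Gallaudet U"),
    ("Yellow", "Union Station"),
]

def stops_by_name(assignments):
    """Group ticket colors under each stop, sorted by stop name."""
    grouped = defaultdict(list)
    for color, stop in assignments:
        grouped[stop].append(color)
    return {stop: sorted(colors) for stop, colors in sorted(grouped.items())}

if __name__ == "__main__":
    stops = stops_by_name(ASSIGNMENTS)
    print(len(ASSIGNMENTS), "listed stops,", len(stops), "distinct stops")
    for stop, colors in stops.items():
        print(stop, "-", ", ".join(colors))
```

Twelve entries in, eight distinct stops out, which matches the counts above.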

In terms of priorities, Curbed Washington DC posted a map of the gate locations and guest sections for ticket holders. I took a screen-shot of the center portion:

[Image: map of gate locations and guest sections for ticket holders]

If you are interested in activities around the checkpoints, see the larger map.

So You Want To Blockade A Metro Stop

A map of Union Station reminded me that open street blockading isn’t likely to close a Metro Rail stop.

Why? Even with a large number of hardened protesters, the police can approach you from all sides, driving you in particular directions with “less lethal” weapons.

But the architecture of a Metro Rail stop offers an alternative strategy to open air resistance.

Don’t blockade outside a Metro Rail stop, blockade the stop by occupying stairwells, access points, etc.

Anyone opposing the blockade will seek to restore service and so be less likely to use persistent gases or other irritants in closed spaces.

The other advantage of escalators and stairways is that the police can only approach from in front of or behind you, enabling you to defend the edges of your formation with layers of the most recalcitrant protesters.

I know you intend to peacefully and lawfully assemble only but be aware you may have those in your midst who damage and/or disable turnstiles. Either with some variety of fast acting adhesives or jamming them with thin metallic objects. Although illegal, those acts will also contribute to delaying the restoration of full service.

More thoughts on blockades to reduce the number of people reaching Metro Rail stops tomorrow.

PS: It’s unfortunate the Metro doesn’t use tokens anymore. There are some interesting things that can happen with tokens.

New Spaceship Speed in Conway’s Game of Life

January 14th, 2017

New Spaceship Speed in Conway’s Game of Life by Alexy Nigin.

From the post:

In this article, I assume that you have basic familiarity with Conway’s Game of Life. If this is not the case, you can try reading an explanatory article but you will still struggle to understand much of the following content.

The day before yesterday ConwayLife.com forums saw a new member named zdr. When we the lifenthusiasts meet a newcomer, we expect to see things like “brand new” 30-cell 700-gen methuselah and then have to explain why it is not notable. However, what zdr showed us made our jaws drop.

It was a 28-cell c/10 orthogonal spaceship:

An animated image of the spaceship

… (emphasis in the original)

The mentioned introduction isn’t sufficient to digest the material in this post.
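For readers new to the notation: a speed of c/10 means the pattern reappears, shifted by one cell, every 10 generations. The sketch below is my own minimal Life implementation, not anything from the post, and it checks the speed of the familiar glider (c/4 diagonal) rather than zdr's spaceship, whose cells I don't reproduce here.

```python
from collections import Counter

def step(live):
    """Advance a set of live (x, y) cells by one generation."""
    neighbours = Counter(
        (x + dx, y + dy)
        for x, y in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth on exactly 3 neighbours; survival on 2 or 3.
    return {cell for cell, n in neighbours.items()
            if n == 3 or (n == 2 and cell in live)}

def displacement_after(live, generations):
    """Return (dx, dy) if the pattern reappears shifted, else None."""
    current = set(live)
    for _ in range(generations):
        current = step(current)
    if not current:
        return None
    dx = min(x for x, _ in current) - min(x for x, _ in live)
    dy = min(y for _, y in current) - min(y for _, y in live)
    shifted = {(x + dx, y + dy) for x, y in live}
    return (dx, dy) if shifted == current else None

# The standard glider, with y increasing downward.
GLIDER = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}

if __name__ == "__main__":
    print(displacement_after(GLIDER, 4))
```

Feed the same function a c/10 orthogonal spaceship and 10 generations, and it should report a one-cell shift along a single axis.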

There is a wealth of material available on cellular automata (the Game of Life is one).

LifeWiki is one source and Complex Cellular Automata is another. While they are not exhaustive of all there is to know about cellular automata, familiarity with them will take some time and skill.

Still, I offer this as encouragement that fundamental discoveries remain to be made.

But if and only if you reject conventional wisdom that prevents you from looking.

Looking up words in the OED with XQuery [Details on OED API Key As Well]

January 14th, 2017

Looking up words in the OED with XQuery by Clifford Anderson.

Clifford has posted a gist of work from the @VandyLibraries XQuery group, looking up words in the Oxford English Dictionary (OED) with XQuery.

To make full use of Clifford’s post, you will need a key for the Oxford Dictionaries API.

If you go straight to the regular Oxford English Dictionary (I’m omitting the URL so you don’t make the same mistake), there is nary a mention of the Oxford Dictionaries API.

The free plan allows 3K queries a month.

Not enough to shut out the outside world for the next four/eight years but enough to decide if it’s where you want to hide.
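Clifford's gist is XQuery, but the shape of a lookup is easy to see in any language. This is a hedged sketch only: the base URL and the `app_id`/`app_key` header names are my recollection of the version 1 API and should be checked against the current Oxford Dictionaries documentation. It builds the request without sending it, so no key is needed to run it.

```python
from urllib.parse import quote

# Assumed v1 base URL; verify against the Oxford Dictionaries API docs.
API_BASE = "https://od-api.oxforddictionaries.com/api/v1"

def build_entry_request(word, app_id, app_key, language="en"):
    """Return (url, headers) for a dictionary entry lookup; nothing is sent."""
    url = f"{API_BASE}/entries/{language}/{quote(word.lower())}"
    headers = {"app_id": app_id, "app_key": app_key}  # assumed header names
    return url, headers

def daily_budget(monthly_quota=3000, days=30):
    """Spread the free plan's monthly quota evenly across a month."""
    return monthly_quota // days

if __name__ == "__main__":
    url, headers = build_entry_request("Muckraker", "my-id", "my-key")
    print(url)
    print(daily_budget(), "queries per day")
```

Hand the URL and headers to whatever HTTP client you prefer once your key arrives.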

Application for the free API key was simple enough.

Save that the dumb password checker insisted on one or more special characters, plus one or more digits, plus upper and lowercase. When you get beyond 12 characters the insistence on a special character is just a little lame.
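A back-of-the-envelope calculation shows why. The sketch assumes passwords drawn uniformly at random from the stated alphabet, which real passwords are not, so treat the numbers as upper bounds; the names are mine.

```python
import math

LETTERS_AND_DIGITS = 26 + 26 + 10         # upper, lower, digits
WITH_SPECIALS = LETTERS_AND_DIGITS + 32   # plus ASCII punctuation

def brute_force_bits(length, alphabet_size):
    """Bits of search space for a uniformly random password."""
    return length * math.log2(alphabet_size)

if __name__ == "__main__":
    for length, size in ((12, WITH_SPECIALS), (14, LETTERS_AND_DIGITS)):
        print(length, "chars from", size, "symbols:",
              round(brute_force_bits(length, size), 1), "bits")
```

Under that assumption, two extra letters or digits buy more search space than forcing special characters into a 12-character password.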

Email response with the key was fast, so I’m in!

What about you?

D-Wave Just Open-Sourced Quantum Computing [DC Beltway Parking Lot Distraction]

January 13th, 2017

D-Wave Just Open-Sourced Quantum Computing by Dom Galeon.

D-Wave has just released a welcome distraction for CS types sitting in the DC Beltway Parking Lot on January 20-21, 2017. (I’m assuming you brought extra batteries for your laptop.) After you run out of gas, your laptop will be running on battery power alone.

Just remember to grab a copy of Qbsolv before you leave for the tailgate/parking lot party on the Beltway.

A software tool known as Qbsolv allows developers to program D-Wave’s quantum computers even without knowledge of quantum computing. It has already made it possible for D-Wave to work with a bunch of partners, but the company wants more. “D-Wave is driving the hardware forward,” Bo Ewald, president of D-Wave International, told Wired. “But we need more smart people thinking about applications, and another set thinking about software tools.”

To that end, D-Wave has open-sourced Qbsolv, making it possible for anyone to freely share and modify the software. D-Wave hopes to build an open source community of sorts for quantum computing. Of course, to actually run this software, you’d need access to a piece of hardware that uses quantum particles, like one of D-Wave’s quantum computers. However, for the many who don’t have that access, the company is making it possible to download a D-Wave simulator that can be used to test Qbsolv on other types of computers.

This open-source Qbsolv joins an already-existing free software tool called Qmasm, which was developed by one of Qbsolv’s first users, Scott Pakin of Los Alamos National Laboratory. “Not everyone in the computer science community realizes the potential impact of quantum computing,” said mathematician Fred Glover, who’s been working with Qbsolv. “Qbsolv offers a tool that can make this impact graphically visible, by getting researchers and practitioners involved in charting the future directions of quantum computing developments.”

D-Wave’s machines might still be limited to solving optimization problems, but it’s a good place to start with quantum computers. Together with D-Wave, IBM has managed to develop its own working quantum computer in 2000, while Google teamed up with NASA to make their own. Eventually, we’ll have a quantum computer that’s capable of performing all kinds of advanced computing problems, and now you can help make that happen.

From the github page:

qbsolv is a metaheuristic or partitioning solver that solves a potentially large quadratic unconstrained binary optimization (QUBO) problem by splitting it into pieces that are solved either on a D-Wave system or via a classical tabu solver.

The phrase, “…might still be limited to solving optimization problems…” isn’t as limiting as it might appear.
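To make the QUBO formulation concrete, here is a toy example solved by plain enumeration. This is not qbsolv's partitioning or tabu method, and the problem instance is invented: it encodes "pick as many nodes as possible on the path 0-1-2 without picking two neighbours" as a reward of -1 per node and a penalty of +2 per adjacent pair.

```python
from itertools import product

def qubo_energy(Q, x):
    """Energy sum of Q[i, j] * x[i] * x[j] for a binary vector x."""
    return sum(w * x[i] * x[j] for (i, j), w in Q.items())

def brute_force_qubo(Q, n):
    """Try all 2**n assignments; only feasible for tiny n."""
    best = min(product((0, 1), repeat=n), key=lambda x: qubo_energy(Q, x))
    return best, qubo_energy(Q, best)

# Invented toy instance: diagonal terms reward picking a node,
# off-diagonal terms penalize picking both ends of an edge.
Q = {(0, 0): -1, (1, 1): -1, (2, 2): -1, (0, 1): 2, (1, 2): 2}

if __name__ == "__main__":
    print(brute_force_qubo(Q, 3))
```

Enumeration doubles in cost with every added variable, which is the gap that heuristic solvers and quantum annealers aim to fill.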

A recent (2014) survey of quadratic unconstrained binary optimization (QUBO), The Unconstrained Binary Quadratic Programming Problem: A Survey runs some thirty-three pages and should keep you occupied however long you sit on the DC Beltway.

From page 10 of the survey:


Kochenberger, Glover, Alidaee, and Wang (2005) examine the use of UBQP as a tool for clustering microarray data into groups with high degrees of similarity.

Where I read one person’s “similarity” to be another person’s test of “subject identity.”

PS: Enjoy the DC Beltway. You may never see it motionless ever again.

Calling Bullshit in the Age of Big Data (Syllabus)

January 13th, 2017

Calling Bullshit in the Age of Big Data by Carl T. Bergstrom and Jevin West.

From the about page:

The world is awash in bullshit. Politicians are unconstrained by facts. Science is conducted by press release. So-called higher education often rewards bullshit over analytic thought. Startup culture has elevated bullshit to high art. Advertisers wink conspiratorially and invite us to join them in seeing through all the bullshit, then take advantage of our lowered guard to bombard us with second-order bullshit. The majority of administrative activity, whether in private business or the public sphere, often seems to be little more than a sophisticated exercise in the combinatorial reassembly of bullshit.

We’re sick of it. It’s time to do something, and as educators, one constructive thing we know how to do is to teach people. So, the aim of this course is to help students navigate the bullshit-rich modern environment by identifying bullshit, seeing through it, and combatting it with effective analysis and argument.

What do we mean, exactly, by the term bullshit? As a first approximation, bullshit is language intended to persuade by impressing and overwhelming a reader or listener, with a blatant disregard for truth and logical coherence.

While bullshit may reach its apogee in the political sphere, this isn’t a course on political bullshit. Instead, we will focus on bullshit that comes clad in the trappings of scholarly discourse. Traditionally, such highbrow nonsense has come couched in big words and fancy rhetoric, but more and more we see it presented instead in the guise of big data and fancy algorithms — and these quantitative, statistical, and computational forms of bullshit are those that we will be addressing in the present course.

Of course an advertisement is trying to sell you something, but do you know whether the TED talk you watched last night is also bullshit — and if so, can you explain why? Can you see the problem with the latest New York Times or Washington Post article fawning over some startup’s big data analytics? Can you tell when a clinical trial reported in the New England Journal or JAMA is trustworthy, and when it is just a veiled press release for some big pharma company?

Our aim in this course is to teach you how to think critically about the data and models that constitute evidence in the social and natural sciences.

Learning Objectives

Our learning objectives are straightforward. After taking the course, you should be able to:

  • Remain vigilant for bullshit contaminating your information diet.
  • Recognize said bullshit whenever and wherever you encounter it.
  • Figure out for yourself precisely why a particular bit of bullshit is bullshit.
  • Provide a statistician or fellow scientist with a technical explanation of why a claim is bullshit.
  • Provide your crystals-and-homeopathy aunt or casually racist uncle with an accessible and persuasive explanation of why a claim is bullshit.

We will be astonished if these skills do not turn out to be among the most useful and most broadly applicable of those that you acquire during the course of your college education.

A great syllabus and impressive set of readings, although I must confess my disappointment that Is There a Text in This Class? The Authority of Interpretive Communities and Doing What Comes Naturally: Change, Rhetoric, and the Practice of Theory in Literary and Legal Studies, both by Stanley Fish, weren’t on the list.

Bergstrom and West are right about the usefulness of this “class” but I would use Fish and other literary critics to push your sensitivity to “bullshit” a little further than the readings indicate.

All communication is an attempt to persuade within a social context. If you share a context with a speaker, you are far more likely to recognize and approve of their use of “evidence” to make their case. If you don’t share such a context, say a person claiming a particular interpretation of the Bible due to divine revelation, their case doesn’t sound like it has any evidence at all.

It’s a subtle point but one known in the legal, literary and philosophical communities for a long time. That it’s new to scientists and/or data scientists speaks volumes about the lack of humanities education in science majors.

Security Design: Stop Trying to Fix the User (Or Catch Offenders)

January 13th, 2017

Security Design: Stop Trying to Fix the User by Bruce Schneier.

From the post:

Every few years, a researcher replicates a security study by littering USB sticks around an organization’s grounds and waiting to see how many people pick them up and plug them in, causing the autorun function to install innocuous malware on their computers. These studies are great for making security professionals feel superior. The researchers get to demonstrate their security expertise and use the results as “teachable moments” for others. “If only everyone was more security aware and had more security training,” they say, “the Internet would be a much safer place.”

Enough of that. The problem isn’t the users: it’s that we’ve designed our computer systems’ security so badly that we demand the user do all of these counterintuitive things. Why can’t users choose easy-to-remember passwords? Why can’t they click on links in emails with wild abandon? Why can’t they plug a USB stick into a computer without facing a myriad of viruses? Why are we trying to fix the user instead of solving the underlying security problem?

Traditionally, we’ve thought about security and usability as a trade-off: a more secure system is less functional and more annoying, and a more capable, flexible, and powerful system is less secure. This “either/or” thinking results in systems that are neither usable nor secure.

Non-reliance on users is a good first step.

An even better second step would create financial incentives for Bruce’s first step.

Financial incentives similar to those in products liability cases, where a “reasonable care” standard evolves over time. No product has to be perfect, but there are expectations of how not bad a product must be.

Liability not only for the producer of the software but also for enterprises using that software, when third parties are hurt by data breaches.

Claims about the complexity of software are true, but can you honestly say that software is more complex than drug interactions across an unknown population? Yet, we have products liability standards for those cases.

Without financial incentives, substantial financial incentives, such as with products liability, cybersecurity experts (Bruce excepted) will still be trying to “fix the user” a decade from now.

The romantic quest to capture and punish those guilty of cybercrime hasn’t worked so well. One collection of cybercrime statistics pointed out that detected cybercrime incidents increased by 38% in the last year.

Tell me, do you know of any statistics showing a 38% increase in the arrest and prosecution of cybercriminals in the last year? No? That’s what I thought.

With estimated cybercrime prevention spending at $80 billion this year and an estimated cybercrime cost of $2 trillion by 2019, you don’t seem to be getting very much return on your investment.

We know that fixing users doesn’t work and capturing cybercriminals is a dicey proposition.

Both of those issues can be addressed by establishing incentives for more secure software. (Legal liability takes legislative misjudgment out of the loop, enabling the organic growth of software liability principles.)

Ultrasound Tracking Defeats Tor (Provides Pathway Into Government Offices)

January 13th, 2017

Tor users at risk of being unmasked by ultrasound tracking by Danny Bradbury.

How close is your phone to your computer right now?

That close?

You may want to rethink your phone’s location.

From the post:

A new type of attack should make Tor users – and countless dogs around the world – prick up their ears. The attack, revealed at BlackHat Europe in November and at the 33rd Chaos Computer Congress the following month, uses ultrasounds to track users, even if they are communicating over anonymous networks.

The attack uses a technique called ultrasound cross-device tracking (uXDT), which made its way into advertising circles as early as 2012. Marketing companies running uXDT campaigns will play an ultrasonic sound, inaudible to the human ear, in a TV or radio ad, or even in an ad delivered via a computer browser.

Although the user won’t hear it, other devices such as smartphones using uXDT-enabled apps will be listening. When the app hears the signal, it will ping the advertising network with details about itself. What details? Anything it asks the phone for, such as its IP address, geolocation coordinates, telephone number and IMEI (SIM card) code.

That’s creepy enough in marketing. Now, advertisers can tell what TV or radio ads you’ve been listening to, matching them with the universe of other information they have about you from your web searches, social media activity and emails.

In essence the technique uses an ultrasound “beacon” to trigger your phone to “call home.”
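
The listening half of that "beacon" idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any real uXDT SDK works: the 19 kHz beacon frequency, the 44.1 kHz sample rate, the detection threshold and the function names are all hypothetical, and the "audio" is synthetic. It uses the Goertzel algorithm to measure the energy at a single frequency in a buffer of samples.

```python
import math

SAMPLE_RATE = 44100   # assumed microphone sample rate (Hz)
BEACON_HZ = 19000     # assumed near-ultrasonic beacon frequency (Hz)


def goertzel_power(samples, sample_rate, target_hz):
    """Estimate the signal power near target_hz with the Goertzel algorithm."""
    n = len(samples)
    k = round(n * target_hz / sample_rate)  # nearest DFT bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for sample in samples:
        s = sample + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def make_tone(freq_hz, n):
    """Generate n samples of a unit-amplitude sine wave (synthetic test audio)."""
    return [math.sin(2.0 * math.pi * freq_hz * t / SAMPLE_RATE) for t in range(n)]


def beacon_present(samples, threshold_ratio=0.1):
    """Report whether the buffer holds significant energy at BEACON_HZ."""
    # A full-scale tone centered on a bin yields a power of about (n / 2) ** 2.
    full_scale = (len(samples) / 2) ** 2
    return goertzel_power(samples, SAMPLE_RATE, BEACON_HZ) > threshold_ratio * full_scale


beacon = make_tone(BEACON_HZ, 4410)   # 100 ms of the assumed beacon tone
audible = make_tone(440, 4410)        # 100 ms of an ordinary audible tone

print(beacon_present(beacon), beacon_present(audible))
```

The unsettling part is how little is required: any app with microphone permission and a loop like this one can react to a sound you cannot hear.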

Hmmm, betrayed by your own phone.

Danny outlines a number of scenarios of governments using this technique against users.

Ultrasound tracking poses a significant risk for Tor users, but they are security conscious enough to be using Tor.

Consider the flip side of using ultrasound tracking as a pathway into government offices. A phone that can “call home” can certainly listen for keystrokes.

Where do you think most sysadmins keep their phones? ;-)