Archive for December, 2016

Merry Christmas To All Astronomers! (Pan-STARRS)

Tuesday, December 20th, 2016

The Panoramic Survey Telescopes & Rapid Response System (Pan-STARRS) dropped its data release on December 19, 2016.

Since you will want to jump straight to the details, check out: PS1 Data Processing procedures.

There is far more to be seen but here’s a shot of the sidebar:


Jim Gray favored the use of astronomical data because it was “big” (this was before “big data” became marketing hype) and it is free.


Auto Trump fact-checks – Alternative to Twitter Censorship

Monday, December 19th, 2016

Washington Post automatically inserts Trump fact-checks into Twitter by Sam Machkovech.

From the post:

In an apparent first for any American news outlet, the Washington Post released a Chrome plug-in on Friday designed to fact-check posts from a single Twitter account. Can you guess which one?

The new “RealDonaldContext” plug-in for the Google Chrome browser, released by WaPo reporter Philip Bump, adds fact-check summaries to selected posts by President-elect Donald Trump. Users will need to click a post in The Donald’s Twitter feed to see any fact-check information from the Washington Post, which appears as a gray text box beneath the tweet.

I differ with the Washington Post on its slavish reporting of unsubstantiated claims of the US intelligence community, but high marks for the “RealDonaldContext” plug-in for the Google Chrome browser!

What a great alternative to censoring “fake news” on Twitter! Fact check it!

Pointers to source code for similar plug-ins?

Clinton/Trump Political Maps – Strategy for 2020

Sunday, December 18th, 2016

A pair of maps posted by OnlMaps captures the essence of Clinton’s loss to Trump (no, it didn’t have anything to do with Russian hackers):



I did not re-scale these images, so they enlarge to 1200 x 714 (Clinton) and 1200 x 653 (Trump). Very impressive on a large screen.

Democrats should take note:

Despite having hundreds of position papers (yawn), the candidate with “well-reasoned and detailed proposals” lost to the candidate promising voters a pig in a poke, with no real likelihood of delivery of either.

If the choice is between boring voters into apathy and winning the presidency, I don’t find that a hard choice at all.

Do you?

The Biggest Fake News…

Sunday, December 18th, 2016

Naval Ravikant tweeted:

The biggest fake news is that the “fake news” debate is about anything other than censorship.

Any story/report/discussion/debate over “fake news,” should start with the observation that regulation, filtering, tagging, etc., of “fake news” is a form of censorship.

Press advocates of regulation, filtering, tagging “fake news” until they admit advocating censorship.

The only acceptable answer to censorship is NO. Well, perhaps Hell NO! but you get the idea.

Fight Censorship – Expand Content Flow! Censor Overflow!

Sunday, December 18th, 2016

Facebook, Twitter and others have undertaken demented and pernicious censorship campaigns. Depending upon your politics and preferences, some of their rationales may or may not be compelling to you.

All censorship solutions fail to honor the fundamental right of all users to choose to listen/view or not, whatever content they choose. Instead, these censors seek to impose their choices on everyone.

I’m indifferent to the motivations of censors, some of which I would find personally compelling. The fact remains that users and only users should exercise the right of choice over the content they consume. I would not interfere with that right, even to further my own views on appropriate content.

Having said all that, you no doubt have noticed that your freedom to consume the content of your choice is being rapidly curtailed by the aforementioned censors and others.

One practical defense against these censorious vermin is to explode the flow of content, producing a condition I call “censor overflow.”

Radio.Garden (which I posted on yesterday) is one source of new content.

Here are some others:

Australian Live Radio Some 268 “proper” radio stations (no internet only) from Australia.

InternetRadio As of today, 39,539 internet radio stations. Even more intriguing is the capability to create your own radio station. Servers are in London and the US so you will need to self-censor or find concealment for your station if you want to be edgy.

Over 4000 “proper” radio stations from across Europe.

Another “proper” radio station listing but this time with worldwide coverage.

A focus on online radio stations from around the world, now numbering more than 3000 stations.

Radio Station World A wider worldwide listing which expressly includes:

RadioStationWorld is an informational directory dealing with the radio broadcasters worldwide. We depend on many people around the world to help us keep the RadioStationWorld listings up to date. (And much thanks to those that take some time to help keep information up-to-date!) Some of the features you will find on our site include listings of local radio stations on the web, radio station that offer streaming webcast services, and in depth listings of local radio broadcast stations including digital radio throughout North America. Also featured are national and regional broadcast networks, shortwave radio, satellite radio, hospital radio, cable radio, closed circuit/campus radio and radio service providers, as well as a growing list of links to sites that deal with the radio broadcasting industry. Enjoy RadioStationWorld, we hope you find this site useful to whatever your needs are, but remember, we do depend on people like yourself to help update in an ever changing broadcast industry. [Correction: The shortwave radio broadcast listing has been withdrawn and the provided link points to a dead resource.]

TuneIn Radio

TuneIn enables people to discover, follow and listen to what’s most important to them — from sports, to news, to music, to talk. TuneIn provides listeners access to over 100,000 real radio stations and more than four million podcasts streaming from every continent. (emphasis in original)

For the sake of completeness, avoid the List of Internet radio stations at Wikipedia. It is too outdated to be anything other than a waste of time.

Contribute content, writing, sound, music, videos, graphics, images, anything that can bring us closer to a state of censor overflow!

No promises that censors will tire and go away, after all, censors have been censoring since Plato’s Republic.

But, we have more opportunities to bury censors in a tidal wave of content.

Which will be almost as enjoyable as the content in which we bury them.

Radio Show Host Manual

Saturday, December 17th, 2016

Host manual for the Software Engineering Radio

The manual if you want to do a show for Software Engineering Radio and quite possibly the manual for any radio show.


Consider the numbers (page 7, although engineers haven’t figured out pagination yet):

  • is in its 11th year with over 270 episodes;
  • published three times monthly by IEEE Software magazine;
  • is downloaded in aggregate 180,000 times or more per month (including current and back catalog), with each show reaching 30,000-40,000 within three months;
  • was named the #1 rated developer podcast based on an aggregation of hacker news comments;
  • appeared in The Simple Programmer’s ultimate list of developer podcasts;
  • was included among 11 podcasts that will make you a better software engineer;
  • is highly rated on iTunes “Top Podcasts” under the category Software:How To;
  • features thought leaders in the field (Eric Evans, David Heinemeier Hansson, Kent Beck, The Gang of Four, Rich Hickey, Michael Nygard, James Turnbull, Michael Stonebraker, Adrian Cockroft, Martin Fowler, Martin Odersky, Eric Brewer,…);
  • a demographic survey we did a few years ago indicated that most of our listeners are software engineers with 5-10 years experience, architects, and technical managers.

    Twenty-eight pages of information and suggestions.

    Instead of trolling internet censors and their suggestions, create high quality content. (Advice to myself as much as anyone else.)

    Expanding Your Bubble – Internet Radio Stations

    Saturday, December 17th, 2016

    Will Coldwell writes in Want to tune in to the world’s radio stations? Grow your listening with Radio.Garden:

    A new interactive online website allows users to explore radio stations around the world – as they broadcast live. It’s a timely project that celebrates human communication across borders.

    (graphic omitted)

    Even in the digital age, it’s an experience familiar to many: scrolling through a radio tuner, jumping from crackled voices to clearcut sound, shipping forecasts to pop tunes, in the hunt for a station you want to listen to.

    Now, you can experience this on a global scale, hopping thousands of the world’s radio stations. Launched this week, Radio.Garden is an interactive website that presents Earth covered in tiny dots, each representing a radio station that can be tuned into at the click of a button.

    Defaults to your location and after a bit of exploring, here’s my current location:


    The interface is very smooth and entertaining.

    Caveat on the location data. The image shown for the stations KANH-HD2 and KJIL-KJLG lists Emporia, United States as the “location.”

    If you look the two stations up, you will find them located in Meade and Lawrence, Kansas.

    Adding state/nation borders would help with navigation.

    Still, quite a joy to find.

    Indivisible: A Practical Guide for Resisting the Trump Agenda

    Friday, December 16th, 2016

    Indivisible: A Practical Guide for Resisting the Trump Agenda

    From page one:

    Donald Trump is the biggest popular vote loser in history to ever call himself President-Elect. In spite of the fact that he has no mandate, he will attempt to use his congressional majority to reshape America in his own racist, authoritarian, and corrupt image. If progressives are going to stop this, we must stand indivisibly opposed to Trump and the members of Congress who would do his bidding. Together, we have the power to resist – and we have the power to win.

    We know this because we’ve seen it before. The authors of this guide are former congressional staffers who witnessed the rise of the Tea Party. We saw these activists take on a popular president with a mandate for change and a supermajority in Congress. We saw them organize locally and convince their own members of Congress to reject President Obama’s agenda. Their ideas were wrong, cruel, and tinged with racism – and they won.

    We believe that protecting our values and neighbors will require mounting a similar resistance to the Trump agenda — but a resistance built on the values of inclusion, tolerance, and fairness. Trump is not popular. He does not have a mandate. He does not have large congressional margins. If a small minority in the Tea Party can stop President Barack Obama, then we the majority can stop a petty tyrant named Trump.

    To this end, the following chapters offer a step-by-step guide for individuals, groups, and organizations looking to replicate the Tea Party’s success in getting Congress to listen to a small, vocal, dedicated group of constituents. The guide is intended to be equally useful for stiffening Democratic spines and weakening pro-Trump Republican resolve.

    We believe that the next four years depend on citizens across the country standing indivisible against the Trump agenda. We believe that buying into false promises or accepting partial concessions will only further empower Trump to victimize our fellow citizens. We hope that this guide will provide those who share that belief useful tools to make Congress listen.

    Some twenty-two (22) pages follow that cover page and outline the basics of creating effective grass-roots influence on a member of Congress (MoC).

    If you can agree to follow this guide to what MoCs care about:


    then the advice in this guide will help you be effective.

    If you say “yes, but …” to any of those points, you need to go distract someone else from their worthy cause.

    Overall I think this guide is golden and remarkably honest:

    As discussed in the second chapter, we strongly recommend focusing on defense against the Trump agenda rather than developing an entire alternative policy agenda. This is time-intensive, divisive, and, quite frankly, a distraction, since there is zero chance that we as progressives will get to put our agenda into action at the federal level in the next four years. (emphasis in original)

    Democrats know all there is to know about creating divisions in the party and then losing.

    Your suggestions for data science and/or research aspects of this guide?

    The Joy of Collective Action: Elsevier Boycott – Germany

    Friday, December 16th, 2016

    Germany-wide consortium of research libraries announce boycott of Elsevier journals over open access by Cory Doctorow.

    Cory writes:

    Germany’s DEAL project, which includes over 60 major research institutions, has announced that all of its members are canceling their subscriptions to all of Elsevier’s academic and scientific journals, effective January 1, 2017.

    The boycott is in response to Elsevier’s refusal to adopt “transparent business models” to “make publications more openly accessible.”

    Just guessing but I suspect the DEAL project would welcome news of other consortia and schools taking similar action.

    Over the short term, scholars can tide themselves over with Sci-Hub.

    Cory ends:

    No full-text access to Elsevier journals to be expected from 1 January 2017 on [Göttingen State and University Library]

    How many libraries will you contact by the end of this year?

    Ringing the Clinton/Wikileaks Bell

    Friday, December 16th, 2016

    In Who Enabled Russian “Interference” With Election? (Facts, Yes, Facts), I posted queries against the New York Times Article API that counted all their stories on both Wikileaks and Hillary Clinton between September 1, 2016 and November 7, 2016.

    You can run the queries for yourself (unlike CIA “evidence” which remains a matter of rumor and conjecture) but the final results show that between September 1, 2016 and November 7, 2016, the New York Times published articles on Wikileaks and Hillary Clinton 252 times.
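
    For readers who want to rerun counts like these, here is a minimal sketch in Python of how such a query could be put together. It assumes the Article Search endpoint (`articlesearch.json`) with its `q`, `begin_date`, `end_date` and `api-key` parameters, and the `response.meta.hits` field in the reply. The search string, the function names and `YOUR_API_KEY` are placeholders of mine, not the queries from the original post.

```python
from urllib.parse import urlencode

# Assumed endpoint of the New York Times Article Search API (v2).
BASE_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_query_url(terms, begin_date, end_date, api_key):
    """Return a search URL for `terms` between two YYYYMMDD dates."""
    params = {
        "q": terms,
        "begin_date": begin_date,
        "end_date": end_date,
        "api-key": api_key,
    }
    return BASE_URL + "?" + urlencode(params)


def hit_count(reply):
    """Pull the total number of matching articles out of a decoded JSON reply."""
    return reply["response"]["meta"]["hits"]


# Hypothetical search string; the original post's exact queries are not shown here.
url = build_query_url("wikileaks clinton", "20160901", "20161107", "YOUR_API_KEY")
print(url)
```

    Fetching that URL with a real key and passing the decoded JSON to `hit_count` should give the total for the date range. The sketch deliberately stops short of the network call so it can be read and run without a key.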

    Eric Lipton, David E. Sanger and Scott Shane posted The Perfect Weapon: How Russian Cyberpower Invaded the U.S., which is a lengthy recounting of the events and coverage of the Clinton/Wikileaks story.

    The authors characterize the roles of the Times and the press as:

    Every major publication, including The Times, published multiple stories citing the D.N.C. and Podesta emails posted by WikiLeaks, becoming a de facto instrument of Russian intelligence.

    I responded to an earlier New York Times criticism of Wikileaks in Drip, Drip, Drip, Leaking At Wikileaks saying:

    The New York Times, a sometimes collaborator with Wikileaks (The War Logs (NYT)), has sponsored a series of disorderly and nearly incoherent attacks on Wikileaks for these leaks.

    The dominant theme in those attacks is that readers should not worry their shallow and insecure minds about social media but rely upon media outlets to clearly state any truth readers need to know.

    I am not exaggerating. The exact language that appears in one such attack was:

    …people rarely act like rational, civic-minded automatons. Instead, we are roiled by preconceptions and biases, and we usually do what feels easiest — we gorge on information that confirms our ideas, and we shun what does not.

    Is that how you think of yourself? It is how the New York Times thinks about you.

    There are legitimate criticisms concerning Wikileaks and its drip, drip, drip leaking but the Times manages to miss all of them.

    For example, the daily drops of Podesta emails, selected on some “unknown to the public” criteria, prevented the creation of a coherent narrative by reporters and the public. The next day’s leak might contain some critical link, or not.

    Reporters, curators and the public were teased with drips and drabs of information, which served to drive traffic to the Wikileaks site, traffic that serves no public interest.

    Wikileaks/Assange weren’t seeking a coherent narrative but rather a knee-jerk ringing of the Clinton/Wikileaks bell.

    Once all the emails appeared, there was some personal embarrassment to be sure but any New York cop would be saying: “Show’s over, nothing to see here, move along, move along.”

    The strategy of drip, drip, drip leaking kept the press in a high state of alert, despite the nearly universal disappointment that followed every actual leak.

    Lessons Learned?

    If the data for leaking is weak and/or mundane, wait for critical time frames when time for reflection is in short supply and deadlines are tight. Then leak with great show and promise the “next” leak will be the one with real juicy details.

    If your data is strong, “smoking gun,” sort of stuff, you may want to pick off opponents one at a time.

    What’s your strategy for leaking data?

    Sigh, Tolerance for Censorship is High

    Friday, December 16th, 2016

    Almost half of Americans believe government ‘responsible’ for tackling fake news by Alastair Reid.

    From the post:

    Americans are increasingly concerned about the impact of fake news and believe the government bears responsibility in stopping its spread, according to a new survey published today by the Pew Research Center.

    Almost 90 per cent of respondents believe fake news causes a “great deal” or “some” confusion about “the basic facts of current events”, and 45 per cent think the government, politicians or elected officials have a “great deal of responsibility” in stopping the spread of fake news.

    I am less concerned with the 75 per cent of people who believe fake stories to be true (BuzzFeed News) than the 45% who find it acceptable for government to combat fake news.

    I don’t know of any government or tech company I would trust to filter the content I see.


    The full Pew report.

    Tailgating @DisruptJ20

    Friday, December 16th, 2016

    As an improvement to my musings in How To Brick A School Bus, Data Science Helps Park It (Part 2), have you considered tailgating on the Washington DC Beltway on January 20, 2017?


    Joe Cahn, The Commissioner of tailgating, has numerous tips, recipes and suggestions on his site, saying on the homepage:

    I look forward to sharing our common ideas of tailgating food, family, country, and hopefully meeting you whether it be at a concert, NASCAR race, or the Super Bowl as I travel the parking lots researching and cataloging my annual travels of your favorite sporting event. Whether it be a media tour or just sitting in a hot tube in front of the stadium the commissioner of tailgating wants to bring you the fan all the LATEST TAILGATING TRENDS, WHERE FANS CAN COME TOGETHER TO SHARE, FIND, AND LEARN ABOUT TAIGATING FROM THE COMMISHIONER OF TAILGATING while I continue with my celebration of winners. How about those Black Hawks winning their 3rd Stanly Cup, any photo’s you can share? I’d like you to submit them, and I’ll get them posted right away. Did you say Triple Crown Winner, share your story, send a photo, take my poll and enjoy the one thing we all have in common tailgating the last great American Social.

    Combine protesting with the all-American social tradition of tailgating on the Beltway January 20, 2017.

    Tailgating requires more cooperation by a group of drivers, plus buses carrying protesters, to create entirely blocked areas for protesters to disembark and tailgaters to set up their grills, tables, etc. The safety of your tailgaters and protesters is a primary concern.

    Once you have established a tailgating/protest area, invite other drivers to join. Live music is too much to ask but you could use generators and sound equipment.

    Sports teams talk about their tailgate parties, you have the opportunity to create the most dispersed tailgate party in the history of tailgate parties!


    As of today, WeatherTAB is predicting for January 20, 2017, a 20% chance of rain/snow, high temperature 34 to 44 F, low temperature 14 to 24 F, no wind predictions.

    Take appropriate cold weather precautions, dress in layers, use buses as warming stations, etc. Post on social media to get advice from Redskins fans on tailgating in cold weather.


    The inauguration proper is set to begin around NOON, EST so if you are going to impede traffic flow with out of gas cars and tailgating parties, best to start 8:30 – 9:00 AM to have the maximum impact on attendance. Much earlier than that and you may be cleared away, although that effort will further impede traffic flow as well.


    Did you know that potential protesters are the biggest concern of inauguration planners? I kid you not: Inaugural planners’ biggest concern: Protesters.

    You have already set a world record and the event is thirty-five (35) days out!

    The details on street closures don’t appear until about ten days before the inauguration but I will post links to new inauguration data as it appears.

    PS: Be sure to ask for comfortable cold weather clothing as holiday presents. You will be needing it.

    [Or at least not any further]

    Friday, December 16th, 2016

    Write a list of things you would never do. Because it is possible that in the next year, you will do them. —Sarah Kendzior [1]

    We, the undersigned, are employees of tech organizations and companies based in the United States. We are engineers, designers, business executives, and others whose jobs include managing or processing data about people. We are choosing to stand in solidarity with Muslim Americans, immigrants, and all people whose lives and livelihoods are threatened by the incoming administration’s proposed data collection policies. We refuse to build a database of people based on their Constitutionally-protected religious beliefs. We refuse to facilitate mass deportations of people the government believes to be undesirable.

    We have educated ourselves on the history of threats like these, and on the roles that technology and technologists played in carrying them out. We see how IBM collaborated to digitize and streamline the Holocaust, contributing to the deaths of six million Jews and millions of others. We recall the internment of Japanese Americans during the Second World War. We recognize that mass deportations precipitated the very atrocity the word genocide was created to describe: the murder of 1.5 million Armenians in Turkey. We acknowledge that genocides are not merely a relic of the distant past—among others, Tutsi Rwandans and Bosnian Muslims have been victims in our lifetimes.

    Today we stand together to say: not on our watch, and never again.

    I signed up but FYI, the databases we are pledging to not build, already exist.

    The US Census Bureau collects information on race, religion and national origin.

    The Statistical Abstract of the United States: 2012 (131st Edition) Section 1. Population confirms the Census Bureau has this data:

    Population tables are grouped by category as follows:

    • Ancestry, Language Spoken At Home
    • Elderly, Racial And Hispanic Origin Population Profiles
    • Estimates And Projections By Age, Sex, Race/Ethnicity
    • Estimates And Projections–States, Metropolitan Areas, Cities
    • Households, Families, Group Quarters
    • Marital status And Living Arrangements
    • Migration
    • National Estimates And Projections
    • Native And Foreign-Born Populations
    • Religion

    To be fair, the privacy principles of the Census Bureau state:

    Respectful Treatment of Respondents: Are our efforts reasonable and did we treat you with respect?

    • We promise to ensure that any collection of sensitive information from children and other sensitive populations does not violate federal protections for research participants and is done only when it benefits the public good.

    Disclosure: I like the US Census Bureau. Left to their own devices, I don’t have any reasonable fear of their mis-using the data in question.

    But that’s the question isn’t it? Will the US Census Bureau be left to its own policies and traditions?

    I view the various “proposed data collection policies” of the incoming administration as intentional distractions. While everyone is focused on Trump’s Theater of the Absurd, appointments and policies at the US Census Bureau may achieve the same ends.

    Sign the pledge yes, but use FOIA requests, personal contacts with Census staff, etc., to keep track of the use of dangerous data at the Census Bureau and elsewhere.

    Instructions for adding your name to the pledge are found at:

    Assume Census Bureau staff are committed to their privacy and appropriate use policies. A friendly approach will be far more productive than a confrontational or suspicious one. Let’s work with them to maintain their agency’s long history of data security.

    “Inappropriate Pictures” – Bureaucratic Speak for…

    Thursday, December 15th, 2016

    A local news reporter covering a story of fire fighters who were dismissed for “inappropriate pictures” described the phrase as bureaucratic speak for what an unnamed source who had seen the pictures called “bad.”

    Whether you say “inappropriate pictures,” or “bad,” the report has nearly zero semantic content.

    To illustrate, here’s a quick summary:

    Four unnamed fire fighters were terminated in Cherokee County, GA because of “inappropriate pictures,” which were taken at some unknown fire station in Cherokee County, on some unknown date, involving a person or persons or animals or plants or minerals unknown. The “inappropriate pictures,” have also been described as “bad.”

    Do you see any “news” in that morass of undisclosed, unnamed, unknowns?

    It sounds more like a soft-porn ad than a news report.

    If you want to gain credibility as a reporter, try reporting facts on stories that inform the public on issues relevant to them. Leave the soft-porn to others.

    One-Off Email Hacks?

    Thursday, December 15th, 2016

    A tweet I saw this morning asked:

    If the DNC/RNC/campaigns were hacked, doesn’t that mean that Russia probably has all of our personal political info? @paix120

    While obtaining the files gathered by the DNC, RNC, etc. on voters would require direct hacks, there’s an unfortunate impression that hacking emails requires a direct hack of an account.

    Something along the lines of the hacker in War Games, only multiplied:


    But if I am interested in compromising/embarrassing emails, why on earth would I go to that much trouble?

    Consider the following network map as illustrative only:


    Whether it is the DNC, RNC or some other group, you need only secure an appropriate location upstream and harvest their network traffic.

    If you don’t have a privileged place on the appropriate network, you haven’t offered enough money.


    If you represent a nation-state or multi-national corporation, have you considered purchasing one or more networks and/or ISPs?

    One-off email hack tales distract from the larger issue that unencrypted email is always insecure.

    So, yes, if you are using unencrypted email then anyone and everyone has access to your email.

    Without the necessity of hacking your email account.
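
    To make the point concrete, here is a small sketch using Python's standard `email` package. It builds a message and prints it as serialized for sending, which is what anyone positioned upstream can read if no hop encrypts it. The addresses, subject and text are invented for illustration.

```python
from email.message import EmailMessage

# Hypothetical message; the addresses and text are invented for illustration.
msg = EmailMessage()
msg["From"] = "staffer@example.org"
msg["To"] = "chair@example.org"
msg["Subject"] = "Debate prep"
msg.set_content("Please keep this between us.")

# The message as serialized for sending. Without encryption on every hop,
# an observer upstream can read these bytes as they are.
wire_bytes = msg.as_bytes()

print(wire_bytes.decode("ascii"))
```

    Headers and body come out as ordinary readable text. No account was touched to produce them, only the traffic.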

    PS: I can gather links on the well known story of assembling network packets if you are really interested. It has been told at length and better than I can by others.

    How to weigh a dog with a ruler? [Or Price a US Representative?]

    Wednesday, December 14th, 2016

    How to weigh a dog with a ruler? (looking for translators)

    From the post:

    We are working on a series of comic books that introduce statistical thinking and could be used as activity booklets in primary schools. Stories are built around adventures of siblings: Beta (skilled mathematician) and Bit (data hacker).

    What is the connection between these comic books and R? All plots are created with ggplot2.

    The first story (How to weigh a dog with a ruler?) is translated to English, Polish and Czech. If you would like to help us to translate this story to your native language, just write to me (przemyslaw.biecek at gmail) or create an issue on GitHub. It’s just 8 pages long, translations are available on Creative Commons BY-ND licence.

    The key is to chart animals by their height as against their weight.

    Pricing US Representatives is likely to follow a similar relationship where their price goes up by years of service in Congress.

    I haven’t run the data but such a chart would keep “people” (includes corporations in the US) from paying too much or offering too little. To the embarrassment of all concerned.
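
    One way to turn a chart like that into an estimate is to fit a straight line through the points and read values off it. The sketch below does an ordinary least-squares fit in plain Python. That the comic uses exactly this method is my assumption, and the numbers are made up purely to show the mechanics.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up numbers, chosen only to show the mechanics (not measurements).
heights_cm = [20, 40, 60, 80]
weights_kg = [5, 15, 25, 35]

slope, intercept = fit_line(heights_cm, weights_kg)


def predict_weight(height_cm):
    """Read a weight off the fitted line for a given height."""
    return slope * height_cm + intercept


print(predict_weight(50))
```

    Swap in years of service for height and a price for weight and the same fit applies, once someone has actually gathered the data.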

    DeepBach: a Steerable Model for Bach chorales generation

    Wednesday, December 14th, 2016

    DeepBach: a Steerable Model for Bach chorales generation by Gaëtan Hadjeres and François Pachet.


    The composition of polyphonic chorale music in the style of J.S Bach has represented a major challenge in automatic music composition over the last decades. The art of Bach chorales composition involves combining four-part harmony with characteristic rhythmic patterns and typical melodic movements to produce musical phrases which begin, evolve and end (cadences) in a harmonious way. To our knowledge, no model so far was able to solve all these problems simultaneously using an agnostic machine-learning approach. This paper introduces DeepBach, a statistical model aimed at modeling polyphonic music and specifically four parts, hymn-like pieces. We claim that, after being trained on the chorale harmonizations by Johann Sebastian Bach, our model is capable of generating highly convincing chorales in the style of Bach. We evaluate how indistinguishable our generated chorales are from existing Bach chorales with a listening test. The results corroborate our claim. A key strength of DeepBach is that it is agnostic and flexible. Users can constrain the generation by imposing some notes, rhythms or cadences in the generated score. This allows users to reharmonize user-defined melodies. DeepBach’s generation is fast, making it usable for interactive music composition applications. Several generation examples are provided and discussed from a musical point of view.

    Take this with you on January 20, 2017 in case you tire of playing #DisruptJ20 Twitter Game (guessing XQuery/XPath definitions). Unlikely I know but anything can happen.

    Deeply impressive work.

    You can hear samples at:

    Download the code:

    Makes me curious about the composition of “like” works for composers who left smaller corpora.

    How To Brick A School Bus, Data Science Helps Park It (Part 2)

    Wednesday, December 14th, 2016

    Immediate reactions to How To Brick A School Bus, Data Science Helps Park It (Part 1) include:

    • Blocking a public street with a bricked school bus is a crime.
    • Publicly committing a crime isn’t on your bucket list.
    • School buses are expensive.
    • Turning over a school bus is dangerous.

    All true and all likely to diminish any enthusiasm for participation.

    Bright yellow school buses bricked and blocking transportation routes attract the press like flies to …, well, you know, but may not be your best option.

    Alternatives to a Bricked School Bus

    Despite the government denying your right to assemble near the inauguration on January 20, 2017 in Washington, D.C., what other rights could lead to a newsworthy result?

    You have the right to travel, although the Supreme Court has differed on the constitutional basis for that right. (Constitution of the United States of America: Analysis and Interpretation, 14th Amendment, page 1834, footnote 21).

    You also have the right to be inattentive, which I suspect is secured by the 9th Amendment:

    The enumeration in the Constitution, of certain rights, shall not be construed to deny or disparage others retained by the people.

    If we put the right to travel together with the right to be inattentive (or negligent), then it stands to reason that your car could run out of gas on the highways normally used to attend an inauguration.

    Moreover, we know from past cases, that drivers have not been held to be negligent simply for running out of gas, even at the White House.

    Where to Run Out of Gas?

    Interesting question and the one that originally had me reaching for historic traffic data.

    It does exist, yearly summaries (Virginia), Inrix (Washington, DC), Traffic Volume Maps (District Department of Transportation), and others.

    But we don’t want to be like the data scientist who used GPS and satellite data to investigate why you can’t get a taxi in Singapore when it rains (Starting Data Analysis with Assumptions). Crunching large amounts of data revealed that taxis in Singapore stop moving when it rains.

    Interesting observation but not the answer to the original question. Asking a local taxi driver, it was discovered that draconian traffic liability laws are the reason taxi drivers pull over when it rains. Not a “big data” question at all.

    What Do We Know About DC Metro Traffic Congestion?

    Let’s review what is commonly known about DC metro traffic congestion:

    D.C. tops list of nation’s worst traffic gridlock (2015), Study ranks D.C. traffic 2nd-worst in U.S. (2016), DC Commuters Abandon Metro, Making Already Horrible Traffic Even Worse (metro repairs make traffic far worse).

    At the outset, we know that motor vehicle traffic is a chaotic system, so small changes, such as the additional impediment of traffic flow by cars running out of gas, can have large effects. Especially on a system that teeters on the edge of gridlock every day.

    The loss of Metro usage has a cascading impact on metro traffic (from above). Which means blockage of access to Metro stations will exacerbate the impact of blockages on the highway system.

    Time and expense could be spent on overly precise positioning of out-of-gas cars, but a two part directive is just as effective if not more so:

    • Go to Metro station ingresses.
    • Go to any location on traffic map that is not red.

    Here’s a sample traffic map that has traffic cameras:


    From Fox5 DC but it is just one of many.

    The use of existing traffic maps removes the need to construct the same and enables chaotic participation, which means you quite innocently ran out of gas and did not at any time contact and/or conspire with others to run out of gas.

    Conspiracy is a crime and you should always avoid committing crimes.

    General Comments

    You may be wondering if authorities being aware of a theoretical discussion of people running out of gas will provoke effective counter measures?

    I don’t think so and here’s why: What would be the logical response of an authority? Position more tow trucks? Set up temporary refueling stations?

    Do you think the press will be interested in those changes? Such that not only do you have the additional friction of the additional equipment but the press buzzing about asking about the changes?

    An authority’s best strategy would be to do nothing at all but that advice is rarely taken. At the very best, local authorities will make transportation even more fragile in anticipation someone might run out of gas.

    Going by the numbers I hear tossed about for additional visitors (some activities, such as the Women’s March on Washington, are expecting more than 100,000), even random participation in running out of gas should have a significant impact.

    What if they held the inauguration to empty bleachers?

    Data Science Traditionalists – Don’t Re-invent the Wheel

    Nudging a chaotic traffic system into gridlock, for hours if not more than a day, may not strike you as traditional data science.

    Perhaps not but please don’t re-invent the wheel.

    If you want to be more precise, perhaps to block particular activities or locations, let me direct you to the Howard University Transportation Safety Data Center.

    They have the Traffic Count Database System (TCDS). Two screen shots that don’t do it justice:



    From their guide to the system:

    The Traffic Count Database System (TCDS) module is a powerful tool for the traffic engineer or planner to organize an agency’s traffic count data. It allows you to upload data from a traffic counter; view graphs, lists and reports of historic traffic count data; search for count data using either the database or the Google map; and print or export data to your desktop.

    This guide is for users who are new to the TCDS system. It will provide you with the tools to carry out many common tasks. Any features not discussed in this guide are considered advanced features. If you have further questions, feel free to explore the online help guide or to contact the staff at MS2 for assistance.

    I have referred to the inauguration of president-elect Donald J. Trump but the same lessons are applicable, with local modifications, to many other locations.

    PS: Nothing should be construed as approval and/or encouragement that you break local laws in any venue. Those vary from jurisdiction to jurisdiction and what are acceptable risks and consequences are entirely your decision.

    If you do run out of gas in or near Washington, DC on January 20, 2017, be polite to first-responders, including police officers. If you don’t realize your real enemies lie elsewhere, then you too have false class consciousness.

    If you are tail-gating on the “Beltway,” offer responders a soft drink (they are on duty) and a hot dog.

    Reporting in Aleppo: Can data science help?

    Wednesday, December 14th, 2016

    Reporting in Aleppo: Can data science help? by Nausicaa Renner. (Columbia Journalism Review)

    From the post:

    In war zones, reporting is hard to come by. Nowhere is this truer than in Syria, where many international journalists are banned, and more than one hundred journalists have been killed since the war began in early 2011. A deal was made on Tuesday between the Syrian government and the rebels allowing civilians and rebels to evacuate eastern Aleppo, but after years of bloody conflict, clarity is still hard to come by.

    Is there a way for data science to give access to understudied war zones? A project at the Center for Spatial Research at Columbia University, partly funded by the Tow Center for Digital Journalism, uses what information we do have to “link eyes in the sky with algorithms and ears on the ground” in Aleppo.

    The Center overlaid satellite images from 2012 to 2016 to create a map showing how Aleppo has changed: Destroyed buildings were identified by discrepancies in the images from year to year. Visualization can also put things in perspective; at a seminar the Center held, one student created a map showing how little the front lines of Aleppo have moved—a stark expression of the futility of war.

    As of this AM, I saw reports that the ceasefire mentioned in this post failed.

    The content is horrific but using the techniques described in The Twitterverse of Donald Trump to harvest Aleppo videos and images could preserve a record of the fall of Aleppo. Would mapping geo-locations to a map of Aleppo help document/confirm reports of atrocities?

    Unlike the wall of silence around US military operations, there is a great deal of first-hand data and opportunities for analysis and confirmation. (It’s hard to analyze or confirm a press briefing document.)

    Be Undemocratic – Think For Other People – Courtesy of Slate

    Wednesday, December 14th, 2016

    Feeling down? Left out of the “big boys” internet censor game by the likes of Facebook and Twitter?

    Dry your eyes! Slate has ridden to your rescue!

    Will Oremus writes in: Only You Can Stop the Spread of Fake News:

    Slate has created a new tool for internet users to identify, debunk, and—most importantly—combat the proliferation of bogus stories. Conceived and built by Slate developers, with input and oversight from Slate editors, it’s a Chrome browser extension called This Is Fake, and you can download and install it for free either on its home page or in the Chrome web store. The point isn’t just to flag fake news; you probably already know it when you see it. It’s to remind you that, anytime you see fake news in your feed, you have an opportunity to interrupt its viral transmission, both within your network and beyond.

    I’m glad Slate is taking the credit/blame for This is Fake.

    Can you name a more undemocratic position than assuming your fellow voters are incapable of making intelligent choices about the news they consume?

    Well, everybody but you and your friends. Right?

    Thanks for your offer to help Slate, but no thanks.

    The Twitterverse of Donald Trump, in 26,234 Tweets

    Tuesday, December 13th, 2016

    The Twitterverse of Donald Trump, in 26,234 Tweets by Lam Thuy Vo.

    From the post:

    We wanted to get a better idea of where President-elect Donald Trump gets his information. So we analyzed everything he has tweeted since he launched his campaign to take a look at the links he has shared and the news sources they came from.

    Step-by-step guide to the software and analysis of Trump’s tweets!
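
    As a rough illustration of the kind of link analysis the post describes (this is not the author’s code), the sketch below tallies which domains a set of tweets links to. The tweet structure, the helper names and the sample data are all hypothetical; a real harvest would supply the expanded URLs.

```python
from collections import Counter
from urllib.parse import urlparse


def source_domain(url: str) -> str:
    """Return the host of a URL, lower-cased and without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def count_sources(tweets):
    """Count how often each domain is linked across a list of tweets.

    Each tweet is a dict with a 'urls' list of already-expanded links
    (an assumed structure, not the exact shape of any Twitter API response).
    """
    counts = Counter()
    for tweet in tweets:
        for url in tweet.get("urls", []):
            counts[source_domain(url)] += 1
    return counts


# Made-up sample data, for illustration only.
sample_tweets = [
    {"text": "...", "urls": ["https://www.example.com/story-1"]},
    {"text": "...", "urls": ["http://example.com/story-2",
                             "https://news.example.org/a"]},
    {"text": "no links here", "urls": []},
]

source_counts = count_sources(sample_tweets)
for domain, n in source_counts.most_common():
    print(domain, n)
```

    Normalizing the host (lower case, no “www.”) keeps the same outlet from being counted under several spellings.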


    Follow: @lamthuyvo.

    Which public figure’s tweets are you going to track/analyze?

    How To Brick A School Bus, Data Science Helps Park It (Part 1)

    Tuesday, December 13th, 2016

    Apologies for being a day late! I was working on how the New York Times acted as a bullhorn for those election-interfering Russian hackers.

    We left off in Data Science and Protests During the Age of Trump [How To Brick A School Bus…] with:

    • How best to represent these no free speech and/or no free assembly zones on a map?
    • What data sets do you need to make protesters effective under these restrictions?
    • What questions would you ask of those data sets?
    • How to decide between viral/spontaneous action versus publicly known but lawful conduct, up until the point it becomes unlawful?

    I started this series of posts because the Women’s March on Washington wasn’t able to obtain a protest permit from the National Park Service due to a preemptive reservation by the Presidential Inauguration Committee.

    Since then, the Women’s March on Washington has secured a protest permit (sic) from the Metropolitan Police Department.

    If you are interested in protests organized for the convenience of government:

    “People from across the nation will gather at the intersection of Independence Avenue and Third Street SW, near the U.S. Capitol, at 10:00am” on Jan. 21, march organizers said in a statement on Friday.

    Each to their own.

    Bricking A School Bus

    We are all familiar with the typical school bus:


    By Die4kids (Own work) [GFDL or CC BY-SA 3.0], via Wikimedia Commons

    The saying, “no one size fits all,” applies to the load capacity of school buses. For example, the North Carolina School Bus Safety Web posted this spreadsheet detailing the empty (column I) and maximum weight (column R) of a variety of school bus sizes. For best results, get the GVWR (Gross Vehicle Weight Rating, maximum load) for your bus and then weigh it on reliable scales.

    Once you determine the maximum weight capacity of your bus, divide that weight by 4,000 pounds, the weight of one cubic yard of concrete. The result is the amount of concrete that you can have poured into your bus as part of the bricking process.

    I use the phrase “your bus” deliberately because pouring concrete into a school bus that doesn’t belong to you would be destruction of private property and thus a crime. Don’t commit crimes. Use your own bus.

    Once the concrete has hardened (for stability), drive to a suitable location. It’s a portable barricade, at least for a while.

    At a suitable location, puncture the tires on one side and tip the bus over. Remove/burn the tires.

    Consulting line 37 of the spreadsheet, with that bus, you have a barricade of almost 30,000 pounds, with no wheels.


    I’m still working on the data science aspects of where to park. More on that in How To Brick A School Bus, Data Science Helps Park It (Part 2), which I will post tomorrow.

    XQuery/XPath CRs 3.1! [#DisruptJ20 Twitter Game]

    Tuesday, December 13th, 2016

    Just in time for the holidays, new CRs for XQuery/XPath hit the street! Comments due by 2017-01-10.

    XQuery and XPath Data Model 3.1

    XML Path Language (XPath) 3.1

    XQuery 3.1: An XML Query Language

    XPath and XQuery Functions and Operators 3.1

    XQueryX 3.1

    #DisruptJ20 is too late for comments to the W3C but you can break the boredom of indeterminate waiting to protest excitedly for TV cameras and/or to be arrested.


    Play the XQuery/XPath 3.1 Twitter Game!

    Definitions litter the drafts and appear as:

    [Definition: A sequence is an ordered collection of zero or more items.]

    You Tweet:

    An ordered collection of zero or more items? #xquery

    Correct response:

    A sequence.

    Some definitions are too long to be tweeted in full:

    An expanded-QName is a value in the value space of the xs:QName datatype as defined in the XDM data model (see [XQuery and XPath Data Model (XDM) 3.1]): that is, a triple containing namespace prefix (optional), namespace URI (optional), and local name. (xpath-functions)

    Suggest you tweet:

    A triple containing namespace prefix (optional), namespace URI (optional), and local name.


    A value in the value space of the xs:QName datatype as defined in the XDM data model (see [XQuery and XPath Data Model (XDM) 3.1]).

    In both cases, the correct response:

    An expanded-QName.
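
    If you would rather not hunt for definitions by hand, the game cards can be generated. What follows is a minimal sketch, assuming the definitions appear in plain text as “[Definition: A term is body.]”; the regular expression, the function name and the 140-character limit are my assumptions, and definitions worded differently in the drafts will simply be skipped.

```python
import re

# Assumed shape: "[Definition: A <term> is <body>.]" in plain text.
DEFINITION = re.compile(
    r"\[Definition:\s*(An?|The)\s+(.+?)\s+is\s+(.+?)\.?\]", re.DOTALL
)
TWEET_LIMIT = 140  # assumed tweet length limit


def game_cards(spec_text, hashtag="#xquery"):
    """Turn each definition into a clue to tweet and the expected response."""
    cards = []
    for article, term, body in DEFINITION.findall(spec_text):
        body = " ".join(body.split())  # collapse line breaks and extra spaces
        clue = f"{body[0].upper()}{body[1:]}? {hashtag}"
        cards.append({
            "clue": clue,
            "response": f"{article} {term}.",
            "fits": len(clue) <= TWEET_LIMIT,  # False: trim the clue by hand
        })
    return cards


sample = "[Definition: A sequence is an ordered collection of zero or more items.]"
cards = game_cards(sample)
for card in cards:
    print(card["clue"])
    print(card["response"])
```

    Clues flagged with “fits” set to False are the ones, like the expanded-QName definition, that need to be shortened before tweeting.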

    Use a $10 burner phone and leave it unlocked at protests. If your phone is searched, imagine the attempts to break the “code.”

    You could agree on definitions/responses as instructions for direct action. But I digress.

    Who Enabled Russian “Interference” With Election? (Facts, Yes, Facts)

    Monday, December 12th, 2016

    I ask: “Who Assisted With Russian “Interference” With Election?,” because even if the DNC was hacked at the instigation of some Russian government agency, that doesn’t equal interference in the 2016 presidential election.

    Russian hackers can’t vote in the United States, with the possible exception of Chicago, so the initial hack had no impact on the election.

    Following the alleged Russian hack, the files were transferred to Wikileaks. At the time Julian Assange, editor-in-chief of Wikileaks, was residing in the Embassy of Ecuador, London. Julian and friends can’t vote in the United States either, again excepting Chicago.

    Up to this point, there is exactly zero impact on the 2016 US presidential election.

    Even Snopes concedes it is only unproven that Hillary Clinton wants to assassinate Julian Assange, so it isn’t difficult to imagine hard feelings on the part of Julian.

    Wikileaks has a long history of being equally difficult for all governments that needs no elaboration here. If you doubt that, you haven’t spent any time at the Wikileaks site. Take a day or so to satisfy yourself on that score and return to this post.

    In any event, with great fanfare and general disappointment with each release, Wikileaks trickled out the Podesta emails. John Podesta was Clinton’s campaign chairman.

    I saw the emails as did many others but still not in numbers that would constitute “interference” with an election.

    So, where did the Russian “interference” come from?

    Did the New York Times Enable Russian “Interference”?

    If you run this query: (%22Clinton%22AND%22Wikileaks%22)&begin_date=20160901

    with your own New York Times article API key, you will get (in part):


    In English: Between September 1, 2016 and November 7, 2016, both “Clinton” and “Wikileaks” occurred in 252 separate articles appearing in the New York Times.

    Over 68 days, that is roughly 3.7 articles per day in the New York Times on Hillary Clinton and Wikileaks.

    Did The Guardian Enable Russian “Interference”?

    If you run this query:

    with your own Guardian API key, you will get (in part):


    In English: Between September 1, 2016 and November 7, 2016, both “Clinton” and “Wikileaks” occurred 123 separate times in The Guardian.

    Over 68 days there were approximately 1.8 articles per day in The Guardian on Hillary Clinton and Wikileaks.
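
    The per-day rates follow directly from the counts quoted above and the length of the date range. A quick check of the arithmetic, using only the numbers reported in this post (the variable names are mine):

```python
from datetime import date

# Date range used in both queries, counted inclusively.
start, end = date(2016, 9, 1), date(2016, 11, 7)
days = (end - start).days + 1

# Article counts as reported in this post.
counts = {"New York Times": 252, "The Guardian": 123}

rates = {source: n / days for source, n in counts.items()}
for source, rate in rates.items():
    print(f"{source}: {rate:.1f} articles per day over {days} days")
```

    With these counts, 252 articles over 68 days works out to about 3.7 per day and 123 to about 1.8 per day.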

    Enabling “Interference”

    Let’s be clear, I chose the New York Times and The Guardian in part because they have public APIs but also to illustrate the absurdity of the claims of “interference” in an election by the Russians.

    The chain of “inference” runs something along the lines of:

    • Looks like Russian work (no fact/evidence)
    • Wikileaks is a Russian operative (facially false)
    • One or more of the editors of the New York Times are Russian sleeper agents (also facially false)

    I included the line about New York Times editors because the emails didn’t spread themselves from the Wikileaks servers, did they?


    To find Russian “interference” with the 2016 US presidential election you have to believe that Wikileaks, the New York Times, and The Guardian (and others) all acted in furtherance of a plan hatched by imaginary Russian hackers to release some of the dullest emails since the first email (1971).

    You can believe that based on specious assurances from known liars (Clapper comes to mind) but I’m passing.

    Congressional hearings should include every news source and commentator that repeated the Clinton/Wikileaks story so they can be sifted for “Russian” influence and fellow travelers.

    “We must, indeed, all hang together or, most assuredly, we shall all hang separately.” Benjamin Franklin.

    How To Defeat Grumpy Bear Email Leaks

    Monday, December 12th, 2016

    Grumpy Bear, a/k/a, Russia, is alleged to have interfered with the 2016 US presidential election.


    I say “alleged” because the “evidence,” if you want to call it that, consists of “looks like,” “similar to,” and similar comparisons, to as yet unrevealed evidence.

    Be mindful that the FBI, one of the supporters of the Russian interference rumor mill, had to get special Windows software written to separate husband and wife emails:

    Forensic computer experts developed new software to “de-dupe” the contents, weeding out duplicate emails. With a warrant in the FBI’s possession, agents can read Weiner’s emails and determine if any of the messages are relevant to Mrs. Clinton’s server.

    The CIA has the technical chops to make an assessment but you have to assume they aren’t aping the FBI from published news reports. The Department of Homeland Security echoes that opinion, but the CIA + FBI + DHS = one opinion. With no facts being given.

    Whatever conclusion you reach on the involvement or non-involvement of Grumpy Bear, Hillary Clinton had the ability to end the email “crisis” at the time of her own choosing.

    The simple solution was to release all of her unclassified emails and those of her campaign staff. Not just John Podesta‘s but all of them.

    For good measure, I would have thrown Bill Clinton‘s in as well. Give the press a target other than Hillary.

    After the initial shock of transparency wore off, the press would quickly discover there’s nothing there and move along. Something else, perhaps legitimate campaign issues, would have been at the top of the news.

    Don’t repeat Clinton’s mistake and allow a leak that has already happened to become an albatross around your neck.

    Dark Web Data Dumps

    Sunday, December 11th, 2016

    Dark Web Data Dumps by Sarah Jamie Lewis.

    From the webpage:

    A collection of structured data obtained from scraping the dark web.


    Researchers need more data about the dark web.

    The best resource we have right now are the Black Market Archives, scrapping of various marketplaces scrapped by Gwern et al in 2013.

    Much has changed since 2013, and complete web dumps, while useful for some research tasks, frustrate and complicate others.

    Further, governments & corporations are already building out such data in private & for profit.

    This Resource

    Sarah Jamie Lewis. Dark Web Data Dumps, 2016, 10 Dec 2016. Web. [access date]

    Valhalla Marketplace Listings October 2016

    Sarah Jamie Lewis. Dark Web Data Dumps, 2016, October 2016. Web. [access date]


    If you would like to support this, and other dark web research, please become a patron.

    Valhalla Marketplace Listings runs 1.3 MB and 16511 lines.

    Sans the Rolex watch ads, makes great New Year’s party material. 😉

    How To Leak To ProPublica (Caveat on Leaking)

    Sunday, December 11th, 2016

    How To Leak To ProPublica by David Sleight.

    From the post:

    Our job is to hold people and institutions accountable. And it requires evidence. Documents are a crucial part of that. We are always on the lookout for them — especially, now.

    Have you seen something that troubles you or that you think should be a story? Do you have a tip about something we should be investigating? Do you have documents or other materials that we should see? We want to hear from you.

    Here are a few ways to contact us or send us documents and other materials, safely, securely and anonymously as possible.

    Here is our staff list, which links to each of our bios and email addresses. Of course, email is convenient, but if your information is sensitive, there are better options.

    David outlines your options in detail:

    • Encrypted Messages and Calls
    • Encrypted Email
    • The Low-Tech, but Secure Option: Postal Mail
    • Super Hi-Tech, Time-Consuming but Maximum Security: SecureDrop

    One caveat on leaking, not specific to ProPublica: secure agreement on when the raw leak will be released.

    Enough time must be allowed for the reporters to prepare and benefit from the leak, but the public has an interest in comparing reports based on leaked information to the raw leaked information.

    4 Days Left – Submission Alert – XML Prague

    Sunday, December 11th, 2016

    A tweet by Jirka Kosek reminded me there are only 4 days left for XML Prague submissions!

    • December 15th – End of CFP (full paper or extended abstract)
    • January 8th – Notification of acceptance/rejection of paper to authors
    • January 29th – Final paper

    From the call for papers:

    XML Prague 2017 now welcomes submissions for presentations on the following topics:

    • Markup and the Extensible Web – HTML5, XHTML, Web Components, JSON and XML sharing the common space
    • Semantic visions and the reality – micro-formats, semantic data in business, linked data
    • Publishing for the 21th century – publishing toolchains, eBooks, EPUB, DITA, DocBook, CSS for print, …
    • XML databases and Big Data – XML storage, indexing, query languages, …
    • State of the XML Union – updates on specs, the XML community news, …

    All proposals will be submitted for review by a peer review panel made up of the XML Prague Program Committee. Submissions will be chosen based on interest, applicability, technical merit, and technical correctness.

    Accepted papers will be included in published conference proceedings.

    I don’t travel but if you need a last-minute co-author or proofer, you know where to find me!

    Poor Presentation – Failure to Communicate

    Sunday, December 11th, 2016

    If you ask about the age of a city, do you expect to be told its founding date or its age?

    If you said founding date, you will be as confused as I was by:


    You can see the map in its full confusion.

    The age of Augsburg is indeed 2013, but “15 BCE” (on orders of the Emperor Augustus) establishes the same fact with less effort on the part of the reader.
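
    To see the work an “age” figure pushes onto the reader, here is a small sketch of the conversion. It assumes BCE years are written as negative numbers and that there is no year zero; the function names are mine, and the map does not state which reference year it used, so the years below are illustrative only.

```python
def age_in_years(founding_year: int, reference_year: int) -> int:
    """Years elapsed between two calendar years.

    BCE years are negative. The calendar has no year zero, so one year
    is subtracted when the span crosses from BCE to CE.
    """
    age = reference_year - founding_year
    if founding_year < 0 < reference_year:
        age -= 1
    return age


def founding_label(founding_year: int) -> str:
    """Format a founding year the way a reader expects to see it."""
    if founding_year < 0:
        return f"{-founding_year} BCE"
    return f"{founding_year} CE"


AUGSBURG_FOUNDED = -15  # 15 BCE, as given in this post

print(founding_label(AUGSBURG_FOUNDED))
for reference_year in (1999, 2016):
    print(reference_year, age_in_years(AUGSBURG_FOUNDED, reference_year))
```

    The founding date is a fixed fact; the age changes with whatever reference year the map maker happened to pick, which the reader then has to guess.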

    Making users work for information is always a poor communication strategy. Always.

    Cognitive Bias Exercises

    Sunday, December 11th, 2016

    I encountered Cognitive bias cheat sheet – Because thinking is hard by Buster Benson today along with the visualization contributed by John Manoogian III.

    Benson divides Wikipedia’s list of cognitive biases into twenty groupings and summarizes those into four principles to use and four truths about our solutions.

    That’s handy but how do I practice spotting those cognitive biases?

    I started with Problem 1: Too much information., the first group:

    We notice things that are already primed in memory or repeated often. (emphasis in original)

    (You notice I have revealed one of my cognitive “biases.” I can’t stand to have lists in non-alphabetical order.)

    Users are invited to write a one-sentence definition for each bias and then to supply examples of each one.

    Scoring: 1 point for each example of a bias, 3 points if it’s in your own work.
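
    A minimal sketch of the scoring rule, assuming an example from your own work earns 3 points in place of the usual 1 (the rule above could also be read as a bonus on top). The class, the function and the sample entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Example:
    bias: str            # name of the cognitive bias being illustrated
    description: str     # where the example was spotted
    own_work: bool = False


def score(examples):
    """1 point per example, 3 points if the example is from your own work."""
    return sum(3 if e.own_work else 1 for e in examples)


# Made-up entries, for illustration only.
entries = [
    Example("Availability heuristic", "hypothetical example from a news story"),
    Example("Mere exposure effect", "hypothetical example from my own post",
            own_work=True),
    Example("Availability heuristic", "second hypothetical example"),
]

total = score(entries)
print(total)
```

    Keeping the bias name on every entry makes it easy to see, in group discussion, which biases nobody managed to spot.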

    Spotting the biases as you see them is one aspect of the exercise.

    Group discussion of results will hone your cognitive bias spotting skills to a fine edge.

    My first cut on problem 1, group 1.

    Suggestions? Comments?