Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

January 7, 2016

Search points of interest by radius with Mapbox GL JS

Filed under: MapBox,Mapping,Maps — Patrick Durusau @ 4:47 pm

Search points of interest by radius with Mapbox GL JS by Zach Beattie.

From the post:

Mapbox GL JS provides new and interesting ways to query data found on a map. We built an example that filters points of interest by genre, which uses the featuresAt method to only select data based on the position and radius of the circle. Drag the blue circle around the map to populate points of interest on the left. Zoom in and out on the circle to adjust the radius.

Visually, places of interest appear as colored dots on the map and you can select what type of places appear at all. You use the blue circle to move about the map and as it encompasses dots on the map, additional information appears to the left of the map.

That’s a poor description when compared to the experience. Visit the live map to really appreciate it.

Assuming a map of surveillance cameras and the movement of authorities (in near real time), this would make a handy planning tool.

Flying while trans [Ikarran]: still unbelievably horrible

Filed under: Government,Security — Patrick Durusau @ 4:12 pm

Flying while trans: still unbelievably horrible by Cory Doctorow.

Cory writes:

Cary Gabriel Costello is a trans-man in Milwaukee. Two-thirds of the time when he flies, the TSA has a complete freakout over the “anomalies” his body displays on the full-body scanner.

See Cory’s post for his commentary on this atrocity.

What Cory describes is a real-world precursor to the Babylon 5 episode Infection. The gist of the story is that an alien race had been repeatedly attacked, so they created a biological weapon that would destroy any entity that wasn't "pure Ikarran."

The definition of "pure Ikarran," like "loyal American," was set by fanatics and extremists, with the result that no Ikarran could possibly fit the profile of a "pure Ikarran."

Here’s how the use of that weapon ended for the Ikarrans:

Sinclair: So who set the parameters of what it meant to be a pure Ikarran?

Franklin: A coalition of religious fanatics and military extremists that ran the government. They programmed the weapons with a level of standards based on ideology, not science.

Sinclair: Like the Nazi ideal of the perfect Aryan during World War Two.

Franklin: Exactly. The next invasion, eleven of the machines were released. And they stopped the invaders by killing anything that didn’t match their profile of the pure, perfect Ikarran. When they were finished, the machines turned on their creators. They began a process of extermination based on the slightest deviation from what they were programmed to consider normal. They killed; they kept killing until the last Ikarran was dead. (Dialogue from http://www.midwinter.com/lurk/countries/us/guide/004.weapons.html)

Now the TSA is using a profile of what a “loyal American” looks like in a full body scanner. Data mining is already underway of social media to determine what a “loyal American” says online.

The Ikarran experience led to complete genocide. That won't happen here, but security fanatics are well on the way to taking all of us to a far darker place than the Japanese internment camps during WWII.

Visual Tools From NPR

Filed under: Graphics,Journalism,News,Visualization — Patrick Durusau @ 3:45 pm

Tools You Can Use

From the post:

Open-source tools for your newsroom. Take a look through all our repos, read about our best practices, and learn how to setup your Mac to develop like we do.

Before you rush off to explore all the repos (there are more than a few), check out these projects on the Tools You Can Use page:

App Template – An opinionated template that gets the first 90% of building a static website out of the way. It integrates with Google Spreadsheets, Bootstrap and Github seamlessly.

Copytext – A Python library for accessing a spreadsheet as a native object suitable for templating.

Dailygraphics – A framework for creating and deploying responsive graphics suitable for publishing inside a CMS with pym.js. It includes d3.js templates for many different types of charts.

Elex – A command-line tool to get election results from the Associated Press Election API v2.0. Elex is designed to be friendly, fast and agnostic to your language/database choices.

Lunchbox – A suite of tools to create images for social media sharing.

Mapturner – A command line utility for generating topojson from various data sources for fast maps.

Newscast.js – A library to radically simplify Chromecast web app development.

Pym.js – A JavaScript library for responsive iframes.

More tools to consider for your newsroom or other information delivery center.

Twitter Fighting Censorship? (Man Bites Dog Story?)

Filed under: Censorship,Tweets,Twitter — Patrick Durusau @ 2:55 pm

Twitter sues Turkey over ‘terror propaganda’ fine

From the post:

Twitter has challenged Turkey in an Ankara court seeking to cancel a $50,000 fine for not removing content from its website, the social media site’s lawyer told Al Jazeera on Thursday.

Turkey temporarily banned access to Twitter several times in the past for failing to comply with requests to remove content. But the 150,000 lira ($50,000) fine imposed by the Information and Communication Technologies Authority (BTK) was the first of its kind imposed by Turkish authorities on Twitter.

A Turkish official told Reuters news agency on Thursday that much of the material in question was related to the Kurdistan Workers Party (PKK), which Ankara called “terrorist propaganda”.

Twitter, in its lawsuit, is arguing the fine goes against Turkish law and should be annulled, the official told Reuters.

Reading about Twitter opposing censorship is like seeing a news account about a man biting a dog. That really is news!

I say that because only a few months ago in Secretive Twitter Censorship Fairy Strikes Again!, I pointed to reports of Twitter silencing 10,000 Islamic State accounts on April 2nd of 2015. More censorship of Islamic State accounts followed but that’s an impressive total for one day.

From all reports, that was entirely at Twitter's own initiative. Why Twitter decided to single out accounts that favor the Islamic State over those that favor the U.S. military isn't clear. The U.S. military is carrying out daily bombing attacks in Iraq and Syria, something you can't say about the Islamic State.

Now Twitter finds itself in the unhappy position of being an inadequate censor: a censor that violates the fundamental premise of being a common carrier, namely that it is open to all opinions, fair and foul, and a censor that has failed a state even less tolerant of free speech than Twitter.

Despised by one side for censorship and loathed by the other for being an inadequate toady.

Not an enviable position.

Just my suggestion but Twitter needs to reach out to the telcos and others who provide international connectivity for phones and other services to Turkey.

A 24 to 72 hour black-out of all telecommunications, for banks, media, phone, internet, should give the Turkish government a taste of the economic disruption, to say nothing of disruption of government, that will follow future attempts to censor, fine or block any international common carrier.

The telcos and others have the power to bring outlandish actors such as the Turkish government to a rapid heel.

It’s time that power was put to use.

You see, no bombs, no boots on the ground, no lengthy and tiresome exchanges of blustering speeches, just a quick trip back to the 19th century to remind Turkey’s leaders how painful a longer visit could be.

New York Public Library – 180K Hi-Res Images/Metadata

Filed under: History,Library,Public Data — Patrick Durusau @ 2:29 pm

NYPL Releases Hi-Res Images, Metadata for 180,000 Public Domain Items in its Digital Collections

From the post:

JANUARY 6, 2016 — The New York Public Library has expanded access to more than 180,000 items with no known U.S. copyright restrictions in its Digital Collections database, releasing hi-res images, metadata, and tools facilitating digital creation and reuse. The release represents both a simplification and an enhancement of digital access to a trove of unique and rare materials: a removal of administration fees and processes from public domain content, and also improvements to interfaces — popular and technical — to the digital assets themselves. Online users of the NYPL Digital Collections website will find more prominent download links and filters highlighting restriction-free content; while more technically inclined users will also benefit from updates to the Library’s collections API enabling bulk use and analysis, as well as data exports and utilities posted to NYPL’s GitHub account. These changes are intended to facilitate sharing, research and reuse by scholars, artists, educators, technologists, publishers, and Internet users of all kinds. All subsequently digitized public domain collections will be made available in the same way, joining a growing repository of open materials.

“The New York Public Library is committed to giving our users access to information and resources however possible,” said Tony Marx, president of the Library. “Today, we are going beyond providing our users with digital facsimiles that give only an impression of something we have in our physical collection. By making our highest-quality assets freely available, we are truly giving our users the greatest access possible to our collections in the digital environment.”

To encourage novel uses of its digital resources, NYPL is also now accepting applications for a new Remix Residency program. Administered by the Library’s digitization and innovation team, NYPL Labs, the residency is intended for artists, information designers, software developers, data scientists, journalists, digital researchers, and others to make transformative and creative uses of digital collections and data, and the public domain assets in particular. Two projects will be selected, receiving financial and consultative support from Library curators and technologists.

To provide further inspiration for reuse, the NYPL Labs team has also released several demonstration projects delving into specific collections, as well as a visual browsing tool allowing users to explore the public domain collections at scale. These projects — which include a then-and-now comparison of New York’s Fifth Avenue, juxtaposing 1911 wide angle photographs with Google Street View, and a “trip planner” using locations extracted from mid-20th century motor guides that listed hotels, restaurants, bars, and other destinations where black travelers would be welcome — suggest just a few of the myriad investigations made possible by fully opening these collections.

The public domain release spans the breadth and depth of NYPL’s holdings, from the Library’s rich New York City collection, historic maps, botanical illustrations, unique manuscripts, photographs, ancient religious texts, and more. Materials include:

Visit nypl.org/publicdomain for information about the materials related to the public domain update and links to all of the projects demonstrating creative reuse of public domain materials.

The New York Public Library’s Rights and Information Policy team has carefully reviewed Items and collections to determine their copyright status under U.S. law. As a U.S.-based library, NYPL limits its determinations to U.S. law and does not analyze the copyright status of an item in every country. However, when speaking more generally, the Library uses terms such as “public domain” and “unrestricted materials,” which are used to describe the aggregate collection of items it can offer to the public without any restrictions on subsequent use.

If you are looking for content for a topic map or inspiration to pass onto other institutions about opening up their collections, take a look at the New York Public Library’s Digital Collections.

Content designed for re-use. Imagine that, re-use of content.

The exact time/place of the appearance of seamless re-use of content will be debated by future historians but for now, this is a very welcome step in that direction.

January 6, 2016

A Lesson about Let Clauses (XQuery)

Filed under: XML,XQuery — Patrick Durusau @ 10:48 pm

I was going to demonstrate how to localize roll call votes so that only representatives from your state and their votes were displayed for any given roll call vote.

Which would enable libraries or local newsrooms, whose users/readers have little interest in how obscure representatives from other states voted, to pare down the roll call vote list to those that really matter, your state’s representatives.

But remembering that I promised to clean up the listings in yesterday’s post that read:

{string(doc("http://clerk.house.gov/evs/2015/roll705.xml")//rollcall-num)}

and kept repeating doc("http://clerk.house.gov/evs/2015/roll705.xml").

My thought was to replace that string with a variable declared by a let clause and then substitute that variable for the string.

To save you from the same mistake, combining a let clause with direct element constructors returns an error saying, in this case:

Left operand of ‘>’ needs parentheses

Not a terribly helpful error message.
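For what it's worth, here is a minimal sketch of my own (not the query from this post) showing a let clause and a direct element constructor co-existing peacefully. The key is that the constructor sits inside the FLWOR's return clause; leave the return out and many processors read the "<" that opens <html> as a comparison operator, which is roughly the complaint in the error message above.

xquery version "3.0";
(: A sketch only: bind the document once, then build the element
   inside the return clause of the FLWOR expression. :)
let $roll := doc("http://clerk.house.gov/evs/2015/roll705.xml")
return
<html>
<head></head>
<body>
<h2 align="center">FINAL VOTE RESULTS FOR ROLL CALL {string($roll//rollcall-num)}</h2>
</body>
</html>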

I have found examples of using a let clause within a direct element constructor that would have defeated the rationale for declaring the variable to begin with.

Tomorrow I hope to post today’s content, which will enable you to display data relevant to local voters, news reporters, for any arbitrary roll call vote in Congress.

Mark today’s adventure as a mistake to avoid. 😉

Sloth As Character Flaw and Security Acronym

Filed under: Cybersecurity,Security — Patrick Durusau @ 9:05 pm

Fatally weak MD5 function torpedoes crypto protections in HTTPS and IPSEC by Dan Goodin.

From the post:

If you thought MD5 was banished from HTTPS encryption, you’d be wrong. It turns out the fatally weak cryptographic hash function, along with its only slightly stronger SHA1 cousin, are still widely used in the transport layer security protocol that underpins HTTPS. Now, researchers have devised a series of attacks that exploit the weaknesses to break or degrade key protections provided not only by HTTPS but also other encryption protocols, including Internet Protocol Security and secure shell.

The attacks have been dubbed SLOTH—short for security losses from obsolete and truncated transcript hashes. The name is also a not-so-subtle rebuke of the collective laziness of the community that maintains crucial security regimens forming a cornerstone of Internet security. And if the criticism seems harsh, consider this: MD5-based signatures weren’t introduced in TLS until version 1.2, which was released in 2008. That was the same year researchers exploited cryptographic weaknesses in MD5 that allowed them to spoof valid HTTPS certificates for any domain they wanted. Although SHA1 is considerably more resistant to so-called cryptographic collision attacks, it too is considered to be at least theoretically broken. (MD5 signatures were subsequently banned in TLS certificates but not other key aspects of the protocol.)

“Notably, we have found a number of unsafe uses of MD5 in various Internet protocols, yielding exploitable chosen-prefix and generic collision attacks,” the researchers wrote in a technical paper scheduled to be discussed Wednesday at the Real World Cryptography Conference 2016 in Stanford, California. “We also found several unsafe uses of SHA1 that will become dangerous when more efficient collision-finding algorithms for SHA1 are discovered.”

Dan’s final sentence touches on the main reason for cyberinsecurity:

The findings generate yet another compelling reason why technical architects should wean themselves off the SHA1 and MD5 functions, even if it generates short-term pain for people who still use older hardware that aren’t capable of using newer, more secure algorithms.

What kind of pain?

Economic pain.

Amazing that owners of older hardware are allowed to endanger everyone with newer hardware.

At least until you realize that no cybersecurity discussion starts with the one source of cybersecurity problems: bugs in software.

Increasing penalties for cybercrime isn’t going to decrease the rate of software bugs that make cybercrime possible.

Incentives for the production of better written and tested code, an option heretofore not explored, might. With enough incentive, even the sloth that leads to software bugs might be reduced, but I would not hold my breath.

Internet Explorer 8, 9, and 10 – “Really Most Sincerely Dead”

Filed under: Browsers,Microsoft,Software — Patrick Durusau @ 5:35 pm

Web developers rejoice; Internet Explorer 8, 9 and 10 die on Tuesday by Owen Williams.

From the post:

Internet Explorer has long been the bane of many Web developers’ existence, but here’s some news to brighten your day: Internet Explorer 8, 9 and 10 are reaching ‘end of life’ on Tuesday, meaning they’re no longer supported by Microsoft.

Three down and one to go, IE 11, if I’m reading Owen’s post correctly. Past IE 11, users will be on Edge in Windows 10.

Oh, the “…really most sincerely dead…” is from the 1939 movie The Wizard of Oz.

Ggplot2 Quickref

Filed under: Charts,Ggplot2,R — Patrick Durusau @ 5:04 pm

Ggplot2 Quickref by Selva Prabhakaran.

If you use ggplot2, map this to a “hot” key on your keyboard.

Enjoy!

Statistical Learning with Sparsity: The Lasso and Generalizations (Free Book!)

Filed under: Sparse Learning,Statistical Learning,Statistics — Patrick Durusau @ 4:46 pm

Statistical Learning with Sparsity: The Lasso and Generalizations by Trevor Hastie, Robert Tibshirani, and Martin Wainwright.

From the introduction:

I never keep a scorecard or the batting averages. I hate statistics. What I got to know, I keep in my head.

This is a quote from baseball pitcher Dizzy Dean, who played in the major leagues from 1930 to 1947.

How the world has changed in the 75 or so years since that time! Now large quantities of data are collected and mined in nearly every area of science, entertainment, business, and industry. Medical scientists study the genomes of patients to choose the best treatments, to learn the underlying causes of their disease. Online movie and book stores study customer ratings to recommend or sell them new movies or books. Social networks mine information about members and their friends to try to enhance their online experience. And yes, most major league baseball teams have statisticians who collect and analyze detailed information on batters and pitchers to help team managers and players make better decisions.

Thus the world is awash with data. But as Rutherford D. Roger (and others) has said:

We are drowning in information and starving for knowledge.

There is a crucial need to sort through this mass of information, and pare it down to its bare essentials. For this process to be successful, we need to hope that the world is not as complex as it might be. For example, we hope that not all of the 30,000 or so genes in the human body are directly involved in the process that leads to the development of cancer. Or that the ratings by a customer on perhaps 50 or 100 different movies are enough to give us a good idea of their tastes. Or that the success of a left-handed pitcher against left-handed batters will be fairly consistent for different batters. This points to an underlying assumption of simplicity. One form of simplicity is sparsity, the central theme of this book. Loosely speaking, a sparse statistical model is one in which only a relatively small number of parameters (or predictors) play an important role. In this book we study methods that exploit sparsity to help recover the underlying signal in a set of data.

The delightful style of the authors had me going until they said:

…we need to hope that the world is not as complex as it might be.

What? “…not as complex as it might be”?

Law school and academia both train you to look for complexity so “…not as complex as it might be” is as close to apostasy as any statement I can imagine. 😉 (At least I can say I am honest about my prejudices. Some of them at any rate.)

Not for the mathematically faint of heart but it may certainly be a counter to the intelligence communities’ mania about collecting every scrap of data.

Finding a needle in a smaller haystack could be less costly and more effective. Both of those principles run counter to well established government customs but there are those in government who wish to be effective. (Article of faith on my part.)

I first saw this in a tweet by Chris Diehl.

Top 100 AI Influencers of 2015 – Where Are They Now? [Is There A Curator In The House?]

Filed under: Artificial Intelligence — Patrick Durusau @ 4:20 pm

Top 100 Artificial Intelligence and Robotics Influencers 2015.

Kirk Borne tweeted the Top 100 … link today.

More interesting than most listicles but as static HTML, it doesn’t lend itself to re-use.

For example, can you tell me:

  • Academic publications anyone listed had in 2014? (One assumes the year they were judged against for the 2015 list.)
  • Academic publications anyone listed had in 2015?
  • Which of these people were co-authors?
  • Which of these people have sent tweets on AI?
  • etc.

Other than pandering to our love of lists (lists appear organized, and we like organization at little or no cost), what does an HTML listicle have to say for itself?

This is a top candidate for one or two XQuery posts next week. I need to finish this week on making congressional roll call vote documents useful. See: Jazzing Up Roll Call Votes For Fun and Profit (XQuery) for the start of that series.
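To make the XQuery angle concrete before those posts arrive, here is a hypothetical first pass. The file name influencers.html and the assumption that each influencer sits in its own li element are guesses about a locally saved, tidied-into-XHTML copy of the page, not facts about its actual markup.

xquery version "3.0";
(: Hypothetical sketch: pull a rank and a name for each influencer out
   of a locally saved XHTML copy of the listicle. Structure assumed. :)
<influencers>{
  for $person at $rank in doc("influencers.html")//li
  return <influencer rank="{$rank}" name="{normalize-space($person)}"/>
}</influencers>

Once the names are in XML, questions like "which of these people co-authored papers" become joins against other sources rather than manual list reading.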

January 5, 2016

Jazzing a Roll Call Vote – Part 3 (XQuery)

Filed under: XML,XQuery — Patrick Durusau @ 9:48 pm

I posted Congressional Roll Call Vote – Accessibility Issues earlier today to deal with some accessibility issues noticed by @XQuery with my color coding.

Today we are going to start at the top of the boring original roll call vote and work our way down using XQuery.

Be forewarned that the XQuery you see today will be shortened and cleaned up tomorrow. It works, but it's not best practice.

You will need to open up the source of the original roll call vote to see the elements I select in the path expressions.

Here is the XQuery that is the goal for today:

xquery version "3.0";
declare boundary-space preserve;
<html>
<head></head>
<body>
<h2 align="center">FINAL VOTE RESULTS FOR ROLL CALL {string(doc("http://clerk.house.gov/evs/2015/roll705.xml")//rollcall-num)} </h2>

<strong>{string(doc("http://clerk.house.gov/evs/2015/roll705.xml")//rollcall-num)}</strong> {string(doc("http://clerk.house.gov/evs/2015/roll705.xml")//action-date)} {string(doc("http://clerk.house.gov/evs/2015/roll705.xml")//action-time)} <br/>

<strong>Question:</strong> {string(doc("http://clerk.house.gov/evs/2015/roll705.xml")//vote-question)} <br/>

<strong>Bill Title:</strong> {string(doc("http://clerk.house.gov/evs/2015/roll705.xml")//vote-desc)}
</body>
</html>

The title of the document we obtain with:

<h2 align="center">FINAL VOTE RESULTS FOR ROLL CALL {string(doc("http://clerk.house.gov/evs/2015/roll705.xml")//rollcall-num)} </h2>

Two quick things to notice:

First, for very simple documents like this one, I use “//” rather than writing out the path to the rollcall-num element. I already know it only occurs once in each rollcall document.

Second, when using direct element constructors, the XQuery statements are enclosed by “{ }” brackets.
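For comparison, here are the two path styles side by side; the spelled-out path is the one vote.xsl uses, and both return the same value because rollcall-num occurs exactly once per file. (A throwaway query, not part of today's script.)

xquery version "3.0";
(: Same value two ways; the explicit path comes from vote.xsl, the "//"
   shortcut is safe because rollcall-num occurs once per document. :)
string(doc("http://clerk.house.gov/evs/2015/roll705.xml")/rollcall-vote/vote-metadata/rollcall-num),
string(doc("http://clerk.house.gov/evs/2015/roll705.xml")//rollcall-num)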

The rollcall number, date and time of the vote come next (I have introduced line breaks for readability):

<strong>{string(doc("http://clerk.house.gov/evs/2015/roll705.xml")//rollcall-num)}</strong>

{string(doc("http://clerk.house.gov/evs/2015/roll705.xml")//action-date)}

{string(doc("http://clerk.house.gov/evs/2015/roll705.xml")//action-time)} <br/>

If you compare my presentation of that string and that from the original, you will find the original has slightly more space between the items.

Here is the XSLT for that spacing:

<xsl:if test="legis-num[text()!='0']"><xsl:text>      </xsl:text><b><xsl:value-of select="legis-num"/></b></xsl:if>
<xsl:text>      </xsl:text><xsl:value-of select="vote-type"/>
<xsl:text>      </xsl:text><xsl:value-of select="action-date"/>
<xsl:text>      </xsl:text><xsl:value-of select="action-time"/><br/>

Since I already had white space separating my XQuery expressions, I just added to the prologue:

declare boundary-space preserve;

The last two lines:

<strong>Question:</strong> {string(doc("http://clerk.house.gov/evs/2015/roll705.xml")//vote-question)} <br/>

<strong>Bill Title:</strong> {string(doc("http://clerk.house.gov/evs/2015/roll705.xml")//vote-desc)}

Are just standard queries for content. The string operator extracts the content of the element you address.

Tomorrow we are going to talk about how to clean up and shorten the path statements and look around for information that should be at the top of this document, but isn’t!

PS: Did you notice that the vote totals, etc., are written as static data in the XML file? Curious isn’t it? Easy enough to generate from the voting data. I don’t have an answer but thought you might.

Koch Snowflake

Filed under: Fractals,Graphics — Patrick Durusau @ 7:53 pm

Koch Snowflake by Nick Berry.

From the post:

We didn’t get a White Christmas in Seattle this year.

Let’s do the next best thing, let’s generate fractal snowflakes!

What is a fractal? A fractal is a self-similar shape.

Fractals are never-ending, infinitely complex shapes. If you zoom into a fractal, you get to see a shape similar to that seen at a higher level (albeit at smaller scale). It's possible to continuously zoom into a fractal and experience the same behavior.

Two of the most well-known fractal curves are Hilbert Curves and Koch Curves. I’ve written about the Hilbert Curve in a previous article, and today will talk about the Koch Curve.

There wasn’t any snow for Christmas in Atlanta, GA either but this is one of the clearest and most complete explanations of the Koch curve that I have seen.
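If you want one number to hang the "infinitely complex" claim on, the standard self-similarity computation (mine, not taken from Berry's post) goes like this: each segment of the Koch curve is replaced by N = 4 copies scaled by r = 1/3, so its similarity dimension is

\[
  d = \frac{\log N}{\log(1/r)} = \frac{\log 4}{\log 3} \approx 1.26,
\]

strictly between a line (dimension 1) and the plane (dimension 2).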

Whether you get snow this year or not, take some time for a slow walk on Koch snowflakes.

Enjoy!

Jane, John … Leslie? A Historical Method for Algorithmic Gender Prediction [Gatekeeping]

Filed under: History,R,Text Mining — Patrick Durusau @ 7:43 pm

Jane, John … Leslie? A Historical Method for Algorithmic Gender Prediction by Cameron Blevins and Lincoln Mullen.

Abstract:

This article describes a new method for inferring the gender of personal names using large historical datasets. In contrast to existing methods of gender prediction that treat names as if they are timelessly associated with one gender, this method uses a historical approach that takes into account how naming practices change over time. It uses historical data to measure the likelihood that a name was associated with a particular gender based on the time or place under study. This approach generates more accurate results for sources that encompass changing periods of time, providing digital humanities scholars with a tool to estimate the gender of names across large textual collections. The article first describes the methodology as implemented in the gender package for the R programming language. It goes on to apply the method to a case study in which we examine gender and gatekeeping in the American historical profession over the past half-century. The gender package illustrates the importance of incorporating historical approaches into computer science and related fields.

An excellent introduction to the gender package for R, historical grounding of the detection of gender by name, with the highlight of the article being the application of this technique to professional literature in American history.

It isn’t uncommon to find statistical techniques applied to texts whose authors and editors are beyond the reach of any critic or criticism.

It is less than common to find statistical techniques applied to extant members of a profession.

Kudos to both Blevins and Mullen for refining the detection of gender and for applying that refinement to publishing in American history.

Clojure Distilled

Filed under: Clojure,Functional Programming — Patrick Durusau @ 7:25 pm

Clojure Distilled by Dmitri Sotnikov.

From the post:

The difficulty in learning Clojure does not stem from its syntax, which happens to be extremely simple, but from having to learn new methods for solving problems. As such, we’ll focus on understanding the core concepts and how they can be combined to solve problems the functional way.

All the mainstream languages belong to the same family. Once you learn one of these languages there is very little effort involved in learning another. Generally, all you have to do is learn some syntax sugar and the useful functions in the standard library to become productive. There might be a new concept here and there, but most of your existing skills are easily transferable.

This is not the case with Clojure. Being a Lisp dialect, it comes from a different family of languages and requires learning new concepts in order to use effectively. There is no reason to be discouraged if the code appears hard to read at first. I assure you that the syntax is not inherently difficult to understand, and that with a bit of practice you might find it to be quite the opposite.

The goal of this guide is to provide an overview of the core concepts necessary to become productive with Clojure. Let’s start by examining some of the key advantages of the functional style and why you would want to learn a functional language in the first place.

Dmitri says near the end, “…we only touched on only a small portion of the overall language…,” but it is an impressive “…small portion…” and is very likely to leave you wanting to hear more.

The potential for immutable data structures in collaborative environments is vast. I’ll have something longer to post on that next week.

Enjoy!

Back from the Dead: Politwoops

Filed under: Journalism,Privacy,Tweets,Twitter — Patrick Durusau @ 7:07 pm

Months after Twitter revoked API access, Politwoops is back, tracking the words politicians take back by Joseph Lichterman.

From the post:

We’ll forgive you if you missed the news, since it was announced on New Year’s Eve: Politwoops, the service which tracks politicians’ deleted tweets, is coming back after Twitter agreed to let it access the service’s API once again.

On Tuesday, the Open State Foundation, the Dutch nonprofit that runs the international editions of Politwoops, said it was functioning again in 25 countries, including the United Kingdom, the Netherlands, Ireland, and Turkey. The American version of Politwoops, operated by the Sunlight Foundation, isn’t back up yet, but the foundation said in a statement that “in the coming days and weeks, we’ll be working behind the scenes to get Politwoops up and running.”

Excellent news!

Politwoops will be reporting tweets that politicians send and then suddenly regret.

I don’t disagree with Twitter that any user can delete their tweets but strongly disagree that I can’t capture the original tweet and at a minimum, point to its absence from the “now” Twitter archive.

Politicians should not be allowed to hide from their sporadic truthful tweets.

Congressional Roll Call Vote – Accessibility Issues

Filed under: XML,XQuery,XSLT — Patrick Durusau @ 2:43 pm

I posted a color-coded version of a congressional roll call vote in Jazzing a Roll Call Vote – Part 2 (XQuery, well XSLT anyway), using red for Republicans and blue for Democrats. #XQuery pointed out accessibility issues that depend upon color perception.

Color coding works better for me than the more traditional roman versus italic font face distinction but let’s improve the color coding to remove the accessibility issue.

The first question is what colors should I use for accessibility?

In searching to answer that question I found this thread at Edward Tufte’s site (of course), Choice of colors in print and graphics for color-blind readers, which has a rich list of suggestions and pointers to other resources.

One in particular, Color Universal Design (CUD), posted by Maarten Boers, has this graphic on colors:

[Image: colorblind_palette (recommended colors for color-blind readers)]

Relying on that palette, I changed the colors for the roll call vote (Republicans in orange, Democrats in sky blue) and re-generated the roll call document.

[Image: roll-call-access (re-colored roll call vote)]

Here is an accessible, but still color-coded, version of: FINAL VOTE RESULTS FOR ROLL CALL 705.

An upside of XML is that changing the presentation of all 429 votes took only a few seconds to change the stylesheet and re-generate the results.
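Since this series keeps coming back to XQuery, here is a minimal sketch of the same recoloring done in XQuery rather than in vote.xsl. The recorded-vote/legislator/@party structure comes from roll705.xml; the hex values are the orange and sky blue commonly quoted for the CUD palette.

xquery version "3.0";
(: Sketch only: map party codes to CUD-palette colors and wrap each
   member's name in a colored span. Independents would still need a
   non-color cue such as underlining. :)
declare function local:party-color($party as xs:string) as xs:string {
  switch ($party)
    case "R" return "#E69F00"  (: orange :)
    case "D" return "#56B4E9"  (: sky blue :)
    default  return "#000000"
};

for $rec in doc("http://clerk.house.gov/evs/2015/roll705.xml")//recorded-vote
return <span style="color:{local:party-color($rec/legislator/@party)}">
  {string($rec/legislator)}</span>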

Thanks to #XQuery for prodding me on the accessibility issue, which resulted in finding the thread at Tufte and the colorblind barrier-free color palette.


Other post on congressional roll call votes:

1. Jazzing Up Roll Call Votes For Fun and Profit (XQuery)

2. Jazzing a Roll Call Vote – Part 2 (XQuery, well XSLT anyway)

January 4, 2016

Jazzing a Roll Call Vote – Part 2 (XQuery, well XSLT anyway)

Filed under: XML,XQuery — Patrick Durusau @ 11:41 pm

Apologies, but I did not make as much progress on the Congressional Roll Call vote as I had hoped.

I did find some interesting information about the vote.xsl stylesheet and managed to use color to code members of the House.

You probably remember me whining in Jazzing Up Roll Call Votes For Fun and Profit (XQuery) about how hard it is to tell roman from italic when distinguishing members of different parties.

The XSLT code is worse than I imagined.

Here’s what I mean:

<b><center><font size="+2">FINAL VOTE RESULTS FOR ROLL CALL <xsl:value-of select="/rollcall-vote/vote-metadata/rollcall-num"/>
<xsl:if test="/rollcall-vote/vote-metadata/vote-correction[text()!='']">*</xsl:if></font></center></b>
<!-- <xsl:if test = "/rollcall-vote/vote-metadata/majority[text() = 'D']"> -->
<xsl:if test = "$Majority='D'">
<center>(Democrats in roman; Republicans in <i>italic</i>; Independents <u>underlined</u>)</center><br/>
</xsl:if>
<!-- <xsl:if test = "/rollcall-vote/vote-metadata/majority[text() = 'R']"> -->
<xsl:if test = "$Majority!='D'">
<center>(Republicans in roman; Democrats in <i>italic</i>; Independents <u>underlined</u>)</center><br/>
</xsl:if>

Which party is in the majority determines whether the names in a party appear in roman or in italic.

Now there’s a distinction that will be lost on a casual reader!

What’s more, if you are trying to reform the stylesheet, don’t look for R or D but again for majority party:

<xsl:template match="vote">
<!-- Handles formatting of Member names based on party. -->
<!-- <xsl:if test="../legislator/@party='R'"><xsl:value-of select="../legislator"/></xsl:if>
<xsl:if test="../legislator/@party='D'"><i><xsl:value-of select="../legislator"/></i></xsl:if> -->
<xsl:if test="../legislator/@party='I'"><u><xsl:value-of select="../legislator"/></u></xsl:if>
<xsl:if test="../legislator/@party!='I'">
<xsl:if test="../legislator/@party = $Majority"><!-- /rollcall-vote/vote-metadata/majority/text()"> -->
<xsl:value-of select="../legislator"/>
</xsl:if>
<xsl:if test="../legislator/@party != $Majority"><!-- /rollcall-vote/vote-metadata/majority/text()"> -->
<i><xsl:value-of select="../legislator"/></i>
</xsl:if>
</xsl:if>
</xsl:template>

As you can see, selecting by party has been commented out in favor of the roman/italic distinction based on the majority party.

I wanted to label the Republicans with an icon but my GIMP skills don't extend to making an icon of young mothers throwing their children under the carriage wheels of the wealthy to save them from a life of poverty and degradation. A bit much to get into an HTML-button-sized icon.

I settled for using the traditional red for Republicans and blue for Democrats and ran the modified stylesheet against roll705.xml locally.

[Image: vote-color-coded (roll call vote with red/blue party coding)]

Here is FINAL VOTE RESULTS FOR ROLL CALL 705 as HTML.

Question: Are red and blue easier to distinguish than roman and italic?

If your answer is yes, why resort to typographic subtlety on something like party affiliation?

Are subtle distinctions used to confuse the uninitiated and unwary?

Math Translator Wanted/Topic Map Needed: Mochizuki and the ABC Conjecture

Filed under: Mathematics,Topic Maps,Translation — Patrick Durusau @ 10:07 pm

What if you Discovered the Answer to a Famous Math Problem, but No One was able to Understand It? by Kevin Knudson.

From the post:

The conjecture is fairly easy to state. Suppose we have three positive integers a,b,c satisfying a+b=c and having no prime factors in common. Let d denote the product of the distinct prime factors of the product abc. Then the conjecture asserts roughly there are only finitely many such triples with c > d. Or, put another way, if a and b are built up from small prime factors then c is usually divisible only by large primes.

Here’s a simple example. Take a=16, b=21, and c=37. In this case, d = 2x3x7x37 = 1554, which is greater than c. The ABC conjecture says that this happens almost all the time. There is plenty of numerical evidence to support the conjecture, and most experts in the field believe it to be true. But it hasn’t been mathematically proven — yet.

Enter Mochizuki. His papers develop a subject he calls Inter-Universal Teichmüller Theory, and in this setting he proves a vast collection of results that culminate in a putative proof of the ABC conjecture. Full of definitions and new terminology invented by Mochizuki (there’s something called a Frobenioid, for example), almost everyone who has attempted to read and understand it has given up in despair. Add to that Mochizuki’s odd refusal to speak to the press or to travel to discuss his work and you would think the mathematical community would have given up on the papers by now, dismissing them as unlikely to be correct. And yet, his previous work is so careful and clever that the experts aren’t quite ready to give up.

It’s not clear what the future holds for Mochizuki’s proof. A small handful of mathematicians claim to have read, understood and verified the argument; a much larger group remains completely baffled. The December workshop reinforced the community’s desperate need for a translator, someone who can explain Mochizuki’s strange new universe of ideas and provide concrete examples to illustrate the concepts. Until that happens, the status of the ABC conjecture will remain unclear.

It’s hard to imagine a more classic topic map problem.

At some point, Shinichi Mochizuki shared a common vocabulary with his colleagues in number theory and arithmetic geometry but no longer.

As Kevin points out:

The December workshop reinforced the community’s desperate need for a translator, someone who can explain Mochizuki’s strange new universe of ideas and provide concrete examples to illustrate the concepts.

Taking Mochizuki’s present vocabulary and working backwards to where he shared a common vocabulary with colleagues is simple enough to say.

The crux of the problem is that discussions are going to be fragmented, distributed across a variety of formal and informal venues.

Combining those discussions to construct a path back to where most number theorists reside today would require something with as few starting assumptions as is possible.

Where you could describe as much or as little about new subjects and their relations to other subjects as is necessary for an expert audience to continue to fill in any gaps.

I’m not qualified to venture an opinion on the conjecture or Mochizuki’s proof but the problem of mapping from new terminology that has its own context back to “standard” terminology is a problem uniquely suited to topic maps.

rOpenSci (updated tutorials) [Learn Something, Write Something]

Filed under: Open Data,Open Science,R — Patrick Durusau @ 9:47 pm

rOpenSci has updated 16 of its tutorials!

More are on the way!

Need a detailed walk through of what our packages allow you to do? Click on a package below, quickly install it and follow along. We’re in the process of updating existing package tutorials and adding several more in the coming weeks. If you find any bugs or have comments, drop a note in the comments section or send us an email. If a tutorial is available in multiple languages we indicate that with badges, e.g., (English) (Português).

  • alm    Article-level metrics
  • antweb    AntWeb data
  • aRxiv    Access to arXiv text
  • bold    Barcode data
  • ecoengine    Biodiversity data
  • ecoretriever    Retrieve ecological datasets
  • elastic    Elasticsearch R client
  • fulltext    Text mining client
  • geojsonio    GeoJSON/TopoJSON I/O
  • gistr    Work w/ GitHub Gists
  • internetarchive    Internet Archive client
  • lawn    Geospatial Analysis
  • musemeta    Scrape museum metadata
  • rAltmetric    Altmetric.com client
  • rbison    Biodiversity data from USGS
  • rcrossref    Crossref client
  • rebird    eBird client
  • rentrez    Entrez client
  • rerddap    ERDDAP client
  • rfisheries    OpenFisheries.org client
  • rgbif    GBIF biodiversity data
  • rinat    Inaturalist data
  • RNeXML    Create/consume NeXML
  • rnoaa    Client for many NOAA datasets
  • rplos    PLOS text mining
  • rsnps    SNP data access
  • rvertnet    VertNet.org biodiversity data
  • rWBclimate    World Bank Climate data
  • solr    SOLR database client
  • spocc    Biodiversity data one stop shop
  • taxize    Taxonomic toolbelt
  • traits    Trait data
  • treebase    Treebase data
  • wellknown    Well-known text <-> GeoJSON
  • More tutorials on the way.

Good documentation is hard to come by and good tutorials even more so.

Yet, here at rOpenSci you will find thirty-four (34) tutorials and more on the way.

Let’s answer that moronic security saying: See Something, Say Something, with:

Learn Something, Write Something.

January 3, 2016

Jazzing Up Roll Call Votes For Fun and Profit (XQuery)

Filed under: Government,XML,XQuery — Patrick Durusau @ 11:02 pm

Roll call votes in the US House of Representatives are a staple of local, state and national news. If you go looking for the "official" version, what you find is as boring as your 5th grade civics class.

Trigger Warning: Boring and Minimally Informative Page Produced By Following Link: Final Vote Results For Roll Call 705.

Take a deep breath and load the page. It will open in a new browser tab. Boring. Yes? (You were warned.)

It is the recent roll call vote to fund the US government, take another slice of privacy from citizens, and make a number of other dubious policy choices. (Everything after the first comma depending upon your point of view.)

Whatever your politics though, you have to agree this is sub-optimal presentation, even for a government document.

This is no accident; sans the header, you will find the identical presentation of this very roll call vote at: page H10696, Congressional Record for December 18, 2015 (pdf).

It is disappointing that so much XML, XSLT, XQuery, etc., has been wasted duplicating non-informative print formatting. Or should I say less-informative formatting than is possible with XML?

Once the data is in XML, legend has it, users can transform that XML in ways more suited to their purposes and not those of the content providers.

I say “legend has it,” because we all know if content providers had their way, web navigation would be via ads and not bare hyperlinks. You want to see the next page? You must select the ad + hyperlink, waiting for the ad to clear before the resource appears.

I can summarize my opinion about content provider control over information legally delivered to my computer: Screw that!

If a content provider enables access to content, I am free to transform that content into speech, graphics, add information, take away information, in short do anything that my imagination desires and my skill enables.

Let’s take the roll call vote in the House of Representatives, Final Vote Results For Roll Call 705.

Just under the title you will read:

(Republicans in roman; Democrats in italic; Independents underlined)

Boring.

For a bulk display of voting results, we can do better than that.

What if we had small images to identify the respective parties? Here are some candidates (sic) for the Republicans:

[Images: r-photo1, r-photo-2, r-photo-3]

Of course we would have to reduce them to icon size, but XML processing is rarely ever just XML processing. Nearly every project includes some other skill set as well.

Which one do you think looks more neutral? 😉

It would certainly be more colorful and, depending upon your inclinations, more fun to play about with than the difference between roman and italic. Yes?

Presentation of the data in http://clerk.house.gov/evs/2015/roll705.xml is only one of the possibilities that XQuery offers. Follow along and offer your suggestions for changes, additions and modifications.

First steps:

In the browser tab with Final Vote Results For Roll Call 705, use CNTR-u to view the page source. First notice that the boring web presentation is controlled by http://clerk.house.gov/evs/vote.xsl.

Copy and paste: http://clerk.house.gov/evs/vote.xsl into a new browser tab and select return. The resulting xsl:stylesheet is responsible for generating the original page, from the vote totals to column presentation of the results.

Pay particular attention to the generation of totals from the <vote-data> element and its children. That generation is powered by these lines in vote.xsl:

<xsl:apply-templates select="/rollcall-vote/vote-metadata"/>
<!-- Create total variables based on counts. -->
<xsl:variable name="y" select="count(/rollcall-vote/vote-data/recorded-vote/vote[text()='Yea'])"/>
<xsl:variable name="a" select="count(/rollcall-vote/vote-data/recorded-vote/vote[text()='Aye'])"/>
<xsl:variable name="yeas" select="$y + $a"/>
<xsl:variable name="nay" select="count(/rollcall-vote/vote-data/recorded-vote/vote[text()='Nay'])"/>
<xsl:variable name="no" select="count(/rollcall-vote/vote-data/recorded-vote/vote[text()='No'])"/>
<xsl:variable name="nays" select="$nay + $no"/>
<xsl:variable name="nvs" select="count(/rollcall-vote/vote-data/recorded-vote/vote[text()='Not Voting'])"/>
<xsl:variable name="presents" select="count(/rollcall-vote/vote-data/recorded-vote/vote[text()='Present'])"/>
<br/>
<br/>

(Not entirely, I omitted the purely formatting stuff.)

For tomorrow I will be working on a more “visible” way to identify political party affiliation and “borrowing” the count code from vote.xsl.
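As a preview of that borrowing, here is a minimal XQuery sketch of the same totals. The paths and vote strings are lifted straight from the vote.xsl fragment above; the let/return arrangement is just one way to do it, not necessarily what tomorrow's post will use.

xquery version "3.0";
(: Sketch: reproduce the vote.xsl count variables in XQuery. :)
let $votes := doc("http://clerk.house.gov/evs/2015/roll705.xml")
              /rollcall-vote/vote-data/recorded-vote/vote
let $yeas := count($votes[. = ('Yea', 'Aye')])
let $nays := count($votes[. = ('Nay', 'No')])
let $nvs := count($votes[. = 'Not Voting'])
let $presents := count($votes[. = 'Present'])
return <totals yeas="{$yeas}" nays="{$nays}" present="{$presents}" not-voting="{$nvs}"/>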

Enjoy!


You may be wondering what XQuery has to do with topic maps. Well, if you think about it, every time we select, aggregate, etc., data, we are making choices based on notions of subject identity.

That is we think the data we are manipulating represents some subjects and/or information about some subjects, that we find sensible (for some unstated reason) to put together for others to read.

The first step towards a topic map, however, is the putting of information together so we can judge what subjects need explicit representation and how we choose to identify them.

Prior topic map work was never explicit about how we get to a topic map; putting that possibly divisive question behind us, we simply start with topic maps, ab initio.

I was in the car when we took that turn and for the many miles since then. I have come to think that a better starting place is choosing subjects, what we want to say about them and how we wish to say it, so that we have only so much machinery as is necessary for any particular set of subjects.

Some subjects can be identified by IRIs, others by multi-dimensional vectors, still others by unspecified processes of deep learning, etc. Which ones we choose will depend upon the immediate ROI from subject identity and relationships between subjects.

I don’t need triples, for instance, to recognize natural languages to a sufficient degree of accuracy. Unnecessary triples, topics or associations are just padding. If you are on a per-triple contract, they make sense, otherwise, not.

A long way of saying that subject identity lurks just underneath the application of XQuery and we will see where it is useful to call subject identity to the fore.

Emacs – Helm and @ndw

Filed under: Editor,Emacs,Lisp — Patrick Durusau @ 10:49 pm

A Package in a league of its own: Helm

From the webpage:

Helm is incremental completion and selection narrowing framework for Emacs. It will help steer you in the right direction when you’re looking for stuff in Emacs (like buffers, files, etc).

Helm is a fork of anything.el originally written by Tamas Patrovic and can be considered to be its successor. Helm sets out to clean up the legacy code in anything.el and provide a cleaner, leaner and more modular tool, that’s not tied in the trap of backward compatibility.

I saw the following tweet from Norman Walsh today:

Been hacking #Emacs again the past few days. Serious fun. Also: finally became a helm convert. Seriously productive.

I know better.

I should look away, quickly when I see “Emacs” and “Norman Walsh” in a tweet together.

But, just like every other time, I follow the reference, this time to Helm (see above).

I will become more productive, either from using Helm or from learning more about Emacs in the process. It's a win either way.

The downside, not too serious a downside, is that I will lose N hours this week as I pursue this lead.

It’s one of the risks of following someone like Norman Walsh on Twitter. But an acceptable one.

Enjoy!

Searching for Geolocated Posts On YouTube

Filed under: Geographic Data,Geography,Journalism,News,Reporting,Searching — Patrick Durusau @ 10:29 pm

Searching for Geolocated Posts On YouTube (video) by First Draft News.

Easily the most information-filled 1 minute and 18 seconds of the holiday season!

Illustrates searching for geolocated posts on YouTube, despite YouTube not offering that option!

New tool in development may help!

Visit: http://youtube.github.io/geo-search-tool/search.html

Both the video and site are worth a visit!

Don’t forget to check out First Draft News as well!

When back doors backfire [Uncorrected Tweet From Economist Hits 1.1K Retweets]

Filed under: Cryptography,Encryption,Ethics,Journalism,News,Reporting — Patrick Durusau @ 8:41 pm

When back doors backfire

From the post:

[Image: encryption-economist (graphic from The Economist post)]

Push back against back doors

Calls for the mandatory inclusion of back doors should therefore be resisted. Their potential use by criminals weakens overall internet security, on which billions of people rely for banking and payments. Their existence also undermines confidence in technology companies and makes it hard for Western governments to criticise authoritarian regimes for interfering with the internet. And their imposition would be futile in any case: high-powered encryption software, with no back doors, is available free online to anyone who wants it.

Rather than weakening everyone’s encryption by exploiting back doors, spies should use other means. The attacks in Paris in November succeeded not because terrorists used computer wizardry, but because information about their activities was not shared. When necessary, the NSA and other agencies can usually worm their way into suspects’ computers or phones. That is harder and slower than using a universal back door—but it is safer for everyone else.

By my count on two (2) tweets from The Economist, they are running at 50% correspondence between their tweets and actual content.

You may remember my checking their tweet about immigrants yesterday, which got 304 retweets (and was wrong), in Fail at The Economist Gets 304 Retweets!.

Today I saw the When back doors backfire tweet and I followed the link to the post to see if it corresponded to the tweet.

Has anyone else been checking on tweet/story correspondence at The Economist (zine)? The twitter account is: @TheEconomist.

I ask because no correcting tweet has appeared in @TheEconomist tweet feed. I know because I just looked at all of its tweets in chronological order.

Here is the uncorrected tweet:

[Image: econ-imm-tweet (The Economist's tweet on immigrants)]

As of today, the uncorrected tweet on immigrants has 1.1K retweets and 707 likes.

From the Economist article on immigrants:

Refugee resettlement is the least likely route for potential terrorists, says Kathleen Newland at the Migration Policy Institute, a think-tank. Of the 745,000 refugees resettled since September 11th, only two Iraqis in Kentucky have been arrested on terrorist charges, for aiding al-Qaeda in Iraq.

Do retweets and likes matter more than factual accuracy, even as reported in the tweeted article?

Is this a journalism ethics question?

What’s the standard journalism position on retweet-bait tweets?

January 2, 2016

Sorting Slightly Soiled Data (Or The Danger of Perfect Example Data) – XQuery (Part 2)

Filed under: XML,XQuery — Patrick Durusau @ 7:30 pm

Despite heavy carousing during the holidays, you may still remember Great R packages for data import, wrangling & visualization [+ XQuery], where I re-sorted the table by Sharon Machlis, to present the R packages in package name order.

I followed that up with: Sorting Slightly Soiled Data (Or The Danger of Perfect Example Data) – XQuery, where I detailed the travails of trying to sort the software packages by their short descriptions, again in alphabetical order. My assumption in that post was that either the spaces or the “,” commas in the descriptions were fouling the sort by.

That wasn't the case, which I should have known because the string operator always returns a string. That is, the spaces and "," inside are just parts of a string, nothing more.
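A throwaway query of my own makes the point (the td content here is made up for illustration):

xquery version "3.0";
(: string() flattens an element to the concatenation of its text nodes;
   the comma and the spaces come back as ordinary characters. :)
string(<td>dplyr, <a href="#">a grammar of data manipulation</a></td>)
(: returns "dplyr, a grammar of data manipulation" :)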

The up-side of the problem was that I spent more than a little while with Walmsley’s XQuery book, searching for ever more esoteric answers.

Here’s the failing XQuery:

<html>
<body>
<table>{
for $row in doc("/home/patrick/working/favorite-R-packages.xml")/table/tr
order by lower-case(string($row/td[2]/a))
return <tr>{$row/td[2]} {$row/td[1]}</tr>
}</table>
</body>
</html>

And here is the working XQuery:

<html>
<body>
<table>{
for $row in doc("/home/patrick/working/favorite-R-packages.xml")/table/tr
order by lower-case(string($row/td[2]))
return <tr>{$row/td[2]} {$row/td[1]}</tr>
}</table>
</body>
</html>

Here is the mistake highlighted:

order by lower-case(string($row/td[2]/a))

My first mistake was the inclusion of "/a" in the path. Using string on ($row/td[2]), that is, without having /a at the end of the path, gives the original script the same result. (Run that for yourself on favorite-R-packages.xml).

Make any path as long as required and no longer!

My second mistake was not checking the XPath immediately upon the failure of the sort. (The simplest answer is usually the correct one.)

Enjoy!


Update: Removed the quote marks around (string($row/td[2])) in both queries; they were part of an explanation that did not make the cut. Thanks to XQuery for the catch!

Oxford Legal Citations Free, What About BlueBook?

Filed under: Law — Patrick Durusau @ 4:02 pm

Oxford University Standard for Citation of Legal Authorities (OSCOLA)

From the webpage:

The Oxford University Standard for Citation of Legal Authorities is designed to facilitate accurate citation of authorities, legislation, and other legal materials. It is widely used in law schools and by journal and book publishers in the UK and beyond. OSCOLA is edited by the Oxford Law Faculty, in consultation with the OSCOLA Editorial Advisory Board*. OSCOLA was shortlisted for the Halsbury Legal Awards, 2013 Award for Academic Contribution.

OSCOLA (4th edn, Hart Publishers) is available for free in PDF and the webpage lists supplemental materials, such as OSCOLA styles for popular software packages.

I saw this in a tweet by Carl Malamud who asks:

A sense of public purpose. What happened to Harvard?

As of today, Sales Rank Express (Aaron Shepard) reports that:

The Bluebook: A Uniform System of Citation has a sales rank of 9147.

Grapes of Wrath, Amazon sales rank of 2437.

Snow Crash comes in at 3047.

The Firm, by John Grisham is now ranked at 18619.

Projecting from Amazon sales ranking is uncertain but I suspect The Bluebook is making less money than Grapes of Wrath and Snow Crash but more money than The Firm by John Grisham.

The answer to what happened to Harvard is money.

Fail at The Economist Gets 304 Retweets!

Filed under: Journalism,News,Reporting — Patrick Durusau @ 2:17 pm

Are you ever tempted to re-tweet a tweet with “facts” you already agree with? Without checking the story first?

I sure was today when I saw:

[Tweet graphic claiming that “not one” refugee resettled in the US since September 11th has been arrested on terrorist charges.]

Posted from @TheEconomist.

When I saw it, that tweet had been retweeted 304 times and liked 202 times.

If you follow the link to Yearning to breathe free, you will find that the sixth paragraph reads in full:

Refugee resettlement is the least likely route for potential terrorists, says Kathleen Newland at the Migration Policy Institute, a think-tank. Of the 745,000 refugees resettled since September 11th, only two Iraqis in Kentucky have been arrested on terrorist charges, for aiding al-Qaeda in Iraq.

In order to reconcile “not one” in the graphic and “two Iraqis” in the story, I have to assume the graphic artist didn’t read the story.

Moreover, I have to assume most of the 304 retweeters didn’t read the story either.

I wish the graphic were true, but people being people, the story sounds closer to the truth. Any sufficiently large number of people is going to have a few terrorists in it.

So? I assume there were some rapists, murderers, pedophiles, as well as doctors, lawyers, dentists and a lot of just decent people, the vast majority of the 750,000.

Generosity towards refugees should not be moderated or limited by as selfish and base a motive as fear. Not now, not ever.

BTW, read before you re-tweet. Yes?

January 1, 2016

Street-Fighting Mathematics – Free Book – Lesson For Semanticists?

Filed under: Mathematical Reasoning,Mathematics,Ontology,Semantics — Patrick Durusau @ 8:28 pm

Street-Fighting Mathematics: The Art of Educated Guessing and Opportunistic Problem Solving by Sanjoy Mahajan.

From the webpage:

[Book cover: Street-Fighting Mathematics]

In problem solving, as in street fighting, rules are for fools: do whatever works—don’t just stand there! Yet we often fear an unjustified leap even though it may land us on a correct result. Traditional mathematics teaching is largely about solving exactly stated problems exactly, yet life often hands us partly defined problems needing only moderately accurate solutions. This engaging book is an antidote to the rigor mortis brought on by too much mathematical rigor, teaching us how to guess answers without needing a proof or an exact calculation.

In Street-Fighting Mathematics, Sanjoy Mahajan builds, sharpens, and demonstrates tools for educated guessing and down-and-dirty, opportunistic problem solving across diverse fields of knowledge—from mathematics to management. Mahajan describes six tools: dimensional analysis, easy cases, lumping, picture proofs, successive approximation, and reasoning by analogy. Illustrating each tool with numerous examples, he carefully separates the tool—the general principle—from the particular application so that the reader can most easily grasp the tool itself to use on problems of particular interest. Street-Fighting Mathematics grew out of a short course taught by the author at MIT for students ranging from first-year undergraduates to graduate students ready for careers in physics, mathematics, management, electrical engineering, computer science, and biology. They benefited from an approach that avoided rigor and taught them how to use mathematics to solve real problems.

I have just started reading Street-Fighting Mathematics but I wonder if there is a parallel between mathematics and the semantics that everyone talks about capturing from information systems.

Consider this line:

Traditional mathematics teaching is largely about solving exactly stated problems exactly, yet life often hands us partly defined problems needing only moderately accurate solutions.

And re-cast it for semantics:

Traditional semantics (Peirce, FOL, SUMO, RDF) is largely about solving exactly stated problems exactly, yet life often hands us partly defined problems needing only moderately accurate solutions.

What if the semantics we capture and apply are sufficient for your use case? Complete with ROI for that use case.

Is that sufficient?

How to Avoid Being a Terrorism “False Positive” in 2016

Filed under: Government,Privacy,Security — Patrick Durusau @ 7:56 pm

For all of the fear mongering about terrorists and terrorism, I’m more worried about being a “false positive” for terrorism than about terrorism itself.

Radley Balko wrote about a SWAT raid on an entirely innocent family in Federal judge: Drinking tea, shopping at a gardening store is probable cause for a SWAT raid on your home, saying:

Last week, U.S. District Court Judge John W. Lungstrum dismissed every one of the Hartes’s claims. Lungstrum found that sending a SWAT team into a home first thing in the morning based on no more than a positive field test and spotting a suspect at a gardening store was not a violation of the Fourth Amendment. He found that the police had probable cause for the search, and that the way the search was conducted did not constitute excessive force. He found that the Hartes had not been defamed by the raid or by the publicity surrounding it. He also ruled that the police were under no obligation to know that drug testing field kits are inaccurate, nor were they obligated to wait for the more accurate lab tests before conducting the SWAT raid. The only way they’d have a claim would be if they could show that the police lied about the results, deliberately manipulated the tests or showed a reckless disregard for the truth — and he ruled that the Hartes had failed to do so.

If you think that’s a sad “false positive” story, consider Jean Charles de Menezes, who was murdered by the London Metropolitan Police for sitting on an Underground train. He was executed with seven shots to the head while being physically restrained by another police officer.

The Home Secretary at the time, Charles Clarke, was quoted by the BBC as saying:

“I very, very much regret what happened.

“I hope [the family] understand the police were trying to do their very best under very difficult circumstances.”

What “very difficult circumstances”? Menezes was sitting peacefully on a train, unarmed and unaware that he was about to be attacked by three police officers. What’s “very difficult” about those circumstances?

Ah, but it was the day after bombings in London and the usual suspects had spread fear among the police and the public. The “very difficult circumstances” victimized the police, the public and of course, Menezes.

If you live in the United States, there is the ongoing drum roll of police shooting unarmed black men, when they don’t murder a neighbor on the way.

No doubt the police need to exercise more restraint, but they are also being victimized by the toxic atmosphere of fear generated by public officials and by those who profit from fear-driven public policies.

You do realize the TSA agents at airports are supplied by contractors. Yes? $Billions in contracts.

Less fear, fewer TSA (if any at all) = Loss of $Billions in contracts

With that kind of money at stake, the toxic atmosphere of fear will continue to grow.

How can you reduce your personal odds of being a terrorism “false positive” in 2016?

The first thing is to realize that the police may look like the “enemy” but they really aren’t. For the most part they are underpaid, under-trained, ordinary people doing a job most of us wouldn’t take on a bet. There are bad cops, have no doubt, but the good ones outnumber the bad ones.

The police are being manipulated by the real bad actors, the ones who drive and profit from the fear machine.

The second thing to do is for you and your community to reach out to the police officers who regularly patrol your community. Get to know them by volunteering at police events or inviting them to your own.

Create an environment where the police don’t see a young black man but Mr. H’s son (you know Mr. H, he helped with the litter campaign last year, etc.).

Getting to know the local police and getting the police to know your community won’t solve every problem but it may lower the fear level enough to save lives, one of which may be your own.

You won’t be any worse off and on the up side, enough good community relations may result in the police being on your side when it is time to oust the fear mongers.

XQilla-2.3.2 – Tooling up for 2016 (Part 2) (XQuery)

Filed under: Virtual Machines,XML,XQilla,XQuery — Patrick Durusau @ 5:03 pm

As I promised yesterday, a solution to the XQilla-2.3.2 installation problem!

Use a virtual machine running the latest version of Ubuntu (15.10), which has the libraries required to build XQilla!

I use VirtualBox from Oracle but people also use VMware.

Virtual machine images come in all manner of configurations, so you are likely to spend some time loading Linux headers and the like before you can compile software.

The advantage of a virtual machine is that I don’t risk doing something dumb, or something born of fatigue, to my working setup. If I have to blow away the entire virtual machine, it takes only a few minutes to download another one.

Well, that’s true on any day other than New Year’s Day, as I found out today. I don’t know if people were streaming that many football games or streaming live “acts” of some sort, but the Net was very slow today.

Introducing XQuery to humanists, librarians and reporters using a VM with the usual XQuery suspects pre-loaded would be very cool!

Great way to distribute xqueries* and shell scripts that run them for immediate results.
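
For example (a sketch only, assuming the VM ships the favorite-R-packages.xml file from the post above, with the package name in the first cell, and an engine such as XQilla or BaseX to run queries), a ready-to-run query could be as simple as:

(: list-packages.xq : list the R package names in alphabetical order :)
<ul>{
for $row in doc("favorite-R-packages.xml")/table/tr
order by lower-case(string($row/td[1]))
return <li>{string($row/td[1])}</li>
}</ul>

A one-line shell script beside it could feed the query to whichever engine the VM carries, so a newcomer gets an HTML list back before reading a single manual page.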

If you have any thoughts about what such a VM should contain, etc., drop me an email patrick@durusau.net or leave a comment. Thanks!

PS: A search for “xqueries” returned approximately 26K hits, and “xquerys” returned approximately 1,700. Usage favors the plural “xqueries,” so that is what I am following. At the first of a sentence, XQueries?

PPS: I could have written this without the woes of failed downloads, missing header files, etc. but I wanted to know for myself that Ubuntu (15.10) with all the appropriate header files would in fact compile XQilla-2.3.2.

You may need this line (run as root, or prefix it with sudo) to get all the headers:

apt-get install dkms build-essential linux-headers-generic

Not to mention that I would update everything before trying to compile software. Hard to say how long your VM has been on the shelf.
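
Once XQilla finally compiles, a quick sanity check is worth the extra minute. A minimal sketch, assuming the command-line xqilla binary ended up on your PATH, is a trivial query saved as, say, smoke-test.xq:

(: smoke-test.xq : trivial query to confirm the freshly built XQilla runs :)
<check date="{current-date()}">{1 + 1}</check>

Something like xqilla smoke-test.xq should print the check element with 2 inside it; if it does, the headers, libraries and build are all in working order.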
