Archive for the ‘Measurement’ Category

4 [5] ways to misinterpret your measurement

Saturday, March 5th, 2016

4 ways to misinterpret your measurement by Katie Paine.

I mention this primarily because of the great graphic and a fifth way to misinterpret data that Katie doesn’t mention.


The misinterpretations Katie mentions are important; see her post for those.

The graphic, on the other hand, illustrates misinterpretation by not understanding the data.

The use of English, integers, etc., provides no assurances you will “understand” the data.

Not “understanding” the data, you are almost certain to misinterpret it.

I first saw this in a tweet by Kirk Borne.

Facts vs. Expert Opinion

Saturday, May 3rd, 2014

In a recent story about randomized medical trials:

“I should leave the final word to Archie Cochrane. In his trial of coronary care units, run in the teeth of vehement opposition, early results suggested that home care was at the time safer than hospital care. Mischievously, Cochrane swapped the results round, giving the cardiologists the (false) message that their hospitals were best all along.

“They were vociferous in their abuse,” he later wrote, and demanded that the “unethical” trial stop immediately. He then revealed the truth and challenged the cardiologists to close down their own hospital units without delay. “There was dead silence.”

Followed by Harford’s closing line: “The world often surprises even the experts. When considering an intervention that might profoundly affect people’s lives, if there is one thing more unethical than running a randomised trial, it’s not running the trial.”

One of the persistent dangers of randomized trials is that the results can contradict what is “known” to be true by experts.

Another reason for user rather than c-suite “testing” of product interfaces, assuming the c-suite types are willing to hear “bad” news.

And a good illustration that claims of “ethics” can be hiding less pure concerns.

I first saw this in A brilliant anecdote on how scientists react to science against their interests by Chris Blattman, which led me to: Weekly Links May 2: Mobile phones, working with messy data, funding, working with children, and more… and thence to the original post: The random risks of randomised trials by Tim Harford.

The Units Ontology: a tool for integrating units of measurement in science

Sunday, October 14th, 2012

The Units Ontology: a tool for integrating units of measurement in science by Georgios V. Gkoutos, Paul N. Schofield, and Robert Hoehndorf. (Database (2012) 2012: bas033, doi: 10.1093/database/bas033)


Units are basic scientific tools that render meaning to numerical data. Their standardization and formalization caters for the report, exchange, process, reproducibility and integration of quantitative measurements. Ontologies are means that facilitate the integration of data and knowledge allowing interoperability and semantic information processing between diverse biomedical resources and domains. Here, we present the Units Ontology (UO), an ontology currently being used in many scientific resources for the standardized description of units of measurements.

As the paper acknowledges, there are many measurement systems in use today.

Which leaves me puzzled as to what happens to data that follows some other drummer than this one.

I assume any coherent system has no difficulty integrating data written in that system.

So how does adding another coherent system assist in that integration?

Unless everyone universally moves to the new system. Unlikely, don’t you think?
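
To make the question concrete, here is a minimal sketch in Python of what integrating across unit vocabularies involves. The source names, labels, and `EX:` identifiers are hypothetical, invented for illustration; none of them come from the Units Ontology or any other real vocabulary. The point is only that records merge once each system's labels are mapped to something shared, whichever system supplies it.

```python
# Hypothetical sketch: two sources name the same unit differently.
# Every label and identifier here is made up for illustration.

# Per-source mapping from a local unit label to a shared identifier.
MAPPINGS = {
    "lab_a": {"mg": "EX:milligram", "g": "EX:gram"},
    "lab_b": {"milligram": "EX:milligram", "gram": "EX:gram"},
}


def shared_unit(source, label):
    """Return the shared identifier for a label, or None if unmapped."""
    return MAPPINGS.get(source, {}).get(label)


def same_unit(record_a, record_b):
    """True only if both (source, label) pairs map to the same identifier."""
    unit_a = shared_unit(*record_a)
    unit_b = shared_unit(*record_b)
    return unit_a is not None and unit_a == unit_b
```

Data from a third source, say `lab_c`, stays unmatched until someone writes its mapping, which is the "other drummer" problem in miniature.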

Measurement = Meaningful?

Saturday, July 7th, 2012

A two-part series of posts on data and education has started up at Hortonworks. Data in Education (Part I) by James Locus.

From the post:

The education industry is transforming into a 21st century data-driven enterprise. Metrics based assessment has been a powerful force that has swept the national education community in response to widespread policy reform. Passed in 2001, the No-Child-Left-Behind Act pushed the idea of standards-based education whereby schoolteachers and administrators are held accountable for the performance of their students. The law elevated standardized tests and dropout rates as the primary way officials measure student outcomes and achievement. Underperforming schools can be placed on probation, and if no improvement is seen after 3-4 years, the entire staff of the school can be replaced.

The political ramifications of the law inspire much debate amongst policy analysts. However, from a data perspective, it is more informative to understand how advances in technology can help educators both meet the policy’s guidelines and work to create better student outcomes.

How data measurement can drive poor management practices is captured in:

whereby schoolteachers and administrators are held accountable for the performance of their students.

Really? The only people who are responsible for the performance of students are schoolteachers and administrators?

Recall that schoolteachers don’t see a child until the child is at least four or five years old, by which time most learning and behavior patterns have been well established: by parents, by advertisers, by TV shows, by poor diets, by poor health care, etc.

And when they do see children, it is only for seven hours out of twenty-four.

Schoolteachers and administrators are in a testable situation, which isn’t the same thing as a situation where tests are meaningful.

As data “scientists” we can crunch the numbers given to us and serve the industry’s voracious appetite for more numbers.

Or we can point out that better measurement design could result in different policy choices.

Depends on your definition of “scientist.”

There were people who worked for Big Tobacco who still call themselves “scientists.”

What do you think?

The inevitable perversion of measurement

Friday, April 13th, 2012

The inevitable perversion of measurement

From the post:

Supposedly one of the tactics in the fight against obesity is to change how we measure obesity (from BMI to DXA): that’s the key message in an LA Times article (link).

This is a great read if only because it covers many common problems of measurement systems. In thinking about invented metrics, such as SAT scores, employee performance ratings and teacher ratings, bear in mind they only have names because we gave them names.

Measuring things always lead to perverse behavior. Here are some examples straight out of this article:

The list of “perversions” includes:

1. The metric, even if accurately measured, has no value

2. Blame the failure of a program on the metric

3. A metric becomes more complicated over time

If I am looking for “perversion” I am likely to skip this channel. 😉

On the other hand, the post does list some of the issues relative to our attempts at measurement.

Measurement is an important component for the judging of similarity and sameness.

Can you find or point out other posts addressing issues with measurement (perverse or not)?

Catalog QUDT

Wednesday, October 5th, 2011

Catalog QUDT

From the website:

The QUDT, or ‘Quantity, Unit, Dimension and Type’ collection of ontologies define base classes, properties, and instances for modeling physical quantities, units of measure, and their dimensions in various measurement systems. The goal of the QUDT collection of models is to provide a machine-processable approach for specifying measurable quantities, units for measuring different kinds of quantities, the numerical values of quantities in different units of measure and the data structures and data types used to store and manipulate these objects in software. A simple treatment of units is separated from a full dimensional treatment of units. Vocabulary graphs will be used to organize units for different disciplines.

Useful in a number of domains. Comparison to other measurement ontology efforts should prove to be interesting.
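
As a rough illustration of what a "dimensional treatment of units" buys you, here is a minimal Python sketch. The `Unit` class, its fields, and the three unit instances are my own assumptions for the example, not QUDT's classes, properties, or identifiers. Each unit carries a dimension vector and a multiplier to a base unit, and conversion is refused when the dimensions differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    name: str
    dimension: tuple  # exponents of (length, mass, time); illustrative only
    to_base: float    # multiplier converting this unit to its base unit


# Hypothetical unit instances, not taken from QUDT.
METRE = Unit("metre", (1, 0, 0), 1.0)
FOOT = Unit("foot", (1, 0, 0), 0.3048)
KILOGRAM = Unit("kilogram", (0, 1, 0), 1.0)


def convert(value, source, target):
    """Convert a value between units, refusing mismatched dimensions."""
    if source.dimension != target.dimension:
        raise ValueError(f"cannot convert {source.name} to {target.name}")
    return value * source.to_base / target.to_base
```

Calling `convert(10, FOOT, METRE)` goes through the base unit, while `convert(1, METRE, KILOGRAM)` raises a `ValueError`, which is the kind of check a machine-processable unit model makes possible.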

Introduction to Restricted Boltzmann Machines

Saturday, October 1st, 2011

Introduction to Restricted Boltzmann Machines

While I was at Edwin Chen’s blog, I discovered this post on Restricted Boltzmann Machines which begins:

Suppose you ask a bunch of users to rate a set of movies on a 0-100 scale. In classical factor analysis, you could then try to explain each movie and user in terms of a set of latent factors. For example, movies like Star Wars and Lord of the Rings might have strong associations with a latent science fiction and fantasy factor, and users who like Wall-E and Toy Story might have strong associations with a latent Pixar factor.

Restricted Boltzmann Machines essentially perform a binary version of factor analysis. (This is one way of thinking about RBMs; there are, of course, others, and lots of different ways to use RBMs, but I’ll adopt this approach for this post.) Instead of users rating a set of movies on a continuous scale, they simply tell you whether they like a movie or not, and the RBM will try to discover latent factors that can explain the activation of these movie choices.

Not for the novice user but something you may run across in the analysis of data sets or need yourself. Excellent pointers to additional resources.
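
For a feel of the "binary factor analysis" idea in the excerpt, here is a small Python sketch of one step only: given a user's like/dislike choices, compute the probability that each latent factor switches on, using the standard RBM conditional (a sigmoid of the bias plus the weighted sum of active visible units). The weights and biases below are hand-picked assumptions for illustration; a real RBM would learn them from data, and training is not shown.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probabilities(visible, weights, hidden_bias):
    """P(hidden unit j is on | visible), one probability per hidden unit.

    visible: list of 0/1 movie likes
    weights: weights[i][j] links movie i to latent factor j
    """
    probs = []
    for j, bias in enumerate(hidden_bias):
        activation = bias + sum(v * weights[i][j] for i, v in enumerate(visible))
        probs.append(sigmoid(activation))
    return probs


# Hypothetical, hand-picked weights (a trained RBM would learn these).
# Movies (rows): Star Wars, Lord of the Rings, Wall-E, Toy Story
# Factors (columns): "science fiction and fantasy", "Pixar"
WEIGHTS = [
    [4.0, -2.0],
    [4.0, -2.0],
    [-2.0, 4.0],
    [-2.0, 4.0],
]
HIDDEN_BIAS = [-2.0, -2.0]

likes = [1, 1, 0, 0]  # likes Star Wars and Lord of the Rings only
print(hidden_probabilities(likes, WEIGHTS, HIDDEN_BIAS))
```

With these made-up weights, the first factor comes out close to 1 and the second close to 0 for this user, which is the sense in which hidden units "explain" the pattern of movie choices.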