## Archive for the ‘Proofing’ Category

Monday, February 1st, 2016

It’s important to get things right in writing as well as code.

Where do you think you fall on a scale of 1 to 10, with 10 being nothing gets by you?

Think of the last time you saw a sign for a 7-Eleven store. If you don’t know what that is, try this link for images of 7-Eleven stores. Thousands of them.

This image is spoiler space:

As is this one:

Have no fear, that’s not the image from Snow Crash. :-)

Here is an annotated 7-Eleven image posted by Michael G. Munz on Twitter.

So, how did you do?

Were you paying close enough attention?

Personal confession: I grew up in the land of 7-Eleven and never gave it a second thought. Ever.

In case you want to explore more, see 7-Eleven’s corporate site.

### Estimating “known unknowns”

Saturday, December 12th, 2015

Estimating “known unknowns” by Nick Berry.

From the post:

There’s a famous quote from former Secretary of Defense Donald Rumsfeld:

“… there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns, the ones we don’t know we don’t know.”

I write this blog. I’m an engineer. Whilst I do my best and try to proof read, often mistakes creep in. I know there are probably mistakes in just about everything I write! How would I go about estimating the number of errors?

The idea for this article came from a book I recently read by Paul J. Nahin, entitled Duelling Idiots and Other Probability Puzzlers (in turn, referencing earlier work by the eminent mathematician George Pólya).

Imagine I write a (non-trivially short) document and give it to two proof readers to check. These two readers (independently) proof read the manuscript looking for errors, highlighting each one they find.

Just like me, these proof readers are not perfect. They, also, are not going to find all the errors in the document.

Because they work independently, there is a chance that reader #1 will find some errors that reader #2 does not (and vice versa), and there could be errors that are found by both readers. What we are trying to do is get an estimate for the number of unseen errors (errors detected by neither of the proof readers).*

*An alternate way of thinking of this is to get an estimate for the total number of errors in the document (from which we can subtract the distinct number of errors found to give an estimate of the number of unseen errors).
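The excerpt stops before the arithmetic, so here is a minimal sketch of the usual two-reader (capture-recapture style) estimate. It is my own illustration, not code from Nick’s post, and it leans on the stated independence assumption: if reader #1 finds A errors, reader #2 finds B, and C are found by both, the total is roughly A × B / C. The function name and the sample data are made up for the example.

```python
def estimate_errors(found_by_1, found_by_2):
    """Estimate (total, unseen) error counts from two independent readers.

    Each argument is a collection of error identifiers (e.g. page/line
    positions). Assumes the readers work independently and that every
    error is equally easy to spot, which is rarely exactly true.
    """
    a = set(found_by_1)
    b = set(found_by_2)
    both = a & b
    if not both:
        # With no overlap, A * B / C would divide by zero.
        raise ValueError("no errors found by both readers; estimate undefined")
    total = len(a) * len(b) / len(both)   # estimated total errors
    distinct_found = len(a | b)           # errors seen by at least one reader
    unseen = total - distinct_found       # estimated errors seen by neither
    return total, unseen


# Hypothetical data: errors are numbered only so the sets can overlap.
reader_1 = set(range(0, 20))   # reader #1 finds 20 errors
reader_2 = set(range(8, 23))   # reader #2 finds 15 errors, 12 in common

total, unseen = estimate_errors(reader_1, reader_2)
print(f"estimated total: {total:.1f}, estimated unseen: {unseen:.1f}")
```

With these made-up numbers the readers found 23 distinct errors between them, and the estimate says about 25 exist, leaving roughly 2 that neither reader caught. The smaller the overlap, the shakier the estimate.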

A highly entertaining post on estimating “known unknowns,” such as the number of errors in a paper that has been proofed by two independent proof readers.

Of more than passing interest to me because I am involved in a New Testament Greek Lexicon project that is an XML encoding of a 500+ page Greek lexicon.

The working text is in XML, but not every feature of the original lexicon was captured in markup, and even if every feature had been captured, we would still want to improve upon the features offered by the lexicon. All of which depends upon the correctness of the original markup.

You will find Nick’s analysis interesting and more than that, memorable. Just in case you are asked about “estimating ‘known unknowns’” in a data science interview.

Only Rumsfeld could tell you how to estimate an “unknown unknown.” I think it goes: “Watch me pull a number out of my ….”

:-)

I found this post by following another post at this site, which was cited by Data Science Renee.

### Bugs, features, and risk

Thursday, January 12th, 2012

Bugs, features, and risk by John D. Cook.

All software has bugs. Someone has estimated that production code has about one bug per 100 lines. Of course there’s some variation in this number. Some software is a lot worse, and some is a little better.

But bugs-per-line-of-code is not very useful for assessing risk. The risk of a bug is the probability of running into it multiplied by its impact. Some lines of code are far more likely to execute than others, and some bugs are far more consequential than others.

Devoting equal effort to testing all lines of code would be wasteful. You’re not going to find all the bugs anyway, so you should concentrate on the parts of the code that are most likely to run and that would produce the greatest harm if they were wrong.
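To make “probability multiplied by impact” concrete, here is a small sketch that ranks parts of a program by that product. The component names, probabilities, and impact figures are invented for illustration; they are not from John’s post, and in practice both numbers would be rough guesses.

```python
def rank_by_risk(components):
    """Sort (name, probability, impact) tuples by risk, highest first.

    Risk is taken as probability of hitting a bug times the cost if it
    is hit, as described in the quoted passage.
    """
    return sorted(components, key=lambda c: c[1] * c[2], reverse=True)


# Hypothetical components: (name, chance of hitting a bug, impact in arbitrary units)
components = [
    ("login", 0.30, 50.0),          # runs often, moderate harm
    ("report export", 0.05, 20.0),  # rarely runs, little harm
    ("billing", 0.10, 400.0),       # runs less often, serious harm
]

for name, probability, impact in rank_by_risk(components):
    print(f"{name}: risk = {probability * impact:.1f}")
```

In this made-up case billing comes out on top even though login runs three times as often, which is the point of the passage: test where the product is largest, not where the line count is.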

Has anyone done error studies on RDF/OWL/LinkedData? Asking because obviously topic maps, Semantic Web, and other semantic applications are going to have errors.

Some obvious questions:

• What data is most critical to be correct?
• What is your acceptable error rate? (0 is not an acceptable answer)
• What is the error rate for data entry with your application?

If you are interested in error correction, in semantic contexts or otherwise, start with General Error Detection, a set of pages maintained by Roy Panko.

From the General Error Detection homepage:

Proofreading catches about 90% of all nonword spelling errors and about 70% of all word spelling errors. The table below shows that error detection varies widely by the type of task being done.

In general, our error detection rate only approaches 90% for simple mechanical errors, such as mistyping a number.

For logic errors, error detection is far worse, often 50% or less.

For omission errors, where we have left something out, correction rates are very low.
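As a back-of-the-envelope illustration of what those rates imply, the sketch below computes how many errors would be expected to survive a review at a given detection rate. The starting count of 100 errors is made up, and treating repeated passes as independent is my own simplifying assumption (it is probably optimistic, since the same errors tend to be missed each time).

```python
def expected_remaining(n_errors, detection_rate, passes=1):
    """Expected errors left after `passes` reviews at `detection_rate`.

    Assumes each pass independently catches the same fraction of the
    errors still present, which is a simplification.
    """
    if not 0.0 <= detection_rate <= 1.0:
        raise ValueError("detection_rate must be between 0 and 1")
    if passes < 0:
        raise ValueError("passes must be non-negative")
    return n_errors * (1.0 - detection_rate) ** passes


# Detection rates quoted above; the 100 starting errors are hypothetical.
rates = {
    "nonword spelling": 0.90,
    "word spelling": 0.70,
    "logic": 0.50,
}

for kind, rate in rates.items():
    left = expected_remaining(100, rate)
    print(f"{kind}: about {left:.0f} of 100 errors remain after one pass")
```

Even at the best quoted rate, one pass over 100 errors leaves around 10 behind, and at the logic-error rate it leaves around 50.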