Archive for the ‘Books’ Category

My Last Index (Is Search A Form of Discrimination?)

Tuesday, June 20th, 2017

My Last Index by Judith Pascoe.

From the post:

A casual reader of authors’ acknowledgment pages will encounter expressions of familial gratitude that paper over years of spousal neglect and missed cello recitals. A keen reader of those pages may happen upon animals that were essential to an author’s well-being—supportive dogs, diverting cats, or, in one instance, “four very special squirrels.” But even an assiduous reader of acknowledgments could go a lifetime without coming across a single shout-out to a competent indexer.

That is mostly because the index gets constructed late in the book-making process. But it’s also because most readers pay no mind to indexes, especially at this moment in time when they are being supplanted by Amazon and Google. More and more, when I want to track down an errant tidbit of information about a book, I use Amazon’s “Search inside this book” function, which allows interested parties to access a book’s front cover, copyright, table of contents, first pages (and sometimes more), and index. But there’s no reason to even use the index when you can “Look Inside!” to find anything you need.

I had plenty of time to ponder the unsung heroism of indexers when I was finishing my latest book. Twice before, I had assembled an indexer’s tools of trade: walking down the stationery aisles of a college book store, pausing to consider the nib and color of my Flair pens, halting before the index cards. But when I began work on this index, I was overcome with thoughts of doom that Nancy Mulvany, author of Indexing Books, attributes to two factors that plague self-indexing authors: general fatigue and too much self-involvement. “Intense involvement with one’s book,” Mulvany writes, “can make it very difficult to anticipate the index user’s needs accurately.”

Perhaps my mood was dire because I’d lost the services of my favorite proofreader, a woman who knew a blackberry from a BlackBerry, and who could be counted on to fix my flawed French. Perhaps it was because I was forced to notice how often I’d failed to include page citations in my bibliography entries, and how inconsistently I’d applied the protocol for citing Web sites—a result of my failure to imagine a future index user so needy as to require the exact date of my visit to Or perhaps it was because my daughter was six months away from leaving home for college and I was missing her in advance.

Perhaps for all of those reasons, I could only see my latest index as a running commentary on the fragility of all human endeavor. And so I started reading indexes while reluctantly compiling my own.

A highly instructive tale on the importance of indexing (and hiring a professional indexer) that includes this reference to Jonathan Swift:

Jonathan Swift, in his 1704 A Tale of a Tub, describes two means of using books: “to serve them as men do lords—learn their titles exactly and then brag of their acquaintance,” or “the choicer, the profounder and politer method, to get a thorough insight into the index, by which the whole book is governed and turned, like fishes by the tail.”

In full context, the Swift passage is even more amusing:

The whole course of things being thus entirely changed between us and the ancients, and the moderns wisely sensible of it, we of this age have discovered a shorter and more prudent method to become scholars and wits, without the fatigue of reading or of thinking. The most accomplished way of using books at present is twofold: either first to serve them as some men do lords, learn their titles exactly, and then brag of their acquaintance; or, secondly, which is indeed the choicer, the profounder, and politer method, to get a thorough insight into the index by which the whole book is governed and turned, like fishes by the tail. For to enter the palace of learning at the great gate requires an expense of time and forms, therefore men of much haste and little ceremony are content to get in by the back-door. For the arts are all in a flying march, and therefore more easily subdued by attacking them in the rear. Thus physicians discover the state of the whole body by consulting only what comes from behind. Thus men catch knowledge by throwing their wit on the posteriors of a book, as boys do sparrows with flinging salt upon their tails. Thus human life is best understood by the wise man’s rule of regarding the end. Thus are the sciences found, like Hercules’ oxen, by tracing them backwards. Thus are old sciences unravelled like old stockings, by beginning at the foot. (The Tale of a Tub by Jonathan Swift)

Searching, as opposed to indexing (good indexing at any rate), is the equivalent of bragging of the acquaintance of a lord. Yes, you did find term A or term B in the text, but you don’t know what other terms appear in the text, nor do you know what other statements were made about term A or term B.

Search is at best a partial solution and one that varies based on the skill of the searcher.

Indexing, on the other hand, can reflect an accumulation of insights, made equally available to all readers.
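To make the contrast concrete, here is a minimal Python sketch of my own (not from Pascoe’s piece). A search answers only the question you thought to ask, while an index lays out every heading the indexer judged worth recording. The page texts and index headings below are invented for illustration.

```python
# Toy illustration: full-text search vs. a hand-built back-of-book index.
# The page texts and the index entries are invented for this sketch.
pages = {
    1: "Swift mocks readers who learn only the titles of books.",
    2: "An index governs the whole book, like fishes by the tail.",
    3: "Swift returns to the index as the back-door to learning.",
}


def search(term):
    """Return the page numbers whose text contains the term (case-insensitive)."""
    return sorted(p for p, text in pages.items() if term.lower() in text.lower())


# An indexer's judgment: which headings matter and where they are discussed,
# including pages that never use the heading's exact words.
index = {
    "index, as shortcut to learning": [2, 3],
    "Swift, Jonathan": [1, 3],
    "titles, bragging of": [1],
}


def browse(index):
    """Return every heading alphabetically with its pages; no query needed."""
    return sorted(index.items(), key=lambda item: item[0].lower())


hits = search("index")    # only what the searcher thought to ask for
headings = browse(index)  # everything the indexer recorded
```

Note that `search("bragging")` finds nothing, because the word never appears in the text, while the index sends every reader to page 1 under “titles, bragging of.”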

Hmmm, equally made available to all readers.

Is search a form of discrimination?

Is search a type of access with disproportionate (read disadvantageous) impact on some audiences and not others?

Any research on the social class, racial, ethnic impact of search you would suggest?

All leads and tips appreciated!

Are You A Serious Reader?

Saturday, June 17th, 2017

What does it mean for a journalist today to be a Serious Reader? by Danny Funt.

From the post:

BEFORE THE BOOKS ARRIVED, Adam Gopnik, in an effort to be polite, almost contradicted the essential insight of his life. An essayist, critic, and reporter at The New Yorker for the last 31 years, he was asked whether there is an imperative for busy, ambitious journalists to read books seriously—especially with journalism, and not just White House reporting, feeling unusually high-stakes these days—when the doorbell rang in his apartment, a block east of Central Park. He came back with a shipment and said, “It would be,” pausing to think of and lean into the proper word, “brutally unkind and unrealistic to say, Oh, all of you should be reading Stendhal. You’ll be better BuzzFeeders for it.” For the part about the 19th-century French novelist, he switched from his naturally delicate voice to a buffoonish, apparently bookish, baritone.

Then, as he tore open the packaging of two nonfiction paperbacks (one, obscure research for an assignment on Ernest Hemingway; the other, a new book on Adam Smith, a past essay subject) and sat facing a wall-length bookcase and sliding ladder in his heavenly, all-white living room, Gopnik took that back. His instinct was to avoid sermonizing about books, particularly to colleagues with grueling workloads, because time for books is a privilege of his job. And yet, to achieve such an amazingly prolific life, the truth is he simply read his way here.

I spoke with a dozen accomplished journalists of various specialties who manage to do their work while reading a phenomenal number of books, about and beyond their latest project. With journalists so fiercely resented after last year’s election for their perceived elitist detachment, it might seem like a bizarre response to double down on something as hermetic as reading—unless you see books as the only way to fully see the world.

Being well-read is a transcendent achievement similar to training to run 26.2 miles, then showing up for a marathon in New York City and finding 50,000 people there. It is at once superhuman and pedestrian.

… (emphasis in original)

A deeply inspirational and instructive essay on serious readers and the benefits that accrue to them. Very much worth two or more slow reads, plus looking up the authors, writers and reporters who are mentioned.

Earlier this year I began the 2017 Women of Color Reading Challenge. I have not discovered any technical insights into data science or topic maps, but I am gaining, incrementally for sure, a deeper appreciation for how race and gender shape a point of view.

Or perhaps more accurately, I am encountering points of view different enough from my own that I recognize them as being different. That in and of itself, the encountering of different views, is one reason I aspire to become a “serious reader.”


Digitised Manuscripts hyperlinks Spring 2017

Thursday, June 1st, 2017

Digitised Manuscripts hyperlinks Spring 2017

From the post:

From ancient papyri to a manuscript given by the future Queen Elizabeth I to King Henry VIII for New Year’s Day, from books written entirely in gold to Leonardo da Vinci’s notebook, there is a wealth of material on the British Library’s Digitised Manuscripts site. At the time of writing, you can view on Digitised Manuscripts no fewer than 1,783 manuscripts made in Europe before 1600, and more are being added all the time. For a full list of what is currently available, please see this file: Download PDF of Digitised MSS Spring 2017. This is also available in the form of a spreadsheet (although this format can not be downloaded on all web browsers): Download Spreadsheet of Digitised MSS Spring 2017.

The post is replete with guidance on use of the Digitised Manuscripts and other aids for the reader.

These works won’t interest Washington illiterati, but I don’t read to please others, only myself.

So should you.

The Marshall Index: A Guide to Negro Periodical Literature, 1940-1948

Tuesday, May 2nd, 2017

The Marshall Index: A Guide to Negro Periodical Literature, 1940-1948 by Albert P. Marshall, revised edition, Danky and Newman, 2002. Posted by ProQuest as a guide to their literature collections.

From the introduction:

For researchers today, one of the rewarding aspects of Marshall’s Guide, and an important one, is the number of obscure, little-collected, and discontinued African-American serials that he includes. Who today is familiar, for example, with Pulse, Service, New Vistas, Negro Traveler, Informer, Whetstone, Sphinx, Ivy Leaf, or Oracle? Until the large and comprehensive bibliography of black periodicals collected and edited by James P. Danky and Maureen Hady of the State Historical Society of Wisconsin and published by Harvard University Press is widely disseminated, few will even know the existence of many of these rare sources.

Superseded in some sense by African American newspapers and periodicals : a national bibliography by James P. Danky, but only in a sense.

The Marshall Index will always remain the first index of Black periodical literature and reflect the choices and judgments of its author.

Pass this along to your librarian friends and anyone interested in Black literature.

Textbook manifesto

Sunday, April 9th, 2017

Textbook manifesto by Allen B. Downey.

From the post:

My textbook manifesto is so simple it sounds stupid. Here it is:

Students should read and understand textbooks.

That’s it. It’s hard to imagine that anyone would disagree, but here’s the part I find infuriating: the vast majority of textbook authors, publishers, professors and students behave as if they do not expect students to read or understand textbooks.

Here’s how it works. Most textbook authors sit down with the goal writing the bible of their field. Since it is meant to be authoritative, they usually stick to well-established ideas and avoid opinion and controversy. The result is a book with no personality.

For publishers, the primary virtue is coverage. They want books that can be used for many classes, so they encourage authors to include all the material for all possible classes. The result is a 1000-page book with no personality.
… (emphasis in original)

You probably know Downey from his Think Python, Think Bayes books.

Think Python, with the index, front matter, etc. runs 244 pages from tip to tail.

That is longer than his proposed 10 pages per week for a semester course (about 140 pages for a class), but not unreasonably so.

Take this as encouragement that a useful book need not be comprehensive; it need only communicate effectively more than the reader already knows.

Notes to (NUS) Computer Science Freshmen…

Monday, March 13th, 2017

Notes to (NUS) Computer Science Freshmen, From The Future

From the intro:

Early into the AY12/13 academic year, Prof Tay Yong Chiang organized a supper for Computer Science freshmen at Tembusu College. The bunch of seniors who were gathered there put together a document for NUS computing freshmen. This is that document.

Feel free to create a pull request to edit or add to it, and share it with other freshmen you know.

There is one sad note:

The Art of Computer Programming (a review of everything in Computer Science; pretty much nobody, save Knuth, has finished reading this)

When you think about the amount of time Knuth has spent researching, writing and editing The Art of Computer Programming (TAOCP), it doesn’t sound unreasonable to expect others, a significant number of others, to have read it.

Any online reading groups focused on TAOCP?

BBC News Could Do Better: Scottish witchcraft book published online

Friday, November 4th, 2016

Scottish witchcraft book published online

From the post:

The Names of Witches in Scotland, 1658 collection, was drawn up during a time when the persecution of supposed witches was rife.

The book also lists the towns where the accused lived and notes of confession.

It is believed many were healers, practicing traditional folk medicine.

Some of the notes give small insights into the lives of those accused.

It is recorded that the spouse of Agnes Watsone, from Dumbarton, is “umquhile” (deceased).

A majority of those accused of witchcraft were women although the records reveal that some men were also persecuted.

Jon Gilchreist and Robert Semple, from Dumbarton, are recorded as sailors. A James Lerile of Alloway, Ayr, is noted as “clenged”, in other words cleaned or made clean.

While Mr Lerile’s fate is unclear, the term probably meant banishment or death.

I’m glad BBC News drew attention to this volume but the only links in the post go to a very annoying commercial site that has transcribed the work.


With very little effort, I can send you to images of the original:

Names of the witches (in Scotland) 1658.

Some readers (cough) may find the commercial service useful. OK, but BBC News should include links to originals, especially when those are sans annoying subscription requests.

The GCHQ Puzzle Book

Friday, November 4th, 2016

The GCHQ Puzzle Book

The Amazon description:

If 3=T, 4=S, 5=P, 6=H, 7=H … what is 8?

What is the next letter in the sequence: M, V, E, M, J, S, U, ?

Which of the following words is the odd one out: CHAT, COMMENT, ELF, MANGER, PAIN, POUR?

GCHQ is a top-secret intelligence and security agency which recruits some of the very brightest minds. Over the years, their codebreakers have helped keep our country safe, from the Bletchley Park breakthroughs of WWII to the modern-day threat of cyberattack. So it comes as no surprise that, even in their time off, the staff at GCHQ love a good puzzle. Whether they’re recruiting new staff or challenging each other to the toughest Christmas quizzes and treasure hunts imaginable, puzzles are at the heart of what GCHQ does. Now they’re opening up their archives of decades’ worth of codes, puzzles and challenges for everyone to try.
(emphasis in original)

Hard to say if successful completion of the GCHQ Puzzle Book or hacking into GCHQ would be the better way to introduce yourself to the GCHQ.

Depends on which department within GCHQ captures your interest. 😉

Be aware that some pedestrian agencies and their personnel view intrusion into government computers to be a crime and punishable as such.

More sophisticated agencies/personnel realize that “…in Jersey, anything is legal so long as you don’t get caught” and/or if you have something of sufficient value to trade.

The “rule of law,” and “letter of the law” stuff is for groundlings. Don’t be a groundling.

How To Use Data Science To Write And Sell More Books (Training Amazon)

Sunday, October 30th, 2016

From the description:

Chris Fox is the bestselling author of science fiction and dark fantasy, as well as non-fiction books for authors including Write to Market, 5000 words per hour and today we’re talking about his next book, Six Figure Author: Using data to sell books.

Show Notes What Amazon data science, and machine learning, are and how authors can use them. How Amazon differs from the other online book retailers and how authors can train Amazon to sell more books. What to look for to find a voracious readership. Strategically writing to market and how to know what readers are looking for. On Amazon ads and when they are useful. Tips on writing faster. The future of writing, including virtual reality and AI help with story.

Joanna Penn of The Creative Penn interviews Chris Fox

Some of the highlights:

Training Amazon To Work For You

…What you want to do is figure out, with as much accuracy as possible, who your target audience is.

And when you start selling your book, the number of sales is not nearly as important as who you sell your book to, because each of those sales to Amazon represents a customer profile.

If you can convince them that people who voraciously read in your genre are going to love this book and you sell a couple of hundred copies to people like that, Amazon’s going to take it and run with it. You’ve now successfully trained them about who your audience is because you used good data and now they’re able to easily sell your book.

If, on the other hand, you and your mom buys a copy and your friend at the coffee shop buys a copy, and people who aren’t necessarily into that genre are all buying it, Amazon gets really lost and confused.

Easier said than done but how’s that for taking advantage of someone else’s machine learning?

Chris also has tips for not “polluting” your Amazon sales data.

Discovering and Writing to a Market

How do you find a sub-category or a smaller niche within the Amazon ecosystem? What are the things to look for in order to find a voracious readership?

Chris: What I do is I start looking at the rankings of the number 1, the number 20, 40, 60, 80 and 100 books. You can tell based on where those books are ranked, how many books in the genre are selling. If the number one book is ranked in the top 100 in the store and so is the 20th book, then you’ve found one of the hottest genres on Amazon.

If you find that by the time you get down to number 40, the rank is dropping off sharply, that suggests that not enough books are being produced in that genre and it might be a great place for you to jump in and make a name for yourself. (emphasis in original)
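The sampling heuristic Chris describes can be sketched in a few lines of Python. This is only my reading of it: the function name, the cutoff of 100 for a “hot” genre, and the tenfold jump that counts as “dropping off sharply” are all assumptions for illustration, not figures from the interview.

```python
# Sketch of the rank-sampling heuristic from the interview.
# The cutoffs (100 for "hot", a 10x jump for "drops off sharply") are
# assumptions chosen for illustration, not figures from Chris Fox.
SAMPLE_POSITIONS = (1, 20, 40, 60, 80, 100)


def assess_genre(store_rank_by_position, hot_cutoff=100, drop_factor=10):
    """Classify a genre from the store-wide ranks of its sampled bestsellers.

    store_rank_by_position maps a position in the genre list (1, 20, 40, ...)
    to that book's overall store rank (lower is better).
    """
    ranks = [store_rank_by_position[p] for p in SAMPLE_POSITIONS]
    # Hot: both the 1st and the 20th book sit in the store's top ranks.
    if ranks[0] <= hot_cutoff and ranks[1] <= hot_cutoff:
        return "hot"
    # Underserved: the rank at position 40 is far worse than at position 20.
    if ranks[2] >= drop_factor * ranks[1]:
        return "underserved"
    return "unclear"


example = assess_genre({1: 150, 20: 900, 40: 12000, 60: 30000, 80: 52000, 100: 80000})
```

You would still have to collect the six store ranks by hand (or by whatever means Amazon permits); the sketch only covers the comparison step.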

I know, I know, this is a tough one. Especially for me.

As I have pointed out here on multiple occasions, “terrorism” is largely a fiction of both government and media.

However, if you look at the top 100 paid sellers on terrorism at Amazon, the top fifty (50) don’t have a single title that looks like it denies terrorism is a problem.


Which I take to mean, in terms of selling books, services, or data, the “terrorism is coming for us all” gravy train is the profitable line.

Or at least to indulge in analysis on the basis of “…if the threat of terrorism is real…” and let readers supply their own answers to that question.

There are other valuable tips and asides, so watch the video or read the transcript: How To Use Data Science To Write And Sell More Books With Chris Fox.

PS: As of today, there are 292 podcasts by Joanna Penn.

Everything You Wanted to Know about Book Sales (But Were Afraid to Ask)

Tuesday, July 5th, 2016

Everything You Wanted to Know about Book Sales (But Were Afraid to Ask) by Lincoln Michel.

From the post:

Publishing is the business of creating books and selling them to readers. And yet, for some reason we aren’t supposed to talk about the latter.

Most literary writers consider book sales a half-crass / half-mythological subject that is taboo to discuss.
While authors avoid the topic, every now and then the media brings up book sales — normally to either proclaim, yet again, the death of the novel, or to make sweeping generalizations about the attention spans of different generations. But even then, the data we are given is almost completely useless for anyone interested in fiction and literature. Earlier this year, there was a round of excited editorials about how print is back, baby after industry reports showed print sales increasing for the second consecutive year. However, the growth was driven almost entirely by non-fiction sales… more specifically adult coloring books and YouTube celebrity memoirs. As great as adult coloring books may be, their sales figures tell us nothing about the sales of, say, literary fiction.

Lincoln’s account mirrors my experience (twice) with a small press decades ago.

While you (rightfully) think that every sane person on the planet will forgo the rent in order to purchase your book, sadly your publisher is very unlikely to share that view.

One of the comments to this post reads:

…Writing is a calling but publishing is a business.

Quite so.

Don’t be discouraged by this account but do allow it to influence your expectations, at least about the economic rewards of publishing.

Just in case I get hit with the publishing bug again, good luck to us all!

Free Programming Books – Update

Tuesday, July 5th, 2016

Free Programming Books by Victor Felder.

From the webpage:

This list initially was a clone of stackoverflow – List of Freely Available Programming Books by George Stocker. Now updated, with dead links gone and new content.

Moved to GitHub for collaborative updating.

Great listing of resources!

But each resource stands alone as its own silo. It can (and many do) refer to other materials, even with hyperlinks, but if you want to explore any of them, you must explore them separately. That’s what being in a silo means. You have to start over at the beginning. Every time.

That is complicated by the existence of thousands of slideshows and videos on programming topics not listed here. Search for your favorite programming language at Slideshare and YouTube. There are other repositories of slideshows and videos; those are just examples.

Each one of those slideshows and/or videos is also a silo. Not to mention that with video you need a time marker if you aren’t going to watch every second of it to find relevant material.

What if you could traverse each of those silos, books, posts, slideshows, videos, documentation, source code, seamlessly?

Making that possible for C/C++ now, given the backlog of material, would have a large upfront cost before it could be useful.

Making that possible for languages with shorter histories, well, how useful would it need to be to justify its cost?

And how would you make it possible for others to easily contribute gems that they find?

Something to think about as you wander about in each of these separate silos.


How do you skim through a digital book?

Sunday, June 19th, 2016

How do you skim through a digital book? by Chloe Roberts.

From the post:

We’ve had a couple of digitised books that proved really popular with online audiences. Perhaps partly reflecting the interests of the global population, they’ve been about prostitutes and demons.

I’ve been especially interested in how people have interacted with these popular digitised books. Imagine how you’d pick up a book to look at in a library or bookshop. Would you start from page one, laboriously working through page by page, or would you flip through it, checking for interesting bits? Should we expect any different behaviour when people use a digital book?

We collect data on aggregate (nothing personal or trackable to our users) about what’s being asked of our digitised items in the viewer. With such a large number of views of these two popular books, I’ve got a big enough dataset to get an interesting idea of how readers might be using our digitised books.

Focusing on ‘Compendium rarissimum totius Artis Magicae sistematisatae per celeberrimos Artis hujus Magistros. Anno 1057. Noli me tangere’ (the 18th century one about demons) I’ve mapped the number of page views (horizontal axis) against page number (vertical axis, with front cover at the top), and added coloured bands to represent what’s on those pages.

Chloe captured and then analyzed the reading behavior of readers on two very popular electronic titles.

She explains her second observation:

Observation 2: People like looking at pictures more than text

by suggesting the text being in Latin and German may explain the fondness for the pictures.

Perhaps, but I have heard the same observation made about Playboy magazine. 😉

From a documentation/training perspective, Chloe’s technique, for digital training materials, could provide guidance on:

  • Length of materials
  • Use of illustrations
  • Organization of materials
  • What material is habitually unread?
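For the last item in that list, a minimal Python sketch, assuming you already have aggregate view counts per page from your viewer. The input format, the function name, and the “below 10% of the median” rule are my assumptions, not anything from the original post.

```python
# Sketch: find pages that readers habitually skip, from aggregate view counts.
# The input format and the 10%-of-median threshold are assumptions.
from statistics import median


def unread_pages(views_by_page, total_pages, fraction_of_median=0.1):
    """Return page numbers whose views fall below a fraction of the median."""
    # Pages missing from the data are treated as having zero views.
    counts = [views_by_page.get(page, 0) for page in range(1, total_pages + 1)]
    threshold = median(counts) * fraction_of_median
    return [page for page, count in enumerate(counts, start=1) if count < threshold]


# Invented numbers for an eight-page document; page 8 has no recorded views.
views = {1: 950, 2: 400, 3: 380, 4: 12, 5: 410, 6: 0, 7: 390}
skipped = unread_pages(views, total_pages=8)  # [4, 6, 8]
```

Comparing against the median rather than the mean keeps one heavily viewed cover page from distorting the threshold.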

If critical material isn’t being read, exhorting newcomers to read more carefully is not the answer.

If security and/or on-boarding reading isn’t happening, as shown by reader behavior, that’s your fault, not the readers’.

Your call: successful staff and customers, or failing staff and customers you can blame for security faults and declining sales.

Choose carefully.

Dissertations – Searching Tip

Friday, May 27th, 2016

It has been years since I have ordered a dissertation but I ran across one today that isn’t already on the web.

I landed at ProQuest but there was no obvious place to search for a dissertation.

Ah, that’s because you have to follow “Order Now” before this interface is displayed:


I wasn’t “ready” to order so I missed the obvious link for several minutes.

Tip for ProQuest: Search Dissertations link should be on your homepage. (Who approved your homepage design? Management?)

Hacking Book Sale! To Support the Electronic Frontier Foundation

Wednesday, April 27th, 2016

Humble Books Bundle: Hacking

No Starch Press has teamed up with Humble Bundle to raise money for the Electronic Frontier Foundation (EFF)!

$366 worth of No Starch hacking books on a pay what you want basis!

Charitable opportunities don’t get any better than this!

As I type this post, sales of these bundles have rolled over 6,200!

To help me participate in this sale, consider a donation.


Google BigQuery Public Datasets

Wednesday, March 30th, 2016

Google BigQuery Public Datasets

An amazing set of public datasets, from the post:

  • A Social Security Administration dataset that contains all names from Social Security card applications for births that occurred in the United States after 1879.
  • Data collected by the NYC Taxi and Limousine Commission (TLC) that includes trip records from all trips completed in yellow and green taxis in NYC from 2009 to 2015.
  • A dataset that contains all stories and comments from Hacker News since its launch in 2006.
  • A dataset published by the US Department of Health and Human Services that includes all weekly surveillance reports of nationally notifiable diseases for all U.S. cities and states published between 1888 and 2013.
  • A dataset that contains 3.5 million digitized books stretching back two centuries, encompassing the complete English-language public domain collections of the Internet Archive (1.3M volumes) and HathiTrust (2.2 million volumes).
  • This public dataset was created by the National Oceanic and Atmospheric Administration (NOAA) and includes global data obtained from the USAF Climatology Center. This dataset covers GSOD data between 1929 and 2016, collected from over 9000 stations.

I can readily see myself losing serious time in the GDELT Book Corpus!


Serious Non-Transparency (+ work around)

Tuesday, March 29th, 2016

Yesterday, in my post Courses -> Texts: A Hidden Relationship, I lamented the inability to find courses by their titles.

Finding courses by title would let you easily discover the required/suggested texts for any given course, like browsing a physical campus bookstore.

Obscurity is an “information smell” (to build upon Felienne‘s expansion of code smell to spreadsheets).

In this particular case, the “information smell” is skunk class.

I revisited today to extract its > 1200 bookstores for use in crawling a sample of those sites.

For ugly HTML, view the source of:

Parsing that is going to take time and surely there is an easy way to get a sample of the sites for mining.

The idea didn’t occur to me immediately but I noticed yesterday that the general form of web addresses was:

So, after some flailing about with the HTML from, I searched for “” and requested all the results.

I’m picking a random ten bookstores with law books for further searching.
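Picking the random ten is the easy part. A minimal Python sketch, assuming the search results have already been saved as a list of site addresses; the `example` addresses in the usage line are placeholders, not the real bookstore URLs.

```python
# Sketch: draw a random sample of bookstore sites from saved search results.
import random


def pick_sample(site_urls, k=10, seed=None):
    """Return k distinct sites chosen at random (all of them if fewer than k)."""
    unique = sorted(set(site_urls))  # drop duplicate search hits
    rng = random.Random(seed)        # a fixed seed makes the pick repeatable
    return rng.sample(unique, min(k, len(unique)))


# Placeholder addresses standing in for the real search results.
results = [f"https://store{i}.example/" for i in range(50)]
sample = pick_sample(results, k=10, seed=2016)
```

Passing a seed means the same ten stores come back on a rerun, which helps if the crawl has to be repeated later.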

Not a high priority but I am curious what lies behind the smoke, mirrors, complex HTML and poor interfaces.

Maybe something, maybe nothing. Won’t know unless we look.

PS: Perhaps a better query string: textbooks-and-course-materials

Suggested refinements?

Courses -> Texts: A Hidden Relationship

Monday, March 28th, 2016

Quite by accident I discovered the relationship between courses and their texts is hidden in many (approx. 2000) campus bookstore interfaces.

If you visit a physical campus bookstore you can browse courses for their textbooks. Very useful if you are interested in the subject but not taking the course.

An online LLM (master’s of taxation) flyer prompted me to check the textbooks for the course work.

A simple enough information request. Find the campus bookstore and browse by course for text listings.

Not so fast!

The online presences of over 1200 campus bookstores are delivered, which offers this interface:


Another 748 campus bookstores are delivered by, with a similar interface for textbooks:


I started this post by saying the relationship between courses and their texts is hidden, but that’s not quite right.

The relationship between a meaningless course number and its required/suggested text is visible, but the identification of a course by a numeric string is hardly meaningful to the casual observer. (read not an enrolled student)

Perhaps better to say that a meaningful identification of courses for non-enrolled students and their relationship to required/suggested texts is absent.

That is the relationship of course -> text is present, but not in a form meaningful to anyone other than a student in that course.

Considering that two separate vendors across almost 2,000 bookstores deliberately obscure the course -> text relationship, one has to wonder why.

I don’t have any immediate suggestions but when I encounter systematic obscuring of information across vendors, alarm bells start to go off.

Just for completeness’ sake, you can get around the obscuring of the course -> text relationship by searching for “syllabus LLM taxation income OR estate OR corporate” or “(school name) syllabus LLM taxation income OR estate OR corporate”, then extracting required/suggested texts from the posted syllabi.
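If you want to run that search for several schools or subjects, the query strings are easy to generate. A minimal Python sketch; the function name and its parameters are mine, and “Example University” is a placeholder school.

```python
# Sketch: build the syllabus search strings described above.
def syllabus_query(degree, field, topics, school=None):
    """Build a search-engine query string for posted syllabi."""
    parts = []
    if school:
        parts.append(school)  # optional school name goes first
    parts += ["syllabus", degree, field, " OR ".join(topics)]
    return " ".join(parts)


topics = ["income", "estate", "corporate"]
general = syllabus_query("LLM", "taxation", topics)
by_school = syllabus_query("LLM", "taxation", topics, school="Example University")
```

The first call reproduces the general query above; the second prefixes a school name, as in the second form.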

PS: If you can offer advice on bookstore interfaces, suggest enabling the browsing of courses by name and linking to the required/suggested texts.

During the searches I made writing this post, I encountered a syllabus on basic tax by Prof. Bret Wells which has this quote by Martin D. Ginsburg:

Basic tax, as everyone knows, is the only genuinely funny subject in law school.

Tax law does have an Alice in Wonderland quality about it, but The Hunting of the Snark: an Agony in Eight Fits is probably the closer match.

Amazon Top 20 Books in Data Mining – 18? Low Quality Listicle?

Monday, January 25th, 2016

Amazon Top 20 Books in Data Mining by Matthew Mayo.

Matthew’s bio says:

Bio: Matthew Mayo is a computer science graduate student currently working on his thesis parallelizing machine learning algorithms. He is also a student of data mining, a data enthusiast, and an aspiring machine learning scientist.

So, puzzle me this:

  • Why does this listicle have “Data Science From Scratch: First Principles with Python” by Joel Grus listed twice?
  • Why does David Pogue’s “iPhone: The Missing Manual” appear in this list?

“Data Science From Scratch: First Principles with Python” appears twice because one is paperback and the other is Kindle. Amazon treats those as separate subjects for sales purposes, although to a reader they are more likely a single subject, which has several formats.

The appearance of “iPhone: The Missing Manual” in this listing is a category error.

If you want to generate unproofed listicles of bestsellers, start with the Amazon best link for computer science or choose one of its many sub-categories such as data mining.

The measure of a listicle isn’t how easy it was to generate but how useful it is to the targeted community.

Duplication and irrelevant results detract from the usefulness of a listicle.
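Treating paperback and Kindle as one subject with several formats takes only a few lines. A minimal sketch, assuming each listing is a plain dict with title, author and format keys (that record layout and the function name merge_formats are my assumptions, not Amazon's data model):

```python
def merge_formats(listings):
    """Collapse listings that differ only by format into one entry per work.

    The (title, author) pair stands in for the subject's identity; the
    formats are collected as a property of that single subject.
    """
    merged = {}
    for item in listings:
        key = (item["title"].casefold(), item["author"].casefold())
        entry = merged.setdefault(
            key,
            {"title": item["title"], "author": item["author"], "formats": []},
        )
        if item["format"] not in entry["formats"]:
            entry["formats"].append(item["format"])
    # dicts keep insertion order, so the original ranking order survives
    return list(merged.values())


if __name__ == "__main__":
    sample = [
        {"title": "Data Science From Scratch: First Principles with Python",
         "author": "Joel Grus", "format": "Paperback"},
        {"title": "Data Science From Scratch: First Principles with Python",
         "author": "Joel Grus", "format": "Kindle"},
        {"title": "iPhone: The Missing Manual",
         "author": "David Pogue", "format": "Paperback"},
    ]
    for work in merge_formats(sample):
        print(work["title"], work["formats"])
```

Merging formats fixes the duplication but not the category error: the iPhone manual still comes through and needs a separate check against the list's topic.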


YC’s 2015 Reading List

Sunday, December 20th, 2015

YC’s 2015 Reading List

From the post:

Here is a roundup of some of the best books we at Y Combinator read in 2015 – some of them happened to be published this year, but many of them were not. A big hat-tip to Bill Gates, whose legendary reading lists inspired us to make one of our own.

Be not afraid!

There is no ordering by importance, topic or other metric.

Just a list of twenty (20) books that were enjoyed by the folks at Y Combinator.

I read recently that diverse inputs and opinions will make you smarter.

While I run that to ground, check your local library or bookstore for one or more of these volumes.

Paradise Lost (John MILTON, 1608 – 1674) Audio Version

Thursday, December 10th, 2015

Paradise Lost (John MILTON, 1608 – 1674) Audio Version.

As you know, John Milton was blind when he wrote Paradise Lost. His only “interface” for writing, editing and correcting was aural.

Shoppers and worshipers need to attend very closely to the rhetoric of the season. Listening to Paradise Lost, even as Milton did, may sharpen your ear for rhetorical devices and words that would otherwise pass unnoticed.

For example, what are the “good tidings” of Christmas hymns? Are they about the “…new born king…” or are they anticipating the sacrifice of that “…new born king…” instead of ourselves?

The first seems traditional and fairly benign; the second seems more self-centered and selfish than the usual Christmas holiday theme.

If you think that is an aberrant view of the holiday, consider that in A Christmas Carol by Charles Dickens, Scrooge, spoiler alert, ends the tale by keeping Christmas in his heart all year round.

One of the morals is that we should treat others kindly and with consideration every day of the year. Not as some modern Christians do, half-listening at an hour-long service once a week and spending the waking portion of the other 167 hours not being Christians.

Paradise Lost is a complex and nuanced text. Learning to spot its rhetorical moves and devices will make you a more discerning observer of modern discourse.


The Preservation of Favoured Traces [Multiple Editions of Darwin]

Thursday, December 10th, 2015

The Preservation of Favoured Traces

From the webpage:

Charles Darwin first published On the Origin of Species in 1859, and continued revising it for several years. As a result, his final work reads as a composite, containing more than a decade’s worth of shifting approaches to his theory of evolution. In fact, it wasn’t until his fifth edition that he introduced the concept of “survival of the fittest,” a phrase that actually came from philosopher Herbert Spencer. By color-coding each word of Darwin’s final text by the edition in which it first appeared, our latest book and poster of his work trace his thoughts and revisions, demonstrating how scientific theories undergo adaptation before their widespread acceptance.

The original interactive version was built in tandem with exploratory and teaching tools, enabling users to see changes at both the macro level, and word-by-word. The printed poster allows you to see the patterns where edits and additions were made and—for those with good vision—you can read all 190,000 words on one page. For those interested in curling up and reading at a more reasonable type size, we’ve also created a book.

The poster and book are available for purchase below. All proceeds are donated to charity.

For textual history fans this is an impressive visualization of the various editions of On the Origin of Species.

To help students get away from the notion of texts as static creations, and to give them some experience with markup, consider choosing a well-known work that has multiple editions and is available in TEI.

Then have the students write XQuery expressions to transform a chapter of such a work into a later (or earlier) edition.

Depending on the quality of the work, that could be a means of contributing to the number of TEI encoded texts and your students would gain experience with both TEI and XQuery.
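Before students tackle TEI and XQuery, the core idea behind the poster (label each word of the final text with the edition in which it first appeared) can be tried on plain text. A minimal sketch in Python rather than XQuery, using the standard library's difflib; the function name first_appearance and the sample sentences are invented for illustration and are not Darwin's text or the project's actual method:

```python
from difflib import SequenceMatcher


def first_appearance(editions):
    """Label each word of the last edition with the 1-based number of
    the edition in which it first appeared.

    editions: list of strings, oldest edition first.
    Words are compared after whitespace splitting, which is a crude
    stand-in for real tokenization or TEI-aware comparison.
    """
    prev_words = editions[0].split()
    prev_labels = [1] * len(prev_words)
    for number, text in enumerate(editions[1:], start=2):
        words = text.split()
        labels = [number] * len(words)  # assume new until matched
        matcher = SequenceMatcher(a=prev_words, b=words, autojunk=False)
        for block in matcher.get_matching_blocks():
            # words carried over keep the label they already had
            for offset in range(block.size):
                labels[block.b + offset] = prev_labels[block.a + offset]
        prev_words, prev_labels = words, labels
    return list(zip(prev_words, prev_labels))


if __name__ == "__main__":
    sample = [
        "the strong survive",
        "the fittest survive",
        "the fittest will survive",
    ]
    for word, edition in first_appearance(sample):
        print(edition, word)
```

The labels play the role of the colors on the poster. A word that is deleted in one edition and restored in a later one is labeled with the later edition, which is one of several design choices students could debate.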

The Architecture of Open Source Applications

Thursday, November 12th, 2015

The Architecture of Open Source Applications

From the webpage:

Architects look at thousands of buildings during their training, and study critiques of those buildings written by masters. In contrast, most software developers only ever get to know a handful of large programs well—usually programs they wrote themselves—and never study the great programs of history. As a result, they repeat one another’s mistakes rather than building on one another’s successes.

Our goal is to change that. In these two books, the authors of four dozen open source applications explain how their software is structured, and why. What are each program’s major components? How do they interact? And what did their builders learn during their development? In answering these questions, the contributors to these books provide unique insights into how they think.

If you are a junior developer, and want to learn how your more experienced colleagues think, these books are the place to start. If you are an intermediate or senior developer, and want to see how your peers have solved hard design problems, these books can help you too.

Follow us on our blog, or on Twitter at @aosabook and using the #aosa hashtag.

I happened upon these four books because of a tweet that mentioned: Early Access Release of Allison Kaptur’s “A Python Interpreter Written in Python” Chapter, which I found to be the tenth chapter of “500 Lines.”

OK, but what the hell is “500 Lines?” Poking around a bit I found The Architecture of Open Source Applications.

Which is the source for the material I quote above.

Do you learn from example?

Let me give you the flavor of three of the completed volumes and the “500 Lines” that is in progress:

The Architecture of Open Source Applications: Elegance, Evolution, and a Few Fearless Hacks (vol. 1), from the introduction:

Carpentry is an exacting craft, and people can spend their entire lives learning how to do it well. But carpentry is not architecture: if we step back from pitch boards and miter joints, buildings as a whole must be designed, and doing that is as much an art as it is a craft or science.

Programming is also an exacting craft, and people can spend their entire lives learning how to do it well. But programming is not software architecture. Many programmers spend years thinking about (or wrestling with) larger design issues: Should this application be extensible? If so, should that be done by providing a scripting interface, through some sort of plugin mechanism, or in some other way entirely? What should be done by the client, what should be left to the server, and is “client-server” even a useful way to think about this application? These are not programming questions, any more than where to put the stairs is a question of carpentry.

Building architecture and software architecture have a lot in common, but there is one crucial difference. While architects study thousands of buildings in their training and during their careers, most software developers only ever get to know a handful of large programs well. And more often than not, those are programs they wrote themselves. They never get to see the great programs of history, or read critiques of those programs’ designs written by experienced practitioners. As a result, they repeat one another’s mistakes rather than building on one another’s successes.

This book is our attempt to change that. Each chapter describes the architecture of an open source application: how it is structured, how its parts interact, why it’s built that way, and what lessons have been learned that can be applied to other big design problems. The descriptions are written by the people who know the software best, people with years or decades of experience designing and re-designing complex applications. The applications themselves range in scale from simple drawing programs and web-based spreadsheets to compiler toolkits and multi-million line visualization packages. Some are only a few years old, while others are approaching their thirtieth anniversary. What they have in common is that their creators have thought long and hard about their design, and are willing to share those thoughts with you. We hope you enjoy what they have written.

The Architecture of Open Source Applications: Structure, Scale, and a Few More Fearless Hacks (vol. 2), from the introduction:

In the introduction to Volume 1 of this series, we wrote:

Building architecture and software architecture have a lot in common, but there is one crucial difference. While architects study thousands of buildings in their training and during their careers, most software developers only ever get to know a handful of large programs well… As a result, they repeat one another’s mistakes rather than building on one another’s successes… This book is our attempt to change that.

In the year since that book appeared, over two dozen people have worked hard to create the sequel you have in your hands. They have done so because they believe, as we do, that software design can and should be taught by example—that the best way to learn how to think like an expert is to study how experts think. From web servers and compilers through health record management systems to the infrastructure that Mozilla uses to get Firefox out the door, there are lessons all around us. We hope that by collecting some of them together in this book, we can help you become a better developer.

The Performance of Open Source Applications, from the introduction:

It’s commonplace to say that computer hardware is now so fast that most developers don’t have to worry about performance. In fact, Douglas Crockford declined to write a chapter for this book for that reason:

If I were to write a chapter, it would be about anti-performance: most effort spent in pursuit of performance is wasted. I don’t think that is what you are looking for.

Donald Knuth made the same point thirty years ago:

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

but between mobile devices with limited power and memory, and data analysis projects that need to process terabytes, a growing number of developers do need to make their code faster, their data structures smaller, and their response times shorter. However, while hundreds of textbooks explain the basics of operating systems, networks, computer graphics, and databases, few (if any) explain how to find and fix things in real applications that are simply too damn slow.

This collection of case studies is our attempt to fill that gap. Each chapter is written by real developers who have had to make an existing system faster or who had to design something to be fast in the first place. They cover many different kinds of software and performance goals; what they have in common is a detailed understanding of what actually happens when, and how the different parts of large applications fit together. Our hope is that this book will—like its predecessor The Architecture of Open Source Applications—help you become a better developer by letting you look over these experts’ shoulders.

500 Lines or Less, from the GitHub page:

Every architect studies family homes, apartments, schools, and other common types of buildings during her training. Equally, every programmer ought to know how a compiler turns text into instructions, how a spreadsheet updates cells, and how a database efficiently persists data.

Previous books in the AOSA series have done this by describing the high-level architecture of several mature open-source projects. While the lessons learned from those stories are valuable, they are sometimes difficult to absorb for programmers who have not yet had to build anything at that scale.

“500 Lines or Less” focuses on the design decisions and tradeoffs that experienced programmers make when they are writing code:

  • Why divide the application into these particular modules with these particular interfaces?
  • Why use inheritance here and composition there?
  • How do we predict where our program might need to be extended, and how can we make that easy for other programmers?

Each chapter consists of a walkthrough of a program that solves a canonical problem in software engineering in at most 500 source lines of code. We hope that the material in this book will help readers understand the varied approaches that engineers take when solving problems in different domains, and will serve as a basis for projects that extend or modify the contributions here.

If you answered the question about learning from example with yes, add these works to your read and re-read list.

BTW, for markup folks, check out Parsing XML at the Speed of Light by Arseny Kapoulkine.

Many hours of reading and keyboard pleasure await anyone using these volumes.

How to Read a Book:…

Saturday, October 31st, 2015

How to Read a Book: The Classic Guide to Intelligent Reading (A Touchstone book) by Mortimer J. Adler and Charles Van Doren.

I should have thought about this book when I posted How to Read a Paper. I haven’t seen a copy in years but that’s a flimsy excuse for forgetting about it. I was reminded of it today when I saw it in a tweet by Michael Nielsen.

Amazon has this description:

With half a million copies in print, How to Read a Book is the best and most successful guide to reading comprehension for the general reader, completely rewritten and updated with new material.

Originally published in 1940, this book is a rare phenomenon, a living classic that introduces and elucidates the various levels of reading and how to achieve them—from elementary reading, through systematic skimming and inspectional reading, to speed reading. Readers will learn when and how to “judge a book by its cover,” and also how to X-ray it, read critically, and extract the author’s message from the text.

Also included is instruction in the different techniques that work best for reading particular genres, such as practical books, imaginative literature, plays, poetry, history, science and mathematics, philosophy and social science works.

Finally, the authors offer a recommended reading list and supply reading tests you can use to measure your own progress in reading skills, comprehension, and speed.

Is How to Read a Book as relevant today as it was in 1940?

In chapter 1, Adler makes a critical distinction between facts and understanding and laments the packaging of opinions:

Perhaps we know more about the world than we used to, and insofar as knowledge is prerequisite to understanding, that is all to the good. But knowledge is not as much a prerequisite to understanding as is commonly supposed. We do not have to know everything about something in order to understand it; too many facts are often as much of an obstacle to understanding as too few. There is a sense in which we moderns are inundated with facts to the detriment of understanding.

One of the reasons for this situation is that the very media we have mentioned are so designed as to make thinking seem unnecessary (though this is only an appearance). The packaging of intellectual positions and views is one of the most active enterprises of some of the best minds of our day. The viewer of television, the listener to radio, the reader of magazines, is presented with a whole complex of elements—all the way from ingenious rhetoric to carefully selected data and statistics—to make it easy for him to “make up his own mind” with the minimum of difficulty and effort. But the packaging is often done so effectively that the viewer, listener, or reader does not make up his own mind at all. Instead, he inserts a packaged opinion into his mind, somewhat like inserting a cassette into a cassette player. He then pushes a button and “plays back” the opinion whenever it seems appropriate to do so. He has performed acceptably without having had to think.

I can’t imagine Adler’s characterization of Fox News, CNN, Facebook and other forums that inundate us with nothing but pre-packaged opinions and repetition of the same.

Although not in modern gender neutral words:

…he inserts a packaged opinion into his mind, somewhat like inserting a cassette into a cassette player. He then pushes a button and “plays back” the opinion whenever it seems appropriate to do so. He has performed acceptably without having had to think.

In a modern context, such viewers, listeners, or readers, in addition to the “play back” function are also quick to denounce anyone who questions their pre-recorded narrative as a “troll.” Fearing discussion of other narratives, alternative experiences or explanations, is a sure sign of a pre-recorded opinion. Discussion interferes with the propagation of pre-recorded opinions.

How to Mark a Book has delightful advice from Adler on marking books. It captures the essence of Adler’s love of books and reading.

Obfuscation: how leaving a trail of confusion can beat online surveillance [Book]

Saturday, October 24th, 2015

Obfuscation: how leaving a trail of confusion can beat online surveillance by Julia Powles.

From the post:

At the heart of Cambridge University, there’s a library tower filled with 200,000 forgotten books. Rumoured by generations of students to hold the campus collection of porn, Sir Gilbert Scott’s tower is, in fact, filled with pocket books. Guides, manuals, tales and pamphlets for everyday life, deemed insufficiently scholarly for the ordinary collection, they stand preserved as an extraordinary relic of past preoccupations.

One new guide in the handbook tradition – and one that is decidedly on point for 2015 – is the slim, black, cloth-bound volume, Obfuscation: A User’s Guide for Privacy and Protest, published by MIT Press. A collaboration between technologist Finn Brunton and philosopher Helen Nissenbaum, both of New York University, Obfuscation packs utility, charm and conviction into its tightly-composed 100-page core. This is a thin book, but its ambition is vast.

Brunton and Nissenbaum aim to start a “big little revolution” in the data-mining and surveillance business, by “throwing some sand in the gears, kicking up dust and making some noise”. Specifically, the authors champion the titular term, obfuscation, or “the addition of ambiguous, confusing, or misleading information to interfere with surveillance and data collection projects”. The objective of such measures is to thwart profiling, “to buy time, gain cover, and hide in a crowd of signals”.

Read Julia’s review and then order Obfuscation: A User’s Guide for Privacy and Protest or add it to your wish list!

MIT Press gives this description:

With Obfuscation, Finn Brunton and Helen Nissenbaum mean to start a revolution. They are calling us not to the barricades but to our computers, offering us ways to fight today’s pervasive digital surveillance—the collection of our data by governments, corporations, advertisers, and hackers. To the toolkit of privacy protecting techniques and projects, they propose adding obfuscation: the deliberate use of ambiguous, confusing, or misleading information to interfere with surveillance and data collection projects. Brunton and Nissenbaum provide tools and a rationale for evasion, noncompliance, refusal, even sabotage—especially for average users, those of us not in a position to opt out or exert control over data about ourselves. Obfuscation will teach users to push back, software developers to keep their user data safe, and policy makers to gather data without misusing it.

Brunton and Nissenbaum present a guide to the forms and formats that obfuscation has taken and explain how to craft its implementation to suit the goal and the adversary. They describe a series of historical and contemporary examples, including radar chaff deployed by World War II pilots, Twitter bots that hobbled the social media strategy of popular protest movements, and software that can camouflage users’ search queries and stymie online advertising. They go on to consider obfuscation in more general terms, discussing why obfuscation is necessary, whether it is justified, how it works, and how it can be integrated with other privacy practices and technologies.

In hardcover, Obfuscation retails at $19.95, for 136 pages.

MIT should issue a paperback version for $5.00 (or less in bulk), to put Obfuscation in the range of conference swag.

The underlying principles and discussion are all very scholarly I’m sure (I haven’t read it yet) but obfuscation can only flourish when practiced in large numbers. Cf. “I’m Spartacus”. Spartacus (IMDB), Spartacus Film (Wikipedia)
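The “hide in a crowd of signals” idea is simple enough to sketch. What follows is a toy illustration of mixing a real search query in among decoys, assuming a hypothetical function named obfuscated_batch and a hand-made decoy list; it is not taken from the book and does not describe how any particular tool works:

```python
import random


def obfuscated_batch(real_query, decoys, n_decoys=3, rng=None):
    """Return the real query shuffled in among n_decoys decoy queries.

    An observer of the batch sees n_decoys + 1 queries and has to guess
    which one reflects the user's actual interest.
    """
    rng = rng or random.Random()
    batch = rng.sample(decoys, n_decoys) + [real_query]
    rng.shuffle(batch)  # the real query's position carries no information
    return batch


if __name__ == "__main__":
    decoy_pool = [
        "weather tomorrow",
        "banana bread recipe",
        "used bicycle prices",
        "train timetable",
        "how to tie a bowline",
    ]
    for query in obfuscated_batch("tax law syllabus", decoy_pool):
        print(query)
```

One user sending three decoys is a small crowd. The cover improves as more people do the same, which is the point about large numbers above.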

To paraphrase the Capital One ad: How many different identities do you have in your wallet?

16+ Free Data Science Books

Sunday, October 18th, 2015

16+ Free Data Science Books by William Chen.

From the webpage:

As a data scientist at Quora, I often get asked for my advice about becoming a data scientist. To help those people, I’ve taken some time to compile my top recommendations of quality data science books that are either available for free (by generosity of the author) or are Pay What You Want (PWYW) with $0 minimum.

Please bookmark this place and refer to it often! Click on the book covers to take yourself to the free versions of the book. I’ve also provided Amazon links (when applicable) in my descriptions in case you want to buy a physical copy. There’s actually more than 16 free books here since I’ve added a few since conception, but I’m keeping the name of this website for recognition.

The authors of these books have put in much effort to produce these free resources – please consider supporting them through avenues that the authors provide, such as contributing via PWYW or buying a hard copy [Disclosure: I get a small commission via the Amazon links, and I am co-author of one of these books].

Some of the usual suspects are here along with some unexpected titles, such as A First Course in Design and Analysis of Experiments by Gary W. Oehlert.

From the introduction:

Researchers use experiments to answer questions. Typical questions might be:

  • Is a drug a safe, effective cure for a disease? This could be a test of how AZT affects the progress of AIDS.
  • Which combination of protein and carbohydrate sources provides the best nutrition for growing lambs?
  • How will long-distance telephone usage change if our company offers a different rate structure to our customers?
  • Will an ice cream manufactured with a new kind of stabilizer be as palatable as our current ice cream?
  • Does short-term incarceration of spouse abusers deter future assaults?
  • Under what conditions should I operate my chemical refinery, given this month’s grade of raw material?

This book is meant to help decision makers and researchers design good experiments, analyze them properly, and answer their questions.

It isn’t short, six hundred and fifty-nine pages, but taken in small doses you will learn a great deal about experimental design. Not only how to properly design experiments but how to spot when they aren’t well designed.

Think of it as training to go big-game hunting in the latest issue of Nature or Science. Adds a bit of competitiveness to the enterprise.

Python Week 2015 (Packt Publishing)

Monday, October 12th, 2015

Python Week 2015 (Packt Publishing)

Packt Publishing is giving away free ebooks and offering 20% off their top selling Python books and videos.

The free book for today (good for approximately 22 hours from this posting):

Building Machine Learning Systems with Python

Expand your Python knowledge and learn all about machine-learning libraries in this user-friendly manual. ML is the next big breakthrough in technology and this book will give you the head-start you need.

  • Master Machine Learning using a broad set of Python libraries and start building your own Python-based ML systems
  • Covers classification, regression, feature engineering, and much more guided by practical examples
  • A scenario-based tutorial to get into the right mind-set of a machine learner (data exploration) and successfully implement this in your new or existing projects

I didn’t know this was Python week! 😉

BTW, there is a website devoted to awareness days, weeks, months:

They seem to take the idea quite seriously but they didn’t have Python week on their calendar.

Is the term “tease” still in fashion?

Thursday, October 1st, 2015

I ask if “tease” is still in fashion (or its more sexist equivalent) because I keep running across partial O’Reilly publications that are touted as “free,” but are in reality, just extended ads for forthcoming books.

A case in point is “Transforms in CSS” which isn’t really a book but an excerpt from the fourth edition of CSS: The Definitive Guide.

Forty page book?

Social media will light up with posts and reposts about this “free” title.

Save your time and disk space. If anything, get a preview copy of the fourth edition of CSS: The Definitive Guide when it is available.

Make no mistake, I like O’Reilly publications and I am presently reading what I suspect is the best O’Reilly title in a number of years, XQuery by Priscilla Walmsley.

O’Reilly shouldn’t waste bandwidth on disconnected excerpts from its titles.

Writing “Python Machine Learning”

Saturday, September 26th, 2015

Writing “Python Machine Learning” by Sebastian Raschka.

From the post:

It’s been about time. I am happy to announce that “Python Machine Learning” was finally released today! Sure, I could just send an email around to all the people who were interested in this book. On the other hand, I could put down those 140 characters on Twitter (minus what it takes to insert a hyperlink) and be done with it. Even so, writing “Python Machine Learning” really was quite a journey for a few months, and I would like to sit down in my favorite coffeehouse once more to say a few words about this experience.

A delightful tale for those of us who have authored books and an inspiration (with some practical suggestions) for anyone who hopes to write a book.

Sebastian’s productivity hints will ring familiar for those with similar habits and bear study by those who hope to become more productive.

Sebastian never comes out and says it but his writing approach breaks each stage of the book into manageable portions. It is far easier to say (and do) “write an outline” than to “write the complete and fixed outline for an almost 500 page book.”

If the task is too large (the complete and immutable outline, say), you won’t build up enough momentum to make a reasonable start.

After reading Sebastian’s post, what book are you thinking about writing?

Free Data Science Books (Update, + 53 books, 117 total)

Saturday, September 26th, 2015

Free Data Science Books (Update).

From the post:

Pulled from the web, here is a great collection of eBooks (most of which have a physical version that you can purchase on Amazon) written on the topics of Data Science, Business Analytics, Data Mining, Big Data, Machine Learning, Algorithms, Data Science Tools, and Programming Languages for Data Science.

While every single book in this list is provided for free, if you find any particularly helpful consider purchasing the printed version. The authors spent a great deal of time putting these resources together and I’m sure they would all appreciate the support!

Note: Updated books as of 9/21/15 are post-fixed with an asterisk (*). Scroll to updates

Great news but also more content.

Unlike big data, you have to read this content in detail to obtain any benefit from it.

And books in the same area are going to have overlapping content as well as some unique content.

Imagine how useful it would be to compose a free standing work with the “best” parts from several works.

Copyright laws would be a larger barrier but no more than if you cut-n-pasted your own version for personal use.

If such an approach could be made easy enough, the resulting value would drown out dissenting voices.

I think PDF is the principal practical barrier.

Do you suspect others?

I first saw this in a tweet by Kirk Borne.

The Enemies of Books

Friday, September 4th, 2015

The Enemies of Books by William Blades.

Published in 1888, The Enemies of Books reflects the biases and prejudices of its time, much as our literature transparently carries forward our biases and prejudices.

A valuable reminder in these censorship-happy times that knowledge has long been deemed dangerous.

See in particular Chapter 5 Ignorance and Bigotry.

The suppression of “terrorist” literature, from tweets to websites, certainly falls under bigotry and possibly ignorance as well.

Extremist literature of all kinds is heavily repetitive and while it may be exciting to look at what has been forbidden, the thrill wears off fairly quickly. Al Goldstein, the publisher of Screw, once admitted in an interview that after about a year of Screw, if you were paying attention, you would notice the same story lines starting to circle back around.

If that’s a problem with sex, it isn’t hard to imagine that political issues discussed with no nuance, no depth of analysis, no sense of history, but simply “I’m right and X must die!” get old pretty quickly.

If you believe U.S. reports on Osama bin Laden, even bin Laden wasn’t on a steady diet of hate literature but had Western materials as well as soft porn.

If the would-be-censors would stop wasting funds on trying to censor social media and the Internet, perhaps they could find the time for historical, nuanced and deep analysis of current issues to publish in an attractive manner.

Censors don’t think and they don’t want you to either.

Let’s disappoint them together!