Archive for the ‘Books’ Category

Notes to (NUS) Computer Science Freshmen…

Monday, March 13th, 2017

Notes to (NUS) Computer Science Freshmen, From The Future

From the intro:

Early into the AY12/13 academic year, Prof Tay Yong Chiang organized a supper for Computer Science freshmen at Tembusu College. The bunch of seniors who were gathered there put together a document for NUS computing freshmen. This is that document.

Feel free to create a pull request to edit or add to it, and share it with other freshmen you know.

There is one sad note:


The Art of Computer Programming (a review of everything in Computer Science; pretty much nobody, save Knuth, has finished reading this)

When you think about the amount of time Knuth has spent researching, writing and editing The Art of Computer Programming (TAOCP), it doesn’t sound unreasonable to expect others, a significant number of others, to have read it.

Any online reading groups focused on TAOCP?

BBC News Could Do Better: Scottish witchcraft book published online

Friday, November 4th, 2016

Scottish witchcraft book published online

From the post:

The Names of Witches in Scotland, 1658 collection, was drawn up during a time when the persecution of supposed witches was rife.

The book also lists the towns where the accused lived and notes of confession.

It is believed many were healers, practicing traditional folk medicine.

Some of the notes give small insights into the lives of those accused.

It is recorded that the spouse of Agnes Watsone, from Dumbarton, is “umquhile” (deceased).

A majority of those accused of witchcraft were women although the records reveal that some men were also persecuted.

Jon Gilchreist and Robert Semple, from Dumbarton, are recorded as sailors. A James Lerile of Alloway, Ayr, is noted as “clenged”, in other words cleaned or made clean.

While Mr Lerile’s fate is unclear, the term probably meant banishment or death.

I’m glad BBC News drew attention to this volume but the only links in the post go to a very annoying commercial site that has transcribed the work.

🙁

With very little effort, I can send you to images of the original:

Names of the witches (in Scotland) 1658.

Some readers (cough) may find the commercial service useful. OK, but BBC News should include links to originals, especially when those are sans annoying subscription requests.

The GCHQ Puzzle Book

Friday, November 4th, 2016

The GCHQ Puzzle Book

The Amazon description:

If 3=T, 4=S, 5=P, 6=H, 7=H … what is 8?

What is the next letter in the sequence: M, V, E, M, J, S, U, ?

Which of the following words is the odd one out: CHAT, COMMENT, ELF, MANGER, PAIN, POUR?

GCHQ is a top-secret intelligence and security agency which recruits some of the very brightest minds. Over the years, their codebreakers have helped keep our country safe, from the Bletchley Park breakthroughs of WWII to the modern-day threat of cyberattack. So it comes as no surprise that, even in their time off, the staff at GCHQ love a good puzzle. Whether they’re recruiting new staff or challenging each other to the toughest Christmas quizzes and treasure hunts imaginable, puzzles are at the heart of what GCHQ does. Now they’re opening up their archives of decades’ worth of codes, puzzles and challenges for everyone to try.
(emphasis in original)

Hard to say if successful completion of the GCHQ Puzzle Book or hacking into GCHQ would be the better way to introduce yourself to the GCHQ.

Depends on which department within GCHQ captures your interest. 😉

Be aware that some pedestrian agencies and their personnel view intrusion into government computers to be a crime and punishable as such.

More sophisticated agencies/personnel realize that “…in Jersey, anything is legal so long as you don’t get caught” and/or if you have something of sufficient value to trade.

The “rule of law,” and “letter of the law” stuff is for groundlings. Don’t be a groundling.

How To Use Data Science To Write And Sell More Books (Training Amazon)

Sunday, October 30th, 2016

From the description:

Chris Fox is the bestselling author of science fiction and dark fantasy, as well as non-fiction books for authors including Write to Market, 5000 words per hour and today we’re talking about his next book, Six Figure Author: Using data to sell books.

Show Notes What Amazon data science, and machine learning, are and how authors can use them. How Amazon differs from the other online book retailers and how authors can train Amazon to sell more books. What to look for to find a voracious readership. Strategically writing to market and how to know what readers are looking for. On Amazon ads and when they are useful. Tips on writing faster. The future of writing, including virtual reality and AI help with story.

Joanna Penn of The Creative Penn interviews Chris Fox

Some of the highlights:

Training Amazon To Work For You

…What you want to do is figure out, with as much accuracy as possible, who your target audience is.

And when you start selling your book, the number of sales is not nearly as important as who you sell your book to, because each of those sales to Amazon represents a customer profile.

If you can convince them that people who voraciously read in your genre are going to love this book and you sell a couple of hundred copies to people like that, Amazon’s going to take it and run with it. You’ve now successfully trained them about who your audience is because you used good data and now they’re able to easily sell your book.

If, on the other hand, you and your mom buys a copy and your friend at the coffee shop buys a copy, and people who aren’t necessarily into that genre are all buying it, Amazon gets really lost and confused.

Easier said than done but how’s that for taking advantage of someone else’s machine learning?

Chris also has tips for not “polluting” your Amazon sales data.

Discovering and Writing to a Market


How do you find a sub-category or a smaller niche within the Amazon ecosystem? What are the things to look for in order to find a voracious readership?

Chris: What I do is I start looking at the rankings of the number 1, the number 20, 40, 60, 80 and 100 books. You can tell based on where those books are ranked, how many books in the genre are selling. If the number one book is ranked in the top 100 in the store and so is the 20th book, then you’ve found one of the hottest genres on Amazon.

If you find that by the time you get down to number 40, the rank is dropping off sharply, that suggests that not enough books are being produced in that genre and it might be a great place for you to jump in and make a name for yourself. (emphasis in original)
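As a rough sketch of the check Chris describes, the function below takes the store-wide sales rank of the books sitting at positions 1, 20 and 40 in a category list. The function name, the `hot_cutoff` and `drop_factor` thresholds (one guess at what a "sharp" drop means) and the sample ranks are all assumptions for illustration; nothing here fetches data from Amazon.

```python
def assess_genre(ranks, hot_cutoff=100, drop_factor=10):
    """Classify a category from the store-wide ranks of its listed books.

    ranks: dict mapping position in the category list (1, 20, 40, ...)
           to that book's overall store rank.
    hot_cutoff and drop_factor are illustrative thresholds, not Amazon's.
    """
    # Hot genre: both the #1 and the #20 book rank in the store's top 100.
    if ranks[1] <= hot_cutoff and ranks[20] <= hot_cutoff:
        return "hot"
    # Possibly underserved: by position 40 the store rank is far worse
    # than the rank of the category's #1 book.
    if ranks[40] >= ranks[1] * drop_factor:
        return "underserved"
    return "unclear"


# Hypothetical ranks for three made-up categories.
print(assess_genre({1: 50, 20: 90, 40: 300}))
print(assess_genre({1: 200, 20: 900, 40: 15000}))
print(assess_genre({1: 200, 20: 400, 40: 900}))
```

The thresholds are the part to tune: a drop that counts as sharp in one category may be normal in another.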

I know, I know, this is a tough one. Especially for me.

As I have pointed out here on multiple occasions, “terrorism” is largely a fiction of both government and media.

However, if you look at the top 100 paid sellers on terrorism at Amazon, the top fifty (50) don’t have a single title that looks like it denies terrorism is a problem.

🙁

Which I take to mean, in terms of selling books, services, or data, the “terrorism is coming for us all” gravy train is the profitable line.

Or at least to indulge in analysis on the basis of “…if the threat of terrorism is real…” and let readers supply their own answers to that question.

There are other valuable tips and asides, so watch the video or read the transcript: How To Use Data Science To Write And Sell More Books With Chris Fox.

PS: As of today, there are 292 podcasts by Joanna Penn.

Everything You Wanted to Know about Book Sales (But Were Afraid to Ask)

Tuesday, July 5th, 2016

Everything You Wanted to Know about Book Sales (But Were Afraid to Ask) by Lincoln Michel.

From the post:

Publishing is the business of creating books and selling them to readers. And yet, for some reason we aren’t supposed to talk about the latter.

Most literary writers consider book sales a half-crass / half-mythological subject that is taboo to discuss.
While authors avoid the topic, every now and then the media brings up book sales — normally to either proclaim, yet again, the death of the novel, or to make sweeping generalizations about the attention spans of different generations. But even then, the data we are given is almost completely useless for anyone interested in fiction and literature. Earlier this year, there was a round of excited editorials about how print is back, baby after industry reports showed print sales increasing for the second consecutive year. However, the growth was driven almost entirely by non-fiction sales… more specifically adult coloring books and YouTube celebrity memoirs. As great as adult coloring books may be, their sales figures tell us nothing about the sales of, say, literary fiction.

Lincoln’s account mirrors my experience (twice) with a small press decades ago.

While you (rightfully) think that every sane person on the planet will forego the rent in order to purchase your book, sadly your publisher is very unlikely to share that view.

One of the comments to this post reads:

…Writing is a calling but publishing is a business.

Quite so.

Don’t be discouraged by this account but do allow it to influence your expectations, at least about the economic rewards of publishing.

Just in case I get hit with the publishing bug again, good luck to us all!

Free Programming Books – Update

Tuesday, July 5th, 2016

Free Programming Books by Victor Felder.

From the webpage:

This list initially was a clone of stackoverflow – List of Freely Available Programming Books by George Stocker. Now updated, with dead links gone and new content.

Moved to GitHub for collaborative updating.

Great listing of resources!

But each resource stands alone as its own silo. It can (and many do) refer to other materials, even with hyperlinks, but if you want to explore any of them, you must explore them separately. That’s what being in a silo means. You have to start over at the beginning. Every time.

That is complicated by the existence of thousands of slideshows and videos on programming topics not listed here. Search for your favorite programming language at Slideshare and YouTube. There are other repositories of slideshows and videos; those are just examples.

Each one of those slideshows and/or videos is also a silo. Not to mention that with video you need a time marker if you aren’t going to watch every second of it to find relevant material.

What if you could traverse each of those silos, books, posts, slideshows, videos, documentation, source code, seamlessly?

Making that possible for C/C++ now, given the backlog of material, would have a large upfront cost before it could be useful.

Making that possible for languages with shorter histories, well, how useful would it need to be to justify its cost?

And how would you make it possible for others to easily contribute gems that they find?

Something to think about as you wander about in each of these separate silos.

Enjoy!

How do you skim through a digital book?

Sunday, June 19th, 2016

How do you skim through a digital book? by Chloe Roberts.

From the post:

We’ve had a couple of digitised books that proved really popular with online audiences. Perhaps partly reflecting the interests of the global population, they’ve been about prostitutes and demons.

I’ve been especially interested in how people have interacted with these popular digitised books. Imagine how you’d pick up a book to look at in a library or bookshop. Would you start from page one, laboriously working through page by page, or would you flip through it, checking for interesting bits? Should we expect any different behaviour when people use a digital book?

We collect data on aggregate (nothing personal or trackable to our users) about what’s being asked of our digitised items in the viewer. With such a large number of views of these two popular books, I’ve got a big enough dataset to get an interesting idea of how readers might be using our digitised books.

Focusing on ‘Compendium rarissimum totius Artis Magicae sistematisatae per celeberrimos Artis hujus Magistros. Anno 1057. Noli me tangere’ (the 18th century one about demons) I’ve mapped the number of page views (horizontal axis) against page number (vertical axis, with front cover at the top), and added coloured bands to represent what’s on those pages.

Chloe captured and then analyzed the reading behavior of readers on two very popular electronic titles.

She explains her second observation:

Observation 2: People like looking at pictures more than text

by suggesting the text being in Latin and German may explain the fondness for the pictures.

Perhaps, but I have heard the same observation made about Playboy magazine. 😉

From a documentation/training perspective, Chloe’s technique, for digital training materials, could provide guidance on:

  • Length of materials
  • Use of illustrations
  • Organization of materials
  • What material is habitually unread?
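A minimal sketch of the last item, assuming you already hold an aggregate log of page views for a training document as a plain list of page numbers. The function name, the sample data and the `threshold` fraction are hypothetical and are not taken from Chloe’s post.

```python
from collections import Counter


def habitually_unread(page_views, total_pages, threshold=0.1):
    """Return pages viewed less than `threshold` times as often as the
    most-viewed page.

    page_views: iterable of page numbers, one entry per recorded view.
    total_pages: number of pages in the document (pages are 1-based).
    """
    counts = Counter(page_views)
    if not counts:
        # No views recorded at all: every page counts as unread.
        return list(range(1, total_pages + 1))
    busiest = max(counts.values())
    return [page for page in range(1, total_pages + 1)
            if counts[page] < threshold * busiest]


# Made-up view log for a five-page document.
views = [1] * 20 + [2] * 10 + [3] * 1 + [5] * 15
print(habitually_unread(views, total_pages=5))
```

Pages that show up in the result are candidates for shortening, illustrating or moving, rather than for another reminder to read more carefully.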

If critical material isn’t being read, exhorting newcomers to read more carefully is not the answer.

If security and/or on-boarding reading isn’t happening, as shown by reader behavior, that’s your fault, not the readers’.

Your call: successful staff and customers, or failing staff and customers you can blame for security faults and declining sales.

Choose carefully.

Dissertations – Searching Tip

Friday, May 27th, 2016

It has been years since I have ordered a dissertation but I ran across one today that isn’t already on the web.

I landed at ProQuest but there was no obvious place to search for a dissertation.

Ah, that’s because you have to follow “Order Now” before this interface is displayed:

[Image: proquest-order-450]

I wasn’t “ready” to order so I missed the obvious link for several minutes.

Tip for ProQuest: Search Dissertations link should be on your homepage. (Who approved your homepage design? Management?)

Hacking Book Sale! To Support the Electronic Frontier Foundation

Wednesday, April 27th, 2016

Humble Books Bundle: Hacking

No Starch Press has teamed up with Humble Bundle to raise money for the Electronic Frontier Foundation (EFF)!

$366 worth of No Starch hacking books on a pay what you want basis!

Charitable opportunities don’t get any better than this!

As I type this post, sales of these bundles have rolled over 6,200!

To help me participate in this sale, consider a donation.

Thanks!

Google BigQuery Public Datasets

Wednesday, March 30th, 2016

Google BigQuery Public Datasets

An amazing set of public datasets, from the post:

  • A Social Security Administration dataset that contains all names from Social Security card applications for births that occurred in the United States after 1879.
  • Data collected by the NYC Taxi and Limousine Commission (TLC) that includes trip records from all trips completed in yellow and green taxis in NYC from 2009 to 2015.
  • A dataset that contains all stories and comments from Hacker News since its launch in 2006.
  • A dataset published by the US Department of Health and Human Services that includes all weekly surveillance reports of nationally notifiable diseases for all U.S. cities and states published between 1888 and 2013.
  • A dataset that contains 3.5 million digitized books stretching back two centuries, encompassing the complete English-language public domain collections of the Internet Archive (1.3M volumes) and HathiTrust (2.2 million volumes).
  • This public dataset was created by the National Oceanic and Atmospheric Administration (NOAA) and includes global data obtained from the USAF Climatology Center. This dataset covers GSOD data between 1929 and 2016, collected from over 9000 stations.

I can readily see myself losing serious time in the GDELT Book Corpus!

Enjoy!

Serious Non-Transparency (+ work around)

Tuesday, March 29th, 2016

I mentioned http://www.bkstr.com/ yesterday in my post: Courses -> Texts: A Hidden Relationship, where I lamented the inability to find courses by their titles.

So you could easily discover the required/suggested texts for any given course. Like browsing a physical campus bookstore.

Obscurity is an “information smell” (to build upon Felienne’s expansion of code smell to spreadsheets).

In this particular case, the “information smell” is skunk class.

I revisited http://www.bkstr.com/ today to extract its > 1200 bookstores for use in crawling a sample of those sites.

For ugly HTML, view the source of: http://www.bkstr.com/.

Parsing that is going to take time and surely there is an easy way to get a sample of the sites for mining.

The idea didn’t occur to me immediately but I noticed yesterday that the general form of web addresses was:

bookstore-prefix.bkstr.com

So, after some flailing about with the HTML from bkstr.com, I searched for “bkstr.com” and requested all the results.

I’m picking a random ten bookstores with law books for further searching.
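For anyone repeating the exercise, here is a small sketch of the last two steps: pulling the bookstore prefixes out of a pile of search-result URLs and picking ten at random. The function names and the sample URLs are made up for illustration, and the pattern assumes every bookstore lives at bookstore-prefix.bkstr.com as noted above.

```python
import random
import re

# Matches hosts of the form bookstore-prefix.bkstr.com
BKSTR_HOST = re.compile(r"https?://([a-z0-9-]+)\.bkstr\.com", re.IGNORECASE)


def bookstore_prefixes(urls):
    """Return the sorted, de-duplicated bookstore prefixes found in urls."""
    prefixes = set()
    for url in urls:
        match = BKSTR_HOST.match(url)
        # Skip the www front page; it is not an individual bookstore.
        if match and match.group(1).lower() != "www":
            prefixes.add(match.group(1).lower())
    return sorted(prefixes)


def sample_bookstores(urls, k=10, seed=None):
    """Pick up to k prefixes at random; a seed makes the pick repeatable."""
    prefixes = bookstore_prefixes(urls)
    rng = random.Random(seed)
    return rng.sample(prefixes, min(k, len(prefixes)))


# Hypothetical search results, not real bookstore addresses.
results = [
    "http://www.bkstr.com/",
    "http://examplestore.bkstr.com/webapp",
    "https://ExampleStore.bkstr.com/x",
    "http://another-shop.bkstr.com/",
    "http://example.org/",
]
print(bookstore_prefixes(results))
print(sample_bookstores(results, k=10, seed=1))
```

Filtering the sample down to stores that carry law books would still be a manual step.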

Not a high priority but I am curious what lies behind the smoke, mirrors, complex HTML and poor interfaces.

Maybe something, maybe nothing. Won’t know unless we look.

PS: Perhaps a better query string:

www.bkstr.com textbooks-and-course-materials

Suggested refinements?

Courses -> Texts: A Hidden Relationship

Monday, March 28th, 2016

Quite by accident I discovered the relationship between courses and their texts is hidden in many (approx. 2000) campus bookstore interfaces.

If you visit a physical campus bookstore you can browse courses for their textbooks. Very useful if you are interested in the subject but not taking the course.

An online LLM (master’s of taxation) flyer prompted me to check the textbooks for the course work.

A simple enough information request. Find the campus bookstore and browse by course for text listings.

Not so fast!

The online presences of over 1200 campus bookstores are delivered by http://www.bkstr.com/, which offers this interface:

[Image: bookstore-campus]

Another 748 campus bookstores are delivered by http://bncollege.com/, with a similar interface for textbooks:

[Image: harvard-yale]

I started this post by saying the relationship between courses and their texts is hidden, but that’s not quite right.

The relationship between a meaningless course number and its required/suggested text is visible, but the identification of a course by a numeric string is hardly meaningful to the casual observer (read: not an enrolled student).

Perhaps better to say that a meaningful identification of courses for non-enrolled students and their relationship to required/suggested texts is absent.

That is the relationship of course -> text is present, but not in a form meaningful to anyone other than a student in that course.

Considering two separate vendors across almost 2,000 bookstores deliberately obscure the course -> text relationship, one has to wonder why.

I don’t have any immediate suggestions but when I encounter systematic obscuring of information across vendors, alarm bells start to go off.

Just for completeness’ sake, you can get around the obscuring of the course -> text relationship by searching for “syllabus LLM taxation income OR estate OR corporate” or “(school name) syllabus LLM taxation income OR estate OR corporate”. Then extract required/suggested texts from the posted syllabi.
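If you need the same search for several schools or subjects, the query strings can be assembled mechanically. This is only a sketch: the function name and its parameters are my own, and “Example University” is a placeholder, not a real search target.

```python
def syllabus_query(topics, degree="LLM", subject="taxation", school=None):
    """Build a search string such as
    'syllabus LLM taxation income OR estate OR corporate'.

    topics: list of alternative topic words, joined with OR.
    school: optional school name placed in front of the query.
    """
    parts = []
    if school:
        parts.append(school)
    parts += ["syllabus", degree, subject, " OR ".join(topics)]
    return " ".join(parts)


topics = ["income", "estate", "corporate"]
print(syllabus_query(topics))
print(syllabus_query(topics, school="Example University"))
```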

PS: If you can offer advice on bookstore interfaces suggest enabling the browsing of courses by name and linking to the required/suggested texts.


During the searches I made writing this post, I encountered a syllabus on basic tax by Prof. Bret Wells which has this quote by Martin D. Ginsburg:

Basic tax, as everyone knows, is the only genuinely funny subject in law school.

Tax law does have an Alice in Wonderland quality about it, but The Hunting of the Snark: an Agony in Eight Fits is probably the closer match.

Amazon Top 20 Books in Data Mining – 18? Low Quality Listicle?

Monday, January 25th, 2016

Amazon Top 20 Books in Data Mining by Matthew Mayo.

Matthew’s bio says:

Bio: Matthew Mayo is a computer science graduate student currently working on his thesis parallelizing machine learning algorithms. He is also a student of data mining, a data enthusiast, and an aspiring machine learning scientist.

So, puzzle me this:

  • Why does this listicle have “Data Science From Scratch: First Principles with Python” by Joel Grus, listed twice?
  • Why does David Pogue’s “iPhone: The Missing Manual” appear in this list?

“Data Science From Scratch: First Principles with Python” appears twice because one is paperback and the other is Kindle. Amazon treats those as separate subjects for sales purposes, although to a reader they are more likely a single subject, which has several formats.
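Collapsing formats into a single entry takes very little code. The sketch below assumes each listicle entry is a dict with `title`, `author` and `format` keys; that layout and the function name are my assumptions, not anything Amazon or the listicle provides.

```python
def dedupe_listicle(entries):
    """Collapse entries that differ only by format (paperback, Kindle, ...).

    entries: list of dicts with 'title', 'author' and 'format' keys.
    Returns one dict per (title, author), in first-seen order, with the
    formats gathered into a list.
    """
    merged = {}
    for entry in entries:
        # Compare titles and authors case-insensitively.
        key = (entry["title"].strip().lower(), entry["author"].strip().lower())
        if key not in merged:
            merged[key] = {"title": entry["title"],
                           "author": entry["author"],
                           "formats": []}
        if entry["format"] not in merged[key]["formats"]:
            merged[key]["formats"].append(entry["format"])
    return list(merged.values())


books = [
    {"title": "Data Science From Scratch: First Principles with Python",
     "author": "Joel Grus", "format": "Paperback"},
    {"title": "Data Science From Scratch: First Principles with Python",
     "author": "Joel Grus", "format": "Kindle"},
]
for book in dedupe_listicle(books):
    print(book["title"], book["formats"])
```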

The appearance of “iPhone: The Missing Manual” in this listing is a category error.

If you want to generate unproofed listicles of bestsellers, start with the Amazon bestseller link for computer science (http://www.amazon.com/Best-Sellers-Books-Computers-Technology/zgbs/books/5/ref=zg_bs_unv_b_2_549646_1) or choose one of its many sub-categories such as data mining.

The measure of a listicle isn’t how easy it was to generate but how useful it is to the targeted community.

Duplication and irrelevant results detract from the usefulness of a listicle.

Yes?

YC’s 2015 Reading List

Sunday, December 20th, 2015

YC’s 2015 Reading List

From the post:

Here is a roundup of some of the best books we at Y Combinator read in 2015 – some of them happened to be published this year, but many of them were not. A big hat-tip to Bill Gates, whose legendary reading lists inspired us to make one of our own.

Be not afraid!

There is no ordering by importance, topic or other metric.

Just a list of twenty (20) books that were enjoyed by the folks at Y Combinator.

I read recently that diverse inputs and opinions will make you smarter.

While I run that to ground, check your local library or bookstore for one or more of these volumes.

Paradise Lost (John MILTON, 1608 – 1674) Audio Version

Thursday, December 10th, 2015

Paradise Lost (John MILTON, 1608 – 1674) Audio Version.

As you know, John Milton was blind when he wrote Paradise Lost. His only “interface” for writing, editing and correcting was aural.

Shoppers and worshipers need to attend very closely to the rhetoric of the season. Listening to Paradise Lost, even as Milton did, may sharpen your ear for rhetorical devices and words that would otherwise pass unnoticed.

For example, what are the “good tidings” of Christmas hymns? Are they about the “…new born king…” or are they anticipating the sacrifice of that “…new born king…” instead of ourselves?

The first seems traditional and fairly benign; the second seems more self-centered and selfish than the usual Christmas holiday theme.

If you think that is an aberrant view of the holiday, consider that in A Christmas Carol by Charles Dickens, Scrooge (spoiler alert) ends the tale by keeping Christmas in his heart all year round.

One of the morals being that we should treat others kindly and with consideration every day of the year. Not as some modern Christians do, half-listening at an hour long service once a week and spending the waking portion of the other 167 hours not being Christians.

Paradise Lost is a complex and nuanced text. Learning to spot its rhetorical moves and devices will make you a more discerning observer of modern discourse.

Enjoy!

The Preservation of Favoured Traces [Multiple Editions of Darwin]

Thursday, December 10th, 2015

The Preservation of Favoured Traces

From the webpage:

Charles Darwin first published On the Origin of Species in 1859, and continued revising it for several years. As a result, his final work reads as a composite, containing more than a decade’s worth of shifting approaches to his theory of evolution. In fact, it wasn’t until his fifth edition that he introduced the concept of “survival of the fittest,” a phrase that actually came from philosopher Herbert Spencer. By color-coding each word of Darwin’s final text by the edition in which it first appeared, our latest book and poster of his work trace his thoughts and revisions, demonstrating how scientific theories undergo adaptation before their widespread acceptance.

The original interactive version was built in tandem with exploratory and teaching tools, enabling users to see changes at both the macro level, and word-by-word. The printed poster allows you to see the patterns where edits and additions were made and—for those with good vision—you can read all 190,000 words on one page. For those interested in curling up and reading at a more reasonable type size, we’ve also created a book.

The poster and book are available for purchase below. All proceeds are donated to charity.

For textual history fans this is an impressive visualization of the various editions of On the Origin of Species.
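The core idea, labeling each word of the final text with the edition in which it first appeared, can be approximated in a few lines. This is a sketch using Python’s standard `difflib`, not the method used by the project; the function name is mine and the three sample “editions” are made-up sentences, not Darwin’s text.

```python
from difflib import SequenceMatcher


def first_appearance(editions):
    """Label each word of the final edition with the 1-based number of the
    edition in which it first appeared.

    editions: list of strings, oldest first; words are split on whitespace.
    Returns a list of (word, edition_number) pairs for the last edition.
    """
    prev_words = editions[0].split()
    prev_labels = [1] * len(prev_words)
    for number, text in enumerate(editions[1:], start=2):
        words = text.split()
        labels = [number] * len(words)  # treat as new until matched
        matcher = SequenceMatcher(a=prev_words, b=words, autojunk=False)
        # Words aligned with the previous edition inherit its labels.
        for block in matcher.get_matching_blocks():
            for offset in range(block.size):
                labels[block.b + offset] = prev_labels[block.a + offset]
        prev_words, prev_labels = words, labels
    return list(zip(prev_words, prev_labels))


# Made-up sample text standing in for three successive editions.
editions = [
    "the strongest survive",
    "the fittest survive",
    "the fittest often survive",
]
print(first_appearance(editions))
```

Each label could then be mapped to a color. Word-level diffing is crude next to a proper collation of TEI-encoded editions, but it is enough to see the pattern.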

To help students get away from the notion of texts as static creations, plus to gain some experience with markup, consider choosing a well known work that has multiple editions that is available in TEI.

Then have the students write XQuery expressions to transform a chapter of such a work into a later (or earlier) edition.

Depending on the quality of the work, that could be a means of contributing to the number of TEI encoded texts and your students would gain experience with both TEI and XQuery.

The Architecture of Open Source Applications

Thursday, November 12th, 2015

The Architecture of Open Source Applications

From the webpage:

Architects look at thousands of buildings during their training, and study critiques of those buildings written by masters. In contrast, most software developers only ever get to know a handful of large programs well—usually programs they wrote themselves—and never study the great programs of history. As a result, they repeat one another’s mistakes rather than building on one another’s successes.

Our goal is to change that. In these two books, the authors of four dozen open source applications explain how their software is structured, and why. What are each program’s major components? How do they interact? And what did their builders learn during their development? In answering these questions, the contributors to these books provide unique insights into how they think.

If you are a junior developer, and want to learn how your more experienced colleagues think, these books are the place to start. If you are an intermediate or senior developer, and want to see how your peers have solved hard design problems, these books can help you too.

Follow us on our blog at http://aosabook.org/blog/, or on Twitter at @aosabook and using the #aosa hashtag.

I happened upon these four books because of a tweet that mentioned: Early Access Release of Allison Kaptur’s “A Python Interpreter Written in Python” Chapter, which I found to be the tenth chapter of “500 Lines.”

OK, but what the hell is “500 Lines?” Poking around a bit I found The Architecture of Open Source Applications.

Which is the source for the material I quote above.

Do you learn from example?

Let me give you the flavor of three of the completed volumes and the “500 Lines” that is in progress:

The Architecture of Open Source Applications: Elegance, Evolution, and a Few Fearless Hacks (vol. 1), from the introduction:

Carpentry is an exacting craft, and people can spend their entire lives learning how to do it well. But carpentry is not architecture: if we step back from pitch boards and miter joints, buildings as a whole must be designed, and doing that is as much an art as it is a craft or science.

Programming is also an exacting craft, and people can spend their entire lives learning how to do it well. But programming is not software architecture. Many programmers spend years thinking about (or wrestling with) larger design issues: Should this application be extensible? If so, should that be done by providing a scripting interface, through some sort of plugin mechanism, or in some other way entirely? What should be done by the client, what should be left to the server, and is “client-server” even a useful way to think about this application? These are not programming questions, any more than where to put the stairs is a question of carpentry.

Building architecture and software architecture have a lot in common, but there is one crucial difference. While architects study thousands of buildings in their training and during their careers, most software developers only ever get to know a handful of large programs well. And more often than not, those are programs they wrote themselves. They never get to see the great programs of history, or read critiques of those programs’ designs written by experienced practitioners. As a result, they repeat one another’s mistakes rather than building on one another’s successes.

This book is our attempt to change that. Each chapter describes the architecture of an open source application: how it is structured, how its parts interact, why it’s built that way, and what lessons have been learned that can be applied to other big design problems. The descriptions are written by the people who know the software best, people with years or decades of experience designing and re-designing complex applications. The applications themselves range in scale from simple drawing programs and web-based spreadsheets to compiler toolkits and multi-million line visualization packages. Some are only a few years old, while others are approaching their thirtieth anniversary. What they have in common is that their creators have thought long and hard about their design, and are willing to share those thoughts with you. We hope you enjoy what they have written.

The Architecture of Open Source Applications: Structure, Scale, and a Few More Fearless Hacks (vol. 2), from the introduction:

In the introduction to Volume 1 of this series, we wrote:

Building architecture and software architecture have a lot in common, but there is one crucial difference. While architects study thousands of buildings in their training and during their careers, most software developers only ever get to know a handful of large programs well… As a result, they repeat one another’s mistakes rather than building on one another’s successes… This book is our attempt to change that.

In the year since that book appeared, over two dozen people have worked hard to create the sequel you have in your hands. They have done so because they believe, as we do, that software design can and should be taught by example—that the best way to learn how think like an expert is to study how experts think. From web servers and compilers through health record management systems to the infrastructure that Mozilla uses to get Firefox out the door, there are lessons all around us. We hope that by collecting some of them together in this book, we can help you become a better developer.

The Performance of Open Source Applications, from the introduction:

It’s commonplace to say that computer hardware is now so fast that most developers don’t have to worry about performance. In fact, Douglas Crockford declined to write a chapter for this book for that reason:

If I were to write a chapter, it would be about anti-performance: most effort spent in pursuit of performance is wasted. I don’t think that is what you are looking for.

Donald Knuth made the same point thirty years ago:

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

but between mobile devices with limited power and memory, and data analysis projects that need to process terabytes, a growing number of developers do need to make their code faster, their data structures smaller, and their response times shorter. However, while hundreds of textbooks explain the basics of operating systems, networks, computer graphics, and databases, few (if any) explain how to find and fix things in real applications that are simply too damn slow.

This collection of case studies is our attempt to fill that gap. Each chapter is written by real developers who have had to make an existing system faster or who had to design something to be fast in the first place. They cover many different kinds of software and performance goals; what they have in common is a detailed understanding of what actually happens when, and how the different parts of large applications fit together. Our hope is that this book will—like its predecessor The Architecture of Open Source Applications—help you become a better developer by letting you look over these experts’ shoulders.

500 Lines or Less, from the GitHub page:

Every architect studies family homes, apartments, schools, and other common types of buildings during her training. Equally, every programmer ought to know how a compiler turns text into instructions, how a spreadsheet updates cells, and how a database efficiently persists data.

Previous books in the AOSA series have done this by describing the high-level architecture of several mature open-source projects. While the lessons learned from those stories are valuable, they are sometimes difficult to absorb for programmers who have not yet had to build anything at that scale.

“500 Lines or Less” focuses on the design decisions and tradeoffs that experienced programmers make when they are writing code:

  • Why divide the application into these particular modules with these particular interfaces?
  • Why use inheritance here and composition there?
  • How do we predict where our program might need to be extended, and how can we make that easy for other programmers?

Each chapter consists of a walkthrough of a program that solves a canonical problem in software engineering in at most 500 source lines of code. We hope that the material in this book will help readers understand the varied approaches that engineers take when solving problems in different domains, and will serve as a basis for projects that extend or modify the contributions here.

If you answered the question about learning from example with yes, add these works to your read and re-read list.

BTW, for markup folks, check out Parsing XML at the Speed of Light by Arseny Kapoulkine.

Many hours of reading and keyboard pleasure await anyone using these volumes.

How to Read a Book:…

Saturday, October 31st, 2015

How to Read a Book: The Classic Guide to Intelligent Reading (A Touchstone book) by Mortimer J. Adler and Charles Van Doren.

I should have thought about this book when I posted How to Read a Paper. I haven’t seen a copy in years but that’s a flimsy excuse for forgetting about it. I was reminded of it today when I saw it in a tweet by Michael Nielsen.

Amazon has this description:

With half a million copies in print, How to Read a Book is the best and most successful guide to reading comprehension for the general reader, completely rewritten and updated with new material.

Originally published in 1940, this book is a rare phenomenon, a living classic that introduces and elucidates the various levels of reading and how to achieve them—from elementary reading, through systematic skimming and inspectional reading, to speed reading. Readers will learn when and how to “judge a book by its cover,” and also how to X-ray it, read critically, and extract the author’s message from the text.

Also included is instruction in the different techniques that work best for reading particular genres, such as practical books, imaginative literature, plays, poetry, history, science and mathematics, philosophy and social science works.

Finally, the authors offer a recommended reading list and supply reading tests you can use to measure your own progress in reading skills, comprehension, and speed.

Is How to Read a Book as relevant today as it was in 1940?

In chapter 1, Adler makes a critical distinction between facts and understanding and laments the packaging of opinions:

Perhaps we know more about the world than we used to, and insofar as knowledge is prerequisite to understanding, that is all to the good. But knowledge is not as much a prerequisite to understanding as is commonly supposed. We do not have to know everything about something in order to understand it; too many facts are often as much of an obstacle to understanding as too few. There is a sense in which we moderns are inundated with facts to the detriment of understanding.

One of the reasons for this situation is that the very media we have mentioned are so designed as to make thinking seem unnecessary (though this is only an appearance). The packaging of intellectual positions and views is one of the most active enterprises of some of the best minds of our day. The viewer of television, the listener to radio, the reader of magazines, is presented with a whole complex of elements—all the way from ingenious rhetoric to carefully selected data and statistics—to make it easy for him to “make up his own mind” with the minimum of difficulty and effort. But the packaging is often done so effectively that the viewer, listener, or reader does not make up his own mind at all. Instead, he inserts a packaged opinion into his mind, somewhat like inserting a cassette into a cassette player. He then pushes a button and “plays back” the opinion whenever it seems appropriate to do so. He has performed acceptably without having had to think.

I can’t imagine Adler’s characterization of Fox News, CNN, Facebook and other forums that inundate us with nothing but pre-packaged opinions and repetition of the same.

Although not in modern gender-neutral words:

…he inserts a packaged opinion into his mind, somewhat like inserting a cassette into a cassette player. He then pushes a button and “plays back” the opinion whenever it seems appropriate to do so. He has performed acceptably without having had to think.

In a modern context, such viewers, listeners, or readers, in addition to the “play back” function are also quick to denounce anyone who questions their pre-recorded narrative as a “troll.” Fearing discussion of other narratives, alternative experiences or explanations, is a sure sign of a pre-recorded opinion. Discussion interferes with the propagation of pre-recorded opinions.

How to Mark a Book has delightful advice from Adler on marking books. It captures the essence of Adler’s love of books and reading.

Obfuscation: how leaving a trail of confusion can beat online surveillance [Book]

Saturday, October 24th, 2015

Obfuscation: how leaving a trail of confusion can beat online surveillance by Julia Powles.

From the post:

At the heart of Cambridge University, there’s a library tower filled with 200,000 forgotten books. Rumoured by generations of students to hold the campus collection of porn, Sir Gilbert Scott’s tower is, in fact, filled with pocket books. Guides, manuals, tales and pamphlets for everyday life, deemed insufficiently scholarly for the ordinary collection, they stand preserved as an extraordinary relic of past preoccupations.

One new guide in the handbook tradition – and one that is decidedly on point for 2015 – is the slim, black, cloth-bound volume, Obfuscation: A User’s Guide for Privacy and Protest, published by MIT Press. A collaboration between technologist Finn Brunton and philosopher Helen Nissenbaum, both of New York University, Obfuscation packs utility, charm and conviction into its tightly-composed 100-page core. This is a thin book, but its ambition is vast.

Brunton and Nissenbaum aim to start a “big little revolution” in the data-mining and surveillance business, by “throwing some sand in the gears, kicking up dust and making some noise”. Specifically, the authors champion the titular term, obfuscation, or “the addition of ambiguous, confusing, or misleading information to interfere with surveillance and data collection projects”. The objective of such measures is to thwart profiling, “to buy time, gain cover, and hide in a crowd of signals”.

Read Julia’s review and then order Obfuscation: A User’s Guide for Privacy and Protest or add it to your wish list!

MIT Press give this description:

With Obfuscation, Finn Brunton and Helen Nissenbaum mean to start a revolution. They are calling us not to the barricades but to our computers, offering us ways to fight today’s pervasive digital surveillance—the collection of our data by governments, corporations, advertisers, and hackers. To the toolkit of privacy protecting techniques and projects, they propose adding obfuscation: the deliberate use of ambiguous, confusing, or misleading information to interfere with surveillance and data collection projects. Brunton and Nissenbaum provide tools and a rationale for evasion, noncompliance, refusal, even sabotage—especially for average users, those of us not in a position to opt out or exert control over data about ourselves. Obfuscation will teach users to push back, software developers to keep their user data safe, and policy makers to gather data without misusing it.

Brunton and Nissenbaum present a guide to the forms and formats that obfuscation has taken and explain how to craft its implementation to suit the goal and the adversary. They describe a series of historical and contemporary examples, including radar chaff deployed by World War II pilots, Twitter bots that hobbled the social media strategy of popular protest movements, and software that can camouflage users’ search queries and stymie online advertising. They go on to consider obfuscation in more general terms, discussing why obfuscation is necessary, whether it is justified, how it works, and how it can be integrated with other privacy practices and technologies.

In hardcover, Obfuscation retails at $19.95, for 136 pages.

MIT should issue a paperback version for $5.00 (or less in bulk), to put Obfuscation in the range of conference swag.

The underlying principles and discussion are all very scholarly I’m sure (I haven’t read it yet) but obfuscation can only flourish when practiced in large numbers. Cf. “I’m Spartacus”. Spartacus (IMDB), Spartacus Film (Wikipedia)

To paraphrase the Capital One ad: How many different identities do you have in your wallet?

16+ Free Data Science Books

Sunday, October 18th, 2015

16+ Free Data Science Books by William Chen.

From the webpage:

As a data scientist at Quora, I often get asked for my advice about becoming a data scientist. To help those people, I’ve taken some time to compile my top recommendations of quality data science books that are either available for free (by generosity of the author) or are Pay What You Want (PWYW) with $0 minimum.

Please bookmark this place and refer to it often! Click on the book covers to take yourself to the free versions of the book. I’ve also provided Amazon links (when applicable) in my descriptions in case you want to buy a physical copy. There’s actually more than 16 free books here since I’ve added a few since conception, but I’m keeping the name of this website for recognition.

The authors of these books have put in much effort to produce these free resources – please consider supporting them through avenues that the authors provide, such as contributing via PWYW or buying a hard copy [Disclosure: I get a small commission via the Amazon links, and I am co-author of one of these books].

Some of the usual suspects are here along with some unexpected titles, such as A First Course in Design and Analysis of Experiments by Gary W. Oehlert.

From the introduction:

Researchers use experiments to answer questions. Typical questions might be:

  • Is a drug a safe, effective cure for a disease? This could be a test of how AZT affects the progress of AIDS.
  • Which combination of protein and carbohydrate sources provides the best nutrition for growing lambs?
  • How will long-distance telephone usage change if our company offers a different rate structure to our customers?
  • Will an ice cream manufactured with a new kind of stabilizer be as palatable as our current ice cream?
  • Does short-term incarceration of spouse abusers deter future assaults?
  • Under what conditions should I operate my chemical refinery, given this month’s grade of raw material?

This book is meant to help decision makers and researchers design good experiments, analyze them properly, and answer their questions.

It isn’t short, six hundred and fifty-nine pages, but taken in small doses you will learn a great deal about experimental design. Not only how to properly design experiments but how to spot when they aren’t well designed.

Think of it as training to go big-game hunting in the latest issue of Nature or Science. Adds a bit of competitiveness to the enterprise.

Python Week 2015 (Packt Publishing)

Monday, October 12th, 2015

Python Week 2015 (Packt Publishing)

Packt Publishing is giving away free ebooks and offering 20% off their top selling Python books and videos.

The free book for today (good for approximately 22 hours from this posting):

Building Machine Learning Systems with Python

Expand your Python knowledge and learn all about machine-learning libraries in this user-friendly manual. ML is the next big breakthrough in technology and this book will give you the head-start you need.

  • Master Machine Learning using a broad set of Python libraries and start building your own Python-based ML systems
  • Covers classification, regression, feature engineering, and much more guided by practical examples
  • A scenario-based tutorial to get into the right mind-set of a machine learner (data exploration) and successfully implement this in your new or existing projects

I didn’t know this was Python week! 😉

BTW, there is a website devoted to awareness days, weeks, months: http://www.national-awareness-days.com/

They seem to take the idea quite seriously but they didn’t have Python week on their calendar.

Is the term “tease” still in fashion?

Thursday, October 1st, 2015

I ask if “tease” is still in fashion (or its more sexist equivalent) because I keep running across partial O’Reilly publications that are touted as “free,” but are in reality, just extended ads for forthcoming books.

A case in point is “Transforms in CSS” which isn’t really a book but an excerpt from the fourth edition of CSS: The Definitive Guide.

Forty page book?

Social media will light up with posts and reposts about this “free” title.

Save your time and disk space. If anything, get a preview copy of the fourth edition of CSS: The Definitive Guide when it is available.

Make no mistake, I like O’Reilly publications and I am presently reading what I suspect is the best O’Reilly title in a number of years, XQuery by Priscilla Walmsley.

O’Reilly shouldn’t waste bandwidth with disconnected excerpts for its titles.

Writing “Python Machine Learning”

Saturday, September 26th, 2015

Writing “Python Machine Learning” by Sebastian Raschka.

From the post:

It’s been about time. I am happy to announce that “Python Machine Learning” was finally released today! Sure, I could just send an email around to all the people who were interested in this book. On the other hand, I could put down those 140 characters on Twitter (minus what it takes to insert a hyperlink) and be done with it. Even so, writing “Python Machine Learning” really was quite a journey for a few months, and I would like to sit down in my favorite coffeehouse once more to say a few words about this experience.

A delightful tale for those of us who have authored books and an inspiration (with some practical suggestions) for anyone who hopes to write a book.

Sebastian’s productivity hints will ring familiar for those with similar habits and bear study by those who hope to become more productive.

Sebastian never comes out and says it but his writing approach breaks each stage of the book into manageable portions. It is far easier to say (and do) “write an outline” than to “write the complete and fixed outline for an almost 500 page book.”

If the task is too large, the complete and immutable outline, you won’t get up enough momentum to make a reasonable start.

After reading Sebastian’s post, what book are you thinking about writing?

Free Data Science Books (Update, + 53 books, 117 total)

Saturday, September 26th, 2015

Free Data Science Books (Update).

From the post:

Pulled from the web, here is a great collection of eBooks (most of which have a physical version that you can purchase on Amazon) written on the topics of Data Science, Business Analytics, Data Mining, Big Data, Machine Learning, Algorithms, Data Science Tools, and Programming Languages for Data Science.

While every single book in this list is provided for free, if you find any particularly helpful consider purchasing the printed version. The authors spent a great deal of time putting these resources together and I’m sure they would all appreciate the support!

Note: Updated books as of 9/21/15 are post-fixed with an asterisk (*). Scroll to updates

Great news but also more content.

Unlike big data, you have to read this content in detail to obtain any benefit from it.

And books in the same area are going to have overlapping content as well as some unique content.

Imagine how useful it would be to compose a free standing work with the “best” parts from several works.

Copyright laws would be a larger barrier but no more than if you cut-n-pasted your own version for personal use.

If such an approach could be made easy enough, the resulting value would drown out dissenting voices.

I think PDF is the principal practical barrier.

Do you suspect others?

I first saw this in a tweet by Kirk Borne.

The Enemies of Books

Friday, September 4th, 2015

The Enemies of Books by William Blades.

Published in 1888, The Enemies of Books reflects the biases and prejudices of its time, much as our literature transparently carries forward our biases and prejudices.

A valuable reminder in these censorship-happy times that knowledge has long been deemed dangerous.

See in particular Chapter 5 Ignorance and Bigotry.

The suppression of “terrorist” literature, from tweets to websites, certainly falls under bigotry and possibly ignorance as well.

Extremist literature of all kinds is heavily repetitive and while it may be exciting to look at what has been forbidden, the thrill wears off fairly quickly. Al Goldstein, the publisher of Screw, once admitted in an interview that after about a year of Screw, if you were paying attention, you would notice the same story lines starting to circle back around.

If that’s a problem with sex, it isn’t hard to imagine that political issues discussed with no nuance, no depth of analysis, no sense of history, but simply “I’m right and X must die!” get old pretty quickly.

If you believe U.S. reports on Osama bin Laden, even bin Laden wasn’t on a steady diet of hate literature but had Western materials as well as soft porn.

If the would-be-censors would stop wasting funds on trying to censor social media and the Internet, perhaps they could find the time for historical, nuanced and deep analysis of current issues to publish in an attractive manner.

Censors don’t think and they don’t want you to either.

Let’s disappoint them together!

unglue.it

Monday, August 31st, 2015

unglue.it

From the webpage:

unglue (v. t.) 2. To make a digital book free to read and use, worldwide.

New to me, possibly old to you.

I “discovered” this site while looking at Intermediate Python.

From the general FAQ:

Basics

How It Works

What is Unglue.it?

Unglue.it is a place for individuals and institutions to join together to make ebooks free to the world. We work together with authors, publishers, or other rights holders who want their ebooks to be free but also want to be able to earn a living doing so. We use Creative Commons licensing as an enabling tool to “unglue” the ebooks.

What are Ungluing Campaigns?

We have three types of Ungluing Campaigns: Pledge Campaigns, Buy-to-Unglue Campaigns and Thanks-for-Ungluing campaigns.

  • In a Pledge Campaign, book lovers pledge their support for ungluing a book. If enough support is found to reach the goal (and only then), the supporters’ credit cards are charged, and an unglued ebook is released.
  • In a Buy-to-Unglue Campaign, every ebook copy sold moves the book’s ungluing date closer to the present. And you can donate ebooks to your local library- that’s something you can’t do in the Kindle or Apple Stores!
  • In a Thanks-for-Ungluing Campaign, the ebook is already released with a Creative Commons license. Supporters can express their thanks by paying what they wish for the license and the ebook.

What is Crowdfunding?

Crowdfunding is collectively pooling contributions (or pledges) to support some cause. Using the internet for coordination means that complete strangers can work together, drawn by a common cause. This also means the number of supporters can be vast, so individual contributions can be as large or as small as people are comfortable with, and still add up to enough to do something amazing.

Want to see some examples? Kickstarter lets artists and inventors solicit funds to make their projects a reality. For instance, webcomic artist Rich Burlew sought $57,750 to reprint his comics in paper form — and raised close to a million.

In other words, crowdfunding is working together to support something you love. By pooling resources, big and small, from all over the world, we can make huge things happen.

What will supplement and then replace contemporary publishing models remains to be seen.

In terms of experiments, this one looks quite promising.

If you use unglue.it, please ping me with your experience. Thanks!

Free Packtpub Books (Legitimate Ones)

Thursday, August 20th, 2015

Packtpub Books is running a “free book per day” event. Most of you know Packtpub already so I won’t belabor the quality of their publications, etc.

The important news is that for 24 hours each day in August, Packtpub Books is offering a different book for free download! The current free book offer appears to expire at the end of August, 2015.

Packtpub Books – Free Learning

This is a great way to introduce non-Packtpub customers to Packtpub publications.

Please share this news widely (and with other publishers). 😉

PACKT Publishing – FREE LEARNING – HELP YOURSELF

Wednesday, February 25th, 2015

PACKT Publishing – FREE LEARNING – HELP YOURSELF

I’m not sure when this started but according to the webpage, there will be one free book per day until March 5, 2015.

I will be checking back tomorrow to see if the selection changes day to day.

Worth a trip just to see if there is anything of interest.

Enjoy!

Harry Potter eBooks

Sunday, February 1st, 2015

All the Harry Potter ebooks are now on subscription site Oyster by Laura Hazard Owen.

Laura reports the Harry Potter books are available on Oyster and Amazon. She says that Oyster has the spin-off titles from the original series where Amazon does not.

Both offer $9.95 per month subscription rates, where Oyster claims “over a million” books and Amazon over 700,000. After reading David Mason’s How many books will you read in your lifetime?, I am not sure the difference in raw numbers will make much difference.

Access to electronic texts will certainly make creating topic maps for popular literature a good deal easier.

Enjoy!

Early English Books Online – Good News and Bad News

Friday, January 2nd, 2015

Early English Books Online

The very good news is that 25,000 volumes from the Early English Books Online collection have been made available to the public!

From the webpage:

The EEBO corpus consists of the works represented in the English Short Title Catalogue I and II (based on the Pollard & Redgrave and Wing short title catalogs), as well as the Thomason Tracts and the Early English Books Tract Supplement. Together these trace the history of English thought from the first book printed in English in 1475 through to 1700. The content covers literature, philosophy, politics, religion, geography, science and all other areas of human endeavor. The assembled collection of more than 125,000 volumes is a mainstay for understanding the development of Western culture in general and the Anglo-American world in particular. The STC collections have perhaps been most widely used by scholars of English, linguistics, and history, but these resources also include core texts in religious studies, art, women’s studies, history of science, law, and music.

Even better news from Sebastian Rahtz (Chief Data Architect, IT Services, University of Oxford):

The University of Oxford is now making this collection, together with Gale Cengage’s Eighteenth Century Collections Online (ECCO), and Readex’s Evans Early American Imprints, available in various formats (TEI P5 XML, HTML and ePub) initially via the University of Oxford Text Archive at http://www.ota.ox.ac.uk/tcp/, and offering the source XML for community collaborative editing via Github. For the convenience of UK universities who subscribe to JISC Historic Books, a link to page images is also provided. We hope that the XML will serve as the base for enhancements and corrections.

This catalogue also lists EEBO Phase 2 texts, but the HTML and ePub versions of these can only be accessed by members of the University of Oxford.

[Technical note]
Those interested in working on the TEI P5 XML versions of the texts can check them out of Github, via https://github.com/textcreationpartnership/, where each of the texts is in its own repository (eg https://github.com/textcreationpartnership/A00021). There is a CSV file listing all the texts at https://raw.githubusercontent.com/textcreationpartnership/Texts/master/TCP.csv, and a simple Linux/OSX shell script to clone all 32853 unrestricted repositories at https://raw.githubusercontent.com/textcreationpartnership/Texts/master/cloneall.sh
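For anyone who would rather script the checkout than run cloneall.sh, here is a minimal Python sketch of the idea in the technical note: read the CSV listing and turn each text ID into a repository URL. The column name "TCP" and the sample rows are my assumptions for illustration (only the ID A00021 comes from the note), so check the real header of TCP.csv before relying on it.

```python
import csv
import io

# Base URL taken from the technical note above.
BASE_URL = "https://github.com/textcreationpartnership/"


def repo_urls(csv_text, id_column="TCP"):
    """Return one repository URL per CSV row that has a text ID.

    id_column is an assumption about the header of TCP.csv;
    pass a different name if the real file uses one.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    urls = []
    for row in reader:
        text_id = (row.get(id_column) or "").strip()
        if text_id:  # skip rows with no ID
            urls.append(BASE_URL + text_id)
    return urls


# Made-up sample standing in for the real listing; titles are placeholders.
SAMPLE = "TCP,Title\nA00021,Placeholder title\n,Row without an ID\n"

for url in repo_urls(SAMPLE):
    print("git clone", url)
```

To use it against the real listing, download TCP.csv, read it into a string, and pass that string to repo_urls in place of SAMPLE.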

Now for the BAD NEWS:

An additional 45,000 books:

Currently, EEBO-TCP Phase II texts are available to authorized users at partner libraries. Once the project is done, the corpus will be available for sale exclusively through ProQuest for five years. Then, the texts will be released freely to the public.

Can you guess why the public is barred from what are obviously public domain texts?

Because our funding is limited, we aim to key as many different works as possible, in the language in which our staff has the most expertise.

Academic projects are supposed to fund themselves and be self-sustaining. When anyone asks about the sustainability of an academic project, ask them when your country’s military was last “self-sustaining.” The U.S. has spent $2.6 trillion on a “war on terrorism” and has nothing to show for it other than dead and injured military personnel, perversion of budgetary policies, and loss of privacy on a world wide scale.

It is hard to imagine what sort of life-time access for everyone on Earth could be secured for less than $1 trillion. No more special pricing and contracts if you are in countries A to Zed. Eliminate all that paperwork for publishers, and all you need for access is a connection to the Internet. The publishers would have a guaranteed income stream, less overhead from sales personnel, administrative staff, etc. And people would have access (whether used or not) to educate themselves, to make new discoveries, etc.

My proposal does not involve payments to large military contractors or subversion of legitimate governments or imposition of American values on other cultures. Leaving those drawbacks to one side, what do you think about it otherwise?