Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

October 11, 2011

Free Programming Books

Filed under: Language,Programming,Recognition,Subject Identity — Patrick Durusau @ 5:54 pm

Despite the title (the author’s update only went so far), there are more than 50 books listed here.

I won’t tweet this because, like Lucene turning ten, everyone in the world has already tweeted or retweeted the news of these books.

I seem to be on a run of mostly programming resources today and I thought you might find the list interesting, possibly useful.

Especially those of you interested in pattern matching.

It occurs to me that programming languages and books about programming languages are fit fodder for the same tools used on other texts.

I am sure an index exists somewhere with all the “hello, world” examples from various computer languages, but are there more/deeper similarities than the universal initial example?

There was a universe of programming languages prior to “hello, world” and there is certainly a very large one beyond those listed here but one has to start somewhere. So why not with this set?

I think the first question I would ask is the obvious one: Are there groupings of these works, other than the ones noted? What measures would you use and why? What results do you get?

I suppose before that you need to gather up the texts and do whatever cleanup/conversion is required. A short note on what you did there would be useful.

Which parts of that cleanup were done in anticipation of your methods for grouping the texts?
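To make the grouping question concrete, here is a minimal sketch of one possible measure: cosine similarity over raw word counts. Everything in it is an assumption for illustration. The three sample strings are invented stand-ins for cleaned-up book texts, the function names are hypothetical, and word counts are only one of many measures you might pick (and defend).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pairwise_similarities(texts):
    """Return {(name1, name2): similarity} for every pair of texts."""
    vectors = {name: Counter(tokenize(body)) for name, body in texts.items()}
    names = sorted(vectors)
    return {
        (x, y): cosine_similarity(vectors[x], vectors[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }


# Invented stand-ins for cleaned-up book texts (not real excerpts).
sample_texts = {
    "lisp_book": "a list is built from pairs and a function maps over the list",
    "scheme_book": "a function applied to a list returns a new list of pairs",
    "sql_book": "a query selects rows from a table and joins another table",
}

similarities = pairwise_similarities(sample_texts)
for pair, score in sorted(similarities.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

Note that the cleanup step already shapes the result: dropping punctuation and case here throws away exactly the tokens (parentheses, semicolons) that distinguish one language's code from another's.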

Patience, topic map fans, we are getting to the questions of subject recognition.

So, what subjects should we recognize across programming languages? Not tokens or even signatures but subjects. Signatures may be a way of identifying subjects, but can the same subjects have different signatures in distinct languages?

Would recognition of subjects across programming languages assist in learning languages? In developing new languages (what is commonly needed)? In studying the evolution of languages (where did we go right/wrong)? In usefully indexing CS literature? And so on.

And you thought this was a post about “free” programming books. 😉

September 14, 2011

Don’t trust your instincts

Filed under: Data Analysis,Language,Recognition,Research Methods — Patrick Durusau @ 7:04 pm

I stumbled upon a review of “The Secret Life of Pronouns: What Our Words Say About Us” by James W. Pennebaker in the New York Times Book Review, 28 August 2011.

Pennebaker is a word counter whose first rule is: “Don’t trust your instincts.”

Why? In part because our expectations shape our view of the data. (sound familiar?)

The review quotes the Drudge Report as posting a headline about President Obama that reads: “I ME MINE: Obama praises C.I.A. for bin Laden raid – while saying ‘I’ 35 Times.”

If the listener thinks President Obama is self-centered, the “I’s” have it, as it were.

But Pennebaker has used his programs to mindlessly count word usage in press conferences since Truman. Obama is the lowest I-word user among modern presidents.
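Mindless counting of this kind is easy to sketch. The version below is my own illustration, not Pennebaker's program: the I-word list is an assumed one, the function name is hypothetical, and the tokenizer is deliberately crude. It reports a rate rather than a raw count, since a tally like “35 times” says little without knowing how long the speaker talked.

```python
import re

# Assumed I-word list for illustration; not Pennebaker's actual lexicon.
I_WORDS = {"i", "me", "my", "mine", "myself"}


def i_word_rate(text):
    """Return the fraction of word tokens that are first-person singular."""
    # Normalize curly apostrophes so "I’m" and "I'm" tokenize the same way.
    normalized = text.lower().replace("’", "'")
    tokens = re.findall(r"[a-z']+", normalized)
    if not tokens:
        return 0.0
    # Strip contractions ("i'm" -> "i") before checking the word list.
    hits = sum(1 for t in tokens if t.split("'")[0] in I_WORDS)
    return hits / len(tokens)


# Invented sample sentence: 4 I-words out of 9 tokens.
print(round(i_word_rate("I think I'm right, and my view is mine."), 3))
```

Run the same function over transcripts from several speakers and compare the rates, not your impressions.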

That is only one illustration of how badly we can “look” at text or data and get it seriously wrong.

The Secret Life of Pronouns website has exercises to demonstrate how badly we get things wrong. (The videos are very entertaining.)

What does that mean for topic maps and authoring topic maps?

  1. Don’t trust your instincts. (courtesy of Pennebaker)
  2. View your data in different ways, ask unexpected questions.
  3. Ask people unfamiliar with your data how they view it.
  4. Read books on subjects you know nothing about. (Just general good advice.)
  5. Ask known unconventional people to question your data/subjects. (Like me! Sorry, consulting plug.)

August 17, 2011

Embracing Uncertainty: Applied Machine Learning Comes of Age

Filed under: Machine Learning,Recognition — Patrick Durusau @ 6:48 pm

Christopher Bishop, Microsoft Research Cambridge, ICML 2011 Keynote.

Christopher reports the discovery that gesture control is not a problem of tracking location, say of your arm from position to position.

Rather it is a question of recognition, at every frame. Which makes the computation tractable on older hardware.

Which makes me wonder how many other problems we have “viewed” in the most difficult way possible? Or where viewing them as problems of recognition would make previously intractable problems tractable? Won’t know unless we make the effort to ask.
