Archive for the ‘Statistical Learning’ Category

The Elements of Statistical Learning (2nd ed.)

Wednesday, December 5th, 2012

The Elements of Statistical Learning (2nd ed.) by Trevor Hastie, Robert Tibshirani and Jerome Friedman. (PDF)

The authors note in the preface to the first edition:

The field of Statistics is constantly challenged by the problems that science and industry brings to its door. In the early days, these problems often came from agricultural and industrial experiments and were relatively small in scope. With the advent of computers and the information age, statistical problems have exploded both in size and complexity. Challenges in the areas of data storage, organization and searching have led to the new field of “data mining”; statistical and computational problems in biology and medicine have created “bioinformatics.” Vast amounts of data are being generated in many fields, and the statistician’s job is to make sense of it all: to extract important patterns and trends, and understand “what the data says.” We call this learning from data.

I’m sympathetic to that sentiment but with the caveat that it is our semantic expectations of the data that give it any meaning to be “learned.”

Data isn’t lurking outside our door with “meaning” captured separate and apart from us. Our fancy otherwise obscures our role in the origin of the “meaning” that we attach to data, in part to bolster the claim that the “facts/data say….”

It is we who take up the gage for our mute friends, facts/data, and make claims on their behalf.

If we recognized those as our claims, perhaps we would be more willing to listen to the claims of others. Perhaps.

I first saw this in a tweet by Michael Conover.

Are Expert Semantic Rules so 1980’s?

Monday, October 8th, 2012

In The Geometry of Constrained Structured Prediction: Applications to Inference and Learning of Natural Language Syntax, André Martins proposes advances in inference and learning for NLP. It is important work for that reason.

But in his introduction to recent (and rapid) progress in language technologies, the following text caught my eye:

So, what is the driving force behind the aforementioned progress? Essentially, it is the alliance of two important factors: the massive amount of data that became available with the advent of the Web, and the success of machine learning techniques to extract statistical models from the data (Mitchell, 1997; Manning and Schütze, 1999; Schölkopf and Smola, 2002; Bishop, 2006; Smith, 2011). As a consequence, a new paradigm has emerged in the last couple of decades, which directs attention to the data itself, as opposed to the explicit representation of knowledge (Abney, 1996; Pereira, 2000; Halevy et al., 2009). This data-centric paradigm has been extremely fruitful in natural language processing (NLP), and came to replace the classic knowledge representation methodology which was prevalent until the 1980s, based on symbolic rules written by experts. (emphasis added)

Are RDF, Linked Data, topic maps, and other semantic technologies caught in a 1980’s “symbolic rules” paradigm?

Are we ready to make the same break that NLP did, what, thirty (30) years ago now?

To get started on the literature, consider André’s sources:

Abney, S. (1996). Statistical methods and linguistics. In The balancing act: Combining symbolic and statistical approaches to language, pages 1–26. MIT Press, Cambridge, MA.

A more complete citation: Steven Abney. Statistical Methods and Linguistics. In: Judith Klavans and Philip Resnik (eds.), The Balancing Act: Combining Symbolic and Statistical Approaches to Language. The MIT Press, Cambridge, MA. 1996. (Link is to PDF of Abney’s paper.)

Pereira, F. (2000). Formal grammar and information theory: together again? Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, 358(1769):1239–1253.

I added a pointer to the Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences abstract for the article. You can see it at: Formal grammar and information theory: together again? (PDF file).

Halevy, A., Norvig, P., and Pereira, F. (2009). The unreasonable effectiveness of data. Intelligent Systems, IEEE, 24(2):8–12.

I added a pointer to the Intelligent Systems, IEEE abstract for the article. You can see it at: The unreasonable effectiveness of data (PDF file).

The Halevy article doesn’t have an abstract per se but the ACM reports one as:

Problems that involve interacting with humans, such as natural language understanding, have not proven to be solvable by concise, neat formulas like F = ma. Instead, the best approach appears to be to embrace the complexity of the domain and address it by harnessing the power of data: if other humans engage in the tasks and generate large amounts of unlabeled, noisy data, new algorithms can be used to build high-quality models from the data. [ACM]

That sounds like a challenge to me. You?

PS: I saw the pointer to this thesis at Christophe Lalanne’s A bag of tweets / September 2012

AI & Statistics 2012

Sunday, April 22nd, 2012

AI & Statistics 2012 (La Palma, Canary Islands)

Proceedings:

http://jmlr.csail.mit.edu/proceedings/papers/v22/

As one big file:

http://jmlr.csail.mit.edu/proceedings/papers/v22/v22.tar.gz

Why you should care:

The fifteenth international conference on Artificial Intelligence and Statistics (AISTATS 2012) will be held on La Palma in the Canary Islands. AISTATS is an interdisciplinary gathering of researchers at the intersection of computer science, artificial intelligence, machine learning, statistics, and related areas. Since its inception in 1985, the primary goal of AISTATS has been to broaden research in these fields by promoting the exchange of ideas among them. We encourage the submission of all papers which are in keeping with this objective.

The conference runs April 21 – 23, 2012. Sorry!

You will enjoy looking over the papers!

Statistical Learning Part III

Tuesday, November 8th, 2011

Statistical Learning Part III by Steve Miller.

From the post:

I finally got around to cleaning up my home office the other day. The biggest challenge was putting away all the loose books in such a way that I can quickly retrieve them when needed.

In the clutter I found two copies of “The Elements of Statistical Learning” by Trevor Hastie, Robert Tibshirani and Jerome Friedman – one I purchased two years ago and the other I received at a recent Statistical Learning and Data Mining (SLDM III) seminar taught by the first two authors. ESL is quite popular in the predictive modeling world, often referred to by aficionados as “the book”, “the SL book” or the “big yellow book” in reverence to its status as the SL bible.

Hastie, Tibshirani and Friedman are Professors of Statistics at Stanford University, the top-rated stats department in the country. For over 20 years, the three have been leaders in the field of statistical learning and prediction that sits between traditional statistical modeling and data mining algorithms from computer science. I was introduced to their work when I took the SLDM course three years ago.

Interesting discussion of statistical learning with Q/A session at the end.

Practical Aggregation of Semantical Program Properties for Machine Learning Based Optimization

Tuesday, September 13th, 2011

Practical Aggregation of Semantical Program Properties for Machine Learning Based Optimization by Mircea Namolaru, Albert Cohen, Grigori Fursin, Ayal Zaks, and Ari Freund.

ABSTRACT

Iterative search combined with machine learning is a promising approach to design optimizing compilers harnessing the complexity of modern computing systems. While traversing a program optimization space, we collect characteristic feature vectors of the program, and use them to discover correlations across programs, target architectures, data sets, and performance. Predictive models can be derived from such correlations, effectively hiding the time-consuming feedback-directed optimization process from the application programmer.

One key task of this approach, naturally assigned to compiler experts, is to design relevant features and implement scalable feature extractors, including statistical models that filter the most relevant information from millions of lines of code. This new task turns out to be a very challenging and tedious one from a compiler construction perspective. So far, only a limited set of ad-hoc, largely syntactical features have been devised. Yet machine learning is only able to discover correlations from information it is fed with: it is critical to select topical program features for a given optimization problem in order for this approach to succeed.

We propose a general method for systematically generating numerical features from a program. This method puts no restrictions on how to logically and algebraically aggregate semantical properties into numerical features. We illustrate our method on the difficult problem of selecting the best possible combination of 88 available optimizations in GCC. We achieve 74% of the potential speedup obtained through iterative compilation on a wide range of benchmarks and four different general-purpose and embedded architectures. Our work is particularly relevant to embedded system designers willing to quickly adapt the optimization heuristics of a mainstream compiler to their custom ISA, microarchitecture, benchmark suite and workload. Our method has been integrated with the publicly released MILEPOST GCC [14].

Read the portions on extracting features, inference of new relations, extracting relations from programs, extracting features from relations and tell me this isn’t a description of pre-topic map processing! ;-)

Practical Machine Learning

Wednesday, May 18th, 2011

Practical Machine Learning, by Michael Jordan (UC Berkeley).

From the course webpage:

This course introduces core statistical machine learning algorithms in a (relatively) non-mathematical way, emphasizing applied problem-solving. The prerequisites are light; some prior exposure to basic probability and to linear algebra will suffice.

This is the Michael Jordan who gave a Posner Lecture at the 24th Annual Conference on Neural Information Processing Systems 2010.

The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second Edition

Sunday, April 17th, 2011

The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second Edition

by Trevor Hastie, Robert Tibshirani and Jerome Friedman.

The full pdf of the latest printing is available at this site.

I strongly recommend that, if you find the text useful, you ask your library to order the print version.

From the website:

During the past decade there has been an explosion in computation and information technology. With it has come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It should be a valuable resource for statisticians and anyone interested in data mining in science or industry. The book’s coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting–the first comprehensive treatment of this topic in any book.

This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression & path algorithms for the lasso, non-negative matrix factorization and spectral clustering. There is also a chapter on methods for “wide” data (p bigger than n), including multiple testing and false discovery rates.

Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie wrote much of the statistical modeling software in S-PLUS and invented principal curves and surfaces. Tibshirani proposed the Lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, and projection pursuit.