Interactive visual machine learning in spreadsheets

Interactive visual machine learning in spreadsheets by Advait Sarkar, Mateja Jamnik, Alan F. Blackwell, Martin Spott.

Abstract:

BrainCel is an interactive visual system for performing general-purpose machine learning in spreadsheets, building on end-user programming and interactive machine learning. BrainCel features multiple coordinated views of the model being built, explaining its current confidence in predictions as well as its coverage of the input domain, thus helping the user to evolve the model and select training examples. Through a study investigating users’ learning barriers while building models using BrainCel, we found that our approach successfully complements the Teach and Try system [1] to facilitate more complex modelling activities.

To assist users in building machine learning models in spreadsheets:

The user should be able to critically evaluate the quality, capabilities, and outputs of the model. We present “BrainCel,” an interface designed to facilitate this. BrainCel enables the end-user to understand:

  1. How their actions modify the model, through visualisations of the model’s evolution.
  2. How to identify good training examples, through a colour-based interface which “nudges” the user to attend to data where the model has low confidence.
  3. Why and how the model makes certain predictions, through a network visualisation of the k-nearest neighbours algorithm; a simple, consistent way of displaying decisions in an arbitrarily high-dimensional space.

A great example of going where users are spending their time, spreadsheets, as opposed to originating new approaches to data they already possess.

To get a deeper understanding of the Sarkar’s approach to users via spreadsheets as an interface, see also:

Spreadsheet interfaces for usable machine learning by Advait Sarkar.

Abstract:

In the 21st century, it is common for people of many professions to have interesting datasets to which machine learning models may be usefully applied. However, they are often unable to do so due to the lack of usable tools for statistical non-experts. We present a line of research into using the spreadsheet — already familiar to end-users as a paradigm for data manipulation — as a usable interface which lowers the statistical and computing knowledge barriers to building and using these models.

Teach and Try: A simple interaction technique for exploratory data modelling by end users by Advait Sarkar, Alan F Blackwell, Mateja Jamnik, Martin Spott.

Abstract:

The modern economy increasingly relies on exploratory data analysis. Much of this is dependent on data scientists – expert statisticians who process data using statistical tools and programming languages. Our goal is to offer some of this analytical power to end-users who have no statistical training through simple interaction techniques and metaphors. We describe a spreadsheet-based interaction technique that can be used to build and apply sophisticated statistical models such as neural networks, decision trees, support vector machines and linear regression. We present the results of an experiment demonstrating that our prototype can be understood and successfully applied by users having no professional training in statistics or computing, and that the experience of interacting with the system leads them to acquire some understanding of the concepts underlying exploratory statistical modelling.

Sarkar doesn’t mention it but while non-expert users lack skills with machine learning tools, they do have expertise with their own data and domain. Data/domain expertise that is more difficult to communicate to an expert user than machine learning techniques to the non-expert.

Comparison of machine learning expert vs. domain data expert analysis lies in the not too distant and interesting future.

I first saw this in a tweet by Felienne Hermans.

Comments are closed.