Archive for the ‘Deep Learning’ Category

Teaching Deep Convolutional Neural Networks to Play Go [Networks that can’t explain their play]

Sunday, October 18th, 2015

Teaching Deep Convolutional Neural Networks to Play Go by Christopher Clark, Amos Storkey.


Mastering the game of Go has remained a long standing challenge to the field of AI. Modern computer Go systems rely on processing millions of possible future positions to play well, but intuitively a stronger and more ‘humanlike’ way to play the game would be to rely on pattern recognition abilities rather then brute force computation. Following this sentiment, we train deep convolutional neural networks to play Go by training them to predict the moves made by expert Go players. To solve this problem we introduce a number of novel techniques, including a method of tying weights in the network to ‘hard code’ symmetries that are expect to exist in the target function, and demonstrate in an ablation study they considerably improve performance. Our final networks are able to achieve move prediction accuracies of 41.1% and 44.4% on two different Go datasets, surpassing previous state of the art on this task by significant margins. Additionally, while previous move prediction programs have not yielded strong Go playing programs, we show that the networks trained in this work acquired high levels of skill. Our convolutional neural networks can consistently defeat the well known Go program GNU Go, indicating it is state of the art among programs that do not use Monte Carlo Tree Search. It is also able to win some games against state of the art Go playing program Fuego while using a fraction of the play time. This success at playing Go indicates high level principles of the game were learned.

The last line of the abstract caught my eye:

This success at playing Go indicates high level principles of the game were learned.

That statement is expanded in 4.3 Playing Go:

The results are very promising. Even though the networks are playing using a ‘zero step look ahead’ policy, and using a fraction of the computation time as their opponents, they are still able to play better then GNU Go and take some games away from Fuego. Under these settings GNU Go might play at around a 6-8 kyu ranking and Fuego at 2-3 kyu, which implies the networks are achieving a ranking of approximately 4-5 kyu. For a human player reaching this ranking would normally require years of study. This indicates that sophisticated knowledge of the game was acquired. This also indicates great potential for a Go program that integrates the information produced by such a network.

An interesting limitation that the network can’t communicate what it has learned. It can only produce an answer for a given situation. In gaming situations that opaqueness isn’t immediately objectionable.

But what if the situation was fire/don’t fire in a combat situation? Would the limitation that the network can only say yes or no, with no way to explain its answer, be acceptable?

Is that any worse than humans inventing explanations for decisions that weren’t the result of any rational thinking process?

Some additional Go resources you may find useful: American Go Association, Go Game Guru (with a printable Go board and stones), (has a Japanese dictionary). Those site will lead you to many other Go sites.

Neural Networks and Deep Learning

Wednesday, June 3rd, 2015

Neural Networks and Deep Learning by Michael Nielsen.

From the webpage:

Neural Networks and Deep Learning is a free online book. The book will teach you about:

  • Neural networks, a beautiful biologically-inspired programming paradigm which enables a computer to learn from observational data
  • Deep learning, a powerful set of techniques for learning in neural networks

Neural networks and deep learning currently provide the best solutions to many problems in image recognition, speech recognition, and natural language processing. This book will teach you the core concepts behind neural networks and deep learning.

The book is currently an incomplete beta draft. More chapters will be added over the coming months. For now, you can:

Michael starts off with a task that we all mastered as small children, recognizing hand written digits. Along the way, you will learn not just the mechanics of how the characters are recognized but why neural networks work the way they do.

Great introductory material to pass along to a friend.

Deep Learning (MIT Press Book) – Update

Friday, May 22nd, 2015

Deep Learning (MIT Press Book) by Yoshua Bengio, Ian Goodfellow and Aaron Courville.

I last mentioned this book last August and wanted to point out that a new draft appeared on 19/05/2015.

Typos and opportunities for improvement still exist! Now is your chance to help the authors make this a great book!


Summer DIY: Combination Lock Cracker

Monday, May 18th, 2015

Former virus writer open-sources his DIY combination lock-picking robot by Paul Ducklin.

Amusing account of Samy Kamkar and his hacking history up to and including:

…an open-source 3D-printed robot that can crack a combination lock in just 30 seconds by twiddling the dial all by itself.

Paul includes some insights into opening combination locks.

Good opportunity to learn about 3D printing and fundamentals of combination locks.

Advanced: Safe Cracker

If that seems too simple, try safe locks with the 3D-printed robot (adjust for the size/torque required to turn the dial). The robot will turn the dial more consistently than any human hand. Use very sensitive vibration detectors to pick up the mechanical movement of the lock, capture that vibration as a digital file, from knowledge of the lock, you know the turns, directions, etc.

Then use deep learning over several passes on the lock to discover the opening sequence. Need a stand for the robot to isolate its vibrations from the safe housing and for it to reach the combination dial.

Or you can call a locksmith and pay big bugs to open a safe.

The DIY way has you learning some mechanics, a little physics and deep learning.

If you are up for a real challenge, consider the X-09™ Locks (NSN #5340-01-498-2758), which is certified to meet FF-L-2740A, the “the US Government’s highest security standard for container locks and doors.”

The factory default combination is 50-25-50, so try that first. 😉

Practical Text Analysis using Deep Learning

Friday, May 1st, 2015

Practical Text Analysis using Deep Learning by Michael Fire.

From the post:

Deep Learning has become a household buzzword these days, and I have not stopped hearing about it. In the beginning, I thought it was another rebranding of Neural Network algorithms or a fad that will fade away in a year. But then I read Piotr Teterwak’s blog post on how Deep Learning can be easily utilized for various image analysis tasks. A powerful algorithm that is easy to use? Sounds intriguing. So I decided to give it a closer look. Maybe it will be a new hammer in my toolbox that can later assist me to tackle new sets of interesting problems.

After getting up to speed on Deep Learning (see my recommended reading list at the end of this post), I decided to try Deep Learning on NLP problems. Several years ago, Professor Moshe Koppel gave a talk about how he and his colleagues succeeded in determining an author’s gender by analyzing his or her written texts. They also released a dataset containing 681,288 blog posts. I found it remarkable that one can infer various attributes about an author by analyzing the text, and I’ve been wanting to try it myself. Deep Learning sounded very versatile. So I decided to use it to infer a blogger’s personal attributes, such as age and gender, based on the blog posts.

If you haven’t gotten into deep learning, here’s another opportunity focused on natural language processing. You can follow Michael’s general directions to learn on your own or follow more detailed instructions in his Ipython notebook.


Deep Space Navigation With Deep Learning

Saturday, April 18th, 2015

Well, that’s not exactly the title but the paper does describe a better than 99% accuracy when compared to human recognition of galaxy images by type. I assume galaxy type is going to be a question on deep space navigation exams in the distant future. 😉

Rotation-invariant convolutional neural networks for galaxy morphology prediction by Sander Dieleman, Kyle W. Willett, Joni Dambre.


Measuring the morphological parameters of galaxies is a key requirement for studying their formation and evolution. Surveys such as the Sloan Digital Sky Survey (SDSS) have resulted in the availability of very large collections of images, which have permitted population-wide analyses of galaxy morphology. Morphological analysis has traditionally been carried out mostly via visual inspection by trained experts, which is time-consuming and does not scale to large (≳104) numbers of images.

Although attempts have been made to build automated classification systems, these have not been able to achieve the desired level of accuracy. The Galaxy Zoo project successfully applied a crowdsourcing strategy, inviting online users to classify images by answering a series of questions. Unfortunately, even this approach does not scale well enough to keep up with the increasing availability of galaxy images.

We present a deep neural network model for galaxy morphology classification which exploits translational and rotational symmetry. It was developed in the context of the Galaxy Challenge, an international competition to build the best model for morphology classification based on annotated images from the Galaxy Zoo project.

For images with high agreement among the Galaxy Zoo participants, our model is able to reproduce their consensus with near-perfect accuracy (>99%) for most questions. Confident model predictions are highly accurate, which makes the model suitable for filtering large collections of images and forwarding challenging images to experts for manual annotation. This approach greatly reduces the experts’ workload without affecting accuracy. The application of these algorithms to larger sets of training data will be critical for analysing results from future surveys such as the LSST.

I particularly like the line:

Confident model predictions are highly accurate, which makes the model suitable for filtering large collections of images and forwarding challenging images to experts for manual annotation.

It reminds me of a suggestion I made for doing something quite similar where the uncertainly of crowd classifiers on a particular letter (as in a manuscript) would trigger the forwarding of that portion to an expert for a “definitive” read. You would surprised at the resistance you can encounter to the suggestion that no special skills are needed to read Greek manuscripts, which are in many cases as clear as when they were written in the early Christian era. Some aren’t and some aspects of them require expertise, but that isn’t to say they all require expertise.

Of course, if successful, such a venture could quite possibly result in papers that cite the images of all extant biblical witnesses and all of the variant texts, as opposed to those that cite a fragment entrusted to them for publication. The difference being whether you want to engage in scholarship, the act of interpreting witnesses or whether you wish to tell the proper time and make a modest noise while doing so.

Recommending music on Spotify with deep learning

Friday, April 17th, 2015

Recommending music on Spotify with deep learning by Sander Dieleman.

From the post:

This summer, I’m interning at Spotify in New York City, where I’m working on content-based music recommendation using convolutional neural networks. In this post, I’ll explain my approach and show some preliminary results.


This is going to be a long post, so here’s an overview of the different sections. If you want to skip ahead, just click the section title to go there.

If you are interested in the details of deep learning and recommendation for music, you have arrived at the right place!

Walking through Sander’s post will take some time but it will repay your efforts handsomely.

Not to mention Spotify having the potential to broaden your musical horizons!

I first saw this in a tweet by Mica McPeeters.

Google DeepMind Resources

Monday, March 30th, 2015

Google DeepMind Resources

A collection of all the Google DeepMind publications to date.

Twenty-two (22) papers so far!

A nice way to start your reading week!


Classifying Plankton With Deep Neural Networks

Monday, March 23rd, 2015

Classifying Plankton With Deep Neural Networks by Sander Dieleman.

From the post:

The National Data Science Bowl, a data science competition where the goal was to classify images of plankton, has just ended. I participated with six other members of my research lab, the Reservoir lab of prof. Joni Dambre at Ghent University in Belgium. Our team finished 1st! In this post, we’ll explain our approach.

The ≋ Deep Sea ≋ team consisted of Aäron van den Oord, Ira Korshunova, Jeroen Burms, Jonas Degrave, Lionel Pigou, Pieter Buteneers and myself. We are all master students, PhD students and post-docs at Ghent University. We decided to participate together because we are all very interested in deep learning, and a collaborative effort to solve a practical problem is a great way to learn.

There were seven of us, so over the course of three months, we were able to try a plethora of different things, including a bunch of recently published techniques, and a couple of novelties. This blog post was written jointly by the team and will cover all the different ingredients that went into our solution in some detail.


This blog post is going to be pretty long! Here’s an overview of the different sections. If you want to skip ahead, just click the section title to go there.


The problem

The goal of the competition was to classify grayscale images of plankton into one of 121 classes. They were created using an underwater camera that is towed through an area. The resulting images are then used by scientists to determine which species occur in this area, and how common they are. There are typically a lot of these images, and they need to be annotated before any conclusions can be drawn. Automating this process as much as possible should save a lot of time!

The images obtained using the camera were already processed by a segmentation algorithm to identify and isolate individual organisms, and then cropped accordingly. Interestingly, the size of an organism in the resulting images is proportional to its actual size, and does not depend on the distance to the lens of the camera. This means that size carries useful information for the task of identifying the species. In practice it also means that all the images in the dataset have different sizes.

Participants were expected to build a model that produces a probability distribution across the 121 classes for each image. These predicted distributions were scored using the log loss (which corresponds to the negative log likelihood or equivalently the cross-entropy loss).

This loss function has some interesting properties: for one, it is extremely sensitive to overconfident predictions. If your model predicts a probability of 1 for a certain class, and it happens to be wrong, the loss becomes infinite. It is also differentiable, which means that models trained with gradient-based methods (such as neural networks) can optimize it directly – it is unnecessary to use a surrogate loss function.

Interestingly, optimizing the log loss is not quite the same as optimizing classification accuracy. Although the two are obviously correlated, we paid special attention to this because it was often the case that significant improvements to the log loss would barely affect the classification accuracy of the models.

This rocks!

Code is coming soon to Github!

Certainly of interest to marine scientists but also to anyone in bio-medical imaging.

The problem of too much data and too few experts is a common one.

What I don’t recall seeing are releases of pre-trained classifiers. Is the art developing too quickly for that to be a viable product? Just curious.

I first saw this in a tweet by Angela Zutavern.

FaceNet: A Unified Embedding for Face Recognition and Clustering

Saturday, March 21st, 2015

FaceNet: A Unified Embedding for Face Recognition and Clustering by Florian Schroff, Dmitry Kalenichenko and James Philbin.


Despite significant recent advances in the field of face recognition, implementing face verification and recognition efficiently at scale presents serious challenges to current approaches. In this paper we present a system, called FaceNet, that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity. Once this space has been produced, tasks such as face recognition, verification and clustering can be easily implemented using standard techniques with FaceNet embeddings as feature vectors.

Our method uses a deep convolutional network trained to directly optimize the embedding itself, rather than an intermediate bottleneck layer as in previous deep learning approaches. To train, we use triplets of roughly aligned matching / non-matching face patches generated using a novel online triplet mining method. The benefit of our approach is much greater representational efficiency: we achieve state-of-the-art face recognition performance using only 128-bytes per face.

On the widely used Labeled Faces in the Wild (LFW) dataset, our system achieves a new record accuracy of 99.63%. On YouTube Faces DB it achieves 95.12%. Our system cuts the error rate in comparison to the best published result by 30% on both datasets. (emphasis in the original)

With accuracy at 99.63%, the possibilities are nearly endless. 😉

How long will it be before some start-up is buying ATM feeds from banks? Fast and accurate location information would be of interest to process servers, law enforcement, debt collectors, various government agencies, etc.

Looking a bit further ahead, ATM surrogate services will become a feature of better hotels and escort services.

Convolutional Neural Networks for Visual Recognition

Friday, March 20th, 2015

Convolutional Neural Networks for Visual Recognition by Fei-Fei Li and Andrej Karpathy.

From the description:

Computer Vision has become ubiquitous in our society, with applications in search, image understanding, apps, mapping, medicine, drones, and self-driving cars. Core to many of these applications are visual recognition tasks such as image classification, localization and detection. Recent developments in neural network (aka “deep learning”) approaches have greatly advanced the performance of these state-of-the-art visual recognition systems. This course is a deep dive into details of the deep learning architectures with a focus on learning end-to-end models for these tasks, particularly image classification. During the 10-week course, students will learn to implement, train and debug their own neural networks and gain a detailed understanding of cutting-edge research in computer vision. The final assignment will involve training a multi-million parameter convolutional neural network and applying it on the largest image classification dataset (ImageNet). We will focus on teaching how to set up the problem of image recognition, the learning algorithms (e.g. backpropagation), practical engineering tricks for training and fine-tuning the networks and guide the students through hands-on assignments and a final course project. Much of the background and materials of this course will be drawn from the ImageNet Challenge.

Be sure to check out the course notes!

A very nice companion for your DIGITS experiments over the weekend.

I first saw this in a tweet by Lasse.

DIGITS: Deep Learning GPU Training System

Friday, March 20th, 2015

DIGITS: Deep Learning GPU Training System by Allison Gray.

From the post:

The hottest area in machine learning today is Deep Learning, which uses Deep Neural Networks (DNNs) to teach computers to detect recognizable concepts in data. Researchers and industry practitioners are using DNNs in image and video classification, computer vision, speech recognition, natural language processing, and audio recognition, among other applications.

The success of DNNs has been greatly accelerated by using GPUs, which have become the platform of choice for training these large, complex DNNs, reducing training time from months to only a few days. The major deep learning software frameworks have incorporated GPU acceleration, including Caffe, Torch7, Theano, and CUDA-Convnet2. Because of the increasing importance of DNNs in both industry and academia and the key role of GPUs, last year NVIDIA introduced cuDNN, a library of primitives for deep neural networks.

Today at the GPU Technology Conference, NVIDIA CEO and co-founder Jen-Hsun Huang introduced DIGITS, the first interactive Deep Learning GPU Training System. DIGITS is a new system for developing, training and visualizing deep neural networks. It puts the power of deep learning into an intuitive browser-based interface, so that data scientists and researchers can quickly design the best DNN for their data using real-time network behavior visualization. DIGITS is open-source software, available on GitHub, so developers can extend or customize it or contribute to the project.

Apologies for the delay in seeing Allison’s post but at least I saw it before the weekend!

In addition to a great write-up, Allison walks through how she has used DIGITS. In terms of “onboarding” to software, it doesn’t get any better than this.

What are you going to apply DIGITS to?

I first saw this in a tweet by Christian Rosnes.

Use The Code Luke!

Wednesday, March 18th, 2015

Hacker’s guide to Neural Networks by Andrej Karpathy.

From the post:

Hi there, I'm a CS PhD student at Stanford. I've worked on Deep Learning for a few years as part of my research and among several of my related pet projects is ConvNetJS – a Javascript library for training Neural Networks. Javascript allows one to nicely visualize what's going on and to play around with the various hyperparameter settings, but I still regularly hear from people who ask for a more thorough treatment of the topic. This article (which I plan to slowly expand out to lengths of a few book chapters) is my humble attempt. It's on web instead of PDF because all books should be, and eventually it will hopefully include animations/demos etc.

My personal experience with Neural Networks is that everything became much clearer when I started ignoring full-page, dense derivations of backpropagation equations and just started writing code. Thus, this tutorial will contain very little math (I don't believe it is necessary and it can sometimes even obfuscate simple concepts). Since my background is in Computer Science and Physics, I will instead develop the topic from what I refer to as hackers's perspective. My exposition will center around code and physical intuitions instead of mathematical derivations. Basically, I will strive to present the algorithms in a way that I wish I had come across when I was starting out.

"…everything became much clearer when I started writing code."

You might be eager to jump right in and learn about Neural Networks, backpropagation, how they can be applied to datasets in practice, etc. But before we get there, I'd like us to first forget about all that. Let's take a step back and understand what is really going on at the core. Lets first talk about real-valued circuits.

I won’t say you don’t need to more formal methods as well but everyone learns in different ways. If doing the code first is better for you, here’s a treatment of deep learning from that perspective.

The last comments were approximately four (4) months ago. I am hopeful this work will continue.

Deep Learning for Natural Language Processing (March – June, 2015)

Saturday, March 7th, 2015

CS224d: Deep Learning for Natural Language Processing by Richard Socher.


Natural language processing (NLP) is one of the most important technologies of the information age. Understanding complex language utterances is also a crucial part of artificial intelligence. Applications of NLP are everywhere because people communicate most everything in language: web search, advertisement, emails, customer service, language translation, radiology reports, etc. There are a large variety of underlying tasks and machine learning models powering NLP applications. Recently, deep learning approaches have obtained very high performance across many different NLP tasks. These models can often be trained with a single end-to-end model and do not require traditional, task-specific feature engineering. In this spring quarter course students will learn to implement, train, debug, visualize and invent their own neural network models. The course provides a deep excursion into cutting-edge research in deep learning applied to NLP. The final project will involve training a complex recurrent neural network and applying it to a large scale NLP problem. On the model side we will cover word vector representations, window-based neural networks, recurrent neural networks, long-short-term-memory models, recursive neural networks, convolutional neural networks as well as some very novel models involving a memory component. Through lectures and programming assignments students will learn the necessary engineering tricks for making neural networks work on practical problems.

Assignments, course notes and slides will all be posted online. You are free to “follow along” but no credit.

Are you ready for the cutting-edge?

I first saw this in a tweet by Randall Olson.

Code for DeepMind & Commentary

Monday, March 2nd, 2015

If you are following the news of Google’s Atari buster, ;-), the following items will be of interest:

Code for Human-Level Control through Deep Reinforcement Learning, which offers the source code to accompany the Nature article.

DeepMind’s Nature Paper and Earlier Related Work by Jürgen Schmidhuber. Jürgen takes issue with some of the claims made in the abstract of the Nature paper. Quite usefully he cites references and provides links to numerous other materials on deep learning.

How soon before this comes true?

In an online multiplayer game, no one knows you are an AI.

Beginning deep learning with 500 lines of Julia

Monday, March 2nd, 2015

Beginning deep learning with 500 lines of Julia by Deniz Yuret.

From the post:

There are a number of deep learning packages out there. However most sacrifice readability for efficiency. This has two disadvantages: (1) It is difficult for a beginner student to understand what the code is doing, which is a shame because sometimes the code can be a lot simpler than the underlying math. (2) Every other day new ideas come out for optimization, regularization, etc. If the package used already has the trick implemented, great. But if not, it is difficult for a researcher to test the new idea using impenetrable code with a steep learning curve. So I started writing KUnet.jl which currently implements backprop with basic units like relu, standard loss functions like softmax, dropout for generalization, L1-L2 regularization, and optimization using SGD, momentum, ADAGRAD, Nesterov’s accelerated gradient etc. in less than 500 lines of Julia code. Its speed is competitive with the fastest GPU packages (here is a benchmark). For installation and usage information, please refer to the GitHub repo. The remainder of this post will present (a slightly cleaned up version of) the code as a beginner’s neural network tutorial (modeled after Honnibal’s excellent parsing example).

This tutorial “begins” with you coding deep learning. If you need a bit more explanation on deep learning, you could do far worse than consulting Deep Learning: Methods and Applications or Deep Learning in Neural Networks: An Overview.

If you are already at the programming stage of deep learning, enjoy!

For Julia, Julia (homepage), Julia (online manual), (Julia blog aggregator), should be enough to get you started.

I first saw this in a tweet by Andre Pemmelaar.

Deep Learning Track at GTC

Friday, February 20th, 2015

Deep Learning Track at GTC

March 17-20, 2015 | San Jose, California

From the webpage:

The Deep Learning Track at GTC features over 40 sessions from industry experts on topics ranging from visual object recognition to the next generation of speech.

Just the deep learning sessions.

Keynote Speakers in Deep Learning track:

Jeff Dean – Google, Senior Fellow

Jen-Hsun – Huang NVIDIA, CEO & Co-Founder

Andrew Ng – Baidu, Chief Scientist

Featured Speakers:

John Canny – UC Berkeley, Professor

Dan Ciresan – IDSIA, Senior Researcher

Rob Fergus – Facebook, Research Scientist

Yangqing Jia – Google, Research Scientist

Ian Lane – Carnegie Mellon University, Assistant Research Professor

Ren Wu – Baidu, Distinguished Scientist

Have you registered yet? If not, why not? 😉

Expecting lots of blog posts covering presentations at the conference.

MS Deep Learning Beats Humans (and MS is modest about it)

Tuesday, February 10th, 2015

Microsoft researchers say their newest deep learning system beats humans — and Google

Two stories for the price of one! Microsoft’s deep learning project beats human recognition on a data set and Microsoft is modest about it. 😉

From the post:

The Microsoft creation got a 4.94 percent error rate for the correct classification of images in the 2012 version of the widely recognized ImageNet data set , compared with a 5.1 percent error rate among humans, according to the paper. The challenge involved identifying objects in the images and then correctly selecting the most accurate categories for the images, out of 1,000 options. Categories included “hatchet,” “geyser,” and “microwave.”

“While our algorithm produces a superior result on this particular dataset, this does not indicate that machine vision outperforms human vision on object recognition in general,” they wrote. “On recognizing elementary object categories (i.e., common objects or concepts in daily lives) such as the Pascal VOC task, machines still have obvious errors in cases that are trivial for humans. Nevertheless, we believe that our results show the tremendous potential of machine algorithms to match human-level performance on visual recognition.”

You can grab the paper here.

Hoping that Microsoft sets a trend in reporting breakthroughs in big data and machine learning. Stating the achievement but also its limitations may lead to more accurate reporting of technical news. Not holding my breath but I am hopeful.

I first saw this in a tweet by GPUComputing.

Facebook open sources tools for bigger, faster deep learning models

Saturday, January 17th, 2015

Facebook open sources tools for bigger, faster deep learning models by Derrick Harris.

From the post:

Facebook on Friday open sourced a handful of software libraries that it claims will help users build bigger, faster deep learning models than existing tools allow.

The libraries, which Facebook is calling modules, are alternatives for the default ones in a popular machine learning development environment called Torch, and are optimized to run on Nvidia graphics processing units. Among the modules are those designed to rapidly speed up training for large computer vision systems (nearly 24 times, in some cases), to train systems on potentially millions of different classes (e.g., predicting whether a word will appear across a large number of documents, or whether a picture was taken in any city anywhere), and an optimized method for building language models and word embeddings (e.g., knowing how different words are related to each other).

“‘[T]here is no way you can use anything existing” to achieve some of these results, said Soumith Chintala, an engineer with Facebook Artificial Intelligence Research.

How very awesome! Keeping abreast of the latest releases and papers on deep learning is turning out to be a real chore. Enjoyable but a time sink none the less.

Derrick’s post and the release from Facebook have more details.

Apologies for the “lite” posting today but I have been proofing related specifications where one defines a term and the other uses the term, but doesn’t cite the other specification’s definition or give its own. Do those mean the same thing? Probably the same thing but users outside the process may or may not realize that. Particularly in translation.

I first saw this in a tweet by Kirk Borne.

Deep Learning: Methods and Applications

Tuesday, January 13th, 2015

Deep Learning: Methods and Applications by Li Deng and Dong Yu. (Li Deng and Dong Yu (2014), “Deep Learning: Methods and Applications”, Foundations and Trends® in Signal Processing: Vol. 7: No. 3–4, pp 197-387.


This monograph provides an overview of general deep learning methodology and its applications to a variety of signal and information processing tasks. The application areas are chosen with the following three criteria in mind: (1) expertise or knowledge of the authors; (2) the application areas that have already been transformed by the successful use of deep learning technology, such as speech recognition and computer vision; and (3) the application areas that have the potential to be impacted significantly by deep learning and that have been experiencing research growth, including natural language and text processing, information retrieval, and multimodal information processing empowered by multi-task deep learning.


Deep learning, Machine learning, Artificial intelligence, Neural networks, Deep neural networks, Deep stacking networks, Autoencoders, Supervised learning, Unsupervised learning, Hybrid deep networks, Object recognition, Computer vision, Natural language processing, Language models, Multi-task learning, Multi-modal processing

If you are looking for another rich review of the area of deep learning, you have found the right place. Resources, conferences, primary materials, etc. abound.

Don’t be thrown off by the pagination. This is issues 3 and 4 of the periodical Foundations and Trends® in Signal Processing. You are looking at the complete text.

Be sure to read Selected Applications in Information Retrieval (Section 9, pages 308-319). Where 9.2 starts with:

Here we discuss the “semantic hashing” approach for the application of deep autoencoders to document indexing and retrieval as published in [159, 314]. It is shown that the hidden variables in the final layer of a DBN not only are easy to infer after using an approximation based on feed-forward propagation, but they also give a better representation of each document, based on the word-count features, than the widely used latent semantic analysis and the traditional TF-IDF approach for information retrieval. Using the compact code produced by deep autoencoders, documents are mapped to memory addresses in such a way that semantically similar text documents are located at nearby addresses to facilitate rapid document retrieval. The mapping from a word-count vector to its compact code is highly efficient, requiring only a matrix multiplication and a subsequent sigmoid function evaluation for each hidden layer in the encoder part of the network.

That is only one of the applications detailed in this work. I do wonder if this will be the approach that breaks the “document” (as in this work for example) model of information retrieval? If I am searching for “deep learning” and “information retrieval,” a search result that returns these pages would be a great improvement over the entire document. (At the user’s option.)

Before the literature on deep learning gets much more out of hand, now would be a good time to start building not only a corpus of the literature but a sub-document level topic map to ideas and motifs as they develop. That would be particularly useful as patents start to appear for applications of deep learning. (Not a volunteer or charitable venture.)

I first saw this in a tweet by StatFact.

Use Google’s Word2Vec for movie reviews

Saturday, January 10th, 2015

Use Google’s Word2Vec for movie reviews Kaggle Tutorial.

From the webpage:

In this tutorial competition, we dig a little “deeper” into sentiment analysis. Google’s Word2Vec is a deep-learning inspired method that focuses on the meaning of words. Word2Vec attempts to understand meaning and semantic relationships among words. It works in a way that is similar to deep approaches, such as recurrent neural nets or deep neural nets, but is computationally more efficient. This tutorial focuses on Word2Vec for sentiment analysis.

Sentiment analysis is a challenging subject in machine learning. People express their emotions in language that is often obscured by sarcasm, ambiguity, and plays on words, all of which could be very misleading for both humans and computers. There’s another Kaggle competition for movie review sentiment analysis. In this tutorial we explore how Word2Vec can be applied to a similar problem.

Mark Needham mentions this Kaggle tutorial in Thoughts on Software Development Python NLTK/Neo4j:….

The description also mentions:

Since deep learning is a rapidly evolving field, large amounts of the work has not yet been published, or exists only as academic papers. Part 3 of the tutorial is more exploratory than prescriptive — we experiment with several ways of using Word2Vec rather than giving you a recipe for using the output.

To achieve these goals, we rely on an IMDB sentiment analysis data set, which has 100,000 multi-paragraph movie reviews, both positive and negative.

Movie, book, TV, etc., reviews are fairly common.

Where would you look for a sentiment analysis data set on contemporary U.S. criminal proceedings?

Deep Learning in Neural Networks: An Overview

Saturday, January 10th, 2015

Deep Learning in Neural Networks: An Overview by Jüergen Schmidhuber.


In recent years, deep artificial neural networks (including recurrent ones) have won numerous contests in pattern recognition and machine learning. This historical survey compactly summarises relevant work, much of it from the previous millennium. Shallow and deep learners are distinguished by the depth of their credit assignment paths, which are chains of possibly learnable, causal links between actions and effects. I review deep supervised learning (also recapitulating the history of backpropagation), unsupervised learning, reinforcement learning & evolutionary computation, and indirect search for short programs encoding deep and large networks.

A godsend for any graduate student working in deep learning! Not only does Jüergen cover recent literature but he also traces the ideas back into history. Fortunately for all of us interested in the history of ideas in computer science, both the LATEX source, DeepLearning8Oct2014.tex and the BIBTEX file deep.bib are available.

Be forewarned that deep.bib has 2944 entries.

This is what was termed “European” scholarship, scholarship that traces ideas across disciplines and time. As opposed to more common American scholarship in the sciences (both social and otherwise), which has a discipline focus and shorter time point of view. There are exceptions both ways but I point out this difference to urge you to take a broader and longer range view of ideas.

Simple Pictures That State-of-the-Art AI Still Can’t Recognize

Thursday, January 8th, 2015

Simple Pictures That State-of-the-Art AI Still Can’t Recognize by Kyle VanHemert.

I encountered this non-technical summary of Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images, which I covered as: Deep Neural Networks are Easily Fooled:… earlier today.

While I am sure you have read the fuller explanation, I wanted to replicate the top 40 images for your consideration:


Select the image to see a larger, readable version.

Enjoy the images and pass the Wired article along to friends.

Google Maps For The Genome

Tuesday, January 6th, 2015

This man is trying to build Google Maps for the genome by Daniela Hernandez.

From the post:

The Human Genome Project was supposed to unlock all of life’s secrets. Once we had a genetic roadmap, we’d be able to pinpoint why we got ill and figure out how to fix our maladies.

That didn’t pan out. Ten years and more than $4 billion dollars later, we got the equivalent of a medieval hand-drawn map when what we needed was Google Maps.

“Even though we had the text of the genome, people didn’t know how to interpret it, and that’s really puzzled scientists for the last decade,” said Brendan Frey, a computer scientist and medical researcher at the University of Toronto. “They have no idea what it means.”

For the past decade, Frey has been on a quest to build scientists a sort of genetic step-by-step navigation system for the genome, powered by some of the same artificial-intelligence systems that are now being used by big tech companies like Google, Facebook, Microsoft, IBM and Baidu for auto-tagging images, processing language, and showing consumers more relevant online ads.

Today Frey and his team are unveiling a new artificial intelligence system in the top-tier academic journal Science that’s capable of predicting how mutations in the DNA affect something called gene splicing in humans. That’s important because many genetic diseases–including cancers and spinal muscular atrophy, a leading cause of infant mortality–are the result of gene splicing gone wrong.

“It’s a turning point in the field,” said Terry Sejnowski, a computational neurobiologist at the Salk Institute in San Diego and a long-time machine learning researcher. “It’s bringing to bear a completely new set of techniques, and that’s when you really make advances.”

Those leaps could include better personalized medicine. Imagine you have a rare disease doctors suspect might be genetic but that they’ve never seen before. They could sequence your genome, feed the algorithm your data, and, in theory, it would give doctors insights into what’s gone awry with your genes–maybe even how to fix things.

For now, the system can only detect one minor genetic pathway for diseases, but the platform can be generalized to other areas, says Frey, and his team is already working on that.

I really like the line:

Ten years and more than $4 billion dollars later, we got the equivalent of a medieval hand-drawn map when what we needed was Google Maps.

Daniela gives a high level view of deep learning and its impact on genomic research. There is still much work to be done but it sounds very promising.

I tried to find a non-paywall copy of Frey’s most recent publication in Science but to no avail. After all, the details of such a break trough couldn’t possibly interest anyone other than subscribers to Science.

In lieu of the details, I did find an image on the Frey Lab. Probabilistic and Statistical Inference Group, University of Toronto page:


I am very sympathetic to publishers making money. At one time I worked for a publisher and they have to pay for staff and that involves making money. However, hoarding information to which publishers contribute so little, isn’t a good model. Leaving public access to one side, specialty publishers have a fragile economic position based on their subscriber base.

An alternative model to managing individual and library subscriptions would be to site license their publications to national governments over the WWW. Their publications would become expected resources in every government library and used by everyone who had an interest in the subject. A stable source of income (governments), becoming part of the expected academic infrastructure, much wider access to a broader audience, with additional revenue from anyone who wanted a print copy.

Sorry, a diversion from the main point, which is an important success story about deep learning.

I first saw this in a tweet by Nikhil Buduma.

Deep Learning in a Nutshell

Tuesday, January 6th, 2015

Deep Learning in a Nutshell by Nikhil Buduma.

From the post:

Deep learning. Neural networks. Backpropagation. Over the past year or two, I’ve heard these buzz words being tossed around a lot, and it’s something that has definitely seized my curiosity recently. Deep learning is an area of active research these days, and if you’ve kept up with the field of computer science, I’m sure you’ve come across at least some of these terms at least once.

Deep learning can be an indimidating concept, but it’s becoming increasingly important these days. Google’s already making huge strides in the space with the Google Brain project and its recent acquisition of the London-based deep learning startup DeepMind. Moreover, deep learning methods are beating out traditional machine learning approaches on virtually every single metric.

So what exactly is deep learning? How does it work? And most importantly, why should you even care?

One of the more accessible introductions to deep learning that I have seen recently.

There are hints of more posts to come on deep learning topics.


Deep learning Reading List

Tuesday, January 6th, 2015

Deep learning Reading List by J Mohamed Zahoor.

Fifty-two unannotated links but enough to keep you busy for a while. 😉

Update – 7 January 2015

Working Now: The dataset link: Berkeley Segmentation Dataset 500 was broken. Working now. Thanks Berkeley!

I first saw this in a tweet by Alexander Beck.

Minerva: a fast and flexible system for deep learning

Saturday, January 3rd, 2015

Minerva: a fast and flexible system for deep learning

From the post:

Minerva is a fast and flexible tool for deep learning. It provides NDarray programming interface, just like Numpy. Python bindings and C++ bindings are both available. The resulting code can be run on CPU or GPU. Multi-GPU support is very easy. Please refer to the examples to see how multi-GPU setting is used.Minerva is a fast and flexible tool for deep learning. It provides NDarray programming interface, just like Numpy. Python bindings and C++ bindings are both available. The resulting code can be run on CPU or GPU. Multi-GPU support is very easy. Please refer to the examples to see how multi-GPU setting is used.


  • Matrix programming interface
  • Easy interaction with NumPy
  • Multi-GPU, multi-CPU support
  • Good performance: ImageNet AlexNet training achieves 213 and 403 images/s with one and two Titan GPU, respectivly. Four GPU cards number will be coming soon.

I first saw this in a blog post by Danny Bickson, Minerva: open source deep learning on GPU software from MS.

Deep learning is gaining traction fast. Fast enough that when government contractors convince the FBI wire tapping is no long a matter of plugging into the local junction box, they may start working on deep learning.

Before deep learning gets to that point, defensive measures against deep learning need to be developed. Given the variety of deep learning approaches and algorithms that is going to be a real challenge.

Perhaps immutable data structures where copying enables real time performance in presenting results that are expected? While maintaining a copy of the unexpected results?

I think there is a presumption that on querying, information systems repeat the information they have stored. That’s a fairly naive view of data storage. We know it is a matter of permissions to “see” data. Why shouldn’t the answer you see also depend upon permissions?

Defenses against deep learning and reactive data storage may become very relevant in the not too distant future. Give it some thought.

Show and Tell (C-suite version)

Thursday, January 1st, 2015

How Google “Translates” Pictures into Words Using Vector Space Mathematics

From the post:

Translating one language into another has always been a difficult task. But in recent years, Google has transformed this process by developing machine translation algorithms that change the nature of cross cultural communications through Google Translate.

Now that company is using the same machine learning technique to translate pictures into words. The result is a system that automatically generates picture captions that accurately describe the content of images. That’s something that will be useful for search engines, for automated publishing and for helping the visually impaired navigate the web and, indeed, the wider world.

One of the best c-suite level explanations I have seen of Show and Tell: A Neural Image Caption Generator.

May be useful to you in obtaining support/funding for similar efforts in your domain.

Take particular note of the decision to not worry overmuch about the meaning of words. I would never make that simplifying assumption. Just runs counter to the grain for the meaning of the words to not matter. However, I am very glad that Oriol Vinyals and colleagues made that assumption!

That assumption enables the processing of images at a large scale.

I started to write that I would not use such an assumption for more precise translation tasks, say the translation of cuneiform tablets. But as a rough finding aid for untranslated cuneiform or hieroglyphic texts, this could be the very thing. Doesn’t have to be 100% precise or accurate, just enough that the vast archives of ancient materials becomes easier to use.

Is there an analogy for topic maps here? That topic maps need not be final production quality materials when released but can be refined over time by authors, editors and others?

Like Wikipedia but not quite so eclectic and more complete. Imagine a Solr reference manual that inlines or at least links to the most recent presentations and discussions on a particular topic. And incorporates information from such sources into the text.

Is Google offering us “good enough” results with data, expectations that others will refine the data further? Perhaps a value-add economic model where the producer of the “good enough” content has an interest in the further refinement of that data by others?

DL4J: Deep Learning for Java

Wednesday, December 24th, 2014

DL4J: Deep Learning for Java

From the webpage:

Deeplearning4j is the first commercial-grade, open-source deep-learning library written in Java. It is meant to be used in business environments, rather than as a research tool for extensive data exploration. Deeplearning4j is most helpful in solving distinct problems, like identifying faces, voices, spam or e-commerce fraud.

Deeplearning4j integrates with GPUs and includes a versatile n-dimensional array class. DL4J aims to be cutting-edge plug and play, more convention than configuration. By following its conventions, you get an infinitely scalable deep-learning architecture suitable for Hadoop and other big-data structures. This Java deep-learning library has a domain-specific language for neural networks that serves to turn their multiple knobs.

Deeplearning4j includes a distributed deep-learning framework and a normal deep-learning framework (i.e. it runs on a single thread as well). Training takes place in the cluster, which means it can process massive amounts of data. Nets are trained in parallel via iterative reduce, and they are equally compatible with Java, Scala and Clojure, since they’re written for the JVM.

This open-source, distributed deep-learning framework is made for data input and neural net training at scale, and its output should be highly accurate predictive models.

By following the links at the bottom of each page, you will learn to set up, and train with sample data, several types of deep-learning networks. These include single- and multithread networks, Restricted Boltzmann machines, deep-belief networks, Deep Autoencoders, Recursive Neural Tensor Networks, Convolutional Nets and Stacked Denoising Autoencoders.

For a quick introduction to neural nets, please see our overview.

There are a lot of knobs to turn when you’re training a deep-learning network. We’ve done our best to explain them, so that Deeplearning4j can serve as a DIY tool for Java, Scala and Clojure programmers. If you have questions, please join our Google Group; for premium support, contact us at Skymind. ND4J is the Java scientific computing engine powering our matrix manipulations.

And you thought I write jargon laden prose. 😉

This both looks both exciting (as a technology) and challenging (as in needing accessible documentation).

Are you going to be “…turn[ing] their multiple knobs” over the holidays?

GitHub Repo


#deeplearning4j @IRC

Google Group

I first saw this in a tweet by Gregory Piatetsky.

Deep Learning: Doubly Easy and Doubly Powerful with GraphLab Create

Tuesday, December 23rd, 2014

Deep Learning: Doubly Easy and Doubly Powerful with GraphLab Create by Piotr Teterwak.

From the post:

One of machine learning’s core goals is classification of input data. This is the task of taking novel data and assigning it to one of a pre-determined number of labels, based on what the classifier learns from a training set. For instance, a classifier could take an image and predict whether it is a cat or a dog.


The pieces of information fed to a classifier for each data point are called features, and the category they belong to is a ‘target’ or ‘label’. Typically, the classifier is given data points with both features and labels, so that it can learn the correspondence between the two. Later, the classifier is queried with a data point and the classifier tries to predict what category it belongs to. A large group of these query data-points constitute a prediction-set, and the classifier is usually evaluated on its accuracy, or how many prediction queries it gets correct.

Despite a slow start, the post moves onto deep learning and GraphLab Create in detail, with code. You will need the GPU version of GraphLab Create to get the full benefit of this post.

Beyond distinguishing dogs and cats, a concern for other dogs and cats I’m sure, what images would you classify with deep learning?

I first saw this in a tweet by Aapo Kyrola