Archive for the ‘Neural Networks’ Category

Was that Stevie Nicks or Tacotron 2.0? ML Singing in 2018

Tuesday, December 19th, 2017

[S]amim @samim tweeted:

In 2018, machine learning based singing vocal synthesisers will go mainstream. It will transform the music industry beyond recognition.

With these two links:

Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions by Jonathan Shen, et al.


This paper describes Tacotron 2, a neural network architecture for speech synthesis directly from text. The system is composed of a recurrent sequence-to-sequence feature prediction network that maps character embeddings to mel-scale spectrograms, followed by a modified WaveNet model acting as a vocoder to synthesize time-domain waveforms from those spectrograms. Our model achieves a mean opinion score (MOS) of 4.53 comparable to a MOS of 4.58 for professionally recorded speech. To validate our design choices, we present ablation studies of key components of our system and evaluate the impact of using mel spectrograms as the input to WaveNet instead of linguistic, duration, and F0 features. We further demonstrate that using a compact acoustic intermediate representation enables significant simplification of the WaveNet architecture.
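If "mel-scale" in the abstract is unfamiliar, here is a minimal sketch of the idea: frequencies are spaced on a perceptual scale rather than a linear one. The conversion formula is one common convention (an assumption here, not something the abstract specifies), and the band count and frequency range are illustrative values only.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert Hz to mels using one common formula (assumed, not from the paper's abstract)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(low_hz: float, high_hz: float, n_bands: int) -> list:
    """Center frequencies (Hz) of n_bands spaced evenly on the mel scale."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (n_bands + 1)
    return [mel_to_hz(low_mel + step * (i + 1)) for i in range(n_bands)]


# Illustrative values only: 8 bands between 100 Hz and 8000 Hz.
centers = mel_band_centers(100.0, 8000.0, 8)
```

The centers bunch together at low frequencies and spread out at high ones, which is the point of the scale: more resolution where human hearing is more sensitive.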


Audio samples from “Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions”

Try the samples before dismissing the prediction of machine learning singing in 2018.

I have a different question:

What is in your test set for ML singing?

Among my top picks: Stevie Nicks, Janis Joplin, and of course, Grace Slick.

CMU Neural Networks for NLP 2017 (16 Lectures)

Saturday, October 28th, 2017

Course Description:

Neural networks provide powerful new tools for modeling language, and have been used both to improve the state-of-the-art in a number of tasks and to tackle new problems that were not easy in the past. This class will start with a brief overview of neural networks, then spend the majority of the class demonstrating how to apply neural networks to natural language problems. Each section will introduce a particular problem or phenomenon in natural language, describe why it is difficult to model, and demonstrate several models that were designed to tackle this problem. In the process of doing so, the class will cover different techniques that are useful in creating neural network models, including handling variably sized and structured sentences, efficient handling of large data, semi-supervised and unsupervised learning, structured prediction, and multilingual modeling.

Suggested pre-requisite: 11-711 “Algorithms for NLP”.

I wasn’t able to find videos for the “Algorithms for NLP” course, but you can explore the following as supplemental materials:

Each of these courses can be found in two places: YouTube and Academic Torrents. The advantage of Academic Torrents is that you can also download the supplementary course materials, like transcripts, PDFs, or PPTs.

  1. Natural Language Processing: Dan Jurafsky and Christopher Manning, Stanford University. YouTube | Academic Torrents
  2. Natural Language Processing: Michael Collins, Columbia University. YouTube | Academic Torrents
  3. Introduction to Natural Language Processing: Dragomir Radev, University of Michigan. YouTube | Academic Torrents

… (From 9 popular online courses that are gone forever… and how you can still find them)

Enjoyable but not as suited to binge watching as Stranger Things. 😉


Neuroscience-Inspired Artificial Intelligence

Saturday, August 5th, 2017

Neuroscience-Inspired Artificial Intelligence by Demis Hassabis, Dharshan Kumaran, Christopher Summerfield, and Matthew Botvinick.


The fields of neuroscience and artificial intelligence (AI) have a long and intertwined history. In more recent times, however, communication and collaboration between the two fields has become less commonplace. In this article, we argue that better understanding biological brains could play a vital role in building intelligent machines. We survey historical interactions between the AI and neuroscience fields and emphasize current advances in AI that have been inspired by the study of neural computation in humans and other animals. We conclude by highlighting shared themes that may be key for advancing future research in both fields.

Extremely rich article with nearly four (4) pages of citations.

Reading this paper closely and chasing the citations is a non-trivial task, but you will be prepared to understand and/or participate in the next big neuroscience/AI breakthrough.


Deep Learning for NLP Best Practices

Wednesday, July 26th, 2017

Deep Learning for NLP Best Practices by Sebastian Ruder.

From the introduction:

This post is a collection of best practices for using neural networks in Natural Language Processing. It will be updated periodically as new insights become available and in order to keep track of our evolving understanding of Deep Learning for NLP.

There has been a running joke in the NLP community that an LSTM with attention will yield state-of-the-art performance on any task. While this has been true over the course of the last two years, the NLP community is slowly moving away from this now standard baseline and towards more interesting models.

However, we as a community do not want to spend the next two years independently (re-)discovering the next LSTM with attention. We do not want to reinvent tricks or methods that have already been shown to work. While many existing Deep Learning libraries already encode best practices for working with neural networks in general, such as initialization schemes, many other details, particularly task or domain-specific considerations, are left to the practitioner.

This post is not meant to keep track of the state-of-the-art, but rather to collect best practices that are relevant for a wide range of tasks. In other words, rather than describing one particular architecture, this post aims to collect the features that underly successful architectures. While many of these features will be most useful for pushing the state-of-the-art, I hope that wider knowledge of them will lead to stronger evaluations, more meaningful comparison to baselines, and inspiration by shaping our intuition of what works.

I assume you are familiar with neural networks as applied to NLP (if not, I recommend Yoav Goldberg’s excellent primer [43]) and are interested in NLP in general or in a particular task. The main goal of this article is to get you up to speed with the relevant best practices so you can make meaningful contributions as soon as possible.

I will first give an overview of best practices that are relevant for most tasks. I will then outline practices that are relevant for the most common tasks, in particular classification, sequence labelling, natural language generation, and neural machine translation.

Certainly a resource to bookmark while you read A Primer on Neural Network Models for Natural Language Processing by Yoav Goldberg (76 pages), and to consult frequently as you move beyond the primer stage.

Enjoy and pass it on!

Deanonymizing the Past

Thursday, July 6th, 2017

What Ever Happened to All the Old Racist Whites from those Civil Rights Photos? by Johnny Silvercloud raises an interesting question but never considers it from a modern technology perspective.

Silvercloud includes this lunch counter image:

I count almost twenty (20) full or partial faces in this one image. Thousands if not hundreds of thousands of other images from the civil rights era capture similar scenes.

Then it occurred to me: unlike prior generations, who had volumes of photographs populated by anonymous bystanders to, or perpetrators of, infamous acts, we have the present capacity to deanonymize the past.

As a starting point, may I suggest Deep Face Recognition by Omkar M. Parkhi, Andrea Vedaldi, Andrew Zisserman, one of the more popular papers in this area, with 429 citations as of today (06 July 2017).


The goal of this paper is face recognition – from either a single photograph or from a set of faces tracked in a video. Recent progress in this area has been due to two factors: (i) end to end learning for the task using a convolutional neural network (CNN), and (ii) the availability of very large scale training datasets.

We make two contributions: first, we show how a very large scale dataset (2.6M images, over 2.6K people) can be assembled by a combination of automation and human in the loop, and discuss the trade off between data purity and time; second, we traverse through the complexities of deep network training and face recognition to present methods and procedures to achieve comparable state of the art results on the standard LFW and YTF face benchmarks.

That article was written in 2015 so consulting a 2017 summary update posted to Quora is advised for current details.

Banks, governments and others are using facial recognition for their own purposes. Let’s also use it to hold people responsible for their moral choices.

Moral choices at lunch counters, police riots, soldiers and camp guards from any number of countries and time periods, etc.


SketchRNN model released in Magenta [Hieroglyphs/Cuneiform Anyone?]

Friday, May 19th, 2017

SketchRNN model released in Magenta by Douglas Eck.

From the post:

Sketch-RNN, a generative model for vector drawings, is now available in Magenta. For an overview of the model, see the Google Research blog from April 2017, Teaching Machines to Draw (David Ha). For the technical machine learning details, see the arXiv paper A Neural Representation of Sketch Drawings (David Ha and Douglas Eck).

To try out Sketch-RNN, visit the Magenta GitHub for instructions. We’ve provided trained models, code for you to train your own models in TensorFlow and a Jupyter notebook tutorial (check it out!)

The code release is timed to coincide with a Google Creative Lab data release. Visit Quick, Draw! The Data for more information. For versions of the data pre-processed to work with Sketch-RNN, please refer to the GitHub repo for more information.

We’ll leave you with a look at yoga poses generated by moving through the learned representation (latent space) of the model trained on yoga drawings. Notice how it gets confused at around 10 seconds when it moves from poses standing towards poses done on a yoga mat. In our arXiv paper A Neural Representation of Sketch Drawings we discuss reasons for this behavior.

The paper, A Neural Representation of Sketch Drawings mentions:

ShadowDraw [17] is an interactive system that predicts what a finished drawing looks like based on a set of incomplete brush strokes from the user while the sketch is being drawn. ShadowDraw used a dataset of 30K raster images combined with extracted vectorized features. In this work, we use a much larger dataset of vector sketches that is made publicly available.

ShadowDraw is described at: ShadowDraw: Real-Time User Guidance for Freehand Drawing as:

We present ShadowDraw, a system for guiding the freeform drawing of objects. As the user draws, ShadowDraw dynamically updates a shadow image underlying the user’s strokes. The shadows are suggestive of object contours that guide the user as they continue drawing. This paradigm is similar to tracing, with two major differences. First, we do not provide a single image from which the user can trace; rather ShadowDraw automatically blends relevant images from a large database to construct the shadows. Second, the system dynamically adapts to the user’s drawings in real-time and produces suggestions accordingly. ShadowDraw works by efficiently matching local edge patches between the query, constructed from the current drawing, and a database of images. A hashing technique enforces both local and global similarity and provides sufficient speed for interactive feedback. Shadows are created by aggregating the top retrieved edge maps, spatially weighted by their match scores. We test our approach with human subjects and show comparisons between the drawings that were produced with and without the system. The results show that our system produces more realistically proportioned line drawings.

My first thought was to use such techniques to assist in copying hieroglyphs or cuneiform, or perhaps in practicing such glyphs.

OK, that may not have been your first thought but you have to admit it would make a rocking demonstration!

AI Brain Scans

Monday, March 13th, 2017

‘AI brain scans’ reveal what happens inside machine learning

The ResNet architecture is used for building deep neural networks for computer vision and image recognition. The image shown here is the forward (inference) pass of the ResNet 50 layer network used to classify images after being trained using the Graphcore neural network graph library

Credit Graphcore / Matt Fyles

The image is great eye candy, but if you want to see images annotated with information, check out: Inside an AI ‘brain’ – What does machine learning look like? (Graphcore)

From the product overview:

Poplar™ is a scalable graph programming framework targeting Intelligent Processing Unit (IPU) accelerated servers and IPU accelerated server clusters, designed to meet the growing needs of both advanced research teams and commercial deployment in the enterprise. It’s not a new language, it’s a C++ framework which abstracts the graph-based machine learning development process from the underlying graph processing IPU hardware.

Poplar includes a comprehensive, open source set of Poplar graph libraries for machine learning. In essence, this means existing user applications written in standard machine learning frameworks, like Tensorflow and MXNet, will work out of the box on an IPU. It will also be a natural basis for future machine intelligence programming paradigms which extend beyond tensor-centric deep learning. Poplar has a full set of debugging and analysis tools to help tune performance and a C++ and Python interface for application development if required.

The IPU-Appliance for the Cloud is due out in 2017. I have looked at Graphcore but came up dry on the Poplar graph libraries and/or an emulator for the IPU.

Perhaps those will both appear later in 2017.

Optimized hardware for graph calculations sounds promising but rapidly processing nodes that may or may not represent the same subject seems like a defect waiting to make itself known.

Many approaches rapidly process uncertain big data but being no more ignorant than your competition is hardly a selling point.

Turning Pixelated Faces Back Into Real Ones

Thursday, February 9th, 2017

Google’s neural networks turn pixelated faces back into real ones by John E. Dunn.

From the post:

Researchers at Google Brain have come up with a way to turn heavily pixelated images of human faces into something that bears a usable resemblance to the original subject.

In a new paper, the company’s researchers describe using neural networks put to work at two different ends of what should, on the face of it, be an incredibly difficult problem to solve: how to resolve a blocky 8 x 8 pixel images of faces or indoor scenes containing almost no information?

It’s something scientists in the field of super resolution (SR) have been working on for years, using techniques such as de-blurring and interpolation that are often not successful for this type of image. As the researchers put it:

When some details do not exist in the source image, the challenge lies not only in “deblurring” an image, but also in generating new image details that appear plausible to a human observer.

Their method involves getting the first “conditioning” neural network to resize 32 x 32 pixel images down to 8 x 8 pixels to see if that process can find a point at which they start to match the test image.
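To make the 32 x 32 to 8 x 8 step concrete, here is a minimal sketch, assuming grayscale images stored as nested lists and simple block averaging for the resize. It is not the neural networks described in the paper; it only shows the matching idea, picking the high-resolution candidate whose shrunken version is closest to the pixelated input. All function names are hypothetical.

```python
def downsample(image, factor):
    """Shrink a square grayscale image by averaging factor x factor blocks."""
    size = len(image)
    if size % factor != 0:
        raise ValueError("image size must be divisible by factor")
    out_size = size // factor
    out = []
    for r in range(out_size):
        row = []
        for c in range(out_size):
            block = [
                image[r * factor + dr][c * factor + dc]
                for dr in range(factor)
                for dc in range(factor)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def mean_squared_error(a, b):
    """Average squared pixel difference between two same-sized images."""
    diffs = [(x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def best_candidate(candidates, pixelated, factor=4):
    """Return the candidate whose downsampled version best matches the pixelated image."""
    return min(
        candidates,
        key=lambda img: mean_squared_error(downsample(img, factor), pixelated),
    )
```

Many different 32 x 32 images shrink to the same 8 x 8 image, which is exactly why the choice among plausible candidates matters.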

John raises a practical objection:

The obvious practical application of this would be enhancing blurry CCTV images of suspects. But getting to grips with real faces at awkward angles depends on numerous small details. Emphasise the wrong ones and police could end up looking for the wrong person.

True but John presumes the “suspects” are unknown. That’s true for the typical convenience store robbery on the 10 PM news but not so for “suspects” under intentional surveillance.

In those cases, multiple ground truth images from a variety of angles are likely to be available.

Four Experiments in Handwriting with a Neural Network

Tuesday, December 6th, 2016

Four Experiments in Handwriting with a Neural Network by Shan Carter, David Ha, Ian Johnson, and Chris Olah.

While the handwriting experiments are compelling and entertaining, the authors have a more profound goal for this activity:

The black box reputation of machine learning models is well deserved, but we believe part of that reputation has been born from the programming context into which they have been locked into. The experience of having an easily inspectable model available in the same programming context as the interactive visualization environment (here, javascript) proved to be very productive for prototyping and exploring new ideas for this post.

As we are able to move them more and more into the same programming context that user interface work is done, we believe we will see richer modes of human-ai interactions flourish. This could have a marked impact on debugging and building models, for sure, but also in how the models are used. Machine learning research typically seeks to mimic and substitute humans, and increasingly it’s able to. What seems less explored is using machine learning to augment humans. This sort of complicated human-machine interaction is best explored when the full capabilities of the model are available in the user interface context.

Setting up a search alert for future work from these authors!

Your Next Favorite Twitter Account: @DeepDrumpf

Friday, August 5th, 2016

@DeepDrumpf is a Neural Network trained on Donald Trump transcripts.

If you are curious beyond the tweets, see: Postdoc’s Trump Twitterbot Uses AI To Train Itself On Transcripts From Trump Speeches.

Ideally an interface would strip @DeepDrumpf and @realDonaldTrump off of tweets and present you with the option to assign authorship to @DeepDrumpf or @realDonaldTrump.

At the end of twenty or thirty tweets, you get your accuracy score over assignment of authorship.
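The quiz logic is simple enough to sketch. The following is a minimal, hypothetical version: the Tweet class, the function names, and the placeholder text are all my own assumptions, and no real tweets or Twitter API calls are involved.

```python
from dataclasses import dataclass

HANDLES = ("@DeepDrumpf", "@realDonaldTrump")


@dataclass(frozen=True)
class Tweet:
    text: str
    author: str  # one of HANDLES


def strip_handles(text: str) -> str:
    """Remove both account handles so the text alone must reveal the author."""
    for handle in HANDLES:
        text = text.replace(handle, "")
    return " ".join(text.split())


def score_guesses(tweets, guesses) -> float:
    """Fraction of tweets whose author was guessed correctly."""
    if not tweets or len(tweets) != len(guesses):
        raise ValueError("need one guess per tweet and at least one tweet")
    correct = sum(1 for tweet, guess in zip(tweets, guesses) if tweet.author == guess)
    return correct / len(tweets)


# Placeholder data, not real tweets.
quiz = [
    Tweet("placeholder text A", "@DeepDrumpf"),
    Tweet("placeholder text B", "@realDonaldTrump"),
    Tweet("placeholder text C", "@DeepDrumpf"),
    Tweet("placeholder text D", "@realDonaldTrump"),
]
my_guesses = ["@DeepDrumpf", "@DeepDrumpf", "@DeepDrumpf", "@realDonaldTrump"]
accuracy = score_guesses(quiz, my_guesses)
```

Anything near 0.5 after twenty or thirty tweets means you are guessing.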


Deep Learning Trends @ ICLR 2016 (+ Shout-Out to arXiv)

Friday, June 3rd, 2016

Deep Learning Trends @ ICLR 2016 by Tomasz Malisiewicz.

From the post:

Started by the youngest members of the Deep Learning Mafia [1], namely Yann LeCun and Yoshua Bengio, the ICLR conference is quickly becoming a strong contender for the single most important venue in the Deep Learning space. More intimate than NIPS and less benchmark-driven than CVPR, the world of ICLR is arXiv-based and moves fast.

Today’s post is all about ICLR 2016. I’ll highlight new strategies for building deeper and more powerful neural networks, ideas for compressing big networks into smaller ones, as well as techniques for building “deep learning calculators.” A host of new artificial intelligence problems is being hit hard with the newest wave of deep learning techniques, and from a computer vision point of view, there’s no doubt that deep convolutional neural networks are today’s “master algorithm” for dealing with perceptual data.

An information-packed review of the conference and, if that weren’t enough, this shout-out to arXiv:

ICLR Publishing Model: arXiv or bust
At ICLR, papers get posted on arXiv directly. And if you had any doubts that arXiv is just about the single awesomest thing to hit the research publication model since the Gutenberg press, let the success of ICLR be one more data point towards enlightenment. ICLR has essentially bypassed the old-fashioned publishing model where some third party like Elsevier says “you can publish with us and we’ll put our logo on your papers and then charge regular people $30 for each paper they want to read.” Sorry Elsevier, research doesn’t work that way. Most research papers aren’t good enough to be worth $30 for a copy. It is the entire body of academic research that provides true value, for which a single paper just a mere door. You see, Elsevier, if you actually gave the world an exceptional research paper search engine, together with the ability to have 10-20 papers printed on decent quality paper for a $30/month subscription, then you would make a killing on researchers and I would endorse such a subscription. So ICLR, rightfully so, just said fuck it, we’ll use arXiv as the method for disseminating our ideas. All future research conferences should use arXiv to disseminate papers. Anybody can download the papers, see when newer versions with corrections are posted, and they can print their own physical copies. But be warned: Deep Learning moves so fast, that you’ve gotta be hitting refresh or arXiv on a weekly basis or you’ll be schooled by some grad students in Canada.

Is your publishing < arXiv?

Do you hit arXiv every week?

Automating Amazon/Hotel/Travel Reviews (+ Human Intelligence Test (HIT))

Sunday, February 28th, 2016

The Neural Network That Remembers by Zachary C. Lipton & Charles Elkan.

From the post:

On tap at the brewpub. A nice dark red color with a nice head that left a lot of lace on the glass. Aroma is of raspberries and chocolate. Not much depth to speak of despite consisting of raspberries. The bourbon is pretty subtle as well. I really don’t know that find a flavor this beer tastes like. I would prefer a little more carbonization to come through. It’s pretty drinkable, but I wouldn’t mind if this beer was available.

Besides the overpowering bouquet of raspberries in this guy’s beer, this review is remarkable for another reason. It was produced by a computer program instructed to hallucinate a review for a “fruit/vegetable beer.” Using a powerful artificial-intelligence tool called a recurrent neural network, the software that produced this passage isn’t even programmed to know what words are, much less to obey the rules of English syntax. Yet, by mining the patterns in reviews from the barflies at, the program learns how to generate similarly coherent (or incoherent) reviews.

The neural network learns proper nouns like “Coors Light” and beer jargon like “lacing” and “snifter.” It learns to spell and to misspell, and to ramble just the right amount. Most important, the neural network generates reviews that are contextually relevant. For example, you can say, “Give me a 5-star review of a Russian imperial stout,” and the software will oblige. It knows to describe India pale ales as “hoppy,” stouts as “chocolatey,” and American lagers as “watery.” The neural network also learns more colorful words for lagers that we can’t put in print.

This particular neural network can also run in reverse, taking any review and recognizing the sentiment (star rating) and subject (type of beer). This work, done by one of us (Lipton) in collaboration with his colleagues Sharad Vikram and Julian McAuley at the University of California, San Diego, is part of a growing body of research demonstrating the language-processing capabilities of recurrent networks. Other related feats include captioning images, translating foreign languages, and even answering e-mail messages. It might make you wonder whether computers are finally able to think.

(emphasis in original)

An enthusiastic introduction and projection of the future of recurrent neural networks! Quite enthusiastic, in fact.

My immediate thought was what a time saver a recurrent neural network would be for “evaluation” requests that appear in my inbox with alarming regularity.

What about a service that accepts forwarded emails and generates a review for the book, seller, hotel, travel, etc., which is returned to you for cut-n-paste?

That would be about as “intelligent” as the amount of attention most of us devote to such requests.

You could set the service to mimic highly followed reviewers so over time you would move up the ranks of reviewers.

I mention Amazon, hotel, and travel reviews but those are just low-hanging fruit. You could do journal book reviews with a different data set.

Near the end of the post the authors write:

In this sense, the computer-science community is evaluating recurrent neural networks via a kind of Turing test. We try to teach a computer to act intelligently by training it to imitate what people produce when faced with the same task. Then we evaluate our thinking machine by seeing whether a human judge can distinguish between its output and what a human being might come up with.

While the very fact that we’ve come this far is exciting, this approach may have some fundamental limitations. For instance, it’s unclear how such a system could ever outstrip the capabilities of the people who provide the training data. Teaching a machine to learn through imitation might never produce more intelligence than was present collectively in those people.

One promising way forward might be an approach called reinforcement learning. Here, the computer explores the possible actions it can take, guided only by some sort of reward signal. Recently, researchers at Google DeepMind combined reinforcement learning with feed-forward neural networks to create a system that can beat human players at 31 different video games. The system never got to imitate human gamers. Instead it learned to play games by trial and error, using its score in the video game as a reward signal.
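For a feel of "trial and error, guided only by a reward signal," here is a minimal sketch of an epsilon-greedy learner on a two-armed bandit. This is a far simpler setting than the DeepMind system the authors mention and is not their method; the reward probabilities, step count, and function name are assumptions for illustration.

```python
import random


def run_bandit(reward_probs, steps, epsilon=0.1, seed=0):
    """Learn action values by trial and error, using only the reward signal."""
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    counts = [0] * n_actions    # how often each action was tried
    values = [0.0] * n_actions  # running average reward per action
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore at random
        else:
            action = max(range(n_actions), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Incremental update of the running average for the chosen action.
        values[action] += (reward - values[action]) / counts[action]
        total_reward += reward
    return values, counts, total_reward


# Hypothetical environment: the second action pays off four times as often.
values, counts, total_reward = run_bandit([0.2, 0.8], steps=5000)
```

The learner is never told which action is better. It finds out by occasionally trying the other one and keeping score.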

Instead of asking whether computers can think, the more provocative question is “whether people think for a large range of daily activities?”

Consider it as the Human Intelligence Test (HIT).

How much “intelligence” does it take to win a video game?

Eye/hand coordination to be sure, attention, but what “intelligence” is involved?

Computers may “eclipse” human beings at non-intelligent activities, as a shovel “eclipses” our ability to dig with our bare hands.

But I’m not overly concerned.

Are you?

Superhuman Neural Network – Urban War Fighters Take Note

Wednesday, February 24th, 2016

Google Unveils Neural Network with “Superhuman” Ability to Determine the Location of Almost Any Image

From the post:

Here’s a tricky task. Pick a photograph from the Web at random. Now try to work out where it was taken using only the image itself. If the image shows a famous building or landmark, such as the Eiffel Tower or Niagara Falls, the task is straightforward. But the job becomes significantly harder when the image lacks specific location cues or is taken indoors or shows a pet or food or some other detail.

Nevertheless, humans are surprisingly good at this task. To help, they bring to bear all kinds of knowledge about the world such as the type and language of signs on display, the types of vegetation, architectural styles, the direction of traffic, and so on. Humans spend a lifetime picking up these kinds of geolocation cues.

So it’s easy to think that machines would struggle with this task. And indeed, they have.

Today, that changes thanks to the work of Tobias Weyand, a computer vision specialist at Google, and a couple of pals. These guys have trained a deep-learning machine to work out the location of almost any photo using only the pixels it contains.

Their new machine significantly outperforms humans and can even use a clever trick to determine the location of indoor images and pictures of specific things such as pets, food, and so on that have no location cues.

The full paper: PlaNet—Photo Geolocation with Convolutional Neural Networks.


Is it possible to build a system to determine the location where a photo was taken using just its pixels? In general, the problem seems exceptionally difficult: it is trivial to construct situations where no location can be inferred. Yet images often contain informative cues such as landmarks, weather patterns, vegetation, road markings, and architectural details, which in combination may allow one to determine an approximate location and occasionally an exact location. Websites such as GeoGuessr and View from your Window suggest that humans are relatively good at integrating these cues to geolocate images, especially en-masse. In computer vision, the photo geolocation problem is usually approached using image retrieval methods. In contrast, we pose the problem as one of classification by subdividing the surface of the earth into thousands of multi-scale geographic cells, and train a deep network using millions of geotagged images. While previous approaches only recognize landmarks or perform approximate matching using global image descriptors, our model is able to use and integrate multiple visible cues. We show that the resulting model, called PlaNet, outperforms previous approaches and even attains superhuman levels of accuracy in some cases. Moreover, we extend our model to photo albums by combining it with a long short-term memory (LSTM) architecture. By learning to exploit temporal coherence to geolocate uncertain photos, we demonstrate that this model achieves a 50% performance improvement over the single-image model.
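Posing geolocation as classification means every photo's coordinates become a class label. Here is a minimal sketch of that labeling step, assuming a fixed 10-degree latitude/longitude grid. The abstract describes multi-scale cells, so this uniform grid is a deliberate simplification, and the function names are hypothetical.

```python
def cell_index(lat: float, lon: float, cell_deg: float = 10.0) -> int:
    """Map a latitude/longitude pair to the index of its grid cell (the class label)."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("latitude or longitude out of range")
    n_rows = int(180 / cell_deg)
    n_cols = int(360 / cell_deg)
    # Clamp so the north pole and the +180 meridian fall in the last row/column.
    row = min(int((lat + 90.0) / cell_deg), n_rows - 1)
    col = min(int((lon + 180.0) / cell_deg), n_cols - 1)
    return row * n_cols + col


def cell_center(index: int, cell_deg: float = 10.0):
    """Return the (lat, lon) center of a cell, usable as the predicted location."""
    n_cols = int(360 / cell_deg)
    row, col = divmod(index, n_cols)
    return (-90.0 + (row + 0.5) * cell_deg, -180.0 + (col + 0.5) * cell_deg)


# A photo taken near 48.8584 N, 2.2945 E gets a single integer label.
label = cell_index(48.8584, 2.2945)
predicted_location = cell_center(label)
```

A classifier trained on such labels outputs a probability per cell, and the best cell's center serves as the location estimate. Smaller cells give finer answers but leave fewer training photos per class.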

You might think that with GPS engaged the location of images is a done deal.

Not really. You can be facing in any direction from a particular GPS location and in a dynamic environment, analysts or others don’t have the time to sort out which images are relevant from those that are just noise.

Urban warfare does not occur on a global scale, bringing home the lesson that it isn’t the biggest data set but the most relevant and timely data set that is important.

Relevantly oriented images and feeds are a natural outgrowth of this work. Not to mention pairing those images with other relevant data.

PS: Before I forget, enjoy playing the game at:

More Bad News For EC Brain Project Wood Pigeons

Sunday, February 14th, 2016

I heard the story of how the magpie tried to instruct other birds, particularly the wood pigeon, on how to build nests in a different form but the lesson was much the same.

The EC Brain project reminds me of the wood pigeon hearing “…take two sticks…” and running off to build its nest.

With no understanding of the human brain, the EC set out to build one, on a ten-year deadline.

Byron Spice’s report in: Project Aims to Reverse-engineer Brain Algorithms, Make Computers Learn Like Humans casts further doubt upon that project:

Carnegie Mellon University is embarking on a five-year, $12 million research effort to reverse-engineer the brain, seeking to unlock the secrets of neural circuitry and the brain’s learning methods. Researchers will use these insights to make computers think more like humans.

The research project, led by Tai Sing Lee, professor in the Computer Science Department and the Center for the Neural Basis of Cognition (CNBC), is funded by the Intelligence Advanced Research Projects Activity (IARPA) through its Machine Intelligence from Cortical Networks (MICrONS) research program. MICrONS is advancing President Barack Obama’s BRAIN Initiative to revolutionize the understanding of the human brain.

“MICrONS is similar in design and scope to the Human Genome Project, which first sequenced and mapped all human genes,” Lee said. “Its impact will likely be long-lasting and promises to be a game changer in neuroscience and artificial intelligence.”

Artificial neural nets process information in one direction, from input nodes to output nodes. But the brain likely works in quite a different way. Neurons in the brain are highly interconnected, suggesting possible feedback loops at each processing step. What these connections are doing computationally is a mystery; solving that mystery could enable the design of more capable neural nets.

My goodness! Unknown loops in algorithms?

The Carnegie Mellon project is exploring potential algorithms, not trying to engineer the unknown.

If the EC had titled its project the Graduate Assistant and Hospitality Industry Support Project, one could object to the use of funds for travel junkets but it would otherwise be intellectually honest.

Build your own neural network classifier in R

Wednesday, February 10th, 2016

Build your own neural network classifier in R by Jun Ma.

From the post:

Image classification is one important field in Computer Vision, not only because so many applications are associated with it, but also a lot of Computer Vision problems can be effectively reduced to image classification. The state of art tool in image classification is Convolutional Neural Network (CNN). In this article, I am going to write a simple Neural Network with 2 layers (fully connected). First, I will train it to classify a set of 4-class 2D data and visualize the decision boundary. Second, I am going to train my NN with the famous MNIST data (you can download it here) and see its performance. The first part is inspired by CS 231n course offered by Stanford, which is taught in Python.
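Jun's code is in R, so take the following as nothing more than a rough Python sketch of what “a simple Neural Network with 2 layers (fully connected)” computes for one 2D point and four classes. The ReLU hidden layer, the hidden size of 6, the random initial weights and all of the function names are my assumptions for illustration, not Jun's code, and no training is shown.

```python
import math
import random

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(values):
    # Subtract the maximum first so exp() cannot overflow.
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def affine(x, weights, bias):
    # weights[j] holds the incoming weights of output unit j.
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def init_layer(n_in, n_out, rng):
    # Small random weights and zero biases (an assumed initialization).
    weights = [[rng.gauss(0.0, 0.01) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(x, params):
    (w1, b1), (w2, b2) = params
    hidden = relu(affine(x, w1, b1))      # layer 1: fully connected + ReLU
    return softmax(affine(hidden, w2, b2))  # layer 2: class probabilities

rng = random.Random(0)
params = [init_layer(2, 6, rng), init_layer(6, 4, rng)]
probs = forward([0.5, -1.2], params)
predicted_class = max(range(4), key=lambda k: probs[k])
```

With untrained weights the four probabilities are all close to 0.25; training is what moves them apart.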

One suggestion, based on some unrelated reading, don’t copy-n-paste the code.

Key in the code so you will get accustomed to your typical typing mistakes, which are no doubt different from mine!

Plus you will develop muscle memory in your fingers and code will either “look right” or not.


PS: For R, Jun’s blog looks like one you need to start following!

‘Picard and Dathon at El-Adrel’

Saturday, December 26th, 2015

Machines, Lost In Translation: The Dream Of Universal Understanding by Anne Li.

From the post:

It was early 1954 when computer scientists, for the first time, publicly revealed a machine that could translate between human languages. It became known as the Georgetown-IBM experiment: an “electronic brain” that translated sentences from Russian into English.

The scientists believed a universal translator, once developed, would not only give Americans a security edge over the Soviets but also promote world peace by eliminating language barriers.

They also believed this kind of progress was just around the corner: Leon Dostert, the Georgetown language scholar who initiated the collaboration with IBM founder Thomas Watson, suggested that people might be able to use electronic translators to bridge several languages within five years, or even less.

The process proved far slower. (So slow, in fact, that about a decade later, funders of the research launched an investigation into its lack of progress.) And more than 60 years later, a true real-time universal translator — a la C-3PO from Star Wars or the Babel Fish from The Hitchhiker’s Guide to the Galaxy — is still the stuff of science fiction.

How far are we from one, really? Expert opinions vary. As with so many other areas of machine learning, it depends on how quickly computers can be trained to emulate human thinking.

The Star Trek Next Generation episode Darmok was set during a five-year mission that began in 2364, some 349 years in our future. Faster than light travel, teleportation, etc. are day to day realities. One expects machine translation to have improved at least as much.

As Li reports, exciting progress is being made with neural networks for translation, but transposing words from one language to another, as illustrated in Darmok, isn’t a guarantee of “universal understanding.”

In fact, the transposition may be as opaque as the statement in its original language. A statement such as “Darmok and Jalad at Tanagra” leaves the hearer to wonder what happened at Tanagra, what the relationship between Darmok and Jalad was, etc.

In the early lines of The Story of the Shipwrecked Sailor, a Middle Kingdom (Egypt, 2000 BCE – 1700 BCE) story, there is a line that describes the sailor returning home and words to the effect “…we struck….” Then the next sentence picks up.

The words necessary to complete that statement don’t occur in the text. You have to know that mooring boats on the Nile did not involve piers, etc. but simply banking your boat and then driving a post (the unstated subject of “we struck”) to secure the vessel.

Transposition from Middle Egyptian to English leaves you without a clue as to the meaning of that passage.

To be sure, neural networks may clear away some of the rote work of transposition between languages but that is a far cry from “universal understanding.”

Both now and likely to continue into the 24th century.

Neural Networks, Recognizing Friendlies, $Billions; Friendlies as Enemies, $Priceless

Thursday, December 24th, 2015

Elon Musk merits many kudos for the recent SpaceX success.

At the same time, Elon has been nominated for Luddite of the Year, along with Bill Gates and Stephen Hawking, for fanning fears of artificial intelligence.

One favorite target for such fears is autonomous weapons systems. Hannah Junkerman annotated a list of 18 posts, articles and books on such systems for Just Security.

While moralists are wringing their hands, military forces have not let grass grow under their feet with regard to autonomous weapon systems. As Michael Carl Haas reports in Autonomous Weapon Systems: The Military’s Smartest Toys?:

Military forces that rely on armed robots to select and destroy certain types of targets without human intervention are no longer the stuff of science fiction. In fact, swarming anti-ship missiles that acquire and attack targets based on pre-launch input, but without any direct human involvement—such as the Soviet Union’s P-700 Granit—have been in service for decades. Offensive weapons that have been described as acting autonomously—such as the UK’s Brimstone anti-tank missile and Norway’s Joint Strike Missile—are also being fielded by the armed forces of Western nations. And while governments deny that they are working on armed platforms that will apply force without direct human oversight, sophisticated strike systems that incorporate significant features of autonomy are, in fact, being developed in several countries.

In the United States, the X-47B unmanned combat air system (UCAS) has been a definite step in this direction, even though the Navy is dodging the issue of autonomous deep strike for the time being. The UK’s Taranis is now said to be “merely” semi-autonomous, while the nEUROn developed by France, Greece, Italy, Spain, Sweden and Switzerland is explicitly designed to demonstrate an autonomous air-to-ground capability, as appears to be case with Russia’s MiG Skat. While little is known about China’s Sharp Sword, it is unlikely to be far behind its competitors in conceptual terms.

The reasoning of military planners in favor of autonomous weapons systems isn’t hard to find, especially when one article describes air-to-air combat between tactically autonomous and machine-piloted aircraft versus piloted aircraft this way:

This article claims that a tactically autonomous, machine-piloted aircraft whose design capitalizes on John Boyd’s observe, orient, decide, act (OODA) loop and energy-maneuverability constructs will bring new and unmatched lethality to air-to-air combat. It submits that the machine’s combined advantages applied to the nature of the tasks would make the idea of human-inhabited platforms that challenge it resemble the mismatch depicted in The Charge of the Light Brigade.

Here’s the author’s mock-up of a sixth-generation approach:



Given the strides being made on the use of neural networks, I would be surprised if they are not at the core of present and future autonomous weapons systems.

You can join the debate about the ethics of autonomous weapons but the more practical approach is to read How to trick a neural network into thinking a panda is a vulture by Julia Evans.

Autonomous weapon systems will be developed by a limited handful of major military powers, at least at first, which means the market for counter-measures, such as turning such weapons against their masters, will bring a premium price. Far more than the offensive development side. Not to mention there will be a far larger market for counter-measures.

Deception, one means of turning weapons against their users, has a long history, not the earliest of which is the tale of Esau and Jacob (Genesis, chapter 27):

11 And Jacob said to Rebekah his mother, Behold, Esau my brother is a hairy man, and I am a smooth man:

12 My father peradventure will feel me, and I shall seem to him as a deceiver; and I shall bring a curse upon me, and not a blessing.

13 And his mother said unto him, Upon me be thy curse, my son: only obey my voice, and go fetch me them.

14 And he went, and fetched, and brought them to his mother: and his mother made savoury meat, such as his father loved.

15 And Rebekah took goodly raiment of her eldest son Esau, which were with her in the house, and put them upon Jacob her younger son:

16 And she put the skins of the kids of the goats upon his hands, and upon the smooth of his neck:

17 And she gave the savoury meat and the bread, which she had prepared, into the hand of her son Jacob.

Julia’s post doesn’t cover the hard case of seeing Jacob as Esau up close, but in a battlefield environment the equivalent of mistaking a panda for a vulture may be good enough.

The primary distinction that any autonomous weapons system must make is the friendly/enemy distinction. The term “friendly fire” was coined to cover cases where human directed weapons systems fail to make that distinction correctly.

The historical rate of “friendly fire” or fratricide is 2%, but Mark Thompson reports in The Curse of Friendly Fire that the actual fratricide rate in the 1991 Gulf War was 24%.

#Juniper, just to name one recent federal government software failure, is evidence that robustness isn’t an enforced requirement for government software.

Apply that lack of requirements to neural networks in autonomous weapons platforms and you have the potential for both developing and defeating autonomous weapons systems.

Julia’s post leaves you a long way from defeating an autonomous weapons platform but it is a good starting place.

PS: Defeating military grade neural networks will be good training for defeating more sophisticated ones used by commercial entities.

Why Neurons Have Thousands of Synapses! (Quick! Someone Call the EU Brain Project!)

Thursday, November 12th, 2015

Single Artificial Neuron Taught to Recognize Hundreds of Patterns.

From the post:

Artificial intelligence is a field in the midst of rapid, exciting change. That’s largely because of an improved understanding of how neural networks work and the creation of vast databases to help train them. The result is machines that have suddenly become better at things like face and object recognition, tasks that humans have always held the upper hand in (see “Teaching Machines to Understand Us”).

But there’s a puzzle at the heart of these breakthroughs. Although neural networks are ostensibly modeled on the way the human brain works, the artificial neurons they contain are nothing like the ones at work in our own wetware. Artificial neurons, for example, generally have just a handful of synapses and entirely lack the short, branched nerve extensions known as dendrites and the thousands of synapses that form along them. Indeed, nobody really knows why real neurons have so many synapses.

Today, that changes thanks to the work of Jeff Hawkins and Subutai Ahmad at Numenta, a Silicon Valley startup focused on understanding and exploiting the principles behind biological information processing. The breakthrough these guys have made is to come up with a new theory that finally explains the role of the vast number of synapses in real neurons and to create a model based on this theory that reproduces many of the intelligent behaviors of real neurons.

A very enjoyable and accessible summary of a paper on the cutting edge of neuroscience!

Relevant to another concern that I will be covering in the near future, the post concludes with:

One final point is that this new thinking does not come from an academic environment but from a Silicon Valley startup. This company is the brain child of Jeff Hawkins, an entrepreneur, inventor and neuroscientist. Hawkins invented the Palm Pilot in the 1990s and has since turned his attention to neuroscience full-time.

That’s an unusual combination of expertise but one that makes it highly likely that we will see these new artificial neurons at work on real world problems in the not too distant future. Incidentally, Hawkins and Ahmad call their new toys Hierarchical Temporal Memory neurons or HTM neurons. Expect to hear a lot more about them.

If you want all the details, see:

Why Neurons Have Thousands of Synapses, A Theory of Sequence Memory in Neocortex by Jeff Hawkins, Subutai Ahmad.


Neocortical neurons have thousands of excitatory synapses. It is a mystery how neurons integrate the input from so many synapses and what kind of large-scale network behavior this enables. It has been previously proposed that non-linear properties of dendrites enable neurons to recognize multiple patterns. In this paper we extend this idea by showing that a neuron with several thousand synapses arranged along active dendrites can learn to accurately and robustly recognize hundreds of unique patterns of cellular activity, even in the presence of large amounts of noise and pattern variation. We then propose a neuron model where some of the patterns recognized by a neuron lead to action potentials and define the classic receptive field of the neuron, whereas the majority of the patterns recognized by a neuron act as predictions by slightly depolarizing the neuron without immediately generating an action potential. We then present a network model based on neurons with these properties and show that the network learns a robust model of time-based sequences. Given the similarity of excitatory neurons throughout the neocortex and the importance of sequence memory in inference and behavior, we propose that this form of sequence memory is a universal property of neocortical tissue. We further propose that cellular layers in the neocortex implement variations of the same sequence memory algorithm to achieve different aspects of inference and behavior. The neuron and network models we introduce are robust over a wide range of parameters as long as the network uses a sparse distributed code of cellular activations. The sequence capacity of the network scales linearly with the number of synapses on each neuron. Thus neurons need thousands of synapses to learn the many temporal patterns in sensory stimuli and motor sequences.
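To get a feel for the abstract's claim that one neuron can “robustly recognize hundreds of unique patterns,” here is a toy sketch of my own, not Numenta's code. Each dendritic segment stores a random subsample of the cells active in one sparse pattern and counts as a match when enough of those synapses are active again. The population size (2000), the number of active cells (40), the synapses per segment (20), the threshold (12) and every name below are assumptions I picked for illustration.

```python
import random

class Neuron:
    """Toy neuron whose dendritic segments each recognize one sparse pattern."""

    def __init__(self, synapses_per_segment=20, threshold=12, seed=0):
        self.synapses_per_segment = synapses_per_segment
        self.threshold = threshold
        self.segments = []
        self.rng = random.Random(seed)

    def learn(self, active_cells):
        # A new segment samples only some of the active cells.
        sample = self.rng.sample(sorted(active_cells), self.synapses_per_segment)
        self.segments.append(frozenset(sample))

    def recognizes(self, active_cells):
        active = frozenset(active_cells)
        # Any segment with enough active synapses counts as a match.
        return any(len(segment & active) >= self.threshold
                   for segment in self.segments)

def random_pattern(population, n_active, rng):
    return frozenset(rng.sample(range(population), n_active))

def add_noise(pattern, n_swap, population, rng):
    # Replace n_swap active cells with cells that were not in the pattern.
    kept = rng.sample(sorted(pattern), len(pattern) - n_swap)
    others = [c for c in range(population) if c not in pattern]
    return frozenset(kept) | frozenset(rng.sample(others, n_swap))

rng = random.Random(1)
POPULATION, ACTIVE = 2000, 40
patterns = [random_pattern(POPULATION, ACTIVE, rng) for _ in range(100)]

neuron = Neuron()
for pattern in patterns:
    neuron.learn(pattern)

noisy = [add_noise(p, 8, POPULATION, rng) for p in patterns]
# Swapping 8 cells can silence at most 8 of a segment's 20 synapses,
# so every noisy pattern still reaches the threshold of 12.
recognized = sum(neuron.recognizes(p) for p in noisy)
```

One hundred segments of twenty synapses each is two thousand synapses, which is the scale the abstract is talking about. The threshold sitting below the synapse count is what buys the tolerance to noise.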

BTW, did I mention the full source code is available at:

Coming from a startup, this discovery doesn’t have a decade of support for travel, meals, lodging, support staff, publications, administrative overhead, etc., for a cast of hundreds across the EU. But then, that decade would not have resulted in such a fundamental discovery in any event.

Is that a hint about the appropriate vehicle for advancing fundamental discoveries in science?

Text Mining Meets Neural Nets: Mining the Biomedical Literature

Wednesday, October 28th, 2015

Text Mining Meets Neural Nets: Mining the Biomedical Literature by Dan Sullivan.

From the webpage:

Text mining and natural language processing employ a range of techniques from syntactic parsing, statistical analysis, and more recently deep learning. This presentation presents recent advances in dense word representations, also known as word embedding, and their advantages over sparse representations, such as the popular term frequency-inverse document frequency (tf-idf) approach. It also discusses convolutional neural networks, a form of deep learning that is proving surprisingly effective in natural language processing tasks. Reference papers and tools are included for those interested in further details. Examples are drawn from the bio-medical domain.
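For readers who have not met the sparse side of that comparison, here is a minimal tf-idf sketch in plain Python. It is my own illustration, not taken from the slides. The three toy sentences are made up, and the weighting (term frequency times the unsmoothed natural log of inverse document frequency) is only one of several common variants.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one sparse {term: weight} dictionary per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

docs = [
    "the protein binds the receptor",
    "the gene encodes the protein",
    "the receptor activates the protein",
]
vectors = tf_idf(docs)
```

Terms that occur in every document (“the”, “protein”) get a weight of zero, while a term unique to one document (“binds”) gets the largest weight. Each vector has one slot per vocabulary term, which is what makes the representation sparse; a dense word embedding instead uses a short fixed-length vector of real numbers per word.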

Basically an abstract for the 58 slides you will find here:

The best thing about these slides is the wealth of additional links to other resources. There is only so much you can say on a slide so links to more details should be a standard practice.

Slide 53: Formalize a Mathematical Model of Semantics, seems a bit ambitious to me, considering that mathematics is a subset of natural language. It is difficult to see how the lesser could model the greater.

You could create a mathematical model of some semantics and say it was all that is necessary, but that’s been done before. Always strive to make new mistakes.

What a Deep Neural Network thinks about your #selfie

Sunday, October 25th, 2015

What a Deep Neural Network thinks about your #selfie by Andrej Karpathy.

From the post:

Convolutional Neural Networks are great: they recognize things, places and people in your personal photos, signs, people and lights in self-driving cars, crops, forests and traffic in aerial imagery, various anomalies in medical images and all kinds of other useful things. But once in a while these powerful visual recognition models can also be warped for distraction, fun and amusement. In this fun experiment we’re going to do just that: We’ll take a powerful, 140-million-parameter state-of-the-art Convolutional Neural Network, feed it 2 million selfies from the internet, and train it to classify good selfies from bad ones. Just because it’s easy and because we can. And in the process we might learn how to take better selfies 🙂

A must read for anyone interested in deep neural networks and image recognition!

Selfies provide abundant and amusing data to illustrate neural network techniques that are being used every day.

Andrej provides numerous pointers to additional materials and references on neural networks. Good thing, considering how much interest his post is going to generate!

Neural Networks Demystified (videos)

Tuesday, October 20th, 2015

I first saw this video series in a tweet by Jason Baldridge.

You know what a pig’s breakfast YouTube’s related videos can be. No matter which part I looked at, there was no full listing of the other parts.

To save you that annoyance, here are all the videos in this series. (That’s a partial definition of curation, saving other people time and expense in finding information.)

Teaching Deep Convolutional Neural Networks to Play Go [Networks that can’t explain their play]

Sunday, October 18th, 2015

Teaching Deep Convolutional Neural Networks to Play Go by Christopher Clark, Amos Storkey.


Mastering the game of Go has remained a long standing challenge to the field of AI. Modern computer Go systems rely on processing millions of possible future positions to play well, but intuitively a stronger and more ‘humanlike’ way to play the game would be to rely on pattern recognition abilities rather then brute force computation. Following this sentiment, we train deep convolutional neural networks to play Go by training them to predict the moves made by expert Go players. To solve this problem we introduce a number of novel techniques, including a method of tying weights in the network to ‘hard code’ symmetries that are expect to exist in the target function, and demonstrate in an ablation study they considerably improve performance. Our final networks are able to achieve move prediction accuracies of 41.1% and 44.4% on two different Go datasets, surpassing previous state of the art on this task by significant margins. Additionally, while previous move prediction programs have not yielded strong Go playing programs, we show that the networks trained in this work acquired high levels of skill. Our convolutional neural networks can consistently defeat the well known Go program GNU Go, indicating it is state of the art among programs that do not use Monte Carlo Tree Search. It is also able to win some games against state of the art Go playing program Fuego while using a fraction of the play time. This success at playing Go indicates high level principles of the game were learned.
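The symmetries mentioned in the abstract are the eight rotations and reflections of a square board. The sketch below only enumerates them for a small board held as a list of lists. It is not the paper's weight-tying method, and the function names and the 3x3 example are my own assumptions.

```python
def rotate(board):
    """Rotate a square board 90 degrees clockwise."""
    return [list(row) for row in zip(*board[::-1])]

def reflect(board):
    """Mirror a board left to right."""
    return [row[::-1] for row in board]

def symmetries(board):
    """Return the 8 rotations and reflections of a square board."""
    result = []
    current = [list(row) for row in board]
    for _ in range(4):
        result.append(current)
        result.append(reflect(current))
        current = rotate(current)
    return result

# 1 marks a single stone in the top-left corner of a tiny 3x3 board.
corner_stone = [
    [1, 0, 0],
    [0, 0, 0],
    [0, 0, 0],
]
variants = symmetries(corner_stone)
stone_positions = {
    (r, c)
    for board in variants
    for r, row in enumerate(board)
    for c, value in enumerate(row)
    if value == 1
}
```

A corner stone lands only on the four corners across all eight variants. A move predictor that respects these symmetries should rate those positions equally on an otherwise empty board.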

The last line of the abstract caught my eye:

This success at playing Go indicates high level principles of the game were learned.

That statement is expanded in 4.3 Playing Go:

The results are very promising. Even though the networks are playing using a ‘zero step look ahead’ policy, and using a fraction of the computation time as their opponents, they are still able to play better then GNU Go and take some games away from Fuego. Under these settings GNU Go might play at around a 6-8 kyu ranking and Fuego at 2-3 kyu, which implies the networks are achieving a ranking of approximately 4-5 kyu. For a human player reaching this ranking would normally require years of study. This indicates that sophisticated knowledge of the game was acquired. This also indicates great potential for a Go program that integrates the information produced by such a network.

An interesting limitation is that the network can’t communicate what it has learned. It can only produce an answer for a given situation. In gaming situations that opaqueness isn’t immediately objectionable.

But what if the situation was fire/don’t fire in a combat situation? Would the limitation that the network can only say yes or no, with no way to explain its answer, be acceptable?

Is that any worse than humans inventing explanations for decisions that weren’t the result of any rational thinking process?

Some additional Go resources you may find useful: American Go Association, Go Game Guru (with a printable Go board and stones), (has a Japanese dictionary). Those sites will lead you to many other Go sites.

10 Misconceptions about Neural Networks [Update to car numberplate game?]

Saturday, September 19th, 2015

10 Misconceptions about Neural Networks by Stuart Reid.

From the post:

Neural networks are one of the most popular and powerful classes of machine learning algorithms. In quantitative finance neural networks are often used for time-series forecasting, constructing proprietary indicators, algorithmic trading, securities classification and credit risk modelling. They have also been used to construct stochastic process models and price derivatives. Despite their usefulness neural networks tend to have a bad reputation because their performance is “temperamental”. In my opinion this can be attributed to poor network design owing to misconceptions regarding how neural networks work. This article discusses some of those misconceptions.

The car numberplate game was a game where passengers in a car, usually children, would compete to find license plates from different states (in the US). That was prior to children being entombed in intellectual isolation bubbles with iPads, Gameboys, DVD players and wireless access, while riding.

Hard to believe but some people used to look outside the vehicle in which they were riding. Now of course what little attention they have is captured by cellphones and not other occupants of the same vehicle.

Rather than rail against that trend, may I suggest we update the car numberplate game to “mistakes about neural networks?”

Using Stuart’s post as a baseline, send a text message to each passenger pointing to Stuart’s post and requesting a count of the number of “mistakes about neural networks” they can find in an hour.

Personally I would put popular media off limits for post-high school players to keep the scores under four digits.

When discussing the scores, after sharing browsing histories, each player has to analyze the claimed error and match it to one on Stuart’s list.

I realize that will require full bandwidth communication with others in your physical presence but with practice, that won’t seem so terribly odd.

I first saw this in a tweet by Kirk Borne.

A Critical Review of Recurrent Neural Networks for Sequence Learning

Monday, June 29th, 2015

A Critical Review of Recurrent Neural Networks for Sequence Learning by Zachary C. Lipton.


Countless learning tasks require awareness of time. Image captioning, speech synthesis, and video game playing all require that a model generate sequences of outputs. In other domains, such as time series prediction, video analysis, and music information retrieval, a model must learn from sequences of inputs. Significantly more interactive tasks, such as natural language translation, engaging in dialogue, and robotic control, often demand both.

Recurrent neural networks (RNNs) are a powerful family of connectionist models that capture time dynamics via cycles in the graph. Unlike feedforward neural networks, recurrent networks can process examples one at a time, retaining a state, or memory, that reflects an arbitrarily long context window. While these networks have long been difficult to train and often contain millions of parameters, recent advances in network architectures, optimization techniques, and parallel computation have enabled large-scale learning with recurrent nets.

Over the past few years, systems based on state of the art long short-term memory (LSTM) and bidirectional recurrent neural network (BRNN) architectures have demonstrated record-setting performance on tasks as varied as image captioning, language translation, and handwriting recognition. In this review of the literature we synthesize the body of research that over the past three decades has yielded and reduced to practice these powerful models. When appropriate, we reconcile conflicting notation and nomenclature. Our goal is to provide a mostly self-contained explication of state of the art systems, together with a historical perspective and ample references to the primary research.
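The abstract's point about “retaining a state, or memory” is easy to see in a few lines. What follows is a minimal sketch of a plain (Elman-style) recurrent step in pure Python, not an LSTM and not code from the review. The tiny hand-picked weights and the function names are assumptions chosen so the example runs without any libraries.

```python
import math

def rnn_step(x, h_prev, w_xh, w_hh, bias):
    """One recurrent step: h_t = tanh(W_xh x_t + W_hh h_(t-1) + b)."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(w_xh[j], x))
            + sum(w * hi for w, hi in zip(w_hh[j], h_prev))
            + bias[j]
        )
        for j in range(len(bias))
    ]

def run(sequence, w_xh, w_hh, bias):
    """Feed a sequence one input at a time, carrying the hidden state along."""
    h = [0.0] * len(bias)
    for x in sequence:
        h = rnn_step(x, h, w_xh, w_hh, bias)
    return h

# One input feature, two hidden units (illustrative values only).
w_xh = [[0.5], [-0.3]]
w_hh = [[0.1, 0.4], [-0.2, 0.3]]
bias = [0.0, 0.0]

state_a = run([[1.0], [0.0]], w_xh, w_hh, bias)
state_b = run([[0.0], [1.0]], w_xh, w_hh, bias)
```

Both sequences contain the same two inputs, yet the final states differ because the hidden state carries the order forward. That is the “cycle in the graph” at work.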

Lipton begins with an all too common lament:

The literature on recurrent neural networks can seem impenetrable to the uninitiated. Shorter papers assume familiarity with a large body of background literature. Diagrams are frequently underspecified, failing to indicate which edges span time steps and which don’t. Worse, jargon abounds while notation is frequently inconsistent across papers or overloaded within papers. Readers are frequently in the unenviable position of having to synthesize conflicting information across many papers in order to understand but one. For example, in many papers subscripts index both nodes and time steps. In others, h simultaneously stands for link functions and a layer of hidden nodes. The variable t simultaneously stands for both time indices and targets, sometimes in the same equation. Many terrific breakthrough papers have appeared recently, but clear reviews of recurrent neural network literature are rare.

Unfortunately, Lipton gives no pointers to where the variant practices occur, leaving the reader forewarned but not forearmed.

Still, this is a survey paper with seventy-three (73) references over thirty-three (33) pages, so I assume you will encounter various notation practices if you follow the references and current literature.

Capturing variations in notation, along with where they have been seen, won’t win the Turing Award but may improve the CS field overall.

Learning to Execute

Monday, June 22nd, 2015

Learning to Execute by Wojciech Zaremba and Ilya Sutskever.


Recurrent Neural Networks (RNNs) with Long Short-Term Memory units (LSTM) are widely used because they are expressive and are easy to train. Our interest lies in empirically evaluating the expressiveness and the learnability of LSTMs in the sequence-to-sequence regime by training them to evaluate short computer programs, a domain that has traditionally been seen as too complex for neural networks. We consider a simple class of programs that can be evaluated with a single left-to-right pass using constant memory. Our main result is that LSTMs can learn to map the character-level representations of such programs to their correct outputs. Notably, it was necessary to use curriculum learning, and while conventional curriculum learning proved ineffective, we developed a new variant of curriculum learning that improved our networks’ performance in all experimental conditions. The improved curriculum had a dramatic impact on an addition problem, making it possible to train an LSTM to add two 9-digit numbers with 99% accuracy.
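To make “curriculum learning” on the addition problem concrete, here is a small sketch that generates character-level training pairs of growing difficulty. It is my own illustration, not the authors' code or their exact curriculum variant. The schedule, the 20% share of examples drawn from all difficulty levels and every name below are assumptions.

```python
import random

def addition_example(n_digits, rng):
    """Return (input_text, target_text) for adding two n-digit numbers."""
    low = 10 ** (n_digits - 1) if n_digits > 1 else 0
    high = 10 ** n_digits - 1
    a, b = rng.randint(low, high), rng.randint(low, high)
    return f"{a}+{b}", str(a + b)

def curriculum_batch(stage, max_digits, batch_size, rng, mix=0.2):
    """Mostly examples up to the current stage, plus some of any difficulty."""
    stage = min(stage, max_digits)
    batch = []
    for _ in range(batch_size):
        if rng.random() < mix:
            n_digits = rng.randint(1, max_digits)  # any difficulty level
        else:
            n_digits = rng.randint(1, stage)       # current stage or easier
        batch.append(addition_example(n_digits, rng))
    return batch

rng = random.Random(0)
# One batch per stage, from 1-digit sums up to 9-digit sums.
batches = [curriculum_batch(stage, 9, 32, rng) for stage in range(1, 10)]
```

A model would read each input string one character at a time and be trained to emit the target string. The generator only controls how hard the examples are at each point in training.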

Code to replicate the experiments:

A step towards generation of code that conforms to coding standards?

I first saw this in a tweet by samim.

Real-time Trainable Neural Network (on a chip)

Saturday, June 20th, 2015

Real-time Trainable Neural Network

From the webpage:

The architecture of the CogniMem™ chip makes it the most practical implementation of a Radial Basis Function classifier with autonomous adaptive learning capabilities.

The Radial Basis Function is a classifier capable of representing complex nonlinear decision spaces using hyperspheres with adaptable radii. It is widely used for face recognition and other image recognition applications, function approximation, time series prediction, novelty detection.


The CogniMem Advantage: Upon receipt of an input vector, all the cognitive memories holding a previously learned vector calculate their distance to the input vector and evaluate immediately if it falls in their similarity domain. If so, the “firing” cells are ready to output their response in an orderly fashion giving the way to the cell which holds the smallest distance. If no cell fires and a teaching command is issued, the next available cell automatically learns the vector. Also, if a teaching command conflicts with the category that a firing cell, the latter automatically corrects itself by reducing its influence field.

This autonomous learning and recognition behavior pertains to the unique CogniMem parallel architecture and a patented Search and Sort process.
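The behavior described above (cells fire when an input falls inside their influence field, the closest firing cell answers, a new cell is committed when teaching finds no match, and a wrongly firing cell shrinks its field) can be sketched in software. This is a toy of my own, not CogniMem's implementation. The Manhattan distance, the rule for sizing a new cell's field, committing a new cell whenever no cell of the taught category fires, and all names are assumptions.

```python
class Cell:
    def __init__(self, vector, category, radius):
        self.vector = vector
        self.category = category
        self.radius = radius  # the cell's "influence field"

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

class RbfClassifier:
    def __init__(self, max_radius):
        self.max_radius = max_radius
        self.cells = []

    def _firing(self, vector):
        # Cells whose influence field contains the input, closest first.
        hits = [(manhattan(cell.vector, vector), cell) for cell in self.cells]
        return sorted((h for h in hits if h[0] < h[1].radius),
                      key=lambda h: h[0])

    def classify(self, vector):
        firing = self._firing(vector)
        return firing[0][1].category if firing else None

    def teach(self, vector, category):
        confirmed = False
        for distance, cell in self._firing(vector):
            if cell.category == category:
                confirmed = True
            else:
                # Conflict: shrink the field so this cell stops firing here.
                cell.radius = distance
        if not confirmed:
            # Commit a new cell; keep its field clear of rival categories.
            rivals = [manhattan(cell.vector, vector)
                      for cell in self.cells if cell.category != category]
            radius = min([self.max_radius] + rivals)
            self.cells.append(Cell(vector, category, radius))

clf = RbfClassifier(max_radius=4.0)
clf.teach((0.0, 0.0), "A")
clf.teach((3.0, 0.0), "B")   # conflicts with cell A, which shrinks to radius 3
near_a = clf.classify((1.0, 0.0))      # both cells fire, A is closer
near_b = clf.classify((2.5, 0.0))      # both cells fire, B is closer
unknown = clf.classify((10.0, 10.0))   # no cell fires
```

On the chip every cell computes its distance in parallel; the loop here does the same work one cell at a time.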

The website has a wealth of information and modules start at $175 per unit.

I first saw this in a tweet by Kirk Borne.

Inceptionism: Going Deeper into Neural Networks

Friday, June 19th, 2015

Inceptionism: Going Deeper into Neural Networks by Alexander Mordvintsev, Christopher Olah, and Mike Tyka.

From the post:

Artificial Neural Networks have spurred remarkable recent progress in image classification and speech recognition. But even though these are very useful tools based on well-known mathematical methods, we actually understand surprisingly little of why certain models work and others don’t. So let’s take a look at some simple techniques for peeking inside these networks.

We train an artificial neural network by showing it millions of training examples and gradually adjusting the network parameters until it gives the classifications we want. The network typically consists of 10-30 stacked layers of artificial neurons. Each image is fed into the input layer, which then talks to the next layer, until eventually the “output” layer is reached. The network’s “answer” comes from this final output layer.

One of the challenges of neural networks is understanding what exactly goes on at each layer. We know that after training, each layer progressively extracts higher and higher-level features of the image, until the final layer essentially makes a decision on what the image shows. For example, the first layer maybe looks for edges or corners. Intermediate layers interpret the basic features to look for overall shapes or components, like a door or a leaf. The final few layers assemble those into complete interpretations—these neurons activate in response to very complex things such as entire buildings or trees.

Have you ever looked under the hood of a neural network? If not, you are in for a real treat! As a bonus, this research may help you understand why some models work and others don’t.

Same title, but with images as seen by a neural network before it reaches an outcome.

I don’t think anyone has captured an interruption of image processing in the human brain. With a neural network, that is a reality.


The Unreasonable Effectiveness of Recurrent Neural Networks

Friday, May 22nd, 2015

The Unreasonable Effectiveness of Recurrent Neural Networks by Andrej Karpathy.

From the post:

There’s something magical about Recurrent Neural Networks (RNNs). I still remember when I trained my first recurrent network for Image Captioning. Within a few dozen minutes of training my first baby model (with rather arbitrarily-chosen hyperparameters) started to generate very nice looking descriptions of images that were on the edge of making sense. Sometimes the ratio of how simple your model is to the quality of the results you get out of it blows past your expectations, and this was one of those times. What made this result so shocking at the time was that the common wisdom was that RNNs were supposed to be difficult to train (with more experience I’ve in fact reached the opposite conclusion). Fast forward about a year: I’m training RNNs all the time and I’ve witnessed their power and robustness many times, and yet their magical outputs still find ways of amusing me. This post is about sharing some of that magic with you.

We’ll train RNNs to generate text character by character and ponder the question “how is that even possible?”

By the way, together with this post I am also releasing code on Github that allows you to train character-level language models based on multi-layer LSTMs. You give it a large chunk of text and it will learn to generate text like it one character at a time. You can also use it to reproduce my experiments below. But we’re getting ahead of ourselves; What are RNNs anyway?
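To make "one character at a time" concrete, here is a minimal sketch of the sampling loop only. It is not Karpathy's released code: it assumes a single-layer vanilla RNN rather than multi-layer LSTMs, the weights are random and untrained (so the output is gibberish), and the names `sample_text`, `W_xh`, `W_hh` and `W_hy` are illustrative.

```python
import math
import random

def sample_text(seed_char, length, vocab, hidden_size=8, rng=None):
    """Return seed_char followed by `length` characters sampled from an untrained RNN."""
    rng = rng or random.Random(0)
    V, H = len(vocab), hidden_size

    def rand(rows, cols):
        # Small random weights; a trained model would have learned these.
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    W_xh, W_hh, W_hy = rand(H, V), rand(H, H), rand(V, H)
    h = [0.0] * H                      # hidden state carried between steps
    idx = vocab.index(seed_char)
    out = [seed_char]
    for _ in range(length):
        # The input is one-hot, so W_xh @ x is simply column `idx` of W_xh.
        h = [math.tanh(W_xh[i][idx] + sum(W_hh[i][j] * h[j] for j in range(H)))
             for i in range(H)]
        logits = [sum(W_hy[k][j] * h[j] for j in range(H)) for k in range(V)]
        # Softmax over the vocabulary (shifted by the max for numerical stability).
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Sample the next character and feed it back in as the next input.
        idx = rng.choices(range(V), weights=probs)[0]
        out.append(vocab[idx])
    return "".join(out)

print(sample_text("a", 40, list("abcdefgh ")))
```

Training would adjust the three weight matrices so that the sampled characters resemble the training text; the generation loop itself stays the same.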

I try to blog or reblog about worthy posts by others, but every now and again I encounter a post that is stunning in its depth and usefulness.

This post by Andrej Karpathy is one of the stunning ones.

In addition to covering RNNs in general, he takes the reader on a tour of “Fun with RNNs,” which covers the application of RNNs to:

  • A Paul Graham generator
  • Shakespeare
  • Wikipedia
  • Algebraic Geometry (LaTeX)
  • Linux Source Code

Along with source code, Andrej provides a list of further reading.

What’s your example of using RNNs?

Exploring the Unknown Frontier of the Brain

Tuesday, April 7th, 2015

Exploring the Unknown Frontier of the Brain by James L. Olds.

From the post:

To a large degree, your brain is what makes you… you. It controls your thinking, problem solving and voluntary behaviors. At the same time, your brain helps regulate critical aspects of your physiology, such as your heart rate and breathing.

And yet your brain — a nonstop multitasking marvel — runs on only about 20 watts of energy, the same wattage as an energy-saving light bulb.

Still, for the most part, the brain remains an unknown frontier. Neuroscientists don’t yet fully understand how information is processed by the brain of a worm that has several hundred neurons, let alone by the brain of a human that has 80 billion to 100 billion neurons. The chain of events in the brain that generates a thought, behavior or physiological response remains mysterious.

Building on these and other recent innovations, President Barack Obama launched the Brain Research through Advancing Innovative Neurotechnologies Initiative (BRAIN Initiative) in April 2013. Federally funded in 2015 at $200 million, the initiative is a public-private research effort to revolutionize researchers’ understanding of the brain.

James reviews currently funded efforts under the BRAIN Initiative, each of which is pursuing possible ways to explore, model and understand brain activity. Exploration in its purest sense. The researchers don’t know what they will find.

I suspect the leap from not understanding <302 neurons in a worm to understanding the 80 to 100 billion neurons in each person is not going to happen anytime soon. Just as well: think of all the papers, conferences and publications along the way!

Classifying Plankton With Deep Neural Networks

Monday, March 23rd, 2015

Classifying Plankton With Deep Neural Networks by Sander Dieleman.

From the post:

The National Data Science Bowl, a data science competition where the goal was to classify images of plankton, has just ended. I participated with six other members of my research lab, the Reservoir lab of prof. Joni Dambre at Ghent University in Belgium. Our team finished 1st! In this post, we’ll explain our approach.

The ≋ Deep Sea ≋ team consisted of Aäron van den Oord, Ira Korshunova, Jeroen Burms, Jonas Degrave, Lionel Pigou, Pieter Buteneers and myself. We are all master students, PhD students and post-docs at Ghent University. We decided to participate together because we are all very interested in deep learning, and a collaborative effort to solve a practical problem is a great way to learn.

There were seven of us, so over the course of three months, we were able to try a plethora of different things, including a bunch of recently published techniques, and a couple of novelties. This blog post was written jointly by the team and will cover all the different ingredients that went into our solution in some detail.




The problem

The goal of the competition was to classify grayscale images of plankton into one of 121 classes. They were created using an underwater camera that is towed through an area. The resulting images are then used by scientists to determine which species occur in this area, and how common they are. There are typically a lot of these images, and they need to be annotated before any conclusions can be drawn. Automating this process as much as possible should save a lot of time!

The images obtained using the camera were already processed by a segmentation algorithm to identify and isolate individual organisms, and then cropped accordingly. Interestingly, the size of an organism in the resulting images is proportional to its actual size, and does not depend on the distance to the lens of the camera. This means that size carries useful information for the task of identifying the species. In practice it also means that all the images in the dataset have different sizes.

Participants were expected to build a model that produces a probability distribution across the 121 classes for each image. These predicted distributions were scored using the log loss (which corresponds to the negative log likelihood or equivalently the cross-entropy loss).

This loss function has some interesting properties: for one, it is extremely sensitive to overconfident predictions. If your model predicts a probability of 1 for a certain class, and it happens to be wrong, the loss becomes infinite. It is also differentiable, which means that models trained with gradient-based methods (such as neural networks) can optimize it directly – it is unnecessary to use a surrogate loss function.
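A small sketch of the log loss shows the overconfidence problem directly. It is an illustration, not the competition's scoring code: the function name `log_loss`, the three-class toy data and the optional clipping parameter `eps` are all assumptions made for the example (clipping is a common safeguard, and here it is applied without renormalizing the probabilities).

```python
import math

def log_loss(true_labels, predicted_probs, eps=None):
    """Mean negative log-probability assigned to the correct class.

    true_labels: list of integer class indices.
    predicted_probs: one list of per-class probabilities per example.
    eps: if given, clip the probability into [eps, 1 - eps] before the log.
    """
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        p = probs[label]
        if eps is not None:
            p = min(max(p, eps), 1.0 - eps)
        # math.log(0) raises an error, so treat a zero probability as infinite loss.
        total += -math.log(p) if p > 0.0 else math.inf
    return total / len(true_labels)

# Three classes for brevity (the competition had 121).
labels = [0, 2]
hedged = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
overconfident = [[0.7, 0.2, 0.1], [1.0, 0.0, 0.0]]  # certain, and wrong, on example 2

print(log_loss(labels, hedged))                # finite
print(log_loss(labels, overconfident))         # inf
print(log_loss(labels, overconfident, 1e-15))  # large but finite
```

A single prediction of probability 1 for the wrong class is enough to make the whole average infinite, which is why hedging even slightly matters under this metric.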

Interestingly, optimizing the log loss is not quite the same as optimizing classification accuracy. Although the two are obviously correlated, we paid special attention to this because it was often the case that significant improvements to the log loss would barely affect the classification accuracy of the models.
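The gap between log loss and accuracy is easy to reproduce with toy numbers. In the sketch below (made-up data, three classes, hypothetical names `model_a` and `model_b`), both models pick the same class for every example, so their accuracy is identical, yet the one that is more confident on its correct predictions has a much lower log loss.

```python
import math

def accuracy(labels, probs):
    """Fraction of examples whose highest-probability class is the true class."""
    hits = 0
    for y, p in zip(labels, probs):
        predicted = max(range(len(p)), key=p.__getitem__)
        hits += predicted == y
    return hits / len(labels)

def mean_log_loss(labels, probs):
    """Mean negative log-probability of the true class (assumes no zero probabilities)."""
    return sum(-math.log(p[y]) for y, p in zip(labels, probs)) / len(labels)

labels = [0, 1, 2, 0]
# Both models are right on the first three examples and wrong on the last one.
model_a = [[0.40, 0.30, 0.30], [0.30, 0.40, 0.30], [0.30, 0.30, 0.40], [0.30, 0.40, 0.30]]
model_b = [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.05, 0.05, 0.90], [0.30, 0.40, 0.30]]

for name, probs in (("A", model_a), ("B", model_b)):
    print(name, accuracy(labels, probs), round(mean_log_loss(labels, probs), 4))
```

Accuracy only looks at which class wins; log loss also rewards how much probability the winner received, so the two can move independently.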

This rocks!

Code is coming soon to GitHub!

Certainly of interest to marine scientists but also to anyone in bio-medical imaging.

The problem of too much data and too few experts is a common one.

What I don’t recall seeing are releases of pre-trained classifiers. Is the art developing too quickly for that to be a viable product? Just curious.

I first saw this in a tweet by Angela Zutavern.