## Archive for the ‘Deep Learning’ Category

### Porn, AI and Open Source Ethics

Thursday, February 8th, 2018

From the post:

In 2015, Google announced it would release its internal tool for developing artificial intelligence algorithms, TensorFlow, a move that would change the tone of how AI research and development would be conducted around the world. The means to build technology that could have an impact as profound as electricity, to borrow phrasing from Google’s CEO, would be open, accessible, and free to use. The barrier to entry was lowered from a Ph.D to a laptop.

But that also meant TensorFlow’s undeniable power was now out of Google’s control. For a little over two years, academia and Silicon Valley were still the ones making the biggest splashes with the software, but now that equation is changing. The catalyst is deepfakes, an anonymous Reddit user who built around AI software that automatically stitches any image of a face (nearly) seamlessly into a video. And you can probably imagine where this is going: As first reported by Motherboard, the software was being used to put anyone’s face, such as a famous woman or friend on Facebook, on the bodies of porn actresses.

After the first Motherboard story, the user created their own subreddit, which amassed more than 91,000 subscribers. Another Reddit user called deepfakeapp has also released a tool called FakeApp, which allows anyone to download the AI software and use it themselves, given the correct hardware. As of today, Reddit has banned the community, saying it violated the website’s policy on involuntary pornography.

According to FakeApp’s user guide, the software is built on top of TensorFlow. Google employees have pioneered similar work using TensorFlow with slightly different setups and subject matter, training algorithms to generate images from scratch. And there are plenty of potentially fun (if not inane) uses for deepfakes, like putting Nicolas Cage in a bunch of different movies. But let’s be real: 91,000 people were subscribed to deepfakes’ subreddit for the porn.

While much good has come from TensorFlow being open source, like potential cancer detection algorithms, FakeApp represents the dark side of open source. Google (and Microsoft and Amazon and Facebook) have loosed immense technological power on the world with absolutely no recourse. Anyone can download AI software and use it for anything they have the data to create. That means everything from faking political speeches (with help from the cadre of available voice-imitating AI) to generating fake revenge porn. All digital media is a series of ones and zeroes, and artificial intelligence is proving itself proficient at artfully arranging them to generate things that never happened.

You can imagine the rest or read the rest of Gershgon’s (deep voice): “dark side of open source.”

While you do, remember that Gershgon would have made the same claims about:

1. Telephones
2. Photography
3. Cable television
4. Internet
5. etc.

The simplest rejoinder is that the world did not create porn with AI. A tiny subset of the world signed up to see porn created by an even smaller subset of the world.

The next simplest rejoinder is the realization that Gershgon wants a system that dictates ethics to users of open source software. Gershgon should empower an agency to enforce ethics on journalists and check back in a couple of years to report on their experience.

I’m willing to be ahead of time it won’t be a happy report.

Bottom line: Leave the ethics of open source software to the people using such software. May not always have a happy outcome but will always be better than the alternatives.

### The Matrix Calculus You Need For Deep Learning

Wednesday, February 7th, 2018

Abstract:

This paper is an attempt to explain all the matrix calculus you need in order to understand the training of deep neural networks. We assume no math knowledge beyond what you learned in calculus 1, and provide links to help you refresh the necessary math where needed. Note that you do not need to understand this material before you start learning to train and use deep learning in practice; rather, this material is for those who are already familiar with the basics of neural networks, and wish to deepen their understanding of the underlying math. Don’t worry if you get stuck at some point along the way—just go back and reread the previous section, and try writing down and working through some examples. And if you’re still stuck, we’re happy to answer your questions in the Theory category at forums.fast.ai. Note: There is a reference section at the end of the paper summarizing all the key matrix calculus rules and terminology discussed here.

Here’s a recommendation for reading the paper:

(We teach in University of San Francisco’s MS in Data Science program and have other nefarious projects underway. You might know Terence as the creator of the ANTLR parser generator. For more material, see Jeremy’s fast.ai courses and University of San Francisco’s Data Institute in-person version of the deep learning course.

Apologies to Jeremy but I recognize ANTLR more quickly than I do Jeremy’s fast.ai courses. (Need to fix that.)

The paper runs thirty-three pages and as the authors say, most of it is unnecessary unless you want to understand what’s happening under the hood with deep learning.

Think of it as the difference between knowing how to drive a sports car and being able to work on a sports car.

With the latter set of skills, you can:

• tweak your sports car for maximum performance
• tweak someone else’s sports car for less performance
• detect someone tweaking your sports car

Read the paper, master the paper.

No test, just real world consequences that separate the prepared from the unprepared.

### Finally! A Main Stream Use for Deep Learning!

Tuesday, February 6th, 2018

Using deep learning to generate offensive license plates by Jonathan Nolis.

From the post:

If you’ve been on the internet for long enough you’ve seen quality content generated by deep learning algorithms. This includes algorithms trained on band names, video game titles, and Pokémon. As a data scientist who wants to keep up with modern tends in the field, I figured there would be no better way to learn how to use deep learning myself than to find a fun topic to generate text for. After having the desire to do this, I waited for a year before I found just the right data set to do it,

I happened to stumble on a list of banned license plates in Arizona. This list contains all of the personalized license plates that people requested but were denied by the Arizona Motor Vehicle Division. This dataset contained over 30,000 license plates which makes a great set of text for a deep learning algorithm. I included the data as text in my GitHub repository so other people can use it if they so choose. Unfortunately the data is from 2012, but I have an active Public Records Request to the state of Arizona for an updated list. I highly recommend you look through it, it’s very funny.

What a great idea! Not only are you learning deep learning but you are being offensive at the same time. A double-dipper!

A script for banging against your state license registration is left as an exercise for the reader.

A password generator using phonetics to spell offensive phrases for c-suite users would be nice.

### Secrets to Searching for Video Footage (AI Assistance In Your Future?)

Friday, January 12th, 2018

From the post:

Much of Bellingcat’s work requires intense research into particular events, which includes finding every possible photograph, video and witness account that will help inform our analysis. Perhaps most notably, we exhaustively researched the events surrounding the shoot down of Malaysian Airlines Flight 17 (MH17) over eastern Ukraine.

The photographs and videos taken near the crash in eastern Ukraine were not particularly difficult to find, as they were widely publicized. However, locating over a dozen photographs and videos of the Russian convoy transporting the Buk anti-aircraft missile launcher that shot down MH17 three weeks before the tragedy was much harder, and required both intense investigation on social networks and some creative thinking.

Most of these videos were shared on Russian-language social networks and YouTube, and did not involve another type of video that is much more important today than it was in 2014 — live streaming. Bellingcat has also made an effort to compile all user-generated videos of the events in Charlottesville on August 12, 2017, providing a database of livestreamed videos on platforms like Periscope, Ustream and Facebook Live, along with footage uploaded after the protest onto platforms like Twitter and YouTube.

Verifying videos is important, as detailed in this Bellingcat guide, but first you have to find them. This guide will provide advice and some tips on how to gather as much video as possible on a particular event, whether it is videos from witnesses of a natural disaster or a terrorist attack. For most examples in this guide, we will assume that the event is a large protest or demonstration, but the same advice is applicable to other events.

I was amused by this description of Snapchat and Instagram:

Snapchat and Instagram are two very common sources for videos, but also two of the most difficult platforms to trawl for clips. Neither has an intuitive search interface that easily allows researchers to sort through and collect videos.

I’m certain that’s true but a trained AI could sort out videos obtained by overly broad requests. As I’m fond of pointing out, not 100% accuracy but you can’t get that with humans either.

Augment your searching with a tireless AI. For best results, add or consult a librarian as well.

PS: I have other concerns at the moment but a subset of the Bellingcat Charlottesville database would make a nice training basis for an AI, which could then be loosed on Instagram and other sources to discover more videos. The usual stumbling block for AI projects being human curated material, which Bellingcat has already supplied.

### Tutorial on Deep Generative Models (slides and video)

Wednesday, December 27th, 2017

Slides for: Tutorial on Deep Generative Models by Shakir Mohamed and Danilo Rezende.

Abstract:

This tutorial will be a review of recent advances in deep generative models. Generative models have a long history at UAI and recent methods have combined the generality of probabilistic reasoning with the scalability of deep learning to develop learning algorithms that have been applied to a wide variety of problems giving state-of-the-art results in image generation, text-to-speech synthesis, and image captioning, amongst many others. Advances in deep generative models are at the forefront of deep learning research because of the promise they offer for allowing data-efficient learning, and for model-based reinforcement learning. At the end of this tutorial, audience member will have a full understanding of the latest advances in generative modelling covering three of the active types of models: Markov models, latent variable models and implicit models, and how these models can be scaled to high dimensional data. The tutorial will expose many questions that remain in this area, and for which thereremains a great deal of opportunity from members of the UAI community.

Deep sledding on the latest developments in deep generative models (August 2017 presentation) that ends with a bibliography starting on slide 84 of 96.

Depending on how much time has passed since the tutorial, try searching the topics as they are covered, keep a bibliography of your finds and compare it to that of the authors.

### Deep Learning for NLP, advancements and trends in 2017

Sunday, December 24th, 2017

If you didn’t get enough books as presents, Couto solves your reading shortage rather nicely:

Over the past few years, Deep Learning (DL) architectures and algorithms have made impressive advances in fields such as image recognition and speech processing.

Their application to Natural Language Processing (NLP) was less impressive at first, but has now proven to make significant contributions, yielding state-of-the-art results for some common NLP tasks. Named entity recognition (NER), part of speech (POS) tagging or sentiment analysis are some of the problems where neural network models have outperformed traditional approaches. The progress in machine translation is perhaps the most remarkable among all.

In this article I will go through some advancements for NLP in 2017 that rely on DL techniques. I do not pretend to be exhaustive: it would simply be impossible given the vast amount of scientific papers, frameworks and tools available. I just want to share with you some of the works that I liked the most this year. I think 2017 has been a great year for our field. The use of DL in NLP keeps widening, yielding amazing results in some cases, and all signs point to the fact that this trend will not stop.

After skimming this post, I suggest you make a fresh pot of coffee before starting to read and chase the references. It will take several days/pots to finish so it’s best to begin now.

### Deep Learning: Practice and Trends [NIPS 2017]

Wednesday, December 13th, 2017

NIPS 2017 Tutorial, Long Beach, CA.

The image is easier to read as the first slide but the dark blue line represents registrations versus time to the NIPS conference for 2017.

The hyperlinks for the authors are to their Twitter accounts. Need I say more?

Trivia question (before you review the slides): Name two early computer scientists who rejected the use of logic as the key to intelligence?

No prize, just curious if you know without the slides.

### Connecting R to Keras and TensorFlow

Tuesday, December 12th, 2017

Connecting R to Keras and TensorFlow by Joseph Rickert.

From the post:

It has always been the mission of R developers to connect R to the “good stuff”. As John Chambers puts it in his book Extending R:

One of the attractions of R has always been the ability to compute an interesting result quickly. A key motivation for the original S remains as important now: to give easy access to the best computations for understanding data.

From the day it was announced a little over two years ago, it was clear that Google’s TensorFlow platform for Deep Learning is good stuff. This September (see announcment), J.J. Allaire, François Chollet, and the other authors of the keras package delivered on R’s “easy access to the best” mission in a big way. Data scientists can now build very sophisticated Deep Learning models from an R session while maintaining the flow that R users expect. The strategy that made this happen seems to have been straightforward. But, the smooth experience of using the Keras API indicates inspired programming all the way along the chain from TensorFlow to R.

The Redditor deepfakes, of AI-Assisted Fake Porn fame mentions Keras as one of his tools. Is that an endorsement?

Rickert’s post is a quick start to Keras and Tensorflow but he does mention:

the MEAP from the forthcoming Manning Book, Deep Learning with R by François Chollet, the creator of Keras, and J.J. Allaire.

I’ve had good luck with Manning books in general so am looking forward to this one as well.

### 23 Deep Learning Papers To Get You Started — Part 1 (Reading Slowly)

Saturday, November 25th, 2017

23 Deep Learning Papers To Get You Started — Part 1 by Rupak Kr. Thakur.

Deep Learning has probably been the single-most discussed topic in the academia and industry in recent times. Today, it is no longer exclusive to an elite group of scientists. Its widespread applications warrants that people from all disciplines have an understanding of the underlying concepts, so as to be able to better apply these techniques in their field of work. As a result of which, MOOCs, certifications and bootcamps have flourished. People have generally preferred the hands-on learning experiences. However, there is a considerable population who still give in to the charm of learning the subject the traditional way — through research papers.

Reading research papers can be pretty time-consuming, especially since there are hordes of publications available nowadays, as Andrew Ng said at an AI conference, recently, along with encouraging people to use the existing research output to build truly transformative solutions across industries.

In this series of blog posts, I’ll try to condense the learnings from some really important papers into 15–20 min reads, without missing out on any key formulas or explanations. The blog posts are written, keeping in mind the people, who want to learn basic concepts and applications of deep learning, but can’t spend too much time scouring through the vast literature available. Each part of the blog will broadly cater to a theme and will introduce related key papers, along with suggesting some great papers for additional reading.

In the first part, we’ll explore papers related to CNNs — an important network architecture in deep learning. Let’s get started!

The start of what promises to be a great series on deep learning!

While the posts will extract the concepts and important points of the papers, I suggest you download the papers and map the summaries back to the papers themselves.

It will be good practice on reading original research, not to mention re-enforcing what you have learned from the posts.

In my reading, I will be looking for ways to influence deep learning towards one answer or another.

Whatever they may say about “facts” in public, no sane client asks for advice without an opinion on the range of acceptable answers.

Imagine you found ISIS content on Twitter has no measurable impact on ISIS recruiting. Would any intelligence agency would ask you for deep learning services again?

### Deep Learning for NLP Best Practices

Wednesday, July 26th, 2017

From the introduction:

This post is a collection of best practices for using neural networks in Natural Language Processing. It will be updated periodically as new insights become available and in order to keep track of our evolving understanding of Deep Learning for NLP.

There has been a running joke in the NLP community that an LSTM with attention will yield state-of-the-art performance on any task. While this has been true over the course of the last two years, the NLP community is slowly moving away from this now standard baseline and towards more interesting models.

However, we as a community do not want to spend the next two years independently (re-)discovering the next LSTM with attention. We do not want to reinvent tricks or methods that have already been shown to work. While many existing Deep Learning libraries already encode best practices for working with neural networks in general, such as initialization schemes, many other details, particularly task or domain-specific considerations, are left to the practitioner.

This post is not meant to keep track of the state-of-the-art, but rather to collect best practices that are relevant for a wide range of tasks. In other words, rather than describing one particular architecture, this post aims to collect the features that underly successful architectures. While many of these features will be most useful for pushing the state-of-the-art, I hope that wider knowledge of them will lead to stronger evaluations, more meaningful comparison to baselines, and inspiration by shaping our intuition of what works.

I assume you are familiar with neural networks as applied to NLP (if not, I recommend Yoav Goldberg’s excellent primer [43]) and are interested in NLP in general or in a particular task. The main goal of this article is to get you up to speed with the relevant best practices so you can make meaningful contributions as soon as possible.

I will first give an overview of best practices that are relevant for most tasks. I will then outline practices that are relevant for the most common tasks, in particular classification, sequence labelling, natural language generation, and neural machine translation.

Certainly a resource to bookmark while you read A Primer on Neural Network Models for Natural Language Processing by Yoav Goldberg, at 76 pages and to consult frequently as you move beyond the primer stage.

Enjoy and pass it on!

### Deep Learning – Dodging The NSA

Monday, May 29th, 2017

Ivanov’s motivation for local deep learning hardware came from monthly AWS bills.

You may suffer from those or be training on data sets you’d rather not share with the NSA.

For whatever reason, follow these detailed descriptions to build your own deep learning box.

Caution: If more than a month or more has lapsed from this post and your starting to build a system, check all the update links. Hardware and prices change rapidly.

### Weaponizing GPUs (Terrorism)

Monday, May 22nd, 2017

ETH Zurich scientists leveraged deep learning to automatically stich together millions of public images and video into a three-dimensional, living model of the city of Zurich.

The platform called “VarCity” combines a variety of different image sources: aerial photographs, 360-degree panoramic images taken from vehicles, photos published by tourists on social networks and video material from YouTube and public webcams.

“The more images and videos the platform can evaluate, the more precise the model becomes,” says Kenneth Vanhoey, a postdoc in the group led by Luc Van Gool, a Professor at ETH Zurich’s Computer Vision Lab. “The aim of our project was to develop the algorithms for such 3D city models, assuming that the volume of available images and videos will also increase dramatically in the years ahead.”

Using a cluster of GPUs including Tesla K40s with cuDNN to train their deep learning models, the technology recognizes image content such as buildings, windows and doors, streets, bodies of water, people, and cars. Without human assistance, the 3D model “knows”, for example, what pavements are and – by evaluating webcam data – which streets are one-way only.

The data/information gap between nation states and non-nation state groups grows narrower everyday. Here, GPUs and deep learning, produce planning data terrorists could have only dreamed about twenty years ago.

Technical advances make precautions such as:

Federal, state, and local law enforcement let people know that if they take pictures or notes around monuments and critical infrastructure facilities, they could be subject to an interrogation or an arrest; in addition to the See Something, Say Something awareness campaign, DHS also has broader initiatives such as the Buffer Zone Protection Program, which teach local police and security how to spot potential terrorist activities. (DHS focus on suspicious activity at critical infrastructure facilities)

sound old fashioned and quaint.

Such measures annoy tourists but unless potential terrorists are as dumb as the underwear bomber, against a skilled adversary, not so much.

I guess that’s the question isn’t it?

Are you planning to fight terrorists from shallow end of the gene pool or someone a little more challenging?

### DeepSketch2Face

Tuesday, May 16th, 2017

DeepSketch2Face: A Deep Learning Based Sketching System for 3D Face and Caricature Modeling by Xiaguang Han, Chang Gao, and Yizhou Yu.

Abstract:

Face modeling has been paid much attention in the field of visual computing. There exist many scenarios, including cartoon characters, avatars for social media, 3D face caricatures as well as face-related art and design, where low-cost interactive face modeling is a popular approach especially among amateur users. In this paper, we propose a deep learning based sketching system for 3D face and caricature modeling. This system has a labor-efficient sketching interface, that allows the user to draw freehand imprecise yet expressive 2D lines representing the contours of facial features. A novel CNN based deep regression network is designed for inferring 3D face models from 2D sketches. Our network fuses both CNN and shape based features of the input sketch, and has two independent branches of fully connected layers generating independent subsets of coefficients for a bilinear face representation. Our system also supports gesture based interactions for users to further manipulate initial face models. Both user studies and numerical results indicate that our sketching system can help users create face models quickly and effectively. A significantly expanded face database with diverse identities, expressions and levels of exaggeration is constructed to promote further research and evaluation of face modeling techniques.

Deep learning assisted drawing, here with faces or drawing more generally, is rife with possibilities for humor.

Realistic caricature/avatars are nearly within the reach of even art-challenged users.

### Virtual Jihadists (Bots)

Saturday, March 4th, 2017

Chip Huyen, who teaches CS 20SI: “TensorFlow for Deep Learning Research” @Standford, has posted code examples for the class, along with a chatbot, developed for one of the assignments.

The readme for the chatbot reads in part:

A neural chatbot using sequence to sequence model with attentional decoder. This is a fully functional chatbot.

This is based on Google Translate Tensorflow model https://github.com/tensorflow/models/blob/master/tutorials/rnn/translate/

Sequence to sequence model by Cho et al.(2014)

Created by Chip Huyen as the starter code for assignment 3, class CS 20SI: “TensorFlow for Deep Learning Research” cs20si.stanford.edu

The detailed assignment handout and information on training time can be found at http://web.stanford.edu/class/cs20si/assignments/a3.pdf

Dialogue is lacking but this chatbot could be trained to appear to government forces as a live “jihadist” following and conversing with other “jihadists.” Who may themselves be chatbots.

Unlike the expense of pilots for a fleet of drones, a single user could “pilot” a group of chatbots, creating an over-sized impression in cyberspace. The deeper the modeling of human jihadists, the harder it will be to distinguish virtual jihadists.

I say “jihadists” for headline effect. You could create interacting chatbots for right/left wing hate groups, gun owners, churches, etc., in short, anyone seeking to dilute surveillance.

(Unlike the ACLU or EFF, I don’t concede there are any legitimate reasons for government surveillance. The dangers of government surveillance far exceed any possible crime it could prevent. Government surveillance is the question. The answer is NO.)

CS 20SI: Tensorflow for Deep Learning Research

From the webpage:

Tensorflow is a powerful open-source software library for machine learning developed by researchers at Google Brain. It has many pre-built functions to ease the task of building different neural networks. Tensorflow allows distribution of computation across different computers, as well as multiple CPUs and GPUs within a single machine. TensorFlow provides a Python API, as well as a less documented C++ API. For this course, we will be using Python.

This course will cover the fundamentals and contemporary usage of the Tensorflow library for deep learning research. We aim to help students understand the graphical computational model of Tensorflow, explore the functions it has to offer, and learn how to build and structure models best suited for a deep learning project. Through the course, students will use Tensorflow to build models of different complexity, from simple linear/logistic regression to convolutional neural network and recurrent neural networks with LSTM to solve tasks such as word embeddings, translation, optical character recognition. Students will also learn best practices to structure a model and manage research experiments.

Enjoy!

### AI Podcast: Winning the Cybersecurity Cat and Mouse Game with AI

Wednesday, February 22nd, 2017

AI Podcast: Winning the Cybersecurity Cat and Mouse Game with AI. Brian Caulfield interviews Eli David of Deep Instinct.

From the description:

Cybersecurity is a cat-and-mouse game. And the mouse always has the upper hand. That’s because it’s so easy for new malware to go undetected.

Eli David, an expert in computational intelligence, wants to use AI to change that. He’s CTO of Deep Instinct, a security firm with roots in Israel’s defense industry, that is bringing the GPU-powered deep learning techniques underpinning modern speech and image recognition to the vexing world of cybersecurity.

“It’s exactly like Tom and Jerry, the cat and the mouse, with the difference being that, in this case, Jerry the mouse always has the upper hand,” David said in a conversation on the AI Podcast with host Michael Copeland. He notes that more than 1 million new pieces of malware are created every day.

Interesting take on detection of closely similar malware using deep learning.

Directed in part at detecting smallish modifications that evade current malware detection techniques.

OK, but who is working on using deep learning to discover flaws in software code?

### Deep Learning (MIT Press Book) – Published (and still online)

Monday, February 13th, 2017

Deep Learning by Yoshua Bengio, Ian Goodfellow and Aaron Courville.

From the introduction:

1.1 Who Should Read This Book?

This book can be useful for a variety of readers, but we wrote it with two main target audiences in mind. One of these target audiences is university students(undergraduate or graduate) learning about machine learning, including those who are beginning a career in deep learning and artiﬁcial intelligence research. The other target audience is software engineers who do not have a machine learning or statistics background, but want to rapidly acquire one and begin using deep learning in their product or platform. Deep learning has already proven useful in many software disciplines including computer vision, speech and audio processing,natural language processing, robotics, bioinformatics and chemistry, video games,search engines, online advertising and ﬁnance.

This book has been organized into three parts in order to best accommodate a variety of readers. Part I introduces basic mathematical tools and machine learning concepts. Part II describes the most established deep learning algorithms that are essentially solved technologies. Part III describes more speculative ideas that are widely believed to be important for future research in deep learning.

Readers should feel free to skip parts that are not relevant given their interests or background. Readers familiar with linear algebra, probability, and fundamental machine learning concepts can skip part I, for example, while readers who just want to implement a working system need not read beyond part II. To help choose which chapters to read, ﬁgure 1.6 provides a ﬂowchart showing the high-level organization of the book.

We do assume that all readers come from a computer science background. We assume familiarity with programming, a basic understanding of computational performance issues, complexity theory, introductory level calculus and some of the terminology of graph theory.

This promises to be a real delight, whether read for an application space or to get a better handle on deep learning.

### Comparing Symbolic Deep Learning Frameworks

Thursday, December 8th, 2016

Deep Learning Part 1: Comparison of Symbolic Deep Learning Frameworks by Anusua Trivedi.

From the post:

This blog series is based on my upcoming talk on re-usability of Deep Learning Models at the Hadoop+Strata World Conference in Singapore. This blog series will be in several parts – where I describe my experiences and go deep into the reasons behind my choices.

Deep learning is an emerging field of research, which has its application across multiple domains. I try to show how transfer learning and fine tuning strategy leads to re-usability of the same Convolution Neural Network model in different disjoint domains. Application of this model across various different domains brings value to using this fine-tuned model.

In this blog (Part1), I describe and compare the commonly used open-source deep learning frameworks. I dive deep into different pros and cons for each framework, and discuss why I chose Theano for my work.

Your mileage may vary but a great starting place!

### srez: Image super-resolution through deep learning

Sunday, August 28th, 2016

From the webpage:

Image super-resolution through deep learning. This project uses deep learning to upscale 16×16 images by a 4x factor. The resulting 64×64 images display sharp features that are plausible based on the dataset that was used to train the neural net.

Here’s an random, non cherry-picked, example of what this network can do. From left to right, the first column is the 16×16 input image, the second one is what you would get from a standard bicubic interpolation, the third is the output generated by the neural net, and on the right is the ground truth.

Once you have collected names, you are likely to need image processing.

Here’s an interesting technique using deep learning. Face on at the moment but you can expect that to improve.

### What’s the Difference Between Artificial Intelligence, Machine Learning, and Deep Learning?

Friday, August 19th, 2016

From the post:

Artificial intelligence is the future. Artificial intelligence is science fiction. Artificial intelligence is already part of our everyday lives. All those statements are true, it just depends on what flavor of AI you are referring to.

For example, when Google DeepMind’s AlphaGo program defeated South Korean Master Lee Se-dol in the board game Go earlier this year, the terms AI, machine learning, and deep learning were used in the media to describe how DeepMind won. And all three are part of the reason why AlphaGo trounced Lee Se-Dol. But they are not the same things.

The easiest way to think of their relationship is to visualize them as concentric circles with AI — the idea that came first — the largest, then machine learning — which blossomed later, and finally deep learning — which is driving today’s AI explosion — fitting inside both.

If you are confused by the mix of artificial intelligence, machine learning, and deep learning, floating around, Copeland will set you straight.

It’s a fun read and one you can recommend to non-technical friends.

### Grokking Deep Learning

Wednesday, August 17th, 2016

Grokking Deep Learning by Andrew W. Trask.

From the description:

Artificial Intelligence is the most exciting technology of the century, and Deep Learning is, quite literally, the “brain” behind the world’s smartest Artificial Intelligence systems out there. Loosely based on neuron behavior inside of human brains, these systems are rapidly catching up with the intelligence of their human creators, defeating the world champion Go player, achieving superhuman performance on video games, driving cars, translating languages, and sometimes even helping law enforcement fight crime. Deep Learning is a revolution that is changing every industry across the globe.

Grokking Deep Learning is the perfect place to begin your deep learning journey. Rather than just learn the “black box” API of some library or framework, you will actually understand how to build these algorithms completely from scratch. You will understand how Deep Learning is able to learn at levels greater than humans. You will be able to understand the “brain” behind state-of-the-art Artificial Intelligence. Furthermore, unlike other courses that assume advanced knowledge of Calculus and leverage complex mathematical notation, if you’re a Python hacker who passed high-school algebra, you’re ready to go. And at the end, you’ll even build an A.I. that will learn to defeat you in a classic Atari game.

In the Manning Early Access Program (MEAP) with three (3) chapters presently available.

A much more plausible undertaking than DARPA’s quest for “Explainable AI” or “XAI.” (DARPA WANTS ARTIFICIAL INTELLIGENCE TO EXPLAIN ITSELF) DARPA reasons that:

Potential applications for defense are endless—autonomous aerial and undersea war-fighting or surveillance, among others—but humans won’t make full use of AI until they trust it won’t fail, according to the Defense Advanced Research Projects Agency. A new DARPA effort aims to nurture communication between machines and humans by investing in AI that can explain itself as it works.

If non-failure is the criteria for trust, U.S. troops should refuse to leave their barracks in view of the repeated failures of military strategy since the end of WWII.

DARPA should choose a less stringent criteria for trusting an AI. However, failing less often than the Joint Chiefs of Staff may be too low a bar to set.

### Deep Learning Trends @ ICLR 2016 (+ Shout-Out to arXiv)

Friday, June 3rd, 2016

From the post:

Started by the youngest members of the Deep Learning Mafia [1], namely Yann LeCun and Yoshua Bengio, the ICLR conference is quickly becoming a strong contender for the single most important venue in the Deep Learning space. More intimate than NIPS and less benchmark-driven than CVPR, the world of ICLR is arXiv-based and moves fast.

Today’s post is all about ICLR 2016. I’ll highlight new strategies for building deeper and more powerful neural networks, ideas for compressing big networks into smaller ones, as well as techniques for building “deep learning calculators.” A host of new artificial intelligence problems is being hit hard with the newest wave of deep learning techniques, and from a computer vision point of view, there’s no doubt that deep convolutional neural networks are today’s “master algorithm” for dealing with perceptual data.

Information packed review of the conference and if that weren’t enough, this shout-out to arXiv:

ICLR Publishing Model: arXiv or bust
At ICLR, papers get posted on arXiv directly. And if you had any doubts that arXiv is just about the single awesomest thing to hit the research publication model since the Gutenberg press, let the success of ICLR be one more data point towards enlightenment. ICLR has essentially bypassed the old-fashioned publishing model where some third party like Elsevier says “you can publish with us and we’ll put our logo on your papers and then charge regular people $30 for each paper they want to read.” Sorry Elsevier, research doesn’t work that way. Most research papers aren’t good enough to be worth$30 for a copy. It is the entire body of academic research that provides true value, for which a single paper just a mere door. You see, Elsevier, if you actually gave the world an exceptional research paper search engine, together with the ability to have 10-20 papers printed on decent quality paper for a \$30/month subscription, then you would make a killing on researchers and I would endorse such a subscription. So ICLR, rightfully so, just said fuck it, we’ll use arXiv as the method for disseminating our ideas. All future research conferences should use arXiv to disseminate papers. Anybody can download the papers, see when newer versions with corrections are posted, and they can print their own physical copies. But be warned: Deep Learning moves so fast, that you’ve gotta be hitting refresh or arXiv on a weekly basis or you’ll be schooled by some grad students in Canada.

Is your publishing < arXiv?

Do you hit arXiv every week?

### Deep Learning: Image Similarity and Beyond (Webinar, May 10, 2016)

Friday, May 6th, 2016

Deep Learning: Image Similarity and Beyond (Webinar, May 10, 2016)

From the registration page:

Deep Learning is a powerful machine learning method for image tagging, object recognition, speech recognition, and text analysis. In this demo, we’ll cover the basic concept of deep learning and walk you through the steps to build an application that finds similar images using an already-trained deep learning model.

#### Recommended for:

• Data scientists and engineers
• Developers and technical team managers
• Technical product managers

What you’ll learn:

• How to leverage existing deep learning models
• How to extract deep features and use them using GraphLab Create
• How to build and deploy an image similarity service using Dato Predictive Services

What we’ll cover:

• Using an already-trained deep learning model
• Extracting deep features
• Building and deploying an image similarity service for pictures

Deep learning has difficulty justifying its choices, just like human judges of similarity, but could it play a role in assisting topic map authors in constructing explicit decisions for merging?

Once trained, could deep learning suggest properties and/or values to consider for merging it has not yet experienced?

I haven’t seen any webinars recently so I am ready to gamble on this being an interesting one.

Enjoy!

### Revealing the Hidden Patterns of News Photos:… [Uncovers Anti-Sanders Bias]

Saturday, March 26th, 2016

Abstract:

In this work, we analyze more than two million news photos published in January 2016. We demonstrate i) which objects appear the most in news photos; ii) what the sentiments of news photos are; iii) whether the sentiment of news photos is aligned with the tone of the text; iv) how gender is treated; and v) how differently political candidates are portrayed. To our best knowledge, this is the first large-scale study of news photo contents using deep learning-based vision APIs.

Not that bias-free news is possible, but deep learning appears to be useful in foregrounding bias against particular candidates:

We then conducted a case study of assessing the portrayal of Democratic and Republican party presidential candidates in news photos. We found that all the candidates but Sanders had a similar proportion of being labeled as an athlete, which is typically associates with a victory pose or a sharp focus on a face with blurred background. Pro-Clinton media recognized by their endorsements show the same tendency; their Sanders photos are not labeled as an athlete at all. Furthermore, we found that Clinton expresses joy more than Sanders does in the six popular news media. Similarly. pro-Clinton media shows a higher proportion of Clinton expressing joy than Sanders.

If the requirement is an “appearance” of lack of bias, the same techniques enable the monitoring/shaping of your content to prevent your bias from being discovered by others.

Data scientists who can successfully wield this framework will be in high demand for political campaigns.

### Automating Amazon/Hotel/Travel Reviews (+ Human Intelligence Test (HIT))

Sunday, February 28th, 2016

The Neural Network That Remembers by Zachary C. Lipton & Charles Elkan.

From the post:

On tap at the brewpub. A nice dark red color with a nice head that left a lot of lace on the glass. Aroma is of raspberries and chocolate. Not much depth to speak of despite consisting of raspberries. The bourbon is pretty subtle as well. I really don’t know that find a flavor this beer tastes like. I would prefer a little more carbonization to come through. It’s pretty drinkable, but I wouldn’t mind if this beer was available.

Besides the overpowering bouquet of raspberries in this guy’s beer, this review is remarkable for another reason. It was produced by a computer program instructed to hallucinate a review for a “fruit/vegetable beer.” Using a powerful artificial-intelligence tool called a recurrent neural network, the software that produced this passage isn’t even programmed to know what words are, much less to obey the rules of English syntax. Yet, by mining the patterns in reviews from the barflies at BeerAdvocate.com, the program learns how to generate similarly coherent (or incoherent) reviews.

The neural network learns proper nouns like “Coors Light” and beer jargon like “lacing” and “snifter.” It learns to spell and to misspell, and to ramble just the right amount. Most important, the neural network generates reviews that are contextually relevant. For example, you can say, “Give me a 5-star review of a Russian imperial stout,” and the software will oblige. It knows to describe India pale ales as “hoppy,” stouts as “chocolatey,” and American lagers as “watery.” The neural network also learns more colorful words for lagers that we can’t put in print.

This particular neural network can also run in reverse, taking any review and recognizing the sentiment (star rating) and subject (type of beer). This work, done by one of us (Lipton) in collaboration with his colleagues Sharad Vikram and Julian McAuley at the University of California, San Diego, is part of a growing body of research demonstrating the language-processing capabilities of recurrent networks. Other related feats include captioning images, translating foreign languages, and even answering e-mail messages. It might make you wonder whether computers are finally able to think.

(emphasis in original)

An enthusiastic introduction and projection of the future of recurrent neural networks! Quite a bit so.

My immediate thought was what a time saver a recurrent neural network would be for “evaluation” requests that appear in my inbox with alarming regularity.

What about a service that accepts forwarded emails and generates a review for the book, seller, hotel, travel, etc., which is returned to you for cut-n-paste?

That would be about as “intelligent” as the amount of attention most of us devote to such requests.

You could set the service to mimic highly followed reviewers so over time you would move up the ranks of reviewers.

I mention Amazon, hotel, travel reviews but those are just low-lying fruit. You could do journal book reviews with a different data set.

Near the end of the post the authors write:

In this sense, the computer-science community is evaluating recurrent neural networks via a kind of Turing test. We try to teach a computer to act intelligently by training it to imitate what people produce when faced with the same task. Then we evaluate our thinking machine by seeing whether a human judge can distinguish between its output and what a human being might come up with.

While the very fact that we’ve come this far is exciting, this approach may have some fundamental limitations. For instance, it’s unclear how such a system could ever outstrip the capabilities of the people who provide the training data. Teaching a machine to learn through imitation might never produce more intelligence than was present collectively in those people.

One promising way forward might be an approach called reinforcement learning. Here, the computer explores the possible actions it can take, guided only by some sort of reward signal. Recently, researchers at Google DeepMind combined reinforcement learning with feed-forward neural networks to create a system that can beat human players at 31 different video games. The system never got to imitate human gamers. Instead it learned to play games by trial and error, using its score in the video game as a reward signal.

Instead of asking whether computers can think, the more provocative question is “whether people think for a large range of daily activities?”

Consider it as the Human Intelligence Test (HIT).

How much “intelligence” does it take to win a video game?

Eye/hand coordination to be sure, attention, but what “intelligence” is involved?

Computers may “eclipse” human beings at non-intelligent activities, as a shovel “eclipses” our ability to dig with our bare hands.

But I’m not overly concerned.

Are you?

### Webinar: Image Similarity: Deep Learning and Beyond (January 12th/Register for Recording)

Monday, January 11th, 2016

From the webpage:

In this talk, we will extract features from the convolutional networks applied to real estate images to build a similarity graph and then do label propagation on the images to label different images in our dataset.

Recommended for:

• Data scientists and engineers
• Developers and technical team managers
• Technical product managers

What you’ll learn:

• How to extract features from a convolutional network using GraphLab Create
• How to build similarity graphs using nearest neighbors
• How to implement graph algorithms such as PageRank using GraphLab Create

What we’ll cover:

• Extracting features from convolutional networks
• Building similarity graphs using nearest neighbors
• Clustering: kmeans and beyond
• Graph algorithms: PageRank and label propagation

I had mixed results with webinars in 2015.

Looking forward to this one because of the coverage of similarity graphs.

From a subject identity perspective, how much similarity do you need to be the “same” subject?

If I have two books, one printed within the copyright period and another copy printed after the work came into the public domain, are they the same subject?

For some purposes yes and for other purposes not.

The strings we give web browsers, usually starting with “https://” these days, are crude measures of subject identity, don’t you think?

I say “the strings we give web browsers” as the efforts of TBL and his cronies to use popularity as a measure of success, continue their efforts to conflate URI, IRI, and URL into only URL. https://url.spec.whatwg.org/ The simplification doesn’t bother me as much as the attempts to conceal it.

It’s one way to bolster a claim to have anyways been right, just re-write the records that anyone is likely to remember. I prefer my history with warts and all.

### Awesome Deep Learning – Value-Add Curation?

Monday, December 28th, 2015

Tweeted by Gregory Piatetsky as:

Awesome Curated #DeepLearning resources on #GitHub: books, courses, lectures, researchers…

What will you find there? (As of 28 December 2015):

• Courses – 15
• Datasets – 114
• Free Online Books – 8
• Frameworks – 35
• Miscellaneous – 26
• Papers – 32
• Researchers – 96
• Tutorials – 13
• Videos and Lectures – 16
• Websites – 24

By my count, that’s 359 resources.

We know from detailed analysis of PubMed search logs, that 80% of searchers choose a link from the first twenty “hits” returned for a search.

You could assume that out of “23 million user sessions and more than 58 million user queries” PubMed searchers and/or PubMed itself or both transcend the accuracy of searching observed in other contexts. That seems rather unlikely.

The authors note:

Two interesting phenomena are observed: first, the number of clicks for the documents in the later pages degrades exponentially (Figure 8). Second, PubMed users are more likely to click the first and last returned citation of each result page (Figure 9). This suggests that rather than simply following the retrieval order of PubMed, users are influenced by the results page format when selecting returned citations.

Result page format seems like a poor basis for choosing search results, in addition to being in the top twenty (20) results.

Eliminating all the cruft from search results to give you 359 resources is a value-add, but what value-add should added to this list of resources?

What are the top five (5) value-adds on your list?

Serious question because we have tools far beyond what were available to curators in the 1960’s but there is little (if any) curation to match of the Reader’s Guide to Periodical Literature.

Here is a screen-shot of some of its contents:

If you can, tell me what search you would use to return that sort of result for “abortion” as a subject.

Nothing come to mind?

Just to get you started, would pointing to algorithms across these 359 resources be helpful? Would you want to know more than algorithm N occurs in resource Y? Some of the more popular ones may occur in every resource. How helpful is that?

So I repeat my earlier question:

What are the top five (5) value-adds on your list?

Please forward, repost, reblog, tweet. Thanks!

### The Limitations of Deep Learning in Adversarial Settings [The other type of setting would be?]

Tuesday, November 24th, 2015

Abstract:

Deep learning takes advantage of large datasets and computationally efficient training algorithms to outperform other approaches at various machine learning tasks. However, imperfections in the training phase of deep neural networks make them vulnerable to adversarial samples: inputs crafted by adversaries with the intent of causing deep neural networks to misclassify. In this work, we formalize the space of adversaries against deep neural networks (DNNs) and introduce a novel class of algorithms to craft adversarial samples based on a precise understanding of the mapping between inputs and outputs of DNNs. In an application to computer vision, we show that our algorithms can reliably produce samples correctly classified by human subjects but misclassified in specific targets by a DNN with a 97% adversarial success rate while only modifying on average 4.02% of the input features per sample. We then evaluate the vulnerability of different sample classes to adversarial perturbations by defining a hardness measure. Finally, we describe preliminary work outlining defenses against adversarial samples by defining a predictive measure of distance between a benign input and a target classification.

I recommended deep learning for parsing lesser known languages earlier today. The utility of deep learning isn’t in doubt, but its vulnerability to “adversarial” input should give us pause.

Adversarial input isn’t likely to be labeled as such. In fact, it may be concealed in ordinary open data that is freely available for download.

As the authors note, the more prevalent deep learning becomes, the greater the incentive for the manipulation of input into a deep neural network (DNN).

Although phrased as “adversaries,” the manipulation of input into DNNs isn’t limited to the implied “bad actors.” The choice or “cleaning” of input could be considered manipulation of input, from a certain point of view.

This paper is notice that input into a DNN is as important in evaluating its results as as any other factor, if not more so.

Or to put it more bluntly, no disclosure of DNN data = no trust of DNN results.

### Deep Learning and Parsing

Sunday, November 22nd, 2015

Jason Baldridge tweets that the work of James Henderson (Google Scholar) should get more cites for deep learning and parsing.

Jason points to the following two works (early 1990’s) in particular:

Description Based Parsing in a Connectionist Network by James B. Henderson.

Abstract:

Recent developments in connectionist architectures for symbolic computation have made it possible to investigate parsing in a connectionist network while still taking advantage of the large body of work on parsing in symbolic frameworks. This dissertation investigates syntactic parsing in the temporal synchrony variable binding model of symbolic computation in a connectionist network. This computational architecture solves the basic problem with previous connectionist architectures,
while keeping their advantages. However, the architecture does have some limitations, which impose computational constraints on parsing in this architecture. This dissertation argues that, despite these constraints, the architecture is computationally adequate for syntactic parsing, and that these constraints make signi cant linguistic predictions. To make these arguments, the nature of the architecture’s limitations are fi rst characterized as a set of constraints on symbolic
computation. This allows the investigation of the feasibility and implications of parsing in the architecture to be investigated at the same level of abstraction as virtually all other investigations of syntactic parsing. Then a specifi c parsing model is developed and implemented in the architecture. The extensive use of partial descriptions of phrase structure trees is crucial to the ability of this model to recover the syntactic structure of sentences within the constraints. Finally, this parsing model is tested on those phenomena which are of particular concern given the constraints, and on an approximately unbiased sample of sentences to check for unforeseen difficulties. The results show that this connectionist architecture is powerful enough for syntactic parsing. They also show that some linguistic phenomena are predicted by the limitations of this architecture. In particular, explanations are given for many cases of unacceptable center embedding, and for several signifi cant constraints on long distance dependencies. These results give evidence for the cognitive signi ficance
of this computational architecture and parsing model. This work also shows how the advantages of both connectionist and symbolic techniques can be uni ed in natural language processing applications. By analyzing how low level biological and computational considerations influence higher level processing, this work has furthered our understanding of the nature of language and how it can be efficiently and e ffectively processed.

Connectionist Syntactic Parsing Using Temporal Variable Binding by James Henderson.

Abstract:

Recent developments in connectionist architectures for symbolic computation have made it possible to investigate parsing in a connectionist network while still taking advantage of the large body of work on parsing in symbolic frameworks. The work discussed here investigates syntactic parsing in the temporal synchrony variable binding model of symbolic computation in a connectionist network. This computational architecture solves the basic problem with previous connectionist architectures, while keeping their advantages. However, the architecture does have some limitations, which impose constraints on parsing in this architecture. Despite these constraints, the architecture is computationally adequate for syntactic parsing. In addition, the constraints make some signifi cant linguistic predictions. These arguments are made using a specifi c parsing model. The extensive use of partial descriptions of phrase structure trees is crucial to the ability of this model to recover the syntactic structure of sentences within the constraints imposed by the architecture.

Enjoy!

### Hopping on the Deep Learning Bandwagon

Thursday, November 5th, 2015

Hopping on the Deep Learning Bandwagon by Yanir Seroussi.

From the post:

I’ve been meaning to get into deep learning for the last few years. Now, the stars having finally aligned and I have the time and motivation to work on a small project that will hopefully improve my understanding of the field. This is the first in a series of posts that will document my progress on this project.

As mentioned in a previous post on getting started as a data scientist, I believe that the best way of becoming proficient at solving data science problems is by getting your hands dirty. Despite being familiar with high-level terminology and having some understanding of how it all works, I don’t have any practical experience applying deep learning. The purpose of this project is to fix this experience gap by working on a real problem.

#### The problem: Inferring genre from album covers

Deep learning has been very successful at image classification. Therefore, it makes sense to work on an image classification problem for this project. Rather than using an existing dataset, I decided to make things a bit more interesting by building my own dataset. Over the last year, I’ve been running BCRecommender – a recommendation system for Bandcamp music. I’ve noticed that album covers vary by genre, though it’s hard to quantify exactly how they vary. So the question I’ll be trying to answer with this project is how accurately can genre be inferred from Bandcamp album covers?

As the goal of this project is to learn about deep learning rather than make a novel contribution, I didn’t do a comprehensive search to see whether this problem has been addressed before. However, I did find a recent post by Alexandre Passant that describes his use of Clarifai’s API to tag the content of Spotify album covers (identifying elements such as men, night, dark, etc.), and then using these tags to infer the album’s genre. Another related project is Karayev et al.’s Recognizing image style paper, in which the authors classified datasets of images from Flickr and Wikipedia by style and art genre, respectively. In all these cases, the results are pretty good, supporting my intuition that the genre inference task is feasible.

Yanir continues this adventure into deep learning with: Learning About Deep Learning Through Album Cover Classification. And you will want to look over his list of Deep Learning Resources.

Yanir’s observation that the goal of the project was “…to learn about deep learning rather than make a novel contribution…” is an important one.

The techniques and lessons you learn may be known to others but they will be new to you.

### How to build and run your first deep learning network

Thursday, October 29th, 2015

From the post:

When I first became interested in using deep learning for computer vision I found it hard to get started. There were only a couple of open source projects available, they had little documentation, were very experimental, and relied on a lot of tricky-to-install dependencies. A lot of new projects have appeared since, but they’re still aimed at vision researchers, so you’ll still hit a lot of the same obstacles if you’re approaching them from outside the field.

In this article — and the accompanying webcast — I’m going to show you how to run a pre-built network, and then take you through the steps of training your own. I’ve listed the steps I followed to set up everything toward the end of the article, but because the process is so involved, I recommend you download a Vagrant virtual machine that I’ve pre-loaded with everything you need. This VM lets us skip over all the installation headaches and focus on building and running the neural networks.

I have been unable to find the posts that were to follow in this series.

Even by itself this will be enough to get you going on deep learning but the additional posts would be nice.

Pointers anyone?