Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

January 3, 2015

Minerva: a fast and flexible system for deep learning

Filed under: Deep Learning,GPU — Patrick Durusau @ 7:57 pm

Minerva: a fast and flexible system for deep learning

From the post:

Minerva is a fast and flexible tool for deep learning. It provides an NDarray programming interface, just like Numpy. Python bindings and C++ bindings are both available. The resulting code can be run on CPU or GPU. Multi-GPU support is very easy. Please refer to the examples to see how the multi-GPU setting is used.

Features

  • Matrix programming interface
  • Easy interaction with NumPy
  • Multi-GPU, multi-CPU support
  • Good performance: ImageNet AlexNet training achieves 213 and 403 images/s with one and two Titan GPUs, respectively. Numbers for four GPU cards are coming soon.
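Minerva’s own bindings aren’t spelled out in the excerpt, so here is the kind of NumPy-style array code such an NDarray interface mirrors, written against plain NumPy (a sketch of the programming model, not Minerva’s actual API):

    import numpy as np

    # A NumPy stand-in for the NDarray style Minerva describes: a single
    # dense-layer forward pass. In Minerva, the same array expressions
    # could be dispatched to one or more GPUs.
    x = np.random.randn(256, 784)          # batch of 256 flattened images
    W = np.random.randn(784, 100) * 0.01   # weights for a 100-unit layer
    b = np.zeros(100)                      # biases

    h = np.tanh(x @ W + b)                 # activations, shape (256, 100)
    print(h.shape)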

I first saw this in a blog post by Danny Bickson, Minerva: open source deep learning on GPU software from MS.

Deep learning is gaining traction fast. Fast enough that when government contractors convince the FBI that wiretapping is no longer a matter of plugging into the local junction box, they may start working on deep learning.

Before deep learning gets to that point, defensive measures against deep learning need to be developed. Given the variety of deep learning approaches and algorithms, that is going to be a real challenge.

Perhaps immutable data structures where copying enables real-time performance in presenting the results that are expected, while maintaining a copy of the unexpected results?

I think there is a presumption that on querying, information systems repeat the information they have stored. That’s a fairly naive view of data storage. We know it is a matter of permissions to “see” data. Why shouldn’t the answer you see also depend upon permissions?
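To make the speculation concrete, here is a toy sketch (all names hypothetical) of a store whose answers, not just its visibility, depend on the caller’s permissions:

    # Toy sketch of a "reactive" store: it keeps the real data and an
    # expected-looking copy, and which one a query returns depends on the
    # caller's permissions. All names here are hypothetical.
    class ReactiveStore:
        def __init__(self):
            self._real = {}       # what actually happened
            self._expected = {}   # what an untrusted caller is shown

        def put(self, key, real_value, expected_value):
            self._real[key] = real_value
            self._expected[key] = expected_value

        def query(self, key, caller_trusted):
            return self._real[key] if caller_trusted else self._expected[key]

    store = ReactiveStore()
    store.put("logins_today", real_value=9421, expected_value=12)
    print(store.query("logins_today", caller_trusted=False))  # 12
    print(store.query("logins_today", caller_trusted=True))   # 9421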

Defenses against deep learning and reactive data storage may become very relevant in the not too distant future. Give it some thought.

January 1, 2015

Show and Tell (C-suite version)

Filed under: Algorithms,Deep Learning,Machine Learning — Patrick Durusau @ 5:02 pm

How Google “Translates” Pictures into Words Using Vector Space Mathematics

From the post:

Translating one language into another has always been a difficult task. But in recent years, Google has transformed this process by developing machine translation algorithms that change the nature of cross-cultural communications through Google Translate.

Now that company is using the same machine learning technique to translate pictures into words. The result is a system that automatically generates picture captions that accurately describe the content of images. That’s something that will be useful for search engines, for automated publishing and for helping the visually impaired navigate the web and, indeed, the wider world.

One of the best c-suite level explanations I have seen of Show and Tell: A Neural Image Caption Generator.

May be useful to you in obtaining support/funding for similar efforts in your domain.

Take particular note of the decision not to worry overmuch about the meaning of words. I would never make that simplifying assumption. It just runs against the grain for the meaning of the words not to matter. However, I am very glad that Oriol Vinyals and colleagues made that assumption!

That assumption enables the processing of images at a large scale.
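For a feel of the meaning-free, vector space approach, here is a toy sketch (random stand-in vectors, not Google’s model, which pairs a convolutional image encoder with a recurrent language model): project an image vector into a word-embedding space and read off the nearest words.

    import numpy as np

    # Toy illustration of "translating" an image into words: map the image
    # feature vector into the word-embedding space, then find the nearest
    # word vectors. All vectors are random stand-ins.
    rng = np.random.default_rng(0)
    words = ["cat", "dog", "grass", "car", "sky"]
    E = rng.normal(size=(5, 50))            # one 50-d embedding per word
    img = rng.normal(size=300)              # a 300-d image feature vector
    P = rng.normal(size=(50, 300)) * 0.1    # projection (learned, in reality)

    v = P @ img                             # image mapped into word space
    sims = E @ v / (np.linalg.norm(E, axis=1) * np.linalg.norm(v))
    print([words[i] for i in np.argsort(-sims)[:2]])  # nearest "caption" words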

I started to write that I would not use such an assumption for more precise translation tasks, say the translation of cuneiform tablets. But as a rough finding aid for untranslated cuneiform or hieroglyphic texts, this could be the very thing. It doesn’t have to be 100% precise or accurate, just enough that the vast archives of ancient materials become easier to use.

Is there an analogy for topic maps here? That topic maps need not be final production quality materials when released but can be refined over time by authors, editors and others?

Like Wikipedia but not quite so eclectic and more complete. Imagine a Solr reference manual that inlines or at least links to the most recent presentations and discussions on a particular topic. And incorporates information from such sources into the text.

Is Google offering us “good enough” results with data, expectations that others will refine the data further? Perhaps a value-add economic model where the producer of the “good enough” content has an interest in the further refinement of that data by others?

December 24, 2014

DL4J: Deep Learning for Java

Filed under: Deep Learning,Machine Learning,Neural Networks — Patrick Durusau @ 9:26 am

DL4J: Deep Learning for Java

From the webpage:

Deeplearning4j is the first commercial-grade, open-source deep-learning library written in Java. It is meant to be used in business environments, rather than as a research tool for extensive data exploration. Deeplearning4j is most helpful in solving distinct problems, like identifying faces, voices, spam or e-commerce fraud.

Deeplearning4j integrates with GPUs and includes a versatile n-dimensional array class. DL4J aims to be cutting-edge plug and play, more convention than configuration. By following its conventions, you get an infinitely scalable deep-learning architecture suitable for Hadoop and other big-data structures. This Java deep-learning library has a domain-specific language for neural networks that serves to turn their multiple knobs.

Deeplearning4j includes a distributed deep-learning framework and a normal deep-learning framework (i.e. it runs on a single thread as well). Training takes place in the cluster, which means it can process massive amounts of data. Nets are trained in parallel via iterative reduce, and they are equally compatible with Java, Scala and Clojure, since they’re written for the JVM.

This open-source, distributed deep-learning framework is made for data input and neural net training at scale, and its output should be highly accurate predictive models.

By following the links at the bottom of each page, you will learn to set up, and train with sample data, several types of deep-learning networks. These include single- and multithread networks, Restricted Boltzmann machines, deep-belief networks, Deep Autoencoders, Recursive Neural Tensor Networks, Convolutional Nets and Stacked Denoising Autoencoders.

For a quick introduction to neural nets, please see our overview.

There are a lot of knobs to turn when you’re training a deep-learning network. We’ve done our best to explain them, so that Deeplearning4j can serve as a DIY tool for Java, Scala and Clojure programmers. If you have questions, please join our Google Group; for premium support, contact us at Skymind. ND4J is the Java scientific computing engine powering our matrix manipulations.

And you thought I write jargon laden prose. 😉

This looks both exciting (as a technology) and challenging (as in needing accessible documentation).

Are you going to be “…turn[ing] their multiple knobs” over the holidays?
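The “iterative reduce” training the page mentions is, roughly, parameter averaging: replicas train on shards of the data, their parameters are averaged and redistributed, and the cycle repeats. DL4J itself is Java; here is a language-agnostic numpy sketch of the idea (my reading, not DL4J code):

    import numpy as np

    # Parameter averaging, the rough idea behind "iterative reduce":
    # each replica takes a gradient step on its shard, then the reduce
    # step averages the replicas' parameters.
    def local_step(w, shard, lr=0.1):
        X, y = shard
        grad = X.T @ (X @ w - y) / len(y)   # mean-squared-error gradient
        return w - lr * grad

    rng = np.random.default_rng(1)
    shards = [(rng.normal(size=(100, 5)), rng.normal(size=100))
              for _ in range(4)]
    w = np.zeros(5)
    for _ in range(50):
        replicas = [local_step(w, s) for s in shards]  # in parallel, in reality
        w = np.mean(replicas, axis=0)                  # the "reduce"
    print(w)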

GitHub Repo

Tweets

#deeplearning4j @IRC

Google Group

I first saw this in a tweet by Gregory Piatetsky.

December 23, 2014

Deep Learning: Doubly Easy and Doubly Powerful with GraphLab Create

Filed under: Deep Learning,GraphLab — Patrick Durusau @ 3:08 pm

Deep Learning: Doubly Easy and Doubly Powerful with GraphLab Create by Piotr Teterwak.

From the post:

One of machine learning’s core goals is classification of input data. This is the task of taking novel data and assigning it to one of a pre-determined number of labels, based on what the classifier learns from a training set. For instance, a classifier could take an image and predict whether it is a cat or a dog.


The pieces of information fed to a classifier for each data point are called features, and the category they belong to is a ‘target’ or ‘label’. Typically, the classifier is given data points with both features and labels, so that it can learn the correspondence between the two. Later, the classifier is queried with a data point and the classifier tries to predict what category it belongs to. A large group of these query data-points constitute a prediction-set, and the classifier is usually evaluated on its accuracy, or how many prediction queries it gets correct.
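The loop just described (features in, labels learned, predictions evaluated) fits in a few lines. A generic scikit-learn sketch, for orientation only; GraphLab Create’s own API differs:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Train a classifier on (features, labels), then evaluate its accuracy
    # on held-out query points, exactly the loop described above.
    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = LogisticRegression().fit(X_train, y_train)
    preds = clf.predict(X_test)
    print("accuracy:", (preds == y_test).mean())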

Despite a slow start, the post moves on to deep learning and GraphLab Create in detail, with code. You will need the GPU version of GraphLab Create to get the full benefit of this post.

Beyond distinguishing dogs and cats, a concern for other dogs and cats I’m sure, what images would you classify with deep learning?

I first saw this in a tweet by Aapo Kyrola.

December 20, 2014

Teaching Deep Convolutional Neural Networks to Play Go

Filed under: Deep Learning,Games,Machine Learning,Monte Carlo — Patrick Durusau @ 2:38 pm

Teaching Deep Convolutional Neural Networks to Play Go by Christopher Clark and Amos Storkey.

Abstract:

Mastering the game of Go has remained a long standing challenge to the field of AI. Modern computer Go systems rely on processing millions of possible future positions to play well, but intuitively a stronger and more ‘humanlike’ way to play the game would be to rely on pattern recognition abilities rather than brute force computation. Following this sentiment, we train deep convolutional neural networks to play Go by training them to predict the moves made by expert Go players. To solve this problem we introduce a number of novel techniques, including a method of tying weights in the network to ‘hard code’ symmetries that are expected to exist in the target function, and demonstrate in an ablation study that they considerably improve performance. Our final networks are able to achieve move prediction accuracies of 41.1% and 44.4% on two different Go datasets, surpassing previous state of the art on this task by significant margins. Additionally, while previous move prediction programs have not yielded strong Go playing programs, we show that the networks trained in this work acquired high levels of skill. Our convolutional neural networks can consistently defeat the well known Go program GNU Go, indicating it is state of the art among programs that do not use Monte Carlo Tree Search. It is also able to win some games against the state of the art Go playing program Fuego while using a fraction of the play time. This success at playing Go indicates high level principles of the game were learned.
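The symmetries the authors “hard code” by tying weights are the eight rotations and reflections of the board. A sketch of generating them (illustration only, not the paper’s weight-tying machinery):

    import numpy as np

    # The eight symmetries of a Go board (the dihedral group of the square):
    # four rotations, each with and without a horizontal flip.
    def board_symmetries(board):
        syms = []
        b = board
        for _ in range(4):
            syms.append(b)
            syms.append(np.fliplr(b))
            b = np.rot90(b)
        return syms

    board = np.arange(19 * 19).reshape(19, 19)
    print(len({s.tobytes() for s in board_symmetries(board)}))  # 8 distinct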

If you are going to pursue the study of Monte Carlo Tree Search for semantic purposes, there isn’t any reason to not enjoy yourself as well. 😉

And following the best efforts in game playing will be educational as well.

I take the efforts at playing Go by computer, as well as those for chess, as indicating how far ahead of AI humans still are.

Both of those two-player, complete knowledge games were mastered long ago by humans. Multi-player games with extended networks of influence and motives, not to mention incomplete information as well, seem securely reserved for human players for the foreseeable future. (I wonder if multi-player scenarios are similar to the multi-body problem in physics? Except with more influences.)

I first saw this in a tweet by Ebenezer Fogus.

December 19, 2014

DeepSpeech: Scaling up end-to-end speech recognition [Is Deep the new Big?]

Filed under: Deep Learning,Machine Learning,Speech Recognition — Patrick Durusau @ 5:18 pm

DeepSpeech: Scaling up end-to-end speech recognition by Awni Hannun, et al.

Abstract:

We present a state-of-the-art speech recognition system developed using end-to-end deep learning. Our architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, our system does not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learns a function that is robust to such effects. We do not need a phoneme dictionary, nor even the concept of a “phoneme.” Key to our approach is a well-optimized RNN training system that uses multiple GPUs, as well as a set of novel data synthesis techniques that allow us to efficiently obtain a large amount of varied data for training. Our system, called DeepSpeech, outperforms previously published results on the widely studied Switchboard Hub5’00, achieving 16.5% error on the full test set. DeepSpeech also handles challenging noisy environments better than widely used, state-of-the-art commercial speech systems.
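The “data synthesis” the abstract mentions includes, among other things, superimposing noise on clean speech to multiply the effective training data. A miniature sketch of that one idea (random arrays standing in for audio):

    import numpy as np

    # Mix noise into clean speech at a chosen signal-to-noise ratio,
    # turning one clean recording into many noisy training examples.
    def add_noise(speech, noise, snr_db):
        speech_power = np.mean(speech ** 2)
        noise_power = np.mean(noise ** 2)
        scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
        return speech + scale * noise

    rng = np.random.default_rng(0)
    clean = rng.normal(size=16000)     # one second at 16 kHz, stand-in audio
    noise = rng.normal(size=16000)
    noisy = add_noise(clean, noise, snr_db=10)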

Although the academic papers, so far, are using “deep learning” in a meaningful sense, early 2015 is likely to see many vendors rebranding their offerings as incorporating or being based on deep learning.

When approached with any “deep learning” application or service, check out the Internet Archive WayBack Machine to see how they were marketing their software/service before “deep learning” became popular.

Is there a GPU-powered box in your future?

I first saw this in a tweet by Andrew Ng.


Update: After posting I encountered: Baidu claims deep learning breakthrough with Deep Speech by Derrick Harris. Talks to Andrew Ng, great write-up.

December 18, 2014

DeepDive

Filed under: Deep Learning,Machine Learning — Patrick Durusau @ 7:11 pm

DeepDive

From the homepage:

DeepDive is a new type of system that enables developers to analyze data on a deeper level than ever before. DeepDive is a trained system: it uses machine learning techniques to leverage domain-specific knowledge and incorporates user feedback to improve the quality of its analysis.

DeepDive differs from traditional systems in several ways:

  • DeepDive is aware that data is often noisy and imprecise: names are misspelled, natural language is ambiguous, and humans make mistakes. Taking such imprecisions into account, DeepDive computes calibrated probabilities for every assertion it makes. For example, if DeepDive produces a fact with probability 0.9 it means the fact is 90% likely to be true.
  • DeepDive is able to use large amounts of data from a variety of sources. Applications built using DeepDive have extracted data from millions of documents, web pages, PDFs, tables, and figures.
  • DeepDive allows developers to use their knowledge of a given domain to improve the quality of the results by writing simple rules that inform the inference (learning) process. DeepDive can also take into account user feedback on the correctness of the predictions, with the goal of improving the predictions.
  • DeepDive is able to use the data to learn "distantly". In contrast, most machine learning systems require tedious training for each prediction. In fact, many DeepDive applications, especially at early stages, need no traditional training data at all!
  • DeepDive’s secret is a scalable, high-performance inference and learning engine. For the past few years, we have been working to make the underlying algorithms run as fast as possible. The techniques pioneered in this project are part of commercial and open source tools including MADlib, Impala, a product from Oracle, and low-level techniques, such as Hogwild!. They have also been included in Microsoft's Adam.
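On the calibrated probabilities in the first bullet: calibration is checkable. Bucket the assertions by predicted probability and compare each bucket with how often its facts are actually true. A toy check with synthetic predictions:

    import numpy as np

    # Calibration check: among facts asserted with probability ~0.9,
    # about 90% should be true. Synthetic data, calibrated by construction.
    rng = np.random.default_rng(0)
    probs = rng.uniform(size=10000)
    truths = rng.uniform(size=10000) < probs

    bins = np.linspace(0, 1, 11)
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (probs >= lo) & (probs < hi)
        print(f"predicted {lo:.1f}-{hi:.1f}: {truths[mask].mean():.2f} true")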

This is an example of why I use Twitter for current awareness. My odds for encountering DeepDive on a web search, due primarily to page-ranked search results, are very, very low. From the change log, it looks like DeepDive was announced in March of 2014, which isn’t very long to build up a page-rank.

You do have to separate the wheat from the chaff with Twitter, but DeepDive is an example of what you may find. You won’t find it with search, not for another year or two, perhaps longer.

How does that go? He said he had a problem and was going to use search to find a solution? Now he has two problems? 😉

I first saw this in a tweet by Stian Danenbarger.

PS: Take a long and careful look at DeepDive. Unless I find other means, I am likely to be using DeepDive to extract text and the redactions (character length) from a redacted text.

December 15, 2014

Deep learning for… chess

Filed under: Amazon Web Services AWS,Deep Learning,Games,GPU — Patrick Durusau @ 5:38 am

Deep learning for… chess by Erik Bernhardsson.

From the post:

I’ve been meaning to learn Theano for a while and I’ve also wanted to build a chess AI at some point. So why not combine the two? That’s what I thought, and I ended up spending way too much time on it. I actually built most of this back in September but not until Thanksgiving did I have the time to write a blog post about it.
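Erik’s post builds his evaluation function in Theano. His code is at the link; for flavor, here is a minimal Theano sketch of the kind of network involved (mine, not his): a one-hidden-layer net scoring a position encoded as a feature vector.

    import numpy as np
    import theano
    import theano.tensor as T

    # A tiny position-evaluation network: board features in, scalar score out.
    x = T.vector("x")                                   # 64 board features
    W1 = theano.shared(np.random.randn(64, 128) * 0.01)
    W2 = theano.shared(np.random.randn(128) * 0.01)
    score = T.dot(T.tanh(T.dot(x, W1)), W2)

    evaluate = theano.function([x], score)              # compiles the graph
    print(evaluate(np.random.randn(64)))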

Chess sets are a common holiday gift so why not do something different this year?

Pretty print a copy of this post and include a gift certificate from AWS for a GPU instance for say a week to ten days.

I don’t think AWS sells gift certificates, but they certainly should. Great stocking stuffer, anniversary/birthday/graduation present, etc. Not so great for Valentine’s Day.

If you ask AWS for a gift certificate, mention my name. They don’t know who I am so I could use the publicity. 😉

I first saw this in a tweet by Onepaperperday.

December 12, 2014

Deep Neural Networks are Easily Fooled:…

Filed under: Deep Learning,Machine Learning,Neural Networks — Patrick Durusau @ 7:47 pm

Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images by Anh Nguyen, Jason Yosinski, Jeff Clune.

Abstract:

Deep neural networks (DNNs) have recently been achieving state-of-the-art performance on a variety of pattern-recognition tasks, most notably visual classification problems. Given that DNNs are now able to classify objects in images with near-human-level performance, questions naturally arise as to what differences remain between computer and human vision. A recent study revealed that changing an image (e.g. of a lion) in a way imperceptible to humans can cause a DNN to label the image as something else entirely (e.g. mislabeling a lion a library). Here we show a related result: it is easy to produce images that are completely unrecognizable to humans, but that state-of-the-art DNNs believe to be recognizable objects with 99.99% confidence (e.g. labeling with certainty that white noise static is a lion). Specifically, we take convolutional neural networks trained to perform well on either the ImageNet or MNIST datasets and then find images with evolutionary algorithms or gradient ascent that DNNs label with high confidence as belonging to each dataset class. It is possible to produce images totally unrecognizable to human eyes that DNNs believe with near certainty are familiar objects. Our results shed light on interesting differences between human vision and current DNNs, and raise questions about the generality of DNN computer vision.
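The gradient ascent half of the recipe is easy to see in miniature: start from noise and climb a classifier’s confidence for one class. A toy version with a random linear “classifier” (real DNNs differ, the mechanics don’t):

    import numpy as np

    # Gradient ascent on the *input*: push an unrecognizable noise vector
    # toward high confidence for one class of a toy logistic model.
    rng = np.random.default_rng(0)
    w, b = rng.normal(size=100), 0.0       # stand-in class-score weights
    x = rng.normal(size=100) * 0.01        # starts as faint noise

    sigmoid = lambda z: 1 / (1 + np.exp(-z))
    for _ in range(200):
        p = sigmoid(w @ x + b)
        x += 0.1 * (1 - p) * w             # gradient of log p with respect to x
    print(f"confidence: {sigmoid(w @ x + b):.4f}")   # near 1.0, still noise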

This is a great paper for weekend reading, even if computer vision isn’t your field. In part because the results were unexpected. Computer science is moving towards being an experimental science, at least in some situations.

Before you read the article, spend a few minutes thinking about how DNNs and human vision differ.

I haven’t run it to ground yet but I wonder if the authors have stumbled upon a way to deceive deep neural networks outside of computer vision applications? If so, does that suggest experiments that could identify ways to deceive other classification algorithms? And how would you detect such means if they were employed? Still confident about your data processing results?

I first saw this in a tweet by Gregory Piatetsky.

October 29, 2014

How to run the Caffe deep learning vision library…

Filed under: Deep Learning,GPU,NVIDIA — Patrick Durusau @ 5:17 pm

How to run the Caffe deep learning vision library on Nvidia’s Jetson mobile GPU board by Pete Warden.

From the post:

Jetson board. (Photo by Gareth Halfacree)

My colleague Yangqing Jia, creator of Caffe, recently spent some free time getting the framework running on Nvidia’s Jetson board. If you haven’t heard of the Jetson, it’s a small development board that includes Nvidia’s TK1 mobile GPU chip. The TK1 is starting to appear in high-end tablets, and has 192 cores so it’s great for running computational tasks like deep learning. The Jetson’s a great way to get a taste of what we’ll be able to do on mobile devices in the future, and it runs Ubuntu so it’s also an easy environment to develop for.

Caffe comes with a pre-built ‘Alexnet’ model, a version of the Imagenet-winning architecture that recognizes 1,000 different kinds of objects. Using this as a benchmark, the Jetson can analyze an image in just 34ms! Based on this table I’m estimating it’s drawing somewhere around 10 or 11 watts, so it’s power-intensive for a mobile device but not too crazy.

Yangqing passed along his instructions, and I’ve checked them on my own Jetson, so here’s what you need to do to get Caffe up and running.

Hardware fun for the middle of your week!

192 cores for under $200, plus GPU experience.
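For scale, the post’s own numbers pencil out to roughly 29 images per second and about a third of a joule per image (taking 10.5 watts as the midpoint of Pete’s estimate):

    latency_s = 0.034            # 34 ms per image, from the post
    power_w = 10.5               # midpoint of the 10-11 watt estimate
    print(1 / latency_s)         # ~29.4 images per second
    print(power_w * latency_s)   # ~0.36 joules per image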

October 17, 2014

Update with 162 new papers to Deeplearning.University Bibliography

Filed under: Deep Learning — Patrick Durusau @ 6:57 pm

Update with 162 new papers to Deeplearning.University Bibliography by Amund Tveit.

From the post:

Added 162 new Deep Learning papers to the Deeplearning.University Bibliography. If you want to see them separate from the previous papers in the bibliography, the new ones are listed below. There are many highly interesting papers, a few examples are:

  1. Deep neural network based load forecast – forecasting of electricity demand
  2. The relation of eye gaze and face pose: Potential impact on speech recognition – combining speech recognition with facial expression
  3. Feature Learning from Incomplete EEG with Denoising Autoencoder – Deep Learning for Brain Computer Interfaces

Underneath are the 162 new papers, enjoy!

(Complete Bibliography – at Deeplearning.University Bibliography)

Disclaimer: we’re so far only covering (a subset of) 2014 deep learning papers, so still far from a complete bibliography, but our goal is to come close eventually

Best regards,

Amund Tveit (Memkite Team)

You could find all these papers by search, if you knew what search terms to use.

This bibliography is a reminder of the power of curated data. The categories and grouping the papers into categories are definitely a value-add. Search doesn’t have those, in case you haven’t noticed. 😉

October 4, 2014

You Don’t Have to Be Google to Build an Artificial Brain

Filed under: Artificial Intelligence,Deep Learning — Patrick Durusau @ 7:24 pm

You Don’t Have to Be Google to Build an Artificial Brain by Cade Metz.

From the post:

When Google used 16,000 machines to build a simulated brain that could correctly identify cats in YouTube videos, it signaled a turning point in the art of artificial intelligence.

Applying its massive cluster of computers to an emerging breed of AI algorithm known as “deep learning,” the so-called Google brain was twice as accurate as any previous system in recognizing objects pictured in digital images, and it was hailed as another triumph for the mega data centers erected by the kings of the web.

But in the middle of this revolution, a researcher named Alex Krizhevsky showed that you don’t need a massive computer cluster to benefit from this technology’s unique ability to “train itself” as it analyzes digital data. As described in a paper published later that same year, he outperformed Google’s 16,000-machine cluster with a single computer—at least on one particular image recognition test.

This was a rather expensive computer, equipped with large amounts of memory and two top-of-the-line cards packed with myriad GPUs, a specialized breed of computer chip that allows the machine to behave like many. But it was a single machine nonetheless, and it showed that you didn’t need a Google-like computing cluster to exploit the power of deep learning.

Cade’s article should encourage you to do two things:

  • Learn GPUs cold
  • Ditto on Deep Learning

Google and others will always have more raw processing power than any system you are likely to afford. However, while a steam shovel can shovel a lot of clay, it takes a real expert to make a vase. Particularly a very good one.

Do you want to pine for a steam shovel or work towards creating a fine vase?

PS: Google isn’t building “an artificial brain,” not anywhere close. That’s why all their designers, programmers and engineers are wetware.

September 10, 2014

Recursive Deep Learning For Natural Language Processing And Computer Vision

Filed under: Deep Learning,Machine Learning,Natural Language Processing — Patrick Durusau @ 5:28 am

Recursive Deep Learning For Natural Language Processing And Computer Vision by Richard Socher.

From the abstract:

As the amount of unstructured text data that humanity produces overall and on the Internet grows, so does the need to intelligently process it and extract different types of knowledge from it. My research goal in this thesis is to develop learning models that can automatically induce representations of human language, in particular its structure and meaning in order to solve multiple higher level language tasks.

There has been great progress in delivering technologies in natural language processing such as extracting information, sentiment analysis or grammatical analysis. However, solutions are often based on different machine learning models. My goal is the development of general and scalable algorithms that can jointly solve such tasks and learn the necessary intermediate representations of the linguistic units involved. Furthermore, most standard approaches make strong simplifying language assumptions and require well designed feature representations. The models in this thesis address these two shortcomings. They provide effective and general representations for sentences without assuming word order independence. Furthermore, they provide state of the art performance with no, or few manually designed features.

The new model family introduced in this thesis is summarized under the term Recursive Deep Learning. The models in this family are variations and extensions of unsupervised and supervised recursive neural networks (RNNs) which generalize deep and feature learning ideas to hierarchical structures. The RNN models of this thesis obtain state of the art performance on paraphrase detection, sentiment analysis, relation classification, parsing, image-sentence mapping and knowledge base completion, among other tasks.
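The recursive step that gives the family its name is easy to state: one shared weight matrix composes two child vectors into a parent vector of the same dimensionality, applied up a parse tree. A minimal sketch:

    import numpy as np

    # Recursive composition: the same weights W combine any two children,
    # so every tree node gets a vector of the same dimensionality.
    rng = np.random.default_rng(0)
    d = 50
    W = rng.normal(size=(d, 2 * d)) * 0.1

    def compose(left, right):
        return np.tanh(W @ np.concatenate([left, right]))

    very, good, movie = (rng.normal(size=d) for _ in range(3))
    phrase = compose(compose(very, good), movie)   # ((very good) movie)
    print(phrase.shape)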

Socher’s models offer two significant advances:

  • No assumption of word order independence
  • No or few manually designed features

Of the two, I am more partial to elimination of the assumption of word order independence. I suppose in part because I see that leading to abandoning that assumption that words have some fixed meaning separate and apart from the other words used to define them.

Or in topic maps parlance, identifying a subject always involves the use of other subjects, which are themselves capable of being identified. Think about it. When was the last time you were called upon to identify a person, object or thing and you uttered an IRI? Never, right?

That certainly works, at least in closed domains, in some cases, but other than simply repeating the string, you have no basis on which to conclude that is the correct IRI. Nor does anyone else have a basis to accept or reject your IRI.

I suppose that is another one of those “simplifying” assumptions. Useful in some cases but not all.

August 28, 2014

…Deep Learning Text Classification

Filed under: Deep Learning,Graphs,Neo4j — Patrick Durusau @ 4:20 pm

Using a Graph Database for Deep Learning Text Classification by Kenny Bastani.

From the post:

Graphify is a Neo4j unmanaged extension that provides plug and play natural language text classification.

Graphify gives you a mechanism to train natural language parsing models that extract features of a text using deep learning. When training a model to recognize the meaning of a text, you can send an article of text with a provided set of labels that describe the nature of the text. Over time the natural language parsing model in Neo4j will grow to identify those features that optimally disambiguate a text to a set of classes.

Similarity and graphs. What’s there to not like?

August 20, 2014

Deep Learning (MIT Press Book)

Filed under: Deep Learning,Machine Learning — Patrick Durusau @ 2:12 pm

Deep Learning (MIT Press Book) by Yoshua Bengio, Ian Goodfellow, and Aaron Courville.

From the webpage:

Draft chapters available for feedback – August 2014
Please help us make this a great book! This draft is still full of typos and can be improved in many ways. Your suggestions are more than welcome. Do not hesitate to contact any of the authors directly by e-mail or Google+ messages: Yoshua, Ian, Aaron.

Teaching a subject isn’t the only way to learn it cold. Proofing a book on a subject is another way to learn material cold.

Ready to dig in?

I first saw this in a tweet by Gregory Piatetsky

August 19, 2014

Deep Learning for NLP (without Magic)

Filed under: Deep Learning,Machine Learning,Natural Language Processing — Patrick Durusau @ 2:47 pm

Deep Learning for NLP (without Magic) by Richard Socher and Christopher Manning.

Abstract:

Machine learning is everywhere in today’s NLP, but by and large machine learning amounts to numerical optimization of weights for human designed representations and features. The goal of deep learning is to explore how computers can take advantage of data to develop features and representations appropriate for complex interpretation tasks. This tutorial aims to cover the basic motivation, ideas, models and learning algorithms in deep learning for natural language processing. Recently, these methods have been shown to perform very well on various NLP tasks such as language modeling, POS tagging, named entity recognition, sentiment analysis and paraphrase detection, among others. The most attractive quality of these techniques is that they can perform well without any external hand-designed resources or time-intensive feature engineering. Despite these advantages, many researchers in NLP are not familiar with these methods. Our focus is on insight and understanding, using graphical illustrations and simple, intuitive derivations. The goal of the tutorial is to make the inner workings of these techniques transparent, intuitive and their results interpretable, rather than black boxes labeled “magic here”. The first part of the tutorial presents the basics of neural networks, neural word vectors, several simple models based on local windows and the math and algorithms of training via backpropagation. In this section applications include language modeling and POS tagging. In the second section we present recursive neural networks which can learn structured tree outputs as well as vector representations for phrases and sentences. We cover both equations as well as applications. We show how training can be achieved by a modified version of the backpropagation algorithm introduced before. These modifications allow the algorithm to work on tree structures. Applications include sentiment analysis and paraphrase detection. We also draw connections to recent work in semantic compositionality in vector spaces. The principal goal, again, is to make these methods appear intuitive and interpretable rather than mathematically confusing. By this point in the tutorial, the audience members should have a clear understanding of how to build a deep learning system for word-, sentence- and document-level tasks. The last part of the tutorial gives a general overview of the different applications of deep learning in NLP, including bag of words models. We will provide a discussion of NLP-oriented issues in modeling, interpretation, representational power, and optimization.

A tutorial on deep learning from NAACL 2013, Atlanta. The webpage offers links to the slides (205), video of the tutorial, and additional resources.

Definitely a place to take a dive into deep learning.

On page 35 of the slides the following caught my eye:

The vast majority of rule-based and statistical NLP work regards words as atomic symbols: hotel, conference, walk.

In vector space terms, this is a vector with one 1 and a lot of zeroes.

[000000000010000]

Dimensionality: 20K (speech) – 50K (PTB) – 500K (big vocab) – 13M (Google 1T)

We call this a “one-hot” representation. Its problem:

motel [000000000010000] AND
hotel [000000010000000] = 0
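The slide’s point, runnably (positions taken from the slide’s vectors):

    import numpy as np

    # One-hot vectors for related words are orthogonal: the dot product of
    # "motel" and "hotel" is zero, so no similarity signal survives.
    vocab = {"motel": 10, "hotel": 7}   # 1-positions from the slide
    def one_hot(word, size=15):
        v = np.zeros(size)
        v[vocab[word]] = 1
        return v

    print(one_hot("motel") @ one_hot("hotel"))   # 0.0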

Another aspect of topic maps comes to the fore!

You can have “one-hot” representations of subjects in a topic map, that is a single identifier, but that’s not required.

You can have multiple “one-hot” representations for a subject or you can have more complex collections of properties that represent a subject. Depends on your requirements, not a default of the technology.

If “one-hot” representations of subjects are insufficient for deep learning, shouldn’t they be insufficient for humans as well?

August 5, 2014

Deep Learning in Java

Filed under: Deep Learning,Feature Learning,Machine Learning — Patrick Durusau @ 6:03 pm

Deep Learning in Java by Ryan Swanstrom.

From the post:

Deep Learning is the hottest topic in all of data science right now. Adam Gibson, cofounder of Blix.io, has created an open source deep learning library for Java named DeepLearning4j. For those curious, DeepLearning4j is open sourced on github.

Ryan has some other deep learning goodies at his post so don’t skip directly to DeepLearning4j. 😉

Like all machine learning techniques, the more you know about it the easier it will be to ask uncomfortable questions when someone overplays their results.

It’s a useful technique but it is also useful to be an intelligent consumer of its results.

July 19, 2014

What is deep learning, and why should you care?

Filed under: Deep Learning,Image Recognition,Machine Learning — Patrick Durusau @ 2:45 pm

What is deep learning, and why should you care? by Pete Warden.

From the post:


When I first ran across the results in the Kaggle image-recognition competitions, I didn’t believe them. I’ve spent years working with machine vision, and the reported accuracy on tricky tasks like distinguishing dogs from cats was beyond anything I’d seen, or imagined I’d see anytime soon. To understand more, I reached out to one of the competitors, Daniel Nouri, and he demonstrated how he used the Decaf open-source project to do so well. Even better, he showed me how he was quickly able to apply it to a whole bunch of other image-recognition problems we had at Jetpac, and produce much better results than my conventional methods.

I’ve never encountered such a big improvement from a technique that was largely unheard of just a couple of years before, so I became obsessed with understanding more. To be able to use it commercially across hundreds of millions of photos, I built my own specialized library to efficiently run prediction on clusters of low-end machines and embedded devices, and I also spent months learning the dark arts of training neural networks. Now I’m keen to share some of what I’ve found, so if you’re curious about what on earth deep learning is, and how it might help you, I’ll be covering the basics in a series of blog posts here on Radar, and in a short upcoming ebook.

Pete gives a brief sketch of “deep learning” and promises more posts and a short ebook to follow.

Along those same lines you will want to see:

Microsoft Challenges Google’s Artificial Brain With ‘Project Adam’ by Daniela Hernandez (WIRED).

If you want in depth (technical) coverage, see: Deep Learning…moving beyond shallow machine learning since 2006! The reading list and references here should keep you busy for some time.

BTW, on “…shallow machine learning…” you do know the “Dark Ages” really weren’t “dark” but were so named in the Renaissance to depict a fall into darkness (the Fall of Rome), the “Dark Ages,” and then the return of “light” in the Renaissance? See: Dark Ages (historiography).

Don’t overly credit characterizations of ages or technologies by later ages or newer technologies. They too will be found primitive and superstitious.

March 10, 2014

Data Science 101: Deep Learning Methods and Applications

Filed under: Data Science,Deep Learning,Machine Learning,Microsoft — Patrick Durusau @ 7:56 pm

Data Science 101: Deep Learning Methods and Applications by Daniel Gutierrez.

From the post:

Microsoft Research, the research arm of the software giant, is a hotbed of data science and machine learning research. Microsoft has the resources to hire the best and brightest researchers from around the globe. A recent publication is available for download (PDF): “Deep Learning: Methods and Applications” by Li Deng and Dong Yu, two prominent researchers in the field.

Deep sledding with twenty (20) pages of bibliography and pointers to frequently updated lists of resources (at page 8).

You did say you were interested in deep learning. Yes? 😉

Enjoy!

May 3, 2013

Deep learning made easy

Filed under: Artificial Intelligence,Deep Learning,Machine Learning,Sparse Data — Patrick Durusau @ 1:06 pm

Deep learning made easy by Zygmunt Zając.

From the post:

As usual, there’s an interesting competition at Kaggle: The Black Box. It’s connected to ICML 2013 Workshop on Challenges in Representation Learning, held by the deep learning guys from Montreal.

There are a couple benchmarks for this competition and the best one is unusually hard to beat – only less than a fourth of those taking part managed to do so. We’re among them. Here’s how.

The key ingredient in our success is a recently developed secret Stanford technology for deep unsupervised learning, called sparse filtering. Actually, it’s not secret. It’s available at Github, and has one or two very appealing properties. Let us explain.

The main idea of deep unsupervised learning, as we understand it, is feature extraction. One of the most common applications is in multimedia. The reason for that is that multimedia tasks, for example object recognition, are easy for humans, but difficult for computers*.

Geoff Hinton from Toronto talks about two ends of a spectrum in machine learning: one is statistics and getting rid of noise, the other one – AI, or the things that humans are good at but computers are not. Deep learning proponents say that deep, that is, layered, architectures are the way to solve AI kind of problems.

The idea might have something to do with an inspiration from how the brain works. Each layer is supposed to extract higher-level features, and these features are supposed to be more useful for the task at hand.
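For the curious, the sparse filtering objective itself is short. As I read Ngiam et al., you take soft-absolute features, normalize each feature across examples, then each example across features, and minimize the L1 norm; a real implementation hands this to an off-the-shelf optimizer. A sketch under that reading:

    import numpy as np

    # Sparse filtering objective (my reading of Ngiam et al., 2011):
    # soft-absolute features, row-then-column L2 normalization, L1 penalty.
    def sparse_filtering_objective(W, X, eps=1e-8):
        F = np.sqrt((W @ X) ** 2 + eps)                    # soft absolute
        F = F / np.linalg.norm(F, axis=1, keepdims=True)   # per feature
        F = F / np.linalg.norm(F, axis=0, keepdims=True)   # per example
        return np.abs(F).sum()

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 200))    # 20-d inputs, 200 examples
    W = rng.normal(size=(32, 20))     # 32 features to learn
    print(sparse_filtering_objective(W, X))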

Rather, say that layered architectures are observed to mimic human results.

Just as a shovel mimics and exceeds a human hand for digging.

But you would not say operation of a shovel gives us insight into the operation of a human hand.

Or would you?

March 30, 2013

2012 IPAM Graduate Summer School: Deep Learning, Feature Learning

Filed under: Deep Learning,Feature Learning,Machine Learning — Patrick Durusau @ 2:43 pm

2012 IPAM Graduate Summer School: Deep Learning, Feature Learning

OK, so they skipped the weekends!

Still have fifteen (15) days of video.

So if you don’t have a date for movie night…., 😉

June 15, 2012

Deep Learning Tutorials

Filed under: Deep Learning,Machine Learning — Patrick Durusau @ 1:29 pm

Deep Learning Tutorials

From the main page:

Deep Learning is a new area of Machine Learning research, which has been introduced with the objective of moving Machine Learning closer to one of its original goals: Artificial Intelligence. See these course notes for a brief introduction to Machine Learning for AI and an introduction to Deep Learning algorithms.

Deep Learning is about learning multiple levels of representation and abstraction that help to make sense of data such as images, sound, and text.

For more about deep learning algorithms, see for example:

The tutorials presented here will introduce you to some of the most important deep learning algorithms and will also show you how to run them using Theano. Theano is a python library that makes writing deep learning models easy, and gives the option of training them on a GPU.

The algorithm tutorials have some prerequisites. You should know some python, and be familiar with numpy. Since this tutorial is about using Theano, you should read over the Theano basic tutorial first. Once you’ve done that, read through our Getting Started chapter — it introduces the notation, and [downloadable] datasets used in the algorithm tutorials, and the way we do optimization by stochastic gradient descent.

The tutorial materials reflect the content of Yoshua Bengio’s Learning Algorithms (ITF6266) course.
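The stochastic gradient descent the Getting Started chapter introduces is worth seeing in isolation. A minimal numpy version on a least-squares toy problem, one random example per step (the tutorials themselves do this in Theano):

    import numpy as np

    # Stochastic gradient descent: update with the gradient of a single
    # randomly chosen example instead of the whole dataset.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 3))
    true_w = np.array([2.0, -1.0, 0.5])
    y = X @ true_w + rng.normal(scale=0.1, size=1000)

    w, lr = np.zeros(3), 0.01
    for step in range(5000):
        i = rng.integers(len(y))              # one random example
        grad = (X[i] @ w - y[i]) * X[i]       # its squared-error gradient
        w -= lr * grad
    print(w)                                  # close to true_w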

Part of the resources you will find at: Deep Learning … moving beyond shallow machine learning since 2006!. There is a break between 2010 and 2012, with only a few entries, such as those in the blog, dated 2012. There has been a considerable amount of work in the meantime, so you might want to contribute to the site.

November 29, 2011

Deep Learning

Filed under: Artificial Intelligence,Deep Learning,Machine Learning — Patrick Durusau @ 8:42 pm

Deep Learning… moving beyond shallow machine learning since 2006!

From the webpage:

Deep Learning is a new area of Machine Learning research, which has been introduced with the objective of moving Machine Learning closer to one of its original goals: Artificial Intelligence.

This website is intended to host a variety of resources and pointers to information about Deep Learning. In these pages you will find

  • a reading list
  • links to software
  • datasets
  • a discussion forum
  • as well as tutorials and cool demos

I encountered this site via its Deep Learning Tutorial, which is only one of the tutorial-type resources available under Tutorials.

I mention that because the Deep Learning Tutorial looks like it would be of interest to anyone doing data or entity mining.

