Archive for the ‘Image Recognition’ Category

Turning Pixelated Faces Back Into Real Ones

Thursday, February 9th, 2017

Google’s neural networks turn pixelated faces back into real ones by John E. Dunn.

From the post:

Researchers at Google Brain have come up with a way to turn heavily pixelated images of human faces into something that bears a usable resemblance to the original subject.

In a new paper, the company’s researchers describe using neural networks put to work at two different ends of what should, on the face of it, be an incredibly difficult problem to solve: how to resolve blocky 8 x 8 pixel images of faces or indoor scenes containing almost no information?

It’s something scientists in the field of super resolution (SR) have been working on for years, using techniques such as de-blurring and interpolation that are often not successful for this type of image. As the researchers put it:

When some details do not exist in the source image, the challenge lies not only in “deblurring” an image, but also in generating new image details that appear plausible to a human observer.

Their method involves getting the first “conditioning” neural network to resize 32 x 32 pixel images down to 8 x 8 pixels to see if that process can find a point at which they start to match the test image.
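That conditioning step is easy to picture in code. Here is a minimal sketch of the downsample-and-compare idea, assuming Pillow and NumPy; the filenames and the use of mean squared error are my own illustration, not Google’s actual pipeline:

```python
# Sketch: downsample 32x32 candidates to 8x8 and rank them by similarity to
# the 8x8 test image, the "does it start to match?" check described above.
# Filenames are hypothetical.
import numpy as np
from PIL import Image

def to_array(path, size):
    img = Image.open(path).convert("L").resize((size, size))
    return np.asarray(img, dtype=np.float64)

test_8x8 = to_array("test_face_8x8.png", 8)

candidates = ["candidate_a.png", "candidate_b.png", "candidate_c.png"]
scores = []
for path in candidates:
    downsampled = to_array(path, 8)          # 32x32 candidate squeezed to 8x8
    mse = np.mean((downsampled - test_8x8) ** 2)
    scores.append((mse, path))

for mse, path in sorted(scores):
    print(f"{path}: MSE vs. test image = {mse:.1f}")
```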

John raises a practical objection:


The obvious practical application of this would be enhancing blurry CCTV images of suspects. But getting to grips with real faces at awkward angles depends on numerous small details. Emphasise the wrong ones and police could end up looking for the wrong person.

True but John presumes the “suspects” are unknown. That’s true for the typical convenience store robbery on the 10 PM news but not so for “suspects” under intentional surveillance.

In those cases, multiple ground truth images from a variety of angles are likely to be available.

DigitalGlobe – Open Data Program [What About Government Disasters?]

Tuesday, January 31st, 2017

Open Data Program

From the post:

DigitalGlobe is committed to helping everyone See A Better World™ by providing accurate high-resolution satellite imagery to support disaster recovery in the wake of large-scale natural disasters.

We release open imagery for select sudden onset major crisis events, including pre-event imagery, post-event imagery and a crowdsourced damage assessment.

When crises occur, DigitalGlobe is committed to supporting the humanitarian community by providing critical and actionable information to assist response efforts. Associated imagery and crowdsourcing layers are released into the public domain under a Creative Commons 4.0 license, allowing for rapid use and easy integration with existing humanitarian response technologies.

Kudos to DigitalGlobe but what about government disasters?

Governments have spy satellites, image analysis corps and military trained to use multi-faceted data flow.

What about public releases for areas of conflict, Chechnya, West Bank/Gaza/Israel, etc., to reduce the informational advantages of governments?

That would create government demand for the same product, which works to DigitalGlobe’s advantage.

“It’s an ill wind that blows no good.”

Proposal to Reduce Privacy in New York City

Sunday, January 29th, 2017

Memo: New York Called For Face Recognition Cameras At Bridges, Tunnels by Kevin Collier.

From the post:

The state of New York has privately asked surveillance companies to pitch a vast camera system that would scan and identify people who drive in and out of New York City, according to a December memo obtained by Vocativ.

The call for private companies to submit plans is part of Governor Andrew Cuomo’s major infrastructure package, which he introduced in October. Though much of the related proposals would be indisputably welcome to most New Yorkers — renovating airports and improving public transportation — a little-noticed detail included installing cameras to “test emerging facial recognition software and equipment.”

The proposed system would be massive, the memo reads:

The Authority is interested in implementing a Facial Detection System, in a free-flow highway environment, where vehicle movement is unimpeded at highway speeds as well as bumper-to-bumper traffic, and license plate images are taken and matched to occupants of the vehicles (via license plate number) with Facial Detection and Recognition methods from a gantry-based or road-side monitoring location.

All seven of the MTA’s bridges and both its tunnels are named in the proposal.

[Image: NYC MTA bridges and tunnels]

Proposals only at this point but take this as fair warning.

Follow both Kevin Collier and Vocativ as plans by the State of New York to eliminate privacy for its citizens develop.

Counter-measures

One counter measure to license plate readers is marketed under the name PhotoMaskCover.

[Image: PhotoMaskCover license plate cover]

Caution: I have never used the PhotoMaskCover product and have no relationship with its manufacturer. It claims to work. Evaluate as you would any other product from an unknown vendor.

For the facial recognition cameras, I was reminded that a hoodie and sunglasses are an easy and non-suspicious way to avoid such cameras.

For known MTA facial recognition cameras, wear a deep cowl that casts a complete shadow on your facial features. (Assuming you can drive safely with the loss of peripheral vision.)

As the number of deep cowls increases in MTA images, authorities will obsess more and more over the “unidentifieds,” spending their resources less and less effectively.

Defeating surveillance increases everyone’s freedom.

Intro to Image Processing

Sunday, November 13th, 2016

Intro to Image Processing by Eric Schles.

A short but useful introduction to some, emphasis on some, of the capabilities of OpenCV.

Understanding image processing will make you a better consumer and producer of digital imagery.
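If you want a taste before reading the post, here is a minimal sketch of the kind of operations such an introduction covers, assuming OpenCV’s Python bindings and a local file named photo.jpg:

```python
import cv2

img = cv2.imread("photo.jpg")                      # load as a BGR array
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)       # drop color information
blurred = cv2.GaussianBlur(gray, (5, 5), 0)        # smooth out sensor noise
edges = cv2.Canny(blurred, 50, 150)                # detect edges

cv2.imwrite("photo_edges.png", edges)
print("image shape:", img.shape, "-> edge map shape:", edges.shape)
```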

To its great surprise, the “press” recently re-discovered government isn’t to be trusted.

The same is true for the “press.”

Develop your capability to judge images offered by any source.

Introducing the Open Images Dataset

Friday, September 30th, 2016

Introducing the Open Images Dataset by Ivan Krasin and Tom Duerig.

From the post:

In the last few years, advances in machine learning have enabled Computer Vision to progress rapidly, allowing for systems that can automatically caption images to apps that can create natural language replies in response to shared photos. Much of this progress can be attributed to publicly available image datasets, such as ImageNet and COCO for supervised learning, and YFCC100M for unsupervised learning.

Today, we introduce Open Images, a dataset consisting of ~9 million URLs to images that have been annotated with labels spanning over 6000 categories. We tried to make the dataset as practical as possible: the labels cover more real-life entities than the 1000 ImageNet classes, there are enough images to train a deep neural network from scratch and the images are listed as having a Creative Commons Attribution license*.

The image-level annotations have been populated automatically with a vision model similar to Google Cloud Vision API. For the validation set, we had human raters verify these automated labels to find and remove false positives. On average, each image has about 8 labels assigned. Here are some examples:

Impressive data set. If you want to recognize a muffin, gherkin, pebble, etc., see the full list at dict.csv.
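Skimming the label list is a few lines once you have the file. A minimal sketch, assuming dict.csv is a simple two-column id,description file (check the repository for the exact layout):

```python
import csv

wanted = {"muffin", "gherkin", "pebble"}
with open("dict.csv", newline="") as f:
    for row in csv.reader(f):
        # Keep only rows whose human-readable label is one we care about.
        if len(row) >= 2 and row[1].strip().lower() in wanted:
            print(row[0], "->", row[1])
```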

Hopefully the techniques you develop with these images will lead to more focused image recognition. 😉

I lightly searched the list and no “non-safe” terms jumped out at me. Suitable for family image training.

srez: Image super-resolution through deep learning

Sunday, August 28th, 2016

srez: Image super-resolution through deep learning. by David Garcia.

From the webpage:

Image super-resolution through deep learning. This project uses deep learning to upscale 16×16 images by a 4x factor. The resulting 64×64 images display sharp features that are plausible based on the dataset that was used to train the neural net.

Here’s a random, non-cherry-picked example of what this network can do. From left to right, the first column is the 16×16 input image, the second one is what you would get from a standard bicubic interpolation, the third is the output generated by the neural net, and on the right is the ground truth.

[Image: srez sample output]

Once you have collected names, you are likely to need image processing.

Here’s an interesting technique using deep learning. It handles faces head-on at the moment, but you can expect that to improve.
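For comparison, the second column in the sample output above is plain bicubic upscaling, the baseline the network is beating. That baseline is a couple of lines with Pillow (the filenames are hypothetical):

```python
from PIL import Image

small = Image.open("face_16x16.png")                # 16x16 input image
upscaled = small.resize((64, 64), Image.BICUBIC)    # 4x bicubic baseline
upscaled.save("face_64x64_bicubic.png")
```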

I’ll See You The FBI’s 411.9 million images and raise 300 million more, per day

Wednesday, June 15th, 2016

FBI Can Access Hundreds of Millions of Face Recognition Photos by Jennifer Lynch.

From the post:

Today the federal Government Accountability Office (GAO) finally published its exhaustive report on the FBI’s face recognition capabilities. The takeaway: FBI has access to hundreds of millions more photos than we ever thought. And the Bureau has been hiding this fact from the public—in flagrant violation of federal law and agency policy—for years.

According to the GAO Report, FBI’s Facial Analysis, Comparison, and Evaluation (FACE) Services unit not only has access to FBI’s Next Generation Identification (NGI) face recognition database of nearly 30 million civil and criminal mug shot photos, it also has access to the State Department’s Visa and Passport databases, the Defense Department’s biometric database, and the drivers license databases of at least 16 states. Totaling 411.9 million images, this is an unprecedented number of photographs, most of which are of Americans and foreigners who have committed no crimes.

I understand and share the concern over the FBI’s database of 411.9 million images from identification sources, but let’s be realistic about the FBI’s share of all the image data.

Not an exhaustive list but:

Facebook alone is equaling the FBI photo count every 1.3 days. Moreover, Facebook data is tied to both Facebook and very likely, other social media data, unlike my driver’s license.

Instagram takes a little over 5 days to exceed the FBI image count, but like the little engine that could, it keeps trying.

I’m not sure how to count YouTube’s 300 hours of video every minute.

No reliable counts are available for porn images; Pornhub alone streamed 1,892 petabytes of data in 2015.

The Pornhub data stream includes a lot of duplication but finding non-religious and reliable stats on porn is difficult. Try searching for statistics on porn images. Speculation, guesses, etc.

Based on those figures, it’s fair to say the number of images available to the FBI is somewhere north of 100 billion and growing.
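The arithmetic behind those comparisons is easy to check. The daily-upload figures below are the rough estimates circulating at the time; a slightly higher Facebook figure gives the 1.3-day number above.

```python
fbi_images = 411.9e6            # GAO-reported FBI total
facebook_per_day = 300e6        # rough photos uploaded to Facebook per day
instagram_per_day = 80e6        # rough photos uploaded to Instagram per day

print(f"Facebook matches the FBI total every {fbi_images / facebook_per_day:.1f} days")
print(f"Instagram matches it in about {fbi_images / instagram_per_day:.1f} days")
```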

Oh, you think non-public photos are off-limits to the FBI?

Hmmm, so is lying to federal judges, or so they say.

The FBI may say they are following safeguards, etc., but once an agency develops a culture of lying “in the public’s interest,” why would you ever believe them?

If you believe the FBI now, shouldn’t you say: Shame on me?

Katia – rape screening in R

Tuesday, February 16th, 2016

Katia – rape screening in R

From the webpage:

It’s Not Enough to Condemn Violence Against Women. We Need to End It.

All 12 innocent female victims above were atrociously killed, sexually assaulted, or registered missing after meeting strangers on mainstream dating, personals, classifieds, or social networking services.

INTRODUCTION TO THE KATIA RAPE SCREEN

Those 12 beautiful faces in the gallery above, are our sisters and daughters. Looking at their pictures is like looking through a tiny pinhole onto an unprecedented rape and domestic violence crisis that is destroying the American family unit.

Verified by science, the KATIA rape screen, coded in the computer programming language, R, can provably stop a woman from ever meeting her attacker.

The technology is named after a RAINN-counseled first degree aggravated rape survivor named Katia.

It is based on the work of a Google engineer from the Reverse Image Search project and a RAINN (Rape, Abuse & Incest National Network) counselor, with a clinical background in mathematical statistics, who has over a period of 15 years compiled a linguistic pattern analysis of the messages that rapists use to lure women online.

Learn more about the science behind Katia.

This project is taking concrete steps to reduce violence against women.

What more is there to say?

Reverse Image Search (TinEye) [Clue to a User Topic Map Interface?]

Wednesday, February 3rd, 2016

TinEye was mentioned in a post I wrote in 2015, Baltimore Burning and Verification, but I did not follow up at the time.

Unlike some US intelligence agencies, TinEye has a cool logo:

[Image: TinEye logo]

Free registration enables you to share search results with others, an important feature for news teams.

I only tested the plugin for Chrome, but it offers useful result options:

[Image: TinEye plugin options]

Once installed, use it by hovering over an image in your browser, right “click” and select “Search image on TinEye.” Your results will be presented according to the options you set.

Clue to User Topic Map Interface

That is a good example of how one version of a topic map interface should work. Select some text, right “click” and “Search topic map ….(preset or selection)” with configurable result display.

That puts you into interaction with the topic map, which can offer properties to enable you to refine the identification of a subject of interest and then a merged presentation of the results.

As with a topic map, all sorts of complicated things are happening in the background with the TinEye extension.

But as a user, I’m interested in the results that TinEye presents, not how it got them.

I used to say “more interested” to indicate I might care how useful results came to be assembled. That’s a pretension that isn’t true.

It might be true in some particular case, but for the vast majority of searches, I just want the (uncensored Google) results.

US Intelligence Community Logo for Same Capability

I discovered the most likely intelligence community logo for a similar search program:

[Image: peeping tom]

The answer to the age-old question of “who watches the watchers?” is us. Which watchers are you watching?

Automatically Finding Weapons…

Wednesday, January 13th, 2016

Automatically Finding Weapons in Social Media Images Part 1 by Justin Seitz.

From the post:

As part of my previous post on gangs in Detroit, one thing had struck me: there are an awful lot of guns being waved around on social media. Shocker, I know. More importantly I began to wonder if there wasn’t a way to automatically identify when a social media post has guns or other weapons contained in them. This post will cover how to use a couple of techniques to send images to the Imagga API that will automatically tag pictures with keywords that it feels accurately describe some of the objects contained within the picture. As well, I will teach you how to use some slicing and dicing techniques in Python to help increase the accuracy of the tagging. Keep in mind that I am specifically looking for guns or firearm-related keywords, but you can easily just change the list of keywords you are interested in and try to find other things of interest like tanks, or rockets.

This blog post will cover how to handle the image tagging portion of this task. In a follow up post I will cover how to pull down all Tweets from an account and extract all the images that the user has posted (something my students do all the time!).

This rocks!

Whether you are trying to make contact with a weapon owner who isn’t in the “business” of selling guns or looking for like-minded individuals, this is a great post.

Would make an interesting way to broadly tag images for inclusion in group subjects in a topic map, awaiting further refinement by algorithm or humans.
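A minimal sketch of the tag-and-filter idea, assuming the Imagga v2 tagging endpoint and the requests library. The endpoint, parameters and response layout should be checked against Imagga’s API docs, and the keyword list is mine, not Justin’s:

```python
import requests

API_KEY, API_SECRET = "your_key", "your_secret"     # placeholders
WEAPON_TERMS = {"gun", "pistol", "rifle", "weapon", "firearm"}

def weapon_tags(image_url, threshold=30.0):
    # Ask the tagging endpoint for labels, then keep only weapon-related ones.
    resp = requests.get(
        "https://api.imagga.com/v2/tags",
        params={"image_url": image_url},
        auth=(API_KEY, API_SECRET),
        timeout=30,
    )
    resp.raise_for_status()
    hits = []
    for item in resp.json().get("result", {}).get("tags", []):
        label = item["tag"]["en"].lower()
        if label in WEAPON_TERMS and item["confidence"] >= threshold:
            hits.append((label, item["confidence"]))
    return hits

print(weapon_tags("https://example.com/suspicious_photo.jpg"))
```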

This is a great blog to follow: Automating OSINT.

Neural Networks, Recognizing Friendlies, $Billions; Friendlies as Enemies, $Priceless

Thursday, December 24th, 2015

Elon Musk merits many kudos for the recent SpaceX success.

At the same time, Elon has been nominated for Luddite of the Year, along with Bill Gates and Stephen Hawking, for fanning fears of artificial intelligence.

One favorite target for such fears is autonomous weapons systems. Hannah Junkerman annotated a list of 18 posts, articles and books on such systems for Just Security.

While moralists are wringing their hands, military forces have not let grass grow under their feet with regard to autonomous weapon systems. As Michael Carl Haas reports in Autonomous Weapon Systems: The Military’s Smartest Toys?:

Military forces that rely on armed robots to select and destroy certain types of targets without human intervention are no longer the stuff of science fiction. In fact, swarming anti-ship missiles that acquire and attack targets based on pre-launch input, but without any direct human involvement—such as the Soviet Union’s P-700 Granit—have been in service for decades. Offensive weapons that have been described as acting autonomously—such as the UK’s Brimstone anti-tank missile and Norway’s Joint Strike Missile—are also being fielded by the armed forces of Western nations. And while governments deny that they are working on armed platforms that will apply force without direct human oversight, sophisticated strike systems that incorporate significant features of autonomy are, in fact, being developed in several countries.

In the United States, the X-47B unmanned combat air system (UCAS) has been a definite step in this direction, even though the Navy is dodging the issue of autonomous deep strike for the time being. The UK’s Taranis is now said to be “merely” semi-autonomous, while the nEUROn developed by France, Greece, Italy, Spain, Sweden and Switzerland is explicitly designed to demonstrate an autonomous air-to-ground capability, as appears to be case with Russia’s MiG Skat. While little is known about China’s Sharp Sword, it is unlikely to be far behind its competitors in conceptual terms.

The reasoning of military planners in favor of autonomous weapons systems isn’t hard to find, especially when one article describes air-to-air combat between tactically autonomous, machine-piloted aircraft and human-piloted aircraft this way:


This article claims that a tactically autonomous, machine-piloted aircraft whose design capitalizes on John Boyd’s observe, orient, decide, act (OODA) loop and energy-maneuverability constructs will bring new and unmatched lethality to air-to-air combat. It submits that the machine’s combined advantages applied to the nature of the tasks would make the idea of human-inhabited platforms that challenge it resemble the mismatch depicted in The Charge of the Light Brigade.

Here’s the author’s mock-up of a sixth-generation approach:

[Image: sixth-generation fighter mock-up]

(Select the image to see an undistorted view of both aircraft.)

Given the strides being made on the use of neural networks, I would be surprised if they are not at the core of present and future autonomous weapons systems.

You can join the debate about the ethics of autonomous weapons but the more practical approach is to read How to trick a neural network into thinking a panda is a vulture by Julia Evans.
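Julia’s post walks through one way to do the tricking. As an illustration of the same family of attacks, here is a minimal sketch of the fast gradient sign method (FGSM), assuming PyTorch and torchvision with a pretrained ResNet; it is not her code and it omits the usual ImageNet normalization:

```python
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import models, transforms

# Pretrained classifier (older torchvision versions use pretrained=True).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
])

img = preprocess(Image.open("panda.jpg")).unsqueeze(0)   # hypothetical input
img.requires_grad_(True)

logits = model(img)
label = logits.argmax(dim=1)                  # the model's own prediction
F.cross_entropy(logits, label).backward()     # gradient of loss w.r.t. pixels

epsilon = 0.02                                # small, barely visible nudge
adversarial = (img + epsilon * img.grad.sign()).clamp(0, 1).detach()

print("before:", label.item())
print("after: ", model(adversarial).argmax(dim=1).item())
```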

Autonomous weapon systems will be developed by a limited handful of major military powers, at least at first, which means counter-measures, such as turning such weapons against their masters, will command a premium price, far more than offensive development will, and serve a far larger market besides.

Deception, one means of turning weapons against their users, has a long history, not the earliest of which is the tale of Esau and Jacob (Genesis, chapter 27):

11 And Jacob said to Rebekah his mother, Behold, Esau my brother is a hairy man, and I am a smooth man:

12 My father peradventure will feel me, and I shall seem to him as a deceiver; and I shall bring a curse upon me, and not a blessing.

13 And his mother said unto him, Upon me be thy curse, my son: only obey my voice, and go fetch me them.

14 And he went, and fetched, and brought them to his mother: and his mother made savoury meat, such as his father loved.

15 And Rebekah took goodly raiment of her eldest son Esau, which were with her in the house, and put them upon Jacob her younger son:

16 And she put the skins of the kids of the goats upon his hands, and upon the smooth of his neck:

17 And she gave the savoury meat and the bread, which she had prepared, into the hand of her son Jacob.

Julia’s post doesn’t cover the hard case of passing Jacob off as Esau up close, but in a battlefield environment, the equivalent of mistaking a panda for a vulture may be good enough.

The primary distinction that any autonomous weapons system must make is the friendly/enemy distinction. The term “friendly fire” was coined to cover cases where human directed weapons systems fail to make that distinction correctly.

The historical rate of “friendly fire,” or fratricide, is 2%, but Mark Thompson reports in The Curse of Friendly Fire that the actual fratricide rate in the 1991 Gulf War was 24%.

#Juniper, just to name one recent federal government software failure, is evidence that robustness isn’t an enforced requirement for government software.

Apply that lack of requirements to neural networks in autonomous weapons platforms and you have the potential for both developing and defeating autonomous weapons systems.

Julia’s post leaves you a long way from defeating an autonomous weapons platform but it is a good starting place.

PS: Defeating military grade neural networks will be good training for defeating more sophisticated ones used by commercial entities.

Amateur Discovery Confirmed by NASA

Friday, October 30th, 2015

NASA Adds to Evidence of Mysterious Ancient Earthworks by Ralph Blumenthal.

From the post:

High in the skies over Kazakhstan, space-age technology has revealed an ancient mystery on the ground.

Satellite pictures of a remote and treeless northern steppe reveal colossal earthworks — geometric figures of squares, crosses, lines and rings the size of several football fields, recognizable only from the air and the oldest estimated at 8,000 years old.

The largest, near a Neolithic settlement, is a giant square of 101 raised mounds, its opposite corners connected by a diagonal cross, covering more terrain than the Great Pyramid of Cheops. Another is a kind of three-limbed swastika, its arms ending in zigzags bent counterclockwise.

Described last year at an archaeology conference in Istanbul as unique and previously unstudied, the earthworks, in the Turgai region of northern Kazakhstan, number at least 260 — mounds, trenches and ramparts — arrayed in five basic shapes.

Spotted on Google Earth in 2007 by a Kazakh economist and archaeology enthusiast, Dmitriy Dey, the so-called Steppe Geoglyphs remain deeply puzzling and largely unknown to the outside world.

Two weeks ago, in the biggest sign so far of official interest in investigating the sites, NASA released clear satellite photographs of some of the figures from about 430 miles up.

More evidence you don’t need to be a globe trotter to make major discoveries!

A few of the satellite resources I have blogged about for your use: Free Access to EU Satellite Data, Planet Platform Beta & Open California:…, Skybox: A Tool to Help Investigate Environmental Crime.

Good luck!

What a Deep Neural Network thinks about your #selfie

Sunday, October 25th, 2015

What a Deep Neural Network thinks about your #selfie by Andrej Karpathy.

From the post:

Convolutional Neural Networks are great: they recognize things, places and people in your personal photos, signs, people and lights in self-driving cars, crops, forests and traffic in aerial imagery, various anomalies in medical images and all kinds of other useful things. But once in a while these powerful visual recognition models can also be warped for distraction, fun and amusement. In this fun experiment we’re going to do just that: We’ll take a powerful, 140-million-parameter state-of-the-art Convolutional Neural Network, feed it 2 million selfies from the internet, and train it to classify good selfies from bad ones. Just because it’s easy and because we can. And in the process we might learn how to take better selfies 🙂

A must read for anyone interested in deep neural networks and image recognition!

Selfies provide abundant and amusing data to illustrate neural network techniques that are being used every day.

Andrej provides numerous pointers to additional materials and references on neural networks. Good thing, considering how much interest his post is going to generate!

“The first casualty, when war comes, is truth”

Thursday, October 22nd, 2015

The quote, “The first casualty, when war comes, is truth,” is commonly attributed to Hiram Johnson, a Republican politician from California, in 1917. Johnson died on August 6, 1945, the day the United States dropped an atomic bomb on Hiroshima.

The ARCADE: Artillery Crater Analysis and Detection Engine is an effort to make it possible for anyone to rescue bits of the truth, even during war, at least with regard to the use of military ordnance.

From the post:

Destroyed buildings and infrastructure, temporary settlements, terrain disturbances and other signs of conflict can be seen in freely available satellite imagery. The ARtillery Crater Analysis and Detection Engine (ARCADE) is experimental computer vision software developed by Rudiment and the Centre for Visual Computing at the University of Bradford. ARCADE examines satellite imagery for signs of artillery bombardment, calculates the location of artillery craters, the inbound trajectory of projectiles to aid identification of their possible origins of fire. An early version of the tool that demonstrates the core capabilities is available here.

The software currently runs on Windows with MATLAB, but if there is enough interest, it could be ported to an open toolset built around OpenCV.
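To give a feel for what such a port might start from, here is a minimal OpenCV sketch that looks for roughly circular features in a satellite tile. It is an illustration only, not ARCADE’s actual algorithm, and the filename and parameters are made up:

```python
import cv2
import numpy as np

tile = cv2.imread("satellite_tile.png", cv2.IMREAD_GRAYSCALE)
blurred = cv2.medianBlur(tile, 5)             # knock down speckle noise

circles = cv2.HoughCircles(
    blurred, cv2.HOUGH_GRADIENT, dp=1.2, minDist=20,
    param1=100, param2=30, minRadius=3, maxRadius=40,
)

if circles is not None:
    for x, y, r in np.round(circles[0]).astype(int):
        print(f"candidate crater at ({x}, {y}), radius ~{r} px")
```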

Everyone who is interested in military actions anywhere in the world should be a supporter of this project.

Given the poverty of Western reporting on bombings by the United States government around the world, I am very interested in the success of this project.

The post is a great introduction to the difficulties and potential uses of satellite data to uncover truths governments would prefer to remain hidden. That alone should be enough justification for supporting this project.

Skybox: A Tool to Help Investigate Environmental Crime

Saturday, October 10th, 2015

Skybox: A Tool to Help Investigate Environmental Crime by Kristine M. Gutterød & Emilie Gjengedal Vatnøy.

From the post:

Today public companies have to provide reports with data, while many private companies do not have to provide anything. Most companies within the oil, gas and mining sector are private, and to get information can be both expensive and time-consuming.

Skybox is a new developing tool used to extract information from an otherwise private industry. Using moving pictures on ground level—captured by satellites—you can monitor different areas up close.

“You can dig into the details and get more valuable and action-filled information for people both in the public and private sector,” explained Patrick Dunagan, strategic partnerships manager at Google, who worked in developing Skybox.

The satellite images can be useful when investigating environmental crime because you can monitor different companies, for example the change in the number of vehicles approaching or leaving a property, as well as environmental changes in the world.

Excellent news!

Hopefully Skybox will include an option to link in ground level photographs that can identify license plates and take photos of drivers.

Using GPS coordinates with time data, activists will have a means of detecting illegal and/or new dumping sites for surveillance.

Couple that with license plate data and the noose starts to tighten on environmental violators.

You will still need to pierce the shell corporations and follow links to state and local authorities but catching the physical dumpers is a first step.
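Monitoring “the change in the number of vehicles approaching or leaving a property” is, at its crudest, image differencing. A minimal sketch with OpenCV, assuming two co-registered grayscale images of the same site (filenames hypothetical):

```python
import cv2

before = cv2.imread("site_2015_06.png", cv2.IMREAD_GRAYSCALE)
after = cv2.imread("site_2015_07.png", cv2.IMREAD_GRAYSCALE)

diff = cv2.absdiff(before, after)                        # pixel-wise change
_, changed = cv2.threshold(diff, 40, 255, cv2.THRESH_BINARY)

contours, _ = cv2.findContours(changed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
regions = [c for c in contours if cv2.contourArea(c) > 100]  # ignore pixel noise
print(f"{len(regions)} changed regions worth a closer look")
```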

I’m a bird watcher, I’m a bird watcher, here comes one now…

Thursday, June 18th, 2015

New website can identify birds using photos

From the post:

In a breakthrough for computer vision and for bird watching, researchers and bird enthusiasts have enabled computers to achieve a task that stumps most humans—identifying hundreds of bird species pictured in photos.

The bird photo identifier, developed by the Visipedia research project in collaboration with the Cornell Lab of Ornithology, is available for free at: AllAboutBirds.org/photoID.

Results will be presented by researchers from Cornell Tech and the California Institute of Technology at the Computer Vision and Pattern Recognition (CVPR) conference in Boston on June 8, 2015.

Called Merlin Bird Photo ID, the identifier is capable of recognizing 400 of the most commonly encountered birds in the United States and Canada.

“It gets the bird right in the top three results about 90% of the time, and it’s designed to keep improving the more people use it,” said Jessie Barry at the Cornell Lab of Ornithology. “That’s truly amazing, considering that the computer vision community started working on the challenge of bird identification only a few years ago.”

The perfect website for checking photos of birds taken on summer vacation and an impressive feat of computer vision.

The more the service is used, the better it gets. Upload your vacation bird pics today!
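For the curious, “gets the bird right in the top three results about 90% of the time” is just top-3 accuracy. A toy sketch of the metric (the species lists below are made up):

```python
def top_k_accuracy(ranked_guesses, truth, k=3):
    # Count photos whose true species appears in the top-k ranked guesses.
    hits = sum(1 for guesses, actual in zip(ranked_guesses, truth)
               if actual in guesses[:k])
    return hits / len(truth)

guesses = [["Blue Jay", "Steller's Jay", "Scrub Jay"],
           ["House Finch", "Purple Finch", "Cassin's Finch"],
           ["American Crow", "Common Raven", "Fish Crow"]]
truth = ["Blue Jay", "Cassin's Finch", "Northern Mockingbird"]

print(round(top_k_accuracy(guesses, truth), 2))   # 2 of 3 photos -> 0.67
```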

CVPR 2015 Papers

Sunday, June 14th, 2015

CVPR [Computer Vision and Pattern Recognition] 2015 Papers by @karpathy.

This is very cool!

From the webpage:

Below every paper are the TOP 100 most-occurring words in that paper and their color is based on an LDA topic model with k = 7.
(It looks like 0 = datasets?, 1 = deep learning, 2 = videos, 3 = 3D Computer Vision, 4 = optimization?, 5 = low-level Computer Vision?, 6 = descriptors?)

You can sort by LDA topics, view the PDFs, rank the other papers by tf-idf similarity to a particular paper.

Very impressive and suggestive of other refinements for viewing a large number of papers in a given area.
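The two pieces behind the page, an LDA topic model with k = 7 and tf-idf similarity ranking, are easy to reproduce on your own paper collection. A minimal sketch with scikit-learn, using a hypothetical handful of abstracts (a real run would use the full CVPR set):

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

papers = [
    "deep convolutional networks for large scale image classification",
    "dense 3d scene reconstruction from stereo video sequences",
    "optical flow features for action recognition in video",
]

# Topic assignment, as in the color-coding described above.
counts = CountVectorizer(stop_words="english").fit_transform(papers)
lda = LatentDirichletAllocation(n_components=7, random_state=0).fit(counts)
topic_of = lda.transform(counts).argmax(axis=1)

# tf-idf similarity ranking against paper 0.
tfidf = TfidfVectorizer(stop_words="english").fit_transform(papers)
sims = cosine_similarity(tfidf[0], tfidf).ravel()

print("topics:", topic_of)
print("similarity to paper 0:", sims.round(2))
```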

Enjoy!

FaceNet: A Unified Embedding for Face Recognition and Clustering

Saturday, March 21st, 2015

FaceNet: A Unified Embedding for Face Recognition and Clustering by Florian Schroff, Dmitry Kalenichenko and James Philbin.

Abstract:

Despite significant recent advances in the field of face recognition, implementing face verification and recognition efficiently at scale presents serious challenges to current approaches. In this paper we present a system, called FaceNet, that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity. Once this space has been produced, tasks such as face recognition, verification and clustering can be easily implemented using standard techniques with FaceNet embeddings as feature vectors.

Our method uses a deep convolutional network trained to directly optimize the embedding itself, rather than an intermediate bottleneck layer as in previous deep learning approaches. To train, we use triplets of roughly aligned matching / non-matching face patches generated using a novel online triplet mining method. The benefit of our approach is much greater representational efficiency: we achieve state-of-the-art face recognition performance using only 128-bytes per face.

On the widely used Labeled Faces in the Wild (LFW) dataset, our system achieves a new record accuracy of 99.63%. On YouTube Faces DB it achieves 95.12%. Our system cuts the error rate in comparison to the best published result by 30% on both datasets. (emphasis in the original)
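The verification idea at the heart of the abstract is simple to sketch: two faces match when their embeddings are close in Euclidean distance. The code below assumes you already have 128-dimensional FaceNet-style embeddings; the vectors and the threshold are made up for illustration:

```python
import numpy as np

def same_person(emb_a, emb_b, threshold=1.0):
    # Faces "match" when their embeddings fall within the distance threshold.
    return np.linalg.norm(emb_a - emb_b) < threshold

rng = np.random.default_rng(0)
anchor = rng.normal(size=128)                      # stand-in for a real embedding
same = anchor + rng.normal(scale=0.05, size=128)   # nearby point = same face
different = rng.normal(size=128)                   # far away = different face

print(same_person(anchor, same))        # True
print(same_person(anchor, different))   # False
```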

With accuracy at 99.63%, the possibilities are nearly endless. 😉

How long will it be before some start-up is buying ATM feeds from banks? Fast and accurate location information would be of interest to process servers, law enforcement, debt collectors, various government agencies, etc.

Looking a bit further ahead, ATM surrogate services will become a feature of better hotels and escort services.

Convolutional Neural Networks for Visual Recognition

Friday, March 20th, 2015

Convolutional Neural Networks for Visual Recognition by Fei-Fei Li and Andrej Karpathy.

From the description:

Computer Vision has become ubiquitous in our society, with applications in search, image understanding, apps, mapping, medicine, drones, and self-driving cars. Core to many of these applications are visual recognition tasks such as image classification, localization and detection. Recent developments in neural network (aka “deep learning”) approaches have greatly advanced the performance of these state-of-the-art visual recognition systems. This course is a deep dive into details of the deep learning architectures with a focus on learning end-to-end models for these tasks, particularly image classification. During the 10-week course, students will learn to implement, train and debug their own neural networks and gain a detailed understanding of cutting-edge research in computer vision. The final assignment will involve training a multi-million parameter convolutional neural network and applying it on the largest image classification dataset (ImageNet). We will focus on teaching how to set up the problem of image recognition, the learning algorithms (e.g. backpropagation), practical engineering tricks for training and fine-tuning the networks and guide the students through hands-on assignments and a final course project. Much of the background and materials of this course will be drawn from the ImageNet Challenge.

Be sure to check out the course notes!

A very nice companion for your DIGITS experiments over the weekend.

I first saw this in a tweet by Lasse.

Show and Tell: A Neural Image Caption Generator

Sunday, November 23rd, 2014

Show and Tell: A Neural Image Caption Generator by Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan.

Abstract:

Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In this paper, we present a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image. The model is trained to maximize the likelihood of the target description sentence given the training image. Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions. Our model is often quite accurate, which we verify both qualitatively and quantitatively. For instance, while the current state-of-the-art BLEU score (the higher the better) on the Pascal dataset is 25, our approach yields 59, to be compared to human performance around 69. We also show BLEU score improvements on Flickr30k, from 55 to 66, and on SBU, from 19 to 27.

Another caption-generating program for images (see also Deep Visual-Semantic Alignments for Generating Image Descriptions). Not quite up to the performance of a human observer, but quite respectable. The near misses are amusing enough that crowd correction could be an element in a full-blown system.

Perhaps “rough recognition” is close enough for some purposes: searching images for people who match a partial description and producing a much smaller set for additional processing.

I first saw this in Nat Torkington’s Four short links: 18 November 2014.

Deep Visual-Semantic Alignments for Generating Image Descriptions

Friday, November 21st, 2014

Deep Visual-Semantic Alignments for Generating Image Descriptions by Andrej Karpathy and Li Fei-Fei.

From the webpage:

We present a model that generates free-form natural language descriptions of image regions. Our model leverages datasets of images and their sentence descriptions to learn about the inter-modal correspondences between text and visual data. Our approach is based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding. We then describe a Recurrent Neural Network architecture that uses the inferred alignments to learn to generate novel descriptions of image regions. We demonstrate the effectiveness of our alignment model with ranking experiments on Flickr8K, Flickr30K and COCO datasets, where we substantially improve on the state of the art. We then show that the sentences created by our generative model outperform retrieval baselines on the three aforementioned datasets and a new dataset of region-level annotations.

Excellent examples with generated text. Code and other predictions “coming soon.”

For the moment you can also read the research paper: Deep Visual-Semantic Alignments for Generating Image Descriptions

Serious potential in any event but even more so if the semantics of the descriptions could be captured and mapped across natural languages.

Guess the Manuscript XVI

Saturday, November 1st, 2014

Guess the Manuscript XVI

From the post:

Welcome to the sixteenth instalment of our popular Guess the Manuscript series. The rules are simple: we post an image of part of a manuscript that is on the British Library’s Digitised Manuscripts site, you guess which one it’s taken from!

[Image: detail from the unidentified British Library manuscript]

Are you as surprised as we are to find an umbrella in a medieval manuscript? The manuscript from which this image was taken will feature in a blogpost in the near future.

In the meantime, answers or guesses please in the comments below, or via Twitter @BLMedieval.

Caution! The Medieval Period lasted from five hundred (500) C.E. until fifteen hundred (1500) C.E. Google NGrams records the first use of “umbrella” at or around sixteen-sixty (1660). Is this an “umbrella” or something else?

Using Google’s reverse image search found only repostings of the image search challenge, no similar images. Not sure that helps but was worth a try.

On the bright side, there are only two hundred and fifty-seven (257) manuscripts in the digitized collection dated between five hundred (500) C.E. and fifteen hundred (1500) C.E.

What stories or information can be found in those volumes that might be accompanied by such an image? Need to create a list of the classes of those manuscripts.

Suggestions? Is there an image processor in the house?

Enjoy!

50 Face Recognition APIs

Friday, October 24th, 2014

50 Face Recognition APIs by Mirko Krivanek.

Interesting listing published on Mashape. Only the top 12 are listed below. It would be nice to have a separate blog for voice recognition APIs. I’ve been thinking about using voice, rather than a passport or driving license, as a more secure ID. The voice has a texture unique to each individual.

Subjects that are likely to be of interest!

Mirko mentions voice but then lists face recognition APIs.

Voice comes up in a mixture of APIs in: 37 Recognition APIS: AT&T SPEECH, Moodstocks and Rekognition by Matthew Scott.

I first saw this in a tweet by Andrea Mostosi.

Extracting images from scanned book pages

Monday, September 1st, 2014

Extracting images from scanned book pages by Chris Adams.

From the post:

I work on a project which has placed a number of books online. Over the years we’ve improved server performance and worked on a fast, responsive viewer for scanned books to make our books as accessible as possible but it’s still challenging to help visitors find something of interest out of hundreds of thousands of scanned pages.

Trevor and I have discussed various ways to improve the situation and one idea which seemed promising was seeing how hard it would be to extract the images from digitized pages so we could present a visual index of an item. Trevor’s THATCamp CHNM post on Freeing Images from Inside Digitized Books and Newspapers got a favorable reception and since it kept coming up at work I decided to see how far I could get using OpenCV.

Everything you see below is open-source and comments are highly welcome. I created a book-illustration-detection branch in my image mining project (see my previous experiment reconstructing higher-resolution thumbnails from the masters) so feel free to fork it or open issues.

Just in case you are looking for a Fall project. 😉
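If you do pick it up as a project, one possible starting point (an assumption on my part, not necessarily Chris’s pipeline) is to threshold the scanned page, merge illustration regions into blobs, and save the large bounding boxes:

```python
import cv2

page = cv2.imread("scanned_page.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(page, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Dilate so illustrations become solid blobs, then keep only the big ones.
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 15))
blobs = cv2.dilate(binary, kernel)
contours, _ = cv2.findContours(blobs, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

min_area = 0.05 * page.shape[0] * page.shape[1]    # skip small text blocks
for i, c in enumerate(contours):
    x, y, w, h = cv2.boundingRect(c)
    if w * h > min_area:
        cv2.imwrite(f"illustration_{i}.png", page[y:y + h, x:x + w])
```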

Consider capturing the images and their contents in associations with authors, publishers, etc. To enable mining those associations for patterns.

Large-Scale Object Classification…

Saturday, August 23rd, 2014

Large-Scale Object Classification using Label Relation Graphs by Jia Deng, et al.

Abstract:

In this paper we study how to perform object classification in a principled way that exploits the rich structure of real world labels. We develop a new model that allows encoding of flexible relations between labels. We introduce Hierarchy and Exclusion (HEX) graphs, a new formalism that captures semantic relations between any two labels applied to the same object: mutual exclusion, overlap and subsumption. We then provide rigorous theoretical analysis that illustrates properties of HEX graphs such as consistency, equivalence, and computational implications of the graph structure. Next, we propose a probabilistic classification model based on HEX graphs and show that it enjoys a number of desirable properties. Finally, we evaluate our method using a large-scale benchmark. Empirical results demonstrate that our model can significantly improve object classification by exploiting the label relations.

Let’s hear it for “real world labels!”

By which the authors mean:

  • An object can have more than one label.
  • There are relationships between labels.

From the introduction:

We first introduce Hierarchy and Exclusion (HEX) graphs, a new formalism allowing flexible specification of relations between labels applied to the same object: (1) mutual exclusion (e.g. an object cannot be dog and cat), (2) overlapping (e.g. a husky may or may not be a puppy and vice versa), and (3) subsumption (e.g. all huskies are dogs). We provide theoretical analysis on properties of HEX graphs such as consistency, equivalence, and computational implications.

Next, we propose a probabilistic classification model leveraging HEX graphs. In particular, it is a special type of Conditional Random Field (CRF) that encodes the label relations as pairwise potentials. We show that this model enjoys a number of desirable properties, including flexible encoding of label relations, predictions consistent with label relations, efficient exact inference for typical graphs, learning labels with varying specificity, knowledge transfer, and unification of existing models.
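The three relation types are concrete enough to sketch as a simple consistency check over an edge list. This is an illustration of the constraints only, not the authors’ CRF model:

```python
# HEX-style constraints over labels (toy edge lists from the paper's examples).
EXCLUSION = {("dog", "cat")}                 # cannot both apply to one object
SUBSUMPTION = {("husky", "dog")}             # husky implies dog
# "husky"/"puppy" overlap: no edge, so no constraint between them.

def consistent(labels):
    labels = set(labels)
    for a, b in EXCLUSION:
        if a in labels and b in labels:
            return False
    for child, parent in SUBSUMPTION:
        if child in labels and parent not in labels:
            return False
    return True

print(consistent({"husky", "dog", "puppy"}))   # True
print(consistent({"husky"}))                   # False: husky without dog
print(consistent({"dog", "cat"}))              # False: mutually exclusive
```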

Having more than one label is trivially possible in topic maps. The more interesting case is the authors choosing to treat semantic labels as subjects and to define permitted associations between those subjects.

A world of possibilities opens up when you can treat something as a subject that can have relationships defined to other subjects. Noting that those relationships can also be treated as subjects should someone desire to do so.

I first saw this at: Is that husky a puppy?

AverageExplorer:…

Sunday, August 17th, 2014

AverageExplorer: Interactive Exploration and Alignment of Visual Data Collections, Jun-Yan Zhu, Yong Jae Lee, and Alexei Efros.

Abstract:

This paper proposes an interactive framework that allows a user to rapidly explore and visualize a large image collection using the medium of average images. Average images have been gaining popularity as means of artistic expression and data visualization, but the creation of compelling examples is a surprisingly laborious and manual process. Our interactive, real-time system provides a way to summarize large amounts of visual data by weighted average(s) of an image collection, with the weights reflecting user-indicated importance. The aim is to capture not just the mean of the distribution, but a set of modes discovered via interactive exploration. We pose this exploration in terms of a user interactively “editing” the average image using various types of strokes, brushes and warps, similar to a normal image editor, with each user interaction providing a new constraint to update the average. New weighted averages can be spawned and edited either individually or jointly. Together, these tools allow the user to simultaneously perform two fundamental operations on visual data: user-guided clustering and user-guided alignment, within the same framework. We show that our system is useful for various computer vision and graphics applications.

Applying averaging to images, particularly in an interactive context with users, seems like a very suitable strategy.
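The core operation is just a weighted mean over aligned images. A minimal sketch with NumPy and Pillow, where the filenames and weights stand in for user-indicated importance:

```python
import numpy as np
from PIL import Image

paths = ["face_001.jpg", "face_002.jpg", "face_003.jpg"]
weights = np.array([0.2, 0.5, 0.3])           # user-indicated importance

# Stack same-sized RGB images into shape (n, H, W, 3) and blend.
stack = np.stack([np.asarray(Image.open(p).convert("RGB"), dtype=np.float64)
                  for p in paths])
average = np.tensordot(weights / weights.sum(), stack, axes=1)

Image.fromarray(average.astype(np.uint8)).save("weighted_average.png")
```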

What would it look like to have interactive merging of proxies based on data ranges controlled by the user?

Cat Dataset

Monday, July 28th, 2014

Cat Dataset

[Image: sample cat image from the dataset]

From the description:

The CAT dataset includes 10,000 cat images. For each image, we annotate the head of cat with nine points, two for eyes, one for mouth, and six for ears. The detail configuration of the annotation was shown in Figure 6 of the original paper:

Weiwei Zhang, Jian Sun, and Xiaoou Tang, “Cat Head Detection – How to Effectively Exploit Shape and Texture Features”, Proc. of European Conf. Computer Vision, vol. 4, pp.802-816, 2008.

A more accessible copy: Cat Head Detection – How to Effectively Exploit Shape and Texture Features
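A sketch of reading one of the dataset’s annotation files, assuming the commonly described .cat layout of a point count followed by x y pairs on a single line; verify the exact format and point order against the dataset’s documentation and Figure 6:

```python
def read_cat_annotation(path):
    # Returns the annotated head points: two eyes, one mouth, six ear points.
    with open(path) as f:
        values = [int(v) for v in f.read().split()]
    n_points = values[0]                      # should be 9 for this dataset
    coords = values[1:1 + 2 * n_points]
    return list(zip(coords[0::2], coords[1::2]))

points = read_cat_annotation("00000001_000.jpg.cat")   # hypothetical filename
print(len(points), "points:", points)
```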

Prelude to a cat filter for Twitter feeds? 😉

I first saw this in a tweet by Basile Simon.

What is deep learning, and why should you care?

Saturday, July 19th, 2014

What is deep learning, and why should you care? by Pete Warden.

From the post:

[Image: neuron illustration]

When I first ran across the results in the Kaggle image-recognition competitions, I didn’t believe them. I’ve spent years working with machine vision, and the reported accuracy on tricky tasks like distinguishing dogs from cats was beyond anything I’d seen, or imagined I’d see anytime soon. To understand more, I reached out to one of the competitors, Daniel Nouri, and he demonstrated how he used the Decaf open-source project to do so well. Even better, he showed me how he was quickly able to apply it to a whole bunch of other image-recognition problems we had at Jetpac, and produce much better results than my conventional methods.

I’ve never encountered such a big improvement from a technique that was largely unheard of just a couple of years before, so I became obsessed with understanding more. To be able to use it commercially across hundreds of millions of photos, I built my own specialized library to efficiently run prediction on clusters of low-end machines and embedded devices, and I also spent months learning the dark arts of training neural networks. Now I’m keen to share some of what I’ve found, so if you’re curious about what on earth deep learning is, and how it might help you, I’ll be covering the basics in a series of blog posts here on Radar, and in a short upcoming ebook.

Pete gives a brief sketch of “deep learning” and promises more posts and a short ebook to follow.

Along those same lines you will want to see:

Microsoft Challenges Google’s Artificial Brain With ‘Project Adam’ by Daniela Hernandez (WIRED).

If you want in depth (technical) coverage, see: Deep Learning…moving beyond shallow machine learning since 2006! The reading list and references here should keep you busy for some time.

BTW, on “…shallow machine learning…” you do know the “Dark Ages” really weren’t “dark” but were so named in the Renaissance in order to show the fall into darkness (the Fall of Rome), the “Dark Ages,” and then the return of “light” in the Renaissance? See: Dark Ages (historiography).

Don’t overly credit characterizations of ages or technologies by later ages or newer technologies. They too will be found primitive and superstitious.

Deep Belief in Javascript

Wednesday, March 26th, 2014

Deep Belief in Javascript

From the webpage:

It’s an implementation of the Krizhevsky convolutional neural network architecture for object recognition in images, running entirely in the browser using Javascript and WebGL!

I built it so people can easily experiment with a classic deep belief approach to image recognition themselves, to understand both its limitations and its power, and to demonstrate that the algorithms are usable even in very restricted client-side environments like web browsers.

A very impressive demonstration of the power of Javascript to say nothing of neural networks.

You can submit your own images for “recognition.”

I first saw this in Nat Torkington’s Four short links: 24 March 2014.

Office Lens Is a Snap (Point and Map?)

Monday, March 17th, 2014

Office Lens Is a Snap

From the post:

The moment mobile-phone manufacturers added cameras to their devices, they stopped being just mobile phones. Not only have lightweight phone cameras made casual photography easy and spontaneous, they also have changed the way we record our lives. Now, with help from Microsoft Research, the Office team is out to change how we document our lives in another way—with the Office Lens app for Windows Phone 8.

Office Lens, now available in the Windows Phone Store, is one of the first apps to use the new OneNote Service API. The app is simple to use: Snap a photo of a document or a whiteboard, and upload it to OneNote, which stores the image in the cloud. If there is text in the uploaded image, OneNote’s cloud-based optical character-recognition (OCR) software turns it into editable, searchable text. Office Lens is like having a scanner in your back pocket. You can take photos of recipes, business cards, or even a whiteboard, and Office Lens will enhance the image and put it into your OneNote Quick Notes for reference or collaboration. OneNote can be downloaded for free.

Less than five (5) years ago, every automated process in Office Lens would have been a configurable setting.

Today, it’s just point and shoot.

There is an interface lesson for topic maps in the Office Lens interface.

Some people will need the Office Lens API. But, the rest of us, just want to take a picture of the whiteboard (or some other display). Automatic storage and OCR are welcome added benefits.

What about a topic map authoring interface that looks a lot like MS Word™ or Open Office. A topic map is loaded much like a spelling dictionary. When the user selects “map-it,” links are inserted that point into the topic map.

Hover over such a link and data from the topic map is displayed. Can be printed, annotated, etc.

One possible feature would be a “subject check,” which displays the subjects “recognized” in the document, to enable the author to correct any recognition errors.

In case you are interested, I can point you to some open source projects that have general authoring interfaces. 😉

PS: If you have a Windows phone, can you check out Office Lens for me? I am still sans a cellphone of any type. Since I don’t get out of the yard, a cellphone doesn’t make much sense. But I do miss out on the latest cellphone technology. Thanks!