Archive for the ‘Image Recognition’ Category

Adversarial Learning Market Opportunity

Sunday, December 24th, 2017

The Pentagon’s New Artificial Intelligence Is Already Hunting Terrorists by Marcus Weisgerber.

From the post:

Earlier this month at an undisclosed location in the Middle East, computers using special algorithms helped intelligence analysts identify objects in a video feed from a small ScanEagle drone over the battlefield.

A few days into the trials, the computer identified objects – people, cars, types of building – correctly about 60 percent of the time. Just over a week on the job – and a handful of on-the-fly software updates later – the machine’s accuracy improved to around 80 percent. Next month, when its creators send the technology back to war with more software and hardware updates, they believe it will become even more accurate.

It’s an early win for a small team of just 12 people who started working on the project in April. Over the next year, they plan to expand the project to help automate the analysis of video feeds coming from large drones – and that’s just the beginning.

“What we’re setting the stage for is a future of human-machine teaming,” said Air Force Lt. Gen. John N.T. “Jack” Shanahan, director for defense intelligence for warfighter support, the Pentagon general who is overseeing the effort. Shanahan believes the concept will revolutionize the way the military fights.

So you will recognize Air Force Lt. Gen. John N.T. “Jack” Shanahan (Nvidia conference):

From the Nvidia conference:

Don’t change the culture. Unleash the culture.

That was the message one young officer gave Lt. General John “Jack” Shanahan — the Pentagon’s director for defense for warfighter support — who is hustling to put artificial intelligence and machine learning to work for the U.S. Defense Department.

Highlighting the growing role AI is playing in security, intelligence and defense, Shanahan spoke Wednesday during a keynote address about his team’s use of GPU-driven deep learning at our GPU Technology Conference in Washington.

Shanahan leads Project Maven, an effort launched in April to put machine learning and AI to work, starting with efforts to turn the countless hours of aerial video surveillance collected by the U.S. military into actionable intelligence.

There are at least two market opportunity for adversarial learning. The most obvious one is testing a competitor’s algorithm so it performs less well than yours on “… people, cars, types of building….”

The less obvious market requires US sales of AI-enabled weapon systems to its client states. Client states have an interest in verifying the quality of AI-enabled weapon systems, not to mention non-client states who will be interested in defeating such systems.

For any of those markets, weaponizing adversarial learning and developing a reputation for the same can’t start too soon. Is your anti-AI research department hiring?

Is it a vehicle? A helicopter? No, it’s a rifle! Messing with Machine Learning

Wednesday, December 20th, 2017

Partial Information Attacks on Real-world AI

From the post:

We’ve developed a query-efficient approach for finding adversarial examples for black-box machine learning classifiers. We can even produce adversarial examples in the partial information black-box setting, where the attacker only gets access to “scores” for a small number of likely classes, as is the case with commercial services such as Google Cloud Vision (GCV).

The post is a quick read (est. 2 minutes) with references but you really need to see:

Query-efficient Black-box Adversarial Examples by Andrew Ilyas, Logan Engstrom, Anish Athalye, Jessy Lin.


Current neural network-based image classifiers are susceptible to adversarial examples, even in the black-box setting, where the attacker is limited to query access without access to gradients. Previous methods — substitute networks and coordinate-based finite-difference methods — are either unreliable or query-inefficient, making these methods impractical for certain problems.

We introduce a new method for reliably generating adversarial examples under more restricted, practical black-box threat models. First, we apply natural evolution strategies to perform black-box attacks using two to three orders of magnitude fewer queries than previous methods. Second, we introduce a new algorithm to perform targeted adversarial attacks in the partial-information setting, where the attacker only has access to a limited number of target classes. Using these techniques, we successfully perform the first targeted adversarial attack against a commercially deployed machine learning system, the Google Cloud Vision API, in the partial information setting.

The paper contains this example:

How does it go? Seeing is believing!

Defeating image classifiers will be an exploding market for jewel merchants, bankers, diplomats, and others with reasons to avoid being captured by modern image classification systems.

Visual Domain Decathlon

Thursday, December 14th, 2017

Visual Domain Decathlon

From the webpage:

The goal of this challenge is to solve simultaneously ten image classification problems representative of very different visual domains. The data for each domain is obtained from the following image classification benchmarks:

  1. ImageNet [6].
  2. CIFAR-100 [2].
  3. Aircraft [1].
  4. Daimler pedestrian classification [3].
  5. Describable textures [4].
  6. German traffic signs [5].
  7. Omniglot. [7]
  8. SVHN [8].
  9. UCF101 Dynamic Images [9a,9b].
  10. VGG-Flowers [10].

The union of the images from the ten datasets is split in training, validation, and test subsets. Different domains contain different image categories as well as a different number of images.

The task is to train the best possible classifier to address all ten classification tasks using the training and validation subsets, apply the classifier to the test set, and send us the resulting annotation file for assessment. The winner will be determined based on a weighted average of the classification performance on each domain, using the scoring scheme described below. At test time, your model is allowed to know the ground-truth domain of each test image (ImageNet, CIFAR-100, …) but, of course, not its category.

It is up to you to make use of the data, and you can either train a single model for all tasks or ten independent ones. However, you are not allowed to use any external data source for training. Furthermore, we ask you to report the overall size of the model(s) used.

The competition is over but you can continue to submit results and check the results in the leaderboard. (There’s an idea that merits repetition.)

Will this be your entertainment game for the holidays?


Is That a Turtle in Your Pocket or Are You Just Glad To See Me?

Thursday, November 9th, 2017

Apologies to Mae West for spoiling her famous line from Sexette:

Is that a gun in your pocket, or are you just glad to see me?

Seems appropriate since Anish Athalye, Logan Engstrom, Andrew Ilyas, and Kevin Kwok have created a 3-D turtle that is mistaken by neural networks as a rifle.

You can find the details in: Synthesizing Robust Adversarial Examples.


Neural network-based classifiers parallel or exceed human-level accuracy on many common tasks and are used in practical systems. Yet, neural networks are susceptible to adversarial examples, carefully perturbed inputs that cause networks to misbehave in arbitrarily chosen ways. When generated with standard methods, these examples do not consistently fool a classifier in the physical world due to viewpoint shifts, camera noise, and other natural transformations. Adversarial examples generated using standard techniques require complete control over direct input to the classifier, which is impossible in many real-world systems.

We introduce the first method for constructing real-world 3D objects that consistently fool a neural network across a wide distribution of angles and viewpoints. We present a general-purpose algorithm for generating adversarial examples that are robust across any chosen distribution of transformations. We demonstrate its application in two dimensions, producing adversarial images that are robust to noise, distortion, and affine transformation. Finally, we apply the algorithm to produce arbitrary physical 3D-printed adversarial objects, demonstrating that our approach works end-to-end in the real world. Our results show that adversarial examples are a practical concern for real-world systems.

All in good fun until you remember neural networks feed classification decisions to humans who make fire/no fire decisions and soon, fire/no fire decisions will be made by autonomous systems. Errors in classification decisions such as turtle vs. rifle will have deadly results.

What are the stakes in your neural net classification system? How easily can it be fooled by adversaries?

Cheap Tracking of Public Officials/Police

Thursday, October 12th, 2017

The use of license plate readers by law enforcement and others is on the rise. Such readers record the location of your license plate at a particular time and place. They also relieve public bodies of large sums of money.

How I replicated an $86 million project in 57 lines of code by Tait Brown details how he used open source software to create a “…good enough…” license plate reader for far less than the ticket price of $86 million.

Brown has an amusing (read unrealistic) good Samaritan scenario for his less expensive/more extensive surveillance system:

While it’s easy to get caught up in the Orwellian nature of an “always on” network of license plate snitchers, there are many positive applications of this technology. Imagine a passive system scanning fellow motorists for an abductors car that automatically alerts authorities and family members to their current location and direction.

The Teslas vehicles are already brimming with cameras and sensors with the ability to receive OTA updates — imagine turning them into a virtual fleet of good samaritans. Ubers and Lyft drivers could also be outfitted with these devices to dramatically increase the coverage area.

Using open source technology and existing components, it seems possible to offer a solution that provides a much higher rate of return — for an investment much less than $86M.

The better use of Brown’s less expensive/more extensive surveillance system is tracking police and public official cars. Invite them to the gold fish bowl they have created for all the rest of us.

A great public data resource for testing testimony about the presence/absence of police officers at crime scenes, protests, long rides to the police station and public officials consorting with co-conspirators.

ACLU calls for government to monitor itself, reflect an unhealthy confidence in governmental integrity. Only a close watch on government by citizens enables governmental integrity.

3D Face Reconstruction from a Single Image

Monday, September 18th, 2017

3D Face Reconstruction from a Single Image by Aaron S. Jackson, Adrian Bulat, Vasileios Argyriou and Georgios Tzimiropoulos, Computer Vision Laboratory, The University of Nottingham.

From the webpage:

This is an online demo of our paper Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression. Take a look at our project website to read the paper and get the code. Please use a (close to) frontal image, or the face detector won’t see you (dlib)

Images and 3D reconstructions will be deleted within 20 minutes. They will not be used for anything other than this demo.

Very impressive!

You can upload your own image or use an example face.

Here’s an example I stole from Reza Zadeh:

This has all manner of interesting possibilities. 😉


PS: Torch7/MATLAB code for “Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression”

Deanonymizing the Past

Thursday, July 6th, 2017

What Ever Happened to All the Old Racist Whites from those Civil Rights Photos? by Johnny Silvercloud raises an interesting question but never considers it from a modern technology perspective.

Silvercloud includes this lunch counter image:

I count almost twenty (20) full or partial faces in this one image. Thousands if not hundreds of thousands of other images from the civil rights era capture similar scenes.

Then it occurred to me, unlike prior generations with volumes of photographs, populated by anonymous bystanders/perpetrators to/of infamous acts, we have the present capacity to deanonimize the past.

As a starting point, may I suggest Deep Face Recognition by Omkar M. Parkhi, Andrea Vedaldi, Andrew Zisserman, one of the more popular papers in this area, with 429 citations as of today (06 July 2017).


The goal of this paper is face recognition – from either a single photograph or from a set of faces tracked in a video. Recent progress in this area has been due to two factors: (i) end to end learning for the task using a convolutional neural network (CNN), and (ii) the availability of very large scale training datasets.

We make two contributions: first, we show how a very large scale dataset (2.6M images, over 2.6K people) can be assembled by a combination of automation and human in the loop, and discuss the trade off between data purity and time; second, we traverse through the complexities of deep network training and face recognition to present methods and procedures to achieve comparable state of the art results on the standard LFW and YTF face benchmarks.

That article was written in 2015 so consulting a 2017 summary update posted to Quora is advised for current details.

Banks, governments and others are using facial recognition for their own purposes, let’s also uses it to hold people responsible for their moral choices.

Moral choices at lunch counters, police riots, soldiers and camp guards from any number of countries and time periods, etc.


The Cartoon Bank

Friday, May 12th, 2017

The Cartoon Bank by the Condé Nast Collection.

While searching for a cartoon depicting Sean Spicer at a White House news briefing, I encountered The Cartoon Bank.

A great source of instantly recognized cartoons but I’m still searching for one I remember from decades ago. 😉

Turning Pixelated Faces Back Into Real Ones

Thursday, February 9th, 2017

Google’s neural networks turn pixelated faces back into real ones by John E. Dunn.

From the post:

Researchers at Google Brain have come up with a way to turn heavily pixelated images of human faces into something that bears a usable resemblance to the original subject.

In a new paper, the company’s researchers describe using neural networks put to work at two different ends of what should, on the face of it, be an incredibly difficult problem to solve: how to resolve a blocky 8 x 8 pixel images of faces or indoor scenes containing almost no information?

It’s something scientists in the field of super resolution (SR) have been working on for years, using techniques such as de-blurring and interpolation that are often not successful for this type of image. As the researchers put it:

When some details do not exist in the source image, the challenge lies not only in “deblurring” an image, but also in generating new image details that appear plausible to a human observer.

Their method involves getting the first “conditioning” neural network to resize 32 x 32 pixel images down to 8 x 8 pixels to see if that process can find a point at which they start to match the test image.

John raises a practical objection:

The obvious practical application of this would be enhancing blurry CCTV images of suspects. But getting to grips with real faces at awkward angles depends on numerous small details. Emphasise the wrong ones and police could end up looking for the wrong person.

True but John presumes the “suspects” are unknown. That’s true for the typical convenience store robbery on the 10 PM news but not so for “suspects” under intentional surveillance.

In those cases, multiple ground truth images from a variety of angles are likely to be available.

DigitalGlobe – Open Data Program [What About Government Disasters?]

Tuesday, January 31st, 2017

Open Data Program

From the post:

DigitalGlobe is committed to helping everyone See A Better World™ by providing accurate high-resolution satellite imagery to support disaster recovery in the wake of large-scale natural disasters.

We release open imagery for select sudden onset major crisis events, including pre-event imagery, post-event imagery and a crowdsourced damage assessment.

When crises occur, DigitalGlobe is committed to supporting the humanitarian community by providing critical and actionable information to assist response efforts. Associated imagery and crowdsourcing layers are released into the public domain under a Creative Commons 4.0 license, allowing for rapid use and easy integration with existing humanitarian response technologies.

Kudos to DigitalGlobe but what about government disasters?

Governments have spy satellites, image analysis corps and military trained to use multi-faceted data flow.

What of public releases for areas of conflict, Chechnya, West Bank/Gaza/Israel, etc.? To reduce the advantages of government?

That creates demand by government for the same product, plus DigitalGlobe advantages.

“It’s an ill wind that blows no good.”

Proposal to Reduce Privacy in New York City

Sunday, January 29th, 2017

Memo: New York Called For Face Recognition Cameras At Bridges, Tunnels by Kevin Collier.

From the post:

The state of New York has privately asked surveillance companies to pitch a vast camera system that would scan and identify people who drive in and out of New York City, according to a December memo obtained by Vocativ.

The call for private companies to submit plans is part of Governor Andrew Cuomo’s major infrastructure package, which he introduced in October. Though much of the related proposals would be indisputably welcome to most New Yorkers — renovating airports and improving public transportation — a little-noticed detail included installing cameras to “test emerging facial recognition software and equipment.”

The proposed system would be massive, the memo reads:

The Authority is interested in implementing a Facial Detection System, in a free-flow highway environment, where vehicle movement is unimpeded at highway speeds as well as bumper-to-bumper traffic, and license plate images are taken and matched to occupants of the vehicles (via license plate number) with Facial Detection and Recognition methods from a gantry-based or road-side monitoring location.

All seven of the MTA’s bridges and both its tunnels are named in the proposal.


Proposals only at this point but take this as fair warning.

Follow both Kevin Collier and Vocativ as plans by the State of New York to eliminate privacy for its citizens develop.


One counter measure to license plate readers is marketed under the name PhotoMaskCover.


Caution: I have never used the PhotoMaskCover product and have no relationship with its manufacturer. It claims to work. Evaluate as you would any other product from an unknown vendor.

For the facial recognition cameras, I was reminded that a hoodie and sunglasses are an easy and non-suspicious way to avoid such cameras.

For known MTA facial recognition cameras, wear a deep cowl that casts a complete shadow on your facial features. (Assuming you can drive safely with the loss of peripheral vision.)

As the number of deep cowls increase in MTA images, authorities will obsess more and more over the “unidentifieds,” spending their resources less and less effectively.

Defeating surveillance increases everyone’s freedom.

Intro to Image Processing

Sunday, November 13th, 2016

Intro to Image Processing by Eric Schles.

A short but useful introduction to some, emphasis on some, of the capabilities of OpenCV.

Understanding image processing will make you a better consumer and producer of digital imagery.

To its great surprise, the “press” recently re-discovered government isn’t to be trusted.

The same is true for the “press.”

Develop your capability to judge images offered by any source.

Introducing the Open Images Dataset

Friday, September 30th, 2016

Introducing the Open Images Dataset by Ivan Krasin and Tom Duerig.

From the post:

In the last few years, advances in machine learning have enabled Computer Vision to progress rapidly, allowing for systems that can automatically caption images to apps that can create natural language replies in response to shared photos. Much of this progress can be attributed to publicly available image datasets, such as ImageNet and COCO for supervised learning, and YFCC100M for unsupervised learning.

Today, we introduce Open Images, a dataset consisting of ~9 million URLs to images that have been annotated with labels spanning over 6000 categories. We tried to make the dataset as practical as possible: the labels cover more real-life entities than the 1000 ImageNet classes, there are enough images to train a deep neural network from scratch and the images are listed as having a Creative Commons Attribution license*.

The image-level annotations have been populated automatically with a vision model similar to Google Cloud Vision API. For the validation set, we had human raters verify these automated labels to find and remove false positives. On average, each image has about 8 labels assigned. Here are some examples:

Impressive data set, if you want to recognize a muffin, gherkin, pebble, etc., see the full list at dict.csv.

Hopeful the techniques you develop with these images will lead to more focused image recognition. 😉

I lightly searched the list and no “non-safe” terms jumped out at me. Suitable for family image training.

srez: Image super-resolution through deep learning

Sunday, August 28th, 2016

srez: Image super-resolution through deep learning. by David Garcia.

From the webpage:

Image super-resolution through deep learning. This project uses deep learning to upscale 16×16 images by a 4x factor. The resulting 64×64 images display sharp features that are plausible based on the dataset that was used to train the neural net.

Here’s an random, non cherry-picked, example of what this network can do. From left to right, the first column is the 16×16 input image, the second one is what you would get from a standard bicubic interpolation, the third is the output generated by the neural net, and on the right is the ground truth.


Once you have collected names, you are likely to need image processing.

Here’s an interesting technique using deep learning. Face on at the moment but you can expect that to improve.

I’ll See You The FBI’s 411.9 million images and raise 300 million more, per day

Wednesday, June 15th, 2016

FBI Can Access Hundreds of Millions of Face Recognition Photos by Jennifer Lynch.

From the post:

Today the federal Government Accountability Office (GAO) finally published its exhaustive report on the FBI’s face recognition capabilities. The takeaway: FBI has access to hundreds of millions more photos than we ever thought. And the Bureau has been hiding this fact from the public—in flagrant violation of federal law and agency policy—for years.

According to the GAO Report, FBI’s Facial Analysis, Comparison, and Evaluation (FACE) Services unit not only has access to FBI’s Next Generation Identification (NGI) face recognition database of nearly 30 million civil and criminal mug shot photos, it also has access to the State Department’s Visa and Passport databases, the Defense Department’s biometric database, and the drivers license databases of at least 16 states. Totaling 411.9 million images, this is an unprecedented number of photographs, most of which are of Americans and foreigners who have committed no crimes.

I understand and share the concern over the FBI’s database of 411.9 million images from identification sources, but let’s be realistic about the FBI’s share of all the image data.

Not an exhaustive list but:

Facebook alone is equaling the FBI photo count every 1.3 days. Moreover, Facebook data is tied to both Facebook and very likely, other social media data, unlike my driver’s license.

Instagram takes a little over 5 days to exceed the FBI image count. but like the little engine that could, it keeps trying.

I’m not sure how to count YouTube’s 300 hours of video every minute.

No reliable counts are available for porn images, which streamed from Pornhub in 2015, accounted for 1,892 petabytes of data.

The Pornhub data stream includes a lot of duplication but finding non-religious and reliable stats on porn is difficult. Try searching for statistics on porn images. Speculation, guesses, etc.

Based on those figures, it’s fair to say the number of images available to the FBI is somewhere North of 100 billion and growing.

Oh, you think non-public photos off-limits to the FBI?

Hmmm, so is lying to federal judges, or so they say.

The FBI may say they are following safeguards, etc., but once a agency develops a culture of lying “in the public’s interest,” why would you ever believe them?

If you believe the FBI now, shouldn’t you say: Shame on me?

Katia – rape screening in R

Tuesday, February 16th, 2016

Katia – rape screening in R

From the webpage:

It’s Not Enough to Condemn Violence Against Women. We Need to End It.

All 12 innocent female victims above were atrociously killed, sexually assaulted, or registered missing after meeting strangers on mainstream dating, personals, classifieds, or social networking services.


Those 12 beautiful faces in the gallery above, are our sisters and daughters. Looking at their pictures is like looking through a tiny pinhole onto an unprecedented rape and domestic violence crisis that is destroying the American family unit.

Verified by science, the KATIA rape screen, coded in the computer programming language, R, can provably stop a woman from ever meeting her attacker.

The technology is named after a RAINN-counseled first degree aggravated rape survivor named Katia.

It is based on the work of a Google engineer from the Reverse Image Search project and a RAINN (Rape, Abuse & Incest National Network) counselor, with a clinical background in mathematical statistics, who has over a period of 15 years compiled a linguistic pattern analysis of the messages that rapists use to lure women online.

Learn more about the science behind Katia.

This project is taking concrete steps to reduce violence against women.

What more is there to say?

Reverse Image Search (TinEye) [Clue to a User Topic Map Interface?]

Wednesday, February 3rd, 2016

TinEye was mentioned in a post I wrote in 2015, Baltimore Burning and Verification, but I did not follow up at the time.

Unlike some US intelligence agencies, TinEye has a cool logo:


Free registration enables you to share search results with others, an important feature for news teams.

I only tested the plugin for Chrome, but it offers useful result options:


Once installed, use by hovering over an image in your browser, right “click” and select “Search image on TinEye.” Your results will be presented as set under options.

Clue to User Topic Map Interface

That is a good example of how one version of a topic map interface should work. Select some text, right “click” and “Search topic map ….(preset or selection)” with configurable result display.

That puts you into interaction with the topic map, which can offer properties to enable you to refine the identification of a subject of interest and then a merged presentation of the results.

As with a topic map, all sorts of complicated things are happening in the background with the TinEye extension.

But as a user, I’m interested in the results that FireEye presents not how it got them.

I used to say “more interested” to indicate I might care how useful results came to be assembled. That’s a pretension that isn’t true.

It might be true in some particular case, but for the vast majority of searches, I just want the (uncensored Google) results.

US Intelligence Community Logo for Same Capability

I discovered the most likely intelligence community logo for a similar search program:


The answer to the age-old question of “who watches the watchers?” is us. Which watchers are you watching?

Automatically Finding Weapons…

Wednesday, January 13th, 2016

Automatically Finding Weapons in Social Media Images Part 1 by Justin Seitz.

From the post:

As part of my previous post on gangs in Detroit, one thing had struck me: there are an awful lot of guns being waved around on social media. Shocker, I know. More importantly I began to wonder if there wasn’t a way to automatically identify when a social media post has guns or other weapons contained in them. This post will cover how to use a couple of techniques to send images to the Imagga API that will automatically tag pictures with keywords that it feels accurately describe some of the objects contained within the picture. As well, I will teach you how to use some slicing and dicing techniques in Python to help increase the accuracy of the tagging. Keep in mind that I am specifically looking for guns or firearm-related keywords, but you can easily just change the list of keywords you are interested in and try to find other things of interest like tanks, or rockets.

This blog post will cover how to handle the image tagging portion of this task. In a follow up post I will cover how to pull down all Tweets from an account and extract all the images that the user has posted (something my students do all the time!).

This rocks!

Whether you are trying to make contact with a weapon owner who isn’t in the “business” of selling guns or if you are looking for like-minded individuals, this is a great post.

Would make an interesting way to broadly tag images for inclusion in group subjects in a topic map, awaiting further refinement by algorithm or humans.

This is a great blog to follow: Automating OSINT.

Neural Networks, Recognizing Friendlies, $Billions; Friendlies as Enemies, $Priceless

Thursday, December 24th, 2015

Elon Musk merits many kudos for the recent SpaceX success.

At the same time, Elon has been nominated for Luddite of the Year, along with Bill Gates and Stephen Hawking, for fanning fears of artificial intelligence.

One favorite target for such fears are autonomous weapons systems. Hannah Junkerman annotated a list of 18 posts, articles and books on such systems for Just Security.

While moralists are wringing their hands, military forces have not let grass grow under their feet with regard to autonomous weapon systems. As Michael Carl Haas reports in Autonomous Weapon Systems: The Military’s Smartest Toys?:

Military forces that rely on armed robots to select and destroy certain types of targets without human intervention are no longer the stuff of science fiction. In fact, swarming anti-ship missiles that acquire and attack targets based on pre-launch input, but without any direct human involvement—such as the Soviet Union’s P-700 Granit—have been in service for decades. Offensive weapons that have been described as acting autonomously—such as the UK’s Brimstone anti-tank missile and Norway’s Joint Strike Missile—are also being fielded by the armed forces of Western nations. And while governments deny that they are working on armed platforms that will apply force without direct human oversight, sophisticated strike systems that incorporate significant features of autonomy are, in fact, being developed in several countries.

In the United States, the X-47B unmanned combat air system (UCAS) has been a definite step in this direction, even though the Navy is dodging the issue of autonomous deep strike for the time being. The UK’s Taranis is now said to be “merely” semi-autonomous, while the nEUROn developed by France, Greece, Italy, Spain, Sweden and Switzerland is explicitly designed to demonstrate an autonomous air-to-ground capability, as appears to be case with Russia’s MiG Skat. While little is known about China’s Sharp Sword, it is unlikely to be far behind its competitors in conceptual terms.

The reasoning of military planners in favor of autonomous weapons systems isn’t hard to find, especially when one article describes air-to-air combat between tactically autonomous and machine-piloted aircraft versus piloted aircraft this way:

This article claims that a tactically autonomous, machine-piloted aircraft whose design capitalizes on John Boyd’s observe, orient, decide, act (OODA) loop and energy-maneuverability constructs will bring new and unmatched lethality to air-to-air combat. It submits that the machine’s combined advantages applied to the nature of the tasks would make the idea of human-inhabited platforms that challenge it resemble the mismatch depicted in The Charge of the Light Brigade.

Here’s the author’s mock-up of sixth-generation approach:


(Select the image to see an undistorted view of both aircraft.)

Given the strides being made on the use of neural networks, I would be surprised if they are not at the core of present and future autonomous weapons systems.

You can join the debate about the ethics of autonomous weapons but the more practical approach is to read How to trick a neural network into thinking a panda is a vulture by Julia Evans.

Autonomous weapon systems will be developed by a limited handful of major military powers, at least at first, which means the market for counter-measures, such as turning such weapons against their masters, will bring a premium price. Far more than the offensive development side. Not to mention there will be a far larger market for counter-measures.

Deception, one means of turning weapons against their users, has a long history, not the earliest of which is the tale of Esau and Jacob (Genesis, chapter 26):

11 And Jacob said to Rebekah his mother, Behold, Esau my brother is a hairy man, and I am a smooth man:

12 My father peradventure will feel me, and I shall seem to him as a deceiver; and I shall bring a curse upon me, and not a blessing.

13 And his mother said unto him, Upon me be thy curse, my son: only obey my voice, and go fetch me them.

14 And he went, and fetched, and brought them to his mother: and his mother made savoury meat, such as his father loved.

15 And Rebekah took goodly raiment of her eldest son Esau, which were with her in the house, and put them upon Jacob her younger son:

16 And she put the skins of the kids of the goats upon his hands, and upon the smooth of his neck:

17 And she gave the savoury meat and the bread, which she had prepared, into the hand of her son Jacob.

Julia’s post doesn’t cover the hard case of seeing Jacob as Esau up close but in a battle field environment, the equivalent of mistaking a panda for a vulture, may be good enough.

The primary distinction that any autonomous weapons system must make is the friendly/enemy distinction. The term “friendly fire” was coined to cover cases where human directed weapons systems fail to make that distinction correctly.

The historical rate of “friendly fire” or fratricide is 2% but Mark Thompson reports in The Curse of Friendly Fire, that the actual fratricide rate in the 1991 Gulf war was 24%.

#Juniper, just to name one recent federal government software failure, is evidence that robustness isn’t an enforced requirement for government software.

Apply that lack of requirements to neural networks in autonomous weapons platforms and you have the potential for both developing and defeating autonomous weapons systems.

Julia’s post leaves you a long way from defeating an autonomous weapons platform but it is a good starting place.

PS: Defeating military grade neural networks will be good training for defeating more sophisticated ones used by commercial entities.

Amateur Discovery Confirmed by NASA

Friday, October 30th, 2015

NASA Adds to Evidence of Mysterious Ancient Earthworks by Ralph Blumenthal.

From the post:

High in the skies over Kazakhstan, space-age technology has revealed an ancient mystery on the ground.

Satellite pictures of a remote and treeless northern steppe reveal colossal earthworks — geometric figures of squares, crosses, lines and rings the size of several football fields, recognizable only from the air and the oldest estimated at 8,000 years old.

The largest, near a Neolithic settlement, is a giant square of 101 raised mounds, its opposite corners connected by a diagonal cross, covering more terrain than the Great Pyramid of Cheops. Another is a kind of three-limbed swastika, its arms ending in zigzags bent counterclockwise.

Described last year at an archaeology conference in Istanbul as unique and previously unstudied, the earthworks, in the Turgai region of northern Kazakhstan, number at least 260 — mounds, trenches and ramparts — arrayed in five basic shapes.

Spotted on Google Earth in 2007 by a Kazakh economist and archaeology enthusiast, Dmitriy Dey, the so-called Steppe Geoglyphs remain deeply puzzling and largely unknown to the outside world.

Two weeks ago, in the biggest sign so far of official interest in investigating the sites, NASA released clear satellite photographs of some of the figures from about 430 miles up.

More evidence you don’t need to be a globe trotter to make major discoveries!

A few of the satellite resources I have blogged about for your use: Free Access to EU Satellite Data, Planet Platform Beta & Open California:…, Skybox: A Tool to Help Investigate Environmental Crime.

Good luck!

What a Deep Neural Network thinks about your #selfie

Sunday, October 25th, 2015

What a Deep Neural Network thinks about your #selfie by Andrej Karpathy.

From the post:

Convolutional Neural Networks are great: they recognize things, places and people in your personal photos, signs, people and lights in self-driving cars, crops, forests and traffic in aerial imagery, various anomalies in medical images and all kinds of other useful things. But once in a while these powerful visual recognition models can also be warped for distraction, fun and amusement. In this fun experiment we’re going to do just that: We’ll take a powerful, 140-million-parameter state-of-the-art Convolutional Neural Network, feed it 2 million selfies from the internet, and train it to classify good selfies from bad ones. Just because it’s easy and because we can. And in the process we might learn how to take better selfies 🙂

A must read for anyone interested in deep neural networks and image recognition!

Selfies provide abundant and amusing data to illustrate neural network techniques that are being used every day.

Andrej provides numerous pointers to additional materials and references on neural networks. Good think considering how much interest his post is going to generate!

“The first casualty, when war comes, is truth”

Thursday, October 22nd, 2015

The quote, “The first casualty, when war comes, is truth,” is commonly attributed to Hiram Johnson a Republican politician from California in 1917. Johnson died on August 6, 1945, the day the United States dropped an atomic bomb on Hiroshima.

The ARCADE: Artillery Crater Analysis and Detection Engine is an effort to make it possible for anyone to rescue bits of the truth, even during war, at least with regard to the use of military ordinance.

From the post:

Destroyed buildings and infrastructure, temporary settlements, terrain disturbances and other signs of conflict can be seen in freely available satellite imagery. The ARtillery Crater Analysis and Detection Engine (ARCADE) is experimental computer vision software developed by Rudiment and the Centre for Visual Computing at the University of Bradford. ARCADE examines satellite imagery for signs of artillery bombardment, calculates the location of artillery craters, the inbound trajectory of projectiles to aid identification of their possible origins of fire. An early version of the tool that demonstrates the core capabilities is available here.

The software currently runs on Windows with MATLAB, but if there is enough interest, it could be ported to an open toolset built around OpenCV.

Everyone who is interested in military actions anywhere in the world should be a supporter of this project.

Given the poverty of Western reporting on bombings by the United States government around the world, I am very interested in the success of this project.

The post is a great introduction to the difficulties and potential uses of satellite data to uncover truths governments would prefer to remain hidden. That alone should be enough justification for supporting this project.

Skybox: A Tool to Help Investigate Environmental Crime

Saturday, October 10th, 2015

Skybox: A Tool to Help Investigate Environmental Crime by Kristine M. Gutterød & Emilie Gjengedal Vatnøy.

From the post:

Today public companies have to provide reports with data, while many private companies do not have to provide anything. Most companies within the oil, gas and mining sector are private, and to get information can be both expensive and time-consuming.

Skybox is a new developing tool used to extract information from an otherwise private industry. Using moving pictures on ground level—captured by satellites—you can monitor different areas up close.

“You can dig into the details and get more valuable and action-filled information for people both in the public and private sector,” explained Patrick Dunagan, strategic partnerships manager at Google, who worked in developing Skybox.

The satellite images can be useful when investigating environmental crime because you can monitor different companies, for example the change in the number of vehicles approaching or leaving a property, as well as environmental changes in the world.

Excellent news!

Hopefully Skybox will include an option to link in ground level photographs that can identify license plates and take photos of drivers.

Using GPS coordinates with time data, activists will have a means of detecting illegal and/or new dumping sites for surveillance.

Couple that with license plate data and the noose starts to tighten on environmental violators.

You will still need to pierce the shell corporations and follow links to state and local authorities but catching the physical dumpers is a first step.

I’m a bird watcher, I’m a bird watcher, here comes one now…

Thursday, June 18th, 2015

New website can identify birds using photos

From the post:

In a breakthrough for computer vision and for bird watching, researchers and bird enthusiasts have enabled computers to achieve a task that stumps most humans—identifying hundreds of bird species pictured in photos.

The bird photo identifier, developed by the Visipedia research project in collaboration with the Cornell Lab of Ornithology, is available for free at:

Results will be presented by researchers from Cornell Tech and the California Institute of Technology at the Computer Vision and Pattern Recognition (CVPR) conference in Boston on June 8, 2015.

Called Merlin Bird Photo ID, the identifier is capable of recognizing 400 of the mostly commonly encountered birds in the United States and Canada.

“It gets the bird right in the top three results about 90% of the time, and it’s designed to keep improving the more people use it,” said Jessie Barry at the Cornell Lab of Ornithology. “That’s truly amazing, considering that the computer vision community started working on the challenge of bird identification only a few years ago.”

The perfect website for checking photos of birds made on summer vacation and an impressive feat of computer vision.

The more the service is used, the better it gets. Upload your vacation bird pics today!

CVPR 2015 Papers

Sunday, June 14th, 2015

CVPR [Computer Vision and Pattern Recognition] 2015 Papers by @karpathy.

This is very cool!

From the webpage:

Below every paper are TOP 100 most-occuring words in that paper and their color is based on LDA topic model with k = 7.
(It looks like 0 = datasets?, 1 = deep learning, 2 = videos , 3 = 3D Computer Vision , 4 = optimization?, 5 = low-level Computer Vision?, 6 = descriptors?)

You can sort by LDA topics, view the PDFs, rank the other papers by tf-idf similarity to a particular paper.

Very impressive and suggestive of other refinements for viewing a large number of papers in a given area.


FaceNet: A Unified Embedding for Face Recognition and Clustering

Saturday, March 21st, 2015

FaceNet: A Unified Embedding for Face Recognition and Clustering by Florian Schroff, Dmitry Kalenichenko and James Philbin.


Despite significant recent advances in the field of face recognition, implementing face verification and recognition efficiently at scale presents serious challenges to current approaches. In this paper we present a system, called FaceNet, that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity. Once this space has been produced, tasks such as face recognition, verification and clustering can be easily implemented using standard techniques with FaceNet embeddings as feature vectors.

Our method uses a deep convolutional network trained to directly optimize the embedding itself, rather than an intermediate bottleneck layer as in previous deep learning approaches. To train, we use triplets of roughly aligned matching / non-matching face patches generated using a novel online triplet mining method. The benefit of our approach is much greater representational efficiency: we achieve state-of-the-art face recognition performance using only 128-bytes per face.

On the widely used Labeled Faces in the Wild (LFW) dataset, our system achieves a new record accuracy of 99.63%. On YouTube Faces DB it achieves 95.12%. Our system cuts the error rate in comparison to the best published result by 30% on both datasets. (emphasis in the original)

With accuracy at 99.63%, the possibilities are nearly endless. 😉

How long will it be before some start-up is buying ATM feeds from banks? Fast and accurate location information would be of interest to process servers, law enforcement, debt collectors, various government agencies, etc.

Looking a bit further ahead, ATM surrogate services will become a feature of better hotels and escort services.

Convolutional Neural Networks for Visual Recognition

Friday, March 20th, 2015

Convolutional Neural Networks for Visual Recognition by Fei-Fei Li and Andrej Karpathy.

From the description:

Computer Vision has become ubiquitous in our society, with applications in search, image understanding, apps, mapping, medicine, drones, and self-driving cars. Core to many of these applications are visual recognition tasks such as image classification, localization and detection. Recent developments in neural network (aka “deep learning”) approaches have greatly advanced the performance of these state-of-the-art visual recognition systems. This course is a deep dive into details of the deep learning architectures with a focus on learning end-to-end models for these tasks, particularly image classification. During the 10-week course, students will learn to implement, train and debug their own neural networks and gain a detailed understanding of cutting-edge research in computer vision. The final assignment will involve training a multi-million parameter convolutional neural network and applying it on the largest image classification dataset (ImageNet). We will focus on teaching how to set up the problem of image recognition, the learning algorithms (e.g. backpropagation), practical engineering tricks for training and fine-tuning the networks and guide the students through hands-on assignments and a final course project. Much of the background and materials of this course will be drawn from the ImageNet Challenge.

Be sure to check out the course notes!

A very nice companion for your DIGITS experiments over the weekend.

I first saw this in a tweet by Lasse.

Show and Tell: A Neural Image Caption Generator

Sunday, November 23rd, 2014

Show and Tell: A Neural Image Caption Generator by Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan.


Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In this paper, we present a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image. The model is trained to maximize the likelihood of the target description sentence given the training image. Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions. Our model is often quite accurate, which we verify both qualitatively and quantitatively. For instance, while the current state-of-the-art BLEU score (the higher the better) on the Pascal dataset is 25, our approach yields 59, to be compared to human performance around 69. We also show BLEU score improvements on Flickr30k, from 55 to 66, and on SBU, from 19 to 27.

Another caption generating program for images. (see also, Deep Visual-Semantic Alignments for Generating Image Descriptions) Not quite to the performance of a human observer but quite respectable. The near misses are amusing enough for crowd correction to be an element in a full blown system.

Perhaps “rough recognition” is close enough for some purposes. Searching images for people who match a partial description and producing a much smaller set for additional processing.

I first saw this in Nat Torkington’s Four short links: 18 November 2014.

Deep Visual-Semantic Alignments for Generating Image Descriptions

Friday, November 21st, 2014

Deep Visual-Semantic Alignments for Generating Image Descriptions by Andrej Karpathy and Li Fei-Fei.

From the webpage:

We present a model that generates free-form natural language descriptions of image regions. Our model leverages datasets of images and their sentence descriptions to learn about the inter-modal correspondences between text and visual data. Our approach is based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding. We then describe a Recurrent Neural Network architecture that uses the inferred alignments to learn to generate novel descriptions of image regions. We demonstrate the effectiveness of our alignment model with ranking experiments on Flickr8K, Flickr30K and COCO datasets, where we substantially improve on the state of the art. We then show that the sentences created by our generative model outperform retrieval baselines on the three aforementioned datasets and a new dataset of region-level annotations.

Excellent examples with generated text. Code and other predictions “coming soon.”

For the moment you can also read the research paper: Deep Visual-Semantic Alignments for Generating Image Descriptions

Serious potential in any event but even more so if the semantics of the descriptions could be captured and mapped across natural languages.

Guess the Manuscript XVI

Saturday, November 1st, 2014

Guess the Manuscript XVI

From the post:

Welcome to the sixteenth instalment of our popular Guess the Manuscript series. The rules are simple: we post an image of part of a manuscript that is on the British Library’s Digitised Manuscripts site, you guess which one it’s taken from!

bl mss XVI

Are you as surprised as we are to find an umbrella in a medieval manuscript? The manuscript from which this image was taken will feature in a blogpost in the near future.

In the meantime, answers or guesses please in the comments below, or via Twitter @BLMedieval.

Caution! The Medieval Period lasted from five hundred (500) C.E. until fifteen hundred (1500) C.E. Google NGrams records the first use of “umbrella” at or around sixteen-sixty (1660). Is this an “umbrella” or something else?

Using Google’s reverse image search found only repostings of the image search challenge, no similar images. Not sure that helps but was worth a try.

On the bright side, there are only two hundred and fifty-seven (257) manuscripts in the digitized collection dated between five hundred (500) C.E. until fifteen hundred (1500) C.E.

What stories or information can be found in those volumes that might be accompanied by such an image? Need to create a list of the classes of those manuscripts.

Suggestions? Is there an image processor in the house?