Archive for the ‘Image Recognition’ Category

Content Based Image Retrieval (CBIR)

Sunday, February 3rd, 2013

MapReduce Paves the Way for CBIR

From the post:

Recently, content based image retrieval (CBIR) has gained active research focus due to wide applications such as crime prevention, medicine, historical research and digital libraries.

As a research team from the School of Science, Information Technology and Engineering at the University of Ballarat, Australia has suggested, image collections in databases in distributed locations over the Internet pose a challenge to retrieve images that are relevant to user queries efficiently and accurately.

The researchers say that with this in mind, it has become increasingly important to develop new CBIR techniques that are effective and scalable for real-time processing of very large image collections. To address this, they offer up a novel MapReduce neural network framework for CBIR from large data collection in a cloud environment.

Reference to the paper: MapReduce neural network framework for efficient content based image retrieval from large datasets in the cloud by Sitalakshmi Venkatraman. (In Hybrid Intelligent Systems (HIS), 2012 12th International Conference on)

Abstract:

Recently, content based image retrieval (CBIR) has gained active research focus due to wide applications such as crime prevention, medicine, historical research and digital libraries. With digital explosion, image collections in databases in distributed locations over the Internet pose a challenge to retrieve images that are relevant to user queries efficiently and accurately. It becomes increasingly important to develop new CBIR techniques that are effective and scalable for real-time processing of very large image collections. To address this, the paper proposes a novel MapReduce neural network framework for CBIR from large data collection in a cloud environment. We adopt natural language queries that use a fuzzy approach to classify the colour images based on their content and apply Map and Reduce functions that can operate in cloud clusters for arriving at accurate results in real-time. Preliminary experimental results for classifying and retrieving images from large data sets were quite convincing to carry out further experimental evaluations.

Sounds like the basis for a user-augmented index of visual content to me.

You?
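To make the Map and Reduce steps in the abstract a little more concrete, here is a minimal single-process sketch. It is not the paper's framework: there is no neural network and no cluster, and the colour labels, reference RGB values, distance-based membership function and threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical fuzzy colour labels and reference RGB values (assumptions).
REFERENCE_COLOURS = {"red": (255, 0, 0), "green": (0, 255, 0), "blue": (0, 0, 255)}
MAX_DISTANCE = (3 * 255 ** 2) ** 0.5  # largest possible distance in RGB space


def fuzzy_memberships(mean_rgb):
    """Return a membership in [0, 1] for each colour label (1 = exact match)."""
    memberships = {}
    for label, ref in REFERENCE_COLOURS.items():
        distance = sum((a - b) ** 2 for a, b in zip(mean_rgb, ref)) ** 0.5
        memberships[label] = 1.0 - distance / MAX_DISTANCE
    return memberships


def map_image(image_id, pixels):
    """Map step: emit (label, (image_id, membership)) pairs for one image."""
    n = len(pixels)
    mean_rgb = tuple(sum(p[i] for p in pixels) / n for i in range(3))
    for label, membership in fuzzy_memberships(mean_rgb).items():
        yield label, (image_id, membership)


def reduce_label(label, values, threshold=0.5):
    """Reduce step: keep images above the threshold, best match first."""
    kept = [(img, m) for img, m in values if m >= threshold]
    return label, sorted(kept, key=lambda pair: pair[1], reverse=True)


def run_job(images, threshold=0.5):
    """Run map, shuffle (group by key) and reduce in a single process."""
    grouped = defaultdict(list)
    for image_id, pixels in images.items():
        for label, value in map_image(image_id, pixels):
            grouped[label].append(value)
    return dict(reduce_label(label, values, threshold) for label, values in grouped.items())


# Two tiny made-up "images", each a list of RGB pixels.
images = {
    "sunset.jpg": [(250, 10, 5), (240, 20, 10)],
    "forest.jpg": [(10, 240, 20), (5, 250, 10)],
}
index = run_job(images)
```

A query for "red" then becomes a lookup in `index["red"]`. In a real cluster the map calls would run in parallel across nodes and the framework would handle the grouping.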

Content-Based Image Retrieval at the End of the Early Years

Tuesday, January 22nd, 2013

Content-Based Image Retrieval at the End of the Early Years by Arnold W.M. Smeulders, Marcel Worring, Simone Santini, Amarnath Gupta, and Ramesh Jain. (Smeulders, A.W.M.; Worring, M.; Santini, S.; Gupta, A.; Jain, R., “Content-based image retrieval at the end of the early years,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 12, pp. 1349-1380, Dec 2000, doi: 10.1109/34.895972)

Abstract:

Presents a review of 200 references in content-based image retrieval. The paper starts with discussing the working conditions of content-based retrieval: patterns of use, types of pictures, the role of semantics, and the sensory gap. Subsequent sections discuss computational steps for image retrieval systems. Step one of the review is image processing for retrieval sorted by color, texture, and local geometry. Features for retrieval are discussed next, sorted by: accumulative and global features, salient points, object and shape features, signs, and structural combinations thereof. Similarity of pictures and objects in pictures is reviewed for each of the feature types, in close connection to the types and means of feedback the user of the systems is capable of giving by interaction. We briefly discuss aspects of system engineering: databases, system architecture, and evaluation. In the concluding section, we present our view on: the driving force of the field, the heritage from computer vision, the influence on computer vision, the role of similarity and of interaction, the need for databases, the problem of evaluation, and the role of the semantic gap.

Excellent survey article from 2000 (not 2002 as per the Ostermann paper).

I think you will appreciate the treatment of the “semantic gap,” both in terms of its description as well as ways to address it.

If you are using annotated images in your topic map application, definitely a must read.

Developing New Ways to Search for Web Images

Thursday, November 22nd, 2012

Developing New Ways to Search for Web Images by Shar Steed.

From the post:

Collections of photos, images, and videos are quickly coming to dominate the content available on the Web. Currently internet search engines rely on the text with which the images are labeled to return matches. But why is only text being used to search visual mediums? These labels can be unreliable, unhelpful and sometimes not available at all.

To solve this problem, scientists at Stanford and Princeton have been working to “create a new generation of visual search technologies.” Dr. Fei-Fei Li, a computer scientist at Stanford, has built the world’s largest visual database, containing more than 14 million labeled objects.

A system called ImageNet applies the data gathered from the database to recognize similar, unlabeled objects with much greater accuracy than past algorithms.

A remarkable amount of material to work with, either via the API or downloading for your own hacking.

Another tool for assisting in the authoring of topic maps (or other content).

BioID face database

Saturday, November 17th, 2012

BioID face database

From the webpage:

The BioID Face Database has been recorded and is published to give all researchers working in the area of face detection the possibility to compare the quality of their face detection algorithms with others. It may be used for such purposes without further permission. During the recording special emphasis has been placed on “real world” conditions. Therefore the testset features a large variety of illumination, background, and face size. Some typical sample images are shown below. (click to enlarge the images)

Just in case you are interested in face detection + topic maps.

I first saw this in Face detection using Python and OpenCV.

The Ultimate User Experience

Tuesday, October 23rd, 2012

The Ultimate User Experience by Tim R. Todish.

From the post:

Today, more people have mobile phones than have electricity or safe drinking water. In India, there are more cell phones than toilets! We all have access to incredible technology, and as designers and developers, we have the opportunity to use this pervasive technology in powerful ways that can change people’s lives.

In fact, a single individual can now create an application that can literally change the lives of people across the globe. With that in mind, I’m going to highlight some examples of designers and developers using their craft to help improve the lives of people around the world in the hope that you will be encouraged to find ways to do the same with your own skills and talents.

I may have to get a cell phone to get a better understanding of its potential when combined with topic maps.

For example, the “hot” night spots are well known in New York City. What if a distributed information network imaged guests as they arrived/left and maintained a real time map of images + locations (no names)?

That would make a nice subscription service, perhaps with faceted searching by physical characteristics.

A Glance at Information-Geometric Signal Processing

Thursday, October 18th, 2012

A Glance at Information-Geometric Signal Processing by Frank Nielsen.

Slides from the MAHI workshop (Methodological Aspects of Hyperspectral Imaging)

From the workshop homepage:

The scope of the MAHI workshop is to explore new pathways that can potentially lead to breakthroughs in the extraction of the informative content of hyperspectral images. It will bring together researchers involved in hyperspectral image processing and in various innovative aspects of data processing.

Images, their informational content and the tools to analyze them have semantics too.

Visual Clues: A Brain “feature,” not a “bug”

Saturday, September 29th, 2012

You will read in When Your Eyes Tell Your Hands What to Think: You’re Far Less in Control of Your Brain Than You Think that:

You’ve probably never given much thought to the fact that picking up your cup of morning coffee presents your brain with a set of complex decisions. You need to decide how to aim your hand, grasp the handle and raise the cup to your mouth, all without spilling the contents on your lap.

A new Northwestern University study shows that, not only does your brain handle such complex decisions for you, it also hides information from you about how those decisions are made.

“Our study gives a salient example,” said Yangqing ‘Lucie’ Xu, lead author of the study and a doctoral candidate in psychology at Northwestern. “When you pick up an object, your brain automatically decides how to control your muscles based on what your eyes provide about the object’s shape. When you pick up a mug by the handle with your right hand, you need to add a clockwise twist to your grip to compensate for the extra weight that you see on the left side of the mug.

“We showed that the use of this visual information is so powerful and automatic that we cannot turn it off. When people see an object weighted in one direction, they actually can’t help but ‘feel’ the weight in that direction, even when they know that we’re tricking them,” Xu said. (emphasis added)

I never quite trusted my brain and now I have proof that it is untrustworthy. Hiding stuff indeed! ;-)

But that’s the trick of subject identification/identity isn’t it?

That our brains “recognize” all manner of subjects without any effort on our part.

Another part of the effortless features of our brains. But it hides the information we need to integrate information stores from ourselves and others.

Or rather, making it more work than we are usually willing to devote to digging it out.

When called upon to be “explicit” about subject identification, or even worse, to imagine how other people identify subjects, we prefer to stay at home consuming passive entertainment.

Two quick points:

First, need to think about how to incorporate this “feature” into delivery interfaces for users.

Second, what subjects would users pay others to mine/collate/identify for them? (Delivery being a separate issue.)

“What Makes Paris Look Like Paris?”

Saturday, September 1st, 2012

“What Makes Paris Look Like Paris?” by Erwin Gianchandani.

From the post:

We all identify cities by certain attributes, such as building architecture, street signage, even the lamp posts and parking meters dotting the sidewalks. Now there’s a neat study by computer graphics researchers at Carnegie Mellon University — presented at SIGGRAPH 2012 earlier this month — that develops novel computational techniques to analyze imagery in Google Street View and identify what gives a city its character….

From the abstract:

Given a large repository of geotagged imagery, we seek to automatically find visual elements, e.g. windows, balconies, and street signs, that are most distinctive for a certain geo-spatial area, for example the city of Paris. This is a tremendously difficult task as the visual features distinguishing architectural elements of different places can be very subtle. In addition, we face a hard search problem: given all possible patches in all images, which of them are both frequently occurring and geographically informative? To address these issues, we propose to use a discriminative clustering approach able to take into account the weak geographic supervision. We show that geographically representative image elements can be discovered automatically from Google Street View imagery in a discriminative manner. We demonstrate that these elements are visually interpretable and perceptually geo-informative. The discovered visual elements can also support a variety of computational geography tasks, such as mapping architectural correspondences and influences within and across cities, finding representative elements at different geo-spatial scales, and geographically-informed image retrieval.

The video and other resources are worth the time to review/read.

What features do you rely on to “recognize” a city?

The potential to explore features within a city or between cities looks particularly promising.

Finding Waldo, a flag on the moon and multiple choice tests, with R

Sunday, May 20th, 2012

Finding Waldo, a flag on the moon and multiple choice tests, with R by Arthur Charpentier.

From the post:

I have to admit, first, that finding Waldo has been a difficult task. And I did not succeed. Neither could I correctly spot his shirt (because actually, it was what I was looking for). You know, that red-and-white striped shirt. I guess it should have been possible to look for Waldo’s face (assuming that his face does not change) but I still have problems with size factor (and resolution issues too). The problem is not that simple. At the http://mlsp2009.conwiz.dk/ conference, a prize was offered for writing an algorithm in Matlab. And one can even find Mathematica codes online. But most of those algorithms are based on the idea that we look for similarities with Waldo’s face, as described in problem 3 on http://www1.cs.columbia.edu/~blake/‘s webpage. You can find papers on that problem, e.g. Friendly & Kwan (2009) (based on statistical techniques, but Waldo is here a pretext to discuss other issues actually), or more recently (but more complex) Garg et al. (2011) on matching people in images of crowds.

Not sure how often you will want to find Waldo but then you may not be looking for Waldo.

Tipped off to this post by Simply Statistics.

Are visual dictionaries generalizable?

Sunday, May 13th, 2012

Are visual dictionaries generalizable? by Otavio A. B. Penatti, Eduardo Valle, and Ricardo da S. Torres

Abstract:

Mid-level features based on visual dictionaries are today a cornerstone of systems for classification and retrieval of images. Those state-of-the-art representations depend crucially on the choice of a codebook (visual dictionary), which is usually derived from the dataset. In general-purpose, dynamic image collections (e.g., the Web), one cannot have the entire collection in order to extract a representative dictionary. However, based on the hypothesis that the dictionary reflects only the diversity of low-level appearances and does not capture semantics, we argue that a dictionary based on a small subset of the data, or even on an entirely different dataset, is able to produce a good representation, provided that the chosen images span a diverse enough portion of the low-level feature space. Our experiments confirm that hypothesis, opening the opportunity to greatly alleviate the burden in generating the codebook, and confirming the feasibility of employing visual dictionaries in large-scale dynamic environments.

The authors use the Caltech-101 image set because of its “diversity.” Odd because they cite the Caltech-256 image set, which was created to answer concerns about the lack of diversity in the Caltech-101 image set.

Not sure this paper answers the issues it raises about visual dictionaries.

Wanted to bring it to your attention because representative dictionaries (as opposed to comprehensive ones) may be lurking just beyond the semantic horizon.
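For readers new to visual dictionaries, a toy sketch of the idea the abstract tests: learn a codebook from a small sample, then use it to encode an image that was not in that sample. This is not the authors' pipeline. Plain k-means, the two-dimensional "descriptors" and all names below are assumptions for illustration; real systems cluster high-dimensional local features.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, descriptor):
    """Index of the codeword closest to the descriptor."""
    return min(range(len(codebook)), key=lambda i: squared_distance(codebook[i], descriptor))


def build_codebook(descriptors, k, iterations=10):
    """Plain k-means; seeds are evenly spaced samples, so the result is deterministic."""
    step = len(descriptors) // k
    codebook = [descriptors[i * step] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest(codebook, d)].append(d)
        for i, members in enumerate(clusters):
            if members:  # keep the old codeword if a cluster is empty
                codebook[i] = tuple(sum(col) / len(members) for col in zip(*members))
    return codebook


def encode(codebook, descriptors):
    """Bag-of-visual-words histogram, normalised to sum to 1."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest(codebook, d)] += 1
    total = sum(counts)
    return [c / total for c in counts]


# Codebook learned from a small made-up sample of descriptors ...
sample = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1), (5.0, 5.1), (5.1, 4.9), (4.9, 5.0)]
codebook = build_codebook(sample, k=2)

# ... then reused on descriptors from an image that was not in the sample.
unseen_image = [(0.3, 0.2), (4.8, 5.2), (5.2, 5.0), (5.0, 4.8)]
histogram = encode(codebook, unseen_image)
```

The histogram is the image's mid-level representation. The paper's question is whether a codebook built this way, from a subset or a different dataset, still yields useful histograms.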

SAXually Explicit Images: Data Mining Large Shape Databases

Monday, April 2nd, 2012

SAXually Explicit Images: Data Mining Large Shape Databases by Eamonn Keogh.

ABSTRACT

The problem of indexing large collections of time series and images has received much attention in the last decade, however we argue that there is potentially great untapped utility in data mining such collections. Consider the following two concrete examples of problems in data mining.

Motif Discovery (duplication detection): Given a large repository of time series or images, find approximately repeated patterns/images.

Discord Discovery: Given a large repository of time series or images, find the most unusual time series/image.

As we will show, both these problems have applications in fields as diverse as anthropology, crime…

Ancient history in the view of some, this is a Google talk from 2006!

But, it is quite well done and I enjoyed the unexpected application of time series representation to shape data for purposes of evaluating matches. It is one of those insights that will stay with you and that seems obvious after they say it.

I think topic map authors (semantic investigators generally) need to report such insights for the benefit of others.
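A rough sketch of the shape-to-symbols idea, for those who want to try it before watching the talk. It assumes the shape is given as a closed contour of (x, y) points, converts it to a centroid-distance series, and applies SAX with an alphabet of four symbols. The function names, the contour and the rounded breakpoints are my assumptions, not code from the talk.

```python
import math

# Breakpoints that split a standard normal distribution into four
# roughly equal-probability regions (alphabet size 4), rounded.
BREAKPOINTS = [-0.67, 0.0, 0.67]
ALPHABET = "abcd"


def centroid_distance_series(contour):
    """Turn a closed contour (list of (x, y) points) into a 1-D series."""
    cx = sum(x for x, _ in contour) / len(contour)
    cy = sum(y for _, y in contour) / len(contour)
    return [math.hypot(x - cx, y - cy) for x, y in contour]


def z_normalise(series):
    """Rescale the series to zero mean and unit standard deviation."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / len(series))
    if std == 0:
        return [0.0] * len(series)
    return [(v - mean) / std for v in series]


def paa(series, segments):
    """Piecewise Aggregate Approximation: mean of each equal-length segment."""
    size = len(series) // segments  # assumes len(series) is a multiple of segments
    return [sum(series[i * size:(i + 1) * size]) / size for i in range(segments)]


def sax_word(series, segments):
    """Map each PAA value to a letter according to the breakpoints."""
    word = ""
    for value in paa(z_normalise(series), segments):
        index = sum(value > b for b in BREAKPOINTS)
        word += ALPHABET[index]
    return word


# A made-up rectangle outline: corners and edge midpoints, in order.
rectangle = [(-2, -1), (0, -1), (2, -1), (2, 0), (2, 1), (0, 1), (-2, 1), (-2, 0)]
word = sax_word(centroid_distance_series(rectangle), segments=8)
step_word = sax_word([0, 0, 0, 0, 10, 10, 10, 10], segments=2)
```

Once shapes are short strings, looking for near-duplicates (motifs) or the odd one out (discords) becomes a matter of comparing words rather than pixels.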

Incremental face recognition for large-scale social network services

Saturday, March 31st, 2012

Incremental face recognition for large-scale social network services by Kwontaeg Choi, Kar-Ann Toh, and Hyeran Byun.

Abstract:

Due to the rapid growth of social network services such as Facebook and Twitter, incorporation of face recognition in these large-scale web services is attracting much attention in both academia and industry. The major problem in such applications is to deal efficiently with the growing number of samples as well as local appearance variations caused by diverse environments for the millions of users over time. In this paper, we focus on developing an incremental face recognition method for Twitter application. Particularly, a data-independent feature extraction method is proposed via binarization of a Gabor filter. Subsequently, the dimension of our Gabor representation is reduced considering various orientations at different grid positions. Finally, an incremental neural network is applied to learn the reduced Gabor features. We apply our method to a novel application which notifies new photograph uploading to related users without having their ID being identified. Our extensive experiments show that the proposed algorithm significantly outperforms several incremental face recognition methods with a dramatic reduction in computational speed. This shows the suitability of the proposed method for a large-scale web service with millions of users.

Any number of topic map uses suggest themselves for robust face recognition software.

What’s yours?

Computer Vision & Math

Tuesday, December 27th, 2011

Computer Vision & Math

From the website:

The main part of this site is called Home of Math. It’s an online mathematics textbook that contains over 800 articles with over 2000 illustrations. The level varies from beginner to advanced.

Try our image analysis software. Pixcavator is a light-weight program intended for scientists and engineers who want to automate their image analysis tasks but lack a significant computing background. This image analysis software allows the analyst to concentrate on the science and lets us take care of the math.

If you create image analysis applications, consider Pixcavator SDK. It provides a simple tool for developing new image analysis software in a variety of fields. It allows the software developer to concentrate on the user’s needs instead of development of custom algorithms.

Google1000 dataset

Thursday, November 10th, 2011

Google1000 dataset

From the post:

This is a dataset of scans of 1000 public domain books that was released to the public at ICDAR 2007. At the time there was no public serving infrastructure, so few people actually got the 120GB dataset. It has since been hosted on Google Cloud Storage and made available for public download: (see the post for the links)

Intended for OCR and machine learning purposes. The results of which you may wish to unite in topic maps with other resources.

The Revolution(s) Are Being Televised

Saturday, September 17th, 2011

Revolutions usually mean human rights violations, lots of them.

Patrick Meier has a project to collect evidence of mass human rights violations in Syria.

See: Help Crowdsource Satellite Imagery Analysis for Syria: Building a Library of Evidence

Topic maps are an ideal solution to link objects in dated satellite images to eye witness accounts, captured military documents, ground photos, news accounts and other information.

I say that for two reasons:

First, with a topic map you can start from any linked object in a photo, a witness account, ground photo or news account and see all related evidence for that location. Granted that takes someone authoring that collation but it doesn’t have to be only one someone.

Second, topic maps offer parallel subject processing, which can distribute the authoring task in a crowd-sourced project, for instance. For example, I could be doing photo analysis and marking the location of military checkpoints. That would generate topics and associations for the geographic location, the type of installation, dates (from the photos), etc. Someone else could be interviewing witnesses and taking their testimony. As part of the processing of that testimony, another volunteer codes an approximate date and geographic location in connection with part of that testimony. Still another person is coding military orders by identified individuals for checkpoints that include the one in question. Associations between all these separately encoded bits of evidence, each unknown to the individual volunteers, become a mouse-click away from coming to the attention of anyone reviewing the evidence. And determining responsibility.

The alternative, the one most commonly used, is to have an under-staffed international group piece together the best evidence it can from a sea of documents, photos, witness accounts, etc. An adequate job for the resources they have, but why settle for an “adequate” job when it can be done properly with 21st century technology?
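As a rough illustration of the collation I have in mind, here is a sketch in which three volunteers work independently and their records come together because they share a subject key. It stands in for topic map merging only loosely: the records, field names and subject identifiers are all hypothetical, and a real topic map engine would merge on subject identifiers with far richer structure.

```python
from collections import defaultdict


def merge_by_subject(*contributions):
    """Group evidence records from independent volunteers by shared subject key."""
    merged = defaultdict(list)
    for records in contributions:
        for record in records:
            merged[record["subject"]].append(record)
    return dict(merged)


# Hypothetical records; the subject identifiers, dates and field names are made up.
photo_analysis = [{"subject": "checkpoint-17", "kind": "satellite photo", "date": "2011-08-02"}]
testimony = [{"subject": "checkpoint-17", "kind": "witness account", "date": "2011-08-03"}]
orders = [
    {"subject": "checkpoint-17", "kind": "military order", "date": "2011-07-30"},
    {"subject": "checkpoint-04", "kind": "military order", "date": "2011-07-30"},
]

evidence = merge_by_subject(photo_analysis, testimony, orders)
kinds = sorted(r["kind"] for r in evidence["checkpoint-17"])
```

None of the three volunteers needed to know about the others. Anyone reviewing `evidence["checkpoint-17"]` sees the photo, the testimony and the order side by side.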

GRASS: Geographic Resources Analysis Support System

Saturday, September 17th, 2011

GRASS: Geographic Resources Analysis Support System

The post about satellite imagery analysis for Syria made me curious about tools for use for automated analysis of satellite images.

From the webpage:

Commonly referred to as GRASS, this is free Geographic Information System (GIS) software used for geospatial data management and analysis, image processing, graphics/maps production, spatial modeling, and visualization. GRASS is currently used in academic and commercial settings around the world, as well as by many governmental agencies and environmental consulting companies. GRASS is an official project of the Open Source Geospatial Foundation.

You may also want to visit the Open Dragon project.

From the Open Dragon site:

Availability of good software for teaching Remote Sensing and GIS has always been a problem. Commercial software, no matter how good a discount is offered, remains expensive for a developing country, cannot be distributed to students, and may not be appropriate for education. Home-grown and university-sourced software lacks long-term support and the needed usability and robustness engineering.

The OpenDragon Project was established in the Department of Computer Engineering of KMUTT in December of 2004. The primary objective of this project is to develop, enhance, and maintain a high-quality, commercial-grade software package for remote sensing and GIS analysis that can be distributed free to educational organizations within Thailand. This package, OpenDragon, is based on the Version 5 of the commercial Dragon/ips® software developed and marketed by Goldin-Rudahl Systems, Inc.

As of 2010, Goldin-Rudahl Systems has agreed that the Open Dragon software, based on Dragon version 5, will be open source for non-commercial use. The software source code should be available on this server by early 2011.

And there is always the commercial side, if you have funding: ArcGIS. The makers of ArcGIS, Esri, support several open source GIS projects.

The results of using these or other software packages can be tied to other information using topic maps.

New Challenges in Distributed Information Filtering and Retrieval

Sunday, September 11th, 2011

New Challenges in Distributed Information Filtering and Retrieval

Proceedings of the 5th International Workshop on New Challenges in Distributed Information Filtering and Retrieval
Palermo, Italy, September 17, 2011.

Edited by:

Cristian Lai – CRS4, Loc. Piscina Manna, Building 1 – 09010 Pula (CA), Italy

Giovanni Semeraro – Dept. of Computer Science, University of Bari, Aldo Moro, Via E. Orabona, 4, 70125 Bari, Italy

Eloisa Vargiu – Dept. of Electrical and Electronic Engineering, University of Cagliari, Piazza d’Armi, 09123 Cagliari, Italy

Table of Contents:

  1. Experimenting Text Summarization on Multimodal Aggregation
    Giuliano Armano, Alessandro Giuliani, Alberto Messina, Maurizio Montagnuolo, Eloisa Vargiu
  2. From Tags to Emotions: Ontology-driven Sentimental Analysis in the Social Semantic Web
    Matteo Baldoni, Cristina Baroglio, Viviana Patti, Paolo Rena
  3. A Multi-Agent Decision Support System for Dynamic Supply Chain Organization
    Luca Greco, Liliana Lo Presti, Agnese Augello, Giuseppe Lo Re, Marco La Cascia, Salvatore Gaglio
  4. A Formalism for Temporal Annotation and Reasoning of Complex Events in Natural Language
    Francesco Mele, Antonio Sorgente
  5. Interaction Mining: the new Frontier of Call Center Analytics
    Vincenzo Pallotta, Rodolfo Delmonte, Lammert Vrieling, David Walker
  6. Context-Aware Recommender Systems: A Comparison Of Three Approaches
    Umberto Panniello, Michele Gorgoglione
  7. A Multi-Agent System for Information Semantic Sharing
    Agostino Poggi, Michele Tomaiuolo
  8. Temporal characterization of the requests to Wikipedia
    Antonio J. Reinoso, Jesus M. Gonzalez-Barahona, Rocio Muñoz-Mansilla, Israel Herraiz
  9. From Logical Forms to SPARQL Query with GETARUN
    Rocco Tripodi, Rodolfo Delmonte
  10. ImageHunter: a Novel Tool for Relevance Feedback in Content Based Image Retrieval
    Roberto Tronci, Gabriele Murgia, Maurizio Pili, Luca Piras, Giorgio Giacinto

Principal Components Analysis

Friday, November 26th, 2010

A Tutorial on Principal Components Analysis by Lindsay I. Smith.

From Chapter 3:

Finally we come to Principal Components Analysis (PCA). What is it? It is a way of identifying patterns in data, and expressing the data in such a way as to highlight their similarities and differences. Since patterns in data can be hard to find in data of high dimension, where the luxury of graphical representation is not available, PCA is a powerful tool for analysing data.

The other main advantage of PCA is that once you have found these patterns in the data, you can compress the data, i.e. by reducing the number of dimensions, without much loss of information. This technique is used in image compression, as we will see in a later section.

One of the main application areas for PCA is image analysis and recognition.

Lindsay starts off with a review of the mathematics needed to work through the rest of the material.
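If you want to see the idea in miniature before working through the tutorial, the following sketch reduces two-dimensional points to one dimension. It is not taken from the tutorial: it uses only the standard library, finds the leading eigenvector of the covariance matrix by power iteration (an assumption made to avoid a linear algebra dependency), and the data points are made up.

```python
import math


def mean_centre(rows):
    """Subtract the per-column mean from every row."""
    dims = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(dims)]
    return [[r[j] - means[j] for j in range(dims)] for r in rows], means


def covariance_matrix(centred):
    """Sample covariance matrix of mean-centred rows."""
    n, dims = len(centred), len(centred[0])
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(dims)]
            for i in range(dims)]


def principal_component(cov, iterations=200):
    """Leading eigenvector of the covariance matrix via power iteration."""
    vector = [1.0] * len(cov)
    for _ in range(iterations):
        product = [sum(row[j] * vector[j] for j in range(len(vector))) for row in cov]
        norm = math.sqrt(sum(v * v for v in product))
        vector = [v / norm for v in product]
    return vector


def project(centred, component):
    """Reduce each row to a single coordinate along the component."""
    return [sum(a * b for a, b in zip(row, component)) for row in centred]


# Made-up points lying close to the line y = x.
data = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.8], [5.0, 5.0]]
centred, means = mean_centre(data)
component = principal_component(covariance_matrix(centred))
scores = project(centred, component)
```

Each point is now a single number in `scores`, its position along the direction of greatest variance. That is the compression the quoted passage describes: one coordinate per point instead of two, with little information lost because the points lie close to a line.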

Topic maps are a natural fit for pairing up the results of image recognition, for example, and other data. More on that anon.