Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

November 23, 2014

Show and Tell: A Neural Image Caption Generator

Filed under: Image Processing,Image Recognition,Image Understanding — Patrick Durusau @ 10:53 am

Show and Tell: A Neural Image Caption Generator by Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan.

Abstract:

Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In this paper, we present a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image. The model is trained to maximize the likelihood of the target description sentence given the training image. Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions. Our model is often quite accurate, which we verify both qualitatively and quantitatively. For instance, while the current state-of-the-art BLEU score (the higher the better) on the Pascal dataset is 25, our approach yields 59, to be compared to human performance around 69. We also show BLEU score improvements on Flickr30k, from 55 to 66, and on SBU, from 19 to 27.

Another caption-generating program for images (see also Deep Visual-Semantic Alignments for Generating Image Descriptions). Not quite at the performance of a human observer, but quite respectable. The near misses are amusing enough that crowd correction could be an element in a full-blown system.

Perhaps “rough recognition” is close enough for some purposes, such as searching images for people who match a partial description and producing a much smaller set for additional processing.
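If the BLEU numbers in the abstract are unfamiliar: BLEU just measures n-gram overlap between a generated sentence and reference sentences. Here is a minimal sketch with NLTK, using made-up captions; this shows the metric in general, not the paper’s exact evaluation protocol.

# Minimal sketch: scoring a generated caption against reference captions
# with sentence-level BLEU, using NLTK. The captions are invented examples.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

references = [
    "a dog catches a frisbee in the park".split(),
    "a brown dog jumps for a frisbee".split(),
]
candidate = "a dog jumps to catch a frisbee".split()

# Smoothing avoids zero scores when higher-order n-grams have no overlap.
score = sentence_bleu(references, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")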

I first saw this in Nat Torkington’s Four short links: 18 November 2014.

November 21, 2014

Deep Visual-Semantic Alignments for Generating Image Descriptions

Filed under: Identification,Image Processing,Image Recognition,Image Understanding — Patrick Durusau @ 7:52 pm

Deep Visual-Semantic Alignments for Generating Image Descriptions by Andrej Karpathy and Li Fei-Fei.

From the webpage:

We present a model that generates free-form natural language descriptions of image regions. Our model leverages datasets of images and their sentence descriptions to learn about the inter-modal correspondences between text and visual data. Our approach is based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding. We then describe a Recurrent Neural Network architecture that uses the inferred alignments to learn to generate novel descriptions of image regions. We demonstrate the effectiveness of our alignment model with ranking experiments on Flickr8K, Flickr30K and COCO datasets, where we substantially improve on the state of the art. We then show that the sentences created by our generative model outperform retrieval baselines on the three aforementioned datasets and a new dataset of region-level annotations.

Excellent examples with generated text. Code and other predictions “coming soon.”

For the moment you can also read the research paper: Deep Visual-Semantic Alignments for Generating Image Descriptions

Serious potential in any event, but even more so if the semantics of the descriptions could be captured and mapped across natural languages.

November 1, 2014

Guess the Manuscript XVI

Filed under: British Library,Image Processing,Image Recognition,Image Understanding — Patrick Durusau @ 7:55 pm

Guess the Manuscript XVI

From the post:

Welcome to the sixteenth instalment of our popular Guess the Manuscript series. The rules are simple: we post an image of part of a manuscript that is on the British Library’s Digitised Manuscripts site, you guess which one it’s taken from!

[image: manuscript detail showing what appears to be an umbrella]

Are you as surprised as we are to find an umbrella in a medieval manuscript? The manuscript from which this image was taken will feature in a blogpost in the near future.

In the meantime, answers or guesses please in the comments below, or via Twitter @BLMedieval.

Caution! The Medieval Period lasted from 500 C.E. to 1500 C.E., and Google Ngrams records the first use of “umbrella” at or around 1660. Is this an “umbrella” or something else?

Using Google’s reverse image search found only repostings of the image search challenge, no similar images. Not sure that helps but was worth a try.

On the bright side, there are only 257 manuscripts in the digitized collection dated between 500 C.E. and 1500 C.E.

What stories or information can be found in those volumes that might be accompanied by such an image? Need to create a list of the classes of those manuscripts.

Suggestions? Is there an image processor in the house?

Enjoy!

October 24, 2014

50 Face Recognition APIs

Filed under: Face Detection,Image Processing,Image Recognition,Image Understanding — Patrick Durusau @ 1:44 pm

50 Face Recognition APIs by Mirko Krivanek.

Interesting listing published on Mashape. Only the top 12 are listed below. It would be nice to have a separate blog for voice recognition APIs. I’ve been thinking of using voice, rather than a passport or driving license, as a more secure ID. The voice has a texture unique to each individual.

Subjects that are likely to be of interest!

Mirko mentions voice but then lists face recognition APIs.

Voice comes up in a mixture of APIs in: 37 Recognition APIS: AT&T SPEECH, Moodstocks and Rekognition by Matthew Scott.

I first saw this in a tweet by Andrea Mostosi.

September 1, 2014

Extracting images from scanned book pages

Filed under: Data Mining,Image Processing,Image Recognition — Patrick Durusau @ 7:14 pm

Extracting images from scanned book pages by Chris Adams.

From the post:

I work on a project which has placed a number of books online. Over the years we’ve improved server performance and worked on a fast, responsive viewer for scanned books to make our books as accessible as possible but it’s still challenging to help visitors find something of interest out of hundreds of thousands of scanned pages.

Trevor and I have discussed various ways to improve the situation and one idea which seemed promising was seeing how hard it would be to extract the images from digitized pages so we could present a visual index of an item. Trevor’s THATCamp CHNM post on Freeing Images from Inside Digitized Books and Newspapers got a favorable reception and since it kept coming up at work I decided to see how far I could get using OpenCV.

Everything you see below is open-source and comments are highly welcome. I created a book-illustration-detection branch in my image mining project (see my previous experiment reconstructing higher-resolution thumbnails from the masters) so feel free to fork it or open issues.

Just in case you are looking for a Fall project. 😉

Consider capturing the images and their contents in associations with authors, publishers, etc., to enable mining those associations for patterns.
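If you want a feel for the approach before digging into Chris’s code, here is a rough OpenCV sketch of one common way to find illustration regions on a scanned page (threshold, merge nearby marks, keep the large bounding boxes). It is a generic illustration, not his actual pipeline, and the file name is a placeholder.

# Minimal sketch of one way to pull illustration regions off a scanned page
# with OpenCV: threshold, merge blobs, keep large bounding boxes.
import cv2

page = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)
# Invert so ink/illustrations are foreground, then binarize with Otsu.
_, binary = cv2.threshold(page, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
# Dilate to merge nearby marks (text lines, halftone dots) into blobs.
blob = cv2.dilate(binary, cv2.getStructuringElement(cv2.MORPH_RECT, (15, 15)))

# OpenCV 4 return signature: (contours, hierarchy).
contours, _ = cv2.findContours(blob, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
page_area = page.shape[0] * page.shape[1]
for i, c in enumerate(contours):
    x, y, w, h = cv2.boundingRect(c)
    # Keep only regions big enough to be plates/illustrations, not text lines.
    if w * h > 0.05 * page_area:
        cv2.imwrite(f"region_{i}.png", page[y:y + h, x:x + w])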

August 23, 2014

Large-Scale Object Classification…

Filed under: Classification,Image Recognition,Image Understanding,Topic Maps — Patrick Durusau @ 3:37 pm

Large-Scale Object Classification using Label Relation Graphs by Jia Deng, et al.

Abstract:

In this paper we study how to perform object classification in a principled way that exploits the rich structure of real world labels. We develop a new model that allows encoding of flexible relations between labels. We introduce Hierarchy and Exclusion (HEX) graphs, a new formalism that captures semantic relations between any two labels applied to the same object: mutual exclusion, overlap and subsumption. We then provide rigorous theoretical analysis that illustrates properties of HEX graphs such as consistency, equivalence, and computational implications of the graph structure. Next, we propose a probabilistic classification model based on HEX graphs and show that it enjoys a number of desirable properties. Finally, we evaluate our method using a large-scale benchmark. Empirical results demonstrate that our model can significantly improve object classification by exploiting the label relations.

Let’s hear it for “real world labels!”

By which the authors mean:

  • An object can have more than one label.
  • There are relationships between labels.

From the introduction:

We first introduce Hierarchy and Exclusion (HEX) graphs, a new formalism allowing flexible specification of relations between labels applied to the same object: (1) mutual exclusion (e.g. an object cannot be dog and cat), (2) overlapping (e.g. a husky may or may not be a puppy and vice versa), and (3) subsumption (e.g. all huskies are dogs). We provide theoretical analysis on properties of HEX graphs such as consistency, equivalence, and computational implications.

Next, we propose a probabilistic classification model leveraging HEX graphs. In particular, it is a special type of Conditional Random Field (CRF) that encodes the label relations as pairwise potentials. We show that this model enjoys a number of desirable properties, including flexible encoding of label relations, predictions consistent with label relations, efficient exact inference for typical graphs, learning labels with varying specificity, knowledge transfer, and unification of existing models.

Having more than one label is trivially possible in topic maps. The more interesting case is the authors choosing to treat semantic labels as subjects and to define permitted associations between those subjects.

A world of possibilities opens up when you can treat something as a subject that can have relationships defined to other subjects. Noting that those relationships can also be treated as subjects should someone desire to do so.
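To make the label-relations idea concrete, here is a toy sketch of HEX-style constraints (exclusion, subsumption, with overlap as the default) and a consistency check over a proposed label set. The labels and relations are invented examples, and this is not the paper’s CRF model.

# Toy sketch (not the paper's implementation) of HEX-style label relations
# plus a consistency check for a proposed label set.
EXCLUSION = {frozenset({"dog", "cat"})}          # labels that cannot co-occur
SUBSUMPTION = {("husky", "dog")}                 # husky implies dog
# Any pair not excluded or subsuming is treated as overlapping (e.g. husky/puppy).

def consistent(labels):
    labels = set(labels)
    # Closure under subsumption: a child label implies its parent.
    changed = True
    while changed:
        changed = False
        for child, parent in SUBSUMPTION:
            if child in labels and parent not in labels:
                labels.add(parent)
                changed = True
    # No excluded pair may appear together.
    return not any(pair <= labels for pair in EXCLUSION)

print(consistent({"husky", "puppy"}))   # True: overlap is allowed
print(consistent({"husky", "cat"}))     # False: husky implies dog, dog excludes cat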

I first saw this at: Is that husky a puppy?

August 17, 2014

AverageExplorer:…

Filed under: Clustering,Image Recognition,Indexing,Users — Patrick Durusau @ 4:22 pm

AverageExplorer: Interactive Exploration and Alignment of Visual Data Collections, Jun-Yan Zhu, Yong Jae Lee, and Alexei Efros.

Abstract:

This paper proposes an interactive framework that allows a user to rapidly explore and visualize a large image collection using the medium of average images. Average images have been gaining popularity as means of artistic expression and data visualization, but the creation of compelling examples is a surprisingly laborious and manual process. Our interactive, real-time system provides a way to summarize large amounts of visual data by weighted average(s) of an image collection, with the weights reflecting user-indicated importance. The aim is to capture not just the mean of the distribution, but a set of modes discovered via interactive exploration. We pose this exploration in terms of a user interactively “editing” the average image using various types of strokes, brushes and warps, similar to a normal image editor, with each user interaction providing a new constraint to update the average. New weighted averages can be spawned and edited either individually or jointly. Together, these tools allow the user to simultaneously perform two fundamental operations on visual data: user-guided clustering and user-guided alignment, within the same framework. We show that our system is useful for various computer vision and graphics applications.

Applying averaging to images, particularly in an interactive context with users, seems like a very suitable strategy.
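The core operation is easy to sketch: a per-pixel weighted mean over a stack of aligned, same-sized images, with the weights standing in for user-indicated importance. This is just the arithmetic, not the authors’ interactive system, and the image stack below is random placeholder data.

# Minimal sketch of the operation behind average images: a per-pixel
# weighted mean over a stack of aligned images.
import numpy as np

def weighted_average(images, weights):
    """images: array of shape (n, h, w, 3); weights: length-n array."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return np.tensordot(w, np.asarray(images, dtype=float), axes=1)

stack = np.random.rand(5, 64, 64, 3)             # placeholder image stack
avg = weighted_average(stack, [1, 1, 3, 1, 1])   # emphasize the third image
print(avg.shape)                                 # (64, 64, 3)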

What would it look like to have interactive merging of proxies based on data ranges controlled by the user?

July 28, 2014

Cat Dataset

Filed under: Data,Image Processing,Image Recognition,Image Understanding — Patrick Durusau @ 12:14 pm

Cat Dataset


From the description:

The CAT dataset includes 10,000 cat images. For each image, we annotate the head of cat with nine points, two for eyes, one for mouth, and six for ears. The detail configuration of the annotation was shown in Figure 6 of the original paper:

Weiwei Zhang, Jian Sun, and Xiaoou Tang, “Cat Head Detection – How to Effectively Exploit Shape and Texture Features”, Proc. of European Conf. Computer Vision, vol. 4, pp.802-816, 2008.

A more accessible copy: Cat Head Detection – How to Effectively Exploit Shape and Texture Features
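If you grab the dataset, the annotation files (as far as I can tell from the description) are plain text holding the point count followed by x/y pairs for the nine head points. A minimal loader sketch, assuming that layout; the file name is a placeholder.

# Sketch of a loader for the CAT dataset's per-image annotation files,
# assuming the format: "<n> x1 y1 x2 y2 ... xn yn".
def load_cat_points(path):
    with open(path) as f:
        values = [int(v) for v in f.read().split()]
    n = values[0]                      # should be 9 for this dataset
    coords = values[1:1 + 2 * n]
    return list(zip(coords[0::2], coords[1::2]))  # [(x1, y1), ..., (xn, yn)]

# points = load_cat_points("00000001_000.jpg.cat")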

Prelude to a cat filter for Twitter feeds? 😉

I first saw this in a tweet by Basile Simon.

July 19, 2014

What is deep learning, and why should you care?

Filed under: Deep Learning,Image Recognition,Machine Learning — Patrick Durusau @ 2:45 pm

What is deep learning, and why should you care? by Pete Warden.

From the post:


When I first ran across the results in the Kaggle image-recognition competitions, I didn’t believe them. I’ve spent years working with machine vision, and the reported accuracy on tricky tasks like distinguishing dogs from cats was beyond anything I’d seen, or imagined I’d see anytime soon. To understand more, I reached out to one of the competitors, Daniel Nouri, and he demonstrated how he used the Decaf open-source project to do so well. Even better, he showed me how he was quickly able to apply it to a whole bunch of other image-recognition problems we had at Jetpac, and produce much better results than my conventional methods.

I’ve never encountered such a big improvement from a technique that was largely unheard of just a couple of years before, so I became obsessed with understanding more. To be able to use it commercially across hundreds of millions of photos, I built my own specialized library to efficiently run prediction on clusters of low-end machines and embedded devices, and I also spent months learning the dark arts of training neural networks. Now I’m keen to share some of what I’ve found, so if you’re curious about what on earth deep learning is, and how it might help you, I’ll be covering the basics in a series of blog posts here on Radar, and in a short upcoming ebook.

Pete gives a brief sketch of “deep learning” and promises more posts and a short ebook to follow.

Along those same lines you will want to see:

Microsoft Challenges Google’s Artificial Brain With ‘Project Adam’ by Daniela Hernandez (WIRED).

If you want in-depth (technical) coverage, see: Deep Learning…moving beyond shallow machine learning since 2006! The reading list and references there should keep you busy for some time.

BTW, on “…shallow machine learning…” you do know the “Dark Ages” really weren’t “dark” but were so named in the Renaissance in order to show the fall into darkness (the Fall of Rome), the “Dark Ages,” and then the return of “light” in the Renaissance? See: Dark Ages (historiography).

Don’t overly credit characterizations of ages or technologies by later ages or newer technologies. They too will be found primitive and superstitious.

March 26, 2014

Deep Belief in Javascript

Filed under: Image Recognition,Image Understanding,Javascript,Neural Networks,WebGL — Patrick Durusau @ 1:34 pm

Deep Belief in Javascript

From the webpage:

It’s an implementation of the Krizhevsky convolutional neural network architecture for object recognition in images, running entirely in the browser using Javascript and WebGL!

I built it so people can easily experiment with a classic deep belief approach to image recognition themselves, to understand both its limitations and its power, and to demonstrate that the algorithms are usable even in very restricted client-side environments like web browsers.

A very impressive demonstration of the power of Javascript, to say nothing of neural networks.

You can submit your own images for “recognition.”

I first saw this in Nat Torkington’s Four short links: 24 March 2014.

March 17, 2014

Office Lens Is a Snap (Point and Map?)

Office Lens Is a Snap

From the post:

The moment mobile-phone manufacturers added cameras to their devices, they stopped being just mobile phones. Not only have lightweight phone cameras made casual photography easy and spontaneous, they also have changed the way we record our lives. Now, with help from Microsoft Research, the Office team is out to change how we document our lives in another way—with the Office Lens app for Windows Phone 8.

Office Lens, now available in the Windows Phone Store, is one of the first apps to use the new OneNote Service API. The app is simple to use: Snap a photo of a document or a whiteboard, and upload it to OneNote, which stores the image in the cloud. If there is text in the uploaded image, OneNote’s cloud-based optical character-recognition (OCR) software turns it into editable, searchable text. Office Lens is like having a scanner in your back pocket. You can take photos of recipes, business cards, or even a whiteboard, and Office Lens will enhance the image and put it into your OneNote Quick Notes for reference or collaboration. OneNote can be downloaded for free.

Less than five years ago, every automated process in Office Lens would have been a configurable setting.

Today, it’s just point and shoot.

There is an interface lesson for topic maps in the Office Lens interface.

Some people will need the Office Lens API. But the rest of us just want to take a picture of the whiteboard (or some other display). Automatic storage and OCR are welcome added benefits.

What about a topic map authoring interface that looks a lot like MS Word™ or OpenOffice? A topic map is loaded much like a spelling dictionary. When the user selects “map-it,” links are inserted that point into the topic map.

Hover over such a link and data from the topic map is displayed. It can be printed, annotated, etc.

One possible feature would be a “subject check” that displays the subjects “recognized” in the document, enabling the author to correct any recognition errors.
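A toy sketch of the “map-it” step, with the topic map reduced to a dictionary of subject names and invented topic identifiers; a real authoring interface would obviously do far more.

# Toy sketch of dictionary-driven subject linking: recognized subject names
# are turned into links pointing into the topic map. All data is invented.
import re

topic_map = {
    "Paris": "tm://topics/paris",
    "Eiffel Tower": "tm://topics/eiffel-tower",
}

def map_it(text):
    # Longest names first so "Eiffel Tower" wins over any shorter overlap.
    for name in sorted(topic_map, key=len, reverse=True):
        link = f'<a href="{topic_map[name]}">{name}</a>'
        text = re.sub(rf"\b{re.escape(name)}\b", link, text)
    return text

print(map_it("The Eiffel Tower dominates the Paris skyline."))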

In case you are interested, I can point you to some open source projects that have general authoring interfaces. 😉

PS: If you have a Windows phone, can you check out Office Lens for me? I am still sans a cellphone of any type. Since I don’t get out of the yard a cellphone doesn’t make much sense. But I do miss out on the latest cellphone technology. Thanks!

October 27, 2013

Lucene Image Retrieval LIRE

Filed under: Image Recognition,Lucene — Patrick Durusau @ 6:40 pm

Lucene Image Retrieval LIRE by Mathias Lux.

From the post:

Today I gave a talk on LIRE at the ACM Multimedia conference in the open source software competition, currently taking place in Barcelona. It gave me the opportunity to present a local installation of the LIRE Solr plugin and the possibilities thereof. Find the slides of the talk at slideshare: LIRE presentation at the ACM Multimedia Open Source Software Competition 2013

The Solr plugin itself is fully functional for Solr 4.4 and the source is available at https://bitbucket.org/dermotte/liresolr. There is a markdown document README.md explaining what can be done with plugin and how to actually install it. Basically it can do content based search, content based re-ranking of text searches and brings along a custom field implementation & sub linear search based on hashing.

There is a demo site as well.

See also: LIRE: open source image retrieval in Java.

If you plan on capturing video feeds from traffic cams or other sources, to link up with other data, image recognition is in your future.

You can start with a no-bid research contract or with LIRE and Lucene.

Your call.

August 6, 2013

Lire

Filed under: Image Processing,Image Recognition,Searching — Patrick Durusau @ 6:29 pm

Lire

From the webpage:

LIRE (Lucene Image Retrieval) is an open source library for content based image retrieval, which means you can search for images that look similar. Besides providing multiple common and state of the art retrieval mechanisms LIRE allows for easy use on multiple platforms. LIRE is actively used for research, teaching and commercial applications. Due to its modular nature it can be used on process level (e.g. index images and search) as well as on image feature level. Developers and researchers can easily extend and modify Lire to adapt it to their needs.

The developer wiki & blog are currently hosted on http://www.semanticmetadata.net

An online demo can be found at http://demo-itec.uni-klu.ac.at/liredemo/

Lire will be useful if you start collecting images of surveillance cameras or cars going into or out of known alphabet agency parking lots.

February 3, 2013

Content Based Image Retrieval (CBIR)

Filed under: Image Recognition,MapReduce — Patrick Durusau @ 6:57 pm

MapReduce Paves the Way for CBIR

From the post:

Recently, content based image retrieval (CBIR) has gained active research focus due to wide applications such as crime prevention, medicine, historical research and digital libraries.

As a research team from the School of Science, Information Technology and Engineering at the University of Ballarat, Australia has suggested, image collections in databases in distributed locations over the Internet pose a challenge to retrieve images that are relevant to user queries efficiently and accurately.

The researchers say that with this in mind, it has become increasingly important to develop new CBIR techniques that are effective and scalable for real-time processing of very large image collections. To address this, they offer up a novel MapReduce neural network framework for CBIR from large data collections in a cloud environment.

Reference to the paper: MapReduce neural network framework for efficient content based image retrieval from large datasets in the cloud by Sitalakshmi Venkatraman. (In: 12th International Conference on Hybrid Intelligent Systems (HIS), 2012.)

Abstract:

Recently, content based image retrieval (CBIR) has gained active research focus due to wide applications such as crime prevention, medicine, historical research and digital libraries. With digital explosion, image collections in databases in distributed locations over the Internet pose a challenge to retrieve images that are relevant to user queries efficiently and accurately. It becomes increasingly important to develop new CBIR techniques that are effective and scalable for real-time processing of very large image collections. To address this, the paper proposes a novel MapReduce neural network framework for CBIR from large data collection in a cloud environment. We adopt natural language queries that use a fuzzy approach to classify the colour images based on their content and apply Map and Reduce functions that can operate in cloud clusters for arriving at accurate results in real-time. Preliminary experimental results for classifying and retrieving images from large data sets were quite convincing to carry out further experimental evaluations.

Sounds like the basis for a user-augmented index of visual content to me.

You?
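For anyone unfamiliar with how the map/reduce split applies to image retrieval, here is a toy sketch: map emits a coarse feature bucket per image, reduce groups images by bucket so a query only compares against candidates. This illustrates the pattern only, not the paper’s neural network framework, and the data is invented.

# Generic map/reduce sketch for coarse CBIR indexing (illustration only).
from collections import defaultdict

def map_phase(images):
    # images: {image_id: feature_vector}; bucket by a crude quantization.
    for image_id, features in images.items():
        bucket = tuple(round(f, 1) for f in features)
        yield bucket, image_id

def reduce_phase(mapped):
    index = defaultdict(list)
    for bucket, image_id in mapped:
        index[bucket].append(image_id)
    return index

images = {"a.jpg": [0.12, 0.88], "b.jpg": [0.11, 0.91], "c.jpg": [0.75, 0.10]}
index = reduce_phase(map_phase(images))
print(index)   # a.jpg and b.jpg land in the same bucket, c.jpg in another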

January 22, 2013

Content-Based Image Retrieval at the End of the Early Years

Content-Based Image Retrieval at the End of the Early Years by Arnold W.M. Smeulders, Marcel Worring, Simone Santini, Amarnath Gupta, and Ramesh Jain. (Smeulders, A.W.M.; Worring, M.; Santini, S.; Gupta, A.; Jain, R., “Content-based image retrieval at the end of the early years,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 12, pp. 1349-1380, Dec. 2000, doi: 10.1109/34.895972)

Abstract:

Presents a review of 200 references in content-based image retrieval. The paper starts with discussing the working conditions of content-based retrieval: patterns of use, types of pictures, the role of semantics, and the sensory gap. Subsequent sections discuss computational steps for image retrieval systems. Step one of the review is image processing for retrieval sorted by color, texture, and local geometry. Features for retrieval are discussed next, sorted by: accumulative and global features, salient points, object and shape features, signs, and structural combinations thereof. Similarity of pictures and objects in pictures is reviewed for each of the feature types, in close connection to the types and means of feedback the user of the systems is capable of giving by interaction. We briefly discuss aspects of system engineering: databases, system architecture, and evaluation. In the concluding section, we present our view on: the driving force of the field, the heritage from computer vision, the influence on computer vision, the role of similarity and of interaction, the need for databases, the problem of evaluation, and the role of the semantic gap.

Excellent survey article from 2000 (not 2002 as per the Ostermann paper).

I think you will appreciate the treatment of the “semantic gap,” both in terms of its description as well as ways to address it.

If you are using annotated images in your topic map application, definitely a must read.

November 22, 2012

Developing New Ways to Search for Web Images

Developing New Ways to Search for Web Images by Shar Steed.

From the post:

Collections of photos, images, and videos are quickly coming to dominate the content available on the Web. Currently internet search engines rely on the text with which the images are labeled to return matches. But why is only text being used to search visual mediums? These labels can be unreliable, unhelpful and sometimes not available at all.

To solve this problem, scientists at Stanford and Princeton have been working to “create a new generation of visual search technologies.” Dr. Fei-Fei Li, a computer scientist at Stanford, has built the world’s largest visual database, containing more than 14 million labeled objects.

A system called ImageNet applies the data gathered from the database to recognize similar, unlabeled objects with much greater accuracy than past algorithms.

A remarkable amount of material to work with, either via the API or downloading for your own hacking.

Another tool for assisting in the authoring of topic maps (or other content).

November 17, 2012

BioID face database

Filed under: Face Detection,Image Recognition — Patrick Durusau @ 4:31 pm

BioID face database

From the webpage:

The BioID Face Database has been recorded and is published to give all researchers working in the area of face detection the possibility to compare the quality of their face detection algorithms with others. It may be used for such purposes without further permission. During the recording special emphasis has been placed on “real world” conditions. Therefore the testset features a large variety of illumination, background, and face size. Some typical sample images are shown below. (click to enlarge the images)

Just in case you are interested in face detection + topic maps.

I first saw this in Face detection using Python and OpenCV.
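If you want to try the BioID images against a stock detector, the usual starting point is OpenCV’s Haar cascade. A minimal sketch, with placeholder file paths; the bundled cascade file ships with the opencv-python package.

# Standard OpenCV Haar-cascade face detection, sketched for the BioID images.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

image = cv2.imread("BioID_0000.pgm")          # placeholder path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detected.png", image)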

October 23, 2012

The Ultimate User Experience

Filed under: Image Recognition,Interface Research/Design,Marketing,Usability,Users — Patrick Durusau @ 4:55 am

The Ultimate User Experience by Tim R. Todish.

From the post:

Today, more people have mobile phones than have electricity or safe drinking water. In India, there are more cell phones than toilets! We all have access to incredible technology, and as designers and developers, we have the opportunity to use this pervasive technology in powerful ways that can change people’s lives.

In fact, a single individual can now create an application that can literally change the lives of people across the globe. With that in mind, I’m going to highlight some examples of designers and developers using their craft to help improve the lives of people around the world in the hope that you will be encouraged to find ways to do the same with your own skills and talents.

I may have to get a cell phone to get a better understanding of its potential when combined with topic maps.

For example, the “hot” night spots are well known in New York City. What if a distributed information network imaged guests as they arrived/left and maintained a real time map of images + locations (no names)?

That would make a nice subscription service, perhaps with faceted searching by physical characteristics.

October 18, 2012

A Glance at Information-Geometric Signal Processing

Filed under: Image Processing,Image Recognition,Semantics — Patrick Durusau @ 2:18 pm

A Glance at Information-Geometric Signal Processing by Frank Nielsen.

Slides from the MAHI workshop (Methodological Aspects of Hyperspectral Imaging).

From the workshop homepage:

The scope of the MAHI workshop is to explore new pathways that can potentially lead to breakthroughs in the extraction of the informative content of hyperspectral images. It will bring together researchers involved in hyperspectral image processing and in various innovative aspects of data processing.

Images, their informational content and the tools to analyze them have semantics too.

September 29, 2012

Visual Clues: A Brain “feature,” not a “bug”

You will read in When Your Eyes Tell Your Hands What to Think: You’re Far Less in Control of Your Brain Than You Think that:

You’ve probably never given much thought to the fact that picking up your cup of morning coffee presents your brain with a set of complex decisions. You need to decide how to aim your hand, grasp the handle and raise the cup to your mouth, all without spilling the contents on your lap.

A new Northwestern University study shows that, not only does your brain handle such complex decisions for you, it also hides information from you about how those decisions are made.

“Our study gives a salient example,” said Yangqing ‘Lucie’ Xu, lead author of the study and a doctoral candidate in psychology at Northwestern. “When you pick up an object, your brain automatically decides how to control your muscles based on what your eyes provide about the object’s shape. When you pick up a mug by the handle with your right hand, you need to add a clockwise twist to your grip to compensate for the extra weight that you see on the left side of the mug.

“We showed that the use of this visual information is so powerful and automatic that we cannot turn it off. When people see an object weighted in one direction, they actually can’t help but ‘feel’ the weight in that direction, even when they know that we’re tricking them,” Xu said. (emphasis added)

I never quite trusted my brain and now I have proof that it is untrustworthy. Hiding stuff indeed! 😉

But that’s the trick of subject identification/identity isn’t it?

That our brains “recognize” all manner of subjects without any effort on our part.

Another part of the effortless features of our brains. But it hides the information we need to integrate information stores from ourselves and others.

Or rather, making it more work than we are usually willing to devote to digging it out.

When called upon to be “explicit” about subject identification, or even worse, to imagine how other people identify subjects, we prefer to stay at home consuming passive entertainment.

Two quick points:

First, need to think about how to incorporate this “feature” into delivery interfaces for users.

Second, what subjects would users pay others to mine/collate/identify for them? (Delivery being a separate issue.)

September 1, 2012

“What Makes Paris Look Like Paris?”

Filed under: Geo Analytics,Geographic Data,Image Processing,Image Recognition — Patrick Durusau @ 3:19 pm

“What Makes Paris Look Like Paris?” by Erwin Gianchandani.

From the post:

We all identify cities by certain attributes, such as building architecture, street signage, even the lamp posts and parking meters dotting the sidewalks. Now there’s a neat study by computer graphics researchers at Carnegie Mellon University — presented at SIGGRAPH 2012 earlier this month — that develops novel computational techniques to analyze imagery in Google Street View and identify what gives a city its character….

From the abstract:

Given a large repository of geotagged imagery, we seek to automatically find visual elements, e.g. windows, balconies, and street signs, that are most distinctive for a certain geo-spatial area, for example the city of Paris. This is a tremendously difficult task as the visual features distinguishing architectural elements of different places can be very subtle. In addition, we face a hard search problem: given all possible patches in all images, which of them are both frequently occurring and geographically informative? To address these issues, we propose to use a discriminative clustering approach able to take into account the weak geographic supervision. We show that geographically representative image elements can be discovered automatically from Google Street View imagery in a discriminative manner. We demonstrate that these elements are visually interpretable and perceptually geo-informative. The discovered visual elements can also support a variety of computational geography tasks, such as mapping architectural correspondences and influences within and across cities, finding representative elements at different geo-spatial scales, and geographically-informed image retrieval.

The video and other resources are worth the time to review/read.

What features do you rely on to “recognize” a city?

The potential to explore features within a city or between cities looks particularly promising.

May 20, 2012

Finding Waldo, a flag on the moon and multiple choice tests, with R

Filed under: Graphics,Image Processing,Image Recognition,R — Patrick Durusau @ 6:28 pm

Finding Waldo, a flag on the moon and multiple choice tests, with R by Arthur Charpentier.

From the post:

I have to admit, first, that finding Waldo has been a difficult task. And I did not succeed. Neither could I correctly spot his shirt (because actually, it was what I was looking for). You know, that red-and-white striped shirt. I guess it should have been possible to look for Waldo’s face (assuming that his face does not change) but I still have problems with size factor (and resolution issues too). The problem is not that simple. At the http://mlsp2009.conwiz.dk/ conference, a prize was offered for writing an algorithm in Matlab. And one can even find Mathematica codes online. But most of those algorithms are based on the idea that we look for similarities with Waldo’s face, as described in problem 3 on http://www1.cs.columbia.edu/~blake/‘s webpage. You can find papers on that problem, e.g. Friendly & Kwan (2009) (based on statistical techniques, but Waldo is here a pretext to discuss other issues actually), or more recently (but more complex) Garg et al. (2011) on matching people in images of crowds.

Not sure how often you will want to find Waldo but then you may not be looking for Waldo.
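For a quick baseline before wading into the R post, template matching is the simplest “find this patch” approach, and it shares the size and rotation limitations Arthur describes. An OpenCV sketch with placeholder file names (not his R code):

# Template matching: slide the patch over the scene and take the best score.
import cv2

scene = cv2.imread("crowd.png")
template = cv2.imread("waldo_shirt.png")
h, w = template.shape[:2]

result = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)

print(f"best match score {max_val:.2f} at {max_loc}")
cv2.rectangle(scene, max_loc, (max_loc[0] + w, max_loc[1] + h), (0, 0, 255), 2)
cv2.imwrite("found.png", scene)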

Tipped off to this post by Simply Statistics.

May 13, 2012

Are visual dictionaries generalizable?

Filed under: Classification,Dictionary,Image Recognition,Information Retrieval — Patrick Durusau @ 7:54 pm

Are visual dictionaries generalizable? by Otavio A. B. Penatti, Eduardo Valle, and Ricardo da S. Torres

Abstract:

Mid-level features based on visual dictionaries are today a cornerstone of systems for classification and retrieval of images. Those state-of-the-art representations depend crucially on the choice of a codebook (visual dictionary), which is usually derived from the dataset. In general-purpose, dynamic image collections (e.g., the Web), one cannot have the entire collection in order to extract a representative dictionary. However, based on the hypothesis that the dictionary reflects only the diversity of low-level appearances and does not capture semantics, we argue that a dictionary based on a small subset of the data, or even on an entirely different dataset, is able to produce a good representation, provided that the chosen images span a diverse enough portion of the low-level feature space. Our experiments confirm that hypothesis, opening the opportunity to greatly alleviate the burden in generating the codebook, and confirming the feasibility of employing visual dictionaries in large-scale dynamic environments.

The authors use the Caltech-101 image set because of its “diversity.” Odd because they cite the Caltech-256 image set, which was created to answer concerns about the lack of diversity in the Caltech-101 image set.

Not sure this paper answers the issues it raises about visual dictionaries.

Wanted to bring it to your attention because representative dictionaries (as opposed to comprehensive ones) may be lurking just beyond the semantic horizon.
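For readers new to the term, a “visual dictionary” is just a codebook of clustered local descriptors, with each image then encoded as a histogram over the codewords. A generic sketch with random stand-in descriptors (not the paper’s experimental setup):

# Bag-of-visual-words in miniature: cluster descriptors into a codebook,
# then describe an image as a normalized histogram of nearest codewords.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
training_descriptors = rng.random((5000, 128))   # stand-in for SIFT-like features

codebook = KMeans(n_clusters=64, n_init=10, random_state=0)
codebook.fit(training_descriptors)

def encode(image_descriptors):
    words = codebook.predict(image_descriptors)
    hist, _ = np.histogram(words, bins=np.arange(65))
    return hist / hist.sum()

print(encode(rng.random((300, 128))).shape)      # (64,)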

April 2, 2012

SAXually Explicit Images: Data Mining Large Shape Databases

Filed under: Data Mining,Image Processing,Image Recognition,Shape — Patrick Durusau @ 5:46 pm

SAXually Explicit Images: Data Mining Large Shape Databases by Eamonn Keogh.

ABSTRACT

The problem of indexing large collections of time series and images has received much attention in the last decade, however we argue that there is potentially great untapped utility in data mining such collections. Consider the following two concrete examples of problems in data mining.

Motif Discovery (duplication detection): Given a large repository of time series or images, find approximately repeated patterns/images.

Discord Discovery: Given a large repository of time series or images, find the most unusual time series/image.

As we will show, both these problems have applications in fields as diverse as anthropology, crime…

Ancient history in the view of some, this is a Google talk from 2006!

But, it is quite well done and I enjoyed the unexpected application of time series representation to shape data for purposes of evaluating matches. It is one of those insights that will stay with you and that seems obvious after they say it.
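The conversion is simple to sketch: unroll the shape’s boundary into a 1-D signal of distances from the centroid, then treat that signal like any other time series. A minimal example with a made-up contour; this is the general trick, not Keogh’s exact pipeline.

# Turn a closed contour into a 1-D "time series" of centroid distances.
import numpy as np

def contour_to_series(points):
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    return np.linalg.norm(pts - centroid, axis=1)   # one value per boundary point

square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
series = contour_to_series(square)
print(series.round(2))   # peaks at corners, dips at edge midpoints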

I think topic map authors (semantic investigators generally) need to report such insights for the benefit of others.

March 31, 2012

Incremental face recognition for large-scale social network services

Filed under: Image Recognition,Social Networks — Patrick Durusau @ 4:10 pm

Incremental face recognition for large-scale social network services by Kwontaeg Choi, Kar-Ann Toh, and Hyeran Byun.

Abstract:

Due to the rapid growth of social network services such as Facebook and Twitter, incorporation of face recognition in these large-scale web services is attracting much attention in both academia and industry. The major problem in such applications is to deal efficiently with the growing number of samples as well as local appearance variations caused by diverse environments for the millions of users over time. In this paper, we focus on developing an incremental face recognition method for Twitter application. Particularly, a data-independent feature extraction method is proposed via binarization of a Gabor filter. Subsequently, the dimension of our Gabor representation is reduced considering various orientations at different grid positions. Finally, an incremental neural network is applied to learn the reduced Gabor features. We apply our method to a novel application which notifies new photograph uploading to related users without having their ID being identified. Our extensive experiments show that the proposed algorithm significantly outperforms several incremental face recognition methods with a dramatic reduction in computational speed. This shows the suitability of the proposed method for a large-scale web service with millions of users.

Any number of topic map uses suggest themselves for robust face recognition software.

What’s yours?
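If you want to poke at the feature-extraction step the abstract describes, here is a hedged sketch of Gabor filtering plus a crude binarization with OpenCV. The kernel parameters and the sign-based binarization are my assumptions, not the paper’s settings, and the file path is a placeholder.

# Gabor filtering and binarization of the response, sketched with OpenCV.
import cv2
import numpy as np

image = cv2.imread("face.png", cv2.IMREAD_GRAYSCALE)   # placeholder path

kernel = cv2.getGaborKernel(ksize=(21, 21), sigma=4.0, theta=np.pi / 4,
                            lambd=10.0, gamma=0.5)
response = cv2.filter2D(image, cv2.CV_32F, kernel)

# Keep only the sign of the response, as a crude stand-in for the paper's
# data-independent binarization.
binary_feature = (response > 0).astype(np.uint8)
print(binary_feature.shape, binary_feature.dtype)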

December 27, 2011

Computer Vision & Math

Filed under: Image Recognition,Image Understanding,Mathematics — Patrick Durusau @ 7:10 pm

Computer Vision & Math

From the website:

The main part of this site is called Home of Math. It’s an online mathematics textbook that contains over 800 articles with over 2000 illustrations. The level varies from beginner to advanced.

Try our image analysis software. Pixcavator is a light-weight program intended for scientists and engineers who want to automate their image analysis tasks but lack a significant computing background. This image analysis software allows the analyst to concentrate on the science and lets us take care of the math.

If you create image analysis applications, consider Pixcavator SDK. It provides a simple tool for developing new image analysis software in a variety of fields. It allows the software developer to concentrate on the user’s needs instead of development of custom algorithms.

November 10, 2011

Google1000 dataset

Filed under: Dataset,Image Recognition,Machine Learning — Patrick Durusau @ 6:46 pm

Google1000 dataset

From the post:

This is a dataset of scans of 1000 public domain books that was released to the public at ICDAR 2007. At the time there was no public serving infrastructure, so few people actually got the 120GB dataset. It has since been hosted on Google Cloud Storage and made available for public download: (see the post for the links)

Intended for OCR and machine learning purposes, the results of which you may wish to unite in topic maps with other resources.

September 17, 2011

The Revolution(s) Are Being Televised

Filed under: Crowd Sourcing,Image Recognition,Image Understanding,Marketing — Patrick Durusau @ 8:17 pm

Revolutions usually mean human rights violations, lots of them.

Patrick Meier has a project to collect evidence of mass human rights violations in Syria.

See: Help Crowdsource Satellite Imagery Analysis for Syria: Building a Library of Evidence

Topic maps are an ideal solution to link objects in dated satellite images to eye witness accounts, captured military documents, ground photos, news accounts and other information.

I say that for two reasons:

First, with a topic map you can start from any linked object in a photo, a witness account, a ground photo or a news account and see all related evidence for that location. Granted, that takes someone authoring the collation, but it doesn’t have to be only one someone.

Second, topic maps offer parallel subject processing, which can distribute the authoring task in a crowd-sourced project. For example, I could be doing photo analysis and marking the locations of military checkpoints. That would generate topics and associations for the geographic location, the type of installation, dates (from the photos), etc. Someone else could be interviewing witnesses and taking their testimony. As part of processing that testimony, another volunteer codes an approximate date and geographic location in connection with part of it. Still another person is coding military orders by identified individuals for checkpoints that include the one in question. Associations between all these separately encoded bits of evidence, each unknown to the individual volunteers, become a mouse-click away from coming to the attention of anyone reviewing the evidence. And determining responsibility.

The alternative, the one most commonly used, is to have an under-staffed international group piece together the best evidence it can from a sea of documents, photos, witness accounts, etc. An adequate job for the resources they have, but why settle for an “adequate” job when it can be done properly with 21st century technology?
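To make the parallel-authoring idea above concrete, here is a toy sketch of separately entered evidence being collated by a shared location key. All of the data is invented, and a real topic map engine would merge on far richer identity criteria.

# Toy sketch: independently added evidence topics, collated by location.
from collections import defaultdict

topics = []          # each topic: a dict of identifying properties
associations = []    # each association: evidence sharing a location

def add_evidence(kind, location, date, detail):
    topics.append({"kind": kind, "location": location, "date": date,
                   "detail": detail})
    return len(topics) - 1

# Volunteer 1: photo analysis.  Volunteer 2: witness testimony coding.
a = add_evidence("checkpoint", "33.51N,36.29E", "2011-08-14", "from satellite photo")
b = add_evidence("testimony", "33.51N,36.29E", "2011-08-14", "witness account #17")

# Collation step: associate evidence that shares a location.
by_location = defaultdict(list)
for i, t in enumerate(topics):
    by_location[t["location"]].append(i)
for loc, members in by_location.items():
    if len(members) > 1:
        associations.append({"location": loc, "members": members})

print(associations)   # the photo and the testimony are now one click apart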

GRASS: Geographic Resources Analysis Support System

GRASS: Geographic Resources Analysis Support System

The post about satellite imagery analysis for Syria made me curious about tools for use for automated analysis of satellite images.

From the webpage:

Commonly referred to as GRASS, this is free Geographic Information System (GIS) software used for geospatial data management and analysis, image processing, graphics/maps production, spatial modeling, and visualization. GRASS is currently used in academic and commercial settings around the world, as well as by many governmental agencies and environmental consulting companies. GRASS is an official project of the Open Source Geospatial Foundation.

You may also want to visit the Open Dragon project.

From the Open Dragon site:

Availability of good software for teaching Remote Sensing and GIS has always been a problem. Commercial software, no matter how good a discount is offered, remains expensive for a developing country, cannot be distributed to students, and may not be appropriate for education. Home-grown and university-sourced software lacks long-term support and the needed usability and robustness engineering.

The OpenDragon Project was established in the Department of Computer Engineering of KMUTT in December of 2004. The primary objective of this project is to develop, enhance, and maintain a high-quality, commercial-grade software package for remote sensing and GIS analysis that can be distributed free to educational organizations within Thailand. This package, OpenDragon, is based on the Version 5 of the commercial Dragon/ips® software developed and marketed by Goldin-Rudahl Systems, Inc.

As of 2010, Goldin-Rudahl Systems has agreed that the Open Dragon software, based on Dragon version 5, will be open source for non-commercial use. The software source code should be available on this server by early 2011.

And there is always the commercial side, if you have funding: ArcGIS. Esri, the maker of ArcGIS, supports several open source GIS projects.

The results of using these or other software packages can be tied to other information using topic maps.

September 11, 2011

New Challenges in Distributed Information Filtering and Retrieval

New Challenges in Distributed Information Filtering and Retrieval

Proceedings of the 5th International Workshop on New Challenges in Distributed Information Filtering and Retrieval
Palermo, Italy, September 17, 2011.

Edited by:

Cristian Lai – CRS4, Loc. Piscina Manna, Building 1 – 09010 Pula (CA), Italy

Giovanni Semeraro – Dept. of Computer Science, University of Bari, Aldo Moro, Via E. Orabona, 4, 70125 Bari, Italy

Eloisa Vargiu – Dept. of Electrical and Electronic Engineering, University of Cagliari, Piazza d’Armi, 09123 Cagliari, Italy

Table of Contents:

  1. Experimenting Text Summarization on Multimodal Aggregation
    Giuliano Armano, Alessandro Giuliani, Alberto Messina, Maurizio Montagnuolo, Eloisa Vargiu
  2. From Tags to Emotions: Ontology-driven Sentimental Analysis in the Social Semantic Web
    Matteo Baldoni, Cristina Baroglio, Viviana Patti, Paolo Rena
  3. A Multi-Agent Decision Support System for Dynamic Supply Chain Organization
    Luca Greco, Liliana Lo Presti, Agnese Augello, Giuseppe Lo Re, Marco La Cascia, Salvatore Gaglio
  4. A Formalism for Temporal Annotation and Reasoning of Complex Events in Natural Language
    Francesco Mele, Antonio Sorgente
  5. Interaction Mining: the new Frontier of Call Center Analytics
    Vincenzo Pallotta, Rodolfo Delmonte, Lammert Vrieling, David Walker
  6. Context-Aware Recommender Systems: A Comparison Of Three Approaches
    Umberto Panniello, Michele Gorgoglione
  7. A Multi-Agent System for Information Semantic Sharing
    Agostino Poggi, Michele Tomaiuolo
  8. Temporal characterization of the requests to Wikipedia
    Antonio J. Reinoso, Jesus M. Gonzalez-Barahona, Rocio Muñoz-Mansilla, Israel Herraiz
  9. From Logical Forms to SPARQL Query with GETARUN
    Rocco Tripodi, Rodolfo Delmonte
  10. ImageHunter: a Novel Tool for Relevance Feedback in Content Based Image Retrieval
    Roberto Tronci, Gabriele Murgia, Maurizio Pili, Luca Piras, Giorgio Giacinto