How Google “Translates” Pictures into Words Using Vector Space Mathematics
From the post:
Translating one language into another has always been a difficult task. But in recent years, Google has transformed this process by developing machine translation algorithms that change the nature of cross cultural communications through Google Translate.
Now that company is using the same machine learning technique to translate pictures into words. The result is a system that automatically generates picture captions that accurately describe the content of images. That’s something that will be useful for search engines, for automated publishing and for helping the visually impaired navigate the web and, indeed, the wider world.
…
This is one of the best C-suite-level explanations I have seen of Show and Tell: A Neural Image Caption Generator.
It may be useful to you in obtaining support or funding for similar efforts in your domain.
Take particular note of the decision not to worry overmuch about the meaning of words. I would never make that simplifying assumption; it goes against the grain for me that the meaning of the words should not matter. However, I am very glad that Oriol Vinyals and colleagues made that assumption!
That assumption enables the processing of images at a large scale.
I started to write that I would not use such an assumption for more precise translation tasks, say the translation of cuneiform tablets. But as a rough finding aid for untranslated cuneiform or hieroglyphic texts, this could be the very thing. It doesn’t have to be 100% precise or accurate, just good enough that the vast archives of ancient materials become easier to use.
Is there an analogy for topic maps here? Namely, that topic maps need not be final, production-quality materials when released but can be refined over time by authors, editors and others?
Like Wikipedia, but not quite so eclectic and more complete. Imagine a Solr reference manual that inlines, or at least links to, the most recent presentations and discussions on a particular topic, and incorporates information from those sources into the text.
Is Google offering us “good enough” results with data, in the expectation that others will refine the data further? Perhaps this is a value-add economic model in which the producer of the “good enough” content has an interest in the further refinement of that data by others?