Cuneiform « Another Word For It

Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

November 16, 2017

Shape Searching Dictionaries?

Filed under: Cuneiform,Facebook,Subject Identifiers,Subject Identity,Subject Recognition,Unicode — Patrick Durusau @ 11:27 am

Facebook, despite its spying, censorship, and being a shill for the U.S. government, isn’t entirely without value.

For example, this post by Simon St. Laurent:

Drew this response from Peter Cooper:

Which if you follow the link: Shapecatcher: Unicode Character Recognition you find:

Draw something in the box!

And let shapecatcher help you to find the most similar unicode characters!

Currently, there are 11817 unicode character glyphs in the database. Japanese, Korean and Chinese characters are currently not supported.
(emphasis in original)

I take “Japanese, Korean and Chinese characters are currently not supported.” means Anatolian Hieroglyphs; Cuneiform, Cuneiform Numbers and Punctuation, Early Dynastic Cuneiform, Old Persian, Ugaritic; Egyptian Hieroglyphs; Meroitic Cursive, and Meroitic Hieroglphs are not supported as well.

But my first thought wasn’t discovery of glyphs in Unicode Code Charts, although useful, but shape searching dictionaries, such as Faulkner’s A Concise Dictionary of Middle Egyptian.

A sample from Faulkner’s (1991 edition):

Or, The Student’s English-Sanskrit Dictionary by Vaman Shivram Apte (1893):

Imagine being able to search by shape for either dictionary! Not just as a gylph but as a set of glyphs, within any entry!

I suspect that’s doable based on Benjamin Milde‘s explanation of Shapecatcher:

…
Under the hood, Shapecatcher uses so called “shape contexts” to find similarities between two shapes. Shape contexts, a robust mathematical way of describing the concept of similarity between shapes, is a feature descriptor first proposed by Serge Belongie and Jitendra Malik.

You can find an indepth explanation of the shape context matching framework that I used in my Bachelor thesis (“On the Security of reCATPCHA”). In the end, it is quite a bit different from the matching framework that Belongie and Malik proposed in 2000, but still based on the idea of shape contexts.

The engine that runs this site is a rewrite of what I developed during my bachelor thesis. To make things faster, I used CUDA to accelereate some portions of the framework. This is a fairly new technology that enables me to use my NVIDIA graphics card for general purpose computing. Newer cards are quite powerful devices!
…

That was written in 2011 and no doubt shape matching has progressed since then.

No technique will be 100% but even less than 100% accuracy will unlock generations of scholarly dictionaries, in ways not imagined by their creators.

If you are interested, I’m sure Benjamin Milde would love to hear from you.

Comments Off

October 2, 2017

Machine Translation and Automated Analysis of Cuneiform Languages

Filed under: Cuneiform,Language,Machine Learning,Translation — Patrick Durusau @ 8:46 pm

Machine Translation and Automated Analysis of Cuneiform Languages

From the webpage:

The MTAAC project develops and applies new computerized methods to translate and analyze the contents of some 67,000 highly standardized administrative documents from southern Mesopotamia (ancient Iraq) from the 21st century BC. Our methodology, which combines machine learning with statistical and neural machine translation technologies, can then be applied to other ancient languages. This methodology, the translations, and the historical, social and economic data extracted from them, will be offered to the public in open access.

A recently funded (March 2017) project that strikes a number of resonances with me!

“Open access” and cuneiform isn’t an unheard of combination but many remember when access to cuneiform primary materials was a matter of whim and caprice. There are dark pockets where such practices continue but projects like MTAAC are hard on their heels.

The use of machine learning and automated analysis have the potential, when all extant cuneiform texts (multiple projects such as this one) are available, to provide a firm basis for grammars, lexicons, translations.

Do read: Machine Translation and Automated Analysis of the Sumerian Language by Émilie Pagé-Perron, Maria Sukhareva, Ilya Khait, Christian Chiarcos, for more details about the project.

There’s more to data science than taking advantage of sex-starved neurotics with under five second attention spans and twitchy mouse fingers.

Comments Off

May 19, 2017

SketchRNN model released in Magenta [Hieroglyphs/Cuneiform Anyone?]

Filed under: Cuneiform,Hieroglyphics,Neural Networks — Patrick Durusau @ 7:14 pm

SketchRNN model released in Magenta by Douglas Eck.

From the post:

Sketch-RNN, a generative model for vector drawings, is now available in Magenta. For an overview of the model, see the Google Research blog from April 2017, Teaching Machines to Draw (David Ha). For the technical machine learning details, see the arXiv paper A Neural Representation of Sketch Drawings (David Ha and Douglas Eck).

…

To try out Sketch-RNN, visit the Magenta GitHub for instructions. We’ve provided trained models, code for you to train your own models in TensorFlow and a Jupyter notebook tutorial (check it out!)

The code release is timed to coincide with a Google Creative Lab data release. Visit Quick, Draw! The Data for more information. For versions of the data pre-processed to work with Sketch-RNN, please refer to the GitHub repo for more information.

We’ll leave you with a look at yoga poses generated by moving through the learned representation (latent space) of the model trained on yoga drawings. Notice how it gets confused at around 10 seconds when it moves from poses standing towards poses done on a yoga mat. In our arXiv paper A Neural Representation of Sketch Drawings we discuss reasons for this behavior.
…

The paper, A Neural Representation of Sketch Drawings mentions:

…
ShadowDraw [17] is an interactive system that predicts what a finished drawing looks like based on a set of incomplete brush strokes from the user while the sketch is being drawn. ShadowDraw used a dataset of 30K raster images combined with extracted vectorized features. In this work, we use a much larger dataset of vector sketches that is made publicly available.
…

ShadowDraw is described at: ShadowDraw: Real-Time User Guidance for Freehand Drawing as:

…
We present ShadowDraw, a system for guiding the freeform drawing of objects. As the user draws, ShadowDraw dynamically updates a shadow image underlying the user’s strokes. The shadows are suggestive of object contours that guide the user as they continue drawing. This paradigm is similar to tracing, with two major differences. First, we do not provide a single image from which the user can trace; rather ShadowDraw automatically blends relevant images from a large database to construct the shadows. Second, the system dynamically adapts to the user’s drawings in real-time and produces suggestions accordingly. ShadowDraw works by efficiently matching local edge patches between the query, constructed from the current drawing, and a database of images. A hashing technique enforces both local and global similarity and provides sufficient speed for interactive feedback. Shadows are created by aggregating the top retrieved edge maps, spatially weighted by their match scores. We test our approach with human subjects and show comparisons between the drawings that were produced with and without the system. The results show that our system produces more realistically proportioned line drawings.
…

My first thought was the use of such techniques to assist in copying hieroglyphs or cuneiform as such or perhaps to assist in the practice of such glyphs.

OK, that may not have been your first thought but you have to admit it would make a rocking demonstration!

Comments Off