NLP Weather: High Pressure or Low?

Machine Translation Without the Translation by Geoffrey Pullum.

From the post:

I have been ruminating this month on why natural language processing (NLP) still hasn’t arrived, and I have pointed to three developments elsewhere that seem to be discouraging its development. First, enhanced keyword search via Google’s influentiality-ranking of results. Second, the dramatic enhancement in applicability of speech recognition that dialog design facilitates. I now turn to a third, which has to do with the sheer power of number-crunching.

Machine translation is the unclimbed Everest of computational linguistics. It calls for syntactic and semantic analysis of the source language, mapping source-language meanings to target-language meanings, and generating acceptable output from the latter. If computational linguists could do all those things, they could hang up the “mission accomplished” banner.

What has emerged instead, courtesy of Google Translate, is something utterly different: pseudotranslation without analysis of grammar or meaning, developed by people who do not know (or need to know) either source or target language.

The trick: huge quantities of parallel texts combined with massive amounts of rapid statistical computation. The catch: low quality, and output inevitably peppered with howlers.

Of course, if I may purloin Dr Johnson’s remark about a dog walking on his hind legs, although it is not done well you are surprised to find it done at all. For Google Translate’s pseudotranslation is based on zero linguistic understanding. Not even word meanings are looked up: The program couldn’t care less about the meaning of anything. Here, roughly, is how it works.


My conjecture is that it is useful enough to constitute one more reason for not investing much in trying to get real NLP industrially developed and deployed.

NLP will come, I think; but when you take into account the ready availability of (1) Google search, and (2) speech-driven applications aided by dialog design, and (3) the statistical pseudotranslation briefly discussed above, the cumulative effect is enough to reduce the pressure to develop NLP, and will probably delay its arrival for another decade or so.

I am surprised to find that Geoffrey thinks more pressure would result in “real NLP,” albeit delayed by a decade or so for the reasons outlined in his post.

If you recall, machine translation of texts was the hot topic at the end of the 1950s and in the early 1960s.

The emphasis was on automatic translation of Russian. It was the height of the Cold War, so there was lots of pressure for a solution.

Lots of pressure then did not result in a solution.

There’s a rather practical reason for not investing in “real NLP.”

There is no evidence that we know how humans “understand” language well enough to program a computer to mimic that “understanding.”

If Geoffrey has evidence to the contrary, I am sure everyone would be glad to hear about it.
