For Some Definition of “Read” and “Answer” – MS Clickbait

Microsoft creates AI that can read a document and answer questions about it as well as a person by Allison Linn.

From the post:

It’s a major milestone in the push to have search engines such as Bing and intelligent assistants such as Cortana interact with people and provide information in more natural ways, much like people communicate with each other.

A team at Microsoft Research Asia reached the human parity milestone using the Stanford Question Answering Dataset, known among researchers as SQuAD. It’s a machine reading comprehension dataset that is made up of questions about a set of Wikipedia articles.

According to the SQuAD leaderboard, on Jan. 3, Microsoft submitted a model that reached the score of 82.650 on the exact match portion. The human performance on the same set of questions and answers is 82.304. On Jan. 5, researchers with the Chinese e-commerce company Alibaba submitted a score of 82.440, also about the same as a human.

With machine reading comprehension, researchers say computers also would be able to quickly parse through information found in books and documents and provide people with the information they need most in an easily understandable way.

That would let drivers more easily find the answer they need in a dense car manual, saving time and effort in tense or difficult situations.

These tools also could let doctors, lawyers and other experts more quickly get through the drudgery of things like reading through large documents for specific medical findings or rarified legal precedent. The technology would augment their work and leave them with more time to apply the knowledge to focus on treating patients or formulating legal opinions.

Wait, wait! If you read the details about SQuAD, you realize how far Microsoft (or anyone else) is from “…reading through large documents for specific medical findings or rarified legal precedent….”

What is the SQuAD test?

Stanford Question Answering Dataset (SQuAD) is a new reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage. With 100,000+ question-answer pairs on 500+ articles, SQuAD is significantly larger than previous reading comprehension datasets.

Not to take anything away from Microsoft Research Asia or the creators of SQuAD, but “…the answer to every question is a segment of text, or span, from the corresponding reading passage.” is a long way from synthesizing an answer from a long legal document.

The first hurdle is asking a question that can be scored against every “…segment of text, or span…” such that a relevant snippet of text can be found.

The second hurdle is the process of scoring snippets of text in order to retrieve the most useful one. That’s a mechanical process, not one that depends on the semantics of the underlying question or text.

There are other hurdles but those two suffice to show there is no “reading and answering questions” in the same sense we would apply to any human reader.

Click-bait headlines don’t serve the cause of advocating more AI research. On the contrary, a close reading of alleged progress leads to disappointment.

Comments are closed.