Should-Read: Geoffrey Pullum (2013): Why Are We Still Waiting for Natural Language Processing?

Should-Read: This piece by the interesting Geoffrey Pullum seems to start out non-optimally.

There is a difference between (1) true “AI” on the one hand and (2) a successful voice/text interface to database search on the other. At the moment (2) is easy, and we should implement it. Doing so requires that humans adjust a little and avoid using “not”: figuring out within which superset of results any particular “not” is asking for the complement is genuinely hard, and does require true or nearly-true “AI”.

Thus to solve Pullum’s problem, all you have to do is ask two queries: (i) “Which UK papers are part of the Murdoch empire?”; (ii) “What are the major UK papers?”; take the complement of (i) within (ii), and you immediately get a completely serviceable and useful answer to your question.
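To make the set arithmetic concrete, here is a minimal Python sketch of that workaround: treat the results of queries (i) and (ii) as two small sets and subtract one from the other. The paper names are illustrative stand-ins for whatever the two queries would actually return, not a complete catalogue.

```python
# A sketch of the two-query workaround described above.
# The sets below are illustrative stand-ins for what the queries
# "What are the major UK papers?" and "Which UK papers are part of
# the Murdoch empire?" might return; they are not exhaustive.

major_uk_papers = {
    "The Times", "The Sun", "The Daily Telegraph",
    "Daily Mail", "Daily Mirror", "The Guardian",
}

murdoch_uk_papers = {"The Times", "The Sun"}

# "Which UK papers are NOT part of the Murdoch empire?" is just the
# complement of (i) within (ii): a set difference, no AI required.
non_murdoch_papers = major_uk_papers - murdoch_uk_papers

print(sorted(non_murdoch_papers))
# ['Daily Mail', 'Daily Mirror', 'The Daily Telegraph', 'The Guardian']
```

The hard part, as noted below, is not the subtraction; it is getting the search engine to return the two clean short lists in the first place.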

You need two queries rather than one because Google has not set itself up to produce short lists as possible answers to (ii) and (i) and then subtract (i) from (ii). And the reason it has not done so is that doing so is a hard AI problem, rather than the brute-force-and-massive-ignorance word-frequency-plus-internet-attention that is Google’s shtick.

But what amazes me is that Google can get so close—not that “true AI” is really hard.

And maybe that is Pullum’s real point:

Geoffrey Pullum (2013): Why Are We Still Waiting for Natural Language Processing?: “Try typing this, or any question with roughly the same meaning, into the Google search box… http://www.chronicle.com/blogs/linguafranca/2013/05/09/natural-language-processing/

…Which UK papers are not part of the Murdoch empire?

Your results (and you could get identical ones by typing the same words in the reverse order) will contain an estimated two million or more pages about Rupert Murdoch and the newspapers owned by his News Corporation. Exactly the opposite of what you asked for. Putting quotes round the search string freezes the word order, but makes things worse: It calls not for the answer (which would be a list including The Daily Telegraph, the Daily Mail, the Daily Mirror, etc.) but for pages where the exact wording of the question can be found, and there probably aren’t any (except this post).

Machine answering of such a question calls for not just a database of information about newspapers but also natural language processing (NLP). I’ve been waiting for NLP to arrive for 30 years. Whatever happened?…

Three developments….Google bet on… simple keyword search… [plus] showing the most influential first…. There is scant need for a system that can parse “Are there lizards that do not have legs but are not snakes?” given that putting legless lizard in the Google search box gets you to various Web pages that answer the question immediately….

Speech-recognition systems have been able to take off and become really useful in interactive voice-driven telephone systems… the magic of a technique known as dialog design…. At a point where you have just been asked, “Are you calling from the phone you wish to ask about?” you are extremely likely to say either Yes or No, and it’s not too hard to differentiate those acoustically…. Prompting a bank customer with “Do you want to pay a bill or transfer funds between accounts?” considerably improves the chances of getting something with either “pay a bill” or “transfer funds” in it; and they sound very different…. Classifying noise bursts in a dialog context is way easier than recognizing continuous text….

Machine translation… calls for syntactic and semantic analysis of the source language, mapping source-language meanings to target-language meanings, and generating acceptable output…. What has emerged instead… is… pseudotranslation without analysis of grammar or meaning…. The trick: huge quantities of parallel texts combined with massive amounts of rapid statistical computation. The catch… output inevitably peppered with howlers…. We know that Google Translate has let us down before and we shouldn’t trust it. But with nowhere else to turn (we can’t all keep human translators on staff), we use it anyway. And it does prove useful… enough to constitute one more reason for not investing much in trying to get real NLP industrially developed and deployed.

NLP will come, I think; but when you take into account the ready availability of (1) Google search, and (2) speech-driven applications aided by dialog design, and (3) the statistical pseudotranslation briefly discussed above, the cumulative effect is enough to reduce the pressure to develop NLP, and will probably delay its arrival for another decade or so.
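Pullum’s dialog-design point lends itself to a toy sketch: once the prompt constrains the caller to a couple of expected responses, recognition reduces to picking whichever candidate the noisy input most resembles. The fuzzy string matching below is only a crude stand-in for the acoustic classification he describes, and the candidate phrases are borrowed from his bank example.

```python
import difflib

# Toy illustration of dialog design: a constrained prompt means the
# system only has to decide between a handful of expected responses.
EXPECTED = ["pay a bill", "transfer funds"]

def classify(utterance: str) -> str:
    """Return the expected response that the recognized utterance most resembles."""
    scores = {
        candidate: difflib.SequenceMatcher(None, utterance.lower(), candidate).ratio()
        for candidate in EXPECTED
    }
    return max(scores, key=scores.get)

print(classify("I'd like to pay a bill, please"))  # -> pay a bill
print(classify("um, transfer some funds"))         # -> transfer funds
```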

October 15, 2017

AUTHORS:

Brad DeLong