top | item 8222017

On the foolishness of “natural language programming”

132 points | LiveTheDream | 11 years ago | cs.utexas.edu | reply

152 comments

[+] kyllo|11 years ago|reply
The hardest part about software always has been, and probably always will be, getting humans to first know and then express exactly what they want. Natural human language allows for vague, abstract, and ambiguous expression (not to mention the euphemisms, exaggerations, half-truths and outright lies).

Most people who have never written a computer program have probably never even been through the experience of having to express exactly what they want someone or something else to do for them, in a specific and non-ambiguous manner. It really is a different way of thinking.

[+] Kenji|11 years ago|reply
Indeed. Not only that, but the power of a precise language is not to be underestimated. I noticed that I had (and probably still have) a lot of erroneous concepts in my head, and the only way to uncover the mistakes is to express them in a rigorous, formal language that forces you to think about every detail, no matter how 'obvious' it appeared at first.
[+] libria|11 years ago|reply
This is the chief reason I'd support the "Anyone can learn to code" programs suddenly en vogue. We need our business owners - ideally customers - to be able to consider what exactly they want and don't want. We never need a line of code from them, but as long as they have the ability to consider their domain accurately, a good developer should be able to extract it from them.
[+] dyeje|11 years ago|reply
This is why exercises such as writing the steps to make a PB&J sandwich are instructive. They show how ambiguous natural language really is.
[+] baddox|11 years ago|reply
And one consequence of that, which is often overlooked when evaluating any AI application, is that humans can be really "bad" (or at least inconsistent and indecisive) at these types of problems. People miscommunicate and mishear each other all the time. We disagree about whether things are grammatical, the definitions of words, the implications of someone's tone, whether two people look alike, what a hastily handwritten note says, etc. When evaluating an AI application, people always assume that, even if the problems themselves are hard, the responses are obvious and trivial to verify. But that's not the case even for everyday human interaction.
[+] netcan|11 years ago|reply
OTOH, If you can communicate to me in a certain way with euphemisms and idioms, then all the information that I get can be gotten from that communication.

It's not an impossible task, just a difficult one.

[+] SilasX|11 years ago|reply
Not surprisingly, when you try to go the other direction -- adding disambiguating features to natural language for cases where you don't want the other party to have to make a contextual guess -- people try to remove the precision.

Case in point: the debasement of the term "literally".

[+] nawitus|11 years ago|reply
It's possible* to build software that understands natural human language and allows for vague, abstract expressions. Sometimes communication fails, which is why different "commands" should have different risk levels and different degrees of formality. For example, if I order pizza, I can be pretty vague, but when I order clothes online, I need to specify the size, color etc. more formally.

* It's possible since you can always simulate a human brain with software, but there are of course more practical artificial general intelligence systems like the AIXI-mc.
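The risk-tier idea above can be sketched in a few lines. The command names and required fields here are invented for illustration, not taken from any real system: low-risk commands tolerate vagueness, high-risk ones demand the formality up front.

```python
# Hypothetical sketch of the risk-tier idea: the riskier the command,
# the more fields the system insists on before acting.
REQUIRED_FIELDS = {
    "order_pizza":   [],                 # vague requests are fine
    "order_clothes": ["size", "color"],  # must be specified formally
}

def execute(command, **details):
    missing = [f for f in REQUIRED_FIELDS[command] if f not in details]
    if missing:
        return "Please specify: " + ", ".join(missing)
    return "Executing " + command

print(execute("order_pizza"))              # Executing order_pizza
print(execute("order_clothes", size="M"))  # Please specify: color
```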

[+] scoofy|11 years ago|reply
Graduate of analytic philosophy here (concentrating in philosophy of language), and my takeaway is a sloppy: "It's not a bug, it's a feature."

Sorry, but this is a completely shocking article to me. First in its immediate dismissal of any formalism inherent in natural language, but also in the ease with which he dismisses the proposition without any real consideration.

If we learned anything from Chomsky, it's that the underlying grammar we are born with is instinctual and follows formal rules. To say otherwise is literally, demonstrably false. Irregular verbs, for example, aren't learned in the traditional sense; one must actually unlearn the formal rules. Any child who tells you she "swimmed" all afternoon is using a more formal version of English than you do.

The idea of a natural language programming language is flawed, but not by formalism. It's flawed by the evolutionary nature of natural languages. That is, the very people that he states "are no longer able to use their native tongue effectively," are probably using a new dialect that shares a common ancestor with his more "traditional" usage.

Many people in this thread are talking about the inability of plebes to express what they actually want. This is a fair point, but not a problem with language specifically. Communication tools would need to be employed by a computer in the same way as humans use them. E.g., a simple ambiguity checker could work wonders here, as it does between humans when someone you are talking to simply says, "What did you mean when you said you 'realized you forgot your phone at the train station'? Did you forget at the train station, or realize at the train station?".

What IS a problem, however, is Quine's indeterminacy of translation. That could pose serious hurdles that may be insurmountable; however, we still have effective communication between humans, so it's easy to see how this may be only a theoretical problem rather than a formal one.

This subject should be under the purview of analytic philosophy and linguistics, not mathematics or computer science.

[+] 305b283f|11 years ago|reply
Agreed.

It seems an appalling number of people agree with Dijkstra here - "we not only don't need natural language, we don't even want it."

I'm sorry, but if I can express to a human being a set of directions to fill out a form in a minute or two and expressing that to a computer takes much longer and is more error-prone, that is an inefficiency in software development which it is extremely desirable to address.

There is nothing magical about human brains that would make them theoretically impossible to express in software in such a way that we can give a program natural language directions, and the fact that so many people want to dismiss this endeavor out of hand is ridiculous to me.

This should be our holy grail, something to strive toward, not something to ignore.

In fact I would suggest that it will most likely be the only way out of the mess of such a wide variety of software standards (ever have fun moving your things to a new system and having to re-learn many things just because you changed PC or phone operating systems?), whereas natural language is a standard we already have and works fine.

This way you essentially wouldn't need to learn a new set of incantations - just tell the damn thing what you want the damn thing to do, dammit.

[+] barrkel|11 years ago|reply
The problem is a bridge between two disciplines. Saying that it should only be built from one side is no better than saying it should only be built from the other.

Indeterminacy of translation is indeed part of the problem. But you're not saying anything new by bringing it up - it's just a jargon term from linguistics to describe a problem a programmer might illustrate with the (buffalo)+ sentence or "(time|fruit) flies like (an arrow|a banana)" example. They already know the core of the problem without needing the whole weight of a linguistics education.
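The "(time|fruit) flies" family of examples can be made concrete with a toy grammar. The handful of rules below is invented here for illustration (not from any real NLP toolkit), and already licenses two structurally different parses of "fruit flies like a banana":

```python
WORDS = "fruit flies like a banana".split()

# A toy grammar, invented for illustration. A single-symbol right-hand
# side that is not itself a grammar key is treated as a literal word.
GRAMMAR = {
    "S":   [("NP", "VP")],
    "NP":  [("N",), ("Adj", "N"), ("Det", "N")],
    "VP":  [("V", "NP"), ("V", "PP")],
    "PP":  [("P", "NP")],
    "N":   [("fruit",), ("flies",), ("banana",)],
    "Adj": [("fruit",)],
    "V":   [("flies",), ("like",)],
    "P":   [("like",)],
    "Det": [("a",)],
}

def parses(sym, i, j):
    """Every way `sym` can derive WORDS[i:j], as a bracketed tree string."""
    out = []
    for rhs in GRAMMAR[sym]:
        if len(rhs) == 1 and rhs[0] not in GRAMMAR:      # a literal word
            if j - i == 1 and WORDS[i] == rhs[0]:
                out.append("(%s %s)" % (sym, rhs[0]))
        elif len(rhs) == 1:                              # unit production
            out.extend("(%s %s)" % (sym, t) for t in parses(rhs[0], i, j))
        else:                                            # binary rule
            for k in range(i + 1, j):
                for left in parses(rhs[0], i, k):
                    for right in parses(rhs[1], k, j):
                        out.append("(%s %s %s)" % (sym, left, right))
    return out

for tree in parses("S", 0, len(WORDS)):
    print(tree)   # two structurally different trees for one sentence
```

One tree reads "fruit flies" as a noun phrase that likes a banana; the other has fruit flying the way a banana does.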

The obvious problem of indeterminacy of translation is why, when computer scientists talk about natural language programming, they do not normally mean natural language processing - even more so in Dijkstra's time, when computers were slower and our NLP algorithms worse.

The core that Dijkstra is getting at here is symbolic reasoning. He's pointing out that natural language is a poor fit for symbolic reasoning, that there's been a history of movement from rhetorical reasoning to symbolic reasoning in mathematics - in fact that mathematics stagnated where rhetorical reasoning persisted.

Even if we solved the translation indeterminacy problem, we would need Strong AI to convert such a high level description into something concrete enough for a computer to do. We propel computing machines using levers made out of abstractions - the higher the tower of abstraction, the longer the lever, and the greater the power. But the problem of programming is not in pushing the lever, it's in building the lever. In a word, it's engineering, not philosophy - it's about how, and not what.

[+] leephillips|11 years ago|reply
"If we learned anything from Chomsky, it's that the underlying grammar we are born with is both instinctual, and follows formal rules. To say differently is literally, demonstrably false."

The rejection of this idea is not only not "demonstrably false", it's actually pretty commonplace among linguists. It would be more accurate to say that Chomsky's theory of a universal grammar is demonstrably false.

"the evolutionary nature of natural languages"

One of Chomsky's more widely criticized ideas (and a pretty bizarre one), is that the language instinct could not have arisen by evolution through natural selection.

[+] exelius|11 years ago|reply
As someone who also has a degree in philosophy (specifically, philosophy of language as it relates to math), I'd say analytic philosophy, mathematics and computer science share a lot of common ground. Even the best natural language processing algorithms can only really come up with a probability of the intent of a sentence; this is often not enough for a programming language (which by necessity requires precision to be repeatable).

Nouns themselves can take a number of different meanings (specifically proper nouns and names) depending on context. Communicative languages often don't distinguish between equality and identity, a critical distinction in computer science (i.e. I can truthfully say "I've eaten at the same restaurant 3 days in a row" if I ate at 3 different McDonalds locations on consecutive days). These assumptions about identity and equality are not the same from language to language (or even generation to generation, as you discussed). Context is also an issue; we often discuss things in ambiguous contexts which require clarifying questions from a human. We may have effective communication between humans, but misunderstandings are common. IMO, language is an effective communication tool almost specifically because it is imprecise: we fill in the gaps with our own experiences and it mostly works out.

Overall, I don't think EWD was saying that natural language programming would be impossible; just that the effort required to program in it would likely be more than learning a symbolic programming language. The computer would need to ask so many clarifying questions to reach the level of specificity required for computer science that it would be a very arduous task. Rather than making computer programming more accessible, natural language programming would make it substantially more difficult.

[+] tiger10guy|11 years ago|reply
The ideas that I, as a programmer in the traditional sense, want to communicate to computers are not the ideas I want to communicate to humans.

I might ask my friend to move the report he's working on to a shared network location so I can load it into my computer and read it: "Hey Joe, can you move the report to the share?"

Joe might ask the computer to do the same thing: "cp /home/joe/reports/cool_report.pdf /network/share/reports/cool_report.pdf"

The actual ideas that are communicated are very similar, but not the same. English is good for communicating one idea while bash/GNU is good for communicating the other.

Just because English has some established formalism doesn't mean it's good at communicating the ideas we want to communicate to computers.

BTW, I don't care which field you put the issue under; it's the same issue and anyone who cares about it might contribute to the discussion.

[+] bsenftner|11 years ago|reply
I've been working with a natural language conversational agent for the last few months, creating a simulated personality from a novel, as a promotion for the novel. Natural language opens up a huge host of unreasonable expectations from the user, but more interesting is the utter gibberish people enter thinking they are conversing. When I say utter gibberish, that is exactly what people enter: sentence fragments with no subject, no verb, and no correct spellings outside of "a"; even "the" is most often spelled "teh". I suspect "natural language" makes people relax, but even when we hook in ASR (automated speech recognition) to get away from the gibberish spellings, the "sentences" people enter do not make sense. To some degree, I believe people are testing the limits of the natural language and the knowledge base backing it, but too many of the "conversations" reveal an expectation of unreal super-knowledge, like "you should know what I'm hinting at, even though I can't spell or describe it".
[+] TheLoneWolfling|11 years ago|reply
Sounds a lot like SMS conversations.

It might as well be a different language in a lot of ways.

[+] canjobear|11 years ago|reply
It's sad to see him buying into this silliness:

> Remark. As a result of the educational trend away from intellectual discipline, the last decades have shown in the Western world a sharp decline of people's mastery of their own language: many people that by the standards of a previous generation should know better, are no longer able to use their native tongue effectively, even for purposes for which it is pretty adequate.

[+] shubb|11 years ago|reply
When people gain a certain authority by being very smart in a narrow field, they often use it to talk about the problems of society in general.

Because, though they are experts in their small field, they are no more informed or unbiased than your friend at the pub, and they end up sounding dumb.

In this and some of his other writings, he would do well to think and gather data about social issues, or talk mostly about algorithms.

[+] pnathan|11 years ago|reply
~2010 Literacy rates in the US are roughly 15ish% fully literate (it's a bit tangled to try and reduce down to a single number; the documents didn't give one, and the data files are in some odd format), according to the US literacy groups and their surveys. It's also worth noting that literacy for the councils on literacy is a pretty low bar.

While I can't speak to the total trend (measurements of literacy have changed, as has the prevalence of testing), it's fairly clear in a qualitative fashion that total reading capabilities have declined over the last 100 years. Examine pulps (cheap entertainment books) from the late 1800s, along with children's books of the time... significantly more complex paragraphs and much larger vocabulary.

[+] why-el|11 years ago|reply
Dijkstra is right, and this has been a standard problem in philosophy and logic. Only when one considers programming to be some new endeavor divorced from its logical roots does this problem seem new and puzzling. For more, check Russell's Theory of Definite Descriptions.
[+] scoofy|11 years ago|reply
I'm not sure i follow. The problem of definite descriptions has more to do with reference in the external world. I'm not sure i see how it would be a problem with the limited scope of a formal framework in a programming language.

When you're dealing with variables, "morning star" "evening star" problems are essentially irrelevant because you are defining things, rather than merely naming them.

[+] tree_of_item|11 years ago|reply
Dijkstra is wrong on this one. Siri and Cortana have already achieved this for trivial classes of programs, and they will only improve.

People comparing natural language programming to EULAs are missing the point entirely: natural language input guides a search process for formal programs, it isn't literally "the program" itself.

[+] tel|11 years ago|reply
I think Dijkstra would not be bothered by this. His arguments tend to be that even with such a search process, it's the very province of formal language to give someone the power to guide that search.

In other words, you'd need formal language to be able to specify what you truly want, then would "translate" it to natural language to execute the search, the processor would search for the proper formal language expression, and then you would verify.

Or you could just skip all the intermediate steps.

[+] mamcx|11 years ago|reply
So they are not programming. The ones who build the backend do the programming, and put a natural language input interface on top as a frontend.
[+] ar7hur|11 years ago|reply
Natural language is ambiguous: one phrase often has multiple meanings. That's the result of many thousands of years of evolution, and it's actually very efficient. If we had to be always explicit, our phrases and sentences would be much, much longer -- and boring. When A says something to B, A makes assumptions about B's context, common sense, beliefs and knowledge. The verbal message itself contains just the minimum information needed, on top of this pre-existing information, in order for B to get it.

Natural Language can be seen as a very efficient compression algorithm. The phrase the speaker chooses is the shortest message, given the context of the receiver.

Programming a computer with Natural Language is incredibly difficult, because Natural Language alone, without the context it is built upon, really lacks much of the information the computer needs to operate the program.

[+] falcolas|11 years ago|reply
All I can picture coming from this is reams of EULAs.

Legal documents contain the most precise language we humans can create that can still be considered to be "natural", and they're still all but unreadable without the proper degree and legal context.

Not precisely what I would call a "win" for the ease of programming

[+] malisper|11 years ago|reply
Dijkstra clearly points out that it would be impossible to program in a natural language. The thing is, there are constructed languages such as Lojban[0]. Lojban, while semantically ambiguous, is grammatically unambiguous. This means a computer can parse something said in Lojban, even though it cannot perfectly understand what is meant. While I'm no expert, I'm pretty sure it would not be very hard to create a computer interface using Lojban. I'm curious whether anyone can explain why this is or isn't feasible.

[0] http://www.lojban.org/tiki/Lojban

[+] rwallace|11 years ago|reply
That wouldn't help significantly, because the hard part of getting a computer to understand instructions in English isn't understanding English, it's understanding the universe.

Suppose you want a robot you can give instructions like "clean the kitchen". Programming the robot to understand this as performing the action 'clean' on the object 'kitchen' is something we can already do. The problem is that the robot doesn't know how to perform that task. It doesn't even know what what state of affairs constitutes the desirable end result of a clean kitchen (as opposed to e.g. an empty kitchen because it threw out all your food and cutlery along with the trash). That knowledge is the meat of the problem, and it's just as hard in any language.
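The division of labor described above is easy to demonstrate: the "understanding English" half of "clean the kitchen" fits in one line, while the table of procedures that would do the real work stays empty. Everything here is a deliberately minimal, hypothetical sketch:

```python
def parse_imperative(command):
    # The "understanding English" half: map "clean the kitchen" onto an
    # action and an object. This much has long been tractable.
    verb, _, obj = command.partition(" the ")
    return verb, obj

# The "understanding the universe" half: what counts as a clean kitchen,
# and which sequence of actions achieves it? This table is empty on purpose.
PROCEDURES = {}

action, target = parse_imperative("clean the kitchen")
print(action, target)                  # clean kitchen
print((action, target) in PROCEDURES)  # False: parsed, but not performable
```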

[+] mcguire|11 years ago|reply
For some reason, Dijkstra quoting A. E. Housman brightens my day. And then there's this:

"Therefore, although changing to communication between machine and man conducted in the latter's native tongue would greatly increase the machine's burden, we have to challenge the assumption that this would simplify man's life."

[+] todd8|11 years ago|reply
I've noticed the limitations of natural language frequently. Sometimes it's just the inability of natural language to deal with levels of abstraction precisely. Sometimes it's the large assumed domain context that accompanies the use of natural language when describing procedures.

I had clients once where both of these limitations prevented us from being able to produce a product for them. They were incredibly successful Wall Street types. They wanted a software system written that would automate some of what they did. They were traders, but not interested in high-velocity trading. After weeks of meetings about high-level goals, we had a meeting to finally get down to specifics about the procedures and functionality that they wanted. They simply couldn't describe in precise language what they did! They had been doing it for years and years very successfully, but we couldn't help them because they couldn't describe what they did. It was very odd.

They worked with sophisticated mathematical models and strange rules of thumb, the morning news, and the perceived level of activity on the exchange floor. Some of them used fancy interactive graphs while others relied on a simple printout of a spreadsheet full of numbers.

The universe they worked in was very complex and involved decision making that, apparently, was hard to describe in words. Imagine a world-class boxer trying to put into words the algorithm he used to win a match. It was a bit like that.

[+] jdmichal|11 years ago|reply
There is a joke that perfectly illustrates this entire article:

A programmer's wife asks him, "Please go to the store and buy a carton of milk, and if they have eggs, buy a dozen."

The programmer returns home, and his wife is very angry with him. "Why did you buy twelve cartons of milk?!"

English SE question detailing the linguistic mechanisms: http://english.stackexchange.com/questions/40234/bring-6-egg...
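The two readings of the wife's instruction differ only in what the conditional attaches to. In code the ambiguity disappears, because each parse has to be written out explicitly (the function below is just an illustration):

```python
def shop(store_has_eggs):
    # Intended parse: one carton of milk; the condition governs the eggs.
    intended = {"milk": 1, "eggs": 12 if store_has_eggs else 0}
    # Programmer's parse: the condition governs the quantity of milk.
    literal = {"milk": 12 if store_has_eggs else 1, "eggs": 0}
    return intended, literal

print(shop(store_has_eggs=True))
# ({'milk': 1, 'eggs': 12}, {'milk': 12, 'eggs': 0})
```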

[+] cousin_it|11 years ago|reply
Also,

>> At some point I hope to have computer systems I can program by voice in English, as in "House? Could you wake me up at 7?"

> Yeah, well, I fear the answer will be yes (it could), but it won't do so since you haven't asked it to wake you up, only if it could.

From an exchange on python-list, https://mail.python.org/pipermail/python-list/2003-October/1...

[+] schoen|11 years ago|reply
Wow, the examples in the SE discussion are awesome. I was aware of some of these linguistic issues, but the examples really underscore how subtle pragmatics can be. (The examples give completely syntactically parallel sentences that are pragmatically resolved in opposite ways.)
[+] jiggy2011|11 years ago|reply
Wouldn't this imply that the "buy" instruction is stateful?
[+] davesque|11 years ago|reply
This reminds me of when I first learned to program. By far the hardest part was learning how to think clearly and discretely. It changed the way I looked at the world completely. Really, we should want people to overcome this challenge, not avoid it.
[+] pjungwir|11 years ago|reply
I saw the title and thought, "Oh good, someone else who thinks like Dijkstra!" Oh well. :-)

I feel like the halting problem and Godel's theorem imply programmers will never be out of a job, and the best we can do is build out more and more solutions for specific use-cases, like Wordpress and Shopify. I don't think there will ever be a general-purpose AI that can write computer programs. At least not in my lifetime or my children's.

And btw, has anyone else felt that the Halting Problem and Godel's theorem are two sides of the same coin? Is there any formal connection between them? I feel like they are not "independent" (in the sense that Euclid's 5th postulate is independent).

[+] arketyp|11 years ago|reply
>And btw, has anyone else felt that the Halting Problem and Godel's theorem are two sides of the same coin? Is there any formal connection between them? I feel like they are not "independent" (in the sense that Euclid's 5th postulate is independent).

The theorem of the unsolvability of the halting problem is used in modern proofs of Gödel's incompleteness theorem(s) [1]. Hofstadter writes about this relation in Gödel, Escher, Bach as well.

[1] http://www.amazon.com/Lectures-Logic-Set-Theory-Mathematical...

[+] sp332|11 years ago|reply
This reminds me of Inform 7. The source code for the language is pretty much plain English. I like this because programs are read more often than they are written, and it's usually very easy to read a program and see what's going on. But the syntax is still pretty limited, and it only understands certain ways of expressing things. So you still have to learn the syntax and very specific semantics, even though the language looks like English.
[+] michaelvkpdx|11 years ago|reply
If you use the same language to solve a problem as you use to describe the problem and to define success, then your ability to accurately solve the problem will be constrained by the level of precision that is available in the language being used.

Given this, natural language programming can hope to achieve reasonable levels of precision in very few languages. English, for example, has over a million words, while French and Italian are both under 100K. That 10x discrepancy in word count means that there is necessarily far more imprecision in Romance languages than in English. Grammatical constructs can add precision and clarity but cannot make up for the inherent gap in vocabulary precision.

Someone with more knowledge can comment on the feasibility of precise NLP in non-English, non-romance languages.

If, however, acceptance testing is defined through the precision of a low-level programming language, but the problem is defined in a less-precise natural language, there will always be a precision mismatch between the languages used to define the problem. A solution may be implemented that addresses the natural language issue but cannot meet the constraints of the programming language.

[+] mwfunk|11 years ago|reply
Sometimes I wish that the term 'language' had never been used to describe the blobs of text we use to create formal specifications for programs. This invites all sorts of comparisons/analogies/metaphors to spoken and written languages that may not necessarily be meaningful or constructive.

I think of source code as being more analogous to architectural blueprints or formal logic than someone's verbal description of an object. It's just a coincidence that source code is governed by structures and concepts that exist in human languages.

Usually when people say they want to be able to create software with natural language, what they want are better tools that allow them to more efficiently create something, with less time spent worrying about syntax or technical minutiae. This outcome doesn't require natural language, it just requires better tools.

[+] tim333|11 years ago|reply
I kind of disagree with the article and would like to be able to use natural language. The fact that computers are currently too dumb to understand English is an issue, but not the one the author seems to be going on about. One of the reasons I like Python is that it reads rather like English, making code easy to understand, and one of the main reasons Lisp never really took off is that it reads so unlike natural language that it's hard to understand. The comparison with maths seems false to me -- many mathematical concepts express very badly in English and elegantly in mathematical symbolism: complex numbers, path integrals and all that. There seems little with that kind of complexity in computer science; it's all add this, write this to the database, draw this on the screen, etc.
[+] markbnj|11 years ago|reply
I think you missed Dijkstra's point. Why don't we use "natural language" to reason about mathematics, physics, logic, or even the structure of language itself?
[+] the_af|11 years ago|reply
But people are too dumb to understand English as well. In fact, I may have missed your point entirely! (which, oddly, would also favor my position). Why do you want computer programs to improve at something people are notoriously bad at doing?

Python is nothing like English. It is an extremely precise formalism, and thank god for that.

[+] josephschmoe|11 years ago|reply
This article is an Appeal to Novelty fallacy. He never gives a reason why explicit language is better - just that it occurred later in civilization.

Beyond this, natural language programming doesn't preclude explicitness. You could still have portions of limited explicit language or even portions of actual code if that is insufficient.

Natural language programming gets us two advantages:

1. Easy entry (presumably what the author cares about)

2. A language which is composed of an infinite number of DSLs -- but without the pains of limited language scope, and in which DSLs are actually easy to write. This actually fixes the problem the author was talking about: using a language for a purpose it clearly wasn't meant for.

And if you think natural language programming is going to get rid of symbolism in language, I've got a U+1F309 to sell you.

[+] cramerica|11 years ago|reply
The problem is of course the vagueness in natural languages caused by the complexity of its grammar and the unstated contextual nature of it all.

"The black truck driver ran through the red light."

Is the truck black or is the truck driver black?

Is the driver driving, or is he running on foot?

Is the red light a traffic indicator, or is it a just a beam of red light?

Teaching a computer to figure out the intended interpretation here is a monumental task, and teaching the people to understand how their use of language is actually very easy to misunderstand is even harder.
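Each of those three ambiguities is independent, so the candidate readings multiply. A few lines of Python (with the readings enumerated by hand, purely for illustration) make the combinatorics explicit:

```python
from itertools import product

# The three independent ambiguities in the sentence, enumerated by hand.
readings = {
    "black modifies": ["the truck", "the driver"],
    "ran means":      ["drove through", "ran on foot through"],
    "red light is":   ["a traffic signal", "a beam of red light"],
}

interpretations = list(product(*readings.values()))
for combo in interpretations:
    print(combo)
print(len(interpretations))  # 2 * 2 * 2 = 8 readings of one sentence
```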

[+] Qantourisc|11 years ago|reply
Indeed, but figuring this out is tricky.

In Western countries you'd assume it's a black driver. In a country where almost everyone is black you'd assume a black truck. So the code could have bugs depending on the location/culture (and thus context).

[+] hayksaakian|11 years ago|reply
Maybe we need some contextual analysis capabilities and new class of errors based on overly ambiguous code.

(Along with a user-defined value that lets the computer take its best guess.)