Speranza
The Turing Test is one of the most disputed topics in Artificial Intelligence, Philosophy of Mind and Cognitive Science.
It was proposed 50 years ago as a method of determining whether machines can think.
It embodies important philosophical issues, as well as computational ones.
Moreover, because of its characteristics, it requires interdisciplinary attention.
The Turing Test posits that, to be granted intelligence, a computer should imitate human conversational behavior so well that it should be indistinguishable from a real human being.
From this, it follows that conversation is a crucial concept in its study.
Surprisingly, focusing on conversation in relation to the Turing Test has not been a prevailing approach in previous research.
Saygin's thesis provides a thorough and deep review of the 50 years of the Turing Test.
Philosophical arguments, computational concerns, and repercussions in other disciplines are all discussed.
Furthermore, Saygin studies the Turing Test as a special kind of conversation.
In doing so, the relationship between existing theories of conversation and human-computer communication is explored.
The thesis concentrates, in particular, on Herbert Paul Grice's cooperative principle and conversational maxims (and on the idea of 'implicature', first introduced by Grice in his Oxford seminars on "Logic and Conversation", 1965).
Viewing the Turing Test as conversation, and computers as language users, has significant effects on the way we look at Artificial Intelligence, and on communication in general.
Friday, January 30, 2015
The Grice Computer
Speranza
Theological Objection: This states that thinking is a function of man's immortal soul; therefore, a machine cannot think. "In attempting to construct such machines," wrote Turing, "we should not be irreverently usurping His power of creating souls, any more than we are in the procreation of children: rather we are, in either case, instruments of His will providing mansions for the souls that He creates."
"Computing Machinery and Intelligence", written by Alan Turing and published in 1950 in Mind, is a seminal paper on the topic of artificial intelligence in which the concept of what is now known as the Turing test was introduced to a wide audience.
Turing's paper considers the question "Can machines think?"
Since the words "think" and "machine" cannot be defined in a clear way that satisfies everyone, Turing suggests we "replace the question by another, which is closely related to it and is expressed in relatively unambiguous words."
To do this, he must first find a simple and unambiguous idea to replace the word "think"; second, he must explain exactly which "machines" he is considering; and finally, armed with these tools, he formulates a new question, related to the first, that he believes he can answer in the affirmative.
Rather than trying to determine if a machine is thinking, Turing suggests we should ask if the machine can win a game, called the "Imitation Game".
The original Imitation game that Turing described is a simple party game involving three players. Player A is a man, player B is a woman and player C (who plays the role of the interrogator) can be of either sex. In the Imitation Game, player C is unable to see either player A or player B (and knows them only as X and Y), and can communicate with them only through written notes or any other form that does not give away any details about their gender. By asking questions of player A and player B, player C tries to determine which of the two is the man and which is the woman. Player A's role is to trick the interrogator into making the wrong decision, while player B attempts to assist the interrogator in making the right one.
Turing proposes a variation of this game that involves the computer. He writes:
'What will happen when a machine takes the part of A in this game?'
Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between a man and a woman? These questions replace our original, 'Can machines think?'[2] So the modified game becomes one that involves three participants in isolated rooms: a computer (which is being tested), a human, and a (human) judge. The human judge can converse with both the human and the computer by typing into a terminal. Both the computer and human try to convince the judge that they are the human. If the judge cannot consistently tell which is which, then the computer wins the game.[3]
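The structure of the modified game is, at bottom, a small protocol, and can be sketched as such. In the sketch below everything is a hypothetical stand-in: the judge's `ask` and `identify_machine` methods, and the two player callables, are our own inventions, not anything Turing specified.

```python
import random

def imitation_game(judge, computer, human, rounds=5):
    """One session of the modified game: a judge converses with two
    hidden participants and must then say which one is the machine."""
    # Hide the participants behind neutral labels, as in Turing's setup.
    players = {"X": computer, "Y": human}
    if random.random() < 0.5:              # randomize the label assignment
        players = {"X": human, "Y": computer}

    transcripts = {"X": [], "Y": []}
    for _ in range(rounds):
        for label, player in players.items():
            question = judge.ask(label, transcripts)   # hypothetical API
            answer = player(question)                  # both claim humanity
            transcripts[label].append((question, answer))

    guess = judge.identify_machine(transcripts)        # returns "X" or "Y"
    machine_label = "X" if players["X"] is computer else "Y"
    return guess == machine_label   # True iff the judge caught the machine
```

On this rendering, the computer "wins" a series of such sessions if the judge's success rate is no better than in the original man-woman version of the game.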
As Stevan Harnad notes,[4] the question has become "Can machines do what we (as thinking entities) can do?"
In other words, Turing is no longer asking whether a machine can "think".
He is asking whether a machine can act indistinguishably[5] from the way a thinker acts.
This question avoids the difficult philosophical problem of pre-defining the verb "to think" and focuses instead on the performance capacities that being able to think makes possible, and how a causal system can generate them.
Some have taken Turing's question to have been "Can a computer, communicating over a teleprinter, fool a person into believing it is human?" [6] but it seems clear that Turing was not talking about fooling people but about generating human cognitive capacity.[7]
Turing also notes that we need to determine which "machines" we wish to consider.
He points out that a human clone, while man-made, would not provide a very interesting example.
Turing suggested that we should focus on the capabilities of digital machinery—machines which manipulate the binary digits of 1 and 0, rewriting them into memory using simple rules. He gave two reasons.
First, there is no reason to speculate about whether or not they can exist: digital computers already existed in 1950.
Second, digital machinery is "universal." Turing's research into the foundations of computation had proved that a digital computer can, in theory, simulate the behaviour of any other digital machine, given enough memory and time.
This is the essential insight of the Church–Turing thesis and the universal Turing machine.
Therefore, if any digital machine can "act like it is thinking", then every sufficiently powerful digital machine can. Turing writes, "all digital computers are in a sense equivalent."
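This universality claim is concrete enough to demonstrate: simulating another machine is just stepping through its rule table. Here is a minimal sketch, with a rule format of our own choosing rather than Turing's notation:

```python
def run_turing_machine(rules, tape, state="start", blank="_", max_steps=1000):
    """Simulate a single-tape Turing machine. `rules` maps
    (state, symbol) -> (new_state, new_symbol, move), with move
    -1 (left), +1 (right) or 0; the machine stops in state "halt"."""
    cells = dict(enumerate(tape))    # sparse tape: position -> symbol
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = cells.get(head, blank)
        state, cells[head], move = rules[(state, symbol)]
        head += move
    return state, "".join(cells[i] for i in sorted(cells))

# A tiny machine that flips every bit on its tape, then halts.
flip = {
    ("start", "0"): ("start", "1", +1),
    ("start", "1"): ("start", "0", +1),
    ("start", "_"): ("halt", "_", 0),
}
print(run_turing_machine(flip, "0110"))   # -> ('halt', '1001_')
```

Given enough memory and time, the same loop will run any rule table whatsoever, which is the sense in which all digital computers are equivalent.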
This allows the original question to be made even more specific.
Turing now restates the original question as "Let us fix our attention on one particular digital computer C. Is it true that by modifying this computer to have an adequate storage, suitably increasing its speed of action, and providing it with an appropriate programme, C can be made to play satisfactorily the part of A in the imitation game, the part of B being taken by a man?"[8] This question, he believes, can be answered without resorting to speculation or philosophy.
Hence Turing states that the focus is not on "whether all digital computers would do well in the game nor whether the computers that are presently available would do well, but whether there are imaginable computers which would do well".[9] What matters is the advance possible in the state of our machines, regardless of whether we presently have the resources to create one.
Having clarified the question, Turing turned to answering it: he considered the following nine common objections, which include all the major arguments against artificial intelligence raised in the years since his paper was first published.[10]
Be kind, resourceful, beautiful, friendly, have initiative, have a sense of humour, tell right from wrong, make mistakes, fall in love, enjoy strawberries and cream, make someone fall in love with it, learn from experience, use words properly, be the subject of its own thought, have as much diversity of behaviour as a man, do something really new.
Turing notes that "no support is usually offered for these statements," and that they depend on naive assumptions about how versatile machines may be in the future, or are "disguised forms of the argument from consciousness." He chooses to answer a few of them:
The Analytical Engine has no pretensions whatever to originate anything. It can do whatever we know how to order it to perform. It can follow analysis; but it has no power of anticipating any analytical relations or truths.
Turing suggests that Lovelace's objection can be reduced to the assertion that computers "can never take us by surprise" and argues that, to the contrary, computers could still surprise humans, in particular where the consequences of different facts are not immediately recognizable. Turing also argues that Lady Lovelace was hampered by the context from which she wrote, and that, exposed to more contemporary scientific knowledge, she would have seen that the brain's storage is quite similar to that of a computer.
Even though neurons fire in an all-or-nothing pulse, both the exact timing of the pulse and the probability of the pulse occurring have analog components. Turing acknowledges this, but argues that any analog system can be simulated to a reasonable degree of accuracy given enough computing power. (Philosopher Hubert Dreyfus would make this argument against "the biological assumption" in 1972.)[14]
Here Turing first returns to Lady Lovelace's objection that the machine can only do what we tell it to do, and he likens it to a situation where a man "injects" an idea into the machine, to which the machine responds and then falls into quiescence. He extends this thought with an analogy to an atomic pile of less than critical size: the machine is the pile, and an injected idea corresponds to a neutron entering the pile from outside; the neutron causes a certain disturbance which eventually dies away. Turing then builds on the analogy: if the size of the pile were sufficiently large, a neutron entering it would cause a disturbance that would continue to increase until the whole pile were destroyed; the pile would be supercritical. Turing then asks whether this analogy of a supercritical pile could be extended to a human mind, and then to a machine. He concludes that it is suitable for the human mind: "There does seem to be one for the human mind. The majority of them seem to be 'subcritical,' i.e., to correspond in this analogy to piles of subcritical size. An idea presented to such a mind will on average give rise to less than one idea in reply. A smallish proportion are supercritical. An idea presented to such a mind may give rise to a whole 'theory' consisting of secondary, tertiary and more remote ideas." He finally asks whether a machine could be made to be supercritical.
Turing then notes that the task of creating a machine that could play the imitation game is one of programming, and he postulates that by the end of the century it will indeed be technologically possible to program a machine to play the game. He then observes that, in trying to imitate an adult human mind, it becomes important to consider the processes that have led the adult mind to its present state, which he summarizes as:
- 1. The initial state of the mind, say at birth,
- 2. The education to which it has been subjected,
- 3. Other experience, not to be described as education, to which it has been subjected.

Turing likens the education of such a child machine to the process of evolution:

- Structure of the child machine = hereditary material
- Changes of the child machine = mutations
- Natural selection = judgment of the experimenter
Nature of inherent complexity: The child machine could either be one that is as simple as possible, merely maintaining consistency with general principles, or it could be one with a complete system of logical inference programmed into it. This more complex system is explained by Turing as one that "would be such that the machine's store would be largely occupied with definitions and propositions. The propositions would have various kinds of status, e.g., well-established facts, conjectures, mathematically proved theorems, statements given by an authority, expressions having the logical form of proposition but not belief-value. Certain propositions may be described as 'imperatives.' The machine should be so constructed that as soon as an imperative is classed as 'well established' the appropriate action automatically takes place." Despite this built-in logic system, the logical inference programmed in would not be formal but pragmatic. In addition, the machine would build on its built-in logic system by a method of "scientific induction".
Ignorance of the experimenter: An important feature of a learning machine that Turing points out is the teacher's ignorance of the machine's internal state during the learning process. This is in contrast to a conventional discrete-state machine, where the objective is to have a clear understanding of the internal state of the machine at every moment during the computation. The machine will be seen to be doing things that we often cannot make sense of, or that we consider to be completely random. Turing mentions that this specific character bestows upon a machine a certain degree of what we consider to be intelligence, in that intelligent behaviour consists of a deviation from the complete determinism of conventional computation, but only so long as the deviation does not give rise to pointless loops or random behaviour.
The importance of random behaviour: Though Turing cautions us about random behaviour, he mentions that including an element of randomness in a learning machine would be of value. Randomness helps where there may be multiple correct answers, or where a systematic approach would investigate several unsatisfactory solutions before finding the optimal one, rendering the systematic process inefficient. Turing also notes that evolution takes the path of random mutations in order to find solutions that benefit an organism, while admitting that, in the case of evolution, a systematic method of finding a solution would not have been possible.
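Read together, these remarks suggest a loop: random changes to a child machine, kept or discarded according to the experimenter's judgement. The sketch below is our own toy rendering of that analogy; the `judge` and `mutate` callables are assumptions, standing in for natural selection and mutation respectively.

```python
import random

def evolve_child_machine(judge, mutate, initial, generations=200):
    """Hill-climbing caricature of Turing's evolution analogy:
    `mutate` plays the role of mutation, and `judge` (a score
    function, higher is better) the judgement of the experimenter."""
    best, best_score = initial, judge(initial)
    for _ in range(generations):
        candidate = mutate(best)        # change to the child machine
        score = judge(candidate)        # the experimenter's judgement
        if score > best_score:          # selection keeps improvements
            best, best_score = candidate, score
    return best

# Toy example: evolve a bit string toward all ones.
judge = sum                                                # count 1-bits
mutate = lambda bits: [b ^ (random.random() < 0.1) for b in bits]
print(evolve_child_machine(judge, mutate, [0] * 16))
```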
Turing concludes by speculating about a time when machines will compete with humans on numerous intellectual tasks and suggests tasks that could be used to make a start. He suggests that abstract tasks, such as playing chess, could be a good place to begin, as could another method, which he puts as follows: "It is best to provide the machine with the best sense organs that money can buy, and then teach it to understand and speak English."
An examination of the development of artificial intelligence that has followed reveals that the learning machine did take the abstract path suggested by Turing, as in the case of Deep Blue, a chess-playing computer developed by IBM which defeated the world champion Garry Kasparov (though this, too, is controversial), and the numerous computer chess programs which can outplay most amateurs.[16] As for the second suggestion Turing makes, it has been likened by some authors to a call for finding a simulacrum of human cognitive development.[16] Such attempts at finding the underlying algorithms by which children learn the features of the world around them are only beginning to be made.[16][17][18]
Notes
- 1. Turing 1950, p. 433.
- 2. Turing 1950, p. 434.
- 3. This describes the simplest version of the test. For a more detailed discussion, see Versions of the Turing test.
- 4. Harnad, Stevan (2008), "The Annotation Game: On Turing (1950) on Computing, Machinery, and Intelligence", in Epstein, Robert; Peters, Grace (eds.), The Turing Test Sourcebook: Philosophical and Methodological Issues in the Quest for the Thinking Computer, Kluwer.
- 5. Harnad, Stevan (2001), "Minds, Machines, and Turing: The Indistinguishability of Indistinguishables", Journal of Logic, Language, and Information 9 (4): 425–445.
- 6. Wardrip-Fruin, Noah; Montfort, Nick, eds. (2003), The New Media Reader, The MIT Press, ISBN 0-262-23227-8.
- 7. Harnad, Stevan (1992), "The Turing Test Is Not a Trick: Turing Indistinguishability Is a Scientific Criterion", SIGART Bulletin 3 (4): 9–10.
- 8. Turing 1950, p. 442.
- 9. Turing 1950, p. 436.
- 10. Turing 1950; see also Russell & Norvig 2003, p. 948, which comments: "Turing examined a wide variety of possible objections to the possibility of intelligent machines, including virtually all of those that have been raised in the half century since his paper appeared."
- 11. Lucas 1961; Penrose 1989; Hofstadter 1979, pp. 471–473, 476–477; and Russell & Norvig 2003, pp. 949–950. Russell and Norvig identify Lucas's and Penrose's arguments as the same one answered by Turing.
- 12. "The Mind of Mechanical Man"
- 13. Searle 1980 and Russell & Norvig 2003, pp. 958–960, who identify Searle's argument with the one Turing answers.
- 14. Dreyfus 1979, p. 156.
- 15. Dreyfus 1972; Dreyfus & Dreyfus 1986; Moravec 1988; and Russell & Norvig 2003, pp. 51–52, who identify Dreyfus' argument with the one Turing answers.
- 16. Epstein, Robert; Roberts, Gary; Beber, Grace (2008), Parsing the Turing Test: Philosophical and Methodological Issues in the Quest for the Thinking Computer, Springer, p. 65, ISBN 978-1-4020-6710-5.
- 17. Gopnik, Alison; Meltzoff, Andrew N. (1997), Words, Thoughts, and Theories, MIT Press.
- 18. Meltzoff, Andrew N. (1999), "Origins of Theory of Mind, Cognition and Communication", Journal of Communication Disorders 32 (4): 251–269.
References
- Brooks, Rodney (1990), "Elephants Don't Play Chess", Robotics and Autonomous Systems 6: 3–15, doi:10.1016/S0921-8890(05)80025-9.
- Crevier, Daniel (1993), AI: The Tumultuous Search for Artificial Intelligence, New York: BasicBooks, ISBN 0-465-02997-3.
- Dreyfus, Hubert (1972), What Computers Can't Do, New York: MIT Press, ISBN 0-06-011082-1.
- Dreyfus, Hubert; Dreyfus, Stuart (1986), Mind over Machine: The Power of Human Intuition and Expertise in the Era of the Computer, Oxford: Blackwell.
- Dreyfus, Hubert (1979), What Computers Still Can't Do, New York: MIT Press.
- Harnad, Stevan; Scherzer, Peter (2008), "First, Scale Up to the Robotic Turing Test, Then Worry About Feeling", Artificial Intelligence in Medicine 44 (2): 83–89, doi:10.1016/j.artmed.2008.08.008, PMID 18930641.
- Haugeland, John (1985), Artificial Intelligence: The Very Idea, Cambridge, Mass.: MIT Press.
- Moravec, Hans (1976), The Role of Raw Power in Intelligence.
- Russell, Stuart J.; Norvig, Peter (2003), Artificial Intelligence: A Modern Approach (2nd ed.), Upper Saddle River, New Jersey: Prentice Hall, ISBN 0-13-790395-2.
- Searle, John (1980), "Minds, Brains and Programs", Behavioral and Brain Sciences 3 (3): 417–457, doi:10.1017/S0140525X00005756.
- Turing, Alan (October 1950), "Computing Machinery and Intelligence", Mind LIX (236): 433–460, doi:10.1093/mind/LIX.236.433.
- Saygin, A. P. (2000), "Turing Test: 50 Years Later", Minds and Machines 10 (4): 463–518.
- Wardrip-Fruin, Noah; Montfort, Nick, eds. (2003), The New Media Reader, Cambridge: MIT Press, ISBN 0-262-23227-8; "Lucasfilm's Habitat", pp. 663–677.
Turing and Grice on rationality: in an experimental study looking at Griceian maxim violations and exploitations using transcripts of Loebner's one-to-one (interrogator-hidden interlocutor) Prize for AI contests between 1994 and 1999, Ayse Saygin finds significant differences between the responses of participants who knew and did not know about computers being involved.
Speranza
The question of whether it is possible for machines to think has a long history, which is firmly entrenched in the distinction between dualist and materialist views of the mind.
René Descartes prefigures aspects of the Turing Test in his 1637 Discourse on the Method when he writes:
"How many different automata or moving machines can be made by the industry of man or we can easily understand a machine's being constituted so that it can utter words, and even emit some responses to action on it of a corporeal kind, which brings about a change in its organs; for instance, if touched in a particular part it may ask what we wish to say to it; if in another part it may exclaim that it is being hurt, and so on."
"But it never happens that it arranges its speech in various ways, in order to reply appropriately to everything that may be said in its presence, as even the lowest type of man can do."
Here Descartes notes that automata are capable of responding to human interactions but argues that such automata can not respond appropriately to things said in their presence in the way that any human can.
Descartes therefore prefigures the Turing Test by identifying the insufficiency of appropriate linguistic response as that which separates the human from the automaton.
Descartes fails to consider the possibility that the insufficiency of appropriate linguistic response might be capable of being overcome by future automata and so does not propose the Turing Test as such, even if he prefigures its conceptual framework and criterion.
Denis Diderot formulates in his Pensées philosophiques a Turing-test criterion:
"If they find a parrot who could answer to everything, I would claim it to be an intelligent being without hesitation."
------ There is a similar treatment of the parrot problem by Locke (vide Jones's edition of the Essay). Locke is referring to Prince Maurice's Parrot, who can be deemed 'rational, very intelligent', or rather, 'very intelligent, rational'.
Grice loved Locke's discussion so much that he speaks of 'very intelligent, rational' PIROTS, punning on Carnap ("Pirots carulise elatically").
-----
This does not mean Diderot agrees with this, but that it was already a common argument of materialists at that time.
According to dualism, the mind is non-physical (or, at the very least, has non-physical properties) and, therefore, cannot be explained in purely physical terms.
According to materialism, the mind can be explained physically, which leaves open the possibility of minds that are produced artificially.
In 1936, philosopher Alfred Ayer considered the standard philosophical question of other minds. This he did in Oxford; in Cambridge, Wisdom was taking the issue more seriously ("Other Minds").
How do we know that other people have the same conscious experiences that we do?
In his book, Language, Truth and Logic, Ayer suggests a protocol to distinguish between a conscious man and an unconscious machine.
"The only ground I can have," Ayer writes, "for asserting that an object which appears to be conscious is not really a conscious being, but only a dummy or a machine, is that it fails to satisfy one of the empirical tests by which the presence or absence of consciousness is determined."
This suggestion is very similar to the Turing test, but is concerned with consciousness rather than intelligence.
Moreover, it is not certain that Ayer's popular philosophical classic was familiar to Turing, though we doubt it wasn't.
In other words, a thing is not conscious if it fails the consciousness test.
Researchers in the United Kingdom had been exploring "machine intelligence" for up to ten years prior to the founding of the field of artificial intelligence (AI) research in 1956.
It was a common conversational topic among the members of the Ratio Club, an informal group of British cybernetics and electronics researchers that included Alan Turing, after whom the test is named.
The Ratio Club met in different nice pubs.
Turing, in particular, had been tackling the notion of machine intelligence since at least 1941 and one of the earliest-known mentions of "computer INTELLIGENCE" was made by him in 1947.
In his report "Intelligent Machinery", Turing investigates "the question of whether or not it is possible for machinery to show intelligent behaviour" -- "intelligent" applied to 'behaviour' -- and, as part of that investigation, proposes what may be considered the forerunner to his later tests:
Turing: "It is not difficult to devise a paper machine which will play a not very bad game of chess."
Grice loved chess, and discusses chess in "Aspects of reasoning".
Turing: "Now get three men as subjects for the experiment. A, B and C. A and C are to be rather poor chess players, B is the operator who works the paper machine. Two rooms are used with some arrangement for communicating moves, and a game is played between C and either A or the paper machine. C may find it quite difficult to tell which he is playing."
"Computing Machinery and Intelligence" (Mind, October 1950) was the first published paper by Turing to focus exclusively on machine intelligence.
Turing begins the 1950 paper with the claim,
"I propose to consider the question 'Can machines think?" (implicature: as meaningless).
As he highlights, the traditional approach to such a question is to start with definitions, defining both the terms "machine" and "intelligence".
Turing chooses NOT to do so. Instead he replaces the question with a new, more operational, one, "which is closely related to it and is expressed in relatively unambiguous words."
In essence he proposes to change the question from "Can machines think?" to "Can machines do what we (as thinking entities) can do?"
The advantage of the new question, Turing argues, is that it draws "a fairly sharp line between the physical and intellectual capacities of a man."
To demonstrate this approach Turing proposes a test inspired by a party game, known as the "Imitation Game," in which a man and a woman go into separate rooms and guests try to tell them apart by writing a series of questions and reading the typewritten answers sent back.
In this game both the man and the woman aim to convince the guests that they are the other.
Huma Shah argues that this two-human version of the game was presented by Turing only to introduce the reader to the machine-human question-answer test.
Turing described his new version of the game as follows:
We now ask the question, "What will happen when a machine takes the part of A in this game?" Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between a man and a woman? These questions replace our original, "Can machines think?"
Later in the paper Turing suggests an "equivalent" alternative formulation involving a judge conversing only with a computer and a man.
While neither of these formulations precisely matches the version of the Turing Test that is more generally known today, he proposed a third in 1952.
In this version, which Turing discussed in a BBC radio broadcast, a jury asks questions of a computer and the role of the computer is to make a significant proportion of the jury believe that it is really a man.
By that time, Grice was also giving radio broadcasts, notably those later reprinted by Pears in "The nature of metaphysics" (Third programme).
Turing's paper considered nine putative objections, which include all the major arguments against artificial intelligence that have been raised in the years since the paper was published (see "Computing Machinery and Intelligence").
In 1966, Joseph Weizenbaum created a program that appeared to pass the Turing Test. The program, known as ELIZA, worked by examining a user's typed comments for keywords.
If a keyword is found, a rule that transforms the user's comments is applied, and the resulting sentence is returned.
If a keyword is not found, ELIZA responds either with a generic riposte or by repeating one of the earlier comments.
In addition, Weizenbaum developed ELIZA to replicate the behaviour of a Rogerian psychotherapist, allowing ELIZA to be "free to assume the pose of knowing almost nothing of the real world."
With these techniques, Weizenbaum's program was able to fool some people into believing that they were talking to a real person, with some subjects being "very hard to convince that ELIZA [...] is not human."
Thus, ELIZA is claimed by some to be one of the programs (perhaps the first) able to pass the Turing Test, even though this view is highly contentious.
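The keyword mechanism Weizenbaum used is simple enough to caricature in a few lines. The rules below are illustrative inventions, not Weizenbaum's actual script, but they follow the algorithm as described: match a keyword, apply its transformation rule, and otherwise fall back on an earlier comment or a generic riposte.

```python
import random
import re

# Each rule pairs a keyword pattern with response templates.
RULES = [
    (re.compile(r"\bI am (.+)", re.I), ["Why do you say you are {0}?",
                                        "How long have you been {0}?"]),
    (re.compile(r"\bmy (\w+)", re.I),  ["Tell me more about your {0}."]),
]
GENERIC = ["Please go on.", "I see.", "What does that suggest to you?"]

def eliza_reply(user_input, history):
    for pattern, templates in RULES:
        match = pattern.search(user_input)
        if match:       # keyword found: transform the user's comment
            return random.choice(templates).format(*match.groups())
    if history:         # no keyword: repeat one of the earlier comments
        return "Earlier you said: " + random.choice(history)
    return random.choice(GENERIC)

history = []
for line in ["I am feeling tired", "It is about my job"]:
    print(eliza_reply(line, history))
    history.append(line)
```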
Kenneth Colby created PARRY in 1972, a program described as "ELIZA with attitude".
It attempted to model the behaviour of a paranoid schizophrenic, using a similar (if more advanced) approach to that employed by Weizenbaum.
To validate the work, PARRY was tested in the early 1970s using a variation of the Turing Test.
A group of experienced psychiatrists analysed a combination of real patients and computers running PARRY through teleprinters.
Another group of 33 psychiatrists were shown transcripts of the conversations.
The two groups were then asked to identify which of the "patients" were human and which were computer programs.
The psychiatrists were able to make the correct identification only 48 percent of the time – a figure consistent with random guessing.
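A back-of-the-envelope binomial calculation shows why 48 per cent reads as chance level. The number of identifications below is an assumption made purely for illustration; the sources give only the percentage:

```python
from math import comb

def prob_at_most(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p), i.e. chance-level guessing."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Hypothetical trial count: assume 100 identifications for illustration.
n, correct = 100, 48
print(f"P(<= {correct} correct by chance) = {prob_at_most(correct, n):.3f}")
# ~0.38, so 48/100 would be entirely consistent with coin-flip guessing.
```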
In the 21st century, versions of these programs (now known as "chatterbots") continue to fool people.
"CyberLover", a malware program, preys on Internet users by convincing them to "reveal information about their identities or to lead them to visit a web site that will deliver malicious content to their computers".
The program has emerged as a "Valentine-risk" flirting with people "seeking relationships online in order to collect their personal data".
John Searle's 1980 paper Minds, Brains, and Programs proposed the "Chinese room" thought experiment and argued that the Turing test could not be used to determine if a machine can think.
Searle noted that software (such as ELIZA) could pass the Turing Test simply by manipulating symbols of which it had no understanding.
Without understanding, they could not be described as "thinking" in the same sense people do.
Therefore, Searle concludes, the Turing Test cannot prove that a machine can think.
Much like the Turing test itself, Searle's argument has been both widely criticised and highly endorsed.
Arguments such as Searle's and others working on the philosophy of mind sparked off a more intense debate about the nature of intelligence, the possibility of intelligent machines and the value of the Turing test that continued through the 1980s and 1990s.
The Loebner Prize provides an annual platform for practical Turing Tests with the first competition held in November 1991.
It is underwritten by Hugh Loebner.
The Cambridge Center for Behavioral Studies in Massachusetts, United States, organized the prizes up to and including the 2003 contest.
As Loebner described it, one reason the competition was created is to advance the state of AI research, at least in part, because no one had taken steps to implement the Turing Test despite 40 years of discussing it.
The first Loebner Prize competition in 1991 led to a renewed discussion of the viability of the Turing Test and the value of pursuing it, in both the popular press and academia.
The first contest was won by a mindless program with no identifiable intelligence that managed to fool naive interrogators into making the wrong identification.
This highlighted several of the shortcomings of the Turing Test.
The winner won, at least in part, because it was able to "imitate human typing errors".
The unsophisticated interrogators were easily fooled; and some researchers in AI have been led to feel that the test is merely a distraction from more fruitful research.
The silver (text only) and gold (audio and visual) prizes have never been won.
However, the competition has awarded the bronze medal every year for the computer system that, in the judges' opinions, demonstrates the "most human" conversational behaviour among that year's entries.
Artificial Linguistic Internet Computer Entity (A.L.I.C.E.) has won the bronze award on three occasions in recent times (2000, 2001, 2004).
Learning AI Jabberwacky won in 2005 and 2006.
The Loebner Prize tests conversational intelligence; winners are typically chatterbot programs, or Artificial Conversational Entities (ACEs).
Early Loebner Prize rules restricted conversations: each entry and hidden human conversed on a single topic, so interrogators were restricted to one line of questioning per entity interaction.
The restricted conversation rule was lifted for the 1995 Loebner Prize. Interaction duration between judge and entity has varied in Loebner Prizes.
In Loebner 2003, at the University of Surrey, each interrogator was allowed five minutes to interact with an entity, machine or hidden-human.
Between 2004 and 2007, the interaction time allowed in Loebner Prizes was more than twenty minutes.
In 2008, the interrogation duration allowed was five minutes per pair, because the organiser, Kevin Warwick, and coordinator, Huma Shah, consider this to be the duration for any test, as Turing stated in his 1950 paper:
" ... making the right identification after five minutes of questioning".
They felt Loebner's longer test, implemented in Loebner Prizes 2006 and 2007, was inappropriate for the state of artificial conversation technology.
It is ironic that the 2008 winning entry, Elbot from Artificial Solutions, does not mimic a human.
Its personality is that of a robot, yet Elbot deceived three human judges that it was the human during human-parallel comparisons.
During the 2009 competition, held in Brighton, UK, the communication program restricted judges to 10 minutes for each round, 5 minutes to converse with the human, 5 minutes to converse with the program.
This was to test the alternative reading of Turing's prediction that the 5-minute interaction was to be with the computer.
For the 2010 competition, the Sponsor has again increased the interaction time, between interrogator and system, to 25 minutes.
On 7 June 2014 a Turing test competition, organized by Huma Shah and Kevin Warwick to mark the 60th anniversary of Turing's death, was held at the Royal Society London and was won by the Russian chatterbot Eugene Goostman.
The bot, during a series of five-minute-long text conversations, convinced 33% of the contest's judges that it was human.
Judges included John Sharkey, a sponsor of the bill granting a government pardon to Turing, AI Professor Aaron Sloman and Red Dwarf actor Robert Llewellyn.
The competition's organisers believed that the Turing test had been "passed for the first time" at the event, saying that "some will claim that the Test has already been passed."
"The words Turing Test have been applied to similar competitions around the world."
"However this event involved the most simultaneous comparison tests than ever before, was independently verified and, crucially, the conversations were unrestricted."
"A true Turing Test does not set the questions or topics prior to the conversations."
The contest has faced criticism.
First, only a third of the judges were fooled by the computer.
Second, the program's character claimed to be a 13-year-old Ukrainian who had learned English as a second language.
The contest required 30% of judges to be fooled, which was based on Turing's statement in his Computing Machinery and Intelligence paper.
Joshua Tenenbaum, an AI expert at MIT, stated that, in his view, the result was unimpressive.
Saul Traiger argues that there are at least three primary versions of the Turing test, two of which are offered in "Computing Machinery and Intelligence" and one that he describes as the "Standard Interpretation."
While there is some debate regarding whether the "Standard Interpretation" is that described by Turing or, instead, based on a misreading of his paper, these three versions are not regarded as equivalent, and their strengths and weaknesses are distinct.
Huma Shah points out that Turing himself was concerned with whether a machine could think and was providing a simple method to examine this: through human-machine question-answer sessions.
Shah argues that there is one imitation game, which, as Turing described it, could be practicalised in two different ways:
a) one-to-one interrogator-machine test, and
b) simultaneous comparison of a machine with a human, both questioned in parallel by an interrogator.
Since the Turing test is a test of indistinguishability in performance capacity, the verbal version generalises naturally to all of human performance capacity, verbal as well as nonverbal (robotic).
Turing's original game described a simple party game involving three players.
Player A is a man, player B is a woman and player C (who plays the role of the interrogator) is of either sex. In the Imitation Game, player C is unable to see either player A or player B, and can communicate with them only through written notes. By asking questions of player A and player B, player C tries to determine which of the two is the man and which is the woman. Player A's role is to trick the interrogator into making the wrong decision, while player B attempts to assist the interrogator in making the right one.
Sterrett referred to this as the "Original Imitation Game Test".
Turing proposed that the role of player A be filled by a computer so that its task was to pretend to be a woman and attempt to trick the interrogator into making an incorrect evaluation. The success of the computer was determined by comparing the outcome of the game when player A is a computer against the outcome when player A is a man. Turing stated if "the interrogator decide[s] wrongly as often when the game is played [with the computer] as he does when the game is played between a man and a woman", it may be argued that the computer is intelligent.
The second version appeared later in Turing's 1950 paper. Similar to the Original Imitation Game Test, the role of player A is performed by a computer. However, the role of player B is performed by a man rather than a woman.
"Let us fix our attention on one particular digital computer C. Is it true that by modifying this computer to have an adequate storage, suitably increasing its speed of action, and providing it with an appropriate programme, C can be made to play satisfactorily the part of A in the imitation game, the part of B being taken by a man?"
In this version, both player A (the computer) and player B are trying to trick the interrogator into making an incorrect decision.
Common understanding has it that the purpose of the Turing Test is not specifically to determine whether a computer is able to fool an interrogator into believing that it is a human, but rather whether a computer could imitate a human.
While there is some dispute whether this interpretation was intended by Turing – Sterrett believes that it was and thus conflates the second version with this one, while others, such as Traiger, do not – this has nevertheless led to what can be viewed as the "standard interpretation."
In this version, player A is a computer and player B a person of either sex.
The role of the interrogator is not to determine which is male and which is female, but which is a computer and which is a human.
The fundamental issue with the standard interpretation is that the interrogator cannot differentiate which responder is human, and which is machine.
There are issues about duration, but the standard interpretation generally considers this limitation to be something that should be reasonable.
Controversy has arisen over which of the alternative formulations of the test Turing intended.
Sterrett argues that two distinct tests can be extracted from his 1950 paper and that, pace Turing's remark, they are not equivalent.
The test that employs the party game and compares frequencies of success is referred to as the "Original Imitation Game Test," whereas the test consisting of a human judge conversing with a human and a machine is referred to as the "Standard Turing Test," noting that Sterrett equates this with the "standard interpretation" rather than the second version of the imitation game.
Sterrett agrees that the Standard Turing Test (STT) has the problems that its critics cite but feels that, in contrast, the Original Imitation Game Test (OIG Test) so defined is immune to many of them, due to a crucial difference.
Unlike the STT, it does not make similarity to human performance the criterion, even though it employs human performance in setting a criterion for machine intelligence.
A man can fail the OIG Test, but it is argued that it is a virtue of a test of intelligence that failure indicates a lack of resourcefulness.
The OIG Test requires the resourcefulness associated with intelligence and not merely simulation of human conversational behaviour.
The general structure of the OIG Test could even be used with non-verbal versions of imitation games.
Still other writers have interpreted Turing as proposing the imitation game itself as the test, without specifying how to square this with Turing's statement that the test he proposed using the party version of the imitation game is based upon a criterion of comparative frequency of success in that game, rather than a capacity to succeed at one round of it.
Saygin has suggested that maybe the original game is a way of proposing a less biased experimental design as it hides the participation of the computer.
The imitation game also includes a "social hack" not found in the standard interpretation, as in the game both computer and male human are required to play as pretending to be someone they are not.
A crucial piece of any laboratory test is that there should be a control.
Turing never makes clear whether the interrogator in his tests is aware that one of the participants is a computer.
However, if there were a machine that had the potential to pass a Turing test, it would be safe to assume a double-blind control would be necessary.
To return to the Original Imitation Game, he states only that player A is to be replaced with a machine, not that player C is to be made aware of this replacement.
When Colby, F. D. Hilf, S. Weber, and A. D. Kramer tested PARRY, they did so by assuming that the interrogators did not need to know that one or more of those being interviewed was a computer during the interrogation.
***************************************
As Ayse Saygin, Peter Swirski, and others have highlighted, this makes a big difference to the implementation and outcome of the test.
In an experimental study looking at Gricean maxim violations, using transcripts of Loebner's one-to-one (interrogator-hidden interlocutor) Prize for AI contests between 1994 and 1999, Ayse Saygin found significant differences between the responses of participants who knew and did not know about computers being involved.
*************************************
Huma Shah and Kevin Warwick, who organized the Loebner Prize at Reading which staged simultaneous comparison tests (one judge-two hidden interlocutors), showed that knowing/not knowing did not make a significant difference in some judges' determination.
Judges were not explicitly told about the nature of the pairs of hidden interlocutors they would interrogate.
Judges were able to distinguish human from machine, including when they were faced with control pairs of two humans and two machines embedded among the machine-human set ups.
Spelling errors gave away the hidden-humans; machines were identified by 'speed of response' and lengthier utterances.
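Those cues could even be written down as a crude decision rule. The thresholds below are invented purely for illustration; nothing in Shah and Warwick's report fixes them:

```python
def guess_identity(response_seconds, utterance_words, spelling_errors):
    """Crude classifier echoing the reported cues: spelling errors
    betrayed hidden humans, while speed of response and lengthier
    utterances identified machines. All thresholds are hypothetical."""
    if spelling_errors > 0:
        return "human"
    if response_seconds < 1.0 and utterance_words > 40:
        return "machine"
    return "undecided"

print(guess_identity(0.4, 55, 0))   # fast, long, error-free -> "machine"
print(guess_identity(8.0, 12, 2))   # slow, short, typo-prone -> "human"
```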
The power and appeal of the Turing test derives from its simplicity.
The philosophy of mind, psychology, and modern neuroscience have been unable to provide definitions of "intelligence" and "thinking" that are sufficiently precise and general to be applied to machines.
Without such definitions, the central questions of the philosophy of artificial intelligence cannot be answered. The Turing test, even if imperfect, at least provides something that can actually be measured. As such, it is a pragmatic solution to a difficult philosophical question.
The format of the test allows the interrogator to give the machine a wide variety of intellectual tasks. Turing wrote that "the question and answer method seems to be suitable for introducing almost any one of the fields of human endeavor that we wish to include."
John Haugeland (who collaborated with Grice on research on David Hume on personal identity) adds that "understanding the words is not enough; you have to understand the topic as well."
To pass a well-designed Turing test, the machine must use natural language, reason, have knowledge and learn.
The test can be extended to include video input, as well as a "hatch" through which objects can be passed: this would force the machine to demonstrate the skill of vision and robotics as well.
Together, these represent almost all of the major problems that artificial intelligence research would like to solve.
The Feigenbaum test is designed to take advantage of the broad range of topics available to a Turing test.
It is a limited form of Turing's question-answer game which compares the machine against the abilities of experts in specific fields such as literature or chemistry.
IBM's Watson machine achieved success in a man versus machine television quiz show of human knowledge, Jeopardy!
Turing did not explicitly state that the Turing test could be used as a measure of intelligence, or any other human quality.
He wanted to provide a clear and understandable alternative to the word "think", which he could then use to reply to criticisms of the possibility of "thinking machines" and to suggest ways that research might move forward.
Nevertheless, the Turing test has been proposed as a measure of a machine's "ability to think" or its "intelligence".
This proposal has received criticism from both philosophers and computer scientists.
It assumes that an interrogator can determine if a machine is "thinking" by comparing its behaviour with human behaviour.
Every element of this assumption has been questioned: the reliability of the interrogator's judgement, the value of comparing only behaviour and the value of comparing the machine with a human.
Because of these and other considerations, some AI researchers have questioned the relevance of the test to their field.
The Turing test does not directly test whether the computer behaves intelligently – it tests only whether the computer behaves like a human being. Since human behaviour and intelligent behaviour are not exactly the same thing, the test can fail to accurately measure intelligence in two ways: human behaviour need not be intelligent, and intelligent behaviour need not be human.
The Turing test is concerned strictly with how the subject acts – the external behaviour of the machine.
In this regard, it takes a behaviourist or functionalist approach to the study of intelligence.
The example of ELIZA suggests that a machine passing the test may be able to simulate human conversational behaviour by following a simple (but large) list of mechanical rules, without thinking or having a mind at all.
John Searle has argued that external behaviour cannot be used to determine if a machine is "actually" thinking or merely "simulating thinking."
His Chinese room argument is intended to show that, even if the Turing test is a good operational definition of intelligence, it may not indicate that the machine has a mind, consciousness, or intentionality.
Intentionality is a philosophical term for the power of thoughts to be "about" something.
Turing anticipated this line of criticism in his original paper, writing:
"I do not wish to give the impression that I think there is no mystery about consciousness."
"There is, for instance, something of a paradox connected with any attempt to localise it."
"But I do not think these mysteries necessarily need to be solved before we can answer the question with which we are concerned in this paper."
In practice, the test's results can easily be dominated not by the computer's intelligence, but by the attitudes, skill or naivety of the questioner.
Turing does not specify the precise skills and knowledge required by the interrogator in his description of the test, but he did use the term "average interrogator".
The average interrogator would not have more than 70 per cent chance of making the right identification after five minutes of questioning.
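That figure can be restated as a simple pass/fail rule. Here is a minimal sketch, with illustrative session counts (Turing fixes only the 70 per cent figure and the five-minute duration):

```python
def meets_turing_prediction(correct_identifications, total_sessions):
    """Turing's 1950 prediction as a decision rule: the machine does
    well if the average interrogator's rate of correct identification
    after five minutes is no more than 70 per cent."""
    return correct_identifications / total_sessions <= 0.70

# Illustrative numbers only: 20 five-minute sessions, 13 correct calls.
print(meets_turing_prediction(13, 20))   # 0.65 <= 0.70 -> True
```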
Shah and Warwick show that experts are fooled, and that interrogator strategy ("power" vs. "solidarity") affects correct identification, the latter being more successful.
Chatterbot programs such as ELIZA have repeatedly fooled unsuspecting people into believing that they are communicating with human beings.
In these cases, the "interrogator" is not even aware of the possibility that they are interacting with a computer.
To successfully appear human, there is no need for the machine to have any intelligence whatsoever and only a superficial resemblance to human behaviour is required.
Early Loebner prize competitions used "unsophisticated" interrogators who were easily fooled by the machines.
Since then, the Loebner Prize organizers have deployed philosophers, computer scientists, and journalists among the interrogators. Nonetheless, some of these experts have been deceived by the machines.
Michael Shermer points out that human beings consistently choose to consider non-human objects as human whenever they are allowed the chance, a mistake called the anthropomorphic fallacy.
They talk to their cars, ascribe desire and intentions to natural forces (e.g., "nature abhors a vacuum"), and worship the sun as a human-like being with intelligence.
If the Turing test is applied to religious objects, Shermer argues, then inanimate statues, rocks, and places have consistently passed the test throughout history.
This human tendency towards anthropomorphism effectively lowers the bar for the Turing test, unless interrogators are specifically trained to avoid it.
One interesting feature of the Turing Test is the frequency with which hidden human foils are misidentified by interrogators as being machines.
It has been suggested that interrogators are looking more for expected human responses rather than typical ones.
As a result, some individuals can often be categorized as machines.
This can therefore work in favor of a competing machine.
Mainstream AI researchers argue that trying to pass the Turing Test is merely a distraction from more fruitful research.
Indeed, the Turing test is not an active focus of much academic or commercial effort—as Stuart Russell and Peter Norvig write, AI researchers have devoted little attention to passing the Turing test.
There are several reasons.
First, there are easier ways to test their programs.
Most current research in AI-related fields is aimed at modest and specific goals, such as automated scheduling, object recognition, or logistics.
To test the intelligence of the programs that solve these problems, AI researchers simply give them the task directly.
Russell and Norvig suggest an analogy with the history of flight.
Planes are tested by how well they fly, not by comparing them to birds.
Aeronautical engineering texts do not define the goal of their field as 'making machines that fly so exactly like pigeons that they can fool other pigeons.
Second, creating lifelike simulations of human beings is a difficult problem on its own that does not need to be solved to achieve the basic goals of AI research.
Believable human characters may be interesting in a work of art, a game, or a sophisticated user interface, but they are not part of the science of creating intelligent machines, that is, machines that solve problems using intelligence.
Turing, for his part, never intended his test to be used as a practical, day-to-day measure of the intelligence of AI programs.
He wanted to provide a clear and understandable example to aid in the discussion of the philosophy of artificial intelligence.
John McCarthy observes that the philosophy of AI is "unlikely to have any more effect on the practice of AI research than philosophy of science generally has on the practice of science."
Numerous other versions of the Turing test, including those expounded above, have been mooted through the years.
An example is implied in the work of psychoanalyst Wilfred Bion, who was particularly fascinated by the "storm" that resulted from the encounter of one mind by another.
In his book, among several other original points with regard to the Turing test, literary scholar Peter Swirski discussed in detail the idea of what he termed the Swirski test—essentially the reverse Turing test.
He pointed out that it overcomes most if not all standard objections levelled at the standard version.
Carrying this idea forward, R. D. Hinshelwood described the mind as a "mind recognizing apparatus."
The challenge would be for the computer to be able to determine if it were interacting with a human or another computer.
This is an extension of the original question that Turing attempted to answer but would, perhaps, offer a high enough standard to define a machine that could "think" in a way that we typically define as characteristically human.
CAPTCHA is a form of reverse Turing test.
Before being allowed to perform some action on a website, the user is presented with alphanumerical characters in a distorted graphic image and asked to type them out.
This is intended to prevent automated systems from being used to abuse the site.
The rationale is that software sufficiently sophisticated to read and reproduce the distorted image accurately does not exist (or is not available to the average user), so any system able to do so is likely to be a human.
Software that can reverse CAPTCHA with some accuracy by analysing patterns in the generating engine is being actively developed.
OCR or optical character recognition is also under development as a workaround for the inaccessibility of several CAPTCHA schemes to humans with disabilities.
This is also known as a "Feigenbaum test" and was proposed by Edward Feigenbaum.
The "Total Turing test" variation of the Turing test adds two further requirements to the traditional Turing test.
The interrogator can also test the perceptual abilities of the subject (requiring computer vision) and the subject's ability to manipulate objects (requiring Robotics).
The Minimum Intelligent Signal Test was proposed by Chris McKinstry as "the maximum abstraction of the Turing test", in which only binary responses (true/false or yes/no) are permitted, to focus only on the capacity for thought.
It eliminates text chat problems like anthropomorphism bias, and doesn't require emulation of unintelligent human behaviour, allowing for systems that exceed human intelligence.
The questions must each stand on their own, however, making it more like an IQ test than an interrogation.
It is typically used to gather statistical data against which the performance of artificial intelligence programs may be measured.
The organizers of the Hutter Prize believe that compressing natural language text is a hard AI problem, equivalent to passing the Turing test.
The data compression test has some advantages over most versions and variations of a Turing test, including:
It gives a single number that can be directly used to compare which of two machines is "more intelligent."
It does not require the computer to lie to the judge.
The main disadvantages of using data compression as a test are:
It is not possible to test humans this way.
It is unknown what particular "score" on this test—if any—is equivalent to passing a human-level Turing test.
A related approach to Hutter's prize which appeared much earlier in the late 1990s is the inclusion of compression problems in an extended Turing Test, or by tests which are completely derived from Kolmogorov complexity.
Other related tests in this line are presented by Hernandez-Orallo and Dowe.
Algorithmic IQ, or AIQ for short, is an attempt to convert the theoretical Universal Intelligence Measure from Legg and Hutter (based on Solomonoff's inductive inference) into a working practical test of machine intelligence.
Two major advantages of some of these tests are their applicability to nonhuman intelligences and their absence of a requirement for human testers.
The Turing test inspired the Ebert test proposed by film critic Roger Ebert which is a test whether a computer-based synthesised voice has sufficient skill in terms of intonations, inflections, timing and so forth, to make people laugh.
Turing predicted that machines would eventually be able to pass the test; in fact, he estimated that by the year 2000, machines with around 100 MB of storage would be able to fool 30% of human judges in a five-minute test, and that people would no longer consider the phrase "thinking machine" contradictory.
In practice, the Loebner Prize chatterbot contestants only managed to fool a judge once, and that was only due to the human contestant pretending to be a chatbot.
He further predicted that machine learning would be an important part of building powerful machines, a claim considered plausible by contemporary researchers in artificial intelligence.
In a paper submitted to the Midwest Artificial Intelligence and Cognitive Science Conference, . Shane T. Mueller predicted a modified Turing Test called a "Cognitive Decathlon" could be accomplished within 5 years.
By extrapolating an exponential growth of technology over several decades, futurist Ray Kurzweil predicted that Turing test-capable computers would be manufactured in the near future. In 1990, he set the year around 2020.
By 2005, he had revised his estimate to 2029.
The Long Bet Project Bet Nr. 1 is a wager of $20,000 between Mitch Kapor (pessimist) and Ray Kurzweil (optimist) about whether a computer will pass a lengthy Turing Test by the year 2029.
During the Long Now Turing Test, each of three Turing Test Judges will conduct online interviews of each of the four Turing Test Candidates (i.e., the Computer and the three Turing Test Human Foils) for two hours each for a total of eight hours of interviews.
The bet specifies the conditions in some detail.
1990 marked the fortieth anniversary of the first publication of Turing's "Computing Machinery and Intelligence" paper, and, thus, saw renewed interest in the test.
Two significant events occurred in that year: The first was the Turing Colloquium, which was held at the University of Sussex in April, and brought together academics and researchers from a wide variety of disciplines to discuss the Turing Test in terms of its past, present, and future; the second was the formation of the annual Loebner Prize competition.
Blay Whitby lists four major turning points in the history of the Turing Test – the publication of "Computing Machinery and Intelligence", the announcement of Joseph Weizenbaum's ELIZA 66, Kenneth Colby's creation of PARRY, which was described in the Turing Colloquium.
In November 2005, the University of Surrey hosted an inaugural one-day meeting of artificial CONVERSATIONAL entity developers, attended by winners of practical Turing Tests in the Loebner Prize: Robby Garner, Richard Wallace and Rollo Carpenter.
Invited speakers included David Hamill, Hugh Loebner (sponsor of the Loebner Prize) and Huma Shah.
In parallel to the Loebner Prize held at the University of Reading, the Society for the Study of Artificial Intelligence and the Simulation of Behaviour, hosted a one-day symposium to discuss the Turing Test, organized by John Barnden, Mark Bishop, Huma Shah and Kevin Warwick.
The Speakers included Royal Institution's Director Baroness Susan Greenfield, Selmer Bringsjord, Turing's biographer Andrew Hodges, and consciousness scientist Owen Holland.
No agreement emerged for a canonical Turing Test, though Bringsjord expressed that a sizeable prize would result in the Turing Test being passed sooner.
Sixty years following its introduction, continued argument over Turing's 'Can machines think?' experiment led to its reconsideration for the 21st Century Symposium at the AISB Convention, held 29 March to 1 April 2010 in De Montfort University, UK. The AISB is the (British) Society for the Study of Artificial Intelligence and the Simulation of Behaviour.
Throughout 2012, a number of major events took place to celebrate Turing's life and scientific impact.
The Turing100 group supported these events and also, organized a special Turing test event in Bletchley Park on 23 June 2012 to celebrate the 100th anniversary of Turing's birth.
Latest discussions on the Turing Test in a symposium with 11 speakers, organized by Vincent C. Müller (ACT & Oxford) and Aladdin Ayeshm (de Montfort) – with Mark Bishop, John Barnden, Alessio Plebe and Pietro Perconti.
The game was recorded, and the program lost to Turing's colleague Alick Glennie, although it is said that it won a game against Champernowne's wife.
Results and report can be found in: "Can a machine think? – results from the 18th Loebner Prize contest", University of Reading .
Hernandez-Orallo, Jose (2000), "Beyond the Turing Test", Journal of Logic, Language and Information 9 (4): 447–466, doi:10.1023/A:1008367325700.
Hernandez-Orallo, J.; Dowe, D. L. (2010), "Measuring Universal Intelligence: Towards an Anytime Intelligence Test", Artificial Intelligence Journal 174 (18): 1508–1539, doi:10.1016/j.artint.2010.09.006.
Moor, James, ed. (2003), The Turing Test: The Elusive Standard of Artificial Intelligence, Dordrecht: Kluwer Academic Publishers.
The question of whether it is possible for machines to think has a long history, which is firmly entrenched in the distinction between dualist and materialist views of the mind.
René Descartes prefigures aspects of the Turing Test in his 1637 Discourse on the Method when he writes:
"How many different automata or moving machines can be made by the industry of man or we can easily understand a machine's being constituted so that it can utter words, and even emit some responses to action on it of a corporeal kind, which brings about a change in its organs; for instance, if touched in a particular part it may ask what we wish to say to it; if in another part it may exclaim that it is being hurt, and so on."
"But it never happens that it arranges its speech in various ways, in order to reply appropriately to everything that may be said in its presence, as even the lowest type of man can do."
Descartes therefore prefigures the Turing Test by identifying the insufficiency of appropriate linguistic response as that which separates the human from the automaton.
Descartes fails to consider the possibility that the insufficiency of appropriate linguistic response might be capable of being overcome by future automata and so does not propose the Turing Test as such, even if he prefigures its conceptual framework and criterion.
Denis Diderot formulates in his Pensées philosophiques a Turing-test criterion:
"If they find a parrot who could answer to everything, I would claim it to be an intelligent being without hesitation."
------ There is a similar treatment of the parrot problem by Locke (vide Jones's edition of the Essay). Locke is referring to Prince Maurice's parrot, which could be deemed 'rational, very intelligent', or rather, 'very intelligent, rational'.
Grice loved Locke's discussion so much that he speaks of 'very intelligent, rational' PIROTS, punning on Carnap ("Pirots carulise elatically").
-----
This does not mean that Diderot agreed with the criterion, but that it was already a common argument among the materialists of his time.
According to dualism, the mind is non-physical (or, at the very least, has non-physical properties) and, therefore, cannot be explained in purely physical terms.
According to materialism, the mind can be explained physically, which leaves open the possibility of minds that are produced artificially.
In 1936, philosopher Alfred Ayer considered the standard philosophical question of other minds. This he did in Oxford; in Cambridge, Wisdom was taking the issue more seriously ("Other Minds").
How do we know that other people have the same conscious experiences that we do?
In his book, Language, Truth and Logic, Ayer suggests a protocol to distinguish between a conscious man and an unconscious machine.
"The only ground I can have," Ayer writes, "for asserting that an object which appears to be conscious is not really a conscious being, but only a dummy or a machine, is that it fails to satisfy one of the empirical tests by which the presence or absence of consciousness is determined."
This suggestion is very similar to the Turing test, but is concerned with consciousness rather than intelligence.
Moreover, it is not certain that Ayer's popular philosophical classic was familiar to Turing, though we doubt that it was not.
In other words, a thing is not conscious if it fails the consciousness test.
Researchers in the United Kingdom had been exploring "machine intelligence" for up to ten years prior to the founding of the field of artificial intelligence (AI) research in 1956.
It was a common conversational topic among the members of the Ratio Club, an informal group of British cybernetics and electronics researchers that included Alan Turing, after whom the test is named.
The Ratio Club met in various pubs.
Turing, in particular, had been tackling the notion of machine intelligence since at least 1941 and one of the earliest-known mentions of "computer INTELLIGENCE" was made by him in 1947.
In his report, "Intelligent Machinery", Turing investigates "the question of whether or not it is possible for machinery to show intelligent behaviour" -- "intelligent" applied to 'behaviour' -- and, as part of that investigation, proposes what may be considered the forerunner of his later tests:
Turing: "It is not difficult to devise a paper machine which will play a not very bad game of chess."
Grice loved chess, and discusses chess in "Aspects of reasoning".
Turing: "Now get three men as subjects for the experiment. A, B and C. A and C are to be rather poor chess players, B is the operator who works the paper machine. Two rooms are used with some arrangement for communicating moves, and a game is played between C and either A or the paper machine. C may find it quite difficult to tell which he is playing."
"Computing Machinery and Intelligence" (Mind, October 1950) was the first published paper by Turing to focus exclusively on machine intelligence.
Turing begins the 1950 paper with the claim,
"I propose to consider the question 'Can machines think?" (implicature: as meaningless).
As he highlights, the traditional approach to such a question is to start with definitions, defining both the terms "machine" and "intelligence".
Turing chooses NOT to do so. Instead he replaces the question with a new, more operational, one, "which is closely related to it and is expressed in relatively unambiguous words."
In essence he proposes to change the question from "Can machines think?" to "Can machines do what we (as thinking entities) can do?"
The advantage of the new question, Turing argues, is that it draws "a fairly sharp line between the physical and intellectual capacities of a man."
To demonstrate this approach Turing proposes a test inspired by a party game, known as the "Imitation Game," in which a man and a woman go into separate rooms and guests try to tell them apart by writing a series of questions and reading the typewritten answers sent back.
In this game both the man and the woman aim to convince the guests that they are the other.
Huma Shah argues that this two-human version of the game was presented by Turing only to introduce the reader to the machine-human question-answer test.
Turing described his new version of the game as follows:
We now ask the question, "What will happen when a machine takes the part of A in this game?" Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between a man and a woman? These questions replace our original, "Can machines think?"
Later in the paper Turing suggests an "equivalent" alternative formulation involving a judge conversing only with a computer and a man.
While neither of these formulations precisely matches the version of the Turing Test that is more generally known today, he proposed a third in 1952.
In this version, which Turing discussed in a BBC radio broadcast, a jury asks questions of a computer and the role of the computer is to make a significant proportion of the jury believe that it is really a man.
By that time, Grice was also giving radio broadcasts, notably those later reprinted by Pears in "The Nature of Metaphysics" (Third Programme).
Turing's paper considered nine putative objections, which include all the major arguments against artificial intelligence that have been raised in the years since the paper was published (see "Computing Machinery and Intelligence").
ELIZA and PARRY
In 1966, Joseph Weizenbaum created a program which appeared to pass the Turing test. The program, known as ELIZA, worked by examining a user's typed comments for keywords.
If a keyword is found, a rule that transforms the user's comments is applied, and the resulting sentence is returned.
If a keyword is not found, ELIZA responds either with a generic riposte or by repeating one of the earlier comments.
In addition, Weizenbaum developed ELIZA to replicate the behaviour of a Rogerian psychotherapist, allowing ELIZA to be "free to assume the pose of knowing almost nothing of the real world."
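To make the keyword-and-transformation mechanism concrete, here is a minimal Python sketch of an ELIZA-style responder. The rules and wording are invented for illustration and are far cruder than Weizenbaum's actual script, but the control flow (scan for a keyword, apply its transformation rule, otherwise fall back to a generic riposte) is the one described above.

import random

# Hypothetical, much-simplified ELIZA-style rules: each keyword maps to
# response templates; "{0}" is filled with the text after the keyword.
RULES = {
    "mother": ["Tell me more about your mother.",
               "How do you feel about your family?"],
    "i feel": ["Why do you feel {0}?",
               "How long have you felt {0}?"],
    "because": ["Is that the real reason?"],
}

GENERIC = ["Please go on.", "I see.", "What does that suggest to you?"]

def respond(user_input: str) -> str:
    text = user_input.lower().strip(".!? ")
    for keyword, templates in RULES.items():
        idx = text.find(keyword)
        if idx != -1:
            # Transformation rule: reuse whatever followed the keyword.
            rest = text[idx + len(keyword):].strip()
            return random.choice(templates).format(rest)
    return random.choice(GENERIC)  # no keyword found: generic riposte

print(respond("I feel lonely"))     # e.g. "Why do you feel lonely?"
print(respond("It is a nice day"))  # falls back to a generic riposte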
With these techniques, Weizenbaum's program was able to fool some people into believing that they were talking to a real person, with some subjects being "very hard to convince that ELIZA [...] is not human."
Thus, ELIZA is claimed by some to be one of the programs (perhaps the first) able to pass the Turing Test, even though this view is highly contentious.
Kenneth Colby created PARRY in 1972, a program described as "ELIZA with attitude".
It attempted to model the behaviour of a paranoid schizophrenic, using a similar (if more advanced) approach to that employed by Weizenbaum.
To validate the work, PARRY was tested in the early 1970s using a variation of the Turing Test.
A group of experienced psychiatrists analysed a combination of real patients and computers running PARRY through teleprinters.
Another group of 33 psychiatrists were shown transcripts of the conversations.
The two groups were then asked to identify which of the "patients" were human and which were computer programs.
The psychiatrists were able to make the correct identification only 48 percent of the time – a figure consistent with random guessing.
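To see why 48 per cent counts as "consistent with random guessing", one can run a two-sided binomial test. The trial count below is a made-up illustration (the source gives only the percentage), using a standard SciPy routine:

from scipy.stats import binomtest

# Hypothetical illustration: suppose the psychiatrists made 100
# identifications and got 48 right. Under pure guessing, each
# identification is a fair coin flip (p = 0.5).
result = binomtest(k=48, n=100, p=0.5, alternative="two-sided")
print(f"p-value = {result.pvalue:.2f}")  # roughly 0.76: consistent with guessing

A p-value that far from significance means the experts performed no better than chance, which is precisely the point of the PARRY validation.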
In the 21st century, versions of these programs (now known as "chatterbots") continue to fool people.
"CyberLover", a malware program, preys on Internet users by convincing them to "reveal information about their identities or to lead them to visit a web site that will deliver malicious content to their computers".
The program has emerged as a "Valentine-risk" flirting with people "seeking relationships online in order to collect their personal data".
John Searle's 1980 paper Minds, Brains, and Programs proposed the "Chinese room" thought experiment and argued that the Turing test could not be used to determine if a machine can think.
Searle noted that software (such as ELIZA) could pass the Turing Test simply by manipulating symbols of which they had no understanding.
Without understanding, they could not be described as "thinking" in the same sense people do.
Therefore, Searle concludes, the Turing Test cannot prove that a machine can think.
Much like the Turing test itself, Searle's argument has been both widely criticised and highly endorsed.
Arguments such as Searle's and others working on the philosophy of mind sparked off a more intense debate about the nature of intelligence, the possibility of intelligent machines and the value of the Turing test that continued through the 1980s and 1990s.
The Loebner Prize provides an annual platform for practical Turing Tests with the first competition held in November 1991.
It is underwritten by Hugh Loebner.
The Cambridge Center for Behavioral Studies in Massachusetts, United States, organized the prizes up to and including the 2003 contest.
As Loebner described it, one reason the competition was created was to advance the state of AI research, at least in part because no one had taken steps to implement the Turing Test despite 40 years of discussing it.
The first Loebner Prize competition in 1991 led to a renewed discussion of the viability of the Turing Test and the value of pursuing it, in both the popular press and the academia.
The first contest was won by a mindless program with no identifiable intelligence that managed to fool naive interrogators into making the wrong identification.
This highlighted several of the shortcomings of the Turing Test.
The winner won, at least in part, because it was able to "imitate human typing errors".
The unsophisticated interrogators were easily fooled; and some researchers in AI have been led to feel that the test is merely a distraction from more fruitful research.
The silver (text only) and gold (audio and visual) prizes have never been won.
However, the competition has awarded the bronze medal every year for the computer system that, in the judges' opinions, demonstrates the "most human" conversational behaviour among that year's entries.
Artificial Linguistic Internet Computer Entity (A.L.I.C.E.) has won the bronze award on three occasions in recent times (2000, 2001, 2004).
Learning AI Jabberwacky won in 2005 and 2006.
The Loebner Prize tests conversational intelligence; winners are typically chatterbot programs, or Artificial Conversational Entities (ACE)s.
Early Loebner Prize rules restricted conversations: Each entry and hidden-human conversed on a single topic, thus the interrogators were restricted to one line of questioning per entity interaction.
The restricted conversation rule was lifted for the 1995 Loebner Prize. Interaction duration between judge and entity has varied in Loebner Prizes.
In Loebner 2003, at the University of Surrey, each interrogator was allowed five minutes to interact with an entity, machine or hidden-human.
Between 2004 and 2007, the interaction time allowed in Loebner Prizes was more than twenty minutes.
In 2008, the interrogation duration allowed was five minutes per pair, because the organiser, Kevin Warwick, and coordinator, Huma Shah, consider this to be the duration for any test, as Turing stated in his 1950 paper:
" ... making the right identification after five minutes of questioning".
They felt Loebner's longer test, implemented in Loebner Prizes 2006 and 2007, was inappropriate for the state of artificial conversation technology.
It is ironic that the 2008 winning entry, Elbot from Artificial Solutions, does not mimic a human.
Its personality is that of a robot, yet Elbot deceived three human judges that it was the human during human-parallel comparisons.
During the 2009 competition, held in Brighton, UK, the communication program restricted judges to 10 minutes for each round, 5 minutes to converse with the human, 5 minutes to converse with the program.
This was to test the alternative reading of Turing's prediction that the 5-minute interaction was to be with the computer.
For the 2010 competition, the Sponsor has again increased the interaction time, between interrogator and system, to 25 minutes.
On 7 June 2014 a Turing test competition, organized by Huma Shah and Kevin Warwick to mark the 60th anniversary of Turing's death, was held at the Royal Society London and was won by the Russian chatter bot Eugene Goostman.
The bot, during a series of five-minute-long text conversations, convinced 33% of the contest's judges that it was human.
Judges included John Sharkey, a sponsor of the bill granting a government pardon to Turing, AI Professor Aaron Sloman and Red Dwarf actor Robert Llewellyn.
The competition's organisers believed that the Turing test had been "passed for the first time" at the event, saying that "some will claim that the Test has already been passed."
"The words Turing Test have been applied to similar competitions around the world."
"However this event involved the most simultaneous comparison tests than ever before, was independently verified and, crucially, the conversations were unrestricted."
"A true Turing Test does not set the questions or topics prior to the conversations."
The contest has faced criticism.
First, only a third of the judges were fooled by the computer.
Second, the program's character claimed to be a 13-year-old Ukrainian boy who had learned English as a second language.
The contest required 30% of judges to be fooled, which was based on Turing's statement in his Computing Machinery and Intelligence paper.
Joshua Tenenbaum, an AI expert at MIT, stated that, in his view, the result was unimpressive.
Saul Traiger argues that there are at least three primary versions of the Turing test, two of which are offered in "Computing Machinery and Intelligence" and one that he describes as the "Standard Interpretation."
While there is some debate regarding whether the "Standard Interpretation" is that described by Turing or, instead, based on a misreading of his paper, these three versions are not regarded as equivalent, and their strengths and weaknesses are distinct.
Huma Shah points out that Turing himself was concerned with whether a machine could think and was providing a simple method to examine this: through human-machine question-answer sessions.
Shah argues that there is one imitation game, which Turing described could be put into practice in two different ways:
a) one-to-one interrogator-machine test, and
b) simultaneous comparison of a machine with a human, both questioned in parallel by an interrogator.
Since the Turing test is a test of indistinguishability in performance capacity, the verbal version generalises naturally to all of human performance capacity, verbal as well as nonverbal (robotic).
Turing's original game described a simple party game involving three players.
Player A is a man, player B is a woman and player C (who plays the role of the interrogator) is of either sex. In the Imitation Game, player C is unable to see either player A or player B, and can communicate with them only through written notes. By asking questions of player A and player B, player C tries to determine which of the two is the man and which is the woman. Player A's role is to trick the interrogator into making the wrong decision, while player B attempts to assist the interrogator in making the right one.
Sterrett referred to this as the "Original Imitation Game Test".
Turing proposed that the role of player A be filled by a computer so that its task was to pretend to be a woman and attempt to trick the interrogator into making an incorrect evaluation. The success of the computer was determined by comparing the outcome of the game when player A is a computer against the outcome when player A is a man. Turing stated if "the interrogator decide[s] wrongly as often when the game is played [with the computer] as he does when the game is played between a man and a woman", it may be argued that the computer is intelligent.
The second version appeared later in Turing's 1950 paper. Similar to the Original Imitation Game Test, the role of player A is performed by a computer. However, the role of player B is performed by a man rather than a woman.
"Let us fix our attention on one particular digital computer C. Is it true that by modifying this computer to have an adequate storage, suitably increasing its speed of action, and providing it with an appropriate programme, C can be made to play satisfactorily the part of A in the imitation game, the part of B being taken by a man?"
In this version, both player A (the computer) and player B are trying to trick the interrogator into making an incorrect decision.
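The shared structure of both versions can be captured in a small protocol sketch. The Python below models an interrogator exchanging written notes with two hidden players over anonymous text channels; all the names here are hypothetical scaffolding, not any standard library or benchmark.

import random
from typing import Callable

# A "player" is anything that maps a written question to a written answer.
Player = Callable[[str], str]

def one_round(player_a: Player, player_b: Player,
              interrogate: Callable[[Player, Player], str]) -> bool:
    """Player A tries to deceive the interrogator; player B tries to help.
    Returns True if the interrogator correctly names which channel is A."""
    # Hide the players behind randomly ordered labels, so the
    # interrogator sees only two text channels, "X" and "Y".
    channels = {"X": player_a, "Y": player_b}
    if random.random() < 0.5:
        channels = {"X": player_b, "Y": player_a}
    guess = interrogate(channels["X"], channels["Y"])  # returns "X" or "Y"
    return channels[guess] is player_a

# Turing's comparison: run many rounds with a man as player A, then with a
# machine as player A, and ask whether the interrogator's error frequency
# changes measurably between the two conditions.

Note that the score of interest is the frequency of wrong identifications across many rounds, not success at any single round.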
Common understanding has it that the purpose of the Turing Test is not specifically to determine whether a computer is able to fool an interrogator into believing that it is a human, but rather whether a computer could imitate a human.
While there is some dispute whether this interpretation was intended by Turing – Sterrett believes that it was and thus conflates the second version with this one, while others, such as Traiger, do not – this has nevertheless led to what can be viewed as the "standard interpretation."
In this version, player A is a computer and player B a person of either sex.
The role of the interrogator is not to determine which is male and which is female, but which is a computer and which is a human.
The fundamental issue with the standard interpretation is that the interrogator cannot differentiate which responder is human, and which is machine.
There are issues about duration, but the standard interpretation generally holds that this limitation should be set at something reasonable.
Controversy has arisen over which of the alternative formulations of the test Turing intended.
Sterrett argues that two distinct tests can be extracted from his 1950 paper and that, pace Turing's remark, they are not equivalent.
The test that employs the party game and compares frequencies of success is referred to as the "Original Imitation Game Test," whereas the test consisting of a human judge conversing with a human and a machine is referred to as the "Standard Turing Test," noting that Sterrett equates this with the "standard interpretation" rather than the second version of the imitation game.
Sterrett agrees that the Standard Turing Test (STT) has the problems that its critics cite but feels that, in contrast, the Original Imitation Game Test (OIG Test) so defined is immune to many of them, due to a crucial difference.
Unlike the STT, it does not make similarity to human performance the criterion, even though it employs human performance in setting a criterion for machine intelligence.
A man can fail the OIG Test, but it is argued that it is a virtue of a test of intelligence that failure indicates a lack of resourcefulness.
The OIG Test requires the resourcefulness associated with intelligence and not merely simulation of human conversational behaviour.
The general structure of the OIG Test could even be used with non-verbal versions of imitation games.
Still other writers have interpreted Turing as proposing the imitation game itself as the test, without taking into account his statement that the test using the party version of the imitation game rests on a criterion of comparative frequency of success across rounds, rather than on the capacity to succeed at a single round of the game.
Saygin has suggested that maybe the original game is a way of proposing a less biased experimental design as it hides the participation of the computer.
The imitation game also includes a "social hack" not found in the standard interpretation, as in the game both computer and male human are required to play as pretending to be someone they are not.
A crucial piece of any laboratory test is that there should be a control.
Turing never makes clear whether the interrogator in his tests is aware that one of the participants is a computer.
However, if there were a machine that had the potential to pass a Turing test, it would be safe to assume that a double-blind control would be necessary.
To return to the Original Imitation Game, he states only that player A is to be replaced with a machine, not that player C is to be made aware of this replacement.
When Colby, FD Hilf, S Weber and AD Kramer tested PARRY, they did so by assuming that the interrogators did not need to know that one or more of those being interviewed was a computer during the interrogation.
***************************************
As Ayse Saygin, Peter Swirski, and others have highlighted, this makes a big difference to the implementation and outcome of the test.
In an experimental study of Gricean maxim violations, using transcripts of Loebner's one-to-one (interrogator-hidden interlocutor) Prize for AI contests from 1994 to 1999, Ayse Saygin found significant differences between the responses of participants who knew, and those who did not know, that computers were involved.
*************************************
Huma Shah and Kevin Warwick, who organized the Loebner Prize at Reading which staged simultaneous comparison tests (one judge-two hidden interlocutors), showed that knowing/not knowing did not make a significant difference in some judges' determination.
Judges were not explicitly told about the nature of the pairs of hidden interlocutors they would interrogate.
Judges were able to distinguish human from machine, including when they were faced with control pairs of two humans and two machines embedded among the machine-human set ups.
Spelling errors gave away the hidden-humans; machines were identified by 'speed of response' and lengthier utterances.
The power and appeal of the Turing test derives from its simplicity.
The philosophy of mind, psychology, and modern neuroscience have been unable to provide definitions of "intelligence" and "thinking" that are sufficiently precise and general to be applied to machines.
Without such definitions, the central questions of the philosophy of artificial intelligence cannot be answered. The Turing test, even if imperfect, at least provides something that can actually be measured. As such, it is a pragmatic solution to a difficult philosophical question.
The format of the test allows the interrogator to give the machine a wide variety of intellectual tasks. Turing wrote that "the question and answer method seems to be suitable for introducing almost any one of the fields of human endeavor that we wish to include."
John Haugeland (who collaborated with Grice on research on David Hume on personal identity) adds that "understanding the words is not enough; you have to understand the topic as well."
To pass a well-designed Turing test, the machine must use natural language, reason, have knowledge and learn.
The test can be extended to include video input, as well as a "hatch" through which objects can be passed: this would force the machine to demonstrate the skill of vision and robotics as well.
Together, these represent almost all of the major problems that artificial intelligence research would like to solve.
The Feigenbaum test is designed to take advantage of the broad range of topics available to a Turing test.
It is a limited form of Turing's question-answer game which compares the machine against the abilities of experts in specific fields such as literature or chemistry.
IBM's Watson machine achieved success in Jeopardy!, a man-versus-machine television quiz show of human knowledge.
Turing did not explicitly state that the Turing test could be used as a measure of intelligence, or any other human quality.
He wanted to provide a clear and understandable alternative to the word "think", which he could then use to reply to criticisms of the possibility of "thinking machines" and to suggest ways that research might move forward.
Nevertheless, the Turing test has been proposed as a measure of a machine's "ability to think" or its "intelligence".
This proposal has received criticism from both philosophers and computer scientists.
It assumes that an interrogator can determine if a machine is "thinking" by comparing its behaviour with human behaviour.
Every element of this assumption has been questioned: the reliability of the interrogator's judgement, the value of comparing only behaviour and the value of comparing the machine with a human.
Because of these and other considerations, some AI researchers have questioned the relevance of the test to their field.
The Turing test does not directly test whether the computer behaves intelligently – it tests only whether the computer behaves like a human being. Since human behaviour and intelligent behaviour are not exactly the same thing, the test can fail to accurately measure intelligence in two ways:
- Some human behaviour is unintelligent.
- The Turing test requires that the machine be able to execute all human behaviours, regardless of whether they are intelligent.
- It even tests for behaviours that we may not consider intelligent at all, such as the susceptibility to insults, the temptation to lie or, simply, a high frequency of typing mistakes.
- If a machine cannot imitate these unintelligent behaviours in detail it fails the test.
- This objection was raised by The Economist, in an article entitled "artificial stupidity" published shortly after the first Loebner prize competition.
- The article noted that the first Loebner winner's victory was due, at least in part, to its ability to "imitate human typing errors."
- Turing himself had suggested that programs add errors into their output, so as to be better "players" of the game.
- Some intelligent behaviour is inhuman.
- The Turing test does not test for highly intelligent behaviours, such as the ability to solve difficult problems or come up with original insights.
- In fact, it specifically requires deception on the part of the machine: if the machine is more intelligent than a human being it must deliberately avoid appearing too intelligent.
- If it were to solve a computational problem that is practically impossible for a human to solve, then the interrogator would know the program is not human, and the machine would fail the test.
- Because it cannot measure intelligence that is beyond the ability of humans, the test cannot be used to build or evaluate systems that are more intelligent than humans.
- Because of this, several test alternatives that would be able to evaluate superintelligent systems have been proposed.
See also: Synthetic intelligence.
The Turing test is concerned strictly with how the subject acts – the external behaviour of the machine.
In this regard, it takes a behaviourist or functionalist approach to the study of intelligence.
The example of ELIZA suggests that a machine passing the test may be able to simulate human conversational behaviour by following a simple (but large) list of mechanical rules, without thinking or having a mind at all.
John Searle has argued that external behaviour cannot be used to determine if a machine is "actually" thinking or merely "simulating thinking."
His Chinese room argument is intended to show that, even if the Turing test is a good operational definition of intelligence, it may not indicate that the machine has a mind, consciousness, or intentionality.
Intentionality is a philosophical term for the power of thoughts to be "about" something.
Turing anticipated this line of criticism in his original paper, writing:
"I do not wish to give the impression that I think there is no mystery about consciousness."
"There is, for instance, something of a paradox connected with any attempt to localise it."
"But I do not think these mysteries necessarily need to be solved before we can answer the question with which we are concerned in this paper."
In practice, the test's results can easily be dominated not by the computer's intelligence, but by the attitudes, skill or naivety of the questioner.
Turing does not specify the precise skills and knowledge required by the interrogator in his description of the test, but he did use the term "average interrogator".
The average interrogator would not have more than 70 per cent chance of making the right identification after five minutes of questioning.
Shah and Warwick show that experts are fooled, and that interrogator strategy ("power" vs. "solidarity") affects correct identification, the latter being the more successful strategy.
Chatterbot programs such as ELIZA have repeatedly fooled unsuspecting people into believing that they are communicating with human beings.
In these cases, the "interrogator" is not even aware of the possibility that they are interacting with a computer.
To successfully appear human, there is no need for the machine to have any intelligence whatsoever and only a superficial resemblance to human behaviour is required.
Early Loebner prize competitions used "unsophisticated" interrogators who were easily fooled by the machines.
Since then, the Loebner Prize organizers have deployed philosophers, computer scientists, and journalists among the interrogators. Nonetheless, some of these experts have been deceived by the machines.
Michael Shermer points out that human beings consistently choose to consider non-human objects as human whenever they are allowed the chance, a mistake called the anthropomorphic fallacy.
They talk to their cars, ascribe desire and intentions to natural forces (e.g., "nature abhors a vacuum"), and worship the sun as a human-like being with intelligence.
If the Turing test is applied to religious objects, Shermer argues, then inanimate statues, rocks, and places have consistently passed the test throughout history.
This human tendency towards anthropomorphism effectively lowers the bar for the Turing test, unless interrogators are specifically trained to avoid it.
One interesting feature of the Turing Test is the frequency with which hidden human foils are misidentified by interrogators as being machines.
It has been suggested that interrogators are looking for expected rather than typical human responses.
As a result, some individuals are often categorised as machines.
This can therefore work in favour of a competing machine.
Mainstream AI researchers argue that trying to pass the Turing Test is merely a distraction from more fruitful research.
Indeed, the Turing test is not an active focus of much academic or commercial effort—as Stuart Russell and Peter Norvig write, AI researchers have devoted little attention to passing the Turing test.
There are several reasons.
First, there are easier ways to test their programs.
Most current research in AI-related fields is aimed at modest and specific goals, such as automated scheduling, object recognition, or logistics.
To test the intelligence of the programs that solve these problems, AI researchers simply give them the task directly.
Russell and Norvig suggest an analogy with the history of flight.
Planes are tested by how well they fly, not by comparing them to birds.
Aeronautical engineering texts do not define the goal of their field as 'making machines that fly so exactly like pigeons that they can fool other pigeons'.
Second, creating lifelike simulations of human beings is a difficult problem on its own that does not need to be solved to achieve the basic goals of AI research.
Believable human characters may be interesting in a work of art, a game, or a sophisticated user interface, but they are not part of the science of creating intelligent machines, that is, machines that solve problems using intelligence.
Turing, for his part, never intended his test to be used as a practical, day-to-day measure of the intelligence of AI programs.
He wanted to provide a clear and understandable example to aid in the discussion of the philosophy of artificial intelligence.
John McCarthy observes that the philosophy of AI is "unlikely to have any more effect on the practice of AI research than philosophy of science generally has on the practice of science."
Numerous other versions of the Turing test, including those expounded above, have been mooted through the years.
Main articles: reverse Turing test and CAPTCHA.
A modification of the Turing test wherein the objective of one or more of the roles have been reversed between machines and humans is termed a reverse Turing test. An example is implied in the work of psychoanalyst Wilfred Bion, who was particularly fascinated by the "storm" that resulted from the encounter of one mind by another.
In his book, among several other original points with regard to the Turing test, literary scholar Peter Swirski discussed in detail the idea of what he termed the Swirski test—essentially the reverse Turing test.
He pointed out that it overcomes most if not all standard objections levelled at the standard version.
Carrying this idea forward, R. D. Hinshelwood described the mind as a "mind recognizing apparatus."
The challenge would be for the computer to be able to determine if it were interacting with a human or another computer.
This is an extension of the original question that Turing attempted to answer but would, perhaps, offer a high enough standard to define a machine that could "think" in a way that we typically define as characteristically human.
CAPTCHA is a form of reverse Turing test.
Before being allowed to perform some action on a website, the user is presented with alphanumerical characters in a distorted graphic image and asked to type them out.
This is intended to prevent automated systems from being used to abuse the site.
The rationale is that software sufficiently sophisticated to read and reproduce the distorted image accurately does not exist (or is not available to the average user), so any system able to do so is likely to be a human.
Software that can reverse CAPTCHA with some accuracy by analysing patterns in the generating engine is being actively developed.
OCR or optical character recognition is also under development as a workaround for the inaccessibility of several CAPTCHA schemes to humans with disabilities.
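As a rough illustration of the mechanism (a toy, not any production CAPTCHA scheme), the Python sketch below uses the Pillow imaging library to render a random alphanumeric string with noise and a small rotation; the server keeps the answer and permits the action only if the user types it back.

import random
import string
from PIL import Image, ImageDraw

def make_captcha(length=6):
    """Return a crudely distorted image plus the answer the user must type."""
    answer = "".join(random.choices(string.ascii_uppercase + string.digits,
                                    k=length))
    img = Image.new("RGB", (160, 60), "white")
    draw = ImageDraw.Draw(img)
    draw.text((20, 20), answer, fill="black")  # default bitmap font
    for _ in range(200):                       # pepper the image with noise
        draw.point((random.randrange(160), random.randrange(60)), fill="grey")
    img = img.rotate(random.uniform(-15, 15), fillcolor="white")
    return img, answer

image, answer = make_captcha()
image.save("captcha.png")  # serve this; accept only the matching answer
print("expected answer:", answer)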
Main article: Subject matter expert Turing test
Another variation is described as the subject matter expert Turing test, where a machine's response cannot be distinguished from an expert in a given field. This is also known as a "Feigenbaum test" and was proposed by Edward Feigenbaum.
The "Total Turing test" variation of the Turing test adds two further requirements to the traditional Turing test.
The interrogator can also test the perceptual abilities of the subject (requiring computer vision) and the subject's ability to manipulate objects (requiring Robotics).
The Minimum Intelligent Signal Test was proposed by Chris McKinstry as "the maximum abstraction of the Turing test", in which only binary responses (true/false or yes/no) are permitted, to focus only on the capacity for thought.
It eliminates text chat problems like anthropomorphism bias, and doesn't require emulation of unintelligent human behaviour, allowing for systems that exceed human intelligence.
The questions must each stand on their own, however, making it more like an IQ test than an interrogation.
It is typically used to gather statistical data against which the performance of artificial intelligence programs may be measured.
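A MIST-style evaluation is easy to sketch because the interface is so narrow: a bank of self-contained propositions, each with one correct binary answer. The items below are invented examples in the spirit of McKinstry's test, not his actual corpus.

# Hypothetical MIST-style item bank.
ITEMS = [
    ("Is water wet?", True),
    ("Is the Moon larger than the Earth?", False),
    ("Do most dogs have four legs?", True),
]

def mist_score(agent) -> float:
    """Fraction of binary items answered correctly; `agent` is any
    callable mapping a question string to True or False."""
    correct = sum(agent(question) == truth for question, truth in ITEMS)
    return correct / len(ITEMS)

# A coin-flipping agent scores about 0.5 on a large, balanced item bank,
# which gives the test a natural chance baseline.
print(mist_score(lambda question: True))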
The organizers of the Hutter Prize believe that compressing natural language text is a hard AI problem, equivalent to passing the Turing test.
The data compression test has some advantages over most versions and variations of a Turing test, including:
It gives a single number that can be directly used to compare which of two machines is "more intelligent."
It does not require the computer to lie to the judge.
The main disadvantages of using data compression as a test are:
It is not possible to test humans this way.
It is unknown what particular "score" on this test—if any—is equivalent to passing a human-level Turing test.
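A toy version of the comparison is straightforward, assuming a fixed natural-language corpus file and two off-the-shelf lossless compressors standing in for the competing "machines"; real Hutter Prize entries are purpose-built language models, not general-purpose compressors.

import bz2
import zlib

corpus = open("corpus.txt", "rb").read()  # hypothetical natural-language sample

# The smaller the lossless encoding of the same text, the better the
# implicit model of the language: a single, directly comparable number.
for name, compress in [("zlib", zlib.compress), ("bz2", bz2.compress)]:
    print(name, len(compress(corpus)), "bytes")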
A related approach to Hutter's prize which appeared much earlier in the late 1990s is the inclusion of compression problems in an extended Turing Test, or by tests which are completely derived from Kolmogorov complexity.
Other related tests in this line are presented by Hernandez-Orallo and Dowe.
Algorithmic IQ, or AIQ for short, is an attempt to convert the theoretical Universal Intelligence Measure from Legg and Hutter (based on Solomonoff's inductive inference) into a working practical test of machine intelligence.
Two major advantages of some of these tests are their applicability to nonhuman intelligences and their absence of a requirement for human testers.
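For reference, the universal intelligence measure that AIQ approximates is usually written as an expected-reward sum over all computable environments, weighted by their complexity; in (roughly) Legg and Hutter's notation,

\Upsilon(\pi) = \sum_{\mu \in E} 2^{-K(\mu)} \, V_{\mu}^{\pi}

where \pi is the agent, E the class of computable reward-bounded environments, K(\mu) the Kolmogorov complexity of environment \mu, and V_{\mu}^{\pi} the expected total reward \pi earns in \mu. Since K is incomputable, AIQ estimates the sum by sampling environment programs on a fixed reference machine.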
The Turing test inspired the Ebert test, proposed by film critic Roger Ebert, which tests whether a computer-based synthesised voice has sufficient skill in terms of intonation, inflection, timing and so forth to make people laugh.
Turing predicted that machines would eventually be able to pass the test; in fact, he estimated that by the year 2000, machines with around 100 MB of storage would be able to fool 30% of human judges in a five-minute test, and that people would no longer consider the phrase "thinking machine" contradictory.
In practice, the Loebner Prize chatterbot contestants only managed to fool a judge once, and that was only due to the human contestant pretending to be a chatbot.
Turing further predicted that machine learning would be an important part of building powerful machines, a claim considered plausible by contemporary researchers in artificial intelligence.
In a paper submitted to the Midwest Artificial Intelligence and Cognitive Science Conference, Shane T. Mueller predicted that a modified Turing Test called a "Cognitive Decathlon" could be accomplished within five years.
By extrapolating an exponential growth of technology over several decades, futurist Ray Kurzweil predicted that Turing test-capable computers would be manufactured in the near future. In 1990, he set the year around 2020.
By 2005, he had revised his estimate to 2029.
The Long Bet Project Bet Nr. 1 is a wager of $20,000 between Mitch Kapor (pessimist) and Ray Kurzweil (optimist) about whether a computer will pass a lengthy Turing Test by the year 2029.
During the Long Now Turing Test, each of three Turing Test Judges will conduct online interviews of each of the four Turing Test Candidates (i.e., the Computer and the three Turing Test Human Foils) for two hours each for a total of eight hours of interviews.
The bet specifies the conditions in some detail.
1990 marked the fortieth anniversary of the first publication of Turing's "Computing Machinery and Intelligence" paper, and, thus, saw renewed interest in the test.
Two significant events occurred in that year: The first was the Turing Colloquium, which was held at the University of Sussex in April, and brought together academics and researchers from a wide variety of disciplines to discuss the Turing Test in terms of its past, present, and future; the second was the formation of the annual Loebner Prize competition.
Blay Whitby lists four major turning points in the history of the Turing Test: the publication of "Computing Machinery and Intelligence" in 1950, the announcement of Joseph Weizenbaum's ELIZA in 1966, Kenneth Colby's creation of PARRY in 1972, and the Turing Colloquium in 1990.
In November 2005, the University of Surrey hosted an inaugural one-day meeting of artificial CONVERSATIONAL entity developers, attended by winners of practical Turing Tests in the Loebner Prize: Robby Garner, Richard Wallace and Rollo Carpenter.
Invited speakers included David Hamill, Hugh Loebner (sponsor of the Loebner Prize) and Huma Shah.
In parallel to the Loebner Prize held at the University of Reading, the Society for the Study of Artificial Intelligence and the Simulation of Behaviour hosted a one-day symposium to discuss the Turing Test, organized by John Barnden, Mark Bishop, Huma Shah and Kevin Warwick.
The speakers included the Royal Institution's Director Baroness Susan Greenfield, Selmer Bringsjord, Turing's biographer Andrew Hodges, and consciousness scientist Owen Holland.
No agreement emerged for a canonical Turing Test, though Bringsjord expressed that a sizeable prize would result in the Turing Test being passed sooner.
Sixty years following its introduction, continued argument over Turing's 'Can machines think?' experiment led to its reconsideration for the 21st Century Symposium at the AISB Convention, held 29 March to 1 April 2010 at De Montfort University, UK. The AISB is the (British) Society for the Study of Artificial Intelligence and the Simulation of Behaviour.
Throughout 2012, a number of major events took place to celebrate Turing's life and scientific impact.
The Turing100 group supported these events and also organized a special Turing test event at Bletchley Park on 23 June 2012 to celebrate the 100th anniversary of Turing's birth.
The latest discussions of the Turing Test took place at a symposium with 11 speakers, organized by Vincent C. Müller (ACT & Oxford) and Aladdin Ayesh (De Montfort), with Mark Bishop, John Barnden, Alessio Plebe and Pietro Perconti.
Notes:
- Saygin 2000.
- Turing originally suggested a teleprinter, one of the few text-only communication systems available in 1950. Turing does not call his idea the "Turing test", but rather the "Imitation Game"; later literature, however, has reserved the term "Imitation Game" for a particular version of the test (see the versions of the Turing test discussed above). Turing gives a more precise version of the question later in the paper: "These questions are equivalent to this, 'Let us fix our attention on one particular digital computer C. Is it true that by modifying this computer to have an adequate storage, suitably increasing its speed of action, and providing it with an appropriate programme, C can be made to play satisfactorily the part of A in the imitation game, the part of B being taken by a man?'" See Russell and Norvig, where they comment, "Turing examined a wide variety of possible objections to the possibility of intelligent machines, including virtually all of those that have been raised in the half century since his paper appeared."
- Descartes, René (1996). Discourse on Method and Meditations on First Philosophy. New Haven & London: Yale University Press. pp. 34–5.
Diderot, D. (2007), Pensées philosophiques, Addition aux Pensées philosophiques, Flammarion, p. 68.
For an example of property dualism, see Qualia.
Noting that materialism does not necessitate the possibility of artificial minds (for example, Roger Penrose), any more than dualism necessarily precludes the possibility. (See, for example, Property dualism.)
Ayer, A. J. (2001) [1936], Language, Truth and Logic, London: Penguin, ISBN 0-334-04122-8.
The Dartmouth conferences of 1956 are widely considered the "birth of AI". (Crevier 1993, p. 49)
McCorduck 2004, p. 95.
Copeland 2003, p. 1.
Copeland 2003, p. 2.
- "Intelligent Machinery" (1948) was not published by Turing, and did not see publication until 1968. See Turing 1948, p. 412.
In 1948, working with his former undergraduate colleague, DG Champernowne, Turing began writing a chess program for a computer that did not yet exist and, in 1952, lacking a computer powerful enough to execute the program, played a game in which he simulated it, taking about half an hour over each move.
The game was recorded, and the program lost to Turing's colleague Alick Glennie, although it is said that it won a game against Champernowne's wife.
Turing 1948, p. [page needed].
- Harnad 2004, p. 1.
Turing 1950, p. 434.
- Turing 1950, p. 446.
Turing 1952, pp. 524–525. Turing does not seem to distinguish between "man" as a gender and "man" as a human. In the former case, this formulation would be closer to the Imitation Game, whereas in the latter it would be closer to current depictions of the test.
- Weizenbaum 1966, p. 37. Weizenbaum 1966, p. 42. Thomas 1995, p. 112. Boden 2006, p. 370. Colby et al. 1972, p. 42. Saygin 2000, p. 501. Withers, Steven (11 December 2007), "Flirty Bot Passes for Human", iTWire. Williams, Ian (10 December 2007), "Online Love Seekers Warned of Flirt Bots", V3. Searle 1980.
There are a large number of arguments against Searle's Chinese room. A few are:
- Hauser, Larry (1997), "Searle's Chinese Box: Debunking the Chinese Room Argument", Minds and Machines 7 (2): 199–226, doi:10.1023/A:1008255830248.
Rehman, Warren (19 July 2009), Argument against the Chinese Room Argument.
Thornley, David H. (1997), Why the Chinese Room Doesn't Work
Preston, John; Bishop, Mark (eds.) (2002), Views into the Chinese Room: New Essays on Searle and Artificial Intelligence, Oxford University Press.
- Saygin 2000, p. 479. Sundman 2003. Loebner 1994.
- Shapiro 1992, pp. 10–11, and Shieber 1994, amongst others. Shieber 1994, p. 77. Jabberwacky is discussed in Shah & Warwick 2009a. Turing test, on season 4, episode 3 of Scientific American Frontiers. Turing 1950, p. 442.
Results and report can be found in: "Can a machine think? – results from the 18th Loebner Prize contest", University of Reading.
- "Rules for the 20th Loebner Prize contest", Loebner.net "Computer passes Turing Test for first time by convincing judges it is a 13-year-old boy". The Verge. Retrieved 8 June 2014.
"Turing Test success marks milestone in computing history". University of Reading. Retrieved 8 June 2014.
"Debunking Eugene: Montreal cognitive scientist doubts UK university's Turing test claim", As It Happens, CBC Canada, 10 June 2014.
- Mann, Adam (9 June 2014), "That Computer Actually Got an F on the Turing Test", Wired, retrieved 9 June 2014. Traiger 2000. Saygin 2008. Shah 2011. Oppy, Graham and Dowe, David (2011), "The Turing Test", Stanford Encyclopedia of Philosophy. Moor 2003. Traiger 2000, p. 99. Sterrett 2000. Genova 1994. Hayes & Ford 1995. Heil 1998. Dreyfus 1979. Epstein, R.; Roberts, G.; Beber, G. (eds.), Parsing the Turing Test: Philosophical and Methodological Issues in the Quest for the Thinking Computer, Dordrecht, Netherlands: Springer. Thompson, Clive (July 2005), "The Other Turing Test", Wired, issue 13.07, retrieved 10 September 2011.
As a gay man who spent nearly his whole life in the closet, Turing must have been keenly aware of the social difficulty of constantly faking your real identity. And there's a delicious irony in the fact that decades of AI scientists have chosen to ignore Turing's gender-twisting test – only to have it seized upon by three college-age women. (Full version.) Colby et al. 1972. Swirski 2000. Saygin & Cicekli 2002. Turing 1950, under "Critique of the New Problem". Haugeland 1985, p. 8. "These six disciplines," write Stuart J. Russell and Peter Norvig, "represent most of AI." Russell & Norvig 2003, p. 3. Watson:
- "Watson Wins 'Jeopardy!' The IBM Challenge", Sony Pictures, 16 February 2011
- Shah, Huma (5 April 2011), Turing's misunderstood imitation game and IBM's Watson success
- Saygin & Cicekli 2002, pp. 227–258.
Turing 1950, p. 448.
Jose Hernandez-Orallo (2000), "Beyond the Turing Test", Journal of Logic, Language and Information 9 (4): 447–466, doi:10.1023/A:1008367325700, retrieved 21 July 2009.
Hernandez-Orallo, J.; Dowe, D. L. (2010), "Measuring Universal Intelligence: Towards an Anytime Intelligence Test", Artificial Intelligence 174 (18): 1508–1539, doi:10.1016/j.artint.2010.09.006.
Russell & Norvig (2003, pp. 958–960) identify Searle's argument with the one Turing answers.
Warwick, K. and Shah, H. (2014), "Human Misidentification in Turing Tests", Journal of Experimental and Theoretical Artificial Intelligence, doi:10.1080/0952813X.2014.921734.
Russell & Norvig 2003, p. 3.
Turing 1950, under the heading "The Imitation Game," where he writes, "Instead of attempting such a definition I shall replace the question by another, which is closely related to it and is expressed in relatively unambiguous words."
McCarthy, John (1996), "The Philosophy of Artificial Intelligence", What has AI in Common with Philosophy?
Malik, Jitendra; Mori, Greg, Breaking a Visual CAPTCHA
McCorduck 2004, pp. 503–505. Feigenbaum 2003. The subject matter expert test is also mentioned in Kurzweil (2005).
Russell & Norvig 2010, p. 3.
McKinstry, Chris (1997), "Minimum Intelligent Signal Test: An Alternative Turing Test", Canadian Artificial Intelligence (41)
D L Dowe and A R Hajek (1997), "A computational extension to the Turing Test", Proceedings of the 4th Conference of the Australasian Cognitive Science Society, retrieved 21 July 2009.
Legg, Shane; Veness, Joel (2011), "An Approximation of the Universal Intelligence Measure", Solomonoff Memorial Conference.
Alex_Pasternack (18 April 2011). "A MacBook May Have Given Roger Ebert His Voice, But An iPod Saved His Life (Video)". Motherboard. Retrieved 12 September 2011.
He calls it the "Ebert Test," after Turing's AI standard...
Mueller, Shane T. (2008), "Is the Turing Test Still Relevant? A Plan for Developing the Cognitive Decathlon to Test Intelligent Embodied Behavior", paper submitted to the 19th Midwest Artificial Intelligence and Cognitive Science Conference: 8 pp., retrieved 8 September 2010.
Kapor, Mitchell; Kurzweil, Ray, "By 2029 no computer – or "machine intelligence" – will have passed the Turing Test", The Arena for Accountable Predictions: A Long Bet
Whitby 1996, p. 53.
ALICE Anniversary and Colloquium on Conversation, A.L.I.C.E. Artificial Intelligence Foundation, retrieved 29 March 2009
Loebner Prize 2008, University of Reading.
AISB Symposium on the Turing Test, Society for the Study of Artificial Intelligence and the Simulation of Behaviour.
- Towards a Comprehensive Intelligence Test (TCIT): Reconsidering the Turing Test for the 21st Century Symposium
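The protocol in Turing's reformulated question (quoted in the notes above) is simple enough to mock up. Below is a minimal, hypothetical sketch in Python: every name, the callback structure, and the toy guessing strategy are invented here for illustration, not drawn from any of the cited sources.

```python
import random

def imitation_game(interrogator, computer, human, rounds=5):
    """Return True if the interrogator correctly unmasks the computer."""
    # Hide the two players behind neutral labels, in random order.
    pair = [("X", computer), ("Y", human)]
    random.shuffle(pair)
    labels = dict(pair)

    transcript = []
    for _ in range(rounds):
        question = interrogator.ask(transcript)
        # Both players answer every question; only text is exchanged.
        answers = {label: player(question) for label, player in labels.items()}
        transcript.append((question, answers))

    guess = interrogator.identify_computer(transcript)  # returns "X" or "Y"
    return labels[guess] is computer

class ToyInterrogator:
    """Toy strategy: ask one question, guess the more formulaic answerer."""
    def ask(self, transcript):
        return "What did you have for breakfast?"
    def identify_computer(self, transcript):
        _, answers = transcript[-1]
        # Fewer distinct words is taken as a (crude) sign of a machine.
        return min(answers, key=lambda label: len(set(answers[label].split())))

won = imitation_game(
    ToyInterrogator(),
    computer=lambda q: "I do not eat breakfast breakfast.",
    human=lambda q: "Toast with honey, and far too much coffee.",
    rounds=1,
)
print("Interrogator unmasked the computer:", won)
```

Turing's claim, of course, concerns the part of A: the machine passes not by answering correctly but by being identified as the human about as often as a real contestant would be.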
References:
- "Artificial Stupidity", The Economist 324
- Bion, W.S. "Making the best of a bad job", Clinical Seminars and Four Papers, Abingdon: Fleetwood Press.
- Boden, Margaret A. (2006), Mind As Machine: A History of Cognitive Science, Oxford University Press.
- Colby, K. M.; Hilf, F. D.; Weber, S.; Kraemer, H. (1972), "Turing-like indistinguishability tests for the validation of a computer simulation of paranoid processes", Artificial Intelligence 3.
- Copeland, Jack (2003), Moor, James, ed., "The Turing Test", The Turing Test: The Elusive Standard of Artificial Intelligence (Springer).
- Crevier, Daniel (1993), AI: The Tumultuous Search for Artificial Intelligence, New York, NY: BasicBooks.
- Dreyfus, Hubert, What Computers Still Can't Do, New York: MIT Press.
- Feigenbaum, Edward A. "Some challenges and grand challenges for computational intelligence", Journal of the ACM 50.
- Genova, J. "Turing's Sexual Guessing Game", Social Epistemology 8.
- Harnad, Stevan. "The Annotation Game: On Turing (1950) on Computing, Machinery, and Intelligence", in Epstein, Robert; Peters, Grace, The Turing Test Sourcebook: Philosophical and Methodological Issues in the Quest for the Thinking Computer, Kluwer.
- Haugeland, John. Artificial Intelligence: The Very Idea, Cambridge, Mass.: MIT Press.
- Hayes, Patrick; Ford, Kenneth (1995), "Turing Test Considered Harmful", Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI95-1), Montreal, Quebec, Canada.: 972–997
- Heil, John. Philosophy of Mind: A Contemporary Introduction, London and New York: Routledge.
- Hinshelwood, R.D. Group Mentality and Having a Mind: Reflections on Bion's work on groups and on psychosis
- Kurzweil, Ray. The Age of Intelligent Machines, Cambridge, Mass.: MIT Press.
- Kurzweil, Ray. The Singularity is Near, Penguin Books.
- Loebner, Hugh Gene. "In response", Communications of the ACM 37.
- McCorduck, Pamela, Machines Who Think (2nd ed.), Natick, MA: A. K. Peters, Ltd.
Further reading:
- Cohen, Paul R. "If Not Turing's Test, Then What?", AI Magazine 26.
- Moor, James H. "The Status and Future of the Turing Test", Minds and Machines 11.