The Grice Club


The club for all those whose members have no (other) club.

Is Grice the greatest philosopher that ever lived?

Saturday, February 27, 2016

The horseshoe's implicature; or, robbing Peter Wason to pay Paul Grice!


A little logic goes a long way when we base an experiment on semantic theory in the cognitive science of conditional reasoning

The psychology of reasoning studies how subjects draw conclusions from premises — the process of derivation.

But premises have to be interpreted before any conclusions can be drawn.

Although premise interpretation has received recurrent attention (e.g., Byrne, 1989; Gebauer & Laming, 1997; Henle, 1962; Newstead, 1995), the full range of dimensions of interpretation facing the subject has not been considered.

Nor has interpretation been properly distinguished from derivation from an interpretation in a way that enables interactions between interpretation and derivation to be analysed.

Our general thesis is that integrating accounts of interpretation with accounts of derivation can lead to deeper insight in cognitive theory generally, and human reasoning in particular.

The present essay exemplifies this general claim in the domain of Peter Wason’s infamous selection task, an important reference point for several prominent cognitive theories of reasoning, under the light of Paul Grice's theory of the conversational implicatures of 'if' when interpreted as the horseshoe.

What is meant by interpretation in this context?

Interpretation maps representation systems (linguistic, diagrammatic, ... ) onto the things in the world which are represented.

Interpretation decides such matters as: which things in the world generally correspond to which words; which of these things are specifically in the domain of interpretation of the current discourse; which structural description should be assigned to an utterance; which propositions are assumed and which derived; which notions of validity of argument are intended; and so on. Natural languages such as English sometimes engender the illusion that such matters are settled by general knowledge of the language, but it is easy to see that this is not so.

Each time a sentence such as

i. The presidents of France were bald.

is uttered, its users must decide, for example, who is included, and how bald is bald, for present purposes.

In the context of the selection task we shall see that there are quite a few such decisions which subjects have to make, each resolvable in a variety of ways, and each with implications for what response to make in the task. Of course, interpretation, in this sense, is very widely studied in philosophy, logic and linguistics (and even psycholinguistics) as we document in our references throughout the paper.

Our thesis is that interchange between these studies and psychological studies of reasoning has been inadequate.

Perhaps because the methods of the fields are so divergent, there has been a reluctance to take semantic analyses seriously as a guide to psychological processes, and many of the concepts of logic are loosely employed in psychology, at best.

There are, of course, honourable exceptions which we will consider in our discussion.

The current paper is part of a more general program for raising the prominence of interpretative processes in cognitive theories of human reasoning.

Stenning explores the interpretative processes of representation selection that are revealed by comparing reasoning with diagrammatic versus sentential representations.

 Stenning and van Lambalgen use a non-monotonic logic to model the process of credulous interpretation whereby a hearer attempts to construct the speaker’s intended model, while suspending any disbeliefs.

This formal semantics for the conditional provides an account of the general kind of interpretation that subjects are most likely to begin from in their attempts to interpret the task and materials in the descriptive selection task and so underpins the informal discussion here.

Stenning, Yule, and Cox and Stenning and Cox revise and extend Newstead’s attempts to model the interpretational variety exhibited by subjects faced by syllogistic reasoning tasks.

In this paper we take Wason’s AD47 selection task and argue that the mental processes it evokes in subjects are, quite reasonably, dominated by interpretative processes.

Wason’s AD47 task is probably the most intensively studied task in the psychology of reasoning literature and has been the departure point, or point of passage, for several high profile cognitive theories: mental models theory, Griceian theory (Cara, Girotto), ‘evolutionary psychology’ (Cosmides), rational analysis (Oaksford and Chater).

We will argue and present empirical evidence that each of these theories misses critical contributions that logic and semantics can make to understanding the task.

For various reasons the materials of the task exert contradictory pressures for conflicting interpretations, and we argue that what we observe are subjects’ various, not always very successful, efforts to resolve these conflicts. The results of our experiments expose rich individual variation in reasoning and learning and so argue for novel standards of empirical analysis of the mental processes involved.

Wason’s original task was presented as follows:

Below is depicted a set of four cards, of which you can see only the exposed face but not the hidden back.

On each card, there is a number on one of its sides and a letter on the other.

Also below there is a rule which applies only to the four cards.

Your task is to decide which if any of these 4 cards you must turn in order to decide if the 'if' utterance is true.

Don’t turn unnecessary cards.

Tick the cards you want to turn.

"If" utterance:

If there is a vowel on one side, there is an even number on the other side.

The modal response is to turn A and 4.

Very few subjects (around 5–10%) turn A and 7.

Wason (and the great majority of researchers up to the present) assumed, without considering alternatives, that correct performance is to turn the A and 7 cards only.
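The classical answer can be checked mechanically. The following is a minimal sketch, assuming the four visible faces are A, D, 4 and 7 (as the name "AD47" suggests), that hidden letters range over A to Z and hidden numbers over 0 to 9, and that the rule is read as material implication. The function names are hypothetical and chosen only for illustration.

```python
# Visible faces of the four cards (assumed from the name "AD47").
VISIBLE = ["A", "D", "4", "7"]

VOWELS = set("AEIOU")
# Assumed ranges for what may be on a hidden face.
LETTERS = [chr(c) for c in range(ord("A"), ord("Z") + 1)]
NUMBERS = [str(n) for n in range(10)]


def is_vowel(face):
    return face in VOWELS


def is_even_number(face):
    return face.isdigit() and int(face) % 2 == 0


def rule_holds(letter, number):
    """Material reading: the rule fails only for a vowel paired with an odd number."""
    return (not is_vowel(letter)) or is_even_number(number)


def must_turn(face):
    """A card must be turned iff some possible hidden face would falsify the rule."""
    if face.isalpha():
        return any(not rule_holds(face, n) for n in NUMBERS)
    return any(not rule_holds(letter, face) for letter in LETTERS)


cards_to_turn = [face for face in VISIBLE if must_turn(face)]
print(cards_to_turn)  # ['A', '7']
```

Note that this answer is fixed only once the material reading has been chosen. Under other interpretations of the rule, the function `rule_holds` would have to change, and so would the set of cards.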

Oaksford and Chater’s inductive rational choice model was the first to challenge this assumption, by rejecting deductive models entirely.

Wason adopted, seemingly without awareness of alternatives, this criterion of good reasoning from a classical logical interpretation of the rule.

The acceptance of classical logic as a suitable normative basis for stipulating correct performance sits oddly with Wason’s and other researchers’ emphasis on content as opposed to logical form as a basis for modelling human reasoning.

We know (and knew in 1968) from much linguistic and philosophical study of conditionals that logically naive undergraduate subjects should not be expected to interpret Wason’s rule as a classical logical conditional (material implication, horseshoe, à la Grice, _sans_ implicature).

Wason also knew this, if only from Wetherick.

Rather than rejecting logical form (with the horseshoe) as a basis for analysing this reasoning, we ask why shouldn’t subjects be judged on the correctness of their reasoning according to whatever interpretations they do reasonably adopt?

This is the line of questioning this paper pursues.

In a very real sense, Wason got his own task wrong in stipulating that there was a particular ‘obviously correct’ answer.

By the lights of the commonest interpretation of the experimental material by undergraduate subjects as a defeasible rule robust to exceptions, the ‘competence’ answer would be to respond that no combination of card choices can falsify the rule, because any possible counterexamples are indistinguishable from exceptions.

And no finite combination of choices can prove the rule is true.

Alternatively, subjects with other plausible interpretations of the task and rule might reasonably want to respond that several alternative sets of cards would test the rule equally well, and this again is not an available response.

There are of course many psychological reasons why we should not expect subjects to make these kinds of responses even if they were offered as possibilities.

There are strong demand characteristics and authority relations in the experimental situation, and besides, subjects are not accustomed to reflecting on their language use and lack a vocabulary for talking about and distinguishing the elementary semantic concepts which are required to express these issues.

Taking interpretation seriously does not mean we thereby assume reasoning is perfect, nor that we reject classical logic as one (possibly educationally important) logical model.

But the unargued adoption of classical logic as a criterion of correct performance is thoroughly anti-logical. In our discussion we review some of the stances towards logic exhibited by the prominent cognitive theories that have made claims about the selection task, and appraise them from the current viewpoint.

The empirical investigation that ensued from Wason’s original experiment can be seen as a search for contents of rules which make the task ‘easy’ or ‘hard’ according to the classical competence criterion. Differences in accuracy of reasoning are then explained by various classifications of content.

We argue here that by far the most important determinant of ease of reasoning is whether interpretation of the rule assigns it descriptive or deontic form, and we explain the effect of this interpretative choice in terms of the many problems descriptive interpretation creates in this task setting, as contrasted with the ease of reasoning in this setting with deontic interpretations.

Descriptive conditionals describe states of affairs and are therefore true or false as those states of affairs correspond to the conditionals’ content.

Deontic conditionals state how matters should be according to some (legal) law or regulation, or preference.

The semantic relation between sentence and case(s) for deontics is therefore quite different than for descriptives.

With descriptives, sets of cases may make the conditional true, or make it false.

With deontics, cases individually conform or not, but they do not affect the status of the law (or preference, or whatever). This is of course a crude specification of the distinction.

We shall have some more specific proposals to make below. But it is important for the empirical investigation to focus on these blunt differences that all analyses of the distinction agree on.

Our experiments do not seek to resolve fine differences between semantic analyses, but to show the empirical importance of the broad semantic categories.

In English, the semantic distinction between descriptives and deontics is not reflected simply on the surface of sentences.

Deontics are often expressed using subjunctives or modals (should, ought, must) but are equally often expressed with indicative verbs.

It is impossible to tell without consultation of context, whether a sentence such as

In the UK, vehicles drive on the left.

is to be interpreted descriptively or deontically—as a generalisation or a legal prescription.

Conversely, subjunctive verbs and modals are often interpreted descriptively -- e.g., in the sentence

If it’s 10.00 a.m., that should (must) be John.

-- said on hearing the doorbell, the modal expresses an inference about a description.

This means that we as experimenters cannot determine this semantic feature of subjects’ interpretation of conditionals simply by changing auxiliary verbs in rules.

A combination of rule, content, and subjects’ knowledge influences whether they assign a deontic or descriptive form.

For example, when the original ‘abstract’ (that is, descriptive) form of the selection task proved so counter-intuitively hard, attention rapidly turned to finding materials that made the task easy.

Johnson-Laird, Legrenzi, and Legrenzi  showed that a version of the task using a UK postal regulation --

If a letter has a second class stamp, it is left unsealed.

-- produced near-ceiling performance.

Though they described the facilitation in terms of familiarity, we believe that what was critical was that the rule, though stated indicatively, was interpreted deontically by their knowledgeable subjects.

The same rule was later found by Griggs and Cox to fail to facilitate the performance of American subjects unfamiliar with the postal regulation.

Again we believe that this was because such subjects, lacking contextual knowledge, did not interpret the rule deontically but as a descriptive generalisation.

What is critical, we argue, is not familiarity per se but deontic interpretation.

Wason and Green similarly showed that a rule embedded in a ‘production-line inspection’ scenario also produced good performance.

This rule was also deontic—about what manufactured items ought to be like -- e.g.,

If the wool is blue, it must be 4 ft. long.

Griggs and Cox showed that reasoning about a drinking age law was also easy.

Cheng and Holyoak developed the theory that success on deontic selection tasks was based on pragmatic reasoning schemas.

Although they present this theory as an alternative to logic-based theories it arguably presents a fragmentary deontic logic with some added processing assumptions (theorem prover) about the ‘perspective’ from which the rule is viewed.

However, Cheng and Holyoak did not take the further step of analysing abstractly the contrasting difficulties which descriptive conditionals pose in this task.

Cosmides and Tooby went on to illustrate a range of deontic materials which produce ‘good’ reasoning, adding the claim that this facilitation only happened with social contract rules.

Cosmides and her collaborators used the argument that only social contract material was easy, to claim that humans evolved innate modular ‘cheating detector algorithms’ which underpin selection task performance on social contract rules.

Recent work has extended the evolutionary account by proposing a range of detectors beyond cheating detectors which are intended to underpin reasoning with, for example, precautionary conditionals (Fiddick, Cosmides, and Tooby).

Cummins has argued against this proliferation that the innate module concerned is more general and encompasses all of deontic reasoning.

Our logical analysis of the selection task will show that once close attention is paid to the logical differences between descriptive and deontic tasks, none of this evidence can bear either way on arguments about innateness or evolution.

The reasoning task with descriptives is simply harder than that with deontics because it engenders complex conflicts of interpretation in the context of the selection task. These observations of good reasoning with deontic conditionals and poor reasoning with descriptive conditionals were not classified as such in this literature at the time.

They were rather reported as effects of content on reasoning with rules of the same logical form. Cosmides and Tooby are explicit about logic being their target.

On this view [the view Cosmides and Tooby attack], reasoning is viewed as the operation of content-independent procedures, such as formal logic, applied impartially and uniformly to every problem, regardless of the nature of the content involved. (Cosmides & Tooby, 1992, p. 166)

Johnson-Laird equally does not allow that content can affect inference through interpretation’s effect on form.

Few select the card bearing 7, even though, if it had a vowel on its other side, it would falsify the rule.

People are much less susceptible to this error of omission when the rules and materials have a sensible content, e.g. when they concern postal regulations.

Hence the content of a problem can affect reasoning, and this phenomenon is contrary to the notion of formal rules of inference.

Wason himself, discussing the Wason and Shapiro Manchester/train thematic problem, rejects the idea that there are structural differences between thematic and abstract tasks.

The thematic problem proved much easier than a standard abstract version which was structurally equivalent.

Nor were the implications of the difference in tasks (“test whether the rule is true or false” vs. “find out whether any cases are breaking the rule”) ever systematically explored.

Our proposal about the selection task at its simplest is that, under descriptive interpretations, multiple asymmetrical relations between sets of cases play roles in determining the truth value of the rule, and it is not even clear whether the compliance or non-compliance of cases alone can make the rule, as interpreted, true or false.

Under deontic interpretation, in contrast, the relation between each case and the rule is independent of the relation between other cases: cases comply or not. Case compliance has no impact on the status of the rule.

These blunt semantic differences mean that the original descriptive (abstract) task poses many difficulties to naive reasoners not posed by the deontic task. Previous work has pointed to the differences between the deontic and descriptive tasks (e.g., Oaksford and Chater; Manktelow and Over).

What is novel here is the derivation of a variety of particular difficulties to be expected from the interaction of semantics and task, and the presentation of an experimental program to demonstrate that subjects really do experience these problems.

Deriving a spectrum of superficially diverse problems from a single semantic distinction supports a powerful empirical generalisation about reasoning in this task that had been missed, and an explanation of why that generalisation holds. It also strongly supports the view that subjects’ problems are highly variable and so reveals an important but much neglected level of empirical analysis.

 It is important to distinguish coarser from finer levels of semantic analysis in understanding our predictions. At finer levels of analysis, we will display a multiplicity of interpretative choices and insist on evidence that subjects adopt a variety of them—both across subjects and within a single subject’s episodes of reasoning.

At this level we certainly do not predict any specific interpretation. At coarser levels of analysis, such as between truth-functional and non-truth-functional conditionals, and between descriptive and deontic conditionals, it is possible to predict highly specific consequences of adopting one or the other kind of reading in different versions of the task, and to show that these consequences are evident in the data. If they do appear as predicted, then that provides strong evidence that interpretative processes are driving the data. In fact, many of these consequences have been observed before but have remained unconnected with each other, and not appreciated for what they are: the various consequences of a homogeneous semantic distinction. Semantics supplies an essential theoretical base for understanding the psychology of reasoning.

The plan of the essay is as follows.

We begin by presenting in the next section what we take to be essential about a modern logical approach to such cognitive processes as are invoked by the selection task. The following section then uses this apparatus to show how the logical differences between descriptive and deontic selection tasks can be used to make predictions about problems that subjects will have in the former but not the latter. The following section turns these predictions into several experimental conditions, and presents data compared to Wason’s original task as baseline. Finally, we discuss the implications of these findings for theories of the selection task and of our interpretative perspective for cognitive theories more generally.

The selection task is concerned with reasoning about the natural language conditional ‘if’.

The reasoning patterns that are valid for this expression can only be determined after a logical form is assigned to the sentence in which this expression occurs.

The early interpretations of the selection task all assumed that the logical form assigned to ‘if’ should be the connective → or rather the horseshoe with the semantics given by classical propositional logic.

We want to argue that this easy identification is not in accordance with a modern conception of logic.

By this, we do not just mean that modern logic has come up with other competence models beside classical logic.

Rather, the easy identification downplays the complexity of the process of assigning logical form. In a nutshell, modern logic sees itself as concerned with the mathematics of reasoning systems. It is related to a concrete reasoning system such as classical propositional logic as geometry is related to light rays. It is impossible to say a priori what is the right geometry of the physical world; however, once some coordinating definitions (such as ‘a straight line is to be interpreted by a light ray’) have been made, it is determined which geometry describes the behaviour of these straight lines, and hypotheses about the correct geometry become falsifiable. Similarly, it does not make sense to determine a priori what is the right logic.

This depends on one’s notion of truth, semantic consequence, and more.

But once these parameters have been fixed, logic, as the mathematics of reasoning systems, determines what is and isn’t a valid consequence.

In this view it is of prime importance to determine the type of parameter that goes into the definition of what a logical system is, and, of course, the psychological purposes that might lead subjects to choose one or another setting in their reasoning.

This parameter-setting generally involves as much reasoning as does the reasoning task assigned to the subject. We are thus led to the important distinction between reasoning from an interpretation and reasoning for an interpretation. The former is what is supposed to happen in a typical inference task: given premises, determine whether a given conclusion follows.

But because the premises are formulated in natural language, there is room for different logical interpretations of the given material and intended task. Determining what the appropriate logical form is in a given context itself involves logical reasoning which is far from trivial in the case of the selection task.

So what are the important parameters in a logical system?

Motivated by the great variety of logical systems, logicians have tried to come up with a general definition which encompasses them all.

Two main approaches can be distinguished here, one syntactic, and one semantic.

On the SYNTACTIC approach (for which see Gabbay), a logical system is defined by a derivability relation ⊢ between sentences satisfying certain minimal properties, such as, for example, that ϕ ⊢ ϕ.

This view captures a great many logical systems, based on vastly different principles.

However, the communis opinio is that the syntactic approach is still not general enough.

Take the example of the inference ϕ ⊢ ϕ (called Identity or Reflexivity), which is generally considered to be a minimum requirement for a logical system.

Semantically speaking, it expresses that one considers the same type of models both on the left side and the right side of the turnstile.

But there exist logics for which this does not apply, e.g. logics where the models appearing on the right side are the result of operations applied to models on the left side.

This shows that a syntactic characterisation of logic is likely to be artificial.

Intuitions reside at the semantic level.

The more general notion of ‘logical system’ is therefore semantic, in the sense that it involves the interplay between a language and its interpretation.

Let N be (a fragment of) natural language.

Assigning logical form to expressions in N at the very least involves the following steps.

We need a formal language L into which N is translated.

We need the expression in L which translates an expression in N.

We need the semantics S for L.

We need the definition of "validity" of arguments ψ1, ..., ψn / ϕ, with premises "ψi" and conclusion "ϕ".

We can see from this list that assigning a Griceian "logical form" (even if Grice saves, there's no free lunch) is a matter of setting parameters.

For each item on the list, there are many possibilities for variation.

Consider as a non-trivial example, the choice of a formal language.

One possibility here is the ordinary recursive definition, which has clauses like

ii. if p, q

are formulas, then so is

iii. p ⊃ q

thus allowing for iteration of the conditional.

However, another possibility is where formation of

iii. p ⊃ q

is restricted to atomic p and q which do not themselves contain "⊃".

Or a language may contain two ⊃-like operators, one of which can be iterated and one of which can’t.
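The contrast between the two formation rules can be made concrete. Below is a minimal sketch, assuming a toy language containing only atoms and a single conditional; the class and function names are hypothetical. In such a language, "p and q do not themselves contain ⊃" amounts to requiring that both be atoms.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class If:
    # Stands for "antecedent ⊃ consequent".
    antecedent: "Formula"
    consequent: "Formula"


Formula = Union[Atom, If]


def well_formed_iterable(f):
    """Ordinary recursive definition: any two formulas may flank the conditional."""
    if isinstance(f, Atom):
        return True
    return well_formed_iterable(f.antecedent) and well_formed_iterable(f.consequent)


def well_formed_restricted(f):
    """Restricted definition: the conditional may join atoms only, so it cannot be iterated."""
    if isinstance(f, Atom):
        return True
    return isinstance(f.antecedent, Atom) and isinstance(f.consequent, Atom)


p, q, r = Atom("p"), Atom("q"), Atom("r")
simple = If(p, q)          # p ⊃ q
nested = If(If(p, q), r)   # (p ⊃ q) ⊃ r

print(well_formed_iterable(nested))    # True
print(well_formed_restricted(nested))  # False
print(well_formed_restricted(simple))  # True
```

The same string of symbols is thus a formula of one language and not of the other, which is why the choice of formal language counts as a genuine parameter.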

When interpreting an ‘if’ in English, as Grice's infame goes, a choice has to be made as to which formal expression ‘if’ is to be mapped onto.

In fact, the formally non-iterable 'if' is in many ways a more appropriate model for the natural language conditional than the usual iterable construct.

A possible rejoinder could be as follows.

Granted that a natural language conditional is hardly ever iterated, while keeping the same meaning, surely one is entitled to a bit of idealisation to smooth the formal development.

The trouble is that this idealisation imposes a constraint on the semantics.

As one can see from a formula such as

iv. (p ⊃ q) ⊃ r.

-- a ⊃ in the antecedent of another ⊃ makes sense only if the former conditional can be false.

Otherwise the formula would just be equivalent to r.
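This collapse is easy to verify by truth table. The sketch below assumes the classical truth-functional reading of ⊃ purely for the purpose of the check; the helper name `implies` is hypothetical.

```python
from itertools import product


def implies(a, b):
    """Classical truth table for the horseshoe."""
    return (not a) or b


valuations = list(product([False, True], repeat=3))

# Keep only the valuations in which the inner conditional p ⊃ q is true,
# modelling a conditional that "cannot be false".
inner_true = [(p, q, r) for p, q, r in valuations if implies(p, q)]

# On those valuations (p ⊃ q) ⊃ r always has the same value as r.
collapses_to_r = all(implies(implies(p, q), r) == r for p, q, r in inner_true)

# Over all valuations the two can differ, e.g. p true, q false, r false.
differs_somewhere = any(implies(implies(p, q), r) != r for p, q, r in valuations)

print(collapses_to_r)     # True
print(differs_somewhere)  # True
```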

But many natural language conditionals, such as, for example, generic statements cannot be false, which is not to say that they are true in the classical sense.

A Griceian might make the case for the hypothesis that the conditionals occurring in Wason’s task are often interpreted as being of this non-iterable type.

Once one has chosen a formal language, one must provide a definition of "satisfaction" and "truth" (or 'alethic satisfactoriness', as Grice prefers).

We will see that Wason subjects do not automatically assume the classical definition of satisfaction and truth here.

Rather, they are groping to find a definition of truth which is appropriate to the context.

At the most abstract level, the semantics for a language is given by a recursively defined binary relation x ⊨ ϕ, where ϕ is a formula.

Different types of objects can be filled in for x, but the most prominent cases in logical theorising are models (e.g., classical and modal logics), and information states (e.g., dynamic logic).

Models are descriptions of states of affairs or possible worlds, and information states describe the available evidence.

The relation ⊨ can be read as ‘makes true’, 'verifies', or ‘supports’, where the latter reading is of course more appropriate if the left argument of ⊨ is an information state.

The relation may also contain an implicit numerical argument, indicating, say, DEGREE of support.

Even when the operator is read as ‘makes true’ or 'verify', or 'confirm', or 'support', this should not be taken as implying that ‘true’ has a classical meaning here, satisfying not-false = true.

That depends entirely on the nature of the recursive definition of ⊨, e.g. the clauses for the negation of a formula.

Moreover, even if not-false ≠ true, this does not imply that there exists a third truth value different from true and false, since the semantics might be non-truth-functional altogether.

But for some cases, mostly those in which we have only partial descriptions of the world, a three-valued logic may be appropriate.

It is at this level that the important distinction between descriptive and deontic can be made.

This distinction plays a prominent role in our analyses of the experimental data.

Intuitively, one may say that a descriptive conditional can be true or false on a given domain.

A single counterexample, as opposed to an exception, suffices to "falsify" the conditional.

A deontic conditional has different logical properties.

Examples may or may not comply with the conditional, but by themselves examples cannot make the conditional true or false.

The name ‘deontic’ derives from one characteristic use of conditionals of this type, as formalisations of norms.

A violation of the norm does not thereby establish that the norm is false — indeed the latter expression makes no sense.

But, logically speaking, something much more general is going on.

Perhaps surprisingly, the definition of the validity of an argument is also an independent parameter.

The classical definition of "validity" -- an argument is valid if the conclusion is true in all models of the premises -- is one possibility.
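For a propositional language with finitely many atoms, this definition can be checked by brute force. The following is a minimal sketch, assuming models are simply assignments of truth values to atoms and formulas are represented as Python functions from a model to a truth value; all names are hypothetical.

```python
from itertools import product


def models(atoms):
    """Enumerate every assignment of truth values to the given atoms."""
    for values in product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, values))


def classically_valid(premises, conclusion, atoms):
    """Valid iff the conclusion is true in every model that makes all premises true."""
    return all(
        conclusion(m)
        for m in models(atoms)
        if all(premise(m) for premise in premises)
    )


def if_p_q(m):
    return (not m["p"]) or m["q"]


def p(m):
    return m["p"]


def q(m):
    return m["q"]


modus_ponens = classically_valid([if_p_q, p], q, ["p", "q"])
affirming_the_consequent = classically_valid([if_p_q, q], p, ["p", "q"])

print(modus_ponens)              # True
print(affirming_the_consequent)  # False
```

Both premises and conclusion are evaluated against the same list of models here, which is exactly the assumption the classical definition builds in.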

We have already pointed out that this assumes that premises and conclusion are evaluated with respect to the same models.

This however is not always the case.

The classical notion of validity may also give way to a non-monotonic notion of validity, the general form of which is as follows.

An argument is valid if the conclusion is true in all preferred models of the premises.

One concrete instance of this is so-called closed world reasoning, in which one assumes (roughly speaking) that all statements are false which are not forced to be true by the premises.

This type of reasoning is "non-monotonic" (or 'defeasible', as Grice, following Hart, prefers) in the sense that the addition of a premise α to a given set of premises may destroy inferences previously drawn from that set alone.

One example of such closed world reasoning is the often observed conversion of the conditional.

(p ⊃ q) / (q ⊃ p)
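One hedged way to see how conversion can come out valid is to fix a notion of preferred model and compare it with the classical definition. The sketch below assumes a particular preference that is an illustration only, not the formal definition used in the work cited here: among models that agree on p, prefer those that make q false whenever the premises allow it. All names are hypothetical.

```python
from itertools import product


def all_models():
    return [{"p": p, "q": q} for p, q in product([False, True], repeat=2)]


def if_p_q(m):
    return (not m["p"]) or m["q"]


def if_q_p(m):
    return (not m["q"]) or m["p"]


def models_of(premises):
    return [m for m in all_models() if all(f(m) for f in premises)]


def preferred(models):
    """Drop a model if another model agrees on p but makes q false (assumed preference)."""
    def beaten(m):
        return any(o["p"] == m["p"] and m["q"] and not o["q"] for o in models)
    return [m for m in models if not beaten(m)]


def classically_valid(premises, conclusion):
    return all(conclusion(m) for m in models_of(premises))


def closed_world_valid(premises, conclusion):
    return all(conclusion(m) for m in preferred(models_of(premises)))


print(classically_valid([if_p_q], if_q_p))   # False
print(closed_world_valid([if_p_q], if_q_p))  # True
```

Classically the model with p false and q true refutes the converse. Under the assumed preference that model is discarded, because nothing in the premises forces q to be true when p is false, and the converse holds in every model that remains.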

The reader may think that the above variety is mainly due to Grice inventing new exotic but perfectly inapplicable systems.

This is not so.

As soon as logic left the confines of mathematics and turned to the formalisation of reasoning in natural language and of what is known as ‘common sense reasoning’, it was noticed that the parameter choices which worked well for mathematics, were unnecessarily restrictive in contexts closer to daily life; and also that there is no single setting which suffices for all such applications. In a nutshell, therefore, the interpretative problem facing a subject in a reasoning task is to provide settings for all these parameters—this is what is involved in assigning logical form.

It has been the bane of the psychology of reasoning that it operates with an oversimplified notion of logical form.

Typically, in the psychology of reasoning assigning logical form is conceived of as translating a natural language sentence into a formal language whose semantics is supposed to be given, but this is really only the beginning: it fixes just one parameter.

We do not claim that subjects know precisely what they are doing; that is, most likely subjects do not know in any detail what the mathematical consequences of their choices are.

We do claim, however, that subjects worry about how to set the parameters, and below we offer data obtained from tutorial dialogues to corroborate this claim.

This is not a descent into post-modern hermeneutics.

This doomful view may be partly due to earlier psychological invocations of interpretational defences against accusations of irrationality in reasoning, perhaps the most cited being Henle.

On that defence, there exist no errors of reasoning, only differences in interpretation.

It is possible however to make errors in reasoning: the parameter settings may be inconsistent, or a subject may draw inferences not consistent with the settings.

From the point of view of the experimenter, once all the parameters are fixed, it is mathematically determined what the extension of the consequence relation will be; and the hypotheses on specific parameter settings therefore become falsifiable.

In particular, the resulting mathematical theory will classify an infinite number of reasoning patterns as either valid or invalid. In principle there is therefore ample room for testing whether a subject has set his parameters as guessed in the theory: choose a reasoning pattern no instance of which is included in the original set of reasoning tokens. In practice, there are limitations to this procedure because complex patterns may be hard to process.

Be that as it may, it remains imperative to obtain independent confirmation of the parameter settings by looking at arguments very different from the original set of tokens.

This is for instance our motivation for obtaining data about the meaning of negation in the context of the selection task (more on this below): while not directly relevant to the logical connectives involved in the selection task, it provided valuable insight into the notions of truth and falsity.

Psychology is in some ways harder once one acknowledges interpretational variety, but given the overwhelming evidence for that variety, responding by eliminating it from psychological theory is truly the response of the drunk beneath the lamp post. In fact, in some counterbalancing ways, psychology gets a lot easier because there are so many independent ways of getting information about subjects’ interpretations—such as tutorial dialogues.

Given the existence of interpretational variety, the right response is richer empirical methods aimed at producing convergent evidence for deeper theories which are more indirectly related to the stimuli observed. What the richness of interpretation does mean is that the psychology of reasoning narrowly construed has less direct implications for the rationality of subjects’ reasoning.

What was right about the earlier appeals to interpretational variation is that it indeed takes a lot of evidence to confidently convict subjects of irrationality. It is necessary to go to great lengths to make a charitable interpretation of what they are trying to do and how they understand what they are supposed to do, before one can be in a position to assert that they are irrational.

Even when all this is done, the irrational element can only be interpreted against a background of rational effort.

We now apply the preceding considerations to the process of assigning a logical form to the conditional occurring in the standard selection task.

Wason, following Grice, as it were, has in mind the interpretation of 'if' as ⊃.

The interpretation is truth-functional, because the 4 cards must decide the truth value of the 'if' utterance.

The logic is classical, because the task is to determine the truth or falsity of the 'if' utterance, implying that there is no other option.

Furthermore, the task is to evaluate the 'if' utterance with respect to the 4 cards.

So, if we denote the model defined by the 4 cards as A, and the rule by "ϕ", the task can be succinctly described as the following question.

What further information about the model A must one obtain to be able to judge whether

A |= ϕ

where "|=" denotes the classical satisfaction relation?
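The question just posed can be made concrete with a small enumeration. The following is only a sketch under stated assumptions: the alphabets (letters A and K, numbers 4 and 7), the four visible faces, and all function names are ours, chosen for illustration. A card needs turning exactly when some possible hidden side would make the 'if' utterance, read as ⊃, false of that card.

```python
# Assumed alphabets (illustrative): two letters, two numbers.
LETTERS = ["A", "K"]
NUMBERS = ["4", "7"]
VISIBLE = ["A", "K", "4", "7"]   # the four visible faces

def is_vowel(side):
    return side in ("A", "E", "I", "O", "U")

def is_even(side):
    return side.isdigit() and int(side) % 2 == 0

def card_satisfies_rule(side1, side2):
    """Material reading of 'vowel on one side -> even number on the other'."""
    has_vowel = is_vowel(side1) or is_vowel(side2)
    has_even = is_even(side1) or is_even(side2)
    return (not has_vowel) or has_even

def hidden_options(visible):
    # A letter face hides a number, and a number face hides a letter.
    return NUMBERS if visible in LETTERS else LETTERS

def must_turn(visible):
    # Informative only if some completion of the card falsifies the rule.
    return any(not card_satisfies_rule(visible, hidden)
               for hidden in hidden_options(visible))

selection = [card for card in VISIBLE if must_turn(card)]
print(selection)  # ['A', '7']
```

On this reading the K and 4 cards carry no information about A |= ϕ, since no completion of either card can falsify the rule.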

All this is of course obvious from the experimenter’s point of view.

But the important question is whether this interpretation is accessible to the subject.

Given the wide range of other meanings of 'if', the subject must infer from the instructions, and possibly from contextual factors, what the m-intended meaning (in Grice's sense) of the 'if' utterance is (Grice distinguishes, of course, between what an utterer means and what the utterance means).

In Wason's case, we should wonder what WASON MEANS.

Reading very carefully, and bracketing his own most prominent meanings for the key terms involved, the Wason subject may deduce that the 'if' utterance is to be interpreted truth-functionally, with a classical algebra of truth values, hence with ⊃ as resulting logical form.

Actually, the situation is more complicated.

But this ‘bracketing’ is what subjects with little logical training typically find hard to do, and we now turn to their plight.

The Wason subject first has to come up with a formal language in which to translate the rule.

It is usually assumed that the selection task is about propositional logic.

In the case of Wason's abstract 'if' utterance one actually needs predicate logic (which is a good thing, because Grice studied 'formal devices' representing (x), (Ex) and (ix)).

And we say the 'if' utterance needs predicate or quantificational logic mainly because of the occurrence of the expression ‘one side ... other side’.

One way (although not the only one) to formalise the rule in predicate logic uses the following predicates

V(x, y) -- x is on the visible side of card y.
I(x, y) -- x is on the invisible side of card y.
O(x)  -- x is a vowel.
E(x) -- x is an even number.

and the rule is then translated as the following pair

∀c(∃x(V(x, c) ∧ O(x)) → ∃y(I(y, c) ∧ E(y)))

∀c(∃x(I(x, c) ∧ O(x)) → ∃y(V(y, c) ∧ E(y)))

This might seem pedantry, were it not for the fact that some subjects go astray at this point, replacing the second statement by a bi-conditional

∀c(∃x(I(x, c) ∧ O(x)) ↔ ∃y(V(y, c) ∧ E(y))).

Or even a reversed conditional

∀c(∃x(V(x, c) ∧ E(x)) → ∃y(I(y, c) ∧ O(y))).

This very interesting phenomenon will be studied further.
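To see what is at stake in these alternative translations, one can compute which cards could falsify the rule under each of them. This is a sketch only: we assume the first statement of the pair is kept in every case, that the letters are A and K and the numbers 4 and 7, and the function and dictionary names are ours.

```python
VOWELS, EVENS = set("AEIOU"), set("02468")
# Assumed alphabets: each visible face mapped to its possible hidden faces.
HIDDEN = {"A": "47", "K": "47", "4": "AK", "7": "AK"}

def vis_vowel_to_inv_even(v, i):    # visible vowel -> invisible even
    return (v not in VOWELS) or (i in EVENS)

def inv_vowel_to_vis_even(v, i):    # invisible vowel -> visible even
    return (i not in VOWELS) or (v in EVENS)

def inv_vowel_iff_vis_even(v, i):   # invisible vowel <-> visible even
    return (i in VOWELS) == (v in EVENS)

def vis_even_to_inv_vowel(v, i):    # visible even -> invisible vowel
    return (v not in EVENS) or (i in VOWELS)

READINGS = {
    "intended pair": [vis_vowel_to_inv_even, inv_vowel_to_vis_even],
    "biconditional second statement": [vis_vowel_to_inv_even, inv_vowel_iff_vis_even],
    "reversed second statement": [vis_vowel_to_inv_even, vis_even_to_inv_vowel],
}

def cards_to_turn(clauses):
    # A card matters if some hidden face violates at least one clause.
    return [v for v, hidden in HIDDEN.items()
            if any(not all(clause(v, i) for clause in clauses) for i in hidden)]

results = {name: cards_to_turn(clauses) for name, clauses in READINGS.items()}
for name, cards in results.items():
    print(name, cards)
```

Under these assumptions the intended pair singles out A and 7, the biconditional variant A, 4 and 7, and the reversed variant A and 4, so a seemingly small slip in translation already changes which cards are informative.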

For simplicity’s sake we will focus here on the subjects’ problems at the level of propositional logic.

Suppose the Wason subject has chosen some kind of propositional representation ϕ for the rule, in particular for ⊃.

The Wason subject must now decide how to formalise the task itself.

If the subject heeds the instruction to determine whether the rule is true or false, he has to choose the formulation

A |= ϕ?

that we gave above.

But for some subjects this interpretation is not accessible because of the pragmatics of the task.

Is it really believable that the experimenter is in doubt about the truth value of the rule?

Isn’t it more likely that the experimenter (your professor! Grice!) wrote down a true statement — the more so since the background rule (‘letters on one side, numbers on the other side’) must also be taken to be true?

We can provide several examples of subjects with this type of reaction.

Such subjects place the formal representation of the rule to the left of the ‘validity’ symbol, not to the right, as intended.

In other words, they use it as a premise, not as a conclusion to be established or refuted.

Another class of Wason subjects proceed analogously because they believe a conditional allows exceptions, and cannot be falsified by a single counterexample.

These Wason subjects’ concept of conditional is more adequately captured by the following pair of statements:

p ∧ ¬e → q

p′ ∧ ¬q → e

-- where "e" is a proposition letter standing for ‘exception’.

In the second rule, we use p′ rather than p to indicate that perhaps only some cards which satisfy p but not q qualify as bona fide exceptions.

The first condition then says that the rule applies only to non-exceptional cards.

There are no clear falsifying conditions for conditionals allowing exceptions, so these two conditions are best viewed as premises.

This of course changes the task, which is now seen as identifying the exceptions.
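The shift from falsifying to identifying exceptions can be checked mechanically. The sketch below assumes the two premises read p ∧ ¬e → q and p′ ∧ ¬q → e (the arrows and the primed letter are our reading of the formulas), and it treats e and p′ as free proposition letters; all names are illustrative.

```python
from itertools import product

def premises_hold(p, q, e, p_prime):
    rule_for_normal_cards = (not (p and not e)) or q      # p AND NOT e -> q
    exception_clause = (not (p_prime and not q)) or e     # p' AND NOT q -> e
    return rule_for_normal_cards and exception_clause

def allowed_exception_values(p, q):
    """Values of e under which a card with these p, q fits both premises."""
    return {e for e, p_prime in product([False, True], repeat=2)
            if premises_hold(p, q, e, p_prime)}

table = {(p, q): allowed_exception_values(p, q)
         for p, q in product([True, False], repeat=2)}

# No combination of p and q is ruled out by the premises ...
no_card_falsifies = all(len(values) > 0 for values in table.values())
# ... but a card with p and not q is forced to count as an exception.
forced_exception = table[(True, False)] == {True}
print(no_card_falsifies, forced_exception)  # True True
```

Every card is thus compatible with the premises, and the only thing a p-and-not-q card can do is fix the value of e, which is why the task turns into one of locating exceptions.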

There is a more general phenomenon at work here, which deserves a section of its own.

It was noticed early on that facilitation in task performance could be obtained by changing the abstract rule to a familiar rule such as:

If you want to drink alcohol, you have to be over 18

-- though the deontic nature of the rule was not initially seen as important.

This observation was one reason why formal logic was considered to be a bad guide to actual human reasoning.

Logic was not able to explain how statements supposedly of the same "logical form" lead to vastly different performance—or so the argument went.

However, using the expanded notion of logical form given above one can see that the abstract rule and the deontic rule are not of the same form.

One difference is in the structure of the models associated to deontic statements.

We provide one especially simple definition of a deontic conditional; there are many variants, but this one will suffice for our purposes. A model A is given by a set of ‘worlds’ or ‘cases’ W, together with a relation R(v, w) on W intuitively meaning: ‘v is an ideal counterpart to w’. That is, if R(v, w), then the norms posited in w are never violated in v. With this understanding, we may define the semantics of a deontic conditional p ≺ q by putting, for any world w ∈ W,

w ⊩ p ≺ q iff for all v such that R(v, w): v ⊩ p implies v ⊩ q,

and

A ⊩ p ≺ q iff for all w in W: w ⊩ p ≺ q.

The definition thus introduces an additional parameter R. This allows an interesting bifurcation in understanding the task.

We first reformulate the selection task in terms of the semantics just given. Suppose for simplicity that the letters can only be A, K, and the numbers only 4, 7. Define a model as follows. There are four worlds corresponding to the visible sides of the cards; denote these by A, K, 4, 7. Then there are eight worlds corresponding to the possibilities for what is on the invisible side; denote these by < A, 4 >, < A, 7 >, < K, 4 >, < K, 7 >, < 4, A >, < 4, K >, < 7, A > and < 7, K >. Intuitively, the initial set of the four worlds A, K, 4, 7 comprises the incomplete information states, which allow eight completions.

This gives as domain W of the model twelve worlds in all. The ‘supports’ relation ⊩ is defined on W as follows. Let p be the proposition ‘the card has a vowel’, and q the proposition ‘the card has an even number’. Then we have

1. A ⊩ p, K ⊩ ¬p, p undecided on 4 and 7
2. 4 ⊩ q, 7 ⊩ ¬q, q undecided on A and K
3. < A, 4 > ⊩ p, q, < A, 7 > ⊩ p, ¬q, ... , < 4, A > ⊩ p, q, < 4, K > ⊩ ¬p, q, etc.

If the rule is understood descriptively and as applying to the four cards only, it is represented by a material implication, and hence it is interpreted relative to exhaustive and consistent sets of complete worlds, such as {< A, 4 >, < K, 4 >, < 4, K >, < 7, A >}, etc. In this case one may ask whether the rule is true or false on such a set of worlds.

If however the rule is read deontically, it is of the form p ≺ q, and hence the model with domain the set W together with the predicate R is necessary. Define R on W by R(< A, 4 >, A), R(< 7, K >, 7), ¬R(< A, 7 >, A), ¬R(< 7, A >, 7), R(< K, 4 >, K), R(< K, 7 >, K), R(< 4, A >, 4) and R(< 4, K >, 4). The structure (W, R, ⊩) then satisfies p ≺ q; that is, no amount of evidence from card-turning can make the rule false. Turning 7 to find A just means that < 7, A > is not an ideal counterpart to 7.

K. Stenning, M. van Lambalgen / Cognitive Science 28 (2004) 481–529

This is actually a general phenomenon, which is not restricted to just conditionals. As we shall see, if one gives subjects the following variation on the selection task

There is a vowel on one side of the cards and there is an even number on the other side

they typically respond by turning the A and 4 cards, instead of just replying ‘this statement is false of these four cards’ (see below, Section 3.1.5). One reason for this behaviour is given by subject 22 in Section 2.4 below, who now sees the task as checking those cards which could still satisfy the conjunctive rule, namely A and 4, since K and 7 do not satisfy in any case.
Such a response is only possible if one has helped oneself to a predicate such as R. Formally, one may define a deontic conjunction of p and q by putting, for all w in W: w supports the deontic conjunction iff for all v such that R(v, w): v ⊩ p ∧ q. In this case the worlds < K, 4 > and < K, 7 > are both non-ideal counterparts to the partial world K, and similarly for the partial world 7. In other words, no completion of K or 7 can be ideal, and therefore the subject has to turn only A and 4, to see whether perhaps these worlds are ideal.

What is interesting is that, viewed in this light, there is a difference in complexity between the descriptive and the deontic cases. In the latter case, one can determine the extension of R by checking the cards one at a time. There is no interference: whether the partial world A can be extended to an ideal world is independent of whether 7 can be so extended. In the descriptive case there is a certain dependence between card choices. A subject may argue: ‘If I turn A and find a 7, I know that the rule is false, so I do not have to select any other cards. The same argument holds for the 7. So how can I make a unique choice?’ A particularly clear instance of this mental conflict is provided by subject 10 in Section 2.2. Rules which are interpreted descriptively thus present greater processing difficulties than rules which are interpreted deontically, and we contend that it is this processing difficulty which explains part of what have come to be called ‘content effects’.
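The twelve-world model can be written out directly. What follows is a sketch, not a definitive implementation: the Python encoding (tuples for complete worlds, None for ‘undecided’) and the function names are ours, and R is stored as pairs (v, w) in the order of the definition, with v the ideal counterpart of the partial world w.

```python
LETTERS, NUMBERS = "AK", "47"
PARTIAL = ["A", "K", "4", "7"]                       # visible faces only
COMPLETE = [(v, h) for v in PARTIAL
            for h in (NUMBERS if v in LETTERS else LETTERS)]
W = PARTIAL + COMPLETE                               # twelve worlds in all

def p(w):
    """'The card has a vowel'; None where the world leaves p undecided."""
    if isinstance(w, tuple):
        return "A" in w
    return (w == "A") if w in LETTERS else None

def q(w):
    """'The card has an even number'; None where undecided."""
    if isinstance(w, tuple):
        return "4" in w
    return (w == "4") if w in NUMBERS else None

# The positive instances of R, as (ideal counterpart, partial world).
R = {(("A", "4"), "A"), (("7", "K"), "7"),
     (("K", "4"), "K"), (("K", "7"), "K"),
     (("4", "A"), "4"), (("4", "K"), "4")}

def supports_deontic_conditional(w):
    # w supports p < q iff every ideal counterpart of w with p also has q.
    return all((not p(v)) or q(v) for (v, u) in R if u == w)

rule_holds = all(supports_deontic_conditional(w) for w in W)

def completions(u):
    return [w for w in COMPLETE if w[0] == u]

def unsettled(u, compliant):
    # The visible face alone does not settle whether the card complies.
    return len({compliant(w) for w in completions(u)}) == 2

turn_conditional = [u for u in PARTIAL
                    if unsettled(u, lambda w: (not p(w)) or q(w))]
turn_conjunction = [u for u in PARTIAL
                    if unsettled(u, lambda w: p(w) and q(w))]
print(rule_holds, turn_conditional, turn_conjunction)
```

With R fixed in this way the deontic conditional holds at every world whatever the cards turn out to be, and each card can be assessed on its own: the cards whose ideal status is still open are A and 7 for the conditional, and A and 4 for the conjunction.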

Above we have seen that subjects may be in doubt about the structure of the relevant model: whether it consists of cards, or of cards plus a distinguished predicate. An orthogonal issue is which set of cards should form the domain of the model. The experimenter intends the domain to be the set of four cards. The subjects may not grasp this; indeed there are good reasons why they shouldn’t. Section 2.1.6 gives some reasons why natural language use suggests considering larger domains, of which the four cards shown are only a sample, and it presents a dialogue with a subject who has a probabilistic concept of truth that comes naturally with this interpretation of the domain.

This brings us to the notion of ‘truth value on a model’. The experimenter intends the subject to operate with the classical algebra of truth values. However, especially when operating with conditionals, the subject tends to set this parameter differently. We have observed repeatedly (see Section 2.1.2) that subjects operate with a logic in which not-false is not the same as true. Theoretically, this logic could be one of a family of three-valued logics, where ‘not-false’ includes the possibility ‘undecided’, or, more likely, it could be of the intuitionistic variety, where, very roughly speaking, A → B is true if there is a necessary connection (e.g., a proof) linking A and B. In the absence of further data it is hard to tell which logic is applied, if any, but it is worth noting that a conditional is often felt to have a lawlike character (see subject 13 in Section 2.1.2, and the discussion in Section 2.1.6); if that is so, the truth of the conditional cannot be established by pointing to the nonexistence of counterexamples.

So far we have discussed the semantic relation x |= ϕ for the case where x is a model of some kind. When introducing this relation, we mentioned an alternative choice for x, namely an information state providing evidence relevant for ϕ. A quantitative information-theoretic approach to the selection task has been given by Oaksford and Chater (1994), who argue that turning the 4 card actually yields more information than turning the 7 card. We will not repeat their arguments here, nor the criticism we voiced in Stenning and van Lambalgen (2001), but we want to note in support of Oaksford and Chater (1994), that some subjects entertain both choices for x in x |= ϕ simultaneously, and then decide that in the context of this task it is best to go the information-theoretic route. Section 2.2.1 contains several examples of subjects exhibiting this pattern.
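To illustrate the three-valued option mentioned above, here is a sketch using the strong Kleene connectives. That choice is our assumption: the text does not commit to any particular member of the family. ‘Undecided’ is encoded as None, and the values of p (vowel) and q (even number) on the four unturned cards are those of a task with cards A, K, 4 and 7.

```python
def k_not(a):
    return None if a is None else (not a)

def k_or(a, b):
    # Strong Kleene disjunction: true if either disjunct is true,
    # false if both are false, undecided otherwise.
    if a is True or b is True:
        return True
    if a is False and b is False:
        return False
    return None

def k_implies(a, b):
    return k_or(k_not(a), b)

# (p, q) on each unturned card; None marks what the visible face leaves open.
UNTURNED = {"A": (True, None), "K": (False, None),
            "4": (None, True), "7": (None, False)}

status = {card: k_implies(p, q) for card, (p, q) in UNTURNED.items()}
undecided = [card for card, value in status.items() if value is None]
print(status)
print(undecided)  # ['A', '7']
```

On this reading the conditional is already true of K and 4 before any card is turned, while A and 7 are ‘not false’ without being true, which is one way of making the gap between not-false and true precise.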

This concludes our survey of what is involved in assigning logical form.

We now turn to the demonstrations that subjects are indeed troubled by the different ways in which they can set the parameters, and that clearer task instructions can lead to fewer possibilities for the settings.

2. Designing experimental interventions

A formal analysis of the semantic and pragmatic complexities of task and rule can suggest origins of subjects’ problems. We now take up the task of turning these hypotheses based on the semantics of the materials and tasks, into experimental manipulations. As a half-way house between semantics and controlled experiment, we report here excerpts from socratic tutorial dialogues to illustrate the kinds of problems subjects experience. Some of these excerpts were reported in Stenning and van Lambalgen (2001). Others are new observations from the same transcripts. Observational studies of externalised reasoning can provide prima facie evidence that these problems actually are real problems for subjects, although there is, of course, the possibility that externalising changes the task. Only controlled experiment can provide evidence that the predicted mental processes actually do take place when subjects reason in the original non-interactive task.

We present these observations of dialogues in the spirit of providing plausibility for our semantically based predictions. We assure the reader that they are representative of episodes in the dialogues, not one-offs. But rather than turn these observations into a quantitative study of the dialogues which would still only bear on this externalised task, we prefer to use them to illustrate and motivate our subsequent experimental manipulations which do bear directly on the original task. We acknowledge that we cannot be certain that our interpretations of the dialogues are correct representations of mental processes; the reader will often have alternative suggestions.
Nevertheless, we feel that the combination of rich naturalistic, albeit selective observations, with controlled experimental data is more powerful than either would be on its own. At the very least, the dialogues strongly suggest that there are multiple possible confusions, and often multiple reasons for making the very same response, and so counsel against homogeneous explanations. Following the theory outlined in Section 1.1, we view these confusions as a consequence of subjects’ trying to fix one of the many parameters involved in deciding upon a logical form.

Here is a list of the problems faced by subjects, as witnessed by the experimental protocols. Illustrations will be provided below.

What is truth?
What is falsity?
Pragmatics: the authority of the source of the rule
Rules and exceptions
Reasoning and planning
Interaction between interpretation and reasoning
Truth of the rule versus ‘truth’ of a case
Cards viewed as a sample from a larger domain
Obtaining evidence for the rule versus evaluation of the cards
Subjects’ understanding of propositional connectives generally

Both the semantic issues facing the subject and the experimental manipulations we design to explore them may appear highly heterogeneous.

Let us reiterate that what integrates the account is the way these apparently miscellaneous factors come together in explaining the dominant observation, namely the contrast between reasoning with descriptively versus deontically interpreted rules. Everything stems from the difference in the basic semantic relation between rule and case under these two major classes of interpretation. Indeed some of the effects we explore have been noted before but have not been taken to be anything more than surface irritants because no overall framework has been available to relate them. A structured specification of the landscape of parameters which have to be set in order to determine logical interpretation is beyond the scope of this paper, but see Stenning and van Lambalgen (submitted) for a formalised account of the main features of the interpretation of the conditional which we take to be central to subjects’ initial understanding of natural language conditionals in this task.

2.1. Subjects’ understanding of truth and falsity

2.1.1. A two-rule task

In an earlier set of experiments (Stenning & van Lambalgen, 2001), we introduced a novel task of presenting two rules, instructing subjects that one is true and the other false, and asking them to seek evidence to decide which is which. The rules were:

1. if there is a U on one side, then there is an 8 on the other side
2. if there is an I on one side, then there is an 8 on the other side

given the background rule that one side contains U or I, and the other side contains 3 or 8. In the tutorial version of this experiment, subjects were presented with real cards lying in front of them on the table. The cards shown were U, I, 8 and 3. We first asked subjects to select cards, then to imagine what could be on the other side, and lastly to turn all cards, after which subjects were given the opportunity to revise their earlier selection. In this case, both U and I carried an 8, 8 carried an I, and 3 a U.

The motivation for introducing this manipulation was two-fold. First, the Bayesian approach due to Oaksford and Chater postulates that in solving the standard Wason task, subjects always compare the rule given to the unstated null hypothesis that antecedent and consequent are independent.

We were thus interested in seeing what would happen if subjects were presented with explicit alternative non-null hypotheses. The classical logical competence model specifies that correct performance is to turn just the 3 card. This card is alone sufficient to identify which rule is true and which false, and is the only such singly sufficient card. In this task, the 3 card therefore offers greatest information gain and so presents a useful exploration of the Bayesian approach independent of existing observations.

Second, and more importantly from our perspective, explicitly telling the subject that one rule is true and one false should background a number of issues concerned with the notion of truth, such as the possibility of the rule withstanding exceptions. The experimental manipulation turned out to be unexpectedly fruitful; while struggling through the task, subjects made comments very suggestive of where their difficulties lay. Below we give excerpts from the tutorial dialogues which highlight these difficulties. Precisely because many semantic difficulties come to the surface in this novel task, it might lead to increased performance, and so it appears to be a good experiment to repeat in a standard format.
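The claim that the 3 card is the only singly sufficient card can be verified by brute force. The sketch below assumes the classical reading of both rules and the instruction that exactly one of them is true; the encoding of a ‘world’ as a mapping from visible face to hidden face, and all names, are ours.

```python
from itertools import product

CARDS = ["U", "I", "8", "3"]
# Background rule: letters hide 3 or 8, numbers hide U or I.
HIDDEN = {"U": "38", "I": "38", "8": "UI", "3": "UI"}

def rule_true(letter, world):
    """'If <letter> on one side then 8 on the other', over the four cards."""
    return all({visible, hidden} != {letter, "3"}
               for visible, hidden in world.items())

def true_rule(world):
    r1, r2 = rule_true("U", world), rule_true("I", world)
    if r1 != r2:
        return "rule 1" if r1 else "rule 2"
    return None  # violates the instruction that exactly one rule is true

worlds = [dict(zip(CARDS, hidden))
          for hidden in product(*(HIDDEN[c] for c in CARDS))]
admissible = [w for w in worlds if true_rule(w) is not None]

def singly_sufficient(card):
    # Whatever the card shows when turned, the true rule must be determined.
    for outcome in HIDDEN[card]:
        answers = {true_rule(w) for w in admissible if w[card] == outcome}
        if len(answers) > 1:
            return False
    return True

decisive_cards = [card for card in CARDS if singly_sufficient(card)]
print(decisive_cards)  # ['3']

# The layout used in the tutorial: U and I carry 8, 8 carries I, 3 carries U.
actual = {"U": "8", "I": "8", "8": "I", "3": "U"}
print(true_rule(actual))  # rule 2
```

Turning U or I can settle the matter only if a 3 happens to be found, and the 8 card never does, which is the sense in which 3 is the unique singly sufficient choice.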

The tutorial experiment of which a part was described above was preceded by a so-called paraphrase task, in which subjects were asked to judge entailment relations between sentences involving propositional connectives and quantifiers. This task continues the classical work of Fillenbaum (1978) on subjects’ understandings of natural language connectives. For example, the subject could be given the sentence ‘if a card has a vowel on one side, it has an even number on the other side’, and then be asked to judge whether ‘every card which has a vowel on one side, has an even number on the other side’ follows from the given sentence.

This example is relatively innocuous, but we will see below that these judgements can be logically startling.

2.1.2. The logic of ‘true’

On a classical understanding of the two-rule task, the competence answer is to turn the 3; this would show which one of the rules is false, hence classically also which one is true. This classical understanding should be enforced by explicitly instructing the subjects that one rule is true and the other one false. Interestingly, some subjects refuse to be moved by this instruction, insisting that ‘not-false’ is not the same as ‘true’.

These subjects are thus guided by some nonclassical logic. Some subjects, when reading the rule(s) aloud, actually inserted a modality in the conditional:

Subject 13. [Standard Wason task]
S. ... if there is an A, then there is a 4, necessarily the 4 ... [somewhat later] ... if there is an A on one side, necessarily a 4 on the other side ... .

If truth involves necessity, then the absence of counterexamples is not sufficient for truth.

Subject 17.
S. [Writes miniature truth tables under the cards.]
E. OK so if you found an I under the 3, you put a question mark for rule 1, and rule 2 is false; if you turned the 3 and found a U, then rule 1 is false and rule 2 is a question mark. So you want to turn 3 or not?
S. No.
E. Let’s actually try doing it. Turn over the U, you find a 3, which rule is true and which rule is false?
S. (Long pause)
E. Are we none the wiser?
S. No, there’s a question mark.
E. It could have helped us, but it didn’t help us?
S. Yes.
. . .
E. OK and the 3
S. Well if there is a U then that one is disproved [pointing to the first rule] and if there is an I then that one is disproved [pointing to the second rule]. But neither rule can be proved by 3.
. . .
E. Turn over the last card [3] and see what’s on the back of it ... so it’s a U. What does that tell us about the rule?
S. That rule one is false and it doesn’t tell us anything about rule 2?
E. Can’t you tell anything about rule 2?
S. No.

The subject thinks falsifying rule 1 does not suffice and now looks for additional evidence to support rule 2. In the end she chooses the 8 card for this purpose, which is of course not the competence answer even when ‘not-false’ is not equated with ‘true’ (the I card would have to be chosen). Here are two more examples of the same phenomenon.

Subject 8.
S. I wouldn’t look at this one [3] because it wouldn’t give me appropriate information about the rules; it would only tell me if those rules are wrong, and I am being asked which of those rules is the correct one. Does that make sense?

Subject 5.
E. What about if there was a 3?
S. A 3 on the other side of that one [U]. Then this [rule 1] isn’t true.
E. It doesn’t say ... ?
S. It doesn’t say anything about this one [rule 2].
E. And the I?
S. If there is a 3, then this one [rule 2] isn’t true, and it doesn’t say anything about that one [rule 1].

The same problem is of course present in the standard Wason task as well, albeit in a less explicit form. If the cards are A, K, 4 and 7, then turning A and 7 suffices to verify that the rule is not false; but the subject may wonder whether it is therefore true.

For instance, if the concept of truth of a conditional involves attributing a lawlike character to the conditional, the absence of counterexamples does not suffice to establish truth.

Let us note here that it is precisely this difficulty which is absent in the case of deontic rules such as If you want to drink alcohol in this bar, you have to be over 17.

Such a rule cannot be shown to hold by examining cases; at most we can establish that it is not violated. So in the deontic case, subjects only have to do what they find easy in any case.

2.1.3. The logic of ‘false’

Interesting things happen when one asks subjects to meditate on what it could mean for a conditional to be false. As indicated above, the logic of ‘true’ need not determine the logic of ‘false’ completely. The paraphrase task alluded to above showed that a conditional p → q being false, i.e. not (p → q), is often (>50%) interpreted as p → not q! (We will refer to this property as strong falsity.) This observation is not ours alone: Fillenbaum observed that in 60% of the cases the negation of a causal temporal conditional p → q (‘if he goes to Amsterdam, he will get stoned’) is taken to be p → not q; for contingent universals (such as the rule in the selection task) the proportion is 30%. In our experiment the latter proportion is even higher.
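What strong falsity does to card choices can be sketched for the two-rule task of Section 2.1.1. The assumptions here are ours: one rule is true in the classical sense, the other is ‘strongly false’ in the sense just defined (its letter always goes with a number other than 8, which given the background rule means 3), and the encoding and names are illustrative.

```python
from itertools import product

CARDS = ["U", "I", "8", "3"]
HIDDEN = {"U": "38", "I": "38", "8": "UI", "3": "UI"}

def always_with(letter, number, world):
    """Every card bearing <letter> bears <number> on its other side."""
    return all(number in (v, h) for v, h in world.items() if letter in (v, h))

def true_rule(world):
    # 'True' is letter -> 8; 'strongly false' is letter -> not 8, i.e. -> 3.
    r1_true, r1_strongly_false = always_with("U", "8", world), always_with("U", "3", world)
    r2_true, r2_strongly_false = always_with("I", "8", world), always_with("I", "3", world)
    if r1_true and r2_strongly_false:
        return "rule 1"
    if r2_true and r1_strongly_false:
        return "rule 2"
    return None  # not a layout this subject considers possible

worlds = [dict(zip(CARDS, hidden))
          for hidden in product(*(HIDDEN[c] for c in CARDS))]
admissible = [w for w in worlds if true_rule(w) is not None]

def decisive(card):
    # Each possible outcome of turning the card must fix the true rule.
    return all(len({true_rule(w) for w in admissible if w[card] == outcome}) <= 1
               for outcome in HIDDEN[card])

decisive_cards = [card for card in CARDS if decisive(card)]
print(len(admissible), decisive_cards)
```

Under these assumptions only two layouts of the cards remain possible, each rule behaves like a biconditional, and every one of the four cards is enough to tell the rules apart.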

Here is an example of a subject using strong falsity when asked to imagine what could be on the other side of a card.

Subject 26. [Standard Wason task; subject has chosen strong falsity in paraphrase task]
E. So you’re saying that if the statement is true, then the number [on the back of A] will be 4. ... What would happen if the statement were false?
S. Then it would be a number other than 4.

Note that strong falsity encapsulates a concept of necessary connection between antecedent and consequent in the sense that even counterexamples are no mere accidents, but are governed by a rule. If a subject believes that true and false in this situation are exhaustive, this could reflect a conviction that the cards have been laid out according to some rule.

It is interesting to see what this interpretation means for card choices in the selection tasks. If a subject has strong negation but still believes true and false are exhaustive, then (in the standard Wason task) either of the cards p, q can show that p → q is not-false, hence true. Unfortunately, in the standard set up ‘either of A, 4’ is not a possible response offered. In the tutorial experiment involving the two-rule task subjects were at liberty to make such choices. In this case strong falsity has the effect of turning each of the two rules into a biconditional, ‘U if and only if 8’ and ‘I if and only if 8’, respectively. Any card now distinguishes between the two rules, and we do indeed find subjects emphatically making this choice:

E. OK so you want to revise your choice or do you want to stick with the 8?
S. No no ... I might turn all of them.
E. You want to turn all of them?
S. No no no just one of them, any of them.

Perhaps the customary choice of p, q in the standard task is the projection of ‘either of p, q’ onto the given possibilities. Another option is that some subjects have a biconditional reading of ‘if ... then’ together with strong falsity; in this case both p and q are necessary. These considerations just serve to highlight the possibility that a given choice of cards is made for very different reasons by different subjects, so that by itself statistical information on the different card choices in the standard task must be interpreted with care.

2.1.4. Truth of the rule and ‘truth of the card’

Subjects are persistently confused about several notions of truth that could possibly be involved. The intended interpretation is that the domain of discourse consists of the four cards shown, and that the truth value of the rule is to be determined with respect to that domain.

This interpretation is however remarkably difficult to get at. An alternative interpretation is that the domain is some indefinitely large population of cards, of which the four cards shown are just a sample; this is the intuition that lies behind Oaksford and Chater’s Bayesian approach. We will return to this interpretation in Section 2.1.6 below. The other extreme is that each card defines a domain of its own, i.e. each card is to be evaluated against the rule independently. The latter interpretation is the one suited to deontic conditionals, but there are indications that subjects sometimes impose this interpretation also in the indicative case, and then struggle with the resulting clash between two notions of truth. If a card complies with the rule, in other words ‘if the rule is true of the card’, then some subjects seem to have a tendency to transfer this notion of truth to ‘truth of the rule tout court’. Here is an example of the phenomenon, observed in the two-rule task.

Subject 10.
E. If you found an 8 on this card [I], what would it say?
S. It would say that rule two is true, and if the two cannot be true then rule one is wrong ... (Subject turns 8).
E. OK so it’s got an I on the back, what does that mean?
S. It means that rule two is true.
E. Are you sure?
S. I’m just thinking whether they are exclusive, yes because if there is an I then there is an 8. Yes, yes, it must be that.

One experimental manipulation in the tutorial dialogue for the two-rule task addressed this problem by making subjects first turn U and I, to find 8 on the back of both. This caused great confusion, because the subjects’ logic (transferring ‘truth of the card’ to ‘truth of the rule’) led them to conclude that therefore both rules must be true, contradicting the instruction.

Subject 18. [Initial choice was 8.]
E. Start with the U, turn that over.
S. U goes with 8.
E. OK now turn the I over.
S. Oh God, I shouldn’t have taken that card, the first ... .
E. You turned it over and there was an 8.
S. There was an 8 on the other side, U and 8. If there is an I there is an 8, so they are both true. [Makes a gesture that the whole thing should be dismissed.]

Subject 28.
E. OK turn them.
S. [turns U, finds 8] So rule one is true.
E. OK for completeness’ sake let’s turn the other cards as well.
S. OK so in this instance if I had turned that one [I] first then rule two would be true and rule one would be disproven. Either of these is different. [U or I]
E. What does that actually mean, because we said that only one of the rules could be true. Exactly one is true.
S. These cards are not consistent with these statements here.

On the other hand subjects who ultimately got the two-rule task right also appeared to have an insight into the intended relation between rule and cards.

Subject 6.
E. So say there were a U on the back of the 8, then what would this tell you?
S. I’m not sure where the 8 comes in because I don’t know if that would make the U-one right, because it is the opposite way around. If I turned that one [pointing to the U] just to see if there was an 8, if there was an 8 it doesn’t mean that rule two is not true.

We claim that part of the difficulty of the standard task involving a descriptive rule is the possibility of confusing the two relations between rule and cards. Transferring the ‘truth of the card’ to the ‘truth of the rule’ may be related to what Wason called ‘verification bias’, but it seems to cut deeper. One way to transfer the perplexity unveiled in the above excerpts to the standard task is to do a tutorial experiment where the A has a 4 on the back, and 7 an A. If a subject suffering from a confusion about the relation between cards and rule turns the A and finds 4, he should conclude that the rule is true, only to be rudely disabused upon turning 7. Unfortunately we haven’t yet done this manipulation. In any case it is clear that for a deontic rule no such confusion can arise, because the truth value of the rule is not an issue.

2.1.5. Exceptions and brittleness

The concept of truth Wason intended is that of ‘true without exceptions’, what we call a brittle interpretation of the conditional. It goes without saying that this is not how a conditional is generally interpreted in real life. And we do find subjects who struggle with the required transition from a notion of truth which withstands some exceptions, to exceptionless truth.

Subject 18. E. What could you say is on the back of the 3, are you sticking with the consonant? S. Consonant or U. E. OK. S. [Turns 3 and finds U] OK ... well no ... well that could be an exception you see. E. The U? S. The U could be an exception to the other rule. E. To the first rule? S. Yes, it could be an exception. E. So could you say anything about the rule based on this? Say, on just having turned the U and found a 3? S. Well yes, it could be a little exception, but it does disprove the rule so you’d have to ... E. You’d have to look at the other ones? S. Yes.

Similarly in the standard Wason task: Subject 18. S.
If I just looked at that one on its own [7/A] I would say that it didn’t fit the rule, and that I’d have to turn that one [A] over, and if that was different [i.e., if there wasn’t an even number] then I would say the rule didn’t hold. E. So say you looked at the 7 and you turned it over and you found an A, then? S. I would have to turn the other cards over ... well it could be just an exception to the rule so I would have to turn over the A. Clearly, if a counterexample is not sufficient evidence that the rule is false, then it is dubious whether card-turnings can prove the rule to be true or false at all.

Subjects may accordingly be confused about how to interpret the instructions of the experiment. In any case a ¬q card would lose some salience (if it had any to begin with).

Above we noted that there are problems concerning the domain of interpretation of the conditional rule. The intended interpretation is that the rule applies to the four cards shown only. However, the semantics of conditionals is such that they tend to apply to an open-ended domain of cases. This can best be seen in contrasting universal quantification with the natural language conditional. Universal quantification is equally naturally used in framing contingent contextually determined statements as open-ended generalisations. So, to develop Goodman’s (1954) example, “All the coins in my pocket this morning are copper” is a natural way to phrase a local generalisation with a fixed enumerable domain of interpretation. However, “If a coin is in my pocket this morning, it’s copper” is a distinctly unnatural way of phrasing the same claim.

The latter even invites the fantastical interpretation that if a silver coin were put in my pocket this morning it would become copper: that is an interpretation in which a larger open-ended domain of objects is in play. Similarly in the case of the four card task, the clause that “the rule applies only to the four cards” has to be explicitly included. One may question whether subjects take this clause on board, since this interpretation is an unnatural one for the conditional. It is further unnatural to call the sentence a rule if its application is so local. A much more natural interpretation is that the four cards are a sample. Indeed this is the point of purchase of Oaksford and Chater’s proposals that performance is driven by subjects’ assumptions about the larger domain of interpretation. We do find subjects who think that truth or falsity can only be established by (crude) probabilistic considerations:

Subject 26. S. [has turned U,I, found an 8 on the back of both] I can’t tell which one is true. E. OK let’s continue turning. S. [turns 3] OK that would verify rule two. [... ] Well, there are two cards that verify rule two, and only one card so far that verifies rule one. Because if this [3] were verifying rule one, it should be an I on the other side. E. Let’s turn [the 8]. S. OK so that says that rule two is true as well, three of the cards verify rule two and only one verifies rule one. E. So you decide by majority. S. Yes, the majority suggests rule two.

It is interesting that 3/U is described as verifying rule two, rather than falsifying rule one; U→8 is never ruled out: S. It’s not completely false, because there is one card that verifies rule one.

Summarising: natural language descriptive conditionals bear complex relations to cases and sets of cases in their domain. In principle, only sets of cases can make a descriptive rule true.
Even then the fact that all cases comply may intuitively not be enough, for instance when a subject hesitates to conclude ‘true’ from ‘not false’. The situation is still more complex because descriptive rules usually tolerate some exceptions. To get Wason’s desired interpretation of the rule as a material conditional, it is necessary to background the complex range of possibilities for descriptive rules’ relations to compliant cases and to exceptions, and to induce the intended meaning of ‘true’ and ‘false’. Here the two-rule task may have a role to play. If subjects were assured that one of two rules was false and one was true, and instructed that their task was to gather minimal evidence as to which rule was which, then this hopefully focusses their attention on the more straightforward relations between rules and cases, and backgrounds the higher-order issues about how exceptions affect the truth of rules, and more generally the nature of truth. Of course the excerpts given above have mainly illustrated subjects’ difficulties in the two-rule task. However, several tutorial dialogues involving the two-rule task also showed (very gradual and faltering) progress toward insight, while this progress was absent in the dialogues involving the standard task.

This gave us some confidence that the two-rule task might be helpful in reaching the competence response, a prediction borne out by the experimental results reported below.
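The logic of the two-rule task can be made concrete with a small sketch. It is only an illustration, not part of the reported experiments: it assumes the rules ‘if U then 8’ and ‘if I then 8’, the background rule that each card has U or I on one side and 3 or 8 on the other, and the (hypothetical) helper names `falsified`, `possible_backs` and `decisive_cards`. A card is called decisive here if every possible back falsifies exactly one rule, so that the other rule must be the true one.

```python
# Hypothetical encoding of the two-rule task; all names are illustrative.
LETTERS = {"U", "I"}
NUMBERS = {"3", "8"}
RULES = {
    "rule one": ("U", "8"),  # if U on one side, then 8 on the other
    "rule two": ("I", "8"),  # if I on one side, then 8 on the other
}


def falsified(face, back):
    """Return the set of rules that the card (face, back) falsifies."""
    sides = {face, back}
    return {
        name
        for name, (antecedent, consequent) in RULES.items()
        if antecedent in sides and consequent not in sides
    }


def possible_backs(face):
    """Background rule (assumed): a letter on one side, a number on the other."""
    return NUMBERS if face in LETTERS else LETTERS


def decisive_cards(faces=("U", "I", "8", "3")):
    """Cards for which every possible back falsifies exactly one rule."""
    return [
        face
        for face in faces
        if all(len(falsified(face, back)) == 1 for back in possible_backs(face))
    ]


print(decisive_cards())
```

Under these assumptions only the 3 card is decisive: a U or an I card with an 8 on the back complies with its rule without settling which rule is the false one, which is exactly the situation that confused the subjects quoted above.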

The tutorial dialogues suggest that part of the difficulty of the selection task consists in having to choose a card without being able to inspect what is on the other side of the card. This difficulty can only be made visible in the dialogues because there the subject is confronted with real cards, which she is not allowed to turn at first. It then becomes apparent that some subjects would prefer to solve the problem by ‘reactive planning’, i.e. by first choosing a card, turning it and deciding what to do based on what is on the other side. This source of difficulty is obscured by the standard format of the experiment. The form invites the subjects to think of the cards depicted as real cards, but at the same time the answer should be given on the basis of the representation of the cards on the form, i.e. with inherently unknowable backs. The instruction ‘Tick the cards you want to turn ... ’ clearly does not allow the subject to return a reactive plan. This is a pity, because the tutorials amply show that dependencies are a source of difficulty. Here is an excerpt from a tutorial dialogue in the two-rule condition.

Subject 1. E. Same for the I, what if there is an 8 on the back? S. If there is an 8 on the back, then it means that rule two is right and rule one is wrong. E. So do we turn over the I or not? S. Yes. Unless I’ve turned the U already.

And in a standard Wason task:

Subject 10. S. OK so if there is a vowel on this side then there is an even number, so I can turn A to find out whether there is an even number on the other side or I can turn the 4 to see if there is a vowel on the other side. E. So would you turn over the other cards? Do you need to turn over the other cards? S. I think it just depends on what you find on the other side of the card. No I wouldn’t turn them. . . . E. If you found a K on the back of the 4? S. Then it would be false. . . . S.
But if that doesn’t disclude [sic] then I have to turn another one. E. So you are inclined to turn this over [the A] because you wanted to check? S. Yes, to see if there is an even number. E. And you want to turn this over [the 4]? S. Yes, to check if there is a vowel, but if I found an odd number [on the back of the A], then I don’t need to turn this [the 4]. E. So you don’t want to turn ... S. Well, I’m confused again because I don’t know what’s on the back, I don’t know if this one ... E. We’re only working hypothetically now. S. Oh well, then only one of course, because if the rule applies to the whole thing then one would test it. . . . E. What about the 7? S. Yes the 7 could have a vowel, then that would prove the whole thing wrong. So that’s what I mean, do you turn one at a time or do you ... ? . . . E. Well if you needed to know beforehand, without having turned these over, so you think to yourself I need to check whether the rule holds, so what cards do I need to turn over? You said you would turn over the A and the 4. S. Yes, but if these are right, say if this [the A] has an even number and this has a vowel [the 4], then I might be wrong in saying “Oh it’s fine”, so this could have an odd number [the K] and this a vowel [the 7] so in that case I need to turn them all. E. You’d turn all of them over? Just to be sure? S. Yes. Once one has understood Wason’s intention in specifying the task, it is easy to assume that it is obvious that the experimenter intends subjects to decide what cards to turn before any information is gained from any turnings. Alternatively, and equivalently, the instructions can be interpreted to be to assume the minimal possible information gain from turnings. However, the obviousness of these interpretations is possibly greater in hindsight, and so we set out to test whether they are a source of difficulty in the task. Note that no contingencies of choice can arise if the relation between rule and cards is interpreted deontically. 
Whether one case obeys the law is unconnected to whether any other case does. Hence the planning problem indicated above cannot arise for a deontic rule, which might be one explanation for the good performance in that case. In this connection it may be of interest to consider the so-called reduced array selection task, or RAST for short, due to Wason and Green and discussed extensively by Margolis (1988). In its barest outline the idea of the RAST is to remove the p and ¬p cards from the array of cards shown to the subject, thus leaving only q and ¬q.

The p and ¬p cards cause no trouble in the standard task in the sense that p is chosen almost always, and ¬p almost never, so one would expect that their deletion would cause little change in the response frequencies for the remaining cards.

Surprisingly however, the frequency of the ¬q response increases dramatically. From our point of view, this result is perhaps less surprising, because without the possibility to choose p, dependencies between card choices can no longer arise. This is not to say that this is the only difficulty the RAST removes.

A related planning problem, which can however occur only on a non-standard logical understanding of the problem, is the following. In a few early tutorial dialogues involving the two-rule experiment, the background rule incorrectly failed to specify that the cards have one side either U or I and on the other side either 3 or 8, owing to an error in the instructions. In this case the competence response is not to turn 3 only, but to turn U, I and 3. But several subjects did not want to choose the 3 for the following reason.

Subject 7. S. Then I was wondering whether to choose the numbers. Well, I don’t think so because there might be other letters [than U,I] on the other side. There could be totally different letters. E. You can’t be sure? S. I can’t be sure. I can only be sure if there is a U or an I on the other side. So this is not very efficient and this [3] does not give me any information. But I could turn the U or the I.

Apparently the subject thinks that he can choose between various sets of cards, each sufficient, and the choice should be as parsimonious as possible in the sense that every outcome of a turning must be relevant. To show that this is not an isolated phenomenon, here is a subject engaged in a standard Wason task:

Subject 5. E. So you would pick the A and you would pick the 4. And lastly the 7? S. That’s irrelevant. E. So why do you think it’s irrelevant? S. Let me see again. Oh wait so that could be an A or a K again [writing the options for the back of 7 down], so if the 7 would have an A then that would prove me wrong. But if it would have a K then that wouldn’t tell me anything. E. So? S. So these two [pointing to A and 4] give me more information, I think. E. [... ] You can turn over those two [A and 4]. S. [turns over the A] E. So what does that say? S. That it’s wrong. E. And that one [4]? S. That it’s wrong. E. Now turn over those two [K and 7]. S. [Turning over the K] It’s a K and 4.
Doesn’t say anything about this [pointing to the rule]. [After turning over the 7] Aha. E. So that says the rule is ... ? S. That the rule is wrong. But I still wouldn’t turn this over, still because I wouldn’t know if it would give an A, it could give me a K and that wouldn’t tell me anything. E. But even though it could potentially give you an A on the back of it like this one has. S. Yes, but that’s just luck. I would have more chance with these two [referring to the A and the 4].

These subjects have no difficulty evaluating the meaning of the possible outcomes of turning 3 (in the two-rule task), or 7 (in the standard Wason task), but their choice is also informed by other considerations, in particular a perceived trade-off between the ‘information value’ of a card and the penalty incurred by choosing it. Of course this does not yet explain the evaluation of the 4/K card as showing that the rule is wrong, and simultaneously taking the K/4 card to be irrelevant. The combined evaluations seem to rule out a straightforward biconditional interpretation of the conditional, and also the explanation of the choice of 4 as motivated by a search for confirmatory evidence for the rule, as Wason would have it. This pattern of evaluations is not an isolated phenomenon, so an explanation would be most welcome. Even without such an explanation it is clear that the problem indicated, how to maximise information gain from turnings, cannot play a role in the case of deontic conditionals, since the status of the rule is not an issue.

2.3. The pragmatics of the descriptive selection task

The descriptive task demands that subjects seek evidence for the truth of a statement which comes from the experimenter. The experimenter can safely be assumed to know what is (or is deemed to be) on the back of the cards.
If the rule is false its appearance on the task sheet amounts to the utterance, by the experimenter, of a knowing falsehood, possibly with intention to deceive. It is an active possibility that doubting the experimenter’s veracity is a socially uncomfortable thing to do. Quite apart from possible social psychological effects of discomfort, the communication situation in this task is bizarre. The subject is first given one rule to the effect that the cards have letters on one side and numbers on the other.

This rule they are supposed to take on trust.

Then they are given another rule by the same information source and they are supposed not to trust it but seek evidence for its falsity. If they do not continue to trust the first rule, then their card selections should diverge from Wason’s expectations. If they simply forget about the background rule, the proper card choice would be A, K and 7; and if they want to test the background rule as well as the foreground rule, they would have to turn all cards. Notice that with the deontic interpretation, this split communication situation does not arise.
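The card choices just mentioned can be checked mechanically. The following sketch is an illustration under stated assumptions, not a procedure from the study: the four symbols A, K, 4 and 7 stand in for vowel, consonant, even and odd number, a card must be turned when some possible back would violate a rule under test, and the function names (`foreground_ok`, `background_ok`, `cards_to_turn`) are hypothetical.

```python
# Illustrative sketch; symbols stand in for vowel/consonant/even/odd.
VOWELS = {"A"}
EVENS = {"4"}
LETTERS = {"A", "K"}
NUMBERS = {"4", "7"}


def foreground_ok(a, b):
    """Foreground rule: a vowel on one side requires an even number on the other."""
    return all(other in EVENS for side, other in ((a, b), (b, a)) if side in VOWELS)


def background_ok(a, b):
    """Background rule: a letter on one side and a number on the other."""
    return (a in LETTERS) != (b in LETTERS)


def possible_backs(face, trust_background):
    """Backs the subject considers possible for a visible face."""
    if trust_background:
        return NUMBERS if face in LETTERS else LETTERS
    return LETTERS | NUMBERS  # background rule forgotten: anything goes


def cards_to_turn(trust_background, test_background=False):
    """Cards for which some possible back violates a rule under test."""
    chosen = []
    for face in ["A", "K", "4", "7"]:
        backs = possible_backs(face, trust_background)
        if any(
            not foreground_ok(face, back)
            or (test_background and not background_ok(face, back))
            for back in backs
        ):
            chosen.append(face)
    return chosen


print(cards_to_turn(trust_background=True))
print(cards_to_turn(trust_background=False))
print(cards_to_turn(trust_background=False, test_background=True))
```

On this encoding, trusting the background rule yields A and 7; forgetting it adds K, because a K card could then hide a vowel; and testing both rules requires turning all four cards, because the 4 could hide another number.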

The law stands and the task is to decide whether some people other than the source obey it. Here is an example of a subject who takes both rules on trust:

Subject 3. [Standard Wason task; has chosen A and 4] E. Why pick those cards and not the other cards? S. Because they are mentioned in the rule and I am assuming that the rule is true.

Another subject was rather bewildered when upon turning A he found a 7:

Subject 8. S. Well there is something in the syntax with which I am not clear because it does not say that there is an exclusion of one thing, it says ‘if there is an A on one side there is a 4 on the other side’. So the rule is wrong. E. This [pointing to A] shows that the rule is wrong. S. Oh so the rule is wrong, it’s not something I am missing.

Although this may sound similar to Wason’s ‘verification bias’, it is actually very different. Wason assumed that subjects would be in genuine doubt about the truth value of the rule, but would then proceed in an ‘irrational’, verificationist manner to resolve the issue. What transpires here is that subjects take it on the authority of the experimenter that the rule is true, and then interpret the instructions as indicating those cards which are evidence of this:

Subject 22. S. Well my immediate [inaudible] first time was to assume that this is a true statement, therefore you only want to turn over the card that you think will satisfy the statement.

The communicative situation of the two-rule task is already much less bizarre, since there is no longer any reason to doubt the veracity of the experimenter. The excerpts also suggest that a modified standard task in which the rule is attributed not to the experimenter but to an unreliable source, might increase the number of competence responses. It hardly needs emphasising anymore that these problems cannot arise in the case of a deontic rule.

2.4. Subjects’ understanding of propositional connectives

As mentioned before, the tutorial dialogues were preceded by a paraphrase task, in which subjects were asked whether a statement involving a conditional is equivalent to a statement involving other logical connectives. A further striking observation from the paraphrase task is that a conditional p → q is often (>50%) interpreted as a conjunction p ∧ q. Here is an example of what a conjunctive reading means in practice.

Subject 22. [Subject has chosen the conjunctive reading in the paraphrase task.] E. [Asks subject to turn the 7] S. That one ... that isn’t true. There isn’t an A on the front and a 4 on the back. [... ] you turn over those two [A and 4] to see if they satisfy it, because you already know that those two [K and 7] don’t satisfy the statement. E. [baffled] Sorry, which two don’t satisfy the rule? S. These two don’t [K and 7], because on one side there is K and that should have been A, and that [7] wouldn’t have a 4, and that wouldn’t satisfy the statement. E. Yes, so what does that mean ... you didn’t turn it because you thought that it will not satisfy? S. Yes.

Clearly, on a conjunctive reading, the rule is already falsified by the cards as exhibited; no turning is necessary. The subject might however feel forced by the experimental situation to select some cards, and accordingly reinterprets the task as checking whether a given card satisfies the rule. This brings us to an important consideration: how much of the problem is caused by the conditional? The literature on the selection task, with very few exceptions, has assumed that the problem is a problem specific to conditional rules.
Indeed, it would be easy to infer also from the foregoing discussion of descriptive conditional semantics that the conditional (and its various expressions) is unique in causing subjects so much difficulty in the selection task, and that our only point is that a sufficiently rich range of interpretations for the conditional must be used to frame psychological theories of the selection task. However, the issues already discussed (the nature of truth, response to exceptions, contingency, pragmatics) are all rather general in their implications for the task of seeking evidence for truth. One can distinguish the assessment of truth of a sentence from truthfulness of an utterer for sentences of any form. The robustness or brittleness of statements to counterexamples is an issue which arises for any generalisation. The social psychological effects of the experimenter’s authority, and the communicative complexities introduced by having to take a cooperative stance toward some utterances and an adversarial one toward others, are also general problems of pragmatics that can affect statements of any logical form (cf. Grice, 1975). Contingencies between feedback from early evidence on choice of subsequent optimal evidence seeking are general to any form of sentence for which more than one case is relevant. It would seem to be a high priority to find out to what extent there is something uniquely problematic about conditionals in the selection task, and to what extent these more general issues could explain poor performance in seeking evidence for descriptive statements’ truth. Several early papers compared disjunction with the conditional (e.g., van Duyne, 1974), and show that disjunctions are at least as hard as conditionals in the selection task.
It is true that Johnson-Laird and Byrne (2002) cite Wason and Johnson-Laird (1969) as showing that disjunctions are easy, but that paper has so many divergences from the standard selection task it is hard to know how to relate it. But disjunction is perhaps too closely allied to the conditional to make a case that the problem is a more general problem of the possibility of non-truth-functionality of all natural language connectives. What would happen, for example, if the rule were stated using the putatively least problematical connective, conjunction?

2.5. Other sources of difficulty

The transcripts of the tutorial dialogues reveal another important source of confusion, namely the interpretation of the anaphoric expression ‘one side ... other side’ and its interaction with the direction of the conditional. The trouble with ‘one side ... other side’ is that in order to determine the referent of ‘other side’, one must have kept in memory the referent of ‘one side’. That may seem harmless enough, but in combination with the various other problems identified here, it may prove too much. Even apart from limitations of working memory, subjects may have a non-intended interpretation of ‘one side ... other side’, wherein ‘one side’ is interpreted as ‘visible side’ (the front, or face of the card) and ‘other side’ is interpreted as ‘invisible side’ (the back of the card). The expression ‘one side ... other side’ is then interpreted as deictic, not as anaphoric. This possibility was investigated by Gebauer and Laming (1997), who argue that deictic interpretation of ‘one side ... other side’ and a biconditional interpretation of the conditional, both singly and in combination, are prevalent, persistently held, and consistently reasoned with. Gebauer and Laming present the four cards of the standard task six times to each subject, pausing to actually turn cards which the subject selects, and to consider their reaction to what is found on the back.
Their results show few explicitly acknowledged changes of choice, and few selections which reflect implicit changes. Subjects choose the same cards from the sixth set as they do from the first. Gebauer and Laming argue that the vast majority of the choices accord with normative reasoning from one of the four combinations of interpretation achieved by permuting the conditional/biconditional with the deictic/anaphoric interpretations. We tried to find further evidence for Gebauer and Laming’s view, and presented subjects with rules in which the various possible interpretations of ‘one side ... other side’ were spelt out explicitly; e.g. one rule was

 (1) if there is a vowel on the face of the card, then there is an even number on the back.

To our surprise, subjects seemed completely insensitive to the wording of the rule and chose according to the standard pattern whatever the formulation; for discussion see Stenning and van Lambalgen (2001).

This result made us curious to see what would happen in tutorial dialogues when subjects are presented with a rule like (1), and indeed the slightly pathological (2)

 (2) if there is a vowel on the back of the card, there is an even number on the face of the card.

After having presented the subjects with these two rules, we told them that the intended interpretation of ‘one side ... other side’ is that ‘one side’ can refer to the visible face or to the invisible back. Accordingly, they now had to choose cards corresponding to

 (3) if there is a vowel on one side (face or back), then there is an even number on the other side (face or back).
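For reference, the cards that would need turning under rules (1), (2) and (3) can be worked out with a short sketch. It assumes, purely for illustration, that each rule is read as a material conditional, that the cards shown are A, K, 4 and 7 with a letter on one side and a number on the other, and that a card needs turning when some possible back would violate the rule; the function names are hypothetical.

```python
# Illustrative sketch: rules (1)-(3) read as material conditionals.
VOWELS, EVENS = {"A"}, {"4"}
LETTERS, NUMBERS = {"A", "K"}, {"4", "7"}


def rule1(face, back):
    """(1) vowel on the face -> even number on the back."""
    return face not in VOWELS or back in EVENS


def rule2(face, back):
    """(2) vowel on the back -> even number on the face."""
    return back not in VOWELS or face in EVENS


def rule3(face, back):
    """(3) vowel on either side -> even number on the other side."""
    return rule1(face, back) and rule2(face, back)


def cards_to_turn(rule):
    """Visible faces for which some possible back would violate the rule."""
    chosen = []
    for face in ["A", "K", "4", "7"]:
        backs = NUMBERS if face in LETTERS else LETTERS
        if any(not rule(face, back) for back in backs):
            chosen.append(face)
    return chosen


for name, rule in (("(1)", rule1), ("(2)", rule2), ("(3)", rule3)):
    print(name, cards_to_turn(rule))
```

On these assumptions rule (1) calls for the A only, rule (2) for the 7 only, and rule (3) for both A and 7, which gives a baseline against which the selections in the dialogues can be compared.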

We now provide a number of examples, culled from the tutorial dialogues, which demonstrate the interplay between the interpretations chosen for anaphora and conditional. The first example shows us a subject who explicitly changes the direction of the implication when considering the back/face anaphora, even though she is at first very well aware that the rule is not biconditional. Subject 12. [Experiments (1), (2), and (3)] E. The first rule says that if there is a vowel on the face of the card, so what we mean by face is the bit you can see, then there is an even number on the back of the card, so that’s the bit you can’t see. So which cards would you turn over to check the rule. S. Well, I just thought 4, but then it doesn’t necessarily say that if there is a 4 that there is a vowel underneath. So the A. E. For this one it’s the reverse, so it says if there is a vowel on the back, so the bit you can’t see, there is an even number on the face; so in this sense which ones would you pick? S. [Subject ticks 4] This one. E. So why wouldn’t you pick any of the other cards? S. Because it says that if there is an even number on the face, then there is a vowel, so it would have to be one of those [referring to the numbers]. . . . E. [This rule] says that if there is a vowel on one side of the card, either face or back, then there is an even number on the other side, either face or back. S. I would pick that one [the A] and that one [the 4]. E. So why? S. Because it would show me that if I turned that [pointing to the 4] over and there was an A then the 4 is true, so I would turn it over. Oh, I don’t know. This is confusing me now because I know it goes only one way. . . . S. No, I got it wrong didn’t I, it is one way, so it’s not necessarily that if there is an even number then there is a vowel. The second example is of a subject who gives the normative response in experiment 3, but nonetheless goes astray when forced to consider the back/face interpretation. Subject 4. 
[Experiments (1), (2), and (3)] E. OK This says that if there is a vowel on the face [pointing to the face] of the card, then there is an even number on the back of the card. How is that different to ... S. Yes, it’s different because the sides are unidirectional. E. So would you pick different cards? S. If there is a vowel on the face ... I think I would pick the A. E. And for this one? [referring to the second statement] This is different again because it says if there is a vowel on the back ... S. [completes sentence] then there is an even number on the face. I think I need to turn over the 4 and the 7. Just to see if it (the 4) has an A on the back. E. OK Why wouldn’t you pick the rest of the cards? S. I’m not sure, I haven’t made up my mind yet. This one (the A) I don’t have to turn over because it’s not a vowel on the back, and the K is going to have a number on the back so that’s irrelevant. This one [the 4] has to have a vowel on the back otherwise the rule is untrue. I still haven’t made up my mind about this one (the 7). Yes, I do have to turn it over because if it has a vowel on the back then it would make the rule untrue. So I think I will turn it over. I could be wrong. [When presented with the rule where the anaphora have the intended interpretation] S. I would turn over this one (the A) to see if there is an even number on the back and this one (the 7) to see if there was a vowel on the back.

Our third example is of a subject who explicitly states that the meaning of the implication must change when considering back/face anaphora.

Subject 16. [Experiments (1), (2), and (3)] [Subject has correctly chosen A in condition (1).] E. The next one says that if there is a vowel on the back of the card, so that’s the bit you can’t see, then there is an even number on the face of the card, so that’s the bit you can see; so that again is slightly different, the reverse, so what would you do? S. Again I’d turn the 4 so that would be proof but not ultimate proof but some proof ... E. With a similar reasoning as before? S. Yes, I’m pretty sure what you are after ... I think it is a bit more complicated this time, with the vowel on the back of the card and the even number, that suggests that if and only if there is an even number there can be a vowel, I think I’d turn others just to see if there was a vowel, so I think I’d turn the 7 as well. [In condition 3 chooses A and 4]

We thus see that, in these subjects, the direction of the conditional is related to the particular kind of deixis assumed for ‘one side ... other side’.

This shows that the process of natural language interpretation in this task need not be compositional, and that, contrary to Gebauer and Laming’s claim, subjects need not have a persistent interpretation of the conditional. Two questions immediately arise: 1. why would there be this particular interaction? 2. what does the observed interaction tell us about performance in the standard Wason task? Question 2 can easily be answered. If subjects would decompose the anaphoric expression ‘one side ... other side’ into two deictic expressions ‘face/back’ and ‘back/face’ and would then proceed to reverse the direction of the implication in the latter case, they should choose the p and q cards. Also, since the expression ‘one side ... other side’ does not appear in a deontic rule such as ‘if you want to drink alcohol, you have to be over 18’, subjects will not be distracted by this particular difficulty. Question 1 is not answered as easily. There may be something pragmatically peculiar about a conditional of which the consequent, but not the antecedent, is known. These are often used for diagnostic purposes (also called abduction): if we have a rule which says ‘if switch 1 is down, the light is on’, and we observe that the light is on, we are tempted to conclude that switch 1 must be down. This however is making an inference, not stating a conditional; but then subjects are perhaps not aware of the logical distinction between the two. It is of interest that the difficulty discussed here was already identified by Wason and Green (1984), albeit in slightly different terms: their focus is on the distinction between a unified and a disjoint representation of the stimulus. A unified stimulus is one in which the terms referred to in the conditional cohere in some way (say as properties of the same object, or as figure and ground), whereas in a disjoint stimulus the terms may be properties of different objects, spatially separated.

Wason and Green conjectured that it is disjoint representation which accounts for the difficulty in the selection task.

To test the conjecture they conducted three experiments, varying the type of unified representation. Although they use a reduced array selection task (RAST), in which one chooses only between q and ¬q, relative performance across their conditions can still be compared.

Their contrasting sentence rule pairs are of great interest, partly because they happen to contain comparisons of rules with and without anaphora. There are three relevant experiments, numbered 2–4. Experiment 2 contrasts unified and disjoint representations without anaphora in either, and finds that unified rules are easier. Experiment 3 contrasts unified and disjoint representations with the disjoint rule having anaphora. Experiment 4 contrasts unified and disjoint representations but removes the anaphora from the disjoint rule while adding another source of linguistic complexity (an extra tensed verb plus pronominal anaphora) to the unified one. For a full discussion of their experiments we refer the reader to Stenning and van Lambalgen (2001); here we discuss only experiment 2. In their experiment 2, cards show shapes (triangles, circles) and colours (black, white), and the two sentences considered are:

(4) Whenever they are triangles, they are on black cards.
(5) Whenever there are triangles below the line, there is black above the line.

That is, in (4) the stimulus is taken to be unified because it is an instance of figure/ground, whereas in (5) the stimulus consists of two parts and hence is disjoint. Performance for sentence (5) was worse than for sentence (4) (for details see Wason & Green, 1984, pp. 604–607). We would describe the situation slightly differently, in terms of the contrast between deixis and anaphora. Indeed, the experimental set-up is such that for sentence (5), the lower half of the cards is hidden by a bar, making it analogous to condition (2), where the object mentioned in the antecedent is hidden.

We have seen above that some subjects have difficulties with the intended direction of the conditional in experiment (2). Sentence (5) would be the ‘difficult half’ of the anaphora-containing sentence “Whenever there are triangles on one side of the line, there is black on the other side of the line”. Sentence (4) does not contain any such anaphora. With Wason and Green we would therefore predict that subjects find (5) more difficult.

3. Experiment

In this experiment, several conditions are compared with baseline performance on the classical descriptive ‘abstract’ task, each designed to assess the contribution to determination of choice by one of the factors discussed above. We describe each condition in turn, and then present the results together.

3.1. The conditions

3.1.1. Classical ‘abstract’ task

To provide a baseline of performance on the selection task with descriptive conditionals, the first condition repeats Wason’s (1968) classical study with the following instructions and materials (see instructions in Section 1). The other conditions are described through their departures from this baseline condition.

3.1.2. Two-rule task

After the preliminary instructions for the classical task, the following instructions were substituted in this condition:

... Also below there appear two rules. One rule is true of all the cards, the other isn’t. Your task is to decide which cards (if any) you must turn in order to decide which rule holds. Don’t turn unnecessary cards. Tick the cards you want to turn.

Rule 1: If there is a vowel on one side, then there is an even number on the other side.
Rule 2: If there is a consonant on one side, then there is an even number on the other side.

Normative performance in this task, according to the classical logical competence model, is to turn only the not-Q card.
The rules are chosen so that the correct response is to turn exactly the card that the vast majority of subjects fail to turn in the classical task.

This has the added bonus that it is no longer correct to turn the P card which provides an interesting comparison with the original task. This is the only descriptive task for which choosing the true-antecedent case is an error. By any obvious measure of task complexity, this task is more complicated than the classical task. It demands that two conditionals are processed and that the implications of each case are considered with respect to both rules and with respect to a distribution of truth values. Nevertheless, our prediction was that performance should be substantially nearer the logically normative model for the reasons described above.
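The claim that only the not-Q card needs turning in the two-rule task can be checked by brute force. The following is a minimal sketch and not part of the original study. It assumes the four cards show A, K, 4 and 7, reads both rules as material conditionals, and takes the instruction "one rule is true of all the cards, the other isn't" at face value. The names `VISIBLE`, `admissible_worlds`, `decides` and `minimal_selections` are illustrative only.

```python
from itertools import combinations, product

LETTERS = ("vowel", "consonant")
NUMBERS = ("even", "odd")
# Visible faces of the four cards (assumed to be the usual A, K, 4, 7 array).
VISIBLE = {"A": "vowel", "K": "consonant", "4": "even", "7": "odd"}


def as_card(visible, hidden):
    """Return a (letter, number) pair whichever side is face up."""
    return (visible, hidden) if visible in LETTERS else (hidden, visible)


def rule1(letter, number):
    """If there is a vowel on one side, there is an even number on the other."""
    return letter != "vowel" or number == "even"


def rule2(letter, number):
    """If there is a consonant on one side, there is an even number on the other."""
    return letter != "consonant" or number == "even"


def admissible_worlds():
    """Yield (hidden faces, true rule) for each completion of the cards in
    which exactly one of the two rules is true of all four cards."""
    names = list(VISIBLE)
    options = [NUMBERS if VISIBLE[n] in LETTERS else LETTERS for n in names]
    for hidden in product(*options):
        cards = [as_card(VISIBLE[n], h) for n, h in zip(names, hidden)]
        r1 = all(rule1(*c) for c in cards)
        r2 = all(rule2(*c) for c in cards)
        if r1 != r2:
            yield dict(zip(names, hidden)), ("rule 1" if r1 else "rule 2")


def decides(selection, worlds):
    """True if seeing the hidden faces of `selection` always fixes the true rule."""
    seen = {}
    for hidden, true_rule in worlds:
        key = tuple(hidden[c] for c in selection)
        if seen.setdefault(key, true_rule) != true_rule:
            return False
    return True


def minimal_selections():
    """Smallest sets of cards that are guaranteed to settle which rule holds."""
    worlds = list(admissible_worlds())
    names = list(VISIBLE)
    for size in range(len(names) + 1):
        found = [s for s in combinations(names, size) if decides(s, worlds)]
        if found:
            return found
    return []


print(minimal_selections())
```

Under these assumptions the only minimal selection is the 7, that is, the not-Q card. Turning the A can settle the matter only if an odd number happens to be on its back, so it is not a card that must be turned.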

The ‘contingency instructions’ were designed to remove any difficulties in understanding that choices have to be made ignoring possible interim feedback. After an identical preamble, they read as follows, where the newly italicised portion (the sentence beginning ‘Assume that you have to decide’) is the change from the classical instructions:

... Also below there appears a rule. Your task is to decide which of these four cards you must turn (if any) in order to decide if the rule is true. Assume that you have to decide whether to turn each card before you get any information from any of the turns you choose to make. Don’t turn unnecessary cards. Tick the cards you want to turn.

If the contingencies introduced by the descriptive semantics are a source of difficulty for subjects, this additional instruction should make the task easier. In particular, since there is a tendency to choose the P card first, there should be an increase in not-Q responding. After conducting this experiment we found a reference in Wason (1987) to use of essentially similar instructions in his contribution to the Science Museum exhibition of 1977, and there are mentions in other early papers. Clearly he had thought about assumed contingencies between card choices as a possible confusion. Wason reports no enhancement in his subjects’ reasoning, but he does not report whether any systematic comparison between these and standard instructions was made, or quite what the population of subjects was.

3.1.4. Judging truthfulness of an independent source

We chose to investigate the possible contribution of problems arising from the authoritative position of the experimenter and the balance of cooperative and adversarial stances required toward different parts of the task materials through instructions to assess truthfulness of the source instead of truth of the rule, and we separated the source of the rule from the source of the instructions (the experimenter).

The instructions read as follows: ... Also below there appears a rule put forward by an unreliable source. Your task is to decide which cards (if any) you must turn in order to decide if the unreliable source is lying. Don’t turn unnecessary cards. Tick the cards you want to turn. With these instructions there should be no discomfort about seeking to falsify the rule. Nor should any falsity of the rule throw any doubt on the truthfulness of the rest of the instructions, since the information sources are independent. These ‘truthfulness’ instructions are quite closely related to several other manipulations that have been tried in past experiments. In the early days of experimentation on this task, when it was assumed that a failure to try and falsify explained the correct response, various ways of emphasising falsification were explored. Wason (1968) instructed subjects to pick cards which could break the rule and Hughes (1966) asked them whether the rule was a lie. Neither instruction had much effect. However, these instructions fail to separate the source of the rule from the experimenter (as the utterer of the rule) and may fail for that reason. Kirby (1994) used a related manipulation in which the utterer of the rule was a machine said to have broken down, needing to be tested to see if it was working properly again after repair. These instructions did produce significant improvement. Here the focus of the instruction is to tell whether the machine is ‘broken’, not simply whether the utterance of the rule is a falsehood. This might be expected to invoke a deontic interpretation (Kirby’s condition is akin to the ‘production line inspection scenarios’ mentioned before), and so it might be that the improvement observed is for this reason. Platt and Griggs (1993) explored a sequence of instructional manipulations in what they describe as abstract tasks which culminate in 81% correct responding. 
One of the changes they make is to use instructions to ‘seek violations’ of the rule, which is relevant here for its relation to instructions to test the truth of an unreliable source. Their experiments provide some insight into the conditions under which these instructions do and don’t facilitate performance.

Platt and Griggs study the effect of ‘explications’ of the rule and in the most effective manipulations actually replace the conditional rule by explications such as: ‘A card with a vowel on it can only have an even number, but a card with a consonant on it can have either an even or an odd number.’ Note that this explication removes the problematic anaphora (see above, Section 2.5), explicitly contradicts a biconditional reading, and removes the conditional, with its tendency to robust interpretation.

But more significantly still, the facilitation of turning not-Q is almost entirely effected by the addition of ‘seek violations’ instructions, and these instructions probably switch the task from a descriptive to a deontic task. In reviewing earlier uses of the ‘seek violations’ instruction Platt and Griggs note that facilitation occurs with abstract permission and obligation rules but not with the standard abstract task. So, merely instructing to seek violations doesn’t invoke a deontic reading when the rule is still indicative, and the instruction is still interpretable descriptively: ‘violations’ presumably might make the rule false. But combined with an ‘explication’ about what cards can have on them (or with permission or obligation schema) they appear to invoke a deontic reading. As we shall see, 80% seems to be about the standard rate of correct responding in deontically interpreted tasks regardless of whether they contain material invoking social contracts. So the present manipulation does not appear to have been explored before. We predicted that separating the source of the rule from the experimenter while maintaining a descriptive reading of the rule should increase normative responding.

3.1.5. Exploring other kinds of rules than conditionals

This condition of the experiment was designed to explore the malleability of subjects’ interpretations of rules other than conditionals. In particular we chose a conjunctive rule as arguably the simplest connective to understand. As such this condition has a rather different status from the others in that it is not designed to remove a difficulty from a logically similar task but to explore a logical change. Since it was an exploration we additionally asked for subjects’ justification of their choices afterwards. A conjunctive rule was combined with the same instructions as are used in the classical abstract task.

Rule: There is a vowel on one side, and there is an even number on the other side.
The classical logical competence model demands that subjects should turn no cards with such a conjunctive rule: interpreted in the same logic as Wason’s interpretation of his conditional rule, the rule can already be seen to be false of the not-P and not-Q cards. Therefore, under this interpretation the rule is already known to be false and no cards should be turned. We predicted that many subjects would not adopt this interpretation. An alternative, perfectly rational, interpretation of the experimenter’s intentions is to construe the rule as having deontic force (every card should have a vowel on one side and an even number on the other) and to seek cards which might flout this rule other than ones that obviously can already be seen to flout it. If this interpretation were adopted, then the P and Q cards would be chosen.
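The two readings of the conjunctive rule can be contrasted in a few lines. This is an illustrative sketch under stated assumptions, not the authors' procedure: the cards are assumed to show A, K, 4 and 7, and the helper names (`completions`, `status`, `to_check`) are hypothetical. Each card is classified by what its visible face already settles about the rule.

```python
LETTERS = ("vowel", "consonant")
NUMBERS = ("even", "odd")
# Visible faces (assumed to be the usual A, K, 4, 7 array).
VISIBLE = {"A": "vowel", "K": "consonant", "4": "even", "7": "odd"}


def completions(visible):
    """All (letter, number) pairs a card could turn out to be."""
    if visible in LETTERS:
        return [(visible, number) for number in NUMBERS]
    return [(letter, visible) for letter in LETTERS]


def conjunctive(letter, number):
    """'There is a vowel on one side, and an even number on the other side.'"""
    return letter == "vowel" and number == "even"


def status(visible, rule=conjunctive):
    """What the visible face alone settles about this card and the rule."""
    outcomes = {rule(*card) for card in completions(visible)}
    if outcomes == {True}:
        return "complies"
    if outcomes == {False}:
        return "violates"
    return "undecided"


statuses = {name: status(face) for name, face in VISIBLE.items()}

# Descriptive reading: the rule is a claim about all four cards, so a single
# visible violator already makes it false and no card needs turning.
already_false = "violates" in statuses.values()

# Deontic reading: look for possible violators not yet known to be violators.
to_check = [name for name, s in statuses.items() if s == "undecided"]

print(statuses, already_false, to_check)
```

On the descriptive reading the K and the 7 already falsify the rule, which is why the competence model prescribes turning nothing. On the deontic reading the only cards whose compliance is still open are the A and the 4, the P and Q cards.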

Note that this interpretation is deontic even though the rule is syntactically indicative.

3.2. Subjects

Subjects were 377 first year Edinburgh undergraduates, from a wide range of subject backgrounds.

Table 1
Frequencies of card choice combinations by conditions

Condition     P,Q   Q    P    P,¬Q   ¬Q   ¬P,Q   P,Q,¬Q   ¬P,¬Q   All   None   Miscellaneous   Total
Classical     56    7    8    4*     3    7      1        2       9     8      5               108
Two-rule      8     8    2    1      9*   2      1        0       0     2      4               37
Contingency   15    0    3    8*     1    6      4        8       3     0      3               51
Truthfulness  39    6    9    14*    0    7      3        6       8     15     5               112
Conjunction   31    2    9    7      2    0      0        1       0     9*     8               69

Classical logical competence responses are marked with an asterisk (*). Any response made by at least three subjects in at least one condition is categorised: everything else is miscellaneous.

3.3. Method

All tasks were administered to subjects in classroom settings in two large lectures. Subjects were randomly assigned to the different conditions, with the size of sample in each condition being estimated from piloting on effect sizes. Adjacent subjects did different conditions. The materials described above were preceded by the following general instruction:

The following experiment is part of a program of research into how people reason. Please read the instructions carefully. We are grateful for your help.

3.4. Results

Those subjects (12 across all conditions) who claimed to have done similar tasks before, or to have received any instruction in logic, were excluded from the analysis. Table 1 presents the data from all of the conditions. Any response made by at least three subjects in at least one condition is categorised: all other responses are treated as miscellaneous. Subjects were scored as making a completely correct response, or as making at least some mistake, according to the classical logical competence model. For all the conditions except the two-rule task and the conjunction condition, this ‘competence model’ performance is choice of P and not-Q cards.
For the two-rule task the correct response is not-Q.

For the conjunction condition it is to turn no cards. Table 2 presents the tests of significance of the percentages of correct/incorrect responses as compared to the baseline classical condition. 3.7% of subjects in the baseline condition made the correct choice of cards. The percentages completely correct in the other conditions were: two-rule condition 24%; ‘truthfulness’ condition 13%; ‘contingency’ condition 18%; and conjunction condition 13%. The significance levels of these proportions by Fisher’s exact test appear in Table 2.

Table 2
Proportions of subjects completely correct and significances of differences from baseline of each of the four manipulations

Condition            Wrong   Right   P       Percent correct
Classical baseline   104     4               3.7
Two-rule             28      9       0.004   24
Contingency          37      8       0.005   18
Truthfulness         98      14      0.033   13
Conjunction          60      9       0.022   13

The two-rule task elicits substantially more competence model selections than the baseline task. In fact the completely correct response is the modal response. More than six times as many subjects get it completely correct even though superficially it appears a more complicated task. The next most common responses are to turn P with Q, and to turn just Q. The former is the modal response in the classical task. The latter appears to show that even with unsuccessful subjects, this task shifts attention to the consequent cards: turnings of P are substantially suppressed (32% as compared to 80% in the baseline task). Contingency instructions also substantially increase completely correct responding, and do so primarily at the expense of the modal P with Q response. In particular they increase not-Q choice to 50%. Instructions to test the truthfulness of an unreliable source have a smaller effect which takes a larger sample to demonstrate, but nevertheless, 13% of subjects get it completely correct, nearly four times as many as the baseline task.
The main change is again a reduction of P with Q responses, but there is also an increase in the response of turning nothing. Completely correct performance with a conjunctive rule was 13%—not as different from the conditions with conditional rules as one might expect if conditionals are the main source of difficulty.
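For readers who want to recompute the comparisons in Table 2, the sketch below implements a two-sided Fisher exact test directly from the hypergeometric distribution, using only the standard library. It is an illustration under assumptions: the source does not say which variant of the test (one-sided or two-sided) was used, so the p-values printed here need not coincide with the published ones. The Wrong/Right counts are copied from Table 2 as printed, and the function names are hypothetical.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small tolerance so ties with the observed table are included.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def percent_correct(wrong, right):
    return 100 * right / (wrong + right)


# (wrong, right) counts as printed in Table 2.
BASELINE = (104, 4)
CONDITIONS = {
    "Two-rule": (28, 9),
    "Contingency": (37, 8),
    "Truthfulness": (98, 14),
    "Conjunction": (60, 9),
}

for name, (wrong, right) in CONDITIONS.items():
    p = fisher_two_sided(*BASELINE, wrong, right)
    print(f"{name:13s} {percent_correct(wrong, right):5.1f}% correct, p = {p:.4f}")
```

Each condition is compared with the classical baseline separately, mirroring the layout of Table 2. No correction for multiple comparisons is applied in this sketch.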

The modal response with the conjunctive rule is to turn the P and Q cards, just as in the original task. Anecdotally, debriefing subjects after the experiment reveals that a substantial number of these modal responses are explained by the subjects in terms construable as a deontic interpretation of the rule, roughly paraphrased as “The cards should have a vowel on one side and an even number on the other”. The P-with-Q response is correct for this interpretation.

3.5. Discussion of results

Each of the manipulations designed to facilitate reasoning in the classical descriptive task makes it substantially easier, as predicted by the semantic/pragmatic theories that the manipulations were derived from. The fact that subjects’ reasoning is improved by each of these manipulations provides strong evidence that subjects’ mental processes are operating with related categories in the standard laboratory task. Approaches like those of Sperber’s Relevance Theory propose that the subjects solve the task ‘without thinking’. The fact that these instructional manipulations have an impact on subjects’ response strongly suggests that the processes they impact on are of a kind to interact with the content of the manipulations. This still leaves the question: at what level of awareness? But even here, the tutorial dialogues suggest that the level is not so far below the surface as to prevent these processes being quite easily brought to some level of awareness. It is important to resist the idea that if subjects were aware of these problems, that itself would lead to their resolution, and the conclusion that therefore subjects can’t be suffering these problems. Extensive tutoring in the standard task, which is sufficient to lead subjects to make their problems quite explicit, generally does not lead, at least immediately, to stable insight. This is as we should expect.
If, for example, subjects become aware that robustness to counterexamples makes the task instructions uninterpretable, that itself does not solve their problem of how to respond. Or, for another example, if subjects become aware of being unable to reflect contingencies between choices in their responses, that does not solve the problem of what response to make. General questions of what concepts subjects have for expressing their difficulties, and in what ways they are aware of them are important questions, especially for teaching. These questions invite further research through tutoring experiments, but they should not be allowed to lead to misinterpretation of the implications of the present results.

We take each condition in turn.

3.5.1. The two-rule task

There are other possible explanations as to how the novel task functions to facilitate competence model responding. If subjects tend to confuse the two situations “this rule is true of this card” and “this card makes this rule true”, then it may help them that the two-rule task is calculated to lead them early to a conflict: a single card (e.g., the true consequent card) “makes both rules true” even as the instructions insist that one rule is true and one false. Although some subjects may infer that there must therefore be something wrong with the instructions, others progress from this impasse to appreciate that cases can comply with a rule without making it true; the semantic relations are asymmetrical even though the same word ‘true’ can, on occasion, be used for both directions. This confusion between semantic relations is evidently closely related to what Wason early called a ‘verification’ strategy (searching for compliant examples) in that it may lead to the same selections, but it is not the strategy as understood by Wason. This confusion between semantic relations is in abundant evidence in the dialogues. The two-rule task makes an interesting comparison with at least three other findings in the literature. First, the task was designed partly to make explicit the choice of hypotheses which subjects entertain for the kind of rational choice modelling proposed by Oaksford and Chater. Providing two explicit rules (rather than a single rule to be compared with an assumed null hypothesis of independence) makes the false-consequent card unambiguously the most informative card and therefore the one which these models should predict will be most frequently chosen. In our data for this task, the false-consequent card comes in third, substantially behind the true antecedent and true consequent cards.
For a second comparison, Gigerenzer and Hug (1992) studied a manipulation which is of interest because it involves both a change from deontic to descriptive interpretation and from single to two-rule task. One example scenario had a single rule that hikers who stayed overnight in a hut had to bring their own firewood. Cards represented hikers or guides and bringers or non-bringers of wood. As a single-rule deontic task with instructions to see whether people obeyed the rule, this produced 90% correct responding, a typical result.

But when the instructions asked the subject to turn cards in order to decide whether this rule was in force, or whether it was the guides who had to bring the wood, then performance dropped to 55% as conventionally scored. Gigerenzer and Hug explain this manipulation in terms of ‘perspective change’, but this is both a shift from a deontic task to a descriptive one (in the authors’ own words, ‘to judge whether the rule is descriptively wrong’; our emphasis), and from a single rule to a two-rule task, albeit that the second rule is mentioned but not printed alongside its alternative. Unfortunately, the data cannot be scored appropriately for the classical competence model for the two-rule task from what is presented in the paper, but it appears to produce a level of performance higher than single rule abstract tasks but lower than deontic tasks, just as we observe. Direct comparison of the two subject populations is difficult as Gigerenzer’s subjects score considerably higher on all the reported tasks than ours, and no baseline single-rule descriptive task is included. The third comparison of the two-rule task is with work on ‘reasoning illusions’ by Johnson-Laird and coworkers mentioned above (Johnson-Laird & Byrne, 2002; Johnson-Laird, Legrenzi, Girotto, & Legrenzi, 2000; Johnson-Laird & Savary, 1999). Johnson-Laird and Savary (1999, p. 213) presented exactly comparable premises to those we used in our two-rule task but asked their subjects to choose a conclusion, rather than to seek evidence about which rule was true and which false. Their interest in these problems is that mental models theory assumes that subjects ‘only represent explicitly what is true’, and that this gives rise to ‘illusory inferences’. The following material was presented with the preface that both statements are about a hand of cards, and one is true and one is false:

1. If there is a king in the hand, then there is an ace.
2. If there is not a king in the hand, then there is an ace.

Select one of these conclusions:

1. There is an ace in the hand.
2. There is not an ace in the hand.
3. There may or may not be an ace in the hand.

Johnson-Laird and Savary (1999) report that 15 out of 20 subjects concluded that there is an ace in the hand, and the other five concluded that there might or might not be an ace in the hand. They claim that the 15 subjects are mistaken in their inference:

‘Hence, apart from one caveat to which we will return, there is no reasonable interpretation of either the disjunction or the conditionals that yields a valid inference that there is an ace.’ (p. 204)

The caveat appears to be that there are interpretations on which the premises are inconsistent and therefore anything (classically) logically follows, including this conclusion (p. 220). What struck us initially is that our subjects show some facility with reasoning about assumptions of the same form even when our task also requires added elements of selection rather than merely inference. Selection tasks are generally harder. Specifically, our two-rule task introduces the circumstance which Johnson-Laird and Savary claim mental models theory predicts will introduce fundamental difficulty, i.e. reasoning from knowledge that some as yet unidentifiable proposition is false. This introduction makes the selection task much easier for subjects than its standard form in our experiment. On a little further consideration, there is at least one highly plausible interpretation which makes this conclusion valid and is an interpretation which appears in our dialogues from the two-rule task. Subjects think in terms of one of the rules applying and the other not, and they confuse (not surprisingly) the semantics of applicability with the semantics of truth. This is exactly the semantics familiar from the IF ... THEN ... ELSE construct of imperative computer languages. If one clause applies and the other doesn’t then it follows that there is an ace.
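The difference between the two readings of the king/ace premises can be made concrete. The sketch below is illustrative only: `truth_reading` treats both premises as material conditionals of which exactly one is true, while `applicability_reading` treats them as the two branches of an if/else, exactly one of which applies. Both function names are hypothetical.

```python
from itertools import product


def material(p, q):
    """Classical material conditional: 'if p then q'."""
    return (not p) or q


def truth_reading():
    """Values of 'ace' across all models in which exactly one of the two
    conditionals is true (the classical, truth-functional reading)."""
    aces = set()
    for king, ace in product([True, False], repeat=2):
        premise1 = material(king, ace)        # if king then ace
        premise2 = material(not king, ace)    # if not king then ace
        if premise1 != premise2:              # one true, one false
            aces.add(ace)
    return aces


def applicability_reading():
    """Values of 'ace' when exactly one rule *applies*, as in if/else."""
    aces = set()
    for king in (True, False):
        if king:      # rule 1 applies, rule 2 does not
            ace = True
        else:         # rule 2 applies, rule 1 does not
            ace = True
        aces.add(ace)
    return aces


print("truth reading:        ace in", truth_reading())
print("applicability reading: ace in", applicability_reading())
```

On the truth-functional reading every admissible model lacks an ace, because a material conditional can only be false when its consequent is false. On the applicability reading an ace follows whichever branch is taken, which is the conclusion most subjects drew.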

Whether the alternativeness of the rules is expressed metalinguistically (by saying one is false and one true) or object-linguistically (with an exclusive disjunction), and whether the rules are expressed as implications or as exclusive disjunctions, thinking in terms of applicability rather than truth is a great deal more natural and has the consequence observed. Johnson-Laird (personal communication) objects that this interpretation just is equivalent to the mental models theory one. But surely this is a crisp illustration of a difference between the theories. If an interpretation in terms of applicability is taken seriously, subjects should draw this conclusion, and should stick to it when challenged (as many do). In fact failure to draw the inference is an error under this interpretation. Only mental models theory’s restriction to a range of classical logical interpretations makes it define the inference as an error. We will put our money on the subjects having the more plausible interpretation of the conditionals here and the experimenters suffering an illusion of an illusion.

3.5.2. Contingency instructions

As mentioned above, effects of this manipulation have been reported by Wason in early studies, but his theory of the task did not assign it any great importance, or lead him to systematically isolate the effect, or allow him to see the connection between descriptive interpretation and this instruction. In the context of our hypothesis that it is descriptive versus deontic interpretation which is the main factor controlling difficulty of the task through interactions between semantics and instructions, this observation that contingency has systematic and predicted effects provides an explanation for substantial differences between the abstract task and content facilitations which invoke deontic interpretations. None of the other extant theories assign any significant role to this observation.
The effectiveness of contingency instructions presents particular difficulties for current rational choice models, since the choice of false-consequent cards rises so dramatically with an instruction which should have no effect on the expected information gain.

3.5.3. Truthfulness instructions

As described above, the truthfulness condition differs from past attempts to cue subjects to seeking counterexamples. Its success in bringing about a significant if small improvement may have resulted from effects of the manipulation other than the social psychological effects or the more general pragmatic effects of the balance of cooperative and adversarial stances described above. For example, it may well be that at least some subjects are more adept at thinking about the truthfulness of speakers than the truth values of their utterances abstracted from such issues as ignorance or intent to deceive.

3.5.4. The conjunctive rule

The purpose behind the conjunctive version of the task was rather different from the other manipulations, namely to show that many features of the task militate against the adoption of Wason’s intended interpretation of his instructions quite apart from difficulties specific to conditionals. The interpretation of sentence semantics is highly malleable under the forces of task pragmatics. The results show that a conjunctive rule is treated very like (even if significantly differently from) the if ... then rule.

A higher proportion of subjects make the ‘classically correct’ response than in the baseline task (13% as compared to 3.7%) but the modal response is the same (P and Q) and is made by similar proportions of subjects (45% conjunctive as compared to 52% baseline). One possibility is that a substantial number of subjects adopt a deontic interpretation of the rule and are checking for the cards that might be violators but are not yet known to be. It is also possible that these results have more specific consequences for interpretation of the standard descriptive task. We know from Fillenbaum’s (1978) work and from our own paraphrase tasks (Stenning & van Lambalgen, 2001) that about a half of subjects most readily entertain a conjunctive reading of if ... then sentences. The developmental literature reviewed in Evans, Newstead, and Byrne (1993) reveals this interpretation to be even commoner amongst young children. It is most implausible that this interpretation is due merely to some polysemy of the connective ‘if ... then’. Much more plausible is that the conjunctive reading is the result of assuming the truth of the antecedent suppositionally, and then answering subsequent questions from within this suppositional context. Be that as it may, if subjects’ selections in the conditional rule tasks correspond to the selections they would make given an explicit conjunction in the conjunction condition, and we are right that these selections are driven in this condition by an implicitly deontic interpretation of the conjunction, then this suggests a quite novel explanation of at least some ‘matching’ responses in the original conditional task. Perhaps the similar rate of choice of P and Q in the conjunction and ‘if ... then’ conditions points to a substantial number of subjects applying a deontic conjunctive interpretation in the standard task?
This hypothesis in turn raises the question how such a reading would interact with negations in the ‘negations’ paradigm which is the source of the evidence for Evans’s (1972) ‘matching’ theory and therefore the source of one leg of ‘dual process’ theory (Evans & Over, 1996)? If interpretations stemming from deontic readings tend strongly toward wide sentential scope for negation, then one would predict that the rule with negated antecedent would be read as ‘It’s not the case that there is a vowel on one side and an even number on the other’, which would lead to the same choices of A and 4, though for opposite reasons. That is, K and the 7 are now seen as already compliant, and the A and the 4 have to be tested to make sure they don’t have an even number or a vowel, respectively.
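The "same choices, opposite reasons" point can be illustrated by sorting the cards under the conjunctive rule and under its wide-scope negation, both read deontically. This is a sketch under assumptions (cards showing A, K, 4 and 7; hypothetical helper names), not a model the authors propose.

```python
LETTERS = ("vowel", "consonant")
NUMBERS = ("even", "odd")
# Visible faces (assumed to be the usual A, K, 4, 7 array).
VISIBLE = {"A": "vowel", "K": "consonant", "4": "even", "7": "odd"}


def completions(visible):
    """All (letter, number) pairs a card could turn out to be."""
    if visible in LETTERS:
        return [(visible, number) for number in NUMBERS]
    return [(letter, visible) for letter in LETTERS]


def conjunction(letter, number):
    """'A vowel on one side and an even number on the other.'"""
    return letter == "vowel" and number == "even"


def negated_conjunction(letter, number):
    """Wide-scope negation: 'It's not the case that (vowel and even).'"""
    return not conjunction(letter, number)


def partition(rule):
    """Group the cards by what their visible face already settles."""
    groups = {"complies": [], "violates": [], "undecided": []}
    for name, face in VISIBLE.items():
        outcomes = {rule(*card) for card in completions(face)}
        if len(outcomes) == 2:
            groups["undecided"].append(name)
        elif True in outcomes:
            groups["complies"].append(name)
        else:
            groups["violates"].append(name)
    return groups


print(partition(conjunction))
print(partition(negated_conjunction))
```

Under both rules the undecided cards are the A and the 4. What changes is the status of the K and the 7: known violators under the conjunction, known compliers under its wide-scope negation.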

Pursuing this line of thought further suggests that negations in the second clause may not be interpretable in this framework (because of their interactions with the anaphors) and subjects might be forced to interpret them with the same wide scope, again leading to the same card choices, and potentially explaining why ‘matching’ appears to be unaffected by negation. Providing a semantic explanation of course leaves open the questions about what processes operate. Evidently, further research will be required to explore these possibilities. The semantic analyses may seem complex but they make some rather strong predictions about how subjects should react to card turnings. This is an interesting line for future research, holding out the possibility of a semantic basis for matching behaviour. One objection to these various interpretations of the conjunction condition results might be that there are other interpretations of the rule used. Subjects might, for example, have interpreted the rule existentially, as claiming that at least one card had a vowel on one side and an even number on the other. This would lead normatively to the same A and 4 selections. Accordingly, in a follow-up experiment, we revised the conjunctive rule to:

Rule: There are vowels on one side of the cards and even numbers on the other.

It is implausible that this rule might be interpreted existentially.

Table 3. Frequencies of card choice combinations by conditions

Condition               P,Q    Q    P  P,¬Q   ¬Q  ¬P,Q  P,Q,¬Q  ¬P,¬Q  All  None  Misc.  Total
Classical                56    7    8    4*    3     7       1      2    9     8      5    108
Two-rule                  8    8    2    1    9*     2       1      0    0     2      4     37
Contingency              15    0    3    8*    1     6       4      8    3     0      3     51
Truthfulness             39    6    9   14*    0     7       3      6    8    15      5    112
Conjunction              31    2    9    7     2     0       0      1    0     9*     8     69
Baseline 2               10    2   10    1*    1     0       0      0    0     1      3     30
Conjunction 2            21    1    3    1     0     0       0      0    0     1*     2     30
Abstract subjunctive     13    2    8    3*    0     1       2      1    1     0      0     31

The modified conjunction task and its new baseline condition are below the earlier results, which are repeated here for convenience. Classical logical competence responses are marked with an asterisk (*). Any response made by at least three subjects in at least one condition is categorised; everything else is miscellaneous.
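The percentages quoted in the surrounding text can be recomputed from the counts in Table 3. The sketch below is only an arithmetic check. It assumes that the first count in each row is the P,Q response and that the asterisked count is the classically correct response; the dictionary keys and the helper pct are illustrative names.

```python
# Counts copied from Table 3: (P,Q responses, classically correct responses, total).
# Column assignment is an assumption based on the table header and the asterisks.
TABLE3 = {
    "Classical": (56, 4, 108),
    "Conjunction": (31, 9, 69),
    "Baseline 2": (10, 1, 30),
    "Conjunction 2": (21, 1, 30),
    "Abstract subjunctive": (13, 3, 31),
}


def pct(count, total):
    """Percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


for condition, (pq, classical, total) in TABLE3.items():
    print(f"{condition}: P,Q {pct(pq, total)}%, classical {pct(classical, total)}%")
```

On these counts the conjunction condition gives about 45% P,Q and 13% classical responses against about 52% and 3.7% in the classical baseline, and the reworded conjunction gives 70% P,Q with the same one-in-thirty classical rate as its own baseline.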

We ran this rule in another condition with its own baseline condition to ensure comparability of the new population. Table 3 shows the results of this experiment, with the earlier results repeated for convenient comparison. The result was slightly more extreme with this version of the conjunctive rule. 70% of subjects (rather than 45%) chose the P and Q cards. The proportion of classical logical competence model responses was identical to that for the baseline conditional task, and the baseline condition showed the population was comparable. The rewording raised the proportion of subjects giving the modal P and Q response. This rewording of the conjunctive rule appeared to make the universal deontic reading even less ambiguously the dominant reading. These conjunctive rule results illustrate several general issues: how easy it is to invoke a deontic reading of indicative wording; how unnatural it is for naive subjects to adopt an ‘is-this-sentence-literally-true’ perspective rather than a ‘what-are-the-experimenter’s-intentions’ perspective; and that the difficulty of classical interpretation can be as great with conjunction as with implication. Although the difficulties may be different difficulties, there is a real possibility that they are closely related through conjunctive suppositional interpretations of the conditional. Finally, we explored one other obvious manipulation designed to follow up the malleability of subjects’ interpretations exposed by the conjunctive rule. If subjects’ difficulties in the original descriptive task follow from the complexities of descriptive semantics, is it possible to restore deontic levels of performance in the abstract task merely by making the rule subjunctive? We ran a further condition in which the rule used was:

Rule: If a card has a vowel on one side, then it should have an even number on the other.

The instruction was to choose which of the four cards you must turn in order to decide if the card complies with the rule.
The results of this condition are shown in Table 3 in the ‘Abstract subjunctive’ row. Three subjects of 31 turned P and not-Q, as compared to one of 30 in the baseline. If this is a facilitation it is a small one. Merely using subjunctive wording may be insufficient to invoke a deontic reading.

This is not so surprising since there is an alternative ‘epistemic’ interpretation of the subjunctive modal here which might still be used with a descriptive semantics for the underlying rule. Imagine that the rule is clearly a robust descriptive scientific law (perhaps ‘All ravens are black’), then one might easily state in this context, that a card with ‘raven’ on one side should have ‘black’ on the other, implying something about what the cards have to be like to comply with the scientific law (still with a descriptive semantics underlying), rather than what the birds have to do to comply with a legal regulation. This possibility of interpretation may make it hard to invoke a deontic interpretation without further contentful support. Contentful support is, of course, what the various ‘quality inspector’ scenarios provide. Contentful support is also what permission and obligation schemas, and the ‘seek violations’ instructions in combination with modal explications of the rule provide, as reported by Platt and Griggs (1993). In summary of all the conditions, these results corroborate the findings of the tutoring experiments, also reported in Stenning and van Lambalgen (2001), that our manipulations alleviate real sources of difficulty with interpretation for subjects in the original descriptive task—sources of difficulty which do not apply in the deontic task.

This evidence suggests that far from failing to think at all, subjects are sensitive to several important semantic issues posed by the descriptive task.

4. General discussion

What implications do these results have for theories of reasoning, and for the place of interpretation in cognitive theory more generally? What do they tell us about the way the field has viewed the relation between logical and psychological analyses of reasoning, and how that relation might be construed more productively? Each theory is a somewhat different case. These results remove the founding evidence for ‘evolutionary’ theories which propose that the difference in performance on ‘social contract’ conditionals and descriptive conditionals needs to be explained by innate cheating detection modules evolved in the Pleistocene. Our evidence is that the descriptive and deontic tasks are quite different tasks and that the former is fraught with interpretational problems where the latter is straightforward. So the selection task evidence has no direct bearing on innateness, modularity, or the Pleistocene, though it can be used to formulate some interesting and contrary hypotheses about cheating detection (Stenning, 2002). More generally, this reappraisal of the selection task provides a good example of how arguments for ‘massive modularity’ in cognition should be treated with some scepticism.

The original experiments found variation in performance as a function of difference in materials. Sweeping generalisations were then made from the laboratory task without any consideration of the relation between that task and subjects’ other communication and reasoning abilities. Just as our analysis directs attention to the differences between variations on the selection task and the continuities between natural language communication inside and outside the selection task, so our proposals return attention to the evolutionary issue of how humans’ generalised communication capacities arose in evolution. The interactions between logic’s dual apparatus of interpretation and of derivation constitute an exquisitely context sensitive conceptual framework for the study of human reasoning and communication, whether in evolution, development or education.

The non-evolutionary theories of human reasoning are most generally affected by the present results through their implications for the relation between logic and psychology. We focus here particularly on relevance theory, mental models theory, and rational analysis models. Inasmuch as relevance theory assumes that human reasoning and communication abilities are general abilities which interact with contextual specificities, our general drift is sympathetic to relevance theory’s conclusions.

We agree with relevance theory that the goal must be to make sense of what subjects are doing in the very strange situation of laboratory reasoning tasks—in a memorable phrase, to see subjects as ‘pragmatic virtuosos’ (Girotto, Kemmelmeier, van der Henst) rather than to see them as logical defectives. Our divergences from relevance theory are about the granularity of interaction between semantic and pragmatic processes in subjects’ reasoning; in the range of behaviour we believe to be of theoretical concern; and in the program of research. Relevance theory explains pragmatic effects in terms of very general factors—relevance to the task at hand and cost of inference to reveal that relevance.

These factors must always operate with regard to some semantic characterisation of the language processed. Condensing analysis into these two pragmatic factors, however, seems, in this case at least, to have led relevance theorists to miss the critical semantic differences which drive the psychological processes in this task: the differences between deontics and descriptives and their consequences for interpretation in this task’s setting.

Grice's theory’s conclusion has been that not much reasoning goes on when undergraduate subjects get the abstract task ‘wrong’.

Our combination of tutoring observations and experiment strongly suggests that a great deal goes on, however speedily the ‘precomputed’ attitudes are brought to bear in the actual task, and that the exact nature of the processes is highly variable from subject to subject.

Taking logic more seriously leads us to seek more detailed accounts of mental processes. The current results have rather wide-ranging implications for mental models theory. Some implications specific to the theory’s application to the selection task have already been discussed.

Others are more general, about mental models theory’s relation to logic and semantics. Since Johnson-Laird’s early work with Wason on the selection task, mental models theory has been elaborated by a complex theory of the meanings of conditionals and the overlay of semantics by ‘pragmatic modulation’, and the theory has been much exercised by the issue of whether subjects’ interpretations of the rule in the selection task are truth-functional or not. However, this consideration of semantic possibilities has been divorced from any consideration of their implications for the subjects’ interpretation of the task. If subjects’ reading of the rule is non-truth-functional (by whatever semantic or pragmatic route), then the subject should experience a conflict between their interpretation and the task instructions. This conflict has never been acknowledged by mental models theorists. What justification can there then be for applying the classical logical competence model as a criterion of correct performance while simultaneously rejecting it as an account of how subjects interpret the conditional? But the most significant implications of our analysis for mental models theory are implications for its general understanding of the relation between logic and psychology. Mental models theory and its opponents, such as the ‘mental logics’ of Rips (1994), agree in assigning greatest prominence to the issue of whether subjects reason using models or rules. Our claim is that both camps’ interpretations of these logical concepts are too mechanical, and the consequence is that the psychological investigations fail to give empirical content to the distinction.

Modern logic formalises its concept of interpretation in model theory, and of derivation (including rules of inference) in proof theory. Of course, one can reason over types of model but only within some meta-language (often in practice a natural language such as English or a formal language such as set theory), which, of course, in turn requires its own proof-theory and rules of inference. So rules are involved in this reasoning too. One sees this issue exemplified in mental models theory, which of necessity must also include principles for the manipulation of models. The principles are rules for the manipulation of model representations, and are just as formal, linguistically specified and content free as rules of inference in sentential systems. In fact there are point-by-point correspondences, not just general equivalences. For the systems of concern in the psychology of reasoning, logic provides completeness proofs.

The import of those proofs is that any inference described in a semantic way using models can be captured by a syntactic process using sentential rules. In fact models in mental models theory are what proof theorists call cases in a proof-by-cases strategy. Looking from the outside, as psychological researchers are forced to do, one cannot distinguish rules from models on the basis of observing merely the inputs and outputs of reasoning processes (see Hodges, 1993 for a logician’s appraisal of mental models theory’s account of its relation to logic). Coopting the interpretational apparatus of logic as a mechanism for modelling derivational processes, merely obscures the crucial distinction between interpretation and derivation.
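To make the semantic side of this correspondence concrete, here is a minimal Python sketch of model-based entailment for classical propositional logic: an argument is valid when the conclusion holds in every valuation that satisfies the premises. It is an illustration only, it assumes the material-conditional reading of ‘if p then q’, and the function and variable names are hypothetical. Each valuation enumerated here plays the role of one case in a proof-by-cases strategy.

```python
from itertools import product


def entails(premises, conclusion, atoms):
    """True if the conclusion holds in every model that satisfies all premises."""
    for values in product([True, False], repeat=len(atoms)):
        model = dict(zip(atoms, values))
        if all(premise(model) for premise in premises) and not conclusion(model):
            return False  # found a countermodel
    return True


def if_p_then_q(model):
    # Material conditional (an assumption of this sketch).
    return (not model["p"]) or model["q"]


def p(model):
    return model["p"]


def q(model):
    return model["q"]


def not_p(model):
    return not model["p"]


def not_q(model):
    return not model["q"]


ATOMS = ["p", "q"]
modus_ponens = entails([if_p_then_q, p], q, ATOMS)
modus_tollens = entails([if_p_then_q, not_q], not_p, ATOMS)
affirming_the_consequent = entails([if_p_then_q, q], p, ATOMS)
print(modus_ponens, modus_tollens, affirming_the_consequent)
```

Observed from outside, this checker returns the same verdicts that a sound and complete set of sentential rules would return for these arguments, which is the point of the completeness results mentioned above.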

These are general logical arguments about correspondences between classes of system. Stenning and Oberlander (1995) and Stenning and Yule (1997) provided detailed studies of the two most relevant particular equivalences between model and rule systems: between mental models and Euler diagrams, and between mental models and a fragment of propositional logic. Stenning and van Lambalgen (submitted) provide a non-monotonic model of conditionals which shows how sentences and models work together in the processes of interpretation and reasoning. These arguments show that the issue of rules versus models has not yet been given any empirical content. The psychological debate misconstrues logic by treating it as providing mechanisms of reasoning, whereas it should be construed at a more abstract level. For example, our present proposals about the selection task claim that the dominant factor in determining reasoning will be whether subjects assign descriptive or deontic form to the rule presented. The processes of reasoning from either of these assigned interpretations can be formulated as sentential reasoning or as model-based reasoning, or as some combination of the two (see, e.g., Stenning & van Lambalgen, submitted for a treatment of the ‘suppression’ task in these terms). Finally, where do our findings leave the rational analysis models of selection task behaviour as optimal experiment (Oaksford & Chater, 1994)? We applaud these authors’ challenge to the uniqueness of the classical logical model of the task, and also their insistence that the deontic and descriptive versions of the task require distinct accounts. This theory is clearly more sophisticated about the relations between formal models and cognitive processes than the theories it challenges. However, our proposals are quite divergent in their cognitive consequences. The rational analysis models reject any role for logic, claiming that the task is an inductive one. But this move smuggles logic in the back door.
Applying optimal experiment theory requires assigning probabilities to propositions, and propositions are specified in some underlying language. The logic underlying the rational analysis model is the same old classical propositional calculus with all its attendant divergences from subjects’ interpretations of the task materials. This has direct psychological consequences.

The rational analysis models treat subjects’ performances as being equally correct as measured by the two distinct competence models for descriptive and deontic tasks.

Our analysis predicts that the descriptive task will be highly problematical and the deontic task rather straightforward. The tutorial evidence on the descriptive task and its experimental corroboration support our prediction about the descriptive task. Approaching through interpretation predicts and observes considerable variety in the problems different subjects exhibit in the descriptive task, and even variety within the same subject at different times.

We can agree that some subjects may adopt something like the rational analysis model of the task, but disagree about the uniformity of this or any other interpretation. Most of all we do not accept that everyone is doing the same thing at the relevant level of detail. This situates our approach with regard to some prominent psychological theories of reasoning, and illustrates similarities and differences with extant approaches in the context of this one particular task. But our proposals also have general implications for how cognitive theories of reasoning relate to logical and linguistic theories of language and communication more generally. If we are anything like right about the selection task, it is both possible and necessary to bring the details of formal accounts of natural languages (semantics of deontics and descriptives, variable and constant anaphora, tense, definiteness, domain of interpretation, scope of negation, ... ) to bear in explaining the details of performance in laboratory reasoning tasks.

This is necessary because subjects’ behaviour in these tasks is continuous with generalised human capacities for communication, and possible because although strange in many ways, laboratory tasks have to be construed by subjects using their customary communicative skills. Once this apparatus is transferred to the psychological laboratory, it can yield powerful explanatory theories of why small details of the materials yield large changes in behaviour.

For example, the empirical evidence is that the dominant factor controlling behaviour in the selection task is the highly abstract formal distinction between deontic and descriptive interpretation. But finding out how the details of the materials trigger the application of this distinction is a complex matter.

Psychologists need the abstractions provided by semantics as a basis for studying implementations in the mind.

Logicians and linguists have much to gain from the data generated in the strange communications that go on in the psychological laboratory.

These communications put subjects’ interpretative skills under so much more stress than is customary, that they bring the interpretative issues to the surface. In fact laboratory tasks have much in common with the curious communicative situation that is formal education and another benefit of the current approach is that it stands to reconnect the psychology of reasoning with educational investigations.

With very few exceptions (e.g., Stanovich & West, 2000), psychologists of reasoning do not ask what educational significance their results have. They regard their theories as investigating ‘the fundamental human reasoning mechanism’ which is independent of education. On our account, the descriptive selection task is interesting precisely because it forces subjects to reason in vacuo and this process is closely related to extremely salient educational processes which are aimed exactly at equipping students with generalisable skills for reasoning in novel contexts more effectively. For example, the balance of required cooperative assumption of the background rule and adversarial test of the foreground rule in the descriptive selection task, is absolutely typical of the difficulties posed in the strange communications involved in examination questions.

Many cross-cultural observations of reasoning can be understood in terms of the kinds of discourse different cultures invoke in various circumstances.

The discourses established by formal education are a very distinctive characteristic of our culture (see e.g., Bloom & Broder, 1950; Hill & Parry, 1989).

There is often held to be something of a crisis in education in teaching these very reasoning and thinking skills.

The prejudice against logically based accounts of human reasoning cuts off the insights of psychology from application to the educational problem.

The community who teach reasoning are often as allergic to formal semantics as are psychologists of reasoning, largely because of past simplistic attempts to apply formal theories in monolithic ways.

Now that logic is less monolithic, the fields cannot afford to continue avoiding each other.


There are a great variety of specific deontic stances which all share this feature that they deal in what is ideal relative to some criterion.

‘Language’ should not be taken too literally here, since we do not want to exclude systems for reasoning with diagrams.

The following list with comments reflects the logician’s practice.

Textbooks are typically devoted to single, or at most a few, systems, and do not treat the matter in this generality.

An exception is Gabbay’s "Elementary Logics" although it has a more syntactic perspective than the one advocated here.

It is not possible here to describe these examples, which often have to do with ‘plausible inference’.

A worked-out example can be found in van der Does and van Lambalgen.

As a good analogy, the reader may think of statistical inference: on the basis of a statement ψ about a sample S, one concludes a statement ϕ about a population P from which S was drawn.

Hence the set of models relevant for the premises (samples) is disjoint from the set of models relevant to the conclusion (populations).

In the following, ⊨ is the ‘makes true’ or ‘supports’ relation.

In the psychological literature one may sometimes find a superficially similar distinction between descriptive and deontic conditionals.

See, for example, Oaksford and Chater, who conceive of a deontic conditional as material implication plus an added numerical utility function.

The preceding proposal introduces a much more radical distinction in logical form.

Marian Counihan ran an additional 10 subjects.

The results show a striking resemblance to the ones reported here.

In that very small sample the base-line testing prior to tutoring showed no simple increase in proportion of completely correct performance (on the classical model), although tutoring in the two-rule task was more effective than in the classical task.

The actual experimental set-up is much more complicated and not quite comparable to the experiments reported here.

Four combinations, because the deictic back/face reading of ‘one side ... other side’ appeared to be too implausible to be considered.


Bloom, B. S., & Broder, L. J. (1950). Problem solving processes of college students. Chicago: University of Chicago Press.

Byrne, R. M. J. (1989). Suppressing valid inferences with conditionals. Cognition, 31, 61–83.


Cheng, P., & Holyoak, K. (1985). Pragmatic reasoning schemas. Cognitive Psychology, 17, 391–416.

Cosmides, L. (1989). The logic of social exchange: Has natural selection shaped how humans reason? Studies with the Wason selection task. Cognition, 31, 187–276.

Cosmides, L., & Tooby, J. (1992). Cognitive adaptations for social exchange. In J. Barkow, L. Cosmides, & J. Tooby (Eds.), The adapted mind: Evolutionary psychology and the generation of culture (pp. 163–228). New York: Oxford University Press.

Cummins, D. (1996). Evidence for the innateness of deontic reasoning. Mind and Language, 11, 160–190.

Evans, J. (1972). Interpretation and ‘matching bias’ in a reasoning task. Quarterly Journal of Experimental Psychology, 24, 193–199.

Evans, J., Newstead, S., & Byrne, R. (1993). In S. Newstead & R. Byrne (Eds.), Human reasoning: The psychology of deduction. Hove: Lawrence Erlbaum.

Evans, J., & Over, D. (1996). Rationality and reasoning. Hove: Psychology Press.

Fiddick, L., Cosmides, L., & Tooby, J. (2000). The role of domain-specific representations and inferences in the Wason selection task. Cognition, 75, 1–79.

Fillenbaum, S. (1978). How to do some things with it. In Cotton & Klatzky (Eds.), Semantic functions in cognition. Lawrence Erlbaum Associates.

Gabbay, D. (1993). A general theory of structured consequence relations. In P. Schroeder-Heister & K. Došen (Eds.), Substructural logics. Oxford: Clarendon Press.

Gebauer, G., & Laming, D. (1997). Rational choices in Wason’s selection task. Psychological Research, 60, 284–293.

Gigerenzer, G., & Hug, K. (1992). Domain-specific reasoning: Social contracts, cheating, and perspective change. Cognition, 43, 127–171.

Girotto, V., Kemmelmeier, M., van der Henst, J.-B. Inept reasoners or pragmatic virtuosos? Relevance in the deontic selection task. Cognition, 81, B69–B76.

Grice, H. P. Indicative conditionals.
-- Logic and conversation. In P. Cole and J. Morgan (Eds.), Syntax and semantics: Vol. 3. Speech acts. London: Academic Press.
-- Studies in the way of words.
-- Retrospective epilogue
-- Aspects of reason.

Griggs, R. A., & Cox, J. R. (1982). The elusive thematic materials effect in Wason’s selection task. British Journal of Psychology, 73, 407–420.

Goodman, N. (1954). Fact, fiction and forecast. London University Press.

Henle, M. (1962). On the relation between logic and thinking. Psychological Review, 69, 366–378.

Hill, C., & Parry, K. (1989). Autonomous and pragmatic models of literacy: Reading assessment in adult education. Linguistics and Education, 1, 233–283.

Hodges, W. (1993). The logical content of theories of deduction. Commentary on Johnson-Laird & Byrne Deduction. Behavioural and Brain Sciences, 16(2), 353–354.

Hughes, M. The use of negative information in concept attainment. Ph.D. thesis, University of London.

Johnson-Laird, P., & Byrne, R. (2002). Conditionals: A theory of meaning, pragmatics and inference. Psychological Review.

Johnson-Laird, P., Legrenzi, P., Girotto, V., & Legrenzi, M. (2000). Illusions in reasoning about consistency. Science, 288, 531–532.

Johnson-Laird, P., Legrenzi, P., & Legrenzi, S. (1972). Reasoning and a sense of reality. British Journal of Psychology, 63, 395–400.

Johnson-Laird, P., & Savary, F. (1999). Illusory inferences: A novel class of erroneous deductions. Cognition, 71(3), 191–229.

Kirby, K. (1994). Probabilities and utilities of fictional outcomes in Wason’s selection task. Cognition, 51(1), 1–28.

Manktelow, K., & Over, D. (1990). Inference and understanding: A philosophical perspective. London: Routledge.

Margolis, H. (1988). Patterns, thinking, and cognition: A theory of judgement. University of Chicago Press.

Newstead, S. (1995). Gricean implicatures and syllogistic reasoning. Journal of Memory and Language, 34, 644–664.

Oaksford, M., & Chater, N. (1994). A rational analysis of the selection task as optimal data selection. Psychological Review, 101, 608–631.

Platt, R., & Griggs, R. (1993). Facilitation in the abstract selection task: The effects of attentional and instructional factors. Quarterly Journal of Experimental Psychology-A, 46(4), 591–613.

Rips, L. (1994). The psychology of proof. Cambridge, MA: MIT Press.

Cara, F., and Girotto, V. (1995). Relevance theory explains the selection task. Cognition, 57, 31–95.

Stanovich, K., & West, R. (2000). Individual differences in reasoning: Implications for the rationality debate? Behavioural and Brain Sciences, 23, 645–726.

Stenning, K. (2002). Seeing reason. Image and language in learning how to think. Oxford: Oxford University Press.

Stenning, K., & Cox, R. (submitted). Rethinking deductive tasks: Relating interpretation and reasoning through individual differences.

Stenning, K., & Oberlander, J. (1995). A cognitive theory of graphical and linguistic reasoning: logic and implementation. Cognitive Science, 19, 97–140.

Stenning, K., and van Lambalgen, M. Semantics and psychology: Wason’s selection task as a case study. Journal of Logic, Language and Information 10

Stenning, K., and van Lambalgen, M. A working memory model of relations between interpretation and reasoning.

Stenning, K., and Yule, P. Image and language in human reasoning: A syllogistic illustration. Cognitive Psychology, 34

Stenning, K., Yule, P., and Cox, R. Quantifier interpretation and syllogistic reasoning. In G. W. Cottrell (Ed.), Proceedings of the 18th Annual Conference of the Cognitive Science Society. Mahwah, NJ: Erlbaum.

Strawson, P. F. Introduction to logical theory.
-- "If" and "->" in PGRICE, Philosophical grounds of rationality: intentions, categories, ends. Oxford: Clarendon.

van der Does and van Lambalgen. A logic of vision. Linguistics and Philosophy 23.

van Duyne, P. Realism and linguistic complexity. British Journal of Psychology, 65.

Wason, P. C. Reasoning about a rule. Quarterly Journal of Experimental Psychology, 20.

Wason, P. C. Problem solving. In R. Gregory (Ed.), Oxford companion to the mind.

Wason, P. C. and D. Green, Reasoning and mental representation. Quarterly Journal of Experimental Psychology, 36

Wason, P., and Johnson-Laird, P. Proving a disjunctive rule. Quarterly Journal of Experimental Psychology, 21.

Wetherick, N. E. On the representativeness of some experiments in cognition. Bulletin of the British Psychological Society, 23.
