The Grice Club

The club for all those whose members have no (other) club.

Is Grice the greatest philosopher that ever lived?

Saturday, June 12, 2010

Causation vs Correlation

I have to leave very soon on a task, so I will begin by referring readers to Wikipedia's article "Correlation does not imply causation" for excellent summaries of the relationship between Causation and Correlation as mainstream mathematics/statistics/probability/science understands these ideas.

There are some fallacies and errors associated with the mainstream approach, especially with the mainstream view of CORRELATION. Correlation is a grouped or "aggregated" concept in the mainstream view, but in the Probable Causation/Influence (PI) view -- that is, the view built on the quantity P(A-->B) -- it is a relationship or association between TWO INDIVIDUALS or TWO INDIVIDUAL EVENTS.

I will try to return in a few hours.

Osher Doctorow

22 comments:

  1. Good. We shall be looking forward to it. Especially since we have an anti-Humean amongst us (J). Just kidding.

    I discussed this in my PhD -- I HAD to write about logic, since, well, you know how PhD things are -- they are the outcome of some seminars you have attended, etc. And I had attended a long seminar on Susan Haack!

    So I tackled, er, ... "and". And there is NOTHING in 'and' which merits a 'causal' approach. There is this new film with Jennifer Lopez. I didn't see it, but read the promotion. It's called "Plan B". The poster has these three things:

    1. to fall in love
    2. to get married
    3. to have a child

    She adds, in naughty handwriting, "but not necessarily in that order" -- which is exactly Strawson's point about 'and' and '/\' in Introduction to Logical Theory. He fails to note the 'implicature' which results from abiding by "be orderly". But why should we?

    (Grice's and Urmson's example, in Kramer's paraphrase, becomes "She stripped and went to bed" (versus "She went to bed and stripped").)

    ----

    The other point has to do with Grice's frequent, indeed overfrequent or hyperfrequent (I discussed this with the best man I could discuss it with at the time: Stampe) use of 'cause'. He is seen as a 'causalist', after all ("Causal theory of perception", causal theory of meaning, beliefs causing intentions, etc.) -- but he often dodges a good ontological-cum-epistemological account of 'cause'. My vintage Grice seems to be the most amusing. In WoW he considers:

    "Charles I was beheaded".

    What was the cause of Charles I's death?

    Surely his having been beheaded. Yet, by pre-Humean accounts, 'cause' involves some 'willingness'. And yet, it would be ridiculous to say, to echo Grice explicitly:

    (WoW, p. 162):

    "Charles I's decapitation willed his death"

    !

  2. -- So we are left with transitional correlation, i.e. time-sequential correlation, of the type Grice explores in terms of von Wright's non-mainstream logic of events. It is temporal sequencing that '... causes ...' replicates. There IS a sense of actual potency or 'causal power' (to echo Madden/Harre), but the point is what sense to make of that, qua ontologist.

    In "Actions and Events", Grice concludes that 'cause' is best seen, back as it WAS seen in Greek, in the first uses of 'aitia', as what's going on when we say that HE was a rebel 'without a cause'. I.e. he had no cause TO rebel. Not that there was nothing that provoked his rebelion. The finis, or telos, seems to be paradigmatic of 'cause' for the latter, teleological, finalist, Grice. A good way to end one's philosophy, I would gather.

  3. Good. At least, I hope so.

    Let us now compare:

    1) (A-->B) = (AB ' ) ' = (A ' U B)
    2) (A<-->B) = (A-->B)(B-->A) = (A ' U B)(B ' U A) = A ' B ' U A ' A U BB ' U BA = AB U A ' B '

    from which (2) we get:

    3) (A<-->B) = (AB) U (A ' B ' )

    I have used BA here to be the same as AB rather than in the Conditional Probability sense of the probability of B given A (in fact, henceforth I will simply write the latter as P(B given A)). This is in accord with mainstream set theory and probability, where AB = BA is the intersection of sets/events A and B (more often written as A upside-down U B instead of AB). Readers who prefer the logical analogs of everything can simply replace AB by a ^ b, where a is the proposition that asserts the set/event A and b is the corresponding proposition for B, and replace A ' (the complement of A) by ~a (the negation of a) and so on, with U (set union) replaced by v or V (inclusive disjunction, that is, and/or, which is either a or b or both).

    We take probabilities of the sets in (1) and also in (3) to obtain:

    4) P(A-->B) = 1 + P(AB) - P(A)
    5) P(A<-->B) = P(AB) + P(A ' B ' )

    with A ' B ' the intersection of A ' and B ' .

    Notice that the addition + in (5) is because AB and A ' B ' are disjoint or mutually exclusive (A and its complement A ' for example do not intersect except in the null set).

    I have stated that P(A-->B) is the "Probability that A influences or Causes B", which can be abbreviated as Probable Causation/Influence of B by A, or PI of B by A.

    Now P(A<-->B) is a candidate for "The Probable Correlation of A and B". The idea here is that it is the probability that A and B occur together.
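    Here is a minimal sketch in Python, assuming a finite sample space with equally likely points (the events and numbers are purely illustrative, not anything from the discussion above), of how (4) and (5) can be computed directly from sets and cross-checked against the set identities (1) and (3):

        # A minimal sketch (uniform probability on a small finite sample space;
        # the events below are purely illustrative) of computing P(A-->B) and
        # P(A<-->B) directly from sets, as in (1)-(5) above.

        from fractions import Fraction

        omega = set(range(10))          # sample space of 10 equally likely points
        A = {0, 1, 2, 3, 4}             # illustrative event A
        B = {3, 4, 5, 6}                # illustrative event B

        def P(E):
            """Probability of event E under the uniform measure on omega."""
            return Fraction(len(E), len(omega))

        AB = A & B                      # the intersection, written AB above
        Ac, Bc = omega - A, omega - B   # the complements A ', B '

        p_implies = 1 + P(AB) - P(A)    # (4): P(A-->B) = 1 + P(AB) - P(A)
        p_iff = P(AB) + P(Ac & Bc)      # (5): P(A<-->B) = P(AB) + P(A'B')

        # Cross-checks against the set identities (1) and (3):
        assert p_implies == P(omega - (A & Bc))     # (A-->B) = (AB ' ) ' = A ' U B
        assert p_iff == P((A & B) | (Ac & Bc))      # (A<-->B) = AB U A'B'

        print(p_implies, p_iff)         # 7/10 and 1/2 for these illustrative events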

    Readers who prefer Correlation as a group or aggregated quantity obtained from calculating many data points (literally an "averaged" quantity) should realize that Correlation, or more properly Mainstream Probability-Statistical Correlation, involves two different quantities:

    6) A theoretical POPULATION Correlation which is regarded as a fixed characteristic of the entire population regardless of how one takes samples or random samples from that population.

    7) A SAMPLE Correlation which is calculated from a particular random sample of specified size from the population.

    The first, (6), and the second, (7), are BOTH Aggregates or Grouped quantities, which in rough English mean:

    8) The average degree of SIMULTANEOUS RELATIONSHIP between 2 variables, let us say U and V, as calculated for either all Individuals in the population or all Individuals in a random sample of particular size from that population.

    The particular simultaneous relationship chosen in the most commonly used scenarios in physical sciences and pure mathematics is COVARIANCE, which is a generalization of VARIANCE. VARIANCE in rough English means VARIABILITY or "fluctuation" - how much a variable fluctuates from its population or sample mean (for "nice" variables, its central value). COVARIANCE in rough English means how much two variables fluctuate together from their central values - the average of the product of their fluctuations.
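    For readers who want the rough English turned into something checkable, here is a minimal sketch of sample Variance and Covariance in Python. The data values are purely illustrative, and I use the population-style divisor n (one of several conventions):

        # A minimal sketch of sample VARIANCE and COVARIANCE for two variables U, V.
        # The data values are purely illustrative.

        U = [1.0, 2.0, 3.0, 4.0, 5.0]
        V = [2.1, 3.9, 6.2, 8.1, 9.8]

        def mean(xs):
            return sum(xs) / len(xs)

        def variance(xs):
            """Average squared fluctuation of xs about its mean (divisor n)."""
            m = mean(xs)
            return sum((x - m) ** 2 for x in xs) / len(xs)

        def covariance(xs, ys):
            """Average product of the two variables' fluctuations about their means."""
            mx, my = mean(xs), mean(ys)
            return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

        print(variance(U), variance(V), covariance(U, V))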

    If readers detect an arbitrariness in these definitions or rough definitions, then it should come as no surprise to them that there is an arbitrariness in regarding P(A<-->B) as "Probable Correlation" in the new theory that I am referring to as "PI" (Probable Causation/Influence). The question then becomes: which "arbitrariness" is more useful, or should both be used, and where?

    I will hopefully eventually return to these questions, but readers can also realize that the simplest-appearing ideas may have slight variations or alternatives that are only ignored at one's "peril" or "danger" in terms of restricting knowledge or wisdom.

    Osher Doctorow

  4. I should slightly modify my statement below (5) that P(A<-->B) is the probability that A and B occur together. It is actually the probability that (A-->B) and (B-->A) occur together, which translates into "The probability that the influence or causation of A on B and the influence or causation of B on A occur together." So one can simply if somewhat roughly say that P(A<-->B) is the probability that A and B occur together in their mutual influences or mutual causations!

    Also, notice that (A)(B) translates as the intersection of A and B, which is also written as AB or BA, except that when A is a more complicated expression like (C U D), it is better to write (C U D)(B) than CUDB, since the latter creates confusion; similarly if B is a more complicated expression like (E U F).

    Osher Doctorow

  5. There is even more arbitrariness in the Mainstream (the non-PI) definition of Correlation than I have indicated. In addition to all of the above, the Mainstream definition gets NORMALIZED or STANDARDIZED which roughly means it gets divided by the product of the standard deviations of the two variables, where the Variance is the standard deviation multiplied by itself (the "square" of the standard deviation).
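    A minimal sketch of that normalization step, with purely illustrative data and Python's statistics module supplying the (population) standard deviations:

        # A minimal sketch of the normalization: covariance divided by the
        # product of the two standard deviations gives the (Pearson) correlation.
        # The data values are purely illustrative.

        from statistics import fmean, pstdev

        U = [1.0, 2.0, 3.0, 4.0, 5.0]
        V = [2.1, 3.9, 6.2, 8.1, 9.8]

        mu, mv = fmean(U), fmean(V)
        cov = sum((u - mu) * (v - mv) for u, v in zip(U, V)) / len(U)
        corr = cov / (pstdev(U) * pstdev(V))   # normalized correlation, lies in [-1, 1]
        print(cov, corr)                       # roughly 3.92 and 0.999 for these data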

    Readers can begin to see why Einstein, Heisenberg, Bohr, Schrodinger, Born and others had so many arguments in quantum physics - many of the arguments implicitly involved probabilities, variances, covariances, correlations, and so on, in none of which were any of the physicists mathematical experts, and none of which had been subjected to philosophical analysis of ALTERNATIVE FORMULATIONS (at least, as far as the record or literature shows).

    Osher Doctorow

  6. Good! To add: a few notes on von Wright, taken directly from Grice, which may relate:

    --- Since the thing is a stretch I will propose it in an independent blog post.

  7. I attempted to post a comment after the last one by J. L. Speranza, but it did not print - I think that it was too long and "timed out".

    To summarize much more briefly, from the equation:

    1) P(A<-->B) = P(AB) + P(A ' B ' )

    we see that Probable Correlation interpreted as P(A<-->B) has the desirable property of "almost" being P(AB), the probability that A and B occur together, except for the quantity P(A ' B ' ), also written P(A'B'). If the latter is 0, then it is exactly P(AB). This is even simpler than the mainstream probability-statistics Correlation.

    Although (1) is much simpler than most statistical equations or calculations, it has a similar meaning to simple physics laws such as Newton's F = ma or the force of gravity F = Gm1m2/r-squared, namely that it holds EXACTLY for all possible values of the variables or constants or quantities involved, and so is equivalent to a large (even possibly infinite) number of exact scientific or "mathematical" experiments.

    Osher Doctorow

  8. Up to this point, we have been considering (random) set/events A, B, and I will explain roughly why everything also holds for random variables (typically written X, Y, Z, W, etc.).

    In mainstream probability-statistics, the univariate Cumulative Distribution Function (cdf) F or F_X or FX for brevity (if no confusion is likely with multiplication) of random variable X is defined as:

    1) FX(x) = P(X < = x), where x is a particular numerical value of the random variable X.

    Let A or A_X or AX be defined as:

    2) A = (X < = x)

    where (X < = x) is the set:

    3) (X < = x) = {w: X(w) < = x}

    where w is any point such that random variable X evaluated at the point w satisfies X(w) < = x.

    We then have:

    4) P(A) = FX(x).

    More precisely, let us write A_X(x) = (X < = x), then we have:

    5) P(A_X(x)) = P(X < = x) = FX(x)

    We can now define:

    6) FX-->Y(x, y) = P(X-->Y)(x, y) = P(A_X(x) --> B_Y(y)) where B_Y(y) = (Y < = y) = {w: Y(w) < = y}.

    We then have, from P(A-->B) = 1 + P(AB) - P(A):

    7) FX-->Y(x, y) = 1 + F(x, y) - FX(x)

    where F(x, y) = P(X < = x AND Y < = y) = P(X < = x, Y < = y) is the bivariate cdf of random variables X and Y.
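    Here is a minimal sketch of (6)-(7) in Python, assuming EMPIRICAL cdfs built from a small, purely illustrative paired sample (the data and function names are my own, for illustration only):

        # A minimal sketch of F_{X-->Y}(x, y) = 1 + F(x, y) - F_X(x), using
        # empirical cdfs built from illustrative paired data.

        from fractions import Fraction

        data = [(0.2, 1.5), (0.7, 0.3), (1.1, 2.2), (1.8, 1.9), (2.5, 0.8)]

        def F_X(x):
            """Empirical univariate cdf: fraction of sample points with X <= x."""
            return Fraction(sum(1 for (xi, _) in data if xi <= x), len(data))

        def F(x, y):
            """Empirical bivariate cdf: fraction of points with X <= x and Y <= y."""
            return Fraction(sum(1 for (xi, yi) in data if xi <= x and yi <= y), len(data))

        def F_X_to_Y(x, y):
            """PI cdf of (7): 1 + F(x, y) - F_X(x)."""
            return 1 + F(x, y) - F_X(x)

        print(F_X_to_Y(1.0, 1.0))       # 1 + 1/5 - 2/5 = 4/5 for this sample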

    Osher Doctorow

  9. It turns out that every random variable X is uniquely specified by its cdf F_X or FX or F if no confusion is likely.

    If random variable X has a probability density function (pdf) f or f_X or fX, then a similar uniqueness holds, and we can define:

    1) fX-->Y(x, y) = 1 + f(x, y) - fX(x)

    where f(x, y) is the bivariate pdf of random variables X and Y.

    So everything that has been described for (random) sets A, B, also goes through for random variables X, Y, for their cdfs FX, FY, and, if they exist, for their pdfs fX, fY.

    Osher Doctorow

  10. I like that. I have been following Grandy/Warner (in the Stanford entry cited by Doctorow in "Probably Grice: Grice on probability") in using 'p' and 'q'! (Who says I don't mind them? -- vide Robin Tolmach Lakoff, "On minding your ps and qs"). Matter of fact, Grandy/Warner use A and B! I use the square brackets so often that I forget about things. So why am _I_ sticking with ps and qs? Should find out (perhaps via Grice).

    In any case, I love the idea to generalise to x and y!

    --- (and maybe z, when I feel that 'implies' can allow for triadicity or something!)

  11. J. L. Speranza, thank you! The philosophers and philosophers of education at the Berkeley symposium in 1981 had similar comments and even recommended that I generalize to random variables rather than just probability. It took me a long time to figure out what the appropriately generalizing set is (A, B or A_x, B_x, and so on), but I found it among the work that I had already done with Marleen and which I had just thought of as a curiosity until then!

    You are correct about x, y, z, and other generalizations! In fact, a remarkable equation can be proven by mathematical induction:

    1) P(A1 <--> A2 <--> ... <--> An) = P(A1A2...An) + P(A1 ' A2 ' ... An ' ) where n is a positive integer > 1.

    I changed the labels to A1, A2, and so on because using A, B, C, ..., Z would seem to involve the claim that only 26 letters are relevant!

    Equation (1) is a "Multiple Correlation"!

    By the way, one can also prove that:

    2) (A1<-->A2<--> ... <-->An) = (A1A2...An) U (A1 ' A2 ' ... An ' )

    Try this first for A1, A2, and then for A1, A2, A3, and you will probably see the pattern of the proof.
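    Here is a minimal brute-force sketch of such a check for n = 3, in Python, on a small uniform sample space with purely illustrative events (it only spot-checks the identities; it is not the induction proof):

        # Spot-check, for three illustrative events, that
        # (A1<-->A2<-->A3) = (A1 A2 A3) U (A1 ' A2 ' A3 ') and hence that
        # P(A1<-->A2<-->A3) = P(A1 A2 A3) + P(A1 ' A2 ' A3 ').

        from fractions import Fraction

        omega = set(range(12))
        A1, A2, A3 = {0, 1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 5, 6, 7}

        def P(E):
            return Fraction(len(E), len(omega))

        def iff(A, B):
            """(A<-->B) = AB U A'B' as a set."""
            return (A & B) | ((omega - A) & (omega - B))

        # Chain the biconditionals: (A1<-->A2<-->A3) = (A1<-->A2)(A2<-->A3).
        chained = iff(A1, A2) & iff(A2, A3)

        all_in = A1 & A2 & A3
        all_out = (omega - A1) & (omega - A2) & (omega - A3)

        assert chained == all_in | all_out
        assert P(chained) == P(all_in) + P(all_out)   # the two pieces are disjoint
        print(P(chained))                             # 1/2 for these events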

    Osher Doctorow

  12. I should add that:

    1) (A1<-->A2<-->A3) = (A1<-->A2)(A2<-->A3)

    where adjacent parentheses represent intersecting sets. Similarly for more sets.

    Osher Doctorow

  13. Yes, there is this idea that A, B, C, and so on, gives you the limit of the 26 letters!

    I once discussed this with Timothy Wharton when he was researching into natural signs. For I had found a reference, cryptic no doubt, in Borges, "The Library of Babel", to the effect that the 26 letters were 'natural signs' of something. I forget what!

    I suppose the idea of using 'x', 'y', and 'z' as the name for the variables has a long pedigree. I hope it's not just something as modern as Arabic!

    Apparently, Aristotle -- who, wrongly, is said never to have trodden the path of formal logic -- is seen to use "A" and "B" (i.e. alpha and beta in upper case) as names for stuff. Propositions, I think.

  14. One type of string theory (the bosonic one) has 26 dimensions. Most superstring and supersymmetry theories in physics have 10 or 11 dimensions. Kaluza-Klein theory has 5 dimensions.

    I should mention that we can prove:

    1) P(A-->B) > = P(B given A) if P(A) is not 0.

    Proof. Consider P(A-->B) - P(B given A) = 1 + P(AB) - P(A) - (P(AB)/P(A)) = 1 + y - x - y/x if we define y = P(AB) and x = P(A). Then we have:

    2) 1 + y - x - y/x = 1 - x + y - y/x = (1 - x) + y(1 - 1/x) = (1 - x) + (y/x)(x - 1) = (1 - x)(1 - y/x) = (1 - x)(x - y)/x

    and since x is a probability, x < = 1 so 1 - x > = 0, and x - y > = 0 because AB is a subset of A and therefore P(AB) = y < = P(A) = x. Finally, x > 0 since we excluded x = 0 (because then the conditional probability y/x would be undefined). The product of three quantities each > = 0 (namely 1 - x, x - y, and 1/x, the last being what dividing by x amounts to) is itself > = 0. Q.E.D.

    In English:

    3) The Probable Causation/Influence (PI) of A on B is greater than or equal to the Conditional Probability of B given A, for all random set/events A, B, whenever PI and Conditional Probability are both defined (that is, whenever P(A) is not 0).

    This relationship between PI and Conditional Probability removes some of the "arbitrariness" that might be thought characteristic of PI, and further focuses in on relationships between "given" and "influences/causes" in these contexts.
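    For readers who like to see the inequality spot-checked numerically as well as proved, here is a minimal sketch in Python that scans a grid of admissible values x = P(A) > 0 and y = P(AB) < = x:

        # Numeric spot-check of P(A-->B) >= P(B given A) over a grid of
        # admissible pairs x = P(A) > 0 and y = P(AB) with 0 <= y <= x <= 1.

        steps = 200
        for i in range(1, steps + 1):          # x ranges over (0, 1]
            x = i / steps
            for j in range(0, i + 1):          # y ranges over [0, x]
                y = j / steps
                pi = 1 + y - x                 # P(A-->B)
                cp = y / x                     # P(B given A)
                assert pi >= cp - 1e-12        # tiny slack for floating point
        print("inequality holds on the grid")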

    Osher Doctorow

  15. Even more remarkably, we can actually DEFINE conditional probability in terms of PI (Probable Causation/Influence). In fact:

    1) P(B given A) = [P(A-->B) + P(A) - 1]/P(A)

    if P(A) is not 0.

    Proof. Since P(B given A) = P(AB)/P(A), we have:

    2) P(AB) = P(B given A)P(A)

    and since P(A-->B) = 1 + P(AB) - P(A), we have:

    3) P(AB) = P(A-->B) - 1 + P(A)

    Now set P(AB) of (2) equal to P(AB) of (3), and the result is almost immediate after dividing both sides of the resulting equation by P(A). Q.E.D.
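    A minimal sketch of (1) checked with exact fractions over a small grid of admissible values (the denominators are purely illustrative):

        # Exact check that P(B given A) = [P(A-->B) + P(A) - 1]/P(A)
        # whenever P(A) is not 0.

        from fractions import Fraction

        for den in (3, 7, 10):                       # illustrative denominators
            for i in range(1, den + 1):
                x = Fraction(i, den)                 # x = P(A) > 0
                for j in range(0, i + 1):
                    y = Fraction(j, den)             # y = P(AB) <= P(A)
                    pi = 1 + y - x                   # P(A-->B)
                    cp = y / x                       # P(B given A)
                    assert (pi + x - 1) / x == cp    # equation (1) above
        print("conditional probability recovered exactly from PI")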

    Not only can Conditional Probability (CP for short here, although it is not a common abbreviation) be defined in terms of PI, but PI actually can be regarded as a generalisation of CP because PI is defined even when CP is not, namely when P(A) = 0.

    Osher Doctorow

  16. For the record, the reference in Borges, cryptic as it is:

    http://jubal.westnet.com/hyperdiscordia/library_of_babel.html

    "The orthographical symbols are twenty-five in number."

    i.e. A B C D ... etc.

    "[Some librarians] admit that the inventors
    of this writing imitated the
    twenty-five natural symbols, but
    maintain that this application is accidental and that the books signify nothing in themselves. This dictum, we shall see, is not entirely fallacious."

  17. The link listed only partially worked, but I was able to access the Library of Babel by keywords "Library of Babel". It will take me some time (to say the least) to read it.

    I should mention that some readers may wonder: if Probable Causation/Influence (PI) merely generalizes Conditional Probability (CP) to the case where P(A) = 0 (the case in which CP is undefined), is this generalization valuable or almost trivial?

    A counterexample to "almost trivial" is the 26 letters of English. It is not "almost trivial" to add 1 letter to the "other 25".

    Even more remarkable is what happens for probabilities which are NEAR 0 but not exactly equal to 0. It turns out that when P(A) is near 0 in P(A-->B), then there are many cases where P(B given A) and P(A-->B) differ by very nearly 1, which is the most that two probabilities can differ by! So it is not only the one value P(A) = 0 which is affected by the lack of definition of P(B given A) when P(A) = 0, but very many (it turns out to be infinitely many) nearby probabilities (probabilities near 0). I will not go through the proof here, but readers might for example try constructing examples where P(A) is 5 times larger than P(AB) or 5 times larger than P(B), and so on, and study what happens to the equations.
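    Here is one purely illustrative family of such examples in Python (taking P(AB) = P(A) squared, which is my own choice and not forced by the theory), showing the gap approaching 1 as P(A) shrinks:

        # As P(A) shrinks toward 0 and P(AB) shrinks faster still (here
        # P(AB) = P(A)**2, an illustrative choice), P(A-->B) tends to 1 while
        # P(B given A) tends to 0, so the gap between them approaches 1.

        for x in (0.1, 0.01, 0.001):           # P(A), shrinking toward 0
            y = x * x                          # P(AB), shrinking faster
            pi = 1 + y - x                     # P(A-->B)
            cp = y / x                         # P(B given A) = x here
            print(f"P(A)={x:g}  P(A-->B)={pi:.6f}  P(B given A)={cp:.3f}  gap={pi - cp:.4f}")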

    Osher Doctorow

  18. Here I will attempt to return to a more intuitive view of PI as causation, after slightly correcting my rough English.

    It is true that Probable Causation/Influence (PI) generalises the IDEA behind Conditional Probability (CP), but PI does not technically generalise CP just as subtraction does not generalise division. More can be said about this later.

    So what is the IDEA behind Causation, and what is the IDEA behind Probability? There were all types of intuitive ideas about this as far back as Pierre de Fermat in 17th-century France, who co-discovered "Probability", and perhaps further back to Ancient times. Let us, however, begin with a rather surprising place, time, and person: Henri Lebesgue of France, who in 1902 discovered LEBESGUE MEASURE.

    Readers can undoubtedly obtain brief and simple summaries of "Lebesgue Measure" online under those keywords, especially Wikipedia's and Wolfram's/Weisstein's articles on it. A more detailed study for postgraduate mathematicians is:

    1) Hewitt, Edwin, and Stromberg, Karl, "Real and Abstract Analysis," (1965) Springer-Verlag: New York, Heidelberg.

    I must warn readers that Hewitt and Stromberg (1965) is almost illegible to non-mathematicians, which is a good reason to attempt to translate it into ordinary simple English, although it may take a lifetime to do it (well worth it, compared to many of the alternative modern fads, I think).

    What in the world (or elsewhere) is a "measure" in real and abstract analysis? In fact, what is real and abstract analysis? It is what calculus becomes when generalised. "Calculus" is very roughly the mathematical study of (a) speeds, accelerations, and similar things, and (b) lengths, areas, volumes or contents, and similar things.

    Since I am almost out of time for this post, I will indicate the direction in which I am going. "Probability" is a type of (generalised) Lebesgue Measure whose most applicable form to Causation is roughly speaking VOLUME or CONTENTS. "Causation" is roughly speaking a time-related VOLUME or AREA or LENGTH which most deeply involves two times, the time of the "Cause" and the time of the "Effect", but can involve simultaneity when the "Cause" and "Effect" occur at the same time.

    Osher Doctorow

  19. Readers may begin to suspect that I belong to the Real and Abstract Analysis "school" of mathematics, which in turn leads most directly and logically to Differential Equations and Probability, and which tends to be oriented toward CONTINUOUS and CONNECTED rather than DISCRETE or SEPARATE things, although it can study and explain both. I do belong to that school of thinking. Many researchers in Artificial Intelligence and Engineering, and some in physics, belong to the DISCRETE school.

    In simple language, an uncut piece of string is CONNECTED, while a bunch of separate things like people or stars or whole numbers (1, 2, 3, and so on) are in general DISCONNECTED.

    Since I will be referring to VOLUME rather frequently (hopefully), readers should be sure that they have an intuitive picture or idea of volume. One can proceed in 3 steps:

    1) Do you have an intuitive idea of LENGTH? Most people seem to have this. If I ask you which piece of string is longer, then you usually will not hesitate much in deciding. If asked to describe the length of a string, you will probably say something like "it is something visible and involves how far apart the ends of the string are." It can also be described roughly as the SPATIAL MEASURE OF A STRING in this context.

    2) Do you have an intuitive idea of AREA? Many people would hesitate if asked this without further words, but if one then adds "Area of a carpet", "Area of a farm", then one often recalls the idea of multiplying the length times the width of the carpet (if it is shaped like a rectangle or square) to get the area, and one can describe the area as a MEASURE of the SPACE that the carpet or farm occupies, regarded as a FLAT space (which roughly has the name 2-dimensional space). A piece of connected string, by contrast, roughly has the name 1-dimensional space.

    3) Do you have an intuitive idea of VOLUME? This is where one tends to lose more people. If you ask a person "How much space is there in this bottle?" then that person is arguably likely to get the idea of VOLUME. To measure it, he/she might then think: I will roughly speaking multiply its height times its width times its length. This may be simpler with a box: a box has a bottom, which has width and length. It also has a height, so one multiplies the width times the length and then multiplies this result by the height to get the volume of the box. Eventually, one will probably get the idea that the VOLUME of a bottle or box is a MEASURE of the SPACE that the bottle or box occupies - "how much space" it occupies.

    I will stop here in order to not "time out", and hopefully I will continue later.

    Osher Doctorow

  20. A MEASURE in Real and Abstract Analysis is roughly speaking a generalisation of HOW MUCH SPACE something occupies. Strings are regarded as 1-dimensional, rugs or farmland as 2-dimensional, and bottles or boxes as 3-dimensional, and all occupy space. The space is referred to as Euclidean Space.

    Try the following "generalised thought experiment":

    1) Ask yourself how much your childhood training CAUSED your present ideas, behavior, and so on.

    2) Try to visualise a "time distance", that is, the "length of time" between your childhood and the present. That should arguably be part of your answer.

    3) Try to visualize the remainder of the "causation" as a spatial or spacetime volume between your childhood and your present, along your "world line" which is the path that your life is following through space and time.

    Arguably, when you are asked about CAUSATION in similar contexts, you actually think roughly in these terms if you think at all. You think of CAUSATION as a Measure between two times (childhood and the present) and between you and your childhood training in space. Since space is 3-dimensional and time is usually regarded as 1-dimensional, you are thinking in terms of a 4-dimensional MEASURE of spacetime. Let us describe this simply:

    4) CAUSATION and PROBABLE CAUSATION are 4-dimensional MEASURES of spacetime between two times, the time of the "Cause" and the time of the "Effect".

    Osher Doctorow

  21. It might be argued that we do not usually really think this way, but as I mentioned in reply to J L Speranza a few minutes ago, Grice and G. Mackey and later Jauch and M. Jammer supplied the "answer". As Grice argued, when we are explicit about our reasoning, we CONSTRUCT its steps rather than making previously given premises explicit, something like Jauch's "yes-no" experiments or propositions in Quantum Logic.

    Again, I refer readers to the very valuable:

    1) Jammer, Max (1974) "The philosophy of quantum mechanics," Wiley: N.Y., London, especially the chapter on Quantum Logic.

    References to Mackey and J. M. Jauch and his colleague C. Piron are found therein, in fact with detailed discussion in remarkably simple English, although reference to Wikipedia and Wolfram/Weisstein online will help. Piron also has a remarkably simple short book filled with equations, but remarkably simple equations not much more difficult than a = b + c! It is found in most good university physics or science research libraries.

    Osher Doctorow

  22. Very good point about your 'constructing the steps'. Personally, I have no qualms or problems or anything with "Gedankenexperimente". After all, Grice is credited with having brought Martians into the picture (in his early "Some remarks about the senses", 1965, predating Putnam by a few years). So I'll try to follow your Gedankenexperimente and report the results from my gedanke laboratory.
