What is number? By ‘number’ I mean whole number, positive integer such as 2, 34, 1457…  Conceptually, number is hard to pin down and the modern mathematical treatment which makes number depend on formal logical principles does not help us to understand what number ‘really’ is.  “He who sees things in their growth and first origins will obtain the clearest view of them” wrote Aristotle. So maybe we should start by asking why mankind ever bothered with numbers in the first place and what  mental/physical capacities are required to use them with confidence.

It would seem that numbers were invented principally to record data. Animals  get along perfectly well without a number system and innumerable ‘primitive’ peoples possessed a very rudimentary one, sometimes just one, two, three. The aborigine who often possessed no more than four tools did not need to count them and the pastoralist typically had a highly developed visual memory of his herd, an ability we have largely lost precisely because we don’t use it in our daily life (Note 1). It was the large, centrally controlled empires of the Middle East like Assyria and Babylon that developed both arithmetic and writing. The reasons are obvious: a hunter, goatherd or subsistence farmer in constant contact with his small store of worldly goods, does not need records, but a state official in charge of a large area does. Numbers were developed for the purpose of trade, stock-taking, assessment of military strength and above all taxation ― even today I would guess that numbers are employed more for bureaucratic purposes than scientific or pure mathematical ones (Note 2).

What about the required mental capacities? There are two, and seemingly only two, essential requirements. Firstly, one must be able to make a hard and fast distinction between what is ‘singular’ and ‘plural’, between a ‘one’ and a ‘more-than-one’. Secondly, one must be capable of ‘pairing off’ objects taken from two different sets, what mathematicians call carrying out a ‘one-one correspondence’.

The difference between ‘one’ and ‘more-than-one’ is basic to human culture (Note 3). If we actually lived in a completely unified world, or felt that we did, there would be no need for numbers. Interestingly, in at least one ancient language, the word for ‘one’ is the same as the word for ‘alone’: without an initial sense of estrangement from the rest of the universe, number systems, and everything that is built upon them, would never have been invented.

‘Pairing off’ two sets, apples and pears for example, is the most basic procedure in the whole of mathematics and it was only relatively recently (at the end of the 19th century) that it became clear that the basic arithmetic operations and numbers themselves originate in messing about with collections of objects. Given any two sets, the first set A can either be paired off exactly member for member with the second set B, or it cannot be. In the negative case, the reason may be that A does not ‘stretch to’ B, i.e. is ‘less than’ B (<), or alternatively that it ‘goes beyond’ B (>), i.e. has at least one object to spare after a pairing off. Given two clearly defined sets of objects, one, and only one, of these three cases must apply.
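The pairing-off procedure can be sketched in a few lines of code that never count either collection. This is only an illustration: the function name `pair_off` and the sample collections are my own assumptions, not anything taken from the sources cited here.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel meaning "nothing left to pair with"

def pair_off(a, b):
    """Compare two collections by pairing members one for one, never counting.

    Returns '<' if a runs out first, '>' if a has members to spare,
    and '=' if the two collections pair off exactly.
    """
    for x, y in zip_longest(a, b, fillvalue=_MISSING):
        if x is _MISSING:
            return "<"   # a does not 'stretch to' b
        if y is _MISSING:
            return ">"   # a 'goes beyond' b
    return "="           # exact member-for-member pairing

apples = ["apple"] * 4
pears = ["pear"] * 6
print(pair_off(apples, pears))
```

Exactly one of the three outcomes is returned for any two finite collections, which is the trichotomy described above.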

But as Piaget points out, the ability to pair off, say, apples and pears is not sufficient. A child may be able to do this but baulk at pairing off shoes and chairs, or apples and people. To be fully numerate, one needs to be able, at least in principle, to ‘pair off’ any two collections of discrete objects. Not only children but whole societies hesitated at this point, considering that certain collections of things are ‘not to be compared’ because they are too different. One collection might be far away like the stars, the other near at hand, one collection comprised of  living things, the other of dead things and so on.

This gives us the cognitive baggage to be ‘numerate’ but a further step is necessary before we have a fully functioning number system. The society, tribe or social group  needs to decide on a set of more or less identical objects (later marks) which are to be the standard set against which all other sets are to be compared. So-called ‘primitive’ peoples used shells, beans or sticks as numbers for thousands of years and within living memory the Wedda of Ceylon carried out transactions with bundles of ‘number sticks’. Yoruba state officials of the Benin empire in Nigeria performed quite complicated additions and subtractions using only heaps of cowrie shells. Note that the use of a standard set is an enormous cultural and social advance  from simply pairing off specific sets of objects. The cowboy who had “so many notches on his gun” was (presumably) doing the latter, i.e. pairing off one dead man with one notch, and doubtless used other marks or words to refer to other objects. In many societies there were several sets of number words, marks or objects in use simultaneously, the choice depending on context or the objects being counted (Note 4).

So what are the criteria for the choice of a standard set? It is essential that the objects (or marks) chosen should be more or less identical since the whole principle of numbering is that individual differences such as colour, weight, shape and so on are irrelevant numerically speaking. Number is a sort of informational minimum: of all the information available we throw away practically everything since all that matters is how the objects concerned pair off with those of our standard set. Number, which is based on distinction by quantity, required a cultural and social revolution since it had to replace distinction by type which was far more important to the hunter/foodgatherer ― comestible or poisonous, friend or foe, male or female.

Secondly, we want a plentiful supply of object numbers so the chosen ‘one-object’ must be abundant or easy to make, thus the use of shells, sticks and beans. Thirdly, the chosen ‘one-object’ must be portable and thus fairly small and light. Fourthly, it is essential that the number objects do not fuse or adhere to each other when brought into close proximity.

All these requirements make the choice of a basic number object (or object-number) by no means as simple as it might appear and eventually led to the use of marks on a background such as charcoal strokes on plaster, or knots in a cord, rather than objects as such.

Numbering has come a long way since the use of shells or scratches on bones but the ingenious improvements leading up to our current Arab/Hindu place value number system have largely obscured the underlying principles of numbering.      The choice of a ‘one-object’, or mark, plus the ability to replicate this object or mark more or less indefinitely is the basis of a number system. The principal improvements subsequent to the replacement of number-objects by ‘number-marks’, have been ‘cipherisation’ and the use of bases.

In the case of cipherisation we allow a single word or mark to represent what is in fact a ‘more than one’, thus contradicting the basic distinction on which numbering depends. If we take 1 as our ‘one-mark’, 11111 ought by rights to represent what we call five and write as 5. Though this step was long to come,  the motivation is obvious: simple repetition is cumbersome and leads to error ― one can with difficulty distinguish 1111111 from 111111. Verbal number systems seem to have led the way in this: no  languages I know of say ‘one-one-one’ for 3 and very few simply repeat a ‘one-word’ and a ‘two-word’ (though there are examples of this).

The use of bases such as our base 10, depends on the idea of a ‘greater one’, i.e. an object that is at once ‘one’ and ‘more-than-one’ such as a tight bundle of similar sticks. And if we now extend the principle and make an even bigger bundle out of the previous bundles while keeping the ‘scaling’ the same, we have a fully fledged base number system. The choice of ten for a base is most likely a historical accident since we have exactly five fingers and thumbs on each hand. The hand was the first computer and finger counting was widely practiced until quite recent times: the Venerable Bede wrote a treatise on the subject.

The final advance was the use of ‘place value’: by shifting the mark to the left (or right in some cases) you make it ‘bigger’ by the same factor as your chosen base. Although we don’t see it like this, 4567, is a concise way of writing four thousands, five hundreds, six tens and seven ones. 
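The bundling idea behind bases and place value can be made concrete with a short sketch. The function names `place_value_digits` and `expand` are illustrative assumptions; the code simply repeats the ‘make bundles of ten, then bundles of bundles’ step until nothing is left.

```python
def place_value_digits(n, base=10):
    """Split a non-negative integer into digits, most significant first."""
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, d = divmod(n, base)   # d = loose ones left over, n = full bundles
        digits.append(d)
    return digits[::-1]

def expand(n, base=10):
    """Write n as a list of (digit, bundle size) pairs."""
    digits = place_value_digits(n, base)
    top = len(digits) - 1
    return [(d, base ** (top - i)) for i, d in enumerate(digits)]

print(expand(4567))
```

Each pair reads as ‘so many bundles of such a size’, so 4567 comes out as four thousands, five hundreds, six tens and seven ones.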

Human beings, especially in the modern world, spend a vast amount of time and effort moving objects from one location to another, from one country to another or from supermarket to kitchen. One set of possessions increases in size, another decreases, giving rise to the arithmetic operations of ‘addition’ and ‘subtraction’. And to make the vast array of material things manageable we need to divide them up neatly into subsets. And for stock-taking and related purposes, we need agreed numerical symbols for the objects and people being shifted about. A tribal society can afford to ignore numbers (but not shape), an empire cannot.     SH




Note 1  A missionary in South America noted with amazement that some tribes like the Abipone had only three number words but during migration could see at a glance from their saddles whether a single one of their dogs was missing out of the ‘immense horde’. From Menninger, Number Words and Number Symbols p. 10

Note 2  To judge by the sort of problems they tackled, the Babylonian and Egyptian scribes were obviously interested in numbers for their own sake as well, i.e. were already pure mathematicians, but the primary motivation was undoubtedly socio-economic. Even geometry, which comes from the Greek word for ‘land-measurement’, was originally developed by the Egyptians in order to tax peasants with irregular shaped plots bordering the Nile.


Note 3  Some historians and ethnologists argue that the tripartite distinction ‘one-two-more than two’, rather than ‘one-many’, is the basic distinction. Thus the cases singular, dual and plural of certain ancient languages such as Greek.


Note 4  The Nootkans, for example, had different terms for counting or speaking of (a) people or salmon; (b) anything round in shape (c) anything long and narrow. And modern Japanese retains ‘numerical classifiers’.

Leibnitz’s Formula for π


One of the peculiarities of π, the ratio of the circumference of a circle to its diameter and thus a strictly geometric entity, is that it comes up in all sorts of unexpected places, thus giving rise to the belief, common amongst pure mathematicians, that Nature has a sort of basic kit of numbers, including notably π, e, i and Γ that She applies here, there and everywhere. Buffon, the eighteenth-century French naturalist, worked out a formula giving the probability that a needle of length l, dropped at random onto a floor ruled with parallel lines at unit intervals, will cut at least one line. If l is less than a unit in length, the formula turns out to be 2l/π and this result has even been tested experimentally by a modern scientist, Kahan. Actually, in this case and very many others, there is a perfectly rational connection between the formula and the properties of circles, but I must admit that I am floored by the connection between π and the Gamma Function in the weird and rather beautiful result Γ(1/2) = √π
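Buffon's result can be checked roughly by simulation. The sketch below is my own illustration, not a reconstruction of any experiment mentioned above; the function name `buffon_crossing_rate`, the fixed seed and the number of trials are all assumptions. Note that the simulation itself uses π to draw a random angle, so it illustrates the formula rather than providing an independent way of computing π.

```python
import math
import random

def buffon_crossing_rate(length, trials, seed=0):
    """Fraction of random needle drops that cut a line (lines one unit apart).

    Assumes length <= 1, the case where the probability is 2*length/pi.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        centre = rng.uniform(0.0, 0.5)          # distance from needle centre to nearest line
        angle = rng.uniform(0.0, math.pi / 2)   # acute angle between needle and the lines
        if centre <= (length / 2) * math.sin(angle):
            hits += 1
    return hits / trials

rate = buffon_crossing_rate(0.5, 200_000)
print(rate, 2 * 0.5 / math.pi)
```

With a needle of half a unit the two printed numbers should agree to about two decimal places; more trials narrow the gap only slowly.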

          π also turns up as the limit to various numerical series, a matter which in the past was of considerable importance as manufacturing methods required better and better estimates of the value of π. Today, computers have calculated the value of π to over a million decimal places so the question of exactitude has become academic — although computers still use formulae originally discovered by pure mathematicians such as Euler or Ramanujan.

          Leibnitz, co-inventor of the Calculus, produced several centuries ago, somewhat out of a hat, the remarkable series    


                   π/4  =  1  −  1/3  +  1/5  −  1/7  +  ……


          British mathematicians, eager to give as much credit as possible to Newton, pointed out that a Scot, Gregory, had already derived, using Newton’s version of the calculus, the formula


                   tan⁻¹ x  =  x  −  1/3 x³  +  1/5 x⁵  −  ……


and that you obtain Leibnitz’s formula by setting x = 1.
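A quick numerical check shows both how Gregory's series gives Leibnitz's formula at x = 1 and how slowly the latter converges. The function name `gregory_arctan` and the choice of term counts are illustrative assumptions.

```python
import math

def gregory_arctan(x, terms):
    """Partial sum x - x^3/3 + x^5/5 - ... using the given number of terms."""
    return sum((-1) ** k * x ** (2 * k + 1) / (2 * k + 1) for k in range(terms))

# Setting x = 1 and multiplying by 4 gives successive estimates of pi.
for terms in (10, 100, 1000):
    print(terms, 4 * gregory_arctan(1, terms))
```

Because the terms alternate in sign and shrink steadily, the error after any number of terms is smaller than the first term left out, so a thousand terms only pin π down to about three decimal places.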

          However, apart from the question of priority, one might reasonably wonder why it should be necessary to bring in calculus to validate such a simple-looking series. A problem in so-called elementary number theory should, so I feel at any rate, make no appeal to the methods of analysis or any other ‘higher’ mathematics but rely uniquely on the properties of the natural numbers. I feel so strongly about this that I had at one point even thought of offering a small money reward for a strictly numerical proof of Leibnitz’s famous series, but I am glad I did not do so, since I have subsequently come across one in Hilbert’s excellent book, Geometry and the Imagination.

          The complete proof is not at all easy — ‘elementary’ proofs in Number Theory are not necessarily simple, far from it — but the general drift of the argument is straightforward enough.

          Consider a circle whose centre is at the origin with radius r, r a positive integer (> 0). The formula for the circle is thus x² + y² = r².

          We  mark off lattice points to make a network of squares (or use squared paper), and take each lattice as having a side of unit length.

          For any given choice of circle (with r > 1), there will be squares which ‘overlap’, part of the square falling within the circumference and part falling outside the circumference and a single point counts as ‘part’ of a square.

          We define a function f(r) with r a positive integer to be the sum total of all lattices where the bottom left hand corner of the lattice is either inside or on the circumference of a circle radius r. (Any other criterion, such as counting a square ‘when there is more than half its area inside the circle’, would do so long as we stick to it, but there are good reasons for choosing this ‘left hand corner’ criterion, as will shortly be apparent.)

          It is not clear at a glance whether the lattice area, evaluated according to our left hand corner criterion, is larger or smaller than the true area of the circle. However, as we make the lattices smaller and smaller, i.e. increase r, we expect the difference to diminish progressively. 

           f(1) = 5   — remember we are counting the squares where only  the left hand corner point lies on the circumference. I make f(2) come to 13 and f(3) come to 29, while the two higher values given below are taken from Hilbert’s book Geometry and the Imagination :


                   f(2)     =       13             

                   f(3)     =       29  

                   f(10)   =     317

                   f(100) = 31417       


          The absolute value of the difference between the lattice area, f(r), evaluated simply by counting the relevant lattices, and the area of the circle, π r², is |f(r) − π r²|. If we use f(r) as a rough and ready estimate of the area of the circle and divide by r², we get an estimate of the value of π, obtaining


          π  ≈  13/4  =  3.25

          π  ≈  29/9  =  3.222222…

          π  ≈  317/100  =  3.17

          π  ≈  31417/10000  =  3.1417
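The values of f(r) are easy to reproduce by machine. The sketch below is my own; it relies on the observation that counting squares by their bottom left hand corner amounts to counting the lattice points (x, y) with x² + y² ≤ r².

```python
import math

def f(r):
    """Count lattice points (x, y) with x*x + y*y <= r*r, r a positive integer."""
    total = 0
    for x in range(-r, r + 1):
        y_max = math.isqrt(r * r - x * x)   # largest y with x^2 + y^2 <= r^2
        total += 2 * y_max + 1              # y runs from -y_max to y_max
    return total

for r in (1, 2, 3, 10, 100):
    print(r, f(r), f(r) / r**2)             # last column is the estimate of pi
```

Working column by column keeps the count fast even for large r, since each column needs only one integer square root.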


          Now, since the diagonal of a unit square lattice is √2, all the ‘borderline cases’ will be included within a circular annulus bounded within by a circle of radius (r − √2) and without by a circle of radius (r + √2).

          The area of this annulus is the difference between the areas of the larger and smaller circles, i.e.


          [(r + √2)² π  −  (r − √2)² π]  =  4 √2 π r


          |f(r) − π r²|, the discrepancy between the lattice area and the area of the circle, is bound to be less than the annulus area, since only lattices falling within the annulus can contribute to the discrepancy and some of these are correctly counted in f(r). Certainly the discrepancy cannot be greater than the annulus area.


                   |f(r) − π r²|  ≤  4 √2 π r


which, dividing right through by r², gives


                   |f(r)/r² − π|  ≤  4 √2 π/r          …………………..(i)
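Inequality (i) can be tested numerically for a range of radii. This is a sanity check under my own naming (`lattice_count` plays the role of f), not a proof.

```python
import math

def lattice_count(r):
    """Lattice points (x, y) with x*x + y*y <= r*r."""
    return sum(2 * math.isqrt(r * r - x * x) + 1 for x in range(-r, r + 1))

for r in (2, 3, 10, 100):
    gap = abs(lattice_count(r) / r**2 - math.pi)    # left-hand side of (i)
    bound = 4 * math.sqrt(2) * math.pi / r          # right-hand side of (i)
    print(r, round(gap, 5), round(bound, 5), gap <= bound)
```

The bound is generous, which is all the argument requires: what matters is that it shrinks to zero as r grows.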


          Now, assuming Cartesian coordinates with 0 as the centre of the circle, for any value of r there will be a certain number of points which lie on the circumference of the circle, those points (x, y) which satisfy the equation


                   x² + y² = r²  where r is a positive integer (> 0).


          But we must count all the negative values of x and y as well. For example, with r = 2, the circumference will pass through the lattice points (2, 0), (−2, 0), (0, 2) and (0, −2) and no others.

          We now introduce a new variable n = r², making the radius √n, and the equation of the circle becomes x² + y² = n. Although n must be an integer, we lift the restriction on r so that the radius is not necessarily an integer, e.g. r = √7, r = √13 and so on.

          Now, the number of lattice points on the circumference of a circle with radius √n is the number of ways that the integer n can be expressed as the sum of the squares of two integers, provided we count separately the representations that differ only in the signs of x and y or in their order (the factor of four in the theorem below arises because we allow x and y to take minus values). This is strictly a problem in Number Theory and an important theorem states that


          The number of ways in which an integer n can be expressed as the sum of the squares of two integers is equal to four times the excess of the number of factors of n having the form 4k + 1 over the number of factors having the form 4k + 3.


          Take 35 = 5 × 7. The factors of 35 are 1, 5, 7 and 35, which are respectively

                   1  ≡  1 (mod 4)

                   5  ≡  1 (mod 4)

                   7  ≡  3 (mod 4)

                   35  ≡  3 (mod 4)


          Since there are two of each type and 2 − 2 = 0, there is no excess of the (4k + 1) type and so, if the theorem is correct, 35 cannot be represented as the sum of two squares, which is the case.
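The theorem can be checked by brute force for small numbers. The function names `representations` and `factor_excess` are illustrative assumptions; the first counts integer pairs with signs and order included, the second computes the excess of (4k + 1) factors over (4k + 3) factors.

```python
import math

def representations(n):
    """Count integer pairs (x, y), signs and order included, with x^2 + y^2 = n."""
    count = 0
    limit = math.isqrt(n)
    for x in range(-limit, limit + 1):
        rest = n - x * x
        y = math.isqrt(rest)
        if y * y == rest:
            count += 1 if y == 0 else 2   # y and -y are distinct unless y = 0
    return count

def factor_excess(n):
    """Number of factors of the form 4k+1 minus the number of the form 4k+3."""
    ones = sum(1 for d in range(1, n + 1) if n % d == 0 and d % 4 == 1)
    threes = sum(1 for d in range(1, n + 1) if n % d == 0 and d % 4 == 3)
    return ones - threes

for n in (1, 2, 5, 35, 65):
    print(n, representations(n), 4 * factor_excess(n))
```

The two right-hand columns agree for every n tried, which is of course evidence rather than proof.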

          The proof of the theorem is quite complicated and will not be attempted here. What we can show at once is that


          No prime p which is 3 (mod 4) can be represented as the sum of two (integer) squares.


          This is so because any odd number, whether it be 1 or 3 (mod 4), will be 1 (mod 4) when squared. And every even number, whether 2 or 0 (mod 4), will be 0 (mod 4) when squared. The sum of two squares can therefore only be 0, 1 or 2 (mod 4). So if p happens to be 3 (mod 4), like 7 or 11, it will have no representation as the sum of two squares, i.e. the congruence a² + b² ≡ 3 (mod 4) is insoluble in integers.
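Since only the remainders of a and b on division by 4 matter, the whole argument comes down to a finite check, sketched here with variable names of my own choosing.

```python
# Squares of the four possible remainders, reduced mod 4.
squares_mod_4 = sorted({(a * a) % 4 for a in range(4)})

# Every possible value of a^2 + b^2 mod 4.
sums_mod_4 = sorted({(a * a + b * b) % 4 for a in range(4) for b in range(4)})

print(squares_mod_4, sums_mod_4)
```

The remainder 3 never appears among the sums, which is exactly the claim.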

          However, if p prime is 1 (mod 4) it may be possible to find a representation in two squares since (4k+1)² + (even)² ≡ 1 (mod 4) is possible. A theorem given by Fermat, which goes some way towards establishing the principal theorem, states that


          An odd prime p is expressible as the sum of two squares if and only if p ≡ 1 (mod 4)


          The ‘if’ part means that every odd prime p which is 1 (mod 4), such as 5, 13, 17 and so on, can be expressed as the sum of two squares: 13 = 3² + 2², for example, and 17 = 4² + 1².
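Fermat's theorem can be tried out on small primes by direct search. The helper names `is_prime` and `two_squares` are illustrative assumptions, and the search is plain trial and error rather than any clever construction.

```python
import math

def is_prime(p):
    """Trial-division primality test, adequate for small numbers."""
    if p < 2:
        return False
    return all(p % d for d in range(2, math.isqrt(p) + 1))

def two_squares(p):
    """Return (a, b) with a >= b >= 0 and a*a + b*b == p, or None if impossible."""
    for b in range(0, math.isqrt(p // 2) + 1):
        rest = p - b * b
        a = math.isqrt(rest)
        if a * a == rest:
            return (a, b)
    return None

for p in (5, 7, 11, 13, 17, 29):
    print(p, p % 4, two_squares(p))
```

Every odd prime tried that leaves remainder 1 on division by 4 yields a pair, and every one that leaves remainder 3 yields None.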


          From our point of view, any representation such as 5 = 1² + 2² gives us eight lattice points, four for the different ways of forming (1² + 2²) and four for the different ways of forming (2² + 1²), i.e. the lattice points with coordinates


                   (1, 2), (−1, 2), (1, −2), (−1, −2)


and those with coordinates


                   (2, 1), (−2, 1), (2, −1), (−2, −1)


          65 = 5 × 13 has factors 1, 5, 13 and 65, all of which are positive integers which are 1 (mod 4). There should, then, be four different ways of representing 65 as the sum of two squares, where the order in which we write the two squares matters. And indeed we have


          65 = (1² + 8²) = (8² + 1²) = (4² + 7²) = (7² + 4²)


We end up with eight lattice points for each combination, namely


          (1, 8), (−1, 8), (1, −8), (−1, −8),

          (8, 1), (−8, 1), (8, −1), (−8, −1),


          (4, 7), (−4, 7), (4, −7), (−4, −7),

          (7, 4), (−7, 4), (7, −4), (−7, −4)
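The lattice points lying on a given circle can be listed directly. The function name `points_on_circle` is an illustrative assumption; it simply tries every integer pair in the bounding square.

```python
import math

def points_on_circle(n):
    """All integer points (x, y) with x*x + y*y == n."""
    limit = math.isqrt(n)
    return [(x, y)
            for x in range(-limit, limit + 1)
            for y in range(-limit, limit + 1)
            if x * x + y * y == n]

for n in (5, 35, 65):
    pts = points_on_circle(n)
    print(n, len(pts), pts)
```

For n = 65 the list contains sixteen points, eight from the pair (1, 8) and eight from the pair (4, 7), while n = 35 gives none.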

           The idea now is that, by considering every number n ≤ r², working out how many times it can be expressed as a sum of two squares (the excess given by the theorem) and adding the results, we will obtain the lattice points counted by f(r) on multiplying by 4. Actually, f(r) also includes the origin, the point (0, 0), which corresponds to n = 0 and which we do not want to consider, so, excluding this, we have

          (f(r) − 1)  =  4 Σ representations of n ≤ r² as two squares.


          Now 1 has a representation since 1 = 1² + 0², giving the four points (1, 0), (0, 1), (−1, 0) and (0, −1); 2 = 1² + 1² has a representation giving four points; 3 has none; and 4 = 2² + 0² gives four points, giving twelve in all. I made f(2) = 13, which checks out with the above since (f(2) − 1) = 12.

          Actually, rather than work out the excess for each number n individually, it is much more convenient to take each odd number of the form (4k + 1) in turn and count how many of the numbers n ≤ r² have it as a factor, and then subtract the corresponding counts for the odd numbers of the form (4k + 3). In the first list we have


1, 5, 9, ….  (4k + 1) ≤ r²    and in the second


3, 7, 11, …… (4k + 3) ≤ r²


          Each of the numbers above must appear in the total for its class as many times as there are multiples of it that are at most r². 1 will obviously appear r² times, but 5 will only appear [r²/5] times, where the square brackets indicate the integer part of r²/5, i.e. the greatest integer not exceeding it.

          Finally, since we are not removing or adding anything, we can subtract the first term in the (4k + 3) category from the first term in the (4k + 1) category, the second term from the second and so on. We end up with the open-ended series, depending on the choice of r


(f(r) − 1)  =  4 Σ representations of n ≤ r² as two squares

            =  4 { [r²] − [r²/3] + [r²/5] − [r²/7] + [r²/9] − …… }     ..(ii)
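Formula (ii) is an exact identity and can be confirmed by machine for as many radii as one likes. The names `lattice_count` (standing in for f) and `floor_series` are my own; Python's `//` operator gives the integer part that the square brackets denote.

```python
import math

def lattice_count(r):
    """Lattice points (x, y) with x*x + y*y <= r*r."""
    return sum(2 * math.isqrt(r * r - x * x) + 1 for x in range(-r, r + 1))

def floor_series(r):
    """[r^2/1] - [r^2/3] + [r^2/5] - ... taken over every odd denominator up to r^2."""
    n = r * r
    return sum((-1) ** k * (n // (2 * k + 1)) for k in range((n + 1) // 2))

for r in (1, 2, 3, 10):
    print(r, lattice_count(r) - 1, 4 * floor_series(r))
```

The two printed columns coincide, e.g. 12 and 12 for r = 2, in line with f(2) = 13.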


Now the ‘integer part’ series [r²] − [r²/3] + [r²/5] − [r²/7] + [r²/9] − …

unlike the series r² − r²/3 + r²/5 − r²/7 + ……, is not an infinite series, since it terminates as soon as we reach the point where the denominator exceeds r², i.e. r²/(4k+1) < 1, making all subsequent terms = 0.

          We assume for simplicity that r is odd and of the type 4k + 1 so that r − 1 is a multiple of 4. Since all the terms with 4k + 1 as denominator are positive and all those with 4k + 3 as denominator are negative, we can split the series into two, and then add up the pairs, where the first member of a pair is taken from the ‘+ Series’ and the second from the ‘− Series’. The ‘+ Series’ contains [r²/r] and its final non-zero term is [r²/r²].


[r²/1]  +  [r²/5]  +  [r²/9]  + … +  [r²/r]      +  [r²/(r+4)]  + …… +  [r²/r²]


0       +  [r²/3]  +  [r²/7]  + … +  [r²/(r−2)]  +  [r²/(r+2)]  + …… +  [r²/(r²−2)]


          If we cut off the series at [r²/r] the error involved, namely the rest of the original series, will be less than r in absolute value, or a r where a is some proper fraction, i.e.


| [r²/(r+4)] − [r²/(r+2)] + …… − [r²/(r²−2)] + [r²/r²] |  <  r


          To see this, we write all terms after [r²/r] as [r²/(r+k)] where k is even and ranges from 2 to (r² − r), since (r + (r² − r)) is the denominator of the final non-zero term. The absolute values of all these terms are less than [r²/r] = r and they come in pairs which alternate in sign.

          Also, all terms where 2(r+k) > r², i.e. k > r²/2 − r = r(r − 2)/2, have [r²/(r+k)] = 1. The first such term comes when k = r(r − 2)/2 + 1/2 = (r − 1)²/2 (k must be a whole number, and this value is even because r − 1 is a multiple of 4). From this point on all pairs sum to zero, so we can ignore them and need only consider the pairs between [r²/r] and [r²/(r + (r − 1)²/2)]. There are (r − 1)²/8 such pairs. For small r the two members of each pair differ by at most 1, so the sum total of the error cannot exceed (r − 1)²/8, which is less than r as long as (r − 1)² < 8r, i.e. for r ≤ 9. For larger r this count is too crude, but the conclusion still holds: the terms [r²/(r+k)] never increase as k grows and they alternate in sign, so the whole remainder is at most its first term [r²/(r+2)] in absolute value, and this is less than r.

          An example may make this more intelligible. Take r = 9, which is a number of the form 4k + 1. Then [9²/9] = 9 and all terms from then on have their absolute values < 9, while the final term is [9²/9²] = [9²/(9 + 72)]. The last term where [9²/(9 + k)] ≥ 2 comes when k = 30 and we can neglect all pairs where k has values > 32 (we make the last value k = 32 to make up the pair). k even thus ranges from 2 to 32, giving the pairs


 2        4

 6        8

10      12

 …       …

30      32


          The maximum absolute amount possible will thus be 32/4 = 8 (the pairs enter with a minus sign, so the remainder lies between −8 and 0) and 8 < 9 = r.
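The remainder of the series after the term [r²/r] can be computed outright, which gives a check on the claim that it stays below r. The function name `tail_after_cutoff` is an illustrative assumption; terms with a denominator of the form 4k + 1 are added and those of the form 4k + 3 subtracted, as in the main series.

```python
def tail_after_cutoff(r):
    """Sum of the terms of the integer-part series beyond [r^2/r], for odd r."""
    n = r * r
    total = 0
    for d in range(r + 2, n + 1, 2):        # odd denominators r+2, r+4, ..., r^2
        sign = 1 if d % 4 == 1 else -1
        total += sign * (n // d)
    return total

for r in (5, 9, 13, 101):
    print(r, tail_after_cutoff(r))
```

For r = 9 the remainder comes to −5, comfortably inside the bound of 8 found above.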

          A similar argument can be used to establish the case where r is odd and of the form 4k+3 and any even value of r will be sandwiched between the two cases.

          We thus have, returning to (ii)


¼ (f(r) − 1)  =  [r²] − [r²/3] + [r²/5] − [r²/7] + [r²/9] − … ± [r²/r] ± a r

where a < 1

          To lift the square brackets, we note that the error in each term is less than 1 and that there will be, for r odd, (r + 1)/2 terms if we cut the series off at [r²/r]. The total possible error is thus < (r + 1)/2 < r for r ≥ 2 and can be written as ± b r where b < 1

          We can thus write


¼ (f(r) − 1)  =  r² − r²/3 + r²/5 − r²/7 + ……. ± a r ± b r  ………..(iii)



         Dividing right through by r² we obtain


(1/4r²) (f(r) − 1)  =  1 − 1/3 + 1/5 − 1/7 + ……. ± a/r ± b/r


i.e.


¼ (f(r)/r² − 1/r²)  =  1 − 1/3 + 1/5 − 1/7 + ……. ± a/r ± b/r


which has limit as r → ∞     f(r)/4r²  =  1 − 1/3 + 1/5 − 1/7 + …….


          Finally, we note that the discrepancy between the area of the circle and the lattice representation is


          |f(r)/r² − π|  ≤  4 √2 π/r   with limit 0 as r → ∞, giving us the desired


limit  r → ∞     1 − 1/3 + 1/5 − 1/7 + 1/9 − ……. ± 1/(2r+1)  =  π/4
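The two halves of the argument can be watched converging side by side: the lattice count divided by 4r² on one hand, and Leibnitz's series cut off at the term 1/r on the other. The function names below are my own, and the radii are arbitrary odd numbers chosen for illustration.

```python
import math

def lattice_count(r):
    """Lattice points (x, y) with x*x + y*y <= r*r."""
    return sum(2 * math.isqrt(r * r - x * x) + 1 for x in range(-r, r + 1))

def leibniz_cut_at(r):
    """1 - 1/3 + 1/5 - ... up to and including the term with denominator r (r odd)."""
    return sum((-1) ** k / (2 * k + 1) for k in range((r + 1) // 2))

for r in (9, 101, 1001):
    lattice_side = (lattice_count(r) - 1) / (4 * r * r)
    print(r, lattice_side, leibniz_cut_at(r), math.pi / 4)
```

The first two columns differ by less than 2/r, as the error terms a/r and b/r require, and both close in on π/4.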





                                                                         Sebastian Hayes  

Is there causality in Mathematics?






Perhaps we should start by pondering whether causality exists at all. Hume thought not and Wittgenstein dismissed the ‘causal connection’ between events as a superstition (Tractatus Logico-Philosophicus). Certainly no one has ever claimed to see or touch the ‘force of causality’, and if the supposed ‘causal connection’ between certain events were self-evident we would not have the difficulties we so manifestly do have in distinguishing bona fide causal pairs from chance associations of events.

            For all that, I have never lost any sleep over Hume’s attack on causality and don’t intend to. “We know there is causality and there’s an end to it”, as Dr. Johnson said about free will. Hume himself, revealingly, admitted that he “dropped his philosophic scepticism when playing backgammon”.

            Belief in causality is undoubtedly a psychological necessity, and thus  a  biological necessity as well : as a species, we need to believe that we can ‘make a difference’ and, looking at what we have done to the planet, we’ve certainly proved that! If modern philosophers have their doubts about the existence of causality, well, so much the worse for them.

            Science in the West remained happily married to determinism for three centuries and Claude Bernard actually went so far as to define science as the application of causality to the material world. But then, in the course of the twentieth century, physics suddenly got infatuated with indeterminism. Why? The official answer is that this was forced on science by experimental discoveries in the atomic and subatomic domains where ‘statistical determinism’ rather than ‘complete individual determinism’ is the norm. Yes, but the phenomenon is far less comprehensive and radical than people think. The individual molecule in a gas is, if you like, allowed freedom of movement, but only because this is very unlikely to affect the overall result. It is like Saddam Hussein giving Iraqis the vote. Also, it is usually only the order of appearance of the events that is random, not the events themselves. A good example is the process of photographic development which is a chemical amplification of initial atomic events. It is possible, using very weak exposure, to arrange for the individual photons to arrive one after the other and if this is done the photographic image builds up in a way that is completely unpredictable. But the fact remains that all the micro-events have been completely specified in advance (by the object that is being photographed).¹

            In other cases, ‘random mutation’ for example, the consensus is that the events themselves are basically indeterminate but this remains an untested and probably untestable hypothesis. One suspects that the sudden vogue for indeterminism in physics and elsewhere during the twenties and thirties (strongly resisted by Einstein) was part of the Zeitgeist : the senseless slaughter of the Great War and, later, the Wall Street Crash (which no one had predicted) seemed to many people to demonstrate that the world was not fully comprehensible by rational means after all.  But the real culprit was  logical positivism, a philosophy which has had a crippling effect on the way we think about science and life generally. Whereas common sense always prefers to assume that there is an agent for all changes in the external world — “Every event has a cause” — logical positivism holds fast to the verification principle instead. Since causality cannot be verified directly, it has no right to exist, therefore it doesn’t exist. Stripped of causality, physics becomes an exercise in applied mathematics, while mathematics itself is, according to the moderns, either symbolic logic (Russell) or “a game played according to fixed rules with meaningless marks on paper” (Hilbert). This effectively puts paid not only to determinism but to objective reality itself which has become the unwanted ghost in a wholly symbolic machine. Britain’s most acclaimed theoretical physicist (Hawking) once admitted disingenuously that he was not really concerned about the underlying truth of a theory but only whether it was ‘interesting and fruitful’. 




In the real world by my book events cause other events: they do not simply happen to precede them. Coercion is involved, not ‘functional co-variance’. But when we shift to the logical, sanitised universe we find that all we are left with is ‘material implication’, P ⇒ Q.

            I don’t expect I need to remind readers that the validity of P ⇒ Q does not mean we can just invert the terms and conclude that Q ⇒ P. What is, however, accepted in both logic and mathematics is that the truth of P ⇒ Q entails the truth of the Contrapositive, Not-Q ⇒ Not-P.


                        “If p is prime, then a^(p−1) ≡ 1 (mod p) for all a with (a, p) = 1”            True

                        “If a^(p−1) ≡ 1 (mod p) for all a with (a, p) = 1, then p is prime”       False  (because of Carmichael Numbers such as 561)


“If it is not the case that a^(p−1) ≡ 1 (mod p) for all a with (a, p) = 1, then p cannot be prime”       True
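The three statements can be tested directly on small numbers. The helper names `fermat_holds` and `is_prime` are illustrative assumptions; the check covers every a below n that has no factor in common with n.

```python
from math import gcd, isqrt

def is_prime(n):
    """Trial-division primality test, adequate for small numbers."""
    if n < 2:
        return False
    return all(n % d for d in range(2, isqrt(n) + 1))

def fermat_holds(n):
    """True if a^(n-1) leaves remainder 1 mod n for every a coprime to n."""
    return all(pow(a, n - 1, n) == 1 for a in range(1, n) if gcd(a, n) == 1)

for n in (13, 15, 561):
    print(n, is_prime(n), fermat_holds(n))
```

For 561 = 3 × 11 × 17 the congruence holds for every admissible a although 561 is not prime, which is why the converse fails while the original statement and its contrapositive stand.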


            Logically speaking the Contrapositive is equivalent to the original statement because the truth tables are identical. However, if the original statement has a causal basis, this feature disappears when we form the contrapositive. Negating an event is not the same thing as negating an assumption, since something that does not occur can neither cause something else to occur nor positively prevent its occurrence.


            “I shot my noisy next door neighbour in the head ten minutes ago, so he is now dead.”

            This statement is valid because the underlying causal connection is valid (a shot in the head causes death) whether or not it corresponds to the facts.  


            “If my next door neighbour is currently alive, I cannot have shot him in the head ten minutes ago”


is, I suppose, valid reasoning but sounds most peculiar — as if I were a psychopath suffering from recurrent bouts of amnesia. This shows what happens when we empty statements of causal content. 


            The point is that Contrapositives are always a good deal weaker than affirmative statements. One of the reasons why Newtonian physics got off to such a flying start, was because it was formulated positively : “Every particle attracts every other directly with respect to mass and inversely in proportion to the square of the distance between them”. Practical people like engineers took to Newtonian mechanics because they could visualize what was going on, “If rod A makes  B go down, then B will make C go up, and C will make D move to the right….”   

            All this has grave consequences for modern mathematics since most modern proofs are indirect (75% it has been estimated) and proceed along the lines, “But if A  is not so, then Y, then Z, but Z is nonsense, therefore not-A cannot be true, therefore A”.   Stephenson and Brunel would not have been impressed. Modern mathematics is chock-a-block with entities whose only right to exist is that, if they didn’t, someone would be contradicting himself somewhere (Note 2). Compare this to direct proofs which actually show you how to turn up an example of the thing you are looking for.




In logic  P ⇒ Q  [P logically implies Q] is always valid except when P is true and Q is false. Thus


“If  all triangles are equilateral, then no square can be inscribed in a circle”


“If 8 is a prime number,  then G.M. ≤ A.M.”


 are both valid (since we have F ⇒ F and F ⇒ T).

            Both these sentences are not even untrue, they are just rubbish because there is no connection between the respective statements.
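For anyone who wants to see the bare truth tables, here is a small sketch in Python. The helper name implies is my own choice; it simply encodes the rule that an implication is false only when the first term is true and the second false.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """Material implication: false only when p is true and q is false."""
    return (not p) or q

print("P      Q      P=>Q   notQ=>notP   Q=>P")
for p, q in product([True, False], repeat=2):
    direct = implies(p, q)
    contrapositive = implies(not q, not p)
    converse = implies(q, p)
    print(f"{p!s:6} {q!s:6} {direct!s:6} {contrapositive!s:12} {converse!s}")
```

The columns for the statement and its contrapositive agree on every row, the column for the converse does not, and the two rows with a false first term come out as valid whatever the second term says.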

            Examples like the above only go to show how foolish it is to completely ignore meaning when we are setting up a logical system. The rules governing, say,  embroidery or bridge are neither here nor there, they are ‘meaningless’ and none the worse for it. But logic is not embroidery since it can, in principle, have considerable bearing on the decisions we actually make, such as, for example, whether a country is a potential threat to us, and in consequence whether we should go to war or not.

            Logic teaches you how not to contradict yourself. But why not contradict yourself if you feel like it?  One answer is that this frustrates the main purpose of speech, which is to communicate with other people. But there is a second reason which is much more significant. We insist on non-contradiction in logic and mathematics because Nature actually is non-contradictory (at the macroscopic level anyway)  :  it (Nature) obeys a very important principle which I have baptised the Axiom of Exclusion : “An event cannot at one and the same time both occur and not occur at the same spot”. Without this assumption science would be impossible, for there would be no point in working out, for example, that an eclipse of the sun was going to take place at such and such a locality if it was simultaneously feasible for it not to take place there (Note 3). Logic is, or should be, the faithful servant of reality rather than the legislator of what is and is not : The Axiom of Exclusion is the justification for, not the consequence of, the logical rule (in bivalent logic) that “A proposition cannot at one and the same time be true and false”.


            Of course, if the reality we are modelling is inherently  fluctuating and ambivalent it is a mistake to make the symbolic system too cut and dried because it will not fit the original. This is basically the reason why literature is able to give a far more convincing picture of actual human behaviour — which seems to be incurably irrational — than bio-chemistry. The maddening ambiguity and vagueness of language — as the mathematician sees it — become assets if we are dealing with a shifting,  inconsistent reality.




The Buddhist logician Dharmottara considered that


            “There can be no necessary relation other than one based on Identity or Causality.” (Note 4)


            This is admirably concise so let us apply it to mathematics. If causality has nothing to do with mathematics, which is the usual view, this means that mathematics is entirely based on ‘Identity’, i.e. it is all one vast tautology.

            This seems to be true of mathematical formulae such as those for summing the figurate numbers.  


0                               0000                                                  00000

00                 +           000         =                                      00000

000                              00                                                  00000

0000                              0                                                  00000


            Since the above is perfectly general we can conclude that twice the sum of the natural numbers commencing with unity can be presented as a rectangle with one side equal to the greatest natural number of the sequence and the other side equal to that amount plus an extra unit, i.e. 1 + 2 + 3 + … + n  =  n(n+1)/2.     Causality as such does not seem to be involved.

            But proofs by rearrangement, though the most convincing of all proofs, are not that common in modern mathematics : one reason why  infinite series are such a minefield is that rearrangement can radically alter the nature of the series, the most notorious case being that of log 2 (Note 5).

            But what about mathematical induction? Here there is a definite sense of compulsion : if such and such is true for n, it must be true for (n+1).  Certainly, mathematical induction is not mere rearrangement : there is a sequential element which reminds us unmistakeably of a bona fide causal process, steam forcing a piston along the inside of a cylinder whether it wants to go there or not. A large number of functions (all?) can be defined by recursion rather than analytically, and this often seems a much more natural way of doing things. But definition by recursion is very different from analytical definition  n → f(n)  because in the recursive process a function is built up piecemeal instead of being there in its entirety from the word go. Philosophically speaking, analytical definition is being, definition by recursion becoming.
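The contrast can be made concrete with the triangular numbers met above. What follows is a sketch only, in Python, with function names of my own invention: the first definition builds each value out of the one before it, the second hands over the value all at once.

```python
def triangular_recursive(n: int) -> int:
    """Becoming: T(0) = 0 and T(n) = T(n-1) + n, one step at a time."""
    if n == 0:
        return 0
    return triangular_recursive(n - 1) + n

def triangular_closed(n: int) -> int:
    """Being: the rectangle argument gives n(n+1)/2 directly."""
    return n * (n + 1) // 2

for n in range(8):
    print(n, triangular_recursive(n), triangular_closed(n))
```

The two columns agree, but only the recursive version has to pass through every earlier value in order to reach the one asked for.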



            For Plato actual lines and circles were imitations of ideal states of affairs and the imperfect nature of the sublunary world meant there might  occasionally be some slight deviations and discrepancies (a tangent drawn in the sand or on papyrus might well touch a circle at more than one ‘point’ for example). For Newton and Kepler what happened down here was wholly dependent on the prior decisions of a mathematical God, and we still seem to think like this a lot of the time which is why we still talk of the ‘laws of Nature’ — we do not speak of  ‘the observed regularities of Nature’. God may not have known, i.e. not bothered to work out in detail, all the particular consequences of his original handful of edicts, but then again He didn’t need to. So long as the original laws were basic and far-reaching enough, the world could be left to take care of itself. There is causality of a kind here because there is compulsion : rocks, plants and animals have no choice but to comply with the rules and even man, though he has free will, remains constrained in his physical being. But, according to this paradigm, the causality we find in Nature is not itself ‘natural’ : it has a supernatural origin and purpose.

             In the classical (post-Renaissance) world-view there is no real difference between physical and mathematical law, between pure and applied mathematics, so the same schema applies. God determined the axioms and everything else is theorems. But today we no longer believe in an omnipotent intelligent Creator God (most of us anyway) so the ‘laws of physics’ and ‘laws of mathematics’ revert to being something rather similar to Platonic Forms, existing out on a limb. This does seem to remove causality from mathematics and physics unless we view the way in which phenomena model themselves on ideal states of affairs  — how an actual gas approximates to the behaviour of an ideal gas, for example — as a sort of watered down ‘formal causality’. In the Judaeo-Christian world view which was that of Kepler and Newton, everything hinges on actions and decisions made ‘in the beginning’ : someone (God) had  sometime in the distant past ‘divided the light from the darkness’ and distinguished primes from composite numbers. But Platonic Forms and mathematical formulae simply are , they do not do anything.   




Supposedly, the whole of mathematics can be derived from the half dozen or so Axioms of von Neumann or Zermelo Set Theory (see M500 206 p. 20). But no one ever sat down of a night to see what interesting theorems he or she could derive from them : they are strangely remote like mountains one sees in the distance but which are utterly unrelated to life down here in the plains.

            But then most modern mathematics has an insubstantial air : the very way in which we are taught to do our mathematics, to consider the basic entities and procedures, inclines us towards a view of the world where nothing really happens. The old-fashioned viewpoint, still very much in evidence in school textbooks of the pre-war era, goes rather like this. We have a numerical or geometric entity, we do something drastic to it, multiply it, chop it up into bits, rotate it &c. &c. and then we see what we are left with. The modern way is to ‘map’ certain values to certain other ones : we make up two sets  (a, b) and (a’, b’) selected according to a rule. Everything exists in a sort of eternal present and we merely move around looking at what’s here and comparing it with what’s there. The idea of an unknown which by dint of intelligent manipulation gets transformed into a known is both intuitively clear and exciting  : it is like  working out the  identity of Mr. X from circumstantial evidence and witness statements. But the idea of a variable is quite different : somehow x has all possible values at once (usually an infinite number) each of which incidentally is a constant. Also, in the real world effects always succeed causes which means, mathematically speaking, that the dependent and independent variables are not freely invertible — precisely what we are told to assume in Calculus. Examples can be multiplied endlessly….

            What nobody seems to have noticed is that the two dominant tendencies in modern mathematics, the axiomatic approach and the analytical, ‘functional’ presentation (which is essentially Platonic) are pulling mathematics in two completely different directions and may well eventually tear it apart.  An axiomatic approach means that deduction is all-important since, not only can everything (or nearly everything if we take Gödel into account) be derived from the axioms, but nothing that is not so deducible will crop up (again pace Gödel). But deduction involves step by step argument, thus temporal sequence; also, there is a strict hierarchy with certain propositions being much higher up the pecking order, as it were, than others. But the ‘functional’, analytical treatment is, implicitly at least, atemporal and non-hierarchical. All the properties of  y = f(x)  are there as soon as we have written down the expression and it is ‘our fault’ if we don’t spot them straightaway. Moreover, all the cross-references between different functions also exist as soon as the functions are properly defined, and in fact prior to their being properly defined (by us). As for some propositions being key ones on which others depend, if everything is already out there nothing ‘depends’ on anything else, it either is out there or it isn’t. There either are odd perfect numbers or there are not. This rather cuts the ground from under the feet of the ‘prove-at-all-costs’ lobby and, moreover, because computers can usually prove or disprove whether such and such an assertion is true over the domain that concerns us in practice, it ceases to be so important to know if something is ‘always’ true or not; indeed, some philosopher will shortly come along and tell us that this is a ‘metaphysical question’ and thus not worth bothering about.

            Currently there are only three theories of mathematics left in the running, formalism, logicism and Platonism . Neither of the first two schools of thought can explain the often amazingly good match between mathematics and physical reality, and, while Platonism does explain this, the metaphysical price to pay is a very high one indeed. Even mathematicians who are not afraid to call themselves Platonists (such as Penrose) fight shy of giving any coherent statement of their philosophic position to the general public. Now logicism and formalism do not recognize causal processes at all while Platonism admits only a very watered down sort of causality at best. So this explains the inevitable demise of causality in the scientific world-view.

            But more significantly none of these mathematical schools can explain the surprising vitality of mathematics which never ceases to astonish (and sometimes to alarm). Formalism allows for human invention since that is what in the last resort the whole of mathematics is but has little to say about how and why inventions come about. I am so far from being a positivist that I see ‘vital forces’ operating everywhere, not only in the biological and physical domains but also in supposedly abstract areas like pure mathematics and even, in a very rarefied form, in logic. There is perhaps a single unified ‘force of necessity’ which is (almost) tangible in an arrangement of rods and levers and which, in a good mathematical proof, can be sensed thrusting the tortuous argument on to its triumphant crescendo.   

            Moreover, this élan vital is surely active in mathematics as a whole, ceaselessly pushing it in new and unexpected directions : mathematics, like technology, has a life of its own and individual mathematicians get dragged along whether they want to go in that direction or not. What is absent from the logicist, Platonist and formalist views on mathematics is precisely a recognition of this vital principle. There is just no driving force in Set Theory : it is a steam-engine that has been cleaned up, varnished and put to rest in a science museum. This is why Poincaré, who was a creative mathematician in a sense that Russell and Whitehead were not, dismissed logicism with the crushing retort, “Logic is sterile but mathematics is the most fertile of mothers”.


References and Notes  


1  See French and Taylor, An Introduction to Quantum Physics, pp. 88-89  The remarkable illustrations show the picture of a girl’s face building up from randomly distributed dots.

            I have conjectured that there is some sort of a law involved : if all the events are specified in advance, their order of appearance need not be, if not all of the events are specified in advance, there must be strict order.  


2 Does anyone, for example, really believe that ‘almost all’ numbers are transcendental? (I remind readers that a transcendental number is a real number that is not the root of a polynomial equation with integer coefficients.)  Apart from e and π I doubt if anyone reading this could produce one without consulting a dictionary of mathematics. On doing this I find that 10^(-1!) + 10^(-2!) + 10^(-3!) + …  is also a member of this highly select (but apparently very well attended) club.


3 The trouble with Quantum Mechanics is that it does not verify the Axiom of Exclusion since it permits a physical system to be in incompatible states at the same time. The Many Worlds version of QM does verify the Axiom of Exclusion, of course, but there is a heavy price to pay in universes. 


4  Stcherbatsky, Buddhist Logic Vol. 1. p. 259.  This is, incidentally, an extremely interesting, readable and, I believe, important book despite its abstruse air. It is more concerned with speculative philosophy than logic as such. The world-view of certain Buddhist thinkers in Northern India during the first few centuries of our era, has a distinctly modern feel  — they would have been quite happy with Einstein’s attempt to describe the physical world in terms of causally related events occurring in a single unified Space-Time field.

            This raises the question of why India didn’t get there first in terms of the scientific revolution. Needham, in discussing the question with reference to China, concludes that the key notion of natural law was lacking. But this was certainly not lacking in India (the law of karma). Maybe these Hinayana Buddhists were too advanced in their conceptions : it was necessary to work  through the cruder scientific paradigm of a world made up of ‘hard, massy particles’ interacting with each other by pushes and pulls before moving on to the vision of evanescent bundles of energy evolving in Space/Time. Also, of course, there was little motivation to develop science as such  :  for a Buddhist the physical world was just not important enough to bother about. 



5 log 2  =  1 − 1/2 + 1/3 − 1/4 + 1/5 − ………

            =  (1 − 1/2) − 1/4 + (1/3 − 1/6) − 1/8 + (1/5 − 1/10) − ……

            =  1/2 − 1/4 + 1/6 − 1/8 + 1/10 − …….

            =  ½ { 1 − 1/2 + 1/3 − 1/4 + 1/5 − ……… }

            =  ½ log 2
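The rearrangement in Note 5 can be watched happening numerically. This is a rough sketch in Python with function names of my own; it adds up a large but finite number of terms in each order and so only suggests the two limits, it does not prove them.

```python
import math

def usual_order(terms: int) -> float:
    """1 - 1/2 + 1/3 - 1/4 + ... taken in the usual order."""
    return sum((-1) ** (k + 1) / k for k in range(1, terms + 1))

def rearranged_order(blocks: int) -> float:
    """The same terms, taken as one positive followed by two negatives."""
    return sum(1 / (2 * k - 1) - 1 / (4 * k - 2) - 1 / (4 * k)
               for k in range(1, blocks + 1))

print(usual_order(100_000), math.log(2))
print(rearranged_order(100_000), math.log(2) / 2)
```

Every term of the original series turns up exactly once in the rearranged one, yet the running totals settle near log 2 in the first case and near half of it in the second.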


 NOTE.  Acknowledgments to M500 Magazine, where this article originally appeared.



Permutations and Combinations

Selections, Permutations and Combinations


This article is not written for mathematicians but for the ‘intelligent layman’, or rather the ‘thoughtful’ layman — not quite the same thing since, unfortunately, it is quite possible to be highly intelligent without doing much thinking for oneself. 

            Many aspects of so-called elementary mathematics are much harder to grasp for the beginner than they need be, not because there are difficult or ‘deep’ questions involved but because of the specialised terminology which often conflicts with normal English usage. In some cases, there is the added problem of inconsistent or confusing notation.

            Today, professional mathematicians insist on viewing their subject as a self-contained formal discipline, but there is no doubt in my eyes that our core mathematical concepts are abstractions from sense experience and there is an increasing amount of observational evidence which supports this commonsense view (see, for example, Where Mathematics Comes From, by the cognitive scientists Lakoff and Núñez, Perseus, 2000). In what follows, the bare minimum of mathematical knowledge is taken for granted, and the reader is invited to actually test what is stated by experiment and observation, using real or imagined objects.         




Since the terms combination  and permutation cause difficulty, I shall start by using the word Selection which, hopefully, I do not need to define.

            Let us suppose we have a store, or pool, of discrete, distinguishable objects, playing cards, shirts, shrubs, whatever, and we are going to select some — or possibly all — of them. This is the sort of thing we spend our lives doing : when going into a shop we make a selection from all the goods on offer, and when preparing a meal  we make a selection from the various vegetables, eggs and joints of meat available.

            In practice, the number of items we choose always has an upper limit, for example, the amount of goods we can afford to buy in one day, and, similarly, there is an upper limit on the number of goods any shop can put on the shelves. This means we need not bother to consider what happens if we want to make an ‘infinite’ selection from an ‘infinite’ amount, leaving such strictly academic questions to people who have nothing better to do. 

            On the other hand, what may cause some surprise at first is that in mathematics not taking any of the objects on display counts as a selection — a zero selection, if you like. The main reasons for doing this are aesthetic and formal reasons, i.e. so we can have a more general theory which includes the zero case. But it is not unreasonable to view abstaining from making a selection as a sort of selection nonetheless : offered a choice of fruit for dessert I might reject everything on the menu and say I didn’t want a dessert. To ‘not-do’ something is a form of doing, or can be, nor need this involve us in any strange concepts such as the celebrated Taoist technique of achieving your aim by (apparently) not bothering whether you achieve it or not.

            Both Permutations and Combinations are Selections and the essential difference is that, in the first case, we take the ordering of the elements selected into account, whereas, in the second case, we disregard order entirely. Thus, supposing we are going to select four items which we label A, B, C and D, ABCD is a different permutation from CDAB but it counts as the same combination because the same elements appear in the selection and only these elements. In general, as one can imagine, there are many more permutations than there are combinations.



Suppose we have n objects and are going to select r at a time. [Some people may feel a little uneasy with ‘number n’ but the basic idea  is no different from talking about ‘a certain number of men’ or ‘a certain number of potatoes’. How many? Well, that depends on the context but it is still possible to say quite a lot about such unspecified collections of objects, for example that all the men will have one and only one head and all the potatoes will have a rounded surface.] Since we can’t select more objects than are available, r cannot be greater than n, i.e. r ≤ n, and it is important to bear this in mind. Also, both n and r must be assumed to be positive (or just possibly  zero) since we are dealing with actual, or conceivably actual, collections of objects, and with actual, or conceivably actual, choices between them.

We note any such permutation as ⁿPᵣ   with n = 0, 1, 2, 3…   (where r  cannot be greater than n).

            By convention n appears as an index (at the top of a letter) and r as a suffix (below the line).


            Suppose we decide not to take any objects at all. There is only one way of making no choice at all, or so I assume, so  ⁿP₀ = 1.


            We are now going to select a single object from n objects, where n is not zero. There are as many ways of doing this as there are different objects so 

                                    ⁿP₁ = n

            For example, given ABCD as distinguishable elements, we can take out A or B or C or D.    Here, it is easy to count the number of possibilities, but the point is to realise that there are as many single selections as there are distinguishable elements in the first place, no matter how many objects are involved. This is why we write n, allowing n to take any positive value : it could be 100 or 100,000 or a number so big it could not be written down on this sheet of paper.


            Suppose we take two objects from n (where n is at least 2) and arrange them in a series of boxes or pigeonholes which (I find I cannot symbolize with this treatment).


For the first position  we have n choices, the case we have already considered.  This means, whatever we choose, there is one less possibility for the second box, i.e. (n-1) possibilities. Thus, if we had chosen A out of ABCD we would be reduced to choosing from BCD for the next box, with the possibilities AB, AC, AD in all. [Note that we are not allowed to repeat our first choice since we assume that the n objects are different and are not replaced until we have concluded the selection. The case where we allow repetition is more complicated and is best left alone for the moment.]

            We have thus three selections with A in the first box. But there is nothing special about A and exactly the same sort of thing will apply to any of the other objects. And since there are four objects, we end up with 3 × 4  permutations in all. Thus


                ⁴P₂ = 4 × 3  =  12   or, more succinctly, using a dot to indicate multiplication,   ⁴P₂ = 4.3  =  12


            There is nothing special about the number 4 and a little thought will convince you that, no matter what the value of n, there will be n × (n-1) permutations or

                                    ⁿP₂ =  n × (n-1)   or  n(n-1) for short.


            If you are unconvinced by this, convince yourself by putting a certain number of actual objects on the table and seeing how many different pairs you can make. But remember that, since we are speaking of permutations, two objects in a different order will make two permutations. Thus an envelope placed to the right of a pair of scissors is one permutation and the same envelope placed to the left of the scissors is a different one.
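If no suitable objects are to hand, the same experiment can be run on a computer. Here is a minimal sketch in Python using the standard itertools.permutations function; the labels A, B, C, D simply stand in for the objects on the table.

```python
from itertools import permutations

objects = ["A", "B", "C", "D"]

# Every ordered pair of two different objects: AB and BA count separately.
ordered_pairs = ["".join(pair) for pair in permutations(objects, 2)]

print(ordered_pairs)
print(len(ordered_pairs), "pairs; n(n-1) =", len(objects) * (len(objects) - 1))
```

Changing the list of objects and running it again is a quick way of checking that the count always comes out as n(n-1).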

            We now choose three objects out of n. Once again we will consider the case of just four objects symbolized as A B C D. The possibilities can be represented in a tree diagram where you follow down a pathway from A :


                                                          A

                        B                                 C                                D


            C             D                 B               D               B                     C

            The total possibilities are   ABC, ABD, ACB, ACD, ADB, ADC i.e. six in all.

            Again, the same will apply to the other choices for first place, B, C and D, and since there are four objects we end up with  6 × 4 = 24  permutations of 4 objects, taking 3 at a time.

            Thus   ⁴P₃ =   2 × 3 × 4   =  4 × 3 × 2  =  24

            Generalizing to n objects, we can perhaps agree that  

            ⁿP₃ =  n × (n-1) × (n-2)  =  n(n-1)(n-2)


            Note that the last digit we subtract is 2, appearing in (n-2) and that this is one less than the number of boxes which is the suffix digit, 3. 


            Proceeding in this way, you will perhaps soon be able to produce a completely general formula for a permutation of r objects taken from a store of n original objects. I repeat once more the important proviso that we are not allowed repeats : once we have, for example, used A in a single selection, we cannot use it again in the same selection (though we can, of course, use it again in a further selection because we return it to the store).

            The formula is


              ⁿPᵣ  =  n(n-1)(n-2)(n-3)……..(n-r+1)


            The part where one is most likely to make a mistake is where you cut off the multiplication process. One might suppose that you end up with (n-r) since there are r boxes with one object in each. But, if you look carefully, you will see that we start with n or, if you like, (n-0), and so, if we want there to be r objects in the r boxes, we must end up not with r but (r-1). This number is going to be subtracted, so we get (n-(r-1)) or, following the rule about “a minus from a minus makes a plus”, we can write (n-r+1).  You might fear that this is going to go negative which would give a ridiculous result since the entire product would turn out to be negative. But this cannot happen as we have stipulated in the beginning that r can never overtake n, or r ≤ n.  The largest number of permutations of n objects is obviously going to be when we use the whole lot of them, i.e. r = n. Fitting in this value of r we get n(n-1)(n-2)(n-3)……..(n-n+1) and since (n-n) = 0 this turns out to be   n(n-1)(n-2)(n-3)……..(1) i.e. a multiplication involving all the numbers from 1 to n inclusive. This case is so important in mathematics that it has a special notation n! where the ! does not indicate surprise, as in normal speech, but is best conceived as an instruction to multiply n by all the numbers less than it down to 1.  Incidentally, if you find the exclamation mark notation annoying at first, you are in good company since, when it was first proposed in the mid-nineteenth century the Royal Society thought the same.
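The point about where to stop multiplying is perhaps easiest to see when the formula is written out as a loop. The following is a sketch in Python under my own naming (count_permutations is not a standard function); it multiplies exactly r factors, starting at n and stepping down by one each time, and then checks itself against a brute-force listing for the pool A B C D.

```python
from itertools import permutations

def count_permutations(n: int, r: int) -> int:
    """n(n-1)(n-2)...(n-r+1): r factors in all."""
    if not 0 <= r <= n:
        raise ValueError("r must lie between 0 and n")
    product = 1
    for k in range(r):       # k runs over 0, 1, ..., r-1
        product *= n - k     # so the last factor is n-(r-1) = n-r+1
    return product

# Cross-check against an actual listing for a small pool.
pool = "ABCD"
for r in range(len(pool) + 1):
    assert count_permutations(len(pool), r) == len(list(permutations(pool, r)))

print(count_permutations(4, 2), count_permutations(4, 3), count_permutations(4, 4))
```

With r = 0 the loop never runs and the answer is 1, which agrees with the single way of choosing nothing; with r = n the last factor is 1 and the result is factorial n.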

            The so-called factorial numbers n! with n = 1, 2, 3…  get very large surprisingly quickly. Take a guess at how large 8! is. One might imagine it was below the thousand mark but it actually turns out to be 40,320 .

            On most pocket calculators, you can simply key in two digits and obtain the number of possible permutations of n objects taking r at a time, but this will teach you nothing about what all this means, nor will this familiarise you with handling numbers. It is more instructive to make an original guess at ⁿPᵣ  for different values of n and r, then perform the multiplications with pen and paper using short cuts (even better in your head) and finally check your result. Thus, I might guess if there are 11 objects and I take 4 at a time, the total permutations will be around 1,500. Multiplying out

            11 × 10 × 9 × 8  =  110 × 72  =  7200 + 720  =  7920 which turns out to be correct. Once again the number of permutations is improbably large, though still perfectly testable provided one has plenty of time and patience.






The term ‘combination’ is most misleading since a combination in the sense of ‘a combination lock’ is an arrangement of numbers in a specific order, say, 15678. This ‘combination’ is not the same as 78651 since, using the latter, you would not be able to open the lock. Now, this is not the mathematical sense, according to which the two different permutations 15678 and 78651 constitute the same combination, since the selfsame digits appear and no others. In practice, we are more often concerned with combinations than permutations. For example, if I select half a dozen books to take with me on holiday, it is quite unimportant how they are packed in my case, or how they were arranged on the shelf : all that matters is that I have them and that I don’t have duplicate copies of the same book.

            In the so-called trivial case of selecting nothing at all, i.e. r = 0, there is no difference between a permutation and a combination. Also, if we restrict ourselves to selecting a single object from the pool, i.e. r = 1, there’s nothing to permute either : we either select this object or we don’t. Thus, noting combinations as  ⁿCᵣ  we can say straightaway


              ⁿC₀ = ⁿP₀  =  1   and  ⁿC₁ = ⁿP₁ = n



            For any other selection, the number of permutations of  r objects looks as if it is going to be larger than the number of combinations. How much larger?

            If we have just four objects in a pool, say A B C D, the number of permutations, taking them all at a time, will be 4! =  24 — a case we have already considered and which is not too large an amount for you to check using actual objects.

            However, all these twenty-four permutations constitute a single  combination, since the four objects, and no others, appear each time, albeit in a different order. So the ratio permutation : combination is 24 : 1, that is 4! : 1.  A little thought will hopefully convince you that the same principle holds whatever the number of objects we select : any r objects can be arranged in r! different orders, and all these r! permutations constitute a single combination, whatever r is.        So, to obtain the number of combinations we simply have to divide the number of permutations by the factorial r!  In the example given above we were considering the permutations of  A B C D where we select all of them, but we can also consider the case where we select only  two objects out of the four. The number of permutations would then be


                                ⁴P₂  =  4.3  = 12


            According to the principle just enunciated, the number of combinations should be 12 divided by 2! or  12/2 =  6 . By trial, this turns out to be correct  since the only possibilities are  AB, AC, AD, BC, BD and CD.     
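The trial can also be carried out mechanically. This is a small sketch in Python using the standard itertools functions; the dictionary named groups is my own device for collecting together the permutations that use the same two letters.

```python
from itertools import combinations, permutations
from math import factorial

pool = "ABCD"
r = 2

combos = ["".join(c) for c in combinations(pool, r)]
perms = ["".join(p) for p in permutations(pool, r)]

# Collect the permutations that contain the same letters.
groups = {}
for p in perms:
    groups.setdefault("".join(sorted(p)), []).append(p)

print(combos)
print(groups)
print(len(perms), "/", factorial(r), "=", len(perms) // factorial(r))
```

Each group holds exactly r! = 2 permutations (AB with BA, AC with CA and so on), which is why dividing the twelve permutations by 2 gives the six combinations.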


            Note that there are exactly two terms in the multiplication, ⁴P₂ = 4.3,  corresponding to two boxes □□ in which we are going to put two objects, one in each.   Also, it is worth pointing out that, although there are n terms in factorial n! there are only (n-1) that we need bother about since the first (or last) number is 1 and anything multiplied by 1, i.e. taken once, is just that quantity.


            If the reader accepts the preceding, at any rate provisionally, we can state a general formula for the number of combinations of n objects taking r at any one time where r ≤ n. Since the ratio permutation : combination is r! : 1  we simply divide by factorial r :


            ⁿCᵣ     =    ⁿPᵣ / r!  =  (n(n-1)(n-2)(n-3)……..(n-r+1)) / r!


            This may look complicated but in practice comes out quite easily. For example, supposing I want to calculate the ways of selecting 5 objects from a pool of 10 different objects, I first calculate

                        10P5  =  10 × 9 × 8 × 7 × 6  =  30240        (5 factors)

and now divide by

                        5!  =  5 × 4 × 3 × 2 × 1  =  5 × 24  =  120        (5 factors)

which gives 30240/120 = 252.


            Thus  10C5 =  252
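            For readers who like to check such sums by machine, here is a short sketch in Python of the rule just used: multiply r descending factors starting at n, then divide by r!. The function names n_p_r and n_c_r are my own labels, not standard ones.

```python
def n_p_r(n, r):
    """Permutations of n objects taken r at a time: n(n-1)...(n-r+1)."""
    result = 1
    for factor in range(n, n - r, -1):  # exactly r factors, counting down from n
        result *= factor
    return result


def factorial(r):
    """r! as a plain product r(r-1)...2.1."""
    result = 1
    for factor in range(2, r + 1):
        result *= factor
    return result


def n_c_r(n, r):
    """Combinations: the permutation count divided by r!."""
    return n_p_r(n, r) // factorial(r)


print(n_p_r(10, 5), factorial(5), n_c_r(10, 5))
```

            The integer division is exact here, since every group of r! permutations collapses into one combination.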


            As it happens, the formula actually works in the case when we pick a single item from n objects. If you substitute 1 for r in the formula above you get just the single entry (n-1+1) = n on top and 1! on the bottom. Since 1! is just 1, the bottom line is just 1. And so we have

                                                nC1  = n/1 = n

which is as it should be, since there are n ways of picking one object out of n.


In the case of taking nothing at all, r = 0, the formula does not really work since the last term on top would be (n-0+1) = (n+1), which means we have more objects than we actually have, which is absurd, and the bottom line is factorial 0 or 0!, which seems meaningless. We can, however, simply define nC0 as being 1 for all n, and similarly define 0!, if it should crop up elsewhere, as being equal to 1. This does no harm and allows us to have a completely general formula with predictions which correspond to what actually happens in the real world.
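            These conventions are the ones built into most programming languages, where a product of no factors at all is taken to be 1. The following lines are merely an illustration using Python's standard math module (math.prod and math.comb need Python 3.8 or later); they prove nothing, they only show that the definitions are the usual ones.

```python
import math

# A product with no factors is taken to be 1 ...
empty_product = math.prod([])

# ... which is the convention behind 0! = 1.
zero_factorial = math.factorial(0)

# nC0 for n = 0, 1, 2, 3, 4, 5: one way of selecting nothing.
choose_nothing = [math.comb(n, 0) for n in range(6)]

print(empty_product, zero_factorial, choose_nothing)
```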


            Most modern textbooks give the Combination Formula in a slightly different form, namely   


                                    nCr  =  n! / (r! (n-r)!)


which is neater but much less helpful to the beginner since it does not show the link with permutations, and does not allow you to count the r terms in the multiplication process. The end result is the same because there are in effect a whole lot of terms in the numerator and denominator which cancel out. To see this, note that in the formula as I give it, there are r terms in the numerator ending with (n-r+1). If we carry on multiplying back to 1, the very next term will be (n-r), the one after that (n-r-1) and so on ending with 1. This means we have multiplied right back from n to 1, so the top is now so-called factorial n, written n!. To equalize, we need to multiply the bottom by the same amount, in this case the sequence from (n-r) down to 1, or factorial (n-r) = (n-r)!

            As an example, take   10C5 which we just quoted. Here, n = 10 and r = 5 and so, fitting in these values to the new formula, we have


                        10C5  =  10! / (5! (10-5)!)  =  10! / (5! × 5!)

                              =  3628800 / (120 × 120)  =  252  as before.


            [It is purely accidental that, in this case, we get 120 twice in the denominator — this is simply because, in this case, (n-r) = r since (10-5) = 5.]
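            If the cancellation argument leaves you uneasy, the two forms of the formula can be compared by brute force. The sketch below is in Python; the names by_permutations and by_factorials are mine, and the upper limit of 20 is an arbitrary choice.

```python
from math import factorial


def by_permutations(n, r):
    """nCr as (n(n-1)...(n-r+1)) / r!, the form that shows the r terms."""
    top = 1
    for factor in range(n - r + 1, n + 1):  # the r factors from (n-r+1) up to n
        top *= factor
    return top // factorial(r)


def by_factorials(n, r):
    """nCr as n! / (r! (n-r)!), the usual textbook form."""
    return factorial(n) // (factorial(r) * factorial(n - r))


# Compare the two forms for every n up to 20 and every r from 0 to n.
agree = all(
    by_permutations(n, r) == by_factorials(n, r)
    for n in range(21)
    for r in range(n + 1)
)
print(agree, by_permutations(10, 5), by_factorials(10, 5))
```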


We can, on this basis, start to set up a table of combinations.

            Leaving aside the case of making a zero selection from a zero collection — a special case which messes up the general pattern, we start with selecting 1 object from a store containing 1 object only. We know about this already : there is only one way of doing this.

            We then pass to the number of ways of selecting 1 object from a store of 2, then the number of ways of selecting 2 objects from a store of 2 and so on. If you follow the rule set out above, i.e. dividing the number of permutations by r! (the factorial of the number of objects selected), you should start building up a table like this one :


                  0     1     2     3     4     5     ………

            0     1
            1     1     1
            2     1     2     1
            3     1     3     3     1
            4     1     4     6     4     1
            5     1     5    10    10     5     1



            The row number, marked in red on the left, most confusingly (for English speakers) indicates the objects in the store, starting at 0, while the columns indicate the number of objects selected. Thus, the n in nCr becomes the row number, while the r becomes the column number. One gets used to this, but it is quite infuriating at first and neither mathematics books nor teachers sufficiently emphasize this point. 

            To find out, say, how many ways you can select 3 objects from a store of 5 distinguishable objects, you look along row 5 and down column  3. This gives 10 . So there are ten ways of doing this which we can check by working it out from first principles.  5P3  = 5.4.3  = 60  and if we divide by r! = 3.2.1. = 6 we obtain 10 as expected. Also, using the other formula

                        5C3  =  (5.4.3.2.1) / ((3.2.1) × (2.1))  =  120/12  =  10


            The manner in which one builds up the entries should be apparent if you look closely so there is no need to work out the individual entries once you have a few to start with.
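            The pattern is, I take it, the familiar one: each entry is the sum of the entry directly above it and the one above and to the left, with a 1 at either end of every row. Assuming that is indeed the rule intended, a short Python sketch (the function name pascal_rows is my own) builds the table without a single factorial and checks it against the formula.

```python
from math import comb


def pascal_rows(count):
    """Return the first `count` rows of the table, built by addition alone."""
    rows = [[1]]
    while len(rows) < count:
        prev = rows[-1]
        # Each inner entry is the sum of the two neighbouring entries above it.
        inner = [prev[i] + prev[i + 1] for i in range(len(prev) - 1)]
        rows.append([1] + inner + [1])
    return rows


table = pascal_rows(6)
for n, row in enumerate(table):
    print(n, row)

# Compare every entry with nCr computed from the formula.
matches_formula = all(
    table[n][r] == comb(n, r) for n in range(6) for r in range(n + 1)
)
```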

            The above, as you probably know, is Pascal’s Triangle, an array first defined and investigated in the West by the scientist and religious mystic, Blaise Pascal, who pointed out many curious and interesting features of these numbers and the way they come together. Actually, the selfsame array was known to the melancholy hedonist Omar Khayyám centuries earlier — he was, surprisingly, an astronomer and mathematician apart from being a lyric poet and wine-taster — and earlier still in China. So this collection of numbers has been going for nearly a thousand years already, attracting the attention of mathematicians across the globe, and certain hitherto unknown features are still being investigated today. This probably makes it, along with the Fibonacci Sequence and the Prime Numbers Sequence, the most prestigious collection of numbers in mathematics.


                                                                        Sebastian Hayes         





The Ramanujan Problem

            In January 1913, G.H. Hardy, perhaps England’s best pure mathematician at the time, received a bulky handwritten letter from a poor clerk in Madras who had three times failed to get into an Indian University. The correspondent confessed that “since leaving school I have been employing the spare time at my disposal to work on mathematics” and wondered what G.H. Hardy thought of his efforts. Then followed ten pages full of weird theorems along with the claim that the author, Srinivasa Ramanujan by name, had in his hands “an expression for the number of prime numbers less than N which very nearly approximates to the real result, the error being negligible.”

            Hardy was completely knocked back by this letter, “the most extraordinary I received in my life”, and eventually arranged for Ramanujan to come to Cambridge (without having to pass any examinations, of course), sponsoring him a few years later for election to the Royal Society. So far this reads like a fairy tale but there is a sad ending. Ramanujan didn’t take to the English climate, awful cooking and stiff upper lipness : he tried to commit suicide once in the London Underground and ended up contracting tuberculosis which, after his return to India, killed him before he was thirty-five. Those interested in his fascinating life story would be recommended to read “The Man who Knew Infinity” * by Robert Kanigel, Scribners 1991. My only quarrel with this book is the insufficient mathematical coverage, too scanty even for my modest level.

            Ramanujan, even in his ‘maturer’ years, gave very few proofs and those he did give were usually inadequate, and very occasionally actually wrong. He was a man who had received no more than the equivalent of ‘A’ level formal training in mathematics. I am not competent to read, let alone comment upon, Ramanujan’s mathematical output.
But his reputation seems to be standing up pretty well since his death and even underwent something of a renaissance when his ‘lost’ Notebook was discovered in 1976, since some saw in it anticipations of string theory. (Not that Ramanujan was at all interested in physics or indeed any applied mathematics.)

            So how did he do it?

            Patience and keen observation (of numbers) account for some of Ramanujan’s results. In the days when the PC was not even a pipe dream Ramanujan spent a lot of time trawling through seas of numbers, exactly the sort of drudgery Western mathematicians at the time rather looked down on. However, one can’t see observation alone producing

            1^13/(e^(2π) - 1)  +  2^13/(e^(4π) - 1)  +  3^13/(e^(6π) - 1)  +  ……  =  1/24

or

            (coth π)/1^7  +  (coth 2π)/2^7  +  (coth 3π)/3^7  +  ……  =  19π^7/56700

just two of the results contained in this now celebrated letter. (Note : π signifies ‘pi’ and a^b means a raised to the power b.)
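            Those of us who cannot prove such results can at least test them. The sketch below, in Python, simply adds up the first few terms of each series and compares the total with the closed form Ramanujan gives; the function names and the numbers of terms (20 and 200) are my own choices, and a numerical agreement of this kind is of course a check, not a proof.

```python
import math


def first_series(terms=20):
    """Partial sum of n^13 / (e^(2 pi n) - 1) for n = 1, 2, 3, ..."""
    # math.expm1(x) computes e^x - 1 directly.
    return sum(n ** 13 / math.expm1(2 * math.pi * n) for n in range(1, terms + 1))


def second_series(terms=200):
    """Partial sum of coth(n pi) / n^7, using coth x = 1 / tanh x."""
    return sum(1.0 / (math.tanh(n * math.pi) * n ** 7) for n in range(1, terms + 1))


print(first_series(), 1 / 24)
print(second_series(), 19 * math.pi ** 7 / 56700)
```

            The terms of the first series die away extremely fast once n passes 3 or 4, so a handful of them already settles the matter to many decimal places.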

            The early twentieth century was the era of rigour : Hardy himself disliked loose mathematical thinking and wanted to reform English mathematics to bring it up to the continental standard. And a stone’s throw away from Hardy’s rooms, Bertrand Russell was busily reducing the whole of mathematics to logic. Russell’s very definition of mathematics — “The science of drawing necessary conclusions” — seemingly excludes Ramanujan’s entire output. Unless, of course, one wants to argue that the reasoning that went on was largely unconscious. But this sort of talk wouldn’t have gone down very well with the early Russell’s positivist friends who tended to ridicule the very idea of the unconscious.

            Most eminent pure mathematicians in the twentieth century have been either open or closet Platonists — Hardy himself was an open one. Mathematical Platonists believe that the truths of mathematics are true in an absolute sense : they are not human inventions, and cannot be refuted by an appeal to observation and experiment. But Hardy was also a militant agnostic, a sort of mathematical Dawkins, and thus hostile to anything smacking of ‘mysticism’. The vision of the higher mathematical Sacred Grail required years of hard work, university training and self-discipline — there was no royal road to analysis. And here was a fellow who claimed to receive formulae for hypergeometric series and elliptic integrals in dreams and who attributed his mathematical achievements to his family’s tutelary goddess, Namagiri. This was Rider Haggard or worse.

            If one takes a Formalist point of view, mathematics is invention, and belongs to the arts rather than to the sciences — at least in principle. In practice, however, students of mathematics are never invited to devise their own symbolic systems in the way in which, for example, artists are invited (or more often obliged) to choose their own subjects for their paintings.
It is just about permissible, at least in popular books on mathematics, to speak of ‘mathematical instinct’ and ‘inherent mathematical judgment’, but few writers on mathematics even attempt to define such terms, which belong to aesthetics or psychology. Obviously Ramanujan did have these elusive qualities but the trouble with the ‘creativity’ angle is that it leads us into the murky underwater channels of the unconscious, and the least that can be said of modern mathematicians is that they don’t fancy getting their feet wet. Also, it does not explain why Ramanujan, working almost entirely on his own, homed in on so many of the great themes of nineteenth and twentieth century mathematics, albeit from a rather different angle. One might have expected him to go off completely at a tangent, but he obviously didn’t or he wouldn’t have been elected to the Royal Society. So how did he do it?

            Must we, after all, believe that Ramanujan was a sort of mathematical Joan of Arc? This is an explanation of sorts and has the merit of being the one Ramanujan himself preferred. In the 16th and 17th centuries Ramanujan would not have been such a misfit: even Descartes, the father of modern rationalism, claimed to have been visited by the Angel of Truth. There are nonetheless difficulties with this explanation, even for such an anti-rationalist as myself, principally the fact that Ramanujan was not invariably right. His claim to have in his hands a formula giving the distribution of the primes unfortunately turned out to be mistaken. (It has apparently since then been shown that no such formula can exist.) Of course, there is no reason why a goddess should not err on certain technical points but it is suspicious that the slips made by Ramanujan (or his source) were precisely the ones to be expected from someone not fully au courant with the very latest research into the divergence of infinite series, research that Ramanujan, in his Madras backwater, was unaware of.
            I personally don’t have the sort of trouble Hardy and Russell (or today militant rationalists like Martin Gardner) have with the idea that some people can tune in ‘directly’ to sources of knowledge most of us can’t, though I interpret the phenomenon more in terms of Jungian ‘Group Minds’ or ‘Collective Memories’ than in terms of goddesses and spirits. It may be that Ramanujan from the mysterious East connected up with planes of being invisible to us educated Westerners, had readier access to the Akashic Records of Mme Blavatsky, if you like. I certainly have less of a problem with this approach than with that of mathematical Platonism. The latter made good sense in the days when people viewed God as the Supreme Mathematician (as Kepler and Newton did) but cuts little ice today with anyone except professional mathematicians. For what it is worth, the consensus amongst physicists today is that the world we live in is not the result of intelligence and planning — it just happened. And the fact that mathematics has proved to be a useful tool in investigating the cosmos doesn’t in the least mean the cosmos is inherently mathematical. Is a cat mathematical? To actually model a predator pursuing its moving prey on the savannah, quite complicated mathematics involving differential equations is required, but no one in his right senses is going to suggest that a cheetah or a cougar knows what he or she is doing mathematically speaking, or needs to : trial and error and natural selection suffice. And, as far as we know, there’s nothing special about the values of the most important physical constants, G, c or the fine structure constant: so far all attempts to derive such values by a priori reasoning, as Eddington tried to do, have been miserable failures. We just happen to be in a universe where these constants have the values they do and that’s ultimately all there is to it.
And if there is something beyond and behind all possible and actual universes, the Matrix to end all matrixes, my feeling is that neither words nor symbols nor numbers are going to be of any help here: “The Tao that can be named is not the original Tao” (first line of the Tao Te Ching). As far as I am concerned mathematics deals strictly with what is measurable and, whatever ultimate reality is, it’s certainly not measurable or it wouldn’t be ultimate.

            So how do I explain Ramanujan? As someone who believes that the origins of mathematics lie in our perceptions of the physical world in which we live, I must admit Ramanujan worries me a bit. Because of the terseness of his results and his air of absolute conviction he does, at first glance, look like someone who has a window on a higher reality, a strictly mathematical one, and who has only to transcribe what he sees. But then again part of the reason for this lies in his idiosyncratic working habits. In India at any rate — where he did most of his creative work — he did his mathematics with chalk and slate because he found paper too expensive. He rubbed out with his elbow as he progressed and only noted down the final result. So he probably couldn’t remember the intermediate steps by the time he’d finished and had no means of checking. Maybe he even covered up his tracks on purpose : we don’t really want a magician to reveal his secret, as Cutter, the magician’s ingénieur, says in the film The Prestige: it spoils our pleasure. Indian mathematics never was too much concerned with proofs anyway — there is the famous example of the ‘proof’ of Pythagoras’ Theorem by way of a diagram with the caption “Behold!”

            One thing that’s certain is that Ramanujan was born in the right place and time and that maybe accounts for a lot about his mathematics. India was, at the end of the nineteenth century, a country looking in two directions. It was still immersed in mysticism, the occult, philosophic and religious speculation. But at the same time it had an advanced educational system modelled on the British, and was encouraged to send particularly bright pupils to Oxford and Cambridge. The rational plus the irrational (or supra-rational) is a heady and treacherous mixture but it suits certain types of minds perfectly. Kepler, astrologer and astronomer, mystic and painstaking observer, was a child of a similar place and time, Renaissance Germany. The dangers of irrationalism have been trumpeted in our ears for two centuries already, but there are equally grave dangers attendant upon the exclusive use of ‘reason’. There has always been something threatening and, above all, puritanical in rationalism. Hardy wrote of one of his contemporaries, “Bromwich would have had a happier life, and been a greater mathematician if his mind had worked with less precision”. The Houyhnhnms, the strictly rational beings of Swift’s “Gulliver’s Travels”, are not only rather dull but not even very congenial since they entirely lack spontaneity, tenderness and enthusiasm.

            Much has been said about Ramanujan’s lack of adequate mathematical training. But it was, on the contrary, very suitable — for him. He was given about as much as he needed to get going, namely groundings in most areas including calculus (still little taught in schools at the end of the nineteenth century). He didn’t make it to university but he did get to know several eminent Indian savants and his immediate superior in his office was an excellent amateur mathematician.
So Ramanujan had people he could talk to about mathematics, and it was in many respects an advantage that such persons did not know more than they did, or more than he did — for they would have put him off following down certain pathways. It is an open question whether even Hardy, who discovered him, had, in the last analysis, a good or a bad influence on him.

            Mathematics has, in the last two centuries, become a matter of solving the great problems, and rigorously proving the great theorems, bequeathed to us by the previous generation. It has become grimly serious and has long since ceased to be the carefree exploration of virgin territory that it was in the time of Fermat and Euler. Ramanujan was not a prover nor even especially a problem solver : he was an explorer. In his youth, after giving up the idea of getting into college, he spent five happy years supported by his poor parents doing nothing except sitting on a wooden bench in the sun in front of the family house working at mathematics, his choice of mathematics. After his excursion into Europe he returned to this mode of life in his last years, exploring peculiar things he called “mock-theta functions”. The best thing to do with such a person is to let him get on with it and have someone check up on his results later. But that wouldn’t do in the contemporary era, it sounds far too lax, ‘libertarian’: people might actually get to enjoy mathematics if they were allowed to follow down pathways that caught their fancy.

            In this era of “education, education and education” it is worth pointing out that, though lack of knowledge renders people impotent, too much knowledge available at the drop of a hat makes one lazy, blasé and unimaginative. It is indeed often salutary to be deprived of knowledge. If Pascal’s father had not forbidden him to study geometry, he would not have got off to such a good start by re-discovering whole chunks of Euclid unaided.
Ramanujan kicked off with an out of date pot-boiler, Carr’s Synopsis, which is apparently all formulae and no proof. The author was an enthusiast for his subject, however, and managed to communicate this to his readers. According to Kanigel, the book has a certain flow and movement; indeed I’d like to read it myself and I’m sure I’d get a lot out of it.

            Now you can’t teach ‘exploration’ but it can be encouraged. In contemporary schools and colleges it practically never is. What we get is the message to the world delivered by the Head of the American Patent Office in 1890 : “Everything worth discovering has already been discovered” (he actually said that), with the exception perhaps of a few abstruse issues that require ten or fifteen years of preparatory training in a college of higher education. As it happens, one of the most exciting mathematical events in the last twenty-five years has been the discovery (or rather invention) of fractals. But they were turned up by an explorer of mathematics, Mandelbrot, who worked at the time for IBM, not Princeton University — I gather that even today the snobbish pure mathematical fraternity in America does not accept Mandelbrot as being part of the club. And it all came out of looking into a simple function that goes back to Newton and is known to most sixth formers.

            The great objection to exploration is that there’s no point in re-inventing the wheel. But there is. Firstly, invention or re-discovery gives you a thrill that answering routine questions set by someone else never does. Secondly, it gets you into the habit of inventing and “If you want to be a blacksmith, go and work in the forge”. Once you’ve started inventing, you may well end up with something that really is original since discovering something for yourself is much more likely to lead on to further discoveries.
A retired civil engineer of my acquaintance, Henry Jones by name, with no mathematical training beyond ‘O’ level, produced a weird-sounding definition of an ellipse, namely the locus traced out by a point on the circumference of a revolving circle, the centre of which is revolving around a fixed centre at half the speed of the point in question. This is in effect the parametric equation of the ellipse which goes back to Copernicus though Jones did not know it. This hopelessly old-fashioned geometric definition suggested an immediate application — which the algebraic definition doesn’t — and Jones went on to design a compass which could draw ellipses, as well as circles and straight lines, since the circle and the straight line are, mathematically speaking, limiting cases of the ellipse. (Although he took out a patent I believe the Jones elliptical compass was never manufactured, though it deserved to be.)
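            The definition is easily tested numerically. The Python sketch below rests on my own reading of it, which is an assumption and not something taken from Jones's patent: the centre of the small circle travels anticlockwise round the fixed centre through an angle t while the point on its rim turns clockwise, so that relative to the arm joining the two centres the point turns through 2t, twice as fast as the arm. The radii 3 and 1 and the function names are likewise illustrative.

```python
import math


def jones_point(t, arm=3.0, rim=1.0):
    """Position of the tracing point when the arm has turned through angle t."""
    # Centre of the small circle, carried anticlockwise round the fixed centre.
    cx, cy = arm * math.cos(t), arm * math.sin(t)
    # Point on the rim, turning clockwise (angle -t in the fixed frame).
    return cx + rim * math.cos(-t), cy + rim * math.sin(-t)


def ellipse_residual(x, y, a, b):
    """Zero when (x, y) lies on the ellipse with semi-axes a and b."""
    return (x / a) ** 2 + (y / b) ** 2 - 1.0


# With arm = 3 and rim = 1 the semi-axes should be 3 + 1 and 3 - 1.
angles = [2 * math.pi * k / 360 for k in range(360)]
worst = max(abs(ellipse_residual(*jones_point(t), 4.0, 2.0)) for t in angles)

# Limiting case arm = rim: the point should stay on a straight line (y = 0).
flattest = max(abs(jones_point(t, 1.0, 1.0)[1]) for t in angles)

print(worst, flattest)
```

            On this reading the point is at ((arm + rim) cos t, (arm - rim) sin t), which is the parametric form of the ellipse; setting rim to zero gives a circle and setting rim equal to arm gives a straight line, the two limiting cases mentioned above.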
            On the basis of Kanigel’s book, I don’t think I am able to subscribe to the conventional wisdom that “if only Ramanujan had had the proper training what a great mathematician he would have been!” More likely strict training would have turned him, or killed him, off. Einstein himself, a mediocre physics student who found himself obliged to borrow the notes of his friend Grossmann when preparing for his final examinations, only just survived the academic obstacle course (Note 3). Ramanujan had the sort of education suitable for a bold and imaginative person; more would have weakened his self-confidence and destroyed his enthusiasm for the subject. His very failures were glorious. Although Ramanujan’s claim that he had a function giving the distribution of the primes fails for very large numbers, it is for all that a tremendous achievement. “Of the first nine million numbers, 602,489 are prime. Ramanujan’s formula gave a figure off by just 53 — closer than the canonical version of the prime number theorem.” (Kanigel, op. cit.) This really is David against Goliath, on the one hand a hundred or more years of research from the cream of the West’s pure mathematicians with all the data available and on the other a man with a slate and a piece of chalk who had never even heard of the Cauchy Integral Theorem. If he’d done nothing else the man deserves a name in the history books — and this was one of his errors!
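            The count of primes quoted by Kanigel is the sort of thing Ramanujan had to take from printed tables but which anyone can now check at home. The sketch below, in Python, uses the ancient sieve of Eratosthenes; the function name prime_count is mine. Ramanujan's own formula is not reproduced here since the passage quoted does not state it.

```python
import math


def prime_count(limit):
    """Number of primes not exceeding `limit`, by the sieve of Eratosthenes."""
    if limit < 2:
        return 0
    # One flag per number from 0 to limit: 1 means 'still possibly prime'.
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            # Strike out the multiples of p, starting from p squared.
            struck = len(range(p * p, limit + 1, p))
            sieve[p * p :: p] = bytearray(struck)
    return sum(sieve)


print(prime_count(100), prime_count(1_000_000))
```

            Calling prime_count(9_000_000) takes a second or two on an ordinary machine and can be compared with the figure of 602,489 given in the quotation.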


 * The title of Kanigel’s book, The Man Who Knew Infinity, is a misnomer and would be more applicable to a biography of Cantor. To my knowledge Ramanujan never showed any interest in Transfinite Ordinals and, when he came to England, does not seem to have even heard of Set Theory. The Man who Knew and Loved Ordinary Numbers would have been a more suitable, but less eye-catching, title for a biography of Ramanujan.

Observations on 'Li'


Li — what is li ? The basic sense seems to be “the pre-established harmony and unity of the universe” (Chu Hsi).

            There are three ideas here. Firstly, Chu Hsi states that this ‘order’ is pre-established, it is not something that we would like to come about (such as the Millennium) nor something that was once but is now out of reach (like Eden) , it is simply there and always will be. Secondly, there is the idea of harmony, which implies variety and movement — one could hardly speak of the ‘harmony’ of Nirvana or the Platonic solids. Finally, there is mention of unity , which implies, amongst other things, that there is no fundamental difference between man and the natural world. Li also has the more mundane sense of ‘rites’, ‘ritual action’ and again ‘propriety’, ‘ceremonial behaviour’. Confucian thought traditionally divides up the cosmos into the triad Earth, Man, Heaven. The latter, Ch’ien, Heaven, is above all ‘order’ : it has permanence and stability and most traditional Chinese thinkers seem to assume, without question, that this underlying order is ‘good’ — ‘good’ in the sense of ‘desirable’ or ‘as it should be’ rather than in the sense of ‘benevolent’. Man has limited free will : he can align himself or not according to the ‘rule of Heaven’ though ultimately he/she will be brought back within the larger scheme even if he rebels against it. If his actions in life, his own li, mirror or embody the overriding heavenly li, he will attain contentment — “the good life consists in attunement to li” (Chu Hsi). Within the human sphere li is thus behaviour which is aligned to the ‘rule of Heaven’, the harmonizing of the here-and-now and the eternal (which is the essential aim of ritual). Courtly ceremonies, politeness, suitable dress and music (to which Confucius gave much importance) are all means to the same end. 
The following passage from a philosopher of the Chinese classical era shows the connection between li as ‘rational behaviour’ and ‘celestial harmony’ — the spirit is very close to that of the Enlightenment French philosophers who, of course, greatly admired Chinese civilization. “In all matters relating to the functioning of the body and the mind, if they are in keeping with li , there will be a far-reaching self-control; if they are not, there will be a disordering of the rhythm of living. Thus in eating and drinking, in clothing and housing, in [the alternation of] energetic action and stillness, if these matters are in keeping with li , then there is the harmony of moderation : if they are not, then there is physical collapse and disease. In matters of outward appearance and bearing, in meeting and parting with people, in one’s style of walking, if these are in keeping with li , then there is the beauty of refinement about them : if they are not, then they show arrogance, surliness, vulgarity and a barbarous spirit. Thus it is that without li man cannot live, nor his business in life succeed, nor his states and families abide.” (Hsun Ch’ing)
All this, however, is a strictly Confucian approach, stressing as it does deliberate behaviour. The Taoist approach is quite the opposite though the underlying aim is identical : to get human behaviour in tune with the rhythms of the universe. For the Taoist, and even more so for the Zen Buddhist, only a spontaneous response to a situation can have li (or ‘be li’?). Hence the development of ‘grass-style’ calligraphy (‘ts’ao-shu-fa’) which supposedly imitates the movement of grass bending in the wind, or ‘hsieh-i’ brush painting which is done in a flash ideally without the brush even leaving the paper — the great modern exponent of this style is Ch’i Pai-shih. It should be noted that the aim is not self-expression and spontaneity for its own sake, though there were undoubtedly painters, particularly in the latter Sung era, whose work does sink to this level, much as our own ‘modern’ art has done. The underlying purpose is to give expression to latent energies and event-patterns within the individual and within nature :


            “In his paintings Ch’i attempts to express the hidden order of things, to depict their substance as a general symbol and to catch the rhythm of life in nature. (…) Ch’i’s art is a synthesis of the concrete and the spiritual; it is an expression of balance between the objectivity of the world and the subjectivity of the creator.”

                                                                        Josef Hejzlar, Chinese Watercolours


Appropriateness

            In a number of contexts ‘li’ simply means ‘behaviour appropriate to the circumstances’ which sounds at first sight rather commonplace but is actually a very far-reaching conception. The notion is almost completely absent from the Western cultural tradition which, mainly because of Plato and his influence on Christian theology via Saint Augustine, focuses on absolutes. From the Chinese point of view, the cosmos is in a perpetual state of flux — or, more correctly perhaps, is perpetual flux — although there are certain recognizable repeating event-patterns ; in consequence behaviour which is a proper response to the situation at one moment may well be quite misguided a moment later. Hence the value of the Y Ching (‘The Book of Changes’) which was originally regarded as a technical work, on a par with a treatise on hydraulics — this is why the first Ch’in Emperor, when he ordered the burning of all unnecessary books, spared the Y Ching considering it to be, not a philosophical work, but a work of practical utility. The Y Ching purports to tell the enquirer what the ‘world-situation’ is at that precise moment, and what it is most likely to evolve into. Thus, by responding appropriately to the situation the individual can “be harnessed by, and harness for himself the changing state of nature” (“The Fortune Teller’s Y Ching”).

            If we take this idea of li as appropriateness and apply it across the board, we end up with some interesting conclusions. Moral virtues are, then, not absolutes in the sense of being praiseworthy whatever the circumstances. Modesty is undoubtedly a virtue but when inappropriate is indistinguishable from cowardice. This approach does not necessarily lead to moral nihilism or relativism since we can still hold to certain general principles while being at all times ready to adapt them to the passing moment. Indeed, fitting actions to the circumstances is itself a general principle, while opportunism is a debased form of li since the notion of a higher order and an objective standard of ‘rightness’ is completely lacking. Photographs of buildings can be beautiful in themselves, but to be successful as buildings, churches or houses must harmonize with their surroundings. A baroque cathedral requires a baroque city. The eighteenth-century architects who found it necessary to make vast changes in the landscape when they designed and built a country house were, from the point of view of li, completely right : the building needed an appropriate setting. Of course, one could equally well take the Romantic view that the natural setting should dictate the style of architecture. To be a ‘world-historical figure’, it is not enough to have exceptional abilities : one must be in tune with the underlying (but not necessarily the apparent) Zeitgeist. A genius in the wrong place and time will achieve nothing : if he had not lived during the ferment of mid-seventeenth century England, Cromwell, who had no military training whatsoever and no military interests even, would have remained an obscure country squire. Einstein’s genius fitted his time (1905-1920) — but only just. 
Twenty years earlier his ideas would have been too novel (thus not li), while twenty years later Einstein found himself fighting a desperate rearguard action against Quantum Mechanics in the name of classical physics as he conceived it.

            What of pure mathematics? Without a doubt contemporary pure mathematics is not li. The idea of producing a proof so long-winded that it requires a computer to print it out, let alone check it (Four Colour Theorem) is just plain ludicrous — and what is ludicrous is by definition not li, is ‘anti-li’. Judged in terms of appropriateness, Fermat’s Last Theorem has not yet been proved and quite possibly never will be since a result in Elementary Number Theory should, to be li, only use the methods of Elementary Number Theory. On the other hand, Wiles’s approach is perfectly acceptable as a means of establishing the Taniyama-Shimura Conjecture since the latter concerns modular forms, a very modern branch of mathematics.

“Here is a branch that is short, and here is a branch that is long” (Ts’ui-wei)

I was relieved when I first read that it had been proved that no formula will ever give the complete distribution of the primes : this is how it should be. If he were alive in our scientific and mathematical era, Ts’ui-wei might well have written, “Here is a number that is prime and here is a number that is composite”. Several people have remarked on the agreeable combination of apparent deep structure and randomness that the distribution of the primes exhibits. But this is exactly what nature when left to itself exhibits almost everywhere! To be sure, we do not expect this combination within number theory and its presence at the very heart of the natural number system is highly significant : it suggests that the distribution of the primes is ‘natural’ in a way that man-made distribution functions are not. Although I believe there must be some physical/mathematical constraints for there to be a universe at all, living Nature does not seem to adhere to them with much consistency. I am not referring to the unpredictable element introduced into evolution by chance mutation — though certainly this is an extremely important fact of life. On a more mundane level, just look around you at the extraordinarily diverse and convoluted forms of plants, trees and grasses. One might have thought that the ‘laws of physics’ combined with the ceaseless ‘struggle for existence’ would have left in place only a very few mathematically correct shapes which maximized certain parameters. If we assume an upright stem (or trunk), and the periodic production of leaves and branches around this axis according to a single interval fixed in advance, it can be shown that a distribution based on the angle $360^\circ/\Phi^2$ (roughly 137.5°), where Φ is the golden ratio, is the most advantageous since it keeps successive branches well spaced out while allowing them all to receive the light of the sun. 
Having worked this out in the study, I went out armed with callipers and protractor to see how many plants and trees actually employ this angle (sometimes known as the Golden Angle). The answer was none at all as far as I could make out.1 In reality shrubs and trees don’t need to bother about all this since their branches, being flexible, can easily curve round to avoid each other. One even comes across plants making the elementary mistake of using an angle of 180° : clearly they have not yet heard of irrational numbers. The moral is that although certain features are fixed in advance, in the genes, plenty of other features are deliberately left unspecified with the result that the plant can adapt to varying environments and improvise its responses (which a man-made mechanical device is incapable of doing). The planned features give the feeling of underlying order, the unplanned the sense of randomness : what you never get in the animal kingdom is shapes taken from a textbook of Euclidian geometry. If you want to find li in this sense of ‘order + randomness’ your best bet is to go somewhere untouched by man, a deserted beach, a wilderness. In practice few trees and shrubs have a single upright trunk anyway and the arrangement of leaves and branches is pretty haphazard — a complete mess mathematically speaking. If you don’t believe me, take a walk in the park.2
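The arithmetic behind the Golden Angle can be checked in a few lines. What follows is only a rough sketch under the idealised assumption made above (a single upright stem, one branch per fixed turn); the names `GOLDEN_ANGLE` and `min_gap` are my own labels for illustration, not anything taken from the botanical literature.

```python
import math

PHI = (1 + math.sqrt(5)) / 2       # the golden ratio
GOLDEN_ANGLE = 360.0 / PHI ** 2    # roughly 137.5 degrees

def min_gap(angle_deg, n):
    """Smallest angular gap (in degrees) between any two of the first n branches
    when each new branch is turned angle_deg further round the stem."""
    positions = sorted((k * angle_deg) % 360.0 for k in range(n))
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    gaps.append(360.0 - positions[-1] + positions[0])  # gap across the 0/360 seam
    return min(gaps)

print(round(GOLDEN_ANGLE, 4))        # the angle itself
print(min_gap(GOLDEN_ANGLE, 10))     # branches stay well apart
print(min_gap(180.0, 10))            # branches pile up on two sides only
```

With an angle of 180° every third branch sits directly above an earlier one, so the smallest gap collapses to zero, whereas with the Golden Angle the first ten branches never come closer than about twenty degrees.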
            In the fascinating section on li on his website, Dr Watkins and the authors he quotes emphasize the mobile, ‘flowing’ aspect of li. Dr Watkins himself defines li as “the order of flow, the wonderful dancing pattern of liquid” while Alan Watts refers to li as ‘a watercourse’ and David Wade says that li “are essentially dynamic formations”. Now, if the patterns to be found in the ripples of sand-dunes, in the cell-structure of a nettle-stalk, in the protuberances on the bark of trees and so forth, are in some sense ‘residues’ or ‘relics’ of a deeper level of reality which is the Chinese view, it follows that this ‘ground-swell’ of existence, the ‘order’ which is of Heaven rather than of Earth, is in motion. David Wade speaks of the observed patterns as “frozen moments” and Dr Watkins relates that he was at one time haunted by the idea of “the prime numbers as moving particles…eventually coming to rest when they achieved dynamic equilibrium” (Prime Numbers, the Zeta function and Li). I emphasize this because it runs completely counter to the entire Western philosophic and mathematical tradition which has always viewed the Absolute as essentially motionless. Plato’s Ideas are static and were intended to be : by Plato’s time the Athenians had had enough of change since the disastrous Peloponnesian war with Sparta and subsequent political upheavals were in everyone’s memories. These beautiful eternal Forms that man could approach only by way of geometry were utterly removed from the conditions of earthly existence

 “War, death, disease could not affect them and their truth

Did not depend on trial or experiment,

Each step self-evident, demonstrable and sure”.


                                                                        Sebastian Hayes, The Initiates


We fare no better if we jump nearly two thousand years to Descartes’ Co-ordinate Geometry. The algebraic formula of a curve y = f(x)  includes all the points along it and it is ‘our fault’ if you like that we have to laboriously work out particular features — to the eye of the omniscient mathematician, God, all these features and doubtless many more not apparent to us are immediately present. The sixteenth and seventeenth centuries saw the birth of dynamics but in reality motion is always presented as a succession of stills — how could it be otherwise since “notre intelligence ne se représente clairement que l’immobilité”? As Bergson pointed out, the trajectory of the moving particle is a set of points : the moving arrow is never in motion. Newton, perhaps following some sort of a mystic intuition comparable to his intuition of the universality of Attraction, groped towards a true mathematics of motion in his theory of fluxions 3 but even he was unable to make it into a coherent doctrine and he found to his annoyance, his version of the Calculus short-changed by that of Leibnitz which dealt in final ratios between infinitesimal quantities.  We may, in fact, ask whether mathematics is, or can be, li in the sense that it reflects and embodies in its operations and formulae features of a transcendental ‘order’ (that of Heaven, Ch’ien). The Platonic answer is, “Yes”, and almost all pure mathematicians in all eras are either open or covert Platonists. The vision of Kepler and Newton was of a Creator God who decided once and for all what the rules governing the universe were to be; moreover, these rules were mathematical in nature and only mathematicians could hope to detect and decipher them. Even today, although most mathematicians have long since dispensed with a Creator God, they hold firm to a strong belief in mathematics, not as an aid to industry and science, but as the nearest one can get to certainty in this world. 
“If there is another world, then it must be mathematical” is the unspoken (and occasionally outspoken) assumption. However, mathematics deals in truths which are essentially unchanging and has the greatest difficulty representing movement — much greater difficulty than music or even painting — so this means that the transcendental realm, if it exists at all, must be static. My personal feeling is that there is indeed a higher level of reality but that it is not static, and so for this and other reasons is unmathematical. The Eastern traditions, inasmuch as one can generalize, tend to view the ‘beyond’ as being ‘in motion’, as, for example, the dharma of Hinayana Buddhism, a flux of evanescent point-instants, or the ceaselessly changing but endlessly recurring event-currents studied by the Y Ching. Mathematics aims at finality and by and large achieves it — which is why it is so impressive. Generally speaking, art does not. A painting, however well-executed and inspired, is hardly going to stop someone else trying his or her hand at the same theme and each generation finds itself obliged to produce its own love songs, funeral dirges and tales of adventure. But once someone has stated that if p is prime and a is not divisible by p, $a^{p-1} \equiv 1 \pmod{p}$, that is it. The theorem can be generalized and proved in different ways but, for all that, it stands there as unchanged and unchangeable as a rock. Mathematics tends to advance by accretion, by building on what has been already established, and for this and other reasons appears to be timeless. Successive generations of mathematicians simply uncover different portions of a gigantic sphinx buried in the sand.
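The theorem cited above (Fermat's Little Theorem) is easy to test on particular numbers. The following is a minimal sketch; the function name `fermat_holds` is my own, and the check assumes that p is prime and that a is not a multiple of p.

```python
def fermat_holds(a, p):
    """Check whether a^(p-1) leaves remainder 1 on division by p."""
    return pow(a, p - 1, p) == 1   # three-argument pow does modular exponentiation

# Every base from 1 to 6 passes for the prime 7.
print(all(fermat_holds(a, 7) for a in range(1, 7)))

# The composite number 9 fails with base 2, since 2^8 = 256 leaves remainder 4.
print(fermat_holds(2, 9))
```

No amount of such checking adds anything to the proof, of course, which is rather the point being made: once established, the result does not need revisiting.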

            There is, however, a serious limitation to this approach : precisely because mathematics aims at finality and logical consistency it cannot tolerate anything the slightest bit random or subjective. Thus it cannot be li in the ‘Order + randomness’ sense : it is ‘Order + Order + Order + ….’. The beauty of Euclidian constructions or modern algebraic formulae is a completely different beauty from that of ripples on sand dunes : it is an unnatural beauty. It is not necessarily the worse for that but one cannot have everything and there is something deeply offensive in statements such as Bertrand Russell’s :

“Mathematics takes us still further from what is human, into the region of absolute necessity, to which not only the actual world, but every possible world must conform” .

Is this true? I say it is not — at any rate not without very serious qualifications. Go out into Nature and you receive an impression that is far, far closer to the Taoist vision of the casual combination of the haphazard and the constrained than it is to the strait-jacket of modern or ancient mathematics. The ‘region of absolute necessity’ of which Russell speaks is essentially a figment of his imagination, a projection, since it corresponds neither to the reality ‘down here’ nor to the reality ‘up there’. Unpredictability has made an astonishing come-back into science during the later twentieth century though rather few people have learned useful lessons from this. I have sometimes wondered whether it would be possible to introduce randomness into a mathematical system without wrecking it completely — as far as I know no one has tried. It may be that only a very different type of mathematics, one that precisely does allow a certain degree of randomness and subjectivism, will be able to cope adequately with the shifting realities of the quantum domain which lies below the sharply defined particle-like level of reality we are more familiar with.
Sebastian Hayes





1  I have since then found some lilies and hollyhocks that use something approaching the Golden Angle.


2   Phyllotaxis in plants does exist but it seems to have more to do with ‘close packing’ at the tip of the growing plant than with optimizing air and sunlight.


3 “I consider mathematical Quantities in this Place not as consisting of very small Parts; but as described by a continued Motion. Lines are described, and thereby generated not by the apposition of Parts, but by the continued Motion of Points; … Portions of Time by a continual Flux : and so in other Quantities. These Geneses really take place in the Nature of things….” Newton De Quadratura

Observations on the Distribution of the Prime Numbers


The first question to ask is: Could the Distribution of the Primes be other than what it is? Seemingly not. Could a ‘universe’ exist where the basic constants of physics, such as c, the speed of light in a vacuum, and G, the gravitational constant, were completely different? Most physicists I have talked to say yes. In any physical universe there would have to be an upper limit to the transmission of information, but there is no reason why it should be anywhere near the value of c. As for G, it has been suggested by some physicists that it actually has changed during the evolution of the universe we live in. Could there be a universe where Hooke’s Law, or Boyle’s Law or any of the other basic laws of physics was not valid? Conceivably.
The distribution of the primes is thus, in some sense, more ‘necessary’ or more fundamental than even the most basic physical constants and principles. (It has apparently been shown that not only is there no formula which will give us the distribution of the primes exactly, but that no such formula can possibly exist.)  But the distribution of the primes is not a logical law nor even a mathematical one : it is a physical law. Consider a hen laying an egg —  I assume a hen that can only lay one egg at a time. This hen carries on laying eggs indefinitely. We make copies of the egg situation at successive moments thus deriving all the natural numbers (in concrete form). They are already ordered, firstly temporally and secondly quantitatively, so there is no need for any ‘Axiom of Order’, let alone for the Axiom of the Least Upper Bound or the Axiom of Choice. We then test as to whether we can make each ordered collection of eggs into so many smaller, non-unitary, numerically equal collections. If we can,  we put such collections on the right, if we can’t we leave them on the left.   Thus   000000 goes on the right, since we can break it down into 
00 00 00, while 0000000 goes  on the left because we can only break it down into 0 0 0 0 0 0 0 . This procedure can be continued indefinitely and requires absolutely no knowledge of mathematics whatsoever. There is no ‘intelligence’ involved as such, no need to posit the existence of a supreme Mind behind the scenes. As soon as you have a ‘world’ where there are ‘little bits floating around’ you have primality and non-primality and you are landed with the distribution of the primes whether you know it or not, and whether you like it or not.
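The egg-sorting procedure described above can be written out mechanically. This is only a sketch of the idea; the names `can_split` and `sort_eggs` are my own inventions, and the code does nothing cleverer than try every possible pile size in turn.

```python
def can_split(n):
    """True if n eggs can be arranged into several equal piles,
    each pile holding more than one egg."""
    # A pile size s with 1 < s < n that divides n gives n // s piles (at least two).
    return any(n % size == 0 for size in range(2, n))

def sort_eggs(limit):
    """Place the collections 1, 2, ..., limit on the left or on the right."""
    left, right = [], []
    for n in range(1, limit + 1):
        (right if can_split(n) else left).append(n)
    return left, right

left, right = sort_eggs(12)
print(left)    # collections that cannot be broken down
print(right)   # collections that can
```

Note that a single egg ends up on the left along with the primes because it cannot be broken down either; mathematicians exclude 1 from the primes by convention, which is a decision the eggs know nothing about.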

            As far as I am concerned I am very pleased about this : it is the victory of Nature, ignorant, witless Nature over human intelligence. “Pull down thy vanity, mathematician, pull down”. To judge from the writings of certain people, you would imagine that the actual distribution of the primes in reality was a crude and misguided attempt to approach the Li function or the Riemann distribution. And yet at the same time there is nothing special about the distribution of the primes — at any rate not to my eyes. I am sceptical about the so-called ‘beauty’ of this distribution and am convinced that no one would for a moment pay any attention to it were it not for the extraordinarily complicated mathematics that is (indirectly) involved. The curve on the graph is no different from hundreds of others, and the coloured 3-D pictures no different from thousands of other vaguely psychedelic computer simulations. Show patterns based on the distribution of the primes to an assortment of people who don’t know their origin and see whether they pick them out from the rest. I wager no one would. And, incidentally, this is not true of all shapes, all curves : there are plenty of mathematical shapes which really are inherently beautiful and fascinating, such as the parabola, or the equiangular spiral, the spira mirabilis that Bernoulli was so enamoured of that he had it inscribed on his tombstone. Even the graph of log x is, to me, aesthetically more satisfying than the Prime Distribution.
So in a sense the prime distribution is ‘nothing special’ : it has the supreme Zen quality of being as it is and not otherwise — but then so does everything else.  We should, I think, consider what exactly we are looking for if we want a ‘reason’ for the distribution of the primes. An explanation should involve facts or principles that are much more fundamental than the fact or behaviour we want to explain. A vast amount of so-called ‘elementary’ Number Theory is based on such basic truths as A number cannot be at once odd and even or pragmatic procedures such as  the Euclidian Algorithm which is so simple (yet so powerful) that it could be carried out by a caveman using collections of twigs or pebbles.
But on what great truths is the Prime Distribution based? I see none. Only that Some numbers are prime and others are  not and they come in a certain order  which is not so much a truth as a fact of experience. It is for this reason that I have always derided any attempt at finding any  great significance in the distribution — until I stumbled across Matthew Watkins’ website.
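To show just how little the Euclidian Algorithm mentioned above demands, here is a sketch of its oldest form, which uses nothing but repeated removal of the smaller heap from the larger. The name `pebble_gcd` is my own, and the sketch assumes both heaps start with at least one pebble.

```python
def pebble_gcd(a, b):
    """Greatest common measure of two heaps of pebbles (a, b >= 1),
    found by repeatedly taking the smaller heap away from the larger."""
    while a != b:
        if a > b:
            a -= b
        else:
            b -= a
    return a

print(pebble_gcd(12, 18))   # heaps of 12 and 18 share a common measure
print(pebble_gcd(35, 64))   # heaps with no common measure except the single pebble
```

Modern versions replace the repeated subtraction by division with remainder, which is faster but no different in principle.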

             Seemingly, we have to recognize that there are certain mathematical assertions that may never be ‘proved’, not for any highfalutin Gödelian reason, but simply because they are about as basic as one can go. Quite possibly Goldbach’s Conjecture (“Every even number > 4 is the sum of two odd primes”) is a case in point. Wiles’ much vaunted proof of Fermat’s Last Theorem is only valid if one considers that the assertion on which it is based, namely the Taniyama-Shimura Conjecture, is in some sense more basic than the (apparent) fact that there are no cubic or higher order Pythagorean Triples. (Actually, it would seem to me there must be some strictly physical rather than mathematical reason for the truth of Fermat’s Last Theorem, something involving dimensionality of the real, rather than the mathematical, world.) Also, the Distribution of the Primes is almost completely useless — if we except its recent use in codes. No great discoveries in physics depend on it, or seem likely to. (I am aware of the ‘chaos’ interpretation of the Prime Distribution and find this interesting but it was not observation of the Prime Distribution that gave rise to Chaos Theory.) 
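Goldbach’s Conjecture, as stated above, is the kind of assertion that anyone can test by hand for as long as patience lasts. The sketch below does the same by brute force; the helper names `is_prime` and `goldbach_pair` are my own, and checking a finite range proves nothing about the conjecture itself.

```python
import math

def is_prime(n):
    """Trial division: adequate for the small numbers used here."""
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))

def goldbach_pair(n):
    """Return one pair of odd primes adding up to the even number n > 4,
    or None if no such pair is found."""
    for p in range(3, n // 2 + 1, 2):
        if is_prime(p) and is_prime(n - p):
            return (p, n - p)
    return None

print(goldbach_pair(6))
print(goldbach_pair(100))
print(all(goldbach_pair(n) for n in range(6, 2000, 2)))
```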

            The uselessness of the Prime Distribution is highly significant. As I see it, the Distribution of the Primes in itself gives us negligible knowledge about the physical world (I would say none at all) : if such knowledge really were embedded in it surely we would have got the pearl out of the oyster by now. Of course, the problem of the Prime Distribution has been a most stimulating and entertaining intellectual exercise for generations of mathematicians but that is not the point. The mathematics enveloping the Prime Distribution is no more revealing of the structure of the world we live in than Mozart’s symphonies, and we don’t go to Mozart for knowledge but pleasure.

            At the same time, the interest the Prime Distribution is currently arousing (of which I was not aware until scanning Dr. Watkins’ website) is not just intellectual and aesthetic. There are articles on the distribution of the primes which view it as a ‘chaotic’ phenomenon, there is the claim that the Riemann zeta function is a generator of a vast class of functions, and, most significant of all, we have the interpretation of the zeta function as “a thermodynamic partition function defining an abstract numerical gas”. What this amounts to is that the Distribution of the Primes has a quasi-physical nature : so maybe it does have something  to tell us about reality after all.  

So what to conclude?  The only possible way forward is to suppose that the Distribution of the Primes tells us something about a deeper level of reality from which the visible and intellectual universe we know once emerged, and is still emerging.  Is the universe self-sufficient?  Self-explanatory?   It would seem not. All societies and practically all thinkers have at some stage found it necessary to appeal to some being or principle which is outside the physical universe. Newton and Kepler still believed in a supremely intelligent Creator God  and the rationalist thinkers of the Enlightenment, despite their hostility to organised religion, still needed a Prime Mover or a vague impersonal Deity. Mathematicians found themselves in a quandary when the nineteenth century brought about the death of God : they were left with a handful of equations and formulae without a supreme intelligence that produced them. And curiously, the twentieth has taken us right back to the idea of a beginning in time and a Space-Time singularity beyond which is….?  

            Most pure mathematicians, closely followed by theoretical physicists, are secret — sometimes overt —  Platonists and do indeed posit a reality beyond the material. However, they are unanimous about this ‘higher reality’ being mathematical in nature. They do not ever think for a moment that it may be professional blindness that impels  them to this conclusion. Musicians would doubtless be more attracted to the Vedic doctrine that “In the beginning was the sound”  and lovers to the idea that the universe came about through amorous play.

           I do believe that there is “an order of things of an entirely different kind lying at the foundation of the physical order”, as Schopenhauer put it, but I am equally convinced that this order is utterly unmathematical. “The Tao that can be named is not the original Tao”. Lao Tze in the fifth century BC was living in a mainly verbal, not numerical, culture : today he would almost certainly write “The Tao that can be numbered is not the original Tao”. Within the reality beyond this one — let us call it K0 as opposed to K1 — there would, as far as I can see, be no separability and no discreteness, no shape and no form. It would be a domain beyond, and prior to, plurality : the only number appropriate to it would thus be 1 (or equivalently 0). Is there anything at all we can say about it? A little. It is presumably ‘continuous’ which nothing in this universe actually is. The current universe must in some sense be contained within it, since otherwise nothing would come of nothing and manifestly something has. Also, surprisingly in a way, it would seem that K0 is in motion, in perpetual motion. In the more speculative part of Dr Watkins’ website he speaks of the ancient Chinese concept of li and views li as “essentially dynamic formations”, perhaps analogous to Newton’s mysterious ‘fluxions’ which have been completely rejected by modern mathematicians. (Newton was incidentally much interested in alchemy and mystical literature.)

            The only way I can take on board the idea that the Prime Distribution is significant and meaningful is by interpreting it as a sort of ‘frozen wave’ on the ocean that is K0. The physical world, K1, is not primary, but is a residue, an offshoot, of K0 in much the same way as David Bohm’s Explicate Order is an offshoot from the original Implicate Order. However, it may be a ‘first order’ residue or offshoot and thus hold precious information about the reality that is beyond this one.

“The Child is father of the Man” (Wordsworth) — a most paradoxical statement. Wordsworth presumably meant that the child was closer to the source and therefore had a more vivid memory of what existed before birth. It may be, then, that we see in the Distribution of the Primes a relatively pure trace of what is almost (but not quite) unknowable and from which the entire physical universe has emerged. Whether true or not, this is certainly a beautiful thought and I am most grateful to Dr Watkins for introducing me to it. For what is striking is that the natural numbers, which by their discreteness and separateness are entirely of this world and can say nothing about the beyond, nonetheless by their distribution perhaps point towards a reality that is the very opposite of all this since it is single, unitary, continuous and in perpetual motion.

Contra Cantor





Passing in review the various paradoxes, linguistic and mathematical, that bothered logicians around the beginning of the last century, Russell and Whitehead — I shall henceforth just say Russell — found that “they all result from a certain kind of vicious circle” that consists in “supposing that a collection of objects may contain members which can only be defined by means of the collection as a whole” (RW, 37). As an example of what they had in mind they cited the statement “All propositions are either true or false”.  Russell comments:


   “It would seem that such a statement could not be legitimate unless ‘all propositions’ referred to some already definite collection, which it cannot do if new propositions are created by statements about ‘all propositions’ ” (RW, 37).


            More mathematical examples are the Set of All Sets — is it a member of itself? — or Burali-Forti’s paradox of the Ordinal Number of All Ordinals.

            Russell suggests stopping such statements being made, or at any rate being accepted as meaningful by logicians — “Whatever involves all of a collection must not be one of the collection”  (RW, 37). Poincaré coined the term ‘impredicative’ for statements that define an object in terms of a collection to which the object being defined belongs. He considered that impredicative definitions should be banned from mathematics.

            But what happens if we do want to talk about ‘all’ the members of such a collection? This, Russell assures us, need not pose any insuperable difficulties. A statement about ‘all’ of a certain collection is of ‘higher type’ than a statement about specific members of the collection and in consequence must be excluded from the range of application of the statement. The Set of All Sets is ‘of higher type’ than any Set you like to mention which will be one of its members, and so we do not get the ridiculous situation of the Set of All Sets being at one and the same time a member, and yet not a member, of itself.

            At first sight Russell’s solution sounds both sensible and effective. However, it soon became a major embarrassment to him, for not only did strict application of the theory of types make a lot of proofs very cumbersome, it actually invalidated a lot of them. As Weyl and others pointed out, analysis turned out to be littered with impredicative formulae. This stimulated the Intuitionists to reformulate the whole subject but Russell had no intention of taking such a heroic course. He states airily in the Introduction to the 1927 re-edition of Principia Mathematica that “though it might be possible to sacrifice infinite well-ordered series to logical rigour, the theory of real numbers….can hardly be the object of reasonable doubt” (RW, xlv, 1927). But why not? Russell’s reply sounds suspiciously like an eighteenth century clergyman’s assertion that “the eternal existence of a Creator God cannot seriously be questioned”.  

            Subsequent mathematical discussion of these issues has clouded rather than clarified the basic principles at stake: in particular far too much attention has been given to the validity or otherwise of the so-called Axiom of Choice. As it seems to me, the problem is not ‘impredicative statements’ as such — this is something of a red herring — but a failure to distinguish between  ‘definite’ sets  and ‘indefinitely extendable’ sets. By definition the former are fully constituted once and for all, and thus listable, whereas the latter are not. Confusing the two is the real ‘category mistake’ at the root of all the kerfuffle. 

            In conversation we normally deal with two, and only two, types of sets or collections, those that are what I call definite and those that are continually being extended. The persons living in the UK at the present moment constitute a definite set which can be (and actually is) listed — at any rate within the bounds of bureaucratic error. The set of all human beings, past, present and future, is not a definite set but a continually expanding one, and one that will presumably continue to expand as long as the species exists. 

            Self-referential statements of the type “Whatever I say is untrue” only cause trouble because there is a certain ambivalence about the type of collection we are dealing with. One schoolboy philosopher exclaims to another during the lunch-break, “You know, there’s not a single thing I’m sure about!” His companion rejoins, “Ah! but there is one thing at least you’re sure about, and that is that you aren’t sure about anything!”

            Sceptic’s first statement only referred to the fully definite set of all beliefs he had actually considered up to that moment, and a standpoint of all-round scepticism was  not one of them. It would be quite perverse to consider his first statement as referring to the collection of all possible beliefs the human species might conceivably entertain. The belief “I don’t believe in anything” was not, at the beginning of the discussion, a member of the Set of All Beliefs Sceptic Had Considered (a definite set) but after the end of the conversation it was. His first statement was time and context dependent : it was not an intemporal assertion.

            At a future time Sceptic, if he were consistent, would say, “I’m not sure about anything — except the statement I made to you yesterday that I wasn’t  sure about anything I’d considered up to then.” The Set of Beliefs Sceptic Was Sure About started off empty, then contained one member, perhaps went on to containing two members if Sceptic carried on with declarations in the same style, and so on.

            All this hardly seems worth dwelling on. So why the fuss ? Because, when it comes to mathematics, the situation is very, very different. Mathematical assertions are not generally considered to be time and context dependent, they are in some sense held to be ‘eternally true’, true even before human beings or the universe we live in existed. Once true, always true, when it comes to mathematics. 

            So far it has not been necessary to introduce the fatal word ‘infinite’ but it cannot be withheld any longer. Can any so-called ‘infinite’ set ever be a fully constituted totality, a ‘definite set’ ? I do not see that it can. The only sensible way of treating ‘infinite’ sets is to view them as open-ended partly definite sets which can be extended as far as we wish. This is entirely in line with the way we proceed in normal speech and conversation — which, one strongly suspects, is the main reason mathematicians disapprove of such an approach.  

            What we must above all not do is to treat an open-ended indefinite set as a fully constituted one. But ever since the advent of Cantor and Dedekind, this is exactly what is done in mathematics. This is the essential ‘category mistake’, the sin for which there is no forgiveness, not Russell’s ‘self-referential misapprehension’. Some mathematicians, notably Cantor himself, were frank enough to put their cards on the table and declare that they really did believe in the existence of the transfinite. Even Russell, though at the time a positivist, introduced into his Principia the controversial Axiom “That infinite classes exist” (RW, *120.03). Most modern mathematicians are, however, content to evade the issue : as Davis and Hersh point out, the modern mathematician is two things at once, a Platonist in the study, but a Formalist when confronting the outside world.


Cantor’s Proofs


Cantor’s proofs are of two main types, one acceptable (to me), one not. Let us first take his proof that the rational numbers between 0 and 1 form a null set.

            This depends on two prior results, his ingenious diagonalisation of  Q, the  rational numbers, and the well-known limit (as usually stated)

              $$\lim_{n \to \infty} \left( \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \dots + \frac{1}{2^n} \right) = 1.$$

            Since, for any positive rational number you like to name, say 1/N, I can always find a smaller one, namely 1/(N+1), it looks at first sight as if it were impossible to list the rational numbers, first, second, third &c., i.e. put them in one-one correspondence  with N, the natural numbers. But Cantor showed how this could be done. For example, those between 0 and 1 can be listed as follows:


            0, 1, 1/2, 1/3, 2/3, 1/4, 2/4, 3/4, 1/5, 2/5, 3/5, 4/5, 1/6…..


            This is not an ordering by increasing or decreasing size but that does not matter, nor does it matter (too much) that there will be some redundancy: 2/4 will appear though we already have 1/2. The point is that any specified fraction between 0 and 1 will eventually crop up and can be attributed an ordinal from the natural numbers, hundred and seventy-seventh, ten-thousandth, or what have you. We do not need to know what this ordinal is, but we do know that we can provide it if challenged to do so and given enough time.
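
            The listing can be tried out mechanically. What follows is only a minimal Python sketch under stated assumptions: the names listed_fractions and position_of are hypothetical labels, and the order is taken to be the one displayed above (0, 1, then denominators 2, 3, 4 and so on, repeats included). It produces as much of the list as is asked for and reports the place at which a named fraction first turns up.

```python
from fractions import Fraction
from itertools import count, islice

def listed_fractions():
    """Yield (p, q) pairs in the order 0, 1, 1/2, 1/3, 2/3, 1/4, 2/4, ... (repeats included)."""
    yield (0, 1)
    yield (1, 1)
    for q in count(2):            # denominators 2, 3, 4, ... with no upper bound
        for p in range(1, q):     # numerators 1 .. q-1
            yield (p, q)

def position_of(target):
    """Return the 1-based place at which `target` first appears in the list."""
    if not 0 <= target <= 1:
        raise ValueError("only fractions between 0 and 1 are listed")
    for place, (p, q) in enumerate(listed_fractions(), start=1):
        if Fraction(p, q) == target:
            return place

print(list(islice(listed_fractions(), 8)))
print(position_of(Fraction(3, 5)))   # 11
```

            The generator never holds the whole list at once; it simply goes on for as long as it is asked to, which is all the argument requires.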

            There is nothing objectionable in this procedure since we do not have to envisage the rational numbers between 0 and 1 as a definitively constituted totality existing in some Platonic never-never land — though this is undoubtedly how Cantor himself viewed them.

            The sequence $S = 1/2, 1/4, 1/8, \dots, 1/2^n$ is a geometric sequence with constant ratio 1/2. The terms are respectively $t_1 = 1/2$, $t_2 = 1/2^2$, …, $t_n = 1/2^n$. If we take partial sums $s_1, s_2, \dots, s_n$ we have

            $s_1 = 1/2 = (1 - 1/2)$; $s_2 = 1/2 + 1/4 = (1 - 1/2) + (1/2 - 1/4) = 1 - 1/4$; and

$s_n = (1 - 1/2^n) < 1$ for all $n \in N$.

            The series Sp of successive partial sums is clearly, in my terms, not a fully constituted totality but an indefinitely extendable one. Many slapdash authors, who ought to know better, talk about 1 being “the sum to infinity” of the series: in fact, as is generally the case with series, the limit 1 is unattainable and there is no definitive sum, only a perpetually changing one as n increases, which is why we speak of ‘partial’ sums, though the word is misleading.
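
            The partial sums are easy to check with exact arithmetic. The following is a small Python sketch, not a proof; partial_sum is a hypothetical helper name. It confirms, for the values of n tried, that each partial sum equals 1 minus 1/2^n and so falls short of 1.

```python
from fractions import Fraction

def partial_sum(n):
    """s_n = 1/2 + 1/4 + ... + 1/2**n, computed exactly."""
    return sum(Fraction(1, 2**i) for i in range(1, n + 1))

for n in (1, 2, 3, 10):
    s = partial_sum(n)
    # closed form from the text: s_n = 1 - 1/2**n, always short of 1
    assert s == 1 - Fraction(1, 2**n) and s < 1
    print(n, s, "gap to 1:", 1 - s)
```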

            Cantor now invites us to construct a sequence of open intervals where each interval $I_n$ has centre $r_n$. Each interval starts at the point $(r_n - k/2^{n+1})$ and ends at the point $(r_n + k/2^{n+1})$ so it has length twice $k/2^{n+1}$, or $k/2^n$. Since we have got a way of listing the rational numbers we drop them one after the other into these intervals. And the total length of n intervals is


   $k/2 + k/2^2 + k/2^3 + \dots + k/2^n = k(1/2 + 1/4 + 1/8 + \dots + 1/2^n)$

                                                            $= k(1 - 1/2^n) < k$ since $(1 - 1/2^n) < 1$


            Provided we can decrease k as much as we wish, we can squeeze ‘all’ the rationals between 0 and 1 into an arbitrarily small compass. So a line segment a foot (or a millimetre) long is nonetheless capable of containing an ‘infinite’ quantity of numbers, many more than there are stars in the sky. Cantor has thus, to his own satisfaction at least, shown that the rationals between 0 and 1 are what he calls ‘a null set’: they take up so little space it’s as if they weren’t there at all.
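
            The covering can be carried out for as many rationals as one cares to name. Here is a hedged Python sketch: the function names are hypothetical, the rationals are assumed to be listed in the order given earlier, and the lengths are simply added up without regard to overlaps between intervals, so the figure is an upper bound on the space actually covered.

```python
from fractions import Fraction
from itertools import count, islice

def listed_fractions():
    """Rationals between 0 and 1 in the order 0, 1, 1/2, 1/3, 2/3, 1/4, ..."""
    yield Fraction(0)
    yield Fraction(1)
    for q in count(2):
        for p in range(1, q):
            yield Fraction(p, q)

def covering_intervals(k, n):
    """First n intervals I_i = (r_i - k/2**(i+1), r_i + k/2**(i+1))."""
    intervals = []
    for i, r in enumerate(islice(listed_fractions(), n), start=1):
        half = k / 2 ** (i + 1)              # half-width of the i-th interval
        intervals.append((r - half, r + half))
    return intervals

def total_length(intervals):
    """Sum of the individual lengths (overlaps are counted more than once)."""
    return sum(hi - lo for lo, hi in intervals)

k = Fraction(1, 1000)
ivs = covering_intervals(k, 50)
print(total_length(ivs) == k * (1 - Fraction(1, 2**50)))   # matches the closed form
print(total_length(ivs) < k)
```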

            One might baulk a little at this over-literal way of considering numbers as points on a line (which they are not) and, of course, in the real world there would be a definite limit to the size of k; it could not be made smaller than that of an elementary particle, for instance. However, one might be prepared to let this pass as a temporary exercise of mathematical licence. The main thing is that there is no need to view this procedure as having been carried out for ‘all’ the rationals 0 < q < 1 but only for as many as someone likes to mention. It is usually stated in maths books that this result (the infinite compressibility of Q) is ‘counter-intuitive’ : it would be more accurate to describe it as being unrealistic. This is not a problem if we make sure to continually bear in mind that a mathematical model or construction is not itself part of the physical world.

            Other bizarre results such as the length of the Koch curve fall into the same category. If we start with an equilateral triangle, then build a smaller one on the middle third of each side and continue in this way ad infinitum, it appears that the perimeter of the curious jagged figure can be made to exceed any stipulated length provided you go on long enough, even though the whole creature can be inserted in a disc of, say, radius one metre. Of course, in any actual situation there would once again be a limiting size beyond which it would not be possible to go : there is certainly no need to conclude that we have here a case of “infinity in the palm of your hand” (Blake), though some people seem to think so.
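
            The growth of the perimeter can be put in figures. The sketch below is illustrative only and rests on the standard construction, in which every round replaces each edge by four edges a third as long, so that the perimeter is multiplied by 4/3 each time; koch_perimeter and steps_to_exceed are hypothetical names.

```python
from fractions import Fraction

def koch_perimeter(side, steps):
    """Perimeter after `steps` rounds, starting from an equilateral triangle of the given side."""
    return 3 * side * Fraction(4, 3) ** steps

def steps_to_exceed(side, target):
    """Smallest number of rounds after which the perimeter exceeds `target`."""
    steps = 0
    while koch_perimeter(side, steps) <= target:
        steps += 1
    return steps

print(koch_perimeter(1, 0), koch_perimeter(1, 1), koch_perimeter(1, 2))
print(steps_to_exceed(1, 1000))
```

            For any stipulated length the loop stops after a definite number of rounds; nothing in it requires the construction to have been completed ad infinitum.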

            If now we pass to Cantor’s ‘proof’ that the real numbers are not denumerable, we have a very different kettle of fish. A collection is considered denumerable if it can be put in one-one correspondence with N, the natural numbers — broadly speaking can be listed. We have seen that this is possible for the rationals between 0 and 1. Cantor now invites us to consider an enumerated list of all the real numbers (rationals + irrationals) between 0 and 1. These reals are exhibited in the form of non-terminating   decimals — any other base would be just as good. To avoid ambiguity a fraction like 1/5 has been listed as 0.19999999…..  instead of 0.2 —  absurd though it would be to do any such thing. So there they all are :


            $s_1 = 0.a_{11} a_{12} a_{13} \dots$

            $s_2 = 0.a_{21} a_{22} a_{23} \dots$

            $s_3 = 0.a_{31} a_{32} a_{33} \dots$


            where every a is a natural number between 0 and 9.

            Cantor now produces out of a hat a ‘number’ that has not appeared in the list, call it b. We concoct b by ‘doing the opposite’ as it were. If $a_{11}$ is 1, make $b_1$ (the first digit of b) = 2; if $a_{11} \neq 1$ make $b_1 = 1$. Likewise for $a_{22}$, $a_{33}$, giving $b_2$, $b_3$ and so on. This defines $b = 0.b_1 b_2 b_3 \dots$ But this ‘number’ has not appeared in the list since it differs from $s_1$ in the first place, from $s_2$ in the second place and so on. Therefore, the real numbers between 0 and 1 are not denumerable, and since these are only a small part of R as a whole, R, the Set of all Reals, is not denumerable: a paradoxical result since N is already an ‘infinite’ set, so R must be of a higher type of infinity than N, Q.E.D.
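
            The rule for concocting b can be applied to any finite stretch of such a list. This is a minimal Python sketch on made-up rows of digits (the rows and the name diagonal_digits are purely illustrative assumptions); it builds the first n digits of b from the first n rows and checks that b differs from the i-th row in the i-th place.

```python
def diagonal_digits(rows):
    """Given n digit-strings, each at least n digits long, return b_1..b_n where
    b_i = 2 if the i-th digit of the i-th row is 1, and b_i = 1 otherwise."""
    return "".join("2" if row[i] == "1" else "1" for i, row in enumerate(rows))

rows = ["1999", "3333", "1415", "7182"]   # digits after the point; arbitrary sample rows
b = diagonal_digits(rows)
print("0." + b)                            # 0.2121

# b differs from row i in place i, for every row supplied
assert all(b[i] != row[i] for i, row in enumerate(rows))
```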

            Now this proof by contradiction wholly depends on the original assumption that all the reals between 0 and 1 have been listed, with not one left out. Since Cantor shows one that has been left out, the assumption must have been wrong in the first place. However, if we do not view the reals between 0 and 1 as a fully constituted totality, listable and enumerable, but as an open-ended extendable set, the argument collapses like a burst balloon. All that could ever be on view at a single moment in time is an array of decimals (or other r-esimals) taken to n places. A competing generator handled by Cantor in person cannot produce any real number for given r and n which is not on show since all possibilities are covered. All Cantor can do is print out an arbitrary ‘diagonal’ rational number between 0 and 1 to m places with m > n.

            Since the base used is immaterial let us use base 2 and print out numbers between 0 and 1 using only the symbols 0 and 1. In the first print out we only go so far as one digit, then we print out all numbers with two digits after the point and so on. We have


            0.0       0.00      0.000
            0.1       0.01      0.001
                      0.10      0.010
                      0.11      0.011
                                0.100
                                0.101
                                0.110
                                0.111


            To keep ahead Cantor has to counter with a number containing at least one more digit after the point, but whatever he chooses this number will come up at the next print out. Thus the struggle is ding-dong and inconclusive. It should be stressed that our inability to view R as a whole is not due to our limited range of vision or the size of the memory of the computer or any other technicality : the reals are simply not exhibitable in their full extent because strictly speaking they do not have a ‘full extent’. Even God would not be able to view ‘all’ the real numbers at one fell swoop because there is no ‘all’ to view.
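
            The successive print outs are easily generated. The Python sketch below illustrates only the finite picture just described and assumes that every ‘number’ on show is a finite string of binary digits; print_out is a hypothetical name and the challenge value is an arbitrary example.

```python
from itertools import product

def print_out(n):
    """All binary expansions 0.d1...dn with exactly n digits after the point."""
    return ["0." + "".join(bits) for bits in product("01", repeat=n)]

for n in (1, 2, 3):
    print(n, print_out(n))

challenge = "0.1011"                  # a four-place number offered as a counter-move
assert challenge not in print_out(3)  # not on show at three places
assert challenge in print_out(4)      # but it comes up at the next print out
```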

            Very similar is Cantor’s ‘proof’ that, for all non-empty sets A, the cardinality of A < the cardinality of the Power Set of A. (The Power Set, remember, consists of the sets that can be constructed from the members of A, e.g. if A = {1, 2, 3}, then P(A) consists of A itself = {1, 2, 3}, also the sets {1, 2}, {1, 3} and {2, 3}, the singleton sets {1}, {2} and {3} and Ø, the Empty Set.) Obviously, for ordinary ‘finite’ sets the theorem holds, but, since things become so disconcerting when we pass to consider transfinite sets, Cantor wonders whether it remains valid.
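
            For a finite set the Power Set can be written out in full. Here is a short Python sketch using the standard itertools.combinations; power_set is a hypothetical helper name and the example set is the one used above.

```python
from itertools import combinations

def power_set(members):
    """Every subset of `members`, from the Empty Set up to the whole set."""
    items = list(members)
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]

P = power_set({1, 2, 3})
print(P)
print(len(P))   # 8, against 3 members in A
```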

            In typical fashion Cantor proceeds to assume that an exhaustive mapping from A to P(A) has been carried out. Since for any a we can, faute de mieux, pair it off with the set of which it is the sole member, namely {a}, and this is far from exhausting all the possibilities, Cantor concludes that the cardinality of P(A) cannot be less than the cardinality of A.

            We are now invited to consider the Set B given by:


$B = \{a \in A : a \notin f(a)\}$


i.e. the set containing all those elements which are not members of the sets they have been paired off with in the mapping. It would seem that B is non-empty. But if so, B, being a bona fide member of P(A), must have a pre-image under this mapping, $a_b$ say, i.e. there is an $a_b$ in A such that $f(a_b) = B$. But $a_b$ itself must either belong to B, or not belong to B. We find that if it does it doesn’t and if it doesn’t it does. Thus contradiction. Therefore there can be no such mapping

$f: A \to P(A)$ and so card. A < card. P(A)  Q.E.D.
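
            For a small finite set, where nobody disputes the theorem, the role of B can be checked exhaustively. The following Python sketch is an illustration under that restriction only and says nothing about the open-ended case at issue; power_set and missed_set are hypothetical names. It runs through every mapping f from A = {1, 2, 3} to P(A) and confirms that the set B of elements not belonging to their own image is never itself an image.

```python
from itertools import combinations, product

def power_set(members):
    """Every subset of `members`, as frozensets so they can be compared and stored."""
    items = sorted(members)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def missed_set(A, f):
    """B = {a in A : a not in f(a)} for a mapping f given as a dict."""
    return frozenset(a for a in A if a not in f[a])

A = [1, 2, 3]
subsets = power_set(A)
for images in product(subsets, repeat=len(A)):   # every mapping f: A -> P(A)
    f = dict(zip(A, images))
    assert missed_set(A, f) not in f.values()    # B is never hit by f
print(len(subsets) ** len(A), "mappings checked")
```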

            This argument is worthless because Cantor has envisaged a mapping that cannot ever be carried out in full, even in theory : he is treating an ongoing, indefinitely extendable mapping as a completed act.


Sets with oscillating membership


If we regard the proposed function f, not as already existent, but as in the process of being defined, we get a different picture. Suppose we have carried out a bijection from A to P(A) to n places (which is all we can ever hope to do) and we have a non-empty set B satisfying Cantor’s condition, namely that individual members of B have not been paired off with themselves viewed as sets. B does not as yet have a pre-image in A so, noting this, we pick some element in A not yet used, $a_b$ say, and form $(a_b, B)$.

            Now, prior to its being assigned an image under the function f, the element $a_b$ did not have an image ; however, now that it has acquired one we realize that it has automatically become a member of B (which it was not before) and so is disqualified from being the pre-image of B. We thus remove $a_b$ from $(a_b, B)$ and look for another pre-image. The same situation develops and one might justifiably conclude that, since we are perpetually going to have to change B’s pre-image as soon as we assign one, any function of the desired type A to P(A) is going to be of a very provisional nature, and so we might decide, for this reason, to conclude that the cardinality of P(A) must be ‘greater’ than that of A. This is not quite what Cantor says though.

            This oscillating procedure whereby one or more elements change sets perpetually is entirely normal outside mathematics; in fact it is really only in the unreal world of mathematics that sets ever do get constituted definitively once and for all. Individuals are always changing their set membership as their age changes, as their beliefs mature, as the frontiers of countries are redrawn and so forth. Even species evolve and change into radically different ones, so we are told, and nothing stays exactly the same for very long.

            A typical example of ‘oscillating membership’  is provided by Russell’s Village Barber Paradox though Russell did not realize this. Russell invites us to consider a Village Barber who claims he shaves everyone in the village who does not shave himself and only such persons. The big question is : Does he shave himself? If he does shave himself, he shouldn’t be doing so — since, as a barber, he shouldn’t be shaving self-shavers. On the other hand, if he doesn’t shave himself, that is exactly what he ought to be doing.

            The contradiction only arises because Russell, like practically all modern mathematicians, insists on viewing sets as being constituted once and for all in the usual Platonic manner. Let us see what would actually happen in real life.  It is first of all necessary to define what we mean by being a self-shaver : how many days do you have to shave yourself consecutively to qualify ? Ten? Four? One? It doesn’t really matter as long as everyone agrees on a fixed length of time, otherwise the question is completely meaningless. Secondly, it is important to realize that the Barber has not always been the Village Barber : there was a time when he was a boy or perhaps inhabited a different village. On some day d he took up his functions as Village Barber in the village in question. Suppose our man has been shaving himself for the last four days prior to taking on the job, so, if four days is the length of time needed to qualify as a self-shaver, he classes himself on day d as a self-shaver. He does not get a shave that day since he belongs to the Self-shaving set and the Village Barber does not shave such individuals.

            The next day he reviews the situation and decides he is no longer in the Self-shaving category — he didn’t get a shave the previous day — so he shaves himself on Day 2. On Day 3 he carries on shaving himself — since he has not yet got a run of four successive self-shaving days behind him. This goes on until Day 6 when he doesn’t shave himself. The Barber spends his entire adult active life oscillating between the Self-shaving and the Non-self-shaving sets. There is nothing especially strange about this : most people except strict teetotallers and alcoholics oscillate between being members of the Set of Drinkers and Non-drinkers — depending of course on how much and how often you have to drink to be classed as a ‘drinker’.
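
            The Barber’s routine can be followed day by day. This is a minimal Python simulation under the assumptions stated above (a run of four consecutive self-shaving days qualifies one as a self-shaver, and the Barber shaved himself on each of the four days before taking the job); barber_days and its parameter names are hypothetical.

```python
def barber_days(days, qualifying_run=4, prior_self_shaves=4):
    """Return the day numbers (1 = day d) on which the Barber does NOT shave himself."""
    history = [True] * prior_self_shaves      # assumed: he shaved himself on each prior day
    unshaven = []
    for day in range(1, days + 1):
        recent = history[-qualifying_run:]
        is_self_shaver = len(recent) == qualifying_run and all(recent)
        shaves_himself = not is_self_shaver   # the Barber shaves only non-self-shavers
        history.append(shaves_himself)
        if not shaves_himself:
            unshaven.append(day)
    return unshaven

print(barber_days(16))   # [1, 6, 11, 16]
```

            With these figures he goes unshaven on day d, again on Day 6, and every fifth day thereafter, passing back and forth between the two sets without any contradiction arising.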

            This example was originally chosen by Russell to show that the self-referential issue has nothing necessarily to do with infinity. Nor does it, but it does depend on the question of whether sets or collections are time and context dependent.




One understands, of course, why mathematics as the exact science par excellence does not want to be bothered with such messy creatures as sets with varying membership but it is worth stressing how different the abstract systems of mathematics are from conditions in the real world. Perhaps, in the future a kind of mathematics will arise which will be time and context dependent while still remaining more precise than ordinary speech. Mathematics does indeed model time dependent processes, notably via differential equations, but only from the outside : time itself in the sense of change is never allowed to be present within the boundaries of the mathematical system. Mathematics has managed to do something which sounds equally difficult, namely to model randomness (up to a point) and there is an interesting chapter discussing this complex issue in a recent book, How Mathematicians Think, by William Byers (Chapter 7). However, randomness is still, like time, studied from the outside although it is getting steadily closer and closer to the fixed ideal world of mathematics via Heisenberg, chaos theory, Gödel’s Incompleteness and so forth.  Maybe the twin shadows of time and chance will in the end blot out the pure light of eternity after all.    


NOTE:  This article appeared in Issue 223 of  M500, the magazine of the Department of Mathematics of the Open University, UK.