Mathematical Beginnings

 

Early in 2002 I sketched something for which the above might have been a reasonable title, and in trying to bring it to life in 2005 I have had the benefit of a book by Michael Potter, Set Theory and its Philosophy (OUP, 2004). This draws an important distinction in its second chapter between one kind of aggregate, called fusions, and another, called collections, which includes sets in the sense of set theory. This distinction is very attractive, but on a critical reading it is not quite as clear as Potter makes it out to be. Now in trying to give it critical clarification I have an axe to grind. I want to argue that set theory is no proper starting point for mathematics. After all, no-one has ever remained in total ignorance of mathematics until, in intellectual maturity, meeting set theory and thereupon embarking on mathematics with set theory as its foundation. This might seem a merely practical, educational observation, but I want to argue that the very confusion I hope to point out in the fusion / collection distinction can only (admittedly, this too is a psychological ‘only’, and I do not claim any logical necessity for it) be sorted out if we consider the early stages of mathematics as experienced by children and then, as grown-ups, draw philosophical conclusions from them (and then educational conclusions as to how mathematics might be taught).

          During the war we frequently heard from the BBC the depressing statement “a number of our aircraft is missing”. This pedantry upset us because the crews of these aircraft were individual human beings and their machines were individual aircraft. Nevertheless, the pedantry was not an error: it was an allowable, if tactless, grammatical trick whereby objects which, in Potter’s words, “we might otherwise refer to in the plural” are referred to in the singular. Another example, I believe, is the way, after the Civil War, “the United States are” became “the United States is” in official despatches. Or “the Government are unable to decide” could become “the Government has decided”.

          These singular usages are examples of fusion-talk. “A collection, by contrast, does not merely lump several objects together into one: it keeps the things distinct and is a further entity over and above them.” This is where I have to quibble. The implication is that a fusion too lumps several objects into one but fails to keep them distinct. But only grammatically does it lump them into one, and it does keep them distinct.

Then: “The contrast between collections and fusions becomes explicit when we consider the notion of membership. This is fundamental to our conception of a collection as consisting of its members, but it gets no grip at all on the notion of a fusion. The fusion of the cards in a pack is made up of just those cards, but they cannot be said to be its members, since it is also made up out of the four suits. A collection has a determinate number of members . . . ” But membership can grip the notion of a fusion if we use single inverted commas and call it ‘membership’ to show that we have changed the rules (to coincide, perhaps, with early writers on set theory, including apparently Peano in a slip – see Potter, page 23). Or call it quasi-membership if you do not like my distinction between single and double inverted commas. A pack of cards can have 52 card-‘members’ as well as four suit-‘members’. A pre-war squadron of fighter aircraft could have nine aircraft-‘members’ and three flight-‘members’, each of which would have had three aircraft-‘members’. Then, of course, a fusion would not have a determinate number of ‘members’, making fusions useless for set theory.

Or perhaps Potter meant “A collection not only, unlike a fusion, lumps things together, it also, like a fusion, keeps them distinct, but the serious difference is that it is a further entity over and above them”, but this only leads us to the main quibble I want to discuss, namely what he meant by an entity over and above a collection’s members.   

          To test whether any over-and-above entity is inescapably needed for set theory I shall examine an intermediate case, which I shall call a configuration, and illustrate it by thinking of a girl playing with Lego pieces. She has thirty of them spread over a table, but occasionally one or two fall on the floor, out of the fusion, as we have to call it for the moment – but the girl regards the fusion as still the same one, the first sin against the principles of set theory. She plays with what are left, putting various numbers of them together in different shapes and then breaking them up again. At each stage the source of her pleasure is the full fusion of the pieces put together and their leftovers on the table. Suddenly, she finds she has used all the remaining pieces on the table, let us say twenty-five, and that the result is remarkably attractive and looks like a dog. She proposes to keep it and call it Fido. Unfortunately Fido is liable to lose its tail or a leg, or even both, and she still loyally calls what remains Fido, a second sin against set theory.

          Of course, we can intervene and tell her that she must obey our set-theoretical rules and restrict the name “Fido” to the full twenty-five pieces in their proper dog-shape, but whether we do or not we have certainly arrived at a stage that enables us to carry out an entity-over-and-above test, and this is the stage for which I suggest the term “configuration”. Clearly, nothing is there beyond what was there already, except for the name “Fido”, and while this is an entity in a context of typography it is not one for the purposes of anyone, like Potter, wanting to set up a justification for set theory. So perhaps, provided we impose our rules on the girl, we have the analogy we want, and can say “a set is like a configuration; something distinctive has happened to its members that brings our viewpoint into issue, but it does not force on us this over-and-above business”.

          Alas, we can’t. What about a singleton set? We can pick up a single piece of Lego and give it a name and call it a configuration, but that cannot make it differ from what it was before we named it. Still less can we achieve the null set – there is nothing to name, nothing that can configure. If we want set theory, entities-over-and-above are inescapable. What I want to propound is that achieving them will be a great deal less puzzling if we leave set theory aside and consider the very earliest stages of basic mathematics, and problems that arise from teaching them.

          Incidentally, Potter “modifies” his translation of Frege’s declaration that a class cannot be a fusion in a way that disguises Frege’s rather complicated terminology. The full quotation can be found in the 1895 Schröder article, on page 195 of the Olms Kleine Schriften, earlier in which he has used the word “Mannigfaltigkeit”, diversity, and here “Sammlung”, collection, but clearly not in Potter’s sense. It translates:

         

If, in accordance with our previous use of the word [which had been an attempt to accommodate Schröder], a class consists of objects, is a collection, a collective unity of them, it has to disappear if the objects disappear. If we burn all the trees in a wood, we thereby burn the wood. There cannot therefore be [ie, could not be, if we took that line] an empty class.

 

Now we come to the mathematical beginnings of my title. There has been, and may still be, a theory that the true beginning is topology, a distinction between inside and outside, with the emotional overtones of being with our mother or separated from her, but in most people’s common sense view the obvious beginning is counting. We do this with words, which we can call counting words. What is important about them is that they are memorable noises and have a memorable order. It would be proper to call them ordinal words, except that grammarians have adopted this term for “first”, “second”, “third”, etc, which to mathematicians are a side issue, because the words we actually use when we count are “one”, “two”, “three”, “four”, etc (mutatis mutandis for different languages).

          Then what are cardinal number words? They are exactly the same, but used for a different purpose. The difference can be illustrated by reference to the number words of different tribes. The Inuit are said to use for “five” a word meaning “hand”; Borges related a story told him by his grandmother that the Indians of the Pampas used a word meaning “thumb”, which he maintained also meant “infinity” to them. Around 1970 David Attenborough, filming in New Guinea, was trying to find a notoriously shy tribe called the Biame, and, just as he was about to give up, met them unexpectedly; this magical encounter is captured on his film, which must surely be safe in the BBC archives. By sign language he managed to convey that he was interested in the local rivers, and their leader responded by pointing in their direction and, as he named them, making what were clearly counting noises – which he accompanied by tapping first the fingers of one hand in order, then of the other, ending with its thumb, and then tapping points on his body in an order that he found memorable, up to I think fourteen, which was tapped on his shoulder or neck.

          One could say that the Inuit had a cardinal bias of mind, the Pampas Indians and the Biame an ordinal one, though of course both will have used their words in both senses, as appropriate. To reinforce what these senses are, try counting some reasonably small set of objects – a handful of nuts, say. You will be saying “one, two, three, . . . ” until you get to the end, and then, if you are asked how many nuts there are you will repeat the last of your words; but to see that there really is a difference between the ways you used it, re-examine the process in memory and in slow motion. Let us say the word was “eight”. You will have ended “ . . . six, seven, eight”, and pointed to the last three individual nuts one by one. On being asked how many nuts altogether, you would have gestured embracingly, with an open hand perhaps, like someone who has been taught that it is wrong to point, to all eight. Cardinal number words, then, are how-many? words, ordinal number words are counting words (and I shall continue to ignore grammarians’ so-called ordinals).

          Now, it is clear from all this that explaining the ordinal usage requires absolutely no over-and-above reference. In these circumstances and for this purpose we make these and these noises, and that is all there is to it. Nevertheless, when we, cardinally, answer “eight”, we are strongly inclined to feel that we have specified a number of something, and from this it is only a small step to feeling that we have named a number. Quite possibly future generations will find this perverse, and give Ockham’s razor explanations for cardinals along the lines of mine for ordinals. One thing that makes this difficult for the time being is our verbal usage for the identities of simple arithmetic. We verbalise 2 + 3 = 5 as “two plus three equals five” and think of this as a mathematical sentence expressing the identity of what might seem to be different things but are in fact one and the same. “2 + 3”, “5”, “two plus three” and “five” all refer to the same thing, namely the number five. What that actually is is something we do not ask ourselves at this stage.

          My attitude in this matter may be influenced by my upbringing in an extremely old-fashioned elementary school on the edge of the City of London. For example, when we did a long division we not only had to write out the working in full, with the divisor written on the left and separated from the dividend by a right-hand bracket, with the quotient written out digit by digit above the dividend and separated from it by a horizontal line, but we then had to write a separate express numerical sentence stating the result, say

 

                   2057 ÷ 25 = 82 rem 7

 

since the right-hand bracket and horizontal line were only part of the working and expressed neither “divided by” nor “equals”. When I got to my grammar school and found that nobody bothered with this, I thought they were an illiterate lot.

          Similar feelings came to me when I arrived in Oberhausen in 1945 and found that Germans would speak my first identity as “zwei plus drei gleich fünf”. Why couldn’t they use the verb “gleichen” and say “gleicht”? I now realise that they had in mind the full “ist gleich”, but in that case why couldn’t they take the trouble of saying it?

          Now, however, I have a division of conscience. Perhaps the German usage should be encouraged as helping towards an Ockhamist future for the philosophy of cardinals. Another thought I have long had is that my impulse to say that the expression “two plus three” denotes the number five, and that “2 + 3” denotes the same number 5, ought to give way to calling 5 the value of the expression “2 + 3”. After all, we say we give the variable x (or the letter “x”) the value 5 in certain circumstances – it would perhaps be retrograde to explain this by saying that in those circumstances we momentarily allow the letter “x” to denote the number 5.

          The very fact that this should strike us (or me at least) as a dilemma underlines our uneasiness over the claims of mathematical Platonists. For if it were true that numbers exist independently of our knowledge of them there would be no harm in taking numerals to denote them. If our uneasiness is understandable, we are right to look round for usages, like calling “value” in aid, that could wean us of an irrational addiction. And yet, again, there is no doubt that a denoting grammar and treating mathematical propositions as sentences (that say something about something) are extraordinarily convenient.

          Since the days of Ramsey we have been brainwashed into agreeing that whatever arithmetical identities say something about, it is not the external world, and frequently, since then, the term “tautology” has been used for them. Ramsey questioned it (in the first paper in Foundations, Kegan Paul 1931, now reprinted by CUP) because he used the term in Wittgenstein’s narrow, truth-functional, Tractatus sense, and it was not clear to him whether this sense was sufficient for the a priori quality of arithmetical identities, let alone the equations of analysis. Nevertheless he went on using the term and it has stuck, and it is instructive to see three cases where he draws distinctions: the purely arithmetical (2 + 2 = 4); the everyday but a priori (because arithmetic is embedded in it); and, most instructive of all, something everyday but turning out to be significant, ie capable of being wrong, in spite of its embedded arithmetic. On page 12, the significant “I have two pennies in each of my pockets” gives logically “I have four pennies altogether in my pockets” by making use of “2 + 2 = 4” as an intermediate step, but on page 2 the significant “It is two miles to the station and two miles on to the Gogs” gives “It is four miles to the station via the Gogs”, by an inference that he clearly regards as equally logical, not noticing that it can be falsified by “Starting at the station I can get to the Gogs in two miles, and from home I can get to the station in two, but if I try to do both in one go I always get tired and lose my way and it takes five”.

          This lapse of Ramsey’s does not detract from his page 2 point, but actually reinforces it. Whatever the meaning might be, the “2” of arithmetic, the “two” of a faultlessly logical practical inference and the “two” of a fallible but normally reliable inference all mean exactly the same. One can quibble that “two pennies” uses a cardinal 2 and “two miles” a quantity 2, as with “two pints of beer”, but the fact remains that in each individual case the two and the 2 are the same. So if anybody can convincingly Ockhamise both cardinal “two” and quantity “two”, as I have done for ordinal “two”, they will thereby do away with anything for either cardinal “2” or quantity “2” to refer to, denote or name.

Arithmetical propositions that are sentences that say something about something are what we instinctively want to keep – but what, and about what? They cannot say anything significant in the technical sense, for they can only be wrong if they are wrongly written, like “2 + 3 = 6”, and they cannot be about anything that we recognise as a natural entity, or even an abstract one: if we were to say that “2” referred to twoness, that wouldn’t make twoness plus twoness equal fourness.  So let us go back to when we felt that 2 + 3 and 5 were the same thing but did not ask what that thing was.

What justifies “the same” is the way the expression and the numeral are used. This is not to say that they are used in the same way, because clearly they are not: they are used in a way that leads us to regard them as having, one might say, the same upshot. We then find that this identity of upshot encourages us to use phraseology that appears to refer to mathematical entities, in a kind of as if phraseology. This progress of usage has nothing to do with abstraction, whether or not that word deserves Frege’s lampooning of it in his Antwort auf die Ferienplauderei des Herrn Thomae. It is a matter of slipping into an ontological way of speaking. If that is the case we should be honest about it. We should declare that we are conjuring up an ontology of numbers, invented entities taken out of thin air to give us the convenience of a denoting grammar.

It will not be surprising that having done this we find that mathematicians do not always agree on exactly what they have conjured up. The positive whole numbers begin with 1. The natural numbers begin with 0. Are the natural numbers from 1 on the same mathematical entities as the positive whole numbers or do they retain their distinction? It simply does not matter. Mathematicians have to choose what line they take and make it clear to their readers (though I need to say in advance that modern mathematicians of set theory are united in taking intuitively distinct entities as identical with a relentlessness that can make an amateur feel quite giddy).

Before I try to apply my conjuring trick to set theory, I should like to apply it to functions. Pupils are quite quickly going to learn to regard them as sets of ordered pairs, having presumably already met y = f(x) notation in a more or less Ockhamist manner, but I should like to anticipate that by something even more Ockhamist. I call it independent and dependent assignment of values to variables. Suppose that we are introducing trigonometry but wish to jump the elementary stage in which this is done in terms of right-angled triangles. On the blackboard we draw an x axis, a y axis and a unit circle with its centre at their joint origin. We say that we are going to assign values to the variable θ as an assigning point moves anti-clockwise from the x axis around the unit circle, measuring its movement in the same units as it traverses the circle (in effect, in radians). This point simultaneously assigns its x co-ordinate to x and its y co-ordinate to y. No mention has been made of the cosine or the sine functions, only of the independent assignment of values to θ and of the simultaneous dependent assignment of values to x and y. Only then need “cos” and “sin” be introduced as convenient notations, and our new apparatus can be applied to elementary trigonometrical problems concerning right-angled triangles larger or smaller than those in our unit circle.

Another elementary notation that can lead us towards the concept of a function is “squared”. Since 2 squared is 4 and 3 squared is 9, we can say that this new mathematical apparatus assigns the dependent value 4 to y when we assign the independent value 2 to x, and the value 9 to y when we assign 3 to x. The Belgian mathematician Papy made much use of arrows to express the workings of such an apparatus, and we can make use of our rights as mathematical conjurors to say that we are talking about something called the square function, which is an assemblage of the arrows 2 → 4, 3 → 9 etc, while the cosine function consists of arrows θ → x and the sine function of arrows θ → y.
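The arrow picture can be tried out in a few lines of Python. What follows is only a sketch under stated assumptions: an arrow is modelled as a pair (independent value, dependent value), the function names are invented for illustration, and the library calls math.cos and math.sin merely stand in for reading off the co-ordinates of the assigning point on the unit circle.

```python
import math

# An 'arrow' is modelled here as a pair (independent value, dependent value).
# All names below are illustrative assumptions, not notation from the text.

def square_arrows(xs):
    """Assemble the arrows x -> x squared for the given independent values."""
    return {(x, x * x) for x in xs}

def unit_circle_arrows(theta):
    """For one independent value theta (in radians), return the two arrows
    theta -> x and theta -> y given by the assigning point on the unit circle."""
    return (theta, math.cos(theta)), (theta, math.sin(theta))

def dependent_value(arrows, x):
    """Look up the dependent value that an assemblage of arrows assigns to x."""
    matches = [y for (a, y) in arrows if a == x]
    if len(matches) != 1:
        raise ValueError(f"no unique arrow starts at {x!r}")
    return matches[0]

squares = square_arrows(range(1, 6))  # a small finite sample of the arrows
print(dependent_value(squares, 2))    # 4
print(dependent_value(squares, 3))    # 9
```

The assemblage here is necessarily a finite sample; the "etc" of the square function is left to the reader, as it is on the blackboard.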

Naturally, f( ) notation has to be introduced now, but this can still be done in a fairly Ockhamist manner. While still avoiding investigation of the ‘over-and-above’ problem of sets, we can make use of the concept of individuals (see Potter, page 24). Individuals, or atoms, or in German Urelemente, are for our present purposes anything we can meaningfully conjure up mathematically out of thin air, and, not having progressed to sets, we can talk of assemblages of them, which are simply Potter’s fusions. One advantage of having jumped the gun of sets is that one assemblage can be a constituent of another and thereby count as an individual, so that an assemblage of arrows can be an individual capable of being assigned as value to the function-variable f, and in turn be a constituent of an assemblage of functions. When we reach the proper terminology of sets, however, an individual will not be allowed to be a set and have members: it can only be a member of sets. Incidentally, there is nothing in the concept of individuals to prevent our treating real objects as individual constituents of assemblages, nor as individual members of sets when we finally introduce ourselves to those. I can declare my inherited armchair or my two favourite eightieth birthday presents to be individuals, and such gambits can be useful for taking children through the foothills of assemblages and into set theory, indeed are absolutely legitimate there, but they are inelegant in any attempt to found mathematics intuitively, let alone formally. For our purposes at this stage, our only individuals will be whatever we can meaningfully dream up out of thin air for serious mathematical or logical purposes. (Another mathematician I shall be making references to, Tourlakis – Lectures in Logic and Set Theory, CUP, 2003 – is with me here. See page 99 of his second volume: “Naively, or informally, set theory is the study of collections of ‘mathematical objects’ ”. Unfortunately, he departs from me in his formal set theory, which also accepts ‘mathematical objects’ as elements of formal sets. The line I shall be taking when we finally encounter sets is a strict distinction between informal ones whose elements can be intuitive mathematical objects and a formal theory with no set-members outside itself.)

(Potter, on the other hand, is strictly speaking within his rights to say, page 51, that if a set theory is allowed individuals at all, it can legitimately have real objects such as birthday presents as its individuals, but his reason for saying this is dubious, as can be seen from his suggestions: chairs, electrons, thoughts, angels. He wants to let these in so that they can be counted, and thinks that the only natural way for a set theory to achieve this is to embrace such entities as individuals. Now the only way a set theory can embrace anything, as individuals or as constructed sets, is for it to be unambiguous, for any item of the universe, whether it is a member of a specified set or not. This certainly does not seem to be the case when the items are thoughts. Besides, to be useful for counting, a formal set theory only needs to provide a model of counting-arithmetic, and the less it embraces dubious objects the better, even for counting dubious objects. There is actually quite a range of dubiety between the unambiguously countable, through things we can count optimistically, like waves, to things we cannot meaningfully attempt to count at all, like sparks in a smithy, as I suggest in my Wittgenstein book.)    

          Many years ago, wanting to introduce children to sets somewhat prematurely, and thinking they were ready for the concept of sets as entities over and above their members, I made use of coloured number sticks, called Cuisenaire rods, which were then familiar objects in primary schools. They have now become too expensive for primary schools, but I have noticed that children who have met them in early years have an affection for them that enables them to meet them later without feeling above them. So I am here describing a use that could be profitable for children of fourteen or so who remember the simple arithmetical code of placing rods end to end to express addition and crossing them over one another to express multiplication. Now, the new code will be to use tandem placing to express union and crossing to express intersection – of assemblages rather than sets for the time being.

          These must be chosen for appropriate shared or unshared characteristics of the children in the class. The case that first provokes our philosophical interest will be a pairing of characteristics for which the intersection is just one child, in this case perhaps a blue rod crossed over a yellow one. Clearly that cross is not the child, but it can be said in some sense to represent the child. Later, it will be said to represent the unit set or singleton set whose only member is the child, but could any meaning be given to its representing the unit assemblage whose only ‘member’ was the child? No doubt we could stretch language to saying that, but it would not make that unit assemblage anything other than the child itself. Remember what we said about a single piece of Lego.

          Now assume that blue represents the assemblage of the boys and yellow of the girls. No-one will be represented by their cross. Nor can it be said to represent an assemblage, since there is nothing for this to be an assemblage of. This can be the stumbling block where we suggest to our pupils that a new language is needed, namely of sets.

          For preliminary exercises we shall continue to have sets of children, but as soon as the legitimacy of over-and-above talk is established with the help of the rods and the children we must insist that this has all been a preliminary to what is the real thing for them, mathematical sets whose basic members are mathematical entities ‘conjured out of thin air’. I need to point out here that since we have already conjured numbers out of thin air and then arrows and then functions, we can already claim legitimacy enough for our new, convenient idea of sets as conjured-up ‘over-and-above’ entities. While we were dealing with assemblages we never actually needed our rods or their pairings – they were merely a convenience of representation, and it was clear to us that they represented nothing more than the children themselves in their various groupings. Yet they were visibly ‘over and above’, and so now they provide us with a very helpful analogy: if we can regard crossed rods as representing a child on its own when an assemblage has been so defined as to have only that child as ‘member’, where there is no distinction between the unit assemblage and the child, we can now deem the cross to be an ‘over and above’ that is distinct, thus enabling us to have our cake and eat it. That is to say, we can refer to the cross as an ‘over and above’; and we can also regard ourselves as thereby somehow referring to the child. This is just what, psychologically, we want to do with sets: treat them as ‘over-and-above’ entities and also use them to give us in effect the convenience of our old singular-for-plural parlance.

I must emphasise that it is only in effect that we have retained our old parlance, and achieved cake-and-eating with it. A set with, say, seven members is not identical to the assemblage or fusion of those members. When it is in turn made a member of a further set, say with two members, the other being a set with seven other members, that further set will have only two members, namely those two sets, not fourteen. Understanding this is important from the beginning, but absolutely vital for the next stage, when we construct our model for mathematics in which there are no longer any given individuals, such as numbers and functions, but only what we can build up from one invented entity, the null class. This model is the formal version of Cantor’s paradise, and the question is, are we still entitled to enjoy it?
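The point about two members rather than fourteen can be checked mechanically. The following is a minimal sketch in Python, assuming that frozenset is an acceptable stand-in for a set with members and that the member names a1 … b7 are mere placeholders; the union of the two sets plays the part of the assemblage or fusion.

```python
# Two sets of seven members each; the member names are hypothetical placeholders.
first = frozenset(f"a{i}" for i in range(1, 8))
second = frozenset(f"b{i}" for i in range(1, 8))

# The further set has the two sets themselves as its members.
further = frozenset({first, second})

# Lumping all the members together instead mimics the assemblage (fusion).
lumped = first | second

print(len(further))     # 2
print(len(lumped))      # 14
print(first in further) # True: the set of seven is a member
print("a1" in further)  # False: its own members are not
```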

Among modern mathematicians it is only when infinitely membered sets are reached that the more radically minded declare that we have outstayed our welcome in paradise and should remove ourselves from it. For the time being, therefore, I shall give an outline of this astonishing idea without anticipating what problems will await us when we reach that threshold and try to pass beyond it.

(Technically, our invented foundation entity, the null class, is not what we have been calling an individual: it is, precisely, an expression of the fact that we have no individuals at our disposal to be members of anything and have to invent a set with no members before we can begin. To use an image suggested to me by Professor Smiley, the ground floor of our model’s structure is empty, and only on the first floor above it can we put our empty set to embody that fact. Less vividly, Tourlakis, on pages 100 and 116, settles this by using “atom” as a variant for “individual” or “Urelement”, and defining it as not a collection, and thus not an empty collection either. On his page 59 Potter has mysterious references to the possibility that both Zermelo and Goedel regarded the empty set as improper, or at least invented, and perhaps thus as an invented individual, before tying his own flag to its not being an individual at all. Quite apart from whether it is an individual, what is puzzling to me is why there should be so much resistance to its being invented, when it so patently is.)

In this paper’s intuitive introduction, although we countenanced invented cardinals, we were radically Ockhamist in refusing to allow more than ordinal number noises, and these had begun with the noise “one”. In setting up set theory as a model of arithmetic, mathematicians are of various schools, some beginning with ordinals, some with cardinals, some with the natural numbers. Personally, I favour ordinals, and to me their natural beginning in set theory would be the set whose sole member is the null set. Admittedly, this involves the inelegance of introducing a null set for which we have no use but providing us with its singleton set as our eventual starting point. There would be, however, a much more serious disadvantage. To do ordinal arithmetic we shall be lamed if we have no identity element for addition: that is, we need to be able to say that for any ordinal n, n + 0 = n (adding zero makes no difference). There is therefore no avoiding starting our formal ordinals with the null set itself, not the set whose only member it is.

The Biame, of course, had no ordinal arithmetic, but Attenborough could easily have invented one for them. To express 7 + 4 = 11, they could have tapped the five fingers of one hand and the first two of the second, and then, uttering their words for “one” . . . “four”, tapped their last three fingers and then their elbow, and then, with their finger still on their elbow, triumphantly uttered their word for “eleven”. We ourselves can play such tricks to satisfy ourselves that ordinal addition is as meaningful as the more natural cardinal addition, but of course we shall have to introduce formal rules so that it can be taken beyond any practical limit. The most important of these is to define n + 1 as the successor of any given ordinal n, and then n + m as the mth successor of n, in effect what, for small numbers, Attenborough might have taught the Biame (who would never have seen any need for an ordinal zero).
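The tapping trick can be imitated in Python. This is only a sketch under assumptions: English counting words stand in for the Biame words, the list stops at fourteen as in the film as I remember it, and the function names are invented. There is deliberately no zero.

```python
# English stand-ins for a finite list of counting words (an assumption).
COUNTING_WORDS = [
    "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
]

def successor(word):
    """Return the counting word that follows the given one."""
    position = COUNTING_WORDS.index(word)
    if position + 1 >= len(COUNTING_WORDS):
        raise ValueError(f"no counting word after {word!r}")
    return COUNTING_WORDS[position + 1]

def add(n, m):
    """Take the m-th successor of n by reciting 'one' ... m,
    moving one step on from n for each word recited."""
    reached = n
    for spoken in COUNTING_WORDS:
        reached = successor(reached)
        if spoken == m:
            return reached
    raise ValueError(f"{m!r} is not a counting word")

print(add("seven", "four"))  # eleven
```

Nothing in the sketch treats the words as names of numbers; it only relies on their memorable order.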

Formal rules can follow two strategies: one is to accommodate any real or intuitive entities that we might wish to count, or to build into a formal theory; the other, my aim here, is to set up ‘pure’ sets, existing only in our theory-without-individuals and simply requiring us to make an opening once-for-all ontological declaration that we are calling into being whatever ‘pure’ entities we need for our theory, whose function is to mimic anything real mathematicians can do in ‘real’ mathematics. Zermelo’s ‘natural numbers’ (see Potter, page 293), namely

 

{ }, {{ }}, {{{ }}}, {{{{ }}}}, {{{{{ }}}}}, . . .

   

would do very well as formal ordinals (remember that according to a rule already given each of these sets except the first has just one member). Von Neumann’s model-set of natural numbers (see the same page),

 

          { }; {{ }}; {{ }, {{ }}}; {{ }, {{ }}, {{ }, {{ }}}}; . . .

 

where the null set is followed by the set whose sole member is the null set, followed by the set whose two members are the two previous sets, followed by the set whose three members are the three previous sets and so on, is intuitively a proper model for the cardinals, in that each set has as many members as its intuitive equivalent specifies. Modern proponents of set theory, however, as I have warned, universally prefer to use identical models for ordinals and cardinals, adopting von Neumann’s for both, which means that only when transfinity is reached does any difference between the two arise (when it becomes fascinating). Perhaps the song and dance I made about the intuitive difference between ordinals and cardinals in the very beginnings of learning numbers explains why I do not take to this assimilation. I am intrigued, however, to find that Tourlakis is quite happy to use the discarded Zermelo set as indices (page 210 of his second volume), and see also page 293 of Potter.
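The difference between the two models can be checked mechanically. The following is a minimal sketch in Python, assuming that pure sets may be modelled by nested frozenset values; the names EMPTY, zermelo and von_neumann are illustrative only and come from none of the books cited. It counts the members of the first few sets in each model.

```python
# Pure sets modelled as nested frozensets; the null set is frozenset().
EMPTY = frozenset()

def zermelo(n):
    """Zermelo numeral: 0 is { }, and n + 1 is the singleton {n}."""
    s = EMPTY
    for _ in range(n):
        s = frozenset([s])
    return s

def von_neumann(n):
    """Von Neumann numeral: 0 is { }, and n + 1 is the set of all earlier numerals."""
    s = EMPTY
    for _ in range(n):
        s = s | frozenset([s])
    return s

# Columns: n, members of the Zermelo set, members of the von Neumann set.
for n in range(5):
    print(n, len(zermelo(n)), len(von_neumann(n)))
```

Each Zermelo set after the first reports one member, while the von Neumann set standing for n reports n members, which is what suits the latter to serve as a cardinal.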

          In adopting a set theory that allows for real as well as invented individuals, Potter feels he has to introduce complications that will show why I prefer to have only invented individuals in intuitive set theory and none at all in a formal one. These complications are set out in his chapters 9 and 11, but I shall limit myself to pointing out what I find his prime inelegance in the first of those, on cardinals, on page 155.

          He adopts Cantor’s definition of sets’ being equinumerous, namely their being in one to one correspondence with each other. Then he summons up appropriate entities for expressing such a state of affairs (at the bottom of the page). If (and of course only if) two sets A and B are equinumerous there will be something called their cardinality, which they will have in common: card (A) = card (B), which on the next page he calls “innocent enough”, and then reifies by a definition that derives from Russell but avoids his paradox (see pages 156, 157), “the set of all sets such that . . .”, so that the summoning up can become a theorem. Cantor confines himself to expounding, with more words and less formality, his basic declaration (on the first page of his Beiträge, page 282 of the Olms edition). Tourlakis, in contrast to both, defines cardinals respectably on page 458.     

Now this calling of entities into being can happen in three ways. The first is what we all do without noticing it, what I have called slipping into an ontological way of speaking, as in talking about numbers (the extreme Ockhamist story about ordinals with which I began does after all take some effort). The second is what Cantor and Potter do, invoking them so that we have to keep our eyes open for the sleight of hand. The third is what ought to be done: setting up a purely formal set theory with no ready-made atoms, prefaced by an ontological clean breast, to the effect that we shall assume the existence of whatever entities satisfy the axioms of our theory (or theories, since we can adopt different axioms for different purposes). In doing this we have to ensure that there is always a formal object available to be a surrogate for any entity that other mathematicians might be tempted to ‘call up’. For example, for a set A, card (A) will be a corresponding formal von Neumann natural number deemed to be the appropriate cardinal. Of course, while doing this we are entitled to step aside for teaching purposes so as to explain to our pupils what has been going on and illustrate it intuitively. (Quine, in Set Theory and its Logic, 1963, makes a clean breast of needing no clean breast as far as his virtual classes are concerned, treating them “as a mere manner of speaking” – page 16 in his §2. As to his real classes, page 1 in his introduction seems to say that this is simply down to our common understanding of what a class is, but I hope the above shows that this is by no means straightforward, while page 28 of his §4 makes the existence of classes depend upon the concept of values of variables – but then that concept needs to be declared as primitive, and not explained away by the value of x being what “x” temporarily denotes.)

(An important detail in adopting formal ordinals to do duty as cardinals: they will be able to do so because they are in one-one correlation with themselves as identical sets. As a cardinal, the ordinal w, a set whose members are the totality of finite ordinals, will eventually be called Aleph zero, but this will not detract from their being one identical set-theoretical object in this way of doing things. And in general, in case we meet further transfinite cardinals, they will always be represented by the first ordinal of their given cardinality.)

I hope these notes will inspire a new generation of mathematicians to return to set theory without atoms, individuals or Urelemente and present it in a manner that is comprehensible to beginners. Meanwhile, I shall take up a point made above, that when transfinity is reached the distinction between ordinals and cardinals becomes fascinating.                                

In doing this I must begin by ‘stepping aside for teaching purposes’, in particular to explain the concept of an order type. For this it is worth looking at Cantor himself, in the Olms reprint, and the informative passages, both for order type and the fascinating transfinite, are §§ 11-13 of III 4 Nr5 (see §§ 5-10 for historical introduction – indeed see all of III 4 Nr5 for intriguing metaphysical asides) and §§ 1-20 of III 9, Cantor’s final contribution to set theory, his Beiträge.

          My intuitive explanation of order types as a schoolmaster was to consider the series 1, ½, ¼, etc as occurring during intervals of a minute, a half minute, a quarter minute etc, enabling the whole infinite series to be completed, in thought experiment, in two minutes. Introducing the symbol “w” to denote the set of finite formal ordinals (von Neumann, above), we can draw the first fascinating distinction of many, between 1 + w and w + 1. For consider: an extra 1 at the front will merely push the original 1 into the half-minute place and make no ultimate difference to the order of our thought experiment, whereas a 1 at the end will make us wait until the experiment is completed, pause, and make a fresh start, justifying our calling this a difference of order type. (Gratifyingly, Potter uses my schoolmaster’s trick to elucidate what he calls a constructivist attitude to set-formation, and dignifies it by calling it a supertask – see his pages 37 and 177.)
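The contrast between 1 + w and w + 1 can be imitated in a few lines. This is only a sketch under a stated assumption: it handles just the ordinals below w times w, each written as w·a + b for ordinary whole numbers a and b. The names Ord, add, ONE and W are hypothetical and are taken from none of the authors cited.

```python
from typing import NamedTuple

class Ord(NamedTuple):
    """The ordinal w*a + b, for natural numbers a and b (so every value is below w*w)."""
    a: int
    b: int

def add(x: Ord, y: Ord) -> Ord:
    """Ordinal addition: lay out x first, then y after it."""
    if y.a > 0:
        # y contains a whole copy of w, which absorbs the finite tail of x.
        return Ord(x.a + y.a, y.b)
    # y is finite, so it simply lengthens the finite tail of x.
    return Ord(x.a, x.b + y.b)

ONE = Ord(0, 1)
W = Ord(1, 0)

print(add(ONE, W) == W)   # True: 1 + w is w again
print(add(W, ONE) == W)   # False: w + 1 is a different ordinal
```

The asymmetry lies wholly in the first branch of add: a finite number placed in front of w is swallowed, as in the thought experiment, while one placed after it survives as a fresh start.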

          What, then, is an order type in a set theory that has no place for intuitive entities that other mathematicians are happy to call into being? We first need the concept of an isomorphism. This is a translation that preserves a structure that we are interested in, in our case order. A simple informal example is given by Tourlakis on page 316 of his second volume, giving an explanation of the word’s etymology as a bonus, where the set {1,2,3} is given its numerical order by the relation (set of ordered pairs) {(1,2), (2,3), (1,3)} and {a,b,c} its alphabetical order by {(a,b), (b,c), (a,c)}. These respective relations, translated one into the other, preserve the orders of the members of the respective sets. Since neither intuitive numbers nor letters of the alphabet are elements of the type of formal set theory that I am recommending, the question arises whether such an isomorphism can be set up in a set theory that is both formal and excludes ‘foreign’ individuals. Clearly, this can be done in comparing say the Zermelo natural numbers (for they are to hand for us to use even if we have been ignoring them) with the von Neumann ones. Whether, in addition, the concept of isomorphism can be embodied as an entity of pure set theory is something I must leave to mathematicians.
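The informal example just quoted can be tested directly. The sketch below assumes that a translation is given as a Python dict and each order as a set of ordered pairs; the function name is_order_isomorphism is illustrative and is not Tourlakis’s notation.

```python
def is_order_isomorphism(f, set_a, rel_a, set_b, rel_b):
    """True if the dict f is a one-one map from set_a onto set_b carrying rel_a exactly onto rel_b."""
    if set(f.keys()) != set(set_a):
        return False
    images = set(f.values())
    if images != set(set_b) or len(images) != len(f):
        return False
    translated = {(f[x], f[y]) for (x, y) in rel_a}
    return translated == set(rel_b)

numbers = {1, 2, 3}
numerical = {(1, 2), (2, 3), (1, 3)}
letters = {"a", "b", "c"}
alphabetical = {("a", "b"), ("b", "c"), ("a", "c")}

# The obvious translation preserves order; the reversed one does not.
print(is_order_isomorphism({1: "a", 2: "b", 3: "c"}, numbers, numerical, letters, alphabetical))  # True
print(is_order_isomorphism({1: "c", 2: "b", 3: "a"}, numbers, numerical, letters, alphabetical))  # False
```

The same check would serve unchanged for Zermelo and von Neumann numerals modelled as frozensets, since these too can be keys and values of a dict.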

          Naturally, embracing a pedagogic method that includes Cuisenaire rods and stories about the Biame, I have plenty of room for different levels of set theory, and I am quite happy, informally, to accept one that allows sets of any kind of intuitive entity that mathematicians find useful. Since letters of the alphabet play such an important role as symbols with which to denote variables it would be pedantic to try to exclude them. Indeed, although I am doing propaganda for pure set theory, I am not wishing to denigrate mathematicians who prefer their axiomatised set theory to accept intuitive entities as it meets them, provided they do so without pretending to provide everything that could ever count as mathematics in one embracing formality once and for all, and provided too that the aim of achieving only a simulacrum of mathematics is not treated as old fashioned eccentricity.

          To return, however, to my muttons and pursue the fascination of the transfinite. To start I will give the references that I have found most useful. They are: Quine, Set Theory and its Logic, Harvard, 1963, Chapters VII and IX, in particular §§ 22 and 30; Tourlakis, Lectures in Logic and Set Theory, Volume 2, Cambridge UP, 2003, Chapter VI, §§ 4, 5 and 10; Potter, Set Theory and Its Philosophy, Oxford UP, 2004, Chapters 10 and 11; and Cantor, Beiträge, §§ 5-20. I have also been told that RL Wilder, Introduction to the Foundations of Mathematics, Wiley, second edition, 1965, is an excellent account of the very type of set theory I want to promote, but it is extraordinarily difficult to find a copy short of reading it at the British Library. In particular, the Science Museum Library has every Wilder book except that one.

          I start with an elementary benefit of beginning the von Neumann version of ordinals with the null set instead of the set whose sole member that is, ie with zero instead of one. This gives the useful fact that the nth ordinal is a set whose members are 0 . . . n-1, while the set whose members are 0 . . . n has n+1 members. From this we get the elegance of being able to define the successor of n as the union of n as a set (namely {0, 1, … n-1}) and the set whose sole member is n (namely {{0, 1, … n-1}}), giving a set with the n+1 members 0, 1, … n-1, {0, 1, … n-1}, the last being the nth ordinal itself, so that the union is {0, 1, … n} as required.
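The successor rule, and the addition built on it earlier, can be tried out on finite cases. What follows is a minimal sketch, again assuming frozensets as stand-ins for pure sets; successor, numeral and add are hypothetical names, and the second argument of add is an ordinary Python integer used only to count steps.

```python
EMPTY = frozenset()          # the ordinal 0

def successor(n):
    """The successor of n is the union of n with the singleton {n}."""
    return n | frozenset([n])

def numeral(k):
    """The von Neumann ordinal reached by k successor steps from 0."""
    n = EMPTY
    for _ in range(k):
        n = successor(n)
    return n

def add(n, m):
    """n + m as the m-th successor of n."""
    for _ in range(m):
        n = successor(n)
    return n

seven = numeral(7)
print(seven == frozenset(numeral(i) for i in range(7)))  # True: the members of 7 are 0 .. 6
print(add(seven, 4) == numeral(11))                      # True: 7 + 4 = 11
print(add(seven, 0) == seven)                            # True: adding zero makes no difference
print(len(successor(seven)))                             # 8: the members 0 .. 7
```

Starting from the null set is what makes the third line possible: zero successor steps leave n untouched, which is the identity element asked for at the outset.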

          The ordinal w is the set of all such ordinals but it clearly cannot be specified as the successor of any of them, since the von Neumann ordinals as we have so far understood them (what he called the natural numbers) have no last member for w immediately to succeed (and still will not have a last member when we understand them better). For this reason w is called a limit ordinal – it cannot be reached by successively ‘adding one’ in the sense above. Yet one can be added to it in that sense – in other words it can be given a successor. This is the union of w (the set of all finite von Neumann ordinals) with the set whose member is w, and this union set we can term w + 1.

          Repeating this we can obtain w + 1 + 1, and so on indefinitely until we fail to reach w + w, a second limit ordinal. For a brief indication of how this process goes on see Quine pages 151, 152 and 210, 211; Potter page 204, and Tourlakis page 426. On all of these a mysterious so-called epsilon number, epsilon zero, appears, whose importance is that it is a ‘fixed point’, in that it is the value of a function for itself as argument, ie it satisfies f (x) = x, something one would not have thought possible in transfinite arithmetic until one observes that w (to the w to the w to the w . . . ) is identical to w to the w to the w . . . . Cantor’s treatment of epsilon numbers constitutes § 20 of his Beiträge and is thus his last published contribution to set theory.

          Now this new series of ordinal numbers has, after w, its first limit ordinal, no last one, and so it will need something ‘stronger’ than a limit ordinal, if not to terminate it, which is impossible, to complete it in some sense. This requirement was first expressed by Cantor in § 11 of Nr 5 of his 4th Abhandlung, preceding the Beiträge: he asks for a ‘zweite Erzeugungsprinzip’ over and above his first, which had enabled him to posit a totality of the unterminating whole numbers and (for the first time) call it w. Now this set, as we now call it, is termed denumerable, indeed is the paradigm denumerable set, since a set is denumerable if its members can be put in one-one correspondence with the members of w.

          What is astounding to the unprepared amateur is that if we take the series of transfinite ordinals, beginning with w and including all the successor ordinals and limit ordinals indicated above, and terminate it arbitrarily, even well beyond epsilon zero, we can, with sufficient ingenuity, put the result in one-one correlation with the members of w, making it denumerable. Cantor’s aim was to establish that the unterminated totality of that series would be non-denumerable, and require a higher cardinal number than Aleph zero. After all, this seems very reasonable by analogy: if we terminate the finite whole numbers arbitrarily, however far from zero, they form a finite cardinality, but if we form a set of their totality, their cardinality is transfinite. We ought to be able to hope that forming a set of the series we have been discussing, which Cantor calls the second number-class, will give us a higher cardinality (and clearly, if it does it will have to be the next higher). Cantor suggests (page 196 of the Olms edition) that it will be a limit, but we already have modest limit ordinals, so a new term is needed for a super limit ordinal that will, by beginning the third number class, wind up the second.
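A small instance of such a correlation can be written down. The sketch assumes that the series is terminated at w times w, so that every ordinal in it has the form w·a + b, and it uses the Cantor pairing function, a standard device not drawn from the texts cited; the names to_natural and from_natural are illustrative.

```python
from math import isqrt

def to_natural(a, b):
    """Cantor pairing: the ordinal w*a + b is sent to a single natural number."""
    return (a + b) * (a + b + 1) // 2 + b

def from_natural(k):
    """Inverse of to_natural: recover the pair (a, b) from k."""
    s = (isqrt(8 * k + 1) - 1) // 2      # s = a + b, the index of the diagonal holding k
    b = k - s * (s + 1) // 2
    return (s - b, b)

# The first six naturals, read back as pairs (a, b) standing for w*a + b.
print([from_natural(k) for k in range(6)])
# [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]
```

The correlation pays no regard to order: w itself, the pair (1, 0), is reached at the second step, before 1. That is exactly why it shows something about cardinality and nothing about order type.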

          It is termed an initial ordinal, defined by Tourlakis on page 459, with a proof (VII.4.3) that establishes that every initial ordinal is a cardinal. The one we want, to wind up the second number class, can only be the very next one, and so is justifiably termed w1. An anxiety arises here: for we can by another route find an indubitably higher cardinality than Aleph zero (ie than the limit ordinal w) but we cannot guarantee that it is also the next highest, and thereby identify it with w1.

          This indubitably higher cardinality is famous as the cardinality of the continuum, the set of real numbers between zero and one. Cantor first proved their nondenumerability in his first contribution to set theory of 1874, page 115 of the Olms edition, distinguishing the denumerability of the algebraic real numbers and the nondenumerability of the full real numbers, but it was in a later paper, of 1890-91, his last contribution to set theory before the Beiträge, pages 278-281 in Olms, that he gave the proof that is now famous as his diagonal procedure. This is now expressed very simply by setting out the real numbers between zero and one as an infinite array of infinite decimals. Each infinite decimal is clearly a denumerable expansion occupying a row, and there are assumed to be denumerably many rows. Now if one were to change one of the digits of an expansion one would have changed its value, but one could reasonably assume that this new value would appear in some other row. But if one were systematically (granted the time, or giving it to oneself by a ‘super’ thought experiment) to change, first, the first digit of the first row and then, second, the second digit of the second row, and so on, one would have achieved an expansion different from any previously there. If, however, one were arbitrarily to insert it among the rows, hoping to retain denumerability, the possibility of generating yet a further new real number would remain, and would continue to remain however often one did the same.
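The diagonal procedure can be imitated on a finite corner of the array. This is a sketch only: it assumes five rows of five digits each, where the real argument needs denumerably many of both, and the name diagonal_escape and the sample digits are invented for illustration.

```python
def diagonal_escape(rows):
    """Given rows of decimal digits, build a row differing from the i-th row in its i-th digit."""
    # Choosing only between 5 and 6 avoids the digits 0 and 9, so the new expansion cannot be
    # an alternative spelling of a listed number (as 0.4999... is of 0.5000...).
    return [6 if row[i] == 5 else 5 for i, row in enumerate(rows)]

listed = [
    [1, 4, 1, 5, 9],
    [7, 1, 8, 2, 8],
    [5, 5, 5, 5, 5],
    [0, 0, 0, 5, 0],
    [9, 9, 9, 9, 9],
]
new_row = diagonal_escape(listed)
print(new_row)  # [5, 5, 6, 6, 5]
```

Inserting new_row among the rows and running the function again yields a further row absent from the enlarged list, which is the finite shadow of the point that the possibility of a new real number always remains.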

          This cardinality of the continuum can be shown (see pages 453-4 and 455 of Tourlakis) also to hold for the set of all functions from w to w, and for the so-called power set of all subsets of w. The latter is termed P(w); the cardinal of the continuum, by taking the diagonal process for binary expansions, can be conveniently termed 2 to Aleph zero. The question is, whether the same holds for the set that Cantor called his second number class, enabling us to identify its cardinality with that of the continuum and to call that too w1 (or Aleph one).
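The name 2 to Aleph zero has a finite analogue that can be verified by counting. The sketch assumes a set of four members, 0 to 3, in place of w, and matches each subset with a binary expansion of length four; subset_from_bits is a hypothetical name.

```python
from itertools import product

def subset_from_bits(bits):
    """Read a binary expansion as a subset: i is a member exactly when digit i is 1."""
    return frozenset(i for i, bit in enumerate(bits) if bit == 1)

n = 4
expansions = list(product((0, 1), repeat=n))
subsets = {subset_from_bits(bits) for bits in expansions}

# Distinct expansions give distinct subsets, so the power set has 2 to the n members.
print(len(expansions) == len(subsets) == 2 ** n)  # True
```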

          Cantor failed to establish that this was the case. An editorial footnote to his second contribution (of 1878) on page 133 of Olms identifies the sentence on page 132 where he first expresses his hope that it is. At least (see pages 199-201 of Olms) he established to his own satisfaction that no cardinality intervened between the cardinalities of the first and second number classes, but this still leaves the cardinality of the second ‘in the air’ between having a cardinality of its own and sharing that of the continuum. The work of Gödel (1940) and Cohen (1963 and 1964) has now established that it will always be in the air: the former demonstrating (see Tourlakis, page 215) that the continuum hypothesis is consistent with the Axiom of Choice and the other axioms of set theory, the latter demonstrating that it is independent of them.    

          This in turn arouses a further anxiety, though it may well be private to myself and only come from my lack of mathematical understanding. On page 229 Tourlakis admits that adopting the Axiom of Choice or dropping it is as free a matter of choice as with Euclid’s parallel axiom, yet on that very page he declares its plausibility, and on page 395 begins an argument emphasising its plausibility. I find this quite convincing, but I also find convincing an opposite intuitive consideration of my own. For I can only hope that anyone who compares my account of Cantor’s second number class with my description of his diagonal procedure as applied to real numbers will agree that the latter indicates a higher cardinality than the former, and not an identical one. Fearing this comparison to have more in common with a trance experience than with intuition, I was encouraged to find that Cohen (1966, and quoted at the top of Potter, page 274) might charitably be said to agree with me.

          An equally open question is whether there can be cardinal numbers outside the series of Alephs. One speculation has been that the continuum’s cardinality is not Aleph-one but Aleph-two (Gödel, briefly, see Potter page 273), that it is some higher Aleph, that its cardinality somehow squeezes in between Alephs (without the Axiom of Choice there might be cardinalities that are incomparable, see Hartogs, page 266 of Potter), or that its cardinality is beyond any Aleph, which appears to be the implication of Cohen on Potter’s page 274.

          The Aleph sequence is defined inductively by Tourlakis on page 465, requiring that Aleph zero is w, that the next highest cardinal to Aleph alpha is to be called Aleph alpha + 1, with a proviso if alpha is a limit ordinal and so cannot be reached; while the existence of a next highest cardinal depends on VII.4.16 (page 462), that for any set its power set has a higher cardinality (even if not the next higher), depending in turn on the Zermelo well ordering principle (VI.5.50, page 355 and depending on the Axiom of Choice). That there might be cardinals beyond Alephs, or Alephs beyond those with any ordinal subscript, is a supposition that requires the concepts of the cofinality of ordinals, their regularity or singularity, and the weak or strong inaccessibility of cardinals (pages 478-484). I cannot pretend to explain any of these complications.

At least, a neat titbit is that Tourlakis, on his page 483, treats “Aleph” as a function sign and its index as its argument place. This enables him to ask, what is Aleph’s first fixed point, ie, for what alpha, Aleph alpha equals alpha. It is “quite huge”, constituting a cardinal fixed point corresponding to but enormously beyond the ordinal epsilon zero. We thus have an intriguing triple comparison: this strongly inaccessible cardinal apparently relates to smaller infinite cardinals in no more mysterious a manner than w did (above) to finite cardinals, and, in between, than initial ordinals did to limit ordinals. This tantalising coincidence of apparent simplicity with the limits of my mathematical ability tempts me to think there could be an analogy with my previous paper, in which I defended the meaning of beliefs which are embedded in grander beliefs that I find meaningless. 

For these mathematical concepts and their problems surely raise questions of what is the case in this realm of mathematics, or as I put it above of sentences that say something about something. Potter, on his page 233, quotes Boolos (2000) on this: “ . . . and we are no longer listening to a description of anything that is the case?” On the same page he quotes Wittgenstein. This is another of his misleading quotations in that, coming from two different pages of Wittgenstein’s mathematics lectures (Cora Diamond, 1976, pages 32 and 142, going back to page 140), it does not draw out how limited Wittgenstein’s viewpoint is. This is sad, because it concludes with a wonderful analogy. If someone (misguidedly, of course) says that a child has learnt Aleph zero multiplications, Wittgenstein says it hasn’t learnt anything huge, meaning nothing more by Aleph zero than an approach to the infinite. But what if he himself could have learnt more, say the sequence of Alephs and the question of infinite but non-Aleph cardinalities, by time travelling to Cohen and beyond (indeed, just by staying alive for a little): what substance would that have had? No more, I fear, than the infinite radius of curvature of a ruler in a schoolboy’s satchel.