Early in 2002 I sketched something for which the above might have been
a reasonable title, and in trying to bring it to life in 2005 I have had the
benefit of a book by Michael Potter, Set Theory and its Philosophy (OUP,
2004). This draws an important distinction in its second chapter between one
kind of aggregates called fusions and another, called collections, which
include sets in the sense of set theory. This distinction is very attractive,
but on a critical reading it is not quite as clear as Potter makes it out to
be. Now in trying to give it critical clarification I have an axe to grind. I
want to argue that set theory is no proper starting point for mathematics.
After all, no-one has ever remained in total ignorance of mathematics until, in
intellectual maturity, meeting set theory and thereupon embarking on
mathematics with set theory as its foundation. This might seem a merely
practical, educational observation, but I want to argue that the very confusion
I hope to point out in the fusion / collection distinction can only
(admittedly, this too is a psychological ‘only’, and I do not claim any logical
necessity for it) be sorted out if we consider the early stages of mathematics
as experienced by children and then, as grown-ups, draw philosophical
conclusions from them (and then educational conclusions as to how mathematics
might be taught).
During the war we
frequently heard from the BBC the depressing statement “a number of our
aircraft is missing”. This pedantry upset us because the crews of these
aircraft were individual human beings and their machines were individual
aircraft. Nevertheless, the pedantry was not an error: it was an allowable, if
tactless, grammatical trick whereby objects which, in Potter’s words, “we might
otherwise refer to in the plural” are referred to in the singular. Another
example, I believe, is the way, after the Civil War, “the United States are”
became “the United States is” in official despatches. Or “the Government are
unable to decide” could become “the Government has decided”.
These singular usages
are examples of fusion-talk. “A collection, by contrast, does not merely lump
several objects together into one: it keeps the things distinct and is a
further entity over and above them.” This is where I have to quibble. The
implication is that a fusion too lumps several objects into one but fails to
keep them distinct. But only grammatically does it lump them into one, and it
does keep them distinct.
Then: “The contrast between collections and
fusions becomes explicit when we consider the notion of membership. This is
fundamental to our conception of a collection as consisting of its members, but
it gets no grip at all on the notion of a fusion. The fusion of the cards in a
pack is made up of just those cards, but they cannot be said to be its members,
since it is also made up out of the four suits. A collection has a determinate
number of members . . . ” But membership can grip the notion of a fusion if we
use single inverted commas and call it ‘membership’ to show that we have
changed the rules (to coincide, perhaps, with early writers on set theory,
including apparently Peano in a slip – see Potter, page 23). Or call it
quasi-membership if you do not like my distinction between single and double inverted
commas. A pack of cards can have 52 card-‘members’ as well as four
suit-‘members’. A pre-war squadron of fighter aircraft could have nine
aircraft-‘members’ and three flight-‘members’, each of which would have had
three aircraft-‘members’. Then, of course, a fusion would not have a
determinate number of ‘members’, making fusions useless for set theory.
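The point about ‘membership’ can be put in a small Python sketch. It is
only an illustration under assumptions made for the purpose: a fusion is
modelled as the plain union of its material, a collection as a frozenset of
parts, and the names cards, by_card, by_suit and fuse are invented here and
appear nowhere in Potter.

```python
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]

# The 52 cards, each a (rank, suit) pair.
cards = frozenset(product(RANKS, SUITS))

# Two ways of carving up the same pack: card by card, or suit by suit.
by_card = frozenset(frozenset({card}) for card in cards)
by_suit = frozenset(
    frozenset(card for card in cards if card[1] == suit) for suit in SUITS
)


def fuse(parts):
    """Model the fusion of some parts as the plain union of their material."""
    return frozenset().union(*parts)


# As fusions the two carvings are one and the same pack ...
same_fusion = fuse(by_card) == fuse(by_suit)
# ... but as collections they differ, and each has a determinate size.
same_collection = by_card == by_suit
counts = (len(by_card), len(by_suit))
```

On this modelling the fusion has 52 card-‘members’ or four suit-‘members’
according to how it is carved, whereas each collection has exactly one
answer to the question how many members it has.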
Or perhaps Potter meant “A collection not
only, like a fusion, lumps things together, it also, like a fusion,
keeps them distinct, but the serious difference is that it is a further entity
over and above them”, but this only leads us to the main quibble I want to
discuss, namely what he meant by an entity over and above a collection’s
members.
To test whether any
over-and-above entity is inescapably needed for set theory I shall examine an
intermediate case, which I shall call a configuration, and illustrate it by
thinking of a girl playing with Lego pieces. She has thirty of them spread over
a table, but occasionally one or two fall on the floor, out of the fusion, as
we have to call it for the moment – but the girl regards the fusion as still
the same one, the first sin against the principles of set theory. She plays
with what are left, putting various numbers of them together in different shapes
and then breaking them up again. At each stage the source of her pleasure is
the full fusion of the pieces put together and their leftovers on the table.
Suddenly, she finds she has used all the remaining pieces on the table, let us
say twenty-five, and that the result is remarkably attractive and looks like a
dog. She proposes to keep it and call it Fido. Unfortunately Fido is liable to
lose its tail or a leg, or even both, and she still loyally calls what remains
Fido, a second sin against set theory.
Of course, we can
intervene and tell her that she must obey our set-theoretical rules and
restrict the name “Fido” to the full twenty-five pieces in their proper
dog-shape, but whether we do or not we have certainly arrived at a stage that
enables us to carry out an entity-over-and-above test, and this is the stage
for which I suggest the term “configuration”. Clearly, nothing is there beyond
what was there already, except for the name “Fido”, and while this is an entity
in a context of typography it is not one for the purposes of anyone, like
Potter, wanting to set up a justification for set theory. So perhaps, provided
we impose our rules on the girl, we have the analogy we want, and can say “a
set is like a configuration; something distinctive has happened to its members
that brings our viewpoint into issue, but it does not force on us this
over-and-above business”.
Alas, we can’t. What
about a singleton set? We can pick up a single piece of Lego and give it a name
and call it a configuration, but that cannot make it differ from what it was
before we named it. Still less can we achieve the null set – there is nothing
to name, nothing that can configure. If we want set theory,
entities-over-and-above are inescapable. What I want to propound is that achieving
them will be a great deal less puzzling if we leave set theory aside and
consider the very earliest stages of basic mathematics, and problems that arise
from teaching them.
Incidentally, Potter
“modifies” his translation of Frege’s declaration that a class cannot be a
fusion in a way that disguises Frege’s rather complicated terminology. The full
quotation can be found in the 1895 Schröder article, on page 195 of the Olms Kleine
Schriften. Earlier in that article Frege had used the word “Mannigfaltigkeit”,
diversity; here he uses “Sammlung”, collection, but clearly not in Potter’s sense.
It translates:
If, in accordance with our previous use of the word [which had been an attempt to accommodate Schröder], a class consists of objects, is a collection, a collective unity of them, it has to disappear if the objects disappear. If we burn all the trees in a wood, we thereby burn the wood. There cannot therefore be [ie, could not be, if we took that line] an empty class.
Now we come to the mathematical beginnings of
my title. There has been, and may still be, a theory that the true beginning is
topology, a distinction between inside and outside, with the emotional
overtones of being with our mother or separated from her, but in most people’s
common sense view the obvious beginning is counting. We do this with words,
which we can call counting words. What is important about them is that they are
memorable noises and have a memorable order. It would be proper to call them
ordinal words, except that grammarians have adopted this term for “first”,
“second”, “third”, etc, which to mathematicians are a side issue, because the
words we actually use when we count are “one”, “two”, “three”, “four”, etc
(mutatis mutandis for different languages).
Then what are cardinal
number words? They are exactly the same, but used for a different purpose. The
difference can be illustrated by reference to the number words of different
tribes. The Inuit are said to use for “five” a word meaning “hand”; Borges
related a story told him by his grandmother that the Indians of the Pampas used
a word meaning “thumb”, which he maintained also meant “infinity” to them.
Around 1970 David Attenborough, filming in New Guinea, was trying to find a
notoriously shy tribe called the Biame, and, just as he was about to give up,
met them unexpectedly. This magical encounter is captured on his film, which
must surely be safe in the BBC archives. By sign language he managed to convey
that he was interested in the local rivers, and their leader responded by
pointing in their direction and, as he named them, making what were clearly
counting noises – which he accompanied by tapping first the fingers of one hand
in order, then of the other, ending with its thumb, and then tapping points on
his body in an order that he found memorable, up to I think fourteen, which was
tapped on his shoulder or neck.
One could say that the
Inuit had a cardinal bias of mind, the Pampas Indians and the Biame an ordinal
one, though of course both will have used their words in both senses, as appropriate.
To reinforce what these senses are, try counting some reasonably small set of
objects – a handful of nuts, say. You will be saying “one, two, three, . . . ”
until you get to the end, and then, if you are asked how many nuts there are
you will repeat the last of your words; but to see that there really is a
difference between the ways you used it, re-examine the process in memory and
in slow motion. Let us say the word was “eight”. You will have ended “ . . .
six, seven, eight”, and pointed to the last three individual nuts one by one.
On being asked how many nuts altogether, you would have gestured
embracingly, with an open hand perhaps, like someone who has been taught that
it is wrong to point, to all eight. Cardinal number words, then, are how-many?
words, ordinal number words are counting words (and I shall continue to ignore
grammarians’ so-called ordinals).
Now, it is clear from
all this that explaining the ordinal usage requires absolutely no
over-and-above reference. In these circumstances and for this purpose we make
these and these noises, and that is all there is to it. Nevertheless, when we,
cardinally, answer “eight”, we are strongly inclined to feel that we have specified
a number of something, and from this it is only a small step to feeling
that we have named a number. Quite possibly future generations will find
this perverse, and give Ockham’s razor explanations for cardinals along the
lines of mine for ordinals. One thing that makes this difficult for the time
being is our verbal usage for the identities of simple arithmetic. We verbalise
2 + 3 = 5 as “two plus three equals five” and think of this as a
mathematical sentence expressing the identity of what might seem to be
different things but are in fact one and the same. “2 + 3”, “5”, “two plus
three” and “five” all refer to the same thing, namely the number five. What
that actually is is something we do not ask ourselves at this stage.
My attitude in this
matter may be influenced by my upbringing in an extremely old-fashioned elementary
school on the edge of the City of London. For example, when we did a long
division we not only had to write out the working in full, with the divisor
written on the left and separated from the dividend by a right-hand bracket,
with the quotient written out digit by digit above the dividend and separated
from it by a horizontal line, but we then had to write a separate express
numerical sentence stating the result, say
2057 ÷ 25 = 82 rem 7
since the right-hand bracket and horizontal line were only part of the
working and expressed neither “divided by” nor “equals”. When I got to my
grammar school and found that nobody bothered with this, I thought they were an
illiterate lot.
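The separate numerical sentence can be produced mechanically, as in the
following minimal Python sketch. The function name division_sentence is a
label invented here, and the multiplying-back check is an addition for the
sake of the illustration rather than part of the classroom routine
described above.

```python
def division_sentence(dividend: int, divisor: int) -> str:
    """Return the result of a division as a written numerical sentence."""
    quotient, remainder = divmod(dividend, divisor)
    # The working is only sound if it can be undone by multiplying back.
    assert divisor * quotient + remainder == dividend
    return f"{dividend} ÷ {divisor} = {quotient} rem {remainder}"


sentence = division_sentence(2057, 25)
print(sentence)
```

The working (here hidden inside divmod) and the sentence that states its
result are kept apart, which is the distinction the old routine enforced.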
Similar feelings came to
me when I arrived in Oberhausen in 1945 and found that Germans would speak my
first identity as “zwei plus drei gleich fünf”. Why couldn’t they use
the verb “gleichen” and say “gleicht”? I now realise that they had in mind the
full “ist gleich”, but in that case why couldn’t they take the trouble of saying
it?
Now, however, I have a
division of conscience. Perhaps the German usage should be encouraged as
helping towards an Ockhamist future for the philosophy of cardinals. Another
thought I have long had is that my impulse to say that the expression “two plus
three” denotes the number five, and that “2 + 3” denotes the same number
5, ought to give way to calling 5 the value of the expression “2 + 3”.
After all, we say we give the variable x (or the letter “x”) the
value 5 in certain circumstances – it would perhaps be retrograde to explain
this by saying that in those circumstances we momentarily allow the letter “x”
to denote the number 5.
The very fact that this
should strike us (or me at least) as a dilemma underlines our uneasiness over
the claims of mathematical Platonists. For if it were true that numbers exist
independently of our knowledge of them there would be no harm in taking
numerals to denote them. If our uneasiness is understandable, we are right to
look round for usages, like calling “value” in aid, that could wean us off an
irrational addiction. And yet, again, there is no doubt that a denoting grammar
and treating mathematical propositions as sentences (that say something about
something) are extraordinarily convenient.
Since the days of Ramsey
we have been brainwashed into agreeing that whatever arithmetical identities
say something about, it is not the external world, and frequently, since then,
the term “tautology” has been used for them. Ramsey questioned it (in the first
paper in Foundations, Kegan Paul 1931, now reprinted by CUP) because he
used the term in Wittgenstein’s narrow, truth-functional, Tractatus sense, and
it was not clear to him whether this sense was sufficient for the a priori
quality of arithmetical identities, let alone the equations of analysis.
Nevertheless he went on using the term and it has stuck, and it is instructive
to see three cases where he draws distinctions: the purely arithmetical (2 + 2
= 4); the everyday but a priori (because arithmetic is embedded in it);
and, most instructive of all, something everyday but turning out to be
significant, ie capable of being wrong, in spite of its embedded arithmetic. On
page 12, the significant “I have two pennies in each of my pockets” gives
logically “I have four pennies altogether in my pockets” by making use of “2 +
2 = 4” as an intermediate step, but on page 2 the significant “It is two miles
to the station and two miles on to the Gogs” gives “It is four miles to the
Gogs via the station”, by an inference that he clearly regards as equally
logical, not noticing that it can be falsified by “Starting at the station I
can get to the Gogs in two miles, and from home I can get to the station in
two, but if I try to do both in one go I always get tired and lose my way and
it takes five”.
This lapse of Ramsey’s
does not detract from his page 2 point, but actually reinforces it. Whatever
the meaning might be, the “2” of arithmetic, the “two” of a faultlessly logical
practical inference and the “two” of a fallible but normally reliable inference
all mean exactly the same. One can quibble that “two pennies” uses a cardinal 2
and “two miles” a quantity 2, as with “two pints of beer”, but the fact remains
that in each individual case the two and the 2 are the same. So if anybody can
convincingly Ockhamise both cardinal “two” and quantity “two”, as I have done
for ordinal “two”, they will thereby do away with anything for either cardinal
“2” or quantity “2” to refer to, denote or name.
Arithmetical propositions that are sentences
that say something about something are what we instinctively want to
keep – but what, and about what? They cannot say anything significant in the
technical sense, for they can only be wrong if they are wrongly written, like
“2 + 3 = 6”, and they cannot be about anything that we recognise as a natural
entity, or even an abstract one: if we were to say that “2” referred to
twoness, that wouldn’t make twoness plus twoness equal fourness. So let us go back to when we felt that 2 + 3
and 5 were the same thing but did not ask what that thing was.
What justifies “the same” is the way the
expression and the numeral are used. This is not to say that they are used in
the same way, because clearly they are not: they are used in a way that leads
us to regard them as having, one might say, the same upshot. We then find that
this identity of upshot encourages us to use phraseology that appears to refer
to mathematical entities, a kind of ‘as if’ phraseology. This progress
of usage has nothing to do with abstraction, whether or not that word deserves
Frege’s lampooning of it in his Antwort auf die Ferienplauderei des Herrn
Thomae. It is a matter of slipping into an ontological way of speaking. If
that is the case we should be honest about it. We should declare that we are
conjuring up an ontology of numbers, invented entities taken out of thin air to
give us the convenience of a denoting grammar.
It will not be surprising that having done
this we find that mathematicians do not always agree on exactly what they have
conjured up. The positive whole numbers begin with 1. The natural numbers begin
with 0. Are the natural numbers from 1 on the same mathematical entities as the
positive whole numbers or do they retain their distinction? It simply does not
matter. Mathematicians have to choose what line they take and make it clear to
their readers (though I need to say in advance that modern mathematicians of
set theory are united in taking intuitively distinct entities as identical with
a relentlessness that can make an amateur feel quite giddy).
Before I try to apply my conjuring trick to
set theory, I should like to apply it to functions. Pupils are quite quickly
going to learn to regard them as sets of ordered pairs, having presumably
already met y = f(x) notation in a more or less Ockhamist manner, but I
should like to anticipate that by something even more Ockhamist. I call it
independent and dependent assignment of values to variables. Suppose that we
are introducing trigonometry but wish to jump the elementary stage in which
this is done in terms of right-angled triangles. On the blackboard we draw an x
axis, a y axis and a unit circle with its centre at their joint
origin. We say that we are going to assign values to the variable θ as an assigning point moves
anti-clockwise from the x axis around the unit circle, measuring its
movement along the circle in the same units as the axes (in effect, in radians).
This point simultaneously assigns its x co-ordinate to x and its y
co-ordinate to y. No mention has been made of the cosine or the sine
functions, only of the independent assignment of values to θ and of the simultaneous
dependent assignment of values to x and y. Only then need “cos”
and “sin” be introduced as convenient notations, and our new apparatus can be
applied to elementary trigonometrical problems concerning right-angled
triangles larger or smaller than those in our unit circle.
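The blackboard procedure can be imitated in a short Python sketch, offered
as an illustration only. The function walk, its step count and the name
theta for the independent variable are assumptions made here. The point is
moved round the circle in many tiny steps without any appeal to cosine or
sine, and only afterwards are the dependent values compared with the
library's cos and sin.

```python
import math


def walk(theta, steps=100_000):
    """Move the assigning point anti-clockwise from (1, 0) through an arc of
    length theta and return the dependent values (x, y) it assigns."""
    x, y = 1.0, 0.0
    h = theta / steps  # length of each tiny step
    for _ in range(steps):
        # Step at right angles to the radius, then pull back onto the circle.
        x, y = x - h * y, y + h * x
        r = math.hypot(x, y)
        x, y = x / r, y / r
    return x, y


x, y = walk(1.0)
# Only now are the convenient notations brought in, as a check.
agrees = (
    math.isclose(x, math.cos(1.0), abs_tol=1e-6)
    and math.isclose(y, math.sin(1.0), abs_tol=1e-6)
)
```

The stepping is an approximation, so the comparison is made to within a
small tolerance rather than exactly.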
Another elementary notation that can lead us
towards the concept of a function is “squared”. Since 2 squared is 4 and 3
squared is 9, we can say that this new mathematical apparatus assigns the
dependent value 4 to y when we assign the independent value 2 to x,
and the value 9 to y when we assign 3 to x. The Belgian
mathematician Papy made much use of arrows to express the workings of such an apparatus,
and we can make use of our rights as mathematical conjurors to say that we are
talking about something called the square function, which is an assemblage of
the arrows 2→4, 3→9 etc, while the
cosine function consists of arrows θ→x and the sine function
of arrows θ→y.
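As a minimal sketch, and no more than that, the square function regarded
as an assemblage of arrows might be written in Python as follows. The names
square_arrows and dependent_value are invented for the illustration, and
only a handful of arrows is listed, where the function itself has no such
limit.

```python
# An assemblage of arrows, each written (independent value, dependent value).
square_arrows = {(x, x * x) for x in range(1, 6)}


def dependent_value(arrows, x):
    """Follow the single arrow that leaves x and return where it points."""
    targets = [y for (source, y) in arrows if source == x]
    if len(targets) != 1:
        raise ValueError(f"expected exactly one arrow from {x}, found {len(targets)}")
    return targets[0]


four = dependent_value(square_arrows, 2)
nine = dependent_value(square_arrows, 3)
```

Nothing here is more than a list of arrows and a rule for following them;
the talk of "the square function" is a convenience laid over it.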
Naturally, f( ) notation has to be introduced
now, but this can still be done in a fairly Ockhamist manner. While still
avoiding investigation of the ‘over-and-above’ problem of sets, we can make use
of the concept of individuals (see Potter, page 24). Individuals, or atoms, or
in German Urelemente, are for our present purposes anything we can meaningfully
conjure up mathematically out of thin air, and, not having progressed to sets,
we can talk of assemblages of them, which are simply Potter’s fusions. One
advantage of having jumped the gun of sets is that one assemblage can be a
constituent of another and thereby count as an individual, so that an
assemblage of arrows can be an individual capable of being assigned as value to
the function-variable f, and in turn be a constituent of an assemblage of
functions. When we reach the proper terminology of sets, however, an individual
will not be allowed to be a set and have members: it can only be a member of
sets. Incidentally, there is nothing in the concept of individuals to prevent
our treating real objects as individual constituents of assemblages, nor as
individual members of sets when we finally introduce ourselves to those. I can
declare my inherited armchair or my two favourite eightieth birthday presents
to be individuals, and such gambits can be useful for taking children through
the foothills of assemblages and into set theory, indeed are absolutely
legitimate there, but they are inelegant in any attempt to found mathematics intuitively,
let alone formally. For our purposes at this stage, our only individuals will
be whatever we can meaningfully dream up out of thin air for serious
mathematical or logical purposes. (Another mathematician I shall be making
references to, Tourlakis – Lectures in Logic and Set Theory, CUP, 2003 –
is with me here. See page 99 of his second volume: “Naively, or informally, set
theory is the study of collections of ‘mathematical objects’ ”. Unfortunately,
he departs from me in his formal set theory, which also accepts ‘mathematical
objects’ as elements of formal sets. The line I shall be taking when we finally
encounter sets is a strict distinction between informal ones whose elements can
be intuitive mathematical objects and a formal theory with no set-members
outside itself.)
(Potter, on the other hand, is strictly
speaking within his rights to say, page 51, that if a set theory is allowed
individuals at all, it can legitimately have real objects such as birthday
presents as its individuals, but his reason for saying this is dubious, as can
be seen from his suggestions: chairs, electrons, thoughts, angels. He wants to
let these in so that they can be counted, and thinks that the only natural way
for a set theory to achieve this is to embrace such entities as individuals.
Now the only way a set theory can embrace anything, as individuals or as
constructed sets, is for it to be unambiguous, for any item of the universe,
whether it is a member of a specified set or not. This certainly does not seem
to be the case when the items are thoughts. Besides, to be useful for counting,
a formal set theory only needs to provide a model of
counting-arithmetic, and the less it embraces dubious objects the better, even
for counting dubious objects. There is actually quite a range of dubiety
between the unambiguously countable, through things we can count
optimistically, like waves, to things we cannot meaningfully attempt to count
at all, like sparks in a smithy, as I suggest in my Wittgenstein book.)
Many years ago, wanting
to introduce children to sets somewhat prematurely, and thinking they were
ready for the concept of sets as entities over and above their members, I made
use of coloured number sticks, called Cuisenaire rods, which were then familiar
objects in primary schools. They have now become too expensive for primary
schools, but I have noticed that children who have met them in early years have
an affection for them that enables them to meet them later without feeling
above them. So I am here describing a use that could be profitable for children
of fourteen or so who remember the simple arithmetical code of placing rods
end to end to express addition and crossing them over one another to express
multiplication. Now, the new code will be to use tandem placing to express
union and crossing to express intersection – of assemblages rather than sets
for the time being.
These must be chosen for
appropriate shared or unshared characteristics of the children in the class.
The case that first provokes our philosophical interest will be a pairing of
characteristics for which the intersection is just one child, in this case
perhaps a blue rod crossed over a yellow one. Clearly that cross is not the
child, but it can be said in some sense to represent the child. Later, it will
be said to represent the unit set or singleton set whose only member is the
child, but could any meaning be given to its representing the unit assemblage
whose only ‘member’ was the child? No doubt we could stretch language to saying
that, but it would not make that unit assemblage anything other than the child
itself. Remember what we said about a single piece of Lego.
Now assume that blue
represents the assemblage of the boys and yellow of the girls. No-one will be
represented by their cross. Nor can it be said to represent an assemblage,
since there is nothing for this to be an assemblage of. This can be the
stumbling block where we suggest to our pupils that a new language is needed,
namely of sets.
For preliminary
exercises we shall continue to have sets of children, but as soon as the
legitimacy of over-and-above talk is established with the help of the rods and
the children we must insist that this has all been a preliminary to what is the
real thing for them, mathematical sets whose basic members are mathematical
entities ‘conjured out of thin air’. I need to point out here that since we
have already conjured numbers out of thin air and then arrows and then
functions, we can already claim legitimacy enough for our new, convenient idea
of sets as conjured-up ‘over-and-above’ entities. While we were dealing with
assemblages we never actually needed our rods or their pairings – they were
merely a convenience of representation, and it was clear to us that they
represented nothing more than the children themselves in their various
groupings. Yet they were visibly ‘over and above’, and so now they
provide us with a very helpful analogy: if we can regard crossed rods as
representing a child on its own when an assemblage has been so defined as to have
only that child as ‘member’, where there is no distinction between the unit
assemblage and the child, we can now deem the cross to be an ‘over and above’
that is distinct, thus enabling us to have our cake and eat it. That is
to say, we can refer to the cross as an ‘over and above’; and we can also
regard ourselves as thereby somehow referring to the child. This is just what,
psychologically, we want to do with sets: treat them as ‘over-and-above’
entities and also use them to give us in effect the convenience of our old
singular-for-plural parlance.
I must emphasise that it is only in effect
that we have retained our old parlance, and achieved cake-and-eating with it. A
set with, say, seven members is not identical to the assemblage or fusion of
those members. When it is in turn made a member of a further set, say with two
members, the other being a set with seven other members, that further set will
have only two members, namely those two sets, not fourteen. Understanding this
is important from the beginning, but absolutely vital for the next stage, when
we construct our model for mathematics in which there are no longer any given
individuals, such as numbers and functions, but only what we can build up from
one invented entity, the null class. This model is the formal version of
Cantor’s paradise, and the question is, are we still entitled to enjoy it?
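The difference between a set of sets and the fusion of their members can
be checked in a few lines of Python. This is a sketch under the assumption
that frozensets stand in for sets and a plain union for the fusion; the
names first, second, pair and fusion are invented here.

```python
first = frozenset(range(1, 8))     # a set with seven members
second = frozenset(range(8, 15))   # a set with seven other members

# The further set has the two sets as its members, not their members.
pair = frozenset({first, second})

# The fusion, by contrast, simply pools the fourteen items.
fusion = first | second

pair_size = len(pair)
fusion_size = len(fusion)
three_is_member_of_pair = 3 in pair
first_is_member_of_pair = first in pair
```

The number 3 is a member of first but not of pair, which is the sense in
which the old singular-for-plural parlance is retained only in effect.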
Among modern mathematicians it is only when
infinitely membered sets are reached that the more radically minded declare
that we have outstayed our welcome in paradise and should remove ourselves from
it. For the time being, therefore, I shall give an outline of this astonishing
idea without anticipating what problems will await us when we reach that
threshold and try to pass beyond it.
(Technically, our invented foundation entity,
the null class, is not what we have been calling an individual: it is,
precisely, an expression of the fact that we have no individuals at our
disposal to be members of anything and have to invent a set with no members
before we can begin. To use an image suggested to me by Professor Smiley, the
ground floor of our model’s structure is empty, and only on the first floor
above it can we put our empty set to embody that fact. Less vividly, Tourlakis,
on pages 100 and 116, settles this by using “atom” as a variant for
“individual” or “Urelement”, and defining it as not a collection, and
thus not an empty collection either. On his page 59 Potter has mysterious
references to the possibility that both Zermelo and Gödel regarded the empty
set as improper, or at least invented, and perhaps thus as an invented
individual, before tying his own flag to its not being an individual at all.
Quite apart from whether it is an individual, what is puzzling to me is why
there should be so much resistance to its being invented, when it so patently
is.)
In this paper’s intuitive introduction,
although we countenanced invented cardinals, we were radically Ockhamist in
refusing to allow more than ordinal number noises, and these had begun with the
noise “one”. In setting up set theory as a model of arithmetic, mathematicians
are of various schools, some beginning with ordinals, some with cardinals, some
with the natural numbers. Personally, I favour ordinals, and to me their
natural beginning in set theory would be the set whose sole member is the null
set. Admittedly, this involves the inelegance of introducing a null set for
which we have no use except to provide us with its singleton set as our eventual
starting point. There would be, however, a much more serious disadvantage. To
do ordinal arithmetic we shall be lamed if we have no identity element for
addition: that is, we need to be able to say that for any ordinal n, n
+ 0 = n (adding zero makes no difference). There is therefore no
avoiding starting our formal ordinals with the null set itself, not the set
whose only member it is.
The Biame, of course, had no ordinal
arithmetic, but Attenborough could easily have invented one for them. To
express 7 + 4 = 11, they could have
tapped the five fingers of one hand and the first two of the second, and then,
uttering their words for “one” . . . “four”, tapped their last three fingers and
then their elbow, and then, with their finger still on their elbow,
triumphantly uttered their word for “eleven”. We ourselves can play such tricks
to satisfy ourselves that ordinal addition is as meaningful as the more natural
cardinal addition, but of course we shall have to introduce formal rules so
that it can be taken beyond any practical limit. The most important of these is
to define n + 1 as the successor of any given ordinal n, and then n
+ m as the mth successor of n, in effect what, for small
numbers, Attenborough might have taught the Biame (who would never have seen
any need for an ordinal zero).
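The rule just stated can be sketched in Python for finite cases. The
sketch assumes that ordinary non-negative integers may stand in for the
finite ordinals, and the function names successor and add are labels chosen
here; it says nothing about what happens beyond the finite.

```python
def successor(n: int) -> int:
    """The next counting step after n."""
    return n + 1


def add(n: int, m: int) -> int:
    """n + m defined as the m-th successor of n."""
    result = n
    for _ in range(m):
        result = successor(result)
    return result


eleven = add(7, 4)   # seven taps, then four more
unchanged = add(7, 0)  # adding zero takes no further step
```

With m equal to zero the loop is never entered, so zero behaves as the
identity element for addition that the formal ordinals were said to need.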
Formal rules can follow two strategies: one
is to accommodate any real or intuitive entities that we might wish to count
or to build into a formal theory; the other, my aim here, is to set up ‘pure’ sets,
existing only in our theory-without-individuals and simply requiring us to make
an opening once-for-all ontological declaration that we are calling into being
whatever ‘pure’ entities we need for our theory, whose function is to mimic
anything real mathematicians can do in ‘real’ mathematics. Zermelo’s ‘natural
numbers’ (see Potter, page 293), namely
{ }, {{ }}, {{{ }}}, {{{{ }}}}, {{{{{ }}}}},
. . .
would do very well as formal ordinals (remember that according to a
rule already given each of these sets except the first has just one member).
Von Neumann’s model-set of natural numbers (see the same page),
{ }; {{ }}; {{ }, {{ }}}; {{ }, {{ }}, {{ }, {{ }}}}; . . .
where the null set is followed by the set whose sole member is the null
set, followed by the set whose two members are the two previous sets, followed
by the set whose three members are the three previous sets and so on, is
intuitively a proper model for the cardinals, in that each set has as many
members as its intuitive equivalent specifies. Modern proponents of set theory,
however, as I have warned, universally prefer to use identical models for
ordinals and cardinals, adopting von Neumann’s for both, which means that only
when transfinity is reached does any difference between the two arise (when it
becomes fascinating). Perhaps the song and dance I made about the intuitive
difference between ordinals and cardinals in the very beginnings of learning
numbers explains why I do not take to this assimilation. I am intrigued,
however, to find that Tourlakis is quite happy to use the discarded Zermelo set
as indices (page 210 of his second volume), and see also page 293 of Potter.
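The contrast between the two models can be made concrete with a small sketch. It assumes Python's frozenset as a stand-in for a 'pure' set, and the function names zermelo and von_neumann are my own labels for illustration, not anything taken from Potter or Tourlakis. It covers only the finite numerals.

```python
# Finite Zermelo and von Neumann numerals modelled as nested frozensets.
# The names below are illustrative assumptions, not from the cited books.

EMPTY = frozenset()  # the null set { }


def zermelo(n):
    """n-th Zermelo numeral: the null set wrapped in n pairs of braces."""
    s = EMPTY
    for _ in range(n):
        s = frozenset([s])  # the singleton of the previous numeral
    return s


def von_neumann(n):
    """n-th von Neumann numeral: the set of all earlier numerals."""
    s = EMPTY
    for _ in range(n):
        s = s | frozenset([s])  # the previous numeral plus itself as a member
    return s


# Print n, then the number of members of each model of n.
for n in range(5):
    print(n, len(zermelo(n)), len(von_neumann(n)))
```

Every Zermelo numeral after the first reports one member, whereas the nth von Neumann numeral reports n members, which is the sense in which the latter is intuitively a model for the cardinals. The two models coincide for zero and one and part company at two.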
In adopting a set theory
that allows for real as well as invented individuals, Potter feels he has to
introduce complications that will show why I prefer to have only invented
individuals in intuitive set theory and none at all in a formal one. These
complications are set out in his chapters 9 and 11, but I shall limit myself to
pointing out what I find his prime inelegance in the first of those, on
cardinals, on page 155.
He adopts Cantor’s
definition of sets’ being equinumerous, namely their being in one to one
correspondence with each other. Then he summons up appropriate entities for
expressing such a state of affairs (at the bottom of the page). If (and of
course only if) two sets A and B are equinumerous there will be something
called their cardinality, which they will have in common: card (A) = card (B),
which on the next page he calls “innocent enough”, and then reifies by a
definition that derives from Russell but avoids his paradox (see pages 156,
157), “the set of all sets such that . . .”, so that the summoning up can
become a theorem. Cantor confines himself to expounding, with more words and
less formality, his basic declaration (on the first page of his Beiträge,
page 282 of the Olms edition). Tourlakis, in contrast to both, defines
cardinals respectably on page 458.
Now this calling of entities into being can
happen in three ways. The first is what we all do without noticing it, what I
have called slipping into an ontological way of speaking, as in talking about
numbers (the extreme Ockhamist story about ordinals with which I began does
after all take some effort). The second is what Cantor and Potter do, invoking
them so that we have to keep our eyes open for the sleight of hand. The third
is what ought to be done: setting up a purely formal set theory with no
ready-made atoms, prefaced by an ontological clean breast, to the effect that
we shall assume the existence of whatever entities satisfy the axioms of our
theory (or theories, since we can adopt different axioms for different
purposes). In doing this we have to ensure that there is always a formal
object available to be a surrogate for any entity that other mathematicians
might be tempted to ‘call up’. For example, for a set A, card (A) will be a
corresponding formal von Neumann natural number deemed to be the appropriate
cardinal. Of course, while doing this we are entitled to step aside for
teaching purposes so as to explain to our pupils what has been going on and
illustrate it intuitively. (Quine, in Set Theory and its Logic, 1963,
makes a clean breast of needing no clean breast as far as his virtual classes
are concerned, treating them “as a mere manner of speaking” – page 16 in his
§2. As to his real classes, page 1 in his introduction seems to say that this
is simply down to our common understanding of what a class is, but I hope the
above shows that this is by no means straightforward, while page 28 of his §4
makes the existence of classes depend upon the concept of values of variables –
but then that concept needs to be declared as primitive, and not
explained away by the value of x being what “x” temporarily
denotes.)
(An important detail in adopting formal
ordinals to do duty as cardinals: they will be able to do so because they are
in one-one correlation with themselves as identical sets. As a cardinal, the
ordinal ω, a set whose members are the totality of
finite ordinals, will eventually be called Aleph zero, but this will not
detract from their being one identical set-theoretical object in this way of
doing things. And in general, in case we meet further transfinite cardinals,
they will always be represented by the first ordinal of their given
cardinality.)
I hope these notes will inspire a new
generation of mathematicians to return to set theory without atoms, individuals
or Urelemente and present it in a manner that is comprehensible to beginners.
Meanwhile, I shall take up a point made above, that when transfinity is reached
the distinction between ordinals and cardinals becomes fascinating.
In doing this I must begin by ‘stepping aside
for teaching purposes’, in particular to explain the concept of an order type.
For this it is worth looking at Cantor himself, in the Olms reprint, and the
informative passages, both for order type and the fascinating transfinite, are
§§ 11-13 of III 4 Nr5 (see §§ 5-10 for historical introduction – indeed see all
of III 4 Nr5 for intriguing metaphysical asides) and §§ 1-20 of III 9, Cantor’s
final contribution to set theory, his Beiträge.
My intuitive explanation of order types as a schoolmaster was to consider the series 1, ½, ¼, etc as occurring during intervals of a minute, a half minute, a quarter minute etc, enabling the whole infinite series to be completed, in thought experiment, in two minutes. Introducing the symbol “ω” to denote the set of finite formal ordinals (von Neumann, above), we can draw the first fascinating distinction of many, between 1 + ω and ω + 1. For consider: an extra 1 at the front will merely push the original 1 into the half-minute place and make no ultimate difference to the order of our thought experiment, whereas a 1 at the end will make us wait until the experiment is completed, pause, and make a fresh start, justifying our calling this a difference of order type. (Gratifyingly, Potter uses my schoolmaster’s trick to elucidate what he calls a constructivist attitude to set-formation, and dignifies it by calling it a supertask – see his pages 37 and 177.)
What, then, is an
order type in a set theory that has no place for intuitive entities that other
mathematicians are happy to call into being? We first need the concept of an
isomorphism. This is a translation that preserves a structure that we are
interested in, in our case order. A simple informal example is given by
Tourlakis on page 316 of his second volume, giving an explanation of the word’s
etymology as a bonus, where the set {1,2,3} is given its numerical order by the
relation (set of ordered pairs) {(1,2), (2,3), (1,3)} and {a,b,c} its
alphabetical order by {(a,b), (b,c), (a,c)}. These respective relations,
translated one into the other, preserve the orders of the members of the
respective sets. Since neither intuitive numbers nor letters of the alphabet
are elements of the type of formal set theory that I am recommending, the
question arises whether such an isomorphism can be set up in a set theory that
is both formal and excludes ‘foreign’ individuals. Clearly, this can be done in
comparing say the Zermelo natural numbers (for they are to hand for us to use
even if we have been ignoring them) with the von Neumann ones. Whether, in
addition, the concept of isomorphism can be embodied as an entity of
pure set theory is something I must leave to mathematicians.
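The informal example of {1,2,3} and {a,b,c} can be checked mechanically. The sketch below assumes that a Python dict may stand for the 'translation' and a set of pairs for each order relation. The function name is_order_isomorphism is a hypothetical label of mine, and nothing here pretends to settle whether isomorphism can be embodied in pure set theory.

```python
# Checking that a translation carries one finite order relation onto another.
# is_order_isomorphism is an illustrative name, not a term from the sources.

def is_order_isomorphism(translation, relation_a, relation_b):
    """True if translation is one-one and maps relation_a exactly onto relation_b."""
    one_to_one = len(set(translation.values())) == len(translation)
    image = {(translation[x], translation[y]) for (x, y) in relation_a}
    return one_to_one and image == set(relation_b)


numerical = {(1, 2), (2, 3), (1, 3)}                 # order on {1,2,3}
alphabetical = {("a", "b"), ("b", "c"), ("a", "c")}  # order on {a,b,c}

print(is_order_isomorphism({1: "a", 2: "b", 3: "c"}, numerical, alphabetical))  # True
print(is_order_isomorphism({1: "b", 2: "a", 3: "c"}, numerical, alphabetical))  # False
```

The second translation is one-one but swaps the first two members, so the pair (1,2) lands on (b,a), which is not in the alphabetical relation. The same check would work unchanged if the keys and values were nested frozensets rather than numbers and letters.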
Naturally, embracing a
pedagogic method that includes Cuisenaire rods and stories about the Biame, I
have plenty of room for different levels of set theory, and I am quite happy,
informally, to accept one that allows sets of any kind of intuitive entity that
mathematicians find useful. Since letters of the alphabet play such an
important role as symbols with which to denote variables it would be pedantic
to try to exclude them. Indeed, although I am doing propaganda for pure set
theory, I am not wishing to denigrate mathematicians who prefer their
axiomatised set theory to accept intuitive entities as it meets them, provided
they do so without pretending to provide everything that could ever count as
mathematics in one embracing formality once and for all, and provided too that
the aim of achieving only a simulacrum of mathematics is not treated as old
fashioned eccentricity.
To return, however, to
my muttons and pursue the fascination of the transfinite. To start I will give
the references that I have found most useful. They are: Quine, Set Theory
and its Logic, Harvard, 1963, Chapters VII and IX, in particular §§ 22 and
30; Tourlakis, Lectures in Logic and Set Theory, Volume 2, Cambridge UP,
2003, Chapter VI, §§ 4, 5 and 10; Potter, Set Theory and Its Philosophy,
Oxford UP, 2004, Chapters 10 and 11; and Cantor, Beiträge, §§ 5-20. I
have also been told that RL Wilder, Introduction to the Foundations of
Mathematics, Wiley, second edition, 1965, is an excellent account of the
very type of set theory I want to promote, but it is extraordinarily difficult
to find a copy short of reading it at the British Library. In particular, the
Science Museum Library has every Wilder book except that one.
I start with an
elementary benefit of beginning the von Neumann version of ordinals with the
null set instead of the set whose sole member that is, ie with zero instead of
one. This gives the useful fact that the nth ordinal is a set whose members are
0 . . . n-1, while the set whose members are 0 . . . n has n+1 members. From
this we get the elegance of being able to define the successor of n as the
union of n as a set (namely {0, 1, … n-1}) and the set whose member that is
(namely {{0, 1, … n-1}}), giving a set with the n+1 members 0, 1, … n-1, {0, 1,
… n-1}, the last being the nth ordinal, giving us {0, 1, … n} as required.
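The successor rule just described, together with the earlier rule that n + m is the mth successor of n, can be tried out on finite ordinals. This is a minimal sketch assuming Python frozensets as 'pure' sets. The helper names numeral and add are illustrative, and the second argument of add is an ordinary Python count rather than a formal ordinal.

```python
# Von Neumann successor and addition on finite ordinals.
# numeral() and add() are illustrative helpers, not from the sources cited.

EMPTY = frozenset()  # the null set, serving as the ordinal 0


def successor(n):
    """Union of n with the set whose sole member is n."""
    return n | frozenset([n])


def numeral(k):
    """Build the k-th finite ordinal by applying successor k times to 0."""
    n = EMPTY
    for _ in range(k):
        n = successor(n)
    return n


def add(n, m):
    """n + m as the m-th successor of n (m is a plain Python count here)."""
    for _ in range(m):
        n = successor(n)
    return n


three = numeral(3)
print(three == frozenset([numeral(0), numeral(1), numeral(2)]))  # True
print(len(successor(three)))                                     # 4
print(add(numeral(7), 4) == numeral(11))                         # True
print(add(three, 0) == three)                                    # True
```

The last line is the identity for addition that motivated starting with the null set: adding zero applies the successor no times at all and so makes no difference.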
The ordinal ω is the set of all such
ordinals but it clearly cannot be specified as the successor of any of them,
since the von Neumann ordinals as we have so far understood them (what he
called the natural numbers) have no last member for ω immediately to succeed (and
still will not have a last member when we understand them better). For this
reason ω is called a
limit ordinal – it cannot be reached by successively ‘adding one’ in the sense
above. Yet one can be added to it in that sense – in other words it can be
given a successor. This is the union of ω (the set of all finite von
Neumann ordinals) with the set whose member is ω, and this union set we can
term ω + 1.
Repeating this we can
obtain ω + 1 + 1, and so
on indefinitely until we fail to reach ω + ω, a second limit ordinal.
For a brief indication of how this process goes on see Quine pages 151, 152 and
210, 211; Potter page 204, and Tourlakis page 426. On all of these a mysterious
so-called epsilon number, epsilon zero, appears, whose importance is that it is
a ‘fixed point’, in that it is the value of a function for itself as argument,
ie by satisfying f (x) = x, something one would not have
thought possible in transfinite arithmetic until one observes that ω to the power (ω to the ω to the ω . . . ) is identical to ω to the ω to the ω . . . Cantor’s treatment of
epsilon numbers constitutes § 20 of his Beiträge and is thus his last
published contribution to set theory.
Now this new series of
ordinal numbers has, after ω, its first limit
ordinal, no last one, and so it will need something ‘stronger’ than a limit
ordinal, if not to terminate it, which is impossible, to complete it in some
sense. This requirement was first expressed by Cantor in § 11 of Nr 5 of his 4th
Abhandlung, preceding the Beiträge: he asks for a ‘zweite
Erzeugungsprinzip’ over and above his first, which had enabled him to posit a
totality of the unterminating whole numbers and (for the first time) call it ω. Now this set, as we now call
it, is termed denumerable, indeed is the paradigm denumerable set, since a set
is denumerable if its members can be put in one-one correspondence with the
members of ω.
What is astounding to
the unprepared amateur is that if we take the series of transfinite ordinals,
beginning with ω and including
all the successor ordinals and limit ordinals indicated above, and terminate it
arbitrarily, even well beyond epsilon zero, we can, with sufficient ingenuity,
put the result in one-one correlation with the members of ω, making it denumerable.
Cantor’s aim was to establish that the unterminated totality of that
series would be non-denumerable, and require a higher cardinal number than
Aleph zero. After all, this seems very reasonable by analogy: if we terminate
the finite whole numbers arbitrarily, however far from zero, they form a finite
cardinality, but if we form a set of their totality, their cardinality is
transfinite. We ought to be able to hope that forming a set of the series we
have been discussing, which Cantor calls the second number-class, will give us
a higher cardinality (and clearly, if it does it will have to be the next
higher). Cantor suggests (page 196 of the Olms edition) that it will be a limit,
but we already have modest limit ordinals, so a new term is needed for a super
limit ordinal that will, by beginning the third number class, wind up the
second.
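The one-one correlation claimed above for an arbitrarily terminated series can at least be illustrated for its most modest case, the ordinals below the second limit ordinal. The sketch assumes that such an ordinal is written as a pair (k, n), meaning n steps past the start (k = 0) or n steps past the first limit ordinal (k = 1). The encoding and the function names are my own, and the trick shown is the familiar interleaving of evens and odds, not Cantor's own construction.

```python
# Interleaving the ordinals below omega + omega with the natural numbers.
# The pair (k, n) stands for omega*k + n with k equal to 0 or 1.

def to_natural(k, n):
    """Send the finite ordinal n to 2n and the ordinal omega + n to 2n + 1."""
    return 2 * n + k


def from_natural(m):
    """Recover the pair (k, n) from the natural number m."""
    return (m % 2, m // 2)


# Each natural number below 100 is hit exactly once by pairs with n < 50.
hits = sorted(to_natural(k, n) for k in (0, 1) for n in range(50))
print(hits == list(range(100)))             # True
print(from_natural(to_natural(1, 7)))       # (1, 7)
```

The correlation is one-one but does not preserve order: the first limit ordinal itself, the pair (1, 0), is sent to 1 and so lands before the finite ordinal 1, which is sent to 2. That is why such a set can be denumerable as a cardinal matter while keeping a different order type.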
It is termed an initial
ordinal, defined by Tourlakis on page 459, with a proof (VII.4.3) that
establishes that every initial ordinal is a cardinal. The one we want,
to wind up the second number class, can only be the very next one, and so is
justifiably termed ω₁. An anxiety
arises here: for we can by another route find an indubitably higher cardinality
than Aleph zero (ie than the limit ordinal ω) but we cannot guarantee
that it is also the next highest, and thereby identify it with ω₁.
This indubitably higher
cardinality is famous as the cardinality of the continuum, the set of real
numbers between zero and one. Cantor first proved their nondenumerability in
his first contribution to set theory of 1874, page 115 of the Olms edition,
distinguishing the denumerability of the algebraic real numbers and the
nondenumerability of the full real numbers, but it was in a later paper, of
1890-91, his last contribution to set theory before the Beiträge, pages
278-281 in Olms, that he gave the proof that is now famous as his diagonal
procedure. This is now expressed very simply by setting out the real numbers
between zero and one as an infinite array of infinite decimals. Each infinite
decimal is clearly a denumerable expansion occupying a row, and there are
assumed to be denumerably many rows. Now if one were to change one of the
digits of an expansion one would have changed its value, but one could reasonably
assume that this new value would appear in some other row. But if one were
systematically (granted the time, or giving it to oneself by a ‘super’ thought
experiment) to change, first, the first digit of the first row and then,
second, the second digit of the second row, and so on, one would have achieved
an expansion different from any previously there. If, however, one were
arbitrarily to insert it among the rows, hoping to retain denumerability, the
possibility of generating yet a further new real number would remain, and would
continue to remain however often one did the same.
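The systematic changing of digits can be imitated on a finite square array, which is of course only a toy version of the infinite case. The sketch assumes each row is a string of decimal digits at least as long as the number of rows. The function name diagonal and the choice of replacement digits 5 and 6 are my own, the latter made so as to steer clear of expansions ending in endless 0s or 9s.

```python
# A finite imitation of the diagonal procedure on rows of decimal digits.
# diagonal() is an illustrative name; the sample rows are arbitrary.

def diagonal(rows):
    """Return a digit string that differs from rows[i] at position i, for every i."""
    changed = []
    for i, row in enumerate(rows):
        # Any digit other than row[i] would do; 5 and 6 avoid 0 and 9.
        changed.append("5" if row[i] != "5" else "6")
    return "".join(changed)


rows = ["1415", "7182", "4142", "5555"]
new = diagonal(rows)
print(new)              # 5556
print(new in rows)      # False
```

Inserting the new expansion among the rows and running the procedure again on the enlarged array yields yet another expansion not previously there, which is the finite shadow of the point that the possibility of generating a further real number always remains.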
This cardinality of the
continuum can be shown (see pages 453-4 and 455 of Tourlakis) also to hold for
the set of all functions from ω to ω, and for the so-called
power set of all subsets of ω. The latter is
termed P(ω); the cardinal
of the continuum, by taking the diagonal process for binary expansions, can be
conveniently termed 2 to Aleph zero. The question is, whether the same holds
for the set that Cantor called his second number class, enabling us to identify
its cardinality with that of the continuum and to call that too ω₁ (or Aleph one).
Cantor failed to
establish that this was the case. An editorial footnote to his second
contribution (of 1878) on page 133 of Olms identifies the sentence on page 132
where he first expresses his hope that it is. At least (see pages 199-201 of
Olms) he established to his own satisfaction that no cardinality intervened
between the cardinalities of the first and second number classes, but this
still leaves the cardinality of the second ‘in the air’ between having a
cardinality of its own and sharing that of the continuum. The work of Gödel
(1940) and Cohen (1963 and 1964) has now established that it will always be in
the air: the former demonstrating (see Tourlakis, page 215) that the continuum
hypothesis is consistent with the Axiom of Choice and the other axioms of set
theory, the latter demonstrating that it is independent of them.
This in turn arouses a
further anxiety, though it may well be private to myself and only come from my
lack of mathematical understanding. On page 229 Tourlakis admits that adopting
the Axiom of Choice or dropping it is as free a matter of choice as with
Euclid’s parallel axiom, yet on that very page he declares its plausibility,
and on page 395 begins an argument emphasising its plausibility. I find this
quite convincing, but I also find convincing an opposite intuitive
consideration of my own. For I can only hope that anyone who compares my
account of Cantor’s second number class with my description of his diagonal
procedure as applied to real numbers will agree that the latter indicates a
higher cardinality than the former, and not an identical one. Fearing this
comparison to have more in common with a trance experience than with intuition,
I was encouraged to find that Cohen (1966, and quoted at the top of Potter,
page 274) might charitably be said to agree with me.
An equally open question
is whether there can be cardinal numbers outside the series of Alephs. One
speculation has been that the continuum’s cardinality is not Aleph-one but
Aleph-two (Gödel, briefly, see Potter page 273), that it is some higher Aleph,
that its cardinality somehow squeezes in between Alephs (without the Axiom of
Choice there might be cardinalities that are incomparable, see Hartogs, page
266 of Potter), or that its cardinality is beyond any Aleph, which appears to
be the implication of Cohen on Potter’s page 274.
The Aleph sequence is
defined inductively by Tourlakis on page 465, requiring that Aleph zero is ω, that the next highest
cardinal to Aleph alpha is to be called Aleph alpha + 1, with a proviso if
alpha is a limit ordinal and so cannot be reached; while the existence of a
next highest cardinal depends on VII.4.16 (page 462), that for any set its
power set has a higher cardinality (even if not the next higher), depending in
turn on the Zermelo well ordering principle (VI.5.50, page 355 and depending on
the Axiom of Choice). That there might be cardinals beyond Alephs, or Alephs
beyond those with any ordinal subscript, is a supposition that requires the
concepts of the cofinality of ordinals, their regularity or singularity, and
the weak or strong inaccessibility of cardinals (pages 478-484). I cannot
pretend to explain any of these complications.
At least, a neat titbit is that Tourlakis, on
his page 483, treats “Aleph” as a function sign and its index as its argument
place. This enables him to ask, what is Aleph’s first fixed point, ie, for what
alpha, Aleph alpha equals alpha. It is “quite huge”, constituting a cardinal
fixed point corresponding to but enormously beyond the ordinal epsilon
zero. We thus have an intriguing triple comparison: this strongly inaccessible
cardinal apparently relates to smaller infinite cardinals in no more mysterious
a manner than ω did (above) to
finite cardinals, and, in between, than initial ordinals did to limit ordinals.
This tantalising coincidence of apparent simplicity with the limits of my
mathematical ability tempts me to think there could be an analogy with my
previous paper, in which I defended the meaning of beliefs which are embedded
in grander beliefs that I find meaningless.
For these mathematical concepts and their
problems surely raise questions of what is the case in this realm of
mathematics, or as I put it above of sentences that say something about
something. Potter, on his page 233, quotes Boolos (2000) on this: “ . . . and
we are no longer listening to a description of anything that is the case?” On
the same page he quotes Wittgenstein. This is another of his misleading
quotations in that, coming from two different pages of Wittgenstein’s
mathematics lectures (Cora Diamond, 1976, pages 32 and 142, going back to page
140), it does not draw out how limited Wittgenstein’s viewpoint is. This is
sad, because it concludes with a wonderful analogy. If someone (misguidedly, of
course) says that a child has learnt Aleph zero multiplications, Wittgenstein
says it hasn’t learnt anything huge, meaning nothing more by Aleph zero than an
approach to the infinite. But what if he himself could have learnt more,
say the sequence of Alephs and the question of infinite but non-Aleph
cardinalities, by time travelling to Cohen and beyond (indeed, just by staying
alive for a little): what substance would that have had? No more, I fear, than
the infinite radius of curvature of a ruler in a schoolboy’s satchel.