Wednesday, May 26, 2004

If We're Doomed, it's not from First Principles


Martin Rees, Britain's Astronomer Royal, has written a thoughtful book called Our Final Hour with the snappy subtitle A Scientist's Warning: How terror, error, and environmental disaster threaten humankind's future in this century, on earth and beyond. In it, he presents and motivates his estimate of the likelihood that humans will survive another century: 50%. The book is well argued, sane, and definitely worth reading.

In the course of an otherwise plausible discussion, Rees presents one argument that is very different from the rest, and much less plausible. Chapter 10, The Doomsday Philosophers, is subtitled Can pure thought tell us whether humanity's years are numbered? Rather clearly not, you might think. Surprisingly, however, Rees goes on to present a curious argument (devised by Brandon Carter) that purports to show, from first principles, that we are probably doomed. To be fair, the author is clearly not entirely comfortable with the argument since he closes the chapter with the following comment:

When I first heard Carter's Doomsday argument, it reminded me of George Orwell's robust comment in a different context: "You must be a real intellectual to believe that: no ordinary person could be so foolish." But pinpointing an explicit flaw is not a trivial exercise. It is worth doing so, however, since none of us welcomes a new argument that humanity's days may be numbered.

Here is Carter's argument, as presented by Rees, followed by my reasons for thinking it shouldn't keep anyone awake at night.

Carter's Doomsday Argument

The following is taken from Chapter 10 of Our Final Hour:

This "Doomsday" argument depends on a kind of "Copernican principle" or "principle of mediocrity" applied to our position in time. Ever since Copernicus, we have denied ourselves a central location in the universe. Likewise, according to Carter, we shouldn't assume that we are living in a special time in the history of humanity, neither among the very first nor the very last of our species. Consider our place in the "roll call" of Homo sapiens. We know our place only very roughly: most estimates suggest that the number of human beings who have preceded us is around sixty billion, so our number in the roll call is in this range . . .

[discussion of "Why so few?" omitted.]

Now consider two different scenarios for humanity's future: a "pessimistic" one, where our species dies within one or two centuries (or if it survives longer than that has a much diminished population), so that the total number of humans who will ever exist is one hundred billion; and an "optimistic" scenario, where humanity survives for many millennia with at least the present population (or perhaps even spreads far beyond Earth with an ever-enlarging population), so that trillions of people are destined to be born in the future. Brandon Carter argues that the "principle of mediocrity" should lead us to bet on the "pessimistic" scenario. Our place in the roll-call (about halfway through) is then entirely unsurprising and typical, whereas in the "optimistic" scenario, where a high population persists into the far future, those living in the twenty-first century would be in the early roll-call of humanity.

The Nature of the Game: Models and Estimates

The first thing to observe about this argument is that it is really putting forward a model: in this case, a way of estimating how much longer humanity is likely to survive. We do not normally think of models as being "right" or "wrong": rather, they are more or less accurate in different circumstances, and may have biases or other systematic shortcomings. Rees talks in terms of identifying the "flaw", but while I think the argument is flawed, it is helpful to remember that its model is only one of an infinity of ways of estimating the future lifetime of humanity, and that the different methods will inevitably give a wide range of answers. For this reason, even a reader who is convinced that the logic of the Carter-Rees Argument is sound should not necessarily expect its prediction to be accurate.

To make this more concrete, consider estimating human life expectancy. Here are some estimates one might make (figures being only illustrative). [In the following, E(X) denotes the expectation value of X (essentially an average), and E(X|Y) denotes the expectation value of X given Y.]

  • E(Age at death) = 70
  • E(Age at death | male) = 65
  • E(Age at death | smoker) = 52
  • E(Age at death | male smoker) = 49
  • E(Age at death | current age = 85) = 91
  • E(Age at death | 2 convictions for drunk driving) = 42
  • E(Age at death | Advanced melanoma) = Current Age + 6 months

These examples illustrate that our estimates change according to what information we have. The Carter-Rees argument uses almost the minimum amount of information possible. As such, it is a very simple estimate, and might be less convincing than other estimates that use more information.
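To make the idea of conditioning concrete, here is a minimal Python sketch in which an expectation "given Y" is simply the average over those records for which Y holds. The records and the function name `expected_age_at_death` are invented for illustration; they are not real mortality data.

```python
from statistics import mean

# Invented records: (age at death, is_male, is_smoker). Illustrative only, not real data.
RECORDS = [
    (82, False, False), (78, True, False), (61, True, True), (74, False, True),
    (88, False, False), (69, True, False), (55, True, True), (80, False, False),
]

def expected_age_at_death(records, given=lambda age, male, smoker: True):
    """Estimate E(age at death | given) as the mean over records where `given` holds."""
    ages = [age for age, male, smoker in records if given(age, male, smoker)]
    if not ages:
        raise ValueError("no records satisfy the condition")
    return mean(ages)

print(expected_age_at_death(RECORDS))                           # 73.375
print(expected_age_at_death(RECORDS, lambda a, m, s: m))        # male: 65.75
print(expected_age_at_death(RECORDS, lambda a, m, s: s))        # smoker: about 63.33
print(expected_age_at_death(RECORDS, lambda a, m, s: m and s))  # male smoker: 58
print(expected_age_at_death(RECORDS, lambda a, m, s: a >= 85))  # survived to 85: 88
```

Each extra piece of information narrows the set of records being averaged, which is why the estimate moves.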

All the examples above take the form of probabilistic estimates based on different information. But we might also want to use information about what we believe is likely to happen in the future to help us to make (potentially) more accurate estimates. For example:

  • Estimated age at death = current age + 1 year, because we expect a large meteor to hit the Earth in 1 year.
  • Estimated age at death = 70 if over 18, and 150 if under 18, because we expect revolutionary medical advances to allow almost everyone to live to 150 within 50 years.

Having set the Carter-Rees model in context, let's look at the argument that leads to it.

I Refute You Thus

Why Count Humans?

The first odd thing about the argument is that it says we "shouldn't assume that we are living in a special time in the history of the human race" (2004), but it doesn't carry this through and say "humans have existed on Earth for tens of thousands of years, so the principle of mediocrity suggests we are likely to live for tens of thousands more years." Instead, it switches tack and starts counting human beings. This is actually a nice illustration of exactly the point discussed above, namely that there are many possible models, and that the information we choose to take into account conditions our estimates. It is just strange to articulate a principle that leads to a comparatively reassuring estimate and then subtly vary it to produce a more worrying projection.

It's also interesting to reverse the argument since it is symmetrical. The principle of mediocrity suggests that we are unlikely to be near the end of human history. And presumably the same holds for tigers, pandas, golden frogs and the countless other species on the verge of extinction. Of course, as Martin Rees suggests, only a real intellectual would believe that the Doomsday Argument will save the whale. And I'm sure he does not think we can stop worrying about meteor collisions for a while because we're safe until there have been another 60 billion-odd humans.

Random Sampling and Censoring

Although I hesitate to say that the Doomsday Argument is wrong, I do think it is misconceived. Its starting point is the assertion that we should not assume that we are living in a special time in the history of humanity. Where does this (non-)assumption come from?

Rees uses an analogy with drawing tickets from two identical urns, one containing 10 tickets and the other 1,000. The tickets are numbered from 1 to 10 and from 1 to 1,000 respectively. He then argues, correctly, that if a ticket is drawn from a randomly selected urn, and its number turns out to be 6, this is strong evidence that the urn from which it was drawn was in fact the one containing 10 tickets. This is clearly correct since if the first urn is chosen, the probability of drawing 6 is 10%, while in the case of the second the probability is 0.1%. Common sense (known to probability theorists as Bayes's Theorem) tells us that the odds that we drew from the second urn are 100 to 1 against. (I've included a proof of this with Bayes's Theorem in this PDF document, which will prove that you really just wanted to use common sense to get the result.)
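The urn calculation can be checked in a few lines of Python. This is a sketch assuming, as in Rees's example, that each urn is chosen with probability 1/2; it uses exact fractions so nothing is blurred by rounding. Note that odds of 100 to 1 against correspond to a posterior probability of 1/101 for the 1,000-ticket urn.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Bayes's Theorem: P(urn | ticket) = P(urn) * P(ticket | urn) / P(ticket)."""
    joint = {urn: priors[urn] * likelihoods[urn] for urn in priors}
    evidence = sum(joint.values())  # P(ticket), summed over both urns
    return {urn: p / evidence for urn, p in joint.items()}

# Each urn chosen with probability 1/2; P(drawing ticket 6) is 1/10 or 1/1000.
priors = {"10 tickets": Fraction(1, 2), "1,000 tickets": Fraction(1, 2)}
likelihoods = {"10 tickets": Fraction(1, 10), "1,000 tickets": Fraction(1, 1000)}

post = posterior(priors, likelihoods)
print(post["10 tickets"], post["1,000 tickets"])   # 100/101 1/101
print(post["10 tickets"] / post["1,000 tickets"])  # 100, i.e. odds of 100 to 1
```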

Having two urns actually confuses the situation. Fundamentally, Rees is arguing that a reasonable way to estimate the number of tickets in a single urn containing an unknown number is to draw out a ticket and multiply its number by 2. It is true that if one did this many times, the average resulting estimate would be very nearly correct, but it's also true that the spread of estimates would be rather large, and that many estimates would be very poor. In fact, we would underestimate by a factor of K or more in a fraction 1/(2K) of draws (exactly so when the number of tickets is a multiple of 2K).
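The claims about the single-urn doubling estimate can be checked exactly by enumerating every possible draw. In this sketch (the function names are hypothetical), the average estimate from an urn of N tickets comes out as N + 1, very close to N, and the fraction of draws that underestimate by a factor of K or more is exactly 1/(2K) when N is a multiple of 2K.

```python
from fractions import Fraction

def doubling_estimates(n_tickets):
    """Every equally likely estimate 2*t, where t is the ticket drawn from 1..n_tickets."""
    return [2 * t for t in range(1, n_tickets + 1)]

def mean_estimate(n_tickets):
    """Average of the doubling estimate over all possible draws."""
    estimates = doubling_estimates(n_tickets)
    return Fraction(sum(estimates), len(estimates))

def fraction_underestimating(n_tickets, k):
    """Fraction of draws whose estimate is n_tickets / k or less (too low by a factor k or more)."""
    estimates = doubling_estimates(n_tickets)
    return Fraction(sum(1 for e in estimates if e * k <= n_tickets), len(estimates))

print(mean_estimate(1000))  # 1001
for k in (2, 5, 10):
    print(k, fraction_underestimating(1000, k))  # 1/4, then 1/10, then 1/20
```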

Let us nevertheless accept the proposed estimation method. The question then becomes: is the situation in the Doomsday Argument equivalent to the urns? No, it is not. This is immediately clear because in the case of the urns, we are randomly choosing a ticket from a known distribution: we know that there is exactly one of each of N tickets, and that each is chosen with probability 1/N. The equivalent situation would be if we knew when humanity would end (how many human beings there will be in total) and then randomly picked a sampling point between one and that number. But we haven't done that: we're just looking at one specific point in time, before the number of humans has been determined. This is not random sampling, because it is not random, and clearly even those points that we could sample (numbers less than 60 billion) are not given equal chances of being drawn. (If you are not immediately convinced that this is not random sampling, consider what happens if you apply the same procedure tomorrow. Do you get a random answer? Clearly not. You get an answer that is strongly and essentially predictably correlated with today's answer.) In fact we have no idea how much of the distribution we are "sampling" because the distribution is censored.

To illustrate this, here is the probability distribution for the 10-ticket urn. Naturally every point (ticket) has the same probability of being drawn: 0.1.

[Figure: discrete uniform distribution for 10 tickets]

What does the equivalent picture look like for the Doomsday Argument? I would argue that it looks like this:

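[Figure: the corresponding distribution for the Doomsday Argument, with all probability at 60 billion and the right-hand side marked "Censored"]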

There are obviously two rather major differences between this distribution and that shown for the 10-ticket urn.

  1. The right-hand side of the second graph is missing, marked "Censored." This is because we don't know how many people there will be. So even if we do manage to persuade ourselves that there is some sense in which the current value of "humans who have lived" is randomly chosen, it is definitely chosen from a range that excludes all humans who have not yet lived, whatever their number might be.
  2. All of the probability is at a single number, which I've made 60 billion (Rees's estimate of the number of humans who have ever lived). This might be a more contentious claim. My perspective is that there is no useful sense in which the 60 billion is 'random': I presume that Rees is arguing that it is in some sense random because he and Carter didn't pick the number: it is just the point at which the censoring starts. But I think this is a pretty hard position to defend, and again ask you to think about repeating the experiment.

In fact, in the section after next, I will suggest a minor variation in the Doomsday Argument that I think is more defensible and which actually leads to more optimism than Rees can muster. But first I want to talk a little more about distributions.

The Distribution Really Matters

Let us return to estimating human life expectancy, which seems closely related to the problem at hand. A method of going about this that is not dissimilar to the Carter-Rees approach would be to estimate a human's life expectancy as twice his or her current age. This really isn't a very good method of estimation for most people outside middle age, for the very obvious reason that human lifespans follow, in the rich nations at least, a roughly Normal distribution with a mean around 70 and a small-ish standard deviation. To estimate the lifespan of a 2-year-old as 4 is almost as unduly pessimistic as it is optimistic to estimate that of a 90-year-old as 180. Clearly, proven longevity thus far is no guarantee of further years. (In fact, I am reminded of the parable of the woman who was most disappointed when her horse, whom she had carefully been training to survive on ever less food, died shortly after she thought she had reduced its nutritional requirements to nil.)
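The contrast between doubling and a distribution-aware estimate can be sketched in Python, assuming (hypothetically) that lifespans are Normal with mean 70 and standard deviation 10. The distribution-aware estimate is the mean of that Normal distribution restricted to values above the current age (the standard truncated-Normal formula); `statistics.NormalDist` is in the Python standard library from version 3.8.

```python
from statistics import NormalDist

# Hypothetical assumption: lifespans ~ Normal(mean 70, standard deviation 10).
LIFESPAN = NormalDist(mu=70, sigma=10)
STANDARD_NORMAL = NormalDist()

def doubling_estimate(age):
    """The 'twice current age' rule discussed in the text."""
    return 2 * age

def truncated_normal_estimate(age, dist=LIFESPAN):
    """E(lifespan | lifespan > age): the mean of dist restricted to values above age.
    Uses mu + sigma * pdf(alpha) / (1 - cdf(alpha)) with alpha = (age - mu) / sigma;
    fine for the ages used here, but 1 - cdf underflows for extremely large ages."""
    alpha = (age - dist.mean) / dist.stdev
    survival = 1.0 - STANDARD_NORMAL.cdf(alpha)
    return dist.mean + dist.stdev * STANDARD_NORMAL.pdf(alpha) / survival

for age in (2, 35, 90):
    print(age, doubling_estimate(age), round(truncated_normal_estimate(age), 1))
```

Under these assumptions the two methods roughly agree at 35 (about 70 each), but a 2-year-old's estimate stays at about 70 rather than 4, and a 90-year-old's rises only to about 93.7 rather than 180.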

Unfortunately, we don't have good statistics on the lifespans of intelligent species since we don't actually know of any life beyond Earth. But the importance of distributions, as illustrated by the case of estimating human lifetimes, does suggest that we should be a little careful about just doubling our current age (or cumulative population count).

A More Optimistic Interpretation

As I have indicated, I think the Carter-Rees Argument is somewhat suspect as a way of making predictions, but I think it can be improved, at least slightly, and in doing so made to seem much more optimistic.

In addition to the urns analogy, Rees relays an argument from Richard Gott, who visited both the Berlin Wall and the Egyptian pyramids in 1970. These had respective ages of 12 years and around 4,000 years at that time. On the basis of their proven longevities as of 1970, Gott correctly predicted that the pyramids would outlast the Berlin Wall. While I don't find this an entirely compelling argument (not least because of its application to human life expectancy, discussed above), it does have a very different slant from the urns analogy. It basically argues that proven longevity suggests future longevity. This suggests a Universal Lifetime Estimate as follows:

E(Lifetime(X) | Age(X) = A) = 2A.

This could also be extended to the counting estimate that is used in the actual Carter-Rees model, to give a Universal Final Count Estimate as follows:

E(Final Count(X) | Current Count(X) = C) = 2C.

The interesting thing about these formulas is that while they give the same estimate for humanity as the Carter-Rees model, they also imply that the longer humanity survives, the more optimistic we should be about its future. The contrast in interpretation could hardly be greater. The Carter-Rees ("Doomsday") Argument says that we expect about 120 billion people in total, and appears to suggest that as time passes, we are getting ever closer to the edge of a precipice. In contrast, the Universal Final Count Estimate says that if we get to the 120 billion people that Carter-Rees suggests is our likely total, we will have gathered much more evidence of our future longevity as a species. The longer we survive, the longer we should expect to survive into the future.
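The two Universal Estimates are the same one-line formula, and a short Python sketch makes the contrast with a fixed Carter-Rees total explicit. The ages, the fixed total of 120 billion and the cumulative counts are taken from the discussion above; the function names are hypothetical.

```python
def universal_estimate(current):
    """Universal Lifetime / Final Count Estimate: E(final | current = C) = 2C."""
    return 2 * current

def fixed_total_remaining(current, fixed_total=120):
    """Remaining count (billions) if the total is fixed once, at 120 billion as in the text."""
    return max(fixed_total - current, 0)

# Ages (in years) quoted in the text for 1970.
for name, age in (("Berlin Wall", 12), ("pyramids", 4000)):
    print(f"{name}: expected total lifetime {universal_estimate(age)} years")

# Cumulative human count, in billions.
for count in (60, 90, 120):
    still_to_come = universal_estimate(count) - count
    print(f"{count} billion so far: universal estimate {still_to_come} billion to come, "
          f"fixed total {fixed_total_remaining(count)} billion to come")
```

As the count grows, the universal estimate of people still to come grows with it, while the fixed total shrinks towards zero.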

I should make it clear that I don't really have much more faith in the Universal Estimates than in the Carter-Rees model; but I do think they are at least more internally self-consistent, as well as being more optimistic.

A Note On Anthropic and Mediocrity Principles

Carter and Rees are both astronomers, and as Rees notes, astronomers have long made various assumptions about the universe and our place in it. One of these is that our place is not central, which seems very reasonable. We also assume that space is isotropic: the same in all directions. More generally, we assume that the laws of physics are constant across the universe, so that, for example, when we send a probe to Mars gravity doesn't suddenly disappear, but can be relied upon to pull visiting rovers down to the surface.

We should, however, be careful to understand that these are assumptions, not self-evident or proven truths. It could be that we actually are very close to the centre of the universe, or that space is not isotropic, or that the laws of physics do in fact vary in space. All of these things are, in principle, subject to experimental verification or refutation.

Perhaps the most fundamental problem with the Carter-Rees Argument is that it starts from an assumption (that we are not living in a special time in the history of humanity), then concludes from this that we are probably somewhere near the middle (arguably just as special as the ends, but let that pass) and then forms an estimate from that which is really no more than the mirror of the initial assumption. In other words, one way of reading the Carter-Rees Argument is as follows:

Let's assume that half the people who will ever live have been born.
Then there will be the same number in the future as in the past.

Stated that way, it seems rather circular.

Bibliography

Martin Rees, Our Final Hour. Basic Books (New York) 2003.

[Update 2004/05/29]

See also: Nick Bostrom. A Primer on the Doomsday Argument.

Friday, May 14, 2004

Welcome to Planet Zero

Perhaps the first article on this blog should have been the welcome, but life rarely seems to proceed in an orderly fashion. So welcome to Planet Zero.

Insofar as it has a principal theme, it is likely that the theme of this blog will be something to do with environment, sustainability, and humanity's future on and stewardship of the Earth. The musings on this theme will be from someone of fundamentally technical bent: a mathematician of sorts who wants to be a scientist; a scientist of sorts who likes (or liked) writing code; a programmer of sorts, who wants to 'save the planet'. If there is an agenda, it is to encourage people to make it more likely that human life will continue indefinitely on the planet, something far from certain at this time. There is likely to be room, too, for some arts, for what, after all, are we saving the planet for, if not the finer expressions of humanity?

There's little point in pretending I will be a reliable correspondent; I am far more likely to be both infrequent and irregular. But I will try to be interesting.

Thursday, May 13, 2004

XML: Better Than the Hype


In all the time I've been in and around software, only one technology I have applied has consistently and repeatedly delivered more than it seemed to promise. That technology is XML. Moreover, I don't seem to be alone in finding that I get more benefits from XML than seems reasonable. This 'overdelivery' is sufficiently surprising and unusual that I've spent quite a lot of time trying to understand its cause.

The Content, the Whole Content and Nothing But the Content

One statement often made about XML is that it separates information about content (meaning, semantics) from form. This is frequently contrasted with HTML, in which the mark-up contains information about formatting intention and (particularly in the case of div- and span-heavy XHTML) document structure.

While this is true, it seems to miss the key point, which I would summarize as follows:

In designing a good XML format we seek to describe whatever objects are to be represented fully and naturally, without undue regard to immediate usage.

The point here is that when we set out to represent something in XML, we are usually motivated by a specific use that we have in mind for the data. For example, perhaps we wish to print some bus timetables, and this is motivating us to devise an XML representation for timetable data. The key is that we don't focus narrowly on the precise data that we expect to print in the timetable, still less on how we are going to break it up and lay it out. Rather, we try to find a clear, natural representation of the underlying timetable data in XML, and to ensure that we make this representation as rich and complete as is reasonable.

Repurposing

The benefit that we get from adopting this approach is that when we want to carry out some other task, there is a very good chance that we will be able to exploit exactly the same XML. This might be some other layout-oriented task (such as displaying the bus timetables on the web, or producing a pocket timetable), or something quite different such as building an online query service, calculating travel times, or journey planning.

In practice, I think it is this 'repurposability' of well-designed XML that is responsible for its repeatedly over-delivering. First, I get the project I actually wanted done cleanly, and then a few weeks, months or years later, I decide I need to exploit the same data differently, and this is very easy.
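As a minimal sketch of repurposing, assuming a hypothetical bus-timetable format (the element and attribute names below are invented, not a standard), the same XML can drive both a printed timetable and a travel-time calculation using Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical timetable format, invented for this sketch.
TIMETABLE_XML = """
<timetable route="12">
  <journey id="j1">
    <stop name="Station" time="08:00"/>
    <stop name="High Street" time="08:12"/>
    <stop name="Hospital" time="08:25"/>
  </journey>
  <journey id="j2">
    <stop name="Station" time="09:00"/>
    <stop name="High Street" time="09:14"/>
    <stop name="Hospital" time="09:30"/>
  </journey>
</timetable>
"""

def print_timetable(root):
    """Use 1: a printed timetable with one row per stop and one column per journey.
    Assumes every journey calls at the same stops in the same order."""
    journeys = root.findall("journey")
    stop_names = [stop.get("name") for stop in journeys[0].findall("stop")]
    for row, name in enumerate(stop_names):
        times = [journey.findall("stop")[row].get("time") for journey in journeys]
        print(f"{name:<12}" + "  ".join(times))

def travel_minutes(root, origin, destination):
    """Use 2: minutes from origin to destination on each journey serving both stops."""
    result = {}
    for journey in root.findall("journey"):
        times = {stop.get("name"): datetime.strptime(stop.get("time"), "%H:%M")
                 for stop in journey.findall("stop")}
        if origin in times and destination in times:
            delta = times[destination] - times[origin]
            result[journey.get("id")] = int(delta.total_seconds() // 60)
    return result

root = ET.fromstring(TIMETABLE_XML.strip())
print_timetable(root)
print(travel_minutes(root, "Station", "Hospital"))  # {'j1': 25, 'j2': 30}
```

Neither use required changing the XML; the second task simply reads data that the first one did not need.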

Paper, Scissors, Stone, XML

There are obviously lots of other virtues of XML that contribute to its 'overdelivery'. Commonly cited examples are its support for internationalization, the ever-growing sea of high-quality, open-source XML processing software, and the rapidly growing XML-ization of the world; going with the flow has its rewards. But I think there are two other factors that are less widely appreciated, and perhaps more important.

The first of these is a sort of 'moral high ground'. Basically, if I need to exchange data with someone and I say my format's XML and theirs isn't, I win: almost everyone will take XML, because even if they don't have current capability to handle it, they feel they should have. And if we both have XML, we both win, because even though the formats are unlikely to be the same, XSLT usually makes conversion between different formats extremely easy (subject to their having similar expressiveness and concepts).

The way I think of this is in relation to Paper, Scissors, Stone. It's as if XML has changed the rules, so they now read:

paper wraps stone
stone blunts scissors
scissors cut paper
XML annihilates paper, scissors and stone.
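Where two XML formats describe similar concepts, conversion is usually a mechanical, element-by-element mapping. The text's tool for this is XSLT; Python's standard library has no XSLT processor, so the following is a plain ElementTree stand-in for what a short stylesheet would do, not XSLT itself. Both formats (`timetable`/`journey`/`stop` and `route`/`trip`/`call`) are invented for this sketch.

```python
import xml.etree.ElementTree as ET

# Two hypothetical formats describing the same timetable concepts.
OURS = ('<timetable route="12"><journey id="j1">'
        '<stop name="Station" time="08:00"/><stop name="Hospital" time="08:25"/>'
        '</journey></timetable>')

def to_their_format(timetable):
    """Map <timetable>/<journey>/<stop> onto a partner's <route>/<trip>/<call>."""
    route = ET.Element("route", number=timetable.get("route"))
    for journey in timetable.findall("journey"):
        trip = ET.SubElement(route, "trip", ref=journey.get("id"))
        for stop in journey.findall("stop"):
            ET.SubElement(trip, "call", at=stop.get("name"), departs=stop.get("time"))
    return route

converted = to_their_format(ET.fromstring(OURS))
print(ET.tostring(converted, encoding="unicode"))
```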

Cruel to be Kind

Finally, I think the other reason XML delivers such big payoffs is down to Tim Bray's success in insisting that all XML parsers reject non-well-formed XML. The consequence is that I hardly ever encounter a badly formed document, and when I do, as Tim himself explains over at ongoing ("XML Supports Constructive Finger-Pointing"), the guilty party always says mea culpa and fixes it. I know there are people who believe draconian error handling is unfriendly, but XML appears to me to be the closest thing we're ever likely to get to proof that they're wrong. Sometimes, you really do have to be cruel to be kind.
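Python's standard-library ElementTree shows this draconian behaviour directly. In the sketch below (the helper name is hypothetical), a document with a mismatched tag is rejected with a `ParseError` rather than silently repaired.

```python
import xml.etree.ElementTree as ET

def well_formedness_error(text):
    """Return None if text is well-formed XML, otherwise the parser's error message.
    The XML specification treats well-formedness violations as fatal errors, so the
    parser refuses to build a tree instead of guessing what was meant."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None

print(well_formedness_error('<timetable><journey id="j1"/></timetable>'))  # None
print(well_formedness_error('<timetable><journey id="j1"></timetable>'))   # prints an error message
```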