tag:blogger.com,1999:blog-8875718148042764192024-03-19T02:48:44.844-06:00SlowClimateTaking climate science slowlyNebuchadnezzarhttp://www.blogger.com/profile/06928587773464166613noreply@blogger.comBlogger5125tag:blogger.com,1999:blog-887571814804276419.post-77193685243700189572012-01-01T17:51:00.000-06:002012-01-01T17:51:46.913-06:00Rohde et al. 2011ish - Berkeley Earth Temperature Averaging Process<div class="MsoNormal" style="line-height: 150%; margin: 0cm 0cm 10pt;"><span style="font-family: "Arial", "sans-serif"; font-size: 10pt; line-height: 150%; mso-bidi-font-weight: bold;">Rohde et al. (2011) </span><span style="font-family: "Arial", "sans-serif"; font-size: 10pt; line-height: 150%; mso-bidi-font-weight: bold;">describe a process for estimating the temperature at any point on the Earth’s land surface using a discontinuous network of stations of any geographical distribution. The method was applied to the GHCN network and an estimate of the global average temperature was calculated back to 1800. Uncertainties in the global average were also calculated that account for spatial sampling and data errors.</span></div><div class="MsoNormal" style="line-height: 150%; margin: 0cm 0cm 10pt;"><span style="font-family: "Arial", "sans-serif"; font-size: 10pt; line-height: 150%; mso-bidi-font-weight: bold;">The paper is important in two ways. Firstly, from a scientific perspective, it is important because it takes a new and statistically innovative look at the problem of estimating global land temperatures. That it confirms global average land surface air temperature trends is unsurprising; its greatest scientific impact will be in helping to elucidate trends and variability at smaller scales where uncertainties are much larger. The record is 50 years longer than the longest current estimate, produced by CRU, and the proposed uncertainty range is narrower than those estimated for other data sets. 
Secondly, it has a role to play in the wider discussion of climate change, about which I will say nothing other than to acknowledge it.</span><br />
<a name='more'></a><span style="font-family: "Arial", "sans-serif"; font-size: 10pt; line-height: 150%; mso-bidi-font-weight: bold;">The paper has not yet been peer-reviewed in the conventional sense, but instead placed on the web along with a bundle of data and code in parallel with the journal submission. A number of other informal reviews already exist and my comments will no doubt overlap with some of them.</span></div><div class="MsoNormal" style="line-height: 150%; margin: 0cm 0cm 10pt;"><span style="font-family: "Arial", "sans-serif"; font-size: 10pt; line-height: 150%; mso-bidi-font-weight: bold;">The first thing to note is that the global average land surface air temperature produced by the Berkeley group is very close to those produced by NASA GISS, NOAA NCDC and Hadley CRU at least as far back as 1900. This is unsurprising. Before this date the global network thins out significantly and the four estimates diverge. In the Rohde paper, the post-1900 agreement does not appear as strong because the four data sets are not compared in an exactly like for like fashion. The GISS and CRU estimates shown are estimates of the global average temperature based on land stations rather than an estimate of the global average land temperature. As I understand it, the GISS estimate has been corrected, but the CRU estimate is still the wrong one for comparison.</span></div><div class="MsoNormal" style="line-height: 150%; margin: 0cm 0cm 10pt;"><span style="font-family: "Arial", "sans-serif"; font-size: 10pt; line-height: 150%; mso-bidi-font-weight: bold;">Back before 1900, the uncertainties are generally larger, but there is a persistent decadal scale difference in the data with the Berkeley data set running cooler than the other data sets. 
This difference is most noticeable in comparison with the longer CRU data set, which is around 0.5K warmer in 1850.</span></div><div class="MsoNormal" style="line-height: 150%; margin: 0cm 0cm 10pt; tab-stops: 45.05pt;"><b><span style="font-family: "Arial", "sans-serif"; font-size: 10pt; line-height: 150%;">What did they do?</span></b></div><div class="MsoNormal" style="line-height: 150%; margin: 0cm 0cm 10pt; tab-stops: 45.05pt;"><span style="font-family: "Arial", "sans-serif"; font-size: 10pt; line-height: 150%; mso-bidi-font-weight: bold;">The averaging process is based on Kriging, a method that deals naturally with uneven geographical sampling. Temperatures are decomposed into a global average, a climatological average, and a local temperature anomaly. The climatological average is further broken down as a function of elevation and latitude (which together explain 95% of the variance), with the local deviations modelled using a simple local correlation function. The local temperature anomalies are likewise assumed to have a simple correlation function which more or less decays exponentially.</span></div><div class="MsoNormal" style="line-height: 150%; margin: 0cm 0cm 10pt; tab-stops: 45.05pt;"><span style="font-family: "Arial", "sans-serif"; font-size: 10pt; line-height: 150%; mso-bidi-font-weight: bold;">A number of add-ons are included to deal with station moves, stations that are unrepresentative of the local climate, and data outliers. Station moves are handled with what the authors call the ‘scalpel’, which cuts station records where a neighbour comparison implies a discontinuity in the series. The two fragments either side of the cut are dealt with as individual stations. 
Non-representative series – defined as those that diverge from the estimated value at the location of the station – are down-weighted iteratively.</span></div><div class="MsoNormal" style="line-height: 150%; margin: 0cm 0cm 10pt; tab-stops: 45.05pt;"><span style="font-family: "Arial", "sans-serif"; font-size: 10pt; line-height: 150%; mso-bidi-font-weight: bold;">These components are poured into the statistical meat grinder and an estimate of the global average temperature pops out. Although the authors claim to have removed the need for gridding the data, they evaluate the integrals over the Earth’s surface numerically, which amounts to the same thing.</span></div><div class="MsoNormal" style="line-height: 150%; margin: 0cm 0cm 10pt; tab-stops: 45.05pt;"><span style="font-family: "Arial", "sans-serif"; font-size: 10pt; line-height: 150%; mso-bidi-font-weight: bold;">To estimate data-based uncertainties, they use subsets of the data and assess how the spread of the estimates based on fewer stations affects the global average. 
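To make the subset approach concrete, here is a minimal sketch. This is not the Berkeley code: the station anomalies are made up and an unweighted mean stands in for the full averaging process.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up anomalies for 100 stations over 120 months: a shared signal
# plus independent station-level noise.
signal = np.cumsum(rng.normal(0.0, 0.05, 120))
stations = signal + rng.normal(0.0, 0.5, (100, 120))

def global_average(subset):
    # Stand-in for the full averaging process: an unweighted mean.
    return subset.mean(axis=0)

# Re-run the analysis on random half-networks and use the spread of the
# resulting series as the data-based uncertainty.
estimates = []
for _ in range(200):
    pick = rng.choice(100, size=50, replace=False)
    estimates.append(global_average(stations[pick]))
spread = np.std(estimates, axis=0)  # per-month uncertainty estimate
```

The spread of the subset estimates serves as the data-based uncertainty; the real analysis does this with its full weighted average rather than a plain mean.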
They also make a separate estimation of the spatial sampling uncertainty by applying the historical spatial weighting functions that are output by their averaging process to later, more completely sampled epochs.</span></div><div class="MsoNormal" style="line-height: 150%; margin: 0cm 0cm 10pt; tab-stops: 45.05pt;"><span style="font-family: "Arial", "sans-serif"; font-size: 10pt; line-height: 150%; mso-bidi-font-weight: bold;">By-products of the process include an estimate of the annual climatological land temperature (about 8.5±0.5°C), and an estimate of the bias at each station.</span></div><div class="MsoNormal" style="line-height: 150%; margin: 0cm 0cm 10pt; tab-stops: 45.05pt;"><b><span style="font-family: "Arial", "sans-serif"; font-size: 10pt; line-height: 150%;">Criticisms</span></b></div><div class="MsoNormal" style="line-height: 150%; margin: 0cm 0cm 10pt; tab-stops: 45.05pt;"><span style="font-family: "Arial", "sans-serif"; font-size: 10pt; line-height: 150%; mso-bidi-font-weight: bold;">The chief criticism I have concerns their uncertainty analysis. They claim to have narrowed the uncertainty range for global average land temperature estimates, but at certain points in the manuscript uncertainties are acknowledged that they explicitly do not tackle or cannot tackle on their own. These include the problems associated with prevalent and widespread biases, the fixed parameters and analysis choices within their framework, and the structural uncertainty. All of these problems will have a greater effect the further back in time the analysis is taken.</span></div><div class="MsoNormal" style="line-height: 150%; margin: 0cm 0cm 10pt; tab-stops: 45.05pt;"><span style="font-family: "Arial", "sans-serif"; font-size: 10pt; line-height: 150%; mso-bidi-font-weight: bold;">The most obvious fixed parameter is the correlation function used to do the kriging. 
This involves a whole suite of choices, principally the choice to use a 4<sup>th</sup> order polynomial in the exponential and the choice to use correlations rather than covariances. As they note, this latter choice makes sense if variances do not vary rapidly with distance – an assumption not supported in the text and likely to have a greater effect in the earlier record where the stations are predominantly coastal and being used to infer continental temperature variability. Fixing the form of the correlation function hides a lot of local behaviour which is nevertheless shown in their Figures 2 and 3.</span></div><div class="MsoNormal" style="line-height: 150%; margin: 0cm 0cm 10pt; tab-stops: 45.05pt;"><span style="font-family: "Arial", "sans-serif"; font-size: 10pt; line-height: 150%; mso-bidi-font-weight: bold;">It is also not clear what their correlation function actually shows. It is calculated from pairs of neighbouring stations and the correlation at zero separation is interpreted in terms of data error: two stations at the same notional location would exhibit different variability due to the exact circumstances of the station siting. However, in their formulation there is no allowance for data error and the correlation functions should represent the underlying temperature field that they are trying to estimate rather than the measured temperature field (the true temperatures will generally have higher correlations than the measured temperatures).</span></div><div class="MsoNormal" style="line-height: 150%; margin: 0cm 0cm 10pt; tab-stops: 45.05pt;"><span style="font-family: "Arial", "sans-serif"; font-size: 10pt; line-height: 150%; mso-bidi-font-weight: bold;">Other choices are the use of neighbour composites to decide where station breaks are for the scalpel, and the ad hoc weighting procedures for the station reliability and outlier assessments. 
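To illustrate the neighbour-comparison idea behind the scalpel, here is a toy sketch with entirely made-up data. The Berkeley implementation is more sophisticated than this crude changepoint scan; the point is only that differencing against a neighbour composite cancels the shared regional signal and leaves the discontinuity.

```python
import numpy as np

rng = np.random.default_rng(5)

# A made-up station with a +0.8 step at month 120 (a move, say) and a
# neighbour composite that shares the regional signal but not the step.
n = 240
regional = np.cumsum(rng.normal(0.0, 0.05, n))
station = regional + rng.normal(0.0, 0.3, n)
station[120:] += 0.8
neighbours = regional + rng.normal(0.0, 0.1, n)

diff = station - neighbours  # regional signal cancels; the step remains

# Crude changepoint scan: pick the cut that maximises the jump in the
# mean of the difference series (keeping 24 months clear of each end).
scores = [abs(diff[:k].mean() - diff[k:].mean()) for k in range(24, n - 24)]
cut = 24 + int(np.argmax(scores))

# The 'scalpel': the two fragments are thereafter treated as separate stations.
fragment_a, fragment_b = station[:cut], station[cut:]
```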
It has already been noted elsewhere that – as with most unsupervised algorithms – the scalpel occasionally makes unusual changes to stations that seem counterintuitive.</span></div><div class="MsoNormal" style="line-height: 150%; margin: 0cm 0cm 10pt; tab-stops: 45.05pt;"><span style="font-family: "Arial", "sans-serif"; font-size: 10pt; line-height: 150%; mso-bidi-font-weight: bold;">Exactly how important these choices – and others – are is impossible to ascertain by simply reading the text. The method is too complex for me to imagine my way through it. As it is, the uncertainty ranges, particularly in the early data where biases are expected to be larger, seem too narrow and the fields that are produced seem too smooth. This leaves a large question mark hanging over the larger variability in the early 19<sup>th</sup> century and the usefulness of the analysis at small scales. Without a more thorough quantification of the sensitivity of the analysis to choices and parameter settings, it is not possible to place a great deal of trust in the estimates of the earliest data or the smaller scale features.</span></div><div class="MsoNormal" style="line-height: 150%; margin: 0cm 0cm 10pt; tab-stops: 45.05pt;"><span style="font-family: "Arial", "sans-serif"; font-size: 10pt; line-height: 150%; mso-bidi-font-weight: bold;">Regarding their interpretation of their analysis as being the most comprehensive and the best: this may well be true, but this does not get them over the problem that there is uncertainty inherent in their choice of processing algorithms. 
Even if the other data sets are inferior – and there are no grounds for supposing that they are – they still help to map out the structural uncertainty because their approaches to the problem are all very different.</span></div><div class="MsoNormal" style="line-height: 150%; margin: 0cm 0cm 10pt; tab-stops: 45.05pt;"><span style="font-family: "Arial", "sans-serif"; font-size: 10pt; line-height: 150%; mso-bidi-font-weight: bold;">The authors note that another way to assess structural uncertainty is to look at factors that are thought to affect the quantity in question. They make reference to two other submitted publications (also found on their website) which deal with urbanisation and with station siting in the US. These other analyses will be dealt with separately, but do not, I think, shed a great deal of light on these already well illuminated areas of study.</span></div><div class="MsoNormal" style="line-height: 150%; margin: 0cm 0cm 10pt; tab-stops: 45.05pt;"><b><span style="font-family: "Arial", "sans-serif"; font-size: 10pt; line-height: 150%;">Summary</span></b></div><div class="MsoNormal" style="line-height: 150%; margin: 0cm 0cm 10pt; tab-stops: 45.05pt;"><span style="font-family: "Arial", "sans-serif"; font-size: 10pt; line-height: 150%; mso-bidi-font-weight: bold;">A new method has been devised which offers a great opportunity to more fully understand the uncertainties in estimates of global average land temperature going back for the first time to 1800. However, as might be expected with a new approach, it is not clear that the potential has been realised. 
Without a more thorough assessment of the algorithm’s behaviour and associated uncertainties it is not possible to assess its success in reducing those uncertainties and sharpening our view of historical climate change.</span></div><div class="MsoNormal" style="line-height: 150%; margin: 0cm 0cm 10pt; tab-stops: 45.05pt;"><span style="font-family: "Arial", "sans-serif"; font-size: 10pt; line-height: 150%; mso-bidi-font-weight: bold;">Where it fits into our understanding is of interest, although it is perhaps too early to say. The method is closest methodologically to GISS, using local structure rather than large scale structure to interpolate the data. Also like GISS, the analysis makes use of shorter data records. As with GISS, the fields produced have a certain smoothness that will underestimate local variability. This is possibly more representative of the platonic-ideal large scale temperature fields that these groups are trying to assess, but that is a matter for debate (what the hell are they actually measuring?). The lack of smoothing in the CRU data set suggests that this might still be a better choice for looking at small scale variability, but the analysis will be susceptible to micro-siting issues. The NOAA analysis, by making use of teleconnections to reconstruct the large scale temperature field, might be the more reliable record earlier on. 
There are various ways to test these suppositions and the ISTI plans to look at some of these using idealised benchmark tests.</span></div>Nebuchadnezzarhttp://www.blogger.com/profile/06928587773464166613noreply@blogger.com337tag:blogger.com,1999:blog-887571814804276419.post-61046815935175481592012-01-01T11:36:00.006-06:002012-01-01T11:40:00.375-06:00Foster and Rahmstorf 2011 - The True Global Warming Signal<div class="MsoNormal" style="margin: 0cm 0cm 10pt;"><span style="font-family: Arial, Helvetica, sans-serif; line-height: 115%;">Everyone knows that the temporary flickers in the global temperature curve are the marks of El Ninos, La Ninas and volcanic explosions. The slower variations are the symptoms of changes in the overall forcing due to the vagaries of solar output, accumulations of greenhouse gases in the atmosphere and others, or they arise as natural variations internal to the climate system.</span><br />
<a name='more'></a></div><span style="line-height: 115%;"></span><span style="font-family: Arial, Helvetica, sans-serif;"><span style="line-height: 115%;">Disentangling these effects is not easy. In fact it is not at all clear that they can, or should, be disentangled. What if high latitude aerosol forcing caused a slowdown in the meridional overturning circulation that projected on to patterns of internal variability such as the Atlantic Multidecadal Oscillation (AMO)? Would it be correct to attribute such a drop to internal variability or to anthropogenic forcing? Likewise, should one separate out the thermodynamic effects of large volcanoes from the dynamic effects such as the hypothesised link with El Nino and the North Atlantic Oscillation? These questions cannot be answered categorically because each interpretation presupposes a different set of questions. On a basic level, if El Nino increases global temperature, what does global temperature minus the effects on it of the El Nino Southern Oscillation (ENSO) actually measure? What does it represent?</span> </span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"></span><br />
<div class="MsoNormal" style="margin: 0cm 0cm 10pt;"></div><div class="MsoNormal" style="margin: 0cm 0cm 10pt;"><span style="font-family: Arial, Helvetica, sans-serif; line-height: 115%;">FR11 perform a multiple regression of a linear trend, annual cycle, ENSO, Total Solar Irradiance (TSI) and Aerosol Optical Depth (AOD) on global mean temperature and, having removed the ‘exogenous’ effects of ENSO, TSI and AOD, claim - and claim here is the key word - that what remains is the “<i style="mso-bidi-font-style: normal;">true global warming signal”</i>.</span></div><div class="MsoNormal" style="line-height: normal; margin: 0cm 0cm 0pt; mso-layout-grid-align: none;"><span style="font-family: Arial, Helvetica, sans-serif;">On the plus side, they consider a range of different data sets - GISS, NCDC, HadCRUT, RSS and UAH - as well as looking at alternative measures of ENSO. They used MEI for their main analysis but also looked at SOI. They also looked at alternative measures of TSI (sunspots as opposed to TSI) and volcanism (Ammann as opposed to Sato). In principle such switches explore the sensitivity of their results to such choices. In this case “<i style="mso-bidi-font-style: normal;">None of these substitutions affected the results in a significant way, establishing that this analysis is robust to the choice of data to represent exogenous factors</i>”.</span></div><div class="MsoNormal" style="line-height: normal; margin: 0cm 0cm 0pt; mso-layout-grid-align: none;"><span style="font-family: Arial, Helvetica, sans-serif;"><br />
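The regression itself is straightforward. Here is a minimal sketch with synthetic stand-ins for the predictors; the coefficients, the volcanic spike, and the solar proxy are all invented for the illustration, and FR11's annual cycle term is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-ins for 32 years (384 months) of data: a steady trend
# plus made-up ENSO, solar and volcanic contributions and noise.
n = 384
t = np.arange(n) / 12.0                    # time in years
enso = rng.normal(0.0, 1.0, n)
tsi = np.sin(2 * np.pi * t / 11.0)         # crude solar-cycle proxy
aod = np.zeros(n)
aod[150:180] = 1.0                         # a made-up volcanic spike
temp = 0.017 * t + 0.1 * enso + 0.05 * tsi - 0.3 * aod + rng.normal(0.0, 0.1, n)

# Regress temperature on all predictors at once, then subtract the
# fitted 'exogenous' terms to leave the adjusted series.
X = np.column_stack([np.ones(n), t, enso, tsi, aod])
coef, *_ = np.linalg.lstsq(X, temp, rcond=None)
adjusted = temp - X[:, 2:] @ coef[2:]

trend_per_decade = coef[1] * 10.0          # recovered warming rate
```

The adjusted series recovers the imposed trend almost exactly here, but only because the toy data satisfy the regression's assumptions by construction; real data need not.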
</span></div><div class="MsoNormal" style="line-height: normal; margin: 0cm 0cm 0pt; mso-layout-grid-align: none;"><span style="font-family: Arial, Helvetica, sans-serif;">However, once started down that road it doesn’t pay to stop too soon. Why, for example, were the lags of each predictor allowed to be decided by taking the value that gave the best fit to the data? If each GMT data set is a fair measure of global temperature then this ought to give some idea of how uncertain the lags are and one ought to play mix and match – take the lag that gives the best fit for GISS and use it for HadCRUT, for example. What of other measures of ENSO such as those provided by tropical Pacific SSTs or that used by Thompson et al. (2008 and later) (similar series to SOI, MEI etc) and Compo and Sardeshmukh (very, very different)? Or other measures of TSI reconstructed from the same satellite data which show different mid to long-term trends?</span></div><div class="MsoNormal" style="line-height: normal; margin: 0cm 0cm 0pt; mso-layout-grid-align: none;"><span style="font-family: Arial, Helvetica, sans-serif;"><br />
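To see why the best-fit lag is itself a data-dependent choice, here is a toy version of the lag scan, with made-up data in which the true lag is known to be 4 months.

```python
import numpy as np

rng = np.random.default_rng(3)

# Made-up monthly data in which temperature responds to an ENSO index
# with a known 4-month lag.
n = 400
enso = rng.normal(0.0, 1.0, n)
temp = 0.1 * np.concatenate([np.zeros(4), enso[:-4]]) + rng.normal(0.0, 0.1, n)

def fit_r2(lag):
    # Shift the predictor back by `lag` months and compute the R^2 of a
    # regression of temperature on the shifted index.
    x = np.concatenate([np.zeros(lag), enso[:n - lag]])
    X = np.column_stack([np.ones(n), x])
    resid = temp - X @ np.linalg.lstsq(X, temp, rcond=None)[0]
    return 1.0 - resid.var() / temp.var()

r2 = [fit_r2(lag) for lag in range(13)]
best_lag = int(np.argmax(r2))   # the 'best-fit' lag the method would adopt
```

With real data the best-fit lag will wobble from one GMT series to the next, which is exactly why the mix-and-match test suggested above would be informative.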
</span></div><div class="MsoNormal" style="line-height: normal; margin: 0cm 0cm 0pt; mso-layout-grid-align: none;"><span style="font-family: Arial, Helvetica, sans-serif;">One might plausibly argue for or against certain choices, but to do so would require a greater understanding of what the resulting signal is intended to measure, and this is not provided. The considerations are further muddled by combining surface temperatures with tropospheric temperatures. They are quite different things.</span></div><div class="MsoNormal" style="line-height: normal; margin: 0cm 0cm 0pt; mso-layout-grid-align: none;"><span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span></div><div class="MsoNormal" style="line-height: normal; margin: 0cm 0cm 0pt; mso-layout-grid-align: none;"><span style="font-family: Arial, Helvetica, sans-serif;">A clue to what FR11 intended to extract is seen in their interpretation of the results. They claim that </span></div><div class="MsoNormal" style="line-height: normal; margin: 0cm 0cm 0pt; mso-layout-grid-align: none;"><span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span></div><div class="MsoNormal" style="line-height: normal; margin: 0cm 0cm 0pt; mso-layout-grid-align: none;"><span style="font-family: Arial, Helvetica, sans-serif;">“<i style="mso-bidi-font-style: normal;">The resultant adjusted data show clearly, both visually and when subjected to statistical analysis, that the rate of global warming due to other factors (most likely these are exclusively anthropogenic) has been remarkably steady during the 32 years from 1979 through 2010.</i>”</span></div><div class="MsoNormal" style="line-height: normal; margin: 0cm 0cm 0pt; mso-layout-grid-align: none;"><span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span></div><div class="MsoNormal" style="line-height: normal; margin: 0cm 0cm 0pt; mso-layout-grid-align: none;"><span style="font-family: Arial, Helvetica, sans-serif;">This suggests that FR11 have some concept of an underlying global temperature trend, largely anthropogenic, that might be revealed if we could somehow contrive to run the latter half of the twentieth century over again and again to obtain some kind of ensemble average. This hypothetical – and unmeasurable – standpoint stands in opposition to the somewhat more literal interpretation that global warming is simply a rise in global mean temperature. Other definitions – many definitions – have been offered. The FR11 interpretation is roughly in accord with the mental model shared by many climate scientists, that global temperatures are essentially on the up, but overlain on this monotonic rise are more rapid fluctuations associated with a variety of factors.</span></div><div class="MsoNormal" style="line-height: normal; margin: 0cm 0cm 0pt; mso-layout-grid-align: none;"><span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span></div><div class="MsoNormal" style="line-height: normal; margin: 0cm 0cm 0pt; mso-layout-grid-align: none;"><span style="font-family: Arial, Helvetica, sans-serif;">Taking too narrow a view can sometimes make apparent headway. In this case, however, what to make of:</span></div><div class="MsoNormal" style="line-height: normal; margin: 0cm 0cm 0pt; mso-layout-grid-align: none;"><span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span></div><div class="MsoNormal" style="line-height: normal; margin: 0cm 0cm 0pt; mso-layout-grid-align: none;"><span style="font-family: Arial, Helvetica, sans-serif;">“<i style="mso-bidi-font-style: normal;">There is no indication of any slowdown or acceleration of global warming, beyond the variability induced by these known natural factors.</i>”</span></div><div class="MsoNormal" style="line-height: normal; margin: 0cm 0cm 0pt; mso-layout-grid-align: none;"><span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span></div><div class="MsoNormal" style="line-height: normal; margin: 0cm 0cm 0pt; mso-layout-grid-align: none;"><span style="font-family: Arial, Helvetica, sans-serif;">This latter question of whether or not there has been a slowdown in warming has mildly exercised all manner of minds over the past few years and it rather depends on precisely what is meant by global warming. The two factors discussed above prevent FR11 from definitively answering this question. First, what does this index measure and, second, how well does it measure it?</span></div>Nebuchadnezzarhttp://www.blogger.com/profile/06928587773464166613noreply@blogger.com0tag:blogger.com,1999:blog-887571814804276419.post-52762845606954279162011-11-13T15:13:00.003-06:002011-11-13T15:14:16.780-06:00Why Slow Climate?The modern world is speedy, but there is often a benefit in stepping back and taking a longer look at things. In science, in climate science maybe more than others, there is a benefit in realising that the right answer won't be immediately apparent and that no single fact will ever prove completely decisive to our understanding of the climate system. It pays to take it slow.<br />
<br />
<a name='more'></a><br />
A while ago, I came across the manifesto of the <a href="http://slow-science.org/">Slow Science</a> movement and much of what they said echoes what I had been thinking. It is not possible to make an instant judgement of how a single paper or new piece of research changes your own (let alone a wider) view of science. These things take time to bed in, to find their place and true significance. It is in their relationships to other pieces of knowledge and in discussions with others that they are guided to where eventually they settle.<br />
<br />
However, I don't agree with the opening line of their manifesto as a general truth about scientists, nor as a guiding principle: "We are scientists. We don’t blog. We don’t twitter. We take our time."<br />
<br />
Werner Heisenberg, whose scientific chops few could doubt, noted that "Science is rooted in conversations." Blogging and twittering are simply conduits for the conversations of our times. They won't replace the peer reviewed literature any time soon, but they do help to map out the web of inferences that connect all that information together. Conversations tug on those connections, testing their strength, working out by the most delicate of vibrations where the juiciest unclaimed bugs have landed.<br />
<br />
OK, don't blog if you don't want to. But taking one's time and blogging are not necessarily antithetical. Some fine scientists blog. One can tweet or blog on a subject one has thought about deeply and slowly and the discipline of putting those thoughts into words can often help guide and objectify that thinking. Also, scientists do very often make rapid (days rather than months) decisions - when they are asked to review papers and proposals, or when they attend conferences and enter into the public and private conversations that follow them.<br />
<br />
They even do it, in a more private way, every time they read a paper. The act of reading a paper is to make a series of judgements based on the evidence and argument provided in the paper against what is already known. Some people make notes about each paper as an aide memoire, others present their assessments in journal clubs and seminars or in more formal reviews. That's kind of what I want to do here: take the papers I've been reading and add my perspective to the range of views out there.<br />
<br />
Blogging is also an invitation to the wider world to discuss a particular subject in an open manner. In doing so one can help to work out where the rough edges are on good ideas, rapidly find where the greatest flaws are in the bad ones. And, to work out what's nitpicking and what's not. In a more open discussion environment you also encounter people from outside of the discipline you work in and convincing them - or at least trying to convince them - of a particular point that convinces you (and vice versa obviously) can add a dimension to your understanding of the physical world that you would not otherwise have found. I'm open to people from all realms of knowledge to come along and join in, but I might also ask you so many questions, you'll wonder why you bothered.<br />
<br />
This approach comes with caveats.<br />
<br />
First, there are few papers I genuinely feel are so good that their methods can't be criticised. I'm not reviewing the papers here so I won't necessarily make constructive comments about the paper.<br />
<br />
Second, though I try not to pick nits, I do. Nitpicking often serves as a proxy for something about the paper that bugs me, but that I can't quite put my finger on - an itch I can't scratch. Sometimes I figure it out, sometimes I don't, but I do occasionally revisit papers looking for the deeper scientific problem that irks me.<br />
<br />
Third, my view of climate science isn't broad. Some bits I know better than others so oftentimes when reading a paper I end up with a series of questions to follow up (usually more papers to read) rather than anything conclusive. The converse of this is that what I think of as a decisive counter point in favour of or fatal to a paper's conclusions may be different to yours.<br />
<br />
Fourth, my choice of papers is likely to send y'all to sleep. I have my interests, you have yours. I'll look at papers of wider interest when I read them.<br />
<br />
Fifth, I reserve the right to change my mind. I know this confuses some people. So I thought it only fair to warn you.Nebuchadnezzarhttp://www.blogger.com/profile/06928587773464166613noreply@blogger.com1tag:blogger.com,1999:blog-887571814804276419.post-25258109107266040062011-11-11T17:28:00.004-06:002011-11-13T07:37:01.116-06:00Ludecke, Link and Ewert 2011There's been some discussion (and discussion about the discussion) of a paper by <a href="http://www.eike-klima-energie.eu/uploads/media/How_natural.pdf">Ludecke et al. (2011)</a> which was published in the International Journal of Modern Physics C. A number of critiques have been written already so there's no burning reason to add to them, but this paper seems topical and though I didn't want to start out this blog writing about a weak paper, I've read it and I think it's weak and I thought I should say why.<br />
<br />
<a name='more'></a><br />
<strong>First, what did they do?</strong><br />
<br />
The authors consider long station records (50 or 100 years long). They detrend each record, estimate the Hurst coefficient of the detrended record and generate a large number of realisations of synthetic series with the same Hurst coefficient. They calculate the fraction of the synthetic series whose trends exceed that of the original station series (with its trend intact). They then count the number of stations where this fraction is greater than 5%. This number, considered as a fraction of all the stations, is interpreted as the probability that the n records of the group are natural.<br />
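A sketch of this surrogate test, with made-up data and the Hurst exponent taken as known rather than estimated from the detrended record as in the paper. The fractional Gaussian noise generator uses circulant embedding of the fGn autocovariance; this is my reconstruction, not the authors' code.

```python
import numpy as np

def fgn(n, H, rng):
    # Fractional Gaussian noise with Hurst exponent H, generated by
    # circulant embedding of the fGn autocovariance (Davies-Harte).
    k = np.arange(n + 1)
    g = 0.5 * ((k + 1.0)**(2*H) - 2.0 * k**(2.0*H) + np.abs(k - 1.0)**(2*H))
    row = np.concatenate([g, g[-2:0:-1]])        # circulant first row, length 2n
    lam = np.fft.fft(row).real.clip(min=0.0)     # eigenvalues (>= 0 for fGn)
    z = rng.standard_normal(2*n) + 1j * rng.standard_normal(2*n)
    return np.fft.fft(np.sqrt(lam / (2*n)) * z)[:n].real

rng = np.random.default_rng(4)

# A made-up 100-year annual record: a small imposed trend plus
# long-range-correlated noise with H = 0.65.
n, H = 100, 0.65
years = np.arange(n)
record = 0.006 * years + 0.25 * fgn(n, H, rng)

coefs = np.polyfit(years, record, 1)             # [slope, intercept]
resid = record - np.polyval(coefs, years)

# How often do surrogates with the same H (rescaled to the residual
# standard deviation) show a trend at least as large as the observed one?
surr_slopes = []
for _ in range(1000):
    y = fgn(n, H, rng)
    y *= resid.std() / y.std()
    surr_slopes.append(np.polyfit(years, y, 1)[0])
frac = np.mean(np.abs(surr_slopes) >= abs(coefs[0]))
```

Note that this fraction is a statement about the data given the model, a point taken up in the comments below.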
<br />
In addition they stratify station records by altitude and by population, and find that station trends increase with both altitude and population.<br />
<br />
<strong>What did they conclude?</strong><br />
<br />
There were four main conclusions: 
1) The mean trend for their selected stations was 0.58C from 1906-2005. For stations with populations below 1000 and within 800m of sea level the trend was 0.41C. 
2) They "<span style="font-family: CMR10;"><span style="font-family: CMR10;"><em>evaluated - with a confidence interval of 95% - the probability that the observed global warming from 1906 to 2005 was a natural fluctuation as lying between 40% and 70%, depending on the station's characteristics. For the period of 1906 to 1955 the probabilities are arranged between 80% and 90% and for 1956 to 2005 between 60% and 70%</em>."</span></span><br />
<span style="font-family: CMR10;"><span style="font-family: CMR10;">3) A strong UHI and altitude effect is seen in station trends.</span></span><br />
<span style="font-family: CMR10;"><span style="font-family: CMR10;">4) A real Hurst exponent for the Earth could be larger than 0.63.</span></span><br />
<br />
<strong>Comments</strong><br />
<br />
I think the paper is flawed in a number of fundamental ways that make some of the more detailed criticisms almost irrelevant. Just a note here that, as always, an incorrect or weak method does not mean that their conclusions are incorrect, only that the analysis does not support those conclusions.<br />
<br />
0) My initial impression is that the analysis is heavily predisposed to the finding that temperature variations are more likely natural than not. The conclusions given the basic set up are highly likely and therefore (to my mind) highly <strong>un</strong>likely to be informative. The authors dismiss global average records because they have unrealistically small standard deviations (I'll come back to this minor quibble later) and insist that local records are the place to look. At a local level, the natural variability will be much larger than any anthropogenic factors so they're maximising one signal at the expense of the other.<br />
<br />
1) The records are split into a trend and the variability around that trend. The trend is labelled 'not natural' and the variability around it 'natural'. If these were just labels the analysis would be uninteresting. However, the authors identify the 'not natural' trend with anthropogenic factors and the variations around the trend with natural variability.<br />
<br />
This is problematic because part of the natural variation in the real world is not random. Volcanoes occurred when they occurred and the sun wobbled when it wobbled; these things didn't happen at random, but they are natural. Likewise, the 'anthropogenic factors' are not obviously a second-order polynomial. Splitting the data up in this way will inevitably muddle natural and anthropogenic factors, and treating the natural variability as random ignores some of the actual natural variations.<br />
<br />
2) However, splitting the data this way means that when they calculate the fraction of synthetic series (with an H-coefficient equal to that of the detrended station record) whose trends exceed the observed one, they can interpret this as the probability that the record arises from natural causes. This is not correct on two counts. First, as mentioned above, 'natural' is simply a label. Second, what they have calculated is the probability of seeing a trend of that size in a synthetic series with that H-coefficient (P(data|H)). In order to calculate the probability that the record was generated by this particular statistical model they need to know the prior probability that the series was generated by this model (let's call that P(H)). Then, using Bayes' theorem:<br />
<br />
P(H|data) = P(data|H)*P(H)/P(data)<br />
<br />
[In addition, it would be useful to know what the grounds were for assuming that natural variability takes that particular form. Is a series with that particular H-coefficient a good model for natural variability? This is hard to justify because the same data are used to estimate the model parameters as the model is then compared against. They could split the data into training and test sets, but they already have perilously few data for accurately assessing H. However, even if we assume that this is a valid thing to do (which one can't simply dismiss <em>a priori</em>: it is a hypothesis) the analysis is still flawed.]<br />
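To make the point concrete, here is a toy calculation with invented numbers (the 5% likelihood under the natural model and 80% under the alternative are pure illustration): the posterior P(H|data) moves substantially with the prior P(H) even though P(data|H) is held fixed.

```python
def posterior(p_data_given_h, prior_h, p_data_given_alt):
    # Bayes' theorem with P(data) expanded over the hypothesis H
    # ("natural") and a single alternative ("not natural"):
    #   P(H|data) = P(data|H) P(H) / P(data)
    p_data = (p_data_given_h * prior_h
              + p_data_given_alt * (1.0 - prior_h))
    return p_data_given_h * prior_h / p_data

# Suppose a trend this large arises 5% of the time under the natural
# model and 80% of the time under an anthropogenic alternative.
print(posterior(0.05, 0.5, 0.8))  # prior 50% natural -> posterior ~0.059
print(posterior(0.05, 0.9, 0.8))  # prior 90% natural -> posterior 0.36
```

So the 5% figure alone pins down nothing: without P(H) (and a likelihood for the alternative), the fraction of synthetic series exceeding the observed trend cannot be converted into the probability that the record is natural.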
<br />
3) The next step, counting stations with natural and non-natural trends, is referred to in the conclusions, but the interpretation the authors place on these counts (expressed as a fraction of all stations) as the "<span style="font-family: CMR9;"><span style="font-family: CMR9;">probability that the observed temperature course is natural</span></span>" is not correct. It is not clear what the correct way to combine the individual station records is, because their 'model' does not include information about how correlated individual station records might be. Therefore, their conclusion based on these probabilities is unsupported by the text.<br />
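A quick simulation (entirely my own construction, with arbitrary parameters) illustrates the correlation problem: give each 'station' a shared component, and the count of 5%-level trend exceedances scatters far more from one realisation to the next than the binomial spread implied by treating stations as independent, so the count cannot simply be read off as a probability.

```python
import random
import statistics

def trend(series):
    # OLS slope of a series against its time index.
    n = len(series)
    mx = (n - 1) / 2.0
    my = sum(series) / n
    num = sum((t - mx) * (y - my) for t, y in enumerate(series))
    return num / sum((t - mx) ** 2 for t in range(n))

def count_spread(shared_frac, n_stations=25, length=50,
                 n_real=200, seed=1):
    # Std dev, across realisations, of the number of stations whose
    # trend exceeds the two-sided 5% threshold for unit-sd white noise.
    # Each station mixes a shared series with its own noise so that the
    # marginal variance (and hence the per-station exceedance rate of
    # 5%) is the same for every value of shared_frac.
    rng = random.Random(seed)
    den = sum((t - (length - 1) / 2.0) ** 2 for t in range(length))
    thresh = 1.96 / den ** 0.5
    own_frac = (1.0 - shared_frac ** 2) ** 0.5
    counts = []
    for _ in range(n_real):
        shared = [rng.gauss(0, 1) for _ in range(length)]
        c = 0
        for _ in range(n_stations):
            own = [rng.gauss(0, 1) for _ in range(length)]
            station = [shared_frac * s + own_frac * o
                       for s, o in zip(shared, own)]
            if abs(trend(station)) > thresh:
                c += 1
        counts.append(c)
    return statistics.pstdev(counts)

# Independent stations scatter like a binomial count; shared
# variability inflates the spread between realisations.
print(count_spread(0.0), count_spread(0.8))
```

The marginal exceedance rate is 5% in both cases; only the dependence between stations differs, and it is exactly that dependence the paper's counting argument ignores.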
<br />
Looking at Table 2 and Figure 10, one might be forgiven for thinking that much of the behaviour seen <strong>cannot</strong> be explained by the simple Hurst model used. Between 28% and 48% of stations had trends over 1956 to 2005 that would occur less than 5% of the time under their Hurst model - 'not natural' by their own designation. Figure 10 shows that the observed trends at the stations are inconsistent with the theoretical distributions. This they acknowledge:<br />
<br />
"<em>In the warming domain, the gap between the theoretical line for alpha=0.75 and the A7 graph in Figure 10 could be explained by warming due to anthropogenic greenhouse gases, an additionally warm biases caused by authoritative alterations of the station characteristics beginning in the 1970s, and, most notably, the activity of the sun. However, all these impacts can not be separated from each other</em>"<br />
<br />
The difficulties of interpreting their results are not carried through to the conclusions, nor to the abstract, and their strongest conclusions are therefore not supported by their analysis. For example:<br />
<br />
<div align="left">"<em>As a result, the probabilities that the observed temperature series are natural have values roughly between 40% and 90%, depending on the stations characteristics and the periods considered. 'Natural' means that we do not have within a defined confidence interval a definitely positive anthropogenic </em><em>contribution and, therefore, only a marginal anthropogenic contribution can not be excluded</em>."</div><div align="left"><br />
</div><div align="left"><u>More minor comments</u></div><div align="left"><br />
</div><div align="left">The most major of the more minor comments concerns station selection. Firstly they select long station series, which is sensible given their aim, but by doing so, they select stations that are not globally representative. The stations are predominantly in the northern hemisphere mid latitudes. This limits the conclusions they can draw about global temperatures.</div><div align="left"><br />
</div><div align="left">Secondly, station distribution is not controlled for in their analysis of urbanisation and altitude. They stratify the stations according to population and altitude and attribute differences in the mean trends of these subgroups purely to population and altitude. However, this conclusion cannot be drawn without knowing how these stratifications affect the geographical distribution of stations within their sample. Other analyses looking at urbanisation (and other siting effects) have typically established equivalent networks of rural and urban stations e.g. by pairing off rural and urban stations that are close to one another and might therefore be expected to have similar trends.</div><div align="left"><br />
</div><div align="left">A quibble here about their references to global temperature series. They claim that</div><div align="left"><br />
</div><div align="left">"<span style="font-family: CMR10; font-size: x-small;"><span style="font-family: CMR10; font-size: x-small;"><em><span style="font-size: small;">We argue that global records </span><span style="font-size: small;">are not a feasible tool to examine global temperatures. First of all, the homogenization </span><span style="font-size: small;">and grid box procedures used to establish global records are inevitably </span><span style="font-size: small;">arbitrary and, therefore, could cause quite different outcomes when applied to i</span><span style="font-size: small;">dentical sets of local records. Secondly, and of main interest here, establishing </span><span style="font-size: small;">global records attenuate the extremes of uncorrelated local records. As a consequence, </span><span style="font-size: small;">the standard deviation, which is a decisive attribute in our analysis, </span></em><span style="font-size: small;"><em>becomes unrealistically small in global records</em>.</span></span></span>"</div><div align="left"><br />
</div><div align="left">On page 13 they note</div><div align="left"><br />
</div><span style="font-family: CMR10;"><span style="font-family: CMR10;"></span></span><br />
<div align="left">"<em>The probable reason for the low standard deviation was already noted in the introduction (paragraph 1)</em>."</div><div align="left"><br />
</div><div align="left">Paragraph 1 was what I quoted above. </div><div align="left"><br />
</div><span style="font-family: CMR10;"><span style="font-family: CMR10;"></span></span><br />
<div align="left">"<em>However, GISS gives no further details about the procedures to establish its global records </em><em>except for an annotation about the elimination of outliers and homogeneity </em><em>adjustments. Global records should therefore be considered with care because </em><em>they could depend on an inevitably arbitrary selection and different algorithms."</em></div><div align="left"><br />
</div><div align="left">The GISS code and data are all publically available, so the first part of this statement is untrue. The question of arbitrary selection and diferent algorithms is an interesting one. It is trivially true that truly 'arbitrary' selection of stations and adjustment would lead to different global temperature records, but in practice these procedures are not entirely arbitrary and the four major analyses (GISS, NOAA, Berkeley and CRU) that process the data in quite different ways arrive at very similar global averages (and standard deviations) for land stations during the period considered in this analysis. Furthermore, the authors then draw on four global temperature series to support the contention that global tempertures began to drop in 1998 (GISS, HadCRU, RSS and UAH on page 14).</div><div align="left"><br />
</div><div align="left">The argument against a low standard deviation is not elaborated.</div><div align="left"><br />
</div><div align="left">Besides noting that a trend will bias the estimation of the Hurst coefficient too high, the Hurst coefficient estimation is treated as unproblematic. I would be interested to know what the uncertainty on the estimation actually was as this would generally (I would guess) widen their range of natural trends.</div><div align="left"><br />
</div><div align="left"><strong>In summary</strong></div><div align="left"><br />
</div><div align="left">As you have probably gathered I don't think the papers conclusions are supported well by their analysis. Appropriate controls are lacking in their UHI analysis and there are mistakes in both their intrepretation of their calculated probabilities and their argument moving from individual stations records to global conclusions.</div><div align="left"><br />
</div><div align="left">If I was reviewing the paper I would have rejected it on the grounds that the analysis needs a fundamental rethink. As it is their chain of argument is broken. Even if it were fixed I think it unlikely that it can say anything decisive about the various roles played by natural, anthropogenic and confounding factors on global temperature.</div>Nebuchadnezzarhttp://www.blogger.com/profile/06928587773464166613noreply@blogger.com2tag:blogger.com,1999:blog-887571814804276419.post-63048833304642105432011-05-08T11:11:00.006-06:002011-11-13T07:43:42.507-06:00The Poverty of SkepticismStrange is the world in which skeptics are defined by that which they believe rather than by the route they take to knowledge.<br />
<br />
<a name='more'></a><br />
Strange too is a world in which one might take especial pride in being skeptical. Skepticism is a faculty that all adults should possess, but like the ability to tie one's own shoes, public pride in such a skill is suitable only for children. Like most adults, I would say that in so much as I do not accept immediately all claims placed before me, I am skeptical. But I do not want to say by this small boast that my claims have any special merit. Not all have been examined closely and some, having a shape too familiar or pleasing, may have passed beneath notice and settled altogether too comfortably in mind. Others, having a taste disagreeable to my constitution, might have left me, like the child that refuses the bitter medicine, deficient in some vitamin of thought, at distinct disease in the consonance of my ideas.<br />
<br />
If you would offer correction to my malady, think to sugar it well. If we agree too well, be not surprised by my indifference.<br />
<br />
If there is one rule that I would live by, it is this: do not make skepticism your only virtue. True, the scientist's mind should not be a place where ideas are sustained beyond their time, lingering boorishly like the last late guest at a dinner party, draining the hosts' last reserves of brandy and goodwill. But equally the scientist's mind should not be a place where ideas go only to die.<br />
<br />
The poverty of skepticism as a sole creed is that it is purely destructive.<br />
<br />
The gardener who prunes but never plants is master only of a wasteland. Most ideas must weather the early frosts of indifference and so the open mind must be like the fertile earth and sheltered garden, the skeptic a gardener in it. Nurturing the tender shoots, with wisdom and experience trimming back eruptions of the older growth to preserve its vigour, or knowing when the time has come to fell the tree that once provided shade to toil beneath in the summer sun. And now and then he takes the sharp blade and digs deep for the pale and fatal roots of the weeds that are the cost. Ruthlessly to track down every fragment to drag them up and burn them.<br />
<br />
Such slow tasks have no end nor can attain but passing likeness of perfection. Yet there, in the garden, blooms a rose. Its mottled stem and bee-beguiling scent no different from the rest, and only the gardener knows that it has seen from bud, to thorn, to flower, a hundred summers and a hundred winters more than he.