Friday, 11 November 2011

Ludecke, Link and Ewert 2011

There's been some discussion (and discussion about the discussion) of a paper by Ludecke et al. (2011), published in the International Journal of Modern Physics C. A number of critiques have already been written, so there's no burning reason to add to them. But the paper is topical and, though I didn't want to start this blog by writing about a weak paper, I've read it, I think it's weak, and I thought I should say why.

First, what did they do?

The authors consider long station records (50 or 100 years long). They detrend each record, estimate the Hurst coefficient of the detrended record and generate a large number of realisations of synthetic series with the same Hurst coefficient. They calculate the fraction of the synthetic series whose trends exceed that of the original station series (with its trend intact). They then count the number of stations where this fraction is greater than 5%. This number, expressed as a fraction of all the stations, is interpreted as the probability that the n records of the group are natural.

In addition, they stratify station records by station altitude and by population, and find that station trends increase with altitude and population.
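
To make the procedure concrete, here is a minimal Python sketch of the Monte Carlo step as I've described it. It is not the authors' code. It assumes linear detrending, takes the Hurst coefficient as given rather than estimating it, uses fractional Gaussian noise (drawn via a Cholesky factor of its covariance) as the synthetic model, scales the synthetic series to the standard deviation of the detrended record, and counts one-sided exceedances. The function names and defaults are mine.

```python
import numpy as np

def fgn_covariance(n, hurst):
    """Covariance matrix of unit-variance fractional Gaussian noise (assumed model)."""
    k = np.arange(n)
    gamma = 0.5 * (np.abs(k + 1.0) ** (2 * hurst)
                   - 2.0 * np.abs(k) ** (2 * hurst)
                   + np.abs(k - 1.0) ** (2 * hurst))
    return gamma[np.abs(k[:, None] - k[None, :])]

def linear_trend(series):
    """Least-squares slope per time step for a 1-D series or each row of a 2-D array."""
    series = np.atleast_2d(np.asarray(series, dtype=float))
    t = np.arange(series.shape[1]) - (series.shape[1] - 1) / 2.0  # centred time axis
    return series @ t / (t @ t)

def exceedance_fraction(record, hurst, n_synthetic=2000, seed=0):
    """Fraction of synthetic series whose trend is at least the observed trend."""
    record = np.asarray(record, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(record)
    t = np.arange(n)
    observed = linear_trend(record)[0]
    # Detrend (linear fit, an assumption) to get the scale of the 'natural' part.
    residual = record - np.polyval(np.polyfit(t, record, 1), t)
    chol = np.linalg.cholesky(fgn_covariance(n, hurst))
    synthetic = residual.std() * (rng.standard_normal((n_synthetic, n)) @ chol.T)
    return float(np.mean(linear_trend(synthetic) >= observed))

if __name__ == "__main__":
    # Hypothetical record: a small trend plus white noise, purely for illustration.
    rng = np.random.default_rng(1)
    record = 0.005 * np.arange(100) + rng.standard_normal(100)
    print(exceedance_fraction(record, hurst=0.6))
```

A station would then be counted as 'natural' when this fraction is greater than 0.05.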

What did they conclude?

There were four main conclusions:
1) The mean trend for their selected stations was 0.58C from 1906-2005. For stations with populations below 1000 and within 800m of sea level the trend was 0.41C.
2) They "evaluated - with a confidence interval of 95% - the probability that the observed global warming from 1906 to 2005 was a natural fluctuation as lying between 40% and 70%, depending on the station's characteristics. For the period of 1906 to 1955 the probabilities are arranged between 80% and 90% and for 1956 to 2005 between 60% and 70%."
3) A strong UHI and altitude effect is seen in station trends.
4) A real Hurst exponent for the Earth could be larger than 0.63.


I think the paper is flawed in a number of fundamental ways that make some of the more detailed criticisms almost irrelevant. Just a note here that, as always, an incorrect or weak method does not mean that their conclusions are incorrect, only that the analysis does not support those conclusions.

0) My initial impression is that the analysis is heavily predisposed to the finding that temperature variations are more likely natural than not. The conclusions given the basic set up are highly likely and therefore (to my mind) highly unlikely to be informative. The authors dismiss global average records because they have unrealistically small standard deviations (I'll come back to this minor quibble later) and insist that local records are the place to look. At a local level, the natural variability will be much larger than any anthropogenic factors so they're maximising one signal at the expense of the other.

1) The records are split into a trend and the variability around that trend. The trend is called 'not natural' and the variability around it is called 'natural'. If these were just labels the analysis would be uninteresting. However, the authors associate the 'not natural' trend with anthropogenic factors and the 'natural' variations around the trend with natural variability.

This is problematic because part of the natural variation in the real world is not random. Volcanoes erupted when they erupted and the sun wobbled when it wobbled. These things didn't happen at random, but they are natural. Likewise the 'anthropogenic factors' are not obviously a second order polynomial. Splitting the data up in this way will inevitably muddle natural and anthropogenic factors, and treating the natural variability as random ignores some of the actual natural variations.

2) However, splitting the data this way means that when they calculate the fraction of synthetic series (generated with an H-coefficient equal to that of the detrended station record) whose trends exceed the observed trend, they can interpret this as the probability that the record arises from natural causes. This is not correct on two counts. First, as mentioned above, 'natural' is simply a label. Second, what they have calculated is the probability of seeing a trend of that size in a synthetic series with that H coefficient (P(data|H)). In order to calculate the probability that the record was generated using this particular statistical model, they need to know the prior probability that the series was generated using this model (let's call that P(H), where H now stands for the model itself). Then using Bayes' theorem:

P(H|data) = P(data|H)*P(H)/P(data)
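
A small numerical sketch shows how much the prior matters. The numbers are invented purely for illustration, and it assumes only two exhaustive alternatives (the Hurst model H and "anything else"), treating the tail fraction loosely as a likelihood; the function name is mine.

```python
def posterior(p_data_given_h, prior_h, p_data_given_alt):
    """P(H|data) via Bayes' theorem for two exhaustive hypotheses, H and not-H."""
    # P(data) expanded over the two hypotheses.
    evidence = p_data_given_h * prior_h + p_data_given_alt * (1.0 - prior_h)
    return p_data_given_h * prior_h / evidence

# Hypothetical values: P(data|H) = 0.05 and P(data|not-H) = 0.5.
for prior in (0.1, 0.5, 0.9):
    print(prior, round(posterior(0.05, prior, 0.5), 3))
```

The same P(data|H) gives very different values of P(H|data) depending on P(H) and on how well the alternative explains the data, which is why P(data|H) cannot be read as the probability that the record is natural.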

[In addition, it would be useful to know what the grounds were for assuming that natural variability takes that particular form. Is a series with that particular H-coefficient a good model for natural variability? The grounds for this are hard to justify because the same data are used to estimate the model parameters as the modelled data are being compared to. They could split the data into training and test sets, but they already have perilously few data for accurately assessing H. However, even if we assume that this is a valid thing to do (which one can't simply dismiss a priori: it is an hypothesis) the analysis is still flawed.]

3) The next step, counting the stations with natural and non-natural trends, is referred to in the conclusions, but the authors' interpretation of these counts (expressed as a fraction of all stations) as the "probability that the observed temperature course is natural" is not correct. It is not clear what the correct way to combine the individual station records is, because their 'model' does not include information about how correlated individual station records might be. Therefore, their conclusion based on these probabilities is unsupported by the text.
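
A toy simulation shows why the correlation between stations matters. It is a sketch under assumptions that are mine, not the paper's: each station's trend is reduced to a standard normal score, every pair of stations has the same correlation rho through a shared component, and a station is flagged when its score falls in the upper 5% tail. The number of stations is hypothetical.

```python
import numpy as np

def flagged_fraction_spread(n_stations, rho, n_worlds=5000, seed=0):
    """Mean and standard deviation, across simulated worlds, of the fraction of
    stations whose trend score lies in the upper 5% tail."""
    rng = np.random.default_rng(seed)
    common = rng.standard_normal((n_worlds, 1))        # shared by all stations in a world
    own = rng.standard_normal((n_worlds, n_stations))  # station-specific part
    score = np.sqrt(rho) * common + np.sqrt(1.0 - rho) * own  # unit variance, pairwise corr rho
    threshold = 1.6448536269514722  # upper 5% point of the standard normal
    fraction = (score > threshold).mean(axis=1)
    return float(fraction.mean()), float(fraction.std())

if __name__ == "__main__":
    for rho in (0.0, 0.5):
        print(rho, flagged_fraction_spread(500, rho))
```

With independent stations the flagged fraction sits tightly around 5% in every simulated world. With correlated stations the average is still about 5%, but any single world can show far more or far fewer flagged stations, so the count alone says little without a model of the correlation.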

Looking at Table 2 and Figure 10, one might be forgiven for thinking that actually much of the behaviour seen cannot be explained by the simple Hurst model used. Between 28% and 48% of stations between 1956 and 2005 had a trend that occurred less than 5% of the time based on their Hurst model - 'not natural' by their designation. Figure 10 shows that the observed trends at the stations are inconsistent with the theoretical distributions. This they acknowledge:

"In the warming domain, the gap between the theoretical line for alpha=0.75 and the A7 graph in Figure 10 could be explained by warming due to anthropogenic greenhouse gases, an additionally warm biases caused by authoritative alterations of the station characteristics beginning in the 1970s, and, most notably, the activity of the sun. However, all these impacts can not be separated from each other"

The difficulties of interpreting their results are not carried through to the conclusions or to the abstract, and their strongest conclusions are therefore not supported by their analysis. For example:

"As a result, the probabilities that the observed temperature series are natural have values roughly between 40% and 90%, depending on the stations characteristics and the periods considered. 'Natural' means that we do not have within a defined confidence interval a definitely positive anthropogenic contribution and, therefore, only a marginal anthropogenic contribution can not be excluded."

More minor comments

The most major of the more minor comments concerns station selection. Firstly, they select long station series, which is sensible given their aim, but by doing so they select stations that are not globally representative. The stations are predominantly in the northern hemisphere mid-latitudes. This limits the conclusions they can draw about global temperatures.

Secondly, station distribution is not controlled for in their analysis of urbanisation and altitude. They stratify the stations according to population and altitude and attribute differences in the mean trends of these subgroups purely to population and altitude. However, this conclusion cannot be drawn without knowing how these stratifications affect the geographical distribution of stations within their sample. Other analyses looking at urbanisation (and other siting effects) have typically established equivalent networks of rural and urban stations e.g. by pairing off rural and urban stations that are close to one another and might therefore be expected to have similar trends.

A quibble here about their references to global temperature series. They claim that

"We argue that global records are not a feasible tool to examine global temperatures. First of all, the homogenization and grid box procedures used to establish global records are inevitably arbitrary and, therefore, could cause quite different outcomes when applied to identical sets of local records. Secondly, and of main interest here, establishing global records attenuate the extremes of uncorrelated local records. As a consequence, the standard deviation, which is a decisive attribute in our analysis, becomes unrealistically small in global records."

On page 13 they note

"The probable reason for the low standard deviation was already noted in the introduction (paragraph 1)."

Paragraph 1 was what I quoted above.

"However, GISS gives no further details about the procedures to establish its global records except for an annotation about the elimination of outliers and homogeneity adjustments. Global records should therefore be considered with care because they could depend on an inevitably arbitrary selection and different algorithms."

The GISS code and data are all publicly available, so the first part of this statement is untrue. The question of arbitrary selection and different algorithms is an interesting one. It is trivially true that truly 'arbitrary' selection of stations and adjustment would lead to different global temperature records, but in practice these procedures are not entirely arbitrary, and the four major analyses (GISS, NOAA, Berkeley and CRU) that process the data in quite different ways arrive at very similar global averages (and standard deviations) for land stations during the period considered in this analysis. Furthermore, the authors then draw on four global temperature series to support the contention that global temperatures began to drop in 1998 (GISS, HadCRU, RSS and UAH on page 14).

The argument against a low standard deviation is not elaborated.

Besides noting that a trend will bias the estimation of the Hurst coefficient too high, the Hurst coefficient estimation is treated as unproblematic. I would be interested to know what the uncertainty on the estimation actually was as this would generally (I would guess) widen their range of natural trends.
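
To give a feel for how sensitive the range of 'natural' trends is to the Hurst coefficient, the sketch below computes the exact standard deviation of a least-squares trend fitted to unit-variance fractional Gaussian noise. Fractional Gaussian noise and the 100-step record length are my assumptions for illustration, not necessarily the authors' synthetic model, and the function name is mine.

```python
import numpy as np

def trend_std(n, hurst):
    """Standard deviation of the least-squares slope (per time step) fitted to
    n points of unit-variance fractional Gaussian noise (assumed model)."""
    k = np.arange(n)
    gamma = 0.5 * ((k + 1.0) ** (2 * hurst)
                   - 2.0 * k ** (2 * hurst)
                   + np.abs(k - 1.0) ** (2 * hurst))
    cov = gamma[np.abs(k[:, None] - k[None, :])]
    t = k - (n - 1) / 2.0   # centred time axis
    w = t / (t @ t)         # the slope is the linear combination w @ series
    return float(np.sqrt(w @ cov @ w))

if __name__ == "__main__":
    for h in (0.5, 0.6, 0.7, 0.8):
        print(h, trend_std(100, h))
```

Because the spread of trends grows with the Hurst coefficient, an uncertain estimate of the coefficient translates directly into an uncertain range of 'natural' trends.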

In summary

As you have probably gathered, I don't think the paper's conclusions are well supported by their analysis. Appropriate controls are lacking in their UHI analysis, and there are mistakes both in their interpretation of their calculated probabilities and in their argument moving from individual station records to global conclusions.

If I were reviewing the paper I would have rejected it on the grounds that the analysis needs a fundamental rethink. As it is, their chain of argument is broken. Even if it were fixed, I think it unlikely that it could say anything decisive about the various roles played by natural, anthropogenic and confounding factors in global temperature.
