2 thoughts on “The influence of station density on climate data homogenization”

  1. This is a very worthwhile paper and the first of its type that I am aware of which has considered the issue of station density in such a systematic way. The basic concept of using the full homogenised data set as a reference is sound, at least as a benchmark.

    My only caution is that the results presented in the paper are likely to be specific to the regions concerned. Peru and Switzerland both have complex topography, so correlation length scales would be expected to be relatively short by global standards. (By way of comparison, the density of temperature stations in Australia is about one per 10,000 square kilometres, even sparser than Peru, but the generally flat terrain means that correlation length scales are longer.) Peru has the added complication of a strong ENSO signal on temperatures, including on coastal temperature gradients (although it is unclear from the paper how close the study area is to the coast). In fact, it is hard to imagine a more geographically challenging region for data homogenisation in general.

    The authors also note that correlation length scales are generally shorter in Peru than in Switzerland, and note that this is typical of the tropics. I note here that the correlations are aggregated across the entire year. My experience with tropical Australian data is that (a) there is very strong seasonality in correlation length scales for minimum temperature – in some cases, typical distances for decay to 0.6 correlation range from 200-300km in the wet season to 1000-1500km in the dry season and (b) low daily minimum temperatures in the tropical wet season generally occur during precipitation events (especially significant thunderstorms), not as a result of radiational cooling, so it is not surprising that minimum temperature correlations become comparable to those for precipitation during the wet season.
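As a rough illustration of what "distance for decay to 0.6 correlation" means, here is a minimal sketch. It assumes an exponential decay model, r = exp(-d / L), which is an assumption made here for illustration and not something taken from the paper; the function name and the sample numbers are hypothetical. Running it separately on wet-season and dry-season station pairs would expose the kind of seasonality described above.

```python
import math

def decay_distance(pairs, target=0.6):
    """Distance at which correlation falls to `target`.

    `pairs` is a list of (distance_km, correlation) tuples for station
    pairs, with all correlations strictly between 0 and 1.
    Assumes r = exp(-d / L) and fits L by least squares through the
    origin on ln(r) versus d.
    """
    num = sum(d * math.log(r) for d, r in pairs)
    den = sum(d * d for d, _ in pairs)
    slope = num / den            # estimate of -1 / L
    length_scale = -1.0 / slope  # e-folding distance L in km
    return -length_scale * math.log(target)

# Synthetic station pairs generated with L = 500 km (illustrative only).
sample_pairs = [(d, math.exp(-d / 500.0)) for d in (100, 200, 400, 800)]
distance_06 = decay_distance(sample_pairs)
```

With real data the fit would be noisy, and a different decay model may well describe the correlations better.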

    (It is interesting that most of the lowest correlations in the Swiss data set, especially for maximum temperature, are in the 50-150km range; presumably this reflects the geographic distribution of mountain/valley station pairs, which I imagine would be the ones in Switzerland with the weakest correlations, all other things being equal?).

    Something which is mentioned only in passing is that one of the largest benefits of a homogenised data set is that it greatly reduces the spread of station trends, producing a more spatially coherent set of results, as shown in Figure 4.

    The result that the automatic method performed better (or, perhaps more accurately, less badly) on the sparse data set than the dense data set is interesting, and could perhaps use some more exploration.

    Impact on the larger scientific community. [80]

    As the first major paper of its type, this will be an important contribution to what can and cannot realistically be done in the homogenisation of sparse networks, which are particularly common in developing countries.

    Contribution to the scientific field of the journal. [90]

    This paper is definitely relevant to the journal.

    The technical quality of the paper. [90]

    The paper appears to be technically sound.

    Importance at the time of publishing. [-]

    Not relevant. New paper.

    Importance of the research program. [-]

    Not relevant. Single paper.


    1. Thank you for your assessment. I agree that when looking to transfer these results to other regions it would be better to look at typical correlations between stations rather than simply at station density.

      In an upcoming paper (the manuscript is available on EarthArXiv) Ralf Lindau and I propose a method to estimate the signal-to-noise ratio (SNR) and the number of breaks for a difference time series. The SNR is the standard deviation of the break signal divided by the standard deviation of the noise. These numbers will hopefully be an accurate way to estimate how difficult it is to homogenise a network.

      These estimates are just for a difference time series: one would still need to choose which reference series to include when computing a network-wide average. Comparing SNR and number of breaks is still apples and oranges. It would be valuable to have a measure that combined both, but that would likely be more specific to the homogenisation method.
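The SNR definition above (standard deviation of the break signal divided by standard deviation of the noise) can be sketched as follows. This is not the estimation method from the manuscript: it assumes the break positions are already known, which in practice they are not, and it takes the break signal to be the step function of segment means. The function name and the synthetic series are hypothetical.

```python
import random
from statistics import fmean, pstdev

def snr_of_difference_series(diff, break_positions):
    """SNR of one difference series, given known break positions.

    The break signal is approximated by the mean of each segment
    between consecutive breaks; the noise is what remains.
    """
    edges = [0, *sorted(break_positions), len(diff)]
    signal = []
    for start, stop in zip(edges[:-1], edges[1:]):
        segment = diff[start:stop]
        signal.extend([fmean(segment)] * len(segment))
    noise = [d - s for d, s in zip(diff, signal)]
    return pstdev(signal) / pstdev(noise)

# Synthetic difference series: one break of size 1.0 at index 50,
# plus Gaussian noise with standard deviation 1.0 (illustrative only).
rng = random.Random(0)
series = [(0.0 if i < 50 else 1.0) + rng.gauss(0.0, 1.0) for i in range(100)]
snr = snr_of_difference_series(series, [50])
```

A low value here means the breaks are small relative to the noise in the difference series, which is the situation expected in sparse, weakly correlated networks.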

