What is OPTyymm.AVE?
---- -- ------------

This directory contains the optimal averages computed while running the
Reanalysis.  The data are in files "OPTyymm.AVE", where yymm are the year
and month.  The contents of the files are semi-documented in the program
"OPTAVE.F".  If you have further questions, contact Lev Gandin
(wd20lg@sun1.wwb.noaa.gov).

----------------------------------------------------------------------------

Optimal Averaging
------- ---------

As a means to increase the possibility of climate-change detection, the
averaging of basic meteorological fields over selected areas was
incorporated into the reanalysis procedures.  A method known as optimal
averaging (OAv; Kagan, 1979), which assures a minimum RMS averaging error
(under the assumption that the underlying statistics on the correlation
function are exact) and provides this minimal error as a by-product, has
been used in the course of the reanalysis.  Fundamentally, optimal
averaging is analogous to the well-known method of optimum interpolation
(OI) widely used in the objective analysis of meteorological fields.

The OAv application in the reanalysis was preceded by a large series of
numerical experiments on OAv, performed at NCEP as part of the work on the
reanalysis project, using both semi-analytical solutions for a rather
simplified OAv model (Gandin, 1993) and a numerical quadrature approach
under more realistic assumptions.  In the course of these experiments, the
OAv performance was compared with that of the usual arithmetic averaging
(AAv) in its dependence on various parameters of the averaging.  The main
conclusions may be formulated as follows:

1. Except for very small domains, OAv is substantially more accurate than
   AAv.  The accuracy increase is particularly high if deviations from the
   forecast first guess are averaged, rather than the values themselves or
   their deviations from climatology (anomalies).

2. For a given domain, the OAv accuracy quickly increases with an
   increasing number N of observation points, until N becomes large enough
   that a further increase does not practically influence the averaging
   accuracy.

3. Inhomogeneities in the pattern of observation points over a domain have
   less effect on the OAv accuracy than on the AAv accuracy.  At the same
   time, the OAv accuracy is very sensitive (although still less so than
   AAv) to violations of the symmetry of the observation pattern with
   respect to the domain.

4. Inaccuracies in the underlying statistics (variances, correlation
   functions, RMS observation errors) have small effects on OAv, much
   smaller than is the case for OI.  Only a dramatic overestimate of the
   observation accuracy may lead to a substantial decrease in the OAv
   accuracy.

Lev Gandin

----------------------------------------------------------------------------

Optimal Averaging
------- ---------

'Optimal averages' can be found in the full archive.  The idea of creating
an 'optimal' average can be illustrated by a simple example.  Suppose we
had only one temperature observation in the USA, say at Miami.  A simple
approach would be to assume that observation was representative of the
entire USA: if Miami's temperature was 28C, we would estimate the average
US temperature to be 28C.  Of course, that would be a silly estimate.  A
better way would be to assume that the anomaly is zero (no information) in
regions well separated from Miami and to make some statistical statements
about the temperature anomalies in regions near Miami.  This is basically
the procedure used by 'optimal' averaging.
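To make this concrete, here is a minimal sketch, in Python, of how optimal
weights and the RMS averaging error (the by-product mentioned above) can
be computed.  The Gaussian correlation function, the 500 km length scale,
the 1-D domain, the observation-error variance, and the function names are
all illustrative assumptions; they are not the statistics or the code used
in OPTAVE.F.

    import numpy as np

    # Minimal OAv sketch: choose weights w that minimize the expected
    # squared error of  sum_i w_i f_i  as an estimate of the domain mean
    # of a unit-variance anomaly field.  This leads to the linear system
    # (C + E) w = c, where C is the obs-obs correlation matrix, E the
    # (assumed uncorrelated) observation-error variance, and c_i the mean
    # correlation between obs i and the domain.

    def corr(d, scale=500.0):
        # assumed Gaussian correlation function, length scale in km
        return np.exp(-(d / scale) ** 2)

    def oav_weights(x_obs, x0, x1, obs_err_var=0.1, n_quad=400):
        xq = np.linspace(x0, x1, n_quad)              # quadrature points
        C = corr(np.abs(x_obs[:, None] - x_obs[None, :]))
        C += obs_err_var * np.eye(len(x_obs))
        c = corr(np.abs(x_obs[:, None] - xq[None, :])).mean(axis=1)
        cbar = corr(np.abs(xq[:, None] - xq[None, :])).mean()
        w = np.linalg.solve(C, c)
        rms_err = np.sqrt(max(cbar - w @ c, 0.0))     # the OAv by-product
        return w, rms_err

    # One station ("Miami") in a 4000 km domain: the optimal weight is
    # well below 1, so the anomaly estimate relaxes toward zero
    # (climatology) away from the station, unlike an arithmetic average.
    w, err = oav_weights(np.array([3500.0]), 0.0, 4000.0)
    print(w, err)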
The 'optimal' average depends heavily on the climatology when the data are
sparse.  Of course, that raises the question: how do you calculate the
climatology?  Some regions are data rich in part of the record and data
poor in other parts.  In such cases, one is tempted to create the
climatology from the data-rich period.  However, suppose the climatology
was calculated in a warm decade and used in a cold decade.  Obviously one
would get a warm bias (relative to truth) that depended on the number and
pattern of the observations.  (The 'optimal' average, like most statistical
methods, assumes a stationary process.  For those interested in long-term
trends, this is a dangerous assumption.)  For regions that are data poor
throughout the record, the accuracy depends on the estimate of the
climatology (and on the assumption that the 'climate' is stationary).

Is the 'optimal' average optimal?  In the process of calculating the
'optimal' average, one needs to know the correlation between neighboring
points.  If the correlation is poorly modeled, the 'optimal' average
suffers.  In addition, the correlation is frequency dependent.  The
correlation pattern for high frequencies (period < 10 days) is usually
localized about the point of interest.  For ENSO frequencies, on the other
hand, the correlation patterns are often global in extent.  Since the
'optimal' averaging procedure does not handle the low frequencies
differently from the high frequencies, one could find a procedure that
does better.  In addition, a modern data assimilation system uses more
data and makes better use of those data, so an accurate data assimilation
system should be better in principle.

The 'optimal average' can be shown to be equal to the spatial average of
an 'optimal interpolation' given the same observations and statistical
model.  The steps of the proof are: 1) show that the optimal average of a
region is equal to the sum of the optimal averages of its subregions
weighted by their fractional areas; 2) show that as the size of a
subregion diminishes, its 'optimal average' approaches the 'optimal
interpolation' estimate; 3) cover the region with a very fine mesh, so
that the 'optimal average' of the region equals the fractional-area-
weighted 'optimal average' of the small subregions, each of which
approaches its 'optimal interpolation' value.  Thus, the 'optimal average'
is equal to the spatial average of the 'optimal interpolation' analysis.

How Optimal Averaging was Implemented
--- ------- --------- --- -----------

Theory and practice are often quite different.  The optimal averages were
computed by two different methods, using sonde data only (no aircraft data
were used).

Method 1.

In the first method, the optimal weights were computed and then normalized
so that the sum of the weights was one.  The observed data were then
multiplied by their respective weights.  (A sketch of this method in code
follows the steps below.)

  Step 1. find the optimal weights w_i, i = 1, 2, ..., n

  Step 2. normalize the weights

            w'_i = w_i / W,  where W = w_1 + w_2 + ... + w_n

  Step 3. compute optimal average no. 1

            Opt-Ave-1 = w'_1 F_1 + w'_2 F_2 + ... + w'_n F_n

            where F_i is the observed value at point i
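A minimal sketch of Method 1 in the same illustrative Python; the function
name opt_ave_1 and the station numbers are made up for the example.

    import numpy as np

    # Method 1 sketch: normalize the optimal weights so they sum to one,
    # then form the weighted sum of the observed values themselves.

    def opt_ave_1(w, f_obs):
        # Opt-Ave-1 = w'_1 F_1 + ... + w'_n F_n,  with w'_i = w_i / W
        w_norm = w / w.sum()          # Step 2: normalized weights
        return w_norm @ f_obs         # Step 3: weighted sum of obs

    # e.g. three stations: unnormalized optimal weights, observations (C)
    print(opt_ave_1(np.array([0.5, 0.3, 0.1]), np.array([28.0, 25.0, 21.0])))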
Method 2.

In the second method, the optimal weights were computed and then
NORMALIZED so that the sum of the weights was one.  The deviations of the
observations from the first guess (see ASSIM.SYS) were then multiplied by
their respective weights, producing an 'optimal-average increment'.  The
second optimal average was then computed by adding the area average of the
first guess to the 'optimal-average increment'.  (A sketch of this method
in code appears at the end of this section.)

  Step 1. find the optimal weights w_i, i = 1, 2, ..., n

  Step 2. normalize the weights

            w'_i = w_i / W,  where W = w_1 + w_2 + ... + w_n

  Step 3. compute FG, the area average of the first guess

            FG = (1/A) integral of fg(x,y) dx dy over the domain

            where fg(x,y) is the first guess and A is the area of the
            domain

  Step 4. compute the optimal-average increment

            OA-inc = w'_1 (F_1 - fg_1) + w'_2 (F_2 - fg_2) + ...
                     + w'_n (F_n - fg_n)

            where fg_i is the value of the first guess at point i
            and F_i is the observed value at point i

  Step 5. compute optimal average no. 2

            Opt-Ave-2 = OA-inc + FG

In the first paragraph, we considered the extreme situation of having
observed temperature at only one point, Miami.  By the first method, the
optimally averaged temperature for the USA would be Miami's temperature.
The second method would find the difference between Miami's observed
temperature and the first guess (say 2C) and then add this 'increment' to
the US average of the first guess.  (I.e., assume the average temperature
was 2C warmer than the first guess.)

Now suppose there was a station shift: the temperature was no longer
measured in Miami but in Boston.  The optimal average computed by the
first method would show a large change, as the new average would
correspond to Boston's temperature.  The second optimal average would show
a smaller shift, as the first guess would not change significantly.

These examples are, of course, unrealistic.  However, they do point out
problems that could occur.  There have been significant changes in
observation density over the decades.  In addition, the long-term trends
are fractions of a degree, unlike the change one would see from moving
from Miami to Boston.  In regions of no data coverage, the first guess may
show the model biases, which would affect Method 2.  If you use the error
estimates from the optimal averaging, make sure that you agree with the
values of the 'climatology', variance, and autocorrelation estimates used
by the optimal averages.  Optimal averaging does eliminate the problem,
present in arithmetic averaging, of overweighting regions of high
observation density.

Note: since methods 1 and 2 normalize the weights, the 'optimal average'
is not equal to the spatial average of the corresponding 'optimal
interpolation' analysis.  (Methods 1 and 2 differ from "optimal
interpolation" as defined in the literature.)

Looking at the optimal averages for Jan 1985, the first thing one notices
is that the optimally averaged temperatures for region 1, the mid-west US
sector (35N-50N, 105W-85W), are approximately 1 degree colder than the
final analyses.  Over a region that is data rich and is surrounded by
data-rich regions, this error appears to be quite large.  This bias is the
result of the main US sonde being ~1 degree colder than the aircraft data.
The optimal average uses only sonde data, whereas the Reanalysis tends to
draw to the aircraft data, which is consistent with itself and has more
observations.  Consequently the optimal average over the US is colder by
~1 degree.  I asked Bill Collins (NCEP) which data were better, and he
wouldn't say.  Nevertheless, the average sonde shows little bias relative
to the first guess, unlike the main US sonde.  In addition, the error
estimates (for T, U, and V) appear to be quite small.  If I understand the
tables correctly, the averages computed from the Reanalysis are outside
the error bars of the optimal averages.
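Here is the sketch promised above: Method 2 in the same illustrative
Python, applied to the one-station Miami example from the text.  The
first-guess values and the area mean are made up for the example.

    import numpy as np

    # Method 2 sketch: weight the obs-minus-first-guess increments, then
    # add the increment to the area-averaged first guess.

    def opt_ave_2(w, f_obs, fg_obs, fg_area_mean):
        # Opt-Ave-2 = FG + sum_i w'_i (F_i - fg_i)
        w_norm = w / w.sum()                   # Steps 1-2
        oa_inc = w_norm @ (f_obs - fg_obs)     # Step 4: increment
        return fg_area_mean + oa_inc           # Step 5

    w       = np.array([1.0])    # one station -> normalized weight is 1
    f_obs   = np.array([28.0])   # observed 28C at "Miami"
    fg_obs  = np.array([26.0])   # first guess at the station, 2C too cold
    fg_mean = 10.0               # assumed area mean of first guess (Step 3)

    # Method 2 gives 10 + (28 - 26) = 12C, whereas Method 1 would return
    # Miami's 28C as the average for the whole USA.
    print(opt_ave_2(w, f_obs, fg_obs, fg_mean))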
Again, if you have further questions, contact Lev Gandin, who worked on
the theory of optimal averages (wd20lg@sun1.wwb.noaa.gov), or Eugenia
Kalnay, who is a proponent of optimal averages (wd23ek@sun1.wwb.noaa.gov).

disclaimer: The above text represents neither official nor unofficial NCEP
policy, opinions, or beliefs.  (WNE)