What is OPTyymm.AVE?
---- -- ------------

This directory contains the optimal averages computed while running the
Reanalysis.  The data are in files "OPTyymm.AVE", where yymm are the year
and month.  The contents of the files are semi-documented in the program
"OPTAVE.F".  If you have further questions, contact Lev Gandin
(wd20lg@sun1.wwb.noaa.gov).

----------------------------------------------------------------------------

Optimal Averaging
------- ---------

As a means to increase the possibility of climate-change detection, the
averaging of basic meteorological fields over selected areas was
incorporated into the reanalysis procedures.  A method known as optimal
averaging (OAv; Kagan, 1979), which assures a minimum RMS averaging error
(under the assumption that the underlying statistics on the correlation
function are exact) and provides this minimal error as a by-product, has
been used in the course of the reanalysis.  Fundamentally, optimal
averaging is analogous to the well-known method of optimum interpolation
(OI) widely used in the objective analysis of meteorological fields.

The OAv application in the reanalysis was preceded by a large series of
numerical experiments on OAv, performed at NCEP as part of the work on the
reanalysis project, using both semi-analytical solutions for a rather
simplified OAv model (Gandin, 1993) and a numerical quadrature approach
under more realistic assumptions.  In the course of these experiments, the
OAv performance was compared with that of the usual arithmetic averaging
(AAv) in its dependence on various parameters of the averaging.  The main
conclusions may be formulated as follows:

1. Except for very small domains, OAv is substantially more accurate than
   AAv.  The accuracy increase is particularly high if deviations from the
   forecast first guess are averaged, rather than the values themselves or
   their deviations from climatology (anomalies).

2. For a given domain, the OAv accuracy quickly increases with an
   increasing number N of observation points, until N becomes large enough
   that a further increase does not practically influence the averaging
   accuracy.

3. Inhomogeneities in the pattern of observation points over a domain have
   less effect on the OAv accuracy than on the AAv accuracy.  At the same
   time, the OAv accuracy is very sensitive (although still less so than
   AAv) to violations of the symmetry of the observation pattern with
   respect to the domain.

4. Inaccuracies in the underlying statistics (variances, correlation
   functions, RMS observation errors) have small effects on OAv, much
   smaller than is the case for OI.  Only a dramatic overestimate of the
   observation accuracy may lead to a substantial decrease in the OAv
   accuracy.

Lev Gandin

----------------------------------------------------------------------------

Optimal Averaging
------- ---------

'Optimal averages' can be found in the full archive.  The idea of creating
an 'optimal' average can be illustrated by a simple example.  Suppose we
had only one temperature observation in the USA, say at Miami.  A simple
approach would be to assume that observation was representative of the
entire USA: if Miami's temperature was 28C, we would estimate the average
US temperature to be 28C.  Of course, that would be a silly estimate.  A
better way would be to assume that the anomaly is zero (no information) in
regions well separated from Miami and to make some statistical statements
about the temperature anomalies in regions near Miami.  This is basically
the procedure used by 'optimal' averaging.
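To make this concrete, here is a minimal sketch, in Python, of how optimal
weights and the RMS averaging error (the by-product mentioned above) can
be computed.  The Gaussian correlation function, the 500 km length scale,
the 1-D domain, the observation-error variance, and the function names are
all illustrative assumptions; they are not the statistics or the code used
in OPTAVE.F.

    import numpy as np

    # Minimal OAv sketch: choose weights w that minimize the expected
    # squared error of  sum_i w_i f_i  as an estimate of the domain mean
    # of a unit-variance anomaly field.  This leads to the linear system
    # (C + E) w = c, where C is the obs-obs correlation matrix, E the
    # (assumed uncorrelated) observation-error variance, and c_i the mean
    # correlation between obs i and the domain.

    def corr(d, scale=500.0):
        # assumed Gaussian correlation function, length scale in km
        return np.exp(-(d / scale) ** 2)

    def oav_weights(x_obs, x0, x1, obs_err_var=0.1, n_quad=400):
        xq = np.linspace(x0, x1, n_quad)              # quadrature points
        C = corr(np.abs(x_obs[:, None] - x_obs[None, :]))
        C += obs_err_var * np.eye(len(x_obs))
        c = corr(np.abs(x_obs[:, None] - xq[None, :])).mean(axis=1)
        cbar = corr(np.abs(xq[:, None] - xq[None, :])).mean()
        w = np.linalg.solve(C, c)
        rms_err = np.sqrt(max(cbar - w @ c, 0.0))     # the OAv by-product
        return w, rms_err

    # One station ("Miami") in a 4000 km domain: the optimal weight is
    # well below 1, so the anomaly estimate relaxes toward zero
    # (climatology) away from the station, unlike an arithmetic average.
    w, err = oav_weights(np.array([3500.0]), 0.0, 4000.0)
    print(w, err)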
The 'optimal' average depends heavily on the climatology when the data are
sparse.  Of course, that raises the question: how do you calculate the
climatology?  Some regions are data rich in part of the record and data
poor in other parts.  In such cases, one is tempted to create the
climatology from the data-rich period.  However, suppose the climatology
was calculated in a warm decade and used in a cold decade.  Obviously one
would get a warm bias (relative to truth) that depended on the number and
pattern of the observations.  (The 'optimal' average, like most statistical
methods, assumes a stationary process.  For those interested in long-term
trends, this is a dangerous assumption.)  For regions that are data poor
throughout the record, the accuracy depends on the estimate of the
climatology (and on the assumption that the 'climate' is stationary).

Is the 'optimal' average optimal?  In the process of calculating the
'optimal' average, one needs to know the correlation between neighboring
points.  If the correlation is poorly modeled, the 'optimal' average
suffers.  In addition, the correlation is frequency dependent.  The
correlation pattern for high frequencies (period < 10 days) is usually
localized about the point of interest.  For ENSO frequencies, on the other
hand, the correlation patterns are often global in extent.  Since the
'optimal' averaging procedure does not handle the low frequencies
differently from the high frequencies, one could find a procedure that
does better.  In addition, a modern data assimilation system uses more
data and makes better use of those data, so an accurate data assimilation
system should be better in principle.

The 'optimal average' can be shown to be equal to the spatial average of
an 'optimal interpolation' given the same observations and statistical
model.  The steps of the proof are: 1) show that the optimal average of a
region is equal to the sum of the optimal averages of its subregions
weighted by their fractional areas; 2) show that as the size of a
subregion diminishes, its 'optimal average' approaches the 'optimal
interpolation' estimate; 3) cover the region with a very fine mesh, so
that the 'optimal average' of the region equals the fractional-area-
weighted 'optimal average' of the small subregions, each of which
approaches its 'optimal interpolation' value.  Thus, the 'optimal average'
is equal to the spatial average of the 'optimal interpolation' analysis.

How Optimal Averaging was Implemented
--- ------- --------- --- -----------

Theory and practice are often quite different.  The optimal averages were
computed by two different methods, using sonde data only (no aircraft data
were used).

Method 1.

In the first method, the optimal weights were computed and then normalized
so that the sum of the weights was one.  The observed data were then
multiplied by their respective weights.  (A sketch of this method in code
follows the steps below.)

  Step 1. find the optimal weights w_i, i = 1, 2, ..., n

  Step 2. normalize the weights

            w'_i = w_i / W,  where W = w_1 + w_2 + ... + w_n

  Step 3. compute optimal average no. 1

            Opt-Ave-1 = w'_1 F_1 + w'_2 F_2 + ... + w'_n F_n

            where F_i is the observed value at point i
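A minimal sketch of Method 1 in the same illustrative Python; the function
name opt_ave_1 and the station numbers are made up for the example.

    import numpy as np

    # Method 1 sketch: normalize the optimal weights so they sum to one,
    # then form the weighted sum of the observed values themselves.

    def opt_ave_1(w, f_obs):
        # Opt-Ave-1 = w'_1 F_1 + ... + w'_n F_n,  with w'_i = w_i / W
        w_norm = w / w.sum()          # Step 2: normalized weights
        return w_norm @ f_obs         # Step 3: weighted sum of obs

    # e.g. three stations: unnormalized optimal weights, observations (C)
    print(opt_ave_1(np.array([0.5, 0.3, 0.1]), np.array([28.0, 25.0, 21.0])))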
Method 2.

In the second method, the optimal weights were computed and then
NORMALIZED so that the sum of the weights was one.  The deviations of the
observations from the first guess (see ASSIM.SYS) were then multiplied by
their respective weights, producing an 'optimal-average increment'.  The
second optimal average was then computed by adding the area average of the
first guess to the 'optimal-average increment'.  (A sketch of this method
in code appears at the end of this section.)

  Step 1. find the optimal weights w_i, i = 1, 2, ..., n

  Step 2. normalize the weights

            w'_i = w_i / W,  where W = w_1 + w_2 + ... + w_n

  Step 3. compute FG, the area average of the first guess

            FG = (1/A) integral of fg(x,y) dx dy over the domain

            where fg(x,y) is the first guess and A is the area of the
            domain

  Step 4. compute the optimal-average increment

            OA-inc = w'_1 (F_1 - fg_1) + w'_2 (F_2 - fg_2) + ...
                     + w'_n (F_n - fg_n)

            where fg_i is the value of the first guess at point i
            and F_i is the observed value at point i

  Step 5. compute optimal average no. 2

            Opt-Ave-2 = OA-inc + FG

In the first paragraph, we considered the extreme situation of having
observed temperature at only one point, Miami.  By the first method, the
optimally averaged temperature for the USA would be Miami's temperature.
The second method would find the difference between Miami's observed
temperature and the first guess (say 2C) and then add this 'increment' to
the US average of the first guess.  (I.e., assume the average temperature
was 2C warmer than the first guess.)

Now suppose there was a station shift: the temperature was no longer
measured in Miami but in Boston.  The optimal average computed by the
first method would show a large change, as the new average would
correspond to Boston's temperature.  The second optimal average would show
a smaller shift, as the first guess would not change significantly.

These examples are, of course, unrealistic.  However, they do point out
problems that could occur.  There have been significant changes in
observation density over the decades.  In addition, the long-term trends
are fractions of a degree, unlike the change one would see from moving
from Miami to Boston.  In regions of no data coverage, the first guess may
show the model biases, which would affect Method 2.  If you use the error
estimates from the optimal averaging, make sure that you agree with the
values of the 'climatology', variance, and autocorrelation estimates used
by the optimal averages.  Optimal averaging does eliminate the problem,
present in arithmetic averaging, of overweighting regions of high
observation density.

Note: since methods 1 and 2 normalize the weights, the 'optimal average'
is not equal to the spatial average of the corresponding 'optimal
interpolation' analysis.  (Methods 1 and 2 differ from "optimal
interpolation" as defined in the literature.)

Looking at the optimal averages for Jan 1985, the first thing one notices
is that the optimally averaged temperatures for region 1, the mid-west US
sector (35N-50N, 105W-85W), are approximately 1 degree colder than the
final analyses.  Over a region that is data rich and is surrounded by
data-rich regions, this error appears to be quite large.  This bias is the
result of the main US sonde being ~1 degree colder than the aircraft data.
The optimal average uses only sonde data, whereas the Reanalysis tends to
draw to the aircraft data, which is consistent with itself and has more
observations.  Consequently the optimal average over the US is colder by
~1 degree.  I asked Bill Collins (NCEP) which data were better, and he
wouldn't say.  Nevertheless, the average sonde shows little bias relative
to the first guess, unlike the main US sonde.  In addition, the error
estimates (for T, U, and V) appear to be quite small.  If I understand the
tables correctly, the averages computed from the Reanalysis are outside
the error bars of the optimal averages.
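Here is the sketch promised above: Method 2 in the same illustrative
Python, applied to the one-station Miami example from the text.  The
first-guess values and the area mean are made up for the example.

    import numpy as np

    # Method 2 sketch: weight the obs-minus-first-guess increments, then
    # add the increment to the area-averaged first guess.

    def opt_ave_2(w, f_obs, fg_obs, fg_area_mean):
        # Opt-Ave-2 = FG + sum_i w'_i (F_i - fg_i)
        w_norm = w / w.sum()                   # Steps 1-2
        oa_inc = w_norm @ (f_obs - fg_obs)     # Step 4: increment
        return fg_area_mean + oa_inc           # Step 5

    w       = np.array([1.0])    # one station -> normalized weight is 1
    f_obs   = np.array([28.0])   # observed 28C at "Miami"
    fg_obs  = np.array([26.0])   # first guess at the station, 2C too cold
    fg_mean = 10.0               # assumed area mean of first guess (Step 3)

    # Method 2 gives 10 + (28 - 26) = 12C, whereas Method 1 would return
    # Miami's 28C as the average for the whole USA.
    print(opt_ave_2(w, f_obs, fg_obs, fg_mean))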
Again, if you have further questions, contact Lev Gandin, who worked on
the theory of optimal averages (wd20lg@sun1.wwb.noaa.gov), or Eugenia
Kalnay, who is a proponent of optimal averages (wd23ek@sun1.wwb.noaa.gov).

disclaimer: The above text represents neither official nor unofficial NCEP
policy, opinions, or beliefs.  (WNE)