Consolidation forecast archive.
U.S. temperature and precipitation probability of exceedance forecasts.
Forecast description.
These forecasts are produced by a linear regression technique called 'Ensemble Regression'. Ensemble
regression treats every member of a forecast ensemble as a potential solution to the problem. In a
single-model ensemble, all members are considered to be equally likely to be the 'best'. In a
multi-model ensemble, the members from the more skillful models are assumed to be more likely to
occur. The ensemble regression procedure assumes that the conditional error distribution for the
best member (that is, the expected errors between the closest solution and the observation) is about
the same regardless of which model produced it. The procedure derives a least-squares
solution for the entire ensemble set by minimizing the expected errors
between the best member and the observation.
File names and variable ids.
File names.
There are two types of archive files: hindcasts and operational forecasts. A hindcast file contains a
'clean' set of retrospective forecasts. These forecasts are generally made in research mode from
historical data. The “operational forecast” files contain the data issued in real time, and
may include variations in procedures or even errors, if those forecasts were officially 'issued'.
Hindcast datasets are identified by a data identifier (ID). Operational forecast datasets are identified
by the element followed by a “.operational” suffix in the dataset name.
File format.
Files for the 102 climate divisions are ASCII text in a simple spreadsheet-style (column) layout. Data are
grouped by the order in which the forecasts were issued. Forecasts are typically for 102 forecast
divisions based on the NCDC climate division data, and are for three-month seasons.
Column 1 = Year and month of the center month of the three-month season, in YYYYMM format.
Column 2 = Year that the forecast was issued
Column 3 = Month that the forecast was issued. A forecast is typically issued around the 2nd week of
the month, so a forecast labeled 1982 1 would have been issued around mid-January, 1982.
Column 4 = Lead time, in months, between the latest data used for this forecast and the START of the
valid time. So a 1 in this column indicates a 1-month lead time. A forecast issued in
January 1982 would typically be based on data through the end of December 1981, so a
1-month lead refers to the 3-month period that starts on February 1, 1982 (Jan + 1 = Feb)
and extends through the end of April (FMA). The period is labeled by its center month (M = March 1982),
hence Column 1 for this example would read 198203.
Column 5 = Forecast division for which this data is valid. (See CPC website)
Columns 6-18 = Probability of exceedance (PoE) values. Column 6 gives the value expected to be exceeded 98% of the time.
Column 7 = 95% PoE
Column 8 = 90% PoE
Column 9 = 80% PoE
Column 10 = 70% PoE
Column 11 = 60% PoE
Column 12 = 50% PoE
Column 13 = 40% PoE
Column 14 = 30% PoE
Column 15 = 20% PoE
Column 16 = 10% PoE
Column 17 = 5% PoE
Column 18 = 2% PoE
Column 19 = Expected value (mean) of the distribution of observations expected for this forecast.
Column 20 = P(N+A), the probability that the observation will be in the Normal or Above normal class.
Here 'Normal' refers to the middle third of the climatological distribution (not necessarily near the expected value).
Below normal is the lower third (0-33.3%) of the climatological distribution of observations, near
normal is the middle third (33.3-66.7%), and above normal is the upper third (66.7-100%) of the climatological
distribution. Climatology is always defined by the observations of the last 3 complete decades
(e.g. 1961-90, 1971-2000).
Column 21 = P(A), the probability that the observation will be in the Above normal class.
Column 22 = Effective skill of the relationship. The value in this column is defined as:
R = SQRT(1 - Vf/Vb), where Vf is the forecast error variance
(expected value of (Forecast - Observation)^2)
and Vb is the climatological variance of the observations (expected value of (Observation - Observation mean)^2).
For positive values, this produces a skill estimate that is similar to the correlation coefficient between the forecast and observations. Negative values signify that the models are predicting a greater variance in the expected observations than the climatological variance.
Column 23 = Forecast ID.
Because models may change over time, each forecast is given an ID to help identify how it was made.
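The column layout above can be read with a few lines of code. The following is a minimal sketch in Python, not an official reader. It assumes that each record is one whitespace-separated line of 23 fields, that the lead time and division number are written as integers, and that the probabilities in Columns 20 and 21 are fractions between 0 and 1 (pass total=100.0 if they turn out to be percentages). The function names, dictionary keys, and sample row are hypothetical and chosen for illustration only.

```python
from typing import Dict

# PoE levels (percent) for Columns 6-18, in file order.
POE_LEVELS = [98, 95, 90, 80, 70, 60, 50, 40, 30, 20, 10, 5, 2]


def center_month(issue_year: int, issue_month: int, lead: int) -> str:
    """Return the YYYYMM label (Column 1) implied by Columns 2-4.

    The valid season starts `lead` months after the issue month, and the
    center month is one month after that start.
    """
    # Count months from year 0 so that year rollover is handled.
    index = issue_year * 12 + (issue_month - 1) + lead + 1
    return f"{index // 12:04d}{index % 12 + 1:02d}"


def parse_row(line: str) -> Dict[str, object]:
    """Split one record into named fields (assumes whitespace separation)."""
    fields = line.split()
    if len(fields) != 23:
        raise ValueError(f"expected 23 columns, got {len(fields)}")
    return {
        "valid_center": fields[0],                   # Column 1, YYYYMM
        "issue_year": int(fields[1]),                # Column 2
        "issue_month": int(fields[2]),               # Column 3
        "lead": int(fields[3]),                      # Column 4 (assumed integer)
        "division": int(fields[4]),                  # Column 5 (assumed integer)
        "poe": dict(zip(POE_LEVELS, map(float, fields[5:18]))),  # Columns 6-18
        "mean": float(fields[18]),                   # Column 19
        "p_normal_or_above": float(fields[19]),      # Column 20, P(N+A)
        "p_above": float(fields[20]),                # Column 21, P(A)
        "skill": float(fields[21]),                  # Column 22, R
        "forecast_id": fields[22],                   # Column 23, kept as text
    }


def tercile_probabilities(p_normal_or_above: float, p_above: float,
                          total: float = 1.0) -> Dict[str, float]:
    """Derive the three class probabilities from P(N+A) and P(A)."""
    return {
        "below": total - p_normal_or_above,
        "normal": p_normal_or_above - p_above,
        "above": p_above,
    }


# Synthetic row for illustration only; the values are NOT taken from the archive.
SAMPLE_ROW = ("198203 1982 1 1 45 -1.2 -0.9 -0.6 -0.2 0.1 0.3 0.5 0.7 0.9 "
              "1.2 1.6 1.9 2.3 0.5 0.70 0.40 0.35 015001")

row = parse_row(SAMPLE_ROW)
# Column 1 should agree with the label implied by issue date and lead time.
print(row["valid_center"] == center_month(row["issue_year"],
                                          row["issue_month"], row["lead"]))
print(tercile_probabilities(row["p_normal_or_above"], row["p_above"]))
```

Checking Column 1 against the label computed from Columns 2-4 is a cheap way to catch misaligned or truncated records before using the probability columns.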
Forecast ID.
CCCVVV
CCC = decimal equivalent of binary model inclu
VVV = a version number.
Key to CCC = Each of four current input model forecast tools is given a position in a binary field.
ECCA = Ensemble CCA
CFS = CFS model ensembles
CCA = Canonical Correlation Analysis (Barnston)
SMLR = Screening Multiple Linear Regression.
Binary                      Decimal
ECCA  CFS  CCA  SMLR        CCC
0     0    0    1      =    001
0     1    0    0      =    004
0     1    1    1      =    007
1     1    1    1      =    015
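
The CCCVVV layout and the binary key above can be decoded mechanically. The sketch below, in Python, assumes the ID is a six-digit string whose first three digits are CCC and whose last three are VVV, with bit values 8, 4, 2, 1 assigned to ECCA, CFS, CCA, SMLR as in the table. It also assumes that an ID stored without leading zeros (for example 7001) should be padded back to six digits. The function name decode_forecast_id is hypothetical.

```python
from typing import List, Tuple

# Bit values follow the table order ECCA, CFS, CCA, SMLR
# (most significant to least significant bit).
MODEL_BITS = [("ECCA", 8), ("CFS", 4), ("CCA", 2), ("SMLR", 1)]


def decode_forecast_id(forecast_id: str) -> Tuple[List[str], int]:
    """Return (list of input tools, version number) for a CCCVVV forecast ID."""
    # Assumption: restore leading zeros if the ID was stored as a plain number.
    text = forecast_id.strip().zfill(6)
    if len(text) != 6 or not text.isdigit():
        raise ValueError(f"not a CCCVVV forecast ID: {forecast_id!r}")
    ccc = int(text[:3])
    version = int(text[3:])
    if ccc > 15:
        raise ValueError(f"CCC={ccc} does not fit the four-tool binary field")
    models = [name for name, bit in MODEL_BITS if ccc & bit]
    return models, version


print(decode_forecast_id("007001"))
print(decode_forecast_id("015002"))
```

For example, CCC = 007 has the CFS, CCA, and SMLR bits set, which matches the third row of the table.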