g2subset v1.3.4

g2subset is a program that runs on an HTTP server and allows users to download subsets of grib2 files.

Characteristics

  cgi-bin script: program that generates web pages
  perl wrapper for wgrib2: a perl script that calls wgrib2
  makes regional subsets: makes cookie-cutter sections of a grid
  can interpolate: can convert to custom grids
  extract point values: can make grib files for selected lat-lon points using nearest-neighbor interpolation
  select fields: can extract based on time, field and level

Requirements

  web server
  grib2 files that you want to distribute
  perl
  wgrib2

Why run g2subset

Bandwidth is a resource. Downloading subsets saves time and money. Of course this is not true in Bizarro world, where subsetting means fewer TB are downloaded and consequently your funding is cut.

Components of g2subset

g2subset.pl

This is a small perl script that normally runs in the "cgi-bin" directory of the web server. At minimum, this script specifies the directory that g2subset will serve and then calls g2sub_main. Here is a sample g2subset.pl:

    #!/usr/bin/perl -w -I/home/wd23ja/bin
    require ("g2subsetmod_grid.pl");
    $dir="/var/ftp/pub/";
    &g2sub_main($dir);
    exit;

g2subsetmod_grib.pl

This is the main perl script. It creates the web pages, reads the user responses and runs wgrib2. If you are using the above g2subset.pl, this perl script will reside in /home/wd23ja/bin/.

.g2subrc

This is a "resource" file that is placed in each data directory. It provides some customization/initialization for the data. See the subroutine read_g2subrc in the g2subset source code. Here is a simple .g2subrc:

    title=rotating GFS forecasts
    ncol=4
    gribfilter=yes
    files_pat='grib2$'      : the files_pat is a regex for the grib2 files
    vars=4LFTX 5WAVA 5WAVH ABSV CAPE CIN CLWMR CWAT GPA HGT HPBL ICEC LAND LFTX O3MR POT \      : vars is a list of grib2 variables that can be selected.
                            (wgrib2 names)
    levs=0-0.1_m_below_ground 0.1-0.4_m_below_ground 0.33-1_sigma_layer 500_mb 50_mb \
         550_mb 600_mb 650_mb 700_mb      : levs is a list of levels that can be selected. (wgrib2 levels)

Blanks are replaced by underscores.

    title       = title for the web page
    ncol        = number of columns for listing
    gribfilter  = yes or no (turns on grib2 filtering)
    files_pat   = 'match pattern' regular expression for files to serve
    vars        = list of grib variables (spaces are replaced by underscores)
    lev         = list of levels (spaces are replaced by underscores)
    hours       = yes/no to hour filter
    days        = yes/no to day filter
    months      = (x) yes/no to month filter
    forecast    = (x) data are forecasts (use verification time)
    nosubregion = (x) do not allow subregions
    nopoints    = (x) do not allow point output
    noregrid    = (X) do not regrid (interpolation to new grid)

Principles of Operation

The cgi-bin part of g2subset provides a pointer to a directory. This directory or a subdirectory holds the grib2 files. Traditionally these grib2 files are stored in a directory that is visible to the outside world through the http server. This is so that the "partial http downloading" technique will work:

    http://www.cpc.ncep.noaa.gov/products/wesley/fast_downloading_grib.html

The g2subset perl scripts create the web pages. When the user requests a "download", all the selections are converted into a single wgrib2 command. You can see the command by selecting "Show the URL for scripting downloads" and then clicking on "Download".

Index files

Grib files are flat files; they do not include indexing. The g2subset script uses wgrib2 inventories as the index files. The default suffix of the index file is .inv, and the file is created by

    wgrib2 GRIBFILE > GRIBFILE.inv

The nomads.ncep.noaa.gov server uses the .idx suffix for the index file. They produce the .idx file by

    wgrib2 GRIB2_FILE > GRIB2_FILE.idx

G2subset will work without index files, but the system will be slow because the entire data file is read every time there is a data request.
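Since every served file needs its own inventory, the index step is easy to script. The following is a minimal sketch, assuming the data files match the glob *grib2 (mirroring the files_pat example); the function name make_inv and the WGRIB2 override variable are hypothetical and are not part of g2subset.

```shell
# Sketch: build a wgrib2 inventory (.inv) next to every grib2 file in a directory.
# Hypothetical names, not part of g2subset: make_inv, WGRIB2, and the *grib2 glob.
make_inv() {
    dir=$1
    for f in "$dir"/*grib2; do
        [ -f "$f" ] || continue   # the glob matched nothing
        inv="$f.inv"
        # Skip files whose index exists, is non-empty, and is not older than the data.
        if [ -s "$inv" ] && [ ! "$f" -nt "$inv" ]; then
            continue
        fi
        # Same command as in the text: wgrib2 GRIBFILE > GRIBFILE.inv
        # Write to a temporary name first so a failed run does not replace the index.
        if "${WGRIB2:-wgrib2}" "$f" > "$inv.tmp"; then
            mv "$inv.tmp" "$inv"
        else
            rm -f "$inv.tmp"
            echo "make_inv: wgrib2 failed on $f" >&2
        fi
    done
}
```

With the sample g2subset.pl above, this could be run as "make_inv /var/ftp/pub" whenever new files arrive, for example from cron.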
G2subset supports an index file that uses the verification time instead of the reference time for date code matching. It has a suffix of .inv-verf.

Speed

When making regional subsets, the server needs to decode the grib file, make the subset (cookie cutter, interpolation) and encode the grib file. For speed, the input files should be in simple, complex or AEC packing. (Jpeg2000 is very slow.) For output, simple packing (the default) uses the least CPU at the expense of bandwidth. AEC packing uses more CPU but produces much smaller files. However, one should be aware that some users may have difficulty decoding AEC-compressed files. Complex packing is another possibility when CPU is less of a concern.

Another way to speed up the operation is to compile wgrib2 with OpenMP and set $OMP_NUM_THREADS to a small number between 2 and 4. This helps complex and simple packing. In my tests (2012), you can distribute continental USA 1-km grids using complex packing and the OpenMP version of wgrib2.

If you need even more speed, wgrib2m-type processing can be used (untested).

Things to do

Wgrib2 now supports the -fgrep/-egrep and -i_file options. G2subset can be changed so that

    cat X.inv | egrep A | egrep B | wgrib2 -i X ...

is replaced by

    wgrib2 -i_file X.inv -egrep A -egrep B ...

which is cleaner.

The interpolation options are "rough". The parameters they need are not explained by the web pages, so you need to consult the wgrib2 documentation. The interface could be improved.
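The inventory-filtering pipeline mentioned under "Things to do" can be sketched in a few lines of shell. This is an illustration only, not the code g2subset runs: the function name filter_inv, the sample regular expressions and the output name out.grb are assumptions, and grep -E is used as the equivalent of egrep.

```shell
# Sketch: select inventory lines that match both a variable regex and a level regex.
# Hypothetical helper, not part of g2subset.
filter_inv() {
    inv=$1
    var_re=$2
    lev_re=$3
    # Equivalent to: cat X.inv | egrep A | egrep B
    grep -E "$var_re" "$inv" | grep -E "$lev_re"
}

# Usage (needs wgrib2 and a real grib2 file X with its inventory X.inv; not run here):
#   filter_inv X.inv ':(HGT|TMP):' ':(500|700) mb:' | wgrib2 -i X -grib out.grb
```

The matching lines are fed to "wgrib2 -i", which reads the inventory from standard input and processes only the listed records.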