General Data Structure
Table of Contents
- Organization of Data Directories
- Organization of Data Files
- Use of Templates in GrADS Data Descriptor Files
- Dataset Templates
Organization of Data Directories
Our repository contains a directory for each model
system (i.e., ETA, GFS, and RUC). Each model directory
contains one directory for each month. Each month
directory contains a directory for each day in the
month. The directory for each day contains all the data
from model runs initialized during that day. These
directories contain the GRIB files. We also have a
parallel structure that contains the table of contents
(i.e., toc) listing of each GRIB file. A sample of our
directory structure is shown below.
noaaport/
|-- merged
|   |-- eta
|   |   `-- 200305
|   |       `-- 20030505
|   |-- eta_toc
|   |   `-- 200305
|   |       `-- 20030505
|   |-- gfs
|   |   `-- 200305
|   |       `-- 20030505
|   |-- gfs_toc
|   |   `-- 200305
|   |       `-- 20030505
|   |-- ruc
|   |   `-- 200305
|   |       `-- 20030505
|   `-- ruc_toc
|       `-- 200305
|           `-- 20030505
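The layout above can be turned into a small path helper. The following is only a sketch: the function name `day_directory` and the default root `noaaport/merged` are assumptions taken from the sample listing, not part of any published tool.

```python
from datetime import date
from pathlib import PurePosixPath


def day_directory(model: str, day: date, toc: bool = False,
                  root: str = "noaaport/merged") -> PurePosixPath:
    """Return the directory holding all runs initialized on `day`.

    `root` is an assumption based on the sample listing; adjust it to
    match the actual repository location.
    """
    # The table-of-contents tree mirrors the data tree under <model>_toc.
    model_dir = f"{model}_toc" if toc else model
    month = day.strftime("%Y%m")      # e.g. 200305
    full_day = day.strftime("%Y%m%d")  # e.g. 20030505
    return PurePosixPath(root) / model_dir / month / full_day


print(day_directory("eta", date(2003, 5, 5)))
print(day_directory("eta", date(2003, 5, 5), toc=True))
```

The same function covers both the GRIB tree and the parallel toc tree, so a GRIB file and its toc listing can be located from the same model and date.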
Organization of Data Files
Our model data repository contains data from
different climate and weather models. It has recently been
expanded to place station datasets under the distributed data
access framework. The data from the
climate models are relatively static, while the data
from the weather models are collected in near real time. The
weather models are run at different frequencies and at
different spatial resolutions and domains. The
frequency of a model is the number of times (cycles)
the model is run per day. A model is typically run at 4
cycles per day (i.e., 00Z, 06Z, 12Z, and 18Z). The
resolution and domain of a model run are determined by
its grid number. A model run is a single execution of
a specific model on a specific grid, initialized at a
specific time and run for a given number of forecast hours. The
same model may be run on different grids. Our weather
model data are cataloged by specific combinations of
model and grid.
Output from a model run contains data from the
analysis field (i.e., the initialization field) and the
forecast data at the end of each time step (i.e.,
forecast hour). Our repository for weather model data
contains a separate file for each analysis field and
each forecast hour. Thus each file contains data from
the gridded analysis field or data from one forecast
hour from a single model run. The files contain gridded
data, primarily in GRIB format. Thus a model run with 20 steps
will produce 21 GRIB files: one analysis (i.e., hour 0)
plus 20 forecast hours.
Files are named to identify the model run where the
data originated. The filenames are divided into 5 sections,
delimited by underscores. On certain occasions, the 4th and 5th
sections may be excluded (such as a climate reanalysis with monthly
time steps). The general pattern is:
<model>_<grid>_<yyyymmdd>_<cycle>_<hour>
In instances where hours and days have no meaning
(e.g., daily and monthly datasets), the filenames take the
following default values:
<model>_<999>_<00000101>_<0000>_<000>
- model - name of the model (e.g., early-eta)
- grid - grid number; 1-3 digits (e.g., 212). This number serves
other purposes when non-gridded data is involved.
- yyyymmdd - initialization year, month, day; 8
digits (e.g., 20030501)
- cycle - initialization time in hhmm; 4 digits
(e.g., 0000)
- hour - forecast hour relative to initialization;
3 digits (e.g., 000)
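The naming convention above can be checked mechanically. The sketch below is illustrative only: the names `DataFileName` and `parse_data_filename` are hypothetical, and it assumes that model names never contain an underscore and that the cycle and hour sections are either both present or both absent. It also accepts the "fff" marker that appears in the hour field of run-level files elsewhere in this document.

```python
import re
from typing import NamedTuple, Optional


class DataFileName(NamedTuple):
    model: str
    grid: str
    yyyymmdd: str
    cycle: Optional[str]  # None when the 4th section is excluded
    hour: Optional[str]   # None when the 5th section is excluded


# Assumption: model names contain no underscore (e.g. meso-eta, early-eta).
_PATTERN = re.compile(
    r"^(?P<model>[^_]+)_(?P<grid>\d{1,3})_(?P<yyyymmdd>\d{8})"
    r"(?:_(?P<cycle>\d{4})_(?P<hour>\d{3}|fff))?$"
)


def parse_data_filename(name: str) -> DataFileName:
    """Split a repository filename into its underscore-delimited sections."""
    stem = name.rsplit(".", 1)[0]  # drop the extension, if any
    match = _PATTERN.match(stem)
    if match is None:
        raise ValueError(f"not a recognized data filename: {name!r}")
    return DataFileName(**match.groupdict())


print(parse_data_filename("meso-eta_218_20030501_1800_045.grb"))
```

Keeping every field as a string preserves the zero padding (for example "045"), which matters when the fields are used to rebuild a filename.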
The following is a list of filename extensions, and their
corresponding purposes, that you may
encounter while browsing the NOMADS data structure.
Please note that not every file type exists for every
dataset on our system:
- .grb - GRIB data
- .unf - GrADS unformatted binary data file
- .idx - index file produced by the GrADS gribmap utility
- .map - map file produced by the GrADS stnmap utility
- .ctl - GrADS data descriptor file
- .toc - basic table of contents dump produced by WGRIB. toc
files exist for each individual file, as well as a merged file
for each cycle (fff.toc)
- .inv - WGRIB GRIB inventory data; contains information that
allows partial HTTP file transfers to work. Similar to, but
more detailed than, .toc files
Examples:
meso-eta_218_20030501_1800_045.grb
meso-eta_218_20030501_1800_000.unf
meso-eta_218_20030501_1800_045.toc
meso-eta_218_20030501_1800_045.inv
meso-eta_218_20030501_1800_fff.ctl
meso-eta_218_20030501_1800_fff.idx
meso-eta_218_20030501_1800_fff.map
meso-eta_218_20030501_1800_fff.toc
Use of Templates in GrADS Data Descriptor Files
Our repository has a single data descriptor file
(i.e., .ctl) for each model run. We use the GrADS
templates feature to aggregate all the forecast hour
files for the same run. Since these files represent
multiple forecast hours, we identify these files by
using "fff" in the forecast hour field in the file
name.
Examples:
meso-eta_218_20030501_1800_fff.ctl <-- GrADS descriptor file for run
meso-eta_218_20030501_1800_fff.idx <-- index file for run
meso-eta_218_20030501_1800_fff.toc <-- table of contents for run
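The relationship between a single forecast-hour file and its run-level files can be sketched as a simple rename of the hour field. The function name `run_level_name` is hypothetical; the sketch assumes a five-section filename as described earlier.

```python
def run_level_name(filename: str, extension: str = "ctl") -> str:
    """Map a per-hour filename to the run-level name that uses 'fff'.

    Assumes the five-section form <model>_<grid>_<yyyymmdd>_<cycle>_<hour>.
    """
    stem = filename.rsplit(".", 1)[0]  # drop the extension, if any
    parts = stem.split("_")
    if len(parts) != 5:
        raise ValueError(f"expected 5 underscore-delimited sections: {filename!r}")
    parts[4] = "fff"  # the hour field is replaced by the aggregate marker
    return "_".join(parts) + "." + extension


print(run_level_name("meso-eta_218_20030501_1800_045.grb"))
print(run_level_name("meso-eta_218_20030501_1800_045.grb", "idx"))
```

Every forecast-hour file from the same run maps to the same run-level name, which is what lets one descriptor file stand for the whole run.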
Dataset Templates
For select datasets, GrADS control file templates have been
built across the entire date range for only a few variables,
forecast hours, and vertical levels. These were recently
introduced to solve the problem
of having to write a script to loop across hundreds
or even thousands of cycle-organized files just to create a
time series for one variable and level.
These templates (or subsets) are best utilized by the GrADS
Data Server and Live Access Server. They can be found either at
the top level of a dataset listing or in a subfolder named
'subsets'. For example, the top-level directory for the
North American Regional Reanalysis:
http://nomads.ncdc.noaa.gov:9091/dods/NCEP_NARR_DAILY
contains the listing: narr-a_221_hgtprs.subset
Remotely opening this file from an OPeNDAP-enabled client such as
gradsdods gives you access to the geopotential height field
at any level for the entire reanalysis!
We have created many subsets for the Global Forecast System, so
we have gathered them all into a subsets folder:
http://nomads.ncdc.noaa.gov:9091/dods/NCEP_GFS/subsets
The general naming scheme for subset templates is as follows;
square brackets denote portions that may be excluded:
<model>_<grid>_<Description>[_<fh1>][_<fh2>][.subset]
| <model>_<grid> | Name and grid number of the dataset |
| <Description> | One-word description of the subset |
| <fh1> | Forecast hour, or start of forecast hour range. Assumed to be 000 if not present. |
| <fh2> | End of forecast hour range. Assumed to be not relevant if not present. |
| [.subset] | Filename extension to identify subset templates. Not required inside the /subsets subdirectory. |
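The subset naming scheme can be parsed in the same way as the data filenames. This is a minimal sketch under stated assumptions: `SubsetName` and `parse_subset_name` are hypothetical names, the model and description are assumed to contain no underscore, and forecast hours are assumed to be written as digits. The name `gfs_3_tmp2m_000_180` in the usage lines is made up for illustration and is not taken from the actual subsets folder.

```python
import re
from typing import NamedTuple, Optional


class SubsetName(NamedTuple):
    model: str
    grid: str
    description: str
    fh1: str             # defaults to "000" when absent
    fh2: Optional[str]   # None when no forecast hour range applies


# Assumptions: no underscore inside model or description; hours are digits.
_SUBSET = re.compile(
    r"^(?P<model>[^_]+)_(?P<grid>\d{1,3})_(?P<description>[^_.]+)"
    r"(?:_(?P<fh1>\d+))?(?:_(?P<fh2>\d+))?(?:\.subset)?$"
)


def parse_subset_name(name: str) -> SubsetName:
    """Split a subset template name into its documented parts."""
    match = _SUBSET.match(name)
    if match is None:
        raise ValueError(f"not a recognized subset name: {name!r}")
    return SubsetName(
        model=match["model"],
        grid=match["grid"],
        description=match["description"],
        fh1=match["fh1"] or "000",  # documented default
        fh2=match["fh2"],
    )


print(parse_subset_name("narr-a_221_hgtprs.subset"))
print(parse_subset_name("gfs_3_tmp2m_000_180"))  # hypothetical name
```

The optional `.subset` suffix is accepted but not required, matching the note that it may be dropped inside the /subsets subdirectory.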