Configuration#
Note
Support for full geographic meshes exists; however, there is currently an issue with computing surface normals. Thus, only projected coordinate systems should be used.
Note
Regardless of the input coordinate system, all input points are specified in latitude and longitude in WGS84.
Schema#
The config file is a JSON file. However, it does support C-style comments: // and /** **/ are both valid.
There are a few required sections: modules, meshes, forcing.
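As a rough illustration of the comment support described above, the sketch below strips // and /* */ comments before handing the text to a standard JSON parser. It is not CHM's own parser; the function names strip_c_comments and load_commented_json are hypothetical.

```python
import json


def strip_c_comments(text: str) -> str:
    """Remove // line comments and /* ... */ block comments outside strings."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Keep escaped characters (e.g. \") verbatim.
                out.append(text[i + 1])
                i += 1
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif text.startswith("//", i):
            # Skip to the end of the line; the newline itself is kept.
            while i < n and text[i] != "\n":
                i += 1
            continue
        elif text.startswith("/*", i):
            # Skip to the closing */ (or to the end if it is unterminated).
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def load_commented_json(text: str) -> dict:
    """Parse JSON text that may contain C-style comments."""
    return json.loads(strip_c_comments(text))
```

Comment markers inside quoted strings (for example the // in a URL) are left untouched, which is why the sketch tracks whether it is inside a string.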
The general layout of a CHM config JSON file is
{
"option":
{
"option_a": true,
"option_b": 1234,
...
},
"modules":
[
"module1",
"module2",
...
],
"config":
{
"module1":
{
...
},
...
},
"meshes":
{
...
},
"forcing":
{
...
},
"output":
{
...
}
}
For every section, if a top-level key:value pair is found and the value contains ".json", that file is loaded and inserted into this section. The key name is not used and may be anything. Key names are enclosed in double quotes (" "). Although it tends to make more sense to arrange the keys in the order shown, the order does not matter (anywhere) and the file will be read correctly.
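The ".json" insertion rule can be pictured with the following sketch. It assumes the loaded file's top-level keys are merged directly into the enclosing section and that the referencing key is discarded. The helper expand_includes and its load callback are hypothetical, not part of CHM.

```python
import json
from typing import Callable


def expand_includes(section: dict, load: Callable[[str], dict]) -> dict:
    """Replace top-level "<any key>": "<file>.json" pairs with the file contents."""
    expanded = {}
    for key, value in section.items():
        if isinstance(value, str) and ".json" in value:
            # The key name is discarded; only the loaded contents are kept.
            expanded.update(load(value))
        else:
            expanded[key] = value
    return expanded


# In-memory stand-in for files on disk, so the sketch is self-contained.
files = {"mystations.json": '{"some_station": {"file": "somestation.txt"}}'}
output_section = {
    "more_stations": "mystations.json",
    "UpperClearing": {"file": "uc.txt"},
}
merged = expand_includes(output_section, lambda path: json.loads(files[path]))
```

Here merged holds both some_station and UpperClearing, and the more_stations key is gone.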
Warning
The modules key is an array and requires the use of [ ]
Warning
Do not prefix a number with zero (0). This is the octal prefix and it will cause the JSON parser to choke on lines that otherwise look fine.
Note
A user can specify a number as "5" or 5. Internally, CHM converts it to a numeric type. Both are therefore fine, although the non-string form should be preferred. The same applies to "true" and true.
Note
Boolean types are case sensitive.
Sections#
option#
This section contains options for CHM and the simulation in general. This is a required section.
{
"option":
{
"debug_level": "debug",
"prj_name": "SnowCast",
"startdate": "20180501T050000",
"enddate": "20180501T060000"
}
}
- point_mode#
- Type:
{ }
- Required:
No
Point mode selects that the model should be run in point mode, versus distributed mode.
There is one optional key that may be specified:
forcing (string)
forcing needs to correspond to a specific input point as defined in the forcing section.
Usage of this key also requires adding point_mode to the module list. Lastly, no modules which are defined parallel:domain may be used when point_mode is enabled.
"point_mode":
{
"forcing":"UpperClearing"
},
"point_mode":
{
// empty to just enable it
},
- notification_script#
- Type:
string
- Default:
None
Path to a script to call upon model execution. This is useful for sending a notification to a computer or phone upon the completion of a long model run.
"notification_script":"./finished.sh"
For example, finished.sh might trigger a notification.
- debug_level#
- Type:
string
- Default:
“Debug”
This controls the verbosity of the output. Options are:
verbose [ all messages ]
debug [ most messages useful for debugging ]
warning [ only warnings ]
error [ only errors which terminate model execution ]
Currently most useful internal messages are debug level.
"debug_level":"debug"
- startdate#
- Type:
string
- Default:
None
Allows for a different start time than that specified by the input timeseries.
In the same ISO format as the forcing data: YYYYMMDDTHHMMSS.
"startdate":"20010501T000000"
- enddate#
- Type:
string
- Default:
None
Allows for a different end time than that specified by the input timeseries.
In the same ISO format as the forcing data: YYYYMMDDTHHMMSS. If enddate == startdate, then one timestep is run.
"enddate":"20010502T000000"
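The date strings above can be parsed with a standard strptime pattern. The sketch below also counts the timesteps implied by a start and end date, assuming both endpoints are inclusive (inferred from the statement that equal dates run one timestep) and assuming a timestep length supplied by the caller. The names parse_chm_date and count_timesteps are hypothetical.

```python
from datetime import datetime, timedelta

# Pattern matching strings such as "20180501T050000".
CHM_DATE_FORMAT = "%Y%m%dT%H%M%S"


def parse_chm_date(text: str) -> datetime:
    """Parse a YYYYMMDDTHHMMSS string into a datetime."""
    return datetime.strptime(text, CHM_DATE_FORMAT)


def count_timesteps(startdate: str, enddate: str, timestep: timedelta) -> int:
    """Number of timesteps from startdate to enddate, both inclusive (assumed)."""
    start, end = parse_chm_date(startdate), parse_chm_date(enddate)
    if end < start:
        raise ValueError("enddate must not be earlier than startdate")
    return (end - start) // timestep + 1
```

With an hourly timestep, the option example earlier in this section (05:00 to 06:00 on the same day) would cover two timesteps under these assumptions.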
modules#
Modules to run. These are a comma separated list of keys. This is a required section.
A few notes:
order as defined in this list has no bearing on the order modules execute
may be commented out to remove them from execution
names are case sensitive
the point_mode module is required to enable point mode, in addition to being enabled in option.point_mode.
Note
Modules are in a list ([ ])
"modules":
[
"Liston_wind",
"Burridge_iswr",
"slope_iswr",
"Liston_monthly_llra_ta",
"kunkel_rh",
"Thornton_p",
"Walcek_cloud",
"Sicart_ilwr",
"Harder_precip_phase",
"snobal",
"Gray_inf",
"Richard_albedo"
]
remove_depency#
Under some cases, a cyclic dependency is created when a module B depends on module A’s output, and module A depends on module B’s output. There is no way to automatically resolve this. It requires the modeller to manually break the cycle and force one module to run ahead of another (essentially time-lagged).
An example of this occurring is that the albedo model requires knowledge of SWE, provided by the snowmodel. However, the snowmodel requires albedo to run. Therefore, the modeller may define that the albedo routine is run first, then the snowpack model.
Specifically: if module A depends on B (A->B), then to remove the dependency of B from A, specify it as "A":"B". This can be thought of as A needs to come before B. If the specified modules are not added to the modules list, they are ignored.
"remove_depency":
{
"Richard_albedo":"snobal"
}
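To make the effect of breaking a cycle concrete, the sketch below models module dependencies as a graph and orders them with Python's standard graphlib. This is only an illustration of the idea and not how CHM resolves module order internally. The function execution_order and the dependency dictionary are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+


def execution_order(depends_on: dict, remove_dependency: dict) -> list:
    """Order modules so that each runs after the modules it depends on.

    depends_on maps a module to the set of modules whose output it needs.
    remove_dependency maps "A" to "B": A no longer waits for B, so A runs first.
    """
    graph = {module: set(deps) for module, deps in depends_on.items()}
    for first, later in remove_dependency.items():
        # Entries naming modules that are not loaded are ignored.
        if first in graph and later in graph:
            graph[first].discard(later)
    return list(TopologicalSorter(graph).static_order())


# The albedo routine needs SWE from the snow model, which in turn needs albedo.
cyclic = {"Richard_albedo": {"snobal"}, "snobal": {"Richard_albedo"}}
order = execution_order(cyclic, {"Richard_albedo": "snobal"})
```

Without the removal entry, static_order raises CycleError, which mirrors the situation where the cycle cannot be resolved automatically.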
config#
Each module, upon creation, is provided a configuration instance. These configuration data are set by creating a key that exactly matches the module name. If a section is added, but that module isn’t specified, the section is ignored.
- module_name#
- Type:
{ }
For example:
"config":
{
"slope_iswr":
{
"no_slope":true
}
},
If the configuration is sufficiently large or cumbersome, it may be best to have it in a separate file. This can be specified as
//consider this in CHM.json
"config":
{
"simple_canopy":"canopy.json"
}
And canopy.json is
"canopy":
{
"LAI":3
}
Note that the sub-keys for a module’s configuration are entirely dependent upon the module. Please see the module’s help for specific options.
meshes#
This section defines the mesh and, optionally, the parameter files to use. It is a required section. This section has two keys:
- mesh#
- Type:
string
File path to the .mesh file produced by mesher.
- parameters#
- Type:
{ }
Optionally, a set of key:value pairs to other .param files that contain extra parameters to be used. These are in the format { "file":"<path>" }
"meshes":
{
"mesh":"meshes/granger30.mesh",
"parameters":
{
"file":"meshes/granger30.param",
"file":"meshes/granger30_surface.param"
}
}
If CHM is run with MPI ranks > 1, then pre-partitioned HDF5-based meshes need to be used. See partition and meshgen for how to convert the mesh.
When using a partitioned mesh, the following is sufficient:
"meshes": {
"mesh":"FABDEM-clip_mesh.np160.partition"
}
There is currently a limitation in CHM such that the .partition and .mesh folder need to be in the CHM
project root.
parameter_mapping#
The parameters may be classified values for use in a look-up table. For example, the landcover may be a numeric class value and values such as LAI need to be obtained from a lookup table. These parameters may be either specified directly in the file or located in another file:
"parameter_mapping":
{
"soil":"parameters/wolf_soil_param.json"
}
or as a key:value pair. In all cases, the parameter name is how it will be referenced in the module that is looking for it. Please see the module’s documentation for what the expected format is.
{
"landcover":
{
"20":
{
"desc":"lake",
"is_waterbody":true
},
"31":
{
"desc":"snow ice"
}
}
}
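A module consuming such a mapping would typically look up a triangle's class value and read a property from the matching entry. The sketch below shows one way that lookup might work, assuming the class values are stored as numbers on the mesh and the mapping keys are their integer string form. The helper lookup is hypothetical.

```python
# The landcover mapping from the example above, as a Python dictionary.
parameter_mapping = {
    "landcover": {
        "20": {"desc": "lake", "is_waterbody": True},
        "31": {"desc": "snow ice"},
    }
}


def lookup(mapping: dict, parameter: str, class_value: float, key: str, default=None):
    """Return one property of the class entry, or default if it is not defined."""
    # Mesh parameters are assumed numeric; JSON keys are always strings.
    entry = mapping[parameter][str(int(class_value))]
    return entry.get(key, default)


is_lake_water = lookup(parameter_mapping, "landcover", 20.0, "is_waterbody", False)
```

Classes that omit a property, such as is_waterbody for class 31, fall back to the supplied default.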
output#
Output may be either to an ascii-timeseries for a specific triangle on the mesh or it may be the entirety of the mesh. The two output types are set by:
a key named "vtu":{ ... } or "ugrid":{ ... } will enable the entire mesh output
all other keys ("some_name":{ ... }) are assumed to be the names of output timeseries
Both mesh and timeseries can be used together.
- output_dir#
- Type:
string
- Default:
“output”
The output directory name.
timeseries output#
The name of the timeseries key is used to uniquely identify this output: "output_name":{ ... }.
If using point_mode, this name corresponds to the output key. If a lot of stations are to be
output, consider keeping them in a separate file and inserting using the top-level “.json” behaviour.
There is currently no check that one MPI rank finds the output triangle. Any rank that doesn’t have this output triangle will raise a warning, but exactly 1 rank should report that it finds the output triangle. If the file is empty, confirm that a rank does find the triangle and confirm the output lat long is correct.
- longitude#
- Type:
float
WGS84 longitude of output point. The triangle that contains this point is then selected for output. An error is raised if no triangle contains the point.
- latitude#
- Type:
float
WGS84 latitude of output point. The triangle that contains this point is then selected for output. An error is raised if no triangle contains the point.
- file#
The output file name. The output is in csv format and each column is a variable.
"output":
{
"more_stations":"mystations.json",
"UpperClearing":
{
"longitude": "-115.175362",
"latitude": "50.956547",
"file": "uc.txt"
}
}
where mystations.json would look like
{
"some_station":
{
"longitude": "-115.175362",
"latitude": "50.956547",
"file": "somestation.txt"
}
}
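Selecting the output triangle amounts to a point-in-triangle search. The sketch below uses a sign test on 2D cross products and a linear scan, assuming the output point has already been converted into the same planar coordinates as the mesh vertices. CHM's actual search is not shown in this section and may differ (for instance by using a spatial index); all names here are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def _cross(o: Point, a: Point, b: Point) -> float:
    """Z component of the cross product of (a - o) and (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def contains(tri: Triangle, p: Point) -> bool:
    """True if p lies inside the triangle or on its boundary."""
    a, b, c = tri
    d1, d2, d3 = _cross(a, b, p), _cross(b, c, p), _cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    # The point is inside when it is on the same side of all three edges.
    return not (has_neg and has_pos)


def find_output_triangle(triangles: Sequence[Triangle], p: Point) -> Optional[int]:
    """Index of the first triangle containing p, or None if there is none."""
    for index, tri in enumerate(triangles):
        if contains(tri, p):
            return index
    return None
```

In an MPI run, each rank would search only its own triangles, so a result of None on most ranks is expected as long as one rank finds the point.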
Entire mesh#
The entire mesh may be written to a netCDF UGRID compliant file or Paraview’s vtu format. Both can be used for visualization in Paraview and for analysis.
For general usage, the ugrid file is recommended as the output format. For more details on the output files, please see the output section.
Note
The default behaviour of both ugrid and vtu outputs is not to output any files. The user must specify the output frequency or timing.
- base_name#
- Type:
string
The base file name to be used. Default is the same name as the output folder.
Warning
NCZarr does not currently support creating zarr files in MPI mode. Do not use "format":"zarr" at the moment.
- format#
- Type:
string
- Default:
“netcdf”
Storage backend for ugrid outputs. Options are netcdf (writes a .nc file) or zarr (writes a .zarr store via NCZarr). Only applies to ugrid outputs.
- variables#
- Type:
[ "variable_name", ... ]
The default behaviour is to write every variable. This may produce an undesirable amount of output. This option takes a list of variables to output.
"variables": [
"t",
"U_2m_above_srf",
"swe",
"iswr"
],
- frequency#
- Type:
int
- Default:
1
Frequency can be set to write every N timesteps. If the automatic checkpoint system is used, this might not produce a consistent output frequency and time. For example, suppose daily midnight output was chosen (frequency:24) as the model simulation starts at 00:00. If the auto-checkpoint then suspends at the 8am timestep, the next output will be at 8am instead of midnight.
- rotate_frequency#
- Type:
int
- Default:
0
Only applies to ugrid outputs. The timestep frequency to create a new ugrid file at. Rotated files are named <base_name>_YYYYMMDDTHHMMSS.nc and the cadence is preserved across checkpoint resume.
- chunk_time_len#
- Type:
int
- Default:
unset
Only applies to ugrid outputs. Sets an explicit time chunk length (in timesteps). Must not be set alongside
chunk_target_mb.
- chunk_target_mb#
- Type:
float
- Default:
unset
Only applies to ugrid outputs. Sets the target chunk size per variable (in MB). Must not be set alongside
chunk_time_len.
- write_all_parameters#
- Type:
boolean
- Default:
true
Disables/enables writing parameters to the output.
- output_parameters#
- Type:
[ "parameter_name", ... ]
Controls which parameters are written when write_all_parameters is enabled. If omitted, the defaults are Elevation, Slope, and Aspect.
- write_ghost_neighbors#
- Type:
boolean
- Default:
false
Write each MPI rank’s ghost face data to vtu output. Only possible with vtu output.
- specific_datetime#
- Type:
string
- Default:
””
Output at a specific date-time, given in the ISO format, e.g.,
"specific_datetime": "20191227T160000"
- specific_time#
- Type:
string
- Default:
””
Output at a specific time every day, given in a “HH:MM” 24hr-format, e.g.,
"specific_time": "14:00"
- compress#
- Type:
bool
- Default:
true
Enable shuffle + deflate level 5 to improve compression. Only on ugrid output.
- bitgroom#
- Type:
bool
- Default:
true
Enable bitgrooming. Only on ugrid output.
Example:
All of the frequency options can be mixed together, allowing more complex output frequency selections
"output":
{
"vtu": {
"base_name": "SC",
"variables": [
"t",
"U_2m_above_srf",
"swe",
"iswr"
],
"output_parameters": [
"Elevation",
"Slope",
"Aspect"
],
"frequency": "24",
"specific_datetime": "20191227T160000",
"write_all_parameters": false,
"write_ghost_neighbors": true
},
"ugrid": {
"format": "zarr",
"variables": [
"t",
"U_2m_above_srf"
],
"output_parameters": [
"Elevation",
"Slope",
"Aspect"
],
"frequency": "1",
"specific_datetime": "20191227T160000",
"write_all_parameters": false
}
}
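One way to read the mixing of frequency options is as a logical OR: output is written whenever any configured trigger matches the current timestep. The sketch below encodes that reading. The OR semantics and the counting of timesteps from zero are assumptions, and the function should_write is hypothetical.

```python
from datetime import datetime
from typing import Optional


def should_write(
    step: int,
    now: datetime,
    frequency: Optional[int] = None,
    specific_datetime: Optional[str] = None,
    specific_time: Optional[str] = None,
) -> bool:
    """True if any configured trigger matches this timestep (assumed OR logic)."""
    if frequency and step % frequency == 0:
        return True
    if specific_datetime and now == datetime.strptime(specific_datetime, "%Y%m%dT%H%M%S"):
        return True
    if specific_time and now.strftime("%H:%M") == specific_time:
        return True
    return False
```

With no trigger configured the function never returns True, matching the note that no files are written unless a frequency or timing is given.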
forcing#
Input forcing can be either an ASCII timeseries or a NetCDF file. Please see forcing for more details on the file specifications.
Warning
Including json sub-config files in this section is not supported
Input forcing stations do not need to be located within the simulation domain. Therefore, they can act as ‘virtual stations’ so as to use reanalysis data, or met stations located outside of the basin. However, because global datasets cannot generally be loaded all at once into memory without significant memory pressure, the met loader only loads points that are within the bounding box of the mesh. For MPI runs, it only loads points within each rank’s mesh sub-domain bounding box. If the number of stations within the bounding box cannot fulfill num_stations_to_use, the bounding box is expanded by 25% (up to 3 times).
An example of this is shown below, where each black point is a virtual station, representing the center for a NetCDF grid cell from a NWP product.
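The bounding-box expansion described above can be sketched as a small loop. This assumes that "expanded by 25%" means the width and height each grow by 25% around the box centre, which the text does not state explicitly. The names expand and select_stations are hypothetical.

```python
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Station = Tuple[float, float]  # (x, y)


def expand(bbox: BBox, fraction: float = 0.25) -> BBox:
    """Grow the box about its centre so width and height increase by fraction."""
    xmin, ymin, xmax, ymax = bbox
    dx = (xmax - xmin) * fraction / 2.0
    dy = (ymax - ymin) * fraction / 2.0
    return (xmin - dx, ymin - dy, xmax + dx, ymax + dy)


def select_stations(
    stations: List[Station],
    bbox: BBox,
    num_stations_to_use: int,
    max_expansions: int = 3,
) -> List[Station]:
    """Stations inside the box, expanding it until enough are found or the limit is hit."""
    for attempt in range(max_expansions + 1):
        xmin, ymin, xmax, ymax = bbox
        inside = [s for s in stations if xmin <= s[0] <= xmax and ymin <= s[1] <= ymax]
        if len(inside) >= num_stations_to_use or attempt == max_expansions:
            return inside
        bbox = expand(bbox)
    return []
```

If the limit is reached, the sketch simply returns whatever stations were found; how CHM handles that case is not described here.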
- UTC_offset#
- Type:
int
- Default:
0
If the input timeseries is not UTC, then this is the correction to account for the UTC offset (all solar radiation calculations are in UTC). This is positive west!
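The sign convention can be illustrated as follows, assuming that "positive west" means the offset is the number of hours the local time lags UTC, so that it is added to the local timestamp. This interpretation and the helper to_utc are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def to_utc(local_time: datetime, utc_offset_hours: int) -> datetime:
    """Convert a local timestamp to UTC using a positive-west offset (assumed)."""
    return local_time + timedelta(hours=utc_offset_hours)


# A station eight hours behind UTC would use "UTC_offset": 8.
noon_local = datetime(2000, 1, 1, 12, 0)
noon_in_utc = to_utc(noon_local, 8)
```

Under this reading, local noon with an offset of 8 corresponds to 20:00 UTC on the same day.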
- use_netcdf#
- Type:
boolean
- Default:
false
Specify if a NetCDF (.nc) file will be used. Cannot be used along with ASCII inputs!
- station_search_radius#
- Type:
double
- Default:
None
The search radius (meters) surrounding any given triangle within which to search for a station. This is used to ensure only “close” stations are used. Cannot be used when station_N_nearest is set. Based off the center of the triangle.
- station_N_nearest#
This is removed in favour of
num_stations_to_use.
- num_stations_to_use#
- Type:
int
- Default:
5
The number of forcing inputs, e.g., netcdf cell centres or station timeseries, to include for the interpolation at a triangle.
station_search_radius and num_stations_to_use cannot both be specified. If neither is specified, then num_stations_to_use:5 is used as the default.
If the interpolant mode is nearest, then this is automatically set to 1.
- interpolant#
- Type:
string
- Default:
“spline”
Chooses either thin plate spline with tension (spline) or inverse distance weighting (idw). Nearest selects the closest station and only uses that with no interpolation.
"interpolant" : "idw"
"interpolant" : "spline"
"interpolant" : "nearest"
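For readers unfamiliar with the idw and nearest options, the sketch below shows textbook inverse distance weighting and nearest-station selection in planar coordinates. The power of 2 is a common default and an assumption here, and CHM's own implementation (including the spline option) is not reproduced.

```python
import math
from typing import List, Tuple

Station = Tuple[float, float, float]  # (x, y, value)


def idw(stations: List[Station], x: float, y: float, power: float = 2.0) -> float:
    """Inverse-distance-weighted average of station values at (x, y)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in stations:
        distance = math.hypot(sx - x, sy - y)
        if distance == 0.0:
            # The point coincides with a station: use its value directly.
            return value
        weight = 1.0 / distance**power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def nearest(stations: List[Station], x: float, y: float) -> float:
    """Value of the closest station, with no interpolation."""
    return min(stations, key=lambda s: math.hypot(s[0] - x, s[1] - y))[2]
```

A point midway between two stations receives the mean of their values under idw, while nearest returns the value of whichever station is closer.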
Note
ASCII and NetCDF inputs cannot be mixed. It is one or the other.
ASCII timeseries#
This is given as "station_name":{ ... }. If using point_mode, then the value station_name must exactly match the input used for option.point_mode.
- file#
A relative or absolute path to an input forcing file
- latitude#
- Type:
double
Latitude of the input station, WGS84. Positive North. No “N” or “S” suffix.
- longitude#
- Type:
double
Longitude of the input station, WGS84. Positive East. No “E” or “W” suffix.
- elevation#
- Type:
double
Elevation is given in meters. It does not need to be equal to the elevation of the triangle upon which it lies if the station is located in the simulation domain. This value is used in the lapse rate equations to interpolate the data.
If required, forcing station definitions can be located in an external
file. For the external file, the name of the key doesn’t matter. The
external file should contain the stations in the format as per above. It
does not require an additional "forcing": section definition.
"forcing":
{
"num_stations_to_use": 1,
"interpolant": "nearest",
"some_station":
{
//definition
},
"reanalysis_extract_1": "external_file_1.json",
"reanalysis_extract_2": "external_file_2.json"
}
where external_file_*.json looks like
{
"station1":
{
//details here
},
"station2":
{
//details here
}
}
Filters#
Filters perform an operation on the data prior to being passed to a module. They allow for things such as wind-undercatch corrections to be done on the fly.
If a filter is defined, it must be defined on the forcing file and operate upon a variable that exists in the forcing data. They are given as:
"filter_name": { ... }. The configuration values are filter-specific; please see the filter documentation for what is required. Multiple filters may be specified.
"buckbrush":
{
"file": "bb_m_2000-2008",
"latitude": 60.52163,
"longitude": -135.197151,
"elevation": 1305,
"filter":
{
"scale_wind_speed": {
"Z_F": 4.6,
"variable": "u"
},
"goodison_undercatch": {
"variable": "p"
}
}
}
Warning
Filters run in the order defined in the configuration file.
Example#
"forcing":
{
"UTC_offset": 8,
"buckbrush":
{
"file": "bb_m_2000-2008",
"latitude": 60.52163,
"longitude": -135.197151,
"elevation": 1305,
"filter":
{
"scale_wind_speed":
{
"Z_F": 4.6,
"variable": "u"
},
"goodison_undercatch":
{
"variable":"p"
}
}
},
"alpine":
{
"file": "alp_m_2000-2008",
"latitude": 60.567267,
"longitude": -135.184652,
"elevation": 1559,
"filter": {
"scale_wind_speed": {
"Z_F": 2.5,
"variable": "u"
},
"goodison_undercatch":
{
"variable":"p"
}
}
}
}
NetCDF#
To use a NetCDF file, set forcing:use_netcdf=true and specify the NetCDF file.
"forcing":
{
"use_netcdf": true,
"file":"forcing_file.nc"
}
If a multipart file is to be used, simply replace the NetCDF file name with the JSON list.
"forcing":
{
"use_netcdf": true,
"file":"metadata.json",
"num_stations_to_use": 1,
"interpolant": "nearest"
}
Please see the NetCDF forcing section for more details.
Warning
Using NetCDF forcing together with point_mode is not supported.
Filters#
Filters are the same as for ASCII with one important distinction: every specified filter is run for every virtual station (i.e., grid cell centre).
"filter": {
"scale_wind_speed": {
"Z_F": "40",
"variable": "u"
}
}
Example#
"forcing": {
"UTC_offset": "0",
"use_netcdf": true,
"file": "GEM-CHM_2p5_west_2017100106_2018080105.nc",
"filter": {
"scale_wind_speed": {
"Z_F": "40",
"variable": "u"
}
}
}
checkpoint#
CHM can save its state after a timestep, allowing CHM to resume from this timestep. Further details on the output format can be found in the checkpointing section.
To enable checkpoints, save_checkpoint must be enabled and one of the on_* options must be supplied. Multiple
on_* can be combined. For example, on_frequency and on_last can be combined to produce
checkpoints every on_frequency timesteps as well as on the last timestep.
Warning
The configuration file can be modified in between checkpoint runs. However, the changes MUST be coherent with what is saved in the checkpoint file. For example the following is not allowed: a new module is added that does not have checkpointed data.
- save_checkpoint#
- Type:
boolean
- Default:
false
Enable checkpointing. Must be set to true to enable checkpointing. One of the options for when to checkpoint must also be set.
- on_frequency#
- Type:
int64
- Default:
empty
The frequency of checkpointing. Checkpoints every on_frequency timesteps.
- specific_datetime#
- Type:
string
- Default:
””
Checkpoints at a specific date-time, given in the ISO format, e.g.,
"specific_datetime": "20191227T160000"
- specific_time#
- Type:
string
- Default:
””
Checkpoints at a specific time every day, given in a “HH:MM” 24hr-format, e.g.,
"specific_time": "14:00"
- on_last#
- Type:
bool
- Default:
false
Checkpoint on the last timestep. Can be used with on_frequency, but does not require on_frequency to be set.
- on_wallclock_limit#
- Type:
bool
- Default:
false
If the environment variable CHM_WALLCLOCK_LIMIT is detected at simulation start, then CHM will track how long it has left. When it only has minutes_of_wallclock minutes left (default = 2 min), it checkpoints. CHM_WALLCLOCK_LIMIT needs to be created from the scheduler environment. How to do so for PBS Pro and SLURM is given below:
# All PBS / SLURM options here
# [...]
# Set the env var (keep only the line that matches the scheduler in use)
# SLURM:
export CHM_WALLCLOCK_LIMIT=$(squeue -j $SLURM_JOB_ID -h --Format TimeLimit)
# PBS Pro:
export CHM_WALLCLOCK_LIMIT=$(qstat -f $PBS_JOBID | sed -rn 's/.*Resource_List.walltime = (.*)/\1/p')
# Then run CHM
mpirun [...] CHM -f [...]
- minutes_of_wallclock#
- Type:
int64
- Default:
2
Number of minutes before the wallclock expires to begin checkpointing. Only has an effect if
on_wallclock_limit=true.
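The wallclock logic can be pictured with the sketch below, which converts a scheduler time limit into seconds and decides when the remaining time drops to the minutes_of_wallclock threshold. It assumes the limit is formatted as [D-]HH:MM:SS (or a shorter MM:SS form); how CHM itself parses CHM_WALLCLOCK_LIMIT is not documented here, and both function names are hypothetical.

```python
def walltime_to_seconds(text: str) -> int:
    """Convert "[D-]HH:MM:SS" (or "MM:SS") into a number of seconds."""
    text = text.strip()
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in text.split(":")]
    if len(parts) > 3:
        raise ValueError(f"unrecognised walltime: {text!r}")
    # Left-pad with zeros so that "MM:SS" is read as zero hours.
    hours, minutes, seconds = [0] * (3 - len(parts)) + parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def should_checkpoint(elapsed_seconds: float, limit: str, minutes_of_wallclock: int = 2) -> bool:
    """True once the remaining wallclock is at or below the threshold."""
    remaining = walltime_to_seconds(limit) - elapsed_seconds
    return remaining <= minutes_of_wallclock * 60
```

The call to strip is there because scheduler output is often padded with whitespace.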
- load_checkpoint_path#
- Type:
string
- Default:
empty
Path to checkpoint file to load from (specifically, the json file). Can be used with the other checkpointing options.
- auto_resume#
- Type:
bool
- Default:
false
Set to true to auto-resume from the most recent checkpoint that exists in output_folder/checkpoint. Doing so allows for easily and repeatedly resuming from a checkpoint file.
Note
Upon successful completion of a CHM simulation, a sentinel file is written to the output folder: “<output_folder>/clean_exit”. This will not be written if CHM suspends due to a wallclock limit. Therefore, the intent of this file is to allow repeated automatic requeuing of a job on HPC systems that have short wallclock limits, i.e., keep requeuing until that file is found.
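A requeue wrapper only needs to test for the sentinel file. The sketch below shows that check in isolation; the helper name finished_cleanly is hypothetical, and the actual resubmission command depends on the scheduler, so it is left as a comment.

```python
from pathlib import Path


def finished_cleanly(output_folder: str) -> bool:
    """True if CHM wrote the clean_exit sentinel into the output folder."""
    return (Path(output_folder) / "clean_exit").exists()


if not finished_cleanly("output"):
    # The run was suspended or has not finished: resubmit the job here
    # using whatever mechanism the scheduler provides.
    pass
```

The folder name "output" matches the documented default for output_dir; a different output directory would need to be passed instead.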
Basic checkpoint example
"checkpoint":
{
"save_checkpoint": true,
"on_frequency": 4,
"on_last": true,
"load_checkpoint_path":"output/checkpoint/checkpoint_20001001T140000.np1.json"
}
An example of auto save and resume. This will checkpoint with 5 minutes of wall clock left. Then, as long as the same output directory is used, a subsequent run will detect the most recent checkpoint and resume from it.
"checkpoint":
{
"save_checkpoint": true,
"on_wallclock_limit": true,
"minutes_of_wallclock": 5,
"auto_resume": true
}