Configuration#

Note

Support for full geographic meshes exists; however, there is currently an issue with computing surface normals. Thus, only projected coordinate systems should be used.

Note

Regardless of the input coordinate system, all input points are specified in latitude and longitude in WGS84.

Schema#

The config file is a JSON file. However, it does support C-style comments: // and /** **/ are both valid.
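Generic JSON tooling usually rejects comments, so a commented config cannot be passed straight to a standard parser. The following Python sketch shows one way to strip `//` and `/* */` comments before parsing. `strip_comments` and `load_commented_json` are hypothetical helper names, not part of CHM, and CHM's own parser may treat edge cases differently.

```python
import json


def strip_comments(text: str) -> str:
    """Remove // and /* */ comments that sit outside JSON string literals."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                out.append(text[i + 1])  # keep the escaped character verbatim
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Line comment: skip up to (but not including) the newline.
            while i < n and text[i] != "\n":
                i += 1
        elif text.startswith("/*", i):
            # Block comment: skip past the closing */ (or to the end if unclosed).
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def load_commented_json(text: str):
    """Parse JSON text that may contain C-style comments."""
    return json.loads(strip_comments(text))
```

Comment markers inside quoted strings (for example a URL containing `//`) are left untouched, which is why the sketch tracks string state instead of using a single regular expression.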

There are a few required sections: modules, meshes, forcing.

The general layout of a CHM config JSON file is

{
   "option":
   {
      "option_a": true,
      "option_b": 1234,
      ...
   },
   "modules":
   [
      "module1",
      "module2",
      ...
   ],
   "config":
   {
      "module1":
      {
         ...
      },

      ...
   },
   "meshes":
   {
      ...
   },
   "forcing":
   {
     ...
   },
   "output":
   {
      ...
   }
}

For every section, if a top-level key:value pair is found and the value contains “.json”, that file is loaded and inserted into the section. The key name is not used and may be anything. Key names are enclosed in double quotes (" "). Although it tends to make more sense to arrange the keys in the order shown, the order does not matter (anywhere) and will be read correctly.
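The “.json” insertion rule can be pictured with a short Python sketch. It assumes that the top-level keys of the referenced file take the place of the referencing pair; how CHM merges the content internally is not specified here. `expand_includes` and `read_text` are hypothetical names, and the file contents are supplied from a dictionary so the sketch does not touch the disk.

```python
import json


def expand_includes(section, read_text):
    """Return a copy of `section` with top-level ".json" references expanded.

    `read_text` is a callable mapping a path to that file's text.
    """
    expanded = {}
    for key, value in section.items():
        if isinstance(value, str) and ".json" in value:
            # Assumption: the file's keys replace this pair; the key name is dropped.
            expanded.update(json.loads(read_text(value)))
        else:
            expanded[key] = value
    return expanded


# Illustrative data only: an "output" section that pulls stations from a file.
files = {"mystations.json": '{"some_station": {"file": "somestation.txt"}}'}
output = {
    "more_stations": "mystations.json",
    "UpperClearing": {"file": "uc.txt"},
}
merged = expand_includes(output, files.__getitem__)
```

After expansion, `merged` holds `some_station` and `UpperClearing` side by side, and the placeholder key `more_stations` is gone.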

Warning

The modules key is an array and requires the use of [ ].

Warning

Do not prefix a number with zero (0). This is the octal prefix and it will cause the JSON parser to choke on lines that otherwise look fine.

Note

A user can specify a number as "5" or 5. Internally, CHM converts it to a numeric type. Both are accepted, but the non-string form should be preferred. The same applies to "true" and true.

Note

Boolean types are case sensitive.
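As an illustration of the string-to-value behaviour described in these notes, here is a minimal Python sketch. It is an assumption about the general idea, not CHM's actual conversion code; `coerce` is a hypothetical name, and the sketch simply leaves a miscased boolean such as "True" as a string because what CHM does with it is not stated here.

```python
def coerce(value):
    """Convert quoted numbers and lowercase booleans to native types."""
    if not isinstance(value, str):
        return value  # already a native JSON type
    if value == "true":
        return True
    if value == "false":
        return False
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value  # plain strings (and miscased booleans) pass through
```

For example, `coerce("5")` and `coerce(5)` both give the integer 5, while `coerce("debug")` stays a string.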

Sections#

option#

This section contains options for CHM and the simulation in general. This is a required section.

{
   "option":
   {
        "debug_level": "debug",
        "prj_name": "SnowCast",
        "startdate": "20180501T050000",
        "enddate": "20180501T060000"

   }
}
point_mode#
Type:

{ }

Required:

No

Selects that the model should be run in point mode rather than distributed mode.

There is one optional key that may be specified:

  • forcing (string)

forcing needs to correspond to a specific input point as defined in the forcing section.

Usage of this key also requires adding point_mode to the module list. Lastly, no modules that are defined as parallel:domain may be used when point_mode is enabled.

"point_mode":
{
  "forcing":"UpperClearing"
},
"point_mode":
{
  // empty to just enable it
},
notification_script#
Type:

string

Default:

None

Path to a script to call when the model run completes. This is useful for sending a notification to a computer or phone upon the completion of a long model run.

"notification_script":"./finished.sh"

An example of what finished.sh might do is below, which triggers a notification.

debug_level#
Type:

string

Default:

“Debug”

This controls the verbosity of the output. Options are:

  • verbose [ all messages ]

  • debug [ most messages useful for debugging ]

  • warning [ only warnings ]

  • error [ only errors which terminate model execution ]

Currently most useful internal messages are debug level.

"debug_level":"debug"
startdate#
Type:

string

Default:

None

Allows for a different start time than that specified by the input timeseries. In the same ISO format as the forcing data: YYYYMMDDTHHMMSS.

"startdate":"20010501T000000"
enddate#
Type:

string

Default:

None

Allows for a different end time than that specified by the input timeseries. In the same ISO format as the forcing data: YYYYMMDDTHHMMSS. If enddate == startdate, then one timestep is run.

"enddate":"20010502T000000"

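The compact date-time strings used for startdate and enddate (for example 20010501T000000) can be parsed with Python's standard library. The sketch below also counts timesteps under the assumption that both endpoints are simulated, which is consistent with the statement that equal start and end dates run one timestep. `parse_chm_datetime` and `count_timesteps` are hypothetical helpers, and the timestep length `dt` must be supplied by the caller.

```python
from datetime import datetime, timedelta

CHM_DATETIME_FORMAT = "%Y%m%dT%H%M%S"  # e.g. 20010501T000000


def parse_chm_datetime(text: str) -> datetime:
    """Parse a compact ISO 8601 date-time string."""
    return datetime.strptime(text, CHM_DATETIME_FORMAT)


def count_timesteps(startdate: str, enddate: str, dt: timedelta) -> int:
    """Number of timesteps from startdate to enddate, endpoints included."""
    start = parse_chm_datetime(startdate)
    end = parse_chm_datetime(enddate)
    if end < start:
        raise ValueError("enddate is before startdate")
    # Assumption: the run includes both the first and the last timestep.
    return (end - start) // dt + 1
```

With hourly forcing, `count_timesteps("20180501T050000", "20180501T060000", timedelta(hours=1))` gives 2 under this assumption.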
modules#

Modules to run. This is a comma-separated list of module names. This is a required section.

A few notes:

  • order as defined in this list has no bearing on the order modules execute

  • may be commented out to remove them from execution

  • names are case sensitive

  • point_mode module is required to enable point mode, in addition to being enabled in option.point_mode.

Note

Modules are in a list ([ ])

"modules":
[
   "Liston_wind",
   "Burridge_iswr",
   "slope_iswr",
   "Liston_monthly_llra_ta",
   "kunkel_rh",
   "Thornton_p",
   "Walcek_cloud",
   "Sicart_ilwr",
   "Harder_precip_phase",
   "snobal",
   "Gray_inf",
   "Richard_albedo"

]

remove_depency#

Under some cases, a cyclic dependency is created when a module B depends on module A’s output, and module A depends on module B’s output. There is no way to automatically resolve this. It requires the modeller to manually break the cycle and force one module to run ahead of another (essentially time-lagged).

An example of this occurring is that the albedo model requires knowledge of SWE, provided by the snow model. However, the snow model requires albedo to run. Therefore, the modeller may define that the albedo routine is run first, then the snowpack model.

Specifically: if module A depends on B (A->B), then to remove the dependency of B from A, specify it as "A":"B".

This can be thought of as A needs to come before B. If the specified modules are not added to the modules list, they are ignored.

"remove_depency":
{
  "Richard_albedo":"snobal"
}
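The effect of breaking a cycle can be sketched with Python's standard `graphlib` module as a stand-in for CHM's own scheduler, which is not shown here. `execution_order` is a hypothetical helper; the dependency sets are illustrative inputs describing which modules each module needs output from.

```python
from graphlib import TopologicalSorter


def execution_order(depends_on, remove_dependency, modules):
    """Order `modules` so that each runs after the modules it depends on.

    depends_on:        {module: set of modules it needs output from}
    remove_dependency: {"A": "B"} drops A's dependency on B, so A can run first
    """
    enabled = set(modules)
    graph = {m: set(depends_on.get(m, ())) & enabled for m in modules}
    for a, b in remove_dependency.items():
        # Entries naming modules that are not enabled are ignored.
        if a in graph and b in graph:
            graph[a].discard(b)
    # Raises graphlib.CycleError if a cycle remains.
    return list(TopologicalSorter(graph).static_order())


deps = {"Richard_albedo": {"snobal"}, "snobal": {"Richard_albedo"}}
order = execution_order(
    deps, {"Richard_albedo": "snobal"}, ["snobal", "Richard_albedo"]
)
```

With the dependency removed, the albedo module sorts ahead of the snow model; without it, the sorter reports a cycle.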

config#

Each module, upon creation, is provided a configuration instance. These configuration data are set by creating a key that exactly matches the module name. If a section is added but that module isn’t specified, the section is ignored.

module_name#
Type:

{ }

For example:

"config":
{

   "slope_iswr":
       {
         "no_slope":true
       }
},

If the configuration is sufficiently large or cumbersome, it may be best to have it in a separate file. This can be specified as

//consider this in CHM.json
"config":
{
    "simple_canopy":"canopy.json"
}


And canopy.json is

"canopy":
{
  "LAI":3
}

Note that the sub-keys for a module’s configuration are entirely dependent upon the module. Please see the module’s help for specific options.

meshes#

This section defines the mesh and, optionally, the parameter files to use. It is a required section. This section has two keys:

mesh#
Type:

string

File path to the .mesh file produced by mesher.

parameters#
Type:

{ }

Optionally, a set of key:value pairs to other .param files that contain extra parameters to be used. These are in the format { "file":"<path>" }

"meshes":
{
 "mesh":"meshes/granger30.mesh",
 "parameters":
 {
   "file":"meshes/granger30.param",
   "file":"meshes/granger30_surface.param"
 }
}
If CHM is run with MPI ranks > 1, then pre-partitioned HDF5-based meshes need to be used. See partition and meshgen for how to convert the mesh.

When using the partitioned mesh, the following is sufficient:

"meshes": {
    "mesh":"FABDEM-clip_mesh.np160.partition"
}

There is currently a limitation in CHM such that the .partition and .mesh folder need to be in the CHM project root.

parameter_mapping#

The parameters may be classified values for use in a look-up table. For example, the landcover may be a numeric class value and values such as LAI need to be obtained from a lookup table. These parameters may be either specified directly in the file or located in another file:

"parameter_mapping":
{
  "soil":"parameters/wolf_soil_param.json"
}

or as a key:value pair. In all cases, the parameter name is how it will be referenced in the module that is looking for it. Please see the module’s documentation for what the expected format is.

{
   "landcover":
   {
      "20":
      {
        "desc":"lake",
        "is_waterbody":true
      },
      "31":
      {
        "desc":"snow ice"
      }
   }
}

output#

Output may be either to an ascii-timeseries for a specific triangle on the mesh or it may be the entirety of the mesh. The two output types are set by:

  • a key named "vtu":{ ... } or "ugrid":{ ... } will enable the entire mesh output

  • all other keys ("some_name":{...}) are assumed to be the names of output timeseries

Both mesh and timeseries can be used together.

output_dir#
Type:

string

Default:

“output”

The output directory name.

timeseries output#

The name of the timeseries key is used to uniquely identify this output: "output_name":{ ... }.

If using point_mode, this name corresponds to the output key. If a lot of stations are to be output, consider keeping them in a separate file and inserting using the top-level “.json” behaviour.

There is currently no check that one MPI rank finds the output triangle. Any rank that doesn’t have this output triangle will raise a warning, but exactly 1 rank should report that it finds the output triangle. If the output file is empty, confirm that a rank does find the triangle and that the output latitude and longitude are correct.

longitude#
Type:

float

WGS84 longitude of output point. The triangle that contains this point is then selected for output. An error is raised if no triangle contains the point.

latitude#
Type:

float

WGS84 latitude of output point. The triangle that contains this point is then selected for output. An error is raised if no triangle contains the point.

file#

The output file name. The output is in csv format and each column is a variable.

 "output":
 {
    "more_stations":"mystations.json",
    "UpperClearing":
    {
        "longitude": "-115.175362",
        "latitude": "50.956547",
        "file": "uc.txt"
    }
}

where mystations.json would look like

{
     "some_station":
     {
         "longitude": "-115.175362",
         "latitude": "50.956547",
         "file": "somestation.txt"
     }
}

Entire mesh#

The entire mesh may be written to a netCDF UGRID compliant file or Paraview’s vtu format. Both can be used for visualization in Paraview and for analysis.

For general usage, the ugrid file is recommended as the output format. For more details on the output files, please see the output section.

Note

The default behaviour of both ugrid and vtu outputs is not to output any files. The user must specify the output frequency or timing.

base_name#
Type:

string

The base file name to be used. Default is the same name as the output folder.

Warning

NCZarr does not currently support creating zarr files in MPI mode. Do not use format:”zarr” at the moment.

format#
Type:

string

Default:

“netcdf”

Storage backend for ugrid outputs. Options are netcdf (writes a .nc file) or zarr (writes a .zarr store via NCZarr). Only applies to ugrid outputs.

variables#
Type:

[ "variable_name", ... ]

The default behaviour is to write every variable. This may produce an undesirable amount of output. This option takes a list of variables to output.

"variables": [
             "t",
             "U_2m_above_srf",
             "swe",
             "iswr"
         ],
frequency#
Type:

int

Default:

1

Frequency can be set to write every N timesteps. If the automatic checkpoint system is used, this might not produce a consistent output frequency and time. For example, suppose daily midnight output is chosen (frequency:24) because the model simulation starts at 00:00. If the auto-checkpoint suspends at the 8am timestep, then after resuming the next output will be at 8am instead of midnight.
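The drift described here can be reproduced with a few lines of Python. This sketch assumes hourly timesteps and assumes that the frequency counter starts again from whichever timestep the run (or the resumed run) begins at, which is what the paragraph implies; `output_times` is a hypothetical helper, not CHM code.

```python
from datetime import datetime, timedelta


def output_times(start, n_steps, frequency, dt=timedelta(hours=1)):
    """Date-times written when every `frequency`-th step from `start` is output."""
    return [start + i * dt for i in range(n_steps) if i % frequency == 0]


# Fresh run starting at midnight: output lands on midnight each day.
fresh = output_times(datetime(2020, 1, 1, 0), 49, 24)

# Run resumed from a checkpoint taken at the 08:00 timestep:
# the same frequency now lands on 08:00 each day.
resumed = output_times(datetime(2020, 1, 1, 8), 49, 24)
```

If a fixed time of day matters, specific_time avoids this dependence on where the run started.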

rotate_frequency#
Type:

int

Default:

0

Only applies to ugrid outputs. The timestep frequency to create a new ugrid file at. Rotated files are named <base_name>_YYYYMMDDTHHMMSS.nc and the cadence is preserved across checkpoint resume.

chunk_time_len#
Type:

int

Default:

unset

Only applies to ugrid outputs. Sets an explicit time chunk length (in timesteps). Must not be set alongside chunk_target_mb.

chunk_target_mb#
Type:

float

Default:

unset

Only applies to ugrid outputs. Sets the target chunk size per variable (in MB). Must not be set alongside chunk_time_len.
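Because chunk_time_len and chunk_target_mb are mutually exclusive, a config can be checked before a long run is submitted. The following Python sketch is a hypothetical pre-flight check, not part of CHM; `chunking_mode` is an illustrative name.

```python
def chunking_mode(ugrid_cfg):
    """Return which chunking key is in use, or None if neither is set.

    Raises ValueError when both mutually exclusive keys are present.
    """
    has_len = "chunk_time_len" in ugrid_cfg
    has_mb = "chunk_target_mb" in ugrid_cfg
    if has_len and has_mb:
        raise ValueError(
            "chunk_time_len and chunk_target_mb must not be set together"
        )
    if has_len:
        return "chunk_time_len"
    if has_mb:
        return "chunk_target_mb"
    return None
```

Calling `chunking_mode({"chunk_time_len": 24})` returns `"chunk_time_len"`, while passing both keys raises an error.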

write_all_parameters#
Type:

boolean

Default:

true

Disables/enables writing parameters to the output.

output_parameters#
Type:

[ "parameter_name", ... ]

Controls which parameters are written when write_all_parameters is enabled. If omitted, the defaults are Elevation, Slope, and Aspect.

write_ghost_neighbors#
Type:

boolean

Default:

false

Write each MPI rank’s ghost face data to vtu output. Only possible with vtu output.

specific_datetime#
Type:

string

Default:

””

Output at a specific date-time, given in the ISO format, e.g., "specific_datetime": "20191227T160000"

specific_time#
Type:

string

Default:

””

Output at a specific time every day, given in a “HH:MM” 24hr-format, e.g., "specific_time": "14:00"

compress#
Type:

bool

Default:

true

Enable shuffle + deflate (level 5) compression. Only applies to ugrid outputs.

bitgroom#
Type:

bool

Default:

true

Enable bitgrooming. Only on ugrid output.

Example:

All of the frequency options can be mixed together, allowing more complex output frequency selections.

"output":
{
 "vtu": {
         "base_name": "SC",
         "variables": [
             "t",
             "U_2m_above_srf",
             "swe",
             "iswr"
         ],
         "output_parameters": [
             "Elevation",
             "Slope",
             "Aspect"
         ],
         "frequency": "24",
         "specific_datetime": "20191227T160000",
         "write_all_parameters": false,
         "write_ghost_neighbors": true
     },
     "ugrid": {
             "format": "zarr",
             "variables": [
                 "t",
                 "U_2m_above_srf"
             ],
             "output_parameters": [
                 "Elevation",
                 "Slope",
                 "Aspect"
             ],
             "frequency": "1",
             "specific_datetime": "20191227T160000",
             "write_all_parameters": false
         }
}
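One way to read "mixed together" is that a mesh file is written whenever any configured trigger fires. The Python sketch below encodes that reading as an assumption; CHM's actual logic is not shown in this document. `should_write` is a hypothetical helper, `step` is the timestep index and `now` the model date-time.

```python
from datetime import datetime


def should_write(step, now, cfg):
    """True if any configured trigger (frequency, date-time, time of day) fires."""
    frequency = cfg.get("frequency")
    if frequency is not None and step % int(frequency) == 0:
        return True
    if cfg.get("specific_datetime") == now.strftime("%Y%m%dT%H%M%S"):
        return True
    if cfg.get("specific_time") == now.strftime("%H:%M"):
        return True
    return False  # nothing configured, nothing written


vtu_cfg = {"frequency": "24", "specific_datetime": "20191227T160000"}
```

Under this reading, the vtu block above writes every 24 timesteps and additionally at 16:00 on 27 December 2019.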

forcing#

Input forcing can be either an ASCII timeseries or a NetCDF file. Please see forcing for more details on the file specifications.

Warning

Including json sub-config files in this section is not supported.

Input forcing stations do not need to be located within the simulation domain. Therefore, they can act as ‘virtual stations’ so as to use reanalysis data or met stations located outside of the basin. However, because global datasets cannot generally be loaded all at once into memory without significant memory pressure, the met loader only loads points that are within the bounding box of the mesh. For MPI runs, it only loads points within each rank’s mesh sub-domain bounding box. If the number of stations within the bounding box cannot fulfil num_stations_to_use, the bounding box is expanded by 25% (up to 3 times).

An example of this is shown below, where each black point is a virtual station, representing the center for a NetCDF grid cell from a NWP product.

images/netcdf.png
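The bounding-box behaviour can be sketched in Python as follows. The sketch assumes that "expanded by 25%" means each side length grows by 25% about the box centre; the exact rule CHM uses is not stated here. `stations_in_expanding_bbox` is a hypothetical helper working on plain (x, y) tuples.

```python
def stations_in_expanding_bbox(stations, bbox, num_stations_to_use,
                               growth=0.25, max_expansions=3):
    """Select stations inside bbox, growing the box when too few are found.

    stations: iterable of (x, y) tuples
    bbox:     (xmin, ymin, xmax, ymax)
    """
    xmin, ymin, xmax, ymax = bbox
    for attempt in range(max_expansions + 1):
        inside = [(x, y) for x, y in stations
                  if xmin <= x <= xmax and ymin <= y <= ymax]
        if len(inside) >= num_stations_to_use or attempt == max_expansions:
            return inside
        # Assumption: grow each side length by `growth`, centred on the box.
        pad_x = (xmax - xmin) * growth / 2.0
        pad_y = (ymax - ymin) * growth / 2.0
        xmin, xmax = xmin - pad_x, xmax + pad_x
        ymin, ymax = ymin - pad_y, ymax + pad_y


points = [(5, 5), (11, 5), (20, 5)]
found = stations_in_expanding_bbox(points, (0, 0, 10, 10), 2)
```

Here one expansion is enough to pick up the second point; the third point stays out of reach even after three expansions.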
UTC_offset#
Type:

int

Default:

0

If the input timeseries is not UTC, then this is the correction to account for the UTC offset (all solar radiation calculations are in UTC). This is positive west!
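The sign convention can be illustrated with a small Python sketch. It assumes that "positive west" means the offset is the number of hours to add to the local timestamp to obtain UTC, so a station 8 hours behind UTC uses UTC_offset 8. `to_utc` is a hypothetical helper, not CHM code.

```python
from datetime import datetime, timedelta


def to_utc(local_time: datetime, utc_offset_hours: int) -> datetime:
    """Shift a local timestamp to UTC using a positive-west offset."""
    # Assumption: positive offsets are west of Greenwich, i.e. behind UTC.
    return local_time + timedelta(hours=utc_offset_hours)


noon_local = datetime(2000, 6, 1, 12, 0)
noon_as_utc = to_utc(noon_local, 8)
```

Under this assumption, local noon with an offset of 8 corresponds to 20:00 UTC.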

use_netcdf#
Type:

boolean

Default:

false

Specify if a NetCDF (.nc) file will be used. Cannot be used along with ASCII inputs!

station_search_radius#
Type:

double

Default:

None

The search radius (meters) surrounding any given triangle within which to search for a station. This is used to ensure only “close” stations are used. Cannot be used when num_stations_to_use is set. Based on the center of the triangle.

station_N_nearest#

This is removed in favour of num_stations_to_use.

num_stations_to_use#
Type:

int

Default:

5

The number of forcing inputs, e.g., netcdf cell centres or station timeseries, to include for the interpolation at a triangle.

station_search_radius and num_stations_to_use cannot both be specified. If neither is specified, then num_stations_to_use:5 is used as the default.

If the interpolant mode is nearest, then this is automatically set to 1.

interpolant#
Type:

string

Default:

“spline”

Chooses either thin plate spline with tension (spline) or inverse distance weighting (idw). The nearest option selects the closest station and uses only that station, with no interpolation.

"interpolant" : "idw"
"interpolant" : "spline"
"interpolant" : "nearest"

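For readers unfamiliar with the interpolant names, the following Python sketch shows textbook inverse distance weighting and nearest-station selection. It is a generic illustration with an assumed power of 2, not CHM's implementation, and it omits the spline option. `idw` and `nearest` are hypothetical helpers operating on (x, y, value) tuples.

```python
import math


def idw(stations, x, y, power=2.0):
    """Inverse-distance-weighted mean of station values at (x, y)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in stations:
        distance = math.hypot(sx - x, sy - y)
        if distance == 0.0:
            return value  # the query point sits exactly on a station
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def nearest(stations, x, y):
    """Value of the closest station, with no interpolation."""
    closest = min(stations, key=lambda s: math.hypot(s[0] - x, s[1] - y))
    return closest[2]


obs = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
midpoint_idw = idw(obs, 1.0, 0.0)
closest_value = nearest(obs, 0.5, 0.0)
```

Halfway between two stations the weights are equal, so `idw` returns their mean, whereas `nearest` returns the value of whichever station is closer.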
Note

ASCII and NetCDF inputs cannot be mixed. It is one or the other.

ASCII timeseries#

This is given as "station_name":{ ... }. If using point_mode, then station_name must exactly match the forcing value given in option.point_mode.

file#

A relative or absolute path to an input forcing file.

latitude#
Type:

double

Latitude of the input station, WGS84. Positive North. Do not use an “N” or “S” suffix.

longitude#
Type:

double

Longitude of the input station, WGS84. Positive East. Do not use an “E” or “W” suffix.

elevation#
Type:

double

Elevation is given in meters. It does not need to be equal to the elevation of the triangle upon which it lies if the station is located in the simulation domain. This value is used in the lapse rate equations to interpolate the data.

If required, forcing station definitions can be located in an external file. For the external file, the name of the key doesn’t matter. The external file should contain the stations in the format as per above. It does not require an additional "forcing": section definition.

"forcing":
  {

    "num_stations_to_use": 1,
    "interpolant": "nearest",
    "some_station":
     {
       //definition
     },
    "reanalysis_extract_1": "external_file_1.json",
    "reanalysis_extract_2": "external_file_2.json"
}

where external_file_*.json looks like

{
 "station1":
   {
    //details here
   },
 "station2":
   {
    //details here
   }
}
Filters#

Filters perform an operation on the data prior to being passed to a module. They allow for things such as wind-undercatch corrections to be done on the fly.

If a filter is defined, it must be defined on the forcing file and operate upon a variable that exists in the forcing data. Filters are given as "filter_name": { ... }. The configuration values are filter-specific; please see the filter documentation for what is required. Multiple filters may be specified.

"buckbrush":
{
  "file": "bb_m_2000-2008",
  "latitude": 60.52163,
  "longitude": -135.197151,
  "elevation": 1305,
  "filter":
  {
    "scale_wind_speed": {
      "Z_F": 4.6,
      "variable": "u"
    },
    "goodison_undercatch": {
      "variable": "p"
    }
  }
}

Warning

Filters run in the order defined in the configuration file.

Example#
"forcing":
{
   "UTC_offset": 8,

   "buckbrush":
   {
      "file": "bb_m_2000-2008",
      "latitude": 60.52163,
      "longitude": -135.197151,
      "elevation": 1305,
      "filter":
      {
         "scale_wind_speed":
         {
            "Z_F": 4.6,
            "variable": "u"
         },
         "goodison_undercatch":
         {
            "variable": "p"
         }
      }
   },
   "alpine":
   {
      "file": "alp_m_2000-2008",
      "latitude": 60.567267,
      "longitude": -135.184652,
      "elevation": 1559,
      "filter":
      {
         "scale_wind_speed":
         {
            "Z_F": 2.5,
            "variable": "u"
         },
         "goodison_undercatch":
         {
            "variable": "p"
         }
      }
   }
}

NetCDF#

To use a NetCDF file, set forcing:use_netcdf=true and specify the NetCDF file.

"forcing":
    {

        "use_netcdf": true,
        "file":"forcing_file.nc"
    }

If a multipart file is to be used, simply replace the NetCDF file name with the JSON file that lists the parts.

"forcing":
    {

        "use_netcdf": true,
        "file":"metadata.json",
        "num_stations_to_use": 1,
        "interpolant": "nearest"
    }

Please see the NetCDF forcing section for more details.

Warning

Using NetCDF forcing together with point_mode is not supported.

Filters#

Filters are the same as for ASCII with one important distinction: every specified filter is run for every virtual station (i.e., grid cell centre).

"filter": {
           "scale_wind_speed": {
               "Z_F": "40",
               "variable": "u"
           }
       }
Example#
"forcing": {
        "UTC_offset": "0",
        "use_netcdf": true,
        "file": "GEM-CHM_2p5_west_2017100106_2018080105.nc",
        "filter": {
            "scale_wind_speed": {
                "Z_F": "40",
                "variable": "u"
            }
        }
    }

checkpoint#

CHM can save its state after a timestep, allowing CHM to resume from this timestep. Further details on the output format can be found in the checkpointing section.

To enable checkpoints, save_checkpoint must be enabled and one of the on_* options must be supplied. Multiple on_* can be combined. For example, on_frequency and on_last can be combined to produce checkpoints every on_frequency timesteps as well as on the last timestep.

Warning

The configuration file can be modified in between checkpoint runs. However, the changes MUST be coherent with what is saved in the checkpoint file. For example the following is not allowed: a new module is added that does not have checkpointed data.

save_checkpoint#
Type:

boolean

Default:

false

Enable checkpointing. Must be set to true to enable checkpointing. One of the options for when to checkpoint must also be set.

on_frequency#
Type:

int64

Default:

empty

The frequency of checkpointing. Checkpoints every on_frequency timesteps.

specific_datetime#
Type:

string

Default:

””

Checkpoints at a specific date-time, given in the ISO format, e.g., "specific_datetime": "20191227T160000"

specific_time#
Type:

string

Default:

””

Checkpoints at a specific time every day, given in a “HH:MM” 24hr-format, e.g., "specific_time": "14:00"

on_last#
Type:

bool

Default:

false

Checkpoint on the last timestep. Can be used with on_frequency, but does not require on_frequency to be set.

on_wallclock_limit#
Type:

bool

Default:

false

If the environment variable CHM_WALLCLOCK_LIMIT is detected at simulation start, then CHM will track how much wallclock time it has left. When only minutes_of_wallclock minutes remain (default = 2 min), CHM checkpoints. CHM_WALLCLOCK_LIMIT needs to be set from the scheduler environment. How to do so for PBS Pro and SLURM is given below:

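# All PBS / SLURM options here
# [...]
# Set the env var; keep only the line that matches your scheduler.
# export makes the variable visible to the CHM process.
# SLURM:
export CHM_WALLCLOCK_LIMIT=$(squeue -j $SLURM_JOB_ID -h --Format TimeLimit)
# PBS Pro:
export CHM_WALLCLOCK_LIMIT=$(qstat -f $PBS_JOBID | sed -rn 's/.*Resource_List.walltime = (.*)/\1/p')

# Then run CHM
mpirun [...] CHM -f [...]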
minutes_of_wallclock#
Type:

int64

Default:

2

Number of minutes before the wallclock expires to begin checkpointing. Only has an effect if on_wallclock_limit=true.
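To make the timing concrete, the Python sketch below converts a scheduler time limit into seconds and decides whether the remaining time has dropped to the minutes_of_wallclock threshold. It is an illustration under stated assumptions: it handles only the D-HH:MM:SS, HH:MM:SS and MM:SS layouts, and how CHM itself parses CHM_WALLCLOCK_LIMIT is not shown in this document. Both function names are hypothetical.

```python
def walltime_seconds(text: str) -> int:
    """Convert 'D-HH:MM:SS', 'HH:MM:SS' or 'MM:SS' to seconds."""
    text = text.strip()  # scheduler output may carry padding
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in text.split(":")]
    if len(parts) > 3:
        raise ValueError("unrecognised walltime format")
    while len(parts) < 3:
        parts.insert(0, 0)  # pad missing leading fields with zero
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def time_to_checkpoint(elapsed_seconds, limit_seconds, minutes_of_wallclock=2):
    """True once no more than `minutes_of_wallclock` minutes remain."""
    return limit_seconds - elapsed_seconds <= minutes_of_wallclock * 60
```

For a two-hour limit and the default threshold, `time_to_checkpoint` first returns True once 1 h 58 min have elapsed.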

load_checkpoint_path#
Type:

string

Default:

empty

Path to checkpoint file to load from (specifically, the json file). Can be used with the other checkpointing options.

auto_resume#
Type:

bool

Default:

false

Set to true to auto-resume from the most recent checkpoint that exists in output_folder/checkpoint. Doing so allows for easily and repeatedly resuming from a checkpoint file.

Note

Upon successful completion of a CHM simulation, a sentinel file is written to the output folder: “<output_folder>/clean_exit”. This will not be written if CHM suspends due to a wallclock limit. Therefore, the intent of this file is to allow repeated automatic requeuing of a job on HPC systems that have short wallclock limits, i.e., keep requeuing until that file is found.
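A requeue wrapper only needs to test for the sentinel file. The Python sketch below shows that check; `needs_requeue` is a hypothetical helper, and the actual resubmission command depends on the scheduler, so it is left out.

```python
from pathlib import Path


def needs_requeue(output_folder) -> bool:
    """True while the clean_exit sentinel has not been written."""
    return not (Path(output_folder) / "clean_exit").exists()
```

A job script could call this after CHM returns and resubmit itself while the result is True.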

Basic checkpoint example

"checkpoint":
{
   "save_checkpoint": true,
   "on_frequency": 4,
   "on_last": true,
   "load_checkpoint_path":"output/checkpoint/checkpoint_20001001T140000.np1.json"
}

An example of auto save and resume. This will checkpoint with 5 minutes of wall clock left. Then, as long as the same output directory is used, a subsequent run will detect the most recent checkpoint and resume from it.

"checkpoint":
{
   "save_checkpoint": true,
   "on_wallclock_limit": true,
   "minutes_of_wallclock": 5,
   "auto_resume": true
}