Output#

There are two main outputs from CHM: timeseries and mesh outputs.

The mesh outputs are either the Paraview vtu format or the netcdf ugrid format. When parameter output is enabled, the default parameter set is Elevation, Slope, and Aspect unless overridden in the output configuration.

The vtu output has 1 file per MPI rank, per timestep. For large domains, large MPI rank counts, and long time periods, this can produce a large number of files. HPC file systems often do not perform well with millions of files. A further consideration is that to produce raster tiff files or similar in post processing requires that the vtu files be converted to ugrid using pyCHM. The vtu format does have some benifits as it allows for viewing a single ranks’ subset, as well as visualisation of the ghost regions. Lastly, because of limitations in the vtu format, a duplicate mesh used for outputting must be held in memory making it more memory heavy than the ugrid output.

vtu#

The .vtu format is a Paraview (vtk) unstructured mesh. It is one file, per MPI rank, per timestep.

The naming scheme of these files is

base_name + posix datetime + _MPIrank + .vtu

For example: SC1506837600_0.vtu

Each MPI rank writes its own subdomain into a single file. This process’ rank is thus suffixed to the file. If only 1 rank is used, a _0 will always be added for consistency.

In addition to the vtu files, a base_name.pvd is written. This is an XML file that holds a reference to all the vtu files:

<?xml version="1.0" encoding="utf-8"?>
<VTKFile type="Collection" version="0.1">
 <Collection>
     <DataSet timestep="1506837600" group="" part="0" file="SC1506837600_0.vtu"/>
      ...

Although the vtu files may be loaded directly into Paraview, it is preferred to load the pvd file. Due to the timestep field, the Datetime plugin can then show an overlay with the date-time for easier analysis.

Note

If more than 1 rank is used, the pvd file is the only reasonable way of loading all the parts of the mesh into one view.

ugrid#

The ugrid file is a netcdf URGID schema. It defines the mesh in the same way as what is used internally to CHM where the triangles are composed indexes of 3-nodes into the vertex list. The values in CHM are almost exclusively face centered. By default the ugrid is level 5 deflate+shuffle with bitgrooming. This improves the compression, and improves the output speed. For a mesh of #cells=2200 for 24 hours with 3 ranks:

UGRID output can also be stored as a Zarr directory via NCZarr by setting output.ugrid.format to zarr. This creates a .zarr store with the same schema and parallel write behavior.

Rotation#

UGRID rotation (rotate_frequency) starts a new file every N timesteps. Rotated files are named <base_name>_YYYYMMDDTHHMMSS.nc based on the model time. When resuming from a checkpoint, CHM continues writing to the file that was active at checkpoint time and preserves the rotation cadence.

Chunking#

The UGRID netcdf files are chunked in space such that each in-space chunk corresponds to an MPI rank’s spatial decomposition. In time, the chunking is per-timestep. With compression enabled, this is the most efficient way to write to disk. However, for large number of MPI ranks (e.g., 800), with many variables, with many timesteps, this results in significant overhead during analysis where the optimal chunking scheme may be 24 or more timestpes, and 1M+ triangles instead of the ~30k which is often in the MPI decomposition. Therefore, post processing tools exist in pyCHM to rechunk to more efficient analysis datastores (e.g., zarr) with a more optimal chunking layout.

CHM can also attempt to optimize the UGRID time chunking, sized to balance Dask recommendations and output cadence:

  • Target ~256MB per variable chunk, with bounds of 1MB (min) and 1GB (max).

  • If total chunks per output file would exceed 100k, the time chunk grows up to a hard ceiling of 2GB.

  • Chunk lengths align to the mesh output cadence (frequency or only_last_n). If neither is set, the chunker assumes only a small number of outputs and keeps chunk sizes small.

Users can override chunk sizing with either chunk_time_len (explicit timesteps) or chunk_target_mb (target MB per variable). Only one override can be set at a time. These settings can metadata overhead and avoid overly large task graphs while respecting model output frequency.

Warning

A temporal chunking of t > 1 when compression is enabled results in having to decompress, append, recompress, and write back each timestep. This is slow.

Method

size (b)

timestep (ms)

+bit +compress

1866316

235

~bit +compress

3568842

350

~bit ~compress

5076286

280

$ ncdump -h output_casr3_debug.nc
netcdf output_casr3_debug {
dimensions:
        nMesh2_node = 3761 ;
        nMesh2_face = 6828 ;
        Two = 2 ;
        Three = 3 ;
        time = 24 ;
variables:
        double time(time) ;
                time:standard_name = "time" ;
                time:long_name = "Time" ;
                time:units = "minutes since 1970-01-01 00:00:00" ;
        int Mesh2 ;
                Mesh2:cf_role = "mesh_topology" ;
                Mesh2:long_name = "Topology data of 2D unstructured mesh" ;
                Mesh2:topology_dimension = 2 ;
                Mesh2:node_coordinates = "Mesh2_node_x Mesh2_node_y" ;
                Mesh2:face_node_connectivity = "Mesh2_face_nodes" ;
                Mesh2:face_dimension = "nMesh2_face" ;
                Mesh2:face_coordinates = "Mesh2_face_x Mesh2_face_y" ;
        uint Mesh2_face_nodes(nMesh2_face, Three) ;
                Mesh2_face_nodes:cf_role = "face_node_connectivity" ;
                Mesh2_face_nodes:long_name = "Maps every triangular face to its three corner nodes." ;
        double Mesh2_node_x(nMesh2_node) ;
                Mesh2_node_x:standard_name = "longitude" ;
                Mesh2_node_x:long_name = "Longitude of 2D mesh nodes." ;
                Mesh2_node_x:units = "degrees_east" ;
                Mesh2_node_x:_FillValue = NaN ;
        double Mesh2_node_y(nMesh2_node) ;
                Mesh2_node_y:standard_name = "latitude" ;
                Mesh2_node_y:long_name = "Latitude of 2D mesh nodes." ;
                Mesh2_node_y:units = "degrees_north" ;
                Mesh2_node_y:_FillValue = NaN ;
        double Mesh2_node_z(nMesh2_node) ;
                Mesh2_node_z:standard_name = "altitude" ;
                Mesh2_node_z:long_name = "Z coordinate of 2D mesh nodes." ;
                Mesh2_node_z:units = "m" ;
                Mesh2_node_z:mesh = "Mesh2" ;
                Mesh2_node_z:location = "node" ;
                Mesh2_node_z:_FillValue = NaN ;
        double Mesh2_node_z_paraview(nMesh2_node) ;
                Mesh2_node_z_paraview:standard_name = "scaled_altitude" ;
                Mesh2_node_z_paraview:long_name = "Scaled Z coordinate of 2D mesh nodes." ;
                Mesh2_node_z_paraview:units = "m" ;
                Mesh2_node_z_paraview:mesh = "Mesh2" ;
                Mesh2_node_z_paraview:location = "node" ;
                Mesh2_node_z_paraview:_FillValue = NaN ;
        uint64 global_id(nMesh2_face) ;
                global_id:mesh = "Mesh2" ;
                global_id:location = "face" ;
                global_id:coordinates = "Mesh2_face_x Mesh2_face_y" ;
        double Mesh2_face_x(nMesh2_face) ;
                Mesh2_face_x:standard_name = "latitude" ;
                Mesh2_face_x:long_name = "Characteristics latitude of 2D mesh triangle (e.g. circumcenter coordinate)." ;
                Mesh2_face_x:units = "degrees_north" ;
        double Mesh2_face_y(nMesh2_face) ;
                Mesh2_face_y:standard_name = "latitude" ;
                Mesh2_face_y:long_name = "Characteristics latitude of 2D mesh triangle (e.g. circumcenter coordinate)." ;
                Mesh2_face_y:units = "degrees_north" ;
        double Mesh2_face_z(nMesh2_face) ;
                Mesh2_face_z:standard_name = "altitude" ;
                Mesh2_face_z:long_name = "Characteristics latitude of 2D mesh triangle (e.g. circumcenter coordinate)." ;
                Mesh2_face_z:units = "m" ;
        double rh(time, nMesh2_face) ;
                rh:mesh = "Mesh2" ;
                rh:location = "face" ;
                rh:_FillValue = NaN ;
        double Elevation(nMesh2_face) ;
                Elevation:mesh = "Mesh2" ;
                Elevation:location = "face" ;
                Elevation:_FillValue = NaN ;

Variables are defined on dimensions = (face, time), and parameters are only on dimension = (face).

This ugrid file can be loaded into Paraview (chose the netcdf ugrid loader option in Paraview). By default the z coordinate is not shown. From the Filters menu option, search for “Warp by Scalar” and choose the Mesh2_node_z_paraview field. Paraview struggles with lat/long x,y coords with high elevation values thus, this is a scaled-for-visualization value. If you want to see the cell elevation, choose the “Elevation” from the variable colouring menu.

For analysis, xarray can load ugrid, but uxarray is likely a better option

timeseries#

The timeseries output is a text, comma-separated-value (.csv) file. The first column is always datetime, and the subsequent column names are the variables output from all the modules. This order is not guaranteed.

The datetime is in the format YYYYMMDDThhmmss.

These files are output to the output_folder/points/ subdirectory. The files are named station_name.txt.

datetime,ilwr,l,acc_snow
20170901T060000,429.61,1.81749