Debugging and Catalyst Replay
To simplify debugging of in situ pipelines, Catalyst now supports
serializing conduit_nodes. During each API call, users can write that
call's params argument out to disk. The catalyst_replay command then
reads the nodes back in and invokes each API call again, so users do not
need to re-run their simulation while debugging.
Serializing Nodes and Writing to Disk
To use the catalyst_replay command, nodes must first be written to disk.
The steps to do this are simple:
1. Set the environment variable CATALYST_DATA_DUMP_DIRECTORY to the
   directory where the node data for each API invocation should be saved.
2. Invoke the stub implementation in your custom API implementation.
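Step 1 can be scripted when the simulation is launched from a driver script. The following is a minimal sketch in Python, not part of Catalyst: the helper name dump_environment is hypothetical, and creating the directory up front is a precaution based on the assumption that it should exist before the simulation starts.

```python
import os
from pathlib import Path


def dump_environment(dump_dir, base_env=None):
    """Return a copy of the environment with CATALYST_DATA_DUMP_DIRECTORY set.

    dump_environment is a hypothetical helper, not a Catalyst API.
    """
    # Copy so the caller's environment is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    path = Path(dump_dir).resolve()
    # Assumption: create the directory up front so it exists at run time.
    path.mkdir(parents=True, exist_ok=True)
    env["CATALYST_DATA_DUMP_DIRECTORY"] = str(path)
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run when starting the simulation, so the variable is set only for that process.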
Invoking the stub writes the conduit_node passed into the API call out to
CATALYST_DATA_DUMP_DIRECTORY. The conduit_nodes are written as
.conduit_bin files whose names follow the general pattern
<stage>_params.conduit_bin.<num_ranks>.<rank>, where:
- <stage> is one of initialize, execute, or finalize.
- <num_ranks> is the number of MPI ranks that the simulation was run with.
- <rank> is the 0-based index of the rank that generated this file.
Files for the execute stage will also include the invocation number,
since catalyst_execute can be called multiple times. For example,
execute_invc0_params.conduit_bin.2.1 would contain the params passed
into the 0th invocation of catalyst_execute, which was called by the second of
two ranks (rank indices are 0-based).
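To check which files a run produced, the naming pattern above can be decoded with a regular expression. This is a sketch under the assumption that only execute files carry the _invc<N> part, as the text describes; DumpFile and parse_dump_name are hypothetical names, not part of Catalyst.

```python
import re
from typing import NamedTuple, Optional


class DumpFile(NamedTuple):
    stage: str
    invocation: Optional[int]  # set only for the execute stage
    num_ranks: int
    rank: int


# <stage>[_invc<N>]_params.conduit_bin.<num_ranks>.<rank>
_PATTERN = re.compile(
    r"^(?P<stage>initialize|execute|finalize)"
    r"(?:_invc(?P<invc>\d+))?"
    r"_params\.conduit_bin\.(?P<num_ranks>\d+)\.(?P<rank>\d+)$"
)


def parse_dump_name(name: str) -> Optional[DumpFile]:
    """Decode a dump file name, or return None if it does not match."""
    m = _PATTERN.match(name)
    if m is None:
        return None
    stage, invc = m.group("stage"), m.group("invc")
    # Assumption from the text: execute files, and only those,
    # include an invocation number.
    if (stage == "execute") != (invc is not None):
        return None
    return DumpFile(
        stage,
        int(invc) if invc is not None else None,
        int(m.group("num_ranks")),
        int(m.group("rank")),
    )


print(parse_dump_name("execute_invc0_params.conduit_bin.2.1"))
# DumpFile(stage='execute', invocation=0, num_ranks=2, rank=1)
```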
Replaying API Calls with catalyst_replay
After the node data has been written out to disk, the catalyst_replay
command can be used to read the node data back into memory and execute the
same API calls. Find the catalyst_replay executable in the
RUNTIME_OUTPUT_DIRECTORY generated by CMake (this is usually bin/).
Run catalyst_replay with the same number of MPI ranks as the simulation
used to generate the data, and pass the value of CATALYST_DATA_DUMP_DIRECTORY
as a command-line argument. This invokes each API method with the corresponding node
data. For an example, see the examples/replay directory.
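Because the rank count is encoded in every dump file name, the replay command line can be assembled from the dump directory alone. The sketch below only builds the argument list. The helper replay_command is hypothetical, the mpirun -np launcher is an assumption that depends on your MPI installation, and the bin/catalyst_replay path assumes the usual CMake output location mentioned above.

```python
from pathlib import Path


def replay_command(dump_dir, replay_exe="bin/catalyst_replay",
                   launcher=("mpirun", "-np")):
    """Build an argument list that runs catalyst_replay on dump_dir.

    Hypothetical helper: the launcher and executable path are assumptions.
    """
    rank_counts = set()
    for path in Path(dump_dir).iterdir():
        # Names look like <stage>_params.conduit_bin.<num_ranks>.<rank>
        parts = path.name.split(".")
        if (len(parts) == 4 and parts[1] == "conduit_bin"
                and parts[2].isdigit() and parts[3].isdigit()):
            rank_counts.add(int(parts[2]))
    if len(rank_counts) != 1:
        raise ValueError(
            f"expected one rank count in {dump_dir}, found {sorted(rank_counts)}"
        )
    num_ranks = rank_counts.pop()
    # Same number of ranks as the original run, dump directory as argument.
    return [*launcher, str(num_ranks), str(replay_exe), str(dump_dir)]
```

The resulting list can be handed to subprocess.run. Raising an error when the directory mixes files from runs with different rank counts avoids replaying with the wrong number of ranks.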