Restarted Simulation Readers

When running a large or long simulation, it is common to need to restart the simulation. There are several reasons for needing to restart a simulation; system failures on large machines can bring down running programs occasionally, or scheduling policies may limit how long a simulation can run. Depending on the simulation, when it is restarted it may create a brand new set of results files. Thus, there will be a data file (or set of data files) for each time the simulation was restarted.

Having these multiple outputs from simulation restarts can complicate reading them. For starters, each data set has its own time series, but we will probably want to step through time over all the data sets as if they were one. Furthermore, it is common for a restarted simulation to backtrack, meaning that some time steps may overlap among the data sets. Which file a time step is read from matters, because the data in some time steps may be invalid if, for example, the simulation ended abruptly while writing the data.

The ParaView development team has been working to simplify reading output from restarted simulations. We try to make this as simple as possible, but there are still some steps that need to be taken to read in these series of data sets. This document describes how to load in restarted data for different file formats.

Exodus

Current simulations write one Exodus file per process and many simulations also write a new set of files each time the simulation is restarted. In the past it was necessary to mash all of these files into one large file. This is no longer necessary. (In fact, it is highly discouraged.) For many years, ParaView has been able to load in a set of files from one simulation run by simply selecting the first file in the set. ParaView will automatically find all other related files, given some standard naming conventions, and load them all together as a single data set. Dealing with restarts adds a level of complication because you now have a series of a series of files.

ParaView provides two solutions to the problem. The first is a naming convention adopted from existing simulation codes. The second is a case file that specifies the names of files in the time series.

Note that both of these solutions still rely on the time values provided in the Exodus files, even if they all contain only one time step. If, for example, all the files report data at time 0, ParaView will assume that they specify data at the same time and will only allow you to load one of the time steps.
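
To make the role of the time values concrete, the following Python sketch merges the time steps reported by several restarts under one plausible policy: each restart supersedes any earlier time steps at or after its own first time value. The policy and the function name merge_restart_times are assumptions made for illustration; they are not taken from ParaView's implementation.

```python
def merge_restart_times(restarts):
    """Merge per-restart time values into one time line.

    `restarts` is a list of lists of time values, given in restart order.
    Returns a list of (time, restart_index) pairs. Assumed policy: a restart
    replaces every previously accepted time step at or after its first time.
    """
    merged = []
    for index, times in enumerate(restarts):
        if not times:
            continue
        start = min(times)
        # Drop earlier steps that this restart recomputed (backtracking).
        merged = [(t, i) for (t, i) in merged if t < start]
        merged.extend((t, index) for t in sorted(times))
    return merged


overlapping = merge_restart_times([[0.0, 1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])
all_zero = merge_restart_times([[0.0], [0.0], [0.0]])
```

With the overlapping input, time steps 2.0 and 3.0 are taken from the second restart. When every file reports only time 0, a single time step survives, which is the situation the note above warns about.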

Exodus Naming Conventions

New in version 3.8.

By default, ParaView will assume that the numbers at the end of a file represent partition numbers and will attempt to load them all at once. But there is a convention for having numbers specifying a temporal sequence instead. If the files end in .e-s.{RS} (where {RS} is some number sequence), then the numbers (called restart numbers) are taken as a partition in time instead of space. For example, the following sequence of files represents a time series.

mysimoutput.e-s.000
mysimoutput.e-s.001
mysimoutput.e-s.002

You can use any number of digits in the restart numbers, but by convention all files should use the same number of digits. Also by convention you can leave off the -s.{RS} of the first file. The following file sequence is interpreted the same as that above.

mysimoutput.e
mysimoutput.e-s.001
mysimoutput.e-s.002

It is possible to combine a time series sequence with a spatial partitioning sequence. For files ending in .e-s.{RS}.{NP}.{RANK}, the {RS} is interpreted as the restart number (as in the previous example), the {NP} is interpreted as the number of partitions, and the {RANK} is interpreted as a partition number, which is usually simply the rank of the simulation process that dumped the file. A set of files with four spatial partitions that has three "restarts" (time series sets) will have filenames like the following.

mysimoutput.e-s.000.0004.0000
mysimoutput.e-s.000.0004.0001
mysimoutput.e-s.000.0004.0002
mysimoutput.e-s.000.0004.0003
mysimoutput.e-s.001.0004.0000
mysimoutput.e-s.001.0004.0001
mysimoutput.e-s.001.0004.0002
mysimoutput.e-s.001.0004.0003
mysimoutput.e-s.002.0004.0000
mysimoutput.e-s.002.0004.0001
mysimoutput.e-s.002.0004.0002
mysimoutput.e-s.002.0004.0003

As before, the -s.{RS} part of the file names for the first time index is optional. Thus, the following is equivalent to the previous example.

mysimoutput.e.0004.0000
mysimoutput.e.0004.0001
mysimoutput.e.0004.0002
mysimoutput.e.0004.0003
mysimoutput.e-s.001.0004.0000
mysimoutput.e-s.001.0004.0001
mysimoutput.e-s.001.0004.0002
mysimoutput.e-s.001.0004.0003
mysimoutput.e-s.002.0004.0000
mysimoutput.e-s.002.0004.0001
mysimoutput.e-s.002.0004.0002
mysimoutput.e-s.002.0004.0003
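
The convention can be checked outside of ParaView with a small script. The following Python sketch parses names of the form described above and groups them by restart number. The regular expression, the function names, and the choice to treat a missing -s.{RS} as restart 0 are assumptions made for illustration, not ParaView's own matching code.

```python
import re
from collections import defaultdict

# Assumed reading of the convention: <base>.e[-s.<RS>][.<NP>.<RANK>]
EXODUS_NAME = re.compile(
    r"^(?P<base>.+\.e)"                    # base name ending in .e
    r"(?:-s\.(?P<rs>\d+))?"                # optional restart number
    r"(?:\.(?P<np>\d+)\.(?P<rank>\d+))?$"  # optional partition count and rank
)


def parse_exodus_name(filename):
    """Return (base, restart, num_partitions, rank), or None if no match."""
    match = EXODUS_NAME.match(filename)
    if match is None:
        return None
    # A missing -s.{RS} marks the first restart; treat it as restart 0.
    restart = int(match.group("rs")) if match.group("rs") else 0
    num_partitions = int(match.group("np")) if match.group("np") else None
    rank = int(match.group("rank")) if match.group("rank") else None
    return match.group("base"), restart, num_partitions, rank


def group_by_restart(filenames):
    """Map each restart number to its files, sorted by name."""
    groups = defaultdict(list)
    for name in filenames:
        parsed = parse_exodus_name(name)
        if parsed is not None:
            groups[parsed[1]].append(name)
    return {rs: sorted(names) for rs, names in sorted(groups.items())}
```

For example, parse_exodus_name("mysimoutput.e-s.001.0004.0002") yields the base mysimoutput.e, restart 1, four partitions, and rank 2, while a name such as mysimoutput.00 is rejected because it does not follow the convention.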

Exodus Time Series Case Files

New in version 3.4.

ParaView can handle the naming issue by reading a "case" file to specify the file set of each simulation run. The case file is a simple text file with the extension .ex-timeseries where each line contains the filename of a first file of an Exodus data set. The rest of the files are automatically determined in the same manner as if you had just loaded the single file set. Do not list every file of every data set. The Exodus filenames may be given relative to the case file.

A case file can usually be generated by redirecting the output of the Unix find command. (In some circumstances, you may be able to use the simpler ls command, but there are several technical issues that will probably prevent this from working on most large data sets.) For example, let us say that every file is in a directory and has a name like the following:

mysimoutput.{RR}/mysimoutput.{RR}.{FFFF}

Where {RR} represents the restart number and {FFFF} represents the process number. Note that in this example we have placed the simulation output in different directories. We recommend this practice to limit the number of files in each directory. We can build a case file for this simulation data by running the following command.

find . -name 'mysimoutput.*.0000' > mysimoutput.ex-timeseries

The find command will then list all the first files in all the subdirectories. Do not worry about the order of the entries in the file; ParaView will automatically order the time steps and handle overlapping time. Simply open the mysimoutput.ex-timeseries file in ParaView.
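
On systems without a Unix find command, the same case file can be produced with a short script. The following Python sketch is one way to do it, using only the standard library. The function name write_case_file and its parameters are hypothetical, and the sketch assumes the directory layout of the example above.

```python
from pathlib import Path


def write_case_file(root, first_file_pattern, case_file_name):
    """Write a case file listing the first file of each restart.

    Searches `root` recursively for files whose names match
    `first_file_pattern` and writes one path per line, relative to `root`,
    into `root/case_file_name`. Returns the list of paths written.
    """
    root = Path(root)
    first_files = sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob(first_file_pattern)
        if path.is_file()
    )
    case_path = root / case_file_name
    case_path.write_text("".join(name + "\n" for name in first_files))
    return first_files
```

Calling write_case_file(".", "mysimoutput.*.0000", "mysimoutput.ex-timeseries") from the top-level directory lists the same files as the find command. The paths are written relative to the case file, which the case file format permits.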

Troubleshooting tip: If all of your files are in the same directory and end with the restart number, the case file will fail to load properly. For example, if you have the following three exodus files that represent a time series (as opposed to a spatial partitioning), listing them in a case file will not work.

mysimoutput.00
mysimoutput.01
mysimoutput.02

The problem is that ParaView will incorrectly identify each entry in your case file as a file in the same partitioned set of files. You will instead have to rename the files so that ParaView will not recognize them as an Exodus partition sequence. You could, for example, append .e to all the file names. You could also convert them to the Exodus naming conventions, in which case you will no longer need the case file at all.
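
Converting such files to the Exodus naming conventions is a mechanical rename. The following Python sketch only computes the old and new names so that the plan can be reviewed before anything is changed. The function name plan_renames is hypothetical, and the sketch assumes that every file to convert ends in a dot followed by the restart number.

```python
import re


def plan_renames(filenames):
    """Return (old, new) pairs that move names like base.NN to base.e-s.NN.

    Names that do not end in a dot followed by digits are left out.
    """
    plan = []
    for name in filenames:
        match = re.fullmatch(r"(?P<base>.+)\.(?P<restart>\d+)", name)
        if match:
            new_name = f"{match.group('base')}.e-s.{match.group('restart')}"
            plan.append((name, new_name))
    return plan


plan = plan_renames(["mysimoutput.00", "mysimoutput.01", "mysimoutput.02"])
```

Each pair in the plan can then be passed to os.rename. After the rename, the files form the sequence mysimoutput.e-s.00, mysimoutput.e-s.01, mysimoutput.e-s.02.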

SPCTH

Current CTH simulations write one spyplot file per process and also write a new set of files each time the simulation is restarted. ParaView is able to load in a set of files from one simulation run by simply selecting the first file in the set. ParaView will automatically find all other related files, given some standard naming conventions, and load them all together as a single data set. Dealing with restarts adds a level of complication because you now have a series of a series of files with no naming convention.

ParaView now handles the naming issue by requiring a “case” file to specify the file set of each simulation run. The case file is a simple text file with the extension .spcth-timeseries where each line contains the filename of a first file of a spyplot data set. The rest of the files are automatically determined in the same manner as if you had just loaded the single file set. Do not list every file of every data set. The spyplot filenames may be given relative to the case file.

A case file can usually be generated by redirecting the output of the Unix find command. (In some circumstances, you may be able to use the simpler ls command, but there are several technical issues that will probably prevent this from working on most large data sets.) For example, let us say that every file is in a directory and has a name like the following:

mysimoutput{R}/spct{R}.{F}

Where {R} represents the restart identifier (maybe a number or a letter) and {F} represents the process number. Note that in this example we have placed the simulation output in different directories. We recommend this practice to limit the number of files in each directory. We can build a case file for this simulation data by running the following command.

find . -name 'spct*.0' > mysimoutput.spcth-timeseries

The find command will then list all the first files in all the subdirectories. Do not worry about the order of the entries in the file; ParaView will automatically order the time steps and handle overlapping time. Simply open the mysimoutput.spcth-timeseries file in ParaView.

Customized Restart Reader

This section is for developers who want to implement a restarted output reader or a reader for any other time series of files. Reading a time series of files is handled by the vtkFileSeriesReader class (located in the ParaView3/Servers/Filters directory). vtkFileSeriesReader is really a meta-reader that takes a “core” reader that will do the actual loading and parsing of data from files. In order to use vtkFileSeriesReader, you must place it in a special FileSeriesReaderProxy, which will provide the introspection facilities needed to use the core reader.

There are two modes in which vtkFileSeriesReader can read a sequence of files. In the first mode, vtkFileSeriesReader simply takes a list of files. If the files all exist in the same directory and have the same name with the exception of a number that identifies the file, then the ParaView GUI will automatically group these files and feed them all to the reader. In the second mode, vtkFileSeriesReader takes a meta-file (often referred to as a case file) that lists all the files to load. This mode often adds extra requirements for the user but is useful when the naming convention of the first mode cannot be easily followed. A common example is a file format that is itself a collection of files, resulting in two numberings in the file names; ParaView cannot tell which numbers are temporal and which are spatial.

The rest of this document assumes that you already have an implementation of the core reader for a single output set. It also assumes that you either already have the server manager XML for accessing the reader or are capable of creating it.

Collection of Files

If the restarts are generally written as (or can be referenced as) a sequence of numbered files, then you should just allow vtkFileSeriesReader to load them directly. This is the default mode. Since vtkFileSeriesReader is just as capable of loading a single file as a series of files and since the ParaView GUI does not really distinguish selecting a single file from a group of numbered files, you should probably “hide” the definition for the core reader by placing it in a ProxyGroup other than sources, for example internal_sources.

The following is the boilerplate server manager XML needed to enable a file series reader in ParaView. It sets up the special FileSeriesReaderProxy, establishes the core reader, and provides information about the times defined. Text enclosed in {braces} or after an ellipsis (...) needs to be replaced with a value specific to your reader.

<source lang="xml">

<FileSeriesReaderProxy name="{ReaderName}"
                       class="vtkFileSeriesReader"
                       label="{GUI Reader Name}"
                       file_name_method="{SetFileName method}">
  <Documentation ...

  <SubProxy>
    <Proxy name="Reader"
           proxygroup="internal_sources" proxyname="{CoreReader}" />
    <ExposedProperties>
      <Property ...
    </ExposedProperties>
  </SubProxy>

  <StringVectorProperty name="FileNames"
                        clean_command="RemoveAllFileNames"
                        command="AddFileName"
                        animateable="0"
                        number_of_elements="0" 
                        repeat_command="1">
    <FileListDomain name="files" />
    <Documentation>
      The file or list of files to be read by the reader.
      A list of files will be sequenced over time.
    </Documentation>
  </StringVectorProperty>

  <DoubleVectorProperty name="TimestepValues"
                        repeatable="1"
                        information_only="1">
    <TimeStepsInformationHelper />
    <Documentation>
      Available timestep values.
    </Documentation>
  </DoubleVectorProperty>
</FileSeriesReaderProxy>

</source>

The parameters you need to edit are as follows:

  • FileSeriesReaderProxy tag, name and label parameters: The name and label that the proxy will be registered under, just like any reader proxy.
  • FileSeriesReaderProxy tag, file_name_method parameter: The name of the method that sets the file name in the core reader. Note that this is the name of the method in the VTK class, not the name of the property in the server manager proxy.
  • Reader SubProxy: Set the proxy name (and group) of the Reader subproxy to the name of the core reader proxy (not the VTK class name). Also make sure to expose any properties that you want accessible through the GUI.

Register the reader proxy with the GUI in the same way you would register the core reader. For an example of a reader that uses the vtkFileSeriesReader in this way, see the legacy file reader.

Meta File

When using the meta file option with vtkFileSeriesReader, the reader can no longer behave like a reader for a single file set. It will be a separate reader that takes a different file format (the meta or case file). Thus, when you create the XML for your core reader, be sure to expose it as a reader (by putting it in the sources proxy group) and register it as itself in the GUI.

The following is the boilerplate server manager XML needed to enable a file series reader using the meta file option in ParaView. It sets up the special FileSeriesReaderProxy, establishes the core reader, enables the meta file option, and provides information about the times defined. Text enclosed in {braces} or after an ellipsis (...) needs to be replaced with a value specific to your reader.

<source lang="xml">

<FileSeriesReaderProxy name="{ReaderName}"
                       class="vtkFileSeriesReader"
                       label="{GUI Reader Name}"
                       file_name_method="{SetFileName method}">
  <Documentation ...

  <SubProxy>
    <Proxy name="Reader" proxygroup="sources" proxyname="{CoreReader}" />
    <ExposedProperties>
      <Property ...
    </ExposedProperties>
  </SubProxy>

  <StringVectorProperty name="FileName"
                        animateable="0"
                        command="SetMetaFileName"
                        number_of_elements="1">
    <FileListDomain name="files" />
    <Documentation>
      This points to a special metadata file that lists the
      output files for each restart.
    </Documentation>
  </StringVectorProperty>

  <IntVectorProperty name="UseMetaFile"
                     command="SetUseMetaFile"
                     number_of_elements="1"
                     default_values="1">
    <BooleanDomain name="bool" />
    <Documentation>
      This hidden property must always be set to 1 for this proxy to work.
    </Documentation>
  </IntVectorProperty>

  <DoubleVectorProperty name="TimestepValues" 
                        repeatable="1"
                        information_only="1">
    <TimeStepsInformationHelper />
  </DoubleVectorProperty>

  <Hints>
    <Property name="UseMetaFile" show="0" />
  </Hints>
</FileSeriesReaderProxy>

</source>

The parameters you need to edit are as follows:

  • FileSeriesReaderProxy tag, name and label parameters: The name and label that the proxy will be registered under, just like any reader proxy. Of course, make sure they do not conflict with the core reader.
  • FileSeriesReaderProxy tag, file_name_method parameter: The name of the method that sets the file name in the core reader. Note that this is the name of the method in the VTK class, not the name of the property in the server manager proxy.
  • Reader SubProxy: Set the proxy name (and group) of the Reader subproxy to the name of the core reader proxy (not the VTK class name). Also make sure to expose any properties that you want accessible through the GUI.

Register the reader proxy with the GUI in the same way you would register any other reader. Of course, make sure you use a different file extension for the meta/case file than the types of files it points to. For an example of a reader that uses the vtkFileSeriesReader in this way, see the SPCTH restart reader.

Gotchas

Ideally, you can create a restart reader by simply wrapping your core reader in a vtkFileSeriesReader as described previously. In practice, there are sometimes “gotchas” that require extra work on your part. This section captures them and offers advice on how to solve them.

Reader process request robustness

Readers are usually used in a specific order; state is set and ProcessRequest is called with a specific sequence of events: request data object, request information, and request data. State may be changed, but usually not the file name.

vtkFileSeriesReader will often need to call the core reader in different ways. It will occasionally have to change the file name to query and retrieve information over time. It may also have to call request information multiple times in a row to retrieve all the time information. This behavior may cause conditions that were not considered or tested in the core reader, so finding aberrant behavior from the core reader is common. In short, be ready to do some debugging.

Reader calls GetOutput from within ProcessRequest

All vtkAlgorithm classes (including all reader classes) have a GetOutput method that returns its output object for the pipeline. The output is really managed by the executive attached to the algorithm.

When processing a pipeline request, the algorithm is supposed to use the input and output objects passed to it through the information object arguments to ProcessRequest (and subsequent calls to methods like RequestInformation and RequestData). Although they are not supposed to do so, some algorithms might ignore these arguments and call GetOutput on themselves to get the output data object. Under normal pipeline operation, these two objects are the same, and the algorithm will seem to behave normally. However, vtkFileSeriesReader will call ProcessRequest outside of the core reader’s pipeline, and the two output objects will be different.

When vtkFileSeriesReader is used with a core reader that calls GetOutput in its RequestData, it usually results in vtkFileSeriesReader returning an empty data object. In this case, the core reader will need to be fixed to use the output object passed in as arguments to ProcessRequest/RequestData.
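
The failure mode can be reproduced with a toy model. The following Python sketch does not use the VTK API; all class and function names are hypothetical stand-ins, and plain lists play the role of data objects. It only illustrates why a reader that fills its own output leaves the wrapper's output empty.

```python
class ToyReader:
    """Stand-in for an algorithm that owns an output object."""

    def __init__(self):
        self._own_output = []

    def get_output(self):
        return self._own_output


class BuggyReader(ToyReader):
    def request_data(self, output):
        # Ignores the output passed in and fills its own output instead.
        self.get_output().append("mesh")


class FixedReader(ToyReader):
    def request_data(self, output):
        # Fills the output object handed to it by the caller.
        output.append("mesh")


def read_through_wrapper(reader):
    """Mimic a wrapper that supplies its own output object to the reader."""
    wrapper_output = []
    reader.request_data(wrapper_output)
    return wrapper_output


buggy_result = read_through_wrapper(BuggyReader())
fixed_result = read_through_wrapper(FixedReader())
```

Here buggy_result is an empty list while fixed_result contains the data, which mirrors the empty data object described above.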

State changes between input changes

When the vtkFileSeriesReader changes the filename in the core reader, it expects the rest of the core reader’s state to remain the same. However, when the core reader detects that the filename has changed, it may clear out some state associated with information specific to that file, including state that is exposed through properties. This is not a bug on the reader’s part; rather, it is simply the engineered behavior of the reader. However, it can cause server manager properties to become out of sync.

There is no automatic way around this situation. The easiest solution that we have found is to subclass vtkFileSeriesReader and to override the RequestInformationForInput method. This is the method that vtkFileSeriesReader calls to change the filename of the core reader and get information about that file. By overriding this method, a subclass has the ability to save part of the state, let the superclass change the filename and read information, and then restore the appropriate state. An example of this is the vtkExodusFileSeriesReader.
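
The save and restore pattern can be shown with a toy model. The following Python sketch does not use the VTK API; the classes, the enabled_arrays state, and the method names are hypothetical stand-ins that only mirror the structure of overriding RequestInformationForInput.

```python
class ToyCoreReader:
    """Stand-in for a core reader that resets state when the file changes."""

    def __init__(self):
        self.file_name = None
        self.enabled_arrays = set()

    def set_file_name(self, name):
        if name != self.file_name:
            self.file_name = name
            self.enabled_arrays = set()  # per-file state is cleared


class ToySeriesReader:
    """Stand-in for the meta-reader that switches files on the core reader."""

    def __init__(self, core):
        self.core = core

    def request_information_for_input(self, file_name):
        self.core.set_file_name(file_name)


class StatePreservingSeriesReader(ToySeriesReader):
    def request_information_for_input(self, file_name):
        saved = set(self.core.enabled_arrays)             # save state
        super().request_information_for_input(file_name)  # change the file
        self.core.enabled_arrays = saved                  # restore state


core = ToyCoreReader()
core.set_file_name("restart0")
core.enabled_arrays = {"pressure"}
StatePreservingSeriesReader(core).request_information_for_input("restart1")
```

After the call, the core reader points at restart1 and still has pressure enabled. With the plain ToySeriesReader the selection would have been lost.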

Acknowledgements

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.

SAND 2008-3286P