Fast Path For Temporal Data

From ParaQ Wiki

Problem Definition

The new time support in VTK is described here. It contains the following excerpt:


...requesting the data for one cell across many timesteps would still be very slow compared to what it could be for a more optimized path. To address this need in the future we plan on creating a fast-path for such requests. This fast path will be implemented for key readers and the array calculator initially. It is still unclear exactly how this will be implemented. But it will effectively be a separate pipeline possibly connecting to a different output port. Another option is to have a special information request that returns the data as meta-information as opposed to first class data.


The purpose of this page is to begin a discussion on how to implement such a "fast-path".

Exodus Example

An example of this optimized path can be seen in the exodus API. It contains ex_get_xxx_time() functions that read the values of a node/element variable for a single node/element through a specified number of time steps. When a filter wants a node's/element's variable value over time (e.g. the vtkExtractDataOverTime filter), instead of re-executing for each timestep, it would send one request upstream to the reader which, in the case of the exodus reader, would then call the appropriate ex_get_xxx_time() method. The problem then becomes how to propagate the data back to the filter. Should it use a separate pipeline or send it back in a vtkInformation key-to-value map?

Proposed Algorithm

1. The exodus reader advertises a special key that tells filters it supports a fast-path for extracting data over time.

2. If a filter supports fast-paths, it will check its input pipeline information to see if it has this key. If it does, it creates an information request to send upstream to the reader, telling it the type of variable (node or element), the id of the node/element, and the range of time steps to return.

3. The reader listens for this request and responds to it by calling ex_get_(elem|nodal)_var_time() for each enabled (nodal/element) variable array. For each one, it will add a new array to its output vtkFieldData, where the array name is formatted as "{ARRAY_NAME}OverTime" (e.g. "TemperatureOverTime") to ensure no conflicts occur with the names of arrays in the vtkPointData or vtkCellData.

4. The filter then unpacks the "{ARRAY_NAME}OverTime" arrays from the field data and copies them onto its output point/cell data arrays (changing the array names back to the originals).
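The four steps above can be sketched end to end with plain dicts standing in for vtkInformation and vtkFieldData. None of this is real VTK API; the key name mirrors the proposal and everything else is a hypothetical model.

```python
FAST_PATH_KEY = "FAST_PATH_FOR_TEMPORAL_DATA"

# Step 1: the reader advertises the fast path in its output information.
reader_output_info = {FAST_PATH_KEY: 1}

# Fake per-timestep point data held by the "reader":
# name -> {time_step -> per-point values}.
point_arrays = {"Temperature": {0: [10.0, 11.0], 1: [10.5, 11.5], 2: [11.0, 12.0]}}

def reader_handle_request(request):
    # Step 3: respond by producing "{ARRAY_NAME}OverTime" arrays in field data.
    t0, t1 = request["time_range"]
    return {name + "OverTime":
                [per_step[t][request["object_id"]] for t in range(t0, t1 + 1)]
            for name, per_step in point_arrays.items()}

def filter_execute():
    # Step 2: check for the key, then send one request upstream.
    if FAST_PATH_KEY not in reader_output_info:
        raise RuntimeError("no fast path; would re-execute per time step")
    request = {"object_type": "POINT", "object_id": 1, "time_range": (0, 2)}
    field_data = reader_handle_request(request)
    # Step 4: unpack the field-data arrays under their original names.
    return {name[:-len("OverTime")]: values
            for name, values in field_data.items()}

print(filter_execute())  # {'Temperature': [11.0, 11.5, 12.0]}
```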


Questions

One question this discussion doesn't seem to answer is how this mechanism interacts with the pipeline executives. Will it be a separate request (on the same level as REQUEST_DATA or REQUEST_INFORMATION)? If not, which request type will it be included in? In any event, which executive will generate requests of this type?
The following keys will be added to vtkStreamingDemandDrivenPipeline: FAST_PATH_FOR_TEMPORAL_DATA (set by the reader on its output information), FAST_PATH_OBJECT_TYPE ("CELL", "POINT", "EDGE", etc.), FAST_PATH_ID_TYPE ("GLOBAL" or "INDEX"), and FAST_PATH_OBJECT_ID. In the filter's handling of a REQUEST_UPDATE_EXTENT request, these keys will be added to its input information and thus propagated upstream. In the reader's handling of REQUEST_DATA requests, it will extract these keys. In its own handling of REQUEST_DATA requests, the filter will unpack the temporal arrays from its input field data and pass them to its output point data. The pipeline will actually have to be executed a couple of times in order for all this to happen. --Eric 12:34, 17 July 2007 (EDT)
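The key flow Eric describes can be modeled roughly as follows, with dicts standing in for pipeline information objects. The key names come from the proposal; the functions and the hard-coded id are hypothetical, and the real mechanism runs through the VTK executives rather than direct calls.

```python
FAST_PATH_KEYS = ("FAST_PATH_OBJECT_TYPE",
                  "FAST_PATH_ID_TYPE",
                  "FAST_PATH_OBJECT_ID")

def filter_request_update_extent(input_info):
    # During REQUEST_UPDATE_EXTENT, the filter stamps the fast-path keys on
    # its input information, which the executive propagates upstream.
    input_info.update({"FAST_PATH_OBJECT_TYPE": "POINT",
                       "FAST_PATH_ID_TYPE": "GLOBAL",
                       "FAST_PATH_OBJECT_ID": 42})

def reader_request_data(input_info):
    # During REQUEST_DATA, the reader extracts the keys (if present) and
    # knows which object's history to read via the fast path.
    return {k: input_info[k] for k in FAST_PATH_KEYS if k in input_info}

info = {}
filter_request_update_extent(info)
print(reader_request_data(info))
```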

Should we even allow the filter to request a range of time steps? The exodus API supports it, which is why I included it.

I also think that it should be included. One use case for this is plots comparing different datasets. Rather than force the user to trim unneeded extents where two datasets' simulation times don't overlap, we should be able to request only those times where they do overlap.

Should the filter be able to specify a specific array to extract data from (instead of having the reader output data for all enabled arrays)?

This would certainly increase the efficiency of the reader, since it would eliminate disk seeks to page in data that may never be used.
The proposal mentions cell and node data, but what about the edge and face data?
I think a bigger issue is how to map from the VTK point/cell id that the user has selected to the correct id in the correct exodus file. This is probably a technical detail that Eric will have to figure out (and a detail that will change once we move to multiblock). Once it is figured out, the edge/face data should just fall out. It is my understanding that those are defined only in edge and face sets, which means that the user will be selecting a cell variable on that part of the output. So, again, this is just reverse mapping back to the appropriate exodus identifier. --Ken 09:45, 17 July 2007 (EDT)
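The reverse mapping Ken mentions amounts to the reader recording, for each VTK point/cell it produces, which exodus file and local id it came from. A minimal sketch, with a wholly hypothetical table and file names:

```python
# Built while reading: vtk_id -> (exodus_file, local_exodus_id).
# With multiblock this table would become per-block, but the lookup
# direction is the same.
reverse_map = {
    0: ("block1.ex2", 17),
    1: ("block1.ex2", 18),
    2: ("block2.ex2", 3),
}

def to_exodus_id(vtk_id):
    """Map a VTK point/cell id selected by the user back to the exodus
    file and local id needed for an ex_get_xxx_time() call."""
    return reverse_map[vtk_id]

print(to_exodus_id(2))  # ('block2.ex2', 3)
```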
How will this work when running pvserver in parallel? Will only the rank 0 process respond? Or will each process look up the node/cell values in its piece of the dataset? This is something that must be decided if the fast path is not an "out-of-band" call. If it is an out-of-band request, then it is not as important. --Dcthomp 03:12, 17 July 2007 (EDT)
A big part of this will be determined in how the reverse mapping happens. Most likely, each process will have to hold information about the part of the data that it read so that it could do the reverse mapping of the local cells. However, it may be a good idea to transfer the data all to the root process. Ideally it would work either way, but we would be less likely to run into bugs or technical issues if the data was just transferred to node 0. --Ken 09:45, 17 July 2007 (EDT)
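Ken's "each rank answers for its own piece, results land on rank 0" option can be modeled without MPI as follows. This is a toy stand-in (lists instead of processes), not ParaView code, and the data is invented for illustration.

```python
# rank_maps[r] = the reverse-mapped values rank r can answer for:
# only the rank whose piece contains the id responds.
rank_maps = [
    {0: 11.0, 1: 11.5},   # rank 0 owns points 0-1
    {2: 12.0, 3: 12.5},   # rank 1 owns points 2-3
]

def fast_path_lookup(vtk_id):
    # Each rank checks its piece; the hits are "gathered" to rank 0,
    # which reports the (at most one) owning rank's answer.
    hits = [m[vtk_id] for m in rank_maps if vtk_id in m]
    return hits[0] if hits else None

print(fast_path_lookup(2))  # 12.0
```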