ParaView/Users Guide/Batch Processing

From KitwarePublic
Revision as of 11:28, 4 February 2011 by DaveDemarle (Talk | contribs)

Batch Processing

ParaView's pvbatch and pvpython command line executables replace the Qt GUI, through which most users control ParaView's back end data processing and rendering engine, with a Python interpreter. Either may be used for batch processing, that is, to replay visualization sessions in an exact, easily repeated way. The input to either is the same kind of Python script that was described in the previous section.

Of the two, pvbatch is more specialized for batch processing and suited to running in an offline mode on dedicated data processing supercomputers because:

  • It does not take in commands from the terminal, which is usually unavailable on this class of machines.

Therefore you must supply a filename of the script you want pvbatch to execute.

  • It is permanently joined to the back end server and thus does not require TCP socket connections to it.

Therefore in the scripts that you give to pvbatch it is not possible to Disconnect() from the paired server or Connect() to a different one.

  • It can be run directly as an MPI parallel program in which all pvbatch processes divide up the work and cooperate.

Therefore you typically start pvbatch like this:

[mpiexec -N <numprocessors>] pvbatch [args-for-pvbatch] script-filename [args-for-script]
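When pvbatch is launched from another program, the same command line can be assembled as an argument list. The following is a minimal sketch: the helper name pvbatch_command, its parameters, and the script name render.py are hypothetical, and the launcher flag simply mirrors the template above, so substitute whatever your MPI installation expects.

```python
def pvbatch_command(script, script_args=(), pvbatch_args=(), num_procs=None,
                    launcher=("mpiexec", "-N")):
    """Assemble a pvbatch command line as a list of arguments.

    Hypothetical helper: it only mirrors the template
    [mpiexec -N <numprocessors>] pvbatch [args-for-pvbatch] script-filename [args-for-script]
    and does not run anything.
    """
    cmd = []
    if num_procs is not None:
        # Optional MPI prefix, used only for parallel runs.
        cmd += list(launcher) + [str(num_procs)]
    cmd.append("pvbatch")
    cmd += list(pvbatch_args)
    cmd.append(script)        # pvbatch always needs a script filename
    cmd += list(script_args)  # everything after the script goes to the script
    return cmd


# "render.py" is a placeholder script name.
serial = pvbatch_command("render.py")
parallel = pvbatch_command("render.py", ["--runid", "3"], num_procs=8)
print(" ".join(serial))
print(" ".join(parallel))
```

The returned list can be handed to subprocess.call unchanged, which avoids shell quoting problems when script arguments contain spaces.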

Creating the Input Deck

There are at least three ways to create a batch script.

The hardest is to write it by hand using the syntax described in the previous section. You can of course use any text editor for this, but you will probably be more productive if you set up a more fully featured Python IDE such as Idle, or use the Python shell within the ParaView GUI, so that you have access to interactive documentation, tab completion, and quick preview capabilities. The second is to let the ParaView GUI client record all of your actions into a Python script by using the Python Trace feature. You can easily tweak the recorded script later, once you become familiar with ParaView's Python syntax. The third, and to longtime ParaView users the most traditional, is to record a ParaView state file and then load it via a small Python script, as demonstrated in the first example below.


Loading a state file and saving a rendered result

>>> from paraview.simple import *
# Load the state
>>> servermanager.LoadState("/Users/berk/myteststate.pvsm")

At this point you have a working pipeline instantiated on the server, which you can examine through introspection and then control arbitrarily. At its core ParaView is a visualization engine, so we will demonstrate by simply generating and saving an image.

# Make sure that the view in the state is the active one so we don't have to refer to it by name.
>>> SetActiveView(GetRenderView())
# Now render and save.
>>> Render()
>>> WriteImage("/Users/berk/image.png")

Parameter study

Parameter studies are one example of how batch processing can be extremely useful. In a parameter study one or more pipeline parameters (a filename, a timestep, or a filter property, for example) are varied across some range while an otherwise identical script is replayed numerous times and the results are saved. After the suite of sessions completes, the set of results is easy to compare. For this type of work I recommend writing a higher level script that varies the parameter and, for each value, spawns a pvbatch session in which the parameter is passed as an argument to the ParaView Python script.
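The recommendation above can be sketched in Python as a small driver. This is only an illustration under stated assumptions: the function names, the script name study.py, and the --runid and --triangles options are hypothetical placeholders for your own script and its arguments, and pvbatch is assumed to be on the PATH.

```python
import subprocess


def sweep_commands(script, values, option="--triangles", pvbatch="pvbatch"):
    """Return one pvbatch command (as an argument list) per parameter value.

    Hypothetical helper: `script`, `option` and `pvbatch` stand in for your
    own script, its command line option, and the path to pvbatch.
    """
    commands = []
    for run_id, value in enumerate(values):
        commands.append([pvbatch, script,
                         "--runid", str(run_id), option, str(value)])
    return commands


def run_sweep(commands, dry_run=True):
    """Run the sessions one after another; with dry_run, only print them."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.check_call(cmd)  # raises if a session fails


commands = sweep_commands("study.py", [10, 20, 30])
run_sweep(commands)  # dry run: prints the three command lines
```

Passing a run identifier along with the parameter value lets each session write its results under a distinct name, which keeps the outputs easy to compare afterward.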

The following is a slightly condensed version of a hierarchical set of scripts written during a benchmark study. This benchmark is an example of a parameter study in which the number of triangles rendered in the scene is varied and afterward we examine the output to determine how the rendering rate differs as a function of that parameter change.

This top level script varies the number of triangles and then submits parallel jobs to the cluster's PBS batch queue. See the qsub manpages or ask your system administrators for the exact syntax of the submission command.

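for NUMTRIS in 10 20 30 40 50
do
    mkdir ~/tmp/run${RUNID}
    # the trailing qsub arguments, including the second level script, are omitted in this condensed listing
    qsub -N run${RUNID} \
        -l "walltime=0:${TLIMIT}:0.0 select=${NNODES}:ncpus=8:arch=wds024c" \
        -j eo -e ~/tmp/run${RUNID}/outstreams.log \
    let RUNID+=1
done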

The second level script is executed whenever the job reaches the top of PBS's priority queue. It examines the parameters it is given and then runs ParaView's pvbatch executable with them. It also does some bookkeeping tasks that are helpful when debugging the batch submission process.

#setup MPI environment
source ${HOME}/
#prepare and run the parallel pvbatch program for the parameter value we are given
batch_command="${HOME}/ParaView-3.8.1/build/bin/pvbatch ${HOME}/ --runid ${RUNID} --triangles ${NUMTRIS}"
mpirun -np $NNODES --hostfile $PBS_NODEFILE $batch_command
#move the results to more permanent storage
mv /tmp/bench* ${HOME}/tmp/run${RUNID}

The final level is the script that is executed by pvbatch.

from paraview.simple import *
from optparse import OptionParser
import paraview.benchmark
import math
import sys
import time
parser = OptionParser()
# optparse only accepts single-character short options
parser.add_option("-#", "--runid", action="store", dest="runid",type="int",
                  default=-1, help="an identifier for this run")
parser.add_option("-t", "--triangles", action="store", dest="triangles",type="int",
                  default=1, help="millions of triangles to render")
(options, args) = parser.parse_args()
print "########################################"
print "RUNID = ", options.runid
print "START_TIME = ", time.localtime()
print "ARGS = ", sys.argv
print "OPTIONS = ", options
print "########################################"
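TS = Sphere()
# Assumption: a sphere with side x side resolution has roughly 2*side*side
# triangles, so choose side to reach the requested number of millions.
side = int(math.ceil(math.sqrt(options.triangles * 1000000 / 2.0)))
TS.PhiResolution = side
TS.ThetaResolution = side
dr = Show()
# fetch the view before changing its properties
view = GetRenderView()
view.UseImmediateMode = 0
Render()
cam = GetActiveCamera()
for i in range(0,50):
  # Assumption: rotate the camera so that every frame is rendered anew before it is saved.
  cam.Azimuth(3)
  Render()
  WriteImage('/tmp/bench_%d_image_%d.jpg' % (options.runid, i))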
print "total Polygons:" + str(dr.SMProxy.GetRepresentedDataInformation(0).GetPolygonCount())
print "view.ViewSize:" + str(view.ViewSize)
logname="/tmp/bench_" + str(options.runid) + "_rawlog.txt"
print "#######"
print "END_TIME = ", time.localtime()

Large data example

Another important use is visualizing extremely large datasets that cannot easily be worked with interactively. In this setting, the user first constructs a visualization from a small but representative dataset. Typically this is done by recording a session in the standard GUI client running on some small and easily accessed machine. Later, the user edits the filename property of the reader in the recorded session file to point to the large, full resolution data. Finally, the user submits the script to a larger machine, which performs the visualization offline and saves the results.

The essential thing that you need to be able to do for this is to substitute the filename and location of the original small dataset with those of the large one. There are two ways to do this.

The first way is to directly edit the filename in either the ParaView state file or the Python script where it is loaded. The task is made easier by the fact that all readers conventionally name the input file name property "FileName" (or "FileNames", as in the excerpt below). Standard Python scripts are well described in other sections, so we will describe ParaView state files here instead. A ParaView state file has the extension .pvsm and its internal format is text based XML. Simply open the pvsm file in a text editor, search for FileName, and replace all occurrences of the old file path with the new one.

For reference, the portion of a pvsm file that specifies a reader's input file is:

    <Proxy group="sources" type="LegacyVTKFileReader" id="160" servers="1">
      <Property name="FileNameInfo" id="160.FileNameInfo" number_of_elements="1">
        <Element index="0" value="/Data/molar.vtk"/>
      </Property>
      <Property name="FileNames" id="160.FileNames" number_of_elements="1">
        <Element index="0" value="/Data/molar.vtk"/>
        <Domain name="files" id="160.FileNames.files"/>
      </Property>
      <Property name="TimestepValues" id="160.TimestepValues"/>
      <SubProxy name="Reader" servers="1"/>
    </Proxy>
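
For many state files, or as one step of a batch job, the same search and replace can be scripted. The following is a minimal sketch using Python's standard XML parser. It assumes that file names are stored as Element children of Property nodes whose name starts with "FileName", as in the fragment above; the function name and the replacement path /BigData/molar_full.vtk are hypothetical.

```python
import xml.etree.ElementTree as ET


def replace_filenames(pvsm_text, old_path, new_path):
    """Return pvsm XML text with reader file names changed from old to new.

    Sketch only: assumes file names live in <Element value="..."/> children
    of <Property> nodes whose name starts with "FileName".
    """
    root = ET.fromstring(pvsm_text)
    for prop in root.iter("Property"):
        if not prop.get("name", "").startswith("FileName"):
            continue
        for element in prop.findall("Element"):
            if element.get("value") == old_path:
                element.set("value", new_path)
    return ET.tostring(root, encoding="unicode")


# A cut-down state fragment; a real .pvsm file would be read from disk.
sample = """<Proxy group="sources" type="LegacyVTKFileReader" id="160" servers="1">
  <Property name="FileNames" id="160.FileNames" number_of_elements="1">
    <Element index="0" value="/Data/molar.vtk"/>
  </Property>
  <Property name="TimestepValues" id="160.TimestepValues"/>
</Proxy>"""

# "/BigData/molar_full.vtk" is a placeholder for the full resolution file.
updated = replace_filenames(sample, "/Data/molar.vtk", "/BigData/molar_full.vtk")
print(updated)
```

Matching on the property name rather than on the raw text avoids touching other attributes that happen to contain the old path.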

The second way is to use introspection to set up the pipeline and then replace the file with the larger one before the pipeline updates and time is wasted processing the smaller one. An example of how to do this follows: