Difference between revisions of "ParaView/Users Guide/Batch Processing"

From KitwarePublic

Revision as of 15:20, 19 May 2011

Batch Processing

ParaView's pvbatch and pvpython command-line executables substitute a Python interpreter for the Qt GUI through which most users control ParaView's back-end data processing and rendering engine. Either may be used for batch processing, that is, to replay visualization sessions in an exact, easily repeated way. The input to either is the same kind of Python script that was described in the previous section.

Of the two, pvbatch is more specialized for batch processing and suited to running in an offline mode on dedicated data-processing supercomputers because:

  • It does not take commands from the terminal, which is usually unavailable on this class of machines.

Therefore, you must supply the filename of the script you want pvbatch to execute.

  • It is permanently joined to the back-end server and thus does not require TCP socket connections to it.

Therefore, in the scripts that you give to pvbatch, it is not possible to Disconnect() from the paired server or Connect() to a different one.

  • It can be run directly as an MPI parallel program in which all pvbatch processes divide the work and cooperate.

Therefore, you typically start pvbatch like this:

[mpiexec -n <numprocessors>] pvbatch [args-for-pvbatch] script-filename [args-for-script]

Creating the Input Deck

There are at least three ways to create a batch script.

The hardest is to write it by hand using the syntax described in the previous section. You can, of course, use any text editor for this. However, you will probably be more productive if you set up a more fully featured Python environment, such as IDLE or the Python shell within the ParaView GUI, so that you have access to interactive documentation, tab completion, and quick preview capabilities. The second way is to let the ParaView GUI client record all of your actions into a Python script by using the Python Trace feature. Later, you can easily tweak the recorded script once you become familiar with ParaView's Python syntax. The third, and to longtime ParaView users the most traditional, way is to record a ParaView state file and then load it via a small Python script, as demonstrated in the first example below.

Examples

Loading a State File and Saving a Rendered Result

>>> from paraview.simple import *
# Load the state
>>> servermanager.LoadState("/Users/berk/myteststate.pvsm")

At this point, you have a working pipeline instantiated on the server. You can use introspection to access and control anything within it. At its core, ParaView is a visualization engine, so we will demonstrate by simply generating and saving an image.

# Make sure that the view in the state is the active one so we don't have to refer to it by name.
>>> SetActiveView(GetRenderView())
# Now render and save.
>>> Render()
>>> WriteImage("/Users/berk/image.png")

Parameter Study

Parameter studies are one example of how batch processing can be extremely useful. In a parameter study, one or more pipeline parameters (for example, a filename, a timestep, or a filter property) are varied across some range, while an otherwise identical script is replayed for each value and the results are saved. After the suite of sessions completes, the set of results is easy to compare. For this type of work, it is helpful to write a higher-level script that varies the parameter; each value spawns a pvbatch session in which the parameter is passed as an argument to the ParaView Python script.

The following is a slightly condensed version of a hierarchical set of scripts written during a benchmark study. This benchmark is an example of a parameter study in which the number of triangles rendered in the scene is varied. Afterward, we examine the output to determine how the rendering rate differs as a function of that parameter change.

This top-level script varies the number of triangles and then submits parallel jobs to the cluster's PBS batch queue. See the qsub manpages or ask your system administrators for the exact syntax of the submission command.

RUNID=0
NNODES=8
TLIMIT=10
for NUMTRIS in 10 20 30 40 50
do
    mkdir ~/tmp/run${RUNID}
 
    qsub -N run${RUNID} \
        -l "walltime=0:${TLIMIT}:0.0 select=${NNODES}:ncpus=8:arch=wds024c" \
        -j eo -e ~/tmp/run${RUNID}/outstreams.log \
        -v "RUNID=${RUNID} NNODES=${NNODES} NUMTRIS=${NUMTRIS}" \
        ~/level2.sh
 
    let RUNID+=1
done
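
For readers who prefer to drive the sweep from Python, the same loop can be sketched as follows. This is only an illustration under stated assumptions: it launches pvbatch directly through mpiexec instead of submitting to a PBS queue, it assumes the MPI launcher accepts -n, and the names build_commands and run_all are hypothetical. It passes the parameters through the --runid and --triangles long options of the pvbatch script shown later in this example.

```python
import subprocess


def build_commands(triangle_counts, nprocs=8, script="level3.py",
                   pvbatch="pvbatch"):
    """Return one pvbatch command line (as an argument list) per value."""
    commands = []
    for runid, numtris in enumerate(triangle_counts):
        commands.append([
            "mpiexec", "-n", str(nprocs),   # assumed MPI launcher syntax
            pvbatch, script,
            "--runid", str(runid),          # long options of the pvbatch script
            "--triangles", str(numtris),
        ])
    return commands


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.check_call(cmd)


commands = build_commands([10, 20, 30, 40, 50])
run_all(commands)  # dry run: only prints the five command lines
```

Keeping command construction separate from execution makes it possible to inspect the full sweep before any job is started.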

The second-level script is executed when the job reaches the top of PBS's priority queue. It examines the parameters it is given and then runs ParaView's pvbatch executable with them. It also does some bookkeeping tasks that are helpful when debugging the batch submission process.

echo "RUN NUMBER=${RUNID}"
 
#setup MPI environment
source ${HOME}/openmpipaths.sh
 
#prepare and run the parallel pvbatch program for the parameter value we are given
batch_command="${HOME}/ParaView-3.8.1/build/bin/pvbatch ${HOME}/level3.py -# ${RUNID} -t ${NUMTRIS}"
mpirun -np $NNODES --hostfile $PBS_NODEFILE $batch_command
 
#move the results to more permanent storage
mv /tmp/bench* ${HOME}/tmp/run${RUNID}

The final level is the script that is executed by pvbatch.

from paraview.simple import *
from optparse import OptionParser
import paraview.benchmark
import math
import sys
import time
 
parser = OptionParser()
# optparse only accepts single-character short options, so use -t rather than -nt
parser.add_option("-#", "--runid", action="store", dest="runid", type="int",
                  default=-1, help="an identifier for this run")
parser.add_option("-t", "--triangles", action="store", dest="triangles", type="int",
                  default=1, help="millions of triangles to render")
(options, args) = parser.parse_args()
 
print("########################################")
print("RUNID = " + str(options.runid))
print("START_TIME = " + str(time.localtime()))
print("ARGS = " + str(sys.argv))
print("OPTIONS = " + str(options))
print("########################################")
 
paraview.benchmark.maximize_logs()
 
TS = Sphere()
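# the resolution properties are integers, so truncate the square root
side = int(math.sqrt(options.triangles*1000000/2))
TS.PhiResolution = side
TS.ThetaResolution = side
 
dr = Show()
# fetch the view before configuring it
view = GetRenderView()
view.UseImmediateMode = 0
Render()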
 
cam = GetActiveCamera()
for i in range(0,50):
  cam.Azimuth(3)
  Render()
  WriteImage('/tmp/bench_%d_image_%d.jpg' % (options.runid, i))
 
print("total Polygons:" + str(dr.SMProxy.GetRepresentedDataInformation(0).GetPolygonCount()))
print("view.ViewSize:" + str(view.ViewSize))
 
paraview.benchmark.get_logs()
logname="/tmp/bench_" + str(options.runid) + "_rawlog.txt"
paraview.benchmark.dump_logs(logname)
 
print("#######")
print("END_TIME = " + str(time.localtime()))
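
The way the script turns a triangle budget into a sphere resolution can be checked in isolation. The sketch below repeats the formula from the script; the reading that the sphere yields roughly two triangles per (phi, theta) cell is inferred from that formula and is not stated in the source, and the helper name sphere_resolution is hypothetical.

```python
import math


def sphere_resolution(millions_of_triangles):
    """Value to use for both PhiResolution and ThetaResolution."""
    triangles = millions_of_triangles * 1000000
    # Assumption: about 2 triangles per (phi, theta) cell, as the
    # script's formula implies.
    return int(math.sqrt(triangles / 2.0))


# Print the resolution for each triangle count used in the sweep.
for millions in (10, 20, 30, 40, 50):
    print(millions, sphere_resolution(millions))
```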

Large Data Example

Another important use is visualizing extremely large datasets that cannot easily be worked with interactively. In this setting, the user first constructs a visualization of a small but representative dataset. This typically takes place by recording a session in the standard GUI client running on some small and easily accessed machine. Later, the user changes the file name property of the reader in the recorded session file. Finally, the user submits the script to a larger machine, which performs the visualization offline and saves results for later inspection.

It is essential to replace the file name and location of the original small dataset with the name and location of the large one. There are two ways to do this.

The first way is to directly edit the file name in either the ParaView state file or the Python script where it is loaded. The task is made easier by the fact that readers conventionally name the input file property "FileName"; some readers, such as the one in the excerpt below, use "FileNames" instead, which the same search will also find. Standard Python scripts are well described in other sections, so we will describe ParaView state files here instead. A ParaView state file has the extension ".pvsm", and its internal format is text-based XML. Simply open the pvsm file in a text editor, search for FileName, and replace all occurrences of the old path with the new one.

For reference, the portion of a pvsm file that specifies a reader's input file is:

    <Proxy group="sources" type="LegacyVTKFileReader" id="160" servers="1">
      <Property name="FileNameInfo" id="160.FileNameInfo" number_of_elements="1">
        <Element index="0" value="/Data/molar.vtk"/>
      </Property>
      <Property name="FileNames" id="160.FileNames" number_of_elements="1">
        <Element index="0" value="/Data/molar.vtk"/>
        <Domain name="files" id="160.FileNames.files"/>
      </Property>
      <Property name="TimestepValues" id="160.TimestepValues"/>
      <SubProxy name="Reader" servers="1"/>
    </Proxy>
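
The text-editor replacement described above can also be scripted. The following is a minimal sketch that uses only Python's standard XML library; it assumes the property names seen in the excerpt (FileName, FileNames, FileNameInfo) are the ones that matter, which may not hold for every reader. The function name retarget_state and the path /Data/large_molar.vtk are hypothetical.

```python
import xml.etree.ElementTree as ET

# Property names taken from the excerpt above; other readers may differ.
FILE_PROPERTIES = {"FileName", "FileNames", "FileNameInfo"}


def retarget_state(xml_text, old_path, new_path):
    """Return state XML with old_path replaced by new_path in file properties."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("Property"):
        if prop.get("name") in FILE_PROPERTIES:
            for element in prop.findall("Element"):
                if element.get("value") == old_path:
                    element.set("value", new_path)
    return ET.tostring(root, encoding="unicode")


sample = """<Proxy group="sources" type="LegacyVTKFileReader" id="160" servers="1">
  <Property name="FileNames" id="160.FileNames" number_of_elements="1">
    <Element index="0" value="/Data/molar.vtk"/>
  </Property>
</Proxy>"""

# /Data/large_molar.vtk is a made-up path used only for illustration.
updated = retarget_state(sample, "/Data/molar.vtk", "/Data/large_molar.vtk")
print(updated)
```

Restricting the replacement to the file properties avoids touching other attributes that happen to contain the same string.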

The second way is to set up the pipeline and then use introspection to find the reader and change its file name. This approach is easier to parameterize but somewhat more fragile, since not all readers respond well to having their file names changed once established. You should change the file name before the pipeline first runs. Otherwise, the reader is more likely to be confused, and you will also waste time processing the smaller file. When loading state files, the proper place to do this is immediately after the LoadState command. For Python scripts, the place to do this is as near to the creation of the reader as possible, and certainly before any Update or Render commands. An example of how to do this follows:

>>> from paraview.simple import *
# Load the state
>>> servermanager.LoadState("/Users/berk/myteststate.pvsm")
# Now the pipeline will be instantiated but it will not have updated yet.
# You can programmatically obtain the reader from the pipeline starting with this command, which lists all readers, sources and filters in the pipeline.
>>> GetSources()
#{('box.ex2', '274'): <paraview.servermanager.ExodusIIReader object at 0x21b3eb70>}
# But it is easier if you note that readers are typically named according to the name of the file that they are created for.
>>> reader = FindSource('box.ex2')
# Now you can change the filename with these two commands:
>>> reader.FileName = ['/path_to/can.ex2']
>>> reader.FileNameChanged()
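
The lookup and rename steps above can be wrapped in a small helper so that the new path can be supplied as a parameter. This is a sketch, not part of ParaView's API: retarget_reader is a hypothetical name, and it assumes the dictionary returned by GetSources() is keyed by (name, id) tuples as in the output shown above. The FakeReader class and the path /small/box.ex2 exist only so that the sketch runs without ParaView; in a real session you would pass GetSources() instead.

```python
def retarget_reader(sources, reader_name, new_path):
    """Point the reader registered under reader_name at new_path.

    `sources` is a dict keyed by (name, id) tuples, the shape shown in
    the GetSources() output above.
    """
    for (name, _proxy_id), proxy in sources.items():
        if name == reader_name:
            proxy.FileName = [new_path]
            # Some readers expose FileNameChanged(); call it when present.
            notify = getattr(proxy, "FileNameChanged", None)
            if callable(notify):
                notify()
            return proxy
    raise KeyError("no source named %r" % reader_name)


class FakeReader(object):
    """Stand-in for a reader proxy so the sketch runs without ParaView."""
    FileName = ["/small/box.ex2"]


sources = {("box.ex2", "274"): FakeReader()}
reader = retarget_reader(sources, "box.ex2", "/path_to/can.ex2")
print(reader.FileName)
```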