I agree about the dependencies. I&#39;m a bit worried about dealing with both a vtkPNetCDFReader and a vtkPNetCDFCFReader, but parallel performance for climate data sets is important enough that I think the update is worth it.<br><br>

Thanks,<br>Andy<br><br><div class="gmail_quote">On Thu, Feb 21, 2013 at 4:46 PM, Moreland, Kenneth <span dir="ltr">&lt;<a href="mailto:kmorel@sandia.gov" target="_blank">kmorel@sandia.gov</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div style="font-size:14px;font-family:Calibri,sans-serif;word-wrap:break-word">

<div>

<div>

<div>Both of these sound fine to me.  The only caveat is that (1) is a parallel-specific improvement that requires communication.  I don&#39;t remember the dependencies of vtkNetCDFCFReader or the library it is in, but I would hesitate to change those dependencies.

  It might be necessary to make a vtkPNetCDFCFReader subclass.</div>

</div>

</div>

<div><br>

</div>

<div>-Ken</div>

<div><br>

</div>

<span>

<div style="border-right:medium none;padding-right:0in;padding-left:0in;padding-top:3pt;text-align:left;font-size:11pt;border-bottom:medium none;font-family:Calibri;border-top:#b5c4df 1pt solid;padding-bottom:0in;border-left:medium none">

<span style="font-weight:bold">From: </span>Andy Bauer &lt;<a href="mailto:andy.bauer@kitware.com" target="_blank">andy.bauer@kitware.com</a>&gt;<br>

<span style="font-weight:bold">Date: </span>Thursday, February 21, 2013 12:38 PM<br>

<span style="font-weight:bold">To: </span>Burlen Loring &lt;<a href="mailto:bloring@lbl.gov" target="_blank">bloring@lbl.gov</a>&gt;<br>

<span style="font-weight:bold">Cc: </span>Kenneth Moreland &lt;<a href="mailto:kmorel@sandia.gov" target="_blank">kmorel@sandia.gov</a>&gt;, &quot;<a href="mailto:paraview@paraview.org" target="_blank">paraview@paraview.org</a>&quot; &lt;<a href="mailto:paraview@paraview.org" target="_blank">paraview@paraview.org</a>&gt;<br>

<span style="font-weight:bold">Subject: </span>Re: [Paraview] [EXTERNAL] vtkNetCDFCFReader parallel performance<br>

</div><div><div class="h5">

<div><br>

</div>

<blockquote style="BORDER-LEFT:#b5c4df 5 solid;PADDING:0 0 0 5;MARGIN:0 0 0 5">

<div>

<div>I&#39;ve been looking at the code and there are a bunch of small reads needed to get all of the metadata and set things up according to the CF conventions. I&#39;m now going through and making process 0 do this work and broadcast the results. I&#39;m getting decent speedups

 on this for 240 processes on hopper, where the runs go from taking about 58 seconds down to 38 seconds.

<br>
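A minimal sketch of why moving the metadata reads to process 0 helps, assuming every rank would otherwise issue the same small reads against the shared file. It simulates the ranks inside a single Python process: FakeFile, the attribute values, and the JSON round trip standing in for an MPI broadcast are all hypothetical and are not part of vtkNetCDFCFReader.<br>

```python
import json


class FakeFile:
    """Hypothetical stand-in for a shared netCDF file; counts small reads."""

    def __init__(self, attributes):
        self._attributes = dict(attributes)
        self.requests = 0

    def read_attribute(self, name):
        self.requests += 1
        return self._attributes[name]


def every_rank_reads(shared_file, names, n_ranks):
    """Each simulated rank queries the file for every attribute."""
    return [
        {name: shared_file.read_attribute(name) for name in names}
        for _ in range(n_ranks)
    ]


def root_reads_and_broadcasts(shared_file, names, n_ranks):
    """Rank 0 reads once; the other ranks receive a serialized copy."""
    metadata = {name: shared_file.read_attribute(name) for name in names}
    payload = json.dumps(metadata)  # stands in for one MPI broadcast
    return [json.loads(payload) for _ in range(n_ranks)]


# Illustrative attribute values, not taken from the real data set.
ATTRIBUTES = {"units": "kg m-2 s-1", "calendar": "noleap", "axis": "T"}
N_RANKS = 240  # the process count mentioned for the hopper runs

naive_file = FakeFile(ATTRIBUTES)
naive = every_rank_reads(naive_file, list(ATTRIBUTES), N_RANKS)

root_file = FakeFile(ATTRIBUTES)
shared = root_reads_and_broadcasts(root_file, list(ATTRIBUTES), N_RANKS)

print("requests when every rank reads:", naive_file.requests)
print("requests when only rank 0 reads:", root_file.requests)
```

Every rank ends up with the same metadata either way; the difference is that the file sees one set of small reads instead of one set per rank, at the cost of a single collective communication step.<br>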

<br>

Any objections to some significant refactoring of the reader? The 2 things I want to try are:<br>

<br>

1) read in the metadata on process 0 and broadcast it to the other processes<br>

<br>

2) reduce the number of file opens and closes in the reader, at the expense of keeping the file handle open.<br>

<br>
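Item 2 can be sketched in plain Python, assuming an ordinary binary file in place of a netCDF file. The CountingReader class and the count_opens helper are hypothetical; they only count how often the file is opened and make no claim about how the reader is actually structured.<br>

```python
import os
import tempfile


class CountingReader:
    """Reads byte ranges from a file, optionally keeping the handle open."""

    def __init__(self, path, keep_open):
        self.path = path
        self.keep_open = keep_open
        self.open_count = 0
        self._handle = None

    def _open(self):
        self.open_count += 1
        return open(self.path, "rb")

    def read(self, offset, size):
        if self.keep_open:
            # Open lazily on first use, then reuse the same handle.
            if self._handle is None:
                self._handle = self._open()
            self._handle.seek(offset)
            return self._handle.read(size)
        # Open and close around every single read.
        with self._open() as handle:
            handle.seek(offset)
            return handle.read(size)

    def close(self):
        if self._handle is not None:
            self._handle.close()
            self._handle = None


def count_opens(keep_open, n_reads=10):
    """Perform n_reads small reads and report how many opens they cost."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(bytes(range(256)))
        reader = CountingReader(path, keep_open)
        for i in range(n_reads):
            reader.read(i, 4)
        reader.close()
        return reader.open_count
    finally:
        os.remove(path)


print("opens when reopening per read:", count_opens(keep_open=False))
print("opens when keeping the handle:", count_opens(keep_open=True))
```

The trade-off is the one named in the list: the handle stays open for the lifetime of the reader, so it has to be closed explicitly when the reader is done.<br>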

Thanks,<br>

Andy<br>

<br>

<div class="gmail_quote">On Thu, Feb 7, 2013 at 4:33 PM, Burlen Loring <span dir="ltr">

&lt;<a href="mailto:bloring@lbl.gov" target="_blank">bloring@lbl.gov</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div text="#000000" bgcolor="#FFFFFF">

<div>Hi Andy,<br>

<br>

data that small should be fairly fast, and nersc&#39;s global scratch shouldn&#39;t blink when 24 procs access a file in read-only mode. maybe PV is reading all the data on a single process (or worse, on all of them) and then doing a redistribution behind the scenes?? That would

 certainly explain your results. either way good luck.<span><font color="#888888"><br>

<br>

Burlen</font></span>

<div>

<div><br>

<br>

On 02/07/2013 09:51 AM, Andy Bauer wrote:<br>

</div>

</div>

</div>

<div>

<div>

<blockquote type="cite">Hi Burlen,<br>

<br>

I got the data from a different user and that&#39;s where he put the data. I thought about copying it to $SCRATCH. It just seemed really funky to me, though: reading data that is under 4 MB for a single time step should be pretty fast when

 I only have 24 processes asking for it. I was also thinking that using the scratch space would just cover up some deeper problem, since I want to scale up to many more than 24 processes. After all, any run that can&#39;t scale beyond 24 processes shouldn&#39;t

 be running on Hopper anyway!<br>

<br>

Andy<br>

<br>

<div class="gmail_quote">On Thu, Feb 7, 2013 at 11:57 AM, Burlen Loring <span dir="ltr">

&lt;<a href="mailto:bloring@lbl.gov" target="_blank">bloring@lbl.gov</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div text="#000000" bgcolor="#FFFFFF">

<div>Hi Andy,<br>

<br>

do you have a strong reason for using the global scratch fs? if not you may have better luck using hopper&#39;s dedicated lustre scratch. The spec quotes &gt; 2x the bandwidth[*]. In reality I&#39;m sure it depends on the number of users hammering it at the time in question.

 it may help to use lustre scratch while you&#39;re working on parallelization of the netcdf readers.<br>

<br>

Burlen<br>

<br>

* <a href="http://www.nersc.gov/users/computational-systems/hopper/file-storage-and-i-o/" target="_blank">

http://www.nersc.gov/users/computational-systems/hopper/file-storage-and-i-o/</a>

<div>

<div><br>

<br>

On 02/06/2013 03:35 PM, Andy Bauer wrote:<br>

</div>

</div>

</div>

<blockquote type="cite">

<div>

<div>Hi Ken,<br>

<br>

I think it&#39;s more than just a file contention issue. On hopper@nersc I did set DVS_MAXNODES to 14 and that helped out a lot. Before, without that set, I was able to run with 480 processes accessing the same data file (the

<span>17*768*1152 with 324 time steps data set) but with the &quot;bad&quot; one that was </span>

<span>768*1152 with 9855 time steps</span> I had problems with just 24 processes.<br>

<br>

I have some things which I want to try out but I think you&#39;re right that using a parallel netcdf library should help a lot, if it doesn&#39;t cause conflicts.<br>

<br>

Thanks,<br>

Andy<br>

<br>

<div class="gmail_quote">On Wed, Feb 6, 2013 at 5:20 PM, Moreland, Kenneth <span dir="ltr">

&lt;<a href="mailto:kmorel@sandia.gov" target="_blank">kmorel@sandia.gov</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div style="font-size:14px;font-family:Calibri,sans-serif;word-wrap:break-word">

<div>

<div>

<div>This does not surprise me.  The current version of the netCDF reader only uses the basic interface for accessing files, which is basically a serial interface.  You are probably getting a lot of file request contention.</div>

<div><br>

</div>

<div>At the time I wrote the netCDF reader, parallel versions were just coming online.  I think it would be relatively straightforward to update the reader to use collective parallel calls from a parallel netCDF library.  Unfortunately, I have lost track of

 the status of the parallel netCDF libraries and file formats.  Last I looked, there were actually two parallel netCDF libraries and formats.  One version directly added collective parallel calls to the library.  The other changed the format to use hdf5 under

 the covers and relies on the parallel calls in hdf5.  These two libraries use different file formats, and I don&#39;t think they are compatible with each other.  Also, it might be the case for one or both libraries that you cannot read the data in parallel if it

 was not written in parallel or was written with an older version of netCDF.</div>
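<div>A minimal sketch of how the two kinds of on-disk format can be told apart, assuming the signature sits at byte 0 of the file: classic-format netCDF files begin with the bytes CDF followed by a version byte, while HDF5-based files begin with the 8-byte HDF5 signature. The function name sniff_format and the returned labels are made up for illustration.</div>

```python
# The 8-byte signature that starts an HDF5 file (and so a netCDF-4 file).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

# Version byte that follows the "CDF" magic in classic-format files.
CLASSIC_VERSIONS = {
    1: "netCDF classic (CDF-1)",
    2: "netCDF 64-bit offset (CDF-2)",
    5: "netCDF 64-bit data (CDF-5)",
}


def sniff_format(header):
    """Classify a file from its first bytes (pass at least 8 bytes)."""
    if header.startswith(HDF5_SIGNATURE):
        return "HDF5-based (netCDF-4)"
    if len(header) >= 4 and header[:3] == b"CDF":
        # Indexing bytes yields an int, which is the version number.
        return CLASSIC_VERSIONS.get(header[3], "unknown CDF version")
    return "not netCDF"


def sniff_file(path):
    """Read the first 8 bytes of a file on disk and classify it."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(8))


print(sniff_format(b"CDF\x02\x00\x00\x00\x00"))
print(sniff_format(HDF5_SIGNATURE))
```

<div>Knowing which family a file belongs to is only the first step; whether it can then be read in parallel still depends on the library, as noted above.</div>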

<div><br>

</div>

<div>-Ken</div>

<div>

<div><br>

</div>

</div>

</div>

</div>

<span>

<div style="border-right:medium none;padding-right:0in;padding-left:0in;padding-top:3pt;text-align:left;font-size:11pt;border-bottom:medium none;font-family:Calibri;border-top:#b5c4df 1pt solid;padding-bottom:0in;border-left:medium none">

<span style="font-weight:bold">From: </span>Andy Bauer &lt;<a href="mailto:andy.bauer@kitware.com" target="_blank">andy.bauer@kitware.com</a>&gt;<br>

<span style="font-weight:bold">Date: </span>Wednesday, February 6, 2013 10:38 AM<br>

<span style="font-weight:bold">To: </span>&quot;<a href="mailto:paraview@paraview.org" target="_blank">paraview@paraview.org</a>&quot; &lt;<a href="mailto:paraview@paraview.org" target="_blank">paraview@paraview.org</a>&gt;, Kenneth Moreland &lt;<a href="mailto:kmorel@sandia.gov" target="_blank">kmorel@sandia.gov</a>&gt;<br>

<span style="font-weight:bold">Subject: </span>[EXTERNAL] vtkNetCDFCFReader parallel performance<br>

</div>

<div>

<div>

<div><br>

</div>

<blockquote style="BORDER-LEFT:#b5c4df 5 solid;PADDING:0 0 0 5;MARGIN:0 0 0 5">

<div>

<div>Hi Ken,<br>

<br>

I&#39;m having some performance issues with a fairly large NetCDF file using the vtkNetCDFCFReader. Its dimensions are 768 lat, 1152 lon and 9855 time steps (no elevation dimension). It has one float variable with these dimensions -- pr(time, lat, lon). This

 results in a file of around 33 GB. I&#39;m running on hopper with small numbers of processes (at most 24, which is the number of cores per node), and the run time seems to increase dramatically as I add more processes. The tests I did read in the first 2 time steps

 and did nothing else. The results are below, but they weren&#39;t done too rigorously:<br>

<br>

numprocs -- time<br>

1  -- 1:22<br>

2 -- 1:52<br>

4 -- 7:52<br>

8 -- 5:34<br>

16 -- 10:46<br>

22 -- 10:37<br>

24 -- didn&#39;t complete on hopper&#39;s &quot;regular&quot; node with 32 GB of memory but I was able to run it in a reasonable amount of time on hopper&#39;s big memory nodes with 64 GB of memory.<br>

<br>
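The sizes quoted above can be checked with a little arithmetic, assuming the float variable is stored as 4-byte values with no compression or padding (an assumption; the file layout is not described here). The helper name variable_bytes is made up for this sketch.<br>

```python
# Dimensions quoted in the message; 4 bytes per value assumes a 32-bit float.
N_LAT, N_LON, N_TIME = 768, 1152, 9855
BYTES_PER_FLOAT = 4


def variable_bytes(n_time, n_lat, n_lon, item_size=BYTES_PER_FLOAT):
    """Raw size in bytes of a dense (time, lat, lon) variable."""
    return n_time * n_lat * n_lon * item_size


total = variable_bytes(N_TIME, N_LAT, N_LON)
one_step = variable_bytes(1, N_LAT, N_LON)

print(f"whole variable: {total} bytes ({total / 2**30:.1f} GiB)")
print(f"one time step:  {one_step} bytes ({one_step / 2**20:.3f} MiB)")
```

Under that assumption the whole variable is 34,876,293,120 bytes, which is consistent with a file of around 33 GB, and a single time step is 3,538,944 bytes, so reading the first 2 time steps touches only about 7 MB of the file.<br>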

I have the data in a reasonable place on hopper. I&#39;m still playing around with settings (things get a bit better if I set DVS_MAXNODES --

<a href="http://www.nersc.gov/users/computational-systems/hopper/performance-and-optimization/hopperdvs/" target="_blank">

http://www.nersc.gov/users/computational-systems/hopper/performance-and-optimization/hopperdvs/</a>) but this seems a bit weird as I&#39;m not having any problems like this on a data set that has spatial dimensions of 17*768*1152 with 324 time steps.<br>

<br>

Any quick thoughts on this? I&#39;m still investigating but was hoping you could point out if I&#39;m doing anything stupid.<br>

<br>

Thanks,<br>

Andy<br>

<br>

<br>

</div>

</div>

</blockquote>

</div>

</div>

</span></div>

</blockquote>

</div>

<br>

<br>


</div>

</div>


</blockquote>

<br>

</div>

</blockquote>

</div>

<br>

</blockquote>

<br>

</div>

</div>

</div>

</blockquote>

</div>

<br>

</div>

</div>

</blockquote>

</div></div></span>

</div>

</blockquote></div><br>