[Paraview] [EXTERNAL] vtkNetCDFCFReader parallel performance

Andy Bauer andy.bauer at kitware.com
Thu Feb 21 17:03:33 EST 2013


I agree about the dependencies. I'm a bit worried about having to deal with
both a vtkPNetCDFReader and a vtkPNetCDFCFReader, but for parallel
performance on climate data sets I think it's important enough to update it.
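
For reference, here is roughly what such a subclass might look like -- just a
sketch, nothing beyond vtkNetCDFCFReader itself exists yet, and the controller
handling is only an illustration:

#include "vtkNetCDFCFReader.h"

class vtkMultiProcessController;

// Hypothetical parallel-aware subclass; the class name comes from Ken's
// suggestion below, everything else here is illustrative.
class vtkPNetCDFCFReader : public vtkNetCDFCFReader
{
public:
  static vtkPNetCDFCFReader* New();
  vtkTypeMacro(vtkPNetCDFCFReader, vtkNetCDFCFReader);

  // Controller used for the rank-0 metadata read and broadcast.
  virtual void SetController(vtkMultiProcessController* controller);
  vtkGetObjectMacro(Controller, vtkMultiProcessController);

protected:
  vtkPNetCDFCFReader();
  ~vtkPNetCDFCFReader();

  vtkMultiProcessController* Controller;

private:
  vtkPNetCDFCFReader(const vtkPNetCDFCFReader&);  // Not implemented.
  void operator=(const vtkPNetCDFCFReader&);      // Not implemented.
};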

Thanks,
Andy

On Thu, Feb 21, 2013 at 4:46 PM, Moreland, Kenneth <kmorel at sandia.gov> wrote:

>   Both of these sound fine to me.  The only caveat is that (1) is a
> parallel-specific improvement that requires communication.  I don't
> remember the dependencies of vtkNetCDFCFReader or the library it is in, but
> I would hesitate to change the dependencies.  It might be necessary to make
> a vtkPNetCDFCFReader subclass.
>
>  -Ken
>
>   From: Andy Bauer <andy.bauer at kitware.com>
> Date: Thursday, February 21, 2013 12:38 PM
> To: Burlen Loring <bloring at lbl.gov>
> Cc: Kenneth Moreland <kmorel at sandia.gov>, "paraview at paraview.org" <
> paraview at paraview.org>
> Subject: Re: [Paraview] [EXTERNAL] vtkNetCDFCFReader parallel performance
>
>   I've been looking at the code and there are a bunch of small reads in
> order to get all of the metadata and set things up according to the CF
> conventions. I'm going through it now, making process 0 do this work and
> broadcast the results. I'm getting decent speedups from this: for 240
> processes on Hopper, the runs go from taking about 58 seconds down to
> about 38 seconds.
>
> Any objections to some significant refactoring of the reader? The 2 things
> I want to try are:
>
> 1) read the metadata on process 0 and broadcast it to the other processes
> (see the rough sketch after this list)
>
> 2) reduce the number of file opens and closes in the reader, at the cost of
> keeping the file descriptor open between requests.
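>
> Roughly what I have in mind for (1), just as an untested sketch -- the real
> metadata is a lot more involved than two integers, and I'm assuming the
> stream-based Broadcast() on vtkMultiProcessController plus the reader's
> FileName ivar:
>
> #include "vtkMultiProcessController.h"
> #include "vtkMultiProcessStream.h"
> #include <netcdf.h>
>
> // Only rank 0 touches the file for metadata; everyone else gets the
> // information over a broadcast.  The packing here is a placeholder.
> vtkMultiProcessController* controller =
>   vtkMultiProcessController::GetGlobalController();
> vtkMultiProcessStream metadata;
> if (controller->GetLocalProcessId() == 0)
>   {
>   int ncFD, numDims, numVars;
>   nc_open(this->FileName, NC_NOWRITE, &ncFD);
>   nc_inq(ncFD, &numDims, &numVars, NULL, NULL);
>   metadata << numDims << numVars;
>   // ... pack dimension sizes, variable names, time values, etc. ...
>   nc_close(ncFD);
>   }
> controller->Broadcast(metadata, 0);
> if (controller->GetLocalProcessId() != 0)
>   {
>   int numDims, numVars;
>   metadata >> numDims >> numVars;
>   // ... unpack and populate the same structures rank 0 built ...
>   }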
>
> Thanks,
> Andy
>
> On Thu, Feb 7, 2013 at 4:33 PM, Burlen Loring <bloring at lbl.gov> wrote:
>
>>  Hi Andy,
>>
>> Data that small should be fairly fast, and NERSC's global scratch
>> shouldn't blink when 24 procs access a file in read-only mode. Maybe PV is
>> reading all the data on a single process (or worse, all of them) and then
>> doing a redistribution behind the scenes? That would certainly explain your
>> results. Either way, good luck.
>>
>> Burlen
>>
>>
>> On 02/07/2013 09:51 AM, Andy Bauer wrote:
>>
>> Hi Burlen,
>>
>> I got the data from a different user and that's where he put it. I
>> thought about copying it to $SCRATCH. It just seemed really funky, though,
>> that reading data that is under 4 MB for a single time step is this slow
>> when I only have 24 processes asking for data. I was also thinking that
>> using the scratch space would just be covering up a deeper problem, since I
>> want to scale up to many more than 24 processes. After all, any run that
>> can't scale beyond 24 processes shouldn't be running on Hopper anyway!
>>
>> Andy
>>
>> On Thu, Feb 7, 2013 at 11:57 AM, Burlen Loring <bloring at lbl.gov> wrote:
>>
>>>  Hi Andy,
>>>
>>> Do you have a strong reason for using the global scratch fs? If not, you
>>> may have better luck using Hopper's dedicated Lustre scratch. The spec
>>> quotes more than 2x the bandwidth [*]. In reality I'm sure it depends on the
>>> number of users hammering it at the time in question. It may help to use the
>>> Lustre scratch while you're working on parallelizing the netCDF readers.
>>>
>>> Burlen
>>>
>>> *
>>> http://www.nersc.gov/users/computational-systems/hopper/file-storage-and-i-o/
>>>
>>>
>>> On 02/06/2013 03:35 PM, Andy Bauer wrote:
>>>
>>>  Hi Ken,
>>>
>>> I think it's more than just a file contention issue. On Hopper at NERSC I
>>> did set DVS_MAXNODES to 14 and that helped out a lot. Even without that
>>> set, I was able to run with 480 processes accessing the same data file for
>>> the 17*768*1152 data set with 324 time steps, but with the "bad" one
>>> (768*1152 with 9855 time steps) I had problems with just 24 processes.
>>>
>>> I have some things I want to try out, but I think you're right that
>>> using a parallel netCDF library should help a lot, if it doesn't cause
>>> conflicts.
>>>
>>> Thanks,
>>> Andy
>>>
>>> On Wed, Feb 6, 2013 at 5:20 PM, Moreland, Kenneth <kmorel at sandia.gov> wrote:
>>>
>>>>   This does not surprise me.  The current version of the netCDF reader
>>>> only uses the basic interface for accessing files, which is essentially a
>>>> serial interface.  You are probably getting a lot of file request
>>>> contention.
>>>>
>>>>  At the time I wrote the netCDF reader, parallel versions were just
>>>> coming online.  I think it would be relatively straightforward to update
>>>> the reader to use collective parallel calls from a parallel netCDF library.
>>>>  Unfortunately, I have lost track of the status of the parallel netCDF
>>>> libraries and file formats.  Last I looked, there were actually two parallel
>>>> netCDF libraries and formats.  One version directly added collective
>>>> parallel calls to the library.  The other changed the format to use HDF5
>>>> under the covers and uses the parallel calls therein.  These two libraries
>>>> use different formats for the files, and I don't think they are compatible
>>>> with each other.  Also, it might be the case for one or both libraries that
>>>> you cannot read the data in parallel if it was not written in parallel or
>>>> was written with an older version of netCDF.
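>>>>
>>>>  As I recall, the collective usage pattern for the HDF5-backed flavor
>>>> looks roughly like this -- from memory, so treat it as a sketch (the
>>>> decomposition values and buffer handling are placeholders, "pr" is the
>>>> variable from your file, and the file has to be netCDF-4/HDF5 for the
>>>> parallel open to work):
>>>>
>>>> #include <mpi.h>
>>>> #include <netcdf.h>
>>>> #include <netcdf_par.h>
>>>>
>>>> // Every rank opens the file collectively on a communicator and then
>>>> // reads its own hyperslab of pr(time, lat, lon) with collective access.
>>>> void ReadMySlab(const char* fileName, size_t firstLat, size_t numLats,
>>>>                 size_t numLon, float* buffer)
>>>> {
>>>>   int ncid, varid;
>>>>   nc_open_par(fileName, NC_NOWRITE | NC_MPIIO,
>>>>               MPI_COMM_WORLD, MPI_INFO_NULL, &ncid);
>>>>   nc_inq_varid(ncid, "pr", &varid);
>>>>   nc_var_par_access(ncid, varid, NC_COLLECTIVE);
>>>>
>>>>   size_t start[3] = { 0, firstLat, 0 };      // first time step only
>>>>   size_t count[3] = { 1, numLats, numLon };  // this rank's lat band
>>>>   nc_get_vara_float(ncid, varid, start, count, buffer);
>>>>   nc_close(ncid);
>>>> }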
>>>>
>>>>  -Ken
>>>>
>>>>    From: Andy Bauer <andy.bauer at kitware.com>
>>>> Date: Wednesday, February 6, 2013 10:38 AM
>>>> To: "paraview at paraview.org" <paraview at paraview.org>, Kenneth Moreland <
>>>> kmorel at sandia.gov>
>>>> Subject: [EXTERNAL] vtkNetCDFCFReader parallel performance
>>>>
>>>>   Hi Ken,
>>>>
>>>> I'm having some performance issues with a fairly large netCDF file
>>>> using the vtkNetCDFCFReader. Its dimensions are 768 lat, 1152 lon, and
>>>> 9855 time steps (no elevation dimension). It has one float variable with
>>>> these dimensions -- pr(time, lat, lon). This results in a file of around 33
>>>> GB. I'm running on Hopper with small numbers of processes (at most 24,
>>>> which is the number of cores per node), and the run time seems to increase
>>>> dramatically as I add more processes. The tests just read in the first 2
>>>> time steps and did nothing else. The results are below but weren't gathered
>>>> too rigorously:
>>>>
>>>> numprocs -- time
>>>> 1  -- 1:22
>>>> 2 -- 1:52
>>>> 4 -- 7:52
>>>> 8 -- 5:34
>>>> 16 -- 10:46
>>>> 22 -- 10:37
>>>> 24 -- didn't complete on Hopper's "regular" nodes with 32 GB of memory,
>>>> but I was able to run it in a reasonable amount of time on Hopper's
>>>> big-memory nodes with 64 GB of memory.
>>>>
>>>> I have the data in a reasonable place on Hopper. I'm still playing
>>>> around with settings (things get a bit better if I set DVS_MAXNODES --
>>>> http://www.nersc.gov/users/computational-systems/hopper/performance-and-optimization/hopperdvs/),
>>>> but this seems a bit weird since I'm not having any problems like this on a
>>>> data set that has spatial dimensions of 17*768*1152 with 324 time steps.
>>>>
>>>> Any quick thoughts on this? I'm still investigating but was hoping you
>>>> could point out if I'm doing anything stupid.
>>>>
>>>> Thanks,
>>>> Andy
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>

