(0015083) Alan Scott 2009-02-17 16:33
Here is the e-mail thread. It is probably clearer than my earlier reports of this bug.
Sorry for the slow/minimal response, I'm out of the office... I'll double-check that I am using the "bulk" version of the exodus call that fetches the most possible information at once (I thought I already was). If that is still too slow, then we'll have to talk to Greg Sjaardema and the other Exodus library developers to see if there's anything to be done... it would be hard/impossible/(or at least unsightly) for the exodus reader to make netcdf calls directly.
David
OK, I have taken a look at this but the news is not terribly good.
Basically, the Exodus API does not provide a way to fetch information about all results variables defined over all the blocks at once (there is a method for writing concatenated block information, but not reading it). Even if it were to do so, NetCDF just isn't very clever about storing array dimensions: they are all indexed by strings instead of numbers and the strings are stored in what appears to be a linear array that is not sorted. I don't know that a lot can be done without changing the Exodus API at a minimum and more likely the NetCDF implementation.
I haven't thoroughly audited the NetCDF code, but assuming that the NC_dimarray struct isn't used directly in a ton of places, turning it into a hashtable could speed things up a good deal. This would not require an API change but would be a solid day or two of work for some poor soul. I don't have a day or two to do this right now but would be willing to work with someone to define the task a little more clearly.
David
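The hashtable idea for dimension lookups mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions: `dim_index`, `dim_entry`, and the `dim_index_*` functions are hypothetical names and are not part of NetCDF, and the sketch does not touch the real NC_dimarray struct. It shows a separately chained hash table mapping a dimension name to its id, so that a lookup (including a miss) no longer walks the whole list.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name -> dimension id index; NOT NetCDF's actual types. */
typedef struct dim_entry {
    const char *name;        /* dimension name, owned by the caller */
    int dimid;               /* id to return on a hit */
    struct dim_entry *next;  /* chain of entries sharing a bucket */
} dim_entry;

typedef struct {
    dim_entry **buckets;
    size_t nbuckets;
} dim_index;

/* 64-bit FNV-1a hash of a NUL-terminated string. */
uint64_t hash_name(const char *s)
{
    uint64_t h = 14695981039346656037ULL;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 1099511628211ULL;
    }
    return h;
}

dim_index *dim_index_create(size_t nbuckets)
{
    assert(nbuckets > 0);
    dim_index *idx = malloc(sizeof *idx);
    if (idx == NULL)
        return NULL;
    idx->buckets = calloc(nbuckets, sizeof *idx->buckets);
    if (idx->buckets == NULL) {
        free(idx);
        return NULL;
    }
    idx->nbuckets = nbuckets;
    return idx;
}

/* Returns 0 on success, -1 if memory could not be allocated. */
int dim_index_add(dim_index *idx, const char *name, int dimid)
{
    dim_entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    size_t b = (size_t)(hash_name(name) % idx->nbuckets);
    e->name = name;
    e->dimid = dimid;
    e->next = idx->buckets[b];
    idx->buckets[b] = e;
    return 0;
}

/* Returns the dimension id, or -1 if the name is not present. */
int dim_index_find(const dim_index *idx, const char *name)
{
    size_t b = (size_t)(hash_name(name) % idx->nbuckets);
    for (const dim_entry *e = idx->buckets[b]; e != NULL; e = e->next) {
        if (strcmp(e->name, name) == 0)
            return e->dimid;
    }
    return -1;
}

void dim_index_free(dim_index *idx)
{
    for (size_t b = 0; b < idx->nbuckets; ++b) {
        dim_entry *e = idx->buckets[b];
        while (e != NULL) {
            dim_entry *next = e->next;
            free(e);
            e = next;
        }
    }
    free(idx->buckets);
    free(idx);
}
```

With a bucket count on the order of the number of dimensions, a lookup for a missing name only inspects one short chain instead of every entry.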
> I think I have a bit of a handle on the performance problems of
> ParaView with a lot of blocks, and it isn't very pretty. I am looking
> specifically at ParaView's Exodus reader.
>
> There really are two places to focus:
> * At RequestInformation time
> * After hitting the Apply button.
>
> The following analysis has to do with the first bullet, when ParaView
> is reading in initial header information.
>
> My data is as follows:
> * single exodus file
> * About 25,000 blocks. (many.e has 1500)
> * About 60,000 attributes, nodes, side sets, face vars, etc.
>
> * The first problem occurs in
> vtkExodusIIReaderPrivate::RequestInformation(), in the line that reads
> for ( obj = 0; obj < nids; ++obj ) (about line 3746). nids == number
> of blocks. It calls into ex_get_block, which makes 5 calls into
> ncdimid(), which calls nc_inq_dimid(), which calls NC_finddim()
>
> NC_finddim() has the following loop in it, which runs completely
> through the list if it is looking for a variable that doesn't exist.
> It runs through this loop about 60,000 times.
>
> for(; (size_t) dimid < ncap->nelems
> && (strlen((*loc)->name->cp) != slen
> || strncmp((*loc)->name->cp, name, slen) != 0);
> dimid++, loc++)
> {
> /*EMPTY*/
> }
>
> So, we have an n^2 problem, with string compares at the bottom of the n^2.
>
> A few ideas that I have tried that seemed to help (but not enough) are:
> * in NC_finddim(), you are often looking for the next variable in the
> list, such as the following:
> - num_el_in_blk3
> - num_nod_per_el3
> - num_att_in_blk3
> * You are also sometimes looking for the same variable a second time.
> For instance, I am getting calls to ncdimid() from ex_get_block() and
> ex_get_attr_names(). These are both found inside of the for loop
> mentioned above in RequestInformation().
> * I put code in to remember where it had looked last, and try the next one.
> This obviously fails if the variable isn't in the list. This hack
> is as follows, in NC_finddim():
> if( (size_t) (dimLast + 1) < ncap->nelems
> && (strlen((*(loc+dimLast+1))->name->cp) == slen)
> && (strncmp((*(loc+dimLast+1))->name->cp, name, slen) == 0))
> {
> loc = loc+dimLast+1;
> dimid = dimLast+1;
> }
> else if( (size_t) dimLast < ncap->nelems
> && (strlen((*(loc+dimLast))->name->cp) == slen)
> && (strncmp((*(loc+dimLast))->name->cp, name, slen) == 0))
> {
> loc = loc+dimLast;
> dimid = dimLast;
> }
> else
> {
> for(; (size_t) dimid < ncap->nelems
> && (strlen((*loc)->name->cp) != slen
> || strncmp((*loc)->name->cp, name, slen) != 0);
> dimid++, loc++)
> {
> /*EMPTY*/
> }
> }
>
> if((size_t)dimid >= ncap->nelems)
> return(-1); /* not found */
>
> dimLast = dimid;
>
>
> * Next, I tried to not do work that isn't needed. In
> RequestInformation (vtkExodusIIReader.cxx, about line 3740), the
> function call looks like
> this:
> VTK_EXO_FUNC( ex_get_block( exoid, obj_types[i], ids[obj],
> obj_typenames[obj],
> &binfo.Size, &binfo.BdsPerEntry[0], &binfo.BdsPerEntry[1],
> &binfo.BdsPerEntry[2], &binfo.AttributesPerEntry ),
> "Could not read block params." );
>
> * Since we know beforehand whether BdsPerEntry[0]
> (ModelParameters.num_nodes), BdsPerEntry[1]
> (ModelParameters.num_edge) and BdsPerEntry[2]
> (ModelParameters.num_face) are needed, send 0 into these functions if we have no data. Thus, the underlying code isn't hunting for them.
>
>
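The idea of not asking for data we know is absent can be sketched as below. This is a toy model, not the Exodus API: `get_block_params`, `find_dim`, the name table, and `lookup_count` are all hypothetical and exist only to show that passing a null output pointer lets the callee skip the corresponding name search.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the dimension names stored in a file header. */
static const char *const dim_names[] = { "num_el_in_blk1", "num_nod_per_el1" };
static const int dim_sizes[] = { 100, 8 };
static const size_t n_dims = sizeof dim_names / sizeof dim_names[0];

int lookup_count = 0; /* number of name searches performed so far */

/* Linear search by name, like the loop quoted above; returns index or -1. */
int find_dim(const char *name)
{
    assert(name != NULL);
    ++lookup_count;
    for (size_t i = 0; i < n_dims; ++i) {
        if (strcmp(dim_names[i], name) == 0)
            return (int)i;
    }
    return -1;
}

/* Fetch one value only if the caller supplied somewhere to put it. */
static void fetch(const char *name, int *out)
{
    if (out == NULL)
        return;                 /* caller does not need it: no search */
    int id = find_dim(name);
    *out = (id >= 0) ? dim_sizes[id] : 0;
}

/* Hypothetical reader: any output pointer may be NULL to skip that lookup. */
void get_block_params(int *n_entries, int *nodes_per_entry,
                      int *edges_per_entry, int *faces_per_entry)
{
    fetch("num_el_in_blk1", n_entries);
    fetch("num_nod_per_el1", nodes_per_entry);
    fetch("num_edg_per_el1", edges_per_entry);
    fetch("num_fac_per_el1", faces_per_entry);
}
```

A caller that already knows the model has no edges or faces would pass NULL for the last two arguments, which halves the number of searches per block in this toy model.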
>>>> In my opinion, what really needs to be done is fourfold: 1) not ask
>>>> for information that we don't need, 2) figure out if we need a
>>>> variable type once, rather than once per block, 3) figure out a non
>>>> n^2 method of searching the exodus header (a hash table or sorted
>>>> list come to mind), and 4) not compare strings that don't have
>>>> fixed lengths. Each such compare calls strlen, which is just a for() loop across each character.
>
>
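Points 3 and 4 above (a sorted list, and avoiding repeated strlen calls) might look something like the following sketch. The `named_dim` struct and the two functions are hypothetical and not taken from NetCDF or Exodus; the assumption is that each name's length is computed once when the table is built, the table is sorted once, and lookups then use the standard `bsearch`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table entry; the length is cached when the entry is built. */
typedef struct {
    const char *name;
    size_t len;    /* strlen(name), stored so lookups never recompute it */
    int dimid;
} named_dim;

/* Order by length first, then by bytes. The integer compare rejects most
 * mismatches before any characters are examined. */
static int cmp_named_dim(const void *a, const void *b)
{
    const named_dim *x = a;
    const named_dim *y = b;
    if (x->len != y->len)
        return (x->len < y->len) ? -1 : 1;
    return memcmp(x->name, y->name, x->len);
}

/* Sort the table once, e.g. right after the header has been read. */
void sort_dims(named_dim *dims, size_t n)
{
    assert(dims != NULL || n == 0);
    qsort(dims, n, sizeof *dims, cmp_named_dim);
}

/* Binary search: O(log n) per lookup, one strlen on the query only.
 * Returns the dimension id, or -1 if the name is not present. */
int find_dim_sorted(const named_dim *dims, size_t n, const char *name)
{
    named_dim key = { name, strlen(name), -1 };
    const named_dim *hit = bsearch(&key, dims, n, sizeof *dims, cmp_named_dim);
    return hit ? hit->dimid : -1;
}
```

The sort costs O(n log n) once; after that, a lookup for a name that does not exist takes a handful of comparisons rather than a full pass over the list.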
> * As for what happens after you hit the Apply button, it takes LOTS of
> time, at least some of which is in NC_findvar (var.c), once again
> probably in an n^2 algorithm. It then starts to grow memory until it
> goes over 2 GB of memory used and dies. I can look into this further
> if you like. Realize that it is slow just getting to this point!
>
>
> Let me know if this is what you were looking for,
>
> Alan
>
>

(0019455) Utkarsh Ayachit 2010-02-04 14:53
PERF: Trying to improve the performance of vtkPVGeometryFilter when dealing with
a large number of blocks. This includes the following fixes:
* Restructure vtkAppendPolyData and vtkDataSetSurfaceFilter so that they can be
called directly as internal implementations rather than using the pipeline.
This avoids garbage collection related issues since it avoids reference loops
between the trivial producer and its data-object.
* vtkPVGeometryFilter now resolves partial arrays after surface extraction. This
was a no-brainer. Geometry is always smaller than the input, why in the world
would we fill arrays in the input?
* Removed unnecessary shallow copies, all of those lead to garbage collection
related slow downs.
* vtkPVGeometryFilter directly calls vtkAppendPolyData and
vtkDataSetSurfaceFilter using them as implementations rather than algorithms
to avoid pipeline related slow downs.
/cvsroot/ParaView3/ParaView3/VTK/Graphics/vtkAppendPolyData.h,v <-- VTK/Graphics/vtkAppendPolyData.h
new revision: 1.61; previous revision: 1.60
/cvsroot/ParaView3/ParaView3/VTK/Graphics/vtkAppendPolyData.cxx,v <-- VTK/Graphics/vtkAppendPolyData.cxx
new revision: 1.108; previous revision: 1.107
/cvsroot/ParaView3/ParaView3/VTK/Graphics/vtkDataSetSurfaceFilter.h,v <-- VTK/Graphics/vtkDataSetSurfaceFilter.h
new revision: 1.29; previous revision: 1.28
/cvsroot/ParaView3/ParaView3/VTK/Graphics/vtkDataSetSurfaceFilter.cxx,v <-- VTK/Graphics/vtkDataSetSurfaceFilter.cxx
new revision: 1.73; previous revision: 1.72
/cvsroot/ParaView3/ParaView3/Servers/Filters/vtkPVGeometryFilter.h,v <-- Servers/Filters/vtkPVGeometryFilter.h
new revision: 1.48; previous revision: 1.47
/cvsroot/ParaView3/ParaView3/Servers/Filters/vtkPVGeometryFilter.cxx,v <-- Servers/Filters/vtkPVGeometryFilter.cxx
new revision: 1.99; previous revision: 1.98