[Paraview] Parallel file formats (again)... (UNCLASSIFIED)

Fri Jun 13 16:19:08 EDT 2008

You are right: write 1 XML file that references all HDF5 files. For example, process 0 writes 1 XML file with 4096 grids, each describing the HDF5 file from one process.
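
A minimal sketch of what process 0 could write for one iteration, in Python only for brevity (the same text can be produced with plain Fortran WRITE statements). The file names follow the Demo_IIIII.PP.h5 pattern from the quoted message. Everything else is an assumption for illustration: the function name build_collection, the dataset paths /Connectivity and /XYZ, the tetrahedral topology, and the per-process cell and node counts. Substitute whatever your HDF5 files actually contain.

```python
import xml.etree.ElementTree as ET


def build_collection(iteration, cells_per_proc, nodes_per_proc):
    """Return XDMF text with one spatial collection holding one grid per process.

    Hypothetical helper: dataset paths and topology type are placeholders.
    """
    xdmf = ET.Element("Xdmf", Version="2.0")
    domain = ET.SubElement(xdmf, "Domain")
    collection = ET.SubElement(
        domain, "Grid",
        Name=f"Iteration {iteration}",
        GridType="Collection",
        CollectionType="Spatial",
    )
    for rank, (n_cells, n_nodes) in enumerate(zip(cells_per_proc, nodes_per_proc)):
        # One HDF5 file per process, named as in the quoted example.
        h5_name = f"Demo_{iteration:05d}.{rank:02d}.h5"
        grid = ET.SubElement(
            collection, "Grid",
            Name=f"Grid from node {rank}", GridType="Uniform",
        )
        topology = ET.SubElement(
            grid, "Topology",
            TopologyType="Tetrahedron", NumberOfElements=str(n_cells),
        )
        conn = ET.SubElement(
            topology, "DataItem",
            Format="HDF", NumberType="Int", Dimensions=f"{n_cells} 4",
        )
        conn.text = f"{h5_name}:/Connectivity"  # assumed dataset path
        geometry = ET.SubElement(grid, "Geometry", GeometryType="XYZ")
        xyz = ET.SubElement(
            geometry, "DataItem",
            Format="HDF", NumberType="Float", Precision="8",
            Dimensions=f"{n_nodes} 3",
        )
        xyz.text = f"{h5_name}:/XYZ"  # assumed dataset path
    return ET.tostring(xdmf, encoding="unicode")


if __name__ == "__main__":
    # Two processes with made-up cell and node counts.
    print(build_collection(1, [100, 120], [40, 45]))
```

Only process 0 needs to run this, once it knows each process's cell and node counts (for example after a gather). The other processes just write their own HDF5 files.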

Kent
Pittsburgh Supercomputing Center

Renato N. Elias wrote:
> Hi Jerry,
> 
> As I described in my last email, I managed to compile and link the 
> Fortran example supplied with the XDMF library (in fact, I changed the 
> example a little by writing an explicit Fortran interface to call the 
> extern "C" routine name). It gave me 10 XDMF and 10 HDF5 files, each 
> XDMF file pointing to its corresponding HDF5 file. OK, everything is 
> easy and running. Now I'm guessing, naively, that I could just 
> follow the same example for a parallel run where each process, 
> having its own piece of the domain, would write its data as a pair of 
> XDMF/HDF5 files. Thus, for 2 processes, I'd have something like:
> 
> Iteration 1
> Demo_00001.00.xmf  --> Demo_00001.00.h5
> Demo_00001.01.xmf --> Demo_00001.01.h5
> 
> Iteration 2
> Demo_00002.00.xmf  --> Demo_00002.00.h5
> Demo_00002.01.xmf --> Demo_00002.01.h5
> ...
> and so on
> 
> Maybe all 20 XDMF files in this example could be replaced by just one 
> containing the information for all iterations and processes, but I 
> haven't found this explained on the XDMF wiki.
> 
> Do you have a simple example of a parallel XDMF file? Moreover, is 
> ParaView able to load transient parallel files in XDMF?
> 
> thanks
> 
> Renato.
> 
> 
> Clarke, Jerry (Civ, ARL/CISD) wrote:
>> Classification:  UNCLASSIFIED Caveats: NONE
>>
>> Renato,
>>
>> www.arl.hpc.mil/ice doesn't exist anymore.
>>
>> Let me know what you are trying to do and I can provide an example.
>> If you're trying to work in parallel, you probably want to use
>> <Grid GridType="Collection" CollectionType="Spatial" ...>
>>     <Grid Name="Grid from node 0" ...> ... </Grid>
>>     <Grid Name="Grid from node 1" ...> ... </Grid>
>>     etc.
>> </Grid>
>>
>> For arrays that do not start at 0, there is an Offset="1" option in the XML.
>>
>> As for the row major / column major arrays, I link C++ to Fortran, but
>> we have been talking about putting a transpose method in the DataItem
>> object.
>>
>>
>> Jerry Clarke
>>
>> -----Original Message-----
>> From: paraview-bounces at paraview.org
>> [mailto:paraview-bounces at paraview.org] On Behalf Of Renato N. Elias
>> Sent: Friday, June 13, 2008 1:55 PM
>> To: Dominik Szczerba
>> Cc: paraview at paraview.org
>> Subject: Re: [Paraview] Parallel file formats (again)...
>>
>>
>> It seems that XDMF is the only way to go if we'd like to get parallel
>> data into ParaView using HDF5. No problem, since XDMF can be seen as an
>> HDF5 extension. In fact, I thought about doing exactly what you cited:
>> use the HDF5 API and write simple XML/XDMF files from Fortran, no
>> matter whether my data is heavy or light.
>>
>> The problem with XDMF is the lack of information. The first link that
>> Google returns is always down (www.arl.hpc.mil/ice/) and the "official"
>> wiki site doesn't offer much. I already have Fortran talking to XDMF
>> (we can also write interfaces on the Fortran side to call the C-like
>> routine names), but now I'd like to go a bit further and use
>> parallelism, and there are no examples covering the subject in
>> Fortran. My only option is to debug the C++ examples and try to map
>> them to Fortran.
>>
>> Regarding row-major order and non 0 started arrays I can't say anything.
>>
>> I'll only say that, for Fortran programmers, it's getting a bit harder
>> to work without having to deal with C++ and all that OOP stuff. In this
>> sense, I love Metis: so powerful, so easy, so fast, so simple, and
>> everything written in ANSI C. It takes only minor effort to get it
>> working with Fortran. As we say in Brazil, sometimes people like to
>> kill cockroaches using bazookas instead of flip-flops... for writing
>> files, *my guess* is that straight C would do the job nicely.
>>
>> Dominik, the problem with Fortran is that everybody associates it with
>> 77 (just that old programming language). Maybe, they should change the
>> name of the language from Fortran 2003 to F++ ;oP
>>
>> Renato.
>>
>> Dominik Szczerba wrote:
>>  
>>> And how would he handle hard-coded row-major ordering in XDMF?
>>> -- Dominik
>>>
>>> Chris Kees wrote:
>>>    
>>>> You might want to reconsider XDMF or something based on it. I'm not 
>>>> sure that XDMF is significantly harder to implement in Fortran than 
>>>> straight HDF5. It's just a matter of doing some additional text I/O 
>>>> on a relatively simple XML file. XDMF splits the data (with some 
>>>> redundancy) into light/meta data stored as a simple XML (ASCII) file 
>>>> and an HDF5 archive of the "heavy" data. You can read and write the 
>>>> XML file directly from Fortran without using the XDMF library and 
>>>> then use the HDF5 Fortran API directly to write the heavy data. You 
>>>> have the option of storing the heavy data in the XML file as text 
>>>> when HDF5 isn't available (or when debugging/running on small 
>>>> data). To me it looks like the posts you cite are pointing in this 
>>>> direction, though they were unhappy with some aspects of XDMF. It's 
>>>> not clear to me whether it's the XDMF XML format, the documentation 
>>>> of that format, or the C API that needs work in order to make it more 
>>>> useful.
>>>> Also, it sounds like you've already decided against a mixed-language 
>>>> approach, but the book by H. P. Langtangen, "Python Scripting for 
>>>> Computational Science", advocates a Fortran/Python pairing to deal 
>>>> with some of your general concerns.
>>>> Chris
>>>> On Jun 13, 2008, at 7:58 AM, Renato N. Elias wrote:
>>>>
>>>>      
>>>>> Can anyone shed some light on the status of support for 
>>>>> parallel file formats in ParaView?
>>>>>
>>>>> In my lab most of the students still work with Fortran. It seems 
>>>>> that "the universe nowadays only speaks C++ (and Python for 
>>>>> scripting)", which forces us to evaluate extensively which good, 
>>>>> well-supported parallel file format to invest in before struggling 
>>>>> with all the mixed-language interfacing/wrapping annoyances (not 
>>>>> everybody working with programs is a programmer; there are still 
>>>>> engineers, civil, mechanical, chemical, etc., doing science...).
>>>>>
>>>>> I could say that my concerns about choosing a file format to 
>>>>> stick with are:
>>>>>
>>>>> -- Ease of installation and use (in this sense, Ensight is 
>>>>> wonderful since we don't need extra libraries. It's insane when we 
>>>>> need to compile 50 MB of libraries to link with a 2 MB program that 
>>>>> uses just one routine of such a library);
>>>>> -- Ease of interfacing (most libraries nowadays are written in C++ 
>>>>> for C++ programmers, which discourages their use by C and Fortran 
>>>>> programs. OK, we can always spend some time interfacing with them, 
>>>>> but a library should offer more functionality and flexibility 
>>>>> than annoyances);
>>>>> -- Portability.
>>>>>
>>>>> Some time ago there were some interesting posts from Jean Favre and 
>>>>> Dominik about this, which give us an overview of the subject.
>>>>>
>>>>> http://www.paraview.org/pipermail/paraview/2008-May/008070.html
>>>>> http://www.paraview.org/pipermail/paraview/2008-May/008071.html
>>>>>
>>>>> My 2 cents for the discussion, *from a Fortran perspective*, is:
>>>>>
>>>>> 1). ENSIGHT:
>>>>> 1.1. Quite simple to implement and use (no need for extra libraries 
>>>>> and all that stuff. Just a few Fortran statements do the job);
>>>>> 1.2. Implicit support for transient data and parallelism;
>>>>> 1.3. Depending on the number of processes we might have a huge 
>>>>> number of small/medium files since each point and cell data variable 
>>>>> is stored in one file (sometimes it can be a serious problem);
>>>>> 1.4. Not compressed (too bad);
>>>>> 1.5. Not so well supported *as a parallel format* by ParaView yet. 
>>>>> After the change to deal (after PV 2.2.1) with multigroup datasets 
>>>>> some functionalities were lost until reimplementation.
>>>>> 1.6. Supported by ParaView, Visit and Ensight (of course)
>>>>>
>>>>> 2). XML/VTK:
>>>>> 2.1. Almost impossible for a Fortran user to implement, so we're 
>>>>> forced to interface with VTK in order to write anything;
>>>>> 2.2. Time series support has been introduced in some sense ;o)
>>>>> 2.3. It's a bit complicated to understand. OK, it's XML and we 
>>>>> should use it (and believe in it ;o) ) through some library, so it's 
>>>>> not meant for "hand implementation";
>>>>> 2.4. Encoding/compression is supported (which is really good)
>>>>> 2.5. It should be the best-supported parallel file format in 
>>>>> ParaView (after EXODUS, maybe)
>>>>> 2.6. Only supported by VTK-based software (ParaView, Visit, MayaVi)
>>>>>
>>>>> 3). XDMF/HDF5:
>>>>> 3.1. Same as 2.1, 2.2 and 2.3
>>>>> 3.2. The website describing the library is a bit down lately...
>>>>> 3.3. HDF5 seems a very promising file format. Its developers show 
>>>>> some concern for its use from other scientific languages, 
>>>>> besides it being flexible, compressed, cross-platform, etc.
>>>>> 3.4. To my knowledge, XDMF is also supported by Ensight, ParaView 
>>>>> and Visit --> not sure how good that support is.
>>>>>
>>>>> 4). EXODUS II:
>>>>> 4.1. Same as 2.1 --> I have tried more than once to find 
>>>>> something about the Exodus format. There's good documentation on 
>>>>> the SANDIA/SEACAS page but the library is not open source (it's a 
>>>>> license-based distribution), which makes it a bit complicated to 
>>>>> adopt;
>>>>> 4.2. Nothing to say about time and compression support since 
>>>>> I never used it;
>>>>> 4.3. It must be well supported by PV since it's a 
>>>>> Sandia format;
>>>>>
>>>>> regards
>>>>>
>>>>> Renato.