View Issue Details
ID: 0012720
Project: ParaView
Category: (No Category)
View Status: public
Date Submitted: 2011-11-10 21:55
Last Update: 2012-02-08 17:22
Reporter: Alan Scott
Assigned To: Utkarsh Ayachit
Priority: urgent
Severity: minor
Reproducibility: have not tried
Status: closed
Resolution: fixed
Platform:
OS:
OS Version:
Product Version: 3.12
Target Version:
Fixed in Version: 3.14
Summary: 0012720: CTH reads file 0 for all processes
Description: We suspect that large Cray clusters serialize access to a single file when multiple pvservers try to access it at the same time. As we scale into the thousands of pvservers, we believe this is becoming fatal.

ParaView 3.12.0, remote server (I am using 8 processes), Linux client.
Although I am sure you can replicate with any cth dataset, I am doing the following:
* Make soft links (ln -s) to files spcta.0, spcta.1, spcta.2 and spcta.3 of Dave's big CTH AMR dataset (i.e., 256 files). Now, we have a 4 file subset of this dataset.
* strace -o $HOME/pvserver.strace -tt -f -ff -e trace=open,close,read,write
  - This will create a different file for each process. Run ls -ls on these files. The smaller ones are not of interest; the larger ones are from lib/paraview3.12/pvserver. We care about the larger ones.
  - Note that 4 of them are slightly larger than the others. These are the files we care about.

Open each file in turn. Search for spcth. Notice that each file opens file 0 4 times, and then opens its real file 2 times.
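
For anyone repeating this check, the counting can be scripted instead of done by hand. The following is a minimal sketch, not part of the original report: it assumes the usual strace line layout of `timestamp open("path", flags) = fd`, and the function name `count_opens`, the default search string `"spct"`, and the sample lines are all made up for illustration (they are not taken from the actual logs).

```python
import re
from collections import Counter

# Matches the path argument of an open() call in an strace log line, e.g.
#   12:00:00.000001 open("/data/spcta.0", O_RDONLY) = 7
# Note: this deliberately does not match openat(), only open().
OPEN_RE = re.compile(r'\bopen\("([^"]+)"')


def count_opens(lines, needle="spct"):
    """Count open() calls per path, keeping only paths that contain `needle`."""
    counts = Counter()
    for line in lines:
        match = OPEN_RE.search(line)
        if match and needle in match.group(1):
            counts[match.group(1)] += 1
    return counts


# Illustrative lines only; hypothetical paths and timestamps.
sample = [
    '12:00:00.000001 open("/data/spcta.0", O_RDONLY) = 7',
    '12:00:00.000002 read(7, "...", 4096) = 4096',
    '12:00:00.000003 open("/data/spcta.0", O_RDONLY) = 7',
    '12:00:00.000004 open("/data/spcta.2", O_RDONLY) = 8',
    '12:00:00.000005 open("/etc/ld.so.cache", O_RDONLY) = 3',
]
counts = count_opens(sample)
for path, n in sorted(counts.items()):
    print(path, n)
```

To run it on a real per-process log, pass an open file object, for example `count_opens(open(path))`. A process that opens file 0 more often than its own file shows up directly in the counts.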

As stated, we believe that these 4 opens of file 0 are fatal for Cielo and possibly other Cray systems.

This is a show stopper bug for Cielo going into production with expected size datasets.

I will send the log files to Utkarsh and Robert from my run. I am marking this as a crash, although technically it is a hang (or a glacier - take your pick).
Tags: No tags attached.
Project: Sandia
Topic Name: 12720_cth_reads_too_much
Type: crash
Attached Files:

 Relationships
parent of 0012729 (closed, Utkarsh Ayachit): vtkFileSeriesReader's MTime is being changed in ProcessRequest for several readers.

  Notes
(0027690)
Utkarsh Ayachit (administrator)
2011-11-14 17:20

commit 1c9d8ffd920503167e80bbbb457112aa268bfe64
Author: Utkarsh Ayachit <utkarsh.ayachit@kitware.com>
Date: Mon Nov 14 17:14:19 2011 -0500

    Fixed BUG 0012720. Minimize reads on satellites.
    
    All processes were reading first file to gather meta-data. This caused issues
    when running in parallel on large number of cores. Fixed by reading the file on
    root node and then broadcasting the gathered information to all nodes.
    
    Structured the code slightly to avoid processing of the meta-data when timesteps
    changed.
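
The pattern described in the commit message above (read on the root node, then broadcast) can be sketched as follows. This is not the code from the commit, which lives in ParaView's C++ reader. It is a hedged Python illustration written against the `rank` / `bcast(obj, root=...)` shape that mpi4py communicators expose. `load_metadata`, `SimComm`, `read_header`, and the placeholder metadata are hypothetical names introduced here, and `SimComm` is a single-process stand-in that only works because the simulated ranks run in order, root first.

```python
def load_metadata(comm, read_header, root=0):
    """Read metadata on `root` only, then broadcast it to every rank."""
    meta = read_header() if comm.rank == root else None
    return comm.bcast(meta, root=root)


class SimComm:
    """Single-process stand-in for an MPI communicator (hypothetical).

    Ranks are simulated one after another, so the root must run first.
    """

    def __init__(self, rank, shared):
        self.rank = rank
        self._shared = shared  # dict shared by all simulated ranks

    def bcast(self, obj, root=0):
        # The root publishes its object; every rank returns the published one.
        if self.rank == root:
            self._shared["value"] = obj
        return self._shared["value"]


reads = []  # records every time the first file is actually read


def read_header():
    """Pretend to read the meta-data from file 0 (placeholder content)."""
    reads.append("file 0")
    return {"header": "placeholder"}


shared = {}
results = [load_metadata(SimComm(rank, shared), read_header) for rank in range(8)]
print(len(reads), "read(s) of file 0 for", len(results), "ranks")
```

With 8 simulated ranks, file 0 is read once instead of once per rank, and every rank ends up with the same metadata. This also shows why differing header info between files would be a concern: the satellites never look at their own headers.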
(0027718)
Utkarsh Ayachit (administrator)
2011-11-18 14:54

merged to master.
(0027878)
Alan Scott (manager)
2011-12-21 21:47

This appears to be working very well. The only concern I have is if we find that header info is different between files. So far, so good.

This increased read speeds an incredible amount. Nice.

Tested remote server, master, Linux.

 Issue History
Date Modified Username Field Change
2011-11-10 21:55 Alan Scott New Issue
2011-11-11 13:40 Utkarsh Ayachit Assigned To => Utkarsh Ayachit
2011-11-14 10:51 Utkarsh Ayachit Status backlog => todo
2011-11-14 10:51 Utkarsh Ayachit Status todo => active development
2011-11-14 17:20 Utkarsh Ayachit Topic Name => 12720_cth_reads_too_much
2011-11-14 17:20 Utkarsh Ayachit Note Added: 0027690
2011-11-14 17:20 Utkarsh Ayachit Status active development => gatekeeper review
2011-11-14 17:20 Utkarsh Ayachit Fixed in Version => git-next
2011-11-14 17:20 Utkarsh Ayachit Resolution open => fixed
2011-11-15 13:50 Utkarsh Ayachit Relationship added parent of 0012729
2011-11-18 14:53 Utkarsh Ayachit Fixed in Version git-next => git-master
2011-11-18 14:54 Utkarsh Ayachit Status gatekeeper review => customer review
2011-11-18 14:54 Utkarsh Ayachit Note Added: 0027718
2011-12-21 21:47 Alan Scott Note Added: 0027878
2011-12-21 21:47 Alan Scott Status customer review => closed
2012-02-08 17:22 Utkarsh Ayachit Fixed in Version git-master => 3.14

