View Issue Details | ||||||||
ID | Project | Category | View Status | Date Submitted | Last Update | ||||
0012720 | ParaView | (No Category) | public | 2011-11-10 21:55 | 2012-02-08 17:22 | ||||
Reporter | Alan Scott | ||||||||
Assigned To | Utkarsh Ayachit | ||||||||
Priority | urgent | Severity | minor | Reproducibility | have not tried | ||||
Status | closed | Resolution | fixed | ||||||
Platform | OS | OS Version | |||||||
Product Version | 3.12 | ||||||||
Target Version | Fixed in Version | 3.14 | |||||||
Summary | 0012720: CTH reads file 0 for all processes | ||||||||
Description | We suspect that large Cray clusters serialize access to a single file when multiple pvservers try to read it at once. As we scale into the thousands of pvservers, we believe this is becoming fatal.
ParaView 3.12.0, remote server (I am using 8 processes), Linux client.
Although I am sure you can replicate this with any CTH dataset, I am doing the following:
* Make soft links (ln -s) to files spcta.0, spcta.1, spcta.2 and spcta.3 of Dave's big CTH AMR dataset (256 files in total). This gives a 4-file subset of the dataset.
* strace -o $HOME/pvserver.strace -tt -f -ff -e trace=open,close,read,write
  - This creates a separate trace file for each process. Run ls -ls on these files. The smaller ones are not of interest; the larger ones are from lib/paraview3.12/pvserver, and those are the ones we care about.
  - Note that 4 of them are slightly larger than the others. These are the files we care about. Open each one in turn and search for spcth. Notice that each trace shows file 0 being opened 4 times, followed by 2 opens of the process's own file.
As stated, we believe that these 4 opens of file 0 are fatal for Cielo and possibly other Cray systems. This is a show-stopper bug for Cielo going into production with datasets of the expected size. I will send the log files from my run to Utkarsh and Robert. I am marking this as a crash, although technically it is a hang (or a glacier - take your pick). | ||||||||
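The manual step of opening each per-process trace and searching for the data files could be scripted. The following is a minimal sketch, not part of the original report: it assumes the usual strace line layout for `open("path", flags) = fd`, and the helper names (`count_opens`, `count_opens_in_log`) and the sample paths are made up for illustration.

```python
import re
from collections import Counter

# Matches strace lines such as:
#   21:55:01.123456 open("/data/spcta.0", O_RDONLY) = 7
# Only plain open() is matched, since that is what trace=open records.
# Failed opens (return value -1) are counted as well.
OPEN_RE = re.compile(r'\bopen\("([^"]+)"')


def count_opens(lines, needle):
    """Count open() calls per path, keeping only paths that contain `needle`."""
    counts = Counter()
    for line in lines:
        match = OPEN_RE.search(line)
        if match and needle in match.group(1):
            counts[match.group(1)] += 1
    return counts


def count_opens_in_log(log_path, needle):
    """Same as count_opens, but reads one per-process strace output file."""
    with open(log_path, errors="replace") as handle:
        return count_opens(handle, needle)


if __name__ == "__main__":
    # Made-up trace lines, only to show the expected input shape.
    sample = [
        '21:55:01.000001 open("/data/spcta.0", O_RDONLY) = 7',
        '21:55:01.000002 close(7) = 0',
        '21:55:01.000003 open("/data/spcta.0", O_RDONLY) = 7',
        '21:55:01.000004 open("/data/spcta.2", O_RDONLY) = 8',
    ]
    for path, n in sorted(count_opens(sample, "spcta").items()):
        print(path, n)
```

Running `count_opens_in_log` over each of the larger trace files would give one open count per data file per process, which is the number the report compares (4 opens of file 0 versus 2 opens of the process's own file).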
Tags | No tags attached. | ||||||||
Project | Sandia | ||||||||
Topic Name | 12720_cth_reads_too_much | ||||||||
Type | crash | ||||||||
Attached Files | |||||||||
Relationships | ||||||
Notes | |
(0027690) Utkarsh Ayachit (administrator) 2011-11-14 17:20 |
commit 1c9d8ffd920503167e80bbbb457112aa268bfe64
Author: Utkarsh Ayachit <utkarsh.ayachit@kitware.com>
Date: Mon Nov 14 17:14:19 2011 -0500

    Fixed BUG 0012720. Minimize reads on satellites.

    All processes were reading first file to gather meta-data. This caused issues when running in parallel on large number of cores. Fixed by reading the file on root node and then broadcasting the gathered information to all nodes. Structured the code slightly to avoid processing of the meta-data when timesteps changed. |
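The pattern described in the commit message (read on the root node, then broadcast to all nodes) can be illustrated with a small sketch. This is not the reader's actual code, which lives in ParaView's C++ sources; it is a toy model under stated assumptions. `ToyComm`, `load_metadata` and the fake header reader are hypothetical names, and `ToyComm` only imitates a communicator that offers `rank` and `bcast(obj, root=0)`.

```python
class ToyComm:
    """In-process stand-in for a communicator (hypothetical, for illustration).

    All simulated ranks share one dict; the root must call bcast() first.
    """

    def __init__(self, rank, shared):
        self.rank = rank
        self._shared = shared

    def bcast(self, obj, root=0):
        # The root publishes its object; every rank returns the root's copy.
        if self.rank == root:
            self._shared["payload"] = obj
        return self._shared["payload"]


def load_metadata(comm, path, read_header):
    """Only rank 0 touches the file; every rank ends up with the same metadata."""
    meta = read_header(path) if comm.rank == 0 else None
    return comm.bcast(meta, root=0)


if __name__ == "__main__":
    reads = []

    def fake_read_header(path):
        # Records each read so the number of file accesses can be checked.
        reads.append(path)
        return {"fields": ["pressure"], "num_files": 4}

    shared = {}
    results = [
        load_metadata(ToyComm(rank, shared), "spcta.0", fake_read_header)
        for rank in range(8)
    ]
    print(len(reads), "read(s) for", len(results), "ranks")
```

With 8 simulated ranks the header is read once instead of 8 times, which is the effect the fix aims for on the first file. A real MPI binding that exposes the same `rank` and `bcast` interface should be usable in place of `ToyComm`, though that is an assumption and is not exercised here.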
(0027718) Utkarsh Ayachit (administrator) 2011-11-18 14:54 |
merged to master. |
(0027878) Alan Scott (manager) 2011-12-21 21:47 |
This appears to be working very well. My only concern is what happens if header info turns out to differ between files. So far, so good. This increased read speeds by an incredible amount. Nice. Tested remote server, master, Linux. |
Issue History | |||
Date Modified | Username | Field | Change |
2011-11-10 21:55 | Alan Scott | New Issue | |
2011-11-11 13:40 | Utkarsh Ayachit | Assigned To | => Utkarsh Ayachit |
2011-11-14 10:51 | Utkarsh Ayachit | Status | backlog => todo |
2011-11-14 10:51 | Utkarsh Ayachit | Status | todo => active development |
2011-11-14 17:20 | Utkarsh Ayachit | Topic Name | => 12720_cth_reads_too_much |
2011-11-14 17:20 | Utkarsh Ayachit | Note Added: 0027690 | |
2011-11-14 17:20 | Utkarsh Ayachit | Status | active development => gatekeeper review |
2011-11-14 17:20 | Utkarsh Ayachit | Fixed in Version | => git-next |
2011-11-14 17:20 | Utkarsh Ayachit | Resolution | open => fixed |
2011-11-15 13:50 | Utkarsh Ayachit | Relationship added | parent of 0012729 |
2011-11-18 14:53 | Utkarsh Ayachit | Fixed in Version | git-next => git-master |
2011-11-18 14:54 | Utkarsh Ayachit | Status | gatekeeper review => customer review |
2011-11-18 14:54 | Utkarsh Ayachit | Note Added: 0027718 | |
2011-12-21 21:47 | Alan Scott | Note Added: 0027878 | |
2011-12-21 21:47 | Alan Scott | Status | customer review => closed |
2012-02-08 17:22 | Utkarsh Ayachit | Fixed in Version | git-master => 3.14 |