[Paraview] Failing to connect to an MPI server in certain cases
seanzig at users.sourceforge.net
Thu Feb 21 18:31:08 EST 2008
Well, it seems to be working now. My only guess is that it had
something to do with the X server. I compiled with regular OpenGL (not
Mesa), and used the --use-offscreen-rendering switch, so I assume it was
trying to use pbuffers or something similar that still requires access to an X
server. (I did this on purpose, as I do eventually want this capability,
but for now I don't have remote X access to each machine.)
At first it was working intermittently. When it did work it waited
several seconds, then told me that remote rendering would be disabled
(which is fine).
After that, I explicitly cleared the DISPLAY env var in the batch
script. Now it connects instantly (it still reports that remote rendering
is disabled) and works every time.
Perhaps it was a communication timeout. While the server process was
waiting for a non-responsive X server, perhaps the client gave up?
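For anyone who wants to apply the same DISPLAY workaround from a launcher script rather than by editing the batch script directly, here is a minimal sketch. It assumes the server is started through mpirun; the helper names (env_without_display, launch_server), the process count, and the exact command line are illustrative placeholders, not anything taken from ParaView or GridEngine.

```python
import os
import subprocess


def env_without_display(base_env=None):
    """Return a copy of the environment with DISPLAY removed.

    The input mapping is not modified. If base_env is None, the current
    process environment is copied.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.pop("DISPLAY", None)  # no error if DISPLAY was never set
    return env


def launch_server(command, base_env=None):
    """Start `command` with DISPLAY unset and return the Popen handle."""
    return subprocess.Popen(command, env=env_without_display(base_env))


if __name__ == "__main__":
    # Hypothetical command line: adjust the process count and flags to
    # match your own site and queue configuration.
    proc = launch_server(
        ["mpirun", "-np", "4", "pvserver", "--use-offscreen-rendering"]
    )
    proc.wait()
```

Removing the variable from a copied environment, rather than from the launcher's own, keeps the change local to the server processes.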
Moreland, Kenneth wrote:
> Offhand I know of nothing that should cause your problem. We use pbs to launch jobs on our vis clusters and it works fine. We do have to use reverse connections because (1) outside computers cannot make connections to the cluster nodes and (2) we do not know where the server is going to be allocated anyway.
> Do you have any information about the pvserver job? Do you have its output? Is it exiting normally or crashing? Is there any chance you could run it in a debugger?
>> -----Original Message-----
>> From: paraview-bounces+kmorel=sandia.gov at paraview.org [mailto:paraview-
>> bounces+kmorel=sandia.gov at paraview.org] On Behalf Of Sean Ziegeler
>> Sent: Thursday, February 21, 2008 1:38 PM
>> To: ParaView
>> Subject: [Paraview] Failing to connect to an MPI server in certain cases
>> We use MPI across a grid of Linux x86_64 workstations. I've compiled PV
>> 3.2.1 with OpenMPI, and it works fine if I use plain-old mpirun.
>> However, if I submit a job through GridEngine (to do load balancing for
>> everyone), it runs the server ok, but I can't connect to it. I get the
>> following errors:
>> ERROR: In
>> line 67
>> vtkServerConnection (0x159b0d0): Server Connection Closed!
>> ERROR: In
>> line 351
>> vtkServerConnection (0x159b0d0): Server could failed to gather
>> Submitting a parallel job via a batch queue system can affect the
>> environment variables and such, but I would think pvserver would simply
>> fail to execute. I'm looking in the code around where those error
>> messages occur, but I can't find anything obviously wrong. Anyone have
>> any ideas? Anyone know what those error messages tend to indicate other
>> than a general communication failure?