MantisBT - ParaView
View Issue Details
0012087ParaViewBugpublic2011-04-14 14:062011-04-25 12:02
Burlen 
David Partyka 
normalcrashalways
closedfixed 
 
3.12 
0012087: vtkSocket does not restart interupted call
Hi,

While installing ParaView on Nautilus,
http://www.nics.tennessee.edu/computing-resources/nautilus, [^] I hit a bug
in vtkSocket that prevents ParaView from running on this machine. While
tracking this down I uncovered a couple related issues.

The main issue is that vtkSocket does not handle EINTR. EINTR occurs
when a signal is caught by the application during a blocking socket
call. While ParaView does not make use of signals they are used for
asynchronous communication by some SGI specific libraries on Nautilus
that are linked in with SGI MPI. Because Rank 0 pvserver spends quite a
bit of its time blocked in socket calls it only takes a few 10s of
seconds for EINTR to occur. When faced with EINTR ParaView silently
exits leaving the user wondering what the heck happened. Which brings me
to the second issue, a lack of error reporting in vtkSocket.

To solve the first issue vtkSocket has to handle EINTR. How EINTR should
be handled depends on the specific socket call. For all calls except
connect the call can simply be restarted. For EINTR during connect one
can't restart the call on all unix, so instead one must block in a
select call when connect fails with EINTR. To be portable across Unix
one should handle EINTR in all socket calls, even simple ones like
set/getsockopt.

The second issue of error reporting applies to all socket related errors
in general, my feeling is that when a socket call fails vtkSocket should
print a message using vtkErrorMacro, errno, and strerror(or windows
equivalent) at the point of failure. I think this should be done inside
vtkSocket because this is the only place one can safely assume errno has
relevant information and vtkSocket has been implemented returning a
single error code, -1, so that returning the real error code would
change the API and break existing code, including ParaView. Not to
mention that the values for error codes are apparently different on
windows and unix.

I took a stab at fixing these issues, patches attached. I tested them on
my workstation, nautilus, and laptop running xp. I ran a dashboard on my
linux workstation and didn't see any related issues. Would someone at KW
mind taking a look at the changes and see if it could be made permanent?

By the way after testing all socket calls for error returns I uncovered
a third bug, vtkSocket::Close didn't set the descriptor ivar to -1 which
resulted in vtkSocket::~vtkSocket calling close on a closed socket. Not
a disasterous error, but this reinforces my opinion that the returns
should be tested and error messages printed.

Thanks
Burlen

Run ParaView on Nautilus.
No tags attached.
patch vtkSocket.patch (18,085) 2011-04-14 14:06
https://www.vtk.org/Bug/file/8819/vtkSocket.patch
Issue History
2011-04-14 14:06BurlenNew Issue
2011-04-14 14:06BurlenFile Added: vtkSocket.patch
2011-04-14 14:12David PartykaAssigned To => David Partyka
2011-04-14 14:12David PartykaStatusbacklog => tabled
2011-04-14 15:47David PartykaNote Added: 0026181
2011-04-14 15:47David PartykaStatustabled => @80@
2011-04-14 15:47David PartykaFixed in Version => 3.12
2011-04-14 15:47David PartykaResolutionopen => fixed
2011-04-15 09:28David PartykaNote Added: 0026183
2011-04-15 09:28David PartykaStatus@80@ => @20@
2011-04-15 09:28David PartykaResolutionfixed => reopened
2011-04-15 10:07David GobbiNote Added: 0026184
2011-04-15 10:17David PartykaNote Added: 0026185
2011-04-15 13:54David GobbiNote Added: 0026186
2011-04-15 16:44David PartykaNote Added: 0026187
2011-04-15 16:44David PartykaStatus@20@ => @80@
2011-04-15 16:44David PartykaResolutionreopened => fixed
2011-04-25 12:02Alan ScottNote Added: 0026252
2011-04-25 12:02Alan ScottStatus@80@ => closed

Notes
(0026181)
David Partyka   
2011-04-14 15:47   
Resolved by applying Burlen's patch in this VTK commit.

http://vtk.org/gitweb?p=VTK.git;a=commitdiff;h=af11b213f53d3d6c7cce5df87933bc47506494d0 [^]
(0026183)
David Partyka   
2011-04-15 09:28   
Doesn't compile on Windows. Need to address that first before we can merge this in. Thanks!
(0026184)
David Gobbi   
2011-04-15 10:07   
I'm keen to see this patch applied, so I'll add my 2 cents.

The type socklen_t is not defined on Windows (except under cygwin). So add this near the top of the file:

#if defined(VTK_HAVE_GETSOCKNAME_WITH_SOCKLEN_T)
#define vtkSocketLenType socklen_t
#else
#define vtkSocketLenType int
#endif

Then use vtkSocketLenType instead of socklen_t.
(0026185)
David Partyka   
2011-04-15 10:17   
Yeah that part is fine and I tried that locally. Once you fix socklen_t you find that getsockopt which is called a few lines later has a slightly different API than the posix equivalent. If you have some time to add the appropriate ifdefs etc please feel free to refactor it.

http://msdn.microsoft.com/en-us/library/ms738544(v=vs.85).aspx [^]
(0026186)
David Gobbi   
2011-04-15 13:54   
Windows compilation issues fixed, uploaded to gerrit for review:
http://review.source.kitware.com/#change,1411 [^]
Compiled under MSVC 2005, Linux/gcc-4.2, and OS X 10.6

I also removed #define vtkGenericErrorMacro from the file, it
was a copy of vtkGenericWarningMacro with slightly different
text. I didn't think the modified text justified the duplication.
(0026187)
David Partyka   
2011-04-15 16:44   
David Gobbi made adjustments to get this compiling on Windows as well and the dashboard is now happy again.

http://www.cdash.org/CDash/viewUpdate.php?buildid=976660 [^]
(0026252)
Alan Scott   
2011-04-25 12:02   
Trusting Dave's work.