Closed Bug 65146 Opened 24 years ago Closed 24 years ago

deadlock potential in proxy code

Categories

(Core :: XPCOM, defect)

x86
Windows 2000
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla0.9

People

(Reporter: waterson, Assigned: waterson)

References

Details

Attachments

(4 files)

I am seeing a deadlock between the socket transport thread and the UI thread; 
I'll attach stack traces. The socket transport thread trying to make a 
synchronous, proxied call (from nsProxyObject::Release) while still holding on 
to its monitor. Meanwhile, the UI thread is wedged, unable to respond to the 
proxy event, trying to enter the socket transport's monitor.

I believe this is a proxy bug: specifically, I do not think the proxy code 
should be synchronously posting an event in nsProxyObject::Release.
This may be the best solution to the problem (provided it doesn't break
something else -- doug?) b/c we definitely cannot release the socket
transport monitor while processing eSocketState_Done.
Locking should be "leafy" -- non-nesting, short critical sections.  Java-style
monitors considered harmful.  There's almost always a way to convert non-leafy
locking to leafy locking (see however
http://lxr.mozilla.org/mozilla/source/js/src/jsobj.c#218 and below for a rare
hard case where nested locks are not avoidable).

/be
cc'ing jud, who sez that we don't need to proxy the nsHTTPChannel's mEventSink 
anymore (almost).
Depends on: 65160
I am not sure what my reason was to have a sync release. There are many cases in 
necko where the only remaining reference to an object exists in a message which 
will be punted across threads.  

The orignal bug that I fixed this for is:
http://bugzilla.mozilla.org/show_bug.cgi?id=47134

If things test okay, r=dougt

Oh, one hl test that you should run is ->

1. delete your IMAP password.
2. Connect to your IMAP server.
3. Note that the password dialog does come up.
4. Enter your info
5. Ensure that you see mail in your inbox.

For the record, I'm seeing situations now where HTTP `wraps around' a cache 
record (sending a PRUint32 that represents `channel size' past zero and into the 
gazillion range). I don't know if this is another bug or not: things seem to 
work okay once I evict proxies from HTTP altogether (bug 65160).
in any case, I like the async fix.  Can you check it in, or please reassign it
to me with a fat sr=waterson.  :-)

Assignee: dougt → waterson
mscott, could you give this patch a try? I've not had any problems with it, but 
would like your feedback...
Status: NEW → ASSIGNED
Ok, so ``fixing'' this bug (which I'm pretty sure is the right thing to do) 
exposes another deadlock. I'll attach stack traces below, but the general idea 
is (stacks grow down):

[Main Thread] 
  Gets OnDataAvailable
  Reads from pipe
  Calls out to writer routine WHILE HOLDING PIPE MONITOR
    in nsPipe::nsPipeInputStream::ReadSegments()
  Does some PSM detection voodoo
  Cancels HTTP request
  Blocks trying to aquire nsSocketTransport's monitor

[SocketTransport]
  Acquire nsSocketTransport monitor in Process()
  Does async read WHILE HOLDING SOCKET TRANSPORT MONITOR
  Calls through to pipe write routine
  Blocks trying to acquire pipe's monitor in WriteSegments()

With the above patch installed, this happens consistently at 
http://www.iwin.com/home/start.asp. It seems very wrong to me that the main 
thread is holding the pipe's monitor for such an enormous amount of time!

Here are the full stack traces...

[MainThread]
NTDLL! 77f8a122()
NTDLL! 77f8ecf1()
PR_EnterMonitor(PRMonitor * 0x04bb0ea0) line 79 + 14 bytes
nsAutoMonitor::nsAutoMonitor(PRMonitor * 0x04bb0ea0) line 184 + 13 bytes
nsSocketTransport::Cancel(nsSocketTransport * const 0x04bb2a04, unsigned int 
0x804b0002) line 1823
nsHTTPPipelinedRequest::Cancel(unsigned int 0x804b0002) line 1034 + 30 bytes
nsHTTPRequest::Cancel(nsHTTPRequest * const 0x04be1dd0, unsigned int 0x804b0002) 
line 164 + 15 bytes
nsHTTPChannel::Cancel(nsHTTPChannel * const 0x04be4db0, unsigned int 0x804b0002) 
line 193 + 22 bytes
nsLoadGroup::Cancel(nsLoadGroup * const 0x04be2e70, unsigned int 0x804b0002) 
line 256 + 16 bytes
nsDocLoaderImpl::Stop(nsDocLoaderImpl * const 0x04be2ee0) line 282 + 31 bytes
nsURILoader::Stop(nsURILoader * const 0x0279b690, nsISupports * 0x04be2ef8) line 
850 + 23 bytes
nsDocShell::Stop(nsDocShell * const 0x04be0080) line 1565
nsWebShell::StopDocumentLoad(nsWebShell * const 0x04be019c) line 664
nsMyObserver::Notify(nsMyObserver * const 0x04bf0e00, const char * 0x010461f4, 
nsDetectionConfident eSureAnswer) line 98 + 18 bytes
nsXPCOMDetector::Report(const char * 0x010461f4) line 667
nsPSMDetector::HandleData(const char * 0x00c7b4f0, unsigned int 0x00000288) line 
376
nsXPCOMDetector::DoIt(nsXPCOMDetector * const 0x00c3d360, const char * 
0x00c7b4f0, unsigned int 0x00000288, int * 0x04bf0f54) line 653
nsDetectionAdaptor::RawBuffer(nsDetectionAdaptor * const 0x04bf0f40, const char 
* 0x00c7b4f0, unsigned int * 0x0012f7cc) line 275 + 35 bytes
ParserWriteFunc(nsIInputStream * 0x04bf1dd0, void * 0x0012f8cc, const char * 
0x00c7b4f0, unsigned int 0x00000000, unsigned int 0x00000288, unsigned int * 
0x0012f844) line 2280
InterceptStreamListener::IntercepterWriter(nsIInputStream * 0x04bf1dd0, void * 
0x0012f88c, const char * 0x00c7b4f0, unsigned int 0x00000000, unsigned int 
0x00000288, unsigned int * 0x0012f844) line 1260 + 33 bytes
nsPipe::nsPipeInputStream::ReadSegments(nsPipe::nsPipeInputStream * const 
0x04bf1dd0, unsigned int (nsIInputStream *, void *, const char *, unsigned int, 
unsigned int, unsigned int *)* 0x01e9e0c0 
InterceptStreamListener::IntercepterWriter(nsIInputStream *, void *, const char 
*, unsigned int, unsigned int, unsigned int *), void * 0x0012f88c, unsigned int 
0x000005b4, ...) line 402 + 
InterceptStreamListener::ReadSegments(InterceptStreamListener * const 
0x04bf5734, unsigned int (nsIInputStream *, void *, const char *, unsigned int, 
unsigned int, unsigned int *)* 0x0182486a ParserWriteFunc(nsIInputStream *, void 
*, const char *, unsigned int, unsigned int, unsigned int *), void * 0x0012f8cc, 
unsigned int 0x000005b4, unsigned int * 0x0012f8dc) line 1279
nsParser::OnDataAvailable(nsParser * const 0x04bf4238, nsIChannel * 0x04be4db0, 
nsISupports * 0x00000000, nsIInputStream * 0x04bf5734, unsigned int 0x00000000, 
unsigned int 0x000005b4) line 2334 + 29 bytes
nsDocumentOpenInfo::OnDataAvailable(nsDocumentOpenInfo * const 0x04be3380, 
nsIChannel * 0x04be4db0, nsISupports * 0x00000000, nsIInputStream * 0x04bf5734, 
unsigned int 0x00000000, unsigned int 0x000005b4) line 261 + 46 bytes
nsHTTPFinalListener::OnDataAvailable(nsHTTPFinalListener * const 0x04be1d20, 
nsIChannel * 0x04be4db0, nsISupports * 0x00000000, nsIInputStream * 0x04bf5734, 
unsigned int 0x00000000, unsigned int 0x000005b4) line 1215 + 46 bytes
InterceptStreamListener::OnDataAvailable(InterceptStreamListener * const 
0x04bf5730, nsIChannel * 0x04be4db0, nsISupports * 0x00000000, nsIInputStream * 
0x04bf1dd0, unsigned int 0x00000000, unsigned int 0x000005b4) line 1221
nsHTTPServerListener::OnDataAvailable(nsHTTPServerListener * const 0x04bf3d10, 
nsIChannel * 0x04bb2a04, nsISupports * 0x04be4db0, nsIInputStream * 0x04bf1dd0, 
unsigned int 0x00007d78, unsigned int 0x000005b4) line 558 + 67 bytes
nsOnDataAvailableEvent::HandleEvent(nsOnDataAvailableEvent * const 0x03fffbd0) 
line 160 + 70 bytes
nsStreamObserverEvent::HandlePLEvent(PLEvent * 0x03fffbd4) line 78
PL_HandleEvent(PLEvent * 0x03fffbd4) line 576 + 10 bytes
PL_ProcessPendingEvents(PLEventQueue * 0x0055b240) line 509 + 9 bytes
_md_EventReceiverProc(HWND__ * 0x000e03d6, unsigned int 0x0000c0ff, unsigned int 
0x00000000, long 0x0055b240) line 1054 + 9 bytes
USER32! 77e13eb0()
USER32! 77e1401a()
USER32! 77e192da()
nsAppShellService::Run(nsAppShellService * const 0x00577200) line 408
main1(int 0x00000001, char * * 0x004b5e30, nsISupports * 0x00000000) line 978 + 
32 bytes
main(int 0x00000001, char * * 0x004b5e30) line 1272 + 37 bytes
mainCRTStartup() line 338 + 17 bytes
KERNEL32! 77e87903()



[SocketTransportThread]
NTDLL! 77f8a122()
NTDLL! 77f8ecf1()
PR_EnterMonitor(PRMonitor * 0x04bf7350) line 79 + 14 bytes
nsAutoMonitor::nsAutoMonitor(PRMonitor * 0x04bf7350) line 184 + 13 bytes
nsPipe::nsPipeOutputStream::WriteSegments(nsPipe::nsPipeOutputStream * const 
0x04bf1de0, unsigned int (nsIOutputStream *, void *, char *, unsigned int, 
unsigned int, unsigned int *)* 0x1005f4f0 nsReadFromInputStream(nsIOutputStream 
*, void *, char *, unsigned int, unsigned int, unsigned int *), void * 
0x04bf1fd0, unsigned int 0x00002000, unsigned int * 0x02f6fe68) line 661
nsPipe::nsPipeOutputStream::WriteFrom(nsPipe::nsPipeOutputStream * const 
0x04bf1de0, nsIInputStream * 0x04bf1fd0, unsigned int 0x00002000, unsigned int * 
0x02f6fe68) line 827
nsStreamListenerProxy::OnDataAvailable(nsStreamListenerProxy * const 0x04bf50e4, 
nsIChannel * 0x04bb2a04, nsISupports * 0x04be4db0, nsIInputStream * 0x04bf1fd0, 
unsigned int 0x0000832c, unsigned int 0x00002000) line 272 + 38 bytes
nsSocketTransport::doReadAsync(short 0x0001) line 1290 + 96 bytes
nsSocketTransport::Process(short 0x0001) line 808 + 13 bytes
nsSocketTransportService::Run(nsSocketTransportService * const 0x0276b854) line 
424 + 13 bytes
nsThread::Main(void * 0x0276b2b0) line 106 + 26 bytes
_PR_NativeRunThread(void * 0x0276b090) line 399 + 13 bytes
_threadstartex(void * 0x0276ced0) line 212 + 13 bytes
KERNEL32! 77e92ca8()
Yeah, looking at this a bit more, it seems like brendan's already hacked around 
one deadlock in nsPipe::nsPipeInputStream::ReadSegments(). What is the 
state that really needs to be protected here?
Leaving the monitor before calling out to `writer()' seems to fix the second 
deadlock. But it's kinda scary...

Index: nsPipe2.cpp
===================================================================
RCS file: /cvsroot/mozilla/xpcom/io/nsPipe2.cpp,v
retrieving revision 1.33
diff -u -2 -r1.33 nsPipe2.cpp
--- nsPipe2.cpp	2001/01/27 01:28:00	1.33
+++ nsPipe2.cpp	2001/02/06 05:25:42
@@ -400,5 +400,7 @@
         while (readBufferLen > 0) {
             PRUint32 writeCount = 0;
+            mon.Exit(); // XXX avoid deadlock better
             rv = writer(this, closure, readBuffer, *readCount, readBufferLen, 
&writeCount);
+            mon.Enter();            
             if (NS_FAILED(rv) && rv != NS_BASE_STREAM_WOULD_BLOCK)
                 goto done;
I told warren to avoid monitors and callouts inside monitors, and look what
happened!

Who claims to own nsPipe2.cpp these days?  That file needs a monitor enema. 
Perhaps the quick fix is to exit and re-enter (using the nsAutoMonitor) around
the call-outs?  The only invariants that might vary are the read-buffer state
variables, but a pipe is single-producer, single-consumer (thread-wise, that
is), so the read-buffer state vars should be touched only by this code.

/be
Both the socket transport and the pipe implementations use large-scoping
monitors for thread synchronization.  Fixing nsPipe2 is only part of the
problem.  We need to do the same thing with the socket transport.
darin: brendan has given me new confidence in relaseing the monitor while 
calling out through `writer()'. What do you think? Is the change right?
See bug 26511 for removing some of the locking in pipe2.
darin: I totally agree (no surprise) that monitors blithely interlocking large,
nested, module-callout-full "critical sections" (hah! I say hah!  critical only
to lazy coding ;-) need to be fixed, everywhere.  But we can divide and conquer,
especially with nsPipe2.cpp, in order to break the deadly embrace; then fix the
socket transport later.

mozilla0.8 freezes tonight -- can we get a patch from an ownerly person?

/be
Brendan's suggestion looks safe to me.  An overlapped write would only expand
the amount that could be read and/or change the value of nsPipe::mCondition.
Unless the writer writes zero bytes or returns NS_BASE_STREAM_WOULD_BLOCK,
these changes would never come into play.  With the way nsPipe2 is written,
I think this special case would be handled correctly.
I'm all for waterson's patch.  r=darin.
I think I'll try to get this in once the tree opens for m0.9. That'll give it 
plenty of time to cook.
Target Milestone: --- → mozilla0.9
We should make the same fix inside WriteSegments where we call out to the reader.
For what it's worth, I also noticed about a 150ms speed up on my linux box
with this patch applied (including the one for WriteSegments).  This was using
John Morrison's loader test at http://jrgm.mcom.com/page-loader/loader.pl (sorry
this is only a netscape internal link).
I've been running with this patch for a while now, and everything seems good.
r=darin
sr=brendan@mozilla.org.  Can we have a bug on further improvements, e.g., to use
a lock rather than a monitor?  In general, to remove these doubt-inducing XXX
comments altogether.

/be
Yes, but it would require splitting the blocking pipe impl from the
non-blocking pipe impl, since the blocking impl will definitely need
to wait on a monitor.
A monitor is a lock, a condition variable, and a reentrancy count.  If we keep
critical sections leafy, whether blocking or non-blocking, then I see no need
for the reentrancy count.  That leaves a condition, which can be null for
non-blocking and non-null, pointing to a valid PRCondVar, for blocking, pipes. 
I don't think we need to split impls just to support blocking while avoiding a
monitor; there may of course be other reasons to split impls.

/be
Created bug 68614 to cover removing the monitors from the
non-blocking pipe implementation.
brendan: good point.
IOW, you wait on a condition, not (necessarily, or primitively) on a monitor.

Java-style monitors are a plague on the land and an abomination in mine eyes!

/be
Fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: