Closed
Bug 65146
Opened 24 years ago
Closed 24 years ago
deadlock potential in proxy code
Categories
(Core :: XPCOM, defect)
Tracking
()
RESOLVED
FIXED
mozilla0.9
People
(Reporter: waterson, Assigned: waterson)
References
Details
Attachments
(4 files)
5.67 KB,
text/plain
|
Details | |
2.02 KB,
text/plain
|
Details | |
739 bytes,
patch
|
Details | Diff | Splinter Review | |
2.13 KB,
patch
|
Details | Diff | Splinter Review |
I am seeing a deadlock between the socket transport thread and the UI thread;
I'll attach stack traces. The socket transport thread trying to make a
synchronous, proxied call (from nsProxyObject::Release) while still holding on
to its monitor. Meanwhile, the UI thread is wedged, unable to respond to the
proxy event, trying to enter the socket transport's monitor.
I believe this is a proxy bug: specifically, I do not think the proxy code
should be synchronously posting an event in nsProxyObject::Release.
Assignee | ||
Comment 1•24 years ago
|
||
Assignee | ||
Comment 2•24 years ago
|
||
Assignee | ||
Comment 3•24 years ago
|
||
Comment 4•24 years ago
|
||
This may be the best solution to the problem (provided it doesn't break
something else -- doug?) b/c we definitely cannot release the socket
transport monitor while processing eSocketState_Done.
Comment 5•24 years ago
|
||
Locking should be "leafy" -- non-nesting, short critical sections. Java-style
monitors considered harmful. There's almost always a way to convert non-leafy
locking to leafy locking (see however
http://lxr.mozilla.org/mozilla/source/js/src/jsobj.c#218 and below for a rare
hard case where nested locks are not avoidable).
/be
Assignee | ||
Comment 6•24 years ago
|
||
cc'ing jud, who sez that we don't need to proxy the nsHTTPChannel's mEventSink
anymore (almost).
Comment 7•24 years ago
|
||
I am not sure what my reason was to have a sync release. There are many cases in
necko where the only remaining reference to an object exists in a message which
will be punted across threads.
The orignal bug that I fixed this for is:
http://bugzilla.mozilla.org/show_bug.cgi?id=47134
If things test okay, r=dougt
Oh, one hl test that you should run is ->
1. delete your IMAP password.
2. Connect to your IMAP server.
3. Note that the password dialog does come up.
4. Enter your info
5. Ensure that you see mail in your inbox.
Assignee | ||
Comment 8•24 years ago
|
||
For the record, I'm seeing situations now where HTTP `wraps around' a cache
record (sending a PRUint32 that represents `channel size' past zero and into the
gazillion range). I don't know if this is another bug or not: things seem to
work okay once I evict proxies from HTTP altogether (bug 65160).
Comment 9•24 years ago
|
||
in any case, I like the async fix. Can you check it in, or please reassign it
to me with a fat sr=waterson. :-)
Assignee: dougt → waterson
Assignee | ||
Comment 10•24 years ago
|
||
mscott, could you give this patch a try? I've not had any problems with it, but
would like your feedback...
Status: NEW → ASSIGNED
Assignee | ||
Comment 11•24 years ago
|
||
Ok, so ``fixing'' this bug (which I'm pretty sure is the right thing to do)
exposes another deadlock. I'll attach stack traces below, but the general idea
is (stacks grow down):
[Main Thread]
Gets OnDataAvailable
Reads from pipe
Calls out to writer routine WHILE HOLDING PIPE MONITOR
in nsPipe::nsPipeInputStream::ReadSegments()
Does some PSM detection voodoo
Cancels HTTP request
Blocks trying to aquire nsSocketTransport's monitor
[SocketTransport]
Acquire nsSocketTransport monitor in Process()
Does async read WHILE HOLDING SOCKET TRANSPORT MONITOR
Calls through to pipe write routine
Blocks trying to acquire pipe's monitor in WriteSegments()
With the above patch installed, this happens consistently at
http://www.iwin.com/home/start.asp. It seems very wrong to me that the main
thread is holding the pipe's monitor for such an enormous amount of time!
Here are the full stack traces...
[MainThread]
NTDLL! 77f8a122()
NTDLL! 77f8ecf1()
PR_EnterMonitor(PRMonitor * 0x04bb0ea0) line 79 + 14 bytes
nsAutoMonitor::nsAutoMonitor(PRMonitor * 0x04bb0ea0) line 184 + 13 bytes
nsSocketTransport::Cancel(nsSocketTransport * const 0x04bb2a04, unsigned int
0x804b0002) line 1823
nsHTTPPipelinedRequest::Cancel(unsigned int 0x804b0002) line 1034 + 30 bytes
nsHTTPRequest::Cancel(nsHTTPRequest * const 0x04be1dd0, unsigned int 0x804b0002)
line 164 + 15 bytes
nsHTTPChannel::Cancel(nsHTTPChannel * const 0x04be4db0, unsigned int 0x804b0002)
line 193 + 22 bytes
nsLoadGroup::Cancel(nsLoadGroup * const 0x04be2e70, unsigned int 0x804b0002)
line 256 + 16 bytes
nsDocLoaderImpl::Stop(nsDocLoaderImpl * const 0x04be2ee0) line 282 + 31 bytes
nsURILoader::Stop(nsURILoader * const 0x0279b690, nsISupports * 0x04be2ef8) line
850 + 23 bytes
nsDocShell::Stop(nsDocShell * const 0x04be0080) line 1565
nsWebShell::StopDocumentLoad(nsWebShell * const 0x04be019c) line 664
nsMyObserver::Notify(nsMyObserver * const 0x04bf0e00, const char * 0x010461f4,
nsDetectionConfident eSureAnswer) line 98 + 18 bytes
nsXPCOMDetector::Report(const char * 0x010461f4) line 667
nsPSMDetector::HandleData(const char * 0x00c7b4f0, unsigned int 0x00000288) line
376
nsXPCOMDetector::DoIt(nsXPCOMDetector * const 0x00c3d360, const char *
0x00c7b4f0, unsigned int 0x00000288, int * 0x04bf0f54) line 653
nsDetectionAdaptor::RawBuffer(nsDetectionAdaptor * const 0x04bf0f40, const char
* 0x00c7b4f0, unsigned int * 0x0012f7cc) line 275 + 35 bytes
ParserWriteFunc(nsIInputStream * 0x04bf1dd0, void * 0x0012f8cc, const char *
0x00c7b4f0, unsigned int 0x00000000, unsigned int 0x00000288, unsigned int *
0x0012f844) line 2280
InterceptStreamListener::IntercepterWriter(nsIInputStream * 0x04bf1dd0, void *
0x0012f88c, const char * 0x00c7b4f0, unsigned int 0x00000000, unsigned int
0x00000288, unsigned int * 0x0012f844) line 1260 + 33 bytes
nsPipe::nsPipeInputStream::ReadSegments(nsPipe::nsPipeInputStream * const
0x04bf1dd0, unsigned int (nsIInputStream *, void *, const char *, unsigned int,
unsigned int, unsigned int *)* 0x01e9e0c0
InterceptStreamListener::IntercepterWriter(nsIInputStream *, void *, const char
*, unsigned int, unsigned int, unsigned int *), void * 0x0012f88c, unsigned int
0x000005b4, ...) line 402 +
InterceptStreamListener::ReadSegments(InterceptStreamListener * const
0x04bf5734, unsigned int (nsIInputStream *, void *, const char *, unsigned int,
unsigned int, unsigned int *)* 0x0182486a ParserWriteFunc(nsIInputStream *, void
*, const char *, unsigned int, unsigned int, unsigned int *), void * 0x0012f8cc,
unsigned int 0x000005b4, unsigned int * 0x0012f8dc) line 1279
nsParser::OnDataAvailable(nsParser * const 0x04bf4238, nsIChannel * 0x04be4db0,
nsISupports * 0x00000000, nsIInputStream * 0x04bf5734, unsigned int 0x00000000,
unsigned int 0x000005b4) line 2334 + 29 bytes
nsDocumentOpenInfo::OnDataAvailable(nsDocumentOpenInfo * const 0x04be3380,
nsIChannel * 0x04be4db0, nsISupports * 0x00000000, nsIInputStream * 0x04bf5734,
unsigned int 0x00000000, unsigned int 0x000005b4) line 261 + 46 bytes
nsHTTPFinalListener::OnDataAvailable(nsHTTPFinalListener * const 0x04be1d20,
nsIChannel * 0x04be4db0, nsISupports * 0x00000000, nsIInputStream * 0x04bf5734,
unsigned int 0x00000000, unsigned int 0x000005b4) line 1215 + 46 bytes
InterceptStreamListener::OnDataAvailable(InterceptStreamListener * const
0x04bf5730, nsIChannel * 0x04be4db0, nsISupports * 0x00000000, nsIInputStream *
0x04bf1dd0, unsigned int 0x00000000, unsigned int 0x000005b4) line 1221
nsHTTPServerListener::OnDataAvailable(nsHTTPServerListener * const 0x04bf3d10,
nsIChannel * 0x04bb2a04, nsISupports * 0x04be4db0, nsIInputStream * 0x04bf1dd0,
unsigned int 0x00007d78, unsigned int 0x000005b4) line 558 + 67 bytes
nsOnDataAvailableEvent::HandleEvent(nsOnDataAvailableEvent * const 0x03fffbd0)
line 160 + 70 bytes
nsStreamObserverEvent::HandlePLEvent(PLEvent * 0x03fffbd4) line 78
PL_HandleEvent(PLEvent * 0x03fffbd4) line 576 + 10 bytes
PL_ProcessPendingEvents(PLEventQueue * 0x0055b240) line 509 + 9 bytes
_md_EventReceiverProc(HWND__ * 0x000e03d6, unsigned int 0x0000c0ff, unsigned int
0x00000000, long 0x0055b240) line 1054 + 9 bytes
USER32! 77e13eb0()
USER32! 77e1401a()
USER32! 77e192da()
nsAppShellService::Run(nsAppShellService * const 0x00577200) line 408
main1(int 0x00000001, char * * 0x004b5e30, nsISupports * 0x00000000) line 978 +
32 bytes
main(int 0x00000001, char * * 0x004b5e30) line 1272 + 37 bytes
mainCRTStartup() line 338 + 17 bytes
KERNEL32! 77e87903()
[SocketTransportThread]
NTDLL! 77f8a122()
NTDLL! 77f8ecf1()
PR_EnterMonitor(PRMonitor * 0x04bf7350) line 79 + 14 bytes
nsAutoMonitor::nsAutoMonitor(PRMonitor * 0x04bf7350) line 184 + 13 bytes
nsPipe::nsPipeOutputStream::WriteSegments(nsPipe::nsPipeOutputStream * const
0x04bf1de0, unsigned int (nsIOutputStream *, void *, char *, unsigned int,
unsigned int, unsigned int *)* 0x1005f4f0 nsReadFromInputStream(nsIOutputStream
*, void *, char *, unsigned int, unsigned int, unsigned int *), void *
0x04bf1fd0, unsigned int 0x00002000, unsigned int * 0x02f6fe68) line 661
nsPipe::nsPipeOutputStream::WriteFrom(nsPipe::nsPipeOutputStream * const
0x04bf1de0, nsIInputStream * 0x04bf1fd0, unsigned int 0x00002000, unsigned int *
0x02f6fe68) line 827
nsStreamListenerProxy::OnDataAvailable(nsStreamListenerProxy * const 0x04bf50e4,
nsIChannel * 0x04bb2a04, nsISupports * 0x04be4db0, nsIInputStream * 0x04bf1fd0,
unsigned int 0x0000832c, unsigned int 0x00002000) line 272 + 38 bytes
nsSocketTransport::doReadAsync(short 0x0001) line 1290 + 96 bytes
nsSocketTransport::Process(short 0x0001) line 808 + 13 bytes
nsSocketTransportService::Run(nsSocketTransportService * const 0x0276b854) line
424 + 13 bytes
nsThread::Main(void * 0x0276b2b0) line 106 + 26 bytes
_PR_NativeRunThread(void * 0x0276b090) line 399 + 13 bytes
_threadstartex(void * 0x0276ced0) line 212 + 13 bytes
KERNEL32! 77e92ca8()
Assignee | ||
Comment 12•24 years ago
|
||
Yeah, looking at this a bit more, it seems like brendan's already hacked around
one deadlock in nsPipe::nsPipeInputStream::ReadSegments(). What is the
state that really needs to be protected here?
Assignee | ||
Comment 13•24 years ago
|
||
Leaving the monitor before calling out to `writer()' seems to fix the second
deadlock. But it's kinda scary...
Index: nsPipe2.cpp
===================================================================
RCS file: /cvsroot/mozilla/xpcom/io/nsPipe2.cpp,v
retrieving revision 1.33
diff -u -2 -r1.33 nsPipe2.cpp
--- nsPipe2.cpp 2001/01/27 01:28:00 1.33
+++ nsPipe2.cpp 2001/02/06 05:25:42
@@ -400,5 +400,7 @@
while (readBufferLen > 0) {
PRUint32 writeCount = 0;
+ mon.Exit(); // XXX avoid deadlock better
rv = writer(this, closure, readBuffer, *readCount, readBufferLen,
&writeCount);
+ mon.Enter();
if (NS_FAILED(rv) && rv != NS_BASE_STREAM_WOULD_BLOCK)
goto done;
Comment 14•24 years ago
|
||
I told warren to avoid monitors and callouts inside monitors, and look what
happened!
Who claims to own nsPipe2.cpp these days? That file needs a monitor enema.
Perhaps the quick fix is to exit and re-enter (using the nsAutoMonitor) around
the call-outs? The only invariants that might vary are the read-buffer state
variables, but a pipe is single-producer, single-consumer (thread-wise, that
is), so the read-buffer state vars should be touched only by this code.
/be
Comment 15•24 years ago
|
||
Both the socket transport and the pipe implementations use large-scoping
monitors for thread synchronization. Fixing nsPipe2 is only part of the
problem. We need to do the same thing with the socket transport.
Assignee | ||
Comment 16•24 years ago
|
||
darin: brendan has given me new confidence in relaseing the monitor while
calling out through `writer()'. What do you think? Is the change right?
Comment 17•24 years ago
|
||
See bug 26511 for removing some of the locking in pipe2.
Comment 18•24 years ago
|
||
darin: I totally agree (no surprise) that monitors blithely interlocking large,
nested, module-callout-full "critical sections" (hah! I say hah! critical only
to lazy coding ;-) need to be fixed, everywhere. But we can divide and conquer,
especially with nsPipe2.cpp, in order to break the deadly embrace; then fix the
socket transport later.
mozilla0.8 freezes tonight -- can we get a patch from an ownerly person?
/be
Comment 19•24 years ago
|
||
Brendan's suggestion looks safe to me. An overlapped write would only expand
the amount that could be read and/or change the value of nsPipe::mCondition.
Unless the writer writes zero bytes or returns NS_BASE_STREAM_WOULD_BLOCK,
these changes would never come into play. With the way nsPipe2 is written,
I think this special case would be handled correctly.
Comment 20•24 years ago
|
||
I'm all for waterson's patch. r=darin.
Assignee | ||
Comment 21•24 years ago
|
||
I think I'll try to get this in once the tree opens for m0.9. That'll give it
plenty of time to cook.
Target Milestone: --- → mozilla0.9
Comment 22•24 years ago
|
||
We should make the same fix inside WriteSegments where we call out to the reader.
Comment 23•24 years ago
|
||
For what it's worth, I also noticed about a 150ms speed up on my linux box
with this patch applied (including the one for WriteSegments). This was using
John Morrison's loader test at http://jrgm.mcom.com/page-loader/loader.pl (sorry
this is only a netscape internal link).
Assignee | ||
Comment 24•24 years ago
|
||
Comment 25•24 years ago
|
||
I've been running with this patch for a while now, and everything seems good.
r=darin
Comment 26•24 years ago
|
||
sr=brendan@mozilla.org. Can we have a bug on further improvements, e.g., to use
a lock rather than a monitor? In general, to remove these doubt-inducing XXX
comments altogether.
/be
Comment 27•24 years ago
|
||
Yes, but it would require splitting the blocking pipe impl from the
non-blocking pipe impl, since the blocking impl will definitely need
to wait on a monitor.
Comment 28•24 years ago
|
||
A monitor is a lock, a condition variable, and a reentrancy count. If we keep
critical sections leafy, whether blocking or non-blocking, then I see no need
for the reentrancy count. That leaves a condition, which can be null for
non-blocking and non-null, pointing to a valid PRCondVar, for blocking, pipes.
I don't think we need to split impls just to support blocking while avoiding a
monitor; there may of course be other reasons to split impls.
/be
Comment 29•24 years ago
|
||
Created bug 68614 to cover removing the monitors from the
non-blocking pipe implementation.
Comment 30•24 years ago
|
||
brendan: good point.
Comment 31•24 years ago
|
||
IOW, you wait on a condition, not (necessarily, or primitively) on a monitor.
Java-style monitors are a plague on the land and an abomination in mine eyes!
/be
Assignee | ||
Comment 32•24 years ago
|
||
Fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•