Closed Bug 469514 Opened 16 years ago Closed 12 years ago

ASSERTION: leaking stream event: 'Not Reached' (intermittent bustage on leak builds)

Categories

(Core :: XPCOM, defect)

defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: asuth, Unassigned)

References

Details

(Keywords: assertion, intermittent-failure)

Attachments

(1 file, 1 obsolete file)

"python leaktest.py -- --trace-malloc malloc.log --shutdown-leaks=sdleak.log" has failed twice on "OS X 10.5.2 mozilla-central leak test build" with errors like so:

WARNING: An event was posted to a thread that will never run it (rejected): file /builds/moz2_slave/mozilla-central-macosx-debug/build/xpcom/threads/nsThread.cpp, line 359
WARNING: PostEvent failed: file /builds/moz2_slave/mozilla-central-macosx-debug/build/xpcom/io/nsStreamUtils.cpp, line 179
WARNING: An event was posted to a thread that will never run it (rejected): file /builds/moz2_slave/mozilla-central-macosx-debug/build/xpcom/threads/nsThread.cpp, line 359
WARNING: PostEvent failed: file /builds/moz2_slave/mozilla-central-macosx-debug/build/xpcom/io/nsStreamUtils.cpp, line 179
###!!! ASSERTION: leaking stream event: 'Not Reached', file /builds/moz2_slave/mozilla-central-macosx-debug/build/xpcom/io/nsStreamUtils.cpp, line 163
NS_AsyncCopy(nsIInputStream*, nsIOutputStream*, nsIEventTarget*, nsAsyncCopyMode, unsigned int, void (*)(void*, unsigned int), void*)+0x000015A0 [/builds/moz2_slave/mozilla-central-macosx-debug/build/obj-firefox/dist/MinefieldDebug.app/Contents/MacOS/libxpcom_core.dylib +0x0005599E]
NS_NewOutputStreamReadyEvent(nsIOutputStreamCallback**, nsIOutputStreamCallback*, nsIEventTarget*)+0x000001A8 [/builds/moz2_slave/mozilla-central-macosx-debug/build/obj-firefox/dist/MinefieldDebug.app/Contents/MacOS/libxpcom_core.dylib +0x000543DA]
nsCRT::strncasecmp(char const*, char const*, unsigned int)+0x00000A18 [/builds/moz2_slave/mozilla-central-macosx-debug/build/obj-firefox/dist/MinefieldDebug.app/Contents/MacOS/libxpcom_core.dylib +0x000531B0]
nsCRT::strncasecmp(char const*, char const*, unsigned int)+0x00000A4A [/builds/moz2_slave/mozilla-central-macosx-debug/build/obj-firefox/dist/MinefieldDebug.app/Contents/MacOS/libxpcom_core.dylib +0x000531E2]
nsCRT::strncasecmp(char const*, char const*, unsigned int)+0x00000C9C [/builds/moz2_slave/mozilla-central-macosx-debug/build/obj-firefox/dist/MinefieldDebug.app/Contents/MacOS/libxpcom_core.dylib +0x00053434]
NS_NewPipe(nsIInputStream**, nsIOutputStream**, unsigned int, unsigned int, int, int, nsIMemory*)+0x00000564 [/builds/moz2_slave/mozilla-central-macosx-debug/build/obj-firefox/dist/MinefieldDebug.app/Contents/MacOS/libxpcom_core.dylib +0x000512BA]
NS_NewPipe(nsIInputStream**, nsIOutputStream**, unsigned int, unsigned int, int, int, nsIMemory*)+0x000012D8 [/builds/moz2_slave/mozilla-central-macosx-debug/build/obj-firefox/dist/MinefieldDebug.app/Contents/MacOS/libxpcom_core.dylib +0x0005202E]
NS_NewPipe(nsIInputStream**, nsIOutputStream**, unsigned int, unsigned int, int, int, nsIMemory*)+0x0000151C [/builds/moz2_slave/mozilla-central-macosx-debug/build/obj-firefox/dist/MinefieldDebug.app/Contents/MacOS/libxpcom_core.dylib +0x00052272]
nsLinebreakConverter::ConvertUnicharLineBreaks(unsigned short const*, nsLinebreakConverter::ELinebreakType, nsLinebreakConverter::ELinebreakType, int, int*)+0x000035F0 [/builds/moz2_slave/mozilla-central-macosx-debug/build/obj-firefox/dist/MinefieldDebug.app/Contents/MacOS/libxpcom_core.dylib +0x00050288]
NS_NewInputStreamTee(nsIInputStream**, nsIInputStream*, nsIOutputStream*)+0x0000038E [/builds/moz2_slave/mozilla-central-macosx-debug/build/obj-firefox/dist/MinefieldDebug.app/Contents/MacOS/libxpcom_core.dylib +0x0004B834]
NS_ConsumeStream(nsIInputStream*, unsigned int, nsACString_internal&)+0x00000100 [/builds/moz2_slave/mozilla-central-macosx-debug/build/obj-firefox/dist/MinefieldDebug.app/Contents/MacOS/libxpcom_core.dylib +0x00053E06]
nsCStringArray::AppendCString(nsACString_internal const&)+0x00007357 [/builds/moz2_slave/mozilla-central-macosx-debug/build/obj-firefox/dist/MinefieldDebug.app/Contents/MacOS/components/libtoolkitcomps.dylib +0x0003CAA9]
nsGetServiceByCID::nsGetServiceByCID(nsID const&)+0x00000D50 [/builds/moz2_slave/mozilla-central-macosx-debug/build/obj-firefox/dist/MinefieldDebug.app/Contents/MacOS/components/libnecko.dylib +0x000513CE]
nsCRT::strcmp(char const*, char const*)+0x00005FBA [/builds/moz2_slave/mozilla-central-macosx-debug/build/obj-firefox/dist/MinefieldDebug.app/Contents/MacOS/components/libnecko.dylib +0x000F38C6]
nsGetInterface::nsGetInterface(nsISupports*, unsigned int*)+0x000025DF [/builds/moz2_slave/mozilla-central-macosx-debug/build/obj-firefox/dist/MinefieldDebug.app/Contents/MacOS/components/libnecko.dylib +0x0001DEB3]
nsGetInterface::nsGetInterface(nsISupports*, unsigned int*)+0x00002B30 [/builds/moz2_slave/mozilla-central-macosx-debug/build/obj-firefox/dist/MinefieldDebug.app/Contents/MacOS/components/libnecko.dylib +0x0001E404]
NS_AsyncCopy(nsIInputStream*, nsIOutputStream*, nsIEventTarget*, nsAsyncCopyMode, unsigned int, void (*)(void*, unsigned int), void*)+0x00000E28 [/builds/moz2_slave/mozilla-central-macosx-debug/build/obj-firefox/dist/MinefieldDebug.app/Contents/MacOS/libxpcom_core.dylib +0x00055226]
NS_SetGlobalThreadObserver(nsIThreadObserver*)+0x0000075E [/builds/moz2_slave/mozilla-central-macosx-debug/build/obj-firefox/dist/MinefieldDebug.app/Contents/MacOS/libxpcom_core.dylib +0x00087928]
NS_ProcessNextEvent_P(nsIThread*, int)+0x00000082 [/builds/moz2_slave/mozilla-central-macosx-debug/build/obj-firefox/dist/MinefieldDebug.app/Contents/MacOS/libxpcom_core.dylib +0x00010C5E]
NS_SetGlobalThreadObserver(nsIThreadObserver*)+0x00000DF5 [/builds/moz2_slave/mozilla-central-macosx-debug/build/obj-firefox/dist/MinefieldDebug.app/Contents/MacOS/libxpcom_core.dylib +0x00087FBF]
nsCaseInsensitiveCStringComparator::nsCaseInsensitiveCStringComparator()+0x00008D74 [/builds/moz2_slave/mozilla-central-macosx-debug/build/obj-firefox/dist/MinefieldDebug.app/Contents/MacOS/components/libnecko.dylib +0x0004DA5A]
nsGetServiceByCIDWithError::nsGetServiceByCIDWithError(nsID const&, unsigned int*)+0x00004451 [/builds/moz2_slave/mozilla-central-macosx-debug/build/obj-firefox/dist/MinefieldDebug.app/Contents/MacOS/components/libnecko.dylib +0x00022F05]
nsGetServiceByCIDWithError::nsGetServiceByCIDWithError(nsID const&, unsigned int*)+0x00006405 [/builds/moz2_slave/mozilla-central-macosx-debug/build/obj-firefox/dist/MinefieldDebug.app/Contents/MacOS/components/libnecko.dylib +0x00024EB9]
nsINIParser_internal::nsINIParser_internal()+0x00000A65 [/builds/moz2_slave/mozilla-central-macosx-debug/build/obj-firefox/dist/MinefieldDebug.app/Contents/MacOS/libxpcom_core.dylib +0x000276F1]
nsQueryReferent::nsQueryReferent(nsIWeakReference*, unsigned int*)+0x000012FA [/builds/moz2_slave/mozilla-central-macosx-debug/build/obj-firefox/dist/MinefieldDebug.app/Contents/MacOS/libxpcom_core.dylib +0x00028C42]
nsINIParser_internal::~nsINIParser_internal()+0x0000B282 [/builds/moz2_slave/mozilla-central-macosx-debug/build/obj-firefox/dist/MinefieldDebug.app/Contents/MacOS/XUL +0x00019334]
UNKNOWN [/builds/moz2_slave/mozilla-central-macosx-debug/build/obj-firefox/dist/MinefieldDebug.app/Contents/MacOS/XUL +0x000033A3]
XRE_main+0x000023A9 [/builds/moz2_slave/mozilla-central-macosx-debug/build/obj-firefox/dist/MinefieldDebug.app/Contents/MacOS/XUL +0x0000ADFB]
start+0x00000AD7 [/builds/moz2_slave/mozilla-central-macosx-debug/build/obj-firefox/dist/MinefieldDebug.app/Contents/MacOS/firefox-bin +0x000017CB]
start+0x00000102 [/builds/moz2_slave/mozilla-central-macosx-debug/build/obj-firefox/dist/MinefieldDebug.app/Contents/MacOS/firefox-bin +0x00000DF6]
start+0x00000029 [/builds/moz2_slave/mozilla-central-macosx-debug/build/obj-firefox/dist/MinefieldDebug.app/Contents/MacOS/firefox-bin +0x00000D1D]
ERROR FAIL Exited with code 6 during test run
program finished with exit code 0

The logs themselves:

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1229223792.1229224123.12343.gz 2008/12/13 19:03:12

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1229198550.1229199017.25434.gz 2008/12/13 12:02:30

I have no idea where the correct place to be filing this bug is;  I added it to the compendium (https://wiki.mozilla.org/Tinderbox/Nov-2008-Orange-Compendium), and it seemed like other bugs from there that didn't know where to go ended up here too.
I see four matches on 'ASSERTION: leaking stream event' in the last 4 days (which is as far as can be grepped):

Buildstart     branch           revision       buildslave
Dec 08 ~1400   mozilla-central  f8ac1b911fa4   bm-xserve17
Dec 09 ~0600   mozilla-central  85507cfcdda8   moz2-darwin9-slave02
Dec 13 ~1900   mozilla-central  54601bbd453c   bm-xserve19
Dec 13 ~0140   mozilla-1.9.1    58e38927aaca   moz2-darwin9-slave02

Note that assertions only became fatal for m-c when bug 463681 landed at 10:59 PST Dec 12, and aren't yet on m-1.9.1. This doesn't seem like an infrastructure problem, but an intermittent code issue.
Component: Release Engineering → General
Product: mozilla.org → Core
QA Contact: release → general
Version: other → Trunk
Not seem in win32 or linux logs either.
Summary: Intermittent bustage on "OS X 10.5.2 mozilla-central leak test build" box. Cause is "ASSERTION: leaking stream event: 'Not Reached' → ASSERTION: leaking stream event: 'Not Reached' (intermittent bustage on leak builds)
Component: General → XPCOM
QA Contact: general → xpcom
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.5/1238462036.1238464194.29989.gz
OS X 10.5.2 mozilla-1.9.1 leak test build on 2009/03/30 18:13:56
Flags: wanted1.9.1?
Blocks: 438871
Depends on: 224783
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.5/1239389297.1239394563.14596.gz
OS X 10.5.2 mozilla-1.9.1 leak test build on 2009/04/10 11:48:17
Hardware: PowerPC → All
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1239727268.1239728107.14231.gz&fulltext=1
OS X 10.5.2 mozilla-central leak test build on 2009/04/14 09:41:08
This looks like a fundamental issue in the design of NS_AsyncCopy:

http://mxr.mozilla.org/mozilla-central/source/xpcom/io/nsStreamUtils.cpp#147

I wonder if it would make sense for objects owned by dying threads to automatically transfer ownership to the main thread? That would ensure objects can always be cleaned up.
Attached patch v1.0 (obsolete) — Splinter Review
This is what I've come up with based on roc's comment.  It made sense to me, and I've added a comment explaining why doing that should be safe.

We may want to consider adding a new error code to specifically say the event did not dispatch because the thread is dead, but this is probably fine.
Assignee: nobody → sdwilsh
Status: NEW → ASSIGNED
Attachment #372749 - Flags: superreview?(benjamin)
Attachment #372749 - Flags: review?(bzbarsky)
Won't that cause whatever callback code the event will trigger to be called on a thread where it doesn't expect to be called?
Yes, but I think that it is safe (see the comment I added in the patch). I'm not sold on that though...
I don't quite understand the bug yet... which thread is disappearing? Why isn't that threading cancelling the async-copy before terminating? Unilaterally dispatching to the main thread sounds worse than the existing situation, to be honest.
I don't think we know which thread is disappearing, nor can I think of a way to determine that (ideas?).

If this were consistently reproducible, it'd be easy to fix too, of course.
Attached patch logging v1.0Splinter Review
Better idea.  Let's add some logging to try and figure out what is not canceling the request before shutdown.  We'll have to keep this in until we hit this sadly...
Attachment #372749 - Attachment is obsolete: true
Attachment #372895 - Flags: superreview?(benjamin)
Attachment #372895 - Flags: review?(benjamin)
Attachment #372749 - Flags: superreview?(benjamin)
Attachment #372749 - Flags: review?(bzbarsky)
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1239998142.1239998512.8631.gz
OS X 10.5.2 mozilla-central leak test build on 2009/04/17 12:55:42
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1240313617.1240314963.28914.gz
WINNT 5.2 mozilla-central leak test build on 2009/04/21 04:33:37
OS: Mac OS X → All
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1240324419.1240324888.18011.gz
OS X 10.5.2 mozilla-central leak test build on 2009/04/21 07:33:39  

This is happening an awful lot lately.
Comment on attachment 372895 [details] [diff] [review]
logging v1.0

In bsmedberg's absence, and considering that this is all #ifdef DEBUG, r=me to check this in so we can track this down.
Attachment #372895 - Flags: superreview?(benjamin)
Attachment #372895 - Flags: review?(benjamin)
Attachment #372895 - Flags: review+
I changed some of the logging code to not crash:
http://hg.mozilla.org/mozilla-central/rev/5617c22ea9ac
Hrm - maybe this is a threadpool that is calling NS_AsyncCopy?  The mTarget address that we have is never printed.  That, or the pointers are slightly different because of different interfaces.  I haven't dealt with vtables recently enough to recall how that works...
If having different interfaces were the issue, the pointers would be close, but they aren't.

mTarget is an nsIEventTarget, which is either (acording to MXR) an nsThread, nsSocketTransportService, nsIOThreadPool, nsStreamTransportService, or nsDOMThreadService.  Looking at who calls this code would likely reduce the possibilities further, e.g., starting from http://mxr.mozilla.org/mozilla-central/search?string=NS_NewOutputStreamReadyEvent
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1240423793.1240425433.3274.gz
OS X 10.5.2 mozilla-central leak test build on 2009/04/22 11:09:53

Anybody mind if I back the current logging patch out until we have something else to try?
Actually, I'm testing a new logging patch.
Feel free to backout or add a new logging patch.  I meant to do it this morning, but I forgot.  I'll let you take care of it with a new patch though.
Assignee: sdwilsh → nobody
Status: ASSIGNED → NEW
I landed the backout
http://hg.mozilla.org/mozilla-central/rev/df5794cb4d9a
and gave up on the new logging patch since nsGetTypeName doesn't really work given the symbols that we strip out.  (If only we had simple code lying around to use the debug symbols, which do have the information we need...)
Also, two notes about future debugging patches:

 * we can use DEBUG_cltbld to make it show up for leak tinderboxes but not other people running debug builds (just test with DEBUG when developing and then change before landing)

 * all the assertions were for the nsOutputStreamReadyEvent case, so we don't need to log the nsInputStreamReadyEvent case
Probably the next logging patch should just instrument the constructors of all 5 of those classes.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1241369617.1241369890.2819.gz
OS X 10.5.2 mozilla-central leak test build on 2009/05/03 09:53:37
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1241719061.1241721179.4595.gz&fulltext=1
OS X 10.5.2 mozilla-central leak test build on 2009/05/07 10:57:41
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1242411599.1242411935.12543.gz
OS X 10.5.2 mozilla-central leak test build on 2009/05/15 11:19:59
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1242399602.1242402314.25999.gz
OS X 10.5.2 mozilla-central leak test build on 2009/05/15 08:00:02
possible dupes (one way or the other): bug 224783, bug 434559
Flags: wanted1.9.1?
Mass marking whiteboard:[orange] bugs WFM (to clean up TBPL bug suggestions) that:
* Haven't changed in > 6months
* Whose whiteboard contains none of the strings: {disabled,marked,random,fuzzy,todo,fails,failing,annotated,leave open,time-bomb}
* Passed a (quick) manual inspection of bug summary/whiteboard to ensure they weren't a false positive.

I've also gone through and searched for cases where the whiteboard wasn't labelled correctly after test disabling, by using attachment description & basic comment searches. However if the test for which this bug was about has in fact been disabled/annotated/..., please accept my apologies & reopen/mark the whiteboard appropriately so this doesn't get re-closed in the future (and please ping me via IRC or email so I can try to tweak the saved searches to avoid more edge cases).

Sorry for the spam! Filter on: #FFA500
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
Whiteboard: [orange]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: