Closed Bug 601725 Opened 14 years ago Closed 12 years ago

Firefox 4.0b7pre Crash Report [@ StartRequest ]

Categories

(Core :: JavaScript Engine, defect, P2)

x86
All
defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
blocking2.0 --- -
status2.0 --- wanted

People

(Reporter: chofmann, Assigned: jorendorff)

References

Details

(Keywords: crash)

Crash Data

these crashes started slowly increasing around sept 26 in builds from the 25th, but then took a larger bump yesterday to the #5 top crash on trunk. 20100924 20100925 20100926 1 4.0b7pre2010092504 1 , 20100927 1 4.0b7pre2010092704 1 , 20100928 2 1 4.0b7pre2010092804, 1 4.0b7pre2010092704, 20100929 2 4.0b7pre2010092904 2 , 20100930 5 3 4.0b7pre2010093004, 2 4.0b7pre2010092904, 20101001 8 6 4.0b7pre2010093004, 2 4.0b7pre2010100108, 20101002 12 5 4.0b7pre2010100204, 4 4.0b7pre2010093004, 3 4.0b7pre2010100108, 20101003 28 15 4.0b7pre2010093004, 7 4.0b7pre2010100204, 4 4.0b7pre2010100304, 1 4.0b7pre2010100315, 1 4.0b7pre2010100108, stack looks like http://crash-stats.mozilla.com/report/index/32f6b4e7-b5f9-4b1b-8d9a-ed8642101002 Frame Module Signature [Expand] Source 0 mozjs.dll StartRequest js/src/jsapi.cpp:815 1 xul.dll XPCCallContext::Init js/src/xpconnect/src/xpccallcontext.cpp:165 more reports at http://crash-stats.mozilla.com/report/list?signature=StartRequest
probably should block beta 7 if the uptick continues
blocking2.0: --- → ?
FWIW, http://hg.mozilla.org/mozilla-central/rev/d70a9ed2b89e was pushed on Sept 24th which would have first showed up in Sept 25th builds - possible suspect?
blocking2.0: ? → beta7+
The first instance of this crash on record is from a 9/23 build: http://crash-stats.mozilla.com/report/index/742cdb61-8077-49a7-b231-e428b2101007 The last merge from TM before 9/23 was on 9/21: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=e4fdb23efa33&tochange=e6cfaf0840cb That group also just happens to contain the last changeset to touch StartRequest, so that changeset seems like a good starting point. http://hg.mozilla.org/mozilla-central/rev/d20abbebe373
Small correction: the merge from TM referred to in comment 4 was on 9/20.
Looks like there is no thread attached to a context? Thread data null? Brian has blame for the lines affected.
Assignee: general → bhackett1024
Priority: -- → P2
This is a NULL cx->thread on entry to StartRequest, but cx->thread was being dereferenced by StartRequest before d20abbebe373 and d20abbebe373 does not change anything around where cx->thread is written. From the stack traces in the crash reports it looks like the cx came from GetSafeJSContext in (most? all?) cases, which is again code that d20abbebe373 doesn't touch. So this might be a latent issue; I don't know XPConnect at all and am entirely unequipped to diagnose. My options are to either back out d20abbebe373 (which is a performance patch) or paper over things by adding a cx->thread NULL check on GetSafeJSContext. Is there a preference?
(In reply to comment #7) > This is a NULL cx->thread on entry to StartRequest, but cx->thread was being > dereferenced by StartRequest before d20abbebe373 and d20abbebe373 does not > change anything around where cx->thread is written. From the stack traces in > the crash reports it looks like the cx came from GetSafeJSContext in (most? > all?) cases, which is again code that d20abbebe373 doesn't touch. So this > might be a latent issue; I don't know XPConnect at all and am entirely > unequipped to diagnose. > > My options are to either back out d20abbebe373 (which is a performance patch) > or paper over things by adding a cx->thread NULL check on GetSafeJSContext. Is > there a preference? Papering over doesn't sound good. - Can you back out d20abbebe373 and reland it easily, or will there be merge pain from that? If it's easy, then it would be nice to do it now, either to find the problem or rule out this patch. If it's going to be hard, then it sounds like you don't think d20abbebe373 is the problem so it may not be worth trying at this point. On the hypothesis that d20abbebe373 is not the cause, I looked again and found 2 patches that look possibly related. lw, jorendorff, could these have anything to do with this crash? (lw; create XPCCallContext in XPC_NW_Construct) http://hg.mozilla.org/mozilla-central/rev/81b271bfaeba (jorendorff; JS_Save/RestoreFrameChain should update cx->compartment) http://hg.mozilla.org/mozilla-central/rev/a8252fe3f211
(In reply to comment #8) > - Can you back out d20abbebe373 and reland it easily, or will there be merge > pain from that? If it's easy, then it would be nice to do it now, either to > find the problem or rule out this patch. If it's going to be hard, then it > sounds like you don't think d20abbebe373 is the problem so it may not be worth > trying at this point. It wouldn't be hard to back out. The only problem I have is that if the crashes do go away it still leaves me with no clue what is going on --- it seems possible/likely that d20abbebe373 caused this failure, I just don't see how or why that could happen.
Luke also pointed me to this changeset as possibly related: (gal; TM: set right compartment in js::ctypes::CClosure::ClosureStub) http://hg.mozilla.org/mozilla-central/rev/835980f51185 It hit m-c on Oct 1, so it seems less likely.
I talked to Blake about the "XPCCallContext in XPC_NW_Construct" patch and we don't see anything suspicious.
These are on m-c now as well.
mark as fixed, or wait to see if we can confirm by looking for the absence of these in builds after oct 20? latest stats are: date tl crashes at, count build, count build, ... StartRequest 20101014 18 6 4.0b8pre2010101404, 5 4.0b8pre2010101304, 5 4.0b8pre2010101204, 2 4.0b8pre2010101104, 20101015 7 4 4.0b8pre2010101504, 2 4.0b8pre2010101204, 1 4.0b7pre2010100204, 20101016 7 3 4.0b8pre2010101504, 2 4.0b8pre2010101404, 1 4.0b8pre2010101604, 1 4.0b8pre2010101322, 20101017 5 2 4.0b8pre2010101704, 1 4.0b8pre2010101604, 1 4.0b8pre2010101404, 1 4.0b7pre2010100108, 20101018 7 3 4.0b8pre2010101404, 2 4.0b7pre2010100204, 1 4.0b8pre2010101704, 1 4.0b8pre2010101504, 20101019 3 4.0b8pre2010101904 3 ,
OS: Mac OS X → Windows XP
(In reply to comment #14) > mark as fixed, or wait to see if we can confirm by looking for the absence of > these in builds after oct 20? The backout is an experiment--we need to try to confirm. So far, I see there is one crash with a build id from 10/20, and it doesn't seem to be the standard m-c build, but it seems to be from a cset after the backout, so it looks like it is not fixed. We could try http://hg.mozilla.org/mozilla-central/rev/a8252fe3f211 if that doesn't work. Otherwise, someone should inspect the code for other clues about the problem or what experiments we can do to diagnose. I can't get to that for a bit yet--I am currently busy with 2 other b7-blocking topcrashes.
It looks like this crash is still there after the backout: http://crash-stats.mozilla.com/report/index/57c03edb-7c74-4823-ba02-bb0a62101021
Brian's patch should go back in. This bug smells like compartments. /be
(In reply to comment #17) > Brian's patch should go back in. This bug smells like compartments. I'd like to see at least one crash from a standard m-c nightly post-backout before doing that. But once we have ruled out this patch (which I expect we will), let's reland.
(In reply to comment #19) > Is this one from a nightly? > > http://crash-stats.mozilla.com/report/index/7aeb3b5f-51dd-45ad-9a4a-8aa4b2101021 Yes, that's a nightly. Go ahead an reland. Thanks for trying this out for me. Does anyone think it's worth trying backing out http://hg.mozilla.org/mozilla-central/rev/a8252fe3f211?
Don't back it out yet! That patch is righteous and necessary for correctness in a lot more cases. If it caused this crash, it's revealing a deeper issue that we have to look at. I'll look tomorrow.
(In reply to comment #21) > Don't back it out yet! That patch is righteous and necessary for correctness in > a lot more cases. If it caused this crash, it's revealing a deeper issue that > we have to look at. I'll look tomorrow. OK, I won't back out anything. Btw, I don't have any particularly good reason to suspect that patch: it's just in the right time range and seems at least peripherally related to the issue. For this crash, I think it needs someone to dive into the code and see what's going on, or what experiments we can do to get more data. I figured trying to back out a patch or two would have a small probability of finding the problem until it can get proper attention. Anyway, it didn't work. :-)
Assignee: bhackett1024 → jorendorff
Do we really feel like this is a topcrash worthy of blocking b7? I'm seeing, what, maybe 10 reports a day?
daily crash rate is variable. about 3-18 crashes per day in the last ten days, and we had that one spike of 28 crashes on oct 3. this is a new crash since beta5/6 so its hard to predict where this will go. if it's site content that tickles this bug directly, or indirectly through GC or other areas then the set of sites in the url list would expose the bug to a lot of users. 180 about:blank 102 \N 99 18 http://www.facebook.com/ 17 http://www.facebook.com/home.php? 13 jar:file:///C:/Arquivos%20de%20programas/Minefield/omni.jar!/chrome/browser/content/browser/aboutHome.xhtml 13 http://en-us.start3.mozilla.com/firefox?client=firefox-a&rls=org.mozilla:en-US:official 11 http://www.orkut.com.br/Main#Home 10 about:sessionrestore 7 http://static.ak.facebook.com/common/redirectiframe.html 7 http://maps.google.com/ 7 http://apps.facebook.com/onthefarm/index.php 7 http://apps.facebook.com/logout.php 6 https://mail.google.com/mail/?shva=1#inbox 6 http://www.orkut.com.br/Home 6 http://www.google.com/ 6 http://www.google.com.br/ 6 http://www.facebook.com/?ref=home
also theories about the possibility this is related to this family of bugs should be given a chance to pan out. Bug 604756 - crash [@ JSString::flatten() ] [@ JSString::flatten ] Bug 605836 - crash [@ js_NewGCShortString(JSContext*) see comments which might trace back to this bug. https://bugzilla.mozilla.org/show_bug.cgi?id=605836#c2 https://bugzilla.mozilla.org/show_bug.cgi?id=604756#c14
(In reply to comment #23) > Do we really feel like this is a topcrash worthy of blocking b7? I'm seeing, > what, maybe 10 reports a day? My vote would be no. It's currently #71. I don't know how many blocking topcrashes we have, but somehow I doubt it's around 70. (In reply to comment #24) > daily crash rate is variable. about 3-18 crashes per day in the last ten > days, and we had that one spike of 28 crashes on oct 3. > > this is a new crash since beta5/6 so its hard to predict where this will go. True. > if it's site content that tickles this bug directly, or indirectly through GC > or other areas then the set of sites in the url list would expose the bug to a > lot of users. I would say it's very unlikely it's certain site content. Given the signature and the proximate cause, it is probably related to to threading or compartments somehow. (In reply to comment #25) > also theories about the possibility this is related to this family of bugs > should be given a chance to pan out. > > Bug 604756 - crash [@ JSString::flatten() ] [@ JSString::flatten ] > Bug 605836 - crash [@ js_NewGCShortString(JSContext*) They might be loosely related (possibly both threading-related), but it is unlikely they are closely related: the flatten crash started much earlier in time.
Moving this to betaN+ based on comment 26
blocking2.0: beta7+ → betaN+
blocking2.0: betaN+ → -
status2.0: --- → wanted
It is #38 top crasher in 4.0b9 for the last week.
In bug 671840 with a [@ StartRequest ] crash signature the reporter has this STR: "I set Mozilla Firefox as a scheduled task to run every day at a given time and show a specific page. (Normally, the page loads without any problem!)" And that makes Firefox 5 crash when Firefox is not already running.
It's #29 top crasher in 9.0.1, #15 in 10.0b2, and #28 in 11.0a2 on Mac OS X. The stack trace now looks like: Frame Module Signature [Expand] Source 0 XUL StartRequest js/src/jsapi.cpp:944 1 XUL XPCCallContext::Init js/xpconnect/src/XPCCallContext.cpp:148 2 XUL XPCCallContext::XPCCallContext js/xpconnect/src/XPCCallContext.cpp:63 3 XUL nsXPConnect::BeginCycleCollection js/xpconnect/src/nsXPConnect.cpp:469 4 XUL nsCycleCollector::BeginCollection xpcom/base/nsCycleCollector.cpp:2733 5 XUL nsCycleCollectorRunner::Run xpcom/base/nsCycleCollector.cpp:3504 6 XUL nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:631 7 XUL NS_ProcessNextEvent_P obj-firefox/i386/xpcom/build/nsThreadUtils.cpp:245 8 XUL nsThread::ThreadFunc xpcom/threads/nsThread.cpp:272 9 libnspr4.dylib _pt_root nsprpub/pr/src/pthreads/ptthread.c:187 10 libSystem.B.dylib _pthread_start 11 libSystem.B.dylib thread_start
Severity: normal → critical
Crash Signature: [@ StartRequest]
Keywords: crash
OS: Windows XP → All
(In reply to Scoobidiver from comment #30) > The stack trace now looks like: > 4 XUL nsCycleCollector::BeginCollection > xpcom/base/nsCycleCollector.cpp:2733 > 5 XUL nsCycleCollectorRunner::Run xpcom/base/nsCycleCollector.cpp:3504 > 6 XUL nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:631 > 7 XUL NS_ProcessNextEvent_P > obj-firefox/i386/xpcom/build/nsThreadUtils.cpp:245 > 8 XUL nsThread::ThreadFunc xpcom/threads/nsThread.cpp:272 > 9 libnspr4.dylib _pt_root nsprpub/pr/src/pthreads/ptthread.c:187 > 10 libSystem.B.dylib _pthread_start > 11 libSystem.B.dylib thread_start So we start the cycle collection not from the main thread but from some other thread in response to some event???
(In reply to Igor Bukanov from comment #31) > So we start the cycle collection not from the main thread but from some > other thread in response to some event??? Technically yes, but the cycle collector thread and main thread do not run at the same time: the main thread tells the cycle collector thread to run a cycle collection and then the main thread blocks until the cycle collector thread finishes. Why this goofiness? Because gal speculated that it might trashing the caches of the core running the main thread (bug 580096). No measurements were performed and I would be happy if you removed this goofy case; it was no small annoyance to the single-threaded runtime work.
(In reply to Luke Wagner [:luke] from comment #32) > Why this goofiness? Because gal speculated that it might > trashing the caches of the core running the main thread (bug 580096). No > measurements were performed and I would be happy if you removed this goofy > case; it was no small annoyance to the single-threaded runtime work. Additionally, I think we might be using a separate thread to make it easier to implement interruptible cycle collection. (At least I think I heard that.) So we might not want to take it out.
(In reply to Bill McCloskey (:billm) from comment #33) Ah, good point.
(In reply to Scoobidiver from comment #30) > It's #29 top crasher in 9.0.1, #15 in 10.0b2, and #28 in 11.0a2 on Mac OS X. > > The stack trace now looks like: What % look like that? With BeginCycleCollection in there somewhere. I looked at maybe 6 crash reports, and only a few had BeginCycleCollection in there, so I'm curious what the rough % in the overall set might be.
reports like https://crash-analysis.mozilla.com/crash_stacks/Stack-summary-11.0a2.txt can help with questions like this to break down the distribution of like stacks for any one signature. in the browser search down to "startrequest" to find the signature or something deeper in the frame.
Thanks, I really need to write down that URL somewhere. I looked through the stack reports for January 6, searching for "|startrequest". The links I give seem to be updated daily, so you won't find the exact same thing I did 1/8 and onwards. It doesn't show up at all in 9. In 10, it is ranked 91. There are 23 stacks. https://crash-analysis.mozilla.com/crash_stacks/Stack-summary-10.0.txt - 9 look like JS_SetRuntimeThread, PR_GetCurrentThread, blank frames, start request - 8 are nsGlobalWindow::SetNewDocument eventually calling nsDOMClassInfo::DefineStaticJSVals - 5 are inside the cycle collector - 1 is some kind of thready thing involving nsRunnable::Release In 11, it is ranked 57. There are 6 stacks. - 3 are in the cycle collector - 2 are inside XPCJSContextStack::Pop whatever that is - 1 is blank
It's #100 top browser crasher but #4 on Mac OS X in 10.0.1.
Scoob, there are no crashes for this sig newer than version 11. there are other sigs that *contain* StartRequest for current versions, such as mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | nsStreamLoader::OnStartRequest(nsIRequest*, nsISupports*)
(In reply to Wayne Mery (:wsmwk) from comment #39) > Scoob, there are no crashes for this sig newer than version 11. Let's close it.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.