Closed
Bug 634542
Opened 14 years ago
Closed 14 years ago
Spike in crashes [@ call_trace ] among users with Firebug installed + [@ JSStackFrame::callObj() ]
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
blocking2.0 | --- | betaN+ |
People
(Reporter: scoobidiver, Assigned: luke)
References
Details
(Keywords: crash, regression, topcrash, Whiteboard: [hardblocker][has patch][landed already?])
Crash Data
Attachments
(1 file, 5 obsolete files)
4.71 KB,
patch
|
gal
:
review+
|
Details | Diff | Splinter Review |
It is a low volume crash, but from 4.0b12pre/20110215, there is a spike in crashes.
It is #4 top crasher in this build.
Signature call_trace
UUID d9f5abe0-92a7-4bac-9855-124262110216
Time 2011-02-16 00:23:46.49567
Uptime 1628
Last Crash 1658 seconds (27.6 minutes) before submission
Install Age 1628 seconds (27.1 minutes) since version was first installed.
Product Firefox
Version 4.0b12pre
Build ID 20110215030353
Branch 2.0
OS Mac OS X
OS Version 10.6.4 10F569
CPU amd64
CPU Info family 6 model 15 stepping 10
Crash Reason EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
Crash Address 0x6a
App Notes Renderers: 0x22600,0x20400
Frame Module Signature [Expand] Source
0 XUL call_trace js/src/jsvalue.h:1199
1 XUL js_TraceObject js/src/jsobj.cpp:6507
2 XUL js::gc::MarkChildren js/src/jsgcinlines.h:289
3 XUL js::MarkRuntime js/src/jsgcinlines.h:347
4 XUL js_GC js/src/jsgc.cpp:2407
5 XUL nsXPConnect::Collect js/src/xpconnect/src/nsXPConnect.cpp:407
6 XUL nsXPConnect::GarbageCollect js/src/xpconnect/src/nsXPConnect.cpp:415
7 XUL nsJSContext::CC dom/base/nsJSEnvironment.cpp:3400
8 XUL nsJSContext::MaybeCC dom/base/nsJSEnvironment.cpp:3490
9 XUL GCTimerFired dom/base/nsJSEnvironment.cpp:3469
10 XUL nsTimerImpl::Fire xpcom/threads/nsTimerImpl.cpp:425
11 XUL nsTimerEvent::Run xpcom/threads/nsTimerImpl.cpp:517
12 XUL nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:633
13 XUL NS_ProcessPendingEvents_P nsThreadUtils.cpp:200
14 XUL nsBaseAppShell::NativeEventCallback widget/src/xpwidgets/nsBaseAppShell.cpp:132
15 XUL nsAppShell::ProcessGeckoEvents widget/src/cocoa/nsAppShell.mm:399
16 CoreFoundation CoreFoundation@0x4de90
17 CoreFoundation CoreFoundation@0x4c088
The regression window is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=26421a3b68f3&tochange=2c646d10b9c7
More reports at:
https://crash-stats.mozilla.com/report/list?product=Firefox&range_value=4&range_unit=weeks&signature=call_trace
Reporter | ||
Updated•14 years ago
|
blocking2.0: --- → ?
This falls under random GC marking crashes.
Blocks: 613650
blocking2.0: ? → -
Assignee | ||
Comment 2•14 years ago
|
||
Having a single GC crash address rocket to #4 (almost tying #2) starting with a single nightly build seems causal.
Comment 3•14 years ago
|
||
This looks worrysome. Can we try to investigate the stacks? No need to block until we have STRs but early analysis would be good.
blocking2.0: - → ?
Comment 4•14 years ago
|
||
So far it is just a one-day spike: the volume on Feb 16 is low again. (There are plenty of crashes already today from the Feb 15 build.) I'd wait a day or two to see if it persists before bothering with it too much.
Comment 5•14 years ago
|
||
OK, will re-evaluate tomorrow.
Reporter | ||
Comment 6•14 years ago
|
||
> So far it is just a one-day spike
It is #2 top crasher in the Feb 16 build. Until now, crash report numbers are the following ones:
2011021600 27
2011021500 51
2011021400 1
2011020300 8
2011020100 1
Comments say:
"Was cmd + tab'ing from browser to text editor"
"Just closed google wave."
"this is cause in firefox 4 nightly and with Firebug a 1.7.ax11"
Assignee | ||
Comment 7•14 years ago
|
||
The 20 crashes I looked at all had Firebug installed.
Summary: Spike in crashes [@ call_trace ] → Spike in crashes [@ call_trace ] among users with Firebug installed
Yep, looks like all the users have either Firebug 1.7X.0a10 (released Feb. 11 12:58pm) or 1.7X.0a11 (released Feb. 16 2:12am), so it looks like the users seeing this are running the current Firebug alphas for Firefox 4.
Comment 9•14 years ago
|
||
Tentatively marking as a hardblocker since this looks like a "firebug makes me crash" scenario and its position in terms of overall crashiness might underestimate its importance.
blocking2.0: ? → final+
Summary: Spike in crashes [@ call_trace ] among users with Firebug installed → Spike in crashes [@ call_trace ]
Whiteboard: [hardblocker]
Updated•14 years ago
|
Summary: Spike in crashes [@ call_trace ] → Spike in crashes [@ call_trace ] among users with Firebug installed
Comment 10•14 years ago
|
||
Bug 614347 landed at 1pm on 2011-02-16, and I have a (small) hope that it might help.
Comment 11•14 years ago
|
||
Steve and I analyzed this, and it turns out Luke came to the same conclusion independently. The crash is inside call_trace:
uintN count = fp->script()->bindings.countArgsAndVars();
// fp->script() is a jsval tag, so we crash trying to dereference
// |bindings|.
So it seems that a call object has a private that points to a random jsval instead of the stack frame. This could happen if the function that owns the call object exits without clearing the call object's fp pointer. Then it would be fairly likely that the target of the fp pointer would get overwritten with stack jsvals, causing us to crash just like this.
Given that all these seem happen with Firebug, we thought that it might have something to do with JS_EvaluateInStackFrame. EvalInStackFrame calls JS_GetScopeChain, which can create a call object on demand. In particular, if the current stack frame is an eval frame, then we would create a call object for it, but the script epilogue code doesn't clear call objects for non-strict eval frames. So that's the best guess right now for the exact cause. In any case, we strongly suspect that some function-return path skips clearing the call object's fp when it shouldn't.
Next steps are to try to repro this condition with Firebug and/or land diagnostic patches to test for that condition in the field.
Assignee: general → sphink
Status: NEW → ASSIGNED
Assignee | ||
Comment 12•14 years ago
|
||
Unfortunately, the suspected bug in comment 11 wasn't a bug, so we still don't have a specific cause. It still seems likely, though, that an eval frame is being given its own call object, so this is a patch to try to catch that bug close to the source and crash informatively. Also, I added some code to take a copy of the JSStackFrame near the crash in call_trace to better diagnose crashes. I'll wait for green on try, of course, but asking for approval to push to m-c.
Attachment #513325 -
Flags: review?
Attachment #513325 -
Flags: approval2.0?
Comment 13•14 years ago
|
||
Comment on attachment 513325 [details] [diff] [review]
not fix, find bugs
Approving late-stage telemetry.
Attachment #513325 -
Flags: approval2.0? → approval2.0+
Comment 14•14 years ago
|
||
Comment on attachment 513325 [details] [diff] [review]
not fix, find bugs
Confirmed that the memcpy is not optimized away in MSVC. I think I've done it just like this before *except* I didn't make the global symbol public, and it didn't work. So that's the key. (Another way to do it is to write your own memcpy using volatile ptr arguments--that works too.)
Attachment #513325 -
Flags: review? → review+
Assignee | ||
Comment 15•14 years ago
|
||
Diagnosis landed:
http://hg.mozilla.org/mozilla-central/rev/c1b04cd37871
Comment 16•14 years ago
|
||
Dunno if I'm searching correctly or not, but I don't see any crashes coming in with the js_GetCallObject or PutActivationObjects signatures. I see a dozen more call_trace crashes today from last night's build. I haven't figured out how to get the crash dump data yet (my LDAP bit is flipped, so that's not the problem), so I don't know if Luke's frame copy is showing up properly.
Assignee | ||
Comment 17•14 years ago
|
||
I was also just looking at this. Yeah, it seems none of the crash-asserts tripped. The frame copy did indeed work, but it indicates a clobbered stack frame (which we already knew for the case where we crash getting fp->script; the copy was hoping that we'd crash at 0xbad (with a non-trash frame)).
I'm going to work on another telemetry patch.
Comment 18•14 years ago
|
||
FWIW, the 64-bit ones seem to be almost all near-null crashes (one at ~0LL, a bunch at 0x0, and a bunch more < 0xFF); the 32-bit ones are mostly within 0xFF of 0xffffffffffff0000 ... which I guess is Socorro for 0xffff0000?
Whiteboard: [hardblocker] → [hardblocker][needs telemetry]
Assignee | ||
Comment 19•14 years ago
|
||
This patch smuggles the originating creation site of the call object into the call object's flags bits. It also records some chars of the call object's function's fun->atom/script->filename. Lastly, it records whether this call object was passed up in some PutActivation objects call.
Attachment #513325 -
Attachment is obsolete: true
Attachment #513672 -
Flags: review?
Assignee | ||
Comment 20•14 years ago
|
||
Comment on attachment 513672 [details] [diff] [review]
not fix, find bugs 2
Gotta stop assuming :dmandelin works.
Attachment #513672 -
Flags: review? → review?(dmandelin)
Comment 21•14 years ago
|
||
(In reply to comment #19)
> Created attachment 513672 [details] [diff] [review]
> not fix, find bugs 2
>
> This patch smuggles the originating creation site of the call object into the
> call object's flags bits. It also records some chars of the call object's
> function's fun->atom/script->filename. Lastly, it records whether this call
> object was passed up in some PutActivation objects call.
Huh? The attached patch is the same as the previous, except it buffers the captured frame on both sides. Did you attach the right one?
Assignee | ||
Comment 22•14 years ago
|
||
Thanks for pointing that out Steve. Patch looking green on try; anyone feel free to jump in and r+ so I can get this in before the nightly.
Attachment #513672 -
Attachment is obsolete: true
Attachment #513699 -
Flags: review?(dmandelin)
Attachment #513672 -
Flags: review?(dmandelin)
Updated•14 years ago
|
Attachment #513699 -
Flags: review?(dmandelin) → review+
Assignee | ||
Comment 23•14 years ago
|
||
Diagnostic 2:
http://hg.mozilla.org/mozilla-central/rev/5dd4f10fb1fd
Assignee | ||
Comment 24•14 years ago
|
||
There have been some crashes with useful data (some now in JSStackFrame::callObj (called by call_trace, right before call_trace would have crashed)).
The origin of the un-put Call object is JS_GetFrameCallObject, which means we are in the rare case that a call objects is created for a non-heavyweight function (which is exercised by firebug for breakpoints/watch exprs/etc).
I scanned the relevant interp/mjit places for this case and nothing jumped out. For at least one crash, the call object was created for some unnamed function in this (beautified and not-containing-personal-info) Facebook script: http://pastebin.mozilla.org/1082776. I played around with Firebug on Facebook (breaking on this exact script) to no avail.
Assignee | ||
Comment 25•14 years ago
|
||
Attachment #513699 -
Attachment is obsolete: true
Attachment #513812 -
Flags: review?(dvander)
Assignee | ||
Comment 26•14 years ago
|
||
I keep hoping someone will pop up with a simple STR but, until then, the well isn't dry. This patch checks some new key asserts in opt builds and collects more data about the source of the call object.
Attachment #513813 -
Flags: review?(dvander)
Comment on attachment 513812 [details] [diff] [review]
patch to backout previous patches
> JS_ASSERT(fp->isFunctionFrame() && !fp->isEvalFrame());
>+
>+ if (fp->isEvalFrame())
>+ *((int *)0x1337) = 0;
Is this a new diagnostic or part of the backout?
Attachment #513812 -
Flags: review?(dvander) → review+
Assignee | ||
Comment 28•14 years ago
|
||
(In reply to comment #27)
> Is this a new diagnostic or part of the backout?
Oops, meant to back that out too, good catch.
Comment 30•14 years ago
|
||
> I keep hoping someone will pop up with a simple STR
not much useful in crash meta data.
Correlation to startup or time of session
42 total crashes for call_trace on 20110219-crashdata.csv
5 startup crashes inside 30 sec.
14 startup crashes inside 3 min.
7 repeated crashes inside 3 min. of last crash
urls for testing
16 \N
2 jar:file:///C:/Program%20Files%20(x86)/Minefield/omni.jar!/chrome/browser/content/browser/aboutHome.xhtml
2 http://lmorchard.com/status/main/ostatussub
1 wyciwyg://144/http://talkgadget.google.com/talkgadge
1 jar:file:///C:/Program%20Files%20(x86)/Minefield/omni.jar!/chrome/browser/content/browser/aboutPrivateBrowsing.xhtml
1 https://www.google.com/accounts/ServiceLogin?service=mail&passive=true&rm=false&continue=http...
1 https://www.facebook.com/pagelet/generic.php/PersonalProfileWallTabPagelet?__a=3&ajaxpipe=1&data=%7B%22profile_id%22%3A516463201%2C%22q%22%3A%22lmorchard%22%2C%22filter%22%3A1%2C%22can_see_more%22%3Atrue%2C%22pager_load_count%22%3A1%7D
1 https://0-176.channel.facebook.com/iframe/11?r=https%3A%2F%2Fs-static.ak.facebook.com%2Frsrc.php%2Fv1%2Fyg%2Fr%2FPzAcNRMfs_y.js&r=https%3A%2F%2Fs-static.ak.facebook.com%2Frsrc.php%2Fv1%2Fy-%2Fp%2Fr%2FggG67ARnbnR.js&r=https%3A%2F%2Fs-static.ak.facebook.com
1 http://www.theonion.com/articles/area-dad-just-wants-computer-with-the-basics,19195/
1 http://www.ligtv.com.tr/Default.aspx?r=1&hid=87601
1 http://www.jamesming.com/corkhub/index.php/home/vendors
Comment 31•14 years ago
|
||
this should probably block the next beta since it seems to be a significant and sustained regression since the last beta starting around 2011021503 or maybe just a bit before.
call_trace
date total breakdown by build
crashes count build, count build, ...
20110210
20110211 2 1 4.0b92011011019,
1 4.0b112011020314,
20110212 1 4.0b92011011019 1 ,
20110213 3 2 4.0b112011020314,
1 4.0b82010121417,
20110214 1 4.0b12pre2011021403 1 ,
20110215 15 13 4.0b12pre2011021503,
2 4.0b112011020314,
20110216 61 37 4.0b12pre2011021503,
21 4.0b12pre2011021603, 2 4.0b112011020314,
1 4.0b102011012116,
20110217 41 23 4.0b12pre2011021603,
14 4.0b12pre2011021703, 2 4.0b12pre2011021503,
1 4.0b82010121417, 1 4.0b12pre2011021712,
20110218 49 19 4.0b12pre2011021803,
14 4.0b12pre2011021712, 9 4.0b12pre2011021703,
3 4.0b112011020314, 2 4.0b12pre2011021603,
1 4.0b92011011019, 1 4.0b12pre2011021503,
20110219 41 23 4.0b12pre2011021803,
9 4.0b12pre2011021903, 4 4.0b12pre2011021712,
3 4.0b12pre2011021703, 1 4.0b82010121417,
1 4.0b12pre2011021814,
Comment 32•14 years ago
|
||
> 2 http://lmorchard.com/status/main/ostatussub
Les, are you seeing this crash often enough that you could try to hit it? If so, getting it in a debugger and/or a minidump/core dump would be probably be very helpful.
Reporter | ||
Comment 33•14 years ago
|
||
The JSStackFrame::callObj() crash signature caused by the diagnostic patch is currently #2 top crasher in 4.0b12pre.
Comment 34•14 years ago
|
||
STR:
1) Install Firebug 1.7a11
2) enable it for all sites
3) Open Google in tab 1
3) Open Google Docs in tab 2
4) Open Google Docs in tab 3
5) In tab 3, create a new presentation
6) Change the title, subtitle to something
7) Close tab 3 and immediately close tab 2
https://crash-stats.mozilla.com/report/index/bp-b3b5d55b-41bd-4b68-b820-1609e2110221
Comment 35•14 years ago
|
||
Btw, these STR may not be minimal. If you create a new presentation it opens in a new tab. And if I close the presentation I get a popup where I click "Leave page". This crashes every time, most often @call_trace, but sometimes @js_TraceObject: https://crash-stats.mozilla.com/report/index/bp-b2b49d12-6267-4ce0-a36f-e3d2a2110221
Let me know if you need more detailed steps or anything.
Comment 36•14 years ago
|
||
resetting the blocking flag so this gets reconsideration for beta blocking.
also noting that the JSStackFrame::callObj.. signature has a slightly later spike profile where the bump was not seen until builds from feb 19.
JSStackFrame::callObj..
date total breakdown by build
crashes count build, count build, ...
20110215 6 2 4.0b112011020314,
1 4.0b92011011019, 1 4.0b82010121417,
1 4.0b12pre2011021403, 1 4.0b102011012116,
20110216 5 4 4.0b112011020314,
1 4.0b102011012116,
20110217 1 4.0b112011020314 1 ,
20110218 1 4.0b112011020314 1 ,
20110219 16 11 4.0b12pre2011021903,
5 4.0b112011020314,
20110220 52 26 4.0b12pre2011022003,
21 4.0b12pre2011021903, 5 4.0b112011020314,
combined, the two signatures are pretty close to the top crash.
Updated•14 years ago
|
Summary: Spike in crashes [@ call_trace ] among users with Firebug installed → Spike in crashes [@ call_trace ] among users with Firebug installed + [@ JSStackFrame::callObj() ]
Updated•14 years ago
|
blocking2.0: final+ → ?
Comment 37•14 years ago
|
||
(In reply to comment #35)
> Btw, these STR may not be minimal. If you create a new presentation it opens in
> a new tab. And if I close the presentation I get a popup where I click "Leave
> page". This crashes every time, most often @call_trace, but sometimes
> @js_TraceObject:
> https://crash-stats.mozilla.com/report/index/bp-b2b49d12-6267-4ce0-a36f-e3d2a2110221
>
> Let me know if you need more detailed steps or anything.
(In reply to comment #34)
> STR:
>
> 1) Install Firebug 1.7a11
> 2) enable it for all sites
> 3) Open Google in tab 1
> 3) Open Google Docs in tab 2
> 4) Open Google Docs in tab 3
> 5) In tab 3, create a new presentation
> 6) Change the title, subtitle to something
> 7) Close tab 3 and immediately close tab 2
>
> https://crash-stats.mozilla.com/report/index/bp-b3b5d55b-41bd-4b68-b820-1609e2110221
Great, I was able to reproduce using these steps! Just to note that I have created new presentation in tab 4 and then closed 4, 3, 2 -> crash.
https://crash-stats.mozilla.com/report/index/1da02646-4996-4b74-aa07-605462110221
Honza
blocking2.0: ? → ---
Updated•14 years ago
|
blocking2.0: --- → ?
Comment 38•14 years ago
|
||
(In reply to comment #32)
> > 2 http://lmorchard.com/status/main/ostatussub
>
> Les, are you seeing this crash often enough that you could try to hit it? If
> so, getting it in a debugger and/or a minidump/core dump would be probably be
> very helpful.
I don't think I've gotten it in awhile. But, I'll take a look at getting Minefield running in a debugger - never done it before
Assignee | ||
Comment 39•14 years ago
|
||
Beautiful, can repro locally with a debug build.
Assignee | ||
Comment 40•14 years ago
|
||
Despite looking at the js patches in the regression range like 10 times, there it is: a missing call to PutActivationObjects on the new cleared-scope error path.
Assignee | ||
Comment 41•14 years ago
|
||
Backed out the diagnostics:
http://hg.mozilla.org/mozilla-central/rev/2ebeb6e9f036
This patch fixes the reproducible crash in call_trace for me and should, based on the regression range, fix the top crashes.
With the error->report->firebug->dangling-call-object mindset, I can see several other (highly unlikely) error paths that could cause the same bug (perhaps this is the low-volume call_trace crash history). Rather than requiring each error source PutActivationObjects (like this patch), I think instead we should refactor a bit so that PutActivationObjects is always called on the error path.
Perhaps we can land/verify this fix tonight and do the bigger/better patch next?
Assignee: sphink → lw
Attachment #513812 -
Attachment is obsolete: true
Attachment #513813 -
Attachment is obsolete: true
Attachment #514066 -
Flags: review?(gal)
Attachment #513813 -
Flags: review?(dvander)
Assignee | ||
Updated•14 years ago
|
Whiteboard: [hardblocker][needs telemetry] → [hardblocker][has patch]
Updated•14 years ago
|
Attachment #514066 -
Flags: review?(gal) → review+
Assignee | ||
Comment 42•14 years ago
|
||
Landed the quick fix; will file follow-up bug for the rest of the stuff in comment 41.
http://hg.mozilla.org/mozilla-central/rev/73d941bd2ef2
Shouldn't it be a call to ScriptEpilogue()?
Assignee | ||
Comment 44•14 years ago
|
||
No, because ScriptPrologue hasn't been called yet for the frame. I think that's the root of this family of bugs: creating the call object is done before the prologue, but cleaned up by the epilogue.
Comment 45•14 years ago
|
||
This landed, so marking betaN+ for bookkeeping changes, but wondering why it isn't marked FIXED? Is that because the follow-up bug needs to be filed? Or was that the also already landed bug 635805?
blocking2.0: ? → betaN+
Whiteboard: [hardblocker][has patch] → [hardblocker][has patch][landed already?]
Assignee | ||
Comment 46•14 years ago
|
||
The bug is for a spike in nightly crashes so I was planning to watch crashes for a day to verify that the crashes went away / down to their original levels. I am fine with marking the bug FIXED now though if that's the protocol.
Also, the followup was filed; forgot to link here: bug 635811.
Comment 47•14 years ago
|
||
I could not reproduce the bug using steps in comment #37 in the nightly
Built from http://hg.mozilla.org/mozilla-central/rev/1da3405c74fd
Honza
Comment 48•14 years ago
|
||
We can reopen if this didnt fix it.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 49•14 years ago
|
||
Looks fixed; no call_trace crashes since 20110221030349.
Updated•13 years ago
|
Crash Signature: [@ call_trace ]
[@ JSStackFrame::callObj() ]
You need to log in
before you can comment on or make changes to this bug.
Description
•