Closed
Bug 530955
Opened 15 years ago
Closed 15 years ago
New crash [@ ExecuteTree] in Firefox 3.6b3
Categories
(Core :: JavaScript Engine, defect)
Tracking
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
blocking1.9.2 | --- | .5+ |
status1.9.2 | --- | .5-fixed |
status1.9.1 | --- | unaffected |
People
(Reporter: jst, Assigned: dvander)
Details
(4 keywords, Whiteboard: [sg:critical][critsmash:investigating])
Attachments
(4 files)
Size | Type | Flags |
---|---|---|
3.10 KB | text/plain | |
39.07 KB | image/png | |
266 bytes | application/x-javascript | |
1.43 KB | patch | dmandelin: review+, beltzner: approval1.9.2.5+ |
There's a new crash in Firefox 3.6b3 with the signature "ExecuteTree" that hasn't been seen in any of the 3.5.* versions. So far we've seen 78+ of these crashes in the wild.
Please see http://crash-stats.mozilla.com/query/query?product=Firefox&version=Firefox%3A3.6b3&range_value=1&range_unit=weeks&query_search=signature&query_type=exact&query=ExecuteTree&do_query=1 for more crash info.
Flags: blocking1.9.2?
Comment 1•15 years ago
21 total crashes for ExecuteTree on 20091122-crashdata.csv
3 start up crashes inside 3 minutes
signature list
18 ExecuteTree
3 js_ExecuteTree
os breakdown
6 ExecuteTree Windows NT 5.1.2600 Service Pack 2
5 ExecuteTree Windows NT 5.1.2600 Service Pack 3
4 ExecuteTree Windows NT 6.1.7600
3 js_ExecuteTree Windows NT 5.1.2600 Service Pack 2
1 ExecuteTree Windows NT 6.0.6002 Service Pack 2
1 ExecuteTree Windows NT 6.0.6001 Service Pack 1
1 ExecuteTree Windows NT 5.1.2600 Service Pack 2, v.2055
distribution of all versions where the ExecuteTree crash was found on 20091122-crashdata.csv
14 Firefox 3.6b3
3 Firefox 3.6b1
3 Firefox 3.5.5
1 Firefox 3.6b2
Comment 2•15 years ago
I think this first shows up in 3.6 data on 10/16, then ramps up just after beta 1 on Nov 5.
Comment 3•15 years ago
In the lists in comment 2, the js_ExecuteTree signature is probably a different bug that is low volume on both 3.5.x and 3.6.
http://crash-stats.mozilla.com/report/list?product=Firefox&query_search=signature&query_type=exact&query=js_ExecuteTree&date=&range_value=1&range_unit=weeks&do_query=1&signature=js_ExecuteTree
That should probably be tracked separately.
The volumes shown in comment 2 cover only the ExecuteTree signature; they are higher and keep increasing as the 3.6 betas get more users.
Comment 4•15 years ago
It looks like the higher volume crash is in the called tree itself, so we'll probably need minidump inspection here to have any joy. The lower-volume 3.5+3.6 one looks to be crashing trying to call LeaveTree, can't tell much more without minidumpery.
Flags: blocking1.9.2? → blocking1.9.2+
Updated•15 years ago
Assignee: general → gal
Comment 5•15 years ago
Looks like I won the lottery. I will start looking at the stacks and take it from there.
Comment 6•15 years ago
The LeaveTree one might be an OOM condition. Focusing on the higher-volume on-trace crash. A minidump won't help much because we can't see the jitted code from the minidump. We should look at the URLs.
Comment 7•15 years ago
I spent some time browsing the URLs jst pulled for me. No luck. It's very heavy on facebook, but that might just be representative of average web use. None of the URLs crash for me. At this point my gut is telling me this is a GC issue (we GC and something isn't kept alive, causing us to die on trace). jst has 2 core files for linux in the same general area as this bug. I will look at those next.
Comment 8•15 years ago
Ok, after some digging around I have convinced myself that this bug was fixed by the following patch:
https://bugzilla.mozilla.org/show_bug.cgi?id=528048
The patch makes sure sprops stay alive after a GC if we embed them on trace. The patch landed on m-c a week ago but is not on 1.9.2 yet.
There isn't enough crash data for trunk to tell whether the patch fixed anything there. I only see 1 crash with ExecuteTree for the last 4 weeks on trunk.
Comment 9•15 years ago
This crash shows as #40 in 3.6 B4. Did the patch in Bug 528048 make it into the beta (from the checkin date it looks as if it might have)? If it is still an issue and someone can give me some URLs, I will see if I can reproduce.
Comment 10•15 years ago
marcia, jst pulled urls for me but this is a GC bug so I was never able to reproduce it. I was hoping we could confirm based on crash stats that the patch fixed the problem.
Comment 11•15 years ago
From http://hg.mozilla.org/releases/mozilla-1.9.2/pushloghtml it looks like the patch in bug 528048 went into the 1.9.2 branch well after b4 was tagged. So I think we have to wait for b5 (or RC) crash data.
Keywords: qawanted
Whiteboard: [waiting for 3.6b5 crash data]
Comment 12•15 years ago
I will look at crash stats for beta 5 and renominate for blocking if needed.
Flags: blocking1.9.2+
Comment 13•15 years ago
Adding dependency so we don't lose track of rechecking this. Sounds like beta 5 might be going out today.
Depends on: 528048
Comment 14•15 years ago
[@ ExecuteTree] went from #83 in b4 to #10 in b5 :(
http://crash-stats.mozilla.com/topcrasher/byversion/Firefox/3.6b4
http://crash-stats.mozilla.com/topcrasher/byversion/Firefox/3.6b5
Flags: blocking1.9.2?
Whiteboard: [waiting for 3.6b5 crash data]
Comment 15•15 years ago
yeah, something caused a big spike yesterday. one thing that happened was 200,000 new users moved up from beta4 to beta5 or joined the beta as new users.
https://wiki.mozilla.org/CrashKill/Crashr#Relases_3.6b5
Updated•15 years ago
Flags: blocking1.9.2? → blocking1.9.2+
Comment 16•15 years ago
not as bad as the spike of 140 crashes on 12-19, but volume is still up on 12-20 == 112 crashes.
there are a couple of 'hu' TLD's that seem to be part of the increase the last two days.
domains of sites
26 http://www.facebook.com
http://apps.facebook.com/thesummoning/
http://apps.facebook.com/inthemafia/
http://apps.facebook.com/happy-aquarium/
4 http://myvip.com
newly spotted domains on 12-19/20
23 http://www.baratikor.hu
8 http://www.baratikor.hu/
2 http://www.baratikor.hu/dating/?selectedTab=13
lots of http://www.baratikor.hu/gallery/?fid= sanitized
lots of http://www.baratikor.hu/friend/? sanitized
10 http://hi5.com
2 http://hi5.com/friend/profile/displayProfile.do?userid= sanitized
2 http://hi5.com/friend/mail/displayInbox.do
2 http://hi5.com/
5 http://iwiw.hu
2 http://iwiw.hu/pages/user/login.jsp?method=Login
2 http://iwiw.hu/
Comment 17•15 years ago
are these crashes all the latest beta? iow, have we eliminated older buildids?
Comment 18•15 years ago
Some of these are on 3.5.x, but at a much higher rate on 3.6.
checking --- 20091220-crashdata.csv ExecuteTree
release | total-crashes | ExecuteTree crashes | pct. |
---|---|---|---|
all | 215848 | 112 | 0.000518884 |
3.0.15 | 6811 | 0 | |
3.0.16 | 31577 | 0 | |
3.5.5 | 18251 | 2 | 0.000109583 |
3.5.6 | 105854 | 3 | 2.83409e-05 |
3.6b5 | 16669 | 94 | 0.00563921 |
3.6b4 | 5000 | 10 | 0.002 |
3.6b3 | 644 | 0 | |
3.6b2 | 661 | 2 | 0.00302572 |
3.6b1 | 2101 | 0 | |
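Note that the pct. column is a plain ratio (ExecuteTree crashes divided by total crashes for that release), not a percentage. A minimal sketch of that arithmetic, using three rows copied from the table; the function and variable names are made up for illustration:

```python
# Rows copied from the 20091220 table:
# release -> (total crashes, ExecuteTree crashes)
rows = {
    "3.5.6": (105854, 3),
    "3.6b4": (5000, 10),
    "3.6b5": (16669, 94),
}

def crash_fraction(total, hits):
    """Share of a release's crash reports that carry the signature."""
    return hits / total if total else 0.0

fractions = {rel: crash_fraction(t, h) for rel, (t, h) in rows.items()}
for rel, frac in fractions.items():
    # Print both the raw ratio (as in the table) and the percentage.
    print(f"{rel}: {frac:.6g} ({frac * 100:.3f}%)")
```

Read this way, 3.6b5 sits at roughly half a percent of its crash reports, about two orders of magnitude above 3.5.6.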
Comment 19•15 years ago
before this recent spike the distribution was still tilted toward 3.6bx
checking --- 20091215-crashdata.csv ExecuteTree
release | total-crashes | ExecuteTree crashes | pct. |
---|---|---|---|
all | 231799 | 47 | 0.000202762 |
3.0.15 | 41098 | 0 | |
3.0.16 | 414 | 0 | |
3.5.5 | 127012 | 7 | 5.51129e-05 |
3.5.6 | 3080 | 0 | |
3.6b5 | 209 | 0 | |
3.6b4 | 22729 | 38 | 0.00167187 |
3.6b3 | 677 | 1 | 0.0014771 |
3.6b2 | 793 | 0 | |
3.6b1 | 2265 | 1 | 0.000441501 |
Comment 20•15 years ago
This is almost certainly multiple bugs; ExecuteTree is tracemonkeyese for "running generated code". All the ILLEGAL_INSTRUCTION (and PRIV_INSTRUCTION -- exciting! are we running garbage?) ones are on AMDs, so there may be some instruction selection issues at hand.
Comment 21•15 years ago
I looked through a bunch of these and found many different stacks, with a large percentage of the crashes happening on Hungarian operating systems. Even within that subset I found different stacks. We're going to have to minus this and look for more data.
Updated•15 years ago
Flags: wanted1.9.2+
Flags: blocking1.9.2-
Flags: blocking1.9.2+
Comment 22•15 years ago
(In reply to comment #20)
> All the ILLEGAL_INSTRUCTION (and PRIV_INSTRUCTION -- exciting! are we
> running garbage?) ones are on AMDs
Most, but not all; e.g. https://crash-stats.mozilla.com/report/index/0817bbaa-f4bc-48d0-8c52-e438f2091219
Anyway, I agree this is probably multiple bugs. Unfortunately, this crash happens too rarely to be found in nightlies (although I may search harder to try to find it), so we cannot try to guess at a patch that may have started it.
choffman: is there any way we can get these crash numbers compared to ADUs? I'm curious if the increase we see around b3 or so is simply due to more users, or if there is evidence that a patch/patches introduced more of these at that time.
Sadly, minidumps can't help us at this time because the code at the crash point is generated, and is not on the stack. We need new ideas in order to get anywhere on this. Here are two:
1. dvander suggested stashing the script filename before we call the trace so that we can recover it from the crashreport. For example, we could copy it to a char[] buffer in ExecuteTree, and then it would be in the minidump. We could see if the same script keeps showing up, or if we're really lucky find a test case.
2. Teach breakpad to send back the generated trace code. One idea is to create a breakpad API to register a memory range of interest. We call that just before calling a trace with the range of that trace. If we crash, then breakpad includes that memory in the minidump. After returning from the trace, we unregister that range. The problem with this idea is that traces can call other traces, so we can't easily and compactly represent the memory range of interest.
A simpler idea that would work is to add the page that contains EIP to the minidump. We could then refine that with tracer knowledge later.
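The "page that contains EIP" idea comes down to rounding the faulting address down to a page boundary. A rough sketch of that calculation, assuming 4 KiB pages (typical on x86, but not stated in this bug); the helper name is hypothetical and is not a breakpad API:

```python
PAGE_SIZE = 4096  # assumed page size; the bug does not specify one

def page_range(eip, page_size=PAGE_SIZE):
    """Return (start, end) of the page holding eip; end is exclusive."""
    start = eip & ~(page_size - 1)  # clear the low bits to reach the page base
    return start, start + page_size

# Address of the guard jump quoted in comment 40, used only as sample input.
start, end = page_range(0x006AF307)
print(hex(start), hex(end))
```

A single page is a fixed-size region, which sidesteps the problem of describing nested trace calls as a compact range.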
Comment 23•15 years ago
(In reply to comment #22)
> A simpler idea that would work is to add the page that contains EIP to the
> minidump. We could then refine that with tracer knowledge later.
file pls!
Comment 24•15 years ago
(In reply to comment #22)
>
> choffman: is there any way we can get these crash numbers compared to ADUs? I'm
> curious if the increase we see around b3 or so is simply due to more users, or
> if there is evidence that a patch/patches introduced more of these at that
> time.
>
Bugs are on file to get ADU data merged into the crash database so we can do things like that more easily. Until then I'm grabbing snapshots from the two sources and pasting them together at https://wiki.mozilla.org/CrashKill/Crashr
crashdata file | crash-count | 3.6b3 ADUs | 3.6b4 ADUs |
---|---|---|---|
20091118-crashdata | 21 | 18435 | |
20091119-crashdata | 26 | 142847 | |
20091120-crashdata | 30 | 207349 | |
20091121-crashdata | 20 | 217975 | |
20091122-crashdata | 21 | 243541 | |
20091123-crashdata | 33 | 294307 | |
20091124-crashdata | 24 | 321004 | |
20091125-crashdata | 30 | 319230 | |
20091126-crashdata | 50 | 313303 | 11003 |
20091127-crashdata | 54 | 227788 | 101832 |
20091128-crashdata | 70 | 111492 | 208895 |
20091129-crashdata | 63 | 80372 | 262879 |
20091130-crashdata | 77 | 79695 | 318380 |
20091201-crashdata | 87 | 58951 | 354012 |
20091202-crashdata | 43 | 47254 | 377984 |
20091203-crashdata | 42 | 40100 | 394451 |
20091204-crashdata | 46 | 34703 | 399269 |
20091205-crashdata | 42 | 29512 | 375329 |
20091206-crashdata | 45 | 26259 | 390387 |
20091207-crashdata | 34 | 28124 | 447912 |
20091208-crashdata | 54 | 26173 | 460269 |
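One way to read these numbers is to divide each day's crash count by that day's combined beta ADUs. A small sketch with three days copied from the data above. It rests on an assumption: the crash count appears to cover all versions (the 20091122 figure of 21 matches the all-version total in comment 1), so dividing by beta ADUs alone somewhat overstates the beta rate. The function name is made up for illustration.

```python
# (crash-count, 3.6b3 ADUs, 3.6b4 ADUs) copied from the data above;
# 0 stands in for a day with no 3.6b4 figure.
days = {
    "20091122": (21, 243541, 0),
    "20091201": (87, 58951, 354012),
    "20091208": (54, 26173, 460269),
}

def per_100k_adu(crashes, *adus):
    """Crashes per 100,000 active daily users across the given builds."""
    return crashes / sum(adus) * 100_000

rates = {day: per_100k_adu(c, b3, b4) for day, (c, b3, b4) in days.items()}
for day, rate in rates.items():
    print(day, round(rate, 1))
```

On these three samples the normalized rate is not flat, which is the kind of signal that would separate "more users" from "more crashes per user", though three points are far too few to conclude anything.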
Comment 25•15 years ago
(In reply to comment #22)
> 1. dvander suggested stashing the script filename before we call the trace so
> that we can recover it from the crashreport. For example, we could copy it to a
> char[] buffer in ExecuteTree, and then it would be in the minidump. We could
> see if the same script keeps showing up, or if we're really lucky find a test
> case.
Not sure you need to copy the whole string to a stack buffer -- we know how perf-sensitive ExecuteTree/LeaveTree are -- but here's a fun fact: script filenames are GC'ed and shared aggressively, see js_SaveScriptFilename. The char buffer used is an extension of a JSHashEntry, so in the heap, but perhaps you could use a more concise id to track filename from the stack.
> A simpler idea that would work is to add the page that contains EIP to the
> minidump. We could then refine that with tracer knowledge later.
+1000.
/be
Comment 26•15 years ago
I'm in contact with a user who reported a reproducible crash
that looks like this bug. Here's his crash data:
http://crash-stats.mozilla.com/report/index/93852b95-98f9-47a4-9e5b-0b69b2100222
He's very cooperative and has created a test account for us on their server.
I can reproduce the crash (only on Windows though), my crashes:
bp-40101cbf-e63b-4bd6-9b48-6d6392100324 2010-03-25 01:29
bp-c4a8bc74-d38a-40d5-8e1d-109fc2100324 2010-03-25 01:28
bp-87e16b1d-961f-4524-8154-7d01b2100324 2010-03-25 01:28
bp-14631208-05bc-4db0-9160-f23ab2100324 2010-03-24 21:42
bp-aadbeecc-ef7d-4e2a-9d50-adfe42100324 2010-03-24 21:39
bp-b6991f9b-2792-4d21-9749-65ae82100324 2010-03-24 21:35
I can't reproduce it on trunk. Nor on MacOSX or Linux, with any version.
The user says this crash started with Firefox 3.6, it never occurred with
3.5.x.
Comment 28•15 years ago
User says the crash also occurs on Mac OS X 10.5 and 10.6 with Firefox 3.6.
Comment 29•15 years ago
I tried the STR with no luck (3.6.2, macosx, product build). I will try again with a debug build.
Comment 30•15 years ago
Andreas, can you reproduce the crash?
Comment 31•15 years ago
Still 100% reproducible for me. Namoroka 3.6.5pre 20100415 on Windows XP.
Keywords: crash
Whiteboard: [sg:critical]
Comment 32•15 years ago
It's #53 in the Firefox 3.6.3 top crash list, with 5118 crashes (past 2 weeks).
http://crash-stats.mozilla.com/topcrasher/byversion/Firefox/3.6.3
blocking1.9.2: --- → ?
Keywords: topcrash
Comment 33•15 years ago
Andreas, what should we do here now that Mats can reproduce? Would a corefile from Mats help?
Comment 34•15 years ago
I only tried mac. Let me go upstairs and find a windows box and try again there. If that fails too, we should figure out core files.
Comment 35•15 years ago
Mats: can you capture it in a VM and get a snapshot to Andreas? Alternatively, we could try copilot to your machine, so Andreas or someone else can debug it live.
Comment 36•15 years ago (Assignee)
Attaching Visual Studio or WinDbg and using the save memory feature might work, too.
Comment 37•15 years ago
dvander, I will stop by. We should try this out on windows before resorting to bigger guns.
Comment 38•15 years ago
reproduced
Updated•15 years ago (Assignee)
Group: core-security
Comment 39•15 years ago (Assignee)
I have narrowed this down to the assembly generated on line 11240 of jstracer.cpp in the 1.9.2 branch. This code is supposed to index into a typemap vector, but the base address is garbage. I will know more soon.
Comment 40•15 years ago (Assignee)
Okay, I think I see what's going on here. The bogus address is 0x1E, stored in EBX.
> mov ebx, [ebx + 0xC]
> add ebx, 0x1E
This line is grabbing a FrameInfo* from the RP stack and adding |sizeof(FrameInfo) + 2|. 0xC/4 is the distance between the trace entry frame and the frame that owns the argsobj. That's 3.
So why is rp[3] NULL? This is an optimized build i.e. no trace spew, so reading the nearest guard jump:
> 006af307 jne 006df3f4
> ... ... guard code
> 006df418 mov eax, 0x66DE698
Examining this address as a GuardRecord, and then recovering the VMSideExit, reveals the callDepth is 3.
RP uses 0-based indexes, so this is an off-by-one bug - rp[3] would be valid if |callDepth >= 4|. Test case and patch coming.
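A toy model of the arithmetic above, not the tracer's actual code: the RP stack is treated as a list with one entry per active call, and the offset and call depth are the values quoted in this comment. All names are invented for the illustration.

```python
WORD = 4          # 32-bit pointers, as in the x86 dump
offset = 0xC      # from: mov ebx, [ebx + 0xC]
call_depth = 3    # recovered from the VMSideExit

index = offset // WORD  # slot that the generated code reads
rp = [f"frame{i}" for i in range(call_depth)]  # valid slots: 0 .. call_depth - 1

def read_is_valid(index, call_depth):
    """rp is 0-based, so slot `index` exists only when call_depth > index."""
    return index < call_depth

print(index, read_is_valid(index, call_depth), read_is_valid(index, 4))
```

With a call depth of 3 the last valid slot is rp[2], so the read of rp[3] lands one past the end; it would only be in range at a call depth of 4 or more.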
Comment 41•15 years ago (Assignee)
This bug does not exist on trunk; it happened to be fixed along with bug 495331. The test case does not crash (poisoning memory would do the trick), but you can see the problem because the type guard fails too much:
monitor: exits(16), timeouts(0), type mismatch(0), triggered(16), global mismatch(0), flushed(0)
Assignee: gal → dvander
Status: NEW → ASSIGNED
Comment 42•15 years ago (Assignee)
monitor: exits(2), timeouts(0), type mismatch(0), triggered(2), global mismatch(0), flushed(0)
Attachment #439423 - Flags: review?(dmandelin)
Updated•15 years ago
Attachment #439423 - Flags: review?(dmandelin) → review+
Updated•15 years ago (Assignee)
Attachment #439423 - Flags: approval1.9.2.5?
Updated•15 years ago
blocking1.9.2: ? → needed
Comment 44•15 years ago
Comment 45•15 years ago
dvander: Can you please check out Bug 561813? On Mac I get this crash running the trunk and the URL in that bug. See the last bug comment for the link to my crash report.
Updated•15 years ago
Whiteboard: [sg:critical] → [sg:critical][critsmash:investigating]
Comment 46•15 years ago
dvander, any progress here?
Comment 47•15 years ago (Assignee)
(In reply to comment #46)
> dvander, any progress here?
This is waiting on approval. I don't know when that happens.
Comment 48•15 years ago
Comment on attachment 439423
fix
a=beltzner for 1.9.2 default only
Attachment #439423 - Flags: approval1.9.2.5? → approval1.9.2.5+
Updated•15 years ago
status1.9.1: --- → unaffected
status1.9.2: --- → wanted
Comment 49•15 years ago
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
blocking1.9.2: needed → .5+
Comment 50•15 years ago
I have seen a few crashes showing up in crash stats with this stack (http://tinyurl.com/2e2sdqv links to the Mac crashes) - I can crash in this stack by loading https://home.eease.adp.com/recruit2/?id=510443&t=2 using Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.3a6pre) Gecko/20100621 Minefield/3.7a6pre. Should I reopen this bug or file a new one?
Comment 51•15 years ago
(In reply to comment #50)
> I have seen a few crashes showing up in crash stats with this stack
> (http://tinyurl.com/2e2sdqv links to the Mac crashes) - I can crash in this
> stack by loading https://home.eease.adp.com/recruit2/?id=510443&t=2 using
> Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.3a6pre)
> Gecko/20100621 Minefield/3.7a6pre. Should I reopen this bug or file a new one?
Since this bug is already patched, let's do a new one.
Comment 52•15 years ago
Bug 573558 is the new bug on file for the crash noted in Comment 50.
Comment 53•15 years ago
(In reply to comment #44)
> crash on load 1.9.2 winxp
> http://www.roadsafetraffic.com/locations.htm
> bp-02c5e6df-1749-4be3-86e2-520822100423
>
> http://www.srssa.com/contact/
> bp-d7bd2b01-0642-4359-b380-daa142100423
I used these to verify the fix. Both of these still crash in 1.9.2.6 but don't crash in build 1 of 1.9.2.7 on Win XP: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.7) Gecko/20100701 Firefox/3.6.7 (.NET CLR 3.5.30729).
Keywords: verified1.9.2
Updated•15 years ago
Group: core-security
Updated•14 years ago
Crash Signature: [@ ExecuteTree]