There's a new crash in Firefox 3.6b3 with the signature "ExecuteTree" that hasn't been seen in any 3.5.x version. So far we've seen 78+ of these crashes in the wild. Please see http://crash-stats.mozilla.com/query/query?product=Firefox&version=Firefox%3A3.6b3&range_value=1&range_unit=weeks&query_search=signature&query_type=exact&query=ExecuteTree&do_query=1 for more crash info.
21 total crashes for ExecuteTree in 20091122-crashdata.csv
3 startup crashes (inside 3 minutes)

Signature list:
18 ExecuteTree
3 js_ExecuteTree

OS breakdown:
6 ExecuteTree Windows NT 5.1.2600 Service Pack 2
5 ExecuteTree Windows NT 5.1.2600 Service Pack 3
4 ExecuteTree Windows NT 6.1.7600
3 js_ExecuteTree Windows NT 5.1.2600 Service Pack 2
1 ExecuteTree Windows NT 6.0.6002 Service Pack 2
1 ExecuteTree Windows NT 6.0.6001 Service Pack 1
1 ExecuteTree Windows NT 5.1.2600 Service Pack 2, v.2055

Distribution of all versions where the ExecuteTree crash was found in 20091122-crashdata.csv:
14 Firefox 3.6b3
3 Firefox 3.6b1
3 Firefox 3.5.5
1 Firefox 3.6b2
Created attachment 414433: execute tree stack trend, Oct/Nov

I think I first see this showing up in 3.6 data on 10/16; it then ramps up just after beta 1 on Nov 5.
In the lists in comment 2, js_ExecuteTree is probably a different bug that is low volume on both 3.5.x and 3.6: http://crash-stats.mozilla.com/report/list?product=Firefox&query_search=signature&query_type=exact&query=js_ExecuteTree&date=&range_value=1&range_unit=weeks&do_query=1&signature=js_ExecuteTree That should probably be tracked separately. The volumes shown in comment 2 cover only the ExecuteTree signature, and they show higher volume that increases as the 3.6 betas get more users.
It looks like the higher volume crash is in the called tree itself, so we'll probably need minidump inspection here to have any joy. The lower-volume 3.5+3.6 one looks to be crashing trying to call LeaveTree, can't tell much more without minidumpery.
Looks like I won the lottery. I will start looking at the stacks and take it from there.
The LeaveTree one might be an OOM condition. I'm focusing on the higher-volume on-trace crash. A minidump won't help much because we can't see the jitted code in the minidump. We should look at the URLs.
I spent some time browsing the URLs jst pulled for me. No luck. It's very heavy on Facebook, but that might just be representative of average web use. None of the URLs crash for me. At this point my gut is telling me this is a GC issue (we GC and something isn't kept alive, causing us to die on trace). jst has 2 core files for Linux in the same general area as this bug. I will look at those next.
OK, after some digging around I have convinced myself that this bug was fixed by the following patch: https://bugzilla.mozilla.org/show_bug.cgi?id=528048 The patch makes sure sprops stay alive after a GC if we embed them on trace. The patch landed on m-c a week ago but is not on 1.9.2 yet. There isn't enough crash data for trunk to tell whether the patch fixed anything there; I only see 1 crash with ExecuteTree in the last 4 weeks on trunk.
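To make the suspected failure mode concrete, here is a toy mark-and-sweep model in C++. It is not SpiderMonkey's GC and not the bug 528048 patch; `Cell`, `CompiledTrace`, `MarkTraceRoots`, and `Sweep` are hypothetical names. The idea it shows: if generated code bakes in a raw pointer and the collector does not treat that pointer as a root, the object can be swept while the trace still uses it, whereas marking trace-embedded pointers keeps them alive.

```cpp
#include <vector>

// Toy heap object standing in for something like an sprop (hypothetical).
struct Cell {
    bool marked = false;
};

// Toy compiled trace that has raw Cell pointers baked into its code.
struct CompiledTrace {
    std::vector<Cell*> embeddedCells;
};

// Toy mark phase: every Cell embedded in a live trace is treated as a root.
void MarkTraceRoots(const std::vector<CompiledTrace>& traces) {
    for (const CompiledTrace& t : traces)
        for (Cell* c : t.embeddedCells)
            c->marked = true;
}

// Toy sweep: removes unmarked cells from `heap` and returns them (the ones a
// real collector would free), then clears the marks on the survivors.
std::vector<Cell*> Sweep(std::vector<Cell*>& heap) {
    std::vector<Cell*> freed;
    std::vector<Cell*> survivors;
    for (Cell* c : heap) {
        if (c->marked) {
            c->marked = false;
            survivors.push_back(c);
        } else {
            freed.push_back(c);
        }
    }
    heap.swap(survivors);
    return freed;
}
```

Without the `MarkTraceRoots` call, `Sweep` would return every cell, including the one the trace still points at.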
This crash shows as #40 in 3.6b4. Did the patch in bug 528048 make it into the beta (from the checkin date it looks as if it might have)? If it is still an issue and someone can give me some URLs, I will see if I can reproduce.
Marcia, jst pulled URLs for me, but this is a GC bug, so I was never able to reproduce it. I was hoping we could confirm from crash stats that the patch fixed the problem.
From http://hg.mozilla.org/releases/mozilla-1.9.2/pushloghtml it looks like the patch in bug 528048 went into the 1.9.2 branch well after b4 was tagged. So I think we have to wait for b5 (or RC) crash data.
I will look at crash stats for beta 5 and renominate for blocking if needed.
Adding dependency so we don't lose track of rechecking this. It sounds like beta 5 might be going out today.
[@ ExecuteTree] went from #83 in b4 to #10 in b5 :( http://crash-stats.mozilla.com/topcrasher/byversion/Firefox/3.6b4 http://crash-stats.mozilla.com/topcrasher/byversion/Firefox/3.6b5
Created attachment 418560: execute tree stack trend, Oct through Dec 19

Yeah, something caused a big spike yesterday. One thing that happened was that 200,000 new users moved up from beta 4 to beta 5 or joined the beta as new users. https://wiki.mozilla.org/CrashKill/Crashr#Relases_3.6b5
Not as bad as the spike of 140 crashes on 12-19, but volume is still up on 12-20 with 112 crashes. There are a couple of .hu TLDs that seem to be part of the increase over the last two days.

Domains of sites:
26 http://www.facebook.com
   http://apps.facebook.com/thesummoning/
   http://apps.facebook.com/inthemafia/
   http://apps.facebook.com/happy-aquarium/
4 http://myvip.com

Newly spotted domains on 12-19/20:
23 http://www.baratikor.hu
   8 http://www.baratikor.hu/
   2 http://www.baratikor.hu/dating/?selectedTab=13
   lots of http://www.baratikor.hu/gallery/?fid= (sanitized)
   lots of http://www.baratikor.hu/friend/? (sanitized)
10 http://hi5.com
   2 http://hi5.com/friend/profile/displayProfile.do?userid= (sanitized)
   2 http://hi5.com/friend/mail/displayInbox.do
   2 http://hi5.com/
5 http://iwiw.hu
   2 http://iwiw.hu/pages/user/login.jsp?method=Login
   2 http://iwiw.hu/
Are these crashes all on the latest beta? In other words, have we eliminated older build IDs?
Some of these are on 3.5.x, but at a much higher rate on 3.6. Checking 20091220-crashdata.csv:

release  total crashes  ExecuteTree crashes  ratio (ExecuteTree/total)
all      215848         112                  0.000518884
3.0.15   6811           0
3.0.16   31577          0
3.5.5    18251          2                    0.000109583
3.5.6    105854         3                    2.83409e-05
3.6b5    16669          94                   0.00563921
3.6b4    5000           10                   0.002
3.6b3    644            0
3.6b2    661            2                    0.00302572
3.6b1    2101           0
Before this recent spike, the distribution was still tilted toward 3.6bx. Checking 20091215-crashdata.csv:

release  total crashes  ExecuteTree crashes  ratio (ExecuteTree/total)
all      231799         47                   0.000202762
3.0.15   41098          0
3.0.16   414            0
3.5.5    127012         7                    5.51129e-05
3.5.6    3080           0
3.6b5    209            0
3.6b4    22729          38                   0.00167187
3.6b3    677            1                    0.0014771
3.6b2    793            0
3.6b1    2265           1                    0.000441501
This is almost certainly multiple bugs; ExecuteTree is tracemonkeyese for "running generated code". All the ILLEGAL_INSTRUCTION (and PRIV_INSTRUCTION -- exciting! are we running garbage?) ones are on AMDs, so there may be some instruction selection issues at hand.
I looked through a bunch of these and found many different stacks, with a large percentage of crashes happening on Hungarian operating systems. Even within that subset I found different stacks. We're going to have to minus this and look for more data.
(In reply to comment #20)
> All the ILLEGAL_INSTRUCTION (and PRIV_INSTRUCTION -- exciting! are we
> running garbage?) ones are on AMDs

Most, but not all; e.g. https://crash-stats.mozilla.com/report/index/0817bbaa-f4bc-48d0-8c52-e438f2091219

Anyway, I agree this is probably multiple bugs. Unfortunately, this crash happens too rarely to be found in nightlies (although I may search harder to try to find it), so we cannot try to guess at a patch that may have started it.

choffman: is there any way we can get these crash numbers compared to ADUs? I'm curious if the increase we see around b3 or so is simply due to more users, or if there is evidence that a patch/patches introduced more of these at that time.

Sadly, minidumps can't help us at this time because the code at the crash point is generated, and is not on the stack. We need new ideas in order to get anywhere on this. Here are two:

1. dvander suggested stashing the script filename before we call the trace so that we can recover it from the crashreport. For example, we could copy it to a char buffer in ExecuteTree, and then it would be in the minidump. We could see if the same script keeps showing up, or if we're really lucky find a test case.

2. Teach breakpad to send back the generated trace code. One idea is to create a breakpad API to register a memory range of interest. We call that just before calling a trace with the range of that trace. If we crash, then breakpad includes that memory in the minidump. After returning from the trace, we unregister that range. The problem with this idea is that traces can call other traces, so we can't easily and compactly represent the memory range of interest.

A simpler idea that would work is to add the page that contains EIP to the minidump. We could then refine that with tracer knowledge later.
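A minimal C++ sketch of idea 1, assuming nothing about the real ExecuteTree signature: `StashScriptFilename`, `ExecuteTraceWithBreadcrumb`, and `TraceFn` are hypothetical names. It copies a bounded prefix of the filename into a fixed-size stack buffer before calling the trace, so those bytes sit in the thread's stack memory, which a minidump records. Copying only a short prefix is one way to bound the per-call cost in a hot path.

```cpp
#include <cstddef>

// Hypothetical helper (not a SpiderMonkey API): copies at most N-1 bytes of
// `filename` into a fixed-size buffer and always NUL-terminates it.
template <std::size_t N>
void StashScriptFilename(volatile char (&buf)[N], const char* filename) {
    static_assert(N > 0, "buffer must hold at least the terminator");
    std::size_t i = 0;
    if (filename != nullptr) {
        for (; i + 1 < N && filename[i] != '\0'; ++i)
            buf[i] = filename[i];
    }
    buf[i] = '\0';
}

// Hypothetical stand-in for the compiled trace entry point.
using TraceFn = int (*)();

// Stashes the filename in this frame's stack buffer, then runs the trace.
// If the trace crashes, the buffer lies in stack memory the minidump captures.
int ExecuteTraceWithBreadcrumb(TraceFn trace, const char* scriptFilename) {
    // Writes through volatile may not be elided, so the otherwise-unread
    // buffer really gets filled.
    volatile char breadcrumb[64];
    StashScriptFilename(breadcrumb, scriptFilename);
    return trace();
}
```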
(In reply to comment #22)
> A simpler idea that would work is to add the page that contains EIP to the
> minidump. We could then refine that with tracer knowledge later.

file pls!
(In reply to comment #22)
>
> choffman: is there any way we can get these crash numbers compared to ADUs? I'm
> curious if the increase we see around b3 or so is simply due to more users, or
> if there is evidence that a patch/patches introduced more of these at that
> time.
>

Bugs are on file to get ADU data merged into the crash database so we can do things like that more easily. Until then I'm grabbing snapshots from the two sources and pasting them together at https://wiki.mozilla.org/CrashKill/Crashr

file                crash-count  ADUs 3.6b3  ADUs 3.6b4
20091118-crashdata  21           18435
20091119-crashdata  26           142847
20091120-crashdata  30           207349
20091121-crashdata  20           217975
20091122-crashdata  21           243541
20091123-crashdata  33           294307
20091124-crashdata  24           321004
20091125-crashdata  30           319230
20091126-crashdata  50           313303      11003
20091127-crashdata  54           227788      101832
20091128-crashdata  70           111492      208895
20091129-crashdata  63           80372       262879
20091130-crashdata  77           79695       318380
20091201-crashdata  87           58951       354012
20091202-crashdata  43           47254       377984
20091203-crashdata  42           40100       394451
20091204-crashdata  46           34703       399269
20091205-crashdata  42           29512       375329
20091206-crashdata  45           26259       390387
20091207-crashdata  34           28124       447912
20091208-crashdata  54           26173       460269
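One rough way to answer the ADU question from this table is to divide each day's crash count by that day's combined 3.6b3 and 3.6b4 ADUs. A minimal sketch, assuming the beta ADUs are the right denominator; the crash-count column also includes some 3.5.x crashes, so this overstates the beta rate somewhat. `CrashesPer100kAdus` is a hypothetical name.

```cpp
// Hypothetical helper: crashes per 100,000 ADUs, using the combined
// 3.6b3 + 3.6b4 ADUs for that day as the denominator.
double CrashesPer100kAdus(long crashes, long adusB3, long adusB4) {
    const long adus = adusB3 + adusB4;
    if (adus <= 0)
        return 0.0;
    return 100000.0 * static_cast<double>(crashes) / static_cast<double>(adus);
}
```

Applied to the first and last rows, this gives about 114 crashes per 100,000 ADUs on 20091118 and about 11 on 20091208.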
(In reply to comment #22)
> 1. dvander suggested stashing the script filename before we call the trace so
> that we can recover it from the crashreport. For example, we could copy it to a
> char buffer in ExecuteTree, and then it would be in the minidump. We could
> see if the same script keeps showing up, or if we're really lucky find a test
> case.

Not sure you need to copy the whole string to a stack buffer -- we know how perf-sensitive ExecuteTree/LeaveTree are -- but here's a fun fact: script filenames are GC'ed and shared aggressively, see js_SaveScriptFilename. The char buffer used is an extension of a JSHashEntry, so it's in the heap, but perhaps you could use a more concise id to track the filename from the stack.

> A simpler idea that would work is to add the page that contains EIP to the
> minidump. We could then refine that with tracer knowledge later.

+1000.

/be
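The "page that contains EIP" idea reduces to masking the instruction pointer down to a page boundary. A minimal sketch of that arithmetic, assuming a power-of-two page size such as 4096; `MemoryRegion` and `PageContaining` are hypothetical names, and how the region would actually be handed to breakpad is not shown.

```cpp
#include <cstddef>
#include <cstdint>

// A region of memory we would like included in a minidump (hypothetical type).
struct MemoryRegion {
    std::uintptr_t base;
    std::size_t size;
};

// Returns the page that contains `pc`. pageSize must be a power of two.
MemoryRegion PageContaining(std::uintptr_t pc, std::size_t pageSize) {
    const std::uintptr_t mask = ~(static_cast<std::uintptr_t>(pageSize) - 1);
    return MemoryRegion{pc & mask, pageSize};
}
```

A fuller version might also grab the neighboring page when EIP sits near a boundary, since an x86 instruction can straddle two pages.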
I'm in contact with a user who reported a reproducible crash that looks like this bug. Here's his crash data: http://crash-stats.mozilla.com/report/index/93852b95-98f9-47a4-9e5b-0b69b2100222

He's very cooperative and has created a test account for us on their server. I can reproduce the crash (only on Windows, though). My crashes:
bp-40101cbf-e63b-4bd6-9b48-6d6392100324 2010-03-25 01:29
bp-c4a8bc74-d38a-40d5-8e1d-109fc2100324 2010-03-25 01:28
bp-87e16b1d-961f-4524-8154-7d01b2100324 2010-03-25 01:28
bp-14631208-05bc-4db0-9160-f23ab2100324 2010-03-24 21:42
bp-aadbeecc-ef7d-4e2a-9d50-adfe42100324 2010-03-24 21:39
bp-b6991f9b-2792-4d21-9749-65ae82100324 2010-03-24 21:35

I can't reproduce it on trunk, nor on Mac OS X or Linux with any version. The user says this crash started with Firefox 3.6; it never occurred with 3.5.x.
User says the crash also occurs on Mac OS X 10.5 and 10.6 with Firefox 3.6.
I tried the STR with no luck (3.6.2, macosx, product build). I will try again with a debug build.
Andreas, can you reproduce the crash?
Still 100% reproducible for me. Namoroka 3.6.5pre 20100415 on Windows XP.
It's #53 in the Firefox 3.6.3 top crash list, with 5118 crashes (past 2 weeks). http://crash-stats.mozilla.com/topcrasher/byversion/Firefox/3.6.3
Andreas, what should we do here now that Mats can reproduce? Would a corefile from Mats help?
I only tried Mac. Let me go upstairs and find a Windows box and try again there. If that fails too, we should figure out core files.
Mats: can you capture it in a VM and get a snapshot to Andreas? Alternatively, we could try copilot to your machine, so Andreas or someone else can debug it live.
Attaching Visual Studio or WinDbg and using the save memory feature might work, too.
dvander, I will stop by. We should try this out on Windows before resorting to bigger guns.
I have narrowed this down to the assembly generated on line 11240 of jstracer.cpp in the 1.9.2 branch. This code is supposed to index into a typemap vector, but the base address is garbage. I will know more soon.
Okay, I think I see what's going on here. The bogus address is 0x1E, stored in EBX.

> mov ebx, [ebx + 0xC]
> add ebx, 0x1E

This line is grabbing a FrameInfo* from the RP stack and adding |sizeof(FrameInfo) + 2|. 0xC/4 is the distance between the trace entry frame and the frame that owns the argsobj. That's 3. So why is rp NULL?

This is an optimized build, i.e. no trace spew, so reading the nearest guard jump:

> 006af307 jne 006df3f4
> ...
> ... guard code
> 006df418 mov eax, 0x66DE698

Examining this address as a GuardRecord, and then recovering the VMSideExit, reveals the callDepth is 3. RP uses 0-based indexes, so this is an off-by-one bug: rp would be valid if |callDepth >= 4|.

Test case and patch coming.
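The index arithmetic in this analysis can be checked mechanically: a displacement of 0xC over 4-byte pointer slots is index 3, and with 0-based indexing that slot only holds a valid pointer once callDepth is at least 4. A minimal model, not code from jstracer.cpp; the names are hypothetical and it assumes 32-bit x86 pointer slots.

```cpp
#include <cstddef>

// Illustrative model only, not jstracer.cpp code: assume the rp stack is an
// array of 4-byte FrameInfo* slots (32-bit x86), indexed from 0.
constexpr std::size_t kSlotBytes = 4;

// Converts a displacement such as the 0xC in `mov ebx, [ebx + 0xC]` to a slot index.
constexpr std::size_t SlotIndexFromDisplacement(std::size_t displacement) {
    return displacement / kSlotBytes;
}

// With callDepth slots filled, only indices 0 .. callDepth - 1 hold valid pointers.
constexpr bool SlotIsValid(std::size_t index, std::size_t callDepth) {
    return index < callDepth;
}

static_assert(SlotIndexFromDisplacement(0xC) == 3, "0xC / 4 == 3");
static_assert(!SlotIsValid(3, 3), "callDepth 3 does not cover slot 3");
static_assert(SlotIsValid(3, 4), "slot 3 needs callDepth >= 4");
```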
Created attachment 439422: test case

This bug does not exist on trunk; it happened to be fixed along with bug 495331. The test case does not crash (poisoning memory would do the trick), but you can see the problem because the type guard fails too often:

monitor: exits(16), timeouts(0), type mismatch(0), triggered(16), global mismatch(0), flushed(0)
Created attachment 439423: fix

monitor: exits(2), timeouts(0), type mismatch(0), triggered(2), global mismatch(0), flushed(0)
dvander: Can you please check out Bug 561813? On Mac I get this crash running the trunk and the URL in that bug. See the last bug comment for the link to my crash report.
dvander, any progress here?
(In reply to comment #46)
> dvander, any progress here?

This is waiting on approval. I don't know when that happens.
Comment on attachment 439423: fix

a=beltzner for 1.9.2 default only
I have seen a few crashes showing up in crash stats with this stack (http://tinyurl.com/2e2sdqv links to the Mac crashes). I can crash in this stack by loading https://home.eease.adp.com/recruit2/?id=510443&t=2 using Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.3a6pre) Gecko/20100621 Minefield/3.7a6pre. Should I reopen this bug or file a new one?
(In reply to comment #50)
> I have seen a few crashes showing up in crash stats with this stack
> (http://tinyurl.com/2e2sdqv links to the Mac crashes) - I can crash in this
> stack by loading https://home.eease.adp.com/recruit2/?id=510443&t=2 using
> Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.3a6pre)
> Gecko/20100621 Minefield/3.7a6pre. Should I reopen this bug or file a new one?

Since this bug is already patched, let's do a new one.
(In reply to comment #44)
> crash on load 1.9.2 winxp
> http://www.roadsafetraffic.com/locations.htm
> bp-02c5e6df-1749-4be3-86e2-520822100423
>
> http://www.srssa.com/contact/
> bp-d7bd2b01-0642-4359-b380-daa142100423

I used these to verify the fix. Both of these still crash in 184.108.40.206 but don't crash in build 1 of 220.127.116.11 on Win XP: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:18.104.22.168) Gecko/20100701 Firefox/3.6.7 (.NET CLR 3.5.30729).