Bug 530955 - New crash [@ ExecuteTree] in Firefox 3.6b3
Status: RESOLVED FIXED
Whiteboard: [sg:critical][critsmash:investigating]
Keywords: crash, regression, topcrash, verified1.9.2
Product: Core
Classification: Components
Component: JavaScript Engine
Version: 1.9.2 Branch
Platform: All All
Importance: -- critical with 1 vote
Target Milestone: ---
Assigned To: David Anderson [:dvander]
http://crash-stats.mozilla.com/query/...
Depends on: 528048 536271
Reported: 2009-11-24 17:44 PST by Johnny Stenback (:jst, jst@mozilla.com)
Modified: 2011-06-13 10:01 PDT (History)
CC List: 24 users
sayrer: blocking1.9.2-
sayrer: wanted1.9.2+
.5+
.5-fixed
unaffected


Attachments
execute tree stack trend oct/nov (3.10 KB, text/plain)
2009-11-24 19:50 PST, chris hofmann
no flags
execute tree stack trend oct-dec19 (39.07 KB, image/png)
2009-12-20 11:36 PST, chris hofmann
no flags
test case (266 bytes, application/x-javascript)
2010-04-15 18:51 PDT, David Anderson [:dvander]
no flags
fix (1.43 KB, patch)
2010-04-15 18:52 PDT, David Anderson [:dvander]
dmandelin: review+
mbeltzner: approval1.9.2.5+

Description Johnny Stenback (:jst, jst@mozilla.com) 2009-11-24 17:44:18 PST
There's a new crash in Firefox 3.6b3 with the signature "ExecuteTree" that hasn't been seen in any 3.5.* version. So far we've seen 78+ of these crashes in the wild.

Please see http://crash-stats.mozilla.com/query/query?product=Firefox&version=Firefox%3A3.6b3&range_value=1&range_unit=weeks&query_search=signature&query_type=exact&query=ExecuteTree&do_query=1 for more crash info.
Comment 1 chris hofmann 2009-11-24 19:41:16 PST
21 total crashes for ExecuteTree on 20091122-crashdata.csv
3 start up crashes inside 3 minutes

signature list
  18 ExecuteTree
   3 js_ExecuteTree

os breakdown
   6 ExecuteTree Windows NT 5.1.2600 Service Pack 2
   5 ExecuteTree Windows NT 5.1.2600 Service Pack 3
   4 ExecuteTree Windows NT 6.1.7600
   3 js_ExecuteTree Windows NT 5.1.2600 Service Pack 2
   1 ExecuteTree Windows NT 6.0.6002 Service Pack 2
   1 ExecuteTree Windows NT 6.0.6001 Service Pack 1
   1 ExecuteTree Windows NT 5.1.2600 Service Pack 2, v.2055

distribution of all versions where the ExecuteTree crash was found on 20091122-crashdata.csv
  14 Firefox 3.6b3
   3 Firefox 3.6b1
   3 Firefox 3.5.5
   1 Firefox 3.6b2
Comment 2 chris hofmann 2009-11-24 19:50:52 PST
Created attachment 414433 [details]
execute tree stack trend oct/nov

I think this first shows up in 3.6 data on 10/16, then ramps up just after beta 1 on Nov 5.
Comment 3 chris hofmann 2009-11-24 19:56:00 PST
in the lists in comment 2 the js_ExecuteTree is probably a different bug that is low volume on both 3.5.x and 3.6

http://crash-stats.mozilla.com/report/list?product=Firefox&query_search=signature&query_type=exact&query=js_ExecuteTree&date=&range_value=1&range_unit=weeks&do_query=1&signature=js_ExecuteTree

That should probably be tracked separately.

volumes shown in comment 2 are only the ExecuteTree signature and show higher volume and increasing as 3.6 betas get more users.
Comment 4 Mike Shaver (:shaver -- probably not reading bugmail closely) 2009-11-26 08:08:24 PST
It looks like the higher volume crash is in the called tree itself, so we'll probably need minidump inspection here to have any joy.  The lower-volume 3.5+3.6 one looks to be crashing trying to call LeaveTree, can't tell much more without minidumpery.
Comment 5 Andreas Gal :gal 2009-11-30 13:19:26 PST
Looks like I won the lottery. I will start looking at the stacks and take it from there.
Comment 6 Andreas Gal :gal 2009-11-30 13:33:29 PST
The LeaveTree one might be an OOM condition. Focusing on the higher-volume on-trace crash. A minidump won't help much because we can't see the jitted code in it. We should look at the URLs.
Comment 7 Andreas Gal :gal 2009-11-30 17:48:42 PST
I spent some time browsing the URLs jst pulled for me. No luck. It's very heavy on Facebook, but that might just be representative of average web use. None of the URLs crash for me. At this point my gut is telling me this is a GC issue (we GC and something isn't kept alive, causing us to die on trace). jst has 2 core files for Linux in the same general area as this bug. I will look at those next.
Comment 8 Andreas Gal :gal 2009-12-01 14:44:51 PST
Ok, after some digging around I have convinced myself that this bug was fixed by the following patch:

https://bugzilla.mozilla.org/show_bug.cgi?id=528048

The patch makes sure sprops stay alive after a GC if we embed them on trace. The patch has landed on m-c a week ago but is not on 1.9.2 yet.
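The failure mode described here (generated code holds a pointer that the collector does not know about) can be illustrated with a toy mark-and-sweep heap. This is only a sketch under assumptions: `ToySprop`, `ToyTrace` and `ToyHeap` are hypothetical names, and none of this is the SpiderMonkey GC or the patch from bug 528048.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a GC-managed scope property ("sprop"); not the real layout.
struct ToySprop {
    int id;
    bool marked;
    explicit ToySprop(int i) : id(i), marked(false) {}
};

// Toy compiled trace: generated code embeds raw sprop pointers as constants.
struct ToyTrace {
    std::vector<ToySprop*> embedded;
};

struct ToyHeap {
    std::vector<ToySprop*> live;

    ToySprop* alloc(int id) {
        live.push_back(new ToySprop(id));
        return live.back();
    }

    // Mark phase: ordinary roots first, then (optionally) everything that
    // compiled traces point at.
    void mark(const std::vector<ToySprop*>& roots,
              const std::vector<ToyTrace>& traces, bool markTraces) {
        for (ToySprop* s : roots) s->marked = true;
        if (markTraces)
            for (const ToyTrace& t : traces)
                for (ToySprop* s : t.embedded) s->marked = true;
    }

    // Sweep phase: free everything unmarked and return how many were freed.
    std::size_t sweep() {
        std::size_t freed = 0;
        std::vector<ToySprop*> survivors;
        for (ToySprop* s : live) {
            if (s->marked) {
                s->marked = false;  // reset for the next collection
                survivors.push_back(s);
            } else {
                delete s;
                ++freed;
            }
        }
        live.swap(survivors);
        return freed;
    }

    ~ToyHeap() {
        for (ToySprop* s : live) delete s;
    }
};
```

With `markTraces` set to false, an sprop that is only referenced from a trace is swept, and the trace is left holding a dangling pointer that would only be noticed the next time the trace runs.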

There isn't enough crash data for trunk to tell whether the patch fixed anything there. I only see 1 crash with ExecuteTree for the last 4 weeks on trunk.
Comment 9 Marcia Knous [:marcia - use ni] 2009-12-03 11:58:18 PST
This crash shows as #40 in 3.6 B4. Did the patch in Bug 528048 make it into the beta (from the checkin date it looks as if it might have)? If it is still an issue and someone can give me some URLs, I will see if I can reproduce.
Comment 10 Andreas Gal :gal 2009-12-03 12:02:46 PST
marcia, jst pulled urls for me but this is a GC bug so I was never able to reproduce it. I was hoping we could confirm based on crash stats that the patch fixed the problem.
Comment 11 Jesse Ruderman 2009-12-16 00:07:22 PST
From http://hg.mozilla.org/releases/mozilla-1.9.2/pushloghtml it looks like the patch in bug 528048 went into the 1.9.2 branch well after b4 was tagged.  So I think we have to wait for b5 (or RC) crash data.
Comment 12 Jesse Ruderman 2009-12-17 00:30:50 PST
I will look at crash stats for beta 5 and renominate for blocking if needed.
Comment 13 chris hofmann 2009-12-17 08:24:16 PST
adding dependency so we don't lose track of rechecking this.  sounds like beta 5 might be going out today.
Comment 14 Jesse Ruderman 2009-12-20 11:06:12 PST
[@ ExecuteTree] went from #83 in b4 to #10 in b5 :(
http://crash-stats.mozilla.com/topcrasher/byversion/Firefox/3.6b4
http://crash-stats.mozilla.com/topcrasher/byversion/Firefox/3.6b5
Comment 15 chris hofmann 2009-12-20 11:36:18 PST
Created attachment 418560 [details]
execute tree stack trend oct-dec19

yeah, something caused a big spike yesterday.  one thing that happened was 200,000 new users moved up from beta4 to beta5 or joined the beta as new users.

https://wiki.mozilla.org/CrashKill/Crashr#Relases_3.6b5
Comment 16 chris hofmann 2009-12-21 11:42:50 PST
not as bad as the spike of 140 crashes on 12-19, but volume is still up on 12-20 == 112 crashes.

there are a couple of 'hu'  TLD's that seem to be part of the increase the last two days.

domains of sites
  26 http://www.facebook.com
        http://apps.facebook.com/thesummoning/
        http://apps.facebook.com/inthemafia/
        http://apps.facebook.com/happy-aquarium/

   4 http://myvip.com

newly spotted domains on 12-19/20
  23 http://www.baratikor.hu
      8 http://www.baratikor.hu/
      2 http://www.baratikor.hu/dating/?selectedTab=13
      lots of http://www.baratikor.hu/gallery/?fid= sanitized
      lots of http://www.baratikor.hu/friend/?  sanitized

  10 http://hi5.com
      2 http://hi5.com/friend/profile/displayProfile.do?userid= sanitized
      2 http://hi5.com/friend/mail/displayInbox.do
      2 http://hi5.com/

   5 http://iwiw.hu
      2 http://iwiw.hu/pages/user/login.jsp?method=Login
      2 http://iwiw.hu/
Comment 17 Robert Sayre 2009-12-21 12:04:56 PST
are these crashes all the latest beta? iow, have we eliminated older buildids?
Comment 18 chris hofmann 2009-12-21 14:29:16 PST
some of these are on 3.5.x but at a much higher rate on 3.6

checking --- 20091220-crashdata.csv ExecuteTree
release  total-crashes  ExecuteTree-crashes  fraction-of-total
all      215848         112                  0.000518884
3.0.15   6811           0                    0
3.0.16   31577          0                    0
3.5.5    18251          2                    0.000109583
3.5.6    105854         3                    2.83409e-05
3.6b5    16669          94                   0.00563921
3.6b4    5000           10                   0.002
3.6b3    644            0                    0
3.6b2    661            2                    0.00302572
3.6b1    2101           0                    0
Comment 19 chris hofmann 2009-12-21 14:33:49 PST
before this recent spike the distribution was still tilted toward 3.6bx

checking --- 20091215-crashdata.csv ExecuteTree
release  total-crashes  ExecuteTree-crashes  fraction-of-total
all      231799         47                   0.000202762
3.0.15   41098          0                    0
3.0.16   414            0                    0
3.5.5    127012         7                    5.51129e-05
3.5.6    3080           0                    0
3.6b5    209            0                    0
3.6b4    22729          38                   0.00167187
3.6b3    677            1                    0.0014771
3.6b2    793            0                    0
3.6b1    2265           1                    0.000441501
Comment 20 Mike Shaver (:shaver -- probably not reading bugmail closely) 2009-12-21 14:40:44 PST
This is almost certainly multiple bugs; ExecuteTree is tracemonkeyese for "running generated code".  All the ILLEGAL_INSTRUCTION (and PRIV_INSTRUCTION -- exciting! are we running garbage?) ones are on AMDs, so there may be some instruction selection issues at hand.
Comment 21 Robert Sayre 2009-12-21 14:58:40 PST
I looked through a bunch of these and found many different stacks, with a large percentage of crashes happening on Hungarian operating systems. Even within that subset I found different stacks. We're going to have to minus this and look for more data.
Comment 22 David Mandelin [:dmandelin] 2009-12-21 15:01:17 PST
(In reply to comment #20)
> All the ILLEGAL_INSTRUCTION (and PRIV_INSTRUCTION -- exciting! are we 
> running garbage?) ones are on AMDs

Most, but not all; e.g. https://crash-stats.mozilla.com/report/index/0817bbaa-f4bc-48d0-8c52-e438f2091219

Anyway, I agree this is probably multiple bugs. Unfortunately, this crash happens too rarely to be found in nightlies (although I may search harder to try to find it), so we cannot try to guess at a patch that may have started it.

choffman: is there any way we can get these crash numbers compared to ADUs? I'm curious if the increase we see around b3 or so is simply due to more users, or if there is evidence that a patch/patches introduced more of these at that time.

Sadly, minidumps can't help us at this time because the code at the crash point is generated, and is not on the stack. We need new ideas in order to get anywhere on this. Here are two:

1. dvander suggested stashing the script filename before we call the trace so that we can recover it from the crashreport. For example, we could copy it to a char[] buffer in ExecuteTree, and then it would be in the minidump. We could see if the same script keeps showing up, or if we're really lucky find a test case.

2. Teach breakpad to send back the generated trace code. One idea is to create a breakpad API to register a memory range of interest. We call that just before calling a trace with the range of that trace. If we crash, then breakpad includes that memory in the minidump. After returning from the trace, we unregister that range. The problem with this idea is that traces can call other traces, so we can't easily and compactly represent the memory range of interest. 

A simpler idea that would work is to add the page that contains EIP to the minidump. We could then refine that with tracer knowledge later.
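The "page that contains EIP" idea boils down to rounding the faulting address down to a page boundary and reporting that page. A minimal sketch of that arithmetic follows; `MemoryRange` and `pageContaining` are hypothetical names and not an existing breakpad API, and the 4096-byte default is an assumption that a real implementation would replace with a query to the OS.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of the memory a crash reporter would add to the dump.
struct MemoryRange {
    std::uintptr_t start;
    std::uintptr_t length;
};

// Round the faulting instruction pointer down to its page boundary.
// pageSize must be a power of two.
inline MemoryRange pageContaining(std::uintptr_t ip,
                                  std::uintptr_t pageSize = 4096) {
    assert(pageSize != 0 && (pageSize & (pageSize - 1)) == 0);
    return MemoryRange{ip & ~(pageSize - 1), pageSize};
}
```

A crash in the middle of a trace may also need the neighbouring page if the interesting instructions straddle a boundary, which is presumably part of what "refine that with tracer knowledge later" would cover.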
Comment 23 Mike Shaver (:shaver -- probably not reading bugmail closely) 2009-12-21 15:10:57 PST
(In reply to comment #22)
> A simpler idea that would work is to add the page that contains EIP to the
> minidump. We could then refine that with tracer knowledge later.

file pls!
Comment 24 chris hofmann 2009-12-21 16:16:43 PST
(In reply to comment #22)
> 
> choffman: is there any way we can get these crash numbers compared to ADUs? I'm
> curious if the increase we see around b3 or so is simply due to more users, or
> if there is evidence that a patch/patches introduced more of these at that
> time.
> 

bugs are on file to get ADU data merged into the crash database so we can do things like that more easily.   until then I'm grabbing snaps from the two sources and pasting them together at https://wiki.mozilla.org/CrashKill/Crashr

file	crash-count	3.6b3 ADUs	3.6b4 ADUs

20091118-crashdata	21	18435	
20091119-crashdata	26	142847	
20091120-crashdata	30	207349	
20091121-crashdata	20	217975	
20091122-crashdata	21	243541	
20091123-crashdata	33	294307	
20091124-crashdata	24	321004	
20091125-crashdata	30	319230	
20091126-crashdata	50	313303	11003
20091127-crashdata	54	227788	101832
20091128-crashdata	70	111492	208895
20091129-crashdata	63	80372	262879
20091130-crashdata	77	79695	318380
20091201-crashdata	87	58951	354012
20091202-crashdata	43	47254	377984
20091203-crashdata	42	40100	394451
20091204-crashdata	46	34703	399269
20091205-crashdata	42	29512	375329
20091206-crashdata	45	26259	390387
20091207-crashdata	34	28124	447912
20091208-crashdata	54	26173	460269
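One way to read the table against the question in comment 22 is to normalize each day's crash count by that day's ADUs. The helper below is a sketch only: `crashesPer100kAdus` is a hypothetical name, and it assumes the crash-count column can reasonably be divided by the sum of the two beta ADU columns, which the table does not guarantee (the count may include crashes from other versions).

```cpp
#include <cassert>
#include <cmath>

// Crashes per 100,000 active daily users for one day of data.
// Returns 0 when there is no ADU data, to avoid dividing by zero.
inline double crashesPer100kAdus(unsigned crashes, unsigned long adus) {
    if (adus == 0) return 0.0;
    return 100000.0 * static_cast<double>(crashes) / static_cast<double>(adus);
}
```

Under that assumption, 20091122 (21 crashes, 243541 ADUs) works out to about 8.6 crashes per 100k ADUs, and 20091208 (54 crashes, 26173 + 460269 ADUs) to about 11.1.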
Comment 25 Brendan Eich [:brendan] 2009-12-21 22:01:25 PST
(In reply to comment #22)
> 1. dvander suggested stashing the script filename before we call the trace so
> that we can recover it from the crashreport. For example, we could copy it to a
> char[] buffer in ExecuteTree, and then it would be in the minidump. We could
> see if the same script keeps showing up, or if we're really lucky find a test
> case.

Not sure you need to copy the whole string to a stack buffer -- we know how perf-sensitive ExecuteTree/LeaveTree are -- but here's a fun fact: script filenames are GC'ed and shared aggressively, see js_SaveScriptFilename. The char buffer used is an extension of a JSHashEntry, so in the heap, but perhaps you could use a more concise id to track filename from the stack.
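As an illustration of the "more concise id" suggestion, the sketch below stores a fixed-size hash of the filename in a small record on the stack instead of copying the string. Everything here is an assumption made for the example: the choice of 32-bit FNV-1a, the `CrashBreadcrumb` record, its magic value, and `enterTraceWithBreadcrumb` are hypothetical and do not appear in the engine.

```cpp
#include <cassert>
#include <cstdint>

// 32-bit FNV-1a hash: a cheap, fixed-size identifier for a filename.
inline std::uint32_t filenameId(const char* filename) {
    std::uint32_t h = 2166136261u;  // FNV offset basis
    for (const char* p = filename; p && *p; ++p) {
        h ^= static_cast<unsigned char>(*p);
        h *= 16777619u;  // FNV prime
    }
    return h;
}

// Hypothetical marker record kept in the caller's stack frame. The magic
// value makes it easier to find when scanning stack memory in a minidump;
// volatile keeps the compiler from optimizing the stores away.
struct CrashBreadcrumb {
    volatile std::uint32_t magic;
    volatile std::uint32_t scriptId;
};

// Stand-in for the code that enters a trace; runTrace is a placeholder for
// the call into generated code.
template <typename Fn>
inline std::uint32_t enterTraceWithBreadcrumb(const char* filename, Fn runTrace) {
    CrashBreadcrumb crumb = {0x54524143u /* ASCII "TRAC" */, filenameId(filename)};
    runTrace();
    return crumb.scriptId;  // read back so the record stays observably live
}
```

The trade-off is that a hash cannot be turned back into a filename on its own; it only helps if the same id keeps recurring and can be matched against hashes of candidate script URLs. Hashing on every trace entry also has a cost, so a real version would more likely hash once when the script is created.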

> A simpler idea that would work is to add the page that contains EIP to the
> minidump. We could then refine that with tracer knowledge later.

+1000.

/be
Comment 26 Mats Palmgren (:mats) 2010-03-24 19:20:24 PDT
I'm in contact with a user who reported a reproducible crash
that looks like this bug. Here's his crash data:
http://crash-stats.mozilla.com/report/index/93852b95-98f9-47a4-9e5b-0b69b2100222

He's very cooperative and has created a test account for us on their server.
I can reproduce the crash (only on Windows though), my crashes:
bp-40101cbf-e63b-4bd6-9b48-6d6392100324	2010-03-25	01:29
bp-c4a8bc74-d38a-40d5-8e1d-109fc2100324	2010-03-25	01:28
bp-87e16b1d-961f-4524-8154-7d01b2100324	2010-03-25	01:28
bp-14631208-05bc-4db0-9160-f23ab2100324	2010-03-24	21:42
bp-aadbeecc-ef7d-4e2a-9d50-adfe42100324	2010-03-24	21:39
bp-b6991f9b-2792-4d21-9749-65ae82100324	2010-03-24	21:35

I can't reproduce it on trunk.  Nor on MacOSX or Linux, with any version.
The user says this crash started with Firefox 3.6, it never occurred with
3.5.x.
Comment 28 Mats Palmgren (:mats) 2010-03-25 02:02:38 PDT
User says the crash also occurs on Mac OS X 10.5 and 10.6 with Firefox 3.6.
Comment 29 Andreas Gal :gal 2010-03-25 16:40:17 PDT
I tried the STR with no luck (3.6.2, macosx, product build). I will try again with a debug build.
Comment 30 Mats Palmgren (:mats) 2010-04-05 05:08:45 PDT
Andreas, can you reproduce the crash?
Comment 31 Mats Palmgren (:mats) 2010-04-15 14:26:41 PDT
Still 100% reproducible for me.  Namoroka 3.6.5pre 20100415 on Windows XP.
Comment 32 Mats Palmgren (:mats) 2010-04-15 14:36:31 PDT
It's #53 in the Firefox 3.6.3 top crash list, with 5118 crashes (past 2 weeks).
http://crash-stats.mozilla.com/topcrasher/byversion/Firefox/3.6.3
Comment 33 Jesse Ruderman 2010-04-15 14:59:48 PDT
Andreas, what should we do here now that Mats can reproduce?  Would a corefile from Mats help?
Comment 34 Andreas Gal :gal 2010-04-15 15:03:11 PDT
I only tried mac. Let me go upstairs and find a windows box and try again there. If that fails too, we should figure out core files.
Comment 35 Mike Shaver (:shaver -- probably not reading bugmail closely) 2010-04-15 15:09:01 PDT
Mats: can you capture it in a VM and get a snapshot to Andreas?  Alternatively, we could try copilot to your machine, so Andreas or someone else can debug it live.
Comment 36 David Anderson [:dvander] 2010-04-15 15:10:30 PDT
Attaching Visual Studio or WinDbg and using the save memory feature might work, too.
Comment 37 Andreas Gal :gal 2010-04-15 15:11:36 PDT
dvander, I will stop by. We should try this out on windows before resorting to bigger guns.
Comment 38 Andreas Gal :gal 2010-04-15 15:26:49 PDT
reproduced
Comment 39 David Anderson [:dvander] 2010-04-15 17:35:58 PDT
I have narrowed this down to the assembly generated on line 11240 of jstracer.cpp in the 1.9.2 branch. This code is supposed to index into a typemap vector, but the base address is garbage. I will know more soon.
Comment 40 David Anderson [:dvander] 2010-04-15 18:28:25 PDT
Okay, I think I see what's going on here. The bogus address is 0x1E, stored in EBX.

>  mov ebx, [ebx + 0xC]
>  add ebx, 0x1E

This line is grabbing a FrameInfo* from the RP stack and adding |sizeof(FrameInfo) + 2|. 0xC/4 is the distance between the trace entry frame and the frame that owns the argsobj. That's 3.
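The arithmetic in those two instructions can be replayed with plain integers. The sketch below models 32-bit pointers as `std::uint32_t`; the names are hypothetical, and the implied `sizeof(FrameInfo)` is only what follows if the addend really is `sizeof(FrameInfo) + 2` as stated above.

```cpp
#include <cassert>
#include <cstdint>

// Model 32-bit pointers as plain integers so the arithmetic matches the
// x86 build regardless of the host platform.
using Addr32 = std::uint32_t;

constexpr Addr32 kSlotOffset = 0xC;   // displacement in "mov ebx, [ebx + 0xC]"
constexpr Addr32 kAddend     = 0x1E;  // immediate in "add ebx, 0x1E"
constexpr Addr32 kPtrSize    = 4;     // pointer size on 32-bit x86

// Index of the RP stack slot that the load reads.
constexpr Addr32 slotIndex() { return kSlotOffset / kPtrSize; }

// sizeof(FrameInfo) implied by the immediate, given that the addend is
// described as sizeof(FrameInfo) + 2.
constexpr Addr32 impliedFrameInfoSize() { return kAddend - 2; }

// What the two instructions compute from a given RP stack.
inline Addr32 typemapBase(const Addr32* rp) {
    Addr32 frameInfo = rp[slotIndex()];  // mov ebx, [ebx + 0xC]
    return frameInfo + kAddend;          // add ebx, 0x1E
}
```

A NULL entry in slot 3 therefore yields exactly the bogus 0x1E seen in EBX.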

So why is rp[3] NULL? This is an optimized build i.e. no trace spew, so reading the nearest guard jump:

>  006af307  jne 006df3f4
>  ... ... guard code
>  006df418  mov eax, 0x66DE698

Examining this address as a GuardRecord, and then recovering the VMSideExit, reveals the callDepth is 3.

RP uses 0-based indexes, so this is an off-by-one bug - rp[3] would be valid if |callDepth >= 4|. Test case and patch coming.
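The class of error is the usual inclusive-versus-exclusive bound on a 0-based array. The following sketch only illustrates that distinction with hypothetical helper names; it is not the code in jstracer.cpp, and the attached patch may fix the problem in a different place.

```cpp
#include <cassert>
#include <cstddef>

// With a call depth of N, the RP stack has valid entries at indexes
// 0 .. N - 1.

// Off-by-one form: accepts index == callDepth, one past the last valid slot.
inline bool slotValidBuggy(std::size_t callDepth, std::size_t index) {
    return index <= callDepth;
}

// Corrected form for 0-based indexing.
inline bool slotValidFixed(std::size_t callDepth, std::size_t index) {
    return index < callDepth;
}
```

With `callDepth == 3`, the buggy check lets a read of slot 3 through while the corrected one rejects it; slot 3 only becomes valid once `callDepth` is at least 4.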
Comment 41 David Anderson [:dvander] 2010-04-15 18:51:07 PDT
Created attachment 439422 [details]
test case

This bug does not exist on trunk; it happened to be fixed along with bug 495331. The test case does not crash (poisoning memory would do the trick), but you can see the problem because the type guard fails too often:

monitor: exits(16), timeouts(0), type mismatch(0), triggered(16), global mismatch(0), flushed(0)
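The remark about poisoning refers to a common debugging technique: overwrite slots that are not supposed to be read with a recognizable pattern, so that a stale read produces an obviously bad value instead of whatever happened to be left in memory. A minimal sketch, assuming a flat array of pointer-sized slots; the pattern and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Arbitrary, easy-to-spot pattern; the specific value is an assumption.
constexpr std::uintptr_t kPoison = 0xDADADADAu;

// Overwrite every slot at or above callDepth, i.e. every slot that valid
// code should never read.
inline void poisonUnusedSlots(std::uintptr_t* rp, std::size_t capacity,
                              std::size_t callDepth) {
    for (std::size_t i = callDepth; i < capacity; ++i)
        rp[i] = kPoison;
}

inline bool isPoisoned(std::uintptr_t value) { return value == kPoison; }
```

With the unused slots poisoned, the out-of-range read of slot 3 at call depth 3 would load the pattern rather than a leftover plausible pointer, which makes a crash or an assertion far more likely in a test run.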
Comment 42 David Anderson [:dvander] 2010-04-15 18:52:26 PDT
Created attachment 439423 [details] [diff] [review]
fix

monitor: exits(2), timeouts(0), type mismatch(0), triggered(2), global mismatch(0), flushed(0)
Comment 45 Marcia Knous [:marcia - use ni] 2010-04-26 12:04:18 PDT
dvander: Can you please check out Bug 561813? On Mac I get this crash running the trunk and the URL in that bug. See the last bug comment for the link to my crash report.
Comment 46 Damon Sicore (:damons) 2010-05-11 13:38:56 PDT
dvander, any progress here?
Comment 47 David Anderson [:dvander] 2010-05-11 14:00:01 PDT
(In reply to comment #46)
> dvander, any progress here?

This is waiting on approval. I don't know when that happens.
Comment 48 Mike Beltzner [:beltzner, not reading bugmail] 2010-05-18 13:49:35 PDT
Comment on attachment 439423 [details] [diff] [review]
fix

a=beltzner for 1.9.2 default only
Comment 50 Marcia Knous [:marcia - use ni] 2010-06-21 12:31:20 PDT
I have seen a few crashes showing up in crash stats with this stack (http://tinyurl.com/2e2sdqv links to the Mac crashes) - I can crash in this stack by loading https://home.eease.adp.com/recruit2/?id=510443&t=2 using Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.3a6pre) Gecko/20100621 Minefield/3.7a6pre. Should I reopen this bug or file a new one?
Comment 51 David Mandelin [:dmandelin] 2010-06-21 12:32:14 PDT
(In reply to comment #50)
> I have seen a few crashes showing up in crash stats with this stack
> (http://tinyurl.com/2e2sdqv links to the Mac crashes) - I can crash in this
> stack by loading https://home.eease.adp.com/recruit2/?id=510443&t=2 using
> Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.3a6pre)
> Gecko/20100621 Minefield/3.7a6pre. Should I reopen this bug or file a new one?

Since this bug is already patched, let's do a new one.
Comment 52 Marcia Knous [:marcia - use ni] 2010-06-21 13:36:34 PDT
Bug 573558 is the new bug on file for the crash noted in Comment 50.
Comment 53 Al Billings [:abillings] 2010-07-01 12:32:56 PDT
(In reply to comment #44)
> crash on load 1.9.2 winxp
> http://www.roadsafetraffic.com/locations.htm
> bp-02c5e6df-1749-4be3-86e2-520822100423
> 
> http://www.srssa.com/contact/
> bp-d7bd2b01-0642-4359-b380-daa142100423

I used these to verify the fix. Both of these still crash in 1.9.2.6 but don't crash in build 1 of 1.9.2.7 on Win XP: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.7) Gecko/20100701 Firefox/3.6.7 (.NET CLR 3.5.30729).
