Last Comment Bug 523745 - Crash in [@ js_EqualStrings ]
: Crash in [@ js_EqualStrings ]
Status: RESOLVED FIXED
[3.6b1][#5 Firefox 3.6b1 topcrash]
: crash, topcrash
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: 1.9.2 Branch
: x86 All
: -- critical (vote)
: ---
Assigned To: David Mandelin [:dmandelin]
:
:
Mentors:
Depends on: 521169
Blocks:
  Show dependency treegraph
 
Reported: 2009-10-21 16:53 PDT by Marcia Knous [:marcia - use ni]
Modified: 2011-09-26 23:34 PDT (History)
14 users (show)
sayrer: blocking1.9.2+
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
beta4-fixed


Attachments
Diagnostic patch (989 bytes, patch)
2009-11-13 16:15 PST, David Mandelin [:dmandelin]
dvander: review+
Details | Diff | Splinter Review

Description Marcia Knous [:marcia - use ni] 2009-10-21 16:53:59 PDT
Seen while running Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2b1) Gecko/20091019 Firefox/3.6b1.

While running a BFT I stepped away from my machine for a moment and I crashed in this stack. 

Bug 451873 appears similar although it was filed a while back.

Here are my extensions:
Nightly Tester Tools	2.0.2	false	{8620c15f-30dc-4dba-a131-7c5d20cf4a29}
Yahoo! Toolbar	1.6.6.20090220	false	{635abd67-4fe9-1b23-4f01-e679fa7484c1}
FlashGot	1.2.0.7	true	{19503e42-ca3c-4c27-b1e2-9cdb2170ee34}
NoScript	1.9.9.11	true	{73a6fe31-595d-460b-a920-fcc0f8843232}
WindFox	1.0.2	true	WindFox@windfinder.com
Save Session	1.3.1.6	true	savesession@noasobi.net
Pray Times!	1.1.3	true	azan-times@hamid.net
Custom Buttons²	3.0.1	true	CustomButtons2@cbtnext.org
Reload Plus	1.0	true	reloadplus@blackwind
Comment 1 Carsten Book [:Tomcat] 2009-10-22 16:07:32 PDT
marcia, do you have a crashreporter id ?
Was this crash reproducible without all this extensions ?
Comment 2 Marcia Knous [:marcia - use ni] 2009-10-22 16:14:48 PDT
Have not been able to reproduce this crash. Will post the crash ID when I am next in the lab because I don't see the report on this 10.6 machine.
Comment 4 Oliver Saier 2009-10-24 02:35:41 PDT
Happens on win xp too, apparently while random browsing.
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2b2pre) Gecko/20091023 Firefox/3.0.1 (.NET CLR 3.5.30729)
http://crash-stats.mozilla.com/report/index/bp-061e7814-68bc-4553-92c3-bbc072091024
http://crash-stats.mozilla.com/report/index/62ec802f-a19e-4b8f-8d5e-f9f612091023
http://crash-stats.mozilla.com/report/index/38361596-5693-4ec9-bd34-530ce2091023
Comment 5 Henrik Skupin (:whimboo) [away 09/30 - 10/06] 2009-10-28 02:14:43 PDT
Bug 497618 made changes around this area for 1.9.2. I wonder if it is somehow related. Looks like that we access random memory at 0x10d29000 while trying to get the string length:

http://hg.mozilla.org/releases/mozilla-1.9.2/annotate/4b0e75e126f1/js/src/jsstr.cpp#l3329

n = str1->length();
Comment 6 chris hofmann 2009-11-03 08:06:12 PST
very early in the 3.6b1 data gathering but this is near the top of the top crash list


report for the stack signature js_EqualStrings 
42 total crashes for js_EqualStrings on 20091102-crashdata.csv
4 start up crashes inside 3 minutes

actually a couple of reports of this signature on 3.5.4 and 3.0.15 but it happens way less there.  we could check landings on those branches for suspect code.

distribution of all versions where the js_EqualStrings crash was found on 20091102-crashdata.csv
  40 Firefox 3.6b1
   1 Firefox 3.5.4
   1 Firefox 3.0.15

____________number of uniq sites found with this signature:
      40
   2 js_EqualStrings http://maps.google.com.br/maps?hl=pt-BR&tab=wl
   2 js_EqualStrings \N
   1 js_EqualStrings https://www.cellularsouth.com/cscommerce/products/phones/product_phone_detail.jsp?navAction=push&navCount=0&id=prod26570031
   1 js_EqualStrings https://myuni.adelaide.edu.au/webapps/lobj-wiki-bb_bb60/editor/plugins/lobj_campus_pack/image.htm
   1 js_EqualStrings http://www17.24h.com.vn/index.php
   1 js_EqualStrings http://www.zino.gr/
   1 js_EqualStrings http://www.yandex.ru/
   1 js_EqualStrings http://www.tudou.com/
   3 js_EqualStrings http://www.tagged.com/profile.html?  -profile uids removed

   1 js_EqualStrings http://www.tagged.com/meetme.html
   1 js_EqualStrings http://www.tagged.com/home.html
   1 js_EqualStrings http://www.tagged.com/friends.html#tab=2
   1 js_EqualStrings http://www.pctools.com/guides/registry/detail/1106/
   1 js_EqualStrings http://www.netflix.com/Queue?lnkce=sntQu&lnkctr=mhbque
   1 js_EqualStrings http://www.nate.com/?f=cysvc
   1 js_EqualStrings http://www.facebook.com/home.php?#/inbox/?folder=[fb]messages&page=1&tid=1146629790696
   1 js_EqualStrings http://www.engadget.com/2009/05/25/lenovos-ion-based-s12-makes-netbooks-exciting-again/
   1 js_EqualStrings http://www.buick.com/vehicles/2010/lacrosse/overview.do?cmp=OLA_Buick_LaCrosse_Yahoo
   1 js_EqualStrings http://www.buick.com/vehicles/2010/lacrosse/build.do
   1 js_EqualStrings http://www.buick.com/tools/byo/byoCustomizeVehicle.do?year=2010&brand=lacrosse&title=&evar10=byo_path_buildbytrim&pvc=471
Comment 7 Julian Seward [:jseward] 2009-11-03 09:55:35 PST
From

  http://www.tagged.com/home.html

I got the trace below (read-after-free), which has previously been
reported as bug 515799.  I suspect it's a red herring though; and I
couldn't see any other badness on the dozen or so of the other
above-listed URLs I tried.  This was on x64-linux, so no JITting aiui,
so perhaps not a good test.

 Invalid read of size 4
    at 0x5994686: nsViewManager::ViewToWidget(nsView*, nsView*, nsRect const&) const (nsIDeviceContext.h:284)
    by 0x5994ED7: nsViewManager::UpdateWidgetArea(nsView*, nsIWidget*, nsRegion const&, nsView*) (nsViewManager.cpp:813)
    by 0x599508E: nsViewManager::UpdateView(nsIView*, nsRect const&, unsigned int) (nsViewManager.cpp:844)
    by 0x5992300: nsViewManager::UpdateView(nsIView*, unsigned int) (nsViewManager.cpp:602)
    by 0x5993B45: nsViewManager::RemoveChild(nsIView*) (nsViewManager.cpp:1420)
    by 0x598F3E8: nsView::~nsView() (nsView.cpp:224)
    by 0x598E41E: nsIView::Destroy() (nsView.cpp:294)
    by 0x5993EA2: nsViewManager::~nsViewManager() (nsViewManager.cpp:180)
    by 0x59921EC: nsViewManager::Release() (nsViewManager.cpp:226)
    by 0x5EC9F13: nsCOMPtr_base::~nsCOMPtr_base() (nsCOMPtr.cpp:81)
    by 0x569F5F6: DocumentViewerImpl::~DocumentViewerImpl() (nsCOMPtr.h:469)
    by 0x56985D8: DocumentViewerImpl::Release() (nsDocumentViewer.cpp:570)
  Address 0x21a2a838 is 8 bytes inside a block of size 128 free'd
    at 0x4C23A9E: operator delete(void*) (vg_replace_malloc.c:346)
    by 0x5E2FD67: nsThebesDeviceContext::~nsThebesDeviceContext() (nsThebesDeviceContext.cpp:351)
    by 0x5E2E7EE: nsThebesDeviceContext::Release() (nsThebesDeviceContext.cpp:295)
    by 0x56AFD55: nsPresContext::~nsPresContext() (nsPresContext.cpp:309)
    by 0x56AE0C5: nsPresContext::Release() (nsPresContext.cpp:322)
    by 0x5EC9FF0: nsCOMPtr_base::assign_with_AddRef(nsISupports*) (nsCOMPtr.h:456)
    by 0x569F487: DocumentViewerImpl::Destroy() (nsCOMPtr.h:640)
    by 0x569E2FE: DocumentViewerImpl::Show() (nsDocumentViewer.cpp:1914)
    by 0x56ACFDD: nsPresContext::EnsureVisible() (nsPresContext.cpp:1583)
    by 0x56B345F: PresShell::UnsuppressAndInvalidate() (nsPresShell.cpp:4611)
    by 0x56B3577: PresShell::UnsuppressPainting() (nsPresShell.cpp:4653)
    by 0x56B15BD: PresShell::sPaintSuppressionCallback(nsITimer*, void*) (nsPresShell.cpp:2724)
Comment 8 chris hofmann 2009-11-03 17:12:25 PST
  42 crashes for js_EqualStrings on 2009 11 02

os breakdown
  18 js_EqualStrings Windows NT 5.1.2600 Service Pack 3
   9 js_EqualStrings Windows NT 5.1.2600 Service Pack 2
   7 js_EqualStrings Windows NT 6.1.7600
   2 js_EqualStrings Windows NT 6.0.6002 Service Pack 2
   1 js_EqualStrings Windows NT 6.1.7100
   1 js_EqualStrings Windows NT 6.0.6001 Service Pack 1
   1 js_EqualStrings Windows NT 6.0.6000
   1 js_EqualStrings Mac OS X 10.6.1 10B504
   1 js_EqualStrings Mac OS X 10.5.8 9L31a
   1 js_EqualStrings Mac OS X 10.5.8 9L30
Comment 9 David Mandelin [:dmandelin] 2009-11-04 17:55:08 PST
Taking, although I may not have time to get to this for a while. Anyone who wants should feel free to steal.
Comment 10 David Mandelin [:dmandelin] 2009-11-12 16:22:01 PST
We believe this is a GC-with-tracing issue:

- The crash occurs at the start of js_EqualStrings where we try to get the length of the first argument. The JSString* is always 8-byte-aligned and always has a reasonable address, so it makes sense to guess that it is a "real" JSString* (and not another value that got mistagged, or a corrupted value). In all the examples that I looked at, the js_EqualStrings is called from trace. Notably, the code that records JSOP_LOOKUPSWITCH bakes a JSString* onto the trace, which could potentially get GC'd if it is not properly rooted. That string is also used as the first argument to js_EqualStrings.

- The first crash with this signature came from an 8/31 build. The last TM merge before that was on 8/25. There were 2 main GC/string-related things that came in there. One was that we GC without flushing traces, thus introducing the possibility for bugs with the referents of pointers baked into traces getting GC'd. The second was the atomized unit string table. That patch did have a GC bug, which was fixed as part of the same merge. That second change seems an unlikely cause of this because it should also cause crashes in the interpreter.

- The rate of these crashes on trunk decreased by 90% (from 5-20 per week to 1-2 per week) starting with the 10/17 build. The GC/string-related change in the TM merge of 10/16 fixed a problem where many traces were not being used as GC roots, thus causing objects baked into traces to be collected early.

So, we think the problem is that some string pointers baked into traces are not being marked by the GC because they are not rooted, or are being skipped due to bugs in the code that marks their roots.

Possible things to do going forward:

- diagnostic: invert the argument order in the call to js_EqualStrings in recorded code for JSOP_LOOKUPTABLE on trunk to see if that makes us crash at the second point.

- candidate mitigation: land bug 521169 to 1.9.2, which will reduce these by 90% if our guess here is right.

- further study: audit the code that marks objects referred to on trace fragments to see if there are still things being missed post bug 521169.

- diagnostic/whack-it-with-a-hammer: abort tracing JSOP_LOOKUPTABLE if the case expression is a string. If we are right, this should eliminate the problem, and if it doesn't, it will tell us that other calls (which are all in LIR for == and ===) are all/part of the problem.
Comment 11 David Anderson [:dvander] 2009-11-12 17:32:36 PST
bug 528048 - we have more rooting bugs still. maybe related.
Comment 12 David Mandelin [:dmandelin] 2009-11-13 16:15:14 PST
Created attachment 412332 [details] [diff] [review]
Diagnostic patch

Diagnostic patch to see if we are crashing on this particular trace-generated call to js_EqualStrings.
Comment 13 David Mandelin [:dmandelin] 2009-11-13 16:24:54 PST
Diagnostic patch landed to m-c:

http://hg.mozilla.org/mozilla-central/rev/ed50524641ba
Comment 14 David Mandelin [:dmandelin] 2009-11-13 18:07:24 PST
I did an audit of uses of insImm* and INS_CONST* to see if there are any potential bugs there from failing to mark things. I didn't find much, but I found 2 questionable lines of code that seem to bake constant pointers into the trace that need to be roots:

    near line 10654
    // vp[0] is the callee.
    lir->insStorei(INS_CONSTWORD(OBJECT_TO_JSVAL(funobj)), invokevp_ins, 0);

    near line 11480
    LIns* args[] = {vp_ins, INS_CONSTPTR(sprop), obj_ins, cx_ins};

I also saw 2 manual roots that I don't understand. Probably they are needed, but I wouldn't mind learning why:

    in guardCallee:
    treeInfo->gcthings.addUnique(callee);

    in interpretedFunctionCall:
    treeInfo->gcthings.addUnique(callee);
Comment 15 David Mandelin [:dmandelin] 2009-11-13 18:29:57 PST
OK. I looked at the historical data more carefully and I found that dvander's patch that greatly reduced the number of these crashes on trunk hit m-c [1] on 10/16/2009 at 10:35pm. Thus, the last nightly that did not have this fix was built on 10/16. 

Looking on crash-stats.com, I found that there are no crashes with this signature with build dates later than 10/16.

Thus, bug 521169 is the fix (this is actually a dup of that bug) and just needs to be landed on 1.9.2. 

This bug does not affect 1.9.1.

[1] http://hg.mozilla.org/mozilla-central/rev/dde13d040e44
[2] http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=f7ccf195ccba&tochange=9224d55364e8

* Random thoughts:

  - Although all the data I need for this kind of historical analysis is available through the web interface, it seems like it always takes a few iterations. This "proves" that we need to be able to run our queries directly instead of using scripts and/or cobbled-together web query results.

  - I suspect part of the reason bug 521169 was introduced is that there is no standard API for iterating over all the trace trees/anchor fragments. Instead, code to do this is duplicated in 3-4 places. We should fix this.
Comment 16 chris hofmann 2009-11-13 18:48:27 PST
(In reply to comment #15)
> 
> * Random thoughts:
> 
>   - Although all the data I need for this kind of historical analysis is
> available through the web interface, it seems like it always takes a few
> iterations. This "proves" that we need to be able to run our queries directly
> instead of using scripts and/or cobbled-together web query results.

definitely agree. figuring out some "verification query" is a good idea. it would present the crash date and build info in some what that would easily answer the question "is this bugger still happening? and if/so, when and where?"  

maybe file a bug and include the process/scripts you cobbled-together and we and use step-wise refinement to make that set of steps faster and more reliable.  we could also kick this idea around at the next monday crashkill meeting.
Comment 17 David Mandelin [:dmandelin] 2009-11-16 14:12:47 PST
The fix landed to 1.9.2 with the 11/14 build and there are currently no crashes for this for that build or later (and there are crashes for previous builds that have occurred since then). So I think we can call this fixed; I'll verify in a week.

Note You need to log in before you can comment on or make changes to this bug.