Closed Bug 523745 Opened 12 years ago Closed 11 years ago

Crash in [@ js_EqualStrings ]


(Core :: JavaScript Engine, defect)

1.9.2 Branch
Not set



Tracking Status
status1.9.2 --- beta4-fixed


(Reporter: marcia, Assigned: dmandelin)



(Keywords: crash, topcrash, Whiteboard: [3.6b1][#5 Firefox 3.6b1 topcrash])

Crash Data


(1 file)

Seen while running Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2b1) Gecko/20091019 Firefox/3.6b1.

While running a BFT I stepped away from my machine for a moment and I crashed in this stack. 

Bug 451873 appears similar although it was filed a while back.

Here are my extensions:
Nightly Tester Tools	2.0.2	false	{8620c15f-30dc-4dba-a131-7c5d20cf4a29}
Yahoo! Toolbar	false	{635abd67-4fe9-1b23-4f01-e679fa7484c1}
FlashGot	true	{19503e42-ca3c-4c27-b1e2-9cdb2170ee34}
NoScript	true	{73a6fe31-595d-460b-a920-fcc0f8843232}
WindFox	1.0.2	true
Save Session	true
Pray Times!	1.1.3	true
Custom Buttons²	3.0.1	true
Reload Plus	1.0	true	reloadplus@blackwind
Whiteboard: [3.6b1]
marcia, do you have a crashreporter id ?
Was this crash reproducible without all this extensions ?
Severity: normal → critical
Have not been able to reproduce this crash. Will post the crash ID when I am next in the lab because I don't see the report on this 10.6 machine.
Happens on win xp too, apparently while random browsing.
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2b2pre) Gecko/20091023 Firefox/3.0.1 (.NET CLR 3.5.30729)
OS: Mac OS X → All
Bug 497618 made changes around this area for 1.9.2. I wonder if it is somehow related. Looks like that we access random memory at 0x10d29000 while trying to get the string length:

n = str1->length();
Group: core-security
Flags: blocking1.9.2?
Keywords: topcrash
Whiteboard: [3.6b1] → [3.6b1][#12 3.6b1 topcrash]
Whiteboard: [3.6b1][#12 3.6b1 topcrash] → [3.6b1]
very early in the 3.6b1 data gathering but this is near the top of the top crash list

report for the stack signature js_EqualStrings 
42 total crashes for js_EqualStrings on 20091102-crashdata.csv
4 start up crashes inside 3 minutes

actually a couple of reports of this signature on 3.5.4 and 3.0.15 but it happens way less there.  we could check landings on those branches for suspect code.

distribution of all versions where the js_EqualStrings crash was found on 20091102-crashdata.csv
  40 Firefox 3.6b1
   1 Firefox 3.5.4
   1 Firefox 3.0.15

____________number of uniq sites found with this signature:
   2 js_EqualStrings
   2 js_EqualStrings \N
   1 js_EqualStrings
   1 js_EqualStrings
   1 js_EqualStrings
   1 js_EqualStrings
   1 js_EqualStrings
   1 js_EqualStrings
   3 js_EqualStrings  -profile uids removed

   1 js_EqualStrings
   1 js_EqualStrings
   1 js_EqualStrings
   1 js_EqualStrings
   1 js_EqualStrings
   1 js_EqualStrings
   1 js_EqualStrings[fb]messages&page=1&tid=1146629790696
   1 js_EqualStrings
   1 js_EqualStrings
   1 js_EqualStrings
   1 js_EqualStrings

I got the trace below (read-after-free), which has previously been
reported as bug 515799.  I suspect it's a red herring though; and I
couldn't see any other badness on the dozen or so of the other
above-listed URLs I tried.  This was on x64-linux, so no JITting aiui,
so perhaps not a good test.

 Invalid read of size 4
    at 0x5994686: nsViewManager::ViewToWidget(nsView*, nsView*, nsRect const&) const (nsIDeviceContext.h:284)
    by 0x5994ED7: nsViewManager::UpdateWidgetArea(nsView*, nsIWidget*, nsRegion const&, nsView*) (nsViewManager.cpp:813)
    by 0x599508E: nsViewManager::UpdateView(nsIView*, nsRect const&, unsigned int) (nsViewManager.cpp:844)
    by 0x5992300: nsViewManager::UpdateView(nsIView*, unsigned int) (nsViewManager.cpp:602)
    by 0x5993B45: nsViewManager::RemoveChild(nsIView*) (nsViewManager.cpp:1420)
    by 0x598F3E8: nsView::~nsView() (nsView.cpp:224)
    by 0x598E41E: nsIView::Destroy() (nsView.cpp:294)
    by 0x5993EA2: nsViewManager::~nsViewManager() (nsViewManager.cpp:180)
    by 0x59921EC: nsViewManager::Release() (nsViewManager.cpp:226)
    by 0x5EC9F13: nsCOMPtr_base::~nsCOMPtr_base() (nsCOMPtr.cpp:81)
    by 0x569F5F6: DocumentViewerImpl::~DocumentViewerImpl() (nsCOMPtr.h:469)
    by 0x56985D8: DocumentViewerImpl::Release() (nsDocumentViewer.cpp:570)
  Address 0x21a2a838 is 8 bytes inside a block of size 128 free'd
    at 0x4C23A9E: operator delete(void*) (vg_replace_malloc.c:346)
    by 0x5E2FD67: nsThebesDeviceContext::~nsThebesDeviceContext() (nsThebesDeviceContext.cpp:351)
    by 0x5E2E7EE: nsThebesDeviceContext::Release() (nsThebesDeviceContext.cpp:295)
    by 0x56AFD55: nsPresContext::~nsPresContext() (nsPresContext.cpp:309)
    by 0x56AE0C5: nsPresContext::Release() (nsPresContext.cpp:322)
    by 0x5EC9FF0: nsCOMPtr_base::assign_with_AddRef(nsISupports*) (nsCOMPtr.h:456)
    by 0x569F487: DocumentViewerImpl::Destroy() (nsCOMPtr.h:640)
    by 0x569E2FE: DocumentViewerImpl::Show() (nsDocumentViewer.cpp:1914)
    by 0x56ACFDD: nsPresContext::EnsureVisible() (nsPresContext.cpp:1583)
    by 0x56B345F: PresShell::UnsuppressAndInvalidate() (nsPresShell.cpp:4611)
    by 0x56B3577: PresShell::UnsuppressPainting() (nsPresShell.cpp:4653)
    by 0x56B15BD: PresShell::sPaintSuppressionCallback(nsITimer*, void*) (nsPresShell.cpp:2724)
Summary: Crash in [@js_EqualStrings ] → Crash in [@ js_EqualStrings ]
  42 crashes for js_EqualStrings on 2009 11 02

os breakdown
  18 js_EqualStrings Windows NT 5.1.2600 Service Pack 3
   9 js_EqualStrings Windows NT 5.1.2600 Service Pack 2
   7 js_EqualStrings Windows NT 6.1.7600
   2 js_EqualStrings Windows NT 6.0.6002 Service Pack 2
   1 js_EqualStrings Windows NT 6.1.7100
   1 js_EqualStrings Windows NT 6.0.6001 Service Pack 1
   1 js_EqualStrings Windows NT 6.0.6000
   1 js_EqualStrings Mac OS X 10.6.1 10B504
   1 js_EqualStrings Mac OS X 10.5.8 9L31a
   1 js_EqualStrings Mac OS X 10.5.8 9L30
Whiteboard: [3.6b1] → [3.6b1][#6 Firefox 3.6b1 topcrash]
Taking, although I may not have time to get to this for a while. Anyone who wants should feel free to steal.
Assignee: general → dmandelin
Flags: blocking1.9.2? → blocking1.9.2+
Whiteboard: [3.6b1][#6 Firefox 3.6b1 topcrash] → [3.6b1][#5 Firefox 3.6b1 topcrash]
We believe this is a GC-with-tracing issue:

- The crash occurs at the start of js_EqualStrings where we try to get the length of the first argument. The JSString* is always 8-byte-aligned and always has a reasonable address, so it makes sense to guess that it is a "real" JSString* (and not another value that got mistagged, or a corrupted value). In all the examples that I looked at, the js_EqualStrings is called from trace. Notably, the code that records JSOP_LOOKUPSWITCH bakes a JSString* onto the trace, which could potentially get GC'd if it is not properly rooted. That string is also used as the first argument to js_EqualStrings.

- The first crash with this signature came from an 8/31 build. The last TM merge before that was on 8/25. There were 2 main GC/string-related things that came in there. One was that we GC without flushing traces, thus introducing the possibility for bugs with the referents of pointers baked into traces getting GC'd. The second was the atomized unit string table. That patch did have a GC bug, which was fixed as part of the same merge. That second change seems an unlikely cause of this because it should also cause crashes in the interpreter.

- The rate of these crashes on trunk decreased by 90% (from 5-20 per week to 1-2 per week) starting with the 10/17 build. The GC/string-related change in the TM merge of 10/16 fixed a problem where many traces were not being used as GC roots, thus causing objects baked into traces to be collected early.

So, we think the problem is that some string pointers baked into traces are not being marked by the GC because they are not rooted, or are being skipped due to bugs in the code that marks their roots.

Possible things to do going forward:

- diagnostic: invert the argument order in the call to js_EqualStrings in recorded code for JSOP_LOOKUPTABLE on trunk to see if that makes us crash at the second point.

- candidate mitigation: land bug 521169 to 1.9.2, which will reduce these by 90% if our guess here is right.

- further study: audit the code that marks objects referred to on trace fragments to see if there are still things being missed post bug 521169.

- diagnostic/whack-it-with-a-hammer: abort tracing JSOP_LOOKUPTABLE if the case expression is a string. If we are right, this should eliminate the problem, and if it doesn't, it will tell us that other calls (which are all in LIR for == and ===) are all/part of the problem.
bug 528048 - we have more rooting bugs still. maybe related.
Attached patch Diagnostic patchSplinter Review
Diagnostic patch to see if we are crashing on this particular trace-generated call to js_EqualStrings.
Attachment #412332 - Flags: review?(dvander)
Attachment #412332 - Flags: review?(dvander) → review+
I did an audit of uses of insImm* and INS_CONST* to see if there are any potential bugs there from failing to mark things. I didn't find much, but I found 2 questionable lines of code that seem to bake constant pointers into the trace that need to be roots:

    near line 10654
    // vp[0] is the callee.
    lir->insStorei(INS_CONSTWORD(OBJECT_TO_JSVAL(funobj)), invokevp_ins, 0);

    near line 11480
    LIns* args[] = {vp_ins, INS_CONSTPTR(sprop), obj_ins, cx_ins};

I also saw 2 manual roots that I don't understand. Probably they are needed, but I wouldn't mind learning why:

    in guardCallee:

    in interpretedFunctionCall:
OK. I looked at the historical data more carefully and I found that dvander's patch that greatly reduced the number of these crashes on trunk hit m-c [1] on 10/16/2009 at 10:35pm. Thus, the last nightly that did not have this fix was built on 10/16. 

Looking on, I found that there are no crashes with this signature with build dates later than 10/16.

Thus, bug 521169 is the fix (this is actually a dup of that bug) and just needs to be landed on 1.9.2. 

This bug does not affect 1.9.1.


* Random thoughts:

  - Although all the data I need for this kind of historical analysis is available through the web interface, it seems like it always takes a few iterations. This "proves" that we need to be able to run our queries directly instead of using scripts and/or cobbled-together web query results.

  - I suspect part of the reason bug 521169 was introduced is that there is no standard API for iterating over all the trace trees/anchor fragments. Instead, code to do this is duplicated in 3-4 places. We should fix this.
(In reply to comment #15)
> * Random thoughts:
>   - Although all the data I need for this kind of historical analysis is
> available through the web interface, it seems like it always takes a few
> iterations. This "proves" that we need to be able to run our queries directly
> instead of using scripts and/or cobbled-together web query results.

definitely agree. figuring out some "verification query" is a good idea. it would present the crash date and build info in some what that would easily answer the question "is this bugger still happening? and if/so, when and where?"  

maybe file a bug and include the process/scripts you cobbled-together and we and use step-wise refinement to make that set of steps faster and more reliable.  we could also kick this idea around at the next monday crashkill meeting.
The fix landed to 1.9.2 with the 11/14 build and there are currently no crashes for this for that build or later (and there are crashes for previous builds that have occurred since then). So I think we can call this fixed; I'll verify in a week.
Closed: 11 years ago
Depends on: 521169
Resolution: --- → FIXED
Crash Signature: [@ js_EqualStrings ]
Group: core-security
You need to log in before you can comment on or make changes to this bug.