Closed Bug 943366 Opened 8 years ago Closed 8 years ago

crash in js::ExclusiveContext::getNewType(js::Class const*, js::TaggedProto, JSFunction*)

Categories: Core :: JavaScript Engine, defect
Version: 27 Branch
Hardware: x86 / All
Type: defect
Priority: Not set
Severity: critical

Tracking

Status: VERIFIED DUPLICATE of bug 951528
Tracking Status:
firefox26 - verified
firefox27 + verified
firefox28 --- verified
firefox29 --- verified

People

(Reporter: tracy, Assigned: djvj)

References

Details

(Keywords: crash)

Crash Data

Attachments

(1 file, 1 obsolete file)

3.46 KB, text/plain
This bug was filed from the Socorro interface and is 
report bp-35243c58-3291-42aa-9a55-227472131125.
=============================================================

This has entered the top 10 (#8) on Aurora (Fx27). It is also present on Beta (Fx26) and Nightly (Fx28), though not appearing on those top-crash lists.
  
Crash Reason:
EXCEPTION_ACCESS_VIOLATION_READ (Fx26, Fx27 and Fx28) (most reports have this reason).

However, Fx26 is also showing several
EXCEPTION_ACCESS_VIOLATION_WRITE (eg. https://crash-stats.mozilla.com/report/index/51e63431-9a67-481b-a905-a8b392131123)
and one
EXCEPTION_ILLEGAL_INSTRUCTION (eg. https://crash-stats.mozilla.com/report/index/fa58e53b-5a8f-4978-9236-9f0262131122)
Requesting tracking for 27 because of topcrash status there and for 26 for investigation into the potential security risk there.
Dan, can you expedite a look at this bug for sec risk in FF26? We're going to build the RC and final beta today.
Flags: needinfo?(dveditz)
Talked with Dveditz on IRC and there's no immediate concern here that would make this a blocker for FF26, but we will track for FF27 as a topcrasher. We should definitely look into this further, as well as pull in the other bugs involving js::ExclusiveContext -- over to Naveed for assignment.
Flags: needinfo?(dveditz) → needinfo?(nihsanullah)
Assignee: nobody → jcoppeard
Flags: needinfo?(nihsanullah)
This hasn't been caused by the recent GC-related changes in getNewType() made by bug 939993, as those haven't landed on Aurora or Beta, where this crash is also occurring.

Looking at the crash stats, the first builds where this shows up are 2013-10-25 for Aurora, 2013-10-30 for nightly and all the way back to 2013-09-20 for version 26 (currently beta), although that has a slightly different signature.

Unassigning myself since I'm not making any progress on this and I couldn't find anything obvious that would cause this.
Assignee: jcoppeard → nobody
The crash is in JIT code, but fortunately there are enough symbol references to make it recognizable:

07da99c7 8b6804          mov     ebp,dword ptr [eax+4]
07da99ca 8b6d00          mov     ebp,dword ptr [ebp]
07da99cd 81fd68302166    cmp     ebp,offset mozjs!js::ProxyObject::callableClass_ (66213068)
07da99d3 0f840d0b0000    je      07daa4e6
07da99d9 81fdd0162166    cmp     ebp,offset mozjs!js::ProxyObject::uncallableClass_ (662116d0)
07da99df 0f84010b0000    je      07daa4e6
07da99e5 81fd50312166    cmp     ebp,offset mozjs!js::OuterWindowProxyObject::class_ (66213150)
07da99eb 0f84f50a0000    je      07daa4e6
07da99f1 f7450440000000  test    dword ptr [ebp+4],40h

This looks like branchTestObjectTruthy: http://mxr.mozilla.org/mozilla-central/source/js/src/jit/IonMacroAssembler.h#917

The crash occurs at 07da99c7 because eax (I'm guessing a JSObject* in objReg?) has the value of 1.
Looking at the context before EIP, this is actually part of CodeGenerator::testValueTruthyKernel. I can see the tag tests for undefined, null, boolean, int32, and finally object before doing the branchTestObjectTruthy bit in comment 6.

So, testValueTruthyKernel was passed a |value| whose tag indicates object but whose payload is 1.
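To make the dispatch order described above concrete, here is a rough C++ sketch of the tag tests (undefined, null, boolean, int32, then object) that testValueTruthyKernel performs. The type names, tag enum, and flag constant are illustrative assumptions, not SpiderMonkey's real definitions; the real object path also special-cases the proxy classes visible in the disassembly before testing the emulates-undefined class flag (the `test dword ptr [ebp+4],40h`).

```cpp
#include <cstdint>

// Illustrative stand-ins for the engine's value tags and object layout.
enum class Tag { Undefined, Null, Boolean, Int32, Object, Other };

struct Class  { uint32_t flags; };   // hypothetical class descriptor
struct Object { Class* clasp; };     // hypothetical object header

struct Value {
    Tag tag;            // type tag word
    uintptr_t payload;  // int, bool, or object-pointer bits
};

// Assumed flag value, mirroring the 0x40 tested in the disassembly.
constexpr uint32_t JSCLASS_EMULATES_UNDEFINED = 0x40;

bool valueTruthy(const Value& v) {
    switch (v.tag) {
    case Tag::Undefined:
    case Tag::Null:
        return false;                // always falsy
    case Tag::Boolean:
    case Tag::Int32:
        return v.payload != 0;       // falsy iff zero
    case Tag::Object: {
        // This is the load that faults in the crash: if the payload is
        // the bogus pointer 0x1, dereferencing the object header reads
        // from an invalid address.
        Object* obj = reinterpret_cast<Object*>(v.payload);
        return (obj->clasp->flags & JSCLASS_EMULATES_UNDEFINED) == 0;
    }
    default:
        return true;                 // strings, doubles, etc. handled elsewhere
    }
}
```

A Value whose tag says Object but whose payload is 1 reaches the Object arm and dereferences the garbage pointer, which is exactly the failure mode seen here.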
Looks like two separate crashes here, to me.  The 0x1 crashes are what comment 6 is looking at.  The other places are some other issue.

Note that bug 936372 is an intermittent orange that implicates truthiness testing code -- not much information there as we don't (to the best of my knowledge?) have minidumps or similar to investigate, like we do here.  The 0x1 crashes started showing up November 4, that bug was filed November 8, so I suspect with a little filing-time/happenstance fuzz they're the same issue.
Blocks: 936372
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #8)
> Looks like two separate crashes here, to me.  The 0x1 crashes are what
> comment 6 is looking at.  The other places are some other issue.

Yeah. What I consider to be "this bug" are the EXCEPTION_ACCESS_VIOLATION_READ crashes at address 0x5 (pointer 0x1 + offset 0x4). These are the most prevalent hits, and show up on Aurora 27 and Nightly 28. There are a few occurrences of getNewType on 26, but they are under different circumstances and in lower volume. I'm not worried about those hits.

(The getNewType signature is wrong anyway; breakpad got confused by the JIT frame on the stack. It's certainly possible that other crashes got mixed into the same bucket)
Depends on: 947526
(In reply to David Major [:dmajor] from comment #9)
> (The getNewType signature is wrong anyway; breakpad got confused by the JIT
> frame on the stack. It's certainly possible that other crashes got mixed
> into the same bucket)

Breakpad cannot yet deal with JIT frames correctly - I know that bsmedberg and nbp were talking about solutions for that at Stability Week, but I'm not sure whether any bugs have been filed for the work or when it's planned.
Not sure if you need repro steps, but http://www.mountain.es/epages/Mountain.sf/es_ES/?ObjectPath=/Shops/Store.Mountain/Products/OMPSTUDIO3D_154G is very crashy for me on both Aurora and Nightly. If it doesn't crash right away, reload it. I never survived more than one reload.
For me the website mentioned in comment 11 also crashes the current Firefox nightly reliably, but my crash reports do not link to this bug.

Similar thing with this website: http://shop.ayy.fi/en_GB/
It always crashes nightly except when I load it in a background tab.
Crash reports used to point to this bug, but now with new crash reports there is no link to any bug, and several days ago the crash reports for the same website linked to other bugs.
jaulmerd: Thank you for the link! A reliable repro will be super helpful here.

Waldo: I confirmed that the link from comment 11 hits this crash on my m-c build from this morning. What kind of code can we add to a local build to help track down the 0x1?
Flags: needinfo?(jwalden+bmo)
And thanks, Christian, for confirming too. The link from comment 12 also hits this crash on my build.

I've kicked off a debug build, maybe some assertion will help find the cause. I'll check tomorrow.
No asserts on a debug build, just the same crash.
Attached file bug943366notes.txt (obsolete) —
I've bisected this down to change 2963a336e7ec from bug 921120.

Using the replay debugger, I've partially tracked down the source of the 0x1 pointer -- some JIT code constructs a Franken-value with the tag and payload coming from different sources. I've attached some disassembly with my notes.
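The Franken-value described above can be sketched in a few lines of C++. This assumes a 32-bit tag/payload value layout loosely modeled on SpiderMonkey's 32-bit boxing; the struct, field names, and tag constant are illustrative, not the engine's real JS::Value definition.

```cpp
#include <cstdint>

// Hypothetical 32-bit tagged value: payload in the low word,
// type tag in the high word.
struct Value32 {
    uint32_t payload;  // int, bool, or object-pointer bits
    uint32_t tag;      // type tag
};

constexpr uint32_t TAG_OBJECT = 0xFFFFFF87;  // assumed object-tag encoding

// Tag and payload written from different sources: the tag word says
// "object", but the payload word holds the unrelated value 1. Code that
// then reads the object's type word at [payload+4] faults at address
// 0x5, matching the EXCEPTION_ACCESS_VIOLATION_READ reports in this bug.
Value32 makeFrankenValue() {
    return Value32{/*payload=*/0x1, /*tag=*/TAG_OBJECT};
}
```

The miscompiled JIT code effectively performs the two mismatched stores that makeFrankenValue() illustrates, which is why the faulting address is consistently pointer 0x1 plus a small field offset.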

The two repro links (comment 11 and 12) have similar copies of a script called sf.epk.min.js, which appears to be a concatenation of several libraries. I've saved a Fiddler capture in case the live sites change.

This is as far as I can investigate without JS expertise. Kannan can you take it from here?
Flags: needinfo?(kvijayan)
Attached file bug943366notes.txt
Oops. I made a mistake in the notes. The value at ESP+10C should actually be 4 (or higher) -- basically any value such that we don't take the bailout branch.
Attachment #8348337 - Attachment is obsolete: true
Blocks: 921120
If there's a reproducible testcase, we shouldn't need to add any extra code to track it down.  I trust JIT people can figure out the issue from here.  :-)  (At worst, by cutting down the original testcase to size, but it rarely gets that bad.)
Flags: needinfo?(jwalden+bmo)
Thanks for the super detailed analysis, David! Looking at it. The boolean test code in the lead-up seems to be |CompareBAndBranch| codegen, followed by the false block (the small region you have marked as not executed), and then the true block. The bad codegen is somewhere in the true block.

Don't have any further insights as of yet.
Assignee: nobody → kvijayan
I suspect this may be the same as bug 951528.  There too, we have an invalid JSValue constructed with a String tag and a zero (null) payload, and it seems to be somehow related to frames which use arguments.
Flags: needinfo?(kvijayan)
7 Day Ranking:
> Firefox 26: N/A
> Firefox 27: #59 @ 0.18% (new)
> Firefox 28: N/A
> Firefox 29: N/A

Given current rankings it would seem this is no longer an issue in Firefox 26, 28, and 29, and is a low-volume issue in Firefox 27.
Keywords: topcrash-win
I cannot reproduce this crash anymore using the URLs from comment 11 and comment 12.

Last bad nightly: 2013-12-20
First good nightly: 2013-12-21

Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=c9ea463d36c3&tochange=7bc1fb6a21ae

A patch for bug 951528 landed in that range (see comment 20).
Given that this is the same regressing patch as for bug 951528, the assembly dump shows the same behaviour (bad pointer-based JSValue getting built off of arguments), and that the bug doesn't reproduce after the fix for 951528 went in, I'm reasonably positive that this is the same issue.

Marking dup.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 951528
The bug this was duped to is verified fixed. But I have seen this signature, in greatly reduced volume, on 27b4. If it doesn't drop out of topcrash status with data from 27b4 and 27b5, I'll reopen.
(In reply to [:tracy] Tracy Walker - QA Mentor from comment #24)
> The bug this was duped to is verified fixed.  But I have seen this
> signature, in greatly reduced volume, from on 27b4.  If it doesn't drop out
> of topcrash status with data from 27b4 and 27b5, I'll reopen.

I expect this to drop rapidly as people update. There may still be a few hits remaining (comment 9 observed that there are some other root-causes with this signature) but they ought to be in very much lower volume.
I'm still seeing a fairly high volume of crashes, mostly on Android though:

Firefox
 * Firefox 29: 1 crash
 * Firefox 28: 1 crash
 * Firefox 27: 54 crashes
 * Firefox 26: 89 crashes

Fennec
 * Fennec 29: 1 crash
 * Fennec 28: 0 crashes
 * Fennec 27: 20 crashes
 * Fennec 26: 506 crashes
(In reply to David Major [:dmajor] from comment #25)
> (In reply to [:tracy] Tracy Walker - QA Mentor from comment #24)
> > The bug this was duped to is verified fixed.  But I have seen this
> > signature, in greatly reduced volume, from on 27b4.  If it doesn't drop out
> > of topcrash status with data from 27b4 and 27b5, I'll reopen.
> 
> I expect this to drop rapidly as people update. There may still be a few
> hits remaining (comment 9 observed that there are some other root-causes
> with this signature) but they ought to be in very much lower volume.

As predicted, it has dropped considerably (desktop), with only a few other crashes remaining. The main volume here has been crushed.
Status: RESOLVED → VERIFIED
Volume in the last 3 days:
 * Firefox 26: 67 crashes
 * Firefox 27: 18 crashes
 * Firefox 28: 1 crash
 * Firefox 29: 2 crashes

Based on volume I'd say this is fixed on Desktop. It's still pretty high on Fennec 26 though with 538 crashes in 3 days. Should there be another bug filed for this specifically for Firefox on Android?