Open Bug 1275495 Opened 5 years ago Updated 5 days ago

Crash in js::fun_apply

Categories

(Core :: JavaScript Engine, defect, P5)

ARM64
Android
defect

Tracking

()

Tracking Status
firefox48 --- affected
firefox-esr45 --- affected
firefox64 - wontfix
firefox65 - wontfix
firefox66 --- wontfix
firefox67 --- wontfix
firefox68 --- affected

People

(Reporter: kanru, Unassigned, NeedInfo)

References

(Blocks 1 open bug)

Details

(Keywords: crash)

Crash Data

This bug was filed from the Socorro interface and is 
report bp-eb39db5e-7227-4dba-a605-03be22160525.
=============================================================
https://crash-stats.mozilla.com/signature/?product=Firefox&signature=js%3A%3Afun_apply&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&page=1#summary

This is #8 crash on Nightly 20160523030225, 7 crashes which are likely from single installation. We have this crash signature for a long time, average 3~4 crashes per day.

Exploitability medium. Looks like we are calling into some invalid address. Probably related to jit. Jeff, you touched this code recently, do you have any idea?
Flags: needinfo?(jwalden+bmo)
My changes here (I assume you're referring to bug 1178653 and bug 1259877, right?) were generally cosmetic, not consequential -- just streamlining the internal APIs to do things more readably/simply.  Calling into a random address, seems dubious as a failure mode of those patches.  (I think you weren't implying my changes had anything to do with this, just noting in case.)

As to what to do, how to debug this.  Without any real information as to what triggered this, we're fairly in the dark as far as how this super-generic code path goes awry.

The only real ideas I have, right off the bat, are to use MSDN's "This function is obsolete and should not be used" functions IsBadReadPtr https://msdn.microsoft.com/en-us/library/windows/desktop/aa366713(v=vs.85).aspx or similar functions IsBadCodePtr, IsBadStringPtr, or IsBadWritePtr to test the address being used here before actually using it -- and intentionally crashing if the claim is it's bad, in a way such that we can telemetrize whatever breadcrumbs we can find at that late date.

Beyond this using obsolete functions of high dubiety, it's not necessarily clear those breadcrumbs would be enough to figure out what went wrong.  But maybe if we grabbed the script URL or page URL or something, we could figure something out.  Maybe?
Flags: needinfo?(jwalden+bmo)
Of some interest may be bug 1231024 comment 29 et seq: There was an error in the static register assignment for a fun_apply optimization on x86 (only) that was found and corrected.  However, one unresolved aspect of that finding was that the code did not fail all the time, only when there was a certain amount of polymorphism in the test code, despite the register assignment being static.
Crash volume for signature 'js::fun_apply':
 - nightly (version 50): 0 crashes from 2016-06-06.
 - aurora  (version 49): 0 crashes from 2016-06-07.
 - beta    (version 48): 76 crashes from 2016-06-06.
 - release (version 47): 142 crashes from 2016-05-31.
 - esr     (version 45): 2 crashes from 2016-04-07.

Crash volume on the last weeks:
            W. N-1  W. N-2  W. N-3  W. N-4  W. N-5  W. N-6  W. N-7
 - nightly       0       0       0       0       0       0       0
 - aurora        0       0       0       0       0       0       0
 - beta          9       2       5       6      11      18      21
 - release      22      19      10      17      22      19      20
 - esr           0       0       0       1       0       1       0

Affected platforms: Windows, Mac OS X
Bumping -- 

153 crashes on Fennec in the past week. Comments saying this occurs a lot on Facebook.
OS: Windows → Android
Hardware: x86 → All
This doesn't seem any worse or more actionable than it was in the past?
Duplicate of this bug: 1524182

99.4% of these js::fun_apply crashes in Fennec 67 Nightly are ARM64 builds.

These crash reports seems to be non-actionable, as the stack trace is missing except the last frame which is pointing at the RootingAPI.

The following stack overflow page [1] suggest that concurrent reads could cause these kind of issues, but this sounds dubious and would not explain any of the crashes already reported on x86 and x86_64 CPUs, unless there is a data-race.

Paul, any idea what this could be?

[1] https://stackoverflow.com/questions/19119943/what-does-segv-accerr-mean

Flags: needinfo?(pbone)

Setting P1 as this is the second highest top-crasher on arm64.

Priority: -- → P1

If this can help, there was a spike of this crash signature on the release channel between December 27 and January 14, and another one starting on February 27th.

However, all these crashes seems to be on older versions of Firefox, and so far none got reported on the latest releases. We should wait until the release rolls-out.

(In reply to Nicolas B. Pierron [:nbp] from comment #8)

These crash reports seems to be non-actionable, as the stack trace is missing except the last frame which is pointing at the RootingAPI.

The following stack overflow page [1] suggest that concurrent reads could cause these kind of issues, but this sounds dubious and would not explain any of the crashes already reported on x86 and x86_64 CPUs, unless there is a data-race.

I agree, that answer is really weird/wrong. When a CPU gets a conflicting.. anything, I don't think it's even aware that there is a conflict, and you just get wrong answers/corruption. If it were aware then it could resolve the conflict itself right? and we wouldn't have the memory models we do now, and it'd be much slower. I could be wrong, but that's what makes sense to me and fits with other knowledge.

[1] https://stackoverflow.com/questions/19119943/what-does-segv-accerr-mean

Flags: needinfo?(pbone)

There are different types of crashes mixed up in the same signature. Some have completely messed-up stacks:

https://crash-stats.mozilla.com/report/index/d5b74f23-0686-46a3-859b-5d9f90190402

Because it looks like the stack pointer is bad and then this function attempts to write to the stack as it constructs the InvokeArgs object. The stack pointer is 0x00000000008e7d90 which does not look valid to me.

And yeah, that's pretty unactionable on its own.

https://searchfox.org/mozilla-release/source/js/src/vm/JSFunction.cpp#1172

And then there are others with more sensible stacks:

https://crash-stats.mozilla.com/report/index/d4f636e1-2c10-4e35-bc43-8c2810190401

Can someone who speaks crash-stats signatures update the signature on this bug to that it covers the first group only, and file a new bug for the second group. (NI: :kanru). Thanks.

Well, that's the windows crashes. The crash stats link for all crashes including Android is:

https://crash-stats.mozilla.com/signature/?signature=js%3A%3Afun_apply&date=%3E%3D2019-01-03T03%3A55%3A00.000Z&date=%3C2019-04-03T03%3A55%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_columns=startup_crash&_sort=-date&page=1#reports

Android has a whole lot more crashes, and has the kind with and without the bad stack:

Bad stack on android: https://crash-stats.mozilla.com/report/index/ac6fba85-c1d0-42f9-82b2-c25890190402

Okay stack on android: https://crash-stats.mozilla.com/report/index/a395841f-35f3-44c7-bcf5-f1e990190402

Flags: needinfo?(kanru)

(In reply to Paul Bone [:pbone] from comment #12)

Can someone who speaks crash-stats signatures update the signature on this bug to that it covers the first group only, and file a new bug for the second group. (NI: :kanru). Thanks.

The easiest is to go to the list on crash-stat, and display the "proto signature". Then copy it to "Crash Signature" field of this bug and add square brackets around the proto signature, as a replacement of the previous one.

I think it needs something tricker than that.

Somehow we need to say that it should have js::fun_apply at the top or very near the top, and NOT something sensible like MessageLoop::Run or nsThread::ProcessNextEvent at or near the bottom, because I think we want to select only those crashes with messed-up stacks.

Priority: P1 → P5
QA Whiteboard: qa-not-actionable
You need to log in before you can comment on or make changes to this bug.