Crash in js::fun_apply
Categories
(Core :: JavaScript Engine, defect, P5)
Tracking
People
(Reporter: kanru, Unassigned)
References
(Blocks 1 open bug)
Details
(Keywords: crash)
Crash Data
This bug was filed from the Socorro interface and is report bp-eb39db5e-7227-4dba-a605-03be22160525.
=============================================================
https://crash-stats.mozilla.com/signature/?product=Firefox&signature=js%3A%3Afun_apply&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&page=1#summary

This is the #8 crash on Nightly 20160523030225; 7 of those crashes are likely from a single installation. We have had this crash signature for a long time, averaging 3~4 crashes per day. Exploitability: medium.

It looks like we are calling into some invalid address, probably related to the JIT. Jeff, you touched this code recently, do you have any idea?
Comment 1•7 years ago
My changes here (I assume you're referring to bug 1178653 and bug 1259877, right?) were generally cosmetic, not consequential -- just streamlining the internal APIs to do things more readably/simply. Calling into a random address seems dubious as a failure mode of those patches. (I think you weren't implying my changes had anything to do with this; just noting it in case.)

As to what to do and how to debug this: without any real information as to what triggered it, we're fairly in the dark about how this super-generic code path goes awry. The only real idea I have, right off the bat, is to use MSDN's "This function is obsolete and should not be used" function IsBadReadPtr https://msdn.microsoft.com/en-us/library/windows/desktop/aa366713(v=vs.85).aspx or the similar functions IsBadCodePtr, IsBadStringPtr, or IsBadWritePtr to test the address being used here before actually using it -- and intentionally crashing if the claim is that it's bad, in a way that lets us telemetrize whatever breadcrumbs we can find at that late date.

Beyond the dubiety of relying on obsolete functions, it's not necessarily clear those breadcrumbs would be enough to figure out what went wrong. But maybe if we grabbed the script URL or page URL or something, we could figure something out. Maybe?
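A minimal sketch of that idea, assuming a Windows-only diagnostic build: the wrapper function, its name, and the plain abort() below are all hypothetical stand-ins (in Gecko the crash would go through MOZ_CRASH plus whatever annotations we can gather), and IsBadCodePtr really is one of the obsolete functions MSDN warns about, so this is strictly a breadcrumb-gathering hack.

```cpp
// Hypothetical diagnostic sketch (Windows-only): refuse to jump to an address
// that IsBadCodePtr claims is unmapped, and crash deliberately so the report
// carries a recognizable marker. IsBadCodePtr is documented as obsolete and
// racy, so this is for breadcrumbs only, not production logic.
#include <windows.h>
#include <cstdio>
#include <cstdlib>

using Callee = void (*)();

void CallChecked(Callee callee) {
  if (IsBadCodePtr(reinterpret_cast<FARPROC>(callee))) {
    // In Gecko this would be MOZ_CRASH plus crash annotations (script URL,
    // page URL, ...); here we just log and abort.
    std::fprintf(stderr, "fun_apply-style callee %p looks unmapped\n",
                 reinterpret_cast<void*>(callee));
    std::abort();
  }
  callee();  // address passed the (weak) check; proceed with the call
}
```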
Comment 2•7 years ago
Of some interest may be bug 1231024 comment 29 et seq.: there was an error in the static register assignment for a fun_apply optimization on x86 (only) that was found and corrected. However, one unresolved aspect of that finding was that the code did not fail all the time, only when there was a certain amount of polymorphism in the test code, despite the register assignment being static.
Comment 3•7 years ago
Crash volume for signature 'js::fun_apply':
- nightly (version 50): 0 crashes from 2016-06-06.
- aurora  (version 49): 0 crashes from 2016-06-07.
- beta    (version 48): 76 crashes from 2016-06-06.
- release (version 47): 142 crashes from 2016-05-31.
- esr     (version 45): 2 crashes from 2016-04-07.

Crash volume on the last weeks:
             W. N-1  W. N-2  W. N-3  W. N-4  W. N-5  W. N-6  W. N-7
- nightly         0       0       0       0       0       0       0
- aurora          0       0       0       0       0       0       0
- beta            9       2       5       6      11      18      21
- release        22      19      10      17      22      19      20
- esr             0       0       0       1       0       1       0

Affected platforms: Windows, Mac OS X
Comment 4•4 years ago
Bumping -- 153 crashes on Fennec in the past week. Comments say this occurs a lot on Facebook.
Comment 5•4 years ago
This doesn't seem any worse or more actionable than it was in the past?
Comment 7•4 years ago
99.4% of these js::fun_apply crashes in Fennec 67 Nightly are ARM64 builds.
Comment 8•4 years ago
These crash reports seem to be non-actionable, as the stack trace is missing except for the last frame, which points at the RootingAPI.
The following Stack Overflow page [1] suggests that concurrent reads could cause this kind of issue, but that sounds dubious and would not explain any of the crashes already reported on x86 and x86_64 CPUs, unless there is a data race.
Paul, any idea what this could be?
[1] https://stackoverflow.com/questions/19119943/what-does-segv-accerr-mean
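To make the data-race hypothesis concrete, here is a minimal sketch (entirely hypothetical, not SpiderMonkey code) of how an unsynchronized read of a pointer that another thread is rewriting can hand you a stale or dangling value, which then faults the moment it is used. It only illustrates the failure mode being speculated about, not what actually happens in fun_apply.

```cpp
// Hypothetical sketch of the speculated failure mode: one thread publishes
// and retracts a pointer without synchronization while another thread reads
// and uses it. The read is a data race (undefined behaviour), so the reader
// may observe a freed or otherwise bogus pointer and crash on an address
// that "should" have been valid.
#include <cstdio>
#include <thread>

struct Target {
  int value = 42;
};

Target* g_target = nullptr;  // written and read with no synchronization

int main() {
  std::thread writer([] {
    for (int i = 0; i < 1000000; ++i) {
      g_target = new Target();  // racy publish
      delete g_target;          // racy retraction: reader may still hold it
      g_target = nullptr;
    }
  });

  std::thread reader([] {
    for (int i = 0; i < 1000000; ++i) {
      Target* t = g_target;     // racy read: may be dangling
      if (t) {
        std::printf("%d\n", t->value);  // may fault here
      }
    }
  });

  writer.join();
  reader.join();
  return 0;
}
```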
Comment 9•4 years ago
Setting P1, as this is the second-highest top crasher on ARM64.
Comment 10•4 years ago
If it helps, there was a spike in this crash signature on the release channel between December 27 and January 14, and another one starting on February 27.
However, all of these crashes seem to be on older versions of Firefox, and so far none have been reported on the latest releases. We should wait until the release rolls out.
Comment 11•4 years ago
(In reply to Nicolas B. Pierron [:nbp] from comment #8)
> These crash reports seem to be non-actionable, as the stack trace is missing except for the last frame, which points at the RootingAPI.
> The following Stack Overflow page [1] suggests that concurrent reads could cause this kind of issue, but that sounds dubious and would not explain any of the crashes already reported on x86 and x86_64 CPUs, unless there is a data race.
I agree, that answer is really weird/wrong. When a CPU gets a conflicting... anything, I don't think it's even aware that there is a conflict; you just get wrong answers/corruption. If it were aware, then it could resolve the conflict itself, right? Then we wouldn't have the memory models we do now, and it'd be much slower. I could be wrong, but that's what makes sense to me and fits with other knowledge.
[1] https://stackoverflow.com/questions/19119943/what-does-segv-accerr-mean
Comment 12•4 years ago
There are different types of crashes mixed up in the same signature. Some have completely messed-up stacks:
https://crash-stats.mozilla.com/report/index/d5b74f23-0686-46a3-859b-5d9f90190402
It looks like the stack pointer is bad, and this function then attempts to write to the stack as it constructs the InvokeArgs object:
https://searchfox.org/mozilla-release/source/js/src/vm/JSFunction.cpp#1172
The stack pointer is 0x00000000008e7d90, which does not look valid to me. And yeah, that's pretty unactionable on its own.
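For what it's worth, here is a rough, hypothetical way to sanity-check whether a value like 0x00000000008e7d90 could plausibly be the current thread's stack pointer: compare it against the address of a local variable, which by definition lives on that stack. The helper name and the 8 MiB window are invented for illustration; real code would query the actual stack bounds from the OS.

```cpp
// Hypothetical diagnostic: does `candidate` fall anywhere near the current
// thread's stack? The address of a local variable serves as a cheap proxy
// for "somewhere on this stack"; the 8 MiB window is an arbitrary guess at
// the maximum stack size, not a real platform constant.
#include <cstdint>
#include <cstdio>

bool LooksLikeOurStack(uintptr_t candidate) {
  int probe = 0;  // this local lives on the current thread's stack
  uintptr_t here = reinterpret_cast<uintptr_t>(&probe);
  const uintptr_t window = 8 * 1024 * 1024;  // assumed stack-size upper bound
  uintptr_t low = here > window ? here - window : 0;
  uintptr_t high = here + window;
  return candidate >= low && candidate <= high;
}

int main() {
  uintptr_t suspect = 0x00000000008e7d90;  // value from the crash report
  std::printf("plausible stack pointer: %s\n",
              LooksLikeOurStack(suspect) ? "yes" : "no");
  return 0;
}
```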
And then there are others with more sensible stacks:
https://crash-stats.mozilla.com/report/index/d4f636e1-2c10-4e35-bc43-8c2810190401
Can someone who speaks crash-stats signatures update the signature on this bug so that it covers the first group only, and file a new bug for the second group? (NI: :kanru) Thanks.
Well, those are the Windows crashes. The crash-stats link for all crashes, including Android, is:
Android has a whole lot more crashes, and has the kind both with and without the bad stack:
Bad stack on android: https://crash-stats.mozilla.com/report/index/ac6fba85-c1d0-42f9-82b2-c25890190402
Okay stack on android: https://crash-stats.mozilla.com/report/index/a395841f-35f3-44c7-bcf5-f1e990190402
Comment 13•4 years ago
(In reply to Paul Bone [:pbone] from comment #12)
> Can someone who speaks crash-stats signatures update the signature on this bug so that it covers the first group only, and file a new bug for the second group? (NI: :kanru) Thanks.
The easiest approach is to go to the list on crash-stats and display the "proto signature". Then copy it into the "Crash Signature" field of this bug, with square brackets around the proto signature, as a replacement for the previous one.
Comment 14•4 years ago
I think it needs something trickier than that.
Somehow we need to say that the stack should have js::fun_apply at the top or very near the top, and NOT something sensible like MessageLoop::Run or nsThread::ProcessNextEvent at or near the bottom, because I think we want to select only those crashes with messed-up stacks.
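As a sketch of that selection rule (a hypothetical helper, not how crash-stats signature matching is actually configured), the filter would take the list of frames from a report and keep it only if js::fun_apply appears within the top few frames and no ordinary event-loop frame appears near the bottom. The frame names come from the comment above; the window sizes are made up.

```cpp
// Hypothetical filter for "messed-up stack" reports: js::fun_apply must be at
// or near the top, and nothing sensible (event-loop frames) may be at or near
// the bottom. Thresholds are illustrative only.
#include <algorithm>
#include <string>
#include <vector>

bool IsMessedUpStack(const std::vector<std::string>& frames) {
  if (frames.empty()) {
    return false;
  }

  // js::fun_apply must be within the top three frames.
  size_t topWindow = std::min<size_t>(3, frames.size());
  bool funApplyNearTop = false;
  for (size_t i = 0; i < topWindow; ++i) {
    if (frames[i].find("js::fun_apply") != std::string::npos) {
      funApplyNearTop = true;
      break;
    }
  }
  if (!funApplyNearTop) {
    return false;
  }

  // No sensible event-loop frame within the bottom three frames.
  const std::vector<std::string> sensibleBottoms = {
      "MessageLoop::Run", "nsThread::ProcessNextEvent"};
  size_t bottomWindow = std::min<size_t>(3, frames.size());
  for (size_t i = frames.size() - bottomWindow; i < frames.size(); ++i) {
    for (const auto& sensible : sensibleBottoms) {
      if (frames[i].find(sensible) != std::string::npos) {
        return false;
      }
    }
  }
  return true;
}
```

Whatever rule we settle on would still need translating into whatever matching the crash-stats signature machinery actually supports.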
Updated•4 years ago
Updated•2 years ago
Comment 15•10 months ago
Redirect a needinfo that is pending on an inactive user to the triage owner.
:sdetar, since the bug has high severity, could you have a look please?
For more information, please visit auto_nag documentation.
Updated•10 months ago
Comment 16•6 months ago
Since the crash volume is low (fewer than 5 crashes per week), the severity is downgraded to S3. Feel free to change it back if you think the bug is still critical.
For more information, please visit auto_nag documentation.