Crash in js::GlobalHelperThreadState::ionLazyLinkListRemove

RESOLVED FIXED in Firefox 49

Status

()

defect
--
critical
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: njn, Assigned: h4writer)

Tracking

({crash})

Trunk
mozilla50
Unspecified
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox47 affected, firefox48 affected, firefox49 fixed, firefox-esr45 affected, firefox50 fixed)

Details

(crash signature)

Attachments

(2 attachments, 2 obsolete attachments)

This bug was filed from the Socorro interface and is 
report bp-7fbb8713-a62d-4e7a-8c45-2e9ca2160516.
=============================================================

This crash first appeared in Nightly 20160515030241 and has occurred 5 times, some on Mac and some on Linux. Here are all the occurrences:

https://crash-stats.mozilla.com/signature/?product=Firefox&date=%3E%3D2016-01-20&signature=js%3A%3AGlobalHelperThreadState%3A%3AionLazyLinkListRemove&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&page=1

Hannes, it looks like bug 1270108 might be the cause -- it touched those lines and it landed on May 14. Can you please investigate?
Flags: needinfo?(hv1989)
I tried to pinpoint anything, but didn't find anything yet.

I think this signature is the same, but on Windows:
https://crash-stats.mozilla.com/signature/?version=49.0a1&signature=js%3A%3Ajit%3A%3ALazyLink#aggregations
Do you know which pointer is null? Maybe we can add a setting to use a very low limit to make fuzzing more effective?
Posted patch Patch (obsolete) — Splinter Review
I have been breaking my head about this.

Somehow the lazy link stub is called, but when we want to get the IonBuilder it is gone? There are only a few places we remove pendingIonBuilder. All of them happen on the mainthread. Therefore not entirely sure what is going on...

Also I'm not sure it is related to this patch. I think only the signature has changed:
https://crash-stats.mozilla.com/signature/?product=Firefox&signature=js%3A%3Ajit%3A%3ALazyLinkTopActivation&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&page=1#reports

This patch will just paper the issue. It also contains an extra check to make sure the state between pendingBuilder and baselineOrIonRawPointer is correct.
Assignee: nobody → hv1989
Flags: needinfo?(hv1989)
Attachment #8758154 - Flags: review?(jdemooij)
Comment on attachment 8758154 [details] [diff] [review]
Patch

Review of attachment 8758154 [details] [diff] [review]:
-----------------------------------------------------------------

Clearing the r? as we found out what's likely causing this.
Attachment #8758154 - Flags: review?(jdemooij)
Flags: needinfo?(hv1989)
In case it helps, I crash 100% on https://www.mapbox.com/maps/ and subsequent pages when scrolling/zooming.

Here's an incident ID from OS X Nightly: https://crash-stats.mozilla.com/report/index/089af15e-3dff-42a2-9b5c-5da8e2160609
This is the #2 OS X Nightly crash for 6/7 builds, with 5 crashes.
This moves the lazy link list to the runtime. As a result the cutoff in "AttachFinishedCompilations" should now only hit builders from that runtime. Instead of removing from other threads (e.g. workers).

I added functions to LinkedList.h to take "LinkedListElement<T>" as arguments instead of "T", since IonBuilder is not a complete type and I didn't wanted to include "IonBuilder.h" into JSRuntime.
Attachment #8758154 - Attachment is obsolete: true
Flags: needinfo?(hv1989)
Attachment #8762014 - Flags: review?(jdemooij)
Duplicate of this bug: 1279506
Crash Signature: [@ js::GlobalHelperThreadState::ionLazyLinkListRemove] → [@ js::GlobalHelperThreadState::ionLazyLinkListRemove] [@ js::jit::LazyLink]
In OffThreadIonCompile, when compartment == nullptr, aren't we supposed to cancel/finish everything? Not touching the lazy link list in that case is a bit confusing.
Flags: needinfo?(hv1989)
(In reply to Jan de Mooij [:jandem] from comment #9)
> In OffThreadIonCompile, when compartment == nullptr, aren't we supposed to
> cancel/finish everything? Not touching the lazy link list in that case is a
> bit confusing.

Oh right I forgot to explain this. The "compartment == nullptr" is only used for waitForAllThreads which is only used for OOMTesting. And I think we only do this to make sure there is no compilation happening on another thread? I could just clear the whole runtime ion lazy link maybe?
Flags: needinfo?(hv1989)
(In reply to Hannes Verschore [:h4writer] from comment #11)
> Oh right I forgot to explain this. The "compartment == nullptr" is only used
> for waitForAllThreads which is only used for OOMTesting. And I think we only
> do this to make sure there is no compilation happening on another thread? I
> could just clear the whole runtime ion lazy link maybe?

I see, that makes sense. It'd be good to make this more explicit to avoid surprises:

* Add a |bool discardLazyLinkList = true| argument to CancelOffThreadIonCompile.
* Then we can MOZ_ASSERT(discardLazyLinkList == !!compartment);

The LinkedList changes are not trivial, so these should be reviewed by an MFBT reviewer like Waldo. I think it'd be simpler to just define these functions in HelperThreads.cpp instead of vm/Runtime.cpp, so you don't need to change LinkedList. We do that elsewhere too: SelfHosting.cpp defines a bunch of JSRuntime methods.
Attachment #8762014 - Flags: review?(jdemooij)
Yeah, I don't think we should be modifying LinkedList for this, just to avoid an include.  Realistically, Runtime is pretty fundamental; having to include IonBuilder.h in it is not a real burden.
Okay, thanks for the input. I'll update my patch accordingly.
Attachment #8762014 - Attachment is obsolete: true
Attachment #8763852 - Flags: review?(jdemooij)
Comment on attachment 8763852 [details] [diff] [review]
bug1274895-ion-lazylinklist

Review of attachment 8763852 [details] [diff] [review]:
-----------------------------------------------------------------

Nice
Attachment #8763852 - Flags: review?(jdemooij) → review+
Pushed by hv1989@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/8fba141d39a9
IonMonkey: Move the ion lazy list to the JSRuntime, r=jandem
Pushed by hv1989@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/bcd2d9a91654
IonMonkey: Follow-up on bug 1274895 for failure, r=jandem
https://hg.mozilla.org/mozilla-central/rev/8fba141d39a9
https://hg.mozilla.org/mozilla-central/rev/bcd2d9a91654
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Attachment #8765337 - Flags: review+
Comment on attachment 8765337 [details] [diff] [review]
Patch (for aurora)

Approval Request Comment
[Feature/regressing bug #]:
Bug 1270108

[User impact if declined]:
Possible crashes on JS heavy sites that use webworkers

[Describe test coverage new/current, TreeHerder]:
Landed on m-i. Crashs-stats doesn't show any crashes since this landing.

[Risks and why]: 
This is a very mechanical patch. Moving code and logic from one to another place. That part is stressed a lot during regular browsing. Not impossible that something got screwed up, but very unlikely. I would have expected to have reports (bugzilla/crash-stats) if that were the case.

[String/UUID change made/needed]:
/
Attachment #8765337 - Flags: approval-mozilla-aurora?
Comment on attachment 8765337 [details] [diff] [review]
Patch (for aurora)

Crash fix, affects aurora, let's try an uplift.
Attachment #8765337 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Crash volume for signature 'js::jit::LazyLink':
 - nightly(version 50):107 crashes from 2016-06-06.
 - aurora (version 49):21 crashes from 2016-06-07.
 - beta   (version 48):37 crashes from 2016-06-06.
 - release(version 47):11 crashes from 2016-05-31.
 - esr    (version 45):3 crashes from 2016-04-07.

Crash volume on the last weeks:
            W. N-1  W. N-2  W. N-3  W. N-4  W. N-5  W. N-6  W. N-7
 - nightly       0       1       0       0       0       7       9
 - aurora        0       0       1       0       7       4       9
 - beta          7       3       4       2       5       6       6
 - release       2       1       0       1       0       4       3
 - esr           0       2       0       0       0       1       0

Affected platforms: Windows, Mac OS X, Linux
You need to log in before you can comment on or make changes to this bug.