Closed Bug 1347984 Opened 7 years ago Closed 7 years ago

Crash in EnqueuePromiseReactionJob called from nsJARChannel::OnStopRequest

Categories

(Core :: JavaScript Engine, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla58
Tracking Status
firefox-esr52 --- unaffected
firefox54 + wontfix
firefox55 + wontfix
firefox56 + wontfix
firefox57 + fixed
firefox58 --- fixed

People

(Reporter: mccr8, Assigned: jandem)

References

Details

(Keywords: crash, regression, topcrash)

Crash Data

Attachments

(1 file, 1 obsolete file)

This bug was filed from the Socorro interface and is 
report bp-5b904ab8-0afe-4928-a86d-01beb2170314.
=============================================================

There are a few dozen of these crashes across various channels in the last week. The ones with this signature that I looked at were all coming from nsJARChannel::OnStopRequest, into AsyncScriptLoader::OnStreamComplete, into js::PromiseObject::reject.
Component: Networking: JAR → JavaScript Engine
before bp-61d6ccf4-a326-43ca-adba-5e62b0170419 happened, I installed the webextension only xpi from https://github.com/gorhill/uBlock/releases/download/1.12.1/uBlock0.webext.xpi (uBlock Origin) in my second firefox profile. Then I started it via about:profiles and the tabs from the last session were automatically opened before the immediate crash. I think this is my first "@ EnqueuePromiseReactionJob" crash.
this crash signature seems to have regressed in volume in firefox 54, which is now entering into the beta phase. the first recorded crash was in 54.0a1 build 20170224030232 and it continued somewhat regularly thereafter...

the crashes are happening cross-platform but it's mainly os x and linux systems that are affected by it. in very early crash data from 54.0b1 the signature is accounting for 4.4% of all crashes in the content process.
Keywords: regression
OS: Mac OS X → All
Hardware: Unspecified → All
(In reply to Darkspirit from comment #1)
> before bp-61d6ccf4-a326-43ca-adba-5e62b0170419 happened, I installed the
> webextension only xpi from
> https://github.com/gorhill/uBlock/releases/download/1.12.1/uBlock0.webext.
> xpi (uBlock Origin) in my second firefox profile. Then I started it via
> about:profiles and the tabs from the last session were automatically opened
> before the immediate crash. I think this is my first "@
> EnqueuePromiseReactionJob" crash.

When I set extensions.webextensions.remote back to false (the default), I don't had any crashes anymore. I reported bug 1357729 in this context, which is resolved as duplicate of bug 1353060.
See Also: → 1346012
[Tracking Requested - why for this release]: Crash volume seems to be climbing, cf. comment 2.

Till, is there any other diagnostic information that we can add to help debug this?  I know that you have said that you don't see what's going on in bug 1346012, but clearly something is going wrong. :)
Flags: needinfo?(till)
Makes sense to track this based on Comment 4.
Is this planning on being uplifted, we think this might be the causes of in the beta webextensions cohort.
Hi Jason, Is there someone that owns this bug?  It sounds like a P1 fix that would be considered for uplift to 54... without a clear owner.
Flags: needinfo?(jorendorff)
Priority: -- → P1
Shu can you have someone take a look at this?
Flags: needinfo?(shu)
Till is still the person with most expertise in the promises code, who's already needinfo'd. Darkspirit's comment 1 STR is worth a try, hopefully it still reproduces.
Flags: needinfo?(shu)
(Sorry, can't reproduce it anymore.)
Attached patch More release asserts (obsolete) — Splinter Review
After staring at this code for another few hours, I still don't even have a hunch about what might be going on. Given that this happens on all platforms, it seems very likely that this is in fact an internal bug, not something caused by meddling by, say, AV software.

This patch adds a few additional release asserts that I have at least some vague hope of getting triggered instead of crashing as right now.

I would very much like to know what's special about address 0x4c201b4, which is involved in about 50% of all crashes ...
Assignee: nobody → till
Status: NEW → ASSIGNED
Flags: needinfo?(till)
Flags: needinfo?(jorendorff)
Attachment #8869708 - Flags: review?(jdemooij)
Comment on attachment 8869708 [details] [diff] [review]
More release asserts

Review of attachment 8869708 [details] [diff] [review]:
-----------------------------------------------------------------

No crashes on Nightly in the past 7 days? Not enough users or is it possible it got fixed there?
Attachment #8869708 - Flags: review?(jdemooij) → review+
Pushed by tschneidereit@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b8e654020c64
Add some more release asserts to promise job handling code. r=jandem
https://hg.mozilla.org/mozilla-central/rev/b8e654020c64
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
(In reply to Jan de Mooij [:jandem] from comment #12)
> No crashes on Nightly in the past 7 days? Not enough users or is it possible
> it got fixed there?
there aren't that many of those crashes happening on nightly - should we try to get the patch to beta in order to get more data? (also if it's mainly a diagnostic patch, should we keep the bug open)?
Flags: needinfo?(till)
(In reply to [:philipp] from comment #15)
> (In reply to Jan de Mooij [:jandem] from comment #12)
> > No crashes on Nightly in the past 7 days? Not enough users or is it possible
> > it got fixed there?
> there aren't that many of those crashes happening on nightly - should we try
> to get the patch to beta in order to get more data? (also if it's mainly a
> diagnostic patch, should we keep the bug open)?

Yes, I intended for this to be uplifted. You're right on leaving the bug open, too.
Status: RESOLVED → REOPENED
Flags: needinfo?(till)
Resolution: FIXED → ---
Comment on attachment 8869708 [details] [diff] [review]
More release asserts

Approval Request Comment
[Feature/Bug causing the regression]: unknown
[User impact if declined]: no immediate impact, this is adding investigatory release asserts, which we want more users to exercise.
[Is this code covered by automated tests?]: yes, to the extent possible
[Has the fix been verified in Nightly?]: n/a
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: no
[Why is the change risky/not risky?]: Only adds release asserts that can only be triggered in situations where we'd otherwise be guaranteed to crash.
[String changes made/needed]: n/a
Attachment #8869708 - Flags: approval-mozilla-beta?
Comment on attachment 8869708 [details] [diff] [review]
More release asserts

Diagnostic patch, let's uplift to beta for this Thursday's build.
Attachment #8869708 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Till Schneidereit [:till] from comment #17)
> [Is this code covered by automated tests?]: yes, to the extent possible
> [Has the fix been verified in Nightly?]: n/a
> [Needs manual test from QE? If yes, steps to reproduce]: no

Setting qe-verify- based on Till's assessment on manual testing needs and the fact that this fix has automated coverage.
Flags: qe-verify-
(In reply to Ryan VanderMeulen [:RyanVM] from comment #19)
> https://hg.mozilla.org/releases/mozilla-beta/rev/adb1f8ff7655

setting flags
this is now showing up as [@ TriggerPromiseReactions ] in 54.0b11 and hitting the newly added MOZ_RELEASE_ASSERT(reactionVal.isObject())
Crash Signature: [@ EnqueuePromiseReactionJob] → [@ EnqueuePromiseReactionJob] [@ TriggerPromiseReactions]
(In reply to [:philipp] from comment #22)
> this is now showing up as [@ TriggerPromiseReactions ] in 54.0b11 and
> hitting the newly added MOZ_RELEASE_ASSERT(reactionVal.isObject())

ni? to Till to re-look at this and see what the release assertions have uncovered.
Flags: needinfo?(till)
This signature is ranked #4 in content top-crash for beta. They all started in b11 (buildid 20170525155044).
Keywords: topcrash
Gerry, this is a new crash in 54 with relatively high volume in beta (and the new release assert added in this bug is triggering on beta quite a lot).  Should we block on this?  Back out the new assertions?
Flags: needinfo?(gchang)
Hi(In reply to Julien Cristau [:jcristau] from comment #25)
> Gerry, this is a new crash in 54 with relatively high volume in beta (and
> the new release assert added in this bug is triggering on beta quite a lot).
> Should we block on this?  Back out the new assertions?

Yes, I think we have to backout these new assertions.
NI sheriff to help on this.
Flags: needinfo?(gchang) → needinfo?(cbook)
(In reply to Gerry Chang [:gchang] from comment #26)
> Hi(In reply to Julien Cristau [:jcristau] from comment #25)
> > Gerry, this is a new crash in 54 with relatively high volume in beta (and
> > the new release assert added in this bug is triggering on beta quite a lot).
> > Should we block on this?  Back out the new assertions?
> 
> Yes, I think we have to backout these new assertions.
> NI sheriff to help on this.

Backing out the assertions doesn't solve anything, because we were already crashing before; those assertions were just added to see if we could get more information about the crash.  So we'll just get a different signature for the crashes without actually decreasing the crash volume, in which case Julien's other question--should we block on this?--becomes more relevant.
Flags: needinfo?(gchang)
(In reply to Gerry Chang [:gchang] from comment #26)
> Hi(In reply to Julien Cristau [:jcristau] from comment #25)
> > Gerry, this is a new crash in 54 with relatively high volume in beta (and
> > the new release assert added in this bug is triggering on beta quite a lot).
> > Should we block on this?  Back out the new assertions?
> 
> Yes, I think we have to backout these new assertions.
> NI sheriff to help on this.

I agree with Nathan, these asserts don't trigger any more crashes than we had before, they strictly move them to a slightly earlier point. Now that they have been backed out that's ok, too: I have all the information they could give already.

I've been looking at the crashes for quite a bit over the last couple of weeks, still to no avail. Hence I'm also not sure what blocking on them would achieve. I will continue to try to figure out what's going on.
Like :till mentioned in comment #29, there is no clue at this moment. We will leave this in 54.
Flags: needinfo?(gchang)
Assuming this will be affecting 56.
This is highly correlated with Avast on Release:
(85.71% in signature vs 00.40% overall) Addon "Avast SafePrice" = true
(85.71% in signature vs 00.99% overall) Addon "Avast Online Security" = true
(85.71% in signature vs 05.81% overall) Module "snxhk.dll" = true
Still a huge volume crash, with 10K crashes in the last week, often but not always in the content process. 
Noting also that this crash inspires a lot of angry user comments.
the MOZ_RELEASE_ASSERT(reactionVal.isObject()); is being hit on fennec numerous times in the last couple of days with the [@ ResolvePromise] signature.

this looks related to the ublock origin update to version 1.14.4 turning it into a webextension - the spike for this signature happened on the same day.

Correlations for FennecAndroid Release:
(99.92% in signature vs 00.71% overall) moz_crash_reason = MOZ_RELEASE_ASSERT(reactionVal.isObject())
(97.75% in signature vs 01.58% overall) Module "2029012401EexhtceanCspiuotnrSat.sqlite-shm" = true
(99.84% in signature vs 04.41% overall) Module "uBlock0@raymondhill.net.xpi" = true
(99.92% in signature vs 06.45% overall) Addon "uBlock Origin" = true
Crash Signature: [@ EnqueuePromiseReactionJob] [@ TriggerPromiseReactions] → [@ EnqueuePromiseReactionJob] [@ TriggerPromiseReactions] [@ ResolvePromise]
This isn't showing up in huge volume on 56 or 57 but I assume it will when those move to release. 
I'm leaving a comment at https://github.com/gorhill/uBlock/issues/2795 to alert the uBlock developer(s).
Note that ([@ EnqueuePromiseReactionJob ]) is mostly windows read assertions (at least this week), and all either 0xfffffffffff or 0x4c201b4.  They *appear* to be on the line before the MOZ_RELEASE_ASSERT (which is being hit at lower frequency, and is also being hit on fennec).

The fixed address is odd...
#6 top overall browser crash in Fennec release.
Just hit this for the first time in a newly updated Firefox 58. See https://crash-stats.mozilla.com/report/index/2462f4e0-ac5c-49f6-aad1-8a17e0170922
I also just hit this on today's nightly OSX, first hard crash I've had in a while

https://crash-stats.mozilla.com/report/index/7e95bb09-46c3-4ffc-ad6b-cfb290170922
[Tracking Requested - why for this release]:

#1 top crash for Fennec nightly 20170925171818.

Correlations for FennecAndroid Nightly
(100.0% in signature vs 01.22% overall) moz_crash_reason = MOZ_RELEASE_ASSERT(reactionVal.isObject())
(100.0% in signature vs 30.29% overall) Addon "uBlock Origin" = true
(17.95% in signature vs 80.65% overall) build_id = 20170916100141
(86.32% in signature vs 27.68% overall) Addon "uBlock Origin" Version = 1.14.8
In early 57b3 this is also the top Mac browser crash. One comment: Just opened firefox and started typing in the adress bar. The keys I pressed didn’t start to show up.
If someone has ideas for how to debug this further, I'd very much welcome those. Perhaps chaos mode in rr might stand a chance, don't know. As it is, I spent days trying to figure this out, tried to trace all code paths that could possibly cause the wrong value to be stored in this slot, and came up empty. I'm willing to spend more time on it, and that obviously would be important, but I definitely won't be able to do so until the end of next week as I'm traveling right now.
https://crash-stats.mozilla.com/signature/?signature=ResolvePromise is the highest volume signature on Android at the moment. Correlations for release, beta and nightly show a strong correlation to uBlock Origin" Version = 1.14.8. Perhaps trying to at least reproduce the crash on Android might help. ni on Ioana. I will try to get some URLs.
Flags: needinfo?(ioana.chiorean)
Just a random hint, maybe a red herring, but note the stacks in comment 38, comment 39, as well as all the 5 first stacks I've found on the link in comment 43 show mozilla::dom::GenerateRTCCertificateTask::Resolve() in the backtrace, which is generated via a webrtc generateCertificate call. Maybe it would be a nice place to add some log or look at the code?

https://developer.mozilla.org/en-US/docs/Web/API/RTCPeerConnection/generateCertificate
¡Hola Till!

Just crashed like this bp-c2fe8d82-9577-4888-95d2-3ea130170930

Is that crash this bug?

¡Gracias!
Alex
It's possible to trigger that assertion because JS::ResolvePromise doesn't guard against reentrant calls (bug 1403988), but the test case for that scenario looks a bit contrived to me:

---
var CreatePendingPromise = getSelfHostedValue("CreatePendingPromise");
var p = CreatePendingPromise();
resolvePromise(p, {
    get then() {
        resolvePromise(p, [undefined]);
    }
});
---
This is the #4 Windows top crash in Nightly 20170929100122.
Can this functionality be selectively disabled or put behind a pref while we work on understanding the cause?
Flags: needinfo?(jdemooij)
Attached patch PatchSplinter Review
Crash dumps show the |reactions| object in TriggerPromiseReactions is a DeadObjectProxy. We can and should just pass this to EnqueuePromiseReactionJob - it knows how to handle this - but instead we fell through to the array case and there things go horribly wrong.

The patch also adds a nukeAllCCWs function to the JS shell. That makes it possible to write a shell test for this, and this function seems useful to have for fuzzing.
Assignee: till → jdemooij
Attachment #8869708 - Attachment is obsolete: true
Status: REOPENED → ASSIGNED
Flags: needinfo?(till)
Flags: needinfo?(jdemooij)
Flags: needinfo?(ioana.chiorean)
Attachment #8916035 - Flags: review?(till)
Comment on attachment 8916035 [details] [diff] [review]
Patch

Review of attachment 8916035 [details] [diff] [review]:
-----------------------------------------------------------------

\o/, this is fantastic! I'd planned on spending the coming weekend on this, but I'm obviously very glad to see this fixed!
Attachment #8916035 - Flags: review?(till) → review+
Pushed by jandemooij@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/754a3e12321c
Check for dead object proxies in TriggerPromiseReactions. r=till
https://hg.mozilla.org/mozilla-central/rev/754a3e12321c
Status: ASSIGNED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
Please nominate this for Beta and Release approval. I think we should strongly consider taking this patch if/when we ship a 56.0.2 release.
Flags: needinfo?(jdemooij)
Flags: in-testsuite+
Target Milestone: mozilla55 → mozilla58
Comment on attachment 8916035 [details] [diff] [review]
Patch

Approval Request Comment
[Feature/Bug causing the regression]: Bug 1313049
[User impact if declined]: Significantly higher crash rate
[Is this code covered by automated tests?]: yes
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: no, adds a type check that wasn't there before, nothing else.
[Why is the change risky/not risky?]: n/a
[String changes made/needed]: none
Flags: needinfo?(jdemooij)
Attachment #8916035 - Flags: approval-mozilla-release?
Attachment #8916035 - Flags: approval-mozilla-beta?
Comment on attachment 8916035 [details] [diff] [review]
Patch

Fixes a top crasher, Beta57+
Attachment #8916035 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Here, I think the main effect would be on bringing down the crashes for Fennec on 56. This could be a good ridealong, maybe even a dot release driver for Fennec.
This is the #8 top crash for Fennec and we have a fix.  There isn't anything else as a mobile dot release driver for 56 though. 

Do we still think this is related to a specific version of ublock origin? 

Jandem or Till, want to request uplift to m-r?

Marcia, do you think this is worth a 56 dot release?
Flags: needinfo?(mozillamarcia.knous)
Flags: needinfo?(jdemooij)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #59)
> This is the #8 top crash for Fennec and we have a fix.  There isn't anything
> else as a mobile dot release driver for 56 though. 
> 
> Do we still think this is related to a specific version of ublock origin? 
> 
> Jandem or Till, want to request uplift to m-r?
> 
> Marcia, do you think this is worth a 56 dot release?

If we had another ride along maybe, but as discussed in IRC, on it own probably not worth it.
Flags: needinfo?(mozillamarcia.knous)
So far I don't see any other Fennec dot release driver. Maybe we should hold off.
Flags: needinfo?(jdemooij)
Attachment #8916035 - Flags: approval-mozilla-release? → approval-mozilla-release-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: