Closed Bug 1490009 Opened 6 years ago Closed 3 years ago

Crash in OOM | unknown | js::AutoEnterOOMUnsafeRegion::crash | js::gc::StoreBuffer::putCell

Tracking

()

Status:

RESOLVED WORKSFORME

Tracking Flags:

Tracking

Status

firefox-esr60

---

unaffected

firefox62

---

unaffected

firefox63

wontfix

firefox64

wontfix

firefox65

---

wontfix

firefox66

---

wontfix

firefox67

---

wontfix

People

(Reporter: marcia, Unassigned)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file, 1 obsolete file)

bug1490009-reset-promise-callback 6 years ago Jon Coppeard (:jonco) 1.71 KB, patch	mccr8 : review+	Details \| Diff \| Splinter Review
bug1490009-reset-promise-callback v2 6 years ago Jon Coppeard (:jonco) 3.87 KB, patch	bzbarsky : review+ pascalc : approval-mozilla-beta+	Details \| Diff \| Splinter Review

Marcia Knous [:marcia]

Reporter

Description

•

6 years ago

This bug was filed from the Socorro interface and is
report bp-19001319-8a54-49f5-a5a3-dc1990180910.
=============================================================

New Windows content crash showing up in the 63 betas: https://bit.ly/2x2DahT. No comments and the only URL that shows up is facebook. Doesn't appear to affect 62 and no crashes so far in 64 nightly.

Top 10 frames of crashing thread:

0 xul.dll js::AutoEnterOOMUnsafeRegion::crash js/src/vm/JSContext.cpp:1628
1 xul.dll js::gc::StoreBuffer::putCell js/src/gc/StoreBuffer.h:461
2 xul.dll JS::HeapObjectPostBarrier js/src/gc/Barrier.h:269
3 xul.dll static bool mozilla::CycleCollectedJSContext::EnqueuePromiseJobCallback xpcom/base/CycleCollectedJSContext.cpp:272
4 xul.dll JSRuntime::enqueuePromiseJob js/src/vm/Runtime.cpp:653
5 xul.dll static bool EnqueuePromiseReactionJob js/src/builtin/Promise.cpp:1008
6 xul.dll static bool ResolvePromise js/src/builtin/Promise.cpp:1052
7 xul.dll static bool FulfillMaybeWrappedPromise js/src/builtin/Promise.cpp:1086
8 xul.dll static bool ResolvePromiseInternal js/src/builtin/Promise.cpp:823
9 xul.dll js::PromiseObject::resolve js/src/builtin/Promise.cpp:4154

=============================================================

Marcia Knous [:marcia]

Reporter

Comment 1

•

6 years ago

Adding another signature seen in 63 beta. 63 beta 4 has over 100 crashes so far.

Marcia Knous [:marcia]

Reporter

Comment 2

•

6 years ago

Hello Steven - this is a new JS crash which is emerging in 63 - can someone triage this bug? Thanks in advance.

Flags: needinfo?(sdetar)

Pascal Chevrel:pascalc (PTO until April 26)

Comment 3

•

6 years ago

Tracking for 63 as the volume on beta is sizable. Steven could you triage this bug and help find an owner please? Thanks

tracking-firefox63: --- → +

Steven DeTar [:sdetar]

Comment 4

•

6 years ago

Jason: during your triage of the JS engine bugs this week, could we make sure this bug is triaged.  Thank you

Flags: needinfo?(sdetar) → needinfo?(jorendorff)

Jason Orendorff [:jorendorff]

Comment 5

•

6 years ago

P1 to investigate, at least

Flags: needinfo?(jorendorff) → needinfo?(jcoppeard)

Priority: -- → P1

Jon Coppeard (:jonco)

Comment 6

•

6 years ago

I don't know why this just started happening, but here's what I think is going on:

Whenever a promise is resolved or rejected the JS engine tells the embedding to create a job to handle it (by calling back into the engine) at the appropriate time.  These job objects frequently contain pointers to JSObject in the nursery, and so we have to record them in our storebuffer.  If a script uses a lot of promises then this will result in there being many store buffer entries.

(It occurs to me that this may have started because we're sweeping the nursery less frequently now.)

These job objects are cycle collected and so may persist until the next CC even after the job has been run and the object is no longer used.

Flags: needinfo?(jcoppeard)

Jon Coppeard (:jonco)

Comment 7

•

6 years ago

Attached patch bug1490009-reset-promise-callback (obsolete) — Details — Splinter Review

Here's a patch I think will help.  This clears all the JSObject references in the callback object after use.  This will remove the storebuffer entries ASAP rather than waiting for the next minor GC.

It also occurs to me that this could be a more general problem with CC'ed things holding GC things live longer than necessary.

Assignee: nobody → jcoppeard

Attachment #9010582 - Flags: review?(continuation)

Andrew McCreight [:mccr8]

Comment 8

•

6 years ago

Comment on attachment 9010582 [details] [diff] [review]
bug1490009-reset-promise-callback

Review of attachment 9010582 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/base/CycleCollectedJSContext.cpp
@@ +226,5 @@
>      if (global && !global->IsDying()) {
>        mCallback->Call("promise callback");
>        aAso.CheckForInterrupt();
>      }
> +    mCallback->Reset();

Please add a comment here about why you are doing this.

Attachment #9010582 - Flags: review?(continuation) → review+

Pulsebot

Comment 9

•

6 years ago

Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c173cb530f63
Clear CallbackObject fields after use for promise job to avoid tenuring objects unnecessarily r=mccr8

Jon Coppeard (:jonco)

Comment 10

•

6 years ago

I'd like to leave this open until we see whether it fixes the problem.

Keywords: leave-open

Andreea Pavel [:apavel]

Comment 11

•

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/c173cb530f63

Boris Zbarsky [:bzbarsky]

Comment 12

•

6 years ago

There are explicit guarantees that CallbackObject provides about when CallbackOrNull() can return null, and the newly-added API totally violates those guarantees.  Worse yet, it didn't even adjust the corresponding documentation, so consumers would think that the guarantees are still being provided.

As a bare minimum, the new API needs to come with documentation explaining what invariants it's violating, and the existing documentation about the invariants needs to be updated accordingly.

Better yet, in addition to the documentation the API would be private with a friend declaration to allow this one callsite.

Even better would be to rejigger the whole thing to have a concept of "run-once" callback, which auto-disappears  after you Call() it.  Exactly how to set that up is not 100% clear to me yet, but conceptually it would be the best solution.

Flags: needinfo?(jcoppeard)

Boris Zbarsky [:bzbarsky]

Updated

•

6 years ago

Flags: needinfo?(continuation)

Andrew McCreight [:mccr8]

Comment 13

•

6 years ago

Yeah, I guess I should have checked more about whether it was okay or not to just randomly clear. I didn't realize there was this particular invariant. This should be backed out then. It'll still have been in a Nightly, so we can see if it helps the original issue, maybe.

I don't entirely understand how this violates the description on CallbackOrNull(). Doesn't JS run in the mCallback->Call()? Is the problem that it is expected that CallbackOrNull() can return null only when certain particular kinds of JS are run (as specified in the second sentence of the comment) or that sometimes we don't actually run mCallback->Call()?

Flags: needinfo?(continuation) → needinfo?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Comment 14

•

6 years ago

The invariant being violated is the second sentence:

  // It will only return null
  // after a JavaScript caller has called nukeSandbox on a Sandbox object, and
  // the cycle collector has had a chance to run.

That can't happen without script running, but not all script running can cause that.

Flags: needinfo?(bzbarsky)

Andrew McCreight [:mccr8]

Comment 15

•

6 years ago

It looks like Jon is on PTO. Sheriffs, please back this patch out.

Keywords: checkin-needed

Ryan VanderMeulen [:RyanVM]

Comment 16

•

6 years ago

Backed out.
https://hg.mozilla.org/mozilla-central/rev/99723bad4d198575d4888af2cc103fa325719e5e

status-firefox64: ? → affected

status-firefox-esr60: --- → unaffected

tracking-firefox64: --- → +

Keywords: checkin-needed

Randell Jesup [:jesup] (needinfo me)

Comment 17

•

6 years ago

could this be related to bug 1488164 which has the same spike time?  (of course, it could just be two bugs that uplifted and caused OOMs in beta) (note: I think jonco is on PTO per above)

Flags: needinfo?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Comment 18

•

6 years ago

Doesn't seem related to me.

Flags: needinfo?(bzbarsky)

Jon Coppeard (:jonco)

Comment 19

•

6 years ago

Attached patch bug1490009-reset-promise-callback v2 — Details — Splinter Review

Sorry about that.  How does this version look?  I updated the doc comments and make Reset() private.

Flags: needinfo?(jcoppeard)

Attachment #9013311 - Flags: review?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Comment 20

•

6 years ago

Comment on attachment 9013311 [details] [diff] [review]
bug1490009-reset-promise-callback v2

>+  // this point. This should only be called if the object is know not to be used

"known"

Attachment #9013311 - Flags: review?(bzbarsky) → review+

Pulsebot

Comment 21

•

6 years ago

Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/cae1a7f33840
Clear CallbackObject fields after use for promise job to avoid tenuring objects unnecessarily r=bz

Raul Gurzau (:RaulG)

Comment 22

•

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/cae1a7f33840

Ionuț Goldan [:igoldan]

Comment 23

•

6 years ago

We gained some perf boosts!

== Change summary for alert #16377 (as of Tue, 02 Oct 2018 09:21:35 GMT) ==

Improvements:

 16%  tp5o_webext responsiveness linux64 pgo e10s stylo         1.23 -> 1.03
 16%  tp5o_webext responsiveness windows7-32 opt e10s stylo     1.20 -> 1.01
 15%  tp5o_webext responsiveness linux64 opt e10s stylo         1.39 -> 1.18
 12%  tp5o_webext responsiveness linux64-qr opt e10s stylo      1.96 -> 1.73

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=16377

Pascal Chevrel:pascalc (PTO until April 26)

Comment 24

•

6 years ago

Jon, can we uplift this patch to beta as almost all the  crashes affect 63?

Flags: needinfo?(jcoppeard)

Jon Coppeard (:jonco)

Updated

•

6 years ago

Attachment #9010582 - Attachment is obsolete: true

Jon Coppeard (:jonco)

Comment 25

•

6 years ago

Comment on attachment 9013311 [details] [diff] [review]
bug1490009-reset-promise-callback v2

[Beta/Release Uplift Approval Request]

Feature/Bug causing the regression: None

User impact if declined: Hopefully this will reduce the volume of OOM crashes on beta.

Is this code covered by automated tests?: Yes

Has the fix been verified in Nightly?: Yes

Needs manual test from QE?: No

If yes, steps to reproduce: 

List of other uplifts needed: None

Risk to taking this patch: Low

Why is the change risky/not risky? (and alternatives if risky): This is a simple change and problems would have resulted in immediate test failures on central.

String changes made/needed: None.

Flags: needinfo?(jcoppeard)

Attachment #9013311 - Flags: approval-mozilla-beta?

Pascal Chevrel:pascalc (PTO until April 26)

Comment 26

•

6 years ago

Comment on attachment 9013311 [details] [diff] [review]
bug1490009-reset-promise-callback v2

OOM Crash fix, uplift approved for 63 beta 12, thanks.

Attachment #9013311 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Andreea Pavel [:apavel]

Comment 27

•

6 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/01dd14f85e40

status-firefox63: affected → fixed

Joel Maher ( :jmaher ) (UTC -8)

Comment 28

•

6 years ago

I see a perf win here:
== Change summary for alert #16386 (as of Tue, 02 Oct 2018 18:40:23 GMT) ==

Improvements:

 14%  tp5o_webext responsiveness windows7-32 opt e10s stylo     1.20 -> 1.03

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=16386

QA Contact: sdetar

Firefox Bug Husbandry Bot

Updated

•

6 years ago

QA Contact: sdetar

[:philipp]

Comment 29

•

6 years ago

the volume of [@ OOM | unknown | js::AutoEnterOOMUnsafeRegion::crash | js::gc::StoreBuffer::putCell ] crashes doesn't look like it has significantly reduced on beta since the patch landed in 63.0b12.

Flags: needinfo?(jcoppeard)

Jon Coppeard (:jonco)

Comment 30

•

6 years ago

I don't know what's happening here.  Note that this is an OOM crash so the cause may not be related to this signature at all.

It's hard to know for sure but low volume of crashes like this seem to show up in nightly in the 20180823220048 build.

Flags: needinfo?(jcoppeard)

Jon Coppeard (:jonco)

Comment 31

•

6 years ago

Unassigning myself as I'm not actively working on this.

Assignee: jcoppeard → nobody

Ryan VanderMeulen [:RyanVM]

Comment 32

•

6 years ago

Steven, this is currently tracking64+ and unassigned. With only a couple weeks left in this cycle, is this on the radar still?

status-firefox65: --- → affected

Flags: needinfo?(sdetar)

Steven DeTar [:sdetar]

Comment 33

•

6 years ago

Jon, can you help answer Ryan's question?  Is there anything we could be doing here?  It seems this is not on our radar at this point, is that accurate?

Flags: needinfo?(sdetar) → needinfo?(jcoppeard)

Julien Cristau [:jcristau]

Updated

•

6 years ago

status-firefox63: fixed → wontfix

Jon Coppeard (:jonco)

Comment 34

•

6 years ago

I don't think there's anything we can do here.  We crashing because we're running out of memory so something has made us allocate more, but it's not necessarily the code in the stack trace.

Flags: needinfo?(jcoppeard)

Julien Cristau [:jcristau]

Comment 35

•

6 years ago

Too late to do anything for 64.

status-firefox64: affected → wontfix

Liz Henry (:lizzard) (relman/hg->git project)

Comment 36

•

5 years ago

Marking this fix-optional for 65 and 66 to remove it from repeated triage.

status-firefox65: affected → fix-optional

status-firefox66: --- → fix-optional

Nicolas B. Pierron [:nbp]

Comment 37

•

5 years ago

Based on comment 30 and comment 34, this bug seems to be non-actionable.

Keywords: stalled

Boris Zbarsky [:bzbarsky]

Updated

•

5 years ago

Regressions: 1555729

Ryan VanderMeulen [:RyanVM]

Updated

•

5 years ago

status-firefox65: fix-optional → wontfix

status-firefox66: fix-optional → wontfix

status-firefox67: --- → wontfix

BugBot [:suhaib / :marco/ :calixte]

Comment 38

•

4 years ago

The leave-open keyword is there and there is no activity for 6 months.
:sdetar, maybe it's time to close this bug?

Flags: needinfo?(sdetar)

Steven DeTar [:sdetar]

Updated

•

4 years ago

Flags: needinfo?(sdetar)

Andrei Purice

Comment 39

•

3 years ago

Marking this as Resolved > Worksforme since there are no more crashes with this signature in the past 6 months.

Status: NEW → RESOLVED

Closed: 3 years ago

Resolution: --- → WORKSFORME

BugBot [:suhaib / :marco/ :calixte]

Updated

•

3 years ago

Keywords: leave-open

BugBot [:suhaib / :marco/ :calixte]

Comment 40

•

3 years ago

Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit auto_nag documentation.

Keywords: stalled

You need to log in before you can comment on or make changes to this bug.