Closed
Bug 1490009
Opened 6 years ago
Closed 3 years ago
Crash in OOM | unknown | js::AutoEnterOOMUnsafeRegion::crash | js::gc::StoreBuffer::putCell
Categories
(Core :: JavaScript Engine, defect, P1)
Tracking
()
RESOLVED
WORKSFORME
Tracking | Status | |
---|---|---|
firefox-esr60 | --- | unaffected |
firefox62 | --- | unaffected |
firefox63 | + | wontfix |
firefox64 | + | wontfix |
firefox65 | --- | wontfix |
firefox66 | --- | wontfix |
firefox67 | --- | wontfix |
People
(Reporter: marcia, Unassigned)
References
Details
(Keywords: crash, regression)
Crash Data
Attachments
(1 file, 1 obsolete file)
3.87 KB,
patch
|
bzbarsky
:
review+
pascalc
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is report bp-19001319-8a54-49f5-a5a3-dc1990180910. ============================================================= New Windows content crash showing up in the 63 betas: https://bit.ly/2x2DahT. No comments and the only URL that shows up is facebook. Doesn't appear to affect 62 and no crashes so far in 64 nightly. Top 10 frames of crashing thread: 0 xul.dll js::AutoEnterOOMUnsafeRegion::crash js/src/vm/JSContext.cpp:1628 1 xul.dll js::gc::StoreBuffer::putCell js/src/gc/StoreBuffer.h:461 2 xul.dll JS::HeapObjectPostBarrier js/src/gc/Barrier.h:269 3 xul.dll static bool mozilla::CycleCollectedJSContext::EnqueuePromiseJobCallback xpcom/base/CycleCollectedJSContext.cpp:272 4 xul.dll JSRuntime::enqueuePromiseJob js/src/vm/Runtime.cpp:653 5 xul.dll static bool EnqueuePromiseReactionJob js/src/builtin/Promise.cpp:1008 6 xul.dll static bool ResolvePromise js/src/builtin/Promise.cpp:1052 7 xul.dll static bool FulfillMaybeWrappedPromise js/src/builtin/Promise.cpp:1086 8 xul.dll static bool ResolvePromiseInternal js/src/builtin/Promise.cpp:823 9 xul.dll js::PromiseObject::resolve js/src/builtin/Promise.cpp:4154 =============================================================
Reporter | ||
Comment 1•6 years ago
|
||
Adding another signature seen in 63 beta. 63 beta 4 has over 100 crashes so far.
Crash Signature: [@ OOM | unknown | js::AutoEnterOOMUnsafeRegion::crash | js::gc::StoreBuffer::putCell] → [@ OOM | unknown | js::AutoEnterOOMUnsafeRegion::crash | js::gc::StoreBuffer::putCell]
[@ shutdownhang | js::gc::StoreBuffer::putCell]
Reporter | ||
Comment 2•6 years ago
|
||
Hello Steven - this is a new JS crash which is emerging in 63 - can someone triage this bug? Thanks in advance.
Flags: needinfo?(sdetar)
Comment 3•6 years ago
|
||
Tracking for 63 as the volume on beta is sizable. Steven could you triage this bug and help find an owner please? Thanks
tracking-firefox63:
--- → +
Comment 4•6 years ago
|
||
Jason: during your triage of the JS engine bugs this week, could we make sure this bug is triaged. Thank you
Flags: needinfo?(sdetar) → needinfo?(jorendorff)
Comment 5•6 years ago
|
||
P1 to investigate, at least
Flags: needinfo?(jorendorff) → needinfo?(jcoppeard)
Priority: -- → P1
Comment 6•6 years ago
|
||
I don't know why this just started happening, but here's what I think is going on: Whenever a promise is resolved or rejected the JS engine tells the embedding to create a job to handle it (by calling back into the engine) at the appropriate time. These job objects frequently contain pointers to JSObject in the nursery, and so we have to record them in our storebuffer. If a script uses a lot of promises then this will result in there being many store buffer entries. (It occurs to me that this may have started because we're sweeping the nursery less frequently now.) These job objects are cycle collected and so may persist until the next CC even after the job has been run and the object is no longer used.
Flags: needinfo?(jcoppeard)
Comment 7•6 years ago
|
||
Here's a patch I think will help. This clears all the JSObject references in the callback object after use. This will remove the storebuffer entries ASAP rather than waiting for the next minor GC. It also occurs to me that this could be a more general problem with CC'ed things holding GC things live longer than necessary.
Assignee: nobody → jcoppeard
Attachment #9010582 -
Flags: review?(continuation)
Comment 8•6 years ago
|
||
Comment on attachment 9010582 [details] [diff] [review] bug1490009-reset-promise-callback Review of attachment 9010582 [details] [diff] [review]: ----------------------------------------------------------------- ::: xpcom/base/CycleCollectedJSContext.cpp @@ +226,5 @@ > if (global && !global->IsDying()) { > mCallback->Call("promise callback"); > aAso.CheckForInterrupt(); > } > + mCallback->Reset(); Please add a comment here about why you are doing this.
Attachment #9010582 -
Flags: review?(continuation) → review+
Pushed by jcoppeard@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/c173cb530f63 Clear CallbackObject fields after use for promise job to avoid tenuring objects unnecessarily r=mccr8
Comment 10•6 years ago
|
||
I'd like to leave this open until we see whether it fixes the problem.
Keywords: leave-open
Comment 11•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/c173cb530f63
Comment 12•6 years ago
|
||
There are explicit guarantees that CallbackObject provides about when CallbackOrNull() can return null, and the newly-added API totally violates those guarantees. Worse yet, it didn't even adjust the corresponding documentation, so consumers would think that the guarantees are still being provided. As a bare minimum, the new API needs to come with documentation explaining what invariants it's violating, and the existing documentation about the invariants needs to be updated accordingly. Better yet, in addition to the documentation the API would be private with a friend declaration to allow this one callsite. Even better would be to rejigger the whole thing to have a concept of "run-once" callback, which auto-disappears after you Call() it. Exactly how to set that up is not 100% clear to me yet, but conceptually it would be the best solution.
Flags: needinfo?(jcoppeard)
Updated•6 years ago
|
Flags: needinfo?(continuation)
Comment 13•6 years ago
|
||
Yeah, I guess I should have checked more about whether it was okay or not to just randomly clear. I didn't realize there was this particular invariant. This should be backed out then. It'll still have been in a Nightly, so we can see if it helps the original issue, maybe. I don't entirely understand how this violates the description on CallbackOrNull(). Doesn't JS run in the mCallback->Call()? Is the problem that it is expected that CallbackOrNull() can return null only when certain particular kinds of JS are run (as specified in the second sentence of the comment) or that sometimes we don't actually run mCallback->Call()?
Flags: needinfo?(continuation) → needinfo?(bzbarsky)
Comment 14•6 years ago
|
||
The invariant being violated is the second sentence: // It will only return null // after a JavaScript caller has called nukeSandbox on a Sandbox object, and // the cycle collector has had a chance to run. That can't happen without script running, but not all script running can cause that.
Flags: needinfo?(bzbarsky)
Comment 15•6 years ago
|
||
It looks like Jon is on PTO. Sheriffs, please back this patch out.
Keywords: checkin-needed
Comment 16•6 years ago
|
||
Backed out. https://hg.mozilla.org/mozilla-central/rev/99723bad4d198575d4888af2cc103fa325719e5e
Comment 17•6 years ago
|
||
could this be related to bug 1488164 which has the same spike time? (of course, it could just be two bugs that uplifted and caused OOMs in beta) (note: I think jonco is on PTO per above)
Flags: needinfo?(bzbarsky)
Comment 19•6 years ago
|
||
Sorry about that. How does this version look? I updated the doc comments and make Reset() private.
Flags: needinfo?(jcoppeard)
Attachment #9013311 -
Flags: review?(bzbarsky)
Comment 20•6 years ago
|
||
Comment on attachment 9013311 [details] [diff] [review] bug1490009-reset-promise-callback v2 >+ // this point. This should only be called if the object is know not to be used "known"
Attachment #9013311 -
Flags: review?(bzbarsky) → review+
Comment 21•6 years ago
|
||
Pushed by jcoppeard@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/cae1a7f33840 Clear CallbackObject fields after use for promise job to avoid tenuring objects unnecessarily r=bz
Comment 22•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/cae1a7f33840
Comment 23•6 years ago
|
||
We gained some perf boosts! == Change summary for alert #16377 (as of Tue, 02 Oct 2018 09:21:35 GMT) == Improvements: 16% tp5o_webext responsiveness linux64 pgo e10s stylo 1.23 -> 1.03 16% tp5o_webext responsiveness windows7-32 opt e10s stylo 1.20 -> 1.01 15% tp5o_webext responsiveness linux64 opt e10s stylo 1.39 -> 1.18 12% tp5o_webext responsiveness linux64-qr opt e10s stylo 1.96 -> 1.73 For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=16377
Comment 24•6 years ago
|
||
Jon, can we uplift this patch to beta as almost all the crashes affect 63?
Flags: needinfo?(jcoppeard)
Updated•6 years ago
|
Attachment #9010582 -
Attachment is obsolete: true
Comment 25•6 years ago
|
||
Comment on attachment 9013311 [details] [diff] [review] bug1490009-reset-promise-callback v2 [Beta/Release Uplift Approval Request] Feature/Bug causing the regression: None User impact if declined: Hopefully this will reduce the volume of OOM crashes on beta. Is this code covered by automated tests?: Yes Has the fix been verified in Nightly?: Yes Needs manual test from QE?: No If yes, steps to reproduce: List of other uplifts needed: None Risk to taking this patch: Low Why is the change risky/not risky? (and alternatives if risky): This is a simple change and problems would have resulted in immediate test failures on central. String changes made/needed: None.
Flags: needinfo?(jcoppeard)
Attachment #9013311 -
Flags: approval-mozilla-beta?
Comment 26•6 years ago
|
||
Comment on attachment 9013311 [details] [diff] [review] bug1490009-reset-promise-callback v2 OOM Crash fix, uplift approved for 63 beta 12, thanks.
Attachment #9013311 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 27•6 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/01dd14f85e40
Comment 28•6 years ago
|
||
I see a perf win here: == Change summary for alert #16386 (as of Tue, 02 Oct 2018 18:40:23 GMT) == Improvements: 14% tp5o_webext responsiveness windows7-32 opt e10s stylo 1.20 -> 1.03 For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=16386
QA Contact: sdetar
Updated•6 years ago
|
QA Contact: sdetar
Comment 29•6 years ago
|
||
the volume of [@ OOM | unknown | js::AutoEnterOOMUnsafeRegion::crash | js::gc::StoreBuffer::putCell ] crashes doesn't look like it has significantly reduced on beta since the patch landed in 63.0b12.
Flags: needinfo?(jcoppeard)
Comment 30•6 years ago
|
||
I don't know what's happening here. Note that this is an OOM crash so the cause may not be related to this signature at all. It's hard to know for sure but low volume of crashes like this seem to show up in nightly in the 20180823220048 build.
Flags: needinfo?(jcoppeard)
Comment 31•6 years ago
|
||
Unassigning myself as I'm not actively working on this.
Assignee: jcoppeard → nobody
Comment 32•6 years ago
|
||
Steven, this is currently tracking64+ and unassigned. With only a couple weeks left in this cycle, is this on the radar still?
status-firefox65:
--- → affected
Flags: needinfo?(sdetar)
Comment 33•6 years ago
|
||
Jon, can you help answer Ryan's question? Is there anything we could be doing here? It seems this is not on our radar at this point, is that accurate?
Flags: needinfo?(sdetar) → needinfo?(jcoppeard)
Updated•6 years ago
|
Comment 34•6 years ago
|
||
I don't think there's anything we can do here. We crashing because we're running out of memory so something has made us allocate more, but it's not necessarily the code in the stack trace.
Flags: needinfo?(jcoppeard)
Comment 35•6 years ago
|
||
Too late to do anything for 64.
Comment 36•5 years ago
|
||
Marking this fix-optional for 65 and 66 to remove it from repeated triage.
Comment 37•5 years ago
|
||
Based on comment 30 and comment 34, this bug seems to be non-actionable.
Keywords: stalled
Updated•5 years ago
|
Comment 38•4 years ago
|
||
The leave-open keyword is there and there is no activity for 6 months.
:sdetar, maybe it's time to close this bug?
Flags: needinfo?(sdetar)
Updated•4 years ago
|
Flags: needinfo?(sdetar)
Comment 39•3 years ago
|
||
Marking this as Resolved > Worksforme since there are no more crashes with this signature in the past 6 months.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WORKSFORME
Updated•3 years ago
|
Keywords: leave-open
Comment 40•3 years ago
|
||
Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit auto_nag documentation.
Keywords: stalled
You need to log in
before you can comment on or make changes to this bug.
Description
•