Closed
Bug 1490042
Opened 7 years ago
Closed 7 years ago
Infinite recursion inside type inference sweeping
Categories
(Core :: JavaScript: GC, defect)
Tracking
()
RESOLVED
FIXED
mozilla64
People
(Reporter: mccr8, Assigned: jonco)
References
Details
(Keywords: crash)
Crash Data
Attachments
(3 files, 1 obsolete file)
8.00 KB,
patch
|
sfink
:
review+
|
Details | Diff | Splinter Review |
1.06 KB,
patch
|
jandem
:
review+
|
Details | Diff | Splinter Review |
22.40 KB,
patch
|
jandem
:
review+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is
report bp-316db272-008e-41c2-a87f-019280180907.
=============================================================
Top 10 frames of crashing thread:
0 kernelbase.dll VirtualAlloc
1 mozglue.dll ?chunk_alloc@@YAPAXII_NPA_N@Z.llvm.16719441873306329850 memory/build/mozjemalloc.cpp:2017
2 mozglue.dll arena_t::AllocRun memory/build/mozjemalloc.cpp:2561
3 mozglue.dll BaseAllocator::malloc memory/build/mozjemalloc.cpp:4163
4 mozglue.dll moz_arena_malloc memory/build/malloc_decls.h:115
5 xul.dll js::LifoAlloc::getOrCreateChunk js/src/ds/LifoAlloc.cpp:269
6 xul.dll js::ObjectGroup::sweep js/src/vm/TypeInference.cpp:4509
7 xul.dll js::ObjectGroup::maybeClearNewScriptOnOOM js/src/vm/TypeInference.cpp:3113
8 xul.dll js::TypeZone::clearAllNewScriptsOnOOM js/src/vm/TypeInference.cpp:4701
9 xul.dll js::AutoClearTypeInferenceStateOnOOM::~AutoClearTypeInferenceStateOnOOM js/src/vm/TypeInference.cpp:4722
=============================================================
While looking at bug 1489957, I came across a handful of what looks like an infinite recursion crash inside type inference sweeping code. The sweep, maybeClearNewScriptOnOOM, clearAllNewScriptsOnOOM, ~AutoClearTypeInferenceStateOnOOM repeat over and over again.
Reporter | ||
Comment 1•7 years ago
|
||
Jon, any ideas? There aren't that many crashes, so it might not be worth investing too much time, but maybe there's an easy fix.
Flags: needinfo?(jcoppeard)
Reporter | ||
Comment 2•7 years ago
|
||
I guess it is more common than I thought at first. I did a search for proto signatures containing clearAllNewScriptsOnOOM, and that turned up almost 1000 crashes, and the half dozen crash reports I looked at also looked like instances of this crash, such as: bp-3a5fe8a8-58f3-4867-9f5e-17f190180910
https://crash-stats.mozilla.com/search/?proto_signature=~clearAllNewScriptsOnOOM&date=%3E%3D2018-09-03T08%3A33%3A05.000Z&date=%3C2018-09-10T08%3A33%3A05.000Z&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#crash-reports
Assignee | ||
Comment 3•7 years ago
|
||
TypeInference data is swept by copy and compacting it to a new area on every collection. Sweeping is incremental and there is code to ensure data is swept if necessary before it is accessed (AutoSweepObjectGroup). In the OOM failure path we are touching unswept data and sweeping it, which causes us to try to allocate again resulting in an infinite loop.
sfink suggested that we should be throwing away information on OOM and marking types as unknown, which would mean we don't need to allocate. I think this is the right approach.
Assignee | ||
Comment 4•7 years ago
|
||
As we see from the stack trace it's possible for the type inference OOM handling to sweep an ObjectGroup and hit OOM and loop. TI OOM handling works by setting a flag on an AutoClearTypeInferenceStateOnOOM and having that run cleanup actions in its destructor. The problem here is that we can nest AutoClearTypeInferenceStateOnOOM instances. I think that if we disallow this nesting then it won't be possible for this problem to happen, and OOM won't affect the ongoing cleanup.
I've spent a while today trying to make a test case to reproduce this but I haven't been able to find anything that works.
Assignee: nobody → jcoppeard
Flags: needinfo?(jcoppeard)
Assignee | ||
Comment 5•7 years ago
|
||
This patch changes the destructor of ~AutoClearTypeInferenceStateOnOOM to call |zone->types.setSweepingTypes(false)| at end, after cleanup has happened. This makes us assert (in release) if the cleanup tries to use any nested AutoClearTypeInferenceStateOnOOM.
ObjectGroup::sweep() changes to take a reference to one of these cleanup objects and the option to pass a null pointer and have it create a local one is removed.
Finally the AutoSweepObjectGroup and AutoSweepTypeScript classes get two constructors: one that takes a reference to an existing cleanup object, and one that creates one for you if needed.
All this means that OOM during OOM cleanup will just set the flag in the OOM cleanup object which is already true and not cause us to loop.
Attachment #9008409 -
Flags: review?(jdemooij)
Assignee | ||
Comment 6•7 years ago
|
||
Here's a patch for an unsetgczeal() shell function which turns off a single zeal mode and doesn't finish the current GC like gczeal(0) does.
This was necessary to make a test case for this.
Attachment #9008411 -
Flags: review?(sphink)
Assignee | ||
Comment 7•7 years ago
|
||
Test case to cause OOM during sweeping of type information. This was quite tricky to get right.
I checked that this reproduces the problem without the fix. I don't think we get an infinite loop but recursion of depth proportional to the number of unswept object groups. It also depends on incremental GC yielding during TI sweeping after an OOM has occured.
Attachment #9008415 -
Flags: review?(jdemooij)
Comment 8•7 years ago
|
||
Comment on attachment 9008411 [details] [diff] [review]
bug1490042-unset-zeal
Review of attachment 9008411 [details] [diff] [review]:
-----------------------------------------------------------------
Wow. That's quite comprehensive. Looks good to me, though I'm not sure I would spot some necessary missing cleanup.
Attachment #9008411 -
Flags: review?(sphink) → review+
Comment 9•7 years ago
|
||
Comment on attachment 9008409 [details] [diff] [review]
bug1490042-fix
Review of attachment 9008409 [details] [diff] [review]:
-----------------------------------------------------------------
Nice!
::: js/src/gc/GC.cpp
@@ +6238,5 @@
> }
> }
>
> static void
> SweepThing(JSScript* script, AutoClearTypeInferenceStateOnOOM* oom)
Nit: I think this |oom| could be a reference too, because we dereference it unconditionally? Also the SweepThing overload below.
::: js/src/vm/TypeInference.cpp
@@ -4740,5 @@
> propertySet = nullptr;
> }
>
> -static void
> -EnsureHasAutoClearTypeInferenceStateOnOOM(AutoClearTypeInferenceStateOnOOM*& oom, Zone* zone,
Very cool we can get rid of this.
Attachment #9008409 -
Flags: review?(jdemooij) → review+
Comment 10•7 years ago
|
||
Comment on attachment 9008415 [details] [diff] [review]
bug1490042-testcase
Review of attachment 9008415 [details] [diff] [review]:
-----------------------------------------------------------------
That's pretty heroic.
Attachment #9008415 -
Flags: review?(jdemooij) → review+
Assignee | ||
Comment 11•7 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #9)
> Nit: I think this |oom| could be a reference too, because we dereference it
> unconditionally? Also the SweepThing overload below.
Good call. I also need to fix argument forwarding and then this works.
Comment 12•7 years ago
|
||
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/aa3c5d257b1e
Only allow a single AutoClearTypeInferenceStateOnOOM to be active at once r=jandem r=sfink
Comment 13•7 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox64:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
Comment 14•7 years ago
|
||
Is this something you'd consider safe for a Beta uplift?
status-firefox62:
--- → wontfix
status-firefox63:
--- → affected
status-firefox-esr60:
--- → affected
Flags: needinfo?(jcoppeard)
Flags: in-testsuite+
Comment 15•7 years ago
|
||
Backed out for causing topcrash bug 1491530.
https://hg.mozilla.org/integration/autoland/rev/2b3acad1d831076f4936382ba4216c1c9d43ff63
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla64 → ---
Comment 16•7 years ago
|
||
Assignee | ||
Comment 17•7 years ago
|
||
The previous patch failed because of a path taken when converting unboxed objects to native uses AutoSweepObjectGroup while there's an AutoClearTypeInferenceStateOnOOM further up the stack. I couldn't see a way to thread a reference to that through all the calls without it infecting everything.
Here's an alternative, which is to not pass a reference to the AutoClearTypeInferenceStateOnOOM at all and keep the state in the TypeZone. I don't like this as much but it seems to work.
Another option is to wait until unboxed object support is removed which I think will make the original patch viable again.
Attachment #9008409 -
Attachment is obsolete: true
Flags: needinfo?(jcoppeard)
Attachment #9009933 -
Flags: review?(jdemooij)
Comment 18•7 years ago
|
||
Comment on attachment 9009933 [details] [diff] [review]
bug1490042-fix v2
Review of attachment 9009933 [details] [diff] [review]:
-----------------------------------------------------------------
This is still nicer than what we're currently doing.
Attachment #9009933 -
Flags: review?(jdemooij) → review+
Comment 19•7 years ago
|
||
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/274c743b3d55
Only allow one AutoClearTypeInferenceStateOnOOM to be live at a time to fix recursive type sweeping r=jandem
Comment 20•7 years ago
|
||
bugherder |
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
Comment 21•7 years ago
|
||
Jon, do you consider this patch safe for a Beta uplift? Thanks.
Flags: needinfo?(jcoppeard)
Assignee | ||
Comment 22•7 years ago
|
||
The previous patch caused problems and had to be backed out so I'd prefer to let this one ride if this issue is low enough volume that we're happy to do that.
Flags: needinfo?(jcoppeard)
Updated•7 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•