Closed Bug 1405980 Opened 4 years ago Closed 4 years ago

Intermittent GECKO(4316) | Assertion failure: zone->isGCScheduled(), at z:/build/build/src/js/src/jsgc.cpp:7046

Categories

(Core :: JavaScript: GC, defect, P5)

defect

Tracking

()

RESOLVED FIXED
mozilla58
Tracking Status
firefox58 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: jonco)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file, 1 obsolete file)

Filed by: archaeopteryx [at] coole-files.de

https://treeherder.mozilla.org/logviewer.html#?job_id=135091654&repo=autoland

https://queue.taskcluster.net/v1/task/Y2PMoYKrQPWI2l34pLdg1w/runs/0/artifacts/public/logs/live_backing.log

06:13:01     INFO -  GECKO(4316) | Assertion failure: zone->isGCScheduled(), at z:/build/build/src/js/src/jsgc.cpp:7046
06:13:01     INFO -  GECKO(4316) | [GPU 4128, Main Thread] WARNING: Shutting down GPU process early due to a crash!: file z:/build/build/src/gfx/ipc/GPUParent.cpp, line 407
06:13:01     INFO -  GECKO(4316) | Hit MOZ_CRASH(Aborting on channel error.) at z:/build/build/src/ipc/glue/MessageChannel.cpp:2543
06:13:01     INFO -  GECKO(4316) | #01: mozilla::ipc::MessageChannel::OnChannelErrorFromLink() [ipc/glue/MessageChannel.cpp:2543]
06:13:01     INFO -  GECKO(4316) | #02: base::MessagePumpForIO::WaitForIOCompletion(unsigned long,base::MessagePumpForIO::IOHandler *) [ipc/chromium/src/base/message_pump_win.cc:495]
06:13:01     INFO -  GECKO(4316) | #03: base::MessagePumpForIO::WaitForWork() [ipc/chromium/src/base/message_pump_win.cc:472]
06:13:01     INFO -  GECKO(4316) | #04: base::MessagePumpForIO::DoRunLoop() [ipc/chromium/src/base/message_pump_win.cc:434]
06:13:01     INFO -  GECKO(4316) | #05: base::MessagePumpWin::Run(base::MessagePump::Delegate *) [ipc/chromium/src/base/message_pump_win.h:80]
06:13:01     INFO -  GECKO(4316) | #06: MessageLoop::RunInternal() [ipc/chromium/src/base/message_loop.cc:326]
06:13:01     INFO -  GECKO(4316) | #07: MessageLoop::RunHandler() [ipc/chromium/src/base/message_loop.cc:320]
06:13:01     INFO -  GECKO(4316) | #08: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:300]
06:13:01     INFO -  GECKO(4316) | #09: base::Thread::ThreadMain() [ipc/chromium/src/base/thread.cc:184]
06:13:01     INFO -  GECKO(4316) | #10: `anonymous namespace'::ThreadFunc [ipc/chromium/src/base/platform_thread_win.cc:29]
06:13:01     INFO -  GECKO(4316) | #11: kernel32.dll + 0x53c45
06:13:01     INFO -  GECKO(4316) | #12: patched_BaseThreadInitThunk [mozglue/build/WindowsDllBlocklist.cpp:824]
06:13:01     INFO -  GECKO(4316) | #13: ntdll.dll + 0x637f5
06:13:01     INFO -  GECKO(4316) | #14: ntdll.dll + 0x637c8
06:13:01     INFO -  TEST-INFO | Main app process: exit 1
Duplicate of this bug: 1406431
Looks like this was triggered by the changes in bug 1405274.  The assertion that's failing is the following in GCRuntime::budgetIncrementalGC:

    if (zone->isTooMuchMalloc())
        MOZ_ASSERT(zone->isGCScheduled());

My current thinking is that this is happening because that bug stopped clearing the malloc bytes count for all zones on every GC, regardless of which zones were collected (although I just backed that patch out for telemetry regressions).

I think it's still possible for this assertion to fail though, since there are various situations were we don't collect the atoms zone (e.g. use of AutoKeepAtoms) even though isTooMuchMalloc() may be true.
Blocks: 1405274
Attached patch bug1405980-malloc-zone-assertion (obsolete) — Splinter Review
Patch to relax scheduling assertions for the atoms zone.
Assignee: nobody → jcoppeard
Attachment #8917031 - Flags: review?(sphink)
Comment on attachment 8917031 [details] [diff] [review]
bug1405980-malloc-zone-assertion

Review of attachment 8917031 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsgc.cpp
@@ +7023,5 @@
>          if (!zone->canCollect())
>              continue;
>  
>          if (zone->usage.gcBytes() >= zone->threshold.gcTriggerBytes()) {
> +            MOZ_ASSERT_IF(!zone->isAtomsZone(), zone->isGCScheduled());

I think this needs a comment. If I am understanding correctly, the only reason why the atoms zone could be different here is if hasHelperThreadZones() was true and is now false. Is that correct? Because otherwise, zone->canCollect() should have already skipped the zone.

More generally, this assertion seems to depend on canCollect() staying "no more true" than it was when the GC was triggered. (false -> true is the problematic transition.) There isn't any other way that could happen, is there? eg a zone getting transitioned between zone groups, or createdForHelperThread() becoming false.
Attachment #8917031 - Flags: review?(sphink) → review+
(In reply to Steve Fink [:sfink] [:s:] from comment #5)
> If I am understanding correctly, the only
> reason why the atoms zone could be different here is if
> hasHelperThreadZones() was true and is now false. Is that correct? Because
> otherwise, zone->canCollect() should have already skipped the zone.

That is correct... and looking at the backtrace for the assertion failure shows that this isn't what is happening here, as it's failing in the first slice of a GC.  So my patch is not the answer.
Attachment #8917031 - Attachment is obsolete: true
I don't know what's going on here, so this patch is to print some debugging information on failure.  Hopefully this will shed some light.
Attachment #8917418 - Flags: review?(sphink)
Attachment #8917418 - Flags: review?(sphink) → review+
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/8cdff44958c8
Dump debugging information if zones are not scheduled as expected r=sfink
https://hg.mozilla.org/mozilla-central/rev/8cdff44958c8
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Was this fixed somewhere else?
Flags: needinfo?(jcoppeard)
(In reply to Marco Castelluccio [:marco] from comment #11)
There were more changes to the malloc triggers done in bug 1408375.  I expect that fixed it.
Flags: needinfo?(jcoppeard)
You need to log in before you can comment on or make changes to this bug.