Closed Bug 1399866 Opened 4 years ago Closed 1 year ago

Intermittent GECKO(1136) | Assertion failure: js::CheckGrayMarkingState(mJSRuntime), at z:/build/build/src/xpcom/base/CycleCollectedJSRuntime.cpp:1207

Categories

(Core :: JavaScript: GC, defect, P5)

defect

Tracking

()

RESOLVED FIXED
mozilla71
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- unaffected
firefox67 --- unaffected
firefox68 --- unaffected
firefox69 --- unaffected
firefox70 --- unaffected
firefox71 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: jonco)

References

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell unknown])

Attachments

(7 files)

Filed by: archaeopteryx [at] coole-files.de

https://treeherder.mozilla.org/logviewer.html#?job_id=131009814&repo=autoland

https://queue.taskcluster.net/v1/task/Ql1gEelxQD2EP70evR_W-A/runs/0/artifacts/public/logs/live_backing.log

...
11:46:15     INFO -  GECKO(1136) |   from Object 000001DD62354880 fun_environment edge
11:46:15     INFO -  GECKO(1136) |   from ObjectGroup 000001DD63E19DF0 group_proto edge
11:46:15     INFO -  GECKO(1136) |   from Object 000001DD63483640 group edge
11:46:15     INFO -  GECKO(1136) |   from Object 000001DD685F2520 object slot edge
11:46:15     INFO -  GECKO(1136) |   from Object 000001DD6888D8C0 object slot edge
11:46:15     INFO -  GECKO(1136) |   from ObjectGroup 000001DD5E351070 group_proto edge
11:46:15     INFO -  GECKO(1136) |   from Object 000001DD69394CA0 group edge
11:46:15     INFO -  GECKO(1136) |   from root Preserved wrapper
11:46:15     INFO -  GECKO(1136) | Assertion failure: js::CheckGrayMarkingState(mJSRuntime), at z:/build/build/src/xpcom/base/CycleCollectedJSRuntime.cpp:1207
11:46:29     INFO -  GECKO(1136) | #01: nsCycleCollector::BeginCollection(ccType,nsICycleCollectorListener *) [xpcom/base/nsCycleCollector.cpp:3909]
11:46:29     INFO -  GECKO(1136) | #02: nsCycleCollector::Collect(ccType,js::SliceBudget &,nsICycleCollectorListener *,bool) [xpcom/base/nsCycleCollector.cpp:3761]
11:46:29     INFO -  GECKO(1136) | #03: nsCycleCollector_collectSlice(js::SliceBudget &,bool) [xpcom/base/nsCycleCollector.cpp:4315]
11:46:29     INFO -  GECKO(1136) | #04: nsJSContext::RunCycleCollectorSlice(mozilla::TimeStamp) [dom/base/nsJSEnvironment.cpp:1542]
11:46:29     INFO -  GECKO(1136) | #05: CCRunnerFired [dom/base/nsJSEnvironment.cpp:1936]
11:46:29     INFO -  GECKO(1136) | #06: std::_Func_impl<bool (*)(mozilla::TimeStamp),std::allocator<int>,bool,mozilla::TimeStamp>::_Do_call(mozilla::TimeStamp &&) [vs2015u3/VC/include/functional:214]
11:46:29     INFO -  GECKO(1136) | #07: mozilla::IdleTaskRunner::Run() [xpcom/threads/IdleTaskRunner.cpp:62]
11:46:29     INFO -  GECKO(1136) | #08: mozilla::TimedOut [xpcom/threads/IdleTaskRunner.cpp:85]
11:46:29     INFO -  GECKO(1136) | #09: nsTimerImpl::Fire(int) [xpcom/threads/nsTimerImpl.cpp:515]
11:46:29     INFO -  GECKO(1136) | #10: nsTimerEvent::Run() [xpcom/threads/TimerThread.cpp:289]
11:46:29     INFO -  GECKO(1136) | #11: nsThread::ProcessNextEvent(bool,bool *) [xpcom/threads/nsThread.cpp:1040]
11:46:29     INFO -  GECKO(1136) | #12: NS_ProcessNextEvent(nsIThread *,bool) [xpcom/threads/nsThreadUtils.cpp:521]
11:46:29     INFO -  GECKO(1136) | #13: mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate *) [ipc/glue/MessagePump.cpp:97]
11:46:29     INFO -  GECKO(1136) | #14: MessageLoop::RunHandler() [ipc/chromium/src/base/message_loop.cc:320]
11:46:29     INFO -  GECKO(1136) | #15: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:300]
11:46:29     INFO -  GECKO(1136) | #16: nsBaseAppShell::Run() [widget/nsBaseAppShell.cpp:160]
11:46:29     INFO -  GECKO(1136) | #17: nsAppShell::Run() [widget/windows/nsAppShell.cpp:232]
11:46:29     INFO -  GECKO(1136) | #18: nsAppStartup::Run() [toolkit/components/startup/nsAppStartup.cpp:288]
11:46:29     INFO -  GECKO(1136) | #19: XREMain::XRE_mainRun() [toolkit/xre/nsAppRunner.cpp:4697]
11:46:29     INFO -  GECKO(1136) | #20: XREMain::XRE_main(int,char * * const,mozilla::BootstrapConfig const &) [toolkit/xre/nsAppRunner.cpp:4861]
11:46:29     INFO -  GECKO(1136) | #21: XRE_main(int,char * * const,mozilla::BootstrapConfig const &) [toolkit/xre/nsAppRunner.cpp:4956]
11:46:29     INFO -  GECKO(1136) | #22: do_main [browser/app/nsBrowserApp.cpp:237]
11:46:29     INFO -  GECKO(1136) | #23: NS_internal_main(int,char * *,char * *) [browser/app/nsBrowserApp.cpp:311]
11:46:29     INFO -  GECKO(1136) | #24: wmain [toolkit/xre/nsWindowsWMain.cpp:118]
11:46:29     INFO -  GECKO(1136) | #25: __scrt_common_main_seh [f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:253]
11:46:29     INFO -  GECKO(1136) | #26: KERNEL32.DLL + 0x12774
11:46:29     INFO -  GECKO(1136) | #27: ntdll.dll + 0x70d61
11:46:29     INFO -  GECKO(1136) | Hit MOZ_CRASH(Aborting on channel error.) at z:/build/build/src/ipc/glue/MessageChannel.cpp:2544
11:46:29     INFO -  GECKO(1136) | ### ERROR: CreateThread: Access is denied.
This is a gray marking issue and the logs indicate that it's for a ProxyObject's private slot.  Here's a patch to add assertions when setting this slot.  Not a fix.
Assignee: nobody → jcoppeard
Attachment #8909886 - Flags: review?(sphink)
Here's a patch to output object class names when a gray marking problem is detected, which should make this easier to track down.  Not a fix.
Attachment #8909889 - Flags: review?(sphink)
Keywords: leave-open
Attachment #8909886 - Flags: review?(sphink) → review+
Attachment #8909889 - Flags: review?(sphink) → review+
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/fbdfaa4bae2d
Add gray marking assertions when setting proxy target r=sfink
https://hg.mozilla.org/integration/mozilla-inbound/rev/a079a0a2971d
Dump more info on gray marking failure r=sfink
Recent crashes are like:

[task 2017-09-25T18:26:11.827Z] 18:26:11     INFO - GECKO(1096) | Found black to gray edge to Object XULElement 0xe53dffa0
[task 2017-09-25T18:26:11.831Z] 18:26:11     INFO - GECKO(1096) |   from Object Proxy 0xd1869c70 proxy target edge
[task 2017-09-25T18:26:11.835Z] 18:26:11     INFO - GECKO(1096) |   from Object Object 0xe3133a60 object slot edge
[task 2017-09-25T18:26:11.843Z] 18:26:11     INFO - GECKO(1096) |   from Object Proxy 0xe53ded30 proxy target edge
[task 2017-09-25T18:26:11.845Z] 18:26:11     INFO - GECKO(1096) |   from Object XULElement 0xe53dffa0 object slot edge
[task 2017-09-25T18:26:11.847Z] 18:26:11     INFO - GECKO(1096) |   from root Preserved wrapper
I still can't figure out what's going on here.

Here's a patch to dump even more information when we detect things have gone wrong in debug builds.
Attachment #8924506 - Flags: review?(pbone)
Comment on attachment 8924506 [details] [diff] [review]
bug1399866-more-gray-marking-info

Review of attachment 8924506 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good to me. r+
Attachment #8924506 - Flags: review?(pbone) → review+
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/203ea4c2406f
Dump even more info on gray marking failure r=pbone
Duplicate of this bug: 1359049
Duplicate of this bug: 1406105
Duplicate of this bug: 1388185
Duplicate of this bug: 1421982
Duplicate of this bug: 1425223
We're seeing a black proxy object with a gray target.  

Add an assert that we never create a proxy with a gray target.  Currently we only assert we never create a black to gray edge, but a newly created proxy will not be black outside of incremental GC.
Attachment #8937160 - Flags: review?(sphink)
Attachment #8937160 - Flags: review?(sphink) → review+
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/dd94a26765f3
Assert proxy private is not gray on creation r=sfink
Duplicate of this bug: 1425902
Duplicate of this bug: 1427578
Duplicate of this bug: 1425558
Duplicate of this bug: 1423726
Duplicate of this bug: 1431880
Duplicate of this bug: 1431890
There have been a total of 32 failures in the last week, according to Orange Factor.
All the failures occur on debug.

As per platform, below you can see the no. of occurrences for each platform:
-linux64-stylo-disabled: 10
-windows10-64-ccov: 9
-Linux x64: 8
-Windows 7: 2
-linux32-stylo-disabled: 1
-OS X 10.10: 1
-windows10-64: 1

Here is a recent log file and a snippet with the failure:
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=159260208&lineNumber=4081

[task 2018-01-30T05:10:11.082Z] 05:10:11     INFO - GECKO(2040) |     "engines": null (shape 7f25e57cb318 enumerate slot 34)
[task 2018-01-30T05:10:11.196Z] 05:10:11     INFO - GECKO(2040) | MEMORY STAT | vsize 1669MB | residentFast 189MB | heapAllocated 34MB
[task 2018-01-30T05:10:11.563Z] 05:10:11     INFO - GECKO(2040) | Assertion failure: js::CheckGrayMarkingState(mJSRuntime), at /builds/worker/workspace/build/src/xpcom/base/CycleCollectedJSRuntime.cpp:1231

It seems that all dependencies have been resolved.
:jonco , would you please take a look at this?
Flags: needinfo?(jcoppeard)
Whiteboard: [stockwell needswork]
(In reply to Tiberius Oros[:tiberius_oros] from comment #42)
I'm currently working on resolving a talos regression (bug 1433128).  I'll take a look at this next.
I found somewhere we don't have gray marking asserts: the public/Proxy.h header proves some functions that are used by gecko to write proxy slots.  This patch adds assertions to ensure we don't create black to gray edges.
Flags: needinfo?(jcoppeard)
Attachment #8951343 - Flags: review?(sphink)
Attachment #8951343 - Flags: review?(sphink) → review+
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/42eb9bb1033e
Add gray marking asserts to proxy write functions r=sfink
I filed bug 1440739 to improve our gray marking asserts further.
Duplicate of this bug: 1448207
This is a good candidate for rr.
Flags: needinfo?(overholt)
Flags: needinfo?(overholt)

Recent failure log:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=225652005&repo=autoland&lineNumber=20772

06:53:59 INFO - GECKO(1076) | Assertion failure: js::CheckGrayMarkingState(mJSRuntime), at /builds/worker/workspace/build/src/xpcom/base/CycleCollectedJSRuntime.cpp:1119
21680 06:55:49 INFO - PROCESS-CRASH | Main app process exited normally | application crashed [@ mozilla::CycleCollectedJSRuntime::CheckGrayBits() const]
24917 06:55:50 INFO - TEST-UNEXPECTED-FAIL | leakcheck | tab missing output line for total leaks!
33236 07:04:13 ERROR - Return code: 1
33237 07:04:13 ERROR - Got 1 unexpected crashes
33282 [taskcluster:error] exit status 1

Jon, can you redirect this to someone who can look into it?

Flags: needinfo?(jcoppeard)

The recent failures all look like this:

Found black to gray edge to gray JS Object Window 0x2b51ece5f060 (compartment 0x7f6b3097c8e0)
  from black JS ObjectGroup 0x298433092040 group_global edge
  from gray JS Object Proxy 0x2984330d8040 (compartment 0x7f6b3097c8e0) group edge
  from gray JS Object LexicalEnvironment 0x2b51ece61040 (compartment 0x7f6b3097c8e0) object slot edge
  from gray JS Object Function 0x2b51ece67470 (compartment 0x7f6b3097c8e0) fun_environment edge
  ...

So it's complaining that we have a black object group whose global is gray, which should not be possible.

Not a fix. This patch adds some assertions that we don't ever set the realm of an ObjectGroup such that it points to a gray global object. We have these asserts already for other types of edge but this one is indirect because we store a pointer to the Realm, and get the global from that.

Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/67b51087d673
Add assertions that we don't create black to gray edges from ObjectGroups to global objects r=sfink

(In reply to Narcis Beleuzu [:NarcisB] from comment #83)
Great news, this may have found the problem.

Flags: needinfo?(jcoppeard)
Attachment #9101300 - Attachment description: Bug 1399866 - Add assertions that we don't create black to gray edges from ObjectGroups to global objects r?sfink → Bug 1399866 - Add assertions that we don't create black to gray edges from ObjectGroups to global objects r=sfink

When remapping wrappers the embedding can call this to create a new proxy object while in a realm whose global is gray. This breaks the JS API invariants and can cause black to gray GC edges to be creaated. Fix this by triggering the global's read barrier here.

Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/cec21bd20658
Trigger global's read barrier in js::NewProxyObject r=sfink
https://hg.mozilla.org/integration/autoland/rev/0f2ab1f1dc33
Add assertions that we don't create black to gray edges from ObjectGroups to global objects r=sfink

Closing as there are no failures since the last patches landed. If this starts happening again please file a new bug.

Status: NEW → RESOLVED
Closed: 1 year ago
Keywords: leave-open
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.