Closed Bug 1496566 Opened 2 years ago Closed 1 year ago

Intermittent PID 4868 | Assertion failure: mEvents, at z:/build/build/src/xpcom/threads/nsThread.cpp:1097

Categories

(Core :: XPCOM, defect, P5)

defect

Tracking

()

RESOLVED FIXED
mozilla67
Tracking Status
firefox-esr60 --- unaffected
firefox65 --- wontfix
firefox66 --- fixed
firefox67 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: froydnj)

References

Details

(Keywords: assertion, intermittent-failure, Whiteboard: [stockwell unknown])

Attachments

(1 file)

Filed by: ncsoregi [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=203462219&repo=mozilla-central

https://queue.taskcluster.net/v1/task/QgbUKQiQSiig856fVR4dPQ/runs/0/artifacts/public/logs/live_backing.log

18:25:41     INFO - TEST-START | /html/dom/elements/elements-in-the-dom/historical.html
18:25:41     INFO - PID 4868 | ++DOCSHELL 085E5C00 == 2 [pid = 4548] [id = {ec1f783a-1443-42f1-813a-be6f51b9ee78}]
18:25:41     INFO - PID 4868 | ++DOMWINDOW == 5 (00FC27A0) [pid = 4548] [serial = 5] [outer = 00000000]
18:25:41     INFO - PID 4868 | ++DOMWINDOW == 6 (098A2400) [pid = 4548] [serial = 6] [outer = 00FC27A0]
18:25:41     INFO - PID 4868 | ++DOMWINDOW == 7 (098A3800) [pid = 4548] [serial = 7] [outer = 00FC27A0]
18:25:41     INFO - PID 4868 | ++DOCSHELL 098A7000 == 3 [pid = 4548] [id = {274509b5-83d8-4862-be82-8c42a352f9a1}]
18:25:41     INFO - PID 4868 | ++DOMWINDOW == 8 (00FC28D0) [pid = 4548] [serial = 8] [outer = 00000000]
18:25:41     INFO - PID 4868 | ++DOMWINDOW == 9 (098A8000) [pid = 4548] [serial = 9] [outer = 00FC28D0]
18:25:41     INFO - PID 4868 | ++DOMWINDOW == 10 (098A4400) [pid = 4548] [serial = 10] [outer = 00FC28D0]
18:25:41     INFO - PID 4868 | Assertion failure: mEvents, at z:/build/build/src/xpcom/threads/nsThread.cpp:1097
18:25:41     INFO - ........
18:25:41     INFO - TEST-OK | /html/dom/elements/elements-in-the-dom/historical.html | took 347ms

This bug failed 72 times in the last 7 days. Occurs on Linux, OSX and Windows platforms on debug build type.

Recent log:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=224140970&repo=mozilla-inbound&lineNumber=23853

froydnj: Can you please take a look at this bug?

Flags: needinfo?(nfroyd)
Whiteboard: [stockwell needswork]

OK, so the failing stack seems to be:

Assertion failure: mEvents, at /builds/worker/workspace/build/src/xpcom/threads/nsThread.cpp:1033
#01: nsThread::ProcessNextEvent(bool, bool*) [mfbt/Assertions.h:38]
#02: NS_ProcessNextEvent(nsIThread*, bool) [xpcom/threads/nsThreadUtils.cpp:468]
#03: nsComponentManagerImpl::GetServiceByContractID(char const*, nsID const&, void**) [xpcom/components/nsComponentManager.cpp:1414]
#04: <name omitted> [xpcom/components/nsComponentManagerUtils.cpp:61]
#05: nsCOMPtr<nsIDebug2>::assign_from_gs_contractid(nsGetServiceByContractID, nsID const&) [xpcom/base/nsCOMPtr.h:1206]
#06: WatchdogMain(void*) [xpcom/base/nsCOMPtr.h:839]
#07: _pt_root [nsprpub/pr/src/pthreads/ptthread.c:204]
#08: libpthread.so.0 + 0x6295

That is, we're in the XPConnect watchdog, and we've decided that it's time to interrupt the currently running script:

https://searchfox.org/mozilla-central/rev/4faab2f1b697827b93e16cb798b22b197e5235c9/js/xpconnect/src/XPCJSContext.cpp#502-521

so we go to grab the debug service, but wouldn't you know, somebody else is trying to grab the debug service at the exact same time, and so we have to wait inside the component manager:

https://searchfox.org/mozilla-central/rev/4faab2f1b697827b93e16cb798b22b197e5235c9/js/xpconnect/src/XPCJSContext.cpp#502-521

and for reasons that I don't really understand, we decide that, because we're waiting for some other thread to be done creating the service, we need to spintheeventloop* on the current thread. But our current thread is the watchdog thread, and we don't have a proper event queue on the watchdog thread (it's not a true nsIThread). So we assert that we're trying to process the event queue on a thread that doesn't have one.

I guess the fix is to...not do that? Patch coming.

Flags: needinfo?(nfroyd)
I expect mccr8 to get to this review first, but including kmag in case he has
some comments on the fallout from the event queue refactoring from a while
back.
Attachment #9039534 - Flags: review?(kmaglione+bmo)
Attachment #9039534 - Flags: review?(continuation)
Attachment #9039534 - Flags: review?(continuation) → review+
Pushed by nfroyd@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/27c046fe369a
avoid processing non-existent thread event queues; r=mccr8

There are 95 total failures in the last 7 days

Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=224584565&repo=mozilla-central&lineNumber=14727
[task 2019-01-29T00:28:13.424Z] 00:28:13 INFO - TEST-START | /css/css-tables/visibility-collapse-colspan-001.html
[task 2019-01-29T00:28:13.447Z] 00:28:13 INFO - PID 11458 | [Child 11557, Main Thread] WARNING: NS_ENSURE_TRUE(domReturn) failed: file /builds/worker/workspace/build/src/dom/base/nsGlobalWindowOuter.cpp, line 7187
[task 2019-01-29T00:28:13.448Z] 00:28:13 INFO - PID 11458 | ++DOCSHELL 0x7f5d056ce000 == 1 [pid = 12317] [id = {b37d99a8-3772-449a-a4c6-6009da5a949d}]
[task 2019-01-29T00:28:13.526Z] 00:28:13 INFO - PID 11458 | ++DOMWINDOW == 1 (0x7f5d05662000) [pid = 12317] [serial = 1] [outer = (nil)]
[task 2019-01-29T00:28:13.563Z] 00:28:13 INFO - PID 11458 | ++DOMWINDOW == 2 (0x7f5d05899000) [pid = 12317] [serial = 2] [outer = 0x7f5d05662000]
[task 2019-01-29T00:28:13.582Z] 00:28:13 INFO - PID 11458 | --DOCSHELL 0x7fc8bfecd000 == 0 [pid = 12289] [id = {46c392dd-9990-434e-bef2-22d5b4fa8e2c}]
[task 2019-01-29T00:28:13.583Z] 00:28:13 INFO - PID 11458 | --DOMWINDOW == 3 (0x7fc8bfe61000) [pid = 12289] [serial = 1] [outer = (nil)] [url = http://web-platform.test:8000/css/css-tables/visibility-collapse-col-005.html]
[task 2019-01-29T00:28:13.584Z] 00:28:13 INFO - PID 11458 | --DOMWINDOW == 2 (0x7fc8c0098000) [pid = 12289] [serial = 2] [outer = (nil)] [url = about:blank]
[task 2019-01-29T00:28:13.587Z] 00:28:13 INFO - PID 11458 | --DOMWINDOW == 1 (0x7fc8c10b6800) [pid = 12289] [serial = 3] [outer = (nil)] [url = about:blank]
[task 2019-01-29T00:28:13.588Z] 00:28:13 INFO - PID 11458 | --DOMWINDOW == 0 (0x7fc8c10b9c00) [pid = 12289] [serial = 4] [outer = (nil)] [url = http://web-platform.test:8000/css/css-tables/visibility-collapse-col-005.html]
[task 2019-01-29T00:28:13.614Z] 00:28:13 INFO - PID 11458 | nsStringStats
[task 2019-01-29T00:28:13.616Z] 00:28:13 INFO - PID 11458 | => mAllocCount: 8567
[task 2019-01-29T00:28:13.617Z] 00:28:13 INFO - PID 11458 | => mReallocCount: 0
[task 2019-01-29T00:28:13.618Z] 00:28:13 INFO - PID 11458 | => mFreeCount: 8567
[task 2019-01-29T00:28:13.619Z] 00:28:13 INFO - PID 11458 | => mShareCount: 5703
[task 2019-01-29T00:28:13.620Z] 00:28:13 INFO - PID 11458 | => mAdoptCount: 435
[task 2019-01-29T00:28:13.622Z] 00:28:13 INFO - PID 11458 | => mAdoptFreeCount: 439
[task 2019-01-29T00:28:13.622Z] 00:28:13 INFO - PID 11458 | => Process ID: 12289, Thread ID: 140500623931200
[task 2019-01-29T00:28:13.638Z] 00:28:13 INFO - PID 11458 | ### XPCOM_MEM_BLOAT_LOG defined -- logging bloat/leaks to /tmp/tmpqTsO10.mozrunner/runtests_leaks_847_tab_pid12345.log
[task 2019-01-29T00:28:13.780Z] 00:28:13 INFO - PID 11458 | Couldn't convert chrome URL: chrome://branding/locale/brand.properties
[task 2019-01-29T00:28:13.828Z] 00:28:13 INFO - PID 11458 | Assertion failure: mEvents, at /builds/worker/workspace/build/src/xpcom/threads/nsThread.cpp:1034
[task 2019-01-29T00:28:13.828Z] 00:28:13 INFO - PID 11458 | #01: libxul.so + 0xcf3bd3
[task 2019-01-29T00:28:13.829Z] 00:28:13 INFO -
[task 2019-01-29T00:28:13.829Z] 00:28:13 INFO - PID 11458 | #02: libxul.so + 0xcce405
[task 2019-01-29T00:28:13.829Z] 00:28:13 INFO -
[task 2019-01-29T00:28:13.830Z] 00:28:13 INFO - PID 11458 | #03: libxul.so + 0xcd34e3
[task 2019-01-29T00:28:13.830Z] 00:28:13 INFO -
[task 2019-01-29T00:28:13.831Z] 00:28:13 INFO - PID 11458 | #04: libxul.so + 0x15aaa43
[task 2019-01-29T00:28:13.831Z] 00:28:13 INFO -
[task 2019-01-29T00:28:13.831Z] 00:28:13 INFO - PID 11458 | #05: libxul.so + 0x15aa522
[task 2019-01-29T00:28:13.832Z] 00:28:13 INFO -
[task 2019-01-29T00:28:13.949Z] 00:28:13 INFO - PID 11458 | #06: _pt_root [nsprpub/pr/src/pthreads/ptthread.c:204]
[task 2019-01-29T00:28:13.950Z] 00:28:13 INFO -
[task 2019-01-29T00:28:13.952Z] 00:28:13 INFO - PID 11458 | #07: libpthread.so.0 + 0x76ba
[task 2019-01-29T00:28:13.953Z] 00:28:13 INFO -
[task 2019-01-29T00:28:13.955Z] 00:28:13 INFO - PID 11458 | #08: libc.so.6 + 0x10741d
[task 2019-01-29T00:28:13.956Z] 00:28:13 INFO -
[task 2019-01-29T00:28:13.957Z] 00:28:13 INFO - PID 11458 | #09: ??? (???:???)
[task 2019-01-29T00:28:13.959Z] 00:28:13 INFO - PID 11458 | ###!!! [Parent][MessageChannel] Error: (msgtype=0x1E0087,name=PBrowser::Msg_Destroy) Channel error: cannot send/recv

A patch has been landed in the above comment, no other failures since then.

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67

Please nominate this for Beta approval when you get a chance.

Flags: needinfo?(nfroyd)
Keywords: assertion

Comment on attachment 9039534 [details] [diff] [review]
avoid processing non-existent thread event queues

Beta/Release Uplift Approval Request

Feature/Bug causing the regression

None

User impact if declined

None. Sheriffs will sigh at having to star this again and again on beta, though.

Is this code covered by automated tests?

Yes

Has the fix been verified in Nightly?

Yes

Needs manual test from QE?

No

If yes, steps to reproduce

List of other uplifts needed

None

Risk to taking this patch

Low

Why is the change risky/not risky? (and alternatives if risky)

Small change, well understood.

String changes made/needed

None

Flags: needinfo?(nfroyd)
Attachment #9039534 - Flags: review?(kmaglione+bmo) → approval-mozilla-beta?
Comment on attachment 9039534 [details] [diff] [review]
avoid processing non-existent thread event queues

Fix for intermittent test failure; we want to trust the test results enough to catch real bugs. OK to uplift for beta 5.
Attachment #9039534 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.