Intermittent PID 4868 | Assertion failure: mEvents, at z:/build/build/src/xpcom/threads/nsThread.cpp:1097

RESOLVED FIXED in Firefox 66

Status

()

defect
P5
normal
RESOLVED FIXED
9 months ago
4 months ago

People

(Reporter: intermittent-bug-filer, Assigned: froydnj)

Tracking

({assertion, intermittent-failure})

unspecified
mozilla67
Points:
---

Firefox Tracking Flags

(firefox-esr60 unaffected, firefox65 wontfix, firefox66 fixed, firefox67 fixed)

Details

(Whiteboard: [stockwell unknown])

Attachments

(1 attachment)

Reporter

Description

9 months ago
treeherder
Filed by: ncsoregi [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=203462219&repo=mozilla-central

https://queue.taskcluster.net/v1/task/QgbUKQiQSiig856fVR4dPQ/runs/0/artifacts/public/logs/live_backing.log

18:25:41     INFO - TEST-START | /html/dom/elements/elements-in-the-dom/historical.html
18:25:41     INFO - PID 4868 | ++DOCSHELL 085E5C00 == 2 [pid = 4548] [id = {ec1f783a-1443-42f1-813a-be6f51b9ee78}]
18:25:41     INFO - PID 4868 | ++DOMWINDOW == 5 (00FC27A0) [pid = 4548] [serial = 5] [outer = 00000000]
18:25:41     INFO - PID 4868 | ++DOMWINDOW == 6 (098A2400) [pid = 4548] [serial = 6] [outer = 00FC27A0]
18:25:41     INFO - PID 4868 | ++DOMWINDOW == 7 (098A3800) [pid = 4548] [serial = 7] [outer = 00FC27A0]
18:25:41     INFO - PID 4868 | ++DOCSHELL 098A7000 == 3 [pid = 4548] [id = {274509b5-83d8-4862-be82-8c42a352f9a1}]
18:25:41     INFO - PID 4868 | ++DOMWINDOW == 8 (00FC28D0) [pid = 4548] [serial = 8] [outer = 00000000]
18:25:41     INFO - PID 4868 | ++DOMWINDOW == 9 (098A8000) [pid = 4548] [serial = 9] [outer = 00FC28D0]
18:25:41     INFO - PID 4868 | ++DOMWINDOW == 10 (098A4400) [pid = 4548] [serial = 10] [outer = 00FC28D0]
18:25:41     INFO - PID 4868 | Assertion failure: mEvents, at z:/build/build/src/xpcom/threads/nsThread.cpp:1097
18:25:41     INFO - ........
18:25:41     INFO - TEST-OK | /html/dom/elements/elements-in-the-dom/historical.html | took 347ms
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)

This bug failed 72 times in the last 7 days. Occurs on Linux, OSX and Windows platforms on debug build type.

Recent log:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=224140970&repo=mozilla-inbound&lineNumber=23853

froydnj: Can you please take a look at this bug?

Flags: needinfo?(nfroyd)
Whiteboard: [stockwell needswork]
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Assignee

Comment 27

5 months ago

OK, so the failing stack seems to be:

Assertion failure: mEvents, at /builds/worker/workspace/build/src/xpcom/threads/nsThread.cpp:1033
#01: nsThread::ProcessNextEvent(bool, bool*) [mfbt/Assertions.h:38]
#02: NS_ProcessNextEvent(nsIThread*, bool) [xpcom/threads/nsThreadUtils.cpp:468]
#03: nsComponentManagerImpl::GetServiceByContractID(char const*, nsID const&, void**) [xpcom/components/nsComponentManager.cpp:1414]
#04: <name omitted> [xpcom/components/nsComponentManagerUtils.cpp:61]
#05: nsCOMPtr<nsIDebug2>::assign_from_gs_contractid(nsGetServiceByContractID, nsID const&) [xpcom/base/nsCOMPtr.h:1206]
#06: WatchdogMain(void*) [xpcom/base/nsCOMPtr.h:839]
#07: _pt_root [nsprpub/pr/src/pthreads/ptthread.c:204]
#08: libpthread.so.0 + 0x6295

That is, we're in the XPConnect watchdog, and we've decided that it's time to interrupt the currently running script:

https://searchfox.org/mozilla-central/rev/4faab2f1b697827b93e16cb798b22b197e5235c9/js/xpconnect/src/XPCJSContext.cpp#502-521

so we go to grab the debug service, but wouldn't you know, somebody else is trying to grab the debug service at the exact same time, and so we have to wait inside the component manager:

https://searchfox.org/mozilla-central/rev/4faab2f1b697827b93e16cb798b22b197e5235c9/js/xpconnect/src/XPCJSContext.cpp#502-521

and for reasons that I don't really understand, we decide that, because we're waiting for some other thread to be done creating the service, we need to spintheeventloop* on the current thread. But our current thread is the watchdog thread, and we don't have a proper event queue on the watchdog thread (it's not a true nsIThread). So we assert that we're trying to process the event queue on a thread that doesn't have one.

I guess the fix is to...not do that? Patch coming.

Flags: needinfo?(nfroyd)
Assignee

Comment 28

5 months ago
I expect mccr8 to get to this review first, but including kmag in case he has
some comments on the fallout from the event queue refactoring from a while
back.
Attachment #9039534 - Flags: review?(kmaglione+bmo)
Attachment #9039534 - Flags: review?(continuation)
Attachment #9039534 - Flags: review?(continuation) → review+

Comment 29

5 months ago
Pushed by nfroyd@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/27c046fe369a
avoid processing non-existent thread event queues; r=mccr8
Comment hidden (obsolete)

There are 95 total failures in the last 7 days

Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=224584565&repo=mozilla-central&lineNumber=14727
[task 2019-01-29T00:28:13.424Z] 00:28:13 INFO - TEST-START | /css/css-tables/visibility-collapse-colspan-001.html
[task 2019-01-29T00:28:13.447Z] 00:28:13 INFO - PID 11458 | [Child 11557, Main Thread] WARNING: NS_ENSURE_TRUE(domReturn) failed: file /builds/worker/workspace/build/src/dom/base/nsGlobalWindowOuter.cpp, line 7187
[task 2019-01-29T00:28:13.448Z] 00:28:13 INFO - PID 11458 | ++DOCSHELL 0x7f5d056ce000 == 1 [pid = 12317] [id = {b37d99a8-3772-449a-a4c6-6009da5a949d}]
[task 2019-01-29T00:28:13.526Z] 00:28:13 INFO - PID 11458 | ++DOMWINDOW == 1 (0x7f5d05662000) [pid = 12317] [serial = 1] [outer = (nil)]
[task 2019-01-29T00:28:13.563Z] 00:28:13 INFO - PID 11458 | ++DOMWINDOW == 2 (0x7f5d05899000) [pid = 12317] [serial = 2] [outer = 0x7f5d05662000]
[task 2019-01-29T00:28:13.582Z] 00:28:13 INFO - PID 11458 | --DOCSHELL 0x7fc8bfecd000 == 0 [pid = 12289] [id = {46c392dd-9990-434e-bef2-22d5b4fa8e2c}]
[task 2019-01-29T00:28:13.583Z] 00:28:13 INFO - PID 11458 | --DOMWINDOW == 3 (0x7fc8bfe61000) [pid = 12289] [serial = 1] [outer = (nil)] [url = http://web-platform.test:8000/css/css-tables/visibility-collapse-col-005.html]
[task 2019-01-29T00:28:13.584Z] 00:28:13 INFO - PID 11458 | --DOMWINDOW == 2 (0x7fc8c0098000) [pid = 12289] [serial = 2] [outer = (nil)] [url = about:blank]
[task 2019-01-29T00:28:13.587Z] 00:28:13 INFO - PID 11458 | --DOMWINDOW == 1 (0x7fc8c10b6800) [pid = 12289] [serial = 3] [outer = (nil)] [url = about:blank]
[task 2019-01-29T00:28:13.588Z] 00:28:13 INFO - PID 11458 | --DOMWINDOW == 0 (0x7fc8c10b9c00) [pid = 12289] [serial = 4] [outer = (nil)] [url = http://web-platform.test:8000/css/css-tables/visibility-collapse-col-005.html]
[task 2019-01-29T00:28:13.614Z] 00:28:13 INFO - PID 11458 | nsStringStats
[task 2019-01-29T00:28:13.616Z] 00:28:13 INFO - PID 11458 | => mAllocCount: 8567
[task 2019-01-29T00:28:13.617Z] 00:28:13 INFO - PID 11458 | => mReallocCount: 0
[task 2019-01-29T00:28:13.618Z] 00:28:13 INFO - PID 11458 | => mFreeCount: 8567
[task 2019-01-29T00:28:13.619Z] 00:28:13 INFO - PID 11458 | => mShareCount: 5703
[task 2019-01-29T00:28:13.620Z] 00:28:13 INFO - PID 11458 | => mAdoptCount: 435
[task 2019-01-29T00:28:13.622Z] 00:28:13 INFO - PID 11458 | => mAdoptFreeCount: 439
[task 2019-01-29T00:28:13.622Z] 00:28:13 INFO - PID 11458 | => Process ID: 12289, Thread ID: 140500623931200
[task 2019-01-29T00:28:13.638Z] 00:28:13 INFO - PID 11458 | ### XPCOM_MEM_BLOAT_LOG defined -- logging bloat/leaks to /tmp/tmpqTsO10.mozrunner/runtests_leaks_847_tab_pid12345.log
[task 2019-01-29T00:28:13.780Z] 00:28:13 INFO - PID 11458 | Couldn't convert chrome URL: chrome://branding/locale/brand.properties
[task 2019-01-29T00:28:13.828Z] 00:28:13 INFO - PID 11458 | Assertion failure: mEvents, at /builds/worker/workspace/build/src/xpcom/threads/nsThread.cpp:1034
[task 2019-01-29T00:28:13.828Z] 00:28:13 INFO - PID 11458 | #01: libxul.so + 0xcf3bd3
[task 2019-01-29T00:28:13.829Z] 00:28:13 INFO -
[task 2019-01-29T00:28:13.829Z] 00:28:13 INFO - PID 11458 | #02: libxul.so + 0xcce405
[task 2019-01-29T00:28:13.829Z] 00:28:13 INFO -
[task 2019-01-29T00:28:13.830Z] 00:28:13 INFO - PID 11458 | #03: libxul.so + 0xcd34e3
[task 2019-01-29T00:28:13.830Z] 00:28:13 INFO -
[task 2019-01-29T00:28:13.831Z] 00:28:13 INFO - PID 11458 | #04: libxul.so + 0x15aaa43
[task 2019-01-29T00:28:13.831Z] 00:28:13 INFO -
[task 2019-01-29T00:28:13.831Z] 00:28:13 INFO - PID 11458 | #05: libxul.so + 0x15aa522
[task 2019-01-29T00:28:13.832Z] 00:28:13 INFO -
[task 2019-01-29T00:28:13.949Z] 00:28:13 INFO - PID 11458 | #06: _pt_root [nsprpub/pr/src/pthreads/ptthread.c:204]
[task 2019-01-29T00:28:13.950Z] 00:28:13 INFO -
[task 2019-01-29T00:28:13.952Z] 00:28:13 INFO - PID 11458 | #07: libpthread.so.0 + 0x76ba
[task 2019-01-29T00:28:13.953Z] 00:28:13 INFO -
[task 2019-01-29T00:28:13.955Z] 00:28:13 INFO - PID 11458 | #08: libc.so.6 + 0x10741d
[task 2019-01-29T00:28:13.956Z] 00:28:13 INFO -
[task 2019-01-29T00:28:13.957Z] 00:28:13 INFO - PID 11458 | #09: ??? (???:???)
[task 2019-01-29T00:28:13.959Z] 00:28:13 INFO - PID 11458 | ###!!! [Parent][MessageChannel] Error: (msgtype=0x1E0087,name=PBrowser::Msg_Destroy) Channel error: cannot send/recv

A patch has been landed in the above comment, no other failures since then.

Comment hidden (Intermittent Failures Robot)

Comment 33

5 months ago
bugherder
Status: NEW → RESOLVED
Closed: 5 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67
Assignee: nobody → nfroyd

Please nominate this for Beta approval when you get a chance.

Flags: needinfo?(nfroyd)
Keywords: assertion
Comment hidden (Intermittent Failures Robot)
Assignee

Comment 36

5 months ago

Comment on attachment 9039534 [details] [diff] [review]
avoid processing non-existent thread event queues

Beta/Release Uplift Approval Request

Feature/Bug causing the regression

None

User impact if declined

None. Sheriffs will sigh at having to star this again and again on beta, though.

Is this code covered by automated tests?

Yes

Has the fix been verified in Nightly?

Yes

Needs manual test from QE?

No

If yes, steps to reproduce

List of other uplifts needed

None

Risk to taking this patch

Low

Why is the change risky/not risky? (and alternatives if risky)

Small change, well understood.

String changes made/needed

None

Flags: needinfo?(nfroyd)
Attachment #9039534 - Flags: review?(kmaglione+bmo) → approval-mozilla-beta?
Comment on attachment 9039534 [details] [diff] [review]
avoid processing non-existent thread event queues

Fix for intermittent test failure; we want to trust the test results enough to catch real bugs. OK to uplift for beta 5.
Attachment #9039534 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment hidden (Intermittent Failures Robot)
You need to log in before you can comment on or make changes to this bug.