Closed Bug 1400253 Opened 2 years ago Closed 2 years ago

All 19 tests using do_calendar_startup() fail - Xpcshell failure in Calendar on 2017-09-25: ABORT: file ... mozilla/dom/workers/ServiceWorkerRegistrar.cpp, line 1027 (Friday bustage)

Categories

(Calendar :: General, defect, P1, critical)

Lightning 5.8
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jorgk, Assigned: Fallen)

Details

(Whiteboard: [Thunderbird-testfailure: X all][Thunderbird-temporary-fix])

Attachments

(1 file, 1 obsolete file)

All failures looking like this:

TEST-UNEXPECTED-FAIL | xpcshell-icaljs.ini:calendar/test/unit/test_calmgr.js | xpcshell return code: 1 [log…]
PID 10280 | FATAL ERROR: AsyncShutdown timeout in profile-before-change Conditions: [{"name":"ServiceWorkerRegistrar: Flushing data","state":"(none)","filename":"c:/builds/moz2_slave/tb-c-cen-w32-ntly-000000000000/build/mozilla/dom/workers/ServiceWorkerRegistrar.cpp","lineNumber":1027,"stack":"ServiceWorkerRegistrar: Flushing data"}] At least one completion condition failed to complete within a reasonable amount of time. Causing a crash to ensure that we do not leave the use [log…]
PID 10280 | [10280, Main Thread] ###!!! ABORT: file c:/builds/moz2_slave/tb-c-cen-w32-ntly-000000000000/build/mozilla/dom/workers/ServiceWorkerRegistrar.cpp, line 1027 [log…]
PROCESS-CRASH | xpcshell-icaljs.ini:calendar/test/unit/test_calmgr.js | application crashed [@ mozalloc_abort(char const * const)] [log…]

TEST-UNEXPECTED-FAIL | xpcshell-icaljs.ini:calendar/test/unit/test_datetime.js | xpcshell return code: 1 [log…]
PID 11904 | FATAL ERROR: AsyncShutdown timeout in profile-before-change Conditions: [{"name":"ServiceWorkerRegistrar: Flushing data","state":"(none)","filename":"c:/builds/moz2_slave/tb-c-cen-w32-ntly-000000000000/build/mozilla/dom/workers/ServiceWorkerRegistrar.cpp","lineNumber":1027,"stack":"ServiceWorkerRegistrar: Flushing data"}] At least one completion condition failed to complete within a reasonable amount of time. Causing a crash to ensure that we do not leave the use [log…]
PID 11904 | [11904, Main Thread] ###!!! ABORT: file c:/builds/moz2_slave/tb-c-cen-w32-ntly-000000000000/build/mozilla/dom/workers/ServiceWorkerRegistrar.cpp, line 1027 [log…]
PROCESS-CRASH | xpcshell-icaljs.ini:calendar/test/unit/test_datetime.js | application crashed [@ mozalloc_abort(char const * const)] [log…]

TEST-UNEXPECTED-FAIL | xpcshell-icaljs.ini:calendar/test/unit/test_calmgr.js | xpcshell return code: 1 [log…]
TEST-UNEXPECTED-FAIL | xpcshell-icaljs.ini:calendar/test/unit/test_datetime.js | xpcshell return code: 1 [log…]
TEST-UNEXPECTED-FAIL | xpcshell-icaljs.ini:calendar/test/unit/test_bug356207.js | xpcshell return code: 1 [log…]
TEST-UNEXPECTED-FAIL | xpcshell-icaljs.ini:calendar/test/unit/test_alarmutils.js | xpcshell return code: 1 [log…]
TEST-UNEXPECTED-FAIL | xpcshell-icaljs.ini:calendar/test/unit/test_bug343792.js | xpcshell return code: 1 [log…]
TEST-UNEXPECTED-FAIL | xpcshell-icaljs.ini:calendar/test/unit/test_alarmservice.js | xpcshell return code: 1 [log…]
TEST-UNEXPECTED-FAIL | xpcshell-icaljs.ini:calendar/test/unit/test_freebusy_service.js | xpcshell return code: 1 [log…]
TEST-UNEXPECTED-FAIL | xpcshell-icaljs.ini:calendar/test/unit/test_freebusy.js | xpcshell return code: 1 [log…]
TEST-UNEXPECTED-FAIL | xpcshell-icaljs.ini:calendar/test/unit/test_deleted_items.js | xpcshell return code: 1 [log…]
TEST-UNEXPECTED-FAIL | xpcshell-icaljs.ini:calendar/test/unit/test_datetimeformatter.js | xpcshell return code: 1 [log…]
TEST-UNEXPECTED-FAIL | xpcshell-icaljs.ini:calendar/test/unit/test_ics_service.js | xpcshell return code: 1 [log…]
TEST-UNEXPECTED-FAIL | xpcshell-icaljs.ini:calendar/test/unit/test_items.js | xpcshell return code: 1 [log…]
TEST-UNEXPECTED-FAIL | xpcshell-icaljs.ini:calendar/test/unit/test_storage.js | xpcshell return code: 1 [log…]
TEST-UNEXPECTED-FAIL | xpcshell-icaljs.ini:calendar/test/unit/test_utils.js | xpcshell return code: 1 [log…]
TEST-UNEXPECTED-FAIL | xpcshell-libical.ini:calendar/test/unit/test_alarmutils.js | xpcshell return code: 1 [log…]
TEST-UNEXPECTED-FAIL | xpcshell-libical.ini:calendar/test/unit/test_alarmservice.js | xpcshell return code: 1 [log…]
TEST-UNEXPECTED-FAIL | xpcshell-icaljs.ini:calendar/test/unit/test_rfc3339_parser.js | xpcshell return code: 1 [log…]
TEST-UNEXPECTED-FAIL | xpcshell-libical.ini:calendar/test/unit/test_bug343792.js | xpcshell return code: 1 [log…]
TEST-UNEXPECTED-FAIL | xpcshell-libical.ini:calendar/test/unit/test_bug356207.js | xpcshell return code: 1 [log…]
TEST-UNEXPECTED-FAIL | xpcshell-icaljs.ini:calendar/test/unit/test_timezone_definition.js | xpcshell return code: 1 [log…]
TEST-UNEXPECTED-FAIL | xpcshell-libical.ini:calendar/test/unit/test_calmgr.js | xpcshell return code: 1 [log…]
TEST-UNEXPECTED-FAIL | xpcshell-libical.ini:calendar/test/unit/test_datetime.js | xpcshell return code: 1 [log…]
TEST-UNEXPECTED-FAIL | xpcshell-libical.ini:calendar/test/unit/test_datetimeformatter.js | xpcshell return code: 1 [log…]
TEST-UNEXPECTED-FAIL | xpcshell-icaljs.ini:calendar/test/unit/test_recur.js | xpcshell return code: 1 [log…]
TEST-UNEXPECTED-FAIL | xpcshell-libical.ini:calendar/test/unit/test_deleted_items.js | xpcshell return code: 1 [log…]

M-C last good: 8e818b5e9b6bef0fc1a5c527ecf30b0d56
M-C first bad: 893fe1549e1e7342a66514b65960f08d40

https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=8e818b5e9b6bef0fc1a5c527ecf30b0d56&tochange=893fe1549e1e7342a66514b65960f08d40

Looks like:
4890481365e6 Kris Maglione - Bug 1399646: Part 2 - Use the async shutdown service for ServiceWorkerRegistrar. r=baku
https://hg.mozilla.org/mozilla-central/rev/4890481365e6

Kris and Andrea, any hints for us?
Flags: needinfo?(kmaglione+bmo)
Flags: needinfo?(amarchesini)
Definitely related to bug 1399646. Kris, can you take a look? Thanks
Flags: needinfo?(amarchesini)
My guess is that the timeout is not new, just the way that we handle it. AsyncShutdown tracks how long conditions take to complete and crashes if they take too long. The previous code just looped indefinitely, and either eventually finished or got killed by the xpcshell harness.
Flags: needinfo?(kmaglione+bmo)
What "conditions" and which "too long" are you referring to? Previously these tests passed, so I assume the "eventually finished". So what can I do to debug or fix this issue? With the Xpcshell tests in perma orange with an (felt) endless log of failures I have great trouble spotting new failures, so sheriffing C-C has just become very hard.
Flags: needinfo?(kmaglione+bmo)
There are a number of things that need to block shutdown until some asynchronous operation completes. They tend to achieve this by spinning nested event loops.

In this case, it's the shutdown of the ServiceWorkerRegistrar:

http://searchfox.org/mozilla-central/rev/2c9a5993ac40ec1db8450e3e0a85702fa291b9e2/dom/workers/ServiceWorkerRegistrar.cpp#1044-1070

The previous version of this code did its own, fairly simple event loop spinning. The new version uses the async shutdown service instead, which tracks which which operations are still pending, reports which ones are taking too long, and eventually crashes if one times out. The previous version did not do these things, and simply spun in definitely. The xpcshell harness was free to kill the process and still consider it a success if that happened (though I'm not sure that's what happened; perhaps it just eventually completed after a long delay), but a crash is always considered a failure.

So I'd suggest you look into the shutdown behavior with the previous code. If the registrar shutdown froze in the same way there, then there's probably a bug in the code run by those tests that prevents it from shutting down correctly. If it didn't, then there may be another shutdown bug that only shows up when it doesn't spin its own nested event loop.
Flags: needinfo?(kmaglione+bmo)
Whiteboard: [Thunderbird-testfailure: X all]
Keywords: leave-open
Whiteboard: [Thunderbird-testfailure: X all] → [Thunderbird-testfailure: X all][Thunderbird-disabled-test]
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/9d2bc17f3faa
temporarily disable failing Calendar tests. rs=bustage-fix
Hmm, I missed test_ics_parser.js and test_recur.js. They didn't fail here:
https://treeherder.mozilla.org/#/jobs?repo=comm-central&revision=4600e944bce2589ca1d4f0881b246b4e8897f7a1&selectedJob=131283343
but I'll hit them with the next push.
Adding two more. Instead of doing a second patch, I'll back the first one out and land this one instead. I hope this won't become a "whack a mole" exercise.
Attachment #8908766 - Attachment is obsolete: true
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/8dc11e7c9084
Backed out changeset 9d2bc17f3faa from bug 1400253 for not disabling enough tests. a=jorgk
https://hg.mozilla.org/comm-central/rev/d0a970b8ece0
temporarily disable failing tests. rs=bustage-fix
This was successful, with 19 tests switched off, Xpcshell now passes on Windows, currently the only platform where tests work at all. Philipp, can you please take a look. I really can't understand why some tests pass and others don't and why only calendar tests are affected.
Flags: needinfo?(philipp)
Severity: normal → major
Summary: Massive Xpcshell failure in Calendar on 2017-09-25 (25 failing tests): ABORT: file ... mozilla/dom/workers/ServiceWorkerRegistrar.cpp, line 1027 (Friday bustage) → Massive Xpcshell failure in Calendar on 2017-09-25 (19 failing tests): ABORT: file ... mozilla/dom/workers/ServiceWorkerRegistrar.cpp, line 1027 (Friday bustage)
Just to see what's happening, I ran
mozilla/mach xpcshell-test calendar/test/unit/test_datetimeformatter.js
locally.

The test runs through successfully with icaljs to
 0:02.72 LOG: Thread-1 INFO exiting test

waits then seconds and prints:
 0:12.75 PROCESS_OUTPUT: Thread-1 (pid:8060) "WARNING: At least one completion condition is taking too long to complete. Conditions: [{"name":"ServiceWorkerRegistrar: Flushing data","state":"(none)","filename":"c:/mozilla-source/comm-central/mozilla/dom/workers/ServiceWorkerRegistrar.cpp","lineNumber":1027,"stack":"ServiceWorkerRegistrar: Flushing data"}] Barrier: profile-before-change"

waits a minute and crashes:
 1:03.76 PROCESS_OUTPUT: Thread-1 (pid:8060) "FATAL ERROR: AsyncShutdown timeout in profile-before-change Conditions: [{"name":"ServiceWorkerRegistrar: Flushing data","state":"(none)","filename":"c:/mozilla-source/comm-central/mozilla/dom/workers/ServiceWorkerRegistrar.cpp","lineNumber":1027,"stack":"ServiceWorkerRegistrar: Flushing data"}] At least one completion condition failed to complete within a reasonable amount of time. Causing a crash to ensure that we do not leave the user with an unresponsive process draining resources."
 1:03.76 PROCESS_OUTPUT: Thread-1 (pid:8060) "[8060, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, NS_ERROR_XPC_GS_RETURNED_FAILURE) failed with result 0x80004002: file c:/mozilla-source/comm-central/mozilla/js/xpconnect/src/XPCJSID.cpp, line 702"
 1:03.76 PROCESS_OUTPUT: Thread-1 (pid:8060) "WARNING: No crash reporter available"
 1:03.76 PROCESS_OUTPUT: Thread-1 (pid:8060) "[8060, Main Thread] ###!!! ABORT: file c:/mozilla-source/comm-central/mozilla/dom/workers/ServiceWorkerRegistrar.cpp, line 1027"
 1:03.85 TEST_END: Thread-1 Harness FAIL, expected PASS. Subtests passed 31/31. Unexpected 1
xpcshell return code: 1
PROCESS-CRASH | xpcshell-icaljs.ini:calendar/test/unit/test_datetimeformatter.js | application crashed [unknown top frame]

Then it repeats with libical and the same happens until it finally fails.

Weird. Looking at what distinguishes a failing from a non-failing test is this: All failing tests use do_calendar_startup(), so that must be the culprit.

I think the Calendar people need to address this.
Summary: Massive Xpcshell failure in Calendar on 2017-09-25 (19 failing tests): ABORT: file ... mozilla/dom/workers/ServiceWorkerRegistrar.cpp, line 1027 (Friday bustage) → All 19 tests using do_calendar_startup() fail - Xpcshell failure in Calendar on 2017-09-25: ABORT: file ... mozilla/dom/workers/ServiceWorkerRegistrar.cpp, line 1027 (Friday bustage)
Assignee: nobody → philipp
Severity: major → critical
Priority: -- → P1
Version: Trunk → Lightning 5.8
Keywords: leave-open
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/66b336c0c7f7
Backed out changeset d0a970b8ece0 to re-enable temporarily disabled tests. a=jorgk
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Fixed by bug 1396897.
Flags: needinfo?(philipp)
Target Milestone: --- → 6.0
Beta (Calendar 5.9, TB 57):
https://hg.mozilla.org/releases/comm-beta/rev/9829d15fd5f05cbc7f07b0532024270951778d78
Target Milestone: 6.0 → 5.9
Whiteboard: [Thunderbird-testfailure: X all][Thunderbird-disabled-test] → [Thunderbird-testfailure: X all][Thunderbird-temporary-fix]
As bug 1396897 provided a real fix, does it need [Thunderbird-temporary-fix] ? Or are you using this tag for bugs that were a temporary fix at the time?
Before I sent the message to Maildev, I only had Thunderbird-disabled-test where I tagged a) temporary disabled tests b) temporary fixes and c) other disabled testes which can remain in that state for a while.

Now I devised a better scheme: Thunderbird-temporary-fix for anything that needs immediate further attention and Thunderbird-disabled-test for annoying but uncritical failures what we don't have time to look into.

Both tags are intended to be permanent, if the bug is closed, there is no need for action any more.

[PLR] is intended for open or closed bugs and requires action. Once reviewed, the tag is cleared.
You need to log in before you can comment on or make changes to this bug.