[wdspec] new_session fixture takes a long time due to long shutdown times of Firefox

RESOLVED DUPLICATE of bug 1411908

Status

defect
P3
normal
RESOLVED DUPLICATE of bug 1411908
Last year
Last year

People

(Reporter: whimboo, Unassigned)

Tracking

(Blocks 1 bug)

unspecified
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox61 fixed)

Details

For several wdspec tests I noticed that sometimes when we make use of the `new_session` fixture for a test, the requested shutdown of Firefox takes a very long time. As result the test module times out.

Here an example from bug 1409422:

https://treeherder.mozilla.org/logviewer.html#?job_id=170430897&repo=mozilla-inbound&lineNumber=50739-50754

> [task 2018-03-26T23:51:47.341Z] 23:51:47     INFO - PID 5227 | 1522108307331	geckodriver::marionette	TRACE	-> 39:[0,107,"quit",{"flags":["eForceQuit"]}]
> [task 2018-03-26T23:51:47.345Z] 23:51:47     INFO - PID 5227 | 1522108307338	Marionette	TRACE	0 -> [0,107,"quit",{"flags":["eForceQuit"]}]
> [task 2018-03-26T23:51:47.347Z] 23:51:47     INFO - PID 5227 | 1522108307340	Marionette	TRACE	0 <- [1,107,{"error":"invalid session id","message":"","stacktrace":"WebDriverError@chrome://marionette/content/error.js:178:5\nIn ... et@chrome://marionette/content/server.js:236:8\n_onJSONObjectReady/<@chrome://marionette/content/transport.js:500:9\n"},null]
> [task 2018-03-26T23:51:47.350Z] 23:51:47     INFO - PID 5227 | 1522108307340	geckodriver::marionette	TRACE	<- [1,107,{"error":"invalid session id","message":"","stacktrace":"WebDriverError@chrome://marionette/content/error.js:178:5\nInvalidSessionIDError@chrome://marionette/content/error.js:347:5\nassert.that/<@chrome://marionette/content/assert.js:405:13\nassert.session@chrome://marionette/content/assert.js:79:3\ndespatch@chrome://marionette/content/server.js:285:7\nasync*execute@chrome://marionette/content/server.js:262:11\nasync*onPacket/<@chrome://marionette/content/server.js:237:15\nasync*onPacket@chrome://marionette/content/server.js:236:8\n_onJSONObjectReady/<@chrome://marionette/content/transport.js:500:9\n"},null]
> [task 2018-03-26T23:51:47.369Z] 23:51:47     INFO - PID 5227 | 1522108307356	Marionette	DEBUG	Closed connection 0
> [task 2018-03-26T23:52:16.814Z] 23:52:16     INFO - TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/navigation/get_title.py | expected OK
> [task 2018-03-26T23:52:16.814Z] 23:52:16     INFO - TEST-INFO took 30002ms
Priority: -- → P2
Here a log from a debug build and a failure for `get_title.py`:

https://treeherder.mozilla.org/logviewer.html#?job_id=169660058&repo=mozilla-inbound&lineNumber=53568-53582

> [task 2018-03-22T14:35:25.570Z] 14:35:25     INFO - PID 4620 |  => Process ID: 4971, Thread ID: 4147496704
> [task 2018-03-22T14:35:55.225Z] 14:35:55     INFO - PID 4620 | [Parent 4924, IPDL Background] WARNING: Some storage operations are taking longer than expected during shutdown and will be aborted!: file /builds/worker/workspace/build/src/dom/quota/ActorsParent.cpp, line 5827
[..]
> [task 2018-03-22T14:36:27.536Z] 14:36:27     INFO - PID 4620 | Hit MOZ_CRASH(Shutdown hanging before starting.) at /builds/worker/workspace/build/src/toolkit/components/terminator/nsTerminator.cpp:206

This really looks like the QuotaManager shutdown hang as covered on bug 1411908.
Depends on: 1411908
Maybe this is related to bug 1443524.
See Also: → 1443524
Needs "perf" and "hang" key words.
I think you may have the causality in the bug title wrong: it’s not
new_session that causes Firefox to spend a long time shutting down,
but some behaviour in Firefox that causes new_session to take a
long time.

Furthermore I notice from the log you quoted in comment #0 that the
execution time of Marionette:Quit itself is not very long (less
than one second).  It then takes 29 seconds before the test is
interrupted by wptrunner.  If we think it for PGO builds can take
up to 60 seconds to shut down Firefox, this implies the test runtime
(30 seconds) needs to be extended, right?  Either conditionally for
known-to-be-slow builds or generally.
Priority: P2 → P3
Summary: `new_session` fixture causes long shutdown times of Firefox → new_session fixture causes long shutdown times of Firefox
(In reply to Andreas Tolfsen ‹:ato› from comment #4)
> but some behaviour in Firefox that causes new_session to take a
> long time.

You are right. The summary should indeed be better. Fixing it now.

> Furthermore I notice from the log you quoted in comment #0 that the
> execution time of Marionette:Quit itself is not very long (less
> than one second).  It then takes 29 seconds before the test is
> interrupted by wptrunner.  If we think it for PGO builds can take
> up to 60 seconds to shut down Firefox, this implies the test runtime
> (30 seconds) needs to be extended, right?  Either conditionally for
> known-to-be-slow builds or generally.

No, it is not intended to have 29s to shutdown Firefox. This is basically a result from bug 1411908, which really should be fixed given it affects a huge amount of Firefox users.

Andrew, while we are still waiting for further feedback from you on bug 1411908, it might be helpful for you to know that we have a couple of wdspec tests which reproduce this underlying problem.
Flags: needinfo?(bugmail)
Summary: new_session fixture causes long shutdown times of Firefox → new_session fixture takes a long time due to long shutdown times of Firefox
:whimboo, thanks for your attention on this and pointing directly at tests.  As you know, reliable reproduction and tests are frequently the hard part, so it is invaluable to be made aware of these!  :janv is picking up that bug, so I'm transferring the needinfo to him.
Flags: needinfo?(bugmail) → needinfo?(jvarga)
Jan, also note bug 1425323 where we had to disable a test due to high frequent intermittent failures. This might even be a better testcase in case you want to test on Windows.
Bug 1409422 has 32 failures in the past 7 days.
Platforms: 
- Linux opt
- Linux x64 opt/ pgo
- linux64-nightly opt
Recent log failure:
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=172023556&lineNumber=49232

Relevant part of the log:
[task 2018-04-05T05:33:56.424Z] 05:33:56     INFO - PID 4701 | 1522906436406	geckodriver::marionette	TRACE	<- [1,107,{"error":"invalid session id","message":"","stacktrace":"WebDriverError@chrome://marionette/content/error.js:178:5\nInvalidSessionIDError@chrome://marionette/content/error.js:347:5\nassert.that/<@chrome://marionette/content/assert.js:405:13\nassert.session@chrome://marionette/content/assert.js:79:3\ndespatch@chrome://marionette/content/server.js:290:7\nasync*execute@chrome://marionette/content/server.js:267:11\nasync*onPacket/<@chrome://marionette/content/server.js:242:15\nasync*onPacket@chrome://marionette/content/server.js:241:8\n_onJSONObjectReady/<@chrome://marionette/content/transport.js:500:9\n"},null]
[task 2018-04-05T05:33:56.424Z] 05:33:56     INFO - PID 4701 | 1522906436420	Marionette	DEBUG	Closed connection 0
[task 2018-04-05T05:34:25.937Z] 05:34:25     INFO - TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/navigation/get_title.py | expected OK
[task 2018-04-05T05:34:25.937Z] 05:34:25     INFO - TEST-INFO took 30002ms
[task 2018-04-05T05:34:26.002Z] 05:34:26  WARNING - u'runner_teardown': ()
[task 2018-04-05T05:34:26.003Z] 05:34:26     INFO - Starting runner
[task 2018-04-05T05:34:26.025Z] 05:34:26     INFO - PID 5170 | 1522906466023	geckodriver	INFO	geckodriver 0.20.0 ( 'e72d2ab40d64'  '2018-04-05 06:44 +0200')
[task 2018-04-05T05:34:26.026Z] 05:34:26     INFO - PID 5170 | 1522906466023	webdriver::httpapi	DEBUG	Creating routes
[task 2018-04-05T05:34:26.030Z] 05:34:26     INFO - PID 5170 | 1522906466027	geckodriver	INFO	Listening on 127.0.0.1:4444
[task 2018-04-05T05:34:26.530Z] 05:34:26     INFO - WebDriver HTTP server listening at http://127.0.0.1:4444/
Flags: needinfo?(hskupin)
Whiteboard: [stockwell needswork]
Noemi, please do not add comments about intermittents on this bug. Instead keep that on the bugs which got the stars via Treeherder. Thanks.
Flags: needinfo?(hskupin)
Depends on: 1434618
Whiteboard: [stockwell needswork]
Depends on: 1455568
So if bug 1411908 doesn't help us here we should disable the user prompts tests once my patch on bug 1455568 has been landed.
Flags: needinfo?(jvarga)
We have really good news! I cannot see any timeout failures anymore since the QuotaManager bug has been fixed. I will keep this bug open for a little bit more and will check again once bug 1455568 has been fixed. Then we clearly won't need a longer timeout anymore for all the prompt tests.
Actually I think it doesn't make sense to keep this bug open longer as needed. The underlying problem has been fixed by Jan and all the timeout failures for user prompt handling are gone. Lets dupe it against bug 1411908.
Status: NEW → RESOLVED
Closed: Last year
No longer depends on: 1434618, 1455568, 1411908
Resolution: --- → DUPLICATE
Summary: new_session fixture takes a long time due to long shutdown times of Firefox → [wdspec] new_session fixture takes a long time due to long shutdown times of Firefox
Duplicate of bug: 1411908
No longer blocks: 1459916
You need to log in before you can comment on or make changes to this bug.