Closed Bug 1422036 Opened 7 years ago Closed 7 years ago

Intermittent TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/actions/mouse_dblclick.py | expected OK

Categories

(Remote Protocol :: Marionette, defect)

Version 3
defect
Not set
normal

Tracking

(firefox59 fixed, firefox60 fixed)

RESOLVED FIXED
mozilla60
Tracking Status
firefox59 --- fixed
firefox60 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: baku)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

It's a new failure based on the changes landed on bug 1385476.
Depends on: 1385476
See Also: → 1421878
Yeah, it seems sometimes the new session per test doesn't start.
[task 2018-01-07T10:42:22.230Z] 10:42:22     INFO - STDOUT: ______________________ ERROR at setup of test_no_dblclick ______________________
[task 2018-01-07T10:42:22.231Z] 10:42:22     INFO - STDOUT: new_session = <function create_session at 0x7fce7850d230>
[task 2018-01-07T10:42:22.231Z] 10:42:22     INFO - STDOUT: url = <function url at 0x7fce7850da28>
[task 2018-01-07T10:42:22.232Z] 10:42:22     INFO - STDOUT: add_browser_capabilites = <function update_capabilities at 0x7fce7850dc08>
[task 2018-01-07T10:42:22.233Z] 10:42:22     INFO - STDOUT:     @pytest.fixture
[task 2018-01-07T10:42:22.234Z] 10:42:22     INFO - STDOUT:     def dblclick_session(new_session, url, add_browser_capabilites):
[task 2018-01-07T10:42:22.234Z] 10:42:22     INFO - STDOUT: >       _, session = new_session({"capabilities": {"alwaysMatch": add_browser_capabilites({})}})
[task 2018-01-07T10:42:22.235Z] 10:42:22     INFO - STDOUT: add_browser_capabilites = <function update_capabilities at 0x7fce7850dc08>
[task 2018-01-07T10:42:22.236Z] 10:42:22     INFO - STDOUT: new_session = <function create_session at 0x7fce7850d230>
[task 2018-01-07T10:42:22.237Z] 10:42:22     INFO - STDOUT: url        = <function url at 0x7fce7850da28>
[task 2018-01-07T10:42:22.237Z] 10:42:22     INFO - STDOUT: tests/web-platform/tests/webdriver/tests/actions/mouse_dblclick.py:21: 
[task 2018-01-07T10:42:22.238Z] 10:42:22     INFO - STDOUT: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[task 2018-01-07T10:42:22.238Z] 10:42:22     INFO - STDOUT: tests/web-platform/tests/webdriver/tests/support/fixtures.py:192: in create_session
[task 2018-01-07T10:42:22.239Z] 10:42:22     INFO - STDOUT:     value = _session.send_command("POST", "session", body=body)
[task 2018-01-07T10:42:22.239Z] 10:42:22     INFO - STDOUT: tests/web-platform/tests/tools/webdriver/webdriver/client.py:434: in send_command
[task 2018-01-07T10:42:22.241Z] 10:42:22     INFO - STDOUT:     session=self)
[task 2018-01-07T10:42:22.242Z] 10:42:22     INFO - STDOUT: tests/web-platform/tests/tools/webdriver/webdriver/transport.py:155: in send
[task 2018-01-07T10:42:22.242Z] 10:42:22     INFO - STDOUT:     response = conn.getresponse()
[task 2018-01-07T10:42:22.243Z] 10:42:22     INFO - STDOUT: /usr/lib/python2.7/httplib.py:1136: in getresponse
[task 2018-01-07T10:42:22.243Z] 10:42:22     INFO - STDOUT:     response.begin()
[task 2018-01-07T10:42:22.244Z] 10:42:22     INFO - STDOUT: /usr/lib/python2.7/httplib.py:453: in begin
[task 2018-01-07T10:42:22.245Z] 10:42:22     INFO - STDOUT:     version, status, reason = self._read_status()
[task 2018-01-07T10:42:22.245Z] 10:42:22     INFO - STDOUT: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[task 2018-01-07T10:42:22.246Z] 10:42:22     INFO - STDOUT: self = <httplib.HTTPResponse instance at 0x7fce78d7bf80>
[task 2018-01-07T10:42:22.246Z] 10:42:22     INFO - STDOUT:     def _read_status(self):
[task 2018-01-07T10:42:22.247Z] 10:42:22     INFO - STDOUT:         # Initialize with Simple-Response defaults
[task 2018-01-07T10:42:22.247Z] 10:42:22     INFO - STDOUT:         line = self.fp.readline(_MAXLINE + 1)
[task 2018-01-07T10:42:22.248Z] 10:42:22     INFO - STDOUT:         if len(line) > _MAXLINE:
[task 2018-01-07T10:42:22.249Z] 10:42:22     INFO - STDOUT:             raise LineTooLong("header line")
[task 2018-01-07T10:42:22.250Z] 10:42:22     INFO - STDOUT:         if self.debuglevel > 0:
[task 2018-01-07T10:42:22.251Z] 10:42:22     INFO - STDOUT:             print "reply:", repr(line)
[task 2018-01-07T10:42:22.251Z] 10:42:22     INFO - STDOUT:         if not line:
[task 2018-01-07T10:42:22.252Z] 10:42:22     INFO - STDOUT:             # Presumably, the server closed the connection before
[task 2018-01-07T10:42:22.252Z] 10:42:22     INFO - STDOUT:             # sending a valid response.
[task 2018-01-07T10:42:22.253Z] 10:42:22     INFO - STDOUT: >           raise BadStatusLine(line)
[task 2018-01-07T10:42:22.253Z] 10:42:22     INFO - STDOUT: E           BadStatusLine: ''
[task 2018-01-07T10:42:22.253Z] 10:42:22     INFO - STDOUT: line       = ''
[task 2018-01-07T10:42:22.254Z] 10:42:22     INFO - STDOUT: self       = <httplib.HTTPResponse instance at 0x7fce78d7bf80>
[task 2018-01-07T10:42:22.254Z] 10:42:22     INFO - STDOUT: /usr/lib/python2.7/httplib.py:417: BadStatusLine

No response from wptserve?
A latest test job for a debug build shows a hang and crash due to a long shutdown:

https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=159664311&lineNumber=2691-2694

[task 2018-01-31T20:37:35.383Z] 20:37:35     INFO - PID 833 | Hit MOZ_CRASH(Workers Hanging - 1|A:1|S:0|Q:0-BC:2|FetchBodyWorkerHolder) at /builds/worker/workspace/build/src/dom/workers/RuntimeService.cpp:2014
[task 2018-01-31T20:37:35.383Z] 20:37:35     INFO - PID 833 | ExceptionHandler::GenerateDump cloned child 2156
[task 2018-01-31T20:37:35.384Z] 20:37:35     INFO - PID 833 | ExceptionHandler::SendContinueSignalToChild sent continue signal to child
[task 2018-01-31T20:37:35.385Z] 20:37:35     INFO - PID 833 | ExceptionHandler::WaitForContinueSignal waiting for continue signal...

Andrea, do you know of a way to get more information like additional log from Firefox?
Flags: needinfo?(amarchesini)
I know why we are crashing here. Basically we are in a deadlock with the worker because of this:

1. the worker is shutting down. Fetch WorkerHolder is notified on the worker thread.
2. We dispatch a runnable to the main-thread in order to cancel the httpChannel.
3. When OnStreamComplete is called, on the main-thread, we try go to back to the worker again in order to complete the releasing of the WorkerHolder and so on.
4. But point 2 and 3 took too much time and the worker is already in a Killing state where no runnables can be executed.

The result is that, the worker thread cannot be released because we still have the Fetch WorkerHolder, but the fetch code cannot dispatch the runnable to release its workerHolder.

I have a way to fix this, and it's part of the refactoring of worker shutting down procedure.
Flags: needinfo?(amarchesini)
(In reply to Andrea Marchesini [:baku] from comment #15)
> I have a way to fix this, and it's part of the refactoring of worker
> shutting down procedure.

Given the amount of failures we experience for that test, do you have a rough estimate in when this refactoring will be done? Also on which bug does that happen so that we can follow the status? Thanks.
Flags: needinfo?(amarchesini)
Attached patch hang.patchSplinter Review
Assignee: nobody → amarchesini
Flags: needinfo?(amarchesini)
Attachment #8948409 - Flags: review?(catalin.badea392)
Comment on attachment 8948409 [details] [diff] [review]
hang.patch

Review of attachment 8948409 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm
Attachment #8948409 - Flags: review?(catalin.badea392) → review+
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d79788e03f74
Using WorkerControlRunnable to release resources in fetch when the worker is shutting down, r=catalinb
Priority: P5 → P3
Priority: P3 → --
This is actually one of the main worker shutdown crash. We should uplift it asap.
Comment on attachment 8948409 [details] [diff] [review]
hang.patch

Approval Request Comment
[Feature/Bug causing the regression]: Fetch in workers
[User impact if declined]: Fetch can cause a crash in worker shutdown
[Is this code covered by automated tests?]: n/a
[Has the fix been verified in Nightly?]: n/a
[Needs manual test from QE? If yes, steps to reproduce]: n/a
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: low
[Why is the change risky/not risky?]: If the worker is not able to process normal runnables (because shutting down), we use a control runnable to release resources on the current thread.
[String changes made/needed]: none
Attachment #8948409 - Flags: approval-mozilla-beta?
Blocks: 1405290
https://hg.mozilla.org/mozilla-central/rev/d79788e03f74
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
Comment on attachment 8948409 [details] [diff] [review]
hang.patch

Fixes a number of worker-related shutdown issues. Let's get this on Beta for 59b8.
Attachment #8948409 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Depends on: 1448612
Blocks: 1437575
Moving bug to Remote Protocol::Marionette component per bug 1815831.
Component: geckodriver → Marionette
Product: Testing → Remote Protocol
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: