Closed Bug 1623007 Opened 4 years ago Closed 3 years ago

Intermittent [beta/release][Wd] TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/new_session/create_alwaysMatch.py,/webdriver/tests/new_session/create_firstMatch.py,/webdriver/tests/new_session/timeouts.py | expected OK

Categories

(Remote Protocol :: Marionette, defect, P2)

Version 3
defect

Tracking

(firefox75 fix-optional, firefox76 unaffected)

RESOLVED WORKSFORME
Tracking Status
firefox75 --- fix-optional
firefox76 --- unaffected

People

(Reporter: whimboo, Unassigned)

References

Details

(Keywords: intermittent-failure, regression)

This bug catches the unexpected timeout regression as it can be seen for Wdspec tests since March 9th on the beta branch. Affected tests are create_alwaysMatch.py (bug 1418778), create_firstMatch.py (bug 1507121) and timeouts.py (bug 1574837).

All of them are related to long delays in shutting down the Firefox browser.

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from bug 1507121 comment #52)

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #51)

Yes, this sounds like remote settings continuing to use IndexedDB late in the shutdown process is triggering the IndexedDB bug. We want to fix both things (Remote Settings and IndexedDB).

So you mean bug bug 1597312? What is the timeline for the fix? This is a perma failure for us on beta now. Are there any workarounds for now? Thanks.

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #53)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #52)

So you mean bug bug 1597312? What is the timeline for the fix? This is a perma failure for us on beta now. Are there any workarounds for now? Thanks.

Can you provide a treeherder link filtered down to the thing that's permafailing? Permafail should make it very easy to fix.

(In reply to Andreea Pavel [:apavel] from comment #54)

Some examples:

Wdspec beta failures range: https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&resultStatus=testfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel%2Crunnable&searchStr=wdspec&fromchange=2e6e177dbbff7020af4ba28f128f01f68aaada25&tochange=048c7cb557453e3abe49216c52fb868ed9a4819c (green jobs are not disaplyed)

Flags: needinfo?(bugmail)
Summary: Intermittent [beta] TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/new_session/ <random_file.py> | expected OK | March 9th regression → Intermittent [beta] TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/new_session/<file_name> | expected OK | March 9th regression

Note that on March 9th there was the first beta merge from central, so maybe a feature which is only enabled in Nightly plays well, but the old behavior not? Lets wait for the feedback from Andrew, and if he can see something.

Bugbug thinks this bug is a regression, but please revert this change in case of error.

Keywords: regression

Andrew can you please have another look? Comment 1 has links to failing jobs which you were requesting. Thanks!

Unless we get a reply here maybe enabling the async shutdown logs might give more information about the slow shutdown.

Depends on: 1624001

Andreea, when is the next push to beta done? Please let me know if the failures are still persistent. If yes I would like to have anohter look into the shutdown timings given that we enabled async shutdown logs for Marionette.

Flags: needinfo?(apavel)
Summary: Intermittent [beta] TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/new_session/<file_name> | expected OK | March 9th regression → Intermittent [beta] TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/<path>/<file_name> | expected OK | March 9th regression

Henrik, if you mean the central to beta merge, that's on April 6th. If there are any other pushes before then, except the ffxbuild, I'll let you know.

Flags: needinfo?(apavel)

(In reply to Andreea Pavel [:apavel] from comment #11)

Henrik, if you mean the central to beta merge, that's on April 6th. If there are any other pushes before then, except the ffxbuild, I'll let you know.

No I meant the simulation merges actually. But the intermittent bot already commented those... And this is even on release now.

Summary: Intermittent [beta] TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/<path>/<file_name> | expected OK | March 9th regression → Intermittent [beta, release] TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/<path>/<file_name> | expected OK | March 9th regression

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #13)

(In reply to Andreea Pavel [:apavel] from comment #11)

Henrik, if you mean the central to beta merge, that's on April 6th. If there are any other pushes before then, except the ffxbuild, I'll let you know.

No I meant the simulation merges actually. But the intermittent bot already commented those... And this is even on release now.

Yes, that is correct.

Recent failure log from release: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=295593697&repo=mozilla-release&lineNumber=89656

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from bug 1456641 comment #53)

Looks like the shutdown phase for profile-before-change gets finished but profile-before-change-telemetry is never started:

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=296180058&repo=mozilla-release&lineNumber=1151-1153

Maybe this happens due to the following session file error?

[task 2020-04-03T19:17:22.151Z] 19:17:22 INFO - PID 1597 | console.error: (new Error("SessionFile is closed", "resource:///modules/sessionstore/SessionFile.jsm", 433))

In working cases this failure doesn't appear:

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=296180058&repo=mozilla-release&lineNumber=32805-32817

Chris, do you have any idea? We see this across a lot of different Marionette and Wdspec tests. So if that is a telemetry problem we should file a new bug for it, and mark all known failures dependent o nit.

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from bug 1456641 comment #54)

Here actually another failure from a different test:

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=296180553&repo=mozilla-release&lineNumber=61442-61447

[task 2020-04-03T19:32:08.395Z] 19:32:08     INFO - PID 7332 | DEBUG: Finished phase profile-before-change
[task 2020-04-03T19:32:08.433Z] 19:32:08     INFO - PID 7332 | 
[task 2020-04-03T19:32:08.433Z] 19:32:08     INFO - PID 7332 | ###!!! [Parent][DispatchAsyncMessage] Error: PBackgroundIDBFactory::Msg_PBackgroundIDBFactoryRequestConstructor Value error: message was deserialized, but contained an illegal value
[task 2020-04-03T19:32:08.433Z] 19:32:08     INFO - PID 7332 | 
[task 2020-04-03T19:32:23.220Z] 19:32:23     INFO - PID 7332 | console.error: (new Error("SessionFile is closed", "resource:///modules/sessionstore/SessionFile.jsm", 433))
[task 2020-04-03T19:32:38.411Z] 19:32:38     INFO - PID 7332 | DEBUG: Starting phase profile-before-change-telemetry

So this problem might happen in-between those phases?

Flags: needinfo?(florian)
Flags: needinfo?(chutten)
Summary: Intermittent [beta, release] TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/<path>/<file_name> | expected OK | March 9th regression → Intermittent beta/release TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/new_session/create_alwaysMatch.py,/webdriver/tests/new_session/create_firstMatch.py,/webdriver/tests/new_session/timeouts.py | expected OK

Wow, it takes 15s after profile-before-change to error out that SessionFile is closed. And then another 15s afterwards for it to move to the next phase. What's going on in all that time?

I'm not sure what's going on here. Maybe someone with knowledge about the session store would? ni?mdeboer

Flags: needinfo?(chutten) → needinfo?(mdeboer)

Sebastian, those failures are not only for those two tests, but way more. So I don't think we can simply list them all in the summary due to length restrictions. Does a placeholder as it was in use before not work?

Flags: needinfo?(aryx.bugmail)

No, that didn't get suggested.

Flags: needinfo?(aryx.bugmail)

So here other bugs which got inappropriate classifications: Bug 1478371, Bug 1456641, Bug 1495691, Bug 1507121, Bug 1524813, Bug 1418778, Bug 1534017, Bug 1574837, Bug 1489472, Bug 1548426

What shall we do to cover all of those for beta/release here?

Flags: needinfo?(aryx.bugmail)
Summary: Intermittent beta/release TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/new_session/create_alwaysMatch.py,/webdriver/tests/new_session/create_firstMatch.py,/webdriver/tests/new_session/timeouts.py | expected OK → Intermittent [beta/release][Wd] TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/new_session/create_alwaysMatch.py,/webdriver/tests/new_session/create_firstMatch.py,/webdriver/tests/new_session/timeouts.py | expected OK

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #20)

So here other bugs which got inappropriate classifications: Bug 1478371, Bug 1456641, Bug 1495691, Bug 1507121, Bug 1524813, Bug 1418778, Bug 1534017, Bug 1574837, Bug 1489472, Bug 1548426

What shall we do to cover all of those for beta/release here?

We'll do our best to classify those correctly from now on.

Flags: needinfo?(aryx.bugmail)

Andreea, haven't there been any failures anymore for beta and release? Or haven't been run any simulations?

Flags: needinfo?(apavel)

There are failures on release https://treeherder.mozilla.org/#/jobs?repo=mozilla-release&group_state=expanded&searchStr=wd&selectedJob=297973278 - this bug gets added at the end of the suggested bugs list because its summay has a lower match rate than the one which perfectly matches the failure line.

Flags: needinfo?(apavel)

There is basically nothing else that needs specific tracking for beta and release. All the recent test failures as listed here are actually wrongly classified or belong to tests that in general are intermittent.

Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(mdeboer)
Flags: needinfo?(florian)
Flags: needinfo?(bugmail)
Resolution: --- → WORKSFORME
Moving bug to Remote Protocol::Marionette component per bug 1815831.
Component: geckodriver → Marionette
Product: Testing → Remote Protocol
You need to log in before you can comment on or make changes to this bug.