Open Bug 1879556 Opened 11 months ago Updated 7 months ago

Intermittent TEST-UNEXPECTED-ERROR | /webdriver/tests/classic/perform_actions/<test file>| <random> - teardown error: OSError: [WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted

Categories

(Remote Protocol :: Marionette, defect, P5)

Hardware: Desktop
OS: Windows
Type: defect

Tracking

(firefox124 disabled, firefox126 disabled, firefox127 disabled, firefox128 affected)

People

(Reporter: intermittent-bug-filer, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: intermittent-failure, intermittent-testcase, leave-open)

Attachments

(2 files, 1 obsolete file)

Filed by: hskupin [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=446670071&repo=try
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/c_XHXwhtRxSSFRpFb_jP6g/runs/0/artifacts/public/logs/live_backing.log


This seems to be a Windows-only failure that happens frequently and may have started recently. We should check.

Note that this is a new issue that started with the beta simulations for 124b1, on Windows headless only.

OS: Unspecified → Windows
Hardware: Unspecified → Desktop
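For context: WinError 10048 is the Windows Sockets error WSAEADDRINUSE, raised when bind() targets an address/port pair that another socket already holds (the equivalent on Linux and macOS is EADDRINUSE). The minimal Python sketch below reproduces that class of error locally by binding two sockets to the same loopback port. It only illustrates the error code; it is not the wdspec harness code, and it does not claim to show which socket collides in the failing teardown. The helper names are hypothetical.

```python
import errno
import socket

WSAEADDRINUSE = 10048  # Windows Sockets code shown in the bug summary


def try_double_bind(host="127.0.0.1"):
    """Bind two sockets to the same port; return the OSError from the second bind, if any."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))  # port 0: let the OS pick a free ephemeral port
        port = first.getsockname()[1]
        try:
            second.bind((host, port))  # same address/port, no SO_REUSEADDR
        except OSError as exc:
            return exc
        return None
    finally:
        first.close()
        second.close()


def is_addr_in_use(exc):
    """True for EADDRINUSE on POSIX or WinError 10048 on Windows."""
    return exc.errno == errno.EADDRINUSE or getattr(exc, "winerror", None) == WSAEADDRINUSE
```

On Windows the returned exception prints as the same "[WinError 10048] Only one usage of each socket address..." message seen in the summary.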

Here are the recent beta simulation pushes:

I've triggered some reruns of the wd2 jobs on all of the earlier builds to see whether the failure also happened before. The changeset from Feb 6th doesn't make sense as the cause, because it's Cocoa-only.

So far there is no reproduction. Maybe it's a Windows issue caused by updated workers?

As discussed in today's meeting, Julian will have a look.

Flags: needinfo?(jdescottes)

It looks like only the tests in webdriver/tests/classic/perform_actions/pointer_touch.py are causing this problem.

Summary: Intermittent Wd | OSError: [WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted → Intermittent TEST-UNEXPECTED-ERROR | /webdriver/tests/classic/perform_actions/pointer_touch.py | <random> - teardown error: OSError: [WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted
Duplicate of this bug: 1879993

(In reply to Joel Maher ( :jmaher ) (UTC -8) from bug 1879993 comment #1)

There will be a deployment this week of the new Windows image with Python 3.11, so I wanted to get ahead of the game, so that we wouldn't end up with a permafail and would have something we could uplift as needed.

Here is how I ran the test on Try:

./mach try fuzzy --no-artifact --worker-override="win11-64-2009=gecko-t/win11-64-2009-alpha" -q 'test-windows11-64 wdspec headless !debug'

Joel, any idea why this failure started with the beta simulation jobs on Try on February 6th? See my comment 3. It doesn't make any sense. Was another Windows update applied that day or the day before which could have triggered this problem?

Flags: needinfo?(jmaher)

No Windows changes have been made or deployed in the last few weeks. Since this failure is showing up, I suspect that either we have a different failure pattern with the new Python 3.11, or Python 3.11 introduces timing issues or different package dependencies. For example, if a slightly different version of some dependent package were cached on certain VMs, that could explain why this was seen (low chance).
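One hypothetical way to check the package-drift theory above is to record the interpreter and package versions on each worker and diff them between a passing and a failing machine. The sketch below uses only the standard library (importlib.metadata, Python 3.8+); the function names and the package list in the usage note are illustrative assumptions, not something the test harness provides.

```python
import sys
from importlib import metadata


def environment_fingerprint(packages):
    """Collect the Python version and installed versions of the given packages."""
    info = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = None  # package not installed on this worker
    return info


def diff_fingerprints(a, b):
    """Return {key: (value_in_a, value_in_b)} for every entry that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

For example, environment_fingerprint(["pytest", "websockets"]) (package names chosen only as examples) could be logged at the start of a job, and the dicts from two workers compared with diff_fingerprints.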

Flags: needinfo?(jmaher)
Assignee: nobody → jmaher
Status: NEW → ASSIGNED
Pushed by jmaher@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/334d08337edd disable webdriver/tests/classic/perform_actions/pointer_touch.py on win/opt. r=whimboo,webdriver-reviewers

Marking as disabled on Windows for tracking purposes.

Assignee: jmaher → nobody
Blocks: 1743116
Status: ASSIGNED → NEW
Keywords: test-disabled
See Also: → 1891294
Summary: Intermittent TEST-UNEXPECTED-ERROR | /webdriver/tests/classic/perform_actions/pointer_touch.py | <random> - teardown error: OSError: [WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted → Intermittent TEST-UNEXPECTED-ERROR | /webdriver/tests/classic/perform_actions/<test file>| <random> - teardown error: OSError: [WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted
Duplicate of this bug: 1891294

The finalizer sometimes runs in parallel with other commands, which leads to a failure.
I think addfinalizer is not guaranteed to use the same event loop as regular async fixtures.
Using yield instead should still schedule the actions.release command at the end of the test, but run it in the same event loop.
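The reasoning above relies on how yield-style async fixtures are driven: the code before yield runs as setup, then the test body runs, and the runner resumes the same generator for teardown, all inside one event loop. The sketch below mimics that with plain asyncio. run_with_fixture is a hypothetical stand-in for roughly what an async pytest plugin does, and the log entries stand in for the actions.release command. It only demonstrates the single-loop property; it does not show why addfinalizer might behave differently.

```python
import asyncio


async def release_actions_fixture(log):
    """Hypothetical async yield-style fixture: setup, yield, then teardown."""
    log.append(("setup", id(asyncio.get_running_loop())))
    yield
    # Teardown resumes inside the same driver coroutine, so it runs on the
    # same event loop as the test body (stand-in for actions.release).
    log.append(("teardown", id(asyncio.get_running_loop())))


async def run_with_fixture(fixture, test_body):
    """Hypothetical driver mimicking how a runner advances a yield fixture."""
    await fixture.__anext__()  # run setup up to `yield`
    try:
        await test_body()
    finally:
        try:
            await fixture.__anext__()  # resume after `yield`: teardown
        except StopAsyncIteration:
            pass  # generator finished normally after teardown


def main():
    log = []

    async def test_body():
        log.append(("test", id(asyncio.get_running_loop())))

    asyncio.run(run_with_fixture(release_actions_fixture(log), test_body))
    return log
```

All three phases record the same loop id, because a single asyncio.run call drives setup, test and teardown.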

Attachment #9398877 - Attachment is obsolete: true

The approach using only yield was still failing. I tried to add logs to get a better sense of which commands might be the issue here, but whenever I add them, I can't reproduce the failure.

However, waiting in the release_actions fixture seems to fix the intermittent failure. I'm not sure this is acceptable, since it's only needed to work around an issue specific to Mozilla's CI, but I'll still propose a patch for it.
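The patch itself isn't quoted in this bug, but its title ("Wait for 100ms in release_actions fixture") suggests a teardown that sleeps briefly before releasing actions. A minimal sketch under that assumption, using contextlib.contextmanager as a stand-in for a pytest yield fixture; FakeSession and FakeActions are hypothetical test doubles for the real session fixture and session.actions:

```python
import contextlib
import time

RELEASE_DELAY_S = 0.1  # the 100ms delay named in the patch title


class FakeActions:
    """Hypothetical stand-in for session.actions; records when release() ran."""

    def __init__(self):
        self.released_at = None

    def release(self):
        self.released_at = time.monotonic()


class FakeSession:
    """Hypothetical stand-in for the session fixture."""

    def __init__(self):
        self.actions = FakeActions()


@contextlib.contextmanager
def release_actions(session, delay=RELEASE_DELAY_S):
    """Sketch of a yield-style fixture that waits before releasing actions."""
    try:
        yield  # the test body runs here
    finally:
        # Teardown: give in-flight commands a moment to settle, then release.
        time.sleep(delay)
        session.actions.release()
```

The try/finally mirrors pytest running yield-fixture teardown even when the test fails. Note that the sleep only masks the race; it doesn't explain it.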

Flags: needinfo?(jdescottes)

This avoids relatively frequent failures on Mozilla's Windows CI.

From what I can tell, we always run tests on Linux for wpt.fyi, except for Edge. I looked at a few Edge runs with failures for this test suite, but I couldn't find any with the same stack trace.

Pushed by jdescottes@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/60bd84a99544 [wdspec] Wait for 100ms in release_actions fixture r=webdriver-reviewers,whimboo
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/46425 for changes under testing/web-platform/tests
Upstream PR was closed without merging
Upstream PR merged by moz-wptsync-bot

The test is no longer disabled for ASAN builds, but we still don't know the underlying reason for this race. We should still investigate it, even though the workaround ensures that the test doesn't fail.

It might be good to uplift this wait patch to beta. Julian, would you mind requesting it?

Flags: needinfo?(jdescottes)

Comment on attachment 9399930 [details]
Bug 1879556 - [wdspec] Wait for 100ms in release_actions fixture

Beta/Release Uplift Approval Request

  • User impact if declined: Fixes intermittent failures for wdspec tests on Windows
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Test-only change, delaying a test helper by 100ms on Windows
  • String changes made/needed:
  • Is Android affected?: No
Flags: needinfo?(jdescottes)
Attachment #9399930 - Flags: approval-mozilla-beta?

Comment on attachment 9399930 [details]
Bug 1879556 - [wdspec] Wait for 100ms in release_actions fixture

We build our last beta tomorrow. This is a long-standing issue that doesn't affect users, so let's have it ride the train. Thanks.

Attachment #9399930 - Flags: approval-mozilla-beta? → approval-mozilla-beta-