Closed Bug 1354750 Opened 7 years ago Closed 7 years ago

Intermittent TEST-UNEXPECTED-TIMEOUT | /webdriver/actions/key.py | expected OK

Categories

(Testing :: web-platform-tests, defect)

Version 3
defect
Not set
normal

Tracking

(firefox55 fixed, firefox56 fixed)

RESOLVED FIXED
mozilla56

People

(Reporter: intermittent-bug-filer, Assigned: jgraham)

References

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell fixed:other])

Attachments

(1 file)

:jgraham - can you look at this, given that it is failing frequently every day?
Flags: needinfo?(james)
Whiteboard: [stockwell needswork]
It looks like when this fails the test file may not be loading at all; otherwise I would expect some intermediate output. Also, increasing the timeout doesn't help. If that's true, disabling it *might* just make the following test intermittent in the same way.
Something bad happens in pytest startup, so we never start running tests. I'm pretty sure disabling won't have any impact here except to change which test is unstable. I'm doing an excessive number of try runs to figure out what's up.
Flags: needinfo?(james)
Flags: needinfo?(james)
So there's good news and bad news. The good news is I think I know what the problem is. The bad news is what the problem is.

So I think that what's happening here is a nasty interaction between multiprocessing and threading. On Linux, multiprocessing calls fork(), which provides the child process with a copy (well, copy-on-write) of the parent process's memory. That usually works fine. However, if a mutex in the parent process happens to be locked at the time of the fork and you attempt to acquire that same mutex in the child process, it will always appear locked there: even if it is later released in the parent, that change is never reflected in the child's copy. In our case we see an intermittent deadlock when trying to set up the structured logger from pytest, despite the fact that there's only a single thread in that process. Keying the lock itself on pid makes the problem go away, but that isn't a very general solution, because there could be other bits of code that will fail in the same way at some later time.
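
For anyone who wants to see the mechanism in isolation, here is a minimal sketch, assuming a POSIX system where os.fork() is available and Python 3 (for the timeout argument to acquire()). It is not the mozlog or wptrunner code; PidKeyedLock and child_can_acquire are hypothetical names made up for illustration. The first function shows a child inheriting a lock that was held at fork time, and the class shows roughly what "keying the lock on pid" could look like.

```python
import os
import threading


class PidKeyedLock:
    """Hypothetical helper: hands each process id its own lock."""

    def __init__(self):
        self._locks = {}

    def _current(self):
        # A forked child has a new pid, so it never reuses the parent's
        # (possibly locked) entry and gets a fresh lock instead.
        return self._locks.setdefault(os.getpid(), threading.Lock())

    def acquire(self, timeout=-1):
        return self._current().acquire(timeout=timeout)

    def release(self):
        self._current().release()


def child_can_acquire(lock, hold_during_fork, timeout=0.2):
    """Fork, then report whether the child managed to acquire `lock`."""
    if hold_during_fork:
        lock.acquire()
    pid = os.fork()
    if pid == 0:
        # Child: sees the lock state as it was at the moment of fork().
        exit_code = 1
        try:
            if lock.acquire(timeout=timeout):
                exit_code = 0
        finally:
            # Leave immediately without running the parent's cleanup code.
            os._exit(exit_code)
    if hold_during_fork:
        # Releasing here only changes the parent's copy of the lock.
        lock.release()
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    print(child_can_acquire(threading.Lock(), hold_during_fork=False))
    print(child_can_acquire(threading.Lock(), hold_during_fork=True))
    print(child_can_acquire(PidKeyedLock(), hold_during_fork=True))
```

The sketch uses a timeout so the child gives up instead of hanging; with a plain blocking acquire() the second case would deadlock, which is the shape of the failure described above. The pid-keyed variant only protects the one lock it wraps, so any other lock held at fork time can still hit the same problem.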

I'm not sure what the best solution here is other than "don't use threading and multiprocessing together" (needless to say I didn't know about this issue when the code was written). Possibly landing some hack in the short term is a good idea.
I am glad there is some news about this. Do you think we can get a hack in this week, or should we disable this test (or the wdspec job) until we can get a hack landed?
Disabling this test won't help at all. Disabling the entire suite would be bad for other reasons. You would have to make the job Tier < 1. But I have an idea for a better solution that should be low effort (basically there's a backported version of the Python 3 multiprocessing module that allows you to use multiple processes in a way that doesn't hilariously violate POSIX semantics), so I'll try that.
Comment on attachment 8880151 [details]
Bug 1354750 - Disable loading mozlog plugin with pytest for wpt,

https://reviewboard.mozilla.org/r/151526/#review156614
Attachment #8880151 - Flags: review?(ato) → review+
Pushed by james@hoppipolla.co.uk:
https://hg.mozilla.org/integration/autoland/rev/a7e6d0b5fbdd
Disable loading mozlog plugin with pytest for wpt, r=ato
Flags: needinfo?(james)
https://hg.mozilla.org/mozilla-central/rev/a7e6d0b5fbdd
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Should we uplift this to beta?
Whiteboard: [stockwell needswork] → [stockwell fixed:other]
Assignee: nobody → james