:jgraham, can you look at this, given that it is failing often, every day?
Whiteboard: [stockwell needswork]
It looks like maybe when this fails the test file isn't loading at all? Otherwise I would expect some intermediate output. Also, increasing the timeout doesn't help. If that's true, disabling it *might* just make the following test intermittent in the same way.
Something bad happens in pytest startup so we never start running tests. I'm pretty sure disabling won't have any impact here except to change the test that's unstable. I'm doing an excessive number of try runs to figure out what's up.
So there's good news and bad news. The good news is I think I know what the problem is. The bad news is what the problem is. I think that what's happening here is a nasty interaction between multiprocessing and threading. On Linux, multiprocessing calls fork(), which provides the child process with a copy (well, copy-on-write) of the parent process memory. That usually works fine. However, if a mutex in the parent process happens to be locked at the time of the fork, and you attempt to acquire that same mutex in the child process, it will always appear locked, because even if it is released in the parent, that change is never reflected in the child's copy. In our case we see an intermittent deadlock when trying to set up the structured logger from pytest, despite the fact that there's only a single thread in that process. Keying the lock itself on pid makes the problem go away, but isn't a very general solution because there could be other bits of code that will fail in the same way at some later time. I'm not sure what the best solution here is other than "don't use threading and multiprocessing together" (needless to say, I didn't know about this issue when the code was written). Possibly landing some hack in the short term is a good idea.
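For anyone who wants to see the failure mode in isolation, here is a minimal sketch in plain Python 3 on a Unix system with os.fork(). It is not the mozlog or wptrunner code; the names child_can_acquire, shared_lock and pid_keyed_lock are made up for illustration, and pid_keyed_lock is only one guess at what "keying the lock on pid" could look like. A helper thread holds the lock while the main thread forks, then the child tries to take the same lock with a timeout.

```python
import os
import threading
import warnings


def child_can_acquire(get_lock, timeout=0.5):
    """Fork while another thread holds get_lock(); return True if the
    child process manages to acquire the lock within `timeout` seconds."""
    held = threading.Event()
    release = threading.Event()

    def holder():
        with get_lock():
            held.set()
            release.wait()

    thread = threading.Thread(target=holder)
    thread.start()
    held.wait()  # the lock is now definitely held in the parent

    with warnings.catch_warnings():
        # Python 3.12+ emits a DeprecationWarning when a multi-threaded
        # process forks; silence it for this demonstration.
        warnings.simplefilter("ignore", DeprecationWarning)
        pid = os.fork()
    if pid == 0:
        # Child: only the forking thread was copied, so the holder thread
        # does not exist here and can never release the copied lock.
        acquired = get_lock().acquire(timeout=timeout)
        os._exit(0 if acquired else 1)

    release.set()  # releasing in the parent does not change the child's copy
    thread.join()
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0


# A single module-level lock, inherited in its locked state by the child.
shared_lock = threading.Lock()

# Hypothetical workaround: one lock per process id, so a forked child
# creates a fresh, unlocked lock instead of reusing the parent's copy.
_locks_by_pid = {}


def pid_keyed_lock():
    return _locks_by_pid.setdefault(os.getpid(), threading.Lock())


if __name__ == "__main__":
    print("shared lock, child acquired:", child_can_acquire(lambda: shared_lock))
    print("pid-keyed lock, child acquired:", child_can_acquire(pid_keyed_lock))
```

With the shared lock the child should time out, and with the pid-keyed lock it should succeed. As noted above, this only protects the one lock that is keyed this way; any other lock held across the fork can still hang the child.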
I am glad there is some news about this. Do you think we can get a hack in this week, or should we disable this test (or the wdspec job) until we can get a hack landed?
Disabling this test won't help at all. Disabling the entire suite would be bad for other reasons. You would have to make the job Tier < 1. But I have an idea for a better solution that should be low effort (basically there's a backported version of the Python 3 multiprocessing module that allows you to use multiple processes in a way that doesn't hilariously violate POSIX semantics), so I'll try that.
Comment on attachment 8880151 [details] Bug 1354750 - Disable loading mozlog plugin with pytest for wpt, https://reviewboard.mozilla.org/r/151526/#review156614
Attachment #8880151 - Flags: review?(ato) → review+
Pushed by email@example.com: https://hg.mozilla.org/integration/autoland/rev/a7e6d0b5fbdd Disable loading mozlog plugin with pytest for wpt, r=ato
Status: NEW → RESOLVED
Last Resolved: 2 years ago
status-firefox56: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
should we uplift this to beta?
Whiteboard: [stockwell needswork] → [stockwell fixed:other]
status-firefox55: --- → fixed