The temporary folder might be deleted before we attempt to generate a minidump
Categories: Testing :: XPCShell Harness, defect, P2
Tracking: Not tracked
People: Reporter: gsvelto; Assigned: gsvelto
References: Blocks 1 open bug
Attachments: 1 file
While investigating bug 1998231 I stumbled upon this error. The sequence of events appears to be the following:
- The harness detects a test timing out
- It decides to kill the process and grab a minidump
- The test process exits cleanly and the harness removes the temporary directory
- We attempt to write a minidump but fail because the folder where we want to write it is gone
I haven't had time to verify whether this sequence of events is actually possible or whether I'm misreading the log, but it seems plausible. For some reason this only happens on 32-bit Windows, but that may simply be the platform where the race window is wide enough: there we launch a separate process to write the minidump instead of having the harness write it itself.
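To make the suspected interleaving concrete, here is a minimal sketch of the two competing harness paths. The function names and the command-line arguments are hypothetical (only minidumpwriter.exe itself is taken from this report); the point is just that the timeout path spawns an external writer while the completion path deletes the directory the writer needs.

```python
# Hypothetical sketch of the two competing paths; names and arguments
# are invented for illustration, only minidumpwriter.exe is real.
import shutil
import subprocess

MINIDUMP_WRITER = "minidumpwriter.exe"


def on_timeout(pid, tmp_dir):
    # Timeout path: the test timed out, so an external tool is asked to
    # write a minidump into the test's temporary directory.  On 32-bit
    # Windows this is a separate process, so there is a window before
    # it actually opens tmp_dir.  (Illustrative arguments, not the real
    # command line.)
    subprocess.run([MINIDUMP_WRITER, str(pid), tmp_dir])


def on_test_finished(tmp_dir):
    # Completion path: the test exited cleanly, remove its temporary
    # directory.
    shutil.rmtree(tmp_dir)

# If the test process exits cleanly right after the harness decides to
# kill it, on_test_finished() can run before the external writer opens
# tmp_dir, and the minidump write fails because the directory is gone.
```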
Comment 1•1 month ago
The severity field is not set for this bug.
:jmaher, could you have a look please?
For more information, please visit BugBot documentation.
Updated•1 month ago
Comment 2•1 month ago
I think I understand what's going on. When we launch an xpcshell test we arm a timer to kill the test process on a timeout; the function is here. The code around this function assumes it runs without interruption and, crucially, without letting the Python event loop spin. That holds on 64-bit Windows, but on 32-bit Windows mozcrash spawns another process and waits for it. My modest knowledge of Python tells me that this lets the event loop spin during the wait, so if the test has finished executing, execution resumes here, which deletes the temporary directory and ultimately causes the failure in minidumpwriter.exe (the directory where it expects to write the minidump is gone). Now I have to figure out a way to prevent this race.
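The interleaving can be reproduced with a small, self-contained toy (structure and names are invented; a Python sleep stands in for minidumpwriter.exe, and a thread stands in for whatever mechanism lets the harness resume while the external writer runs):

```python
# Toy reproduction of the race: blocking on an external process in the
# "timeout" path gives the "test finished" cleanup a chance to delete
# the dump directory first.
import os
import shutil
import subprocess
import sys
import tempfile
import threading
import time

dump_dir = tempfile.mkdtemp()


def timeout_path():
    # Stand-in for spawning the external minidump writer and waiting
    # for it, which is where 32-bit Windows differs from 64-bit.
    subprocess.run([sys.executable, "-c", "import time; time.sleep(1)"])
    # By the time we try to use dump_dir, cleanup may have removed it.
    try:
        open(os.path.join(dump_dir, "test.dmp"), "w").close()
    except FileNotFoundError:
        print("dump directory was already deleted -> the reported failure")


def test_finished_path():
    # Normal completion path: remove the temporary directory.
    shutil.rmtree(dump_dir)


t = threading.Thread(target=timeout_path)
t.start()
time.sleep(0.2)        # let the timeout path start blocking on the child
test_finished_path()   # cleanup wins the race
t.join()
```

Run as-is this prints the "directory was already deleted" message, which matches the failure mode described above.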
Comment 3•1 month ago
Comment 4•1 month ago
I managed to reproduce this locally and can confirm that my theory in comment 2 is correct; now to find a way to prevent this race.
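Purely as an illustration of the kind of fix that can close such a race (the actual patch is the attached change and is not reproduced here), the cleanup path could be made to wait for any in-progress minidump write, for example with a shared lock; the helper names below are hypothetical.

```python
# Generic mitigation sketch, not necessarily what the actual patch
# does: serialize "write the minidump" and "delete the temp dir" so
# cleanup cannot run while a dump is being written.
import shutil
import threading

_dump_lock = threading.Lock()


def write_minidump_on_timeout(write_dump, tmp_dir):
    # `write_dump` is a hypothetical callable that spawns the external
    # writer and waits for it.
    with _dump_lock:
        write_dump(tmp_dir)


def cleanup_on_test_finished(tmp_dir):
    # Taking the same lock means cleanup waits for any in-progress dump
    # instead of pulling the directory out from under the writer.
    with _dump_lock:
        shutil.rmtree(tmp_dir, ignore_errors=True)
```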
Updated•1 month ago
Comment 5•1 month ago
Let's see if this fixes the problem. Try run: https://treeherder.mozilla.org/jobs?repo=try&revision=e9f20133b998ad82e24dad89873fe6431a3cc059
Comment 6•1 month ago
Looking good; time for review.
Updated•1 month ago