Adjust xpcshell directory removal retries
Categories
(Testing :: XPCShell Harness, enhancement)
Tracking
(Not tracked)
People
(Reporter: gbrown, Assigned: gbrown)
References
Details
On Windows (both 7 and 10), the xpcshell harness intermittently experiences trouble removing temporary directories (especially the plugins directory). Most often the OS error indicates that a file is still in use. There is retry logic to allow for that: up to 25 attempts are made at 1 second intervals; if retries are exhausted, one last attempt is made at the end of the harness run.
In bug 1584692, some Windows xpcshell tasks are timing out after 60 minutes, and that's odd because most such tasks complete in about 30 minutes. It has been noted that several groups of tests in the problematic chunk are long-running and multiple test timeouts often occur in those long-running tests in the tasks that time out.
Another apparent characteristic of the long-running tasks is that test logs show extra retries for plugin directory removal. Almost all Windows xpcshell test logs show some removal errors and retries, but the problematic runs seem to have significantly more retries. Retries "waste" time in that they run on test threads: For the 25 seconds that a thread is trying to remove a directory, the next queued test is stalled. I also wonder if retrying is sometimes causing some sort of cascading failure: while multiple threads are trying to remove directories, perhaps the filesystem is over-loaded and other file operations are stalled, stalling more tests, causing more retries...??
Would we see more reliable run times with less retries?
Assignee | ||
Comment 1•5 years ago
|
||
Example:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=269278863&repo=autoland&lineNumber=6462
[task 2019-10-01T19:15:30.634Z] 19:15:30 INFO - rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-emq4ns',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:30.821Z] 19:15:30 INFO - rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-w1ckdc',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:31.188Z] 19:15:31 INFO - rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-xuelsw',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:31.323Z] 19:15:31 INFO - rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-w1ckdc',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:32.318Z] 19:15:32 INFO - rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-w1ckdc',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:32.693Z] 19:15:32 INFO - rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-xuelsw',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:33.132Z] 19:15:33 INFO - Failed to remove directory: c:\users\task_1569953948\appdata\local\temp\xpc-plugins-emq4ns. Waiting.
[task 2019-10-01T19:15:33.819Z] 19:15:33 INFO - rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-w1ckdc',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:34.134Z] 19:15:34 INFO - rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-emq4ns',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:34.634Z] 19:15:34 INFO - rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-emq4ns',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:34.689Z] 19:15:34 INFO - rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-xuelsw',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:35.636Z] 19:15:35 INFO - rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-emq4ns',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:35.819Z] 19:15:35 INFO - rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-w1ckdc',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:37.138Z] 19:15:37 INFO - rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-emq4ns',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:37.193Z] 19:15:37 INFO - Failed to remove directory: c:\users\task_1569953948\appdata\local\temp\xpc-plugins-xuelsw. Waiting.
[task 2019-10-01T19:15:38.193Z] 19:15:38 INFO - rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-xuelsw',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:38.320Z] 19:15:38 INFO - Failed to remove directory: c:\users\task_1569953948\appdata\local\temp\xpc-plugins-w1ckdc. Waiting.
[task 2019-10-01T19:15:38.695Z] 19:15:38 INFO - rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-xuelsw',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:39.134Z] 19:15:39 INFO - rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-emq4ns',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:39.322Z] 19:15:39 INFO - rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-w1ckdc',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:39.697Z] 19:15:39 INFO - rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-xuelsw',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:39.820Z] 19:15:39 INFO - rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-w1ckdc',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:40.819Z] 19:15:40 INFO - rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-w1ckdc',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:41.194Z] 19:15:41 INFO - rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-xuelsw',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:41.637Z] 19:15:41 INFO - Failed to remove directory: c:\users\task_1569953948\appdata\local\temp\xpc-plugins-emq4ns. Waiting.
[task 2019-10-01T19:15:42.320Z] 19:15:42 INFO - rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-w1ckdc',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:42.639Z] 19:15:42 INFO - rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-emq4ns',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:43.138Z] 19:15:43 INFO - rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-emq4ns',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:43.193Z] 19:15:43 INFO - rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-xuelsw',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:44.140Z] 19:15:44 INFO - rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-emq4ns',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:44.319Z] 19:15:44 INFO - rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-w1ckdc',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:45.638Z] 19:15:45 INFO - rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-emq4ns',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:45.693Z] 19:15:45 INFO - Failed to remove directory: c:\users\task_1569953948\appdata\local\temp\xpc-plugins-xuelsw. Waiting.
Assignee | ||
Comment 2•5 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #0)
For the 25 seconds that a thread is trying to remove a directory, the next queued test is stalled.
Actually, it is worse than that. Each of the 25 retries is a call to mozfile.remove, which has its own retries and delays: 5 retries with an increasing delay of N*0.5 seconds. So up to 25 * (1+.5+1+1.5+2+2.5) = 212.5 seconds.
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 3•5 years ago
|
||
Current and reduced retries:
https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=e620f3102cc73155c53ca4e70ebc021d09217164
https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=d182d7ec77e900ac10ad6b39a9fe7d57fd57937d
https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=bb9c5bd874272aeda0e9329892b96fa6b5da97cf
Reducing retries reduces the time "wasted" waiting to delete the plugins directory, but it doesn't seem to make a practical difference to task outcome. (On the other hand, there's no apparent benefit to all these retries.)
Assignee | ||
Comment 4•5 years ago
|
||
I'll just increase the max-run-time again instead.
Description
•