Closed Bug 1585494 Opened 5 months ago Closed 5 months ago

Adjust xpcshell directory removal retries

Categories

(Testing :: XPCShell Harness, enhancement)

Version 3
enhancement
Not set

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: gbrown, Assigned: gbrown)

References

Details

On Windows (both 7 and 10), the xpcshell harness intermittently experiences trouble removing temporary directories (especially the plugins directory). Most often the OS error indicates that a file is still in use. There is retry logic to allow for that: up to 25 attempts are made at 1 second intervals; if retries are exhausted, one last attempt is made at the end of the harness run.

In bug 1584692, some Windows xpcshell tasks are timing out after 60 minutes, and that's odd because most such tasks complete in about 30 minutes. It has been noted that several groups of tests in the problematic chunk are long-running and multiple test timeouts often occur in those long-running tests in the tasks that time out.

Another apparent characteristic of the long-running tasks is that test logs show extra retries for plugin directory removal. Almost all Windows xpcshell test logs show some removal errors and retries, but the problematic runs seem to have significantly more retries. Retries "waste" time in that they run on test threads: For the 25 seconds that a thread is trying to remove a directory, the next queued test is stalled. I also wonder if retrying is sometimes causing some sort of cascading failure: while multiple threads are trying to remove directories, perhaps the filesystem is over-loaded and other file operations are stalled, stalling more tests, causing more retries...??

Would we see more reliable run times with less retries?

Example:

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=269278863&repo=autoland&lineNumber=6462

[task 2019-10-01T19:15:30.634Z] 19:15:30     INFO -  rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-emq4ns',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:30.821Z] 19:15:30     INFO -  rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-w1ckdc',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:31.188Z] 19:15:31     INFO -  rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-xuelsw',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:31.323Z] 19:15:31     INFO -  rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-w1ckdc',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:32.318Z] 19:15:32     INFO -  rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-w1ckdc',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:32.693Z] 19:15:32     INFO -  rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-xuelsw',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:33.132Z] 19:15:33     INFO -  Failed to remove directory: c:\users\task_1569953948\appdata\local\temp\xpc-plugins-emq4ns. Waiting.
[task 2019-10-01T19:15:33.819Z] 19:15:33     INFO -  rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-w1ckdc',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:34.134Z] 19:15:34     INFO -  rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-emq4ns',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:34.634Z] 19:15:34     INFO -  rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-emq4ns',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:34.689Z] 19:15:34     INFO -  rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-xuelsw',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:35.636Z] 19:15:35     INFO -  rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-emq4ns',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:35.819Z] 19:15:35     INFO -  rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-w1ckdc',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:37.138Z] 19:15:37     INFO -  rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-emq4ns',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:37.193Z] 19:15:37     INFO -  Failed to remove directory: c:\users\task_1569953948\appdata\local\temp\xpc-plugins-xuelsw. Waiting.
[task 2019-10-01T19:15:38.193Z] 19:15:38     INFO -  rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-xuelsw',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:38.320Z] 19:15:38     INFO -  Failed to remove directory: c:\users\task_1569953948\appdata\local\temp\xpc-plugins-w1ckdc. Waiting.
[task 2019-10-01T19:15:38.695Z] 19:15:38     INFO -  rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-xuelsw',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:39.134Z] 19:15:39     INFO -  rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-emq4ns',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:39.322Z] 19:15:39     INFO -  rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-w1ckdc',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:39.697Z] 19:15:39     INFO -  rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-xuelsw',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:39.820Z] 19:15:39     INFO -  rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-w1ckdc',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:40.819Z] 19:15:40     INFO -  rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-w1ckdc',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:41.194Z] 19:15:41     INFO -  rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-xuelsw',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:41.637Z] 19:15:41     INFO -  Failed to remove directory: c:\users\task_1569953948\appdata\local\temp\xpc-plugins-emq4ns. Waiting.
[task 2019-10-01T19:15:42.320Z] 19:15:42     INFO -  rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-w1ckdc',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:42.639Z] 19:15:42     INFO -  rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-emq4ns',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:43.138Z] 19:15:43     INFO -  rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-emq4ns',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:43.193Z] 19:15:43     INFO -  rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-xuelsw',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:44.140Z] 19:15:44     INFO -  rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-emq4ns',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:44.319Z] 19:15:44     INFO -  rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-w1ckdc',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:45.638Z] 19:15:45     INFO -  rmtree() failed for "('c:\\users\\task_1569953948\\appdata\\local\\temp\\xpc-plugins-emq4ns',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
[task 2019-10-01T19:15:45.693Z] 19:15:45     INFO -  Failed to remove directory: c:\users\task_1569953948\appdata\local\temp\xpc-plugins-xuelsw. Waiting.

(In reply to Geoff Brown [:gbrown] from comment #0)

For the 25 seconds that a thread is trying to remove a directory, the next queued test is stalled.

Actually, it is worse than that. Each of the 25 retries is a call to mozfile.remove, which has its own retries and delays: 5 retries with an increasing delay of N*0.5 seconds. So up to 25 * (1+.5+1+1.5+2+2.5) = 212.5 seconds.

Current and reduced retries:
https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=e620f3102cc73155c53ca4e70ebc021d09217164
https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=d182d7ec77e900ac10ad6b39a9fe7d57fd57937d
https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=bb9c5bd874272aeda0e9329892b96fa6b5da97cf

Reducing retries reduces the time "wasted" waiting to delete the plugins directory, but it doesn't seem to make a practical difference to task outcome. (On the other hand, there's no apparent benefit to all these retries.)

I'll just increase the max-run-time again instead.

Status: NEW → RESOLVED
Closed: 5 months ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.