Closed Bug 1527799 Opened 5 years ago Closed 5 years ago

Processes still running after task on Windows

Categories

(Taskcluster :: General, defect)

Type: defect, Priority: Not set, Severity: normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: glandium, Unassigned)


As can be seen at the end of: https://queue.taskcluster.net/v1/task/G0j-vl9PS861uEMQ7u-6Dg/runs/0/artifacts/public/logs/live_backing.log

[taskcluster:error] [mounts] Could not unmount <nil> due to: 'Could not persist cache "level-3-checkouts" due to remove Z:\task_1550107918\build\src\vs2017_15.4.2\VC\bin\Hostx64\x64\msvcp140.dll: Access is denied.'
[taskcluster 2019-02-14T02:24:22.928Z] Uploading redirect artifact public/logs/live.log to URL https://queue.taskcluster.net/v1/task/G0j-vl9PS861uEMQ7u-6Dg/runs/1/artifacts/public/logs/live_backing.log with mime type "text/plain; charset=utf-8" and expiry 2020-02-14T00:40:39.774Z
[taskcluster:error] Could not persist cache "level-3-checkouts" due to remove Z:\task_1550107918\build\src\vs2017_15.4.2\VC\bin\Hostx64\x64\msvcp140.dll: Access is denied.

The first line suggests some process using the msvcp140.dll file is still running, which shouldn't be happening: all processes should be killed at the end of a task.

Flags: needinfo?(pmoore)

List of processes still running at the end of the failing task:
https://taskcluster-artifacts.net/ZFZlawJ5QxSovzeUV3ALFw/0/public/logs/live_backing.log

Z:\task_1550112176>wmic process get description,executablepath 
Description          ExecutablePath                                                           

System Idle Process                                                                           
System                                                                                        
smss.exe                                                                                      
csrss.exe                                                                                     
wininit.exe                                                                                   
csrss.exe                                                                                     
winlogon.exe                                                                                  
services.exe                                                                                  
lsass.exe                                                                                     
svchost.exe                                                                                   
svchost.exe                                                                                   
dwm.exe                                                                                       
svchost.exe                                                                                   
svchost.exe                                                                                   
svchost.exe                                                                                   
svchost.exe                                                                                   
svchost.exe                                                                                   
spoolsv.exe                                                                                   
LiteAgent.exe                                                                                 
svchost.exe                                                                                   
dirmngr.exe                                                                                   
nssm.exe                                                                                      
IpOverUsbSvc.exe                                                                              
cmd.exe                                                                                       
conhost.exe                                                                                   
nxlog.exe                                                                                     
svchost.exe                                                                                   
Ec2Config.exe                                                                                 
WmiPrvSE.exe                                                                                  
taskhostex.exe       C:\Windows\system32\taskhostex.exe                                       
explorer.exe         C:\Windows\Explorer.EXE                                                  
WmiPrvSE.exe                                                                                  
svchost.exe                                                                                   
svchost.exe                                                                                   
msdtc.exe                                                                                     
generic-worker.exe                                                                            
livelog.exe                                                                                   
vctip.exe            z:\task_1550112176\build\src\vs2017_15.4.2\VC\bin\Hostx64\x64\VCTIP.EXE  
cmd.exe              C:\Windows\system32\cmd.exe                                              
conhost.exe          C:\Windows\system32\conhost.exe                                          
WMIC.exe             C:\Windows\System32\Wbem\WMIC.exe                                        

Clearly, the culprit is vctip, which is presumably left over from running VC++. The job could avoid leaving it running, or avoid running it at all, but it is also clearly a problem that the worker process doesn't clean up running processes before unmounting caches.
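For anyone wanting to spot such leftovers automatically, here is a minimal Python sketch that scans `wmic process get description,executablepath` output for processes whose executable lives under the task directory. The function name `processes_under` and the sample data are illustrative only, and the fixed-width column parsing is an assumption based on the listing above. The comparison is case-insensitive because the listing reports the drive as `z:` while the prompt shows `Z:`.

```python
def processes_under(wmic_output, task_dir):
    """Return (description, executable_path) pairs for processes whose
    executable path lies under task_dir (compared case-insensitively)."""
    lines = [line for line in wmic_output.splitlines() if line.strip()]
    # Assumption: columns are fixed-width and the path column starts
    # where the "ExecutablePath" header starts.
    offset = lines[0].index("ExecutablePath")
    prefix = task_dir.rstrip("\\").lower() + "\\"
    found = []
    for line in lines[1:]:
        description = line[:offset].strip()
        path = line[offset:].strip()
        if path.lower().startswith(prefix):
            found.append((description, path))
    return found


# Abbreviated sample modelled on the listing above.
SAMPLE = "\n".join([
    f"{'Description':<21}ExecutablePath",
    "",
    "System Idle Process",
    "generic-worker.exe",
    f"{'vctip.exe':<21}" + r"z:\task_1550112176\build\src\vs2017_15.4.2\VC\bin\Hostx64\x64\VCTIP.EXE",
    f"{'cmd.exe':<21}" + r"C:\Windows\system32\cmd.exe",
])

if __name__ == "__main__":
    for description, path in processes_under(SAMPLE, r"Z:\task_1550112176"):
        print(description, path)
```

Run against the full listing, this would flag only vctip.exe, since the other processes with a reported path live under C:\Windows.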

It is by design that if a cache cannot be unmounted, it is not persisted.

The problem with killing processes first in order to unmount a cache is this: if a process is still running after the task has completed and holds an open handle to a file inside the cache, we can't be sure the cache is in a clean state. If we kill a process that is writing to the cache, we could leave the cache corrupt. We therefore only unmount caches if the task completed successfully and no running process holds an open handle to a file inside the cache.

It is the responsibility of the task to ensure that no locks are held on caches when the task completes. If killing vctip is preferred, the task should handle this explicitly, since the worker cannot know whether killing a particular process will leave a cache in a bad state. In contrast, the task understands what the processes are and whether it is safe to kill them.
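As a rough illustration of handling this in the task itself, a final build step could force-terminate known leftovers with the Windows `taskkill` command before the task exits. This is only a sketch: the helper names `taskkill_command` and `kill_leftovers` are hypothetical, and whether it is safe to kill vctip.exe is a judgement the task author has to make.

```python
import subprocess


def taskkill_command(image_name):
    """Build a taskkill invocation that force-terminates (/F) every
    process with the given image name (/IM)."""
    return ["taskkill", "/F", "/IM", image_name]


def kill_leftovers(image_names, run=subprocess.run):
    """Run taskkill once per image name and return the exit codes.

    `run` is injectable so the logic can be exercised off Windows.
    A non-zero exit code is recorded rather than raised, since the
    process may simply not be running.
    """
    exit_codes = {}
    for name in image_names:
        completed = run(taskkill_command(name), capture_output=True, text=True)
        exit_codes[name] = completed.returncode
    return exit_codes


if __name__ == "__main__":
    # Assumption: vctip.exe is the only leftover, as in the listing above.
    print(kill_leftovers(["vctip.exe"]))
```

Calling this as the last step of the task would release the handles before the worker tries to persist the cache.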

Note workers are rebooted between tasks, so no zombie processes persist across task boundaries.

Flags: needinfo?(pmoore)

> It is by design that if a cache cannot be unmounted, it is not persisted.

Fine, but why does it have to make the task fail?

And after bug 1527798:

[taskcluster:error] [mounts] Could not unmount <nil> due to: 'Could not persist cache "level-3-checkouts" due to remove Z:\task_1550178085\build\src\vs2017_15.8.4\VC\bin\Hostx64\x64\mspdbcore.dll: Access is denied.'
[taskcluster:error] Could not persist cache "level-3-checkouts" due to remove Z:\task_1550178085\build\src\vs2017_15.8.4\VC\bin\Hostx64\x64\mspdbcore.dll: Access is denied.

(In reply to Mike Hommey [:glandium] from comment #3)

> > It is by design that if a cache cannot be unmounted, it is not persisted.
>
> Fine, but why does it have to make the task fail?

The task declares a cache to be persisted after the task completes, but once the task has completed there are still open file handles in the cache, so the cache cannot be released. This is a good enough reason to mark the task as failed. The task should release its open file handles before completing. Not doing so either means the worker can't release the cache, or forces the worker to aggressively kill processes, which could leave the cache in a compromised state. This isn't something we would want from a successful task. The open file handles indicate that the task did not complete successfully, as resources were not released.

By making it the responsibility of the task to release file handles, rather than the worker, the worker does not make assumptions about which processes can or cannot be safely killed, and which will interfere with the cache. Having a successful task, but not persisting the cache, would also be strange/misleading behaviour.

I guess we can agree this is wontfix. Still painful to discover that changes that affect tasks don't trigger said tasks.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX