Created attachment 437050 patch v1 Workers should try to restart when they hit unrecoverable problems rather than just terminating. This will help automate the management of multiple workers.
Comment on attachment 437050 patch v1 This looks good. While thinking about what the worker would hit when it restarts, I started wondering about rogue processes left over from Firefox and its subprocesses (as we talked about today). Does it make sense for the worker to do something like ps -A | grep Firefox and then kill -9 anything it finds during the restart step? Or is that logic handled by the sisyphus system itself? r+, contingent on resolving that ^ issue.
Created attachment 438680 killTest patch I started working on a patch to do that. The previous killTest could accidentally kill the worker itself, and it only ran when there was an indication that something had gone wrong. If the worker was unaware of the stuck processes, they were never handled. This patch kills anything running from below the build directory, either before a build is performed or before a test runs, so I think it will cover the bases you outlined. It currently has the build directory hard-wired. I should be able to get the build directory another way, so it's not quite ready for prime time. It *appears* to be working in testing though. Another thing it could do better is flag the case where a process is still running after the test *should have* completed.
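To make the "kill anything below the build directory" idea concrete, here is a minimal sketch, not the code from the attached patch. The function names (list_processes, pids_under_directory, kill_pids) and the /work/builds path are hypothetical, and it assumes a POSIX system where `ps -A -o pid= -o args=` prints one "PID command line" pair per line. Excluding the worker's own PID is how this sketch avoids the "accidentally kill the worker" problem.

```python
import os
import signal
import subprocess


def list_processes():
    """Return the text of `ps -A -o pid= -o args=` (POSIX assumption)."""
    return subprocess.check_output(
        ["ps", "-A", "-o", "pid=", "-o", "args="], universal_newlines=True
    )


def pids_under_directory(ps_output, build_dir, exclude_pids=()):
    """Pick PIDs whose command line starts with a path below build_dir."""
    # The trailing separator keeps /work/builds from matching /work/builds-old.
    prefix = os.path.join(os.path.abspath(build_dir), "")
    matches = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        pid, command = int(parts[0]), parts[1]
        if pid in exclude_pids:
            continue  # never target the worker itself
        if command.startswith(prefix):
            matches.append(pid)
    return matches


def kill_pids(pids, sig=None):
    """Send sig (SIGKILL by default) to each PID, ignoring ones already gone."""
    if sig is None:
        sig = signal.SIGKILL
    for pid in pids:
        try:
            os.kill(pid, sig)
        except OSError:
            pass  # process already exited or we lack permission
```

A worker might call kill_pids(pids_under_directory(list_processes(), "/work/builds", exclude_pids=(os.getpid(),))) before each build or test. Matching on the start of the command line is a simplification: it misses processes launched through a relative path or a wrapper script.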
Comment on attachment 438680 killTest patch This looks like the right approach to me.
Created attachment 438994 killTest patch v2 Same as v1, with the addition of calling killTest when exiting or restarting the worker. This still has the hard-coded path, but since we are rethinking the build issues, let's go with it as is for now.
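The exit/restart behaviour described for v2 could be sketched roughly as follows. This is an illustration under assumptions, not the patch itself: stop_worker and its parameters are hypothetical names, the cleanup callable stands in for killTest, and the restart is modelled as re-executing the current Python script with os.execv.

```python
import os
import sys


def stop_worker(cleanup, restart=False, argv=None, exec_fn=os.execv):
    """Run cleanup, then either re-exec the worker or return so it can exit."""
    try:
        cleanup()  # e.g. kill leftover processes below the build directory
    except Exception as exc:
        # A failed cleanup should not prevent the restart or exit.
        sys.stderr.write("cleanup failed: %s\n" % exc)
    if restart:
        args = [sys.executable] + list(sys.argv if argv is None else argv)
        # With the default exec_fn this replaces the current process image
        # and does not return.
        exec_fn(sys.executable, args)
```

Passing exec_fn as a parameter is only there so the restart path can be exercised in a test without actually replacing the process.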