switch from timeout command to subprocess.run timeout argument
Categories
(Socorro :: Processor, enhancement, P2)
Tracking
(Not tracked)
People
(Reporter: willkg, Assigned: willkg)
References
Details
Attachments
(1 file)
The Socorro processor was written a million years ago and it's possible that whatever was running the stackwalker as a sub process didn't have a timeout or it was flaky. Because of that, we run the stackwalker using the timeout gnu coreutils command to prevent it from running forever.
Nowadays, we're using subprocess.run which has a timeout argument. Under the hood, it uses threads to keep track of the sub process and timeout. Contrast that with the timeout gnu coreutils command which uses a child process.
Roughly what we have now:
import subprocess
subprocess.run(["timeout", "--signal", "KILL", "100", "sleep", "100"], capture_output=True)
$ python subprocesstest.py &
[1] 253021
$ ps -f
UID PID PPID C STIME TTY TIME CMD
willkg 244711 6089 0 17:36 pts/3 00:00:00 /bin/bash
willkg 253021 244711 0 17:43 pts/3 00:00:00 /home/willkg/.pyenv/versions/3.10.12/bin/python subprocesstest.py
willkg 253149 253021 0 17:43 pts/3 00:00:00 timeout --signal KILL 100 sleep 100
willkg 253151 253149 0 17:43 pts/3 00:00:00 sleep 100
willkg 253255 244711 99 17:43 pts/3 00:00:00 ps -f
Wildly, if you kill the python subprocesstest.py process, the sleep 100 process continues:
$ fg
python subprocesstest.py
^C
Traceback (most recent call last):
File "/home/willkg/mozilla/socorro/subprocesstest.py", line 3, in <module>
subprocess.run(["timeout", "--signal", "KILL", "100", "sleep", "100"], capture_output=True)
File "/home/willkg/.pyenv/versions/3.10.12/lib/python3.10/subprocess.py", line 505, in run
stdout, stderr = process.communicate(input, timeout=timeout)
File "/home/willkg/.pyenv/versions/3.10.12/lib/python3.10/subprocess.py", line 1154, in communicate
stdout, stderr = self._communicate(input, endtime, timeout)
File "/home/willkg/.pyenv/versions/3.10.12/lib/python3.10/subprocess.py", line 2021, in _communicate
ready = selector.select(timeout)
File "/home/willkg/.pyenv/versions/3.10.12/lib/python3.10/selectors.py", line 416, in select
fd_event_list = self._selector.poll(timeout)
KeyboardInterrupt
$ ps -f
UID PID PPID C STIME TTY TIME CMD
willkg 244711 6089 0 17:36 pts/3 00:00:00 /bin/bash
willkg 253151 3125 0 17:43 pts/3 00:00:00 sleep 100
willkg 253765 244711 0 17:44 pts/3 00:00:00 ps -f
What I want to switch to:
import subprocess
subprocess.run(["sleep", "100"], timeout=100, capture_output=True)
$ python subprocesstest.py &
[1] 255047
$ ps -f
UID PID PPID C STIME TTY TIME CMD
willkg 244711 6089 0 17:36 pts/3 00:00:00 /bin/bash
willkg 255047 244711 1 17:46 pts/3 00:00:00 /home/willkg/.pyenv/versions/3.10.12/bin/python subprocesstest.py
willkg 255198 255047 0 17:46 pts/3 00:00:00 sleep 100
willkg 255275 244711 0 17:46 pts/3 00:00:00 ps -f
If we kill python subprocesstest.py in this case, we don't end up with orphaned processes:
$ fg
python subprocesstest.py
^C
Traceback (most recent call last):
File "/home/willkg/mozilla/socorro/subprocesstest.py", line 3, in <module>
subprocess.run(["sleep", "100"], timeout=100, capture_output=True)
File "/home/willkg/.pyenv/versions/3.10.12/lib/python3.10/subprocess.py", line 505, in run
stdout, stderr = process.communicate(input, timeout=timeout)
File "/home/willkg/.pyenv/versions/3.10.12/lib/python3.10/subprocess.py", line 1154, in communicate
stdout, stderr = self._communicate(input, endtime, timeout)
File "/home/willkg/.pyenv/versions/3.10.12/lib/python3.10/subprocess.py", line 2021, in _communicate
ready = selector.select(timeout)
File "/home/willkg/.pyenv/versions/3.10.12/lib/python3.10/selectors.py", line 416, in select
fd_event_list = self._selector.poll(timeout)
KeyboardInterrupt
$ ps -f
UID PID PPID C STIME TTY TIME CMD
willkg 244711 6089 0 17:36 pts/3 00:00:00 /bin/bash
willkg 255597 244711 99 17:46 pts/3 00:00:00 ps -f
This bug covers switching from timeout gnu coreutils command to using subprocess.run with a timeout.
| Assignee | ||
Comment 1•2 years ago
|
||
| Assignee | ||
Comment 2•2 years ago
|
||
| Assignee | ||
Comment 3•2 years ago
|
||
I deployed this to prod just now in bug #1851648. Marking as FIXED.
Description
•