Closed Bug 814496 Opened 9 years ago Closed 9 years ago
SUT: long-running exec'd process corrupts response to future commands
If devicemanagerSUT.launchProcess is used to exec a process via StartPrg2, and that process runs for more than 300 s, the output from that process may be returned at a later time and confused with the response data for another command. (There may be similar issues with other types of exec's and other ways of launching via dm -- this is just a description of the particular scenario that is giving me trouble today!)
Observed while running robocop locally on a tegra, while investigating bug 805969/813858... On one run, testBrowserProvider executed for longer than normal. devicemanagerSUT.launchProcess had been called which eventually called _sendCmds, which was waiting for the sut prompt response, for up to 300 s: https://hg.mozilla.org/mozilla-central/file/3b71d63eafd5/testing/mozbase/mozdevice/mozdevice/devicemanagerSUT.py#l207 On the sutAgent side, StartPrg2 exec'd "am instrument ...", created a RedirOutputThread, and waited for the thread for up to 300 s (30x10): https://hg.mozilla.org/mozilla-central/file/3b71d63eafd5/build/mobile/sutagent/android/DoCommand.java#l3729 StartPrg2 timed out after 300 s, and returned. dmSUT._sendCmds received the $> prompt without most of the test output and without the normal returnCode= marker. launchProcess returned and dmSUT began monitoring for process completion using the "ps" command periodically. After some time, a "ps" command received the test output concatenated with the normal ps output. I suppose the RedirOutputThread for the exec continues to run after the time-out and still has a valid stream to the socket...as it receives output, it keeps sending it to the socket, even though the listener has now moved on to issue other commands.
With logging and an artificially long-running test, I was able to confirm: If an exec'd process runs for longer than the 30x10=300 second timeout, the RedirOutputThread continues to run and redirect output, which is just mixed in with the responses to whatever other command happens to be running at that time.
I have added a stopRedirect() function to RedirOutputThread. stopRedirect signals the thread loop to exit and waits for confirmation that redirection has ended before returning. RedirOutputThread.run breaks out of its redirection loop when signalled, notifies stopRedirect() that redirection has ended, and then continues to wait for process completion (so that it can clean up effectively). I tested locally with an artificially long-running robocop test and verified that there was no longer any response corruption.
Attachment #684786 - Flags: review?(wlachance)
Comment on attachment 684786 [details] [diff] [review] stop RedirOutputThread process output redirection on timeout So I could be missing something, but as it is I don't see any reason why you couldn't override the RedirOutputThread class to do this automatically after the thread is joined. I think getting the right behaviour automatically here would be both safer and less verbose. Otherwise, good catch!
Attachment #684786 - Flags: review?(wlachance) → review-
(In reply to William Lachance (:wlach) from comment #4) > Comment on attachment 684786 [details] [diff] [review] > stop RedirOutputThread process output redirection on timeout > > So I could be missing something, but as it is I don't see any reason why you > couldn't override the RedirOutputThread class to do this automatically after > the thread is joined. I think getting the right behaviour automatically here > would be both safer and less verbose. Good idea, but... - There are a couple of places where we want to join but keep redirection going. - Thread.join is final -- it cannot be overridden.
...so then the patch starts to look like this. I think I prefer the original patch. Thoughts?
Attachment #685404 - Flags: feedback?(wlachance)
Comment on attachment 685404 [details] [diff] [review] an alternative Given that we can't override join, I think either this one or the other are about equally good. I think I slightly prefer this one as it does provide a one-stop-shop method to use but the difference is slight. I guess I'll give r+ to either and let you make the final call.
Attachment #685404 - Flags: feedback?(wlachance) → feedback+
Comment on attachment 684786 [details] [diff] [review] stop RedirOutputThread process output redirection on timeout Changing r- to r+ based on discussion.
Attachment #684786 - Flags: review- → review+
Attachment #684786 - Attachment is obsolete: true
Comment on attachment 685404 [details] [diff] [review] an alternative Let's go with the new version. r=wlach
Attachment #685404 - Flags: review+
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla20
You need to log in before you can comment on or make changes to this bug.