Closed
Bug 1448900
Opened 6 years ago
Closed 6 years ago
Confusing "Failed to stop browser process" log message when trying to kill the already exited browser process
Categories
(Testing :: geckodriver, defect, P1)
Testing
geckodriver
Tracking
(firefox61 fixed)
RESOLVED
FIXED
mozilla61
Tracking | Status | |
---|---|---|
firefox61 | --- | fixed |
People
(Reporter: whimboo, Assigned: ato)
References
Details
Attachments
(2 files)
(In reply to Henrik Skupin (:whimboo) from bug 1434618 comment #20) > I see some strange behavior here. All is based on the following log: > > https://treeherder.mozilla.org/logviewer.html#?job_id=169826750&repo=try > (log viewer) > https://taskcluster-artifacts.net/W8In7kJvTES01pc-37mDGA/0/public/logs/ > live_backing.log (full log) > > > 1) Geckodriver reports after only 5s that it failed to stop the browser > process. How can this be given that we should wait 70s for that to be > happening. Andreas, do you have an idea? Here the excerpt from the log: > > > [task 2018-03-23T09:02:46.253Z] 09:02:46 INFO - PID 8530 | 1521795766245 Marionette TRACE 0 -> [0,3,"quit",{"flags":["eForceQuit"]}] > [..] > > [task 2018-03-23T09:02:51.563Z] 09:02:51 INFO - PID 8530 | 1521795771555 geckodriver::marionette ERROR Failed to stop browser process (In reply to Andreas Tolfsen ‹:ato› from comment #21) > I think std::process::Child::kill() errors if the process isn’t > running: > > https://doc.rust-lang.org/std/process/struct.Child.html#method.kill > > mozrunner unconditionally calls this after > mozrunner::RunnerProcess::running() is false in the ::wait() loop: > > > https://searchfox.org/mozilla-central/rev/ > 003262ae12ce937950ffb8d3b0fa520d1cc38bff/testing/mozbase/rust/mozrunner/src/ > runner.rs#147 (In reply to Andreas Tolfsen ‹:ato› from comment #22) > Although killing twice an ended process does not: > https://play.rust-lang.org/ > ?gist=85ae7d8f983307245872fb328b5685b4&version=stable
Reporter | ||
Comment 1•6 years ago
|
||
We should really get rid of this confusing message which originates here: https://searchfox.org/mozilla-central/rev/003262ae12ce937950ffb8d3b0fa520d1cc38bff/testing/geckodriver/src/marionette.rs#604 It's really interesting why `kill()` reports an error in that case. I would like to know what kind of error this actually is.
Reporter | ||
Comment 2•6 years ago
|
||
Ok, so it can be replicated with the following code: https://play.rust-lang.org/?gist=3ec0f0a6a8278ad0b218814d6b98f6c3&version=stable The problem is going to happen when the process exits itself, and we try to kill it afterward.
Reporter | ||
Comment 3•6 years ago
|
||
The failure as thrown here is actually:
> Failed to stop process: Error { repr: Custom(Custom { kind: InvalidInput, error: StringError("invalid argument: can\'t kill an exited process") }) }
It means we should really only call `kill()` if we have to. And I remember that I was unhappy with that change when I reviewed the code.
Reporter | ||
Comment 4•6 years ago
|
||
Just for reference, the related code was added by bug 1443853. Andreas, can you provide a fix for that?
Blocks: 1443853
Flags: needinfo?(ato)
Summary: Confusing "Failed to stop browser process" log messages → Confusing "Failed to stop browser process" log message when trying to kill the already exited browser process
Assignee | ||
Comment 5•6 years ago
|
||
Incidentally my original patch to bug 1443853 had a fix for this race condition, but I decided to adopt jgraham’s proposal for a simpler approach. In any case I will provide a patch for this.
Assignee: nobody → ato
Status: NEW → ASSIGNED
Flags: needinfo?(ato)
Priority: -- → P1
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Comment 8•6 years ago
|
||
I’m inclined to want to make a 0.20.1 release with this regression fix. Inaccurately returning "Failed to stop process" when the process has already successfully exited is pretty annoying and is already confusing users. There have been two functional changes to geckodriver since the 0.20.0 release, which is/will be this patch and https://bugzilla.mozilla.org/show_bug.cgi?id=1444068. I think this meets the requirements for a point release according to semantic versioning, but do you agree in principle that we should do this, whimboo?
Flags: needinfo?(hskupin)
Reporter | ||
Comment 9•6 years ago
|
||
The changes are minimal. So I'm happy with a point release. I will try to review today but not if I can make it. At latest tomorrow morning you have it.
Flags: needinfo?(hskupin)
Comment 10•6 years ago
|
||
mozreview-review |
Comment on attachment 8963217 [details] Bug 1448900 - Avoid killing exited process. https://reviewboard.mozilla.org/r/232098/#review237672 I don't love this from a readability point of view, but it seems like it should work.
Attachment #8963217 -
Flags: review+
Assignee | ||
Comment 11•6 years ago
|
||
mozreview-review-reply |
Comment on attachment 8963217 [details] Bug 1448900 - Avoid killing exited process. https://reviewboard.mozilla.org/r/232098/#review237672 So I think the only non-racy solution here is to use try_wait. The problem with "Child::running()", then later "if Child::running" is that its inherently racy. Maybe I can add some comments that will improve the readability of this.
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Reporter | ||
Comment 14•6 years ago
|
||
mozreview-review |
Comment on attachment 8963218 [details] Bug 1448900 - Release mozrunner 0.6.1. https://reviewboard.mozilla.org/r/232100/#review237810
Attachment #8963218 -
Flags: review?(hskupin) → review+
Reporter | ||
Comment 15•6 years ago
|
||
mozreview-review |
Comment on attachment 8963217 [details] Bug 1448900 - Avoid killing exited process. https://reviewboard.mozilla.org/r/232098/#review237818 I've tested it locally with all variations except not being able to kill the browser and it works pretty fine. I only want to see a bit more logging here, which I would need for investigating problems. See my inline comments. ::: testing/mozbase/rust/mozrunner/src/runner.rs:144 (Diff revision 2) > fn wait(&mut self, timeout: time::Duration) -> io::Result<process::ExitStatus> { > - let now = time::Instant::now(); > - while self.running() { > - if now.elapsed() >= timeout { > - break; > + let start = time::Instant::now(); > + loop { > + match self.try_wait() { > + // child has already exited, reap its exit code > + Ok(Some(status)) => return Ok(status), For investigation of failures I would like to see the exit code logged at least as debug level. ::: testing/mozbase/rust/mozrunner/src/runner.rs:147 (Diff revision 2) > - if now.elapsed() >= timeout { > - break; > + match self.try_wait() { > + // child has already exited, reap its exit code > + Ok(Some(status)) => return Ok(status), > + > + // child still running and timeout elapsed, kill it > + Ok(None) if start.elapsed() >= timeout => return self.kill(), When we have to kill the process lets log this as info level. This information will be pretty helpful.
Attachment #8963217 -
Flags: review?(hskupin) → review+
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Comment 16•6 years ago
|
||
mozreview-review-reply |
Comment on attachment 8963217 [details] Bug 1448900 - Avoid killing exited process. https://reviewboard.mozilla.org/r/232098/#review237818 > For investigation of failures I would like to see the exit code logged at least as debug level. Good idea, I’ve also missed this.
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Reporter | ||
Comment 19•6 years ago
|
||
mozreview-review |
Comment on attachment 8963217 [details] Bug 1448900 - Avoid killing exited process. https://reviewboard.mozilla.org/r/232098/#review237870 ::: testing/geckodriver/src/marionette.rs:604 (Diff revision 3) > > if let Some(ref mut runner) = self.browser { > // TODO(https://bugzil.la/1443922): > // Use toolkit.asyncshutdown.crash_timout pref > - if runner.wait(time::Duration::from_secs(70)).is_err() { > - error!("Failed to stop browser process"); > + match runner.wait(time::Duration::from_secs(70)) { > + Ok(x) => debug!("Browser process stopped with exit status {}", x), Don't you want to use info level here, similar to the command we execute and which also gets logged as info level?
Assignee | ||
Comment 20•6 years ago
|
||
mozreview-review-reply |
Comment on attachment 8963217 [details] Bug 1448900 - Avoid killing exited process. https://reviewboard.mozilla.org/r/232098/#review237870 > Don't you want to use info level here, similar to the command we execute and which also gets logged as info level? I don’t think I would like that. If anything, we should reduce the level of the other one. This isn’t information I think it is generally useful to display to the user given that everything works.
Comment 21•6 years ago
|
||
Pushed by atolfsen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/3d25026d07ef Avoid killing exited process. r=jgraham,whimboo https://hg.mozilla.org/integration/autoland/rev/56f8adbe41b7 Release mozrunner 0.6.1. r=whimboo
Comment 22•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/3d25026d07ef https://hg.mozilla.org/mozilla-central/rev/56f8adbe41b7
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
status-firefox61:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Reporter | ||
Comment 23•6 years ago
|
||
It looks like the new message contains `exit status` and `exit code`. Can we remove one of them? https://treeherder.mozilla.org/logviewer.html#?job_id=171227846&repo=try&lineNumber=41187 > [task 2018-03-30T17:29:38.196Z] 17:29:38 INFO - PID 1961 | 1522430972688 geckodriver::marionette DEBUG Browser process stopped with exit status exit code: 0
Flags: needinfo?(ato)
Assignee | ||
Comment 24•6 years ago
|
||
The "exit code: 0" part of that comes from the Display trait implementation on ExitStatus, so we should remove the "with exit status" part of the log message, so it comes something like "Browser process stopped: exit code: 0". Filed https://bugzilla.mozilla.org/show_bug.cgi?id=1451838.
Flags: needinfo?(ato)
You need to log in
before you can comment on or make changes to this bug.
Description
•