1448900 - Confusing "Failed to stop browser process" log message when trying to kill the already exited browser process

Reporter

Description

•

7 years ago

(In reply to Henrik Skupin (:whimboo) from bug 1434618 comment #20) > I see some strange behavior here. All is based on the following log: > > https://treeherder.mozilla.org/logviewer.html#?job_id=169826750&repo=try > (log viewer) > https://taskcluster-artifacts.net/W8In7kJvTES01pc-37mDGA/0/public/logs/ > live_backing.log (full log) > > > 1) Geckodriver reports after only 5s that it failed to stop the browser > process. How can this be given that we should wait 70s for that to be > happening. Andreas, do you have an idea? Here the excerpt from the log: > > > [task 2018-03-23T09:02:46.253Z] 09:02:46 INFO - PID 8530 | 1521795766245 Marionette TRACE 0 -> [0,3,"quit",{"flags":["eForceQuit"]}] > [..] > > [task 2018-03-23T09:02:51.563Z] 09:02:51 INFO - PID 8530 | 1521795771555 geckodriver::marionette ERROR Failed to stop browser process (In reply to Andreas Tolfsen ‹:ato› from comment #21) > I think std::process::Child::kill() errors if the process isn’t > running: > > https://doc.rust-lang.org/std/process/struct.Child.html#method.kill > > mozrunner unconditionally calls this after > mozrunner::RunnerProcess::running() is false in the ::wait() loop: > > > https://searchfox.org/mozilla-central/rev/ > 003262ae12ce937950ffb8d3b0fa520d1cc38bff/testing/mozbase/rust/mozrunner/src/ > runner.rs#147 (In reply to Andreas Tolfsen ‹:ato› from comment #22) > Although killing twice an ended process does not: > https://play.rust-lang.org/ > ?gist=85ae7d8f983307245872fb328b5685b4&version=stable

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Comment 1

•

7 years ago

We should really get rid of this confusing message which originates here: https://searchfox.org/mozilla-central/rev/003262ae12ce937950ffb8d3b0fa520d1cc38bff/testing/geckodriver/src/marionette.rs#604 It's really interesting why `kill()` reports an error in that case. I would like to know what kind of error this actually is.

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Comment 2

•

7 years ago

Ok, so it can be replicated with the following code: https://play.rust-lang.org/?gist=3ec0f0a6a8278ad0b218814d6b98f6c3&version=stable The problem is going to happen when the process exits itself, and we try to kill it afterward.

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Comment 3

•

7 years ago

The failure as thrown here is actually: > Failed to stop process: Error { repr: Custom(Custom { kind: InvalidInput, error: StringError("invalid argument: can\'t kill an exited process") }) } It means we should really only call `kill()` if we have to. And I remember that I was unhappy with that change when I reviewed the code.

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Comment 4

•

7 years ago

Just for reference, the related code was added by bug 1443853. Andreas, can you provide a fix for that?

Blocks: 1443853

Flags: needinfo?(ato)

Summary: Confusing "Failed to stop browser process" log messages → Confusing "Failed to stop browser process" log message when trying to kill the already exited browser process

Andreas Tolfsen ❲:ato❳

Assignee

Comment 5

•

7 years ago

Incidentally my original patch to bug 1443853 had a fix for this race condition, but I decided to adopt jgraham’s proposal for a simpler approach. In any case I will provide a patch for this.

Assignee: nobody → ato

Status: NEW → ASSIGNED

Flags: needinfo?(ato)

Priority: -- → P1

Comment hidden (mozreview-request)

Andreas Tolfsen ❲:ato❳

Assignee

Comment 8

•

7 years ago

I’m inclined to want to make a 0.20.1 release with this regression fix. Inaccurately returning "Failed to stop process" when the process has already successfully exited is pretty annoying and is already confusing users. There have been two functional changes to geckodriver since the 0.20.0 release, which is/will be this patch and https://bugzilla.mozilla.org/show_bug.cgi?id=1444068. I think this meets the requirements for a point release according to semantic versioning, but do you agree in principle that we should do this, whimboo?

Flags: needinfo?(hskupin)

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Comment 9

•

7 years ago

The changes are minimal. So I'm happy with a point release. I will try to review today but not if I can make it. At latest tomorrow morning you have it.

Flags: needinfo?(hskupin)

James Graham [:jgraham]

Comment 10

•

7 years ago

mozreview-review

Comment on attachment 8963217 [details] Bug 1448900 - Avoid killing exited process. https://reviewboard.mozilla.org/r/232098/#review237672 I don't love this from a readability point of view, but it seems like it should work.

Attachment #8963217 - Flags: review+

Andreas Tolfsen ❲:ato❳

Assignee

Comment 11

•

7 years ago

mozreview-review-reply

Comment on attachment 8963217 [details] Bug 1448900 - Avoid killing exited process. https://reviewboard.mozilla.org/r/232098/#review237672 So I think the only non-racy solution here is to use try_wait. The problem with "Child::running()", then later "if Child::running" is that its inherently racy. Maybe I can add some comments that will improve the readability of this.

Comment hidden (mozreview-request)

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Comment 14

•

7 years ago

mozreview-review

Comment on attachment 8963218 [details] Bug 1448900 - Release mozrunner 0.6.1. https://reviewboard.mozilla.org/r/232100/#review237810

Attachment #8963218 - Flags: review?(hskupin) → review+

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Comment 15

•

7 years ago

mozreview-review

Comment on attachment 8963217 [details] Bug 1448900 - Avoid killing exited process. https://reviewboard.mozilla.org/r/232098/#review237818 I've tested it locally with all variations except not being able to kill the browser and it works pretty fine. I only want to see a bit more logging here, which I would need for investigating problems. See my inline comments. ::: testing/mozbase/rust/mozrunner/src/runner.rs:144 (Diff revision 2) > fn wait(&mut self, timeout: time::Duration) -> io::Result<process::ExitStatus> { > - let now = time::Instant::now(); > - while self.running() { > - if now.elapsed() >= timeout { > - break; > + let start = time::Instant::now(); > + loop { > + match self.try_wait() { > + // child has already exited, reap its exit code > + Ok(Some(status)) => return Ok(status), For investigation of failures I would like to see the exit code logged at least as debug level. ::: testing/mozbase/rust/mozrunner/src/runner.rs:147 (Diff revision 2) > - if now.elapsed() >= timeout { > - break; > + match self.try_wait() { > + // child has already exited, reap its exit code > + Ok(Some(status)) => return Ok(status), > + > + // child still running and timeout elapsed, kill it > + Ok(None) if start.elapsed() >= timeout => return self.kill(), When we have to kill the process lets log this as info level. This information will be pretty helpful.

Attachment #8963217 - Flags: review?(hskupin) → review+

Andreas Tolfsen ❲:ato❳

Assignee

Updated

•

7 years ago

Blocks: 1449877
No longer blocks: 1441204

Andreas Tolfsen ❲:ato❳

Assignee

Comment 16

•

7 years ago

mozreview-review-reply

Comment on attachment 8963217 [details] Bug 1448900 - Avoid killing exited process. https://reviewboard.mozilla.org/r/232098/#review237818 > For investigation of failures I would like to see the exit code logged at least as debug level. Good idea, I’ve also missed this.

Comment hidden (mozreview-request)

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Comment 19

•

7 years ago

mozreview-review

Comment on attachment 8963217 [details] Bug 1448900 - Avoid killing exited process. https://reviewboard.mozilla.org/r/232098/#review237870 ::: testing/geckodriver/src/marionette.rs:604 (Diff revision 3) > > if let Some(ref mut runner) = self.browser { > // TODO(https://bugzil.la/1443922): > // Use toolkit.asyncshutdown.crash_timout pref > - if runner.wait(time::Duration::from_secs(70)).is_err() { > - error!("Failed to stop browser process"); > + match runner.wait(time::Duration::from_secs(70)) { > + Ok(x) => debug!("Browser process stopped with exit status {}", x), Don't you want to use info level here, similar to the command we execute and which also gets logged as info level?

Andreas Tolfsen ❲:ato❳

Assignee

Comment 20

•

7 years ago

mozreview-review-reply

Comment on attachment 8963217 [details] Bug 1448900 - Avoid killing exited process. https://reviewboard.mozilla.org/r/232098/#review237870 > Don't you want to use info level here, similar to the command we execute and which also gets logged as info level? I don’t think I would like that. If anything, we should reduce the level of the other one. This isn’t information I think it is generally useful to display to the user given that everything works.

Pulsebot

Comment 21

•

7 years ago

Pushed by atolfsen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/3d25026d07ef Avoid killing exited process. r=jgraham,whimboo https://hg.mozilla.org/integration/autoland/rev/56f8adbe41b7 Release mozrunner 0.6.1. r=whimboo

Andrei Ciure[:aciure]

Comment 22

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/3d25026d07ef https://hg.mozilla.org/mozilla-central/rev/56f8adbe41b7

Status: ASSIGNED → RESOLVED

Closed: 7 years ago

status-firefox61: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla61

Henrik Skupin [:whimboo][⌚️UTC+2]

Reporter

Comment 23

•

7 years ago

It looks like the new message contains `exit status` and `exit code`. Can we remove one of them? https://treeherder.mozilla.org/logviewer.html#?job_id=171227846&repo=try&lineNumber=41187 > [task 2018-03-30T17:29:38.196Z] 17:29:38 INFO - PID 1961 | 1522430972688 geckodriver::marionette DEBUG Browser process stopped with exit status exit code: 0

Flags: needinfo?(ato)

Andreas Tolfsen ❲:ato❳

Assignee

Comment 24

•

7 years ago

The "exit code: 0" part of that comes from the Display trait implementation on ExitStatus, so we should remove the "with exit status" part of the log message, so it comes something like "Browser process stopped: exit code: 0". Filed https://bugzilla.mozilla.org/show_bug.cgi?id=1451838.

Flags: needinfo?(ato)

Bug 1448900 - Avoid killing exited process. 7 years ago Andreas Tolfsen ❲:ato❳ 59 bytes, text/x-review-board-request	whimboo : review+	Details
Bug 1448900 - Release mozrunner 0.6.1. 7 years ago Andreas Tolfsen ❲:ato❳ 59 bytes, text/x-review-board-request	whimboo : review+	Details