Firefox does not exit when geckodriver is terminated
Categories
(Testing :: geckodriver, defect, P3)
Tracking
(Not tracked)
People
(Reporter: whimboo, Unassigned)
References
()
Details
Attachments
(5 files)
Comment 1•7 years ago
|
||
Comment 2•7 years ago
|
||
Reporter | ||
Comment 3•7 years ago
|
||
Comment 4•7 years ago
|
||
Comment 5•7 years ago
|
||
Comment 6•7 years ago
|
||
Comment 7•7 years ago
|
||
Comment 8•7 years ago
|
||
Comment 9•7 years ago
|
||
Comment hidden (mozreview-request) |
Comment 11•7 years ago
|
||
Comment 12•7 years ago
|
||
Comment 13•7 years ago
|
||
Comment 14•7 years ago
|
||
mozreview-review |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 17•7 years ago
|
||
mozreview-review |
Comment 18•7 years ago
|
||
mozreview-review |
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Reporter | ||
Comment 19•6 years ago
|
||
Comment 20•6 years ago
|
||
Updated•6 years ago
|
Reporter | ||
Comment 21•6 years ago
|
||
(In reply to Andreas Tolfsen ⦗:ato⦘ from comment #20)
I’m going to pick this up and rebase jgraham’s patch. We now use
a hyper version which does support shutting down.
Looks like you missed to follow-up here. Will you be able to do it soon? Otherwise we should move this bug to the next 0.25 release (bug 1520585).
Comment 22•6 years ago
|
||
No if you want to do it, please go ahead!
Reporter | ||
Updated•6 years ago
|
Comment 23•5 years ago
|
||
(In reply to Andreas Tolfsen ⦗:ato⦘ from comment #20)
I’m going to pick this up and rebase jgraham’s patch. We now use
a hyper version which does support shutting down.
Hello. I'm interested in helping resolve this issue. I have a Python script that uses Selenium+geckodriver+Firefox, and I'd like to gracefully handle ctrl+c events, but I run into this issue where the Firefox windows don't close. Is the current status still that the attached patch needs to be rebased?
Reporter | ||
Comment 24•5 years ago
|
||
James, could you please help Daniel given that were your patches?
Comment 25•5 years ago
|
||
chan-signal
/chan
have since been deprecated in favour of
signal-hook
/crossbeam-channel
, so the attached patches can’t
simply be rebased.
The webdriver
crate has since also moved to use a Tokio-based
hyper
and no longer does its own HTTP request routing but instead
relies on warp
.
The new APIs are however strikingly similar and the same fundamental
approach should still be applicable. You will need to spin up a
thread in geckodriver that blocks on receiving the signal and hook
that up to the webdriver::server::Dispatcher
thread and warp
in some way using a channel so they know to exit when the interrupt
arrives.
Unfortunately I don’t know the warp
API very well, but hyper
can be stopped with hyper::server::Server::with_graceful_shutdown()
.
Comment 26•5 years ago
|
||
Thanks! I'll start working on it.
Reporter | ||
Updated•5 years ago
|
Reporter | ||
Comment 27•5 years ago
|
||
Daniel, do you have any further questions? Otherwise do you have an updated status?
Comment 28•5 years ago
|
||
No further questions at the moment. I have not had a significant amount of time to work on this yet due to my other responsibilities, but I am committed to fixing this bug. I, unexpectedly, will be on medical leave for 3-6 weeks starting on the 10th, so I may not have a patch before January. If someone else comes along and wants to fix it while I am out, they are welcome to, but, as I said, I am committed to fixing this since it blocks some of my own work and it is an interesting problem to me.
Reporter | ||
Comment 29•5 years ago
|
||
Good to hear, and until then best luck for you. I don't think that during that time anyone else would need it. As such the bug should be waiting for you. :)
Comment 30•5 years ago
|
||
Hello, I have hit a wall on this. I am able to spin up a thread that handles the signal and send a message to the webdriver::server::Dispatcher
thread, but I have yet to find a way to shutdown the warp
server that compiles. The compilation fails saying that I cannot await a future because we're not using Rust 2018. Is that true or have I made a simple mistake somewhere? I have a patch file (along with one for the vendoring of signal-hook
) that I can attach if you'd like to look at where I am.
Comment 31•5 years ago
|
||
The compilation fails saying that I cannot await a future because we're not using Rust 2018
This is unfortunately true. I don't know how hard it would be to update to the new edition. I"m guessing not that hard.
I have a patch file (along with one for the vendoring of signal-hook) that I can attach if you'd like to look at where I am.
Please! Or push to phabricator if you are able to do that.
Comment 32•5 years ago
|
||
This is my first time working in moz-central, so I don't know how to push to phabricator yet. I'll attach the patches now while I read up on how to do that.
Comment 33•5 years ago
|
||
Comment 34•5 years ago
|
||
And perhaps I am doing the attachment part wrong too because the vendoring patch is too big.
Comment 35•5 years ago
|
||
I don't know if the situation has improved recently but the docs for this were previously horrible. The tl;dr version is:
- Ensure you have Python 3 and pip 3 installed
pip3 install mozphab
(with--user
if you don't want a system-wide install)moz-phab submit <commits>
Don't worry about the vendoring patch.
Reporter | ||
Comment 36•5 years ago
|
||
Our code should compile for 2018. I fixed all what was broken via bug 1508670 a long time ago. Maybe something new sneaked in. Just test it out.
Comment 37•5 years ago
|
||
So features requiring Rust 2018 should now work on central. The patch you attached looks like full files rather than a diff; lets try to make the Phabricator route work.
Comment 38•5 years ago
|
||
Changing the edition did work though it uncovered new compilation errors. I have fixed those and am running the webdriver tests now. Then I'll test my script that reproduces the original problem, and see if I've actually got everything working. If I do, I'll push to Phabricator.
Comment 39•5 years ago
|
||
Updated•5 years ago
|
Comment 40•5 years ago
|
||
Okay. I have something that I think works. In my repro script, the Firefox window now closes when it didn't before. However, I cannot tell if the warp
-driven webdriver server thread is properly joining. If I put a print at the end of the webdriver server thread body, I don't see the output.
Reporter | ||
Comment 41•5 years ago
|
||
James, mind following up on Daniel's question from about 2 weeks ago? Thanks.
Comment 42•5 years ago
|
||
So with a diff like
diff --git a/testing/geckodriver/src/main.rs b/testing/geckodriver/src/main.rs
index cf844bfe2e8ac..bc4eae6a3e4e1 100644
--- a/testing/geckodriver/src/main.rs
+++ b/testing/geckodriver/src/main.rs
@@ -236,7 +236,7 @@ fn inner_main(app: &mut App) -> ProgramResult<()> {
})?;
}
}
-
+ println!("SHUTDOWN inner_main");
Ok(())
}
diff --git a/testing/webdriver/src/server.rs b/testing/webdriver/src/server.rs
index 2146969ab20fe..63b2981b43199 100644
--- a/testing/webdriver/src/server.rs
+++ b/testing/webdriver/src/server.rs
@@ -183,12 +183,14 @@ where
let builder = thread::Builder::new().name("webdriver server".to_string());
let handle = builder.spawn(move || {
tokio::run(server);
+ println!("SHUTDOWN webdriver server");
})?;
let builder = thread::Builder::new().name("webdriver dispatcher".to_string());
builder.spawn(move || {
let mut dispatcher = Dispatcher::new(handler);
dispatcher.run(&msg_recv);
+ println!("SHUTDOWN dispatcher");
})?;
Ok(Listener {
ctrl-c with a Firefox session started shuts down Firefox, and gives the following output:
^CSHUTDOWN webdriver server
SHUTDOWN inner_main
jgraham@flitwick:~/develop/gecko$ Exiting due to channel error.
`
Exiting due to channel error.
So I don't see the SHUTDOWN dispatcher
line, and something, either Firefox or geckodriver is producing the Exiting due to channel error.
output. So I think maybe something is wrong in the order of shutdown in this case, but basically the patch here works. I'm pretty happy to take it if we solve the excessive CPU usage of the busy loop.
Reporter | ||
Comment 43•5 years ago
|
||
How does it triggering Firefox to quit? Is it sending the Marionette:Quit
message? That is how it should be done because that causes a safe shutdown. Everything else will just be killing the process which is not really wanted.
Comment 44•5 years ago
|
||
Hello. I've had to put working on this on hold until a week or two from now to push another project out the door. James has been providing feedback on Phabricator and I do have next steps to solve the last remaining issues. The busy loop can be solved with a sleep in the thread handling the signal, but I would like to see if I can phrase it as a future so it plays along with the event loop. That said, I haven't seen that channel error message. When I get back to this, I'll see if I can reproduce it on my end.
Henrik, the signal handler sends a webdriver::server::DispatchMessage::Quit
message to the webdriver dispatcher which causes the dispatcher to call its delete_session()
function, which calls the MarionetteHandler
's delete_session()
function, which passes a WebDriverCommand::DeleteSession
message to the MarionetteHandler
's handle_command()
function, which sends that message to the MarionetteConnection
through MarionetteConnection
's send_command()
function which encodes that message into a Marionette:Quit
message. Or at least that's the execution path I traced by reading. I can't confirm that's actually how it works, so it should be confirmed by someone more familiar with the codebase.
Comment 45•5 years ago
|
||
Hello again. The current status on this is that shutdown of the dispatcher hangs while waiting for the MarionetteConnection
to close (via MarionetteConnection.close()
within MarionetteHandler.delete_session()
. The close function is an empty function, and I can't figure out what is actually happening with that function.
I was scheduled to go on family leave for a new child next week, but because of COVID-19, those plans have moved up a week, and tomorrow is my last work day for 3 months.
If the source of the hang can be identified, I think (hope) that is the last hurdle. Adding a 1s sleep to the signal handling thread makes CPU usage go back to the level it was before.
Reporter | ||
Comment 46•5 years ago
|
||
Thank you Daniel for the feedback! I haven't looked into that detail myself for several months so hard to say what's going on. Would you mind pushing your latest changes to phabricator so that we could test on our own? Maybe James has some additional feedback here...
Otherwise all best for you and your family. Especially for the upcoming new family member!
Comment 47•5 years ago
|
||
Okay, latest changes are pushed (just the addition of the sleep to address CPU usage). Thanks for the well wishes!
Reporter | ||
Updated•4 years ago
|
Reporter | ||
Comment 48•4 years ago
|
||
James, would you have the time to check the latest status of the patch? Not sure what's remaining here, and if it's not too much if it could be a potential fix for 0.28.0.
Updated•4 years ago
|
Comment 49•4 years ago
|
||
Comment 50•4 years ago
|
||
I rebased/partially rewrote this patch (the webdriver part; the geckodriver part is largely unchanged). I don't see a hang so afaict we can land this subject to review etc.
Reporter | ||
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Reporter | ||
Comment 51•4 years ago
|
||
So what I didn't notice earlier is that with this patch we only handle SIGINT
, which only handles the Ctrl+C case. But we don't setup signal handlers for those cases when the geckodriver process gets killed via SIGTERM
and/or SIGKILL
. Both signals are limited to non-Windows platforms, but I feel that we should still add them both or only one to handle a graceful shutdown of Firefox when the geckodriver process gets killed.
Also there seems to be a race condition. In most of the cases we are going to quit the session before sendnig the Marionette:Quit
command. As such Firefox will still be not closed.
In a working case it looks like:
1601629124353 geckodriver DEBUG signal received
1601629124353 webdriver::server DEBUG The dispatcher got the Quit message, deleting session
1601629124353 webdriver::server DEBUG Deleting session
1601629124353 webdriver::server DEBUG shutdown!
1601629124355 Marionette DEBUG 0 -> [0,3,"Marionette:Quit",{"flags":["eForceQuit"]}]
1601629124355 Marionette INFO Stopped listening on port 61295
1601629124377 Marionette DEBUG Closed connection 0
And here a failing case:
1601629293053 geckodriver DEBUG signal received
1601629293053 webdriver::server DEBUG The dispatcher got the Quit message, deleting session
1601629293053 webdriver::server DEBUG Deleting session
1601629293054 webdriver::server DEBUG shutdown!
1601629293057 Marionette DEBUG Closed connection 0
^CDEBUG:urllib3.connectionpool:Resetting dropped connection: localhost
James, what do you think?
Reporter | ||
Comment 52•4 years ago
|
||
It's too late for 0.28 and as such has to be moved to 0.29.
Reporter | ||
Updated•3 years ago
|
Reporter | ||
Updated•3 years ago
|
Reporter | ||
Comment 53•3 years ago
|
||
Too late for the 0.31.0 release and James doesn't work on it at the moment.
Reporter | ||
Comment 54•2 years ago
|
||
We haven't had the time to continue on this particular issue and most likely we won't be able anytime soon. As such lets drop the bug from any milestone / release tracking.
Reporter | ||
Comment 55•9 months ago
|
||
Note that with bug 336193 fixed Firefox should handle SIGTERM/SIGINT/SIGHUP on all platforms now.
Description
•