Open Bug 1219969 Opened 4 years ago Updated 2 years ago

[e10s] marionette.navigate() raises IOError when the corresponding top-level request is canceled.

Categories

(Testing :: Marionette, defect, P5)

defect

Tracking

(e10s+)

REOPENED
Tracking Status
e10s + ---

People

(Reporter: dev, Unassigned)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 1 obsolete file)

Attached file test_http_request_cancel.py (obsolete) —
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0 Iceweasel/38.3.0
Build ID: 20150922225347

Steps to reproduce:

I ran the attached marionette testcase in e10s-mode using the firefox-ui-tests CLI.


Actual results:

The test hangs several minutes until it raises an IOError.

The gecko.log shows this:

canceling 'https://www.mozilla.org/'
[Parent 9126] WARNING: waitpid failed pid:9199 errno:10: file /builds/slave/m-cen-l64-ntly-000000000000000/build/src/ipc/chromium/src/base/process_util_posix.cc, line 267
JavaScript error: chrome://marionette/content/listener.js, line 396: NS_ERROR_ILLEGAL_VALUE: Component returned failure code: 0x80070057 (NS_ERROR_ILLEGAL_VALUE) [nsIMessageSender.sendAsyncMessage]


Expected results:

There should be a TimeoutException, just like in non-e10s mode.
Forgot to say that Bug 1209373 might be related.
Attachment #8680934 - Attachment mime type: text/x-python → text/plain
I can confirm this behavior with the attached testcase. Looks like we are running into the socket timeout here. Chris, looks like there is one more thing to fix for navigate.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(cmanchester)
I think this is a case that isn't covered by the workarounds we needed for e10s. I can offer some simple workarounds to avoid this in individual tests, but I think a real fix would be non-trivial. From debugging a little bit it looks like the content listener is in a state where it can no longer send messages to the parent after the request is cancelled. To solve this reliably we might have to implement another timeout managed by the parent process. I wonder if this is a state that can be achieved outside of an addon.
Flags: needinfo?(cmanchester)
Not sure I completely understand your last sentence but the testcase is not related to an addon. You can always get into that state when you cancel the page load right after it has been started. I wonder how the stop button behaves here and if it causes the same exception. I don't currently have the time to test this but maybe Martin can check that.
Flags: needinfo?(dev)
Summary: On E10s, marionette.navigate() raises IOError when the corresponding top-level request is canceled. → [e10s] marionette.navigate() raises IOError when the corresponding top-level request is canceled.
This is desired behaviour. Navigate() is blocking until a timer is hit or the DOMContentLoaded event is fired. 

What is the use case for wanting to cancel navigate when it isnt complete?
David, this reported issue is e10s only. It works fine in non-e10s mode. A use case I have described above in my last comment for the toolbar's stop button.
(In reply to Henrik Skupin (:whimboo) from comment #4)
> Not sure I completely understand your last sentence but the testcase is not
> related to an addon. You can always get into that state when you cancel the
> page load right after it has been started. I wonder how the stop button
> behaves here and if it causes the same exception. I don't currently have the
> time to test this but maybe Martin can check that.

I should have said "outside of chrome js". It appears the origin of this bug is an addon encountering the issue.
Not sure, Henrik, how you would cancel the page load right after it has been started. As far as I understand navigate() works synchronously, so that the page load has already finished when navigate() returns. In other words, the following code won't stop the page from being loaded:

    with self.marionette.using_context("content"):
        self.marionette.navigate("https://www.mozilla.org/")
    self.browser.navbar.locationbar.stop_button.click()

However, I did a quick test, navigating to an URL which takes some seconds to load [1], and clicking the stop button manually. In that case I get the same error as above:

    IOError: process died with returncode None

By the way, I've just learned about LocationBar.load_url() [2]. The function does not wait for the document to be loaded and therefore works for my test case. However, using navigate() and asserting a TimeoutException would be cleaner. Now it depends on you if you want to fix this issue. Maybe you want to choose one of the following paths:

(a) Fix this bug so that navigate() raises TimeoutException, just as I described in the first comment.
(b) Add an option to navigate() which says the Marionette Server should not wait for the document to load. Not sure if this makes sense.
(c) Don't fix this bug (wontfix). In this case it would be nice if some documentation hints about LocationBar.load_url().

[1] https://github.com/RequestPolicyContinued/requestpolicy/blob/2775a88e09e62273b30b1626e8bee39872ec1e17/tests/content/slowly_loading_page.php (The page needs 5 seconds to load with parameters `?load_duration=5000`.)
[2] https://github.com/mozilla/firefox-ui-tests/blob/acd30465aa9d1372edd2a402cd801ad20319ed4d/firefox_puppeteer/ui/browser/toolbars.py#L208
Flags: needinfo?(dev)
(In reply to Martin Kimmerle from comment #8)
> However, I did a quick test, navigating to an URL which takes some seconds
> to load [1], and clicking the stop button manually. In that case I get the
> same error as above:
> 
>     IOError: process died with returncode None

That's what I was thinking of. So it confirms my suspicion. Thanks Martin.

> cleaner. Now it depends on you if you want to fix this issue. Maybe you want
> to choose one of the following paths:

Given that the stop button also causes this IOError exception there is definitely a problem outside of navigate() maybe even in Firefox.
Blocks: e10s-tests
tracking-e10s: --- → +
a lot of the navigation code has been rewritten and fixed since this was raised. I suspect this has been fixed already.
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INCOMPLETE
Attached file Marionette testcase
No, this is still a problem. Here the updated testcase.

But this is really very special code! So fixing it has low priority.
Attachment #8680934 - Attachment is obsolete: true
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
OS: Unspecified → All
Priority: -- → P5
Hardware: Unspecified → All
You need to log in before you can comment on or make changes to this bug.