Closed Bug 1332122 Opened 5 years ago Closed 4 years ago

Navigating to file:// URLs times out in Marionette

Categories

(Testing :: Marionette, defect, P1)

53 Branch
defect

Tracking

(firefox53 affected, firefox56 fixed)

RESOLVED FIXED
mozilla56
Tracking Status
firefox53 --- affected
firefox56 --- fixed

People

(Reporter: jugglinmike, Assigned: whimboo)

References

(Blocks 1 open bug)

Details

(Whiteboard: [spec][17/07])

Attachments

(3 files)

Firefox Nightly (version "53.0a1 (2017-01-18)" at the time of this writing)
does not respond to the "Go" command [1] when the provided URL uses the
`file://` protocol. The browser's stable release (version "50.1.0" at the time
of this writing) returns as expected.

Demonstration using the W3C's WebDriver Python bindings [2]:

    import webdriver

    s = webdriver.Session(host='127.0.0.1', port=4444)
    s.start()
    s.url = 'file:///'
    s.end()

Expected: the browser navigates to the location `file:///`, the HTTP connection
closes, and the script exits successfully.

Actual: the browser navigates to the location `file:///`, the the HTTP
connection times out, and the script exits with an error.

[1] https://w3c.github.io/webdriver/webdriver-spec.html#go
[2] https://github.com/w3c/wdclient
Component: Developer Tools → Marionette
Product: Firefox → Testing
another case that we need to handle for #go
Blocks: webdriver
I will have a look at those hang issues once I'm done with the current window handling work.
Summary: [webdriver] `POST /session/{session id}/url` times out for `file://` URLs → Navigating to file:// URLs times out in Marionette
Duplicate of this bug: 1330935
I want to add that file:// URLs are special. As such the spec has been changed lately, so we do not have to wait for those. 

While trying to get this in as part of my work for bug 1291320, I figured out that it's causing a huge issue for us in Marionette. Reason is that loading such a URL will cause a remoteness change to occur. It means that if we return immediately the remoteness change can happen when we are sending already the next command to the listener. As such the framescript gets initialized again, which causes a hang in Marionette.

Also I cannot shortly delay the return of listener.get() because if the remoteness change happens before our return, we trigger the pollForReadyState, which then will never finish due to missing events. 

I think that we should defer the support for file URLs as long as we do not have a proper solution to wait for specific load events to listening for.
[mass] Setting priority
Priority: -- → P1
It’s a bit unclear to me what you plan to do here.

(In reply to Henrik Skupin (:whimboo) from comment #4)
> I want to add that file:// URLs are special. As such the spec has been
> changed lately, so we do not have to wait for those. 
> 
> While trying to get this in as part of my work for bug 1291320, I figured
> out that it's causing a huge issue for us in Marionette. Reason is that
> loading such a URL will cause a remoteness change to occur. It means that if
> we return immediately the remoteness change can happen when we are sending
> already the next command to the listener. As such the framescript gets
> initialized again, which causes a hang in Marionette.

I agree with handling the file: protocol with special care.  Since it is loaded from local disk and is an internal Gecko document, can we possibly skip checking for the otherwise relevant DOMContentLoaded and document.readyState changes?  Would that help?

> Also I cannot shortly delay the return of listener.get() because if the
> remoteness change happens before our return, we trigger the
> pollForReadyState, which then will never finish due to missing events.

So document.readyState if I load file:///home/ato gets set to "complete", but I assume we fail to intercept the DOMContentLoaded events?  If so, could we have an event-less version of pollForReadyState?

Another possibility is to maybe make testing/marionette/driver.js only call pollForReadyState if the requested URL is not of the file: protocol.

> I think that we should defer the support for file URLs as long as we do not
> have a proper solution to wait for specific load events to listening for.

I agree with this sentiment.  Returning early is far better than never returning at all.
(In reply to Andreas Tolfsen ‹:ato› from comment #6)
> I agree with handling the file: protocol with special care.  Since it is
> loaded from local disk and is an internal Gecko document, can we possibly
> skip checking for the otherwise relevant DOMContentLoaded and
> document.readyState changes?  Would that help?

That's my plan, yes.

> So document.readyState if I load file:///home/ato gets set to "complete",
> but I assume we fail to intercept the DOMContentLoaded events?  If so, could
> we have an event-less version of pollForReadyState?

That won't work because you don't know if the current readyState `complete` is for the last page load, or the current one. So there are also chances that you return too early. I would really simply skip the wait for events.

> Another possibility is to maybe make testing/marionette/driver.js only call
> pollForReadyState if the requested URL is not of the file: protocol.

`pollForReadyState` is no more. But this proposal is similar to the first paragraph of your last comment. And that is what I want to use.

> > I think that we should defer the support for file URLs as long as we do not
> > have a proper solution to wait for specific load events to listening for.
> 
> I agree with this sentiment.  Returning early is far better than never
> returning at all.

Keep in mind that we will only defer a possible hang to the next command. This is due the file:// protocol is causing a remoteness change, and all the commands which are using the new dispatching method simply cannot handle that. 

I'm happy to do this change so that we can mark it as fixed, but we definitely have to come up with a follow-up for the dispatcher.
It looks like that the problem here is a bit more complicated, or we simply do something wrong in registering our content framescript. What I have seen now when running a test with loading a file:// URL, we still receive the following DOM events:

*** Received frame event: beforeunload
*** Received frame event: pagehide
*** Received frame event: blur
*** Received frame event: unload

Then our framescript is dead, and as it looks like its not getting registered anymore. So the currently active message id is not getting further processed.

Now the interesting part is that loading a file: URL will turn on the remoteness flag. Means when a none remoteness tab has been opened (like about:robots) before, the remoteness state flips and the requested file: url gets correctly loaded with all the events we usually expect from a page load (DOMContentLoaded, pageshow). BUT, if a remoteness tab is already there, the framescript gets reloaded but without a remoteness flip. That's an interesting behavior which we might not have accounted for yet. As result we run registerSelf, but we no longer continue to process the currently active message. It's simply dead.

This gives me a feeling that there is a misbehavior in registering our frame script, if it's getting reloaded but no remoteness change is happening.
So from further inspection of this problem and help from Olli on IRC, the issue seems to lay in `linkedBrowser.outerWindowId`, which returns `null` in case of a remote->remote process switch as caused by loading a file:// URL. But it returns a valid outer window id in case of a non-remote->remote process switch.

Here how we retrieve the contentBrowser:
https://dxr.mozilla.org/mozilla-central/rev/0b77ed3f26c5335503bc16e85b8c067382e7bb1e/testing/marionette/browser.js#144

And this code will try to access the outerWindowId:
https://dxr.mozilla.org/mozilla-central/rev/0b77ed3f26c5335503bc16e85b8c067382e7bb1e/testing/marionette/driver.js#1221

Maybe something is wrong with the CPOW? It looks like this is used given the comment above the method of the last link.

Bob, do you have an idea what could be wrong here?
Flags: needinfo?(bobowencode)
Summary: Navigating to file:// URLs times out in Marionette → Navigating to file:// URLs times out in Marionette if loaded in an al
I just tested the other way around. Means from a loaded file:// URL to a HTTP page, and the same problem also appears. So it should be something related to the child process change.
Summary: Navigating to file:// URLs times out in Marionette if loaded in an al → Navigating to file:// URLs times out in Marionette
(In reply to Henrik Skupin (:whimboo) from comment #9)
> So from further inspection of this problem and help from Olli on IRC, the
> issue seems to lay in `linkedBrowser.outerWindowId`, which returns `null` in
> case of a remote->remote process switch as caused by loading a file:// URL.
> But it returns a valid outer window id in case of a non-remote->remote
> process switch.
> 
> Here how we retrieve the contentBrowser:
> https://dxr.mozilla.org/mozilla-central/rev/
> 0b77ed3f26c5335503bc16e85b8c067382e7bb1e/testing/marionette/browser.js#144
> 
> And this code will try to access the outerWindowId:
> https://dxr.mozilla.org/mozilla-central/rev/
> 0b77ed3f26c5335503bc16e85b8c067382e7bb1e/testing/marionette/driver.js#1221
> 
> Maybe something is wrong with the CPOW? It looks like this is used given the
> comment above the method of the last link.
> 
> Bob, do you have an idea what could be wrong here?

As far as I can tell, for a remote browser, outerWindowID never returns something that will work with getOuterWindowWithId in the parent process.
This is actually got from [1] and set from the child at [2], I think.

If you call outerWindowID on the non-remote browser that ID will return a window from getOuterWindowWithId.
It still seems to return the original non-remote window after navigating to file:// (or http://), although I'm not sure why this hasn't been cleaned up or how long it would live.

I'm not quite clear what the ID is used for here, could using getBrowserForOuterWindowID, then contentWindowAsCPOW to get to the window help?
I think that might work with non-remote and remote IDs, but I'm not totally sure.

[1] https://hg.mozilla.org/mozilla-central/file/33b92d9c4056/toolkit/content/widgets/remote-browser.xml#l221
[2] https://hg.mozilla.org/mozilla-central/file/33b92d9c4056/toolkit/content/widgets/remote-browser.xml#l444
Flags: needinfo?(bobowencode)
There is no way for us to use `contentWindowAsCPOW` on the linkedBrowser because we do not allow CPOWs in Marionette. So this proposal will not work.

Interestingly we hit the same issue with the outerid == null on bug 1363368 now. So I would like to continue on this particular problem there.
Depends on: 1363368
I’m afraid this is going to be a bit off-topic, but I feel this is worth noting down for posterity’s sake.

We found that testing/web-platform/tests/webdriver/navigation.py’s file: URL test to start passing when Firefox is built with the RELEASE_OR_BETA ifdef (which can be triggered by removing the "a"(lpha) character from config/milestone.txt).  If I understand correctly, this causes the sandboxed processes to be disabled.  When these are enabled, which they are by default in Nightly, a content frame script reload occurs when navigating to a file: URL with Marionette.

Marionette currently assumes that all frame script reloads are caused by remoteness changes, so this bug is really about making it possible for Marionette to recover from situations where the script gets reloaded but not assuming it is coupled with a remoteness change.  Unfortunately, this only happens with file: URLs at the moment.
I picked this bug up again today and I can say that I have a working solution locally now which allows us to correctly handling a page load for file:// with a reload of the frame script included, but without a remoteness change. I will have to check carefully now, which if conditions for remoteness changes we can get rid off.
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
Depends on: 1378191
Whiteboard: [spec][17/07]
So the only dependency we actually have here for these changes to land is bug 1378191.
No longer depends on: webdriver-navigate, 1363368
I'm about to push a WIP patch. It still needs refactoring, but lets see what a try build tells...
With the latest patch I still have problems that a navigation command hangs intermittently, but frequently enough so I can investigate. Interestingly it is the new test which I have recently written for bug 1378191, and which seems to exactly reproduce what we see in bug 1368434. I will give more information once I understand what's going on.
So this is actually the worst bug I found in Marionette so far which might cause all of our intermittent issues! I will post some details to bug 1368434, and will have to get this fixed first!
No longer blocks: 1368434
Depends on: 1368434
Whereby thinking more about it bug 1368434 could even be something else. Reason is that affected tests do not use the `start_session` command in a way as described above. So lets get it fixed via bug 1378810, and I can check bug 1368434 afterward.
Blocks: 1368434
Depends on: 1378810
No longer depends on: 1368434
Duplicate of this bug: 1378810
I cannot actually spin out this patch to bug 1378810 because it highly depends on the frame script reload changes I did in this patch series. So I will have to keep it in here, even with the risk that we cannot uplift it to beta and/or esr52.
Latest try build for the code I'm about to upload is:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b448d1043b4a52011e752bd2a43deb588b8d2f9e

So far all fine!
Attachment #8883682 - Flags: review?(dburns)
Attachment #8883683 - Flags: review?(dburns)
Attachment #8883684 - Flags: review?(dburns)
Comment on attachment 8883682 [details]
Bug 1332122 - Remove unnecessary wrapper browser.startSession().

https://reviewboard.mozilla.org/r/154588/#review160064
Attachment #8883682 - Flags: review?(dburns) → review+
Comment on attachment 8883683 [details]
Bug 1332122 - Re-register the browser for frame script reloads.

https://reviewboard.mozilla.org/r/154590/#review160072
Attachment #8883683 - Flags: review?(dburns) → review+
Comment on attachment 8883684 [details]
Bug 1332122 - Add unit tests for navigating to file:// URLs.

https://reviewboard.mozilla.org/r/154592/#review160074
Attachment #8883684 - Flags: review?(dburns) → review+
Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/98b4cbf90a4a
Remove unnecessary wrapper browser.startSession(). r=automatedtester
https://hg.mozilla.org/integration/autoland/rev/3f3ffe6ae813
Re-register the browser for frame script reloads. r=automatedtester
https://hg.mozilla.org/integration/autoland/rev/c5738f17acb9
Add unit tests for navigating to file:// URLs. r=automatedtester
https://hg.mozilla.org/mozilla-central/rev/98b4cbf90a4a
https://hg.mozilla.org/mozilla-central/rev/3f3ffe6ae813
https://hg.mozilla.org/mozilla-central/rev/c5738f17acb9
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Depends on: 1381343
You need to log in before you can comment on or make changes to this bug.