Closed Bug 666801 Opened 9 years ago Closed 7 years ago

Handle webProgress for out-of-process content

Categories

(Core :: DOM: Navigation, defect)

x86
All
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla23

People

(Reporter: Felipe, Assigned: billm)

References

(Blocks 1 open bug)

Details

(Keywords: dev-doc-needed, Whiteboard: [e10s])

Attachments

(4 files, 6 obsolete files)

19.35 KB, patch
Details | Diff | Splinter Review
4.15 KB, patch
smaug
: review+
Details | Diff | Splinter Review
8.73 KB, patch
Felipe
: review+
Details | Diff | Splinter Review
20.75 KB, patch
Felipe
: review+
Details | Diff | Splinter Review
Code that relies on webProgress is currently broken for out-of-process content. Some of them were disabled in bug 663042 for e10s-compat mode.
Depends on: 663042
Is this bug asking for browser.webProgress to continue to work as it does now? Is this feasible at all?
(In reply to comment #1)
> Is this bug asking for browser.webProgress to continue to work as it does
> now? Is this feasible at all?

No specific direction on how to approach this was assumed when filing these bugs. They were filed as a way to have a tracking place for everything that was disabled by bug 663042.

For this specific case, it's worth noting that about half of the things disabled in 663042 were due to the lack of webProgress.

From my understanding, ideally the preferred options would be to fix in a way that is clean and:

 1 - can keep consumers intact
 2 - can keep consumers intact if CPOWs are used
 3 - can keep consumers intact if running in single-process mode
 4 - break
(In reply to Felipe Gomes (:felipe) from comment #2)

> From my understanding, ideally the preferred options would be to fix in a
> way that is clean and:
> 
>  1 - can keep consumers intact
>  2 - can keep consumers intact if CPOWs are used
>  3 - can keep consumers intact if running in single-process mode
>  4 - break

We tried to do #1 and #3 in Fennec. We ended up using #4.

#1 just wasn't feasible since many of the nsIWebProgress properties are not available and neither is nsIRequest
#3 just makes for complicated code and needed to test "am I in the child process" in a lot of places.

#4 was the simplest method, resulting in the simplest code. We also reduced the IPC overhead to "just what we needed and no more" instead of adding support for features we didn't even use.
(In reply to Mark Finkle (:mfinkle) from comment #3)
> (In reply to Felipe Gomes (:felipe) from comment #2)
> 
> > From my understanding, ideally the preferred options would be to fix in a
> > way that is clean and:
> > 
> >  1 - can keep consumers intact
> >  2 - can keep consumers intact if CPOWs are used
> >  3 - can keep consumers intact if running in single-process mode
> >  4 - break
> 
> We tried to do #1 and #3 in Fennec. We ended up using #4.
> 
> #1 just wasn't feasible since many of the nsIWebProgress properties are not
> available and neither is nsIRequest
> #3 just makes for complicated code and needed to test "am I in the child
> process" in a lot of places.
> 
> #4 was the simplest method, resulting in the simplest code. We also reduced
> the IPC overhead to "just what we needed and no more" instead of adding
> support for features we didn't even use.

Considering existing Firefox extensions:
* "just what we need" is probably all of nsIWebProgressListener[2]
* #4 is going to be extremely painful
(In reply to Dão Gottwald [:dao] from comment #4)

> Considering existing Firefox extensions:
> * "just what we need" is probably all of nsIWebProgressListener[2]
> * #4 is going to be extremely painful

With e10s, you could substitute "nsIWebProgressListener" with many, many interfaces. It's a slippery slope, and suddenly you create some CPOW solution that removes any improvements e10s gives you.

For example, "all of nsIWebProgressListener" means you'll need "all of" nsIWebProgress, nsIDOMWindow, nsIDOMDocument, nsIRequest, (many, many more ...)

But I do agree that #4 is a big, scary change.
(In reply to Mark Finkle (:mfinkle) from comment #5)
> It's a slippery slope, and suddenly you create some CPOW
> solution that removes any improvements e10s gives you.

It's not all or nothing; we should still rewrite our code (although this could then happen incrementally without blocking enabling e10s in nightlies). I just think backward compatibility might be necessary for a huge pile of add-ons that aren't going to be ready in time or would be abandoned altogether.

An alternative solution might be to disable e10s when we detect a legacy add-on. I don't know if this makes more sense technically.
Let's not make this bug too meta: the plans for extension compat were already posted months ago to dev.planning, and at this point we know we cannot implement enough compatibility to actually be helpful to most extension authors.
Felipe: any update here?
So with some work we can expose the nsIChannel on the parent (i.e we can map the nsIRequest in the child to an ID, that when passed to the parent via the messageManager or whatever can be used to retrieve the "real" nsIRequest on the parent that is servicing the child nsIRequest).

From talking to felipe, it sounds like the plan is to try to deprecate nsIWebProgress.DOMWindow.   Those two things together ought to provide a workable solution to this?

I guess my questions is: assuming removing .DOMWindow is a huge API change, do we really need the full nsIRequest too?   It sounds like most consumers are either just using it to get the requests' .URI property, which would be a lot easier to pass.  And/or they're using nsBrowserStatusFilter, which already passes nsnull for nsIRequest most of the time (but not all: see OnStateChange).

The necko plumbing involved would be non-trivial, so I just want to make sure we actually need it before we choose that solution.
Attached patch WIP v1Splinter Review
First version of the patch that is working reliably for the most relevant progress notifications. Note that this patch goes on top of the patch from bug 662008.

With the patch applied and Drew's content-log turned on (and filtering out calls from sss_observe), the log for loading www.google.com is reduced from 2000 to 400 lines on average. There's still a progress listener in nsLoginManager that I haven't touched.

As Gavin suggested, I've ifdef'ed out some parts for nsIRequest until we decide about Jason's point in comment 9.

This patch still needs work but the basic idea is there. Also need to think about nsIWebProgressListener2's onRefreshAttempted which is sync.

Taking a brief look at the log (will run better measures soon), one important next thing to tackle would be loadURI/setCurrentURI.
Assignee: nobody → felipc
Status: NEW → ASSIGNED
Blocks: 474464
I'm very concerned about this, and I believe that the preferable approach is to refactor the code (approach #4) and we should only be taking the current approach if that is completely unreasonable.

My primary concern is that existing code will be making state assumptions which just aren't correct when these messages are delivered asynchronously: if they then perform actions on content without ensuring that the same content is still loaded, it's very easy to have subtle and dangerous security bugs.

The current patch doesn't implement progress changes, which is necessary for performance, but it also means that we'll silently be hiding a bunch of incompatibilities. I think that if we do this route we'll be taking on a lot of unwarranted technical debt. It's better to get this done more slowly, IMO.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #11)
> My primary concern is that existing code will be making state assumptions
> which just aren't correct when these messages are delivered asynchronously:
> if they then perform actions on content without ensuring that the same
> content is still loaded, it's very easy to have subtle and dangerous
> security bugs.

I'm not sure what kind of actions you mean and how this would be different if front-end code replaced the proposed progress listener messages with some more abstract (still async) messages...
> I'm not sure what kind of actions you mean

For instance, in non-e10s, when we supply the necko channel for all these notifications, it's done synchronously, so for STATE_TRANSFERRING the channel would still be in the state of transferring data.   Under the e10s model proposed here, the web progress notification would be sent back to the parent from the child, and the parent channel we provide might well be done sending data (in fact, we'd probably have to take extra steps to keep it alive for the progress notification).  So despite the API being the same, the actual state provided to the consumer is different.
(In reply to Jason Duell (:jduell) from comment #13)
> > I'm not sure what kind of actions you mean
> 
> For instance, in non-e10s, when we supply the necko channel for all these
> notifications, it's done synchronously, so for STATE_TRANSFERRING the
> channel would still be in the state of transferring data.   Under the e10s
> model proposed here, the web progress notification would be sent back to the
> parent from the child, and the parent channel we provide might well be done
> sending data

What security implications does this have? Note that "might be done sending data" is true for any async progress message.
The difference is that if we built an API which was obviously incompatible with the existing API, we can do security auditing at review time to make sure that assumptions are correct. The point of the current patch is to get things to work mostly automatically, but with possibly dangerous side effects.

Basically any action where the frontend code has one of these notifications and then acts on content based on the URI or other data is subject to these race conditions. The content may already be looking at a different page, or may have changed its state in other ways.
Keeping compatibility and auditing aren't mutually exclusive.
We won't be able to audit add-ons, one way or another; nothing guarantees they'll correctly use a new async API.
(In reply to Dão Gottwald [:dao] from comment #16)
> Keeping compatibility and auditing aren't mutually exclusive.
> We won't be able to audit add-ons, one way or another; nothing guarantees
> they'll correctly use a new async API.

I agree - breaking all of the existing code to encourage auditing doesn't seem like the right tradeoff at this point. Taking this approach for now lets us focus on getting the various other pieces we need working, and doesn't really prevent us from revisiting a more invasive approach at some later point.
Blocks: 666809
I'd like to narrow the scope of this bug a little bit. The new, less ambitious goals are:

1. use shims to make the existing webProgress code in Firefox work when browser.tabs.remote is set
2. don't break anything when browser.tabs.remote is false

In the future we'll still have to deal with addon compatibility issues. However, I think it will be a lot easier to make those choices if we have a working multiprocess Firefox with CPOWs so that developers and addon authors can figure out what architecture works best.

The first patches I'm posting are just refactorings that shouldn't change semantics. The last patch adds shims for webProgress, but only for remote browser elements. Normal firefox should be unaffected.
This patch changes all the places where we do |aWebProgress.DOMWindow == content| to instead call a function. It shouldn't have any semantic effects.

The only one I'm not sure about is |aWebProgress.DOMWindow == aWebProgress.DOMWindow.top|, so please give that one an extra look.
Attachment #737746 - Flags: review?(felipc)
Using the DOMWindow property of webProgress won't work in e10s, but it's okay to use IDs. This patch adds a DOMWindowID attribute to nsIWebProgress that should match the ID of the DOMWindow attribute.
Attachment #737747 - Flags: review?(bugs)
This patch changes the WPIsTopLevel functions that were added earlier so that they use the window ID, rather than the window itself. (I added the getter for the inner window ID in anticipation that we might use it later, but it's not used in this bug.) Again, this patch shouldn't change the behavior of anything.
Attachment #737751 - Flags: review?(felipc)
Attached patch add shims for webProgress (obsolete) — Splinter Review
This is the main patch. It works roughly as follows:

- It adds a new XBL binding for remote <browser> elements. Mostly this binding just proxies webProgress and a few other fields.

- It adds the files browser-child.js and browser-parent.js. They run in the child and parent processes. We already have a content.js that runs code loaded into the content process. My thinking is that -parent/-child.js would only be used when browser.tabs.remote is true. Code that uses the message manager either way would still go in content.js.

- The code in browser-child.js just listens for WebProgress notifications and forwards them to the parent.

- The parent receives them in browser-parent.js and calls any listeners that have been registered, passing them shims for the nsIWebProgress object as well as the request.

- I also threw some code to handle nsIWebNavigation into browser-child.js. That could go in a separate bug, but it's pretty simple, so I left it in.
Attachment #737766 - Flags: review?(felipc)
Attached patch add shims for webProgress, v2 (obsolete) — Splinter Review
A few stray pieces ended up in that last patch. Removed.
Attachment #737766 - Attachment is obsolete: true
Attachment #737766 - Flags: review?(felipc)
Attachment #737770 - Flags: review?(felipc)
Comment on attachment 737746 [details] [diff] [review]
refactor code to test if nsIWebProgress if for top-level window

"aBrowser.contentWindow == aWebProgress.DOMWindow" is "aWebProgress is notifying for the the top-level window in aBrowser".

"aWebProgress.DOMWindow == aWebProgress.DOMWindow.top" is "aWebProgress is notifying for its top-level window", which isn't the same check at all (aBrowser isn't relevant). So that part needs to be fixed.

Otherwise this looks good, but it would be best to avoid adding these helper functions in the global scope. A separate module (WPLUtils.jsm?) might be better. It depends on what you see as the next step for this, I guess. (Will these helpers stay around indefinitely, or are they a temporary measure? What will their implementations look like in e10s mode?)
Attachment #737746 - Flags: review?(felipc) → review-
Would it make sense for nsIWebProgress to expose something like an isTopLevel property?
(In reply to Dão Gottwald [:dao] from comment #25)
> Would it make sense for nsIWebProgress to expose something like an
> isTopLevel property?

FWIW this was the approach that was being taken last time, and can be seen in the "WIP v1" patch
Blocks: fxe10s
(In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment #24)
> "aBrowser.contentWindow == aWebProgress.DOMWindow" is "aWebProgress is
> notifying for the the top-level window in aBrowser".
> 
> "aWebProgress.DOMWindow == aWebProgress.DOMWindow.top" is "aWebProgress is
> notifying for its top-level window", which isn't the same check at all
> (aBrowser isn't relevant). So that part needs to be fixed.

I spoke to billm about this in person. I was wrong about this mattering; while there is a theoretical difference, in practice this doesn't matter for this code because aWebProgress in this case is guaranteed to be notifying for aBrowser or one of its child docshells.

I think it makes sense to add both the window ID as well as an isTopLevel property, to make the common case more easy to read.
Comment on attachment 737746 [details] [diff] [review]
refactor code to test if nsIWebProgress if for top-level window

I talked to Felipe, and we were thinking that the istoplevel stuff could go in a separate object (maybe WebProgressUtils?) to avoid adding too many names to the global. I can put them there in the final patch.
Attachment #737746 - Flags: review?(felipc)
(In reply to Bill McCloskey (:billm) from comment #28)
> I talked to Felipe, and we were thinking that the istoplevel stuff could go
> in a separate object (maybe WebProgressUtils?) to avoid adding too many
> names to the global.

If we add isTopLevel to nsIWebProgress, then we won't need this helper at all.
Comment on attachment 737746 [details] [diff] [review]
refactor code to test if nsIWebProgress if for top-level window

Posting some new patches to address the feedback so far.
Attachment #737746 - Flags: review?(felipc)
This adds an isTopLevel attribute as Dão suggested.
Assignee: felipc → wmccloskey
Attachment #737747 - Attachment is obsolete: true
Attachment #737747 - Flags: review?(bugs)
Attachment #739356 - Flags: review?(bugs)
This replaces all the DOMWindow == content checks (and the related ones) with isTopLevel checks. Based on the discussions I've had with Gavin and Felipe, I think this is a correct refactoring.
Attachment #737751 - Attachment is obsolete: true
Attachment #737751 - Flags: review?(felipc)
Attachment #739357 - Flags: review?(felipc)
Attached patch add shims for webProgress, v3 (obsolete) — Splinter Review
New version of the main patch that forwards isTopLevel from the child to the parent process.
Attachment #737770 - Attachment is obsolete: true
Attachment #737770 - Flags: review?(felipc)
Attachment #739358 - Flags: review?(felipc)
Attachment #737746 - Attachment is obsolete: true
Comment on attachment 739357 [details] [diff] [review]
use isTopLevel attribute in browser code

Review of attachment 739357 [details] [diff] [review]:
-----------------------------------------------------------------

Sweet! this is so much nicer even for the existing single-process case
Attachment #739357 - Flags: review?(felipc) → review+
Comment on attachment 739358 [details] [diff] [review]
add shims for webProgress, v3

Review of attachment 739358 [details] [diff] [review]:
-----------------------------------------------------------------

I don't know what else you envision being added to browser-parent.js, but from its current content I think it lends itself quite well to being a RemoteWebProgress.jsm instead of a #include (which will have the benefit of being lazily loaded only for remote cases).
I think the main issue is how it makes the .xml binding dependent on other JS that exists in the context (i.e. your browser.xml would be unusable in a context where browser.js isn't present). We have a few instances of this already but I wanna reduce it rather than add more.

(I'm aware that same thing happens with the new gMultiProcessBrowser boolean and I'm now thinking what to do about it.. but that is basically a reflection of a pref so we can easily remove the variable dependency if we want)


feedback+ for this question and some commentary below, but this looks good.

::: browser/base/content/browser-child.js
@@ +20,5 @@
> +    if (!aRequest)
> +      return null;
> +    if (aRequest instanceof Ci.nsIChannel)
> +      return aRequest.QueryInterface(Ci.nsIChannel).URI.spec;
> +    return undefined;

is there a difference in behavior here between null and undefined? if so add a comment, otherwise make the two the same

@@ +24,5 @@
> +    return undefined;
> +  },
> +
> +  _setupJSON: function setupJSON(aWebProgress, aRequest) {
> +    return { isTopLevel: aWebProgress.DOMWindow ? aWebProgress.DOMWindow == content : null,

you can now use your new aWebProgress.isTopLevel getter

@@ +47,5 @@
> +
> +    let json = this._setupJSON(aWebProgress, aRequest);
> +    json.documentURI = aWebProgress.DOMWindow.document.documentURIObject.spec;
> +    json.location = spec;
> +    json.canGoBack = docShell.canGoBack;

just fyi, canGoBack/canGoForward can throw. Perhaps it's unecessary to wrap this in a try if the throw conditions will never happen here, but it's probably better to do it anyway and avoid headaches in the future when Content:LocationChange doesn't get sent for mysterious reasons

@@ +86,5 @@
> +WebProgressListener.init();
> +
> +let WebNavigation =  {
> +  _webNavigation: docShell.QueryInterface(Ci.nsIWebNavigation),
> +  _timer: null,

_timer unused?

::: browser/base/content/browser-parent.js
@@ +11,5 @@
> +
> +RemoteWebProgressRequest.prototype = {
> +  QueryInterface : XPCOMUtils.generateQI([Ci.nsIChannel]),
> +
> +  get URI() { return this.uri; }

Do we know all usages of this getter? If we don't easily or it's accessible by add-ons it's better to do a this.uri.clone() to avoid someone changing the original ref (unless this is expected to be done)

@@ +39,5 @@
> +  _init: function WP_Init() {
> +    this._browser.messageManager.addMessageListener("Content:StateChange", this);
> +    this._browser.messageManager.addMessageListener("Content:LocationChange", this);
> +    this._browser.messageManager.addMessageListener("Content:SecurityChange", this);
> +    this._browser.messageManager.addMessageListener("Content:StatusChange", this);

I assume this will also need destructor handling to remove these listeners and the _browser ref, otherwise things will leak.

::: browser/base/content/browser.xml
@@ +1,1 @@
> +<?xml version="1.0"?>

Even though metro has a file with this name now, I'd prefer if we named this one as remote-browser.xml specially if we're only gonna use it for the remote binding.
Attachment #739358 - Flags: review?(felipc) → feedback+
Thanks, Felipe. I made all the changes you suggested except the one about canGoBack. If this fail, I think it should fail in a way that's noticeable. However, I still need to look into what happens if a content script throws an exception.
Attachment #739358 - Attachment is obsolete: true
Attachment #739903 - Flags: review?(felipc)
Comment on attachment 739903 [details] [diff] [review]
add shims for webProgress, v4

Review of attachment 739903 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Bill McCloskey (:billm) from comment #36)
> Thanks, Felipe. I made all the changes you suggested except the one about
> canGoBack. If this fail, I think it should fail in a way that's noticeable.

sounds good

::: browser/base/content/browser-child.js
@@ +24,5 @@
> +
> +  _setupJSON: function setupJSON(aWebProgress, aRequest) {
> +    return { isTopLevel: aWebProgress.isTopLevel,
> +             requestURI: this._requestSpec(aRequest)
> +           };

traditional style for more than 1 line is usually:

  return {
    isTopLevel: aWebProgress.isTopLevel,
    requestURI: this._requestSpec(aRequest)
  };

@@ +51,5 @@
> +    json.charset = charset.toString();
> +
> +    sendAsyncMessage("Content:LocationChange", json);
> +
> +    let self = this;

leftover

::: browser/modules/RemoteWebProgress.jsm
@@ +54,5 @@
> +  _destroy: function WP_Destroy() {
> +    this._browser.messageManager.removeMessageListener("Content:StateChange", this);
> +    this._browser.messageManager.removeMessageListener("Content:LocationChange", this);
> +    this._browser.messageManager.removeMessageListener("Content:SecurityChange", this);
> +    this._browser.messageManager.removeMessageListener("Content:StatusChange", this);

should probably also set this._browser = null; for good measure

@@ +83,5 @@
> +    this._isTopLevel = aMessage.json.isTopLevel;
> +
> +    switch (aMessage.name) {
> +    case "Content:StateChange":
> +      var req = this._uriSpec(aMessage.json.requestURI);

s/var/let/g in this function?
Attachment #739903 - Flags: review?(felipc) → review+
Comment on attachment 739356 [details] [diff] [review]
add DOMWindowID and isTopLevel to nsIWebProgress


>+NS_IMETHODIMP
>+nsBrowserStatusFilter::GetIsTopLevel(bool *aIsTopLevel)
>+{
>+    NS_NOTREACHED("nsBrowserStatusFilter::GetIsTopLevel");
>+    return NS_ERROR_NOT_IMPLEMENTED;
>+}
Could you still assign false to the out param



>+nsDocLoader::GetDOMWindowID(uint64_t *aResult)
>+{
Assign 0 to out param here.


>+nsDocLoader::GetIsTopLevel(bool *aResult)
>+{
assign false to out param here
Attachment #739356 - Flags: review?(bugs) → review+
Can we put remote-browser.xml in toolkit, rather than in browser?
Blocks: 884132
Comment on attachment 739356 [details] [diff] [review]
add DOMWindowID and isTopLevel to nsIWebProgress

>+NS_IMETHODIMP
>+nsDocLoader::GetIsTopLevel(bool *aResult)
>+{
>+  nsCOMPtr<nsIDOMWindow> window;
>+  nsresult rv = GetDOMWindow(getter_AddRefs(window));
>+  NS_ENSURE_SUCCESS(rv, rv);
>+
>+  nsCOMPtr<nsPIDOMWindow> piwindow = do_QueryInterface(window);
>+  NS_ENSURE_STATE(piwindow);
>+
>+  nsCOMPtr<nsIDOMWindow> topWindow;
>+  rv = piwindow->GetTop(getter_AddRefs(topWindow));
GetTop is actually a function on nsIDOMWindow (no QI needed) and works by getting information from the nsDocShell derived from this object, so the easiest way to implement this would actually be to stub it out here and override it in nsDocShell along the lines of function isTopLevel() { return !this.sameTypeParent; }
Blocks: 910838
Added dev-doc-needed keyword: this added DOMWindowID and isTopLevel attributes to nsIWebProgress, these are still not listed on https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Reference/Interface/nsIWebProgress.
Keywords: dev-doc-needed
You need to log in before you can comment on or make changes to this bug.