Closed Bug 573635 Opened 14 years ago Closed 14 years ago

e10s: resolve changes to prompt code from m-c merge

Categories

(Core :: General, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
fennec 2.0a1+ ---

People

(Reporter: jduell.mcbugs, Assigned: dougt)

References

Details

Attachments

(3 files, 4 obsolete files)

While trying to merge from m-c to e10s, we get a conflict that embedding/components/windowwatcher/src/nsPromptService.cpp has been removed in m-c (replaced apparently with nsPrompter.js).  But smaug has concurrently made some changes to the file to support forwarding modal (chrome) dialogs from content to chrome (bug 548847)

I'm not sure which bug/patch removed nsPromptService.cpp, since hg log apparently doesn't include changesets which delete a file in its output (!).   Apparently dolske made the change.

dolske on irc: "that's going to need significant changes. Basically nsPrompter.js needs to be made to do what the old nsPromptService.cpp stuff was doing, and all the nsIDialogParamBlock stuff changes to the a nsIPropertyBag."

I have not checked in the m-c->e10s merge yet, but plan to simply delete nsPromptService.cpp for now.
Giving to dougt, since he asked to be assigned to it.
Bug 563274 landed the prompting overhaul that's in play here.
After looking into this, here are my comments and thoughts:

1. We might modify nsPrompter.js, the file parallel to the removed nsPromptService.cpp, with the (JavaScript version of) the changes smaug made prior to the removal to the removed file. However, this doesn't look nice for several reasons:

1.a. nsIWindowWatcher.idl is frozen, and one method we would need to access, GetWindowTreeOwner, appears only in nsWindowWatcher.cpp, not in the idl. We can add an nsIWindowWatcher2.idl I guess (but do we want to?). A further issue is that GetWindowTreeOwner is static.

1.b. The arguments in nsPrompter.js and the e10s work smaug did on prompts are not of the same kind, and it doesn't seem there is any obvious and clean way to translate them (one is a key-value dictionary, the other a list of ints and strings).

2. Alternatively, since nsPrompter.js is in JavaScript, we can in theory use the messageManager code we have been using in mobile e10s to remote things (like e.g. the InstallTrigger). However,

2.a. Turns out that the remoting of prompts never worked in mobile e10s - only regular e10s. So we will need to fix that as a separate issue, and meanwhile can't use fennec for testing purposes here. But anyhow, I am not even sure we have in regular e10s the tools like the messageManager (which we would access from browser.js and frame-content.js in mobile) - do we?

2.b. This approach would entail basically replacing what smaug did. Do we want to do that, and can we do that - does anything else depend on it?

2.c. We would need to access a message manager from a component in content, where in the current design it is not reachable. Might be a workaround though.
On content component you can access the TabChildGlobal from the window object
which is opening a dialog.

And about 1b, the old code use DialogParams, I assume the new code uses similar,
just not exactly the same thing. You need to serialize the parameters (key-value list) and then deserialize it in chrome process.
Attached patch message manager approach (obsolete) — Splinter Review
WIP patch for the message manager approach. alert(), notify() and prompt() are remoted using the message manager. Need feedback on both the approach and the details. Some stuff to note:

1. This works in mobile e10s, but will not work in plain e10s. Getting it to work there would require message managers etc. set up like in mobile e10s. Once that is set up (or if it already exists) it would be easy to add there (at which point we could basically remove the previous e10s code for remoting prompts, assuming as I asked before it isn't needed somehow elsewhere).

2. smaug, I didn't quite understand what you said about getting the message manager through TabChildGlobal. How would that be done? Meanwhile the current patch gets the message manager in a hackish way using wrappedJSObject.

3. Aside from alert(), notify() and prompt(), there are other nsIPromptService calls that could be remoted, like alertCheck etc. These weren't remoted in the e10s work previously done, so I wasn't sure if it was needed - is it? If so would be easy to add (but would need a different way to test).

4. Probably should move the new code in browser.js and content.js to separate files. Any ideas about where and what names?
Some notes:
* PromptService.js in Fennec is being overhauled. See bug 573130 and bug 573649.
* If we do use an approach like this, I think we should use bindings/browser.xml and bindings/browser.js to hold the code. When we move that code to toolkit, it should "just work".
TabChildGlobal *is* the content process (or inProcess) message manager.
So the approach in the patch won't work in test-ipc.xul, because sending sync message and then opening a dialog prevents all the painting in the client process.
But I guess that isn't a problem in Fennec? Or is it? What is the dialog is
open for a long time?
(In reply to comment #2)
 
> 1.a. nsIWindowWatcher.idl is frozen

It's worth noting that in bug 563274 we discussed it being ok to unfreeze interfaces as needed in order to make overhauled prompting work sanely. It ended up that I didn't need to, but it's still an option. [And it's likely trunk will be globally de-thawed and bumped to Gecko 2.0 real soon now, which makes the issue moot.]

Not sure why changing the interface is useful to you, but thought I'd note that it is an option.

> 1.b. The arguments in nsPrompter.js and the e10s work smaug did on prompts are
> not of the same kind, and it doesn't seem there is any obvious and clean way to
> translate them 

Please don't try. :) nsIDialogParamBlock is an abomination of an "interface". It would be best to write code to serialize the property bag to a JSON string, and pass that across the process boundary.
Depends on: 573130, 573649
(In reply to comment #7)
> So the approach in the patch won't work in test-ipc.xul, because sending sync
> message and then opening a dialog prevents all the painting in the client
> process.
> But I guess that isn't a problem in Fennec? Or is it? What is the dialog is
> open for a long time?

Due to the type of dialogs shown in Fennec, I don't think it's a problem there. But it would be nice to have it the same way everywhere. However, since the alert(), prompt() etc. are synchronous (they can return what the user enters in the dialog), I am not sure how we could implement them using asynchronous networking. We would need to spin in a message processing loop in content and wait for the response, I guess - can we do that?
Blocks: 573649
No longer depends on: 573649
Attachment #453819 - Flags: review?(josh)
Comment on attachment 453819 [details] [diff] [review]
message manager approach

>diff --git a/chrome/content/browser.js b/chrome/content/browser.js

>+  messageManager.addMessageListener("PrompterOpenPrompt", this);

"Prompt:Open" would be a more conventional message name

>diff --git a/chrome/content/content.js b/chrome/content/content.js
>+function PromptServiceRemoterChild(messageManager) {
>+  var prompter = Cc["@mozilla.org/embedcomp/prompt-service;1"].getService(Ci.nsIPromptService);
>+  prompter.wrappedJSObject.messageManager = messageManager;
>+  prompter.messageManager = messageManager;
>+}

Doing this makes the last open tab be the "winner". The prompt service will only use the last open tab as the source of the alert, confirm or prompt.

Unless you found out that each tab has it's own prompt service. Let me know if this is the case.

>+var runtime = Components.classes["@mozilla.org/xre/app-info;1"].getService(Components.interfaces.nsIXULRuntime);
>+if (runtime.processType == runtime.PROCESS_TYPE_CONTENT) {
>+  var r = new PromptServiceRemoterChild(this);
>+}

content.js is _always_ run from a content-process - no need for the check

>diff --git a/components/PromptService.js b/components/PromptService.js

> function promptService() {
>   let bundleService = Cc["@mozilla.org/intl/stringbundle;1"].getService(Ci.nsIStringBundleService);
>   this._bundle = bundleService.createBundle("chrome://global/locale/commonDialogs.properties");
>+  this.wrappedJSObject = this; // XXX safe?

Yuck

r- until we get the last-open-tab issue figured out

As bad as it sounds, a simpler approach would be to override the window.alert, window.confirm and window.prompt from a content-process script.
Attachment #453819 - Flags: review?(josh) → review-
(In reply to comment #10)
> As bad as it sounds, a simpler approach would be to override the window.alert,
> window.confirm and window.prompt from a content-process script.

How would one do that?
> As bad as it sounds, a simpler approach would be to override the window.alert,
> window.confirm and window.prompt from a content-process script.

mfinkle, is something like this what you meant?
Attachment #457103 - Flags: feedback?(mark.finkle)
(In reply to comment #12)
> Created attachment 457103 [details] [diff] [review]
> Remote alert/confirm/prompt directly
> 
> > As bad as it sounds, a simpler approach would be to override the window.alert,
> > window.confirm and window.prompt from a content-process script.
> 
> mfinkle, is something like this what you meant?

Yes, something just like that. I'd change the names a bit (XxxParent and XxxChild is a bit different than our other naming conventions in JS). I'd be interested to hear what others think of this approach.

If we really want to go with this approach, I'd recommend we move the code to content/bindings/browser.js and .xml, so it's available everywhere.
Comment on attachment 457103 [details] [diff] [review]
Remote alert/confirm/prompt directly

(forgot to feedback+ the approach)
Attachment #457103 - Flags: feedback?(mark.finkle) → feedback+
Comment on attachment 457103 [details] [diff] [review]
Remote alert/confirm/prompt directly

sendSyncMessage blocks painting in the content process, so I think
test-ipc.xul would have painting problems.
But I guess we're going to move the painting to chrome process also in that
case.
New version of direct remoting of alert/etc.. Moved changes to the files mfinkle said, and also the proper tab is now brought to the foreground.

Olli: This still uses sync messages for confirm() and prompt(); I don't see how to avoid that right now (their return values are needed). Rendering is fine in Fennec at least, not sure about elsewhere.
Attachment #457103 - Attachment is obsolete: true
Attachment #457452 - Flags: review?(mark.finkle)
Attachment #457452 - Flags: review?(Olli.Pettay)
(In reply to comment #16)
> Olli: This still uses sync messages for confirm() and prompt(); I don't see how
> to avoid that right now (their return values are needed).
Did you look at the implementation in TabChild::OpenDialog?
(It is broken now because of Bug 563274)


> Rendering is fine in
> Fennec at least, not sure about elsewhere.
I didn't test, but I assume test-ipc.xul doesn't really work.
Content page won't be painted while moving the alert dialog above it.
tracking-fennec: --- → 2.0a1+
I hate to have a fennec-only fix for this, but we might need to do that for the short term.
sync handling might actually break different kinds of things.
Currently in non-e10s alert() doesn't stop event loop.
New patch part 1 of 2.
Attachment #453819 - Attachment is obsolete: true
Attachment #457452 - Attachment is obsolete: true
Attachment #459883 - Flags: review?(mark.finkle)
Attachment #459883 - Flags: review?(Olli.Pettay)
Attachment #457452 - Flags: review?(mark.finkle)
Attachment #457452 - Flags: review?(Olli.Pettay)
Part 2 of 2.

This includes a patch to the message manager, adding a function processNextEvent(), which lets code run loops that don't freeze everything. Using that, the mobile-browser patch can return values properly from prompt() and confirm(), without sync messages - everything is async now. So this should avoid the problems with the previous patch.
Attachment #459886 - Flags: review?(mark.finkle)
Attachment #459886 - Flags: review?(Olli.Pettay)
This solution will trigger the same problems as bug 559200, as necko messages could be processed before they're expected.
(In reply to comment #22)
> This solution will trigger the same problems as bug 559200, as necko messages
> could be processed before they're expected.

I don't understand.

It seems that the situation in the other bug is: The parent sends RecvOnStartRequest and RecvOnDataAvailable. The child receives them, and while running the first - RecvOnStartRequest - it makes an RPC call, which runs an event loop (due to sending an rpc message), and that leads to the second being processed before the first finished (which is bad).

What is parallel to that in this bug? Yes, the child will run an event loop, but this is due to a child-side script calling alert/notify/confirm(). So I am not sure what ordering problem can occur, or what necko messages could be processed before they should.
(In reply to comment #23)
> (In reply to comment #22)
> > This solution will trigger the same problems as bug 559200, as necko messages
> > could be processed before they're expected.
> 
> I don't understand.
> 
> It seems that the situation in the other bug is: The parent sends
> RecvOnStartRequest and RecvOnDataAvailable. The child receives them, and while
> running the first - RecvOnStartRequest - it makes an RPC call, which runs an
> event loop (due to sending an rpc message), and that leads to the second being
> processed before the first finished (which is bad).
> 

The problem is re-entrancy.  In the other bug, OnStart led to necko notifying some observer that sent an RPC message.  RPC messages spin an *IPC-message-only* wait loop, which awakes for other IPC messages.  One of those happened to be OnData, and necko 'asploded when it re-entered OnStart.

> What is parallel to that in this bug? Yes, the child will run an event loop,
> but this is due to a child-side script calling alert/notify/confirm(). So I am
> not sure what ordering problem can occur, or what necko messages could be
> processed before they should.

Here, this code does something even worse than RPC wrt re-entrancy --- spinning the *main XPCOM* event loop.  So, necko might receive OnStart, then notify an observer that throws a dialog.  This code will spin the XPCOM loop, which not only processes IPC messages (one of which might be OnData), but also *any arbitrary XPCOM event*.  The potential for re-entrance into necko code is strictly worse than with RPC.

I don't know how modal dialogs and synchronous XHR work in firefox-bin.  Someone needs to find out(!), specifically if they just spin the main XPCOM event loop without any event filtering.  If so, we're totally screwed here, and all IPDL message handlers that might call out into foreign code need to protect themselves against arbitrary re-entrancy.  Unless someone has a clever idea for fixing dialogs and synchronous XHR.
(In reply to comment #24)
> 
> Here, this code does something even worse than RPC wrt re-entrancy --- spinning
> the *main XPCOM* event loop.  So, necko might receive OnStart, then notify an
> observer that throws a dialog.  This code will spin the XPCOM loop, which not
> only processes IPC messages (one of which might be OnData), but also *any
> arbitrary XPCOM event*.  The potential for re-entrance into necko code is
> strictly worse than with RPC.
> 
> I don't know how modal dialogs and synchronous XHR work in firefox-bin. 
> Someone needs to find out(!), specifically if they just spin the main XPCOM
> event loop without any event filtering.  If so, we're totally screwed here, and
> all IPDL message handlers that might call out into foreign code need to protect
> themselves against arbitrary re-entrancy.  Unless someone has a clever idea for
> fixing dialogs and synchronous XHR.

Thanks for the explanation, I see what you mean.

Ok, here is how alert/confirm/notify() are currently handled:

http://mxr.mozilla.org/mozilla-central/source/xpfe/appshell/src/nsXULWindow.cpp#417

- same kind of loop as in the code here. So in theory the same problem could occur, *if* onStart and onDataAvailable were queued (but I guess they aren't, the onStart is just done immediately, before even sending onDataAvailable). But in e10s these can be queued, leading to the problem.

It looks like the options are:

1. Use sync messages for this bug & similar ones. This will prevent the queuing problem, but the price is that the child is basically frozen until the dialog is responded to. There might actually be cases where this breaks web pages, I suspect, but even without that it is a change in existing behavior, and will look odd to users.

2. Guard against this problem - does it even occur? (That is, do we have code that actually does call an observer in onStartRequest that shows an alert/confirm/prompt?) We can guard against that by noting when we are in code that can suffer from the queuing problem (line OnStartRequest), and runtime aborting if an alert/etc. is shown during that time.

3. Change how the observer calls are handled. If the observer notifications in OnStartRequest were not done immediately, but rather queued to happen right after it, the queuing problem would go away. But perhaps existing code assumes that the observer is called before the OnStartRequest is finished? If so, might not be feasible.

3.1. As a variation, perhaps the calling of the observers in OnStartRequest can be done at the very end of that function - after whatever is needed by OnData is
already set up.


I recommend going with #1 for now. This only affects Fennec, so nothing in Firefox will be broken. So, that means applying the "Remote alert/confirm/prompt directly v2" patch (currently obsoleted).
(In reply to comment #24)
> I don't know how modal dialogs and synchronous XHR work in firefox-bin. 
> Someone needs to find out(!), specifically if they just spin the main XPCOM
> event loop without any event filtering. 
They do just spin the main XPCOM event loop.

... which is why 1# doesn't really work, since there is still the
synchronous XHR.
After further thought, I think there are two short-term things we can do to resolve this specific bug:

A. Use the previous patch, with synchronous networking. Freezes while the prompt is active, so a regression there. Fixes the potential issues that can arise from the child running the XPCOM event loop, but, the loop is now run in the parent - which can also lead to problems in theory.

B. Use the current patch, which runs the XPCOM event loop in the child. This is problematic, but we are already currently running the loop in the child - so there is no regression in that sense. But this approach does add the loop in the parent. On the other hand there is no freezing while the prompt is active.

In both cases there are potential problems with the XPCOM event loop, but resolving that is outside the scope of this bug (and it is not easy to do).
the child process should just use sendSyncMessage and block everything in the child process until the prompt returns.  This avoids the crashes due to message ordering and is a clean path forward.
If we take that approach, does everything still work ok if the alert() is open
for a long time and there is for example gmail open in another tab (which uses still the same content process)
(In reply to comment #29)
> If we take that approach, does everything still work ok if the alert() is open
> for a long time and there is for example gmail open in another tab (which uses
> still the same content process)

As mentioned earlier, this approach does change behavior and there might be websites with regressions. Specifically in this example, networking will be essentially frozen, so gmail in another tab might time out if the prompt is active for long enough. Gmail handles that fairly gracefully, other websites might not.
Comment on attachment 457452 [details] [diff] [review]
Remote alert/confirm/prompt directly v2

Switch to use DOMWindowCreated in this patch and let's go with it
Attachment #457452 - Attachment is obsolete: false
Attached patch Sync patch with DOMWindowCreated (obsolete) — Splinter Review
Fixed version of synchronous networking patch, with DOMWindowCreated.
Attachment #457452 - Attachment is obsolete: true
Attachment #460983 - Flags: review?(mark.finkle)
Comment on attachment 460983 [details] [diff] [review]
Sync patch with DOMWindowCreated

I'm a fan of using " around strings, but other than that r+
Attachment #460983 - Flags: review?(mark.finkle) → review+
Fixed 's to "s.

Carried over the r+ (that was correct to do, I hope)?
Attachment #460983 - Attachment is obsolete: true
Attachment #461021 - Flags: review+
(In reply to comment #34)
> Created attachment 461021 [details] [diff] [review]
> Sync patch with DOMWindowCreated v1.1
> 
> Fixed 's to "s.
> 
> Carried over the r+ (that was correct to do, I hope)?

Yes
tested and pushed:
http://hg.mozilla.org/mobile-browser/rev/0eddbfb4b022

We can open new bugs for trying other, safer, better approaches to this problem.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Attachment #459883 - Flags: review?(Olli.Pettay) → review-
Comment on attachment 459883 [details] [diff] [review]
mobile-browser patch (part 1 of 2)

Er, r+ for this approach.
Necko is being fixed to handle re-entrancy, so I think we need something like this.
Attachment #459883 - Flags: review- → review+
Comment on attachment 459886 [details] [diff] [review]
mozilla-central patch (part 2 of 2)

Very strange place for processNextEvent.
Attachment #459886 - Flags: review?(Olli.Pettay) → review-
(In reply to comment #36)
> tested and pushed:
> http://hg.mozilla.org/mobile-browser/rev/0eddbfb4b022
> 
> We can open new bugs for trying other, safer, better approaches to this
> problem.

Let's start discussion here in bug #573635 if not started yet.
Attachment #459883 - Flags: review?(mark.finkle)
Attachment #459886 - Flags: review?(mark.finkle)
Regarding better, safer approaches: This code fails to act as expected,

  document.cookie = "asdfasdf";
  alert(document.cookie.length);
  alert(document.cookie);
  for (i=0; i < document.cookie.length; i++)
    x = document.cookie;
  alert(x);

but replacing the alerts with something else will work. It seems this might be an example of hitting reentrancy problems?
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: