Closed Bug 1196975 Opened 4 years ago Closed 4 years ago

[e10s] Stop using shims in SDK code

Categories

(Add-on SDK Graveyard :: General, defect)

defect
Not set

Tracking

(e10sm8+, firefox45 fixed)

RESOLVED FIXED
mozilla45
Tracking Status
e10s m8+ ---
firefox45 --- fixed

People

(Reporter: billm, Assigned: gkrizsanits)

Details

Attachments

(6 files, 2 obsolete files)

The idea here is to use shims for SDK add-ons, but not for the SDK code itself. Ideally we could even turn off the shims for particular SDK modules. From talking to Mossop, it sounds like there are a few modules that still need the shims, so we could keep the enabled for those.
Assignee: nobody → gkrizsanits
tracking-e10s: --- → m8+
I'm a bit confused here. We seem to use interposition for the bootstrap scope (http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/extensions/internal/XPIProvider.jsm#4138) and for the add-ons code (http://mxr.mozilla.org/mozilla-central/source/toolkit/xre/nsXREDirProvider.cpp#578) but I'm not sure if we add interposition to the actual modules the add-ons load, or if we do I'm not sure where we do that. I think the loader loads the modules in a separate sandbox than the add-on itself (and share the sandbox with other modules most of the time) but I can be wrong the SDK code is quite confusing.
I mean we have to use shims for modules since things like pagemods are relying on shimmed observers... just not sure where do we set the interposition for that scope. Other thing I'm not sure about is what will we gain from this. The previous shim optimization should fix some of the performance concerns here, what are the ones that we're targeting exactly? My biggest problem that the current architecture is using shimmed observers for various core functionality like pageworker (http://mxr.mozilla.org/mozilla-central/source/addon-sdk/source/lib/sdk/page-worker.js#19). Not sure how we can move away from that if that's the goal here.
Flags: needinfo?(wmccloskey)
The shims are enabled in Jetpack modules because we set the addonId attribute on their Sandbox. (Remember that shims are enabled per-addon, not per-compartment.)
http://mxr.mozilla.org/mozilla-central/source/addon-sdk/source/lib/toolkit/loader.js#245

I think the idea here would be to figure out which SDK modules actually depend on shims and then disable them for the rest. Mossop had a list of a couple that he remembered.

To be honest, I think the real bug here is to dig into a few important SDK add-ons (Video DownloadHelper and the LastPass beta, probably), find out why they're using shims, and figure out a way to stop using shims there. This bug proposes an idea for how to do that, but we don't necessarily have to do it this way.
Flags: needinfo?(wmccloskey)
(In reply to Bill McCloskey (:billm) from comment #3)
> The shims are enabled in Jetpack modules because we set the addonId
> attribute on their Sandbox. (Remember that shims are enabled per-addon, not
> per-compartment.)
> http://mxr.mozilla.org/mozilla-central/source/addon-sdk/source/lib/toolkit/
> loader.js#245

Thanks, conceptually I remembered but could not recall how we did that. But yeah I remember now
http://mxr.mozilla.org/mozilla-central/source/js/xpconnect/src/XPCWrappedNativeScope.cpp#144

> 
> I think the idea here would be to figure out which SDK modules actually
> depend on shims and then disable them for the rest. Mossop had a list of a
> couple that he remembered.

Yeah, I talked to him, and guided me to a bug back then. I guess I still have
to play with it to see what's going on.

> 
> To be honest, I think the real bug here is to dig into a few important SDK
> add-ons (Video DownloadHelper and the LastPass beta, probably), find out why
> they're using shims, and figure out a way to stop using shims there. This
> bug proposes an idea for how to do that, but we don't necessarily have to do
> it this way.

Alright, this makes perfect sense now, thanks!
(In reply to Bill McCloskey (:billm) from comment #3)
> To be honest, I think the real bug here is to dig into a few important SDK
> add-ons (Video DownloadHelper and the LastPass beta, probably)

As it turns out Video DownloadHelper also hits a zillion of unsafe CPOW usages in the SDK, that is responsible for performance issues. And by a look at some of the stacks something is horrible wrong in the SDK:

(I just throw in an SDK function that hits unsafe CPOW usage to see the stack)

  Stack:
    getTabContentWindowNoShim@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/tabs/utils.js:268:1
getTabForContentWindowNoShim/<@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/tabs/utils.js:271:32
getTabForContentWindowNoShim@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/tabs/utils.js:271:10
isWindowInTab@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/content/sandbox.js:60:12
WorkerSandbox@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/content/sandbox.js:220:10
constructor@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/core/heritage.js:146:23
@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/deprecated/sync-worker.js:144:25
dispatch@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/method/core.js:119:12
method/method/<@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/lang/functional/core.js:26:42
method@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/lang/functional/core.js:26:1
@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/panel.js:347:3
emitOnObject@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/event/core.js:112:9
emit@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/event/core.js:89:38
receive@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/event/utils.js:115:5
transform/next@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/event/utils.js:36:24
filter/<@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/event/utils.js:53:7
transform/<@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/event/utils.js:43:29
emitOnObject@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/event/core.js:112:9
emit@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/event/core.js:89:38
receive@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/event/utils.js:115:5
transform/next@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/event/utils.js:36:24
filter/<@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/event/utils.js:53:7
transform/<@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/event/utils.js:43:29
emitOnObject@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/event/core.js:112:9
emit@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/event/core.js:89:38
forward@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/panel/events.js:20:3
Observer<.observe@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/system/events.js:76:7
emit@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/system/events.js:55:3
onContentLoad@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/panel/utils.js:310:7
EventListener.handleEvent*EventTargetInterposition.methods.addEventListener@resource://gre/modules/RemoteAddonsParent.jsm:624:5
AddonInterpositionService.prototype.interposeProperty/desc.value@file:///home/gabor/development/mozilla-central/obj-x86_64-unknown-linux-gnu/dist/bin/components/multiprocessShims.js:160:52
make@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/panel/utils.js:337:3
setup@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/panel.js:169:16
@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/core/disposable.js:48:56
dispatch@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/method/core.js:119:12
Disposable<.initialize@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/core/disposable.js:71:17
constructor@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/core/heritage.js:146:23
@resource://gre/modules/commonjs/toolkit/loader.js -> resource://testaddon/index.js:24:13
evaluate@resource://gre/modules/commonjs/toolkit/loader.js:278:19
load@resource://gre/modules/commonjs/toolkit/loader.js:330:5
main@resource://gre/modules/commonjs/toolkit/loader.js:747:10
run@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/addon/runner.js:145:19
startup/</<@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/addon/runner.js:86:7
Handler.prototype.process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:934:23
this.PromiseWalker.walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:813:7
Promise*this.PromiseWalker.scheduleWalkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:744:11
this.PromiseWalker.schedulePromise@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:776:7
this.PromiseWalker.completePromise@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:711:7
handler@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/addon/window.js:56:3
EventListener.handleEvent*EventTargetInterposition.methods.addEventListener@resource://gre/modules/RemoteAddonsParent.jsm:624:5
AddonInterpositionService.prototype.interposeProperty/desc.value@file:///home/gabor/development/mozilla-central/obj-x86_64-unknown-linux-gnu/dist/bin/components/multiprocessShims.js:160:52
@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/addon/window.js:54:1
evaluate@resource://gre/modules/commonjs/toolkit/loader.js:278:19
load@resource://gre/modules/commonjs/toolkit/loader.js:330:5
_require@resource://gre/modules/commonjs/toolkit/loader.js:632:16
require@resource://gre/modules/commonjs/toolkit/loader.js:586:12
startup/<@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/addon/runner.js:71:19
Handler.prototype.process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:934:23
this.PromiseWalker.walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:813:7
Promise*this.PromiseWalker.scheduleWalkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:744:11
this.PromiseWalker.schedulePromise@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:776:7
this.PromiseWalker.completePromise@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:711:7
listener/<@resource://gre/modules/sdk/system/Startup.js:51:46

I want to figure out what is going on here, this is a mess and has to be fixed in the SDK.
One problem I see that Video DownloadHelper is using the low level API content/worker and passing in a CPOW window. So that comes with a bunch of CPOW usages like window.location and to determine the browser it belongs to we use window.top. Also while we iterate all the browsers we use shims for the browser.contentWindow == window.top check, but I can fix that part.

The problem I'm struggling with is how to get window.top or actually how to figure out the owner browser of a CPOWed window, without actually doing any CPOW calls. Bill, do you have any idea for that?
Flags: needinfo?(wmccloskey)
Dave, I'm trying to figure out what is the e10s story for content/workers. Should it support CPOWed windows? I kind of hope that we should just throw there for CPOWs... but I'm afraid that we're relying on this. So in e10s mode when a content window is created (not necessary top level) how do we attach a worker to it exactly? (What events are we relying on to do that / do we use CPOWs there?)
Flags: needinfo?(dtownsend)
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #7)
> Dave, I'm trying to figure out what is the e10s story for content/workers.
> Should it support CPOWed windows? I kind of hope that we should just throw
> there for CPOWs... but I'm afraid that we're relying on this. So in e10s
> mode when a content window is created (not necessary top level) how do we
> attach a worker to it exactly? (What events are we relying on to do that /
> do we use CPOWs there?)

We shouldn't support attaching workers to CPOWed windows. Until recently we still relied on it but I think the last high-level module that used that was fixed by bug 1146603 so we might be able to throw in this case now.
Flags: needinfo?(dtownsend)
(In reply to Dave Townsend [:mossop] from comment #8)
> We shouldn't support attaching workers to CPOWed windows. Until recently we
> still relied on it but I think the last high-level module that used that was
> fixed by bug 1146603 so we might be able to throw in this case now.

Thanks Dave, I guess then what I asked from Bill is not that important anymore.
(would be still interested if there is a way but I kind of assume that there isn't
currently anyway)
Flags: needinfo?(wmccloskey)
We could guess that |window === window.top| and compare it to browser.contentWindow for all browsers. We might come up with nothing and have to fall back to using CPOWs to get window.top, but it would probably work most of the time.
Attachment #8671866 - Flags: review?(wmccloskey)
For this patch and for the rest, feel free to pass the reviews to Dave if you think he should review it instead.
Attachment #8671868 - Flags: review?(wmccloskey)
Attached patch part3: shimwaiver for event/dom (obsolete) — Splinter Review
There was only one consumer that relied on shims. At least now we know which one :)
Attachment #8671869 - Flags: review?(wmccloskey)
For selection.js we rely on shims here, so I had to add a noshim version and switch the rest of the consumers to that one, and keep the original for selection.js
Attachment #8671871 - Flags: review?(wmccloskey)
We've already talked about this with Dave a few comments above, so I flag him for this one. (Also since this is kind of a feature deprecation as opposed to the rest of the patches).
Attachment #8671874 - Flags: review?(dtownsend)
Comment on attachment 8671874 [details] [diff] [review]
part5: SDK Workers should not support CPOWs

Review of attachment 8671874 [details] [diff] [review]:
-----------------------------------------------------------------

I'm assuming this passes tests? If so it looks fine to me.

::: addon-sdk/source/lib/sdk/content/worker.js
@@ +119,5 @@
>    // This method of attaching should be deprecated
> +
> +  if (Cu.isCrossProcessWrapper(window))
> +    throw new Error("Attaching worker to a window from another " +
> +                    "process directly is not supported.");

Wrap this in braces please.
Attachment #8671874 - Flags: review?(dtownsend) → review+
Comment on attachment 8671866 [details] [diff] [review]
part1: shimwaiver. v1

Review of attachment 8671866 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/addoncompat/ShimWaiver.jsm
@@ +3,5 @@
> +// file, You can obtain one at http://mozilla.org/MPL/2.0/.
> +
> +this.EXPORTED_SYMBOLS = ["ShimWaiver"];
> +
> +let ShimWaiver = {

this.ShimWaiver =
Attachment #8671866 - Flags: review?(wmccloskey) → review+
Comment on attachment 8671871 [details] [diff] [review]
part4: GetTabForContentWindowNoShim. v1

Review of attachment 8671871 [details] [diff] [review]:
-----------------------------------------------------------------

::: addon-sdk/source/lib/sdk/content/worker.js
@@ +122,5 @@
>      detach(worker);
>  
>    model.window = window;
>    let frame = null;
> +  let tab = getTabForContentWindowNoShim(window.top);

Let's just pass window. This already does window.top internally.

::: addon-sdk/source/lib/sdk/tabs/utils.js
@@ +264,5 @@
> +// only sdk/selection.js is relying on shims
> +function getTabForContentWindowNoShim(window) {
> +  function getTabContentWindowNoShim(tab) {
> +    let browser = getBrowserForTab(tab);
> +    return ShimWaiver.getProperty(browser, "contentWindow") || ShimWaiver.getProperty(browser, "contentWindowAsCPOW");

Can we just use contentWindowAsCPOW alone?

@@ +266,5 @@
> +  function getTabContentWindowNoShim(tab) {
> +    let browser = getBrowserForTab(tab);
> +    return ShimWaiver.getProperty(browser, "contentWindow") || ShimWaiver.getProperty(browser, "contentWindowAsCPOW");
> +  }
> +  return getTabs().find(tab => getTabContentWindowNoShim(tab) === window.top) || null;

Can we try searching for window, and then search for window.top if that doesn't produce any results?
Attachment #8671871 - Flags: review?(wmccloskey) → review+
Comment on attachment 8671868 [details] [diff] [review]
part2: Simple cases for the shimwaiver. v1

Review of attachment 8671868 [details] [diff] [review]:
-----------------------------------------------------------------

I'm having trouble understanding why sdk/core/observer.js even exists. The only use in the tree is a test. And it's not exposed in mapping.json.

I'm going to have to pass this on to Dave. I don't really understand it enough. It seems like dom/events.js and system/events.js could be imported by SDK add-ons, which then wouldn't get the benefit of shims. The goal here is only to stop using shims when we know we don't need them.
Attachment #8671868 - Flags: review?(wmccloskey) → review?(dtownsend)
Comment on attachment 8671869 [details] [diff] [review]
part3: shimwaiver for event/dom

Review of attachment 8671869 [details] [diff] [review]:
-----------------------------------------------------------------

Same as above.
Attachment #8671869 - Flags: review?(wmccloskey) → review?(dtownsend)
(In reply to Bill McCloskey (:billm) from comment #19)
> Comment on attachment 8671871 [details] [diff] [review]
> part4: GetTabForContentWindowNoShim. v1
> ::: addon-sdk/source/lib/sdk/tabs/utils.js
> @@ +264,5 @@
> > +// only sdk/selection.js is relying on shims
> > +function getTabForContentWindowNoShim(window) {
> > +  function getTabContentWindowNoShim(tab) {
> > +    let browser = getBrowserForTab(tab);
> > +    return ShimWaiver.getProperty(browser, "contentWindow") || ShimWaiver.getProperty(browser, "contentWindowAsCPOW");
> 
> Can we just use contentWindowAsCPOW alone?

I'm sorry, that part is a left over. we don't need || ShimWaiver.getProperty(browser, "contentWindowAsCPOW"); here at all. I just tried to get rid of the original getTabForContentWindow this way but that did not work out and it seems like I forgot this here. Thanks for noticing it and sorry for the confusion.

> @@ +266,5 @@
> > +  function getTabContentWindowNoShim(tab) {
> > +    let browser = getBrowserForTab(tab);
> > +    return ShimWaiver.getProperty(browser, "contentWindow") || ShimWaiver.getProperty(browser, "contentWindowAsCPOW");
> > +  }
> > +  return getTabs().find(tab => getTabContentWindowNoShim(tab) === window.top) || null;
> 
> Can we try searching for window, and then search for window.top if that
> doesn't produce any results?

I should have made this obvious, this version does not take CPOWs ever, that's why we don't have to rely on shims. It's too bad that I caused confusion with that leaving in the contentWindowAsCPOW check.

On the other hand in the original version, the getTabContentWindow we could do this, and you are right we should.
(In reply to Bill McCloskey (:billm) from comment #20)
> I'm going to have to pass this on to Dave. I don't really understand it
> enough. It seems like dom/events.js and system/events.js could be imported
> by SDK add-ons, which then wouldn't get the benefit of shims. The goal here
> is only to stop using shims when we know we don't need them.

This is a good point. But on the long run we will not support that anyway. Dave, what's the alternative way here that is e10s compatible? How can we suggest people to use that, and what is the deprecation process to roll out a change like this?

In the meanwhile we could leave these parts as they are and add a noshim  version for them, and use those internally. I think a lot of internal shim usages originates from these and for no good reason really.
Comment on attachment 8671868 [details] [diff] [review]
part2: Simple cases for the shimwaiver. v1

Review of attachment 8671868 [details] [diff] [review]:
-----------------------------------------------------------------

I think we used to use core/observer a bunch but I switched modules away from it because it was adding to CPOW traffic a lot. It looks like no-one on AMO uses that module so we could probably remove it.

Of the files here both dom/events and system/events are in use in modules on AMO so this could potentially break things if people are trying to listen to content events or observer notifications. One option would be to spend the time looking at all those uses to see if any of them care about shimmed events. The other I guess is to deprecate these modules, clone them to versions that don't use shims and wait for developers to switch. Our policy used to be an 18 week deprecation process but given that these are low-level modules we could probably accelerate that (particularly if we could uplift those changes to aurora at least.
Comment on attachment 8671869 [details] [diff] [review]
part3: shimwaiver for event/dom

Review of attachment 8671869 [details] [diff] [review]:
-----------------------------------------------------------------

::: addon-sdk/source/lib/sdk/content/events.js
@@ +34,5 @@
>    // Map supported event types to a streams of those events on the given
>    // `window` for the inserted document and than merge these streams into
>    // single form stream off all window state change events.
>    let stateChanges = TYPES.map(function(type) {
> +    return open(document, type, { capture: true }, /* withShims = */ true);

I'm failing to understand why this needs shims. The only caller of this is page-worker and it shouldn't be dealing with remote windows at all. What breaks without this?
(In reply to Dave Townsend [:mossop] from comment #25)
> Comment on attachment 8671869 [details] [diff] [review]
> part3: shimwaiver for event/dom
> 
> Review of attachment 8671869 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: addon-sdk/source/lib/sdk/content/events.js
> @@ +34,5 @@
> >    // Map supported event types to a streams of those events on the given
> >    // `window` for the inserted document and than merge these streams into
> >    // single form stream off all window state change events.
> >    let stateChanges = TYPES.map(function(type) {
> > +    return open(document, type, { capture: true }, /* withShims = */ true);
> 
> I'm failing to understand why this needs shims. The only caller of this is
> page-worker and it shouldn't be dealing with remote windows at all. What
> breaks without this?

I'm failing to understand the 'why' part as well :( But page-workers are broken without the shims and I could not figure it out why. It would help if I understood how they're supposed to work, but I'm not sure how we attach them and how do we attach listeners and so forth. So a typical test failure would look like this one:

TEST-START | jetpack-package/addon-sdk/source/test/test-page-worker.js.testAutoDestructor
TEST-INFO | [JavaScript Error: "1444737011298	Toolkit.Telemetry	ERROR	ClientID::getCachedClientID - invalid client id in preferences, resetting: null" {file: "resource://gre/modules/Log.jsm" line: 751}]
TEST-INFO | [JavaScript Error: "1444737013693	Toolkit.GMP	ERROR	GMPInstallManager.simpleCheckAndInstall Could not check for addons: Error: got node name: html, expected: updates (resource://gre/modules/addons/ProductAddonChecker.jsm:145:1) JS Stack trace: parseXML@ProductAddonChecker.jsm:145:1 < promise callback*ProductAddonChecker.getProductAddonList@ProductAddonChecker.jsm:299:12 < GMPInstallManager.prototype.checkForAddons@GMPInstallManager.jsm:107:5 < GMPInstallManager.prototype.simpleCheckAndInstall<@GMPInstallManager.jsm:213:29" {file: "resource://gre/modules/Log.jsm" line: 751}]
TEST-UNEXPECTED-FAIL | jetpack-package/addon-sdk/source/test/test-page-worker.js.testAutoDestructor | Test timed out (after: START)
TEST-INFO | Traceback (most recent call last):
  File "resource://gre/modules/commonjs/sdk/timers.js", line 40, in notify
    callback.apply(null, args);
  File "resource://gre/modules/commonjs/sdk/deprecated/unit-test.js", line 504, in tiredOfWaiting
    self.console.testMessage(false, false, self.test.name,
  File "resource://gre/modules/commonjs/sdk/test/harness.js", line 538, in testMessage
    this.trace();

So usually some exception that does not make much sense, possibly some other exception that we cannot detect since they are swallowed by either the SDK or some XUL code. Tests are timing out and add-ons stop working. It does not help that the SDK tests seems to swallow some console output as well. So I feel like I'm poking a black box again... but it does bother me as well, I kind of hoped you know the answer for why are we relying on shims for page-workers. But if you say that we shouldn't, we should figure out why instead of just keep relying on them.

I know it's been a while for you as well, but you might know the answers for some of these questions, and it would help me a LOT:
- Does this error make any sense to you?
- Do you know the full story about how we attach page-workers and its listeners to content and where? (I would assume we do that in the content process if we're not relying on shims, but where? and where should we get its events on the parent side?) Or do you know who might know it?
- Is there  away to run SDK tests with some about:prefs on and prevent it to swallow console output?
Flags: needinfo?(dtownsend)
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #26)
> (In reply to Dave Townsend [:mossop] from comment #25)
> > Comment on attachment 8671869 [details] [diff] [review]
> > part3: shimwaiver for event/dom
> > 
> > Review of attachment 8671869 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > ::: addon-sdk/source/lib/sdk/content/events.js
> > @@ +34,5 @@
> > >    // Map supported event types to a streams of those events on the given
> > >    // `window` for the inserted document and than merge these streams into
> > >    // single form stream off all window state change events.
> > >    let stateChanges = TYPES.map(function(type) {
> > > +    return open(document, type, { capture: true }, /* withShims = */ true);
> > 
> > I'm failing to understand why this needs shims. The only caller of this is
> > page-worker and it shouldn't be dealing with remote windows at all. What
> > breaks without this?
> 
> I'm failing to understand the 'why' part as well :( But page-workers are
> broken without the shims and I could not figure it out why. It would help if
> I understood how they're supposed to work, but I'm not sure how we attach
> them and how do we attach listeners and so forth. So a typical test failure
> would look like this one:
> 
> TEST-START |
> jetpack-package/addon-sdk/source/test/test-page-worker.js.testAutoDestructor
> TEST-INFO | [JavaScript Error: "1444737011298	Toolkit.Telemetry	ERROR
> ClientID::getCachedClientID - invalid client id in preferences, resetting:
> null" {file: "resource://gre/modules/Log.jsm" line: 751}]
> TEST-INFO | [JavaScript Error: "1444737013693	Toolkit.GMP	ERROR
> GMPInstallManager.simpleCheckAndInstall Could not check for addons: Error:
> got node name: html, expected: updates
> (resource://gre/modules/addons/ProductAddonChecker.jsm:145:1) JS Stack
> trace: parseXML@ProductAddonChecker.jsm:145:1 < promise
> callback*ProductAddonChecker.getProductAddonList@ProductAddonChecker.jsm:299:
> 12 < GMPInstallManager.prototype.checkForAddons@GMPInstallManager.jsm:107:5
> <
> GMPInstallManager.prototype.simpleCheckAndInstall<@GMPInstallManager.jsm:213:
> 29" {file: "resource://gre/modules/Log.jsm" line: 751}]
> TEST-UNEXPECTED-FAIL |
> jetpack-package/addon-sdk/source/test/test-page-worker.js.testAutoDestructor
> | Test timed out (after: START)
> TEST-INFO | Traceback (most recent call last):
>   File "resource://gre/modules/commonjs/sdk/timers.js", line 40, in notify
>     callback.apply(null, args);
>   File "resource://gre/modules/commonjs/sdk/deprecated/unit-test.js", line
> 504, in tiredOfWaiting
>     self.console.testMessage(false, false, self.test.name,
>   File "resource://gre/modules/commonjs/sdk/test/harness.js", line 538, in
> testMessage
>     this.trace();
> 
> So usually some exception that does not make much sense, possibly some other
> exception that we cannot detect since they are swallowed by either the SDK
> or some XUL code. Tests are timing out and add-ons stop working. It does not
> help that the SDK tests seems to swallow some console output as well. So I
> feel like I'm poking a black box again... but it does bother me as well, I
> kind of hoped you know the answer for why are we relying on shims for
> page-workers. But if you say that we shouldn't, we should figure out why
> instead of just keep relying on them.

I did some digging and there is a lot of weirdness here. I kind of think that this code is a little broken and the shims are somehow fixing it. The particularly confusing but is that the test fails with the shims on even if e10s is off. That means that the shims are making things behave differently in non-e10s which is probably not desirable. That said I don't understand what is going wrong without the shims. For some reason addon an un-shimmed DOM event listener for DOMContentLoaded to the frame that the page-worker creates for that test never gets called even though I can see the document in the frame does get loaded. I'm wondering if something changed about propagating DOM events across the docshell content boundary (particularly in the hidden window) since this test was passing before we turned shims on.

I have an idea of a way to fix it but I'm out tomorrow so won't get chance to look at this again till Thursday.

> I know it's been a while for you as well, but you might know the answers for
> some of these questions, and it would help me a LOT:
> - Does this error make any sense to you?

The error is unrelated

> - Do you know the full story about how we attach page-workers and its
> listeners to content and where? (I would assume we do that in the content
> process if we're not relying on shims, but where? and where should we get
> its events on the parent side?) Or do you know who might know it?

We do it in the main process. The page that page-workers attach to is in the main process.

> - Is there  away to run SDK tests with some about:prefs on and prevent it to
> swallow console output?

It generally shouldn't, console.log outputs fine for me.
Flags: needinfo?(dtownsend)
(In reply to Dave Townsend [:mossop] from comment #27)
> I did some digging and there is a lot of weirdness here. I kind of think
> that this code is a little broken and the shims are somehow fixing it. The

There are some weirdness here that part is granted... 

> particularly confusing but is that the test fails with the shims on even if
> e10s is off. That means that the shims are making things behave differently
> in non-e10s which is probably not desirable. That said I don't understand

Wow. I will have to investigate that. That sounds weird.

> > - Do you know the full story about how we attach page-workers and its
> > listeners to content and where? (I would assume we do that in the content
> > process if we're not relying on shims, but where? and where should we get
> > its events on the parent side?) Or do you know who might know it?
> 
> We do it in the main process. The page that page-workers attach to is in the
> main process.

That page is in the main process, but its window is not. So we create a sandbox that is in the main process and attach the CPOW of the window to it's global as the proto. That is super scary, and we get unsafe CPOWs every time someone is touching the window from the pages content script. So I guess the reason why this does not work without shims is that we don't get the DOM events from the window since it is in the content process. I think most tests are based on data uris, in which case the window of the page might be in the parent process... (I'm just guessing about this, I have not validated this part at all)

I still have to look into this some more, but is this way it is intended to work? Because I think this should be fixed.

> 
> > - Is there  away to run SDK tests with some about:prefs on and prevent it to
> > swallow console output?
> 
> It generally shouldn't, console.log outputs fine for me.

I've just got a new hardware, and I think I had a non-debug build or something, anyway this was totally my bad.
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #28)
> > > - Do you know the full story about how we attach page-workers and its
> > > listeners to content and where? (I would assume we do that in the content
> > > process if we're not relying on shims, but where? and where should we get
> > > its events on the parent side?) Or do you know who might know it?
> > 
> > We do it in the main process. The page that page-workers attach to is in the
> > main process.
> 
> That page is in the main process, but its window is not. So we create a
> sandbox that is in the main process and attach the CPOW of the window to
> it's global as the proto. That is super scary, and we get unsafe CPOWs every
> time someone is touching the window from the pages content script. So I
> guess the reason why this does not work without shims is that we don't get
> the DOM events from the window since it is in the content process. I think
> most tests are based on data uris, in which case the window of the page
> might be in the parent process... (I'm just guessing about this, I have not
> validated this part at all)

I don't think this is right. Nothing is remote here (or at least it isn't intended to be). page-worker creates an iframe in the hidden XUL window and loads a url in it. The iframe doesn't have the remote attribute so it shouldn't be loaded remotely regardless of the url.
(In reply to Dave Townsend [:mossop] from comment #29)
> I don't think this is right. Nothing is remote here (or at least it isn't
> intended to be). page-worker creates an iframe in the hidden XUL window and
> loads a url in it. The iframe doesn't have the remote attribute so it
> shouldn't be loaded remotely regardless of the url.

Alright I will double check this but it seems like this is the case.

I don't think we want to allow arbitrary web-content to be loaded on the parent side, not even in the hidden window. So I think it can be that this part has changed, and sandboxing reached a point where any frame that has content URL gets loaded in a child process. I don't know.

Anyway, assuming that this is the case, how can we fix this? Is there a reason why we use here a sync worker instead of a remote one? Why are we trying to load content on the parent side anyway?
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #30)
> (In reply to Dave Townsend [:mossop] from comment #29)
> > I don't think this is right. Nothing is remote here (or at least it isn't
> > intended to be). page-worker creates an iframe in the hidden XUL window and
> > loads a url in it. The iframe doesn't have the remote attribute so it
> > shouldn't be loaded remotely regardless of the url.
> 
> Alright I will double check this but it seems like this is the case.
> 
> I don't think we want to allow arbitrary web-content to be loaded on the
> parent side, not even in the hidden window. So I think it can be that this
> part has changed, and sandboxing reached a point where any frame that has
> content URL gets loaded in a child process. I don't know.
> 
> Anyway, assuming that this is the case, how can we fix this? Is there a
> reason why we use here a sync worker instead of a remote one? Why are we
> trying to load content on the parent side anyway?

We're not trying to as such, we have bug 1129662 on file to switch to a remote worker, just no-one has worked on it.
Comment on attachment 8671869 [details] [diff] [review]
part3: shimwaiver for event/dom

Review of attachment 8671869 [details] [diff] [review]:
-----------------------------------------------------------------

I'm going to try to work on bug 1129662 tomorrow to either fix or get enough of the way there that we don't need to take the withShims argument from this patch.
Attachment #8671869 - Flags: review?(dtownsend) → review-
Comment on attachment 8671868 [details] [diff] [review]
part2: Simple cases for the shimwaiver. v1

Review of attachment 8671868 [details] [diff] [review]:
-----------------------------------------------------------------

The patch looks good but I guess we should ask Kev or someone what they want to do about warnings for potentially breaking changes
Attachment #8671868 - Flags: review?(dtownsend)
Attachment #8671868 - Flags: review+
Attachment #8671868 - Flags: feedback?(kev)
(In reply to Dave Townsend [:mossop] from comment #32)
> Comment on attachment 8671869 [details] [diff] [review]
> part3: shimwaiver for event/dom
> 
> Review of attachment 8671869 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I'm going to try to work on bug 1129662 tomorrow to either fix or get enough
> of the way there that we don't need to take the withShims argument from this
> patch.

Thanks Dave, that sounds great! Let me know if I can help with it somehow.

(In reply to Dave Townsend [:mossop] from comment #33)
> Comment on attachment 8671868 [details] [diff] [review]
> part2: Simple cases for the shimwaiver. v1
> 
> Review of attachment 8671868 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> The patch looks good but I guess we should ask Kev or someone what they want
> to do about warnings for potentially breaking changes

I think I will try to update this patch to use the shimwaiver versions internally but keep exporting the original shimmed versions for add-ons. And then we should not worry about add-on breakage in this bug.
One thing that occurred to me. For SDK modules that have been updated to e10s we generally have the module in the parent process talking to modules in the child process. Each child process (and the main process for in-content frames) has its own loader separate from the normal loader. If there was a way to make modules loaded by this child loader never use shims that might save some cpow cases without needing the explicit shim waiver. Is that something that is possible?
Flags: needinfo?(gkrizsanits)
(In reply to Dave Townsend [:mossop] from comment #35)
> there was a way to make modules loaded by this child loader never use shims
> that might save some cpow cases without needing the explicit shim waiver.

Luckily we only do interposition in the parent/chrome process, in the child/content processes this should be already a non-issue.
Flags: needinfo?(gkrizsanits)
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #36)
> (In reply to Dave Townsend [:mossop] from comment #35)
> > there was a way to make modules loaded by this child loader never use shims
> > that might save some cpow cases without needing the explicit shim waiver.
> 
> Luckily we only do interposition in the parent/chrome process, in the
> child/content processes this should be already a non-issue.

Right, but process scripts also run in the parent process.
(In reply to Dave Townsend [:mossop] from comment #37)
> Right, but process scripts also run in the parent process.

Oh, this sounds interesting, yes we could do something like that. We should flag the sandboxes this child loader is using and then avoid shims for those scopes entirely. Some platform work need to be done for this but nothing complicated really. Could you point me to an example where this might occur?
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #38)
> (In reply to Dave Townsend [:mossop] from comment #37)
> > Right, but process scripts also run in the parent process.
> 
> Oh, this sounds interesting, yes we could do something like that. We should
> flag the sandboxes this child loader is using and then avoid shims for those
> scopes entirely. Some platform work need to be done for this but nothing
> complicated really. Could you point me to an example where this might occur?

Here is where we create the loader for the processes: https://dxr.mozilla.org/mozilla-central/source/addon-sdk/source/lib/framescript/content.jsm#37

The patch in bug 112962 makes content/l10n-html use the observer service to listen for "document-element-inserted". This will make the instance that runs in the parent process likely use shims unless we stop sandboxes from that loader from being shimmed.
I made sdk/dom/events-shimmed the default for dom/events. The other option would be to change all the related requires in the SDK. Only a handful of add-ons use "sdk/dom/events" in require. But I'm not sure about the story why both versions are supported on the first place. Anyway let me know if this is OK or not.

I had to remove a couple of content/events related tests, but since only tests are using that API I don't care too much.

There rest should be pretty straightforward but let me know if something is not or if I broke any rules...
Attachment #8671868 - Attachment is obsolete: true
Attachment #8671869 - Attachment is obsolete: true
Attachment #8671868 - Flags: feedback?(kev)
Attachment #8682401 - Flags: review?(dtownsend)
See Comment 35 and below why this might be a nice addition.
Attachment #8682404 - Flags: review?(wmccloskey)
There was one last in the latest try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=1a15bdf67aab but I fixes it already in the uploaded patches. So this should be all green on try.
Attachment #8682406 - Flags: review?(dtownsend) → review+
Comment on attachment 8682401 [details] [diff] [review]
part2-part3 merged: shimwaiver applications. v2

Review of attachment 8682401 [details] [diff] [review]:
-----------------------------------------------------------------

::: addon-sdk/source/lib/sdk/dom/events.js
@@ +113,5 @@
> +function removeListener(element, type, listener, capture, shimmed = false) {
> +  if (shimmed) {
> +    element.removeEventListener(type, listener, capture);
> +  } else {
> +    ShimWaiver.getProperty(element, "removeEventListener")(type, listener, capture);    

Lots of trailing whitespace throughout this patch
Attachment #8682401 - Flags: review?(dtownsend) → review+
(In reply to Dave Townsend [:mossop] from comment #45)
> Lots of trailing whitespace throughout this patch

Whoops, sorry I need to set up my new editor a bit better it seems... I will fix them, and thanks for spotting it.
Attachment #8682404 - Flags: review?(wmccloskey) → review+
You need to log in before you can comment on or make changes to this bug.