Closed Bug 1122083 Opened 9 years ago Closed 7 years ago

e10s porting document doesn't talk about Addon-SDK

Categories

(Add-on SDK Graveyard :: Documentation, defect, P1)

defect

Tracking

(e10s+)

RESOLVED INCOMPLETE
Tracking Status
e10s + ---

People

(Reporter: ianbicking, Assigned: wbamberg)

References

Details

(Whiteboard: triaged)

This document: https://developer.mozilla.org/en-US/Add-ons/Working_with_multiprocess_Firefox

It talks a lot about frame scripts, but the techniques it suggests don't seem applicable to the Addon-SDK.  Specifically it involves registering various chrome: URLs in the manifest.  Maybe it only needs to talk about using data.url() in those places, but right now the page is unclear for SDK users.
Will or Jim: is this issue on your radar? Should the Add-on SDK have a separate e10s documentation?
Blocks: e10s-addons
tracking-e10s: --- → +
Flags: needinfo?(wbamberg)
Flags: needinfo?(jmathies)
The docs say that if you are just using the "high-level APIs" then you should be unaffected by e10s. But that leaves a couple of things unexplained:
* exactly which "low-level" Add-on SDK APIs will not work any more?
* what should you do if you are using these APIs?

I think I have a reasonable understanding of the answers to these questions, but want to check my understanding first.
Assignee: nobody → wbamberg
Flags: needinfo?(wbamberg)
Thanks, Will.
Flags: needinfo?(jmathies)
As an example, here's some code I am using in an SDK addon:

    var canvas = document.createElementNS('http://www.w3.org/1999/xhtml', 'canvas');
    document.documentElement.appendChild(canvas);
    canvas.width = 300;
    canvas.height = 200;
    var ctx = canvas.getContext('2d');
    var contentWindow = require("sdk/tabs/utils").getTabContentWindow(tab);
    ctx.drawWindow(contentWindow, pos.x, pos.y, pos.w, pos.h, backgroundColor);

This is code I picked up off Stack Overflow.  I don't believe I can run this code from a regular page worker for permission reasons (drawWindow isn't available).  Without e10s I can run it directly in the addon code.  With e10s I get an error "Argument 1 of CanvasRenderingContext2D.drawWindow does not implement interface Window" which took me a while to realize was because of e10s.

It would be nice to see warnings around some of these low level modules (like tab/utils) about e10s, ideally with some error messages included.  (For instance, I searched the web for this error string as my first step.)  I think the solution is a frame worker, so then we need additional documentation to explain how to make a frame worker from an SDK addon.
I've dug into this a bit, and here's what I think the current state of things is. I've also cc-ed a few people who I hope can correct me if I'm way off. Otherwise, maybe I should just write this up as a page on MDN?

(1) The SDK contract with respect to e10s:

High-level APIs will just work in e10s. If they don't, file a bug.
Low-level APIs might not. In practice most Low-level APIs will work fine, but Low-level APIs which give direct access to content will not work. For example, you might expect this would not work:

var contentDocument = require("sdk/window/utils").getMostRecentBrowserWindow().content.document;

(2) However, the SDK has also shimmed many Low-level APIs, so they will partially work (including the example above). There are 2 caveats though:

* the shims are not perfect: some add-ons will not work even with the shims (this is Ian's case)
* the shims can have a bad effect on performance, and we would like to get people off them eventually

We don't have a complete list of Low-level APIs that will not work without the shims. This would be good to have.

(3) To test whether an add-on works without the shims, use multiprocessCompatible (https://developer.mozilla.org/en-US/Add-ons/Install_Manifests#multiprocessCompatible). With the SDK, you can set this by editing the install.rdf template in SDK_ROOT/app-extension/install.rdf. You can't do this with jpm, only cfx.

However, some of the SDK's APIs internally depend on the shims. So by setting multiprocessCompatible, your add-on might not work, even if you are only using High-level APIs. Example:

var pageMod = require("sdk/page-mod");

pageMod.PageMod({
  include: "*.mozilla.org",
  contentScript: 'document.body.innerHTML = ' +
                 ' "<h1>Page matches ruleset</h1>";'
});

This add-on will not work if you've set multiprocessCompatible, because page-mod depends on the shims.

(4) If the shims don't work for you or if you are trying to get off them, you can use frame scripts in your add-on just fine. For example, an add-on that trivially access content directly using a Low-level API:

function logContent() {
  var contentDocument = require("sdk/window/utils").getMostRecentBrowserWindow().content.document;
  console.log(contentDocument.body.innerHTML);
}

require("sdk/ui/button/action").ActionButton({
  id: "log-content",
  label: "Log Content",
  icon: "./icon-16.png",
  onClick: logContent
});

This add-on will work by default due to the shims, but will break if you set multiprocessCompatible. So you could rewrite the add-on to use frame scripts:

/*
frame-script.js is in the "data" directory, and has this content:

sendAsyncMessage("sdk-low-level-apis-e10s@jetpack:got-content", content.document.body.innerHTML);

*/

const self = require("sdk/self");
const windowUtils = require("sdk/window/utils");
const tabUtils = require("sdk/tabs/utils")

function logContentAsync() {
  // There is probably a more sensible
  // way to get the current <browser>
  var browserWindow = windowUtils.getMostRecentBrowserWindow();
  var tab = tabUtils.getActiveTab(browserWindow);
  var browser = tabUtils.getBrowserForTab(tab);
  var browserMM = browser.messageManager;
  browserMM.loadFrameScript(self.data.url("frame-script.js"), false);
  browserMM.addMessageListener("sdk-low-level-apis-e10s@jetpack:got-content", logContent);
}

function logContent(message) {
  console.log(message.data);
}

require("sdk/ui/button/action").ActionButton({
  id: "log-content",
  label: "Log Content",
  icon: "./icon-16.png",
  onClick: logContentAsync
});

Now if will work even if you set multiprocessCompatible. See the frame scripts docs for all the details.

(5) There are some things the SDK team could do to help developers. Is it worth raising bugs for these, or are there already bugs?

(a) figure out exactly which LLAs break without shims, and add warnings when people use them
(b) make it possible for jpm users to set multiprocessCompatible.
(c) un-shim the SDK's own internals, so you can set multiprocessCompatible without breaking things like page-mod
(In reply to Will Bamberg [:wbamberg] from comment #5)
> I've dug into this a bit, and here's what I think the current state of
> things is. I've also cc-ed a few people who I hope can correct me if I'm way
> off. Otherwise, maybe I should just write this up as a page on MDN?
> 
> (1) The SDK contract with respect to e10s:
> 
> High-level APIs will just work in e10s. If they don't, file a bug.
> Low-level APIs might not. In practice most Low-level APIs will work fine,
> but Low-level APIs which give direct access to content will not work. For
> example, you might expect this would not work:
> 
> var contentDocument =
> require("sdk/window/utils").getMostRecentBrowserWindow().content.document;
> 
> (2) However, the SDK has also shimmed many Low-level APIs, so they will
> partially work (including the example above). There are 2 caveats though:

For the sake of correctness, it is Firefox itself that provides the shims to all add-ons to help them work in e10s.

> * the shims are not perfect: some add-ons will not work even with the shims
> (this is Ian's case)
> * the shims can have a bad effect on performance, and we would like to get
> people off them eventually
> 
> We don't have a complete list of Low-level APIs that will not work without
> the shims. This would be good to have.
> 
> (3) To test whether an add-on works without the shims, use
> multiprocessCompatible
> (https://developer.mozilla.org/en-US/Add-ons/
> Install_Manifests#multiprocessCompatible). With the SDK, you can set this by
> editing the install.rdf template in SDK_ROOT/app-extension/install.rdf. You
> can't do this with jpm, only cfx.
> 
> However, some of the SDK's APIs internally depend on the shims. So by
> setting multiprocessCompatible, your add-on might not work, even if you are
> only using High-level APIs. Example:
> 
> var pageMod = require("sdk/page-mod");
> 
> pageMod.PageMod({
>   include: "*.mozilla.org",
>   contentScript: 'document.body.innerHTML = ' +
>                  ' "<h1>Page matches ruleset</h1>";'
> });
> 
> This add-on will not work if you've set multiprocessCompatible, because
> page-mod depends on the shims.

The goal should be to fix all high-level APIs so they work without the shims in place. I don't know if we have bugs tracking that work right now though.

> (4) If the shims don't work for you or if you are trying to get off them,
> you can use frame scripts in your add-on just fine. For example, an add-on
> that trivially access content directly using a Low-level API:
> 
> function logContent() {
>   var contentDocument =
> require("sdk/window/utils").getMostRecentBrowserWindow().content.document;
>   console.log(contentDocument.body.innerHTML);
> }
> 
> require("sdk/ui/button/action").ActionButton({
>   id: "log-content",
>   label: "Log Content",
>   icon: "./icon-16.png",
>   onClick: logContent
> });
> 
> This add-on will work by default due to the shims, but will break if you set
> multiprocessCompatible. So you could rewrite the add-on to use frame scripts:
> 
> /*
> frame-script.js is in the "data" directory, and has this content:
> 
> sendAsyncMessage("sdk-low-level-apis-e10s@jetpack:got-content",
> content.document.body.innerHTML);
> 
> */
> 
> const self = require("sdk/self");
> const windowUtils = require("sdk/window/utils");
> const tabUtils = require("sdk/tabs/utils")
> 
> function logContentAsync() {
>   // There is probably a more sensible
>   // way to get the current <browser>
>   var browserWindow = windowUtils.getMostRecentBrowserWindow();
>   var tab = tabUtils.getActiveTab(browserWindow);
>   var browser = tabUtils.getBrowserForTab(tab);
>   var browserMM = browser.messageManager;
>   browserMM.loadFrameScript(self.data.url("frame-script.js"), false);
>  
> browserMM.addMessageListener("sdk-low-level-apis-e10s@jetpack:got-content",
> logContent);
> }
> 
> function logContent(message) {
>   console.log(message.data);
> }
> 
> require("sdk/ui/button/action").ActionButton({
>   id: "log-content",
>   label: "Log Content",
>   icon: "./icon-16.png",
>   onClick: logContentAsync
> });
> 
> Now if will work even if you set multiprocessCompatible. See the frame
> scripts docs for all the details.
> 
> (5) There are some things the SDK team could do to help developers. Is it
> worth raising bugs for these, or are there already bugs?
> (a) figure out exactly which LLAs break without shims, and add warnings when
> people use them
> (b) make it possible for jpm users to set multiprocessCompatible.
> (c) un-shim the SDK's own internals, so you can set multiprocessCompatible
> without breaking things like page-mod

I don't know of bugs, so feel free to raise them, unfortunately our resources are spread thin right now trying to get the whole of Firefox working for e10s, making changes to the SDK to help remove shim usage is likely to take a lower priority.
Ian: I've written a version of the above up at https://developer.mozilla.org/en-US/Add-ons/SDK/Guides/Multiprocess_Firefox_and_the_SDK, and linked to it from the main add-ons migration doc. Please let me know if this works for you.

I don't have a definitive list of which SDK APIs are not e10s-compatible, but we could raise a bug asking for the SDK team to come up with a list. I agree with you that printing warnings when people use e10s-unsafe APIs seems like a good idea.
Flags: needinfo?(ianb)
A couple notes:

- Is it possible to use more than one script in a framescript?  In most SDK contexts I could do contentScriptFile: [self.data.url("util.js"), self.data.url("script.js")], but it's not clear if loadFrameScript works like this.
- Generally I assume framescripts and content scripts are not the same, and don't have the same environment (e.g., no ports in framescripts).  If this was noted explicitly that would be helpful.
- Is there any meaning or pattern to the string "sdk-low-level-apis-e10s@jetpack:got-content"?  Does the string have to be unique across all of Firefox?
- I assume that SDK window objects don't have a .messageManager property, but it's probably worth calling out the distinction explicitly.
- It would probably be good to note some of the conversion functions.  The viewFor module is referenced in the windows module, but doesn't have its own page.  It's less clear to me how to convert a tab, maybe require("sdk/tabs/utils").getOwnerWindow(aTab).messageManager ?  Nope, experimentation seems to indicate no. The document uses the term "chrome window", but there seems to be at least 4 kinds of windows and I can't keep track.  XUL?  Browser?  DOM?  SDK?  Content?  Is DOM the same as Content?  
  - Related to this window confusion I seem to have gotten stuck in a situation where I have a ChromeMessageBroadcaster instead of a normal message manager, maybe because I got the wrong kind of window object.
  - Ultimately using require("sdk/tabs/utils").getBrowserForTab(tab) seemed to give me the right object.  The example code uses getMostRecentBrowserWindow() which didn't seem to work for me (got me a ChromeMessageBroadCaster)
- There's an example of getting data from a framescript, but not sending data to the framescript.  I am often confused about load order guarantees, so I'd like to see in the docs if you can just send data to the framescript, or if you have to do a handshake to make sure the framescript is initialized before sending data.
- It's unclear how long messageManager exists and how those messages work.  If I have multiple framescripts are all the messages going through the same channel?  Is it a per-tab channel?  What is the life of a framescript?  My head hurts when I try to imagine what might happen if I invoke the same routine twice on the same window.
- You mention the Global Message Manager, but I don't know why I'd use that.  An example of why you'd want that message manager would be useful.
- It appears sendAsync/etc can take objects, not just strings.  But the example just shows a string, and I assumed that only strings were allowed.  An object example would be helpful.

I'm not 100% of the way to converting my addon for e10s using these instructions, but these are at least a few notes as I continue to work on it.
Flags: needinfo?(ianb)
(In reply to Will Bamberg [:wbamberg] from comment #7)
> Ian: I've written a version of the above up at
> https://developer.mozilla.org/en-US/Add-ons/SDK/Guides/
> Multiprocess_Firefox_and_the_SDK, and linked to it from the main add-ons
> migration doc. Please let me know if this works for you.
> 
> I don't have a definitive list of which SDK APIs are not e10s-compatible,
> but we could raise a bug asking for the SDK team to come up with a list. I
> agree with you that printing warnings when people use e10s-unsafe APIs seems
> like a good idea.

I've updated bug 1004745 with dependencies for all the high level API problems I know of. That is sdk/page-worker (likely works but needs to load its pages remotely), sdk/page-mod (mostly works but can be slow), sdk/tabs (some broken pieces), sdk/selection (some fixes on the way but there will be some features that can't be fixed here). Additionally a bug exists for sdk/windows which I think works fine but we need to evaluate further.

As for the low-level APIs I've filed bug 1129664 to look into figuring out which work and which don't. I'm not sure when that will actually happen, it's hard to do by hand and there isn't a good automated way.
(In reply to Ian Bicking (:ianb) from comment #8)
> A couple notes:
> 

Thanks for the feedback. As we discussed, some of these are addressed in the main message manager page [1] and I don't think we should duplicate it here.

For the rest, I've made some updates to https://developer.mozilla.org/en-US/Add-ons/SDK/Guides/Multiprocess_Firefox_and_the_SDK#Using_frame_scripts

* tried to make it clear that frame scripts and content scripts are totally different things
* distinguished the three sorts of window that get bandied about, and tried to explain a bit better how to do conversions from SDK-land to chrome-land
* covered the tabUtils.getBrowserForTab(viewFor(sdkTab)) pattern for getting a XUL browser from an SDK tab
* mentioned that you can only pass a single script into loadFrameScript

Please let me know if this covers it for you.

[1] https://developer.mozilla.org/en-US/Firefox/Multiprocess_Firefox/The_message_manager
Flags: needinfo?(ianb)
I had a bug that I wanted to resolve, to verify that I really understand how things should work.  Turns out I don't!

The specific problem I'm encountering is that I am adding a framescript to a tab to do something specific.  If I do it again I end up adding two framescripts (and the message manager just broadcasts back and forth from all of them).  Then if I reload or go to a new URL those framescripts hang around and cause problems.

In https://developer.mozilla.org/en-US/Firefox/Multiprocess_Firefox/The_message_manager under "Unloading frame scripts" it says: "disable any frame scripts that are already loaded. There is no mechanism to unload frame scripts that are already loaded, so you need to send a message to your frame scripts telling them to disable themselves."  I can't find any information on how a framescript disables itself (or why I seem to have framescripts hanging around when they should be disabled automatically – I'm not using allowDelayedLoad).  I thought there might be information in https://developer.mozilla.org/en-US/Firefox/Multiprocess_Firefox/Frame_script_environment but there isn't.  Experimentation seems to indicate I have to remove all listeners and then it gets collected.

Separately, when I was reminding myself again what was going on I first looked at https://developer.mozilla.org/en-US/Add-ons/Working_with_multiprocess_Firefox and found it hard to find my way back to https://developer.mozilla.org/en-US/Firefox/Multiprocess_Firefox – a little more link density between the two could be useful, since breadcrumbs don't help navigate between them.
Flags: needinfo?(ianb)
BTW, removing the listener doesn't cause anything to be cleaned up, I just stopped noticing the zombie framescripts.
Something you might be interested in is bug 1130529 which I'm trying to get through review now. It adds some SDK modules that make dealing with child processes a lot nicer. It includes the ability to load a module in any child processes and setting up nice message passing between them. Modules running in the parent process can load other modules as you might expect and see when any new frames are created (the parent sees this too). Anything loaded in the child process should get unloaded when the add-on in the parent is unloaded automatically.
(In reply to Ian Bicking (:ianb) from comment #11)
> I had a bug that I wanted to resolve, to verify that I really understand how
> things should work.  Turns out I don't!
> 
> The specific problem I'm encountering is that I am adding a framescript to a
> tab to do something specific.  If I do it again I end up adding two
> framescripts (and the message manager just broadcasts back and forth from
> all of them).  Then if I reload or go to a new URL those framescripts hang
> around and cause problems.
> 

Frame scripts are associated with a browser tab, not with a page. So once you load them, they stay loaded until the tab is closed, even if you reload the document or navigate. So they hang around. There is, AFAIK, no way to unload them once you have loaded them, other than closing the tab they were loaded into.

> In
> https://developer.mozilla.org/en-US/Firefox/Multiprocess_Firefox/
> The_message_manager under "Unloading frame scripts" it says: "disable any
> frame scripts that are already loaded. There is no mechanism to unload frame
> scripts that are already loaded, so you need to send a message to your frame
> scripts telling them to disable themselves."  I can't find any information
> on how a framescript disables itself (or why I seem to have framescripts
> hanging around when they should be disabled automatically – I'm not using
> allowDelayedLoad).  I thought there might be information in
> https://developer.mozilla.org/en-US/Firefox/Multiprocess_Firefox/
> Frame_script_environment but there isn't.

AKAIK there is no "disable feature" in a frame script, that's why it doesn't appear there. "telling them to disable themselves" means, sending them a message telling them to stop doing whatever it was they were doing, remove event listeners, &c. So it's a manual thing, to be implemented in each frame script.

See bug 1051238 for more details on this.

Does that make sense? I will try to make it clearer in the docs.

>  Experimentation seems to indicate
> I have to remove all listeners and then it gets collected.
> 
> Separately, when I was reminding myself again what was going on I first
> looked at
> https://developer.mozilla.org/en-US/Add-ons/
> Working_with_multiprocess_Firefox and found it hard to find my way back to
> https://developer.mozilla.org/en-US/Firefox/Multiprocess_Firefox – a little
> more link density between the two could be useful, since breadcrumbs don't
> help navigate between them.

OK, thanks.
I initially thought bug 1051238 just indicated the *content* of framescripts was cached forever.  But sure, maybe the running process is also unkillable.

How this all relates to loadFrameScript and allowDelayedLoad I'm unclear.  I expected allowDelayedLoad to mean that the script would stick around forever, but apparently everything does!

I wrote this module to help me manage frame scripts: https://github.com/mozilla-services/pageshot/blob/2e2d8fa92b00a828f23a7a8d49ed9c59cf4b23c3/lib/framescripter.js

It's called from here: https://github.com/mozilla-services/pageshot/blob/2e2d8fa92b00a828f23a7a8d49ed9c59cf4b23c3/lib/screenshot.js#L45-50

If it was fancier I'd handle the cleanup that bug 1051238 implies is necessary, but this starts to add more and more boilerplate the frame scripts, all without any code reuse techniques for those scripts :( (though presumably bug 1130529 would help a lot)
Priority: -- → P1
Whiteboard: triaged
https://bugzilla.mozilla.org/show_bug.cgi?id=1399562
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.