Note: There are a few cases of duplicates in user autocompletion which are being worked on.

Implement "FrameWorker" component to handle persistent connections to social providers

RESOLVED FIXED in Firefox 16

Status

()

Firefox
SocialAPI
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: Gavin, Assigned: mixedpuppy)

Tracking

Trunk
Firefox 16
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [Fx16])

Attachments

(2 attachments, 9 obsolete attachments)

See these links for details of the component already implemented as part of the Social API addon (https://github.com/mozilla/socialapi-dev/blob/develop/modules/frameworker.js):
https://wiki.mozilla.org/Firefox_Social_Integration_Design_Spec#FrameWorker
https://wiki.mozilla.org/Firefox_Social_Integration_Design_Spec#The_Worker_API

The requirements are to be able to maintain a persistent connection to the social provider, and allow message passing to implement functionality such as notifications, chat, "share" buttons, etc.

The current approach is to create an empty JS sandbox/worker, and "import" desired APIs into it. A possibly simpler alternative we were considering was to just use an iframe, since some social provider utility libraries depend on the presence of a DOM. That might be simpler implementation-wise, but exposes a much larger API, making it almost certainly impossible to ever revert to a more restricted one.

BrowserID has a similar setup which they call a "Sandbox":
http://mxr.mozilla.org/mozilla-central/source/services/aitc/modules/browserid.js
That takes the "use an iframe" approach, so we might be able to factor that code out and use it for this, if we decide to go that way.
(Assignee)

Comment 1

5 years ago
My position is that we change this to have the iframe.contentWindow as the proto object for the sandbox.  This provides the frameworker script with access to window and document.  Our current frameworker does not change that much, we still only load a js file and still use ports for communication between provider frames.  We reduce new functionality to only the importScripts api and ports.  The advantages/disadvantages of making this change that I see are are:

Disadvantages:

- inability to move to a real Worker in the future
- frameworkers run on UI thread (not a change at the moment, but if we could move to a real worker we'd move off the UI thread)
- potentially a larger api surface can make it easier for poorly written providers to affect performance, but one could argue that is possible now
- potentially larger memory footprint by using an iframe vs. a real worker. (not sure what the difference really is)

Advantages:

* advantages really may be better defined as problems we no longer have if we do this.

- various bugs have appeared in our use of the sandbox, where we need to add certain APIs from window into the sandbox.
- limited api availability limits the providers ability to rely on existing js libraries (internal or 3rd party) that may require access to window and document.  Both providers in development now have run into limitations they need to work around.  There is legit concern of forward maintenance of hacks to work around these limitations
- when we run into an API that a provider finds necessary, we have to add support for it in code, recent example is the need to access cookies for the domain, there are currently 5 or 6 non-worker APIs that we add to the frameworker.
- where we need new APIs not available to a real Worker, we'll have to add them and potentially run them through w3c if we want browser compatibility in the future.  Using an iframe, it might be possible to add support to some browsers through an extension.

As well, from a perspective of timing and resourcing for landing on m-c I feel we remove a chunk of effort by using window as proto.
(Assignee)

Comment 2

5 years ago
Created attachment 631053 [details] [diff] [review]
windowproto.patch

This patch illustrates the change at a code level.  Current tests in the hg repo still pass with this change, as well the full addon continues to work barring the change of our window.cookie back to document.cookie.
(In reply to Shane Caraveo (:mixedpuppy) from comment #1)
> Disadvantages:
> 
> - inability to move to a real Worker in the future

This is the crux of my argument.  IMO, an inability to move to a real Worker in the future while still keeping the "worker-like" model is a poor decision to make at this stage, especially given there are no specific blocking issues for our providers which this would solve.
(In reply to Shane Caraveo (:mixedpuppy) from comment #2)
> Created attachment 631053 [details] [diff] [review]
> windowproto.patch
> 
> This patch illustrates the change at a code level.  Current tests in the hg
> repo still pass with this change, as well the full addon continues to work
> barring the change of our window.cookie back to document.cookie.

To be clear, my argument isn't that the change is difficult.  My argument is simply that the change is wrong from a strategic POV.

Comment 5

5 years ago
Testing my understanding: can we do a limited exposure of "window", perhaps as unsafeWindow, and not provide "document"?  This would make clear our intent to move to a more W3C-Worker-like system when possible, while providing a simple path for implementors to move forward today.
(Assignee)

Comment 6

5 years ago
(In reply to Michael Hanson from comment #5)
> Testing my understanding: can we do a limited exposure of "window", perhaps
> as unsafeWindow, and not provide "document"?  This would make clear our
> intent to move to a more W3C-Worker-like system when possible, while
> providing a simple path for implementors to move forward today.

yes, we can do that, but I would rather not go half way.  This is what the addonsdk did, and either you can live in that api or you cannot, when you cannot, you have unsafeWindow littered everywhere in your code.  If we do it, lets at least call it "unsupportedWindow", the term unsafe made me feel like I was creating security issues with every use.


I'd much rather have the web as the api rather than having to do a postmessage to get my cookies.
(Assignee)

Updated

5 years ago
Depends on: 755122
(Assignee)

Comment 7

5 years ago
adding dependencies on all the frameworker related bugs for now, some may get removed later if not important
Depends on: 756173, 756588, 759219, 751241, 755482
The point is that strategically we want to stick with Workers.  Therefore, using a window as the prototype should be seen purely as an implementation detail.  So IMO we should *not* expose unsafeWindow or window or the DOM, then we seem to be exactly where we want to be.

Shane, ISTM you are arguing that the Worker is not the correct long-term strategy.  That is a worthwhile decision to consider, but isn't what we are discussing here IIUC.  It we assume for now we want to stick with the Worker and *not* expose any of that stuff in the short or long term, what are our options?
(Assignee)

Updated

5 years ago
Assignee: nobody → mixedpuppy
(Assignee)

Comment 9

5 years ago
After prototyping a proxy object wrapping window and using that as the sandbox prototype, I'm inclined to leave the frameworker as it is.  The proxy adds more code that needs testing, and ends up being technically the same as what we have now.  It does not fix the array problem, and we already worked around the XHR issue.
amen to that :)  With no actual things the change would fix, I agree the case for change is poor.
(Assignee)

Comment 11

5 years ago
I still think window as proto is the right thing, but for a limited api, the proxy adds nothing.

Updated

5 years ago
Whiteboard: [ms1]

Comment 12

5 years ago
Shane, where are we with this? Is this the patch we are using? Is there still an unresolved issue? It would be great if you can summarize.
(Assignee)

Comment 13

5 years ago
The patch attached wont be used, but there are a couple platform issues I attached to this bug that still need to be resolved at this point that are attached to this bug, I wouldn't make any of those a blocker for and initial landing.
So should we just close this?
No, we should use this bug to track getting the module landed in mozilla-central. We discussed this on IRC - a patch against mozilla-central putting the module in toolkit/components/social should be sufficient. Bug 762579 includes some build goop to make that easier, so we can wait for that to land and build on top of it.
(Assignee)

Comment 16

5 years ago
Created attachment 634671 [details] [diff] [review]
frameworker implementation patch

Here is the implementation patch for initial review.  We are having a windows/nightly/sandbox bug that is being looked into, but I feel we can get a code review started on this in parallel.
Attachment #631053 - Attachment is obsolete: true
Attachment #634671 - Flags: review?(gavin.sharp)
(Assignee)

Comment 17

5 years ago
Created attachment 634680 [details] [diff] [review]
frameworker implementation patch

new patch works around previously mentioned nightly/windows issue
Attachment #634671 - Attachment is obsolete: true
Attachment #634671 - Flags: review?(gavin.sharp)
Attachment #634680 - Flags: review?(gavin.sharp)

Updated

5 years ago
Blocks: 766403
Comment on attachment 634680 [details] [diff] [review]
frameworker implementation patch

Overall, this solution is more complicated than I thought - I suppose because it implements something that looks like the SharedWorker API, rather than using a simpler "hidden window with message passing" approach.

It seems like we'll need to also land the workerAPI module for this to actually be useful - I thought FrameWorker was more standalone.

Some comments on the code:

>diff -r d29af708ec3c toolkit/components/social/Frameworker.jsm

>+function AbstractPort(portid) {

>+  postMessage: function fw_AbstractPort_postMessage(data) {

>+    // There seems to be an issue with passing objects directly and letting
>+    // the structured clone thing work - we sometimes get:
>+    // [Exception... "The object could not be cloned."  code: "25" nsresult: "0x80530019 (DataCloneError)"]
>+    // The best guess is that this happens when funky things have been added to the prototypes.
>+    // It doesn't happen for our "control" messages, only in messages from
>+    // content - so we explicitly use JSON on these messages as that avoids
>+    // the problem.

Is there a bug filed about this? Would be good to reference the number here.

>+function FrameWorker(url, clientWindow, name) {

>+    // on OSX, using createElement fails, on Win, createElementNS fails with
>+    // NS_ERROR_NOT_AVAILABLE, so just try both :)
>+    let frame;
>+    try {
>+      frame = hiddenDoc.createElementNS(XUL_NS, "iframe");
>+    }
>+    catch (ex) {
>+      frame = hiddenDoc.createElement("iframe");
>+    }

Just unconditionally use createElementNS(HTML_NS, "iframe"), which will work in both cases (and will consistently give you an HTML iframe, vs. getting a XUL one on Mac).

>+        sandbox.importScripts = function fw_importScripts() {

>+            let scriptURL = workerURI.resolve(uri);
>+            if (scriptURL.indexOf(workerURI.prePath) != 0) {

This same-origin check isn't sufficient - it could let a worker.com FrameWorker load e.g. http://worker.comwebsite.ru/secretdatas.js. Granted that's not very likely, but there are other edge cases that aren't handled by this either, since the load actually occurs from chrome - things like content policies, other security checks, etc. Can this functionality be implemented entirely in the sandbox, rather than in chrome? Then you don't need to worry about doing any of your own security checks at all.

>+        // and we delegate ononline and onoffline events to the worker.
>+        // See http://www.whatwg.org/specs/web-apps/current-work/multipage/workers.html#workerglobalscope

Are these currently used by anyone? Are there tests for this? I'd like to try an reduce the exposed functionality as much as possible, so if these are just "nice to haves" it would be good to remove them. (This applies to everything we're adding to the sandbox.)

>+    // doc.documentElement on the hiddenWindow is not working on windows,
>+    // doc.body does work.
>+    let container = hiddenDoc.body ? hiddenDoc.body : hiddenDoc.documentElement;

"not working" how? doc.documentElement should return the <html> element on non-Mac, and inserting an iframe into that should also work fine.

You'll need to re-base this on top of bug 762579, which landed on m-c early this morning. Have you had a chance to run this on try to make sure the tests are OK?
Attachment #634680 - Flags: review?(gavin.sharp)
(Assignee)

Comment 19

5 years ago
(In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment #18)
> Comment on attachment 634680 [details] [diff] [review]
> frameworker implementation patch
> 
> Overall, this solution is more complicated than I thought - I suppose
> because it implements something that looks like the SharedWorker API, rather
> than using a simpler "hidden window with message passing" approach.

That is because we hoped to move to a real worker at some point in the future, so we want to mimic as close as possible how they work.

> It seems like we'll need to also land the workerAPI module for this to
> actually be useful - I thought FrameWorker was more standalone.

workerAPI is specific to our use of frameworker for socialapi, it is not necessary for other uses.

> Some comments on the code:
> 
> >diff -r d29af708ec3c toolkit/components/social/Frameworker.jsm
> 
> >+function AbstractPort(portid) {
> 
> >+  postMessage: function fw_AbstractPort_postMessage(data) {
> 
> >+    // There seems to be an issue with passing objects directly and letting
> >+    // the structured clone thing work - we sometimes get:
> >+    // [Exception... "The object could not be cloned."  code: "25" nsresult: "0x80530019 (DataCloneError)"]
> >+    // The best guess is that this happens when funky things have been added to the prototypes.
> >+    // It doesn't happen for our "control" messages, only in messages from
> >+    // content - so we explicitly use JSON on these messages as that avoids
> >+    // the problem.
> 
> Is there a bug filed about this? Would be good to reference the number here.

I'll leave that for MarkH to answer.


> >+function FrameWorker(url, clientWindow, name) {
> 
> >+    // on OSX, using createElement fails, on Win, createElementNS fails with
> >+    // NS_ERROR_NOT_AVAILABLE, so just try both :)
> >+    let frame;
> >+    try {
> >+      frame = hiddenDoc.createElementNS(XUL_NS, "iframe");
> >+    }
> >+    catch (ex) {
> >+      frame = hiddenDoc.createElement("iframe");
> >+    }
> 
> Just unconditionally use createElementNS(HTML_NS, "iframe"), which will work
> in both cases (and will consistently give you an HTML iframe, vs. getting a
> XUL one on Mac).

I'll try that out.

> >+        sandbox.importScripts = function fw_importScripts() {
> 
> >+            let scriptURL = workerURI.resolve(uri);
> >+            if (scriptURL.indexOf(workerURI.prePath) != 0) {
> 
> This same-origin check isn't sufficient - it could let a worker.com
> FrameWorker load e.g. http://worker.comwebsite.ru/secretdatas.js. Granted
> that's not very likely, but there are other edge cases that aren't handled
> by this either, since the load actually occurs from chrome - things like
> content policies, other security checks, etc. Can this functionality be
> implemented entirely in the sandbox, rather than in chrome? Then you don't
> need to worry about doing any of your own security checks at all.

Good idea, we could inject the basic logic in a similar fashion that we inject the worker end of the ports.


> >+        // and we delegate ononline and onoffline events to the worker.
> >+        // See http://www.whatwg.org/specs/web-apps/current-work/multipage/workers.html#workerglobalscope
> 
> Are these currently used by anyone? Are there tests for this? I'd like to
> try an reduce the exposed functionality as much as possible, so if these are
> just "nice to haves" it would be good to remove them. (This applies to
> everything we're adding to the sandbox.)

IIUC this is part of the worker api, but I am not certain.  Mark?


> >+    // doc.documentElement on the hiddenWindow is not working on windows,
> >+    // doc.body does work.
> >+    let container = hiddenDoc.body ? hiddenDoc.body : hiddenDoc.documentElement;
> 
> "not working" how? doc.documentElement should return the <html> element on
> non-Mac, and inserting an iframe into that should also work fine.

That was a problem Mark ran into on windows, I can retest now that I have VMs again.

> You'll need to re-base this on top of bug 762579, which landed on m-c early
> this morning. Have you had a chance to run this on try to make sure the
> tests are OK?

I can pull the latest and rebase.  The tests did run and pass.

Shane
(Assignee)

Comment 20

5 years ago
Created attachment 635518 [details] [diff] [review]
frameworker implementation patch

rebased patch in case anyone needs it right away, changes for comments yet to come.
Attachment #634680 - Attachment is obsolete: true
(In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment #18)
> Comment on attachment 634680 [details] [diff] [review]
> frameworker implementation patch
> 
> >+function AbstractPort(portid) {
> 
> >+  postMessage: function fw_AbstractPort_postMessage(data) {
> 
> >+    // There seems to be an issue with passing objects directly and letting
> >+    // the structured clone thing work - we sometimes get:
> >+    // [Exception... "The object could not be cloned."  code: "25" nsresult: "0x80530019 (DataCloneError)"]
> >+    // The best guess is that this happens when funky things have been added to the prototypes.
> >+    // It doesn't happen for our "control" messages, only in messages from
> >+    // content - so we explicitly use JSON on these messages as that avoids
> >+    // the problem.
> 
> Is there a bug filed about this? Would be good to reference the number here.

Sadly I couldn't manage to reproduce it either (although I didn't spend a huge amount of time on that).  It has only been seen when using the real Amigo code, so I don't have full visibility on the underlying cause.  My and Amigo's best guess is that some of their prototype changes which could cause that.  I asked on #developers and there was a vague recollection by someone that property accessors on an object could cause a similar problem, but I never got as far as an actual repro I could add to tests.

> >+        // and we delegate ononline and onoffline events to the worker.
> >+        // See http://www.whatwg.org/specs/web-apps/current-work/multipage/workers.html#workerglobalscope
> 
> Are these currently used by anyone? Are there tests for this? I'd like to
> try an reduce the exposed functionality as much as possible, so if these are
> just "nice to haves" it would be good to remove them. (This applies to
> everything we're adding to the sandbox.)

I believe Amigo is using this as a trigger to reconnect to the mqtt server when navigator comes back online.  I think onoffline is also used to break the connection, but this is probably less important as the connection will "naturally" die anyway.

I think these are the only unanswered questions you asked - let me know if I missed anything!
(Assignee)

Comment 22

5 years ago
Created attachment 635835 [details]
patch with comments addressed

for testing by gavin
(Assignee)

Comment 23

5 years ago
Created attachment 635965 [details] [diff] [review]
frameworker xhr relative url failure

This patch shows the failure of using a relative url with the sandbox xhr.  The new importScripts which is injected into the sandbox does not try to resolve the urls passed in.  The error received is:

Exception... "The URI is malformed"  nsresult: "0x804b000a (NS_ERROR_MALFORMED_URI)"  location: "JS frame :: <injected port handling code> :: importScripts :: line 1"  data: no
Attachment #635518 - Attachment is obsolete: true
Attachment #635835 - Attachment is obsolete: true
(Assignee)

Comment 24

5 years ago
Created attachment 636563 [details]
frameworker passing xhr relative url

This is using XHR from the iframe window.  To get that to work, we have to use iframe.contentWindow.wrappedJSObject as the priniciple for the sandbox.
Attachment #635965 - Attachment is obsolete: true
(Assignee)

Comment 25

5 years ago
Created attachment 637231 [details]
frameworker implementation patch

This revision:
- turns off several content features
- returns to using frame.contentWindow.wrappedJSObject for the sandbox principle
- uses XHR from the iframe for importScripts, supporting relative urls properly relies on the above point
- moves importScripts to an injected api, allowing the iframe XHR to handle same-origin policy etc
Attachment #637231 - Flags: review?(gavin.sharp)

Updated

5 years ago
Whiteboard: [ms1] → [Fx16]
Depends on: 769771
Created attachment 638019 [details] [diff] [review]
re-factor

I took the latest patch here and made a few changes:
- uses an HTML iframe consistently, fixed bug 769771 to add ability to make it type="content" and get a content window
- fix issue discovered by MattN in bug 762993: script shouldn't be able to access window.top, so set isBrowserFrame on the iframe docshell
- gets rid of the window unwrapping, except in the one place where it's apparently needed (workerAPI copying)
- splits the code in the frameworker constructor up into pieces to help readability
- refactors into separate objects:
  - "FrameWorkerObject" for the actual frameworker (holds on to the frame, sandbox, ports), replacing "workerInfo"
  - "WorkerHandle" for the thing returned from the FrameWorker constructor (which seems like an odd object when looked at this way - it's only useful property is "port", which points to one specific port, but its "terminate" function applies to the entire frameworker and all of its ports).
- added a test for localStorage (ran into some issue while developing)
- added a test to ensure that the frameworker window isn't a chrome window

This is otherwise compatible with the existing module, I think, and the tests in the patch all pass on Mac (and should work on Win/Linux: try run at https://tbpl.mozilla.org/?tree=Try&rev=1a8a786fa67e ).

other minor changes:
- change the filename to FrameWorker.jsm to match the exported symbol name
- various minor cleanups

desired future changes:
- change the exported symbol to be something like "GetFrameWorkerHandle", since it doesn't actually return a FrameWorker and isn't a typical object constructor. We can then rename "FrameWorkerObject" to be "FrameWorker". I waited for this since it involves changing all the consumers.
Attachment #636563 - Attachment is obsolete: true
Attachment #637231 - Attachment is obsolete: true
Attachment #637231 - Flags: review?(gavin.sharp)
Attachment #638019 - Flags: review?(mixedpuppy)
Attachment #638019 - Flags: review?(mhammond)
Comment on attachment 638019 [details] [diff] [review]
re-factor

Looks great.  One nit is that in some cases a pattern of ".bind(this)" is used and in others a pattern of "let self = this;" is used to gain access to 'this' in a callback.

One other point that also existed in the previous version is that the functions "addEventListener" and "postMessage" are exposed directly in the sandbox.  Shane and I went around in circles on this a few times, but I'd prefer to see these exposed with, say, a leading underscore, so there is at least a hint that these functions are internal implementation details and should not be used directly by a provider (as these functions do not exist in the global scope of a worker).
Attachment #638019 - Flags: review?(mhammond) → review+
(Assignee)

Comment 28

5 years ago
Comment on attachment 638019 [details] [diff] [review]
re-factor

verified tests on m-c and with addon, both on osx
Attachment #638019 - Flags: review?(mixedpuppy) → review+
(In reply to Mark Hammond (:markh) from comment #27)
> Looks great.  One nit is that in some cases a pattern of ".bind(this)" is
> used and in others a pattern of "let self = this;" is used to gain access to
> 'this' in a callback.

In the one case where I use "self", I need to be able to reference the function object by name (to remove it as an observer). I could cache the bound function somehow, but using the old self trick seems simpler.
 
> One other point that also existed in the previous version is that the
> functions "addEventListener" and "postMessage" are exposed directly in the
> sandbox.  Shane and I went around in circles on this a few times, but I'd
> prefer to see these exposed with, say, a leading underscore, so there is at
> least a hint that these functions are internal implementation details and
> should not be used directly by a provider (as these functions do not exist
> in the global scope of a worker).

OK, I'll make this change.
Created attachment 638698 [details] [diff] [review]
additional review changes

Tests pass with this.
Attachment #638698 - Flags: feedback?(mhammond)
Created attachment 638724 [details] [diff] [review]
as landed

https://hg.mozilla.org/integration/mozilla-inbound/rev/14351e2f3937
Attachment #638019 - Attachment is obsolete: true
Target Milestone: --- → Firefox 16

Comment 32

5 years ago
w00t!
Also landed some additional tests:
https://hg.mozilla.org/integration/mozilla-inbound/rev/242ce0073d97
Flags: in-testsuite+
https://hg.mozilla.org/mozilla-central/rev/14351e2f3937
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
https://hg.mozilla.org/mozilla-central/rev/242ce0073d97
Comment on attachment 638698 [details] [diff] [review]
additional review changes

clearing feedback request as this has landed
Attachment #638698 - Flags: feedback?(mhammond)
Blocks: 784508
You need to log in before you can comment on or make changes to this bug.