Closed Bug 1309351 Opened 8 years ago Closed 8 years ago

Optimize the loading of built-in SDK modules, and load them into a shared compartment

Categories

(Add-on SDK Graveyard :: General, defect)

defect
Not set
normal

Tracking

(firefox50 verified, firefox51 verified, firefox52 verified)

VERIFIED FIXED
mozilla52
Tracking Status
firefox50 --- verified
firefox51 --- verified
firefox52 --- verified

People

(Reporter: kmag, Assigned: kmag)

References

Details

(Keywords: addon-compat)

Attachments

(3 files, 1 obsolete file)

      No description provided.
Attachment #8800094 - Flags: review?(poirot.alex)
Attachment #8800093 - Flags: review?(poirot.alex)
Attachment #8800094 - Attachment is obsolete: true
Comment on attachment 8800093 [details]
Bug 1309351: Part 1 - Replace child_process/subprocess.js with a thin wrapper around Subprocess.jsm.

https://reviewboard.mozilla.org/r/85120/#review83782

Hi Kris,
Besides the more detailed feedback that follows, here is some other details that I want to point out and that are not related to one of the files currently changed in this patch:

- the removed modules have to be removed from the `addon-sdk/moz.build` file (I'm sure that you already did this change locally and it is only missing from the patch, especially given that "./mach build" is going to fail without this change)

- besides the test addon that is in this patch, there is another test file, `addon-sdk/source/test/test-child-process.js`, which is much more detailed, it has been disabled in Bug 979675 because it was intermittent, but, even if we don't re-enable it on the build infrastructure, it can be used locally to help us to catch early some of the more subtle differences between the two implementation.

::: addon-sdk/source/lib/sdk/system/child_process/subprocess.js:47
(Diff revision 2)
> +    callback(string);
> +});
>  
>  var subprocess = {
> -    call: function(options) {
> +  call: function(options) {
> -        options.mergeStderr = options.mergeStderr || false;
> +    let procPromise = Task.spawn(function*() {

If `pathSearch` (the one defined internally in the subprocess JSMs) fails to find the requested executable, then this promise will be rejected, and, at least by looking at "child_process.js", it looks like that we should catch its rejection and call the [`handleError` callback][1] (which was probably handled [here][2], when this kind of error was synchronously raised from the current SDK implementation).

[1]:  http://searchfox.org/mozilla-central/source/addon-sdk/source/lib/sdk/system/child_process.js#97
[2]: http://searchfox.org/mozilla-central/rev/cd1be634c9309c7fc99a3fde67dd44d343875f60/addon-sdk/source/lib/sdk/system/child_process.js#92

::: addon-sdk/source/lib/sdk/system/child_process/subprocess.js:83
(Diff revision 2)
> -                        console.warn("error decoding output: " + ex);
> -                        data = getBytes(event.data.data);
> -                    }
> -                }
>  
> -                callbackFunc(data);
> +      if (options.done)

Nit: even if in the SDK modules `if/else/etc.` blocks are often written without being surrounded by curly brackets when they have only one statement in it, if there are no reasons to keep the old coding style in this file (especially given that this module is almost completely rewritten), I would prefer to have curly brackets here (and around the other similar blocks) like we both love ;-)

::: addon-sdk/source/lib/sdk/system/child_process/subprocess.js:84
(Diff revision 2)
> -                        data = getBytes(event.data.data);
> -                    }
> -                }
>  
> -                callbackFunc(data);
> -                break;
> +      if (options.done)
> +        proc.wait().then(options.done);

When the `options.done` callback is called, stderr and stdout are supposed to be full drained from data, and any pending data should already be sent to any listener registered to the 'data' events of the `child.stdout` and `child.stderr` streams, so that `exec` and `execFile` can correctly return the collected data to their callback as expected.

::: addon-sdk/source/lib/sdk/system/child_process/subprocess.js:131
(Diff revision 2)
> -
> -            libc.close();
> -        }
> -    }
>  
> -    //main
> +    let result = {

it looks like there is a missing return here, this `result` object is supposed to be returned to the caller.

Also, in the previous version this result object had a `pid` property (which is supposed to be the spawned process pid on a unix system and, I suppose `undefined`, on windows).
Attachment #8800093 - Flags: review?(lgreco)
Another potential issue (but I didn't digged into it too much yet to say for sure) is that it seems possible that the stdin stream has not been hooked yet immediatelly after the child_process has been created, and if the caller close the stdin stream too fast, I suppose that it is possible that the child process will not notice that the stream has been closed (and in that case, it could fail to close the stdin of client process, and the child process will not be stopped when it happens)
Keywords: addon-compat
Comment on attachment 8800093 [details]
Bug 1309351: Part 1 - Replace child_process/subprocess.js with a thin wrapper around Subprocess.jsm.

https://reviewboard.mozilla.org/r/85120/#review83782

Ugh, disabled tests...

Yeah, I did remove the deleted modules from moz.build after I pushed to review.

> If `pathSearch` (the one defined internally in the subprocess JSMs) fails to find the requested executable, then this promise will be rejected, and, at least by looking at "child_process.js", it looks like that we should catch its rejection and call the [`handleError` callback][1] (which was probably handled [here][2], when this kind of error was synchronously raised from the current SDK implementation).
> 
> [1]:  http://searchfox.org/mozilla-central/source/addon-sdk/source/lib/sdk/system/child_process.js#97
> [2]: http://searchfox.org/mozilla-central/rev/cd1be634c9309c7fc99a3fde67dd44d343875f60/addon-sdk/source/lib/sdk/system/child_process.js#92

Yeah, that had occurred to me, but I decided to it since no tests failed, and this module isn't very widely-used and is deprecated anyway. But I suppose it is worth handling...

> Nit: even if in the SDK modules `if/else/etc.` blocks are often written without being surrounded by curly brackets when they have only one statement in it, if there are no reasons to keep the old coding style in this file (especially given that this module is almost completely rewritten), I would prefer to have curly brackets here (and around the other similar blocks) like we both love ;-)

Well, most of the loader code I've been mucking through has them written this way, but I'm happy to add braces.

> When the `options.done` callback is called, stderr and stdout are supposed to be full drained from data, and any pending data should already be sent to any listener registered to the 'data' events of the `child.stdout` and `child.stderr` streams, so that `exec` and `execFile` can correctly return the collected data to their callback as expected.

I think that should be the case here, but I guess it may be worth making sure of...

> it looks like there is a missing return here, this `result` object is supposed to be returned to the caller.
> 
> Also, in the previous version this result object had a `pid` property (which is supposed to be the spawned process pid on a unix system and, I suppose `undefined`, on windows).

*sigh* I was hoping that there were enough existing tests to cover at least the basics, but apparently there are not... All of the existing tests pass.
(In reply to Luca Greco [:rpl] from comment #5)
> Another potential issue (but I didn't digged into it too much yet to say for
> sure) is that it seems possible that the stdin stream has not been hooked
> yet immediatelly after the child_process has been created, and if the caller
> close the stdin stream too fast, I suppose that it is possible that the
> child process will not notice that the stream has been closed (and in that
> case, it could fail to close the stdin of client process, and the child
> process will not be stopped when it happens)

I don't think that will be an issue. A closed pipe is a closed pipe. In any case, the behavior is pretty similar to what it was before.
Comment on attachment 8800531 [details]
Bug 1309351: Part 2 - Use a shared global sandbox and simple module resolution for built-in modules.

https://reviewboard.mozilla.org/r/85448/#review84214

Looks good. But please give it a try on some significant sdk addons before landing as I don't know how badly that could interact with existing modules.
Having said that, if try is fine with that change, that's already a quite positive!

::: addon-sdk/source/lib/toolkit/loader.js:741
(Diff revision 1)
>      requireHook
>    } = loader;
>  
> +  if (isSystemURI(requirer.uri)) {
> +    isNative = false;
> +    loaderResolve = Loader.resolve;

A comment to say why you are toggle this would help!
Attachment #8800531 - Flags: review?(poirot.alex) → review+
Comment on attachment 8800531 [details]
Bug 1309351: Part 2 - Use a shared global sandbox and simple module resolution for built-in modules.

https://reviewboard.mozilla.org/r/85448/#review84214

I tested with GeckoProfiler, but I'll ask QA to do some more thorough testing.

> A comment to say why you are toggle this would help!

Yeah, good point...
Comment on attachment 8800093 [details]
Bug 1309351: Part 1 - Replace child_process/subprocess.js with a thin wrapper around Subprocess.jsm.

https://reviewboard.mozilla.org/r/85120/#review84276

::: addon-sdk/source/lib/sdk/system/child_process/subprocess.js:113
(Diff revisions 2 - 4)
>              return String.fromCharCode(...buffer);
>            });
>          };
> +      }
>  
>        if (options.stdout)

nit

curly brackets on every "if" with a single expression in the body or on none of them, mixing the two styles will only make easier to misread them.

::: addon-sdk/source/lib/sdk/system/child_process/subprocess.js:137
(Diff revisions 2 - 4)
> +    procPromise.catch(e => {
> +      Cu.reportError(e instanceof Error ? e : e.message || e);
> +      if (options.done)
> +        options.done({exitCode: -1});
> +    });

This definitely ensures that the rejection has been caught, but I'm not sure that it will always report the error in a useful way to the caller, e.g.:
when procPromise is rejected because the requested executable has not been found, this is going to report  the error to the caller as "Process exited with exit code -1" which is not what actually happened.

The real error (which is "Executable not found: ...") is going to be reported to the browser console, but it will not be easy for the caller to be able to handle the "command not found" differently from a command that exits with an error (because they will be indistinguible from the caller point of view).

Thanks Kris, this new version already seems to be able to provide a replacement for this module that doesn't behave too differently from the  current implementation, but I think that the error reported for a "command not found scenario", as describe above, is something that can be tweaked.

As an additional checks, I just tried to re-enable locally the test-child-process.js test file and this new version passes almost all the test cases in that test file, besides the following two:

- testExecFileCallbackError, it currently fails because a "command not found" error was reported as an NS_ERROR_FILE_UNRECOGNIZED_PATH, while the new error message reported to the callback is "Process exit code -1", like described in the above comment, I'm not saying that we have to report exactly the same error message at any cost, but "Process exit code -1" seems too generic as error message in this scenario.

- testChildStdinStreamLarge, it currently fails because it doesn't seem to be able to collect all the data piped into stdin (http://searchfox.org/mozilla-central/source/addon-sdk/source/test/test-child_process.js#420), it seems that when the stdin stream has been closed and the child process stopped, there is still data piped in the stdin stream that has not been completely read.
Attachment #8800093 - Flags: review?(lgreco)
Comment on attachment 8800093 [details]
Bug 1309351: Part 1 - Replace child_process/subprocess.js with a thin wrapper around Subprocess.jsm.

https://reviewboard.mozilla.org/r/85120/#review84276

> nit
> 
> curly brackets on every "if" with a single expression in the body or on none of them, mixing the two styles will only make easier to misread them.

In general, the rules for mixing bracketed and unbracked if statements is that multi-line if statements and chains including them always have brackets, and the rest don't.

> This definitely ensures that the rejection has been caught, but I'm not sure that it will always report the error in a useful way to the caller, e.g.:
> when procPromise is rejected because the requested executable has not been found, this is going to report  the error to the caller as "Process exited with exit code -1" which is not what actually happened.
> 
> The real error (which is "Executable not found: ...") is going to be reported to the browser console, but it will not be easy for the caller to be able to handle the "command not found" differently from a command that exits with an error (because they will be indistinguible from the caller point of view).

There's no way to handle this the same way as the old code without spinning the event loop so we can spawn the process synchronously, which is not a good option. This module is deprecated, anyway, and the wrapper is only a temporary stopgap, so I don't think it's worth worrying about.

I don't get either of those failures, and the current version definitely waits for all input to be processed before calling done()
Comment on attachment 8800093 [details]
Bug 1309351: Part 1 - Replace child_process/subprocess.js with a thin wrapper around Subprocess.jsm.

https://reviewboard.mozilla.org/r/85120/#review84312

If the module is deprecated and we are going to suggest to move to a different solution in any case, then there aren't probably too much reasons to worry about these details, anyway I thought it was something that worth a mention before proceeding. 

> > In general, the rules for mixing bracketed and unbracked if statements is that multi-line if statements and chains including them always have brackets, and the rest don't.

Yep. That's exactly what I meant.

> There's no way to handle this the same way as the old code without spinning the event loop so we can spawn the process synchronously, which is not a good option. This module is deprecated, anyway, and the wrapper is only a temporary stopgap, so I don't think it's worth worrying about.

Sounds good, but what I meant is that it seems to be still possible to report it with a better error message here: https://dxr.mozilla.org/mozilla-central/source/addon-sdk/source/lib/sdk/system/child_process.js#72
and it would be pretty similar to what happened before, because it seems to me that the child_process errors were never really reported synchronously: https://dxr.mozilla.org/mozilla-central/source/addon-sdk/source/lib/sdk/system/child_process.js#105

> I don't get either of those failures, and the current version definitely waits for all input to be processed before calling done()

I've got this tests running by re-enabling them locally, by decommenting this line: http://searchfox.org/mozilla-central/source/addon-sdk/source/test/test-child_process.js#545
Attachment #8800093 - Flags: review+
Comment on attachment 8800093 [details]
Bug 1309351: Part 1 - Replace child_process/subprocess.js with a thin wrapper around Subprocess.jsm.

https://reviewboard.mozilla.org/r/85120/#review84312

Yes, I did the same.

Apparently the IO error was a race, and there was a bug in the Subprocess.jsm polling code. I wasn't seeing it because when I ran the tests, the timing happened to work out, and the output got processed before the pipe closed.
https://hg.mozilla.org/integration/mozilla-inbound/rev/ed7605224e51a86662c2a77248bf66400f3ce6ea
Bug 1309351: Part 1 - Replace child_process/subprocess.js with a thin wrapper around Subprocess.jsm. r=rpl f=mhowell

https://hg.mozilla.org/integration/mozilla-inbound/rev/36b49ab783ae32db547d2304cd9c3c282bf7ee54
Bug 1309351: Part 2 - Use a shared global sandbox and simple module resolution for built-in modules. r=ochameau
https://hg.mozilla.org/mozilla-central/rev/ed7605224e51
https://hg.mozilla.org/mozilla-central/rev/36b49ab783ae
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Comment on attachment 8800531 [details]
Bug 1309351: Part 2 - Use a shared global sandbox and simple module resolution for built-in modules.

Approval Request Comment
[Feature/regressing bug #]: Bug 935109

[User impact if declined]:

The current version of this code has severe performance issues, and considerable unnecessary memory overhead. These problems are considerably worsened by the rollout of e10s to add-on users, where the memory and startup time overhead apply to both the parent and child processes.

[Describe test coverage new/current, TreeHerder]:

These changes only affect built-in SDK modules, and should be well-exercised by the existing test suite.

[Risks and why]:

Low. Since these changes only affect built-in SDK modules, the risks to add-ons in the wild should be minimal, unless they are explicitly trying to circumvent the encapsulation of built-in code. QA is currently testing these changes against popular and complex SDK add-ons, and should come across any large scale issues that are likely to arise.

[String/UUID change made/needed]: None.
Attachment #8800531 - Flags: approval-mozilla-beta?
Attachment #8800531 - Flags: approval-mozilla-aurora?
Comment on attachment 8800531 [details]
Bug 1309351: Part 2 - Use a shared global sandbox and simple module resolution for built-in modules.

I reviewed the patches attached to both of those bugs and I do *not* feel comfortable uplifting them to 50.0b10 especially given that QA is still ongoing. If rejecting these uplifts to Beta50 means we have fewer add-ons getting white listed in 50 then (sorry!) but that is a potential situation we need to prepare for.

The fact that this code has not stabilized on pre-beta channels sufficiently is a big concern. The risk mentioned in bug 1309350 is medium and the patches landed in Nightly yesterday which is clearly not enough bake time to minimize risk to quality when uplifting so late in the Beta cycle.
Attachment #8800531 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Just to clarify, this uplift does not just affect the number of add-ons being enabled in 50, it affects any user of Firefox who uses an SDK add-on, including the system add-ons.
This patch can land independently of the ones in bug 1309350, and it (along with the first patch in the other bug) has been in nightly for a week without issues.

QA has also done significant testing since I initially filed the uplift request, and hasn't found even a minor issue, so I'm fairly confident that these are low risk at this point.
Comment on attachment 8800531 [details]
Bug 1309351: Part 2 - Use a shared global sandbox and simple module resolution for built-in modules.

Enhance a performance & memeory issue related to built-in SDK modules. I'd like to take it in 51 aurora and see if any issues happen.
Attachment #8800531 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8800531 [details]
Bug 1309351: Part 2 - Use a shared global sandbox and simple module resolution for built-in modules.

There was long discussion on the benefits, risks and mitigation plan for this late uplift. We have decided to take this in 50, Beta+
Attachment #8800531 - Flags: approval-mozilla-beta- → approval-mozilla-beta+
We performed a lot of testing around this bug on Firefox 52.0a1 under Windows 10 64-bit, Windows 7 53-bit and Mac OS X 10.11.6.

We covered:
  - Popular add-ons with e10s on/off
  - Test Pilot add-ons with e10s on
  - Several add-ons with package_json and a large number of JS files with e10s on/off 


There were no regressions encountered during this testing.

More details about testing are available via: https://public.etherpad-mozilla.org/p/1309351


Tomorrow we will test across the other fixed versions.
I am setting the tracking flags accordingly.
Status: RESOLVED → VERIFIED
Depends on: 1313440
Manually verified this bug also on Firefox 51.0a2 (2016-10-27) and Firefox 50.0b11 (20161027110534). Tested a few add-ons and we concluded that they have maintained the same behavior. There were no new regressions found.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: