Closed Bug 1231784 Opened 4 years ago Closed 4 years ago

Install specialpowers and mochikit at runtime via AddonManager.loadTemporaryAddon()

Categories

(Testing :: General, defect)

defect
Not set

Tracking

(firefox46 fixed, firefox47 fixed)

RESOLVED FIXED
mozilla47
Tracking Status
firefox46 --- fixed
firefox47 --- fixed

People

(Reporter: ahal, Assigned: ahal)

References

Details

Attachments

(3 files, 1 obsolete file)

In order to bypass the signing requirement, specialpowers will need to be loaded on the fly via a special API.

Unsigned restartless addons can be installed via the gui in about:debugging via the "Load Temporary Addon" button. The code that gets triggered on click is here:
https://dxr.mozilla.org/mozilla-central/source/devtools/client/aboutdebugging/aboutdebugging.js#115

We'll need to add the "installTemporaryAddon()" api to marionette. We'll also need to remove xul overlays in mochitest so the tests don't run right at startup.
Blocks: 1233200
Depends on: 1239330
Depends on: 1239352
Summary: Install specialpowers via AddonManager.loadTemporaryAddon() in mochitest → Install specialpowers and mochikit at runtime via AddonManager.loadTemporaryAddon()
Doing the same with mochitest means we won't need to get it signed, which would have been a huge source of headaches.
Duplicate of this bug: 1224294
Good news: I made mochitest restartless and am able to run the harness by installing it at runtime.
Bad news: doing this hits the same e10s content==null problem I had in bug 1224294
Worse news: instead of importing, I tried to use document.loadOverlay() in the hopes that it would work similarly to the chrome.manifest installed overlay. But I still get the content==null error.

:mconley said that automatic CPOW shims are enabled for addon compartments. Either the problem isn't automatic CPOW shims, or I have no idea what constitutes an addon compartment. Either way, I'll try and get someone who knows about this stuff to take a deeper look at this.
I've verified that the problem is related to the addon interpolation not being turned on. Addon interpolations are enabled for compartments that contain an 'addonId' metadata and for which SetAddonInterpolation was called [1]. In other words, interpolations are tied to a specific JSContext. Given that the interpolation works without my patch, my best guess is that doing `loadSubScript("URI", win); win.testInit();` is switching to a compartment outside of the "mochikit@mozilla.org" scope.

While digging around, I noticed there is an undocumented addonId option to Cu.Sandbox [2]. This appears like it should set the 'addonId' property on the JSContext's compartment such that the interpolation will be enabled. My idea was to create a new sandbox with "mochikit@mozilla.org" as it's addonId, and then run the test harness inside of that sandbox using Cu.executeInSandbox(). I still think this could be a solution, but I haven't been able to get it working. I have two ideas about why it might not be working.

1. The JSContext is not whitelisted. Notice each JSContext needs to be specifically whitelisted when setting the interposition [2]. I think that while my sandbox's compartment has the proper addonId, it's still a separate JSContext and needs to be whitelisted again. Unfortunately the UpdateInterpositionWhitelist function is really misleading, as it can only be called once per interposition, making it impossible to add a second JSContext to the whitelist after the fact. This should be easy to fix assuming there are no objections.

2. The browser-test harness is switching to a new JSContext *within* the sandbox. While the sandbox global scope is running within the proper compartment, my theory is that the harness is still switching to a different JSContext, likely when we open the second browser-harness.xul window and continue execution there [3]. If this is the case I need to create the sandbox from within browser-harness.xul after the new window is created. Due to limitations of what can be copied into sandbox, this requires a moderate refactor.

I'm going to continue poking at this, but I'm sort of thrashing about in the water here. Would appreciate all the help I can get.
Hey Gabor, first off my apologies for involving you in this.. but you seem like a good person to ask.

Would you mind taking a quick read over comment 4 and letting me know if my theory makes any sense? Am I on the right track? Am I missing anything obvious? Should I be asking someone else instead?

Thanks, I really appreciate it.
Flags: needinfo?(gkrizsanits)
Attached patch Latest WIP (obsolete) — Splinter Review
Here's my latest wip patch btw in case anyone is interested. The change that has caused this problem has been removing browser-test-overlay.xul and using a bootstrap.js instead. I've been using:
./mach mochitest -f browser --e10s dom/indexedDB

as a test case. This also requires the patch in bug 1239330 (which is landing soon).
Assignee: nobody → ahalberstadt
Status: NEW → ASSIGNED
Attachment #8709154 - Attachment description: temp.patch → Latest WIP
This is a quite complex and confusing area. Let me try to explain how interposition
work.
1. First you have to register an interposition by calling Cu.setAddonInterposition(addonid, interposition);
2. From this point on the engine will use that interposition on every scope that is tagged
by this addonid. But one has to create these scopes AFTER the interposition registration!
For tagging a sandbox scope one must set the addonId option for it as you
discovered. For other scopes like xul stuff the engine determines if it belongs to the addon
by checking its location. For example scripts in a XUL document: http://mxr.mozilla.org/mozilla-central/source/dom/xul/XULDocument.cpp#3504
Important part is that all the test runner scopes that you want interposition on must be tagged
with the addonid, to achieve that you must tag the sandboxes manually and put all the xul stuff in the right location or tinder with MapURIToAddonId http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/extensions/AddonPathService.cpp#205 to return the right addonId for your xul pages.

(In reply to Andrew Halberstadt [:ahal] from comment #4)
> I've verified that the problem is related to the addon interpolation not
> being turned on. Addon interpolations are enabled for compartments that
> contain an 'addonId' metadata and for which SetAddonInterpolation was called
> [1]. In other words, interpolations are tied to a specific JSContext. Given
> that the interpolation works without my patch, my best guess is that doing
> `loadSubScript("URI", win); win.testInit();` is switching to a compartment
> outside of the "mochikit@mozilla.org" scope.

I see that you spent quite some time to understand what is going on and I know it's hard to understand
all this from looking at code, so here are a few things I think you didn't get entirely right.

Usually one scope/global has a 1-to-1 relationship with a compartment. Compartment is a memory chunk that holds the objects that are created within the scope of a global. You should not care about JSContext too much, it contains some info for the JS engine, important part is that when you execute some JS code you set the JSContext to operate on a compartment. If a function is called its code will run with its globals compartment. So if you have a function in sandbox1 and call it from let's say a script of a xul document, or from anywhere else, its code will still be executed in sandbox1's compartment.

> 
> While digging around, I noticed there is an undocumented addonId option to
> Cu.Sandbox [2]. This appears like it should set the 'addonId' property on
> the JSContext's compartment such that the interpolation will be enabled. My
> idea was to create a new sandbox with "mochikit@mozilla.org" as it's
> addonId, and then run the test harness inside of that sandbox using
> Cu.executeInSandbox(). I still think this could be a solution, but I haven't
> been able to get it working. I have two ideas about why it might not be
> working.
> 
> 1. The JSContext is not whitelisted. Notice each JSContext needs to be
> specifically whitelisted when setting the interposition [2]. I think that

I don't know what you mean whitelisted. The code you linked here is not doing
any whitelisting.

> while my sandbox's compartment has the proper addonId, it's still a separate
> JSContext and needs to be whitelisted again. Unfortunately the
> UpdateInterpositionWhitelist function is really misleading, as it can only
> be called once per interposition, making it impossible to add a second
> JSContext to the whitelist after the fact. This should be easy to fix
> assuming there are no objections.

Oh, you bumped into UpdateInterpositionWhitelist... it's a different interposition
for function objects. Don't worry about it you don't have to deal with that at all.
It's an entirely different beast you don't need for this bug.

> 
> 2. The browser-test harness is switching to a new JSContext *within* the
> sandbox. While the sandbox global scope is running within the proper
> compartment, my theory is that the harness is still switching to a different
> JSContext, likely when we open the second browser-harness.xul window and
> continue execution there [3]. If this is the case I need to create the
> sandbox from within browser-harness.xul after the new window is created. Due
> to limitations of what can be copied into sandbox, this requires a moderate
> refactor.

I hope I answered this question.

> 
> I'm going to continue poking at this, but I'm sort of thrashing about in the
> water here. Would appreciate all the help I can get.

Let me know if it made any sense what I've tried to explain so far.
Flags: needinfo?(gkrizsanits)
Thanks, that helps a lot! In fact, I think I figured out how to get it working. I believe there is a bug in XPIProvider.jsm, where installTemporaryAddon doesn't call XPIProvider._addURIMapping [1] (actually, I think this mapping isn't updated anytime AddonManager is used to install addons directly). Because of this, the mochikit@mozilla.org addonId wasn't getting returned by MapURIToAddonID when it should be.

After fixing that, the interposition started working when I used document.loadOverlay() to import my scripts. (I still couldn't get the mozIJSSubScript method to work, but at this point I don't care to understand why as long as there's a working alternative). I'll do some more testing and try runs after lunch to make sure everything is working, and I'll file a new bug under Toolkit::Add-ons Manager to tackle the missing uri mapping.

[1] https://dxr.mozilla.org/mozilla-central/source/toolkit/mozapps/extensions/internal/XPIProvider.jsm#3838
Depends on: 1240875
The dreaded e10s interposition bug seems to be fixed \o/.

Now I'm running into a problem where installing temporary addons is hitting the network, and causing a crash due to MOZ_DISABLE_NONLOCAL_CONNECTIONS:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b1364bdf4335

I've verified it's happening somewhere in AddonManager.installTemporaryAddons(). I chatted on #amo and it's expected that installing addons would ping some servers. But mochitest *should* be setting all the proper prefs to disable this [1]. So either installTemporaryAddons() is ignoring one of those prefs, or there's a pref that the mochitest harness is missing.

The weird part is that I can't reproduce this locally. I'm trying to turn on NSPR logging to see the network request in a log, but for some reason the uploaded NSPR log on try is empty :/.. Even weirder, e10s mochitest plain jobs pass even though non-e10s plain and e10s browser-chrome both fail.
Finally got NSPR logging working, and turns out it was caused by some kind of first run experience. The fact that firefox crashed during specialpowers installation was probably a coincidence. I don't really know why my patch caused this first run experience, but I was able to fix it by setting 'startup.homepage_welcome_url' to 'about:blank'.
Starting to look a little better:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=4ca91e7289b3&group_state=expanded

Down to individual test failures, and mochitest-chrome still has some problems. I'll likely need help fixing up the failing tests, will file separate bugs. Once these are fixed I'll try turning on xpinstall.signatures.required and we can start signing test extensions.
Attachment #8709154 - Attachment is obsolete: true
Depends on: 1241996
Fixed android and b2g mochitest-chrome, as well as two of the test failures. Only thing left is the test_0011_check_basic.xul failure in M-oth.
I had all the failures fixed, but noticed that I regressed bug 1175318. I think I also have that fixed now, just waiting for the new try results to come in:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5e5303f21a65&group_state=expanded

With luck, I'll finally be able to flag for review on this soon.
There's a bit of orange in there, but they all have intermittent bugs on file already. It's possible my patch increases their frequency, but that would still be the test's fault. I think landing, and letting the sheriffs make a call about whether the tests need to be disabled or not is ok here. I'll let a sheriff know too.
Attachment #8711123 - Attachment description: MozReview Request: Bug 1231784 - Install specialpowers and mochikit extensions at runtime via AddonManager.loadTemporaryAddon() → MozReview Request: Bug 1231784 - Install specialpowers and mochikit extensions at runtime via AddonManager.loadTemporaryAddon(), r=jgriffin
Attachment #8711123 - Flags: review?(jgriffin)
Comment on attachment 8711123 [details]
MozReview Request: Bug 1231784 - Install specialpowers and mochikit extensions at runtime via AddonManager.loadTemporaryAddon(), r=jgriffin

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/31991/diff/1-2/
Noticed that the dt5 failures in my try run recently got fixed, and the JP linux debug jobs are actually hidden. So I seem to be in the clear.
Attachment #8713635 - Flags: review?(dtownsend) → review+
Comment on attachment 8713635 [details]
MozReview Request: Bug 1231784 - Fix toolkit/mozapps/extensions/test/browser/browser_select_confirm.js, r=Mossop

https://reviewboard.mozilla.org/r/32793/#review29663

I guess in a better world that UI would just ignore temporary add-ons, but given that there shouldn't be any when it runs for real users, and we're not even showing that UI anymore it's probably not worth doing that. Maybe we should get rid of that code entirely if we're never going to use it again...
Attachment #8713634 - Flags: review?(robert.strong.bugs) → review+
Comment on attachment 8713634 [details]
MozReview Request: Bug 1231784 - Fix toolkit/mozapps/update/chrome tests, r=rstrong

https://reviewboard.mozilla.org/r/32791/#review29691

LGTM
Blocks: 1233202
Attachment #8711123 - Flags: review?(jgriffin) → review+
Comment on attachment 8711123 [details]
MozReview Request: Bug 1231784 - Install specialpowers and mochikit extensions at runtime via AddonManager.loadTemporaryAddon(), r=jgriffin

https://reviewboard.mozilla.org/r/31991/#review30049

LGTM. The linux opt m-e10s dt5 failure on the try run looks a bit suspicious though.
If you look at the try runs before mine, it's near permafail in those too. Then bug 1204174 landed after that try push to supposedly mitigate the timeouts in those tests. I can do another run though, just in case.
Sigh, there's another test failing now, though in the same directory:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6f57461282b2

It's still quite orange below my push, but I guess my push somehow makes it a bit slower? It definitely looks permafail on my push. The good news is that someone grabbed the bug 2 hours ago.. so hopefully there is a fix coming soon.
Depends on: 1241570
No longer depends on: 1241570
KWierso pointed out that that intermittent got much worse recently and is just as orange on inbound. I must have pushed my try run just after the thing that spiked up the failure rate. So I'm just going to land this and hope for the best #yolo.
If it turns out that this does make that intermittent permafail after all, I'll add a requestLongerTimeout call until bug 1231784 can be fixed properly.
If it takes you more than ten minutes to figure out what's going on, you have a couple of options: treat bug 1206321 comment 2 et seq. as license to just disable that test (across all platforms, it also hung on 10.6 the few times we got around to running it, and it hanging on Windows in a different way without your patch is either the first or second most common Jetpack intermittent), or, read that and then say "philor, wouldn't you say that that means the Jetpack suite doesn't meet the very first requirement in https://wiki.mozilla.org/Sheriffing/Job_Visibility_Policy#Requirements_for_jobs_shown_in_the_default_Treeherder_view ?" and *poof* it's gone.
Ah, 10.10 isn't run by default on try.. I was wondering how I missed that. Though I see I can enable it explicitly in my try run, is there a reason for that?

It looks like for some reason 'nukeSandbox' is no longer a method on Components.utils in the test context. I see that other places in the codebase have something like if ('nukeSandbox' in Cu).. so maybe I just need to add a check like that.
Flags: needinfo?(ahalberstadt)
We bought new hardware to run 10.10 on, but we didn't buy new rack space, which required shutting off existing things to put the new ones in their place, which required dropping the load. The shrieking about the result and pace of this method never reaches the ears of the people doing it, so it has turned out to be a great method. For them.
Right after uninstalling the test addon I get the following error:
 15:24:46 INFO - JavaScript error: resource://gre/modules/addons/XPIProvider.jsm -> jar:file:///var/folders/2_/5bvqc7_51034r2hmy_t0wjpm00000w/T/tmpUew2qN.mozrunner/extensions/test-addon-author-email@jetpack.xpi!/bootstrap.js, line 299: NS_ERROR_ILLEGAL_VALUE: Component returned failure code: 0x80070057 (NS_ERROR_ILLEGAL_VALUE) [nsIXPCComponents_Utils.nukeSandbox]

Unfortunately, I only have bad news:
1. It only happens on OSX opt (OSX debug and linux opt both work fine)
2. It's a failure in the harness between tests, so I can't disable my way to victory here (unless I take philor's hint and disable the whole suite)
3. I don't have a mac, so I'll have to try-debug this which will be *slow*
To add to the weirdness, I actually reproduced this failure on 10.6 in this try run:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5e5303f21a65&filter-searchStr=osx%20opt%20jp

But the retrigger came back green. So this is intermittent on 10.6 opt and permafail on 10.10 opt. This almost definitely an existing race condition in the jetpack-addons harness that got made more prevalent by my patch.
Depends on: 1142734
Phew. Mossop fixed bug 1142734, and it looks like JP is green again:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d9ecc4404e0a

Here's another full try run:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9643f33e2910&group_state=expanded

So far looks good. Just waiting on the windows jobs.
backed out the last commit for test failures like https://treeherder.mozilla.org/logviewer.html#?job_id=21531194&repo=mozilla-inbound
Flags: needinfo?(ahalberstadt)
What happened here was that browser_select_confirm.js actually got removed in bug 1244302. Somehow I managed to add that file back via an automatic merge and didn't notice. Given that that change was simply to fix the test, and the test no longer exists, there's no further action needed here.
Flags: needinfo?(ahalberstadt)
Attachment #8713635 - Attachment is obsolete: true
https://hg.mozilla.org/mozilla-central/rev/9d4f6a42a6e8
https://hg.mozilla.org/mozilla-central/rev/c6a623bd18f1
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
We'll also want to land this on aurora for Firefox 46, as this is blocking addon signing. Here's a try run:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9d93197880b4
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 1248056
Here's a try run on aurora btw:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9d93197880b4

Looks good, though I'll need to backport the fix from bug 1142734 as well as a couple minor follow-up fixes for mach.
Comment on attachment 8713635 [details]
MozReview Request: Bug 1231784 - Fix toolkit/mozapps/extensions/test/browser/browser_select_confirm.js, r=Mossop

We'll need this patch for aurora.
Attachment #8713635 - Attachment is obsolete: false
Follow-up fix for some lurking Beta non-local connection bustage that showed up in the Try uplift simulations.

https://hg.mozilla.org/integration/mozilla-inbound/rev/278d8d654f2f
https://hg.mozilla.org/releases/mozilla-aurora/rev/cbe77570408e
https://hg.mozilla.org/mozilla-central/rev/278d8d654f2f
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
Blocks: 1252067

Andrew, just a heads up that app update no longer checks updates for add-ons as of a few years ago so the following is no longer necessary for app update.
https://searchfox.org/mozilla-central/source/testing/mochitest/install.rdf#13

Flags: needinfo?(ahal)
Flags: needinfo?(ahal)
You need to log in before you can comment on or make changes to this bug.