Closed Bug 1308656 Opened 8 years ago Closed 7 years ago

Land shield-recipe-client as system addon

Categories

(Firefox :: Normandy Client, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: mythmon, Assigned: mythmon)

References

Details

(Whiteboard: [go-faster-system-addon])

Attachments

(1 file, 1 obsolete file)

The shield-recipe-client is the client component of SHIELD [0]. It is intended to replace the self-repair hidden iframe, and handles fetching, verifying, and executing recipes from the Normandy server [1]. For an overview of the general system, see the Normandy Concepts docs [2].

The add-on provides a restricted sandbox for recipe actions to execute in, and provides "driver functions" for recipe actions to perform privileged actions. Right now this consists of

* logging facilities
* showing heartbeat prompts
* storing data reliably
* getting information about the client, such as browser version

For a complete list of driver functions, see the Driver API docs [3]. This list of driver functions will grow in the future to accommodate new recipe functionality.

The code for shield-recipe-client is developed at https://github.com/mozilla/normandy-addon. This is written as a SDK addon and built with JPM. The code checked into mozilla-central will be based on the xpi built from this repo. I talked with Rob Helmer about this, and we decided it will be ok for now. In the future we should remove the addon SDK code in favor of standard Firefox frontend code.

Mike Kelly announced our intent to ship this system addon on the Go Faster mailing list [4]. Several security and operations concerns have been covered in that thread.

[0]: https://wiki.mozilla.org/Firefox/SHIELD
[1]: http://normandy.readthedocs.io/en/latest/index.html
[2]: http://normandy.readthedocs.io/en/latest/dev/concepts.html
[3]: http://normandy.readthedocs.io/en/latest/dev/driver.html
[4]: https://mail.mozilla.org/pipermail/gofaster/2016-September/000434.html
Attachment #8799078 - Flags: review?(rhelmer)
Attachment #8799078 - Flags: review?(gijskruitbosch+bugs)
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

Julien, can you f? this from a security point of view?
Attachment #8799078 - Flags: feedback?(jvehent)
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

Rebecca, can you f? this from a data privacy of view?
Attachment #8799078 - Flags: feedback?(rweiss)
Assignee: nobody → mcooper
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review83120

Uhh. This commit just lists a 1-dir addition to a makefile.

Did you forget to hg add all the new files, or push the commit that has those?
Attachment #8799078 - Flags: review?(gijskruitbosch+bugs)
Oops, sorry. I forgot to hg add the new files. I've fixed that now on review board.
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review83236

::: browser/extensions/shield-recipe-client/LICENSE:1
(Diff revision 2)
> +Mozilla Public License Version 2.0

I don't think this needs its own license file.. the one in browser/LICENSE should be enough if it's MPL.

What is the license situation of the JEXL dependency? That probably does need to be declared.

::: browser/extensions/shield-recipe-client/bootstrap.js:11
(Diff revision 2)
> +const { utils: Cu } = Components;
> +const rootURI = __SCRIPT_URI_SPEC__.replace("bootstrap.js", "");
> +const COMMONJS_URI = "resource://gre/modules/commonjs";
> +const { require } = Cu.import(COMMONJS_URI + "/toolkit/require.js", {});
> +const { Bootstrap } = require(COMMONJS_URI + "/sdk/addon/bootstrap.js");
> +var { startup, shutdown, install, uninstall } = new Bootstrap(rootURI);

We have discussed this separately, but more people will see it here - we should move away from the SDK as it doesn't really have a future.

Also it'd be nice to have the code in this extension look more like code elsewhere in Firefox, which should help Firefox peers to be able to effectively review.

I don't think it needs to block the intial landing though.

::: browser/extensions/shield-recipe-client/lib/EventEmitter.js:12
(Diff revision 2)
> +exports.EventEmitter = function(sandbox=null) {
> +  const listeners = {};
> +
> +  return {
> +    emit(eventName, event) {
> +      // Fire events async

Why use `setTimeout` here versus promises?

::: browser/extensions/shield-recipe-client/lib/Heartbeat.js:185
(Diff revision 2)
> +    }
> +
> +    let timestamp = Date.now();
> +    let sendPing = false;
> +
> +    switch (name) {

You might consider using an object instead of a `switch` here - it's a little more readable and you can't have the hazard of accidentally forgetting a `break`.

::: browser/extensions/shield-recipe-client/lib/NormandyApi.js:47
(Diff revision 2)
> +      if (rawText.indexOf(serialized) === -1) {
> +        throw new Error('Canonical recipe serialization does not match!');
> +      }
> +
> +      let certChain = (yield Http.get({url: x5u})).text;
> +      let builtSignature = `x5u="${x5u}";p384ecdsa=${signature}`;

Hm so the value of `x5u` needs double-quotes and the value of `p384ecdsa` does not?

::: browser/extensions/shield-recipe-client/lib/NormandyApi.js:52
(Diff revision 2)
> +      let builtSignature = `x5u="${x5u}";p384ecdsa=${signature}`;
> +
> +      const verifier = Cc['@mozilla.org/security/contentsignatureverifier;1']
> +        .createInstance(Ci.nsIContentSignatureVerifier);
> +
> +      if (!verifier.verifyContentSignature(serialized, builtSignature, certChain, 'normandy.content-signature.mozilla.org')) {

The host shouldn't be hardcoded, can it be set in a pref instead? This could be the default if the pref isn't set, but it'll be hard to e.g. run against a test server if it's hardcoded like this.

::: browser/extensions/shield-recipe-client/lib/NormandyDriver.js:60
(Diff revision 2)
> +    },
> +
> +    saveHeartbeatFlow(data) {
> +      let url;
> +      if (!this.testing) {
> +        url = 'https://input.mozilla.org/api/v2/hb/';

Similar to above, we should use a pref.

Is this the only use for the `testing` setting?

::: browser/extensions/shield-recipe-client/lib/NormandyDriver.js:91
(Diff revision 2)
> +        version: Services.appinfo.version,
> +        channel: Services.appinfo.defaultUpdateChannel,
> +        isDefaultBrowser: ShellService.isDefaultBrowser() || null,
> +        searchEngine: Services.search.defaultEngine.identifier,
> +        syncSetup: Services.prefs.prefHasUserValue('services.sync.username'),
> +        plugins: {}, // TODO

Did you mean to resolve these TODOs?

`about:plugins` uses `navigator.plugins` to get the list of plugins - it also does a quick refresh and fetches these async.

::: browser/extensions/shield-recipe-client/lib/NormandyDriver.js:92
(Diff revision 2)
> +        channel: Services.appinfo.defaultUpdateChannel,
> +        isDefaultBrowser: ShellService.isDefaultBrowser() || null,
> +        searchEngine: Services.search.defaultEngine.identifier,
> +        syncSetup: Services.prefs.prefHasUserValue('services.sync.username'),
> +        plugins: {}, // TODO
> +        doNotTrack: false, // TODO

There is a `privacy.donottrackheader.enabled` pref you could use.

::: browser/extensions/shield-recipe-client/package.json:11
(Diff revision 2)
> +  "description": "Client to download and run recipes for SHIELD, Heartbeat, etc.",
> +  "main": "index.js",
> +  "author": "Mike Cooper <mcooper@mozilla.com>",
> +  "repository": "https://github.com/mozilla/normandy-addon",
> +  "scripts": {
> +    "test": "jpm test",

Is there supposed to be a test directory? We should have tests that run in mozilla-central on checkin, so  changes to Firefox/Gecko that might break this extension are caught right away.
Attachment #8799078 - Flags: review?(rhelmer)
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review83236

> We have discussed this separately, but more people will see it here - we should move away from the SDK as it doesn't really have a future.
> 
> Also it'd be nice to have the code in this extension look more like code elsewhere in Firefox, which should help Firefox peers to be able to effectively review.
> 
> I don't think it needs to block the intial landing though.

I agree, both that this should change and that it should not block the initial landing.

As far as "look more like code elsewhere in Firefox", I think that would be ideal, but I don't really know what the rest of the code in Firefox looks like. I made sure it passed eslint, but if you have suggestions for other stylistic and organizational changes, I'd be happy to hear them.

> Why use `setTimeout` here versus promises?

The intent here is to make sure the event callbacks fire on the next eventloop tick, which makes it easier to bind handlers to events in the same tick that the events are fired. I don't think promises would be appropriate here, but maube there is a better way to make a functino run async besides `setTimeout`.

> You might consider using an object instead of a `switch` here - it's a little more readable and you can't have the hazard of accidentally forgetting a `break`.

That makes sense. I waffle back and forth about which style I prefer.

For reference, this code was based on the UITour version, here: https://dxr.mozilla.org/mozilla-central/source/browser/components/uitour/UITour.jsm#1115

> Hm so the value of `x5u` needs double-quotes and the value of `p384ecdsa` does not?

Yes. The p384ecdsa value is base64, so its alphabet is predictable, where the x5u is a url, and so could have something like a semicolon in it.

> The host shouldn't be hardcoded, can it be set in a pref instead? This could be the default if the pref isn't set, but it'll be hard to e.g. run against a test server if it's hardcoded like this.

This isn't a actually a host name, although it looks like one. It is the name of the signing domain (I think I've got the words right for that). Our development setup also uses this same domain for it's signatures. There is no network activity here. Making this a pref would make the signature checking less secure, since an attacker could use signatures from another domain.

> Similar to above, we should use a pref.
> 
> Is this the only use for the `testing` setting?

Agreed this should be a pref. The `testing` attribute on the driver is defined by the normandy driver spec, and actions are free to use it. Even if we don't use it here we should keep it.

> Is there supposed to be a test directory? We should have tests that run in mozilla-central on checkin, so  changes to Firefox/Gecko that might break this extension are caught right away.

Hmm. I would have expected jpm to include the tests in the built addon, which this patch was based on. I'll the tests directory.

I agree that we should have tests that run in mozilla-central on checkin, but I'm not sure how to accomplish that. Do you know what we might do?
(In reply to Mike Cooper [:mythmon] from comment #9)
> Comment on attachment 8799078 [details]
> Bug 1308656 - Add shield-recipe-client as system add-on
>
> > Why use `setTimeout` here versus promises?
> 
> The intent here is to make sure the event callbacks fire on the next
> eventloop tick, which makes it easier to bind handlers to events in the same
> tick that the events are fired. I don't think promises would be appropriate
> here, but maube there is a better way to make a functino run async besides
> `setTimeout`.


Ah. Here's a more direct way to spin the event loop:
Services.tm.mainThread.processNextEvent()


> > Is there supposed to be a test directory? We should have tests that run in mozilla-central on checkin, so  changes to Firefox/Gecko that might break this extension are caught right away.
> 
> Hmm. I would have expected jpm to include the tests in the built addon,
> which this patch was based on. I'll the tests directory.
> 
> I agree that we should have tests that run in mozilla-central on checkin,
> but I'm not sure how to accomplish that. Do you know what we might do?

Hm I'm not sure exactly how jpm tests work, we might be able to run them from xpcshell or a mochitest both of which the build system understands already.
(In reply to Robert Helmer [:rhelmer] from comment #10)
> (In reply to Mike Cooper [:mythmon] from comment #9)
> > Comment on attachment 8799078 [details]
> > Bug 1308656 - Add shield-recipe-client as system add-on
> >
> > > Why use `setTimeout` here versus promises?
> > 
> > The intent here is to make sure the event callbacks fire on the next
> > eventloop tick, which makes it easier to bind handlers to events in the same
> > tick that the events are fired. I don't think promises would be appropriate
> > here, but maube there is a better way to make a functino run async besides
> > `setTimeout`.
> 
> 
> Ah. Here's a more direct way to spin the event loop:
> Services.tm.mainThread.processNextEvent()


Hm although reading at https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Reference/Interface/nsIThread#processNextEvent() you should probably just leave this alone and use setTimeout.
 
> > > Is there supposed to be a test directory? We should have tests that run in mozilla-central on checkin, so  changes to Firefox/Gecko that might break this extension are caught right away.
> > 
> > Hmm. I would have expected jpm to include the tests in the built addon,
> > which this patch was based on. I'll the tests directory.
> > 
> > I agree that we should have tests that run in mozilla-central on checkin,
> > but I'm not sure how to accomplish that. Do you know what we might do?
> 
> Hm I'm not sure exactly how jpm tests work, we might be able to run them
> from xpcshell or a mochitest both of which the build system understands
> already.
Attachment #8799456 - Attachment is obsolete: true
We're going to need to get a fix for bug 1308332 before landing this because the perf regression won't be shippable. :'(
Depends on: 1308332
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

(In reply to Mike Cooper [:mythmon] from comment #2)
> Julien, can you f? this from a security point of view?

I will defer my f? to :pauljt who has performed the original secreview of the system add-on. I think this is just a follow-up.
Attachment #8799078 - Flags: feedback?(ptheriault)
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review83408

OK, there might be more, but at this stage there's so much in here that it feels like I should just submit this review and we can iterate from there.

Globally, this needs to land with at least some tests. There should probably be browser mochitests, or you can ping Mossop to see if there's a way to integrate the add-on SDK-based tests. I don't know if that's possible or not.

I also wonder how hard it would really be to refactor this into a non-SDK add-on because of the perf issues that have been flagged up with SDK add-ons. Most of it already uses require('chrome') and a lot of Cu.imports, so it feels like it shouldn't actually be difficult? Like, there's only a few SDK modules in use, plus the convenience of being able to load JEXL. Is there no way to compile that thing into a single JS file that we minify and just subscript-load or something, if we really can't use a sandbox and eval() instead?

::: browser/extensions/shield-recipe-client/bootstrap.js:1
(Diff revision 3)
> +/* This Source Code Form is subject to the terms of the Mozilla Public

Other files like index.js will also need a 3-line license block at the top.

By convention, we use "use strict" everywhere, too.

::: browser/extensions/shield-recipe-client/index.js:36
(Diff revision 3)
> +  if (reason === 'uninstall' || reason === 'disable') {
> +    SelfRepairInteraction.enableSelfRepair();
> +  }

Why is this necessary?

::: browser/extensions/shield-recipe-client/install.rdf:11
(Diff revision 3)
> +          <em:creator>Mike Cooper &lt;mcooper@mozilla.com&gt;</em:creator>
> +          <em:optionsURL>data:text/xml,&lt;placeholder/&gt;</em:optionsURL>

Other system add-ons do not have a <creator> value to avoid confusing users about the origin of the add-on. We should probably do the same here.

optionsURL is not required, so we should just leave it out rather than having a placeholder value (same with optionsType).

::: browser/extensions/shield-recipe-client/lib/EnvExpressions.js:5
(Diff revision 3)
> +const {Cu} = require('chrome');
> +Cu.import('resource://gre/modules/TelemetryArchive.jsm'); /* globals TelemetryArchive: false */
> +Cu.import('resource://gre/modules/Task.jsm'); /* globals Task */
> +
> +const {Jexl} = require('jexl');

Note that one of the ways we're thinking of making the loader faster is removing the report for node-style paths. Nobody else is using it in-tree, and it seems unlikely many add-ons use it... why are we using it here rather than the .style path we use elsewhere, or the lib/ path that's normally used for this?

::: browser/extensions/shield-recipe-client/lib/EnvExpressions.js:8
(Diff revision 3)
> +function getEnv() {
> +  return {
> +    telemetry: getLatestTelemetry(),
> +  };

Just inline this into where we're using it, as there's only one callsite?

::: browser/extensions/shield-recipe-client/lib/EnvExpressions.js:32
(Diff revision 3)
> +  }
> +
> +  let telemetry = {};
> +  for (let key in mostRecentPings) {
> +    const ping = mostRecentPings[key];
> +    telemetry[ping.type] = yield TelemetryArchive.promiseArchivedPingById(ping.id);

Exposing the full telemetry pings to the heartbeat/SHIELD thing seems like too big a sledgehammer to use for a very small nail.

Can we instead just expose the data we're interested in? We've done the same with UITour - we expose what is necessary.

::: browser/extensions/shield-recipe-client/lib/EnvExpressions.js:40
(Diff revision 3)
> +});
> +
> +const jexl = new Jexl();
> +jexl.addTransforms({
> +  date: dateString => new Date(dateString),
> +  stableSample: stableSample,

We live in the es6 world, you don't need key: value to be repetitive if the key and value are the same, so just:

```
stableSample,
```

will do.

::: browser/extensions/shield-recipe-client/lib/EnvExpressions.js:44
(Diff revision 3)
> +  eval(expr, extraContext={}) {
> +    const context = Object.assign({}, getEnv(), extraContext);
> +    const onelineExpr = expr.replace(/[\t\n\r]/g, ' ');
> +    return jexl.eval(onelineExpr, context);

So... more than half of the code here is the Jexl thing. That's a lot of code to ship for basically just checking "does my environment match the criteria for this recipe".

Why can't we implement this with eval in a sandbox?

::: browser/extensions/shield-recipe-client/lib/EventEmitter.js:20
(Diff revision 3)
> +          Log.debug(`EventEmitter: Event fired with no listeners: ${eventName}`);
> +          return;
> +        }
> +        let frozenEvent = Object.freeze(event);
> +        if (sandbox) {
> +          frozenEvent = Cu.cloneInto(frozenEvent, sandbox);

I wrote a long comment about this and mozreview trashed it so now I am both grumpy and trying to remember all this.

Basically, there are multiple levels of indirection here, and I think they pose a memory leak risk as well as potentially a security risk.

You're creating an event emitter in the add-on code, then you clone it into the sandbox (which the emitter itself has a reference to, too...) and then the sandboxed code is supposed to be able to call `on()` to add a listener function which gets called by the add-on when the add-on calls `emit()`.

I *think* it might all be safe right now (security-wise) because of forwarding-and-cloning functions (bug 1022002).

However, the cross-compartment circular references between the emitter and the sandbox and the code that created both seem iffy to me - there's no `off()` implementation and so I don't see how any of it would ever be cleaned up. Maybe I'm being overly paranoid.

Either way, it feels like it would be simpler if you loaded this script (sans chrome requirements and Timer.jsm/Log.js references) into the sandbox and just created the emitter in there. Callers of emit() would need to clone any arguments they'd want to pass (could add a helper for this on the Heartbeat object, of course), but that would feel a lot more contained to me, and easier to reason about.

::: browser/extensions/shield-recipe-client/lib/Heartbeat.js:56
(Diff revision 3)
> +  constructor(chromeWindow, options) {
> +    this.chromeWindow = chromeWindow;

It looks like some of this is significantly inspired/cribbed from the existing UITour.showHeartbeat code. Should we remove that API?

::: browser/extensions/shield-recipe-client/lib/Heartbeat.js:72
(Diff revision 3)
> +          this.userEngaged(new Map([
> +            ['type', 'button'],
> +            ['flowid', this.options.flowId],
> +          ]));

On line 122 you're passing a JS object, and here you pass a Map(). Pick one? :-)

::: browser/extensions/shield-recipe-client/lib/Heartbeat.js:85
(Diff revision 3)
> +        },
> +      }];
> +    }
> +
> +
> +    this.notificationBox = this.chromeWindow.document.querySelector('#high-priority-global-notificationbox');

getElementById, please.

::: browser/extensions/shield-recipe-client/lib/Heartbeat.js:88
(Diff revision 3)
> +
> +
> +    this.notificationBox = this.chromeWindow.document.querySelector('#high-priority-global-notificationbox');
> +    this.notice = this.notificationBox.appendNotification(
> +      this.options.message,
> +      'heartbeat-' + this.options.flowId,

I don't see any checking that `message` and `flowId` are both valid values. It feels like for those, at least, the add-on code should verify that the recipe is passing sane values.

::: browser/extensions/shield-recipe-client/lib/Heartbeat.js:104
(Diff revision 3)
> +
> +    // Holds the rating UI
> +    let frag = this.chromeWindow.document.createDocumentFragment();
> +
> +    // Build the heartbeat stars
> +    const numStars = this.options.engagementButtonLabel ? 0 : 5;

Feels like if there's no stars we shouldn't include the rating element at all. I'd just:

```
const numStars = 5;
if (this.options.engagementButtonLabel) {
  ...
}
```

everything including the appending of `ratingContainer`.

::: browser/extensions/shield-recipe-client/lib/Heartbeat.js:147
(Diff revision 3)
> +    this.messageText.flex = 0;
> +    let leftSpacer = this.messageText.nextSibling;
> +    leftSpacer.flex = 0;
> +
> +    // Add Learn More Link
> +    if (this.options.learnMoreMessage && this.options.learnMoreURL) {

Should we verify the learnMoreURL is actually a valid URL? Also, can we enforce that it uses http/https? I don't think we allow privilege inheritance from data: or javascript: URLs for these text-links but I'd rather just not find out.

::: browser/extensions/shield-recipe-client/lib/Heartbeat.js:156
(Diff revision 3)
> +    // Append the fragment and apply the styling
> +    this.notice.appendChild(frag);

You can pass appendNotification a document fragment instead of a string. This avoids having to do all the manual insertion afterwards, and might also help with adding classes for styling.

::: browser/extensions/shield-recipe-client/lib/Heartbeat.js:165
(Diff revision 3)
> +    // Let the consumer know the notification was shown.
> +    this.maybeNotifyHeartbeat('NotificationOffered');
> +
> +    let handleWindowClosed = () => {
> +      this.maybeNotifyHeartbeat('WindowClosed');
> +      this.chromeWindow.removeEventListener('SSWindowClosing', handleWindowClosed);

This event listener should be removed earlier if the notification bar is removed some other way, or we'll leak.

Likewise, when we clean up (ie the notification box disappears through any means at all) we should ensure our object is destroyed and we delete references to e.g. this.chromeWindow and this.notificationBox - or we'll leak. The sandbox presumably stays alive as I don't see it being destroyed anywhere, and that has references to the notification promise etc, which then keep everything else alive.

::: browser/extensions/shield-recipe-client/lib/Heartbeat.js:185
(Diff revision 3)
> +    }
> +
> +    let timestamp = Date.now();
> +    let sendPing = false;
> +
> +    let phases = {

Note that not all of these are described in the docs.

Is there some reason not to do something like this instead:

```
let phases = new Set(["NotificationOffered", "LearnMore", ..., "WindowClosed"]);
let finalPhases = new Set(["WindowClosed", "NotificationClosed"]);

if (phases.has(name)) {
  if (!this.surveyResults.timestamps[name]) {
    this.surveyResults.timestamps[name] = data.timestamp;
  }
  sendPing = finalPhases.has(name);
  if (name == "Voted") {
    this.surveyResults.score = data.score;
  }
} else {
  Log.error(...);
}
```

You could initialize surveyResults as `{timestamps: {}, flowId: this.options.flowId}`, and set `data.timestamp` to `Date.now()` before the loop. This avoids repetition.

This seems simpler than the repetition for the variously-named TS members.

::: browser/extensions/shield-recipe-client/lib/Heartbeat.js:206
(Diff revision 3)
> +      },
> +      SurveyExpired: () => {
> +        this.surveyResults.expiredTS = timestamp;
> +      },
> +      NotificationClosed: () => {
> +        this.surveyResults.closedTS = timestamp;

Can you elaborate on how these timestamps are used? It feels like it would make sense if there was a telemetry histogram indicating what action was taken, or how soon such an action was taken, but sending the raw timestamps doens't make as much sense to me.

::: browser/extensions/shield-recipe-client/lib/Heartbeat.js:230
(Diff revision 3)
> +      return;
> +    }
> +
> +    // Send the ping to Telemetry
> +    let payload = Object.assign({version: 1}, this.surveyResults);
> +    for (let meta of ['surveyId', 'surveyVersion', 'testing']) {

What's the "testing" thing here and does it require shipping?

::: browser/extensions/shield-recipe-client/lib/Heartbeat.js:256
(Diff revision 3)
> +    this.surveyResults = null;
> +  }
> +
> +  userEngaged(/* engagementParams */) {
> +    // Make the heartbeat icon pulse twice
> +    this.notice.label = this.options.thankyouMessage;

Note that if you pass a fragment as the messageText, I'm not entirely sure that this will work. :-(

::: browser/extensions/shield-recipe-client/lib/Heartbeat.js:262
(Diff revision 3)
> +    this.messageImage.classList.remove('pulse-onshow');
> +    this.messageImage.classList.add('pulse-twice');
> +
> +    // Remove all the children of the notice (rating container, and the flex)
> +    while (this.notice.firstChild) {
> +      this.notice.removeChild(this.notice.firstChild);

this.notice.firstChild.remove()

::: browser/extensions/shield-recipe-client/lib/Heartbeat.js:270
(Diff revision 3)
> +    // Make sure that we have a valid URL. If we haven't, do not open the engagement page.
> +    let engagementURL = null;
> +    try {
> +      engagementURL = new URL(this.options.postAnswerURL);
> +    } catch (error) {
> +      Log.error('Invalid URL specified.');

We have this information from the start. We should just throw from showHeartbeatNotification if the URL is present but invalid there, and do an ! empty check here.

::: browser/extensions/shield-recipe-client/lib/Heartbeat.js:279
(Diff revision 3)
> +    if (engagementURL) {
> +      // Open the engagement URL in a new tab.
> +      this.chromeWindow.gBrowser.selectedTab =
> +        this.chromeWindow.gBrowser.addTab(engagementURL.toString(), {
> +          owner: this.chromeWindow.gBrowser.selectedTab,
> +          relatedToCurrent: true,

Why are we passing this? AIUI there's no guarantee that the current tab here is related to the new one.

::: browser/extensions/shield-recipe-client/lib/Heartbeat.js:285
(Diff revision 3)
> +        });
> +    }
> +
> +    setTimeout(() => {
> +      this.notificationBox.removeNotification(this.notice);
> +    }, 3000);

Magic value alert. Should probably just be a const at the top of the file?

Also, in rare cases this will race with the survey expiry timer. The latter should be cleared here.

::: browser/extensions/shield-recipe-client/lib/Http.js:17
(Diff revision 3)
> +  },
> +
> +  request({
> +    method,
> +    url,
> +    acceptNon200: acceptNon200=false,

This seems mislabeled, as it really means you're accepting anything? It also doesn't seem to be used, so maybe just kill the option?

::: browser/extensions/shield-recipe-client/lib/Http.js:21
(Diff revision 3)
> +    url,
> +    acceptNon200: acceptNon200=false,
> +    headers: headers={},
> +    data: data=null,
> +  }) {
> +    return new Promise((resolve, reject) => {

Can we enforce that `url` here is https, and perhaps constrain the domains client-side? If we need http-non-s to work for tests, we can have a "drop the shields" ( https://www.youtube.com/watch?v=RHoXUP804vg ) pref that we toggle on infra and/or local dev in order to run tests.

::: browser/extensions/shield-recipe-client/lib/Http.js:27
(Diff revision 3)
> +      let opts = {
> +        url,
> +        headers,
> +        onComplete(response) {
> +          if (response === undefined) {
> +            reject('Response failed');

We access response.text in some of the rejection handlers, so this should probably reject with something that also looks like a response object. Alternatively, reject here and in the status === 0 case with `new Error(<message>)`, which is the usual way to reject promises and get useful stacks etc., and update the rejection handlers.

::: browser/extensions/shield-recipe-client/lib/NormandyApi.js:15
(Diff revision 3)
> +      let params = [];
> +      for (let key in data) {
> +        params.push(`${encodeURIComponent(key)}=${encodeURIComponent(data[key])}`);
> +      }
> +      url += '?' + params.join('&');

Please don't.

let paramObj = new URLSearchParams();
for (let key in data) {
  paramObj.append(key, data[key]);
}
url += '?' + paramObj.toString();


It's also a little surprising that we then still pass `data` to Http.js...

Should this code just really live in whatever consumer isn't passing sensible URLs? Or take separate data and 'urlparams' arguments or something? This code feels smelly. :-)

::: browser/extensions/shield-recipe-client/lib/NormandyApi.js:42
(Diff revision 3)
> +
> +    const verifiedRecipes = [];
> +
> +    for (let {recipe, signature: {signature, x5u}} of recipesWithSigs) {
> +      const serialized = CanonicalJSON.stringify(recipe);
> +      if (rawText.indexOf(serialized) === -1) {

!rawText.includes(serialized)

::: browser/extensions/shield-recipe-client/lib/NormandyApi.js:52
(Diff revision 3)
> +      let builtSignature = `x5u="${x5u}";p384ecdsa=${signature}`;
> +
> +      const verifier = Cc['@mozilla.org/security/contentsignatureverifier;1']
> +        .createInstance(Ci.nsIContentSignatureVerifier);
> +
> +      if (!verifier.verifyContentSignature(serialized, builtSignature, certChain, 'normandy.content-signature.mozilla.org')) {

Shouldn't this string here be a pref so we can test this on integration tests and developer machines with something other than the real keys?

::: browser/extensions/shield-recipe-client/lib/NormandyApi.js:54
(Diff revision 3)
> +      } else {
> +        verifiedRecipes.push(recipe);

Nit: no else after return or throw.

::: browser/extensions/shield-recipe-client/lib/NormandyApi.js:70
(Diff revision 3)
> +    .then(req => req.json)
> +    .then(clientData => {

Nit:

```
.then(({json: clientData}) => {
```

::: browser/extensions/shield-recipe-client/lib/NormandyDriver.js:3
(Diff revision 3)
> +const {uuid} = require('sdk/util/uuid');
> +const {Cu} = require('chrome');
> +Cu.import('resource://gre/modules/Services.jsm'); /* globals Services: false */

The `globals` comments and package.json make me suspect you're using eslint. Do you have your own eslintrc? It doesn't seem to be included in this commit. Does all this stuff currently pass browser's eslint (./mach eslint browser/extensions/shield-recipe-client/ would be the trivial test assuming you didn't copy the .eslintrc there yet)

::: browser/extensions/shield-recipe-client/lib/NormandyDriver.js:13
(Diff revision 3)
> +const {Storage} = require('./Storage.js');
> +const {EnvExpressions} = require('./EnvExpressions.js');
> +const {Http} = require('./Http.js');
> +const {Heartbeat} = require('./Heartbeat.js');
> +
> +const actionLogger = Log.makeNamespace('actions');

Not familiar with the SDK, but shouldn't this be more unique?

::: browser/extensions/shield-recipe-client/lib/NormandyDriver.js:18
(Diff revision 3)
> +const actionLogger = Log.makeNamespace('actions');
> +
> +// Spec: https://raw.githubusercontent.com/mozilla/normandy-actions/9d6406f21f9025602c87754209de9fd675cc4871/docs/driver.rst
> +exports.NormandyDriver = function(recipeRunner, sandbox, extraContext) {
> +  return {
> +    testing: true,

What is this and why is it set to true by default?

::: browser/extensions/shield-recipe-client/lib/NormandyDriver.js:21
(Diff revision 3)
> +exports.NormandyDriver = function(recipeRunner, sandbox, extraContext) {
> +  return {
> +    testing: true,
> +
> +    get locale() {
> +      return Preferences.get('general.useragent.locale');

This should check the actual locale with the chrome registry, as the pref could be just about anything:

```
Cc["@mozilla.org/chrome/chrome-registry;1"].getService(Ci.nsIXULChromeRegistry).getSelectedLocale("browser")
```

::: browser/extensions/shield-recipe-client/lib/NormandyDriver.js:34
(Diff revision 3)
> +      actionLogger[level](message);
> +    },
> +
> +    showHeartbeat(options) {
> +      Log.info(`Showing heartbeat prompt "${options.message}"`);
> +      let aWindow = Services.wm.getMostRecentWindow('navigator:browser');

This could be null. You should deal with that - probably bail out if so.

Also, browser code generally prefers double-quoted strings, but I recognize that might be too late to change everywhere.

::: browser/extensions/shield-recipe-client/lib/NormandyDriver.js:38
(Diff revision 3)
> +      Log.info(`Showing heartbeat prompt "${options.message}"`);
> +      let aWindow = Services.wm.getMostRecentWindow('navigator:browser');
> +
> +      let heartbeat = new Heartbeat(aWindow, {
> +        message: options.message,
> +        thankyouMessage: options.thanksMessage,

Can we make these the same?

::: browser/extensions/shield-recipe-client/lib/NormandyDriver.js:41
(Diff revision 3)
> +      let heartbeat = new Heartbeat(aWindow, {
> +        message: options.message,
> +        thankyouMessage: options.thanksMessage,
> +        flowId: options.flowId,
> +        learnMoreMessage: options.learnMoreMessage,
> +        learnMoreURL: options.learnMoreUrl,

And these URL vs. Url things.

::: browser/extensions/shield-recipe-client/lib/NormandyDriver.js:45
(Diff revision 3)
> +        learnMoreMessage: options.learnMoreMessage,
> +        learnMoreURL: options.learnMoreUrl,
> +        engagementButtonLabel: options.engagementButtonLabel,
> +        postAnswerURL: options.postAnswerUrl,
> +        surveyId: options.surveyId,
> +        surveyVersion: options.surveyVersion,

It looks like this could basically just be something like:

let internalOptions = Object.assign({}, options, {testing: this.testing, sandbox});

and then pass the `internalOptions`?

That would also mean you wouldn't have to do tedious bookkeeping here if we add/remove options.

::: browser/extensions/shield-recipe-client/lib/NormandyDriver.js:60
(Diff revision 3)
> +    },
> +
> +    saveHeartbeatFlow(data) {
> +      let url;
> +      if (!this.testing) {
> +        url = 'https://input.mozilla.org/api/v2/hb/';

As rhelmer noted, this should be a pref.

::: browser/extensions/shield-recipe-client/lib/NormandyDriver.js:71
(Diff revision 3)
> +
> +      Log.debug('Sending heartbeat flow data to Input', data);
> +
> +      // Make request to input
> +      let p = Http.post({url, data, headers})
> +      .then(response => {

Nit: personally I would put the `.then(response =>` bit on the same line as the thing that produces the promise.

::: browser/extensions/shield-recipe-client/lib/NormandyDriver.js:78
(Diff revision 3)
> +        // Resolve undefined instead of passing the response down.
> +        return undefined;
> +      })
> +      .catch(response => {
> +        Log.error('Input response:', response.text);
> +        throw Cu.cloneInto(response, sandbox, {cloneFunctions: false});

As noted above, there's no guarantee `response` is an Error.

Why is the cloneFunctions param necessary here?

It feels like this should throw new sandbox.Error(<error message>); instead.

::: browser/extensions/shield-recipe-client/lib/NormandyDriver.js:84
(Diff revision 3)
> +      });
> +      // Wrap the promise for the sandbox
> +      return sandbox.Promise.resolve(p);
> +    },
> +
> +    client() {

This looks like it's basically getting the same stuff as the UITour. Can we just delegate instead of maintaining the same logic here?

::: browser/extensions/shield-recipe-client/lib/NormandyDriver.js:121
(Diff revision 3)
> +    envExpression(expr) {
> +      return EnvExpressions.eval(expr, extraContext);
> +    },

This also doesn't seem to be documented. Why do we need it?

::: browser/extensions/shield-recipe-client/lib/RecipeRunner.js:99
(Diff revision 3)
> +   * @param {object} extraContext Any extra context to provide to the filter environment.
> +   * @return {boolean} The result of evaluating the filter, cast to a bool.
> +   */
> +  checkFilter(recipe, extraContext) {
> +    let filter = recipe.filter_expression;
> +    if (!filter || filter === '') {

!filter is true for empty string, so you can omit the OR and that check.

What happens if you pass " " to JEXL?

::: browser/extensions/shield-recipe-client/lib/RecipeRunner.js:124
(Diff revision 3)
> +    const sandbox = new Cu.Sandbox(null, {
> +      wantComponents: false,
> +      wantGlobalProperties: ['URL', 'URLSearchParams'],
> +    });
> +
> +    sandbox.setTimeout = Cu.cloneInto(setTimeout, sandbox, {cloneFunctions: true});

I would like for us not to do this.

If the forwarding functions in bug 1022002 work as advertised, and this did what it said on the tin, you'd have just forwarded chrome-privileged eval() into your sandbox (setTimeout("(function() { alert(Components.stack); })()", 0)).

Of course, this isn't a |window| setTimeout (because that doesn't work - you can't call it on a non-window) and Timer.jsm doesn't currently accept strings as arguments. So maybe right now we're lucky at the moment and you can't priv-esc from this. This would change if someone changed Timer.jsm or if this code was ever moved into a window. That kind of privilege escalation isn't something I would like us to have lying around as an "oops" factor for such changes.

Why do we need this? For "run code async when we've run to completion", Promise.resolved().then() works already.

Orthogonally, it'd be interesting to see how this worked out in practice with destroying the sandboxes. If we (attempt to) kill off the sandbox and the timeout resolves afterwards... what happens? The actual setTimeout implementation doesn't live in the sandbox, but the sandbox will be dead (and/or data in it will be gone). That doesn't sound like fun either.

::: browser/extensions/shield-recipe-client/lib/RecipeRunner.js:126
(Diff revision 3)
> +    let action = yield NormandyApi.fetchAction(recipe.action);
> +    let response = yield Http.get({url: action.implementation_url});

Feels like both of these and/or this method as a whole should have a .catch().

Also, as noted elsewhere, feels like these URLs should be restricted to https.

Finally, these two things feel like they should have more specific names than "response" or even just "action".

::: browser/extensions/shield-recipe-client/lib/RecipeRunner.js:133
(Diff revision 3)
> +        a.execute()
> +        .catch(err => sandboxedDriver.log(err, 'error'));

Nit: Again, would put this on one line...

::: browser/extensions/shield-recipe-client/lib/RecipeRunner.js:143
(Diff revision 3)
> +    `;
> +
> +    const driver = new NormandyDriver(this, sandbox, extraContext);
> +    sandbox.sandboxedDriver = Cu.cloneInto(driver, sandbox, {cloneFunctions: true});
> +    sandbox.sandboxedRecipe = Cu.cloneInto(recipe, sandbox);
> +    sandbox.window = Cu.cloneInto({}, sandbox);

What is this for?

::: browser/extensions/shield-recipe-client/lib/RecipeRunner.js:146
(Diff revision 3)
> +    sandbox.sandboxedDriver = Cu.cloneInto(driver, sandbox, {cloneFunctions: true});
> +    sandbox.sandboxedRecipe = Cu.cloneInto(recipe, sandbox);
> +    sandbox.window = Cu.cloneInto({}, sandbox);
> +
> +    Cu.evalInSandbox(registerActionScript, sandbox);
> +    Cu.evalInSandbox(actionScript, sandbox);

So what controls that these sandboxes are eventually destroyed? Or do we just leak them until the end of the process' lifetime?

::: browser/extensions/shield-recipe-client/lib/SelfRepairInteraction.js:29
(Diff revision 3)
> +    Preferences.set(PREF_SELF_SUPPORT_ENABLED, enabled);
> +  },
> +
> +  isEnabled() {
> +    let enabled = Preferences.get(PREF_SELF_SUPPORT_ENABLED);
> +    return enabled === true || enabled === undefined;

Why does this return true if the pref is not defined? Also, if we're using Preferences.get() and need a default value if the value is not present, we can just pass a default as the second arg.

::: browser/extensions/shield-recipe-client/lib/Storage.js:2
(Diff revision 3)
> +const {Cu} = require('chrome');
> +var {indexedDB} = require('sdk/indexed-db');

Why not let?

::: browser/extensions/shield-recipe-client/lib/Storage.js:5
(Diff revision 3)
> +  if (sandbox === null) {
> +    sandbox = {Promise};
> +  }

Feels like we should just throw instead?

::: browser/extensions/shield-recipe-client/lib/Storage.js:83
(Diff revision 3)
> +  let sandboxedStorageInterfaces = Cu.cloneInto(storageInterface, sandbox, {
> +    cloneFunctions: true,
> +  });
> +  return sandboxedStorageInterfaces;

Nit: just return this directly?

::: browser/extensions/shield-recipe-client/lib/stableSample.js:22
(Diff revision 3)
> +    throw new Error(`frac must be between 0 and 1 inclusive (got ${frac})`);
> +  }
> +
> +  const mult = Math.pow(2, 256) - 1;
> +  const inDecimal = Math.floor(frac * mult);
> +  let hexDigits = inDecimal.toString(16);

This doesn't actually work. Cf:

```

mult = Math.pow(2, 256)
// 1.157920892373162e+77
mult.toString(16)
// 1 (followed by 64 0s)
(mult - 1000).toString(16)
// 1 (followed by 64 0s)
(mult - 1000000).toString(16)
// 1 (followed by 64 0s)
(mult - Math.pow(2,202)).toString(16)
// 1 (followed by 64 0s)
```

So really your keyspace isn't a full sha256 hash, but much smaller - closer to about 2^54 (I'm not a maths or crypto geek so that estimate is likely still off by quite a bit). This is all because of JS using doubles to represent numbers. It has no correct way to talk about numbers as large as sha256.

It looks like if you're using this to bucket or whatever, you could just look at the first N bytes of the sha hash that you're generating with crypto.subtle. For things like sampling at 1/10/50% that should be enough. An appropriate maximum would be using Math.pow(16, 12) - 1 and just using 12 hex characters.

::: browser/extensions/shield-recipe-client/lib/stableSample.js:56
(Diff revision 3)
> +
> +  // Join all the hex strings into one
> +  return hexCodes.join('');
> +}
> +
> +exports.stableSample = function(input, rate) {

It feels like it would make more sense to provide a bucket to the recipe rather than 'making' all the recipes duplicate the stable sample logic of passing in both the user and recipe id.

::: browser/extensions/shield-recipe-client/moz.build:17
(Diff revision 3)
> +  'bootstrap.js',
> +  'index.js',
> +  'install.rdf',
> +  'lib',
> +  'node_modules',
> +  'package.json',

I don't think we need to ship the package.json file, or even have it in-tree?
Attachment #8799078 - Flags: review?(gijskruitbosch+bugs) → review-
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review83408

Thanks for the review here. It is very useful, and I know that there is a lot here. I think some of the issues you've raised aren't something that needs changed. I've explained why, and marked those as "Dropped" in reviewboard. If you disagree, please say so, I don't mean to shut down discussion on those issues.

I'm going to work on the code changes from this review, and I'll push a new patch with them soon.

> Why is this necessary?

Self-repair is disabled when the addon is installed. If the addon is removed, it should be re-enabled to leave the browser in a consistent state. This makes sense when recipe-client is an external add-on, but as a system add-on it might not make sense.

> Note that one of the ways we're thinking of making the loader faster is removing the report for node-style paths. Nobody else is using it in-tree, and it seems unlikely many add-ons use it... why are we using it here rather than the .style path we use elsewhere, or the lib/ path that's normally used for this?

I'm not sure what you mean by ".style path we use elsewhere". As to why this isn't in `lib`: In the original source, in github, we don't vendor jexl, but instead pull it in as a part of the build step. Using node-style paths makes it easier to maintain both code bases.

> Exposing the full telemetry pings to the heartbeat/SHIELD thing seems like too big a sledgehammer to use for a very small nail.
> 
> Can we instead just expose the data we're interested in? We've done the same with UITour - we expose what is necessary.

Part of the promise of this add-on is that we can expose all the data, including data added to telmeetry in the future, in a safe way. Doing this means we can target new types of data without shipping new code. Additionally, we have taken steps to avoid exfiltrating this data. This is a very important part of the add-on.

> So... more than half of the code here is the Jexl thing. That's a lot of code to ship for basically just checking "does my environment match the criteria for this recipe".
> 
> Why can't we implement this with eval in a sandbox?

Eval is still more powerful than Jexl. Jexl doesn't have loops, and generally is harder to convince to not halt. It is also as simpler language for those writing the recipes, and something that we can statically analyze on the server.  (We don't do this yet, but are planning to).

We decided to use Jexl instead of writing our own language that checked all the safety and usability boxes we wanted because Jexl is stable, well maintained, and already does exactly what we want. Re-using code that comes from the commnunity is one of the strenghts of JS, and something I'm loathe to give up.

Also, as is a theme in this review, this add-on is not a new system, but is integrating with a pre-existing set of tools. Dropping support for Jexl would break compatiblity with these and require more work than simply using Jexl.

> I wrote a long comment about this and mozreview trashed it so now I am both grumpy and trying to remember all this.
> 
> Basically, there are multiple levels of indirection here, and I think they pose a memory leak risk as well as potentially a security risk.
> 
> You're creating an event emitter in the add-on code, then you clone it into the sandbox (which the emitter itself has a reference to, too...) and then the sandboxed code is supposed to be able to call `on()` to add a listener function which gets called by the add-on when the add-on calls `emit()`.
> 
> I *think* it might all be safe right now (security-wise) because of forwarding-and-cloning functions (bug 1022002).
> 
> However, the cross-compartment circular references between the emitter and the sandbox and the code that created both seem iffy to me - there's no `off()` implementation and so I don't see how any of it would ever be cleaned up. Maybe I'm being overly paranoid.
> 
> Either way, it feels like it would be simpler if you loaded this script (sans chrome requirements and Timer.jsm/Log.js references) into the sandbox and just created the emitter in there. Callers of emit() would need to clone any arguments they'd want to pass (could add a helper for this on the Heartbeat object, of course), but that would feel a lot more contained to me, and easier to reason about.

I'm sorry mozreview trashed your original comment. That sucks.

The sandbox is something I'm actually very unsure about. I feel like I've barely got the sandbox working, and would would be very happy to go over this in more depth. I'm especially unsure about passing objects in and out of the sandbox.

Implementing this class entirely in the sandbox is an interesting idea. I'll see if I can get that working.

> It looks like some of this is significantly inspired/cribbed from the existing UITour.showHeartbeat code. Should we remove that API?

You're right, this started as an exacty copy of UITour.showHeartbeat, and then was modified to work for our context.

I'm not comfortable removing it right now, since there are other callers of it. In the future I definitely want to remove it, probably in a different patch since this is already quite large.

> Note that not all of these are described in the docs.
> 
> Is there some reason not to do something like this instead:
> 
> ```
> let phases = new Set(["NotificationOffered", "LearnMore", ..., "WindowClosed"]);
> let finalPhases = new Set(["WindowClosed", "NotificationClosed"]);
> 
> if (phases.has(name)) {
>   if (!this.surveyResults.timestamps[name]) {
>     this.surveyResults.timestamps[name] = data.timestamp;
>   }
>   sendPing = finalPhases.has(name);
>   if (name == "Voted") {
>     this.surveyResults.score = data.score;
>   }
> } else {
>   Log.error(...);
> }
> ```
> 
> You could initialize surveyResults as `{timestamps: {}, flowId: this.options.flowId}`, and set `data.timestamp` to `Date.now()` before the loop. This avoids repetition.
> 
> This seems simpler than the repetition for the variously-named TS members.

I agree this code isn't the most straight forward. I didn't change this much from the original UITour code because I didn't want to change any of the behavior for telemetry reporting. This code seemed subtle enought hat I wasn't sure I could reproduce its exact behavior, even with tests. I could work on refactoring this if you think it is important.

> Can you elaborate on how these timestamps are used? It feels like it would make sense if there was a telemetry histogram indicating what action was taken, or how soon such an action was taken, but sending the raw timestamps doens't make as much sense to me.

This format is a part of the old UITour way of dealing with heartbeat data, and there are already consumers of it in the Stategy & Insights team. Changing the data format isn't something that is in scope for this add-on right now. 

I tried to keep in mind while making this that this is coding to an existing API and usage pattern, not creating a new interface.

> What's the "testing" thing here and does it require shipping?

This can be set by actions to indicate that the telemetry ping should not be considered in data analysis.

> This seems mislabeled, as it really means you're accepting anything? It also doesn't seem to be used, so maybe just kill the option?

You're right, this isn't being used. This helper was copied from another project I did.

The intent here is that sometimes a 400 error page is not an error, but a legitimate API response that should be handled along the .then() path of a promise chain, not the .catch() path. But not most of the time. Errors like not being able to connect to the site should still be thrown along the .catch() path.

This entire file could be dropped if a more streamlined HTTP utility was available. I'd love to have something like fetch to replace this. Do you know if that is possible in a reasonable way?

> Can we enforce that `url` here is https, and perhaps constrain the domains client-side? If we need http-non-s to work for tests, we can have a "drop the shields" ( https://www.youtube.com/watch?v=RHoXUP804vg ) pref that we toggle on infra and/or local dev in order to run tests.

I don't think that this is an appropriate place for enforcing https, since it is a general utility function, and there may be places that the addon needs to make non-https requests.

I don't like the idea of a "drop the shields" pref. In some other places, I've discussed with security people about having such a pref and we came to the conclusion that if a security measure is worth having, it isn't worth being able to disable it. For example, we require signatures, and require Shield API servers to be https, and we have no way to disable this. Our development tools and process accomodate this.

If there are specific parts of the add-on you think should have https-only enforcment, I'd be happy to look at those.

> Please don't.
> 
> let paramObj = new URLSearchParams();
> for (let key in data) {
>   paramObj.append(key, data[key]);
> }
> url += '?' + paramObj.toString();
> 
> 
> It's also a little surprising that we then still pass `data` to Http.js...
> 
> Should this code just really live in whatever consumer isn't passing sensible URLs? Or take separate data and 'urlparams' arguments or something? This code feels smelly. :-)

Using `URLSearchParams` sounds much better. Niceties like this aren't available in web programming, so I'm used to having to roll my own.

I don't think this should live in the consumer. Proper handling of querystring paramaters is really important, and having solid and simple handling of it has fixed a few bugs in this project alone. A separate argument might make sense, but in practice this matches how I think about calling HTTP endpoints, so it works well.

> Shouldn't this string here be a pref so we can test this on integration tests and developer machines with something other than the real keys?

Rhelmer brought this up as well. This is not something that should ever change, in any environment. Allowing it to change weakens our security. In development we have alternate trust chains that still use this same name, and we configure the root trust by changing the `security.content.signature.root_hash` pref.

> The `globals` comments and package.json make me suspect you're using eslint. Do you have your own eslintrc? It doesn't seem to be included in this commit. Does all this stuff currently pass browser's eslint (./mach eslint browser/extensions/shield-recipe-client/ would be the trivial test assuming you didn't copy the .eslintrc there yet)

The github repo has it's own eslint file, yes. This also passes mach's eslint. I'll be updating the github repo's .eslintrc to more closely match Mozilla's in the future.

> Not familiar with the SDK, but shouldn't this be more unique?

All logging in the add-on is already prefixed by the name of the add-on. It might make sense to prefix the name of the current action, but I haven't found that useful in debugging.

> What is this and why is it set to true by default?

This is part of the driver API spec defined by Normandy. It shouldn't be set to true though, that's a mistake.

> This could be null. You should deal with that - probably bail out if so.
> 
> Also, browser code generally prefers double-quoted strings, but I recognize that might be too late to change everywhere.

Bailing out sounds like a good idea.

Eslint would make it easy to switch to double quotes, thanks to its --fix flag.

If this is preferred style, is the a reason it isn't suggested by eslint?

> Nit: personally I would put the `.then(response =>` bit on the same line as the thing that produces the promise.

I find putting the .then() on the same line often produces too-long lines, and easily makes confusing indentation.

If this is something that is important, I think it should be enforced by eslint.

> This looks like it's basically getting the same stuff as the UITour. Can we just delegate instead of maintaining the same logic here?

You're right, this is getting what is in UITour. I haven't found a good way to work with UITour directly from this code, since UITour assumes things like a window, which this code doesn't use. If you know a good way to delegate to UITour, I'd be happy to change this.

> !filter is true for empty string, so you can omit the OR and that check.
> 
> What happens if you pass " " to JEXL?

In some situtations the Normandy server can pass an empty string here, which represents an old style recipe that should always pass. These situations probably won't happen anymore though, so this can be removed.

> I would like for us not to do this.
> 
> If the forwarding functions in bug 1022002 work as advertised, and this did what it said on the tin, you'd have just forwarded chrome-privileged eval() into your sandbox (setTimeout("(function() { alert(Components.stack); })()", 0)).
> 
> Of course, this isn't a |window| setTimeout (because that doesn't work - you can't call it on a non-window) and Timer.jsm doesn't currently accept strings as arguments. So maybe right now we're lucky at the moment and you can't priv-esc from this. This would change if someone changed Timer.jsm or if this code was ever moved into a window. That kind of privilege escalation isn't something I would like us to have lying around as an "oops" factor for such changes.
> 
> Why do we need this? For "run code async when we've run to completion", Promise.resolved().then() works already.
> 
> Orthogonally, it'd be interesting to see how this worked out in practice with destroying the sandboxes. If we (attempt to) kill off the sandbox and the timeout resolves afterwards... what happens? The actual setTimeout implementation doesn't live in the sandbox, but the sandbox will be dead (and/or data in it will be gone). That doesn't sound like fun either.

Escalation is a serious problem, this is a good catch, thanks.

In practice, actions often need to delay for some time before doing things. Providing a setTimeout interface is a convenient way to do this. We definitely need to provide a method with this API to work with Normandy actions.

We could provide a more isolated version of this to try and avoid the escalation problems. Given that we need to have this, could something like this work?

```
sandbox.setTimeout = function(callback, time) {
  if (typeof callback !== "function") {
    throw new sandbox.Error("setTimeout must be called with a function");
  }
  setTimeout(time, () => {
    callback();
  });
}
```

> Feels like both of these and/or this method as a whole should have a .catch().
> 
> Also, as noted elsewhere, feels like these URLs should be restricted to https.
> 
> Finally, these two things feel like they should have more specific names than "response" or even just "action".

Making these https only is a nice idea. It is redundant, but redundant security isn't a bad idea. (The server, which we trust, only runs on https, and all responses are signed, so we can trust the data wasn't tampered with, and served over ssl).

`action` is a well defined term within Normandy that means something specific, and is the right term to use here. It is unfortunate that it is also a generic term.

> So what controls that these sandboxes are eventually destroyed? Or do we just leak them until the end of the process' lifetime?

I assumed that sandboxes would get garbage collected when they fell out of scope (including not being used by any action code). I'm not confident about this though.

> Why does this return true if the pref is not defined? Also, if we're using Preferences.get() and need a default value if the value is not present, we can just pass a default as the second arg.

This returns true if the pref is not defined because if the pref is not defined, SelfRepair acts as if it were defined as true.

Passing a default as the second arg is a good idea. I'm not sure why it doesn't.

> Feels like we should just throw instead?

There are tests (not including in this patch, sorry) that test this object without sandboxes, for simplicity. The tests should probably be building and passing this mock sandbox object.

> This doesn't actually work. Cf:
> 
> ```
> 
> mult = Math.pow(2, 256)
> // 1.157920892373162e+77
> mult.toString(16)
> // 1 (followed by 64 0s)
> (mult - 1000).toString(16)
> // 1 (followed by 64 0s)
> (mult - 1000000).toString(16)
> // 1 (followed by 64 0s)
> (mult - Math.pow(2,202)).toString(16)
> // 1 (followed by 64 0s)
> ```
> 
> So really your keyspace isn't a full sha256 hash, but much smaller - closer to about 2^54 (I'm not a maths or crypto geek so that estimate is likely still off by quite a bit). This is all because of JS using doubles to represent numbers. It has no correct way to talk about numbers as large as sha256.
> 
> It looks like if you're using this to bucket or whatever, you could just look at the first N bytes of the sha hash that you're generating with crypto.subtle. For things like sampling at 1/10/50% that should be enough. An appropriate maximum would be using Math.pow(16, 12) - 1 and just using 12 hex characters.

The subtraction is ineffectual, that's true. In practice though, `frac * mult` does work.

```
a = Math.pow(2, 256)
// 1.157920892373162e+77
b = a * 0.1
// 1.157920892373162e+76
b / a
// 0.1
```

This is being used for sampling and bucketing. Typical sampling ranges are 0.00001 to 0.2. For these ranges this code works well. Truncating the hash would also work just as well. Do you think it is important for this to be clearer in its assumptions about precision?

> It feels like it would make more sense to provide a bucket to the recipe rather than 'making' all the recipes duplicate the stable sample logic of passing in both the user and recipe id.

That is out of scope right now. We have to maintain compatibility with currently in production recipes that use this style of filtering. We have discussed adding easier to use sampling functions, but not yet.

> I don't think we need to ship the package.json file, or even have it in-tree?

The add-on SDK uses this. Not including it produced errors at runtime. Once we remove the use of the add-on SDK, we can remove this file.
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review83408

> Self-repair is disabled when the addon is installed. If the addon is removed, it should be re-enabled to leave the browser in a consistent state. This makes sense when recipe-client is an external add-on, but as a system add-on it might not make sense.

It seems like we could just have a separate commit removing the self-repair stuff if it's no longer being used when the add-on is installed? AIUI it won't be maintained going forward, so we can just remove it, right?

> I'm not sure what you mean by ".style path we use elsewhere". As to why this isn't in `lib`: In the original source, in github, we don't vendor jexl, but instead pull it in as a part of the build step. Using node-style paths makes it easier to maintain both code bases.

require("./foo.js");

So this would be require("./lib/foo"), which AIUI would also work with nodejs and locally?

> You're right, this isn't being used. This helper was copied from another project I did.
> 
> The intent here is that sometimes a 400 error page is not an error, but a legitimate API response that should be handled along the .then() path of a promise chain, not the .catch() path. But not most of the time. Errors like not being able to connect to the site should still be thrown along the .catch() path.
> 
> This entire file could be dropped if a more streamlined HTTP utility was available. I'd love to have something like fetch to replace this. Do you know if that is possible in a reasonable way?

I'm not personally familiar with Fetch, but I believe that insofar as we're shipping it for content, it should be available here, maybe with Cu.importGlobalProperties ? I definitely noticed it in the list of things you can have in a sandbox ( with the wantGlobalProperties option) so I would assume it's also available to be imported on privileged scopes.

> I don't think that this is an appropriate place for enforcing https, since it is a general utility function, and there may be places that the addon needs to make non-https requests.
> 
> I don't like the idea of a "drop the shields" pref. In some other places, I've discussed with security people about having such a pref and we came to the conclusion that if a security measure is worth having, it isn't worth being able to disable it. For example, we require signatures, and require Shield API servers to be https, and we have no way to disable this. Our development tools and process accomodate this.
> 
> If there are specific parts of the add-on you think should have https-only enforcment, I'd be happy to look at those.

Why would it ever be OK for the add-on to make insecure requests when out in the wild, under any circumstances? Having done a number of "fun" things with security aspects of our nested protocols (feed:, view-source:, jar:, moz-icon:, ...), I would really prefer to lock this down as much as we possibly can.

For the pref, I don't want a pref so users can disable it, I think we might need a pref in order to run local tests. Ditto for the signature checks which came up elsewhere: we shouldn't include the private key bits of whatever is generating that signature in a public repo (obviously). But we *should* be checking that the signature checks work using automated tests, which should live in the repo. So you need a way to specify an alternative signature chain that is only used for tests. Note that tests in mozilla-central cannot depend on talking to an external server, so we can't hide the private bits on some server somewhere that isn't part of the repo. Everything has to be in the repo. Hence the suggestion that you may need prefs *if* we need non-https for testing. If we don't, all the better, but we should definitely be checking that this stuff works and I feel quite strongly that we should enforce https on the http requests that this add-on makes.

For that same infra-restriction reason, we will need a pref for the endpoint server because Firefox will start requesting this data from said endpoint server, and on infra it will simply crash if network tries to make a non-local network connection (which we do to enforce we don't have external requests/dependencies).

> Rhelmer brought this up as well. This is not something that should ever change, in any environment. Allowing it to change weakens our security. In development we have alternate trust chains that still use this same name, and we configure the root trust by changing the `security.content.signature.root_hash` pref.

So you're saying you don't want a pref that we change in tests to make this work, but you're currently testing this by... changing another pref in tests? Why is changing the root_hash pref better than changing a dedicated pref here?

I mean, as long as we can have tests and the entirety of the test including the server component can be run from a mozilla-central tree, it's all fine by me. But if we need to mock responses in a random mochitest and this ends up being a problem...

> Bailing out sounds like a good idea.
> 
> Eslint would make it easy to switch to double quotes, thanks to its --fix flag.
> 
> If this is preferred style, is the a reason it isn't suggested by eslint?

Re: not already suggested by eslint: we have a lot of very old code and updating all of it will break blame and hasn't been something we've prioritized doing. That's obviously something reasonable people can disagree about, but the inertia + "omg all the changes and none of the blame" has meant we haven't bothered yet.

> You're right, this is getting what is in UITour. I haven't found a good way to work with UITour directly from this code, since UITour assumes things like a window, which this code doesn't use. If you know a good way to delegate to UITour, I'd be happy to change this.

Hm, not without refactoring some of the UITour stuff, so that should probably be a followup.

Though looking at the UITour thing, I think this should probably call Services.search.init() and use the callback?

> Escalation is a serious problem, this is a good catch, thanks.
> 
> In practice, actions often need to delay for some time before doing things. Providing a setTimeout interface is a convenient way to do this. We definitely need to provide a method with this API to work with Normandy actions.
> 
> We could provide a more isolated version of this to try and avoid the escalation problems. Given that we need to have this, could something like this work?
> 
> ```
> sandbox.setTimeout = function(callback, time) {
>   if (typeof callback !== "function") {
>     throw new sandbox.Error("setTimeout must be called with a function");
>   }
>   setTimeout(time, () => {
>     callback();
>   });
> }
> ```

I *think* so, yes. Do swap |time| and |callback| . :-)

Completely different realization I just had: you'll want to also provide clearTimeout and make sure you return the rv of the setTimeout so clearTimeout works, so the code in the sandbox can cancel their timeout if necessary.

> I assumed that sandboxes would get garbage collected when they fell out of scope (including not being used by any action code). I'm not confident about this though.

In theory they should be GC'd, yes. https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Language_Bindings/Components.utils.Sandbox#Freeing_the_sandbox

However... it feels like all the actions that are currently written would at some point want to create a notification bar, right? As that's kind of the only "interact with the user" entrypoint they have. At that point, code in the add-on links up references to stuff in the sandbox to things outside the sandbox, including the event listeners etc. as well as a reference to the sandbox itself. We then also keep a collection of references to all the notification boxes, windows, etc. . All of those would need to be cleared in order for a GC to happen.

So yes, we could rely on GC here, but we would need to do more work to ensure that that actually happens. We can check this in about:memory, and we could probably check that it works in a test by using a weak reference to the sandbox, forcing a GC, and then checking if the weak ref is dead.

> The subtraction is ineffectual, that's true. In practice though, `frac * mult` does work.
> 
> ```
> a = Math.pow(2, 256)
> // 1.157920892373162e+77
> b = a * 0.1
> // 1.157920892373162e+76
> b / a
> // 0.1
> ```
> 
> This is being used for sampling and bucketing. Typical sampling ranges are 0.00001 to 0.2. For these ranges this code works well. Truncating the hash would also work just as well. Do you think it is important for this to be clearer in its assumptions about precision?

I think that would be helpful and slightly more efficient, yes.
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review83430

This looks basically the same as I reviewed a few months ago - the usage of Cu.Sandbox is correct to my limited understanding of this mechanism, and the API which is exposed to content is limited to read-only access to data. 

One part which is new is the signing code - we should have someone who knows our signing code review that (unless you have done this already ulfr?).

::: browser/extensions/shield-recipe-client/lib/NormandyApi.js:52
(Diff revision 3)
> +      let builtSignature = `x5u="${x5u}";p384ecdsa=${signature}`;
> +
> +      const verifier = Cc['@mozilla.org/security/contentsignatureverifier;1']
> +        .createInstance(Ci.nsIContentSignatureVerifier);
> +
> +      if (!verifier.verifyContentSignature(serialized, builtSignature, certChain, 'normandy.content-signature.mozilla.org')) {

This code probably should be reviewed by someone familiar with our signing code.

::: browser/extensions/shield-recipe-client/lib/NormandyDriver.js:90
(Diff revision 3)
> +      let appinfo = {
> +        version: Services.appinfo.version,
> +        channel: Services.appinfo.defaultUpdateChannel,
> +        isDefaultBrowser: ShellService.isDefaultBrowser() || null,
> +        searchEngine: Services.search.defaultEngine.identifier,
> +        syncSetup: Services.prefs.prefHasUserValue('services.sync.username'),

I'm curious why send the sync username to the recipe sandbox (more from privacy angle than security)? Do we gather that with heartbeat information?

::: browser/extensions/shield-recipe-client/lib/RecipeRunner.js:124
(Diff revision 3)
> +    const sandbox = new Cu.Sandbox(null, {
> +      wantComponents: false,
> +      wantGlobalProperties: ['URL', 'URLSearchParams'],
> +    });
> +
> +    sandbox.setTimeout = Cu.cloneInto(setTimeout, sandbox, {cloneFunctions: true});

Hmm - is this safe? I think I asked the same question last time, but I dont know enough about Cu.Sandbox to know what the implications for cloning window.timeout into a sandbox is. Probably worth confirming.
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review83750

::: browser/extensions/shield-recipe-client/lib/NormandyDriver.js:90
(Diff revision 3)
> +      let appinfo = {
> +        version: Services.appinfo.version,
> +        channel: Services.appinfo.defaultUpdateChannel,
> +        isDefaultBrowser: ShellService.isDefaultBrowser() || null,
> +        searchEngine: Services.search.defaultEngine.identifier,
> +        syncSetup: Services.prefs.prefHasUserValue('services.sync.username'),

This sends whether or not a username is configured, ie a bool. It doesn't send the actual username.
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review83768

::: browser/extensions/shield-recipe-client/lib/NormandyApi.js:47
(Diff revision 3)
> +      if (rawText.indexOf(serialized) === -1) {
> +        throw new Error('Canonical recipe serialization does not match!');
> +      }
> +
> +      let certChain = (yield Http.get({url: x5u})).text;
> +      let builtSignature = `x5u="${x5u}";p384ecdsa=${signature}`;

you don't have to add the x5u in front of the signature when using verifyContentSignature
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review83750

> This sends whether or not a username is configured, ie a bool. It doesn't send the actual username.

That is the intent. `syncSetup` is a boolean, indicating whether the user has setup sync or not.
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review83408

> I *think* so, yes. Do swap |time| and |callback| . :-)
> 
> Completely different realization I just had: you'll want to also provide clearTimeout and make sure you return the rv of the setTimeout so clearTimeout works, so the code in the sandbox can cancel their timeout if necessary.

Swapping `time` and `callback` would make for a nicer API, but it would break compatibility.

Nice catch with returning the return value and adding clearTimeout.

> What is this for?

Since actions are currently designed to run both in a UITour based shim and in this full featured environment, this provides a global object as expected by actions.
(In reply to Mike Cooper [:mythmon] from comment #22)
> Comment on attachment 8799078 [details]
> Bug 1308656 - Add shield-recipe-client as system add-on
> 
> https://reviewboard.mozilla.org/r/84386/#review83408
> 
> > I *think* so, yes. Do swap |time| and |callback| . :-)
> > 
> > Completely different realization I just had: you'll want to also provide clearTimeout and make sure you return the rv of the setTimeout so clearTimeout works, so the code in the sandbox can cancel their timeout if necessary.
> 
> Swapping `time` and `callback` would make for a nicer API, but it would
> break compatibility.

I meant in the function body:

>   setTimeout(time, () => {
>     callback();
>   });

as you note, that is not compatible with the existing setTimeout that you're delegating to. :-)
(In reply to Mike Cooper [:mythmon] from comment #21)
> Comment on attachment 8799078 [details]
> Bug 1308656 - Add shield-recipe-client as system add-on
> 
> https://reviewboard.mozilla.org/r/84386/#review83750
> 
> > This sends whether or not a username is configured, ie a bool. It doesn't send the actual username.
> 
> That is the intent. `syncSetup` is a boolean, indicating whether the user
> has setup sync or not.

Yes, I was replying to Paul's concern that this would send the username, which I know it doesn't... not sure why mozreview didn't include the original comment. :-\
That is not all the fixes that are needed. There are a few small things remaining, plus bigger things like EventEmitter security and testing that are still missing. But it is progress.
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review84480

Here's some thoughts on the changes so far.

::: browser/extensions/shield-recipe-client/lib/Heartbeat.js:316
(Diff revisions 2 - 4)
> -    }, 3000);
> +      if (this.surveyEndTimer) {
> +        clearTimeout(this.surveyEndTimer);
> +        this.surveyEndTimer = null;
> +      }

I'm confused. Shouldn't this be done before this new setTimeout? Right now it's still possible for both of these timers to fire.

::: browser/extensions/shield-recipe-client/lib/NormandyApi.js:78
(Diff revisions 2 - 4)
>      return this.get('classify_client/')
> -    .then(req => req.json)
> +    .then(({json: clientData}) => {
> -    .then(clientData => {
>        clientData.request_time = new Date(clientData.request_time);
>        return clientData;
>      });

Right, so... thinking about the style nits here, I don't know that there is strong consensus yet about how to deal with promises. We tend to use Task.spawn or Task.async which mean you just yield and don't need all the manual .then calls, so this doesn't come up much.

Where people split further ".foo" access between lines, prevailing style is that you indent the remainder, like in the NormandyDriver bit where you get the locale.

If it's possible to filter for this in eslint I think we should do so, but I don't think it has the capabilities right now.

Anyway, this means that if you want to keep the .then on the new line, that (and therefore probably the function body and the trailing `});`) should be indented.

::: browser/extensions/shield-recipe-client/lib/NormandyDriver.js:45
(Diff revisions 2 - 4)
>      showHeartbeat(options) {
>        Log.info(`Showing heartbeat prompt "${options.message}"`);
>        let aWindow = Services.wm.getMostRecentWindow('navigator:browser');
>  
> -      let heartbeat = new Heartbeat(aWindow, {
> -        message: options.message,
> +      if (!aWindow) {
> +        throw new Error('No window to show heartbeat in');

Would it be easier on consumers to return a rejected promise here? Then .catch() (or .then with 2 callbacks) will work as expected even for this case. Because this is exposed into unprivileged code, it likely isn't possible for the unprivileged code to detect this case otherwise, and it seems to me that having 1 form of error handling in there would make sense.

::: browser/extensions/shield-recipe-client/install.rdf:11
(Diff revisions 3 - 4)
>            <em:version>0.1.3</em:version>
>            <em:name>Shield Recipe Client</em:name>
>            <em:description>Client to download and run recipes for SHIELD, Heartbeat, etc.</em:description>
> -          <em:creator>Mike Cooper &lt;mcooper@mozilla.com&gt;</em:creator>
> -          <em:optionsURL>data:text/xml,&lt;placeholder/&gt;</em:optionsURL>
>            <em:optionsType>2</em:optionsType>

Shouldn't optionsType also be gone?

::: browser/extensions/shield-recipe-client/lib/Heartbeat.js:44
(Diff revisions 3 - 4)
> - * @param {String} [options.engagementURL=null]
> + * @param {String} [options.engagementUrl=null]
>   *        The engagement URL to open in a new tab once user has engaged. If this is null
>   *        or invalid, no new tab is opened.

Actually, this doesn't seem to be used? You seem to have refactored it to use postAnswersUrl.

Related nit: please can we settle on either URL or Url for these properties, instead of the mix?

::: browser/extensions/shield-recipe-client/lib/Heartbeat.js:69
(Diff revisions 3 - 4)
> +    if (typeof options.flowId !== 'string') {
> +      throw new Error('flowId must be a string');
> +    }
> +
> +    if (!options.flowId) {
> +      throw new Error('flowId must not be an empty string')

Nit: missing semicolons here and in the one for 'message'. Note that this *is* in our .eslintrc, I think...

::: browser/extensions/shield-recipe-client/lib/Heartbeat.js:310
(Diff revisions 3 - 4)
>      // Just open the engagement tab if we have a valid engagement URL.
> -    if (engagementURL) {
> +    if (this.options.postAnswerUrl) {
>        // Open the engagement URL in a new tab.
>        this.chromeWindow.gBrowser.selectedTab =
> -        this.chromeWindow.gBrowser.addTab(engagementURL.toString(), {
> +        this.chromeWindow.gBrowser.addTab(this.options.postAnswerUrl.toString(), {
>            owner: this.chromeWindow.gBrowser.selectedTab,

In fact, passing owner is probably also wrong? AIUI, the tab that happens to be selected when the user interacts with the survey is not necessarily related to the notification at all, because the notification is in a global notification bar (rather than a tab-specific one). Sorry for not explicitly bringing this up when commenting on relatedToCurrent.

::: browser/extensions/shield-recipe-client/lib/Http.js:38
(Diff revisions 3 - 4)
>            }
> -          if (response.status === 0 || (!acceptNon200 && response.status >= 400)) {
> +          if (response.status === 0 || response.status >= 400) {
> +            const err = new Error('Failed HTTP resonse');
> +            err.response = response;
> +            reject(err);
>              reject(response);

Unless interdiff is lying you're rejecting twice here - I'm assuming you meant to remove reject(response) ?
Attachment #8799078 - Flags: review?(gijskruitbosch+bugs)
Note that your trypushes indicate we're connecting to remote servers... we probably need to make sure that whatever's doing that is using a pref, and that we update the pref stuff in testing/ to use localhost or some other mock thing so the world doesn't break. :-)
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review84480

> I'm confused. Shouldn't this be done before this new setTimeout? Right now it's still possible for both of these timers to fire.

Ah, I misunderstood your first comment. After looking at it again, I understand now that you meant to call clearTimeout before calling setTimeout, instead of in the callback for setTimeout.

> Right, so... thinking about the style nits here, I don't know that there is strong consensus yet about how to deal with promises. We tend to use Task.spawn or Task.async which mean you just yield and don't need all the manual .then calls, so this doesn't come up much.
> 
> Where people split further ".foo" access between lines, prevailing style is that you indent the remainder, like in the NormandyDriver bit where you get the locale.
> 
> If it's possible to filter for this in eslint I think we should do so, but I don't think it has the capabilities right now.
> 
> Anyway, this means that if you want to keep the .then on the new line, that (and therefore probably the function body and the trailing `});`) should be indented.

I don't agree with this style, but that's ok. I'll change it all to indented-new-lines. Let me know if I missed any.

I did a quick search, and you're right, eslint can't make assertions about the indentation of chained calls on new lines. Oh well. I think they are tracking that in https://github.com/eslint/eslint/issues/5073.

> Shouldn't optionsType also be gone?

Sure. This file is being generated by JPM, and then I'm sed'ing out individual keys that aren't desired in the script that moves the code from the git repo to moz-central.

> Nit: missing semicolons here and in the one for 'message'. Note that this *is* in our .eslintrc, I think...

In `toolkit/.eslintrc` I see the rule `"semi": [2, "always"]`, but it is commented out. `eslint --print-config $THIS_FILE` also doesn't list it.
(In reply to Mike Cooper [:mythmon] from comment #30)
> > Shouldn't optionsType also be gone?
> 
> Sure. This file is being generated by JPM, and then I'm sed'ing out
> individual keys that aren't desired in the script that moves the code from
> the git repo to moz-central.

OK. What are we currently planning in terms of moving away from the SDK? That would avoid all of this as well...

> > Nit: missing semicolons here and in the one for 'message'. Note that this *is* in our .eslintrc, I think...
> 
> In `toolkit/.eslintrc` I see the rule `"semi": [2, "always"]`, but it is
> commented out. `eslint --print-config $THIS_FILE` also doesn't list it.

Right, again, we don't enforce everything that's in there because of legacy code. It's a lot of work (and for some rules, would break blame by a lot) to update all our JS.
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review85134

These changes look good wrt the last comments I made, but AIUI there will be more revisions? Clearing review to get this out of my queue.

::: browser/extensions/shield-recipe-client/lib/Heartbeat.js
(Diff revisions 4 - 5)
> - * @param {String} [options.engagementUrl=null]
> - *        The engagement URL to open in a new tab once user has engaged. If this is null
> - *        or invalid, no new tab is opened.

I'm assuming this wants postAnswerUrl adding in?
Attachment #8799078 - Flags: review?(gijskruitbosch+bugs)
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

This isn't read for more review, just pushing so I can share what I've been working on to get some help. Sorry for the noise!
Attachment #8799078 - Flags: review?(rhelmer)
Attachment #8799078 - Flags: review?(gijskruitbosch+bugs)
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review83236

> Did you mean to resolve these TODOs?
> 
> `about:plugins` uses `navigator.plugins` to get the list of plugins - it also does a quick refresh and fetches these async.

(I had typed this out before, but I guess I didn't submit it)

We don't have access to `navigator` in this context, since we aren't currently in a window. I could try and grab a window to access this from, but do you know another way of accessing this? How does `navigator.plugins` get populated?

Also, what do you mean by "it also does a quick refresh and fetches these async"? Can you link to the code that does this?
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review83236

> (I had typed this out before, but I guess I didn't submit it)
> 
> We don't have access to `navigator` in this context, since we aren't currently in a window. I could try and grab a window to access this from, but do you know another way of accessing this? How does `navigator.plugins` get populated?
> 
> Also, what do you mean by "it also does a quick refresh and fetches these async"? Can you link to the code that does this?

Oh good point about `navigator`. Sorry the code I was referring to is `about:plugins` which is https://dxr.mozilla.org/mozilla-central/rev/f9f3cc95d7282f1fd83f66dd74acbcdbfe821915/toolkit/content/plugins.html

`navigator` is exposed to the DOM via WebIDL: https://dxr.mozilla.org/mozilla-central/source/dom/webidl/Navigator.webidl

But forget all that, there's an easier way I somehow completely forgot to mention - `AddonManager` already provides an API you can use:

```
Cu.import("resource://gre/modules/AddonManager.jsm");
let p = new Promise(resolve => AddonManager.getAddonsByTypes(["plugin"], resolve));
p.then(arr => arr.map(a => console.log(a.name)));
```
Whiteboard: [go-faster-system-addon]
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review83408

> It seems like we could just have a separate commit removing the self-repair stuff if it's no longer being used when the add-on is installed? AIUI it won't be maintained going forward, so we can just remove it, right?

We should keep self-repair in so that we can roll this back with a system add-on update quickly in case things go sideways. We'd ship a no-op add-on that does nothing via Balrog, and self-repair would be active again so S&I doesn't lose their ability to run surveys and studies. Otherwise we'd need a point release to roll this back.

Once we're satisfied with how this performs on release we can probably safely remove self-repair.
(In reply to Michael Kelly [:mkelly,:Osmose] from comment #37)
> Comment on attachment 8799078 [details]
> Bug 1308656 - Add shield-recipe-client as system add-on
> 
> https://reviewboard.mozilla.org/r/84386/#review83408
> 
> > It seems like we could just have a separate commit removing the self-repair stuff if it's no longer being used when the add-on is installed? AIUI it won't be maintained going forward, so we can just remove it, right?
> 
> We should keep self-repair in so that we can roll this back with a system
> add-on update quickly in case things go sideways. We'd ship a no-op add-on
> that does nothing via Balrog, and self-repair would be active again so S&I
> doesn't lose their ability to run surveys and studies. Otherwise we'd need a
> point release to roll this back.
> 
> Once we're satisfied with how this performs on release we can probably
> safely remove self-repair.

Wouldn't it be simpler to:
0) not have code messing with the self-repair pref in the system add-on
1) turn off self-repair in firefox.js or whatever
2) if things go sideways, flip the self-repair pref with a hotfix (and/or with the no-op add-on you're suggesting).

That avoids the pref toggling code in this add-on without impacting our ability to remedy potential sideways-going if we release. :-)
(In reply to :Gijs Kruitbosch from comment #38)
> Wouldn't it be simpler to:
> 0) not have code messing with the self-repair pref in the system add-on
> 1) turn off self-repair in firefox.js or whatever
> 2) if things go sideways, flip the self-repair pref with a hotfix (and/or
> with the no-op add-on you're suggesting).
> 
> That avoids the pref toggling code in this add-on without impacting our
> ability to remedy potential sideways-going if we release. :-)

RE: (2), my understanding is that hotfixes are a huge pain to put out vs system add-on updates (which is why a lot of hotfixes are being written as system add-ons these days). And writing a separate system add-on that flips the pref, getting it tested and  approved, is all the work of pushing out a no-op add-on plus making it flip the pref anyway.

That plan also removes the possibility of us shipping the add-on via an update to an earlier version than the version this lands in (assuming it'll land in mozilla-central in 53 since we're so close to the train switch at this point) without de-syncing the Github version that we build the add-on from, and the version that lands in mozilla-central. That's because if we ship it to an earlier version, we need this pref logic in it, so if it got removed here, we'd have to add it back in for the system add-on update. 

Shipping that way is one of the possibilities suggested by QA for getting to 52 if this lands in moz-central after the train switch. We're discussing things with RelMan, but it'd be nice to keep that option open.

Do you have specific concerns about this pref toggling code that makes the alternative more attractive? We've been using it in internal testing for a while and it seems to work reliably enough.
(In reply to Michael Kelly [:mkelly,:Osmose] from comment #39)
> (In reply to :Gijs Kruitbosch from comment #38)
> > Wouldn't it be simpler to:
> > 0) not have code messing with the self-repair pref in the system add-on
> > 1) turn off self-repair in firefox.js or whatever
> > 2) if things go sideways, flip the self-repair pref with a hotfix (and/or
> > with the no-op add-on you're suggesting).
> > 
> > That avoids the pref toggling code in this add-on without impacting our
> > ability to remedy potential sideways-going if we release. :-)
> 
> RE: (2), my understanding is that hotfixes are a huge pain to put out vs
> system add-on updates (which is why a lot of hotfixes are being written as
> system add-ons these days). And writing a separate system add-on that flips
> the pref, getting it tested and  approved, is all the work of pushing out a
> no-op add-on plus making it flip the pref anyway.
> 
> That plan also removes the possibility of us shipping the add-on via an
> update to an earlier version than the version this lands in (assuming it'll
> land in mozilla-central in 53 since we're so close to the train switch at
> this point) without de-syncing the Github version that we build the add-on
> from, and the version that lands in mozilla-central. That's because if we
> ship it to an earlier version, we need this pref logic in it, so if it got
> removed here, we'd have to add it back in for the system add-on update. 
> 
> Shipping that way is one of the possibilities suggested by QA for getting to
> 52 if this lands in moz-central after the train switch. We're discussing
> things with RelMan, but it'd be nice to keep that option open.
> 
> Do you have specific concerns about this pref toggling code that makes the
> alternative more attractive? We've been using it in internal testing for a
> while and it seems to work reliably enough.

No specific concerns, just 'simplest thing that could possibly work'. I had not considered the idea that we wanted to ship post-facto to 52, rather than uplifting. We can leave it as-is, then, and simplify once we're "committed", in more than one sense of the word. :-)
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review83408

> You can pass appendNotification a document fragment instead of a string. This avoids having to do all the manual insertion afterwards, and might also help with adding classes for styling.

This part of the code will likely get reworked in the future anyways, so I'd prefer not to worry about this at the moment.
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review83236

> Hmm. I would have expected jpm to include the tests in the built addon, which this patch was based on. I'll the tests directory.
> 
> I agree that we should have tests that run in mozilla-central on checkin, but I'm not sure how to accomplish that. Do you know what we might do?

I included the tests, and hooked up a potential test runner to them, but I haven't been able to get it to work. Can you look over what I've got here?
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review89654

Getting closer - don't let the number of issues I've noted get you down, they are mostly pretty minor.

::: browser/extensions/shield-recipe-client/lib/EnvExpressions.js:11
(Diff revisions 5 - 7)
> -const {Jexl} = require('jexl');
> -const {stableSample} = require('./stableSample.js');
> +// const Jexl = require('../node_modules/jexl/dist/jexl.min.js');
> +const {Jexl} = require("jexl");

Can you clarify what's going on here? :-)

::: browser/extensions/shield-recipe-client/lib/Heartbeat.js:121
(Diff revisions 5 - 7)
>            return true;
>          },
>        }];
>      }
>  
> -    this.notificationBox = this.chromeWindow.document.getElementById('high-priority-global-notificationbox');
> +    this.notificationBox = this.chromeWindow.gBrowser.getNotificationBox();

This will only show up in 1 browser instead of every browser in the tabbrowser. Is that the goal?

::: browser/extensions/shield-recipe-client/lib/Heartbeat.js:278
(Diff revisions 5 - 7)
> -      addClientId: true,
> +        addClientId: true,
> -      addEnvironment: true,
> +        addEnvironment: true,
> -    });
> +      });
>  
> -    // only for testing
> +      // only for testing
> -    this.events.emit('TelemetrySent', payload);
> +      Log.debug("Senting telemetry");

Is the Log only for testing, or the event, or both (see comment above this line) - do we even still need it? If we do, can we make the logging less confusing? I thought I was going to nit you for "Sending" vs. "Senting", but then realize that maybe the "typo" was deliberate? Making it more obvious what this is logging would be good.

::: browser/extensions/shield-recipe-client/lib/Heartbeat.js:326
(Diff revisions 5 - 7)
> -    setTimeout(() => {
> +    setTimeout(() => this.close(), NOTIFICATION_TIME);
> +  }
> +
> +  handleWindowClosed() {
> +    this.maybeNotifyHeartbeat("WindowClosed");
> +    this.cleanUp();

typo - cleanup vs. cleanUp - so this should throw. Isn't it throwing? Kind of surprising uncaught exceptions don't break the tests, but I guess if they don't run, that's hard to verify...

But also, both this and `this.close()` call cleanup explicitly, but also, AIUI, fire events that lead to cleanup being called from the `maybeNotifyHeartbeat` code. We should pick either model of responsibility and use it consistently. :-)

::: browser/extensions/shield-recipe-client/lib/NormandyDriver.js:8
(Diff revisions 5 - 7)
>   * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
>  
>  "use strict";
>  
> -const {uuid} = require('sdk/util/uuid');
> -const {Cu} = require('chrome');
> +const {uuid} = require("sdk/util/uuid");
> +const {components, Cu, Cc, Ci} = require("chrome");

Nit: Components is normally uppercase. Not sure if this destructuring assignment even works.

::: browser/extensions/shield-recipe-client/lib/NormandyDriver.js:23
(Diff revisions 5 - 7)
> +  if (typeof sandboxManager !== "object") {
> +    throw new Error(`sandboxManager must be an object, not "${typeof sandboxManager}"`);
> +  }

Note that this will pass for `null`, because js. I'd probably just boolean-check, but if you prefer you can do "and and", ie both check for truthiness and typeof.

::: browser/extensions/shield-recipe-client/lib/NormandyDriver.js:93
(Diff revisions 5 - 7)
>        return sandbox.Promise.resolve(p);
>      },
>  
>      client() {
> -      let appinfo = {
> -        version: Services.appinfo.version,
> +      const appInfo = {
> +        version: Services.appInfo.version,

AFAICT it's Services.appinfo - Services.appInfo doesn't exist on my nightly, so this isn't right - why was this changed?

::: browser/extensions/shield-recipe-client/lib/NormandyDriver.js:105
(Diff revisions 5 - 7)
> +          Services.search.init(rv => {
> +            if (components.isSuccessCode(rv)) {
> +              appInfo.searchEngine = Services.search.defaultEngine.identifier;
> +            }
> +            resolve();
> +          });
> +        })
> +        // get list of plugins
> +        .then(() => new Promise(resolve => {
> +          AddonManager.getAddonsByTypes(["plugin"], plugins => {
> +            plugins.forEach(plugin => appInfo.plugins[plugin.name] = plugin);
> +            resolve();
> +          });
> +        }))

You can run these in parallel with something like:

```js
let searchEnginePromise = new Promise(resolve => {
  Services.search.init(rv => {
    if (Components.isSuccessCode(rv)) {
      appInfo.searchEngine = Services.search.defaultEngine.identifier;
    }
    resolve();
  });
});
let pluginsPromise = ...;
return new sandbox.Promise(resolve => {
  Promise.all([searchEnginePromise, pluginsPromise]).then(() => {
    resolve(Cu.cloneInto(appInfo, sandbox));
  });
});
```

::: browser/extensions/shield-recipe-client/lib/NormandyDriver.js:153
(Diff revisions 5 - 7)
> +      if (typeof cb !== "function") {
> +        throw new sandbox.Error(`setTimeout must be called with a function, got "${typeof cb}"`);
> +      }
> +      const token = setTimeout(() => {
> +        cb();
> +        sandboxManager.removeHold("setTimeout");

I don't think we ever add a hold with this name. :-)

::: browser/extensions/shield-recipe-client/lib/RecipeRunner.js:1
(Diff revisions 5 - 7)
> -/* This Source Code Form is subject to the terms of the Mozilla Public
> +/* This Source Code Form is subject to the terms of the Mozilla  * License, v. 2.0. If a copy of the MPL was not distributed with this

Something funky happened here, it looks like?

::: browser/extensions/shield-recipe-client/lib/RecipeRunner.js:138
(Diff revisions 5 - 7)
> +      var pendingAction = null;
> +
>        function registerAction(name, Action) {
>          let a = new Action(sandboxedDriver, sandboxedRecipe);
> -        a.execute()
> +        pendingAction = a.execute()
>          .catch(err => sandboxedDriver.log(err, 'error'));

Nit: indentation (at first I thought the previous line was missing a semicolon...).

Also, as an example, I *think* that for things like this, if the waiveXrays thing is in effect on the sandbox, and the code in the execute() function does something like this:

let x = {toString: function() { if (a) { return "foo" } return "bar"; };
throw x;

the logger's stringifier will be able to invoke the custom toString method. Without waiveXrays, this is not the case, and so it'll get the regular [object Object] toString. This is good because then objects from the sandbox cannot lie about what they are or  mislead the callers (I *think* the compartmentalization of the sandbox and function forwarding should stop privilege escalation, but ymmv).

::: browser/extensions/shield-recipe-client/lib/stableSample.js:49
(Diff revisions 5 - 7)
>      // toString(16) will give the hex representation of the number without padding
>      let stringValue = value.toString(16);
>      // We use concatenation and slice for padding
> -    let padding = '00000000';
> +    let padding = "00000000";
>      let paddedValue = (padding + stringValue).slice(-padding.length);
>      hexCodes.push(paddedValue);

Apparently we're shipping this, so:

```
hexCodes.push(value.toString(16).padStart(8, "0"));
```

should work here. :-)

::: browser/extensions/shield-recipe-client/data/EventEmitter.js:10
(Diff revision 7)
> +// This file is meant to run inside action sandboxes
> +
> +"use strict";
> +
> +
> +function EventEmitter(driver) { // eslint-disable-line no-unused-vars

Nit: I would instead use an annotation to indicate EventEmitter is exported from here.

::: browser/extensions/shield-recipe-client/data/EventEmitter.js:26
(Diff revision 7)
> +        .then(() => {
> +          if (!(eventName in listeners)) {
> +            driver.log(`EventEmitter: Event fired with no listeners: ${eventName}`);
> +            return;
> +          }
> +          let frozenEvent = Object.freeze(event);

Out of curiosity, why are we freezing this?

::: browser/extensions/shield-recipe-client/data/EventEmitter.js:43
(Diff revision 7)
> +      listeners[eventName].push(callback);
> +    },
> +
> +    off(eventName, callback) {
> +      if (!(eventName in listeners)) {
> +        throw new Error("Called off() for event that has no listeners");

Is this standard behaviour for `off()` in node? `removeEventListener` from DOM land would never throw, it would just do nothing.

::: browser/extensions/shield-recipe-client/data/EventEmitter.js:45
(Diff revision 7)
> +
> +    off(eventName, callback) {
> +      if (!(eventName in listeners)) {
> +        throw new Error("Called off() for event that has no listeners");
> +      }
> +      if (eventName in listeners) {

You already checked for the negative and are throwing (or perhaps early returning) so this extra check is unnecessary.

::: browser/extensions/shield-recipe-client/data/EventEmitter.js:46
(Diff revision 7)
> +    off(eventName, callback) {
> +      if (!(eventName in listeners)) {
> +        throw new Error("Called off() for event that has no listeners");
> +      }
> +      if (eventName in listeners) {
> +        listeners[eventName] = listeners[eventName].filter(l => l !== callback);

The nodejs docs say:

> removeListener will remove, at most, one instance of a listener from the listener array. If any single listener has been added multiple times to the listener array for the specified eventName, then removeListener must be called multiple times to remove each instance.

Your implementation removes all the matching listeners. Was that intentional?

::: browser/extensions/shield-recipe-client/data/EventEmitter.js:50
(Diff revision 7)
> +    once(eventName, callback) {
> +      var hasRun = false;
> +      this.on(eventName, event => {
> +        if (hasRun) {
> +          return;
> +        }
> +        hasRun = true;
> +        callback(event);
> +        this.off(eventName, callback);
> +      });
> +    },

I don't think this works.

You're adding an event listener, but you don't add the callback that you get passed, you're adding your own function. So the consumer cannot call `off()` itself, and the code in here that calls `off()` will be a no-op because it does not pass itself but the callback it was passed (which is not in the array).

::: browser/extensions/shield-recipe-client/lib/SandboxManager.js:14
(Diff revision 7)
> +    this._sandbox = makeSandbox();
> +    this.holds = [];
> +  }
> +
> +  get sandbox() {
> +    if (this._sandbox !== null) {

Rather than the strict comparisons with null everywhere, I would simply use truthiness of the `_sandbox` member. It's never going to be anything else than a valid object or null anyway. So:

```
if (this._sandbox) {
  return this._sandbox;
}
```

and in assertNuked,

```
if (!this._sandbox) {
  resolve();
} else { ... }
```

etc.

::: browser/extensions/shield-recipe-client/lib/SandboxManager.js:60
(Diff revision 7)
> +  }
> +};
> +
> +
> +function makeSandbox() {
> +  const sandbox = Cu.waiveXrays(new Cu.Sandbox(null, {

Why do we need waiveXrays? I... I don't think that's a good idea, off-hand.

::: browser/extensions/shield-recipe-client/test/.eslintrc.yaml:1
(Diff revision 7)
> +globals:

We're using .eslintrc.js in other places in the tree by now, so I would suggest changing so it matches. Sorry for the faff (I blame eslint for switching from the perfectly workable json format they had before).

::: browser/extensions/shield-recipe-client/test/jetpack-package.ini:1
(Diff revision 7)
> +[test-driver.js]

I haven't checked the tests - I'll doublecheck them when they work and the code is OK - and am unsure how to make jetpack tests run. Hopefully rhelmer can help, otherwise I guess maybe Mossop? IIRC he did some of the initial conversions so they run as hacked up mochitests in-tree.
Attachment #8799078 - Flags: review?(gijskruitbosch+bugs)
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review89738
Attachment #8799078 - Flags: review?(rhelmer)
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review89654

> I haven't checked the tests - I'll doublecheck them when they work and the code is OK - and am unsure how to make jetpack tests run. Hopefully rhelmer can help, otherwise I guess maybe Mossop? IIRC he did some of the initial conversions so they run as hacked up mochitests in-tree.

I just dug into this a bit with some addons folks - so the situation is that the old `cfx` tool is still used as a test runner for the in-tree SDK add-ons that we have, in mozilla-central `addon-sdk/source/test/addons`. However I think that's already all hooked up via `mach` and you don't necessarily need to get that deep into it.

It looks like `jetpack-package.ini` files are supported by `mach mochitest`, this seems to run some tests for me:
`./mach mochitest browser/extensions/shield-recipe-client/`

I see a bunch of "suite skipped" messages, but this is probably a place to start.
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review89654

> This will only show up in 1 browser instead of every browser in the tabbrowser. Is that the goal?

That is not the goal. I'm not really sure what these terms mean though. The goal is that changing tabs would still show the notification, but that the notification only shows in one window. Is a tab a "browser in the tabbrowser"? How can I do this better?

> typo - cleanup vs. cleanUp - so this should throw. Isn't it throwing? Kind of surprising uncaught exceptions don't break the tests, but I guess if they don't run, that's hard to verify...
> 
> But also, both this and `this.close()` call cleanup explicitly, but also, AIUI, fire events that lead to cleanup being called from the `maybeNotifyHeartbeat` code. We should pick either model of responsibility and use it consistently. :-)

I do have the tests running outside of mozilla-central, in the jpm repo. They are passing, so I guess this represents lacking coverage from the tests. I'll see if I can improve that.

> Nit: Components is normally uppercase. Not sure if this destructuring assignment even works.

The lowercase version is actually correct here, mostly because the addon SDK is weird. https://developer.mozilla.org/en-US/Add-ons/SDK/Tutorials/Chrome_Authority

> Nit: I would instead use an annotation to indicate EventEmitter is exported from here.

I'm not familiar with annotations. Can you give an example of this?

> Out of curiosity, why are we freezing this?

Since we will be passing the event object to multiple event handlers, I figured it would be good to not allow modification of the object between event handlers. Using Object.freeze seemed cleaner than creating a clone of the object per listener.

> Is this standard behaviour for `off()` in node? `removeEventListener` from DOM land would never throw, it would just do nothing.

I don't have an precedent for this. It seemed like a good thing to that code was balanced. Fitting in with DOM land sounds more important though.

> I don't think this works.
> 
> You're adding an event listener, but you don't add the callback that you get passed, you're adding your own function. So the consumer cannot call `off()` itself, and the code in here that calls `off()` will be a no-op because it does not pass itself but the callback it was passed (which is not in the array).

Ah, you're right! I spent a lot of time trying to figure out why this didn't work. I chalked it up to race conditions, but you're totally right.

Relatedly, I didn't think it was important for the consumer to be able to call off() itself. Do you think that is something that is important to support?

> Why do we need waiveXrays? I... I don't think that's a good idea, off-hand.

Several parts of the code rely on functions outside the sandbox seeing modifications on objects made inside the sandbox (I think this is expandos?). With out this, for example, I couldn't find a way to call |EventEmitter#emit| when the EventEmitter lived inside the sandbox. I would always get permission denied errors.

I suppose there could be a separate property, |sandboxManager.waivedSandbox| that could be used for these particular cases, and the general case would use the xrayed version.

> I just dug into this a bit with some addons folks - so the situation is that the old `cfx` tool is still used as a test runner for the in-tree SDK add-ons that we have, in mozilla-central `addon-sdk/source/test/addons`. However I think that's already all hooked up via `mach` and you don't necessarily need to get that deep into it.
> 
> It looks like `jetpack-package.ini` files are supported by `mach mochitest`, this seems to run some tests for me:
> `./mach mochitest browser/extensions/shield-recipe-client/`
> 
> I see a bunch of "suite skipped" messages, but this is probably a place to start.

I've talked with Mossop about this, and he pointed me towards the same thing. I worked on it a bit. Looking at the "suite skipped" messages, they are all import errors. Mossop recommended making a custom loader for the tests to fix the issue. After some work, I ended up with this, but it didn't work:


const {Loader, Require} = require("toolkit/loader");
const loader = new Loader({
  paths: {
    "": "resource://gre/modules/commonjs/",
    lib: "resource://shield-recipe-client-at-mozilla-dot-org/lib",
    test: "resource://shield-recipe-client-at-mozilla-dot-org/test",
  },
});
const extRequire = new Require(loader, module);

const {Storage} = extRequire("lib/Storage.js");


It worked to load files in lib and test directories, as well as all the normal SDK requires I use. But it failed on relative imports in SDK modules. For example, one of the tests has require("lib/Storage"). In lib/Storage.js, there is require("sdk/indexed-db"). *That* file has require("./self"). The import doesn't raise an error, but it also doesn't return the correct thing, and the test fails.

Do either of you know where I might go from there?
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review89654

> That is not the goal. I'm not really sure what these terms mean though. The goal is that changing tabs would still show the notification, but that the notification only shows in one window. Is a tab a "browser in the tabbrowser"? How can I do this better?

Right, so, terminology:
- browser: a thing that shows a webpage
- tabbrowser: `gBrowser` global in a Firefox window, which represents the tabbed browser, which has 1-N tabs with each tab having a browser (which we're slowly moving to constructing lazily, but anyway)

I reviewed the interdiff, and you used to fetch the "high priority notification box" with `getElementById`, which always shows at the top of every browser in a tabbrowser (so the notification will stay if you switch tabs).

`gBrowser.getNotificationBox()` instead will fetch the notification box specific to the currently selected tab (and therefore browser) in the tabbrowser. That's useful if you're doing a notification specific to the page open in the browser, but that's not what the heartbeat notifications generally want, I think.

So yeah, I think you should revert that line to using the high prio notification box. Does that make sense or am I missing something?

> The lowercase version is actually correct here, mostly because the addon SDK is weird. https://developer.mozilla.org/en-US/Add-ons/SDK/Tutorials/Chrome_Authority

D'oh, thanks for clarifying.

> I'm not familiar with annotations. Can you give an example of this?

Oh, I think I'm using inaccurate terminology. I mean, you're telling eslint to ignore that the EventEmitter is unused. You could instead tell it that the EventEmitter is exported, which seems semantically more precise. From a quick search, I think this is basically just:

```js
/* exported EventEmitter */
```

The internet also seems to indicate that assigning to `this.EventEmitter` might also make eslint accept that this is exported, but IMHO that's uglier than the comment. Up to you though.

> Since we will be passing the event object to multiple event handlers, I figured it would be good to not allow modification of the object between event handlers. Using Object.freeze seemed cleaner than creating a clone of the object per listener.

Right, this makes sense, I think. Thanks!

> Ah, you're right! I spent a lot of time trying to figure out why this didn't work. I chalked it up to race conditions, but you're totally right.
> 
> Relatedly, I didn't think it was important for the consumer to be able to call off() itself. Do you think that is something that is important to support?

I'm not sure how important being able to call `off()` for `once()` listeners will be for recipes in practice. I can see a use for it in general (e.g. if you want one of 3 different notifications, and as soon as any 1 comes in, you want to remove the others), but then, I didn't see uses of `once()` in the non-test code so perhaps the recipes/add-on wouldn't normally care - but maybe I missed them?

I *think* the easiest way to support external consumers being able to remove `once()` listeners would be for the array of listeners to have metadata indicating whether the listener should be removed after it's been invoked. You'd have to replace the `indexOf()` to find items with something like `findIndex(item => item.callback == callback)`, and the `emit()` code could itself remove items after it had emitted the relevant event to them (though be careful about modifying that list while iterating it to emit, of course...).

Anyway, if we think it's not important that can be a followup. Just need to remember to file all of those. :-)

> Several parts of the code rely on functions outside the sandbox seeing modifications on objects made inside the sandbox (I think this is expandos?). With out this, for example, I couldn't find a way to call |EventEmitter#emit| when the EventEmitter lived inside the sandbox. I would always get permission denied errors.
> 
> I suppose there could be a separate property, |sandboxManager.waivedSandbox| that could be used for these particular cases, and the general case would use the xrayed version.

Hm. I'm assuming we're talking about expandos on the stuff inside the sandbox being accessed by the code outside the sandbox? If the code outside the sandbox wanted to call EventEmitter, I think it could instead use evalInSandbox, and then you wouldn't need to waive xrays. So something like:

```js
sandbox._nextEvent = Cu.cloneInto(ev, sandbox);
sandbox._nextEvent.freeze(); // Not sure if this needs evalInSandbox, I think it shouldn't because it's a proto method rather than an expando.
Cu.evalInSandbox(`eventEmitter.emit("${eventName}", _nextEvent)`, sandbox);
```

Would that make sense, or am I missing a reason why that couldn't work?

> I've talked with Mossop about this, and he pointed me towards the same thing. I worked on it a bit. Looking at the "suite skipped" messages, they are all import errors. Mossop recommended making a custom loader for the tests to fix the issue. After some work, I ended up with this, but it didn't work:
> 
> 
> const {Loader, Require} = require("toolkit/loader");
> const loader = new Loader({
>   paths: {
>     "": "resource://gre/modules/commonjs/",
>     lib: "resource://shield-recipe-client-at-mozilla-dot-org/lib",
>     test: "resource://shield-recipe-client-at-mozilla-dot-org/test",
>   },
> });
> const extRequire = new Require(loader, module);
> 
> const {Storage} = extRequire("lib/Storage.js");
> 
> 
> It worked to load files in lib and test directories, as well as all the normal SDK requires I use. But it failed on relative imports in SDK modules. For example, one of the tests has require("lib/Storage"). In lib/Storage.js, there is require("sdk/indexed-db"). *That* file has require("./self"). The import doesn't raise an error, but it also doesn't return the correct thing, and the test fails.
> 
> Do either of you know where I might go from there?

I don't, sorry... I think we've already established I don't know very much about the SDK, so I'm not sure. Kris Maglione has recently changed some of how these require thingies work (so they're not as terrible for perf anymore, yay!) so maybe he has ideas?
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review90344

I think this is effectively almost done. We need to ensure that existing tests pass (see below) and that the new tests run in infra. I commented on the tests below. The code changes since last time look good, my earlier comment should help resolve the other issues that weren't addressed in the patch yet.

::: browser/extensions/shield-recipe-client/moz.build:10
(Diff revision 8)
> +# file, You can obtain one at http://mozilla.org/MPL/2.0/.
> +
> +DEFINES['MOZ_APP_VERSION'] = CONFIG['MOZ_APP_VERSION']
> +DEFINES['MOZ_APP_MAXVERSION'] = CONFIG['MOZ_APP_MAXVERSION']
> +
> +FINAL_TARGET_FILES.features['shield-recipe-client@mozilla.org'] += [

So I did a new trypush, and it's oranging on Windows and OS X saying:

 408 INFO TEST-UNEXPECTED-FAIL | toolkit/components/perfmonitoring/tests/browser/browser_AddonWatcher.js | The add-on watcher has detected the misbehaving addon - "shield-recipe-client@mozilla.org" == "addonwatcher-test@mozilla.com" - JS frame :: chrome://mochitests/content/browser/toolkit/components/perfmonitoring/tests/browser/browser_AddonWatcher.js :: burn_rubber</observer :: line 91 
 
 cf. https://treeherder.mozilla.org/#/jobs?repo=try&revision=04482dad1660&selectedJob=30323896

I guess we need to somehow tell that code / test to exclude this add-on? Not super familiar with that code, but we should sort it out before landing.

::: browser/extensions/shield-recipe-client/test/test-EventEmitter.js:58
(Diff revision 8)
> +  });
> +
> +  eventEmitter.emit("foo");
> +};
> +
> +exports["test off() works"] = (assert, done) => {

Other things about this event listener code that might be worth improving and testing if we think `off()` will be used in practice by recipe / other code: right now I think `off()` inside of an event listener (or alternatively, `on()` for the same event that is being emitted) will break iteration and so listeners will be skipped or called several times.

Simplest way to avoid this would be to clone the array (`Array.from(ary)`) before iterating over it, though that means that removals also don't take effect until after the current event has dispatched. Alternatively, you could iterate "old style" instead of `for...of` and decrement the iterator if you detect an item has been added/deleted, or something. It's tricky. Dunno offhand what node and/or the DOM do generally.

::: browser/extensions/shield-recipe-client/test/test-Heartbeat.js:49
(Diff revision 8)
> +    flowId: "test",
> +    message: "test",
> +    engagementButtonLabel: "Click me!",
> +  });
> +  assert.equal(hb.notice.querySelectorAll(".star-x").length, 0);
> +  assert.equal(hb.notice.querySelectorAll(".notification-button").length, 1);

Might as well verify the label is inserted on the button?

::: browser/extensions/shield-recipe-client/test/test-Heartbeat.js:87
(Diff revision 8)
> +    flowId: "test",
> +    message: "test",
> +  });
> +
> +  hb.eventEmitter.on("TelemetrySent", payload => {
> +    assert.ok(isOrdered([0, payload.offeredTS, payload.closedTS]));

Several points here (and other similar assertions):

- the failure message here will be useless if it ever fails. Can we write this in a way (e.g. by putting the asserts in the helper for each individual comparison, and actually using assert.greater() or something like this, so both values get echoed?
- `isOrdered` uses strict greater than comparisons. Is it not possible, say if this runs on a very fast machine, for the timestamps to be identical?
- Should we add `Date.now()` to the end of the array to also assert the dates are not in the future?

::: browser/extensions/shield-recipe-client/test/test-Heartbeat.js:168
(Diff revision 8)
> +  }
> +
> +  let promises = [];
> +
> +  for (let notification of notificationBox.allNotifications) {
> +    promises.push(waitForNotificationClose(notification));

This is slightly inefficient in that you're creating a mutation observer for each of these notifications, and the only reason we're using `waitForNotificationClose` is to verify that *all* of them have closed - so we might as well add only 1 mutation observer that disconnects and resolves when `allNotifications.length === 0`. Anyway, it's just a test, so maybe (probably!) I'm overthinking this. :-)

::: browser/extensions/shield-recipe-client/test/test-driver.js:12
(Diff revision 8)
> +let sandboxManager;
> +let driver;
> +
> +exports["test uuid"] = assert => {
> +  let uuid = driver.uuid();
> +  assert.notEqual(/^[a-f0-9-]{36}$/.exec(uuid), null);

Nit: assert.ok(/.../.test(str));

Here and elsewhere: should we include human-readable descriptions where we use `assert.ok` and/or `assert.equal`/`assert.notEqual` and the values aren't self-explanatory?

Trivial small additional test we could do: verify that running it twice produces the same or different UUIDs (depending on what we expect, I forget!).

::: browser/extensions/shield-recipe-client/test/test-env-expressions.js:2
(Diff revision 8)
> +const {Cu} = require("chrome");
> +Cu.import("resource://gre/modules/TelemetryController.jsm"); /* globals TelemetryController: false */

Nit: don't need the `: false` bit.

Also, I thought we had a mozilla eslint plugin that dealt with the implied semantics of Cu.import so you only need these annotations if the jsm's name doesn't match the thing you're globalling (e.g. "osfile.jsm" globals "OS"), which isn't the case here. Is it possible for you to use that plugin also for the github repo to avoid the extra comments?

::: browser/extensions/shield-recipe-client/test/test-env-expressions.js:50
(Diff revision 8)
> +  let context = {someTime: new Date(2016, 0, 1)};
> +
> +  return Promise.all([
> +    EnvExpressions.eval('"2015-01-01"|date < someTime', context)
> +      .then(val => assert.ok(val)),
> +    EnvExpressions.eval('"2017-01-01"|date > someTime', context)

Add one for equality?

::: browser/extensions/shield-recipe-client/test/test-env-expressions.js:56
(Diff revision 8)
> +      .then(val => assert.ok(val)),
> +  ]);
> +});
> +
> +exports["test has a stable sample transform"] = promiseTest(assert => {
> +  return EnvExpressions.eval('["test"]|stableSample(0.999)')

Does this fail one in 1000 times? You would be surprised how often that will start hitting in our m-c automation...

Can we think of a way to test this that doesn't cause test failures, maybe by comparing the value with the one generated by calling stableSample directly with similar input, for various inputs?
Attachment #8799078 - Flags: review?(gijskruitbosch+bugs)
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review89654

> Oh, I think I'm using inaccurate terminology. I mean, you're telling eslint to ignore that the EventEmitter is unused. You could instead tell it that the EventEmitter is exported, which seems semantically more precise. From a quick search, I think this is basically just:
> 
> ```js
> /* exported EventEmitter */
> ```
> 
> The internet also seems to indicate that assigning to `this.EventEmitter` might also make eslint accept that this is exported, but IMHO that's uglier than the comment. Up to you though.

I also found the notations /* exported EventEmitter */, but it didn't seem to work for me. I think assigning to this.EventEmitter is nicer than the disable-line comment, so I'll go with that.
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review91912

Code-wise, this looks OK. But I didn't see any changes to fix the addon watcher orange, nor a new trypush... are the issues in existing tests fixed and have we verified these tests are running on infra?

r+ in the sense that the code is fine, AFAICT, but I'm guessing there's 1 or 2 things left given those questions ^^ ?

::: browser/extensions/shield-recipe-client/test/test-EventEmitter.js:26
(Diff revisions 8 - 9)
> -    assert.equal(flag, 1);
> +    assert.equal(flag, 1, "event handler fired multiple times");
>      done();
>    });
> -  assert.equal(flag, 0);
> +  assert.equal(flag, 0, "event handler fired before event");
>    eventEmitter.emit("foo");
> -  assert.equal(flag, 0);
> +  assert.equal(flag, 0, "event handler fired syncronously");

Nit: synchronously has an h.
Attachment #8799078 - Flags: review?(gijskruitbosch+bugs) → review+
Thanks for all the review to get the code this far. I do plan on putting a bit more work in on this, but I wanted to get the most up to date patch on bugzilla so I could share it with others today. Specifically I'd like to get the tests running on moz-central (they still aren't), and fix that addon watch orange.

I had some questions about the addon watcher orange that got lost in reviewboard. I'll re-post those.
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review90344

> So I did a new trypush, and it's oranging on Windows and OS X saying:
> 
>  408 INFO TEST-UNEXPECTED-FAIL | toolkit/components/perfmonitoring/tests/browser/browser_AddonWatcher.js | The add-on watcher has detected the misbehaving addon - "shield-recipe-client@mozilla.org" == "addonwatcher-test@mozilla.com" - JS frame :: chrome://mochitests/content/browser/toolkit/components/perfmonitoring/tests/browser/browser_AddonWatcher.js :: burn_rubber</observer :: line 91 
>  
>  cf. https://treeherder.mozilla.org/#/jobs?repo=try&revision=04482dad1660&selectedJob=30323896
> 
> I guess we need to somehow tell that code / test to exclude this add-on? Not super familiar with that code, but we should sort it out before landing.

Looking at the failed test's code, I think there is a larger problem. It seems that there are no special cases for any other system add-on, so none of the other system are interfering with the test. I take that to mean that shield-recipe-client is the first system add-on that triggers the slow add-on warnings. Which means it triggers the slow add-on warnings, which worries me. I could probably patch the test to skip over the normandy system addon, accepting that it is slow. That seems bad though.

I've been running the regular version of this add-on (not the system add-on) for a while, and  I don't have any values for PERF_MONITORING_SLOW_ADDON_JANK_US relating to it in about:telemetry. So I haven't been able to reproduce this problem. I wonder why it is being tripped in this test?

What do you think? Patch the test? Or dig through the performance monitoring to figure out why this add-on is feing flagged?

> Several points here (and other similar assertions):
> 
> - the failure message here will be useless if it ever fails. Can we write this in a way (e.g. by putting the asserts in the helper for each individual comparison, and actually using assert.greater() or something like this, so both values get echoed?
> - `isOrdered` uses strict greater than comparisons. Is it not possible, say if this runs on a very fast machine, for the timestamps to be identical?
> - Should we add `Date.now()` to the end of the array to also assert the dates are not in the future?

assert.greater() doesn't seem to exist on the JPM assert library. Besides that, I agree. I'll try and make the assertions a little more helpful.

cf. https://developer.mozilla.org/en-US/Add-ons/SDK/Low-Level_APIs/test_assert

> This is slightly inefficient in that you're creating a mutation observer for each of these notifications, and the only reason we're using `waitForNotificationClose` is to verify that *all* of them have closed - so we might as well add only 1 mutation observer that disconnects and resolves when `allNotifications.length === 0`. Anyway, it's just a test, so maybe (probably!) I'm overthinking this. :-)

That sounds like a better idea. I tried to implement it, and wasn't receiving any events. I'll leave this as is.

> Add one for equality?

Equality ends up being pretty hard to test, because the system isn't really meant to test the equality of dates. This will mostly be used to put upper and/or lower bounds on recipes, equality isn't useful. No one really wants a recipe that only executes for one millisecond.

> Does this fail one in 1000 times? You would be surprised how often that will start hitting in our m-c automation...
> 
> Can we think of a way to test this that doesn't cause test failures, maybe by comparing the value with the one generated by calling stableSample directly with similar input, for various inputs?

This will either always fail, or always succeed. stableSample is not a random sampling, but a function of the input. The only input here is `["test"]`, which does not vary between tests.
(In reply to Mike Cooper [:mythmon] from comment #55)
> Comment on attachment 8799078 [details]
> Bug 1308656 - Add shield-recipe-client as system add-on
> 
> https://reviewboard.mozilla.org/r/84386/#review90344
> 
> > So I did a new trypush, and it's oranging on Windows and OS X saying:
> > 
> >  408 INFO TEST-UNEXPECTED-FAIL | toolkit/components/perfmonitoring/tests/browser/browser_AddonWatcher.js | The add-on watcher has detected the misbehaving addon - "shield-recipe-client@mozilla.org" == "addonwatcher-test@mozilla.com" - JS frame :: chrome://mochitests/content/browser/toolkit/components/perfmonitoring/tests/browser/browser_AddonWatcher.js :: burn_rubber</observer :: line 91 
> >  
> >  cf. https://treeherder.mozilla.org/#/jobs?repo=try&revision=04482dad1660&selectedJob=30323896
> > 
> > I guess we need to somehow tell that code / test to exclude this add-on? Not super familiar with that code, but we should sort it out before landing.
> 
> Looking at the failed test's code, I think there is a larger problem. It
> seems that there are no special cases for any other system add-on, so none
> of the other system are interfering with the test. I take that to mean that
> shield-recipe-client is the first system add-on that triggers the slow
> add-on warnings. Which means it triggers the slow add-on warnings, which
> worries me. I could probably patch the test to skip over the normandy system
> addon, accepting that it is slow. That seems bad though.
> 
> I've been running the regular version of this add-on (not the system add-on)
> for a while, and  I don't have any values for
> PERF_MONITORING_SLOW_ADDON_JANK_US relating to it in about:telemetry. So I
> haven't been able to reproduce this problem. I wonder why it is being
> tripped in this test?
> 
> What do you think? Patch the test? Or dig through the performance monitoring
> to figure out why this add-on is feing flagged?

Yoric wrote the addon jank stuff, so maybe he has ideas, but note bug 1309946. Do we do stuff on startup (like io, either on the network or on the local file system) that could account for this? If so, could we do it later on?

> > Several points here (and other similar assertions):
> > 
> > - the failure message here will be useless if it ever fails. Can we write this in a way (e.g. by putting the asserts in the helper for each individual comparison, and actually using assert.greater() or something like this, so both values get echoed?
> > - `isOrdered` uses strict greater than comparisons. Is it not possible, say if this runs on a very fast machine, for the timestamps to be identical?
> > - Should we add `Date.now()` to the end of the array to also assert the dates are not in the future?
> 
> assert.greater() doesn't seem to exist on the JPM assert library. Besides
> that, I agree. I'll try and make the assertions a little more helpful.
> 
> cf.
> https://developer.mozilla.org/en-US/Add-ons/SDK/Low-Level_APIs/test_assert

Oh, I assumed this was using Assert.jsm, which the MDN docs say was based on some standard. Part of the arguments I've heard for switching to Assert.jsm (as opposed to the stuff the existing mochitests use) is that it's a standard, so this is surprising and a bit sad. Oh well, leave it like this for now. :-(


(In reply to Mike Cooper [:mythmon] from comment #54)
> Thanks for all the review to get the code this far.

NP, I'm sorry for all the nits that I wish we were automatically linting more aggressively so we end up spending time on stupid things.

> I do plan on putting a
> bit more work in on this, but I wanted to get the most up to date patch on
> bugzilla so I could share it with others today.

Oh, right, sure. Note that reviewboard will now remember my r+ so if you need me to look again you get to jump through hoops. I'll just clear it to r=<blank> to avoid that - then reviewboard will reset to r? when you add new versions so I get pinged again.
Flags: needinfo?(mcooper)
Flags: needinfo?(dteller)
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review92046
Attachment #8799078 - Flags: review+
Flags: needinfo?(dteller)
(In reply to :Gijs Kruitbosch from comment #56)
> > Looking at the failed test's code, I think there is a larger problem. It
> > seems that there are no special cases for any other system add-on, so none
> > of the other system are interfering with the test. I take that to mean that
> > shield-recipe-client is the first system add-on that triggers the slow
> > add-on warnings. Which means it triggers the slow add-on warnings, which
> > worries me. I could probably patch the test to skip over the normandy system
> > addon, accepting that it is slow. That seems bad though.

Yes, this sounds like this is the first system add-on that triggers the slow add-on warning on tests. Hardly the first one to trigger it in practice, but this is worth investigating.

Is there any chance that the add-on may be doing lots of blocking CPU usage, or main thread I/O, or synchronous DOM reflows or calling CPOWs?

Given that the watcher relies a lot on probabilities to flag the right slow add-on, of course, there is also the chance that the add-on may just be doing a few things but may end up on the wrong bucket in this test. If that's the case, we should whitelist it in the test.
The only IO we are doing is network traffic and use of JSONFile.jsm. I don't expect that is the issue (I could be wrong). There is no DOM, so no synchronous DOM reflows. I don't think we are doing anything heavy on the CPU. I'm not sure about CPOWs.

The add-on does a lot of work with sandboxes, moving lots of objects in and out. Could this create CPOWs? I understand that Sandboxes run on the same thread as their parent, so I don't think this is the case.
Flags: needinfo?(mcooper) → needinfo?(dteller)
Well, JSONFile.jsm does cause main thread I/O, so it's quite possibly the culprit. I think that sandboxes don't cause any issues here. Mmmmh... this is not a Jetpack add-on, is it?
Flags: needinfo?(dteller) → needinfo?(mcooper)
(In reply to David Teller [:Yoric] (please use "needinfo") from comment #60)
> Well, JSONFile.jsm does cause main thread I/O, so it's quite possibly the
> culprit. 

I don't believe this is necessarily true - it is only the case if synchronous initialization is forced. Not sure if that is the case for this patch.

Also, I don't actually see JSONFile being used in the code. Mike, where are we using JSONFile?

On init, the code makes http requests, but even then, we only do this some seconds after the browser starts, and we respond to them async, so I don't see why it'd be particularly slow.
 
> I think that sandboxes don't cause any issues here. Mmmmh... this
> is not a Jetpack add-on, is it?

Yes, it is - but perf was supposed to have become much better. :-\
(In reply to :Gijs Kruitbosch from comment #61)
> (In reply to David Teller [:Yoric] (please use "needinfo") from comment #60)
> > Well, JSONFile.jsm does cause main thread I/O, so it's quite possibly the
> > culprit. 
> 
> I don't believe this is necessarily true - it is only the case if
> synchronous initialization is forced. Not sure if that is the case for this
> patch.

Good point.

> On init, the code makes http requests, but even then, we only do this some
> seconds after the browser starts, and we respond to them async, so I don't
> see why it'd be particularly slow.

Yeah, if it's async and not during startup-time-disk-and-cache-contention, this shouldn't be an issue.

> > I think that sandboxes don't cause any issues here. Mmmmh... this
> > is not a Jetpack add-on, is it?
> 
> Yes, it is - but perf was supposed to have become much better. :-\

Module loading is apparently much faster than it used to be. On the other hand, Jetpack used to cause CPOWs all the time (at least every tab opening), I don't know if this has been fixed.
Oh, I think I forgot to update the patch with the version that used JSONFile. In the same change, I've also eliminated most of my usage of Jetpack's require() function, so that should be gone too. Patch incoming...
Flags: needinfo?(mcooper)
I'll look at the updated patch tomorrow - it's evening here, but also for some reason the interdiff isn't a very good interdiff (that is, lots of files are listed as completely new that really aren't completely new), so it's going to take a little while again.
Sorry for trashing the interdiff. I renamed all the .js files to .jsm, which is probably what happened.
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review93468

After manually creating an interdiff by using `hg mv foo.js foo.jsm` a bunch of times on v9 of the diff, this was actually not so bad to review. :-)

Looks good - AFAICT all the JSONFile usage is async so I don't expect any mainthread IO. I'm away Thu/Fri, but in terms of moving forwards with the test, I see a few options:
- do a new trypush. The old one is from a few revs ago, so if we're lucky this has just been magically fixed... I'll do it after submitting this review.
- see if you can reproduce this locally and debug what's happening: ./mach mochitest toolkit/components/perfmonitoring/tests/ (it seems to be failing on Windows 7 VMs but not elsewhere?)
- add logging and push to try again to see if we see logging in the code here being hit while the test "burns rubber", as it says.
- this is almost entirely de-addon-sdk'd already, as far as I can tell? We might be able to go whole hog and rm the last vestiges and hope that gives us enough of a speed boost that the test will green up. I don't see much left besides tests and bootstrap.js ? More generally, I'm not exactly sure why we'd be running into this stuff unless the test running coincides with when we start fetching stuff and running recipes on a timeout after browser start, or something?

::: browser/extensions/shield-recipe-client/test/test-EventEmitter.js:1
(Diff revisions 9 - 11)
> +/* This Source Code Form is subject to the terms of the Mozilla Public
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */

We normally omit licenses on test files, which implies public domain ( https://blog.gerv.net/2014/09/licensing-policy-change-tests-are-now-public-domain/ )

::: browser/extensions/shield-recipe-client/lib/EnvExpressions.jsm:14
(Diff revision 11)
> +const {Loader, Require} = Cu.import("resource://gre/modules/commonjs/toolkit/loader.js", {});
> +const loader = new Loader({
> +  paths: {
> +    "": "resource://shield-recipe-client-at-mozilla-dot-org/node_modules/",
> +  },
> +});
> +const require = new Require(loader, {});
> +
> +const {Jexl} = require("jexl/lib/Jexl.js");

In terms of perf, it might be worth putting all of this (and the `new Jexl()` and `jexl.addTransforms()` calls) into a lazy getter? Then this won't get invoked (and loaded) until something actually uses it. You can use XPCOMUtils to make this easier:

```
XPCOMUtils.defineLazyGetter(this, "jexl", function() {
 // will be invoked the first time you try and use the global.jexl variable
});
```

If that is problematic for some reason, you can also use `XPCOMUtils.defineLazyModuleGetter(this, "EnvExpressions", path/to/module);` from all the consumers, assuming we don't actually normally need to invoke this until after we've fetched recipes from the http server?

For all of these, you'll want to `Cu.import` XPCOMUtils, of course.

::: browser/extensions/shield-recipe-client/lib/EnvExpressions.jsm:17
(Diff revision 11)
> +this.EXPORTED_SYMBOLS = ["EnvExpressions"];
> +
> +const {Loader, Require} = Cu.import("resource://gre/modules/commonjs/toolkit/loader.js", {});
> +const loader = new Loader({
> +  paths: {
> +    "": "resource://shield-recipe-client-at-mozilla-dot-org/node_modules/",

This is an odd path... shouldn't this be resource://shield-recipe-client/node_modules/ ?

::: browser/extensions/shield-recipe-client/lib/NormandyApi.jsm:16
(Diff revision 11)
> +Cu.import("resource://gre/modules/CanonicalJSON.jsm");
> +Cu.import("resource://shield-recipe-client/lib/Log.jsm");
> +
> +this.EXPORTED_SYMBOLS = ["NormandyApi"];
> +
> +const PREF_API_URL = "extensions.shield-recipe-client@mozilla.org.api_url";

Can we omit use of simple-prefs elsewhere (index.js) as well?

::: browser/extensions/shield-recipe-client/lib/NormandyApi.jsm:36
(Diff revision 11)
> +      }
> +      data = undefined;
> +    }
> +
> +    const headers = {"Accept": "application/json"};
> +    return fetch(url, {

Is this automatically in scope? I would have expected you to need importGlobalProperties for this - I mean, fine if not, obviously. :-)

::: browser/extensions/shield-recipe-client/lib/NormandyApi.jsm:51
(Diff revision 11)
> +  post(endpoint, data) {
> +    return this.apiCall("post", endpoint, data);
> +  },
> +
> +  fetchRecipes: Task.async(function* (filters={}) {
> +    const rawText = yield (yield this.get("recipe/signed/", filters)).text();

Nit: Can you unnest this with a temp? The double yield with double () is a bit tricky to read. :-)

::: browser/extensions/shield-recipe-client/lib/NormandyApi.jsm:62
(Diff revision 11)
> +      const serialized = CanonicalJSON.stringify(recipe);
> +      if (!rawText.includes(serialized)) {
> +        throw new Error("Canonical recipe serialization does not match!");
> +      }
> +
> +      let certChain = yield (yield fetch(x5u)).text();

Same

::: browser/extensions/shield-recipe-client/lib/NormandyDriver.jsm:70
(Diff revision 11)
> +      if (!this.testing) {
> +        defaultURL = "https://input.mozilla.org/api/v2/hb/";
> +      } else {
> +        defaultURL = "https://input.allizom.org/api/v2/hb/";
> +      }
> +      const url = Services.prefs.getStringPref(PREF_INPUT_HOST, defaultURL);

Sadly `Services.prefs` does not support a default value. If the pref is unknown it'll throw. You probably want to import Preferences.jsm and use Preferences.get() instead.

Note that you use Preferences.get elsewhere but don't specify fallback/default values...

::: browser/extensions/shield-recipe-client/lib/NormandyDriver.jsm:77
(Diff revision 11)
> +      let headers = {Accept: "application/json"};
> +
> +      Log.debug("Sending heartbeat flow data to Input", data);
> +
> +      // Make request to input
> +      let p = fetch(url, {body: JSON.stringify(data), headers})

Do we need to tell the server the mime type of our own request as well? Not sure to what degree the earlier wrapper took care of that.

::: browser/extensions/shield-recipe-client/lib/Storage.jsm:52
(Diff revision 11)
> +            .catch(err => {
> +              Log.error(err);
> +              reject(new sandbox.Error());
> +            });

This is the same 3-4 times. I'm tempted to suggest to stick this in a separate function and use `.catch(myErrorHandler.bind(null, reject))` or something. Up to you - might be less flexible if we think we will want different handlers in the future.

Or, actually, and maybe this makes more sense, maybe we should catch failures in the task jsm to get the file, and just ensure that the `loadStorage()` function always resolves with an object - it might just not be backed by actual JSON storage? Maybe that's evil? I dunno. :-)
Attachment #8799078 - Flags: review?(gijskruitbosch+bugs)
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

Clearing my review queue, since I know you're working on tests right now.
Attachment #8799078 - Flags: review?(rhelmer)
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review97754

::: browser/extensions/shield-recipe-client/bootstrap.js:56
(Diff revisions 11 - 12)
> +  RecipeRunner.init();
> +};
> +
> +this.shutdown = function(data, reason) {
> +  Cu.import("resource://shield-recipe-client/lib/CleanupManager.jsm");
> +  Cu.import("resource://shield-recipe-client/lib/PreferenceManager.jsm");

What's this file? It's not showing up at all, did you not `hg add` it?

::: browser/extensions/shield-recipe-client/data/EventEmitter.js:4
(Diff revisions 11 - 12)
>  /* This Source Code Form is subject to the terms of the Mozilla Public
>   * License, v. 2.0. If a copy of the MPL was not distributed with this
>   * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> -
> +"";

Eh, what's this? :-)

::: browser/extensions/shield-recipe-client/lib/EnvExpressions.jsm:26
(Diff revisions 11 - 12)
> -const {Jexl} = require("jexl/lib/Jexl.js");
> +XPCOMUtils.defineLazyGetter(this, "jexl", () => {
> +  const {Jexl} = global.nodeRequire("jexl/lib/Jexl.js");
> +  const jexl = new Jexl();
> +  jexl.addTransforms({
> +    date: dateString => new Date(dateString),
> +    stableSample: Sampling.stableSample,
> +  });
> +});

Shouldn't this explicitly return `jexl`? If I'm not missing something, then why aren't we noticing this in test failures? (If I *am* missing something, which seems likely, then what is that? :-) )

::: browser/extensions/shield-recipe-client/lib/NormandyDriver.jsm:64
(Diff revisions 11 - 12)
>        new Heartbeat(aWindow, ee, sandboxManager, internalOptions);
>        return sandbox.Promise.resolve(ee);
>      },
>  
> -    saveHeartbeatFlow(data) {
> -      let defaultURL;
> +    saveHeartbeatFlow() {
> +      // no-op required by spec

Can you remind me where I can read this spec?

::: browser/extensions/shield-recipe-client/lib/RecipeRunner.jsm:33
(Diff revisions 11 - 12)
>      let delay;
> -    if (Preferences.get(PREF_DEV_MODE)) {
> +    if (prefs.getBoolPref("dev_mode")) {
>        delay = 0;
>      } else {
>        // startup delay is in seconds
> -      delay = Preferences.get(PREF_STARTUP_DELAY) * 1000;
> +      delay = prefs.getIntPref("startup_delay") * 1000;

Nit: indicate units (ie seconds) in the pref name?

::: browser/extensions/shield-recipe-client/lib/RecipeRunner.jsm:41
(Diff revisions 11 - 12)
>      setTimeout(this.start.bind(this), delay);
>    },
>  
>    checkPrefs() {
>      // Only run if Unified Telemetry is enabled.
> -    if (!Preferences.get("toolkit.telemetry.unified", false)) {
> +    if (!Services.prefs.getBoolPref("toolkit.telemetry.unified", false)) {

`Services.prefs.get<whatever>()` doesn't take default values, only `Preferences.get` does. Also, `Services.prefs.get*` will throw if the pref does not exist (ie isn't in an extension or Firefox pref list).

::: browser/extensions/shield-recipe-client/package.json:18
(Diff revisions 11 - 12)
>    "keywords": [
>      "jetpack"
>    ],

Can this file just be rm'd now?

::: browser/extensions/shield-recipe-client/jar.mn:7
(Diff revision 12)
> +# License, v. 2.0. If a copy of the MPL was not distributed with this
> +# file, You can obtain one at http://mozilla.org/MPL/2.0/.
> +
> +[features/shield-recipe-client@mozilla.org] chrome.jar:
> +% resource shield-recipe-client %content/
> +  content/  (./*)

This double-packages everything, and also packs the test/ directory. That doesn't seem correct.

::: browser/extensions/shield-recipe-client/test/browser_EventEmitter.js:19
(Diff revision 12)
> +
> +const evidence = {
> +  a: 0,
> +  b: 0,
> +  c: 0,
> +  log: [],

You use this as a string elsewhere, so to avoid confusion, maybe just define it as "" instead of [] ?

::: browser/extensions/shield-recipe-client/test/browser_EventEmitter.js:59
(Diff revision 12)
> +  // Spin the event loop to run events
> +  yield Promise.resolve();
> +
> +  // one more event for a
> +  eventEmitter.off("event", listenerB);

Shouldn't we assert the intermediate values here?

::: browser/extensions/shield-recipe-client/test/browser_Heartbeat.js:1
(Diff revision 12)
> +/* This Source Code Form is subject to the terms of the Mozilla Public
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */

We don't normally MPL-license tests (you can just not have a header and tests will be public domain).

::: browser/extensions/shield-recipe-client/test/browser_Heartbeat.js:16
(Diff revision 12)
> +Cu.import("resource://shield-recipe-client/lib/SandboxManager.jsm", this);
> +Cu.import("resource://shield-recipe-client/lib/NormandyDriver.jsm", this);
> +Cu.import("resource://shield-recipe-client/lib/Log.jsm", this);
> +
> +/**
> + * Assert an array is in non-descending order, and that every element is a number

non-descending as in ascending? :-)

::: browser/extensions/shield-recipe-client/test/browser_Heartbeat.js:37
(Diff revision 12)
> +  }
> +
> +  const promises = [];
> +
> +  for (const notification of notificationBox.allNotifications) {
> +    promises.push(waitForNotificationClose(targetWindow, notification));

Hm, so can you clarify what you mean by "I wasn't getting any events" ? I would kinda assume that if we get mutation observer notifications for 1 removal, we get them for all of the removals?

::: browser/extensions/shield-recipe-client/test/browser_Heartbeat.js:80
(Diff revision 12)
> +  const eventEmitter = new sandboxManager.sandbox.EventEmitter(sandboxedDriver).wrappedJSObject;
> +  const targetWindow = Services.wm.getMostRecentWindow("navigator:browser");
> +  const notificationBox = targetWindow.document.querySelector("#high-priority-global-notificationbox");
> +
> +  const preCount = notificationBox.childElementCount;
> +  const hb = new Heartbeat(targetWindow, eventEmitter, sandboxManager, {

Hm, so does this run this code with privileges, whereas normally the recipe code would run it inside the sandbox, or am I confused? If so, it feels like ideally we should make the test run this inside the sandbox (if not in this test then in some other test). Otherwise we might accidentally break 'real' usage without breaking the tests.

::: browser/extensions/shield-recipe-client/test/browser_Heartbeat.js:102
(Diff revision 12)
> +  const tabOpenPromise = BrowserTestUtils.waitForNewTab(targetWindow.gBrowser)
> +    .then(tab => {
> +      tabLoadPromise = BrowserTestUtils.browserLoaded(
> +        tab.linkedBrowser, true, url => url && url !== "about:blank");
> +    });

Feels like you could create this promise after yielding for the tabOpenPromise, too? That is,

```js

const tabOpenPromise = BTU.waitForNewTab(targetWindow.gBrowser);
learnMoreEl.click();
let tab = yield tabOpenPromise;
let tabLoadPromise = BTU.browserLoaded(tab.linkedBrowser, ...);
```

(Same feedback for the other tasks)

::: browser/extensions/shield-recipe-client/test/browser_Heartbeat.js:110
(Diff revision 12)
> +  tabLoadPromise.then(tabUrl => {
> +    Assert.equal(tabUrl, "https://example.org/learnmore", "Learn more link opened the right url");
> +  });

Nit:

```js
let tabUrl = yield tabLoadPromise;
Assert.equal(tabUrl, ...);
```

is simpler.

::: browser/extensions/shield-recipe-client/test/browser_Heartbeat.js:123
(Diff revision 12)
> +  // Close notification to trigger telemetry to be sent
> +  yield closeAllNotifications(targetWindow, notificationBox);
> +  yield telemetrySentPromise;

You should close the tab you've opened here.

::: browser/extensions/shield-recipe-client/test/browser_Heartbeat.js:186
(Diff revision 12)
> +  const targetWindow = parentWindow.OpenBrowserWindow();
> +  targetWindow.addEventListener("load", function onLoad() {
> +    targetWindow.removeEventListener("load", onLoad);
> +    windowOpenResolve();
> +  });
> +  yield windowOpenPromise;

```js
let targetWindow = yield BrowserTestUtils.openNewBrowserWindow();
```

::: browser/extensions/shield-recipe-client/test/browser_Heartbeat.js:199
(Diff revision 12)
> +  let telemetrySentResolve;
> +  const telemetrySentPromise = new Promise(resolve => telemetrySentResolve = resolve);
> +  hb.eventEmitter.once("TelemetrySent", payload => {
> +    assertOrdered([0, payload.offeredTS, payload.windowClosedTS, Date.now()]);
> +    telemetrySentResolve();
> +  });

Is it worth factoring this out into a utility function?

::: browser/extensions/shield-recipe-client/test/browser_Heartbeat.js:207
(Diff revision 12)
> +    assertOrdered([0, payload.offeredTS, payload.windowClosedTS, Date.now()]);
> +    telemetrySentResolve();
> +  });
> +
> +  // triggers sending ping to normandy
> +  targetWindow.close();

```js
yield BrowserTestUtils.closeWindow(targetWindow);
```

::: browser/extensions/shield-recipe-client/test/browser_Heartbeat.js:216
(Diff revision 12)
> +  for (let i = 0; i < targetWindow.gBrowser.tabs.length; i++) {
> +    yield BrowserTestUtils.removeTab(targetWindow.gBrowser.tabs[i]);
> +  }

This removes half the tabs, because you're removing from the collection you're iterating over.

Best just remove them at the end of the task you add them in.

::: browser/extensions/shield-recipe-client/test/browser_driver_uuids.js:7
(Diff revision 12)
> +const sandboxManager = new SandboxManager();
> +sandboxManager.addHold("test running");
> +let driver = new NormandyDriver(sandboxManager);
> +
> +// Test that UUID look about right
> +const uuid1 = driver.uuid();
> +ok(/^[a-f0-9-]{36}$/.test(uuid1), "valid uuid format");
> +
> +// Test that UUIDs are different each time
> +const uuid2 = driver.uuid();
> +isnot(uuid1, uuid2, "uuids are unique");

I think the assertions should live inside a task or test function.
Attachment #8799078 - Flags: review?(gijskruitbosch+bugs)
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review97870

Almost there :)

Since this will get restartless updates, you'll need to make sure to unload modules and that all listeners etc. are removed (more detail on these below).

Also there are some vestiges of the SDK in here, let's remove them if it's no longer in use.

::: addon-sdk/source/python-lib/cuddlefish/prefs.py:81
(Diff revision 12)
>      'extensions.update.background.url': 'http://localhost/extensions-dummy/updateBackgroundURL',
>      'extensions.blocklist.url' : 'http://localhost/extensions-dummy/blocklistURL',
>      # Make sure opening about:addons won't hit the network.
>      'extensions.webservice.discoverURL' : 'http://localhost/extensions-dummy/discoveryURL',
>      'extensions.getAddons.maxResults': 0,
> +    'extensions.shield-recipe-client@mozilla.org.api_url' : 'https://localhost/selfsupport-dummy/',

Are these changes to `/addon-sdk` still necessary or have you moved completely away from SDK now?

::: browser/extensions/shield-recipe-client/bootstrap.js:54
(Diff revision 12)
> +
> +  Cu.import("resource://shield-recipe-client/lib/RecipeRunner.jsm");
> +  RecipeRunner.init();
> +};
> +
> +this.shutdown = function(data, reason) {

This needs to `Cu.unload` any modules loaded from `reource://shield-recipe-client` for two reasons:

1) this is necessary for the extension to load the new version of the module when a restartless update happens (which should always be the case for system add-ons now)

2) memory leaks if we ever remove the extension for some reason (also see the other common causes in https://developer.mozilla.org/en-US/docs/Extensions/Common_causes_of_memory_leaks_in_extensions )

::: browser/extensions/shield-recipe-client/jar.mn:7
(Diff revision 12)
> +# License, v. 2.0. If a copy of the MPL was not distributed with this
> +# file, You can obtain one at http://mozilla.org/MPL/2.0/.
> +
> +[features/shield-recipe-client@mozilla.org] chrome.jar:
> +% resource shield-recipe-client %content/
> +  content/  (./*)

A common way to handle this is to move everything you want to actually ship into a `./content` directory, for instance http://searchfox.org/mozilla-central/rev/a8b5f53e7df90df655a0982e94087ee83290c22e/browser/extensions/formautofill/jar.mn

::: browser/extensions/shield-recipe-client/lib/Heartbeat.jsm:157
(Diff revision 12)
> +        ratingElement.className = "plain star-x";
> +        ratingElement.id = "star" + starIndex;
> +        ratingElement.setAttribute("data-score", starIndex);
> +
> +        // Add the click handler
> +        ratingElement.addEventListener("click", ev => {

Does this listener get removed?

::: browser/extensions/shield-recipe-client/lib/Heartbeat.jsm:191
(Diff revision 12)
> +    if (this.options.learnMoreMessage && this.options.learnMoreUrl) {
> +      const learnMore = this.chromeWindow.document.createElement("label");
> +      learnMore.className = "text-link";
> +      learnMore.href = this.options.learnMoreUrl.toString();
> +      learnMore.setAttribute("value", this.options.learnMoreMessage);
> +      learnMore.addEventListener("click", () => this.maybeNotifyHeartbeat("LearnMore"));

Does this listener get removed?

::: browser/extensions/shield-recipe-client/lib/Log.jsm:1
(Diff revision 12)
> +/* This Source Code Form is subject to the terms of the Mozilla Public

Why ship your own logger instead of using `resource://gre/modules/Log.jsm` ?

::: browser/extensions/shield-recipe-client/package.json:1
(Diff revision 12)
> +{

Is this file still necessary?
Attachment #8799078 - Flags: review?(rhelmer)
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review97870

> Does this listener get removed?

Not explicitly, but I think that is ok. The notification is always removed, handled by the cleanup manager at the bottom of this giant function. I expect this will cause the handlers to get cleaned up just fine. I also don't think we keep any long lived references to the element either, only local variables.

All this is to say that I don't know a good way to properly remove this listeners, and so I don't want to bother figuring it out if it isn't needed.
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review97870

> Not explicitly, but I think that is ok. The notification is always removed, handled by the cleanup manager at the bottom of this giant function. I expect this will cause the handlers to get cleaned up just fine. I also don't think we keep any long lived references to the element either, only local variables.
> 
> All this is to say that I don't know a good way to properly remove this listeners, and so I don't want to bother figuring it out if it isn't needed.

I think you're right. It might be worth playing around with about:memory and the devtools browser toolbox and doing some restartless updates of the shield addon - I've found leaks before using these.

I don't think this needs to block landing though, but it'd be interesting to do once it's available in Nightly just to make sure there isn't anything we missed.

Unloading the modules is the really important bit anyway.
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review97754

> Can you remind me where I can read this spec?

http://normandy.readthedocs.io/en/latest/dev/driver.html

> Can this file just be rm'd now?

We still need it in the github repo for tooling reasons, but it doesn't need to be in the extension, or in mozilla-central.

> This double-packages everything, and also packs the test/ directory. That doesn't seem correct.

I'm actually not sure how this, bit works. :rhelmer wrote it.

Rob, can you look at this?

> non-descending as in ascending? :-)

To me ascending doesn't include a series like [0, 0, 0, 1, 1, 1], where-as non-descending does. It is >= vs >

> Hm, so can you clarify what you mean by "I wasn't getting any events" ? I would kinda assume that if we get mutation observer notifications for 1 removal, we get them for all of the removals?

I've forgotten what changes I made, but when I tried combining this as suggested, the events that notifications closed just never fired, and the tests hung. I can give it another try, with fresh eyes.

> Hm, so does this run this code with privileges, whereas normally the recipe code would run it inside the sandbox, or am I confused? If so, it feels like ideally we should make the test run this inside the sandbox (if not in this test then in some other test). Otherwise we might accidentally break 'real' usage without breaking the tests.

The recipe code runs in the sandbox, but it accesses this constructor via the driver, which runs outside the sandbox.
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review99564

::: browser/extensions/shield-recipe-client/jar.mn:7
(Diff revision 12)
> +# License, v. 2.0. If a copy of the MPL was not distributed with this
> +# file, You can obtain one at http://mozilla.org/MPL/2.0/.
> +
> +[features/shield-recipe-client@mozilla.org] chrome.jar:
> +% resource shield-recipe-client %content/
> +  content/  (./*)

Hm yeah that was based on an existing manifest that puts everything in `./content`... would you mind changing the directory layout (or the script that you use to copy the files over?)

If you make it look like e.g. https://dxr.mozilla.org/mozilla-central/source/browser/extensions/formautofill and change this from `(./*)` to `(content/*)`, that will keep your test files from being packaged too.
After testing this a bit, and a few false starts, I arrived at something different that seems to work:

[features/shield-recipe-client@mozilla.org] chrome.jar:
% resource shield-recipe-client %content/
  content/lib/ (./lib/*)
  content/data/ (./data/*)
  content/node_modules/jexl/ (/node_modules/jexl/*)

It is a little more wordy, but it also doesn't require code rearranging. I think the code is clearer this way. Does this sound ok?
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review99792

::: browser/extensions/shield-recipe-client/bootstrap.js:37
(Diff revisions 12 - 13)
>  
>  this.install = function() {
>    // Self Repair only checks its pref on start, so if we disable it, wait until
>    // next startup to run, unless the dev_mode preference is set.
> -  if (Services.prefs.getBoolPref(PREF_SELF_SUPPORT_ENABLED, true)) {
> -    Services.prefs.setBoolPRef(PREF_SELF_SUPPORT_ENABLED, false);
> +  if (Preferences.get(PREF_SELF_SUPPORT_ENABLED, true)) {
> +    Preferences.setBoolPref(PREF_SELF_SUPPORT_ENABLED, false);

This should use `Preferences.set` - `Preferences.setBoolPref` doesn't exist.

::: browser/extensions/shield-recipe-client/bootstrap.js:37
(Diff revisions 12 - 13)
> -    Services.prefs.setBoolPRef(PREF_SELF_SUPPORT_ENABLED, false);
> -    if (!Services.prefs.getBoolPref(PREF_DEV_MODE)) {
> +    Preferences.setBoolPref(PREF_SELF_SUPPORT_ENABLED, false);
> +    if (!Services.prefs.setBoolPref(PREF_DEV_MODE, false)) {

I'm confused what this is doing... `setBoolPref` doesn't return anything, I think, and doesn't take a second argument. It was using `.getBoolPref` before. What is this trying to do? Maybe you want `Preferences.get` instead?

::: browser/extensions/shield-recipe-client/lib/NormandyApi.jsm:13
(Diff revisions 12 - 13)
>  const {utils: Cu, classes: Cc, interfaces: Ci} = Components;
>  Cu.import("resource://gre/modules/Services.jsm");
>  Cu.import("resource://gre/modules/Task.jsm");
>  Cu.import("resource://gre/modules/CanonicalJSON.jsm");
> -Cu.import("resource://shield-recipe-client/lib/Log.jsm");
> -Cu.importGlobalProperties(["fetch"]);
> +Cu.import("resource://gre/modules/Log.jsm");
> +Cu.importGlobalProperties(["fetch"]); /* globals fetch */

Can you file a followup bug to make the mozilla eslint plugin do this automatically?

::: browser/extensions/shield-recipe-client/test/browser_Heartbeat.js:60
(Diff revisions 12 - 13)
> -    observer.observe(parent, {childList: true});
> +}
> +
> +/* Check that the correct telmetry was sent */
> +function assertTelemetrySent(hb, eventNames) {
> +  let telemetrySentResolve;
> +  const telemetrySentPromise = new Promise(resolve => { telemetrySentResolve = resolve; });

You could use:

```js
function assertTelemetrySent(hb, eventNames) {
  return new Promise(resolve => {
    // do everything in here.
  });
}
```

That saves you sticking the `resolve` method in a local variable. The function you pass the `Promise` constructor is guaranteed to run immediately.

::: browser/extensions/shield-recipe-client/test/browser_env_expressions.js:8
(Diff revisions 12 - 13)
>  const {utils: Cu} = Components;
>  Cu.import("resource://gre/modules/TelemetryController.jsm", this);
>  Cu.import("resource://gre/modules/Task.jsm", this);
>  
>  Cu.import("resource://shield-recipe-client/lib/EnvExpressions.jsm", this);
> -Cu.import("resource://shield-recipe-client/lib/Log.jsm", this);
> +Cu.import("resource://gre/modules/Log.jsm", this);

It looks like you don't actually use Log.jsm here, so maybe we can just get rid of this?
Attachment #8799078 - Flags: review?(gijskruitbosch+bugs) → review+
One final try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=1594d2fcaebedbd32c81ffcdcfc88111443d4e30

I think that the only oranges there are things that are not related to my code. I've fixed all the issues I see on mozreview, including Gijs's most recent comments.

I think this is ready to merge. Gijs and/or Rhelmer, what do you think? What are the next steps here?
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review100372

lgtm
Attachment #8799078 - Flags: review?(rhelmer) → review+
Comment on attachment 8799078 [details]
Bug 1308656 - Add shield-recipe-client as system add-on

https://reviewboard.mozilla.org/r/84386/#review99792

> Can you file a followup bug to make the mozilla eslint plugin do this automatically?

It seems the in-tree linter does do this automatically. There must be an issue with my external linting set up that made me add this. I've removed the comment.
Keywords: checkin-needed
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a53f5aa00347
Add shield-recipe-client as system add-on r=Gijs,rhelmer
Keywords: checkin-needed
Backed out for syntax error in browser_dbg_searchbox-parse.js (line cut off):

https://hg.mozilla.org/integration/autoland/rev/8358ac05d1643b69dca1babb699e42e1cb21438a

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=a53f5aa0034713ae0e5c257e01a53199e8ad5444
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=8236723&repo=autoland

[task 2016-12-21T08:27:29.414695Z] 08:27:29     INFO - TEST-START | devtools/client/debugger/test/mochitest/browser_dbg_searchbox-parse.js
[task 2016-12-21T08:27:29.432379Z] 08:27:29     INFO - TEST-INFO | started process screentopng
[task 2016-12-21T08:27:29.937936Z] 08:27:29     INFO - TEST-INFO | screentopng: exit 0
[task 2016-12-21T08:27:29.938960Z] 08:27:29     INFO - TEST-UNEXPECTED-FAIL | devtools/client/debugger/test/mochitest/browser_dbg_searchbox-parse.js | Exception thrown - at chrome://mochitests/content/browser/devtools/client/debugger/test/mochitest/browser_dbg_searchbox-parse.js:25 - SyntaxError: missing ) after argument list

The following got committed before:

>-    is(filterView.searchData.toSource(), '[":", ["", 42]]',
>+    is(filterView.searchData.toSour
Flags: needinfo?(mcooper)
But if it hadn't been backed out for that, it would have been for https://treeherder.mozilla.org/logviewer.html#?job_id=8246428&repo=autoland - there's something you have to do to talos when you add a system addon, dunno exactly what it is.
philor, aryx: I believe I have fixed the issues.
Flags: needinfo?(philringnalda)
Flags: needinfo?(mcooper)
Flags: needinfo?(aryx.bugmail)
Keywords: checkin-needed
Flags: needinfo?(philringnalda)
Pushed by archaeopteryx@coole-files.de:
https://hg.mozilla.org/integration/autoland/rev/b35d6cb8df5d
Add shield-recipe-client as system add-on r=Gijs,rhelmer
Keywords: checkin-needed
I think we've stuck the landing. Thanks for the help everyone!
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
(In reply to Mike Cooper [:mythmon] from comment #94)
> I think we've stuck the landing. Thanks for the help everyone!

The bug will automatically be marked as fixed when this hits m-c, and a number of other fields (target milestone, branch status flags) are also set then. Best to just wait for that to happen.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Is there a plan for QE to verify this?
Flags: needinfo?(mcooper)
https://hg.mozilla.org/mozilla-central/rev/b35d6cb8df5d
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
(In reply to :Gijs Kruitbosch from comment #95)
> (In reply to Mike Cooper [:mythmon] from comment #94)
> > I think we've stuck the landing. Thanks for the help everyone!
> 
> The bug will automatically be marked as fixed when this hits m-c, and a
> number of other fields (target milestone, branch status flags) are also set
> then. Best to just wait for that to happen.

Oh, or not, I guess, if this isn't in a component that has those fields. :-(
We have been talking with Adrian Florinescu to do QA on the add-on.
Flags: needinfo?(mcooper)
this has a perma leak upon landing- 
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d252d4d2a4f0b1f63da4f0e10cee64f4e7d2e218

on just about every platform (linux64 bc5):
[task 2016-12-23T21:57:07.610671Z] 21:57:07    ERROR - TEST-UNEXPECTED-FAIL | browser/extensions/shield-recipe-client/test/browser_Heartbeat.js | leaked 1 window(s) until shutdown [url = chrome://browser/content/browser.xul]
[task 2016-12-23T21:57:07.612007Z] 21:57:07    ERROR - TEST-UNEXPECTED-FAIL | browser/extensions/shield-recipe-client/test/browser_Heartbeat.js | leaked 1 window(s) until shutdown [url = about:blank]
[task 2016-12-23T21:57:07.615469Z] 21:57:07     INFO - TEST-INFO | browser/extensions/shield-recipe-client/test/browser_Heartbeat.js | windows(s) leaked: [pid = 1780] [serial = 29], [pid = 1780] [serial = 30]

I was ready to do a landing last night to make these leaks turn jobs orange/red until I saw this error.  Our options are to backout or to disable the test.
Flags: needinfo?(mcooper)
Depends on: 1326225
:jmaher, can we file another bug for that? I think it would make sense to roll forward with this, instead of backing out.
Flags: needinfo?(mcooper) → needinfo?(jmaher)
I filed bug 1328590
Depends on: 1328590
Flags: needinfo?(jmaher)
Depends on: 1330954
See Also: → 1361578
Recently I have received various annoying notifications. They are in a notificationbox element with id="high-priority-global-notificationbox". It seems this was introduced in this bug. How can I get rid of this?
:Oriol, you're right that Shield is one of the ways these notifications are created. It isn't the only one though. The intent is definitely that these notifications should occur at a rate low enough to not be annoying. If you're annoyed, I'd consider that a bug. Can you provide more details about how often these notifications are showing up, and what kind of contents they have? I worry that this may represent a larger bug that may be affecting others, and I'd like to track that down.

If you'd prefer, it may be appropriate to take this conversation outside of this bug, since this bug is fairly old, covers a large amount of code, and includes many people CCed. We could either file a new bug, or you can email me directly at mcooper@mozilla.com.

Also, if these notifications are truly annoying, you can disable the entire Shield system (which may not be the source of your notifications) by setting the pref "extensions.shield-recipe-client.enabled" to false. I would hesitate to do this though, since Shield is used for other features that you may find useful. It would also make it harder to track down this issue.
Flags: needinfo?(oriol-bugzilla)
In fact I haven't received that many notifications (three, I think), but I had never seen them until few days ago so I worried they would continue appearing from now on. They were about legacy add-ons. One linked the "Migrating from Firebug" MDN article, despite having Firebug disabled. And suggestions about replacing a legacy add-on with a webext one. But I have extensions.legacy.enabled=true, so I'm not interested.

I already suspected disabling extensions.shield-recipe-client.enabled could work, effectively I haven't received more notifications since I switched it.
Flags: needinfo?(oriol-bugzilla)
Thanks for the info Oriol. You've comforted me that this isn't an out of control notification, and that there isn't a technical bug here. It is really helpful to get this feedback. As you can see from this bug, the feature has been live for a while. It's unfortunate that you got a quick cluster of so many notifications, but I don't think this represents a long term trend.

It's not really related, but you should probably know that extensions.legacy.enabled=true isn't going to continue working once Firefox 57 is released. If you'd like to talk more about this, I'd be happy to chat via email about it. This bug isn't the right place for further discussion on that matter.

Thanks again for the help!
Product: Shield → Firefox
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: