Last Comment Bug 1018200 - Run an experiment in 32 beta to determine the best value for dom.ipc.plugins.unloadTimeoutSecs
: Run an experiment in 32 beta to determine the best value for dom.ipc.plugins....
Status: VERIFIED FIXED
:
Product: Core
Classification: Components
Component: Plug-ins (show other bugs)
: unspecified
: All All
-- normal (vote)
: ---
Assigned To: Qeole [:qeole]
: Kamil Jozwiak [:kjozwiak]
: Benjamin Smedberg [:bsmedberg]
Mentors: Benjamin Smedberg [:bsmedberg]
Depends on: 1007490 1011136 1017806
Blocks: 1042184
  Show dependency treegraph
 
Reported: 2014-05-30 06:37 PDT by Benjamin Smedberg [:bsmedberg]
Modified: 2014-08-20 06:31 PDT (History)
14 users (show)
benjamin: firefox‑backlog+
See Also:
Crash Signature:
(edit)
QA Whiteboard: [qa!]
Iteration: 34.1
Points: 5
Has Regression Range: ---
Has STR: ---
verified


Attachments
v1-issues.patch (7.87 KB, patch)
2014-07-11 08:09 PDT, Qeole [:qeole]
no flags Details | Diff | Splinter Review
bug1018200-v2.patch (6.06 KB, patch)
2014-07-11 16:39 PDT, Qeole [:qeole]
benjamin: review+
Details | Diff | Splinter Review

Description User image Benjamin Smedberg [:bsmedberg] 2014-05-30 06:37:51 PDT
In bug 1007490 we changed the preference dom.ipc.plugins.unloadTimeoutSecs in FF32 down to 30 seconds, but we don't have a good understanding of how this will affect users in the field.

We're trading off killing a perhaps-leaky plugin instance more frequently, which may cause user slowdowns, against the jank of launching a new plugin process.

I'd like to run an experiment early in 32beta to try out various values for the dom.ipc.plugins.unloadTimeoutSecs pref, using the following values:

0, 5, 15, 30, 60, 90, 120, 300

Then we will check what this does to the total number of plugin launches as well as measuring the jank from those launches.

This bug will tracking the effort to make and deploy the experiment, which I expect to be relatively easy once bug 1017806 is fixed; I'll file a separate bug to analyze the results.
Comment 1 User image Qeole [:qeole] 2014-05-30 06:55:25 PDT
I'd like to work on this (once bug 1017806 is fixed).
Comment 2 User image Benjamin Smedberg [:bsmedberg] 2014-07-07 11:43:11 PDT
qeole, 32 beta is in two weeks. Are you willing to take this now so that we're ready?

I think it will be very easy:

http://hg.mozilla.org/users/bsmedberg_mozilla.com/telemetry-experiment-server/file/844cc35009f0/experiments/branch-test/code/bootstrap.js has some sample code for using experiment branches.

And I think what you want to do is something like:

Cu.import("resource://gre/modules/Preferences.jsm");

let branches = ["0", "5", "15", "30", "60", "90", "120", "300"];
...
let defaultPrefs = Preferences.Preferences({defaultBranch: true});
defaultPrefs.set("dom.ipc.plugins.unloadTimeoutSecs", parseInt(branch));
Comment 3 User image Qeole [:qeole] 2014-07-07 14:59:40 PDT
Yes, I'll be glad to take it. I'll start looking at it tomorrow.
Comment 4 User image Qeole [:qeole] 2014-07-11 08:09:06 PDT
Created attachment 8454460 [details] [diff] [review]
v1-issues.patch

So I've been working on the experiment.
I installed the Staging server on my computer as explained in [1] and used code from comment #2.

I have a working experiment, but I ran into a couple of issues I'd like to solve before I can consider it finished.
I attach what I have for feedback. There are two manifest.json: see (3.) below for explanation.

 1. From bsmedberg's code, |Preferences.Preferences()| call on second line does not work. Browser console says:
“Got error during bootstrap startup: TypeError: Preferences.Preferences is not a function”.
I'm not good enough at Javascript to find what's wrong exactly, so I replaced it by a dirty snippet which actually runs the instruction we were expecting from the |Preferences.Preferences()| function (see bootstrap.js in attachment).

 2. When installing the experiment, the startup function is launched twice. I first thought double logs were due to the presence of two observer agents, one from Experiments and another one from ExperimentEntry. But random branch attribution is performed twice, (often) with distinct resulting values. Here are some logs I got:
    Setting experiment branch to 60
    Setting dom.ipc.plugins.unloadTimeoutSecs preference to 60
    Setting experiment branch to 120
    Setting dom.ipc.plugins.unloadTimeoutSecs preference to 120
    Bootstrap startup succeeded
(second execution prevails, in this case branch and pref end up at 120). It does not prevent experiment from working, but I'm not sure this is an expected/desired behavior.

 3. I found two different manifests.json formats, and had issues with both:
    a) format use in other experiment and in branch-test [2] fails to load with following error:
“Browser.Experiments.Experiments ERROR   Experiments #0::_loadManifest - failure to fetch/parse manifest (continuing anyway): TypeError: this._log.warning is not a function”
    b) “experiments manifests” format described in [3] works well but Staging server does not accept it for “Firefox Telemetry Experiments” web page as it misses meta top-level keys (“publish” and so on).
    I included both versions I used in attachment.

Further questions / remarks:

 4. Experiment itself seems to work well. Pref is changed and keeps a “default value” status, and I could observe following lines in FHR raw data:
    "org.mozilla.experiments.info": {
      "_v": 2,
      "lastActive": "fx-dom.ipc.plugins.unloadTimeoutSecs-beta32@mozilla.org",
      "lastActiveBranch": "120"
    },

 5. Exhaustive list of files I should provide is: [ bootstrap.js, install.rdf, experiment.xpi, manifest.json ], right?

 6. What 'debug' info should be left in bootstrap.js (regarding the |Cu.reportError()| calls)?

 7. I'm unsure what exact max times/max versions should be set in manifest.json and install.rdf. Current values I chose for manifest.json are: |startTime|: July 1st 2014, |endTime|: January 1st 2015, |maxActiveSeconds|: 6 weeks (42 days). |xpiURL| will probably have to be updated. Also, I don't know for |priority| and |sample|. I suppose |channel| should be set to "beta"?.

 8. I wrote a very short Makefile for zipping again experiment.xpi and updating hash in manifest.json with a sed line when I update the experiment code. Could it be of any interest to other people? Maybe I could drop it on the wiki page for editing XPIs ([4])? I include it in current patch for info, but will of course remove it for final submission.

----
[1] https://wiki.mozilla.org/QA/Telemetry/Installing_Staging_Server
[2] https://hg.mozilla.org/users/bsmedberg_mozilla.com/telemetry-experiment-server/rev/branch-testing
[3] https://ci.mozilla.org/job/mozilla-central-docs/Tree_Documentation/experiments/manifest.html
[4] https://wiki.mozilla.org/QA/Telemetry/EditingXPIFiles
Comment 5 User image Benjamin Smedberg [:bsmedberg] 2014-07-11 09:13:30 PDT
>  1. From bsmedberg's code, |Preferences.Preferences()| call on second line
> does not work. Browser console says:
> “Got error during bootstrap startup: TypeError: Preferences.Preferences is
> not a function”.

I was wrong. You actually just need

Preferences({defaultBranch: true});

>  2. When installing the experiment, the startup function is launched twice.
> I first thought double logs were due to the presence of two observer agents,
> one from Experiments and another one from ExperimentEntry. But random branch
> attribution is performed twice, (often) with distinct resulting values. Here
> are some logs I got:
>     Setting experiment branch to 60
>     Setting dom.ipc.plugins.unloadTimeoutSecs preference to 60
>     Setting experiment branch to 120
>     Setting dom.ipc.plugins.unloadTimeoutSecs preference to 120
>     Bootstrap startup succeeded

This is a little surprising, but it matches a bug that Felipe said he had with a prior experiment. I believe that we're getting a separate notification for "install" and "enable". Felipe do you have more details about that, or did you file a bug?

There are two ways we can try working around this, using a flag or by checking the start-reason:

Using a flag:

var gStarted = false;

function startup() {
  if (gStarted) {
    return;
  }
  gStarted = true;
  ...
}

function shutdown() {
  gStarted = false;
}

Using the reason:

function startup(data, reason) {
  if (reason == 5) // ADDON_INSTALL... it's a bug but we should get ADDON_ENABLE next
    return;
}

I think the flag is probably the better option.

>  3. I found two different manifests.json formats

The experiment build system automatically converts one into the other when you run build.py and adds the correct XPI hash. You don't need to do any of the hash or URI calculations.

>  6. What 'debug' info should be left in bootstrap.js (regarding the
> |Cu.reportError()| calls)?

Remove the ones which are just for logging. Keep the ones which are actually for error reporting.

>  7. I'm unsure what exact max times/max versions should be set in
> manifest.json and install.rdf. Current values I chose for manifest.json are:
> |startTime|: July 1st 2014, |endTime|: January 1st 2015, |maxActiveSeconds|:
> 6 weeks (42 days). |xpiURL| will probably have to be updated. Also, I don't
> know for |priority| and |sample|. I suppose |channel| should be set to
> "beta"?.

I think we should stop the experiment after the 6 weeks of 32 beta, which according to https://wiki.mozilla.org/RapidRelease/Calendar will end 2-Sep. So;

startdate: 1-Jul
enddate: 2-Sep
minversion: 32.0
maxversion: 34.*
maxActiveSecond: 42 days
This is not the most important experiment, so priority should be a high number (P1 is evaluated first) such as 10.
In terms of sample, I think we should go for 5% (0.05). That will give us enough data, and shouldn't affect any other experiments we want to run.

>  8. I wrote a very short Makefile for zipping again experiment.xpi and
> updating hash in manifest.json with a sed line when I update the experiment

gps is working on fixing this generically: see bug 1037160.
Comment 6 User image Qeole [:qeole] 2014-07-11 16:39:07 PDT
Created attachment 8454779 [details] [diff] [review]
bug1018200-v2.patch

Thanks Benjamin.

I fixed issue 1. and 3. (I hadn't seen that Staging server would create a new firefox-manifest.json, now I understand better how it works).

I used a flag as you recommand for issue 2. It works well. I let an error message about that, as startup() function should not be called twice.

I've written a new patch which takes into account the answers you provided to all my questions. It seems to work well (experiment loads no more on my nightly since I switched to "beta" in manifest.json file, but this is what we expect), and I have no known issues left.
Could you please provide review?
Comment 7 User image :Felipe Gomes (needinfo me!) 2014-07-14 13:00:10 PDT
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #5)
> Preferences({defaultBranch: true});
> 
> >  2. When installing the experiment, the startup function is launched twice.
> > I first thought double logs were due to the presence of two observer agents,
> > one from Experiments and another one from ExperimentEntry. But random branch
> > attribution is performed twice, (often) with distinct resulting values. Here
> > are some logs I got:
> >     Setting experiment branch to 60
> >     Setting dom.ipc.plugins.unloadTimeoutSecs preference to 60
> >     Setting experiment branch to 120
> >     Setting dom.ipc.plugins.unloadTimeoutSecs preference to 120
> >     Bootstrap startup succeeded
> 
> This is a little surprising, but it matches a bug that Felipe said he had
> with a prior experiment. I believe that we're getting a separate
> notification for "install" and "enable". Felipe do you have more details
> about that, or did you file a bug?

No bug filed yet, but this is exactly it. I'm trying to document all the install/startup behaviors of add-ons/experiments to file a bug with some good info. But the workaround done here is indeed what should be done.
Comment 8 User image Qeole [:qeole] 2014-07-14 14:05:55 PDT
In regard with that, and in case it could help, I printed internal reasons for the duplicated |startup()| launch. I get:
  Install:     5
  Startup (1): 3
  Startup (2): 5

Where (from [1]):
  ADDON_ENABLE  (3) is “The add-on is being enabled.”
  ADDON_INSTALL (5) is “The add-on is being installed.”

[1] https://developer.mozilla.org/en-US/Add-ons/Bootstrapped_extensions#Reason_constants
Comment 9 User image Benjamin Smedberg [:bsmedberg] 2014-07-14 15:08:45 PDT
Comment on attachment 8454779 [details] [diff] [review]
bug1018200-v2.patch

This looks great. I'll land this to staging and then ask you and some external QA to validate before we go to production.
Comment 11 User image Benjamin Smedberg [:bsmedberg] 2014-07-15 06:48:42 PDT
NI for qa verification on stage before we push to production. Because the experiment is limited to the beta channel, this may involve hacking a current aurora build to report on the beta channel for testing.
Comment 12 User image Florin Mezei, QA (:FlorinMezei) 2014-07-15 06:51:19 PDT
Since this is related to experiments this goes to Kamil. Kamil, let me know if you need any help from us.
Comment 13 User image Gregory Szorc [:gps] 2014-07-15 11:23:53 PDT
Why doesn't the add-on reset the changed pref on uninstall? Shouldn't experiments clean up after themselves?
Comment 14 User image Benjamin Smedberg [:bsmedberg] 2014-07-15 11:30:34 PDT
It's only setting the default pref value; there is no persistent pref setting.
Comment 15 User image Kamil Jozwiak [:kjozwiak] 2014-07-23 11:32:48 PDT
Went through verification using the following build & telemetry experiment staging server:

Build -> ftp://ftp.mozilla.org/pub/mozilla.org/firefox/candidates/32.0b1-candidates/build2/win32/
Staging -> https://telemetry-experiment-dev.allizom.org/firefox-manifest.json

- ensured that by default, dom.ipc.plugins.unloadTimeoutSecs;30
- ensured that once the experiment is installed, it appears under about:addons
- ensured that once the experiment is installed, it appears under about:support
- ensured that once the experiment is installed, branch information is being listed under about:healthreport
- ensured that the branch number matches the dom.ipc.plugins.unloadTimeoutSecs pref
- ensured that only a single experiment is installed at the same time (tested with search experiment on beta en-us)
- ensured that the experiment appears disabled under about:addons once it's been removed/expired
- ensured that the experiment appears disabled under about:support once it's been removed/expired
- ensured that experiments.activeExperiment;false once the experiment has been removed/expired
- noticed that the defualt pref is not being restored but this has been addressed in comment #13 and comment #14

Ensured that I landed in several different branches:

- dom.ipc.plugins.unloadTimeoutSecs;0 (appearing as Branch: 0 under about:healthreport)
- dom.ipc.plugins.unloadTimeoutSecs;300 (appearing as Branch: 300 under about:healthreport)
- dom.ipc.plugins.unloadTimeoutSecs;90 (appearing as Branch: 90 under about:healthreport)
- dom.ipc.plugins.unloadTimeoutSecs;60 (appearing as Branch: 60 under about:healthreport)
- dom.ipc.plugins.unloadTimeoutSecs;120 (appearing as Branch: 120 under about:healthreport)

Also ran through the above test cases using the following locales:
- en-us, cs, is, ga-IE

Possible Issues:

When the experiment is installed, I see the following error message every time in the browser console:

1406136932558	Browser.Experiments.Experiments	INFO	ExperimentEntry #5::onEnabled() for fx-dom.ipc.plugins.unloadTimeoutSecs-beta32@experiments.mozilla.org
startup() function was launched twice; making second run return. (bootstrap.js:16) <-- error highlighted in red

The experiment does get installed correctly, but I just want to make sure that this error is either expected and won't cause other issues down the road.
Comment 16 User image Kamil Jozwiak [:kjozwiak] 2014-07-23 11:42:36 PDT
:gfritzsche pointed out on IRC that the possible issue I mentioned in comment #15 was addressed in comment #6 and is a known issue. The error was added so the issue isn't ignored.

Note You need to log in before you can comment on or make changes to this bug.