Closed Bug 1163049 Opened 9 years ago Closed 9 years ago

Autophone - 2015-05-08 Throbber stop regression on fx-team

Categories

(Firefox for Android Graveyard :: General, defect)

40 Branch
defect
Not set
normal

Tracking

(firefox40 affected, firefox41 fixed, fennec40+)

RESOLVED FIXED
Firefox 41
Tracking Status
firefox40 --- affected
firefox41 --- fixed
fennec 40+ ---

People

(Reporter: bc, Assigned: benjamin)

References

(Blocks 1 open bug)

Details

(Keywords: regression)

Attachments

(1 file, 1 obsolete file)

What does this test measure?
Flags: needinfo?(bob)
From reading the description, I think that this is supposed to be a pageload test and unrelated to startup. Is that correct? The difference that bug 1140132 may have caused is that we load the addon manager earlier in the startup process than we did previously. But that is an affect that shouldn't affect pageload performance, so I'm wondering if/how the test waits for startup to complete.
The test page is loaded and once the load event is fired, the quitter extension is used to shut down fennec. The time from the Start proc message to the time the Throbber stop message is emitted in the logcat is displayed Throbber stop time.

From mfinkle's comment:
<quote>
"Throbber Start and Throbber Stop should map to the points where Gecko nsIWebProgressListener fires START|NETWORK and STOP|NETWORK notifications, respectively. In the UI we use those to control the visibility of the "page progress" indicator. It used to be a throbber(spinner) but is now a simple progress line.

"The time between Throbber Start and Throbber Stop is a combination of Gecko networking and page parsing & loading, and rendering. We also have some Java UI affects too.
</quote>

This regression was confirmed by its appearance on the other repos as it was merged.
I got the finkle quote. What are the details of "once the load event is fired"?

mfinkle, does fennec use the toolkit addon manager code at all?
Flags: needinfo?(mark.finkle)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #5)
> I got the finkle quote. What are the details of "once the load event is
> fired"?

<script>
function quit()
{
  var evt = document.createEvent('Events');
  evt.initEvent('please-quit', true, false);
  document.dispatchEvent(evt);
}
window.addEventListener("load", function () { setTimeout(quit, 1000); }, false);
window.onerror = quit;
</script>
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #5)
> I got the finkle quote. What are the details of "once the load event is
> fired"?
> 
> mfinkle, does fennec use the toolkit addon manager code at all?

Yes. Fennec uses the Toolkit Add-on Manager.
Flags: needinfo?(mark.finkle)
Throbber Stop includes startup and pageload. Throbber Start includes startup. The startup is almost 100% Java related code. Code that happens during Gecko's startup can appear after Throbber Start.

The range is fairly small. It should be simple to backout any suspects and see how the graph is affected.
Sounds like bug 1140132 causes an early initialization of NSS, which we have tried to avoid on mobile, and could explain the regression.
tracking-fennec: ? → 40+
Benjamin, looks like your bug 1140132 caused this
Assignee: nobody → benjamin
Changing the order of NSS initialization is almost certainly what's going on here. This happens via the extension-manager calling into nsICryptoHash, probably here:

http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/extensions/internal/PluginProvider.jsm#35

Do the test machines have Flash installed? This hash would only be calculated if Flash is installed.

I can think of the following solutions, in rought order of preference:

* Because there's only ever Flash on Android, don't use crypto hashing for that plugin
* Make it so nsICryptoHash doesn't have to load the certificate DBs (doing main-thread IO)
* Implement nsICryptoHash on android in terms of Java instead of NSS
* Delay plugin data collection on Android
Status: NEW → ASSIGNED
Flags: needinfo?(blassey.bugs)
Flash is not installed on these devices.
I think Bob answered the question you NI'd me for
Flags: needinfo?(blassey.bugs)
If I just build Fennec myself and break at the first call to EnsureNSSInitialized, this is the JS stack that triggers it:

$1 = 0x920cb000 "0 LoginManagerCrypto_SDR.prototype.init() [\"jar:jar:file:///data/app/org.mozilla.fennec_bsmedberg-1/base.apk!/assets/omni.ja!/components/crypto-SDR.js\":79]\n    this = [object Object]\n1 LoginManagerCry"...
(gdb) printf "%s", $s
0 LoginManagerCrypto_SDR.prototype.init() ["jar:jar:file:///data/app/org.mozilla.fennec_bsmedberg-1/base.apk!/assets/omni.ja!/components/crypto-SDR.js":79]
1 LoginManagerCrypto_SDR() ["jar:jar:file:///data/app/org.mozilla.fennec_bsmedberg-1/base.apk!/assets/omni.ja!/components/crypto-SDR.js":14]
2 XPCOMUtils__getFactory/factory.createInstance(outer = null, iid = {2030770e-542e-40cd-8061-cd9d4ad4227f}) ["resource://gre/modules/XPCOMUtils.jsm":302]
3 LoginManagerStorage_mozStorage.prototype._crypto() ["jar:jar:file:///data/app/org.mozilla.fennec_bsmedberg-1/base.apk!/assets/omni.ja!/components/storage-mozStorage.js":75]
4 LoginManagerStorage_mozStorage.prototype.initialize() ["jar:jar:file:///data/app/org.mozilla.fennec_bsmedberg-1/base.apk!/assets/omni.ja!/components/storage-mozStorage.js":202]
5 LoginManager.prototype._initStorage() ["jar:jar:file:///data/app/org.mozilla.fennec_bsmedberg-1/base.apk!/assets/omni.ja!/components/nsLoginManager.js":154]
6 LoginManager.prototype.init() ["jar:jar:file:///data/app/org.mozilla.fennec_bsmedberg-1/base.apk!/assets/omni.ja!/components/nsLoginManager.js":131]
7 LoginManager() ["jar:jar:file:///data/app/org.mozilla.fennec_bsmedberg-1/base.apk!/assets/omni.ja!/components/nsLoginManager.js":55]
8 XPCOMUtils__getFactory/factory.createInstance(outer = null, iid = {38c7f6af-7df9-49c7-b558-2776b24e6cc1}) ["resource://gre/modules/XPCOMUtils.jsm":302]
9 startup/BrowserApp_delayedStartup/<() ["chrome://browser/content/browser.js":381]

So either my hypothesis is incorrect or the autophone test is using a different configuration than I would.

Bob, if I provided a patch which dumped the JS stack to logcat when NSS is initialized, how hard would it be to get that running in the autophone environment and get the logcat output?
Flags: needinfo?(bob)
Not hard. You can do it your self via a try build. Use try: -b o -p android-api-9,android-api-11 -u autophone-s1s2 -t none as the check in comment. The logcat will be available in the Job Info panel on Treeherder and the measurements will be available on phonedash.mozilla.org (select try only).
Flags: needinfo?(bob)
bc is my hero.

The JS stack initializing NSS as a result of this patch is:
05-20 10:34:24.443 I/Gecko   (  403): InitializeNSS
05-20 10:34:24.482 I/Gecko   (  403): 0 loadManifestFromRDF(aUri = [xpconnect wrapped nsIURI], aStream = [xpconnect wrapped (nsISupports, nsIInputStream, nsIBufferedInputStream, nsISeekableStream, nsIStreamBufferAccess)]) ["resource://gre/modules/addons/XPIProvider.jsm":1012]
05-20 10:34:24.482 I/Gecko   (  403): 1 loadManifestFromDir() ["resource://gre/modules/addons/XPIProvider.jsm":1063]
05-20 10:34:24.482 I/Gecko   (  403): 2 next(val = undefined) ["self-hosted":624]
05-20 10:34:24.482 I/Gecko   (  403):     this = [object Generator]
05-20 10:34:24.482 I/Gecko   (  403): 3 TaskImpl_run(aSendResolved = true) ["resource://gre/modules/Task.jsm":314]
05-20 10:34:24.482 I/Gecko   (  403):     this = [object Object]
05-20 10:34:24.482 I/Gecko   (  403): 4 TaskImpl(iterator = [object Generator]) ["resource://gre/modules/Task.jsm":275]
05-20 10:34:24.482 I/Gecko   (  403):     this = [object Object]
05-20 10:34:24.482 I/Gecko   (  403): 5 createAsyncFunction/asyncFunction([xpconnect wrapped nsIFile]) ["resource://gre/modules/Task.jsm":249]
05-20 10:34:24.482 I/Gecko   (  403): 6 loadManifestFromFile(aFile = [xpconnect wrapped nsIFile]) ["resource://gre/modules/addons/XPIProvider.jsm":1158]
05-20 10:34:24.482 I/Gecko   (  403): 7 syncLoadManifestFromFile(aFile = [xpconnect wrapped nsIFile]) ["resource://gre/modules/addons/XPIProvider.jsm":1169]
05-20 10:34:24.482 I/Gecko   (  403): 8 XPI_processPendingFileChanges/<(aLocation = [object Object], 0, [object Objec

Which is where we create a random GUID for sync here: http://hg.mozilla.org/mozilla-central/annotate/8d8df22fe72d/toolkit/mozapps/extensions/internal/XPIProvider.jsm#l1012

I'm not sure what this ID is for or whether it needs strong cryptography. Could we use Math.random for this instead of NSS? If not, could we stub nsIRandomGenerator in Fennec using Java instead of initializing NSS?
Attached patch 1163049-faster-random (obsolete) — Splinter Review
Attachment #8608821 - Flags: review?(rnewman)
Attachment #8608821 - Flags: review?(MattN+bmo)
Comment on attachment 8608821 [details] [diff] [review]
1163049-faster-random

Review of attachment 8608821 [details] [diff] [review]:
-----------------------------------------------------------------

My first response to this was: why not make AddonInternal.guid lazy, and avoid the need to do any work at all at this stage?

If that's a new random GUID, we don't need it until we sync. (It might need to be present sooner to avoid changing a bunch of code, of course.)

Did you consider doing that, Benjamin?

::: toolkit/mozapps/extensions/internal/XPIProvider.jsm
@@ +1008,5 @@
>    let storage = Services.storage;
>  
>    // Generate random GUID used for Sync.
> +  // This does NOT use cryptographically secure algorithms because it is the
> +  // startup path of Fennec and loading NSS causes a startup regression.

Pointer to the bug for further discussion?

@@ +1014,1 @@
>    // This was lifted from util.js:makeGUID() from services-sync.

This comment is no longer correct.

@@ +1014,4 @@
>    // This was lifted from util.js:makeGUID() from services-sync.
> +  function* range(i) { for (let j = 0; j < i; ++j) { yield j; } }
> +  let byte_string = [String.fromCharCode(Math.floor(Math.random() * 0x100))
> +                     for (i of range(9))].join("");

Out of curiosity, do we optimize JS generators enough that this beats

let bytes = [];
for (i = 0; i < 9; ++i) {
  bytes.push(Math.floor(Math.random() * 0x100))
}
let byte_string = ...

or some other phrasing?
Attachment #8608821 - Flags: review?(rnewman) → review+
Comment on attachment 8608821 [details] [diff] [review]
1163049-faster-random

Review of attachment 8608821 [details] [diff] [review]:
-----------------------------------------------------------------

I don't really have anything new to add on top of what rnewman said but making syncGUID a lazy getter seems like a good solution assuming it's not accessed during startup.
Attachment #8608821 - Flags: review?(MattN+bmo)
Lazy like this?

Here's the tryserver run that includes this patch:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=267662c3866d
logcat:http://mozilla-releng-blobs.s3.amazonaws.com/blobs/try/sha512/d94732cfc08cfa5f350367f81f62dc7563a30c8aee31a7e01a424b967748d30d9ecc1263ffa3ca34f4fe918d52ae27d8ef5a76f1307a25d36e3d25d533b06e67

And the NSS init comes from verifyDirSignedState which looks late enough to me.
Attachment #8609591 - Flags: review?(rnewman)
Comment on attachment 8609591 [details] [diff] [review]
1163049-lazy-guid

Review of attachment 8609591 [details] [diff] [review]:
-----------------------------------------------------------------

Yeah, that looks good to me. If add-on sync continues to work with this change, then great!
Attachment #8609591 - Flags: review?(rnewman) → review+
Attachment #8608821 - Attachment is obsolete: true
https://hg.mozilla.org/mozilla-central/rev/622af20a746a
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Whiteboard: [fixed-in-fx-team]
Target Milestone: --- → Firefox 41
bc, please verify that this patch actually fixes the autophone regression
Flags: needinfo?(bob)
Depends on: 1170981
Blocks: 1194780
No longer blocks: 1194780
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: