Autophone - 2015-05-08 Throbber stop regression on fx-team

RESOLVED FIXED in Firefox 41

Status

()

RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: bc, Assigned: benjamin)

Tracking

(Blocks: 1 bug, {regression})

40 Branch
Firefox 41
regression
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox40 affected, firefox41 fixed, fennec40+)

Details

Attachments

(1 attachment, 1 obsolete attachment)

(Assignee)

Comment 1

4 years ago
What does this test measure?
Flags: needinfo?(bob)
(Reporter)

Comment 2

4 years ago
mfinkle has a good description in https://wiki.mozilla.org/Auto-tools/Projects/Autophone/Autophone_for_developers#S1S2_Test
Flags: needinfo?(bob)
Blocks: 1150582
(Assignee)

Comment 3

4 years ago
From reading the description, I think that this is supposed to be a pageload test and unrelated to startup. Is that correct? The difference that bug 1140132 may have caused is that we load the addon manager earlier in the startup process than we did previously. But that is an affect that shouldn't affect pageload performance, so I'm wondering if/how the test waits for startup to complete.
(Reporter)

Comment 4

4 years ago
The test page is loaded and once the load event is fired, the quitter extension is used to shut down fennec. The time from the Start proc message to the time the Throbber stop message is emitted in the logcat is displayed Throbber stop time.

From mfinkle's comment:
<quote>
"Throbber Start and Throbber Stop should map to the points where Gecko nsIWebProgressListener fires START|NETWORK and STOP|NETWORK notifications, respectively. In the UI we use those to control the visibility of the "page progress" indicator. It used to be a throbber(spinner) but is now a simple progress line.

"The time between Throbber Start and Throbber Stop is a combination of Gecko networking and page parsing & loading, and rendering. We also have some Java UI affects too.
</quote>

This regression was confirmed by its appearance on the other repos as it was merged.
(Assignee)

Comment 5

4 years ago
I got the finkle quote. What are the details of "once the load event is fired"?

mfinkle, does fennec use the toolkit addon manager code at all?
Flags: needinfo?(mark.finkle)
(Reporter)

Comment 6

4 years ago
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #5)
> I got the finkle quote. What are the details of "once the load event is
> fired"?

<script>
function quit()
{
  var evt = document.createEvent('Events');
  evt.initEvent('please-quit', true, false);
  document.dispatchEvent(evt);
}
window.addEventListener("load", function () { setTimeout(quit, 1000); }, false);
window.onerror = quit;
</script>
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #5)
> I got the finkle quote. What are the details of "once the load event is
> fired"?
> 
> mfinkle, does fennec use the toolkit addon manager code at all?

Yes. Fennec uses the Toolkit Add-on Manager.
Flags: needinfo?(mark.finkle)
Throbber Stop includes startup and pageload. Throbber Start includes startup. The startup is almost 100% Java related code. Code that happens during Gecko's startup can appear after Throbber Start.

The range is fairly small. It should be simple to backout any suspects and see how the graph is affected.
Sounds like bug 1140132 causes an early initialization of NSS, which we have tried to avoid on mobile, and could explain the regression.
tracking-fennec: ? → 40+
Benjamin, looks like your bug 1140132 caused this
Assignee: nobody → benjamin
(Assignee)

Comment 11

4 years ago
Changing the order of NSS initialization is almost certainly what's going on here. This happens via the extension-manager calling into nsICryptoHash, probably here:

http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/extensions/internal/PluginProvider.jsm#35

Do the test machines have Flash installed? This hash would only be calculated if Flash is installed.

I can think of the following solutions, in rought order of preference:

* Because there's only ever Flash on Android, don't use crypto hashing for that plugin
* Make it so nsICryptoHash doesn't have to load the certificate DBs (doing main-thread IO)
* Implement nsICryptoHash on android in terms of Java instead of NSS
* Delay plugin data collection on Android
Status: NEW → ASSIGNED
Flags: needinfo?(blassey.bugs)
(Reporter)

Comment 12

4 years ago
Flash is not installed on these devices.
I think Bob answered the question you NI'd me for
Flags: needinfo?(blassey.bugs)
(Assignee)

Comment 14

4 years ago
If I just build Fennec myself and break at the first call to EnsureNSSInitialized, this is the JS stack that triggers it:

$1 = 0x920cb000 "0 LoginManagerCrypto_SDR.prototype.init() [\"jar:jar:file:///data/app/org.mozilla.fennec_bsmedberg-1/base.apk!/assets/omni.ja!/components/crypto-SDR.js\":79]\n    this = [object Object]\n1 LoginManagerCry"...
(gdb) printf "%s", $s
0 LoginManagerCrypto_SDR.prototype.init() ["jar:jar:file:///data/app/org.mozilla.fennec_bsmedberg-1/base.apk!/assets/omni.ja!/components/crypto-SDR.js":79]
1 LoginManagerCrypto_SDR() ["jar:jar:file:///data/app/org.mozilla.fennec_bsmedberg-1/base.apk!/assets/omni.ja!/components/crypto-SDR.js":14]
2 XPCOMUtils__getFactory/factory.createInstance(outer = null, iid = {2030770e-542e-40cd-8061-cd9d4ad4227f}) ["resource://gre/modules/XPCOMUtils.jsm":302]
3 LoginManagerStorage_mozStorage.prototype._crypto() ["jar:jar:file:///data/app/org.mozilla.fennec_bsmedberg-1/base.apk!/assets/omni.ja!/components/storage-mozStorage.js":75]
4 LoginManagerStorage_mozStorage.prototype.initialize() ["jar:jar:file:///data/app/org.mozilla.fennec_bsmedberg-1/base.apk!/assets/omni.ja!/components/storage-mozStorage.js":202]
5 LoginManager.prototype._initStorage() ["jar:jar:file:///data/app/org.mozilla.fennec_bsmedberg-1/base.apk!/assets/omni.ja!/components/nsLoginManager.js":154]
6 LoginManager.prototype.init() ["jar:jar:file:///data/app/org.mozilla.fennec_bsmedberg-1/base.apk!/assets/omni.ja!/components/nsLoginManager.js":131]
7 LoginManager() ["jar:jar:file:///data/app/org.mozilla.fennec_bsmedberg-1/base.apk!/assets/omni.ja!/components/nsLoginManager.js":55]
8 XPCOMUtils__getFactory/factory.createInstance(outer = null, iid = {38c7f6af-7df9-49c7-b558-2776b24e6cc1}) ["resource://gre/modules/XPCOMUtils.jsm":302]
9 startup/BrowserApp_delayedStartup/<() ["chrome://browser/content/browser.js":381]

So either my hypothesis is incorrect or the autophone test is using a different configuration than I would.

Bob, if I provided a patch which dumped the JS stack to logcat when NSS is initialized, how hard would it be to get that running in the autophone environment and get the logcat output?
Flags: needinfo?(bob)
(Reporter)

Comment 15

4 years ago
Not hard. You can do it your self via a try build. Use try: -b o -p android-api-9,android-api-11 -u autophone-s1s2 -t none as the check in comment. The logcat will be available in the Job Info panel on Treeherder and the measurements will be available on phonedash.mozilla.org (select try only).
Flags: needinfo?(bob)
(Assignee)

Comment 16

4 years ago
bc is my hero.

The JS stack initializing NSS as a result of this patch is:
05-20 10:34:24.443 I/Gecko   (  403): InitializeNSS
05-20 10:34:24.482 I/Gecko   (  403): 0 loadManifestFromRDF(aUri = [xpconnect wrapped nsIURI], aStream = [xpconnect wrapped (nsISupports, nsIInputStream, nsIBufferedInputStream, nsISeekableStream, nsIStreamBufferAccess)]) ["resource://gre/modules/addons/XPIProvider.jsm":1012]
05-20 10:34:24.482 I/Gecko   (  403): 1 loadManifestFromDir() ["resource://gre/modules/addons/XPIProvider.jsm":1063]
05-20 10:34:24.482 I/Gecko   (  403): 2 next(val = undefined) ["self-hosted":624]
05-20 10:34:24.482 I/Gecko   (  403):     this = [object Generator]
05-20 10:34:24.482 I/Gecko   (  403): 3 TaskImpl_run(aSendResolved = true) ["resource://gre/modules/Task.jsm":314]
05-20 10:34:24.482 I/Gecko   (  403):     this = [object Object]
05-20 10:34:24.482 I/Gecko   (  403): 4 TaskImpl(iterator = [object Generator]) ["resource://gre/modules/Task.jsm":275]
05-20 10:34:24.482 I/Gecko   (  403):     this = [object Object]
05-20 10:34:24.482 I/Gecko   (  403): 5 createAsyncFunction/asyncFunction([xpconnect wrapped nsIFile]) ["resource://gre/modules/Task.jsm":249]
05-20 10:34:24.482 I/Gecko   (  403): 6 loadManifestFromFile(aFile = [xpconnect wrapped nsIFile]) ["resource://gre/modules/addons/XPIProvider.jsm":1158]
05-20 10:34:24.482 I/Gecko   (  403): 7 syncLoadManifestFromFile(aFile = [xpconnect wrapped nsIFile]) ["resource://gre/modules/addons/XPIProvider.jsm":1169]
05-20 10:34:24.482 I/Gecko   (  403): 8 XPI_processPendingFileChanges/<(aLocation = [object Object], 0, [object Objec

Which is where we create a random GUID for sync here: http://hg.mozilla.org/mozilla-central/annotate/8d8df22fe72d/toolkit/mozapps/extensions/internal/XPIProvider.jsm#l1012

I'm not sure what this ID is for or whether it needs strong cryptography. Could we use Math.random for this instead of NSS? If not, could we stub nsIRandomGenerator in Fennec using Java instead of initializing NSS?
(Assignee)

Comment 17

4 years ago
Created attachment 8608821 [details] [diff] [review]
1163049-faster-random
Attachment #8608821 - Flags: review?(rnewman)
Attachment #8608821 - Flags: review?(MattN+bmo)
Comment on attachment 8608821 [details] [diff] [review]
1163049-faster-random

Review of attachment 8608821 [details] [diff] [review]:
-----------------------------------------------------------------

My first response to this was: why not make AddonInternal.guid lazy, and avoid the need to do any work at all at this stage?

If that's a new random GUID, we don't need it until we sync. (It might need to be present sooner to avoid changing a bunch of code, of course.)

Did you consider doing that, Benjamin?

::: toolkit/mozapps/extensions/internal/XPIProvider.jsm
@@ +1008,5 @@
>    let storage = Services.storage;
>  
>    // Generate random GUID used for Sync.
> +  // This does NOT use cryptographically secure algorithms because it is the
> +  // startup path of Fennec and loading NSS causes a startup regression.

Pointer to the bug for further discussion?

@@ +1014,1 @@
>    // This was lifted from util.js:makeGUID() from services-sync.

This comment is no longer correct.

@@ +1014,4 @@
>    // This was lifted from util.js:makeGUID() from services-sync.
> +  function* range(i) { for (let j = 0; j < i; ++j) { yield j; } }
> +  let byte_string = [String.fromCharCode(Math.floor(Math.random() * 0x100))
> +                     for (i of range(9))].join("");

Out of curiosity, do we optimize JS generators enough that this beats

let bytes = [];
for (i = 0; i < 9; ++i) {
  bytes.push(Math.floor(Math.random() * 0x100))
}
let byte_string = ...

or some other phrasing?
Attachment #8608821 - Flags: review?(rnewman) → review+
Comment on attachment 8608821 [details] [diff] [review]
1163049-faster-random

Review of attachment 8608821 [details] [diff] [review]:
-----------------------------------------------------------------

I don't really have anything new to add on top of what rnewman said but making syncGUID a lazy getter seems like a good solution assuming it's not accessed during startup.
Attachment #8608821 - Flags: review?(MattN+bmo)
(Assignee)

Comment 20

4 years ago
Created attachment 8609591 [details] [diff] [review]
1163049-lazy-guid

Lazy like this?

Here's the tryserver run that includes this patch:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=267662c3866d
logcat:http://mozilla-releng-blobs.s3.amazonaws.com/blobs/try/sha512/d94732cfc08cfa5f350367f81f62dc7563a30c8aee31a7e01a424b967748d30d9ecc1263ffa3ca34f4fe918d52ae27d8ef5a76f1307a25d36e3d25d533b06e67

And the NSS init comes from verifyDirSignedState which looks late enough to me.
Attachment #8609591 - Flags: review?(rnewman)
Comment on attachment 8609591 [details] [diff] [review]
1163049-lazy-guid

Review of attachment 8609591 [details] [diff] [review]:
-----------------------------------------------------------------

Yeah, that looks good to me. If add-on sync continues to work with this change, then great!
Attachment #8609591 - Flags: review?(rnewman) → review+
(Assignee)

Updated

4 years ago
Attachment #8608821 - Attachment is obsolete: true
https://hg.mozilla.org/mozilla-central/rev/622af20a746a
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
status-firefox41: --- → fixed
Resolution: --- → FIXED
Whiteboard: [fixed-in-fx-team]
Target Milestone: --- → Firefox 41
(Assignee)

Comment 26

4 years ago
bc, please verify that this patch actually fixes the autophone regression
Flags: needinfo?(bob)
Depends on: 1170981
(Assignee)

Updated

4 years ago
Blocks: 1194780
(Assignee)

Updated

4 years ago
No longer blocks: 1194780
You need to log in before you can comment on or make changes to this bug.