959652 - BrowserHealthRecorder.ensureEnvironment can be expensive during startup

Reporter

Description

•

12 years ago

BrowserHealthRecorder.ensureEnvironment shows up in my GalaxyNexus startup profile, accounting for ~13% (~350ms). I did not see it in the Nexus S startup profile though. I don't know offhand if the code takes different paths based on Android OS version. profile: http://people.mozilla.org/~mfinkle/fennec/profiles/galaxynexus-home-startup.trace

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 1

•

12 years ago

To follow the trace further: [350.922s] BrowserHealthRecorder.ensureEnvironment [350.892s] EnvironmentBuilder.registerCurrentEnvironment [339.447s] HealthReportDatabaseStorage$DatabaseEnvironment.register [318.359s] EnvironmentV1.getHash [309.051s] EnviromentV1.getAppender [309.051s] Class.newInstance I'll take a look.

Assignee: nobody → michael.l.comella

Status: NEW → ASSIGNED

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 2

•

12 years ago

Oh. comment 1 continued: [309.051s] Class.newInstance [308.991s] Class.newInstanceImpl [308.991s] EnvironmentV1$HashAppender.init [308.838s] MessageDigest.getInstance

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 3

•

12 years ago

This is unfortunate: [308.838s] MessageDigest.getInstance [308.838s] fortress.Engine.getInstance [308.197s] fortress.Services.getCacheVersion [308.014s] forgtress.Services.initServiceInfo (called 6 times) [198.484s] String.toUpperCase (called 544 times) :(

Mark Finkle (:mfinkle) (use needinfo?)

Reporter

Updated

•

12 years ago

Blocks: 959776

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 4

•

12 years ago

(In reply to Mark Finkle (:mfinkle) from comment #0) > I did not see it in the Nexus S startup profile though. I don't know offhand > if the code takes different paths based on Android OS version. Skimming the code, it does not appear our code does but I'm curious whether the Android OS code/libraries has changed (see trace). I found the trace in [1]: [29.971s] BHR.ensureEnvironment ... [1.074s] Fortress.Engine.getInstance (Notably, fortress.Services.getCacheVersion is missing...) Finkle, what Android versions are you GNex and NexS? Also, are either of these traces taken on profiles that already contain data and if so, approximately how much (e.g. this is my primary profile, I just took these traces, etc.)? [1]: https://people.mozilla.org/~mfinkle/fennec/profiles/nexuss-cnn-startup.trace

Flags: needinfo?(mark.finkle)

Mark Finkle (:mfinkle) (use needinfo?)

Reporter

Comment 5

•

12 years ago

Nexus S has Android 2.3.7 Galaxy Nexus has Android 4.3 Both builds have small profiles. They are not my main phones. Just a week or two of pageloads and the standard bookmarks.

Flags: needinfo?(mark.finkle)

Mark Finkle (:mfinkle) (use needinfo?)

Reporter

Comment 6

•

12 years ago

(In reply to Michael Comella (:mcomella) from comment #4) > Skimming the code, it does not appear our code does but I'm curious whether > the Android OS code/libraries has changed (see trace). I found the trace in > [1]: > > [29.971s] BHR.ensureEnvironment > ... > [1.074s] Fortress.Engine.getInstance > (Notably, fortress.Services.getCacheVersion is missing...) It looks like the Engine.java and Services.java code will load the crypto providers statically (way before Firefox runs) in Gingerbread, but will force a reload of the providers in Jellybean. Gingerbread: http://androidxref.com/2.3.7/xref/libcore/luni/src/main/java/org/apache/harmony/security/fortress/Engine.java#82 http://androidxref.com/2.3.7/xref/libcore/luni/src/main/java/org/apache/harmony/security/fortress/Services.java#70 Jellybean: http://androidxref.com/4.3_r2.1/xref/libcore/luni/src/main/java/org/apache/harmony/security/fortress/Engine.java#132 http://androidxref.com/4.3_r2.1/xref/libcore/luni/src/main/java/org/apache/harmony/security/fortress/Services.java#203

Mark Finkle (:mfinkle) (use needinfo?)

Reporter

Comment 7

•

12 years ago

The dumb question: Do we need to hash this data?

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 8

•

12 years ago

(In reply to Mark Finkle (:mfinkle) from comment #7) > The dumb question: Do we need to hash this data? Strictly speaking, no, though for performance, we probaby want it. Currently, the hash is used to uniquely identify a device environment. This helps with performance when recording the current environment where SQL will throw if we try to insert a non-unique hash value, telling us that environment already exists. SQLite does this performantly through an implicit index for unique values (i.e. the hash) [1]. If we remove the hash, we'd need to compare every data point (i.e. db column) of our environment (28 values?) in a table walk (we have no indices) in order to identify if the current environment already exists and we'd have to run this "exists?" query before every environment insertion. One instance of environment insertion occurs in ensureEnvironment, which will probably make our issue here worse (and it's likely called elsewhere). Additionally, the hash is also used in the FHR document schema v3 [2], which may be implemented by others (though it wasn't yet when I checked several months ago) and thus it may be more resistant to change. If we can figure out an alternative way of creating a short unique identifier without having to look it up in the DB, we could use that instead (though nothing quickly comes to mind). ...But whether or not we need to hash this data on startup? Still looking into that one. :) Needinfo rnewman to double-check my conclusions. [1]: http://stackoverflow.com/a/19358087 [2]: https://docs.services.mozilla.com/healthreport/index.html#version-3

Flags: needinfo?(rnewman)

Richard Newman [:rnewman]

Comment 9

•

12 years ago

History for this design: Bug 863440. Hashing here serves two important purposes, and one less important but still valuable purpose: * Collapsing down a large environment into a small identifier * Ensuring that two identical environments on the same machine will result in the same identifier, even across crashes and data loss * Ensuring that two identical environments on different machines will result in the same identifier. Furthermore, yes, it's resistant to change. Tooling is built around this format. (I could have sworn we already had a bug open for this…)

Flags: needinfo?(rnewman)

Richard Newman [:rnewman]

Comment 10

•

12 years ago

ckitching spotted this during his internship, btw. Chris, did you file a bug for this? One thing we might want to do: use native crypto to do the hashing. We're shipping native crypto code anyway, and it'll totally avoid initializing the Java crypto providers.

Chris Kitching [:ckitching]

Comment 11

•

12 years ago

(In reply to Richard Newman [:rnewman] from comment #10) > ckitching spotted this during his internship, btw. Chris, did you file a bug > for this? Yup. Bug 905296. It's that one you needinfo'd me on approximately a thousand years ago and which I never got back to you about. I can't recall much of the context, and have no time to go into it at this point, so can't really give you a sensible answer to your question.

Richard Newman [:rnewman]

Comment 13

•

12 years ago

This is a great motivation for using native crypto for FxA, too.

Chris Kitching [:ckitching]

Comment 14

•

12 years ago

Sudden thought. Do you really need a *cryptographic* hash for this? Cryptographic hashes are many, many (read: thousands-of) times slower than non-cryptographic hashes (this is a desirable property of cryptographic hashes, actually) Hashing sounds sensible for the reasons Richard gave in Comment 10 - but all of those things are satisfied by any hashing algorithm that provides collision resistance (such as xxHash - a very high performance non-cryptographic hash library for Java). Do you actually care about preimage resistance? Seems like that might be an alternative solution to both this particular problem, with a shiny, general performance boost to boot. (Although you'd need to keep around extra logic to migrate stuff from one hashing scheme to the next if you did implement this. Still - you could initialise the expensive crypto library on demand in that case).

Richard Newman [:rnewman]

Comment 15

•

12 years ago

See Bug 905296 Comment 1. Migration isn't much of a problem: we'd just need to re-hash each environment. Because FHR documents aren't persisted forever on the server, we can upload another with entirely different contents without fear. That said, a change in scheme would have analytics consequences (not to mention dev costs), so it's not free.

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 16

•

12 years ago

In conclusion, I will attempt to write a patch using native crypto and benchmark the results. :)

Richard Newman [:rnewman]

Comment 17

•

12 years ago

See Bug 915312, or use NSS.

Brian Smith (:briansmith, :bsmith, use NEEDINFO?)

Comment 18

•

12 years ago

(In reply to Mark Finkle (:mfinkle) from comment #6) > It looks like the Engine.java and Services.java code will load the crypto > providers statically (way before Firefox runs) in Gingerbread, but will > force a reload of the providers in Jellybean. IIUC, this means that there isn't a regression here, but rather that Android has moved this initialization has moved from a place where we can't do anything about it to a place where we can probably do something to improve it. That's a good thing. I am curious if avoiding the use of the crypto providers completely (so they would never be loaded on Jellybean) would significantly reduce memory usage. If so, perhaps that is worth pursuing. If you want to use NSS, but you don't want to pay the full NSS initialization cost, you can try using the NSSLOWHASH API in nsslowhash.h. Alternatively, there's mfbt/SHA1.[ch].

Richard Newman [:rnewman]

Updated

•

12 years ago

Blocks: 949054

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 19

•

12 years ago

Note: bug 967254 might mitigate this, but perhaps it's still worth pursuing for the issues in comment 18.

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 20

•

11 years ago

After discussions on IRC, it seems the consensus is to add SHA-1 to bug 915312 (native crypto) so I'll be taking a stab at moving that forward.

Depends on: 915312

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 21

•

11 years ago

Attached file Patch (github) — Details

Part 1. Tests to come in part 2. Additionally, I noticed we use MessageDigest in other places in the sync code that will require the same initialization (just not at startup), including sha1 routines - should I take care of that in this bug, or file a followup? Traces were performed on Galaxy S 4 on a dirty profile (second run): * current impl: https://people.mozilla.org/~mcomella/trace/959652-01_old.trace * this patch: https://people.mozilla.org/~mcomella/trace/959652-02_new.trace Current impl (times are CPU time): [1504.820] BHR.ensureEnvironment [1504.769] EnvironmentBuilder.registerCurrentEnvironment [1463.870] HRDatabaseStorage$DatabaseEnvironment.register [1400.879] EnvironmentV1.getHash() New: [111.627] BHR.ensureEnvironment [111.591] EnvironmentBuilder.registerCurrentEnvironment [77.368] HRDatabaseStorage$DatabaseEnvironment.register [16.649] EnvironmentV1.getHash() In HRDatabaseStorage$DatabaseEnvironment.register, SQLiteDatabase.insertOrThrow now takes up the most time (25.833), getHash is second most. One last thought is that HashAppender's underlying implementation might be improved by keeping a LinkedList of String, and only creating the byte[] once we know the full size of the String, rather than having StringBuilder do this copying multiple times. However, this might neglect the CPU cache some more, so maybe not. I'll look into this a bit further when writing the tests.

Attachment #8387964 - Flags: review?(rnewman)

Richard Newman [:rnewman]

Comment 22

•

11 years ago

(In reply to Michael Comella (:mcomella) from comment #21) > Additionally, I noticed we use MessageDigest in other places in the sync > code that will require the same initialization (just not at startup), > including sha1 routines - should I take care of that in this bug, or file a > followup? Follow-up. > Current impl (times are CPU time): > [1504.820] BHR.ensureEnvironment > > New: > [111.627] BHR.ensureEnvironment Excellent! > One last thought is that HashAppender's underlying implementation might be > improved by keeping a LinkedList of String, and only creating the byte[] > once we know the full size of the String, rather than having StringBuilder > do this copying multiple times. However, this might neglect the CPU cache > some more, so maybe not. I'll look into this a bit further when writing the > tests. Also potentially troublesome when considering UTF-8 encoding. You can't assume that string length = buffer length.

Richard Newman [:rnewman]

Updated

•

11 years ago

Attachment #8387964 - Flags: review?(rnewman) → review+

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 23

•

11 years ago

(In reply to Richard Newman [:rnewman] from comment #22) > > Additionally, I noticed we use MessageDigest in other places in the sync > > code that will require the same initialization (just not at startup), > > including sha1 routines - should I take care of that in this bug, or file a > > followup? > > Follow-up. bug 981872.

Blocks: 981872

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 24

•

11 years ago

Comment on attachment 8387964 [details] [review] Patch (github) Part 2 WIP for tests. Specifically, testSHA1Hashing throws an assertion that I can't figure out - see the comment on the PR.

Attachment #8387964 - Flags: feedback?(rnewman)

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 25

•

11 years ago

Comment on attachment 8387964 [details] [review] Patch (github) Part 1: r+ Part 2: r? Improvements from comment 24: * Fixed the tests using what was discussed on IRC * Added a few more tests. I did not research more performant implementations as mentioned in comment 21, nor do I think it's worth the time at this juncture.

Attachment #8387964 - Flags: review?(rnewman)

Attachment #8387964 - Flags: review+

Attachment #8387964 - Flags: feedback?(rnewman)

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 26

•

11 years ago

Comment on attachment 8387964 [details] [review] Patch (github) r+ via github: https://github.com/mozilla-services/android-sync/pull/430#commitcomment-5640969

Attachment #8387964 - Flags: review?(rnewman) → review+

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 27

•

11 years ago

--startupTime: https://hg.mozilla.org/integration/fx-team/rev/9fe5610363b3

Phil Ringnalda (:philor)

Comment 28

•

11 years ago

Your push did fine on the tests on its builds, all of which were, coincidentally, clobbers. On the next push, https://tbpl.mozilla.org/?tree=Fx-Team&rev=bd0463063293, which did not get coincidental clobbers, everything failed. nalexander told me that it was pure coincidence that clobbered builds did perfectly fine and unclobbered builds failed. Backed out in https://hg.mozilla.org/integration/fx-team/rev/31f9a148b33c.

Nick Alexander :nalexander [he/him]

Comment 29

•

11 years ago

(In reply to Phil Ringnalda (:philor) from comment #28) > Your push did fine on the tests on its builds, all of which were, > coincidentally, clobbers. > > On the next push, https://tbpl.mozilla.org/?tree=Fx-Team&rev=bd0463063293, > which did not get coincidental clobbers, everything failed. > > nalexander told me that it was pure coincidence that clobbered builds did > perfectly fine and unclobbered builds failed. Yeah, I have a hard time understanding how this could be clobber-related. But everything else in my world right now appears to be, so why not weird native code + DB interactions?

Nick Alexander :nalexander [he/him]

Comment 30

•

11 years ago

So, this gets more interesting. I downloaded the offending build, and running it locally, I find: W dalvikvm(22386) threadid=10: thread exiting with uncaught exception (group=0x4194e898) E GeckoAppShell(22386) >>> REPORTING UNCAUGHT EXCEPTION FROM THREAD 14630 ("GeckoBackgroundThread") E GeckoAppShell(22386) android.database.sqlite.SQLiteConstraintException: column url is not unique (code 19) E GeckoAppShell(22386) at android.database.sqlite.SQLiteConnection.nativeExecuteForLastInsertedRowId(Native Method) E GeckoAppShell(22386) at android.database.sqlite.SQLiteConnection.executeForLastInsertedRowId(SQLiteConnection.java:973) E GeckoAppShell(22386) at android.database.sqlite.SQLiteSession.executeForLastInsertedRowId(SQLiteSession.java:788) E GeckoAppShell(22386) at android.database.sqlite.SQLiteStatement.executeInsert(SQLiteStatement.java:86) E GeckoAppShell(22386) at android.database.sqlite.SQLiteDatabase.insertWithOnConflict(SQLiteDatabase.java:1591) E GeckoAppShell(22386) at android.database.sqlite.SQLiteDatabase.insertOrThrow(SQLiteDatabase.java:1487) E GeckoAppShell(22386) at org.mozilla.gecko.db.BrowserDatabaseHelper.access$000$7aece89(BrowserDatabaseHelper.java:50) E GeckoAppShell(22386) at org.mozilla.gecko.db.BrowserDatabaseHelper$2.run(BrowserDatabaseHelper.java:921) E GeckoAppShell(22386) at android.os.Handler.handleCallback(Handler.java:730) E GeckoAppShell(22386) at android.os.Handler.dispatchMessage(Handler.java:92) E GeckoAppShell(22386) at android.os.Looper.loop(Looper.java:137) E GeckoAppShell(22386) at org.mozilla.gecko.util.GeckoBackgroundThread.run(GeckoBackgroundThread.java:32) I fingered that BrowserDatabaseHelper as the health recorder layer in Fennec, but now I see that's incorrect. So this is even more mysterious!

Richard Newman [:rnewman]

Comment 31

•

11 years ago

Definitely nothing to do with mcomella's patch. That stack implies that we're inserting the default favicons in a DB that already has them, or the default favicons don't have unique URIs.

Richard Newman [:rnewman]

Comment 32

•

11 years ago

Lots of scary things in that log, including: 03-11 21:36:01.805 D/GeckoJarReader( 1868): No Entry for Firefox: About your browser which implies that something other than a URL is being passed into GeckoJarReader.getStream.

Chris Kitching [:ckitching]

Comment 33

•

11 years ago

(In reply to Richard Newman [:rnewman] from comment #31) > Definitely nothing to do with mcomella's patch. That stack implies that > we're inserting the default favicons in a DB that already has them, or the > default favicons don't have unique URIs. Something that comes to mind - the default favicons don't actually provide a value for the favicons.url field in the database (Indeed, in my FF-25 DB they have null for that field). When the ContentValues are passed to BrowserDatabaseHelper.insertFavicon, the value faviconUrl will always be null. I see that since I last looked at this file someone has added logic to detect this case and use Favicons.guessDefaultFaviconUrl to prevent the row ending up as null. Might this potentially be quite problematic? I certainly wouldn't rely on guessDefaultFaviconUrl to be returning unique urls (Indeed, it *very much* is not going to under some conditions...) On a less on-topic node, why is this method duplicated in BrowserProvider? I guess there's a long and complicated reason why this was done, but it seems potentially problematic... (Not least because, despite the comment, the bodies of these methods aren't identical...)

Richard Newman [:rnewman]

Comment 34

•

11 years ago

See also Bug 962025, which is perhaps related. Duplication is something I had to deal with in Bug 947939. I blame Sola :P

Phil Ringnalda (:philor)

Comment 35

•

11 years ago

Whatever it was, it was indeed not this bug but something in https://hg.mozilla.org/mozilla-central/pushloghtml?startID=26390&endID=26391 which did require a clobber, once that was merged to mozilla-inbound where it caused a multi-hour closure.

Nick Alexander :nalexander [he/him]

Comment 36

•

11 years ago

(In reply to Richard Newman [:rnewman] from comment #31) > Definitely nothing to do with mcomella's patch. That stack implies that > we're inserting the default favicons in a DB that already has them, or the > default favicons don't have unique URIs. Sorry for fingering the wrong thing, mcomella.

Nick Alexander :nalexander [he/him]

Comment 37

•

11 years ago

(In reply to Phil Ringnalda (:philor) from comment #35) > Whatever it was, it was indeed not this bug but something in > https://hg.mozilla.org/mozilla-central/pushloghtml?startID=26390&endID=26391 > which did require a clobber, once that was merged to mozilla-inbound where > it caused a multi-hour closure. I'm starting to suspect I did something bad to Android resource dependencies as part of Bug 978587. My idle speculation is that liuche's strings only patch may have tickled something here. I have open tickets with issues to address in this area, will try to get to it today.

Nick Alexander :nalexander [he/him]

Comment 38

•

11 years ago

(In reply to Richard Newman [:rnewman] from comment #32) > Lots of scary things in that log, including: > > 03-11 21:36:01.805 D/GeckoJarReader( 1868): No Entry for Firefox: About your > browser > > which implies that something other than a URL is being passed into > GeckoJarReader.getStream. I agree this looks frightening, but digging into the *unbelievable* code at http://mxr.mozilla.org/mozilla-central/source/mobile/android/base/db/BrowserDatabaseHelper.java#888 shows that this is a speculative load. I'm digging in further.

Nick Alexander :nalexander [he/him]

Comment 39

•

11 years ago

Reland: https://hg.mozilla.org/integration/fx-team/rev/21ef91fbc1ac Let's hope the fixes in Bug 979388 (or fate) are sufficient to avoid the earlier issues!

Carsten Book [:Tomcat]

Comment 40

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/21ef91fbc1ac

Status: ASSIGNED → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Target Milestone: --- → Firefox 30

Nobody; OK to take it and work on it

Updated

•

5 years ago

Product: Firefox for Android → Firefox for Android Graveyard