Closed Bug 841244 Opened 12 years ago Closed 12 years ago

FHR performance Telemetry part 2

Categories

(Firefox Health Report Graveyard :: Client: Desktop, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
Firefox 21

People

(Reporter: gps, Assigned: gps)

References

Details

Attachments

(2 files, 1 obsolete file)

+++ This bug was initially created as a clone of Bug #828720 +++ We should add some more probes: * Differentiate init time for 1st startup vs subsequent (i.e. isolate db creation) * Measure DB open time (main thread synchronous I/O) * Measure JSON.stringify time for main payload (synchronous op) * DB size * Collection times
+ Shutdown time(s)
It appears Storage already has some Telemetry probes for DB open on main thread (MOZ_SQLITE_OPEN_MS and MOZ_SQLITE_OPEN_MAIN_THREAD_MS). This encompasses FHR's DB open. Furthermore, measuring just FHR's time will be difficult since DB ops are abstracted away in Sqlite.jsm. We'd have to pass probe names or probe prefixes into Sqlite.jsm. I'm going to assert capturing FHR's explicit DB open() time is not worth our while.
I only have maybe 90 minutes to get r+ before I go on PTO. rnewman may have to guide this to completion.
Assignee: nobody → gps
Status: NEW → ASSIGNED
Attachment #714096 - Flags: review?(vdjeric)
Attachment #714096 - Flags: review?(rnewman)
Comment on attachment 714096 [details] [diff] [review] More Telemetry probes, v1 Review of attachment 714096 [details] [diff] [review]: ----------------------------------------------------------------- Fine by me if the telemetry experts are happy!
Attachment #714096 - Flags: review?(rnewman) → review+
Comment on attachment 714105 [details] [diff] [review] More telemetry probes (rebased onto s-c). v2 > // Yes, this will probably run concurrently with remaining constructor work. >- TelemetryStopwatch.start(TELEMETRY_INIT, this); >+ this._initHistogram = this._prefs.get("service.firstRun", false) ? >+ TELEMETRY_INIT : TELEMETRY_INIT_FIRSTRUN; >+ >+ TelemetryStopwatch.start(this._initHistogram, this); > > this._ensureDirectoryExists(this._stateDir) > .then(this._onStateDirCreated.bind(this), > this._onInitError.bind(this)); > } So these probes will still over-estimate the amount of time spent init'ing FHR? >- throw new Task.Result(asObject ? o : JSON.stringify(o)); >+ if (!asObject) { >+ TelemetryStopwatch.start(TELEMETRY_JSON_PAYLOAD_SERIALIZE, this); >+ o = JSON.stringify(o); >+ TelemetryStopwatch.finish(TELEMETRY_JSON_PAYLOAD_SERIALIZE, this); >+ } >+ >+ throw new Task.Result(o); > }, Doesn't some serialization happen earlier in this function? >diff --git a/toolkit/components/telemetry/Histograms.json b/toolkit/components/telemetry/Histograms.json >--- a/toolkit/components/telemetry/Histograms.json >+++ b/toolkit/components/telemetry/Histograms.json Histogram definitions look reasonable
Attachment #714105 - Flags: review+
Comment on attachment 714096 [details] [diff] [review] More Telemetry probes, v1 Oops, commented on the wrong version of the attachment & ended up clearing renwman's r+
Attachment #714096 - Flags: review?(vdjeric)
(In reply to Vladan Djeric (:vladan) from comment #6) > Comment on attachment 714105 [details] [diff] [review] > More telemetry probes (rebased onto s-c). v2 > > > // Yes, this will probably run concurrently with remaining constructor work. > >- TelemetryStopwatch.start(TELEMETRY_INIT, this); > >+ this._initHistogram = this._prefs.get("service.firstRun", false) ? > >+ TELEMETRY_INIT : TELEMETRY_INIT_FIRSTRUN; > >+ > >+ TelemetryStopwatch.start(this._initHistogram, this); > > > > this._ensureDirectoryExists(this._stateDir) > > .then(this._onStateDirCreated.bind(this), > > this._onInitError.bind(this)); > > } > > So these probes will still over-estimate the amount of time spent init'ing > FHR? Define "init'ing FHR." I've defined it as "the time spent ensuring the state directory exists, the database exists and is populated, and the providers and collector are initialized." If you want me to go more granular, we can do that. > > >- throw new Task.Result(asObject ? o : JSON.stringify(o)); > >+ if (!asObject) { > >+ TelemetryStopwatch.start(TELEMETRY_JSON_PAYLOAD_SERIALIZE, this); > >+ o = JSON.stringify(o); > >+ TelemetryStopwatch.finish(TELEMETRY_JSON_PAYLOAD_SERIALIZE, this); > >+ } > >+ > >+ throw new Task.Result(o); > > }, > > Doesn't some serialization happen earlier in this function? The bulk of the function is grabbing data from the database then asking each measurement to formulate that data into a JS object {} which will later be fed into JSON.stringify(). We could probably move the actual JSON.stringify() outside of _getJSONPayload(). Meh.
(In reply to Gregory Szorc [:gps] from comment #8) > Define "init'ing FHR." I've defined it as "the time spent ensuring the state > directory exists, the database exists and is populated, and the providers > and collector are initialized." If you want me to go more granular, we can > do that. Actually I'm asking if it's possible for unrelated events to be processed between TelemetryStopwatch.start(TELEMETRY_INIT) and TelemetryStopwatch.finish(TELEMETRY_INIT)? > The bulk of the function is grabbing data from the database then asking each > measurement to formulate that data into a JS object {} which will later be > fed into JSON.stringify(). So is it possible that a significant amount of data would be deserialized (or serialized) in code that is not timed by this stopwatch?
(In reply to Vladan Djeric (:vladan) from comment #9) > (In reply to Gregory Szorc [:gps] from comment #8) > > Define "init'ing FHR." I've defined it as "the time spent ensuring the state > > directory exists, the database exists and is populated, and the providers > > and collector are initialized." If you want me to go more granular, we can > > do that. > > Actually I'm asking if it's possible for unrelated events to be processed > between TelemetryStopwatch.start(TELEMETRY_INIT) and > TelemetryStopwatch.finish(TELEMETRY_INIT)? If by "unrelated" you mean things not part of FHR, yes. There's really no good way to time async things, sadly. And most of what we do is async. > > The bulk of the function is grabbing data from the database then asking each > > measurement to formulate that data into a JS object {} which will later be > > fed into JSON.stringify(). > > So is it possible that a significant amount of data would be deserialized > (or serialized) in code that is not timed by this stopwatch? There is a place in about:healthreport that serializes the entire payload to a <div>. But I don't expect this to be called that often. This probe is the main location where the payload is serialized to JSON.
(In reply to Gregory Szorc [:gps] from comment #10) > If by "unrelated" you mean things not part of FHR, yes. There's really no > good way to time async things, sadly. And most of what we do is async. :( Is it possible to add up the time spent on main thread doing each "heavy" component of initilaization: Tcheck-dir + Tdb-exists + Tdb-populated + Tproviders-and-collectors-init > There is a place in about:healthreport that serializes the entire payload to > a <div>. But I don't expect this to be called that often. This probe is the > main location where the payload is serialized to JSON. Ok. I was also wondering about the de-serialization of DB strings into JSObjects.. is there a way to capture how long that takes?
(In reply to Vladan Djeric (:vladan) from comment #11) > Is it possible to add up the time spent on main thread doing each "heavy" > component of initilaization: Tcheck-dir + Tdb-exists + Tdb-populated + > Tproviders-and-collectors-init Not really. These operations mostly perform async behavior. They are calling out to worker threads and there is not much beside function call overhead on the main thread. Between DB population and collector init, we probably call out to other threads a few dozen times. There is no easy way to measure time on main thread. The best approximation we have is wall time of the aggregate operations. > > There is a place in about:healthreport that serializes the entire payload to > > a <div>. But I don't expect this to be called that often. This probe is the > > main location where the payload is serialized to JSON. > > Ok. I was also wondering about the de-serialization of DB strings into > JSObjects.. is there a way to capture how long that takes? Not really. This happens over a few dozen calls to the same functions. We'd need to time each then sum them.
So it sounds like it would be difficult to get a more precise evaluation of the impact of FHR operations on the main thread because there are too many async sub-operations. I guess we'll have to make do with these measures although they are essentially a pessimistic upper-bound :( If we notice reports of long delays in the field using these measures, we'll have to investigate & instrument further. Unfortunately, we won't know for sure until after the DB has been used for a couple of months. - Can we add 2 new measurements for just opening the DB on first run of FHR vs subsequent runs? - If you have a bit of time, you can fill up your local FHR DB with >6 months of data and then run Benoit Girard's profiler to find out how much time gets spent executing FHR JS on the main thread during various FHR operations: https://addons.mozilla.org/en-us/firefox/addon/gecko-profiler/ If you have an older Windows machine with a mechanical HDD lying around, that might be the ideal test candidate. You should also probably restart your machine to clear your disk cache before running the tests. I'm on PTO/weekend/holiday the next 4 days, but I'll be answering e-mails, doing reviews & visiting IRC periodically so you can ping me for help on setting this up.
This patch adds additional histograms, as requested. Please verify! (I also fixed the description for the other new probe added in the previous patch.) I don't have a slow machine nearby that's set up for dev, and no Windows machine at all (yet). Furthermore, Greg is away tomorrow and this weekend, and I have limited availability. Can we work on that extended next week?
Attachment #714105 - Attachment is obsolete: true
Attachment #714255 - Flags: review?(vdjeric)
Testing this locally, the first time it opened my DB the browser was busy restoring sessions, throwing stuff into error console, etc, and I got 700msec for DB open, which I'm sure is mostly wall clock. The next time startup was much smoother, and I got 15msec. Background work makes timing really hard!
Comment on attachment 714255 [details] [diff] [review] More telemetry probes (rebased onto s-c). v3 Giving this to gps for review.
Attachment #714255 - Flags: review?(vdjeric) → review?(gps)
Comment on attachment 714255 [details] [diff] [review] More telemetry probes (rebased onto s-c). v3 Review of attachment 714255 [details] [diff] [review]: ----------------------------------------------------------------- Vladan is on PTO IIRC. I'm pretty sure the only part he needed to review was the Histograms.json change. Since the delta from his comment effectively r+'ing that part is trivial, I'm confident we can check this in without compromising the review process. ::: toolkit/components/telemetry/Histograms.json @@ +2625,5 @@ > + "HEALTHREPORT_DB_OPEN_FIRSTRUN_MS": { > + "kind": "exponential", > + "high": "20000", > + "n_buckets": 15, > + "description": "Time (ms) spent to open Firefox Health Report's database the first time." Is it worth calling out that this not only opens the raw DB connection (main thread I/O) but also sets up schema and type information?
Attachment #714255 - Flags: review?(gps) → review+
Tweaked descriptions, rebased onto current histogram file. https://hg.mozilla.org/integration/mozilla-inbound/rev/6e0f106fd859
Target Milestone: --- → mozilla21
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Blocks: 845127
Component: Metrics and Firefox Health Report → Client: Desktop
Product: Mozilla Services → Firefox Health Report
Target Milestone: mozilla21 → Firefox 21
Product: Firefox Health Report → Firefox Health Report Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: