Collect Telemetry stats on startup cache hits and misses
Categories
(Core :: XPConnect, defect, P2)
Tracking
()
| | Tracking | Status |
|---|---|---|
| firefox67 | --- | fixed |
People
(Reporter: ehsan.akhgari, Assigned: alexical)
References
Details
(Keywords: perf, Whiteboard: [fxperf:p1])
Attachments
(3 files)
Comment 17•6 years ago
It's not clear to me if this is still an issue. Reading through this bug again, Florian saw it once, and it may have been fixed by some of kmag's work.
It'd be great if we had information about cache hits and misses when profiling start-up.
Comment 18•6 years ago
We should do this sooner rather than later. We've definitely seen this in more recent performance profiles on the reference device. Mossop found one issue on Windows (bug 1519184) but without telemetry we have no idea what the size of the problem is.
Comment 19•6 years ago (Assignee)
Taking a look at this.
Comment 20•6 years ago (Assignee)
Do we know what we're looking for with this data? What are we expecting to see if there are no bugs? We can filter out in our analysis the cases where we know the cache is invalidated like app and extension updates, but do we know what we want to see after we've filtered out what we can, or are we just going to look at the hit/miss rates and say "hmm, does that look right?"
Comment 21•6 years ago
We're going to stop flushing the cache after extension updates. Except in the case of local builds, we really only want to see it flushed after app updates.
I'm not sure how much we want to filter even that, though, given that we've seen bugs where we trigger that code path when we don't want to...
Comment 22•6 years ago (Assignee)
In bug 1264235 we have some indication that observed bugs with the
startup cache might have been resolved, but we don't really know
until we collect data. Collecting these stats will give us the
ability to have more certainty that the startup cache is functioning
correctly in the wild.
Comment 23•6 years ago
In case it's relevant, we used to have some telemetry, added in bug 711297, but it seems to have been removed since then.
Comment 24•6 years ago (Assignee)
(Hopefully a needinfo is sufficient without review flags. Should https://wiki.mozilla.org/Firefox/Data_Collection#Requesting_Approval be updated?)
- What questions will you answer with this data?
  - To what extent are our caches improving startup performance?
  - How often do we see startup caches not aligning with our expectations? I.e., do we see any users who consistently encounter mostly misses with their startup caches (accounting for invalidations from app updates)?
  - Ongoing: have startup caches regressed in some way?
- Why does Mozilla need to answer these questions? Are there benefits for users? Do we need this information to address product or business requirements?
  - Establish baselines or measure changes in product or platform quality or performance.
  - We have a bug report (in this bug) indicating that this was at one time a problem. We want to try to assess if it is still a problem.
- What alternative methods did you consider to answer these questions? Why were they not sufficient?
  - Eyeballing the code / trying to reproduce cache problems ourselves. Neither of these can show conclusively that there's not a significant problem in the wild.
- Can current instrumentation answer these questions?
  No.
- List all proposed measurements and indicate the category of data collection for each measurement, using the Firefox data collection categories on the Mozilla wiki.
  - Number of hits / misses to the StartupCache
    - Category 1
    - Bug 1364235
  - Number of hits / misses to the ScriptPreloader
    - Category 1
    - Bug 1364235
  - Distribution of amount of time spent waiting for off-thread compiles in the ScriptPreloader
    - Category 1
    - Bug 1364235
  - How many times we ended up recompiling a script from the script preloader on the main thread.
    - Category 1
    - Bug 1364235
- How long will this data be collected? Choose one of the following:
  - I want this data to be collected for 6 months initially (potentially renewable).
- What populations will you measure?
  - Which release channels?
    All
  - Which countries?
    All
  - Which locales?
    All
  - Any other filters? Please describe in detail below.
- If this data collection is default on, what is the opt-out mechanism for users?
  The usual technical data opt-out in about:preferences
- Please provide a general description of how you will analyze this data.
  Viewing the histograms via TMO, and deeper analysis via SQL or Spark
- Where do you intend to share the results of your analysis?
  Likely in this bug or a follow-up.
- Is there a third-party tool (i.e. not Telemetry) that you are proposing to use for this data collection? If so:
  No.
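To make the proposed hit/miss measurements concrete, here is a minimal sketch of a cache that tallies where each lookup was served from. It is an illustration only, not the Firefox implementation: the `CountingCache` class and its fields are hypothetical, and the labels `HitMemory`, `HitDisk`, and `Miss` are borrowed from the StartupCache results reported later in this bug.

```python
from collections import Counter


class CountingCache:
    """Hypothetical two-tier cache that records the outcome of each lookup."""

    def __init__(self, disk_entries):
        self.memory = {}                 # entries already loaded this session
        self.disk = dict(disk_entries)   # stand-in for the on-disk cache file
        self.stats = Counter()           # one bucket per outcome label

    def get(self, key):
        if key in self.memory:
            self.stats["HitMemory"] += 1
            return self.memory[key]
        if key in self.disk:
            self.stats["HitDisk"] += 1
            # Promote to memory so a repeat lookup counts as HitMemory.
            self.memory[key] = self.disk[key]
            return self.memory[key]
        self.stats["Miss"] += 1
        return None


cache = CountingCache({"a.js": b"compiled-a"})
for key in ["a.js", "a.js", "b.js"]:
    cache.get(key)
print(dict(cache.stats))
```

Keeping one counter per outcome label, rather than a single hit/miss boolean, is what allows the later analysis to separate memory hits from disk hits.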
Comment 25•6 years ago
Preliminary Notes:
Sorry about that. I've updated the docs to specify that you should use the new data-review flag on attachments.
Also, your review mentions you'd like to collect on all channels, but your definitions will only record on pre-release channels. For the purposes of Data Review I'll assume you want to collect on all channels and will update the definitions accordingly.
DATA COLLECTION REVIEW RESPONSE:
Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?
Yes. This collection is Telemetry so is documented in its definitions files (Histograms.json and Scalars.yaml), the Probe Dictionary, and on telemetry.mozilla.org's Measurement Dashboards.
Is there a control mechanism that allows the user to turn the data collection on and off?
Yes. This collection is Telemetry so can be controlled through Firefox's Preferences.
If the request is for permanent data collection, is there someone who will monitor the data over time?
N/A this collection expires in Firefox 73.
Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?
Category 1, Technical.
Is the data collection request for default-on or default-off?
Default on for all channels.
Does the instrumentation include the addition of any new identifiers?
No.
Is the data collection covered by the existing Firefox privacy notice?
Yes.
Does there need to be a check-in in the future to determine whether to renew the data?
Yes. :dthayer is responsible for removing or renewing the collection before it expires in Firefox 73.
Result: datareview+
Comment 27•6 years ago
(In reply to Pulsebot from comment #26)
> Pushed by dothayer@mozilla.com:
> https://hg.mozilla.org/integration/autoland/rev/1a681dc60b60
> Collect telemetry stats on startup cache hits and misses r=kmag
Why was the Marionette-based startup test put under testing/marionette/../unit/? That folder is the wrong place, since it only contains the unit tests for Marionette itself. Can you please move it to the same location as the other files? /startup/tests/marionette would be great. You would only have to reference it in:
Comment 28•6 years ago (Assignee)
Ack - my mistake, sorry. I will move it shortly. Leaving the needinfo.
Comment 30•6 years ago (Assignee)
Preliminary analysis:
ScriptPreloader:
Hit 116977763
HitChild 2559619
Miss 160545632
StartupCache:
HitMemory 153288
HitDisk 60646842
Miss 9053288
This is just a straightforward analysis of the data on Nightly for cases where this isn't the first run after an update/install, which unfortunately on Nightly is relatively rare and might deviate quite a bit from the release situation.
Nothing in these numbers seems very surprising to me.
I think the next steps are to do an analysis to see if any profiles are somehow stuck in a pathological case where the hit/miss rates are consistently bad.
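A rough sketch of what that per-profile analysis could look like follows. Everything here is an assumption for illustration: the record fields (`client_id`, `hits`, `misses`, `first_run_after_update`), the `find_stuck_profiles` helper, and the thresholds are hypothetical and do not reflect the real Telemetry schema or the analysis that was actually run.

```python
from collections import defaultdict


def find_stuck_profiles(sessions, min_sessions=3, max_hit_rate=0.2):
    """Return client ids whose hit rate never exceeds max_hit_rate.

    Sessions that are the first run after an update/install are skipped,
    because the cache is expected to be invalidated there.
    """
    rates_by_client = defaultdict(list)
    for session in sessions:
        if session["first_run_after_update"]:
            continue
        total = session["hits"] + session["misses"]
        if total == 0:
            continue  # no lookups recorded, nothing to judge
        rates_by_client[session["client_id"]].append(session["hits"] / total)

    return sorted(
        client_id
        for client_id, rates in rates_by_client.items()
        # Require several sessions so one bad startup does not flag a profile.
        if len(rates) >= min_sessions and max(rates) <= max_hit_rate
    )


sessions = [
    {"client_id": "A", "hits": 90, "misses": 10, "first_run_after_update": False},
    {"client_id": "A", "hits": 5, "misses": 95, "first_run_after_update": True},
    {"client_id": "A", "hits": 88, "misses": 12, "first_run_after_update": False},
    {"client_id": "A", "hits": 93, "misses": 7, "first_run_after_update": False},
    {"client_id": "B", "hits": 2, "misses": 98, "first_run_after_update": False},
    {"client_id": "B", "hits": 0, "misses": 100, "first_run_after_update": False},
    {"client_id": "B", "hits": 10, "misses": 90, "first_run_after_update": False},
    {"client_id": "C", "hits": 1, "misses": 99, "first_run_after_update": False},
]
stuck = find_stuck_profiles(sessions)
print(stuck)
```

In this made-up sample only profile B is flagged: A is healthy once its post-update session is excluded, and C has too few sessions to call its miss rate consistent.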
Comment 31•6 years ago (Assignee)
Numbers on Beta look qualitatively similar:
ScriptPreloader:
Hit 1060675539 47%
HitChild 26926693 01%
Miss 1152341886 52%
Total 2239944118
StartupCache:
HitMemory 572734 00%
HitDisk 674959366 93%
Miss 48281907 07%
Total 723814007
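For comparing the two channels, the shares can be recomputed from the raw counts in comments 30 and 31 (the Nightly figures were posted without percentages). This is only a small arithmetic sketch: the counts are copied from the comments, while the `REPORTED` table and the `shares` helper are names made up for this example.

```python
# Raw counts as posted in comment 30 (Nightly) and comment 31 (Beta).
REPORTED = {
    ("Nightly", "ScriptPreloader"): {"Hit": 116977763, "HitChild": 2559619, "Miss": 160545632},
    ("Nightly", "StartupCache"): {"HitMemory": 153288, "HitDisk": 60646842, "Miss": 9053288},
    ("Beta", "ScriptPreloader"): {"Hit": 1060675539, "HitChild": 26926693, "Miss": 1152341886},
    ("Beta", "StartupCache"): {"HitMemory": 572734, "HitDisk": 674959366, "Miss": 48281907},
}


def shares(counts):
    """Return (total, {label: fraction of total}) for one cache's counts."""
    total = sum(counts.values())
    return total, {label: n / total for label, n in counts.items()}


for (channel, cache), counts in REPORTED.items():
    total, fractions = shares(counts)
    print(f"{channel} {cache} (total {total})")
    for label, fraction in fractions.items():
        print(f"  {label:<10}{counts[label]:>12}  {fraction:6.2%}")
```

The Beta totals computed this way match the totals posted in comment 31.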