Closed
Bug 844331
Opened 12 years ago
Closed 9 years ago
Support recording of Telemetry at shutdown
Categories
(Toolkit :: Telemetry, defect)
Toolkit
Telemetry
Tracking
()
RESOLVED
FIXED
mozilla22
People
(Reporter: vladan, Unassigned)
References
Details
(Whiteboard: [Telemetry:P1][leave open])
Attachments
(9 files, 3 obsolete files)
|
3.92 KB,
patch
|
taras.mozilla
:
review+
|
Details | Diff | Splinter Review |
|
7.18 KB,
patch
|
taras.mozilla
:
review+
|
Details | Diff | Splinter Review |
|
2.89 KB,
patch
|
taras.mozilla
:
review+
|
Details | Diff | Splinter Review |
|
4.64 KB,
patch
|
taras.mozilla
:
review+
|
Details | Diff | Splinter Review |
|
5.59 KB,
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
|
6.02 KB,
patch
|
benjamin
:
review+
bajaj
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
|
3.41 KB,
patch
|
vladan
:
review+
bajaj
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
|
5.80 KB,
patch
|
vladan
:
review+
akeybl
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
|
3.29 KB,
patch
|
vladan
:
review+
|
Details | Diff | Splinter Review |
Telemetry writes out session data to a file at a very early stage of shutdown:
http://mxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/TelemetryPing.js#1083
However, Telemetry users such as FHR and Necko try to record Telemetry for shutdown operations that happen during profile-before-change:
HEALTHREPORT_SHUTDOWN_DELAY_MS
CACHE_SERVICE_LOCK_WAIT_MAINTHREAD_NSCACHESERVICE_ONPROFILESHUTDOWN
NETWORK_DISK_CACHE_DELETEDIR_SHUTDOWN
As a result, most of this data is lost and we only receive a handful of reports during weird race conditions (e.g. races between idle-daily & shutdown).
We should either move the writing of the session file to a later point in the shutdown process (e.g. as the last operation during profiler-before-change) or tell devs not to record Telemetry at shutdown.
| Reporter | ||
Updated•12 years ago
|
Whiteboard: [Telemetry:P1]
Comment 1•12 years ago
|
||
I am indifferent about this one, but the name was going to get a little
unwieldy with future commits, so...can drop it on the floor if you like.
Attachment #719613 -
Flags: review?(taras.mozilla)
Comment 2•12 years ago
|
||
getSimpleMeasurements should really live in TelemetryPing anyway, and moving
it, along with the consolidation of the io counters and whatnot, makes it
easier to split the construction of the ping later on.
Attachment #719614 -
Flags: review?(taras.mozilla)
Comment 3•12 years ago
|
||
We'll need these split out functions later. The idea is to gather simpleMeasurements
and metadata at quit-application-granted and then gather all the histogram info
at xpcom-will-shutdown.
Attachment #719616 -
Flags: review?(taras.mozilla)
Comment 4•12 years ago
|
||
We need to get at the profile directory at xpcom-will-shutdown, but we no
longer have an active profile at that point (profile-before-change has
already run). It seemed simplest to simply cache the profile directory
in profile-after-change and use that. This needs changes to test code
for the xpcshell tests, which don't send the appropriate notifications.
Comment 5•12 years ago
|
||
Here's the interesting part of the patch.
Attachment #719618 -
Flags: review?(taras.mozilla)
Updated•12 years ago
|
Attachment #719617 -
Flags: review?(taras.mozilla)
Updated•12 years ago
|
Attachment #719613 -
Flags: review?(taras.mozilla) → review+
Updated•12 years ago
|
Attachment #719614 -
Flags: review?(taras.mozilla) → review+
Comment 6•12 years ago
|
||
Comment on attachment 719616 [details] [diff] [review]
part 3 - split out payload/payload+slug construction to separate functions
this seems reasonable. I'll trust you that this works.
Attachment #719616 -
Flags: review?(taras.mozilla) → review+
Comment 7•12 years ago
|
||
Comment on attachment 719617 [details] [diff] [review]
part 4 - cache the profile directory in TelemetryPing
this.setup();
+ this.cacheProfileDirectory();
r+, but this feels like it should be done in setup(). No strong opinion here.
Attachment #719617 -
Flags: review?(taras.mozilla) → review+
Comment 8•12 years ago
|
||
Comment on attachment 719618 [details] [diff] [review]
part 5 - move saved-session writing to xpcom-will-shutdown to catch cache and FHR activity
Thanks for nicely-split patches.
Attachment #719618 -
Flags: review?(taras.mozilla) → review+
Comment 9•12 years ago
|
||
This needs a test. I'm ok with this landing without a test + doing test in a followup.
Comment 10•12 years ago
|
||
(In reply to Taras Glek (:taras) from comment #9)
> This needs a test. I'm ok with this landing without a test + doing test in a
> followup.
Do you have good ideas about how to test this? xpcshell doesn't really do notifications, and we can't really test shutdown in, say, browser-chrome. I guess we could try faking the notifications in xpcshell and hoping they don't crash...? Or test-only notifications that execute the same codepaths?
It's also worth pointing out two things:
1) I forgot to add TelemetryPing of an observer of xpcom-will-shutdown. Will fix.
2) On Windows--at least--I don't think we ever get xpcom-will-shutdown. At least on Windows, it looks like we do a shorter shutdown sequence:
http://mxr.mozilla.org/mozilla-central/source/widget/windows/nsWindow.cpp#4544
Looks like we should be smarter about monitoring profile-before-change, perhaps? Though it looks like FHR already looks at profile-before-change:
http://dxr.mozilla.org/mozilla-central/services/healthreport/healthreporter.jsm#l216
and records telemetry during it:
http://dxr.mozilla.org/mozilla-central/services/healthreport/healthreporter.jsm#l333
so it may be a race to see whether we can get information from FHR anyway...
Comment 11•12 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #10)
> (In reply to Taras Glek (:taras) from comment #9)
> > This needs a test. I'm ok with this landing without a test + doing test in a
> > followup.
>
> Do you have good ideas about how to test this? xpcshell doesn't really do
> notifications, and we can't really test shutdown in, say, browser-chrome. I
> guess we could try faking the notifications in xpcshell and hoping they
> don't crash...? Or test-only notifications that execute the same codepaths?
I see cookie tests just blithely sending profile-before-change notifications, so maybe we can get away with doing that too.
From browsing around, it looks like everybody else is saving stuff at profile-before-change. Maybe we should be saving then too?
| Reporter | ||
Comment 12•12 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #11)
> From browsing around, it looks like everybody else is saving stuff at
> profile-before-change. Maybe we should be saving then too?
Telemetry would have to be the *last* observer to handle the profile-before-change event since other components might record in histograms in their own handlers. I don't think there's any way to guarantee this and I'm not sure it's desirable to hack it in.
Firefox does get the "xpcom-will-shutdown" event during regular shutdowns, but the nsWindow.cpp code you linked to is the "fast shutdown" that Firefox performs when Windows tell Firefox that the user has requested OS shutdown.
What do you think of extending the "fast shutdown" sequence to also call xpcom-will-shutdown?
Comment 13•12 years ago
|
||
(In reply to Vladan Djeric (:vladan) from comment #12)
> (In reply to Nathan Froyd (:froydnj) from comment #11)
> > From browsing around, it looks like everybody else is saving stuff at
> > profile-before-change. Maybe we should be saving then too?
>
> Telemetry would have to be the *last* observer to handle the
> profile-before-change event since other components might record in
> histograms in their own handlers. I don't think there's any way to guarantee
> this and I'm not sure it's desirable to hack it in.
Right, that would be ugly.
> What do you think of extending the "fast shutdown" sequence to also call
> xpcom-will-shutdown?
I don't have a problem with it, the question is whether the Android and Windows folks have a problem with it. roc, WDYT about adding xpcom-will-shutdown to the code here:
http://mxr.mozilla.org/mozilla-central/source/widget/windows/nsWindow.cpp#4557
http://mxr.mozilla.org/mozilla-central/source/widget/android/nsAppShell.cpp#379
so that we have a place to collect Telemetry data after more-or-less everybody has finished doing things in profile-before-change?
Flags: needinfo?(roc)
Comment 14•12 years ago
|
||
Actually, roc, why don't you just review the patch?
Attachment #720074 -
Flags: review?(roc)
Attachment #720074 -
Flags: review?(roc) → review?(benjamin)
Comment 16•12 years ago
|
||
Comment on attachment 720074 [details] [diff] [review]
have android and windows shutdown notify on xpcom-will-shutdown
I don't think that this is a good name for the action being taken here. As you've described it, this is really "a notification fired right after profile-before-change". Presumably you want this notification to be fired even if we early-exit(), and xpcom-will-shutdown definitely should *not* be fired in this case.
For this case you should just create/use a profile-before-change2.
Attachment #720074 -
Flags: review?(benjamin) → review-
Comment 17•12 years ago
|
||
A semi-blind addition of notifying profile-before-change2 everywhere we
notify profile-before-change. That dom/power bit looks *really* ugly...
Attachment #720074 -
Attachment is obsolete: true
Attachment #722839 -
Flags: review?(benjamin)
Comment 18•12 years ago
|
||
This just moves saved-session writing to profile-before-change2. No material
changes from the earlier part 5, so carrying over r+.
Attachment #719618 -
Attachment is obsolete: true
Attachment #722840 -
Flags: review+
Comment 19•12 years ago
|
||
...and fix typoes.
Attachment #722839 -
Attachment is obsolete: true
Attachment #722839 -
Flags: review?(benjamin)
Attachment #722873 -
Flags: review?(benjamin)
Updated•12 years ago
|
Attachment #722873 -
Flags: review?(benjamin) → review+
Comment 20•12 years ago
|
||
Please update https://wiki.mozilla.org/XPCOM_Shutdown with docs on this when it lands.
Comment 21•12 years ago
|
||
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/62139bbc7af6
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/53c240fdf21c
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/f8e226c778e1
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/3b54f36d54a8
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/f92ce45a6b92
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/80034b418a48
Comment 22•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/62139bbc7af6
https://hg.mozilla.org/mozilla-central/rev/53c240fdf21c
https://hg.mozilla.org/mozilla-central/rev/f8e226c778e1
https://hg.mozilla.org/mozilla-central/rev/3b54f36d54a8
https://hg.mozilla.org/mozilla-central/rev/f92ce45a6b92
https://hg.mozilla.org/mozilla-central/rev/80034b418a48
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla22
Comment 23•12 years ago
|
||
The xpcshell test harness does send shutdown events, if the test uses the profile: http://mxr.mozilla.org/comm-central/source/mozilla/testing/xpcshell/head.js#909
Do we want to add profile-before-change2 there as well?
Comment 24•12 years ago
|
||
Comment on attachment 719613 [details] [diff] [review]
part 1 - rename bits with |getCurrentSessionPayload| to |getSessionPayload|
[Approval Request Comment]
Would like to upload this series to provide better measurement of things that Firefox Health Report and the network cache do.
Bug caused by (feature/regressing bug #): None
User impact if declined: No user impact. Strictly a measurement improvement patch.
Testing completed (on m-c, etc.): On m-c for a week.
Risk to taking this patch (and alternatives if risky): Little risk.
String or UUID changes made by this patch: None.
Attachment #719613 -
Flags: approval-mozilla-aurora?
Comment 25•12 years ago
|
||
Comment on attachment 719614 [details] [diff] [review]
part 2 - move getSimpleMeasurements to live in TelemetryPing
[Approval Request Comment]
Would like to upload this series to provide better measurement of things that Firefox Health Report and the network cache do.
Bug caused by (feature/regressing bug #): None
User impact if declined: No user impact. Strictly a measurement improvement patch.
Testing completed (on m-c, etc.): On m-c for a week.
Risk to taking this patch (and alternatives if risky): Little risk.
String or UUID changes made by this patch: None.
Attachment #719614 -
Flags: approval-mozilla-aurora?
Comment 26•12 years ago
|
||
Comment on attachment 719616 [details] [diff] [review]
part 3 - split out payload/payload+slug construction to separate functions
[Approval Request Comment]
Would like to upload this series to provide better measurement of things that Firefox Health Report and the network cache do.
Bug caused by (feature/regressing bug #): None
User impact if declined: No user impact. Strictly a measurement improvement patch.
Testing completed (on m-c, etc.): On m-c for a week.
Risk to taking this patch (and alternatives if risky): Little risk.
String or UUID changes made by this patch: None.
Attachment #719616 -
Flags: approval-mozilla-aurora?
Comment 27•12 years ago
|
||
Comment on attachment 719617 [details] [diff] [review]
part 4 - cache the profile directory in TelemetryPing
[Approval Request Comment]
Would like to upload this series to provide better measurement of things that Firefox Health Report and the network cache do.
Bug caused by (feature/regressing bug #): None
User impact if declined: No user impact. Strictly a measurement improvement patch.
Testing completed (on m-c, etc.): On m-c for a week.
Risk to taking this patch (and alternatives if risky): Little risk.
String or UUID changes made by this patch: None.
Attachment #719617 -
Flags: approval-mozilla-aurora?
Comment 28•12 years ago
|
||
Comment on attachment 722840 [details] [diff] [review]
part 6 - move saved-session writing to profile-before-change2 to catch cache and FHR activity
[Approval Request Comment]
Would like to upload this series to provide better measurement of things that Firefox Health Report and the network cache do.
Bug caused by (feature/regressing bug #): None
User impact if declined: No user impact. Strictly a measurement improvement patch.
Testing completed (on m-c, etc.): On m-c for a week.
Risk to taking this patch (and alternatives if risky): Little risk.
String or UUID changes made by this patch: None.patch:
Attachment #722840 -
Flags: approval-mozilla-aurora?
Comment 29•12 years ago
|
||
Comment on attachment 722873 [details] [diff] [review]
part 5 - add profile-before-change2 notification
[Approval Request Comment]
Would like to upload this series to provide better measurement of things that Firefox Health Report and the network cache do.
Bug caused by (feature/regressing bug #): None
User impact if declined: No user impact. Strictly a measurement improvement patch.
Testing completed (on m-c, etc.): On m-c for a week.
Risk to taking this patch (and alternatives if risky): Little risk.
String or UUID changes made by this patch: None.
Attachment #722873 -
Flags: approval-mozilla-aurora?
Comment 30•12 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #29)
> Comment on attachment 722873 [details] [diff] [review]
> part 5 - add profile-before-change2 notification
>
> [Approval Request Comment]
> Would like to upload this series to provide better measurement of things
> that Firefox Health Report and the network cache do.
>
> Bug caused by (feature/regressing bug #): None
> User impact if declined: No user impact. Strictly a measurement improvement
> patch.
> Testing completed (on m-c, etc.): On m-c for a week.
> Risk to taking this patch (and alternatives if risky): Little risk.
Can you please expand a bit more on the risk here, what could be the worst thing that may happen ?
Considering this is a last week of aurora and this seems a huge uplift right before the merge, are we worried about stability ?Also is this strictly being uplifted to help with FHR and how badly do we need it there ?
Comment 31•12 years ago
|
||
(In reply to bhavana bajaj [:bajaj] from comment #30)
> (In reply to Nathan Froyd (:froydnj) from comment #29)
> > Risk to taking this patch (and alternatives if risky): Little risk.
>
> Can you please expand a bit more on the risk here, what could be the worst
> thing that may happen ?
The worst thing that could happen is that Telemetry would somehow get shut off, but we'd notice that after about a week or so of the patch landing. So we'd have time to back out/fix.
> Considering this is a last week of aurora and this seems a huge uplift right
> before the merge, are we worried about stability ?Also is this strictly
> being uplifted to help with FHR and how badly do we need it there ?
Stability of the browser shouldn't be a problem with these patches.
FHR records several pieces of data at browser shutdown, such as how long FHR shutdown takes. Due to bad interactions with Telemetry, we're not getting any visibility on that data. There's also some networking cache telemetry that we don't have any visibility into.
There's a much shorter fix that could be landed on Aurora, roughly equivalent to part 5 and part 6 of the above patch series. Would you prefer that instead?
Flags: needinfo?(bbajaj)
Comment 32•12 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #31)
> (In reply to bhavana bajaj [:bajaj] from comment #30)
> > (In reply to Nathan Froyd (:froydnj) from comment #29)
> > > Risk to taking this patch (and alternatives if risky): Little risk.
> >
> > Can you please expand a bit more on the risk here, what could be the worst
> > thing that may happen ?
>
> The worst thing that could happen is that Telemetry would somehow get shut
> off, but we'd notice that after about a week or so of the patch landing. So
> we'd have time to back out/fix.
>
> > Considering this is a last week of aurora and this seems a huge uplift right
> > before the merge, are we worried about stability ?Also is this strictly
> > being uplifted to help with FHR and how badly do we need it there ?
>
> Stability of the browser shouldn't be a problem with these patches.
>
> FHR records several pieces of data at browser shutdown, such as how long FHR
> shutdown takes. Due to bad interactions with Telemetry, we're not getting
> any visibility on that data. There's also some networking cache telemetry
> that we don't have any visibility into.
>
> There's a much shorter fix that could be landed on Aurora, roughly
> equivalent to part 5 and part 6 of the above patch series. Would you prefer
> that instead?
If that workaround solves the purpose here and is even slightly less riskier I think we should go with that option at this time.
Flags: needinfo?(bbajaj)
Comment 33•12 years ago
|
||
Here's a much simpler patch for Aurora uplift only. Instead of using all
the complicated machinery, we simply move ping saving to profile-before-change2.
Which is probably what I should have done once Benjamin said that adding a
new notification was OK.
Attachment #730703 -
Flags: review?(vdjeric)
Updated•12 years ago
|
Attachment #719613 -
Flags: approval-mozilla-aurora?
Updated•12 years ago
|
Attachment #719614 -
Flags: approval-mozilla-aurora?
Updated•12 years ago
|
Attachment #719616 -
Flags: approval-mozilla-aurora?
Updated•12 years ago
|
Attachment #719617 -
Flags: approval-mozilla-aurora?
Updated•12 years ago
|
Attachment #722840 -
Flags: approval-mozilla-aurora?
| Reporter | ||
Updated•12 years ago
|
Attachment #730703 -
Flags: review?(vdjeric) → review+
Comment 34•12 years ago
|
||
Comment on attachment 730703 [details] [diff] [review]
part 6 - switch saving pings over to profile-before-change2
Here's a much simpler patch that, in conjunction with part 5, would implement the minimum that we would need for an Aurora uplift. Taking part 5 and part 6 on Aurora would be much more palatable than the full series. See comment 31 especially for why uplifting these would be a good thing.
[Approval Request Comment]
Bug caused by (feature/regressing bug #): None.
User impact if declined: None.
Testing completed (on m-c, etc.): Local testing OK.
Risk to taking this patch (and alternatives if risky): No patch is risk free; this patch has low risk.
String or IDL/UUID changes made by this patch: None.
Attachment #730703 -
Flags: approval-mozilla-aurora?
Comment 35•12 years ago
|
||
Comment on attachment 730703 [details] [diff] [review]
part 6 - switch saving pings over to profile-before-change2
low risk patch, helps FHR get visibility into shutdown data.
Thanks for the alternative patches here for aurora, please keep an eye on any regressions this may cause due to less baketime on aurora.
Attachment #730703 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Updated•12 years ago
|
Attachment #722873 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 36•12 years ago
|
||
remote: https://hg.mozilla.org/releases/mozilla-aurora/rev/643f2a754972
remote: https://hg.mozilla.org/releases/mozilla-aurora/rev/da9a2b8a16cc
status-firefox21:
--- → fixed
status-firefox22:
--- → fixed
Comment 37•12 years ago
|
||
This bug looks like a very likely candidate for breaking Telemetry submissions
starting around the 19th of February. This patch backs out the last part of
the patch series.
I realize the expected thing to do would be to back out the entire patch series.
However, this is the only patch that actually changes any behavior; the others
are merely rearranging code. I also verified using the debugger that the
functions the other patches affected are still working properly and not throwing
errors. So this is really the only patch that matters...
Attachment #732016 -
Flags: review?(vdjeric)
| Reporter | ||
Updated•12 years ago
|
Attachment #732016 -
Flags: review?(vdjeric) → review+
Comment 38•12 years ago
|
||
Backed out part 6: https://hg.mozilla.org/integration/mozilla-inbound/rev/322dcd797401
Verified on Windows that backing out that patch is sufficient to restore saving of telemetry pings.
It's possible that the patch we uplifted to FF21 (now on beta) is really the right way to go. Will try that next.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [Telemetry:P1] → [Telemetry:P1][leave open]
Comment 39•12 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #38)
> Backed out part 6:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/322dcd797401
https://hg.mozilla.org/mozilla-central/rev/322dcd797401
Comment 40•12 years ago
|
||
Comment on attachment 732016 [details] [diff] [review]
Back out 80034b418a48 (bug 844331, part 6) on suspicion of breaking Telemetry submission
It turns out that the patches from this bug that were landed late in the last release cycle on m-c markedly lowered telemetry submission rates. This patch landed on m-c ~1 week ago and telemetry submission rates have shown a significant uptick.
[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 844331
User impact if declined: None
Testing completed (on m-c, etc.): On m-c for > 1 week. Has markedly improved telemetry submission rates.
Risk to taking this patch (and alternatives if risky): None.
String or IDL/UUID changes made by this patch: None
Attachment #732016 -
Flags: approval-mozilla-aurora?
Comment 41•12 years ago
|
||
Comment on attachment 732016 [details] [diff] [review]
Back out 80034b418a48 (bug 844331, part 6) on suspicion of breaking Telemetry submission
Always willing to take fixes in support of our metrics when low risk :)
Attachment #732016 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 42•12 years ago
|
||
Comment 43•12 years ago
|
||
I filed bug 862599 on what looks to be a regression on Beta caused by this patch.
Comment 44•12 years ago
|
||
bug 867401 eliminates the issues with saving pings in profile-before-change2.
This patch, therefore, moves the saving of pings to profile-before-change2.
I've verified on Windows that we do save pings with these changes.
Attachment #744235 -
Flags: review?(vdjeric)
| Reporter | ||
Updated•12 years ago
|
Attachment #744235 -
Flags: review?(vdjeric) → review+
Comment 45•12 years ago
|
||
Comment 46•9 years ago
|
||
Per comment 44 the last outstanding patch here got landed.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 9 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•