Closed Bug 864537 Opened 9 years ago Closed 3 years ago

Aborted session percentage is ~2x greater on Beta compared with Aurora and Nightly

Categories

(Toolkit :: Telemetry, defect, P4)

defect

Tracking

()

RESOLVED INVALID

People

(Reporter: gps, Unassigned)

Details

(Whiteboard: [measurement:client])

I was looking at raw session counts - total aborted and active per channel - and noticed that the percentage of aborted sessions is about 2x greater on the Beta channel compared with the Nightly and Aurora channels.

On any given day, 10-20% of Beta 21 sessions are reported by FHR as aborted. Contrast this with Aurora 22 where consistently 6-7% of sessions are aborted. Nightly is in the same ballpark as Aurora.

FHR records a clean shutdown by setting the datareporting.sessions.current.clean pref to true during the "profile-before-change" shutdown notification. When the application next runs the previous session is marked as aborted if this pref is not present. See /services/datareporting/sessions.jsm.

The only time we should see an aborted session is if app shutdown wasn't clean (crash or something else). Another possible explanation is pref writes during "profile-before-change" are not persisted.

Next steps should be to try to isolate the higher rate to platform/os/build ID and to ask around to see if there are any known issues with Beta. Perhaps there's a difference in build configuration on Beta/21 around I/O at shutdown time? The notifications?

Since we have no FHR data from before Firefox 21, I'm not sure if it's possible to isolate this to 21 and/or Beta channel. Perhaps Telemetry has something we could correlate against?
BenWa: I was told you may be able to offer insights.
Notes: if this is peculiar to windows, this is likely due to the abrupt-termination codepath which occurs when people log out/shut down Windows. In this case, we cannot shut down the app normally because Windows terminates the process as soon as you return from handling the WM_ENDSESSION message. So instead we fire off all the notifications synchronously and then _exit.

Possibilities:
* Firing the notifications is causing somebody to spin the event loop, which reenters PeekMessage, and Windows is killing us off .
* We're completing this sequence but when we wrote the pref file but it didn't quite get flushed. This seems unlikely because Preferences::WritePrefFile uses a safeoutputstream.
* You are setting prefs during profile-before-change *after* the prefservice has already saved its prefs. Observer notifications get fired in reverse order, so this basically means you'd have to be registering your observer before the prefservice is instantiated, which is hard but perhaps not impossible.
I'm running a job now to get numbers per platform and build ID.

Differences in user behavior between beta and aurora/nightly and increased abrupt termination could explain this. I was made aware yesterday that we have many users unknowingly on the Beta channel - leftovers from when people were anxious to install Firefox 4 Beta. These "regular" users could explain increased amounts of abrupt terminations.
Here's are the things off the top of my head:
* I believe OSX/Linux is over represented in Nightly. This reinforces windows problem such as what bsmedberg suggested.
* We've landed shutdown perf improvements in the last few release. See Bug 819063. They range between mozilla20-22. This doesn't explain this difference.

Do we have data to compare with old Beta 20?
We only have FHR data from 21 and newer.
So, less than 1% of total captured sessions on Beta are non-Windows. So, the overall percentage for the Beta channel is essentially the Windows aborted rate. What was interesting is the Linux aborted rate is *higher* than Windows!

Keep in mind FHR may not be enabled on all Linux builds in the wild because FHR is not yet enabled on xulrunner (bug 831688). And, I suspect sessions on Linux builds typically last longer because I suspect Linux/power users keep sessions open for longer than normal users. (This reminds me - I want to create a histogram of session length.)

(In reply to Benjamin Smedberg  [:bsmedberg] from comment #2)
> * You are setting prefs during profile-before-change *after* the prefservice
> has already saved its prefs. Observer notifications get fired in reverse
> order, so this basically means you'd have to be registering your observer
> before the prefservice is instantiated, which is hard but perhaps not
> impossible.

I would bet some chips on this one. Although, I do wonder why we weren't seeing this until now. I'm naive about the inner workings of XPCOM service registration, but you'd think behavior would be consistent. Either the session recording service is being called first or it isn't. What would affect this? Build order? Run-time randomness?

While we do have plans to move session recording away from prefs and into Gecko/C++ (bug 841561), are there any quick stop-gaps? Perhaps we could change preferences to flush in the newly introduced profile-before-change2 event? Perhaps we could just move the prefs flush call into the xpcom shutdown sequence itself so as to not be impacted by observers? Preferences are a core component of Gecko, are they not? And, this might even cause prefs saves in scenarios where they are currently being lost (e.g. if an observer is "hanging").
Trying to reason about whether http://hg.mozilla.org/mozilla-central/annotate/1c5977e8d52f/modules/libpref/src/Preferences.cpp#l390 or http://hg.mozilla.org/mozilla-central/annotate/1c5977e8d52f/services/datareporting/sessions.jsm#l276 runs first seems pretty weird. 

Who calls SessionRecorder.onStartup anyway? You could add code which explicitly gets the preference service at the beginning of that function, to guarantee that its observer is registered first.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #8)
> Trying to reason about whether
> http://hg.mozilla.org/mozilla-central/annotate/1c5977e8d52f/modules/libpref/
> src/Preferences.cpp#l390 or
> http://hg.mozilla.org/mozilla-central/annotate/1c5977e8d52f/services/
> datareporting/sessions.jsm#l276 runs first seems pretty weird. 
> 
> Who calls SessionRecorder.onStartup anyway?

The profile-after-change observer in DataReportingService.js, AFAICT.

> You could add code which explicitly gets the preference service at the beginning
> of that function, to guarantee that its observer is registered first.

Looks to me like that already happens, effectively:
http://hg.mozilla.org/mozilla-central/annotate/690b5e0f6562/services/datareporting/DataReportingService.js#l120
We should check if this is still relevant.
Component: Client: Desktop → Telemetry
Priority: -- → P4
Product: Firefox Health Report → Toolkit
Whiteboard: [measurement:client]
FHR is gone.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → INVALID
You need to log in before you can comment on or make changes to this bug.