FHR v2/v4 comparison -- totalTime and activeTicks per session must match

RESOLVED WONTFIX

Status

Firefox Health Report
Client: Desktop
RESOLVED WONTFIX
3 years ago
3 years ago

People

(Reporter: brendan c, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [unifiedTelemetry][data-validation])

(Reporter)

Description

3 years ago
For each complete reconstructed session starting within this date range (i.e., not the ongoing session in appSession.current or a session for which not all subsessions have been collected), we should also have these match identically:
* totalTime per session
* activeTicks per session

Among other things, this check is meant to help us verify that measurements are being binned correctly across subsessions.
(Reporter)

Updated

3 years ago
Blocks: 1169103
(Reporter)

Updated

3 years ago
Depends on: 1187054
(Reporter)

Updated

3 years ago
Depends on: 1187069
(Reporter)

Comment 1

3 years ago
So there appear to be some discrepancies that actually do roll up to be substantially different between v2 and v4.

http://nbviewer.ipython.org/gist/bcolloran/1ca418a0e2ce3a81c343#Comparing-overall-sums-of-totalTime-between-v2-and-v4

Looking into this some, it seems like there may be a problem in some cases with the last subsession of a session having a negative measurement for subsessionLength. This may indicate a problem with shutdown.

http://nbviewer.ipython.org/gist/bcolloran/1ca418a0e2ce3a81c343#Are-all-negative-values-%22subsessionLength%22-in-the-final-subsession?

Comment 2

3 years ago
Did you file that?
Flags: needinfo?(bcolloran)
So, we are using Date() now to calculate subsessionLength and we know some people have weird or jumping clocks.
We need to use monotonic clocks for subsessionLength to fix that.

Spot-checking the code i don't see other potential issues really with subsessionLength and nothing special about shutdown really.
Could there be other reasons to make shutdown affected more proportionally? Like highest average subsession length or most sesssions consisting of one or two fragments?
Depends on: 1188416
See also bug 1106048.

With bug 1188416 we should see sane numbers in v4, but v2 would still be off.
I don't think we can reconstruct this properly (as v2 & v4 are affected differently), so i think we really have to drop any pairing where either of v2 or v4 shows clock issues (like it was done in the notebook AFAICT).
(Reporter)

Comment 5

3 years ago
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #2)
> Did you file that?

Look like Georg filed it as bug 1188416.
Flags: needinfo?(bcolloran)

Comment 6

3 years ago
I find it hard to believe that clock reset is causing the negative numbers being reported here. So even if we get bug 1188416 landed immediately, I think we should also target some manual QA at the shutdown timing.

Brendan, can you look at the "reason" code for the subsessions that have a negative total time? e.g. is -85359 a "shutdown" or "aborted-session" and is there a pattern in the other subsessions with negative values?
Flags: needinfo?(bcolloran)
Per bug 1106048, the uptime value in v4 might be "similarly" broken as v2.
ni?me to check if that makes sense to compare.
Flags: needinfo?(gfritzsche)

Updated

3 years ago
Depends on: 1189164
(In reply to Georg Fritzsche [:gfritzsche] from comment #7)
> Per bug 1106048, the uptime value in v4 might be "similarly" broken as v2.
> ni?me to check if that makes sense to compare.

We actually have ping.payload.simpleMeasurements.totalTime which should match the FHR totalTime pretty directly (slight difference expected as it is stored at different times).

The totalTime value is not per subsession, so we should be able to match this pretty directly to the FHR v2 behavior if we use the values from the last fragment of a session (shutdown or aborted-session).
Flags: needinfo?(gfritzsche)
FWIW, i confirmed the negative values for both locally by setting my system clock back.
(Reporter)

Comment 10

3 years ago
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #6)
> I find it hard to believe that clock reset is causing the negative numbers
> being reported here. So even if we get bug 1188416 landed immediately, I
> think we should also target some manual QA at the shutdown timing.
> 
> Brendan, can you look at the "reason" code for the subsessions that have a
> negative total time? e.g. is -85359 a "shutdown" or "aborted-session" and is
> there a pattern in the other subsessions with negative values?

There does not appear to be any pattern in the reason codes of subsessions with negative subsessionLength values.

http://nbviewer.ipython.org/gist/bcolloran/19240eb870da7df5dde1#Reason-codes-for-subsessions-with-negative-subsessionLength
Flags: needinfo?(bcolloran)

Updated

3 years ago
Whiteboard: [unifiedTelemetry][data-validation]

Comment 11

3 years ago
issues with v2 that won't be fixed in v4
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.