Closed Bug 1186713 Opened 9 years ago Closed 9 years ago

FHR v2/v4 comparison -- totalTime and activeTicks per session must match

Categories

(Firefox Health Report Graveyard :: Client: Desktop, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: bcolloran, Unassigned)

References

Details

(Whiteboard: [unifiedTelemetry][data-validation])

For each complete reconstructed session starting within this date range (i.e., not the ongoing session in appSession.current or a session for which not all subsessions have been collected), we should also have these match identically:
* totalTime per session
* activeTicks per session

Among other things, this check is meant to help us verify that measurements are being binned correctly across subsessions.
Blocks: 1169103
Depends on: 1187054
Depends on: 1187069
So there appear to be some discrepancies that actually do roll up to be substantially different between v2 and v4.

http://nbviewer.ipython.org/gist/bcolloran/1ca418a0e2ce3a81c343#Comparing-overall-sums-of-totalTime-between-v2-and-v4

Looking into this some, it seems like there may be a problem in some cases with the last subsession of a session having a negative measurement for subsessionLength. This may indicate a problem with shutdown.

http://nbviewer.ipython.org/gist/bcolloran/1ca418a0e2ce3a81c343#Are-all-negative-values-%22subsessionLength%22-in-the-final-subsession?
Did you file that?
Flags: needinfo?(bcolloran)
So, we are using Date() now to calculate subsessionLength and we know some people have weird or jumping clocks.
We need to use monotonic clocks for subsessionLength to fix that.

Spot-checking the code i don't see other potential issues really with subsessionLength and nothing special about shutdown really.
Could there be other reasons to make shutdown affected more proportionally? Like highest average subsession length or most sesssions consisting of one or two fragments?
Depends on: 1188416
See also bug 1106048.

With bug 1188416 we should see sane numbers in v4, but v2 would still be off.
I don't think we can reconstruct this properly (as v2 & v4 are affected differently), so i think we really have to drop any pairing where either of v2 or v4 shows clock issues (like it was done in the notebook AFAICT).
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #2)
> Did you file that?

Look like Georg filed it as bug 1188416.
Flags: needinfo?(bcolloran)
I find it hard to believe that clock reset is causing the negative numbers being reported here. So even if we get bug 1188416 landed immediately, I think we should also target some manual QA at the shutdown timing.

Brendan, can you look at the "reason" code for the subsessions that have a negative total time? e.g. is -85359 a "shutdown" or "aborted-session" and is there a pattern in the other subsessions with negative values?
Flags: needinfo?(bcolloran)
Per bug 1106048, the uptime value in v4 might be "similarly" broken as v2.
ni?me to check if that makes sense to compare.
Flags: needinfo?(gfritzsche)
Depends on: 1189164
(In reply to Georg Fritzsche [:gfritzsche] from comment #7)
> Per bug 1106048, the uptime value in v4 might be "similarly" broken as v2.
> ni?me to check if that makes sense to compare.

We actually have ping.payload.simpleMeasurements.totalTime which should match the FHR totalTime pretty directly (slight difference expected as it is stored at different times).

The totalTime value is not per subsession, so we should be able to match this pretty directly to the FHR v2 behavior if we use the values from the last fragment of a session (shutdown or aborted-session).
Flags: needinfo?(gfritzsche)
FWIW, i confirmed the negative values for both locally by setting my system clock back.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #6)
> I find it hard to believe that clock reset is causing the negative numbers
> being reported here. So even if we get bug 1188416 landed immediately, I
> think we should also target some manual QA at the shutdown timing.
> 
> Brendan, can you look at the "reason" code for the subsessions that have a
> negative total time? e.g. is -85359 a "shutdown" or "aborted-session" and is
> there a pattern in the other subsessions with negative values?

There does not appear to be any pattern in the reason codes of subsessions with negative subsessionLength values.

http://nbviewer.ipython.org/gist/bcolloran/19240eb870da7df5dde1#Reason-codes-for-subsessions-with-negative-subsessionLength
Flags: needinfo?(bcolloran)
Whiteboard: [unifiedTelemetry][data-validation]
issues with v2 that won't be fixed in v4
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
Product: Firefox Health Report → Firefox Health Report Graveyard
You need to log in before you can comment on or make changes to this bug.