1147395 - Validation: Compare a few telemetry measurements between "saved-session" and "main" pings.

Reporter

Description

•

9 years ago

We should ensure that we're receiving the same data via the new "main" pings as we are receiving via the "saved-session" pings. One simple way is to select a few common measures and calculate the aggregations using both record types. They should come out the same if everything is working as expected.

I propose we use the following commonly submitted values:
SIMPLE_MEASURES_UPTIME
CYCLE_COLLECTOR
GC_MS

And an uncommon measure like:
SHUMWAY_ERROR

We should filter records by
- Application Name: "Firefox"
- Channel: "nightly"

We should aggregate separately by type="saved-session" vs. type="main", excluding duplicate Document IDs.  The aggregation should be done by the appBuildId field.

Ideally, the resulting aggregations should be comparable to the data on telemetry.mozilla.org in addition to comparing with each other.
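
As a rough illustration of the proposed comparison, the sketch below filters pings, drops duplicate document IDs, and sums one histogram per (ping type, appBuildId). It is not the validation code itself. The flat record layout and the key names (`docId`, `appName`, `channel`, `appBuildId`, `type`, `histograms`) are assumptions made for the example, as is representing a histogram as a bucket-to-count mapping.

```python
from collections import Counter, defaultdict


def aggregate_by_build(pings, measure, app_name="Firefox", channel="nightly"):
    """Sum one histogram per (ping type, appBuildId), skipping duplicate document IDs.

    The record layout used here is hypothetical, not the real ping schema.
    """
    seen_doc_ids = set()
    totals = defaultdict(Counter)
    for ping in pings:
        # Keep only the application and channel we want to compare.
        if ping["appName"] != app_name or ping["channel"] != channel:
            continue
        # Only the two record types under comparison are relevant.
        if ping["type"] not in ("saved-session", "main"):
            continue
        # Exclude duplicate submissions of the same document.
        if ping["docId"] in seen_doc_ids:
            continue
        seen_doc_ids.add(ping["docId"])
        # Counter.update adds bucket counts instead of replacing them.
        totals[(ping["type"], ping["appBuildId"])].update(
            ping["histograms"].get(measure, {})
        )
    return totals
```

If everything works as expected, the totals for `("main", build_id)` and `("saved-session", build_id)` should be equal for each build ID.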

Mark Reid [:mreid]

Reporter

Updated

•

9 years ago

Priority: -- → P3

Mark Reid [:mreid]

Reporter

Updated

•

9 years ago

Priority: P3 → P2

Roberto Agostino Vitillo (:rvitillo)

Updated

•

9 years ago

Depends on: 1125451

Roberto Agostino Vitillo (:rvitillo)

Updated

•

9 years ago

Assignee: nobody → rvitillo

Roberto Agostino Vitillo (:rvitillo)

Updated

•

9 years ago

Depends on: 1157359

Roberto Agostino Vitillo (:rvitillo)

Updated

•

9 years ago

Depends on: 1159297

Roberto Agostino Vitillo (:rvitillo)

Updated

•

9 years ago

Depends on: 1154113

Roberto Agostino Vitillo (:rvitillo)

Updated

•

9 years ago

Depends on: 1157408

Mark Reid [:mreid]

Reporter

Comment 1

•

9 years ago

Roberto, is this work ongoing? Or are you waiting for the follow-up bugs? Are you blocked on anything?

Flags: needinfo?(rvitillo)

Roberto Agostino Vitillo (:rvitillo)

Comment 2

•

9 years ago

I was waiting for Brendan to be happy with the data first, as he is doing a great job of validating the v4 dataset.

Flags: needinfo?(rvitillo)

Roberto Agostino Vitillo (:rvitillo)

Updated

•

9 years ago

Depends on: 1169103

Mark Reid [:mreid]

Reporter

Updated

•

9 years ago

Whiteboard: [unifiedTelemetry][b5]

Katie Parlante

Updated

•

9 years ago

Whiteboard: [unifiedTelemetry][b5] → [unifiedTelemetry][b5][data-validation]

Thomas Huelbert

Comment 3

•

9 years ago

Setting needinfo for Katie to find resources for this.

Flags: needinfo?(kparlante)

Roberto Agostino Vitillo (:rvitillo)

Updated

•

9 years ago

Assignee: rvitillo → nobody

Roberto Agostino Vitillo (:rvitillo)

Comment 4

•

9 years ago

I will start working on this tomorrow.

Flags: needinfo?(kparlante)

Roberto Agostino Vitillo (:rvitillo)

Comment 5

•

9 years ago

A preliminary analysis is available at [1]. I compared only a few metrics, but there already appears to be a mismatch in about 7% of sessions for one of them (GC_MS).

At the end of the notebook I dumped a few mismatching sessions. Note that the notebook uses a single build ID, but the percentage seems to be stable across other recent build IDs as well.

[1] http://nbviewer.ipython.org/gist/vitillo/8aec1f023265c9bf2293

Flags: needinfo?(gfritzsche)

Roberto Agostino Vitillo (:rvitillo)

Comment 6

•

9 years ago

I forgot to mention that the 7% applies only to multi-fragment sessions.

Roberto Agostino Vitillo (:rvitillo)

Updated

•

9 years ago

Flags: needinfo?(alessio.placitelli)

Thomas Huelbert

Updated

•

9 years ago

Whiteboard: [unifiedTelemetry][b5][data-validation] → [40b9] [unifiedTelemetry][data-validation]

Alessio Placitelli [:Dexter]

Comment 7

•

9 years ago

After quite some testing, I was not able to consistently reproduce the issue locally. This is my test procedure (for simplicity, I changed the telemetry server pref to point to a local, nonexistent server, so my pings are kept in the pending pings directory):

- Start Firefox
- Browse to about:telemetry
- Wait for Telemetry to start (1 minute)
- Play a bit with the browser to trigger the GC
- Enable a restartless addon to break the session
- Play a bit more
- Close Firefox

The GC_MS histogram in the environment-changed ping and the one in the shutdown ping sum up nicely and the result equals the value reported by the saved-session ping. What I've noticed though is that there's a mismatch in the GC_MS histogram in the "childPayloads" section of the pings.
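
The check described above can be sketched as follows: add up the histograms from the subsession pings and compare the result with the saved-session histogram. This is only an illustration, assuming each histogram is a bucket-to-count mapping; the function names are hypothetical and not part of Telemetry.

```python
from collections import Counter


def sum_histograms(histograms):
    """Add bucket counts across several histograms."""
    total = Counter()
    for histogram in histograms:
        total.update(histogram)
    return total


def fragments_match(fragment_histograms, saved_session_histogram):
    """Return True if the summed fragments equal the saved-session histogram.

    Unary plus drops zero-count buckets, so an explicit empty bucket on one
    side does not register as a difference.
    """
    return +sum_histograms(fragment_histograms) == +Counter(saved_session_histogram)
```

For example, an environment-changed histogram of `{0: 2, 10: 1}` and a shutdown histogram of `{10: 2, 20: 1}` match a saved-session histogram of `{0: 2, 10: 3, 20: 1}`.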

I'll keep digging.

Flags: needinfo?(alessio.placitelli)

Georg Fritzsche [:gfritzsche]

Updated

•

9 years ago

Depends on: 1186871

Georg Fritzsche [:gfritzsche]

Comment 8

•

9 years ago

Checking through a few things here, it turns out that the GC_MS data used in the notebook is a sum of both child & parent histograms.

Roberto, can you please rerun this looking only at the parent data?

The child-data discrepancy needs to be fixed before we switch away from saved-session or before e10s ships, but it's not blocking 41 now.

Flags: needinfo?(gfritzsche) → needinfo?(rvitillo)

Roberto Agostino Vitillo (:rvitillo)

Comment 9

•

9 years ago

Using only parent histograms, almost none of the complete multi-fragment sessions have a mismatch. See http://nbviewer.ipython.org/gist/vitillo/b352ec160ce5c5ee2af6
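
The difference between the two runs can be sketched as below. This is not the notebook's code. It assumes a payload with parent data under `histograms` and child-process data under `childPayloads`, and a hypothetical session record that holds its `fragments` and its `saved_session` payload.

```python
from collections import Counter


def histogram(payload, name, include_children=False):
    """Return one histogram from a payload, optionally adding child-process data."""
    total = Counter(payload.get("histograms", {}).get(name, {}))
    if include_children:
        for child in payload.get("childPayloads", []):
            total.update(child.get("histograms", {}).get(name, {}))
    return +total  # unary plus drops zero-count buckets


def mismatch_rate(sessions, name, include_children=False):
    """Fraction of multi-fragment sessions whose summed fragments differ
    from the saved-session histogram."""
    multi = [s for s in sessions if len(s["fragments"]) > 1]
    if not multi:
        return 0.0
    mismatches = 0
    for session in multi:
        summed = Counter()
        for fragment in session["fragments"]:
            summed.update(histogram(fragment, name, include_children))
        if +summed != histogram(session["saved_session"], name, include_children):
            mismatches += 1
    return mismatches / len(multi)
```

Calling `mismatch_rate(sessions, "GC_MS")` compares parent data only, while `include_children=True` reproduces the earlier comparison that also summed the child histograms.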

Flags: needinfo?(rvitillo)

Georg Fritzsche [:gfritzsche]

Comment 10

•

9 years ago

Thanks for rechecking, that is great to hear.
That seems ok and minor mismatches are expected (for bug 1186871).

So I think we can close this off as WORKSFORME and file a follow-up to investigate the child payload discrepancies?
(I think those mostly come down to not collecting child payloads on each subsession collection.)

Roberto Agostino Vitillo (:rvitillo)

Comment 11

•

9 years ago

Can we just rename the Bug?

Georg Fritzsche [:gfritzsche]

Comment 12

•

9 years ago

I'd rather close this one as the original question we tracked here is resolved and the context above isn't all related to the e10s issue.

Georg Fritzsche [:gfritzsche]

Comment 13

•

9 years ago

(In reply to Georg Fritzsche [:gfritzsche] [away until july 22] from comment #12)
> I'd rather close this one as the original question we tracked here is
> resolved and the context above isn't all related to the e10s issue.

Filed bug 1187327.

Status: NEW → RESOLVED

Closed: 9 years ago

Resolution: --- → WORKSFORME

BMO Automation

Updated

•

6 years ago

Product: Cloud Services → Cloud Services Graveyard