Closed Bug 1167456 Opened 5 years ago Closed 5 years ago

v4 submitting non-identical pings under the same ping id

Categories

(Toolkit :: Telemetry, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla41
Tracking Status
firefox40 --- fixed
firefox41 --- fixed

People

(Reporter: bcolloran, Assigned: Dexter)

References

Details

(Whiteboard: [uplift][data-validation][unifiedTelemetry])

Attachments

(2 files, 1 obsolete file)

Aside from the expected there to be some number of exact duplicates, the raw data also appears to include pings submitted under the same ping id, but which contain slightly different data.

http://nbviewer.ipython.org/gist/bcolloran/15cb923e3f2ea4759d06

It looks like "/payload/addonDetails/XPI/*/shutdown_MS" paths are often added/dropped between submissions [1] and a handful of values have turned up as changing between submissions [2].

I guess we should double-check the mechanism that ensures that a ping is "finished" and unchangeable once it has been bundled up and marked for sending.

[1]

{1: u'/payload/addonDetails/XPI/1tinUyqW@cx.com/shutdown_MS',
 2: u'/payload/addonDetails/XPI/CanvasBlocker@kkapsner.de/shutdown_MS',
 3: u'/payload/addonDetails/XPI/browsec@browsec.com/shutdown_MS',
 4: u'/payload/addonDetails/XPI/firebug@software.joehewitt.com/shutdown_MS',
 5: u'/payload/addonDetails/XPI/compatibility@addons.mozilla.org/shutdown_MS',
 6: u'/payload/addonDetails/XPI/aboutsessionstore@dt/shutdown_MS',
 7: u'/payload/addonDetails/XPI/arpit2@techraga.in/shutdown_MS',
 8: u'/payload/addonDetails/XPI/CSTBB@NArisT2_Noia4dev/shutdown_MS',
 9: u'/payload/addonDetails/XPI/en-US@dictionaries.addons.mozilla.org/shutdown_MS',
 10: u'/payload/addonDetails/XPI/firefox@mega.co.nz/shutdown_MS',
 11: u'/payload/addonDetails/XPI/the-addon-bar@GeekInTraining-GiT/shutdown_MS',
 12: u'/payload/addonDetails/XPI/adbhelper@mozilla.org/shutdown_MS',
 13: u'/payload/addonDetails/XPI/{b9db16a4-6edc-47ec-a1f4-b86292ed211d}/shutdown_MS',
 14: u'/payload/addonDetails/XPI/jid1-xUfzOsOFlzSOXg@jetpack/shutdown_MS',
 16: u'/payload/addonDetails/XPI/uriloader@pdf.js/shutdown_MS',
 18: u'/payload/addonDetails/XPI/jid1-cwbvBTE216jjpg@jetpack/shutdown_MS',
 20: u'/payload/addonDetails/XPI/{2b10c1c8-a11f-4bad-fe9c-1c11e82cac42}/shutdown_MS',
 22: u'/payload/addonDetails/XPI/skip_compatibility_check@sdrocking.com/shutdown_MS',
 24: u'/payload/addonDetails/XPI/firefox@ghostery.com/shutdown_MS',
 25: u'/payload/addonDetails/XPI/check-compatibility@dactyl.googlecode.com/shutdown_MS',
 30: u'/payload/addonDetails/XPI/elemhidehelper@adblockplus.org/shutdown_MS',
 58: u'/payload/addonDetails/XPI/mediahint@jetpack/shutdown_MS',
 81: u'/payload/addonDetails/XPI/{d10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d}/shutdown_MS'}

[2]

{1: u'/payload/addonDetails/XPI/dta@downthemall.net/shutdown_MS',
 2: u'/payload/simpleMeasurements/UITelemetry/contextmenu/__DEFAULT__/["link","image"]/withoutcustom/openlinkintab',
 3: u'/payload/addonDetails/XPI/firebug@software.joehewitt.com/creator',
 4: u'/payload/addonDetails/XPI/firebug@software.joehewitt.com/startup_MS',
 5: u'/payload/simpleMeasurements/UITelemetry/contextmenu/__DEFAULT__/["link"]/withoutcustom/openlinkintab',
 6: u'/payload/simpleMeasurements/UITelemetry/toolbars/countableEvents/__DEFAULT__/click-builtin-item/back-button/left',
 11: u'/payload/addonDetails/XPI/firebug@software.joehewitt.com/shutdown_MS',
 47: u'/payload/info/reason',
 64: u'/payload/simpleMeasurements/UITelemetry/toolbars/countableEvents/__DEFAULT__/click-builtin-item/tabbrowser-tabs/left'}
Blocks: 1154224, 1134661
Component: Client: Desktop → Telemetry
Product: Firefox Health Report → Toolkit
See Also: → 1139460
Blocks: 1169103
Thanks for the details Brendan, those notebook analysis with breakdowns are really helpful!
From the final comparisons, this looks pretty close to the issues in bug 1139460, which i thought we had nailed down.

Some questions:
* the check for non-identical members per [27]/[50], is that string or proper "deep equality" comparisons?
  it's not very likely, but i wonder if order of members in the JSON data might be a concern here
* how many clients are actually submitting those non-identical duplicates?
  maybe those are submitted by local builds from revisions before bug 1139460 was fixed
  (i'd really love to find a pattern to exclude suspects like that, but don't have one yet)
* what are the ping dates?
  maybe those are older pings generated before the fixes
* what types of pings are affected?
  i.e. ping.type and ping.payload.info.reason
Flags: needinfo?(bcolloran)
Nevermind the above questions, we found an outstanding issue here:
In bug 1139460 we properly cloned the session payloads to avoid data modification, but we ended up using the non-cloned payload:
https://hg.mozilla.org/mozilla-central/annotate/9eae3880b132/toolkit/components/telemetry/TelemetryController.jsm#l464
... somehow this slipped through :(
Flags: needinfo?(bcolloran)
Assignee: nobody → alessio.placitelli
Status: NEW → ASSIGNED
Attached patch bug1167456.patchSplinter Review
Attachment #8614591 - Flags: review?(gfritzsche)
Comment on attachment 8614591 [details] [diff] [review]
bug1167456.patch

Review of attachment 8614591 [details] [diff] [review]:
-----------------------------------------------------------------

Let's also check into test-coverage for this so it doesn't happen again.
Attachment #8614591 - Flags: review?(gfritzsche) → review+
Try push for the first part: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7152ab9b4f86

As discussed over IRC, I'm committing this first part.
Keywords: leave-open
OS: Unspecified → All
Hardware: Unspecified → All
Version: unspecified → Trunk
Attached patch part 2 - Add test coverage. (obsolete) — Splinter Review
This patch adds test coverage for the fix.
Attachment #8616656 - Flags: review?(gfritzsche)
Comment on attachment 8616656 [details] [diff] [review]
part 2 - Add test coverage.

Review of attachment 8616656 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/tests/unit/test_TelemetryController.js
@@ +328,5 @@
>    fakePingSendTimer(() => {}, () => {});
>  });
>  
> +add_task(function* test_changePingAfterSubmission() {
> +  // 1 - Send a ping with a custom payload.

Nit: We don't need the numbers in the comments.
Also, "Submit a ping"? Then later we wait on the ping being sent/archived.

@@ +335,5 @@
> +
> +  // 2 - Change the payload with a predefined value.
> +  payload.canary = "changed";
> +
> +  // 3 - Wait for the ping to be sent.

"to be archived"?

@@ +338,5 @@
> +
> +  // 3 - Wait for the ping to be sent.
> +  const pingId = yield pingPromise;
> +
> +  // 4 - Make sure we get the original payload.

"Make sure our changes didn't affect the submitted payload"?
Attachment #8616656 - Flags: review?(gfritzsche) → review+
Thanks Georg, I've updated the patch. Here's a try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=10502cfb016d
Attachment #8616656 - Attachment is obsolete: true
Attachment #8616758 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/d251ac45a5e2
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
Whiteboard: [uplift]
Comment on attachment 8614591 [details] [diff] [review]
bug1167456.patch

Approval Request Comment
[Feature/regressing bug #]:
Unified Telemetry, https://wiki.mozilla.org/Unified_Telemetry
This is part of the first (main) batch of uplifts to 40 to enable shipping on that train, see bug 1120356, comment 2.
[User impact if declined]:
Data & measurement insight projects delayed or blocked with direct impact on projects depending on this.
[Describe test coverage new/current, TreeHerder]:
We have good automation coverage of the feature.
We also had manual tests of the main tasks as well as confirmation of correct behavior on Nightly for the patches here.
[Risks and why]:
Low-risk - these patches are rather isolated to Telemetry and have been on Nightly for a while with no bad reports.
We intend to track on-going data quality and other issues during the 40 aurora & beta and flip the new behavior off when it doesn't meet the requirements.
[String/UUID change made/needed]:
The only string changes were to the about:telemetry page.
We decided that we can live with missing translations on that page for a cycle as that page is not exactly user-facing.
Attachment #8614591 - Flags: approval-mozilla-aurora?
Comment on attachment 8616758 [details] [diff] [review]
part 2 - Add test coverage. - v2

Approval Request Comment
[Feature/regressing bug #]:
Unified Telemetry, https://wiki.mozilla.org/Unified_Telemetry
This is part of the first (main) batch of uplifts to 40 to enable shipping on that train, see bug 1120356, comment 2.
[User impact if declined]:
Data & measurement insight projects delayed or blocked with direct impact on projects depending on this.
[Describe test coverage new/current, TreeHerder]:
We have good automation coverage of the feature.
We also had manual tests of the main tasks as well as confirmation of correct behavior on Nightly for the patches here.
[Risks and why]:
Low-risk - these patches are rather isolated to Telemetry and have been on Nightly for a while with no bad reports.
We intend to track on-going data quality and other issues during the 40 aurora & beta and flip the new behavior off when it doesn't meet the requirements.
[String/UUID change made/needed]:
The only string changes were to the about:telemetry page.
We decided that we can live with missing translations on that page for a cycle as that page is not exactly user-facing.
Attachment #8616758 - Flags: approval-mozilla-aurora?
Comment on attachment 8614591 [details] [diff] [review]
bug1167456.patch

Unified telemetry is an important new feature. It is blocking some other projects. Taking it.
Attachment #8614591 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8616758 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Hey Georg, I want to re-run the notebook at top to verify that this is fixed so that I can strike it from the acceptance criteria list. Please let me know what buildId I should filter on to be sure I have capture these changes. Thank you!
Flags: needinfo?(gfritzsche)
Lets use  buildid>=20150610 as in bug 1171265.
I scanned through the commits today and that looks like a sane cutoff point.
Flags: needinfo?(gfritzsche)
Whiteboard: [uplift] → [uplift][data-validation]
This appears to be fixed. Nice job!

http://nbviewer.ipython.org/gist/bcolloran/3c061932cfd5351f394c
Whiteboard: [uplift][data-validation] → [uplift][data-validation][unifiedTelemetry]
You need to log in before you can comment on or make changes to this bug.