Closed Bug 1367094 Opened 2 years ago Closed 2 years ago

Measure the size of unsuccessfully-sent and successfully-sent pings

Categories

(Toolkit :: Telemetry, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: chutten|PTO, Assigned: katejimmelon)

References

Details

(Whiteboard: [measurement:client])

Attachments

(1 file)

We have a theory that the size of a ping may have an effect on whether or not it can be successfully sent. This has ramifications for the design of the health ping (bug 1318297).

So I propose we measure (opt-in) the size of successfully-sent pings and the size of unsuccessfully-sent pings so we can compare.
Assignee: chutten → nobody
Priority: -- → P2
Assignee: nobody → kustiuzhanina
Priority: P2 → P1
Adding some more notes to this bug:
- this will need changes in TelemetrySend.jsm
- we want to measure this in histograms:
  https://gecko.readthedocs.io/en/latest/toolkit/components/telemetry/telemetry/collection/histograms.html
- prior art and pointers to code locations are:
  - bug 1319026, which instrumented the success and error path
  - existing histogram instrumentation for discarding pings for size:
    https://dxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/Histograms.json#7180,7187
- we should double-check if we can cheaply get the size for *all* ping sends
- once this is working, test coverage should be added
  - this will go into test_TelemetrySend.js
  - the test can be run using e.g. "./mach test <path>"
Comment on attachment 8874428 [details]
Bug 1367094 - Measure the size of successful and failed pings.

https://reviewboard.mozilla.org/r/145806/#review149722

This looks pretty solid, thanks!

Were you able to test this locally and check the values in about:telemetry?
Mostly i'm not sure about the testing part, maybe we can just simplify the test code a bit.

::: toolkit/components/telemetry/Histograms.json:7172
(Diff revision 1)
>      "n_buckets": 20,
>      "bug_numbers": [1033860],
>      "description": "Time (ms) it takes for loading archived pings from disk"
>    },
> +  "TELEMETRY_SUCCESSFUL_SEND_PINGS_SIZE_KB": {
> +    "record_in_processes": ["main", "content"],

Lets limit this to "main" only.
We only ever expect to send pings from the "main" process (parent process), never from any child process.

The "content" process here is where we render web content. For this and other child processes we collect metrics, but send them to the parent process to store and batch into pings.

::: toolkit/components/telemetry/tests/unit/test_TelemetrySend.js:485
(Diff revision 1)
> +  }
> +
> +  let randomString = generateRandomString(1024 * 50);
> +  const compressedStringSizeKB = Math.floor(gzipCompressString(randomString).length / 1024);
> +
> +  //Submit a ping with 50 kb size.

Small nit: Please put an empty space after the "//".
Ditto for other comments below.

::: toolkit/components/telemetry/tests/unit/test_TelemetrySend.js:489
(Diff revision 1)
> +  Assert.equal(histSuccessPingSize.snapshot().sum, compressedStringSizeKB,
> +    "Telemetry must report reception of 1 successful ping with size: " + compressedStringSizeKB + "kb");

Does this work? I'm surprised :)

The size you are measuring is of the payload.
The size TelemetrySend is measuring internally is that of the common ping format + the payload.
https://gecko.readthedocs.io/en/latest/toolkit/components/telemetry/telemetry/data/common-ping.html

I suspect this passes because of the histogram bucketing, but it seems very fragile.

Do we need to measure the specific size at all?
Maybe it is enough here to check that we record non-zero values into the different histograms?
Attachment #8874428 - Flags: review?(gfritzsche)
Comment on attachment 8874428 [details]
Bug 1367094 - Measure the size of successful and failed pings.

https://reviewboard.mozilla.org/r/145806/#review150172

This looks like its done code-wise, thanks!

I've only left some comments on the comment style etc. below - staying consistent here helps us with readability in the future.
Once you did those changes we also need to get data review on the patch:
https://wiki.mozilla.org/Firefox/Data_Collection

::: toolkit/components/telemetry/Histograms.json:7174
(Diff revision 3)
>      "description": "Time (ms) it takes for loading archived pings from disk"
>    },
> +  "TELEMETRY_SUCCESSFUL_SEND_PINGS_SIZE_KB": {
> +    "record_in_processes": ["main"],
> +    "alert_emails": ["telemetry-client-dev@mozilla.com"],
> +    "expires_in_version": "never",

This is currently exploratory data.
We should set it to expire in a few versions, say Firefox 61.

::: toolkit/components/telemetry/TelemetrySend.jsm:1104
(Diff revision 3)
>      let deferred = PromiseUtils.defer();
>  
>      let onRequestFinished = (success, event) => {
>        let onCompletion = () => {
>          if (success) {
> +          Telemetry.getHistogramById("TELEMETRY_SUCCESSFUL_SEND_PINGS_SIZE_KB").add(compressedPingSizeKB);

One small thing:
The lines are getting a bit long here.
Can you change this to:
```
let h = ...
h.add(..)
```

The same below.

::: toolkit/components/telemetry/tests/unit/test_TelemetrySend.js:482
(Diff revision 3)
> +
> +  for (let h of [histSuccessPingSize, histFailedPingSize]) {
> +    h.clear();
> +  }
> +
> +  // Submit a ping with 50 kb size.

Nit: That comment needs updating.

::: toolkit/components/telemetry/tests/unit/test_TelemetrySend.js:487
(Diff revision 3)
> +  // Submit a ping with 50 kb size.
> +  await TelemetryController.submitExternalPing(TEST_TYPE, {});
> +  await TelemetrySend.testWaitOnOutgoingPings();
> +
> +  Assert.equal(histogramValueCount(histSuccessPingSize.snapshot()), 1,
> +    "Telemetry must report reception of 1 successful ping");

Maybe something like "Should have recorded 1 successful ping into histogram."?
Similarly for the messages below.

::: toolkit/components/telemetry/tests/unit/test_TelemetrySend.js:489
(Diff revision 3)
> +  await TelemetrySend.testWaitOnOutgoingPings();
> +
> +  Assert.equal(histogramValueCount(histSuccessPingSize.snapshot()), 1,
> +    "Telemetry must report reception of 1 successful ping");
> +
> +  // Check that successful ping didn't affect failed pings size

Nit: "Check that a successful send ..."

But it could be more compact (and for me sufficient) to say e.g.:
```
// Check that we recorded the ping sizes correctly into histograms.
Assert.equal(....histSuccess...);
Assert.equal(....histFailed...);
```

You could use the same pattern for all the checks below.

::: toolkit/components/telemetry/tests/unit/test_TelemetrySend.js:493
(Diff revision 3)
> +
> +  // Check that successful ping didn't affect failed pings size
> +  Assert.equal(histogramValueCount(histFailedPingSize.snapshot()), 0,
> +    "Telemetry must report reception of 0 failed ping");
> +
> +  // Submit the same ping 2nd time.

"... a second time."

::: toolkit/components/telemetry/tests/unit/test_TelemetrySend.js:497
(Diff revision 3)
> +
> +  // Submit the same ping 2nd time.
> +  await TelemetryController.submitExternalPing(TEST_TYPE, {});
> +  await TelemetrySend.testWaitOnOutgoingPings();
> +
> +  // Check that histSuccessPingSize was updated.

Nit: It's better to not mention specific variable names, we can see those below.
"Check that we recorded a ping size for successful sends."

::: toolkit/components/telemetry/tests/unit/test_TelemetrySend.js:505
(Diff revision 3)
> +  // Write a custom ping handler which will return 601. This will trigger ping eviction
> +  // on client side.

"Register a custom ping handler ..."

::: toolkit/components/telemetry/tests/unit/test_TelemetrySend.js:518
(Diff revision 3)
> +  await TelemetryController.submitExternalPing(TEST_TYPE, {});
> +  await ContentTaskUtils.waitForCondition(() => {
> +    return histogramValueCount(histFailedPingSize.snapshot()) > 0;
> +  });
> +
> +  // Check failed ping didn't influence on successful.

Nit: "Check that failed ping wasn't recorded as a success."?

::: toolkit/components/telemetry/tests/unit/test_TelemetrySend.js:520
(Diff revision 3)
> +    return histogramValueCount(histFailedPingSize.snapshot()) > 0;
> +  });
> +
> +  // Check failed ping didn't influence on successful.
> +  Assert.equal(histogramValueCount(histSuccessPingSize.snapshot()), 2,
> +    "Telemetry must report reception of 1 successful ping");

Should be 2 in the message?
Attachment #8874428 - Flags: review?(gfritzsche)
Comment on attachment 8874428 [details]
Bug 1367094 - Measure the size of successful and failed pings.

https://reviewboard.mozilla.org/r/145806/#review150240

Thanks!
Attachment #8874428 - Flags: review?(gfritzsche) → review+
Pushed by georg.fritzsche@googlemail.com:
https://hg.mozilla.org/integration/autoland/rev/7df33990054e
Measure the size of successful and failed pings. r=gfritzsche
https://hg.mozilla.org/mozilla-central/rev/7df33990054e
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Comment on attachment 8874428 [details]
Bug 1367094 - Measure the size of successful and failed pings.

(In reply to Georg Fritzsche [:gfritzsche] from comment #6)
> Once you did those changes we also need to get data review on the patch:
> https://wiki.mozilla.org/Firefox/Data_Collection

And i missed this part, my bad.
Benjamin, requesting data review for this.

This is a follow-up investigatory instrumentation to look into correlations between ping sizes and Telemetry send success or failure.
We will actively use this for
1) an on-going investigation into send success ratios
2) informing the design of a "telemetry health ping", which will provide long-term monitoring.
Attachment #8874428 - Flags: feedback?(benjamin)
Depends on: 1370880
Note that there was a mixup with the expiry versions, we are addressing that in bug 1370880.
Comment on attachment 8874428 [details]
Bug 1367094 - Measure the size of successful and failed pings.

https://reviewboard.mozilla.org/r/145806/#review152000

data-r=me
Attachment #8874428 - Flags: review+
Attachment #8874428 - Flags: feedback?(benjamin)
Depends on: 1432489
Depends on: 1501303
You need to log in before you can comment on or make changes to this bug.