Closed Bug 1398851 Opened 2 years ago Closed 2 years ago

Frequent Win10 debug toolkit/components/telemetry/tests/unit/test_TelemetrySend.js | test_discardBigPings - [test_discardBigPings : 350] Should have recorded sending success. - [0,1,0] deepEqual [0,2,0]

Categories

(Toolkit :: Telemetry, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla57
Tracking Status
firefox57 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: gfritzsche)

Details

(Keywords: intermittent-failure)

Attachments

(3 files)

Filed by: rvandermeulen [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=130078829&repo=try

https://queue.taskcluster.net/v1/task/e8jLq8WRRm-NYFGzBP5Ywg/runs/0/artifacts/public/logs/live_backing.log

Testing out an update to the lz4 library, I'm seeing these failures roughly 75% of the time, but only Win10 debug runs.
Georg said he can take a look.
Flags: needinfo?(gfritzsche)
We use lz4json to write to disk. Pings too big failures happen due to the length of the uncompressed string (utf8 length) in TelemetrySend.

Could the updated lz4 be resulting in a large uncompressed buffer size?
In summary, this looks to be just a timing/sync issue around our async sending of pings.
I'll see that i get this test improved.
(Longer-term we really need to figure out a better, less fragile/complicated approach.)

The specific test failure here is:
> test_TelemetrySend.js | test_discardBigPings - [test_discardBigPings : 350] Should have recorded sending success. - [0,1,0] deepEqual [0,2,0]

This happens here:
https://dxr.mozilla.org/mozilla-central/rev/f9a5e9ed62103c84e4cde915f4d08f1ce71be83e/toolkit/components/telemetry/tests/unit/test_TelemetrySend.js#350

We attempt to submit one oversized ping, then a health ping, then check the metrics for those send attempts/actions look as expected.

In the log, i first see the oversized ping being discarded properly, as expected.
Then, following the health ping through the log, it gets into the send queue properly.
However, it doesn't finish sending before the test failure, so we record the expected sending success after the test for it failed.

> TelemetryController::submitExternalPing - type: health, aOptions: {"addClientId":true,"usePingSender":false,"addEnvironment":false}"
...
> TelemetryController::submitExternalPing - ping assembled, id: 93bd0011-2c8f-4e1c-bee0-e00000000000"
...
> TelemetrySend::submitPing - can send pings, trying to send now"
...
> TelemetrySend::_doPing - server: http://localhost:58386, persisted: false, id: 93bd0011-2c8f-4e1c-bee0-e00000000000"
...
> TEST-UNEXPECTED-FAIL | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js | test_discardBigPings - [test_discardBigPings : 350] Should have recorded sending success. - [0,1,0] deepEqual [0,2,0]
...
> TelemetrySend::_doPing - successfully loaded, status: 200"
Flags: needinfo?(gfritzsche)
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d543ff3bc906
Temporarily skip test_TelemetrySend.js on Win64 debug due to frequent failures. r=gfritzsche
Assignee: nobody → gfritzsche
Status: NEW → ASSIGNED
Comment on attachment 8907065 [details] [diff] [review]
Part 1 - Add test logging for Telemetry PingServer requests

We often have test diagnosis issues of "hang on, which pings got actually sent out here?".
Just logging the ping requests by default from our test server helps with that.
Attachment #8907065 - Flags: review?(alessio.placitelli)
Comment on attachment 8907066 [details] [diff] [review]
Part 2 - Fix fragile TelemetrySend test

This cleans up this send test a bit to work around timing issues and make it easier to reason about.

In the future we really need a sane, non-racey standard pattern for this, but that will be a bigger effort.
Attachment #8907066 - Flags: review?(alessio.placitelli)
Priority: P5 → P1
Attachment #8907065 - Flags: review?(alessio.placitelli) → review+
Attachment #8907066 - Flags: review?(alessio.placitelli) → review+
Attachment #8907501 - Flags: review?(alessio.placitelli) → review+
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b473e348c16d
Part 1: Add test logging for Telemetry PingServer requests. r=Dexter
https://hg.mozilla.org/integration/mozilla-inbound/rev/44cd91774b98
Part 2: Fix fragile TelemetrySend test. r=Dexter
https://hg.mozilla.org/integration/mozilla-inbound/rev/838c6c929f5f
Part 3: Re-enable test_TelemetrySend.js on Window 64bit debug. r=Dexter
Keywords: checkin-needed
You need to log in before you can comment on or make changes to this bug.