Frequent Win10 debug toolkit/components/telemetry/tests/unit/test_TelemetrySend.js | test_discardBigPings - [test_discardBigPings : 350] Should have recorded sending success. - [0,1,0] deepEqual [0,2,0]

RESOLVED FIXED in Firefox 57

Status

()

Toolkit
Telemetry
P1
normal
RESOLVED FIXED
9 months ago
8 months ago

People

(Reporter: Treeherder Bug Filer, Assigned: gfritzsche)

Tracking

({intermittent-failure})

unspecified
mozilla57
intermittent-failure
Points:
---

Firefox Tracking Flags

(firefox57 fixed)

Details

Attachments

(3 attachments)

(Reporter)

Description

9 months ago
treeherder
Filed by: rvandermeulen [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=130078829&repo=try

https://queue.taskcluster.net/v1/task/e8jLq8WRRm-NYFGzBP5Ywg/runs/0/artifacts/public/logs/live_backing.log

Testing out an update to the lz4 library, I'm seeing these failures roughly 75% of the time, but only Win10 debug runs.
Georg said he can take a look.
Flags: needinfo?(gfritzsche)

Comment 2

9 months ago
We use lz4json to write to disk. Pings too big failures happen due to the length of the uncompressed string (utf8 length) in TelemetrySend.

Could the updated lz4 be resulting in a large uncompressed buffer size?
(Assignee)

Comment 3

9 months ago
In summary, this looks to be just a timing/sync issue around our async sending of pings.
I'll see that i get this test improved.
(Longer-term we really need to figure out a better, less fragile/complicated approach.)

The specific test failure here is:
> test_TelemetrySend.js | test_discardBigPings - [test_discardBigPings : 350] Should have recorded sending success. - [0,1,0] deepEqual [0,2,0]

This happens here:
https://dxr.mozilla.org/mozilla-central/rev/f9a5e9ed62103c84e4cde915f4d08f1ce71be83e/toolkit/components/telemetry/tests/unit/test_TelemetrySend.js#350

We attempt to submit one oversized ping, then a health ping, then check the metrics for those send attempts/actions look as expected.

In the log, i first see the oversized ping being discarded properly, as expected.
Then, following the health ping through the log, it gets into the send queue properly.
However, it doesn't finish sending before the test failure, so we record the expected sending success after the test for it failed.

> TelemetryController::submitExternalPing - type: health, aOptions: {"addClientId":true,"usePingSender":false,"addEnvironment":false}"
...
> TelemetryController::submitExternalPing - ping assembled, id: 93bd0011-2c8f-4e1c-bee0-e00000000000"
...
> TelemetrySend::submitPing - can send pings, trying to send now"
...
> TelemetrySend::_doPing - server: http://localhost:58386, persisted: false, id: 93bd0011-2c8f-4e1c-bee0-e00000000000"
...
> TEST-UNEXPECTED-FAIL | toolkit/components/telemetry/tests/unit/test_TelemetrySend.js | test_discardBigPings - [test_discardBigPings : 350] Should have recorded sending success. - [0,1,0] deepEqual [0,2,0]
...
> TelemetrySend::_doPing - successfully loaded, status: 200"
Flags: needinfo?(gfritzsche)

Comment 4

9 months ago
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d543ff3bc906
Temporarily skip test_TelemetrySend.js on Win64 debug due to frequent failures. r=gfritzsche
Keywords: leave-open
(Assignee)

Comment 5

9 months ago
Created attachment 8907065 [details] [diff] [review]
Part 1 - Add test logging for Telemetry PingServer requests
(Assignee)

Updated

9 months ago
Assignee: nobody → gfritzsche
Status: NEW → ASSIGNED
(Assignee)

Comment 6

9 months ago
Created attachment 8907066 [details] [diff] [review]
Part 2 - Fix fragile TelemetrySend test
(Assignee)

Comment 8

9 months ago
Comment on attachment 8907065 [details] [diff] [review]
Part 1 - Add test logging for Telemetry PingServer requests

We often have test diagnosis issues of "hang on, which pings got actually sent out here?".
Just logging the ping requests by default from our test server helps with that.
Attachment #8907065 - Flags: review?(alessio.placitelli)
(Assignee)

Comment 9

9 months ago
Comment on attachment 8907066 [details] [diff] [review]
Part 2 - Fix fragile TelemetrySend test

This cleans up this send test a bit to work around timing issues and make it easier to reason about.

In the future we really need a sane, non-racey standard pattern for this, but that will be a bigger effort.
Attachment #8907066 - Flags: review?(alessio.placitelli)
(Assignee)

Updated

9 months ago
Priority: P5 → P1
(Assignee)

Comment 11

9 months ago
Created attachment 8907501 [details] [diff] [review]
Part 3 - Re-enable test_TelemetrySend.js on Window 64bit debug
Attachment #8907501 - Flags: review?(alessio.placitelli)
Attachment #8907065 - Flags: review?(alessio.placitelli) → review+
Attachment #8907066 - Flags: review?(alessio.placitelli) → review+
Attachment #8907501 - Flags: review?(alessio.placitelli) → review+
(Assignee)

Updated

9 months ago
status-firefox57: --- → fix-optional
Keywords: checkin-needed
Keywords: leave-open

Comment 12

9 months ago
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b473e348c16d
Part 1: Add test logging for Telemetry PingServer requests. r=Dexter
https://hg.mozilla.org/integration/mozilla-inbound/rev/44cd91774b98
Part 2: Fix fragile TelemetrySend test. r=Dexter
https://hg.mozilla.org/integration/mozilla-inbound/rev/838c6c929f5f
Part 3: Re-enable test_TelemetrySend.js on Window 64bit debug. r=Dexter
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/b473e348c16d
https://hg.mozilla.org/mozilla-central/rev/44cd91774b98
https://hg.mozilla.org/mozilla-central/rev/838c6c929f5f
Status: ASSIGNED → RESOLVED
Last Resolved: 9 months ago
status-firefox57: fix-optional → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla57

Comment 14

8 months ago
2 failures in 1032 pushes (0.002 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* try: 2

Platform breakdown:
* windows10-64: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1398851&startday=2017-09-11&endday=2017-09-17&tree=all
You need to log in before you can comment on or make changes to this bug.