Closed Bug 1164562 Opened 6 years ago Closed 5 years ago

Intermittent test_TelemetryController.js | Test timed out

Categories

(Toolkit :: Telemetry, defect, P2)

Unspecified
Android
defect

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox50 --- fixed
firefox51 --- fixed
firefox52 --- fixed

People

(Reporter: KWierso, Assigned: tnguyen)

References

Details

(Keywords: intermittent-failure, Whiteboard: [measurement:client])

Attachments

(1 file, 3 obsolete files)

23:56:42 INFO - TEST-START | toolkit/components/telemetry/tests/unit/test_TelemetryLockCount.js
23:56:44 INFO - TEST-PASS | toolkit/components/telemetry/tests/unit/test_TelemetryLockCount.js | took 1957ms
23:56:44 INFO - mozdevice getting files in '/mnt/sdcard/tests/xpcshell/minidumps'
23:56:45 INFO - TEST-START | toolkit/components/telemetry/tests/unit/test_TelemetryLog.js
23:56:46 INFO - TEST-PASS | toolkit/components/telemetry/tests/unit/test_TelemetryLog.js | took 1771ms
23:56:46 INFO - mozdevice getting files in '/mnt/sdcard/tests/xpcshell/minidumps'
23:56:47 INFO - TEST-START | toolkit/components/telemetry/tests/unit/test_TelemetryController.js
00:01:47 WARNING - TEST-UNEXPECTED-TIMEOUT | toolkit/components/telemetry/tests/unit/test_TelemetryController.js | Test timed out
00:01:47 INFO - TEST-INFO took 300000ms
00:01:47 INFO - mozdevice unable to kill -1 xpcshell -- not running?
00:01:47 INFO - xpcshell return code: -1
00:01:47 WARNING - TEST-UNEXPECTED-FAIL | Received SIGINT (control-C), so stopped run. (Use --keep-going to keep running tests after killing one with SIGINT)
00:01:47 INFO - INFO | Result summary:
00:01:47 INFO - INFO | Passed: 881
00:01:47 WARNING - INFO | Failed: 1
00:01:47 WARNING - One or more unittests failed.
00:01:47 INFO - INFO | Todo: 12
00:01:47 INFO - INFO | Retried: 0
00:01:47 INFO - SUITE-END | took 3260s
00:01:47 ERROR - Return code: 1
00:01:47 INFO - Output logcat...
00:01:47 INFO - Connecting to: panda-0098
00:01:48 INFO - *** STARTING LOGCAT ***
[Mass Closure] Closing Intermittent as a one off
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Phil, is there any new failure log associated with this bug?
Flags: needinfo?(philringnalda)
Priority: -- → P3
Whiteboard: [measurement:client]
The easiest way to keep track in the modern era where the bot only comments once a week and that only if there are more than five instances is to go to https://bugzilla.mozilla.org/userprefs.cgi?tab=settings and toggle on "When viewing a bug, show its corresponding Orange Factor page". Then you get a little graph, and below that a "5 failures on trunk in the past week (link)" with the link going to https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1164562
Flags: needinfo?(philringnalda)
The late-June spike in this test lines up with bug 1233986 landing (and/or bug 1276195). Can you please take a look, Alessio?
Flags: needinfo?(alessio.placitelli)
Priority: P3 → P2
MozReview-Commit-ID: HX3EwiGL3Pn
Assignee: nobody → tnguyen
Status: REOPENED → ASSIGNED
MozReview-Commit-ID: HX3EwiGL3Pn
Attachment #8800971 - Attachment is obsolete: true
Attachment #8800973 - Flags: review?(gfritzsche)
Blocks: 1296802
Attachment #8800973 - Flags: review?(gfritzsche)
MozReview-Commit-ID: HX3EwiGL3Pn
Attachment #8800973 - Attachment is obsolete: true
Comment on attachment 8801036 [details] [diff] [review]
Intermittent test_TelemetryController.js test timed out

Logs: 
https://treeherder.mozilla.org/logviewer.html#?repo=fx-team&job_id=11826001#L7788

Oh, I see that's a timing issue that  
22:14:16     INFO -  "CONSOLE_MESSAGE: (info) 1475644159864	Toolkit.Telemetry	TRACE	TelemetrySend::Scheduler::triggerSendingPings - active send task: true, immediately: true"

was called before 

 22:14:16     INFO -  "CONSOLE_MESSAGE: (info) 1475644160863	Toolkit.Telemetry	TRACE	TelemetrySend::Scheduler::_doSendTask - bailing out after sending, shutdown: true, pendingPingCount: 1"
 8247 

That means the new ping sending could not be triggered if the old _doSendTask is still running.

https://dxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/TelemetrySend.jsm#373

I try to stop the sendtask in the patch and it works for me locally.
Attachment #8801036 - Flags: review?(gfritzsche)
Hi :gfritzsche,
Could you please take a look at this, or do you know would good to look at this?
Thanks for your help.
(In reply to Thomas Nguyen[:tnguyen] (use ni? plz) from comment #37)
> Hi :gfritzsche,
> Could you please take a look at this, or do you know would good to look at
> this?
> Thanks for your help.

It might be interesting to see what ping is being held and not sent, since we're waiting for the send activity to settle a few lines above ( the |testWaitOnOutgoingPings| code at line 170).
(In reply to Alessio Placitelli [:Dexter] from comment #38)
> (In reply to Thomas Nguyen[:tnguyen] (use ni? plz) from comment #37)
> > Hi :gfritzsche,
> > Could you please take a look at this, or do you know would good to look at
> > this?
> > Thanks for your help.
> 
> It might be interesting to see what ping is being held and not sent, since
> we're waiting for the send activity to settle a few lines above ( the
> |testWaitOnOutgoingPings| code at line 170).

I'm not quite familiar with Telemetry things, fwik, testWaitOnOutgoingPings works. After changing the pref, the deletion ping will be submitted again and sending keeps fails because server stopped ( I saw some back-offs and resending in doSendTask). 
Then looking into that TelemetryController.testReset()

https://dxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/TelemetryController.jsm#932

This will:
- shutdown the send task 
https://dxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/TelemetryController.jsm#949

- And resend again, somehow in 
https://dxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/TelemetryController.jsm#947

We may met the timing issue that shutdown is done before resending again and that is this intermittent.
I don't know what's good solution here, probably change the order of these 2 lines, or maybe shutdown the sending task (in the attached patch).
Attachment #8801036 - Flags: review?(gfritzsche) → review?(alessio.placitelli)
Comment on attachment 8801036 [details] [diff] [review]
Intermittent test_TelemetryController.js test timed out

Review of attachment 8801036 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good with the comment changed, it deals with the symptoms, which is ok for this bug. I've filed bug 1312699 to deal with the underlying cause of this problem.

Before landing the patch, would you mind pushing it to try and trigger the xpcshell tests 10 times or more for each affected platform? Just to make sure this is effectively attacking the intermittent.

Let's not land this without this confirmation.

::: toolkit/components/telemetry/tests/unit/test_TelemetryController.js
@@ +184,5 @@
>    // We set the new server using the pref, otherwise it would get reset with
>    // |TelemetryController.testReset|.
>    Preferences.set(TelemetryController.Constants.PREF_SERVER, "http://localhost:" + PingServer.port);
>  
> +  // Stop sending task, and then the sending task could be resolved and submit new ping to new server.

Could you please change that to "Stop the sending task and then start it again."?
Attachment #8801036 - Flags: review?(alessio.placitelli) → review+
Flags: needinfo?(alessio.placitelli)
MozReview-Commit-ID: 4KnTmvEyuFG
Attachment #8801036 - Attachment is obsolete: true
Comment on attachment 8804620 [details] [diff] [review]
Intermittent test_TelemetryController.js test timed out

carry r+
Attachment #8804620 - Flags: review+
Keywords: checkin-needed
Running 10 times on try, result in comment 42 looks good. :)
(In reply to Thomas Nguyen[:tnguyen] (use ni? plz) from comment #45)
> Running 10 times on try, result in comment 42 looks good. :)

Good job!
For future reference, it's preferred that the commit message summarize what the patch is actually doing rather than restating the problem being solved.
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/8b98dca70dd1
Intermittent test_TelemetryController.js test timed out. r=dexter
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/8b98dca70dd1
Status: ASSIGNED → RESOLVED
Closed: 6 years ago5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
I apparently forgot to mark the bug when uplifting this to Beta a couple weeks ago.
https://hg.mozilla.org/releases/mozilla-beta/rev/089d4b62f227
You need to log in before you can comment on or make changes to this bug.