Open Bug 1212460 Opened 9 years ago Updated 2 years ago

Intermittent test_peerConnection_localReofferRollback.html | Test timed out.

Categories

(Core :: WebRTC, defect, P2)

defect

Tracking

()

People

(Reporter: KWierso, Unassigned)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

Rank: 35
Priority: -- → P3
backlog: --- → webrtc/webaudio+
That patch was moved over from bug 1273314 where it was causing permafails on Android. Especially of this intermittent on debug. See https://treeherder.mozilla.org/#/jobs?repo=try&revision=14033d9d78b5.

Nils, this patch (and perhaps rev) should help debugging this. I'm also increasing the priority since it went permafail so easily/randomly.
Rank: 35 → 18
Flags: needinfo?(drno)
Priority: P3 → P1
(In reply to Nils Ohlmeier [:drno] from comment #8)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=1ae8d883aaf1

Apparently we wait for several minutes for outgoing packets to show up. Lets see if full logs will tell us more...
Flags: needinfo?(drno)
Hmm this looks to me like timers are pretty off/busted on Android.

We sleep for 15 seconds from 23:29:24 till 23:29:39, which according to the new patch should have been 1 second.
And then we execute the same check 5 time within the same second. Only to sleep for another 14 seconds.

Or maybe the system is simply overloaded?

[task 2016-10-11T23:29:24.910732Z] 23:29:24     INFO -  6429 INFO Track {da0e59e0-86e8-485c-9299-3697a9bf31bc} has 301 inboundrtp RTP packets.
[task 2016-10-11T23:29:24.910865Z] 23:29:24     INFO -  6430 INFO TEST-PASS | dom/media/tests/mochitest/test_peerConnection_localReofferRollback.html | RTP flowing for track {da0e59e0-86e8-485c-9299-3697a9bf31bc}
[task 2016-10-11T23:29:24.910947Z] 23:29:24     INFO -  6431 INFO TEST-PASS | dom/media/tests/mochitest/test_peerConnection_localReofferRollback.html | Should have RTP stats for track {52c51863-7e4b-4bf9-be81-f6146cb856a1}
[task 2016-10-11T23:29:39.879500Z] 23:29:39     INFO -  6432 INFO Track {52c51863-7e4b-4bf9-be81-f6146cb856a1} has 0 outboundrtp RTP packets.
[task 2016-10-11T23:29:39.879663Z] 23:29:39     INFO -  6433 INFO TEST-FAIL | dom/media/tests/mochitest/test_peerConnection_localReofferRollback.html | The author of the test has indicated that flaky timeouts are expected.  Reason: WebRTC inherently depends on timeouts
[task 2016-10-11T23:29:39.880913Z] 23:29:39     INFO -  6434 INFO TEST-PASS | dom/media/tests/mochitest/test_peerConnection_localReofferRollback.html | Should have RTP stats for track {52c51863-7e4b-4bf9-be81-f6146cb856a1}
[task 2016-10-11T23:29:39.880995Z] 23:29:39     INFO -  6435 INFO Track {52c51863-7e4b-4bf9-be81-f6146cb856a1} has 0 outboundrtp RTP packets.
[task 2016-10-11T23:29:39.881602Z] 23:29:39     INFO -  6436 INFO TEST-FAIL | dom/media/tests/mochitest/test_peerConnection_localReofferRollback.html | The author of the test has indicated that flaky timeouts are expected.  Reason: WebRTC inherently depends on timeouts
[task 2016-10-11T23:29:39.881666Z] 23:29:39     INFO -  6437 INFO TEST-PASS | dom/media/tests/mochitest/test_peerConnection_localReofferRollback.html | Should have RTP stats for track {52c51863-7e4b-4bf9-be81-f6146cb856a1}
[task 2016-10-11T23:29:39.881706Z] 23:29:39     INFO -  6438 INFO Track {52c51863-7e4b-4bf9-be81-f6146cb856a1} has 0 outboundrtp RTP packets.
[task 2016-10-11T23:29:39.881808Z] 23:29:39     INFO -  6439 INFO TEST-FAIL | dom/media/tests/mochitest/test_peerConnection_localReofferRollback.html | The author of the test has indicated that flaky timeouts are expected.  Reason: WebRTC inherently depends on timeouts
[task 2016-10-11T23:29:39.881916Z] 23:29:39     INFO -  6440 INFO TEST-PASS | dom/media/tests/mochitest/test_peerConnection_localReofferRollback.html | Should have RTP stats for track {52c51863-7e4b-4bf9-be81-f6146cb856a1}
[task 2016-10-11T23:29:39.881993Z] 23:29:39     INFO -  6441 INFO Track {52c51863-7e4b-4bf9-be81-f6146cb856a1} has 0 outboundrtp RTP packets.
[task 2016-10-11T23:29:39.882110Z] 23:29:39     INFO -  6442 INFO TEST-FAIL | dom/media/tests/mochitest/test_peerConnection_localReofferRollback.html | The author of the test has indicated that flaky timeouts are expected.  Reason: WebRTC inherently depends on timeouts
[task 2016-10-11T23:29:39.882173Z] 23:29:39     INFO -  6443 INFO TEST-PASS | dom/media/tests/mochitest/test_peerConnection_localReofferRollback.html | Should have RTP stats for track {52c51863-7e4b-4bf9-be81-f6146cb856a1}
[task 2016-10-11T23:29:39.882214Z] 23:29:39     INFO -  6444 INFO Track {52c51863-7e4b-4bf9-be81-f6146cb856a1} has 0 outboundrtp RTP packets.
[task 2016-10-11T23:29:39.882277Z] 23:29:39     INFO -  6445 INFO TEST-FAIL | dom/media/tests/mochitest/test_peerConnection_localReofferRollback.html | The author of the test has indicated that flaky timeouts are expected.  Reason: WebRTC inherently depends on timeouts
[task 2016-10-11T23:29:55.101075Z] 23:29:55     INFO -  6446 INFO TEST-PASS | dom/media/tests/mochitest/test_peerConnection_localReofferRollback.html | Should have RTP stats for track {52c51863-7e4b-4bf9-be81-f6146cb856a1}
[task 2016-10-11T23:29:55.101151Z] 23:29:55     INFO -  6447 INFO Track {52c51863-7e4b-4bf9-be81-f6146cb856a1} has 0 outboundrtp RTP packets.
(In reply to Nils Ohlmeier [:drno] from comment #11)
> Hmm this looks to me like timers are pretty off/busted on Android.
> 
> We sleep for 15 seconds from 23:29:24 till 23:29:39, which according to the
> new patch should have been 1 second.
> And then we execute the same check 5 time within the same second. Only to
> sleep for another 14 seconds.
> 
> Or maybe the system is simply overloaded?

It's definitely overloaded. What I don't understand is how in this situation requesting getStats *more often* can cause RTP packets to flow sooner? I'd be happy to file this under weird scheduling though.

Also see bug 1307042, where running the fake gUM audio device in the realtime audio callback (compared to being triggered by a timer at a tenth of the normal rate) causes a whole sleeve of timeout problems. Probably from cpu starvation.

If we could run the emulator at a tenth of the speed (so it appears overclocked by 10x) or something, that might work. Otherwise I think it makes sense to stop wasting time on it by disabling all peerConnection tests on android emulator.
See Also: → 1307042
(In reply to Nils Ohlmeier [:drno] from comment #11)
> Hmm this looks to me like timers are pretty off/busted on Android.
> 
> We sleep for 15 seconds from 23:29:24 till 23:29:39, which according to the
> new patch should have been 1 second.

Are you basing that only on the times in the main test log? Those times may not be what you think!

For Android, the harness is running on a host, periodically pulling the test log from the emulator and dumping it to the test log, where it gets timestamped. The harness pulls new data no faster than once every 10 seconds, so that may be the delay that you are seeing here.

To get more accurate timestamps, look at the Android logcat: In treeherder, in the Job Details pane, click on the link for the logcat-emulator-5554.log artifact. For the debug job from comment 10, that should be https://public-artifacts.taskcluster.net/PYIKubUBRJ-d6G2m20q76Q/0/public/test_info//logcat-emulator-5554.log - harder to read, but more accurate. Those timestamps are generated on device (emulator) and should reflect when Android saw the log write.


While I'm here, I have to mention, I've seen "AudioFlinger: BUFFER TIMEOUT" in a lot of these timeout bugs, and I keep wondering if that is important. https://code.google.com/p/android/issues/detail?id=58399 talks about a buffer size issue that may be specific to Android 4.3, which is what we run on the emulators.

Setting needinfo just to make sure you see this -- just trying to be helpful.
Flags: needinfo?(drno)
See comment 19
Flags: needinfo?(pehrson)
That's good info indeed. Now what would it take to get proper timestamps into the main log...
Flags: needinfo?(pehrson)
Thanks Geoff that was indeed super helpful. I looked at the trace log already in another case since I learned that here.

I agree with Andreas that it would be really helpful to have the real time stamps to show up in the main logs rather then same inaccurate values.

For this bug, lets see if bug 1313397 will help to mitigate this.
Depends on: 1313397
Flags: needinfo?(drno)
(In reply to Nils Ohlmeier [:drno] from comment #23)
> I agree with Andreas that it would be really helpful to have the real time
> stamps to show up in the main logs rather then same inaccurate values.

I agree too, but I'm not sure what to do about it. I don't fully understand why our log timestamps are the way they are, or what would be needed to get accurate timestamps. I have made a note of the issue in the long-neglected bug 1178554.
Attachment #8798838 - Flags: review?(drno)
Mass change P1->P2 to align with new Mozilla triage process
Priority: P1 → P2
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.