Closed Bug 1535100 Opened 9 months ago Closed 9 months ago

Rework RTCDTMFSender-ontonechange tests to use cumulative time when checking whether tonechange events are happening too early

Categories

(Core :: WebRTC: Signaling, enhancement, P2)

enhancement

Tracking

()

RESOLVED FIXED
mozilla67
Tracking Status
firefox67 --- fixed

People

(Reporter: bwc, Assigned: bwc)

Details

Attachments

(1 file)

The reason why these tests are flaky right now seems to be because when there is a delay between setting the timer for the next tone, and calling the tonechange event handler, the next time the tonechange handler is called will appear to happen too soon. An example:

0ms: Schedule next tone to fire in 170ms.
... some processing delay
2ms: tonechange handler called, remembers time, expects next call in 170ms, at t=172ms
170ms: Next tone fires, right on time
170ms: tonechange handler called, and test is sad

This is a bit of a mess. The way this test is written, all of the test-cases run simultaneously, because waiting for tone events to fire is kinda time-consuming. Unfortunately, that means that we're creating a bunch of PCs (which involves quite a bit of time) which significantly delays the first tonechange callbacks (by about 200ms). This padding is more than enough to hide errors in test-cases that do fire events way too early.

Almost works. Seeing intermittent nullptr crashes on linux32 debug only. Looking into it.

Ok... seeing a nullptr crash entirely in binding code. |mImpl| is null here:

void
RTCRtpTransceiver::SetStopped(ErrorResult& aRv, JS::Realm* aRealm)
{
return mImpl->SetStopped(aRv, aRealm);
}

The binding code never nulls out |mImpl| (except in d'tor code maybe), but the cycle collector might. This is very weird.

I should point out that this happens only on the non-e10s tests; this seems to be why it happens only on linux32 debug. It would be nice if there were some way to coerce try into running some non-e10s tests on ASAN...

FWIW, I cannot reproduce this (on non-e10s) on linux ASAN locally.

Ok, bug 1531448 is definitely the culprit here. I wonder why it is not safe to call that code in close()?

Dunno why mach try is putting links in here that don't work, here is one that does.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=f57adc1a45ec

Ok, so the crash in comment 5 is happening when we close the pc due to the window going away. Evidently, we're in the middle of a cycle collection, and some of the stuff in the cycle has already been cleaned up. This particular problem is easy enough to work around (see patches in comment 14), but I'm left with a nagging feeling that this sort of bug is liable to happen again.

Pushed by bcampen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d57f48d6e42b
Re-enable and de-flake the RTCDTMFSender-ontonechange tests. r=jib
Status: NEW → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/16005 for changes under testing/web-platform/tests
Upstream PR was closed without merging
You need to log in before you can comment on or make changes to this bug.