Closed Bug 1875095 Opened 1 year ago Closed 1 year ago

Shutdown hang intermittent failures with browser wmfme crash tests

Categories

(Core :: Audio/Video: Playback, defect)

Desktop
Windows 11
defect

RESOLVED DUPLICATE of bug 1879375

People

(Reporter: yannis, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Whiteboard: [win:stability])

The wmfme crash tests, such as dom/media/test/browser/wmfme/browser_wmfme_crash.js, fail intermittently with a shutdown hang, at least on Windows 11. These failures occur on treeherder (see bug 1831236), and I can reproduce the browser_wmfme_crash.js shutdown hang on my Windows 11 machine in roughly 1 out of 3 runs with a debug build. It can probably also happen in release builds, but I have not verified that.

My understanding is that the main process creates an IPC channel between a VideoBridgeChild and a VideoBridgeParent in UtilityAudioDecoderChild::CreateVideoBridge(). The VideoBridgeChild lives in an MF Media Engine CDM utility process, while the VideoBridgeParent lives in the GPU process. In the test we intentionally kill the utility process and expect video playback to recover by setting up a new video bridge IPC channel between the GPU process and a new utility process.

Here is what I think I observe when I have a shutdown hang:

  • the main process shutdown is waiting for the GPU process to shut down;
  • the GPU process shutdown is waiting for the compositor thread shutdown;
  • the compositor thread shutdown is waiting for all CompositorThreadHolder references to have been released;
  • the original VideoBridgeParent object was never notified of the death of its VideoBridgeChild peer and is still alive: ActorDestroy has not been called for it;
  • this object has been replaced by the new VideoBridgeParent within (*videoBridgeFromProcess)[VideoBridgeSource::MFMediaEngineCDMProcess], so VideoBridgeParent::ShutdownInternal() won't call Close() for it either;
  • so this object still holds a CompositorThreadHolder reference that it will never release.

Therefore I wonder about the following:

  • What could lead the VideoBridgeParent to not get notified of the death of its peer? Can it be an issue with the way the video bridge classes are written? Or is it necessarily a bug in our IPC layer?

  • Independently of a potential IPC layer bug, in VideoBridgeParent::VideoBridgeParent, if we find that (*videoBridgeFromProcess)[aSource] is already populated, would it make sense to force a call to Close() on the object we find there?
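To make the hang chain and the second question concrete, here is a minimal toy model in C++. It is a sketch, not Gecko code: ThreadHolder, Bridge, GpuProcessModel and the closeStale flag are all hypothetical stand-ins, and the model assumes that a bridge releases its thread reference only when Close() is called on it.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Toy model only: none of these types are the real Gecko classes.
struct ThreadHolder {};  // stand-in for CompositorThreadHolder

enum class Source { GpuProcess, MFMediaEngineCDMProcess };

// Stand-in for VideoBridgeParent: keeps a ThreadHolder reference until closed.
class Bridge {
 public:
  explicit Bridge(std::shared_ptr<ThreadHolder> holder)
      : mHolder(std::move(holder)) {}
  void Close() { mHolder.reset(); }
  bool IsOpen() const { return mHolder != nullptr; }

 private:
  std::shared_ptr<ThreadHolder> mHolder;
};

// Stand-in for the GPU process side.
struct GpuProcessModel {
  std::shared_ptr<ThreadHolder> holder = std::make_shared<ThreadHolder>();
  // Actors stay alive until torn down, whether or not they are registered.
  std::vector<std::shared_ptr<Bridge>> liveActors;
  // One registry slot per source, like (*videoBridgeFromProcess)[aSource].
  std::map<Source, Bridge*> registry;

  // closeStale == false models the behavior described in this bug;
  // closeStale == true models the proposed Close() on the replaced entry.
  Bridge* Bind(Source source, bool closeStale) {
    auto it = registry.find(source);
    if (closeStale && it != registry.end()) {
      it->second->Close();
    }
    liveActors.push_back(std::make_shared<Bridge>(holder));
    registry[source] = liveActors.back().get();
    return registry[source];
  }

  // Closes only the bridges that are still registered.
  void ShutdownRegistered() {
    for (auto& entry : registry) {
      assert(entry.second != nullptr);
      entry.second->Close();
    }
  }

  // The model's shutdown completes only once no bridge holds a reference.
  bool CanFinishShutdown() const { return holder.use_count() == 1; }
};
```

In this model, binding twice for the same source with closeStale set to false leaves the first bridge open after ShutdownRegistered(), so CanFinishShutdown() stays false. With closeStale set to true, the replaced bridge drops its reference and shutdown can complete.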

Found it, reminds me of bug 1718210

See Also: → 1718210
Whiteboard: [win:stability]

This is probably a duplicate of bug 1805736. I can fix the hang on my machine by forcing a first call to PVideoBridgeParent::SendPing in VideoBridgeParent::Bind, because that forces the VideoBridgeParent to notice that the other end is dead. I can propose a patch that does that in the current bug and we'll see if that fixes the intermittent failures as well.
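The idea behind the SendPing workaround can be pictured with another toy model. This is only a sketch under a strong assumption: in the model, the parent learns that its peer is dead only when a send fails. Channel, Bridge and HoldsThreadRef are hypothetical names, and the real IPC behavior is more involved.

```cpp
#include <cassert>

// Hypothetical stand-in for an IPC channel whose peer may already be gone.
struct Channel {
  bool peerAlive = true;
};

// Stand-in for VideoBridgeParent in this toy model.
class Bridge {
 public:
  explicit Bridge(Channel* channel) : mChannel(channel) {
    assert(channel != nullptr);
  }

  // Toy ping: a failed send is the only way this model notices a dead peer.
  bool SendPing() {
    if (!mChannel->peerAlive) {
      ActorDestroy();
      return false;
    }
    return true;
  }

  bool HoldsThreadRef() const { return mHoldsThreadRef; }

 private:
  // Teardown releases the (modeled) compositor thread reference.
  void ActorDestroy() { mHoldsThreadRef = false; }

  Channel* mChannel;
  bool mHoldsThreadRef = true;
};
```

If the peer dies before the first message, the bridge in this model keeps its reference until SendPing() is called, at which point the failed send triggers teardown.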

In the long run this looks like an IPC layer bug. I will file a new bug to track it, with more details. Once the IPC layer bug is solved it will not be necessary to call PVideoBridgeParent::SendPing anymore.

See Also: → 1805736
Flags: needinfo?(alwu)
Depends on: 1878607

We confirmed with :nika that the root cause for this issue is in the IPC layer.

Per comment 3, we can mark this bug as a duplicate of bug 1879375.

Status: NEW → RESOLVED
Closed: 1 year ago
Duplicate of bug: 1879375
Flags: needinfo?(alwu)
Resolution: --- → DUPLICATE