Closed
Bug 1388309
Opened 7 years ago
Closed 7 years ago
Video decoding is not recovered from GPU process crash.
Categories
(Core :: Audio/Video: Playback, defect, P1)
Tracking
()
VERIFIED
FIXED
mozilla57
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox55 | --- | wontfix |
firefox56 | + | verified |
firefox57 | --- | verified |
People
(Reporter: kaku, Assigned: kaku)
References
(Blocks 1 open bug)
Details
(Keywords: regression, Whiteboard: [testcoverage] )
Attachments
(1 file, 3 obsolete files)
59 bytes,
text/x-review-board-request
|
mattwoodrow
:
review+
lizzard
:
approval-mozilla-beta+
|
Details |
Steps to reproduce: * (Optional) Set media.suspend-bkgnd-video.enabled to false in about:config. * Play accelerated video (e.g. H.264 on YouTube). * Kill compositor process. Expected results: Video keeps playing. Actual results: Some error message is shown on the YT player. Regression window: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=26a0e8b9f27895eb55e21f53e3f086bbf809472a&tochange=5c9b668ee568bd506fe720868e3c018e2b71eb89 Probably be a regression of bug 1338011.
Assignee | ||
Updated•7 years ago
|
Assignee | ||
Comment 1•7 years ago
|
||
Brief analysis: When the GPU process crashes, we set GPUProcessManager::mDecodeVideoOnGpuProcess = false here: http://searchfox.org/mozilla-central/rev/bd39b6170f04afeefc751a23bb04e18bbd10352b/gfx/ipc/GPUProcessManager.cpp#508 And then, the GPUProcessManager will never create PVideoDecoderManager actors again: http://searchfox.org/mozilla-central/rev/bd39b6170f04afeefc751a23bb04e18bbd10352b/gfx/ipc/GPUProcessManager.cpp#947 But we keep using RemoteDecoderModule which tries to initialize a VidoeDecoderChild on the content process and then fails. So that, no more video decoding.
Assignee | ||
Comment 2•7 years ago
|
||
Hi Gerald, Do you think comment 1 is the right root cause?
Flags: needinfo?(gsquelart)
Updated•7 years ago
|
status-firefox55:
--- → affected
status-firefox56:
--- → affected
Keywords: regression
Priority: -- → P1
Comment 3•7 years ago
|
||
[Tracking Requested - why for this release]: Video stop playing when GPU crashes.
tracking-firefox56:
--- → ?
Whiteboard: [testcoverage]
I'm afraid I don't know that much about this code. But your analysis in comment 1 makes sense. It means we should stop creating RemoteVideoDecoder's, e.g.: either remove RemoteDecoderModule from PDMFactory::mCurrentPDMs, or somehow make RemoteDecoderModule::Supports() always return false so it won't be used to create decoders anymore. Matt, you worked on the GPU process and remote video decoder -- ideas, suggestions?
Flags: needinfo?(gsquelart) → needinfo?(matt.woodrow)
Comment 5•7 years ago
|
||
RemoteDecoderModule::CreateVideoDecoder already has a fallback path to the normal PDM that gets taken when the GPU process no longer exists. Can we just add a way to check for the state where there is a GPU process, the remote decode pref is true, but we still don't allow remote decoding? VideoDecoderManagerChild::Open should have been called (with a null endpoint) when this happens, and CreateVideoDecoder does a sync task to the same thread, so we should be able to check state there, and gracefully handle failure. VideoDecoderChild::InitIPDL intentionally pretends that it succeeded if the manager is unavailable, to handle the case where we try create a new decoder in the middle of a GPU process crash. The new 'fake' decoder will return NS_ERROR_DOM_MEDIA_NEED_NEW_DECODER, trigger a recreation, and hopefully the manager will be recreated by then. We want to distinguish between the manager being unavailable because we've just crashed and are working on reinitializing (where we return true), and recreation being completed but we've just stopped supporting remote video (return false, and then fallback to the other PDM).
Flags: needinfo?(matt.woodrow)
Matt's suggestion in comment 5 seems reasonable. Anthony, I'm assuming we don't want to back out bug 1338011 that regressed this?
Flags: needinfo?(milan) → needinfo?(ajones)
Assignee | ||
Comment 8•7 years ago
|
||
(In reply to Matt Woodrow (:mattwoodrow) from comment #5) > RemoteDecoderModule::CreateVideoDecoder already has a fallback path to the > normal PDM that gets taken when the GPU process no longer exists. > > Can we just add a way to check for the state where there is a GPU process, > the remote decode pref is true, but we still don't allow remote decoding? Do we really want to stop remote decodding in this case? I think bug 1338011 wants a software decoder in the GPU process, right?
Comment 9•7 years ago
|
||
(In reply to Tzuhao Kuo [:kaku] (PTO on 8/9) from comment #8) > (In reply to Matt Woodrow (:mattwoodrow) from comment #5) > > RemoteDecoderModule::CreateVideoDecoder already has a fallback path to the > > normal PDM that gets taken when the GPU process no longer exists. > > > > Can we just add a way to check for the state where there is a GPU process, > > the remote decode pref is true, but we still don't allow remote decoding? > Do we really want to stop remote decodding in this case? > I think bug 1338011 wants a software decoder in the GPU process, right? I don't think so, it doesn't restart the video IPC channel, so we can't have remote decoders. Running software decoding in the GPU process would work too, but that's basically backing out bug 1338011 and starting again. I don't have a strong opinion as to which way we go.
(In reply to Tzuhao Kuo [:kaku] (PTO on 8/9) from comment #8) > Do we really want to stop remote decodding in this case? > I think bug 1338011 wants a software decoder in the GPU process, right? Good point. I would argue that the main reason we wanted decoding to happen in the GPU process, was to catch OS&driver issues for which we cannot do anything; whereas software decoding is done by our software, and so if there are crashes we should be able to fix them. Of course, if we can do software decoding in the GPU process, it would be nice too, for better isolation. In any case, whether or not we disable remote decoding, we will need to disable hardware decoding, otherwise we'll keep using the same drivers that may have just crashed the GPU process! (So my suggestion in comment 4 was incomplete too, sorry about that.) So in summary I now see two options: - `mDecodeVideoOnGpuProcess` should instead be `mHardwareDecodeVideoOnGpuProcess`, and when it's off, the GPU process would still start the video IPC channel, but from now on the remote decoder should only create software-only decoders in the GPU process. - If mDecodeVideoOnGpuProcess becomes false, we would *both* prevent remote decoding, and only create software-only decoders in the main/content process. What do you think?
Assignee | ||
Comment 11•7 years ago
|
||
(In reply to Gerald Squelart [:gerald] from comment #10) > I would argue that the main reason we wanted decoding to happen in the GPU > process, was to catch OS&driver issues for which we cannot do anything; > whereas software decoding is done by our software, and so if there are > crashes we should be able to fix them. > Of course, if we can do software decoding in the GPU process, it would be > nice too, for better isolation. I agree that "catching software decoder crashing" is not a strong reason to have a software decoder in the GPU process. But, does "software decode in GPU process" benefit performance? I suppose the answer is "not so much" as the most costing part is copying software decoder's result into GPU memory instead of communication between content process and GPU process. However, I don't know much about how decoder and compositor work together, so I still raise this question. > So in summary I now see two options: > - `mDecodeVideoOnGpuProcess` should instead be > `mHardwareDecodeVideoOnGpuProcess`, and when it's off, the GPU process would > still start the video IPC channel, but from now on the remote decoder should > only create software-only decoders in the GPU process. > - If mDecodeVideoOnGpuProcess becomes false, we would *both* prevent remote > decoding, and only create software-only decoders in the main/content process. > > What do you think? At this moment, I think we should have a fix to make the software decoder work on the main/content process and uplift it. Whether have software decoder in GPU process should be left to another bug.
Assignee | ||
Comment 12•7 years ago
|
||
(In reply to Tzuhao Kuo [:kaku] (PTO on 8/9) from comment #11) > At this moment, I think we should have a fix to make the software decoder > work on the main/content process and uplift it. Will try to work out a patch and look for the review. However, if someone who is much more familiar with this code and want to provide a quick fix, please feel free to take over.
Sounds good, thanks. In case it's quite difficult, you could just remove the mDecodeVideoOnGpuProcess-related code and uplift that to beta as a quick fix; and then less urgently, do a proper rework in Nightly.
Comment 14•7 years ago
|
||
Looks like we won't fix 55!
(In reply to Sylvestre Ledru [:sylvestre] from comment #14) > Looks like we won't fix 55! We're failing to play video in cases where we used to crash, so I don't expect many users will see this issue.
Flags: needinfo?(ajones)
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Updated•7 years ago
|
Attachment #8897249 -
Flags: review?(matt.woodrow)
Attachment #8897250 -
Flags: review?(matt.woodrow)
Attachment #8897251 -
Flags: review?(matt.woodrow)
Attachment #8897252 -
Flags: review?(matt.woodrow)
Comment 21•7 years ago
|
||
mozreview-review |
Comment on attachment 8897249 [details] Bug 1388309 P1 - let VideoDecoderManagerChild keep the info of whether decoding video on GPU process or not; https://reviewboard.mozilla.org/r/168110/#review175678 ::: dom/media/ipc/VideoDecoderManagerChild.h:30 (Diff revision 1) > static VideoDecoderManagerChild* GetSingleton(); > > // Can be called from any thread. > static nsIThread* GetManagerThread(); > static AbstractThread* GetManagerAbstractThread(); > + static bool IsDecodeVideoOnGPUProcess(); This is under the comment of 'can be called from any thread', but I think it'd be racy to call it from any other thread. I'm pretty sure we only touch it on the manager thread, so can you move it up next to GetSingleton? Can also add an assertion into the impl to make sure we don't call it on other threads. ::: dom/media/ipc/VideoDecoderManagerChild.cpp:74 (Diff revision 1) > []() { > if (sDecoderManager && > sDecoderManager->CanSend()) { > sDecoderManager->Close(); > sDecoderManager = nullptr; > + sDecodeVideoOnGPUProcess = true; false? ::: dom/media/ipc/VideoDecoderManagerChild.cpp:154 (Diff revision 1) > + sDecodeVideoOnGPUProcess = aDecodeVideoOnGPUProcess; > + > // Make sure we always dispatch everything in sRecreateTasks, even if we > // fail since this is as close to being recreated as we will ever be. > sDecoderManager = nullptr; > if (aEndpoint.IsValid()) { Can't we just use this if condition to set sDecodeVideoOnGPUProcess? If it's valid (and binds succeeds), then we're decoding on the GPU process, if it's invalid, then the parent process must have decided not to decode on the GPU process. I think we can skip all of patches 2 and 3 if we do that.
Updated•7 years ago
|
Attachment #8897250 -
Flags: review?(matt.woodrow)
Attachment #8897251 -
Flags: review?(matt.woodrow)
Comment 22•7 years ago
|
||
mozreview-review |
Comment on attachment 8897252 [details] Bug 1388309 - let VideoDecoderChild::InitIPDL() return false if we don't want to decode video on GPU process; https://reviewboard.mozilla.org/r/168204/#review175680
Attachment #8897252 -
Flags: review?(matt.woodrow) → review+
Assignee | ||
Comment 23•7 years ago
|
||
mozreview-review-reply |
Comment on attachment 8897249 [details] Bug 1388309 P1 - let VideoDecoderManagerChild keep the info of whether decoding video on GPU process or not; https://reviewboard.mozilla.org/r/168110/#review175678 > This is under the comment of 'can be called from any thread', but I think it'd be racy to call it from any other thread. > > I'm pretty sure we only touch it on the manager thread, so can you move it up next to GetSingleton? > Can also add an assertion into the impl to make sure we don't call it on other threads. Done. > false? Right, thanks. > Can't we just use this if condition to set sDecodeVideoOnGPUProcess? > > If it's valid (and binds succeeds), then we're decoding on the GPU process, if it's invalid, then the parent process must have decided not to decode on the GPU process. > > I think we can skip all of patches 2 and 3 if we do that. All right, your suggestion makes life easier, thanks!
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Updated•7 years ago
|
Attachment #8897250 -
Attachment is obsolete: true
Assignee | ||
Updated•7 years ago
|
Attachment #8897251 -
Attachment is obsolete: true
Comment 26•7 years ago
|
||
mozreview-review |
Comment on attachment 8897252 [details] Bug 1388309 - let VideoDecoderChild::InitIPDL() return false if we don't want to decode video on GPU process; https://reviewboard.mozilla.org/r/168204/#review175754 Sorry, I just realised that sDecoderManager and sDecodeVideoOnGPUProcess now contain basically the same information. Can we just return false if (!manager), and return true if (!manager->CanSend()) ?
Assignee | ||
Comment 27•7 years ago
|
||
(In reply to Matt Woodrow (:mattwoodrow) from comment #26) > Comment on attachment 8897252 [details] > Bug 1388309 P2 - let VideoDecoderChild::InitIPDL() return false if we don't > want to decode video on GPU process; > > https://reviewboard.mozilla.org/r/168204/#review175754 > > Sorry, I just realised that sDecoderManager and sDecodeVideoOnGPUProcess now > contain basically the same information. > > Can we just return false if (!manager), and return true if > (!manager->CanSend()) ? Quite right, thanks for reminding. VideoDecoderManagerChild::ActorDestroy() only makes the VideoDecoderManagerChild::mCanSend = false and does not delete the instance. Will update the patch.
Comment hidden (mozreview-request) |
Assignee | ||
Updated•7 years ago
|
Attachment #8897249 -
Attachment is obsolete: true
Attachment #8897249 -
Flags: review?(matt.woodrow)
Assignee | ||
Comment 29•7 years ago
|
||
(In reply to Tzuhao Kuo [:kaku] from comment #28) > Comment on attachment 8897252 [details] > Bug 1388309 - let VideoDecoderChild::InitIPDL() return false if we don't > want to decode video on GPU process; > > Review request updated; see interdiff: > https://reviewboard.mozilla.org/r/168204/diff/2-3/ Not sure if you want to review it again, and don't know how to re-request review from MozReview, so set the NI request here. Please have a look, thanks!
Flags: needinfo?(matt.woodrow)
Assignee | ||
Comment 31•7 years ago
|
||
Thanks for the review!
Comment 32•7 years ago
|
||
Pushed by tkuo@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/7be3f7ba4cda let VideoDecoderChild::InitIPDL() return false if we don't want to decode video on GPU process; r=mattwoodrow
Comment 33•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/7be3f7ba4cda
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox57:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Updated•7 years ago
|
Assignee: nobody → kaku
Assignee | ||
Comment 35•7 years ago
|
||
(In reply to Gerry Chang [:gchang] from comment #34) > Hi Kaku, > Do you consider to uplift this to Beta56? Yes, we do, but we haven't get a way to automatically test it. Keep the ni and will see how to sort out.
Comment 36•7 years ago
|
||
If figuring out a way to add automatic test takes a while, perhaps we could do manual test with SV's help for Beta and adding test to nightly.
Updated•7 years ago
|
status-firefox-esr52:
--- → unaffected
Assignee | ||
Comment 37•7 years ago
|
||
(In reply to Blake Wu [:bwu][:blakewu] from comment #36) > If figuring out a way to add automatic test takes a while, perhaps we could > do manual test with SV's help for Beta .... How could we have SV's help on Beta before the patch has been uplifted?
Flags: needinfo?(bwu)
Comment 38•7 years ago
|
||
Florin, Can you help us test it before we request a uplift to Beta?
Flags: qe-verify+
Flags: needinfo?(florin.mezei)
Flags: needinfo?(bwu)
Comment 39•7 years ago
|
||
(In reply to Blake Wu [:bwu][:blakewu] from comment #38) > Florin, > Can you help us test it before we request a uplift to Beta? Someone from Andrei's Release QA team will have a look.
Flags: needinfo?(florin.mezei) → needinfo?(andrei.vaida)
Comment 40•7 years ago
|
||
I managed to reproduce the issue on Firefox 57.0a1 (2017-08-08), under Windows 10x64. The issue is no longer reproducible on Firefox 57.0a1 (2017-09-12), under Windows 10x64.
Status: RESOLVED → VERIFIED
Flags: qe-verify+
Flags: needinfo?(andrei.vaida)
Assignee | ||
Comment 41•7 years ago
|
||
(In reply to Mihai Boldan, QA [:mboldan] from comment #40) > I managed to reproduce the issue on Firefox 57.0a1 (2017-08-08), under > Windows 10x64. > The issue is no longer reproducible on Firefox 57.0a1 (2017-09-12), under > Windows 10x64. Hello, may I ask for a verification on a 32bit build also? according to bug 1369097 comment 11.
Flags: needinfo?(mihai.boldan)
Comment 42•7 years ago
|
||
I managed to test this issue also on Firefox 57.0a1 (2017-09-12)(32-bit), under Windows 10x86 and under Windows 10x64. Error is no longer displayed. Video continues to play after killing GPU process. Please let me know if I can help any further with this bug.
Flags: needinfo?(mihai.boldan)
Comment 43•7 years ago
|
||
Can we please get an uplift request on this ASAP?
Assignee | ||
Comment 44•7 years ago
|
||
Comment on attachment 8897252 [details] Bug 1388309 - let VideoDecoderChild::InitIPDL() return false if we don't want to decode video on GPU process; Approval Request Comment [Feature/Bug causing the regression]: bug 1338011 [User impact if declined]: video play won't be resumed if GPU process crashes. [Is this code covered by automated tests?]: no [Has the fix been verified in Nightly?]: yes [Needs manual test from QE? If yes, steps to reproduce]: done, comment 40, comment 42 [List of other uplifts needed for the feature/fix]: no [Is the change risky?]: no [Why is the change risky/not risky?]: The patch only add a case to return false in VideoDecoderChild::InitIPDL(). The false return value triggers the media playback to try other platform decoder module. It's not risky because the try-other-platform-decoder-module mechanism has been in the code base for a very long time and works well. [String changes made/needed]: no
Flags: needinfo?(kaku)
Attachment #8897252 -
Flags: approval-mozilla-beta?
Comment 45•7 years ago
|
||
Comment on attachment 8897252 [details] Bug 1388309 - let VideoDecoderChild::InitIPDL() return false if we don't want to decode video on GPU process; Better recovery from GPU process crashes sounds good. Fix has been verified in nightly. Let's uplift this for beta 12.
Attachment #8897252 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 46•7 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/3de2b4f255b3
Comment 47•7 years ago
|
||
Mihai, could you also verify this fix on beta 12?
Flags: needinfo?(mihai.boldan)
Comment 48•7 years ago
|
||
Video continues to play correctly after ending the compositor process on Firefox 56.0b12 x32 and on Firefox 56.0b12 x64 builds. Tests were performed under Windows 10x64.
Flags: needinfo?(mihai.boldan)
Comment 49•7 years ago
|
||
Seems that it is causing bug 1374231 We should backout that 56 asap. Ryan, wdyt?
Flags: needinfo?(ryanvm)
Comment 50•7 years ago
|
||
I don't think this should be backout, it's to prevent crash after all. No crash trace involved this part of the code. Additionally, we don't care that the GPU process crash, as it will default back to software decoding which is exactly what we want.
Updated•7 years ago
|
Flags: needinfo?(ryanvm)
You need to log in
before you can comment on or make changes to this bug.
Description
•