Closed Bug 898919 Opened 7 years ago Closed 7 years ago

[A/V] GENLOCK_IOC_DEADLOCK failed occur while youtube streaming

Categories

(Firefox OS Graveyard :: General, defect, P1)

ARM
Gonk (Firefox OS)

Tracking

(blocking-b2g:leo+, firefox26 unaffected, b2g18 fixed, b2g18-v1.0.0 wontfix, b2g18-v1.0.1 wontfix, b2g-v1.1hd fixed)

RESOLVED FIXED
1.1 QE5
blocking-b2g leo+
Tracking Status
firefox26 --- unaffected
b2g18 --- fixed
b2g18-v1.0.0 --- wontfix
b2g18-v1.0.1 --- wontfix
b2g-v1.1hd --- fixed

People

(Reporter: leo.bugzilla.gecko, Assigned: sotaro)

References

Details

(Whiteboard: [TD-68993][SR 01255148])

Attachments

(2 files, 6 obsolete files)

Precondition : Good wifi connection.
(I think this problem is hard to be reproduced when connection is not good)

STR
Play any video without UI (Without title and progress bar).

While playing, video frame is stuck for a while.
Sometimes, it stopped until I touch the screen to show UI.
And sometimes, it recovered after for a while, automatically.

FYI
Patch in bug 884188 is not applied on LG side because of regression.
I backed out each of bug 891238, bug 885345, bug 888184.
But I still see the symptom.
blocking-b2g: --- → leo+
Whiteboard: [TD-68993]
Strange. I think we saw something similar to this with bug 884188 in the build, but it was fixed when bug 884188 got backed out.
This could be different genlock error. I built a v1.1 leo ROM from today's source and confirmed that the following genlock. It happened at hw codec. It is different error compared to Bug 896286. 

-------------------------------------
07-29 11:53:19.489   142   595 E libgenlock: perform_lock_unlock_operation: GENLOCK_IOC_DREADLOCK failed (lockType0x1, err=Connection timed out fd=37)
07-29 11:53:19.489   142   595 E omx_vdec: Failed to acquire genlock, ret = 1
The genlock error in comment #2 is fixed in Bug 879297 in the past.
Leo can you answer the question?
- Is the status bar shown when doing "Play any video without UI"?
- Does it happen when the app play a local video?
- Can you attach logcat?
Flags: needinfo?(leo.bugzilla.gecko)
It needs to make clear which type of genlock happened at leo. Bug 896286 or Bug 879297.
Component: Gaia::Video → General
Need to confirm the problem happens at latest ROM on both hamachi(buri) and leo.
Keywords: qawanted
QA Contact: dkumar
I was not able to reproduce this issue on both Leo and Buri devices.Was able to play the video without any obstructions and the video frame did not get stuck in the process. Tried playing three 20 min videos on both the devices.

Environmental Variables
Leo Moz RIL Build ID: 20130729070226
Gecko: http://hg.mozilla.org/releases/mozilla-b2g18/rev/8135299f3efd
Gaia: 7aaffc8ccb6cf7ddd1e97943c108f1cb9eae5de0
Platform Version: 18.1
Firmware Version: D30008m

Leaving qawanted for someone else to try and reproduce it.
QA Contact: dkumar
Attached file logcat logs
Flags: needinfo?(leo.bugzilla.gecko)
(In reply to Sotaro Ikeda [:sotaro] from comment #4)
> Leo can you answer the question?
> - Is the status bar shown when doing "Play any video without UI"?
 Yes, Status bar is shown normally.
> - Does it happen when the app play a local video?
 I will check local file and inform you.
> - Can you attach logcat?
 Please check attachment #782835 [details].
perform_lock_unlock_operation: GENLOCK_IOC_DREADLOCK failed (lockType0x1, err=Connection timed out fd=48)
(In reply to Sotaro Ikeda [:sotaro] from comment #5)
> It needs to make clear which type of genlock happened at leo. Bug 896286 or
> Bug 879297.

From attachment 782835 [details], it becomes clear that the problem is  similar to Bug 879297.
Two possibility about this problem.
- Bug 879297 did not solve the all problem
- Regression happened by some reason.
When Bug 884188 is committed again, regenerate of this bug might be more difficult.
I am going to confirm if Bug 879297 still works correctly.
Assignee: nobody → sotaro.ikeda.g
By applying the patch, confirmed that Bug 879297 is still working correctly and check when "GENLOCK_IOC_DREADLOCK failed" happens.
By attachment 783380 [details] [diff] [review], confirmed that Bug 879297 is working correctly even when "GENLOCK_IOC_DREADLOCK failed" happened.
Attached file logcat until genlock fail 1 (obsolete) —
logcat of until genlock failure. Applied attachment 783380 [details] [diff] [review] to ROM.
Attached file logcat until genlock fail 2 (obsolete) —
logcat of until genlock failure. Applied attachment 783380 [details] [diff] [review] to ROM.
From the logs and investigations, the following become clear.
- Patch of Bug 879297 works correctly.
  After video frame rendering, a Texture was bound to mDummyGrallocBuffer correctly.
- genlock failure happened short time after rendering change from Open GL to Hw Composer.
- genlock failure happened always during Hw Composer rendering.
- Problem did not happen, when video rendering is restricted only to Hw Composer rendering(disabling video GL rendering) by modifying the ROM.
Whiteboard: [TD-68993] → [TD-68993] [SR 01255148]
I believe the intention of bug 884188 is to make it so all video playback frames (including youtube video) are always rendered with HW composer. If that's so, then this bug should be also solved.
Suplement to comment #18
- genlock failure continued until next OpenGL rendering.
(In reply to Diego Wilson [:diego] from comment #19)
> I believe the intention of bug 884188 is to make it so all video playback
> frames (including youtube video) are always rendered with HW composer. If
> that's so, then this bug should be also solved.

Even when bug 884188 is fixed, video is sometimes rendered by OpenGL, when user touches UI or UI changed by some timing on youtube web site.
There was alway a pattern in the logs before output genlock failure.
- ALL last OpenGL rendered video frame buffers before genlock failure were also rendered by Hw Composer.
Ah yes, you are correct.

I feel like this is a regressions. I'm pretty sure I tested switching back and forth between HWC and GPU back when we were fixing long youtube videos.
Yeah, I also feel this is a regression.
Flags: needinfo?(leo.bugzilla.gecko)
Flags: needinfo?(leo.bugzilla.gecko)
I also think that this is a regression.

This problem happen in current QE.
And in last QE (1th July ~ 12th July), I didn't report this kind of issue.
(It might not be detected then.)

So, I have checked changes from 1th July and backed out several suspicious commits.
However, I didn't figure out, yet.
Diego, do you have an assumption about the cause of the problem?
Flags: needinfo?(dwilson)
(In reply to leo.bugzilla.gecko from comment #25)
> I also think that this is a regression.
> 
> This problem happen in current QE.
> And in last QE (1th July ~ 12th July), I didn't report this kind of issue.
> (It might not be detected then.)

I checked 2 moz built ROMs(June/10, June/28), but still see the problem. And Bug 879297 is fixed on June/06. From them, I re-think that it is not a regression. The problem is just not recognized by us.

Actually, until Bug 879297 was reported by Diego, no one care about it. We had a lot of Youtube playback problems at that time and we might not care about it so much.
(In reply to Sotaro Ikeda [:sotaro] from comment #27)
> (In reply to leo.bugzilla.gecko from comment #25)
> > I also think that this is a regression.
> > 
> > This problem happen in current QE.
> > And in last QE (1th July ~ 12th July), I didn't report this kind of issue.
> > (It might not be detected then.)
> 
> I checked 2 moz built ROMs(June/10, June/28), 

Sorry, I failed to write ROM(June/10). So I confirmed only June/28. I am going to check other ROMs.
The genlock faileure happened on ROMs of June/11, June/14, June/28. v1.1 leo ROMs until June/10 have a problem(can not use data communication of WLAN/RIL), therefore I can not check the problem until June 10.
(In reply to Sotaro Ikeda [:sotaro] from comment #29)
> The genlock faileure happened on ROMs of June/11, June/14, June/28. v1.1 leo
> ROMs until June/10 have a problem

Correction:
v1.1 leo ROMs -> v1.1 leo ROMs stored at mozilla's repository.
Although I can not confirm the problem from June/06-June/10 because of the mozilla repository's v1.1 leo ROM's problem. From comment #29, I think it is not a regression. The problem was present and hidden by other youtube streaming problems.
From Comment 18, gecko works correctly to unbind the gralloc buffers from OpenGL texture. But the genlock failure happened at the situation witten in Comment 22.
And from Comment 29, the problem also happens at ROMS of June/11, June/14 June/28. Therefore I feel that it is not a gecko's defect.
Leo, how do you think about it?
Flags: needinfo?(leo.bugzilla.gecko)
I feel following STR is easier to re-generate the genlock failure.
-[1] Visit youtube web site by browser app
-[2] Select relatively long video and start playback
-[3] Sometimes change the playback position by seeking.

repeat [3] until genlock failure happens.
(In reply to Sotaro Ikeda [:sotaro] from comment #33)
> Leo, how do you think about it?

I cannot understand your comment.

So if this problem is not caused by Gecko..
Does it come from BSP or other side?
Flags: needinfo?(leo.bugzilla.gecko)
Patch from bug 887454 is not landed on LG side, yet.

Sotaro,
Did you see the same problem after using it?
Flags: needinfo?(sotaro.ikeda.g)
(In reply to leo.bugzilla.gecko from comment #35)
> (In reply to Sotaro Ikeda [:sotaro] from comment #33)
> > Leo, how do you think about it?
> 
> I cannot understand your comment.
> 
> So if this problem is not caused by Gecko..
> Does it come from BSP or other side?

Yes. From comment #32, I became clear that it is not a regression and gecko side correctly try to unbind the gralloc buffer. Somehow there is a case that qcom's side code does not unbind the buffer in the following case even when gecko side try to unbind the gralloc buffer.
- ALL last OpenGL rendered video frame buffers before genlock failure were also rendered by Hw Composer.
Flags: needinfo?(sotaro.ikeda.g)
We do not have a qcom's side code. We can not investigate about why qcom side does not unbind the gralloc buffer in the situation even when gecko side try to unbind the buffer.
(In reply to leo.bugzilla.gecko from comment #36)
> Patch from bug 887454 is not landed on LG side, yet.
> 
> Sotaro,
> Did you see the same problem after using it?

Yes.
(In reply to Sotaro Ikeda [:sotaro] from comment #39)
> (In reply to leo.bugzilla.gecko from comment #36)
> > Patch from bug 887454 is not landed on LG side, yet.
> > 
> > Sotaro,
> > Did you see the same problem after using it?
> 
> Yes.

I saw the problem both before and after applying the patch of bug 887454.
There is a difference the way of binding a gralloc buffer to a GL Texture. In android, SurfaceFlinger bind gralloc buffer very drawing update. It does not care that a Layer(Surface) will be rendered by hw composer or OpenGL.

On the other hand, Firefox OS bind a graloc buffer to GL Texture only when a Layer is drawn by OpenGL and then bind to a dummy gralloc buffer soon after the Open GL drawing.

This difference might affect to the problem.

http://androidxref.com/4.0.4/xref/frameworks/base/services/surfaceflinger/Layer.cpp#403
comment 34 has STR, so I'm pulling qawanted.
Keywords: qawanted
(In reply to Jason Smith [:jsmith] from comment #42)
> comment 34 has STR, so I'm pulling qawanted.

For the comfirmation of ROMs(June/11, June/14), I flashed only gecko and gaia, but not flash ril. Because there days of ROM's ril did not work correctly.
Summarize the analysis of the problem until now.

-[1] When gen lock failed happens, there were always a same pattern.
   + ALL last OpenGL rendered video frame buffers
     before genlock failure were also rendered by Hw Composer.
-[2] From gecko, only thing to unlock genlock of gralloc buffers
     they are bounded to OpenGL Texture is following.
   + Bind another gralloc buffer to the texture. 
-[3] A similar genlock problem was fixed in Bug 879297.
   + It still works correctly. 
   + It does [2] by using a dummy gralloc buffer.
-[4] It is not a regression.
   + Bug 879297 is fixed on June/06.
   + A similar genlock failure happens also on the ROMs of June/11, June/14 June/28.
-[5] The problem happens even when [3] works correctly.
I created the patch just to check if more agressive bind to dummy buffer could solve the problem. Because, the problem happen durin hw compsitor rendeirng and during the rendering no binding is done. The patch try to bind dummy buffer even during hw compositor rendeirng.

The Result:
The patch does not fix the problem.
bjacob, do you have any ideas what we can do from gecko side about this problem?
Flags: needinfo?(bjacob)
(Note: I am in France, so it's late today; I'm off tomorrow and on Tuesday; so if you need a quick reply, you might want to ask someone else, such as: Jeff Muizelaar, Peter Chang, or Nicolas Silva. Sorry!)
(In reply to Sotaro Ikeda [:sotaro] from comment #41)
> There is a difference the way of binding a gralloc buffer to a GL Texture.
> In android, SurfaceFlinger bind gralloc buffer very drawing update. It does
> not care that a Layer(Surface) will be rendered by hw composer or OpenGL.
> 
> On the other hand, Firefox OS bind a graloc buffer to GL Texture only when a
> Layer is drawn by OpenGL and then bind to a dummy gralloc buffer soon after
> the Open GL drawing.

Nicolas Silva has recently landed on mozilla-central a patch making us not use a dummy gralloc buffer anymore; so maybe he'll have ideas about that here too. Flagging 'needinfo'.
Flags: needinfo?(bjacob) → needinfo?(nical.bugzilla)
Addition to the summary of comment #44

- [6] The genlock problem do not happen when hw composer is disabled.
- [7] The genlock problem do not happen when vidoe frame is rendered only by HwComposer.
(In reply to Benoit Jacob [:bjacob] from comment #48)
> (In reply to Sotaro Ikeda [:sotaro] from comment #41)
> > There is a difference the way of binding a gralloc buffer to a GL Texture.
> > In android, SurfaceFlinger bind gralloc buffer very drawing update. It does
> > not care that a Layer(Surface) will be rendered by hw composer or OpenGL.
> > 
> > On the other hand, Firefox OS bind a graloc buffer to GL Texture only when a
> > Layer is drawn by OpenGL and then bind to a dummy gralloc buffer soon after
> > the Open GL drawing.
> 
> Nicolas Silva has recently landed on mozilla-central a patch making us not
> use a dummy gralloc buffer anymore; so maybe he'll have ideas about that
> here too. Flagging 'needinfo'.

IIRC the idea was to let the driver unbind the previous gralloc buffer when binding the next one to the same GL texture (which is what surface flinger does).

Now (m-c) we have one GL texture per texture unit globally (post-layers-refactoring it is stored in CompositorOGL, pre-refactoring the GL layer manager would be the best candidate) rather than having a GL texture per texture unit and per layer.

The changeset is: http://hg.mozilla.org/mozilla-central/rev/3c7c31aaea78

The motivation for this was that this behavior yields better performances than what we were doing before (and still do in b2g18). It was not meant to fix a particular bug so I don't know for sure that it would help in this case. Although, reading the comments it looks like the problem is that we are doing things in a way that the driver has not been designed for, so maybe doing a similar change could help (since it brings us closer to what SurfaceFlinger does)
Flags: needinfo?(nical.bugzilla)
(In reply to Sotaro Ikeda [:sotaro] from comment #44)
> -[1] When gen lock failed happens, there were always a same pattern.
>    + ALL last OpenGL rendered video frame buffers
>      before genlock failure were also rendered by Hw Composer.
> -[3] A similar genlock problem was fixed in Bug 879297.
>    + It still works correctly. 
>    + It does [2] by using a dummy gralloc buffer.

In bug 879297 the issue was that the GPU held a buffer while switching to HWC rendering. If [1] is correct, this means the new issue is that HWC is holding a buffer while switching to GPU rendering.
Flags: needinfo?(dwilson)
By using attachment 783380 [details] [diff] [review], I confirmed again that [1] happened on v1.1 leo. From comment #51, to analyze the problem, investigation of internal of HWC is necessary. I am not correct person for it. I un-assign myself from this bug.
Assignee: sotaro.ikeda.g → nobody
Jonas, CJ, who would be the the right person for "investigation of HWC is necessary"?
Flags: needinfo?(jonas)
Flags: needinfo?(cku)
I reminds that there is still a few thing need to check in gecko side. I am going to take the bug again a bit more.
Assignee: nobody → sotaro.ikeda.g
Flags: needinfo?(cku)
Loop in Morris, who is working on 4.3 HWC porting now
Current B2G do not call FramebufferNativeWindow::compositionComplete().
The patch adds a call to FramebufferNativeWindow::compositionComplete().
By applying attachment 789966 [details] [diff] [review] to v1.1 leo, I still do can not reproduce the genlock failure until now.
On ROM with attachment 789966 [details] [diff] [review], the genlock failure did not happen on my leo device by STR in comment #34. It seems that the patch fix the problem.
Add Comments and nit fix.
Attachment #787710 - Attachment is obsolete: true
Attachment #789966 - Attachment is obsolete: true
Comment on attachment 790230 [details] [diff] [review]
patch v2 - Add calling compositionComplete()

jrmuizel, can I have a feed back for the patch?
Attachment #790230 - Flags: feedback?(jmuizelaar)
I tested the patch (In reply to Sotaro Ikeda [:sotaro] from comment #59)
> On ROM with attachment 789966 [details] [diff] [review], the genlock failure
> did not happen on my leo device by STR in comment #34. It seems that the
> patch fix the problem.

Today, I tested longer duration than yesterday. Still the genlock failure did not happen.
(In reply to Sotaro Ikeda [:sotaro] from comment #57)
> Created attachment 789966 [details] [diff] [review]
> patch - Add calling compositionComplete()
> 
> The patch adds a call to FramebufferNativeWindow::compositionComplete().

That's quite the catch there Sotaro! How did you figure this out? Looking at surfaceflinger?
(In reply to Diego Wilson [:diego] from comment #63)
> That's quite the catch there Sotaro! How did you figure this out? Looking at
> surfaceflinger?

Yes, I compared B2G to the ICS's SurfaceFlinger.
Attachment #790230 - Flags: feedback?(jmuizelaar) → feedback+
Attachment #790230 - Flags: review?(bgirard)
Comment on attachment 790230 [details] [diff] [review]
patch v2 - Add calling compositionComplete()

Review of attachment 790230 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with the following changed.

::: gfx/layers/opengl/LayerManagerOGL.cpp
@@ +1043,5 @@
>  
> +#ifdef MOZ_WIDGET_GONK
> +  // Should be called when composition rendering is complete for a frame.
> +  // Required by certain older drivers for synchronization.
> +  // XXX Recent HWCs do not need the call

What happens to HWCs that don't need this when you call it? It should be noted in the comment here because it leaves the reader wondering.

This code should be moved to mWidget DrawWindowOverlay.
Attachment #790230 - Flags: review?(bgirard) → review+
(In reply to Benoit Girard (:BenWa) from comment #65)
> 
> What happens to HWCs that don't need this when you call it? It should be
> noted in the comment here because it leaves the reader wondering.

Yeah, the above is not clear. I will update it in next patch. And this bug only care about B2G v1.1. For master, I am going to create another bug for it.

On ICS gonk it is necessary always to call CompositionComplete(), On JB4.3 gonk,  
it is decided by HWC's version. HWC could be 3 versions(v1.0, v1.1 v1.2). CompositionComplete() is called only v1.0 HWC.

http://androidxref.com/4.3_r2.1/xref/frameworks/native/services/surfaceflinger/DisplayHardware/HWComposer.cpp#758

> 
> This code should be moved to mWidget DrawWindowOverlay.

Will fix in next patch.
(In reply to Sotaro Ikeda [:sotaro] from comment #66)
> > This code should be moved to mWidget DrawWindowOverlay.
> 
> Will fix in next patch.

In an offline talk, it is going to keep stay in same place.
Comment on attachment 790356 [details] [diff] [review]
patch v3 for b2g18 - Add calling compositionComplete()

Carry "bgirard review+".
Attachment #790356 - Flags: review+
Attachment #783380 - Attachment is obsolete: true
Attachment #783382 - Attachment is obsolete: true
Attachment #783384 - Attachment is obsolete: true
Created Bug 905304 for master.
check-in is necessary only for b2g18.
Keywords: checkin-needed
Leo, can you check if attachment 790356 [details] [diff] [review] works? Anyway, the patch is necessary.
Flags: needinfo?(leo.bugzilla.gecko)
https://hg.mozilla.org/releases/mozilla-b2g18/rev/09a8c5b6b287
Status: NEW → RESOLVED
Closed: 7 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Whiteboard: [TD-68993] [SR 01255148] → [TD-68993][SR 01255148]
(In reply to Sotaro Ikeda [:sotaro] from comment #72)
> Leo, can you check if attachment 790356 [details] [diff] [review] works?
> Anyway, the patch is necessary.

I cannot see the symptom after patching it!!
Flags: needinfo?(leo.bugzilla.gecko)
(In reply to Sotaro Ikeda [:sotaro] from comment #56)
> Current B2G do not call FramebufferNativeWindow::compositionComplete().

Please refer this bug.

https://bugzilla.mozilla.org/show_bug.cgi?id=883006

FramebufferNativeWindow::compositionComplete calls glFinish finally.

And we fixed this bug by adding glFinish too.
ying.xu, thank for the info. FramebufferNativeWindow::compositionComplete() actually end up to calling glFinish on qcom platform. We choose not to call glFinish() directly but call FramebufferNativeWindow::compositionComplete(), because the api is provided by android to abstract the hw's difference. 

And in JB, the compositionComplete() is not called when HWC's version is v1.1 or more.

http://androidxref.com/4.3_r2.1/xref/frameworks/native/services/surfaceflinger/DisplayHardware/DisplaySurface.h#35

http://androidxref.com/4.3_r2.1/xref/frameworks/native/services/surfaceflinger/DisplayHardware/HWComposer.cpp#758
This bug is already fixed. needinfo is not necessary now.
Flags: needinfo?(jonas)
Duplicate of this bug: 883006
You need to log in before you can comment on or make changes to this bug.