Closed Bug 881970 Opened 11 years ago Closed 11 years ago

buffer rotation and HwcComposer can cause jank under certain conditions

Categories

(Core :: Graphics: Layers, defect)

Product:

Component:

Platform:

ARM

Gonk (Firefox OS)

Type:

defect

Priority:

Not set

Severity:

normal

Tracking

()

Status:

RESOLVED DUPLICATE of bug 916264

blocking-b2g

koi+

People

(Reporter: bkelly, Unassigned)

References

Details

(Keywords: perf, regression, Whiteboard: [c=handeye p= s= u=])

Attachments

(4 files)

2013-07-25-20-21-01.png 11 years ago Ben Kelly [:bkelly, not reviewing] 37.38 KB, image/png		Details
2013-07-25-20-21-12.png 11 years ago Ben Kelly [:bkelly, not reviewing] 37.52 KB, image/png		Details
2013-07-25-20-21-19.png 11 years ago Ben Kelly [:bkelly, not reviewing] 36.74 KB, image/png		Details
2013-07-25-20-21-25.png 11 years ago Ben Kelly [:bkelly, not reviewing] 38.65 KB, image/png		Details

Ben Kelly [:bkelly, not reviewing]

Reporter

Description

•

11 years ago

Recently I noticed a repeatable pause during scrolling within the contacts app on b2g. The jank lasts about one second each time.

To reproduce:

- Build and flash the device using recent gaia/master and mozilla-central. (I am testing on a Buri.)
- Install contacts by running 'make reference-workload-heavy' in gaia.
- Launch contacts.
- To rule out background work, wait 10 to 20 seconds for the state to quiesce.
- Begin scrolling down.
- When the 'B' header replaces the 'A' header, there will be a one second pause.

This jank appears to happen whenever the header is replaced, whether through normal scrolling in either direction or by jump scrolling to a letter group.

I assumed this was a gaia issue at first and tried to bisect to find the problem. Surprisingly, the gaia rev did not affect the issue. Bisecting mozilla-central, I was able to determine the following (large) starting range:

Bad: 134609:81b227f1a522
Good: 133758:6eac1d687575

The fixed header code in the contacts app does some CSS transformations using translateY(). I don't really know if that's related, but I put the bug in Layout for now.

Mike Lee [:mlee]

Updated

•

11 years ago

Keywords: perf

Whiteboard: c=

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 1

•

11 years ago

New range: Bad: 134448:697190293f4e Good: 133758:6eac1d687575

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 2

•

11 years ago

Bad: 134366:8e7a612cc232 Good: 134019:0ee6e6d5918e

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 3

•

11 years ago

Bad: 134181:b2c600be7e90 Good: 134098:4c2dadbc0908

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 4

•

11 years ago

Ok, bisecting shows that the trouble commit is: http://hg.mozilla.org/mozilla-central/rev/8634a682e646
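The narrowing ranges in the comments above are an ordinary binary search over local revision numbers. A minimal sketch of that search follows, assuming a single good-to-bad flip in the range; `is_bad` is a hypothetical stand-in for the manual "build, flash, scroll contacts, watch for jank" step, and the function name is illustrative only.

```python
from typing import Callable


def first_bad_revision(good: int, bad: int, is_bad: Callable[[int], bool]) -> int:
    """Return the first bad revision in (good, bad], assuming one good->bad flip."""
    # Invariant: `good` is known good and `bad` is known bad.
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid    # the regression is at or before mid
        else:
            good = mid   # the regression is after mid
    return bad


if __name__ == "__main__":
    # Hypothetical predicate: pretend every revision from 134167 onward janks.
    print(first_bad_revision(133758, 134609, lambda rev: rev >= 134167))
```

Each test halves the range, so the initial span of roughly 850 revisions needs about ten build-and-flash cycles.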

Blocks: 862952

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 5

•

11 years ago

So, the structure of the contacts app scrollable region looks like:

<div>
  <section>
    <section>
      <abbr>A</abbr>
      <ol>
        <li>contact 1</li>
        <li>contact 2</li>
      </ol>
    </section>
    <section>
      <abbr>B</abbr>
      <ol>
        <li>contact 3</li>
        <li>contact 4</li>
      </ol>
    </section>
  </section>
</div>

The app uses translateY() to keep the <abbr> title at the top of the view while scrolling within the associated ordered list of contacts.

Do the sections correspond to layers within the gfx subsystem? I'm trying to understand how scrolling from one section to the next could be triggering a problem with this patch.

Nick, Diego, Benoit: Any ideas or suggestions?

Thanks!

Flags: needinfo?(ncameron)

Flags: needinfo?(dwilson)

Flags: needinfo?(bjacob)

Diego Wilson [:diego]

Comment 6

•

11 years ago

Ben, Is there anything interesting in logcat?

Flags: needinfo?(dwilson)

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 7

•

11 years ago

(In reply to Diego Wilson [:diego] from comment #6)
> Is there anything interesting in logcat?

Nope. I did add some debug to HwcComposer2D::PrepareLayerList() and the various sizes coming back seem reasonable:

### ### HwcComposer2D::PrepareLayerList() - surfaceSize=273:395
### ### HwcComposer2D::PrepareLayerList() - state msize=273:395
### ### GrallocTextureHostOGL::GetRenderState() - EARLY EXIT
### ### HwcComposer2D::PrepareLayerList() - surfaceSize=320:65
### ### HwcComposer2D::PrepareLayerList() - state msize=320:65
### ### GrallocTextureHostOGL::GetRenderState() - EARLY EXIT
### ### HwcComposer2D::PrepareLayerList() - surfaceSize=320:50
### ### HwcComposer2D::PrepareLayerList() - state msize=320:50
### ### GrallocTextureHostOGL::GetRenderState() - EARLY EXIT
### ### HwcComposer2D::PrepareLayerList() - surfaceSize=320:410
### ### HwcComposer2D::PrepareLayerList() - state msize=320:410
### ### GrallocTextureHostOGL::GetRenderState() - EARLY EXIT
### ### HwcComposer2D::PrepareLayerList() - surfaceSize=320:3
### ### HwcComposer2D::PrepareLayerList() - state msize=320:3
### ### GrallocTextureHostOGL::GetRenderState() - EARLY EXIT
### ### HwcComposer2D::PrepareLayerList() - surfaceSize=320:480
### ### HwcComposer2D::PrepareLayerList() - state msize=320:480

Diego Wilson [:diego]

Comment 8

•

11 years ago

IIRC scrolling uses a feature called "buffer rotation", which HwcComposer2D does not support. It's supposed to fall back to GPU rendering here:

https://mxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D.cpp#356

It's possible that this flag is not being set and Hwc renders the frame unrotated (i.e. unscrolled).

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 9

•

11 years ago

(In reply to Diego Wilson [:diego] from comment #8)
> IIRC scrolling uses a feature caller "buffer rotation" which HwcComposer2D
> does not support. It's supposed to fall back to GPU rendering here:
>
> https://mxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D.cpp#356
>
> It's possible that this flag is not being set and Hwc renders the frame
> unrotated (ie unscrolled)

This does look to be the issue. The jank goes away if I force that short-circuit logic to always return from HwcComposer2D::PrepareLayerList() regardless of flags.

Adding some debug shows that during normal scrolling we are getting a mix of states and some have the flag set:

### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x2
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x2

When the letter group boundary is reached there are some additional layers without the flag set:

### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x2
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x2
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x2
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x2
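Counting flags by eye in a dump like this is error-prone. A small sketch that tallies them is given here, assuming (as the debug output suggests, but the thread does not state outright) that bit 0x2 of state.mFlags is the buffer-rotation flag. The function name and constant are illustrative and are not Gecko identifiers.

```python
import re
from typing import Tuple

# Assumption: bit 0x2 of state.mFlags marks a layer with a rotated buffer.
BUFFER_ROTATION_BIT = 0x2
FLAG_RE = re.compile(r"state\.mFlags = (0x[0-9a-fA-F]+)")


def summarize_flags(log_text: str) -> Tuple[int, int]:
    """Return (layers_seen, layers_with_rotation_bit) for a debug log dump."""
    values = [int(match.group(1), 16) for match in FLAG_RE.finditer(log_text)]
    rotated = sum(1 for value in values if value & BUFFER_ROTATION_BIT)
    return len(values), rotated
```

Because the regular expression only looks for the `state.mFlags = 0x...` token, it works whether the entries are on separate lines or run together on one line.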

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 10

•

11 years ago

Not sure if I am tracing this right, but it appears that the flag is driven by the value in ContentClient::mBufferRotation. If this point is 0,0 then the flag is not set. Any non-zero value results in the flag being set.

When the jank occurs I see mBufferRotation getting set to zero in two locations repeatedly:

- https://mxr.mozilla.org/mozilla-central/source/gfx/layers/client/ContentClient.cpp#343
- https://mxr.mozilla.org/mozilla-central/source/gfx/layers/client/ContentClient.cpp#431

These two locations also periodically set it to zero while scrolling normally, though. I'm not familiar enough with the code to know if these are really related to the jank or not.

Any thoughts?
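The relationship described here can be written down as a tiny model. This is a sketch of the behaviour as reported in this thread, not the Gecko code: the 0x2 value is taken from the debug output, and both function names are hypothetical.

```python
from typing import Iterable, Tuple

# Assumed value, matching the 0x2 seen in the state.mFlags debug output.
BUFFER_ROTATION_FLAG = 0x2


def render_state_flags(buffer_rotation: Tuple[int, int]) -> int:
    """Model: the flag is set for any rotation point other than (0, 0)."""
    return BUFFER_ROTATION_FLAG if buffer_rotation != (0, 0) else 0


def can_use_hwc(layer_rotations: Iterable[Tuple[int, int]]) -> bool:
    """Model of the fallback: any rotated layer means the frame goes to the GPU path."""
    return all(render_state_flags(rotation) == 0 for rotation in layer_rotations)
```

Under this model, a frame in which every layer reports a rotation of (0, 0) is the only kind that stays on the Hwc path, which is consistent with the run of 0x0 entries logged at the letter group boundary.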

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 11

•

11 years ago

I tried disabling the style transform using translateY to keep the header at the top of the page. Without this in the mix there is also no jank. So it appears that using translateY while scrolling can prevent the buffer rotation flag from being set.

Nick Cameron [:nrc]

Comment 12

•

11 years ago

(In reply to Ben Kelly [:bkelly] from comment #5)
> So, the structure of the contacts app scrollable region looks like:
>
> <div>
>   <section>
>     <section>
>       <abbr>A</abbr>
>       <ol>
>         <li>contact 1</li>
>         <li>contact 2</li>
>       </ol>
>     </section>
>     <section>
>       <abbr>B</abbr>
>       <ol>
>         <li>contact 3</li>
>         <li>contact 4</li>
>       </ol>
>     </section>
>   </section>
> </div>
>
> The app uses translateY() to keep the <abbr> title at the top of the view
> while scrolling within the associated ordered list of contacts.
>
> Do the sections correspond to layers within the gfx subsystem? I'm trying to
> understand how scrolling from one section to the next could be triggering a
> problem with this patch.
>
> Nick, Diego, Benoit: Any ideas or suggestions?
>
> Thanks!

It is a bit more complicated than mapping sections to layers - certainly not every section/div will get a layer, in fact we will try to minimise the number of layers created. Furthermore, unless there is good reason, layers will be squished together for compositing, so before Hwc gets to see things, this could all be a single layer.

Scrolling usually forces a new layer, so whichever element gets scrolled will probably start a new layer and everything under it will be on that layer. But using translateY might not create a new layer and so the layer has to be redrawn every frame (rather than just moved if the contents don't change which would not require redrawing). This could well cause jank.

Is it possible to use position:fixed on the abbr elements rather than translateY? If not then setting a very small perspective 3d value for the abbr element should force it to get its own layer and might cure the jank that way (be aware though that creating an extra layer has downsides too - we will use more memory and rendering can sometimes be slower, so I advise against using this trick unless it is really necessary).

Flags: needinfo?(ncameron)

Nick Cameron [:nrc]

Comment 13

•

11 years ago

So looks like the above theory is not the cause because I missed that this is a regression. One thing to try is changing this line http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ThebesLayerBuffer.cpp#449 to |bool canHaveRotation = false;| That will disable buffer rotation everywhere, and if that gets rid of the jank, then we know to blame buffer rotation, if it increases the jank, then maybe it is a problem with Hwc somewhere. If nothing changes then buffer rotation has nothing to do with it!

Nick Cameron [:nrc]

Comment 14

•

11 years ago

It might still be worth trying position:fixed rather than translateY to see if forcing a layer solves the problem. Even if it is not a good solution it will give us more info. Using translateY by a different amount each frame as we scroll will force redrawing which will mean no buffer rotation. We should try to avoid that in general, but this might be a special case for some reason. Sorry I can't jump in and help more directly - I don't have a device which supports Hwc, so I can't observe the jank. Diego - is there any overhead from switching from the Hwc path to the OpenGL path, or is that pretty cheap?
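Buffer rotation itself is not spelled out in this thread. As a rough illustration of the general idea only, here is a toy one-dimensional model: the buffer is treated as a ring, scrolling repaints just the newly exposed rows, and a rotation offset records where the logical top now lives. This is not ThebesLayerBuffer; the class and its members are hypothetical.

```python
class RotatedRowBuffer:
    """Toy 1-D model: a fixed buffer holding `height` consecutive document rows."""

    def __init__(self, height, paint_row):
        self.height = height
        self.paint_row = paint_row   # callback: document row index -> row contents
        self.top = 0                 # first document row held in the buffer
        self.rotation = 0            # buffer slot where document row `top` is stored
        self.rows = [paint_row(y) for y in range(height)]
        self.repainted = height      # total rows painted so far

    def scroll_down(self, dy):
        """Scroll by dy rows, repainting only the rows that become visible."""
        if dy <= 0:
            raise ValueError("dy must be positive in this sketch")
        if dy >= self.height:
            # Nothing is reusable: repaint everything and reset the rotation.
            self.top += dy
            self.rotation = 0
            self.rows = [self.paint_row(self.top + i) for i in range(self.height)]
            self.repainted += self.height
            return
        for i in range(dy):
            # Slots of rows that scrolled off the top are reused for the new bottom rows.
            slot = (self.rotation + i) % self.height
            self.rows[slot] = self.paint_row(self.top + self.height + i)
        self.top += dy
        self.rotation = (self.rotation + dy) % self.height
        self.repainted += dy

    def visible_rows(self):
        """Read rows back in document order by unwrapping the rotation."""
        return [self.rows[(self.rotation + i) % self.height] for i in range(self.height)]
```

With `buf = RotatedRowBuffer(4, lambda y: y)` followed by `buf.scroll_down(1)`, the raw storage is `[4, 1, 2, 3]` while `buf.visible_rows()` gives `[1, 2, 3, 4]`. A consumer that reads the raw storage without applying the rotation would show the rows in the wrong order, which is the kind of mismatch the fallback to GPU compositing is meant to avoid.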

Flags: needinfo?(dwilson)

Diego Wilson [:diego]

Comment 15

•

11 years ago

(In reply to Nick Cameron [:nrc] from comment #14)
> Diego - is there any overhead from switching from the Hwc path to the OpenGL
> path, or is that pretty cheap?

Super cheap. They happen all the time and there's never been any known latency problem with switching.

Flags: needinfo?(dwilson)

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 16

•

11 years ago

(In reply to Nick Cameron [:nrc] from comment #13)
> One thing to try is changing this line
> http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ThebesLayerBuffer.cpp#449
> to |bool canHaveRotation = false;|
>
> That will disable buffer rotation everywhere, and if that gets rid of the
> jank, then we know to blame buffer rotation, if it increases the jank, then
> maybe it is a problem with Hwc somewhere. If nothing changes then buffer
> rotation has nothing to do with it!

Thanks Nick!

I made the recommended change in ThebesLayerBuffer to force buffer rotation to be off and this got rid of the jank.

Ben Kelly [:bkelly, not reviewing]

Reporter

Updated

•

11 years ago

Component: Layout → Graphics: Layers

Summary: [b2g][contacts] jank during contacts app scrolling from one letter section to the next → buffer rotation can cause jank under certain conditions

Nick Cameron [:nrc]

Comment 17

•

11 years ago

(In reply to Ben Kelly [:bkelly] from comment #16)
> (In reply to Nick Cameron [:nrc] from comment #13)
> > One thing to try is changing this line
> > http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ThebesLayerBuffer.cpp#449
> > to |bool canHaveRotation = false;|
> >
> > That will disable buffer rotation everywhere, and if that gets rid of the
> > jank, then we know to blame buffer rotation, if it increases the jank, then
> > maybe it is a problem with Hwc somewhere. If nothing changes then buffer
> > rotation has nothing to do with it!
>
> Thanks Nick!
>
> I made the recommended change in ThebesLayerBuffer to force buffer rotation
> to be off and this got rid of the jank.

Just to clarify, that implicates buffer rotation, but the cause of the jank is some interaction between Hwc and buffer rotation since we didn't have jank before Hwc was enabled.

Summary: buffer rotation can cause jank under certain conditions → buffer rotation and HwcComposer can cause jank under certain conditions

Nick Cameron [:nrc]

Comment 18

•

11 years ago

OK, I have requested hardware so I can investigate this sort of thing. If this is not urgent I can take this bug once the hardware arrives.

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 19

•

11 years ago

(In reply to Nick Cameron [:nrc] from comment #18)
> OK, I have requested hardware so I can investigate this sort of thing. If
> this is not urgent I can take this bug once the hardware arrives.

Thanks Nick! I was in the middle of another bug when I ran into this, so I think I will leave it to you for now. If I finish my other work and you still don't have a device then I will circle back.

Flags: needinfo?(bkelly)

Nick Cameron [:nrc]

Comment 20

•

11 years ago

(In reply to Ben Kelly [:bkelly] from comment #19)
> (In reply to Nick Cameron [:nrc] from comment #18)
> > OK, I have requested hardware so I can investigate this sort of thing. If
> > this is not urgent I can take this bug once the hardware arrives.
>
> Thanks Nick! I was in the middle of another bug when I ran into this, so I
> think I will leave it to you for now. If I finish my other work and you
> still don't have a device then I will circle back.

Sounds good. It might be a couple of weeks before I get a device though...

Benoit Jacob [:bjacob] (mostly away)

Comment 21

•

11 years ago

I really don't know much about scrolling and buffer rotation; did you want :BenWa ?

Flags: needinfo?(bjacob)

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 22

•

11 years ago

Nominating koi? since it produces bad jank on Buri with mozilla-central.

blocking-b2g: --- → koi?

Flags: needinfo?(bkelly)

Ben Kelly [:bkelly, not reviewing]

Reporter

Updated

•

11 years ago

Flags: needinfo?(bkelly)

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 23

•

11 years ago

Nick, any luck getting a buri to test with? Mike, anything we can do to help get Nick or someone in the gfx team a device?

Flags: needinfo?(ncameron)

Flags: needinfo?(mlee)

Nick Cameron [:nrc]

Comment 24

•

11 years ago

Apparently an Inari got dispatched to me yesterday (finally) so it should be with me in the next few days.

Flags: needinfo?(ncameron)

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 25

•

11 years ago

That's good news, Nick. Thanks! I assume an Inari has the proper hwc hardware?

Flags: needinfo?(mlee)

Nick Cameron [:nrc]

Comment 26

•

11 years ago

(In reply to Ben Kelly [:bkelly] from comment #25)
> Thats good news Nick. Thanks!
>
> I assume an Inari has the proper hwc hardware?

I was told that it did. I hope that is true otherwise the whole month-long ordeal of getting another b2g phone will have been for naught.

Ben Kelly [:bkelly, not reviewing]

Reporter

Updated

•

11 years ago

Flags: needinfo?(bkelly)

Keywords: regression

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 27

•

11 years ago

Attached image 2013-07-25-20-21-01.png — Details

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 28

•

11 years ago

Attached image 2013-07-25-20-21-12.png — Details

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 29

•

11 years ago

Attached image 2013-07-25-20-21-19.png — Details

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 30

•

11 years ago

Attached image 2013-07-25-20-21-25.png — Details

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 31

•

11 years ago

The 4 attachments provide screenshots at various points during scrolling with layers.draw-borders enabled. This is with the pre-regression code at rev 134166:12cdc8931e48.

When moving to rev 134167:8634a682e646, which shows the regression, the behavior of layers.draw-borders changes. At times the borders disappear. In particular, the borders are not drawn exactly when transitioning from one title to the next, which is also when the jank occurs. Otherwise the layers look similar to the screenshots above.

Nick, does this help narrow the problem at all?

Flags: needinfo?(ncameron)

Nick Cameron [:nrc]

Comment 32

•

11 years ago

(In reply to Ben Kelly [:bkelly] from comment #31)
> The 4 attachments provide screenshots at various point during scrolling with
> layers.draw-borders enabled. This is with the pre-regression code at rev
> 134166:12cdc8931e48.
>
> When moving to rev 134167:8634a682e646, which shows the regression, the
> behavior of layers.draw-borders changes. At times the borders disappear.
> In particular, the borders are not drawn exactly when transitioning from one
> title to the next which is also when the jank occurs. Otherwise the layers
> look similar to the screenshots above.
>
> Nick, does this help narrow the problem at all

Drawing layer borders only works when we are using our own Compositor. If we are using Hwc, then we do not draw borders. I'm not sure how the drawing works, and therefore why we have any borders at all if we are using Hwc here; they are probably written back into the GL textures, and so Hwc is rendering old borders.

BTW, I am working on this now. Sorry I didn't have any time to get to this earlier.

Flags: needinfo?(ncameron)

Nick Cameron [:nrc]

Comment 33

•

11 years ago

(In reply to Ben Kelly [:bkelly] from comment #4)
> Ok, bisecting shows that the trouble commit is:
>
> http://hg.mozilla.org/mozilla-central/rev/8634a682e646

This patch just re-enabled Hwc which got accidentally turned off by the layers refactoring. So this regression could be caused by any changes between then and the layers refactoring landing including the refactoring itself (which is highly likely).

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 34

•

11 years ago

Thanks Nick! Just to clarify, have you been able to reproduce on your Inari?

Nick Cameron [:nrc]

Comment 35

•

11 years ago

(In reply to Ben Kelly [:bkelly] from comment #34)
> Thanks Nick!
>
> Just to clarify, have you been able to reproduce on your Inari?

Sadly, I can no longer build for my Inari. My build fails with 'system.img too large'. I haven't found anyone on #b2g to help. (I did manage to build for and flash my Inari a week or so ago, but now that I actually have time to investigate, it no longer works).

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 36

•

11 years ago

How big is your system.img? For my Buri:

ls -lh out/target/product/hamachi/system.img
-rw------- 1 bkelly bkelly 105M Jul 29 13:52 out/target/product/hamachi/system.img

Is this a debug build? Does it work if you set B2G_DEBUG=0 in your .userconfig?

Also, how much space is on your phone? Again, for my Buri:

adb shell df
Filesystem         Size   Used   Free   Blksize
/dev               90M    48K    90M    4096
/mnt/asec          90M    0K     90M    4096
/mnt/obb           90M    0K     90M    4096
/system            200M   113M   86M    4096
/data              161M   59M    101M   4096
/persist           4M     780K   3M     4096
/cache             40M    1M     38M    4096
/mnt/sdcard        7G     237M   7G     32768
/mnt/secure/asec   7G     237M   7G     32768

I'm going to be mostly out this afternoon and tomorrow, so thought I would post these ideas here. If that doesn't help maybe we can talk more on IRC next week. :mwu would also be a good person to ask.

Thanks!

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 37

•

11 years ago

Oh, and you get the 'system.img too large' during ./build.sh or during ./flash.sh?

Flags: needinfo?(ncameron)

Nick Cameron [:nrc]

Comment 38

•

11 years ago

The error is during ./build.sh. Doing a fresh build to get the system.img size.

Doing a non-debug build. I guess that still includes debug symbols though, as on desktop?

Filesystem   Size   Used   Free   Blksize
/dev         86M    100K   86M    4096
/mnt/asec    86M    0K     86M    4096
/mnt/obb     86M    0K     86M    4096
/system      234M   114M   119M   4096
/data        152M   61M    91M    4096
/persist     1M     772K   764K   4096
/cache       57M    1M     56M    4096

Flags: needinfo?(ncameron)

Nick Cameron [:nrc]

Comment 39

•

11 years ago

OK, now it works. Which is weird. I presume because I updated my repo. For the record the size of my system.img is 99217536, so I'm betting I went over the 100MB limit which I think Fastboot or something has. Well, the good news is I can actually now look at this bug.
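For reference, the size quoted above can be converted into the two common meanings of "MB". This is only arithmetic on the number given in the comment; whether a 100MB limit exists, and which unit it would use, is the comment's guess and is not confirmed here. The helper name is hypothetical.

```python
def describe_size(num_bytes: int) -> str:
    """Express a byte count in decimal megabytes and binary mebibytes."""
    mb = num_bytes / 1_000_000      # decimal megabytes (10**6 bytes)
    mib = num_bytes / (1024 * 1024)  # binary mebibytes (2**20 bytes)
    return f"{num_bytes} bytes = {mb:.2f} MB = {mib:.2f} MiB"


if __name__ == "__main__":
    print(describe_size(99217536))
```

So the image that built successfully is just under 100 decimal megabytes, and a little under 95 when counted in units of 1024 * 1024 bytes.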

Nick Cameron [:nrc]

Comment 40

•

11 years ago

Hmm, I can't repro the jank. I'll debug tomorrow to see if I am in fact using Composer2D and buffer rotation.

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 41

•

11 years ago

My fear is this is a Buri specific issue. For example, I've experienced other hwc related problems such as in bug 900029 and bug 901395. Also, since it is not easily reproduced by others, perhaps it's specific to a particular production run of the phone. I'm not sure how much variation in the hardware there is.

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 42

•

11 years ago

Please ignore comment 41. I was able to resolve those issues by updating my firmware to the latest vendor version, but this did not help this problem.

I do have some additional logcat information, though:

D/HWComposer( 2238): ThebesLayerComposite Layer has a rotated buffer
D/HWComposer( 2238): Render aborted. Nothing was drawn to the screen
D/HWComposer( 2238): ThebesLayerComposite Layer has a rotated buffer
D/HWComposer( 2238): Render aborted. Nothing was drawn to the screen
D/HWComposer( 2238): ThebesLayerComposite Layer has a rotated buffer
D/HWComposer( 2238): Render aborted. Nothing was drawn to the screen
D/HWComposer( 2238): Frame rendered
D/HWComposer( 2238): Frame rendered
E/libgenlock( 2381): perform_lock_unlock_operation: GENLOCK_IOC_DREADLOCK failed (lockType0x1, err=Connection timed out fd=62)
E/msm7627a.gralloc( 2381): gralloc_lock: genlock_lock_buffer (lockType=0x2) failed
W/GraphicBufferMapper( 2381): lock(...) failed -22 (Invalid argument)
D/HWComposer( 2238): ThebesLayerComposite Layer has a rotated buffer
D/HWComposer( 2238): Render aborted. Nothing was drawn to the screen
D/HWComposer( 2238): ThebesLayerComposite Layer has a rotated buffer
D/HWComposer( 2238): Render aborted. Nothing was drawn to the screen

The perform_lock_unlock_operation line comes out exactly when the jank occurs.
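To check whether the genlock failure lines up with the Hwc frames in a longer capture, the log can be scanned mechanically. This is a sketch only: it assumes the brief `PRIORITY/TAG( PID): message` logcat layout shown above, with one entry per line, and the function name is hypothetical.

```python
import re
from typing import List, Tuple

# Matches the brief logcat layout, e.g. "D/HWComposer( 2238): Frame rendered".
LINE_RE = re.compile(r"^([VDIWEF])/([^(]+)\(\s*(\d+)\):\s?(.*)$")


def genlock_failures(logcat: str) -> List[Tuple[int, int, int]]:
    """Return (pid, aborted_before, rendered_before) per genlock read-lock failure."""
    aborted = 0    # HWComposer "Render aborted" entries seen so far
    rendered = 0   # HWComposer "Frame rendered" entries seen so far
    hits = []
    for line in logcat.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        tag, pid, msg = match.group(2).strip(), int(match.group(3)), match.group(4)
        if tag == "HWComposer" and msg.startswith("Render aborted"):
            aborted += 1
        elif tag == "HWComposer" and msg.startswith("Frame rendered"):
            rendered += 1
        elif tag == "libgenlock" and "GENLOCK_IOC_DREADLOCK failed" in msg:
            hits.append((pid, aborted, rendered))
    return hits
```

On the excerpt above this reports one failure from PID 2381, after three aborted renders and two rendered frames logged by PID 2238. The two tags log from different PIDs, so the failing lock is taken in a different process from the one doing the Hwc composition.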

Nick Cameron [:nrc]

Comment 43

•

11 years ago

(In reply to Ben Kelly [:bkelly] from comment #41)
> My fear is this is a Buri specific issue. For example, I've experienced
> other hwc related problems such as in bug 900029 and bug 901395. Also,
> since it is not easily reproduced by others, perhaps its specific to a
> particular production run of the phone. I'm not sure how much variation in
> the hardware there is.

Perhaps it is due to doing a full flash.sh? I have only flash.sh gecko and gaia on my phone. Has anyone repro'd this on a phone without a full flash?

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 44

•

11 years ago

(In reply to Nick Cameron [:nrc] from comment #43)
> Perhaps it is due to doing a full flash.sh? I have only flash.sh gecko and
> gaia on my phone. Has anyone repro'd this on a phone without a full flash?

Unfortunately I reproduced the problem only doing a ./flash.sh gecko. I did not do a full flash this time.

Does the GENLOCK_IOC_DREADLOCK failure offer any clues?

Sotaro Ikeda [:sotaro]

Comment 45

•

11 years ago

(In reply to Ben Kelly [:bkelly] from comment #44)
> Does the GENLOCK_IOC_DREADLOCK failure offer any clues?

It seems similar to Bug 898919. In that bug, the genlock failure always happened in the following situation. See Bug 898919 comment #44.

- When the genlock failure happened, there was always the same pattern.
  + ALL of the last OpenGL-rendered video frame buffers before the genlock failure were also rendered by Hw Composer.

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 46

•

11 years ago

(In reply to Ben Kelly [:bkelly] from comment #42)
> E/libgenlock( 2381): perform_lock_unlock_operation: GENLOCK_IOC_DREADLOCK
> failed (lockType0x1, err=Connection timed out fd=62)
> E/msm7627a.gralloc( 2381): gralloc_lock: genlock_lock_buffer (lockType=0x2)
> failed
> W/GraphicBufferMapper( 2381): lock(...) failed -22 (Invalid argument)

Triggering this on a debug build produces:

F/MOZ_Assert( 633): Assertion failure: status == OK, at /srv/mozilla-central/gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp:463
F/libc ( 633): Fatal signal 11 (SIGSEGV) at 0x00000000 (code=1)

Although I guess that is not too helpful if the true problem is that the lock is being held somewhere.

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 47

•

11 years ago

For what it's worth, here is the stack trace for that assert.

#0 0x4182aba2 in mozilla::layers::ShadowLayerForwarder::PlatformOpenDescriptor (aMode=mozilla::layers::OPEN_READ_WRITE, aSurface=...) at /srv/mozilla-central/gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp:463
#1 0x41869504 in mozilla::layers::ShadowLayerForwarder::OpenDescriptor (aMode=117, aSurface=...) at /srv/mozilla-central/gfx/layers/ipc/ShadowLayers.cpp:569
#2 0x4186d2ee in mozilla::layers::DeprecatedTextureClientShmem::GetSurface (this=0x455cd430) at /srv/mozilla-central/gfx/layers/client/TextureClient.cpp:366
#3 0x4186d35e in mozilla::layers::DeprecatedTextureClientShmem::LockSurface (this=0x455cd430) at ../../dist/include/mozilla/layers/TextureClient.h:441
#4 0x4187185e in mozilla::layers::ThebesLayerBuffer::EnsureBuffer (this=0x4509c928) at /srv/mozilla-central/gfx/layers/ThebesLayerBuffer.cpp:410
#5 0x418718d2 in mozilla::layers::ThebesLayerBuffer::GetContextForQuadrantUpdate (this=0x455cd430, aBounds=..., aSource=123, aTopLeft=0x0) at /srv/mozilla-central/gfx/layers/ThebesLayerBuffer.cpp:302
#6 0x4184bf20 in mozilla::layers::ContentClientDoubleBuffered::UpdateDestinationFrom (this=0x4509c900, aSource=..., aUpdateRegion=...) at /srv/mozilla-central/gfx/layers/client/ContentClient.cpp:497
#7 0x4184c31c in mozilla::layers::ContentClientDoubleBuffered::SyncFrontBufferToBackBuffer (this=0x4509c900) at /srv/mozilla-central/gfx/layers/client/ContentClient.cpp:485
#8 0x4183de02 in mozilla::layers::ClientThebesLayer::PaintThebes (this=0x443dc900) at /srv/mozilla-central/gfx/layers/client/ClientThebesLayer.cpp:57
#9 0x4183e3c4 in mozilla::layers::ClientThebesLayer::RenderLayer (this=0x443dc900) at /srv/mozilla-central/gfx/layers/client/ClientThebesLayer.cpp:123
#10 0x4183c1d8 in ClientContainerLayer::RenderLayer (this=0x463bb800) at /srv/mozilla-central/gfx/layers/client/ClientContainerLayer.h:191
#11 0x4183c1d8 in ClientContainerLayer::RenderLayer (this=0x45077400) at /srv/mozilla-central/gfx/layers/client/ClientContainerLayer.h:191
#12 0x4183cf58 in mozilla::layers::ClientLayerManager::EndTransactionInternal (this=0x44850100, aCallback=0x409e99e9 <mozilla::FrameLayerBuilder::DrawThebesLayer(mozilla::layers::ThebesLayer*, gfxContext*, nsIntRegion const&, nsIntRegion const&, void*)>, aCallbackData=<value optimized out>) at /srv/mozilla-central/gfx/layers/client/ClientLayerManager.cpp:176
#13 0x4183d84e in mozilla::layers::ClientLayerManager::EndTransaction (this=0x44850100, aCallback=0x409e99e9 <mozilla::FrameLayerBuilder::DrawThebesLayer(mozilla::layers::ThebesLayer*, gfxContext*, nsIntRegion const&, nsIntRegion const&, void*)>, aCallbackData=0xbef334a8, aFlags=mozilla::layers::LayerManager::END_NO_COMPOSITE) at /srv/mozilla-central/gfx/layers/client/ClientLayerManager.cpp:199
#14 0x40a1ff5c in nsDisplayList::PaintForFrame (this=<value optimized out>, aBuilder=0xbef334a8, aCtx=<value optimized out>, aForFrame=<value optimized out>, aFlags=13) at /srv/mozilla-central/layout/base/nsDisplayList.cpp:1190
#15 0x40a20168 in nsDisplayList::PaintRoot (this=0xbef33840, aBuilder=0xbef334a8, aCtx=0x0, aFlags=13) at /srv/mozilla-central/layout/base/nsDisplayList.cpp:1051
#16 0x40a3a5c6 in nsLayoutUtils::PaintFrame (aRenderingContext=<value optimized out>, aFrame=0x44bac298, aDirtyRegion=<value optimized out>, aBackstop=<value optimized out>, aFlags=772) at /srv/mozilla-central/layout/base/nsLayoutUtils.cpp:2126
#17 0x40a4eb34 in PresShell::Paint (this=0x40493630, aViewToPaint=<value optimized out>, aDirtyRegion=<value optimized out>, aFlags=1) at /srv/mozilla-central/layout/base/nsPresShell.cpp:5605
#18 0x40e451ae in nsViewManager::ProcessPendingUpdatesForView (this=0x44ba2730, aView=0x44ba0880, aFlushDirtyRegion=<value optimized out>) at /srv/mozilla-central/view/src/nsViewManager.cpp:410
#19 0x40e45264 in nsViewManager::ProcessPendingUpdates (this=<value optimized out>) at /srv/mozilla-central/view/src/nsViewManager.cpp:1031
#20 0x40a5981e in nsRefreshDriver::Tick (this=0x17c0174, aNowEpoch=282711927, aNowTime=...) at /srv/mozilla-central/layout/base/nsRefreshDriver.cpp:1233
#21 0x40a59d92 in mozilla::RefreshDriverTimer::TickDriver (aTimer=<value optimized out>, aClosure=<value optimized out>) at /srv/mozilla-central/layout/base/nsRefreshDriver.cpp:171
#22 mozilla::RefreshDriverTimer::Tick (aTimer=<value optimized out>, aClosure=<value optimized out>) at /srv/mozilla-central/layout/base/nsRefreshDriver.cpp:163
#23 mozilla::RefreshDriverTimer::TimerTick (aTimer=<value optimized out>, aClosure=<value optimized out>) at /srv/mozilla-central/layout/base/nsRefreshDriver.cpp:188
#24 0x417c221c in nsTimerImpl::Fire (this=0x404a06a0) at /srv/mozilla-central/xpcom/threads/nsTimerImpl.cpp:544
#25 0x417c240a in nsTimerEvent::Run (this=0x448ec190) at /srv/mozilla-central/xpcom/threads/nsTimerImpl.cpp:628
#26 0x417be728 in nsThread::ProcessNextEvent (this=0x40402390, mayWait=<value optimized out>, result=0xbef33e4f) at /srv/mozilla-central/xpcom/threads/nsThread.cpp:622
#27 0x417861ca in NS_ProcessNextEvent (thread=0x40402390, mayWait=false) at /srv/mozilla-central/objdir-gonk-hamachi-debug-m-c/xpcom/build/nsThreadUtils.cpp:238
#28 0x413cbcf8 in mozilla::ipc::MessagePump::Run (this=0x40401bb0, aDelegate=0xbef3490c) at /srv/mozilla-central/ipc/glue/MessagePump.cpp:81
#29 0x413cbe78 in mozilla::ipc::MessagePumpForChildProcess::Run (this=0x40401bb0, aDelegate=0xbef3490c) at /srv/mozilla-central/ipc/glue/MessagePump.cpp:234
#30 0x417eb7fe in MessageLoop::RunInternal (this=0xbef3490c) at /srv/mozilla-central/ipc/chromium/src/base/message_loop.cc:220
#31 0x417eb816 in MessageLoop::RunHandler (this=0xbef3490c) at /srv/mozilla-central/ipc/chromium/src/base/message_loop.cc:213
#32 MessageLoop::Run (this=0xbef3490c) at /srv/mozilla-central/ipc/chromium/src/base/message_loop.cc:187
#33 0x4135280e in nsBaseAppShell::Run (this=0x4438d280) at /srv/mozilla-central/widget/xpwidgets/nsBaseAppShell.cpp:163
#34 0x407d0866 in XRE_RunAppShell () at /srv/mozilla-central/toolkit/xre/nsEmbedFunctions.cpp:676
#35 0x413cbde2 in mozilla::ipc::MessagePumpForChildProcess::Run (this=0x40401bb0, aDelegate=0xbef3490c) at /srv/mozilla-central/ipc/glue/MessagePump.cpp:201
#36 0x417eb7fe in MessageLoop::RunInternal (this=0xbef3490c) at /srv/mozilla-central/ipc/chromium/src/base/message_loop.cc:220
#37 0x417eb816 in MessageLoop::RunHandler (this=0xbef3490c) at /srv/mozilla-central/ipc/chromium/src/base/message_loop.cc:213
#38 MessageLoop::Run (this=0xbef3490c) at /srv/mozilla-central/ipc/chromium/src/base/message_loop.cc:187
#39 0x407d1140 in XRE_InitChildProcess (aArgc=2, aArgv=0xbef34a20, aProcess=1078199296) at /srv/mozilla-central/toolkit/xre/nsEmbedFunctions.cpp:513
#40 0x00008786 in main (argc=7, argv=0xbef34aa4) at /srv/mozilla-central/ipc/app/MozillaRuntimeMain.cpp:85

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 48

•

11 years ago

In case anyone else hits this, the easiest way to work around the problem at the moment is to add the following to your gaia/build/custom-prefs.js:

pref("layers.bufferrotation.enabled", false);

And then do a make reset-gaia.

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 49

•

11 years ago

Is it significant or unexpected that we are going through DeprecatedTextureClientShmem here? Any hints on how to track down where the lock is being held in the graphics subsystem would be great.

Flags: needinfo?(ncameron)

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 50

•

11 years ago

So, I can still reproduce this even with buffer rotation disabled now. When I first open contacts scrolling works fine; no jank. I then press the home button and then go back into contacts. Sometimes I get logcat output like:

D/HWComposer( 140): Frame rendered
E/copybit ( 140): copyBits failed (Operation not permitted)
E/copybit ( 140): 0: src={w=320, h=480, f=1, rect={0,0,320,460}}
E/copybit ( 140): dst={w=80, h=114, f=14, rect={0,0,80,114}}
E/copybit ( 140): flags=00020000
E/msm7627a.hwcomposer( 140): drawLayerUsingCopybit:1676::tmp copybit stretch failed
E/copybit ( 140): copyBits failed (Operation not permitted)
E/copybit ( 140): 0: src={w=288, h=384, f=1, rect={0,0,263,381}}
E/copybit ( 140): dst={w=64, h=94, f=14, rect={0,0,64,94}}
E/copybit ( 140): flags=00020000

And then further down:

D/HWComposer( 140): ThebesLayerComposite Layer doesn't have a gralloc buffer
D/HWComposer( 140): Render aborted. Nothing was drawn to the screen

And then when I scroll past a header transition as originally reported I get the jank and the lock failure again:

E/libgenlock( 486): perform_lock_unlock_operation: GENLOCK_IOC_DREADLOCK failed (lockType0x1, err=Connection timed out fd=47)
E/msm7627a.gralloc( 486): gralloc_lock: genlock_lock_buffer (lockType=0x2) failed
W/GraphicBufferMapper( 486): lock(...) failed -22 (Invalid argument)

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 51

•

11 years ago

I also sometimes see this at various times:

E/memalloc( 140): /dev/pmem: No more pmem available
E/msm7627a.gralloc( 140): gralloc failed err=Out of memory
W/GraphicBufferAllocator( 140): alloc(640, 960, 2, 00000133, ...) failed -12 (Out of memory)
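The numeric codes in these log lines (-22 in comment 50, -12 here) are negated errno values, which is how Android status codes are conventionally reported. Below is a minimal sketch for decoding them. DescribeStatus is a hypothetical helper written for illustration, not part of gralloc or libgenlock, and the message text returned by strerror() differs between C libraries, so only the symbolic name should be relied on.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical helper (not from any Android or Gecko source): turns a
// negative status code, as printed in logcat, into "<NAME> (<message>)".
std::string DescribeStatus(int status) {
  // Status codes of this kind are zero (success) or a negated errno.
  assert(status <= 0);
  int err = -status;

  const char* name = "unknown";
  switch (err) {
    case 0:         name = "OK";        break;
    case EPERM:     name = "EPERM";     break;  // "Operation not permitted"
    case ENOMEM:    name = "ENOMEM";    break;  // "Out of memory" on the device
    case EINVAL:    name = "EINVAL";    break;  // "Invalid argument"
    case ETIMEDOUT: name = "ETIMEDOUT"; break;  // "Connection timed out"
    default:        break;
  }

  // The strerror() wording depends on the C library in use.
  return std::string(name) + " (" + std::strerror(err) + ")";
}
```

With this, DescribeStatus(-22) starts with EINVAL and DescribeStatus(-12) starts with ENOMEM, matching the GraphicBufferMapper and GraphicBufferAllocator lines.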

Diego Wilson [:diego]

Comment 52

•

11 years ago

(In reply to Ben Kelly [:bkelly] from comment #51)
> I also sometimes see this at various times:
>
> E/memalloc( 140): /dev/pmem: No more pmem available
> E/msm7627a.gralloc( 140): gralloc failed err=Out of memory
> W/GraphicBufferAllocator( 140): alloc(640, 960, 2, 00000133, ...) failed
> -12 (Out of memory)

This sounds like a side-effect of the earlier genlock failures. Have you tried Sotaro's compositionComplete() patch in bug 898919?

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 53

•

11 years ago

(In reply to Diego Wilson [:diego] from comment #52)
> This sounds like a side-effect of the earlier genlock failures. Have you
> tried Sotaro's compositionComplete() patch in bug 898919?

I just tried the patch and it unfortunately did not help. Sotaro did indicate it was for b2g18, however, and I am running on m-c. The patch did apply cleanly and compile, though.

Nick Cameron [:nrc]

Comment 54

•

11 years ago

(In reply to Ben Kelly [:bkelly] from comment #49)
> Is it significant or unexpected that we are going through
> DeprecatedTextureClientShmem here?

It is expected. They are not deprecated, they are just badly named (there are new texture clients but they are not used yet).

> Any hints on how to track down where the lock is being held in the graphics
> subsystem would be great.

I don't know, sorry. bjacob is our man for tracking down gralloc locking errors. bjacob - could you take a look at this please?

Flags: needinfo?(ncameron) → needinfo?(bjacob)

Benoit Jacob [:bjacob] (mostly away)

Comment 55

•

11 years ago

Here is what I know about tracking down genlock failures.

I use this patch: http://people.mozilla.org/~bjacob/genlock-logging which applies to some directory under the B2G repo, IIRC under vendor/qcom (genlock is qualcomm-only). This patch records all genlock activity to /data/local/tmp/b2glog, and aborts on the first genlock failure. The log then generally contains the info you need to understand the failure.

This patch has a flaw: typically the file gets first created by the main b2g process with root permissions, so subsequent attempts to append to it by other processes fail, causing them to abort. To prevent/fix that problem, just create this /data/local/tmp/b2glog file ahead of time on the device with 0777 permissions, or just edit this patch so that it chmod's the file with 0777 right after opening it, in case that opening was also the creation of the file.

I'm happy to look at a resulting b2glog file. I'd also like to help more directly, but my b2g tree is non-qualcomm at the moment (I'm on emulator), and I'm assigned short-term emergencies (bug 905214) for the rest of this cycle... let me know if I can answer more questions at least.
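The chmod-after-create idea can be sketched roughly as below. This is only an illustration assuming plain POSIX open() and fchmod(); the helper names OpenSharedLog and AppendLogLine are hypothetical and are not taken from the genlock-logging patch, whose actual code is not shown in this bug.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: opens `path` for appending. If this call is the one
// that created the file, its permissions are widened to 0777 so that other
// processes can append to it later.
// Returns an open file descriptor, or -1 on failure (errno is set).
int OpenSharedLog(const char* path) {
  assert(path != nullptr);

  // O_EXCL makes open() fail with EEXIST when the file already exists,
  // which tells us whether this call created it.
  int fd = open(path, O_WRONLY | O_APPEND | O_CREAT | O_EXCL, 0777);
  if (fd >= 0) {
    // The mode given to open() is filtered by the process umask;
    // fchmod() is not, so this is the step that actually yields 0777.
    if (fchmod(fd, 0777) != 0) {
      int saved = errno;
      close(fd);
      errno = saved;
      return -1;
    }
    return fd;
  }
  if (errno != EEXIST) {
    return -1;
  }
  // Someone else created the file already: just open it for appending.
  return open(path, O_WRONLY | O_APPEND);
}

// Appends one line to the log; returns true if the whole line was written.
bool AppendLogLine(int fd, const char* line) {
  size_t len = strlen(line);
  return write(fd, line, len) == static_cast<ssize_t>(len) &&
         write(fd, "\n", 1) == 1;
}
```

Creating the file ahead of time with 0777 permissions, as described above, avoids touching the patch at all; the sketch only covers the second option.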

Flags: needinfo?(bjacob)

Benoit Jacob [:bjacob] (mostly away)

Comment 56

•

11 years ago

Oh yes, and once you've applied this patch, you need to re-run ./build.sh and MANUALLY push the resulting libgenlock.so to the device, because by default the one we push to the device is NOT the one we build; instead we push a prebuilt vendor binary. The right libgenlock.so is mentioned in the build log, as it gets built early during ./build.sh.

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 57

•

11 years ago

(In reply to Ben Kelly [:bkelly] from comment #48)
> In case anyone else hits this, the easiest way to work around the problem at
> the moment is to add the following to your gaia/build/custom-prefs.js:
>
> pref("layers.bufferrotation.enabled", false);
>
> And then do a make reset-gaia.

While this does help a bit, it is still possible to run into this problem even with buffer rotation disabled. I tried setting layers.acceleration.disabled to true, but b2g dies with SIGSEGV in that case.

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 58

•

11 years ago

Retesting now that bug 905304 is fixed in m-c.

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 59

•

11 years ago

Unfortunately, I was still able to provoke the GENLOCK_IOC_DREADLOCK failure with m-c at 143295:d136c8999d96.

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 61

•

11 years ago

Here are simple instructions for reproducing on the Buri:

1) Flash mozilla-central on the phone.
2) From gaia directory run "make reference-workload-heavy" to install test contacts.
3) Open contacts app.
4) Scroll down until you transition from group A to group B, etc.

The genlock failures occur during the transition.

Tony Chung [:tchung]

Comment 64

•

11 years ago

Can we get this bug koi+'d? Typing in the URL field of the Browser app is nearly unusable. The experience is much like this screencast, except in the browser. http://youtu.be/xhaZBX1Aq34

ni? relmgmt, as I don't know who triages 1.2 core bugs.

Flags: needinfo?(praghunath)

Flags: needinfo?(akeybl)

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 65

•

11 years ago

My impression is that the genlock failures as reported here are part of the awesome work Sotaro is doing to investigate genlock failures in other bugs. Sotaro can you confirm? Or do you think this bug is describing a separate issue?

Flags: needinfo?(sotaro.ikeda.g)

Sotaro Ikeda [:sotaro]

Comment 66

•

11 years ago

(In reply to Ben Kelly [:bkelly] from comment #65)
> My impression is that the genlock failures as reported here are part of the
> awesome work Sotaro is doing to investigate genlock failures in other bugs.
>
> Sotaro can you confirm? Or do you think this bug is describing a separate
> issue?

This bug seems to be the same as bug 916264.

Flags: needinfo?(sotaro.ikeda.g)

Ben Kelly [:bkelly, not reviewing]

Reporter

Updated

•

11 years ago

Depends on: 916264

Mike Lee [:mlee]

Comment 67

•

11 years ago

Per Sotaro's comment #66 this appears to be a duplicate of bug 916264. Any reason why we're not closing it and marking it as such? Is there more left to do here now that bug 916264 is fixed?

Whiteboard: c= → [c=handeye p= s= u=]

Sotaro Ikeda [:sotaro]

Comment 68

•

11 years ago

(In reply to Mike Lee [:mlee] from comment #67)
> Per Sotaro's comment #66 this appears to be a duplicate of bug 916264. Any
> reason why we're not closing it and marking it as such? Is there more left to
> do here now that bug 916264 is fixed?

It seems to be a duplicate, but I have not confirmed that on a device. I fixed one bug with the same symptom, but there could be multiple paths to that symptom.

Ben Kelly [:bkelly, not reviewing]

Reporter

Comment 69

•

11 years ago

While testing another contacts issue yesterday, I noticed this does not reproduce for me anymore. Marking as duplicate.

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → DUPLICATE

Milan Sreckovic [:milan] (needinfo for best results)

Comment 70

•

11 years ago

This is a dupe of koi+ bug 916264.

blocking-b2g: koi? → koi+

Flags: needinfo?(praghunath)

Flags: needinfo?(akeybl)
