Closed Bug 881970 Opened 11 years ago Closed 11 years ago

buffer rotation and HwcComposer can cause jank under certain conditions

Categories

(Core :: Graphics: Layers, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

RESOLVED DUPLICATE of bug 916264
blocking-b2g koi+

People

(Reporter: bkelly, Unassigned)

Details

(Keywords: perf, regression, Whiteboard: [c=handeye p= s= u=])

Attachments

(4 files)

Recently I noticed a repeatable pause during scrolling within the contacts app on b2g. The jank lasts about one second each time.

To reproduce:

- Build and flash device using recent gaia/master and mozilla-central. (I am testing on a Buri.)
- Install contacts by running 'make reference-workload-heavy' in gaia.
- Launch contacts.
- To rule out background work, wait 10 to 20 seconds for the state to quiesce.
- Begin scrolling down.
- When the 'B' header replaces the 'A' header, there will be a one second pause.

This jank appears to happen whenever the header is replaced, whether through normal scrolling in either direction or by jump scrolling to a letter group.

I assumed this was a gaia issue at first and tried to bisect to find the problem. Surprisingly, the gaia rev did not affect the issue. Bisecting mozilla-central I was able to determine the following (large) starting range:

Bad: 134609:81b227f1a522
Good: 133758:6eac1d687575

The fixed header code in the contacts app does some CSS transformations using translateY(). I don't really know if that's related, but I put the bug in Layout for now.
Keywords: perf
Whiteboard: c=
New range: Bad: 134448:697190293f4e Good: 133758:6eac1d687575
Bad: 134366:8e7a612cc232 Good: 134019:0ee6e6d5918e
Bad: 134181:b2c600be7e90 Good: 134098:4c2dadbc0908
Ok, bisecting shows that the trouble commit is: http://hg.mozilla.org/mozilla-central/rev/8634a682e646
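The narrowing above is a plain binary search over local revision numbers. A minimal sketch of the idea, assuming a hypothetical is_bad(rev) callback that stands in for the manual build, flash, and scroll test; the lambda below simply encodes the outcome reported in this bug (134167:8634a682e646 is the first bad revision):

```python
def bisect_first_bad(good, bad, is_bad):
    """Return (first_bad_revision, steps) for the range (good, bad].

    Assumes `good` tests good, `bad` tests bad, and the regression
    persists once introduced. `is_bad` is a hypothetical callback that
    stands in for building, flashing, and scrolling the contacts list.
    """
    steps = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        steps += 1
        if is_bad(mid):
            bad = mid   # regression is at mid or earlier
        else:
            good = mid  # regression is after mid
    return bad, steps


# Starting range from this bug; the predicate encodes the known result.
first_bad, steps = bisect_first_bad(133758, 134609, lambda rev: rev >= 134167)
print(first_bad, steps)
```

Note that Mercurial local revision numbers are specific to a clone, so the hashes are the stable identifiers here.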
Blocks: 862952
So, the structure of the contacts app scrollable region looks like:

<div>
  <section>
    <section>
      <abbr>A</abbr>
      <ol>
        <li>contact 1</li>
        <li>contact 2</li>
      </ol>
    </section>
    <section>
      <abbr>B</abbr>
      <ol>
        <li>contact 3</li>
        <li>contact 4</li>
      </ol>
    </section>
  </section>
</div>

The app uses translateY() to keep the <abbr> title at the top of the view while scrolling within the associated ordered list of contacts.

Do the sections correspond to layers within the gfx subsystem? I'm trying to understand how scrolling from one section to the next could be triggering a problem with this patch.

Nick, Diego, Benoit: Any ideas or suggestions?

Thanks!
Flags: needinfo?(ncameron)
Flags: needinfo?(dwilson)
Flags: needinfo?(bjacob)
Ben, is there anything interesting in logcat?
Flags: needinfo?(dwilson)
(In reply to Diego Wilson [:diego] from comment #6)
> Is there anything interesting in logcat?

Nope. I did add some debug to HwcComposer2D::PrepareLayerList() and the various sizes coming back seem reasonable:

### ### HwcComposer2D::PrepareLayerList() - surfaceSize=273:395
### ### HwcComposer2D::PrepareLayerList() - state msize=273:395
### ### GrallocTextureHostOGL::GetRenderState() - EARLY EXIT
### ### HwcComposer2D::PrepareLayerList() - surfaceSize=320:65
### ### HwcComposer2D::PrepareLayerList() - state msize=320:65
### ### GrallocTextureHostOGL::GetRenderState() - EARLY EXIT
### ### HwcComposer2D::PrepareLayerList() - surfaceSize=320:50
### ### HwcComposer2D::PrepareLayerList() - state msize=320:50
### ### GrallocTextureHostOGL::GetRenderState() - EARLY EXIT
### ### HwcComposer2D::PrepareLayerList() - surfaceSize=320:410
### ### HwcComposer2D::PrepareLayerList() - state msize=320:410
### ### GrallocTextureHostOGL::GetRenderState() - EARLY EXIT
### ### HwcComposer2D::PrepareLayerList() - surfaceSize=320:3
### ### HwcComposer2D::PrepareLayerList() - state msize=320:3
### ### GrallocTextureHostOGL::GetRenderState() - EARLY EXIT
### ### HwcComposer2D::PrepareLayerList() - surfaceSize=320:480
### ### HwcComposer2D::PrepareLayerList() - state msize=320:480
IIRC scrolling uses a feature called "buffer rotation" which HwcComposer2D does not support. It's supposed to fall back to GPU rendering here:

https://mxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D.cpp#356

It's possible that this flag is not being set and Hwc renders the frame unrotated (i.e. unscrolled).
(In reply to Diego Wilson [:diego] from comment #8)
> IIRC scrolling uses a feature caller "buffer rotation" which HwcComposer2D does not support. It's supposed to fall back to GPU rendering here:
>
> https://mxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D.cpp#356
>
> It's possible that this flag is not being set and Hwc renders the frame unrotated (ie unscrolled)

This does look to be the issue. The jank goes away if I force that short-circuit logic to always return from HwcComposer2D::PrepareLayerList() regardless of flags.

Adding some debug shows that during normal scrolling we are getting a mix of states and some have the flag set:

### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x2
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x2

When the letter group boundary is reached there are some additional layers without the flag set:

### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x2
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x2
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x2
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x0
### ### HwcComposer2D::PrepareLayerList() - state.mFlags = 0x2
Not sure if I am tracing this right, but it appears that the flag is driven by the value in ContentClient::mBufferRotation. If this point is 0,0 then the flag is not set. Any non-zero value results in the flag being set.

When the jank occurs I see mBufferRotation getting set to zero in two locations repeatedly:

- https://mxr.mozilla.org/mozilla-central/source/gfx/layers/client/ContentClient.cpp#343
- https://mxr.mozilla.org/mozilla-central/source/gfx/layers/client/ContentClient.cpp#431

These two locations also periodically set it to zero while scrolling normally, though. I'm not familiar enough with the code to know if these are really related to the jank or not.

Any thoughts?
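If that reading is right, the relationship between the rotation point, the flag, and the Hwc fallback can be sketched as below. This is an illustration only: the function and constant names are hypothetical, the 0x2 value is simply what shows up in the debug output in this bug, and the real logic lives in ContentClient.cpp and HwcComposer2D.cpp.

```python
BUFFER_ROTATED = 0x2  # assumed bit, matching the 0x2 seen in the debug output


def render_state_flags(rotation):
    """Flags for a layer whose buffer rotation point is (x, y).

    The flag is clear only when the rotation point is exactly (0, 0).
    """
    x, y = rotation
    return BUFFER_ROTATED if (x, y) != (0, 0) else 0x0


def hwc_can_compose(layer_rotations):
    """Hwc takes the frame only if no layer has a rotated buffer.

    Otherwise the frame is expected to fall back to GPU composition.
    """
    return all(
        (render_state_flags(rotation) & BUFFER_ROTATED) == 0
        for rotation in layer_rotations
    )


print(hwc_can_compose([(0, 0), (0, 0)]))   # every layer unrotated
print(hwc_can_compose([(0, 0), (0, 12)]))  # one layer rotated
```

Under this model, a frame in which every layer reports 0x0 is handed to Hwc, which would match the run of unflagged layers logged at the letter group boundary.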
I tried disabling the style transform using translateY to keep the header at the top of the page. Without this in the mix there is also no jank. So it appears that using translateY while scrolling can prevent the buffer rotation flag from being set.
(In reply to Ben Kelly [:bkelly] from comment #5)
> So, the structure of the contacts app scrollable region looks like:
>
> <div>
>   <section>
>     <section>
>       <abbr>A</abbr>
>       <ol>
>         <li>contact 1</li>
>         <li>contact 2</li>
>       </ol>
>     </section>
>     <section>
>       <abbr>B</abbr>
>       <ol>
>         <li>contact 3</li>
>         <li>contact 4</li>
>       </ol>
>     </section>
>   </section>
> </div>
>
> The app uses translateY() to keep the <abbr> title at the top of the view while scrolling within the associated ordered list of contacts.
>
> Do the sections correspond to layers within the gfx subsystem? I'm trying to understand how scrolling from one section to the next could be triggering a problem with this patch.
>
> Nick, Diego, Benoit: Any ideas or suggestions?
>
> Thanks!

It is a bit more complicated than mapping sections to layers - certainly not every section/div will get a layer, in fact we will try to minimise the number of layers created. Furthermore, unless there is good reason, layers will be squished together for compositing, so before Hwc gets to see things, this could all be a single layer.

Scrolling usually forces a new layer, so whichever element gets scrolled will probably start a new layer and everything under it will be on that layer. But using translateY might not create a new layer and so the layer has to be redrawn every frame (rather than just moved if the contents don't change which would not require redrawing). This could well cause jank.

Is it possible to use position:fixed on the abbr elements rather than translateY? If not then setting a very small perspective 3d value for the abbr element should force it to get its own layer and might cure the jank that way (be aware though that creating an extra layer has downsides too - we will use more memory and rendering can sometimes be slower, so I advise against using this trick unless it is really necessary).
Flags: needinfo?(ncameron)
So looks like the above theory is not the cause because I missed that this is a regression. One thing to try is changing this line http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ThebesLayerBuffer.cpp#449 to |bool canHaveRotation = false;| That will disable buffer rotation everywhere, and if that gets rid of the jank, then we know to blame buffer rotation, if it increases the jank, then maybe it is a problem with Hwc somewhere. If nothing changes then buffer rotation has nothing to do with it!
It might still be worth trying position:fixed rather than translateY to see if forcing a layer solves the problem. Even if it is not a good solution it will give us more info. Using translateY by a different amount each frame as we scroll will force redrawing which will mean no buffer rotation. We should try to avoid that in general, but this might be a special case for some reason. Sorry I can't jump in and help more directly - I don't have a device which supports Hwc, so I can't observe the jank. Diego - is there any overhead from switching from the Hwc path to the OpenGL path, or is that pretty cheap?
Flags: needinfo?(dwilson)
(In reply to Nick Cameron [:nrc] from comment #14)
> Diego - is there any overhead from switching from the Hwc path to the OpenGL path, or is that pretty cheap?

Super cheap. They happen all the time and there's never been any known latency problem with switching.
Flags: needinfo?(dwilson)
(In reply to Nick Cameron [:nrc] from comment #13)
> One thing to try is changing this line http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ThebesLayerBuffer.cpp#449 to |bool canHaveRotation = false;|
>
> That will disable buffer rotation everywhere, and if that gets rid of the jank, then we know to blame buffer rotation, if it increases the jank, then maybe it is a problem with Hwc somewhere. If nothing changes then buffer rotation has nothing to do with it!

Thanks Nick!

I made the recommended change in ThebesLayerBuffer to force buffer rotation to be off and this got rid of the jank.
Component: Layout → Graphics: Layers
Summary: [b2g][contacts] jank during contacts app scrolling from one letter section to the next → buffer rotation can cause jank under certain conditions
(In reply to Ben Kelly [:bkelly] from comment #16)
> (In reply to Nick Cameron [:nrc] from comment #13)
> > One thing to try is changing this line http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ThebesLayerBuffer.cpp#449 to |bool canHaveRotation = false;|
> >
> > That will disable buffer rotation everywhere, and if that gets rid of the jank, then we know to blame buffer rotation, if it increases the jank, then maybe it is a problem with Hwc somewhere. If nothing changes then buffer rotation has nothing to do with it!
>
> Thanks Nick!
>
> I made the recommended change in ThebesLayerBuffer to force buffer rotation to be off and this got rid of the jank.

Just to clarify, that implicates buffer rotation, but the cause of the jank is some interaction between Hwc and buffer rotation since we didn't have jank before Hwc was enabled.
Summary: buffer rotation can cause jank under certain conditions → buffer rotation and HwcComposer can cause jank under certain conditions
OK, I have requested hardware so I can investigate this sort of thing. If this is not urgent I can take this bug once the hardware arrives.
(In reply to Nick Cameron [:nrc] from comment #18)
> OK, I have requested hardware so I can investigate this sort of thing. If this is not urgent I can take this bug once the hardware arrives.

Thanks Nick! I was in the middle of another bug when I ran into this, so I think I will leave it to you for now. If I finish my other work and you still don't have a device then I will circle back.
Flags: needinfo?(bkelly)
(In reply to Ben Kelly [:bkelly] from comment #19)
> (In reply to Nick Cameron [:nrc] from comment #18)
> > OK, I have requested hardware so I can investigate this sort of thing. If this is not urgent I can take this bug once the hardware arrives.
>
> Thanks Nick! I was in the middle of another bug when I ran into this, so I think I will leave it to you for now. If I finish my other work and you still don't have a device then I will circle back.

Sounds good. It might be a couple of weeks before I get a device though...
I really don't know much about scrolling and buffer rotation; did you want :BenWa ?
Flags: needinfo?(bjacob)
Nominating koi? since it produces bad jank on Buri with mozilla-central.
blocking-b2g: --- → koi?
Flags: needinfo?(bkelly)
Flags: needinfo?(bkelly)
Nick, any luck getting a buri to test with? Mike, anything we can do to help get Nick or someone in the gfx team a device?
Flags: needinfo?(ncameron)
Flags: needinfo?(mlee)
Apparently an Inari got dispatched to me yesterday (finally) so it should be with me in the next few days.
Flags: needinfo?(ncameron)
That's good news Nick. Thanks! I assume an Inari has the proper hwc hardware?
Flags: needinfo?(mlee)
(In reply to Ben Kelly [:bkelly] from comment #25)
> Thats good news Nick. Thanks!
>
> I assume an Inari has the proper hwc hardware?

I was told that it did. I hope that is true otherwise the whole month-long ordeal of getting another b2g phone will have been for naught.
Flags: needinfo?(bkelly)
Keywords: regression
Attached image 2013-07-25-20-21-01.png
Attached image 2013-07-25-20-21-12.png
Attached image 2013-07-25-20-21-19.png
Attached image 2013-07-25-20-21-25.png
The 4 attachments provide screenshots at various points during scrolling with layers.draw-borders enabled. This is with the pre-regression code at rev 134166:12cdc8931e48.

When moving to rev 134167:8634a682e646, which shows the regression, the behavior of layers.draw-borders changes. At times the borders disappear. In particular, the borders are not drawn exactly when transitioning from one title to the next, which is also when the jank occurs. Otherwise the layers look similar to the screenshots above.

Nick, does this help narrow the problem at all?
Flags: needinfo?(ncameron)
(In reply to Ben Kelly [:bkelly] from comment #31)
> The 4 attachments provide screenshots at various point during scrolling with layers.draw-borders enabled. This is with the pre-regression code at rev 134166:12cdc8931e48.
>
> When moving to rev 134167:8634a682e646, which shows the regression, the behavior of layers.draw-borders changes. At times the borders disappear. In particular, the borders are not drawn exactly when transitioning from one title to the next which is also when the jank occurs. Otherwise the layers look similar to the screenshots above.
>
> Nick, does this help narrow the problem at all?

Drawing layer borders only works when we are using our own Compositor. If we are using Hwc, then we do not draw borders. I'm not sure how the drawing works, and therefore why we have any borders at all if we are using Hwc here; they are probably written back into the GL textures, and so Hwc is rendering old borders.

BTW, I am working on this now. Sorry I didn't have any time to get to this earlier.
Flags: needinfo?(ncameron)
(In reply to Ben Kelly [:bkelly] from comment #4)
> Ok, bisecting shows that the trouble commit is:
>
> http://hg.mozilla.org/mozilla-central/rev/8634a682e646

This patch just re-enabled Hwc which got accidentally turned off by the layers refactoring. So this regression could be caused by any changes between then and the layers refactoring landing including the refactoring itself (which is highly likely).
Thanks Nick! Just to clarify, have you been able to reproduce on your Inari?
(In reply to Ben Kelly [:bkelly] from comment #34)
> Thanks Nick!
>
> Just to clarify, have you been able to reproduce on your Inari?

Sadly, I can no longer build for my Inari. My build fails with 'system.img too large'. I haven't found anyone on #b2g to help. (I did manage to build for and flash my Inari a week or so ago, but now that I actually have time to investigate, it no longer works).
How big is your system.img? For my Buri:

ls -lh out/target/product/hamachi/system.img
-rw------- 1 bkelly bkelly 105M Jul 29 13:52 out/target/product/hamachi/system.img

Is this a debug build? Does it work if you set B2G_DEBUG=0 in your .userconfig?

Also, how much space is on your phone? Again, for my Buri:

adb shell df
Filesystem          Size   Used   Free   Blksize
/dev                 90M    48K    90M   4096
/mnt/asec            90M     0K    90M   4096
/mnt/obb             90M     0K    90M   4096
/system             200M   113M    86M   4096
/data               161M    59M   101M   4096
/persist              4M   780K     3M   4096
/cache               40M     1M    38M   4096
/mnt/sdcard           7G   237M     7G   32768
/mnt/secure/asec      7G   237M     7G   32768

I'm going to be mostly out this afternoon and tomorrow, so thought I would post these ideas here. If that doesn't help maybe we can talk more on IRC next week. :mwu would also be a good person to ask.

Thanks!
Oh, and you get the 'system.img too large' during ./build.sh or during ./flash.sh?
Flags: needinfo?(ncameron)
The error is during ./build.sh.

Doing a fresh build to get the system.img size.

Doing a non-debug build. I guess that still includes debug symbols though, as on desktop?

Filesystem   Size   Used   Free   Blksize
/dev          86M   100K    86M   4096
/mnt/asec     86M     0K    86M   4096
/mnt/obb      86M     0K    86M   4096
/system      234M   114M   119M   4096
/data        152M    61M    91M   4096
/persist       1M   772K   764K   4096
/cache        57M     1M    56M   4096
Flags: needinfo?(ncameron)
OK, now it works. Which is weird. I presume because I updated my repo. For the record the size of my system.img is 99217536, so I'm betting I went over the 100MB limit which I think Fastboot or something has. Well, the good news is I can actually now look at this bug.
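For scale, the reported size can be converted as below. A small sketch only: the figure is taken to be in bytes, the 100 MB ceiling is the guess made above rather than a confirmed limit, and it is not known whether such a limit would be decimal or binary, so both units are shown.

```python
SYSTEM_IMG_BYTES = 99217536  # size reported above, taken to be bytes

mb = SYSTEM_IMG_BYTES / 10**6   # decimal megabytes
mib = SYSTEM_IMG_BYTES / 2**20  # binary mebibytes
print(f"{mb:.1f} MB, {mib:.1f} MiB")

ASSUMED_LIMIT_MB = 100  # hypothetical ceiling, per the guess above
headroom_bytes = ASSUMED_LIMIT_MB * 10**6 - SYSTEM_IMG_BYTES
print(f"headroom under a decimal 100 MB limit: {headroom_bytes} bytes")
```

If the ceiling really is a decimal 100 MB, the working image sits within about 1% of it, so a small change in the tree could plausibly push a build over.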
Hmm, I can't repro the jank. I'll debug tomorrow to see if I am in fact using Composer2D and buffer rotation.
My fear is this is a Buri specific issue. For example, I've experienced other hwc related problems such as in bug 900029 and bug 901395. Also, since it is not easily reproduced by others, perhaps it's specific to a particular production run of the phone. I'm not sure how much variation in the hardware there is.
Please ignore comment 41. I was able to resolve those issues by updating my firmware to the latest vendor version, but this did not help this problem.

I do have some additional logcat information, though:

D/HWComposer( 2238): ThebesLayerComposite Layer has a rotated buffer
D/HWComposer( 2238): Render aborted. Nothing was drawn to the screen
D/HWComposer( 2238): ThebesLayerComposite Layer has a rotated buffer
D/HWComposer( 2238): Render aborted. Nothing was drawn to the screen
D/HWComposer( 2238): ThebesLayerComposite Layer has a rotated buffer
D/HWComposer( 2238): Render aborted. Nothing was drawn to the screen
D/HWComposer( 2238): Frame rendered
D/HWComposer( 2238): Frame rendered
E/libgenlock( 2381): perform_lock_unlock_operation: GENLOCK_IOC_DREADLOCK failed (lockType0x1, err=Connection timed out fd=62)
E/msm7627a.gralloc( 2381): gralloc_lock: genlock_lock_buffer (lockType=0x2) failed
W/GraphicBufferMapper( 2381): lock(...) failed -22 (Invalid argument)
D/HWComposer( 2238): ThebesLayerComposite Layer has a rotated buffer
D/HWComposer( 2238): Render aborted. Nothing was drawn to the screen
D/HWComposer( 2238): ThebesLayerComposite Layer has a rotated buffer
D/HWComposer( 2238): Render aborted. Nothing was drawn to the screen

The perform_lock_unlock_operation line comes out exactly when the jank occurs.
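For anyone trying to correlate the jank with the lock failure over a longer capture, a filter along these lines may help. A minimal sketch, assuming the brief logcat format shown in this comment ("E/tag( pid): message"); the function name is made up for illustration and the sample lines are copied from the output above.

```python
import re

# Brief logcat format as seen above, e.g. "E/libgenlock( 2381): message".
LOGCAT_LINE = re.compile(
    r"^(?P<level>[VDIWEF])/(?P<tag>[^(]+)\(\s*(?P<pid>\d+)\):\s?(?P<msg>.*)$"
)


def genlock_failures(lines):
    """Yield (pid, message) for each error line logged by libgenlock."""
    for line in lines:
        match = LOGCAT_LINE.match(line.strip())
        if match and match["level"] == "E" and match["tag"].strip() == "libgenlock":
            yield int(match["pid"]), match["msg"]


sample = [
    "D/HWComposer( 2238): Render aborted. Nothing was drawn to the screen",
    "D/HWComposer( 2238): Frame rendered",
    "E/libgenlock( 2381): perform_lock_unlock_operation: GENLOCK_IOC_DREADLOCK failed (lockType0x1, err=Connection timed out fd=62)",
    "W/GraphicBufferMapper( 2381): lock(...) failed -22 (Invalid argument)",
]
failures = list(genlock_failures(sample))
print(failures)
```

In the excerpt above the failure is logged by pid 2381 while the HWComposer lines come from pid 2238, so keeping the pid in the result makes it easier to see which process timed out waiting for the lock.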
(In reply to Ben Kelly [:bkelly] from comment #41)
> My fear is this is a Buri specific issue. For example, I've experienced other hwc related problems such as in bug 900029 and bug 901395. Also, since it is not easily reproduced by others, perhaps its specific to a particular production run of the phone. I'm not sure how much variation in the hardware there is.

Perhaps it is due to doing a full flash.sh? I have only flash.sh gecko and gaia on my phone. Has anyone repro'd this on a phone without a full flash?
(In reply to Nick Cameron [:nrc] from comment #43)
> Perhaps it is due to doing a full flash.sh? I have only flash.sh gecko and gaia on my phone. Has anyone repro'd this on a phone without a full flash?

Unfortunately I reproduced the problem only doing a ./flash.sh gecko. I did not do a full flash this time.

Does the GENLOCK_IOC_DREADLOCK failure offer any clues?
(In reply to Ben Kelly [:bkelly] from comment #44)
>
> Does the GENLOCK_IOC_DREADLOCK failure offer any clues?

It seems similar to Bug 898919. In that bug, the genlock failure always happened in the following situation. See Bug 898919 comment #44.

- When a genlock failure happened, there was always the same pattern.
  + ALL of the last OpenGL-rendered video frame buffers before the genlock failure were also rendered by Hw Composer.
(In reply to Ben Kelly [:bkelly] from comment #42)
> E/libgenlock( 2381): perform_lock_unlock_operation: GENLOCK_IOC_DREADLOCK failed (lockType0x1, err=Connection timed out fd=62)
> E/msm7627a.gralloc( 2381): gralloc_lock: genlock_lock_buffer (lockType=0x2) failed
> W/GraphicBufferMapper( 2381): lock(...) failed -22 (Invalid argument)

Triggering this on a debug build produces:

F/MOZ_Assert( 633): Assertion failure: status == OK, at /srv/mozilla-central/gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp:463
F/libc ( 633): Fatal signal 11 (SIGSEGV) at 0x00000000 (code=1)

Although I guess that is not too helpful if the true problem is that the lock is being held somewhere.
For what it's worth, here is the stack trace for that assert.

#0 0x4182aba2 in mozilla::layers::ShadowLayerForwarder::PlatformOpenDescriptor (aMode=mozilla::layers::OPEN_READ_WRITE, aSurface=...) at /srv/mozilla-central/gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp:463
#1 0x41869504 in mozilla::layers::ShadowLayerForwarder::OpenDescriptor (aMode=117, aSurface=...) at /srv/mozilla-central/gfx/layers/ipc/ShadowLayers.cpp:569
#2 0x4186d2ee in mozilla::layers::DeprecatedTextureClientShmem::GetSurface (this=0x455cd430) at /srv/mozilla-central/gfx/layers/client/TextureClient.cpp:366
#3 0x4186d35e in mozilla::layers::DeprecatedTextureClientShmem::LockSurface (this=0x455cd430) at ../../dist/include/mozilla/layers/TextureClient.h:441
#4 0x4187185e in mozilla::layers::ThebesLayerBuffer::EnsureBuffer (this=0x4509c928) at /srv/mozilla-central/gfx/layers/ThebesLayerBuffer.cpp:410
#5 0x418718d2 in mozilla::layers::ThebesLayerBuffer::GetContextForQuadrantUpdate (this=0x455cd430, aBounds=..., aSource=123, aTopLeft=0x0) at /srv/mozilla-central/gfx/layers/ThebesLayerBuffer.cpp:302
#6 0x4184bf20 in mozilla::layers::ContentClientDoubleBuffered::UpdateDestinationFrom (this=0x4509c900, aSource=..., aUpdateRegion=...) at /srv/mozilla-central/gfx/layers/client/ContentClient.cpp:497
#7 0x4184c31c in mozilla::layers::ContentClientDoubleBuffered::SyncFrontBufferToBackBuffer (this=0x4509c900) at /srv/mozilla-central/gfx/layers/client/ContentClient.cpp:485
#8 0x4183de02 in mozilla::layers::ClientThebesLayer::PaintThebes (this=0x443dc900) at /srv/mozilla-central/gfx/layers/client/ClientThebesLayer.cpp:57
#9 0x4183e3c4 in mozilla::layers::ClientThebesLayer::RenderLayer (this=0x443dc900) at /srv/mozilla-central/gfx/layers/client/ClientThebesLayer.cpp:123
#10 0x4183c1d8 in ClientContainerLayer::RenderLayer (this=0x463bb800) at /srv/mozilla-central/gfx/layers/client/ClientContainerLayer.h:191
#11 0x4183c1d8 in ClientContainerLayer::RenderLayer (this=0x45077400) at /srv/mozilla-central/gfx/layers/client/ClientContainerLayer.h:191
#12 0x4183cf58 in mozilla::layers::ClientLayerManager::EndTransactionInternal (this=0x44850100, aCallback=0x409e99e9 <mozilla::FrameLayerBuilder::DrawThebesLayer(mozilla::layers::ThebesLayer*, gfxContext*, nsIntRegion const&, nsIntRegion const&, void*)>, aCallbackData=<value optimized out>) at /srv/mozilla-central/gfx/layers/client/ClientLayerManager.cpp:176
#13 0x4183d84e in mozilla::layers::ClientLayerManager::EndTransaction (this=0x44850100, aCallback=0x409e99e9 <mozilla::FrameLayerBuilder::DrawThebesLayer(mozilla::layers::ThebesLayer*, gfxContext*, nsIntRegion const&, nsIntRegion const&, void*)>, aCallbackData=0xbef334a8, aFlags=mozilla::layers::LayerManager::END_NO_COMPOSITE) at /srv/mozilla-central/gfx/layers/client/ClientLayerManager.cpp:199
#14 0x40a1ff5c in nsDisplayList::PaintForFrame (this=<value optimized out>, aBuilder=0xbef334a8, aCtx=<value optimized out>, aForFrame=<value optimized out>, aFlags=13) at /srv/mozilla-central/layout/base/nsDisplayList.cpp:1190
#15 0x40a20168 in nsDisplayList::PaintRoot (this=0xbef33840, aBuilder=0xbef334a8, aCtx=0x0, aFlags=13) at /srv/mozilla-central/layout/base/nsDisplayList.cpp:1051
#16 0x40a3a5c6 in nsLayoutUtils::PaintFrame (aRenderingContext=<value optimized out>, aFrame=0x44bac298, aDirtyRegion=<value optimized out>, aBackstop=<value optimized out>, aFlags=772) at /srv/mozilla-central/layout/base/nsLayoutUtils.cpp:2126
#17 0x40a4eb34 in PresShell::Paint (this=0x40493630, aViewToPaint=<value optimized out>, aDirtyRegion=<value optimized out>, aFlags=1) at /srv/mozilla-central/layout/base/nsPresShell.cpp:5605
#18 0x40e451ae in nsViewManager::ProcessPendingUpdatesForView (this=0x44ba2730, aView=0x44ba0880, aFlushDirtyRegion=<value optimized out>) at /srv/mozilla-central/view/src/nsViewManager.cpp:410
#19 0x40e45264 in nsViewManager::ProcessPendingUpdates (this=<value optimized out>) at /srv/mozilla-central/view/src/nsViewManager.cpp:1031
#20 0x40a5981e in nsRefreshDriver::Tick (this=0x17c0174, aNowEpoch=282711927, aNowTime=...) at /srv/mozilla-central/layout/base/nsRefreshDriver.cpp:1233
#21 0x40a59d92 in mozilla::RefreshDriverTimer::TickDriver (aTimer=<value optimized out>, aClosure=<value optimized out>) at /srv/mozilla-central/layout/base/nsRefreshDriver.cpp:171
#22 mozilla::RefreshDriverTimer::Tick (aTimer=<value optimized out>, aClosure=<value optimized out>) at /srv/mozilla-central/layout/base/nsRefreshDriver.cpp:163
#23 mozilla::RefreshDriverTimer::TimerTick (aTimer=<value optimized out>, aClosure=<value optimized out>) at /srv/mozilla-central/layout/base/nsRefreshDriver.cpp:188
#24 0x417c221c in nsTimerImpl::Fire (this=0x404a06a0) at /srv/mozilla-central/xpcom/threads/nsTimerImpl.cpp:544
#25 0x417c240a in nsTimerEvent::Run (this=0x448ec190) at /srv/mozilla-central/xpcom/threads/nsTimerImpl.cpp:628
#26 0x417be728 in nsThread::ProcessNextEvent (this=0x40402390, mayWait=<value optimized out>, result=0xbef33e4f) at /srv/mozilla-central/xpcom/threads/nsThread.cpp:622
#27 0x417861ca in NS_ProcessNextEvent (thread=0x40402390, mayWait=false) at /srv/mozilla-central/objdir-gonk-hamachi-debug-m-c/xpcom/build/nsThreadUtils.cpp:238
#28 0x413cbcf8 in mozilla::ipc::MessagePump::Run (this=0x40401bb0, aDelegate=0xbef3490c) at /srv/mozilla-central/ipc/glue/MessagePump.cpp:81
#29 0x413cbe78 in mozilla::ipc::MessagePumpForChildProcess::Run (this=0x40401bb0, aDelegate=0xbef3490c) at /srv/mozilla-central/ipc/glue/MessagePump.cpp:234
#30 0x417eb7fe in MessageLoop::RunInternal (this=0xbef3490c) at /srv/mozilla-central/ipc/chromium/src/base/message_loop.cc:220
#31 0x417eb816 in MessageLoop::RunHandler (this=0xbef3490c) at /srv/mozilla-central/ipc/chromium/src/base/message_loop.cc:213
#32 MessageLoop::Run (this=0xbef3490c) at /srv/mozilla-central/ipc/chromium/src/base/message_loop.cc:187
#33 0x4135280e in nsBaseAppShell::Run (this=0x4438d280) at /srv/mozilla-central/widget/xpwidgets/nsBaseAppShell.cpp:163
#34 0x407d0866 in XRE_RunAppShell () at /srv/mozilla-central/toolkit/xre/nsEmbedFunctions.cpp:676
#35 0x413cbde2 in mozilla::ipc::MessagePumpForChildProcess::Run (this=0x40401bb0, aDelegate=0xbef3490c) at /srv/mozilla-central/ipc/glue/MessagePump.cpp:201
#36 0x417eb7fe in MessageLoop::RunInternal (this=0xbef3490c) at /srv/mozilla-central/ipc/chromium/src/base/message_loop.cc:220
#37 0x417eb816 in MessageLoop::RunHandler (this=0xbef3490c) at /srv/mozilla-central/ipc/chromium/src/base/message_loop.cc:213
#38 MessageLoop::Run (this=0xbef3490c) at /srv/mozilla-central/ipc/chromium/src/base/message_loop.cc:187
#39 0x407d1140 in XRE_InitChildProcess (aArgc=2, aArgv=0xbef34a20, aProcess=1078199296) at /srv/mozilla-central/toolkit/xre/nsEmbedFunctions.cpp:513
#40 0x00008786 in main (argc=7, argv=0xbef34aa4) at /srv/mozilla-central/ipc/app/MozillaRuntimeMain.cpp:85
In case anyone else hits this, the easiest way to work around the problem at the moment is to add the following to your gaia/build/custom-prefs.js:

pref("layers.bufferrotation.enabled", false);

And then do a make reset-gaia.
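For anyone scripting that workaround across several checkouts, the pref line can be added idempotently. A minimal sketch; ensure_pref is a made-up helper, and the only things taken from this bug are the pref line itself and the gaia/build/custom-prefs.js path.

```python
from pathlib import Path

PREF_LINE = 'pref("layers.bufferrotation.enabled", false);'


def ensure_pref(prefs_path, line=PREF_LINE):
    """Append `line` to the prefs file unless it is already present.

    Creates the file if it does not exist. Returns True if the file
    was modified, False if the line was already there.
    """
    path = Path(prefs_path)
    existing = path.read_text() if path.exists() else ""
    if line in existing.splitlines():
        return False
    # Keep the new pref on its own line even if the file lacks a trailing newline.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    path.write_text(existing + separator + line + "\n")
    return True
```

Calling ensure_pref("gaia/build/custom-prefs.js") from the directory that contains the gaia checkout, followed by make reset-gaia, would correspond to the manual steps described above.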
Is it significant or unexpected that we are going through DeprecatedTextureClientShmem here? Any hints on how to track down where the lock is being held in the graphics subsystem would be great.
Flags: needinfo?(ncameron)
So, I can still reproduce this even with buffer rotation disabled now.

When I first open contacts scrolling works fine; no jank.

I then press the home button and then go back into contacts. Sometimes I get logcat output like:

D/HWComposer( 140): Frame rendered
E/copybit ( 140): copyBits failed (Operation not permitted)
E/copybit ( 140): 0: src={w=320, h=480, f=1, rect={0,0,320,460}}
E/copybit ( 140): dst={w=80, h=114, f=14, rect={0,0,80,114}}
E/copybit ( 140): flags=00020000
E/msm7627a.hwcomposer( 140): drawLayerUsingCopybit:1676::tmp copybit stretch failed
E/copybit ( 140): copyBits failed (Operation not permitted)
E/copybit ( 140): 0: src={w=288, h=384, f=1, rect={0,0,263,381}}
E/copybit ( 140): dst={w=64, h=94, f=14, rect={0,0,64,94}}
E/copybit ( 140): flags=00020000

And then further down:

D/HWComposer( 140): ThebesLayerComposite Layer doesn't have a gralloc buffer
D/HWComposer( 140): Render aborted. Nothing was drawn to the screen

And then when I scroll past a header transition as originally reported I get the jank and the lock failure again:

E/libgenlock( 486): perform_lock_unlock_operation: GENLOCK_IOC_DREADLOCK failed (lockType0x1, err=Connection timed out fd=47)
E/msm7627a.gralloc( 486): gralloc_lock: genlock_lock_buffer (lockType=0x2) failed
W/GraphicBufferMapper( 486): lock(...) failed -22 (Invalid argument)
I also sometimes see this at various times:

E/memalloc( 140): /dev/pmem: No more pmem available
E/msm7627a.gralloc( 140): gralloc failed err=Out of memory
W/GraphicBufferAllocator( 140): alloc(640, 960, 2, 00000133, ...) failed -12 (Out of memory)
(In reply to Ben Kelly [:bkelly] from comment #51)
> I also sometimes see this at various times:
>
> E/memalloc( 140): /dev/pmem: No more pmem available
> E/msm7627a.gralloc( 140): gralloc failed err=Out of memory
> W/GraphicBufferAllocator( 140): alloc(640, 960, 2, 00000133, ...) failed
> -12 (Out of memory)
This sounds like a side-effect of the earlier genlock failures. Have you tried Sotaro's compositionComplete() patch in bug 898919?
(In reply to Diego Wilson [:diego] from comment #52)
> This sounds like a side-effect of the earlier genlock failures. Have you
> tried Sotaro's compositionComplete() patch in bug 898919?
I just tried the patch and it unfortunately did not help. Sotaro did indicate it was for b2g18, however, and I am running on m-c. The patch did apply cleanly and compile, though.
(In reply to Ben Kelly [:bkelly] from comment #49)
> Is it significant or unexpected that we are going through
> DeprecatedTextureClientShmem here?
>
It is expected. They are not deprecated, they are just badly named (there are new texture clients, but they are not used yet).
> Any hints on how to track down where the lock is being held in the graphics
> subsystem would be great.
I don't know, sorry. bjacob is our man for tracking down gralloc locking errors. bjacob, could you take a look at this please?
Flags: needinfo?(ncameron) → needinfo?(bjacob)
Here is what I know about tracking down genlock failures. I use this patch: http://people.mozilla.org/~bjacob/genlock-logging which applies to some directory under the B2G repo, IIRC under vendor/qcom (genlock is qualcomm-only).
This patch records all genlock activity to /data/local/tmp/b2glog, and aborts on the first genlock failure. The log then generally contains the info you need to understand the failure.
This patch has a flaw: typically the file gets first created by the main b2g process with root permissions, so subsequent attempts to append to it by other processes fail, causing them to abort. To prevent/fix that problem, just create this /data/local/tmp/b2glog file ahead of time on the device with 0777 permissions, or just edit this patch so that it chmod's the file with 0777 right after opening it, in case that opening was also the creation of the file.
I'm happy to look at a resulting b2glog file. I'd also like to help more directly, but my b2g tree is non-qualcomm at the moment (I'm on emulator), and I'm assigned short term emergencies (bug 905214) for the rest of this cycle... let me know if I can answer more questions at least.
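The second fix suggested above (chmod the log right after opening it, but only when that open also created it) can be sketched as follows. This is an illustration of the logic in Python, not the actual change to the genlock-logging patch, which is not shown in this bug; the function name `open_shared_log` is hypothetical.

```python
import os


def open_shared_log(path):
    """Open path for appending; if this call created the file, make it world-writable.

    Returns (fd, created).
    """
    flags = os.O_WRONLY | os.O_APPEND
    try:
        # O_EXCL makes creation explicit: this open succeeds only if the
        # file did not exist yet, so we know whether we are the creator.
        fd = os.open(path, flags | os.O_CREAT | os.O_EXCL, 0o777)
        created = True
    except FileExistsError:
        # Someone else created it first; just open it for appending.
        fd = os.open(path, flags)
        created = False
    if created:
        # The mode passed to open() is filtered by the process umask,
        # so set the permissions explicitly on the open descriptor.
        os.fchmod(fd, 0o777)
    return fd, created
```

With this shape, whichever process creates the file first leaves it appendable by the others. Without the explicit chmod, the second `os.open` in a less privileged process would fail with a permission error, which matches the failure mode described above.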
Flags: needinfo?(bjacob)
Oh yes, and once you've applied this patch, you need to re-run ./build.sh and MANUALLY push the resulting libgenlock.so to the device, because by default the one we push to the device is NOT the one we build; instead we push a prebuilt vendor binary. The right libgenlock.so is mentioned in the build log, as it gets built early during ./build.sh.
(In reply to Ben Kelly [:bkelly] from comment #48)
> In case anyone else hits this, the easiest way to work around the problem at
> the moment is to add the following to your gaia/build/custom-prefs.js:
>
> pref("layers.bufferrotation.enabled", false);
>
> And then do a make reset-gaia.
While this does help a bit, it is still possible to run into this problem even with buffer rotation disabled. I tried setting layers.acceleration.disabled to true, but b2g dies with SIGSEGV in that case.
Retesting now that bug 905304 is fixed in m-c.
Unfortunately, I was still able to provoke the GENLOCK_IOC_DREADLOCK failure with m-c at 143295:d136c8999d96.
Here are simple instructions for reproducing on the buri:
1) Flash mozilla-central on the phone.
2) From gaia directory run "make reference-workload-heavy" to install test contacts.
3) Open contacts app.
4) Scroll down until you transition from group A to group B, etc.
The genlock failures occur during the transition.
Can we get this bug koi+'d? The URL field of the browser app is nearly unusable when typing. The experience is much like this screencast, except in the browser: http://youtu.be/xhaZBX1Aq34 ni? relmgmt, as I don't know who triages 1.2 core bugs.
Flags: needinfo?(praghunath)
Flags: needinfo?(akeybl)
My impression is that the genlock failures as reported here are part of the awesome work Sotaro is doing to investigate genlock failures in other bugs. Sotaro can you confirm? Or do you think this bug is describing a separate issue?
Flags: needinfo?(sotaro.ikeda.g)
(In reply to Ben Kelly [:bkelly] from comment #65)
> My impression is that the genlock failures as reported here are part of the
> awesome work Sotaro is doing to investigate genlock failures in other bugs.
>
> Sotaro can you confirm? Or do you think this bug is describing a separate
> issue?
This bug seems to be the same as bug 916264.
Flags: needinfo?(sotaro.ikeda.g)
Depends on: 916264
Per Sotaro's comment #66 this appears to be a duplicate of bug 916264. Any reason why we're not closing it and marking it as such? Is there more left to do here now that bug 916264 is fixed?
Whiteboard: c= → [c=handeye p= s= u=]
(In reply to Mike Lee [:mlee] from comment #67)
> Per Sotaro's comment #66 this appears to be a duplicate of bug 916264. Any
> reason why we're not closing it and marking it as such? Is there more left to
> do here now that bug 916264 is fixed?
It seems to be a duplicate, but I have not confirmed that on a device. I fixed one bug with the same symptom, but there could be multiple paths to that symptom.
While testing another contacts issue yesterday, I noticed this no longer reproduces for me. Marking as duplicate.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE
This is a dupe of koi+ bug 916264.
blocking-b2g: koi? → koi+
Flags: needinfo?(praghunath)
Flags: needinfo?(akeybl)