Bugzilla

Updated

11 years ago

blocking-b2g: --- → 1.4?

Component: General → Graphics

Keywords: regression, regressionwindow-wanted

Product: Firefox OS → Core

Version: unspecified → 30 Branch

Comment 3

11 years ago

Might be the same as bug 984531.

Milan Sreckovic [:milan] (needinfo for best results)

Comment 4

11 years ago

Does this reproduce with tiling disabled?

Keywords: regressionwindow-wanted → qawanted

Comment 5

11 years ago

Bas, this looks like weird things happening at tile boundaries.

Assignee: nobody → bas

blocking-b2g: 1.4? → 1.4+

Comment 6

11 years ago

Assigning to nical for diagnosis (and possibly fixing).

Assignee: bas → nical.bugzilla

Status: NEW → ASSIGNED

Assignee

Comment 7

11 years ago

(In reply to Jason Smith [:jsmith] from comment #3)
> Might be the same as bug 984531.

Indeed, same genlock failures in the logcat.

Comment 8

11 years ago

Can you try switching the pref 'layers.overzealous-gralloc-unlocking' to true and seeing if you can reproduce?

Flags: needinfo?(ckreinbring)

Comment 9

11 years ago

It does not reproduce with tiling disabled in 1.4.

Keywords: qawanted

Comment 10

11 years ago

With 'layers.overzealous-gralloc-unlocking' set to true (I pulled prefs.js, added a user_pref("layers.overzealous-gralloc-unlocking", true); line and pushed it back using modPref.js from https://gist.github.com/edmoz/5596162), after the reboot (with tiling enabled) I do see a lot of checkerboarding when drawing the screen, but I do not see the same type of rendering issue I saw before. In my case, portions of the screen were black where there should be graphics.

Gaia c03a6af9028c4b74a84b5a98085bbb0c07261175
Gecko https://hg.mozilla.org/mozilla-central/rev/082761b7bc54
BuildID 20140318160201
Version 31.0a1
ro.build.version.incremental=eng.tclxa.20131223.163538
ro.build.date=Mon Dec 23 16:36:04 CST 2013

Flags: needinfo?(ckreinbring)

Updated

11 years ago

Blocks: b2g-tiling

Comment 11

11 years ago

(In reply to npark from comment #10)
> With 'layers.overzealous-gralloc-unlocking' to true, [...]
> After the reboot (with tiling enabled), I do see a lot of checkerboarding
> when drawing the screen, but I do not see the same type of rendering issue i
> saw before. in my case, portions of the screen were black where there
> should be graphics.

So it looks like this has to do with the overzealous unlocking stuff again :s. That's tricky, as I'm not sure what our options are here. What do you think, Chris?

Comment 12

11 years ago

(In reply to Bas Schouten (:bas.schouten) from comment #11)
> So it looks like this has to do with the overzealous unlocking stuff again
> :s. That's tricky as I'm not sure what our options here are, what do you
> think Chris?

Fixing this properly is tricky... If we could get some kind of message telling us when the GrallocTexture has no more users on the host side, we could use that to stop locks succeeding on the client side while there are still users left. Then I suppose we'd need to introduce some 'retry' code for when taking the lock fails client-side, to just get a different texture instead (or wait, I suppose, though that's a bit dangerous).

nical, what do you think of the above?

Flags: needinfo?(nical.bugzilla)
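A minimal sketch of the mechanism proposed in comment 12, under loudly stated assumptions: `SharedBuffer`, `hostUsers`, `TryLockForWrite` and `ClientPool` are hypothetical names, not Gecko or gralloc APIs, and the host-side "no more users" message is modelled as a plain counter. It shows the 'retry' variant: a pooled buffer that the host still uses is skipped and a fresh one is allocated, instead of waiting on the lock.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical shared buffer. hostUsers stands in for the state that a
// "no more users on the host side" message would keep up to date.
struct SharedBuffer {
  int id = 0;
  int hostUsers = 0;
  bool clientLocked = false;

  // Fails (instead of blocking) while the host still uses the buffer.
  bool TryLockForWrite() {
    if (hostUsers > 0 || clientLocked) {
      return false;
    }
    clientLocked = true;
    return true;
  }
  void Unlock() { clientLocked = false; }
};

// Hypothetical client-side pool implementing the "retry" idea: skip any
// buffer whose lock fails and fall back to allocating a new one.
class ClientPool {
 public:
  SharedBuffer* AcquireWritable() {
    for (auto& buf : mBuffers) {
      if (buf->TryLockForWrite()) {
        return buf.get();
      }
    }
    auto fresh = std::make_unique<SharedBuffer>();
    fresh->id = static_cast<int>(mBuffers.size());
    fresh->TryLockForWrite();  // a brand-new buffer has no host users
    mBuffers.push_back(std::move(fresh));
    return mBuffers.back().get();
  }
  std::size_t Size() const { return mBuffers.size(); }

 private:
  std::vector<std::unique_ptr<SharedBuffer>> mBuffers;
};
```

The trade-off is visible even in this model: skipping busy buffers never blocks, but every skip can grow the pool by one buffer, whereas waiting on the lock keeps the buffer count fixed.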

Comment 13

11 years ago

(In reply to Chris Lord [:cwiiis] from comment #12)
> Fixing this properly is tricky... If we could have some kind of message to
> know when the GrallocTexture has no more users on the host-side, we could
> use that to stop locks succeeding on the client-side (when there are still
> users left). [...]

Thinking about this some more, this is essentially what the read locks are doing, but they unlock immediately after the flipped buffer is rendered. Ideally they'd unlock on EndFrame, as that is really when the buffers are unlocked. I guess we could introduce a Flush method on the Compositor's TexturePool that could be called before unlocking, but that feels kind of nasty to me...

That said, given that this is 'fixed' by overzealous unlocking, it rather suggests that this isn't the problem, and that we're locking a texture for writing while it is still being composited. Is there a situation other than the over-production case where this can happen? Although it seems unlikely, I'd really like to rule out that case. I'll see if I can reproduce this and work on a fix for that.

Assignee

Comment 14

11 years ago

My hamachi is on a roughly one-week-old revision of mozilla-central, and I can't reproduce the bug. I am currently looking at the most recent patches that may have caused the regression. In the meantime, it would be great if someone could build a slightly older version of Gecko that has tiling and confirm whether or not the bug was already there.

Flags: needinfo?(nical.bugzilla)

Comment 15

11 years ago

I can reproduce this and confirm that it's due to the genlock failures. I doubt very much this is overproduction, however - looking into it.

Comment 16

11 years ago

I can't reproduce this if I disable the texture client pool (by reporting all clients as lost in Discard(Front|Back)Buffer). This would hint that the problem may be re-use of a TextureClient that was returned before it was unlocked.

Comment 17

11 years ago

Further good news: I can't reproduce this at all if I disable ReturnTextureClientDeferred (and just report those clients as lost). Given that this is a known issue with this code, I'll go ahead and see if I can get a fix going (I have one in mind).

Assignee: nical.bugzilla → chrislord.net
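A simplified model of the client-side pool behaviour discussed in comments 16 and 17, for illustration only. The method names echo those mentioned in this bug (`GetTextureClient`, `ReturnTextureClient`, `ReturnTextureClientDeferred`, `ReturnDeferredTextureClients`), but the class, its containers and the use of `std::shared_ptr` are assumptions, not the Gecko implementation. The hazard it illustrates: a deferred client becomes reusable as soon as `ReturnDeferredTextureClients()` runs, whether or not the host has actually released it.

```cpp
#include <deque>
#include <memory>
#include <utility>
#include <vector>

struct TextureClient {
  int id;
};

// Hypothetical pool model; not the real TextureClientPool.
class TexturePoolModel {
 public:
  // Reuse a ready client if there is one, otherwise allocate a new one.
  std::shared_ptr<TextureClient> GetTextureClient() {
    if (!mReady.empty()) {
      std::shared_ptr<TextureClient> client = mReady.front();
      mReady.pop_front();
      return client;
    }
    return std::make_shared<TextureClient>(TextureClient{mNextId++});
  }
  // Caller promises the host no longer uses this client.
  void ReturnTextureClient(std::shared_ptr<TextureClient> aClient) {
    mReady.push_back(std::move(aClient));
  }
  // The host may still hold a lock on it: park it for now.
  void ReturnTextureClientDeferred(std::shared_ptr<TextureClient> aClient) {
    mDeferred.push_back(std::move(aClient));
  }
  // Makes every parked client reusable; only safe once the host has
  // really released them.
  void ReturnDeferredTextureClients() {
    for (auto& client : mDeferred) {
      mReady.push_back(std::move(client));
    }
    mDeferred.clear();
  }
  // "Reporting a client as lost" is modelled by simply dropping it.

 private:
  std::deque<std::shared_ptr<TextureClient>> mReady;
  std::vector<std::shared_ptr<TextureClient>> mDeferred;
  int mNextId = 0;
};
```

Disabling the deferred path, as in comment 17, amounts to dropping the reference instead of calling `ReturnTextureClientDeferred`, so nothing that might still be locked is ever handed out again.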

Botond Ballo [:botond]

Updated

11 years ago

Updated

11 years ago

Comment 18

11 years ago

Attached patch Guarantee that locked TextureClients aren't reused (obsolete) — Details — Splinter Review

This isn't beautiful code and I'm not saying this is the best method - I'm pretty much expecting an r-... But it does fix the problem, and it is something we need to do some way or another.

Attachment #8394453 - Flags: review?(bas)

Comment 19

11 years ago

Comment on attachment 8394453 [details] [diff] [review]
Guarantee that locked TextureClients aren't reused

Review of attachment 8394453 [details] [diff] [review]:
-----------------------------------------------------------------

In general this is fine, and if it fixes the bug that's a good thing to have on hand; we could even land it mostly in its current form to paper over the issue if we can't figure out why this bug is happening. There are 3 things we need though:

1) Figure out if FlushRendering works right (and therefore, if it will prevent overproduction).
2) If it does work right, figure out how this problem can even occur if we're not overproducing. (Because there's a reasonable chance this patch is papering over another bug in that case.)
3) If there is no overproduction, but this patch does fix the bug, figure out whether with the patch we're still properly re-using texture clients, or whether it just fixes the bug by spending a long time in blocking gralloc allocations.

::: gfx/layers/client/TextureClientPool.cpp
@@ +138,5 @@
> +TextureClientPool::ReturnDeferredTextureClients(bool aDestroy)
> +{
> + MOZ_ASSERT(aData.callback);
> +
> + // Guard against re-entry

What could cause re-entry? That's scary.

::: gfx/layers/client/TiledContentClient.cpp
@@ +491,5 @@
> TileClient::DiscardFrontBuffer()
> {
> if (mFrontBuffer) {
> MOZ_ASSERT(mFrontLock);
> + TextureClientPool* pool = mManager->GetTexturePool(mFrontBuffer->AsTextureClientDrawTarget()->GetFormat());

Filter this logic out into a separate function.

Comment 20

11 years ago

Here's a good log that, sadly, only adds to the mystery: notice that there doesn't seem to be any overproduction, and it mostly looks like business as usual. (Note: the time between 'wanting to draw' and 'starting to draw' is the time spent making sure the compositor has finished drawing, which I added with a little code.) https://gist.github.com/nhirata/9677272

Ben Kelly [:bkelly, not reviewing]

Comment 21

11 years ago

A new log here gives some very interesting data: https://gist.github.com/nhirata/9677890

Note the two different processes we're dealing with: 1229 is getting the failures, and that particular process isn't overproducing at all! We're seeing the main/composition process do a lot of compositions both before and after 1229 has started its transaction. So at this point I'm still not sure what's going on.

Note that the bug apparently became a lot harder to reproduce with all my printfs in, so it might be very timing-sensitive, although that could just have been a coincidence.

03-20 18:47:00.689: I/Gecko(1229): XXX - Bas - Wanting to draw client side!
03-20 18:47:00.689: I/Gecko(1229): XXX - Bas - Starting to draw client side!
03-20 18:47:00.829: I/Gecko(136): XXX - Bas - Wanting to draw client side!
03-20 18:47:00.829: I/Gecko(136): XXX - Bas - Starting to draw client side!
03-20 18:47:00.829: I/Gecko(136): XXX - Bas - Finished validation!
03-20 18:47:00.839: I/Gecko(136): XXX - Bas - Starting to draw host side!
03-20 18:47:00.839: I/Gecko(136): XXX - Bas - Finished compositing host side!
03-20 18:47:00.839: I/Gecko(136): XXX - Bas - Starting to draw host side!
03-20 18:47:00.839: I/Gecko(136): XXX - Bas - Forwarded transaction!
03-20 18:47:00.839: I/Gecko(136): XXX - Bas - Finished drawing client side!
03-20 18:47:00.839: I/Gecko(136): XXX - Bas - Finished compositing host side!
03-20 18:47:00.969: I/Gecko(136): XXX - Bas - Wanting to draw client side!
03-20 18:47:00.979: I/Gecko(136): XXX - Bas - Starting to draw client side!
03-20 18:47:00.979: I/Gecko(136): XXX - Bas - Finished validation!
03-20 18:47:00.979: I/Gecko(136): XXX - Bas - Starting to draw host side!
03-20 18:47:00.979: I/Gecko(136): XXX - Bas - Finished compositing host side!
03-20 18:47:00.979: I/Gecko(136): XXX - Bas - Starting to draw host side!
03-20 18:47:00.979: I/Gecko(136): XXX - Bas - Forwarded transaction!
03-20 18:47:00.979: I/Gecko(136): XXX - Bas - Finished drawing client side!
03-20 18:47:00.979: I/Gecko(136): XXX - Bas - Finished compositing host side!
03-20 18:47:01.369: I/Gecko(1283): ###################################### forms.js loaded
03-20 18:47:01.389: I/Gecko(1283): ############################### browserElementPanning.js loaded
03-20 18:47:01.409: I/Gecko(1283): ######################## BrowserElementChildPreload.js loaded
03-20 18:47:01.799: E/libgenlock(1229): perform_lock_unlock_operation: GENLOCK_IOC_DREADLOCK failed (lockType0x1, err=Connection timed out fd=61)
03-20 18:47:01.799: E/msm7627a.gralloc(1229): gralloc_lock: genlock_lock_buffer (lockType=0x2) failed
03-20 18:47:01.799: W/GraphicBufferMapper(1229): lock(...) failed -22 (Invalid argument)
03-20 18:47:02.799: E/libgenlock(1229): perform_lock_unlock_operation: GENLOCK_IOC_DREADLOCK failed (lockType0x1, err=Connection timed out fd=64)
03-20 18:47:02.799: E/msm7627a.gralloc(1229): gralloc_lock: genlock_lock_buffer (lockType=0x2) failed
03-20 18:47:02.799: W/GraphicBufferMapper(1229): lock(...) failed -22 (Invalid argument)
03-20 18:47:03.799: E/libgenlock(1229): perform_lock_unlock_operation: GENLOCK_IOC_DREADLOCK failed (lockType0x1, err=Connection timed out fd=43)
03-20 18:47:03.799: E/msm7627a.gralloc(1229): gralloc_lock: genlock_lock_buffer (lockType=0x2) failed
03-20 18:47:03.799: W/GraphicBufferMapper(1229): lock(...) failed -22 (Invalid argument)
03-20 18:47:04.129: E/QCALOG(191): [MessageQ] ProcessNewMessage: [XTWiFi-PE] unknown deliver target [OS-Agent]
03-20 18:47:04.129: E/QCALOG(191): [MessageQ] ProcessNewMessage: [XT-CS] unknown deliver target [OS-Agent]
03-20 18:47:04.149: E/QCALOG(191): [MessageQ] ProcessNewMessage: [XTWWAN-PE] unknown deliver target [OS-Agent]
03-20 18:47:04.469: D/wpa_supplicant(595): RX ctrl_iface - hexdump(len=11): 53 49 47 4e 41 4c 5f 50 4f 4c 4c
03-20 18:47:04.469: D/wpa_supplicant(595): nl80211: survey data missing!
03-20 18:47:04.799: E/libgenlock(1229): perform_lock_unlock_operation: GENLOCK_IOC_DREADLOCK failed (lockType0x1, err=Connection timed out fd=67)
03-20 18:47:04.799: E/msm7627a.gralloc(1229): gralloc_lock: genlock_lock_buffer (lockType=0x2) failed
03-20 18:47:04.799: W/GraphicBufferMapper(1229): lock(...) failed -22 (Invalid argument)
03-20 18:47:04.799: I/Gecko(1229): XXX - Bas - Finished validation!

Comment 22

11 years ago

These are probably stupid questions, but:

Do you often see the pre-allocated process launch prior to the genlock failures, or is that just because you captured just after app launch in these cases? See the messages about loading forms.js, etc.

I assume the pre-allocated app does not do anything with gfx resources, but I just thought I would ask.

Also, what app was terminated in the second gist? You can see the IPC delivery failures. Just curious.

Updated

11 years ago

Flags: needinfo?(nhirata.bugzilla)

Comment 23

11 years ago

(In reply to Ben Kelly [:bkelly] (PTO Mar 21, back Mar 24) from comment #22)
> These are probably stupid questions, but:
>
> Do you often see the pre-allocated process launch prior to the genlock
> failures? Or is that just because you captured just after app launch in
> these cases. See the messages about loading forms.js, etc.
>
> I assume the pre-allocated app does not do anything with gfx resources, but
> just thought I would ask.
>
> Also, what app was terminated in the second gist? You can see the IPC
> delivery failures. Just curious.

I didn't make this trace, I just ordered the printfs; fwiw, I'm not sure what most of those questions mean :). I don't know what the 'pre-allocated' process is. If you mean 1283 in this case, it isn't in the log attached to the bug, fwiw? I think it's just because the bug generally reproduces on app launch.

Comment 24

11 years ago

Attached patch Guarantee that locked TextureClients aren't reused v2 (obsolete) — Details — Splinter Review

Version of the patch with comments addressed.

With regard to debugging this problem, it seems to stem from the ReturnTextureClientDeferred call in DiscardFrontBuffer. Looking at when this is called: it can happen when the format changes, when a write-lock fails, and when we fail to add the texture client to the compositable client. In these three cases, I think the texture ought to be reported as lost rather than returned to the pool (or, alternatively, we should wait on the lock and then return it). The other time it gets called is when we do a full tile update, which I think is fine.

My patch will have us wait on the lock if the cases I mention above ever occur; otherwise it should result in roughly the same behaviour as before. Although I agree that it's papering over the issue to some extent, I think this is an easier way of dealing with it, and it guarantees that we never return a locked client to the pool.

For the record, I think we should *also* block on over-production on non-progressive updates, but both my logging and yours show that this is a very rare case on mobile (it may be more of an issue on desktop), so it's not a high priority right now (imo).

Attachment #8394453 - Attachment is obsolete: true

Attachment #8394453 - Flags: review?(bas)

Attachment #8394755 - Flags: review?(bas)
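The disposal policy proposed in comment 24 can be written down as a small lookup. This is a hypothetical encoding (the enum and function names are made up for illustration) covering only the four call sites listed in that comment; the alternative also mentioned there, waiting on the lock and then returning the client, is not modelled.

```cpp
// Why DiscardFrontBuffer is discarding the old front buffer
// (hypothetical names for the cases listed in comment 24).
enum class DiscardReason {
  FormatChanged,
  WriteLockFailed,
  AddToCompositableFailed,
  FullTileUpdate,
};

// What to do with the discarded TextureClient.
enum class Disposal {
  ReportLost,      // never goes back into the pool
  ReturnDeferred,  // goes back into the pool once the host is done with it
};

constexpr Disposal DisposalFor(DiscardReason aReason) {
  switch (aReason) {
    case DiscardReason::FullTileUpdate:
      return Disposal::ReturnDeferred;
    case DiscardReason::FormatChanged:
    case DiscardReason::WriteLockFailed:
    case DiscardReason::AddToCompositableFailed:
      return Disposal::ReportLost;
  }
  return Disposal::ReportLost;  // not reached; keeps compilers quiet
}
```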

Updated

11 years ago

Blocks: 984577

Updated

11 years ago

Blocks: 984531

Updated

11 years ago

Blocks: 984482

Updated

11 years ago

Blocks: 985170

Updated

11 years ago

Blocks: 985779

Updated

11 years ago

Blocks: 983883

Updated

11 years ago

Blocks: 985162

Updated

11 years ago

Blocks: 986103

Milan Sreckovic [:milan] (needinfo for best results)

Comment 25

11 years ago

Naoki, do you think you could try to reproduce with the patch attached to this bug? I can't reproduce it, but Bas was saying that this can become very hard to reproduce when the timing is affected, so I'd like to know if it actually fixes it (which I think it does), or if it's just a red herring. I'll use my patch to debug this issue on Monday (with it, I can track if locked textures are being returned to the pool and when/why) and see if I can figure out the underlying cause.

Comment 26

11 years ago

I will try as well, and CC-ing No-Jun in case he can take a look sooner.

Flags: needinfo?(npark)

Comment 27

11 years ago

It seems I've lost the ability to reproduce this bug on plain m-c. My previous steps that hit it instantly no longer hit it, so if anyone has decent STR, let me know...

Comment 28

11 years ago

(In reply to Chris Lord [:cwiiis] from comment #27)
> It seems I've lost the ability to reproduce this bug on plain m-c. My
> previous steps that hit it instantly no longer hit it, so if anyone has
> decent STR, let me know...

Did you try the STR in any of the blocking bugs?

Assignee

Comment 29

11 years ago

I did some more logging around the issue, and here is what I see. When looking up 'genlock' in the log, look at the last TextureClient locks that went into the log _within the same process as the failure_ (the client, 3799). One of these is the lock of TextureClient [id:13]. Going back in time, we can see that TextureHost [id:13] was locked and that its TextureSource uses the GL texture [tex:196]. The next time EndFrame is called, [tex:196] is not in the "unused" part of the pool. When EndFrame is called again after that, [tex:196] is deleted, but this happens *after* the genlock failure.

Here are the relevant bits of the logcat, in order:

I/Gecko ( 3598): -- GrallocTextureHostOGL(0x446dd280)::Lock [id:13]
I/Gecko ( 3598): -- GrallocTextureSourceOGL: using [tex:196]
I/Gecko ( 3598): -- PerFrameTexturePoolOGL::EndFrame
I/Gecko ( 3799): ** Pool: Return deferred TextureClient [id:13] (refcnt:1)
I/Gecko ( 3799): ** Pool: Return TextureClient [id:13] (refcnt:1)
I/Gecko ( 3799): ** Pool: GetTextureClient from pool [id:13] (refcnt:1)
I/Gecko ( 3799): ** GrallocTextureClientOGL::Lock [id:13]
E/libgenlock( 3799): perform_lock_unlock_operation: GENLOCK_IOC_DREADLOCK failed (lockType0x1, err= Connection timed out fd=71)
E/msm7627a.gralloc( 3799): gralloc_lock: genlock_lock_buffer (lockType=0x2) failed
W/GraphicBufferMapper( 3799): lock(...) failed -22 (Invalid argument)
I/Gecko ( 3598): -- PerFrameTexturePoolOGL::EndFrame
I/Gecko ( 3598): -- EndFrame: delete gl texture [tex:196]
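A toy replay of the ordering in the log above, assuming (as a simplification) that the compositor's read lock on a buffer lasts exactly until its GL texture is deleted, and that the pool deletes a texture only at the EndFrame after the one in which it was last used. `HostTexturePoolModel`, `Use` and `ClientCanLockForWrite` are illustrative names; only `EndFrame` corresponds to a function seen in the log.

```cpp
#include <set>

// Hypothetical model of the compositor-side per-frame texture pool.
class HostTexturePoolModel {
 public:
  // The host composites with buffer aId (cf. GrallocTextureHostOGL::Lock).
  void Use(int aId) {
    mUsedThisFrame.insert(aId);
    mLocked.insert(aId);
  }
  // Textures used in the previous frame and not used again in this one are
  // deleted, which is what finally drops the read lock on their buffers.
  void EndFrame() {
    for (int id : mUsedLastFrame) {
      if (mUsedThisFrame.count(id) == 0) {
        mLocked.erase(id);
      }
    }
    mUsedLastFrame = mUsedThisFrame;
    mUsedThisFrame.clear();
  }
  // What the client's write lock (genlock) effectively depends on.
  bool ClientCanLockForWrite(int aId) const { return mLocked.count(aId) == 0; }

 private:
  std::set<int> mUsedThisFrame;
  std::set<int> mUsedLastFrame;
  std::set<int> mLocked;
};
```

In the model, the client's write lock can only succeed after the host has run `EndFrame()` twice; if the compositor never gets to run `EndFrame()` while the client keeps recycling tiles, every reuse attempt fails, which matches the scenario described in comments 32 and 35.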

Assignee

Comment 30

11 years ago

Attached patch logging patch (obsolete) — Details — Splinter Review

Assignee

Comment 31

11 years ago

Attached file Log of the error with textureClient/Host and gl texture IDs (obsolete) — Details

Assignee

Comment 32

11 years ago

Looking at the same log and rethinking when the lagginess shows up: we seem to be in places where we followed a link, and the client side is doing a lot of throwing around and recycling of TextureClients, while the compositor side is doing close to nothing. This isn't your typical overproduction case, but it looks kinda similar. Maybe we don't schedule compositions when we just destroy layers (which would mean we don't run the pool's EndFrame)?

Comment 33

11 years ago

Attached patch Guarantee that locked TextureClients aren't reused v3 (obsolete) — Details — Splinter Review

This is similar in intention to the first patch, but slightly different in design. This patch changes nothing about our old semantics; it just guarantees that we don't return locked textures, and if that situation ever appears, it asserts (but keeps working). I think this is desirable, as it lets us track the issue (via the asserts, and extra points at which to add logging) but also stops it from having such nasty side-effects. (Of course, if there's a leak, the app will eventually crash, but misuse of gralloc surfaces quite often has worse side-effects than that, like the whole phone crashing.)

Attachment #8394755 - Attachment is obsolete: true

Attachment #8394755 - Flags: review?(bas)

Attachment #8395927 - Flags: review?(bas)

Comment 34

11 years ago

For the record, I don't think these lock failures come up as a result of inconsistency of information between client/host, but more likely a situation like what nical describes in comment #32.

Assignee

Comment 35

11 years ago

(In reply to Chris Lord [:cwiiis] from comment #34)
> For the record, I don't think these lock failures come up as a result of
> inconsistency of information between client/host, but more likely a
> situation like what nical describes in comment #32.

Yeah, I have been doing a lot of logging today and it always looks the same: we throw away layers and rebuild new ones immediately, which makes us both put tiles in the pool and recycle them before the compositor side does an EndFrame to release the gralloc textures. I haven't yet found out whether there is a better place for the Compositor's GL texture pool EndFrame call, but I am pretty sure we need either to add a call to the GL pool's EndFrame somewhere specifically for this case, or to make sure we don't return the deferred textures to the client-side pool in this specific case.

Comment 36

11 years ago

(In reply to Nicolas Silva [:nical] from comment #35)
> Yeah, have been doing a lot of logging today and it always looks the same:
> we are throwing away layers and rebuilding new ones immediately, making us
> both put tiles in the pool and recycle them before we do an EndFrame on the
> compositor side to release the gralloc textures. [...]

In the logs, the client side that we care about in this bug (i.e. the pool we're dealing with) is not throwing around transactions at all, and the client transactions we are seeing a lot of should be using a different pool. So I'm still not clear on why this would happen. Could you describe, step by step, a situation that could cause this and that is consistent with the logs we're seeing?

Naoki Hirata :nhirata (please use needinfo instead of cc)

Comment 37

11 years ago

(In reply to Nicolas Silva [:nical] from comment #35)
> I haven't yet found out whether there is a better place for us to do the
> Compositor's gl texture pool EndFrame call, but I am pretty sure we need to
> add a call to the gl pool's EndFrame somewhere specifically for this case,
> or make sure we don't return the deferred textures in the client-side pool
> in this specific case (either of the two).

Specifically, I can't see this explaining either the log or the failure of FlushRendering to fix it.

Flags: needinfo?(nical.bugzilla)

Flags: needinfo?(chrislord.net)

Comment 38

11 years ago

Attached file logcat.txt — Details

With cwiiis' v3 patch, I had to come up with new steps to reproduce the bug:

1. Launch Accuweather.
2. Change the phone to landscape orientation.
3. As soon as the page changes orientation to landscape, pick one of the options (World Weather, Africa...).
4. Wait until it loads.

I think you don't necessarily need memory pressure to reproduce it this way.

Flags: needinfo?(nhirata.bugzilla)

Comment 39

11 years ago

(In reply to Naoki Hirata :nhirata (please use needinfo instead of cc) from comment #38)
> Created attachment 8396091 [details]
> logcat.txt
>
> With cwiiis' v3 patch, I had to come up with new steps to reproduce the bug. [...]

I'll try to confirm this myself, but this would make me change my mind entirely: there should be no possibility of reusing a locked client with this patch (unless I got it wrong, which I'll verify), in which case it would greatly support Bas's theory.

Flags: needinfo?(chrislord.net)

Comment 40

11 years ago

(In reply to Chris Lord [:cwiiis] from comment #39)
> I'll try to confirm this myself - but this would make me change my mind
> entirely, there should be no possibility of reusing a locked client with
> this patch (unless I got it wrong, which I'll verify), in which case it
> would greatly support Bas's theory.

Ugh, and I realise I can't possibly reproduce this, because the buri has no accelerometer/gyro :| I'll see if I can reproduce on a Keon...

Comment 41

11 years ago

And it appears my build doesn't have working motion events anyway...? (The Keon, which was previously fine, now isn't doing auto-rotation or delivering devicemotion events :/)

Comment 42

11 years ago

So I've reflashed my Keon, devicemotion works correctly, but I don't get any screen rotation. No idea why, but it means I'm unable to reproduce these lock failures anymore :/

Milan Sreckovic [:milan] (needinfo for best results)

Comment 43

11 years ago

The following is all on a Keon. Replacing NS_NOTREACHED with a macro that just prints the error, I can get locked TextureClients to be reused (which the patch prevents, but warns about), but I have no reasonable STR for that (I managed it by browsing to planet.mozilla.org and panning quickly quite far down, then all the way back to the top). I can also get genlock failures caused by running out of memory, I assume from a situation where too many tiles are created and fds get exhausted (the lock fails, rather than the allocation), but only when zooming in to extreme levels, and I assume that is caused by bug 957668.

I've spent a reasonable amount of time trying to reproduce this; I need more solid steps to reproduce that don't involve rotation. Also, the m-c revision might be useful. Whether it fixes the bug or not, I think it's a reasonable idea to have a patch similar to this one, to at least assert if this situation gets hit.

Comment 44

11 years ago

Jeff and Benoit, let's continue on this today here, maybe with the QRD we have.

Flags: needinfo?(jmuizelaar)

Flags: needinfo?(bgirard)

Assignee

Comment 45

11 years ago

Attached file logcat with texture IDs showing lots of transaction but no composition near the genlock failure (obsolete) — Details

Flags: needinfo?(nical.bugzilla)

Comment 46

11 years ago

A nice way to help debug this issue: stick this at the top of TextureClientPool.cpp:

    #define NS_NOTREACHED(s) printf_stderr("XXX: %s\n", s)

then run:

    adb logcat | grep -E 'XXX|gralloc'

The warnings you get would likely occur where you *would* have had genlock errors (you can confirm that by changing 'return TextureClientPool::DEFER' to 'return TextureClientPool::READY' in TiledContentClient.cpp). Any other lock errors you get are suspect and should be investigated: if they are resolved by overzealous unlocking, the errors stem from using tiles; otherwise they stem from another gralloc user.

Assignee

Comment 47

11 years ago

Attached patch Also clear the compositor's unused textures when the frame was composited by hwcomposer — Details — Splinter Review

I can't reproduce the bug with this (embarrassingly) simple patch \o/

Attachment #8396450 - Flags: review?(chrislord.net)
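A hypothetical sketch of the shape of this fix, based only on the patch title: when hwcomposer composites the frame, the compositor must still run the texture pool's end-of-frame cleanup. `CompositorModel`, `EndFrameBuggy`, `EndFrameFixed` and the counter in `TexturePool` are invented for illustration; the real patch may be structured quite differently.

```cpp
// Stand-in for the compositor's per-frame GL texture pool.
struct TexturePool {
  int endFrameCalls = 0;
  // In the real pool this would delete textures no longer in use,
  // releasing the gralloc read locks they hold.
  void EndFrame() { ++endFrameCalls; }
};

struct CompositorModel {
  TexturePool pool;
  bool hwcHandlesFrame = false;

  // Before: the hwcomposer path skips the pool cleanup entirely.
  void EndFrameBuggy() {
    if (hwcHandlesFrame) {
      return;
    }
    // ... GL composition and buffer swap would happen here ...
    pool.EndFrame();
  }

  // After: the cleanup runs whichever path composited the frame.
  void EndFrameFixed() {
    if (!hwcHandlesFrame) {
      // ... GL composition and buffer swap would happen here ...
    }
    pool.EndFrame();
  }
};
```

This also fits the workaround in comment 50: with hwcomposer turned off, every frame takes the GL path, so the cleanup runs even without the fix.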

Comment 48

11 years ago

Comment on attachment 8396450 [details] [diff] [review]
Also clear the compositor's unused textures when the frame was composited by hwcomposer

Review of attachment 8396450 [details] [diff] [review]:
-----------------------------------------------------------------

Nicely done.

Attachment #8396450 - Flags: review?(chrislord.net) → review+

Assignee

Comment 49

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/0c0cfce48311

Benoit Girard (:BenWa)

Updated

11 years ago

Flags: needinfo?(bgirard)

Milan Sreckovic [:milan] (needinfo for best results)

Comment 50

11 years ago

For those who have been able to reproduce this bug: this would mean that turning off the hardware composer in the developer prefs should make the problem go away, if you want to test before the patch shows up in the nightly.

Flags: needinfo?(jmuizelaar)

Comment 51

11 years ago

With the hardware composer turned off, the bug was no longer reproducible on the build below (on which I was able to reproduce this bug consistently):

Gaia a2a88d0638594a6510f878d2c5e99a6ead7520ad
Gecko https://hg.mozilla.org/releases/mozilla-aurora/rev/67bdb575d833
BuildID 20140325000201
Version 30.0a2

Flags: needinfo?(npark)

Comment 52

11 years ago

Comment on attachment 8395927 [details] [diff] [review]
Guarantee that locked TextureClients aren't reused v3

Review of attachment 8395927 [details] [diff] [review]:
-----------------------------------------------------------------

Now that the actual cause has been found, I'm not so sure this patch is a good idea. Currently the gralloc lock is not timing out, but it does seem to be hit occasionally, which is good, since that throttles our production based on our rate of consumption in the compositor. This patch would undo that and instead make us sometimes create new buffers, requiring a round-trip to the compositor and, in the end, a longer blocking period. That does not seem desirable.

Attachment #8395927 - Flags: review?(bas) → review-

Milan Sreckovic [:milan] (needinfo for best results)

Comment 53

11 years ago

Attached patch: Warn when locked texture clients are reused (in debug builds)

I think nical's identification of the problem means we shouldn't hit this now. However, there are situations where we start writing to a locked tile before it's unlocked on the compositor side. When we hit these, it's usually a sign of some kind of over-production, so we'd rather just wait on the lock, which is what we currently do. There's a small chance that this could cause problems, though (and a larger one on PVR, I think), so this patch doesn't change any behaviour; in debug builds, it outputs a warning when we attempt to reuse a locked tile. It also gives us a few more points where we can inject debugging code, which may come in handy later.

Attachment #8395927 - Attachment is obsolete: true

Attachment #8396660 - Flags: review?(bas)

Comment 54

11 years ago

(In reply to Nicolas Silva [:nical] from comment #49)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/0c0cfce48311

I forgot to mention: when things land on b2g-inbound, we get a cached build per check-in, which helps with further testing. Either way, we have the fix :)

Naoki Hirata :nhirata (please use needinfo instead of cc)

Comment 55

11 years ago

I tried just https://hg.mozilla.org/integration/mozilla-inbound/rev/0c0cfce48311 and the issue still appears. With HWC turned off, the issue no longer appeared using the steps in both comment 38 and comment 0.

Gaia c76ca4811cb80e7fe7082d2d3b97b189b4463b1d
Gecko e6d096d7473abeda806cc1534b4e78f05ead0a6a
BuildID 20140325123144
Version 31.0a1
ro.build.version.incremental=eng.tclxa.20131223.163538
ro.build.date=Mon Dec 23 16:36:04 CST 2013

Maybe I did something wrong when building...

Milan Sreckovic [:milan] (needinfo for best results)

Comment 56

11 years ago

That (you doing a bad build) may be too much to hope for, but let's see what it looks like in the nightly tomorrow.

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 57

11 years ago

https://hg.mozilla.org/mozilla-central/rev/0c0cfce48311

Status: ASSIGNED → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla31

Comment 58

11 years ago

No-Jun, can you verify this when you get into work tomorrow, to see if it works correctly on trunk?

Flags: needinfo?(npark)

Keywords: verifyme

Comment 59

11 years ago

On latest trunk (1.5.0), this is still reproducible when Accuweather is downloaded and executed.

Gaia 80af23f8c74d9d2e9388d8ed3c204040b5c528ec
Gecko https://hg.mozilla.org/mozilla-central/rev/c69c55582faa
BuildID 20140326040202
Version 31.0a1
ro.build.version.incremental=eng.cltbld.20140306.073741
ro.build.date=Thu Mar 6 08:02:39 EST 2014

Flags: needinfo?(npark)

Assignee

Comment 60

11 years ago

Attached patch: Clear the compositor's unused textures when the frame was aborted

It looks like CompositorOGL::AbortFrame should also call the pool's EndFrame.
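The shape of this fix can be illustrated with a minimal, hypothetical sketch. `Texture`, `TexturePool`, `Compositor`, `MarkUnused`, and `PendingCount` are simplified stand-ins invented for illustration, not the actual Gecko classes or signatures. The sketch assumes that textures released during a frame are held (still locked) until the pool's `EndFrame` runs, so every path that finishes a frame, including an aborted one, has to call it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a gralloc-backed texture; `locked` models the
// buffer still being held on the compositor side.
struct Texture {
  int id;
  bool locked;
};

// Hypothetical pool: textures that became unused during a frame are only
// handed back (unlocked) when EndFrame() runs.
class TexturePool {
 public:
  void MarkUnused(Texture* tex) { mPendingRelease.push_back(tex); }

  // Release everything that became unused during the frame.
  void EndFrame() {
    for (Texture* tex : mPendingRelease) {
      tex->locked = false;
    }
    mPendingRelease.clear();
  }

  std::size_t PendingCount() const { return mPendingRelease.size(); }

 private:
  std::vector<Texture*> mPendingRelease;
};

// Hypothetical compositor: both the normal and the aborted frame paths
// finish by calling the pool's EndFrame().
class Compositor {
 public:
  explicit Compositor(TexturePool& pool) : mPool(pool) {}

  void EndFrame() {
    // ... present the frame (omitted in this sketch) ...
    mPool.EndFrame();
  }

  void AbortFrame() {
    // Without this call, textures released during an aborted frame would
    // stay locked, and the content side could end up blocking on them.
    mPool.EndFrame();
  }

 private:
  TexturePool& mPool;
};
```

In this sketch, a texture marked unused during a frame ends up unlocked whether the frame completes or is aborted; the same reasoning applies to the hwcomposer path addressed by attachment 8396450.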

Assignee: chrislord.net → nical.bugzilla

Status: RESOLVED → REOPENED

Attachment #8397107 - Flags: review?(bas)

Resolution: FIXED → ---

Assignee

Updated

11 years ago

Attachment #8397107 - Attachment is patch: true

Updated

11 years ago

Attachment #8397107 - Flags: review?(bas) → review+

Sotaro Ikeda [:sotaro]

Comment 61

11 years ago

(In reply to Nicolas Silva [:nical] from comment #60)
> Created attachment 8397107
> Clear the compositor's unused textures when the frame was aborted
>
> It looks like CompositorOGL::AbortFrame should also call the pool's EndFrame.

From where does CompositorOGL::AbortFrame() get called?

Assignee

Comment 62

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/7f9cea6dec5b

Milan Sreckovic [:milan] (needinfo for best results)

Comment 63

11 years ago

(In reply to npark from comment #59)
> On latest Trunk (1.5.0), this is still reproducible when Accuweather is
> downloaded and executed.
>
> Gaia 80af23f8c74d9d2e9388d8ed3c204040b5c528ec
> Gecko https://hg.mozilla.org/mozilla-central/rev/c69c55582faa
> BuildID 20140326040202
> Version 31.0a1
> ro.build.version.incremental=eng.cltbld.20140306.073741
> ro.build.date=Thu Mar 6 08:02:39 EST 2014

I haven't been able to reproduce it on the trunk with the local build.

Assignee

Comment 64

11 years ago

Attached patch: Logging patch v2

Attachment #8395830 - Attachment is obsolete: true

Attachment #8395832 - Attachment is obsolete: true

Attachment #8396388 - Attachment is obsolete: true

Naoki Hirata :nhirata (please use needinfo instead of cc)

Updated

11 years ago

Keywords: verifyme

Comment 65

11 years ago

Tested with nical's build/patch; it seems to resolve the issue. From the IRC conversation:

nical: so the build I gave nhirata_ contains the first EndFrame fix (the one that fixes it for me) + a second somewhat speculative fix that is still on inbound, but that I suspect is fixing dead code + the logging
[11:00am] nical: With this logging I could still easily reproduce the bug before the first fix, so it most likely means that the bug is fixed but was shadowing another bug which is harder to reproduce.

Milan Sreckovic [:milan] (needinfo for best results)

Assignee

Comment 66

11 years ago

(In reply to Milan Sreckovic [:milan] from comment #63)
> I haven't been able to reproduce it on the trunk with the local build.

I spent today trying to reproduce it as well and could not (optimized build without logging). I am closing this bug because I am pretty certain that attachment 8396450 fixes the problem we diagnosed here. If the symptoms show up again, it is a separate bug that we should follow up on in a different Bugzilla ticket.

Status: REOPENED → RESOLVED

Closed: 11 years ago → 11 years ago

Resolution: --- → FIXED

Comment 67

11 years ago

Here's a weird thing. I can reproduce this with the latest nightly build (which should have the first EndFrame fix only). I can't reproduce it with my local build (which should also have the first EndFrame fix only). Nical's tracing also seems to point to the change somehow not being in the nightly build. I may have to check tomorrow. In the meantime, nical, can you uplift the (first) fix to Aurora when you have a chance?

Naoki Hirata :nhirata (please use needinfo instead of cc)

Comment 68

11 years ago

Nical was right: looking at the nightly check-ins, it seems the nightly build doesn't have the fix. With my own build, I don't see an issue while in the app, but I do see issues when I put the app in the background. We should check this again with tomorrow's build or a build from later today.

Ryan VanderMeulen [:RyanVM]

Comment 69

11 years ago

https://hg.mozilla.org/mozilla-central/rev/7f9cea6dec5b

https://hg.mozilla.org/releases/mozilla-aurora/rev/bfcdcc4b9c83

Assignee

Comment 70

11 years ago

status-b2g-v1.4: affected → fixed

Ryan VanderMeulen [:RyanVM]

Updated

11 years ago

status-b2g-v2.0: --- → fixed

status-firefox29: --- → unaffected

status-firefox30: --- → fixed

status-firefox31: --- → fixed

Naoki Hirata :nhirata (please use needinfo instead of cc)

Comment 71

11 years ago

I'm confused by some of the comments above. Have we confirmed whether or not this is verified on trunk? There seems to be conflicting information in the testing above.

Flags: needinfo?(npark)

Comment 72

11 years ago

https://hg.mozilla.org/releases/mozilla-aurora/rev/bfcdcc4b9c83 is in today's 1.4 build:

Gaia 7d716de0c186416b5b123baa1f3242e23d50529b
Gecko https://hg.mozilla.org/releases/mozilla-aurora/rev/69e896713b11
BuildID 20140327074806
Version 30.0a2
ro.build.version.incremental=324
ro.build.date=Thu Dec 19 14:04:55 CST 2013

We should be able to test there.

Milan Sreckovic [:milan] (needinfo for best results)

Comment 74

11 years ago

Right - but let's first verify on trunk :)

Comment 75

11 years ago

This is no longer reproducible with or without HWC enabled on today's master branch:

Gaia 9da1b9c11bf518bce882be305ae121c44c5d1e05
Gecko https://hg.mozilla.org/mozilla-central/rev/9afe2a1145bd
BuildID 20140327040202
Version 31.0a1

Updated

11 years ago

Status: RESOLVED → VERIFIED

Flags: needinfo?(npark)

Updated

11 years ago

No longer blocks: 983883

Naoki Hirata :nhirata (please use needinfo instead of cc)

Updated

11 years ago

No longer blocks: 985162

Comment 77

11 years ago

Verified on Aurora (Buri) with:

Gaia 7d716de0c186416b5b123baa1f3242e23d50529b
Gecko https://hg.mozilla.org/releases/mozilla-aurora/rev/69e896713b11
BuildID 20140327074806
Version 30.0a2
ro.build.version.incremental=324
ro.build.date=Thu Dec 19 14:04:55 CST 2013

Verified with Mako on:

Gaia 9da1b9c11bf518bce882be305ae121c44c5d1e05
Gecko https://hg.mozilla.org/mozilla-central/rev/9afe2a1145bd
BuildID 20140327040202
Version 31.0a1
ro.build.version.incremental=eng.cltbld.20140327.073722
ro.build.date=Thu Mar 27 07:37:33 EDT 2014

Note: 1.5 on Buri has issues with today's build.

status-b2g-v1.4: fixed → verified

status-b2g-v2.0: fixed → verified

Updated

11 years ago

No longer blocks: 984531

Updated

11 years ago

No longer blocks: 984577

Updated

11 years ago

No longer blocks: 985170

Updated

11 years ago

No longer blocks: 986103

Updated

11 years ago

No longer blocks: 985779

Comment 84

11 years ago

Comment on attachment 8396660: Warn when locked texture clients are reused (in debug builds)

Review of attachment 8396660:

I'm a little concerned about most of this code running in release builds but not actually doing anything useful :s.

Attachment #8396660 - Flags: review?(bas) → review-