Keon and Unagi freeze on master branch while trying to load the Marketplace and other apps, in E/Adreno200-EGLSUB( 472): UnlockImage() genlock_unlock_buffer failed E/libgenlock( 472): perform_lock_unlock_operation: GENLOCK_IOC_LOCK failed

RESOLVED FIXED

Status

Firefox OS
General
--
critical
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: fabrice, Unassigned)

Tracking

({hang, regression})

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments)

(Reporter)

Description

5 years ago
With a build from birch, I get an endless stream of libgenlock errors, and the device finally freezes.

E/libgenlock(  472): perform_lock_unlock_operation: GENLOCK_IOC_LOCK failed (lockType0x0, err=Invalid argument fd=105)
E/Adreno200-EGLSUB(  472): UnlockImage() genlock_unlock_buffer failed
E/libgenlock(  472): perform_lock_unlock_operation: GENLOCK_IOC_LOCK failed (lockType0x0, err=Invalid argument fd=105)
E/Adreno200-EGLSUB(  472): UnlockImage() genlock_unlock_buffer failed
E/Adreno200-EGLSUB(  472): UnlockImage() genlock_unlock_buffer failed
E/libgenlock(  472): perform_lock_unlock_operation: GENLOCK_IOC_LOCK failed (lockType0x0, err=Invalid argument fd=105)
E/Adreno200-EGLSUB(  472): UnlockImage() genlock_unlock_buffer failed
E/libgenlock(  472): perform_lock_unlock_operation: GENLOCK_IOC_LOCK failed (lockType0x0, err=Invalid argument fd=105)
E/Adreno200-EGLSUB(  472): UnlockImage() genlock_unlock_buffer failed
E/libgenlock(  472): perform_lock_unlock_operation: GENLOCK_IOC_LOCK failed (lockType0x0, err=Invalid argument fd=105)
E/Adreno200-EGLSUB(  472): UnlockImage() genlock_unlock_buffer failed
E/libgenlock(  472): perform_lock_unlock_operation: GENLOCK_IOC_LOCK failed (lockType0x0, err=Invalid argument fd=105)
E/Adreno200-EGLSUB(  472): UnlockImage() genlock_unlock_buffer failed
E/Adreno200-EGLSUB(  472): UnlockImage() genlock_unlock_buffer failed
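
For triage, a stream like the one above can be tallied per tag and process. The following is only a sketch: it assumes logcat's "brief" line format as seen in this excerpt, and `count_errors` is a hypothetical helper name, not part of adb or any Mozilla tool.

```python
import re
from collections import Counter

# Matches logcat "brief" format lines such as:
#   E/libgenlock(  472): perform_lock_unlock_operation: ...
LINE_RE = re.compile(
    r"^(?P<level>[VDIWEF])/(?P<tag>[^(]+)\(\s*(?P<pid>\d+)\):\s(?P<msg>.*)$"
)


def count_errors(lines):
    """Count error-level lines per (tag, pid). Hypothetical helper."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("level") == "E":
            key = (match.group("tag").strip(), int(match.group("pid")))
            counts[key] += 1
    return counts


# Three lines copied from the log in this report.
sample = [
    "E/libgenlock(  472): perform_lock_unlock_operation: GENLOCK_IOC_LOCK failed (lockType0x0, err=Invalid argument fd=105)",
    "E/Adreno200-EGLSUB(  472): UnlockImage() genlock_unlock_buffer failed",
    "E/Adreno200-EGLSUB(  472): UnlockImage() genlock_unlock_buffer failed",
]
counts = count_errors(sample)
```

A rapidly growing count for the same pid and fd would suggest one buffer is being hit repeatedly, rather than many unrelated failures.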

Updated

5 years ago
Duplicate of this bug: 896541

Updated

5 years ago
Blocks: 884399

Updated

5 years ago
Keywords: hang
Which device?

QA: Can you please test to see if this is reproducible on shipping devices, or a regression on Birch?
Keywords: qawanted
(Reporter)

Comment 3

5 years ago
Keon, as stated in the bug title :P.
(In reply to Fabrice Desré [:fabrice] from comment #3)
> Keon, as stated in the bug title :P.

Sorry, new to Bugzilla :)

Comment 5

5 years ago
We've been able to replicate this on Unagi devices since this Friday's nightly build[1], if that helps narrow your regression range.

[1] https://pvtbuilds.mozilla.org/pub/mozilla.org/b2g/nightly/mozilla-central-unagi-eng/2013/07/2013-07-19-03-02-06/
(In reply to Dietrich Ayala (:dietrich) from comment #2)
> Which device?
> 
> QA: Can you please test to see if this is reproducible on shipping devices,
> or a regression on Birch?

To my understanding, we currently cannot effectively test central+master builds on anything except Unagi, as there are problems getting central builds running on other shipping devices. This is being worked on, but at the moment this is reproducible on Unagi, which is currently our 1.2 device for central testing. When we get a better reference device, we can use that.
Keywords: qawanted
Since bug 896541 is blocking Marketplace automation for Gaia, and was marked a duplicate of this, marking this as critical severity too.
Severity: normal → critical

Updated

5 years ago
Duplicate of this bug: 896926
This is not limited to Marketplace; some other apps fail too.

I was able to reproduce this in the E-mail app on Unagi.

Gecko  http://hg.mozilla.org/mozilla-central/rev/e3c19a339b36
Gaia  72d0d7a29b725cf554c8241ea2204625b27dc3ed
BuildID 20130722172607
Version 25.0a1

STR:
1. Open the Email app
2. add your credentials 
3. tap next

Actual:
3. When we tap Next, the device freezes and the phone is no longer usable.
!! Screen and buttons don't work

We need to take out the battery or use adb reboot to make the phone usable again.
Summary: Keon freezes while trying to load the Marketplace. → Keon and Unagi freeze on master branch while trying to load the Marketplace and other apps, in E/Adreno200-EGLSUB( 472): UnlockImage() genlock_unlock_buffer failed E/libgenlock( 472): perform_lock_unlock_operation: GENLOCK_IOC_LOCK failed
I emailed Milan to see if someone from the GFX team could take a look into this bug.
Let's see if we can get a regression range here to identify the bug that caused this.
Keywords: regression, regressionwindow-wanted
I wonder if this is related to bug 880780#c23.  In both cases the problem is not on all devices, but it is there on unagi, and it points to pieces lower than what we control (driver, genlock)
When looking at regression range, see if it worked fine on June 4th, but not a week later, or if this is a much later regression.
(In reply to Jason Smith [:jsmith] from comment #12)
> Let's see if we can get a regression range here to identify the bug that
> caused this.

My bug 896926 is dup'ed onto this one, and the commit I found after many hours of bisection was "Bug 884188 - Resize small gralloc'd surfaces to work around issues when drawing small canvases on certain devices. r=nrc", hg d264bdb8b400, git.m.o 4af4c99ef22d.  So we may want to look at that.

Incidentally, I didn't see any of the log messages referenced in comment #0 in my case.

(In reply to Milan Sreckovic [:milan] from comment #13)
> I wonder if this is related to bug 880780#c23.  In both cases the problem is
> not on all devices, but it is there on unagi, and it points to pieces lower
> than what we control (driver, genlock)

See also https://bugzilla.mozilla.org/attachment.cgi?id=779667 (also from bug 896926).
BAD:
Failed > Console Output  #144[1] Jul 19, 2013 8:30:35 AM	 x68MB (88 failures, definitely hit this bug)

UNSURE, due to aborted build with no logcat:
Aborted > Console Output  #143[2] Jul 19, 2013 7:06:08 AM	 38MB

GOOD: Failed > Console Output  #142[3] 	Jul 19, 2013 3:42:05 AM	 76MB

I _think_ this started with the July 19th, 2013 master Unagi-engineering build (Mozilla RIL) -- https://pvtbuilds.mozilla.org/pub/mozilla.org/b2g/nightly/mozilla-central-unagi-eng/2013/07/2013-07-19-03-02-06/

[1] http://qa-selenium.mv.mozilla.com:8080/view/B2G%20Unagi/job/b2g.unagi.mozril.gaia.master.ui/144/
[2] http://qa-selenium.mv.mozilla.com:8080/view/B2G%20Unagi/job/b2g.unagi.mozril.gaia.master.ui/143/
[3] http://qa-selenium.mv.mozilla.com:8080/view/B2G%20Unagi/job/b2g.unagi.mozril.gaia.master.ui/142/

Here's the sources.xml from the bad build: https://pvtbuilds.mozilla.org/pub/mozilla.org/b2g/nightly/mozilla-central-unagi-eng/2013/07/2013-07-19-03-02-06/sources.xml
Comment 15 and comment 16 align with each other, and both point to bug 884188.
Keywords: regressionwindow-wanted
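
The regression-range search discussed above is essentially a binary search over an ordered list of builds. A minimal sketch follows, assuming the first build is known good, the last is known bad, and every build after the regressing one is also bad. `first_bad`, the revision numbers, and the stand-in predicate are all hypothetical; in practice `is_bad` would mean flashing a build and running the STR.

```python
def first_bad(builds, is_bad):
    """Return the index of the first bad build.

    Assumes builds[0] is good, builds[-1] is bad, and that the
    regression persists in every build after it was introduced.
    """
    lo, hi = 0, len(builds) - 1  # invariant: builds[lo] good, builds[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return hi


# Hypothetical revision numbers; the regression is pretended to land at 111.
builds = list(range(100, 117))
index = first_bad(builds, lambda build: build >= 111)
culprit = builds[index]
```

Each step halves the remaining range, so 17 candidate builds need at most 4 test runs. A build that cannot be classified (like the aborted run with no logcat) has to be skipped in favor of a neighboring build.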

Updated

5 years ago
Blocks: 884188
Gabriele - Can you confirm that the patch cited in comment 18 caused this regression in this bug? If so, can you back that patch out?
Flags: needinfo?(gsvelto)
Flags: needinfo?(dscravaglieri)
I'm testing right now both with and without the patch and I'll get back with an answer ASAP.

Updated

5 years ago
Duplicate of this bug: 896369
(In reply to Jason Smith [:jsmith] from comment #19)
> Gabriele - Can you confirm that the patch cited in comment 18 caused this
> regression in this bug? If so, can you back that patch out?

With bug 884188, Gabriele enabled small image layers to use gralloc buffers.
These image layers use a single buffer, which leaves the buffer locked (genlock) between the content process and the chrome process.

The following line explains why the buffer lock error shows up when only a single buffer is used. (On each buffer swap, it calls DeleteTexture to free the texture that was used last time.)
http://mxr.mozilla.org/mozilla-central/source/gfx/layers/opengl/TextureHostOGL.cpp#780

I created a patch that uses double buffering for image layers and verified it together with Gabriele's patch; I no longer see the genlock error.
Created attachment 780290
genlock debug for image layer
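
To illustrate why a second buffer avoids the lock conflict described above, here is a toy model. It is not Gecko's code and does not reproduce real genlock semantics: `Buffer`, `run_frames`, and the "producer"/"compositor" roles are hypothetical simplifications, assuming the compositor keeps the last presented buffer locked until the next swap.

```python
class Buffer:
    """Toy stand-in for a gralloc buffer guarded by an exclusive lock."""

    def __init__(self, name):
        self.name = name
        self.holder = None  # who currently holds the lock, if anyone

    def try_lock(self, who):
        if self.holder is not None:
            return False  # someone else still holds the buffer
        self.holder = who
        return True

    def unlock(self, who):
        if self.holder != who:
            return False  # unlocking a lock we do not hold fails
        self.holder = None
        return True


def run_frames(num_buffers, frames):
    """Return how many producer lock attempts fail over `frames` frames."""
    buffers = [Buffer(f"buf{i}") for i in range(num_buffers)]
    presented = None  # buffer currently held by the compositor
    failures = 0
    for frame in range(frames):
        target = buffers[frame % num_buffers]
        if not target.try_lock("producer"):
            failures += 1  # buffer is still locked by the compositor
            continue
        target.unlock("producer")  # drawing finished
        if presented is not None:
            presented.unlock("compositor")  # release the previous frame
        target.try_lock("compositor")  # hold the new frame until next swap
        presented = target
    return failures


single = run_frames(num_buffers=1, frames=4)
double = run_frames(num_buffers=2, frames=4)
```

In this model the single-buffer case fails on every frame after the first, while the double-buffer case never contends, at the cost of allocating a second buffer.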
I can confirm that the problem was caused by http://hg.mozilla.org/mozilla-central/rev/d264bdb8b400 and the specific issue was triggered by the network activity icon being refreshed (which is why it showed up in the Marketplace). I've also tested that Peter's patch in attachment 780290 does make the error message go away.

As I mentioned in bug 884188, this is really a workaround for an issue in an external dependency, so I'm not sure what would be the best (read: least risky) way forward. If Peter's patch fixes the issue for good, I'm all for taking it; if not, or if we think this is too fragile, I'd rather back out http://hg.mozilla.org/mozilla-central/rev/d264bdb8b400 and deploy a Gaia workaround for the specific issue (i.e. the network activity icon), as that would have a lower risk.

Peter, you're far more familiar with this code than I am, so I leave it up to you to evaluate whether it's worth making a follow-up based on attachment 780290 or not. I'll prepare a Gaia patch in the meantime nonetheless, because we might need it in mozilla-b2g18 anyway, where we've already backed out the workaround. If you think there's no easy way out of it, I'd say we back out that commit today and I'll roll in the Gaia workaround ASAP.
Flags: needinfo?(gsvelto)
Attachment 780290 increases memory allocation. I feel that is not the way to go. Master already has problems around genlock (Bug 880780, Bug 869696, Bug 871624). Bug 858914 is going to fix the gralloc handling in GFX. On master, the fix for this problem should depend on Bug 858914.
Bug 884188 needs to fix the problem on b2g18 soon, not on master. And bug 884188 is already backed out on b2g18. Master can wait a while for the fix.
(In reply to Sotaro Ikeda [:sotaro] from comment #25)
> Master already has problems around genlock
> (Bug 880780, Bug 869696, Bug 871624). Bug 858914 is going to fix the
> gralloc handling in GFX. On master, the fix for this problem should
> depend on Bug 858914.

OK, so shall we back out the mozilla-central workaround and wait to land it again until bug 858914 is fixed? I have a gaia workaround for bug 884188 ready for v1-train which will not require us to enlarge the gralloc'd buffer so we won't have the problem there either.
(In reply to Gabriele Svelto [:gsvelto] from comment #27)
> OK, so shall we back out the mozilla-central workaround and wait to land it
> again until bug 858914 is fixed? I have a gaia workaround for bug 884188
> ready for v1-train which will not require us to enlarge the gralloc'd buffer
> so we won't have the problem there either.

Yes. It is better to back out the patch until the fix is confirmed after bug 858914 lands.

Updated

5 years ago
Duplicate of this bug: 897196

Updated

5 years ago
Duplicate of this bug: 897560
I hit this in several other cases, but since the patch was reverted, the problems are gone.
Fixed by the backout in bug 884188.
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED

Updated

5 years ago
Flags: needinfo?(dscravaglieri)

Updated

5 years ago
Duplicate of this bug: 896830