Closed Bug 916264 Opened 9 years ago Closed 9 years ago

genlock failures on some web pages by browser app

Categories

(Firefox OS Graveyard :: General, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:koi+, firefox26 fixed, firefox27 fixed, b2g-v1.2 fixed)

RESOLVED FIXED
1.2 C1(Sep27)

People

(Reporter: sotaro, Assigned: sotaro)

References

Details

(Keywords: perf, Whiteboard: [c=handeye p= s=2013.09.20 u=1.2])

Attachments

(3 files, 1 obsolete file)

+++ This bug was initially created as a clone of Bug #912134 +++

This bug tracks the genlock failures seen in Bug 912134. Bug 912134 fixed only the slow fps; it did not fix the genlock failures. I saw the failures often while scrolling web pages.

There is a similar bug, Bug 906715, but I am keeping this bug separate from it. Bug 906715 comment 0 contains the following comment. It is about a failure in unlock(), and I have not seen that error since Bug 912134 was fixed.
> genlock/gralloc errors reporting -EINVAL on unlock().

During web page scrolling, I always saw GraphicBuffer::lock() failures. The cause of this defect seems to be different.
Priority: P1 → --
I saw the following type of genlock failure.

> E libgenlock: perform_lock_unlock_operation: GENLOCK_IOC_DREADLOCK failed (lockType0x1, err=Connection timed out fd=35)
> E msm7627a.gralloc: gralloc_lock: genlock_lock_buffer (lockType=0x2) failed
> W GraphicBufferMapper: lock(...) failed -22 (Invalid argument)
> E GraphicBufferMapper: lock(...) failed
The stack trace of the genlock failure is in Bug 912134 comment 24.
From debugging, whenever the problem happened, the situation was always the following.
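To count or compare failures like the ones quoted above across logcat captures, a small parser can help. This is a minimal sketch, not part of any patch in this bug: the function name `parse_genlock_failures` and the regular expression are assumptions based only on the log lines shown in this bug, and other gralloc versions may format the message differently.

```python
import re

# Matches the libgenlock error line as it appears in the excerpts in this bug.
# The layout (including "lockType0x1" without "=") is taken from those lines only.
GENLOCK_RE = re.compile(
    r"libgenlock.*?(?P<ioctl>GENLOCK_IOC_\w+) failed "
    r"\(lockType\s*=?\s*(?P<lock_type>0x[0-9a-fA-F]+), "
    r"err=(?P<err>.+?) fd=(?P<fd>\d+)\)"
)


def parse_genlock_failures(lines):
    """Return one dict per libgenlock failure line found in `lines`."""
    failures = []
    for line in lines:
        match = GENLOCK_RE.search(line)
        if match is None:
            continue  # not a libgenlock failure line
        failures.append({
            "ioctl": match.group("ioctl"),
            "lock_type": int(match.group("lock_type"), 16),
            "err": match.group("err"),
            "fd": int(match.group("fd")),
        })
    return failures


if __name__ == "__main__":
    sample = [
        "E libgenlock: perform_lock_unlock_operation: GENLOCK_IOC_DREADLOCK "
        "failed (lockType0x1, err=Connection timed out fd=35)",
        "E msm7627a.gralloc: gralloc_lock: genlock_lock_buffer (lockType=0x2) failed",
    ]
    for failure in parse_genlock_failures(sample):
        print(failure)
```

Only the libgenlock line carries the errno text and the fd, so the gralloc and GraphicBufferMapper lines are skipped on purpose.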
- ContentClientDoubleBuffered always owns two DeprecatedTextureClientShmem instances.
- When the problem happens, one DeprecatedTextureClientShmem owns a gralloc buffer,
  but the other DeprecatedTextureClientShmem owns shmem.
  This happens because pmem runs out and memory allocation falls back from gralloc to shmem.
- When the lock() failure happens, the gralloc buffer is being handled as the back buffer and
  is about to be synchronized with the front buffer.
  The front buffer is shmem, not gralloc, so rendering is always done by OpenGL, not HwComposer.
In the b2g process, the following classes are used for rendering.
- GrallocDeprecatedTextureHostOGL: used for a thebes layer's gralloc buffer.
- TextureImageDeprecatedTextureHostOGL: used for a thebes layer's Shmem (for content process) or MemoryImage (for b2g process).

GrallocDeprecatedTextureHostOGL objects share a texture for rendering, but TextureImageDeprecatedTextureHostOGL uses its own texture. Therefore, the gralloc buffer continues to be bound to the shared texture even while it is handled as the back buffer. There are only two ways to unbind the gralloc buffer from the texture:
- [1] bind the next gralloc buffer to the texture
- [2] delete the texture
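The two options can be illustrated with a toy model. This is not Gecko code: `Buffer`, `SharedTexture`, and `try_lock` are hypothetical names, and the rule that a buffer cannot be locked while it is still bound to a texture is a simplifying assumption standing in for the genlock timeout seen in the logs.

```python
class Buffer:
    """Hypothetical stand-in for a gralloc buffer."""

    def __init__(self, name):
        self.name = name
        self.bound_to = None  # texture that currently holds this buffer, if any


class SharedTexture:
    """Hypothetical stand-in for the texture shared by the gralloc texture hosts."""

    def __init__(self):
        self.buffer = None
        self.deleted = False

    def bind(self, buffer):
        # Option [1]: binding the next gralloc buffer releases the previous one.
        if self.deleted:
            raise RuntimeError("texture was deleted")
        if self.buffer is not None:
            self.buffer.bound_to = None
        self.buffer = buffer
        buffer.bound_to = self

    def delete(self):
        # Option [2]: deleting the texture releases whatever it was holding.
        if self.buffer is not None:
            self.buffer.bound_to = None
        self.buffer = None
        self.deleted = True


def try_lock(buffer):
    """Model assumption: locking succeeds only if no texture holds the buffer."""
    return buffer.bound_to is None


if __name__ == "__main__":
    texture = SharedTexture()
    back = Buffer("back")
    texture.bind(back)
    # The front buffer is shmem and uses its own texture, so nothing rebinds
    # the shared texture and the back buffer stays bound.
    print("lock while bound:", try_lock(back))
    texture.delete()
    print("lock after delete:", try_lock(back))
```

In this model, a shmem front buffer never triggers option [1], because it never binds to the shared texture; that leaves deleting the texture as the only way to release the gralloc back buffer.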
Today I tried [1], using an approach similar to the one in Bug 909851. The failure frequency seems to decrease, but strangely the genlock failure does not disappear :-(
Finally, I tried [2]. It seems to work; I did not observe any genlock failures. Currently I cannot get recent source code from git for some reason. Once I can get the recent source, I am going to check again.
Assignee: nobody → sotaro.ikeda.g
With tiled rendering disabled locally, I confirmed that the genlock failure does not happen on hamachi.
attachment 805040 depends on Bug 916264, but the same problem could happen even before Bug 916264.
++Sotaro. Can you please also disable tiles again on trunk? I can r+ that patch.
Bug 916112, the bug to disable tiles, has already been committed to b2g-inbound.
Depends on: 916112
Attachment #805040 - Flags: review?(nical.bugzilla)
Attachment #805040 - Flags: review?(nical.bugzilla) → review+
Committable patch. Carry "nical.bugzilla: review+".
Attachment #805318 - Flags: review+
Attachment #805040 - Attachment is obsolete: true
Keywords: checkin-needed
Sotaro, I'm seeing some really bad lag when typing, and it's throwing these errors. Will your patch here fix the issue?

See screencast: http://www.youtube.com/watch?v=fN342lRVxzs 

logcat:
09-16 13:55:35.039: E/libgenlock(140): perform_lock_unlock_operation: GENLOCK_IOC_DREADLOCK failed (lockType0x1, err=Connection timed out fd=168)
09-16 13:55:35.039: E/msm7627a.gralloc(140): gralloc_lock: genlock_lock_buffer (lockType=0x2) failed
09-16 13:55:35.039: W/GraphicBufferMapper(140): lock(...) failed -22 (Invalid argument)

Also, is it the same as bug 858914? cc'ing gsvelto because he said it's related.

If this is the right fix, we need this patch to block koi+
Flags: needinfo?(sotaro.ikeda.g)
(In reply to Tony Chung [:tchung] from comment #13)
> Sotaro, i'm seeing some really bad lagging when typing , and it's throwing
> these errors.  will you patch here fix the issue?  

It seems related, but I am not sure the patch will fix the keyboard error.
Flags: needinfo?(sotaro.ikeda.g)
Sotaro, any idea why the keyboard triggers this in particular? Maybe because it's in the parent process?
I saw this kind of problem when animating the network icon too, once I forced small layers to be gralloc'd. It is not showing up anymore because that change was backed out and we are no longer gralloc'ing small layers. This, however, hurts both performance and battery life, as the hardware compositor can't kick in in a number of common scenarios.
Unfortunately this landed with bug 909746 in the commit message because it was set wrong in the attachment here and I failed to notice it before pushing.

Backed out and re-landed with the correct bug #.
https://hg.mozilla.org/integration/b2g-inbound/rev/bfbb52f4665d
Keywords: checkin-needed
Sorry for my mistake.
https://hg.mozilla.org/mozilla-central/rev/bfbb52f4665d
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Attached file Genlock0917.zip
I was able to get a genlock issue (as seen in the attachment) by going into the browser, making a few Google searches, playing a full-screen YouTube video, and then hitting the home button. I'm going home today, but will try out more scenarios tomorrow if needed; just let me know.
FYI: I created a diagram around ClientThebesLayer.
https://github.com/sotaroikeda/firefox-diagrams/wiki/Firefox-Diagrams
Attached file Genlock0918.zip
Disregard comment 20 as that was on a 1.3 build (unless that's what you want) along with the corresponding attachment. I checked and was able to repro again today for 1.2 and am adding a new logcat that has the specific genlock lines along with a full verbose txt file.

Environmental Variables
Device: Buri 1.2 mozRIL
Build ID: 20130918004001
Gecko: http://hg.mozilla.org/releases/mozilla-aurora/rev/0322470077b7
Gaia: 9b1b262e8fde58be453fb05ed91c0e93ab86d394
Platform Version: 26.0a2
(In reply to gbennett from comment #22)
> Created attachment 806732 [details]
> Genlock0918.zip
> 
> Disregard comment 20 as that was on a 1.3 build (unless that's what you
> want) along with the corresponding attachment. I checked and was able to
> repro again today for 1.2 and am adding a new logcat that has the specific
> genlock lines along with a full verbose txt file.
> 
> Environmental Variables
> Device: Buri 1.2 mozRIL
> Build ID: 20130918004001
> Gecko: http://hg.mozilla.org/releases/mozilla-aurora/rev/0322470077b7
> Gaia: 9b1b262e8fde58be453fb05ed91c0e93ab86d394
> Platform Version: 26.0a2

Can you file a followup bug?
blocking-b2g: koi? → koi+
Keywords: perf
Priority: -- → P1
Whiteboard: [c=handeye p= s=2013.09.20 u=1.2]
Duplicate of this bug: 881970
Target Milestone: --- → 1.2 C1(Sep27)