Closed Bug 1062475 Opened 10 years ago Closed 9 years ago

[Camera] Camera button highlight appears as a square.

Categories

(Core :: Graphics, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla36
blocking-b2g 2.1?
Tracking Status
b2g-v2.0 --- unaffected
b2g-v2.1 --- affected
b2g-v2.2 --- unaffected

People

(Reporter: jthomas, Assigned: chiajung)

References

()

Details

(Whiteboard: [2.1-exploratory])

Attachments

(2 files, 5 obsolete files)

Description: When holding down the camera button and pressing elsewhere a square icon is displayed around the camera button.

Repro Steps:
1) Update a Flame to 20140902040205
2) Launch Camera App
3) Press and hold the camera button.
4) While holding the camera button, press rapidly elsewhere in the camera app.

Actual: A square icon appears around the camera button.

Expected: It is expected that the camera button will change colour to signify it is being pressed.

Flame 2.1

Environmental Variables:
Device: Flame Master
Build ID: 20140903000204
Gaia: fbb297c39aab5f17b179533d2a9a6c5166b2c197
Gecko: fb5e796da813
Version: 34.0a2 (Master)
Firmware Version: v123
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0

Repro frequency: 100%
See attached: Logcat, Video https://www.youtube.com/watch?v=ZMhhWA3yYTY&edit=vd
This issue DOES occur on the Flame 2.1 (512mb) and Open_C 2.1

Description: When holding down the camera button and pressing elsewhere a square icon is displayed around the camera button.

Flame 2.1

Environmental Variables:
Device: Flame Master
Build ID: 20140903000204
Gaia: fbb297c39aab5f17b179533d2a9a6c5166b2c197
Gecko: fb5e796da813
Version: 34.0a2 (Master)
Firmware Version: v123
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0

Open C 2.1

Environmental Variables:
Device: Open_C Master
Build ID: 20140903000204
Gaia: fbb297c39aab5f17b179533d2a9a6c5166b2c197
Gecko: fb5e796da813
Version: 34.0a2 (Master)
Firmware Version: P821A10V1.0.0B06_LOG_DL
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0

This issue does NOT occur on the Flame 2.0 and Open_C 2.0

Result: A square icon appears around the camera button.

Flame 2.0

Environmental Variables:
Device: Flame 2.0 (319mb)
BuildID: 20140903000206
Gaia: 449d8db9b3ea1f9262db822c37ef2143477172b7
Gecko: 13b41e22c8f6
Version: 32.0 (2.0)
Firmware: V123
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0

Open_C 2.0

Environmental Variables:
Device: Open_C 2.0
Build ID: 20140903000206
Gaia: 449d8db9b3ea1f9262db822c37ef2143477172b7
Gecko: 13b41e22c8f6
Version: 32.0 (2.0)
Firmware Version: P821A10V1.0.0B06_LOG_DL
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
Keywords: regression
Whiteboard: [2.1-exploratory]
Please clarify your actual result for 2.0.
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage-]
Flags: needinfo?(ktucker) → needinfo?(jthomas)
Results for 2.0: When holding the camera button and pressing anywhere else on the screen rapidly the camera icon will turn white, shrinking in only one time. I believe this to be by design.

Additionally this issue DOES occur on Flame 2.2

Flame 2.2(319mb)

Environmental Variables:
Device: Flame Master (319mb)
Build ID: 20140904040204
Gaia: 008026e932b64b4a70b9931c3da96986583bc8d4
Gecko: 776fa9cf70cd
Version: 35.0a1 (Master)
Firmware Version: v123
User Agent: Mozilla/5.0 (Mobile; rv:35.0) Gecko/35.0 Firefox/35.0

Result: A square icon appears around the camera button.
QA Whiteboard: [QAnalyst-Triage-] → [QAnalyst-Triage?]
Flags: needinfo?(jthomas) → needinfo?(ktucker)
Even though this is a regression not nominating to block on this because it appears to be minor. The user has to hold down the shutter button and tap on the screen to see the square.
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage+][lead-review+]
From bug 1073144 (which I'm marking as a dupe):

When the camera app is under high load (perhaps when memory is swapping?) and/or when the shutter button is tapped very, very rapidly, it sometimes animates as a square rather than as a circle.

I've seen this in both camera and video mode. You can see it in the video that was recorded for bug 1066885: http://youtu.be/8-mlqgAvaUY

I find it fairly easy to reproduce this by just tapping a fast as I can on the shutter button in either mode.

We know (from bug 1066885, for example) that the camera code can get stuff out of sync when buttons are pushed really rapidly.  What if js/controllers/controls.js calls captureHighlightOff while the captureHighlightOn animation is still happening? Or vice versa? Could either of those things cause the border-radius of an animated element to change in mid-animation?

On the other hand, this seems like it could be a platform bug. If I were investigating, I'd ask someone in #gfx if they know whether css animations ever intentionally drop details like border-radius settings to speed things up under high load.
The appears to be a platform graphics issue related to masking (via border-radius and overflow: hidden) within layerised content.

Moving this component to core:graphics and NI cwiiis (sorry Chris!) to help NI the correct graphics person.
Component: Gaia::Camera → Graphics
Flags: needinfo?(chrislord.net)
Product: Firefox OS → Core
Passing it on to mattwoodrow and tn.
Flags: needinfo?(tnikkel)
Flags: needinfo?(matt.woodrow)
Flags: needinfo?(chrislord.net)
This sounds very similar to bug 1072830, which was being investigated by Chiajung.
Flags: needinfo?(matt.woodrow) → needinfo?(chung)
Bug 1072830 is closed because of gaia behaviour change, I can tell only:
(1) The code path is common code, they should be the same on all platform
(2) The mask only missed while the masked elements becomes a standalone layer(mask effect must be applied by compositor). While it is merged in content side, we can never observe the issue.
(3) The missing mask layer symptom is always single frame from layerscope.
(4) Bug 1072830 is a ThebesLayer+maskLayer, this bug seems colorLayer+maskLayer.

From (3), it maybe some kind of out of sync problem, I am checking now, however, while layerscope enabled, the animation is usually skipped and make the problem hard to be observed. The slowdown may mislead the observation.
Flags: needinfo?(chung)
Based on the observation above, I found this is a hardware composer bug.
Attached patch V1 (obsolete) — Splinter Review
Let hardware composer skip layer while maskLayer presents.
Attachment #8501587 - Flags: feedback?(sotaro.ikeda.g)
Comment on attachment 8501587 [details] [diff] [review]
V1

Good catch! The patch seems work. I forward the feedback request to Sushil.
Attachment #8501587 - Flags: feedback?(sotaro.ikeda.g) → feedback?(sushilchauhan)
Comment on attachment 8501587 [details] [diff] [review]
V1

Review of attachment 8501587 [details] [diff] [review]:
-----------------------------------------------------------------

If I disable HWComposer from Settings, even then I see a square on the Camera button. It is using GPU Composition now. How is it related to HWC ? I think it is layer painting issue.
Attachment #8501587 - Flags: feedback?(sushilchauhan) → feedback-
I can not observe square icon after disable hwc. Can you provide a video/snapshot shows GPU composition + square icon?

Please enable FPS meter in Settings app and take video. If it uses HWC, the fps counter will not show. I confirmed that yesterday, I may upload my video/snap later.

By the way, does hwc support such kind of ROP?
Sorry, we do not have v2.1 build setup for Flame device and I am busy with other stuff. I had checked it on a H/W Overlay device with v2.1 and disabled HWC from Settings. I verified with logs that it is using GPU Composition and I noticed square image on Camera button. You can check on MADAI with v2.1 build.
I confirmed it on Flame-KK build, and can not reproduce both this and bug 1072830.
While 1072830 hit rate is 100% with older gaia, and this is harder to trigger, I would suggest to try disable HWC and use older gaia to reproduce the problem.

By the way, I can not access MADAI device here, and may not able to further check it while my Flame can not reproduce it.

@pchang
Can you help access MADAI device or confirm whether it is MADAI only?
Flags: needinfo?(pchang)
(In reply to Chiajung Hung [:chiajung] from comment #18)
> I confirmed it on Flame-KK build, and can not reproduce both this and bug
> 1072830.
> While 1072830 hit rate is 100% with older gaia, and this is harder to
> trigger, I would suggest to try disable HWC and use older gaia to reproduce
> the problem.
> 
> By the way, I can not access MADAI device here, and may not able to further
> check it while my Flame can not reproduce it.
> 
> @pchang
> Can you help access MADAI device or confirm whether it is MADAI only?

chiajung, you can get it from device team.
Flags: needinfo?(pchang)
I tried MADAI with this week's build, but I can not reproduce both this and bug 1072830.

Since I can not reproduce this on Flame with my patch and we can not access MADAI's source code, I may not able to follow up this bug for MADAI device.

@sotaro
Do you think the patch is safe to land into mozilla-central? If yes, please label r+, and I will format the commit log to be merged.
Flags: needinfo?(sotaro.ikeda.g)
(In reply to Sushil from comment #15)
> Comment on attachment 8501587 [details] [diff] [review]
> V1
> 
> Review of attachment 8501587 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> If I disable HWComposer from Settings, even then I see a square on the
> Camera button. It is using GPU Composition now. How is it related to HWC ? I
> think it is layer painting issue.

I just re-checked it by using latest master flame-kk. The problem happened by disabling Hwcomposer.
Flags: needinfo?(sotaro.ikeda.g)
I recall one thing. If a layer has a mask, ContainerLayer uses intermediate surface. In this case, HwComposer is not used. Therefore attachment 8501587 [details] [diff] [review] does not have actual effect.

http://mxr.mozilla.org/mozilla-central/source/gfx/layers/Layers.cpp#1069
http://mxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D.cpp#380
Thanks for the info, I further investigate this problem with modified layerscope tool, and find the MaskLayer is not nullptr but EffectMask is nullptr.

I make some related log printf_stderr, and found that we will early return here:
http://dxr.mozilla.org/mozilla-central/source/gfx/layers/composite/CompositableHost.cpp#160
Finally, I found the culprit:

http://dxr.mozilla.org/mozilla-central/source/gfx/layers/client/ImageClient.cpp#189

After remove this and can not reproduce the problem over 100 click.

This is strange: this makes us always send UpdateTexture, UseTexture, RemoveTexture in 1 transaction, and make me think it should always reproducible and not BSP dependent.

To find a proper solution, I may need take time to investigate the reason why its random.
Attached patch V2 (obsolete) — Splinter Review
I found there are 2 or more solution, one is this patch, another is to use ImageClientSingle like other platform rather then ImageClientDouble here. But that may affect other part based on the comment there.

The problem occurs while the Image::GetSerial() differs but the underlying TextureClient is the same. Originally I think I should make CairoImage cache TextureClient use Image::GetSerial as key rather than CompositableForwarder::GetSerial but failed. Therefore, I fix it here...
Attachment #8501587 - Attachment is obsolete: true
Attachment #8505145 - Flags: feedback?(nical.bugzilla)
Comment on attachment 8505145 [details] [diff] [review]
V2

Review of attachment 8505145 [details] [diff] [review]:
-----------------------------------------------------------------

Sotaro is a better person to give this feedback.
I think that it indeed makes sense to not remove the current texture if the new one is the same, but I may be missing something.
Attachment #8505145 - Flags: feedback?(nical.bugzilla) → feedback?(sotaro.ikeda.g)
Comment on attachment 8505145 [details] [diff] [review]
V2

It looks good. It might not be optimum solution from performance point of view. but it seem OK for the time being.
Attachment #8505145 - Flags: feedback?(sotaro.ikeda.g) → feedback+
In current gecko, the following is expected for duplicate detection. But this bug make clear that is actually not enough. There could be a case that same image is sent back to back.

>   if (mLastPaintedImageSerial == image->GetSerial()) {

In gecko, we use GrallocTextureClient. gralloc buffer do direct binding to texture. Therefore, if it happens, GrallocTextureClient should be immutable, otherwise it is a bug. When it is immutable, there is not need to forward GrallocTextureClient to parent side.
In ImageClientSingle::UpdateImageInternal(), GetForwarder()->UseTexture() for new TextureClient is called before RemoveTexture() for old TextureHost. This order is chosen for the performance. If same TextureClient is used for updating image, current TextureClient is soon removed as old one. It causes the this bug's problem.
(In reply to Sotaro Ikeda [:sotaro] from comment #28)
> In current gecko, the following is expected for duplicate detection. But
> this bug make clear that is actually not enough. There could be a case that
> same image is sent back to back.
> 
> >   if (mLastPaintedImageSerial == image->GetSerial()) {
> 
> In gecko, we use GrallocTextureClient. gralloc buffer do direct binding to
> texture. Therefore, if it happens, GrallocTextureClient should be immutable,
> otherwise it is a bug. When it is immutable, there is not need to forward
> GrallocTextureClient to parent side.

A small correction here: MaskLayers is A8 surface which we use normal memory sharing rather than Gralloc one. BTW, the cost of sending GrallocBuffers around is much more cheap than normal memory. Since we send GrallocBufferHandle(2 integers) rather than a bunch of raw data.
Attached patch V2,final (obsolete) — Splinter Review
Clean up for review/landing.
Attachment #8505145 - Attachment is obsolete: true
Attachment #8505861 - Flags: review?(sotaro.ikeda.g)
Seems like this bug is under control now, clearing needinfo.
Flags: needinfo?(tnikkel)
(In reply to Chiajung Hung [:chiajung] from comment #30)
> > 
> > In gecko, we use GrallocTextureClient. gralloc buffer do direct binding to
> > texture. Therefore, if it happens, GrallocTextureClient should be immutable,
> > otherwise it is a bug. When it is immutable, there is not need to forward
> > GrallocTextureClient to parent side.
> 
> A small correction here: MaskLayers is A8 surface which we use normal memory
> sharing rather than Gralloc one. BTW, the cost of sending GrallocBuffers
> around is much more cheap than normal memory. Since we send
> GrallocBufferHandle(2 integers) rather than a bunch of raw data.

Yeha, A8 is not supported by gralloc hal, therefore we use shmem for it.
Comment on attachment 8505861 [details] [diff] [review]
V2,final

Review of attachment 8505861 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/client/ImageClient.cpp
@@ +298,4 @@
>  
>      if (mFrontBuffer &&
>          (mFrontBuffer->IsImmutable() || mFrontBuffer->GetSize() != size)) {
> +      //autoRemoveTexture.mTexture = mFrontBuffer;

Why is it commented out?
Attachment #8505861 - Flags: review?(sotaro.ikeda.g)
Attached patch V2,final_1 (obsolete) — Splinter Review
Remove test code.
Attachment #8505861 - Attachment is obsolete: true
Attachment #8506573 - Flags: review?(sotaro.ikeda.g)
Attachment #8506573 - Flags: review?(sotaro.ikeda.g) → review+
Attached patch Final (obsolete) — Splinter Review
Try ticket:
https://tbpl.mozilla.org/?tree=Try&rev=738a24242568

All the try fails seems non-related.
Assignee: nobody → chung
Attachment #8506573 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #8507760 - Flags: review+
Why was this pushed to Try on top of a parent from early September? That seems like a *really* bad idea to me. Also, an "all" run for this was almost certainly excessive, which leads to backlogs that affect all developers. Please be more considerate in the future.
https://wiki.mozilla.org/Sheriffing/How:To:Recommended_Try_Practices
Flags: needinfo?(chung)
https://hg.mozilla.org/integration/mozilla-inbound/rev/2a87759dd8c8

Also, please keep in mind that your commit message should be summarizing what the patch is doing, not just restating the problem it's solving.
https://developer.mozilla.org/en-US/docs/Developer_Guide/Committing_Rules_and_Responsibilities#Checkin_comment
Keywords: checkin-needed
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #37)
> Why was this pushed to Try on top of a parent from early September? That
I push it yesterday, and the time stamp here is "Sun Oct 19 21:52:58 2014 PDT"?
> seems like a *really* bad idea to me. Also, an "all" run for this was almost
> certainly excessive, which leads to backlogs that affect all developers.
> Please be more considerate in the future.
> https://wiki.mozilla.org/Sheriffing/How:To:Recommended_Try_Practices
I try all for this is because the change is in common path. 

Though the bug is b2g only, we use the same ImageClientSingle on all platform for the same thing. B2G uses ImageClientDouble which calls to ImageClientSingle in this case, and all the related logic is in ImageClientSingle. I think if my patch's wrong, I may cause some leaking or even break all other platforms as well.

(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #38)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/2a87759dd8c8
> 
> Also, please keep in mind that your commit message should be summarizing
> what the patch is doing, not just restating the problem it's solving.
> https://developer.mozilla.org/en-US/docs/Developer_Guide/
> Committing_Rules_and_Responsibilities#Checkin_comment
OK, I will upload a new patch with summarize.
Flags: needinfo?(chung)
(In reply to Chiajung Hung [:chiajung] from comment #39)
> I push it yesterday, and the time stamp here is "Sun Oct 19 21:52:58 2014
> PDT"?

The parent revision your patches were applied on top of was 152ef25e89ae, from 9-Sep. Note that the failure-palooza on Android and Mn/Mnw are dead giveaways to how it was too (infrastructure changes that needed in-tree changes to work properly).
https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=152ef25e89ae

> I try all for this is because the change is in common path. 
> 
> Though the bug is b2g only, we use the same ImageClientSingle on all
> platform for the same thing. B2G uses ImageClientDouble which calls to
> ImageClientSingle in this case, and all the related logic is in
> ImageClientSingle. I think if my patch's wrong, I may cause some leaking or
> even break all other platforms as well.

You can still almost certainly ascertain that by running a subset of platforms/tests rather than 300+hr worth of everything.

> OK, I will upload a new patch with summarize.

No need to at this point, it's already pushed. Just keep it in mind for the future :)
Add summarize for the patch.
Attachment #8507760 - Attachment is obsolete: true
Attachment #8508395 - Flags: review+
Keywords: checkin-needed
Per comment 38, your patch was already pushed to inbound.
Keywords: checkin-needed
Oops, you are really fast :)
https://hg.mozilla.org/mozilla-central/rev/2a87759dd8c8
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
Hello Chiajung, 

This bug was originally written for the Flame device in the 2.2 and 2.1 branches.  However, it does not appear to have a patch uplifted to those branches. 

If the patch was intended to fix Flame 2.2, 2.1 then it would fail our verification process because 2.1 is still affected.  Is this bug closed completely, or should we wait for another fix to land and then verify it?  Thanks for your time!  

Affected build: 

Device: Flame 2.1
BuildID: 20141103001220
Gaia: 027a7de0c95320cea0579bfd1a4ceef3e9038f34
Gecko: ffecb2be228b
Gonk: 48835395daa6a49b281db62c50805bd6ca24077e
Version: 34.0 (2.1)
Firmware: V188
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0
Flags: needinfo?(chung)
I heard :greg can still reproduce this in some special condition, I would further debug it later.
However, I tested over 500 run before landing and can not reproduce it, I think I will need a better reproduce step for the bug.

@gred
Can you describe how to reproduce this bug after the patch?
Status: RESOLVED → REOPENED
Flags: needinfo?(chung) → needinfo?(gweng)
Resolution: FIXED → ---
I had reproduced this bug after the previous patch at a special scenario: 

1. Put the Flame onto my MacBookAir with an angle (necessary!)
2. Launch Camera
3. Before it's fully ready, press the taking shot button

The button, as it processing something and the performance is not so perfect, would become square. However, it can't be reproduced stably. I made a video to show the STR I mentioned:

As I tested, the angle is necessary. If I put it on my desk, or on a another laptop which is thicker than MacBookAir, the symptom would not occur. Maybe this is because of the angle would let it try hard to focus on the target and cause some hardware issue, since if I switch the hardware compositor off, the issue gone.
Flags: needinfo?(gweng)
[Blocking Requested - why for this release]: Based on Bug 1099010.
blocking-b2g: --- → 2.1?
Hi Chiajung, is any update here?
Flags: needinfo?(chung)
As I said in comment #46, I can not reproduce this locally, so I have no way to further debug it.

@Greg,
Can you still reproduce this now?
We have several related bug landed, maybe the symptom you found is just a duplication of other bugs.
Could you please try it again?
Flags: needinfo?(chung) → needinfo?(gweng)
Per talked to Greg, he can not reproduce this now, too.
Let's close it.
Status: REOPENED → RESOLVED
Closed: 10 years ago9 years ago
Resolution: --- → FIXED
Flags: needinfo?(gweng)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: