Closed
Bug 986694
Opened 11 years ago
Closed 1 year ago
Browser app scrolling shows white screen occasionally on Desktop Sites
Categories
(Core :: Graphics: Layers, defect, P3)
Tracking
()
RESOLVED
INVALID
blocking-b2g | - |
People
(Reporter: tkundu, Unassigned)
References
Details
(Keywords: perf, Whiteboard: [c=handeye p= s= u=])
Attachments
(5 files)
Reference Bug: https://bugzilla.mozilla.org/show_bug.cgi?id=942750#c0
STR:
1) Flash v1.4 FFOS build on device
2) Make sure that APZ is turned on for all Gaia apps on your device.
3) Start the Browser app and load www.cnbc.com. Give it 1 minute to load the full website. Scroll as fast as possible on a device with an 800x480 display.
4) You will see white screens occasionally.
The 10MB attachment size limit is too small for a video. I can still compress and upload a video if needed.
Gaia:https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gaia/commit/?h=mozilla/v1.4&id=ee89ad8ce3dbaa27b372affb7121a429ffe18f7a
Gecko: https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gecko/commit/?h=mozilla/v1.4&id=176fc2ed072055ac33a174d4e92169705874a4b9
Reporter
Updated•11 years ago
blocking-b2g: --- → 1.4?
Comment 1•11 years ago
Can you upload it to YouTube?
Updated•11 years ago
blocking-b2g: 1.4? → 1.4+
Updated•11 years ago
Whiteboard: [systemsfe]
Updated•11 years ago
Target Milestone: --- → 1.4 S4 (28mar)
Comment 2•11 years ago
triage: Can someone on the browser team please grab a profile, which will help us diagnose the issue. Thanks! Profiling guide - https://developer.mozilla.org/en-US/docs/Performance/Profiling_with_the_Built-in_Profiler
Flags: needinfo?(anygregor)
Whiteboard: [systemsfe] → [systemsfe][c=handeye p= s= u=]
Updated•11 years ago
Whiteboard: [systemsfe][c=handeye p= s= u=] → [systemsfe][c=handeye p= s= u=1.4]
Comment 4•11 years ago
Using this, I saw some white screens, particularly when scrolling from the bottom back to the top of the page (which is graphics-heavy).
There were a few crashes. This is a desktop site we are being sent (separate evangelism bug?) and it's very heavy, so I'm not sure the white is entirely unexpected.
This was a fresh Gecko build and Gaia profile:
gecko: d75fda4229c7f297f2aa15ae61520dbbb07160f2
Comment 5•11 years ago
And deassigning: the white screen is a graphics issue, and I don't believe there's anything we'll be able to do on the Gaia side.
Assignee: dale → nobody
Comment 6•11 years ago
I captured a profile while scrolling naver.com that looks different from Dale's profile, so I would like to share it. I used a Nexus 4 for that.
In this profile, the child spends a long time in Paint because it is waiting for the main process to allocate the buffer (SendGrallocBufferConstructor).
As I understand it, TextureClientPool should avoid that allocation by keeping some buffers in its pool. However, the number of active buffers (mOutstandingClients) is bigger than the maximum number of texture clients managed by this pool (sMaxTextureClients), so no textures are kept in the pool, and the child has to request a new buffer almost every time.
I did a small test by increasing sMaxTextureClients from 50 to 200, and the white screen appeared less often.
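The pool behavior described in comment 6 can be sketched as a small C++ model. This is a recycling pool whose cap counts both in-use and pooled buffers. It is not Gecko's TextureClientPool. `BoundedTexturePool`, `TileBuffer`, `FreshAllocationsWhileScrolling`, the 60 visible tiles and the churn of 10 tiles per frame are all hypothetical. Only the cap values 50 and 200 come from the comment above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for one gralloc-backed 256x256 RGBA tile.
struct TileBuffer {
  std::vector<uint8_t> pixels = std::vector<uint8_t>(256 * 256 * 4);
};

// Sketch of a recycling pool whose cap counts buffers in use *and* buffers
// waiting for reuse. Illustrative only; not Gecko's implementation.
class BoundedTexturePool {
 public:
  explicit BoundedTexturePool(size_t maxClients) : maxClients_(maxClients) {}

  std::unique_ptr<TileBuffer> Get() {
    ++outstanding_;
    if (!free_.empty()) {
      std::unique_ptr<TileBuffer> buf = std::move(free_.back());
      free_.pop_back();
      return buf;
    }
    ++freshAllocations_;  // the expensive path (an IPC round-trip in the bug)
    return std::make_unique<TileBuffer>();
  }

  void Return(std::unique_ptr<TileBuffer> buf) {
    assert(outstanding_ > 0);
    --outstanding_;
    free_.push_back(std::move(buf));
    // Trim while in-use + pooled exceeds the cap. If the in-use count alone
    // is at or above the cap, the free list always ends up empty.
    while (!free_.empty() && outstanding_ + free_.size() > maxClients_) {
      free_.pop_back();
    }
  }

  size_t freshAllocations() const { return freshAllocations_; }

 private:
  size_t maxClients_;
  size_t outstanding_ = 0;
  size_t freshAllocations_ = 0;
  std::vector<std::unique_ptr<TileBuffer>> free_;
};

// Simulated fast scroll: `visible` tiles stay live; each frame `churn` tiles
// scroll out (returned) and `churn` new ones scroll in (requested).
// Assumes churn <= visible.
size_t FreshAllocationsWhileScrolling(size_t maxClients, size_t visible,
                                      size_t churn, int frames) {
  BoundedTexturePool pool(maxClients);
  std::deque<std::unique_ptr<TileBuffer>> live;
  for (size_t i = 0; i < visible; ++i) live.push_back(pool.Get());
  for (int f = 0; f < frames; ++f) {
    for (size_t i = 0; i < churn; ++i) {
      pool.Return(std::move(live.front()));
      live.pop_front();
    }
    for (size_t i = 0; i < churn; ++i) live.push_back(pool.Get());
  }
  return pool.freshAllocations();
}
```

With 60 tiles live and a cap of 50, every request falls through to a fresh allocation. `FreshAllocationsWhileScrolling(50, 60, 10, 100)` returns 1060. With a cap of 200, every returned tile is reused, and the same call returns 60 (only the initial allocations).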
Comment 7•11 years ago
I should have mentioned that mine was on a Hamachi device; apologies.
Updated•11 years ago
Component: Gaia::Browser → Graphics: Layers
Product: Firefox OS → Core
Version: unspecified → 30 Branch
Updated•11 years ago
Whiteboard: [systemsfe][c=handeye p= s= u=1.4] → [c=handeye p= s= u=1.4]
Comment 8•11 years ago
I don't know what we can do here. Increasing the number of maximum clients means a better chance of OOM. At some point, you're trying to draw more than this device can cache or keep up with. Especially given that we're hitting the desktop site as mentioned above.
Comment 9•11 years ago
(In reply to Milan Sreckovic [:milan] from comment #8)
> I don't know what we can do here. Increasing the number of maximum clients
> means a better chance of OOM. At some point, you're trying to draw more
> than this device can cache or keep up with. Especially given that we're
> hitting the desktop site as mentioned above.
I think we should get a more realistic test case here. Desktop sites will always have problems, as they aren't optimized for mobile devices.
I'm renoming this because I don't think it's realistic to set the no checkerboarding requirement on non-optimized sites for mobile.
blocking-b2g: 1.4+ → 1.4?
Comment 10•11 years ago
Tapas,
Can you please help check if this happening on an optimized site for the phone?
Flags: needinfo?(tkundu)
Reporter
Comment 11•11 years ago
(In reply to Preeti Raghunath(:Preeti) from comment #10)
> Tapas,
>
> Can you please help check if this happening on an optimized site for the
> phone?
It does not come on optimized sites for the phone. I tested with the YouTube, Yahoo and CNBC mobile websites.
Flags: needinfo?(tkundu)
Comment 12•11 years ago
(In reply to Tapas Kumar Kundu from comment #11)
> (In reply to Preeti Raghunath(:Preeti) from comment #10)
> > Tapas,
> >
> > Can you please help check if this happening on an optimized site for the
> > phone?
>
> It does not come on optimized site for the phone. I tested with youtube,
> yahoo and cnbc mobile websites.
Are you saying you can or can't reproduce this with these mobile optimized sites? I can't tell by your comment here.
Flags: needinfo?(tkundu)
Reporter
Comment 13•11 years ago
(In reply to Jason Smith [:jsmith] from comment #12)
> (In reply to Tapas Kumar Kundu from comment #11)
> Are you saying you can or can't reproduce this with these mobile optimized
> sites? I can't tell by your comment here.
I CANNOT reproduce this issue with mobile optimized web sites.
Flags: needinfo?(tkundu)
Updated•11 years ago
Summary: Browser app scrolling shows white screen occasionally → Browser app scrolling shows white screen occasionally on Desktop Sites
Comment 14•11 years ago
Inder,
Since this is not seen on mobile sites, we wouldn't block on this. Please assess and let me know.
Flags: needinfo?(ikumar)
Comment 15•11 years ago
Moving the ni to Vikram, who can assess the requirement from a perf perspective.
Flags: needinfo?(ikumar) → needinfo?(mvikram)
Comment 16•11 years ago
The problem is that an end user will not know if he is hitting a mobile or desktop site. Also, as everyone knows, not all websites direct mobiles to a mobile friendly site.
Was this a regression introduced when APZC was introduced (I'm not sure, because I thought the browser always supported APZC)? Can we quantify the memory increase from increasing the buffer pool? We could consider making this a pref value, as some devices may not be that memory constrained.
Flags: needinfo?(mvikram)
Comment 17•11 years ago
During triage we were wondering if adding checkerboarding over the background color to indicate motion would be an acceptable interim solution for this bug.
Flags: needinfo?(milan)
Comment 18•11 years ago
That suggestion has certainly been put forward before, including perhaps having a different pattern for different applications, or only doing it in applications and not in the browser, among others.
Something like that is probably doable in the 2.0 timeframe if we decide to prioritize it.
Flags: needinfo?(milan)
Comment 20•11 years ago
Maybe we can try FF for Android using the same Gecko version on a QRD device to better level-set the issue. Might as well try Chrome too. If they do no better, then I think we should reconsider spending further v1.4 time on this.
Flags: needinfo?(tkundu)
Comment 21•11 years ago
(In reply to Mandyam Vikram from comment #16)
> ...
> Was this a regression since when APZC was introduced(I'm not sure because I
> thought the browser always supported APZC). Can we quantify the memory
> increase by increasing the buffer pool? We could consider making this a pref
> value as some devices may not be that memory constrained.
Yes, the browser has supported APZ since the start. I don't know if we have devices that run both 1.0 and 1.4 to compare and determine whether this should be marked as a regression.
Flags: needinfo?(milan)
Reporter
Comment 22•11 years ago
(In reply to Michael Vines [:m1] [:evilmachines] from comment #20)
> Maybe we can try FF for Android using the same Gecko version on a QRD device
> to better level-set the issue. Might as well try Chrome too. If they do no
> better than I think we should reconsider spending further v1.4 time on this.
I tested Firefox Aurora[1] on an msm8x26 device running Android KitKat. The browser scrolls fine with www.cnbc.com on Android and does not show any white screen when scrolling fast. The same is observed with Chrome.
So if we want to make v1.4 FFOS as good as Firefox for Android, then we should fix this issue in v1.4.
[1] https://www.mozilla.org/en-US/mobile/aurora/
Flags: needinfo?(tkundu)
Comment 23•11 years ago
BenWa, let's see if there is something obvious here.
Assignee: nobody → bgirard
blocking-b2g: 1.4? → 1.4+
Comment 24•11 years ago
(In reply to Andre Graziani (:graziani) from comment #6)
> Created attachment 8397502 [details]
> Screen shot of profile while scrolling naver.com
>
> I did a profile when scrolling naver.com, that seems different from Dale's
> profile, so I would like to share it. I used nexus-4 for that.
>
> In this profile, the child spends long time on Paint, because it is waiting
> the main process to allocate the buffer (SendGrallocBufferConstructor).
The pool is aimed to make this a bit better but in general bug 959089 should be a better solution.
(In reply to Mandyam Vikram from comment #16)
> Can we quantify the memory
> increase by increasing the buffer pool? We could consider making this a pref
> value as some devices may not be that memory constrained.
Yes, sMaxTextureClients * 256 * 256 * 4 bytes will give you an upper bound. Adding a preference for this is a good idea but we should discuss this in a different bug (clone of this bug is fine).
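Here is the upper bound from the reply above, worked through as compile-time arithmetic. This assumes, as the formula does, that every pooled client holds one full 256x256 tile at 4 bytes per pixel. The constant and function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Upper bound on memory held by one tile pool: each pooled texture client
// holds at most one 256x256 tile at 4 bytes per pixel.
constexpr uint64_t kTileWidth = 256;
constexpr uint64_t kTileHeight = 256;
constexpr uint64_t kBytesPerPixel = 4;

constexpr uint64_t PoolUpperBoundBytes(uint64_t maxTextureClients) {
  return maxTextureClients * kTileWidth * kTileHeight * kBytesPerPixel;
}

// 256 * 256 * 4 = 262144 bytes (256 KiB) per tile.
static_assert(PoolUpperBoundBytes(1) == 262144, "one tile is 256 KiB");
// A cap of 50 clients bounds the pool at 12.5 MiB; 200 clients at 50 MiB.
static_assert(PoolUpperBoundBytes(50) == 13107200, "12.5 MiB");
static_assert(PoolUpperBoundBytes(200) == 52428800, "50 MiB");
```

Under these assumptions, raising the cap from 50 to 200 (the experiment in comment 6) raises the worst case for a single pool by 39321600 bytes, or 37.5 MiB.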
(In reply to Tapas Kumar Kundu from comment #22)
> So if want to make v1.4 FFOS as good as 'firefox for android' then we should
> fix this issue in v1.4.
On Firefox for Android we just use GL textures. They are slower overall than gralloc, but they can be faster if there are a lot of GPU tile allocations and the compositor thread is busy, and therefore slow to service the incoming gralloc allocations. Bug 959089 will hopefully close this gap. The profile in comment 6 shows that this is what we're seeing, which would explain why Firefox for Android is faster here.
My suggestion here is to divert all effort to bug 959089.
Depends on: 959089
Comment 25•11 years ago
(In reply to Benoit Girard (:BenWa) from comment #24)
> Adding a preference for this is a good idea but we should discuss this in a
> different bug (clone of this bug is fine).
Opened bug 996458
Comment 26•11 years ago
(In reply to Benoit Girard (:BenWa) from comment #24)
> ...
>
> My suggestion here is to divert all effort to bug 959089.
That may be the only practical thing to do now. It would probably disqualify this bug from 1.4, since there are a lot of changes in that bug and it depends on more work that still needs to be done.
We have our explanation as to why we're slower. At this point, I would prefer we reconsider this as a blocker. Re-sending to triage.
blocking-b2g: 1.4+ → 1.4?
Comment 27•11 years ago
Inder,
Based on risk, we'd like to move this to 2.0
Flags: needinfo?(ikumar)
Comment 28•11 years ago
Preeti -- OK, fine by me.
Moving ni to Vikram to get his input as well.
Flags: needinfo?(ikumar) → needinfo?(mvikram)
Updated•11 years ago
Whiteboard: [c=handeye p= s= u=1.4] → [c=handeye p= s= u=]
Updated•11 years ago
Status: NEW → ASSIGNED
Comment 31•11 years ago
Minusing from 2.0 since in past releases desktop sites have not been expected to be problem-free when viewing in the fxOS browser. If that understanding has changed and involved teams are committing to supporting full desktop sites feel free to renom.
blocking-b2g: 2.0? → -
Priority: P1 → P3
Target Milestone: 1.4 S4 (28mar) → ---
Comment 32•11 years ago
(In reply to Benoit Girard (:BenWa) from comment #24)
> My suggestion here is to divert all effort to bug 959089.
Bug 959089 has landed on master, and visually it seems that checkerboarding is reduced when scrolling desktop pages. But it still happens a lot.
This is the new profile I got after the patch from bug 959089, scrolling under the same conditions as reported in comment 6:
http://people.mozilla.org/~bgirard/cleopatra/#report=5c319354028788c710d0c381cda655405b153635
The time spent by the child process waiting for buffer allocation is about 30% of the total.
Comment 33•11 years ago
Thanks for capturing this. Can you please enable some debug info in SimpleTextureClientPool (at the top of the file) to see why we aren't reusing tiles more efficiently?
Comment 34•11 years ago
It is important to note that I had to enable "Simple Tiling" in the developer menu to get this log. By default the SimpleTextureClientPool path is not taken; TextureClientPool is used instead.
So, with Simple Tiling, the logs show that about 25% of the requested textures are newly allocated and about 75% are recycled.
Comment 35•11 years ago
Hey Andre. Sorry about misdirecting you. We want to use the default tile pool.
Looking at TextureClientPool I don't think we are running into sMaxTextureClients. Could you add a printf into the GetTextureClient path to verify? More likely we shrink down to sMinCacheSize, which is 0, and then have to allocate our way back up.
That said, there is another bug here. We aren't flushing the pool on a low-memory notification; we only do that after the timer fires. That needs to be fixed as well. Can you add the printf and see what mOutstandingClients looks like during your tests? We can whip up a patch that keeps more tiles around as long as there is no memory pressure.
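Below is a minimal sketch of the two trimming paths discussed in comment 35. The first is a timer-driven shrink down to a minimum cache size. The second is an immediate flush on a low-memory notification. The class and method names are hypothetical and do not mirror Gecko's API. The minimum cache size of 0 comes from the comment; the floor of 32 tiles and the 20-tile bursts used below are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a gralloc tile.
struct Tile {};

// Sketch of a tile pool with a timer-driven trim and a memory-pressure flush.
class TilePool {
 public:
  explicit TilePool(size_t minCacheSize) : minCacheSize_(minCacheSize) {}

  std::unique_ptr<Tile> Get() {
    if (!free_.empty()) {
      std::unique_ptr<Tile> tile = std::move(free_.back());
      free_.pop_back();
      return tile;
    }
    ++freshAllocations_;  // would be the parent round-trip in the bug
    return std::make_unique<Tile>();
  }

  void Return(std::unique_ptr<Tile> tile) { free_.push_back(std::move(tile)); }

  // Periodic trim. With minCacheSize == 0 this empties the pool, so the next
  // burst of scrolling must allocate its way back up.
  void OnShrinkTimer() {
    while (free_.size() > minCacheSize_) free_.pop_back();
  }

  // Low-memory notification: release every pooled tile immediately instead
  // of waiting for the timer.
  void OnMemoryPressure() { free_.clear(); }

  size_t pooled() const { return free_.size(); }
  size_t freshAllocations() const { return freshAllocations_; }

 private:
  size_t minCacheSize_;
  size_t freshAllocations_ = 0;
  std::vector<std::unique_ptr<Tile>> free_;
};

// Two scroll bursts separated by an idle period in which the timer fires.
size_t AllocationsAcrossTwoBursts(TilePool& pool, size_t tilesPerBurst) {
  for (int burst = 0; burst < 2; ++burst) {
    std::vector<std::unique_ptr<Tile>> live;
    for (size_t i = 0; i < tilesPerBurst; ++i) live.push_back(pool.Get());
    for (auto& t : live) pool.Return(std::move(t));
    pool.OnShrinkTimer();  // idle between bursts
  }
  return pool.freshAllocations();
}
```

With a floor of 0, `AllocationsAcrossTwoBursts` on `TilePool(0)` returns 40 for 20-tile bursts, because the timer empties the pool between bursts. With a floor of 32 it returns 20. `OnMemoryPressure()` still releases everything when memory is actually tight.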
Comment 36•11 years ago
Comment 37•11 years ago
Andre, want to try this patch? After a few initial allocations we should never wait, except if we run low on memory. Also please log how many outstanding texture clients we have with a printf so we know what we are dealing with here.
Tapas: "The time spent by the child process waiting for buffer allocation is about 30% of the total." Are you guys allocating without MAP_POPULATE again in your vendor code? 30% of the time spent in allocation sounds a lot like memset exercising the kernel's page fault handler. It would be good if you could measure where this time is spent. If it's in the kernel, it's probably something we should fix in the vendor code as well (the caching stuff above is a separate bug that avoids the parent-process round-trip).
Flags: needinfo?(tkundu)
Comment 38•11 years ago
Andreas, I think line 80 was deleted by mistake in your patch; otherwise the pool will always be empty. So I inserted it back for my tests.
This log contains the number of outstanding clients and the number of textures in the pool.
Reporter
Comment 39•11 years ago
(In reply to Andreas Gal :gal from comment #37)
> Tapas: "The time spent by the child process waiting for buffer allocation is
> about 30% of the total." Are you guys allocating without MAP_POPULATE again
> in your vendor code? 30% time spent in allocation sounds a lot like memset
> exercising the page fault handler of the kernel. Would be good if you can
> measure where this time is spent. If its in the kernel, its probably
> something we should fix in the vendor code as well (the caching stuff above
> is a separate bug, that avoids the parent process round-trip).
Thanks for pointing me to this. I am looking into it and will update ASAP.
Reporter
Comment 40•11 years ago
(In reply to Andreas Gal :gal from comment #37)
> Tapas: "The time spent by the child process waiting for buffer allocation is
> about 30% of the total." Are you guys allocating without MAP_POPULATE again
> in your vendor code? 30% time spent in allocation sounds a lot like memset
> exercising the page fault handler of the kernel. Would be good if you can
> measure where this time is spent. If its in the kernel, its probably
> something we should fix in the vendor code as well (the caching stuff above
> is a separate bug, that avoids the parent process round-trip).
Can you please point me to the exact Gecko function/line number where you are seeing this delay during buffer allocation? Thanks a lot for your help.
Flags: needinfo?(gal)
Reporter
Updated•11 years ago
Flags: needinfo?(tkundu)
Reporter
Updated•11 years ago
Flags: needinfo?(andre.graziani)
Comment 41•11 years ago
Hi Tapas, I am going to rewrite my statement to make it clearer: "The time spent by the child process waiting for parent process to allocate buffer is about 30% of the total."
I got this from the profile in comment 32. Here is exactly the same profile, but with a better range: http://people.mozilla.org/~bgirard/cleopatra/#report=f4ef254bbb5a77a213007d24f05d4b1a3bbe73d5
If you keep expanding the methods with the highest running time, you end up here:
http://dxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/ISurfaceAllocator.cpp#318
where 30% of the time is spent.
However, AFAIK, the allocation happens at parent process, so you may go deeper into parent side.
Flags: needinfo?(andre.graziani)
Comment 42•11 years ago
The parent handles this in SharedBufferManagerParent::RecvAllocateGrallocBuffer, which does:
sp<GraphicBuffer> outgoingBuffer = new GraphicBuffer(aSize.width, aSize.height, aFormat, aUsage);
In previous versions of your silicon gonk we have seen an mmap in the gralloc/ion code that doesn't pre-map the entire buffer and then does a memset, which for larger buffers causes hundreds of segfaults, which is slow.
Flags: needinfo?(gal)
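The cost Andreas describes in comments 37 and 42 can be illustrated on Linux with a plain anonymous mmap. This is not the vendor gralloc/ion path, which is not shown here and may behave differently. Without MAP_POPULATE, the kernel maps pages lazily, so the first write to each page (for example, by a memset) takes a page fault. With MAP_POPULATE, the range is populated inside the mmap call instead. The helper names are hypothetical. Fault counts come from `getrusage`'s `ru_minflt`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sys/mman.h>
#include <sys/resource.h>

// Maps an anonymous, private, read/write buffer. MAP_POPULATE is
// Linux-specific: it asks the kernel to fault in the whole range during
// mmap() instead of on first touch.
void* MapBuffer(size_t bytes, bool populate) {
  int flags = MAP_PRIVATE | MAP_ANONYMOUS;
  if (populate) flags |= MAP_POPULATE;
  void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, flags, -1, 0);
  return p == MAP_FAILED ? nullptr : p;
}

// Minor page faults taken by this process so far.
long MinorFaults() {
  rusage usage{};
  getrusage(RUSAGE_SELF, &usage);
  return usage.ru_minflt;
}

// Counts the minor faults taken while clearing a freshly mapped buffer,
// i.e. the per-page cost a lazily mapped buffer pays on its first memset.
// Returns -1 if the mapping fails.
long FaultsDuringMemset(size_t bytes, bool populate) {
  void* p = MapBuffer(bytes, populate);
  if (p == nullptr) return -1;
  long before = MinorFaults();
  std::memset(p, 0, bytes);
  long after = MinorFaults();
  munmap(p, bytes);
  return after - before;
}
```

For a 1 MiB buffer, `FaultsDuringMemset(1 << 20, false)` is typically on the order of one fault per page. `FaultsDuringMemset(1 << 20, true)` is typically close to zero. Exact numbers depend on the kernel, page size and transparent huge page settings.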
Comment 43•11 years ago
110 is a lot of textures. Why are we doing that?
Reporter
Comment 44•11 years ago
(In reply to Andreas Gal :gal from comment #42)
> The parent handles this in
> SharedBufferManagerParent::RecvAllocateGrallocBuffer, which does:
>
> sp<GraphicBuffer> outgoingBuffer = new GraphicBuffer(aSize.width,
> aSize.height, aFormat, aUsage);
>
> In previous versions of your silicon gonk we have seen an mmap in the
> gralloc/ion code that doesn't pre-map the entire buffer and then does an
> memset, which for larger buffers causes hundreds of segfaults, which is slow.
I just confirmed that SharedBufferManagerParent::RecvAllocateGrallocBuffer() takes negligible time to create the gralloc buffer, but there is a big IPC delay between ISurfaceAllocator::AllocGrallocBuffer() and SharedBufferManagerParent::RecvAllocateGrallocBuffer(). This delay happens randomly while scrolling www.cnbc.com.
I used the following Gaia/Gecko revisions for profiling:
gaia: https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gaia/commit/?h=mozilla/master&id=ed5d408dc1120b035ebce9a809499c30fbfb4582
gecko: https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gecko/commit/?h=mozilla/master&id=45c69de2af9d2504a8baac39ca759403931d5158
Please set needinfo on me for a faster response.
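Comment 44 separates the time spent creating the buffer from the delay before the parent even starts handling the request. The sketch below shows that split in a single process. A hypothetical sender stamps each request with `std::chrono::steady_clock`, and a busy servicing thread records two intervals: the wait before service and the service time.

In the real bug the two sides are different processes. On Linux, `steady_clock` is typically backed by CLOCK_MONOTONIC, which is system-wide, so stamps taken in ISurfaceAllocator::AllocGrallocBuffer() and SharedBufferManagerParent::RecvAllocateGrallocBuffer() could in principle be compared the same way. The `busyWork` sleep is an assumed stand-in for whatever else that thread is doing.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

using Clock = std::chrono::steady_clock;

// A request stamped by the sender before it is queued.
struct AllocRequest {
  Clock::time_point sentAt;
};

struct Timing {
  Clock::duration waitBeforeService;  // sent -> servicing actually starts
  Clock::duration serviceTime;        // time spent on the allocation itself
};

// Sends `requests` allocation requests to a servicing thread that is busy
// for `busyWork` before handling each one, and records both intervals.
std::vector<Timing> MeasureAllocationLatency(int requests,
                                             std::chrono::milliseconds busyWork) {
  std::mutex m;
  std::condition_variable cv;
  std::queue<AllocRequest> q;
  bool done = false;
  std::vector<Timing> timings;  // written only by the servicing thread

  std::thread servicer([&] {
    for (;;) {
      std::unique_lock<std::mutex> lock(m);
      cv.wait(lock, [&] { return done || !q.empty(); });
      if (q.empty()) return;  // done and fully drained
      AllocRequest req = q.front();
      q.pop();
      lock.unlock();
      std::this_thread::sleep_for(busyWork);  // unrelated work ahead of us
      Clock::time_point start = Clock::now();
      std::vector<unsigned char> tile(256 * 256 * 4);  // stand-in allocation
      Clock::time_point end = Clock::now();
      timings.push_back({start - req.sentAt, end - start});
    }
  });

  for (int i = 0; i < requests; ++i) {
    std::lock_guard<std::mutex> lock(m);
    q.push({Clock::now()});
    cv.notify_one();
  }
  {
    std::lock_guard<std::mutex> lock(m);
    done = true;
  }
  cv.notify_one();
  servicer.join();
  return timings;
}
```

`MeasureAllocationLatency(5, std::chrono::milliseconds(2))` yields five timings, each with a `waitBeforeService` of at least 2 ms. The `serviceTime` values typically stay small. This is the same shape as the measurement above: cheap allocation, long wait to be serviced.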
Reporter
Updated•11 years ago
Flags: needinfo?(andre.graziani)
Reporter
Updated•11 years ago
Flags: needinfo?(gal)
Comment 45•11 years ago
Do we peg both CPUs at 100% during this time? Any idea why there is a scheduling delay?
Flags: needinfo?(gal)
Comment 47•11 years ago
I tested again today with the current master, and I was surprised by the improvements that the progressive-paint/low-precision-buffer features have made to checkerboarding.
I can still see checkerboarding in some extreme cases, but most of the time it doesn't appear.
Flags: needinfo?(andre.graziani)
Updated•10 years ago
Assignee: bgirard → nobody
Status: ASSIGNED → NEW
Updated•3 years ago
Severity: normal → S3
Updated•1 year ago
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → INVALID