Unexpected painting behaviour with low-res tiling code

RESOLVED FIXED in mozilla23

Status

()

Core
Graphics
RESOLVED FIXED
6 years ago
a year ago

People

(Reporter: kats, Assigned: cwiiis)

Tracking

20 Branch
mozilla23
All
Android
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(fennec+)

Details

(URL)

Attachments

(1 attachment)

I'm observing some strange painting behaviour; BenWa agrees this is not expected behaviour at all and needs investigation.

STR:
1. Load the URL in the URL field of the bug
2. Tap on the page. this will start an infinite JS loop on the page, hanging the main thread and preventing painting from happening
3. Pan the page

Expected results:
The low-res displayport area outside the critical displayport should be painted in low-resolution

Actual results:
The low-res displayport area is not painted at all; it is just white.

Something else that helps isolate the problem is setting the gfx.displayport.strategy pref to 3, which switches to no-margins displayport. With this, the critical displayport will always be exactly what is visible on the screen, and the low-res displayport will extend a little outside that.
Blocks: 818546, 792138
Version: 19 Branch → 20 Branch
(Assignee)

Comment 1

6 years ago
This is expected - this is because the low-res area is only drawn when we're at risk of checkerboarding, and this is decided during the draw. You're blocking further draws happening, so we can't ever decide to switch on low-res drawing.

To make this situation work, we'd always have to draw the low res region. I'm open to the idea, but it makes the more common case (that we don't need low res drawing) slower.

Unless we took drawing off of the Gecko thread, this isn't really fixable in any other way (?)
(Assignee)

Comment 2

6 years ago
How I'm guessing this came up is that a very slow device gets blocked for way too long doing non-drawing tasks and you see blank a lot - it might be worth having some kind of flag for when a device (or page, somehow?) is expected to be slow, and always doing low-res drawing?
I see a lot of blank even on the Galaxy Tab and Galaxy Nexus on that page, which should generally be pretty fast to draw and doesn't do anything else at all. TBH I'm not sure what the specific algorithm is supposed to be painting when, but the experience is pretty bad. Even after the point where the entire page should be cached in low-res, there are frequent flashes of white as I hit blank areas.
https://www.youtube.com/watch?v=xIa3q2byGag - this is on the Galaxy Tab with me just panning around on the page without having tapped (so that drawing is still happening as needeD).
(Assignee)

Comment 5

6 years ago
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #4)
> https://www.youtube.com/watch?v=xIa3q2byGag - this is on the Galaxy Tab with
> me just panning around on the page without having tapped (so that drawing is
> still happening as needeD).

The invalidation of this page doesn't look right at all... What display port strategy is this with?

Assuming that isn't a default build (if it is, there's definitely something buggy going on there), the problem here is that the way we determine when to activate low res drawing (DisplayPortCalculator.aboutToCheckerboard) is either getting the wrong values, or inadequate for this use-case. We don't want to slow down every page for the pages that render slowly, conversely, we don't want those pages to provide a bad UX when we could theoretically do a lot better.

Suggestions for a cleverer algorithm?

Perhaps we could track how much of the viewport isn't in the critical displayport over time (or how far it is from the center) and have some kind of rolling average that forces on low-precision drawing over a certain amount?

Also, although the example you show is very bad, it's a synthetic test to demonstrate this problem - let's not spend *too* long optimising for unnatural behaviour.
The video is a normal build but with the no-margin displayport strategy. It's better with the default strategy but I think only because the critical displayport is larger; the low-res one still isn't getting painted at reasonable times. We'll have to look at that in more detail.
Assignee: nobody → chrislord.net
tracking-fennec: --- → ?
(Assignee)

Comment 7

6 years ago
My current plan for this is to scrap only drawing in low res when in danger, and instead, always drawing low-res when scrolling and having a timer so that it only gets disabled after a certain period of being static (say a second or so). It will still be quite easy to expose blank area during page load and on first scroll during intensive JavaScript. Any thoughts on this?

Another possibility is to switch the order of drawing to draw low-res first instead of high-res... If we're only drawing low-res when scrolling, this ought to work ok, but I'd be a little wary of such a change. That ought to reduce the possibility of seeing blank area quite a lot though, I'd have thought. Any thoughts on this too?

I'll cook up patches for both and we can see how it affects synthetic tests, but I suspect that it's something that will require manual testing to really get a feel for.
(Assignee)

Updated

6 years ago
Status: NEW → ASSIGNED
I think one thing we should definitely do is after we're done drawing a high res patch, fill in the surrounding area with low-res. That will usually happen at the end of the user scrolling (and also on page load) and provides a buffer of low-res content that can be rendered when they start scrolling again to prevent blanking.
(Assignee)

Comment 9

6 years ago
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #8)
> I think one thing we should definitely do is after we're done drawing a high
> res patch, fill in the surrounding area with low-res. That will usually
> happen at the end of the user scrolling (and also on page load) and provides
> a buffer of low-res content that can be rendered when they start scrolling
> again to prevent blanking.

The problem is finding the heuristic to do this that doesn't kill performance during page-load and when not scrolling. During page load, the entire page is constantly reflowing and being redrawn, so enabling it here has a bad impact on page load time, especially on armv6 phones. When not scrolling, we really only want to be updating what's on the screen - though it's unlikely that stuff is changing outside of the screen, if it is we don't want it to impact too largely on what the user can actually see...

Though I may be wrong, I don't think page load changing actually guarantees a draw, so we can't just turn it off when page load < 100%, turn it on for page load to cache the area and expect that to work reliably. We could force a draw on page load == 100%, which incurs the display list building penalty, but that might be acceptable? This would let us cache the area after load.

I think the scroll timer will be reasonable, and maybe we want the forced draw too... I'll write a third patch for this.
tracking-fennec: ? → 21+
(Assignee)

Comment 10

6 years ago
Created attachment 706365 [details] [diff] [review]
Part 1 - Small changes to danger determination

This likely doesn't help too much for your particular test case, but seems to produce marginally better talos numbers.

It changes things so that if any draw is aborted because the viewport slipped outside of the displayport, the next draw will have low precision tiles (regardless of whether this happened on high or low precision drawing).

It also allows low precision drawing to happen if we slip into the danger zone without having sent another viewport adjustment (I don't think this should ever happen, but it doesn't hurt to do it in this order just in case).
Attachment #706365 - Flags: review?(bugmail.mozilla)
Comment on attachment 706365 [details] [diff] [review]
Part 1 - Small changes to danger determination

Review of attachment 706365 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM.
Attachment #706365 - Flags: review?(bugmail.mozilla) → review+
why hasn't this landed?
Flags: needinfo?(chrislord.net)
(Assignee)

Comment 13

5 years ago
Pushed to inbound:

https://hg.mozilla.org/integration/mozilla-inbound/rev/3b237165e5c7

I'm adding [leave-open] as this patch doesn't entirely fix this bug, and I don't think it really can be fixed without OMTP (off-main-thread painting). I don't think the patch I've pushed will make a hugely significant difference, so I'd suggest that this shouldn't be a tracking bug.
Flags: needinfo?(chrislord.net)
Whiteboard: [leave-open]
(Assignee)

Comment 14

5 years ago
Also to note, I have a couple of patches that experimented with low-precision rendering fall-off based on time, and it only had negative effects on talos measurements.

Some of that is because our tests just aren't stressful enough, but I wouldn't feel confident in committing a solution like that - I'd prefer to wait for OMTP unless this is deemed a serious issue.
tracking-fennec: 21+ → +
Depends on: 912148
We really should stop using leave-open like this. The patch landed almost a year ago!
Has OMTP happened? Is that OTMC or something different?
Low res painting still occurs on the main thread. With tiling when some heuristics are hit like if we're going to pan faster then the peak memory bandwidth (no possible way we can keep up with painting) the main thread will paint the screen tile by tile at 1/4 the resolution/bandwidth.
(Assignee)

Comment 19

4 years ago
I don't think it's useful to keep this open anymore, resolving as invalid. I don't think it's possible to improve the heuristics enough to deal with the situation described in this bug without negatively affecting performance in more common situations. As already stated, we would need OMTP to do that.
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → INVALID
A patch landed on this bug, so resolving as fixed. According to the rapid release calendar 04 apr 2013 should have been Fx23 but I didn't confirm that with hg.
Resolution: INVALID → FIXED
Whiteboard: [leave-open]
Target Milestone: --- → mozilla23
You need to log in before you can comment on or make changes to this bug.