Closed Bug 786740 Opened 12 years ago Closed 12 years ago

CSS transition with rounded borders cause rendering issues

Categories

(Core :: Graphics: Layers, defect)

17 Branch
x86
All
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla19
Tracking Status
firefox15 --- wontfix
firefox16 + affected
firefox17 + verified
firefox18 + verified

People

(Reporter: developer, Assigned: roc)

References

Details

(Keywords: regression, Whiteboard: [qa-])

Attachments

(7 files, 3 obsolete files)

Attached file bug.html —
User Agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:18.0) Gecko/18.0 Firefox/18.0
Build ID: 20120829030520

Steps to reproduce:

If you open the attached test case, after 1 second, a div will slide out at the bottom of the screen.  This div should be a rounded div with the number "10" inside of it.  But in trunk, it renders glitchy.  The number is not visible and instead of a rounded div, it has jagged edges.

I've tried this with both a Linux nightly and a Windows nightly.  I thought this might be related with dlbi since it started around that time frame but I downloaded a 2012-08-12 build on windows and it still showed the problem.
Attachment #656510 - Attachment mime type: text/plain → text/html
Nightly 20120829030520 Linux x86_64: the “10” is black on dark grey.
I checked some more builds.  It works in the 2012-07-25 nightly and does not work in the nightly 2012-07-26.  This was using a windows nightly with a clean profile.
I can reproduce with HWA off in windows7

Regression window(m-i)
Good:
http://hg.mozilla.org/integration/mozilla-inbound/rev/eecd3aa199e6
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/17.0 Firefox/17.0 ID:20120724224555
Bad:
http://hg.mozilla.org/integration/mozilla-inbound/rev/79727970b987
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/17.0 Firefox/17.0 ID:20120725000855
Pushlog:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=eecd3aa199e6&tochange=79727970b987

Suspected : Bug 769541
Blocks: 769541
Status: UNCONFIRMED → NEW
Component: Untriaged → Graphics: Layers
Ever confirmed: true
Keywords: regression
OS: Linux → All
Product: Firefox → Core
Version: 18 Branch → 17 Branch
I'd say this is actually much more likely to be caused by bug 771154.
Assignee: nobody → chrislord.net
Blocks: 771154
No longer blocks: 769541
Status: NEW → ASSIGNED
Sorry, I missed the regression range, you're right. Will take a look at this.
Blocks: 769541
No longer blocks: 771154
(In reply to Chris Lord [:cwiiis] from comment #5)
> Sorry, I missed the regression range, you're right. Will take a look at this.

Have you had the chance to look at this yet Chris? We're getting very close to 17 on beta.
(In reply to Alex Keybl [:akeybl] from comment #6)
> (In reply to Chris Lord [:cwiiis] from comment #5)
> > Sorry, I missed the regression range, you're right. Will take a look at this.
> 
> Have you had the chance to look at this yet Chris? We're getting very close
> to 17 on beta.

Sorry, I haven't gotten round to looking at this yet, I've been spinning too many plates... I should have the time to look at it today. Any help is appreciated, but I guess layout are probably very busy dealing with DLBI atm.
Not sure what's going on here.

The optimised display-list is:

Viewport (Frame A)
  Clip (Frame B)
    CanvasBackground (Frame C)
    FixedPosition (Frame B)
      Background (Frame D)

Invalidations happen on frame A in the correct position, and nothing else is invalidated. I'm not sure if this is correct behaviour or not.

The animated tab is the background item, for which no invalidations are logged against. As it animates in, the area that it previously occupied remains the same and the new area is drawn (because the visible region would have changed). As the layer has no idea of its offset, it can't know to invalidate the area it previously occupied.

This seems to me like either the invalidation that happens on frame A should propagate up somehow, or that there are missing invalidations entirely. I would have expected for there to be invalidations on Frame D, as this is the frame that's moving.
Right, after a bit of digging, I'd have thought this line:

https://mxr.mozilla.org/mozilla-aurora/source/layout/generic/nsAbsoluteContainingBlock.cpp#463

should cause invalidations on either Frame B or D (or both?)
I cannot reproduce in Nigltly18.a1 anymore.
http://hg.mozilla.org/mozilla-central/rev/fd724f194a1f
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:18.0) Gecko/18.0 Firefox/18.0 ID:20121005030609
http://hg.mozilla.org/mozilla-central/rev/fd724f194a1f
Mozilla/5.0 (X11; Linux i686; rv:18.0) Gecko/18 Firefox/18.0a1 ID:20121005030609

However, I can still reproduce in Aurora.17.0a2 w/o HWA.
http://hg.mozilla.org/releases/mozilla-aurora/rev/97d5b2b09fb8
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/17.0 Firefox/17.0 ID:20121005042010
http://hg.mozilla.org/releases/mozilla-aurora/rev/97d5b2b09fb8
Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/17 Firefox/17.0a2 ID:20121005042010
(In reply to Alice0775 White from comment #10)
> I cannot reproduce in Nigltly18.a1 anymore.
> http://hg.mozilla.org/mozilla-central/rev/fd724f194a1f
> Mozilla/5.0 (Windows NT 6.1; WOW64; rv:18.0) Gecko/18.0 Firefox/18.0
> ID:20121005030609
> http://hg.mozilla.org/mozilla-central/rev/fd724f194a1f
> Mozilla/5.0 (X11; Linux i686; rv:18.0) Gecko/18 Firefox/18.0a1
> ID:20121005030609
> 
> However, I can still reproduce in Aurora.17.0a2 w/o HWA.
> http://hg.mozilla.org/releases/mozilla-aurora/rev/97d5b2b09fb8
> Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/17.0 Firefox/17.0
> ID:20121005042010
> http://hg.mozilla.org/releases/mozilla-aurora/rev/97d5b2b09fb8
> Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/17 Firefox/17.0a2
> ID:20121005042010

Right, this would have been fixed by DLBI, but will be reproduceable in Aurora regardless on hardware acceleration.

Drilling down some more;

The invalidation happens on Frame D, but frame D's parent is frame A. As there's a container layer between frame A and D, the invalidation never reaches where it needs to be. I think some extra invalidation needs to happen in nsAbsoluteContainingBlock.
Attached patch Fix invalidation of nested fixed position items (obsolete) — — Splinter Review
The problem is that fixed position items end up adjacent on the frame tree, but can still be hierarchical on the display list. This means that when invalidating, if an item is on a container layer owned by its sibling, its invalidations will be lost.

This patch fixes it by applying invalidations to fixed items to their siblings. I expect there to be a more elegant solution than this, but this is what I came up with.
Attachment #668527 - Flags: review?(roc)
Comment on attachment 668527 [details] [diff] [review]
Fix invalidation of nested fixed position items

Review of attachment 668527 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/generic/nsAbsoluteContainingBlock.cpp
@@ +470,5 @@
> +      if (fixedSibling == aKidFrame) {
> +        continue;
> +      }
> +      fixedSibling->Invalidate(parentRelativeVisualOverflow - fixedSibling->GetPosition());
> +    }

This seems unnecessary. Details of layer arrangements shouldn't leak into here.

I think we could expand the call to InvalidateFrameSubtree below into Invalidate() and InvalidateThebesLayersInSubtree(), but replace the latter with InvalidateThebesLayersInSubtreeWithUntrustedFrameGeometry(), to work around the fact that the position of the frame has changed.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #13)
> Comment on attachment 668527 [details] [diff] [review]
> Fix invalidation of nested fixed position items
> 
> Review of attachment 668527 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: layout/generic/nsAbsoluteContainingBlock.cpp
> @@ +470,5 @@
> > +      if (fixedSibling == aKidFrame) {
> > +        continue;
> > +      }
> > +      fixedSibling->Invalidate(parentRelativeVisualOverflow - fixedSibling->GetPosition());
> > +    }
> 
> This seems unnecessary. Details of layer arrangements shouldn't leak into
> here.
> 
> I think we could expand the call to InvalidateFrameSubtree below into
> Invalidate() and InvalidateThebesLayersInSubtree(), but replace the latter
> with InvalidateThebesLayersInSubtreeWithUntrustedFrameGeometry(), to work
> around the fact that the position of the frame has changed.

This doesn't work, as the invalidation still won't reach the right container layer. Perhaps we could mark fixed-pos frames that aren't direct descendents of their parents so that they could invalidate just the previous sibling (recursively) to reduce the amount of unnecessary work that may be done here?
How about adding a flag to RefCountedRegion to indicate that the invalid region is infinite, and setting that flag from InvalidateThebesLayersInSubtreeWithUntrustedFrameGeometry?

Or just setting the region to be as big as can be?
Attached patch possible fix (obsolete) — — Splinter Review
To summarise IRC conversation:

This is actually a symptom of a much bigger and more general problem and we probably ought to get a regression range on it. roc formulated a better test-case that doesn't involve fixed-position items which I'll turn into a ref-test.

roc's patch looks good to the both of us, I'll update it with a short-circuit path that we've agreed upon and it's r=me - with DLBI, we don't need this, so this is only for Aurora (and possibly beta, depending on regression range?)

As documented in the patch, the problem here is that we invalidate following the frame-tree, but we build the display-list following placeholder frames, so if any non-shared ancestors of the placeholder and out-of-flow frames aren't shared and create container layers, invalidations will be lost.

This is actually worryingly easy to trigger...

Thanks a lot roc for figuring this one out properly!
Ignore the comment about short-circuiting, it breaks invalidation of other things, so the patch was fine as is. Attaching with commit comment and r=me. Working on reftest.
Attachment #669096 - Attachment is obsolete: true
Attachment #669116 - Flags: review+
Attachment #668527 - Attachment is obsolete: true
Attachment #668527 - Flags: review?(roc)
Attached file Invalidation test (obsolete) —
This is a test that will trigger the bug that doesn't involve position:fixed - we should use this to pin down a regression range, to determine if this patch needs to be applied to beta.

How to use: Click where it says to click, then hover where it says to hover - the hover item should turn red. With the bug, it fails to update until the animation stops (10 seconds later - no need to wait for those 10 seconds though, if it doesn't change, the bug is present).
Adding qa flags to find the real regression window - please see comment #19.
Sorry, use the prefixed CSS transition property so that it works across more Firefox versions. I can confirm that this bug exists in the current Firefox release (15) :(
Attachment #669117 - Attachment is obsolete: true
Comment on attachment 669116 [details] [diff] [review]
Fix invalidations on out-of-flow frames

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Long-standing bug (possible regression)
User impact if declined: Out-of-flow frame backed elements (such as absolutely and fixed position elements) will not update in some situations. This will happen more often for fixed-position elements than others due to the changes from bug 758620
Testing completed (on m-c, etc.): Local testing completed.
Risk to taking this patch (and alternatives if risky): Low risk, patch only adds *more* invalidation. There will be a (likely small) performance regression in the cases that this affects (most of which are quite rare).
String or UUID changes made by this patch: None.
Attachment #669116 - Flags: approval-mozilla-aurora?
Comment on attachment 669116 [details] [diff] [review]
Fix invalidations on out-of-flow frames

[Approval Request Comment]
Same as above.
Attachment #669116 - Flags: approval-mozilla-beta?
Comment on attachment 669116 [details] [diff] [review]
Fix invalidations on out-of-flow frames

FF15 is affected according to comment 21, so we won't make the call to re-spin FF16 at this point. Please land on Aurora 17 now, however.
Attachment #669116 - Flags: approval-mozilla-beta?
Attachment #669116 - Flags: approval-mozilla-beta-
Attachment #669116 - Flags: approval-mozilla-aurora?
Attachment #669116 - Flags: approval-mozilla-aurora+
Pushed to Aurora: https://hg.mozilla.org/releases/mozilla-aurora/rev/7f293a7c521d

Just to clarify for anyone reading this/future;

This bug exists at least since Firefox 15 and affects any out-of-flow frame on a container layer that isn't created from itself or an ancestor of its placeholder frame. This is a rare enough situation that this hasn't been a major issue for us (and I'm not sure how far back this bug goes), but as you can see from roc's test-case, this is not particularly hard to trigger.

What makes this worse is the position:fixed changes that landed in 16 meaning that the original test-case in this bug will trigger it (invalidations of any fixed position child of another fixed-position element). Hopefully this isn't too common a situation either, but the patch is here if need be.

This is fixed by DLBI, so 18 is unaffected.
The decision not to preemptively chemspill for this comes from the fact that this issue has been in its current state in FF16 for >2months without any major website regressions reported. We'll consider for a 16.0.1 (based upon user feedback) if we end up having one, though.
Is there any way to work around this?  We ship software so all software that we've released so far will look bad for these 6 weeks but it would be nice to have a work around for our upcoming releases.
I can see the problem as described in comment #19 all the way back to 4.0.1. I tried 3.6.28, and it seems to work ok in that version. We can narrow this down tomorrow.
(In reply to Trev from comment #28)
> Is there any way to work around this?  We ship software so all software that
> we've released so far will look bad for these 6 weeks but it would be nice
> to have a work around for our upcoming releases.

If you're talking about the bug as it manifests with position:fixed, there are ways - basically, you want to either avoid nesting position:fixed elements, or you need to find a way to either do all the invalidation you need prior to animating (and not have any dynamic elements), or find a way to force the invalidation on the root position:fixed element.

There are various ways to force the invalidation, I'm not sure what the 'best' and most reliable way to do it would be though. Altering the DOM structure for that element will likely do it, or altering style properties.

If you don't mind, could you elaborate on how this affects you? (i.e. what particular structure you're using that regresses due to this bug)
(In reply to Chris Lord [:cwiiis] from comment #30)
> If you don't mind, could you elaborate on how this affects you? (i.e. what
> particular structure you're using that regresses due to this bug)

The original test case is from our source code.
Chris, you still need to land this on trunk, I think :-)
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #32)
> Chris, you still need to land this on trunk, I think :-)

trunk as in release? Going on comment #27, I shouldn't do that? Bit confused now...

I do still need to sort out the reftest though.
Depends on: 799856
Oh right, fixed on trunk by DLBI. I momentarily forgot :-)
What happened here is that InvalidateAncestorOfOutOfFlow walks up the ancestor chain of a placeholder frame that's being destroyed (in nsPlaceholderFrame::DestroyFrom, which is invalidating its out-of-flow frame, which InvalidateAncestorOfOutOfFlow directs back to the placeholder). It finds an ancestor (the immediate parent, in this case) and calls FrameLayerBuilder::InvalidateThebesLayersInSubtreeWithUntrustedFrameGeometry on it, which walks the frame subtree rooted at the ancestor. Unfortunately we're in the middle of enumerating and destroying the ancestor's direct child frames, so we find some dead frames there and blow up.
This is not easy to fix in a 100% reliable way that I'd be comfortable with landing on beta. Then again, there's a lot of painful backout to do if we don't fix it right. Ho hum.
Attached patch Keep lists in sync — — Splinter Review
I think this is the best thing to do. I considered various work-arounds for the frame tree not being traversable during frame destruction, but couldn't come up with anything robust.

Almost all frames destroy their children by calling nsFrameList::DestroyFramesFrom() directly or indirectly. That method removes each frame from the list before destroying it so we never have dead frames in the tree. As far as I know the only exception is nsBlockFrame, where it destroys frames via nsLineList::DeleteLineList, which destroys all the frames before removing them from the line list and the frame list. This patch fixes that by having nsLineList::DeleteLineList remove each frame from the line list and the frame list before destroying it.

I think we should do this on trunk as well as branches, for sanity's sake.

New code like this isn't ideal for Beta, but I think this patch is probably better than our alternatives at this point. It's quite isolated and easy to understand. My main concern is actually that there could be other regressions from the previous patch in this bug that this patch doesn't fix.
Assignee: chrislord.net → roc
Attachment #670352 - Flags: review?(matspal)
Attached file crashtest frame dump —
For the record, this is the frame dump (for the crashtest) to go with
roc's description of the crash in comment 36.
Comment on attachment 669116 [details] [diff] [review]
Fix invalidations on out-of-flow frames

It appears that each out-of-flow calls InvalidateAncestorOfOutOfFlow that
walks up the ancestor chain to a HAS_CONTAINER_LAYER frame and then calls
InvalidateThebesLayersInSubtreeWithUntrustedFrameGeometry
on that, which in turn walks all its descendants.
This seems like it could be O(N^2) to me.  Even if we believe this is a
rare edge case, I'm still a little worried about performance (having
"IE Maze" in recent memory).
Attached patch Alternative 2b — — Splinter Review
I think we could simply remove all the lines up front if we want.
The important thing to fix here is the dangling pointers in mFrames
so that the frame tree is valid at all times.

This makes any line list iteration (during destruction) not find
any lines; I don't think it matters since they are all going away
anyway.  As far as I know, it's only used for marking lines dirty.
Attached patch Alternative 2c — — Splinter Review
I think your patch is correct, but I would add an optimization
to delete the line's frame hash table up front (SwitchToCounter).
This is to avoid removing frames one by one from it
(in NoteFrameRemoved).

Also, why do we do "aDestructRoot ? aDestructRoot : child" ?
This is the only place we do that and I don't see why.
(The only caller is nsBlockFrame::DestroyFrom which just pass
along its aDestructRoot, which seems like it should always work)
Comment on attachment 670352 [details] [diff] [review]
Keep lists in sync

Looks fine, but I prefer 2b or 2c.
Attachment #670352 - Flags: review?(matspal) → review+
> why do we do "aDestructRoot ? aDestructRoot : child" ?
Boris asked the same in bug 508473 comment 23 when this code was added -
and fantasai explains why in bug 508473 comment 36.

Back then, DestroyOverflowLines called nsLineBox::DeleteLineList
with aDestructRoot == nsnull:
http://hg.mozilla.org/mozilla-central/file/1e37be566afa/layout/generic/nsBlockFrame.cpp#l4543

DestroyOverflowLines doesn't do that anymore, it asserts that it's empty:
http://hg.mozilla.org/releases/mozilla-beta/file/tip/layout/generic/nsBlockFrame.cpp#l4594

So changing it to just pass aDestructRoot there should be fine I think.
Blocks: 733991
(In reply to Mats Palmgren [:mats] from comment #40)
> It appears that each out-of-flow calls InvalidateAncestorOfOutOfFlow that
> walks up the ancestor chain to a HAS_CONTAINER_LAYER frame and then calls
> InvalidateThebesLayersInSubtreeWithUntrustedFrameGeometry
> on that, which in turn walks all its descendants.
> This seems like it could be O(N^2) to me.  Even if we believe this is a
> rare edge case, I'm still a little worried about performance (having
> "IE Maze" in recent memory).

True. However this case is rare (since we haven't seen any bugs filed on it since Firefox 4!) and it's completely fixed with DLBI. So I think we can take this possible performance regression for one release.
(In reply to Mats Palmgren [:mats] from comment #42)
> Created attachment 671004 [details] [diff] [review]
> Alternative 2c
> 
> I think your patch is correct, but I would add an optimization
> to delete the line's frame hash table up front (SwitchToCounter).
> This is to avoid removing frames one by one from it
> (in NoteFrameRemoved).

Sure, I'll take this. I think I'd rather leave the line list valid during destruction "just in case".
(In reply to Chris Lord [:cwiiis] from comment #19)
> Created attachment 669117 [details]
> Invalidation test

I'm getting inconsistent results with the testcase in comment 19. The testcase fails in Firefox 15.0 but passes in Firefox 14.0.1. However, when I try the testcase in the nightlies, it fails between 2012-06-05 and 2012-03-01 which goes well before Firefox 14.0 was released. AFAICT the testcase should PASS at some point within that range but it isn't.
Comment on attachment 669116 [details] [diff] [review]
Fix invalidations on out-of-flow frames

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Longstanding bug but exposed by bug 758620 (and causing bug 793998)
User impact if declined: Out-of-flow frame backed elements (such as absolutely and fixed position elements) will not update in some situations. This will happen more often for fixed-position elements than others due to the changes from bug 758620
Testing completed (on m-c, etc.): Local testing completed, was on mozilla-beta for a while but got backed out due to crashes. Patch to fix crashes has been on inbound for a couple of days.
Risk to taking this patch (and alternatives if risky): Low risk, patch only adds *more* invalidation. There will be a (likely small) performance regression in the cases that this affects (most of which are quite rare). Did cause crashes which are addressed by the extra patch, which is itself pretty safe.
String or UUID changes made by this patch: None.
Attachment #669116 - Flags: approval-mozilla-beta- → approval-mozilla-beta?
Comment on attachment 671004 [details] [diff] [review]
Alternative 2c

See approval request in previous patch in this bug for details.
Attachment #671004 - Flags: approval-mozilla-beta?
Comment on attachment 669116 [details] [diff] [review]
Fix invalidations on out-of-flow frames

It's early, we will have time to watch the Beta channel for any significant volume of crashes, should that occur after this lands and goes out in Beta 2, approving for uplift.
Attachment #669116 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #671004 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment on attachment 671004 [details] [diff] [review]
Alternative 2c

[Approval Request Comment]
Bug caused by (feature/regressing bug #): None
User impact if declined: None
Testing completed (on m-c, etc.): a few days on mozilla-central
Risk to taking this patch (and alternatives if risky): very low
String or UUID changes made by this patch: None
The only reason to take this on Aurora is that it's on central and beta, so it might be confusing to not have it on Aurora as well.
Attachment #671004 - Flags: approval-mozilla-aurora?
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Attachment #671004 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Blocks: 802470
Whiteboard: [leave open]
Given my inconsistent results in comment 48, I'm not sure we can address the qawanted (if it's still needed) and I'm not sure if we'll be able to reliably verify this bug is fixed. Please advise.
Since no one has spoken up to object or advise my request in comment 56 I'm dropping qawanted from this bug. I'm also flagging qa- since there doesn't seem to be a way to reliably verify this is fixed.

@Trev, since you reported this bug, if you can help by verifying it's fixed, that would be appreciated. It should be tested in the latest Firefox 19.0a1 Nightly, 18.0a2 Aurora, and 17.0b6.

Thank you
Keywords: qawanted
Whiteboard: [qa-]
I've checked Nightly, Aurora, and Beta.  It is working for me.
Thanks Trev!
Status: RESOLVED → VERIFIED
Target Milestone: --- → mozilla19
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: