Closed Bug 1459937 Opened 2 years ago Closed Last year

Clipped question mark on Nightly whatsnew page after asynchronous font load

Categories

(Core :: Layout: Text and Fonts, defect, P2)

defect

Tracking

()

VERIFIED FIXED
mozilla63
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- wontfix
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 --- wontfix
firefox63 --- verified

People

(Reporter: mstange, Assigned: gerald)

References

(Regression)

Details

(Keywords: regression)

Attachments

(6 files)

Steps to reproduce:
 1. Make sure your window is at least 1400 pixels wide.
 2. Go to https://www.mozilla.org/en-US/firefox/62.0a1/whatsnew/
 3. Wait for it to load and notice the font switch.

Expected results:
The text "Unexpected behavior?" at the end of the first line in the right column should be visible in its entirety.

Actual results:
The question mark is clipped off.

If the bug didn't happen, try to force a full reload using shift+reload.

Triggering a reflow by resizing the window to be smaller and then larger again fixes it.
Attached image screenshot [broken]
Attached image screenshot [fixed]
Looks like it's connected to the infamous "flash of unstyled [or un-fonted] text": the problem occurs when the webfont loads and we re-render the content, replacing the fallback font that appeared initially. Apparently we're not doing a sufficiently thorough reflow at that point.

So it may not happen for people on really fast connections, if the font resources load almost instantaneously; but it's pretty reliably reproducible on my home ADSL, for example.

The use of the font-display:swap descriptor in the @font-face rules on this page make it much likelier that the user will see the FOUT (and hence this bug) than would otherwise be the case. (I wonder why they chose to use this? Generally the default behavior is thought to be preferable.)
Here's a further screenshot of a similar bug after the async webfonts have been swapped in; see the italic text "nightly-community", as well as the clipped word "final" on the 4th line.
mozregression points at:

14:24.43 INFO: Last good revision: e6e712904806da25a9c8f48ea4533abe7c6ea8f4
14:24.43 INFO: First bad revision: d6bf703c5deaf1e328babd03d5e68ff2a4ffe10e
14:24.43 INFO: Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=e6e712904806da25a9c8f48ea4533abe7c6ea8f4&tochange=d6bf703c5deaf1e328babd03d5e68ff2a4ffe10e

In that pushlog, bug 1308876 sounds like a possible candidate. :dbaron, any thoughts?
Blocks: 1308876
Flags: needinfo?(dbaron)
Keywords: regression
I'm inclined to say that this uncovered an existing bug.

In particular, if you look at the way style change hints for reflows are handled, we traverse all continuations and IB-split siblings:
https://searchfox.org/mozilla-central/rev/5ff2d7683078c96e4b11b8a13674daded935aa44/layout/base/RestyleManager.cpp#1241-1245

However, for font loads, we only post this reflow to the first continuation:
https://searchfox.org/mozilla-central/rev/5ff2d7683078c96e4b11b8a13674daded935aa44/layout/style/nsFontFaceUtils.cpp#109

I think the latter should have the same loop over nsLayoutUtils::GetNextContinuationOrIBSplitSibling() that the former does.





While I'm there, I also think this optimization:
          if (NS_SUBTREE_DIRTY(f)) {
            // This is a displayed frame, so if it is already dirty, we
            // will be reflowed soon anyway.  No need to call
            // FrameNeedsReflow again, then.
            return;
          }
is not strictly correct, since nothing guarantees that the thing that marked it dirty used nsIPresShell::eStyleChange when it called FrameNeedsReflow and thus marked intrinsic widths as dirty.  Though perhaps intrinsic widths matter less for SVG.
Flags: needinfo?(dbaron)
On the other hand, given that the parent function is iterating over subtrees, that would become quite expensive if the loop were there -- the loop over IB-split siblings probably belongs in the handling of the subtree roots in nsFontFaceUtils::MarkDirtyForFontChange -- either when they're put on the stack or taken off the stack (I suspect put on is easier).
Given this regression goes back to 56, probs a wontfix for 60.
Priority: -- → P2
Duplicate of this bug: 1462239
Cameron suggested I should have a look at it, see if I can help.
Assignee: nobody → gsquelart
Update and rubber-ducking:

Good news everyone: I can reproduce it on my Mac.

Bad news: Using devtools makes the problem go away: The '?' becomes fully visible again at the same time the devtool inspector displays its content.

As suggested in comment 6, I added a loop at the end of ScheduleReflow, but that didn't help -- The only continuation that was followed there was in the previous paragraph (in the previous column).
Same with the suggestion in comment 7.

Looking at the frame tree, the line at issue is actually the first one in the paragraph (and column), so it is not itself a continuation, and therefore the issue is probably not about continuations.
And thinking about it, the nsFontFaceUtils::MarkDirtyForFontChange algorithm goes through entire subtrees, so it should go through all nodes, continuations or not -- unless the issue was with a continuation of the subtree root itself.
In our case here, the subtree under consideration is the top viewport, so we should go through every frame anyway.
(And therefore the suggestions in comment 6 and comment 7 above wouldn't have helped anyway. Or am I misunderstanding something?)

Then, trying to reproduce on Linux to hopefully debug faster with rr, I found that the behavior is different:
When first loading (or shift-reloading), "behavior?" is moved to the second line; supposedly to correctly prevent clipping the '?'?
Then when just refreshing, sometimes(!) "behavior?" appears at the end of the first line, but without any clipping.
I'm guessing the different default font and/or font display engine give us different text dimensions that don't cause visible issues on Linux.
But the fact that we can get two different reflow end results should probably still count as a bug, right? However maybe it's really a different cause.

Windows behaves the same way as Mac: clipped '?' on full (re)load.

I'll keep investigating. Suggestions welcome.
David, when you have a few minutes, please have a look at my comment 11 above, see if you have further ideas&suggestions. Thanks!
Flags: needinfo?(dbaron)
(In reply to Gerald Squelart [:gerald] from comment #11)
> In our case here, the subtree under consideration is the top viewport, so we
> should go through every frame anyway.

Yeah, given that MarkDirtyForFontChange is called only on the root frame, it's actually fine (except for the last paragraph of comment 6, maybe, but that's not relevant here).

So this probably fits with many of the other regressions of bug 1308876, and requires debugging to understand what should be reflowed in a stronger way (e.g., with NS_FRAME_IS_DIRTY) but wasn't.

One debugging technique that might be useful is:
 * use the layout debugger
 * load the page in a way that shows the bug, and dump the frame tree
 * zoom in and zoom out once, which I think should make the bug go away
 * dump the frame tree again
 * figure out what's *different* about the frame trees (e.g., frame bounds, overflow areas, etc.)

(This might be easier with a somewhat simplified testcase.)

Then from there you might be able to debug when things should have been correct -- although part of what's difficult about the regressions from bug 1308876 is that I don't have a good model of what the rules are supposed to be.
Flags: needinfo?(dbaron)
Duplicate of this bug: 1471484
So a new theory after discussing and looking at this with Gerald a little bit more:

We have two column frames, call them C1 and C2.  We reflow C1, and it propagates NS_FRAME_IS_DIRTY to all its children at the start of its reflow.  Then, towards the end of its reflow, it pulls the first child from C2 into C1, reflows it, and determines it doesn't fit.  Then we reflow C2:  it marks all its children dirty, and then pulls the overflow list from C1 and reflows the item in it... which has not been marked dirty anywhere in this process.

My theory is that whenever we pull a frame from a later frame into an earlier frame, we should mark it dirty if the parent is marked dirty.  I don't think we need to do this in the other direction on the theory that (1) continuations are basically all dirty or not-dirty in sync with each other and (2) we reflow the earlier continuations before the later continuations.
Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=de97c0b34421c2338ccdb6e293614c1ace2f7ed1


David, I'm pretty sure the first patch (lines only) is good to go, I'm less sure about the second one (floats).

Also, I'm not sure we cover 100% of "pulling from later frames" cases. Unless you think of particular things we should fix right now, how about we push these first, and if needed we can open a follow-up to catch most similar issues?
Comment on attachment 8990868 [details]
Bug 1459937 - Mark pulled floats (from pulled lines) dirty -

https://reviewboard.mozilla.org/r/255872/#review262724

It might be worth simplifying this to make the callers of ReparentFloats just pass a boolean saying which *direction* they're moving the floats, and then have ReparentFloats pass through aMovingBackwards && (GetStateBits() & NS_FRAME_IS_DIRTY).  Then there need to be fewer NS_FRAME_IS_DIRTY checks scattered around.

Also, both of the ruby callers are pulling frames from a later continuation to an earlier one, so they need to pass true in some cases as well.

Also see my comment on the previous patch for one change that should be in this one instead.

I think this will be good with those changes, but I'd like to look at the revisions.

::: layout/generic/nsContainerFrame.cpp:1772
(Diff revision 1)
>  
>    nsBlockFrame* ourBlock = nsLayoutUtils::GetAsBlock(aOurLineContainer);
>    NS_ASSERTION(ourBlock, "Not a block, but broke vertically?");
>  
>    while (true) {
> -    ourBlock->ReparentFloats(aFrame, frameBlock, false);
> +    ourBlock->ReparentFloats(aFrame, frameBlock, false, false);

So this isn't correct:  one of the callers of this function (nsInlineFrame::PullOneFrame) is moving a frame from a later continuation to an earlier one, whereas the other (nsContainerFrame::MoveInlineOverflowToChildList) is doing the opposite.  So the nsInlineFrame caller should pass a boolean differently from the nsContainerFrame one.  (And the recursive call should presumably pass the boolean through.)

I guess the boolean passed in could either be the full computation, or just the forward/backward boolean that's &&ed here with the NS_FRAME_IS_DIRTY check.
Attachment #8990868 - Flags: review?(dbaron) → review-
Comment on attachment 8990867 [details]
Bug 1459937 - Mark pulled lines (from n-i-f or overflow) dirty -

https://reviewboard.mozilla.org/r/255870/#review262720

r=dbaron with the following changes, although I think there's quite a bit more code auditing to do through all of the frame classes.

::: layout/generic/nsBlockFrame.cpp:608
(Diff revision 1)
> +// In cases where we reparent a frame from a not-yet-reflowed subtree (i.e.,
> +// aOldParent is "after" aNewParent), it should be marked as dirty if we
> +// (aNewParent) are dirty too, because ReflowInput::Init wouldn't have marked
> +// that frame dirty as it was not in the known-children list yet.

I think this comment could be a little be clearer about what aMarkDirty should mean.  Perhaps it could say:

Because a frame with NS_FRAME_IS_DIRTY marks all of its children dirty at the start of its reflow, when we move a frame *from* a later frame *to* an earlier frame, and the earlier frame has NS_FRAME_IS_DIRTY (though that should correspond with the later frame having NS_FRAME_IS_DIRTY), we need to add NS_FRAME_IS_DIRTY.  Therefore, aMarkDirty should be true when (a) aOldParent is after aNewParent and (b) aNewParent has NS_FRAME_IS_DIRTY set.

::: layout/generic/nsBlockFrame.cpp:2934
(Diff revision 1)
>                 "should only pull from first line");
>      aFromContainer->mFrames.RemoveFrame(frame);
>  
>      // When pushing and pulling frames we need to check for whether any
>      // views need to be reparented.
> -    ReparentFrame(frame, aFromContainer, this);
> +    ReparentFrame(frame, aFromContainer, this, false);

This should pass GetStateBits & NS_FRAME_IS_DIRTY, since it's pulling a frame from the following block.

::: layout/generic/nsBlockFrame.cpp:2939
(Diff revision 1)
> -    ReparentFloats(frame, aFromContainer, false);
> +    ReparentFloats(frame, aFromContainer, false,
> +                   GetStateBits() & NS_FRAME_IS_DIRTY);

This change belongs in the second patch
Attachment #8990867 - Flags: review?(dbaron) → review+
Gerald had the possibly-better suggestion that we do the marking of all of the subtree as dirty earlier so we don't need complicated logic when we pull frames.  If this doesn't cause any performance regressions (due to poor memory locality) then I think it's a good idea, since it will be simpler than what we were talking about above.

We can do it in PresShell::FrameNeedsReflow, and essentially maintain the invariant that if a frame has NS_FRAME_IS_DIRTY, then all of its descendants do too.  This means that when FrameNeedsReflow marks all the frames in the subtree as dirty, we can skip any frame (and its descendants) if it already has NS_FRAME_IS_DIRTY.
See Also: → 1474771
My first attempt at bug 1474771 (which would make this one unnecessary) failed miserably, so I think we may still want to push this fix here in the meantime.

Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=277c532298ac1b85b8c9df5a9f7bf362fe6856cf
Comment on attachment 8990868 [details]
Bug 1459937 - Mark pulled floats (from pulled lines) dirty -

https://reviewboard.mozilla.org/r/255872/#review263276

::: layout/generic/nsRubyBaseContainerFrame.cpp:788
(Diff revision 2)
>        nsLayoutUtils::GetAsBlock(aLineLayout->LineContainerFrame());
>      MOZ_ASSERT(newFloatCB, "Must have a float containing block");
>      if (oldFloatCB != newFloatCB) {
>        for (nsIFrame* frame : aColumn) {
> -        newFloatCB->ReparentFloats(frame, oldFloatCB, false);
> +        newFloatCB->ReparentFloats(frame, oldFloatCB, false,
> +                                   ReparentingDirection::Forwards);

this should be Backwards (per comment 19)

::: layout/generic/nsRubyFrame.cpp:425
(Diff revision 2)
>  
>    if (nsBlockFrame* newFloatCB =
>        nsLayoutUtils::GetAsBlock(aLineLayout->LineContainerFrame())) {
>      if (oldFloatCB && oldFloatCB != newFloatCB) {
> -      newFloatCB->ReparentFloats(baseFrame, oldFloatCB, true);
> +      newFloatCB->ReparentFloats(baseFrame, oldFloatCB, true,
> +                                 ReparentingDirection::Forwards);

this should be Backwards (per comment 19)
Attachment #8990868 - Flags: review?(dbaron) → review+
This is embarrassing, I thought I had carefully gone through each of these. Thank you for the thorough reviews.
Comment on attachment 8990867 [details]
Bug 1459937 - Mark pulled lines (from n-i-f or overflow) dirty -

https://reviewboard.mozilla.org/r/255870/#review263280

Revisions still look good, with a few comments:

::: layout/generic/nsBlockFrame.cpp:626
(Diff revision 2)
>    // views need to be reparented
>    nsContainerFrame::ReparentFrameView(aFrame, aOldParent, aNewParent);
>  }
>  
> +static bool
> +MarkReparentedFramesDirty(nsIFrame* aNewParent, ReparentingDirection aDirection)

This should be ShouldMarkReparentedFramesDirty rather than MarkReparentedFramesDirty, so that it's clear that it doesn't do the marking.

::: layout/generic/nsBlockFrame.cpp:626
(Diff revision 2)
>    // views need to be reparented
>    nsContainerFrame::ReparentFrameView(aFrame, aOldParent, aNewParent);
>  }
>  
> +static bool
> +MarkReparentedFramesDirty(nsIFrame* aNewParent, ReparentingDirection aDirection)

This should be ShouldMarkReparentedFramesDirty rather than MarkReparentedFramesDirty, so that it's clear that it doesn't do the marking.

::: layout/generic/nsBlockFrame.cpp:634
(Diff revision 2)
> +// an earlier frame, and the earlier frame has NS_FRAME_IS_DIRTY, we need to
> +// add NS_FRAME_IS_DIRTY to the reparented frame (though that should have
> +// corresponded with the later frame having NS_FRAME_IS_DIRTY).

put the parenthetical before the comma since it modifies "the earlier frame has NS_FRAME_IS_DIRTY", and change "should have corresponded" to "should correspond".
Comment on attachment 8991225 [details]
Bug 1459937 - In DEBUG mode, verify that reparenting direction is correct -

https://reviewboard.mozilla.org/r/256186/#review263282

::: layout/generic/nsBlockFrame.cpp:635
(Diff revision 1)
> +                         nsIFrame* aNewParent,
> +                         ReparentingDirection aDirection)
>  {
> +#ifdef DEBUG
> +  NS_ASSERTION(aOldParent->FirstInFlow() == aNewParent->FirstInFlow(),
> +               "Reparenting should provide in-flow parents");

I'd describe this as "Reparenting should be between continuations of the same frame"
Attachment #8991225 - Flags: review?(dbaron) → review+
Bug 1474771 may take some time, so I'll land this now, to get rid of some publicly-visible layout issues.
Pushed by gsquelart@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/bd4bd8ac335c
Mark pulled lines (from n-i-f or overflow) dirty - r=dbaron
https://hg.mozilla.org/integration/autoland/rev/fb3fba19e615
Mark pulled floats (from pulled lines) dirty - r=dbaron
https://hg.mozilla.org/integration/autoland/rev/2cff5c67d000
In DEBUG mode, verify that reparenting direction is correct - r=dbaron
Backed otu for failing crashtests with Assertion failure: (IndexInFlow(aOldParent) < IndexInFlow(aNewParent)) == (aDirection == ReparentingDirection::Forwards)

Push that caused the failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=2cff5c67d000031ae1cc30f0a89ceb8503c3ff27

Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=188207065&repo=autoland&lineNumber=57764

Backout: https://hg.mozilla.org/integration/autoland/rev/315103f6db6b1f44647d82c0f494c2d45fc42e07
Flags: needinfo?(gsquelart)
Thank you Andreea.

I've got an update but mozreview throws a 500 :-(
I'll try again tomorrow...
Flags: needinfo?(gsquelart)
:gerald if you still encounter this tomorrow, please write in #sheriffs or #ci and someone will look at it. 

Thank you.
Flags: needinfo?(gsquelart)
Depends on: 1476484
Opened bug 1476484 about the mozreview issue.
Will work on other things in the meantime...
Flags: needinfo?(gsquelart)
Could you just attach them as patches?
Flags: needinfo?(gsquelart)
I'm giving IT a chance to look at the issue, in case it could affect others. And I'm not optimally setup to attach patches and push to inbound, so I'd prefer to avoid it!
But if still not fixed in the next ~24h, I'll attach them...


David, if you want to have an early look at the updated patches, they're attached to this Try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7bc98cd79431c2f73745202db179495bb12b55eb

The difference is the introduction of `ReparentingDirection::Variable`, when the direction can be either backwards or forwards and not easily inferred from the code. In which case I pessimistically set NS_FRAME_IS_DIRTY, and of course there is no incorrect direction to MOZ_ASSERT about.
As you can see from the backout in comment 35, it seems there are situation where nsBlockFrame::ReflowBlockFrame could pull an earlier frame! Only 1 call seems to be affected:
https://searchfox.org/mozilla-central/rev/c296d5b2391c8b37374b118180b64cca66c0aa16/layout/generic/nsBlockFrame.cpp#3677

I didn't investigate further, and I didn't try to optimize, because:
- Correctness trumps performance (right?), especially for a page as visible as our "whatsnew",
- The performance impact would be an extra sub-reflow in these (rare?) cases when pulling frames forward in ReflowBlockFrame,
- If we find more similar issues in the field, it will be easy to use ReparentingDirection::Variable as quick fix,
- Hopefully bug 1474771 will make this bug here obsolete anyway, so I should focus on it now.
Flags: needinfo?(gsquelart)
David, please see comment 40 for details&reasons for these latest changes, let me know if that's ok to push.
Flags: needinfo?(dbaron)
Yep, the revisions all look good.
Flags: needinfo?(dbaron)
Pushed by gsquelart@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1f0f8ed5b376
Mark pulled lines (from n-i-f or overflow) dirty - r=dbaron
https://hg.mozilla.org/integration/autoland/rev/4894ad1626a2
Mark pulled floats (from pulled lines) dirty - r=dbaron
https://hg.mozilla.org/integration/autoland/rev/d29652fbe483
In DEBUG mode, verify that reparenting direction is correct - r=dbaron
Blocks: 1470311
I have managed to reproduce the issue using Fx 62.0a1 buildID: 20180508231737.

The issues is verified fixed using Fx63.0b10 buildID: 20180927143327, on windows 10 x64, Ubuntu 16.04 and macOS 10.13 (macbook). The 'unexpected behavior?' is now displayed on 2 rows (no risk of truncating the '?' symbol) without any issues.
Status: RESOLVED → VERIFIED
Flags: qe-verify+
No longer blocks: 1308876
Regressed by: 1308876
You need to log in before you can comment on or make changes to this bug.