Closed Bug 1288938 Opened 3 years ago Closed 3 years ago

Browser freeze while loading rustdoc source pages

Categories

(Core :: Layout: Text and Fonts, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla51
Tracking Status
firefox51 --- fixed

People

(Reporter: emilio, Assigned: emilio)

References

(Blocks 1 open bug, )

Details

Attachments

(3 files, 1 obsolete file)

When opening a [src] link of a rustdoc page in firefox, it takes a huge amount of time to load, freezing the browser. It eventually finishes, and the page responds normally after that.

See, for example, the url attached to this bug: http://doc.servo.org/src/script/script_thread.rs.html

I've done a bit of investigation, and it seems it's a giant reflow while the document is loading, and we spend an *enormous* amount of time in CreateObserverForAnimatedGlyphs (in layout/generic/nsTextRun.cpp).

In gdb I'm seeing scary things like this:

(gdb) p userData->mMappedFlowCount
$17 = 2303209708

for a single text run in nsTextFrame.cpp:798.

That being said, I don't know anything about text layout in Gecko, so my diagnostic might be wrong.
Also, the problem goes away if I remove the font used on the page (Source Code Pro) from my system.
Here is a zip file with the above page exported as HTML, with the "Source Code Pro" added as a webfont. I reproduced the freeze by opening the HTML file in Firefox 47.0.1 and Firefox Nightly (2016-07-23). In Firefox 47, the page immediate freezes. In Nightly, the page displays fine at first, but after reloading the page it freezes as well.
Looks like in some case, CreateObserversForAnimatedGlyphs (called only by BreakSink::Finish) is called ~13k times, and then it can call CreateObserverForAnimatedGlyphs ~109M times. So there are some kind of O(N^2) algorithm involves...

I don't see the unreasonably number of userData->mMappedFlowCount as described in comment 0, FWIW.
To make it clearer, when that happen, a single BuildTextRunsScanner::FlushLineBreaks() calls BreakSink::Finish() ~13k times (the same number for CreateObserversForAnimatedGlyphs()), and then CreateObserverForAnimatedGlyphs() is called ~109M times.

It looks like BreakSinks are created in BuildTextRunsScanner::SetupBreakSinksForTextRun(), which creates mMappedFlows.Length() number of BreakSink, and then BuildTextRunsScanner::FlushLineBreaks() calls BreakSink::Finish() for each BreakSink, and then BreakSink::Finish() calls CreateObserverForAnimatedGlyphs() for userData->mMappedFlowCount times (which is derived from mMappedFlows.Length() in BuildTextRunsScanner::SetupLineBreakerContext).

So it is basically an O(N^2) algorithm, where N = mMappedFlows.Length(). When the mapped flows grow very large, it becomes very slow.
My guess is probably we can try moving the call to CreateObserversForAnimatedGlyphs() out from BreakSink::Finish(). I need to learn more about mapped flow and text run to be confident what should be done here. I'll probably try tomorrow if emilio is not going to continue investigating this.

Also jfkthame may have some idea about what should happen here.
This makes the problem go away for that particular page (doesn't freeze the browser anymore).

Instead of sticking observers into every text frame of the run, we stick it just on the run, then if the glyphs change we dirty all its frames.

I need to rethink how to organize this in order to not re-export the world, but I wanted some feedback before.

Also, probably adding a word to every text run is not the best regarding memory consumption? Maybe it's not such a big deal but... Ideas welcome.
Attachment #8774092 - Flags: feedback?(xidorn+moz)
Attachment #8774092 - Flags: feedback?(jfkthame)
Please put the code styling stuff (removing tailing whitespaces) into a separate patch. They are annoying...
And it also seems to me referring layout stuff in gfx part is unfortunate.
Yes, I know, sorry for the whitespace stuff, I'll try to split them in other patches.

I agree with the layout stuff, I just wanted feedback on the approach. We probably want to move that array to the UserData() on the text run, and keep the GlyphObserver on layout, I guess.
I guess the styling patch can probably be landed without review, and with "DONTBUILD", if it is just removing tailing spaces.
Attachment #8774092 - Attachment is obsolete: true
Attachment #8774092 - Flags: feedback?(xidorn+moz)
Attachment #8774092 - Flags: feedback?(jfkthame)
While moving the observer to the textrun may well make sense here, I'll note that I can't seem to reproduce the problem as described -- the test page, whether loading from the dev.servo.org URL or the standalone testcase downloaded locally, seems to load & render in about a second for me, which I wouldn't describe as "a huge amount of time...freezes the browser". I tried both Firefox release and current Nightly.

So I'm curious to know what's causing this problem to appear for you but not for me.
I can reliably reproduce this issue with the testcase on aurora and nightly. Probably you somehow disabled SVG-in-OpenType?
No, SVG-in-OT is definitely working for me (including an animated example, http://people.mozilla.org/~jkew/opentype-svg/soccer.html). So it's not that; something else must be different.
Hmmm, yeah, it works pretty good on my OS X... I reproduced this on Windows. Not sure why...
Comment on attachment 8774107 [details]
Bug 1288938: layout: Move the GlyphObserver to the text run instead of the frame.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/66712/diff/1-2/
I'm on Linux, FWIW.

The second diff actually passes try (the first one didn't pass in debug mode because I don't know how to binary and I changed the value of TEXT_UNUSED_FLAG to be the same as TEXT_TRAILING_ARABICCHAR), so the debug checks fired when arabic chars where present.

Actual successful try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f2636df5f15f2e677ead60fa25d1f0b916908d52
(In reply to Xidorn Quan [:xidorn] (away until July 25) from comment #15)
> Hmmm, yeah, it works pretty good on my OS X... I reproduced this on Windows.
> Not sure why...

Although the first load is fast enough (in ~1s), if you reload the page, it would get stuck. Probably related to incremental layout somehow?
It seems if we can split the text frames earlier for force line break inside, this issue would be partially fixed as well, since the text run would be much shorter. That could probably be done in the frame constructor I guess.

And this bug itself, though, is still worth fixing, since lines can be very long as well.
(In reply to Xidorn Quan [:xidorn] (away until July 25) from comment #19)
> It seems if we can split the text frames earlier for force line break
> inside, this issue would be partially fixed as well, since the text run
> would be much shorter. That could probably be done in the frame constructor
> I guess.

Filed bug 1288983.
Comment on attachment 8774107 [details]
Bug 1288938: layout: Move the GlyphObserver to the text run instead of the frame.

https://reviewboard.mozilla.org/r/66712/#review63432

This looks fine to me, but I'm not confident enough to r+ this patch. I'd hand over the request to jfkthame and see if he has any concern else.

::: layout/generic/nsTextFrame.cpp:260
(Diff revision 2)
>   * This is our user data for the textrun, when textRun->GetFlags() does not
>   * have TEXT_IS_SIMPLE_FLOW set. When TEXT_IS_SIMPLE_FLOW is set, there is
>   * just one flow, the textrun's user data pointer is a pointer to mStartFrame
>   * for that flow, mDOMOffsetToBeforeTransformOffset is zero, and mContentLength
>   * is the length of the text node.

This comment would need an update as well for the fact that when TEXT_IS_SIMPLE_FLOW, the user data could be either the frame or a SimpleTextRunUserData.

::: layout/generic/nsTextFrame.cpp:292
(Diff revision 2)
> + * This allows having an array of observers if there are fonts whose glyphs
> + * might change, but also avoid allocation in the simple case that there aren't.
> + */
> +struct SimpleTextRunUserData {
> +  nsTArray<UniquePtr<GlyphObserver>> mGlyphObserverArray;
> +  nsIFrame* mFrame;

It seems to me mFrame can only be nsTextFrame. Based on that, you can remove some of static_cast below.

::: layout/generic/nsTextFrame.cpp:1466
(Diff revision 2)
>    if (aTextRun->GetFlags() & nsTextFrameUtils::TEXT_IS_SIMPLE_FLOW)
>      return mMappedFlows.Length() == 1 &&
> -      mMappedFlows[0].mStartFrame == static_cast<nsTextFrame*>(aTextRun->GetUserData()) &&
> +      mMappedFlows[0].mStartFrame == static_cast<nsTextFrame*>(GetFrameForSimpleFlow(aTextRun)) &&
>        mMappedFlows[0].mEndFrame == nullptr;

As far as you're here, could you wrap this block with { }?
Attachment #8774107 - Flags: review?(xidorn+moz)
Attachment #8774107 - Flags: review?(jfkthame)
(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #18)
> (In reply to Xidorn Quan [:xidorn] (away until July 25) from comment #15)
> > Hmmm, yeah, it works pretty good on my OS X... I reproduced this on Windows.
> > Not sure why...
> 
> Although the first load is fast enough (in ~1s), if you reload the page, it
> would get stuck. Probably related to incremental layout somehow?

I've tried this on OS X, Windows and Linux, and still haven't been able to reproduce a significant "freeze" on any of them, either on first load or when reloading. Which seems very weird; I don't know why it's happening to you but not to me.

The patch here seems like it's a good thing to do, anyhow, but I'd really like to understand what's causing the problem to show up on some systems but not others.
Comment on attachment 8774107 [details]
Bug 1288938: layout: Move the GlyphObserver to the text run instead of the frame.

https://reviewboard.mozilla.org/r/66712/#review63986

This basically looks good, but I think we should consider a bit more adjustment - see comments below - unless the suggestion there makes it unmanageably complex. Clearing r? flag for now, until we have an updated patch to look at.

::: layout/generic/nsTextFrame.cpp:267
(Diff revision 2)
>   * just one flow, the textrun's user data pointer is a pointer to mStartFrame
>   * for that flow, mDOMOffsetToBeforeTransformOffset is zero, and mContentLength
>   * is the length of the text node.
>   */
> -struct TextRunUserData {
> +struct ComplexTextRunUserData {
> +  nsTArray<UniquePtr<GlyphObserver>> mGlyphObserverArray;

No need for the `Array` suffix on the field name here; just call it mGlyphObservers.

Also, although it adds a bit more complexity, I think we should probably have two forms of "complex" user data, one without the observer array and one with it, and only swap to the most-complex version when actually needed. I suspect we allocate quite a lot of these (though I haven't explicitly measured this), and in the vast majority of cases we won't need the observer array as there aren't any SVG-in-OT fonts involved. In particular, we frequently use a stack-allocated "dummy" user data record, and it'd be a pity to add the redundant construction of an nsTArray to all of those.

So we'll have four kinds of 'user data' altogether (though only three have their own struct definition).

::: layout/generic/nsTextFrame.cpp:277
(Diff revision 2)
>  
> +static void
> +DestroyComplexUserData(void* aUserData)
> +{
> +  ComplexTextRunUserData* userData = static_cast<ComplexTextRunUserData*>(aUserData);
> +  if (userData) {

AFAICS the Destroy function is always called with a properly-typed pointer, so there's no reason to give it a `void*` parameter that you then have to cast. Just have it take a `ComplexTextRunUserData*` directly, and return immediately if that's null.

::: layout/generic/nsTextFrame.cpp:291
(Diff revision 2)
> + *
> + * This allows having an array of observers if there are fonts whose glyphs
> + * might change, but also avoid allocation in the simple case that there aren't.
> + */
> +struct SimpleTextRunUserData {
> +  nsTArray<UniquePtr<GlyphObserver>> mGlyphObserverArray;

As above, no need for `Array` on the name.

::: layout/generic/nsTextFrame.cpp:441
(Diff revision 2)
>  
>    nscolor GetResolvedForeColor(nscolor aColor, nscolor aDefaultForeColor,
>                                 nscolor aBackColor);
>  };
>  
> -static void
> +static nsIFrame*

Along with Xidorn's comment above, this can also return `nsTextFrame*` rather than `nsIFrame*`.

::: layout/generic/nsTextFrame.cpp:835
(Diff revision 2)
>      return;
>    }
>  
> +  nsTArray<UniquePtr<GlyphObserver>>* observers;
> +
> +  // Swap the frame pointer for a just-allocated SimpleTextRunUserData.

This comment belongs inside the `if` where the swap happens; or else make it more general, to describe what's happening in each of the branches below.

::: layout/generic/nsTextFrame.cpp:1049
(Diff revision 2)
>    uint8_t                       mNextRunContextInfo;
>    uint8_t                       mCurrentRunContextInfo;
>  };
>  
> +static MOZ_ALWAYS_INLINE ComplexTextRunUserData*
> +CreateComplexUserData(AutoTArray<BuildTextRunsScanner::MappedFlow, 10>& mappedFlows)

I think it'll be more readable if you just pass an integer parameter `aFlowCount` here.

Also, now that you've got a Create... helper for this struct, it'd be nice to move the initialization of the `mMappedFlows` pointer in here, instead of the caller having to set that up afterwards.

::: layout/generic/nsTextFrame.cpp:2640
(Diff revision 2)
>           f = static_cast<nsTextFrame*>(f->GetNextContinuation())) {
>  #ifdef DEBUG_roc
>        if (f->GetTextRun(mWhichTextRun)) {
>          gfxTextRun* textRun = f->GetTextRun(mWhichTextRun);
>          if (textRun->GetFlags() & nsTextFrameUtils::TEXT_IS_SIMPLE_FLOW) {
> -          if (mMappedFlows[0].mStartFrame != static_cast<nsTextFrame*>(textRun->GetUserData())) {
> +          if (mMappedFlows[0].mStartFrame != static_cast<nsTextFrame*>(GetFrameForSimpleFlow(textRun)) {

This line is way over 80 chars, could you please add a line-break as you're touching it anyway?
Attachment #8774107 - Flags: review?(jfkthame)
Comment on attachment 8774107 [details]
Bug 1288938: layout: Move the GlyphObserver to the text run instead of the frame.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/66712/diff/2-3/
Attachment #8774107 - Flags: review?(jfkthame)
Attachment #8774107 - Flags: review?(jfkthame)
Comment on attachment 8774107 [details]
Bug 1288938: layout: Move the GlyphObserver to the text run instead of the frame.

https://reviewboard.mozilla.org/r/66712/#review64422

This looks good, modulo a bit of rearrangement for clarity; and I think you can simplify the code a bit by making ComplexTextRunUserData a subclass of TextRunUserData.

::: layout/generic/nsTextFrame.cpp:228
(Diff revision 3)
> + *   - A TextRunUserData in the case a text run maps multiple flows, but it
> + *     doesn't have any glyph observer for changes in SVG fonts.
> + *
> + *   - A nsIFrame* in the case a text run maps to only one flow. In this case,
> + *     the textrun's user data pointer is a pointer to mStartFrame for that
> + *     flow, mDOMOffsetToBeforeTransformOffset is zero, and mContentLength is
> + *     the length of the text node.
> + *
> + *   - A ComplexTextRunUserData in the case a text run maps to multiple flows,
> + *     but we need to keep a list of glyph observers.
> + *
> + *   - A SimpleTextRunUserData in the case a text run maps to one flow, but we
> + *     still have to keep a list of glyph observers.

Please re-arrange these cases so that they're in a more logical progression (at least to my mind):

- A nsIFrame...
- A SimpleTextRunUserData...
- A TextRunUserData...
- A ComplexTextRunUserData...

::: layout/generic/nsTextFrame.cpp:308
(Diff revision 3)
> +  }
> +};
> +
> +/**
> + * This is our user data for the textrun, when textRun->GetFlags() does not
> + * have TEXT_IS_SIMPLE_FLOW set nor the TEXT_MIGHT HAVE_GLYPH_CHANGES flag.

This should be "...does not have TEXT_IS_SIMPLE_FLOW set, and does have TEXT_MIGHT_HAVE_GLYPH_CHANGES".

::: layout/generic/nsTextFrame.cpp:310
(Diff revision 3)
> +struct ComplexTextRunUserData {
> +  nsTArray<UniquePtr<GlyphObserver>> mGlyphObservers;
> +  TextRunMappedFlow* mMappedFlows;
> +  uint32_t           mMappedFlowCount;
> +  uint32_t           mLastFlowIndex;
> +};

I think it would help if you make this a subclass of TextRunUserData:

    struct ComplexTextRunUserData : public TextRunUserData {
      nsTArray<UniquePtr<GlyphObserver>> mGlyphObservers;
    };

Then you won't need the Set & GetComplexFlowInfo helpers; you can get the userdata as a `TextRunUserData*` and directly access its fields regardless of whether it is the complex subclass that has the observers array or not. You'll only need to explicitly distinguish the "complex" from the "normal" struct when creating/deleting it, and when needing to access the observers array.

It'd also be helpful, then, to reorder these structs so that ComplexTextRunUserData immediately follows TextRunUserData, as they'll be closely related. So moving SimpleTextRunUserData up above the declaration of TextRunMappedFlow (which it doesn't use) would better match the suggested ordering in the descriptive comment earlier.
Comment on attachment 8774107 [details]
Bug 1288938: layout: Move the GlyphObserver to the text run instead of the frame.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/66712/diff/3-4/
Attachment #8774107 - Flags: review?(jfkthame)
Comment on attachment 8774107 [details]
Bug 1288938: layout: Move the GlyphObserver to the text run instead of the frame.

https://reviewboard.mozilla.org/r/66712/#review64532

LGTM, thanks!
Attachment #8774107 - Flags: review?(jfkthame) → review+
Comment on attachment 8774107 [details]
Bug 1288938: layout: Move the GlyphObserver to the text run instead of the frame.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/66712/diff/4-5/
Pushed by ecoal95@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7d6baa38a04c
layout: Move the GlyphObserver to the text run instead of the frame. r=jfkthame
Assignee: nobody → ealvarez
Thanks to you Jonathan for the fast review!

Also your comments were on point and simplified a lot :-)
Actually if we do allocate TextRunUserData a lot, we should probably try removing mMappedFlows from it, as it is almost always just points to the memory immediately after the struct.

It also reminds me that we should probably
> static_assert(alignof(TextRunUserData) + sizeof(TextRunUserData) % alignof(TextRunMappedFlow) == 0);
and
> static_assert(alignof(ComplexTextRunUserData) + sizeof(ComplexTextRunUserData) % alignof(TextRunMappedFlow) == 0);
Hmm.. I definitely saw something like that on try, but couldn't reproduce locally and saw that there was an intermittent filled for a crash with the same signature, so assumed it was a known itermittent.

Will dig into it, thanks Wes.
Flags: needinfo?(ealvarez)
So today I took a look into this again, and I'm definitely unable to reproduce the problem. The assertion failure is:

> Assertion failure: IsIdle(oldState), at xpcom/glue/PLDHashTable.h:132

In the GlyphObserver destructor, when it's removed from the font observer set.

This assertion is, AIUI, made to check concurrent operations in our hash tables. But as far as I know, only the main thread touches ever the observer set of a gfxFont.

What I tried to (unsuccessfully) reproduce the crash is:

> ./mach reftest --run-until-failure --repeat 500 --disable-e10s --debugger=rr --debugger-args="record --chaos" layout/reftests/text-svgglyphs/svg-glyph-paint-server.svg

Is there something else I can do in order to catch this? I don't find a logic reason for that assertion to hit (unless there's a bug on our hash table checker, that I'd consider pretty unlikely, or we actually access the font from multiple threads, in which case the hash table should be locked I guess, but I don't see an observer that could potentially be destroyed off-main thread).

Jonathan seems on PTO, Xidorn, do you know any other way I could try to reproduce this? I'm going to try an ASAN build or something, but I'm running out of ideas...
Flags: needinfo?(xidorn+moz)
Looking at the stack, I believe it isn't because of multithread access, but because the destructor gfxFont::GlyphChangeObserver::~GlyphChangeObserver() which is accessing the hash table is called inside gfxFont::NotifyGlyphsChanged() which opened the hash table for iterating.

I don't have a good idea currently about how to reproduce it or fix it. Need more time to think.

I guess this could be a crash when cleaning up resources from some previous reftests. So you may want to try run all the reftests in that directory to see whether you can reproduce it.
Flags: needinfo?(xidorn+moz)
Ohh, you're right, reentrancy is the problem, I see (wow, I completely missed that, thanks!). I'll try to see what situation can lead to that and build a test case.
A quick (and probably dirty) way to fix this issue would be that, in gfxFont::NotifyGlyphsChanged, iterating all the observers and put them in an array first, and then call them from the array rather than the hash table.

But given GlyphChangeObserver is not ref-counted, this may lead to calling dangling pointer, so you may need to check whether the observer still exists inside the hash table before calling them. Not sure whether it would hurt performance somehow...
Yeah, that's the fix I was thinking about, but doesn't seem clean... I'll try to see if I can avoid it.
Otherwise, when a glyph changes, we might end up doing too much work, destroying
the text-run and the observer that dirtied the frame, causing an assertion when
trying to delete it from the observer set.

Review commit: https://reviewboard.mozilla.org/r/68190/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/68190/
Attachment #8776429 - Flags: review?(xidorn+moz)
Comment on attachment 8774107 [details]
Bug 1288938: layout: Move the GlyphObserver to the text run instead of the frame.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/66712/diff/5-6/
Attachment #8776429 - Flags: review?(xidorn+moz) → review?(cam)
I think this change itself makes sense. But I doubt whether it fixes all reentrancy cases... Probably it does currently. We might eventually need to do it in a safer way, though.
Anyway, I'd redirect SVG-related review to heycam, although this patch looks good to me.
Attachment #8776429 - Flags: review?(cam) → review+
Comment on attachment 8776429 [details]
Bug 1288938: Allow passing different reasons to dirty a non display SVG text frame.

https://reviewboard.mozilla.org/r/68190/#review65894

::: layout/svg/SVGTextFrame.h:402
(Diff revision 1)
>     * The only case where we have to do this is in response to a style change on
>     * a non-display <text>. It is done in response to glyphs changes on
>     * non-display <text> (i.e., animated SVG-in-OpenType glyphs).

Can you mention in the comment here that FrameNeedsReflow will be called with NS_FRAME_IS_DIRTY, and that normally eStyleChange is passed for aReason, but that for glyph changes we only need to use eResize since that won't affect layout.

::: layout/svg/SVGTextFrame.cpp:3398
(Diff revision 1)
>    }
>  
>    MOZ_ASSERT(f, "should have found an ancestor frame to reflow");
>  
>    PresContext()->PresShell()->FrameNeedsReflow(
> -    f, nsIPresShell::eStyleChange, NS_FRAME_IS_DIRTY);
> +    f, aReason, NS_FRAME_IS_DIRTY);

This can all fit on one line now.
Pushed by ecoal95@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/8ea3f72ecd72
layout: Move the GlyphObserver to the text run instead of the frame. r=jfkthame
https://hg.mozilla.org/integration/mozilla-inbound/rev/f865bf340a8a
Allow passing different reasons to dirty a non display SVG text frame. r=heycam
https://hg.mozilla.org/mozilla-central/rev/8ea3f72ecd72
https://hg.mozilla.org/mozilla-central/rev/f865bf340a8a
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
Duplicate of this bug: 1332961
You need to log in before you can comment on or make changes to this bug.