Spell-checker goes into self-rescheduling loop and keeps main thread busy spellchecking minified JS for sites like Facebook, Google Docs/Google Sheets due to apparent mozInlineSpellWordUtil::BuildSoftText backtracking.

RESOLVED FIXED in Firefox 56

Status

()

Core
Spelling checker
RESOLVED FIXED
3 months ago
2 months ago

People

(Reporter: sinker, Assigned: evelyn)

Tracking

unspecified
mozilla56
Points:
---

Firefox Tracking Flags

(firefox56 fixed)

Details

MozReview Requests

()

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(5 attachments)

(Reporter)

Description

3 months ago
Once I load Facebook in a tab, the process of the tab is spinning, the fan of my laptop is spinning too.

I tried to attach gdb with the content process, breaking it randomly, getting similar backtrace almost everytime. (see frame #7)
Following is the backtrace of the content process:

#0  0x00007fba139c1efc in WordSplitState::ShouldSkipWord(int, int) ()
   from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so
#1  0x00007fba139c5720 in mozInlineSpellWordUtil::SplitDOMWord(int, int) ()
   from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so
#2  0x00007fba139c5801 in mozInlineSpellWordUtil::BuildRealWords() ()
   from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so
#3  0x00007fba139c584f in mozInlineSpellWordUtil::EnsureWords() ()
   from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so
#4  0x00007fba139c5e97 in mozInlineSpellWordUtil::SetPosition(nsINode*, int) ()
   from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so
#5  0x00007fba139c702a in mozInlineSpellChecker::DoSpellCheck(mozInlineSpellWordUtil&, mozilla::dom::Selection*, mozilla::UniquePtr<mozInlineSpellStatus, mozilla::DefaultDelete<mozInlineSpellStatus> > const&, bool*) ()
   from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so
#6  0x00007fba139c7bbb in mozInlineSpellChecker::ResumeCheck(mozilla::UniquePtr<mozInlineSpellStatus, mozilla::DefaultDelete<mozInlineSpellStatus> >&&) ()
   from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so
#7  0x00007fba139c7cd5 in mozInlineSpellResume::Run() ()
   from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so
#8  0x00007fba11c840f4 in nsThread::ProcessNextEvent(bool, bool*) ()
   from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so
#9  0x00007fba11c84c5f in NS_ProcessNextEvent(nsIThread*, bool) ()
   from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so
#10 0x00007fba11f63872 in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) () from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so
#11 0x00007fba11f47a7b in MessageLoop::Run() ()
   from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so
#12 0x00007fba1300217f in nsBaseAppShell::Run() ()
   from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so
#13 0x00007fba13b1bdb3 in XRE_RunAppShell() ()
   from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so
#14 0x00007fba11f47a7b in MessageLoop::Run() ()
   from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so
#15 0x00007fba13b1c191 in XRE_InitChildProcess(int, char**, XREChildData const*) () from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so
#16 0x000055a04f77dd99 in content_process_main(mozilla::Bootstrap*, int, char**) ()
#17 0x000055a04f77d77f in main ()
I just experienced something similar with Google Sheets on nightly, even in safe mode.  A content process's main thread was dominated by mozInlineSpellChecker::ResumeCheck stack frames.  I was not able to reproduce on Facebook, however.

Inspecting the mozInlineSpellChecker::ResumeCheck frames, the wordUtil's mSoftText seems to implicate some kind of debug handling, specifically, gdb's dump of mSoftText in every case looked exactly like this:
mSoftText = u" for \"gwt:onLoadErrorFn\"',Ab='TX_gwtAppBasePath',Bb='/spreadsheets/static/ritz',Cb='/',Db='docs.debug',Eb='&',Fb='docs.debug=true',Gb='docs.debug=1',Hb='ritz.debug',Ib='ritz.debug=true',Jb='ritz.debug"...,
Created attachment 8865521 [details]
contents of inline script tag as copied by devtools' "Copy Inner HTML" mechanism

Elaborating on my Google Sheets case, it appears the spell-checker has decided to attempt to spellcheck the contents of an inline script tag and it gets hung up on it, for some reason deciding to backtrack or reset itself a lot.  The script's textContent.length is 34584, but if I set a breakpoint inside the "see if we've done enough words in this round and run out of time" success branch of mozInlineSpellChecker::DoSpellCheck with a command of `p wordRange.mRawPtr->mStartOffset` I see the values seem to loop, never progressing beyond 6k: 6388, 5198, 6233, 6154, 6422, 6426, 6676, 1649, 3330, 6513, 6433, 6335, 4046, 3889, 6373, 3995, 6433, 4046, 6638, 5145, 6596, 6596, 6471, 6380, 6373, 6429, 4342, etc.
Okay, so, yeah, it looks like mozInlineSpellWordUtil::BuildSoftText explicitly back-tracks per the comment:
  // First we have to work backwards from mSoftStart to find a text node
  // containing a DOM word separator, a non-inline-element
  // boundary, or the hard start node. That's where we'll start building the
  // soft string from.

And when I set a breakpoint on the exit line (I actually set the breakpoint on the "break" at mozInlineSpellWordUtil.cpp:610 but the breakpoint fires as mozInlineSpellWordUtil.cpp:619), I find that it keeps rebuilding the exact same mSoftText:
$248 = u" for \"gwt:onLoadErrorFn\"',Ab='TX_gwtAppBasePath',Bb='/spreadsheets/static/ritz',Cb='/',Db='docs.debug',Eb='&',Fb='docs.debug=true',Gb='docs.debug=1',Hb='ritz.debug',Ib='ritz.debug=true',Jb='ritz.debug"...

It seems like there are 2 issues here then:
1) We shouldn't be spell-checking the contents of a script tag.
2) If there's a long enough region without whitespace, it seems like the spellchecker fails to make forward progress because of backtracking.

I suspect with no basis that the first bit is somehow a regression but the second bit has always existed, we just didn't try and parse minified JS before.
Summary: Facebook is busy on spell checking → Spell-checker goes into self-rescheduling loop and keeps main thread busy spellchecking minified JS for sites like Facebook, Google Docs/Google Sheets due to apparent mozInlineSpellWordUtil::BuildSoftText backtracking.

Comment 4

3 months ago
dupe of bug 1363176 ?
The infinite loop certainly is, not sure about the fact that I observed it spellchecking JS.  I'll comment on that bug since it has more eyes (read: :ehsan ;) on it.
(Assignee)

Comment 6

3 months ago
Comment 3 is exactly what we found in Facebook case in bug 1363176 comment 10. It shouldn't go into <script> tag IMO it doesn't make sense to check non-editable content. Given this part of code is old and doesn't have any test case, it's very likely other changes broke it already, and my patch on bug 1354641 exposes this problem.

I would like to keep this bug open and fix the wrong logic of mozInlineSpellWordUtil::BuildSoftText here, because it's a totally different problem from bug 1354641. And also thank Andrew's analysis we have more details here than on bug 1363176.
(Assignee)

Updated

3 months ago
See Also: → bug 1363176
(Assignee)

Updated

3 months ago
Assignee: nobody → ehung
(Assignee)

Comment 7

3 months ago
Created attachment 8868673 [details]
loop.html

Here is a simplified test case that will run into an infinite loop on the build with my patch. Note that the content of <div> matters, replacing them with repeated single character can't cause the infinite loop.

I'm sure that the logic of the spell checking traverses all text nodes in body element, no matter they are editable or not. It seems an intended behavior and not a regression of my patch. Regardless of this behavior, the possible regression caused by my patch is the issue (2) in comment 3. I'm digging into this.
(Assignee)

Comment 8

3 months ago
Okay... I found the problem. 

In the original design, it checks and skips text in non-editable elements before checking timeout [1]. That means these text won't be counted (in terms of time) although they take time to traverse. I moved the timeout check prior to the non-editable node check, because we concerned that the traversal might take too long and we run out of time (bug 1354641 comment 10). Which is exactly the case of this <script> tag.

Since bug 1354641 is a performance bug, and the goal is to not block main thread more than 1ms, I feel my place of the timeout check is right. Then the first problem is: can we skip non-editable content as early as possible?

@Ehsan, is there any reason we can't filter out non-editable node earlier?


The second piece of this puzzle is my mNextWordIndex in mozInlineSpellWordUtil will be reset to 0, or like comment 2 said, to loop. In my test case, removing all whitespaces in <div> will get rid of this issue, i.e. making forward progress, which is a bit different from Andrew's observation. I'm still debugging.

[1] http://searchfox.org/mozilla-central/source/extensions/spellcheck/src/mozInlineSpellChecker.cpp#1544-1545
Flags: needinfo?(ehsan)
(In reply to Evelyn Hung [:evelyn] from comment #8)
> Okay... I found the problem. 
> 
> In the original design, it checks and skips text in non-editable elements
> before checking timeout [1]. That means these text won't be counted (in
> terms of time) although they take time to traverse. I moved the timeout
> check prior to the non-editable node check, because we concerned that the
> traversal might take too long and we run out of time (bug 1354641 comment
> 10). Which is exactly the case of this <script> tag.
> 
> Since bug 1354641 is a performance bug, and the goal is to not block main
> thread more than 1ms, I feel my place of the timeout check is right. Then
> the first problem is: can we skip non-editable content as early as possible?
> 
> @Ehsan, is there any reason we can't filter out non-editable node earlier?

I don't think your reason for moving the timeout check is a good one here.
None of the checks happening in <http://searchfox.org/mozilla-central/rev/f55349994fdac101d121b11dac769f3f17fbec4b/extensions/spellcheck/src/mozInlineSpellChecker.cpp#1522-1557> are particularly expensive.  The problem is that when you reorder these checks, you're effectively implicitly relying on the checks to occur the next time around you get back to spell checking, but that's not necessarily the case.  For example, when you bypass the ShouldSpellCheckNode() check here, there's no other place where this check will happen later, which is the root cause behind the bug at hand.

I think a better fix for bug 1354641 would be to modify the condition in this block: <http://searchfox.org/mozilla-central/rev/f55349994fdac101d121b11dac769f3f17fbec4b/extensions/spellcheck/src/mozInlineSpellChecker.cpp#1577>

> The second piece of this puzzle is my mNextWordIndex in
> mozInlineSpellWordUtil will be reset to 0, or like comment 2 said, to loop.
> In my test case, removing all whitespaces in <div> will get rid of this
> issue, i.e. making forward progress, which is a bit different from Andrew's
> observation. I'm still debugging.

I strongly suspect that doing what I suggested above may be enough.  Can you try that and see what happens?
Flags: needinfo?(ehsan)
(Assignee)

Comment 10

3 months ago
(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #9)
> (In reply to Evelyn Hung [:evelyn] from comment #8)
> > Okay... I found the problem. 
> > 
> > In the original design, it checks and skips text in non-editable elements
> > before checking timeout [1]. That means these text won't be counted (in
> > terms of time) although they take time to traverse. I moved the timeout
> > check prior to the non-editable node check, because we concerned that the
> > traversal might take too long and we run out of time (bug 1354641 comment
> > 10). Which is exactly the case of this <script> tag.
> > 
> > Since bug 1354641 is a performance bug, and the goal is to not block main
> > thread more than 1ms, I feel my place of the timeout check is right. Then
> > the first problem is: can we skip non-editable content as early as possible?
> > 
> > @Ehsan, is there any reason we can't filter out non-editable node earlier?
> 
> I don't think your reason for moving the timeout check is a good one here.
> None of the checks happening in
> <http://searchfox.org/mozilla-central/rev/
> f55349994fdac101d121b11dac769f3f17fbec4b/extensions/spellcheck/src/
> mozInlineSpellChecker.cpp#1522-1557> are particularly expensive.  

I probably didn't make my point clear enough, let me rephrase it. These checks 
become expensive when there are non-editable elements with a lot of text in there
Then we will run these checks for every single _word_ (even it's not literally a
word) until we finish all the content. In the case we have a bunch of inline <script>
tags in the HTML, these take time to check and might block main thread for a while. 
Since it calls |continue| in the while loop, that will bypass the timeout check in
the current code logic, which means we have no way to take main thread's control back.
Depends on how much text is in these non-editable elements, these might take more than
1ms, I think.

> The problem is that when you reorder these checks, you're effectively implicitly
> relying on the checks to occur the next time around you get back to spell
> checking, but that's not necessarily the case.

Yes.

> For example, when you bypass
> the ShouldSpellCheckNode() check here, there's no other place where this
> check will happen later, which is the root cause behind the bug at hand.

I'm not sure I get your point. We will come back to these checks for the next round 
of |ResumeCheck|, don't we? 

The problem here as I noticed, we felled into a infinite loop because we go 
backward to the beginning and check the text we've checked before, and it always 
happens, for every round. I will explain the problem in next comment with a test
case.

> I think a better fix for bug 1354641 would be to modify the condition in
> this block:
> <http://searchfox.org/mozilla-central/rev/
> f55349994fdac101d121b11dac769f3f17fbec4b/extensions/spellcheck/src/
> mozInlineSpellChecker.cpp#1577>
> 
> > The second piece of this puzzle is my mNextWordIndex in
> > mozInlineSpellWordUtil will be reset to 0, or like comment 2 said, to loop.
> > In my test case, removing all whitespaces in <div> will get rid of this
> > issue, i.e. making forward progress, which is a bit different from Andrew's
> > observation. I'm still debugging.
> 
> I strongly suspect that doing what I suggested above may be enough.  Can you
> try that and see what happens?

Yes I did. It fixed most cases if I didn't reorder the timeout checking. If the
concerns above isn't really critical, I'm good to do this first and finish bug 
1354641.

However, I still think the logic is broken. I have made a case that will also 
keep main thread busy and fall into an infinite loop, on Nightly and earlier
releases. See next comment.
(Assignee)

Comment 11

3 months ago
Created attachment 8870104 [details]
loop-on-all-releases.html

This test is a bit tricky, but I think it's a valid real word case. It leads main process occupying CPU forever on Nightly and earlier releases.

In this test case, imaging upper case letters represent a word, it is in this pattern:

A B C.D.E.F.G

where C.D.E.F.G are thousands words concatenated by dots. 

When we do |BuildSoftText|, we basically look for _whitespace_ to identify word boundary, and then go backward to build SoftText. So here, there are only three words from this perspective: A and B and C.D.E.F.G.
However, when we do |BuildRealWords|, we use more complicated logic to identify word boundaries [1], so C.D.E.F.G will be split to many words.

Therefore, if the last round we were in somewhere in the middle of C.D.E.F.G, so for example mSoftBegin.mOffset = 297 as the initial value. the built mSoftText will be still be "A B C.D.E.F.G", because we look backward for up to two words, and then |mSoftBegin.mOffset| will be reset to 0. Then |FindRealWordContaining| set the |mNextWordIndex| to 0, so we restarted! In the case there are many words separated by whitespace, we can make some progress, but once we encounter a _big-word_ concatenated by dot or hyphen, we meet this problem and it causes an infinite loop.

@Ehsan, I'm wondering why the word boundary checking in [1] isn't applied to |BuildSoftText|. I worry modifying the logic here might break other cases. 

Note that the chance of the infinite loop happening in real word seems to be low, because we set a misspell word limit (now 375). If there are some misspells in the _big-word_, after a few rounds of rechecking, we will hit the limit and then stop.

I'd say that this is a really sophisticated design. ;-)

[1] http://searchfox.org/mozilla-central/source/extensions/spellcheck/src/mozInlineSpellWordUtil.cpp#845
Flags: needinfo?(ehsan)
(Assignee)

Comment 12

3 months ago
(In reply to Evelyn Hung [:evelyn] from comment #11)
> Created attachment 8870104 [details]
> loop-on-all-releases.html
>  
> In this test case, imaging upper case letters represent a word

I meant "imagine each upper case letter represents a word".
Unfortunately I don't know why the current code works like this...  so I feel like I can't answer your question here.  My guess is that BuildSoftText() needs to be calling ClassifyCharacter() down the line somehow in order to have the same behavior and if it isn't that's a bug...

But in general this is why any change to the logic here is prone to causing various kinds of regressions. :-(
Flags: needinfo?(ehsan)
(Assignee)

Comment 14

3 months ago
(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #13)
> Unfortunately I don't know why the current code works like this...  so I
> feel like I can't answer your question here.  My guess is that
> BuildSoftText() needs to be calling ClassifyCharacter() down the line
> somehow in order to have the same behavior and if it isn't that's a bug...

Unfortunately, it isn't. :(

Not sure if changing the word boundary detection logic causes side effects, so I decide to do incremental improvement. Let's land patch on bug 135464 back although it isn't perfect. Then the fix on this bug will focus on improving the broken logic in mozInlineSpellWordUtil, so we won't do redundant checking and prevent us from an infinite loop.
(Assignee)

Comment 15

2 months ago
Clarify the scope of this bug:
1. Stop including text content in <script> and <style> while building soft text, so we won't waste time to deal with them afterward.
2. Fix the inconsistent logic of checking word boundary in BuildSoftText(). Besides white space, we should call ClassifyCharacter() to include all punctuations. This is to fix the case in comment 11, so we won't go backward and repeatedly checking same words.
(Assignee)

Comment 16

2 months ago
(In reply to Evelyn Hung [:evelyn] from comment #15)
> Clarify the scope of this bug:
> 1. Stop including text content in <script> and <style> while building soft
> text, so we won't waste time to deal with them afterwards.

Following Ehsan's suggestion and modified IsTextNode() by adding the these lines:

 static inline bool
 IsTextNode(nsINode* aNode)
 {
  nsIContent *parent = aNode->GetParent();
+  if (parent->IsHTMLElement(nsGkAtoms::script) ||
+      parent->IsHTMLElement(nsGkAtoms::style))
+      return false;
   return aNode->IsNodeOfType(nsINode::eTEXT);
 }

and it did exclude those <script> and <style> nodes from building soft text.
Working on a patch for review.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment on attachment 8876658 [details]
Bug 1362858 - Part 1: make word boundary check more consistent.

https://reviewboard.mozilla.org/r/148012/#review152520
Attachment #8876658 - Flags: review?(ehsan) → review+
Comment on attachment 8876659 [details]
Bug 1362858 - Part 2: Exclude script and style tags when building soft text.

https://reviewboard.mozilla.org/r/148014/#review152524

::: extensions/spellcheck/src/mozInlineSpellChecker.cpp:1265
(Diff revision 1)
>  mozInlineSpellChecker::ShouldSpellCheckNode(nsIEditor* aEditor,
>                                              nsINode *aNode)
>  {
>    MOZ_ASSERT(aNode);
>    if (!aNode->IsContent())
> -    return true;
> +    return false;

Hmm, I wonder why we used to return true here for things like documents and attributes!!!  Returning false certainly sounds more correct.

::: extensions/spellcheck/src/mozInlineSpellWordUtil.cpp:88
(Diff revision 1)
>  static inline bool
> -IsTextNode(nsINode* aNode)
> +IsSpellCheckingTextNode(nsINode* aNode)
>  {
> +  nsIContent *parent = aNode->GetParent();
> +  if (parent->IsHTMLElement(nsGkAtoms::script) ||
> +      parent->IsHTMLElement(nsGkAtoms::style))

Nit: please use this helper:
https://searchfox.org/mozilla-central/rev/d840ebd5858a61dbc1622487c1fab74ecf235e03/dom/base/nsIContent.h#279

if (parent->IsAnyOfHTMLElements(nsGkAtoms::script, nsGkAtoms::style)) {
  return false;
}
Attachment #8876659 - Flags: review?(ehsan) → review+
Comment hidden (mozreview-request)
(Assignee)

Comment 22

2 months ago
Try looks green!
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f0e5a77fab8bed76e12e1451e3fbdafbdd0728a2

Comment 23

2 months ago
Pushed by ehung@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/eaf99ba3813a
Part 1: make word boundary check more consistent.r=Ehsan
https://hg.mozilla.org/integration/autoland/rev/307fd37d519d
Part 2: Exclude script and style tags when building soft text.r=Ehsan

Comment 24

2 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/eaf99ba3813a
https://hg.mozilla.org/mozilla-central/rev/307fd37d519d
Status: NEW → RESOLVED
Last Resolved: 2 months ago
status-firefox56: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
You need to log in before you can comment on or make changes to this bug.