1362858 - Spell-checker goes into self-rescheduling loop and keeps main thread busy spellchecking minified JS for sites like Facebook, Google Docs/Google Sheets due to apparent mozInlineSpellWordUtil::BuildSoftText backtracking.

Reporter

Description

•

8 years ago

Once I load Facebook in a tab, the process of the tab is spinning, the fan of my laptop is spinning too. I tried to attach gdb with the content process, breaking it randomly, getting similar backtrace almost everytime. (see frame #7) Following is the backtrace of the content process: #0 0x00007fba139c1efc in WordSplitState::ShouldSkipWord(int, int) () from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so #1 0x00007fba139c5720 in mozInlineSpellWordUtil::SplitDOMWord(int, int) () from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so #2 0x00007fba139c5801 in mozInlineSpellWordUtil::BuildRealWords() () from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so #3 0x00007fba139c584f in mozInlineSpellWordUtil::EnsureWords() () from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so #4 0x00007fba139c5e97 in mozInlineSpellWordUtil::SetPosition(nsINode*, int) () from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so #5 0x00007fba139c702a in mozInlineSpellChecker::DoSpellCheck(mozInlineSpellWordUtil&, mozilla::dom::Selection*, mozilla::UniquePtr<mozInlineSpellStatus, mozilla::DefaultDelete<mozInlineSpellStatus> > const&, bool*) () from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so #6 0x00007fba139c7bbb in mozInlineSpellChecker::ResumeCheck(mozilla::UniquePtr<mozInlineSpellStatus, mozilla::DefaultDelete<mozInlineSpellStatus> >&&) () from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so #7 0x00007fba139c7cd5 in mozInlineSpellResume::Run() () from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so #8 0x00007fba11c840f4 in nsThread::ProcessNextEvent(bool, bool*) () from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so #9 0x00007fba11c84c5f in NS_ProcessNextEvent(nsIThread*, bool) () from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so #10 0x00007fba11f63872 in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) () from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so #11 0x00007fba11f47a7b in MessageLoop::Run() () from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so #12 0x00007fba1300217f in nsBaseAppShell::Run() () from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so #13 0x00007fba13b1bdb3 in XRE_RunAppShell() () from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so #14 0x00007fba11f47a7b in MessageLoop::Run() () from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so #15 0x00007fba13b1c191 in XRE_InitChildProcess(int, char**, XREChildData const*) () from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so #16 0x000055a04f77dd99 in content_process_main(mozilla::Bootstrap*, int, char**) () #17 0x000055a04f77d77f in main ()

Andrew Sutherland [:asuth] (he/him)

Comment 1

•

8 years ago

I just experienced something similar with Google Sheets on nightly, even in safe mode. A content process's main thread was dominated by mozInlineSpellChecker::ResumeCheck stack frames. I was not able to reproduce on Facebook, however. Inspecting the mozInlineSpellChecker::ResumeCheck frames, the wordUtil's mSoftText seems to implicate some kind of debug handling, specifically, gdb's dump of mSoftText in every case looked exactly like this: mSoftText = u" for \"gwt:onLoadErrorFn\"',Ab='TX_gwtAppBasePath',Bb='/spreadsheets/static/ritz',Cb='/',Db='docs.debug',Eb='&',Fb='docs.debug=true',Gb='docs.debug=1',Hb='ritz.debug',Ib='ritz.debug=true',Jb='ritz.debug"...,

Andrew Sutherland [:asuth] (he/him)

Comment 2

•

8 years ago

Attached file contents of inline script tag as copied by devtools' "Copy Inner HTML" mechanism — Details

Elaborating on my Google Sheets case, it appears the spell-checker has decided to attempt to spellcheck the contents of an inline script tag and it gets hung up on it, for some reason deciding to backtrack or reset itself a lot. The script's textContent.length is 34584, but if I set a breakpoint inside the "see if we've done enough words in this round and run out of time" success branch of mozInlineSpellChecker::DoSpellCheck with a command of `p wordRange.mRawPtr->mStartOffset` I see the values seem to loop, never progressing beyond 6k: 6388, 5198, 6233, 6154, 6422, 6426, 6676, 1649, 3330, 6513, 6433, 6335, 4046, 3889, 6373, 3995, 6433, 4046, 6638, 5145, 6596, 6596, 6471, 6380, 6373, 6429, 4342, etc.

Andrew Sutherland [:asuth] (he/him)

Comment 3

•

8 years ago

Okay, so, yeah, it looks like mozInlineSpellWordUtil::BuildSoftText explicitly back-tracks per the comment: // First we have to work backwards from mSoftStart to find a text node // containing a DOM word separator, a non-inline-element // boundary, or the hard start node. That's where we'll start building the // soft string from. And when I set a breakpoint on the exit line (I actually set the breakpoint on the "break" at mozInlineSpellWordUtil.cpp:610 but the breakpoint fires as mozInlineSpellWordUtil.cpp:619), I find that it keeps rebuilding the exact same mSoftText: $248 = u" for \"gwt:onLoadErrorFn\"',Ab='TX_gwtAppBasePath',Bb='/spreadsheets/static/ritz',Cb='/',Db='docs.debug',Eb='&',Fb='docs.debug=true',Gb='docs.debug=1',Hb='ritz.debug',Ib='ritz.debug=true',Jb='ritz.debug"... It seems like there are 2 issues here then: 1) We shouldn't be spell-checking the contents of a script tag. 2) If there's a long enough region without whitespace, it seems like the spellchecker fails to make forward progress because of backtracking. I suspect with no basis that the first bit is somehow a regression but the second bit has always existed, we just didn't try and parse minified JS before.

Summary: Facebook is busy on spell checking → Spell-checker goes into self-rescheduling loop and keeps main thread busy spellchecking minified JS for sites like Facebook, Google Docs/Google Sheets due to apparent mozInlineSpellWordUtil::BuildSoftText backtracking.

Mayank Bansal

Comment 4

•

8 years ago

dupe of bug 1363176 ?

Andrew Sutherland [:asuth] (he/him)

Comment 5

•

8 years ago

The infinite loop certainly is, not sure about the fact that I observed it spellchecking JS. I'll comment on that bug since it has more eyes (read: :ehsan ;) on it.

Evelyn Hung

Assignee

Comment 6

•

8 years ago

Comment 3 is exactly what we found in Facebook case in bug 1363176 comment 10. It shouldn't go into <script> tag IMO it doesn't make sense to check non-editable content. Given this part of code is old and doesn't have any test case, it's very likely other changes broke it already, and my patch on bug 1354641 exposes this problem. I would like to keep this bug open and fix the wrong logic of mozInlineSpellWordUtil::BuildSoftText here, because it's a totally different problem from bug 1354641. And also thank Andrew's analysis we have more details here than on bug 1363176.

Evelyn Hung

Assignee

Updated

•

8 years ago

Updated

•

8 years ago

Assignee: nobody → ehung

Evelyn Hung

Assignee

Comment 7

•

8 years ago

Attached file loop.html — Details

Here is a simplified test case that will run into an infinite loop on the build with my patch. Note that the content of <div> matters, replacing them with repeated single character can't cause the infinite loop. I'm sure that the logic of the spell checking traverses all text nodes in body element, no matter they are editable or not. It seems an intended behavior and not a regression of my patch. Regardless of this behavior, the possible regression caused by my patch is the issue (2) in comment 3. I'm digging into this.

Evelyn Hung

Assignee

Comment 8

•

8 years ago

Okay... I found the problem. In the original design, it checks and skips text in non-editable elements before checking timeout [1]. That means these text won't be counted (in terms of time) although they take time to traverse. I moved the timeout check prior to the non-editable node check, because we concerned that the traversal might take too long and we run out of time (bug 1354641 comment 10). Which is exactly the case of this <script> tag. Since bug 1354641 is a performance bug, and the goal is to not block main thread more than 1ms, I feel my place of the timeout check is right. Then the first problem is: can we skip non-editable content as early as possible? @Ehsan, is there any reason we can't filter out non-editable node earlier? The second piece of this puzzle is my mNextWordIndex in mozInlineSpellWordUtil will be reset to 0, or like comment 2 said, to loop. In my test case, removing all whitespaces in <div> will get rid of this issue, i.e. making forward progress, which is a bit different from Andrew's observation. I'm still debugging. [1] http://searchfox.org/mozilla-central/source/extensions/spellcheck/src/mozInlineSpellChecker.cpp#1544-1545

Flags: needinfo?(ehsan)

(no longer active)

Comment 9

•

8 years ago

(In reply to Evelyn Hung [:evelyn] from comment #8) > Okay... I found the problem. > > In the original design, it checks and skips text in non-editable elements > before checking timeout [1]. That means these text won't be counted (in > terms of time) although they take time to traverse. I moved the timeout > check prior to the non-editable node check, because we concerned that the > traversal might take too long and we run out of time (bug 1354641 comment > 10). Which is exactly the case of this <script> tag. > > Since bug 1354641 is a performance bug, and the goal is to not block main > thread more than 1ms, I feel my place of the timeout check is right. Then > the first problem is: can we skip non-editable content as early as possible? > > @Ehsan, is there any reason we can't filter out non-editable node earlier? I don't think your reason for moving the timeout check is a good one here. None of the checks happening in <http://searchfox.org/mozilla-central/rev/f55349994fdac101d121b11dac769f3f17fbec4b/extensions/spellcheck/src/mozInlineSpellChecker.cpp#1522-1557> are particularly expensive. The problem is that when you reorder these checks, you're effectively implicitly relying on the checks to occur the next time around you get back to spell checking, but that's not necessarily the case. For example, when you bypass the ShouldSpellCheckNode() check here, there's no other place where this check will happen later, which is the root cause behind the bug at hand. I think a better fix for bug 1354641 would be to modify the condition in this block: <http://searchfox.org/mozilla-central/rev/f55349994fdac101d121b11dac769f3f17fbec4b/extensions/spellcheck/src/mozInlineSpellChecker.cpp#1577> > The second piece of this puzzle is my mNextWordIndex in > mozInlineSpellWordUtil will be reset to 0, or like comment 2 said, to loop. > In my test case, removing all whitespaces in <div> will get rid of this > issue, i.e. making forward progress, which is a bit different from Andrew's > observation. I'm still debugging. I strongly suspect that doing what I suggested above may be enough. Can you try that and see what happens?

Flags: needinfo?(ehsan)

Evelyn Hung

Assignee

Comment 10

•

8 years ago

(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #9) > (In reply to Evelyn Hung [:evelyn] from comment #8) > > Okay... I found the problem. > > > > In the original design, it checks and skips text in non-editable elements > > before checking timeout [1]. That means these text won't be counted (in > > terms of time) although they take time to traverse. I moved the timeout > > check prior to the non-editable node check, because we concerned that the > > traversal might take too long and we run out of time (bug 1354641 comment > > 10). Which is exactly the case of this <script> tag. > > > > Since bug 1354641 is a performance bug, and the goal is to not block main > > thread more than 1ms, I feel my place of the timeout check is right. Then > > the first problem is: can we skip non-editable content as early as possible? > > > > @Ehsan, is there any reason we can't filter out non-editable node earlier? > > I don't think your reason for moving the timeout check is a good one here. > None of the checks happening in > <http://searchfox.org/mozilla-central/rev/ > f55349994fdac101d121b11dac769f3f17fbec4b/extensions/spellcheck/src/ > mozInlineSpellChecker.cpp#1522-1557> are particularly expensive. I probably didn't make my point clear enough, let me rephrase it. These checks become expensive when there are non-editable elements with a lot of text in there Then we will run these checks for every single _word_ (even it's not literally a word) until we finish all the content. In the case we have a bunch of inline <script> tags in the HTML, these take time to check and might block main thread for a while. Since it calls |continue| in the while loop, that will bypass the timeout check in the current code logic, which means we have no way to take main thread's control back. Depends on how much text is in these non-editable elements, these might take more than 1ms, I think. > The problem is that when you reorder these checks, you're effectively implicitly > relying on the checks to occur the next time around you get back to spell > checking, but that's not necessarily the case. Yes. > For example, when you bypass > the ShouldSpellCheckNode() check here, there's no other place where this > check will happen later, which is the root cause behind the bug at hand. I'm not sure I get your point. We will come back to these checks for the next round of |ResumeCheck|, don't we? The problem here as I noticed, we felled into a infinite loop because we go backward to the beginning and check the text we've checked before, and it always happens, for every round. I will explain the problem in next comment with a test case. > I think a better fix for bug 1354641 would be to modify the condition in > this block: > <http://searchfox.org/mozilla-central/rev/ > f55349994fdac101d121b11dac769f3f17fbec4b/extensions/spellcheck/src/ > mozInlineSpellChecker.cpp#1577> > > > The second piece of this puzzle is my mNextWordIndex in > > mozInlineSpellWordUtil will be reset to 0, or like comment 2 said, to loop. > > In my test case, removing all whitespaces in <div> will get rid of this > > issue, i.e. making forward progress, which is a bit different from Andrew's > > observation. I'm still debugging. > > I strongly suspect that doing what I suggested above may be enough. Can you > try that and see what happens? Yes I did. It fixed most cases if I didn't reorder the timeout checking. If the concerns above isn't really critical, I'm good to do this first and finish bug 1354641. However, I still think the logic is broken. I have made a case that will also keep main thread busy and fall into an infinite loop, on Nightly and earlier releases. See next comment.

Evelyn Hung

Assignee

Comment 11

•

8 years ago

Attached file loop-on-all-releases.html — Details

This test is a bit tricky, but I think it's a valid real word case. It leads main process occupying CPU forever on Nightly and earlier releases. In this test case, imaging upper case letters represent a word, it is in this pattern: A B C.D.E.F.G where C.D.E.F.G are thousands words concatenated by dots. When we do |BuildSoftText|, we basically look for _whitespace_ to identify word boundary, and then go backward to build SoftText. So here, there are only three words from this perspective: A and B and C.D.E.F.G. However, when we do |BuildRealWords|, we use more complicated logic to identify word boundaries [1], so C.D.E.F.G will be split to many words. Therefore, if the last round we were in somewhere in the middle of C.D.E.F.G, so for example mSoftBegin.mOffset = 297 as the initial value. the built mSoftText will be still be "A B C.D.E.F.G", because we look backward for up to two words, and then |mSoftBegin.mOffset| will be reset to 0. Then |FindRealWordContaining| set the |mNextWordIndex| to 0, so we restarted! In the case there are many words separated by whitespace, we can make some progress, but once we encounter a _big-word_ concatenated by dot or hyphen, we meet this problem and it causes an infinite loop. @Ehsan, I'm wondering why the word boundary checking in [1] isn't applied to |BuildSoftText|. I worry modifying the logic here might break other cases. Note that the chance of the infinite loop happening in real word seems to be low, because we set a misspell word limit (now 375). If there are some misspells in the _big-word_, after a few rounds of rechecking, we will hit the limit and then stop. I'd say that this is a really sophisticated design. ;-) [1] http://searchfox.org/mozilla-central/source/extensions/spellcheck/src/mozInlineSpellWordUtil.cpp#845

Flags: needinfo?(ehsan)

Evelyn Hung

Assignee

Comment 12

•

8 years ago

(In reply to Evelyn Hung [:evelyn] from comment #11) > Created attachment 8870104 [details] > loop-on-all-releases.html > > In this test case, imaging upper case letters represent a word I meant "imagine each upper case letter represents a word".

(no longer active)

Comment 13

•

8 years ago

Unfortunately I don't know why the current code works like this... so I feel like I can't answer your question here. My guess is that BuildSoftText() needs to be calling ClassifyCharacter() down the line somehow in order to have the same behavior and if it isn't that's a bug... But in general this is why any change to the logic here is prone to causing various kinds of regressions. :-(

Flags: needinfo?(ehsan)

Evelyn Hung

Assignee

Comment 14

•

8 years ago

(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #13) > Unfortunately I don't know why the current code works like this... so I > feel like I can't answer your question here. My guess is that > BuildSoftText() needs to be calling ClassifyCharacter() down the line > somehow in order to have the same behavior and if it isn't that's a bug... Unfortunately, it isn't. :( Not sure if changing the word boundary detection logic causes side effects, so I decide to do incremental improvement. Let's land patch on bug 135464 back although it isn't perfect. Then the fix on this bug will focus on improving the broken logic in mozInlineSpellWordUtil, so we won't do redundant checking and prevent us from an infinite loop.

Evelyn Hung

Assignee

Comment 15

•

8 years ago

Clarify the scope of this bug: 1. Stop including text content in <script> and <style> while building soft text, so we won't waste time to deal with them afterward. 2. Fix the inconsistent logic of checking word boundary in BuildSoftText(). Besides white space, we should call ClassifyCharacter() to include all punctuations. This is to fix the case in comment 11, so we won't go backward and repeatedly checking same words.

Evelyn Hung

Assignee

Comment 16

•

8 years ago

(In reply to Evelyn Hung [:evelyn] from comment #15) > Clarify the scope of this bug: > 1. Stop including text content in <script> and <style> while building soft > text, so we won't waste time to deal with them afterwards. Following Ehsan's suggestion and modified IsTextNode() by adding the these lines: static inline bool IsTextNode(nsINode* aNode) { nsIContent *parent = aNode->GetParent(); + if (parent->IsHTMLElement(nsGkAtoms::script) || + parent->IsHTMLElement(nsGkAtoms::style)) + return false; return aNode->IsNodeOfType(nsINode::eTEXT); } and it did exclude those <script> and <style> nodes from building soft text. Working on a patch for review.

Comment hidden (mozreview-request)

(no longer active)

Comment 19

•

8 years ago

mozreview-review

Comment on attachment 8876658 [details] Bug 1362858 - Part 1: make word boundary check more consistent. https://reviewboard.mozilla.org/r/148012/#review152520

Attachment #8876658 - Flags: review?(ehsan) → review+

(no longer active)

Comment 20

•

8 years ago

mozreview-review

Comment on attachment 8876659 [details] Bug 1362858 - Part 2: Exclude script and style tags when building soft text. https://reviewboard.mozilla.org/r/148014/#review152524 ::: extensions/spellcheck/src/mozInlineSpellChecker.cpp:1265 (Diff revision 1) > mozInlineSpellChecker::ShouldSpellCheckNode(nsIEditor* aEditor, > nsINode *aNode) > { > MOZ_ASSERT(aNode); > if (!aNode->IsContent()) > - return true; > + return false; Hmm, I wonder why we used to return true here for things like documents and attributes!!! Returning false certainly sounds more correct. ::: extensions/spellcheck/src/mozInlineSpellWordUtil.cpp:88 (Diff revision 1) > static inline bool > -IsTextNode(nsINode* aNode) > +IsSpellCheckingTextNode(nsINode* aNode) > { > + nsIContent *parent = aNode->GetParent(); > + if (parent->IsHTMLElement(nsGkAtoms::script) || > + parent->IsHTMLElement(nsGkAtoms::style)) Nit: please use this helper: https://searchfox.org/mozilla-central/rev/d840ebd5858a61dbc1622487c1fab74ecf235e03/dom/base/nsIContent.h#279 if (parent->IsAnyOfHTMLElements(nsGkAtoms::script, nsGkAtoms::style)) { return false; }

Attachment #8876659 - Flags: review?(ehsan) → review+

Comment hidden (mozreview-request)

Evelyn Hung

Assignee

Comment 22

•

8 years ago

Try looks green! https://treeherder.mozilla.org/#/jobs?repo=try&revision=f0e5a77fab8bed76e12e1451e3fbdafbdd0728a2

Pulsebot

Comment 23

•

8 years ago

Pushed by ehung@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/eaf99ba3813a Part 1: make word boundary check more consistent.r=Ehsan https://hg.mozilla.org/integration/autoland/rev/307fd37d519d Part 2: Exclude script and style tags when building soft text.r=Ehsan

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 24

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/eaf99ba3813a https://hg.mozilla.org/mozilla-central/rev/307fd37d519d

Status: NEW → RESOLVED

Closed: 8 years ago

status-firefox56: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla56

Ekanan Ketunuti

Updated

•

7 years ago

Depends on: 1418629

Jorg K (CEST = GMT+2)

Comment 25

•

6 years ago

Why did you move a block of 241 lines from here https://hg.mozilla.org/mozilla-central/rev/eaf99ba3813a#l1.304 to here? https://hg.mozilla.org/mozilla-central/rev/eaf99ba3813a#l1.12 Looks like a huge change/patch when in effect nothing changed. I believe MozReview showed you moved blocks, but that's gone now. So the net change in Part 1 one is just this hunk here: https://hg.mozilla.org/mozilla-central/rev/eaf99ba3813a#l1.261

Jorg K (CEST = GMT+2)

Comment 26

•

6 years ago

Oh, I see, the new code is using CHAR_CLASS_SEPARATOR and that comes from the moved hunk.

Jorg K (CEST = GMT+2)

Comment 27

•

6 years ago

https://hg.mozilla.org/releases/mozilla-esr60/rev/78c54e4a83e8807cdeaa96e22648d381eb51c8da on THUNDERBIRD_60_VERBRANCH Backed out part of changeset eaf99ba3813a (bug 1362858, part 1, for causing bug 1418629) to build Thunderbird 60.2. a=jorgk

Makoto Kato [:m_kato]

Updated

•

6 years ago

Regressions: 1418629

Jorg K (CEST = GMT+2)

Comment 28

•

6 years ago

https://hg.mozilla.org/releases/mozilla-esr68/rev/0fb2663f9e540d943e9b6ef6617ead9dca2a5981 on THUNDERBIRD_68_VERBRANCH
Backed out part of changeset eaf99ba3813a (bug 1362858, part 1, for causing bug 1418629) to build Thunderbird 68.0. a=jorgk

Mirko Brodesser (:mbrodesser-Igalia), offline until 10th of Feb

Updated

•

4 years ago

contents of inline script tag as copied by devtools' "Copy Inner HTML" mechanism 8 years ago Andrew Sutherland [:asuth] (he/him) 33.79 KB, text/plain		Details
loop.html 8 years ago Evelyn Hung 78.90 KB, text/html		Details
loop-on-all-releases.html 8 years ago Evelyn Hung 17.58 KB, text/html		Details
Bug 1362858 - Part 1: make word boundary check more consistent. 8 years ago Evelyn Hung 59 bytes, text/x-review-board-request	ehsan.akhgari : review+	Details
Bug 1362858 - Part 2: Exclude script and style tags when building soft text. 8 years ago Evelyn Hung 59 bytes, text/x-review-board-request	ehsan.akhgari : review+	Details