Closed Bug 1362858 Opened 8 years ago Closed 8 years ago

Spell-checker goes into self-rescheduling loop and keeps main thread busy spellchecking minified JS for sites like Facebook, Google Docs/Google Sheets due to apparent mozInlineSpellWordUtil::BuildSoftText backtracking.

Categories

(Core :: Spelling checker, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox56 --- fixed

People

(Reporter: sinker, Assigned: jj.evelyn)

References

Details

Attachments

(5 files)

Once I load Facebook in a tab, the process of the tab is spinning, the fan of my laptop is spinning too. I tried to attach gdb with the content process, breaking it randomly, getting similar backtrace almost everytime. (see frame #7) Following is the backtrace of the content process: #0 0x00007fba139c1efc in WordSplitState::ShouldSkipWord(int, int) () from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so #1 0x00007fba139c5720 in mozInlineSpellWordUtil::SplitDOMWord(int, int) () from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so #2 0x00007fba139c5801 in mozInlineSpellWordUtil::BuildRealWords() () from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so #3 0x00007fba139c584f in mozInlineSpellWordUtil::EnsureWords() () from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so #4 0x00007fba139c5e97 in mozInlineSpellWordUtil::SetPosition(nsINode*, int) () from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so #5 0x00007fba139c702a in mozInlineSpellChecker::DoSpellCheck(mozInlineSpellWordUtil&, mozilla::dom::Selection*, mozilla::UniquePtr<mozInlineSpellStatus, mozilla::DefaultDelete<mozInlineSpellStatus> > const&, bool*) () from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so #6 0x00007fba139c7bbb in mozInlineSpellChecker::ResumeCheck(mozilla::UniquePtr<mozInlineSpellStatus, mozilla::DefaultDelete<mozInlineSpellStatus> >&&) () from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so #7 0x00007fba139c7cd5 in mozInlineSpellResume::Run() () from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so #8 0x00007fba11c840f4 in nsThread::ProcessNextEvent(bool, bool*) () from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so #9 0x00007fba11c84c5f in NS_ProcessNextEvent(nsIThread*, bool) () from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so #10 0x00007fba11f63872 in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) () from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so #11 0x00007fba11f47a7b in MessageLoop::Run() () from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so #12 0x00007fba1300217f in nsBaseAppShell::Run() () from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so #13 0x00007fba13b1bdb3 in XRE_RunAppShell() () from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so #14 0x00007fba11f47a7b in MessageLoop::Run() () from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so #15 0x00007fba13b1c191 in XRE_InitChildProcess(int, char**, XREChildData const*) () from /home/thinker/tmp/ff/lib/firefox-55.0a1/libxul.so #16 0x000055a04f77dd99 in content_process_main(mozilla::Bootstrap*, int, char**) () #17 0x000055a04f77d77f in main ()
I just experienced something similar with Google Sheets on nightly, even in safe mode. A content process's main thread was dominated by mozInlineSpellChecker::ResumeCheck stack frames. I was not able to reproduce on Facebook, however. Inspecting the mozInlineSpellChecker::ResumeCheck frames, the wordUtil's mSoftText seems to implicate some kind of debug handling, specifically, gdb's dump of mSoftText in every case looked exactly like this: mSoftText = u" for \"gwt:onLoadErrorFn\"',Ab='TX_gwtAppBasePath',Bb='/spreadsheets/static/ritz',Cb='/',Db='docs.debug',Eb='&',Fb='docs.debug=true',Gb='docs.debug=1',Hb='ritz.debug',Ib='ritz.debug=true',Jb='ritz.debug"...,
Elaborating on my Google Sheets case, it appears the spell-checker has decided to attempt to spellcheck the contents of an inline script tag and it gets hung up on it, for some reason deciding to backtrack or reset itself a lot. The script's textContent.length is 34584, but if I set a breakpoint inside the "see if we've done enough words in this round and run out of time" success branch of mozInlineSpellChecker::DoSpellCheck with a command of `p wordRange.mRawPtr->mStartOffset` I see the values seem to loop, never progressing beyond 6k: 6388, 5198, 6233, 6154, 6422, 6426, 6676, 1649, 3330, 6513, 6433, 6335, 4046, 3889, 6373, 3995, 6433, 4046, 6638, 5145, 6596, 6596, 6471, 6380, 6373, 6429, 4342, etc.
Okay, so, yeah, it looks like mozInlineSpellWordUtil::BuildSoftText explicitly back-tracks per the comment: // First we have to work backwards from mSoftStart to find a text node // containing a DOM word separator, a non-inline-element // boundary, or the hard start node. That's where we'll start building the // soft string from. And when I set a breakpoint on the exit line (I actually set the breakpoint on the "break" at mozInlineSpellWordUtil.cpp:610 but the breakpoint fires as mozInlineSpellWordUtil.cpp:619), I find that it keeps rebuilding the exact same mSoftText: $248 = u" for \"gwt:onLoadErrorFn\"',Ab='TX_gwtAppBasePath',Bb='/spreadsheets/static/ritz',Cb='/',Db='docs.debug',Eb='&',Fb='docs.debug=true',Gb='docs.debug=1',Hb='ritz.debug',Ib='ritz.debug=true',Jb='ritz.debug"... It seems like there are 2 issues here then: 1) We shouldn't be spell-checking the contents of a script tag. 2) If there's a long enough region without whitespace, it seems like the spellchecker fails to make forward progress because of backtracking. I suspect with no basis that the first bit is somehow a regression but the second bit has always existed, we just didn't try and parse minified JS before.
Summary: Facebook is busy on spell checking → Spell-checker goes into self-rescheduling loop and keeps main thread busy spellchecking minified JS for sites like Facebook, Google Docs/Google Sheets due to apparent mozInlineSpellWordUtil::BuildSoftText backtracking.
dupe of bug 1363176 ?
The infinite loop certainly is, not sure about the fact that I observed it spellchecking JS. I'll comment on that bug since it has more eyes (read: :ehsan ;) on it.
Comment 3 is exactly what we found in Facebook case in bug 1363176 comment 10. It shouldn't go into <script> tag IMO it doesn't make sense to check non-editable content. Given this part of code is old and doesn't have any test case, it's very likely other changes broke it already, and my patch on bug 1354641 exposes this problem. I would like to keep this bug open and fix the wrong logic of mozInlineSpellWordUtil::BuildSoftText here, because it's a totally different problem from bug 1354641. And also thank Andrew's analysis we have more details here than on bug 1363176.
See Also: → 1363176
Assignee: nobody → ehung
Attached file loop.html
Here is a simplified test case that will run into an infinite loop on the build with my patch. Note that the content of <div> matters, replacing them with repeated single character can't cause the infinite loop. I'm sure that the logic of the spell checking traverses all text nodes in body element, no matter they are editable or not. It seems an intended behavior and not a regression of my patch. Regardless of this behavior, the possible regression caused by my patch is the issue (2) in comment 3. I'm digging into this.
Okay... I found the problem. In the original design, it checks and skips text in non-editable elements before checking timeout [1]. That means these text won't be counted (in terms of time) although they take time to traverse. I moved the timeout check prior to the non-editable node check, because we concerned that the traversal might take too long and we run out of time (bug 1354641 comment 10). Which is exactly the case of this <script> tag. Since bug 1354641 is a performance bug, and the goal is to not block main thread more than 1ms, I feel my place of the timeout check is right. Then the first problem is: can we skip non-editable content as early as possible? @Ehsan, is there any reason we can't filter out non-editable node earlier? The second piece of this puzzle is my mNextWordIndex in mozInlineSpellWordUtil will be reset to 0, or like comment 2 said, to loop. In my test case, removing all whitespaces in <div> will get rid of this issue, i.e. making forward progress, which is a bit different from Andrew's observation. I'm still debugging. [1] http://searchfox.org/mozilla-central/source/extensions/spellcheck/src/mozInlineSpellChecker.cpp#1544-1545
Flags: needinfo?(ehsan)
(In reply to Evelyn Hung [:evelyn] from comment #8) > Okay... I found the problem. > > In the original design, it checks and skips text in non-editable elements > before checking timeout [1]. That means these text won't be counted (in > terms of time) although they take time to traverse. I moved the timeout > check prior to the non-editable node check, because we concerned that the > traversal might take too long and we run out of time (bug 1354641 comment > 10). Which is exactly the case of this <script> tag. > > Since bug 1354641 is a performance bug, and the goal is to not block main > thread more than 1ms, I feel my place of the timeout check is right. Then > the first problem is: can we skip non-editable content as early as possible? > > @Ehsan, is there any reason we can't filter out non-editable node earlier? I don't think your reason for moving the timeout check is a good one here. None of the checks happening in <http://searchfox.org/mozilla-central/rev/f55349994fdac101d121b11dac769f3f17fbec4b/extensions/spellcheck/src/mozInlineSpellChecker.cpp#1522-1557> are particularly expensive. The problem is that when you reorder these checks, you're effectively implicitly relying on the checks to occur the next time around you get back to spell checking, but that's not necessarily the case. For example, when you bypass the ShouldSpellCheckNode() check here, there's no other place where this check will happen later, which is the root cause behind the bug at hand. I think a better fix for bug 1354641 would be to modify the condition in this block: <http://searchfox.org/mozilla-central/rev/f55349994fdac101d121b11dac769f3f17fbec4b/extensions/spellcheck/src/mozInlineSpellChecker.cpp#1577> > The second piece of this puzzle is my mNextWordIndex in > mozInlineSpellWordUtil will be reset to 0, or like comment 2 said, to loop. > In my test case, removing all whitespaces in <div> will get rid of this > issue, i.e. making forward progress, which is a bit different from Andrew's > observation. I'm still debugging. I strongly suspect that doing what I suggested above may be enough. Can you try that and see what happens?
Flags: needinfo?(ehsan)
(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #9) > (In reply to Evelyn Hung [:evelyn] from comment #8) > > Okay... I found the problem. > > > > In the original design, it checks and skips text in non-editable elements > > before checking timeout [1]. That means these text won't be counted (in > > terms of time) although they take time to traverse. I moved the timeout > > check prior to the non-editable node check, because we concerned that the > > traversal might take too long and we run out of time (bug 1354641 comment > > 10). Which is exactly the case of this <script> tag. > > > > Since bug 1354641 is a performance bug, and the goal is to not block main > > thread more than 1ms, I feel my place of the timeout check is right. Then > > the first problem is: can we skip non-editable content as early as possible? > > > > @Ehsan, is there any reason we can't filter out non-editable node earlier? > > I don't think your reason for moving the timeout check is a good one here. > None of the checks happening in > <http://searchfox.org/mozilla-central/rev/ > f55349994fdac101d121b11dac769f3f17fbec4b/extensions/spellcheck/src/ > mozInlineSpellChecker.cpp#1522-1557> are particularly expensive. I probably didn't make my point clear enough, let me rephrase it. These checks become expensive when there are non-editable elements with a lot of text in there Then we will run these checks for every single _word_ (even it's not literally a word) until we finish all the content. In the case we have a bunch of inline <script> tags in the HTML, these take time to check and might block main thread for a while. Since it calls |continue| in the while loop, that will bypass the timeout check in the current code logic, which means we have no way to take main thread's control back. Depends on how much text is in these non-editable elements, these might take more than 1ms, I think. > The problem is that when you reorder these checks, you're effectively implicitly > relying on the checks to occur the next time around you get back to spell > checking, but that's not necessarily the case. Yes. > For example, when you bypass > the ShouldSpellCheckNode() check here, there's no other place where this > check will happen later, which is the root cause behind the bug at hand. I'm not sure I get your point. We will come back to these checks for the next round of |ResumeCheck|, don't we? The problem here as I noticed, we felled into a infinite loop because we go backward to the beginning and check the text we've checked before, and it always happens, for every round. I will explain the problem in next comment with a test case. > I think a better fix for bug 1354641 would be to modify the condition in > this block: > <http://searchfox.org/mozilla-central/rev/ > f55349994fdac101d121b11dac769f3f17fbec4b/extensions/spellcheck/src/ > mozInlineSpellChecker.cpp#1577> > > > The second piece of this puzzle is my mNextWordIndex in > > mozInlineSpellWordUtil will be reset to 0, or like comment 2 said, to loop. > > In my test case, removing all whitespaces in <div> will get rid of this > > issue, i.e. making forward progress, which is a bit different from Andrew's > > observation. I'm still debugging. > > I strongly suspect that doing what I suggested above may be enough. Can you > try that and see what happens? Yes I did. It fixed most cases if I didn't reorder the timeout checking. If the concerns above isn't really critical, I'm good to do this first and finish bug 1354641. However, I still think the logic is broken. I have made a case that will also keep main thread busy and fall into an infinite loop, on Nightly and earlier releases. See next comment.
This test is a bit tricky, but I think it's a valid real word case. It leads main process occupying CPU forever on Nightly and earlier releases. In this test case, imaging upper case letters represent a word, it is in this pattern: A B C.D.E.F.G where C.D.E.F.G are thousands words concatenated by dots. When we do |BuildSoftText|, we basically look for _whitespace_ to identify word boundary, and then go backward to build SoftText. So here, there are only three words from this perspective: A and B and C.D.E.F.G. However, when we do |BuildRealWords|, we use more complicated logic to identify word boundaries [1], so C.D.E.F.G will be split to many words. Therefore, if the last round we were in somewhere in the middle of C.D.E.F.G, so for example mSoftBegin.mOffset = 297 as the initial value. the built mSoftText will be still be "A B C.D.E.F.G", because we look backward for up to two words, and then |mSoftBegin.mOffset| will be reset to 0. Then |FindRealWordContaining| set the |mNextWordIndex| to 0, so we restarted! In the case there are many words separated by whitespace, we can make some progress, but once we encounter a _big-word_ concatenated by dot or hyphen, we meet this problem and it causes an infinite loop. @Ehsan, I'm wondering why the word boundary checking in [1] isn't applied to |BuildSoftText|. I worry modifying the logic here might break other cases. Note that the chance of the infinite loop happening in real word seems to be low, because we set a misspell word limit (now 375). If there are some misspells in the _big-word_, after a few rounds of rechecking, we will hit the limit and then stop. I'd say that this is a really sophisticated design. ;-) [1] http://searchfox.org/mozilla-central/source/extensions/spellcheck/src/mozInlineSpellWordUtil.cpp#845
Flags: needinfo?(ehsan)
(In reply to Evelyn Hung [:evelyn] from comment #11) > Created attachment 8870104 [details] > loop-on-all-releases.html > > In this test case, imaging upper case letters represent a word I meant "imagine each upper case letter represents a word".
Unfortunately I don't know why the current code works like this... so I feel like I can't answer your question here. My guess is that BuildSoftText() needs to be calling ClassifyCharacter() down the line somehow in order to have the same behavior and if it isn't that's a bug... But in general this is why any change to the logic here is prone to causing various kinds of regressions. :-(
Flags: needinfo?(ehsan)
(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #13) > Unfortunately I don't know why the current code works like this... so I > feel like I can't answer your question here. My guess is that > BuildSoftText() needs to be calling ClassifyCharacter() down the line > somehow in order to have the same behavior and if it isn't that's a bug... Unfortunately, it isn't. :( Not sure if changing the word boundary detection logic causes side effects, so I decide to do incremental improvement. Let's land patch on bug 135464 back although it isn't perfect. Then the fix on this bug will focus on improving the broken logic in mozInlineSpellWordUtil, so we won't do redundant checking and prevent us from an infinite loop.
Clarify the scope of this bug: 1. Stop including text content in <script> and <style> while building soft text, so we won't waste time to deal with them afterward. 2. Fix the inconsistent logic of checking word boundary in BuildSoftText(). Besides white space, we should call ClassifyCharacter() to include all punctuations. This is to fix the case in comment 11, so we won't go backward and repeatedly checking same words.
(In reply to Evelyn Hung [:evelyn] from comment #15) > Clarify the scope of this bug: > 1. Stop including text content in <script> and <style> while building soft > text, so we won't waste time to deal with them afterwards. Following Ehsan's suggestion and modified IsTextNode() by adding the these lines: static inline bool IsTextNode(nsINode* aNode) { nsIContent *parent = aNode->GetParent(); + if (parent->IsHTMLElement(nsGkAtoms::script) || + parent->IsHTMLElement(nsGkAtoms::style)) + return false; return aNode->IsNodeOfType(nsINode::eTEXT); } and it did exclude those <script> and <style> nodes from building soft text. Working on a patch for review.
Comment on attachment 8876658 [details] Bug 1362858 - Part 1: make word boundary check more consistent. https://reviewboard.mozilla.org/r/148012/#review152520
Attachment #8876658 - Flags: review?(ehsan) → review+
Comment on attachment 8876659 [details] Bug 1362858 - Part 2: Exclude script and style tags when building soft text. https://reviewboard.mozilla.org/r/148014/#review152524 ::: extensions/spellcheck/src/mozInlineSpellChecker.cpp:1265 (Diff revision 1) > mozInlineSpellChecker::ShouldSpellCheckNode(nsIEditor* aEditor, > nsINode *aNode) > { > MOZ_ASSERT(aNode); > if (!aNode->IsContent()) > - return true; > + return false; Hmm, I wonder why we used to return true here for things like documents and attributes!!! Returning false certainly sounds more correct. ::: extensions/spellcheck/src/mozInlineSpellWordUtil.cpp:88 (Diff revision 1) > static inline bool > -IsTextNode(nsINode* aNode) > +IsSpellCheckingTextNode(nsINode* aNode) > { > + nsIContent *parent = aNode->GetParent(); > + if (parent->IsHTMLElement(nsGkAtoms::script) || > + parent->IsHTMLElement(nsGkAtoms::style)) Nit: please use this helper: https://searchfox.org/mozilla-central/rev/d840ebd5858a61dbc1622487c1fab74ecf235e03/dom/base/nsIContent.h#279 if (parent->IsAnyOfHTMLElements(nsGkAtoms::script, nsGkAtoms::style)) { return false; }
Attachment #8876659 - Flags: review?(ehsan) → review+
Pushed by ehung@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/eaf99ba3813a Part 1: make word boundary check more consistent.r=Ehsan https://hg.mozilla.org/integration/autoland/rev/307fd37d519d Part 2: Exclude script and style tags when building soft text.r=Ehsan
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Depends on: 1418629
Why did you move a block of 241 lines from here https://hg.mozilla.org/mozilla-central/rev/eaf99ba3813a#l1.304 to here? https://hg.mozilla.org/mozilla-central/rev/eaf99ba3813a#l1.12 Looks like a huge change/patch when in effect nothing changed. I believe MozReview showed you moved blocks, but that's gone now. So the net change in Part 1 one is just this hunk here: https://hg.mozilla.org/mozilla-central/rev/eaf99ba3813a#l1.261
Oh, I see, the new code is using CHAR_CLASS_SEPARATOR and that comes from the moved hunk.
https://hg.mozilla.org/releases/mozilla-esr60/rev/78c54e4a83e8807cdeaa96e22648d381eb51c8da on THUNDERBIRD_60_VERBRANCH Backed out part of changeset eaf99ba3813a (bug 1362858, part 1, for causing bug 1418629) to build Thunderbird 60.2. a=jorgk
Regressions: 1418629

https://hg.mozilla.org/releases/mozilla-esr68/rev/0fb2663f9e540d943e9b6ef6617ead9dca2a5981 on THUNDERBIRD_68_VERBRANCH
Backed out part of changeset eaf99ba3813a (bug 1362858, part 1, for causing bug 1418629) to build Thunderbird 68.0. a=jorgk

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: