Closed Bug 324521 Opened 17 years ago Closed 17 years ago

Poor Inline Spell Checking Performance On Large Text Documents

Categories

(Thunderbird :: Mail Window Front End, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird2.0

People

(Reporter: jhg, Assigned: mscott)

References

(Blocks 1 open bug)

Details

(Keywords: fixed1.8.1, verified1.8.1.3)

Attachments

(2 files, 1 obsolete file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5

The attached message causes a 100% CPU loop/hang when the "Forward" button
is pressed, but not when "Forward as -> Attachment" is used.

1) Press Forward button on the toolbar.  

   A compose window comes up but remains unpainted (only the frame appears).
   CPU goes to 100% and stays there.

2) From the menu, select "Forward -> Inline"

   A compose window comes up and fully paints with the message to be 
   forwarded, but then CPU goes to 100% and no more interaction is possible.

3) From the menu select "Forward -> As Attachment"

   Normal, correct behavior.  The message can be forwarded.

Reproducible: Always

Steps to Reproduce:
See details above




Scenarios 1 and 2 above may be the same.  I recall once seeing the behavior
in (2) when pressing the forward button.
Related to bug 311137?
Seeing the same thing here, on Windows 2000, Thunderbird version 1.5 (20051201).  It seems to happen with any email larger than 50KB.  Forwarding inline hangs; forwarding as an attachment works.
we're spending all our time spell-checking - I think we're just doing a single spell check of the whole message, but I haven't debugged it much yet.
I can confirm the spell check seems to be causing the hang.

1. Changed "mail.spellcheck.inline" to "false"
2. Restarted Thunderbird
3. Successfully forwarded 70KB message inline
4. Changed "mail.spellcheck.inline" back to "true"
5. Restarted Thunderbird
6. Attempted to forward the same 70KB message inline, Thunderbird hangs
(In reply to comment #5)
> I can confirm the spell check seems to be causing the hang.

Related to/duplicate of bug 323637?
Status: UNCONFIRMED → NEW
Ever confirmed: true
*** Bug 323637 has been marked as a duplicate of this bug. ***
Here's a stack trace:

>	editor.dll!nsTextEditRules::ReplaceNewlines(nsIDOMRange * aRange=0x074d02d0)  Line 1203	C++
 	editor.dll!nsHTMLEditRules::AfterEditInner(int action=3013, short aDirection=1)  Line 483 + 0x1d bytes	C++
 	editor.dll!nsHTMLEditRules::AfterEdit(int action=3013, short aDirection=1)  Line 391 + 0x14 bytes	C++
 	editor.dll!nsHTMLEditor::EndOperation()  Line 4133 + 0x40 bytes	C++
 	editor.dll!nsAutoRules::~nsAutoRules()  Line 125	C++
 	editor.dll!nsHTMLEditor::LoadHTML(const nsAString_internal & aInputString={...})  Line 244 + 0x57 bytes	C++
 	editor.dll!nsHTMLEditor::RebuildDocumentFromSource(const nsAString_internal & aSourceString={...})  Line 1813 + 0x39 bytes	C++
 	mail.dll!nsMsgCompose::ConvertAndLoadComposeWindow(nsString & aPrefix={...}, nsString & aBuf={...}, nsString & aSignature={...}, int aQuoted=0, int aHTMLEditor=1)  Line 543	C++
 	mail.dll!nsMsgCompose::BuildBodyMessageAndSignature()  Line 3784 + 0x2c bytes	C++
 	mail.dll!nsMsgCompose::InitEditor(nsIEditor * aEditor=0x06405578, nsIDOMWindow * aContentWindow=0x0742dd30)  Line 1370 + 0x11 bytes	C++

We go through this loop really really slowly when the document is large. Inside that loop, we seem to spend most of our time in here:

 	spellchk.dll!nsCOMPtr<nsIDOMNSRange>::nsCOMPtr<nsIDOMNSRange>(nsQueryInterface qi={...})  Line 646	C++
 	spellchk.dll!mozInlineSpellChecker::IsPointInSelection(nsISelection * aSelection=0x0749c868, nsIDOMNode * aNode=0x0a9e17ec, int aOffset=2, nsIDOMRange * * aRange=0x0012cf1c)  Line 1078	C++
 	spellchk.dll!mozInlineSpellChecker::SpellCheckRange(nsIDOMRange * aRange=0x0a9e1708, nsISelection * aSpellCheckSelection=0x0749c868)  Line 1013 + 0x3d bytes	C++
 	spellchk.dll!mozInlineSpellChecker::SpellCheckBetweenNodes(nsIDOMNode * aStartNode=0x0a9e17ec, int aStartOffset=0, nsIDOMNode * aEndNode=0x0a9e17ec, int aEndOffset=0, nsISelection * aSpellCheckSelection=0x00000000)  Line 708 + 0x1a bytes	C++
 	spellchk.dll!mozInlineSpellChecker::DidSplitNode(nsIDOMNode * aExistingRightNode=0x0a430ddc, int aOffset=13, nsIDOMNode * aNewLeftNode=0x0a9e17ec, unsigned int aResult=0)  Line 567	C++
 	editor.dll!nsEditor::SplitNode(nsIDOMNode * aNode=0x0a430ddc, int aOffset=13, nsIDOMNode * * aNewLeftNode=0x0012d1a4)  Line 1431	C++
 	editor.dll!nsHTMLEditor::CreateBRImpl(nsCOMPtr<nsIDOMNode> * aInOutParent=0x0012d1f0, int * aInOutOffset=0x0012d1ec, nsCOMPtr<nsIDOMNode> * outBRNode=0x0012d268, short aSelect=0)  Line 1479 + 0x36 bytes	C++
 	editor.dll!nsHTMLEditor::CreateBR(nsIDOMNode * aNode=0x0a430ddc, int aOffset=13, nsCOMPtr<nsIDOMNode> * outBRNode=0x0012d268, short aSelect=0)  Line 1529 + 0x2b bytes	C++
 	editor.dll!nsTextEditRules::ReplaceNewlines(nsIDOMRange * aRange=0x074d02d0)  Line 1227 + 0x3f bytes	C++
nsRange::IsIncreasing looks like the real bottleneck. That's usually on the stack when I break in the debugger...

http://lxr.mozilla.org/seamonkey/source/content/base/src/nsRange.cpp#678

Also, to answer your question, this does happen when you do edit | message as new on one of these large text messages. It feels like there's an n squared algorithm somewhere here - I don't know if it's in spell check or editor or content.
Flags: blocking-thunderbird2+
Target Milestone: --- → Thunderbird2.0
This patch attempts to remove the inline spell checker from being involved when doing the initial document load. It helps the new window time when forwarding and editing the test message as new. 

1) Stop calling checkDocument when mail compose initializes the editor shell
2) Don't spell check the entire document in mozInlineSpellChecker::SetEnableRealTimeSpell is called to enable inline spell checking. 
3) When recyling a compose window, turn off inline spell checking. This causes the inline spell check object to unregister it's event listeners on the editor shell. So the next time the recyled compose window is loaded, it won't see all of the editor notifications as the initial message contents get built by editor.

1 and 2 may have been working together to cause the document to get spell checked multiple times.  3 was causing recyled compose windows to be much slower than the initial compose window when opening the test message.

This patch isn't perfect. disabling then re-enabling inline spell check in the compose window causes you to see the same hang as before. Ditto for switching the language in the compose window. 

I think this document has so many words that are unrecognized, that our spell check selection is getting huge (over 22,000 elements) which makes the IsPointInSelection code really slow because no one envisioned a selection range with so many elements in it.
For grins, I commented out the code which adds mispelled word ranges to the spell check selection. This resulted in a substantial improvement to the amount of time we spend inline spell checking the entire document. Down to about 14 seconds on a debug build instead of 5+ minutes for this test message. 

Solution #1:

We could attempt to restrict the size of the spell check selection range by only underlining the first n mispellings of a word. This would in turn help limit the number of ranges we add to the spell check selection code. 

Add a hash table to the inline spell checker which stores: a word and an occurrence count. The occurrence count is the # of times the word has already been added to the selection range.

Every time we find a mispelled word, look it up in the hash table, if we have a match, check the occurrence count and only add the word to the spell check selection range if the occurrence count is less than n.

The occurrence count for our entry in the hash table only gets incremented if we end up adding the word to the spell check range.

When words are ignored or added to the dictionary, remove the entries from the hash table. 

Problems with this solution:

1) If you correct the spelling of a word yourself, we won't know this and won't know to adjust the occurrence account in our hash table for the word. Lets say you mispelled the word 'the' 25 times in a document and n is set to 10. Only the first 10 mispellings would get underlined. If you fix this ten yourself, we won't know to start underlining the next batch of 10. 

2) If you have a message the same size as this test message but every word is mispelled and is unique, we could still end up flooding the spell check selection range. 

Solution #2

A much simpler solution would be to have a global limit on the number of misspelled words we will add to the spell check selection range. Only the first n misspelled words will get underlined. Once the range reaches that size, no more words are added until words are removed from the spell check selection range. 

This would be much simpler to implement with less complexity and less overhead. No worries here with keeping track of the "occurrences" count for each word which would be buggy anyway. And it would work if every word was mispelled and unique.

Thoughts?
Status: NEW → ASSIGNED
(In reply to comment #11)
> Solution #2
> 
> A much simpler solution would be to have a global limit on the number of
> misspelled words we will add to the spell check selection range. Only the first
> n misspelled words will get underlined. Once the range reaches that size, no
> more words are added until words are removed from the spell check selection
> range. 

I like this solution.  The limit makes sense, too.  If I have a message where even 25% of the (unique) words are misspelled, there's a good chance I'm either a horrific speller and won't care anyway or the document contains a good amount of text content that really shouldn't be spell checked (e.g. code, log reports, etc.).  Even if that's not the case, fixing some of the issues should reveal more.  So if I'm reading this correctly, spell checking the document becomes incremental instead of all-at-once.  Make 'n' a hidden pref and set a good default value and I think we'll have the first issue solved.

One question though.  What happens in this case?...

[1. good text . . . . . . . . . . . . . . ]
[2. bad text containing N misspelled words]
[3. good text . . . . . . . . . . . . . . ]

User modifies a word in block 1 that results in a new misspelled word.  It's clear that the new misspelling will go unnoticed with the proposed change unless at least one word is corrected in block 2 first.  So what happens if a word is corrected in block 2 after the new misspelling in block 1?  Is block 1 re-checked?  Is there a most-recently-typed caching that is or could be applied?  (Please forgive my unfamiliarity w/ the current spell check implementation.)
adjusting the summary to be a bit more clear on what we'll actually be fixing for this bug. We may file some new bugs to track other areas for improvement. 
Summary: 100% CPU loop when trying to forward email → Poor Inline Spell Checking Performance On Large Text Documents
As previously discussed in comment 11, this patch adds the following optimizations to the inline spell checking code:

1) Add a pref controlled max value for the total number of words we allow into our spell check selection. Current default is 250 words. Stop trying to spell check text once we hit this number. Spell checking will begin again once the # of misspelled words dips below this threshold again.

This was implemented by adding AddRange and RemoveRange to mozInlineSpellChecker. The pref "extensions.spellcheck.inline.selection.size" was added to all.js because the spellchecker is built by thunderbird, firefox and the suite. Modifications were made to SpellCheckRange to kick out of the nested loops as soon as the spell check selection becomes full. 

2) Optimize Addword and IgnoreWord. We used to re-check the entire document whenever the user added or ignored a word. I wrote a new method which spell checks just the current spell check selection instead, thereby saving us from rechecking the entire document. This turned out to be a very nice win.

Modifications for this were made by re-writing SpellCheckSelection to handle passing in the spell check selection.

3) Some improvements were made to the mail compose window. 

Stop calling checkDocument when mail compose initializes the editor shell because mozInlineSpellChecker already calls checkDocument when inline spell checking gets initialized for the editor shell. No need to do it twice. 

When recyling a compose window, turn off inline spell checking. This causes
the inline spell check object to unregister it's event listeners on the editor
shell. So the next time the recyled compose window is loaded, it won't see all
of the editor notifications as the initial message contents get built by
editor, again generating needless attempts to spell check the document.One side effect of this improvement is that we may end up destroying the dictionary, reloading it the next time you bring up a compose window.
Attachment #212048 - Attachment is obsolete: true
Attachment #212173 - Flags: superreview?(bienvenu)
Testing Comments:

Feel free to manually change the pref to small value like 10 or so to make it easier to test. We'll call this number n (which is currently 250). You can unzip the test file in this bug and append it to a Local Folder. Then try to edit the message as new. 

1) We still spell check the document when you first bring up the compose window.

2) Once the inline spell checker finds n misspelled words, we stop underlining additional misspelled words.

3) deleting, ignoring or adding a word to thee dictionary should free up room for more mispellings. You can either type a misspelled word or arrow into and then out of an existing non-identified misspelled word and you should see it get underlined.

4) Enabling / Disabling inline spell checking (Options / Spell Check As You Type) should now be pretty snappy and not on the order of minutes. 

5) Switching dictionaries should also be pretty snappy and not on the order of minutes like it was before. 
Comment on attachment 212173 [details] [diff] [review]
inline spell check improvements

very nice.

couple nits:

-  return SpellCheckBetweenNodes(rootElem, 0, rootElem, -1, NULL);
+  for (PRInt32 index =0; index < count; index++)

need a space after the =, so = 0, not =0

"extensions.spellcheck.inline.selection.size";

this pref name is more of a backend code name, and a config editor user would have no clue what it meant - I'm not sure what else to call it - perhaps e.s.i.max_misspellings?
Attachment #212173 - Flags: superreview?(bienvenu) → superreview+
Comment on attachment 212173 [details] [diff] [review]
inline spell check improvements

mvl has been doing most of the peer reviews for me on the spell checker. 

mvl for more background, see comment 14 and comment 15.

Thanks
Attachment #212173 - Flags: review?(mvl)
*** Bug 315412 has been marked as a duplicate of this bug. ***
I renamed the pref to:

extensions.spellcheck.inline.max-misspellings

and fixed the typo.

I'm going to start to check this in so we can begin to get feedback on it. I'd still like an additional review from mvl even if it's after the fact, so feel free to still provide comments Michael. 
Comment on attachment 212173 [details] [diff] [review]
inline spell check improvements

high reward performance improvements to spell checking large documents.
Attachment #212173 - Flags: approval-branch-1.8.1+
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Keywords: fixed1.8.1
Resolution: --- → FIXED
Comment on attachment 212173 [details] [diff] [review]
inline spell check improvements

(sorry for the late reply)
The patch looks fine to me, but i don't know the inline spellcheck very well. If you need a complete review, i need more time.
*** Bug 339752 has been marked as a duplicate of this bug. ***
*** Bug 344004 has been marked as a duplicate of this bug. ***
Comment on attachment 212173 [details] [diff] [review]
inline spell check improvements

Does this patch still need review? I don't have time for doing the review, maybe you can look for somebody else.
Attachment #212173 - Flags: review?(mvl)
*** Bug 347681 has been marked as a duplicate of this bug. ***
*** Bug 349216 has been marked as a duplicate of this bug. ***
*** Bug 336843 has been marked as a duplicate of this bug. ***
*** Bug 341390 has been marked as a duplicate of this bug. ***
*** Bug 353620 has been marked as a duplicate of this bug. ***
*** Bug 354395 has been marked as a duplicate of this bug. ***
*** Bug 356671 has been marked as a duplicate of this bug. ***
OS: Windows XP → All
Hardware: PC → All
Blocks: 360434
*** Bug 329631 has been marked as a duplicate of this bug. ***
*** Bug 361361 has been marked as a duplicate of this bug. ***
*** Bug 212358 has been marked as a duplicate of this bug. ***
verified fixed 1.8.1.3 using Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.3) Gecko/20070326 Thunderbird/2.0.0.0 Mnenhy/0.7.5.0 ID:2007032620 and verified fixed after some tests.
Keywords: verified1.8.1.3
See Also: → 1743318
You need to log in before you can comment on or make changes to this bug.