GetNextMisspelledWord() returns misspelled words in style blocks (see comment #13)

RESOLVED FIXED in Firefox 45

Status

()

Core
Spelling checker
RESOLVED FIXED
2 years ago
a year ago

People

(Reporter: najoll, Assigned: Jorg K (GMT+2))

Tracking

Trunk
mozilla45
Points:
---

Firefox Tracking Flags

(firefox45 fixed)

Details

Attachments

(1 attachment, 5 obsolete attachments)

(Reporter)

Description

2 years ago
User Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:41.0) Gecko/20100101 Firefox/41.0
Build ID: 20151014143721

Steps to reproduce:

In case (1) I checked the spelling of an e-mail by manually initiating a spell-check. (2) In case 2, check-as-you-type is enabled and a word is flagged as misspelled. (I am reporting two bugs.)

Problem 2 has been around for *years*.


Actual results:

Re 1: Sometimes (not always) non-printing characters - describing fonts, I think - appear as misspelled words.

Re 2: Sometimes (not always) I add a word to the dictionary and yet the word retain ths wobbly underline.


Expected results:

Obviously: in case 1, these words should not be flagged; in case 2, the underline should disappear.

By the way: I have not checked exhaustively to ensure that this bug has not been (or that these bugs, plural, have not been) reported before. Nor have I tested the Nightly version. Why should I have to do that?
(Reporter)

Comment 1

2 years ago
More information: problem 1 occurs on both Windows and Linux. I have had problem 2 on Windows; it may also affect Linux. Both problems may affect other operating systems too.
(Assignee)

Comment 2

2 years ago
What are "non-printing" characters? How do you see or insert them? Can you give an example?
Do you have an example for case 2?

This looks like a duplicate of bug 420817.

Also, this is in the wrong product/component: This should move to Core/Spelling Checker.
I will move it unless I close it as a duplicate.
Flags: needinfo?(joll.nicholas)
(Reporter)

Comment 3

2 years ago
Thanks for the response. It is hard for me to supply much more information, because the problem is intermittent. I can say the following. When I wrote of 'non-printing characters - describing fonts, I think' I meant strings of text, at least typically without spaces I think, that seem to name fonts, i.e. text along the lines of:

timesnewroman

Please note that this cannot be an exact duplicate of the other bug you mention, because the present bug affects me on Windows (as well as, I seem to recall, Linux). Also, I have encountered this behaviour for years.

Thanks.
Flags: needinfo?(joll.nicholas)
(Assignee)

Comment 4

2 years ago
Sorry, I don't get it. Where does "timesnewroman" appear? Surely it's a spelling error when it appears in the text.

If you can't deliver a reproducible test case, no one will look at the bug.
(Assignee)

Updated

2 years ago
Component: Untriaged → Spelling checker
Product: Thunderbird → Core
Version: 42 → 42 Branch
(Assignee)

Comment 5

2 years ago
Note that Thunderbird's spell checking functionality is supplied by Mozilla's Gecko engine. However, it might be easier to reproduce in Thunderbird.

You said: You add the word to the dictionary and the red spell check mark is not removed. Can that not be repeated? What would such a word be? Can you not take note of the word next time it happens?

Since the spell checking generally works, I assume that the text contains HTML formatting which isn't correctly removed before adding the word to the personal dictionary. There are two things you can do:

When it happens, immediately check the personal dictionary for the newly added word:
The personal dictionary is stored in a file "persdict.dat" in your profile folder, on Windows:
C:\Users\<username>\AppData\Roaming\Thunderbird\Profiles\<profile-name>\persdict.dat
On Linux:
~/.thunderbird/<profile-name>/persdict.dat

Save the message as a draft at that point and inspect the HTML. Alternatively you can install the "Stationary" add-on (https://addons.mozilla.org/en-GB/thunderbird/addon/stationery/) which allows a HTML view of the message.

We need to have a reproducible case.
(Reporter)

Comment 6

2 years ago
I have caused a little difficulty in reporting two bugs in one report, I fear. However, I do feel we are now getting somewhere. Thank you for your patience.

As to the problem that I marked '1', which concerns (as I put it)'non-printing characters - describing fonts, I think - appear as misspelled words': here is what happens. I create a new e-mail. I enter some text in it; let's say that the whole of that text comprises the words, 'See you later.' Next, I check the spelling of the e-mail. The spell-checker finds (or rather, in scare-quotes, 'finds') words - misspelled words - in the e-mail *that I did not type and which do not show up but seem instead to be be hidden font information*, as in the example I gave above. Words ('words') that this has happened with (working from, in fact, my persdict.dat; I must have added these terms to stop them being flagged): MsoListContinue; MsoFootnoteReference; charset; quotewithtag. I think that last one may be a clue. I have a Microsoft Word style of that name. So perhaps the problem owes to text pasted from Word. In fact, I've just now tested it, and this is indeed the case. To verify: open Word. Type 'testing' in a new document. Apply a style to that word. Select and copy the word into Thunderbird. Run spell-check. Note that the problem occurs even if one chooses 'paste without formatting'. Or at least it does on my (Windows) system. (Haven't tested all this on Linux yet.)

As to the problem I numbered '2', and which is about adding words to the dictionary: one can reproduce as follows. Create a new e-mail. Enter text including the word 'Slartibartfast'. Spell-check it. Add 'Slartibartfast' to the dictionary. Observe that the wavy red line persists. (This does not seem to happen in Firefox - only Thunderbird; and, again I am testing at present on Windows.)
(Assignee)

Comment 7

2 years ago
OK, problem 1) is quite clear.

When you paste text from MS Word, it is pasted as HTML with a lot of style information. Do as I said: Save the message and inspect the HTML source of the draft. You will find a lot of tags with classes whose names start with "Mso".

I've just repeated what you said:
copied a styled word from MS Word and ran the spell check. For one pasted word "testing" I get a million spell check errors, like for example "Cambria", which is part of the HTML.

So a proper summary of part 1 is:

When pasting text from MS Word into a Thunderbird mail composition, a subsequent manual spellcheck shows many errors. Words that are not on the pasted text, but in the HTML (eg. "Cambria") are flagged as misspelled.

This is indeed a Thunderbird problem.

The other problem I cannot reproduce. I added the text:
  this is a Slartibartfast 
I ran the manual spell check. I added the word to the dictionary. Spell check mark persisted. When I closed the spell check panel, the red spell check mark disappeared. In fact, I tried with a few misspelled words, which I all added to the dictionary. Their spell check marks persists until I close the spell check panel. I'm running TB Earlybird 44. Lost of changes in spell checking have happened in Gecko 44, so I suggest to retest this with TB of at least version 44. In fact, I tried in TB 38 just now, and after dismissing the panel, the red mark persisted as you describe.

So part 1 is a problem, part 2 is fixed. I will move the bug back to the Thunderbird product.
(Assignee)

Updated

2 years ago
Component: Spelling checker → Message Compose Window
Product: Core → Thunderbird
Version: 42 Branch → Trunk
(Assignee)

Updated

2 years ago
Summary: Spell check problems → Manual spell checker flags many words in the HTML source of the message as spelling errors (see comment #8)
(Assignee)

Comment 8

2 years ago
When pasting text from MS Word into a Thunderbird mail composition, a subsequent manual spellcheck shows many errors. Words that are not on the pasted text, but in the HTML source are flagged as misspelled.

Here is a little list of words that the spell check complains about:
Calibri, panose, mso, charset, swiss, MsoNormal, etc.

Here's an excerpt from the HTML source of the e-mail:
<!--
 /* Font Definitions */
 @font-face
	{font-family:Calibri;
	panose-1:2 15 5 2 2 2 4 3 2 4;
	mso-font-charset:0;
	mso-generic-font-family:swiss;
	mso-font-pitch:variable;
	mso-font-signature:-536870145 1073786111 1 0 415 0;}
 /* Style Definitions */
 p.MsoNormal, li.MsoNormal, div.MsoNormal

So it looks to me that HTML comments are not skipped but indeed spell checked.
Maybe a M-C core problem after all, I'd have to investigate.

Neil, can you lend a hand here?
Flags: needinfo?(neil)
(Assignee)

Comment 9

2 years ago
Fixing derailed grammar from comment #7:
In fact, I tried with a few misspelled words, which I all added to the dictionary. Their spell check marks persisted until I closed the spell check panel. I'm running TB Earlybird 44. *LOTS* of changes in spell checking have happened in Gecko 44, so I suggest to retest this with TB of at least version 44.
(Assignee)

Updated

2 years ago
Status: UNCONFIRMED → NEW
Ever confirmed: true
(Assignee)

Comment 10

2 years ago
Created attachment 8686130 [details] [diff] [review]
Mochitest to simulate the problem in Firefox

I created a Mochitest to simulate the problem in Firefox. I pasted some text from MS Word into an e-mail and then incorporated the whole HTML from the e-mail into the test.
Result: The test passes. The only misspelled word returned is the one that is expected. So at least here, the M-C spell checker has skipped the comments and other stuff it shouldn't be spell-checking.

Looks like I do need the inspiration from Neil ;-)
(Assignee)

Comment 11

2 years ago
Created attachment 8686177 [details]
Draft message demonstrating the problem.

Run a manual spell check in this message to see the problem. "Calibri" is the first misspelled word.
(Assignee)

Comment 12

2 years ago
Created attachment 8686232 [details]
Draft message demonstrating the problem (simplified)

OK, all you need is this in the message:

<html>
<head>
</head><body bgcolor="#FFFFFF" text="#000000">
<p>testing</p>
<style>
<!-- huhu -->
</style>
</body>
</html>

If you remove the <style> tags the comment is not spell checked. Weird.
(Assignee)

Updated

2 years ago
Component: Message Compose Window → Spelling checker
Product: Thunderbird → Core
(Assignee)

Comment 13

2 years ago
Created attachment 8686258 [details] [diff] [review]
Mochitest to simulate the problem in Firefox

OK, forget Thunderbird. This can be reproduced in Firefox alone.

The attached test prints:
3 INFO TEST-PASS | editor/composer/test/test_bug1219928.html | one misspelled word expected: misssspelled

misssspelled
huhu

'misssspelled' and 'huhu' come from two calls to GetNextMisspelledWord();

'huhu' is in a comment and should not be returned. Note that when removing the <style> tags, 'huhu' is not returned.
Attachment #8686130 - Attachment is obsolete: true
Attachment #8686177 - Attachment is obsolete: true
Attachment #8686232 - Attachment is obsolete: true
Flags: needinfo?(neil)
(Assignee)

Updated

2 years ago
OS: Unspecified → All
Hardware: Unspecified → All
Summary: Manual spell checker flags many words in the HTML source of the message as spelling errors (see comment #8) → GetNextMisspelledWord() returns misspelled words in HTML comments under certain circumstances (see comment #13)
(Assignee)

Comment 14

2 years ago
Robert, can you please give me your opinion on this.

This is the horrible HTML that we get when pasting from MS Word (simplified):

<div contenteditable id="en-US" lang="en-US">
<p>And here a misssspelled word</p>
<style>
<!-- huhu -->
</style>
</div>

Surely <!-- --> is a HTML comment and not a CSS comment.

The spell check selection returns one misspelled word, 'misssspelled'.
GetNextMisspelledWord() when called twice returns two, 'misssspelled' and 'huhu'. If you remove the <style> tags, it returns only one misspelled word.

So something is a little inconsistent here, do you agree?

The Thunderbird spell check calls GetNextMisspelledWord() and presents all a lot of confusing words to the user which originate from HTML comments inside <style> tags (see comment #8 for more details).
Flags: needinfo?(roc)
(Assignee)

Comment 15

2 years ago
Created attachment 8686731 [details] [diff] [review]
Proposed fix together with test.

I think we can safely skip style blocks in the spell check.
What do you think?
Attachment #8686258 - Attachment is obsolete: true
Attachment #8686731 - Flags: review?(roc)
(Assignee)

Updated

2 years ago
Summary: GetNextMisspelledWord() returns misspelled words in HTML comments under certain circumstances (see comment #13) → GetNextMisspelledWord() returns misspelled words in style blocks (see comment #13)
(Assignee)

Comment 16

2 years ago
Green try run here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=303769bd1dfb
(Reporter)

Comment 17

2 years ago
I (the original poster) should add that the problem seems to affect not just the main message text but also *signatures* that were created in Word.
(Assignee)

Comment 18

2 years ago
Sure, anything created in MS Word will have horrible HTML that trips up the spell checker. My patch proposes to skip more "invisible formatting" in the spell check.
(Reporter)

Comment 19

2 years ago
Jorg K: Well, I do hope your patch gets accepted. Here I might add that surely this problem affects *thousands* of users - because many people will paste from MS Word and some people will use signatures created in MS Word.

In the meantime: is there a way I can edit/cleanup my e-mail signature to avoid these problems? Note that the signature contains formatting and links and that I want to preserve all that.
(Assignee)

Comment 20

2 years ago
Edit the e-mail signature with a decent HTML editor, or better, with a plain text editor, like Notepad++.
Here is how one of my signatures looks like:
==
<p>Jörg Knobloch - 
<a href="http://www.jorgk.com">www.jorgk.com</a> -
<a href="mailto:jorgk@jorgk.com">jorgk@jorgk.com</a>
</p>
<img border="0" moz-do-not-send="true" src="http://www.jorgk.com/images/deflag.png" width="50" height="34">
==
Let's please stop the discussion about support issues here. Developers and reviewers don't appreciate bugs with many comments and comments that aren't really related to the bug. We understand the cause of the problem - thanks for reporting it - and it will get fixed.
(Assignee)

Comment 21

2 years ago
Comment on attachment 8686731 [details] [diff] [review]
Proposed fix together with test.

I've got to know Neil on another bug and he's happy to review this.
Flags: needinfo?(roc)
Attachment #8686731 - Flags: review?(roc) → review?(enndeakin)
(Assignee)

Updated

2 years ago
Assignee: nobody → mozilla
Status: NEW → ASSIGNED
Comment on attachment 8686731 [details] [diff] [review]
Proposed fix together with test.

>+function spellCheckStarted() {
>+  var gMisspelledWord = gSpellChecker.GetNextMisspelledWord();

The 'g' prefix shouldn't be used for non-globals.
Attachment #8686731 - Flags: review?(enndeakin) → review+
(Assignee)

Comment 23

2 years ago
Created attachment 8691434 [details] [diff] [review]
Proposed fix together with test (nits fixed).

Carrying forward Neil Deakin's r+.

Thanks for the quick review.
Sorry about the badly named variables, it was a quick cut/paste job out of some existing code. I've fixed them now.
Attachment #8686731 - Attachment is obsolete: true
Attachment #8691434 - Flags: review+
(Assignee)

Comment 24

2 years ago
Dear Sheriff,

this is a very small tweak (one line!) that won't break anything, promise ;-)
It's not urgent to land, so in order to save machine resources, it would be best to combine it with some other landings.

Thank you in advance.
Keywords: checkin-needed

Comment 25

2 years ago
https://hg.mozilla.org/integration/fx-team/rev/61ad0b3da56a
Keywords: checkin-needed

Comment 26

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/61ad0b3da56a
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
status-firefox45: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla45
(Assignee)

Updated

a year ago
Duplicate of this bug: 222747
You need to log in before you can comment on or make changes to this bug.