Closed Bug 1565919 Opened 1 year ago Closed 1 year ago

Firefox in-built spellcheck for English en-US not working properly - single quote marks

Categories

(Core :: Spelling checker, defect, P2, critical)

68 Branch
defect

Tracking

()

VERIFIED FIXED
mozilla70
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 69+ verified
firefox68 - wontfix
firefox69 + verified
firefox70 + verified

People

(Reporter: Babelfish0101, Assigned: m_kato)

References

(Regression)

Details

(Keywords: regression)

Attachments

(1 file)

User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0

Steps to reproduce:

I was writing a message in response to a thread at a mozillaZine forum. In the text I entered the word (including the single quote marks) 'update' the Firefox inbuilt speller marked the word as incorrect. I tried entering the word (including the double quote marks) "update" the speller did not mark the word as incorrect. In Firefox I use the Marco Pinto English GB speller list -- I keep this up to date at the latest version.

I emailed Marco Pinto suggesting that this might be bug in his speller. He wrote back saying that likely it was a bug, or some other recent change, Mozilla had done with the spellchecker component to Firefox -- perhaps the problem lay with Hunspell.

(I would note that I use the Marco Pinto GB speller also in Thunderbird Portable (from PortableApps). The error with the speller as described above does not occur in Thunderbird. Likewise I use the Marco Pinto GB speller in LibreOffice Writer, again, the error described above does not occur in that.)

Actual results:

Please see above.

Expected results:

I would expect that the word update enclosed in single quote marks should not be flagged as an error in Firefox. I have a suspicion that the Firefox inbuilt speller is not making allowances for people enclosing a word in single quote marks.

Component: Untriaged → Spelling checker
Product: Firefox → Core

Confirming
Tested on: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0 (latest Nightly)

Enter text example: 'test' Note Spellcheck thinks its misspelled. Works properly in Chrome.

Many posters to various forums etc will use the 'single quote' to emphasize a point. Double quotes are usually reserved for direct quotes from another post/party/person.

It makes no difference which dictionary you are using, as the built-in spell check in Firefox is doing the same as noted.

Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: regression
Regressed by: 1418629

Yes, also visible in Thunderbird 68.

Has Regression Range: --- → yes
Has STR: --- → yes

Makoto, any chance you could please have a look? Thanks!

Flags: needinfo?(m_kato)

I will look this

Assignee: nobody → m_kato
Flags: needinfo?(m_kato)

I don't know if this will be of any help but I have a suspicion that there might be a conflict in the inbuilt speller such that it expects a terminating single quote mark to be trying to delineate the start of a grammatically 'possessive' word e.g. "The sheep's wool was white." would be considered a proper word construction and it doesn't get marked as incorrect. But in the case of "The so-called 'sheep' wool was yellow and that made people suspicious." The speller identifies that as a not properly terminated possessive (in English no 's' added after the single quote mark - though there are grammatically legitimate exceptions to this), and hence marks it, quite correctly, as wrong against that rule. So I think there might be some kind of conflict going on at the algorithm level. Between people using two single quote marks, one at the beginning and one at the end, to emphasise a word or phrase and people using a single quote mark to denote a possessive. Just guessing of course but it looks a bit like that to me.

Exceptions to the rule would be something like "James' jacket was all crumpled." Grammatically that is valid but the speller marks it as wrong because there is no terminating 's' for James and the speller is coded such that it assumes there must be terminating 's'. Assuming that is a rule the speller is applying.

From the point of view of constructing algorithms for the speller to operate on it all seems like a bit of minefield for a coder to me.

General rules for possessives: Forming the Possessive

Hope this helps. Please ignore if I've just muddied the waters.

Doubtful we'll do anything for 68 at this point, but it would definitely be nice to get this fixed in time for 69/68.1esr.

FWIW, I'll put any fix we get here into TB asap.

This is not only broken with Marco's en-GB dictionary but also with the inbuilt en-US dictionary.

Things like 'update' but also quotes/citations like 'To be or not to be' get a misspelling shown on the word before the closing quote. Highly confusing. EDIT: We can't ship Thunderbird like this since we'd get too many complaints as people use it for writing.

Severity: normal → critical
Summary: Firefox in-built spellcheck for English GB not working properly - single quote marks → Firefox in-built spellcheck for English en-US not working properly - single quote marks

"Things like 'update' but also quotes/citations like 'To be or not to be' get a misspelling shown on the word before the closing quote."

Actually, if you look very closely you can see that for a word/phrase like 'wrong update' what actually gets marked as a misspelling is the word itself + the terminating quote mark. That's not immediately obvious to casual glance but a closer look and it can be seen.

Can I ask a very naive question: Would backing out bug 1418629 and parts of bug 1362858 be an option as I've done for TB 68.0?
https://hg.mozilla.org/releases/mozilla-esr68/rev/b62ef04e0947bf0d621cc87c2314ac3a4a660359
https://hg.mozilla.org/releases/mozilla-esr68/rev/0fb2663f9e540d943e9b6ef6617ead9dca2a5981

EDIT: Would that regress bug 1362858? I don't know what the snippet I reversed:
https://hg.mozilla.org/releases/mozilla-esr68/rev/0fb2663f9e540d943e9b6ef6617ead9dca2a5981#l1.12
tried to achieve. We ran all of Thunderbird 60.x with that backout and didn't see any adverse effects.

Duplicate of this bug: 1569033

Macro just pointed out in bug 1569033 that possessives of plurals are also affected, as in: The owners' cars can be parked on the houses' drives.
That's really pretty bad.

The priority flag is not set for this bug.
:smaug, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(bugs)

This is pretty bad. If nothing else, bug 1418629 should be backed out. But perhaps the fix is rather simple.

Flags: needinfo?(bugs)
Priority: -- → P2

Makoto, could we backout bug 1418629 from 69 at least?

Flags: needinfo?(m_kato)

As I said, can someone check what the initial fix in bug 1362858 does and whether it can be reverted, see comment #10.

(In reply to Olli Pettay [:smaug] from comment #15)

Makoto, could we backout bug 1418629 from 69 at least?

Hmm, and FF 68 ESR, no?

Sure, but let's start with 69 :)

Personally, I'd start with trying to understand the issue starting at bug 1362858. There Ehsan and Evelyn, who sadly left the project, came up with a patch, and reading through this bug doesn't fill me with so much confidence at all. As I said, TB 60.x ran with a partial backout for a year without adverse effects, but then of course our users "browse" their e-mail and not Facebook or Google Docs/Sheets.

(In reply to Jorg K (GMT+2) from comment #19)

Personally, I'd start with trying to understand the issue starting at bug 1362858. There Ehsan and Evelyn, who sadly left the project, came up with a patch, and reading through this bug doesn't fill me with so much confidence at all. As I said, TB 60.x ran with a partial backout for a year without adverse effects, but then of course our users "browse" their e-mail and not Facebook or Google Docs/Sheets.

Bug 1362858 has absolutely nothing to do with the bug at hand. This regression is very clearly caused by bug 1418629 and its fix will be either in the code added there or through adding more code related to handling the single quote character in determining word boundaries in spell checker.

Hi Ehsan, look like we differ here. Bug 1362858 caused bug 1418629, and bug 1418629 caused this bug here. How can it have absolutely nothing to do? It's a clear dependency chain. I just try to understand what the fix is about in bug 1362858. Apart from some reshuffling and part 2, these are the changes in part 1: https://hg.mozilla.org/mozilla-central/rev/eaf99ba3813a

TextNodeContainsDOMWordSeparator(nsINode* aNode,
                                  int32_t aBeforeOffset,
                                  int32_t* aSeparatorOffset)
 {
   // aNode is actually an nsIContent, since it's eTEXT
   nsIContent* content = static_cast<nsIContent*>(aNode);
   const nsTextFragment* textFragment = content->GetText();
   NS_ASSERTION(textFragment, "Where is our text?");
-  for (int32_t i = std::min(aBeforeOffset, int32_t(textFragment->GetLength())) - 1; i >= 0; --i) {
-    if (IsDOMWordSeparator(textFragment->CharAt(i))) {
+  nsString text;
+  int32_t end = std::min(aBeforeOffset, int32_t(textFragment->GetLength()));
+  bool ok = textFragment->AppendTo(text, 0, end, mozilla::fallible);
+  if(!ok)
+    return false;
+
+  WordSplitState state(nullptr, text, 0, end);
+  for (int32_t i = end - 1; i >= 0; --i) {
+    if (IsDOMWordSeparator(textFragment->CharAt(i)) ||
+        state.ClassifyCharacter(i, true) == CHAR_CLASS_SEPARATOR) {
       // Be greedy, find as many separators as we can
       for (int32_t j = i - 1; j >= 0; --j) {
-        if (IsDOMWordSeparator(textFragment->CharAt(j))) {
+        if (IsDOMWordSeparator(textFragment->CharAt(j)) ||
+            state.ClassifyCharacter(j, true) == CHAR_CLASS_SEPARATOR) {
           i = j;
         } else {
           break;
         }
       }
       *aSeparatorOffset = i;
       return true;
     }

What are those changes about and do they regress that bug if removed? Can they be tweaked slightly without causing bug 1418629?

(In reply to Olli Pettay [:smaug] from comment #15)

Makoto, could we backout bug 1418629 from 69 at least?

OK, I will back out this. But I found another regression (email case) by bug bug 1362858 when I test a lot. There was no test cases.

(In reply to Jorg K (GMT+2) from comment #21)

What are those changes about and do they regress that bug if removed? Can they be tweaked slightly without causing bug 1418629?

Right. I guess that this doesn't also consider some cases (email address, URL and etc.). I don't know whey Thunderbird users don't report email address case. (When typing dee@dee.com, dee@dee.com shouldn't check spell. After bug 1362858, dee in domain part is misspelled word).

Flags: needinfo?(m_kato)

Original regression was by bug 1362858, and bug 1518629 wasn't enough to fix.

By bug 1362858, we use CHAR_CLASS_SEPARATOR in additional to DOM word separator. But some characters such as single quote, @ and etc are CHAR_CLASS_SEPARATOR, so we may check spell by incomplete word.

We shouldn't separate word by characters that is email part, URL part or conditional punctuation.

And I also update test cases for this situation. <textarea> is better for spell checking since it can has multiple anonymous text nodes.

Makoto-san, please edit comment #24, it's bug 1418629 and not bug 1518629.

E-Mail users didn't complain, since bug 1362858 was always backed out on the ESR channel. Maybe beta users were more tolerant.

FWIW, the patch from the try run in comment #23 appears to restore the desired behaviour.

Pushed by m_kato@ga2.so-net.ne.jp:
https://hg.mozilla.org/integration/autoland/rev/15fbd2a9c32d
Don't separate words by contextual-based character. r=Ehsan
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla70

Please take the necessary steps to get this backported to FF 69 beta and FF 68 ESR.

Please make it available for FF 68 ESR.

Thanks!

Hi Makoto, since 69 is marked as affected, could you please nominate an uplift of this fix to beta69? Thanks!

Flags: needinfo?(m_kato)

Comment on attachment 9081524 [details]
Bug 1565919 - Don't separate words by contextual-based character.

Beta/Release Uplift Approval Request

  • User impact if declined: This is regression by bug 1418629 and bug 1362858. When inputting 'test' (word between single quote), word with single quote becomes misspell word although word is correct spell.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: Bug 1570861
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Due to reverting bug 1418629 fix, original report of this issue is fixed by reverting.
    Also, although spellchecker scan current text for checking, this fix allows to scan more text until correct word break.
  • String changes made/needed: No

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: This issue is spellchecker regression that is in 68. Spellchecker shows misspell for correct word when user uses single quote.
  • User impact if declined: This is regression by bug 1418629 and bug 1362858. When inputting 'test' (word between single quote), word with single quote becomes misspell word although word is correct spell.
  • Fix Landed on Version: 70
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Due to reverting bug 1418629 fix, original report of this issue is fixed by reverting.
    Also, although spellchecker scan current text for checking, this fix allows to scan more text until correct word break.
  • String or UUID changes made by this patch: No
Flags: needinfo?(m_kato)
Attachment #9081524 - Flags: approval-mozilla-esr68?
Attachment #9081524 - Flags: approval-mozilla-beta?

(In reply to Makoto Kato [:m_kato] from comment #32)

Makoto, there is no uplift request in Bug 1570861

Flags: needinfo?(m_kato)

(In reply to Pascal Chevrel:pascalc from comment #33)

(In reply to Makoto Kato [:m_kato] from comment #32)

Makoto, there is no uplift request in Bug 1570861

I request uplift it too.

Flags: needinfo?(m_kato)

Comment on attachment 9081524 [details]
Bug 1565919 - Don't separate words by contextual-based character.

Fixes a bug causing some words to be incorrectly marked as misspelled due to some punctuation marks. Approved for 69.0b13 and 68.1esr.

Attachment #9081524 - Flags: approval-mozilla-esr68?
Attachment #9081524 - Flags: approval-mozilla-esr68+
Attachment #9081524 - Flags: approval-mozilla-beta?
Attachment #9081524 - Flags: approval-mozilla-beta+
Flags: qe-verify+
QA Whiteboard: [qa-triaged]

I have reproduced this issue in Nightly v70.0a1 (2019-07-14) (64-bit) and confirmed the fix in Nighlty v70.0a1 (2019-08-12) (64-bit), Beta v69.0b13 and ESR v68.1.0esr (32-bit) on Windows 10 and Mac OS 10.14.6.

Status: RESOLVED → VERIFIED
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.