Bug 777529 - Superscript 1, 2, and 3 characters are marked as misspelled.
Product: Core
Classification: Components
Component: Spelling checker
Version: 14 Branch
Platform: All All
Importance: -- normal
Target Milestone: mozilla17
Assigned To: Aryeh Gregor (:ayg) (next working March 28-April 26)
: Jet Villegas (:jet)
Depends on: 779551
Reported: 2012-07-25 15:30 PDT by Brian Beloian
Modified: 2012-08-01 10:14 PDT
ayg: in‑testsuite+

Patch (2.20 KB, patch)
2012-07-30 07:55 PDT, Aryeh Gregor (:ayg) (next working March 28-April 26)
ehsan: review+

Description Brian Beloian 2012-07-25 15:30:17 PDT
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20100101 Firefox/14.0.1
Build ID: 20120713134347

Steps to reproduce:

I typed the superscript 1, 2, or 3 (¹ ² ³) characters in a textarea.

Actual results:

The characters were marked as misspelled.

Expected results:

As numbers, they should not have been checked in the first place.
Comment 1 :Ehsan Akhgari 2012-07-26 15:09:34 PDT
Thanks for the bug report, Brian!

We even have a comment here about this: <>  :-)

Simon: is there a proper Unicode way of determining this?
Comment 2 Simon Montagu :smontagu 2012-07-26 15:23:37 PDT
mozilla::unicode::GetGenCategory(ch) == nsIUGenCategory::kNumber 
(that doesn't account for supplementary characters, but neither does the current code)
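
The caveat about supplementary characters can be illustrated with a small sketch. It uses Python's standard unicodedata module as a stand-in for the Gecko lookup, and it assumes the check is applied to one UTF-16 code unit at a time; the helper names utf16_code_units and is_number_code_unit are hypothetical and do not come from the Mozilla source.

```python
import unicodedata


def utf16_code_units(text: str) -> list:
    """Split text into UTF-16 code units, as a 16-bit string type would store it."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def is_number_code_unit(unit: int) -> bool:
    """Per-code-unit test: true when the general category is Nd, Nl or No."""
    return unicodedata.category(chr(unit)).startswith("N")


superscript_two = "\u00b2"      # SUPERSCRIPT TWO, a BMP character (one code unit)
bold_digit_one = "\U0001d7cf"   # MATHEMATICAL BOLD DIGIT ONE, a supplementary character

# The BMP character is a single code unit and is recognized as a number.
print([is_number_code_unit(u) for u in utf16_code_units(superscript_two)])

# The supplementary character is stored as two surrogates; neither half is a
# number on its own, even though the full code point has category Nd.
print([is_number_code_unit(u) for u in utf16_code_units(bold_digit_one)])
print(unicodedata.category(bold_digit_one))
```

Under that assumption, a per-code-unit check handles ¹ ² ³ correctly but misses digits outside the Basic Multilingual Plane unless surrogate pairs are combined first.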
Comment 3 :Ehsan Akhgari 2012-07-26 17:10:18 PDT
(In reply to comment #2)
> mozilla::unicode::GetGenCategory(ch) == nsIUGenCategory::kNumber 
> (that doesn't account for supplementary characters, but neither does the
> current code)

So that will only evaluate to true for '0'-'9'?
Comment 4 Simon Montagu :smontagu 2012-07-27 00:15:38 PDT
Is that what you want? I thought you wanted to identify any number in any script.

There are three "Number" categories in the Unicode Character Database (UCDB): Nd, Decimal number; Nl, Letter number; and No, Other number.

Nd includes sets of digits from 0-9 used in a decimal positional system, like ASCII 0-9, Arabic ٠-٩, Devanagari ०-९, etc., etc., etc.

Nl includes things like Roman numerals, U+3007 IDEOGRAPHIC NUMBER ZERO etc.

No includes the subscript and superscript digits and fractions and all the circled numbers, parenthesized numbers and so on.
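
The three categories can be checked against a few sample characters. The sketch below uses Python's standard unicodedata module rather than any Gecko API; the SAMPLES table and the mismatches helper are illustrative choices, not part of the bug.

```python
import unicodedata

# Sample code points grouped by the general category they are expected to have.
SAMPLES = {
    "Nd": ["7", "\u0663", "\u0969"],             # ASCII, Arabic-Indic, Devanagari digits
    "Nl": ["\u2163", "\u3007"],                  # ROMAN NUMERAL FOUR, IDEOGRAPHIC NUMBER ZERO
    "No": ["\u00b9", "\u00b2", "\u00b3",         # superscript one, two, three
           "\u00bd", "\u2460"],                  # VULGAR FRACTION ONE HALF, CIRCLED DIGIT ONE
}


def mismatches(samples: dict) -> list:
    """Return (char, expected, actual) for every sample whose category differs."""
    return [
        (ch, expected, unicodedata.category(ch))
        for expected, chars in samples.items()
        for ch in chars
        if unicodedata.category(ch) != expected
    ]


for expected, chars in SAMPLES.items():
    for ch in chars:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.category(ch)}")

# An empty list means every sample landed in the category described above.
print(mismatches(SAMPLES))
```

The superscript characters from the bug report (¹ ² ³) all fall under No, which is why a test limited to decimal digits does not cover them.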

There should be complete lists of which Unicode code points are in which category at and, but that site seems to be down right now.

mozilla::unicode::GetGenCategory(ch) == nsIUGenCategory::kNumber will evaluate to true for codepoints in any of those categories. For a more fine-grained test one could use mozilla::unicode::GetGeneralCategory(ch) and test against some combination of HB_UNICODE_GENERAL_CATEGORY_DECIMAL_NUMBER,  HB_UNICODE_GENERAL_CATEGORY_LETTER_NUMBER and HB_UNICODE_GENERAL_CATEGORY_OTHER_NUMBER.
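
The difference between the coarse and the fine-grained test can be sketched as follows. This is an analogue written with Python's standard unicodedata module, not the Gecko or HarfBuzz API; is_number, is_number_in and DECIMAL_ONLY are hypothetical names chosen for the illustration.

```python
import unicodedata

NUMBER_CATEGORIES = frozenset({"Nd", "Nl", "No"})
DECIMAL_ONLY = frozenset({"Nd"})


def is_number(ch: str) -> bool:
    """Coarse test: true for any of the three Number categories."""
    return unicodedata.category(ch) in NUMBER_CATEGORIES


def is_number_in(ch: str, categories: frozenset) -> bool:
    """Fine-grained test: true only for the chosen subset of categories."""
    return unicodedata.category(ch) in categories


for ch in ["5", "\u00b2", "\u2163", "a"]:
    print(
        f"U+{ord(ch):04X}",
        "coarse:", is_number(ch),
        "decimal only:", is_number_in(ch, DECIMAL_ONLY),
    )
```

The coarse test accepts the superscript and Roman-numeral characters as well as ASCII digits, while the decimal-only subset rejects them, so the coarse form is the one that matches the behavior requested in this bug.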
Comment 5 :Ehsan Akhgari 2012-07-27 07:33:02 PDT
Thanks for the explanation.  I think we'll want to use mozilla::unicode::GetGenCategory(ch) == nsIUGenCategory::kNumber then.  Aryeh, can you please take a look at this?
Comment 6 Aryeh Gregor (:ayg) (next working March 28-April 26) 2012-07-30 07:55:29 PDT
Created attachment 647173

Are reftests really the best way to test this?  If so, is there any good way to make sure the test isn't passing for some unrelated reason?  It seems fragile.  For instance, initially I forgot to add the focus() lines, and wouldn't have realized that made the test worthless if I didn't have a habit of reverting the code changes and making sure the test fails.
Comment 7 :Ehsan Akhgari 2012-07-30 09:28:39 PDT
Comment on attachment 647173

Please name the reftest something like spellcheck-superscript.html to keep the names in sync with the rest of the spell checker reftests.  Also please use needs-focus.

To make the test more reliable, you should also add a second version of the test that has another piece of text in the textarea (such as blahblah) which will get marked as misspelled, and mark that one as != against the reference. This makes sure that you get no misspelling markers for the superscript characters alone, and that you do get some when misspelled text is added to the textarea.

r=me with the above.
Comment 8 Aryeh Gregor (:ayg) (next working March 28-April 26) 2012-07-31 03:57:07 PDT
Comment 9 Ryan VanderMeulen [:RyanVM] 2012-07-31 19:19:49 PDT