User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20100101 Firefox/14.0.1
Build ID: 20120713134347
Steps to reproduce:
I typed the superscript 1, 2, or 3 (¹ ² ³) characters in a textarea.
The characters were marked as misspelled.
As numbers, they should not have been checked in the first place.
Thanks for the bug report, Brian!
We even have a comment here about this: <http://mxr.mozilla.org/mozilla-central/source/extensions/spellcheck/src/mozInlineSpellWordUtil.cpp#952> :-)
Simon: is there a proper Unicode way of determining this?
mozilla::unicode::GetGenCategory(ch) == nsIUGenCategory::kNumber
(that doesn't account for supplementary characters, but neither does the current code)
(In reply to comment #2)
> mozilla::unicode::GetGenCategory(ch) == nsIUGenCategory::kNumber
> (that doesn't account for supplementary characters, but neither does the
> current code)
So that will only evaluate to true for '0'-'9'?
Is that what you want? I thought you wanted to identify any number in any script.
There are three "Number" categories in the Unicode Character Database (UCD): Nd, Decimal number; Nl, Letter number; and No, Other number.
Nd includes the sets of digits 0-9 used in decimal positional systems, such as ASCII 0-9, Arabic ٠-٩, Devanagari ०-९, and so on.
Nl includes things like Roman numerals and U+3007 IDEOGRAPHIC NUMBER ZERO.
No includes the subscript and superscript digits, fractions, circled numbers, parenthesized numbers, and so on.
There should be complete lists of which Unicode code points are in which category at http://www.fileformat.info/info/unicode/category/Nd/list.htm, http://www.fileformat.info/info/unicode/category/Nl/list.htm, and http://www.fileformat.info/info/unicode/category/No/list.htm, but that site seems to be down right now.
mozilla::unicode::GetGenCategory(ch) == nsIUGenCategory::kNumber will evaluate to true for codepoints in any of those categories. For a more fine-grained test one could use mozilla::unicode::GetGeneralCategory(ch) and test against some combination of HB_UNICODE_GENERAL_CATEGORY_DECIMAL_NUMBER, HB_UNICODE_GENERAL_CATEGORY_LETTER_NUMBER and HB_UNICODE_GENERAL_CATEGORY_OTHER_NUMBER.
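For anyone who wants to check which category a given character falls into, here is a minimal sketch using Python's standard unicodedata module as a stand-in. It is not the Gecko API: the helper names number_category and is_number are made up for illustration, and Python's Unicode tables may be a different version from the ones Gecko ships. The coarse test mirrors the idea of comparing against kNumber, and the fine-grained one mirrors distinguishing Nd, Nl and No.

```python
import unicodedata
from typing import Optional


def number_category(ch: str) -> Optional[str]:
    """Return 'Nd', 'Nl' or 'No' if ch is in a Number category, else None.

    Hypothetical helper: the fine-grained test, analogous to checking the
    specific decimal / letter / other number categories.
    """
    cat = unicodedata.category(ch)
    # Nd, Nl and No are the only general categories that start with "N".
    return cat if cat.startswith("N") else None


def is_number(ch: str) -> bool:
    """Coarse test: true for any of the three Number categories."""
    return number_category(ch) is not None


if __name__ == "__main__":
    # ASCII digit, Arabic-Indic digit, the superscripts from this bug,
    # a Roman numeral, U+3007, a fraction, and a plain letter for contrast.
    for ch in "7\u0663\u00b9\u00b2\u00b3\u2163\u3007\u00bda":
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
              f"category={unicodedata.category(ch)} number={is_number(ch)}")
```

With the coarse test, the superscript characters from the original report count as numbers, so a spell checker using it would skip them along with ordinary digits.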
Thanks for the explanation. I think we'll want to use mozilla::unicode::GetGenCategory(ch) == nsIUGenCategory::kNumber then. Aryeh, can you please take a look at this?
Created attachment 647173
Are reftests really the best way to test this? If so, is there any good way to make sure the test isn't passing for some unrelated reason? It seems fragile. For instance, initially I forgot to add the focus() lines, and wouldn't have realized that made the test worthless if I didn't have a habit of reverting the code changes and making sure the test fails.
Comment on attachment 647173
Please name the reftest something like spellcheck-superscript.html to keep the names in sync with the rest of the spell checker reftests. Also please use needs-focus.
To make the test more reliable, you should also add a second version of the test that has another piece of text in the textarea (such as blahblah) which will get marked as misspelled, and mark that one as !=. This makes sure that you get no misspelling markers for the superscript characters alone, and that you do get some when misspelled text is added to the textarea.
r=me with the above.