Last Comment Bug 730209 - Dictionary list should show raw language subtags when names are unavailable
: Dictionary list should show raw language subtags when names are unavailable
Status: RESOLVED FIXED
[bcp47]
:
Product: Core
Classification: Components
Component: Spelling checker (show other bugs)
: 9 Branch
: All All
: -- normal (vote)
: mozilla14
Assigned To: Gordon P. Hemsley [:GPHemsley]
:
Mentors:
http://hg.mozilla.org/mozilla-central...
Depends on:
Blocks: bcp47 666731 739861 709930 741842
  Show dependency treegraph
 
Reported: 2012-02-23 23:15 PST by Gordon P. Hemsley [:GPHemsley]
Modified: 2012-04-03 09:37 PDT (History)
7 users (show)
ryanvm: in‑testsuite+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Parse dictionary filename in a BCP 47–aware way (4.13 KB, patch)
2012-02-24 16:50 PST, Gordon P. Hemsley [:GPHemsley]
gavin.sharp: review-
Details | Diff | Splinter Review
Screenshot of patch's behavior (183.36 KB, image/png)
2012-02-24 16:57 PST, Gordon P. Hemsley [:GPHemsley]
no flags Details
Parse dictionary filename in a BCP 47–aware way (v2) (4.62 KB, patch)
2012-03-25 17:29 PDT, Gordon P. Hemsley [:GPHemsley]
gavin.sharp: review+
Details | Diff | Splinter Review
Parse dictionary filename in a BCP 47–aware way (v3) (12.07 KB, patch)
2012-03-27 17:04 PDT, Gordon P. Hemsley [:GPHemsley]
gavin.sharp: review+
Details | Diff | Splinter Review
Screenshot of patch's behavior (v3) (42.42 KB, image/png)
2012-03-27 17:30 PDT, Gordon P. Hemsley [:GPHemsley]
no flags Details
609953: Parse dictionary filename in a BCP 47–aware way (v4) (12.08 KB, patch)
2012-03-27 20:12 PDT, Gordon P. Hemsley [:GPHemsley]
no flags Details | Diff | Splinter Review
Parse dictionary filename in a BCP 47–aware way (v5) (12.37 KB, patch)
2012-03-28 15:59 PDT, Gordon P. Hemsley [:GPHemsley]
no flags Details | Diff | Splinter Review

Description Gordon P. Hemsley [:GPHemsley] 2012-02-23 23:15:43 PST
The spellcheck dictionary interface should not rely on the main localized list of language and region subtags to be displayed on the dictionary selection list.

The list of possible language subtags is magnitudes longer than the list we expect to make available to localizers (bug 716321) in the short term. (In the long term, we hope to have the full list available, if only in their default English names; see bug 666662 and 666731.)

The spellchecking dictionary list interface is not aware of this fact and thus provides a nearly-useless experience for a user who installs a spellchecking dictionary for a language whose language subtag is not available in languageNames.properties. In particular, the code is unable to find a name associated with the particular subtag, so it displays absolutely nothing in its place.

Instead, it should simply display the raw language subtag, which is certainly better than nothing. Although less likely to be a problem, the same should be done for the region subtag, as well.

The code in question should also be made aware of the possibility of an intervening script subtag between the language and region subtags. The code was originally written at a time when the script subtag was not a standard part of a language tag, so it simply assumes that the second subtag is always region tag. This problem has been seen in the wild with, for example, Macedonian. There are spellcheckers available for that language in both Cyrillic and Latin scripts. However, since Firefox does not support the valid xx-Xxxx-XX subtag, the authors of those dictionary addons have resorted to abusing the fact that the spellchecker code outputs the third subtag as a literal (I assume in anticipation of it being a variant subtag), resulting in dictionaries for the invalid mk-MK-Cyrl and mk-MK-Latn language tags.

One final note: Kevin and I seem to recall that this behavior was present in Firefox, at least for the language subtag, at some point in the past. (Bug 666731 comment 1 suggests it being present as of 2011-07-20, though I can't confirm that's what I meant at the time.) I thought I'd tracked the change down to either bug 338427 or bug 678842, but neither of those bugs changed the code in question here. So either Kevin and I are misremembering, or there was some change somewhere else that had a cascading effect on this behavior and might be worth investigating.

Oh, I should be clear. The code in question is here:
http://hg.mozilla.org/mozilla-central/file/cd120efbe4c6/toolkit/content/InlineSpellChecker.jsm#l207
Comment 1 Gordon P. Hemsley [:GPHemsley] 2012-02-24 15:11:43 PST
The fix for this is pretty simple, at least in the short term. (That is, without localized script or variant subtags.) I've got a patch forthcoming for this. The long term implementation will happen in bug 666662 and bug 666731.

However, I am worried about what caused the regression, as it's clear that the code actually powering the list was never changed. I'm wondering if the string bundle service was changed so that it no longer returns the raw strings when a lookup failed. In particular, I worry what kind of cascading effects such a change might have had beyond just the spellchecker dictionary list.
Comment 2 Gordon P. Hemsley [:GPHemsley] 2012-02-24 16:50:01 PST
Created attachment 600568 [details] [diff] [review]
Parse dictionary filename in a BCP 47–aware way

Here's a patch that brings the dictionary filename parsing up-to-date with BCP 47. In particular reference to this bug, it allows fallback to the raw subtags if a corresponding localized name cannot be found. It also adds primitive support for script subtags and multiple variants.
Comment 3 Gordon P. Hemsley [:GPHemsley] 2012-02-24 16:57:13 PST
Created attachment 600573 [details]
Screenshot of patch's behavior

Here's a screenshot of the patch in action, parsing the following dictionaries:
haw-XA
en-US
qa-Qaaa-QZ-fonipa
dyo-SN
mk-MK-Latn
qa-Qaaa-QA-fonipa-fonxsamp
haw-US
mk-MK-Cyrl
haw-Cyrl-XA
csk-SN

(Note: Some of these are nonsensical test dictionaries; others are real dictionaries available in the wild.)
Comment 4 :Gavin Sharp [email: gavin@gavinsharp.com] 2012-03-22 17:43:37 PDT
Comment on attachment 600568 [details] [diff] [review]
Parse dictionary filename in a BCP 47–aware way

>diff --git a/toolkit/content/InlineSpellChecker.jsm b/toolkit/content/InlineSpellChecker.jsm

>+      // Get the display name for this dictionary.
>+      let [languageTag, languageSubtag, scriptSubtag, regionSubtag, variantSubtags] = list[i].match(/([a-z]{2,3}|[a-z]{4}|[a-z]{5,8})(?:[-_]([a-z]{4}))?(?:[-_]([A-Z]{2}|[0-9]{3}))?((?:[-_](?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)/i);

Can you put the regex into a local variable, at least, to make this slightly more readable?

>+      var displayName = ""; dump( languageTag + ", " + languageSubtag + ", " + scriptSubtag + ", " + regionSubtag + ", " + variantSubtags + "\n" );

You'll want to remove the dump before landing this.

>+      if (languageTag) {

Can you move all of this code into a getDictionaryDisplayName helper function?

>+          /*try {
>+            displayName += gScriptBundle.GetStringFromName(scriptSubtag.toLowerCase());
>+          } catch(e) {*/
>+            displayName += scriptSubtag; // Fall back to raw script subtag.
>+          /*}*/

Why is this commented out?

r- for these minor nits - if you address these comments I'll have a quicker turnaround, sorry for lagging behind on this request.
Comment 5 Gordon P. Hemsley [:GPHemsley] 2012-03-24 06:08:48 PDT
(In reply to Gavin Sharp (use gavin@gavinsharp.com for email) from comment #4)
> >+      // Get the display name for this dictionary.
> >+      let [languageTag, languageSubtag, scriptSubtag, regionSubtag, variantSubtags] = list[i].match(/([a-z]{2,3}|[a-z]{4}|[a-z]{5,8})(?:[-_]([a-z]{4}))?(?:[-_]([A-Z]{2}|[0-9]{3}))?((?:[-_](?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)/i);
> 
> Can you put the regex into a local variable, at least, to make this slightly
> more readable?

So, you mean, something like this?

let languageTagMatch = /([a-z]{2,3}|[a-z]{4}|[a-z]{5,8})(?:[-_]([a-z]{4}))?(?:[-_]([A-Z]{2}|[0-9]{3}))?((?:[-_](?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)/i;
let [languageTag, languageSubtag, scriptSubtag, regionSubtag, variantSubtags] = list[i].match(languageTagMatch);

> >+      var displayName = ""; dump( languageTag + ", " + languageSubtag + ", " + scriptSubtag + ", " + regionSubtag + ", " + variantSubtags + "\n" );
> 
> You'll want to remove the dump before landing this.

D'oh!

> >+      if (languageTag) {
> 
> Can you move all of this code into a getDictionaryDisplayName helper
> function?

So, something like getDictionaryDisplayName(matches)?

Out of curiosity, what benefit would that serve?

> >+          /*try {
> >+            displayName += gScriptBundle.GetStringFromName(scriptSubtag.toLowerCase());
> >+          } catch(e) {*/
> >+            displayName += scriptSubtag; // Fall back to raw script subtag.
> >+          /*}*/
> 
> Why is this commented out?

We don't currently have a dictionary for script subtags, but I wanted to indicate where it would go and what the code should look like when we got one.

> r- for these minor nits - if you address these comments I'll have a quicker
> turnaround, sorry for lagging behind on this request.

I'll try to get to it this week. Thanks for the review.
Comment 6 :Gavin Sharp [email: gavin@gavinsharp.com] 2012-03-24 18:35:38 PDT
(In reply to Gordon P. Hemsley [:gphemsley] from comment #5)
> So, you mean, something like this?
> 
> let languageTagMatch =
> /([a-z]{2,3}|[a-z]{4}|[a-z]{5,8})(?:[-_]([a-z]{4}))?(?:[-_]([A-Z]{2}|[0-
> 9]{3}))?((?:[-_](?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)/i;
> let [languageTag, languageSubtag, scriptSubtag, regionSubtag,
> variantSubtags] = list[i].match(languageTagMatch);

Yep.

> So, something like getDictionaryDisplayName(matches)?
> 
> Out of curiosity, what benefit would that serve?

No, I meant getDictionaryDisplayName(list[i]). The purpose is to make addDictionaryListToMenu easier to understand.

> We don't currently have a dictionary for script subtags, but I wanted to
> indicate where it would go and what the code should look like when we got
> one.

Seems unnecessary - this code isn't very complicated, so leaving it in commented out is more confusing than useful IMO.
Comment 7 Gordon P. Hemsley [:GPHemsley] 2012-03-25 17:29:29 PDT
Created attachment 609177 [details] [diff] [review]
Parse dictionary filename in a BCP 47–aware way (v2)

This should address the problems you mentioned.
Comment 8 :Gavin Sharp [email: gavin@gavinsharp.com] 2012-03-26 17:53:09 PDT
Comment on attachment 609177 [details] [diff] [review]
Parse dictionary filename in a BCP 47–aware way (v2)

>diff --git a/toolkit/content/InlineSpellChecker.jsm b/toolkit/content/InlineSpellChecker.jsm

>+function getDictionaryDisplayName(rawLanguageTag) {

Put this at the end of the file, rather than between the definition of InlineSpellChecker and InlineSpellChecker.prototype.

>+  var displayName = "";
>+  if (languageTag) {

if (!languageTag)
  return rawLanguageTag;

and then unindent the rest of the function.

>-      var displayName = "";
>+      displayName = getDictionaryDisplayName(list[i]);

Looks like you lost the "var". But you can get rid of that variable entirely and just do:

item.setAttribute("label", getDictionaryDisplayName(list[i]));

below.

Given that the method is now factored out, it should be pretty easy to write a browser chrome test that verifies its output (ideally at least one test for all possible code paths). There's some info on browser chrome tests at https://developer.mozilla.org/en/Browser_chrome_tests, and an example browser chrome test at http://mxr.mozilla.org/mozilla-central/source/toolkit/content/tests/browser/browser_Geometry.js . 

r=me with those changes.
Comment 9 Gordon P. Hemsley [:GPHemsley] 2012-03-27 17:04:40 PDT
Created attachment 609953 [details] [diff] [review]
Parse dictionary filename in a BCP 47–aware way (v3)

This should address the issues mentioned here, as well as the issues discussed via IRC.

This moves getDictionaryDisplayName() back into InlineSpellChecker.prototype, and makes the regex exhaustive. It also changes the check for invalid language tags into a try/catch block.

Most importantly, it adds a bazillion tests. :)
Comment 10 Gordon P. Hemsley [:GPHemsley] 2012-03-27 17:30:06 PDT
Created attachment 609960 [details]
Screenshot of patch's behavior (v3)

Here's an updated screenshot of the patch's behavior. (Includes updated list of language names proposed in bug 716321.)
Comment 11 :Gavin Sharp [email: gavin@gavinsharp.com] 2012-03-27 18:22:02 PDT
Comment on attachment 609953 [details] [diff] [review]
Parse dictionary filename in a BCP 47–aware way (v3)

>diff --git a/toolkit/content/tests/browser/browser_InlineSpellChecker.js b/toolkit/content/tests/browser/browser_InlineSpellChecker.js

>+let tempScope = {};
>+Components.utils.import("resource://gre/modules/InlineSpellChecker.jsm", tempScope);
>+let InlineSpellChecker = tempScope.InlineSpellChecker;

You can put this code inside the test() function (it's generally best to avoid running code in tests outside of test()).

Good job with the test coverage!
Comment 12 Gordon P. Hemsley [:GPHemsley] 2012-03-27 20:12:55 PDT
Created attachment 609988 [details] [diff] [review]
609953: Parse dictionary filename in a BCP 47–aware way (v4)

(In reply to Gavin Sharp (use gavin@gavinsharp.com for email) from comment #11)
> Comment on attachment 609953 [details] [diff] [review]
> Parse dictionary filename in a BCP 47–aware way (v3)
> 
> >diff --git a/toolkit/content/tests/browser/browser_InlineSpellChecker.js b/toolkit/content/tests/browser/browser_InlineSpellChecker.js
> 
> >+let tempScope = {};
> >+Components.utils.import("resource://gre/modules/InlineSpellChecker.jsm", tempScope);
> >+let InlineSpellChecker = tempScope.InlineSpellChecker;
> 
> You can put this code inside the test() function (it's generally best to
> avoid running code in tests outside of test()).

FTR, I copied that directly from the test you linked to me. ;)

But here's an updated patch.
Comment 13 :Gavin Sharp [email: gavin@gavinsharp.com] 2012-03-27 20:14:21 PDT
(In reply to Gordon P. Hemsley [:gphemsley] from comment #12)
> FTR, I copied that directly from the test you linked to me. ;)

It's wrong there too :)
Comment 14 Gordon P. Hemsley [:GPHemsley] 2012-03-28 15:59:05 PDT
Created attachment 610334 [details] [diff] [review]
Parse dictionary filename in a BCP 47–aware way (v5)

Rolled up v4 into a Hg Queue patch for easier checkin.
Comment 15 Ryan VanderMeulen [:RyanVM] 2012-03-28 18:01:50 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/9949bb092303
Comment 16 Phil Ringnalda (:philor) 2012-03-28 22:58:10 PDT
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/bb679e5939b9 - odds are most of the bustage from that push came from your push-neighbors, but it seems maybe not entirely impossible that the way bug 718316 went from something that hadn't happened a single time since the end of February to something that happened every single time starting with that push might be related to this. I'd highly recommend a try push (and a defensively placed link to tbpl for it put into the bug).
Comment 17 Phil Ringnalda (:philor) 2012-03-28 23:26:47 PDT
Pushed to try in https://tbpl.mozilla.org/?tree=Try&rev=a81db845ed05 while you were pushing https://tbpl.mozilla.org/?tree=Try&rev=34db65aca0e5 so between us we ought to get you cleared of suspicion.
Comment 18 Gordon P. Hemsley [:GPHemsley] 2012-03-29 06:37:11 PDT
(In reply to Phil Ringnalda (:philor) from comment #17)
> Pushed to try in https://tbpl.mozilla.org/?tree=Try&rev=a81db845ed05 while
> you were pushing https://tbpl.mozilla.org/?tree=Try&rev=34db65aca0e5 so
> between us we ought to get you cleared of suspicion.

If I'm reading these results correctly, this patch seems to be in the clear. :)
Comment 20 Ed Morley [:emorley] 2012-03-30 13:07:37 PDT
https://hg.mozilla.org/mozilla-central/rev/7737ca752860

Note You need to log in before you can comment on or make changes to this bug.