Closed Bug 943252 Opened 11 years ago Closed 10 years ago

Remove nsCharsetMenu from m-c

Categories

(Toolkit :: General, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla32

People

(Reporter: hsivonen, Assigned: hsivonen)

References

Details

Attachments

(2 files, 6 obsolete files)

As of bug 805374 landing, nsCharsetMenu and charsetTitles.properties are dead code for Firefox, but they are still used by Thunderbird. Hence, they should move from m-c to c-c.
Attached patch Remove dead code from m-c (obsolete) — Splinter Review
Here's the m-c part. Now, who wants to write the corresponding c-c patch that imports this stuff to c-c?
Attached patch Remove even more stuff (obsolete) — Splinter Review
Attachment #8338361 - Attachment is obsolete: true
I've been having a look through the charset menus in comm-central. Apart from the message compose menu which doesn't have the auto-detect or the more encodings submenus, they are all broken, as are the customisation dialogs, the various charset preference widgets and also the editor save as charset dialog, because of the removal of .notForBrowser.
Do we want to support legacy character decoders in email? If not then perhaps the solution is to have one charset menu to rule them all, i.e.
* Move the charset menu from browser to toolkit (bug 940907)
* Update all of c-c to use the browser charset menu
Note that we lose customisation here but I don't know whether that's important, since the list is now exhaustive as far as we are concerned, right?
(In reply to neil@parkwaycc.co.uk from comment #4)
> I've been having a look through the charset menus in comm-central. Apart
> from the message compose menu which doesn't have the auto-detect or the more
> encodings submenus, they are all broken, as are the customisation dialogs,
> the various charset preference widgets and also the editor save as charset
> dialog, because of the removal of .notForBrowser.

Do you mean broken with the m-c patch from here applied or broken right now on trunk? .notForBrowser is supposed to still exist for the benefit of c-c on trunk.

> Do we want to support legacy character decoders in email?

I have been working with the assumption that mailnews *might* want to have decoders that Firefox doesn't. However, it seems that even mailnews developers don't really know if they truly need those decoders. Once bug 935453 reaches a Thunderbird release, we might get some data but not *all* data. If you want data about EUC-TW, ISO-2022-KR and such, you'll need to add telemetry for those.

As for *encoder* considerations, see bug 936466.

Needinfoing jcranmer if he has more to say.

> If not then
> perhaps the solution is to have one charset menu to rule them all, i.e.
> * Move the charset menu from browser to toolkit (bug 940907)
> * Update all of c-c to use the browser charset menu

Issues:

 * If mailnews wishes to keep supporting more encodings for incoming data than Firefox, the charset menu in the mail viewer may need to be broader than the browser charset menu.

 * The mail compose charset menu should probably be *narrower* than the current browser charset menu, except if we remove ISO-2022-JP from the menu in the browser (bug 943256), mail compose should still have ISO-2022-JP.

 * If I was in charge of the SeaMonkey Composer, I'd make it always output UTF-8, but I'm not in charge. As far as input goes, the browser's character encoding menu might work.

Anyway, I'd be seriously unhappy about having to change the browser's character encoding menu to accommodate non-browser use cases and the mailnews use cases are likely to differ, so I think it would make sense for mailnews to implement its own (RDF-free hopefully) view and compose character encoding menus separately from the browser's character encoding menu.

> Note that we lose customisation here but I don't know whether that's
> important, since the list is now exhaustive as far as we are concerned,
> right?

Right. I'd get rid of customizability. (Including customizability of the mail compose menu via a localizable pref.)
Flags: needinfo?(Pidgeot18)
(In reply to Henri Sivonen (:hsivonen) from comment #5)
> Do you mean broken with the m-c patch from here applied or broken right now
> on trunk? .notForBrowser is supposed to still exist for the benefit of c-c
> on trunk.

It is not.
https://mxr.mozilla.org/mozilla-central/search?string=notForBrowser
It was removed by bug 805374. See also bug 943732.
(In reply to Henri Sivonen from comment #5)
> If I was in charge of the SeaMonkey Composer, I'd make it always output UTF-8
It certainly defaults to UTF-8 ever since about:blank was flipped over.

> As far as input goes, the browser's character encoding menu might work.
Which is where bug 940907 would help, as it could then use CharsetMenu.jsm too.
(In reply to Henri Sivonen (:hsivonen) from comment #5)
> > Do we want to support legacy character decoders in email?
> 
> I have been working with the assumption that mailnews *might* want to have
> decoders that Firefox doesn't. However, it seems that even mailnews
> developers don't really know if they truly need those decoders. Once bug
> 935453 reaches a Thunderbird release, we might get some data but not *all*
> data. If you want data about EUC-TW, ISO-2022-KR and such, you'll need to
> add telemetry for those.
> 
> As for *encoder* considerations, see bug 936466.
> 
> Needinfoing jcranmer if he has more to say.

I'm much more minded to only explicitly consider the decoders/encoders we want to keep rather than the other way around. The only constraint I have is that we're likely to find defects only on the ESR branch, so we would need to be able to quickly turn the disabled ones back on [for TB only if necessary--I'm not objecting to an ifdef MOZ_THUNDERBIRD here]. For encoders, I really think we need to limit them to the actually used ones (i.e., everything that the encoding spec says is safe to decode), although our reply/forward code paths may assume we can encode any charset we can decode :-/.

Once again, I have no data on anything, only speculation. And I'm in a bad place to speculate.

[I'm excluding IMAP modified UTF-7, since the only reason that's a full encoding is to share a codepath with UTF-7 decoding]
Flags: needinfo?(Pidgeot18)
(In reply to Joshua Cranmer [:jcranmer] from comment #8)
> I'm much more minded to only explicitly consider the decoders/encoders we
> want to keep rather than the other way around.

There are really four questions here:
 1) What encodings does mailnews want in the compose charset menu?
 2) What encodings does mailnews want in the message view charset menu?
 3) What encoders does mailnews want to support?
 4) What decoders does mailnews want to support?

As long as we are mindful of the obvious subset relations for these, we don't need to answer 3 and 4 in *this* bug. In fact, strictly speaking we don't need to answer 1 or 2 in this bug, either, if this bug was treated as a mere code move bug instead of an opportunity to reimplement stuff.

But if we are re-designing the message compose and message view charset menus here:

The default message compose menu has the following very questionable items:

 * Western (ISO-8859-15): This encoding post-dates UTF-8. It seems implausible that there ever be a reason why it would be a better idea to send ISO-8859-15 than to send windows-1252 labeled as ISO-8859-1 or to send UTF-8. The German, Italian and Breton localizations default to ISO-8859-15, but it's *extremely* unlikely that the reason is practical. It's more likely to be an unpractical political statement. In fact, the whole existence of ISO-8859-15 amounts to impractical political posturing. (The encoding can't represent any characters that Windows-1252 and UTF-8 couldn't represent and both of those existed first.)

 * Arabic (ISO-8859-6): The Arabic localization defaults to UTF-8. Windows-1256 is not in the menu. Is there evidence that ISO-8859-6 ever makes more sense than Windows-1256 or UTF-8 and that users know what that case is and know to choose the menu item in that case?

 * Armenian (ARMSCII-8): Is there any evidence that any email app other than Thunderbird itself understands this encoding label? The Armenian  localization defaults to iso-8859-1.  Does that mean that Armenian users use intentionally mis-encoded fonts with iso-8859-1-labeled in email or that messages get autopromoted to UTF-8? Either way, ARMSCII-8 seems bogus as a menu item.

 * Baltic (ISO-8859-13): The Estonian localization defaults to ISO-8859-1, the Lithuanian localization to windows-1257 and (surprisingly!) we don't have a Latvian localization. Windows-1257 is not in the menu. Is there evidence that ISO-8859-13 ever makes more sense than Windows-1257 or UTF-8 and that users know what that case is and know to choose the menu item in that case?

 * Celtic (ISO-8859-14): Is there any evidence that this is ever more useful than UTF-8?

 * Having two Hebrew encodings, both iso-8859-8-i and windows-1255, in the menu: These encodings differ by one currency symbol. Is there any evidence that users know about the difference and can make an informed choice? Hebrew Thunderbird defaults to windows-1255 and Hebrew SeaMonkey to ISO-8859-8-I. I suggest having only windows-1255 in the menu if there's merit to having legacy single-byte encodings in the menu at all at this point.

 * Nordic (ISO-8859-10): Is there any evidence that this is ever more useful than UTF-8? How many users understand that you aren't supposed to use this for the majority languages of the Nordic countries? (The majority languages are ISO-8859-1/Windows-1252.)

 * South European (ISO-8859-10): Is there any evidence that this is ever more useful than UTF-8? How many users understand that "South European" is code for "Esperanto and Maltese"?

 * Vietnamese (VISCII): The Vietnamese localization default to UTF-8. VISCII doesn't appear to be necessary for legacy Web content. Does anyone *really* need this for sending email?

Frankly, it looks like the current menu  is based on shunning Microsoft and being inclusive of de jure standards  without regard to practical considerations.

Considering how Polish, Russian, Belarusian, Serbian, Vietnamese and Arabic localizations get away with defaulting to UTF-8 and how the Central European, Cyrillic and Arabic encodings are incompatible between the ISO and the Windows series, I'm skeptical about the utility of even offering the legacy options for Central European, Cyrillic, Vietnamese and Arabic. Actually, I'm rather skeptical about offering legacy options even for Western, Thai, Hebrew, Turkish, where the Windows series encodings are compatible or almost compatible with the ISO series encodings, since if CE/Cyrillic/Arabic/Vietnamese/Trad.Chinese can get away with UTF-8, why not Western/Thai/Turkish/Hebrew/Greek? Also, considering that Traditional Chinese Thunderbird defaults to UTF-8 and our Big5 and Big5-HKSCS encoding story is a mess, offering Big5 in the menu might do more hard than good.

If we don't want to block this bug on fear about Japanese feature phones or PRC encoding politics, it would already go a long way if the compose menu was just:
UTF-8
GB18030
ISO-2022-JP

And maybe EUC-KR.

> The only constraint I have is
> that we're likely to find defects only on the ESR branch, so we would need
> to be able to quickly turn the disabled ones back on [for TB only if
> necessary--I'm not objecting to an ifdef MOZ_THUNDERBIRD here].

Does that get defined for SeaMonkey, too? Or is there another define for mailnews?

> For
> encoders, I really think we need to limit them to the actually used ones
> (i.e., everything that the encoding spec says is safe to decode), although
> our reply/forward code paths may assume we can encode any charset we can
> decode :-/.

If that's the case, then the code paths are already broken, since we have a bunch of .notForOutgoing encodings.
I just noticed that even Greek defaults to UTF-8 already.
About the message view menu:

You can probably drop all the non-CJK encodings that aren't in the new browser encoding menu from the message view encoding menu, but you might want to keep CJK encodings that aren't in the browser menu (HZ, in the future ISO-2022-JP, EUC-TW, ISO-2022-CN, ISO-2022-KR).

The UI strings in the browser menu are written with the assumption that there won't be more encodings added, so you'd need a different source of UI strings. For example, in the browser menu, the UI string for EUC-KR is just "Korean", because it's the only Korean encoding. If you put ISO-2022-KR or JOHAB in the menu, that's no longer the case.

For now, I suggest pulling the UI strings for mailnews menus from charsetTitle.properties (after moving it to c-c!) in order not to block this bug on not knowing which encodings TB actually wants to keep.
(In reply to Henri Sivonen (:hsivonen) from comment #9)
> (In reply to Joshua Cranmer [:jcranmer] from comment #8)
> > I'm much more minded to only explicitly consider the decoders/encoders we
> > want to keep rather than the other way around.
> 
> There are really four questions here:
>  1) What encodings does mailnews want in the compose charset menu?
>  2) What encodings does mailnews want in the message view charset menu?
>  3) What encoders does mailnews want to support?
>  4) What decoders does mailnews want to support?

Answering them in reverse order, since it's easiest like that:
4. Anything the encoding spec says plus whatever we determine is necessary to support.
3. For the moment, probably everything in #4, since our compose-a-reply-message code may break if we have a decoder but not an encoder. In the long term, this would want to be the same as #1.
2. In theory, the same as #4. The message view charset is supposed to be able to override a mislabeled charset, so in theory it should be able to target anything that we can decode. In practice, though, it's basically a de-mojibake situation, which means we should be able to get by with only charsets likely to find mojibake (UTF-8 + ISO 8859-*, I think? I don't know the story for KOI-* or East Asian charsets here).

The only question with a truly interesting answer is #1. Any value which is in the send_default_charset pref for some locale at the moment should be on the list. This is UTF-8, ISO-8859-{1,2,8-I,9,15}, Windows-{1250,1251,1255,1257}, EUC-KR, ISO-2022-JP, gbk, Big5, although the Big5 and ISO-8859-8-I are only used in the suite/ locales and are 

>  * Western (ISO-8859-15): This encoding post-dates UTF-8. It seems
> implausible that there ever be a reason why it would be a better idea to
> send ISO-8859-15 than to send windows-1252 labeled as ISO-8859-1 or to send
> UTF-8. The German, Italian and Breton localizations default to ISO-8859-15,
> but it's *extremely* unlikely that the reason is practical. It's more likely
> to be an unpractical political statement. In fact, the whole existence of
> ISO-8859-15 amounts to impractical political posturing. (The encoding can't
> represent any characters that Windows-1252 and UTF-8 couldn't represent and
> both of those existed first.)

As I mentioned... somewhere else, the RFCs do explicitly prefer ISO-8859-* as a specific set of charsets to be supported. That is persuasive enough to me that we should emit an ISO-8859-* charset instead of a Windows-*-equivalent charset, at least as far as legacy charsets are concerned. Even were it not on the list of charsets defaulted to by some locale, I would have included it due to it being something for which I saw some recent use in the wild.

>  * Arabic (ISO-8859-6):
>  * Armenian (ARMSCII-8):
>  * Baltic (ISO-8859-13):
>  * Celtic (ISO-8859-14):

I don't have data to back me up, but I suspect these fall largely into the category of disuse. :-)

>  * Having two Hebrew encodings, both iso-8859-8-i and windows-1255, in the
> menu: These encodings differ by one currency symbol. Is there any evidence
> that users know about the difference and can make an informed choice? Hebrew
> Thunderbird defaults to windows-1255 and Hebrew SeaMonkey to ISO-8859-8-I. I
> suggest having only windows-1255 in the menu if there's merit to having
> legacy single-byte encodings in the menu at all at this point.

I assume that SeaMonkey's use of ISO-8859-8-I is wrong. Unless we change that charset, though, we'd need to include it in the menu or risk breaking SeaMonkey in the Hebrew locale :-( (the compose window I don't trust to work if the locale-default charset isn't in the selectable list of charsets in the UI).

> > The only constraint I have is
> > that we're likely to find defects only on the ESR branch, so we would need
> > to be able to quickly turn the disabled ones back on [for TB only if
> > necessary--I'm not objecting to an ifdef MOZ_THUNDERBIRD here].
> 
> Does that get defined for SeaMonkey, too? Or is there another define for
> mailnews?

I don't think there's a mailnews-only define.

Actually, come to think of it, if the main objection to having items in the list is the lack of UI labels for them, then the compose-charset menu ought to take a different approach. Ever since we instituted auto-fallback-to-UTF-8, then it is no longer necessary to change the charset to be able to send a message. The only people who change the setting are likely to be those who want to choose a specific charset and probably expect to see actual charset names instead of things like Arabic or Korean (this doesn't hold true for the other charset menus, though).
(In reply to Joshua Cranmer [:jcranmer] from comment #12)
> (In reply to Henri Sivonen (:hsivonen) from comment #9)
> > (In reply to Joshua Cranmer [:jcranmer] from comment #8)
> > > I'm much more minded to only explicitly consider the decoders/encoders we
> > > want to keep rather than the other way around.
> > 
> > There are really four questions here:
> >  1) What encodings does mailnews want in the compose charset menu?
> >  2) What encodings does mailnews want in the message view charset menu?
> >  3) What encoders does mailnews want to support?
> >  4) What decoders does mailnews want to support?
> 
> Answering them in reverse order, since it's easiest like that:
> 4. Anything the encoding spec says plus whatever we determine is necessary
> to support.

...and currently the only encoding that's not in the spec but is deemed "necessary" is UTF-7 and the rest is "don't know".

> 3. For the moment, probably everything in #4, since our
> compose-a-reply-message code may break if we have a decoder but not an
> encoder.

Have you tested replying to a UTF-7 message? (UTF-7 is flagged .notForOutgoing.)

> 2. In theory, the same as #4. The message view charset is supposed to be
> able to override a mislabeled charset, so in theory it should be able to
> target anything that we can decode.

FWIW, the browser encoding menu doesn't work like this. It doesn't include encodings that we can decode but are dangerous on the Web (not relevant to email), encodings that IE doesn't have in its corresponding menu (and, therefore, failure to label such an encoding is already irrecoverably broken in IE) or encodings whose difference can't possibly prompt the user to manually switch between the two.

Omitting items like this removes some confusion such as multiple "Western" items or multiple (non-visual) "Hebrew" items or multiple "Baltic (ISO)" items. In due course, we should get rid of duplicate Simplified Chinese items (leave only GB18030). It also makes the menu less cluttered by removing virtually unused European items that would sort all over the menu cluttering it up (Nordic, Celtic, South European, Romanian) and that have misleading names. (Well, "Celtic" may not be misleading. Romanian is misleading, since it's not really commonly used, so you probably don't want to try it for presumed-Romanian mojibake. Using Nordic to refer to Sami but not really Swedish/Norwegian/Danish/Finnish/Icelandic is cryptic and using South European to refer to Esperanto [and Maltese] is cryptic.)

> In practice, though, it's basically a
> de-mojibake situation, which means we should be able to get by with only
> charsets likely to find mojibake (UTF-8 + ISO 8859-*, I think?

ISO-8859-* is a very different story for * <= 9 and * >= 10. I recommend not putting * >= 10 in the menu, because * >= 10 are obscure due to having been introduced so late.

> The only question with a truly interesting answer is #1. Any value which is
> in the send_default_charset pref for some locale at the moment should be on
> the list. This is UTF-8, ISO-8859-{1,2,8-I,9,15},
> Windows-{1250,1251,1255,1257}, EUC-KR, ISO-2022-JP, gbk, Big5, although the
> Big5 and ISO-8859-8-I are only used in the suite/ locales and are 

I recommend exercising some oversight over the localizations to:

 * Take Big5 off the list. If Thunderbird can default to UTF-8, so could SeaMonkey. And our Big5 support is in bad enough a shape that it doesn't seem like a good idea to use it if avoidable.

 * Take ISO-8858-8-I off the list. If Thunderbird can default to windows-1255, so can SeaMonkey.

 * Take ISO-8859-15 off the list. ISO-8859-15 was introduced for transliterations in the Finnish context (completely missing the whole point of transliteration!), French and the euro sign. The notion that German, Italian and Breton would need to default to ISO-8859-15 is utterly bogus when the Finnish and French localizations and all the localizations for all the other encoding-wise "Western" Eurozone-affiliated languages get away with defaulting to ISO-8859-1 (i.e. windows-1252 intentionally mislabeled as ISO-8859-1).

> As I mentioned... somewhere else, the RFCs do explicitly prefer ISO-8859-*
> as a specific set of charsets to be supported. That is persuasive enough to
> me that we should emit an ISO-8859-* charset instead of a
> Windows-*-equivalent charset,

It might make sense to emit the ISO-8859-1 label when encoding in windows-1252 and to emit the ISO-8859-9 label when encoding in Windows-1254. Other than that:
 * TIS-620 is an older label than ISO-8859-11, so it doesn't make sense to prefer the ISO label.
 * ISO-8859-10 and above didn't exist at the time of the RFC and the RFC can't possibly give reasonable advice about them.
 * The Thunderbird defaults for the languages covered by ISO-8859-2 through ISO-8859-8-I look like a vote of no confidence against the ISO encodings. 

> I assume that SeaMonkey's use of ISO-8859-8-I is wrong. Unless we change
> that charset, though, we'd need to include it in the menu or risk breaking
> SeaMonkey in the Hebrew locale :-(

Surely we shouldn't include stuff to cater to SeaMonkey diverging from Thunderbird!

> Ever since we instituted
> auto-fallback-to-UTF-8, then it is no longer necessary to change the charset
> to be able to send a message. The only people who change the setting are
> likely to be those who want to choose a specific charset and probably expect
> to see actual charset names instead of things like Arabic or Korean (this
> doesn't hold true for the other charset menus, though).

Makes sense.
Should I expect the charset menus in comm-central to be under an RDF-free rewrite or should I expect the existing nsCharsetMenu stuff to move over?

comm-central badness that I was  previously unaware of has come to my attention. Specifically, there is  a setting for the default character encoding for each NNTP  server and the UI for that setting offers a broader range of encodings than the mail compose encoding menu.

Data gathered from the wild suggests that users make bad choices from that menu. For example, if you think that for Croatian discussions you should be choosing the only item from the list that says "Croatian",  you end up sending messages with an x- label in a legacy Mac encoding!

Should I file a different bug about that or should I expect all these menus to get a redesign as part of a non-RDF rewrite here (instead of the code move that I initially filed this bug about)?
Flags: needinfo?(neil)
Toolkit's charsetMenu.jsm is now almost usable in SeaMonkey's browser, and I am writing a patch to switch over to it. I'm also looking at the uses in editor and mailnews, although there I guess I may need to coordinate with jcranmer.
Flags: needinfo?(neil)
charsetTitles.properties is now part of bug 943268.
Summary: Move nsCharsetMenu and charsetTitles.properties to comm-central → Move nsCharsetMenu to comm-central
(In reply to neil@parkwaycc.co.uk from comment #15)
> Toolkit's charsetMenu.jsm is now almost usable in SeaMonkey's browser, and I
> am writing a patch to switch over to it. I'm also looking at the uses in
> editor and mailnews, although there I guess I may need to coordinate with
> jcranmer.

MXR suggests c-c hasn't migrated away from nsCharsetMenu. :-(

I guess I'll write a patch to import nsCharsetMenu to c-c to make progress on the m-c side. :-(
Assignee: nobody → hsivonen
Attachment #8407356 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attached patch c-c WIP that doesn't work (obsolete) — Splinter Review
This patch breaks Thunderbird, because something still searches charsetOverlay.xul under toolkit. Can someone who actually understands overlays and jar manifests tell me how to fix?
Neil, could you please take a look at why Thunderbird still looks for charsetOverlay.xul under mozilla/toolkit/ with this patch applied (on top of bug 937056)?
Flags: needinfo?(neil)
Because of the <?xul-overlay?> PIs referring to it.

Note that SeaMonkey no longer needs the charset overlay. (Unfortunately we're still using the charset menu in some of the preferences, including a file shared with Thunderbird, who still haven't decided what they're going to do about it...)
Flags: needinfo?(neil)
(In reply to neil@parkwaycc.co.uk from comment #23)
> Note that SeaMonkey no longer needs the charset overlay. (Unfortunately
> we're still using the charset menu in some of the preferences, including a
> file shared with Thunderbird, who still haven't decided what they're going
> to do about it...)

I'd personally prefer that the charset menus, at least for compose code, use the actual charset names instead of silly attempts to call them by languages, although I concede that the latter may be more useful for the message view.
(In reply to neil@parkwaycc.co.uk from comment #23)
> Because of the <?xul-overlay?> PIs referring to it.

None of the PIs have the string "toolkit/" in the reference. Yet, the error I see relates to the file not being found at a "toolkit/" path.

As noted in dev-platform and dev-apps-thunderbird, I'm inclined to land at the m-c patch here and consider the not-quite-working c-c patch as a courtesy starting point for c-c developers who know how to wrangle the jar manifests in Thunderbird and SeaMonkey.

(In reply to Joshua Cranmer [:jcranmer] from comment #24)
> I'd personally prefer that the charset menus, at least for compose code, use
> the actual charset names instead of silly attempts to call them by
> languages, although I concede that the latter may be more useful for the
> message view.

I'd prefer the mail-related encoding menus to be rewritten and trimmed a lot. E.g. the ARMSCII and VISCII entries are completely misguided. See comment 9 from January. I just don't want to keep dead code in m-c and keep building and shipping it in Firefox and Firefox OS while waiting.
(In reply to Henri Sivonen from comment #25)
> (In reply to comment #23)
> > Because of the <?xul-overlay?> PIs referring to it.
> 
> None of the PIs have the string "toolkit/" in the reference. Yet, the error
> I see relates to the file not being found at a "toolkit/" path.

The "toolkit" path you're seeing relates purely to the "toolkit.jar" line at the top of toolkit/content/jar.mn (note that the location of the jar.mn is purely coincidental and does not itself relate to the path you're seeing).
Comment on attachment 8407374 [details] [diff] [review]
Remove nsCharsetMenu from m-c, remove even more cruft

Review of attachment 8407374 [details] [diff] [review]:
-----------------------------------------------------------------

DING-DONG! THE WITCH IS DEAD!


Seem to have forgotten to remove nsCharsetMenu.cpp

::: toolkit/content/customizeCharset.xul
@@ -1,1 @@
> -<?xml version="1.0"?>

Huh, apparently we have code in the new appmenu in Firefox that opens this... but nothing that actually calls it. Could excise that too:
http://dxr.mozilla.org/mozilla-central/source/browser/components/customizableui/content/panelUI.js#364

::: toolkit/locales/en-US/chrome/global/intl.properties
@@ +40,5 @@
>  # http://mxr.mozilla.org/mozilla/source/browser/components/preferences/fonts.xul
>  font.language.group=x-western
>  
> +# LOCALIZATION NOTE (intl.charset.detector):
> +# Must be empty for locales other than Japanese, Russian and Ukrainian

Seems like this comment could be a bit more descriptive.
Attachment #8407374 - Flags: review?(bmcbride) → review-
(In reply to Blair McBride [:Unfocused] from comment #27)
> Seem to have forgotten to remove nsCharsetMenu.cpp

Oops. Fixed.

> ::: toolkit/content/customizeCharset.xul
> @@ -1,1 @@
> > -<?xml version="1.0"?>
> 
> Huh, apparently we have code in the new appmenu in Firefox that opens
> this... but nothing that actually calls it. Could excise that too:
> http://dxr.mozilla.org/mozilla-central/source/browser/components/
> customizableui/content/panelUI.js#364

Fixed.

> ::: toolkit/locales/en-US/chrome/global/intl.properties
> @@ +40,5 @@
> >  # http://mxr.mozilla.org/mozilla/source/browser/components/preferences/fonts.xul
> >  font.language.group=x-western
> >  
> > +# LOCALIZATION NOTE (intl.charset.detector):
> > +# Must be empty for locales other than Japanese, Russian and Ukrainian
> 
> Seems like this comment could be a bit more descriptive.

Fixed.
Attachment #8407374 - Attachment is obsolete: true
Attachment #8410083 - Flags: review?(bmcbride)
(In reply to neil@parkwaycc.co.uk from comment #26)
> (In reply to Henri Sivonen from comment #25)
> > (In reply to comment #23)
> > > Because of the <?xul-overlay?> PIs referring to it.
> > 
> > None of the PIs have the string "toolkit/" in the reference. Yet, the error
> > I see relates to the file not being found at a "toolkit/" path.
> 
> The "toolkit" path you're seeing relates purely to the "toolkit.jar" line at
> the top of toolkit/content/jar.mn (note that the location of the jar.mn is
> purely coincidental and does not itself relate to the path you're seeing).

Even after a deleting the obj dir and building afresh, when starting Thunderbird with these patches applied, I still get:

Chrome file doesn't exist: /opt/Projects/obj-comm/mozilla/dist/bin/chrome/toolkit/content/global/charsetOverlay.xul

AFAICT:
 1) No XUL file references to charsetOverlay.xul with "toolkit" in the reference.
 2) No jar manifest lists charsetOverlay.xul under "toolkit.jar".

Is the new c-c location I have for charsetOverlay.xul OK? If not, what path should I place it at?

What about the corresponding dtd?

What jar.mn lines should I be adding?

Should I change the referring PIs as a result? How?
Attached patch c-c WIP that doesn't work, still (obsolete) — Splinter Review
(Re-attaching my latest patch in case it differs from what I had already attached.)
Attachment #8407380 - Attachment is obsolete: true
Flags: needinfo?(neil)
(In reply to Henri Sivonen from comment #30)
> Even after a deleting the obj dir and building afresh, when starting
> Thunderbird with these patches applied, I still get:
> 
> Chrome file doesn't exist:
> /opt/Projects/obj-comm/mozilla/dist/bin/chrome/toolkit/content/global/
> charsetOverlay.xul
> 
> AFAICT:
>  1) No XUL file references to charsetOverlay.xul with "toolkit" in the
> reference.
>  2) No jar manifest lists charsetOverlay.xul under "toolkit.jar".
> 
> Is the new c-c location I have for charsetOverlay.xul OK? If not, what path
> should I place it at?
> 
> What about the corresponding dtd?
> 
> What jar.mn lines should I be adding?
> 
> Should I change the referring PIs as a result? How?

OK, so we need a quick tour of jar manifests to work out what we need.
1. Jar lines. Historically chrome was stored in individual jar files and these lines named those files. Nowadays they specify the subfolder (without the .jar suffix) in the omni.jar (or in the file system in your local build) at which all the entries are placed. This can mean that the name appears in error messages. You can place files from more than one source path into the same jar by specifying the same jar line, although you can only register any chrome URL once, and the files have to build into the same omni.jar (e.g. you can't mix browser and toolkit files in the same jar.mn; I'm not sure how that bit works, probably something in the moz.build files).
2. Registration lines. These match up chrome URLs with subfolders within the above jar folder. The format for content and skin types is % [type] [package] %[path] (I can't remember what the format for locale types is). This maps files stored under the given path to a chrome://package/type/ URL. There are also other registration lines that are not relevant here.
3. Entries. These map files stored in the source tree to the registered path. The format is {modifiers} [jarpath] {sourcepath}.

Therefore given a file in the source tree, you find its corresponding jar.mn manifest, look for the entry with its sourcepath, find the registration prefix matching its jarpath, and you can then figure out the chrome: URL for the file.

Now, with your patch as written, you added some entries to jar.mn under the messenger subfolder, but you gave them a content/global path which isn't registered anywhere.

The easiest way to make them available is to given them a content/messenger path which is registered to a chrome://messenger/ URL, however as you noticed you will need to change the referring PIs as a result.

I would prefer that you add all of your chrome files somewhere under mail/ as SeaMonkey no longer uses them. (Unfortunately we still have one shared use of the RDF data source, so we can't eliminate that yet.)
Flags: needinfo?(neil)
I'll try looking at using the new CharsetMenu.jsm in Thunderbird tonight. Preventing the use of x-mac-croatian in the compose window in ESR 31 would be a massive blessing to me. :-)

(In reply to neil@parkwaycc.co.uk from comment #32)
> I would prefer that you add all of your chrome files somewhere under mail/
> as SeaMonkey no longer uses them. (Unfortunately we still have one shared
> use of the RDF data source, so we can't eliminate that yet.)

Three, actually: EditorSaveAsCharset.xul, am-server.xul, and folderProps.xul. Plus there are three mentions in suite/ and two in mail/.
Depends on: 999881
(In reply to Henri Sivonen (:hsivonen) from comment #30)
> Is the new c-c location I have for charsetOverlay.xul OK? If not, what path
> should I place it at?

If you wait until I land bug 999881, you don't have to worry about porting any UI, just the nsCharsetMenu.cpp RDF gunk...
(In reply to Joshua Cranmer [:jcranmer] from comment #34)
> (In reply to Henri Sivonen (:hsivonen) from comment #30)
> > Is the new c-c location I have for charsetOverlay.xul OK? If not, what path
> > should I place it at?
> 
> If you wait until I land bug 999881, you don't have to worry about porting
> any UI, just the nsCharsetMenu.cpp RDF gunk...

After bug 999881, where will c-c use the nsCharsetMenu RDF data source besides the various popup menus buried in prefs? Could those menus be replaced with hard-coded XUL menus that only contain a more reasonable set of encodings?
Oops. Comment 33 already listed the remaining users.
Neil, thanks for the jar manifest help. I'll probably need it for bug 943268 even if not for the bug now that bug 999881 is involved.

This patch applies on top of the patch for bug 999881. The result runs, but the message compose encoding menu is empty, because its creation function went away.

This patch makes the contents of the menus for incoming and outgoing under Preferences: Display: Advanced less crazy per earlier comments in this bug. (I know it's not cool to do that in a supposed code *move*, but if I don't do it here, those menus will remain crazy.)
Attachment #8410100 - Attachment is obsolete: true
(In reply to Henri Sivonen (:hsivonen) from comment #37)
> This patch applies on top of the patch for bug 999881. The result runs, but
> the message compose encoding menu is empty, because its creation function
> went away.

I didn't upload my patch of the message compose encoding menu switcharoo because I decided to try something a bit more ambitious and it blew up instead.

> This patch makes the contents of the menus for incoming and outgoing under
> Preferences: Display: Advanced less crazy per earlier comments in this bug.
> (I know it's not cool to do that in a supposed code *move*, but if I don't
> do it here, those menus will remain crazy.)

Could you upload a diff of the two versions?
What kind of ambitious? The new compose menu could very short and hard-coded into XUL, right?
(In reply to Henri Sivonen (:hsivonen) from comment #39)
> What kind of ambitious? The new compose menu could very short and hard-coded
> into XUL, right?

"If I try to have a charset not in the menu, institute a fallback" (the intent was to explicitly prevent x-mac-croatian from being a usable charset).
(In reply to Joshua Cranmer [:jcranmer] from comment #38)
> Could you upload a diff of the two versions?

A diff isn't as useful as a description:

1) Make intl.charsetmenu.mailedit non-localizable (changing the corresponding function calls to read it in nsCharsetMenu.cpp).
2) Change intl.charsetmenu.mailedit from 
ISO-8859-1, ISO-8859-15, ISO-8859-6, armscii-8, ISO-8859-13, ISO-8859-14, ISO-8859-2, GB2312, GB18030, Big5, KOI8-R, windows-1251, KOI8-U, ISO-8859-7, ISO-8859-8-I, windows-1255, ISO-2022-JP, EUC-KR, ISO-8859-10, ISO-8859-3, TIS-620, ISO-8859-9, UTF-8, VISCII
to
UTF-8, ISO-8859-1, ISO-8859-2, GB2312, GB18030, KOI8-R, windows-1251, KOI8-U, ISO-8859-7, windows-1255, ISO-2022-JP, EUC-KR, TIS-620, ISO-8859-9
3) Add this filtering hack (the && part) to the decoder list enumeration loop in nsCharsetMenu.cpp:

+    // XXX hack. Filter out stuff that should not appear in a menu anywhere.
+    if (NS_SUCCEEDED(rv) && (value.EqualsLiteral("UTF-8") ||
+                             value.EqualsLiteral("Shift_JIS") ||
+                             value.EqualsLiteral("TIS-620") ||
+                             !value.Find("windows-") ||
+                             !value.Find("ISO-") ||
+                             !value.Find("gb") ||
+                             !value.Find("GB") ||
+                             !value.Find("Big5") ||
+                             !value.Find("KOI") ||
+                             !value.Find("HZ") ||
+                             !value.Find("EUC-"))) {
+      aArray.AppendElement(value);
+    }

The intl.charsetmenu.mailedit change above is rather conservative. I'd still like to advocate trimming it down to:
UTF-8
ISO-2022-JP
gb18030
EUC-KR
(In reply to Henri Sivonen (:hsivonen) from comment #41)
> (In reply to Joshua Cranmer [:jcranmer] from comment #38)
> > Could you upload a diff of the two versions?
> 
> A diff isn't as useful as a description:
> 
> 1) Make intl.charsetmenu.mailedit non-localizable (changing the
> corresponding function calls to read it in nsCharsetMenu.cpp).
> 2) Change intl.charsetmenu.mailedit from 
> ISO-8859-1, ISO-8859-15, ISO-8859-6, armscii-8, ISO-8859-13, ISO-8859-14,
> ISO-8859-2, GB2312, GB18030, Big5, KOI8-R, windows-1251, KOI8-U, ISO-8859-7,
> ISO-8859-8-I, windows-1255, ISO-2022-JP, EUC-KR, ISO-8859-10, ISO-8859-3,
> TIS-620, ISO-8859-9, UTF-8, VISCII
> to
> UTF-8, ISO-8859-1, ISO-8859-2, GB2312, GB18030, KOI8-R, windows-1251,
> KOI8-U, ISO-8859-7, windows-1255, ISO-2022-JP, EUC-KR, TIS-620, ISO-8859-9

No ISO-8859-15? That's #4 on my list of "most common charsets," after ISO-8859-1, UTF-8, and US-ASCII. The lack of Big5 (#9) is also slightly surprising to me. ISO-8859-6 I'm on the fence about, but the other missing charsets seem to me like they can very easily go still.
ISO-8859-15 post-dates UTF-8 and doesn't solve for email anything not already solved by windowws-1252. ISO-8859-15 lacks all practical merit and is about anti-Microsoft political posturing, ISO NIH and misguided reverence of the C1 control range. Its usage ranking doesn't mean that people need to be able to use it instead of UTF-8.
As for Big5, zh-TW TB alredy defaults to UTF-8 and Big5 is the flakiest of our converters.
In general, the compose menu should have emcodings that have use cases where they are needed instead of UTF-8. Only ISO-2022-JP has a stated use case like that...
(In reply to Henri Sivonen (:hsivonen) from comment #45)
> In general, the compose menu should have emcodings that have use cases where
> they are needed instead of UTF-8. Only ISO-2022-JP has a stated use case
> like that...

I can see people claiming that GB18030 is necessary (since, unlike many legacy charsets, GB18030 actually does have full Unicode support). I recall one of the force-UTF-8 bugs had a commenter suggest that the Chinese government mandated GB18030 support but I am not an expert here and can do nothing but report hearsay.

On the other hand, I think I will have effectively killed all uses of the mail-encodings set in the dependent bug anyways... :-)
Attachment #8410083 - Flags: review?(bmcbride) → review+
(In reply to Joshua Cranmer [:jcranmer] from comment #46)
> On the other hand, I think I will have effectively killed all uses of the mail-encodings set in the
> dependent bug anyways... :-)

There's still the outgoing mail pref buried under under Preferences: Display: Advanced. Furthermore, nsCharsetMenu is still used to populate the outgoing pref in the same window as well as the per-server setting in the NNTP server settings.

So what should the landing plan for this bug to be?

The m-c patch has been reviewed and could land without breaking Thunderbird's primary UI. Landing the patch would break some peripheral UI in Thunderbird, though: the outgoing/incoming mail pref UI and the NNTP per-server encoding pref UI.

At this point, I think there isn't much value in moving nsCharsetMenu into c-c. I think it would make more sense to get rid of the RDF data source altogether and too hard-code the above-mentioned pref pop-up menus in XUL.

My preferred option would be landing the m-c patch now and leaving it to Thunderbird developers to fix the peripheral UI as a follow-up, which would need uplift to Thunderbird 31 if fixed after ESR branches. Alternatively, I could wait until ESR has branched, but that would leave x-mac-croatian in the NNTP settings for another ESR cycle.
Flags: needinfo?(Pidgeot18)
(In reply to Henri Sivonen (:hsivonen) from comment #47)
> At this point, I think there isn't much value in moving nsCharsetMenu into
> c-c. I think it would make more sense to get rid of the RDF data source
> altogether and too hard-code the above-mentioned pref pop-up menus in XUL.

I don't know how hard it is to get rid of the last several RDF uses. I don't want to break these in 31, but since the most important UIs have evil charsets disabled (with tests!), the worst should be over.

> My preferred option would be landing the m-c patch now and leaving it to
> Thunderbird developers to fix the peripheral UI as a follow-up, which would
> need uplift to Thunderbird 31 if fixed after ESR branches. Alternatively, I
> could wait until ESR has branched, but that would leave x-mac-croatian in
> the NNTP settings for another ESR cycle.

You can no longer compose a message in x-mac-croatian. Once I land bug 790855, you can't use it in the RFC 2047 encoding either, so the damage here is effectively neutralized.
Flags: needinfo?(Pidgeot18)
(In reply to Joshua Cranmer [:jcranmer] from comment #48)
> (In reply to Henri Sivonen (:hsivonen) from comment #47)
> > At this point, I think there isn't much value in moving nsCharsetMenu into
> > c-c. I think it would make more sense to get rid of the RDF data source
> > altogether and too hard-code the above-mentioned pref pop-up menus in XUL.
> 
> I don't know how hard it is to get rid of the last several RDF uses. I don't
> want to break these in 31, but since the most important UIs have evil
> charsets disabled (with tests!), the worst should be over.

I'll wait until ESR has branched and then I'll land the m-c patch. At this point, adding nsCharsetMenu to c-c doesn't seem worthwhile, since, per below, the existing consumers will be wrong until changed and there's a whole ESR cycle of time to change them. Having nsCharsetMenu go away altogether will make it easier to get rid of charsetData.properties and charsetTitles.properties.

> > My preferred option would be landing the m-c patch now and leaving it to
> > Thunderbird developers to fix the peripheral UI as a follow-up, which would
> > need uplift to Thunderbird 31 if fixed after ESR branches. Alternatively, I
> > could wait until ESR has branched, but that would leave x-mac-croatian in
> > the NNTP settings for another ESR cycle.
> 
> You can no longer compose a message in x-mac-croatian. Once I land bug
> 790855, you can't use it in the RFC 2047 encoding either, so the damage here
> is effectively neutralized.

Well, then it's still bad UI to have the pref UI show options that never take effect.
Summary: Move nsCharsetMenu to comm-central → Remove nsCharsetMenu from m-c
Filed bug 1003716 as a Thunderbird reminder. Thanks everyone!
https://hg.mozilla.org/mozilla-central/rev/9eb4f13a78d8
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
Depends on: 1005176
See Also: → 1130533
You need to log in before you can comment on or make changes to this bug.