Closed Bug 611580 Opened 9 years ago Closed 9 years ago

Character Encoding Menu for Fennec

Categories

(Firefox for Android Graveyard :: General, defect, major)

defect
Not set
major

Tracking

(fennec2.0+)

VERIFIED FIXED
Tracking Status
fennec 2.0+ ---

People

(Reporter: m_kato, Assigned: wesj)

Details

(Keywords: uiwanted, Whiteboard: [ime])

Attachments

(5 files, 3 obsolete files)

It is necessary for Japanese pages because JPN uses some encodings (Shift JIS, EUC-JP, JIS and UTF8).

See bug 575182 comment #14 (it is Firefox menu issue) why this menu is necessary by default.
tracking-fennec: --- → ?
Keywords: uiwanted
Where on earth will we put this?
Can we show it only for relevant locales?

Conceptually, it belongs in the site menu, but I don't want it there all the time.
For comparison, the stock Android Browser has a "Text encoding" option in its settings screen.  I'm not sure how this interacts with per-page auto detection.
If it has to be available, somehow, all the time (i.e., not just for certain locales), I'd stick an option under content in the prefs panel

Show Text Encoding  [yes|no]

which would mean a permanent "Text Encoding" item in the Site Menu.  Then for certain locales, this could be set to ON by default.
I was just looking at the desktop UI for this and it's... quite extensive.  What's the scope of what we need to include on mobile?
Also - is this a app-wide setting or a per-page setting?
The android browser UI suggests that it's an app-wide setting, but that suggests a different UI -- not in the site menu, just an item in prefs.
(In reply to comment #2)
> Can we show it only for relevant locales?
> 
> Conceptually, it belongs in the site menu, but I don't want it there all the
> time.

perhaps only for relevant locales and only on pages that don't specify an encoding (see the explanation in bug 575182 comment #14)
tracking-fennec: ? → 2.0+
Assignee: nobody → wjohnston
Attached patch WIP (obsolete) — Splinter Review
Here's a WIP on this. Adds an entry to the site menu for "Character Set". The shown list includes everything we have on desktop, unsorted, with none of the submenu things that happen there. It's all built through a fake menu/menupopup element built in javascript to avoid inserting new XUL into the document.

I'm trying to show/hide this based on whether or not the site included a character encoding in its httpResponse or through a meta-tag inside the page. That seems to work, but the code feels clunky. I'd like to find better ways to parse and look at that stuff.
Talked to mwu about this for a bit. I think we agree that showing/hiding this based on what the site says will likely result in users not having it when they need it. I need to rethink plans for where this will go, how/when to show it.

Stock browser (on my phone at least) only shows a few options:
Latin-1 (ISO-8859-1)
Unicode (UTF-8)
Chinese (GBK)
Chinese (Big5)
Japanese (ISO-2022-JP)
Japanese (SHIFT_JIS)
Japanese (EUC-JP)

Unless there is some argument, I'll adjust this to only include those specific entries. That will be a pref anyway, and should be easy to adjust in the future.
Could we have a screenshot of this? What happens in the worst case (web page not readable, I guess)?
Would the code page be a browser-wide setting or per page?
Attached image Screenshot (obsolete) —
Not sure that this is what you want a screenshot of, but this shows the UI. This is most commonly used in Asian languages. I can dig around for an example page if that is what we want.
The UI looks good. We also wanted to see what a webpage, in the incorrect encoding, looked like in Fennec.
Here is a screenshot of the japanese wikipedia mojibake site, rendered with two different encodings. In this case I had to force Fennec to screw up because the page specifies its encoding. Finding sites that do this is hard, especially for English speakers (at least I'm having trouble finding some, maybe someone from a different country can give us a better idea?)

As far as I understand this, if/when sites don't specify an encoding (either in a meta-tag or in the HTTPResponse header) Gecko has fallbacks that attempt to determine the the encoding. In Firefox you can specify the detector using the "Auto-Detect" setting. If both of those fail we fallback to ISO-8859-1. This does not provide UI for either of those settings (atuo-detect or default). I think the Android default browser likely only allows you to set the default, as I can't get it to display Wikipedia like this.

However, its likely that the biggest problem is sites that say they are in one encoding, but are actually in another.
How about adding a "More" button at the bottom of this dialog? We could put this behind there by default (and other things if the list of page actions ever grows too long). That vaguely matches the Android model from the menu button (where we also need to add a "More" button).
My strong preference, if it will solve the problem, is to have a setting in prefs the way the android browser does.  I think that I don't fully understand the problem though.  Is it

1 - that people need to over-ride the encoding setting for individual pages that have set their encoding incorrectly?

2a - that people want to specify their preferred encoding that will get applied to all sites that don't specify one?

2b - that people want to specify an encoding that will be applied to all sites no matter what they themselves specify?

something else?

Does it have to be a site by site thing?
I think at this point I would rather move this whole thing into an extension.

BUT, if we are seeing demand from some set of users I am fine with moving to the preference pane at this point. There are three settings in Firefox:

1.) A setting for which auto-detect mechanism will be used. We use this is the page doesn't tell us what character encoding it has to make a guess.
3.) A setting for the fallback if we auto-detect fails (there is no UI for this in Firefox)
2.) A setting for the encoding used for the current page, you can use this to override whatever the site/autodetect decided.

#3 is inherently a per-site setting, and is what I was trying to expose. If we move this to the preference pane, there is some question about which of these to expose there. Despite it not making sense from a user perspective, I would lean towards using it to override the current site Character Encoding, as that seems more in line with what they are trying to change.

I am adding mwu and Masayuki Nakano as they seem to know a bit more about this, and can correct me if I've said something wrong (sorry for the extra spam guys).
So, this is one of those cases where something is _never_ used by one large segment of the user population, and _very frequently_ used by another large segment of the user population.  Also, it doesn't sound like we can be clever about knowing when to hide it or when to show it.  Which suggests to me... user preference.

As I was suggesting in c4 above, before I actually understood the problem, is to put a pref in preferences like so:

Show Text Encoding [YES|NO]

I would default this to NO, though possibly some locales where this bug is more of a problem would default it to YES.

If it is set to yes, we would add the following to the Site Menu:

------------------
Set Text Encoding       (major title)
$CURRENT_SETTING        (subtitle - this is the encoding called for by the page)
------------------

If the user taps, they get a popup list, as in wesj's screenshot in comment 12.  The user can override the Text Encoding for the current page by changing it.  When the user changes it, the page changes to use that encoding.  This is a per-page setting.

Does that seem reasonable?
(In reply to comment #17)
> I think at this point I would rather move this whole thing into an extension.
> 
> BUT, if we are seeing demand from some set of users I am fine with moving to
> the preference pane at this point. There are three settings in Firefox:
> 
> 1.) A setting for which auto-detect mechanism will be used. We use this is the
> page doesn't tell us what character encoding it has to make a guess.
> 3.) A setting for the fallback if we auto-detect fails (there is no UI for this
> in Firefox)

See "Content" panel on pref dialog and press the "Advance..." button. Then, you can see the "Default Character Encoding" on bottom of the dialog. I think that autodetect cannot decide what is the right encoding, this encoding will be used.
# I'm not sure the detail of the fallback rules.

> 2.) A setting for the encoding used for the current page, you can use this to
> override whatever the site/autodetect decided.
> 
> #3 is inherently a per-site setting, and is what I was trying to expose. If we
> move this to the preference pane, there is some question about which of these
> to expose there. Despite it not making sense from a user perspective, I would
> lean towards using it to override the current site Character Encoding, as that
> seems more in line with what they are trying to change.
> 
> I am adding mwu and Masayuki Nakano as they seem to know a bit more about this,
> and can correct me if I've said something wrong (sorry for the extra spam
> guys).

I'm not familiar with mobile browser UI. We should research other mobile browsers' UI. Kato-san, do you have some advice?

This UI's difficult issue is, the UI is used only few times a week (or a month) because nowadays, most web page authors don't write HTML documents with their hand, they're typically using CMS or Blog. On the other hand, unfortunately, there are still existing some legacy web pages which were written when web standards are not popular than now. If we met such page, we need to use this UI.

So, the UI doesn't need to be one of the easiest accessible UI. But the UI should be found easily when users want. The most important thing about the UI is that users see the UI which opens the list of encoding in daily use. If so, users don't be confused.
(In reply to comment #17)
> I think at this point I would rather move this whole thing into an extension.
> 
> BUT, if we are seeing demand from some set of users I am fine with moving to
> the preference pane at this point. There are three settings in Firefox:
> 
> 1.) A setting for which auto-detect mechanism will be used. We use this is the
> page doesn't tell us what character encoding it has to make a guess.
> 3.) A setting for the fallback if we auto-detect fails (there is no UI for this
> in Firefox)
> 2.) A setting for the encoding used for the current page, you can use this to
> override whatever the site/autodetect decided.
> 
> #3 is inherently a per-site setting, and is what I was trying to expose. If we
> move this to the preference pane, there is some question about which of these
> to expose there. Despite it not making sense from a user perspective, I would
> lean towards using it to override the current site Character Encoding, as that
> seems more in line with what they are trying to change.

On mobile site, although Shift-JIS is used without meta element or http header in Japan, Fennec/Firefox sometimes detects wrong charset. So it causes character corruption (mojibake in Japanese).  Although Mozilla's auto-detection code is smarter than WebKit(icu) or IE(mlang), there is no algorithm to detect it perfectly.  If it is another language such as Chinese, they has several charsets such as GBK and etc, it is same situation.

(In reply to comment #19)
> I'm not familiar with mobile browser UI. We should research other mobile
> browsers' UI. Kato-san, do you have some advice?

Although it is not good to add many many items to site panel, it is best to add charset menu to it.  It should choose L10N team per language like Firefox menu or by the preferences whether charset menu in site panel like Find In Page.. As design, https://bug611580.bugzilla.mozilla.org/attachment.cgi?id=505441 is good way (I have no good idea except to it).

Charset is per contents, not per browsers.  So I think that charset menu of Android's default browser is bad design.
I think Kato-san is right.  If you want a test page for different encoding, see the links down at the bottom of this web page: http://linear.mv.com/cgi-bin/j-e

The encoding is on a per webpage basis.
Attached patch Patch (obsolete) — Splinter Review
OK. Sounds like they want this, so I'm getting it rolling. This adds an entry to preferences (at the very bottom) reading "Show Character Encoding", which toggles the menu on and off. The default value for this is stored in a pref that can be localized for different languages.

When its on, things work like I showed before for the most part. A "Character Encoding" option in the site menu which lets you switch between a few carefully chosen encodings. That list is also localizable. The current encoding is shown below it a subtitle on the item. I'll attach a shot.

I'm not quite sure if Gecko gives us better ways to find out the current encoding, and about where to store it. I'm using documentCharsetInfo here, but the documentation about it is... sparse. I'm also using Places to store the encoding the same way desktop does, but I don't think that actually does anything in us, or in desktop.
Attachment #513327 - Flags: feedback?(mark.finkle)
Comment on attachment 513327 [details] [diff] [review]
Patch

Oops. Might as well ask for review.
Attachment #513327 - Flags: feedback?(mark.finkle) → review?(mark.finkle)
Attached image MenuItem Screenshot
Attachment #505441 - Attachment is obsolete: true
Attached image Prefs shot
And the prefs screen with this.
Attachment #504907 - Attachment is obsolete: true
Comment on attachment 513327 [details] [diff] [review]
Patch

>diff --git a/app/mobile.js b/app/mobile.js

>+// Whether the character encoding menu is under the main Firefox button. This
>+// preference is a string so that localizers can alter it.
>+pref("browser.menu.showCharacterEncoding", "chrome://browser/locale/browser.properties");
>+pref("intl.charsetmenu.browser.menu", "chrome://browser/locale/browser.properties");

"intl.charsetmenu.browser.static", to match desktop?

>diff --git a/chrome/content/bindings/browser.xml b/chrome/content/bindings/browser.xml

>       <property name="documentCharsetInfo"

>+            return {
>+               forcedCharset : this._charset,
>+               forcedDetector : false,
>+               parentCharset : "",
>+               parentCharsetSource : 0,
>+            }

drop the trailing comma here

>diff --git a/chrome/content/bindings/setting.xml b/chrome/content/bindings/setting.xml

>+  <binding id="setting-localized-bool" extends="chrome://browser/content/bindings/setting.xml#setting-bool">

>+        <![CDATA[
>+          let val = Services.prefs.getComplexValue(this.pref,
>+                        Components.interfaces.nsIPrefLocalizedString).data;

Feel free to not wrap

>+          let pref = Components.classes["@mozilla.org/pref-localizedstring;1"]
>+                              .createInstance(Components.interfaces.nsIPrefLocalizedString);

Same

>diff --git a/chrome/content/browser.css b/chrome/content/browser.css

> setting[type="bool"] {
>   -moz-binding: url("chrome://browser/content/bindings/setting.xml#setting-bool");
> }
> 
>+setting[type="localized-bool"] {

I like this as: setting[type="bool"][localized="true"]

It keeps the type cleaner

>diff --git a/chrome/content/browser.xul b/chrome/content/browser.xul
>--- a/chrome/content/browser.xul
>+++ b/chrome/content/browser.xul
>@@ -50,16 +50,18 @@
> %browserDTD;
> <!ENTITY % brandDTD SYSTEM "chrome://branding/locale/brand.dtd">
> %brandDTD;
> <!ENTITY % prefsDTD SYSTEM "chrome://browser/locale/preferences.dtd">
> %prefsDTD;
> #ifdef MOZ_SERVICES_SYNC
> <!ENTITY % syncDTD SYSTEM "chrome://browser/locale/sync.dtd">
> %syncDTD;
>+<!ENTITY % charsetDTD SYSTEM "chrome://global/locale/charsetOverlay.dtd" >
>+%charsetDTD;
> #endif

Let's just put the single string into browser.dtd and skip loading this file. We already need to load a string from preferences.dtd

Also, you put this DTD in the MOZ_SERVICES_SYNC preprocessor block

>diff --git a/chrome/content/common-ui.js b/chrome/content/common-ui.js

>+  _getPrefCharSets: function(aPref) {
>+    return Services.prefs.getComplexValue("intl.charsetmenu.browser.menu",
>+                        Components.interfaces.nsIPrefLocalizedString).data.split(",");
>+  },

Ci and no wrap. Can we memoize this? The pref should not change during a run, so we only need to read it once, right?

>diff --git a/chrome/content/content.js b/chrome/content/content.js


>+var CharsetObserver = {
>+  init: function() {

>+  },
>+
>+  receiveMessage: function(aMessage) {

>+  }

Let's just put this into Content

r-, but looks good and follows desktop very nicely
Attachment #513327 - Flags: review?(mark.finkle) → review-
Attached patch Patch v3Splinter Review
Addressed comments
Attachment #513327 - Attachment is obsolete: true
Attachment #513510 - Flags: review?(mark.finkle)
Attachment #513510 - Flags: review?(mark.finkle) → review+
Pushed: http://hg.mozilla.org/mobile-browser/rev/c7ec6010dc25
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
verified:
Mozilla/5.0 (Android; Linux armv71; rv2.0b12pre) Gecko/20110222 Firefox/4.0b12pre Fennec/4.0b6pre
Device: Droid 2 
OS: Android 2.2

Mozilla/5.0 (Maemo; Linux armv71; rv:2.0b12pre) Gecko/20110222 Firefox/4.0b12pre Fennec/4.0b6pre
Device: Maemo N900

Note: you have to turn on encoding in preferences as stated above.
Status: RESOLVED → VERIFIED
Flags: in-litmus?(nhirata.bugzilla)
This seems to default to "true" in my en-US build, though it looks like the intention of the patch and comments was to have it default to "false". Is this a bug?
It is set to false:

http://mxr.mozilla.org/mobile-browser/source/locales/en-US/chrome/browser.properties#220

and shows up as false in my builds.
litmus test cases : 49250, 49299, 49431, 49320, 49506,
Flags: in-litmus?(nhirata.bugzilla) → in-litmus+
You need to log in before you can comment on or make changes to this bug.