Last Comment Bug 699673 - For certain pages in UTF-16, the label of the tab that shows the page gets messed up.
: For certain pages in UTF-16, the label of the tab that shows the page gets me...
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: Internationalization (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla10
Assigned To: Kang-Hao (Kenny) Lu [:kennyluck]
:
Mentors:
: 396636 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-11-03 17:52 PDT by Kang-Hao (Kenny) Lu [:kennyluck]
Modified: 2011-11-07 03:43 PST (History)
6 users (show)
smontagu: in‑testsuite+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
proposed patch with mochitest attached (2.88 KB, patch)
2011-11-03 19:33 PDT, Kang-Hao (Kenny) Lu [:kennyluck]
no flags Details | Diff | Review
Comments addressed. (3.05 KB, patch)
2011-11-04 06:45 PDT, Kang-Hao (Kenny) Lu [:kennyluck]
smontagu: review+
smontagu: checkin+
Details | Diff | Review

Description Kang-Hao (Kenny) Lu [:kennyluck] 2011-11-03 17:52:58 PDT
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:7.0.1) Gecko/20100101 Firefox/7.0.1
Build ID: 20110928134238

Steps to reproduce:

Open data:text/html,%FE%FF


Actual results:

The label of the tab opening the page got messed up with something like "摡瑡㩴數琯桴浬Ⱕ䙅╆"


Expected results:

The label of the tab should be something readable, like something starting with "data:text/html,"
Comment 1 Kang-Hao (Kenny) Lu [:kennyluck] 2011-11-03 19:33:26 PDT
Created attachment 571873 [details] [diff] [review]
proposed patch with mochitest attached

I don't think it ever makes sense to unescape the URI with a non-ASCII superset encoding, and hence the proposed fix. The list of encodings was from nsHtml5StreamParser::PreferredForInternalEncodingDecl . AFAICT, the only two places where nsTextToSubURI::UnEscapeNonAsciiURI is used in mozilla trunk are:

1. setTabTitle in tabbrowser.xml when the title is set to empty. This relates to the bug I found. The encoding is from the content document.
2. nsJSProtocolHandler::EnsureUTF8Spec, where it seems that the only possible encoding is UTF-8 (though I can't explain why there's that needless aCharset there), so it should be unaffected by this patch.

I was tempted to change nsTextToSubURI::convertURItoUnicode but that function is used by nsTextToSubURI::UnEscapeURIForUI which is in turn used by nsLocation::getHash and WorkerPrivateParent<Derived>::SetBaseURI (why?) and I don't have the bandwidth to check if changing nsTextToSubURI::convertURItoUnicode would affect standard compliance.

The patch checks isUTF8 before returning the original URI because for a much more common case when the user opens a file encoded in UTF-16 with a non-ASCII filename (or URL).

By the way, I have two questions being a newbie to this module (and the Mozilla project):

1. How do I run mochitests in intl/uconv/tests/unit/ ? I don't even know how to run my test.. sorry
2. What is the purpose of the stateful charset check there in don't have the bandwidth to check if changing nsTextToSubURI::convertURItoUnicode ?
Comment 2 Kang-Hao (Kenny) Lu [:kennyluck] 2011-11-03 19:36:53 PDT
(In reply to KangHao (Kenny) Lu from comment #1)
> 2. What is the purpose of the stateful charset check there in don't have the
> bandwidth to check if changing nsTextToSubURI::convertURItoUnicode ?

Copy & Paste error. Oops.

2. What is the purpose of the stateful charset check in nsTextToSubURI::convertURItoUnicode ?
Comment 3 Simon Montagu :smontagu 2011-11-04 05:42:52 PDT
Comment on attachment 571873 [details] [diff] [review]
proposed patch with mochitest attached

Review of attachment 571873 [details] [diff] [review]:
-----------------------------------------------------------------

::: intl/uconv/src/nsTextToSubURI.cpp
@@ +263,5 @@
> +       aCharset.LowerCaseEqualsLiteral("utf-7") ||
> +       aCharset.LowerCaseEqualsLiteral("jis_x0212-1990") ||
> +       aCharset.LowerCaseEqualsLiteral("x-jis0208") ||
> +       aCharset.LowerCaseEqualsLiteral("x-imap4-modified-utf7") ||
> +       aCharset.LowerCaseEqualsLiteral("x-user-defined"))){

x-user-defined is an ASCII superset and x-jis0208 and jis_x0212-1990 are not supported charsets, so they don't need to be here.

::: intl/uconv/tests/unit/test_bug699673.js
@@ +1,1 @@
> +// Tests whether nsTextToSubURI does UTF-16 unesacpaing (it shouldn't)

"unescaping"

You also need to add the test to the manifest, intl/uconv/tests/unit/xpcshell.ini.

> 1. How do I run mochitests in intl/uconv/tests/unit/ ?

These are in fact xpcshell-tests. To run all the tests in the directory:

make -C {objdir}/intl/uconv/tests xpcshell-tests

To run just your test:
make SOLO_FILE="test_bug699673.js" -C {objdir}/intl/uconv/tests check-one

For more information, see https://developer.mozilla.org/en/Writing_xpcshell-based_unit_tests
Comment 4 Kang-Hao (Kenny) Lu [:kennyluck] 2011-11-04 06:41:07 PDT
(In reply to Simon Montagu from comment #3)
> ::: intl/uconv/src/nsTextToSubURI.cpp
> @@ +263,5 @@
> > +       aCharset.LowerCaseEqualsLiteral("utf-7") ||
> > +       aCharset.LowerCaseEqualsLiteral("jis_x0212-1990") ||
> > +       aCharset.LowerCaseEqualsLiteral("x-jis0208") ||
> > +       aCharset.LowerCaseEqualsLiteral("x-imap4-modified-utf7") ||
> > +       aCharset.LowerCaseEqualsLiteral("x-user-defined"))){
> 
> x-user-defined is an ASCII superset and x-jis0208 and jis_x0212-1990 are not
> supported charsets, so they don't need to be here.

OK, good to know. Fixed.
 
> ::: intl/uconv/tests/unit/test_bug699673.js
> @@ +1,1 @@
> > +// Tests whether nsTextToSubURI does UTF-16 unesacpaing (it shouldn't)
> 
> "unescaping"

Oops. Fixed.
 
> You also need to add the test to the manifest,
> intl/uconv/tests/unit/xpcshell.ini.

Done.

> > 1. How do I run mochitests in intl/uconv/tests/unit/ ?
> 
> These are in fact xpcshell-tests. To run all the tests in the directory:

Ah, I guess I had the misconception that all JS tests are of the same type.
Comment 5 Kang-Hao (Kenny) Lu [:kennyluck] 2011-11-04 06:45:10 PDT
Created attachment 571953 [details] [diff] [review]
Comments addressed.

By the way, I found a typo in my comment. s/converting "http:" such an encoding/converting "http:" with such an encoding/ .
Comment 6 Simon Montagu :smontagu 2011-11-06 02:34:57 PST
*** Bug 396636 has been marked as a duplicate of this bug. ***
Comment 7 Philip Chee 2011-11-06 03:35:48 PST
> +       aCharset.LowerCaseEqualsLiteral("utf-7") ||
I thought we dropped support for utf-7 on trunk?
Comment 8 Simon Montagu :smontagu 2011-11-06 07:09:10 PST
(In reply to Philip Chee from comment #7)
> > +       aCharset.LowerCaseEqualsLiteral("utf-7") ||
> I thought we dropped support for utf-7 on trunk?

We did, but we didn't remove the decoder completely, so I think it's safer to leave the condition for utf-7 in here.

Ideally, we should have an API for IsCharsetASCIICompatible and not have all those ifs anyway (bug 475351)
Comment 9 Simon Montagu :smontagu 2011-11-06 07:17:03 PST
Comment on attachment 571953 [details] [diff] [review]
Comments addressed.

https://hg.mozilla.org/integration/mozilla-inbound/rev/a0ec48842f56
Comment 10 Kang-Hao (Kenny) Lu [:kennyluck] 2011-11-06 08:48:15 PST
(In reply to Simon Montagu from comment #9)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/a0ec48842f56

Thanks for the r+ and the checkin!

(In reply to Simon Montagu from comment #8)
> Ideally, we should have an API for IsCharsetASCIICompatible and not have all
> those ifs anyway (bug 475351)

Agreed (I had the same comment when I was reading the HTML5 parser code), though I would assume IsCharsetASCIICompatible to be a property on nsIConverter and this part should really get moved to convertURItoUnicode after my concern in Comment 1 is addressed.


Uh... A question about procedure. Is it supposed to be the patch lander who appends the "r=xx" string in the commit message? I am asking this because last time (my first patch) I appended the string myself after getting r+ and since I didn't want to annoy the reviewer again (nothing changed besides the "r=xx" string in the patch), I gave myself a "kennyluck: review+", which is just... odd.

I am guessing I never need to put "r=xx" string in the commit message myself? Or should put that in before requesting review?
Comment 11 Philip Chee 2011-11-06 12:07:35 PST
> I am guessing I never need to put "r=xx" string in the commit message myself? Or should
> put that in before requesting review?
Once you get a review+, you should add a "r=xx" to your commit message.

And thank you for your patch!
Comment 12 Kang-Hao (Kenny) Lu [:kennyluck] 2011-11-06 15:33:49 PST
(In reply to Philip Chee from comment #11)
> > I am guessing I never need to put "r=xx" string in the commit message myself? Or should
> > put that in before requesting review?
>
> Once you get a review+, you should add a "r=xx" to your commit message.
>

I mean, then I need to make the patch that got reviewed obsolete right? My question is, how do I deal with the review flag for that trivial update? I can

1) Ask for the reviewer's review again (r?)
2) Give myself r+
3) Don't set the review flag

1) and 2) don't make much sense so I guess I should do 3) ?

Sorry for this rather .... dumb question, but an answer will kill my confusion.
Comment 13 Philip Chee 2011-11-06 19:25:46 PST
Oh sorry I thought you had commit rights. Whoever checks in the patch should be the person to add the review/approval flags to the commit message.
Comment 14 Kang-Hao (Kenny) Lu [:kennyluck] 2011-11-06 19:30:51 PST
(In reply to Philip Chee from comment #13)
> Oh sorry I thought you had commit rights. Whoever checks in the patch should
> be the person to add the review/approval flags to the commit message.

Oh, OK. Completely understood now. Thanks for the help!
Comment 15 Marco Bonardo [::mak] 2011-11-07 03:43:33 PST
https://hg.mozilla.org/mozilla-central/rev/a0ec48842f56

Note You need to log in before you can comment on or make changes to this bug.