Closed Bug 801402 Opened 12 years ago Closed 12 years ago

Use EncodingUtils::FindEncodingForLabel instead of nsCharsetAlias::GetPreferred from HTML5 parser and DOM APIs

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla19

People

(Reporter: emk, Assigned: emk)

References

(Blocks 2 open bugs)

Details

(Keywords: dev-doc-needed)

Attachments

(8 files, 11 obsolete files)

5.77 KB, patch
sicking
: review+
RyanVM
: checkin+
Details | Diff | Splinter Review
1.95 KB, patch
sicking
: review+
RyanVM
: checkin+
Details | Diff | Splinter Review
1.95 KB, patch
hsivonen
: review+
RyanVM
: checkin+
Details | Diff | Splinter Review
2.45 KB, patch
khuey
: review+
hsivonen
: review+
RyanVM
: checkin+
Details | Diff | Splinter Review
27.40 KB, patch
hsivonen
: review+
RyanVM
: checkin+
Details | Diff | Splinter Review
13.55 KB, patch
emk
: review+
RyanVM
: checkin+
Details | Diff | Splinter Review
14.43 KB, patch
hsivonen
: review+
RyanVM
: checkin+
Details | Diff | Splinter Review
3.62 KB, patch
hsivonen
: review+
RyanVM
: checkin+
Details | Diff | Splinter Review
It will take a long time to align intl/uconv to Encoding Standard because of the backward compatibility.
Depends on: 801487
Assignee: nobody → VYV03354
Status: NEW → ASSIGNED
Attachment #672553 - Flags: review?(jonas)
Attachment #672554 - Flags: review?(jonas)
Attachment #672555 - Flags: review?(bugs)
Attachment #672557 - Flags: review?(bent.mozilla)
Attachment #672558 - Flags: review?(bugs)
Attachment #672559 - Flags: review?(hsivonen)
Attachment #672560 - Flags: review?(bugs)
Comment on attachment 672555 [details] [diff] [review]
Use FindEncodingForLabel from XSLT

Jonas or peterv should review changes to XSLT.
Attachment #672555 - Flags: review?(bugs) → review?(jonas)
Attachment #672560 - Flags: review?(bugs) → review?(hsivonen)
Attachment #672555 - Attachment is patch: true
Comment on attachment 672558 [details] [diff] [review]
Use FindEncodingForLabel from OS.File

Sorry Jonas, I think you should review this too.
Attachment #672558 - Flags: review?(bugs) → review?(jonas)
Comment on attachment 672553 [details] [diff] [review]
Make EncodingUtil methods case sensitive

This patch caused many test failures.
Attachment #672553 - Attachment is obsolete: true
Attachment #672553 - Flags: review?(jonas)
Comment on attachment 672560 [details] [diff] [review]
Use FindEncodingForLabel from XML parser

The new method returns a label in the lower case and the returned label is used in a context that expect a Gecko-canonical label. Unfortunately, those are not all lower-case, yet.
Attachment #672560 - Flags: review?(hsivonen) → review-
Comment on attachment 672559 [details] [diff] [review]
Use FindEncodingForLabel from HTML parser

Same problem as with the SAX patch.
Attachment #672559 - Flags: review?(hsivonen) → review-
Oh. I didn’t look at the patch that changed what EncodingUtils return. But looks like that’s obsolete now.
(In reply to Henri Sivonen (:hsivonen) from comment #13)
> Oh. I didn’t look at the patch that changed what EncodingUtils return. But
> looks like that’s obsolete now.
Yup, I'll attach an updated patch once test failures are resolved.
- Fixed some bonehead mistakes.
- Made EncodingUtils thread-safe.
Attachment #673022 - Flags: review?(jonas)
Ah, one hank lacked.
Attachment #673022 - Attachment is obsolete: true
Attachment #673022 - Flags: review?(jonas)
Attachment #673025 - Flags: review?(jonas)
Fixed some tests.
Attachment #672554 - Attachment is obsolete: true
Attachment #672554 - Flags: review?(jonas)
Attachment #673027 - Flags: review?(jonas)
Fixed a test.
Attachment #672557 - Attachment is obsolete: true
Attachment #672557 - Flags: review?(bent.mozilla)
Attachment #673028 - Flags: review?(bent.mozilla)
Henri, could you rethink on the premise of the change of EncodingUtils?
Attachment #672559 - Attachment is obsolete: true
Attachment #673032 - Flags: review?(hsivonen)
Attachment #672560 - Flags: review- → review?(hsivonen)
Comment on attachment 673025 [details] [diff] [review]
Make EncodingUtil methods case sensitive

Drive-by comments:
Adding a mutex seems unfortunate. I’d prefer a design that only reads the shared data and, therefore, doesn’t need a mutex. The old alias code does not need a mutex, because it works with plain old C data in a read-only fashion: http://mxr.mozilla.org/mozilla-central/source/intl/locale/src/nsUConvPropertySearch.cpp#10

>  CopyASCIItoUTF16(mEncoding, aEncoding);
>  nsContentUtils::ASCIIToLower(aEncoding);

Why ASCIIToUTF16 instead of UTF8ToUTF16? What does the ASCIIToUTF16 version do to non-ASCII bytes?
Attachment #672560 - Flags: review?(hsivonen) → review+
Comment on attachment 673032 [details] [diff] [review]
Use FindEncodingForLabel from HTML parser

>@@ -1186,21 +1188,18 @@ nsHtml5StreamParser::PreferredForInterna

>-  rv = nsCharsetAlias::GetPreferred(newEncoding, preferred);
>-  if (NS_FAILED(rv)) {
>+  if (!EncodingUtils::FindEncodingForLabel(newEncoding, preferred)) {
>     // This charset has been blacklisted for permitting XSS smuggling.
>     // EncMetaNonRoughSuperset is a reasonable approximation to the
>     // right error message.

For this code to continue to make sense, you need to change the earlier nsCharsetAlias::Equals call to use the same data as EncodingUtils::FindEncodingForLabel. Once you change that, if your new Equals method has the same rv semantics as the old one (the old one returns a failure rv for bogus labels rather than returning success rv and setting the boolean to false), this call should never fail, because unsupported encodings would have been caught by the Equals check returning a failure rv for a bogus label. If you introduce an equals check with more intuitive semantics (merely returns false for bogus labels), then you need to make sure the error messages still work the right way with bogus labels but catching the bogus labels by other means (or with FindEncodingForLabel here and rearranging the error messages).
Attachment #673032 - Flags: review?(hsivonen) → review-
In general, nsCharsetAlias::Equals calls need to be replaced with something that uses the same data EncodingUtils::FindEncodingForLabel.
(In reply to Henri Sivonen (:hsivonen) from comment #20)
> Adding a mutex seems unfortunate. I’d prefer a design that only reads the
> shared data and, therefore, doesn’t need a mutex. The old alias code does
> not need a mutex, because it works with plain old C data in a read-only
> fashion:
> http://mxr.mozilla.org/mozilla-central/source/intl/locale/src/
> nsUConvPropertySearch.cpp#10
I borrowed nsUConvPropertySearch. Mutex is not longer present. Now EncodingUtils is completely static class.

> >  CopyASCIItoUTF16(mEncoding, aEncoding);
> >  nsContentUtils::ASCIIToLower(aEncoding);
> 
> Why ASCIIToUTF16 instead of UTF8ToUTF16? What does the ASCIIToUTF16 version
> do to non-ASCII bytes?
mEncoding is alwas ASCII.
Attachment #673025 - Attachment is obsolete: true
Attachment #673025 - Flags: review?(jonas)
Attachment #673583 - Flags: review?(jonas)
Fixed one more test.
Attachment #673027 - Attachment is obsolete: true
Attachment #673027 - Flags: review?(jonas)
Attachment #673584 - Flags: review?(jonas)
(In reply to Henri Sivonen (:hsivonen) from comment #21)
> >@@ -1186,21 +1188,18 @@ nsHtml5StreamParser::PreferredForInterna
> 
> >-  rv = nsCharsetAlias::GetPreferred(newEncoding, preferred);
> >-  if (NS_FAILED(rv)) {
> >+  if (!EncodingUtils::FindEncodingForLabel(newEncoding, preferred)) {
> >     // This charset has been blacklisted for permitting XSS smuggling.
> >     // EncMetaNonRoughSuperset is a reasonable approximation to the
> >     // right error message.
> 
> For this code to continue to make sense, you need to change the earlier
> nsCharsetAlias::Equals call to use the same data as
> EncodingUtils::FindEncodingForLabel. Once you change that, if your new
> Equals method has the same rv semantics as the old one (the old one returns
> a failure rv for bogus labels rather than returning success rv and setting
> the boolean to false), this call should never fail, because unsupported
> encodings would have been caught by the Equals check returning a failure rv
> for a bogus label. If you introduce an equals check with more intuitive
> semantics (merely returns false for bogus labels), then you need to make
> sure the error messages still work the right way with bogus labels but
> catching the bogus labels by other means (or with FindEncodingForLabel here
> and rearranging the error messages).
I've rewritten PreferredForInternalEncodingDecl so that nsCharsetAlias::Equals is no longer needed.

(In reply to Henri Sivonen (:hsivonen) from comment #22)
> In general, nsCharsetAlias::Equals calls need to be replaced with something
> that uses the same data EncodingUtils::FindEncodingForLabel.
HTML5 parser and old HTML parser was the only callers of nsCharsetAlias::Equals.
Attachment #673032 - Attachment is obsolete: true
Attachment #673586 - Flags: review?(hsivonen)
Blocks: 802082
Attachment #673586 - Flags: review?(hsivonen) → review+
Comment on attachment 673583 [details] [diff] [review]
Make EncodingUtil methods case sensitive

Did this patch bitrot or does it need another patch to be applied first?
Blocks: 796882
(In reply to Henri Sivonen (:hsivonen) from comment #27)
> Comment on attachment 673583 [details] [diff] [review]
> Make EncodingUtil methods case sensitive
> 
> Did this patch bitrot or does it need another patch to be applied first?

It needs attachment 671648 [details] [diff] [review] in bug 801487 patch first.
The attachment 671648 [details] [diff] [review] has been landed.
No longer depends on: 801487
Comment on attachment 673583 [details] [diff] [review]
Make EncodingUtil methods case sensitive

Review of attachment 673583 [details] [diff] [review]:
-----------------------------------------------------------------

I think Henri would be a better reviewer.
Attachment #673583 - Flags: review?(jonas) → review?(hsivonen)
Comment on attachment 673584 [details] [diff] [review]
Use EncodingUtils::FindEncodingForLabel from DOM

Review of attachment 673584 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/base/src/nsXMLHttpRequest.cpp
@@ +755,5 @@
>    nsAutoCString charsetVal;
> +  bool ok = channel ? NS_SUCCEEDED(channel->GetContentCharset(charsetVal)) :
> +                      false;
> +  if (ok) {
> +    ok = EncodingUtils::FindEncodingForLabel(charsetVal, mResponseCharset);

bool ok = channel &&
          NS_SUCCEEDED(...) &&
          EncodingUtils::Find...;
Attachment #673584 - Flags: review?(jonas) → review+
Fixed a bitrot by bug 801487 update & bug 799917 backout.
Attachment #673583 - Attachment is obsolete: true
Attachment #673583 - Flags: review?(hsivonen)
Attachment #678708 - Flags: review?(hsivonen)
Unbitrotted & resolved a review comment.
Attachment #673584 - Attachment is obsolete: true
Attachment #678713 - Flags: review+
Comment on attachment 678708 [details] [diff] [review]
Make EncodingUtil methods case sensitive

Looks good to me.
Attachment #678708 - Flags: review?(hsivonen) → review+
Comment on attachment 673586 [details] [diff] [review]
Use FindEncodingForLabel from HTML parser

Bug 716579 bitrotted the nsParser and nsScanner parts of this patch. :-(
Unbitrotted
Attachment #678878 - Flags: review?(hsivonen)
Attachment #673586 - Attachment is obsolete: true
Comment on attachment 673028 [details] [diff] [review]
Use FindEncodingForLabel from FileReaderSync

Bent seems to be very busy.
Attachment #673028 - Flags: review?(bent.mozilla) → review?(khuey)
Comment on attachment 678878 [details] [diff] [review]
Use FindEncodingForLabel from HTML parser

These changes themselves are OK, but I think the code in nsScanner after this patch can get confused and assert fatally if you don’t also change the alias resolution in nsDocument::TryChannelCharset to use the new alias code.
http://mxr.mozilla.org/mozilla-central/source/content/base/src/nsDocument.cpp#3047
Attachment #678878 - Flags: review?(hsivonen) → review+
(In reply to Henri Sivonen (:hsivonen) from comment #40)
> Comment on attachment 678878 [details] [diff] [review]
> Use FindEncodingForLabel from HTML parser
> 
> These changes themselves are OK, but I think the code in nsScanner after
> this patch can get confused and assert fatally if you don’t also change the
> alias resolution in nsDocument::TryChannelCharset to use the new alias code.
> http://mxr.mozilla.org/mozilla-central/source/content/base/src/nsDocument.
> cpp#3047
It's changed in attachment #678713 [details] [diff] [review].
The worker part can be landed separately.
Keywords: checkin-needed
Whiteboard: [leave open][land except 673028]
Blocks: 809347
Attachment #672555 - Flags: checkin+
Attachment #672558 - Flags: checkin+
Attachment #672560 - Flags: checkin+
Attachment #678708 - Flags: checkin+
Attachment #678713 - Flags: checkin+
Attachment #678878 - Flags: checkin+
Now form submission uses encoders so we need this workaround again until bug 562091 is fixed.
Attachment #679731 - Flags: review?(hsivonen)
Comment on attachment 673028 [details] [diff] [review]
Use FindEncodingForLabel from FileReaderSync

Henri, could you review this? This patch shouldn't require any knowledge specific to workers. I really want to land this part before aurora uplifting.
Attachment #673028 - Flags: review?(hsivonen)
Comment on attachment 679731 [details] [diff] [review]
Reintroduce x-windows-949 hack

r+, but this is so sad. We really need to align our internal encoding names with the Encoding Standard.
Attachment #679731 - Flags: review?(hsivonen) → review+
Comment on attachment 673028 [details] [diff] [review]
Use FindEncodingForLabel from FileReaderSync

I’m rather surprised that our old code assumed big endian with the "UTF-16" label and no BOM.
Attachment #673028 - Flags: review?(hsivonen) → review+
Keywords: checkin-needed
Whiteboard: [leave open]
Attachment #679731 - Flags: checkin+
Attachment #673028 - Flags: checkin+
https://hg.mozilla.org/mozilla-central/rev/1889ff1d55da
https://hg.mozilla.org/mozilla-central/rev/3e88fade2d0c
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla19
Depends on: 813427
Depends on: 814900
Depends on: 816849
Since Firefox 19, Web pages can use only encodings defined in <http://encoding.spec.whatwg.org/>.
Keywords: dev-doc-needed
This needs to be backed out of FF19 due to bug 782412, which caused "users will not be able to open Web pages encoded in UTF-16 big endian".
(In reply to Alex Keybl [:akeybl] from comment #53)
> This needs to be backed out of FF19 due to bug 782412, which caused "users
> will not be able to open Web pages encoded in UTF-16 big endian".

Nevermind we'll take bug 814900 instead.
Depends on: 844461
Depends on: 845743
Depends on: 846936
Depends on: 847880
Depends on: 847886
Depends on: 860180
Depends on: 931401
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: