Closed
Bug 107790
Opened 23 years ago
Closed 9 years ago
nsIUnicodeDecoder, add an option to proceed the conversion skipping errors.
Categories
(Core :: Internationalization, defect)
RESOLVED
WORKSFORME
People
(Reporter: nhottanscp, Assigned: smontagu)
Attachments
(2 files)
8.89 KB, patch
1.90 KB, patch
The current behavior of the Unicode decoder when the input is outside the valid range for the charset is to abort the process and return an error. This is inconvenient for callers who want to ignore those out-of-range characters and continue the conversion. Can we do something similar to nsIUnicodeEncoder::SetOutputErrorBehavior?
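As a rough analogy only, the difference between "abort on error" and "skip and continue" can be seen with the error-handler argument of Python's built-in codecs. This sketch uses Python's standard `bytes.decode`, not nsIUnicodeDecoder, and the sample bytes are made up for illustration.

```python
# Illustration only: Python's standard codecs, not Mozilla's decoder API.
raw = b"abc\x80def"  # a lone 0x80 byte is not valid UTF-8

# "strict" mirrors the behavior described in this bug: stop and report an error.
try:
    raw.decode("utf-8", errors="strict")
    aborted = False
except UnicodeDecodeError:
    aborted = True

# "ignore" drops the offending byte and continues the conversion.
skipped = raw.decode("utf-8", errors="ignore")

# "replace" substitutes U+FFFD for the offending byte and continues.
replaced = raw.decode("utf-8", errors="replace")

print(aborted, repr(skipped), repr(replaced))
```

The point of the request is that the caller picks the policy once, and the decoder applies it at the exact byte where the error occurs.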
Reporter | ||
Comment 1•23 years ago
|
||
Changing the summary: this bug is for nsIUnicodeDecoder, not nsIUnicodeEncoder.
Summary: nsIUnicodeEncoder, add an option to proceed the conversion skipping errors. → nsIUnicodeDecoder, add an option to proceed the conversion skipping errors.
Comment 2•23 years ago
|
||
Can it be done for 0.9.6 (less than a week from now)?
Comment 3•23 years ago
|
||
*** Bug 107712 has been marked as a duplicate of this bug. ***
Comment 4•23 years ago
|
||
Why do we need this?
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.9
Comment 5•23 years ago
|
||
We need this because the HTTP parser is already implementing this functionality on its own. The CSS loader also needs this functionality. Rather than having them both implement it, it makes a lot more sense to move it into the Unicode decoder, which should be able to handle conversion errors much better because it knows exactly what state it is in when the error occurs. In particular, give bug 106843 a read for an example of why this is wanted.
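To illustrate why the decoder is better placed than its callers to recover from errors, here is a small sketch using Python's standard incremental decoders (an analogy, not Mozilla code). The helper name `decode_chunks` and the sample data are hypothetical. A caller that recovers chunk by chunk cannot tell a truncated multi-byte sequence from a bad byte, while a stateful decoder can.

```python
import codecs

def decode_chunks(chunks, encoding="utf-8"):
    """Decode a stream of byte chunks, replacing bad bytes with U+FFFD.

    The incremental decoder keeps the state of a partially read
    multi-byte sequence between calls, so a character split across
    two chunks is not mistaken for an error.
    """
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    parts = [decoder.decode(chunk) for chunk in chunks]
    parts.append(decoder.decode(b"", final=True))  # flush any pending state
    return "".join(parts)

# The euro sign (E2 82 AC) is split across the two chunks;
# the 0xFF byte is a genuine error.
chunks = [b"price: \xe2\x82", b"\xac5 \xff ok"]

stateful = decode_chunks(chunks)

# A caller doing its own recovery per chunk has no decoder state to rely on.
naive = "".join(c.decode("utf-8", errors="replace") for c in chunks)

print(repr(stateful))
print(repr(naive))
```

In this sketch the stateful version keeps the euro sign and replaces only the bad byte, whereas the per-chunk version loses the split character.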
Comment 6•23 years ago
|
||
*** Bug 114209 has been marked as a duplicate of this bug. ***
Comment 7•23 years ago
|
||
*** Bug 115805 has been marked as a duplicate of this bug. ***
Comment 8•23 years ago
|
||
Resetting the target milestone to ---. Giving it to shanjian. It seems like a big chunk of work.
Assignee: ftang → shanjian
Status: ASSIGNED → NEW
Target Milestone: mozilla0.9.9 → ---
Comment 9•23 years ago
|
||
Could we get a realistic target assessment here so we know whether to work around this in the CSSLoader, please? The "chunk of work" is already done in the CSS Parser and needs to happen in the CSS Loader too unless the decoders do it.
Comment 10•23 years ago
|
||
I'm actually a former victim of this "bug". Well, I think the character encoding workaround for CSS shouldn't be done. Just because somebody screws up or mixes encodings (like me...) doesn't make it worth creating a workaround. It's an HTML editor fault and must therefore be fixed by HTML editors, not by the browser. Supporting inconsistently encoded HTML pages is vital for being able to read them (or at least parts of them), but CSS is not. I think this is not a bug and must not be fixed. Support standards and don't make **** code work like MSIE does.
Comment 11•23 years ago
|
||
*** Bug 125331 has been marked as a duplicate of this bug. ***
Comment 12•23 years ago
|
||
> Just because somebody screws or mixes encodings
Well..... There is no standard saying what encoding should be used for
stylesheets when none is specified in the sheet. We use the document encoding,
but a slightly different reading of things could lead to ISO-8859-1 being used
as default by a different browser. Furthermore, Mozilla itself used to default
to ISO-8859-1 instead of the document charset. So web authors may be expecting
that behavior...
Comment 13•23 years ago
|
||
> So web authors may be expecting that behavior...
Yep, that was how I became aware of this "bug", because all older versions didn't choke on that one ;)
BTW, I think there is a standard, although it is not directly mentioned. Posting <form> data, for example, implies that the HTML document's character encoding is to be used for encoding the form data too. The "accept-charset" attribute changes this behaviour. The same goes for the <link> tag, but with the attribute "charset".
http://www.w3.org/TR/REC-html40/struct/links.html#edef-LINK
http://www.w3.org/TR/REC-html40/interact/forms.html#h-17.3
As you can read there, accept-charset and charset are #IMPLIED. Given this information, I still think this isn't a bug but a standard-compliant implementation. My interpretation of the W3C definitions might be wrong, though...
Keywords: mozilla1.0
Comment 14•23 years ago
|
||
Simon, this is the bug we were talking about.
Assignee | ||
Comment 15•23 years ago
|
||
This patch was made with diff -u, and includes standardization of tabs and indents.
Assignee | ||
Comment 16•23 years ago
|
||
Assignee | ||
Comment 17•23 years ago
|
||
Hmmm, my patch doesn't really address the central issue of this bug as reported, although it does fix a number of sites that are currently broken. Maybe it should be punted to a new bug.

The comments in nsIUnicodeDecoder.h say:

 * Error conditions:
 * If the read value does not belong to this character set, one should
 * replace it with the Unicode special 0xFFFD. When an actual input error is
 * encountered, like a format error, the converter stop and return error.
 * Hoever, we should keep in mind that we need to be lax in decoding.

I believe that the specific case of UTF-8 is a classic example where we should be lax. In the real-world examples where we are decoding an ISO-8859-1 page as if it were UTF-8, being lax will let us retrieve all the characters <= 0x7E correctly and render as expected. This argument does not apply to sequences which are decodable but illegal because they are not the minimum encoding, which is why I continue to return an error in these cases.
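The policy described in this comment can be sketched as follows. This is a minimal Python illustration written from the description above, not the attached patch; `lax_utf8_decode`, `OverlongError`, and `MIN_FOR_LENGTH` are hypothetical names. Malformed bytes become U+FFFD and decoding continues, while a sequence that decodes but is not the minimum encoding is still treated as a hard error.

```python
REPLACEMENT = "\ufffd"

class OverlongError(ValueError):
    """Hypothetical error for decodable but non-minimal sequences."""

# Smallest code point that legitimately needs a sequence of each length.
MIN_FOR_LENGTH = {2: 0x80, 3: 0x800, 4: 0x10000}

def lax_utf8_decode(data: bytes) -> str:
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                      # ASCII passes through unchanged
            out.append(chr(b))
            i += 1
            continue
        if 0xC0 <= b <= 0xDF:
            length, cp = 2, b & 0x1F
        elif 0xE0 <= b <= 0xEF:
            length, cp = 3, b & 0x0F
        elif 0xF0 <= b <= 0xF7:
            length, cp = 4, b & 0x07
        else:                             # stray continuation or invalid lead byte
            out.append(REPLACEMENT)
            i += 1
            continue
        tail = data[i + 1:i + length]
        if len(tail) < length - 1 or any((t & 0xC0) != 0x80 for t in tail):
            # Malformed sequence: be lax, replace the lead byte and resync.
            out.append(REPLACEMENT)
            i += 1
            continue
        for t in tail:
            cp = (cp << 6) | (t & 0x3F)
        if cp < MIN_FOR_LENGTH[length]:
            # Decodable but not the minimum encoding: remain strict here.
            raise OverlongError(f"overlong sequence at byte offset {i}")
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            out.append(REPLACEMENT)       # out of range or surrogate
        else:
            out.append(chr(cp))
        i += length
    return "".join(out)

# An ISO-8859-1 byte (0xE9) in text decoded as UTF-8: ASCII survives.
latin1_as_utf8 = lax_utf8_decode(b"caf\xe9 au lait")
print(repr(latin1_as_utf8))
```

The overlong form `b"\xc0\xaf"` (a two-byte encoding of "/") raises `OverlongError` in this sketch rather than being silently accepted.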
Comment 19•22 years ago
|
||
*** Bug 128896 has been marked as a duplicate of this bug. ***
Comment 20•22 years ago
|
||
For risk reasons, we should fix the particular issue instead of the general issue. So: nsbeta1-, and file a separate bug for the CSS, and nominate that bug for nsbeta1.
Comment 21•22 years ago
|
||
So, per Ftang's comment, should we reopen bug 128896?
Assignee | ||
Comment 22•22 years ago
|
||
OK, reopening 128896
Assignee | ||
Updated•22 years ago
|
Status: NEW → ASSIGNED
Comment 23•19 years ago
|
||
*** Bug 278291 has been marked as a duplicate of this bug. ***
Updated•15 years ago
|
QA Contact: teruko → i18n
Comment 24•9 years ago
|
||
Fixed long ago.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME