Closed Bug 782721 Opened 12 years ago Closed 12 years ago

Existing nsIUnicodeDecoder, nsBasicDecoder do not have a replacement character option when an unknown character is found.

Tracking

()

Status:

RESOLVED DUPLICATE of bug 638379

People

(Reporter: bsurender, Unassigned)

References

(Blocks 1 open bug)

Details

bsurender

Reporter

Description

•

12 years ago

The existing nsIUnicodeDecoder does not support a replacement character option if an unknown character is detected in the incoming charset to be decoded to UTF-16 code units.

http://mxr.mozilla.org/mozilla-central/source/intl/uconv/public/nsIUnicodeDecoder.h#61

Instead, the decoder stops and signals, or just recovers and continues.

The encoder, however, supports the above two options as well as replacement characters.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Updated

•

12 years ago

Component: DOM → Internationalization

Masatoshi Kimura [:emk]

Comment 1

•

12 years ago

Are kOnError_Signal and kOnError_Recover insufficient? Although those options may differ subtly from what the Encoding Standard requires, I don't think the difference is critical enough to block implementing TextDecoder.

Simon Montagu :smontagu

Comment 2

•

12 years ago

(In reply to bsurender from comment #0)
> Instead, the decoder stops and signals, or just recovers and continues.

This is by design (though the behaviour could be changed to some extent if necessary): see bug 231267

Masatoshi Kimura [:emk]

Comment 3

•

12 years ago

Our decoder replaces an unmapped character with whatever GetCharacterForUnMapped() returns: U+31FC for Shift_JIS, U+FFFD for other encodings.
Until bug 747762 is fixed (honestly, though, I do not want to deliberately break compatibility with the Windows decoder), you can work around the quirks as follows:
1. If the "fatal" flag is true, use kOnError_Signal and throw "EncodingError" on failure.
2. Otherwise, if GetCharacterForUnMapped() returns U+FFFD, use kOnError_Recover.
3. Otherwise, use kOnError_Signal and replace the offending character with U+FFFD yourself.
Step 2 is optional. You can always handle the error yourself instead of inspecting the return value of GetCharacterForUnMapped().
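
The three steps above could be sketched roughly as follows. This is only an illustration against a toy decoder, not the real nsIUnicodeDecoder interface: ToyDecoder, OnError, characterForUnmapped(), and decodeWithReplacement() are hypothetical names invented for this sketch, and the toy encoding simply treats every byte at or above 0x80 as unmapped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for kOnError_Signal / kOnError_Recover.
enum class OnError { Signal, Recover };

// Toy single-byte decoder (an assumption for illustration only):
// bytes below 0x80 map to themselves, all other bytes are unmapped.
class ToyDecoder {
 public:
  explicit ToyDecoder(char16_t unmapped) : unmapped_(unmapped) {}

  void setErrorBehavior(OnError mode) { mode_ = mode; }
  char16_t characterForUnmapped() const { return unmapped_; }

  // Decodes from `pos` onward, appending to `out`. Returns true once the
  // whole input is consumed. In Signal mode it returns false at the first
  // unmapped byte and leaves `pos` pointing at that byte.
  bool decode(const std::vector<std::uint8_t>& in, std::size_t& pos,
              std::u16string& out) const {
    assert(pos <= in.size());
    for (; pos < in.size(); ++pos) {
      if (in[pos] < 0x80) {
        out.push_back(static_cast<char16_t>(in[pos]));
      } else if (mode_ == OnError::Recover) {
        out.push_back(unmapped_);  // decoder-chosen substitute
      } else {
        return false;  // signal the error to the caller
      }
    }
    return true;
  }

 private:
  char16_t unmapped_;
  OnError mode_ = OnError::Signal;
};

constexpr char16_t kReplacement = 0xFFFD;

std::u16string decodeWithReplacement(ToyDecoder& dec,
                                     const std::vector<std::uint8_t>& in,
                                     bool fatal) {
  std::u16string out;
  std::size_t pos = 0;
  if (fatal) {
    // Step 1: stop at the first error and report it.
    dec.setErrorBehavior(OnError::Signal);
    if (!dec.decode(in, pos, out)) throw std::runtime_error("EncodingError");
    return out;
  }
  if (dec.characterForUnmapped() == kReplacement) {
    // Step 2 (optional): the decoder already substitutes U+FFFD.
    dec.setErrorBehavior(OnError::Recover);
    dec.decode(in, pos, out);
    return out;
  }
  // Step 3: catch each signalled error and substitute U+FFFD ourselves.
  dec.setErrorBehavior(OnError::Signal);
  while (!dec.decode(in, pos, out)) {
    out.push_back(kReplacement);
    ++pos;  // skip the offending byte and resume decoding
  }
  return out;
}
```

Dropping the step 2 branch leaves the result unchanged in this sketch, which matches the remark that step 2 is optional: step 3 alone produces U+FFFD regardless of what the decoder would have substituted.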

Blocks: encoding

bsurender

Reporter

Updated

•

12 years ago

Assignee: bsurender → nobody

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 4

•

12 years ago

Do we still need this? Or should we close it as WONTFIX?

Masatoshi Kimura [:emk]

Comment 5

•

12 years ago

I found this was already covered by bug 638379.

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → DUPLICATE

Categories

(Core :: Internationalization, defect)
