Closed Bug 814254 Opened 11 years ago Closed 11 years ago
.61-8bit charset is crashy
The crash is possibly OOM, but that is odd. The serialized document really shouldn't take that much memory. Other charsets don't crash. Do we need T.61-8bit for anything?
security sensitive because I haven't analyzed whether the crash is just OOM or something else.
There seem to be at least two bugs here. The first one is bad handling of the range testing in uGenerateShift: testing the input low byte and high byte separately against the ranges in the table doesn't work well when the low byte of the first entry in the table is higher than legal values for the low byte with later values for the hight byte. Specifically in this testcase, the table is ShiftOutCell(u2BytesChar, 2, 0xC0, 0x41, 0xCF, 0x7A) at http://mxr.mozilla.org/mozilla-central/source/intl/uconv/ucvlatin/nsUnicodeToT61.cpp?#20, and the input is 0xC420
Attachment #684364 - Flags: review?(VYV03354)
I want to analyse it further, but the second bug seems to be that at http://mxr.mozilla.org/mozilla-central/source/intl/uconv/util/nsUnicodeEncodeHelper.cpp#53, we return NS_OK_UENC_MOREOUTPUT for an unmapped character. http://mxr.mozilla.org/mozilla-central/source/intl/uconv/util/nsUCSupport.cpp#475 then doubles the size of the buffer and tries again, and this carries on until the buffer size overflows. I think the return code here should be NS_ERROR_UENC_NOMAPPING, but if so I don't understand why we don't crash like this all the time, so I want to investigate more.
According to http://wiki.whatwg.org/wiki/Web_Encodings , Opera and Safari don't support this encoding and IE's support is label-incompatible with Gecko's. I think we should remove support for this encoding. If it was necessary for Web compat, Opera and Safari would probably have added support by now and we'd have aligned labels with IE.
(In reply to Henri Sivonen (:hsivonen) from comment #4) > I think we should remove support for this encoding. Can we have that conversation in a different bug?
T.61-8bit is already banned. https://mxr.mozilla.org/mozilla-central/source/intl/uconv/src/charsetData.properties?rev=fd7a0ace6b0e#51 Why is it used at all? It is a bug itself.
Wow, GetUnicodeEncoder didn't check the .isInternal flag! https://mxr.mozilla.org/mozilla-central/source/intl/uconv/src/nsCharsetConverterManager.cpp?rev=73651efc5d71#130 IMO we should remove T.61-8bit rather than bearing a maintenance burden. We made a similar decision about UTF-32 (bug 604317). For branches, it's sufficient to remove T.61-8bit from charsetAlias.properties so that GetUnicodeEncoder couldn't find the encoding anymore.
That said, bugs are not specific to T.61-8bit at all. The bug title is a bit misleading. Indeed T.61-8bit removal should be discussed in another bug.
Summary: T.61-8bit charset is crashy → Table encoders are crashy
Comment on attachment 684364 [details] [diff] [review] Patch part 1 I don't think this change is correct. For example, 0x8420 is not a valid code for Shift_JIS. If 0xC420 is a valid code, it's just the range definition is wrong. If it is not valid, why the code is passed to uGenerateShift in the first place?
Attachment #684364 - Flags: review?(VYV03354) → review-
(In reply to Simon Montagu from comment #3) > I want to analyse it further, but the second bug seems to be that at > http://mxr.mozilla.org/mozilla-central/source/intl/uconv/util/ > nsUnicodeEncodeHelper.cpp#53, we return NS_OK_UENC_MOREOUTPUT for an > unmapped character. > http://mxr.mozilla.org/mozilla-central/source/intl/uconv/util/nsUCSupport. > cpp#475 then doubles the size of the buffer and tries again, and this > carries on until the buffer size overflows. > > I think the return code here should be NS_ERROR_UENC_NOMAPPING, but if so I > don't understand why we don't crash like this all the time, so I want to > investigate more. The variable name "charFound" is also misleading, uGenerateShift will also return false even when the output buffer is insufficient.
After all, this is specific to T.61-8bit. We should just remove the encoding.
Summary: Table encoders are crashy → T.61-8bit charset is crashy
According to ITU T.61 recommendation, 0xC420 is a valid sequence representing tilde. Then an entry something like ShiftOutCell(u2BytesChar, 2, 0xC0, 0x20, 0xCF, 0x20) should be added to g_T61ShiftOutTable *if* we don't remove T.61-8bit. (But we should just remove the encoding.)
serializeToStream is no longer visible from the content. Feel free to file a new bug for removing T.61-8bit.
Status: NEW → RESOLVED
Closed: 11 years ago
Depends on: 816410
Resolution: --- → FIXED
Target Milestone: --- → mozilla20
Removal is bug 951571. Does this bug need to remain secret?
You need to log in before you can comment on or make changes to this bug.