Closed Bug 63841 Opened 24 years ago Closed 24 years ago

[Composer / ISO-2022-JP Charset]Characters are messed up when input hankaku katakana after kanji.

Categories

(Core :: Internationalization, defect, P3)

Tracking

VERIFIED FIXED
mozilla0.9

People

(Reporter: amyy, Assigned: nhottanscp)

Details

(Keywords: intl)

12-27 Mtrunk build:
[Composer / ISO-2022-JP charset] Characters are garbled when hankaku (halfwidth)
katakana is entered after kanji.

Steps to reproduce:
1. Start Composer.
2. View | Character Coding | More | East Asian | Japanese (ISO-2022-JP).
3. Type a Japanese kanji (e.g. "hyou") followed by a hankaku katakana (e.g. "a").
4. Save the page as a file and click the "Browse" icon to open it in a Navigator window.

Result:
The characters are garbled, and if you close the file and re-open it, you can
see the incorrect characters in Composer as well.

Notes:
1. This occurs on WinME and Mac. I do not know how to input hankaku katakana on
Linux. However, on Linux I can type some kanji and zenkaku katakana; when you
browse the page and go to [View] | [Page Source], the characters in the body are
shown as character references like "&#65395" instead of as kanji or katakana.
2. On WinME, after you create a page and save it, sometimes most of the icons on
the Composition Toolbar are disabled.
3. After creating a page and opening the page source from [View], the hankaku
katakana is often shown as a character reference.
Change QA contact and add keywords.
Keywords: intl, nsbeta1
QA Contact: sujay → ylong
Summary: [Composer / ISO-2022-JP Charset]Characters are messed up when input hankaku katakana after kanji. → [Composer / ISO-2022-JP Charset]Characters are messed up when input hankaku katakana after kanji.
Assignee: beppe → nhotta
Component: Editor → Internationalization
Keywords: regression
This is a regression. The related bug is 49262. Changed the component to
Internationalization and assigned to nhotta@netscape.com.
I can reproduce this with NS 6 release build (so probably not a regression).
Reassign to ftang.
Assignee: nhotta → ftang
I cannot reproduce this on my 2000100908 build. It was probably introduced after
that time.
Can someone try Beta3? Both ylong and nhotta showed me this problem in trunk and
N6RTM. Reassigning this to yokoyama. Notice that "hyou" (you have to hit return to
convert) contains 5c in the 2nd byte. Some editor code introduced after beta3
might be stripping out 0x5c incorrectly.

Reassign to yokoyama to work on. Yokoyama, please talk to nhotta about this.

Assignee: ftang → yokoyama
It might not be caused by 5c in the 2nd byte. When I showed this to ftang with the
01-09-06 Win Mtrunk build, it was just a normal kanji followed by hankaku katakana.
This is interesting: if you use a plain text editor to see the raw bytes, it shows
esc + "$B" + "I=ハ" + esc + "(B"

If I don't put the halfwidth katakana in there, I get
esc + "$B" + "I=" + esc + "(B"
Notice that esc + "$B" is the shift-in escape sequence for JIS X 0208 in
ISO-2022-JP, and esc + "(B" is the shift-out escape sequence (back to ASCII) in
ISO-2022-JP.
"I=" is the 7-bit JIS X 0208 code for the character "hyou".
The correct result should be
esc + "$B" + "I=" + esc + "(B" + "ハ"
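
To make the framing rule above concrete, here is a minimal C++ sketch (not Mozilla code) that checks whether a byte stream keeps only 7-bit two-byte pairs between the shift-in and shift-out sequences. The function name IsWellFormedIso2022Jp is made up for illustration, and it deliberately knows only the two escape sequences mentioned above, not the other ones ISO-2022-JP allows.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration only. Returns true when:
//  - every byte inside a JIS X 0208 section (after ESC $ B, before ESC ( B)
//    is part of a complete two-byte pair in the range 0x21..0x7E,
//  - every byte outside such a section is 7-bit,
//  - the stream ends in ASCII mode.
bool IsWellFormedIso2022Jp(const std::string& bytes) {
  const std::string kToJis0208 = "\x1b$B";  // shift in to JIS X 0208
  const std::string kToAscii = "\x1b(B";    // shift out, back to ASCII
  bool inJis = false;
  std::size_t i = 0;
  while (i < bytes.size()) {
    if (bytes.compare(i, 3, kToJis0208) == 0) { inJis = true; i += 3; continue; }
    if (bytes.compare(i, 3, kToAscii) == 0) { inJis = false; i += 3; continue; }
    const unsigned char c = static_cast<unsigned char>(bytes[i]);
    if (inJis) {
      // A double-byte section must consist of whole pairs of 7-bit bytes.
      if (i + 1 >= bytes.size()) return false;
      const unsigned char d = static_cast<unsigned char>(bytes[i + 1]);
      if (c < 0x21 || c > 0x7e || d < 0x21 || d > 0x7e) return false;
      i += 2;
    } else {
      if (c > 0x7f) return false;  // ISO-2022-JP is a 7-bit encoding
      i += 1;
    }
  }
  return !inJis;
}
```

As a usage example, assume the stray halfwidth katakana was written as the single byte 0xB1 (its JIS X 0201 8-bit code; the thread does not state the actual raw byte). Then IsWellFormedIso2022Jp("\x1b$BI=\xb1\x1b(B") rejects the stream, while the kanji-only stream "\x1b$BI=\x1b(B" passes.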

I guess the problem code is in
mozilla/htmlparser/src/nsHTMLContentSinkStream.cpp
/**
 * Initialize the Unicode encoder with our current mCharsetOverride.
 */
NS_IMETHODIMP
nsHTMLContentSinkStream::InitEncoders()
{
  nsresult res;

  // Initialize an entity encoder if we're using the string interface:
  if (mString && (mFlags & nsIDocumentEncoder::OutputEncodeEntities))
    res = nsComponentManager::CreateInstance(kEntityConverterCID, NULL,
                                             NS_GET_IID(nsIEntityConverter),
                                             getter_AddRefs(mEntityConverter));

  // Initialize a charset encoder if we're using the stream interface
  if (mStream)
  {
    nsAutoString charsetName; charsetName.Assign(mCharsetOverride);
    NS_WITH_SERVICE(nsICharsetAlias, calias, kCharsetAliasCID, &res);
    if (NS_SUCCEEDED(res) && calias) {
      nsAutoString temp; temp.Assign(mCharsetOverride);
      res = calias->GetPreferred(temp, charsetName);
    }
    if (NS_FAILED(res))
    {
      // failed - unknown alias , fallback to ISO-8859-1
      charsetName.AssignWithConversion("ISO-8859-1");
    }

    res = nsComponentManager::CreateInstance(kSaveAsCharsetCID, NULL,
                                             NS_GET_IID(nsISaveAsCharset),
                                             getter_AddRefs(mCharsetEncoder));
    if (NS_FAILED(res))
      return res;
    // SaveAsCharset requires a const char* in its first argument:
    nsCAutoString charsetCString;
    charsetCString.AssignWithConversion(charsetName);
    // For ISO-8859-1 only, convert to entity first (always generate entites like &nbsp;).
    res = mCharsetEncoder->Init(charsetCString,
                                charsetName.EqualsIgnoreCase("ISO-8859-1") ?
                                  nsISaveAsCharset::attr_htmlTextDefault :
                                  nsISaveAsCharset::attr_EntityAfterCharsetConv +
                                  nsISaveAsCharset::attr_FallbackDecimalNCR,
                                nsIEntityConverter::html32);

I think the 0x5C is not a factor here. I can reproduce it with characters that do
not have a 5c in them.

I think someone should debug through intl/unicharutil/src/nsSaveAsCharset.cpp
nsSaveAsCharset::DoCharsetConversion(const PRUnichar *inString, char **outString)
nsSaveAsCharset::DoConversionFallBack

and see what happens there.
I am reassigning this back to nhotta since he knows that code better.
Setting priority to P3 for now.
ji- please try sending it in HTML Mail and see whether the same thing happens there.
If so, we should change this to P2, since ISO-2022-JP is much more important in
Mail than in web pages.
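
For whoever steps through that code, the behaviour to look for can be sketched roughly as below. This is not the nsSaveAsCharset implementation: EncodeWithNcrFallback and its lookup table are hypothetical names, and the table holds only the one mapping quoted in this bug ("hyou", U+8868, encoded as "I="). The point is the ordering: when a character has no JIS X 0208 mapping, the encoder has to shift out to ASCII before it writes the decimal NCR fallback.

```cpp
#include <map>
#include <string>

// Toy encoder for illustration only (hypothetical, not Mozilla code).
// jisTable maps a Unicode code point to its 7-bit JIS X 0208 byte pair.
std::string EncodeWithNcrFallback(
    const std::u32string& text,
    const std::map<char32_t, std::string>& jisTable) {
  std::string out;
  bool inJis = false;
  for (char32_t cp : text) {
    auto it = jisTable.find(cp);
    if (it != jisTable.end()) {
      // Mappable character: make sure we are in the JIS X 0208 section.
      if (!inJis) { out += "\x1b$B"; inJis = true; }
      out += it->second;
      continue;
    }
    // Everything else is written in ASCII mode, so shift out first.
    if (inJis) { out += "\x1b(B"; inJis = false; }
    if (cp < 0x80) {
      out += static_cast<char>(cp);
    } else {
      // Unmappable character: fall back to a decimal NCR.
      out += "&#" + std::to_string(static_cast<unsigned long>(cp)) + ";";
    }
  }
  if (inJis) out += "\x1b(B";  // always finish in ASCII mode
  return out;
}
```

With a table containing only U+8868, encoding U+8868 followed by halfwidth katakana "a" (U+FF71) gives esc + "$B" + "I=" + esc + "(B" + "&#65393;", which has the same shape as the correct result described earlier in this bug.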


Assignee: yokoyama → nhotta
Priority: -- → P3
mark this as P3 moz0.9
Target Milestone: --- → mozilla0.9
Remove 'regression' keyword because it's reproducible with RTM.
Keywords: regression
I found nsISaveAsCharset is not used any more (see bug 65324, bug 59679).
Probably a problem in nsDocumentEncoder.cpp, cc to jst.
Reassign to jst, see bug 59679 for detail.
Assignee: nhotta → jst
Reassigning to anthonyd who owns the serializer code.
Assignee: jst → anthonyd
Target Milestone: mozilla0.9 → Future
I'm not really sure why this is now on my plate, but oh well. If this is any sort
of priority, then someone from i18n should take it.
Setting to Future.

anthonyd
Reassign to nhotta.
Assignee: anthonyd → nhotta
Depends on: 59679
Target Milestone: Future → mozilla0.9
Bug 59679 was fixed, I cannot reproduce the problem using today's build.
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Mark as verified.
Status: RESOLVED → VERIFIED