63841 - [Composer / ISO-2022-JP Charset]Characters are messed up when input hankaku katakana after kanji.

Reporter

Description

24 years ago

12-27 Mtrunk build:
[Composer / ISO-2022-JP Charset]Characters are messed up when input hankaku 
katakana after kanji. 

Steps to reproduce:
1. Start Composer.
2. View | Character Coding | More | East Asian | Japanese(ISO-2022-JP).
3. Type a Japanese kanji (e.g. "hyou") followed by a hankaku katakana (e.g. "a").
4. Save it as a file and click the "Browse" icon to bring up the Navigator window.

Result:
The characters are messed up. If you close the file and reopen it, you can
see the incorrect characters in Composer as well.

Notes:
1. It occurs on WinME and Mac; I don't know how to input hankaku katakana on
Linux. However, on Linux I can type some kanji and zenkaku katakana, and when you
browse the page and go to [View] | [Page Source], the characters in the body are
shown as character references like "&#65395" instead of as kanji or katakana.
2. On WinME, after you create a page and save it, most of the icons on the
Composition Toolbar are sometimes disabled.
3. After creating a page and opening the page source from [View], the
hankaku katakana are often shown as character references.

Yuying Long

Reporter

Comment 1

24 years ago

Change QA contact and add keywords.

Keywords: intl, nsbeta1

QA Contact: sujay → ylong

Summary: [Composer / ISO-2022-JP Charset]Characters are messed up when input hankaku katakana after kanji. → [Composer / ISO-2022-JP Charset]Characters are messed up when input hankaku katakana after kanji.

Teruko Kobayashi

Updated

24 years ago

Assignee: beppe → nhotta

Component: Editor → Internationalization

Keywords: regression

Teruko Kobayashi

Comment 2

24 years ago

This is a regression. The related bug is 49262. Changed the component to Internationalization and assigned it to
nhotta@netscape.com.

nhottanscp

Assignee

Comment 3

24 years ago

I can reproduce this with NS 6 release build (so probably not a regression).
Reassign to ftang.

Assignee: nhotta → ftang

Frank Tang

Comment 4

24 years ago

I cannot reproduce this on my 2000100908 build. It was probably introduced after that
build.

Frank Tang

Comment 5

24 years ago

Can someone try Beta3? Both ylong and nhotta showed me this problem in the trunk and
N6RTM builds. Reassign this to yokoyama. Notice that "hyou" (you have to hit Return to
convert) contains 0x5C in its 2nd byte (in Shift_JIS). It might be some editor code,
introduced after Beta3, that incorrectly strips out 0x5C.

Reassign to yokoyama to work on. Yokoyama, please talk to nhotta about this.

Assignee: ftang → yokoyama

Yuying Long

Reporter

Comment 6

24 years ago

It might not be caused by the 0x5C in the 2nd byte. When I showed this to ftang with the
01-09-06 Win Mtrunk build, it was just an ordinary kanji followed by a hankaku katakana.

Frank Tang

Comment 7

24 years ago

This is interesting. If you use a plain text editor to look at the raw bytes, it shows
esc + "$B" + "I=&#65418;" + esc + "(B"

If I don't put the halfwidth katakana in there, I get
esc + "$B" + "I=" + esc + "(B"
Notice that esc + "$B" is the escape sequence that switches ISO-2022-JP into
JIS X 0208, and esc + "(B" is the escape sequence that switches back from
JIS X 0208 to ASCII.
"I=" is the 7-bit JIS X 0208 code for the character "hyou".
So the character reference "&#65418;" ends up inside the JIS X 0208 run, where
its bytes are read as two-byte JIS X 0208 characters. The correct result should be
esc + "$B" + "I=" + esc + "(B" + "&#65418;"
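The ordering problem above can be illustrated with a small sketch. This is not Mozilla's nsSaveAsCharset code: it is a hypothetical standalone C++ function (EncodeIso2022Jp) with a toy one-entry JIS X 0208 table, assuming, as the attr_FallbackDecimalNCR flag in the code below suggests, that characters ISO-2022-JP cannot represent (halfwidth katakana are not part of ISO-2022-JP as defined in RFC 1468) fall back to a decimal character reference. The point it shows is that the encoder has to emit esc + "(B" before writing the ASCII bytes of that reference.

```cpp
#include <map>
#include <string>

// Hypothetical toy stand-in for a JIS X 0208 table: only U+8868 ("hyou") is
// mapped, to its 7-bit code "I=" (0x49 0x3D). A real encoder uses a full table.
static const std::map<char32_t, std::string> kJisX0208 = {
    {U'\u8868', "I="},
};

static const std::string kToJisX0208 = "\x1B$B";  // ESC $ B: switch to JIS X 0208
static const std::string kToAscii = "\x1B(B";     // ESC ( B: switch back to ASCII

// Sketch of an ISO-2022-JP encoder that writes a decimal NCR (&#N;) for any
// character that is neither ASCII nor in the toy JIS X 0208 table.
std::string EncodeIso2022Jp(const std::u32string& in) {
  std::string out;
  bool inJis = false;  // true while the output is in JIS X 0208 mode
  for (char32_t c : in) {
    auto it = kJisX0208.find(c);
    if (it != kJisX0208.end()) {
      if (!inJis) {
        out += kToJisX0208;
        inJis = true;
      }
      out += it->second;
      continue;
    }
    // Plain ASCII and the NCR fallback are both ASCII bytes, so leave
    // JIS X 0208 mode first; skipping this step yields the broken output.
    if (inJis) {
      out += kToAscii;
      inJis = false;
    }
    if (c < 0x80) {
      out += static_cast<char>(c);
    } else {
      out += "&#" + std::to_string(static_cast<unsigned long>(c)) + ";";
    }
  }
  if (inJis) out += kToAscii;  // return to ASCII at the end, as RFC 1468 expects
  return out;
}
```

With this sketch, EncodeIso2022Jp(U"\u8868\uFF8A") ("hyou" followed by U+FF8A, HALFWIDTH KATAKANA LETTER HA) returns esc + "$B" + "I=" + esc + "(B" + "&#65418;", matching the correct result above. Removing the mode switch before the fallback would instead produce esc + "$B" + "I=&#65418;" + esc + "(B", the broken output.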

I guess the problem code is in
mozilla/htmlparser/src/nsHTMLContentSinkStream.cpp:

  /**
   * Initialize the Unicode encoder with our current mCharsetOverride.
   */
  NS_IMETHODIMP
  nsHTMLContentSinkStream::InitEncoders()
  {
    nsresult res;

    // Initialize an entity encoder if we're using the string interface:
    if (mString && (mFlags & nsIDocumentEncoder::OutputEncodeEntities))
      res = nsComponentManager::CreateInstance(kEntityConverterCID, NULL,
                                               NS_GET_IID(nsIEntityConverter),
                                               getter_AddRefs(mEntityConverter));

    // Initialize a charset encoder if we're using the stream interface
    if (mStream)
    {
      nsAutoString charsetName; charsetName.Assign(mCharsetOverride);
      NS_WITH_SERVICE(nsICharsetAlias, calias, kCharsetAliasCID, &res);
      if (NS_SUCCEEDED(res) && calias) {
        nsAutoString temp; temp.Assign(mCharsetOverride);
        res = calias->GetPreferred(temp, charsetName);
      }
      if (NS_FAILED(res))
      {
        // failed - unknown alias, fallback to ISO-8859-1
        charsetName.AssignWithConversion("ISO-8859-1");
      }

      res = nsComponentManager::CreateInstance(kSaveAsCharsetCID, NULL,
                                               NS_GET_IID(nsISaveAsCharset),
                                               getter_AddRefs(mCharsetEncoder));
      if (NS_FAILED(res))
        return res;
      // SaveAsCharset requires a const char* in its first argument:
      nsCAutoString charsetCString; charsetCString.AssignWithConversion(charsetName);
      // For ISO-8859-1 only, convert to entity first (always generate entities like &nbsp;).
      res = mCharsetEncoder->Init(charsetCString,
                                  charsetName.EqualsIgnoreCase("ISO-8859-1") ?
                                  nsISaveAsCharset::attr_htmlTextDefault :
                                  nsISaveAsCharset::attr_EntityAfterCharsetConv
                                  + nsISaveAsCharset::attr_FallbackDecimalNCR,
                                  nsIEntityConverter::html32);

I think the 0x5C is not a factor here. I can reproduce it with characters that do
not have a 0x5C in them.

I think someone should step through intl/unicharutil/src/nsSaveAsCharset.cpp
nsSaveAsCharset::DoCharsetConversion(const PRUnichar *inString, char **outString)
nsSaveAsCharset::DoConversionFallBack

and see what happens there.
I'm reassigning this back to nhotta since he knows that code better.
Setting the priority to P3 for now.
ji, please try sending it in HTML mail and see whether it does the same thing there.
If so, we should change this to P2, since ISO-2022-JP is much more important in
mail than in web pages.

Assignee: yokoyama → nhotta

Priority: -- → P3

Frank Tang

Comment 8

24 years ago

mark this as P3 moz0.9

Target Milestone: --- → mozilla0.9

nhottanscp

Assignee

Comment 9

24 years ago

Remove 'regression' keyword because it's reproducible with RTM.

Keywords: regression (removed)

nhottanscp

Assignee

Comment 10

24 years ago

I found that nsISaveAsCharset is no longer used (see bug 65324 and bug 59679).
The problem is probably in nsDocumentEncoder.cpp; cc'ing jst.

nhottanscp

Assignee

Comment 11

24 years ago

Reassign to jst, see bug 59679 for detail.

Assignee: nhotta → jst

Johnny Stenback (:jst)

Comment 12

24 years ago

Reassigning to anthonyd who owns the serializer code.

Assignee: jst → anthonyd

anthonyd

Updated

24 years ago

Target Milestone: mozilla0.9 → Future

anthonyd

Comment 13

24 years ago

I'm not really sure why this is now on my plate, but oh well. If this is any sort
of priority, then someone from i18n should take it.
Setting to Future.

nhottanscp

Assignee

Comment 14

•

24 years ago

Reassign to nhotta.

Assignee: anthonyd → nhotta

Depends on: 59679

Target Milestone: Future → mozilla0.9

nhottanscp

Assignee

Comment 15

•

24 years ago

Bug 59679 was fixed; I cannot reproduce the problem using today's build.

Status: NEW → RESOLVED

Closed: 24 years ago

Resolution: --- → FIXED

Yuying Long

Reporter

Comment 16

23 years ago

Mark as verified.

Status: RESOLVED → VERIFIED

Categories

(Core :: Internationalization, defect, P3)

People

(Reporter: amyy, Assigned: nhottanscp)

Details

(Keywords: intl)

Security

(public)
