Closed Bug 25037 Opened 26 years ago Closed 24 years ago

illegal 0xA0 code point in Multibyte charset break parser

Tracking

Status:

VERIFIED FIXED

Milestone:

mozilla0.9

People

(Reporter: teruko, Assigned: shanjian)

Attachments

(7 files)

test case, 25 years ago, Teruko Kobayashi, 1.66 KB, text/html
Simpler test case, 25 years ago, harishd, 178 bytes, text/html
Proposed fix part 1, 24 years ago, Shanjian Li, 1.37 KB, patch
Proposed fix part 2, 24 years ago, Shanjian Li, 754 bytes, patch
Proposed fix part 1, 24 years ago, Shanjian Li, 627 bytes, patch
Proposed fix part 4, 24 years ago, Shanjian Li, 1.20 KB, patch
new part2 fix, 24 years ago, Shanjian Li, 1.34 KB, patch

Teruko Kobayashi

Reporter

Description

•

26 years ago

When you view the above page, the last table on the right displays "/TD".

Steps to reproduce:
1. Go to the above URL.
2. Look at the table on the right. Under "DAVOS 2000", "/TD" is displayed.
3. Select menu View|Page Source and look at the line
   <TD VALIGN=TOP><LI> /TD>
   The source does not have "<" before "/TD>".

However, when I look at the source of this page in Communicator 4.x, the line is
<TD VALIGN=TOP><LI></TD>

Tested with 2000012515 Win32 and Linux builds.

rickg

Comment 1

•

26 years ago

This works perfectly for me. Petersen, can you see the problem? Also -- a reduced test case would be very helpful.

Assignee: rickg → petersen

URL: http://home.netscape.com/zh/cn/ → http://home.netscape.com/zh/cn/

bobj

Comment 2

•

26 years ago

I reproduced this with the 1/25/2000 M14 build running on Win95. The strange thing is that when I saved the page to the blues server, intending to make a simpler test case, it worked! I looked at the saved, unmodified file on blues, and it looked the same. Here is my saved file: http://blues/users/bobj/publish/test/zh-tw-index.html

Chris Petersen

Comment 3

•

25 years ago

With the Feb 07 build (20000020608), I can't reproduce the problem described.

Assignee: petersen → rickg

rickg

Comment 4

•

25 years ago

Harish -- Petersen and I don't see this, but you might give it a try.

Assignee: rickg → harishd

Teruko Kobayashi

Reporter

Comment 5

•

25 years ago

Since the content of http://home.netscape.com/zh/cn/ has changed, I copied a simple page to http://jazz/users/teruko/tests/cntest1.html

URL: http://home.netscape.com/zh/cn/ → http://jazz/users/teruko/publish/test...

harishd

Comment 6

•

25 years ago

Not able to reproduce!!! Teruko, is this bug still valid?

bobj

Comment 7

•

25 years ago

In my 2000-01-26 14:39 comment, I had strange results. I could reproduce it, but when I copied the page to another server, it worked...

harishd

Comment 8

•

25 years ago

Which implies that the problem could be server related..right??

Teruko Kobayashi

Reporter

Comment 9

•

25 years ago

Correction to my previous comment: since the content of http://home.netscape.com/zh/cn/ has changed, I copied a simple page to http://jazz/users/teruko/publish/tests/cntest1.html

harishd, did you try http://jazz/users/teruko/publish/tests/cntest1.html?

Teruko Kobayashi

Reporter

Comment 10

•

25 years ago

Attached file test case — Details

harishd

Comment 11

•

25 years ago

Attached file Simpler test case — Details

harishd

Comment 12

•

25 years ago

Could be a bug in ftang's code in nsScanner::Append(). Frank??

Frank Tang

Comment 13

•

25 years ago

Can you reproduce the problem w/ the "Simpler test case" http://bugzilla.mozilla.org/showattachment.cgi?attach_id=5105 ?

Frank Tang

Comment 14

•

25 years ago

harishd is right. This is a converter problem. What happens is that there is a 0xA0 byte after <LI>, but 0xA0 is not a legal code point in GB2312. Our error handling code eats two bytes instead of one for it, which causes the '<' to be eaten. Reassigning this back to ftang. Good catch, harishd. jbetak: is this similar to the one you found in the old UTF-8 code?
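
To make the failure mode concrete, here is a minimal Python sketch, not the actual Mozilla converter code. The function names `decode_buggy` and `decode_fixed` are hypothetical, and the lead/trail byte ranges assume GB2312 in its EUC-CN form (lead 0xA1-0xF7, trail 0xA1-0xFE). The only difference between the two functions is how many bytes are consumed when a high byte does not start a valid pair.

```python
REPLACEMENT = "\ufffd"  # U+FFFD, emitted for undecodable input


def _decode_pair(lead, trail):
    """Decode one two-byte GB2312 sequence, or return None if it is invalid."""
    try:
        return bytes([lead, trail]).decode("gb2312")
    except UnicodeDecodeError:
        return None


def decode_buggy(data):
    """Illustrates the bug: every high byte swallows the byte after it."""
    out, i = [], 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(chr(b))
            i += 1
            continue
        trail = data[i + 1] if i + 1 < len(data) else None
        ch = _decode_pair(b, trail) if trail is not None else None
        out.append(ch or REPLACEMENT)
        i += 2  # consumes two bytes even when the pair was invalid
    return "".join(out)


def decode_fixed(data):
    """Skips only the offending byte when the pair is not well formed."""
    out, i = [], 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(chr(b))
            i += 1
            continue
        trail = data[i + 1] if i + 1 < len(data) else None
        well_formed = (0xA1 <= b <= 0xF7
                       and trail is not None and 0xA1 <= trail <= 0xFE)
        if not well_formed:
            out.append(REPLACEMENT)
            i += 1  # drop one byte so the following '<' survives
            continue
        out.append(_decode_pair(b, trail) or REPLACEMENT)
        i += 2
    return "".join(out)


page = b"<TD VALIGN=TOP><LI>\xa0</TD>"
print(decode_buggy(page))  # the '<' of </TD> is lost
print(decode_fixed(page))  # the closing tag is intact
```

With the sample input, the buggy variant yields a stray "/TD>" like the one shown in View Source, while the fixed variant keeps "</TD>".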

Assignee: harishd → ftang

Frank Tang

Comment 15

•

25 years ago

I am not sure how much content out there has this kind of illegal code point issue. Mark it M18.

Status: NEW → ASSIGNED

Target Milestone: M18

jbetak@netscape.com (away - not reading bugmail)

Comment 16

•

25 years ago

ftang: the problem with UTF-8 before Feb 4 was very similar: we were eating 2 bytes instead of one. My impression was that it was happening in the buffering / error handling code. I was not able to find a regular pattern like the one in this 0xA0 problem, though. I will have another look later this week; maybe they have more in common. See bug 8702 (http://bugzilla.mozilla.org/show_bug.cgi?id=8702).

Frank Tang

Comment 17

•

25 years ago

Changed the summary from "</TD> parsing problem" to "illegal 0xA0 code point in Multibyte charset break parser".

In 4.x, we silently supported the undefined 0xA0 code point for CJK multibyte code pages; in SeaMonkey, we don't. This causes a backward compatibility issue. Some pages accidentally contain this character, and the webmaster does not spot the problem because the page displays correctly in 4.x. When SeaMonkey hits it, it can cause a parser problem. This happens especially if the 0xA0 comes right before an open tag, such as <TABLE>. Currently SeaMonkey takes the 0xA0 and the next character to form an undefined character, so the '<' of the <TABLE> (or other tag) is eaten by the converter.

We have the following options:
1. Ignore this bug and let webmasters fix their pages, since this character is not defined in the standards for these charsets.
2. Add 0xA0 to the to-Unicode converters for all the multibyte charsets so that it is converted to U+00A0.

Mark it M16.
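
As a rough illustration of option 2, the Python sketch below maps a stand-alone 0xA0 to U+00A0 (NO-BREAK SPACE) before trying a two-byte decode. It is an assumption-laden sketch rather than the patch attached to this bug: `decode_with_a0_compat` is a hypothetical name, and it only handles GB2312, where 0xA0 is never a lead byte and every non-ASCII character is exactly two bytes.

```python
NBSP = "\u00a0"         # what a stray 0xA0 is converted to under option 2
REPLACEMENT = "\ufffd"  # used for any other undecodable byte


def decode_with_a0_compat(data):
    """Decode GB2312 bytes, treating a stand-alone 0xA0 as U+00A0."""
    out, i = [], 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(chr(b))  # ASCII passes through unchanged
            i += 1
        elif b == 0xA0:
            out.append(NBSP)    # 4.x-compatible handling, one byte consumed
            i += 1
        else:
            try:
                out.append(data[i:i + 2].decode("gb2312"))
                i += 2
            except UnicodeDecodeError:
                out.append(REPLACEMENT)
                i += 1          # never swallow the byte after a bad one
    return "".join(out)


print(decode_with_a0_compat(b"<LI>\xa0<TABLE>"))
```

The design choice is that the 0xA0 check comes before any pair decoding, so the byte that follows it (here the '<' of <TABLE>) is always left for the parser.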

Summary: </TD> parsing problem → illegal 0xA0 code point in Multibyte charset break parser

Target Milestone: M18 → M16

Frank Tang

Comment 18

•

25 years ago

*** Bug 27704 has been marked as a duplicate of this bug. ***

Erik van der Poel

Comment 19

•

25 years ago

Mozilla has worked very hard to be compatible with Nav4. This 0xA0 issue should not be an exception. We should be compatible with Nav4, at least in NavQuirks mode, and probably even in Strict mode. Even in Strict mode, if there is an A0 byte, we should not eat the next byte, I think. Is this bug serious? Is M16 the right milestone for this?

Frank Tang

Updated

•

25 years ago

Target Milestone: M16 → M18

Frank Tang

Comment 20

•

25 years ago

Converter-related issue; reassigning to cata.

Assignee: ftang → cata

Status: ASSIGNED → NEW

cata

Updated

•

25 years ago

Status: NEW → ASSIGNED

Frank Tang

Comment 21

•

25 years ago

*** Bug 27454 has been marked as a duplicate of this bug. ***

Frank Tang

Updated

•

25 years ago

Target Milestone: M18 → M21

Frank Tang

Comment 22

•

25 years ago

Moving all of cata's bugs to ftang.

Assignee: cata → ftang

Status: ASSIGNED → NEW

Frank Tang

Comment 23

•

25 years ago

We need to add 0xa0 for Big5, gb2312, EUC-KR, GBK, EUC-JP. Mark this as moz0.9 P3

Status: NEW → ASSIGNED

Target Milestone: --- → mozilla0.9

Frank Tang

Comment 24

•

24 years ago

shanjian- can you help to take a look at this?

Assignee: ftang → shanjian

Status: ASSIGNED → NEW

Shanjian Li

Assignee

Comment 25

•

24 years ago

Attached patch Proposed fix part 1 — Details — Splinter Review

Shanjian Li

Assignee

Comment 26

•

24 years ago

Attached patch Proposed fix part 2 — Details — Splinter Review

Shanjian Li

Assignee

Comment 27

•

24 years ago

Attached patch Proposed fix part 1 — Details — Splinter Review

Shanjian Li

Assignee

Comment 28

•

24 years ago

Attached patch Proposed fix part 4 — Details — Splinter Review

Shanjian Li

Assignee

Updated

•

24 years ago

Status: NEW → ASSIGNED

Shanjian Li

Assignee

Comment 29

•

24 years ago

In GBK, 0xA0 is a legal lead byte. Patch 3 (wrongly marked as 1) should be dropped.
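
The reason the GBK patch has to be dropped can be checked with a small Python sketch. The lead-byte ranges below are stated from the charset definitions as I understand them (GB2312 as EUC-CN: 0xA1-0xF7; GBK: 0x81-0xFE), and the helper names are hypothetical, not taken from the Mozilla source.

```python
# Inclusive lead-byte ranges for the two-byte part of each charset.
LEAD_RANGES = {
    "gb2312": (0xA1, 0xF7),
    "gbk": (0x81, 0xFE),
}


def is_lead_byte(charset, byte):
    """Return True if `byte` can start a two-byte sequence in `charset`."""
    low, high = LEAD_RANGES[charset]
    return low <= byte <= high


def a0_needs_special_case(charset):
    """0xA0 may only be mapped to U+00A0 where it is not a lead byte."""
    return not is_lead_byte(charset, 0xA0)


for name in ("gb2312", "gbk"):
    print(name, "special-case 0xA0:", a0_needs_special_case(name))
```

For GBK, blindly converting every 0xA0 to U+00A0 would corrupt legitimate characters whose first byte is 0xA0, which is why that charset needs different handling.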

Shanjian Li

Assignee

Comment 30

•

24 years ago

Attached patch new part2 fix — Details — Splinter Review

Shanjian Li

Assignee

Comment 31

•

24 years ago

When I took a look at bug 64235, I revised fix part 2, so the problem in 64235 will be taken care of here. For people reviewing the fix, the complete fix includes:
- fix part 1 (the first one, not the one wrongly marked): nsUnicodeDecodeHelper.cpp
- new part2 fix: fixes GB2312
- fix part 4: fixes Japanese (eucjp and sjis)

Jungshik Shin

Comment 32

•

24 years ago

Bug 64235 is a superset of this bug, IMHO. It has to do with not just stand-alone 0xA0 but also with any stand-alone byte/octet with MSB=1 in various CJK encodings. I added a new patch to take care of it for GBK and GB2312 (to bug 64235)
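
The broader case, any stand-alone byte with the high bit set, can be sketched as a scan that reports every high byte not forming a well-formed two-byte sequence. This is a hypothetical diagnostic written in Python, not code from either bug; the byte ranges are assumptions based on the usual definitions of GB2312 (EUC-CN) and GBK, and only well-formedness is checked, not whether the pair is actually assigned.

```python
# (lead range, list of trail ranges), all inclusive.
GB2312 = ((0xA1, 0xF7), [(0xA1, 0xFE)])
GBK = ((0x81, 0xFE), [(0x40, 0x7E), (0x80, 0xFE)])


def find_stray_high_bytes(data, charset):
    """Return offsets of high bytes that do not start a well-formed pair."""
    (lead_lo, lead_hi), trails = charset
    stray, i = [], 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            i += 1
            continue
        nxt = data[i + 1] if i + 1 < len(data) else None
        is_lead = lead_lo <= b <= lead_hi
        has_trail = nxt is not None and any(lo <= nxt <= hi for lo, hi in trails)
        if is_lead and has_trail:
            i += 2           # well-formed pair, skip both bytes
        else:
            stray.append(i)  # stand-alone byte: report it and skip only one
            i += 1
    return stray


sample = b"<LI>\xa0</TD>\xd6\xd0\x81<"
print(find_stray_high_bytes(sample, GB2312))
print(find_stray_high_bytes(sample, GBK))
```

Note that in GBK the 0xA0 in the sample is still reported: it is a legal lead byte, but '<' (0x3C) is not a legal trail byte, so the sequence is malformed and the '<' must not be consumed.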

bsharma

Comment 33

•

24 years ago

updated qa contact.

QA Contact: janc → bsharma

Frank Tang

Comment 34

•

24 years ago

shanjian- this is hard to review. Why don't you produce a new patch that includes all the necessary parts?

Frank Tang

Comment 35

•

24 years ago

Is it true that the patch in 64235 covers the fix for this?

Shanjian Li

Assignee

Comment 36

•

24 years ago

yes, 64235 covered this one.

Shanjian Li

Assignee

Comment 37

•

24 years ago

Fix has been checked in.

Status: ASSIGNED → RESOLVED

Closed: 24 years ago

Resolution: --- → FIXED

bsharma

Comment 38

•

24 years ago

Verified on build 2001-06-20-04-Trunk, platform Win NT. I do not see "/TD".

Status: RESOLVED → VERIFIED
