Closed Bug 26920 Opened 25 years ago Closed 7 years ago

Generic UCV buffering scheme byte misalignments

Categories

(Core :: Internationalization, defect, P3)

defect

Tracking


RESOLVED FIXED
mozilla56

People

(Reporter: jbetak, Assigned: hsivonen)

References


Details

(Keywords: intl, Whiteboard: [fixed by encoding_rs])

Attachments

(1 file)

If you use an old build (e.g. 2000020310 or earlier), then you will notice that 
we "eat" some parts of the HTML and the file rendering is disrupted. Although 
this buffering scheme is not used in the UTF-8 decoder anymore, it is still 
needed for other decoders and should be revisited since there might be problems 
with buffer alignment. 

I looked at it in the debugger and my impression was just that - the buffer 
comes back with inappropriately aligned bytes and we lose some of them in the 
process...
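For reference, a minimal sketch of the carry-over state a streaming two-byte decoder has to keep between buffers (hypothetical code, not the actual UCV converter implementation): if the pending lead byte is dropped or misplaced at a buffer boundary, every following character is shifted by one octet, which is exactly this kind of misalignment.

#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct TwoByteStreamDecoder {
  uint8_t mPendingLead = 0;   // lead byte left over from the previous buffer
  bool mHasPending = false;

  // Returns the characters assembled from this buffer as (lead, trail) pairs;
  // a single-byte character is reported as (0, byte).
  std::vector<std::pair<uint8_t, uint8_t>> Feed(const uint8_t* aSrc, size_t aLen) {
    std::vector<std::pair<uint8_t, uint8_t>> chars;
    size_t i = 0;
    if (mHasPending && aLen > 0) {
      chars.push_back({mPendingLead, aSrc[0]});   // finish the character split across buffers
      mHasPending = false;
      i = 1;
    }
    while (i < aLen) {
      if (aSrc[i] < 0x80) {
        chars.push_back({0, aSrc[i]});            // single-byte (ASCII) character
        ++i;
      } else if (i + 1 < aLen) {
        chars.push_back({aSrc[i], aSrc[i + 1]});  // complete two-byte character
        i += 2;
      } else {
        mPendingLead = aSrc[i];                   // buffer ends in the middle of a character:
        mHasPending = true;                       // this byte MUST survive to the next Feed()
        ++i;
      }
    }
    return chars;
  }
};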
jbetak - Can this bug be reproduced with the current build and other charsets?
Assignee: ftang → cata
Status: NEW → ASSIGNED
Since this is a rather random and difficult-to-reproduce bug, I'm moving it far out. No
need to worry until it bites us harder or we have extra time.
Target Milestone: M20
Cata and Juraj, can you better characterize the nature of this bug?
Then IQA can do some testing to assure us that this is indeed a rare
case.  I don't want to find out that this is not rare after Beta1.
No specific information was provided. Marking it as invalid.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → INVALID
After talking with Bob, we should reopen this bug. I'm reopening it and marking it as future.
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
Target Milestone: M20 → Future
Are you sure, guys? For all I know, Juraj was the only one seeing this once or 
twice a long time ago. I never heard of other occurrences, and I cannot 
reproduce it...

Why reopen?
Have you carefully reviewed the code where Juraj thought he saw the loss of
misaligned bytes?  How did you try to reproduce this?  What test cases do you
have?  Can we do any instrumentation (e.g., ASSERTs) to try to catch this?

Losing random bytes can be very hard to find.  Let's reassure ourselves that
there are no edge cases where bytes are being lost before invalidating this.
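As a sketch of the kind of instrumentation meant here (a hypothetical audit helper, not the existing converter interface), an assertion like the following would catch bytes that are silently dropped across Convert() calls: every byte offered must either be consumed into output or still be pending as an incomplete character.

#include <cassert>
#include <cstddef>

struct ConversionAudit {
  size_t mBytesOffered = 0;
  size_t mBytesConsumed = 0;
  size_t mBytesPending = 0;

  // Call once after each conversion call with the number of source bytes
  // offered, the number actually consumed into output, and the number the
  // decoder claims to still be holding as an incomplete character.
  void RecordCall(size_t aOffered, size_t aConsumed, size_t aPendingAfter) {
    mBytesOffered += aOffered;
    mBytesConsumed += aConsumed;
    mBytesPending = aPendingAfter;
    // If this ever fires, the buffering scheme dropped (or double-counted) bytes.
    assert(mBytesConsumed + mBytesPending == mBytesOffered);
  }
};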
I tried to reproduce this with the URL from the bug report: 
http://people/ftang/demo/utf8all.html. That's the only test case I know of. And 
it worked just fine.

The reason I do not believe this bug is valid is the nature of that code. It is 
byte-processing code. Very deterministic. If there's a problem once, I expect it 
to always be there, every time we process that byte stream. Also, that code is 
shared by *all* converters. It is exercised for every multibyte page. Any "byte 
eating" should be *very* obvious (pretty much garbage on the rest of the whole 
page...). And yet this is the only report we have and I can't reproduce it.

So, I'll leave it up to you if you want to close the bug or leave it open.
Keywords: intl
Moving all of cata's bugs to ftang.
Assignee: cata → ftang
Status: REOPENED → NEW
Status: NEW → ASSIGNED
I hadn't seen the symptom of this bug before 0.9.1, but with
0.9.1 I came across a lot of pages (Korean in EUC-KR) manifesting
what I believe to be a symptom of this bug. Sometimes the misalignment
in the converter seems to get fixed by reloading, but most
of the time reloading pages doesn't help. 
This bug is very very serious. For instance, look at
http://www.hani.co.kr/section-005000000/2001/07/005000000200107021018305.html

Around the end of the 4th paragraph, misalignment occurs, and what should be
three Hangul syllables (U+AC83, U+C73C, U+B85C)
is rendered as UNKNOWN ("?" inside a diamond), U+75FC, U+B9C9, UNKNOWN.
The sequence (in EUC-KR) is 

  (20) (B0,CD) (C0,B8) (B7,CE) (20)        : EUC-KR

which should be converted to

  U+0020 U+AC83 U+C73C U+B85C U+0020       : UCS-2 converted from correct EUC-KR

but is instead interpreted as 

  (20) (B0) (CD,C0) (B8,B7) (CE) (20)      : misaligned EUC-KR

which, in turn, is converted to

  U+0020, UNKNOWN, U+75FC, U+B9C9, UNKNOWN, U+0020 : UCS-2 (converted from misaligned EUC-KR)

where a pair of parentheses denotes the sequence of octet(s)
for a single character. 

   I run into this problem on *every few* Korean pages, but it doesn't
seem to have any pattern (at least my casual inspection
hasn't given me any regularity). Even when there are two identical
strings in a single page, one of them gets corrupted while
the other doesn't. 
   
   As I wrote above, I think this is very serious, and fixing
this cannot be put off any longer. 
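To make the failure mode concrete, here is a small standalone sketch (hypothetical code; the lookup table contains only the byte-to-code-point mappings quoted above, not the real EUC-KR table) showing how a pending lead byte dropped at a buffer boundary yields exactly the corrupted sequence described:

#include <cstdint>
#include <cstdio>

// Only the mappings quoted in this comment; NOT the real EUC-KR table.
static unsigned Lookup(uint8_t lead, uint8_t trail) {
  if (lead == 0xB0 && trail == 0xCD) return 0xAC83;
  if (lead == 0xC0 && trail == 0xB8) return 0xC73C;
  if (lead == 0xB7 && trail == 0xCE) return 0xB85C;
  if (lead == 0xCD && trail == 0xC0) return 0x75FC;
  if (lead == 0xB8 && trail == 0xB7) return 0xB9C9;
  return 0xFFFD;  // UNKNOWN / replacement character
}

int main() {
  // The stream 20 B0 CD C0 B8 B7 CE 20 arrives split across two buffers,
  // with the first buffer ending on the lead byte B0.
  const uint8_t buf1[] = {0x20, 0xB0};
  const uint8_t buf2[] = {0xCD, 0xC0, 0xB8, 0xB7, 0xCE, 0x20};

  // Correct decoding: the pending lead byte 0xB0 is carried into buf2.
  printf("correct:    U+0020 U+%04X U+%04X U+%04X U+0020\n",
         Lookup(buf1[1], buf2[0]), Lookup(buf2[1], buf2[2]),
         Lookup(buf2[3], buf2[4]));

  // Buggy decoding: the pending lead byte is forgotten, so buf2 is re-paired
  // from its first byte onward: (CD,C0) (B8,B7) and an orphaned CE.
  printf("misaligned: U+0020 UNKNOWN U+%04X U+%04X UNKNOWN U+0020\n",
         Lookup(buf2[0], buf2[1]), Lookup(buf2[2], buf2[3]));
  return 0;
}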
The page I gave in my previous comment may not get corrupted if you 
try to reproduce it. When I revisited the page after quitting and
restarting Mozilla (MS-Windows ME) 0.9.1, the page rendered all right
on my first attempt. However, when I reloaded the page, I was able
to reproduce the problem. Under Linux, there was no problem on that page.
This does not mean that there's no problem under Linux, but just means
that it's very hard to find any regularity in this bug. There
are web pages (where I don't see any problem on MS-Windows) with this symptom
in the Linux version of Mozilla 0.9.1. One such page is

http://www.hani.co.kr/section-003000000/2001/05/003000000200105131503349.html

where '(20) (C0,CF) (C1,A4) (C0,BB) (20)' in EUC-KR is misaligned
and is treated as '(20) (C0,CF) (C1) (A4,C0) (BB) (20)'. 

 
move it to m0.9.3 
Target Milestone: Future → mozilla0.9.3
ftang: if you want me to, I could do some investigation on this bug...
Per ftang's comment, this has recently been improved, but it is still not completely 
fixed. Keeping open and pushing out to 0.9.4.
Target Milestone: mozilla0.9.3 → mozilla0.9.4
I think we already fixed a lot of issues in m0.9.2. Moving this one to m0.9.7.
Target Milestone: mozilla0.9.4 → mozilla0.9.7
future it for now.
Target Milestone: mozilla0.9.7 → Future
Blocks: 187812
What a hack. I have not touched Mozilla code for 2 years. I haven't read these bugs
for 2 years, and they are still there. Just closing them as won't fix to clean up.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago → 19 years ago
Resolution: --- → WONTFIX
Mass reopen of bugs Frank Tang closed with no good reason. The spam is his
fault, not my own.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Mass reassigning Frank Tang's old bugs that he closed as won't fix and that had to be
reopened. The spam is his fault, not my own.
Assignee: ftang → nobody
Status: REOPENED → NEW
Filter on "Nobody_NScomTLD_20080620"
Assignee: nobody → smontagu
QA Contact: teruko → i18n
Depends on: encoding_rs
I believe this was already fixed earlier, but in any case it was fixed by bug 1261841 at the latest.
Status: NEW → RESOLVED
Closed: 19 years ago → 7 years ago
Resolution: --- → FIXED
Whiteboard: [fixed by encoding_rs]
Assignee: smontagu → hsivonen
Target Milestone: Future → mozilla56