Closed Bug 26920 Opened 26 years ago Closed 8 years ago

Generic UCV buffering scheme byte misalignments

Status:

RESOLVED FIXED

Milestone:

mozilla56

People

(Reporter: jbetak, Assigned: hsivonen)

Details

(Keywords: intl, Whiteboard: [fixed by encoding_rs])

Attachments

(1 file)

screenshot of Mozilla displaying the URL in my prev. comment (24 years ago, Jungshik Shin, 108.60 KB, image/jpeg)

jbetak@netscape.com (away - not reading bugmail)

Reporter

Description

•

26 years ago

If you use an old build (e.g. 2000020310 or earlier), then you will notice that we "eat" some parts of the HTML and the file rendering is disrupted. Although this buffering scheme is not used in the UTF-8 decoder anymore, it is still needed for other decoders and should be revisited since there might be problems with buffer alignment. I looked at it in the debugger and my impression was just that - the buffer comes back with inappropriately aligned bytes and we lose some of them in the process...

Frank Tang

Comment 1

•

26 years ago

jbetak: Can this bug be reproduced in the current build with other charsets?

Assignee: ftang → cata

cata

Updated

•

26 years ago

Status: NEW → ASSIGNED

cata

Comment 2

•

26 years ago

Being a rather random and difficult to reproduce bug, I'm moving it far away. No need to worry until it bites us harder or we have extra time.

Target Milestone: M20

bobj

Comment 3

•

26 years ago

Cata and Juraj, can you better characterize the nature of this bug? Then IQA can do some testing to assure us that this is indeed a rare case. I don't want to find out that this is not rare after Beta1.

Frank Tang

Comment 4

•

25 years ago

No specific information has been provided. Marking it as invalid.

Status: ASSIGNED → RESOLVED

Closed: 25 years ago

Resolution: --- → INVALID

Teruko Kobayashi

Comment 5

•

25 years ago

After talking with Bob, we agreed that this bug should be reopened. I am reopening it and marking it as Future.

Status: RESOLVED → REOPENED

Resolution: INVALID → ---

Target Milestone: M20 → Future

cata

Comment 6

•

25 years ago

Are you sure, guys? For all I know, Juraj was the only one seeing this, once or twice, a long time ago. I never heard of other occurrences, and I cannot reproduce it... Why reopen?

bobj

Comment 7

•

25 years ago

Have you carefully reviewed the code where Juraj thought he saw the loss of misaligned bytes? How did you try to reproduce this? What test cases do you have? Can we do any instrumentation (e.g., ASSERTs) to try to catch this? Losing random bytes can be very hard to find. Let's reassure ourselves that there are no edge cases where bytes are being lost before invalidating this.
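One way to look for such edge cases is to cut the same byte stream at every possible buffer boundary and check that the chunked result matches decoding the whole stream at once. The following is only a sketch of that idea in Python, using the standard library's EUC-KR codec as a stand-in; it is not the Mozilla converter code, and the names `decode_in_chunks`, `find_bad_splits`, and `sample` are made up for illustration.

```python
import codecs


def decode_in_chunks(data: bytes, split: int, encoding: str) -> str:
    """Decode data as two buffers, cut at index `split`, with one stateful decoder."""
    decoder = codecs.getincrementaldecoder(encoding)()
    head = decoder.decode(data[:split])
    tail = decoder.decode(data[split:], final=True)
    return head + tail


def find_bad_splits(data: bytes, encoding: str) -> list:
    """Return every split index where the chunked result differs from the one-shot result."""
    expected = data.decode(encoding)
    return [
        split
        for split in range(len(data) + 1)
        if decode_in_chunks(data, split, encoding) != expected
    ]


# Hypothetical sample mixing ASCII with two-byte EUC-KR characters.
sample = "한글 test 것으로".encode("euc_kr")
bad_splits = find_bad_splits(sample, "euc_kr")
print("split points that lose or misalign bytes:", bad_splits)
```

A converter that keeps a pending lead byte across buffers should report no bad split points; any index in the list would identify the exact boundary at which bytes are lost.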

cata

Comment 8

•

25 years ago

I tried to reproduce this with the URL from the bug report: http://people/ftang/demo/utf8all.html. That's the only test case I know of. And it worked just fine. The reason I do not believe this bug is valid is the nature of that code. It is byte-processing code. Very deterministic. If there's a problem once, I expect it to be always there, every time we process that bytestream. Also, that code is shared by *all* converters. It is exercised for every multibyte page. Any "byte eating" should be *very* obvious (pretty much garbage on the rest of the whole page...). And yet this is the only report we have and I can't reproduce it. So, I'll leave it up to you if you want to close the bug or leave it open.

Teruko Kobayashi

Updated

•

25 years ago

Keywords: intl

Frank Tang

Comment 9

•

25 years ago

Moving all of cata's bugs to ftang.

Assignee: cata → ftang

Status: REOPENED → NEW

Frank Tang

Updated

•

25 years ago

Status: NEW → ASSIGNED

Jungshik Shin

Comment 10

•

24 years ago

I haven't seen the symptom of this bug before m 0.9.1, but with m 0.9.1 I came across a lot of pages (Korean, in EUC-KR) manifesting what I believe to be a symptom of this bug. Sometimes the misalignment in the converter seems to get fixed by reloading, but most of the time reloading the page doesn't help. This bug is very, very serious.

For instance, look at http://www.hani.co.kr/section-005000000/2001/07/005000000200107021018305.html Around the end of the 4th paragraph, misalignment occurred, and what should be three Hangul syllables (U+AC83, U+C73C, U+B85C) is rendered as UNKNOWN ("?" inside a diamond), U+75FC, U+B9C9, UNKNOWN. Below, a pair of parentheses denotes the sequence of octet(s) for a single character.

The sequence in EUC-KR is

(20) (B0,CD) (C0,B8) (B7,CE) (20) : EUC-KR

which should be converted to

U+0020 U+AC83 U+C73C U+B85C U+0020 : UCS-2 converted from correct EUC-KR

but is interpreted as

(20) (B0) (CD,C0) (B8,B7) (CE) (20) : misaligned EUC-KR

which, in turn, is converted to

U+0020, UNKNOWN, U+75FC, U+B9C9, UNKNOWN, U+0020 : UCS-2 converted from misaligned EUC-KR

I encountered this problem on *every few* Korean pages, but it doesn't seem to follow any pattern (at least, my casual inspection hasn't revealed any regularity). Even when there are two identical strings in a single page, one of them gets corrupted while the other doesn't. As I wrote above, I think this is very serious and fixing it cannot be put off any longer.
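The misaligned reading quoted above is what one would expect if a lead byte sitting at the end of one buffer is dropped instead of being carried into the next buffer. The Python sketch below illustrates this with the standard library's EUC-KR codec. The buffer boundary after the byte B0 is an assumption made for the illustration, as are the helper names `decode_stateless` and `decode_stateful`; nothing here is taken from the Mozilla converter code, and the replacement characters Python emits need not match Mozilla's rendering exactly.

```python
import codecs

# Octets quoted in the comment above: SPACE, three Hangul syllables, SPACE.
DATA = bytes([0x20, 0xB0, 0xCD, 0xC0, 0xB8, 0xB7, 0xCE, 0x20])


def decode_stateless(chunks):
    """Decode each buffer on its own, forgetting a lead byte left at a buffer end."""
    return "".join(chunk.decode("euc_kr", errors="replace") for chunk in chunks)


def decode_stateful(chunks):
    """Carry a pending lead byte over into the next buffer."""
    decoder = codecs.getincrementaldecoder("euc_kr")(errors="replace")
    parts = [decoder.decode(chunk) for chunk in chunks]
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


def code_points(text):
    """Format a string as a list of U+XXXX code points for comparison."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Assumed buffer boundary: the first buffer ends right after the lead byte 0xB0.
chunks = [DATA[:2], DATA[2:]]

expected = DATA.decode("euc_kr")
broken = decode_stateless(chunks)
fixed = decode_stateful(chunks)

print("one-shot :", code_points(expected))
print("stateless:", code_points(broken))
print("stateful :", code_points(fixed))
```

Because the outcome depends on where the buffer boundary happens to fall, the same string can decode correctly in one place and be corrupted in another, which would be consistent with the lack of any visible pattern.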

Jungshik Shin

Comment 11

•

24 years ago

Attached image: screenshot of Mozilla displaying the URL in my prev. comment

Jungshik Shin

Comment 12

•

24 years ago

The page I gave in my prev. comment may not get corrupted if you try to reproduce it. When I revisited the page after quitting and restarting Mozilla 0.9.1 (MS-Windows ME), the page rendered all right at my first attempt. However, when I reloaded the page, I was able to reproduce the problem. Under Linux, there was no problem with the page. This does not mean that there's no problem under Linux; it just means that it's very hard to find any regularity in this bug. There are web pages (where I don't see any problem in MS-Windows) with this symptom in the Linux version of Mozilla m0.9.1. One such page is http://www.hani.co.kr/section-003000000/2001/05/003000000200105131503349.html where '(20) (C0,CF) (C1,A4) (C0,BB) (20)' in EUC-KR is misaligned and is treated as '(20) (C0,CF) (C1) (A4,C0) (BB) (20)'.

Frank Tang

Comment 13

•

24 years ago

Moving it to m0.9.3.

Target Milestone: Future → mozilla0.9.3

jbetak@netscape.com (away - not reading bugmail)

Reporter

Comment 14

•

24 years ago

ftang: if you want me to, I could do some investigation on this bug...

jbetak@netscape.com (away - not reading bugmail)

Reporter

Comment 15

•

24 years ago

Per ftang's comment, this has recently been improved, but is still not completely fixed. Keeping open and pushing out to 0.9.4.

jbetak@netscape.com (away - not reading bugmail)

Reporter

Updated

•

24 years ago

Target Milestone: mozilla0.9.3 → mozilla0.9.4

Frank Tang

Comment 16

•

24 years ago

I think we already fixed a lot of these issues in m0.9.2. Moving this one to m0.9.7.

Target Milestone: mozilla0.9.4 → mozilla0.9.7

Frank Tang

Comment 17

•

24 years ago

Marking it as Future for now.

Target Milestone: mozilla0.9.7 → Future

Frankie

Updated

•

22 years ago

Blocks: 187812

jbetak@netscape.com (away - not reading bugmail)

Reporter

Updated

•

22 years ago

No longer blocks: 187812

URL: http://people/ftang/demo/utf8all.html → http://people.netscape.com/ftang/demo...

Frank Tang

Comment 18

•

20 years ago

What a hack. I have not touched Mozilla code for 2 years. I haven't read these bugs for 2 years. And they are still there. Just closing them as WONTFIX to clean up.

Status: ASSIGNED → RESOLVED

Closed: 25 years ago → 20 years ago

Resolution: --- → WONTFIX

Travis Chase

Comment 19

•

20 years ago

Mass re-open of bugs that Frank Tang closed with no good reason. Spam is his fault, not my own.

Status: RESOLVED → REOPENED

Resolution: WONTFIX → ---

Travis Chase

Comment 20

•

20 years ago

Mass reassigning Frank Tang's old bugs that he closed as WONTFIX and that had to be reopened. Spam is his fault, not my own.

Assignee: ftang → nobody

Status: REOPENED → NEW

Serge Gautherie (:sgautherie)

Comment 21

•

17 years ago

Filter on "Nobody_NScomTLD_20080620"

Assignee: nobody → smontagu

QA Contact: teruko → i18n

Henri Sivonen (:hsivonen)

Assignee

Updated

•

9 years ago

Depends on: encoding_rs

Henri Sivonen (:hsivonen)

Assignee

Comment 22

•

8 years ago

I believe this was already fixed earlier, but in any case it was fixed by bug 1261841 at the latest.

Status: NEW → RESOLVED

Closed: 20 years ago → 8 years ago

Resolution: --- → FIXED

Whiteboard: [fixed by encoding_rs]

Henri Sivonen (:hsivonen)

Assignee

Updated

•

8 years ago

Assignee: smontagu → hsivonen

Target Milestone: Future → mozilla56
