Closed
Bug 26920
Opened 25 years ago
Closed 7 years ago
Generic UCV buffering scheme byte misalignments
Categories
(Core :: Internationalization, defect, P3)
Tracking
RESOLVED
FIXED
mozilla56
People
(Reporter: jbetak, Assigned: hsivonen)
Details
(Keywords: intl, Whiteboard: [fixed by encoding_rs])
Attachments
(1 file)
108.60 KB, image/jpeg
If you use an old build (e.g. 2000020310 or earlier), you will notice that we "eat" some parts of the HTML and the rendering of the file is disrupted. Although this buffering scheme is no longer used in the UTF-8 decoder, it is still needed for other decoders and should be revisited, since there may be problems with buffer alignment. I looked at it in the debugger, and my impression was exactly that: the buffer comes back with misaligned bytes, and we lose some of them in the process...
Comment 1•25 years ago
jbetak: Can this bug be reproduced with the current build using other charsets?
Assignee: ftang → cata
Since this is a rather random, hard-to-reproduce bug, I'm moving it far out. No need to worry until it bites us harder or we have extra time.
Target Milestone: M20
Cata and Juraj, can you better characterize the nature of this bug? Then IQA can do some testing to assure us that this is indeed a rare case. I don't want to find out that this is not rare after Beta1.
Comment 4•24 years ago
With no specific information, marking it as invalid.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → INVALID
Comment 5•24 years ago
After talking with Bob, I think we should reopen this bug. Reopening it and marking it as Future.
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
Target Milestone: M20 → Future
Are you sure, guys? For all I know, Juraj was the only one who saw this, once or twice, a long time ago. I never heard of other occurrences, and I cannot reproduce it... Why reopen?
Have you carefully reviewed the code where Juraj thought he saw the loss of misaligned bytes? How did you try to reproduce this? What test cases do you have? Can we add any instrumentation (e.g., ASSERTs) to try to catch this? Losing random bytes can be very hard to track down. Let's reassure ourselves that there are no edge cases where bytes are being lost before invalidating this.
I tried to reproduce this with the URL from the bug report, http://people/ftang/demo/utf8all.html, which is the only test case I know of, and it worked just fine. The reason I don't believe this bug is valid is the nature of that code. It is byte-processing code and very deterministic: if there's a problem once, I expect it to be there every time we process that byte stream. Also, that code is shared by *all* converters and is exercised for every multibyte page. Any "byte eating" should be *very* obvious (pretty much garbage on the rest of the page...). Yet this is the only report we have, and I can't reproduce it. So I'll leave it up to you whether to close the bug or leave it open.
Updated•24 years ago
Status: NEW → ASSIGNED
Comment 10•23 years ago
I hadn't seen the symptom of this bug before m0.9.1, but with m0.9.1 I have come across a lot of Korean pages (in EUC-KR) showing what I believe is a symptom of this bug. Sometimes the misalignment in the converter seems to get fixed by reloading, but most of the time reloading the page doesn't help. This bug is very, very serious.
For instance, look at http://www.hani.co.kr/section-005000000/2001/07/005000000200107021018305.html. Around the end of the 4th paragraph, misalignment occurs, and what should be three Hangul syllables (U+AC83, U+C73C, U+B85C) is rendered as UNKNOWN ("?" inside a diamond), U+75FC, U+B9C9, UNKNOWN. Below, a pair of parentheses denotes the sequence of octet(s) for a single character:
Correct EUC-KR: (20) (B0,CD) (C0,B8) (B7,CE) (20), which should be converted to U+0020 U+AC83 U+C73C U+B85C U+0020 in UCS-2.
Misaligned EUC-KR: (20) (B0) (CD,C0) (B8,B7) (CE) (20), which is in turn converted to U+0020, UNKNOWN, U+75FC, U+B9C9, UNKNOWN, U+0020.
I encounter this problem on *every few* Korean pages, but it doesn't seem to follow any pattern (at least, my casual inspection hasn't revealed any regularity). Even when a single page contains two identical strings, one of them gets corrupted while the other doesn't. As I wrote above, I think this is very serious, and fixing it can't be put off any longer.
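The misreading above is consistent with a decoder that drops a pending lead byte at a buffer boundary. The following is a minimal sketch, not the actual UCV code. It assumes a simplified EUC-KR model in which any byte >= 0x80 starts a two-byte sequence and a valid trail byte is 0xA1..0xFE, and it returns byte groupings rather than code points. The hypothetical helpers `naive_chunked` and `StatefulPairer` contrast a decoder that restarts pairing in every chunk with one that carries an unpaired lead byte into the next chunk. If the stream happens to be split right after 0xB0, the naive version produces exactly the grouping reported above.

```python
from typing import List, Optional, Tuple

Group = Tuple[int, ...]

def is_trail(b: int) -> bool:
    # Simplified EUC-KR trail-byte range (an assumption of this sketch).
    return 0xA1 <= b <= 0xFE

def naive_chunked(chunks: List[bytes]) -> List[Group]:
    """Pairs bytes within each chunk only; state is dropped at chunk boundaries."""
    groups: List[Group] = []
    for chunk in chunks:
        i = 0
        while i < len(chunk):
            b = chunk[i]
            if b < 0x80:
                groups.append((b,))
                i += 1
            elif i + 1 < len(chunk) and is_trail(chunk[i + 1]):
                groups.append((b, chunk[i + 1]))
                i += 2
            else:
                # Lead byte at the end of a chunk (or with a bad trail): emitted alone.
                groups.append((b,))
                i += 1
    return groups

class StatefulPairer:
    """Carries an unpaired lead byte from one chunk into the next."""

    def __init__(self) -> None:
        self.pending: Optional[int] = None

    def feed(self, chunk: bytes, last: bool = False) -> List[Group]:
        groups: List[Group] = []
        for b in chunk:
            if self.pending is not None:
                lead, self.pending = self.pending, None
                if is_trail(b):
                    groups.append((lead, b))
                    continue
                groups.append((lead,))  # malformed lead; b is processed below
            if b < 0x80:
                groups.append((b,))
            else:
                self.pending = b
        if last and self.pending is not None:
            groups.append((self.pending,))  # truncated sequence at end of stream
            self.pending = None
        return groups

def fmt(groups: List[Group]) -> str:
    return " ".join("(" + ",".join(f"{b:02X}" for b in g) + ")" for g in groups)

# Bytes from the hani.co.kr example, split right after 0xB0.
chunks = [bytes([0x20, 0xB0]), bytes([0xCD, 0xC0, 0xB8, 0xB7, 0xCE, 0x20])]

print(fmt(naive_chunked(chunks)))  # (20) (B0) (CD,C0) (B8,B7) (CE) (20)
dec = StatefulPairer()
print(fmt(dec.feed(chunks[0]) + dec.feed(chunks[1], last=True)))  # (20) (B0,CD) (C0,B8) (B7,CE) (20)
```

With the whole sequence in a single chunk, `naive_chunked` also produces the correct grouping. A converter that loses state this way would therefore corrupt text only where the network or parser happens to cut the stream, which would fit the intermittent, reload-sensitive behavior reported here. This illustrates one plausible mechanism; it is not a diagnosis of the original code.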
Comment 11•23 years ago
Comment 12•23 years ago
The page I gave in my previous comment may not get corrupted when you try to reproduce the problem. When I revisited the page after quitting and restarting Mozilla 0.9.1 (MS-Windows ME), it rendered correctly on the first attempt. However, when I reloaded the page, I was able to reproduce the problem. Under Linux, that page had no problem. This doesn't mean there's no problem under Linux; it just shows how hard it is to find any regularity in this bug. There are web pages (which look fine to me in MS-Windows) that show this symptom in the Linux version of Mozilla m0.9.1. One such page is http://www.hani.co.kr/section-003000000/2001/05/003000000200105131503349.html, where '(20) (C0,CF) (C1,A4) (C0,BB) (20)' in EUC-KR is misaligned and treated as '(20) (C0,CF) (C1) (A4,C0) (BB) (20)'.
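The second example shows the same kind of shift, this time starting after (C0,CF). The following minimal sketch uses a toy EUC-KR model: any byte >= 0x80 starts a two-byte sequence, and a valid trail byte is 0xA1..0xFE. A hypothetical `naive_split` helper pairs the bytes independently on each side of a single buffer boundary, and the script tries every split point of the reported sequence. In this model only some split positions corrupt the text, and the split right after 0xC1 gives exactly the reported misreading. That would be consistent with the damage depending on how the data arrives rather than on the bytes alone.

```python
from typing import List, Tuple

Group = Tuple[int, ...]

def pair_chunk(chunk: bytes) -> List[Group]:
    """Toy EUC-KR pairing with no state carried in from earlier chunks."""
    groups: List[Group] = []
    i = 0
    while i < len(chunk):
        b = chunk[i]
        if b >= 0x80 and i + 1 < len(chunk) and 0xA1 <= chunk[i + 1] <= 0xFE:
            groups.append((b, chunk[i + 1]))
            i += 2
        else:
            # ASCII byte, or a lead byte with no valid trail in this chunk.
            groups.append((b,))
            i += 1
    return groups

def naive_split(data: bytes, at: int) -> List[Group]:
    """Decodes data as two buffers, data[:at] and data[at:], independently."""
    return pair_chunk(data[:at]) + pair_chunk(data[at:])

def fmt(groups: List[Group]) -> str:
    return " ".join("(" + ",".join(f"{b:02X}" for b in g) + ")" for g in groups)

data = bytes([0x20, 0xC0, 0xCF, 0xC1, 0xA4, 0xC0, 0xBB, 0x20])
expected = fmt(pair_chunk(data))  # grouping when the whole sequence is in one buffer

for at in range(1, len(data)):
    result = fmt(naive_split(data, at))
    status = "ok" if result == expected else "MISALIGNED"
    print(f"split at {at}: {result}  [{status}]")
```

In this toy model, only boundaries that fall between a lead byte and its trail byte (splits 2, 4 and 6) cause damage. The mispairing then continues until an ASCII byte resynchronizes it, which matches the observation that two identical strings on one page can decode differently. These comments do not establish whether Mozilla's converters actually lost state this way.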
Reporter
Comment 14•23 years ago
ftang: if you want me to, I could do some investigation on this bug...
Reporter
Comment 15•23 years ago
Per ftang's comment, this has recently been improved but is still not completely fixed. Keeping it open and pushing it out to 0.9.4.
Reporter
Updated•23 years ago
Target Milestone: mozilla0.9.3 → mozilla0.9.4
Comment 16•23 years ago
I think we already fixed a lot of issues in m0.9.2. Moving this one to m0.9.7.
Target Milestone: mozilla0.9.4 → mozilla0.9.7
Comment 18•19 years ago
What a hack. I haven't touched Mozilla code for 2 years, and I haven't read these bugs for 2 years, yet they are still here. Just closing them as WONTFIX to clean up.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago → 19 years ago
Resolution: --- → WONTFIX
Comment 19•19 years ago
Mass re-open of bugs Frank Tang closed for no good reason. The spam is his fault, not my own.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Comment 20•19 years ago
Mass re-assigning of Frank Tang's old bugs that he closed as WONTFIX and that had to be reopened. The spam is his fault, not my own.
Assignee: ftang → nobody
Status: REOPENED → NEW
Comment 21•16 years ago
Filter on "Nobody_NScomTLD_20080620"
Assignee: nobody → smontagu
QA Contact: teruko → i18n
Assignee
Updated•8 years ago
Depends on: encoding_rs
Assignee
Comment 22•7 years ago
I believe this was already fixed earlier, but at the latest it was fixed by bug 1261841.
Status: NEW → RESOLVED
Closed: 19 years ago → 7 years ago
Resolution: --- → FIXED
Whiteboard: [fixed by encoding_rs]
Assignee
Updated•7 years ago
Assignee: smontagu → hsivonen
Target Milestone: Future → mozilla56