Closed Bug 18377 Opened 20 years ago Closed 20 years ago
Latin2 character E8 is not displayed in Input Value field
Only the E8 character "è" is not displayed in the INPUT Value field. Steps to reproduce: 1. Go to the above URL. Look at the field after State, Country, Zip. The same 4 characters as after test1 and test3 should be displayed there. However, the last "è" is missing. Tested in 110909 Win32, Mac, and Linux builds.
Windows display problem. Assign to erik.
Frank, Teruko said that she tested Win32, Mac and Linux. So this is not a Windows display problem. Teruko, if this problem appears on Win32, Mac and Linux, please change the OS field to All. Teruko, I had a look at the URL above, and found that the document is in little-endian Unicode, even though the META charset says iso-8859-2. Either the document shouldn't be in Unicode, or the META charset shouldn't say iso-8859-2, right? Re-assigning to Teruko so that she can fix the test page first.
Ok, I fixed the test cases.
The 4th letter (0xE8, which is small c with caron in iso-8859-2) is indeed missing in the State, Country and Zip fields. When I did a View Source, that letter was missing even in the source, but not in Nav4's View Source. So this may be a parser bug. Re-assigning to RickG.
The 4th character is truly missing in the display, but it is correctly handled in the parser (a breakpoint in nsHTMLTokenizer::ConsumeAttributes proves it). I suspect a font rendering problem. Another interesting problem: viewsource doesn't display on this page (for me) because the charset system is not correctly handling the meta tag. Returning to erik for his opinion on the charset/font issue. I've attached a min. test case.
I did some checking in the font engine on Windows, and it turns out that I do see the 4th character 0xE8 the first time, but then the next time I only see 3 characters with different codes. The different codes are due to the META charset causing a re-parse with the iso-8859-2 characters converted to Unicodes. However, the loss of the 4th char is due to a different problem. However, the font engine *is* receiving all 4 of the Unicodes later on in the document (i.e. next to "test1"). This means that the font engine is working properly (since it displays all 4 chars), and the charset converter is working properly (since the final 0xE8 in iso-8859-2 becomes 0x010D in Unicode). So we have a bug, and it is not in the charset converter, and not in the font engine. It could be in the parser, or somewhere downstream between the parser and font engine (e.g. content sink, style/frame system, etc). This is just a wild guess, but the code 0x010D happens to have 0x0D in the least significant byte, which is Carriage Return. Perhaps the HTML attribute parser is looking for CR (0x0D) and LF (0x0A) to terminate the attribute value, and it is masking the most significant byte in the Unicode so that it only sees the least significant byte (i.e. 0x010D looks like 0x0D and fools the parser). Returning to RickG for his opinion on my wild guess.
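The guess above can be illustrated with a small sketch. It only shows the arithmetic behind the hypothesis (0xE8 in iso-8859-2 converts to U+010D, whose low byte equals CR); the helper names `is_line_break` and `is_line_break_masked` are hypothetical and are not taken from the parser's code.

```python
# Illustration of the "wild guess" only; not the actual parser logic.
LATIN2_BYTE = 0xE8

# Charset conversion: byte 0xE8 in iso-8859-2 is small c with caron.
ch = bytes([LATIN2_BYTE]).decode("iso-8859-2")
code_point = ord(ch)

CR, LF = 0x0D, 0x0A

def is_line_break(cp):
    """Compare the full code point against CR and LF."""
    return cp in (CR, LF)

def is_line_break_masked(cp):
    """Hypothetical buggy check: drops the high byte before comparing."""
    return (cp & 0xFF) in (CR, LF)

print(hex(code_point))                   # code point after conversion
print(is_line_break(code_point))         # full-width comparison
print(is_line_break_masked(code_point))  # comparison on the low byte only
```

A check that masks to the low byte would treat U+010D as a terminator, while a full-width comparison would not. Later comments in this bug attribute the fix to a different cause.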
By the way, View Source is working for me, even with the META charset. (Tree pulled and built today.)
Ok -- silly me. The real problem was that my tree (in san diego) had gone stale. I've corrected the problem and will land it with my next update.
Fixed by a change to nsStr, where chars were being promoted with sign extension.
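As a rough sketch of this class of bug (not the actual nsStr code, which is not shown in this report): when a byte such as 0xE8 is held in a signed 8-bit type and widened to 16 bits, the high bit is copied into the upper byte. The functions `promote_signed` and `promote_unsigned` are hypothetical names used only for illustration.

```python
import ctypes

def promote_signed(byte_value):
    """Widen a byte to 16 bits as if it were stored in a signed 8-bit char."""
    signed = ctypes.c_int8(byte_value).value   # 0xE8 is reinterpreted as -24
    return signed & 0xFFFF                     # sign bit fills the upper byte

def promote_unsigned(byte_value):
    """Widen a byte to 16 bits as if it were stored in an unsigned 8-bit char."""
    return ctypes.c_uint8(byte_value).value & 0xFFFF

print(hex(promote_signed(0xE8)))    # sign-extended result
print(hex(promote_unsigned(0xE8)))  # zero-extended result
print(hex(promote_signed(0x41)))    # bytes below 0x80 are unaffected
```

Only bytes with the high bit set (0x80 and above) are affected, which is consistent with plain ASCII text displaying correctly while a character like 0xE8 goes wrong.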
I tested this in 111708 Win32, 111709 build, and it works fine. However, in 111612 (which I downloaded from the 111708-m12 directory), the character 'E8' does not show. I need to reopen this. I will test this in the next Mac build.
Can you please verify this before reopening? Also -- the build number you cite with the problem is on the mac, I presume?
Status: NEW → RESOLVED
Closed: 20 years ago → 20 years ago
Resolution: --- → FIXED
I tested this in the 111708 Mac build. This works fine. I think the fix was not in the Mac build I tested before. I see that some other characters are not displayed on Mac. That is in bug 18095.