Closed Bug 18377 Opened 20 years ago Closed 20 years ago

Latin2 character E8 is not displayed in Input Value field


(Core :: Internationalization, defect, P3)






(Reporter: teruko, Assigned: teruko)





(1 file)

254 bytes, text/html
Only the E8 character "è" is not displayed in the INPUT Value field.

Steps to reproduce
1. Go to the above URL.

Look at the field after State, Country, Zip.
The same 4 characters that appear after test1 and test3 should be displayed there.
However, the last "è" is missing.

Tested in the 110909 Win32, Mac, and Linux builds.
Assignee: ftang → erik
Windows display problem. Assigning to erik.
Assignee: erik → teruko
Frank, Teruko said that she tested Win32, Mac and Linux. So this is not a
Windows display problem.

Teruko, if this problem appears on Win32, Mac and Linux, please change the OS
field to All.

Teruko, I had a look at the URL above, and found that the document is in little
endian Unicode, even though the META charset says iso-8859-2. Either the
document shouldn't be in Unicode, or the META charset shouldn't say iso-8859-2.

Re-assigning to Teruko so that she can fix the test page first.
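
A minimal sketch of how a mismatch like this could be spotted, assuming a simple heuristic (a UTF-16LE byte order mark, or mostly zero high bytes in the first few code units). The function name `looks_like_utf16le` and the thresholds are hypothetical, not anything from the Mozilla tree:

```python
import codecs


def looks_like_utf16le(raw):
    """Guess whether raw bytes are little-endian UTF-16 (heuristic only)."""
    # A leading FF FE byte order mark is the strongest hint.
    if raw.startswith(codecs.BOM_UTF16_LE):
        return True
    # Otherwise, mostly-ASCII markup in UTF-16LE has a zero byte in
    # every odd position (the high byte of each 16-bit code unit).
    high_bytes = raw[:64][1::2]
    return len(high_bytes) > 0 and high_bytes.count(0) / len(high_bytes) > 0.5


declared_charset = "iso-8859-2"
page = '<meta charset="iso-8859-2">'.encode("utf-16-le")
if looks_like_utf16le(page):
    print("bytes look like UTF-16LE but META declares", declared_charset)
```

A page that really is iso-8859-2 would fail this check, since its markup bytes contain no interleaved zeros.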
OS: Windows NT → All
Assignee: teruko → erik
Ok, I fixed the test cases.
Assignee: erik → rickg
The 4th letter (0xE8, which is small c with caron in iso-8859-2) is indeed
missing in the State, Country and Zip fields. When I did a View Source, that
letter was missing even in the source, but not in Nav4's View Source. So this
may be a parser bug. Re-assigning to RickG.
Attached file reduced case
Assignee: rickg → erik
The 4th character is truly missing in the display, but it is correctly handled
in the parser (a breakpoint in nsHTMLTokenizer::ConsumeAttributes proves it). I
suspect a font rendering problem.

Another interesting problem: viewsource doesn't display on this page (for me)
because the charset system is not correctly handling the meta tag. Returning to
erik for his opinion on the charset/font issue.

I've attached a min. test case.
Assignee: erik → rickg
I did some checking in the font engine on Windows, and it turns out that I do
see the 4th character 0xE8 the first time, but then the next time I only see
3 characters with different codes. The different codes are due to the META
charset causing a re-parse with the iso-8859-2 characters converted to Unicodes.
However, the loss of the 4th char is due to a different problem.

However, the font engine *is* receiving all 4 of the Unicodes later on in the
document (i.e. next to "test1"). This means that the font engine is working
properly (since it displays all 4 chars), and the charset converter is working
properly (since the final 0xE8 in iso-8859-2 becomes 0x010D in Unicode).
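
The conversion described above can be checked outside the browser. This is only an illustration using Python's built-in iso-8859-2 codec, not Mozilla's charset converter; the helper name `latin2_to_code_points` is made up for the example:

```python
def latin2_to_code_points(raw):
    """Decode iso-8859-2 bytes and return the Unicode code points."""
    return [ord(ch) for ch in raw.decode("iso8859_2")]


# 0xE8 in iso-8859-2 is small c with caron, which is U+010D in Unicode.
for code_point in latin2_to_code_points(b"\xe8"):
    print(hex(code_point))
```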

So we have a bug, and it is not in the charset converter, and not in the font
engine. It could be in the parser, or somewhere downstream between the parser
and font engine (e.g. content sink, style/frame system, etc).

This is just a wild guess, but the code 0x010D happens to have 0x0D in the least
significant byte, which is Carriage Return. Perhaps the HTML attribute parser
is looking for CR (0x0D) and LF (0x0A) to terminate the attribute value, and it
is masking the most significant byte in the Unicode so that it only sees the
least significant byte (i.e. 0x010D looks like 0x0D and fools the parser).
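
To make the guess concrete, here is a small sketch of the kind of check that would be fooled. It illustrates the hypothesis only; it is not taken from the HTML parser, and both function names are hypothetical:

```python
CR = 0x0D
LF = 0x0A


def is_terminator_masked(code_unit):
    """Buggy check: looks only at the least significant byte."""
    return (code_unit & 0xFF) in (CR, LF)


def is_terminator_full(code_unit):
    """Correct check: compares the whole 16-bit code unit."""
    return code_unit in (CR, LF)


# U+010D has 0x0D in its low byte, so the masked check mistakes it for CR.
print(is_terminator_masked(0x010D), is_terminator_full(0x010D))
```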

Returning to RickG for his opinion on my wild guess.
By the way, View Source is working for me, even with the META charset. (Tree
pulled and built today.)
Ok -- silly me. The real problem was that my tree (in san diego) had gone stale.
I've corrected the problem and will land it with my next update.
Closed: 20 years ago
Resolution: --- → FIXED
Fixed by a change to nsStr, where chars were being promoted with sign extension.
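
For readers unfamiliar with this class of bug, a rough sketch of what sign extension does to a byte such as 0xE8 when it is widened to 16 bits. This simulates signed versus unsigned promotion in Python; it is not the nsStr code, and the helper names are hypothetical:

```python
import struct


def widen_signed(byte_value):
    """Widen a byte to 16 bits as if it were a signed char."""
    (signed,) = struct.unpack("b", bytes([byte_value]))
    # Bytes >= 0x80 are negative as signed chars, so the upper byte
    # of the 16-bit result is filled with ones.
    return signed & 0xFFFF


def widen_unsigned(byte_value):
    """Widen a byte to 16 bits as an unsigned value (upper byte zero)."""
    return byte_value & 0xFF


print(hex(widen_signed(0xE8)), hex(widen_unsigned(0xE8)))
```

Bytes below 0x80 come out the same either way, which would be consistent with plain ASCII text being unaffected.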
I tested this in 111708 Win32, 111709 build. This works fine.
However, in 111612 (I downloaded in 111708-m12 directory), the character 'E8'
does not show.  I need to reopen this.  I will test this in the next Mac build.
Resolution: FIXED → ---
Assignee: rickg → teruko
Can you please verify this before reopening? Also -- the build number you cite
with the problem is on the mac, I presume?
Closed: 20 years ago
Resolution: --- → FIXED
I tested this in the 111708 Mac build.  This works fine. I think the fix was not
in the Mac build I tested before.  I see that some other characters are not
displayed on Mac.  That is in bug 18095.