Closed Bug 138215 Opened 22 years ago Closed 3 years ago

Unicode control characters are printed as symbols

Categories

(Core :: Internationalization, defect)

Type: defect
Priority: Not set
Severity: normal

RESOLVED FIXED

People

(Reporter: bronger, Assigned: jshin1987)

Details

(Keywords: intl)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.9) Gecko/20020313
BuildID:    2002031312

Unicode characters like "Emspace", "ThinSpace" or "PrivateUseOne" (shown in the
Unicode code charts enclosed by dashed lines) are printed as their code chart
symbols.  The correct behavior would be verbatim output, i.e. a *real* em-space,
or simply nothing for "PrivateUseOne".  These are only examples; this report
applies to all special characters.

Reproducible: Always
Steps to Reproduce:
1. Open the given URL

Actual Results:  Unicode characters like "Emspace" "ThinSpace" or
"PrivateUseOne" (in the Unicode code charts enclosed by dashed lines) are
printed as their code chart symbols.

Expected Results:  The correct behavior would be verbatim output, i.e. a *real*
em-space (broad white space), or simply nothing for "PrivateUseOne".  These are
only examples; this report applies to all special characters.

This doesn't happen if you use named entities in the HTML code.  So, &#2003; and
&emsp; produce different output, which must not happen.
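For comparison, a named and a numeric character reference to the same character are expected to decode to the same code point. The following is a minimal sketch using Python's standard `html.unescape`, not Mozilla's parser; the variable names are illustrative only.

```python
import html

# Named reference and hexadecimal numeric reference for EM SPACE (U+2003).
named = html.unescape("&emsp;")
numeric_hex = html.unescape("&#x2003;")

# Decimal 2003 is a different code point (U+07D3), so the "x" matters.
numeric_dec = html.unescape("&#2003;")

for label, ch in (("named", named), ("hex", numeric_hex), ("decimal", numeric_dec)):
    print(label, f"U+{ord(ch):04X}")
```

The named and hexadecimal forms both yield U+2003, while the decimal form `&#2003;` is an unrelated character.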
To intl.
Assignee: attinasi → yokoyama
Status: UNCONFIRMED → NEW
Component: Layout → Internationalization
Ever confirmed: true
QA Contact: petersen → ruixu
Keywords: intl
QA Contact: ruixu → ylong
Over to shanjian
Assignee: yokoyama → shanjian
Those are two different issues.
For the first issue, I could not reproduce it on either Linux or Windows.
The second observed behavior is intentional. Because win1252 is so widespread,
and MS sometimes misnames it as win-latin1, many web pages take it for granted and
use 0x92 for a single quote. Since this code point is not used in Latin-1 anyway,
we interpret it using win1252. Some people may disagree with this implementation, but
if we didn't do it, we would have tons of bugs and users would blame Mozilla.
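The Windows-1252 fallback described here can be illustrated outside the browser. This is a minimal sketch using Python's standard library, not Mozilla's code: `html.unescape` follows the HTML5 rule that remaps numeric references in the 0x80 to 0x9F range through Windows-1252, which resembles the behavior the comment describes.

```python
import html

# Byte 0x92 is a C1 control in ISO-8859-1 but a right single quote in Windows-1252.
as_latin1 = bytes([0x92]).decode("latin-1")   # U+0092
as_cp1252 = bytes([0x92]).decode("cp1252")    # U+2019

# HTML-style decoding of the numeric reference applies the same remapping.
from_html = html.unescape("&#x92;")

for label, ch in (("latin-1", as_latin1), ("cp1252", as_cp1252), ("html", from_html)):
    print(label, f"U+{ord(ch):04X}")
```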
Status: NEW → ASSIGNED
If you save the following XHTML excerpt

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
   <title>Test Page</title>
</head>
<body>
<p>&#x2003;&#x91;&#x82;</p>
</body>
</html>

to the local file bronger.xhtml (I think the 'xhtml' is significant!)
and load it into Mozilla099 (Gecko/20020313, I use the Linux version),
then you get this:

EM
SP PU1 BPH

(i.e. nine letters and one digit), which is wrong.  In XML files, &#x...;
refers to Unicode code points, and the file is a UTF-8 XML file.  No
Latin-1 here.  (BTW, an encoding = "iso-8859-1" wouldn't
change anything.)  The "EMSP" must in fact be a wide white
space, and Mozilla should at least ignore the other two C1 control
characters; under no circumstances should it print
their "names".
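A small sketch of what an XML parser is expected to deliver for the paragraph in the test page, using Python's standard `xml.etree.ElementTree` and `unicodedata`. It assumes that only the `<p>` element matters and is an illustration, not a test of Mozilla itself.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Only the paragraph from the test page; in XML, numeric references
# denote Unicode code points regardless of the document encoding.
text = ET.fromstring("<p>&#x2003;&#x91;&#x82;</p>").text

for ch in text:
    # Control characters have no formal name, hence the fallback string.
    name = unicodedata.name(ch, "<control>")
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), name)

code_points = [ord(ch) for ch in text]
categories = [unicodedata.category(ch) for ch in text]
```

U+2003 is a space separator (category Zs), while U+0091 and U+0082 are C1 controls (category Cc) with no visible glyph of their own.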
I've prepared a better demonstration document at
<http://tbookdtd.sourceforge.net/unitest.xhtml>.  I consider the codes in the
table (except for the C1 characters above 83) to be more or less significant skip
characters that should be printed properly.  (Although Unicode offers even more.)
shanjian has not been working on Mozilla for 2 years and these bugs are still
here. Marking them WONTFIX. If you want to reopen one, find a good owner first.
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → WONTFIX
I find this bug-closing policy a little bit odd, but most of the wrong glyphs
mentioned here have been fixed without being noted here anyway.  The only
remaining one that's worth a new bug entry is the zwnj in my opinion.
Mass re-assigning bugs that Frank Tang closed on March 1st. Spam is his fault.

Mass Re-Open to follow
Assignee: shanjian → nobody
Mass bug re-open of bugs Frank Tang closed with no good reason. Spam is his
fault, not my own.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Reassigning Frank's old bugs to Jungshik Shin for triage. Sorry for the spam.
Assignee: nobody → jshin1987
Status: REOPENED → NEW
QA Contact: amyy → i18n

this seems to be working now

Status: NEW → RESOLVED
Closed: 19 years ago → 3 years ago
Resolution: --- → FIXED