Closed Bug 138215 Opened 23 years ago Closed 4 years ago

Unicode control characters are printed as symbols

Categories

(Core :: Internationalization, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

()

RESOLVED FIXED

People

(Reporter: bronger, Assigned: jshin1987)

Details

(Keywords: intl)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.9) Gecko/20020313
BuildID: 2002031312

Unicode characters like "Emspace", "ThinSpace" or "PrivateUseOne" (shown in the Unicode code charts enclosed by dashed lines) are printed as their code chart symbols. The correct output would be the verbatim character, i.e. a *real* em space, or simply nothing for "PrivateUseOne". These are only examples; this report applies to all special characters.

Reproducible: Always

Steps to Reproduce:
1. Open the given URL

Actual Results: Unicode characters like "Emspace", "ThinSpace" or "PrivateUseOne" (shown in the Unicode code charts enclosed by dashed lines) are printed as their code chart symbols.

Expected Results: The correct output would be the verbatim character, i.e. a *real* em space (a broad white space), or simply nothing for "PrivateUseOne". These are only examples; this report applies to all special characters.

This doesn't happen if you use named entities in the HTML code. So &#2003; and &emsp; produce different output, which must not be the case.
To intl.
Assignee: attinasi → yokoyama
Status: UNCONFIRMED → NEW
Component: Layout → Internationalization
Ever confirmed: true
QA Contact: petersen → ruixu
Keywords: intl
QA Contact: ruixu → ylong
Over to shanjian
Assignee: yokoyama → shanjian
Those are 2 different issues. I could not reproduce the first issue on either Linux or Windows. The 2nd observed behavior is intentional. Because win1252 is so widespread, and MS sometimes misnames it as win-latin1, many web pages take it for granted and use 0x92 for a single quote. Since this code point is not used in latin1 anyway, we interpret it using win1252. Some people may disagree with this implementation, but if we don't do that, we will have tons of bugs and users will blame Mozilla.
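The difference between the two interpretations of 0x92 can be illustrated outside the browser. The following is a minimal Python sketch, not Mozilla's code: it only assumes Python's standard `latin-1` and `cp1252` codecs and `html.unescape`, which follows the HTML5 rules for numeric character references.

```python
import html

# The byte many pages use for a "single quote".
raw = bytes([0x92])

# Strict ISO-8859-1: 0x92 maps to the C1 control character U+0092.
as_latin1 = raw.decode("latin-1")

# windows-1252: 0x92 maps to U+2019 RIGHT SINGLE QUOTATION MARK.
as_cp1252 = raw.decode("cp1252")

# HTML5 applies the same windows-1252 remapping to the numeric reference.
ref = html.unescape("&#x92;")

print(f"latin-1: U+{ord(as_latin1):04X}")
print(f"cp1252:  U+{ord(as_cp1252):04X}")
print(f"&#x92; under HTML5 rules: U+{ord(ref):04X}")
```

Whether a given Gecko build applied this remapping to XHTML documents as well as HTML ones is not something this sketch can show.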
Status: NEW → ASSIGNED
If you export the following HTML excerpt

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Test Page</title>
</head>
<body>
<p>&#x2003;&#x91;&#x82;</p>
</body>
</html>

to the local file bronger.xhtml (I think the 'xhtml' is significant!) and load it into Mozilla099 (Gecko/20020313, I use the Linux version), then you get this:

EM SP PU1 BPH

(i.e. nine letters and one digit), which is wrong. In XML files, &#x...; refers to Unicode code points, and the file is a UTF-8 XML file. No Latin-1 here. (But BTW, an encoding="iso-8859-1" wouldn't change anything.) The "EMSP" must in fact be a wide white space, and Mozilla should at least ignore the other two C1 control characters; under no circumstances should it print their "names".
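As a cross-check of what the three references in the test paragraph denote, here is a small Python sketch that parses the same `<p>` element with the standard library's XML parser and reports the resulting code points. The `describe` helper and the `<unnamed>` placeholder are illustrative names, not part of the bug report.

```python
import unicodedata
import xml.etree.ElementTree as ET

# The paragraph from the test document above.
doc = "<p>&#x2003;&#x91;&#x82;</p>"

# An XML parser resolves &#x...; to Unicode code points, with no
# windows-1252 remapping of the C1 range.
text = ET.fromstring(doc).text


def describe(ch):
    """Return (code point, general category, name) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),
        # Control characters have no formal name in the Unicode database.
        unicodedata.name(ch, "<unnamed>"),
    )


rows = [describe(ch) for ch in text]
for row in rows:
    print(*row)
```

The first character comes out as a space separator (category Zs) and the other two as control characters (category Cc), which matches the expectation that a renderer shows white space for the former and nothing visible for the latter.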
I've prepared a better demonstration document at <http://tbookdtd.sourceforge.net/unitest.xhtml>. I consider the codes in the table (except for the C1 characters above 83) more or less significant skip characters that should be printed properly. (Although Unicode offers even more.)
shanjian has not been working on Mozilla for 2 years and these bugs are still here. Marking them WONTFIX. If you want to reopen it, find a good owner first.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → WONTFIX
I find this bug-closing policy a little bit odd, but most of the wrong glyphs mentioned here have been fixed without being noted here anyway. The only remaining one that's worth a new bug entry is the zwnj in my opinion.
Mass re-assigning bugs that Frank Tang closed on March 1st. Spam is his fault. Mass re-open to follow.
Assignee: shanjian → nobody
Mass bug re-open of bugs Frank Tang closed with no good reason. Spam is his fault, not my own.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Reassigning Frank's old bugs to Jungshik Shin for triage. Sorry for the spam.
Assignee: nobody → jshin1987
Status: REOPENED → NEW
QA Contact: amyy → i18n

This seems to be working now.

Status: NEW → RESOLVED
Closed: 20 years ago → 4 years ago
Resolution: --- → FIXED