Closed Bug 454 Opened 26 years ago Closed 24 years ago

Unix: 0x80-0x9F in cp1252 do not display correctly

Categories

(Core :: Internationalization, defect, P2)

x86
Linux
defect

VERIFIED FIXED

People

(Reporter: tim, Assigned: erik)

References

Details

(Keywords: platform-parity)

Attachments

(1 file)

Created by Tim Eliseo (tim@quiknet.com) on Friday, June 19, 1998 8:48:18 PM PDT
Additional Details :
Many Web pages use quote characters in the range 0x91-0x94
which are Microsoft codepage 1252 extensions. For X these
are currently mapped to the ? character rather than normal
quote characters. A patch follows to fix this. Note that
sequences such as ‘ are currently mapped properly; this
problem only shows up when the actual characters are in the
file.

--- mozilla/lib/libi18n/sbconvtb.c	Sat May  9 03:57:48 1998
+++ mozilla/lib/libi18n/sbconvtb.c.new	Fri Jun 19 20:28:19 1998
@@ -71,7 +71,7 @@
 /* Tables for Win CP1252 -> ISO 8859-1          */
 PRIVATE unsigned char cp1252_to_iso8859_1[] = {
 /*8x*/  '?', '?', ',', 'f', '?', '?', '?', '?', '^', '?', 'S', '<', '?', '?', '?', '?',
-/*9x*/  '?', '?', '?', '?', '?', '*', '-', '-', '~', '?', 's', '>', '?', '?', '?', 'Y',
+/*9x*/  '?', '`', '\'', '"', '"', '*', '-', '-', '~', '?', 's', '>', '?', '?', '?', 'Y',
 /*Ax*/  0xA0,0xA1,0xA2,0xA3,0xA4,0xA5,0xA6,0xA7,0xA8,0xA9,0xAA,0xAB,0xAC,0xAD,0xAE,0xAF,
 /*Bx*/  0xB0,0xB1,0xB2,0xB3,0xB4,0xB5,0xB6,0xB7,0xB8,0xB9,0xBA,0xBB,0xBC,0xBD,0xBE,0xBF,
 /*Cx*/  0xC0,0xC1,0xC2,0xC3,0xC4,0xC5,0xC6,0xC7,0xC8,0xC9,0xCA,0xCB,0xCC,0xCD,0xCE,0xCF,


For those of you like myself annoyed by this bug in the
commercial Netscape version, here's a quick fix:

adb -w netscape
cp1252_to_iso8859_1+0x11?W 0x22222760
^d

This is correct for little-endian architectures.
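
The endianness caveat can be checked with a small sketch. It assumes that adb's W command writes a 4-byte word and that offset 0x11 in the table is the entry for 0x91; the helper name below is hypothetical and not part of the Netscape source. It spells out the bytes a little-endian machine stores for 0x22222760:

```c
#include <stdint.h>

/* Hypothetical helper: split a 32-bit word into the byte order in which a
 * little-endian machine stores it (least significant byte first). */
static void word_to_le_bytes(uint32_t word, unsigned char out[4])
{
    for (int i = 0; i < 4; i++) {
        out[i] = (unsigned char)((word >> (8 * i)) & 0xFF);
    }
}
```

For 0x22222760 the stored bytes are 0x60, 0x27, 0x22, 0x22, that is, the grave accent, apostrophe, and two double quotes that the patch puts at 0x91-0x94. On a big-endian machine the same bytes would presumably need the word 0x60272222 instead.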
Assignee: bobj → ftang
Status: NEW → ASSIGNED
reassigned this to erik.
The patch does not work, since it would break JavaScript string literals by
forcing them to terminate earlier than they should. We have to move the fallback
to the XFE, but we need to keep those character values.
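
A minimal sketch of the failure mode, assuming the conversion table is applied to the page before any script is parsed. Both function names are hypothetical, and the scanner ignores backslash escapes for brevity:

```c
#include <stddef.h>

/* Hypothetical scanner: given text that starts just after an opening quote,
 * return the number of bytes before the closing quote (or end of text). */
static size_t literal_length(const unsigned char *s, unsigned char quote)
{
    size_t n = 0;
    while (s[n] != '\0' && s[n] != quote) {
        n++;
    }
    return n;
}

/* Apply only the 0x92 entry of the proposed table change, in place:
 * the cp1252 right single quote becomes an ASCII apostrophe. */
static void map_0x92_to_apostrophe(unsigned char *s)
{
    for (; *s != '\0'; s++) {
        if (*s == 0x92) {
            *s = '\'';
        }
    }
}
```

With the bytes d, o, n, 0x92, t followed by a closing apostrophe, the literal is five bytes long before the mapping and only three bytes long after it, because the converted 0x92 now looks like the closing quote.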
Summary: codepage 1252 quote characters not mapped properly → 0x80-0x9F in cp1252 does not display correctly on Mac and UNIX
We won't take the same approach, but we need to put code in the X rendering
engine to render the Unicode code points that correspond to 0x80-0x9F in
cp1252. Changing the summary to "0x80-0x9F in cp1252 does not display correctly
on Mac and UNIX".
QA Contact: 3851
Mac and Windows are now working in apprunner and viewer. I don't think UNIX
is working. IQA, could you verify? We need to fix GTK GFX ...
I18n component in Bugzilla being retired.  Moving these bugs to
Internationalization component.
OS: other → Linux
Summary: 0x80-0x9F in cp1252 does not display correctly on Mac and UNIX → 0x80-0x9F in cp1252 does not display correctly on UNIX
Whiteboard: Mac is fixed. Unix is not.
Changed summary from "0x80-0x9F in cp1252 does not display correctly on Mac and
UNIX" to "0x80-0x9F in cp1252 does not display correctly on UNIX".
I believe Mac is now working. IQA, please verify Mac. If Mac is not working,
please open a separate bug. One bug for two platforms is difficult to track.
Thanks.
Assignee: ftang → erik
Status: ASSIGNED → NEW
Target Milestone: M5
Reassigning the UNIX rendering bug to erik and marking the target fix as M5.
Status: NEW → ASSIGNED
Summary: 0x80-0x9F in cp1252 does not display correctly on UNIX → UNIX GFX Unicode Text Drawing- 0x80-0x9F in cp1252 does not display correctly on UNIX
Summary: UNIX GFX Unicode Text Drawing- 0x80-0x9F in cp1252 does not display correctly on UNIX → [PP]UNIX GFX Unicode Text Drawing- 0x80-0x9F in cp1252 does not display correctly on UNIX
Summary: [PP]UNIX GFX Unicode Text Drawing- 0x80-0x9F in cp1252 does not display correctly on UNIX → [PP] Unix: 0x80-0x9F in cp1252 do not display correctly
Target Milestone: M5 → M6
Target Milestone: M6 → M7
Target Milestone: M7 → M10
*** Bug 7880 has been marked as a duplicate of this bug. ***
Target Milestone: M10 → M12
Target Milestone: M12 → M15
*** Bug 5383 has been marked as a duplicate of this bug. ***
Blocks: 16507
Added bug 16507 as dependent on this.
Has this been tested on a font server configured to serve fonts as windows-1252
or in utf-16 or something where they are accessible through the proper unicode
codepoints?

Probably as many of these characters as possible should be displayed using
things like ' and " and -- if the correct glyphs aren't available rather than
displaying a character-not-displayed character.
Agreed -- displaying nothing at all (the current behavior) is even worse than
the 4.x behavior of showing the entity, since it means you don't see that you're
missing characters.  Showing something vaguely close to the right character
would be a lot better than showing nothing.
Keywords: pp
Moving all of my M15s to M16. Please add comments if you disagree.
Target Milestone: M15 → M16
Summary: [PP] Unix: 0x80-0x9F in cp1252 do not display correctly → Unix: 0x80-0x9F in cp1252 do not display correctly
Subject: Quotes problem
Date: Fri, 11 Feb 2000 12:52:38 +0000
From: Markus Kuhn <Markus.Kuhn@cl.cam.ac.uk>

Erik,

Could you check out, whether the following small character conversion
table fix could be made on the Netscape Web browser:

As you can see on the test page

  http://www.cl.cam.ac.uk/~mgk25/ucs/CP1252.html

My Netscape Navigator 4.6 for Linux maps &#8216; (LEFT SINGLE QUOTATION
MARK &#x2018;) to 0x60 (GRAVE ACCENT). While this does look good in the
current X11 Adobe fonts which follow the old Adobe standard encoding for
ASCII and have on 0x27 "quoteright" and on 0x60 "quoteleft", the new X11
fonts will follow the modern Adobe Unicode mapping
<http://partners.adobe.com/asn/developer/typeforum/unicodegn.html>
and have accordingly instead on 0x27 "quotesingle" and on 0x60 "grave"
(because Unicode fonts have on U+2018 "quoteleft" and on U+2019
"quoteright".)

In other words: The ASCII text 'quote' will look acceptable with both
old and new fonts but `quote' will look slightly ugly with the new fonts
(this has been the case for a long time on MS-Windows already). The
advantage of the new fonts is that you will now find the proper
directional quotation marks on 0x2018 and 0x2019 such that you can show
all forms of the quotation marks accurately.

Therefore my urgent suggestion: Whenever you do a Unicode -> Latin-1
mapping, then please map both U+2018 and U+2019 to 0x27 and do NOT map
U+2018 to 0x60.

For details and background information on this issue, please read

  http://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html

Sorry if you have fixed all this already long ago in Mozilla.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>
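
The mapping suggested in the mail above can be sketched as follows. This is only an illustration of the rule "both U+2018 and U+2019 go to 0x27"; the function name is hypothetical and it does not reflect the actual Netscape or Mozilla conversion code:

```c
/* Hypothetical Unicode -> Latin-1 fallback for the two single quotation
 * marks. Returns 0 when this sketch has no mapping for the code point. */
static unsigned char single_quote_fallback(unsigned int ucs)
{
    switch (ucs) {
    case 0x2018: /* LEFT SINGLE QUOTATION MARK */
    case 0x2019: /* RIGHT SINGLE QUOTATION MARK */
        return 0x27; /* APOSTROPHE, never 0x60 GRAVE ACCENT */
    default:
        return 0;
    }
}
```

Mapping both marks to 0x27 gives 'quote' rather than `quote', which is the form the mail describes as acceptable with both old and new fonts.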
&#8220; and &#8221; are lost completely.

They are generated by SGML-tools for the DocBook tag "quote".
We should address bug 31252 first, to get some basic fallback in place, and then
address this bug, so I'm targeting M17.
Target Milestone: M16 → M17
*** Bug 16872 has been marked as a duplicate of this bug. ***
*** Bug 24924 has been marked as a duplicate of this bug. ***
Blocks: 17962
I am working on this right now.
Severity: trivial → normal
Priority: P3 → P2
Target Milestone: M17 → M15
It's done. I would like a code review. Anybody?
erik, if you attach the patch, I *may* take a look at it (no promises at all).
Hi Pav, I'm about to attach the diffs to get "smart quotes", trademark,
ellipsis, and all those other windows-1252 characters to display on ordinary
Unix systems via fallbacks. If you're OK with these, I'd like to check in.
Roger, I have written the code to do fallbacks for windows-1252 characters
(e.g. ellipsis -> ...) and '?' for others on Unix. The fix is attached to this
bug. It is quite similar to the code you wrote recently for Windows (thanks).
Would you be willing to review it for me so that I can check in?
OK, I have read the diff. The following
+       nsFontGTK* font = FindFont('a');
should be based on the actual REPLACEMENT_CHAR, i.e., FindFont('?').
This way, if someone has, e.g., font-family: Symbol, the search will
still return straight away because Symbol has '?'. Other than that,
the patch looks fine.
I decided to use 'a' instead of '?' as the argument to FindFont because the
replacements are strings such as "EUR" (for euro), "OE" (for OE ligature),
"..." (for ellipsis), and so on. So we need to pass something that is likely
to return a font that has all of those characters.

On Unix, there are several fonts that do not even contain 'a'. For example, all
the East Asian fonts (Japanese, Chinese, Korean). Also, Symbol does not contain
all of the upper-case and lower-case letters A-Z and a-z.

Ideally, nsFontGTKSubstitute would actually do some font switching of its own
in GetWidth and DrawString, but since all of the current replacement chars
(e.g. "EUR") are from ASCII, and since all fonts that contain 'a' also contain
the rest of the ASCII characters, I think FindFont('a') is the best first step
we can take in this development. Maybe we'll do actual font switching later.
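
As a rough illustration of the fallback idea described above (not the checked-in nsFontGTKSubstitute code), a lookup table could map a Unicode code point to an ASCII replacement string and use '?' for everything else. The struct, table, and function names are hypothetical, and only the three replacements named in this comment are listed:

```c
#include <stddef.h>

/* Hypothetical fallback entry: a code point and its ASCII replacement. */
struct fallback {
    unsigned int ucs;
    const char *ascii;
};

static const struct fallback fallbacks[] = {
    { 0x20AC, "EUR" }, /* EURO SIGN (0x80 in windows-1252) */
    { 0x0152, "OE"  }, /* LATIN CAPITAL LIGATURE OE (0x8C) */
    { 0x2026, "..." }, /* HORIZONTAL ELLIPSIS (0x85) */
};

/* Return the replacement string, or "?" when the table has no entry. */
static const char *ascii_fallback(unsigned int ucs)
{
    for (size_t i = 0; i < sizeof fallbacks / sizeof fallbacks[0]; i++) {
        if (fallbacks[i].ucs == ucs) {
            return fallbacks[i].ascii;
        }
    }
    return "?";
}
```

Because every replacement is plain ASCII, a font that can draw 'a' is assumed to be able to draw all of them, which is the reasoning behind passing 'a' to FindFont.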

Thanks for the review, Roger! Checked in; marking FIXED.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
I verified this in 2000041307 Mac and 2000041310 Linux build.
Status: RESOLVED → VERIFIED
Problem still exists on OS/2
See http://www.heise.de/newsticker/data/jk-10.08.00-004/
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Setting platform to OS/2 and clearing status whiteboard.
OS: Linux → OS/2
Hardware: Other → PC
Whiteboard: Mac is fixed. Unix is not.
Target Milestone: M15 → ---
Daniel, please create a separate bug for OS/2. This bug is specifically for
Unix. Marking FIXED again.
Status: REOPENED → RESOLVED
Closed: 24 years ago
OS: OS/2 → Linux
Resolution: --- → FIXED
Verifying, based on Teruko's comment.
Status: RESOLVED → VERIFIED