Closed Bug 199143 Opened 21 years ago Closed 21 years ago

Some Persian and Urdu characters in Code Page 1256 are not displayed

Categories

(Core :: Internationalization, defect)

RESOLVED FIXED

People

(Reporter: andreasprilopwww, Assigned: ftang)

Attachments

(2 files)

User-Agent:       Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.0.1) Gecko/20020920 Netscape/7.0
Build Identifier: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.0.1) Gecko/20020920 Netscape/7.0

Mozilla does not display some Persian and all Urdu characters from
Code Page 1256, i.e. on a page written in Windows-1256.
They are displayed with the "replacement character". It seems that
Mozilla assumes an obsolete, pre-1999 version of Code Page 1256.

Reproducible: Always

Steps to Reproduce:
1. Go to <http://www.unics.uni-hannover.de/nhtcapri/arabic.win>
2. Look under "Persian alphabet" and "Additional Urdu letters".

Actual Results:
Some characters are displayed using the "replacement character".

Expected Results:  
Display Persian and Urdu characters.

http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1256.TXT
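
As an illustration of the report above, here is a minimal Python sketch that checks whether a decoder maps the four extra Persian letters the way the current CP1256.TXT does. The byte values are taken from that mapping file. The helper name `undecoded` and the use of Python's built-in `cp1256` codec as the reference decoder are assumptions made for this example; they are not part of Mozilla's converter code.

```python
# Bytes for the four extra Persian letters, per the current CP1256.TXT mapping.
PERSIAN_LETTERS = {
    0x81: "\u067E",  # ARABIC LETTER PEH
    0x8D: "\u0686",  # ARABIC LETTER TCHEH
    0x8E: "\u0698",  # ARABIC LETTER JEH
    0x90: "\u06AF",  # ARABIC LETTER GAF
}


def undecoded(expected, codec="cp1256"):
    """Return the byte values the codec does not map to the expected character.

    `undecoded` is a hypothetical helper for this sketch. Bytes the codec
    leaves undefined decode to U+FFFD (the "replacement character") and are
    therefore reported as mismatches.
    """
    bad = []
    for byte, char in sorted(expected.items()):
        got = bytes([byte]).decode(codec, errors="replace")
        if got != char:
            bad.append(byte)
    return bad


if __name__ == "__main__":
    # An up-to-date windows-1256 decoder should report no problem bytes.
    print([hex(b) for b in undecoded(PERSIAN_LETTERS)])
```

A decoder built from an older table would list the affected byte values here instead of returning an empty list.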
You forgot to CC :-)
Interesting synchronicity. Frank and I were just looking at this problem yesterday.
Assignee: mkaply → ftang
Status: UNCONFIRMED → NEW
Component: BiDi Hebrew & Arabic → Internationalization
Ever confirmed: true
This testcase was created by a tool which builds an encoding table from the
mapping file at http://www.unicode.org/Public/MAPPINGS with the addition of
javascript to evaluate the results of decoding.
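
The approach described above (build a table from a mapping file, then compare it with what a decoder actually produces) could be sketched roughly as follows in Python. This is not the tool itself, which uses JavaScript in the browser; `parse_mapping`, `mismatches`, and `SAMPLE` are hypothetical names, and the parser assumes the simple single-byte layout of the files under MAPPINGS/VENDORS/MICSFT (byte, code point, `#` comment), skipping any line it cannot read that way.

```python
# A three-line excerpt in the layout used by CP1256.TXT (tab-separated).
SAMPLE = (
    "0x41\t0x0041\t#LATIN CAPITAL LETTER A\n"
    "0x81\t0x067E\t#ARABIC LETTER PEH\n"
    "0x8D\t0x0686\t#ARABIC LETTER TCHEH\n"
)


def parse_mapping(text):
    """Parse mapping-file text into {byte value: expected character}."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop the trailing comment
        if len(fields) < 2:
            continue  # blank line, pure comment, or an undefined byte
        try:
            table[int(fields[0], 16)] = chr(int(fields[1], 16))
        except ValueError:
            continue  # a layout this sketch does not handle
    return table


def mismatches(table, decode):
    """Return the byte values where decode() disagrees with the table."""
    return [
        byte
        for byte, char in sorted(table.items())
        if decode(bytes([byte])) != char
    ]


if __name__ == "__main__":
    table = parse_mapping(SAMPLE)
    decode = lambda raw: raw.decode("cp1256", errors="replace")
    print(mismatches(table, decode))
```

Running the same comparison over every mapping file would yield a per-encoding list of disagreeing bytes, which is the kind of result summarized in the next comment.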
The tool I mentioned in comment 3 is now live at
http://smontagu.damowmow.com/encodingtest.html

Codepages with errors in current Mozilla trunk:
ISO-8859-7
ISO-8859-8
ISO-8859-10
windows-1256
windows-1258
windows-950 (Big5)
Keywords: nsbeta1
Please check also the Macintosh character sets at
http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/

For example, MacCyrillic has
xA2   GHE with upturn   NOT   cent sign
xB6   ghe with upturn   NOT   partial differential
xFF   euro sign         NOT   currency sign
adt: nsbeta1-
Keywords: nsbeta1 → nsbeta1-
Thanks for the reminder, Andreas.

Mac encodings with errors (not including encodings that don't appear in our
menus such as MacThai):

MacCroatian
MacCyrillic
MacDevanagari
MacFarsi
MacGreek
MacGujarati
MacGurmukhi
MacIcelandic
MacRoman
MacRomanian
MacTurkish
MacUkrainian
Depends on: 203838
Strike MacDevanagari and MacGujarati from that list: they are affected by
another issue (bug 203838). MacGurmukhi is blocked by bug 203838 too, but also
has one conversion error.
Andreas: do you know any URLs which actually use Mac codepages? I have suggested
in bug 203838 that we don't need to expose them in the UI.
IMHO support for Macintosh character sets is pointless for the Unix and
MS Windows versions because web pages are not written in Mac encodings.

It is important for the Macintosh Mozilla version.

Mozilla/Netscape for Mac OS 9 displays, for example, Cyrillic text incorrectly
because of errors in the assumed x-Mac-Cyrillic character set (ghe with upturn,
euro). This may be of no importance in Mac OS X.
Comment on attachment 127020 [details] [diff] [review]
Update cp1256.uf and cp1256.ut

jshin, can you review this? It's generated by intl/uconv/tools/umaptable.c from
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1256.TXT
Attachment #127020 - Flags: review?(jshin)
Comment on attachment 127020 [details] [diff] [review]
Update cp1256.uf and cp1256.ut

r=jshin

BTW, it looks like we're reverting to NPL 1.0 because umaptable.c is not
updated.
Attachment #127020 - Flags: review?(jshin) → review+
erk, you're right. I saw the 1.0/1.1 difference but I just assumed that it was
going in the right direction without noticing which lines had + and which -
Now I get it: the files in the tree have had the license changed since they were
generated so regenerating reverts to the original form. I'll edit out the
license changes before checking in.
Comment on attachment 127020 [details] [diff] [review]
Update cp1256.uf and cp1256.ut

roc, can you sr? Would you be ready to give blanket rs= for similar updates in
future, which just generate new *.uf and *.ut files from the mapping files at
www.unicode.org without any code changes?
Attachment #127020 - Flags: superreview?(roc+moz)
yea. the old code is based on version 2.0 dated 4/15/98
Comment on attachment 127020 [details] [diff] [review]
Update cp1256.uf and cp1256.ut

smontagu, that makes sense ... blanket rs granted
Attachment #127020 - Flags: superreview?(roc+moz) → superreview+
Fix checked in.
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
*** Bug 228030 has been marked as a duplicate of this bug. ***
It looks like this broke "smart quote" rendering on Linux (removing the fallback
chars).  See bug 232026.
I am forwarding a link to an Urdu page:
http://www.aiqbal.iamyourhost.com/cafe/index.php?p=167#comments
Urdu, like Arabic, changes letter shapes depending on the characters
coming before or after. If you view the page in IE, the characters join
properly, whereas in Mozilla (Firefox or any other) a few characters don't join properly.
Please note that everything on the page is in Unicode (charset=utf-8).
(In reply to comment #22)
> I am forwarding a link to an Urdu page:
> http://www.aiqbal.iamyourhost.com/cafe/index.php?p=167#comments
> Urdu, like Arabic, changes letter shapes depending on the characters
> coming before or after. If you view the page in IE, the characters join
> properly, whereas in Mozilla (Firefox or any other) a few characters don't join properly.
> Please note that everything on the page is in Unicode (charset=utf-8).

In addition to my post above, here is the link for BBC Urdu:
http://www.bbc.co.uk/urdu/
Please check.
Nazir, please open a new bug for the shaping issue.