Bug 199143: Some Persian and Urdu characters in Code Page 1256 are not displayed
Status: RESOLVED FIXED (opened 22 years ago, closed 22 years ago)
Component: Core :: Internationalization (defect)
Reporter: andreasprilopwww
Assignee: ftang
Attachments (2 files):
  text/html, 49.62 KB
  patch, 13.23 KB (jshin1987: review+, roc: superreview+)
User-Agent: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.0.1) Gecko/20020920 Netscape/7.0
Build Identifier: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.0.1) Gecko/20020920 Netscape/7.0
Mozilla does not display some Persian and all Urdu characters from
Code Page 1256, i.e. on a page written in Windows-1256.
They are displayed with the "replacement character". It seems that
Mozilla assumes an obsolete, pre-1999 version of Code Page 1256.
Reproducible: Always
Steps to Reproduce:
Go to <http://www.unics.uni-hannover.de/nhtcapri/arabic.win>
Look under "Persian alphabet" and "Additional Urdu letters".
Actual Results:
Some characters are displayed using the "replacement character".
Expected Results:
Display Persian and Urdu characters.
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1256.TXT
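The expected result can be checked byte by byte. This is a minimal sketch that assumes Python 3 and its built-in `cp1256` codec, which is generated from the Microsoft CP1256.TXT file linked above. It decodes single bytes and compares them with the Persian and Urdu letters the current mapping assigns to them. The twelve entries are a subset chosen for illustration, and `EXPECTED`, `find_mismatches` and `python_decode` are hypothetical names.

```python
# Subset of Persian/Urdu letters assigned by the current CP1256.TXT
# (illustrative selection, not the complete table).
EXPECTED = {
    0x81: "\u067E",  # ARABIC LETTER PEH
    0x8A: "\u0679",  # ARABIC LETTER TTEH
    0x8D: "\u0686",  # ARABIC LETTER TCHEH
    0x8E: "\u0698",  # ARABIC LETTER JEH
    0x8F: "\u0688",  # ARABIC LETTER DDAL
    0x90: "\u06AF",  # ARABIC LETTER GAF
    0x98: "\u06A9",  # ARABIC LETTER KEHEH
    0x9A: "\u0691",  # ARABIC LETTER RREH
    0x9F: "\u06BA",  # ARABIC LETTER NOON GHUNNA
    0xAA: "\u06BE",  # ARABIC LETTER HEH DOACHASHMEE
    0xC0: "\u06C1",  # ARABIC LETTER HEH GOAL
    0xFF: "\u06D2",  # ARABIC LETTER YEH BARREE
}


def find_mismatches(decode):
    """Return (byte, expected, got) for every byte the decoder gets wrong."""
    bad = []
    for byte, want in sorted(EXPECTED.items()):
        got = decode(bytes([byte]))
        if got != want:
            bad.append((byte, want, got))
    return bad


def python_decode(data):
    # Unmapped bytes become U+FFFD, the "replacement character".
    return data.decode("cp1256", errors="replace")


if __name__ == "__main__":
    problems = find_mismatches(python_decode)
    for byte, want, got in problems:
        print(f"0x{byte:02X}: expected U+{ord(want):04X}, got {got!r}")
    print("mismatches:", len(problems))
```

Per the report, affected bytes came out as the replacement character; a decoder behaving that way would show up in the list returned by `find_mismatches`.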
Comment 1•22 years ago
You forgot to CC :-)
Comment 2•22 years ago
Interesting synchronicity. Frank and I were just looking at this problem yesterday.
Assignee: mkaply → ftang
Status: UNCONFIRMED → NEW
Component: BiDi Hebrew & Arabic → Internationalization
Ever confirmed: true
Comment 3•22 years ago
This testcase was created by a tool which builds an encoding table from the
mapping file at http://www.unicode.org/Public/MAPPINGS with the addition of
javascript to evaluate the results of decoding.
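The tool's own source is not shown in this bug. A minimal Python analogue might look like the following, assuming the plain-text layout of the unicode.org vendor files (hex byte value, whitespace, hex code point, `#` comment). `parse_mapping`, `check_decoder` and `SAMPLE` are hypothetical names, and `SAMPLE` holds only a few entries in the style of CP1256.TXT, not the real file.

```python
def parse_mapping(text):
    """Parse a unicode.org vendor mapping file into {byte_value: code_point}."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # byte listed without a Unicode value, i.e. undefined
        table[int(fields[0], 16)] = int(fields[1], 16)
    return table


def check_decoder(table, decode):
    """Return (byte_value, expected, got) for every entry the decoder gets wrong."""
    failures = []
    for value, code_point in sorted(table.items()):
        # Big-endian byte string, so double-byte keys such as 0xA140 also work.
        length = max(1, (value.bit_length() + 7) // 8)
        got = decode(value.to_bytes(length, "big"))
        if got != chr(code_point):
            failures.append((value, chr(code_point), got))
    return failures


SAMPLE = """\
# A few entries in the style of CP1256.TXT
0x41\t0x0041\t#LATIN CAPITAL LETTER A
0x81\t0x067E\t#ARABIC LETTER PEH
0x8A\t0x0679\t#ARABIC LETTER TTEH
0xFF\t0x06D2\t#ARABIC LETTER YEH BARREE
"""

if __name__ == "__main__":
    table = parse_mapping(SAMPLE)
    cp1256 = lambda b: b.decode("cp1256", errors="replace")
    latin1 = lambda b: b.decode("latin-1")
    print("cp1256 failures:", check_decoder(table, cp1256))
    print("latin-1 failures:", len(check_decoder(table, latin1)))
```

Feeding a full mapping file into `parse_mapping` and checking a decoder the same way would flag every byte that decodes to something other than the listed code point, including U+FFFD.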
Comment 4•22 years ago
The tool I mentioned in comment 3 is now live at
http://smontagu.damowmow.com/encodingtest.html
Codepages with errors in current Mozilla trunk:
ISO-8859-7
ISO-8859-8
ISO-8859-10
windows-1256
windows-1258
windows-950 (Big5)
Keywords: nsbeta1
Comment 5•22 years ago (Reporter)
Please check also the Macintosh character sets at
http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/
For example, MacCyrillic has
0xA2 is capital GHE WITH UPTURN (U+0490), NOT CENT SIGN
0xB6 is small ghe with upturn (U+0491), NOT PARTIAL DIFFERENTIAL
0xFF is EURO SIGN (U+20AC), NOT CURRENCY SIGN
Comment 7•22 years ago
Thanks for the reminder, Andreas.
Mac encodings with errors (not including encodings that don't appear in our
menus such as MacThai):
MacCroatian
MacCyrillic
MacDevanagari
MacFarsi
MacGreek
MacGujarati
MacGurmukhi
MacIcelandic
MacRoman
MacRomanian
MacTurkish
MacUkrainian
Comment 8•22 years ago
Strike MacDevanagari and MacGujarati from that list: they are affected by
another issue (bug 203838). MacGurmukhi is blocked by bug 203838 too, but also
has one conversion error.
Comment 9•22 years ago
Andreas: do you know any URLs which actually use Mac codepages? I have suggested
in bug 203838 that we don't need to expose them in the UI.
Comment 10•22 years ago (Reporter)
IMHO support for Macintosh character sets is pointless for the Unix and
MS Windows versions because web pages are not written in Mac encodings.
It is important for the Macintosh Mozilla version.
Mozilla/Netscape for Mac OS 9 displays, for example, Cyrillic text incorrectly
because of errors in the assumed x-Mac-Cyrillic character set (ghe with upturn,
euro). This may be of no importance in Mac OS X.
Comment 11•22 years ago
Comment 12•22 years ago
Comment on attachment 127020 [details] [diff] [review]
Update cp1256.uf and cp1256.ut
jshin, can you review this? It's generated by intl/uconv/tools/umaptable.c from
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1256.TXT
Attachment #127020 - Flags: review?(jshin)
Comment 13•22 years ago
Comment on attachment 127020 [details] [diff] [review]
Update cp1256.uf and cp1256.ut
r=jshin
BTW, it looks like we're reverting to NPL 1.0 because umaptable.c has not been
updated.
Attachment #127020 - Flags: review?(jshin) → review+
Comment 14•22 years ago
Erk, you're right. I saw the 1.0/1.1 difference but just assumed it was going
in the right direction, without noticing which lines had + and which had -.
Comment 15•22 years ago
Now I get it: the files in the tree have had the license changed since they were
generated so regenerating reverts to the original form. I'll edit out the
license changes before checking in.
Comment 16•22 years ago
Comment on attachment 127020 [details] [diff] [review]
Update cp1256.uf and cp1256.ut
roc, can you sr? Would you be ready to give blanket rs= for similar updates in
future, which just generate new *.uf and *.ut files from the mapping files at
www.unicode.org without any code changes?
Attachment #127020 - Flags: superreview?(roc+moz)
Comment 17•22 years ago (Assignee)
Yeah, the old code is based on version 2.0, dated 4/15/98.
Comment 18•22 years ago
Comment on attachment 127020 [details] [diff] [review]
Update cp1256.uf and cp1256.ut
smontagu, that makes sense... blanket rs granted.
Attachment #127020 - Flags: superreview?(roc+moz) → superreview+
Comment 19•22 years ago
Fix checked in.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Comment 20•22 years ago
*** Bug 228030 has been marked as a duplicate of this bug. ***
Comment 21•21 years ago
It looks like this broke "smart quote" rendering on Linux (removing the fallback
chars). See bug 232026.
Comment 22•21 years ago
Here is a link to an Urdu page:
http://www.aiqbal.iamyourhost.com/cafe/index.php?p=167#comments
Like Arabic, Urdu letters change shape depending on the characters that come
before or after them. If you view the page in IE, the characters join properly,
whereas in Mozilla (Firefox or any other build) a few characters don't join properly.
Please note that everything on the page is in Unicode (charset=utf-8).
Comment 23•21 years ago
(In reply to comment #22)
In addition to my post above, here is the link for BBC Urdu:
http://www.bbc.co.uk/urdu/
Please check it.
Comment 24•21 years ago
Nazir, please open a new bug for the shaping issue.