Bug 199143: Some Persian and Urdu characters in Code Page 1256 are not displayed
Status: RESOLVED FIXED
Opened 21 years ago, closed 21 years ago
Categories: Core :: Internationalization, defect
People
(Reporter: andreasprilopwww, Assigned: ftang)
Attachments (2 files)
49.62 KB, text/html
13.23 KB, patch (jshin1987: review+, roc: superreview+)
User-Agent: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.0.1) Gecko/20020920 Netscape/7.0
Build Identifier: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.0.1) Gecko/20020920 Netscape/7.0

Mozilla does not display some Persian and all Urdu characters from Code Page 1256, i.e. on a page written in Windows-1256. They are displayed with the "replacement character". It seems that Mozilla assumes an obsolete, pre-1999 version of Code Page 1256.

Reproducible: Always

Steps to Reproduce:
Go to <http://www.unics.uni-hannover.de/nhtcapri/arabic.win>
Look under "Persian alphabet" and "Additional Urdu letters".

Actual Results:
Some characters are displayed using the "replacement character".

Expected Results:
Display Persian and Urdu characters.

http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1256.TXT
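For readers who want to see the expected behaviour outside a browser, here is a minimal Python sketch. It assumes Python's built-in cp1256 codec follows the CP1256.TXT mapping linked above. The sample bytes and the names decode_sample and describe are chosen for illustration only and do not come from the report or from Mozilla's code.

```python
import unicodedata

# Illustrative sample of Windows-1256 bytes that CP1256.TXT assigns to
# Persian and Urdu letters (hypothetical selection, not taken from the bug).
SAMPLE_BYTES = [0x81, 0x8D, 0x8E, 0x90, 0x8A, 0x8F, 0x9A, 0xFF]


def decode_sample(codec="cp1256"):
    """Decode each sample byte; undecodable bytes become U+FFFD."""
    result = {}
    for byte in SAMPLE_BYTES:
        result[byte] = bytes([byte]).decode(codec, errors="replace")
    return result


def describe(result):
    """Return one human-readable line per decoded byte."""
    lines = []
    for byte, char in result.items():
        name = unicodedata.name(char, "<unnamed>")
        lines.append(f"0x{byte:02X} -> U+{ord(char):04X} {name}")
    return lines


if __name__ == "__main__":
    for line in describe(decode_sample()):
        print(line)
```

A decoder built from an older table would be expected to yield U+FFFD (the replacement character) for at least some of these bytes, which is the symptom described in the report.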
Comment 1•21 years ago
You forgot to CC :-)
Comment 2•21 years ago
Interesting synchronicity. Frank and I were just looking at this problem yesterday.
Assignee: mkaply → ftang
Status: UNCONFIRMED → NEW
Component: BiDi Hebrew & Arabic → Internationalization
Ever confirmed: true
Comment 3•21 years ago
This testcase was created by a tool which builds an encoding table from the mapping file at http://www.unicode.org/Public/MAPPINGS with the addition of javascript to evaluate the results of decoding.
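The approach described here (build a table from a mapping file, then compare it with what the decoder produces) can be sketched roughly as follows. This is not the tool from this comment: it is a Python approximation that checks Python's own codecs, handles single-byte encodings only, and uses a short inline excerpt written in the style of the mapping files instead of downloading one. The names SAMPLE_MAPPING, parse_mapping and find_mismatches are hypothetical.

```python
# Abbreviated excerpt in the style of CP1256.TXT (illustrative only).
SAMPLE_MAPPING = """\
# byte  code point  name
0x81 0x067E #ARABIC LETTER PEH
0x8D 0x0686 #ARABIC LETTER TCHEH
0x8E 0x0698 #ARABIC LETTER JEH
0x90 0x06AF #ARABIC LETTER GAF
"""


def parse_mapping(text):
    """Parse 'byte code-point #name' lines into a {byte: code point} dict."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # comment, blank line, or byte with no assignment
        table[int(fields[0], 16)] = int(fields[1], 16)
    return table


def find_mismatches(table, codec):
    """Return (byte, expected code point, decoded text) for each disagreement."""
    mismatches = []
    for byte, expected in sorted(table.items()):
        decoded = bytes([byte]).decode(codec, errors="replace")
        if decoded != chr(expected):
            mismatches.append((byte, expected, decoded))
    return mismatches


if __name__ == "__main__":
    table = parse_mapping(SAMPLE_MAPPING)
    for byte, expected, decoded in find_mismatches(table, "cp1256"):
        print(f"0x{byte:02X}: expected U+{expected:04X}, got {decoded!r}")
```

An empty result means the decoder agrees with the mapping for every byte listed; each tuple returned points at one byte worth investigating.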
Comment 4•21 years ago
The tool I mentioned in comment 3 is now live at http://smontagu.damowmow.com/encodingtest.html

Codepages with errors in current Mozilla trunk:
ISO-8859-7
ISO-8859-8
ISO-8859-10
windows-1256
windows-1258
windows-950 (Big5)
Keywords: nsbeta1
Comment 5 (Reporter)•21 years ago
Please check also the Macintosh character sets at http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/

For example, MacCyrillic has
xA2 GHE with upturn, NOT cent sign
xB6 ghe with upturn, NOT partial differential
xFF euro sign, NOT currency sign
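The three MacCyrillic code points listed above can be spot-checked with a small Python sketch. It assumes that Python's mac_cyrillic codec reflects the current Apple mapping file; the names EXPECTED and check_mac_cyrillic are hypothetical, and the sketch says nothing about Mozilla's own tables.

```python
# Expected characters per the comment above (current Apple mapping).
EXPECTED = {
    0xA2: "\u0490",  # CYRILLIC CAPITAL LETTER GHE WITH UPTURN, not CENT SIGN
    0xB6: "\u0491",  # CYRILLIC SMALL LETTER GHE WITH UPTURN, not PARTIAL DIFFERENTIAL
    0xFF: "\u20ac",  # EURO SIGN, not CURRENCY SIGN
}


def check_mac_cyrillic(codec="mac_cyrillic"):
    """Map each byte to True if the codec decodes it to the expected character."""
    report = {}
    for byte, want in EXPECTED.items():
        got = bytes([byte]).decode(codec, errors="replace")
        report[byte] = (got == want)
    return report


if __name__ == "__main__":
    for byte, ok in check_mac_cyrillic().items():
        print(f"x{byte:02X}: {'ok' if ok else 'MISMATCH'}")
```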
Comment 7•21 years ago
Thanks for the reminder, Andreas. Mac encodings with errors (not including encodings that don't appear in our menus such as MacThai):
MacCroatian
MacCyrillic
MacDevanagari
MacFarsi
MacGreek
MacGujarati
MacGurmukhi
MacIcelandic
MacRoman
MacRomanian
MacTurkish
MacUkrainian
Comment 8•21 years ago
Strike MacDevanagari and MacGujarati from that list: they are affected by another issue (bug 203838). MacGurmukhi is blocked by bug 203838 too, but also has one conversion error.
Comment 9•21 years ago
Andreas: do you know any URLs which actually use Mac codepages? I have suggested in bug 203838 that we don't need to expose them in the UI.
Comment 10 (Reporter)•21 years ago
IMHO support for Macintosh character sets is pointless for the Unix and MS Windows versions because web pages are not written in Mac encodings. It is important for the Macintosh Mozilla version. Mozilla/Netscape for Mac OS 9 displays, for example, Cyrillic text incorrectly because of errors in the assumed x-Mac-Cyrillic character set (ghe with upturn, euro). This may be of no importance in Mac OS X.
Comment 11•21 years ago
Comment 12•21 years ago
Comment on attachment 127020
Update cp1256.uf and cp1256.ut

jshin, can you review this? It's generated by intl/uconv/tools/umaptable.c from http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1256.TXT
Attachment #127020 - Flags: review?(jshin)
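As a rough illustration of what generating conversion tables from a mapping file involves, the Python sketch below builds one lookup table per direction from mapping lines. It does not reproduce umaptable.c or the actual *.uf and *.ut file formats; the inline excerpt and the names build_tables, encode and decode are hypothetical.

```python
# Abbreviated excerpt in the style of CP1256.TXT (illustrative only).
MAPPING_EXCERPT = """\
0x81 0x067E #ARABIC LETTER PEH
0x8E 0x0698 #ARABIC LETTER JEH
0x90 0x06AF #ARABIC LETTER GAF
"""


def build_tables(mapping_text):
    """Return (to_unicode, from_unicode) lookup tables for a single-byte charset."""
    to_unicode = [None] * 256  # index: byte value, entry: code point or None
    from_unicode = {}          # key: code point, value: byte value
    for line in mapping_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # comment, blank line, or byte with no assignment
        byte, code_point = int(fields[0], 16), int(fields[1], 16)
        to_unicode[byte] = code_point
        from_unicode.setdefault(code_point, byte)
    return to_unicode, from_unicode


def decode(data, to_unicode):
    """Decode bytes, substituting U+FFFD for bytes missing from the table."""
    return "".join(
        chr(to_unicode[b]) if to_unicode[b] is not None else "\ufffd"
        for b in data
    )


def encode(text, from_unicode):
    """Encode text; raises KeyError for characters missing from the table."""
    return bytes(from_unicode[ord(ch)] for ch in text)


if __name__ == "__main__":
    to_u, from_u = build_tables(MAPPING_EXCERPT)
    print(encode(decode(b"\x81\x90", to_u), from_u))
```

Because both tables come from the same mapping lines, regenerating them from an updated mapping file changes both directions together without touching the conversion code, which matches the "no code changes" nature of the patch discussed in this bug.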
Comment 13•21 years ago
Comment on attachment 127020
Update cp1256.uf and cp1256.ut

r=jshin

BTW, it looks like we're reverting to NPL 1.0 because umaptable.c is not updated.
Attachment #127020 - Flags: review?(jshin) → review+
Comment 14•21 years ago
erk, you're right. I saw the 1.0/1.1 difference but I just assumed that it was going in the right direction without noticing which lines had + and which -
Comment 15•21 years ago
Now I get it: the files in the tree have had the license changed since they were generated so regenerating reverts to the original form. I'll edit out the license changes before checking in.
Comment 16•21 years ago
Comment on attachment 127020
Update cp1256.uf and cp1256.ut

roc, can you sr? Would you be ready to give blanket rs= for similar updates in future, which just generate new *.uf and *.ut files from the mapping files at www.unicode.org without any code changes?
Attachment #127020 - Flags: superreview?(roc+moz)
Comment 17 (Assignee)•21 years ago
yea. the old code is based on version 2.0 dated 4/15/98
Comment on attachment 127020
Update cp1256.uf and cp1256.ut

smontagu, that makes sense ... blanket rs granted
Attachment #127020 - Flags: superreview?(roc+moz) → superreview+
Comment 19•21 years ago
Fix checked in.
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Comment 20•21 years ago
*** Bug 228030 has been marked as a duplicate of this bug. ***
Comment 21•20 years ago
It looks like this broke "smart quote" rendering on Linux (removing the fallback chars). See bug 232026.
Comment 22•20 years ago
I am forwarding a link to an Urdu page: http://www.aiqbal.iamyourhost.com/cafe/index.php?p=167#comments Urdu, like Arabic, changes letter shapes depending on the characters that come before or after. If you view the page in IE, the characters join properly, whereas in Mozilla (Firefox or any other) a few characters don't join properly. Please note that everything on the page is in Unicode (charset=utf-8).
Comment 23•20 years ago
(In reply to comment #22)
> I am forwarding a link to an Urdu page:
> http://www.aiqbal.iamyourhost.com/cafe/index.php?p=167#comments
> Urdu, like Arabic, changes letter shapes depending on the characters that
> come before or after. If you view the page in IE, the characters join
> properly, whereas in Mozilla (Firefox or any other) a few characters don't
> join properly.
> Please note that everything on the page is in Unicode (charset=utf-8).

In addition to my post above, here is the link for BBC Urdu: http://www.bbc.co.uk/urdu/ Please check.
Comment 24•20 years ago
Nazir, please open a new bug for the shaping issue.