Closed
Bug 210629
Opened 22 years ago
Closed 21 years ago
add encoder/decoder for (x-)IBM-1046 used for Arabic on AIX
Categories
(Core :: Layout: Text and Fonts, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: ahtaha, Assigned: pkwarren)
Details
(Keywords: intl)
Attachments
(1 file, 6 obsolete files)
40.41 KB,
patch
|
jshin1987
:
review+
roc
:
superreview+
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.0.3705; .NET CLR 1.1.4322)
Build Identifier: Mozilla/5.0 (X11; U; AIX 0044F61A4C00; en-US; rv:1.3) Gecko/20030624
AIX mozilla displays Arabic letters as question marks when browsing any Arabic
page, because Mozilla doesn't support IBM-1046 encoding and fonts on AIX.
IBM-1046 charset based fonts are the default Arabic fonts on AIX.
The proposed solution is to add IBM-1046 charset support to Mozilla.
Reproducible: Always
Steps to Reproduce:
1. Get AIX system with Default Arabic fonts installed on
2. Get Mozilla on AIX
3. Launch the browser and open any Arabic page (http://www.ahram.org.eg -
http://akhbarelyom.org.eg)
Actual Results:
All Arabic letters apeared as question marks
Expected Results:
Arabic letters to be displayed properly
Reporter | ||
Updated•22 years ago
|
OS: other → AIX
Reporter | ||
Comment 1•22 years ago
|
||
Reporter | ||
Comment 2•22 years ago
|
||
The attached file contains the following files:
1. intl.mod.tar: contains all modified files under intl directory
2. intl.new.tar: contains the new files created under intl directory
3. gfx.mod.tar: contains the modified files under gfx directory
4. layout.mod.tar: contains the modified files under layout directory
Status: UNCONFIRMED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Comment 3•22 years ago
|
||
Eh Ahmad ... why did you change thuis to 'fixed' ? Did you find the reason why
theu didn't display ?
Reporter | ||
Comment 4•22 years ago
|
||
Hi Jo,
I changed the status to fixed as the attached file contains the required files
to solve the bug.
unfortunately the modified files are not in patch format. You might cvs diff
them.
Comment 5•22 years ago
|
||
We only mark bugs FIXED when the fixes are checked in.
I'll try to make a diff for this.
Status: RESOLVED → UNCONFIRMED
Resolution: FIXED → ---
Comment 6•22 years ago
|
||
This isn't going to work. I have to have a diff for the old files. I'm already
out of sync.
Please do a cvs diff -u against your tree with these files and post the diff.
I'll do a diff for the new files.
Reporter | ||
Comment 7•22 years ago
|
||
I have created cvs diff patchs for the modified files (12 files) to add
IBM-1046 charset support to Mozilla
Comment 8•22 years ago
|
||
Do we need the decoder as well as the encoder? If IBM-1046 is only used as a
font encoding by AIX and not in web pages, then we only need conversion from
Unicode to IBM-1046 and not vice versa, as we currently have for the Langbox
Arabic encodings, so only part of the patch would be necessary.
Also, since it's not defined in IANA, we would call it x-IBM-1046.
Comment 9•22 years ago
|
||
Can file names need decoding?
Comment 10•22 years ago
|
||
Lina Kemmel wrote:
> Can file names need decoding?
Do you mean something like that the Unix env variable $LANG is set to something
like "ar.IBM-1046" (=Arabic locale which uses IBM-1046 encoding) ? Then the
answer is "yes".
Comment 11•22 years ago
|
||
Roland Mainz wrote:
>> Do you mean ... Arabic locale which uses IBM-1046 encoding?
Yes, Ar_AA.IBM-1046, or just its alias Ar_AA.
Reporter | ||
Comment 12•22 years ago
|
||
We need the decoder as well as the encoder for the following cases:
1. All web files that are created under the Arabic locale "Ar_AA" (This is the
arabic locale that uses IBM-1046 charset.
You can find html files describing steps or whatever that are created on AIX
mainly, manually or automatically (for example in a banking system ...etc).
These files will be created using IBM-1046 charset.
2. In case running the browser or one of its components such as the mail
composer, or Java Script editor in Arabic locale "Ar_AA" and input any Arabic
data.
As for the IANA issue, we will start the adequate procedures to register IBM-
1046 charset with IANA. So we will not need to rename it to x-IBM-1046
Comment 13•22 years ago
|
||
Please attach the dif file directly to the bug.
Don't attach tar.gz files.
Reporter | ||
Comment 14•22 years ago
|
||
This patch doesn't include the new additional 6 files to completely support
IBM-1046 charset.
The new 6 files are attached in the first attachement.
Comment 15•22 years ago
|
||
I'm changing the summary line to better describe what this bug is about.
+ucs2.base=UTF-16BE
This line seems to be a 'pollution'...
@@ -88,6 +90,13 @@
+##
+## Aliases for IBM-1046
+##
+ibm-1046=IBM-1046
+cp1046=IBM-1046
+1046=IBM-1046
+csIBM1046=IBM-1046
Are all of these actually used in emails/web pages or at least documented
somewhere? Aliases like 'csIBMxxx' were put there because IANA charset registry
has them. If IBM-1046 is not even registered, yet, you don't need them.
Can you give a couple of examples of actual web pages encoded in IBM-1046? Two
web sites given in your report are in Windows-1256, which prompted Simon's
question in comment #8.
Summary: Arabic letters are displayed as question marks on AIX → add encoder/decoder for (x-)IBM-1046 used for Arabic on AIX
Reporter | ||
Comment 16•22 years ago
|
||
This line "+ucs2.base=UTF-16BE" should not be there, it's added for debugging
purpose and should be removed.
The aliases for IBM-1046 are taken from Linux conversion module. Mapping
between IBM-1046 and Unicode can be obtained from
http://oss.software.ibm.com/cvs/icu//charset/data/ucm/
The two web sites givven in my report are just as an example for any arabic
page. The main concern is to enable Mozilla using default Arabic fonts in AIX
(IBM-1046 based)
IBM-1046 encoded content may not be available so frequent on the web, however
it's more likely to be found on intranets mainly generated on an AIX host.
for testing purpose, it's possible to create IBM-1046 encoded html page using
the following technique:
1. Save any Arabic web page
2. Transfer it to AIX host in binary mode
3. Use iconv command to convert encoding of the test page to IBM-1046
iconv -f <CHARSET> -t IBM-1046 < TestPage.html > TestPage-1046.html
Comment 17•22 years ago
|
||
> The aliases for IBM-1046 are taken from Linux conversion module
Then, it must have come from somewhere. If nobody has used names like
csIBM1046 as a MIME charset name, we don't need them in alias. Well, it's not a
big deal but I just wanted know whether it's documented somewhere (Linux
iconv(3) doesn't count ;-) because it's not for web publication/email/news but
for *internal* use )
> The main concern is to enable Mozilla using default Arabic fonts in AIX
> (IBM-1046 based)
This is for Unicode to IBM-1046 (font-encoding) encoder.
> it's more likely to be found on intranets mainly generated on an AIX host.
Have you seen one? What's MIME charset to tag them? Do they (intranet
document) use one of aliases you listed? I guess they do, but just like to make
sure that we don't miss any other name used to tage IBM-1046.
> it's possible to create IBM-1046 encoded html page using
> the following technique
Sure, it's possible that way. My point is whether we need to add a decoder if
even you can't come up with a single page that's encoded in IBM-1046 that is not
made for the sake of demonstration.
Given that there are a few more decoders (of bigger size) of more 'dubious'
value in the current tree, I shouldn't make any more 'noise' :-)
Reporter | ||
Comment 18•22 years ago
|
||
As for aliases I think we can omit "csIBM1046" and "1046", and keep
only "cp1046" as it's used in Java.
As for the decoder, since Ar_AA locale on AIX uses IBM-1046 charmap so we do
need the decoder to allow Arabic input in Mozilla when it's launched under
Ar_AA locale.
Reporter | ||
Comment 19•22 years ago
|
||
Additional information about IBM PC code sets (IBM-1046 is one IBM PC code
sets) can be obtained from the following URL:
http://publib16.boulder.ibm.com/pseries/en_US/aixprggd/nlsgdrf/nlsgdrf02.htm#wq1
Assignee | ||
Updated•21 years ago
|
Assignee: mkaply → pkw
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 20•21 years ago
|
||
pkw:
1. Is it possible to attach a *.enc mapping file for Xfree86 (see
http://xprint.freedesktop.org/cgi-bin/viewcvs.cgi/*checkout*/xprint/xprint/src/xprint_main/xc/fonts/encodings/iso8859-6.8x.enc?rev=HEAD&content-type=text/plain
for an example), please ?
2. Can you please attach an example font (PCF, BDF, etc.) to this bug that I can
test the patch (for review), please ?
Comment 21•21 years ago
|
||
Roland, wouldn't this suffice for you to generate the font encoding file for IBM
1046?
http://oss.software.ibm.com/cvs/icu/~checkout~/charset/data/ucm/aix-IBM_1046-4.3.6.ucm?rev=1.1&content-type=text/plain
Keywords: intl
Reporter | ||
Comment 22•21 years ago
|
||
The font encoding and decoding files are attached to this bug. They were
generated using umaptable tool.
Please have a look to the first attachment created on 2003-06-25 07:43 PST. I'm
going to attach a new patch includes the new files as well as the modified
files.
Assignee | ||
Updated•21 years ago
|
Attachment #126457 -
Attachment is obsolete: true
Assignee | ||
Updated•21 years ago
|
Attachment #126540 -
Attachment is obsolete: true
Assignee | ||
Updated•21 years ago
|
Attachment #126813 -
Attachment is obsolete: true
Comment 23•21 years ago
|
||
Jungshik Shin wrote:
> wouldn't this suffice for you to generate the font encoding file for IBM
> 1046?
>
http://oss.software.ibm.com/cvs/icu/~checkout~/charset/data/ucm/aix-IBM_1046-4.3.6.ucm?rev=1.1&content-type=text/plain
Cool... :)
I wasn't aware that this information was available in the public.
... but we still need one or two PCF/BDF example fonts to test whether:
a) this patch is working
b) my encoding map for Xfree86 works
Assignee | ||
Comment 24•21 years ago
|
||
Latest version of the patch which is updated for the trunk. Removed some
unneeded aliases. As stated in comment 18, both the encoder and decoder are
needed to work properly.
Assignee | ||
Updated•21 years ago
|
Attachment #142359 -
Flags: superreview?(smontagu)
Attachment #142359 -
Flags: review?(jshin)
Assignee | ||
Updated•21 years ago
|
Attachment #142359 -
Flags: superreview?(smontagu) → superreview?(blizzard)
Comment 25•21 years ago
|
||
Comment on attachment 142359 [details] [diff] [review]
Patch v2
> Index: gfx/src/gtk/nsFontMetricsGTK.cpp
> ===================================================================
> RCS file: /cvsroot/mozilla/gfx/src/gtk/nsFontMetricsGTK.cpp,v
> retrieving revision 1.267
> diff -u -6 -r1.267 nsFontMetricsGTK.cpp
> --- gfx/src/gtk/nsFontMetricsGTK.cpp 5 Feb 2004 01:57:03 -0000 1.267
> +++ gfx/src/gtk/nsFontMetricsGTK.cpp 26 Feb 2004 19:24:08 -0000
> @@ -22,12 +22,13 @@
> * Contributor(s):
> * Pierre Phaneuf <pp@ludusdesign.com>
> * Roland Mainz <roland.mainz@informatik.med.uni-giessen.de>
> * Brian Stell <bstell@ix.netcom.com>
> * Morten Nilsen <morten@nilsen.com>
> * Jungshik Shin <jshin@mailaps.org>
> + * IBM Corporation
Erm... is it possible to provide an email here, please ? Just for the case that
someone has questions...
[snip]
> { "ibm-1252", &FLG_NONE, &Unknown },
> { "ibm-850", &FLG_NONE, &Unknown },
> + { "ibm-1046", &FLG_ARABIC, &IBM1046 },
Rumor says the table should be in alphabetical order (I am not sure whether
this rule is mandatory anymore or just to make searching by humans faster :)
> { "ibm-fontspecific", &FLG_NONE, &Unknown },
> { "ibm-sbdcn", &FLG_NONE, &Unknown },
> { "ibm-sbdtw", &FLG_NONE, &Unknown },
> { "ibm-special", &FLG_NONE, &Unknown },
> { "ibm-udccn", &FLG_NONE, &Unknown },
> { "ibm-udcjp", &FLG_NONE, &Unknown },
> Index: gfx/src/xlib/nsFontMetricsXlib.cpp
> ===================================================================
> RCS file: /cvsroot/mozilla/gfx/src/xlib/nsFontMetricsXlib.cpp,v
> retrieving revision 1.162
> diff -u -6 -r1.162 nsFontMetricsXlib.cpp
> --- gfx/src/xlib/nsFontMetricsXlib.cpp 5 Feb 2004 01:57:06 -0000 1.162
> +++ gfx/src/xlib/nsFontMetricsXlib.cpp 26 Feb 2004 19:24:09 -0000
> @@ -25,12 +25,13 @@
> * Quy Tonthat <quy@igelaus.com.au>
> * Tony Tsui <tony@igelaus.com.au>
> * Tim Copperfield <timecop@network.email.ne.jp>
> * Roland Mainz <roland.mainz@informatik.med.uni-giessen.de>
> * Brian Stell <bstell@ix.netcom.com>
> * Jungshik Shin <jshin@mailaps.org>
> + * IBM Corporation
See above.
[snip]
> +static nsFontCharSetInfoXlib IBM1046 =
> + { "IBM-1046", SingleByteConvert, 0,
> + TT_OS2_CPR1_ARABIC, TT_OS2_CPR2_ARABIC | TT_OS2_CPR2_ARABIC_708 };
> static nsFontCharSetInfoXlib ISO88597 =
> { "ISO-8859-7", SingleByteConvert, 0,
> TT_OS2_CPR1_GREEK, TT_OS2_CPR2_GREEK | TT_OS2_CPR2_GREEK_437G };
> static nsFontCharSetInfoXlib ISO88598 =
> { "ISO-8859-8", SingleByteConvert, 0,
> TT_OS2_CPR1_HEBREW, TT_OS2_CPR2_HEBREW };
> @@ -602,12 +606,13 @@
> { "hp-tchinesebig5", &FLG_ZHTW, &Big5 },
> { "hp-wa", &FLG_NONE, &Unknown },
> { "hpbig5-", &FLG_ZHTW, &Big5 },
> { "hphkbig5-", &FLG_ZHHK, &HKSCS },
> { "hproc16-", &FLG_NONE, &Unknown },
> { "ibm-1252", &FLG_NONE, &Unknown },
> + { "ibm-1046", &FLG_ARABIC, &IBM1046 },
See above.
[snip]
> Index: intl/uconv/src/nsUConvModule.cpp
> ===================================================================
> RCS file: /cvsroot/mozilla/intl/uconv/src/nsUConvModule.cpp,v
> retrieving revision 1.52
> diff -u -6 -r1.52 nsUConvModule.cpp
> --- intl/uconv/src/nsUConvModule.cpp 7 Sep 2003 22:24:11 -0000 1.52
> +++ intl/uconv/src/nsUConvModule.cpp 26 Feb 2004 19:24:16 -0000
> @@ -18,12 +18,13 @@
> * Netscape Communications Corporation.
> * Portions created by the Initial Developer are Copyright (C) 1998
> * the Initial Developer. All Rights Reserved.
> *
> * Contributor(s):
> * Pierre Phaneuf <pp@ludusdesign.com>
> + * IBM Corporation
Email ?
[snip]
> Index: intl/uconv/ucvibm/nsUnicodeToCP1046.h
> ===================================================================
> RCS file: intl/uconv/ucvibm/nsUnicodeToCP1046.h
> diff -N intl/uconv/ucvibm/nsUnicodeToCP1046.h
> --- /dev/null 1 Jan 1970 00:00:00 -0000
> +++ intl/uconv/ucvibm/nsUnicodeToCP1046.h 26 Feb 2004 19:24:16 -0000
> @@ -0,0 +1,39 @@
> +/* -*- Mode: C++; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*-
> + *
> + * The contents of this file are subject to the Netscape Public
> + * License Version 1.1 (the "License"); you may not use this file
> + * except in compliance with the License. You may obtain a copy of
> + * the License at http://www.mozilla.org/NPL/
> + *
> + * Software distributed under the License is distributed on an "AS
> + * IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
> + * implied. See the License for the specific language governing
> + * rights and limitations under the License.
> + *
> + * The Original Code is Mozilla Communicator client code.
> + *
> + * The Initial Developer of the Original Code is Netscape Communications
> + * Corporation. Portions created by Netscape are
> + * Copyright (C) 1998 Netscape Communications Corporation. All
> + * Rights Reserved.
> + *
> + * Contributor(s):
> + * IBM Corporation
> + *
> + */
> +
> +#ifndef nsUnicodeToCP1046_h___
> +#define nsUnicodeToCP1046_h___
> +
> +#include "nsISupports.h"
> +
> +
> +/**
> + * A character set converter from Unicode to CP1046.
> + */
> +NS_METHOD
> +nsUnicodeToCP1046Constructor(nsISupports *aOuter, REFNSIID aIID,
> + void **aResult);
> +
> +
> +#endif /* nsUnicodeToCP1046_h___ */
Nit:
#endif /* !nsUnicodeToCP1046_h___ */ when |ifndef| or |if !defined(xxx)| was
used.
> Index: layout/base/src/nsPresContext.cpp
> ===================================================================
> RCS file: /cvsroot/mozilla/layout/base/src/nsPresContext.cpp,v
> retrieving revision 3.250
> diff -u -6 -r3.250 nsPresContext.cpp
> --- layout/base/src/nsPresContext.cpp 23 Feb 2004 21:29:04 -0000 3.250
> +++ layout/base/src/nsPresContext.cpp 26 Feb 2004 19:24:16 -0000
> @@ -17,12 +17,13 @@
> * The Initial Developer of the Original Code is
> * Netscape Communications Corporation.
> * Portions created by the Initial Developer are Copyright (C) 1998
> * the Initial Developer. All Rights Reserved.
> *
> * Contributor(s):
> + * IBM Corporation
> *
> * Alternatively, the contents of this file may be used under the terms of
> * either the GNU General Public License Version 2 or later (the "GPL"), or
> * the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
> * in which case the provisions of the GPL or the LGPL are applicable instead
> * of those above. If you wish to allow use of your version of this file only
> @@ -1054,13 +1055,17 @@
> #ifdef IBMBIDI
> //ahmed
> NS_IMETHODIMP
> nsPresContext::IsArabicEncoding(PRBool& aResult) const
> {
> aResult=PR_FALSE;
> - if ( (mCharset.EqualsIgnoreCase("ibm864") )||(mCharset.EqualsIgnoreCase("ibm864i") )||(mCharset.EqualsIgnoreCase("windows-1256") )||(mCharset.EqualsIgnoreCase("iso-8859-6") ))
> + if (mCharset.EqualsIgnoreCase("ibm864")
> + || mCharset.EqualsIgnoreCase("ibm864i")
> + || mCharset.EqualsIgnoreCase("windows-1256")
> + || mCharset.EqualsIgnoreCase("iso-8859-6")
> + || mCharset.EqualsIgnoreCase("ibm-1046"))
> aResult=PR_TRUE;
> return NS_OK;
> }
Small optimisation:
In theory you could assign |mCharset| to a temp. var, make all chars in that
temp. var lowercase and then you could use |Equals()| instead of
|EqualsIgnoreCase()|
> NS_IMETHODIMP
> nsPresContext::IsVisRTL(PRBool& aResult) const
Patch compiles and seems to work, r=roland.mainz@nrubsig.org
Attachment #142359 -
Flags: review?(jshin) → review+
Assignee | ||
Comment 26•21 years ago
|
||
Address Roland's comments.
Attachment #142359 -
Attachment is obsolete: true
Assignee | ||
Updated•21 years ago
|
Attachment #142359 -
Flags: superreview?(blizzard)
Assignee | ||
Updated•21 years ago
|
Attachment #142380 -
Flags: superreview?(blizzard)
Attachment #142380 -
Flags: review?(jshin)
Assignee | ||
Updated•21 years ago
|
Status: NEW → ASSIGNED
Comment 27•21 years ago
|
||
Comment on attachment 142380 [details] [diff] [review]
Patch v3
>+static nsFontCharSetInfo IBM1046 =
>+ { "IBM-1046", SingleByteConvert, 0,
>+ TT_OS2_CPR1_ARABIC, TT_OS2_CPR2_ARABIC | TT_OS2_CPR2_ARABIC_708 };
I've just checked the IANA charset registry [1] and found that IBM1046 hadn't
been registered. In that case, we shouldn't use 'IBM1046' (nor 'IBM-1046'). The
canonical name for this encoding should be 'x-IBM1046'. Better would be to
register IBM1046 with the IANA and then use 'IBM1046' (no '-') instead of
'x-IBM1046'. See
http://www.iana.org/assignments/character-sets
I guess it'd not take long to register it.
>+ { "ibm-1046", &FLG_ARABIC, &IBM1046 },
This is fine assuming that AIX comes with X11 fonts with 'ibm-1046' as the last
two fields of XLFD.
>Index: gfx/src/xlib/nsFontMetricsXlib.cpp
>+static nsFontCharSetInfoXlib IBM1046 =
>+ { "IBM-1046", SingleByteConvert, 0,
>+ TT_OS2_CPR1_ARABIC, TT_OS2_CPR2_ARABIC |
The same here. 'IBM-1046' should be replaced by 'x-IBM1046'
>Index: intl/uconv/src/charsetData.properties
> ibm866.LangGroup = x-cyrillic
>+ibm-1046.LangGroup = ar
Pls, use 'x-ibm1046' and add the following line :
x-ibm1046.notForOutgoing = true
unless it's necessary to save files and to send emails in IBM1046.
>+ibm-1046.title = Arabic (IBM-1046)
x-ibm1046.title = Arabic (IBM-1046)
>Index: intl/uconv/src/charsetalias.properties
> ibm864i=IBM864i
>+##
>+## Aliases for IBM-1046
>+##
>+ibm-1046=IBM-1046
>+cp1046=IBM-1046
>Index: intl/uconv/src/nsUConvModule.cpp
>+NS_UCONV_REG_UNREG("IBM-1046", NS_CP1046TOUNICODE_CID, NS_UNICODETOCP1046_CID)
>+ DECODER_NAME_BASE "IBM-1046" , NS_CP1046TOUNICODE_CID,
>+ NS_UNICODEDECODER_CONTRACTID_BASE "IBM-1046",
>+ nsCP1046ToUnicodeConstructor ,
s/IBM-1046/x-IBM1046/ : a few more in this file.
>Index: intl/uconv/src/unixcharset.properties
> # AIX
>-#locale.all.Ar_AA=IBM-1046
>+locale.all.Ar_AA=IBM-1046
ditto
>Index: intl/uconv/ucvibm/cp1046.ut
>+ * The contents of this file are subject to the Netscape Public License
>+ * Version 1.0 (the "NPL"); you may not use this file except in
>+ * compliance with the NPL. You may obtain a copy of the NPL at
All new files should use GPL/MPL/LGPL instead of NPL. See
http://www.mozilla.org/MPL/boilerplate-1.1/
>Index: intl/uconv/ucvibm/nsCP1046ToUnicode.cpp
>Index: intl/uconv/ucvibm/nsCP1046ToUnicode.h
ditto.
Comment 28•21 years ago
|
||
> Pls, use 'x-ibm1046' and add the following line :
> x-ibm1046.notForOutgoing = true
> unless it's necessary to save files and to send emails in IBM1046.
In addition, if, as I suspect, we don't need to let users save files and send
emails in IBM1046, IBM1046 related-files (in intl/uconv/ucvlatin/Makefile.in)
and IBM1046-related lines (in nsUConvModules.cpp) have to be enclosed by '#ifdef
MOZ_EXTRA_X11CONVERTERS' and '#endif'.
Assignee | ||
Comment 29•21 years ago
|
||
I have a new patch which I will be posting shortly.
(In reply to comment #27)
> >+static nsFontCharSetInfo IBM1046 =
> >+ { "IBM-1046", SingleByteConvert, 0,
> >+ TT_OS2_CPR1_ARABIC, TT_OS2_CPR2_ARABIC | TT_OS2_CPR2_ARABIC_708 };
>
> I've just checked the IANA charset registry [1] and found that IBM1046 hadn't
> been registered. In that case, we shouldn't use 'IBM1046' (nor 'IBM-1046'). The
> canonical name for this encoding should be 'x-IBM1046'. Better would be to
> register IBM1046 with the IANA and then use 'IBM1046' (no '-') instead of
> 'x-IBM1046'. See
>
> http://www.iana.org/assignments/character-sets
>
> I guess it'd not take long to register it.
I have started the process internally to register IBM1046 with IANA. Once I have
more information to report on this, I will add it to this bug. For now, I will
change this line to be IBM1046 instead of IBM-1046.
> >+ { "ibm-1046", &FLG_ARABIC, &IBM1046 },
>
> This is fine assuming that AIX comes with X11 fonts with 'ibm-1046' as the last
> two fields of XLFD.
It does:
pkw@wave:~/ > xlsfonts | grep 'ibm-1046$'
-applied arabic-boutros typing-bold-i-normal--0-0-96-96-m-0-ibm-1046
-applied arabic-boutros typing-bold-i-normal--11-80-96-96-m-60-ibm-1046
-applied arabic-boutros typing-bold-i-normal--13-100-96-96-m-80-ibm-1046
-applied arabic-boutros typing-bold-i-normal--16-120-96-96-m-100-ibm-1046
...
> >Index: gfx/src/xlib/nsFontMetricsXlib.cpp
>
> >+static nsFontCharSetInfoXlib IBM1046 =
> >+ { "IBM-1046", SingleByteConvert, 0,
> >+ TT_OS2_CPR1_ARABIC, TT_OS2_CPR2_ARABIC |
>
> The same here. 'IBM-1046' should be replaced by 'x-IBM1046'
Changed to IBM1046.
> >Index: intl/uconv/src/charsetData.properties
>
> > ibm866.LangGroup = x-cyrillic
> >+ibm-1046.LangGroup = ar
>
> Pls, use 'x-ibm1046' and add the following line :
>
> x-ibm1046.notForOutgoing = true
>
> unless it's necessary to save files and to send emails in IBM1046.
While I don't think it is necessary to send emails in IBM1046, it is useful to
be able to save files in IBM1046. The Ar_AA locale on AIX saves all files in
this encoding, so for users of this locale this is a useful feature. I have
changed the line to ibm1046 instead of ibm-1046.
> >+ibm-1046.title = Arabic (IBM-1046)
>
> x-ibm1046.title = Arabic (IBM-1046)
Changed to ibm1046 instead of ibm-1046.
> >Index: intl/uconv/src/charsetalias.properties
>
> > ibm864i=IBM864i
> >+##
> >+## Aliases for IBM-1046
> >+##
> >+ibm-1046=IBM-1046
> >+cp1046=IBM-1046
Changed IBM-1046 to IBM1046. Changed existing ibm-1046 alias to ibm1046.
> >Index: intl/uconv/src/nsUConvModule.cpp
>
>
> >+NS_UCONV_REG_UNREG("IBM-1046", NS_CP1046TOUNICODE_CID, NS_UNICODETOCP1046_CID)
>
> >+ DECODER_NAME_BASE "IBM-1046" , NS_CP1046TOUNICODE_CID,
> >+ NS_UNICODEDECODER_CONTRACTID_BASE "IBM-1046",
> >+ nsCP1046ToUnicodeConstructor ,
>
> s/IBM-1046/x-IBM1046/ : a few more in this file.
Changed to IBM1046.
> >Index: intl/uconv/src/unixcharset.properties
>
> > # AIX
> >-#locale.all.Ar_AA=IBM-1046
> >+locale.all.Ar_AA=IBM-1046
>
> ditto
Changed to IBM1046.
> >Index: intl/uconv/ucvibm/cp1046.ut
>
> >+ * The contents of this file are subject to the Netscape Public License
> >+ * Version 1.0 (the "NPL"); you may not use this file except in
> >+ * compliance with the NPL. You may obtain a copy of the NPL at
>
> All new files should use GPL/MPL/LGPL instead of NPL. See
>
> http://www.mozilla.org/MPL/boilerplate-1.1/
Done. These license statements are auto-generated from the umaptable command -
should the license in umaptable.c be modified to the new tri-license?
> >Index: intl/uconv/ucvibm/nsCP1046ToUnicode.cpp
> >Index: intl/uconv/ucvibm/nsCP1046ToUnicode.h
>
> ditto.
Done.
Assignee | ||
Updated•21 years ago
|
Attachment #142380 -
Attachment is obsolete: true
Attachment #142380 -
Flags: superreview?(blizzard)
Attachment #142380 -
Flags: review?(jshin)
Assignee | ||
Comment 30•21 years ago
|
||
- Updated to correct license on all new code.
- Changed IBM-1046 to IBM1046 throughout.
- Added alias ibm-1046 in charsetalias.properties.
- Added IBM1046 to toolkit/xpfe navigator.properties so it shows up properly
under Middle Eastern section of View->Character Coding.
Assignee | ||
Updated•21 years ago
|
Attachment #142758 -
Flags: review?(jshin)
Assignee | ||
Updated•21 years ago
|
Attachment #142758 -
Attachment is obsolete: true
Attachment #142758 -
Flags: review?(jshin)
Comment 31•21 years ago
|
||
Comment on attachment 142758 [details] [diff] [review]
Patch v4
r=gisburn
Attachment #142758 -
Flags: superreview?(roc)
Attachment #142758 -
Flags: review+
Attachment #142758 -
Flags: review+ → review?(jshin)
Assignee | ||
Comment 32•21 years ago
|
||
Sorry for not being more clear, but I spoke with the IBM team responsible for
registering character sets, and they would prefer to not register IBM1046 (and
its associated character set with the euro character, IBM9238). I am preparing
a patch which uses the x- prefix. This is why the last patch was marked
obsolete.
Assignee | ||
Updated•21 years ago
|
Attachment #142758 -
Flags: superreview?(roc)
Attachment #142758 -
Flags: review?(jshin)
Assignee | ||
Comment 33•21 years ago
|
||
This is a new minimal approach to this bug in the hope that we can get this
checked in before Mozilla 1.7b. Major changes include:
- We are now using the x- prefix on IBM1046. I have contacted the team which
does IANA registrations within IBM, and they are hesitant to register IBM1046
at this time.
- Marked the IBM1046 character set as "notForBrowser" and "notForOutgoing",
assuming that if those are ever needed in the future it is easy enough to
enable them.
- Enclosed the support for the IBM1046 character set within the
MOZ_EXTRA_X11CONVERTERS define, so they are only enabled on X11 platforms.
- Left out patches to the nsPresContext.cpp and nsObjectFrame.cpp until we can
find a valid page on the external Internet which uses the IBM-1046 character
set.
Assignee | ||
Updated•21 years ago
|
Attachment #142919 -
Flags: superreview?(roc)
Attachment #142919 -
Flags: review?(jshin)
Comment 34•21 years ago
|
||
> Marked the IBM1046 character set as "notForBrowser" and "notForOutgoing",
> assuming that if those are ever needed in the future it is easy enough to
> enable them.
I thought IBM1046 IS needed _now_ because on AIX (ar_AA locale), the default
character encoding is IBM1046. If you flag it as 'notForBrowser', you'd not be
able to select it in View | Character Encoding menu. With it flagged as
'notForOutgoing', you wouldn't be able to save a file in IBM1046. Is it what you
intended (did you change your mind about IBM1046) ?
> Enclosed the support for the IBM1046 character set within the
> MOZ_EXTRA_X11CONVERTERS define, so they are only enabled on X11 platforms.
I'm sorry I was wrong that you could enclose it with MOZ_EXTRA_X11CONVERTERS.
You can do that _only if_ IBM1046 is flagged as 'notForBrowser' and
'notForOutgoing'. It is the case in your patch, but as I wrote above, I'm afraid
that's not what you intended (originally).
Comment on attachment 142919 [details] [diff] [review]
Patch v5
Looks OK, but my superreview can't add much to this kind of patch.
Attachment #142919 -
Flags: superreview?(roc) → superreview+
Assignee | ||
Comment 36•21 years ago
|
||
(In reply to comment #34)
> I thought IBM1046 IS needed _now_ because on AIX (ar_AA locale), the default
> character encoding is IBM1046. If you flag it as 'notForBrowser', you'd not
be
> able to select it in View | Character Encoding menu. With it flagged as
> 'notForOutgoing', you wouldn't be able to save a file in IBM1046. Is it what
you
> intended (did you change your mind about IBM1046) ?
We changed our mind on IBM-1046. It works fine with Arabic input, fonts, and
file names as the patch is currently written. Since we were unable to come up
with a single website in the IBM1046 encoding, I think it is fine to mark it
as 'notForBrowser'. In addition, we don't *really* need to save pages in the
IBM1046 character set, and anyone who really needs this can just use AIX's
iconv to convert individual pages to IBM1046. The primary motivation for this
bug was to get support for the IBM1046 fonts.
> I'm sorry I was wrong that you could enclose it with MOZ_EXTRA_X11CONVERTERS.
> You can do that _only if_ IBM1046 is flagged as 'notForBrowser' and
> 'notForOutgoing'. It is the case in your patch, but as I wrote above, I'm
afraid
> that's not what you intended (originally).
You are correct in that this was not the original intent, but we are happy
with marking it as 'notForBrowser' and 'notForOutgoing'. If we ever find a
page in the future which needs IBM1046 support in the browser, it is easy
enough to enable it at that time.
Comment 37•21 years ago
|
||
Comment on attachment 142919 [details] [diff] [review]
Patch v5
r=jshin
Just one change is necessary. Because it's marked as
notForBrowser/notForOutgoing, there's no point of adding it to
charsetTitle.properties. Please, get rid of it for the now.
I was about to suggest that we shouldn't add IBM1046 decoder (if we just need
it as an X11 font encoding), but I realize that intl/locale (e.g. datetime
code) depends on it.
BTW, what do you get from nl_langinfo(CODESET) under ar_AA locale on AIX?
Attachment #142919 -
Flags: review?(jshin) → review+
Reporter | ||
Comment 38•21 years ago
|
||
> I was about to suggest that we shouldn't add IBM1046 decoder (if we just need
> it as an X11 font encoding), but I realize that intl/locale (e.g. datetime
> code) depends on it.
We still need the decoder for Arabic input under Ar_AA locale as well.
> BTW, what do you get from nl_langinfo(CODESET) under ar_AA locale on
AIX?
it returns IBM-1046 under Ar_AA locale
it returns ISO8859-6 under ar_AA locale
Assignee | ||
Comment 39•21 years ago
|
||
Fix checked in (without change to charsetTitles.properties).
Checking in gfx/src/gtk/nsFontMetricsGTK.cpp;
/cvsroot/mozilla/gfx/src/gtk/nsFontMetricsGTK.cpp,v <-- nsFontMetricsGTK.cpp
new revision: 1.268; previous revision: 1.267
done
Checking in gfx/src/xlib/nsFontMetricsXlib.cpp;
/cvsroot/mozilla/gfx/src/xlib/nsFontMetricsXlib.cpp,v <-- nsFontMetricsXlib.cpp
new revision: 1.165; previous revision: 1.164
done
Checking in intl/uconv/src/charsetData.properties;
/cvsroot/mozilla/intl/uconv/src/charsetData.properties,v <--
charsetData.properties
new revision: 1.39; previous revision: 1.38
done
Checking in intl/uconv/src/nsUConvModule.cpp;
/cvsroot/mozilla/intl/uconv/src/nsUConvModule.cpp,v <-- nsUConvModule.cpp
new revision: 1.53; previous revision: 1.52
done
Checking in intl/uconv/src/unixcharset.properties;
/cvsroot/mozilla/intl/uconv/src/unixcharset.properties,v <--
unixcharset.properties
new revision: 1.24; previous revision: 1.23
done
Checking in intl/uconv/ucvibm/Makefile.in;
/cvsroot/mozilla/intl/uconv/ucvibm/Makefile.in,v <-- Makefile.in
new revision: 1.19; previous revision: 1.18
done
RCS file: /cvsroot/mozilla/intl/uconv/ucvibm/cp1046.uf,v
done
Checking in intl/uconv/ucvibm/cp1046.uf;
/cvsroot/mozilla/intl/uconv/ucvibm/cp1046.uf,v <-- cp1046.uf
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/intl/uconv/ucvibm/cp1046.ut,v
done
Checking in intl/uconv/ucvibm/cp1046.ut;
/cvsroot/mozilla/intl/uconv/ucvibm/cp1046.ut,v <-- cp1046.ut
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/intl/uconv/ucvibm/nsCP1046ToUnicode.cpp,v
done
Checking in intl/uconv/ucvibm/nsCP1046ToUnicode.cpp;
/cvsroot/mozilla/intl/uconv/ucvibm/nsCP1046ToUnicode.cpp,v <--
nsCP1046ToUnicode.cpp
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/intl/uconv/ucvibm/nsCP1046ToUnicode.h,v
done
Checking in intl/uconv/ucvibm/nsCP1046ToUnicode.h;
/cvsroot/mozilla/intl/uconv/ucvibm/nsCP1046ToUnicode.h,v <-- nsCP1046ToUnicode.h
initial revision: 1.1
done
Checking in intl/uconv/ucvibm/nsUCvIBMCID.h;
/cvsroot/mozilla/intl/uconv/ucvibm/nsUCvIBMCID.h,v <-- nsUCvIBMCID.h
new revision: 1.7; previous revision: 1.6
done
RCS file: /cvsroot/mozilla/intl/uconv/ucvibm/nsUnicodeToCP1046.cpp,v
done
Checking in intl/uconv/ucvibm/nsUnicodeToCP1046.cpp;
/cvsroot/mozilla/intl/uconv/ucvibm/nsUnicodeToCP1046.cpp,v <--
nsUnicodeToCP1046.cpp
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/intl/uconv/ucvibm/nsUnicodeToCP1046.h,v
done
Checking in intl/uconv/ucvibm/nsUnicodeToCP1046.h;
/cvsroot/mozilla/intl/uconv/ucvibm/nsUnicodeToCP1046.h,v <-- nsUnicodeToCP1046.h
initial revision: 1.1
done
Status: ASSIGNED → RESOLVED
Closed: 22 years ago → 21 years ago
Resolution: --- → FIXED
Component: Layout: BiDi Hebrew & Arabic → Layout: Text
QA Contact: zach → layout.fonts-and-text
You need to log in
before you can comment on or make changes to this bug.
Description
•