Closed Bug 210629 Opened 21 years ago Closed 20 years ago

add encoder/decoder for (x-)IBM-1046 used for Arabic on AIX

Categories

(Core :: Layout: Text and Fonts, defect)

All
AIX
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: ahtaha, Assigned: pkwarren)

Details

(Keywords: intl)

Attachments

(1 file, 6 obsolete files)

User-Agent:       Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.0.3705; .NET CLR 1.1.4322)
Build Identifier: Mozilla/5.0 (X11; U; AIX 0044F61A4C00; en-US; rv:1.3) Gecko/20030624

AIX mozilla displays Arabic letters as question marks when browsing any Arabic 
page, because Mozilla doesn't support IBM-1046 encoding and fonts on AIX.
IBM-1046 charset based fonts are the default Arabic fonts on AIX.
The proposed solution is to add IBM-1046 charset support to Mozilla.

Reproducible: Always

Steps to Reproduce:
1. Get AIX system with Default Arabic fonts installed on
2. Get Mozilla on AIX
3. Launch the browser and open any Arabic page (http://www.ahram.org.eg - 
http://akhbarelyom.org.eg)

Actual Results:  
All Arabic letters apeared as question marks

Expected Results:  
Arabic letters to be displayed properly
OS: other → AIX
The attached file contains the following files: 
1. intl.mod.tar: contains all modified files under intl directory
2. intl.new.tar: contains the new files created under intl directory
3. gfx.mod.tar: contains the modified files under gfx directory
4. layout.mod.tar: contains the modified files under layout directory
Status: UNCONFIRMED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Eh Ahmad ... why did you change thuis to 'fixed' ? Did you find the reason why
theu didn't display ?
Hi Jo,
I changed the status to fixed as the attached file contains the required files 
to solve the bug. 
unfortunately the modified files are not in patch format. You might cvs diff 
them.
We only mark bugs FIXED when the fixes are checked in.

I'll try to make a diff for this.
Status: RESOLVED → UNCONFIRMED
Resolution: FIXED → ---
This isn't going to work. I have to have a diff for the old files. I'm already
out of sync.

Please do a cvs diff -u against your tree with these files and post the diff.

I'll do a diff for the new files.
Attached file cvs diff patchs for the modified files (obsolete) —
I have created cvs diff patchs for the modified files (12 files) to add
IBM-1046 charset support to Mozilla
Do we need the decoder as well as the encoder? If IBM-1046 is only used as a
font encoding by AIX and not in web pages, then we only need conversion from
Unicode to IBM-1046 and not vice versa, as we currently have for the Langbox
Arabic encodings, so only part of the patch would be necessary.

Also, since it's not defined in IANA, we would call it x-IBM-1046.
Can file names need decoding?
Lina Kemmel wrote:
> Can file names need decoding?

Do you mean something like that the Unix env variable $LANG is set to something
like "ar.IBM-1046" (=Arabic locale which uses IBM-1046 encoding) ? Then the
answer is "yes".
Roland Mainz wrote:
>> Do you mean ... Arabic locale which uses IBM-1046 encoding?

Yes, Ar_AA.IBM-1046, or just its alias Ar_AA.
We need the decoder as well as the encoder for the following cases:
1. All web files that are created under the Arabic locale "Ar_AA" (This is the 
arabic locale that uses IBM-1046 charset.
You can find html files describing steps or whatever that are created on AIX 
mainly, manually or automatically (for example in a banking system ...etc). 
These files will be created using IBM-1046 charset.

2. In case running the browser or one of its components such as the mail 
composer, or Java Script editor in Arabic locale "Ar_AA" and input any Arabic 
data.

As for the IANA issue, we will start the adequate procedures to register IBM-
1046 charset with IANA. So we will not need to rename it to x-IBM-1046
Please attach the dif file directly to the bug.

Don't attach tar.gz files.
This patch doesn't include the new additional 6 files to completely support
IBM-1046 charset. 
The new 6 files are attached in the first attachement.
I'm changing the summary line to better describe what this bug is about.

+ucs2.base=UTF-16BE
 
This line seems to be a 'pollution'...

@@ -88,6 +90,13 @@
+##
+## Aliases for IBM-1046
+##
+ibm-1046=IBM-1046
+cp1046=IBM-1046
+1046=IBM-1046
+csIBM1046=IBM-1046

Are all of these actually used in emails/web pages or at least documented
somewhere? Aliases like 'csIBMxxx' were put there because IANA charset registry
has them. If IBM-1046 is not even registered, yet, you don't need them.  

Can you give a couple of examples of actual web pages  encoded in IBM-1046? Two
web sites given in your report are in Windows-1256, which prompted Simon's
question in comment #8. 
Summary: Arabic letters are displayed as question marks on AIX → add encoder/decoder for (x-)IBM-1046 used for Arabic on AIX
This line "+ucs2.base=UTF-16BE" should not be there, it's added for debugging 
purpose and should be removed. 

The aliases for IBM-1046 are taken from Linux conversion module. Mapping 
between IBM-1046 and Unicode can be obtained from 
http://oss.software.ibm.com/cvs/icu//charset/data/ucm/

The two web sites givven in my report are just as an example for any arabic 
page. The main concern is to enable Mozilla using default Arabic fonts in AIX 
(IBM-1046 based)

IBM-1046 encoded content may not be available so frequent on the web, however 
it's more likely to be found on intranets mainly generated on an AIX host.

for testing purpose, it's possible to create IBM-1046 encoded html page using 
the following technique:

1. Save any Arabic web page
2. Transfer it to AIX host in binary mode
3. Use iconv command to convert encoding of the test page to IBM-1046
   iconv -f <CHARSET> -t IBM-1046 < TestPage.html > TestPage-1046.html
> The aliases for IBM-1046 are taken from Linux conversion module

  Then, it must have come from somewhere. If nobody has used names like
csIBM1046 as a MIME charset name, we don't need them in alias. Well, it's not a
big deal but I just wanted know whether it's documented somewhere (Linux
iconv(3) doesn't count ;-) because it's not for web publication/email/news but
for *internal* use )

> The main concern is to enable Mozilla using default Arabic fonts in AIX 
> (IBM-1046 based)

  This is for Unicode to IBM-1046 (font-encoding) encoder. 

> it's more likely to be found on intranets mainly generated on an AIX host.

  Have you seen one? What's MIME charset to tag them? Do they (intranet
document) use one of aliases you listed? I guess they do, but just like to make
sure that we don't miss any other name used to tage IBM-1046.  

> it's possible to create IBM-1046 encoded html page using 
> the following technique

  Sure, it's possible that way. My point is whether we need to add a decoder if
even you can't come up with a single page that's encoded in IBM-1046 that is not
made for the sake of demonstration.  

  Given that there are a few more decoders (of bigger size) of more 'dubious'
value  in the current tree, I shouldn't make any more 'noise' :-)
As for aliases I think we can omit "csIBM1046" and "1046", and keep 
only "cp1046" as it's used in Java. 

As for the decoder, since Ar_AA locale on AIX uses IBM-1046 charmap so we do 
need the decoder to allow Arabic input in Mozilla when it's launched under 
Ar_AA locale.
Additional information about IBM PC code sets (IBM-1046 is one IBM PC code 
sets) can be obtained from the following URL: 
http://publib16.boulder.ibm.com/pseries/en_US/aixprggd/nlsgdrf/nlsgdrf02.htm#wq1
Assignee: mkaply → pkw
Status: UNCONFIRMED → NEW
Ever confirmed: true
pkw:
1. Is it possible to attach a *.enc mapping file for Xfree86 (see
http://xprint.freedesktop.org/cgi-bin/viewcvs.cgi/*checkout*/xprint/xprint/src/xprint_main/xc/fonts/encodings/iso8859-6.8x.enc?rev=HEAD&content-type=text/plain
for an example), please ?
2. Can you please attach an example font (PCF, BDF, etc.) to this bug that I can
test the patch (for review), please ?
Roland, wouldn't this suffice for you to generate the font encoding file for IBM
1046? 

http://oss.software.ibm.com/cvs/icu/~checkout~/charset/data/ucm/aix-IBM_1046-4.3.6.ucm?rev=1.1&content-type=text/plain

Keywords: intl
The font encoding and decoding files are attached to this bug. They were 
generated using umaptable tool.
Please have a look to the first attachment created on 2003-06-25 07:43 PST. I'm 
going to attach a new patch includes the new files as well as the modified 
files.
Attachment #126457 - Attachment is obsolete: true
Attachment #126540 - Attachment is obsolete: true
Attachment #126813 - Attachment is obsolete: true
Jungshik Shin wrote:
> wouldn't this suffice for you to generate the font encoding file for IBM
> 1046? 
>
http://oss.software.ibm.com/cvs/icu/~checkout~/charset/data/ucm/aix-IBM_1046-4.3.6.ucm?rev=1.1&content-type=text/plain

Cool... :)
I wasn't aware that this information was available in the public.

... but we still need one or two PCF/BDF example fonts to test whether:
a) this patch is working
b) my encoding map for Xfree86 works
Attached patch Patch v2 (obsolete) — Splinter Review
Latest version of the patch which is updated for the trunk. Removed some
unneeded aliases. As stated in comment 18, both the encoder and decoder are
needed to work properly.
Attachment #142359 - Flags: superreview?(smontagu)
Attachment #142359 - Flags: review?(jshin)
Attachment #142359 - Flags: superreview?(smontagu) → superreview?(blizzard)
Comment on attachment 142359 [details] [diff] [review]
Patch v2

> Index: gfx/src/gtk/nsFontMetricsGTK.cpp
> ===================================================================
> RCS file: /cvsroot/mozilla/gfx/src/gtk/nsFontMetricsGTK.cpp,v
> retrieving revision 1.267
> diff -u -6 -r1.267 nsFontMetricsGTK.cpp
> --- gfx/src/gtk/nsFontMetricsGTK.cpp    5 Feb 2004 01:57:03 -0000       1.267
> +++ gfx/src/gtk/nsFontMetricsGTK.cpp    26 Feb 2004 19:24:08 -0000
> @@ -22,12 +22,13 @@
>   * Contributor(s):
>   *   Pierre Phaneuf <pp@ludusdesign.com> 
>   *   Roland Mainz <roland.mainz@informatik.med.uni-giessen.de>
>   *   Brian Stell <bstell@ix.netcom.com>
>   *   Morten Nilsen <morten@nilsen.com>
>   *   Jungshik Shin <jshin@mailaps.org>
> + *   IBM Corporation

Erm... is it possible to provide an email here, please ? Just for the case that
someone has questions...

[snip]
>    { "ibm-1252",           &FLG_NONE,    &Unknown       },
>    { "ibm-850",            &FLG_NONE,    &Unknown       },
> +  { "ibm-1046",           &FLG_ARABIC,  &IBM1046       },

Rumor says the table should be in alphabetical order (I am not sure whether
this rule is mandatory anymore or just to make searching by humans faster :)

>    { "ibm-fontspecific",   &FLG_NONE,    &Unknown       },
>    { "ibm-sbdcn",          &FLG_NONE,    &Unknown       },
>    { "ibm-sbdtw",          &FLG_NONE,    &Unknown       },
>    { "ibm-special",        &FLG_NONE,    &Unknown       },
>    { "ibm-udccn",          &FLG_NONE,    &Unknown       },
>    { "ibm-udcjp",          &FLG_NONE,    &Unknown       },
> Index: gfx/src/xlib/nsFontMetricsXlib.cpp
> ===================================================================
> RCS file: /cvsroot/mozilla/gfx/src/xlib/nsFontMetricsXlib.cpp,v
> retrieving revision 1.162
> diff -u -6 -r1.162 nsFontMetricsXlib.cpp
> --- gfx/src/xlib/nsFontMetricsXlib.cpp  5 Feb 2004 01:57:06 -0000       1.162
> +++ gfx/src/xlib/nsFontMetricsXlib.cpp  26 Feb 2004 19:24:09 -0000
> @@ -25,12 +25,13 @@
>   *   Quy Tonthat <quy@igelaus.com.au>
>   *   Tony Tsui <tony@igelaus.com.au>
>   *   Tim Copperfield <timecop@network.email.ne.jp>
>   *   Roland Mainz <roland.mainz@informatik.med.uni-giessen.de>
>   *   Brian Stell <bstell@ix.netcom.com>
>   *   Jungshik Shin <jshin@mailaps.org>
> + *   IBM Corporation

See above.

[snip]
> +static nsFontCharSetInfoXlib IBM1046 =
> +  { "IBM-1046", SingleByteConvert, 0,
> +      TT_OS2_CPR1_ARABIC, TT_OS2_CPR2_ARABIC | TT_OS2_CPR2_ARABIC_708 };
>  static nsFontCharSetInfoXlib ISO88597 =
>    { "ISO-8859-7", SingleByteConvert, 0,
>      TT_OS2_CPR1_GREEK, TT_OS2_CPR2_GREEK | TT_OS2_CPR2_GREEK_437G };
>  static nsFontCharSetInfoXlib ISO88598 =
>    { "ISO-8859-8", SingleByteConvert, 0,
>      TT_OS2_CPR1_HEBREW, TT_OS2_CPR2_HEBREW };
> @@ -602,12 +606,13 @@
>    { "hp-tchinesebig5",    &FLG_ZHTW,    &Big5          },
>    { "hp-wa",              &FLG_NONE,    &Unknown       },
>    { "hpbig5-",            &FLG_ZHTW,    &Big5          },
>    { "hphkbig5-",          &FLG_ZHHK,    &HKSCS         },
>    { "hproc16-",           &FLG_NONE,    &Unknown       },
>    { "ibm-1252",           &FLG_NONE,    &Unknown       },
> +  { "ibm-1046",           &FLG_ARABIC,  &IBM1046       },

See above.

[snip]
> Index: intl/uconv/src/nsUConvModule.cpp
> ===================================================================
> RCS file: /cvsroot/mozilla/intl/uconv/src/nsUConvModule.cpp,v
> retrieving revision 1.52
> diff -u -6 -r1.52 nsUConvModule.cpp
> --- intl/uconv/src/nsUConvModule.cpp    7 Sep 2003 22:24:11 -0000       1.52
> +++ intl/uconv/src/nsUConvModule.cpp    26 Feb 2004 19:24:16 -0000
> @@ -18,12 +18,13 @@
>   * Netscape Communications Corporation.
>   * Portions created by the Initial Developer are Copyright (C) 1998
>   * the Initial Developer. All Rights Reserved.
>   *
>   * Contributor(s):
>   *   Pierre Phaneuf <pp@ludusdesign.com>
> + *   IBM Corporation

Email ?

[snip]
> Index: intl/uconv/ucvibm/nsUnicodeToCP1046.h
> ===================================================================
> RCS file: intl/uconv/ucvibm/nsUnicodeToCP1046.h
> diff -N intl/uconv/ucvibm/nsUnicodeToCP1046.h
> --- /dev/null   1 Jan 1970 00:00:00 -0000
> +++ intl/uconv/ucvibm/nsUnicodeToCP1046.h       26 Feb 2004 19:24:16 -0000
> @@ -0,0 +1,39 @@
> +/* -*- Mode: C++; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*-
> + *
> + * The contents of this file are subject to the Netscape Public
> + * License Version 1.1 (the "License"); you may not use this file
> + * except in compliance with the License. You may obtain a copy of
> + * the License at http://www.mozilla.org/NPL/
> + *
> + * Software distributed under the License is distributed on an "AS
> + * IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
> + * implied. See the License for the specific language governing
> + * rights and limitations under the License.
> + *
> + * The Original Code is Mozilla Communicator client code.
> + *
> + * The Initial Developer of the Original Code is Netscape Communications
> + * Corporation.  Portions created by Netscape are
> + * Copyright (C) 1998 Netscape Communications Corporation. All
> + * Rights Reserved.
> + *
> + * Contributor(s): 
> + * IBM Corporation
> + *
> + */
> +
> +#ifndef nsUnicodeToCP1046_h___
> +#define nsUnicodeToCP1046_h___
> +
> +#include "nsISupports.h"
> +
> +
> +/**
> + * A character set converter from Unicode to CP1046.
> + */
> +NS_METHOD
> +nsUnicodeToCP1046Constructor(nsISupports *aOuter, REFNSIID aIID,
> +                            void **aResult);
> +
> +
> +#endif /* nsUnicodeToCP1046_h___ */

Nit:
#endif /* !nsUnicodeToCP1046_h___ */ when |ifndef| or |if !defined(xxx)| was
used.

> Index: layout/base/src/nsPresContext.cpp
> ===================================================================
> RCS file: /cvsroot/mozilla/layout/base/src/nsPresContext.cpp,v
> retrieving revision 3.250
> diff -u -6 -r3.250 nsPresContext.cpp
> --- layout/base/src/nsPresContext.cpp   23 Feb 2004 21:29:04 -0000      3.250
> +++ layout/base/src/nsPresContext.cpp   26 Feb 2004 19:24:16 -0000
> @@ -17,12 +17,13 @@
>   * The Initial Developer of the Original Code is 
>   * Netscape Communications Corporation.
>   * Portions created by the Initial Developer are Copyright (C) 1998
>   * the Initial Developer. All Rights Reserved.
>   *
>   * Contributor(s):
> + *   IBM Corporation
>   *
>   * Alternatively, the contents of this file may be used under the terms of
>   * either the GNU General Public License Version 2 or later (the "GPL"), or 
>   * the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
>   * in which case the provisions of the GPL or the LGPL are applicable instead
>   * of those above. If you wish to allow use of your version of this file only
> @@ -1054,13 +1055,17 @@
>  #ifdef IBMBIDI
>  //ahmed
>  NS_IMETHODIMP
>  nsPresContext::IsArabicEncoding(PRBool& aResult) const
>  {
>    aResult=PR_FALSE;
> -  if ( (mCharset.EqualsIgnoreCase("ibm864") )||(mCharset.EqualsIgnoreCase("ibm864i") )||(mCharset.EqualsIgnoreCase("windows-1256") )||(mCharset.EqualsIgnoreCase("iso-8859-6") ))
> +  if (mCharset.EqualsIgnoreCase("ibm864")
> +      || mCharset.EqualsIgnoreCase("ibm864i")
> +      || mCharset.EqualsIgnoreCase("windows-1256")
> +      || mCharset.EqualsIgnoreCase("iso-8859-6")
> +      || mCharset.EqualsIgnoreCase("ibm-1046"))
>      aResult=PR_TRUE;
>    return NS_OK;
>  }

Small optimisation:
In theory you could assign |mCharset| to a temp. var, make all chars in that
temp. var lowercase and then you could use |Equals()| instead of
|EqualsIgnoreCase()|

>  NS_IMETHODIMP
>  nsPresContext::IsVisRTL(PRBool& aResult) const

Patch compiles and seems to work, r=roland.mainz@nrubsig.org
Attachment #142359 - Flags: review?(jshin) → review+
Attached patch Patch v3 (obsolete) — Splinter Review
Address Roland's comments.
Attachment #142359 - Attachment is obsolete: true
Attachment #142359 - Flags: superreview?(blizzard)
Attachment #142380 - Flags: superreview?(blizzard)
Attachment #142380 - Flags: review?(jshin)
Status: NEW → ASSIGNED
Comment on attachment 142380 [details] [diff] [review]
Patch v3


>+static nsFontCharSetInfo IBM1046 =
>+  { "IBM-1046", SingleByteConvert, 0,
>+      TT_OS2_CPR1_ARABIC, TT_OS2_CPR2_ARABIC | TT_OS2_CPR2_ARABIC_708 };

I've just checked the IANA charset registry [1] and found that IBM1046 hadn't
been registered. In that case, we shouldn't use 'IBM1046' (nor 'IBM-1046'). The
canonical name for this encoding should be 'x-IBM1046'. Better would be to
register IBM1046 with the IANA and then use 'IBM1046' (no '-') instead of
'x-IBM1046'. See 

http://www.iana.org/assignments/character-sets

I guess it'd not take long to register it. 

>+  { "ibm-1046",           &FLG_ARABIC,  &IBM1046       },

This is fine assuming that AIX comes with X11 fonts with 'ibm-1046' as the last
two fields of XLFD. 

>Index: gfx/src/xlib/nsFontMetricsXlib.cpp

>+static nsFontCharSetInfoXlib IBM1046 =
>+  { "IBM-1046", SingleByteConvert, 0,
>+      TT_OS2_CPR1_ARABIC, TT_OS2_CPR2_ARABIC | 

The same here. 'IBM-1046' should be replaced by 'x-IBM1046'


>Index: intl/uconv/src/charsetData.properties

> ibm866.LangGroup                   = x-cyrillic
>+ibm-1046.LangGroup                 = ar

Pls, use 'x-ibm1046' and add the following line :

x-ibm1046.notForOutgoing    = true 

unless it's necessary to save files and to send emails in IBM1046. 


>+ibm-1046.title = Arabic (IBM-1046)

   x-ibm1046.title = Arabic (IBM-1046)

>Index: intl/uconv/src/charsetalias.properties

> ibm864i=IBM864i
>+##
>+## Aliases for IBM-1046
>+##
>+ibm-1046=IBM-1046
>+cp1046=IBM-1046


>Index: intl/uconv/src/nsUConvModule.cpp


>+NS_UCONV_REG_UNREG("IBM-1046", NS_CP1046TOUNICODE_CID, NS_UNICODETOCP1046_CID)

>+    DECODER_NAME_BASE "IBM-1046" , NS_CP1046TOUNICODE_CID,
>+    NS_UNICODEDECODER_CONTRACTID_BASE "IBM-1046",
>+    nsCP1046ToUnicodeConstructor ,

s/IBM-1046/x-IBM1046/  : a few more in this file.


>Index: intl/uconv/src/unixcharset.properties

> # AIX
>-#locale.all.Ar_AA=IBM-1046
>+locale.all.Ar_AA=IBM-1046

ditto



>Index: intl/uconv/ucvibm/cp1046.ut

>+ * The contents of this file are subject to the Netscape Public License
>+ * Version 1.0 (the "NPL"); you may not use this file except in
>+ * compliance with the NPL.  You may obtain a copy of the NPL at

All new files should use GPL/MPL/LGPL instead of NPL. See 

http://www.mozilla.org/MPL/boilerplate-1.1/

>Index: intl/uconv/ucvibm/nsCP1046ToUnicode.cpp
>Index: intl/uconv/ucvibm/nsCP1046ToUnicode.h

  ditto.
> Pls, use 'x-ibm1046' and add the following line :

> x-ibm1046.notForOutgoing    = true 

> unless it's necessary to save files and to send emails in IBM1046.

In addition, if, as I suspect, we don't need to let users save files and send
emails in IBM1046, IBM1046 related-files (in intl/uconv/ucvlatin/Makefile.in)
and IBM1046-related lines (in nsUConvModules.cpp) have to be enclosed by '#ifdef
MOZ_EXTRA_X11CONVERTERS'  and '#endif'. 
I have a new patch which I will be posting shortly.

(In reply to comment #27)
> >+static nsFontCharSetInfo IBM1046 =
> >+  { "IBM-1046", SingleByteConvert, 0,
> >+      TT_OS2_CPR1_ARABIC, TT_OS2_CPR2_ARABIC | TT_OS2_CPR2_ARABIC_708 };
> 
> I've just checked the IANA charset registry [1] and found that IBM1046 hadn't
> been registered. In that case, we shouldn't use 'IBM1046' (nor 'IBM-1046'). The
> canonical name for this encoding should be 'x-IBM1046'. Better would be to
> register IBM1046 with the IANA and then use 'IBM1046' (no '-') instead of
> 'x-IBM1046'. See 
> 
> http://www.iana.org/assignments/character-sets
> 
> I guess it'd not take long to register it.

I have started the process internally to register IBM1046 with IANA. Once I have
more information to report on this, I will add it to this bug. For now, I will
change this line to be IBM1046 instead of IBM-1046.

> >+  { "ibm-1046",           &FLG_ARABIC,  &IBM1046       },
> 
> This is fine assuming that AIX comes with X11 fonts with 'ibm-1046' as the last 
> two fields of XLFD. 

It does:

pkw@wave:~/ > xlsfonts | grep 'ibm-1046$'
-applied arabic-boutros typing-bold-i-normal--0-0-96-96-m-0-ibm-1046
-applied arabic-boutros typing-bold-i-normal--11-80-96-96-m-60-ibm-1046
-applied arabic-boutros typing-bold-i-normal--13-100-96-96-m-80-ibm-1046
-applied arabic-boutros typing-bold-i-normal--16-120-96-96-m-100-ibm-1046
...

> >Index: gfx/src/xlib/nsFontMetricsXlib.cpp
> 
> >+static nsFontCharSetInfoXlib IBM1046 =
> >+  { "IBM-1046", SingleByteConvert, 0,
> >+      TT_OS2_CPR1_ARABIC, TT_OS2_CPR2_ARABIC | 
> 
> The same here. 'IBM-1046' should be replaced by 'x-IBM1046'

Changed to IBM1046.

> >Index: intl/uconv/src/charsetData.properties
> 
> > ibm866.LangGroup                   = x-cyrillic
> >+ibm-1046.LangGroup                 = ar
> 
> Pls, use 'x-ibm1046' and add the following line :
> 
> x-ibm1046.notForOutgoing    = true 
> 
> unless it's necessary to save files and to send emails in IBM1046.

While I don't think it is necessary to send emails in IBM1046, it is useful to
be able to save files in IBM1046. The Ar_AA locale on AIX saves all files in
this encoding, so for users of this locale this is a useful feature. I have
changed the line to ibm1046 instead of ibm-1046.

> >+ibm-1046.title = Arabic (IBM-1046)
> 
>    x-ibm1046.title = Arabic (IBM-1046)

Changed to ibm1046 instead of ibm-1046.

> >Index: intl/uconv/src/charsetalias.properties
> 
> > ibm864i=IBM864i
> >+##
> >+## Aliases for IBM-1046
> >+##
> >+ibm-1046=IBM-1046
> >+cp1046=IBM-1046

Changed IBM-1046 to IBM1046. Changed existing ibm-1046 alias to ibm1046.

> >Index: intl/uconv/src/nsUConvModule.cpp
> 
> 
> >+NS_UCONV_REG_UNREG("IBM-1046", NS_CP1046TOUNICODE_CID, NS_UNICODETOCP1046_CID)
> 
> >+    DECODER_NAME_BASE "IBM-1046" , NS_CP1046TOUNICODE_CID,
> >+    NS_UNICODEDECODER_CONTRACTID_BASE "IBM-1046",
> >+    nsCP1046ToUnicodeConstructor ,
> 
> s/IBM-1046/x-IBM1046/  : a few more in this file.

Changed to IBM1046.

> >Index: intl/uconv/src/unixcharset.properties
> 
> > # AIX
> >-#locale.all.Ar_AA=IBM-1046
> >+locale.all.Ar_AA=IBM-1046
> 
> ditto

Changed to IBM1046.

> >Index: intl/uconv/ucvibm/cp1046.ut
> 
> >+ * The contents of this file are subject to the Netscape Public License
> >+ * Version 1.0 (the "NPL"); you may not use this file except in
> >+ * compliance with the NPL.  You may obtain a copy of the NPL at
> 
> All new files should use GPL/MPL/LGPL instead of NPL. See 
> 
> http://www.mozilla.org/MPL/boilerplate-1.1/

Done. These license statements are auto-generated from the umaptable command -
should the license in umaptable.c be modified to the new tri-license?

> >Index: intl/uconv/ucvibm/nsCP1046ToUnicode.cpp
> >Index: intl/uconv/ucvibm/nsCP1046ToUnicode.h
> 
>   ditto.

Done.
Attachment #142380 - Attachment is obsolete: true
Attachment #142380 - Flags: superreview?(blizzard)
Attachment #142380 - Flags: review?(jshin)
Attached patch Patch v4 (obsolete) — Splinter Review
- Updated to correct license on all new code.
- Changed IBM-1046 to IBM1046 throughout.
- Added alias ibm-1046 in charsetalias.properties.
- Added IBM1046 to toolkit/xpfe navigator.properties so it shows up properly
under Middle Eastern section of View->Character Coding.
Attachment #142758 - Flags: review?(jshin)
Attachment #142758 - Attachment is obsolete: true
Attachment #142758 - Flags: review?(jshin)
Comment on attachment 142758 [details] [diff] [review]
Patch v4

r=gisburn
Attachment #142758 - Flags: superreview?(roc)
Attachment #142758 - Flags: review+
Attachment #142758 - Flags: review+ → review?(jshin)
Sorry for not being more clear, but I spoke with the IBM team responsible for 
registering character sets, and they would prefer to not register IBM1046 (and 
its associated character set with the euro character, IBM9238). I am preparing 
a patch which uses the x- prefix. This is why the last patch was marked 
obsolete.
Attachment #142758 - Flags: superreview?(roc)
Attachment #142758 - Flags: review?(jshin)
Attached patch Patch v5Splinter Review
This is a new minimal approach to this bug in the hope that we can get this
checked in before Mozilla 1.7b. Major changes include:

- We are now using the x- prefix on IBM1046. I have contacted the team which
does IANA registrations within IBM, and they are hesitant to register IBM1046
at this time. 
- Marked the IBM1046 character set as "notForBrowser" and "notForOutgoing",
assuming that if those are ever needed in the future it is easy enough to
enable them.
- Enclosed the support for the IBM1046 character set within the
MOZ_EXTRA_X11CONVERTERS define, so they are only enabled on X11 platforms.
- Left out patches to the nsPresContext.cpp and nsObjectFrame.cpp until we can
find a valid page on the external Internet which uses the IBM-1046 character
set.
Attachment #142919 - Flags: superreview?(roc)
Attachment #142919 - Flags: review?(jshin)
> Marked the IBM1046 character set as "notForBrowser" and "notForOutgoing",
> assuming that if those are ever needed in the future it is easy enough to
> enable them.

I thought IBM1046 IS needed _now_ because on AIX (ar_AA locale), the default
character encoding is IBM1046. If you flag it as 'notForBrowser', you'd not be
able to select it in View | Character Encoding menu. With it flagged as
'notForOutgoing', you wouldn't be able to save a file in IBM1046. Is it what you
intended (did you change your mind about IBM1046) ?

> Enclosed the support for the IBM1046 character set within the
> MOZ_EXTRA_X11CONVERTERS define, so they are only enabled on X11 platforms.

I'm sorry I was wrong that you could enclose it with MOZ_EXTRA_X11CONVERTERS.
You can do that _only if_ IBM1046 is flagged as 'notForBrowser' and
'notForOutgoing'. It is the case in your patch, but as I wrote above, I'm afraid
that's not what you intended (originally). 
Comment on attachment 142919 [details] [diff] [review]
Patch v5

Looks OK, but my superreview can't add much to this kind of patch.
Attachment #142919 - Flags: superreview?(roc) → superreview+
(In reply to comment #34)
> I thought IBM1046 IS needed _now_ because on AIX (ar_AA locale), the default
> character encoding is IBM1046. If you flag it as 'notForBrowser', you'd not 
be
> able to select it in View | Character Encoding menu. With it flagged as
> 'notForOutgoing', you wouldn't be able to save a file in IBM1046. Is it what 
you
> intended (did you change your mind about IBM1046) ?

We changed our mind on IBM-1046. It works fine with Arabic input, fonts, and 
file names as the patch is currently written. Since we were unable to come up 
with a single website in the IBM1046 encoding, I think it is fine to mark it 
as 'notForBrowser'. In addition, we don't *really* need to save pages in the 
IBM1046 character set, and anyone who really needs this can just use AIX's 
iconv to convert individual pages to IBM1046. The primary motivation for this 
bug was to get support for the IBM1046 fonts.

> I'm sorry I was wrong that you could enclose it with MOZ_EXTRA_X11CONVERTERS.
> You can do that _only if_ IBM1046 is flagged as 'notForBrowser' and
> 'notForOutgoing'. It is the case in your patch, but as I wrote above, I'm 
afraid
> that's not what you intended (originally). 

You are correct in that this was not the original intent, but we are happy 
with marking it as 'notForBrowser' and 'notForOutgoing'. If we ever find a 
page in the future which needs IBM1046 support in the browser, it is easy 
enough to enable it at that time.
Comment on attachment 142919 [details] [diff] [review]
Patch v5

r=jshin
Just one change is necessary. Because it's marked as
notForBrowser/notForOutgoing, there's no point of adding it to
charsetTitle.properties. Please, get rid of it for the now.

I was about to suggest that we shouldn't add IBM1046 decoder (if we just need
it as an X11 font encoding), but I realize that intl/locale (e.g. datetime
code) depends on it. 

BTW, what do you get from nl_langinfo(CODESET)	under ar_AA locale on AIX?
Attachment #142919 - Flags: review?(jshin) → review+
> I was about to suggest that we shouldn't add IBM1046 decoder (if we just need
> it as an X11 font encoding), but I realize that intl/locale (e.g. datetime
> code) depends on it. 
We still need the decoder for Arabic input under Ar_AA locale as well.

> BTW, what do you get from nl_langinfo(CODESET)	under ar_AA locale on 
AIX?
it returns IBM-1046 under Ar_AA locale
it returns ISO8859-6 under ar_AA locale
Fix checked in (without change to charsetTitles.properties).

Checking in gfx/src/gtk/nsFontMetricsGTK.cpp;
/cvsroot/mozilla/gfx/src/gtk/nsFontMetricsGTK.cpp,v  <--  nsFontMetricsGTK.cpp
new revision: 1.268; previous revision: 1.267
done
Checking in gfx/src/xlib/nsFontMetricsXlib.cpp;
/cvsroot/mozilla/gfx/src/xlib/nsFontMetricsXlib.cpp,v  <--  nsFontMetricsXlib.cpp
new revision: 1.165; previous revision: 1.164
done
Checking in intl/uconv/src/charsetData.properties;
/cvsroot/mozilla/intl/uconv/src/charsetData.properties,v  <-- 
charsetData.properties
new revision: 1.39; previous revision: 1.38
done
Checking in intl/uconv/src/nsUConvModule.cpp;
/cvsroot/mozilla/intl/uconv/src/nsUConvModule.cpp,v  <--  nsUConvModule.cpp
new revision: 1.53; previous revision: 1.52
done
Checking in intl/uconv/src/unixcharset.properties;
/cvsroot/mozilla/intl/uconv/src/unixcharset.properties,v  <-- 
unixcharset.properties
new revision: 1.24; previous revision: 1.23
done
Checking in intl/uconv/ucvibm/Makefile.in;
/cvsroot/mozilla/intl/uconv/ucvibm/Makefile.in,v  <--  Makefile.in
new revision: 1.19; previous revision: 1.18
done
RCS file: /cvsroot/mozilla/intl/uconv/ucvibm/cp1046.uf,v
done
Checking in intl/uconv/ucvibm/cp1046.uf;
/cvsroot/mozilla/intl/uconv/ucvibm/cp1046.uf,v  <--  cp1046.uf
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/intl/uconv/ucvibm/cp1046.ut,v
done
Checking in intl/uconv/ucvibm/cp1046.ut;
/cvsroot/mozilla/intl/uconv/ucvibm/cp1046.ut,v  <--  cp1046.ut
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/intl/uconv/ucvibm/nsCP1046ToUnicode.cpp,v
done
Checking in intl/uconv/ucvibm/nsCP1046ToUnicode.cpp;
/cvsroot/mozilla/intl/uconv/ucvibm/nsCP1046ToUnicode.cpp,v  <-- 
nsCP1046ToUnicode.cpp
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/intl/uconv/ucvibm/nsCP1046ToUnicode.h,v
done
Checking in intl/uconv/ucvibm/nsCP1046ToUnicode.h;
/cvsroot/mozilla/intl/uconv/ucvibm/nsCP1046ToUnicode.h,v  <--  nsCP1046ToUnicode.h
initial revision: 1.1
done
Checking in intl/uconv/ucvibm/nsUCvIBMCID.h;
/cvsroot/mozilla/intl/uconv/ucvibm/nsUCvIBMCID.h,v  <--  nsUCvIBMCID.h
new revision: 1.7; previous revision: 1.6
done
RCS file: /cvsroot/mozilla/intl/uconv/ucvibm/nsUnicodeToCP1046.cpp,v
done
Checking in intl/uconv/ucvibm/nsUnicodeToCP1046.cpp;
/cvsroot/mozilla/intl/uconv/ucvibm/nsUnicodeToCP1046.cpp,v  <-- 
nsUnicodeToCP1046.cpp
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/intl/uconv/ucvibm/nsUnicodeToCP1046.h,v
done
Checking in intl/uconv/ucvibm/nsUnicodeToCP1046.h;
/cvsroot/mozilla/intl/uconv/ucvibm/nsUnicodeToCP1046.h,v  <--  nsUnicodeToCP1046.h
initial revision: 1.1
done
Status: ASSIGNED → RESOLVED
Closed: 21 years ago20 years ago
Resolution: --- → FIXED
Component: Layout: BiDi Hebrew & Arabic → Layout: Text
QA Contact: zach → layout.fonts-and-text
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: