Closed Bug 97175 Opened 23 years ago Closed 23 years ago

startup perf- remove the need of loading of charsetalias.properties files at startup

Categories

(Core :: Internationalization, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla0.9.6

People

(Reporter: ftang, Assigned: jbetak)

References

Details

(Keywords: perf, Whiteboard: requested sr by email on 11/02/2001)

Attachments

(1 file, 10 obsolete files)

2.58 KB, patch
nhottanscp
: review+
kinmoz
: superreview+
Details | Diff | Splinter Review
we currently load charsetalias.properties at startup time, we should find a way 
to 
remove the need of loading it at startup time to speed up startup performance
we could 
1. build in common used pair in the cpp code, and/or
2. cache the resolved result in registry.
Blocks: 7251
Keywords: perf
Summary: startup perf- remove the need of loading of charsetalias.properties files at startup → startup perf- remove the need of loading of charsetalias.properties files at startup
per offline discussion with ftang - tentatively reassigning to myself
Assignee: ftang → jbetak
list of charset aliases requested during startup in sequential order (US trunk 
build):

windows-1252
x-mac-roman
us-ascii
x-u-escaped
ISO-8859-2
ISO-8859-3
ISO-8859-4
ISO-8859-5
ISO-8859-6
ISO-8859-6-I
ISO-8859-6-E
ISO-8859-7
ISO-8859-8
ISO-8859-8-I
ISO-8859-8-E
ISO-8859-9
ISO-8859-10
ISO-8859-13
ISO-8859-14
ISO-8859-15
ISO-IR-111
windows-1250
windows-1251
windows-1253
windows-1254
windows-1255
windows-1256
windows-1257
windows-1258
TIS-620
IBM866
KOI8-R
KOI8-U
x-mac-ce
x-mac-greek
x-mac-turkish
x-mac-croatian
x-mac-romanian
x-mac-cyrillic
x-mac-ukrainian
x-mac-icelandic
armscii-8
x-viet-tcvn5712
VISCII
x-viet-vps
UTF-7
x-imap4-modified-utf7
UTF-16BE
UTF-16LE
UTF-32BE
UTF-32LE
T.61-8bit
x-user-defined
x-mac-arabic
x-mac-devanagari
x-mac-farsi
x-mac-gurmukhi
x-mac-gujarati
x-mac-hebrew
IBM850
IBM852
IBM855
IBM857
IBM862
IBM864
IBM864i
Shift_JIS
EUC-JP
ISO-2022-JP
GB2312
windows-936
x-gbk
HZ-GB-2312
gb18030
Big5
Big5-HKSCS
x-euc-tw
EUC-KR
x-johab
x-windows-949
windows-1252
x-mac-roman
us-ascii
x-u-escaped
ISO-8859-2
ISO-8859-3
ISO-8859-4
ISO-8859-5
ISO-8859-6
ISO-8859-6-I
ISO-8859-6-E
ISO-8859-7
ISO-8859-8
ISO-8859-8-I
ISO-8859-8-E
ISO-8859-9
ISO-8859-10
ISO-8859-13
ISO-8859-14
ISO-8859-15
ISO-IR-111
windows-1250
windows-1251
windows-1253
windows-1254
windows-1255
windows-1256
windows-1257
windows-1258
TIS-620
IBM866
KOI8-R
KOI8-U
x-mac-ce
x-mac-greek
x-mac-turkish
x-mac-croatian
x-mac-romanian
x-mac-cyrillic
x-mac-ukrainian
x-mac-icelandic
armscii-8
x-viet-tcvn5712
VISCII
x-viet-vps
UTF-7
x-imap4-modified-utf7
UTF-16BE
UTF-16LE
UTF-16
UTF-32BE
UTF-32LE
T.61-8bit
x-user-defined
Adobe-Symbol-Encoding
x-zapf-dingbats
x-mac-arabic
x-mac-devanagari
x-mac-farsi
x-mac-gurmukhi
x-mac-gujarati
x-mac-hebrew
IBM850
IBM852
IBM855
IBM857
IBM862
IBM864
IBM864i
Shift_JIS
EUC-JP
ISO-2022-JP
jis_0201
jis_0208-1983
jis_0212-1990
GB2312
windows-936
x-gbk
x-gbk-noascii
HZ-GB-2312
gb_2312-80
gb18030.2000-0
gb18030.2000-1
gb18030
Big5
x-x-big5
Big5-HKSCS
hkscs-1
x-euc-tw
x-cns-11643-1
x-cns-11643-2
x-cns-11643-3
x-cns-11643-4
x-cns-11643-5
x-cns-11643-6
x-cns-11643-7
EUC-KR
ks_c_5601-1987
x-x11johab
x-johab
x-johab-noascii
x-windows-949
ISO-8859-15
armscii-8
ISO-8859-4
ISO-8859-14
ISO-8859-2
GB2312
Big5
KOI8-R
windows-1251
KOI8-U
ISO-8859-7
ISO-2022-JP
EUC-KR
ISO-8859-10
ISO-8859-3
TIS-620
ISO-8859-9
VISCII
iso-8859-15
ibm850
x-mac-roman
windows-1252
iso-8859-14
iso-8859-7
x-mac-greek
windows-1253
x-mac-icelandic
iso-8859-10
iso-8859-3
iso-8859-4
iso-8859-13
windows-1257
ibm852
iso-8859-2
x-mac-ce
windows-1250
x-mac-croatian
ibm855
iso-8859-5
iso-ir-111
koi8-r
x-mac-cyrillic
windows-1251
ibm866
koi8-u
x-mac-ukrainian
x-mac-romanian
gb2312
x-gbk
gb18030
hz-gb-2312
big5
big5-hkscs
x-euc-tw
euc-jp
iso-2022-jp
shift_jis
euc-kr
x-windows-949
x-johab
armscii-8
tis-620
ibm857
iso-8859-9
x-mac-turkish
windows-1254
x-viet-tcvn5712
viscii
x-viet-vps
windows-1258
x-mac-devanagari
x-mac-gujarati
x-mac-gurmukhi
iso-8859-6
windows-1256
ibm864
x-mac-arabic
x-mac-farsi
iso-8859-8-i
windows-1255
iso-8859-8
ibm862
x-mac-hebrew

I just spoke to ftang and we shouldn't be needing all of these during startup. 
I suspect that there is some enumeration going on somewhere and we should defer 
that to later. I'm still in the first stage of investigation, but crude 
preliminary data indicate that this might be a worthwhile effort... 
OK, I believe we are loading all these aliases because of the charset menu 
construction. I'm marking dependency to bug 64146.
Status: NEW → ASSIGNED
Depends on: 64146
Priority: -- → P3
Target Milestone: --- → mozilla0.9.5
confirming my last assumption:

removing the charset menu loader reduces the number of charset alias 
resolutions (in a US trunk build) to two:

UTF-8
ISO-8859-1

Looks like we'll have to address both issues together. I'll run the same check 
with a Japanese build as Frank suggested and enhance the list of hard-coded 
charset aliases. However, as it seems the delayed loading of the charset menu 
would help greatly in addressing this issue.
here is a list of non-canonical charset designators. We need to fix these in 
our internal prefs to avoid the need for alias resolution and subsequent 
loading of charsetalias.properties. Two patches coming up - one will defer 
charset alias resolution to debug builds only, the other will address the need 
for strictly canonical charset designators in our internal prefs.

Resolving charset alias: UTF-16 = UTF-16BE
Resolving charset alias: gb_2312-80 = GB2312
Resolving charset alias: x-x-big5 = Big5
Resolving charset alias: ks_c_5601-1987 = x-windows-949
Resolving charset alias: iso-8859-1 = ISO-8859-1
Resolving charset alias: iso-8859-1 = ISO-8859-1
Resolving charset alias: iso-8859-1 = ISO-8859-1
Resolving charset alias: iso-8859-1 = ISO-8859-1
Resolving charset alias: iso-8859-15 = ISO-8859-15
Resolving charset alias: ibm850 = IBM850
Resolving charset alias: iso-8859-14 = ISO-8859-14
Resolving charset alias: iso-8859-7 = ISO-8859-7
Resolving charset alias: iso-8859-10 = ISO-8859-10
Resolving charset alias: iso-8859-3 = ISO-8859-3
Resolving charset alias: iso-8859-4 = ISO-8859-4
Resolving charset alias: iso-8859-13 = ISO-8859-13
Resolving charset alias: windows-1257 = windows-1257
Resolving charset alias: ibm852 = IBM852
Resolving charset alias: iso-8859-2 = ISO-8859-2
Resolving charset alias: ibm855 = IBM855
Resolving charset alias: iso-8859-5 = ISO-8859-5
Resolving charset alias: iso-ir-111 = ISO-IR-111
Resolving charset alias: koi8-r = KOI8-R
Resolving charset alias: ibm866 = IBM866
Resolving charset alias: koi8-u = KOI8-U
Resolving charset alias: gb2312 = GB2312
Resolving charset alias: hz-gb-2312 = HZ-GB-2312
Resolving charset alias: big5 = Big5
Resolving charset alias: big5-hkscs = Big5-HKSCS
Resolving charset alias: euc-jp = EUC-JP
Resolving charset alias: iso-2022-jp = ISO-2022-JP
Resolving charset alias: shift_jis = Shift_JIS
Resolving charset alias: euc-kr = EUC-KR
Resolving charset alias: tis-620 = TIS-620
Resolving charset alias: ibm857 = IBM857
Resolving charset alias: iso-8859-9 = ISO-8859-9
Resolving charset alias: viscii = VISCII
Resolving charset alias: iso-8859-6 = ISO-8859-6
Resolving charset alias: ibm864 = IBM864
Resolving charset alias: iso-8859-8-i = ISO-8859-8-I
Resolving charset alias: iso-8859-8 = ISO-8859-8
Resolving charset alias: ibm862 = IBM862
cc'ing bobj and cathleen, first rough measurements show that this could save 
around 3% (0.25s) of startup time.
Review:
So I guess we are completely eliminating the charset alias ? In that case, can
we remove GetCharsetAtom() and GetCharsetAtom2() If not, atleast we can make
GetCharsetAtom() call GetCharsetAtomInternal()

r=dp
> So I guess we are completely eliminating the charset alias ?
No, we are forcing all "internal" charset designators to use the canonical
names, but we still need to support the aliases for external charsets
(e.g., MIME headers).

Nominating nsbranch for 0.9.4.
Keywords: nsbranch
Target Milestone: mozilla0.9.5 → mozilla0.9.4
dp,

thanks for looking at this! I guess bobj already addressed your question. I 
might only add that I too believe that we might be able to increase the number 
of instances using GetCharsetAtomInternal(). I felt I had to create a new 
method, since the old one is used for both - internal and external charset 
designator resolution. 

The need for internal chasert aliasing could be eliminated thoroughtout the 
whole application, so this could be just a first step. I'm not entirely sure, 
how much we could benefit from going all the way, I'd have to investigate some 
more.
there is still a small problem left to resolve. After changing 
navigator.properties, I'm still left with the following non-canonical charset 
designators. I'm investigating, where they are coming from and suspect the 
source could be individual charset decoders.

Resolving charset alias: UTF-16 = UTF-16BE
Resolving charset alias: gb_2312-80 = GB2312
Resolving charset alias: x-x-big5 = Big5
Resolving charset alias: ks_c_5601-1987 = x-windows-949
I just verified that the remaining four non-canonical references (UTF-16, 
gb_2312-80, x-x-big5 and ks_c_5601-1987) are not coming from our registered 
charset decoders and I'm 100% certain they are not coming from the charset menu 
either, which was at the center of my optimization effort here.

I'll give it a more careful look in the debugger and come up with a 
recommendation how to proceed from here. Please bear with me.
Attachment #48645 - Flags: review+
Attachment #48645 - Flags: needs-work+
Attachment #48655 - Attachment is obsolete: true
cc'ing some more i18n folks to solicit feedback on the proposed changes. 

As it turned out the encoding designators requiring aliasing came from four 
encoders: nsUnicodeToBIG5NoAscii, nsUnicodeToUTF16, nsUnicodeToKSC5601, 
nsUnicodeToGB2312GL. 

I'd like to change the _To argument in the NS_UCONV_REG_UNREG macro. It is my 
belief that it should cause no harm, since the 
NS_UNICODEDECODER_CONTRACTID_BASE and the DECODER_NAME_BASE remains unchanged. 

Another more aggressive change involves making GetCharsetAtom2() call 
GetCharsetAtomInternal(). Unlike GetCharsetAtom(), the "2" method is being used 
exclusively for internal purposes and we could easily avoid the penalty of 
encoding name aliasing:

http://lxr.mozilla.org/seamonkey/search?string=GetCharsetAtom2

I'm still concerned that we might run into trouble, when QT/GTK/Xlib refer to 
different encoding names requiring aliasing. I might whip up 
GetCharsetAtomInternal2() to make this approach less aggressive, but I'd rather 
not. I do believe there is some measurable benefit in avoid internal encoding 
aliasing throughout the whole application as dp pointed out and the current 
patch is a first in that direction.

Before we push for a branch checkin, I'd also like to make a binary drop for 
the QA to confirm the estimated startup time savings of 2.5-3%.
I'm afraid the following patch to nsUCvKoModule.cpp has some
problem.
--------------
RCS file: /cvsroot/mozilla/intl/uconv/ucvko/nsUCvKoModule.cpp,v
retrieving revision 1.17
diff -u -w -b -r1.17 nsUCvKoModule.cpp
--- nsUCvKoModule.cpp	2001/07/18 21:48:43	1.17
+++ nsUCvKoModule.cpp	2001/09/09 00:38:27
@@ -74,7 +74,7 @@
 
 NS_UCONV_REG_UNREG(nsEUCKRToUnicode, "EUC-KR", "Unicode" , NS_EUCKRTOUNICODE_CID);
 NS_UCONV_REG_UNREG(nsUnicodeToEUCKR, "Unicode", "EUC-KR",  NS_UNICODETOEUCKR_CID);
-NS_UCONV_REG_UNREG(nsUnicodeToKSC5601, "Unicode", "ks_c_5601-1987", 
NS_UNICODETOKSC5601_CID);
+NS_UCONV_REG_UNREG(nsUnicodeToKSC5601, "Unicode", "EUC-KR", 
NS_UNICODETOKSC5601_CID);
 NS_UCONV_REG_UNREG(nsUnicodeToX11Johab, "Unicode", "x-x11johab", 
NS_UNICODETOX11JOHAB_CID);
 NS_UCONV_REG_UNREG(nsJohabToUnicode, "x-johab", "Unicode" , NS_JOHABTOUNICODE_CID);
 NS_UCONV_REG_UNREG(nsUnicodeToJohab, "Unicode", "x-johab",  NS_UNICODETOJOHAB_CID);
----------------------------------------

 'ks_c_5601-1987' plays *two* roles in Mozilla and this is a source
of confusion. 

  One is an alias 
name for the canonical (MIME) charset name 'x-windows-949'. Microsoft
came up with this standard-violating MIME charset name (it's a long
story) and began to use it in their mail/news clients. As their
clients are so widespread that Mozilla has to be generous and
accept it as an alias to either EUC-KR or x-windows-949. (the
latter is MS's proprieatary extension of the former). For this role,
the canonical name 'x-windows-949' can be used internally in place
of 'ks_c_5601-1987'. However, to deal with Korean mail messages and
news articles with MIME charset name 'ks_c_5601-1987', this has
to be kept as an alias for x-windows-949 (the canonical name in Mozilla). 

  The other is an internal charset name within Mozilla for 
KS C 5601 GL encoding (used ONLY by X11 *fonts* for chararacters in KS C
5601-1987. That's why there's no nsKSC5601ToUnicode.cpp)
It is this role that 'ks_c_5601-1987' is playing in nsUnicodeToKSC5601.cpp
and this CANNOT and SHOULD NOT be regarded as identical to 'EUC-KR'.
Neither can/should it be regarded as identical to 'x-windows-949'. 
So, what has to be done? Unless there's some restriction
on what format to be used for 'charset' name(e.g.
only one '-' can be used or '.' cannot be used), we can just
come up with a *distinct* name to use for this second case (i.e. in
nsUnicodeToKSC5601 converter). I suggest we use
'ksc5601-1987-0' or 'ksc5601.1987-0' in intl/uconv/ucvko/nsUCvKoModule.cpp.
I prefer the latter because that's exactly the same as used in
the last two fields of XLFD of the fonts Unicode -> KSC5601 converter
deals with. In case '.' cannot be used, we should use 'ksc5601-1987-0'.
In case only one '-' can be used (which I don't think is the case),
what about 'ksc5601_1987-0'. In short, intl/uconv/ucvko/nsUCvKoModule.cpp
should have the following (assuming we can use '.' )

NS_UCONV_REG_UNREG(nsUnicodeToKSC5601, "Unicode", "ksc5601.1987-0", 
NS_UNICODETOKSC5601_CID);

in place of 

NS_UCONV_REG_UNREG(nsUnicodeToKSC5601, "Unicode", "ks_c_5601-1987", 
NS_UNICODETOKSC5601_CID);
  
  If we decide on what to use for designation of KSC5601 GL encoding
for X11 fonts, we also have to make changes in gfx/src/gtk/nsFontMetricGTK.cpp.
'ks_c_5601-1987' in the following part of the file has to be replaced
by what we decide to use for KS C 5601 GL encoding.

static nsFontCharSetInfo KSC5601 =
  { "ks_c_5601-1987", DoubleByteConvert,1); 

should be changed to

static nsFontCharSetInfo KSC5601 =
  { "ksc5601.1987-0", DoubleByteConvert,1);



  I suspect gb_2312-80 has a similar problem. 

One may wonder why JIS 0208 and JIS 0212 don't have
this problem. That's because for Japanese, there's been
no confusion between coded character set name (JIS X 0208,
JIS X 0212) on the one hand and character set encoding scheme
(EUC-JP, ISO-2022-JP). For Korean, Microsoft mixed all these up
(although EUC-KR had been very well established before MS shipped
their first internet mail clients)
and Mozilla, in its effort to be generous, made a little blunder
by failing to distinguish between ks_c_5601-1987 (the standard-violating
MIME charset name used by MS products) and ksc5601.1987-0 (the last two
fields of XLFD for Korean X11 fonts with KSC5601 GL encoding). 
The situation for simplified Chinese is even worse (GB2312 and
variants are used to mean at least two different things. It would
be nice if a much better name EUC-CN had caught on as MIME charset)


Exactly like ks_c_5601-1987 plays dual roles in Mozilla source,
x-x-big5 and gb_2312-80 play two roles as 'MIME charset' name
(non-canonical) and the font encoding name in X11. The same principle
can be applied to distinguish one from the other.
I'm attaching a patch to fix all three cases together.


In this patch, I use 'ksc5601.1987-0' for KS C 5601-1987 GL encoding,
'x-big5-noascii' for X11 Big5 font without US-ASCII part
(similar names have been used for X11 JOHAB fonts and X11 GBK
fonts), and 'gb2312.1980-0' for GB 2312-1980 GL encoding.   
I found that there's nothing to prevent '.' from being used
in Mozilla's internal charset name and '.' is used
for 'gb18030.2000-0'. Besides, I think using 'gb2312.1980-0'
and 'ksc5601.1987-0' as Mozilla's internal charset names
for Mozilla's encoders to deal with
the X11 fonts   with the same XLFD name fields   is a good idea.

  'ksc5601.1987-0', 'x-big5-noascii' and 'gb2312.1980-0' are 
used only by encoders for X11 fonts and they are never used
as MIME charset name (so they should not be in charsetalias.properties
file as commented there). Like jis_0208-1983 and jis_0212-1990
don't cause any alias resolution problem, these three names
wouldn't cause any alias resolution problem.  

I forgot to look into QT and Xlib part of gfx. I'm going to
attach a new patch that take care of QT and Xlib as well as gtk.
Attachment #48747 - Attachment is obsolete: true
Attachment #48775 - Attachment is obsolete: true
I'm sorry I keep forgetting a thing or two. I believe the latest attachment
I've just added (timed 9:43 US PDT) has everything I want to change. BTW, most
of my changes can be
regarded as merely 'aesthetic' (not affecting the way Mozilla
runs but simply making the source code clearer). However, I think
it's better this way because it will prevent potential confusion
in the future. 
thanks for all your input and the hard work Jungshik! At a first glance the 
changes look good, I'll roll up a new hopefully all-encompassing patch 
including your changes.

mkaply - do we need to worry about nsCSIDtoCharsetName.cpp? What is your 
opinion on the proposed course of action?

I like this fix a lot. I was investigating some Mac encoding problems and was 
very confused to discover that aliases are required for all charsets even when 
the alias is the same as the original charset.

I don't think you need to mess with nCSIDtoCharsetName.cpp - this is really just 
a 4.x compatibility thing. Frank would have more info though.
marking nsbranch+ since it is a performance win
Keywords: nsbranchnsbranch+
I spent several hours working with Juraj on this yesterday.

It is not clear to me that we will actually save much startup time by not
loading this file.

Juraj put a high accuracy time around the code that loads this file and
while it takes about 250 milli-seconds to build the charset menu 
he reported it takes about 30 milli-seconds to load this file.
30 ms out of ? An easy way to time startup is

- create 'Default User'
- time ./mozilla -P 'Default User' file:///usr/tmp/quit.html
  quit.html: <html><body onLoad="window.close()"></body></html>

There are no big wins; we got to be doing a whole bunch of small wins to get a
cumulative performance improvement. Maybe small improvements need not be taken
in for the branch.
bstell, I've just learned from jrgm that the timer resolution on W2K is 16ms. 
This would explain why I was measuring either 16 or 32 ms gain using the built-
in timer method. It also explains some of the "noise" I was picking up when 
using the method described in bug 97712, which I continue to use in parallel.  
I will also apply dp's method on Linux. 

So far it looks like the lower boundary is at ~ 30 ms and the upper boundary at 
~ 250 ms. After completing the patch, we could perhaps apply it to the trunk 
and have the QA officially record the performance gain using their metrics. If 
deemed significant enough, we could take the patch into the branch.
I'd like to wait for ftang to have a look at this before proceeding any futher.
I saw some asserts with a Linux test build
(http://lxr.mozilla.org/seamonkey/source/gfx/src/gtk/nsFontMetricsGTK.cpp#3692),
but they seem to be unrelated to this change.
Whiteboard: have patch, need r/sr
> I'd like to wait for ftang to have a look at this before proceeding any futher.
> I saw some asserts with a Linux test build
> (http://lxr.mozilla.org/seamonkey/source/gfx/src/gtk/nsFontMetricsGTK.cpp#3692),
> but they seem to be unrelated to this change.

If it's unrelated, can you back out your change and still see the asserts?

>If it's unrelated, can you back out your change and still see the asserts?

Yes, I've tried on two separate build machines (thanks Brian!). Although, the 
asserts are quite irritating, they seem to unrelated. I just wanted to have it 
recorded here, to have the benefit of a wider audience. Although I'm not sure 
how to resolve those asserts at this point, I believe we could proceed with 
this patch a that's why I posted it.
Juraj: please set the environment variable NS_FONT_DEBUG to 4 and
find out what charset is complaining.

Dp: Building the charset menu seems to take around 250-300mS. Not reading
in the file (which is the stated plan of this bug) would appear to save 
about 30 mS which is not much of the 250+ mS.

0.9.4 is out the door.
Target Milestone: mozilla0.9.4 → mozilla0.9.5
Brian: 

I believe it's hpbig5, but there might be others. I'll have to rerun all of my 
converter test with the font debug turned on. I'll follow up on that soon.

I followed up with Chris McAfee in bug 97712 and it seems that the "official" 
startup metrics can achieve a 3% margin of error, which is roughly 3 times 
better than what I have been seeing. If the performance improvement is small, 
it could be lost in the "noise". I believe the 30 mS for the file access is 
correct, although the overall improvement could be better due to reduced number 
of hash table lookups, smaller bloat etc.
from the startup log:
###!!! ASSERTION: failed to get converter type: 'mDocConverterType!=nsnull', 
file nsFontMetricsGTK.cpp, line 3692
###!!! Break: at file nsFontMetricsGTK.cpp, line 3692
failed to get converter type for zh-TW, nsFontMetricsGTK.cpp 3693

The zh-TW charset is for traditional Chinese.
> The zh-TW charset

well, that would be a language group, wouldn't it? Am I right in assuming 
that "mDocConverterType = first_font->mCharSetInfo->Convert" failed?
taking the nsbranch+ off this.  it should bake on the trunk for awhile and then
only come to the branch if it's extreme low risk and high reward at this point..
let me know if it turns out that way and it happens in the next week or so.
Keywords: nsbranch+nsbranch-
OS: Windows 95 → All
Attachment #49623 - Attachment is obsolete: true
Attachment #49402 - Attachment is obsolete: true
Attachment #48776 - Attachment is obsolete: true
Attachment #48744 - Attachment is obsolete: true
Attachment #48721 - Attachment is obsolete: true
Attachment #48645 - Attachment is obsolete: true
ftang: 

would you be able to review tha most recent patch? The changes now affect the 
registry enumeration, which is in turn used only by the UI (charset menu, mail 
encoding menu etc.). This would greatly minimize any regression risk as 
discussed offline. I did some preliminary verification and it looks good. I'd 
like to finally close this and move on to "bigger fish" like bug 64146.
another comment: we really need to resolve both "x-sjis" and "shift_jis" for 
Japanese builds. I was quite surprised at this finding.

> another comment: we really need to resolve both "x-sjis" and "shift_jis" for
> Japanese builds. I was quite surprised at this finding.

Probably a bug.  Historically, when shift_jis was not registered with IANA we
created x-sjis.  We still need to alias x-sjis for old webpages, but we should not
use this internally. 

Cc'ing rchen who might know why the Japanese builds are using x-sjis.
Blocks: 103175
this one is more complex than I thought. jbetak said it is still ok for m0.9.6
Target Milestone: mozilla0.9.5 → mozilla0.9.6
Attachment #50337 - Attachment is obsolete: true
ftang - could you please review this again? 

I removed all attempts to optimize charset aliasing to reduce risk of 
regressions. When bug 64146 is checked in, there is no need to enumerate 180+ 
converters, which simplifies this problem quite a bit. By accident I noticed 
that we didn't report any error when a charset designator was empty - 
mozilla.org returns an empty HTTP header charset and we end up loading the 
string bundle to look it up - I found it quite amusing.

Also cc'ing blizzard to put this on his radar. It would be nice, if he could sr 
after ftang approves of this approach.
nhotta - would you feel comfortable with and have time to review this? I'd 
really appreciate it. Hopefully Chriss Blizzard will sr soon, so that I can get 
it checked in.
NS_TIMELINE_STOP_TIMER("nsCharsetAlias2:GetPreferred");
is that for permance profiling?
nhotta - thanks for looking at this! 

Yes, NS_TIMELINE_STOP_TIMER is for performance profiling, it only gets built in 
debug builds with the the enviroment variable MOZ_TIMELINE set. I was surprised 
to see that lxr is full of these macros - someone has been putting them all 
over :-)
So is this just adding checks for two cases ("shift_jis" and "x-sjis")?

 
Comment on attachment 53808 [details] [diff] [review]
final patch candidate - this assumes that bug 64146 will be resolved soon

r=nhotta
Attachment #53808 - Flags: review+
blizzard - 
could I pester you with one more sr? This one is on i18n performance (startup) 
tuning. I'd really appreciate it, as M096 is drawing near :-)
cc'ing waterson for potential sr
Blocks: 107067
Keywords: nsbranch-
cc'ing brendan for potential i18n sr
Whiteboard: have patch, need r/sr → needs sr
Whiteboard: needs sr → requested sr by email on 11/02/2001
Comment on attachment 53808 [details] [diff] [review]
final patch candidate - this assumes that bug 64146 will be resolved soon

sr=kin@netscape.com
Attachment #53808 - Flags: superreview+
I forgot to ask why the "x-sjis" and "shift_jis" cases can't be combined into a 
single if statement like ...

+  if (aKey.Equals(NS_LITERAL_STRING("x-sjis") ||
+      aKey.Equals(NS_LITERAL_STRING("shift_jis")))
Netcenter needs to be evangelized and x-sjis should be dropped from their 
Japanese site. The only reason I kept them separate was my vain hope to have 
them do it before I can check this patch in.

Kin, thanks so much for your review - I'll correct this and land the patch in a 
few minutes.
patch has just landed... Marking fixed - thanks everyone!
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Changed QA contact to ftang@netscape.com.
QA Contact: andreasb → ftang
Kat,
> Netcenter needs to be evangelized and x-sjis should be dropped from their 
> Japanese site. The only reason I kept them separate was my vain hope to have 
> them do it before I can check this patch in.

Can you help on this?
> Kat,
>> Netcenter needs to be evangelized and x-sjis should be dropped from their 
>> Japanese site. 

I guess it is finally time to mothball Netscape 2.x or earlier & IE3
completely.


>>The only reason I kept them separate was my vain hope to have 
>> them do it before I can check this patch in.

NetCenter isn't the only site using x-sjis, but it should now be safe
for other web sites to drop x-sjis also. As long as external charset handling
for Mozilla can deal with x-sjis, this should not be a problem.

>Can you help on this?

Sure.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: