Closed
Bug 97175
Opened 23 years ago
Closed 23 years ago
startup perf- remove the need of loading of charsetalias.properties files at startup
Categories
(Core :: Internationalization, defect, P3)
Core
Internationalization
Tracking
()
RESOLVED
FIXED
mozilla0.9.6
People
(Reporter: ftang, Assigned: jbetak)
References
Details
(Keywords: perf, Whiteboard: requested sr by email on 11/02/2001)
Attachments
(1 file, 10 obsolete files)
2.58 KB,
patch
|
nhottanscp
:
review+
kinmoz
:
superreview+
|
Details | Diff | Splinter Review |
we currently load charsetalias.properties at startup time, we should find a way
to
remove the need of loading it at startup time to speed up startup performance
we could
1. build in common used pair in the cpp code, and/or
2. cache the resolved result in registry.
Updated•23 years ago
|
Assignee | ||
Comment 1•23 years ago
|
||
per offline discussion with ftang - tentatively reassigning to myself
Assignee: ftang → jbetak
Assignee | ||
Comment 2•23 years ago
|
||
list of charset aliases requested during startup in sequential order (US trunk
build):
windows-1252
x-mac-roman
us-ascii
x-u-escaped
ISO-8859-2
ISO-8859-3
ISO-8859-4
ISO-8859-5
ISO-8859-6
ISO-8859-6-I
ISO-8859-6-E
ISO-8859-7
ISO-8859-8
ISO-8859-8-I
ISO-8859-8-E
ISO-8859-9
ISO-8859-10
ISO-8859-13
ISO-8859-14
ISO-8859-15
ISO-IR-111
windows-1250
windows-1251
windows-1253
windows-1254
windows-1255
windows-1256
windows-1257
windows-1258
TIS-620
IBM866
KOI8-R
KOI8-U
x-mac-ce
x-mac-greek
x-mac-turkish
x-mac-croatian
x-mac-romanian
x-mac-cyrillic
x-mac-ukrainian
x-mac-icelandic
armscii-8
x-viet-tcvn5712
VISCII
x-viet-vps
UTF-7
x-imap4-modified-utf7
UTF-16BE
UTF-16LE
UTF-32BE
UTF-32LE
T.61-8bit
x-user-defined
x-mac-arabic
x-mac-devanagari
x-mac-farsi
x-mac-gurmukhi
x-mac-gujarati
x-mac-hebrew
IBM850
IBM852
IBM855
IBM857
IBM862
IBM864
IBM864i
Shift_JIS
EUC-JP
ISO-2022-JP
GB2312
windows-936
x-gbk
HZ-GB-2312
gb18030
Big5
Big5-HKSCS
x-euc-tw
EUC-KR
x-johab
x-windows-949
windows-1252
x-mac-roman
us-ascii
x-u-escaped
ISO-8859-2
ISO-8859-3
ISO-8859-4
ISO-8859-5
ISO-8859-6
ISO-8859-6-I
ISO-8859-6-E
ISO-8859-7
ISO-8859-8
ISO-8859-8-I
ISO-8859-8-E
ISO-8859-9
ISO-8859-10
ISO-8859-13
ISO-8859-14
ISO-8859-15
ISO-IR-111
windows-1250
windows-1251
windows-1253
windows-1254
windows-1255
windows-1256
windows-1257
windows-1258
TIS-620
IBM866
KOI8-R
KOI8-U
x-mac-ce
x-mac-greek
x-mac-turkish
x-mac-croatian
x-mac-romanian
x-mac-cyrillic
x-mac-ukrainian
x-mac-icelandic
armscii-8
x-viet-tcvn5712
VISCII
x-viet-vps
UTF-7
x-imap4-modified-utf7
UTF-16BE
UTF-16LE
UTF-16
UTF-32BE
UTF-32LE
T.61-8bit
x-user-defined
Adobe-Symbol-Encoding
x-zapf-dingbats
x-mac-arabic
x-mac-devanagari
x-mac-farsi
x-mac-gurmukhi
x-mac-gujarati
x-mac-hebrew
IBM850
IBM852
IBM855
IBM857
IBM862
IBM864
IBM864i
Shift_JIS
EUC-JP
ISO-2022-JP
jis_0201
jis_0208-1983
jis_0212-1990
GB2312
windows-936
x-gbk
x-gbk-noascii
HZ-GB-2312
gb_2312-80
gb18030.2000-0
gb18030.2000-1
gb18030
Big5
x-x-big5
Big5-HKSCS
hkscs-1
x-euc-tw
x-cns-11643-1
x-cns-11643-2
x-cns-11643-3
x-cns-11643-4
x-cns-11643-5
x-cns-11643-6
x-cns-11643-7
EUC-KR
ks_c_5601-1987
x-x11johab
x-johab
x-johab-noascii
x-windows-949
ISO-8859-15
armscii-8
ISO-8859-4
ISO-8859-14
ISO-8859-2
GB2312
Big5
KOI8-R
windows-1251
KOI8-U
ISO-8859-7
ISO-2022-JP
EUC-KR
ISO-8859-10
ISO-8859-3
TIS-620
ISO-8859-9
VISCII
iso-8859-15
ibm850
x-mac-roman
windows-1252
iso-8859-14
iso-8859-7
x-mac-greek
windows-1253
x-mac-icelandic
iso-8859-10
iso-8859-3
iso-8859-4
iso-8859-13
windows-1257
ibm852
iso-8859-2
x-mac-ce
windows-1250
x-mac-croatian
ibm855
iso-8859-5
iso-ir-111
koi8-r
x-mac-cyrillic
windows-1251
ibm866
koi8-u
x-mac-ukrainian
x-mac-romanian
gb2312
x-gbk
gb18030
hz-gb-2312
big5
big5-hkscs
x-euc-tw
euc-jp
iso-2022-jp
shift_jis
euc-kr
x-windows-949
x-johab
armscii-8
tis-620
ibm857
iso-8859-9
x-mac-turkish
windows-1254
x-viet-tcvn5712
viscii
x-viet-vps
windows-1258
x-mac-devanagari
x-mac-gujarati
x-mac-gurmukhi
iso-8859-6
windows-1256
ibm864
x-mac-arabic
x-mac-farsi
iso-8859-8-i
windows-1255
iso-8859-8
ibm862
x-mac-hebrew
I just spoke to ftang and we shouldn't be needing all of these during startup.
I suspect that there is some enumeration going on somewhere and we should defer
that to later. I'm still in the first stage of investigation, but crude
preliminary data indicate that this might be a worthwhile effort...
Assignee | ||
Comment 3•23 years ago
|
||
OK, I believe we are loading all these aliases because of the charset menu
construction. I'm marking dependency to bug 64146.
Assignee | ||
Comment 4•23 years ago
|
||
confirming my last assumption:
removing the charset menu loader reduces the number of charset alias
resolutions (in a US trunk build) to two:
UTF-8
ISO-8859-1
Looks like we'll have to address both issues together. I'll run the same check
with a Japanese build as Frank suggested and enhance the list of hard-coded
charset aliases. However, as it seems the delayed loading of the charset menu
would help greatly in addressing this issue.
Assignee | ||
Comment 5•23 years ago
|
||
here is a list of non-canonical charset designators. We need to fix these in
our internal prefs to avoid the need for alias resolution and subsequent
loading of charsetalias.properties. Two patches coming up - one will defer
charset alias resolution to debug builds only, the other will address the need
for strictly canonical charset designators in our internal prefs.
Resolving charset alias: UTF-16 = UTF-16BE
Resolving charset alias: gb_2312-80 = GB2312
Resolving charset alias: x-x-big5 = Big5
Resolving charset alias: ks_c_5601-1987 = x-windows-949
Resolving charset alias: iso-8859-1 = ISO-8859-1
Resolving charset alias: iso-8859-1 = ISO-8859-1
Resolving charset alias: iso-8859-1 = ISO-8859-1
Resolving charset alias: iso-8859-1 = ISO-8859-1
Resolving charset alias: iso-8859-15 = ISO-8859-15
Resolving charset alias: ibm850 = IBM850
Resolving charset alias: iso-8859-14 = ISO-8859-14
Resolving charset alias: iso-8859-7 = ISO-8859-7
Resolving charset alias: iso-8859-10 = ISO-8859-10
Resolving charset alias: iso-8859-3 = ISO-8859-3
Resolving charset alias: iso-8859-4 = ISO-8859-4
Resolving charset alias: iso-8859-13 = ISO-8859-13
Resolving charset alias: windows-1257 = windows-1257
Resolving charset alias: ibm852 = IBM852
Resolving charset alias: iso-8859-2 = ISO-8859-2
Resolving charset alias: ibm855 = IBM855
Resolving charset alias: iso-8859-5 = ISO-8859-5
Resolving charset alias: iso-ir-111 = ISO-IR-111
Resolving charset alias: koi8-r = KOI8-R
Resolving charset alias: ibm866 = IBM866
Resolving charset alias: koi8-u = KOI8-U
Resolving charset alias: gb2312 = GB2312
Resolving charset alias: hz-gb-2312 = HZ-GB-2312
Resolving charset alias: big5 = Big5
Resolving charset alias: big5-hkscs = Big5-HKSCS
Resolving charset alias: euc-jp = EUC-JP
Resolving charset alias: iso-2022-jp = ISO-2022-JP
Resolving charset alias: shift_jis = Shift_JIS
Resolving charset alias: euc-kr = EUC-KR
Resolving charset alias: tis-620 = TIS-620
Resolving charset alias: ibm857 = IBM857
Resolving charset alias: iso-8859-9 = ISO-8859-9
Resolving charset alias: viscii = VISCII
Resolving charset alias: iso-8859-6 = ISO-8859-6
Resolving charset alias: ibm864 = IBM864
Resolving charset alias: iso-8859-8-i = ISO-8859-8-I
Resolving charset alias: iso-8859-8 = ISO-8859-8
Resolving charset alias: ibm862 = IBM862
Assignee | ||
Comment 6•23 years ago
|
||
Assignee | ||
Comment 7•23 years ago
|
||
cc'ing bobj and cathleen, first rough measurements show that this could save
around 3% (0.25s) of startup time.
Comment 8•23 years ago
|
||
Review:
So I guess we are completely eliminating the charset alias ? In that case, can
we remove GetCharsetAtom() and GetCharsetAtom2() If not, atleast we can make
GetCharsetAtom() call GetCharsetAtomInternal()
r=dp
> So I guess we are completely eliminating the charset alias ?
No, we are forcing all "internal" charset designators to use the canonical
names, but we still need to support the aliases for external charsets
(e.g., MIME headers).
Nominating nsbranch for 0.9.4.
Keywords: nsbranch
Target Milestone: mozilla0.9.5 → mozilla0.9.4
Assignee | ||
Comment 10•23 years ago
|
||
Assignee | ||
Comment 11•23 years ago
|
||
dp,
thanks for looking at this! I guess bobj already addressed your question. I
might only add that I too believe that we might be able to increase the number
of instances using GetCharsetAtomInternal(). I felt I had to create a new
method, since the old one is used for both - internal and external charset
designator resolution.
The need for internal chasert aliasing could be eliminated thoroughtout the
whole application, so this could be just a first step. I'm not entirely sure,
how much we could benefit from going all the way, I'd have to investigate some
more.
Assignee | ||
Comment 12•23 years ago
|
||
there is still a small problem left to resolve. After changing
navigator.properties, I'm still left with the following non-canonical charset
designators. I'm investigating, where they are coming from and suspect the
source could be individual charset decoders.
Resolving charset alias: UTF-16 = UTF-16BE
Resolving charset alias: gb_2312-80 = GB2312
Resolving charset alias: x-x-big5 = Big5
Resolving charset alias: ks_c_5601-1987 = x-windows-949
Assignee | ||
Comment 13•23 years ago
|
||
I just verified that the remaining four non-canonical references (UTF-16,
gb_2312-80, x-x-big5 and ks_c_5601-1987) are not coming from our registered
charset decoders and I'm 100% certain they are not coming from the charset menu
either, which was at the center of my optimization effort here.
I'll give it a more careful look in the debugger and come up with a
recommendation how to proceed from here. Please bear with me.
Assignee | ||
Updated•23 years ago
|
Attachment #48645 -
Flags: review+
Attachment #48645 -
Flags: needs-work+
Assignee | ||
Updated•23 years ago
|
Attachment #48655 -
Attachment is obsolete: true
Assignee | ||
Comment 14•23 years ago
|
||
Assignee | ||
Comment 15•23 years ago
|
||
cc'ing some more i18n folks to solicit feedback on the proposed changes.
As it turned out the encoding designators requiring aliasing came from four
encoders: nsUnicodeToBIG5NoAscii, nsUnicodeToUTF16, nsUnicodeToKSC5601,
nsUnicodeToGB2312GL.
I'd like to change the _To argument in the NS_UCONV_REG_UNREG macro. It is my
belief that it should cause no harm, since the
NS_UNICODEDECODER_CONTRACTID_BASE and the DECODER_NAME_BASE remains unchanged.
Another more aggressive change involves making GetCharsetAtom2() call
GetCharsetAtomInternal(). Unlike GetCharsetAtom(), the "2" method is being used
exclusively for internal purposes and we could easily avoid the penalty of
encoding name aliasing:
http://lxr.mozilla.org/seamonkey/search?string=GetCharsetAtom2
I'm still concerned that we might run into trouble, when QT/GTK/Xlib refer to
different encoding names requiring aliasing. I might whip up
GetCharsetAtomInternal2() to make this approach less aggressive, but I'd rather
not. I do believe there is some measurable benefit in avoid internal encoding
aliasing throughout the whole application as dp pointed out and the current
patch is a first in that direction.
Before we push for a branch checkin, I'd also like to make a binary drop for
the QA to confirm the estimated startup time savings of 2.5-3%.
Comment 16•23 years ago
|
||
I'm afraid the following patch to nsUCvKoModule.cpp has some
problem.
--------------
RCS file: /cvsroot/mozilla/intl/uconv/ucvko/nsUCvKoModule.cpp,v
retrieving revision 1.17
diff -u -w -b -r1.17 nsUCvKoModule.cpp
--- nsUCvKoModule.cpp 2001/07/18 21:48:43 1.17
+++ nsUCvKoModule.cpp 2001/09/09 00:38:27
@@ -74,7 +74,7 @@
NS_UCONV_REG_UNREG(nsEUCKRToUnicode, "EUC-KR", "Unicode" , NS_EUCKRTOUNICODE_CID);
NS_UCONV_REG_UNREG(nsUnicodeToEUCKR, "Unicode", "EUC-KR", NS_UNICODETOEUCKR_CID);
-NS_UCONV_REG_UNREG(nsUnicodeToKSC5601, "Unicode", "ks_c_5601-1987",
NS_UNICODETOKSC5601_CID);
+NS_UCONV_REG_UNREG(nsUnicodeToKSC5601, "Unicode", "EUC-KR",
NS_UNICODETOKSC5601_CID);
NS_UCONV_REG_UNREG(nsUnicodeToX11Johab, "Unicode", "x-x11johab",
NS_UNICODETOX11JOHAB_CID);
NS_UCONV_REG_UNREG(nsJohabToUnicode, "x-johab", "Unicode" , NS_JOHABTOUNICODE_CID);
NS_UCONV_REG_UNREG(nsUnicodeToJohab, "Unicode", "x-johab", NS_UNICODETOJOHAB_CID);
----------------------------------------
'ks_c_5601-1987' plays *two* roles in Mozilla and this is a source
of confusion.
One is an alias
name for the canonical (MIME) charset name 'x-windows-949'. Microsoft
came up with this standard-violating MIME charset name (it's a long
story) and began to use it in their mail/news clients. As their
clients are so widespread that Mozilla has to be generous and
accept it as an alias to either EUC-KR or x-windows-949. (the
latter is MS's proprieatary extension of the former). For this role,
the canonical name 'x-windows-949' can be used internally in place
of 'ks_c_5601-1987'. However, to deal with Korean mail messages and
news articles with MIME charset name 'ks_c_5601-1987', this has
to be kept as an alias for x-windows-949 (the canonical name in Mozilla).
The other is an internal charset name within Mozilla for
KS C 5601 GL encoding (used ONLY by X11 *fonts* for chararacters in KS C
5601-1987. That's why there's no nsKSC5601ToUnicode.cpp)
It is this role that 'ks_c_5601-1987' is playing in nsUnicodeToKSC5601.cpp
and this CANNOT and SHOULD NOT be regarded as identical to 'EUC-KR'.
Neither can/should it be regarded as identical to 'x-windows-949'.
So, what has to be done? Unless there's some restriction
on what format to be used for 'charset' name(e.g.
only one '-' can be used or '.' cannot be used), we can just
come up with a *distinct* name to use for this second case (i.e. in
nsUnicodeToKSC5601 converter). I suggest we use
'ksc5601-1987-0' or 'ksc5601.1987-0' in intl/uconv/ucvko/nsUCvKoModule.cpp.
I prefer the latter because that's exactly the same as used in
the last two fields of XLFD of the fonts Unicode -> KSC5601 converter
deals with. In case '.' cannot be used, we should use 'ksc5601-1987-0'.
In case only one '-' can be used (which I don't think is the case),
what about 'ksc5601_1987-0'. In short, intl/uconv/ucvko/nsUCvKoModule.cpp
should have the following (assuming we can use '.' )
NS_UCONV_REG_UNREG(nsUnicodeToKSC5601, "Unicode", "ksc5601.1987-0",
NS_UNICODETOKSC5601_CID);
in place of
NS_UCONV_REG_UNREG(nsUnicodeToKSC5601, "Unicode", "ks_c_5601-1987",
NS_UNICODETOKSC5601_CID);
If we decide on what to use for designation of KSC5601 GL encoding
for X11 fonts, we also have to make changes in gfx/src/gtk/nsFontMetricGTK.cpp.
'ks_c_5601-1987' in the following part of the file has to be replaced
by what we decide to use for KS C 5601 GL encoding.
static nsFontCharSetInfo KSC5601 =
{ "ks_c_5601-1987", DoubleByteConvert,1);
should be changed to
static nsFontCharSetInfo KSC5601 =
{ "ksc5601.1987-0", DoubleByteConvert,1);
I suspect gb_2312-80 has a similar problem.
One may wonder why JIS 0208 and JIS 0212 don't have
this problem. That's because for Japanese, there's been
no confusion between coded character set name (JIS X 0208,
JIS X 0212) on the one hand and character set encoding scheme
(EUC-JP, ISO-2022-JP). For Korean, Microsoft mixed all these up
(although EUC-KR had been very well established before MS shipped
their first internet mail clients)
and Mozilla, in its effort to be generous, made a little blunder
by failing to distinguish between ks_c_5601-1987 (the standard-violating
MIME charset name used by MS products) and ksc5601.1987-0 (the last two
fields of XLFD for Korean X11 fonts with KSC5601 GL encoding).
The situation for simplified Chinese is even worse (GB2312 and
variants are used to mean at least two different things. It would
be nice if a much better name EUC-CN had caught on as MIME charset)
Comment 17•23 years ago
|
||
Comment 18•23 years ago
|
||
Exactly like ks_c_5601-1987 plays dual roles in Mozilla source,
x-x-big5 and gb_2312-80 play two roles as 'MIME charset' name
(non-canonical) and the font encoding name in X11. The same principle
can be applied to distinguish one from the other.
I'm attaching a patch to fix all three cases together.
In this patch, I use 'ksc5601.1987-0' for KS C 5601-1987 GL encoding,
'x-big5-noascii' for X11 Big5 font without US-ASCII part
(similar names have been used for X11 JOHAB fonts and X11 GBK
fonts), and 'gb2312.1980-0' for GB 2312-1980 GL encoding.
I found that there's nothing to prevent '.' from being used
in Mozilla's internal charset name and '.' is used
for 'gb18030.2000-0'. Besides, I think using 'gb2312.1980-0'
and 'ksc5601.1987-0' as Mozilla's internal charset names
for Mozilla's encoders to deal with
the X11 fonts with the same XLFD name fields is a good idea.
'ksc5601.1987-0', 'x-big5-noascii' and 'gb2312.1980-0' are
used only by encoders for X11 fonts and they are never used
as MIME charset name (so they should not be in charsetalias.properties
file as commented there). Like jis_0208-1983 and jis_0212-1990
don't cause any alias resolution problem, these three names
wouldn't cause any alias resolution problem.
Comment 19•23 years ago
|
||
Comment 20•23 years ago
|
||
I forgot to look into QT and Xlib part of gfx. I'm going to
attach a new patch that take care of QT and Xlib as well as gtk.
Comment 21•23 years ago
|
||
Updated•23 years ago
|
Attachment #48747 -
Attachment is obsolete: true
Updated•23 years ago
|
Attachment #48775 -
Attachment is obsolete: true
Comment 22•23 years ago
|
||
Comment 23•23 years ago
|
||
I'm sorry I keep forgetting a thing or two. I believe the latest attachment
I've just added (timed 9:43 US PDT) has everything I want to change. BTW, most
of my changes can be
regarded as merely 'aesthetic' (not affecting the way Mozilla
runs but simply making the source code clearer). However, I think
it's better this way because it will prevent potential confusion
in the future.
Assignee | ||
Comment 24•23 years ago
|
||
thanks for all your input and the hard work Jungshik! At a first glance the
changes look good, I'll roll up a new hopefully all-encompassing patch
including your changes.
mkaply - do we need to worry about nsCSIDtoCharsetName.cpp? What is your
opinion on the proposed course of action?
Comment 25•23 years ago
|
||
I like this fix a lot. I was investigating some Mac encoding problems and was
very confused to discover that aliases are required for all charsets even when
the alias is the same as the original charset.
I don't think you need to mess with nCSIDtoCharsetName.cpp - this is really just
a 4.x compatibility thing. Frank would have more info though.
Comment 26•23 years ago
|
||
marking nsbranch+ since it is a performance win
Comment 27•23 years ago
|
||
I spent several hours working with Juraj on this yesterday.
It is not clear to me that we will actually save much startup time by not
loading this file.
Juraj put a high accuracy time around the code that loads this file and
while it takes about 250 milli-seconds to build the charset menu
he reported it takes about 30 milli-seconds to load this file.
Comment 28•23 years ago
|
||
30 ms out of ? An easy way to time startup is
- create 'Default User'
- time ./mozilla -P 'Default User' file:///usr/tmp/quit.html
quit.html: <html><body onLoad="window.close()"></body></html>
There are no big wins; we got to be doing a whole bunch of small wins to get a
cumulative performance improvement. Maybe small improvements need not be taken
in for the branch.
Assignee | ||
Comment 29•23 years ago
|
||
bstell, I've just learned from jrgm that the timer resolution on W2K is 16ms.
This would explain why I was measuring either 16 or 32 ms gain using the built-
in timer method. It also explains some of the "noise" I was picking up when
using the method described in bug 97712, which I continue to use in parallel.
I will also apply dp's method on Linux.
So far it looks like the lower boundary is at ~ 30 ms and the upper boundary at
~ 250 ms. After completing the patch, we could perhaps apply it to the trunk
and have the QA officially record the performance gain using their metrics. If
deemed significant enough, we could take the patch into the branch.
Assignee | ||
Comment 30•23 years ago
|
||
Assignee | ||
Comment 31•23 years ago
|
||
I'd like to wait for ftang to have a look at this before proceeding any futher.
I saw some asserts with a Linux test build
(http://lxr.mozilla.org/seamonkey/source/gfx/src/gtk/nsFontMetricsGTK.cpp#3692),
but they seem to be unrelated to this change.
Whiteboard: have patch, need r/sr
Comment 32•23 years ago
|
||
> I'd like to wait for ftang to have a look at this before proceeding any futher.
> I saw some asserts with a Linux test build
> (http://lxr.mozilla.org/seamonkey/source/gfx/src/gtk/nsFontMetricsGTK.cpp#3692),
> but they seem to be unrelated to this change.
If it's unrelated, can you back out your change and still see the asserts?
Assignee | ||
Comment 33•23 years ago
|
||
>If it's unrelated, can you back out your change and still see the asserts?
Yes, I've tried on two separate build machines (thanks Brian!). Although, the
asserts are quite irritating, they seem to unrelated. I just wanted to have it
recorded here, to have the benefit of a wider audience. Although I'm not sure
how to resolve those asserts at this point, I believe we could proceed with
this patch a that's why I posted it.
Comment 34•23 years ago
|
||
Juraj: please set the environment variable NS_FONT_DEBUG to 4 and
find out what charset is complaining.
Dp: Building the charset menu seems to take around 250-300mS. Not reading
in the file (which is the stated plan of this bug) would appear to save
about 30 mS which is not much of the 250+ mS.
Assignee | ||
Comment 36•23 years ago
|
||
Brian:
I believe it's hpbig5, but there might be others. I'll have to rerun all of my
converter test with the font debug turned on. I'll follow up on that soon.
I followed up with Chris McAfee in bug 97712 and it seems that the "official"
startup metrics can achieve a 3% margin of error, which is roughly 3 times
better than what I have been seeing. If the performance improvement is small,
it could be lost in the "noise". I believe the 30 mS for the file access is
correct, although the overall improvement could be better due to reduced number
of hash table lookups, smaller bloat etc.
Assignee | ||
Comment 37•23 years ago
|
||
Comment 38•23 years ago
|
||
from the startup log:
###!!! ASSERTION: failed to get converter type: 'mDocConverterType!=nsnull',
file nsFontMetricsGTK.cpp, line 3692
###!!! Break: at file nsFontMetricsGTK.cpp, line 3692
failed to get converter type for zh-TW, nsFontMetricsGTK.cpp 3693
The zh-TW charset is for traditional Chinese.
Assignee | ||
Comment 39•23 years ago
|
||
> The zh-TW charset
well, that would be a language group, wouldn't it? Am I right in assuming
that "mDocConverterType = first_font->mCharSetInfo->Convert" failed?
Comment 40•23 years ago
|
||
taking the nsbranch+ off this. it should bake on the trunk for awhile and then
only come to the branch if it's extreme low risk and high reward at this point..
let me know if it turns out that way and it happens in the next week or so.
Updated•23 years ago
|
OS: Windows 95 → All
Assignee | ||
Comment 41•23 years ago
|
||
Assignee | ||
Updated•23 years ago
|
Attachment #49623 -
Attachment is obsolete: true
Assignee | ||
Updated•23 years ago
|
Attachment #49402 -
Attachment is obsolete: true
Assignee | ||
Updated•23 years ago
|
Attachment #48776 -
Attachment is obsolete: true
Assignee | ||
Updated•23 years ago
|
Attachment #48744 -
Attachment is obsolete: true
Assignee | ||
Updated•23 years ago
|
Attachment #48721 -
Attachment is obsolete: true
Assignee | ||
Updated•23 years ago
|
Attachment #48645 -
Attachment is obsolete: true
Assignee | ||
Comment 42•23 years ago
|
||
ftang:
would you be able to review tha most recent patch? The changes now affect the
registry enumeration, which is in turn used only by the UI (charset menu, mail
encoding menu etc.). This would greatly minimize any regression risk as
discussed offline. I did some preliminary verification and it looks good. I'd
like to finally close this and move on to "bigger fish" like bug 64146.
Assignee | ||
Comment 43•23 years ago
|
||
another comment: we really need to resolve both "x-sjis" and "shift_jis" for
Japanese builds. I was quite surprised at this finding.
Comment 44•23 years ago
|
||
> another comment: we really need to resolve both "x-sjis" and "shift_jis" for
> Japanese builds. I was quite surprised at this finding.
Probably a bug. Historically, when shift_jis was not registered with IANA we
created x-sjis. We still need to alias x-sjis for old webpages, but we should not
use this internally.
Cc'ing rchen who might know why the Japanese builds are using x-sjis.
Reporter | ||
Comment 45•23 years ago
|
||
this one is more complex than I thought. jbetak said it is still ok for m0.9.6
Target Milestone: mozilla0.9.5 → mozilla0.9.6
Assignee | ||
Comment 46•23 years ago
|
||
Assignee | ||
Updated•23 years ago
|
Attachment #50337 -
Attachment is obsolete: true
Assignee | ||
Comment 47•23 years ago
|
||
ftang - could you please review this again?
I removed all attempts to optimize charset aliasing to reduce risk of
regressions. When bug 64146 is checked in, there is no need to enumerate 180+
converters, which simplifies this problem quite a bit. By accident I noticed
that we didn't report any error when a charset designator was empty -
mozilla.org returns an empty HTTP header charset and we end up loading the
string bundle to look it up - I found it quite amusing.
Also cc'ing blizzard to put this on his radar. It would be nice, if he could sr
after ftang approves of this approach.
Assignee | ||
Comment 48•23 years ago
|
||
nhotta - would you feel comfortable with and have time to review this? I'd
really appreciate it. Hopefully Chriss Blizzard will sr soon, so that I can get
it checked in.
Comment 49•23 years ago
|
||
NS_TIMELINE_STOP_TIMER("nsCharsetAlias2:GetPreferred");
is that for permance profiling?
Assignee | ||
Comment 50•23 years ago
|
||
nhotta - thanks for looking at this!
Yes, NS_TIMELINE_STOP_TIMER is for performance profiling, it only gets built in
debug builds with the the enviroment variable MOZ_TIMELINE set. I was surprised
to see that lxr is full of these macros - someone has been putting them all
over :-)
Comment 51•23 years ago
|
||
So is this just adding checks for two cases ("shift_jis" and "x-sjis")?
Comment 52•23 years ago
|
||
Comment on attachment 53808 [details] [diff] [review]
final patch candidate - this assumes that bug 64146 will be resolved soon
r=nhotta
Attachment #53808 -
Flags: review+
Assignee | ||
Comment 53•23 years ago
|
||
blizzard -
could I pester you with one more sr? This one is on i18n performance (startup)
tuning. I'd really appreciate it, as M096 is drawing near :-)
Assignee | ||
Comment 54•23 years ago
|
||
cc'ing waterson for potential sr
Assignee | ||
Comment 55•23 years ago
|
||
cc'ing brendan for potential i18n sr
Assignee | ||
Updated•23 years ago
|
Whiteboard: have patch, need r/sr → needs sr
Assignee | ||
Updated•23 years ago
|
Whiteboard: needs sr → requested sr by email on 11/02/2001
Comment 56•23 years ago
|
||
Comment on attachment 53808 [details] [diff] [review]
final patch candidate - this assumes that bug 64146 will be resolved soon
sr=kin@netscape.com
Attachment #53808 -
Flags: superreview+
Comment 57•23 years ago
|
||
I forgot to ask why the "x-sjis" and "shift_jis" cases can't be combined into a
single if statement like ...
+ if (aKey.Equals(NS_LITERAL_STRING("x-sjis") ||
+ aKey.Equals(NS_LITERAL_STRING("shift_jis")))
Assignee | ||
Comment 58•23 years ago
|
||
Netcenter needs to be evangelized and x-sjis should be dropped from their
Japanese site. The only reason I kept them separate was my vain hope to have
them do it before I can check this patch in.
Kin, thanks so much for your review - I'll correct this and land the patch in a
few minutes.
Assignee | ||
Comment 59•23 years ago
|
||
patch has just landed... Marking fixed - thanks everyone!
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Comment 61•23 years ago
|
||
Kat,
> Netcenter needs to be evangelized and x-sjis should be dropped from their
> Japanese site. The only reason I kept them separate was my vain hope to have
> them do it before I can check this patch in.
Can you help on this?
Comment 62•23 years ago
|
||
> Kat,
>> Netcenter needs to be evangelized and x-sjis should be dropped from their
>> Japanese site.
I guess it is finally time to mothball Netscape 2.x or earlier & IE3
completely.
>>The only reason I kept them separate was my vain hope to have
>> them do it before I can check this patch in.
NetCenter isn't the only site using x-sjis, but it should now be safe
for other web sites to drop x-sjis also. As long as external charset handling
for Mozilla can deal with x-sjis, this should not be a problem.
>Can you help on this?
Sure.
You need to log in
before you can comment on or make changes to this bug.
Description
•