Closed Bug 97175 Opened 23 years ago Closed 23 years ago

startup perf- remove the need of loading of charsetalias.properties files at startup

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla0.9.6

People

(Reporter: ftang, Assigned: jbetak)

References

Details

(Keywords: perf, Whiteboard: requested sr by email on 11/02/2001)

Attachments

(1 file, 10 obsolete files)

first part of the patch - avoid charset aliasing when using internal charset designators 23 years ago jbetak@netscape.com (away - not reading bugmail) 3.66 KB, patch	jbetak : review+	Details \| Diff \| Splinter Review
the promised 2nd part of the patch - convert internal charset designators to their canonical form 23 years ago jbetak@netscape.com (away - not reading bugmail) 2.73 KB, patch		Details \| Diff \| Splinter Review
more complete, albeit also more aggressive patch 23 years ago jbetak@netscape.com (away - not reading bugmail) 11.23 KB, patch		Details \| Diff \| Splinter Review
a patch to assign two diff. names to two different roles of KS C 5601 23 years ago Jungshik Shin 1.91 KB, patch		Details \| Diff \| Splinter Review
cleaning up internal charset names for X11 fonts for CN/TW/KO 23 years ago Jungshik Shin 4.52 KB, patch		Details \| Diff \| Splinter Review
cleaning up internal charset name for CN/TW/KO fonts in gtk/Xlib/QT 23 years ago Jungshik Shin 7.38 KB, patch		Details \| Diff \| Splinter Review
cleaning up internal charset name for CN/TW/KO fonts in gtk/Xlib/QT 23 years ago Jungshik Shin 8.38 KB, patch		Details \| Diff \| Splinter Review
updated patch - thanks for all your input! 23 years ago jbetak@netscape.com (away - not reading bugmail) 16.81 KB, patch		Details \| Diff \| Splinter Review
excerpt from the console output with NS_FONT_DEBUG=4 23 years ago jbetak@netscape.com (away - not reading bugmail) 12.17 KB, text/plain		Details
less aggressive patch as requested by ftang 23 years ago jbetak@netscape.com (away - not reading bugmail) 9.18 KB, patch		Details \| Diff \| Splinter Review
final patch candidate - this assumes that bug 64146 will be resolved soon 23 years ago jbetak@netscape.com (away - not reading bugmail) 2.58 KB, patch	nhottanscp : review+ kinmoz : superreview+	Details \| Diff \| Splinter Review

Frank Tang

Reporter

Description

•

23 years ago

we currently load charsetalias.properties at startup time, we should find a way to remove the need of loading it at startup time to speed up startup performance we could 1. build in common used pair in the cpp code, and/or 2. cache the resolved result in registry.

Suresh Duddi (gone)

Updated

•

23 years ago

Blocks: 7251

Keywords: perf

Summary: startup perf- remove the need of loading of charsetalias.properties files at startup → startup perf- remove the need of loading of charsetalias.properties files at startup

jbetak@netscape.com (away - not reading bugmail)

Assignee

Comment 1

•

23 years ago

per offline discussion with ftang - tentatively reassigning to myself

Assignee: ftang → jbetak

jbetak@netscape.com (away - not reading bugmail)

Assignee

Comment 2

•

23 years ago

list of charset aliases requested during startup in sequential order (US trunk build): windows-1252 x-mac-roman us-ascii x-u-escaped ISO-8859-2 ISO-8859-3 ISO-8859-4 ISO-8859-5 ISO-8859-6 ISO-8859-6-I ISO-8859-6-E ISO-8859-7 ISO-8859-8 ISO-8859-8-I ISO-8859-8-E ISO-8859-9 ISO-8859-10 ISO-8859-13 ISO-8859-14 ISO-8859-15 ISO-IR-111 windows-1250 windows-1251 windows-1253 windows-1254 windows-1255 windows-1256 windows-1257 windows-1258 TIS-620 IBM866 KOI8-R KOI8-U x-mac-ce x-mac-greek x-mac-turkish x-mac-croatian x-mac-romanian x-mac-cyrillic x-mac-ukrainian x-mac-icelandic armscii-8 x-viet-tcvn5712 VISCII x-viet-vps UTF-7 x-imap4-modified-utf7 UTF-16BE UTF-16LE UTF-32BE UTF-32LE T.61-8bit x-user-defined x-mac-arabic x-mac-devanagari x-mac-farsi x-mac-gurmukhi x-mac-gujarati x-mac-hebrew IBM850 IBM852 IBM855 IBM857 IBM862 IBM864 IBM864i Shift_JIS EUC-JP ISO-2022-JP GB2312 windows-936 x-gbk HZ-GB-2312 gb18030 Big5 Big5-HKSCS x-euc-tw EUC-KR x-johab x-windows-949 windows-1252 x-mac-roman us-ascii x-u-escaped ISO-8859-2 ISO-8859-3 ISO-8859-4 ISO-8859-5 ISO-8859-6 ISO-8859-6-I ISO-8859-6-E ISO-8859-7 ISO-8859-8 ISO-8859-8-I ISO-8859-8-E ISO-8859-9 ISO-8859-10 ISO-8859-13 ISO-8859-14 ISO-8859-15 ISO-IR-111 windows-1250 windows-1251 windows-1253 windows-1254 windows-1255 windows-1256 windows-1257 windows-1258 TIS-620 IBM866 KOI8-R KOI8-U x-mac-ce x-mac-greek x-mac-turkish x-mac-croatian x-mac-romanian x-mac-cyrillic x-mac-ukrainian x-mac-icelandic armscii-8 x-viet-tcvn5712 VISCII x-viet-vps UTF-7 x-imap4-modified-utf7 UTF-16BE UTF-16LE UTF-16 UTF-32BE UTF-32LE T.61-8bit x-user-defined Adobe-Symbol-Encoding x-zapf-dingbats x-mac-arabic x-mac-devanagari x-mac-farsi x-mac-gurmukhi x-mac-gujarati x-mac-hebrew IBM850 IBM852 IBM855 IBM857 IBM862 IBM864 IBM864i Shift_JIS EUC-JP ISO-2022-JP jis_0201 jis_0208-1983 jis_0212-1990 GB2312 windows-936 x-gbk x-gbk-noascii HZ-GB-2312 gb_2312-80 gb18030.2000-0 gb18030.2000-1 gb18030 Big5 x-x-big5 Big5-HKSCS hkscs-1 x-euc-tw x-cns-11643-1 x-cns-11643-2 x-cns-11643-3 x-cns-11643-4 x-cns-11643-5 x-cns-11643-6 x-cns-11643-7 EUC-KR ks_c_5601-1987 x-x11johab x-johab x-johab-noascii x-windows-949 ISO-8859-15 armscii-8 ISO-8859-4 ISO-8859-14 ISO-8859-2 GB2312 Big5 KOI8-R windows-1251 KOI8-U ISO-8859-7 ISO-2022-JP EUC-KR ISO-8859-10 ISO-8859-3 TIS-620 ISO-8859-9 VISCII iso-8859-15 ibm850 x-mac-roman windows-1252 iso-8859-14 iso-8859-7 x-mac-greek windows-1253 x-mac-icelandic iso-8859-10 iso-8859-3 iso-8859-4 iso-8859-13 windows-1257 ibm852 iso-8859-2 x-mac-ce windows-1250 x-mac-croatian ibm855 iso-8859-5 iso-ir-111 koi8-r x-mac-cyrillic windows-1251 ibm866 koi8-u x-mac-ukrainian x-mac-romanian gb2312 x-gbk gb18030 hz-gb-2312 big5 big5-hkscs x-euc-tw euc-jp iso-2022-jp shift_jis euc-kr x-windows-949 x-johab armscii-8 tis-620 ibm857 iso-8859-9 x-mac-turkish windows-1254 x-viet-tcvn5712 viscii x-viet-vps windows-1258 x-mac-devanagari x-mac-gujarati x-mac-gurmukhi iso-8859-6 windows-1256 ibm864 x-mac-arabic x-mac-farsi iso-8859-8-i windows-1255 iso-8859-8 ibm862 x-mac-hebrew I just spoke to ftang and we shouldn't be needing all of these during startup. I suspect that there is some enumeration going on somewhere and we should defer that to later. I'm still in the first stage of investigation, but crude preliminary data indicate that this might be a worthwhile effort...

jbetak@netscape.com (away - not reading bugmail)

Assignee

Comment 3

•

23 years ago

OK, I believe we are loading all these aliases because of the charset menu construction. I'm marking dependency to bug 64146.

Status: NEW → ASSIGNED

Depends on: 64146

Priority: -- → P3

Target Milestone: --- → mozilla0.9.5

jbetak@netscape.com (away - not reading bugmail)

Assignee

Comment 4

•

23 years ago

confirming my last assumption: removing the charset menu loader reduces the number of charset alias resolutions (in a US trunk build) to two: UTF-8 ISO-8859-1 Looks like we'll have to address both issues together. I'll run the same check with a Japanese build as Frank suggested and enhance the list of hard-coded charset aliases. However, as it seems the delayed loading of the charset menu would help greatly in addressing this issue.

jbetak@netscape.com (away - not reading bugmail)

Assignee

Comment 5

•

23 years ago

here is a list of non-canonical charset designators. We need to fix these in our internal prefs to avoid the need for alias resolution and subsequent loading of charsetalias.properties. Two patches coming up - one will defer charset alias resolution to debug builds only, the other will address the need for strictly canonical charset designators in our internal prefs. Resolving charset alias: UTF-16 = UTF-16BE Resolving charset alias: gb_2312-80 = GB2312 Resolving charset alias: x-x-big5 = Big5 Resolving charset alias: ks_c_5601-1987 = x-windows-949 Resolving charset alias: iso-8859-1 = ISO-8859-1 Resolving charset alias: iso-8859-1 = ISO-8859-1 Resolving charset alias: iso-8859-1 = ISO-8859-1 Resolving charset alias: iso-8859-1 = ISO-8859-1 Resolving charset alias: iso-8859-15 = ISO-8859-15 Resolving charset alias: ibm850 = IBM850 Resolving charset alias: iso-8859-14 = ISO-8859-14 Resolving charset alias: iso-8859-7 = ISO-8859-7 Resolving charset alias: iso-8859-10 = ISO-8859-10 Resolving charset alias: iso-8859-3 = ISO-8859-3 Resolving charset alias: iso-8859-4 = ISO-8859-4 Resolving charset alias: iso-8859-13 = ISO-8859-13 Resolving charset alias: windows-1257 = windows-1257 Resolving charset alias: ibm852 = IBM852 Resolving charset alias: iso-8859-2 = ISO-8859-2 Resolving charset alias: ibm855 = IBM855 Resolving charset alias: iso-8859-5 = ISO-8859-5 Resolving charset alias: iso-ir-111 = ISO-IR-111 Resolving charset alias: koi8-r = KOI8-R Resolving charset alias: ibm866 = IBM866 Resolving charset alias: koi8-u = KOI8-U Resolving charset alias: gb2312 = GB2312 Resolving charset alias: hz-gb-2312 = HZ-GB-2312 Resolving charset alias: big5 = Big5 Resolving charset alias: big5-hkscs = Big5-HKSCS Resolving charset alias: euc-jp = EUC-JP Resolving charset alias: iso-2022-jp = ISO-2022-JP Resolving charset alias: shift_jis = Shift_JIS Resolving charset alias: euc-kr = EUC-KR Resolving charset alias: tis-620 = TIS-620 Resolving charset alias: ibm857 = IBM857 Resolving charset alias: iso-8859-9 = ISO-8859-9 Resolving charset alias: viscii = VISCII Resolving charset alias: iso-8859-6 = ISO-8859-6 Resolving charset alias: ibm864 = IBM864 Resolving charset alias: iso-8859-8-i = ISO-8859-8-I Resolving charset alias: iso-8859-8 = ISO-8859-8 Resolving charset alias: ibm862 = IBM862

jbetak@netscape.com (away - not reading bugmail)

Assignee

Comment 6

•

23 years ago

Attached patch first part of the patch - avoid charset aliasing when using internal charset designators (obsolete) — Details — Splinter Review

jbetak@netscape.com (away - not reading bugmail)

Assignee

Comment 7

•

23 years ago

cc'ing bobj and cathleen, first rough measurements show that this could save around 3% (0.25s) of startup time.

Suresh Duddi (gone)

Comment 8

•

23 years ago

Review: So I guess we are completely eliminating the charset alias ? In that case, can we remove GetCharsetAtom() and GetCharsetAtom2() If not, atleast we can make GetCharsetAtom() call GetCharsetAtomInternal() r=dp

bobj

Comment 9

•

23 years ago

> So I guess we are completely eliminating the charset alias ? No, we are forcing all "internal" charset designators to use the canonical names, but we still need to support the aliases for external charsets (e.g., MIME headers). Nominating nsbranch for 0.9.4.

Keywords: nsbranch

Target Milestone: mozilla0.9.5 → mozilla0.9.4

jbetak@netscape.com (away - not reading bugmail)

Assignee

Comment 10

•

23 years ago

Attached patch the promised 2nd part of the patch - convert internal charset designators to their canonical form (obsolete) — Details — Splinter Review

jbetak@netscape.com (away - not reading bugmail)

Assignee

Comment 11

•

23 years ago

dp, thanks for looking at this! I guess bobj already addressed your question. I might only add that I too believe that we might be able to increase the number of instances using GetCharsetAtomInternal(). I felt I had to create a new method, since the old one is used for both - internal and external charset designator resolution. The need for internal chasert aliasing could be eliminated thoroughtout the whole application, so this could be just a first step. I'm not entirely sure, how much we could benefit from going all the way, I'd have to investigate some more.

jbetak@netscape.com (away - not reading bugmail)

Assignee

Comment 12

•

23 years ago

there is still a small problem left to resolve. After changing navigator.properties, I'm still left with the following non-canonical charset designators. I'm investigating, where they are coming from and suspect the source could be individual charset decoders. Resolving charset alias: UTF-16 = UTF-16BE Resolving charset alias: gb_2312-80 = GB2312 Resolving charset alias: x-x-big5 = Big5 Resolving charset alias: ks_c_5601-1987 = x-windows-949

jbetak@netscape.com (away - not reading bugmail)

Assignee

Comment 13

•

23 years ago

I just verified that the remaining four non-canonical references (UTF-16, gb_2312-80, x-x-big5 and ks_c_5601-1987) are not coming from our registered charset decoders and I'm 100% certain they are not coming from the charset menu either, which was at the center of my optimization effort here. I'll give it a more careful look in the debugger and come up with a recommendation how to proceed from here. Please bear with me.

jbetak@netscape.com (away - not reading bugmail)

Assignee

Updated

•

23 years ago

Attachment #48645 - Flags: review+

Attachment #48645 - Flags: needs-work+

jbetak@netscape.com (away - not reading bugmail)

Assignee

Updated

•

23 years ago

Attachment #48655 - Attachment is obsolete: true

jbetak@netscape.com (away - not reading bugmail)

Assignee

Comment 14

•

23 years ago

Attached patch more complete, albeit also more aggressive patch (obsolete) — Details — Splinter Review

jbetak@netscape.com (away - not reading bugmail)

Assignee

Comment 15

•

23 years ago

cc'ing some more i18n folks to solicit feedback on the proposed changes. As it turned out the encoding designators requiring aliasing came from four encoders: nsUnicodeToBIG5NoAscii, nsUnicodeToUTF16, nsUnicodeToKSC5601, nsUnicodeToGB2312GL. I'd like to change the _To argument in the NS_UCONV_REG_UNREG macro. It is my belief that it should cause no harm, since the NS_UNICODEDECODER_CONTRACTID_BASE and the DECODER_NAME_BASE remains unchanged. Another more aggressive change involves making GetCharsetAtom2() call GetCharsetAtomInternal(). Unlike GetCharsetAtom(), the "2" method is being used exclusively for internal purposes and we could easily avoid the penalty of encoding name aliasing: http://lxr.mozilla.org/seamonkey/search?string=GetCharsetAtom2 I'm still concerned that we might run into trouble, when QT/GTK/Xlib refer to different encoding names requiring aliasing. I might whip up GetCharsetAtomInternal2() to make this approach less aggressive, but I'd rather not. I do believe there is some measurable benefit in avoid internal encoding aliasing throughout the whole application as dp pointed out and the current patch is a first in that direction. Before we push for a branch checkin, I'd also like to make a binary drop for the QA to confirm the estimated startup time savings of 2.5-3%.

Jungshik Shin

Comment 16

•

23 years ago

I'm afraid the following patch to nsUCvKoModule.cpp has some problem. -------------- RCS file: /cvsroot/mozilla/intl/uconv/ucvko/nsUCvKoModule.cpp,v retrieving revision 1.17 diff -u -w -b -r1.17 nsUCvKoModule.cpp --- nsUCvKoModule.cpp 2001/07/18 21:48:43 1.17 +++ nsUCvKoModule.cpp 2001/09/09 00:38:27 @@ -74,7 +74,7 @@ NS_UCONV_REG_UNREG(nsEUCKRToUnicode, "EUC-KR", "Unicode" , NS_EUCKRTOUNICODE_CID); NS_UCONV_REG_UNREG(nsUnicodeToEUCKR, "Unicode", "EUC-KR", NS_UNICODETOEUCKR_CID); -NS_UCONV_REG_UNREG(nsUnicodeToKSC5601, "Unicode", "ks_c_5601-1987", NS_UNICODETOKSC5601_CID); +NS_UCONV_REG_UNREG(nsUnicodeToKSC5601, "Unicode", "EUC-KR", NS_UNICODETOKSC5601_CID); NS_UCONV_REG_UNREG(nsUnicodeToX11Johab, "Unicode", "x-x11johab", NS_UNICODETOX11JOHAB_CID); NS_UCONV_REG_UNREG(nsJohabToUnicode, "x-johab", "Unicode" , NS_JOHABTOUNICODE_CID); NS_UCONV_REG_UNREG(nsUnicodeToJohab, "Unicode", "x-johab", NS_UNICODETOJOHAB_CID); ---------------------------------------- 'ks_c_5601-1987' plays *two* roles in Mozilla and this is a source of confusion. One is an alias name for the canonical (MIME) charset name 'x-windows-949'. Microsoft came up with this standard-violating MIME charset name (it's a long story) and began to use it in their mail/news clients. As their clients are so widespread that Mozilla has to be generous and accept it as an alias to either EUC-KR or x-windows-949. (the latter is MS's proprieatary extension of the former). For this role, the canonical name 'x-windows-949' can be used internally in place of 'ks_c_5601-1987'. However, to deal with Korean mail messages and news articles with MIME charset name 'ks_c_5601-1987', this has to be kept as an alias for x-windows-949 (the canonical name in Mozilla). The other is an internal charset name within Mozilla for KS C 5601 GL encoding (used ONLY by X11 *fonts* for chararacters in KS C 5601-1987. That's why there's no nsKSC5601ToUnicode.cpp) It is this role that 'ks_c_5601-1987' is playing in nsUnicodeToKSC5601.cpp and this CANNOT and SHOULD NOT be regarded as identical to 'EUC-KR'. Neither can/should it be regarded as identical to 'x-windows-949'. So, what has to be done? Unless there's some restriction on what format to be used for 'charset' name(e.g. only one '-' can be used or '.' cannot be used), we can just come up with a *distinct* name to use for this second case (i.e. in nsUnicodeToKSC5601 converter). I suggest we use 'ksc5601-1987-0' or 'ksc5601.1987-0' in intl/uconv/ucvko/nsUCvKoModule.cpp. I prefer the latter because that's exactly the same as used in the last two fields of XLFD of the fonts Unicode -> KSC5601 converter deals with. In case '.' cannot be used, we should use 'ksc5601-1987-0'. In case only one '-' can be used (which I don't think is the case), what about 'ksc5601_1987-0'. In short, intl/uconv/ucvko/nsUCvKoModule.cpp should have the following (assuming we can use '.' ) NS_UCONV_REG_UNREG(nsUnicodeToKSC5601, "Unicode", "ksc5601.1987-0", NS_UNICODETOKSC5601_CID); in place of NS_UCONV_REG_UNREG(nsUnicodeToKSC5601, "Unicode", "ks_c_5601-1987", NS_UNICODETOKSC5601_CID); If we decide on what to use for designation of KSC5601 GL encoding for X11 fonts, we also have to make changes in gfx/src/gtk/nsFontMetricGTK.cpp. 'ks_c_5601-1987' in the following part of the file has to be replaced by what we decide to use for KS C 5601 GL encoding. static nsFontCharSetInfo KSC5601 = { "ks_c_5601-1987", DoubleByteConvert,1); should be changed to static nsFontCharSetInfo KSC5601 = { "ksc5601.1987-0", DoubleByteConvert,1); I suspect gb_2312-80 has a similar problem. One may wonder why JIS 0208 and JIS 0212 don't have this problem. That's because for Japanese, there's been no confusion between coded character set name (JIS X 0208, JIS X 0212) on the one hand and character set encoding scheme (EUC-JP, ISO-2022-JP). For Korean, Microsoft mixed all these up (although EUC-KR had been very well established before MS shipped their first internet mail clients) and Mozilla, in its effort to be generous, made a little blunder by failing to distinguish between ks_c_5601-1987 (the standard-violating MIME charset name used by MS products) and ksc5601.1987-0 (the last two fields of XLFD for Korean X11 fonts with KSC5601 GL encoding). The situation for simplified Chinese is even worse (GB2312 and variants are used to mean at least two different things. It would be nice if a much better name EUC-CN had caught on as MIME charset)

Jungshik Shin

Comment 17

•

23 years ago

Attached patch a patch to assign two diff. names to two different roles of KS C 5601 (obsolete) — Details — Splinter Review

Jungshik Shin

Comment 18

•

23 years ago

Exactly like ks_c_5601-1987 plays dual roles in Mozilla source, x-x-big5 and gb_2312-80 play two roles as 'MIME charset' name (non-canonical) and the font encoding name in X11. The same principle can be applied to distinguish one from the other. I'm attaching a patch to fix all three cases together. In this patch, I use 'ksc5601.1987-0' for KS C 5601-1987 GL encoding, 'x-big5-noascii' for X11 Big5 font without US-ASCII part (similar names have been used for X11 JOHAB fonts and X11 GBK fonts), and 'gb2312.1980-0' for GB 2312-1980 GL encoding. I found that there's nothing to prevent '.' from being used in Mozilla's internal charset name and '.' is used for 'gb18030.2000-0'. Besides, I think using 'gb2312.1980-0' and 'ksc5601.1987-0' as Mozilla's internal charset names for Mozilla's encoders to deal with the X11 fonts with the same XLFD name fields is a good idea. 'ksc5601.1987-0', 'x-big5-noascii' and 'gb2312.1980-0' are used only by encoders for X11 fonts and they are never used as MIME charset name (so they should not be in charsetalias.properties file as commented there). Like jis_0208-1983 and jis_0212-1990 don't cause any alias resolution problem, these three names wouldn't cause any alias resolution problem.

Jungshik Shin

Comment 19

•

23 years ago

Attached patch cleaning up internal charset names for X11 fonts for CN/TW/KO (obsolete) — Details — Splinter Review

Jungshik Shin

Comment 20

•

23 years ago

I forgot to look into QT and Xlib part of gfx. I'm going to attach a new patch that take care of QT and Xlib as well as gtk.

Jungshik Shin

Comment 21

•

23 years ago

Attached patch cleaning up internal charset name for CN/TW/KO fonts in gtk/Xlib/QT (obsolete) — Details — Splinter Review

Jungshik Shin

Updated

•

23 years ago

Attachment #48747 - Attachment is obsolete: true

Jungshik Shin

Updated

•

23 years ago

Attachment #48775 - Attachment is obsolete: true

Jungshik Shin

Comment 22

•

23 years ago

Attached patch cleaning up internal charset name for CN/TW/KO fonts in gtk/Xlib/QT (obsolete) — Details — Splinter Review

Jungshik Shin

Comment 23

•

23 years ago

I'm sorry I keep forgetting a thing or two. I believe the latest attachment I've just added (timed 9:43 US PDT) has everything I want to change. BTW, most of my changes can be regarded as merely 'aesthetic' (not affecting the way Mozilla runs but simply making the source code clearer). However, I think it's better this way because it will prevent potential confusion in the future.

jbetak@netscape.com (away - not reading bugmail)

Assignee

Comment 24

•

23 years ago

thanks for all your input and the hard work Jungshik! At a first glance the changes look good, I'll roll up a new hopefully all-encompassing patch including your changes. mkaply - do we need to worry about nsCSIDtoCharsetName.cpp? What is your opinion on the proposed course of action?

Mike Kaply [:mkaply]

Comment 25

•

23 years ago

I like this fix a lot. I was investigating some Mac encoding problems and was very confused to discover that aliases are required for all charsets even when the alias is the same as the original charset. I don't think you need to mess with nCSIDtoCharsetName.cpp - this is really just a 4.x compatibility thing. Frank would have more info though.

msanz

Comment 26

•

23 years ago

marking nsbranch+ since it is a performance win

Keywords: nsbranch → nsbranch+

kill this account

Comment 27

•

23 years ago

I spent several hours working with Juraj on this yesterday. It is not clear to me that we will actually save much startup time by not loading this file. Juraj put a high accuracy time around the code that loads this file and while it takes about 250 milli-seconds to build the charset menu he reported it takes about 30 milli-seconds to load this file.

Suresh Duddi (gone)

Comment 28

•

23 years ago

30 ms out of ? An easy way to time startup is - create 'Default User' - time ./mozilla -P 'Default User' file:///usr/tmp/quit.html quit.html: <html><body onLoad="window.close()"></body></html> There are no big wins; we got to be doing a whole bunch of small wins to get a cumulative performance improvement. Maybe small improvements need not be taken in for the branch.

jbetak@netscape.com (away - not reading bugmail)

Assignee

Comment 29

•

23 years ago

bstell, I've just learned from jrgm that the timer resolution on W2K is 16ms. This would explain why I was measuring either 16 or 32 ms gain using the built- in timer method. It also explains some of the "noise" I was picking up when using the method described in bug 97712, which I continue to use in parallel. I will also apply dp's method on Linux. So far it looks like the lower boundary is at ~ 30 ms and the upper boundary at ~ 250 ms. After completing the patch, we could perhaps apply it to the trunk and have the QA officially record the performance gain using their metrics. If deemed significant enough, we could take the patch into the branch.

jbetak@netscape.com (away - not reading bugmail)

Assignee

Comment 30

•

23 years ago

Attached patch updated patch - thanks for all your input! (obsolete) — Details — Splinter Review

jbetak@netscape.com (away - not reading bugmail)

Assignee

Comment 31

•

23 years ago

I'd like to wait for ftang to have a look at this before proceeding any futher. I saw some asserts with a Linux test build (http://lxr.mozilla.org/seamonkey/source/gfx/src/gtk/nsFontMetricsGTK.cpp#3692), but they seem to be unrelated to this change.

Whiteboard: have patch, need r/sr

bobj

Comment 32

•

23 years ago

> I'd like to wait for ftang to have a look at this before proceeding any futher. > I saw some asserts with a Linux test build > (http://lxr.mozilla.org/seamonkey/source/gfx/src/gtk/nsFontMetricsGTK.cpp#3692), > but they seem to be unrelated to this change. If it's unrelated, can you back out your change and still see the asserts?

jbetak@netscape.com (away - not reading bugmail)

Assignee

Comment 33

•

23 years ago

>If it's unrelated, can you back out your change and still see the asserts? Yes, I've tried on two separate build machines (thanks Brian!). Although, the asserts are quite irritating, they seem to unrelated. I just wanted to have it recorded here, to have the benefit of a wider audience. Although I'm not sure how to resolve those asserts at this point, I believe we could proceed with this patch a that's why I posted it.

kill this account

Comment 34

•

23 years ago

Juraj: please set the environment variable NS_FONT_DEBUG to 4 and find out what charset is complaining. Dp: Building the charset menu seems to take around 250-300mS. Not reading in the file (which is the stated plan of this bug) would appear to save about 30 mS which is not much of the 250+ mS.

Asa Dotzler [:asa]

Comment 35

•

23 years ago

0.9.4 is out the door.

Target Milestone: mozilla0.9.4 → mozilla0.9.5

jbetak@netscape.com (away - not reading bugmail)

Assignee

Comment 36

•

23 years ago

Brian: I believe it's hpbig5, but there might be others. I'll have to rerun all of my converter test with the font debug turned on. I'll follow up on that soon. I followed up with Chris McAfee in bug 97712 and it seems that the "official" startup metrics can achieve a 3% margin of error, which is roughly 3 times better than what I have been seeing. If the performance improvement is small, it could be lost in the "noise". I believe the 30 mS for the file access is correct, although the overall improvement could be better due to reduced number of hash table lookups, smaller bloat etc.

jbetak@netscape.com (away - not reading bugmail)

Assignee

Comment 37

•

23 years ago

Attached file excerpt from the console output with NS_FONT_DEBUG=4 (obsolete) — Details

kill this account

Comment 38

•

23 years ago

from the startup log: ###!!! ASSERTION: failed to get converter type: 'mDocConverterType!=nsnull', file nsFontMetricsGTK.cpp, line 3692 ###!!! Break: at file nsFontMetricsGTK.cpp, line 3692 failed to get converter type for zh-TW, nsFontMetricsGTK.cpp 3693 The zh-TW charset is for traditional Chinese.

jbetak@netscape.com (away - not reading bugmail)

Assignee

Comment 39

•

23 years ago

> The zh-TW charset well, that would be a language group, wouldn't it? Am I right in assuming that "mDocConverterType = first_font->mCharSetInfo->Convert" failed?

chris hofmann

Comment 40

•

23 years ago

taking the nsbranch+ off this. it should bake on the trunk for awhile and then only come to the branch if it's extreme low risk and high reward at this point.. let me know if it turns out that way and it happens in the next week or so.

Keywords: nsbranch+ → nsbranch-

Adam D. Moss

Updated

•

23 years ago

OS: Windows 95 → All

jbetak@netscape.com (away - not reading bugmail)

Assignee

Comment 41

•

23 years ago

Attached patch less aggressive patch as requested by ftang (obsolete) — Details — Splinter Review

jbetak@netscape.com (away - not reading bugmail)

Assignee

Updated

•

23 years ago

Attachment #49623 - Attachment is obsolete: true

jbetak@netscape.com (away - not reading bugmail)

Assignee

Updated

•

23 years ago

Attachment #49402 - Attachment is obsolete: true

jbetak@netscape.com (away - not reading bugmail)

Assignee

Updated

•

23 years ago

Attachment #48776 - Attachment is obsolete: true

jbetak@netscape.com (away - not reading bugmail)

Assignee

Updated

•

23 years ago

Attachment #48744 - Attachment is obsolete: true

jbetak@netscape.com (away - not reading bugmail)

Assignee

Updated

•

23 years ago

Attachment #48721 - Attachment is obsolete: true

jbetak@netscape.com (away - not reading bugmail)

Assignee

Updated

•

23 years ago

Attachment #48645 - Attachment is obsolete: true

jbetak@netscape.com (away - not reading bugmail)

Assignee

Comment 42

•

23 years ago

ftang: would you be able to review tha most recent patch? The changes now affect the registry enumeration, which is in turn used only by the UI (charset menu, mail encoding menu etc.). This would greatly minimize any regression risk as discussed offline. I did some preliminary verification and it looks good. I'd like to finally close this and move on to "bigger fish" like bug 64146.

jbetak@netscape.com (away - not reading bugmail)

Assignee

Comment 43

•

23 years ago

another comment: we really need to resolve both "x-sjis" and "shift_jis" for Japanese builds. I was quite surprised at this finding.

bobj

Comment 44

•

23 years ago

> another comment: we really need to resolve both "x-sjis" and "shift_jis" for > Japanese builds. I was quite surprised at this finding. Probably a bug. Historically, when shift_jis was not registered with IANA we created x-sjis. We still need to alias x-sjis for old webpages, but we should not use this internally. Cc'ing rchen who might know why the Japanese builds are using x-sjis.

Frank Tang

Reporter

Updated

•

23 years ago

Blocks: 103175

Frank Tang

Reporter

Comment 45

•

23 years ago

this one is more complex than I thought. jbetak said it is still ok for m0.9.6

Target Milestone: mozilla0.9.5 → mozilla0.9.6

jbetak@netscape.com (away - not reading bugmail)

Assignee

Comment 46

•

23 years ago

Attached patch final patch candidate - this assumes that bug 64146 will be resolved soon — Details — Splinter Review

jbetak@netscape.com (away - not reading bugmail)

Assignee

Updated

•

23 years ago

Attachment #50337 - Attachment is obsolete: true

jbetak@netscape.com (away - not reading bugmail)

Assignee

Comment 47

•

23 years ago

ftang - could you please review this again? I removed all attempts to optimize charset aliasing to reduce risk of regressions. When bug 64146 is checked in, there is no need to enumerate 180+ converters, which simplifies this problem quite a bit. By accident I noticed that we didn't report any error when a charset designator was empty - mozilla.org returns an empty HTTP header charset and we end up loading the string bundle to look it up - I found it quite amusing. Also cc'ing blizzard to put this on his radar. It would be nice, if he could sr after ftang approves of this approach.

jbetak@netscape.com (away - not reading bugmail)

Assignee

Comment 48

•

23 years ago

nhotta - would you feel comfortable with and have time to review this? I'd really appreciate it. Hopefully Chriss Blizzard will sr soon, so that I can get it checked in.

nhottanscp

Comment 49

•

23 years ago

NS_TIMELINE_STOP_TIMER("nsCharsetAlias2:GetPreferred"); is that for permance profiling?

jbetak@netscape.com (away - not reading bugmail)

Assignee

Comment 50

•

23 years ago

nhotta - thanks for looking at this! Yes, NS_TIMELINE_STOP_TIMER is for performance profiling, it only gets built in debug builds with the the enviroment variable MOZ_TIMELINE set. I was surprised to see that lxr is full of these macros - someone has been putting them all over :-)

nhottanscp

Comment 51

•

23 years ago

So is this just adding checks for two cases ("shift_jis" and "x-sjis")?

nhottanscp

Comment 52

•

23 years ago

Comment on attachment 53808 [details] [diff] [review] final patch candidate - this assumes that bug 64146 will be resolved soon r=nhotta

Attachment #53808 - Flags: review+

jbetak@netscape.com (away - not reading bugmail)

Assignee

Comment 53

•

23 years ago

blizzard - could I pester you with one more sr? This one is on i18n performance (startup) tuning. I'd really appreciate it, as M096 is drawing near :-)

jbetak@netscape.com (away - not reading bugmail)

Assignee

Comment 54

•

23 years ago

cc'ing waterson for potential sr

Marek Z. Jeziorek

Updated

•

23 years ago

Blocks: 107067

Marek Z. Jeziorek

Updated

•

23 years ago

Keywords: nsbranch-

jbetak@netscape.com (away - not reading bugmail)

Assignee

Comment 55

•

23 years ago

cc'ing brendan for potential i18n sr

jbetak@netscape.com (away - not reading bugmail)

Assignee

Updated

•

23 years ago

Whiteboard: have patch, need r/sr → needs sr

jbetak@netscape.com (away - not reading bugmail)

Assignee

Updated

•

23 years ago

Whiteboard: needs sr → requested sr by email on 11/02/2001

kinmoz

Comment 56

•

23 years ago

Comment on attachment 53808 [details] [diff] [review] final patch candidate - this assumes that bug 64146 will be resolved soon sr=kin@netscape.com

Attachment #53808 - Flags: superreview+

kinmoz

Comment 57

•

23 years ago

I forgot to ask why the "x-sjis" and "shift_jis" cases can't be combined into a single if statement like ... + if (aKey.Equals(NS_LITERAL_STRING("x-sjis") || + aKey.Equals(NS_LITERAL_STRING("shift_jis")))

jbetak@netscape.com (away - not reading bugmail)

Assignee

Comment 58

•

23 years ago

Netcenter needs to be evangelized and x-sjis should be dropped from their Japanese site. The only reason I kept them separate was my vain hope to have them do it before I can check this patch in. Kin, thanks so much for your review - I'll correct this and land the patch in a few minutes.

jbetak@netscape.com (away - not reading bugmail)

Assignee

Comment 59

•

23 years ago

patch has just landed... Marking fixed - thanks everyone!

Status: ASSIGNED → RESOLVED

Closed: 23 years ago

Resolution: --- → FIXED

Teruko Kobayashi

Comment 60

•

23 years ago

Changed QA contact to ftang@netscape.com.

QA Contact: andreasb → ftang

bobj

Comment 61

•

23 years ago

Kat, > Netcenter needs to be evangelized and x-sjis should be dropped from their > Japanese site. The only reason I kept them separate was my vain hope to have > them do it before I can check this patch in. Can you help on this?

Katsuhiko Momoi

Comment 62

•

23 years ago

> Kat, >> Netcenter needs to be evangelized and x-sjis should be dropped from their >> Japanese site. I guess it is finally time to mothball Netscape 2.x or earlier & IE3 completely. >>The only reason I kept them separate was my vain hope to have >> them do it before I can check this patch in. NetCenter isn't the only site using x-sjis, but it should now be safe for other web sites to drop x-sjis also. As long as external charset handling for Mozilla can deal with x-sjis, this should not be a problem. >Can you help on this? Sure.

You need to log in before you can comment on or make changes to this bug.