Closed Bug 97175 Opened 23 years ago Closed 23 years ago

startup perf- remove the need of loading of charsetalias.properties files at startup

Categories

(Core :: Internationalization, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla0.9.6

People

(Reporter: ftang, Assigned: jbetak)

References

Details

(Keywords: perf, Whiteboard: requested sr by email on 11/02/2001)

Attachments

(1 file, 10 obsolete files)

2.58 KB, patch
nhottanscp
: review+
kinmoz
: superreview+
Details | Diff | Splinter Review
we currently load charsetalias.properties at startup time, we should find a way to remove the need of loading it at startup time to speed up startup performance we could 1. build in common used pair in the cpp code, and/or 2. cache the resolved result in registry.
Blocks: 7251
Keywords: perf
Summary: startup perf- remove the need of loading of charsetalias.properties files at startup → startup perf- remove the need of loading of charsetalias.properties files at startup
per offline discussion with ftang - tentatively reassigning to myself
Assignee: ftang → jbetak
list of charset aliases requested during startup in sequential order (US trunk build): windows-1252 x-mac-roman us-ascii x-u-escaped ISO-8859-2 ISO-8859-3 ISO-8859-4 ISO-8859-5 ISO-8859-6 ISO-8859-6-I ISO-8859-6-E ISO-8859-7 ISO-8859-8 ISO-8859-8-I ISO-8859-8-E ISO-8859-9 ISO-8859-10 ISO-8859-13 ISO-8859-14 ISO-8859-15 ISO-IR-111 windows-1250 windows-1251 windows-1253 windows-1254 windows-1255 windows-1256 windows-1257 windows-1258 TIS-620 IBM866 KOI8-R KOI8-U x-mac-ce x-mac-greek x-mac-turkish x-mac-croatian x-mac-romanian x-mac-cyrillic x-mac-ukrainian x-mac-icelandic armscii-8 x-viet-tcvn5712 VISCII x-viet-vps UTF-7 x-imap4-modified-utf7 UTF-16BE UTF-16LE UTF-32BE UTF-32LE T.61-8bit x-user-defined x-mac-arabic x-mac-devanagari x-mac-farsi x-mac-gurmukhi x-mac-gujarati x-mac-hebrew IBM850 IBM852 IBM855 IBM857 IBM862 IBM864 IBM864i Shift_JIS EUC-JP ISO-2022-JP GB2312 windows-936 x-gbk HZ-GB-2312 gb18030 Big5 Big5-HKSCS x-euc-tw EUC-KR x-johab x-windows-949 windows-1252 x-mac-roman us-ascii x-u-escaped ISO-8859-2 ISO-8859-3 ISO-8859-4 ISO-8859-5 ISO-8859-6 ISO-8859-6-I ISO-8859-6-E ISO-8859-7 ISO-8859-8 ISO-8859-8-I ISO-8859-8-E ISO-8859-9 ISO-8859-10 ISO-8859-13 ISO-8859-14 ISO-8859-15 ISO-IR-111 windows-1250 windows-1251 windows-1253 windows-1254 windows-1255 windows-1256 windows-1257 windows-1258 TIS-620 IBM866 KOI8-R KOI8-U x-mac-ce x-mac-greek x-mac-turkish x-mac-croatian x-mac-romanian x-mac-cyrillic x-mac-ukrainian x-mac-icelandic armscii-8 x-viet-tcvn5712 VISCII x-viet-vps UTF-7 x-imap4-modified-utf7 UTF-16BE UTF-16LE UTF-16 UTF-32BE UTF-32LE T.61-8bit x-user-defined Adobe-Symbol-Encoding x-zapf-dingbats x-mac-arabic x-mac-devanagari x-mac-farsi x-mac-gurmukhi x-mac-gujarati x-mac-hebrew IBM850 IBM852 IBM855 IBM857 IBM862 IBM864 IBM864i Shift_JIS EUC-JP ISO-2022-JP jis_0201 jis_0208-1983 jis_0212-1990 GB2312 windows-936 x-gbk x-gbk-noascii HZ-GB-2312 gb_2312-80 gb18030.2000-0 gb18030.2000-1 gb18030 Big5 x-x-big5 Big5-HKSCS hkscs-1 x-euc-tw x-cns-11643-1 x-cns-11643-2 x-cns-11643-3 x-cns-11643-4 x-cns-11643-5 x-cns-11643-6 x-cns-11643-7 EUC-KR ks_c_5601-1987 x-x11johab x-johab x-johab-noascii x-windows-949 ISO-8859-15 armscii-8 ISO-8859-4 ISO-8859-14 ISO-8859-2 GB2312 Big5 KOI8-R windows-1251 KOI8-U ISO-8859-7 ISO-2022-JP EUC-KR ISO-8859-10 ISO-8859-3 TIS-620 ISO-8859-9 VISCII iso-8859-15 ibm850 x-mac-roman windows-1252 iso-8859-14 iso-8859-7 x-mac-greek windows-1253 x-mac-icelandic iso-8859-10 iso-8859-3 iso-8859-4 iso-8859-13 windows-1257 ibm852 iso-8859-2 x-mac-ce windows-1250 x-mac-croatian ibm855 iso-8859-5 iso-ir-111 koi8-r x-mac-cyrillic windows-1251 ibm866 koi8-u x-mac-ukrainian x-mac-romanian gb2312 x-gbk gb18030 hz-gb-2312 big5 big5-hkscs x-euc-tw euc-jp iso-2022-jp shift_jis euc-kr x-windows-949 x-johab armscii-8 tis-620 ibm857 iso-8859-9 x-mac-turkish windows-1254 x-viet-tcvn5712 viscii x-viet-vps windows-1258 x-mac-devanagari x-mac-gujarati x-mac-gurmukhi iso-8859-6 windows-1256 ibm864 x-mac-arabic x-mac-farsi iso-8859-8-i windows-1255 iso-8859-8 ibm862 x-mac-hebrew I just spoke to ftang and we shouldn't be needing all of these during startup. I suspect that there is some enumeration going on somewhere and we should defer that to later. I'm still in the first stage of investigation, but crude preliminary data indicate that this might be a worthwhile effort...
OK, I believe we are loading all these aliases because of the charset menu construction. I'm marking dependency to bug 64146.
Status: NEW → ASSIGNED
Depends on: 64146
Priority: -- → P3
Target Milestone: --- → mozilla0.9.5
confirming my last assumption: removing the charset menu loader reduces the number of charset alias resolutions (in a US trunk build) to two: UTF-8 ISO-8859-1 Looks like we'll have to address both issues together. I'll run the same check with a Japanese build as Frank suggested and enhance the list of hard-coded charset aliases. However, as it seems the delayed loading of the charset menu would help greatly in addressing this issue.
here is a list of non-canonical charset designators. We need to fix these in our internal prefs to avoid the need for alias resolution and subsequent loading of charsetalias.properties. Two patches coming up - one will defer charset alias resolution to debug builds only, the other will address the need for strictly canonical charset designators in our internal prefs. Resolving charset alias: UTF-16 = UTF-16BE Resolving charset alias: gb_2312-80 = GB2312 Resolving charset alias: x-x-big5 = Big5 Resolving charset alias: ks_c_5601-1987 = x-windows-949 Resolving charset alias: iso-8859-1 = ISO-8859-1 Resolving charset alias: iso-8859-1 = ISO-8859-1 Resolving charset alias: iso-8859-1 = ISO-8859-1 Resolving charset alias: iso-8859-1 = ISO-8859-1 Resolving charset alias: iso-8859-15 = ISO-8859-15 Resolving charset alias: ibm850 = IBM850 Resolving charset alias: iso-8859-14 = ISO-8859-14 Resolving charset alias: iso-8859-7 = ISO-8859-7 Resolving charset alias: iso-8859-10 = ISO-8859-10 Resolving charset alias: iso-8859-3 = ISO-8859-3 Resolving charset alias: iso-8859-4 = ISO-8859-4 Resolving charset alias: iso-8859-13 = ISO-8859-13 Resolving charset alias: windows-1257 = windows-1257 Resolving charset alias: ibm852 = IBM852 Resolving charset alias: iso-8859-2 = ISO-8859-2 Resolving charset alias: ibm855 = IBM855 Resolving charset alias: iso-8859-5 = ISO-8859-5 Resolving charset alias: iso-ir-111 = ISO-IR-111 Resolving charset alias: koi8-r = KOI8-R Resolving charset alias: ibm866 = IBM866 Resolving charset alias: koi8-u = KOI8-U Resolving charset alias: gb2312 = GB2312 Resolving charset alias: hz-gb-2312 = HZ-GB-2312 Resolving charset alias: big5 = Big5 Resolving charset alias: big5-hkscs = Big5-HKSCS Resolving charset alias: euc-jp = EUC-JP Resolving charset alias: iso-2022-jp = ISO-2022-JP Resolving charset alias: shift_jis = Shift_JIS Resolving charset alias: euc-kr = EUC-KR Resolving charset alias: tis-620 = TIS-620 Resolving charset alias: ibm857 = IBM857 Resolving charset alias: iso-8859-9 = ISO-8859-9 Resolving charset alias: viscii = VISCII Resolving charset alias: iso-8859-6 = ISO-8859-6 Resolving charset alias: ibm864 = IBM864 Resolving charset alias: iso-8859-8-i = ISO-8859-8-I Resolving charset alias: iso-8859-8 = ISO-8859-8 Resolving charset alias: ibm862 = IBM862
cc'ing bobj and cathleen, first rough measurements show that this could save around 3% (0.25s) of startup time.
Review: So I guess we are completely eliminating the charset alias ? In that case, can we remove GetCharsetAtom() and GetCharsetAtom2() If not, atleast we can make GetCharsetAtom() call GetCharsetAtomInternal() r=dp
> So I guess we are completely eliminating the charset alias ? No, we are forcing all "internal" charset designators to use the canonical names, but we still need to support the aliases for external charsets (e.g., MIME headers). Nominating nsbranch for 0.9.4.
Keywords: nsbranch
Target Milestone: mozilla0.9.5 → mozilla0.9.4
dp, thanks for looking at this! I guess bobj already addressed your question. I might only add that I too believe that we might be able to increase the number of instances using GetCharsetAtomInternal(). I felt I had to create a new method, since the old one is used for both - internal and external charset designator resolution. The need for internal chasert aliasing could be eliminated thoroughtout the whole application, so this could be just a first step. I'm not entirely sure, how much we could benefit from going all the way, I'd have to investigate some more.
there is still a small problem left to resolve. After changing navigator.properties, I'm still left with the following non-canonical charset designators. I'm investigating, where they are coming from and suspect the source could be individual charset decoders. Resolving charset alias: UTF-16 = UTF-16BE Resolving charset alias: gb_2312-80 = GB2312 Resolving charset alias: x-x-big5 = Big5 Resolving charset alias: ks_c_5601-1987 = x-windows-949
I just verified that the remaining four non-canonical references (UTF-16, gb_2312-80, x-x-big5 and ks_c_5601-1987) are not coming from our registered charset decoders and I'm 100% certain they are not coming from the charset menu either, which was at the center of my optimization effort here. I'll give it a more careful look in the debugger and come up with a recommendation how to proceed from here. Please bear with me.
Attachment #48645 - Flags: review+
Attachment #48645 - Flags: needs-work+
Attachment #48655 - Attachment is obsolete: true
cc'ing some more i18n folks to solicit feedback on the proposed changes. As it turned out the encoding designators requiring aliasing came from four encoders: nsUnicodeToBIG5NoAscii, nsUnicodeToUTF16, nsUnicodeToKSC5601, nsUnicodeToGB2312GL. I'd like to change the _To argument in the NS_UCONV_REG_UNREG macro. It is my belief that it should cause no harm, since the NS_UNICODEDECODER_CONTRACTID_BASE and the DECODER_NAME_BASE remains unchanged. Another more aggressive change involves making GetCharsetAtom2() call GetCharsetAtomInternal(). Unlike GetCharsetAtom(), the "2" method is being used exclusively for internal purposes and we could easily avoid the penalty of encoding name aliasing: http://lxr.mozilla.org/seamonkey/search?string=GetCharsetAtom2 I'm still concerned that we might run into trouble, when QT/GTK/Xlib refer to different encoding names requiring aliasing. I might whip up GetCharsetAtomInternal2() to make this approach less aggressive, but I'd rather not. I do believe there is some measurable benefit in avoid internal encoding aliasing throughout the whole application as dp pointed out and the current patch is a first in that direction. Before we push for a branch checkin, I'd also like to make a binary drop for the QA to confirm the estimated startup time savings of 2.5-3%.
I'm afraid the following patch to nsUCvKoModule.cpp has some problem. -------------- RCS file: /cvsroot/mozilla/intl/uconv/ucvko/nsUCvKoModule.cpp,v retrieving revision 1.17 diff -u -w -b -r1.17 nsUCvKoModule.cpp --- nsUCvKoModule.cpp 2001/07/18 21:48:43 1.17 +++ nsUCvKoModule.cpp 2001/09/09 00:38:27 @@ -74,7 +74,7 @@ NS_UCONV_REG_UNREG(nsEUCKRToUnicode, "EUC-KR", "Unicode" , NS_EUCKRTOUNICODE_CID); NS_UCONV_REG_UNREG(nsUnicodeToEUCKR, "Unicode", "EUC-KR", NS_UNICODETOEUCKR_CID); -NS_UCONV_REG_UNREG(nsUnicodeToKSC5601, "Unicode", "ks_c_5601-1987", NS_UNICODETOKSC5601_CID); +NS_UCONV_REG_UNREG(nsUnicodeToKSC5601, "Unicode", "EUC-KR", NS_UNICODETOKSC5601_CID); NS_UCONV_REG_UNREG(nsUnicodeToX11Johab, "Unicode", "x-x11johab", NS_UNICODETOX11JOHAB_CID); NS_UCONV_REG_UNREG(nsJohabToUnicode, "x-johab", "Unicode" , NS_JOHABTOUNICODE_CID); NS_UCONV_REG_UNREG(nsUnicodeToJohab, "Unicode", "x-johab", NS_UNICODETOJOHAB_CID); ---------------------------------------- 'ks_c_5601-1987' plays *two* roles in Mozilla and this is a source of confusion. One is an alias name for the canonical (MIME) charset name 'x-windows-949'. Microsoft came up with this standard-violating MIME charset name (it's a long story) and began to use it in their mail/news clients. As their clients are so widespread that Mozilla has to be generous and accept it as an alias to either EUC-KR or x-windows-949. (the latter is MS's proprieatary extension of the former). For this role, the canonical name 'x-windows-949' can be used internally in place of 'ks_c_5601-1987'. However, to deal with Korean mail messages and news articles with MIME charset name 'ks_c_5601-1987', this has to be kept as an alias for x-windows-949 (the canonical name in Mozilla). The other is an internal charset name within Mozilla for KS C 5601 GL encoding (used ONLY by X11 *fonts* for chararacters in KS C 5601-1987. That's why there's no nsKSC5601ToUnicode.cpp) It is this role that 'ks_c_5601-1987' is playing in nsUnicodeToKSC5601.cpp and this CANNOT and SHOULD NOT be regarded as identical to 'EUC-KR'. Neither can/should it be regarded as identical to 'x-windows-949'. So, what has to be done? Unless there's some restriction on what format to be used for 'charset' name(e.g. only one '-' can be used or '.' cannot be used), we can just come up with a *distinct* name to use for this second case (i.e. in nsUnicodeToKSC5601 converter). I suggest we use 'ksc5601-1987-0' or 'ksc5601.1987-0' in intl/uconv/ucvko/nsUCvKoModule.cpp. I prefer the latter because that's exactly the same as used in the last two fields of XLFD of the fonts Unicode -> KSC5601 converter deals with. In case '.' cannot be used, we should use 'ksc5601-1987-0'. In case only one '-' can be used (which I don't think is the case), what about 'ksc5601_1987-0'. In short, intl/uconv/ucvko/nsUCvKoModule.cpp should have the following (assuming we can use '.' ) NS_UCONV_REG_UNREG(nsUnicodeToKSC5601, "Unicode", "ksc5601.1987-0", NS_UNICODETOKSC5601_CID); in place of NS_UCONV_REG_UNREG(nsUnicodeToKSC5601, "Unicode", "ks_c_5601-1987", NS_UNICODETOKSC5601_CID); If we decide on what to use for designation of KSC5601 GL encoding for X11 fonts, we also have to make changes in gfx/src/gtk/nsFontMetricGTK.cpp. 'ks_c_5601-1987' in the following part of the file has to be replaced by what we decide to use for KS C 5601 GL encoding. static nsFontCharSetInfo KSC5601 = { "ks_c_5601-1987", DoubleByteConvert,1); should be changed to static nsFontCharSetInfo KSC5601 = { "ksc5601.1987-0", DoubleByteConvert,1); I suspect gb_2312-80 has a similar problem. One may wonder why JIS 0208 and JIS 0212 don't have this problem. That's because for Japanese, there's been no confusion between coded character set name (JIS X 0208, JIS X 0212) on the one hand and character set encoding scheme (EUC-JP, ISO-2022-JP). For Korean, Microsoft mixed all these up (although EUC-KR had been very well established before MS shipped their first internet mail clients) and Mozilla, in its effort to be generous, made a little blunder by failing to distinguish between ks_c_5601-1987 (the standard-violating MIME charset name used by MS products) and ksc5601.1987-0 (the last two fields of XLFD for Korean X11 fonts with KSC5601 GL encoding). The situation for simplified Chinese is even worse (GB2312 and variants are used to mean at least two different things. It would be nice if a much better name EUC-CN had caught on as MIME charset)
Exactly like ks_c_5601-1987 plays dual roles in Mozilla source, x-x-big5 and gb_2312-80 play two roles as 'MIME charset' name (non-canonical) and the font encoding name in X11. The same principle can be applied to distinguish one from the other. I'm attaching a patch to fix all three cases together. In this patch, I use 'ksc5601.1987-0' for KS C 5601-1987 GL encoding, 'x-big5-noascii' for X11 Big5 font without US-ASCII part (similar names have been used for X11 JOHAB fonts and X11 GBK fonts), and 'gb2312.1980-0' for GB 2312-1980 GL encoding. I found that there's nothing to prevent '.' from being used in Mozilla's internal charset name and '.' is used for 'gb18030.2000-0'. Besides, I think using 'gb2312.1980-0' and 'ksc5601.1987-0' as Mozilla's internal charset names for Mozilla's encoders to deal with the X11 fonts with the same XLFD name fields is a good idea. 'ksc5601.1987-0', 'x-big5-noascii' and 'gb2312.1980-0' are used only by encoders for X11 fonts and they are never used as MIME charset name (so they should not be in charsetalias.properties file as commented there). Like jis_0208-1983 and jis_0212-1990 don't cause any alias resolution problem, these three names wouldn't cause any alias resolution problem.
I forgot to look into QT and Xlib part of gfx. I'm going to attach a new patch that take care of QT and Xlib as well as gtk.
Attachment #48747 - Attachment is obsolete: true
Attachment #48775 - Attachment is obsolete: true
I'm sorry I keep forgetting a thing or two. I believe the latest attachment I've just added (timed 9:43 US PDT) has everything I want to change. BTW, most of my changes can be regarded as merely 'aesthetic' (not affecting the way Mozilla runs but simply making the source code clearer). However, I think it's better this way because it will prevent potential confusion in the future.
thanks for all your input and the hard work Jungshik! At a first glance the changes look good, I'll roll up a new hopefully all-encompassing patch including your changes. mkaply - do we need to worry about nsCSIDtoCharsetName.cpp? What is your opinion on the proposed course of action?
I like this fix a lot. I was investigating some Mac encoding problems and was very confused to discover that aliases are required for all charsets even when the alias is the same as the original charset. I don't think you need to mess with nCSIDtoCharsetName.cpp - this is really just a 4.x compatibility thing. Frank would have more info though.
marking nsbranch+ since it is a performance win
Keywords: nsbranchnsbranch+
I spent several hours working with Juraj on this yesterday. It is not clear to me that we will actually save much startup time by not loading this file. Juraj put a high accuracy time around the code that loads this file and while it takes about 250 milli-seconds to build the charset menu he reported it takes about 30 milli-seconds to load this file.
30 ms out of ? An easy way to time startup is - create 'Default User' - time ./mozilla -P 'Default User' file:///usr/tmp/quit.html quit.html: <html><body onLoad="window.close()"></body></html> There are no big wins; we got to be doing a whole bunch of small wins to get a cumulative performance improvement. Maybe small improvements need not be taken in for the branch.
bstell, I've just learned from jrgm that the timer resolution on W2K is 16ms. This would explain why I was measuring either 16 or 32 ms gain using the built- in timer method. It also explains some of the "noise" I was picking up when using the method described in bug 97712, which I continue to use in parallel. I will also apply dp's method on Linux. So far it looks like the lower boundary is at ~ 30 ms and the upper boundary at ~ 250 ms. After completing the patch, we could perhaps apply it to the trunk and have the QA officially record the performance gain using their metrics. If deemed significant enough, we could take the patch into the branch.
I'd like to wait for ftang to have a look at this before proceeding any futher. I saw some asserts with a Linux test build (http://lxr.mozilla.org/seamonkey/source/gfx/src/gtk/nsFontMetricsGTK.cpp#3692), but they seem to be unrelated to this change.
Whiteboard: have patch, need r/sr
> I'd like to wait for ftang to have a look at this before proceeding any futher. > I saw some asserts with a Linux test build > (http://lxr.mozilla.org/seamonkey/source/gfx/src/gtk/nsFontMetricsGTK.cpp#3692), > but they seem to be unrelated to this change. If it's unrelated, can you back out your change and still see the asserts?
>If it's unrelated, can you back out your change and still see the asserts? Yes, I've tried on two separate build machines (thanks Brian!). Although, the asserts are quite irritating, they seem to unrelated. I just wanted to have it recorded here, to have the benefit of a wider audience. Although I'm not sure how to resolve those asserts at this point, I believe we could proceed with this patch a that's why I posted it.
Juraj: please set the environment variable NS_FONT_DEBUG to 4 and find out what charset is complaining. Dp: Building the charset menu seems to take around 250-300mS. Not reading in the file (which is the stated plan of this bug) would appear to save about 30 mS which is not much of the 250+ mS.
0.9.4 is out the door.
Target Milestone: mozilla0.9.4 → mozilla0.9.5
Brian: I believe it's hpbig5, but there might be others. I'll have to rerun all of my converter test with the font debug turned on. I'll follow up on that soon. I followed up with Chris McAfee in bug 97712 and it seems that the "official" startup metrics can achieve a 3% margin of error, which is roughly 3 times better than what I have been seeing. If the performance improvement is small, it could be lost in the "noise". I believe the 30 mS for the file access is correct, although the overall improvement could be better due to reduced number of hash table lookups, smaller bloat etc.
from the startup log: ###!!! ASSERTION: failed to get converter type: 'mDocConverterType!=nsnull', file nsFontMetricsGTK.cpp, line 3692 ###!!! Break: at file nsFontMetricsGTK.cpp, line 3692 failed to get converter type for zh-TW, nsFontMetricsGTK.cpp 3693 The zh-TW charset is for traditional Chinese.
> The zh-TW charset well, that would be a language group, wouldn't it? Am I right in assuming that "mDocConverterType = first_font->mCharSetInfo->Convert" failed?
taking the nsbranch+ off this. it should bake on the trunk for awhile and then only come to the branch if it's extreme low risk and high reward at this point.. let me know if it turns out that way and it happens in the next week or so.
Keywords: nsbranch+nsbranch-
OS: Windows 95 → All
Attachment #49623 - Attachment is obsolete: true
Attachment #49402 - Attachment is obsolete: true
Attachment #48776 - Attachment is obsolete: true
Attachment #48744 - Attachment is obsolete: true
Attachment #48721 - Attachment is obsolete: true
Attachment #48645 - Attachment is obsolete: true
ftang: would you be able to review tha most recent patch? The changes now affect the registry enumeration, which is in turn used only by the UI (charset menu, mail encoding menu etc.). This would greatly minimize any regression risk as discussed offline. I did some preliminary verification and it looks good. I'd like to finally close this and move on to "bigger fish" like bug 64146.
another comment: we really need to resolve both "x-sjis" and "shift_jis" for Japanese builds. I was quite surprised at this finding.
> another comment: we really need to resolve both "x-sjis" and "shift_jis" for > Japanese builds. I was quite surprised at this finding. Probably a bug. Historically, when shift_jis was not registered with IANA we created x-sjis. We still need to alias x-sjis for old webpages, but we should not use this internally. Cc'ing rchen who might know why the Japanese builds are using x-sjis.
Blocks: 103175
this one is more complex than I thought. jbetak said it is still ok for m0.9.6
Target Milestone: mozilla0.9.5 → mozilla0.9.6
Attachment #50337 - Attachment is obsolete: true
ftang - could you please review this again? I removed all attempts to optimize charset aliasing to reduce risk of regressions. When bug 64146 is checked in, there is no need to enumerate 180+ converters, which simplifies this problem quite a bit. By accident I noticed that we didn't report any error when a charset designator was empty - mozilla.org returns an empty HTTP header charset and we end up loading the string bundle to look it up - I found it quite amusing. Also cc'ing blizzard to put this on his radar. It would be nice, if he could sr after ftang approves of this approach.
nhotta - would you feel comfortable with and have time to review this? I'd really appreciate it. Hopefully Chriss Blizzard will sr soon, so that I can get it checked in.
NS_TIMELINE_STOP_TIMER("nsCharsetAlias2:GetPreferred"); is that for permance profiling?
nhotta - thanks for looking at this! Yes, NS_TIMELINE_STOP_TIMER is for performance profiling, it only gets built in debug builds with the the enviroment variable MOZ_TIMELINE set. I was surprised to see that lxr is full of these macros - someone has been putting them all over :-)
So is this just adding checks for two cases ("shift_jis" and "x-sjis")?
Comment on attachment 53808 [details] [diff] [review] final patch candidate - this assumes that bug 64146 will be resolved soon r=nhotta
Attachment #53808 - Flags: review+
blizzard - could I pester you with one more sr? This one is on i18n performance (startup) tuning. I'd really appreciate it, as M096 is drawing near :-)
cc'ing waterson for potential sr
Blocks: 107067
Keywords: nsbranch-
cc'ing brendan for potential i18n sr
Whiteboard: have patch, need r/sr → needs sr
Whiteboard: needs sr → requested sr by email on 11/02/2001
Comment on attachment 53808 [details] [diff] [review] final patch candidate - this assumes that bug 64146 will be resolved soon sr=kin@netscape.com
Attachment #53808 - Flags: superreview+
I forgot to ask why the "x-sjis" and "shift_jis" cases can't be combined into a single if statement like ... + if (aKey.Equals(NS_LITERAL_STRING("x-sjis") || + aKey.Equals(NS_LITERAL_STRING("shift_jis")))
Netcenter needs to be evangelized and x-sjis should be dropped from their Japanese site. The only reason I kept them separate was my vain hope to have them do it before I can check this patch in. Kin, thanks so much for your review - I'll correct this and land the patch in a few minutes.
patch has just landed... Marking fixed - thanks everyone!
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Changed QA contact to ftang@netscape.com.
QA Contact: andreasb → ftang
Kat, > Netcenter needs to be evangelized and x-sjis should be dropped from their > Japanese site. The only reason I kept them separate was my vain hope to have > them do it before I can check this patch in. Can you help on this?
> Kat, >> Netcenter needs to be evangelized and x-sjis should be dropped from their >> Japanese site. I guess it is finally time to mothball Netscape 2.x or earlier & IE3 completely. >>The only reason I kept them separate was my vain hope to have >> them do it before I can check this patch in. NetCenter isn't the only site using x-sjis, but it should now be safe for other web sites to drop x-sjis also. As long as external charset handling for Mozilla can deal with x-sjis, this should not be a problem. >Can you help on this? Sure.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: