Closed Bug 157993 Opened 22 years ago Closed 22 years ago

consolidate uconv libraries

Categories

(Core :: Internationalization, defect, P2)

x86
Windows 2000
defect

Tracking

()

RESOLVED FIXED
mozilla1.2alpha

People

(Reporter: alecf, Assigned: alecf)

References

Details

(Keywords: intl)

Attachments

(1 file, 6 obsolete files)

right now we have a bunch of charset converter libraries. I don't know why we
need all these dlls, when we could combine them into one or two DLLs. Here's
what I'm thinking:

Combine into ucvbase.dll:
ucvcn.dll
ucvibm.dll
ucvlatin.dll
ucvmath.dll (only 19k, we might as well just make it part of the default)

Combine into ucvasia.dll:
ucvja.dll
ucvko.dll
ucvtw.dll
ucvtw2.dll

This should make the total size smaller: right now anywhere from 10-50% of EACH
DLL is a copy and pasted nsUCv*Support.cpp - all these files appear to be
identical. This "Support" file contains a bunch of base classes that all the
decoders/encoders derive from. we can share these classes in a static library,
and then when we combine the above dlls, we'll see a savings because the library
will be shared within the dll.

After this work is completed, I have a further optimization which gets rid of
all the leaf classes for these converters and instead allows each one to be
created off a specific base class. This should reduce the overhead of each
converter down to a simple static function that makes a call to a shared static
constructor. But I'll file that in another bug.

CC'ing some i18n people so they know this is going on. The one question I have
is how we should combine the libraries: does it make sense to combine them all
into one library? If we do two libraries, does the above division look correct?
as a preliminary test, I combined the ucvja and ucvtw2 DLLs (the two largest)
into one dll...now, this is a debug build, but here are the results:

Seperate:
237     ucvja/ucvja.dll
233     ucvtw2/ucvtw2.dll
470     total

Combined:
429     ucvasia/ucvasia.dll

So this is a savings of 41k, or about 9%, just from combining these two
libraries. I imagine we can save an additional 30-50k per library that we add to
this combination.

Note that in release builds, the converter libraries add up to 761k. Based on
this first test, I'm going to estimate that we can knock this down to about
600k, and perhaps even smaller!
Status: NEW → ASSIGNED
Priority: -- → P2
Summary: consolidate uconv libraries → consolidate uconv libraries
Target Milestone: --- → mozilla1.2alpha
Blocks: 158003
Dear alecf, this work will increase run time footprint of asian users.
For example, if I am a Japanese users, most of my time will only view document
in Japanese and Wester encoding so Gecko will only load ucvja and uconv into the
memory, ucvko,ucvlatin, ucvcn,ucvibm,ucvtw and ucvtw2 won't get load into the
memory. if you merge them, then we will increase the in-memory footprint by 500K
for most Japanese users. The same thing happen to Simplified Chinese user,
Korean users, and Traditional Chinese users.

Remember, Japanese users don't read Korean just like you and me. If we don't
want to let English users take those footprint hit from Korean converters, there
are no reason to ask Japanese users to take the footprint hit from Korean
converters neither. 

ucvcn is for simplified chinese users in China
ucvtw and ucvtw2 are for traditional chineusers in Taiwan

I don't mind you combine ucvmath or ucvibm into ucvlatin but usually no body use
them anyway.
code issue, QA to yokoyama@netscape.com for now.
Keywords: intl
QA Contact: ruixu → yokoyama
ftang: ultimately, I think I can consolidate these libraries to well under 500k.
Also, DLLs are paged into memory on a page-by-page basis, usually with a
granularity of 4k - this will not increase the heap or code footprint of the
product, because only those pages which are loaded (i.e. via
nsIModule.getFactory) will be paged in.
ok, now I've combined ucvja, ucvtw, and ucvtw2 into one library:
165     ucvtw/ucvtw.dll
233     ucvtw2/ucvtw2.dll
237     ucvja/ucvja.dll
635     total

and combined:
553     ucvasia.dll

so now we've saved 82k, or about 12%.. its getting better!

and finally, having combined all 4 asian libraries into the same library:
165     ucvtw/ucvtw.dll
233     ucvtw2/ucvtw2.dll
237     ucvja/ucvja.dll
157     ucvko/ucvko.dll
792     total

and combined:
669     ucvasia.dll

a savings of 123k or about 16%!

based on these results, I think we can knock down the release version of asia
libraries from 552k to 466k, a savings of 86k.

now to combine ucvcn/ucvibm/ucvlatin/ucvmath (currently 424k in in debug builds,
or 209k in release builds..)
I suggest not merging ucvmath so that I don't run into lock-in when trying to
effect changes there.
what's a lock-in?
lock-in means I may have to wait forever to effect changes if people perceive
that they may have wider implications. Often, MathML-related changes are
tangential to other developers, and they don't give them due diligence.
an update on ucvwestern: after combining ucvcn, ucvlatin, and ucvibm, I ended up
with these results...(on a debug build)

before:
57      ucvibm/ucvibm.dll
133     ucvcn/ucvcn.dll
181     ucvlatin/ucvlatin.dll
371     total

after:
285     ucvwest.dll

a savings of 86k, or 23%. Based on this estimation, I think that this will save
another 44k on release builds, bringing our total up to 130k... and that's
before fixing bug 158003!

As for ucvmath, I don't see why we can't allow you to keep working on ucvmath
stuff, with the understanding that the code is very seperate even though
technically they're in the same DLL. I'll certainly make sure it's all #ifdefed
appropriately.
Keeping it separate will avoid undue hassle (I speak from experience) -- it is
only 19K and doesn't cause harm to anyone.
Attached file consolidate everything but ucvmath (obsolete) —
ok fine. for now I'll skip ucvmath .. but I may be back! :)

Anyway, this gzipped patch is for Unix and Win32.. I'm going to apply it to my
mac build and attempt to create 3 new projects.. (ucvutil, ucvwestern, and
ucvasia) wish me luck :)

I don't know what mimetype to use for gzip.. the un-gzipped patch is 551k, most
of which is very mechanical removal of GetMaxLength from MANY classes, and
removal of all the nsUC*Support.* files, which are now located in
intl/uconv/util as a static library.
Attachment #92455 - Attachment mime type: application/octet-stream → application/x-gzip
A minor update, and without the -N, so the patch is smaller and doesn't need to
be gzipped - for some reason I couldn't read ungzipped patches on my mac.
Attachment #92455 - Attachment is obsolete: true
Comment on attachment 92746 [details] [diff] [review]
consolidate everything but ucvmath v1.1

ok, I've tested this patch to the best of my ability on windows, mac, and
Linux. when the tree opens today, I'll check in the updated mac project files
(not yet part of the build)

Next I'll do a test build on each platform (or maybe get someone to help me)
and get it to IQA. Who should I ask in IQA to test this?
Attached patch update mac build scripts (obsolete) — Splinter Review
this additional patch for mac will update the build scripts to only build the
two new libraries - it also removes the use of the MANIFEST files, where were
completely unnecessary - the manifest files only exported the CID files, which
are only used by the module code.

oh, I'll have yet another patch for packaging updates.
Attached patch updating packaging files (obsolete) — Splinter Review
and this updates the packaging files, and removes the old library from an
existing install. I'll bet I have to do this on the commercial side too...
*sigh*
Ahhh,  i should have paid more attention to your consolidation...........
ucvcn doesn't belong to ucvwest.dll.   It's for Simplified Chinese users.
I'd really appreciate if you can remove the ucvcn from ucvwest.dll 
and add to ucvasia.dll   Sorry, alec ;(

>Next I'll do a test build on each platform (or maybe get someone to help me)
>and get it to IQA. Who should I ask in IQA to test this?
I can help you on Win if you'd like.  I believe ylong may be the right iQA to
test some sites.
andreas: how is ylong's workload?  alec's patch will impact all CJK and western
pages. 
    (ie. all the pages except MathML)
oh! absolutely no problem :) I didn't understand all these codepage things
anyway! I'll do that and update the patch here.

Attached patch consolidate everything v1.2 (obsolete) — Splinter Review
ok, this patch is one big patch which covers
1) moving ucvcn from ucvwestern to ucvasia
   during which I had to fix the g_AsciiMapping table that conflicted between a
few modules - there is now a g_ucvko_AsciiMapping so that there is no more
confusion.
2) mac build scripts
3) packaging

hopefully this is the final patch, before reviews anyway.

You'll also notice that during this process, I removed a bunch of
implementations of GetMaxLength, and implemented support for a
"MaxLengthFactor" which is the factor by which you multiply the srclen to get
the destlen - this way, 95% of the classes can use the base class and not
implement their own GetMaxLength, just to multiply by some constant number...
this saves us a vtable and code footprint for every converter.
Attachment #92746 - Attachment is obsolete: true
Attachment #93163 - Attachment is obsolete: true
Attachment #93167 - Attachment is obsolete: true
roy - thanks for the offer to help - do you need a build, or would you like to
just apply the patch to your own build?
one thing I noticed after I attached the patch was that in TestUConv.cpp you
have to put an #ifdef NS_DEBUG around:
+    if (dec) {
+      nsCOMPtr<nsIBasicDecoder> isBasic = do_QueryInterface(dec);
+      if (isBasic) {
+        basicDecCount++;
+        printf("b");
+      }
+      else printf(" ");
+    }
+    else printf(" ");


and around the equivalent code for nsIBasicEncoder - it is only a test of course

I'm working on getting a linux build now.. I could really use some help on the
mac though, my little iMac is going to take forever to build a fresh release
build...
>roy - thanks for the offer to help - do you need a build, or would you like to
>just apply the patch to your own build?
I just thought you may need my Windows build help; but I guess
you have no problem on Windows platform.

>I could really use some help on the mac though
naoki?  can you help alec on Mac?
ok, one more nit on that last patch.. I just applied this to a totally fresh
tree - you need to switch the order of "src" and "util" in
intl/uconv/Makefile.in - the static library in "util" is required to build the
dynamic library in "src"

I'll make a new patch once I'm certain that this builds once and for all :)
Attached patch consolidate everything v1.21 (obsolete) — Splinter Review
ok, last patch, I hope :) This just cleans up a few minor things:
1) g_ucvko_AsciiMapping stuff - causing missing symbols during link
2) NS_DEBUG stuff in TestUConv.cpp
3) build ordering in intl/uconv/Makefile.in
Attachment #93181 - Attachment is obsolete: true
ok, that last patch does in fact work. So who do we have lined up to test?

Windows: Roy or ylong?
Mac: Naoki?
Linux: ylong?

Thanks a LOT for your help guys. What platforms do we need actual builds on, and
what platforms can we just apply the above patch? The patch is obviously easier
because distributing binaries around can be pretty difficult :)

Another option is that I could also branch the intl/uconv directory and someone
could update that directory to the branch before building.
>ftang: ultimately, I think I can consolidate these libraries to well under 500k.
>Also, DLLs are paged into memory on a page-by-page basis, usually with a
>granularity of 4k - this will not increase the heap or code footprint of the
>product, because only those pages which are loaded (i.e. via
>nsIModule.getFactory) will be paged in.
If that is the case, why don't we combin them together into one uconv.dll and
let English user see the same impact ?
Please either (1) leave the dll structure as is, or (2) combine them together
with uconv, ucvlatin and every converter into one dll. I don't want to see in
one hand we want to keep minimum download for US/English uers and in the other
hand force Asian embedding build increase download size. IF embedding care about
dll download size for English users. I care that for Asian embedding custmer who
wil lonly include japanese dll but not chinese dll too. 
What asian embedding customers are demanding (for instance) a japanese-only dll?
I'd like to know before we hypothesize about what markets are demanding what
configuration. I understand that Japanese/Chinese/etc are independent users but
I am trying to weigh the overall cost of having all these seperate DLLs - to
generate the most benefit for the largest markets. Right now we're hurting all
markets by having bloated DLLs.

The way I look at it is this: After these changes, here are the net benefits for
each market:
            Include all libraries | only include market-specific
Non-Asian         -130k           |      -86k
Asian             -130k           |      +382k

(where Asian means a specific asian market like Japanese - the increase is the
cost of including the extra non-Japanese converters that you get with the
combined DLL, minus the total savings of combining the DLLs)

If I were to combine them all into one dll, the results would look something like:
            Include all libraries | Don't include them
Non-Asian         -130k           |      +382k
Asian             -130k           |      +382k

Basically I'm optimizing for 3 markets right now. The most important markets to
me are both of the "Include all libraries" markets - the ones who want full
browser support. The next most important are the non-asian who don't want the
libraries, because they are currently the next most demanding embedding customer.

All this being said, all of these patches do not prevent us later adding some
tweaks to the Makefiles which will allow individual DLLs to be created - for
instance, I could forsee a new flag like MOZ_SEPERATE_CONVERTERS=ja which would
cause ucvja to be generated as a DLL in addition to a static library, thus
satisfying the japanese-only market.
naoki/ylong/roy, I need some help getting this tested. Can I get some help? It
would work best if I could get you to apply the patch and do a build, but if I
need to deliver binaries to someone, I will!
Alec, please talk to Frank and get the patch reviewed first.
This is not a small patch, I prefer if you could provide a binray to IQA
(assuming the change is already in your local build).
>What asian embedding customers are demanding (for instance) a japanese-only dll?
let me ask a different question:
What asian embedding customers are demanding (for instance) a english-only dll?
that is what we have in the embedding project right now, right ?
for the very same reason, (if there are a need for english only embedding), you
will have a need for japaense only.

alecf: why don't we sit down in a meeting together first. 

ok, talked with frank over e-mail - we decided that one DLL really is the way to
go. I have a working patch for linux/windows and I just need to get the mac
project files lined up before I attach another patch.
alright, I have a windows release build with everything combined into one dll.
The results:

Before consolidation:
48      ./src/uconv.dll
84      ./ucvcn/ucvcn.dll
36      ./ucvibm/ucvibm.dll
184     ./ucvja/ucvja.dll
120     ./ucvko/ucvko.dll
100     ./ucvlatin/ucvlatin.dll
28      ./ucvmath/ucvmath.dll
128     ./ucvtw/ucvtw.dll
184     ./ucvtw2/ucvtw2.dll
912     total

after consolidation:
728     src/uconv.dll

that's a savings of 186k! with the 2-dll solution, it would have been 768k, so
the final consolidation saved us an additional 40k.
ok, I have consolidated all 8 dlls into a single dll. The attached patch is
everything except the differences to uconv.xml on mac - this patch is 380k, and
the  patch to uconv.xml alone is 345k, due to some strange changes codewarrior
made when I added some files. 

Anyway, ftang, can you review this? 90% of the changes are changes to classes
such that GetMaxLength() is calcluated by the base class as a factor of the
srclen - for those classes which had a custom GetMaxLength() - for instance
UTF8 conversion, and so forth, I just left GetMaxLength alone.
Attachment #93342 - Attachment is obsolete: true
+// ucvlatin
+#include "nsUCvLatinCID.h"
+#include "nsUCvLatinDll.h"
[...etc...]

what about having a #define list that is hosted in each ucvXXX. This way, you
could merge ucvmath as well, knowing that any changes over there won't have to
require to change the driver files explicitly because the setup is distributed.
rbs: true enough.. I'm going to make another pass through for bug 158003 so how
about I do that there? 
OK then.
Comment on attachment 94056 [details] [diff] [review]
consolidate everything v1.3

r=ftang
except the part of ucvasia.dll,so ucvwest.dll,so and mozucth.xpt in
xpinstall/packager
which I mention to alecf in private email. 

I think we should land this asap so IQA can test it and we have enough time to
react befor m1.2 alpha if there are any mistake
Attachment #94056 - Flags: review+
I think now is the perfect timing to land this kind of mass changes. 
please get sr= 
I agree - sr requests have been made, I'm just waiting for them to come in...
Comment on attachment 94056 [details] [diff] [review]
consolidate everything v1.3

sr=blizzard

(Really with a patch this large and this repetitive it's a rubber stamp, but I
did look through the whole patch and didn't see anything bad.)
Attachment #94056 - Flags: superreview+
ok, this has landed - still lots of time to keep testing this for mozilla1.2
FYI this patch appears to have cost us roughly 1% at startup (see comet).
FYI, static builds broke with this checkin.  For some reason, ucvmath & the
merged uconv dlls are redefining symbols in some Decoder/Encoder support classes:
nsBasicDecoderSupport
nsBufferDecoderSupport
nsTableDecoderSupport
nsMultiTableEncoderSupport
etc
darin: yeah, I noticed that :( I'm trying to see what I can do.. I have
something that will further reduce this dll by another 30k or so (See bug 158003)

seawood: yes, before each library had their own unique set of shared classes -
ucvja had "nsUCvJABasicDecoderSupport", ucvko had "nsUCvKOBasicDecoderSupport"
and so forth - now they all share the same set of classes, as defined in the
library intl/uconv/util - ucvmath shares these as well now.. 

I don't know exactly how the static build works, but I'm guessing we need to
make sure that library doesn't get added twice.
ok, marking this fixed for now... still working on a way to recover the startup
cost...
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
I was looking at the static build problem -- it seems that ucvmath still has its
own copy of what's in uconv/util.  To fix the static build, it seems like you
need to make ucvmath use the util library, and then list the util library in
LIBS instead of SHARED_LIBRARY_LIBS for one or both of src and ucvmath.
The static build bustage is now covered by bug 163217.
Blocks: 163737
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: