Last Comment Bug 157993 - consolidate uconv libraries
: consolidate uconv libraries
Status: RESOLVED FIXED
: intl
Product: Core
Classification: Components
Component: Internationalization (show other bugs)
: Trunk
: x86 Windows 2000
: P2 normal (vote)
: mozilla1.2alpha
Assigned To: Alec Flett
: Roy Yokoyama
:
Mentors:
: 94221 (view as bug list)
Depends on:
Blocks: 158003 163737
  Show dependency treegraph
 
Reported: 2002-07-17 14:49 PDT by Alec Flett
Modified: 2009-08-23 10:28 PDT (History)
11 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
consolidate everything but ucvmath (58.17 KB, application/x-gzip)
2002-07-23 15:25 PDT, Alec Flett
no flags Details
consolidate everything but ucvmath v1.1 (309.98 KB, patch)
2002-07-25 08:59 PDT, Alec Flett
no flags Details | Diff | Splinter Review
update mac build scripts (4.71 KB, patch)
2002-07-29 09:55 PDT, Alec Flett
no flags Details | Diff | Splinter Review
updating packaging files (6.17 KB, patch)
2002-07-29 10:06 PDT, Alec Flett
no flags Details | Diff | Splinter Review
consolidate everything v1.2 (330.22 KB, patch)
2002-07-29 11:56 PDT, Alec Flett
no flags Details | Diff | Splinter Review
consolidate everything v1.21 (317.25 KB, patch)
2002-07-30 16:08 PDT, Alec Flett
no flags Details | Diff | Splinter Review
consolidate everything v1.3 (379.45 KB, patch)
2002-08-05 12:11 PDT, Alec Flett
ftang: review+
blizzard: superreview+
Details | Diff | Splinter Review

Description Alec Flett 2002-07-17 14:49:24 PDT
right now we have a bunch of charset converter libraries. I don't know why we
need all these dlls, when we could combine them into one or two DLLs. Here's
what I'm thinking:

Combine into ucvbase.dll:
ucvcn.dll
ucvibm.dll
ucvlatin.dll
ucvmath.dll (only 19k, we might as well just make it part of the default)

Combine into ucvasia.dll:
ucvja.dll
ucvko.dll
ucvtw.dll
ucvtw2.dll

This should make the total size smaller: right now anywhere from 10-50% of EACH
DLL is a copy and pasted nsUCv*Support.cpp - all these files appear to be
identical. This "Support" file contains a bunch of base classes that all the
decoders/encoders derive from. we can share these classes in a static library,
and then when we combine the above dlls, we'll see a savings because the library
will be shared within the dll.

After this work is completed, I have a further optimization which gets rid of
all the leaf classes for these converters and instead allows each one to be
created off a specific base class. This should reduce the overhead of each
converter down to a simple static function that makes a call to a shared static
constructor. But I'll file that in another bug.

CC'ing some i18n people so they know this is going on. The one question I have
is how we should combine the libraries: does it make sense to combine them all
into one library? If we do two libraries, does the above division look correct?
Comment 1 Alec Flett 2002-07-17 15:55:20 PDT
as a preliminary test, I combined the ucvja and ucvtw2 DLLs (the two largest)
into one dll...now, this is a debug build, but here are the results:

Seperate:
237     ucvja/ucvja.dll
233     ucvtw2/ucvtw2.dll
470     total

Combined:
429     ucvasia/ucvasia.dll

So this is a savings of 41k, or about 9%, just from combining these two
libraries. I imagine we can save an additional 30-50k per library that we add to
this combination.

Note that in release builds, the converter libraries add up to 761k. Based on
this first test, I'm going to estimate that we can knock this down to about
600k, and perhaps even smaller!
Comment 2 Frank Tang 2002-07-17 16:28:31 PDT
Dear alecf, this work will increase run time footprint of asian users.
For example, if I am a Japanese users, most of my time will only view document
in Japanese and Wester encoding so Gecko will only load ucvja and uconv into the
memory, ucvko,ucvlatin, ucvcn,ucvibm,ucvtw and ucvtw2 won't get load into the
memory. if you merge them, then we will increase the in-memory footprint by 500K
for most Japanese users. The same thing happen to Simplified Chinese user,
Korean users, and Traditional Chinese users.

Remember, Japanese users don't read Korean just like you and me. If we don't
want to let English users take those footprint hit from Korean converters, there
are no reason to ask Japanese users to take the footprint hit from Korean
converters neither. 

ucvcn is for simplified chinese users in China
ucvtw and ucvtw2 are for traditional chineusers in Taiwan

I don't mind you combine ucvmath or ucvibm into ucvlatin but usually no body use
them anyway.
Comment 3 Rui Xu 2002-07-17 18:49:51 PDT
code issue, QA to yokoyama@netscape.com for now.
Comment 4 Alec Flett 2002-07-18 14:43:26 PDT
ftang: ultimately, I think I can consolidate these libraries to well under 500k.
Also, DLLs are paged into memory on a page-by-page basis, usually with a
granularity of 4k - this will not increase the heap or code footprint of the
product, because only those pages which are loaded (i.e. via
nsIModule.getFactory) will be paged in.
Comment 5 Alec Flett 2002-07-19 11:52:30 PDT
ok, now I've combined ucvja, ucvtw, and ucvtw2 into one library:
165     ucvtw/ucvtw.dll
233     ucvtw2/ucvtw2.dll
237     ucvja/ucvja.dll
635     total

and combined:
553     ucvasia.dll

so now we've saved 82k, or about 12%.. its getting better!

Comment 6 Alec Flett 2002-07-19 12:09:49 PDT
and finally, having combined all 4 asian libraries into the same library:
165     ucvtw/ucvtw.dll
233     ucvtw2/ucvtw2.dll
237     ucvja/ucvja.dll
157     ucvko/ucvko.dll
792     total

and combined:
669     ucvasia.dll

a savings of 123k or about 16%!

based on these results, I think we can knock down the release version of asia
libraries from 552k to 466k, a savings of 86k.

now to combine ucvcn/ucvibm/ucvlatin/ucvmath (currently 424k in in debug builds,
or 209k in release builds..)
Comment 7 rbs 2002-07-20 01:02:08 PDT
I suggest not merging ucvmath so that I don't run into lock-in when trying to
effect changes there.
Comment 8 Alec Flett 2002-07-22 11:34:31 PDT
what's a lock-in?
Comment 9 rbs 2002-07-22 15:07:05 PDT
lock-in means I may have to wait forever to effect changes if people perceive
that they may have wider implications. Often, MathML-related changes are
tangential to other developers, and they don't give them due diligence.
Comment 10 Alec Flett 2002-07-22 17:37:14 PDT
an update on ucvwestern: after combining ucvcn, ucvlatin, and ucvibm, I ended up
with these results...(on a debug build)

before:
57      ucvibm/ucvibm.dll
133     ucvcn/ucvcn.dll
181     ucvlatin/ucvlatin.dll
371     total

after:
285     ucvwest.dll

a savings of 86k, or 23%. Based on this estimation, I think that this will save
another 44k on release builds, bringing our total up to 130k... and that's
before fixing bug 158003!

As for ucvmath, I don't see why we can't allow you to keep working on ucvmath
stuff, with the understanding that the code is very seperate even though
technically they're in the same DLL. I'll certainly make sure it's all #ifdefed
appropriately.
Comment 11 rbs 2002-07-22 17:47:22 PDT
Keeping it separate will avoid undue hassle (I speak from experience) -- it is
only 19K and doesn't cause harm to anyone.
Comment 12 Alec Flett 2002-07-23 15:25:30 PDT
Created attachment 92455 [details]
consolidate everything but ucvmath

ok fine. for now I'll skip ucvmath .. but I may be back! :)

Anyway, this gzipped patch is for Unix and Win32.. I'm going to apply it to my
mac build and attempt to create 3 new projects.. (ucvutil, ucvwestern, and
ucvasia) wish me luck :)

I don't know what mimetype to use for gzip.. the un-gzipped patch is 551k, most
of which is very mechanical removal of GetMaxLength from MANY classes, and
removal of all the nsUC*Support.* files, which are now located in
intl/uconv/util as a static library.
Comment 13 Alec Flett 2002-07-25 08:59:03 PDT
Created attachment 92746 [details] [diff] [review]
consolidate everything but ucvmath v1.1

A minor update, and without the -N, so the patch is smaller and doesn't need to
be gzipped - for some reason I couldn't read ungzipped patches on my mac.
Comment 14 Alec Flett 2002-07-29 09:51:38 PDT
Comment on attachment 92746 [details] [diff] [review]
consolidate everything but ucvmath v1.1

ok, I've tested this patch to the best of my ability on windows, mac, and
Linux. when the tree opens today, I'll check in the updated mac project files
(not yet part of the build)

Next I'll do a test build on each platform (or maybe get someone to help me)
and get it to IQA. Who should I ask in IQA to test this?
Comment 15 Alec Flett 2002-07-29 09:55:58 PDT
Created attachment 93163 [details] [diff] [review]
update mac build scripts

this additional patch for mac will update the build scripts to only build the
two new libraries - it also removes the use of the MANIFEST files, where were
completely unnecessary - the manifest files only exported the CID files, which
are only used by the module code.

oh, I'll have yet another patch for packaging updates.
Comment 16 Alec Flett 2002-07-29 10:06:09 PDT
Created attachment 93167 [details] [diff] [review]
updating packaging files

and this updates the packaging files, and removes the old library from an
existing install. I'll bet I have to do this on the commercial side too...
*sigh*
Comment 17 Roy Yokoyama 2002-07-29 10:29:42 PDT
Ahhh,  i should have paid more attention to your consolidation...........
ucvcn doesn't belong to ucvwest.dll.   It's for Simplified Chinese users.
I'd really appreciate if you can remove the ucvcn from ucvwest.dll 
and add to ucvasia.dll   Sorry, alec ;(

>Next I'll do a test build on each platform (or maybe get someone to help me)
>and get it to IQA. Who should I ask in IQA to test this?
I can help you on Win if you'd like.  I believe ylong may be the right iQA to
test some sites.
andreas: how is ylong's workload?  alec's patch will impact all CJK and western
pages. 
    (ie. all the pages except MathML)
Comment 18 Alec Flett 2002-07-29 10:33:16 PDT
oh! absolutely no problem :) I didn't understand all these codepage things
anyway! I'll do that and update the patch here.

Comment 19 Alec Flett 2002-07-29 11:56:17 PDT
Created attachment 93181 [details] [diff] [review]
consolidate everything v1.2

ok, this patch is one big patch which covers
1) moving ucvcn from ucvwestern to ucvasia
   during which I had to fix the g_AsciiMapping table that conflicted between a
few modules - there is now a g_ucvko_AsciiMapping so that there is no more
confusion.
2) mac build scripts
3) packaging

hopefully this is the final patch, before reviews anyway.

You'll also notice that during this process, I removed a bunch of
implementations of GetMaxLength, and implemented support for a
"MaxLengthFactor" which is the factor by which you multiply the srclen to get
the destlen - this way, 95% of the classes can use the base class and not
implement their own GetMaxLength, just to multiply by some constant number...
this saves us a vtable and code footprint for every converter.
Comment 20 Alec Flett 2002-07-30 09:15:10 PDT
roy - thanks for the offer to help - do you need a build, or would you like to
just apply the patch to your own build?
one thing I noticed after I attached the patch was that in TestUConv.cpp you
have to put an #ifdef NS_DEBUG around:
+    if (dec) {
+      nsCOMPtr<nsIBasicDecoder> isBasic = do_QueryInterface(dec);
+      if (isBasic) {
+        basicDecCount++;
+        printf("b");
+      }
+      else printf(" ");
+    }
+    else printf(" ");


and around the equivalent code for nsIBasicEncoder - it is only a test of course

I'm working on getting a linux build now.. I could really use some help on the
mac though, my little iMac is going to take forever to build a fresh release
build...
Comment 21 Roy Yokoyama 2002-07-30 10:18:30 PDT
>roy - thanks for the offer to help - do you need a build, or would you like to
>just apply the patch to your own build?
I just thought you may need my Windows build help; but I guess
you have no problem on Windows platform.

>I could really use some help on the mac though
naoki?  can you help alec on Mac?
Comment 22 Alec Flett 2002-07-30 15:42:08 PDT
ok, one more nit on that last patch.. I just applied this to a totally fresh
tree - you need to switch the order of "src" and "util" in
intl/uconv/Makefile.in - the static library in "util" is required to build the
dynamic library in "src"

I'll make a new patch once I'm certain that this builds once and for all :)
Comment 23 Alec Flett 2002-07-30 16:08:29 PDT
Created attachment 93342 [details] [diff] [review]
consolidate everything v1.21

ok, last patch, I hope :) This just cleans up a few minor things:
1) g_ucvko_AsciiMapping stuff - causing missing symbols during link
2) NS_DEBUG stuff in TestUConv.cpp
3) build ordering in intl/uconv/Makefile.in
Comment 24 Alec Flett 2002-07-30 17:23:43 PDT
ok, that last patch does in fact work. So who do we have lined up to test?

Windows: Roy or ylong?
Mac: Naoki?
Linux: ylong?

Thanks a LOT for your help guys. What platforms do we need actual builds on, and
what platforms can we just apply the above patch? The patch is obviously easier
because distributing binaries around can be pretty difficult :)

Another option is that I could also branch the intl/uconv directory and someone
could update that directory to the branch before building.
Comment 25 Frank Tang 2002-07-31 09:52:28 PDT
>ftang: ultimately, I think I can consolidate these libraries to well under 500k.
>Also, DLLs are paged into memory on a page-by-page basis, usually with a
>granularity of 4k - this will not increase the heap or code footprint of the
>product, because only those pages which are loaded (i.e. via
>nsIModule.getFactory) will be paged in.
If that is the case, why don't we combin them together into one uconv.dll and
let English user see the same impact ?
Please either (1) leave the dll structure as is, or (2) combine them together
with uconv, ucvlatin and every converter into one dll. I don't want to see in
one hand we want to keep minimum download for US/English uers and in the other
hand force Asian embedding build increase download size. IF embedding care about
dll download size for English users. I care that for Asian embedding custmer who
wil lonly include japanese dll but not chinese dll too. 
Comment 26 Alec Flett 2002-07-31 10:23:53 PDT
What asian embedding customers are demanding (for instance) a japanese-only dll?
I'd like to know before we hypothesize about what markets are demanding what
configuration. I understand that Japanese/Chinese/etc are independent users but
I am trying to weigh the overall cost of having all these seperate DLLs - to
generate the most benefit for the largest markets. Right now we're hurting all
markets by having bloated DLLs.

The way I look at it is this: After these changes, here are the net benefits for
each market:
            Include all libraries | only include market-specific
Non-Asian         -130k           |      -86k
Asian             -130k           |      +382k

(where Asian means a specific asian market like Japanese - the increase is the
cost of including the extra non-Japanese converters that you get with the
combined DLL, minus the total savings of combining the DLLs)

If I were to combine them all into one dll, the results would look something like:
            Include all libraries | Don't include them
Non-Asian         -130k           |      +382k
Asian             -130k           |      +382k

Basically I'm optimizing for 3 markets right now. The most important markets to
me are both of the "Include all libraries" markets - the ones who want full
browser support. The next most important are the non-asian who don't want the
libraries, because they are currently the next most demanding embedding customer.

All this being said, all of these patches do not prevent us later adding some
tweaks to the Makefiles which will allow individual DLLs to be created - for
instance, I could forsee a new flag like MOZ_SEPERATE_CONVERTERS=ja which would
cause ucvja to be generated as a DLL in addition to a static library, thus
satisfying the japanese-only market.
Comment 27 Alec Flett 2002-08-01 14:54:36 PDT
naoki/ylong/roy, I need some help getting this tested. Can I get some help? It
would work best if I could get you to apply the patch and do a build, but if I
need to deliver binaries to someone, I will!
Comment 28 nhottanscp 2002-08-01 15:28:45 PDT
Alec, please talk to Frank and get the patch reviewed first.
This is not a small patch, I prefer if you could provide a binray to IQA
(assuming the change is already in your local build).
Comment 29 Frank Tang 2002-08-01 15:34:33 PDT
>What asian embedding customers are demanding (for instance) a japanese-only dll?
let me ask a different question:
What asian embedding customers are demanding (for instance) a english-only dll?
that is what we have in the embedding project right now, right ?
for the very same reason, (if there are a need for english only embedding), you
will have a need for japaense only.

alecf: why don't we sit down in a meeting together first. 

Comment 30 Alec Flett 2002-08-02 16:14:23 PDT
ok, talked with frank over e-mail - we decided that one DLL really is the way to
go. I have a working patch for linux/windows and I just need to get the mac
project files lined up before I attach another patch.
Comment 31 Alec Flett 2002-08-02 16:24:06 PDT
alright, I have a windows release build with everything combined into one dll.
The results:

Before consolidation:
48      ./src/uconv.dll
84      ./ucvcn/ucvcn.dll
36      ./ucvibm/ucvibm.dll
184     ./ucvja/ucvja.dll
120     ./ucvko/ucvko.dll
100     ./ucvlatin/ucvlatin.dll
28      ./ucvmath/ucvmath.dll
128     ./ucvtw/ucvtw.dll
184     ./ucvtw2/ucvtw2.dll
912     total

after consolidation:
728     src/uconv.dll

that's a savings of 186k! with the 2-dll solution, it would have been 768k, so
the final consolidation saved us an additional 40k.
Comment 32 Alec Flett 2002-08-05 12:11:19 PDT
Created attachment 94056 [details] [diff] [review]
consolidate everything v1.3

ok, I have consolidated all 8 dlls into a single dll. The attached patch is
everything except the differences to uconv.xml on mac - this patch is 380k, and
the  patch to uconv.xml alone is 345k, due to some strange changes codewarrior
made when I added some files. 

Anyway, ftang, can you review this? 90% of the changes are changes to classes
such that GetMaxLength() is calcluated by the base class as a factor of the
srclen - for those classes which had a custom GetMaxLength() - for instance
UTF8 conversion, and so forth, I just left GetMaxLength alone.
Comment 33 rbs 2002-08-05 16:29:09 PDT
+// ucvlatin
+#include "nsUCvLatinCID.h"
+#include "nsUCvLatinDll.h"
[...etc...]

what about having a #define list that is hosted in each ucvXXX. This way, you
could merge ucvmath as well, knowing that any changes over there won't have to
require to change the driver files explicitly because the setup is distributed.
Comment 34 Alec Flett 2002-08-05 17:05:14 PDT
rbs: true enough.. I'm going to make another pass through for bug 158003 so how
about I do that there? 
Comment 35 rbs 2002-08-05 18:48:00 PDT
OK then.
Comment 36 Frank Tang 2002-08-09 13:50:54 PDT
Comment on attachment 94056 [details] [diff] [review]
consolidate everything v1.3

r=ftang
except the part of ucvasia.dll,so ucvwest.dll,so and mozucth.xpt in
xpinstall/packager
which I mention to alecf in private email. 

I think we should land this asap so IQA can test it and we have enough time to
react befor m1.2 alpha if there are any mistake
Comment 37 Frank Tang 2002-08-09 13:58:16 PDT
I think now is the perfect timing to land this kind of mass changes. 
please get sr= 
Comment 38 Alec Flett 2002-08-09 14:16:27 PDT
I agree - sr requests have been made, I'm just waiting for them to come in...
Comment 39 Christopher Blizzard (:blizzard) 2002-08-10 09:58:01 PDT
Comment on attachment 94056 [details] [diff] [review]
consolidate everything v1.3

sr=blizzard

(Really with a patch this large and this repetitive it's a rubber stamp, but I
did look through the whole patch and didn't see anything bad.)
Comment 40 Alec Flett 2002-08-12 14:51:22 PDT
ok, this has landed - still lots of time to keep testing this for mozilla1.2
Comment 41 Darin Fisher 2002-08-13 07:10:40 PDT
FYI this patch appears to have cost us roughly 1% at startup (see comet).
Comment 42 hacker formerly known as seawood@netscape.com 2002-08-13 08:39:25 PDT
FYI, static builds broke with this checkin.  For some reason, ucvmath & the
merged uconv dlls are redefining symbols in some Decoder/Encoder support classes:
nsBasicDecoderSupport
nsBufferDecoderSupport
nsTableDecoderSupport
nsMultiTableEncoderSupport
etc
Comment 43 Alec Flett 2002-08-13 09:33:12 PDT
darin: yeah, I noticed that :( I'm trying to see what I can do.. I have
something that will further reduce this dll by another 30k or so (See bug 158003)

seawood: yes, before each library had their own unique set of shared classes -
ucvja had "nsUCvJABasicDecoderSupport", ucvko had "nsUCvKOBasicDecoderSupport"
and so forth - now they all share the same set of classes, as defined in the
library intl/uconv/util - ucvmath shares these as well now.. 

I don't know exactly how the static build works, but I'm guessing we need to
make sure that library doesn't get added twice.
Comment 44 Alec Flett 2002-08-14 16:05:37 PDT
ok, marking this fixed for now... still working on a way to recover the startup
cost...
Comment 45 David Baron :dbaron: ⌚️UTC-7 2002-08-16 18:30:41 PDT
I was looking at the static build problem -- it seems that ucvmath still has its
own copy of what's in uconv/util.  To fix the static build, it seems like you
need to make ucvmath use the util library, and then list the util library in
LIBS instead of SHARED_LIBRARY_LIBS for one or both of src and ucvmath.
Comment 46 David Baron :dbaron: ⌚️UTC-7 2002-08-17 07:39:28 PDT
The static build bustage is now covered by bug 163217.
Comment 47 Simon Montagu :smontagu 2009-08-23 10:28:36 PDT
*** Bug 94221 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.