consolidate uconv libraries

RESOLVED FIXED in mozilla1.2alpha



15 years ago
8 years ago


(Reporter: Alec Flett, Assigned: Alec Flett)



Windows 2000
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)



(1 attachment, 6 obsolete attachments)



15 years ago
right now we have a bunch of charset converter libraries. I don't know why we
need all these dlls, when we could combine them into one or two DLLs. Here's
what I'm thinking:

Combine into ucvbase.dll:
ucvmath.dll (only 19k, we might as well just make it part of the default)

Combine into ucvasia.dll:

This should make the total size smaller: right now anywhere from 10-50% of EACH
DLL is a copy and pasted nsUCv*Support.cpp - all these files appear to be
identical. This "Support" file contains a bunch of base classes that all the
decoders/encoders derive from. we can share these classes in a static library,
and then when we combine the above dlls, we'll see a savings because the library
will be shared within the dll.

After this work is completed, I have a further optimization which gets rid of
all the leaf classes for these converters and instead allows each one to be
created off a specific base class. This should reduce the overhead of each
converter down to a simple static function that makes a call to a shared static
constructor. But I'll file that in another bug.

CC'ing some i18n people so they know this is going on. The one question I have
is how we should combine the libraries: does it make sense to combine them all
into one library? If we do two libraries, does the above division look correct?

Comment 1

15 years ago
as a preliminary test, I combined the ucvja and ucvtw2 DLLs (the two largest)
into one, this is a debug build, but here are the results:

237     ucvja/ucvja.dll
233     ucvtw2/ucvtw2.dll
470     total

429     ucvasia/ucvasia.dll

So this is a savings of 41k, or about 9%, just from combining these two
libraries. I imagine we can save an additional 30-50k per library that we add to
this combination.

Note that in release builds, the converter libraries add up to 761k. Based on
this first test, I'm going to estimate that we can knock this down to about
600k, and perhaps even smaller!
Priority: -- → P2
Summary: consolidate uconv libraries → consolidate uconv libraries
Target Milestone: --- → mozilla1.2alpha


15 years ago
Blocks: 158003

Comment 2

15 years ago
Dear alecf, this work will increase run time footprint of asian users.
For example, if I am a Japanese users, most of my time will only view document
in Japanese and Wester encoding so Gecko will only load ucvja and uconv into the
memory, ucvko,ucvlatin, ucvcn,ucvibm,ucvtw and ucvtw2 won't get load into the
memory. if you merge them, then we will increase the in-memory footprint by 500K
for most Japanese users. The same thing happen to Simplified Chinese user,
Korean users, and Traditional Chinese users.

Remember, Japanese users don't read Korean just like you and me. If we don't
want to let English users take those footprint hit from Korean converters, there
are no reason to ask Japanese users to take the footprint hit from Korean
converters neither. 

ucvcn is for simplified chinese users in China
ucvtw and ucvtw2 are for traditional chineusers in Taiwan

I don't mind you combine ucvmath or ucvibm into ucvlatin but usually no body use
them anyway.

Comment 3

15 years ago
code issue, QA to for now.
Keywords: intl
QA Contact: ruixu → yokoyama

Comment 4

15 years ago
ftang: ultimately, I think I can consolidate these libraries to well under 500k.
Also, DLLs are paged into memory on a page-by-page basis, usually with a
granularity of 4k - this will not increase the heap or code footprint of the
product, because only those pages which are loaded (i.e. via
nsIModule.getFactory) will be paged in.

Comment 5

15 years ago
ok, now I've combined ucvja, ucvtw, and ucvtw2 into one library:
165     ucvtw/ucvtw.dll
233     ucvtw2/ucvtw2.dll
237     ucvja/ucvja.dll
635     total

and combined:
553     ucvasia.dll

so now we've saved 82k, or about 12%.. its getting better!


Comment 6

15 years ago
and finally, having combined all 4 asian libraries into the same library:
165     ucvtw/ucvtw.dll
233     ucvtw2/ucvtw2.dll
237     ucvja/ucvja.dll
157     ucvko/ucvko.dll
792     total

and combined:
669     ucvasia.dll

a savings of 123k or about 16%!

based on these results, I think we can knock down the release version of asia
libraries from 552k to 466k, a savings of 86k.

now to combine ucvcn/ucvibm/ucvlatin/ucvmath (currently 424k in in debug builds,
or 209k in release builds..)

Comment 7

15 years ago
I suggest not merging ucvmath so that I don't run into lock-in when trying to
effect changes there.

Comment 8

15 years ago
what's a lock-in?

Comment 9

15 years ago
lock-in means I may have to wait forever to effect changes if people perceive
that they may have wider implications. Often, MathML-related changes are
tangential to other developers, and they don't give them due diligence.

Comment 10

15 years ago
an update on ucvwestern: after combining ucvcn, ucvlatin, and ucvibm, I ended up
with these results...(on a debug build)

57      ucvibm/ucvibm.dll
133     ucvcn/ucvcn.dll
181     ucvlatin/ucvlatin.dll
371     total

285     ucvwest.dll

a savings of 86k, or 23%. Based on this estimation, I think that this will save
another 44k on release builds, bringing our total up to 130k... and that's
before fixing bug 158003!

As for ucvmath, I don't see why we can't allow you to keep working on ucvmath
stuff, with the understanding that the code is very seperate even though
technically they're in the same DLL. I'll certainly make sure it's all #ifdefed

Comment 11

15 years ago
Keeping it separate will avoid undue hassle (I speak from experience) -- it is
only 19K and doesn't cause harm to anyone.

Comment 12

15 years ago
Created attachment 92455 [details]
consolidate everything but ucvmath

ok fine. for now I'll skip ucvmath .. but I may be back! :)

Anyway, this gzipped patch is for Unix and Win32.. I'm going to apply it to my
mac build and attempt to create 3 new projects.. (ucvutil, ucvwestern, and
ucvasia) wish me luck :)

I don't know what mimetype to use for gzip.. the un-gzipped patch is 551k, most
of which is very mechanical removal of GetMaxLength from MANY classes, and
removal of all the nsUC*Support.* files, which are now located in
intl/uconv/util as a static library.


15 years ago
Attachment #92455 - Attachment mime type: application/octet-stream → application/x-gzip

Comment 13

15 years ago
Created attachment 92746 [details] [diff] [review]
consolidate everything but ucvmath v1.1

A minor update, and without the -N, so the patch is smaller and doesn't need to
be gzipped - for some reason I couldn't read ungzipped patches on my mac.
Attachment #92455 - Attachment is obsolete: true

Comment 14

15 years ago
Comment on attachment 92746 [details] [diff] [review]
consolidate everything but ucvmath v1.1

ok, I've tested this patch to the best of my ability on windows, mac, and
Linux. when the tree opens today, I'll check in the updated mac project files
(not yet part of the build)

Next I'll do a test build on each platform (or maybe get someone to help me)
and get it to IQA. Who should I ask in IQA to test this?

Comment 15

15 years ago
Created attachment 93163 [details] [diff] [review]
update mac build scripts

this additional patch for mac will update the build scripts to only build the
two new libraries - it also removes the use of the MANIFEST files, where were
completely unnecessary - the manifest files only exported the CID files, which
are only used by the module code.

oh, I'll have yet another patch for packaging updates.

Comment 16

15 years ago
Created attachment 93167 [details] [diff] [review]
updating packaging files

and this updates the packaging files, and removes the old library from an
existing install. I'll bet I have to do this on the commercial side too...

Comment 17

15 years ago
Ahhh,  i should have paid more attention to your consolidation...........
ucvcn doesn't belong to ucvwest.dll.   It's for Simplified Chinese users.
I'd really appreciate if you can remove the ucvcn from ucvwest.dll 
and add to ucvasia.dll   Sorry, alec ;(

>Next I'll do a test build on each platform (or maybe get someone to help me)
>and get it to IQA. Who should I ask in IQA to test this?
I can help you on Win if you'd like.  I believe ylong may be the right iQA to
test some sites.
andreas: how is ylong's workload?  alec's patch will impact all CJK and western
    (ie. all the pages except MathML)

Comment 18

15 years ago
oh! absolutely no problem :) I didn't understand all these codepage things
anyway! I'll do that and update the patch here.


Comment 19

15 years ago
Created attachment 93181 [details] [diff] [review]
consolidate everything v1.2

ok, this patch is one big patch which covers
1) moving ucvcn from ucvwestern to ucvasia
   during which I had to fix the g_AsciiMapping table that conflicted between a
few modules - there is now a g_ucvko_AsciiMapping so that there is no more
2) mac build scripts
3) packaging

hopefully this is the final patch, before reviews anyway.

You'll also notice that during this process, I removed a bunch of
implementations of GetMaxLength, and implemented support for a
"MaxLengthFactor" which is the factor by which you multiply the srclen to get
the destlen - this way, 95% of the classes can use the base class and not
implement their own GetMaxLength, just to multiply by some constant number...
this saves us a vtable and code footprint for every converter.
Attachment #92746 - Attachment is obsolete: true
Attachment #93163 - Attachment is obsolete: true
Attachment #93167 - Attachment is obsolete: true

Comment 20

15 years ago
roy - thanks for the offer to help - do you need a build, or would you like to
just apply the patch to your own build?
one thing I noticed after I attached the patch was that in TestUConv.cpp you
have to put an #ifdef NS_DEBUG around:
+    if (dec) {
+      nsCOMPtr<nsIBasicDecoder> isBasic = do_QueryInterface(dec);
+      if (isBasic) {
+        basicDecCount++;
+        printf("b");
+      }
+      else printf(" ");
+    }
+    else printf(" ");

and around the equivalent code for nsIBasicEncoder - it is only a test of course

I'm working on getting a linux build now.. I could really use some help on the
mac though, my little iMac is going to take forever to build a fresh release

Comment 21

15 years ago
>roy - thanks for the offer to help - do you need a build, or would you like to
>just apply the patch to your own build?
I just thought you may need my Windows build help; but I guess
you have no problem on Windows platform.

>I could really use some help on the mac though
naoki?  can you help alec on Mac?

Comment 22

15 years ago
ok, one more nit on that last patch.. I just applied this to a totally fresh
tree - you need to switch the order of "src" and "util" in
intl/uconv/ - the static library in "util" is required to build the
dynamic library in "src"

I'll make a new patch once I'm certain that this builds once and for all :)

Comment 23

15 years ago
Created attachment 93342 [details] [diff] [review]
consolidate everything v1.21

ok, last patch, I hope :) This just cleans up a few minor things:
1) g_ucvko_AsciiMapping stuff - causing missing symbols during link
2) NS_DEBUG stuff in TestUConv.cpp
3) build ordering in intl/uconv/
Attachment #93181 - Attachment is obsolete: true

Comment 24

15 years ago
ok, that last patch does in fact work. So who do we have lined up to test?

Windows: Roy or ylong?
Mac: Naoki?
Linux: ylong?

Thanks a LOT for your help guys. What platforms do we need actual builds on, and
what platforms can we just apply the above patch? The patch is obviously easier
because distributing binaries around can be pretty difficult :)

Another option is that I could also branch the intl/uconv directory and someone
could update that directory to the branch before building.

Comment 25

15 years ago
>ftang: ultimately, I think I can consolidate these libraries to well under 500k.
>Also, DLLs are paged into memory on a page-by-page basis, usually with a
>granularity of 4k - this will not increase the heap or code footprint of the
>product, because only those pages which are loaded (i.e. via
>nsIModule.getFactory) will be paged in.
If that is the case, why don't we combin them together into one uconv.dll and
let English user see the same impact ?
Please either (1) leave the dll structure as is, or (2) combine them together
with uconv, ucvlatin and every converter into one dll. I don't want to see in
one hand we want to keep minimum download for US/English uers and in the other
hand force Asian embedding build increase download size. IF embedding care about
dll download size for English users. I care that for Asian embedding custmer who
wil lonly include japanese dll but not chinese dll too. 

Comment 26

15 years ago
What asian embedding customers are demanding (for instance) a japanese-only dll?
I'd like to know before we hypothesize about what markets are demanding what
configuration. I understand that Japanese/Chinese/etc are independent users but
I am trying to weigh the overall cost of having all these seperate DLLs - to
generate the most benefit for the largest markets. Right now we're hurting all
markets by having bloated DLLs.

The way I look at it is this: After these changes, here are the net benefits for
each market:
            Include all libraries | only include market-specific
Non-Asian         -130k           |      -86k
Asian             -130k           |      +382k

(where Asian means a specific asian market like Japanese - the increase is the
cost of including the extra non-Japanese converters that you get with the
combined DLL, minus the total savings of combining the DLLs)

If I were to combine them all into one dll, the results would look something like:
            Include all libraries | Don't include them
Non-Asian         -130k           |      +382k
Asian             -130k           |      +382k

Basically I'm optimizing for 3 markets right now. The most important markets to
me are both of the "Include all libraries" markets - the ones who want full
browser support. The next most important are the non-asian who don't want the
libraries, because they are currently the next most demanding embedding customer.

All this being said, all of these patches do not prevent us later adding some
tweaks to the Makefiles which will allow individual DLLs to be created - for
instance, I could forsee a new flag like MOZ_SEPERATE_CONVERTERS=ja which would
cause ucvja to be generated as a DLL in addition to a static library, thus
satisfying the japanese-only market.

Comment 27

15 years ago
naoki/ylong/roy, I need some help getting this tested. Can I get some help? It
would work best if I could get you to apply the patch and do a build, but if I
need to deliver binaries to someone, I will!

Comment 28

15 years ago
Alec, please talk to Frank and get the patch reviewed first.
This is not a small patch, I prefer if you could provide a binray to IQA
(assuming the change is already in your local build).

Comment 29

15 years ago
>What asian embedding customers are demanding (for instance) a japanese-only dll?
let me ask a different question:
What asian embedding customers are demanding (for instance) a english-only dll?
that is what we have in the embedding project right now, right ?
for the very same reason, (if there are a need for english only embedding), you
will have a need for japaense only.

alecf: why don't we sit down in a meeting together first. 


Comment 30

15 years ago
ok, talked with frank over e-mail - we decided that one DLL really is the way to
go. I have a working patch for linux/windows and I just need to get the mac
project files lined up before I attach another patch.

Comment 31

15 years ago
alright, I have a windows release build with everything combined into one dll.
The results:

Before consolidation:
48      ./src/uconv.dll
84      ./ucvcn/ucvcn.dll
36      ./ucvibm/ucvibm.dll
184     ./ucvja/ucvja.dll
120     ./ucvko/ucvko.dll
100     ./ucvlatin/ucvlatin.dll
28      ./ucvmath/ucvmath.dll
128     ./ucvtw/ucvtw.dll
184     ./ucvtw2/ucvtw2.dll
912     total

after consolidation:
728     src/uconv.dll

that's a savings of 186k! with the 2-dll solution, it would have been 768k, so
the final consolidation saved us an additional 40k.

Comment 32

15 years ago
Created attachment 94056 [details] [diff] [review]
consolidate everything v1.3

ok, I have consolidated all 8 dlls into a single dll. The attached patch is
everything except the differences to uconv.xml on mac - this patch is 380k, and
the  patch to uconv.xml alone is 345k, due to some strange changes codewarrior
made when I added some files. 

Anyway, ftang, can you review this? 90% of the changes are changes to classes
such that GetMaxLength() is calcluated by the base class as a factor of the
srclen - for those classes which had a custom GetMaxLength() - for instance
UTF8 conversion, and so forth, I just left GetMaxLength alone.
Attachment #93342 - Attachment is obsolete: true

Comment 33

15 years ago
+// ucvlatin
+#include "nsUCvLatinCID.h"
+#include "nsUCvLatinDll.h"

what about having a #define list that is hosted in each ucvXXX. This way, you
could merge ucvmath as well, knowing that any changes over there won't have to
require to change the driver files explicitly because the setup is distributed.

Comment 34

15 years ago
rbs: true enough.. I'm going to make another pass through for bug 158003 so how
about I do that there? 

Comment 35

15 years ago
OK then.

Comment 36

15 years ago
Comment on attachment 94056 [details] [diff] [review]
consolidate everything v1.3

except the part of ucvasia.dll,so ucvwest.dll,so and mozucth.xpt in
which I mention to alecf in private email. 

I think we should land this asap so IQA can test it and we have enough time to
react befor m1.2 alpha if there are any mistake
Attachment #94056 - Flags: review+

Comment 37

15 years ago
I think now is the perfect timing to land this kind of mass changes. 
please get sr= 

Comment 38

15 years ago
I agree - sr requests have been made, I'm just waiting for them to come in...
Comment on attachment 94056 [details] [diff] [review]
consolidate everything v1.3


(Really with a patch this large and this repetitive it's a rubber stamp, but I
did look through the whole patch and didn't see anything bad.)
Attachment #94056 - Flags: superreview+

Comment 40

15 years ago
ok, this has landed - still lots of time to keep testing this for mozilla1.2

Comment 41

15 years ago
FYI this patch appears to have cost us roughly 1% at startup (see comet).
FYI, static builds broke with this checkin.  For some reason, ucvmath & the
merged uconv dlls are redefining symbols in some Decoder/Encoder support classes:

Comment 43

15 years ago
darin: yeah, I noticed that :( I'm trying to see what I can do.. I have
something that will further reduce this dll by another 30k or so (See bug 158003)

seawood: yes, before each library had their own unique set of shared classes -
ucvja had "nsUCvJABasicDecoderSupport", ucvko had "nsUCvKOBasicDecoderSupport"
and so forth - now they all share the same set of classes, as defined in the
library intl/uconv/util - ucvmath shares these as well now.. 

I don't know exactly how the static build works, but I'm guessing we need to
make sure that library doesn't get added twice.

Comment 44

15 years ago
ok, marking this fixed for now... still working on a way to recover the startup
Last Resolved: 15 years ago
Resolution: --- → FIXED
I was looking at the static build problem -- it seems that ucvmath still has its
own copy of what's in uconv/util.  To fix the static build, it seems like you
need to make ucvmath use the util library, and then list the util library in
LIBS instead of SHARED_LIBRARY_LIBS for one or both of src and ucvmath.
The static build bustage is now covered by bug 163217.


15 years ago
Blocks: 163737
Duplicate of this bug: 94221
You need to log in before you can comment on or make changes to this bug.