Closed
Bug 95518
Opened 23 years ago
Closed 23 years ago
reduce memory usage of fontinfo map in Linux GFX
Categories
(Core :: Internationalization, defect)
Tracking
()
RESOLVED
FIXED
mozilla0.9.5
People
(Reporter: ftang, Assigned: bstell)
Details
(Keywords: memory-footprint)
Attachments
(7 files)
9.74 KB,
patch
|
Details | Diff | Splinter Review | |
26.64 KB,
patch
|
Details | Diff | Splinter Review | |
28.96 KB,
patch
|
tetsuroy
:
review+
|
Details | Diff | Splinter Review |
42.93 KB,
patch
|
Details | Diff | Splinter Review | |
1.50 KB,
text/plain
|
Details | |
29.42 KB,
patch
|
Details | Diff | Splinter Review | |
48.08 KB,
patch
|
roland.mainz
:
review+
brendan
:
superreview+
|
Details | Diff | Splinter Review |
Currently, GFX on Linux use a PRUint32 info[2048] to represent it can render
unciode or not.
bstell mention that he have a algorithm which cna reduce the needed memroy.
Reporter | ||
Comment 1•23 years ago
|
||
bstell said the new algorith can reduce memory from 8K per encoding (for bitmap
fonts) or per TrueType Font, to about 500 bytes.
footprint bug
Keywords: footprint
Reporter | ||
Comment 3•23 years ago
|
||
bstell said he will divide the 16 bits address space to a 3 stage lookup
4 x 4 x 8
So the smallest map will be
(2 ^ 4) x sizeof(pointer) + 2 x (2 ^ 4) x sizeof(pointer) + 2 x 32
= 256 bytes
and the largest map will be
(2 ^ 4) x sizeof(pointer) + 17 x (2 ^ 4) x sizeof(pointer) + 257 x 32
= 9376 bytes
he said most of the map for WGL4 fonts are 600-700 bytes and the larget font he
have use about 6K
Currently, all the font use 8K bytes.
Assignee | ||
Comment 5•23 years ago
|
||
Assignee | ||
Comment 6•23 years ago
|
||
Assignee | ||
Comment 7•23 years ago
|
||
Frank made a a couple suggestions:
1) Since the CCMap is 256 bit page oriented it would be more
efficient to set bits from the 8K map a page at a time. This
same interface would work for the copying pages from a CCMap.
2) Some fonts have large areas of all bits set. So when setting
a page, if all the bits are set (all 1's) allocate an
"all bits set page" page to share.
Assignee | ||
Comment 8•23 years ago
|
||
Assignee | ||
Comment 9•23 years ago
|
||
The code in attachment 47105 [details] [diff] [review] actually uses implements "pointers" using
2 byte offsets so
the smallest table (empty map) is 3*32=96 bytes
small fonts should be in the 200-700 byte range
large fonts (ie: cyberbit) should be in the 4K range
the max size is 8800 bytes
Assignee | ||
Comment 10•23 years ago
|
||
Min map size = 32 bytes.
------------------------
Its actually possible for an empty map to fold the Upper pointers, the empty
mid pointers, and the empty page on top of each other and reduce the map
to 32 bytes total. This may be useful for a 20bit Unicode map. This is
compatible with the current accessor macro.
Sharing Maps
------------------------
It seems reasonable that fonts released as a group (eg: Helvetica regular /
bold / italic / bold-italic) might have the same char map. Thus we might
save more memory by sharing maps if we had low cost way to identify equal maps.
We might consider adding a 32 byte control block. It could hold the number of
chars and a map signature/digest. These could be used as a quick check to see
if 2 maps were not identical which would speed up the process of looking for
identical maps.
Comment 11•23 years ago
|
||
Comment on attachment 47105 [details] [diff] [review]
revised patch: loads by 256 bit "page", folds "all bits set" pages
/r=yokoyama
Updated•23 years ago
|
Attachment #47105 -
Flags: review+
Comment 12•23 years ago
|
||
bstell - wanna include a patch for Xlib toolkit, too - or should I file a
seperate bug for this ?
Assignee | ||
Comment 13•23 years ago
|
||
Hopefully this will propogate to all GFX versions (win,mac,xlib,ps,...).
I would prefer to open separate bugs and once this one has a couple days of
soak then do the other versions.
Comment 14•23 years ago
|
||
Nit: don't expandtab in Makefile.in, do use tabs (style only, in this case, but
if you modified a command line for a rule, you would need a tab or make would barf).
I'm wondering why you don't use prbitmap_t (prbit.h) instead of PRUint16, just
for the slight gain (many architectures do 32-bit or 64-bit -- unsigned long is
what prbitmap_t typedef's to -- loads, stores, and ALU-ops). In light of that
idea, maybe the macros should use WORD instead of LONG or SHORT.
Also, the magic 2048 used to dimension PRUint32 arrays should be #defined with a
value computed from more primitive macros.
Early comments, I'll say more in a bit -- neat work. Any reason to leave bug
75260 open still?
/be
Comment 15•23 years ago
|
||
Brian Stell wrote:
> Hopefully this will propogate to all GFX versions (win,mac,xlib,ps,...).
>
> I would prefer to open separate bugs and once this one has a couple days of
> soak then do the other versions.
I disagree in the nsFontMetricsXlib*.*-case. Filing seperate bugs to
"hunt&catch-up" with all changes in the GTK+ gfx stuff is a mess... it usually
happens like this: Once I pulled one change over I see three new changes in GTK+
stuff... OK, then I'll file new bugs, get r= but no quick sr=, then sr= in the
wrong order or no sr= because superreviewer cannot test them because the patches
depend on each other (well... the nsFontMetrics* code isn't bug enougth to avoid
this... ;-(( ). After all, I have my sr= for all patches, can check-in and still
lag far behind of the GTK+ gfx status.[1]
Not to mention a closed tree which requires a= because it waits to be
branched... ;-((
Filing "combined" patches which pull multiple changes from GTK+ to Xlib is
difficult, too - superreviewers always beat me because patches are usually
horrible big in those cases... ;-((
That's why I decided to spend a week to _sync_ the GTK+-->Xlib[2]
nsFontMetrics*.* sources and now try beg the people to patch GTK+ _and_ Xlib
both - which is _not_ difficult: Sources are ~99% identical and
nsFontMetricsXlib.* are build by default for the Xprint module.
Anyway... I am going to file a new patch for both GTK+ and Xlib gfx in ~1-2
hours...
[1]=Just the todays count: I am "monitoring" 11 (eleven!!) other bugs which will
patch nsFontMtricsGTK.* in this/next month... just guess how much time gets
wasted to sync all this code... ;-((
[2]=Which was only a GTK+ --> Xlib one-way sync... Xlib gfx stuff currently
contains many improvents which may be usefull in the GTK+ gfx stuff, too...
Assignee | ||
Comment 16•23 years ago
|
||
Brendan:
Thanks for pointing out prbitmap_t. Like so much in the Mozilla I was not
aware of it. I will take a look.
In the current design every block is the same 32 bytes and this give fairly
good memory usage and makes allocating / accessing simple. You mention
there might be a "slight gain" in performance using 32/64 bit accessing
instead of 16 bit. Ftang mentioned using larger accesses but since we would
have the same number of SHIFT/AND and INDEX/LOAD operations I am unclear
where the actual gain would be.
Bug 75260 is for windows as opposed to linux. There are several bugs open
which are very similar to this bug. I will clean this up once this is done.
Yes, you are right I should clean up and define 2048 instead of just
following the existing code.
Assignee | ||
Comment 17•23 years ago
|
||
Roland:
I feel there are 3 separate issues here:
1) I want to measure the real performance impact before we apply this to
all platforms.
2) It is very hard to get code r=/sr=/a=.
I am trying to avoid the pain that rbs is suffering in bug 99010 where
many files need to be reviewed by many people all at onece. It is my belief
that once this is in and working that it will be easier to get the r=/sr=
for the other UIs.
3) The common code in nsFontMetricsGTK.cpp and nsFontMetricsXlib.cpp
should be shared not duplicated. We have been adding alot of code to
nsFontMetricsGTK.cpp to do glyph search/fill-in that should be applied
to Xlib/QT/PS/... We really should open a bug to factor out the common
code.
Comment 18•23 years ago
|
||
Comment 19•23 years ago
|
||
> 2) It is very hard to get code r=/sr=/a=.
> I am trying to avoid the pain that rbs is suffering in bug 99010 where
> many files need to be reviewed by many people all at onece. It is my belief
> that once this is in and working that it will be easier to get the r=/sr=
> for the other UIs.
I tried that. This ends-up in the issues I described in my last comment.
Even the comment "it's only a port patch, code already exists in GTK+ gfx code"
does not prevent the case that the patches need a week or more until they are in
CVS. And this usually screws-up&blocks my other patches... and finally I am in
hell because I have to retarget bugs to later milestones - even if the code is
already "done" and "ready" for CVS... ;-(((
> 3) The common code in nsFontMetricsGTK.cpp and nsFontMetricsXlib.cpp
> should be shared not duplicated. We have been adding alot of code to
> nsFontMetricsGTK.cpp to do glyph search/fill-in that should be applied
> to Xlib/QT/PS/... We really should open a bug to factor out the common
> code.
That's what I am already doing for Xlib toolkit and the Xprint module. Both
share most of their code. For PS module it will be very difficult (I tried that
in my gfx/src/dps/ port) - and the GTK+ gfx stuff is IMHO useless because it's
functionality is 100% identical to the Xlib gfx code functionality (except the
add-ons for Xprint stuff).
In theory we could use C++ templates for code sharing between all those
modules... but I am not sure whether the coding guidelines would allow such an
exception...
----
bstell: Wanna r= that Xlib part of my new patch, please ?
Comment 20•23 years ago
|
||
(Forgot to cc: myself -- feel free to cc: me if I've failed to do so and you're
responding to my review comments. Thanks.)
The win in using natural-int sized loads and store is that on modern
architectures, a byte or short access can take longer to handle, because it
turns into a 32-bit or 64-bit load (either through compiler instruction
selection, where the compiler loads 32 bits and then shifts or otherwise
extracts the desired 16 bit half-word [see Alpha Architecture Reference Manual,
A.4.1, and note that Alpha calls a 16-bit unit a "Word"]; or in the CPU's
interface to [busses to] memory, where a similar process must take place).
Stores too may take longer to retire if 16-bit rather than 32- or 64-bit,
sometimes quite a bit longer; stores that do not fill cache lines completely
before the lines must be written back can lead to decimal order-of-magnitude
slowdowns, although stores aren't critical to this bug's patch.
Having written all that, I'm still in favor of your using a prbitmap_t or
unsigned long type, although NSPR may not provide the optimal type for all
platforms. As you say, the number of ALU ops in computing addresses and testing
bits does not vary with load size -- so why not use the best load size, which on
modern chips is not 16 bits?
/be
Comment 21•23 years ago
|
||
Cc'ing wtc for prbitmap_t and general CPU architecture optimization advice.
/be
Assignee | ||
Comment 22•23 years ago
|
||
> FreeCCMap(gUserDefinedCCMap); gUserDefinedCCMap = nsnull;
The FreeCCMap() takes a reference and after freeing it sets it to
null so the "gUserDefinedCCMap = nsnull;" part is not needed.
> PRUint32 map[2048];
See Brendan's comments about 2048.
Assignee | ||
Comment 23•23 years ago
|
||
So will the basic block stay 32 bytes?
I'm unclear on how to write platform independent code using prbitmap_t that
accesses a 32 byte block.
Can I get some help writing the CCMAP_HAS_CHAR macro using prbitmap_t?
Once that is done I am sure I can convert all the other code.
Comment 24•23 years ago
|
||
bstell: see next attachment. I reworked the macros to derive from
PR_BITS_PER_LONG_LOG2, which NSPR's prbit.h uses for its prbitmap_t macros. I
defined a PR_BITS_PER_BITMAP_LOG2 first, to hide the "LONG" implementation
detail (NSPR should do this -- wtc?).
I didn't rework all macros to minimize repeated magic or related numbers, but
that could be done, too (for instance, CCMAP_NUM_CHARS_PER_PAGE and
CCMAP_PAGE_MASK could both be based on a new CCMAP_BIT_INDEX_LOG2 macro, using
PR_BIT and PR_BITMASK respectively. You might want to do that, just so it's
easy to tweak fundamental parameters in one place if another change is needed.
/be
Comment 25•23 years ago
|
||
Assignee | ||
Comment 26•23 years ago
|
||
I will incorporate Brendan's work and suggestions and rework the Xlib code.
Comment 27•23 years ago
|
||
prbitmap_t is one of the things (Brendanisms?) NSPR
inherited from its predecessor (NSPR1). NSPR itself
doesn't use prbitmap_t, and I am not familiar with
prbitmap_t either.
The only comment about prbitmap_t in the source code is:
A prbitmap_t is an integer that can be used for bitmaps
It says nothing about optimal performance or natural int
size. So I am not sure that prbitmap_t is the right type
if those are your goals.
It seems that the simple 'int' is the integer type with
the natural size. 'int' is at least 16 bits. So seems
like 'int' (as opposed to prbitmap_t) should be the right
replacement for PRUint16 here.
Comment 28•23 years ago
|
||
wtc: it's not a brendan-ism, it's a kipp-ism. Are you going to deprecate it?
It seems worthwhile, given this bug and earlier (now defunct?) uses of it (kipp
used it in the Netscape JVM, IIRC). I imagine that int could be suboptimal on
Alpha, for instance, compared to long, if long is 64 bits.
/be
Comment 29•23 years ago
|
||
bstell: I don't think we need to fret too much about int vs. long vs. prbitmap_t
(which must be known to be a typedef for long, for us to be able to make use of
a PR_BITS_PER_LONG_LOG2 or similar macro). Sorry for belaboring the issue here
-- I wanted to get wtc's thoughts and NSPR commitments. If you prefer to keep
dependencies fewer, just use long instead of prbitmap_t, and avoid including
prbit.h. So long as we avoid PRUint16, I think we've optimized enough.
/be
Assignee | ||
Comment 30•23 years ago
|
||
In this area I am going to carefully avoid having an opinion.
Pick your poison:
Do we prefer long or prbitmap_t?
If I don't hear any strong opinions I will default to Erik's and Ftang's
choice of long.
Assignee | ||
Comment 31•23 years ago
|
||
Can I assume a long is always at least as long as a PRUint32?
Comment 32•23 years ago
|
||
> Can I assume a long is always at least as long as a PRUint32?
AFAIK per ANSI-C a |long| is at least 32bit large. I assume a PRUint32 has
exactly 32bits... therefore the answer is YES.
On most 64bit platoforms (like Alpha/SPARCv9-platforms) |long| is 64bit large.
Comment 33•23 years ago
|
||
long wins, if only because prbit.h fails to define a PR_BITS_PER_BITMAP_LOG2
alias for PR_BITS_PER_LONG_LOG2, forcing consumers to "know" the equivalence.
/be
Comment 34•23 years ago
|
||
Why don't you use int? long on Win16 is not
the natural integer size.
Seriously, C99 defines the type you want as
int_fast16_t (I understand you just need a
fast integer type that is at least 16 bits).
But C99 is not widely implemented yet.
Comment 35•23 years ago
|
||
Mozilla dropped Win16 support a long time ago. I'm ok with int, too, but have
gut-level doubts (memory from SGI and MicroUnity days) that in LP64 compilation
type models, on 64-bit CPUs, 32-bit loads will be slower.
/be
Comment 36•23 years ago
|
||
64-bit CPUs meaning the whole cache/memory interface, too. School me, I'm out
of date on architecture by a few years.
C99 sounds cool. We added uintN and intN to NSPR1 (now PRUintn and PRIntn in
NSPR) for the same reason as int_fast16_t -- to have a "natural" integer type of
at least 16 bits. Neither naming scheme is great.
Brian, you don't need to hang on this int vs. long debate, pick int to make wtc
happy and ignore my doubts -- we can always tune later if ever the cost shows up
(unlikely).
/be
Assignee | ||
Comment 37•23 years ago
|
||
Assignee | ||
Comment 38•23 years ago
|
||
I added a compile time switch to control the bit access:
#define ALU_SIZE PR_BITS_PER_LONG
//#define ALU_SIZE 16
//#define ALU_SIZE 32
//#define ALU_SIZE 64
I tested all 3 on my Redhat 7.1 Linux system.
Feel free to recompile with a different one active to try the other sizes.
Roland: would you kindly try this on Xlib?
Are we ready for sr= ?
Comment 39•23 years ago
|
||
bstell, the new files are not in that patch, can you cvs add them and produce a
cvs diff -uN from the top-most common directory, for an all-in-one patch? Also,
a few minor comments:
- It seems as though the pattern:
memset(map, 0, sizeof(map));mapper->FillInfo(map);MapToCCMap(map)
with nsresult checking elided, occurs a lot -- make a common subroutine?
- SetUpFontCharSetInfo(nsFontCharSetInfo* aSelf) should return a boolean so its
caller doesn't have to "know" to test charSetInfo->mCCMap?
- The "Now, single byte documents find these special chars before the CJK fonts
are searched so this is no longer needed and should be removed." comment is
new -- why not just remove the code that's no longer needed? If not, please
cite a bug # in the comment.
/be
Assignee | ||
Comment 40•23 years ago
|
||
Comment 41•23 years ago
|
||
Comment on attachment 49718 [details] [diff] [review]
patch with requested changes
Nit: The LOG2 macros actually compute POW2(n) -- rename? NSPR calls this
macro PR_BIT, btw.
The xlib port needs the citation of bug 100233, also.
'nother naming nit: GetCCMap is an allocator, not a getter -- NewCCMap? Might help
future-leak-proof it very slightly.
Too bad FONT_HAS_GLYPH wasn't parameterized by font, instead of font->mMap -- then it
could have been layered transparently on top of CCMAP_HAS_CHAR. Worth
doing now?
In nsCompressedCharMap::SetChars(PRUint32* aMap), there's an empty #if (CCMAP_BITS_PER_ALU == 16)\n#endif clause.
Also, alu_val could be declared where it's set to 0, in the loop.
In the next #elif clause, int v is set but never used.
Above that method, isn't SetChars(PRUint16* aCCMap) now dead code? If so, remove it and
any macros used only by its body.
Otherwise, looks good. I'm tiring of writing comments in this tiny textarea! My only
lingering thought is that the code might be clearer if you gave up
on the PRUint16 type and sizing and just used
ALU_TYPE throughout.
Code looks pure (just curious, have you run purify? Not necessary, just wondering).
sr=brendan@mozilla.org with the above nits picked, except for the PRUint16 abolition,
which I leave up to you.
/be
Attachment #49718 -
Flags: superreview+
Assignee | ||
Comment 42•23 years ago
|
||
I'll change LOG2 to CCMAP_POW2. I saw that PR_BIT did POW2 but it does not
handle a 64 bit number on a system with a 32 bit ALU. I also felt that using
it for POW2 would make obscure code since the name PR_BIT seems more like a
name that is used to generate a bitmask than to raise to a power of 2.
Is the comment "code might be clearer if you gave up on the PRUint16 type"
meant to include changing the what the upper and mid indexes are handled?
The table sizes for 16/32/64 bit "pointers" are:
+--------------------------------+------+------+------+
| pointers + data bits| 16 | 32 | 64 |
+--------------------------------+------+------+------+
| min 2*(16*alu_size) + 1*32 | 96 | 160 | 288 |
| avg 6*(16*alu_size) + 15*32 | 672 | 864 | 1248 |
| max 19*(16*alu_size) + 256*32 | 8800 | 9408 |10624 |
+--------------------------------+------+------+------+
The avg is based on my looking at a variety of actual western fonts.
I would prefer to use the smallest table size.
I have not used purify. I try to be very careful and look at every line in
the debugger (although I still make mistakes).
Thanks
Comment 43•23 years ago
|
||
bstell: right, the upper and mid cursors would want to stay PRUint16s. I see
the problem: either you have to declare differently typed (and aligned) storage
for those cursors and for the bitmaps, or you have to pun. Ok, I'm convinced.
I started questioning the PRUint16 parameterization partly because of the
SetChars that takes a PRUint16* -- that is a dead method, right?
/be
Assignee | ||
Comment 44•23 years ago
|
||
The SetChars(PRUint16*) was used in an earlier incantation but can now be
removed as you requested. I do plan to save its "fast scan" algorithm but I
can move that into a comment. (Because a CCMap "knows" which sections are
empty the code can skip those sections which allows for a "fast scan".)
The fast scan is not currently used but it is obscure enough and interesting
enough to save.
Comment 45•23 years ago
|
||
Comment on attachment 49718 [details] [diff] [review]
patch with requested changes
OK... I tested the patch on Solaris with Xlib and GTK+ build and a 64bit-Solaris-sparcv9 Xlib build
r=Roland.Mainz@informatik.med.uni-giessen.de
Attachment #49718 -
Flags: review+
Assignee | ||
Comment 46•23 years ago
|
||
checked in.
The FONT_HAS_GLYPH macro was in a lot of files:
gfx/src/gtk/nsFontMetricsGTK.cpp
gfx/src/mac/nsMacUnicodeFontInfo.cpp
gfx/src/os2/nsFontMetricsOS2.cpp
gfx/src/qt/nsFontMetricsQT.cpp
gfx/src/windows/nsFontMetricsWin.cpp
gfx/src/windows/nsRenderingContextWin.cpp
gfx/src/xlib/nsFontMetricsXlib.cpp
Should I make a patch for these and attach it to this bug report?
Assignee | ||
Comment 47•23 years ago
|
||
I should probably make a FONT_HAS_CHAR macro as a transition tool.
Comment 48•23 years ago
|
||
CC:'ing rbs@maths.uq.edu.au
Unix/Linux tinderbox machines are currently RED because the patches in bug 99010
predates this patch.
Comment 49•23 years ago
|
||
I have been keeping my changes in sync with the tip, and have ensured that
things were up-to-date at the time of my carpool. I suspect it is just mid-air
checkins since no errors are now reported on the logs.
Comment 50•23 years ago
|
||
OK, got it...
IS_REPRESENTABLE((*font)->mMap, ...) has been obsoleted, and all these bits
should be corrected, I am doing a s//g and will update...
Comment 51•23 years ago
|
||
rbs:
This is definately _no_ midcheckin.
All Unix/Linux tinderbox machines are RED or will be RED soon until both GTK+
and XLib toolkits gets fixed...
Comment 52•23 years ago
|
||
Should be fixed now. Please rebuild gtk after re-pulling gfx/src/gtk.
Let me know how your build goes -- it is much faster to just re-build gtk than
to wait for another round of Tinderbox, thanks.
Assignee | ||
Comment 53•23 years ago
|
||
This was checked in last week. Finally got performance results:
no performance impact
closing ...
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Comment 54•23 years ago
|
||
Switching qa contact to ftang for now. Frank, can this one be verified within
development or can you provide a test case for IQA? Thanks.
QA Contact: andreasb → ftang
You need to log in
before you can comment on or make changes to this bug.
Description
•