reduce memory usage of fontinfo map in Linux GFX

RESOLVED FIXED in mozilla0.9.5

Status

()

RESOLVED FIXED
17 years ago
17 years ago

People

(Reporter: ftang, Assigned: bstell)

Tracking

({memory-footprint})

Trunk
mozilla0.9.5
x86
Linux
memory-footprint
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(7 attachments)

(Reporter)

Description

17 years ago
Currently, GFX on Linux use a PRUint32 info[2048] to represent it can render 
unciode or not. 
bstell mention that he have a algorithm which cna reduce the needed memroy.
(Reporter)

Comment 1

17 years ago
bstell said the new algorith can reduce memory from 8K per encoding (for bitmap 
fonts) or per TrueType Font, to about 500 bytes.
footprint bug
Keywords: footprint
(Assignee)

Comment 2

17 years ago
isn't this a dup of bug 75260 ?
Status: NEW → ASSIGNED
(Reporter)

Comment 3

17 years ago
bstell said he will divide the 16 bits address space to a 3 stage lookup
4 x 4 x 8
So the smallest map will be
(2 ^ 4) x sizeof(pointer) + 2 x (2 ^ 4) x sizeof(pointer) + 2 x 32
= 256 bytes
and the largest map will be
(2 ^ 4) x sizeof(pointer) + 17 x (2 ^ 4) x sizeof(pointer) + 257 x 32
= 9376 bytes

he said most of the map for WGL4 fonts are 600-700 bytes and the larget font he
have use about 6K 

Currently, all the font use 8K bytes. 
(Reporter)

Comment 4

17 years ago
mark as m95
Target Milestone: --- → mozilla0.9.5
(Assignee)

Comment 5

17 years ago
Created attachment 46693 [details] [diff] [review]
patch; nsCompressedCharMap.cpp/nsCompressedCharMap.h
(Assignee)

Comment 6

17 years ago
Created attachment 46987 [details] [diff] [review]
patch; nsCompressedCharMap and nsFontMetricsGTK
(Assignee)

Comment 7

17 years ago
Frank made a a couple suggestions:

 1) Since the CCMap is 256 bit page oriented it would be more
    efficient to set bits from the 8K map a page at a time. This
    same interface would work for the copying pages from a CCMap.

 2) Some fonts have large areas of all bits set. So when setting 
    a page, if all the bits are set (all 1's) allocate an 
    "all bits set page" page to share.
(Assignee)

Comment 8

17 years ago
Created attachment 47105 [details] [diff] [review]
revised patch: loads by 256 bit "page", folds "all bits set" pages
(Assignee)

Comment 9

17 years ago
The code in attachment 47105 [details] [diff] [review] actually uses implements "pointers" using 
2 byte offsets so 

  the smallest table (empty map) is 3*32=96 bytes

  small fonts should be in the 200-700 byte range

  large fonts (ie: cyberbit) should be in the 4K range

  the max size is 8800 bytes
(Assignee)

Comment 10

17 years ago
Min map size = 32 bytes.
------------------------
Its actually possible for an empty map to fold the Upper pointers, the empty
mid pointers, and the empty page on top of each other and reduce the map
to 32 bytes total. This may be useful for a 20bit Unicode map. This is 
compatible with the current accessor macro.

Sharing Maps
------------------------
It seems reasonable that fonts released as a group (eg: Helvetica regular / 
bold / italic / bold-italic) might have the same char map. Thus we might
save more memory by sharing maps if we had low cost way to identify equal maps.
We might consider adding a 32 byte control block. It could hold the number of 
chars and a map signature/digest. These could be used as a quick check to see 
if 2 maps were not identical which would speed up the process of looking for
identical maps.

Comment 11

17 years ago
Comment on attachment 47105 [details] [diff] [review]
revised patch: loads by 256 bit "page", folds "all bits set" pages

/r=yokoyama

Updated

17 years ago
Attachment #47105 - Flags: review+

Comment 12

17 years ago
bstell - wanna include a patch for Xlib toolkit, too - or should I file a
seperate bug for this ?
(Assignee)

Comment 13

17 years ago
Hopefully this will propogate to all GFX versions (win,mac,xlib,ps,...).

I would prefer to open separate bugs and once this one has a couple days of 
soak then do the other versions. 
Nit: don't expandtab in Makefile.in, do use tabs (style only, in this case, but
if you modified a command line for a rule, you would need a tab or make would barf).

I'm wondering why you don't use prbitmap_t (prbit.h) instead of PRUint16, just
for the slight gain (many architectures do 32-bit or 64-bit -- unsigned long is
what prbitmap_t typedef's to -- loads, stores, and ALU-ops).  In light of that
idea, maybe the macros should use WORD instead of LONG or SHORT.

Also, the magic 2048 used to dimension PRUint32 arrays should be #defined with a
value computed from more primitive macros.

Early comments, I'll say more in a bit -- neat work.  Any reason to leave bug
75260 open still?

/be

Comment 15

17 years ago
Brian Stell wrote:
> Hopefully this will propogate to all GFX versions (win,mac,xlib,ps,...).
> 
> I would prefer to open separate bugs and once this one has a couple days of 
> soak then do the other versions.

I disagree in the nsFontMetricsXlib*.*-case. Filing seperate bugs to
"hunt&catch-up" with all changes in the GTK+ gfx stuff is a mess... it usually
happens like this: Once I pulled one change over I see three new changes in GTK+
stuff... OK, then I'll file new bugs, get r= but no quick sr=, then sr= in the
wrong order or no sr= because superreviewer cannot test them because the patches
depend on each other (well... the nsFontMetrics* code isn't bug enougth to avoid
this... ;-(( ). After all, I have my sr= for all patches, can check-in and still
lag far behind of the GTK+ gfx status.[1]
Not to mention a closed tree which requires a= because it waits to be
branched... ;-((
Filing "combined" patches which pull multiple changes from GTK+ to Xlib is
difficult, too - superreviewers always beat me because patches are usually
horrible big in those cases... ;-((

That's why I decided to spend a week to _sync_ the GTK+-->Xlib[2]
nsFontMetrics*.* sources and now try beg the people to patch GTK+ _and_ Xlib
both - which is _not_ difficult: Sources are ~99% identical and
nsFontMetricsXlib.* are build by default for the Xprint module.

Anyway... I am going to file a new patch for both GTK+ and Xlib gfx in ~1-2
hours...

[1]=Just the todays count: I am "monitoring" 11 (eleven!!) other bugs which will
patch nsFontMtricsGTK.* in this/next month... just guess how much time gets
wasted to sync all this code... ;-((
[2]=Which was only a GTK+ --> Xlib one-way sync... Xlib gfx stuff currently
contains many improvents which may be usefull in the GTK+ gfx stuff, too...
(Assignee)

Comment 16

17 years ago
Brendan:

Thanks for pointing out prbitmap_t. Like so much in the Mozilla I was not
aware of it. I will take a look. 

In the current design every block is the same 32 bytes and this give fairly
good memory usage and makes allocating / accessing simple. You mention 
there might be a "slight gain" in performance using 32/64 bit accessing 
instead of 16 bit. Ftang mentioned using larger accesses but since we would 
have the same number of SHIFT/AND and INDEX/LOAD operations I am unclear 
where the actual gain would be.

Bug 75260 is for windows as opposed to linux. There are several bugs open 
which are very similar to this bug. I will clean this up once this is done.

Yes, you are right I should clean up and define 2048 instead of just 
following the existing code.

(Assignee)

Comment 17

17 years ago
Roland:

I feel there are 3 separate issues here:

1) I want to measure the real performance impact before we apply this to 
all platforms.

2) It is very hard to get code r=/sr=/a=.
I am trying to avoid the pain that rbs is suffering in bug 99010 where
many files need to be reviewed by many people all at onece. It is my belief 
that once this is in and working that it will be easier to get the r=/sr= 
for the other UIs.

3) The common code in nsFontMetricsGTK.cpp and nsFontMetricsXlib.cpp
should be shared not duplicated. We have been adding alot of code to
nsFontMetricsGTK.cpp to do glyph search/fill-in that should be applied
to Xlib/QT/PS/... We really should open a bug to factor out the common
code.



Comment 18

17 years ago
Created attachment 49233 [details] [diff] [review]
New patch for both GTK+ and Xlib gfx/

Comment 19

17 years ago
> 2) It is very hard to get code r=/sr=/a=.
> I am trying to avoid the pain that rbs is suffering in bug 99010 where
> many files need to be reviewed by many people all at onece. It is my belief 
> that once this is in and working that it will be easier to get the r=/sr= 
> for the other UIs.

I tried that. This ends-up in the issues I described in my last comment.
Even the comment "it's only a port patch, code already exists in GTK+ gfx code"
does not prevent the case that the patches need a week or more until they are in
CVS. And this usually screws-up&blocks my other patches... and finally I am in
hell because I have to retarget bugs to later milestones - even if the code is
already "done" and "ready" for CVS... ;-(((

> 3) The common code in nsFontMetricsGTK.cpp and nsFontMetricsXlib.cpp
> should be shared not duplicated. We have been adding alot of code to
> nsFontMetricsGTK.cpp to do glyph search/fill-in that should be applied
> to Xlib/QT/PS/... We really should open a bug to factor out the common
> code.

That's what I am already doing for Xlib toolkit and the Xprint module. Both
share most of their code. For PS module it will be very difficult (I tried that
in my gfx/src/dps/ port) - and the GTK+ gfx stuff is IMHO useless because it's
functionality is 100% identical to the Xlib gfx code functionality (except the
add-ons for Xprint stuff).

In theory we could use C++ templates for code sharing between all those
modules... but I am not sure whether the coding guidelines would allow such an
exception...

----

bstell: Wanna r= that Xlib part of my new patch, please ?
(Forgot to cc: myself -- feel free to cc: me if I've failed to do so and you're
responding to my review comments.  Thanks.)

The win in using natural-int sized loads and store is that on modern
architectures, a byte or short access can take longer to handle, because it
turns into a 32-bit or 64-bit load (either through compiler instruction
selection, where the compiler loads 32 bits and then shifts or otherwise
extracts the desired 16 bit half-word [see Alpha Architecture Reference Manual,
A.4.1, and note that Alpha calls a 16-bit unit a "Word"]; or in the CPU's
interface to [busses to] memory, where a similar process must take place).

Stores too may take longer to retire if 16-bit rather than 32- or 64-bit,
sometimes quite a bit longer; stores that do not fill cache lines completely
before the lines must be written back can lead to decimal order-of-magnitude
slowdowns, although stores aren't critical to this bug's patch.

Having written all that, I'm still in favor of your using a prbitmap_t or
unsigned long type, although NSPR may not provide the optimal type for all
platforms.  As you say, the number of ALU ops in computing addresses and testing
bits does not vary with load size -- so why not use the best load size, which on
modern chips is not 16 bits?

/be
Cc'ing wtc for prbitmap_t and general CPU architecture optimization advice.

/be
(Assignee)

Comment 22

17 years ago
> FreeCCMap(gUserDefinedCCMap);            gUserDefinedCCMap = nsnull;

The FreeCCMap() takes a reference and after freeing it sets it to
null so the "gUserDefinedCCMap = nsnull;" part is not needed.

>  PRUint32 map[2048];

See Brendan's comments about 2048.

(Assignee)

Comment 23

17 years ago
So will the basic block stay 32 bytes?

I'm unclear on how to write platform independent code using prbitmap_t that
accesses a 32 byte block.

Can I get some help writing the CCMAP_HAS_CHAR macro using prbitmap_t?

Once that is done I am sure I can convert all the other code.
bstell: see next attachment.  I reworked the macros to derive from
PR_BITS_PER_LONG_LOG2, which NSPR's prbit.h uses for its prbitmap_t macros.  I
defined a PR_BITS_PER_BITMAP_LOG2 first, to hide the "LONG" implementation
detail (NSPR should do this -- wtc?).

I didn't rework all macros to minimize repeated magic or related numbers, but
that could be done, too (for instance, CCMAP_NUM_CHARS_PER_PAGE and
CCMAP_PAGE_MASK could both be based on a new CCMAP_BIT_INDEX_LOG2 macro, using
PR_BIT and PR_BITMASK respectively.  You might want to do that, just so it's
easy to tweak fundamental parameters in one place if another change is needed.

/be
Created attachment 49294 [details]
fragment rederiving CCMAP_BIT_INDEX &c from prbitmap_t bits
(Assignee)

Comment 26

17 years ago
I will incorporate Brendan's work and suggestions and rework the Xlib code.

Comment 27

17 years ago
prbitmap_t is one of the things (Brendanisms?) NSPR
inherited from its predecessor (NSPR1).  NSPR itself
doesn't use prbitmap_t, and I am not familiar with
prbitmap_t either.

The only comment about prbitmap_t in the source code is:
  A prbitmap_t is an integer that can be used for bitmaps
It says nothing about optimal performance or natural int
size.  So I am not sure that prbitmap_t is the right type
if those are your goals.

It seems that the simple 'int' is the integer type with
the natural size.  'int' is at least 16 bits.  So seems
like 'int' (as opposed to prbitmap_t) should be the right
replacement for PRUint16 here.
wtc: it's not a brendan-ism, it's a kipp-ism.  Are you going to deprecate it?
It seems worthwhile, given this bug and earlier (now defunct?) uses of it (kipp
used it in the Netscape JVM, IIRC).  I imagine that int could be suboptimal on
Alpha, for instance, compared to long, if long is 64 bits.

/be

bstell: I don't think we need to fret too much about int vs. long vs. prbitmap_t
(which must be known to be a typedef for long, for us to be able to make use of
a PR_BITS_PER_LONG_LOG2 or similar macro).  Sorry for belaboring the issue here
-- I wanted to get wtc's thoughts and NSPR commitments.  If you prefer to keep
dependencies fewer, just use long instead of prbitmap_t, and avoid including
prbit.h.  So long as we avoid PRUint16, I think we've optimized enough.

/be
(Assignee)

Comment 30

17 years ago
In this area I am going to carefully avoid having an opinion.

Pick your poison: 

  Do we prefer long or prbitmap_t?

If I don't hear any strong opinions I will default to Erik's and Ftang's 
choice of long.
(Assignee)

Comment 31

17 years ago
Can I assume a long is always at least as long as a PRUint32?

Comment 32

17 years ago
> Can I assume a long is always at least as long as a PRUint32?

AFAIK per ANSI-C a |long| is at least 32bit large. I assume a PRUint32 has
exactly 32bits... therefore the answer is YES.

On most 64bit platoforms (like Alpha/SPARCv9-platforms) |long| is 64bit large.
long wins, if only because prbit.h fails to define a PR_BITS_PER_BITMAP_LOG2
alias for PR_BITS_PER_LONG_LOG2, forcing consumers to "know" the equivalence.

/be

Comment 34

17 years ago
Why don't you use int?  long on Win16 is not
the natural integer size.

Seriously, C99 defines the type you want as
int_fast16_t (I understand you just need a
fast integer type that is at least 16 bits).
But C99 is not widely implemented yet.
Mozilla dropped Win16 support a long time ago.  I'm ok with int, too, but have
gut-level doubts (memory from SGI and MicroUnity days) that in LP64 compilation
type models, on 64-bit CPUs, 32-bit loads will be slower.

/be
64-bit CPUs meaning the whole cache/memory interface, too.  School me, I'm out
of date on architecture by a few years.

C99 sounds cool.  We added uintN and intN to NSPR1 (now PRUintn and PRIntn in
NSPR) for the same reason as int_fast16_t -- to have a "natural" integer type of
at least 16 bits.  Neither naming scheme is great.

Brian, you don't need to hang on this int vs. long debate, pick int to make wtc
happy and ignore my doubts -- we can always tune later if ever the cost shows up
(unlikely).

/be
(Assignee)

Comment 37

17 years ago
Created attachment 49429 [details] [diff] [review]
patch; the page access is compile time switchable 16/32/64 bit
(Assignee)

Comment 38

17 years ago
I added a compile time switch to control the bit access:

  #define ALU_SIZE PR_BITS_PER_LONG
  //#define ALU_SIZE 16
  //#define ALU_SIZE 32
  //#define ALU_SIZE 64

I tested all 3 on my Redhat 7.1 Linux system.

Feel free to recompile with a different one active to try the other sizes.

Roland: would you kindly try this on Xlib?

Are we ready for sr= ?
bstell, the new files are not in that patch, can you cvs add them and produce a
cvs diff -uN from the top-most common directory, for an all-in-one patch?  Also,
a few minor comments:

- It seems as though the pattern:
  memset(map, 0, sizeof(map));mapper->FillInfo(map);MapToCCMap(map)
  with nsresult checking elided, occurs a lot -- make a common subroutine?

- SetUpFontCharSetInfo(nsFontCharSetInfo* aSelf) should return a boolean so its
  caller doesn't have to "know" to test charSetInfo->mCCMap?

- The "Now, single byte documents find these special chars before the CJK fonts
  are searched so this is no longer needed and should be removed." comment is
  new -- why not just remove the code that's no longer needed?  If not, please
  cite a bug # in the comment.

/be
(Assignee)

Comment 40

17 years ago
Created attachment 49718 [details] [diff] [review]
patch with requested changes
Comment on attachment 49718 [details] [diff] [review]
patch with requested changes

Nit: The LOG2 macros actually compute POW2(n) -- rename?  NSPR calls this
macro PR_BIT, btw.

The xlib port needs the citation of bug 100233, also.

'nother naming nit: GetCCMap is an allocator, not a getter -- NewCCMap?  Might help
future-leak-proof it very slightly.

Too bad FONT_HAS_GLYPH wasn't parameterized by font, instead of font->mMap -- then it
could have been layered transparently on top of CCMAP_HAS_CHAR.  Worth
doing now?

In nsCompressedCharMap::SetChars(PRUint32* aMap), there's an empty #if (CCMAP_BITS_PER_ALU == 16)\n#endif clause.
Also, alu_val could be declared where it's set to 0, in the loop.
In the next #elif clause, int v is set but never used.

Above that method, isn't SetChars(PRUint16* aCCMap) now dead code?  If so, remove it and
any macros used only by its body.

Otherwise, looks good.  I'm tiring of writing comments in this tiny textarea!  My only
lingering thought is that the code might be clearer if you gave up
on the PRUint16 type and sizing and just used
ALU_TYPE throughout.

Code looks pure (just curious, have you run purify?  Not necessary, just wondering).

sr=brendan@mozilla.org with the above nits picked, except for the PRUint16 abolition,
which I leave up to you.

/be
Attachment #49718 - Flags: superreview+
(Assignee)

Comment 42

17 years ago
I'll change LOG2 to CCMAP_POW2. I saw that PR_BIT did POW2 but it does not 
handle a 64 bit number on a system with a 32 bit ALU. I also felt that using 
it for POW2 would make obscure code since the name PR_BIT seems more like a 
name that is used to generate a bitmask than to raise to a power of 2.

Is the comment "code might be clearer if you gave up on the PRUint16 type"
meant to include changing the what the upper and mid indexes are handled?
The table sizes for 16/32/64 bit "pointers" are:
+--------------------------------+------+------+------+
|            pointers + data bits|  16  |  32  |  64  |
+--------------------------------+------+------+------+
| min   2*(16*alu_size) +   1*32 |   96 |  160 |  288 |
| avg   6*(16*alu_size) +  15*32 |  672 |  864 | 1248 |
| max  19*(16*alu_size) + 256*32 | 8800 | 9408 |10624 |
+--------------------------------+------+------+------+
The avg is based on my looking at a variety of actual western fonts.
I would prefer to use the smallest table size.

I have not used purify. I try to be very careful and look at every line in
the debugger (although I still make mistakes).

Thanks

bstell: right, the upper and mid cursors would want to stay PRUint16s.  I see
the problem: either you have to declare differently typed (and aligned) storage
for those cursors and for the bitmaps, or you have to pun.  Ok, I'm convinced.
I started questioning the PRUint16 parameterization partly because of the
SetChars that takes a PRUint16* -- that is a dead method, right?

/be
(Assignee)

Comment 44

17 years ago
The SetChars(PRUint16*) was used in an earlier incantation but can now be 
removed as you requested. I do plan to save its "fast scan" algorithm but I 
can move that into a comment. (Because a CCMap "knows" which sections are 
empty the code can skip those sections which allows for a "fast scan".)
The fast scan is not currently used but it is obscure enough and interesting
enough to save.

Comment 45

17 years ago
Comment on attachment 49718 [details] [diff] [review]
patch with requested changes

OK... I tested the patch on Solaris with Xlib and GTK+ build and a 64bit-Solaris-sparcv9 Xlib build
r=Roland.Mainz@informatik.med.uni-giessen.de
Attachment #49718 - Flags: review+
(Assignee)

Comment 46

17 years ago
checked in.

The FONT_HAS_GLYPH macro was in a lot of files:

 gfx/src/gtk/nsFontMetricsGTK.cpp
 gfx/src/mac/nsMacUnicodeFontInfo.cpp
 gfx/src/os2/nsFontMetricsOS2.cpp
 gfx/src/qt/nsFontMetricsQT.cpp
 gfx/src/windows/nsFontMetricsWin.cpp
 gfx/src/windows/nsRenderingContextWin.cpp
 gfx/src/xlib/nsFontMetricsXlib.cpp

Should I make a patch for these and attach it to this bug report?
(Assignee)

Comment 47

17 years ago
I should probably make a FONT_HAS_CHAR macro as a transition tool.

Comment 48

17 years ago
CC:'ing rbs@maths.uq.edu.au

Unix/Linux tinderbox machines are currently RED because the patches in bug 99010
predates this patch.

Comment 49

17 years ago
I have been keeping my changes in sync with the tip, and have ensured that 
things were up-to-date at the time of my carpool. I suspect it is just mid-air 
checkins since no errors are now reported on the logs.

Comment 50

17 years ago
OK, got it... 
IS_REPRESENTABLE((*font)->mMap, ...) has been obsoleted, and all these bits
should be corrected, I am doing a s//g and will update...

Comment 51

17 years ago
rbs:
This is definately _no_ midcheckin. 
All Unix/Linux tinderbox machines are RED or will be RED soon until both GTK+
and XLib toolkits gets fixed...

Comment 52

17 years ago
Should be fixed now. Please rebuild gtk after re-pulling gfx/src/gtk.
Let me know how your build goes -- it is much faster to just re-build gtk than
to wait for another round of Tinderbox, thanks.
(Assignee)

Comment 53

17 years ago
This was checked in last week. Finally got performance results: 

  no performance impact

closing ...
Status: ASSIGNED → RESOLVED
Last Resolved: 17 years ago
Resolution: --- → FIXED

Comment 54

17 years ago
Switching qa contact to ftang for now. Frank, can this one be verified within
development or can you provide a test case for IQA?  Thanks.
QA Contact: andreasb → ftang
You need to log in before you can comment on or make changes to this bug.