9.74 KB, patch
|Details | Diff | Splinter Review|
26.64 KB, patch
|Details | Diff | Splinter Review|
28.96 KB, patch
|Details | Diff | Splinter Review|
42.93 KB, patch
|Details | Diff | Splinter Review|
1.50 KB, text/plain
29.42 KB, patch
|Details | Diff | Splinter Review|
48.08 KB, patch
|Details | Diff | Splinter Review|
Currently, GFX on Linux use a PRUint32 info to represent it can render unciode or not. bstell mention that he have a algorithm which cna reduce the needed memroy.
bstell said the new algorith can reduce memory from 8K per encoding (for bitmap fonts) or per TrueType Font, to about 500 bytes. footprint bug
isn't this a dup of bug 75260 ?
Status: NEW → ASSIGNED
bstell said he will divide the 16 bits address space to a 3 stage lookup 4 x 4 x 8 So the smallest map will be (2 ^ 4) x sizeof(pointer) + 2 x (2 ^ 4) x sizeof(pointer) + 2 x 32 = 256 bytes and the largest map will be (2 ^ 4) x sizeof(pointer) + 17 x (2 ^ 4) x sizeof(pointer) + 257 x 32 = 9376 bytes he said most of the map for WGL4 fonts are 600-700 bytes and the larget font he have use about 6K Currently, all the font use 8K bytes.
mark as m95
Target Milestone: --- → mozilla0.9.5
Created attachment 46693 [details] [diff] [review] patch; nsCompressedCharMap.cpp/nsCompressedCharMap.h
Frank made a a couple suggestions: 1) Since the CCMap is 256 bit page oriented it would be more efficient to set bits from the 8K map a page at a time. This same interface would work for the copying pages from a CCMap. 2) Some fonts have large areas of all bits set. So when setting a page, if all the bits are set (all 1's) allocate an "all bits set page" page to share.
Created attachment 47105 [details] [diff] [review] revised patch: loads by 256 bit "page", folds "all bits set" pages
The code in attachment 47105 [details] [diff] [review] actually uses implements "pointers" using 2 byte offsets so the smallest table (empty map) is 3*32=96 bytes small fonts should be in the 200-700 byte range large fonts (ie: cyberbit) should be in the 4K range the max size is 8800 bytes
Min map size = 32 bytes. ------------------------ Its actually possible for an empty map to fold the Upper pointers, the empty mid pointers, and the empty page on top of each other and reduce the map to 32 bytes total. This may be useful for a 20bit Unicode map. This is compatible with the current accessor macro. Sharing Maps ------------------------ It seems reasonable that fonts released as a group (eg: Helvetica regular / bold / italic / bold-italic) might have the same char map. Thus we might save more memory by sharing maps if we had low cost way to identify equal maps. We might consider adding a 32 byte control block. It could hold the number of chars and a map signature/digest. These could be used as a quick check to see if 2 maps were not identical which would speed up the process of looking for identical maps.
Comment on attachment 47105 [details] [diff] [review] revised patch: loads by 256 bit "page", folds "all bits set" pages /r=yokoyama
bstell - wanna include a patch for Xlib toolkit, too - or should I file a seperate bug for this ?
Hopefully this will propogate to all GFX versions (win,mac,xlib,ps,...). I would prefer to open separate bugs and once this one has a couple days of soak then do the other versions.
Nit: don't expandtab in Makefile.in, do use tabs (style only, in this case, but if you modified a command line for a rule, you would need a tab or make would barf). I'm wondering why you don't use prbitmap_t (prbit.h) instead of PRUint16, just for the slight gain (many architectures do 32-bit or 64-bit -- unsigned long is what prbitmap_t typedef's to -- loads, stores, and ALU-ops). In light of that idea, maybe the macros should use WORD instead of LONG or SHORT. Also, the magic 2048 used to dimension PRUint32 arrays should be #defined with a value computed from more primitive macros. Early comments, I'll say more in a bit -- neat work. Any reason to leave bug 75260 open still? /be
Brian Stell wrote: > Hopefully this will propogate to all GFX versions (win,mac,xlib,ps,...). > > I would prefer to open separate bugs and once this one has a couple days of > soak then do the other versions. I disagree in the nsFontMetricsXlib*.*-case. Filing seperate bugs to "hunt&catch-up" with all changes in the GTK+ gfx stuff is a mess... it usually happens like this: Once I pulled one change over I see three new changes in GTK+ stuff... OK, then I'll file new bugs, get r= but no quick sr=, then sr= in the wrong order or no sr= because superreviewer cannot test them because the patches depend on each other (well... the nsFontMetrics* code isn't bug enougth to avoid this... ;-(( ). After all, I have my sr= for all patches, can check-in and still lag far behind of the GTK+ gfx status. Not to mention a closed tree which requires a= because it waits to be branched... ;-(( Filing "combined" patches which pull multiple changes from GTK+ to Xlib is difficult, too - superreviewers always beat me because patches are usually horrible big in those cases... ;-(( That's why I decided to spend a week to _sync_ the GTK+-->Xlib nsFontMetrics*.* sources and now try beg the people to patch GTK+ _and_ Xlib both - which is _not_ difficult: Sources are ~99% identical and nsFontMetricsXlib.* are build by default for the Xprint module. Anyway... I am going to file a new patch for both GTK+ and Xlib gfx in ~1-2 hours... =Just the todays count: I am "monitoring" 11 (eleven!!) other bugs which will patch nsFontMtricsGTK.* in this/next month... just guess how much time gets wasted to sync all this code... ;-(( =Which was only a GTK+ --> Xlib one-way sync... Xlib gfx stuff currently contains many improvents which may be usefull in the GTK+ gfx stuff, too...
Brendan: Thanks for pointing out prbitmap_t. Like so much in the Mozilla I was not aware of it. I will take a look. In the current design every block is the same 32 bytes and this give fairly good memory usage and makes allocating / accessing simple. You mention there might be a "slight gain" in performance using 32/64 bit accessing instead of 16 bit. Ftang mentioned using larger accesses but since we would have the same number of SHIFT/AND and INDEX/LOAD operations I am unclear where the actual gain would be. Bug 75260 is for windows as opposed to linux. There are several bugs open which are very similar to this bug. I will clean this up once this is done. Yes, you are right I should clean up and define 2048 instead of just following the existing code.
Roland: I feel there are 3 separate issues here: 1) I want to measure the real performance impact before we apply this to all platforms. 2) It is very hard to get code r=/sr=/a=. I am trying to avoid the pain that rbs is suffering in bug 99010 where many files need to be reviewed by many people all at onece. It is my belief that once this is in and working that it will be easier to get the r=/sr= for the other UIs. 3) The common code in nsFontMetricsGTK.cpp and nsFontMetricsXlib.cpp should be shared not duplicated. We have been adding alot of code to nsFontMetricsGTK.cpp to do glyph search/fill-in that should be applied to Xlib/QT/PS/... We really should open a bug to factor out the common code.
> 2) It is very hard to get code r=/sr=/a=. > I am trying to avoid the pain that rbs is suffering in bug 99010 where > many files need to be reviewed by many people all at onece. It is my belief > that once this is in and working that it will be easier to get the r=/sr= > for the other UIs. I tried that. This ends-up in the issues I described in my last comment. Even the comment "it's only a port patch, code already exists in GTK+ gfx code" does not prevent the case that the patches need a week or more until they are in CVS. And this usually screws-up&blocks my other patches... and finally I am in hell because I have to retarget bugs to later milestones - even if the code is already "done" and "ready" for CVS... ;-((( > 3) The common code in nsFontMetricsGTK.cpp and nsFontMetricsXlib.cpp > should be shared not duplicated. We have been adding alot of code to > nsFontMetricsGTK.cpp to do glyph search/fill-in that should be applied > to Xlib/QT/PS/... We really should open a bug to factor out the common > code. That's what I am already doing for Xlib toolkit and the Xprint module. Both share most of their code. For PS module it will be very difficult (I tried that in my gfx/src/dps/ port) - and the GTK+ gfx stuff is IMHO useless because it's functionality is 100% identical to the Xlib gfx code functionality (except the add-ons for Xprint stuff). In theory we could use C++ templates for code sharing between all those modules... but I am not sure whether the coding guidelines would allow such an exception... ---- bstell: Wanna r= that Xlib part of my new patch, please ?
(Forgot to cc: myself -- feel free to cc: me if I've failed to do so and you're responding to my review comments. Thanks.) The win in using natural-int sized loads and store is that on modern architectures, a byte or short access can take longer to handle, because it turns into a 32-bit or 64-bit load (either through compiler instruction selection, where the compiler loads 32 bits and then shifts or otherwise extracts the desired 16 bit half-word [see Alpha Architecture Reference Manual, A.4.1, and note that Alpha calls a 16-bit unit a "Word"]; or in the CPU's interface to [busses to] memory, where a similar process must take place). Stores too may take longer to retire if 16-bit rather than 32- or 64-bit, sometimes quite a bit longer; stores that do not fill cache lines completely before the lines must be written back can lead to decimal order-of-magnitude slowdowns, although stores aren't critical to this bug's patch. Having written all that, I'm still in favor of your using a prbitmap_t or unsigned long type, although NSPR may not provide the optimal type for all platforms. As you say, the number of ALU ops in computing addresses and testing bits does not vary with load size -- so why not use the best load size, which on modern chips is not 16 bits? /be
Cc'ing wtc for prbitmap_t and general CPU architecture optimization advice. /be
> FreeCCMap(gUserDefinedCCMap); gUserDefinedCCMap = nsnull; The FreeCCMap() takes a reference and after freeing it sets it to null so the "gUserDefinedCCMap = nsnull;" part is not needed. > PRUint32 map; See Brendan's comments about 2048.
So will the basic block stay 32 bytes? I'm unclear on how to write platform independent code using prbitmap_t that accesses a 32 byte block. Can I get some help writing the CCMAP_HAS_CHAR macro using prbitmap_t? Once that is done I am sure I can convert all the other code.
bstell: see next attachment. I reworked the macros to derive from PR_BITS_PER_LONG_LOG2, which NSPR's prbit.h uses for its prbitmap_t macros. I defined a PR_BITS_PER_BITMAP_LOG2 first, to hide the "LONG" implementation detail (NSPR should do this -- wtc?). I didn't rework all macros to minimize repeated magic or related numbers, but that could be done, too (for instance, CCMAP_NUM_CHARS_PER_PAGE and CCMAP_PAGE_MASK could both be based on a new CCMAP_BIT_INDEX_LOG2 macro, using PR_BIT and PR_BITMASK respectively. You might want to do that, just so it's easy to tweak fundamental parameters in one place if another change is needed. /be
I will incorporate Brendan's work and suggestions and rework the Xlib code.
prbitmap_t is one of the things (Brendanisms?) NSPR inherited from its predecessor (NSPR1). NSPR itself doesn't use prbitmap_t, and I am not familiar with prbitmap_t either. The only comment about prbitmap_t in the source code is: A prbitmap_t is an integer that can be used for bitmaps It says nothing about optimal performance or natural int size. So I am not sure that prbitmap_t is the right type if those are your goals. It seems that the simple 'int' is the integer type with the natural size. 'int' is at least 16 bits. So seems like 'int' (as opposed to prbitmap_t) should be the right replacement for PRUint16 here.
wtc: it's not a brendan-ism, it's a kipp-ism. Are you going to deprecate it? It seems worthwhile, given this bug and earlier (now defunct?) uses of it (kipp used it in the Netscape JVM, IIRC). I imagine that int could be suboptimal on Alpha, for instance, compared to long, if long is 64 bits. /be
bstell: I don't think we need to fret too much about int vs. long vs. prbitmap_t (which must be known to be a typedef for long, for us to be able to make use of a PR_BITS_PER_LONG_LOG2 or similar macro). Sorry for belaboring the issue here -- I wanted to get wtc's thoughts and NSPR commitments. If you prefer to keep dependencies fewer, just use long instead of prbitmap_t, and avoid including prbit.h. So long as we avoid PRUint16, I think we've optimized enough. /be
In this area I am going to carefully avoid having an opinion. Pick your poison: Do we prefer long or prbitmap_t? If I don't hear any strong opinions I will default to Erik's and Ftang's choice of long.
Can I assume a long is always at least as long as a PRUint32?
> Can I assume a long is always at least as long as a PRUint32? AFAIK per ANSI-C a |long| is at least 32bit large. I assume a PRUint32 has exactly 32bits... therefore the answer is YES. On most 64bit platoforms (like Alpha/SPARCv9-platforms) |long| is 64bit large.
long wins, if only because prbit.h fails to define a PR_BITS_PER_BITMAP_LOG2 alias for PR_BITS_PER_LONG_LOG2, forcing consumers to "know" the equivalence. /be
Why don't you use int? long on Win16 is not the natural integer size. Seriously, C99 defines the type you want as int_fast16_t (I understand you just need a fast integer type that is at least 16 bits). But C99 is not widely implemented yet.
Mozilla dropped Win16 support a long time ago. I'm ok with int, too, but have gut-level doubts (memory from SGI and MicroUnity days) that in LP64 compilation type models, on 64-bit CPUs, 32-bit loads will be slower. /be
64-bit CPUs meaning the whole cache/memory interface, too. School me, I'm out of date on architecture by a few years. C99 sounds cool. We added uintN and intN to NSPR1 (now PRUintn and PRIntn in NSPR) for the same reason as int_fast16_t -- to have a "natural" integer type of at least 16 bits. Neither naming scheme is great. Brian, you don't need to hang on this int vs. long debate, pick int to make wtc happy and ignore my doubts -- we can always tune later if ever the cost shows up (unlikely). /be
Created attachment 49429 [details] [diff] [review] patch; the page access is compile time switchable 16/32/64 bit
I added a compile time switch to control the bit access: #define ALU_SIZE PR_BITS_PER_LONG //#define ALU_SIZE 16 //#define ALU_SIZE 32 //#define ALU_SIZE 64 I tested all 3 on my Redhat 7.1 Linux system. Feel free to recompile with a different one active to try the other sizes. Roland: would you kindly try this on Xlib? Are we ready for sr= ?
bstell, the new files are not in that patch, can you cvs add them and produce a cvs diff -uN from the top-most common directory, for an all-in-one patch? Also, a few minor comments: - It seems as though the pattern: memset(map, 0, sizeof(map));mapper->FillInfo(map);MapToCCMap(map) with nsresult checking elided, occurs a lot -- make a common subroutine? - SetUpFontCharSetInfo(nsFontCharSetInfo* aSelf) should return a boolean so its caller doesn't have to "know" to test charSetInfo->mCCMap? - The "Now, single byte documents find these special chars before the CJK fonts are searched so this is no longer needed and should be removed." comment is new -- why not just remove the code that's no longer needed? If not, please cite a bug # in the comment. /be
Comment on attachment 49718 [details] [diff] [review] patch with requested changes Nit: The LOG2 macros actually compute POW2(n) -- rename? NSPR calls this macro PR_BIT, btw. The xlib port needs the citation of bug 100233, also. 'nother naming nit: GetCCMap is an allocator, not a getter -- NewCCMap? Might help future-leak-proof it very slightly. Too bad FONT_HAS_GLYPH wasn't parameterized by font, instead of font->mMap -- then it could have been layered transparently on top of CCMAP_HAS_CHAR. Worth doing now? In nsCompressedCharMap::SetChars(PRUint32* aMap), there's an empty #if (CCMAP_BITS_PER_ALU == 16)\n#endif clause. Also, alu_val could be declared where it's set to 0, in the loop. In the next #elif clause, int v is set but never used. Above that method, isn't SetChars(PRUint16* aCCMap) now dead code? If so, remove it and any macros used only by its body. Otherwise, looks good. I'm tiring of writing comments in this tiny textarea! My only lingering thought is that the code might be clearer if you gave up on the PRUint16 type and sizing and just used ALU_TYPE throughout. Code looks pure (just curious, have you run purify? Not necessary, just wondering). email@example.com with the above nits picked, except for the PRUint16 abolition, which I leave up to you. /be
Attachment #49718 - Flags: superreview+
I'll change LOG2 to CCMAP_POW2. I saw that PR_BIT did POW2 but it does not handle a 64 bit number on a system with a 32 bit ALU. I also felt that using it for POW2 would make obscure code since the name PR_BIT seems more like a name that is used to generate a bitmask than to raise to a power of 2. Is the comment "code might be clearer if you gave up on the PRUint16 type" meant to include changing the what the upper and mid indexes are handled? The table sizes for 16/32/64 bit "pointers" are: +--------------------------------+------+------+------+ | pointers + data bits| 16 | 32 | 64 | +--------------------------------+------+------+------+ | min 2*(16*alu_size) + 1*32 | 96 | 160 | 288 | | avg 6*(16*alu_size) + 15*32 | 672 | 864 | 1248 | | max 19*(16*alu_size) + 256*32 | 8800 | 9408 |10624 | +--------------------------------+------+------+------+ The avg is based on my looking at a variety of actual western fonts. I would prefer to use the smallest table size. I have not used purify. I try to be very careful and look at every line in the debugger (although I still make mistakes). Thanks
bstell: right, the upper and mid cursors would want to stay PRUint16s. I see the problem: either you have to declare differently typed (and aligned) storage for those cursors and for the bitmaps, or you have to pun. Ok, I'm convinced. I started questioning the PRUint16 parameterization partly because of the SetChars that takes a PRUint16* -- that is a dead method, right? /be
The SetChars(PRUint16*) was used in an earlier incantation but can now be removed as you requested. I do plan to save its "fast scan" algorithm but I can move that into a comment. (Because a CCMap "knows" which sections are empty the code can skip those sections which allows for a "fast scan".) The fast scan is not currently used but it is obscure enough and interesting enough to save.
Comment on attachment 49718 [details] [diff] [review] patch with requested changes OK... I tested the patch on Solaris with Xlib and GTK+ build and a 64bit-Solaris-sparcv9 Xlib build r=Roland.Mainz@informatik.med.uni-giessen.de
Attachment #49718 - Flags: review+
checked in. The FONT_HAS_GLYPH macro was in a lot of files: gfx/src/gtk/nsFontMetricsGTK.cpp gfx/src/mac/nsMacUnicodeFontInfo.cpp gfx/src/os2/nsFontMetricsOS2.cpp gfx/src/qt/nsFontMetricsQT.cpp gfx/src/windows/nsFontMetricsWin.cpp gfx/src/windows/nsRenderingContextWin.cpp gfx/src/xlib/nsFontMetricsXlib.cpp Should I make a patch for these and attach it to this bug report?
I should probably make a FONT_HAS_CHAR macro as a transition tool.
CC:'ing firstname.lastname@example.org Unix/Linux tinderbox machines are currently RED because the patches in bug 99010 predates this patch.
I have been keeping my changes in sync with the tip, and have ensured that things were up-to-date at the time of my carpool. I suspect it is just mid-air checkins since no errors are now reported on the logs.
OK, got it... IS_REPRESENTABLE((*font)->mMap, ...) has been obsoleted, and all these bits should be corrected, I am doing a s//g and will update...
rbs: This is definately _no_ midcheckin. All Unix/Linux tinderbox machines are RED or will be RED soon until both GTK+ and XLib toolkits gets fixed...
Should be fixed now. Please rebuild gtk after re-pulling gfx/src/gtk. Let me know how your build goes -- it is much faster to just re-build gtk than to wait for another round of Tinderbox, thanks.
This was checked in last week. Finally got performance results: no performance impact closing ...
Status: ASSIGNED → RESOLVED
Last Resolved: 18 years ago
Resolution: --- → FIXED
Switching qa contact to ftang for now. Frank, can this one be verified within development or can you provide a test case for IQA? Thanks.
QA Contact: andreasb → ftang
You need to log in before you can comment on or make changes to this bug.