Closed Bug 377950 Opened 18 years ago Closed 17 years ago

Need to get real cmaps from fonts

Categories

(Core :: Graphics, defect)

x86
Windows XP
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: pavlov, Assigned: pavlov)

References

Details

(Keywords: perf, regression)

Attachments

(3 files, 8 obsolete files)

We need to pull out the cmaps of all the installed fonts so that we can update the unicode bitfield ranges that a font supports to be accurate and so that we can use them instead of ScriptGetCMap for surrogate pairs. Here is partial code that gets us part of the way. We may want to move the actual cmap extraction code in to its own file. We also want to use a compressed character map for holding the cmap. There is code for this in /intl/unicharutil/util/nsCompressedCharMap.{cpp,h} but it would be nice if the class was nicer to use. Something like cmap[n] = PR_TRUE or hasGlyph = cmap[n] or something rather than a bunch of complicated macros.
there is code to get/parse cmap formats 4 and 12 in gfx/src/windows/nsFontMetricsWin.cpp. I'm not sure if we need to parse more than those or not -- need to check the opentype/truetype specs on microsoft's site.
Attached patch v0.3 (obsolete) — Splinter Review
here's a partial patch. i've got a path that uses ScriptGetCMap and updates the unicode bitranges based on it (doesn't work for characters that don't fit in ucs2). I've also got some code that uses GetFontData() and sets up the stuff needed to pull out the cmaps from the fonts manually. This is the path we want. We'll want to put a compressed cmap on the FontEntry to store the results on as well.
Attached patch v0.5 (obsolete) — Splinter Review
this version caches which fonts have which glyphs using GetFontData(). It also fixes some various other bugs with font searching and makes font searches actually search for specific characters rather than ranges so the results are often much smaller. This also lets us use cmaps for ucs4 characters. The biggest problem with this patch right now is the huge bitsets I allocate. Need to use RLE compressed ones...
Attachment #262032 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Flags: blocking1.9+
Keywords: perf, regression
Blocks: 362631
Blocks: 332649
Attached patch v0.7 (obsolete) — Splinter Review
if you ignore the huge memory bloat and startup speed hit, this patch rules. some i18n pages load up to 6x faster. Most at least 2x
Attachment #265613 - Attachment is obsolete: true
Attached patch v0.75 (obsolete) — Splinter Review
additional cleanup
Attachment #265873 - Attachment is obsolete: true
Attached patch v0.9 (obsolete) — Splinter Review
ok, this patch is in pretty good shape. the diff has some extra code (splititems/etc) that is all ifdef'd out and will be part of another patch so please ignore. this has a sparse bitset class from roc that speeds up startup a bit and greatly reduces memory use. I still need to measure the startup hit and see about getting the data lazily, but I believe this patch is about ready to land.
Attachment #265895 - Attachment is obsolete: true
Attached patch v1.0 (obsolete) — Splinter Review
this version looks pretty good
Attachment #266057 - Attachment is obsolete: true
Attachment #266307 - Flags: superreview?
Attachment #266307 - Flags: review?(vladimir)
Attachment #266307 - Flags: superreview? → superreview?(roc)
Attachment #266307 - Flags: superreview?(roc)
Attachment #266307 - Flags: superreview?
Attachment #266307 - Flags: review?(vladimir)
Attachment #266307 - Flags: review+
Attachment #266307 - Flags: superreview? → superreview?(roc)
Attached patch v1.1 (obsolete) — Splinter Review
A couple minor changes (use nsAutoPtr<> in gfxSparseBitset, tiny bit of random cleanup). No functional changes.
Attachment #266307 - Attachment is obsolete: true
Attachment #266433 - Flags: superreview?(roc)
Attachment #266433 - Flags: review+
Attachment #266307 - Flags: superreview?(roc)
+ PRBool test(PRUint32 aIndex) { + void set(PRUint32 aIndex) { + PRUint32 getsize() { Test, Set, GetSize + mWeights[aWeight - 1] = (aValue == PR_TRUE); I know you just copied this code, but == PR_TRUE is redundant... + FontEntry *FindFontForString(const PRUnichar *aString, PRUint32 aLength, gfxWindowsFont *aFont); + FontEntry *FindFontEntry(const nsAString& aName); Could use some comments here. +SCRIPT_SHAPE: Ick! Can't you use a while loop here? If it helps, change this: if (giveUp) { if (PR_LOG_TEST(gFontLog, PR_LOG_DEBUG)) PR_LOG(gFontLog, PR_LOG_DEBUG, ("%s - gave up", NS_LossyConvertUTF16toASCII(font->GetName()).get())); goto SCRIPT_PLACE; } to if (giveUp) { if (PR_LOG_TEST(gFontLog, PR_LOG_DEBUG)) PR_LOG(gFontLog, PR_LOG_DEBUG, ("%s - gave up", NS_LossyConvertUTF16toASCII(font->GetName()).get())); rv = item->Shape(); } else { while (PR_TRUE) { rv = item->Shape(); if (NS_SUCCEEDED(rv)) break; ... } } - for (PRUint32 k = 0; k < 32; ++k) { + for (PRUint32 k = 0; k < 32; ++k) fe->mUnicodeRanges[x++] = (range & (1 << k)) != 0; - } Please don't do this :-) + PRUint16 format = GET_SHORT(p); + p += sizeof(PRUint16); // skip format I think it would be better to make these accessor functions inline functions that also update p. Like so: static inline PRUint16 ReadShort(PRUint8 **aPtr) { PRUint8 *p = *aPtr; *aPtr = p + sizeof(PRUint16); return (p[0] << 8) + p[1]; } Then the callers can be simpler: PRUint16 format = ReadShort(&p); No chance of forgetting to advance p or advancing p by the wrong amount. Also, macros suck! + for ( c = startCode; c <= endCode; ++c) { + aFontEntry->mCharacterMap.set(c); We could make this far faster ... might want to note so in a comment. + p += sizeof(PRUint32); // skip startGlyphID field Given the suggestion above, this could be written more uniformly as just: ReadLong(&p); + while (p < end) { + PRUint8 tmp = p[0]; + p[0] = p[1]; + p[1] = tmp; + p += 2; // every two bytes + } Modifying aBuf is a bit dodgy. Someone might pass in data that we don't actually own, like Freetype tables or something. I think we should do the byteswapping on demand (that's probably faster too). Maybe use a helper function like GetShortAt(PRUint8 *aBuf, PRUint32 aIndex)? + PRUint16 segCount = s[3] / 2; + PRUint16 *endCode = &s[7]; Can you use constants instead of magic 3 and 7? It would help the reader a lot. + HDC hdc = GetDC(nsnull); Shouldn't this be hoisted so we don't have to do it once per font? I seem to recall this being expensive. + if (l >= LF_FACESIZE) + l = LF_FACESIZE - 1; Easier to read as l = PR_MIN(l, LF_FACESIZE - 1); + HFONT font = CreateFontIndirectW(&logFont); Check for failure? You've got cleanup code right after this anyway... + buffer.AppendElements(len); Check for OOM? - nsCOMPtr<nsIWindowsRegKey> regKey = - do_CreateInstance("@mozilla.org/windows-registry-key;1"); + nsCOMPtr<nsIWindowsRegKey> regKey = do_CreateInstance("@mozilla.org/windows-registry-key;1"); if (!regKey) return NS_ERROR_FAILURE; - NS_NAMED_LITERAL_STRING(kFontSubstitutesKey, - "SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\FontSubstitutes"); + NS_NAMED_LITERAL_STRING(kFontSubstitutesKey, "SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\FontSubstitutes"); Beter not do this, I think... Overall looks pretty good.
Attached patch v1.2Splinter Review
Attachment #266452 - Flags: superreview?(roc)
Attachment #266433 - Attachment is obsolete: true
Attachment #266433 - Flags: superreview?(roc)
(In reply to comment #9) > + PRBool test(PRUint32 aIndex) { > + void set(PRUint32 aIndex) { > + PRUint32 getsize() { > > Test, Set, GetSize I'd like to leave these as test() and set() for now as they match what std::bitset and anything else that looks like that so if I'm trying out alternate implementations I don't have to keep changing them. We can adjust them later when we're sure we want to stick with this. > +SCRIPT_SHAPE: > > Ick! Can't you use a while loop here? If it helps, change this: > I've already rewritten this function in another patch (that depends on this one) so I'd like to avoid making any more changes to this if thats ok. > + PRUint16 format = GET_SHORT(p); > + p += sizeof(PRUint16); // skip format > I've completely rewritten all the cmap parsing to be much more readable. > > + HDC hdc = GetDC(nsnull); > > Shouldn't this be hoisted so we don't have to do it once per font? I seem to > recall this being expensive. It could be but I don't think it will help much. GetDC() returns a shared DC though so this is pretty cheap. We do this in a lot of other places. > > - nsCOMPtr<nsIWindowsRegKey> regKey = > - do_CreateInstance("@mozilla.org/windows-registry-key;1"); > + nsCOMPtr<nsIWindowsRegKey> regKey = > do_CreateInstance("@mozilla.org/windows-registry-key;1"); > if (!regKey) > return NS_ERROR_FAILURE; > - NS_NAMED_LITERAL_STRING(kFontSubstitutesKey, > - "SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\FontSubstitutes"); > + NS_NAMED_LITERAL_STRING(kFontSubstitutesKey, > "SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\FontSubstitutes"); > > Beter not do this, I think... They're both under 120 characters... :-) 80 char limit is so 1980. I believe I've addressed all the other comments you had.
Comment on attachment 266452 [details] [diff] [review] v1.2 Okay, check it in with some comments for those Find* functions.
Attachment #266452 - Flags: superreview?(roc)
Attachment #266452 - Flags: superreview+
Attachment #266452 - Flags: review+
checked in.
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
bug 382309 created for the optimalization of array set in gfxSparseBitSet
I backed this out, because it was causing startup crashes for some/most? nightly testers. Details available at http://office.smedbergs.us/crash-stats/report/index/2e57018f-0df5-11dc-b623-000c29e42bed but that URL may not stay around, so here's the short version: 0 ReadCMAPTableFormat4 1 ReadCMAP 2 gfxWindowsPlatform::FontGetCMapDataProc(nsAString_internal const &,nsRefPtr<FontEntry> &,void *) 3 nsBaseHashtable<nsStringHashKey,nsRefPtr<FontEntry>,nsRefPtr<FontEntry> >::s_EnumStub(PLDHashTable *,PLDHashEntryHdr *,unsigned int,void *) 4 PL_DHashTableEnumerate 5 nsBaseHashtable<nsStringHashKey,nsRefPtr<FontEntry>,nsRefPtr<FontEntry> >::Enumerate(PLDHashOperator (*)(nsAString_internal const &,nsRefPtr<FontEntry> &,void *),void *) 6 gfxWindowsPlatform::UpdateFontList() 7 gfxWindowsPlatform::gfxWindowsPlatform() 8 gfxPlatform::Init()
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
fwiw, i managed to coax talkback into reporting about this :) Incident ID: 32628454 Stack Signature ReadCMAPTableFormat4 ed62b9cc Product ID FirefoxTrunk Build ID 2007052904 Trigger Time 2007-05-29 09:46:17.0 Platform Win32 Operating System Windows NT 5.1 build 2600 Module firefox.exe + (004ec656) URL visited skipping update? User Comments Since Last Crash 6 sec Total Uptime 6 sec Trigger Reason Access violation Source File, Line No. e:\builds\tinderbox\fx-trunk\winnt_5.2_depend\mozilla\gfx\thebes\src\gfxwindowsplatform.cpp, line 238 Stack Trace ReadCMAPTableFormat4 [mozilla/gfx/thebes/src/gfxwindowsplatform.cpp, line 238] ReadCMAP [mozilla/gfx/thebes/src/gfxwindowsplatform.cpp, line 324] 0x07000600 my current suspect is: PRUint16 segCountX2 = ReadShortAt(aBuf, OffsetSegCountX2); but I'm going home or something. I have 5 fonts which cause crashes, http://www.webpagepublicity.com/free-fonts/n/Neurochrome.ttf is a font that matches by name, although I haven't downloaded it to verify it is what I have fonts that trigger this for me: Tall Paul, Neurochrome, Vladimir Script, Australian Sunrise, Outright Televism
Attached patch crash fix (obsolete) — Splinter Review
Attachment #266525 - Flags: review?(vladimir)
Attached patch crash fix v2Splinter Review
Attachment #266525 - Attachment is obsolete: true
Attachment #266532 - Flags: review?(vladimir)
Attachment #266525 - Flags: review?(vladimir)
Comment on attachment 266532 [details] [diff] [review] crash fix v2 add a check for gdata+(endCount-startCount)*2 >= aBuf+length , and not just gdata >= also get rid of the length != 0 checks (unnecessary with the length > 16 check) r=me with that
Attachment #266532 - Flags: review?(vladimir) → review+
fixed + no more crashes i think
Status: REOPENED → RESOLVED
Closed: 17 years ago17 years ago
Resolution: --- → FIXED
Flags: in-testsuite?
Attachment #266797 - Flags: superreview?(pavlov)
Attachment #266797 - Flags: review?(pavlov)
Attachment #266797 - Flags: superreview?(pavlov)
Attachment #266797 - Flags: superreview+
Attachment #266797 - Flags: review?(pavlov)
Attachment #266797 - Flags: review+
The mingw build issue patch is checked in now.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: