Closed Bug 117877 Opened 24 years ago Closed 17 years ago

x11 font banning checkin (bug 104075) increased startup time (Ts) by 2%

Categories

(Core Graveyard :: GFX: Gtk, defect)

x86
Linux
defect
Not set
major

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: mcafee, Unassigned)

References

Details

(Keywords: perf, regression)

Attachments

(5 files, 2 obsolete files)

This checkin increased startup times by about 2%: http://bonsai.mozilla.org/cvsquery.cgi?module=MozillaTinderboxAll&date=explicit&mindate=1010022660&maxdate=1010024279 gisburn(roland) says this filters out bogus fonts at startup, but we're taking a 150ms/7100ms hit on 400Mhz machines like facedown (6pm 1/2 jump) http://tegu.mozilla.org/graph/query.cgi?testname=startup&tbox=facedown&autoscale=1&days=7&avg=1 We need to look at how this was implemented, can we delay this filtering after startup, and anything else that can get this increase in time back down.
Keywords: perf, regression
You can easily figure out whether the patch really causes this issue.... just remove font.x11.acceptfontpattern and font.x11.rejectfontpattern from unix.js - then the font banning code will not be activated ...
note: this slowdown is probably dependent on the number of fonts, e.g. this will show up worse on a system like irix that has more fonts. roland says these prefs are on by default, turning this off by default might be an option. Or maybe we could try looking for some known bogus fonts. Just thinking of options here...
jrgm brings up the point, that this might be an OS problem, e.g. if you have bogus fonts, remove them from the OS, and let's not solve this in mozilla.
Can someone define "bogus fonts" or point me to the bug that does? It seems like a bad idea to tell people to remove fonts from their system in order for Mozilla to use reasonable fonts rather than ignoring the fonts which may be valid for other programs.
Depends on: 104075
QA Contact: doronr → timeless
Summary: font checkin increased startup time (Ts) by 2% → x11 font banning checkin (bug 104075) increased startup time (Ts) by 2%
I was using the term "bogus fonts" based on a brief conversation in irc, I am probably mis-using the term. Maybe seawood or the code author could comment on what the font banning code is doing, and how mozilla is different from other X apps that probably have access to the same set of fonts.
bogus fonts: JISX0201 fonts that are missing glyphs CNS11643-0 fonts on solaris that have no glyphs (zero glyphs in the font) ugly fonts: many URW foundry fonts such as Helvetica are very ugly
One of the reasons Erik van der Poel added the foundry to the font name in the font prefs dialog is because the code keep finding URW Helvetica (which is ugly) before finding Adobe Helvetica (which is attractive). While Adobe is alphabetically ahead of URW the font code must follow the order in the font path. I doubt that any app wants the ugly URW fonts. But it seems to me that asking users or OS-vendors to remove these fonts so Moz can look good would be perceived as presumptive/arrogant on our part. I have specifically requested the sun people remove their bogus CNS11643 fonts but they have not to my knowledge. With these bogus fonts on the system Moz ends up drawing blank chars and we get bug reports. My current fix for the JISX0201 fonts disables far too many other fonts. Using font banning would help correct this. The slow down might be caused by parsing the regular expression or it might be cause by running the regular expressions over the font names. Roland, can we get some metrics on where the time is spent? I have heard this is the way to measure time: http://www.mozilla.org/performance/measureStartup.html
Just a minor nit: We can comment out the accept pattern - |pref("font.x11.acceptfontpattern", ".*");| is just a dummy pattern which accepts everything... But I doubt it will half the startup time increment (e.g. 1% instead of 2%)... Linux regex code is just braindead and very very inefficient compared to the code on Solaris ... ;-(
Based on what I have read so far, I vote for turning this feature off by default. Hopefully the bad-font cases are a minority group of users, those people can turn this on.
bstell: Do you like that idea ?
Lets try to turn off the accept pattern. In theory (assuming we scan many many fonts) it should half the time we spend in the font ban code (e.g. we call half the number of regexec() and save the regcomp() for the accept pattern in nsFontMetricsGTK::Init()).
Well, if URW fonts are in the bad font category, then that affects all RedHat users which, I'm told, is a significant portion of our Linux userbase.
We are burning engineers to go after those 0.5% performance problems. This 2% increase is a huge step backwards. Normally, when regressions like this happens, options are: 1) back out the fix/feature, till we find fix that won't hurt performance 2) turn fix/feature off by default I really appreciate everyone for turning this off. thanks. :-)
Roland: I want to hear an argument for turning any of this on by default. How many users are affected here? I haven't heard of font problems from most of my unix co-workers, and if they care then turning on the pref is not an unreasonable request.
cathleen wrote: > Normally, when regressions like this happens, options are: > 1) back out the fix/feature, till we find fix that won't hurt performance This won't help, we already have patches and code in the tree which depends on some stuff introduced with the patch ... ---- sleestack wrote: > Roland: I want to hear an argument for turning any of this > on by default. How many users are affected here? I haven't > heard of font problems from most of my unix co-workers, Question back: How many Unix coworkers do you have which use Chinese/Japanese/Arabic/Hebrew glyphs ? > and if they care then turning on the pref is not an unreasonable request. I'll file a patch to turn it OFF by default, however I don't think bstell will like this... One option may be to extend the code and add per-charset regex patterns (like "pref("font.x11.rejectfontpattern.iso8859-1", ...)", e.g. first test for the charset - and if it matches then run the regexec()) ...
Another optimisation - only copy the string _once_ (not twice) - and only for fonts with full XLFD syntax.
Does anyone know where the 2% time is spent? Lacking that info it seems difficult to hold a meaningful discussion (including r=/sr= a patch). I would strongly like to have font banning in (and not turned off). Asking people to turn it on seems unreasonable. How would people know its an option? Do we plan to add it to the GUI? Just to be clear: I am not simply suggesting we accept a 2% slow down so we can have font banning. Lets find out what is happening, what we can do to address the issue, and then discuss options.
let's get attachment 63466 [details] [diff] [review] r=/sr= (only for now), and land that first I'd like to see the performance hit goes away first for several cycles, before we even try again with the other patch to speed things up.
Attached image Startup time graph
Just looking at todays startup time graph - the startup time is now far below the times after (and even before) my checkin ... ... are we really sure that my patch caused this spike - or is it possible that some other effect is responsible for this (chrome/XPCOM registration of changed files etc.) ?
your checkin was the _only checkin at teh time the Ts changed. the speedup since then was the change of the tboxes from gnome to twm as a window manager ( as Starred on tinderbox at the time sleestack made the change)
Chris and I will try turning of the prefs and get a time measurement. If this leads to a 2% improvment then we will leave it in and reopen bug 56441 requesting rework.
bstell: Is it possible to disable the font banning code by default, use it in bugs for analysis on the customer side and then turn these bna patterns into hardcoded bans ?
since Roland does not have a system where he can get timing numbers I will work on getting the time info
Roland, ask dp@netscape.com for help timing the builds. We have tests that run on all platforms, start here: http://www.mozilla.org/performance/measureStartup.html You must be THIS HIGH to check in to the mozilla tree, come on, impress me.
Chris, I chatted with Roland last night and we agreed that I would run the timing tests and he would do some work for me on a different bug. It is in my queue to work on this afternoon.
here are my rough measurements (regex are calculated): get prefs: 0.13 = 1.3% sprintf screen info: 1.95 = 19.2% regex reject: 0.98 = 9.6% regex accept: 7.08 = 69.8% ------------------------- 10.14 milliseconds Notes: my system = 1.2 GHz Athlon PC Redhat 6.2 debug build all times in milli-seconds all times should be taken as approximate 244 fonts regex'd to get to start screen http://www.mozilla.org/start/
I recommend we apply attachment 63407 [details] [diff] [review] and find out how it affects the startup metrics. r=bstell@ix.netcom.com for attachment 63407 [details] [diff] [review]
I am testing the turn-off patch 63466 right now, looks like 2.2% or so faster, will attach a plot in a bit. I can test the 63407 plot next, might have to wait until later tonight.
Chris: thank you for planning to test attachment 63407 [details] [diff] [review].
As we expected, we get 1/2 of the time back with the accept patch. The first dip in the 63712 plot (the 1% plot) shows first the 2% turn-everything-off dip, then the 1% only-turn-off-accept-pattern dip.
I think we can get another 0.5% speedup via more cleanup (like attachment 63467 [details] [diff] [review]) of nsFontMetricsGTK.cpp ... is that acceptable then ?
Chris: I'm curious how my time measurements compare to the graphs. What's the CPU speed of Coffee?
Roland: I'm curious about this: > I think we can get another 0.5% speedup via more cleanup (like attachment > 63467) It is a good idea to remove the extra str copy and use list[i]. - char buf[512]; - PL_strncpyz(buf, name, sizeof(buf)); - char *fName = buf; Did you do measurements? <stuck record>It is hard to tell where time is spent without measuring</stuck record> My guess is that the PL_strncpyz takes very little time. But one never knows. I admit I was surprised to find that the sprintf took more time than reject regex. Could you work on speeding up the sprintf first? Notes: I do not see any advantage in dropping the {} around the continue. Looks like you have a tab in the diff. I see blank lines being diff'd. I see unchanged lines being diff'd.
coffee is a 450MHz scsi system with 256M ram. I think we should get the time back first, let's turn this off and then make another attempt to turn this on and measure the performance hit at that time. r=mcafee for patch #2 to turn this off.
What mcafee wrote: flip the pref, we do not tolerate this kind of regression. We need less than zero tolerance. /be
Comment on attachment 63466 [details] [diff] [review] Patch to turn off the whole font ban code by default (both rejectpattern and acceptpattern) If this pref-backout really reduces the performance effects of the font banning code to nil, then let's stop there. I'm not impressed that we knew about these issues at the point of checkin, after at least one sr expressed concerns about the performance effects of regexes. To say nothing of the fact that the addition of proper PR_LOG use was just handwaved away as "too much risk", as though PR_LOG were inherently more risky than fprintf. If this pref _isn't_ enough to totally remove the effects of font-banning on performance for cases where fonts aren't banned, then I think we should back it all out until someone finds a way to make that so. We can't take performance regressions like this, now.
Attachment #63466 - Flags: superreview+
Comment on attachment 63466 [details] [diff] [review] Patch to turn off the whole font ban code by default (both rejectpattern and acceptpattern) r=bstell@ix.netcom.com
Attachment #63466 - Flags: review+
Switching all the NS_FONT_DEBUG to PR_LOG is a separate issue from font banning and worthy of a discussion. I also felt it was beyond the scope of bug 104075. For example, the NS_FONT_DEBUG prints are available in optimized builds where as PR_LOG is not. This is a point where people tend to express opinions which is why it seems out of scope for such a specific bug as font banning, bug 104075. http://www.mozilla.org/projects/nspr/reference/html/prlog.html#25266 Logging is compiled into the NSPR debug builds; logging is not compiled into the NSPR optimized builds. One of the problems I had prior to adding the NS_FONT_DEBUG macros is that I would get bugs from users (often with optimized builds) saying the font search did something wrong. Since I could not reproduce the font environment I frequently could not reproduce the problem. This made it very hard to debug and fix these problems. I've opened bug 118479 for that discussion. If you are interested in this kindly cc: yourself to that bug.
thanks bstell, I see the expected 2% drop in Ts times across the five startup tbox tests.
Severity: normal → major
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → mozilla0.9.8
Brian Stell wrote: > Roland: I'm curious about this: > > I think we can get another 0.5% speedup via more cleanup (like attachment > > 63467) > > It is a good idea to remove the extra str copy and use list[i]. > - char buf[512]; > - PL_strncpyz(buf, name, sizeof(buf)); > - char *fName = buf; > > Did you do measurements? <stuck record>It is hard to tell where time is spent > without measuring</stuck record> My guess is that the PL_strncpyz takes very > little time. But one never knows. I admit I was surprised to find that the > sprintf took more time than reject regex. Me, too... :) > Could you work on speeding up the sprintf first? Sure. But attachment attachment 63467 [details] [diff] [review] is not the whole thing. The idea is to clean-up the whole fontmetrics code (for example - we can implement the way how new fontmetrics objects are obtained in Xlib/Xprint code (it bypasses the component manager and uses a nsFontCache subclass)), the patch wasn't complete. Doing some profiling on the code may squish some percent out of it without the need to sacrify the font banning code... I'll file a new patch this week...
My point was/is: <stuck record>ya can't tell where the time is spent without measuring</stuck recond> I'm actually pleased with your work in attachment 63467 [details] [diff] [review]. With a bit of minor rework I'd be glad to r= it.
More cleanup&co.: - Replaced sprintf() in font banning code with quick&dirty code - |nsDeviceContextGTK| now overrides nsFontCache::CreateFontCache() to bypass the component manager for creating |nsIFontMetrics| objects (nsDeviceContextXlib already does that since eternity) - Fixed a minor nit in nsDeviceContextPS/nsDeviceContextXp/nsDeviceContextXlib where the code returns NS_OK instead of the return value of |mFontCache->Init()| More performance work on demand...
Attachment #63467 - Attachment is obsolete: true
review of attachment 64633 [details] [diff] [review]: I do not know enough to approve/disapprove the changes to gfx/src/gtk/nsDeviceContextGTK.cpp Please add assertions to verify the inputs to moz_itoa. Please find and remove the tabs (or funny indentation) in the patch. I ran the page loader test without/with/without/with/without the patch and the time seems to be the same (1st run was without and there was one very long URL load). excluding the nsDeviceContextGTK.cpp part r=bstell@ix.netcom.com
bstell: are you testing pageload time in addition to startup time?
Oops, I ran the page loader, http://cowtools.mcom.com/page-loader/loader.pl, I forgot to test startup time. Roland, please wait on checkin as I need to measure startup time.
hum... I measured differently but it should give equivalent results to the tests in comment #29 get prefs seemes about the same at 0.15 milli-sec font count seems to be up to 1156 (from 244) build pattern is 1.36 milli-sec regex_accept is 0.45 milli-sed regex_reject is 13.54 milli-sec ------------------------------- 15.35 milli-sec which is up from the 10.14 milliseconds but the font count is up almost 5x (474%). Does this mean that the equivalent time would be 15.35/4.74 = 3.23 milli-seconds or does this mean the time is up 5 more milli-seconds? I'll have to work on this a bit more.
Chris: Can I get your help figuring this out?
The patch has rotten too much... I have to file a new one... ;-(
Target Milestone: mozilla0.9.8 → mozilla1.0
Attachment #64633 - Attachment is obsolete: true
performance regression -> mozilla1.0, nsbeta1
Keywords: mozilla1.0, nsbeta1
Cathleen: this code is currently disabled so (someone correct me if I'm wrong) this is not a performance regression. We are trying to figure out how to re-enable it without causing a performance regression.
okay. if it's disabled, that's great. :-) moving off nsbeta1, mozilla 1.0
Keywords: mozilla1.0, nsbeta1
Uhm, is anyone doing the performance measurements (looking at comment #52 and comment #53) ?
Blocks: 127604
Target Milestone: mozilla1.0 → ---
Product: Browser → Seamonkey
Roland, Are you still working on this ?
Priority: P1 → --
I'm moving this bug to Core::Internationalization since the dependent bug is there. Not sure if this should belong to Core:Layout Text instead.
Component: General → Internationalization
Product: SeaMonkey → Core
QA Contact: timeless → i18n
Whiteboard: CLOSE-ME (OBSOLETE?)
Performance issue in dead code -> WONTFIX
Assignee: roland.mainz → nobody
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Component: Internationalization → GFX: Gtk
QA Contact: i18n → gtk
Resolution: --- → WONTFIX
Whiteboard: CLOSE-ME (OBSOLETE?)
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: