Closed Bug 119066 Opened 23 years ago Closed 21 years ago

pageload and window open slowed down (~1%)

Categories

(SeaMonkey :: General, defect, P1)


Tracking

(Not tracked)

RESOLVED FIXED
Future

People

(Reporter: cathleennscp, Assigned: harishd)

Details

(Keywords: perf, regression)

Attachments

(8 files)

Looks like Txul (window open) also slowed down during this time: 467ms -> 486ms on comet (4%) for the same check-ins above (http://tinderbox.mozilla.org/bonsai/cvsquery.cgi?module=MozillaTinderboxAll&date=explicit&mindate=1010539260&maxdate=1010540819); 1434ms -> 1476ms on sleestack (2.9%) for a slightly larger group of checkins (http://tinderbox.mozilla.org/bonsai/cvsquery.cgi?module=MozillaTinderboxAll&date=explicit&mindate=1010540220&maxdate=1010544419).
Summary: pageload slowed down by 1% (14ms) → pageload and window open slowed down by 1% (14ms)
Summary: pageload and window open slowed down by 1% (14ms) → pageload and window open slowed down
My checkins-under-suspicion affected the XUL window opening process, so they could reasonably be suspected of affecting Txul times. But I've profiled before and after, and I believe my changes to be a wash. I'm thinking I'm excused...
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → mozilla0.9.8
Looks like it's me! Don't know how my checkin affects pageload. Investigating.
So, I had set up a couple of builds to do some testing: one pulled by date from before harishd's (and other) checkins, and one up to the tip at ~18:00 today. I was just trying some startup tests to confirm the gain there when I stumbled across the fact that 'XUL.mfasl' was getting blown away on alternate startups. That means that the fasl file is corrupt and, I believe, that the XUL cache may not be getting (correctly) populated. That, of course, would have an effect on both pageload and, even more so, window open. [It may be that the startup gain is better than reported so far, although since we are just reporting the minimum time, we may already be reporting about what it will be even if the fasl file were not getting rebuilt on each startup.] I've confirmed that today's Linux commercial build shows the same problem with XUL.mfasl as my own trunk build: XUL.mfasl is deleted every second startup.
see bug 119038 for an assertion and bail that happens on every second startup (on Linux at least).
This jprof output was collected with:

  setenv JPROF_FLAGS "JP_START JP_PERIOD=0.010"

and by running the command:

  ./mozilla -chrome "file:///mozilla2/mozilla/xpfe/test/winopen.xul?close=1&phase1=20&phase2=20&phase3=10&cycles=2"

which opens 100 windows.

The Jan 08 build had these times (with jprof not running):
  avgOpenTime:894 minOpenTime:877 maxOpenTime:1068 medOpenTime:891
The Jan 09 build (pulled 6:10pm; jprof to follow) had these times (jprof not running):
  avgOpenTime:912 minOpenTime:895 maxOpenTime:954 medOpenTime:912
which is 2% slower.

If any jprof maestros can lend an interpretation, it would be appreciated (I'm just starting to learn how to use this jprof output).
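[Editorial aside: for clarity on what those summary numbers mean, here is a minimal sketch, assuming the harness simply records each window's open time in milliseconds. This is an illustration, not the actual winopen.xul code; ReportOpenTimes is a made-up name.]

#include <algorithm>
#include <cstdio>
#include <numeric>
#include <vector>

// Illustration only: derive avg/min/max/median from a list of per-window
// open times (milliseconds), in the format quoted above.
void ReportOpenTimes(std::vector<double> times) {
  if (times.empty()) return;
  std::sort(times.begin(), times.end());
  const size_t n = times.size();
  const double avg = std::accumulate(times.begin(), times.end(), 0.0) / n;
  const double min = times.front();
  const double max = times.back();
  // Median: middle value, or the mean of the two middle values for an even count.
  const double med = (n % 2) ? times[n / 2]
                             : (times[n / 2 - 1] + times[n / 2]) / 2.0;
  std::printf("avgOpenTime:%.0f minOpenTime:%.0f maxOpenTime:%.0f medOpenTime:%.0f\n",
              avg, min, max, med);
}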
Adding the jprof maestros referred to above. (Sorry, I didn't have a place to host the web page, and the attachment unzipped was too large for bugzilla.)
First thoughts: Jan 9 seems to be doing more allocations and frees, especially of COM objects:

  Jan 8:  99 298 nsCOMPtr_base::~nsCOMPtr_base
  Jan 9: 113 331 nsCOMPtr_base::~nsCOMPtr_base

It also looks like there are more QIs going on (which may make sense given the above):

  Jan 8: 33 575 nsCOMPtr_base::assign_from_helper
  Jan 9: 40 601 nsCOMPtr_base::assign_from_helper

Together those would account for around half of the difference. Running some of the "how many objects of what types are being allocated" tests might help.

Part of the problem is that your jprof doesn't have fine enough resolution to find this sort of thing easily; the sampling error is too large for a total difference of only 125 out of ~5600. I normally use JP_PERIOD=0.0015 (about the lowest you can go), though that's also on a fast machine.
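[Editorial aside: to make the nsCOMPtr observation concrete, here is a hypothetical sketch. nsIFoo, DoSomething(), and both function names are made up and are not code from the checkins in question; the point is only that each assignment from do_QueryInterface costs a QueryInterface plus AddRef/Release, which is exactly what shows up as assign_from_helper and ~nsCOMPtr_base hits in jprof.]

#include "nsCOMPtr.h"  // nsCOMPtr, do_QueryInterface

// Hypothetical: QI performed on every iteration inflates the counters above.
void ProcessAllExpensive(nsISupports* aSink, PRInt32 aCount) {
  for (PRInt32 i = 0; i < aCount; ++i) {
    // Each pass costs a QueryInterface plus an AddRef/Release pair.
    nsCOMPtr<nsIFoo> foo = do_QueryInterface(aSink);
    if (foo) foo->DoSomething(i);
  }
}

// Hypothetical: hoisting the QI out of the loop removes the per-token cost.
void ProcessAllCheaper(nsISupports* aSink, PRInt32 aCount) {
  nsCOMPtr<nsIFoo> foo = do_QueryInterface(aSink);  // one QI for the whole loop
  if (!foo) return;
  for (PRInt32 i = 0; i < aCount; ++i) {
    foo->DoSomething(i);
  }
}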
What/Where is XUL.mfasl?
That is the FastLoad file. FastLoad file corruption can increase startup and pageload times. That probably explains why we are not seeing the 10% startup improvement that dbaron was seeing from this checkin. I will help you track this down today.
I chatted with harishd over IRC, but for the record: in your profile directory, the FastLoad file is XUL FastLoad File on Macs, XUL.mfl on Windows, and XUL.mfasl on Unix-y systems. The corruption happens when writing a new file (because there was no file, or it was invalid), and is only detected on the next app startup. /be
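[Editorial aside: purely as an illustration of the per-platform mapping above, using Mozilla's platform defines; this is not the actual code path that computes the FastLoad file name, and XulFastLoadLeafName is a made-up helper.]

#include <string>

// Illustration only: profile-relative FastLoad leaf name per platform,
// as listed in the comment above.
std::string XulFastLoadLeafName() {
#if defined(XP_WIN)
  return "XUL.mfl";            // Windows
#elif defined(XP_MAC) || defined(XP_MACOSX)
  return "XUL FastLoad File";  // Mac
#else
  return "XUL.mfasl";          // Unix-y systems
#endif
}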
I earlier said:
> That means that the fasl file is corrupt, and, I believe, that the XUL
> cache may not be getting (correctly) populated.
but now I'm wondering whether I was confusing 'bad fastload' with an invalid XUL document (or XBL binding). So, a question: is the XUL cache immune to FastLoad handling errors, or can the XUL cache (possibly) be left in a less than optimal state? (Just curious.)
I have a fix for the FastLoad problem. I have to run the pageload test against my fix. Hopefully this will fix Tp & Txul.
jrgm: the XUL cache is not persistent, and any failure to write or read FastLoad data should not affect what's in the cache. /be
Ts on comet and sleestack dropped by 4% and 5% respectively (after fixing bug 119038). However, I don't see a significant change in Tp/Txul.
I built two Mozilla builds with MOZ_PROFILE=1, MOZ_DEBUG=. One was pulled just before harishd's first parser checkin on Jan 8, and the other was pulled at midnight Jan 10, which had the leak and FastLoad fixes (along with whatever else landed on the trunk in that period). The window open times on btek, mecca, etc. (for Linux) have continued to show about a ~2% regression in window open times.

I did two Quantify runs, one with each build, by starting the browser, stopping data collection as soon as the browser was up, then beginning data collection again, running a test to time the opening of 5 new windows, stopping the data collection, and finally shutting down and letting Quantify process the log. I then used the 'delta' comparison feature to compare the two runs (which gives you a call graph and function lists showing the difference between the two runs).

The hot spot of this 'delta' Quantify comparison is the following. From nsParser::ResumeParse, I have:

  build: Jan 10 23:59 pull               build: Jan 8 17:36 pull

  nsParser::Tokenize                     nsParser::Tokenize
    --> 216 calls (675ms)                  --> 213 calls (401ms)
          |                                      |
  nsExpatDriver::ConsumeToken            nsExpatTokenizer::ConsumeToken
    --> 186 calls (666ms)                  --> 185 calls (382ms)
          |                                      |
  nsExpatDriver::ParseBuffer             nsExpatTokenizer::ParseXMLBuffer
    --> 203 calls (665ms)                  --> 196 calls (381ms)
          |                                      |
  XML_Parse                              XML_Parse
    --> 383 calls (664ms)                  --> 381 calls (380ms)
          |                                      |
    ---------------------                  ---------------------
    |                   |                  |                   |
  prologProcessor  prologInitProcessor   prologProcessor  prologInitProcessor
    --> 169 calls    --> 93 calls          --> 93 calls     --> 93 calls
        (429ms)          (274ms)               (157ms)          (157ms)

and at this point, it's probably better to look at the actual Quantify data, since this ASCII diagram is about to get too confusing (if not already). I've put the two original files and the delta file at http://jrgm.mcom.com/quantify/bug-119066 (I'd upload them to bugzilla but they're up to 1/4 GB in size :-]).

That 275 msec difference, over five windows, is ~50 msec per window against a normal runtime window opening of 750 msec, which is about a 7% difference. That's much higher than the actual regression on Linux, but (a) I need to do a real measurement on Windows, under the same conditions this Quantify data was generated under, to see what the win32 number looks like for real, and (b) I'm not sure that the Quantify real-time values are useful beyond getting the general "flavour" of what this maps to in real time. So, 7% is "kind of" correct. Maybe, though, I've badly misread this output or otherwise botched this test. But perhaps this is an area to look at for what slowed down during window open.

(I have one (very) naive question: if we're opening a window type that has already been opened before, why is the XML parser being called at all? I thought we would already have this, in predigested form, in the XUL cache? brendan? waterson? bueller?)
Keywords: perf
Just to be clear on the notation:

  nsParser::Tokenize
    --> 216 calls (675ms)

means that nsParser::Tokenize() was called 216 times by the function above, for a (Quantify-estimated) real-time cost of 675 msec of function-plus-descendants (F+D) time, for the given column (build).
> by the function above
That should read: by the function above it in the column.
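[Editorial aside: a toy illustration of the F vs. F+D distinction used above; the functions and timings are made up, not taken from these profiles.]

// Toy example of Quantify's F vs. F+D time (hypothetical names and numbers).
// If parent() spends 10ms in its own body and child() takes 40ms, then:
//   F   time of parent() = 10ms  (function-only time)
//   F+D time of parent() = 50ms  (function plus descendants)
void child()  { /* pretend this takes 40ms */ }
void parent() { /* pretend this body takes 10ms */ child(); }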
Keywords: regression
Severity: normal → critical
jrgm: Is it F time or F+D time? If F+D, then in the nsExpatTokenizer case you should include CWellFormedDTD time [ ::BuildModel(), ::GetTokenizer() ] as well, since all the sink calls happen via CWellFormedDTD.
jrgm: The code path has changed a bit, and I'm not sure the Quantify data would be of much use :-(
Out of time. Mass move to 0.9.9
Target Milestone: mozilla0.9.8 → mozilla0.9.9
Okay, so I did a set of Quantify runs again, once with a build pulled by date 01/08 17:36, and again with a build pulled 01/17 02:00. Data collection was enabled only for a period of time where I opened and closed 20 windows, using a default, new profile in each test (with the sidebar collapsed, in the classic skin). When I tested outside of Quantify, there was a 1% difference in the times between the builds (Jan 17 slower), and while testing in Quantify, the Jan 17 build measured as 2% slower.

Using Quantify's difference function to compare the two builds, the biggest time difference was the F+D time descending from nsParser::ResumeParse, which took (according to Quantify) 2618ms for the Jan 17 build and 2094ms for the Jan 08 build (diff = 524ms, or ~25ms per window open). I'll attach a picture of the 'critical path' in the diff Quantify run, and an abbreviated list of the top and bottom of a function list sorted by F+D. The actual Quantify runs (base, new, diff) will be posted at: http://jrgm.mcom.com/quantify/bug-119066/Jan08-Jan17

In the abbreviated F+D list, first leave aside the top 3 and bottom 3 largest F+D changes. (nsCSSFrameConstructor::ContentInserted shows up as different function calls due to a signature change in rev 1.110 of nsCSSFrameConstructor.h; the flip of 590 calls from nsXBLPrototypeHandler::BindingDetached to nsXBLPrototypeHandler::BindingAttached looks just plain weird, but the constructors for nsXBLPrototypeHandler() were changed in rev 1.11 of nsXBLPrototypeHandler.h, and maybe Quantify is just confused by that; the flip of nsJSContext::`vector deleting destructor' to nsJSContext::`scalar deleting destructor' is something for someone else to ponder.)

So, the biggest "real" gainers and losers after that are all tied up in the parser (e.g., nsParser::BuildModel took 665ms more time in the 01/17 build). Hope that points to something that is solvable.
In that image, which shows the difference in execution paths between the build pulled Jan08 and the build pulled Jan17, the brown arcs are code paths that are called _more_ in the Jan17 build, and the green arcs are code paths that are called _less_ in the Jan17 build (which includes code paths that don't exist in the Jan17 build).
Attachment #65592 - Attachment mime type: text/html → image/png
>So, the biggest "real" gainers and losers after that are all tied up in the
>parser (e.g., nsParser::BuildModel took 665ms more time in the 01/17 build).

NS_IMETHODIMP
nsExpatDriver::BuildModel(nsIParser* aParser,
                          nsITokenizer* aTokenizer,
                          nsITokenObserver* anObserver,
                          nsIContentSink* aSink)
{
  return mInternalState;
}

The above method does (virtually) no work, so if you're seeing a difference in nsParser::BuildModel() then either CNavDTD or layout has slowed down. I don't believe it's CNavDTD, since all I changed there was replacing virtual nsresult with NS_IMETHOD.
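[Editorial aside: a minimal sketch of the kind of signature change described above; the class names are placeholders, not the actual CNavDTD diff. NS_IMETHOD is the XPCOM macro that expands to roughly `virtual nsresult` plus a calling-convention annotation, which is why such a change should not by itself alter call cost.]

#include "nscore.h"  // nsresult, NS_IMETHOD

class nsIParser;         // forward declarations to keep the sketch self-contained
class nsITokenizer;
class nsITokenObserver;
class nsIContentSink;

// Before: a plain virtual method returning nsresult.
class HypotheticalDTDBefore {
 public:
  virtual nsresult BuildModel(nsIParser* aParser, nsITokenizer* aTokenizer,
                              nsITokenObserver* anObserver, nsIContentSink* aSink);
};

// After: the same method declared via the XPCOM macro. NS_IMETHOD expands to
// (roughly) `virtual nsresult` plus the XPCOM calling convention, so the
// vtable dispatch cost should be essentially unchanged.
class HypotheticalDTDAfter {
 public:
  NS_IMETHOD BuildModel(nsIParser* aParser, nsITokenizer* aTokenizer,
                        nsITokenObserver* anObserver, nsIContentSink* aSink);
};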
Landed fix for bug 120345. It improved Txul time by ~1%.
one small step for man... thanks harish.
Keywords: nsbeta1
harishd, so where are we at with this one?
Target Milestone: mozilla0.9.9 → mozilla1.0
It's extremely hard to spot the problem, and there's no way to come up with a fix in the 1.0 milestone. Moving to 1.1 (hopefully it will get fixed then!).
Keywords: nsbeta1 → nsbeta1-
Summary: pageload and window open slowed down → pageload and window open slowed down (~1%)
Target Milestone: mozilla1.0 → mozilla1.1
I recommend some conclusion here. If there's a trail we need to keep looking at, let's look at it. If not, then let's mark this bug as done. Pushing this to 1.1 reduced the probability of anything happening quite a bit.
Harish, this is a tinderbox performance regression. We need to address this bug: either come up with a fix, or explain why your feature/fix is causing a perf regression. I don't think it's fair to mark it nsbeta1-; the longer we push it off, the smaller the chance we'll be able to address it.
What has been done about that in the meantime?
Target Milestone: mozilla1.1alpha → mozilla1.1beta
Uh, Markus, you're not allowed to change another engineer's target milestone. Don't do that.
> Harish, this is a tinderbox performance regression. We need to address this
> bug: either come up with a fix, or explain why your feature/fix is causing a
> perf regression.

Cathleen, it's hard to say what caused the regression. However, IMO, we should accept this regression given the benefits (10% improvement in Ts, and ~2MB reduction in memory bloat).
Target Milestone: mozilla1.1beta → Future
So, is this going to be marked "fixed" then?
I propose that this bug be closed, or at least downgraded from critical/P1. I believe all the perf regressions discussed here have long since been superseded by other performance improvements.
I think we have given this a close enough look. It hasn't gotten attention for 2 years, and it is almost pointless to go after this small a performance win after such a long time. Since some work did happen, I will mark this fixed. Harish, if you are active in this bug, please feel free to correct me or add something here.
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Product: Browser → Seamonkey