Closed
Bug 119066
Opened 23 years ago
Closed 20 years ago
pageload and window open slowed down (~1%)
Categories
(SeaMonkey :: General, defect, P1)
SeaMonkey
General
Tracking
(Not tracked)
RESOLVED
FIXED
Future
People
(Reporter: cathleennscp, Assigned: harishd)
Details
(Keywords: perf, regression)
Attachments
(8 files)
tinderbox pageload jumped from the 1220s ms to the 1240s ms. Checkins happened during 1/08 17:47 to 18:56: http://tinderbox.mozilla.org/bonsai/cvsquery.cgi?module=MozillaTinderboxAll&date=explicit&mindate=1010538840&maxdate=1010540819 See graph: http://tegu.mozilla.org/graph/query.cgi?tbox=btek&testname=pageload&autoscale=1&days=7&avg=1 We need to address this soon.
looks like txul (window open) is also slowed down during this time: 467ms -> 486ms on comet (4%) for the same check-ins above (http://tinderbox.mozilla.org/bonsai/cvsquery.cgi?module=MozillaTinderboxAll&date=explicit&mindate=1010539260&maxdate=1010540819) 1434ms -> 1476ms on sleestack (2.9%) for a slightly larger group of checkins (http://tinderbox.mozilla.org/bonsai/cvsquery.cgi?module=MozillaTinderboxAll&date=explicit&mindate=1010540220&maxdate=1010544419)
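For reference, the slowdown percentages quoted above follow directly from the raw tinderbox numbers; a quick sketch (the timings are the ones quoted in the description and the comment above):

```python
def regression_pct(before_ms, after_ms):
    """Percent slowdown going from before_ms to after_ms."""
    return (after_ms - before_ms) / before_ms * 100.0

# Numbers quoted in this bug: btek pageload, comet Txul, sleestack Txul.
print(round(regression_pct(1220, 1240), 1))  # pageload: ~1.6%
print(round(regression_pct(467, 486), 1))    # comet Txul: ~4.1%
print(round(regression_pct(1434, 1476), 1))  # sleestack Txul: ~2.9%
```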
Summary: pageload slowed down by 1% (14ms) → pageload and window open slowed down by 1% (14ms)
Summary: pageload and window open slowed down by 1% (14ms) → pageload and window open slowed down
My checkins-under-suspicion affected the XUL window opening process, so could reasonably be suspected of affecting Txul times. But I've profiled before and after and I believe my changes to be a wash. I'm thinking I'm excused...
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → mozilla0.9.8
Looks like it's me! Don't know how my checkin affects pageload. Investigating.
Comment 8•23 years ago
So, I had set up a couple of builds to do some testing: one pull-by-date from before harishd's (and other) checkins, and one up to the tip at ~18:00 today. I was trying some startup tests to confirm the gain there, and I stumbled across the fact that 'XUL.mfasl' was getting blown away on alternate startups. That means that the fasl file is corrupt and, I believe, that the XUL cache may not be getting (correctly) populated. That, of course, would have an effect on both pageload and, more so, window open. [It may be that the startup gain is better than reported so far, although since we are just reporting the minimum time, we may already be reporting roughly what it will be even though the fasl file is getting rebuilt on every other startup.] I've confirmed that today's Linux commercial build shows the same problem with XUL.mfasl as my own trunk build: XUL.mfasl is deleted every second startup.
Comment 9•23 years ago
see bug for an assertion and bail that happens on every second startup (on Linux at least).
Comment 10•23 years ago
see bug 119038 for an assertion and bail that happens on every second startup (on Linux at least).
Comment 11•23 years ago
This jprof output was collected as:

  setenv JPROF_FLAGS "JP_START JP_PERIOD=0.010"

and running the command:

  ./mozilla -chrome "file:///mozilla2/mozilla/xpfe/test/winopen.xul?close=1&phase1=20&phase2=20&phase3=10&cycles=2"

which opens 100 windows. The Jan08 build had these times (with jprof not running): avgOpenTime:894 minOpenTime:877 maxOpenTime:1068 medOpenTime:891. The Jan09 build (pulled 6:10pm; jprof to follow) had these times (jprof not running): avgOpenTime:912 minOpenTime:895 maxOpenTime:954 medOpenTime:912, which is 2% slower. If any jprof maestros can lend an interpretation, please do (I'm just starting to learn how to use this jprof output).
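The avgOpenTime/minOpenTime/maxOpenTime/medOpenTime fields above are plain statistics over the per-window open times. A minimal sketch of how such a harness would summarize its samples (the sample timings below are invented for illustration, not taken from the test):

```python
from statistics import mean, median

def summarize(open_times_ms):
    # Mirrors the avg/min/max/med fields reported by the winopen test.
    return {
        "avgOpenTime": round(mean(open_times_ms)),
        "minOpenTime": min(open_times_ms),
        "maxOpenTime": max(open_times_ms),
        "medOpenTime": round(median(open_times_ms)),
    }

# Hypothetical per-window timings (ms):
print(summarize([895, 912, 910, 954, 905]))
# {'avgOpenTime': 915, 'minOpenTime': 895, 'maxOpenTime': 954, 'medOpenTime': 910}
```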
Comment 12•23 years ago
same test; jan 09 18:10 build.
Comment 13•23 years ago
adding the jprof maestros referred to above (sorry, I didn't have a place to host the web page, and the attachment unzipped was too large for bugzilla).
Comment 14•23 years ago
First thoughts: Jan 9 seems to be doing more allocations and frees, especially of COM objects:

  Jan 8:  99 298  nsCOMPtr_base::~nsCOMPtr_base
  Jan 9: 113 331  nsCOMPtr_base::~nsCOMPtr_base

It also looks like there are more QI's going on (which may make sense given the above):

  Jan 8: 33 575  nsCOMPtr_base::assign_from_helper
  Jan 9: 40 601  nsCOMPtr_base::assign_from_helper

Together those would account for around 1/2 of the difference. Running some of the "how many objects of what types are being allocated" tests might help. Part of the problem is that your jprof isn't fine enough resolution to find this sort of thing easily. Sampling error is too large for a total difference of only 125 out of ~5600. I normally use JP_PERIOD=0.0015 (about the lowest you can go), though that's also on a fast machine.
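The sampling-error point above can be made concrete: a sampling profiler's hit counts behave roughly like Poisson counts, so the run-to-run jitter on a count of N samples is on the order of sqrt(N). A back-of-envelope sketch (my own arithmetic, not jprof's):

```python
from math import sqrt

def sample_noise(total_samples):
    # Poisson-like standard deviation on a profiler hit count.
    return sqrt(total_samples)

diff = 125    # difference in samples between the two runs (from the comment above)
total = 5600  # approximate total samples per run
noise = sample_noise(total)    # ~75 samples of expected jitter
print(round(diff / noise, 2))  # ~1.67 "sigma" -- too close to the noise floor to trust

# Shrinking JP_PERIOD from 0.010 to 0.0015 collects ~6.7x more samples,
# cutting the relative noise by about sqrt(6.7), i.e. ~2.6x.
```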
Assignee
Comment 15•23 years ago
What/Where is XUL.mfasl?
Comment 16•23 years ago
That is the fastload file. Fastload file corruption can increase startup and pageload times. That probably explains why we are not seeing the 10% startup improvement that dbaron was seeing from this checkin. I will help you track this down today.
Comment 17•23 years ago
I chatted with harishd over IRC, but for the record: in your profile directory, the fastload file is XUL FastLoad File on Macs, XUL.mfl on Windows, and XUL.mfasl on Unix-y systems. The corruption happens when writing a new file (because there was no file, or it was invalid), and is only detected on the next app startup. /be
Comment 18•23 years ago
I earlier said:

> That means that the fasl file is corrupt, and, I believe, that the XUL
> cache may not be getting (correctly) populated.

but now I'm wondering whether I was confusing 'bad fastload' with an invalid XUL document (or XBL binding). So, question: is the XUL cache immune from fastload handling errors, or can the XUL cache possibly be left in a less than optimal state? (Just curious.)
Assignee
Comment 19•23 years ago
I have a fix for the fast loader problem. I have to run the page load test against my fix. Hopefully, this will fix Tp & Txul.
Comment 20•23 years ago
jrgm: the XUL cache is not persistent, and any failure to write or read FastLoad data should not affect what's in the cache. /be
Assignee
Comment 21•23 years ago
Ts on comet and sleestack dropped by 4% and 5% respectively (after fixing bug 119038). However, I don't see a significant change in Tp/Txul.
Comment 22•23 years ago
I built two mozilla builds with MOZ_PROFILE=1, MOZ_DEBUG=. One was pulled just before harishd's first parser checkin on Jan 8, and the other was pulled at midnight Jan 10, which had the leak and fastload fixes (along with whatever else landed on the trunk in that period). The window open times on btek, mecca, etc. (for Linux) have continued to show about a ~2% regression in window open times.

I did two Quantify runs, one with each build: starting the browser and immediately stopping data collection, then beginning data collection again, running a test to time the opening of 5 new windows, stopping the data collection, and finally shutting down and letting Quantify process the log. I then used the 'delta' comparison feature to compare the two runs (which gives you a call graph and function lists showing the difference between the two runs).

The hot spot of this 'delta' Quantify comparison is the following. From nsParser::ResumeParse, I have:

  build: Jan 10 23:59 pull               build: Jan 8 17:36 pull

  nsParser::Tokenize                     nsParser::Tokenize
   --> 216 calls (675ms)                  --> 213 calls (401ms)
         |                                      |
  nsExpatDriver::ConsumeToken            nsExpatTokenizer::ConsumeToken
   --> 186 calls (666ms)                  --> 185 calls (382ms)
         |                                      |
  nsExpatDriver::ParseBuffer             nsExpatTokenizer::ParseXMLBuffer
   --> 203 calls (665ms)                  --> 196 calls (381ms)
         |                                      |
  XML_Parse                              XML_Parse
   --> 383 calls (664ms)                  --> 381 calls (380ms)
         |                                      |
   ---------------------                  ---------------------
   |                   |                  |                   |
  prologProcessor  prologInitProcessor   prologProcessor  prologInitProcessor
   --> 169 calls    --> 93 calls          --> 93 calls     --> 93 calls
       (429ms)          (274ms)               (157ms)          (157ms)

and at this point, it's probably better to look at the actual Quantify data, since this ascii diagram is about to get too confusing (if not already). I've put the two original files and the delta file at http://jrgm.mcom.com/quantify/bug-119066 (I'd upload them to bugzilla but they're up to 1/4 GB in size :-]).
That 275 msec difference, over five windows, is ~50 msec per window against a normal runtime window opening of 750 msec, which is about a 7% difference. That's much higher than the actual regression on Linux, but (a) I need to do a real measurement on Windows, under the same conditions as this Quantify data was generated, to see what the win32 number looks like for real, and (b) I'm not sure that the Quantify real-time values are useful beyond getting the general "flavour" of what this maps to in real time. So, 7% is "kind of" correct. Maybe, though, I've badly misread this output, or otherwise botched this test. But perhaps this is an area to look at for what slowed down during window open.

(I have one (very) naive question: if we're opening a window type that has already been opened before, why is the xml parser being called? I thought that we would have already had this, in predigested form, in the XUL cache? brendan? waterson? bueller?)
Keywords: perf
Comment 23•23 years ago
Just to be clear on the notation: nsParser::Tokenize --> 216 calls (675ms) means that nsParser::Tokenize() was called 216 times by the function above for a (Quantify estimated) realtime cost of 675 msec for the function plus descendant (F+D) time, for the given column (build).
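For readers unfamiliar with the Quantify terms: F time is time spent in the function body itself, and F+D adds the time of everything it calls. A toy sketch of how F+D rolls up a call tree (the function names and millisecond figures here are invented, not Quantify data):

```python
# Each node is (self_ms, children). F = self_ms; F+D = self_ms plus the
# F+D of every child, recursively.
def f_plus_d(node):
    self_ms, children = node
    return self_ms + sum(f_plus_d(c) for c in children)

xml_parse = (50, [])
parse_buffer = (15, [xml_parse])
tokenize = (10, [parse_buffer])

print(f_plus_d(tokenize))  # 75: Tokenize's own F is only 10ms, but F+D includes descendants
```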
Comment 24•23 years ago
> by the function above
by the function above it in the column
Keywords: regression
Assignee
Comment 25•23 years ago
jrgm: Is it F-time or F+D time? If F+D then in the nsExpatTokenizer case you should include CWellFormedDTD time [ ::BuildModel(), ::GetTokenizer() ] as well since all the sink calls happen via CWellFormedDTD.
Assignee
Comment 26•23 years ago
jrgm: The code path has changed a bit, and I'm not sure if the quantify data would be of much use :-(
Assignee
Comment 27•23 years ago
Out of time. Mass move to 0.9.9
Target Milestone: mozilla0.9.8 → mozilla0.9.9
Comment 28•23 years ago
Okay, so I did a set of Quantify runs again, once with a build pulled by date 01/08 17:36pm, and again with a build pulled 01/17 02:00am. Data collection was enabled only for a period of time where I opened and closed 20 windows, using a default, new profile in each test (with the sidebar collapsed, in the classic skin). When I tested outside of Quantify, there was a 1% difference in the times between the builds (Jan17 slower), and while testing in Quantify, the Jan17 build measured as 2% slower.

Using Quantify's difference function to compare the two builds, the biggest time difference was the F+D time descending from nsParser::ResumeParse, which took (according to Quantify) 2618ms for the Jan17 build and 2094ms for the Jan08 build (diff = 524ms, or ~25ms per window open). I'll attach a picture of the 'critical path' in the diff Quantify run, and an abbreviated list of the top and bottom of a function list sorted by F+D. The actual Quantify runs (base, new, diff) will be posted at: http://jrgm.mcom.com/quantify/bug-119066/Jan08-Jan17

In the abbreviated F+D list, first leave aside the top 3 and bottom 3 largest F+D changes. (nsCSSFrameConstructor::ContentInserted shows up as different function calls due to a signature change in rev 1.110 of nsCSSFrameConstructor.h; the flip of 590 calls from nsXBLPrototypeHandler::BindingDetached to nsXBLPrototypeHandler::BindingAttached looks just plain weird, but the constructors for nsXBLPrototypeHandler() were changed in rev 1.11 of nsXBLPrototypeHandler.h, and maybe Quantify is just confused by that; the flip of nsJSContext::`vector deleting destructor' to nsJSContext::`scalar deleting destructor' is something for someone else to ponder.)

So, the biggest "real" gainers and losers after that are all tied up in the parser (e.g., nsParser::BuildModel took 665ms more time in the 01/17 build). Hope that points to something that is solvable.
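Quantify's 'difference' feature amounts to a per-function subtraction of two profiles. A minimal sketch of the idea: the 2094ms/2618ms ResumeParse totals and the 665ms BuildModel delta come from the comment above, but the 900ms BuildModel base figure is invented purely for illustration:

```python
def profile_diff(base, new):
    """base/new map function name -> F+D ms; returns deltas, worst regression first."""
    names = set(base) | set(new)
    deltas = {n: new.get(n, 0) - base.get(n, 0) for n in names}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)

base = {"nsParser::ResumeParse": 2094, "nsParser::BuildModel": 900}
new = {"nsParser::ResumeParse": 2618, "nsParser::BuildModel": 1565}
print(profile_diff(base, new))
# [('nsParser::BuildModel', 665), ('nsParser::ResumeParse', 524)]
```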
Comment 29•23 years ago
In that image, which shows the difference in execution paths between the build pulled Jan08 and the build pulled Jan17, the brown arcs are code paths that are called _more_ in the Jan17 build, and the green arcs are code paths that are called _less_ in the Jan17 build (which includes code paths that don't exist in the Jan17 build).
Updated•23 years ago
Attachment #65592 - Attachment mime type: text/html → image/png
Comment 30•23 years ago
Assignee
Comment 31•23 years ago
>So, the biggest "real" gainers and losers after that are all tied up in the
>parser (e.g., nsParser::BuildModel took 665ms more time in the 01/17 build).
NS_IMETHODIMP
nsExpatDriver::BuildModel(nsIParser* aParser,
nsITokenizer* aTokenizer,
nsITokenObserver* anObserver,
nsIContentSink* aSink)
{
return mInternalState;
}
The above method does (virtually) no work, so if you're seeing a difference in nsParser::BuildModel() then either CNavDTD or layout has slowed down. I don't believe it's CNavDTD, since all I changed there was replacing virtual nsresult with NS_IMETHOD.
Assignee
Comment 32•23 years ago
Landed fix for bug 120345. It improved Txul time by ~1%.
Comment 33•23 years ago
one small step for man... thanks harish.
Comment 34•23 years ago
harishd, so where are we at with this one?
Assignee
Comment 35•22 years ago
It's extremely hard to spot the problem, and there's no way to come up with a fix in the 1.0 milestone. Moving to 1.1 (hopefully it will get fixed then!).
Comment 36•22 years ago
I recommend some conclusion here. If there's a trail we need to keep looking at, let's look at it. If not, then let's mark this bug as done. Pushing this to 1.1 reduced the probability of anything happening quite a bit.
Reporter
Comment 37•22 years ago
harish, this is a tinderbox performance regression. We need to address this bug: either come up with a fix, or explain why your feature/fix is causing a perf regression. I don't think it's fair to mark it nsbeta1-; the longer we push it off, the smaller the chance we'll be able to address it.
Comment 38•22 years ago
What has been done about that in the meantime?
Target Milestone: mozilla1.1alpha → mozilla1.1beta
Comment 39•22 years ago
Uh, Markus, you're not allowed to change another engineer's target milestone. Don't do that.
Assignee
Comment 40•22 years ago
>harish, this is a tinderbox performance regression. we need to address this
>bug, either come up with a fix or explain why your feature/fix is costing perf
>regression.
Cathleen, it's hard to say what caused the regression. However, IMO, we should accept this regression in exchange for the benefits (10% improvement in Ts, and ~2MB reduction in memory bloat).
Updated•22 years ago
Target Milestone: mozilla1.1beta → Future
Comment 41•22 years ago
So is this going to be marked "fixed" then?
Comment 42•20 years ago
I propose that this bug be closed, or at least downgraded from Critical P1. I believe all the perf regressions discussed here have long since been superseded by other performance improvements.
Comment 43•20 years ago
I think we have given this a close enough look. It hasn't gotten attention for 2 years, and it is almost pointless to go after this small a performance win after such a long time. Since some work did happen, I will mark this fixed. Harish, if you are active on this bug, please feel free to correct me or add something here.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Updated•20 years ago
|
Product: Browser → Seamonkey