Closed Bug 119066 Opened 23 years ago Closed 20 years ago

pageload and window open slowed down (~1%)

Categories

(SeaMonkey :: General, defect, P1)


Tracking

(Not tracked)

RESOLVED FIXED
Future

People

(Reporter: cathleennscp, Assigned: harishd)

Details

(Keywords: perf, regression)

Attachments

(8 files)

Tinderbox pageload jumped from the 1220s to the 1240s (ms).

Check-ins happened on 1/08 from 17:47 to 18:56:
http://tinderbox.mozilla.org/bonsai/cvsquery.cgi?module=MozillaTinderboxAll&date=explicit&mindate=1010538840&maxdate=1010540819

See graph:
http://tegu.mozilla.org/graph/query.cgi?tbox=btek&testname=pageload&autoscale=1&days=7&avg=1

We need to address this soon.
Looks like Txul (window open) also slowed down during this time:

467ms -> 486ms on comet (4%)
for the same check-ins above
(http://tinderbox.mozilla.org/bonsai/cvsquery.cgi?module=MozillaTinderboxAll&date=explicit&mindate=1010539260&maxdate=1010540819)

1434ms -> 1476ms on sleestack (2.9%)
for a slightly larger group of checkins
(http://tinderbox.mozilla.org/bonsai/cvsquery.cgi?module=MozillaTinderboxAll&date=explicit&mindate=1010540220&maxdate=1010544419)
Summary: pageload slowed down by 1% (14ms) → pageload and window open slowed down by 1% (14ms)
Summary: pageload and window open slowed down by 1% (14ms) → pageload and window open slowed down
My checkins-under-suspicion affected the XUL window opening process, so could
reasonably be suspected of affecting Txul times. But I've profiled before and
after and I believe my changes to be a wash. I'm thinking I'm excused...
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → mozilla0.9.8
Looks like it's me! Don't know how my checkin affects pageload. Investigating.
So, I had set up a couple of builds to do some testing: one pull-by-date from
before harishd's (and other) checkins, and one from the tip at ~18:00 today. I was
just running some startup tests to confirm the gain there when I stumbled across
the fact that 'XUL.mfasl' was getting blown away on alternate startups. That
means that the fasl file is corrupt and, I believe, that the XUL cache may not
be getting (correctly) populated. That, of course, would affect both pageload
and, more so, window open. [The startup gain may be better than reported so far,
although since we report only the minimum time, we may already be reporting
roughly what it will be once the fasl file is no longer rebuilt on each startup.]

I've confirmed that today's Linux commercial build shows the same problem with 
XUL.mfasl as my own trunk build showed. XUL.mfasl is deleted every second 
startup.
see bug 119038 for an assertion and bail that happens on every second startup 
(on Linux at least).
This jprof output was collected as:

setenv JPROF_FLAGS "JP_START JP_PERIOD=0.010"

and running the command:

./mozilla -chrome
"file:///mozilla2/mozilla/xpfe/test/winopen.xul?close=1&phase1=20&phase2=20&phase3=10&cycles=2"


which opens 100 windows. 
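
(For what it's worth, a quick sketch of how the 100-window figure presumably
falls out of those URL parameters. The parameter names come from the command
above; the interpretation of phase1/phase2/phase3 as per-phase window counts and
cycles as full repeats is my assumption, chosen because it matches the stated
total.)

#include <cstdio>

int main() {
  // Hypothetical illustration only: parameter meanings are assumed, not
  // documented in this bug.
  const int phase1 = 20, phase2 = 20, phase3 = 10;  // windows per phase (assumed)
  const int cycles = 2;                             // repeats of all phases (assumed)
  std::printf("total windows: %d\n", (phase1 + phase2 + phase3) * cycles);  // 100
  return 0;
}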

The Jan08 build had these times (with jprof not running):

avgOpenTime:894   minOpenTime:877  maxOpenTime:1068  medOpenTime:891

The Jan09 build (pulled 6:10pm; jprof to follow) had these times (jprof not
running):

avgOpenTime:912  minOpenTime:895  maxOpenTime:954  medOpenTime:912

which is 2% slower. 

If any jprof maestros can lend an interpretation, that would be appreciated (I'm
just starting to learn how to use this jprof output).
Adding the jprof maestros referred to above (sorry, I didn't have a place to
host the web page, and the attachment unzipped was too large for Bugzilla).
First thoughts:

Jan 9 seems to be doing more allocations and frees, especially of COM objects:
Jan 8: 99      298 nsCOMPtr_base::~nsCOMPtr_base
Jan 9: 113     331 nsCOMPtr_base::~nsCOMPtr_base

It also looks like there are more QI's going on (which may make sense given the
above):
Jan 8: 33      575 nsCOMPtr_base::assign_from_helper
Jan 9: 40      601 nsCOMPtr_base::assign_from_helper

Together those would account for around 1/2 of the difference.

Running some of the "how many objects of what types are being allocated" tests
might help.


Part of the problem is that your jprof run isn't at a fine enough resolution to
find this sort of thing easily.  The sampling error is too large for a total
difference of only 125 out of ~5600 samples.

I normally use JP_PERIOD=0.0015 (about the lowest you can go), though that's
also on a fast machine.
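
(To make the sampling-error point concrete, a back-of-the-envelope sketch. Only
the 125/~5600 figures and the two JP_PERIOD values come from the comments above;
the sqrt(N) noise model, which treats profiler hits as roughly Poisson counts,
and the arithmetic are my own assumption.)

#include <cmath>
#include <cstdio>

int main() {
  const double totalSamples = 5600.0;  // total hits at JP_PERIOD=0.010
  const double observedDiff = 125.0;   // Jan8-vs-Jan9 difference in hits

  // At 10 ms per sample, statistical noise (~sqrt(N)) is comparable to the diff.
  std::printf("JP_PERIOD=0.010 : diff %.0f vs noise ~%.0f\n",
              observedDiff, std::sqrt(totalSamples));

  // At 1.5 ms per sample the same run collects ~6.7x more hits, so the
  // difference grows linearly while the noise only grows like sqrt().
  const double scale = 0.010 / 0.0015;
  std::printf("JP_PERIOD=0.0015: diff ~%.0f vs noise ~%.0f\n",
              observedDiff * scale, std::sqrt(totalSamples * scale));
  return 0;
}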
What/Where is XUL.mfasl?
That is the fastload file. Fastload file corruption can increase startup and
pageload times. That probably explains why we are not seeing the 10% startup
improvement that dbaron was seeing from this checkin.

I will help you track this down today.
I chatted with harishd over IRC, but for the record: in your profile directory,
the fastload file is XUL FastLoad File on Macs, XUL.mfl on Windows, and
XUL.mfasl on Unix-y systems.  The corruption happens when writing a new file
(because there was no file, or it was invalid), and is only detected on the next
app startup.

/be
I earlier said:
  > That means that the fasl file is corrupt, and, I believe, that the XUL 
  > cache may not be getting (correctly) populated.
but now I'm wondering whether I was confusing 'bad fastload' with an invalid
XUL document (or XBL binding). So, a question: is the XUL cache immune to
fastload handling errors, or can the XUL cache (possibly) be left in a
less-than-optimal state? (Just curious.)
I have a fix for the fastload problem. I have to run the pageload test against my
fix. Hopefully this will fix Tp & Txul.
jrgm: the XUL cache is not persistent, and any failure to write or read FastLoad
data should not affect what's in the cache.

/be
Ts on comet and sleestack dropped by 4% and 5% respectively (after fixing bug
119038). However, I don't see a significant change in Tp/Txul.

I built two Mozilla builds with MOZ_PROFILE=1, MOZ_DEBUG=. One was pulled
just before harishd's first parser checkin on Jan 8, and the other was
pulled at midnight Jan 10, which had the leak and fastload fixes (along
with whatever else landed on the trunk in that period). The window open
times on btek, mecca, etc. (for Linux) have continued to show about a 2%
regression.

I did two Quantify runs, one with each build, by starting the browser,
immediately stopping data collection once the browser was up, then beginning
data collection again, running a test to time the opening of 5 new windows,
stopping the data collection, and finally shutting down and letting Quantify
process the log.

I then used the 'delta' comparison feature to compare the two runs (which
gives you a call graph and function lists showing the difference in the two
runs).

The hot spot of this 'delta' Quantify comparison is the following. 
From nsParser::ResumeParse, I have:

  build: Jan 10 23:59 pull                  build: Jan 8 17:36 pull

nsParser::Tokenize                        nsParser::Tokenize 
    --> 216 calls (675ms)                     --> 213 calls (401ms)
         |                                         |
         |                                         |
nsExpatDriver::ConsumeToken               nsExpatTokenizer::ConsumeToken        
    --> 186 calls (666ms)                     --> 185 calls (382ms)
         |                                         |
         |                                         |
nsExpatDriver::ParseBuffer                nsExpatTokenizer::ParseXMLBuffer
    --> 203 calls (665ms)                     --> 196 calls (381ms)
         |                                         |
         |                                         |
  XML_Parse                                 XML_Parse
    --> 383 calls (664ms)                     --> 381 calls (380ms)
         |                                         |
   ---------------------                     ---------------------       
   |                   |       	             |                   |       
prologProcessor  prologInitProcessor      prologProcessor  prologInitProcessor
  --> 169 calls    --> 93 calls             --> 93 calls     --> 93 calls
      (429ms)          (274ms)                  (157ms)          (157ms)      

and at this point, it's probably better to look at the actual Quantify data,
since this ascii diagram is about to get too confusing (if not already).

I've put the two original files and the delta file at 
  http://jrgm.mcom.com/quantify/bug-119066
(I'd upload them to bugzilla but they're up to 1/4 GB in size :-]).

That 275 msec difference, over five windows, is ~50 msec per window against a
normal runtime window-open time of 750 msec, which is about a 7% difference.
That's much higher than the actual regression on Linux, but (a) I need to do a
real measurement on Windows, under the same conditions as this Quantify data
was generated, to see what the win32 number looks like for real, and (b) I'm
not sure that the Quantify real-time values are useful beyond getting the
general "flavour" of what this maps to in real time. So, 7% is "kind of" correct.
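
(Spelling out that estimate with the numbers from the paragraph above; just a
sanity check of the arithmetic, not new data.)

#include <cstdio>

int main() {
  // Figures quoted in the comment above.
  const double deltaMs      = 275.0;  // F+D difference across the Quantify run
  const double windows      = 5.0;    // windows opened while collecting data
  const double normalOpenMs = 750.0;  // typical runtime window-open time

  const double perWindow = deltaMs / windows;  // ~55 ms per window
  std::printf("per-window delta ~%.0f ms, i.e. ~%.0f%% of a %.0f ms open\n",
              perWindow, 100.0 * perWindow / normalOpenMs, normalOpenMs);
  return 0;
}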

Maybe, though, I've badly misread this output, or otherwise botched this test.
But perhaps this is an area to look at for what slowed down during window open.

(I have one (very) naive question: if we're opening a window type that has
already been opened before, why is the XML parser being called? I thought
that we would already have this, in predigested form, in the XUL cache.
brendan? waterson? bueller?)
Keywords: perf
Just to be clear on the notation:

    nsParser::Tokenize
        --> 216 calls (675ms)

means that nsParser::Tokenize() was called 216 times by the function above
for a (Quantify estimated) realtime cost of 675 msec for the function plus
descendant (F+D) time, for the given column (build).
> by the function above

by the function above it in the column
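
(A tiny made-up illustration of F versus F+D time, in case the notation is
unfamiliar; the functions and times below are hypothetical, not taken from the
Quantify data.)

// Hypothetical: suppose Parse() spends 10 ms in its own body and calls
// Tokenize(), which takes 5 ms. Then, for Parse():
//   F   time = 10 ms  (function-only time)
//   F+D time = 15 ms  (function plus all of its descendants)
void Tokenize() { /* pretend this takes 5 ms */ }
void Parse()    { /* pretend this body takes 10 ms */ Tokenize(); }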
Keywords: regression
Severity: normal → critical
jrgm: Is it F time or F+D time? If F+D, then in the nsExpatTokenizer case you
should include CWellFormedDTD time [::BuildModel(), ::GetTokenizer()] as well,
since all the sink calls happen via CWellFormedDTD.
jrgm: The code path has changed a bit, and I'm not sure the Quantify data would
be of much use :-(
Out of time. Mass move to 0.9.9
Target Milestone: mozilla0.9.8 → mozilla0.9.9
Okay, so I did a set of Quantify runs again, once with a build pulled by date
01/08 17:36pm, and again with a build pulled 01/17 02:00am.  Data collection
was enabled only for a period of time where I opened and closed 20 windows
using a default, new profile in each test (with the sidebar collapsed; in the
classic skin).

When I tested outside of Quantify, there was a 1% difference in the times
between the builds (Jan17 slower), and while testing in Quantify, the Jan17
build measured as 2% slower.

Using Quantify's difference function to compare the two builds, the biggest
time difference between them was the F+D time descending from
nsParser::ResumeParse, which took (according to Quantify) 2618ms for the Jan17
build and 2094ms for the Jan08 build (diff = 524ms, or ~25ms per window open).

I'll attach a picture of the 'critical path' in the diff Quantify run, and an
abbreviated list of the top and bottom of a function list sorted by F+D.  The
actual Quantify runs (base, new, diff) will be posted at:
    http://jrgm.mcom.com/quantify/bug-119066/Jan08-Jan17

In the abbreviated F+D list, first, leave aside the top 3 and bottom 3 largest
F+D changes. (nsCSSFrameConstructor::ContentInserted shows up as different
function calls due to a signature change in rev 1.110 of nsCSSFrameConstructor.h;
the flip of 590 calls from nsXBLPrototypeHandler::BindingDetached to
nsXBLPrototypeHandler::BindingAttached looks just plain weird, but the
constructors for nsXBLPrototypeHandler() were changed, rev 1.11 of
nsXBLPrototypeHandler.h, and maybe Quantify is just confused by that; the flip
of nsJSContext::`vector deleting destructor' to nsJSContext::`scalar deleting
destructor' is something for someone else to ponder).

So, the biggest "real" gainers and losers after that are all tied up in the
parser (e.g., nsParser::BuildModel took 665ms more time in the 01/17 build).

Hope that points to something that is solvable.
In that image, which shows the difference in execution paths between the build
pulled Jan08 and the build pulled Jan17, the brown arcs are code paths that are
called _more_ in the Jan17 build, and the green arcs are code paths that are
called _less_ in the Jan17 build (which includes code paths that don't exist in
the Jan17 build).
Attachment #65592 - Attachment mime type: text/html → image/png
>So, the biggest "real" gainers and losers after that are all tied up in the
>parser (e.g., nsParser::BuildModel took 665ms more time in the 01/17 build).

NS_IMETHODIMP 
nsExpatDriver::BuildModel(nsIParser* aParser,
                          nsITokenizer* aTokenizer,
                          nsITokenObserver* anObserver,
                          nsIContentSink* aSink) 
{
  return mInternalState;
}

The above method does (virtually) no work, so if you're seeing a difference in
nsParser::BuildModel() then either CNavDTD or layout has slowed down.
I don't believe it's CNavDTD, since all I changed there was replacing virtual
nsresult with NS_IMETHOD.
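
(For context on that "virtual nsresult → NS_IMETHOD" change: as far as I know,
NS_IMETHOD is just a macro spelling of virtual nsresult, plus a calling-convention
keyword on some platforms, so the two declarations below should produce
equivalent vtable entries. This is an illustrative sketch only; the class names
are hypothetical, and the exact macro definition lives in nscore.h.)

#include "nscore.h"   // nsresult, NS_IMETHOD (assumed location of the macros)

class nsIParser;      // forward declarations so the sketch stands alone
class nsITokenizer;
class nsITokenObserver;
class nsIContentSink;

// Before the checkin: a plain virtual method returning nsresult.
class HypotheticalDTDBefore {
public:
  virtual nsresult BuildModel(nsIParser* aParser, nsITokenizer* aTokenizer,
                              nsITokenObserver* anObserver, nsIContentSink* aSink);
};

// After the checkin: the same signature written with the XPCOM macro, which
// expands to "virtual nsresult" (plus a platform calling convention), so the
// generated code should be the same.
class HypotheticalDTDAfter {
public:
  NS_IMETHOD BuildModel(nsIParser* aParser, nsITokenizer* aTokenizer,
                        nsITokenObserver* anObserver, nsIContentSink* aSink);
};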
Landed fix for bug 120345. It improved Txul time by ~1%.
One small step for man... thanks, Harish.
Keywords: nsbeta1
harishd, so where are we at with this one?
Target Milestone: mozilla0.9.9 → mozilla1.0
It's extremely hard to spot the problem, and there's no way to come up with a fix
in the 1.0 milestone. Moving to 1.1 (hopefully it will get fixed then!).
Keywords: nsbeta1 → nsbeta1-
Summary: pageload and window open slowed down → pageload and window open slowed down (~1%)
Target Milestone: mozilla1.0 → mozilla1.1
I recommend some conclusion here.  If there's a trail
we need to keep looking at, let's look at it.  If not,
then let's mark this bug as done.  Pushing this to 1.1
reduced the probability of anything happening quite a bit.
Harish, this is a tinderbox performance regression.  We need to address this
bug: either come up with a fix or explain why your feature/fix is costing us a
perf regression.

I don't think it's fair to mark it nsbeta1-; the longer we push this off, the
smaller the chance we'll be able to address it.
What has been done about that in the meantime?
Target Milestone: mozilla1.1alpha → mozilla1.1beta
Uh, Markus, you're not allowed to change another engineer's target milestone.
Don't do that.
>Harish, this is a tinderbox performance regression.  We need to address this
>bug: either come up with a fix or explain why your feature/fix is costing us a
>perf regression.

Cathleen, it's hard to say what caused that regression. However, IMO, we should
accept this regression in exchange for the benefits (a 10% improvement in Ts and
a ~2MB reduction in memory bloat).
Target Milestone: mozilla1.1beta → Future
So is this going to be marked "fixed" then?
I propose that this bug be closed or at least downgraded from Critical P1.  I
believe all the perf regressions discussed here have long since been superseded
by other performance improvements.
I think we have given this a close enough look.  It hasn't gotten attention
for 2 years, and it is almost pointless to go after this small a performance win
after such a long time.  Since some work did happen, I will mark this fixed.

Harish, if you are active in this bug please feel free to correct me or add something here.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Product: Browser → Seamonkey