Closed Bug 119066 Opened 23 years ago Closed 20 years ago

pageload and window open slowed down (~1%)

Categories

(SeaMonkey :: General, defect, P1)


Tracking

(Not tracked)

RESOLVED FIXED
Future

People

(Reporter: cathleennscp, Assigned: harishd)

Details

(Keywords: perf, regression)

Attachments

(8 files)

Tinderbox pageload jumped from the 1220s to the 1240s (ms).

Check-ins happened on 1/08 from 17:47 to 18:56:
http://tinderbox.mozilla.org/bonsai/cvsquery.cgi?module=MozillaTinderboxAll&date=explicit&mindate=1010538840&maxdate=1010540819

See graph:
http://tegu.mozilla.org/graph/query.cgi?tbox=btek&testname=pageload&autoscale=1&days=7&avg=1

We need to address this soon.
Looks like Txul (window open) also slowed down during this time:

467ms -> 486ms on comet (4%)
for the same check-ins above
(http://tinderbox.mozilla.org/bonsai/cvsquery.cgi?module=MozillaTinderboxAll&date=explicit&mindate=1010539260&maxdate=1010540819)

1434ms -> 1476ms on sleestack (2.9%)
for a slightly larger group of checkins
(http://tinderbox.mozilla.org/bonsai/cvsquery.cgi?module=MozillaTinderboxAll&date=explicit&mindate=1010540220&maxdate=1010544419)
Summary: pageload slowed down by 1% (14ms) → pageload and window open slowed down by 1% (14ms)
Summary: pageload and window open slowed down by 1% (14ms) → pageload and window open slowed down
My checkins-under-suspicion affected the XUL window opening process, so could
reasonably be suspected of affecting Txul times. But I've profiled before and
after and I believe my changes to be a wash. I'm thinking I'm excused...
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → mozilla0.9.8
Looks like it's me! Don't know how my checkin affects pageload. Investigating.
So, I had set up a couple of builds to do some testing: one pull-by-date from
before harishd's (and other) checkins, and one from the tip at ~18:00 today. I was
just running some startup tests to confirm the gain there when I stumbled across
the fact that 'XUL.mfasl' was getting blown away on alternate startups. That
means that the fasl file is corrupt and, I believe, that the XUL cache may not
be getting (correctly) populated. That, of course, would affect both pageload
and, more so, window open. [The startup gain may be better than reported so far,
although since we report only the minimum time, we may already be reporting
roughly what it will be once the fasl file is no longer rebuilt on each startup.]

I've confirmed that today's Linux commercial build shows the same problem with 
XUL.mfasl as my own trunk build showed. XUL.mfasl is deleted every second 
startup.
see bug 119038 for an assertion and bail that happens on every second startup 
(on Linux at least).
This jprof output was collected as:

setenv JPROF_FLAGS "JP_START JP_PERIOD=0.010"

and running the command:

./mozilla -chrome
"file:///mozilla2/mozilla/xpfe/test/winopen.xul?close=1&phase1=20&phase2=20&phase3=10&cycles=2"


which opens 100 windows. 
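
(For what it's worth, a quick sketch of how the 100-window figure presumably
falls out of those URL parameters. The parameter names come from the command
above; the interpretation of phase1/phase2/phase3 as per-phase window counts and
cycles as full repeats is my assumption, chosen because it matches the stated
total.)

#include <cstdio>

int main() {
  // Hypothetical illustration only: parameter meanings are assumed, not
  // documented in this bug.
  const int phase1 = 20, phase2 = 20, phase3 = 10;  // windows per phase (assumed)
  const int cycles = 2;                             // repeats of all phases (assumed)
  std::printf("total windows: %d\n", (phase1 + phase2 + phase3) * cycles);  // 100
  return 0;
}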

The Jan08 build had these times (with jprof not running):

avgOpenTime:894   minOpenTime:877  maxOpenTime:1068  medOpenTime:891

The Jan09 build (pulled 6:10pm; jprof to follow) had these times (jprof not
running):

avgOpenTime:912  minOpenTime:895  maxOpenTime:954  medOpenTime:912

which is 2% slower. 

If any jprof maestros can lend an interpretation, that would be appreciated (I'm
just starting to learn how to use this jprof output).
Adding the jprof maestros referred to above (sorry, I didn't have a place to
host the web page, and the attachment unzipped was too large for Bugzilla).
First thoughts:

Jan 9 seems to be doing more allocations and frees, especially of COM objects:
Jan 8: 99      298 nsCOMPtr_base::~nsCOMPtr_base
Jan 9: 113     331 nsCOMPtr_base::~nsCOMPtr_base

It also looks like there are more QI's going on (which may make sense given the
above):
Jan 8: 33      575 nsCOMPtr_base::assign_from_helper
Jan 9: 40      601 nsCOMPtr_base::assign_from_helper

Together those would account for around 1/2 of the difference.

Running some of the "how many objects of what types are being allocated" tests
might help.


Part of the problem is that your jprof run isn't at a fine enough resolution to
find this sort of thing easily.  The sampling error is too large for a total
difference of only 125 out of ~5600 samples.

I normally use JP_PERIOD=0.0015 (about the lowest you can go), though that's
also on a fast machine.
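
(To make the sampling-error point concrete, a back-of-the-envelope sketch. Only
the 125/~5600 figures and the two JP_PERIOD values come from the comments above;
the sqrt(N) noise model, which treats profiler hits as roughly Poisson counts,
and the arithmetic are my own assumption.)

#include <cmath>
#include <cstdio>

int main() {
  const double totalSamples = 5600.0;  // total hits at JP_PERIOD=0.010
  const double observedDiff = 125.0;   // Jan8-vs-Jan9 difference in hits

  // At 10 ms per sample, statistical noise (~sqrt(N)) is comparable to the diff.
  std::printf("JP_PERIOD=0.010 : diff %.0f vs noise ~%.0f\n",
              observedDiff, std::sqrt(totalSamples));

  // At 1.5 ms per sample the same run collects ~6.7x more hits, so the
  // difference grows linearly while the noise only grows like sqrt().
  const double scale = 0.010 / 0.0015;
  std::printf("JP_PERIOD=0.0015: diff ~%.0f vs noise ~%.0f\n",
              observedDiff * scale, std::sqrt(totalSamples * scale));
  return 0;
}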
What/Where is XUL.mfasl?
That is the fastload file. Fastload file corruption can increase startup and
pageload times. That probably explains why we are not seeing the 10% startup
improvement that dbaron was seeing from this checkin.

I will help you track this down today.
I chatted with harishd over IRC, but for the record: in your profile directory,
the fastload file is XUL FastLoad File on Macs, XUL.mfl on Windows, and
XUL.mfasl on Unix-y systems.  The corruption happens when writing a new file
(because there was no file, or it was invalid), and is only detected on the next
app startup.

/be
I earlier said:
  > That means that the fasl file is corrupt, and, I believe, that the XUL 
  > cache may not be getting (correctly) populated.
but now I'm wondering whether I was confusing 'bad fastload' with an invalid
XUL document (or XBL binding). So, a question: is the XUL cache immune to
fastload handling errors, or can the XUL cache (possibly) be left in a
less-than-optimal state? (Just curious.)
I have a fix for the fastload problem. I have to run the pageload test against my
fix. Hopefully this will fix Tp & Txul.
jrgm: the XUL cache is not persistent, and any failure to write or read FastLoad
data should not affect what's in the cache.

/be
Ts on comet and sleestack dropped by 4% and 5% respectively (after fixing bug
119038). However, I don't see a significant change in Tp/Txul.

I built two Mozilla builds with MOZ_PROFILE=1, MOZ_DEBUG=. One was pulled
just before harishd's first parser checkin on Jan 8, and the other was
pulled at midnight Jan 10, which had the leak and fastload fixes (along
with whatever else landed on the trunk in that period). The window open
times on btek, mecca, etc. (for Linux) have continued to show about a 2%
regression.

I did two Quantify runs, one with each build, by starting the browser,
immediately stopping data collection once the browser was up, then beginning
data collection again, running a test to time the opening of 5 new windows,
stopping the data collection, and finally shutting down and letting Quantify
process the log.

I then used the 'delta' comparison feature to compare the two runs (which
gives you a call graph and function lists showing the difference in the two
runs).

The hot spot of this 'delta' Quantify comparison is the following. 
From nsParser::ResumeParse, I have:

  build: Jan 10 23:59 pull                  build: Jan 8 17:36 pull

nsParser::Tokenize                        nsParser::Tokenize 
    --> 216 calls (675ms)                     --> 213 calls (401ms)
         |                                         |
         |                                         |
nsExpatDriver::ConsumeToken               nsExpatTokenizer::ConsumeToken        
    --> 186 calls (666ms)                     --> 185 calls (382ms)
         |                                         |
         |                                         |
nsExpatDriver::ParseBuffer                nsExpatTokenizer::ParseXMLBuffer
    --> 203 calls (665ms)                     --> 196 calls (381ms)
         |                                         |
         |                                         |
  XML_Parse                                 XML_Parse
    --> 383 calls (664ms)                     --> 381 calls (380ms)
         |                                         |
   ---------------------                     ---------------------       
   |                   |       	             |                   |       
prologProcessor  prologInitProcessor      prologProcessor  prologInitProcessor
  --> 169 calls    --> 93 calls             --> 93 calls     --> 93 calls
      (429ms)          (274ms)                  (157ms)          (157ms)      

and at this point, it's probably better to look at the actual Quantify data,
since this ascii diagram is about to get too confusing (if not already).

I've put the two original files and the delta file at 
  http://jrgm.mcom.com/quantify/bug-119066
(I'd upload them to bugzilla but they're up to 1/4 GB in size :-]).

That 275 msec difference, over five windows, is ~50 msec per window against a
normal runtime window-open time of 750 msec, which is about a 7% difference.
That's much higher than the actual regression on Linux, but (a) I need to do a
real measurement on Windows, under the same conditions as this Quantify data
was generated, to see what the win32 number looks like for real, and (b) I'm
not sure that the Quantify real-time values are useful beyond getting the
general "flavour" of what this maps to in real time. So, 7% is "kind of" correct.
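
(Spelling out that estimate with the numbers from the paragraph above; just a
sanity check of the arithmetic, not new data.)

#include <cstdio>

int main() {
  // Figures quoted in the comment above.
  const double deltaMs      = 275.0;  // F+D difference across the Quantify run
  const double windows      = 5.0;    // windows opened while collecting data
  const double normalOpenMs = 750.0;  // typical runtime window-open time

  const double perWindow = deltaMs / windows;  // ~55 ms per window
  std::printf("per-window delta ~%.0f ms, i.e. ~%.0f%% of a %.0f ms open\n",
              perWindow, 100.0 * perWindow / normalOpenMs, normalOpenMs);
  return 0;
}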

Maybe, though, I've badly misread this output, or otherwise botched this test.
But perhaps this is an area to look at for what slowed down during window open.

(I have one (very) naive question: if we're opening a window type that has
already been opened before, why is the XML parser being called? I thought
that we would already have this, in predigested form, in the XUL cache.
brendan? waterson? bueller?)
Keywords: perf
Just to be clear on the notation:

    nsParser::Tokenize
        --> 216 calls (675ms)

means that nsParser::Tokenize() was called 216 times by the function above
for a (Quantify estimated) realtime cost of 675 msec for the function plus
descendant (F+D) time, for the given column (build).
> by the function above

by the function above it in the column
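
(A tiny made-up illustration of F versus F+D time, in case the notation is
unfamiliar; the functions and times below are hypothetical, not taken from the
Quantify data.)

// Hypothetical: suppose Parse() spends 10 ms in its own body and calls
// Tokenize(), which takes 5 ms. Then, for Parse():
//   F   time = 10 ms  (function-only time)
//   F+D time = 15 ms  (function plus all of its descendants)
void Tokenize() { /* pretend this takes 5 ms */ }
void Parse()    { /* pretend this body takes 10 ms */ Tokenize(); }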
Keywords: regression
Severity: normal → critical
jrgm: Is it F time or F+D time? If F+D, then in the nsExpatTokenizer case you
should include CWellFormedDTD time [::BuildModel(), ::GetTokenizer()] as well,
since all the sink calls happen via CWellFormedDTD.
jrgm: The code path has changed a bit, and I'm not sure the Quantify data would
be of much use :-(
Out of time. Mass move to 0.9.9
Target Milestone: mozilla0.9.8 → mozilla0.9.9
Okay, so I did a set of Quantify runs again, once with a build pulled by date
01/08 17:36pm, and again with a build pulled 01/17 02:00am.  Data collection
was enabled only for a period of time where I opened and closed 20 windows
using a default, new profile in each test (with the sidebar collapsed; in the
classic skin).

When I tested outside of Quantify, there was a 1% difference in the times
between the builds (Jan17 slower), and while testing in Quantify, the Jan17
build measured as 2% slower.

Using Quantify's difference function to compare the two builds, the biggest
time difference between them was the F+D time descending from
nsParser::ResumeParse, which took (according to Quantify) 2618ms for the Jan17
build and 2094ms for the Jan08 build (diff = 524ms, or ~25ms per window open).

I'll attach a picture of the 'critical path' in the diff Quantify run, and an
abbreviated list of the top and bottom of a function list sorted by F+D.  The
actual Quantify runs (base, new, diff) will be posted at:
    http://jrgm.mcom.com/quantify/bug-119066/Jan08-Jan17

In the abbreviated F+D list, first, leave aside the top 3 and bottom 3 largest
F+D changes. (nsCSSFrameConstructor::ContentInserted shows up as different
function calls due to a signature change in rev 1.110 of nsCSSFrameConstructor.h;
the flip of 590 calls from nsXBLPrototypeHandler::BindingDetached to
nsXBLPrototypeHandler::BindingAttached looks just plain weird, but the
constructors for nsXBLPrototypeHandler() were changed, rev 1.11 of
nsXBLPrototypeHandler.h, and maybe Quantify is just confused by that; the flip
of nsJSContext::`vector deleting destructor' to nsJSContext::`scalar deleting
destructor' is something for someone else to ponder).

So, the biggest "real" gainers and losers after that are all tied up in the
parser (e.g., nsParser::BuildModel took 665ms more time in the 01/17 build).

Hope that points to something that is solvable.
In that image, which shows the difference in execution paths between the build
pulled Jan08 and the build pulled Jan17, the brown arcs are code paths that are
called _more_ in the Jan17 build, and the green arcs are code paths that are
called _less_ in the Jan17 build (which includes code paths that don't exist in
the Jan17 build).
Attachment #65592 - Attachment mime type: text/html → image/png
>So, the biggest "real" gainers and losers after that are all tied up in the
>parser (e.g., nsParser::BuildModel took 665ms more time in the 01/17 build).

NS_IMETHODIMP 
nsExpatDriver::BuildModel(nsIParser* aParser,
                          nsITokenizer* aTokenizer,
                          nsITokenObserver* anObserver,
                          nsIContentSink* aSink) 
{
  return mInternalState;
}

The above method does (virtually) no work, so if you're seeing a difference in
nsParser::BuildModel() then either CNavDTD or layout has slowed down.
I don't believe it's CNavDTD, since all I changed there was replacing virtual
nsresult with NS_IMETHOD.
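
(For context on that "virtual nsresult → NS_IMETHOD" change: as far as I know,
NS_IMETHOD is just a macro spelling of virtual nsresult, plus a calling-convention
keyword on some platforms, so the two declarations below should produce
equivalent vtable entries. This is an illustrative sketch only; the class names
are hypothetical, and the exact macro definition lives in nscore.h.)

#include "nscore.h"   // nsresult, NS_IMETHOD (assumed location of the macros)

class nsIParser;      // forward declarations so the sketch stands alone
class nsITokenizer;
class nsITokenObserver;
class nsIContentSink;

// Before the checkin: a plain virtual method returning nsresult.
class HypotheticalDTDBefore {
public:
  virtual nsresult BuildModel(nsIParser* aParser, nsITokenizer* aTokenizer,
                              nsITokenObserver* anObserver, nsIContentSink* aSink);
};

// After the checkin: the same signature written with the XPCOM macro, which
// expands to "virtual nsresult" (plus a platform calling convention), so the
// generated code should be the same.
class HypotheticalDTDAfter {
public:
  NS_IMETHOD BuildModel(nsIParser* aParser, nsITokenizer* aTokenizer,
                        nsITokenObserver* anObserver, nsIContentSink* aSink);
};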
Landed fix for bug 120345. It improved Txul time by ~1%.
One small step for man... thanks, Harish.
Keywords: nsbeta1
harishd, so where are we at with this one?
Target Milestone: mozilla0.9.9 → mozilla1.0
It's extremely hard to spot the problem, and there's no way to come up with a fix
in the 1.0 milestone. Moving to 1.1 (hopefully it will get fixed then!).
Keywords: nsbeta1 → nsbeta1-
Summary: pageload and window open slowed down → pageload and window open slowed down (~1%)
Target Milestone: mozilla1.0 → mozilla1.1
I recommend some conclusion here.  If there's a trail
we need to keep looking at, let's look at it.  If not,
then let's mark this bug as done.  Pushing this to 1.1
reduced the probability of anything happening quite a bit.
Harish, this is a tinderbox performance regression.  We need to address this
bug: either come up with a fix or explain why your feature/fix is costing us a
perf regression.

I don't think it's fair to mark it nsbeta1-; the longer we push this off, the
smaller the chance we'll be able to address it.
What has been done about that in the meantime?
Target Milestone: mozilla1.1alpha → mozilla1.1beta
Uh, Markus, you're not allowed to change another engineer's target milestone.
Don't do that.
>Harish, this is a tinderbox performance regression.  We need to address this
>bug: either come up with a fix or explain why your feature/fix is costing us a
>perf regression.

Cathleen, it's hard to say what caused that regression. However, IMO, we should
accept this regression in exchange for the benefits (a 10% improvement in Ts and
a ~2MB reduction in memory bloat).
Target Milestone: mozilla1.1beta → Future
So is this going to be marked "fixed" then?
I propose that this bug be closed or at least downgraded from Critical P1.  I
believe all the perf regressions discussed here have long since been superseded
by other performance improvements.
I think we have given this a close enough look.  It hasn't gotten attention
for 2 years, and it is almost pointless to go after this small a performance win
after such a long time.  Since some work did happen, I will mark this fixed.

Harish, if you are active in this bug please feel free to correct me or add something here.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Product: Browser → Seamonkey