Closed Bug 450401 Opened 16 years ago Closed 16 years ago

Tp regression from 4:21 8/7/2008

Categories

(Release Engineering :: General, defect)

Type: defect
Priority: Not set
Severity: blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mossop, Unassigned)

References

()

Details

(Keywords: perf, regression)

Talos is showing a Tp regression, about 10% on XP, OSX and Linux. Oddly Vista shows a Tp improvement at the same time.

My best attempts to find the regression range from the push log led me to the conclusion that it was one of the following changesets:

http://hg.mozilla.org/mozilla-central/index.cgi/rev/04432668e53e
http://hg.mozilla.org/mozilla-central/index.cgi/rev/eb6fda3be5bb
http://hg.mozilla.org/mozilla-central/index.cgi/rev/f74c28a5138e

I tried backing out the first two with no change to the results. I can't believe that a test suite change would affect things, though hgweb is being a little secretive about what that change is.
Flags: blocking1.9.1?
Have we closed the tree yet?  If so, have we tried widening the checkin range we're considering?  Or are we just going by the changesets the builds are reporting they built?

Did anything change in network infrastructure or some such during this time period?
Closing the tree for this and mac orange.
Severity: normal → blocker
(In reply to comment #3)
> Bug 238072 perhaps?
> 

Nevermind.  It appears that was subsequently backed out.
Pulled the CSV data from Mossop's graph link; the following buildIDs bracket the regression range: [The OS X data is too noisy.]

200808071047 (Linux, ok)
200808071419 (XP, ok)
200808071621 (XP, regressed)
200808071621 (Linux, regressed)

So, the range is 7 Aug, between 14:19 and 16:21 Pacific (GMT = Pacific + 7 hours), i.e. 21:19 to 23:21 GMT. Pushlog data from around that range:

23:24:38 2008 +0000	92b4e1f43fbf	Bug 424813, Jquery test suite update, patch by harthur, r=sdwilsh
23:18:31 2008 +0000	04432668e53e	Bug 441469. Implement parsing of @font-face rules. r+sr=dbaron
23:18:31 2008 +0000	eb6fda3be5bb	Bug 113934. Backend and some minimal front end for dragging tabs between windows. r=gavin, r+sr=jst
23:16:20 2008 +0000	f74c28a5138e	Bug 424813, Jquery test suite update, patch by harthur, r=sdwilsh
21:18:48 2008 +0000	b5295239bdba	Bug 449168. Fix selection details leak. r=masayuki, sr=roc
21:18:48 2008 +0000	cc82f602d986	Bug 121341. Switch nsPersistentProperties::Load from reading a char at a time to reading a block at a time, with a state-machine parser and all. r=bsmedberg, sr=bzbarsky
21:18:48 2008 +0000	e7c43898c379	Bug 449422. Assert early and hopefully not often when table-row frames have the wrong display type. r=bernd, sr=roc
21:16:24 2008 +0000	1434eaf5a557	Bug 449317 – Use the new search textbox binding in the Add-ons manager. r=mossop

Other pushes are > 30 minutes outside the regression range.

The last one, 92b4e1f43fbf / Jquery, is just outside the range, and looks rather unlikely to have any Tp impact. All changes are in dom/tests/mochitest/ajax/jquery/.

The next 2 are the ones Mossop says backouts (locally?) didn't help with.

The next one, f74c28a5138e / Jquery, seems to be a bogus push? The changelog is empty. Not sure what happened here.

The next 3 changesets are technically outside the regression range, but only by a minute (and perhaps as little as 12 seconds). Between our fuzzy timestamps and clock skew, it seems quite possible these 3 changesets were not picked up in the 200808071419 build, and thus may be the cause of the regression.
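
For reference, a minimal Python sketch of the timestamp arithmetic above (the buildIDs and push times are copied from this comment; the 7-hour PDT-to-GMT offset and the 2-minute slop for clock skew are assumptions, not anything the graph server guarantees):

from datetime import datetime, timedelta

# BuildIDs are local (PDT) timestamps in YYYYMMDDHHMM form; the pushlog
# (at the time) reported GMT, so GMT = buildid time + 7 hours.
PDT_TO_GMT = timedelta(hours=7)

def buildid_to_gmt(buildid):
    """Convert a buildid like '200808071419' (PDT) to a GMT datetime."""
    return datetime.strptime(buildid, "%Y%m%d%H%M") + PDT_TO_GMT

# Last good and first regressed builds from the CSV data quoted above.
last_good = buildid_to_gmt("200808071419")
first_bad = buildid_to_gmt("200808071621")
print("regression window (GMT):", last_good, "to", first_bad)

# Pushlog timestamps (GMT) from above; a push is a suspect if it falls
# inside the window, widened by a little slop for fuzzy timestamps and
# clock skew.
pushes = {
    "92b4e1f43fbf": datetime(2008, 8, 7, 23, 24, 38),
    "04432668e53e": datetime(2008, 8, 7, 23, 18, 31),
    "eb6fda3be5bb": datetime(2008, 8, 7, 23, 18, 31),
    "f74c28a5138e": datetime(2008, 8, 7, 23, 16, 20),
    "b5295239bdba": datetime(2008, 8, 7, 21, 18, 48),
    "cc82f602d986": datetime(2008, 8, 7, 21, 18, 48),
    "e7c43898c379": datetime(2008, 8, 7, 21, 18, 48),
    "1434eaf5a557": datetime(2008, 8, 7, 21, 16, 24),
}
slop = timedelta(minutes=2)
for rev, when in pushes.items():
    if last_good - slop <= when <= first_bad + slop:
        print("suspect:", rev, when)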

(In reply to comment #5)
> The next 2 are the ones Mossop says backouts (locally?) didn't help with.

I backed them out in mozilla-central but then when the numbers didn't change for nearly a day I relanded them.

> The next 3 changesets are technically outside the regression range, but only by
> a minute (and perhaps as little as 12 seconds). Between our fuzzy timestamps
> and clock skew, it seems quite possible these 3 changesets were not picked up
> in the 200808071419 build, and thus may be the cause of the regression.

If you believe tinderbox (and who wouldn't?) then yes, the 200808071419 build (actually OS X, due to confusing graph server legends) is building changeset 1434eaf5a557
So, I'm stumped.

We've backed out b5295239bdba ("selection details leak") and cc82f602d986 ("nsPersistentProperties::Load"), and Tp hasn't budged. The only parts not backed out are f74c28a5138e (an empty commit?) and e7c43898c379 (which only added 3 NS_ASSERTS).
Depends on: 450506
I've backed out e7c43898c379. Mossop said he looked into f74c28a5138e earlier, and confirmed it's an empty commit. Waiting on IT to do a clobber (bug 450506) in hope that might clear things up.
OS X and Linux have reported post-clobber Tp numbers... Neither went down. Not sure what else to do here.
I've relanded everything that was backed out, and declared the tree reopened.
I tried (unsuccessfully) to follow when the regression was noticed, what changes were/weren't in the regression range, and what changes were backed out. Can you summarise?

(My concern is that, with our fuzzy(!) timestamping, a losing change might have been accidentally excluded from the regression range.)
I noticed the regression on Friday. Over the weekend I backed out what I believed to be the patches in range of the regression:

http://hg.mozilla.org/mozilla-central/index.cgi/rev/04432668e53e
http://hg.mozilla.org/mozilla-central/index.cgi/rev/eb6fda3be5bb

After being backed out for nearly a day and no noticeable change in perf numbers I relanded them.

Yesterday dolske looked more closely at the data and determined that a couple of earlier patches might have been inside the range due to fuzzy timing. So we backed out the following:

http://hg.mozilla.org/mozilla-central/index.cgi/rev/b5295239bdba
http://hg.mozilla.org/mozilla-central/index.cgi/rev/cc82f602d986

Again, neither of these helped. Tinderbox claims that the last good build was building changeset 1434eaf5a557 and the first bad build was building changeset 92b4e1f43fbf. This leaves 3 changesets that we haven't yet tried backing out; however, one of those is empty, one purely touches test suite files, and the last only changes asserts, which should only be relevant in debug builds.

The last good changeset 1434eaf5a557 is purely add-ons manager code and should never impact Tp. The changeset previous to this was landed an hour earlier so I doubt it would be in range. The changeset after the first bad one was an hour and a half later so again I doubt it would be in range.
I wonder when the last clobber prior to the regression was.
It's possible the build system failed to pick up a change.
What does "failed to pick up a change" mean? The tinderbox tells us exactly what revision it is building each time.

Currently the depend tinderboxes never automatically clobber, so it's likely that the previous clobber was long ago, perhaps months.

I object to reopening the tree when we don't have a solution in place to deal with the regression, but I'll start a thread on md.planning to discuss it further; no need to clutter this bug.
In looking back at the performance graphs I think the actual regression might have occurred closer to mid-day than originally postulated. This leads me to suspect bug 428278.
(In reply to comment #14)
> What does "failed to pick up a change" mean? The tinderbox tells us exactly
> what revision it is building each time.

Yes, that's exactly right if we have an empty tree for the build.
But, with old generated files in the tree, sometimes they are not regenerated when they should be because the build system is missing some dependencies.

A change may only take effect later when a different change triggers regeneration of the appropriate files, or when a clobber is done.
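
A toy Python sketch of the failure mode being described here, not the real build system's logic (the file names are hypothetical): if a prerequisite is missing from a target's recorded dependency list, an mtime-based rebuild check never fires for edits to that file, and the stale generated output survives until something else touches a recorded dependency or a clobber happens.

import os

def needs_rebuild(target, deps):
    """make-style check: rebuild if the target is missing or any of the
    listed dependencies is newer than it.  (Toy model only.)"""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_mtime for dep in deps)

# Suppose generated.o is really produced from foo.c *and* foo.h, but the
# recorded dependency list only mentions foo.c (hypothetical names).
recorded_deps = ["foo.c"]            # foo.h is missing from the list
actual_inputs = ["foo.c", "foo.h"]

# After someone edits only foo.h (and generated.o was built before that edit):
#   needs_rebuild("generated.o", recorded_deps) -> False (stale output kept)
#   needs_rebuild("generated.o", actual_inputs) -> True  (what should happen)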

> Currently the depend tinderboxes never automatically clobber, so it's likely
> that the previous clobber was long ago, perhaps months.

If the boxes doing the builds (assuming those are the "depend tinderboxes") never clobber, then that is very unfortunate.
Karl, if you know of specific examples (outside of the NSS tarpit) of missing dependencies, please file them. As far as I know our build system calculates dependencies quite accurately.

The nightly tinderboxes always build from scratch (clobber). The depend ones never do, unless we trigger it manually. There's a bug about doing so automatically, but I'm not sure what its current status is.
(In reply to comment #15)
> In looking back at the performance graphs I think the actual regression might
> have occurred closer to mid-day than originally postulated.

What are you looking at that leads you to believe this?

I pulled the CSV data from the graphs, found the builds with the obvious jumps in Tp, and matched that to the Hg pushlog. (Note that the graph/buildid times are PDT, and the pushlog is GMT -- a 7-hour difference. Err, rather, it was when I did this; bug 448459 has since landed, which puts the pushlog timestamps in PST/PDT.)
(In reply to comment #17)
> Karl, if you know of specific examples (outside of the NSS tarpit) of missing
> dependencies, please file them.

Filed bug 450706 and bug 450707, but they won't be relevant here.  Bugs similar to bug 450651 and bug 392950 are more likely to cause problems with these symptoms.

I imagine that there are always going to be dependencies that are not picked up.
For example, maybe we don't really want to reinstall everything any time that nsinstall.c changes.

The most recent change to configure.in prior to 1434eaf5a557 was 8e6782e9aaed.  Changing configure.in would (should) have caused most things to rebuild.  If missing build dependencies are involved then the next best guess for a range would be 8e6782e9aaed:92b4e1f43fbf.  That's 41 commits over one day.
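
In case it helps, a small Python sketch for enumerating that widened range with hg (the clone path is a placeholder, and "8e6782e9aaed:92b4e1f43fbf" is an ordinary revision-number range, which on a mostly linear mozilla-central history should approximate the pushes between those two changesets):

import subprocess

REPO = "/path/to/mozilla-central"          # placeholder clone path
FIRST, LAST = "8e6782e9aaed", "92b4e1f43fbf"

# List one line per changeset in the widened range.
out = subprocess.run(
    ["hg", "log", "-r", "%s:%s" % (FIRST, LAST),
     "--template", "{node|short} {date|shortdate} {desc|firstline}\n"],
    cwd=REPO, capture_output=True, text=True, check=True,
).stdout

commits = out.splitlines()
print(len(commits), "changesets in the widened range")
for line in commits:
    print(line)
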
(In reply to comment #18)
> (In reply to comment #15)
> > In looking back at the performance graphs I think the actual regression might
> > have occurred closer to mid-day than originally postulated.
> 
> What are you looking at that leads you to believe this?
> 
> I pulled the CSV data from the graphs, found the builds with the obvious jumps
> in Tp, and matched that to the Hg pushlog. (Note that the graph/buildid times
> are PDT, and the pushlog is GMT -- a 7-hour difference. Err, rather, it was
> when I did this; bug 448459 has since landed, which puts the pushlog
> timestamps in PST/PDT.)
> 

The actual raw data is rather noisy, going up one build and down the next. I was interested in looking at the trend, which is easier to visualize from the graph than to determine from the raw data. I looked at several of the talos machines and it appeared to me that the trend began its upward swing at around noon August 7th PDT. The check-in comment in this bug is timestamped 2008-08-07 12:35:01 PDT. However, you are correct in that the pushlog seems to indicate it was checked in much earlier.
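
For what it's worth, a trivial way to see the trend through the build-to-build noise is a small trailing moving average over the per-build Tp numbers. A minimal Python sketch, assuming a hypothetical two-column "buildid,tp" CSV pulled from the graph server:

import csv

def moving_average(values, window=5):
    """Trailing moving average to smooth build-to-build noise."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

# Hypothetical CSV: one "buildid,tp" row per build, e.g. from the graph server.
with open("tp_linux.csv") as f:
    rows = [(buildid, float(tp)) for buildid, tp in csv.reader(f)]

smoothed = moving_average([tp for _, tp in rows])
for (buildid, tp), avg in zip(rows, smoothed):
    print("%s raw=%.1f smoothed=%.1f" % (buildid, tp, avg))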

However, looking at the patch, I would expect it might regress the Tp numbers, as it adds a loop through all children, in the middle of table layout, that was not previously there.
(In reply to comment #20)

> The actual raw data is rather noisy, going up one build and down the next.

The graph from the URL field of this bug has 3 platforms shown; 2 have unambiguous jumps. [The 3rd matches your description, where the trend increased but it's unclear exactly where it went up.]
Depends on: 412347
(In reply to comment #19)
> If missing build dependencies are involved the next best guess for a range
> would be 8e6782e9aaed:92b4e1f43fbf.

I pushed some changesets to the tryserver to make it build code equivalent to
the revisions at each end of this range.

IIUC the tryserver runs Tp3 so the equivalent test on the Firefox tinderbox
showed a 3-4% slip on Mac and Linux and much less on Windows XP.

http://graphs.mozilla.org/#show=911655,911694,912148&sel=1217987170,1218623751

On the tryserver, the differences seen were:

Linux      - 1-2% slip
Mac        - 1% slip
Windows XP - no change

Perhaps the Linux result is just significant, but these differences are much
less than the differences seen on the Firefox tinderbox.

The results are here:

http://tinderbox.mozilla.org/showbuilds.cgi?tree=MozillaTry&maxdate=1219130012&legend=0&norules=1&hours=6

try-cfc9547f663 and try-66f40d3c5af correspond to 8e6782e9aaed at the start of
the range and try-12bbe146e39 corresponds to 92b4e1f43fbf at the end of the
range.
(In reply to comment #22)
> IIUC the tryserver runs Tp3 so the equivalent test on the Firefox tinderbox
> showed a 3-4% slip on Mac and Linux and much less on Windows XP.
> 
> http://graphs.mozilla.org/#show=911655,911694,912148&sel=1217987170,1218623751

Sorry, wrong link there

http://graphs.mozilla.org/#show=911655,911694,912148,53218,61148,146577&sel=1217799104,1218529704

And what I meant was that I guess (from the times) that the tryserver is running the equivalent of the slow test.  (I see that both the "fast" and slow tests are called "Tp3".)
Looking at the link in the last comment, it appears that the Tp regression
also occurred on the 1.9 branch at the same time, correct?
(In reply to comment #24)
Yes, you are right.
The 1.9.0 checkins in that range:
http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2008-08-07+10%3A00&maxdate=2008-08-07+18%3A00&cvsroot=%2Fcvsroot

The only code change is bug 442333, which landed on mozilla-central
2008-07-01 (without regressions AFAICT).

So it seems this bug was not caused by a checkin.
OK. So where do the pages that are loaded by the browser for the Tp test live? Do they live on one server that is referenced by all the build machines, or does each build machine load these pages from a separate server? If this is all loaded from a single webserver, then I would look for an issue there.
Web pages are loaded from local apache installs on each individual test slave.
More than likely caused by bug 432883 and fixed by bug 457582.
Blocks: 432883
Status: NEW → RESOLVED
Closed: 16 years ago
Depends on: 457582
Resolution: --- → FIXED
Mass move of Core:Testing bugs to mozilla.org:ReleaseEngineering. Filter on RelEngMassMove to ignore.
Component: Testing → Release Engineering
Product: Core → mozilla.org
QA Contact: testing → release
Version: Trunk → other
Flags: blocking1.9.1?
Product: mozilla.org → Release Engineering