Closed Bug 381537 Opened 18 years ago Closed 18 years ago

large performance regression on fx-win32-tbox perf

Categories

(Firefox :: General, defect)

x86
Windows XP
defect
Not set
critical

Tracking


RESOLVED WONTFIX

People

(Reporter: sayrer, Unassigned)

Details

(Keywords: perf)

Tp:     582ms      -> 651ms
Tp2:    471.6375ms -> 502.7ms
Tdhtml: 1212ms     -> 1256ms
Txul:   532ms      -> 641ms
Ts:     1890ms     -> 2641ms

Dunno if this is a problem with the box or a real regression. Either way, we should figure it out asap.
Severity: normal → critical
OS: Linux → Windows XP
For comparison, the same box's tests on mozilla1.8, after the same outage (a period when there were no 1.8 checkins):

Tp:     421ms      -> 442ms
Tp2:    343.6625ms -> 391.05ms
Tdhtml: 1225ms     -> 1334ms
Txul:   359ms      -> 375ms
Ts:     1375ms     -> 1968ms
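For what it's worth, here is a back-of-the-envelope comparison of the two sets of deltas. This is a minimal sketch: the before/after values are copied verbatim from the numbers quoted above, and the script is purely illustrative, not part of any harness.

# Percentage change of each tinderbox metric, trunk vs. mozilla1.8,
# using the before/after numbers quoted in the comments above.
metrics = ["Tp", "Tp2", "Tdhtml", "Txul", "Ts"]

trunk = {
    "Tp":     (582.0, 651.0),
    "Tp2":    (471.6375, 502.7),
    "Tdhtml": (1212.0, 1256.0),
    "Txul":   (532.0, 641.0),
    "Ts":     (1890.0, 2641.0),
}

branch_18 = {
    "Tp":     (421.0, 442.0),
    "Tp2":    (343.6625, 391.05),
    "Tdhtml": (1225.0, 1334.0),
    "Txul":   (359.0, 375.0),
    "Ts":     (1375.0, 1968.0),
}

def pct(before, after):
    # Percentage change relative to the pre-outage value.
    return 100.0 * (after - before) / before

for name in metrics:
    print("%-7s trunk %+6.1f%%   1.8 branch %+6.1f%%" %
          (name, pct(*trunk[name]), pct(*branch_18[name])))

On these numbers, Ts jumps by a similar fraction on both branches, while Txul and Tp move noticeably more on trunk.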
Keywords: perf
Note that Tdhtml moved on the Mac yesterday too: 504 ---> 516. All the other numbers remained about constant, however.
I'm going to try rebooting the perf machine. If this helps, I think we should start rebooting between tests (or at least fairly often).
Post-reboot (and a clobber to fix some CVS fail), the numbers still show the regression:

Tp:     642ms
Tp2:    500.0875ms
Tdhtml: 1237ms
Txul:   640ms
Ts:     2031ms

It may still be the box, but the reboot didn't seem to fix it.
This did fix the large Ts jump, though there is still a small one.
Oh, it looks like this was a change on the testing box. We've had yardsticks change all the time -- we don't hold the tree closed for it. If we did, the tree would still be closed from when btek was moved in 2003.
(In reply to comment #7)
> Oh, it looks like this was a change on the testing box. We've had yardsticks
> change all the time -- we don't hold the tree closed for it. If we did, the
> tree would still be closed from when btek was moved in 2003.

This comment doesn't make much sense to me. If the testing box did change, we should back out the patches that went in while it was missing, and re-add them. More than 7000 lines of code went in while it was missing.
checkins to Firefox code in the range:

mrbkap%gmail.com  Mark the overwritten scope property in the space between where we remove it and re-add it in its changed form. bug 381374, r=igor
sdwilsh%shawnwilsher.com  Bustage fix for Bug 380250. (Windows)
jonas%sicking.cc  Bug 380872: Forgot to address bzs review comment to remove this assertion. r/sr=bz
sdwilsh@shawnwilsher.com  Bug 380250 - Convert Download Manager's RDF backend to mozStorage. r=cbiesinger,r=mconnor
mrbkap%gmail.com  Protect the number from GC, even if it was originally a number. bug 375976, r=crowder
masayuki%d-toybox.com  Bug 381426 Can't be activated Input Method in the Bookmark Properties. r+sr=roc
crowder%fiverocks.com  Bug 380998: StackGrowthDirection is not reliable with Sun Studio 11, patch by Ginn Chen <ginn.chen@sun.com>, r=brendan
mrbkap%gmail.com  Don't assume that the parser is still enabled after we've returned to the event loop. bug 380590, r+sr=sicking
jonas%sicking.cc  Bug 380872: Call BindToTree on anonymous children too when BindToTree is called on an element. r/sr=bz
jonas%sicking.cc  Bug 53901: Make sure to also release controllers when unbinding xul elements from the DOM. r/sr=bz
(In reply to comment #10)
> The latter (especially the yellow showing that the tinderbox was interrupted in
> its run on Mozilla1.8) suggests that this isn't a real regression, but a
> configuration change in the machine.
>
> Therefore, I don't think this should hold the tree closed.

I agree that it is more likely not to be a real regression at all. However, we don't know that no regressions occurred on Windows at the same time as the changes on the box (which contributed to the large Ts spike, for sure).
See http://build-graphs.mozilla.org/graph/query.cgi?tbox=bl-bldxp01_head&testname=startup&autoscale=1&size=&units=ms&ltype=&points=&showpoint=2007%3A05%3A22%3A10%3A30%3A58%2C2031&avg=1&days=40 for an example of how we are (or aren't, who can tell?) hiding perf regressions in these periods where bl-bldxp01 spikes after a restart. Either the ceiling was raised 5% by completely unknown and random factors (calling it a "configuration change" seems wrong, since the change consists of killing hung processes and restarting tinderbox), or we had a 5% Ts regression between 04/29 and 05/05. Either way, I was probably wrong to resolve bug 379257.
There've been no configuration changes on the perf machines. We don't install updates, or otherwise make any unannounced modifications. The only thing that's happened is that the machine has rebooted. I think we really need to look into how to get reliable performance numbers, in a way that's reproducible on more than one machine :) That's actually a non-trivial task. I think we should consider rebooting regularly, if rebooting affects the numbers so much.
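If rebooting regularly does turn out to matter that much, here is a minimal sketch of the idea, assuming a Windows box and a placeholder run_perf_cycle() standing in for whatever actually drives one tinderbox run. This is illustration only, not the real harness or its configuration.

import subprocess

def run_perf_cycle():
    # Hypothetical placeholder: the checkout/build/run of Ts, Tp, Txul,
    # Tdhtml would go here; the real tinderbox driver takes its place.
    pass

if __name__ == "__main__":
    run_perf_cycle()
    # Reboot after each cycle so the next run starts from a freshly booted
    # OS state: "-r" restarts, "-t 60" waits 60 seconds first (Windows XP
    # shutdown.exe syntax).
    subprocess.call(["shutdown", "-r", "-t", "60"])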
(In reply to comment #12)
> See
> http://build-graphs.mozilla.org/graph/query.cgi?tbox=bl-bldxp01_head&testname=startup&autoscale=1&size=&units=ms&ltype=&points=&showpoint=2007%3A05%3A22%3A10%3A30%3A58%2C2031&avg=1&days=40
> for an example of how we are (or aren't, who can tell?) hiding perf regressions

(In reply to comment #13)
> I think we really need to look into how to get reliable performance numbers, in
> a way that's reproducible on more than one machine :) That's actually a
> non-trivial task.

I'm sure it is non-trivial. But until it's fixed, we'll face the unpleasant choice of unwittingly piling on performance regressions for 95% of our users, or not getting work done. :(
This seems to have sorted itself out, with the exception of Tp. The tree is reopened now; we will be monitoring Tp to see whether it stabilizes. WORKSFORME?
Do I have the timeline right? rhelmer rebooted it around 8:30, it did two more runs with bad numbers, then something unspecified was done to it between 10:30 and 11:00, and now it's all better? WORKSFORME if that's reproducible, the steps are known, every single on-call IT person knows what they are and how to do them, and every sheriff and likely bug filer (okay, that's just me) knows what to ask IT to do when bl-bldxp01 hangs -- because this isn't an isolated incident at all; it's either the 11th or 12th time since last summer.
There may have been a small performance regression after this large perf regression was remedied by rebooting the box a bunch of times. That's covered in bug 381782.
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → WONTFIX