Closed Bug 468014 Opened 16 years ago Closed 16 years ago

Increase in TS, TP and Tsvg on Vista 1.9.2

Categories

(Release Engineering :: General, defect)

x86
Windows Vista
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: robert.strong.bugs, Unassigned)

Details

After rebooting the Vista 1.9.2 talos boxes there was an increase in TS, TP, and Tsvg. The increase brings these numbers much closer to what is seen on the Vista 1.9.1 talos boxes as well as the XP 1.9.1 / 1.9.2 talos boxes, so the increase reported in this bug is suspect.

For example, Vista TS on 1.9.1 has always reported these higher numbers as can be seen in the following graph.
http://graphs.mozilla.org/#show=787126,787093,787114,1431846,2418771,2418843,2419283,2419774&sel=1227233842,1228443443

The Vista 1.9.2 talos boxes have recently been rebooted again to see if there is another change in the numbers. Depending on the outcome we may want to reboot the Vista 1.9.1 talos boxes to see if they also change.

The reboot of these boxes was done as part of bug 463020.
With standalone talos I did 2 TS runs each on Vista with the nightly build from prior to the increase and with an hourly build from after the increase. Below are the results:

Before 1st Run: 439.11
Before 2nd Run: 461.37
After 1st Run : 458.16
After 2nd Run : 439.16

This further reinforces my belief that the reboots caused the increase.
btw: each run consisted of 20 cycles, and each number I stated in comment #1 is the average of cycles 2 through 20 for that run.
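The comparison above can be sketched in a few lines of Python. This is only an illustration, not part of talos: the function name `run_average` and the variable names are made up for this sketch, and the four per-run figures are the ones quoted in the results above. It assumes each run's figure is the mean of its cycles with the first cycle dropped, as described.

```python
from statistics import mean


def run_average(cycle_times):
    """Mean of cycles 2..N; the first cycle is dropped (hypothetical helper)."""
    if len(cycle_times) < 2:
        raise ValueError("need at least two cycles to drop the first one")
    return mean(cycle_times[1:])


# Per-run averages quoted in this bug (standalone talos, TS on Vista).
before_runs = [439.11, 461.37]  # nightly build from prior to the increase
after_runs = [458.16, 439.16]   # hourly build from after the increase

before_mean = mean(before_runs)
after_mean = mean(after_runs)

# Run-to-run spread within the same build, and the build-to-build difference.
before_spread = max(before_runs) - min(before_runs)
after_spread = max(after_runs) - min(after_runs)
build_delta = after_mean - before_mean

print(f"before mean: {before_mean:.2f}, spread: {before_spread:.2f}")
print(f"after mean:  {after_mean:.2f}, spread: {after_spread:.2f}")
print(f"after - before: {build_delta:.2f}")
```

With these figures the difference between the two builds (about 1.6) is much smaller than the spread between two runs of the same build (about 19 to 22), which is consistent with the builds themselves not being the cause of the increase.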
Based on all of the above findings I've reopened the tree.
With the inconsistencies found with TS, TP3, and Tsvg on the Vista 1.9.2 talos systems I think it *might* be a good idea to reboot the Vista 1.9.1 talos systems to see if they show any significant variance with TS, TP3, or Tsvg after a reboot.
It *might* also be appropriate to run standalone talos on one of the 1.9.2 Vista systems (perhaps just the TS tests) to get an idea of whether there is a significant variance. It would also be a good thing IMO to run it against a build prior to the regression called out in this bug and a build after. In both cases I would prefer it if these builds were placed in a directory where firefox.exe has never existed before (e.g. directories named firefox-before and firefox-after), just in case prefetch is coming into play. This will prevent the use of a pre-existing prefetch file, since the prefetch files appear to be associated by the following: <full path including binary filename>.
Alice, catlee - either of you looking at this?
I did the second set of reboots on the 1.9.2 Vista boxes. The machines seem to be in good order and I don't see a reason to distrust their output.

The jump in numbers post-initial reboot is still worrying, though.
Are we still investigating this, or just living with it?
(In reply to comment #9)
> The reboot of these boxes was done as part of bug 463020
Actually, the reboot work was done in bug#467791#c1 (I had trouble lining up activity in the other bug with bumps in the graph).

Now that we're auto-rebooting Talos machines (details in bug#463020), is there any remaining reason to keep this bug open any longer?
(In reply to comment #9)
> Now that we're auto-rebooting Talos machines (details in bug#463020), is there
> any remaining reason to keep this bug open any longer?

Closing as FIXED-indirectly-by-the-talos-auto-rebooting. However, if you see something like this again, please do reopen.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Just for clarity, I think it is important to note that rebooting the talos machines manually dramatically increased these numbers and that the numbers provided by talos previously for these machines were incorrect.
We've gone to a system where every talos box is auto-rebooted after each test run, so we shouldn't again hit the state where a box has had a long uptime and is reporting unreliable numbers. That doesn't explain the oddness that was observed in this case, but it can give us some confidence for the future.
Agreed... I just wanted a tad more clarity.

As a suggestion, I think it would be a good thing to compare the talos numbers of any new talos machines that are brought online, since these machines exhibited the lower numbers from the moment their numbers were first collected. In this specific instance, comparing the numbers with those from the 1.9.1 machines would have shown a significant discrepancy.
Unfortunately, I did notice the low numbers when the machines initially came online and did a number of reboots to try to get them in line with the moz-central reported numbers. The numbers stayed lower and solid, so I attributed the change to an alteration that we made to the standard talos ref image (we changed from a standard Vista install to a business license).

I was pretty surprised when the numbers shot up to line up with moz-central post reboot, since rebooting was something that I had attempted a couple of times when the machines were first brought online.

It is usual practice to let new talos boxes 'bake' for a short period to ensure that their numbers are trustworthy and usable. That was done in this case.

I really can't say what the misstep here was, but I'm putting all my money on auto-rebooting to help in the future. :)
Component: Release Engineering: Talos → Release Engineering
Product: mozilla.org → Release Engineering