Closed Bug 1525841 Opened 3 years ago Closed 3 years ago

Resident Memory metrics suddenly turned unimodal from bimodal

Categories

(Testing :: General, defect)

x86
Windows 7
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: igoldan, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: perf, regression, Whiteboard: infrastructure)

Perfherder automatically noticed a "regression" from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=f0e33657f074a11e1609533bd812fb800c6a2b8e

So-called "regression":

19% Resident Memory windows7-32 pgo stylo 472,285,465.93 -> 560,020,979.82
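As a quick sanity check on the figure above, the relative change can be recomputed from the two reported values. This is a minimal sketch; `percent_change` is a hypothetical helper name, not part of Perfherder.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Values taken from the alert line above (Resident Memory, windows7-32 pgo stylo).
awsy = percent_change(472_285_465.93, 560_020_979.82)
print(f"{awsy:.1f}%")  # 18.6%, reported above as 19%
```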

This very much resembles an infrastructure update, because:

  • it happened on both integration branches at the same time
  • it cannot be narrowed down to the same commit on both integration branches
  • retriggering older changesets now yields higher values similar to the regression above, no matter how low the initial values were
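The three observations above can be sketched as a simple check. This is only an illustration of the reasoning, not how Perfherder classifies alerts; the function names, the 10% threshold, and the sample values are all assumptions made for the example.

```python
from statistics import median


def retriggers_shifted(original, retriggered, threshold=0.10):
    """True if retriggered runs of an OLD changeset are now clearly higher.

    `threshold` is an assumed cutoff (10% relative increase of the median).
    """
    base = median(original)
    return (median(retriggered) - base) / base > threshold


def looks_like_infra_change(simultaneous_on_all_branches,
                            same_commit_on_all_branches,
                            old_pushes_shifted):
    """Combine the three observations from the list above."""
    return (simultaneous_on_all_branches
            and not same_commit_on_all_branches
            and old_pushes_shifted)


# Illustrative (made-up) resident-memory samples for one old changeset.
before = [470e6, 472e6, 474e6]   # values recorded when it first ran
after = [558e6, 560e6, 561e6]    # values from retriggers after the shift

shifted = retriggers_shifted(before, after)
print(looks_like_infra_change(True, False, shifted))
```

If a code change were responsible, retriggers of older changesets would be expected to stay near their original values, so the last condition is the strongest signal of the three.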

Thus, I'd like to know what change caused this; ideally, someone can point to the actual bug. I want to make sure Perfherder is handling sane data.

Rob, are you aware of any recent platform updates on Windows 7 machines?

Flags: needinfo?(rthijssen)

this shouldn't have happened, but yes, i can see that it has.

taskcluster windows instances are supposed to be pinned to a specific occ sha revision, so that they are not susceptible to changes on the master branch of the occ infra codebase.

the windows 7 ec2 amis in production were deployed on 2018-11-13.

however, the change which introduced sha pinning was deployed on 2018-11-25, after the last windows 7 deployment.

this means that the currently deployed windows 7 amis still take their configuration from the master branch of occ, and as such they will change whenever there are commits to occ master. this can happen frequently and did in fact happen on tuesday, when this regression was noted. this was an oversight on my part, as i hadn't noted that the windows 7 prod amis predated the sha pinning change.
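The timeline above boils down to a date comparison: an ami built before the sha pinning change shipped cannot contain it. A minimal sketch using only the two dates given in this comment; `is_sha_pinned` is a hypothetical helper, and treating a same-day deployment as pinned is an assumption.

```python
from datetime import date

# Date the sha pinning change was deployed, per the comment above.
SHA_PINNING_DEPLOYED = date(2018, 11, 25)


def is_sha_pinned(ami_deployed: date) -> bool:
    """An ami can only carry the pinning change if it was deployed on or
    after the day that change shipped (same-day counts: an assumption)."""
    return ami_deployed >= SHA_PINNING_DEPLOYED


# The production windows 7 amis were deployed on 2018-11-13.
print(is_sha_pinned(date(2018, 11, 13)))  # False
```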

we will need to redeploy windows 7 (and 2012) amis so that they pick up the sha pinning change from late november and are pinned to a deterministic occ sha revision, which will prevent this from happening again.

once all of our windows infra is using the new sha pinned occ configuration, it will be a lot easier to say with certainty whether specific windows worker types have changed or not, and to point to the specific changes in occ commit history (which uses a commit message convention that points to a bug number). as of today, this is only true of windows 10 ec2 instances. we can easily include windows 7 & 2012 ec2 instances as soon as we redeploy those worker types, and there is ongoing work to include windows 10 hardware instances over the coming days.

Flags: needinfo?(rthijssen)

The AWSY alert is this one.

I think this Talos regression is also related, as it's the same platform.

== Change summary for alert #19144 (as of Mon, 04 Feb 2019 14:23:19 GMT) ==

Regressions:

9% tabpaint windows7-32 pgo e10s stylo 51.34 -> 56.06

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=19144

See Also: → 1528156
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX
Whiteboard: infrastructure