:standard8, 5 runs is definitely not enough here. The variability of this metric is at least 10%, so if you are running into this issue, the improvement or regression from your change is probably very small compared to the noise. In this case, you should probably run at least 30 trials for the test on windows7-32.
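As a rough illustration of why 5 runs fall short, here is a minimal sketch of how the uncertainty of a base-vs-new comparison shrinks with the number of trials. It assumes the 10% variability can be read as a relative standard deviation and that trials are independent and roughly normal, which a bi-modal metric will not fully satisfy. The function names are made up for this example and are not part of any perf tooling.

```python
import math


def standard_error_pct(noise_pct: float, runs: int) -> float:
    """Standard error of the mean, in percent, for `runs` independent trials."""
    return noise_pct / math.sqrt(runs)


def comparison_error_pct(noise_pct: float, runs: int) -> float:
    """Standard error of the difference between two means (base vs. new),
    assuming both sides use `runs` trials with the same noise level."""
    return math.sqrt(2) * standard_error_pct(noise_pct, runs)


if __name__ == "__main__":
    # 10.0 is the assumed relative standard deviation, in percent.
    for runs in (5, 30):
        print(f"{runs} runs: +/- {comparison_error_pct(10.0, runs):.2f}%")
```

Under those assumptions, the comparison is uncertain by roughly 6% with 5 runs and roughly 2.6% with 30 runs, so a change of a few percent is hard to tell apart from noise at 5 runs.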
Looking at the one with a 14% regression, there's an outlier in the data that is throwing it off.
That said, even without the outlier it's still a regression. Because those results are contradictory, I'm adding this issue to the fxperftest triage discussion topics, and I wonder if we have anything in the works to help with this.
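To show how a single outlier can inflate a 5-run comparison while a smaller regression remains underneath it, here is a small sketch. The numbers are invented for illustration and are not the values from the try push; `pct_change` is a hypothetical helper.

```python
from statistics import mean, median

# Made-up replicate values, NOT the real data from this bug.
base = [100, 102, 98, 101, 99]
new = [103, 105, 101, 104, 170]  # 170 plays the role of the outlier


def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


# Comparing means lets the single outlier dominate the result.
by_mean = pct_change(mean(base), mean(new))

# The median is far less sensitive to one extreme value.
by_median = pct_change(median(base), median(new))

# Dropping the largest value from the new runs and comparing means again.
new_without_outlier = sorted(new)[:-1]
by_trimmed_mean = pct_change(mean(base), mean(new_without_outlier))

if __name__ == "__main__":
    print(f"mean: {by_mean:.2f}%")
    print(f"median: {by_median:.2f}%")
    print(f"mean without outlier: {by_trimmed_mean:.2f}%")
```

With these invented values the mean reports about a 16.6% regression, while the median and the outlier-free mean report about 4% and 3.25%: still a regression, but a much smaller one.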
I also noticed a regression in the perfherder data starting around Feb. 6th. The metric's value increased and its variability also increased (it became more bimodal) with a change that occurred around that point: https://treeherder.mozilla.org/perf.html#/graphs?highlightAlerts=1&highlightedRevisions=49b9c64323ba&highlightedRevisions=05bb33181530&selected=1922259,1042059648&series=try,1915518,1,1&series=mozilla-central,1941169,1,1&series=autoland,1922259,1,1&timerange=31536000&zoom=1580262151795,1582217411881,537.8701858441731,1530.8610954716917