Closed Bug 840387 Opened 12 years ago Closed 10 years ago

Write up what to do if you suspect a regression

Categories

(Testing :: Talos, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: k0scist, Unassigned)

Details

(Keywords: sheriffing-P2)

As far as I know (though I could be wrong), we have no documentation telling developers (etc.) what to do if they suspect a regression. Bits and pieces are here and there, but it's probably impossible to decipher what to actually do. In fact, while I think I know the procedure, I'm not really sure. Something like:

- (Well, first see if it's noise. It may or may not be. That by itself is worth writing up and is probably complicated. We need to explain the difference between an ad hoc "sure, the plateau is clear" and what you are statistically short-handing.)
- Try to reproduce locally. I've mostly heard positive responses about https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code, apart from some complaints about it being hard to find. Then you have to compare the numbers.
- If you need to compare to a production platform, you'll have to run on try. How do you do this?
- Comparing numbers is friggin complicated. Being a first-principles guy, I'm fine with short-handing the rigor, but I would at least point out what is missing. In any case, something should be written.
- So you've gotten to this step and think it's a real regression. What then? Well, you:
  * comment on the bug relevant to the patch
  * probably talk to someone(s) about this: certainly whoever wrote the code (or, if it is yours, someone else knowledgeable about it), and someone(s) who make decisions about this sort of thing (mbrubeck? not sure)
  * if you've gotten this far and there is still a desire to land the patch (or not to back out the patch if it has stayed landed), mail dev-platform to at least clue people in
- And we should document how Datazilla can be used now (if we're comfortable with that) and what will be possible in the future, ideally pointing to bug #s with some sort of ETA.

Perhaps that is it.
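To make the "is it noise?" step a little more concrete, here is a minimal sketch that compares replicate values from a baseline run and a suspect run using Welch's t-statistic. This is not how Talos or Datazilla actually decide; the function names, the threshold of 3.0, and the "higher is worse" direction (true for timing tests, not for every Talos test) are all assumptions for illustration. Welch's test also assumes independent, roughly normal replicates, which noisy or bimodal Talos data may not satisfy; that is exactly the kind of missing rigor worth spelling out.

```python
import math
from statistics import mean, variance


def welch_t(baseline, candidate):
    """Welch's t-statistic for two independent samples with unequal variances.

    Positive values mean the candidate's mean is higher than the baseline's.
    """
    if len(baseline) < 2 or len(candidate) < 2:
        raise ValueError("need at least two replicates on each side")
    # Squared standard error of each mean: sample variance / sample size.
    se_b = variance(baseline) / len(baseline)
    se_c = variance(candidate) / len(candidate)
    se = math.sqrt(se_b + se_c)
    if se == 0:
        raise ValueError("both samples have zero variance")
    return (mean(candidate) - mean(baseline)) / se


def looks_like_regression(baseline, candidate, threshold=3.0):
    # Hypothetical rule of thumb: assumes higher values are worse (e.g. ms)
    # and flags only an increase whose t-statistic exceeds the threshold.
    return welch_t(baseline, candidate) > threshold


# Hypothetical replicate values in ms, not real Talos results.
baseline = [100, 101, 99, 100, 102]
candidate = [110, 111, 109, 112, 110]
print(looks_like_regression(baseline, candidate))  # → True
```

A fixed threshold on a single statistic is the simplest possible choice; a real write-up would also need to cover how many replicates to collect and how to handle multiple tests being compared at once.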
This bug was prompted by an exchange between spohl and me at http://logbot.glob.com.au/?c=mozilla%23ateam&s=11%20Feb%202013&e=11%20Feb%202013#c546086. In general, I believe (though I could be wrong) that having all of this written down, so that when someone asks we can point to a URL with a good how-to, would be a comfort to both the person asking and the person answering. I realize this rough outline could be the start of some documentation, but it is a napkin sketch I did in about 10 minutes. I don't have the time right now to turn it into a wiki page that is readable by human beings, and for that I apologize.
Keywords: sheriffing-P2
I think the single most useful thing we can do would be to document the nitty-gritty of how to go about bisecting a performance regression on try usefully (i.e. without destroying our infrastructure). Right now, that is the best way to do this kind of work. Running locally is mostly an exercise in futility unless you have a very large jump. Remember that the machines we run on in production are 50-100 times slower than a dev machine. So trying to reproduce locally really won't work well unless we have a huge regression, and if we have a huge regression, we probably already have a smoking gun. In reality, we need better tools for this, and this is what Bisect in the cloud is actually *for*. Beyond writing up how to use try to bisect this, I really think we should throw our resources toward completing Bisect in the cloud so that it can be used for performance regressions, and then make it the standard way to do this kind of regression hunting.
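Since every bisection step on try costs a full set of test runs, the number of steps matters. Here is a minimal sketch of the bisection loop itself, assuming an ordered list of pushes where the first is known good and the last is known bad, and that the regression persists once introduced. The `is_regressed` callback is hypothetical: in practice it would stand for pushing that revision to try, collecting enough replicates, and deciding whether the numbers show the regression. This is not the Bisect in the cloud implementation.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def bisect_regression(pushes: Sequence[T], is_regressed: Callable[[T], bool]) -> T:
    """Return the first push for which is_regressed() is True.

    Assumes pushes[0] is known good, pushes[-1] is known bad, and that
    every push after the culprit is also regressed (monotonic), the same
    assumption hg/git bisect make.
    """
    if len(pushes) < 2:
        raise ValueError("need at least a known-good and a known-bad push")
    good, bad = 0, len(pushes) - 1
    # Invariant: pushes[good] is good, pushes[bad] is bad.
    while bad - good > 1:
        mid = (good + bad) // 2
        # Each call here is expensive: one round of try runs (hypothetical).
        if is_regressed(pushes[mid]):
            bad = mid
        else:
            good = mid
    return pushes[bad]
```

Binary search needs only about log2(N) evaluations for N pushes, so a range of 10 pushes takes at most 4 rounds on try instead of up to 8 when testing each intermediate push in turn, which is the main way to avoid swamping the infrastructure.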
We have templates for filing regression bugs as well as good documentation on the wiki.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED