Closed Bug 990644 Opened 6 years ago Closed 2 years ago

evaluate our process and criteria for managing talos alerts

Categories

(Testing :: Talos, defect)


Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: jmaher, Unassigned)

Details

After discussing with :avih on IRC today, it has become clear that we have plenty of room for improvement in our overall process for tracking performance regressions and improvements.

I want to use this bug to discuss what works and what doesn't.  We can try new things, but I need ideas.

Right now (as of firefox 31), this is the plan of attack:
* all alerts generated from the talos analysis should be evaluated
* for all regressions found, we will attempt to find the offending bug/changeset and reopen/comment in the bug
* in rare cases we will open a new bug
* all regression bugs will be blocking a tracking bug for the specific version (right now it is bug 990085)
* on merge day we will check all the regressions/improvements against what is on the tracking bug

This is a large improvement over the previous process of filing bugs and letting them linger.

Let's keep the discussion in this bug focused on what the process should be and what would be friendlier to developers and project managers, with the end goal of removing all regressions by the time we ship.
Just a quick recap:

Traditionally, we'd mostly rely on the sheriffs to back out patches which caused big regressions, or on automatic emails from the dev.tree-management list, which are sent to the owners of suspected offending patches, who choose what to do about them.

Joel found out that some of these notifications go seemingly unhandled, and decided to explicitly track regressions in a new-bug-per-regression fashion.

This creates a lot of (probably legitimate) bugs, but also an overhead in managing them, connecting them to the bug owners, and in general acting upon them.

The system suggested in this bug is that instead of new bug per regression, we'll re-open the potentially offending bugs and provide a regression summary.

From there, the owner would need to take an explicit and visible action regarding the regression. Closing the bug again with a short one-sentence explanation of why it's not worth pursuing is absolutely legitimate.

What we gain is visible awareness of performance. Hopefully this will also allow fewer regressions to fall through the cracks, and allow more visible decisions on why some regressions were acceptable.
I really think this discussion should happen on the newsgroups
(In reply to Avi Halachmi (:avih) from comment #1)
> Just a quick recap:
> 
> Traditionally, we'd mostly rely on the sheriffs to back out patches which
> caused big regressions, or on automatic emails from the dev.tree-management
> list which are sent to owners of suspected offending patches, which choose
> what to do about them.
> 
> Joel found out that some of these notifications go seemingly unhandled, and
> decided to explicitly track regressions in a new-bug-per-regression fashion.

The sheriffs only handle regressions that are displayed in TBPL, so 'some of these notifications' is really 'all' from a sheriff standpoint, since talos perf regressions (as opposed to talos job infra issues/crashes etc) aren't visible in TBPL (we need bug 824812 for that). That said, Joel has been a superhero and been tracking the regressions via dev.tree-management recently.

Re new bug vs reopening the old - I completely agree that reopening is more likely to get a response/action, so I think that's best going forwards :-)
I was once scolded for re-opening a bug for a perf regression. As I recall, the objection was that re-opening could be misinterpreted as a back-out and/or that it confused the fixed-in-version annotations. I suppose if a fix is landed in version N, causes a perf regression, and the regression is fixed in version N+1, there is room for confusion. From this perspective, I think new-bug-per-regression is a superior solution: One bug provides a clear history and current status of restoring functionality and another provides a clear history and current status of the perf regression. (But I also agree that new-bug-per-regression has disadvantages.)
thanks gbrown- I believe your experience there is why I started new bugs for each regression.  I am keeping track of the bugs found and what works/doesn't so I have more data to back this up.

Right now it seems that reopening a bug is effective in 75% of the cases.
The re-open is for keeping the perf notice at the same context as the possibly offending patch, and to allow higher visibility of it for those who are involved with the bug.

Once a decision was taken, be it considering it irrelevant, or not worth the time, or opening a new bug to address the regression, it can be closed by the bug owner.

It's not meant to imply that a backout is required. And it sounds like a good idea to include it with the message format for such notices.

Joel has been doing the one-bug-per-regression thing for the past month or so, and the overhead is meaningful on one hand, and it's not visible enough to those involved with the possibly offending bug on the other.

So, it's probably worth a try IMO. If it backfires or proves otherwise inconvenient, then we'll try to think of yet another approach to keep perf issues more accessible than they currently are.
Another approach, if re-opening messes up some automatic handling, would be to post the regression info without re-opening, and needinfo? the owner of the bug.

Does that sound better?
I think we should stick with the approach in place, but keep a close eye on whether folks are upset about commenting in existing bugs.  I can see this as a point in the process where we can make improvements in a cycle or two.
Please don't invent new bug life cycle rules here. As a rule of thumb, if a bug's patch lands, it gets resolved as fixed. If the patch gets backed out or turns out to be ineffective, the bug may be reopened.

We have a policy that patches get backed out for causing performance regressions. AFAIK you don't need to be a peer or sheriff or the bug owner to do that, so I guess you should feel free to back out stuff according to that policy, at which point reopening the bug will be the right thing to do.

Otherwise, it seems like you should handle this like any other regression, i.e. file a bug on it, make it block the original bug and set the tracking-firefox31 flag. It seems like you've kind of re-invented this with bug 990085, which doesn't make much sense to me.
Dao, the previous lifecycle didn't include a new bug for every regression. Usually bugs were filed for regressions which the owner had intention to handle.

The goal here is to raise regression visibility by doing _something_ about every regression, even those which will not get handled, in order to get a visible decision on how each regression was handled.

On those small-ish regressions, the decision of whether to back out, file a followup bug or do nothing cannot be taken by the sheriffs or joel or anyone outside the people who know the context well.

Filing a new bug for every 2-4% regression turns out to create a lot of overhead (and less visibility from the original bug), so we're trying approaches to improve it. One of those was to reopen the bugs with the regression info, and another was to just needinfo the owner on what should be done about it.

We're still in the process of trying different approaches.
(In reply to Avi Halachmi (:avih) from comment #10)
> Dao, the previous lifecycle didn't include a new bug for every regression.

It generally did and does, except for regressions that would cause a patch to be backed out right away.

> Usually bugs were filed for regressions which the owner had intention to
> handle.

No, the bug first needs to be filed, and then the owner is responsible for taking care of it. Taking care of a bug can mean explaining why it's ok not to worry about it, at which point the bug should probably be marked as WONTFIX (but nevertheless remain in the offending bug's dependency list).

> The goal here is to raise regressions visibility by doing _something_ about
> every regression, even those which will not get handled, in order to get a
> visible decision on how the regression got handled.

Backing out the offending patch or filing a bug on the regression is the common way to do that. I don't see why talos regressions should be different from any other regressions.

> On those small-ish regressions, the decision of whether to back out, file a
> followup bug or do nothing cannot be taken by the sheriffs or joel or anyone
> outside the people which know the context well.

You may not feel confident that a backout is warranted, but there's no reason why you should shy away from filing a bug. Again, this isn't different from non-talos regressions that get filed by nightly testers, QA people or random users.

> Filing a new bug for every 2-3-4% regressions turns out to create a lot of
> overhead

What kind of overhead? Bugs are cheap and filing a new one doesn't create more overhead than reopening one.

> (and less visibility from the original bug),

What kind of visibility are you looking for that you wouldn't get from a new bug blocking the offending bug and with the right tracking flags set? What's special about talos regressions compared to other regressions that you need that kind of visibility?

> approaches to improve it. One of those was to reopen the bugs with the
> regression info, and another one was to just needinfo the owner on what
> should be done about it.

There's no guarantee that a needinfo request will be answered in time. Filing a bug gives you the option to set tracking flags such that it doesn't get lost.
(In reply to Dão Gottwald [:dao] from comment #11)
> (In reply to Avi Halachmi (:avih) from comment #10)
> > Dao, the previous lifecycle didn't include a new bug for every regression.
> 
> It generally did and does...

Not according to comment 0.


> ... and then the owner is responsible for
> taking care of it. Taking care of a bug can mean explaining why it's ok not
> to worry about it, at which point the bug should probably be marked as
> WONTFIX...

We're in agreement here.


> I don't see why talos regressions should be different
> from any other regressions.

The main difference is that talos regressions are not necessarily failures. It's easy to decide to back out patches which fail some tests, but 2-5% or even 20% regressions can be acceptable, depending on context and alternatives. Talos regressions need judgment calls to handle them, and the best context for these calls is the offending bug itself and the people on its CC list.


> You may not feel confident that a backout is warranted, but there's no
> reason why you should shy away from filing a bug.

We're talking about a very different number of bugs than what we used to have. If you follow dev.tree-management, you know that the volume of regressions there is far bigger than the number of bugs which are filed for regressions. This is what this bug tries to cover, and the volume is bigger than before.

However, I don't have exact numbers. The statement about the different volumes is based on a hunch from what I see and on Joel's inspection.


> > (and less visibility from the original bug),
> 
> What kind of visibility are you looking for that you wouldn't get from a new
> bug blocking the offending bug and with the right tracking flags set? What's
> special about talos regressions compared to other regressions that you need
> that kind of visibility?

I'm hoping that the people on CC of the offending bug will get bugmail with the regression info and the decision taken, and possibly offer their own opinion as well if they see such need. Typically and hopefully these would be only 2 extra messages for regressions: the regression info and the resolution regarding it.

The only way to get that with new bug per regression would be to duplicate the original bug with its CC list, and I don't think many would appreciate it.


> There's no guarantee that a needinfo request will be answered in time.
> Filing a bug gives you the option to set tracking flags such that it doesn't
> get lost.

Nothing is guaranteed, and regardless, this proposed system is not strictly about timing, but rather about more visible info and resolution.

Apparently many regressions fall through the cracks. We want to handle them explicitly, and this just adds volume of actions which were not taken before. The question right now is where should these actions take place to get the best benefit.

Of course, questioning the need to explicitly follow each regression from dev.tree-management in bugzilla is also legitimate.
(In reply to Avi Halachmi (:avih) from comment #12)
> (In reply to Dão Gottwald [:dao] from comment #11)
> > (In reply to Avi Halachmi (:avih) from comment #10)
> > > Dao, the previous lifecycle didn't include a new bug for every regression.
> > 
> > It generally did and does...
> 
> Not according to comment 0.

I meant regressions in general. I.e. if a patch lands and causes a problem, it should be backed out or a bug should be filed. It seems like what's reported as a talos regression often isn't taken seriously by anyone, if it gets looked at at all, but that doesn't falsify that there is a general regression-handling workflow.

> > I don't see why talos regressions should be different
> > from any other regressions.
> 
> The main difference is that talos regressions are not necessarily failures.
> It's easy to decide to backout patches which fail some tests, but 2-5% or
> even 20% regressions can be acceptable, depending on context and
> alternatives. Talos regressions need judgment calls to handle them, and the
> best context for these calls is the offending bug itself and the people on
> its CC list.

This is true for all kinds of regressions. We resolve non-talos regression bugs as invalid or wontfix all the time.

> > You may not feel confident that a backout is warranted, but there's no
> > reason why you should shy away from filing a bug.
> 
> We're talking about a very different numbers of bugs than what we used to
> have. If you follow dev.tree-management, you know that the volume of
> regressions there is far bigger than the number of bugs which are filed for
> regressions. This is what this bug tries to cover, and the volume is bigger
> than before.

Again, bugs are cheap. There's no bug id shortage or a limit to dependency list sizes. Are you concerned about the amount of bugmail generated by all those bugs? You wouldn't get less bugmail when reopening bugs, in fact you would get more bugmail if several back-and-forth comments are sent to all people on the CC list.

> I'm hoping that the people on CC of the offending bug will get bugmail with
> the regression info and the decision taken, and possibly offer their own
> opinion as well if they see such need. Typically and hopefully these would
> be only 2 extra messages for regressions: the regression info and the
> resolution regarding it.
> 
> The only way to get that with new bug per regression would be to duplicate
> the original bug with its CC list, and I don't think many would appreciate
> it.

You could say that about any regression. Why do we generally not handle regressions as proposed here? Because:

* reopening bugs that are effectively still landed is confusing
* if you don't reopen the bug your regression report may be lost
* it gets more messy if a bug causes multiple regressions
* being on the CC list doesn't mean you want to be involved with every followup activity

> > There's no guarantee that a needinfo request will be answered in time.
> > Filing a bug gives you the option to set tracking flags such that it doesn't
> > get lost.
> 
> Nothing is guaranteed, and regardless, this proposed system is not strictly
> about timing, but rather about more visible info and resolution.

It's about timing as much as any other regression, since once enough time has passed we'll ship regressions to users.

> Apparently many regressions fall through the cracks. We want to handle them
> explicitly, and this just adds volume of actions which were not taken
> before. The question right now is where should these actions take place to
> get the best benefit.
> 
> Of course, questioning the need to explicitly follow each regression from
> dev.tree-management in bugzilla is also legitimate.

Also a good question. Can we trust these messages more often than not? Is it reasonable to expect bug owners to figure this out? Is this what makes you uneasy about filing bugs on these regressions? Then again, reopening a bug isn't a more lightweight action than filing a new one...
I agree with Dao here.  The right way to track any regression (including performance ones) is to file a new bug, mark it as tracking-firefoxNN?, and also mark it to block the culprit bug.  The tracking flag is how you get visibility into those bugs.  Reopening a bug means something very specific (that the patch is backed out) and overloading that to mean other things for Talos regression doesn't seem valuable.
Lawrence and I had a conversation about this. He suggested that release management should track regressions and not let them ride the trains.

I think the Dao/Ehsan suggestion is reasonable for tracking these.

Lawrence, can you suggest a process that would work for release management here?
Flags: needinfo?(lmandel)
I will start using the tracking flags for all bugs- now that we have more data and a more complete picture (bugs filed for all detected regressions) we can do more actionable items.
I would start with exactly what Dao and Ehsan suggested. Nom each regression bug for tracking-firefoxNN? Noming the bug means that the release manager overseeing the release will review the request and follow up with the appropriate team in order to get to a decision and action.

cc Lukas and Sylvestre for a heads up and any additional input.
Flags: needinfo?(lmandel)
Release Management can track regressions, but there would be a few criteria as we move along the trains.  When we get to Beta, the bugs have to have an assignee, and by the middle of the Beta cycle no more speculative fixes can land, so backouts would be the only remaining option. If we can't back out for some reason, the bugs would then be untracked, as there would be no more action remaining on them.  It sounds like that aligns with some of the comments in this bug: even if no action can be taken, we have data on the regressions and what is done about them.
Thanks Lukas and Lawrence.  Let's use tracking flags for the next release or two and see how it works out.
I think there's another thing we should consider while requesting explicit attention to each and every performance regression:

Since this is a deviation from our past behavior and regressions are now more visible, it could require more developer attention, and therefore maybe implicitly shift more development resources towards handling regressions.

I think that if such shift happens, it should be explicit rather than implicit (i.e. not just as a result of filing more regressions bugs because we now look closer at them).

I.e. I don't think it's a good thing if features and enhancements get fewer resources without us explicitly meaning to take resources from them and direct them towards regression handling.

Until now there was a de-facto balance where some regressions were handled and some were not. We don't _know_ that this balance was a bad one, because we didn't track all regressions and didn't try to evaluate how many were dismissed "wrongly".

I think it's very possible that the balance was good, i.e. devs acted responsibly, reacted to regressions when they felt they required attention, or dismissed them for valid reasons (e.g. priorities, impact, etc.) which were weighed with some good context.

It's possible that as a result of this new process, resources might shift implicitly, and if that happens, we should be aware of it.

So please post your concerns here if you feel this system creates more burden/noise than you think it should. We'll have to take that into account.
Flags: needinfo?(lmandel)
Avi, I think you're attributing too much intention to the way that we have traditionally (mis-)handled Talos regressions.  In my viewpoint, they're like any other regression, so filing and tracking them makes sense to me.  That doesn't mean that developers will need to fix all of the regression bugs that people file, just like today.  It will just help us at least be aware of the ones that we're not fixing until we release, and will hopefully help document the reasons (or the lack of) why that is.

I think the concern over the shift on what developers spend their time on is theoretical, and I'm not sure if filing bugs is going to change people's behavior in itself.  So I'm not too worried about the concern raised in comment 20.
I'm also not concerned about using up too much dev time on this. First, devs should be aware of the regressions caused by their landings so that they can make their own decisions. Second, tracking is first and foremost a cue to have a conversation about one or more bugs. Release management doesn't dictate what will be fixed; we serve as an advocate with an eye to overall release health.
Flags: needinfo?(lmandel)
Some feedback on how this works so far:

In bug 987875, bug 993619, bug 994683 and bug 994692, benjamin is (rightfully) lost. We say it regressed and we ask tracking?, but he has no context to decide whether we need to track or not. I explained in bug 994692 comment 3, but we still need a solution.

I think the bug owner should first provide some info which is required before deciding to track or not, and then relman can make that decision.

Or, maybe we should do 2 kinds of trackings: one which only tracks regressions per release, and one which tracks regressions which are important to us/relman.

Not sure of the details yet, but suggestions are welcome.
(In reply to comment #23)
> Some feedback on how this works so far:
> 
> In bug 987875, bug 993619, bug 994683 and bug 994692 - benjamin is (rightfully)
> lost. We say it regressed, we ask tracking? , but he got no context to decide
> if we need to track or not. I explained in bug 994692 comment 3, but we still
> need a solution.
> 
> I think the bug owner should first provide some info which is required before
> deciding to track or not, and then relman can make that decision.

Yes, this is what needs to happen.

Take this example of a non-perf regression: someone changes something in, let's say, DOM, and then a few days later someone files a bug against Nightly saying a button in a website is broken in that clicking on it doesn't show you the cat picture you'd normally see.  As that is a Web facing regression, we probably want to track it, so the right thing to do would be to tracking? it.  But relman might not have enough data to be able to make a call unless someone does some investigation to prove that the regression is real and actionable.

> Or, maybe we should do 2 kinds of trackings: one which only tracks regressions
> per release, and one which tracks regressions which are important to us/relman.

I don't really understand the distinction.  If the regression is somehow not something which is important (let's say in a faulty Talos test or something like that) then we don't need to track it by definition.  If you want to maintain a list of *all* Talos regressions no matter why they're important or not, that list should not be maintained through tracking bugs, as those regressions should not affect our releases (as they're not important).  If you need to track those, you can invent another way (tracking them in a spreadsheet outside Bugzilla, using a whiteboard flag, using Bugzilla user tags, etc.).
(In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #24)
> > Or, maybe we should do 2 kinds of trackings: one which only tracks regressions
> > per release, and one which tracks regressions which are important to us/relman.
> 
> I don't really understand the distinction.  If the regression is somehow not
> something which is important (let's say in a faulty Talos test or something
> like that) then we don't need to track it by definition...

The distinction is:

1. All talos regression bugs which we file for a release.

2. Those which are flagged tracking? (or maybe tracking+).

Because not all the bugs from 1 would end up in 2, but we still want to keep the ones in 1 accessible for further analysis, etc.
(In reply to comment #25)
> (In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment
> #24)
> > > Or, maybe we should do 2 kinds of trackings: one which only tracks regressions
> > > per release, and one which tracks regressions which are important to us/relman.
> > 
> > I don't really understand the distinction.  If the regression is somehow not
> > something which is important (let's say in a faulty Talos test or something
> > like that) then we don't need to track it by definition...
> 
> The distinction is:
> 
> 1. All talos regression bugs which we file for a release.
> 
> 2. Those which are flagged tracking? (or maybe tracking+).
> 
> Because not all the bugs from 1 would end up on 2, but we still want to keep
> the ones on 1 accessible for further analysis etc.

OK, I see, thanks.  Based on the above, I think you can achieve this by only nominating the important bugs for tracking, and just keeping the others on file, after having CCed the patch author, etc.  Is that correct?
Yes, and joel does that already (e.g. all the bugs mentioned on comment 23 block bug 990085). It was a refinement for:

(In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #14)
> I agree with Dao here.  The right way to track any regression (including
> performance ones) is to file a new bug, mark it as tracking-firefoxNN?, and
> also mark it to block the culprit bug.  The tracking flag is how you get
> visibility into those bugs. ...

Where not all filed regression bugs should automatically be flagged for tracking, since there's a (hopefully finite and short) period where we know of the regression but not enough of its context for relman to be able to decide on tracking.
Recap of how we'll try to process it:

1. A regression bug is filed, which hopefully can point accurately enough at the offending bug.

2. A process to gather more info before flagging tracking? . Typically this would be needinfo? the bug owner to provide some context, but this doesn't always lead quickly to actionable info, e.g. "I'll look into it", "I don't understand how my patch regressed this", the impact is unclear, etc.

3. If/when step 2 ends up with useful info, we'll flag tracking? and then relman can decide if they want to track it or not.

Sounds good?
To clarify tracking from a relman perspective:
1. not all bugs that are nomed will be tracked
2. tracking a bug does not guarantee that it will be fixed in a given release but does guarantee that someone from relman will review the bug and will likely follow up with the engineer or team to discuss the impact and importance

In general, relman tracks bugs that we think are important for a release. Some of the bugs will drop their tracking flags over time as more is learned. While I appreciate the effort to get as much information as possible about a bug before tracking, I would rather know as early as possible about a potential issue. Knowing that Talos is reporting a 10% regression in some measure that we care about is useful even if we don't yet know the cause. relman can help ensure that attention is given to determine the cause (could be that this is a Talos issue and not a true regression) and whether we can and will fix the issue in a specific release.
(In reply to Lawrence Mandel [:lmandel] from comment #29)
> While I appreciate the effort to get as much information as
> possible about a bug before tracking, I would rather know as early as
> possible about a potential issue.

If I understand correctly, you prefer that we ask for tracking as soon as we file the bug? Any threshold of any kind? e.g. only for regressions of NN% or more? Only on some tests? Or just any regression bug which gets filed?

What about this?

(In reply to Avi Halachmi (:avih) from comment #23)
> In bug 987875, bug 993619, bug 994683 and bug 994692 - benjamin is
> (rightfully) lost. We say it regressed, we ask tracking? , but he got no
> context to decide if we need to track or not.

And we also usually don't have the context or knowledge. We have to wait for the offending bug owner for that, and hopefully, through the discussion, the context emerges.
Flags: needinfo?(lmandel)
(In reply to Avi Halachmi (:avih) from comment #30)
> (In reply to Lawrence Mandel [:lmandel] from comment #29)
> > While I appreciate the effort to get as much information as
> > possible about a bug before tracking, I would rather know as early as
> > possible about a potential issue.
> 
> If I understand correctly, you prefer that we ask for tracking as soon as we
> file the bug? Any threshold of any kind? e.g. only for NN% regression or
> more? only on some tests? Or just any regression bug which get filed?

There should probably be a threshold, even if it is a soft one. For regressions that you consider minor, we probably don't need to be notified. Anything that hits the major category should be flagged. I'm not sure how to quantify minor and major in terms of % at this point and can use some guidance here. (We will also learn over time what level of regression we should care about.)
> 
> What about this?
> 
> (In reply to Avi Halachmi (:avih) from comment #23)
> > In bug 987875, bug 993619, bug 994683 and bug 994692 - benjamin is
> > (rightfully) lost. We say it regressed, we ask tracking? , but he got no
> > context to decide if we need to track or not.
> 
> And we also usually don't have context or knowledge. We have to wait for the
> offending bug owner for that, and hopefully though the discussion, the
> context emerges.

What I was trying to say earlier is that from a release perspective (and tracking perspective) I care most about the impact to the release. If we know there is a major regression (however we define major), the fact that I don't know the cause or the next steps doesn't matter. relman will help find an owner and drive to the information that's needed to make a decision.
Flags: needinfo?(lmandel)
There's a loop here. We can't know/assess if it's major before the bug owner expresses her opinion. This can take time, but you want us to flag for tracking ASAP if it's major, but we can't know before etc etc etc.
To make it clear, performance regressions never had hard thresholds beyond which we stated "this should be fixed". Naturally, we want to fix all performance regressions. But some are bigger than others, some are easier to fix than others, etc. The threshold doesn't exist, and it's highly context dependent.

The decision of whether to handle the regression or not (at least those which I was involved with) is a judgment call based on understanding the context (load on the team which should handle it, various assessments, dependencies, assessed effort, risks, etc).

This is a lengthy process, or at the very least quite involved. It's a discussion (on Bugzilla or IRC) which ends in a judgment call that hopefully all those involved agree on. This process must involve the offending bug owner, and many times her team as well.

We don't have a process for that. We make (hopefully good) calls based on common sense and understandings between those involved.
Also, performance regressions which users will notice are the minority, IMO.

Many regressions in the 2-10% range, or even 20% or 50% in some cases, are not something most users can notice. The nature of most performance regressions is creeping degradation - not breakage - and they're therefore mostly unnoticeable individually.
Let's just say we won't flag regressions for tracking until the patch author has replied, unless the regression is obvious* or >= 10%

* "obvious" = regression is clearly caused by the linked bug, definitely degrades performance, and definitely needs to be fixed (author unlikely to argue that benefits outweigh costs)
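The proposed rule above could be sketched as a tiny predicate. This is purely illustrative (the function name and signature are hypothetical, not part of any Talos tooling); only the 10% figure and the "obvious" definition come from this comment:

```python
def flag_immediately(pct_regression, is_obvious):
    """Decide whether to flag a regression for release tracking right away,
    before the patch author has replied.

    pct_regression: regression size in percent (e.g. 12 for 12%).
    is_obvious: clearly caused by the linked bug, definitely degrades
    performance, and definitely needs to be fixed.
    """
    return is_obvious or pct_regression >= 10
```

Anything below the bar would simply wait for the patch author's reply before a tracking decision is made.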
(In reply to Avi Halachmi (:avih) from comment #33)
> To make it clear, performance regression never had hard thresholds beyond
> which we stated "this should be fixed".

That's not entirely true, Ts IIRC has a 3% threshold beyond which anything in the range is supposed to be backed out.  I enforced that for a while, and I did back stuff out because of it.  I'm not sure if anyone else has enforced it as aggressively as I did (or at all) since I stopped paying attention to these regressions.

(In reply to Vladan Djeric (:vladan) from comment #35)
> Let's just say we won't flag regressions for tracking until the patch author
> has replied, unless the regression is obvious* or >= 10%

FWIW back when we were discussing the Ts threshold one point which was brought up is that different Talos test suites have different "normal" ranges for regressions.  The threshold should probably depend on the test.
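The per-test threshold idea could look something like the following. This is a sketch only: the 3% Ts figure comes from the comments above, while the default value, the table contents, and the function itself are made-up placeholders rather than actual Talos policy:

```python
# Per-test regression thresholds, in percent. Only the 3% Ts figure is
# from this thread; the default is an illustrative placeholder.
THRESHOLDS = {"ts": 3.0}
DEFAULT_THRESHOLD = 10.0

def exceeds_threshold(test_name, pct_regression):
    """True if the regression crosses the bar for this particular test."""
    return pct_regression >= THRESHOLDS.get(test_name.lower(), DEFAULT_THRESHOLD)
```

For example, a 4% Ts regression would cross the bar while a 4% regression on a noisier suite with a 10% default would not.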
(In reply to Vladan Djeric (:vladan) from comment #35)
> Let's just say we won't flag regressions for tracking until the patch author
> has replied, unless the regression is obvious* or >= 10%
> 
> * "obvious" = regression is clearly caused by the linked bug, definitely
> degrades performance, and definitely needs to be fixed (author unlikely to
> argue that benefits outweigh costs)

What does "definitely degrades performance" mean here? If there are talos tests that report regressions without there being an actual performance regression, we should fix/disable these tests.
(In reply to Dão Gottwald [:dao] from comment #37)
> What does "definitely degrades performance" mean here? If there are talos
> tests that report regressions without there being an actual performance
> regression, we should fix/disable these tests.

As far as I know, the tests are good, though obviously everything can be improved. This quoted part is fuzzy for me as well.
If we are not going to take regressions seriously, then we shouldn't be running the tests. So if there is a platform or test that we always ignore, let's fix or disable it.

That said, a 2% regression which is hard to fix might be worth accepting. My goal is to raise visibility of the regression so we can make the decision "this feature is worth the performance tradeoff", even if it is a 2% regression.
(In reply to comment #39)
> That said a 2% regression which is hard to fix might be worth accepting.  My
> goal is to raise visibility of the regression so we can make the decision "this
> feature is worth the performance tradeoff", even if it is a 2% regression.

Who would be the person to make that call?  Is it the developer of the feature or someone else?
probably a combination of the developer, release manager, and someone else, say vladan or avi.  I am sure others have ideas as well.
We still have many open issues with following up on regressions. I'd like to give a concrete example.

Bug 998454: a regression which appeared on Firefox 29 a few days ago due to a late uplift. Some investigation took place, and today it has come to my table to decide what to do about it.

It's a performance regression which we don't like (a 12% tab animation regression on Windows 8), but it's far too late to do anything about it now (and probably also a few days ago when it first showed up, although back then we still weren't sure of the offending bug). Since it's not breaking anything, I recommended leaving it be.

My question is: from a relman perspective, what would have been the best way to handle this case in the first place? How would you have liked us to interact with you?
(In reply to comment #42)
> My question is: what would have been the best way to handle this case in the
> first place from relman perspective? How would you have liked it to interact
> with you?

I think this is an example of why we need to detect and act on regressions as soon as possible.  This seems to not have affected trunk, but did we catch it on Aurora?  If yes, I'd look into why it fell through the cracks on Aurora.
In this specific case we landed a change on aurora/beta and it was reported as fast as possible; I started bisecting the day the alert came in.
(In reply to Dão Gottwald [:dao] from comment #37)
> (In reply to Vladan Djeric (:vladan) from comment #35)
> > * "obvious" = regression is clearly caused by the linked bug, definitely
> > degrades performance, and definitely needs to be fixed (author unlikely to
> > argue that benefits outweigh costs)
> 
> What does "definitely degrades performance" mean here? If there are talos
> tests that report regressions without there being an actual performance
> regression, we should fix/disable these tests.

e.g. sometimes a patch changes the test or the meaning of the test numbers, so the "regression" in the test score is expected and doesn't actually represent a deterioration in performance.
Another update.

While these regression bugs do get some attention and discussion going, it seems that bug owners are not quick to declare "No, it's not worth the hassle" (OTOH "Yes, I'm following up on it" does happen occasionally). IMO understandably so: one reason is that it could take considerable investigation to arrive at this conclusion.

Sure, they'd prefer not to have the regression; sure, they'd probably put some time into trying to improve it if they had less burning tasks. But declaring upfront "No, I won't" is not something any developer wants to do unless it's absolutely clear to them that they shouldn't (and almost nothing is 100% clear). Again, IMO understandably.

So it seems that it's hard to get closure on these regression bugs when the owner doesn't willingly follow up.

Trying to push towards closure would force devs to put time into it without taking their other priorities into account. It's like saying "deciding what to do with this regression is high on your priority list", which is not something I'm convinced we should do, especially not for the majority of regressions, which are relatively small (say, <15% on one or a few platforms/tests).

I'm not sure how we can improve on this, but I wanted to share my observations/opinion. Maybe others have better insights.
If a particular benchmark systematically gets worse over time and the individual regression bugs seem not very actionable, perhaps it would be better to create a new bug focused on profiling the current performance bottlenecks. If it would be useful for bookkeeping, this new bug could be made to depend on the individual regression bugs.

As an example of this I'd like to bring up bug 986323, where njn was able to quickly develop a plan of action by looking at the difference between AWSY reports over a period of many months.

Admittedly, this hasn't exactly reversed the trend on AWSY yet (that may be difficult in general), and performance regressions seem to be a lot harder to spot as AWSY is rather noisy (see bug 1000268 for a potential fix for that).

But the fact that the main components of the regression were identified fairly rapidly indicates to me that looking at the big picture may at times be more useful than spending a lot of time focusing on the small changes that lead there.
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #47)
> ... looking at the big picture may at times
> be more useful than spending a lot of time focusing on the small changes
> that lead there.

I think similarly.

I think the natural evolution of a software system includes code changes/additions where occasionally small regressions are introduced.

Once in a while, when a certain aspect looks like it's going too far, or like it really should be better in terms of performance/regressions, it gets declared as something to improve. Then the big picture is taken into account, which could result in some refactoring or another specific project or effort to improve that metric.

Making it a high priority that no small code change results in a small regression in any of the known metrics requires a lot of effort: detecting the regression, and also modifying the code to make it go away, if that's possible at all (an effort even when it doesn't end up fixing the small regression). Many times it takes effort from several developers.

Maybe a balance of the two approaches is possible: once a small regression is detected, we won't expect a full-on investigation (which happened on some of these regression bugs so far), but rather notify the owner and ask if there's an obvious/clean/easy fix, or if something was overlooked in the change, etc.

If the owner still thinks the change is reasonably good, with no obvious issues in the code, then unless the owner thinks it should really be investigated more deeply, we should accept it as is.

In other words: maybe we should treat small regressions as acceptable by default unless those "in the know" (typically the bug or module owner) think otherwise, in which case they'll pursue it because they're responsible.

This means that this new system of a bug per regression would be used just to make sure such regressions are not overlooked, rather than as a system to make sure they don't happen.

It may seem like a nuance, but it's a state of mind which could save us a lot of development resources while still keeping everyone aware of performance.
(In reply to Avi Halachmi (:avih) from comment #32)
> We can't know/assess if it's major before the bug owner expresses her opinion.

I disagree! It's too reductionist to say "we either know everything or we know nothing" about a given regression. Any reasonably knowledgeable person should be able to estimate severity/impact based on a quick analysis of Talos data (in the form of comment 0 of the bugs Joel files, for example). You and Joel are reasonably knowledgeable people when it comes to performance regressions :)

It's absolutely a judgement call, and it can be difficult to get right, but it's not impossible, and it's OK to err on the side of "tracking too much" (because it's much cheaper to stop tracking something than it is to miss tracking something).
Depends on: 1008197
No longer depends on: 1008197
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WONTFIX