Open Bug 1684946 Opened 3 years ago Updated 2 years ago

browsertime vismet tasks with --rebuild flag all reuse a single test result. Perfherder says Confidence is Infinity (high).

Categories

(Tree Management :: Perfherder, defect, P3)

Tracking

(Not tracked)

REOPENED

People

(Reporter: sfink, Unassigned)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [perf:workflow])

These browsertime results all show confidence as Infinity (high).

I'm glad that it has the courage of its convictions, but I am not feeling quite as confident as it is.

Both sides of the comparison have 7 triggers.

The provided link no longer has any data. Could you provide another one?

Flags: needinfo?(sphink)

Oh, sorry, I should have marked this as a duplicate of bug 1602893. But here is a recent example. (Note that this URL will take several minutes to load.)

The problem is that when using mach try fuzzy --rebuild N, the vismet tasks all get their input from a single test run. They therefore appear to be distinct jobs to Perfherder, but they all output identical numbers. Presumably, this makes Perfherder overconfident in the results, since the variance is zero.

You can either mark this bug as a duplicate, or use it as a request to display something different when the variance is zero. Ideally it would be something with a tooltip that says "all values are the same! Are you sure you're measuring something useful here?" I could imagine this happening in cases other than bug 1602893.
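To illustrate why zero variance leads to an infinite confidence, here is a minimal sketch. It assumes the confidence value is a Welch-style t statistic (difference of means divided by the combined standard error); this thread does not confirm Perfherder's exact formula, and the function names and sample values below are hypothetical.

```python
import math
from statistics import mean, variance


def t_confidence(base, new):
    """Welch-style t statistic for two samples (assumed formula, not Perfherder's code).

    Returns None when either side has fewer than two values, and math.inf
    when both sides have zero variance but different means.
    """
    if len(base) < 2 or len(new) < 2:
        return None
    std_err = math.sqrt(variance(base) / len(base) + variance(new) / len(new))
    diff = abs(mean(base) - mean(new))
    if std_err == 0:
        # Every value on each side is identical, so the statistic degenerates.
        return math.inf if diff else 0.0
    return diff / std_err


def all_identical(values):
    """True when a sample has several runs that all report the same number."""
    return len(values) > 1 and len(set(values)) == 1


# Seven "distinct" vismet jobs per side, all fed from one test run (made-up values).
base = [512.0] * 7
new = [498.0] * 7

confidence = t_confidence(base, new)
suspicious = all_identical(base) or all_identical(new)
```

With these inputs `confidence` is infinite and `suspicious` is true, which is the condition the requested tooltip would key on.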

Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(sphink)
Resolution: --- → DUPLICATE

Oops. I meant to leave this for you to decide whether to close it as a duplicate.

Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---

Copying from bug 1602893 comment 1, since that bug covered retriggers and the results of "Add new task", but not mach try --rebuild:

Greg Mierzwinski [:sparky] wrote:

This problem is worse than I originally thought. I can schedule multiple vismet tasks, but each of them only processed the results from the first btime task that was created for them. (We can't retrigger, and we can't schedule multiple runs). So for me to be able to analyze the vismet data, I would have to make one push per trial which is a bit much.

To be clear, this is still an issue with, e.g., mach try fuzzy --rebuild 3.

Summary: Confidence is Infinity (high) → browsertime vismet tasks with --rebuild flag all reuse a single test result. Perfherder says Confidence is Infinity (high).
Whiteboard: [perf:workflow]

Is this bug still relevant? Could you provide an update?

Flags: needinfo?(sphink)
Priority: -- → P3

(In reply to Acasandrei Beatrice (needinfo me) from comment #5)

Is this bug still relevant? Could you provide an update?

It definitely still is. I know other people are running into it as well. As for an update, bug 1602893 fixed other workflows, but the mach try --rebuild workflow is still broken and still biting people.

Flags: needinfo?(sphink)

In my opinion, this is a taskgraph deficiency. The "right" fix would look something like:

  • Add an additional type of dependency alongside if-dependencies and soft-dependencies. I'll call it instance-dependencies. It could imply if-dependencies, though the two are really independent of each other, so I'd probably implement it as a separate, orthogonal dependency type.
  • When duplicating tasks for --rebuild, any dependent tasks that are "instance-dependent" on a duplicated task would also be duplicated, and the downstream duplicates would pull their artifact inputs from the corresponding upstream duplicates. (I looked at this before, and it's kind of messy to implement, but I don't recall the details.)
  • If instance-dependencies is considered to be orthogonal to if-dependencies, then the previous step applies the other way around as well.
  • I think, but am not sure, that this mechanism could replace the vismet-specific workaround in bug 1602893.

I'm not actually familiar enough with taskgraph to state that this is definitely the way to go, but it feels right to me. One part that I'm especially unsure about is whether there's a better name than instance-dependencies! (The idea is that you have an upstream job that produces different output every time it runs, such as profiles or timing data, and a downstream job that consumes those differing outputs. You want to say that a downstream instance is dependent on the upstream instance, not just the upstream label.)
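A rough sketch of the duplication step, to make the idea concrete. This is not taskgraph code: the `Task` class, the `instance_dependencies` field, the `rebuild` function, and the `label#i` naming are all hypothetical, and it only models the downstream direction (duplicating an upstream task drags its instance-dependent consumers along).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Task:
    label: str
    # Labels of upstream tasks whose *specific run* this task consumes
    # (hypothetical field, not an existing taskgraph attribute).
    instance_dependencies: List[str] = field(default_factory=list)


def rebuild(tasks: Dict[str, Task], targets: List[str], n: int) -> Dict[str, Task]:
    """Make n copies of each target and of everything instance-dependent on it.

    Copy i of a downstream task depends on copy i of its upstream task, so
    each downstream copy reads artifacts from its own upstream run.
    """
    # Transitive closure: keep adding tasks that instance-depend on a duplicated task.
    to_duplicate = set(targets)
    changed = True
    while changed:
        changed = False
        for task in tasks.values():
            if task.label in to_duplicate:
                continue
            if any(dep in to_duplicate for dep in task.instance_dependencies):
                to_duplicate.add(task.label)
                changed = True

    copies: Dict[str, Task] = {}
    for i in range(n):
        for label in sorted(to_duplicate):
            deps = [
                f"{dep}#{i}" if dep in to_duplicate else dep
                for dep in tasks[label].instance_dependencies
            ]
            copies[f"{label}#{i}"] = Task(f"{label}#{i}", deps)
    return copies


graph = {
    "browsertime": Task("browsertime"),
    "vismet": Task("vismet", ["browsertime"]),
}
rebuilt = rebuild(graph, ["browsertime"], 3)
```

Here `rebuilt` contains three browsertime copies and three vismet copies, with `vismet#2` depending on `browsertime#2` rather than on the first browsertime run.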
