Closed Bug 1458242 Opened Last year Closed 11 months ago
add DOMContentFlushed to tp6 as a measurement
now that bug 1457325 is complete we can add a second measurement to tp6. Once tp6 is running in raptor, this will be doable.
Whiteboard: [PI:May] → [PI:June]
add DOMContentFlushed to tp6, collect it and return the geometric mean of all measurements.
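For readers unfamiliar with how a geometric mean combines several measurements into one score, here is a minimal sketch. It is not the Raptor implementation; the function name, the measurement names, and the sample values are assumptions made up for illustration.

```python
import math


def geometric_mean(values):
    """Return the geometric mean of a list of positive numbers."""
    if not values:
        raise ValueError("need at least one measurement")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive values")
    # Averaging the logs avoids overflow from multiplying many values.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-page measurements in milliseconds (made-up numbers).
measurements = {"fnbpaint": 400.0, "dcf": 900.0}
topline = geometric_mean(list(measurements.values()))
print(round(topline, 3))
```

Because every input feeds the single result, adding a new measurement this way changes the combined score even if the existing measurements stay the same.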
Comment on attachment 9008857 [details] Bug 1458242 - add DOMContentFlushed to tp6 as a measurement. r=rwood Robert Wood [:rwood] has approved the revision.
Attachment #9008857 - Flags: review+
Pushed by firstname.lastname@example.org: https://hg.mozilla.org/integration/autoland/rev/1496cbf5f2c1 add DOMContentFlushed to tp6 as a measurement. r=rwood
> add DOMContentFlushed to tp6, collect it and return the geometric mean of all measurements.

I'm not sure this is what we want. The goal here was to stand up DOMContentFlushed so that we can observe and see how it tracks the other measurements, and then decide (with Product) how much importance to place in it. So we want (a) a way to be able to look at the graph for these values next to the existing (fnbp) tp6 graphics, and (b) to avoid influencing existing measurements during the evaluation. Does this patch give us that?
Thanks for commenting :bholley. We are happy to do what you propose, but we don't have a good way to visualize this right now. There might be some shortcuts to make this happen. So the goal is for all tp6 pages to track:
* timetofirstnonblankpaint
* domcontentflushed

We also measure hero, which is a custom element that we created on our own and which measures something we thought up. Are there other measurements? If it is just the 2, we could track those as different tests for now and then graph them side by side in perfherder so it is easy to review.

I would like to know what we would like to alert on. It is unrealistic for us to currently sheriff multiple data points for tp6 (we could do the 2, but 3 adds that many more tests to sheriff, and we are already falling behind regularly). Any direction would help; otherwise we can use ttfnb and dcf and track those both over time.

One other caveat: this is only on m-c right now until we get all docs and geckoprofiling support for raptor (maybe in the next 2 weeks). Then we can move tp6 and some tests to tier-1 for raptor and turn them off for talos.
I don't think we need to sheriff them during the first stage. I think the ideal infrastructure would be to have tests be able to have subtests, whose results we can graph individually. There would be a topline score, which we could separately configure. This would allow us to set up additional subtests and track them before deciding how we should sheriff them. The eventual decision might be (a) don't sheriff it, just track it as an FYI, (b) roll it into the other score via a geometric mean and sheriff it that way, or (c) break it out into its own top-level test. If that's something we can set up, I think that'd be great. If it would be a lot of work, setting up an additional un-sheriffed top-line score would be the way to get this particular issue moving (but wouldn't scale as easily to hero element and other things).
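The subtest idea above could be sketched roughly as follows. This is purely illustrative: the `Subtest` class, the `in_topline` flag, and the sample values are hypothetical and are not part of Raptor or Perfherder. The point is only that a subtest can be reported and graphed without feeding into the top-line score.

```python
from dataclasses import dataclass
from statistics import geometric_mean  # available in Python 3.8+


@dataclass
class Subtest:
    name: str
    value: float
    # Hypothetical flag: False means "track as an FYI, keep out of the topline".
    in_topline: bool = True


def topline_score(subtests):
    """Geometric mean of only those subtests configured to count."""
    scored = [s.value for s in subtests if s.in_topline]
    return geometric_mean(scored)


# Made-up values; dcf is reported but excluded from the topline for now.
subtests = [
    Subtest("fnbpaint", 400.0),
    Subtest("hero", 900.0),
    Subtest("dcf", 650.0, in_topline=False),
]
score = topline_score(subtests)
print(round(score, 3))
```

Flipping `in_topline` to `True` for a subtest would correspond to option (b) above, while leaving it `False` corresponds to option (a).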
we can look into this- I think we could configure subtests or metrics to post and not sheriff, I can pick this up next week. :rwood, do you have any concerns?
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #10)
> we can look into this- I think we could configure subtests or metrics to
> post and not sheriff, I can pick this up next week.
>
> :rwood, do you have any concerns?

No concerns here. All the individual measurements as well as the overall topline score (geometric mean) are available in the PERFHERDER_DATA submission, as well as in the perfherder-data.json taskcluster artifact, so I would think it's doable on the reporting side.
I went to write a patch for this and then realized we already have this data. Here is time to first non blank paint, dom content flushed, and custom hero element for facebook: https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-central,1778547,1,10&series=mozilla-central,1731717,1,10&series=mozilla-central,1731718,1,10

As a note, we only run this on mozilla-central right now, but this gives us a good sense of tracking. :bholley, does this work for you? How could we help you make this analysis easier, or is this excellent?
Ahah! That is _exactly_ what I was looking for. I realize I was looking at "talos", but that the new results are under "raptor". So now we wait for patches to come along that impact the graphs, so that we can see how results track each other. Thanks Joel!
Resolution: FIXED → WORKSFORME
Err, whoops - I forgot we'd hijacked an existing bug.
Resolution: WORKSFORME → FIXED
Oh, and to confirm - have the geometric mean bits been applied yet? I generally think we should gain more confidence in this measurement before we let it impact our top-line measurements.
Hi Bobby, yes, raptor reports the top-level score as the geometric mean of all the different measurements taken. It is tier2 and only running on central right now. Here's a raptor tp6 job. In treeherder under 'Job Details' select the perfherder-data.json file and you'll see in there the top-level geometric-mean result ('value') as well as all the individual measurements (currently dcf, fnbpaint, and hero). https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&searchStr=raptor,tp6&selectedJob=202246143
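To pull the top-level 'value' and the individual measurements out of a downloaded perfherder-data.json, something like the sketch below may help. The layout shown (a `suites` list, each with `name`, `value`, and a `subtests` list) is an assumption about the file's shape, and the suite name and numbers in the sample are made up; check them against a real artifact before relying on this.

```python
import json

# Made-up sample standing in for the contents of perfherder-data.json.
SAMPLE = """
{
  "framework": {"name": "raptor"},
  "suites": [
    {
      "name": "raptor-tp6-example",
      "value": 600.0,
      "subtests": [
        {"name": "fnbpaint", "value": 400.0},
        {"name": "dcf", "value": 900.0}
      ]
    }
  ]
}
"""


def summarize(perfherder_data):
    """Map each suite name to its topline value and per-measurement values."""
    summary = {}
    for suite in perfherder_data.get("suites", []):
        subtests = {s["name"]: s["value"] for s in suite.get("subtests", [])}
        summary[suite["name"]] = {
            "topline": suite.get("value"),
            "subtests": subtests,
        }
    return summary


report = summarize(json.loads(SAMPLE))
for suite_name, data in report.items():
    print(suite_name, "topline:", data["topline"])
    for measurement, value in sorted(data["subtests"].items()):
        print("  ", measurement, value)
```

For a real artifact, replace `json.loads(SAMPLE)` with `json.load(open(path))` on the downloaded file.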
(In reply to Robert Wood [:rwood] from comment #16)
> Hi Bobby, yes raptor reports the top-level score as the geometric mean of
> all the different measurements taken. It is tier2 only running on central
> right now. Here's a raptor tp6 job .

Ah ok. As long as we're still running the old tests as tier-1 that's fine.