Split DevTools DAMP test suite in several chunks
Categories
(DevTools :: General, task, P3)
Tracking
(firefox100 fixed)
Tracking | Status
---|---
firefox100 | fixed
People
(Reporter: jdescottes, Assigned: jdescottes)
References
(Blocks 2 open bugs)
Details
Attachments
(2 files, 2 obsolete files)
The DevTools performance tests (DAMP) take a long time to run and often run into GC issues.
We should split up the suite. :jmaher provided a prototype at https://hg.mozilla.org/try/rev/0704f612572bb1a5da7a8a5420a24ed155d9ecc9
Updated•2 years ago
Assignee
Comment 1•2 years ago
Hi Joel,
While playing around with this, I realized that introducing those chunks as proposed here means losing all the history for the performance tests.
As you might know, we are mostly interested in tracking the performance of each subtest rather than the overall DAMP "score". That's why, in theory, splitting is fine for us. But it would still be great to be able to keep track of the historical values.
Another downside is that on Perfherder the results show up as different tests (damp1, damp2, damp3). So if we are interested in a specific subtest, we first have to know which chunk it runs in.
It's probably just something we need to get used to, but I was naively thinking everything could still show up as a single "damp" suite in Perfherder. Thinking about it more, I can see how that would be problematic to implement, but I still wanted to check with you whether it would be feasible.
Comment 2•2 years ago
Many things are more complicated than they appear. Great catch on this. If I understand correctly, you want to make sure that the summarized damp score remains. I think the only way to do that is to run everything together.
I guess one way to break it up would be cold, complicated, custom, simple, etc. Ideally there could be a logical grouping so it would be easy going forward.
In the end, the same queries to perfherder should give you the same graphs for the subtests (unless I am overlooking the obvious).
Assignee
Comment 3•2 years ago
(In reply to Joel Maher ( :jmaher ) (UTC -0800) from comment #2)
> Many things are more complicated than they appear. Great catch on this. If I understand correctly, you want to make sure that the summarized damp score remains. I think the only way to do that is to run everything together.
It's not a hard blocker. It's not so much about the score, but rather about making it easy to compare two runs, which is easier when all subtests are under a single "damp" suite on Perfherder. For DAMP we almost never look at the overall score, which is very rarely significantly impacted; instead we immediately jump to the subtest view to see if any subtest regressed significantly.
For instance, imagine I want to assess the perf impact of a low-level patch, and I have no idea which panel it might slow down. When all the subtests are under a single DAMP suite, I can open the compare view, click on "subtests" for one of the DAMP reports, and see if any subtest was significantly impacted by filtering with "Show only important changes". But with 3 chunks, I will need to do this 3 times: open the subtests for damp1 and check if any subtest regressed; same for damp2; same for damp3. So that's a bit more tedious.
When you are chasing a specific performance problem or regression, it's much less of a problem, because you should know which chunk the impacted test runs in, and you can simply run that test only.
> I guess one way to break it up would be cold, complicated, custom, simple, etc. Ideally there could be a logical grouping so it would be easy going forward.
I agree, that's a great suggestion: have some logical grouping so that it's easy to know where each test is expected to run, but still aim for a good balance so that the chunks take a comparable amount of time. So far, one split which seems to work is:
- chunk1: all inspector, accessibility, styleeditor tests (all those tools are somewhat related)
- chunk2: all console tests (that's where we have the most tests!)
- chunk3: all the rest :) (debugger + netmonitor mostly, but also a few others)
With that split, I might just use names instead of indices for the chunks (i.e. damp-inspector instead of damp1).
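The grouping described above can be sketched as a small lookup. This is purely illustrative (not the actual Talos chunking code): assign each DAMP subtest to a named chunk based on the tool its name mentions, with everything else falling into damp-other. The helper name and test names are assumptions for the example.

```python
# Hypothetical sketch of the proposed chunking: tests matching a known
# tool keyword go into that tool's chunk, everything else is "damp-other".
CHUNKS = {
    "damp-inspector": ("inspector", "accessibility", "styleeditor"),
    "damp-webconsole": ("console",),
}

def chunk_for(test_name):
    """Return the chunk a DAMP subtest name belongs to."""
    for chunk, tools in CHUNKS.items():
        if any(tool in test_name for tool in tools):
            return chunk
    return "damp-other"
```

For example, `chunk_for("cold.inspector.open.DAMP")` would land in damp-inspector, while a netmonitor test would fall through to damp-other.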
> In the end, the same queries to perfherder should give you the same graphs for the subtests (unless I am overlooking the obvious).
Right, but the data set for each subtest will be a new one, fully disconnected from the data we had when everything was under damp. Right now on Perfherder, if I try to add a data set, I get duplicated entries for damp ${subtest name} ${platform info} and damp1-2-3 ${subtest name} ${platform info}. And if I select a data set from damp1-2-3, there is no past data available. E.g. see https://treeherder.mozilla.org/perfherder/graphs?series=try,4101729,1,12&selected=4101729,1473760456
But I guess that's expected, and there's no easy way to work around this. We have a custom viewer for perf data at https://firefox-dev.tools/performance-dashboard/ where we can write some code to reconcile perf data. We mostly used it when platform names changed, but I guess we can do the same if the data set name changes as well.
I think getting faster DAMP execution times and reducing GC issues is more important than those concerns, and we should move forward. It might also allow us to re-enable some tests which were disabled because of GC issues.
Assignee
Comment 4•2 years ago
Split DAMP in 3 explicit sub-suites: damp-inspector, damp-webconsole, damp-other
Locally you can still run any DAMP subtest with --suite=damp or --activeTests=damp
Assignee
Comment 5•2 years ago
(After presenting this to the team, losing the ability to compare all subtests at once is still an issue, so let's see if we can find a solution)
Hi Dave,
We are trying to split DAMP's try job into smaller ones (3 at the moment). So we created 3 suites:
- damp-inspector
- damp-webconsole
- damp-other
The current patch on Phabricator works, but there is still one downside: when you want to compare the results between 2 pushes, instead of comparing the subtests under "damp", we now have to compare the subtests of three different suites.
With "damp", we are really not interested in damp's score, we only look at the subtests. Is there a way our 3 jobs could contribute to a single "damp" suite, so that we can still see and compare all the subtests easily? I am not sure if this is feasible and if it is, where it should be configured/handled. Is this something we could do?
Thanks!
Comment 6•2 years ago
(In reply to Julian Descottes [:jdescottes] from comment #5)
> With "damp", we are really not interested in damp's score, we only look at the subtests. Is there a way our 3 jobs could contribute to a single "damp" suite, so that we can still see and compare all the subtests easily? I am not sure if this is feasible and if it is, where it should be configured/handled. Is this something we could do?
I seem to recall that for power data, we have used multiple JSON files to consolidate and ingest into Perfherder. I think this was done with additional suites rather than subtests, so it might not work for your use case. :sparky do you recall?
That said, if you're not interested in the suite level value, you could promote your subtests to tests. You can do this either by having multiple "suites" in your JSON file, or by simply removing the summarised suite value (this retains the structure but Perfherder stops treating these results as subtests). If you did this, you would lose continuity with your past results as you would have new signatures.
I think it would be a good idea to connect with Alex Irimovici, who is currently conducting user research on Perfherder's compare view and may have some ideas on how to achieve what you're after. If not, perhaps this will inform our direction.
Assignee
Comment 7•2 years ago
Thanks for the feedback!
> by simply removing the summarised suite value (this retains the structure but Perfherder stops treating these results as subtests). If you did this, you would lose continuity with your past results as you would have new signatures.
At first glance, this seems like a good option. Continuity at this point is a nice-to-have; we are more interested in monitoring regressions, and we have another dashboard where we can try to reconcile the data if we want to keep some historical tracking. I imagine this would automatically promote all our subtests without having to declare them explicitly anywhere else.
Comment 8•2 years ago
(In reply to Julian Descottes [:jdescottes] from comment #7)
> I imagine this would automatically promote all our subtests without having to declare them explicitly anywhere else.
I believe so, though I'll leave the needinfo for Alex to confirm. Perfherder will treat the subtest name as the test name, so it will also be necessary to update the documentation so that each test has a description.
If you're making these changes, you may want to consider other changes to the names at this time, such as dropping the apparently redundant ".DAMP" suffix. Also, you can use extraOptions to express test variants that create distinct test signatures, so you could have "inspector.open" with extraOptions of "cold", "simple", "complicated", and "custom". You could perhaps take this further and have an "open" test with the panel as an extraOption.
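The naming scheme suggested above could look something like the following sketch. This is not actual Talos output code; the helper, values, and exact field set are assumptions for illustration, based on the JSON shape Perfherder ingests, with the variant moved out of the test name and into extraOptions.

```python
# Illustrative only: express the cold/simple/complicated/custom variants
# via extraOptions so each combination keeps a distinct Perfherder
# signature, while the test name drops the ".DAMP" suffix.
def make_subtest(name, variant, value):
    return {
        "name": name,                # e.g. "inspector.open", no ".DAMP" suffix
        "extraOptions": [variant],   # e.g. "cold", "simple", "complicated"
        "value": value,
        "unit": "ms",
        "lowerIsBetter": True,
    }

subtests = [
    make_subtest("inspector.open", "cold", 380.4),
    make_subtest("inspector.open", "complicated", 95.2),
]
```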
Comment 9•2 years ago
(In reply to Dave Hunt [:davehunt] [he/him] ⌚GMT from comment #6)
> (In reply to Julian Descottes [:jdescottes] from comment #5)
>> With "damp", we are really not interested in damp's score, we only look at the subtests. Is there a way our 3 jobs could contribute to a single "damp" suite, so that we can still see and compare all the subtests easily? I am not sure if this is feasible and if it is, where it should be configured/handled. Is this something we could do?
> I seem to recall that for power data, we have used multiple JSON files to consolidate and ingest into Perfherder. I think this was done with additional suites rather than subtests, so it might not work for your use case. :sparky do you recall?
Yes that's correct. We had two separate PERFHERDER_DATA outputs, one for the technical metrics, and the other for the resource usage metrics. We ended up with multiple blobs because of how different the summarization value was.
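The multiple-blob approach described above could be sketched as follows. The suite names, helper name, and blob contents here are made up for the example; only the PERFHERDER_DATA log-line convention is taken from the discussion.

```python
# Rough illustration: a harness emitting two separate PERFHERDER_DATA
# log lines, one JSON blob per metric group, as described for power data.
import json

def emit_perfherder_data(blob):
    """Print a blob in the log-line format Perfherder ingests, and return it."""
    line = "PERFHERDER_DATA: " + json.dumps(blob)
    print(line)
    return line

technical = emit_perfherder_data(
    {"framework": {"name": "devtools"},
     "suites": [{"name": "damp-inspector", "subtests": []}]})
resources = emit_perfherder_data(
    {"framework": {"name": "devtools"},
     "suites": [{"name": "damp-resource-usage", "subtests": []}]})
```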
Comment 10•2 years ago
I was reviewing some very old patches and saw bug 1612896. This would solve the GC issues while keeping the damp job unique, although it would still take long to run.
Assignee
Comment 11•2 years ago
(In reply to Alexandre Poirot [:ochameau] from comment #10)
> I was reviewing some very old patches and saw bug 1612896. This would solve the GC issues while keeping the damp job unique, although it would still take long to run.
That could also be an option!
Joel: I filed the bug based on our discussion and I felt like the two main issues that would be solved by splitting DAMP were: 1/ the long runtime 2/ GCs
As mentioned in the previous comment, Alex had a simpler patch which could address the GC issues, but it would not help with the runtime (it might even slow down the tests a bit? not sure). Were there other goals/benefits for you with the split which I didn't capture in the bug description?
I think we're not too far from having a working solution with different jobs here, but it's still nice to consider all suggested options.
Comment 12•2 years ago
Sounds like we have choices! I looked into damp because it wasn't running everywhere and was often taking 50+ minutes, so if something went wrong it was a long cycle to retrigger or gather more data. Whatever is most reliable will get my vote; second to that is whether the data is going to be useful (not confusing, finds bugs that we care to fix, etc.)
In the end, what works best for devtools?
Assignee
Comment 13•2 years ago
Ok thanks! Overall I think I'd still prefer to have different jobs, because it speeds up the overall time, and if we have an intermittent in one job, it doesn't affect all the other tests as much.
The only thing that would make me go back to a single job setup is if we can't get an easy way to compare all the devtools tests from two pushes in perfherder. But hopefully we can find a solution for that.
Dave: Do you have pointers on how to achieve "having multiple "suites" in your JSON file, or simply removing the summarised suite value"? I am not sure where to modify that.
Comment 14•2 years ago
Here's a snippet of the JSON file in its current form:
```json
{
  "framework": {"name": "devtools"},
  "suites": [
    {
      "alertThreshold": 2.0,
      "extraOptions": [...],
      "lowerIsBetter": true,
      "name": "damp",
      "shouldAlert": true,
      "subtests": [
        {
          "alertThreshold": 2.0,
          "lowerIsBetter": true,
          "name": "cold.inspector.open.DAMP",
          "replicates": [...],
          "shouldAlert": true,
          "unit": "ms",
          "value": 380.37615200000005
        }
      ],
      "value": 208.77174549146764
    }
  ]
}
```
If you remove the suites[0].value, this will effectively remove the "damp" test and replace it in the Perfherder UI with tests based on the subtest names (such as "cold.inspector.open.DAMP" from the example above). Whilst we've done this for Raptor, it appears that there's currently no way to prevent the summary value in Talos when there are subtests. Here's the relevant code. Perhaps we could add a condition to construct_results based on the suite name. I believe this would retain continuity in Perfherder's results. What I'm less sure of is whether Perfherder will consolidate results from multiple JSON files with the same revision and suite but different subtests. I do think this is worth a try though.
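As a minimal sketch of this first option, stripping the suite value could look like the following. The field names follow the JSON shown above; the helper itself is illustrative, not existing Talos code.

```python
# Remove the summarised suite value from a PERFHERDER_DATA blob so that
# Perfherder stops treating the results as subtests of "damp" and instead
# promotes each subtest to a test in its own right.
def promote_subtests(perfherder_data):
    for suite in perfherder_data["suites"]:
        suite.pop("value", None)  # no suite value -> no "damp" summary test
    return perfherder_data

data = {
    "framework": {"name": "devtools"},
    "suites": [{
        "name": "damp",
        "value": 208.77174549146764,
        "subtests": [{"name": "cold.inspector.open.DAMP", "value": 380.4}],
    }],
}
promote_subtests(data)
```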
The other option I mention would change the shape of the data to have multiple descendants of suites, so instead of "damp" with subtests, we'd essentially promote the subtests to suites in their own right. I suspect this is non-trivial and carries more risk.
Assignee
Comment 15•2 years ago
Depends on D137655
This will allow us to promote all subtests from the various damp-* suites to individual tests.
This allows us to have a single view to compare results between damp pushes.
Updated•2 years ago
Assignee
Comment 16•2 years ago
Depends on D137655
Alternative to D141833
This allows us to preserve a single test suite called "damp" in the output, with all our tests as subtests.
This way we preserve historical data, and we can keep using the compare view as we did before splitting DAMP into several suites. The split becomes mostly transparent for us in terms of sheriffing.
Assignee
Comment 17•2 years ago
The idea is to use this configuration to pass a DAMP suite name to the DAMP tests without having to create separate talos test classes.
Rather than passing a hardcoded damp_suite value, extra_prefs might be useful for other tests and at least should be generic.
Updated•2 years ago
Comment 18•2 years ago
Pushed by jdescottes@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/49e91c2434f7 Add support for extra_prefs argument for talos suite definition r=perftest-reviewers,sparky
https://hg.mozilla.org/integration/autoland/rev/d654d3fcfa8e [devtools] Split DevTools DAMP test suite in several chunks r=perftest-reviewers,sparky,nchevobbe,ochameau
Comment 19•2 years ago
Backed out 2 changesets (Bug 1749928) for causing webgl failures.
Backout link
Push with failures - webgl
Failure Log
Comment 20•2 years ago
Pushed by jdescottes@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/44fe405bb226 Add support for extra_prefs argument for talos suite definition r=perftest-reviewers,sparky
https://hg.mozilla.org/integration/autoland/rev/a34804c2e1ca [devtools] Split DevTools DAMP test suite in several chunks r=perftest-reviewers,sparky,nchevobbe,ochameau
Assignee
Updated•2 years ago
Comment 21•2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/44fe405bb226
https://hg.mozilla.org/mozilla-central/rev/a34804c2e1ca