Closed Bug 1749928 Opened 2 years ago Closed 2 years ago

Split DevTools DAMP test suite in several chunks

Categories

(DevTools :: General, task, P3)


Tracking

RESOLVED FIXED
100 Branch
firefox100 --- fixed

People

(Reporter: jdescottes, Assigned: jdescottes)

References

(Blocks 2 open bugs)

Details

Attachments

(2 files, 2 obsolete files)

The DevTools performance test suite (DAMP, DevTools At Maximum Performance) takes a long time to run and often runs into GC issues.

We should split up the suite. :jmaher provided a prototype at https://hg.mozilla.org/try/rev/0704f612572bb1a5da7a8a5420a24ed155d9ecc9

Hi Joel,

While playing around with this I realized that introducing those chunks as proposed here means losing all the history about the performance tests.

As you might know, we are mostly interested in tracking the performance of each subtest rather than tracking the overall DAMP "score". That's why, in theory, splitting is fine for us. But it would still be great to be able to keep track of the historical values.

Another downside is that on perfherder the results show up as different tests (damp1, damp2, damp3). So if we are interested in a specific subtest, we first have to know in which chunk it runs.

It's probably just something we need to get used to, but I was naively thinking everything could still show up as a single "damp" suite in perfherder. Just wanted to check with you whether that would be feasible. Thinking more about it, I can see how it could be problematic to implement, but I still wanted to check.

Flags: needinfo?(jmaher)

Many things are more complicated than they appear. Great catch on this. If I understand correctly, you want to make sure that the summarized damp score remains. I think the only way to do that is to run everything together.

I guess one way to break it up would be cold, complicated, custom, simple, etc. Ideally there could be a logical grouping so it would be easy going forward.

In the end, the same queries to perfherder should give you the same graphs for the subtests (unless I am overlooking the obvious).

Flags: needinfo?(jmaher)

(In reply to Joel Maher ( :jmaher ) (UTC -0800) from comment #2)

> Many things are more complicated than they appear. Great catch on this. If I understand correctly, you want to make sure that the summarized damp score remains. I think the only way to do that is to run everything together.

It's not a hard blocker. It's not so much about the score as about making it easy to compare two runs, which is easier when all subtests are under a single "damp" suite on perfherder. For DAMP we almost never look at the overall score, which is very rarely significantly impacted; instead we immediately jump to the subtest view to see if any subtest regressed significantly.

For instance, imagine I want to assess the perf impact of a low-level patch, and I have no idea which panel it might slow down. When all the subtests are under a single DAMP suite, I can open the compare view, click on the subtests for one of the DAMP reports, and check whether any subtest was significantly impacted by filtering with "Show only important changes". But with 3 chunks, I will need to do this 3 times: open all the subtests for damp1 and check if any subtest regressed; same for damp2; same for damp3. So that's a bit more tedious.

When you are chasing a specific performance problem or regression it's less of an issue, because you should know in which chunk the impacted test runs, and you can simply run that test only.

> I guess one way to break it up would be cold, complicated, custom, simple, etc. Ideally there could be a logical grouping so it would be easy going forward.

I agree, that's a great suggestion: have some logical grouping so that it's easy to know where each test is expected to run, but still aim for a good balance so that the chunks take a comparable amount of time. So far one split which seems to work is:

  • chunk1: all inspector, accessibility, styleeditor tests (all those tools are somewhat related)
  • chunk2: all console tests (that's where we have the most tests!)
  • chunk3: all the rest :) (debugger + netmonitor mostly, but also a few others)

With that split, I might just use names instead of indices for the chunks (i.e. damp-inspector instead of damp1).
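To make the grouping concrete, here is a minimal sketch in Python of how subtests could be routed to named chunks; the prefix lists, chunk names, and function are hypothetical illustrations, not the actual patch:

# Hypothetical routing of DAMP subtests to named chunks, based on the tool
# name appearing in the subtest name. Prefixes and chunk names are assumed.
CHUNKS = {
    "damp-inspector": ("inspector", "accessibility", "styleeditor"),
    "damp-webconsole": ("console",),
}

def chunk_for(subtest_name):
    """Return the chunk a DAMP subtest belongs to; anything unmatched
    falls through to damp-other (debugger, netmonitor, and the rest)."""
    for chunk, tools in CHUNKS.items():
        if any(tool in subtest_name for tool in tools):
            return chunk
    return "damp-other"

print(chunk_for("cold.inspector.open.DAMP"))  # -> damp-inspector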

> In the end, the same queries to perfherder should give you the same graphs for the subtests (unless I am overlooking the obvious).

Right, but the data set for each subtest will be a new one, fully disconnected from the data we had when everything was under damp. Right now on perfherder, if I try to add a data set, I get duplicated entries for damp ${subtest name} ${platform info} and damp1-2-3 ${subtest name} ${platform info}. And if I select a data set from damp1-2-3, there is no past data available. E.g. see https://treeherder.mozilla.org/perfherder/graphs?series=try,4101729,1,12&selected=4101729,1473760456

But I guess that's expected, and there's no easy way to work around this. We have a custom viewer for perf data at https://firefox-dev.tools/performance-dashboard/ where we can write some code to reconcile perf data. We mostly used it when platform names changed, but I guess we can do the same if the data-set name changes as well.

I think getting faster DAMP execution times and reducing GC issues is more important than those downsides, and that we should move forward. It might also allow us to re-enable some tests which were disabled because of GC issues.

Split DAMP into 3 explicit sub-suites: damp-inspector, damp-webconsole, damp-other

Locally you can still run any DAMP subtest with --suite=damp or --activeTests=damp

(After presenting this to the team, losing the ability to compare all subtests at once is still an issue, so let's see if we can find a solution)

Hi Dave,

We are trying to split DAMP's try job into smaller ones (3 at the moment). So we created 3 suites:

  • damp-inspector
  • damp-webconsole
  • damp-other

The current patch on phabricator works, but there is still one downside: when you want to compare the results between 2 pushes, instead of comparing the subtests under "damp", we now have to compare the subtests for three different suites.

With "damp", we are really not interested in damp's score, we only look at the subtests. Is there a way our 3 jobs could contribute to a single "damp" suite, so that we can still see and compare all the subtests easily? I am not sure if this is feasible and if it is, where it should be configured/handled. Is this something we could do?

Thanks!

Flags: needinfo?(dave.hunt)

(In reply to Julian Descottes [:jdescottes] from comment #5)

With "damp", we are really not interested in damp's score, we only look at the subtests. Is there a way our 3 jobs could contribute to a single "damp" suite, so that we can still see and compare all the subtests easily? I am not sure if this is feasible and if it is, where it should be configured/handled. Is this something we could do?

I seem to recall that for power data, we have used multiple JSON files to consolidate and ingest into Perfherder. I think this was done with additional suites rather than subtests, so it might not work for your use case. :sparky do you recall?

That said, if you're not interested in the suite-level value, you could promote your subtests to tests. You can do this either by having multiple "suites" in your JSON file, or by simply removing the summarised suite value (this retains the structure, but Perfherder stops treating these results as subtests). If you did this, you would lose continuity with your past results, as you would have new signatures.

I think it would be a good idea to connect with Alex Irimovici, who is currently conducting user research on Perfherder's compare view and may have some ideas on how to achieve what you're after. If not, perhaps this will inform our direction.

Flags: needinfo?(gmierz2)
Flags: needinfo?(dave.hunt)
Flags: needinfo?(airimovici)

Thanks for the feedback!

> by simply removing the summarised suite value (this retains the structure but Perfherder stops treating these results as subtests). If you did this, you would lose continuity with your past results as you would have new signatures.

At first glance, this seems like a good option. Continuity at this point is a nice-to-have; we are more interested in monitoring regressions, and we have another dashboard where we can try to reconcile the data if we want to keep some historical tracking. I imagine this would automatically promote all our subtests without having to declare them explicitly anywhere else.

(In reply to Julian Descottes [:jdescottes] from comment #7)

> I imagine this would automatically promote all our subtests without having to declare them explicitly anywhere else.

I believe so, though I'll leave the needinfo for Alex to confirm. Perfherder will treat the subtest name as the test name, and so it will also be necessary to update the documentation so that each test has a destination.

If you're making these changes you may want to consider other changes to the names at this time, such as the apparently redundant ".DAMP" suffix. Also, you can use extraOptions to express test variants that create distinct test signatures, so you could have "inspector.open" with extraOptions of "cold", "simple", "complicated", and "custom". You could perhaps take this further and have an "open" test with the panel as an extraOption. A sketch of that shape follows.
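As a purely illustrative sketch of that idea, written as a Python list of dicts mirroring the JSON shape shown later in this bug; the placement of extraOptions and all values are assumptions:

# Hypothetical Perfherder suite entries: one "inspector.open" test reported
# four times with different extraOptions instead of four differently-named
# tests. Each variant gets a distinct test signature.
suites = [
    {
        "name": "inspector.open",
        "extraOptions": [variant],
        "lowerIsBetter": True,
        "unit": "ms",
        "value": value,
    }
    for variant, value in [
        ("cold", 380.4),
        ("simple", 190.2),
        ("complicated", 250.7),
        ("custom", 210.9),
    ]
]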

(In reply to Dave Hunt [:davehunt] [he/him] ⌚GMT from comment #6)

> (In reply to Julian Descottes [:jdescottes] from comment #5)
>
> > With "damp", we are really not interested in damp's score; we only look at the subtests. Is there a way our 3 jobs could contribute to a single "damp" suite, so that we can still see and compare all the subtests easily? I am not sure if this is feasible and, if it is, where it should be configured/handled. Is this something we could do?
>
> I seem to recall that for power data, we have used multiple JSON files to consolidate and ingest into Perfherder. I think this was done with additional suites rather than subtests, so it might not work for your use case. :sparky do you recall?

Yes, that's correct. We had two separate PERFHERDER_DATA outputs: one for the technical metrics, and the other for the resource-usage metrics. We ended up with multiple blobs because of how different the summarization value was. A sketch of the pattern is below.
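For context, a minimal Python sketch of that pattern, with invented payloads: Talos/Raptor-style harnesses report results by printing PERFHERDER_DATA lines, so emitting two blobs just means printing twice.

import json

# Two independent PERFHERDER_DATA blobs, each with its own suites and
# summarization; the payload contents here are made up for illustration.
technical_metrics = {
    "framework": {"name": "raptor"},
    "suites": [{"name": "pageload", "value": 1234.5}],
}
resource_metrics = {
    "framework": {"name": "raptor"},
    "suites": [{"name": "power-usage", "value": 42.0}],
}

for blob in (technical_metrics, resource_metrics):
    print("PERFHERDER_DATA: %s" % json.dumps(blob))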

Flags: needinfo?(gmierz2)

I was reviewing some very old patches and saw bug 1612896. This would solve the GC issues while keeping a single damp job, which would still take long to run.

(In reply to Alexandre Poirot [:ochameau] from comment #10)

> I was reviewing some very old patches and saw bug 1612896. This would solve the GC issues while keeping a single damp job, which would still take long to run.

That could also be an option!

Joel: I filed the bug based on our discussion, and I felt like the two main issues that would be solved by splitting DAMP were: 1/ the long runtime, 2/ the GCs.

As mentioned in the previous comment, Alex had a simpler patch which could address the GC issues, but it would not help with the runtime (it might even slow down the tests a bit? not sure). Were there other goals/benefits for you with the split which I didn't capture in the bug description?
I think we're not too far from having a working solution with different jobs here, but it's still nice to consider all the suggested options.

Flags: needinfo?(jmaher)

Sounds like we have choices! I looked into damp because it wasn't running everywhere and it was often taking 50+ minutes, so if something went wrong it was a long cycle to retrigger or gather more data. Anything that is the most reliable will get my vote; second to that is whether the data is going to be useful (not confusing, finds bugs that we care to fix, etc.).

In the end, what works best for devtools?

Flags: needinfo?(jmaher)

Ok, thanks! Overall I think I'd still prefer to have different jobs, because it speeds up the overall time, and if we have an intermittent in one job, it doesn't affect all the other tests as much.

The only thing that would make me go back to a single-job setup is if we can't get an easy way to compare all the devtools tests from two pushes in perfherder. But hopefully we can find a solution for that.

Dave: do you have pointers on how to achieve "having multiple suites in your JSON file, or simply removing the summarised suite value"? I am not sure where to modify that.

Flags: needinfo?(dave.hunt)

Here's a snippet of the JSON file in its current form:

{
  "framework": {"name": "devtools"},
  "suites": [
    {
      "alertThreshold": 2.0,
      "extraOptions": [...],
      "lowerIsBetter": true,
      "name": "damp",
      "shouldAlert": true,
      "subtests": [
        {
          "alertThreshold": 2.0,
          "lowerIsBetter": true,
          "name": "cold.inspector.open.DAMP",
          "replicates": [...],
          "shouldAlert": true,
          "unit": "ms",
          "value": 380.37615200000005
        },
      ],
      "value": 208.77174549146764
    }
  ]
}

If you remove suites[0].value, this will effectively remove the "damp" test and replace it in the Perfherder UI with tests based on the subtest names (such as "cold.inspector.open.DAMP" from the example above). Whilst we've done this for Raptor, it appears that there's currently no way to prevent the summary value in Talos when there are subtests. Here's the relevant code. Perhaps we could add a condition to construct_results based on the suite name, as sketched below. I believe this would retain continuity in Perfherder's results. What I'm less sure of is whether Perfherder will consolidate results from multiple JSON files with the same revision and suite but different subtests. I do think this is worth a try though.
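A hedged sketch of such a condition; the set name, the simplified signature, and the fallback summarization are illustrative assumptions, not the actual Talos code:

# Suites whose summary value should be suppressed so that Perfherder
# promotes their subtests to standalone tests (names are assumed).
SUITES_WITHOUT_SUMMARY = {"damp-inspector", "damp-webconsole", "damp-other"}

def construct_results(vals, testname):
    """Return the suite-level summary value, or None to omit it."""
    if testname in SUITES_WITHOUT_SUMMARY:
        return None  # no suite value -> subtests become top-level tests
    # Fall back to the usual summarization (simplified to a mean here).
    return sum(vals) / len(vals)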

The other option I mention would change the shape of the data to have multiple descendants of suites: instead of "damp" with subtests, we'd essentially promote the subtests to suites in their own right (see the sketch below). I suspect this is non-trivial and carries more risk.
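For comparison, roughly the shape that alternative would produce, written as a Python dict mirroring the JSON snippet above; the names and values are illustrative:

# Hypothetical payload where former DAMP subtests are suites of their own.
perfherder_data = {
    "framework": {"name": "devtools"},
    "suites": [
        {"name": "cold.inspector.open", "value": 380.4, "unit": "ms",
         "lowerIsBetter": True, "shouldAlert": True, "alertThreshold": 2.0},
        {"name": "console.openwithcache.open", "value": 205.1, "unit": "ms",
         "lowerIsBetter": True, "shouldAlert": True, "alertThreshold": 2.0},
    ],
}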

Flags: needinfo?(dave.hunt)

Depends on D137655

This will allow us to promote all subtests from the various damp-* suites to individual tests.
This gives us a single view to compare results between damp pushes.

Attachment #9269035 - Attachment description: Bug 1749928 - [devtools] Remove test suite value talos output for DAMP → Bug 1749928 - [devtools] Remove test suite value from talos output for DAMP

Depends on D137655

Alternative to D141833

This allows us to preserve a single test suite called "damp" in the output, and to have all our tests as subtests.
This way we preserve historical data, and we can keep using the compare view as we did before splitting DAMP into
several suites. The split becomes mostly transparent for us in terms of sheriffing.

The idea is to use this configuration to pass a DAMP suite name to the DAMP tests without having to create separate talos test classes.
Rather than passing a hardcoded damp_suite value, extra_prefs might be useful for other tests, and at least it is generic.
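To illustrate, a rough Python sketch of what such a configuration could look like; the structure, suite names, and pref key are assumptions, not the landed patch:

# Hypothetical suite definitions: one damp test class, three suite entries
# that differ only in the pref passed via the new extra_prefs argument.
# DAMP would read the pref at runtime to decide which chunk to run.
damp_suites = {
    "damp-inspector": {"tests": ["damp"], "extra_prefs": ["damp.tests.suite=inspector"]},
    "damp-webconsole": {"tests": ["damp"], "extra_prefs": ["damp.tests.suite=webconsole"]},
    "damp-other": {"tests": ["damp"], "extra_prefs": ["damp.tests.suite=other"]},
}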

Attachment #9269035 - Attachment is obsolete: true
Attachment #9269181 - Attachment is obsolete: true
Blocks: 1761730
Pushed by jdescottes@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/49e91c2434f7
Add support for extra_prefs argument for talos suite definition r=perftest-reviewers,sparky
https://hg.mozilla.org/integration/autoland/rev/d654d3fcfa8e
[devtools] Split DevTools DAMP test suite in several chunks r=perftest-reviewers,sparky,nchevobbe,ochameau

Backed out 2 changesets (Bug 1749928) for causing webgl failures.
Backout link
Push with failures - webgl
Failure Log

Flags: needinfo?(jdescottes)
Pushed by jdescottes@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/44fe405bb226
Add support for extra_prefs argument for talos suite definition r=perftest-reviewers,sparky
https://hg.mozilla.org/integration/autoland/rev/a34804c2e1ca
[devtools] Split DevTools DAMP test suite in several chunks r=perftest-reviewers,sparky,nchevobbe,ochameau
Flags: needinfo?(jdescottes)
Flags: needinfo?(alexandru.irimovici)
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 100 Branch
Blocks: 1762837