Closed Bug 1570619 Opened 5 years ago Closed 5 years ago

2.44% tart (windows10-64-shippable) regression on push a91cc939f11246643f966e61a7bfa58d8a11a237 (Wed June 26 2019)

Categories: Firefox Build System :: Task Configuration (defect, severity normal, priority not set)

Tracking: Performance Impact ?, firefox70 wontfix, firefox71 wontfix
Status: RESOLVED WONTFIX
Target Milestone: mozilla70

People: Reporter alexandrui; Unassigned
References: Regression
Keywords: (4 keywords)

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=a91cc939f11246643f966e61a7bfa58d8a11a237

Since you authored one of the patches included in that push, we need your help to address this regression.

Regressions:

2% tart windows10-64-shippable opt e10s stylo 2.55 -> 2.61

You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=22149

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Performance_sheriffing/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Performance_sheriffing/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Performance_sheriffing/Talos/RegressionBugsHandling

Flags: needinfo?(mshal)
Component: General → Task Configuration
Product: Testing → Firefox Build System
Version: Version 3 → unspecified

Michael, you have 9 patches on the bug, so it's difficult for me to determine which one is causing the regression. Which of them do you think is most likely to have caused this?
Thanks

Target Milestone: --- → mozilla70

(In reply to Alexandru Ionescu :alexandrui from comment #1)

Michael, you have 9 patches on the bug, so it's difficult for me to determine which one is causing the regression. Which of them do you think is most likely to have caused this?
Thanks

Off-hand I'm not sure; I'm trying to bisect it to see if I can find anything. I'll update when I know more.

Ok, thank you!

The bisection pointed to the final commit enabling Windows 3-tier PGO (https://phabricator.services.mozilla.com/D34803), but that is a bit misleading. Investigating further, I found that the Windows 3-tier PGO landed after bug 1196094, which defined NS_FREE_PERMANENT_DATA for profile-generate builds to support e10s profiling, but before bug 1557762, which would have enabled the define in 1-tier PGO builds. On try I verified that with bug 1557762 applied, a 1-tier PGO build from the commit before the Windows 3-tier PGO switch-over shows the same perf regression in tart.

So the regression showed up with the 3-tier commits because we inadvertently started to correctly handle the e10s profile-generation switchover from bug 1196094.

:froydnj, do you have a suggestion here? Is this a regression we just need to accept? Would it perhaps be helped by bug 1555974? Presumably we wouldn't want to profile all the #ifdef NS_FREE_PERMANENT_DATA sections since they aren't in the final profile-use build, but it's not clear to me if this regression is from the addition of that code.

Flags: needinfo?(mshal) → needinfo?(nfroyd)

(In reply to Michael Shal [:mshal] from comment #4)

:froydnj, do you have a suggestion here? Is this a regression we just need to accept? Would it perhaps be helped by bug 1555974? Presumably we wouldn't want to profile all the #ifdef NS_FREE_PERMANENT_DATA sections since they aren't in the final profile-use build, but it's not clear to me if this regression is from the addition of that code.

Just to be sure I understand:

  1. We enable NS_FREE_PERMANENT_DATA in PGO generate builds so we can collect profile data.
  2. We do not enable NS_FREE_PERMANENT_DATA in PGO use builds.
  3. Doing (1), above, results in us taking a slowdown in PGO use builds (!).
  4. Not doing (1), above, (but still dumping the profile data as appropriate) results in a performance improvement on the benchmarks in comment 0.

If that's the case, then I think bug 1555974 is the answer.

Flags: needinfo?(nfroyd)

(In reply to Nathan Froyd [:froydnj] from comment #5)

Just to be sure I understand:

  1. We enable NS_FREE_PERMANENT_DATA in PGO generate builds so we can collect profile data.
  2. We do not enable NS_FREE_PERMANENT_DATA in PGO use builds.
  3. Doing (1), above, results in us taking a slowdown in PGO use builds (!).
  4. Not doing (1), above, (but still dumping the profile data as appropriate) results in a performance improvement on the benchmarks in comment 0.

Right, though for 4) I don't believe that my test is still dumping the profile data as appropriate - from bug 1196094 I'm led to believe that disabling NS_FREE_PERMANENT_DATA means we're calling _exit directly, and so we should be missing profile data for child processes in e10s. I don't see any differences in the talos benchmarks when I do this, though. :Gijs, is there a particular test where I should see differences if I undo https://phabricator.services.mozilla.com/D33269 ?

The profile data shows significantly different function counts with and without the _exit fix:

With NS_FREE_PERMANENT_DATA defined (total profraw size: ~829MB):
Instrumentation level: IR
Total functions: 290592
Maximum function count: 51711708
Maximum internal block count: 161365500

Without NS_FREE_PERMANENT_DATA defined (total profraw size: ~140MB):
Instrumentation level: IR
Total functions: 290586
Maximum function count: 14554514
Maximum internal block count: 7775441
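
For context, here's roughly the mechanism in question (an illustrative sketch only, not the real Gecko shutdown code):

    #include <stdlib.h>

    // Illustrative sketch, not the actual Gecko shutdown path.
    void ChildProcessShutdown(int aExitCode) {
    #ifdef NS_FREE_PERMANENT_DATA
      // Orderly exit: atexit() handlers and static destructors run, so an
      // instrumented (-fprofile-generate) build flushes its counters to a
      // *.profraw file on the way out.
      exit(aExitCode);
    #else
      // Fast exit: atexit()/destructors are skipped, so the child process
      // writes no profile data at all.
      _exit(aExitCode);
    #endif
    }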

Flags: needinfo?(gijskruitbosch+bugs)

(In reply to Michael Shal [:mshal] from comment #6)

(In reply to Nathan Froyd [:froydnj] from comment #5)

Just to be sure I understand:

  1. We enable NS_FREE_PERMANENT_DATA in PGO generate builds so we can collect profile data.
  2. We do not enable NS_FREE_PERMANENT_DATA in PGO use builds.
  3. Doing (1), above, results in us taking a slowdown in PGO use builds (!).
  4. Not doing (1), above, (but still dumping the profile data as appropriate) results in a performance improvement on the benchmarks in comment 0.

Right, though for 4) I don't believe that my test is still dumping the profile data as appropriate - from bug 1196094 I'm led to believe that disabling NS_FREE_PERMANENT_DATA means we're calling _exit directly, and so we should be missing profile data for child processes in e10s.

This (that we'd miss the profile data for kids) is my understanding as well.

I don't see any differences in the talos benchmarks when I do this, though. :Gijs, is there a particular test where I should see differences if I undo https://phabricator.services.mozilla.com/D33269 ?

In bug 1552425, where we accidentally stopped collecting metrics from child processes (which is what I think the issue is here), a "2.51 - 2.54% raptor-speedometer-firefox (linux64-shippable, linux64-shippable-qr)" regression was noticed. Though I wouldn't be surprised if our Windows runs differ enough that the result is that things don't show up there when they did show up on Linux.

So purely from a correctness perspective I don't think we should go back to not collecting child process data (right?!), but it's disconcerting that it would cause a regression. It's possible that bug 1555974 would fix this, but I'm not super sure that it would?

It may also be interesting to work out what actually causes the TART difference. There are some tools to inspect the raw profiler data, but I don't know them or how to go about it from there - it does look like the icon-open-DPI1.all.TART subtest regresses about 9%, which might help narrow down what's changed...

Does that help?

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(mshal)
Whiteboard: [qf:tracking70]

Following up in email with mshal and vchin.

If this is reasonably easy to reproduce on try, has anybody generated any profiles for us to analyze?

I spent quite a bit of time trying to dig into this, and came up empty. I tried testing a fix to bug 1555974, but did not see an improvement in tart despite it looking promising with the NS_FREE_PERMANENT_DATA change. In other words, even though I no longer defined NS_FREE_PERMANENT_DATA, when I add the shutdown code in dom/ipc/ContentChild.cpp under MOZ_PROFILE_GENERATE, the regression persists. So I'm not sure how to solve this while keeping e10s profiling enabled. It might be worth checking in with Gijs or dmajor if there is interest in fixing this, as I am out of ideas.

I spent quite a bit of time trying to dig into this, and came up empty. I tried testing a fix to bug 1555974, but did not see an improvement in tart despite it looking promising with the NS_FREE_PERMANENT_DATA change.

Gotcha. Did you happen to create any performance profiles while doing these tests that the team can examine?

Hey Gijs, can I presume that by applying a reverse of https://hg.mozilla.org/mozilla-central/rev/93ae54d47ca4, we'll be in a "before" state that we could gather a TART performance profile from?

Flags: needinfo?(gijskruitbosch+bugs)

I recreated my try pushes, but I'm not sure if I can come to the same conclusions as before. The runs vary enough that it's hard to see where an improvement or regression lies, at least to me. Maybe you will have better insight :)

The regression was triggered on commit a91cc939f11246643f966e61a7bfa58d8a11a237, when 3-tier PGO was enabled. Those try pushes & numbers look like this:

1-tier PGO: 2.54 - 2.59, avg 2.567: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ac41c61e182a1504d1f36cd04ccca32923b15c06
3-tier PGO: 2.58 - 2.64, avg 2.613: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3bf3c475f1d0097b449b491775970fe9ca956b63

(Note: the PGO profile data is available in the 'B' task in 1-tier PGO as merged.profdata, and the 'run' task in 3-tier PGO as profdata.tar.xz, which contains *.profraw that need to be merged)

However, the e10s PGO profiling was added partly in bug 1196094 and partly in bug 1557762, while 3-tier PGO landed in between the two. 3-tier PGO had the side effect of also fixing the issue from bug 1557762, but only for 3-tier builds, while bug 1557762 itself fixed it for 1-tier builds. So it's worth looking at what 1-tier PGO looked like if bug 1557762 was included, which is this:

1-tier PGO with bug 1557762: 2.57 - 2.61, avg 2.596: https://treeherder.mozilla.org/#/jobs?repo=try&revision=cf254eaeb88d3108bea8d70ff795caeadd0d2fdd

We can also try to look at 3-tier PGO without e10s profiling, which I tried by disabling the NS_FREE_PERMANENT_DATA define:

3-tier PGO without NS_FREE_PERMANENT_DATA: 2.57 - 2.6, avg 2.584: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0be0adf6c17f6242a17d6b9fa678a2862ce89e52

However, we don't want to disable profiling e10s subprocesses, so I also tried to fix bug 1555974. This should mean we're now correctly profiling e10s subprocesses, but without the full NS_FREE_PERMANENT_DATA define that may have other side effects. That's this push:

3-tier PGO with proposed bug 1555974 patch: 2.55 - 2.62, avg 2.586: https://treeherder.mozilla.org/#/jobs?repo=try&revision=76674dffb2991de1ae2af02bae40b15b20ccb24a

Using the average runtime as the benchmark, I get in order from best to worst:

1-tier base: 2.567
3-tier without NS_FREE: 2.584
3-tier shutdown with MOZ_PROF_GEN: 2.586
1-tier with bug 1557762: 2.596
3-tier base: 2.613

So I'm doubting my conclusion that this is solely due to e10s, since the average is still ~0.02 worse in 3-tier without NS_FREE_PERMANENT_DATA defined, and 3-tier is also ~0.02 worse than 1-tier with bug 1557762 included, so it seems there may be an effect both from 3-tier PGO and from e10s profiling.

Maybe I should still try to land bug 1555974? It looks like it has an effect here -- when I thought it was inconsequential earlier I was using perfherder to do comparisons, which maybe doesn't always catch regressions of this magnitude.

Any guidance or suggestions would be much appreciated!

Flags: needinfo?(mshal)

I should also note that the runtime of this benchmark has trended upward since e10s / 3-tier PGO, so comparing to recent m-c pushes isn't very helpful. Those are frequently in the 2.7+ range.

Flags: needinfo?(mconley)

Discussed during triage. :mconley, are you intending to look at this further? Thanks.

(In reply to Mike Conley (:mconley) (:⚙️) from comment #12)

Hey Gijs, can I presume that by applying a reverse of https://hg.mozilla.org/mozilla-central/rev/93ae54d47ca4, we'll be in a "before" state that we could gather a TART performance profile from?

Err, it's messy, because there are a few changes involved here and initially that commit did not cause this regression. :-(

(In reply to Michael Shal [:mshal] from comment #13)

I recreated my try pushes, but I'm not sure if I can come to the same conclusions as before. The runs vary enough that it's hard to see where an improvement or regression lies, at least to me. Maybe you will have better insight :)

So I went through this and looked specifically at the icon-open-DPI1.all.TART subtest - as I noted in comment #7, it regressed on the order of 10%, which is easier to spot than the ~1-2% this bug is about.

The regression was triggered on commit a91cc939f11246643f966e61a7bfa58d8a11a237, when 3-tier PGO was enabled. Those try pushes & numbers look like this:

1-tier PGO: 2.54 - 2.59, avg 2.567: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ac41c61e182a1504d1f36cd04ccca32923b15c06
3-tier PGO: 2.58 - 2.64, avg 2.613: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3bf3c475f1d0097b449b491775970fe9ca956b63

So the perfherder link for tart subtests is:

https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=try&newProject=try&newRevision=3bf3c475f1d0097b449b491775970fe9ca956b63&originalSignature=1907711&newSignature=1907711&framework=1&originalRevision=ac41c61e182a1504d1f36cd04ccca32923b15c06

That shows an 11.51% regression, with the "bad" case averaging 2.23, spread from 2.17 to 2.27, and the "good" case averaging 2.00, spread 1.97-2.02.

However, the e10s PGO profiling was added partly in bug 1196094 and partly in bug 1557762, while 3-tier PGO landed in between the two. 3-tier PGO had the side effect of also fixing the issue from bug 1557762, but only for 3-tier builds, while bug 1557762 itself fixed it for 1-tier builds. So it's worth looking at what 1-tier PGO looked like if bug 1557762 was included, which is this:

1-tier PGO with bug 1557762: 2.57 - 2.61, avg 2.596: https://treeherder.mozilla.org/#/jobs?repo=try&revision=cf254eaeb88d3108bea8d70ff795caeadd0d2fdd

Compared with the 1-tier base: https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=try&newProject=try&newRevision=cf254eaeb88d3108bea8d70ff795caeadd0d2fdd&originalSignature=1907711&newSignature=1907711&framework=1&originalRevision=ac41c61e182a1504d1f36cd04ccca32923b15c06

The good case is the same (same cset), the bad case averages 2.21, spread from 2.16 to 2.26. Now, I'm not great at talos tea-leaf-reading, but that looks like it's basically the same as the other "bad" case, so I think it's warranted to say that the regression is caused by adding NS_FREE_PERMANENT_DATA in order to include the child process data.

NS_FREE_PERMANENT_DATA does lots of things. We want one of those things (dumping pgo profile data when the child processes shut down) but don't want pretty much everything else it does. Considering I'm seeing it used also to determine whether/when we clear font caches, it's not inconceivable that this has an impact on how well-optimized certain codepaths are, and that in turn impacts the "finished product", as it were.

We can also try to look at 3-tier PGO without e10s profiling, which I tried by disabling the NS_FREE_PERMANENT_DATA define:

3-tier PGO without NS_FREE_PERMANENT_DATA: 2.57 - 2.6, avg 2.584: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0be0adf6c17f6242a17d6b9fa678a2862ce89e52

Comparison: https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=try&newProject=try&newRevision=0be0adf6c17f6242a17d6b9fa678a2862ce89e52&originalSignature=1907711&newSignature=1907711&framework=1&originalRevision=ac41c61e182a1504d1f36cd04ccca32923b15c06

This shows a new-case average of 2.07, spread 2.04 - 2.09. This is (significantly!) better than before, but perfherder still sees a regression. Low-confidence, it says, but given the spread is disjoint and there are 10 runs on either side of the comparison, I suspect there's something to it.

The problem is figuring out what's causing that...

However, we don't want to disable profiling e10s subprocesses, so I also tried to fix bug 1555974. This should mean we're now correctly profiling e10s subprocesses, but without the full NS_FREE_PERMANENT_DATA define that may have other side effects. That's this push:

3-tier PGO with proposed bug 1555974 patch: 2.55 - 2.62, avg 2.586: https://treeherder.mozilla.org/#/jobs?repo=try&revision=76674dffb2991de1ae2af02bae40b15b20ccb24a

Compare: https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=try&newProject=try&newRevision=76674dffb2991de1ae2af02bae40b15b20ccb24a&originalSignature=1907711&newSignature=1907711&framework=1&originalRevision=ac41c61e182a1504d1f36cd04ccca32923b15c06

This shows a new-case average of 2.08, spread 2.03-2.11. Again, better than before but still a regression.

Using the average runtime as the benchmark, I get in order from best to worst:

1-tier base: 2.567
3-tier without NS_FREE: 2.584
3-tier shutdown with MOZ_PROF_GEN: 2.586
1-tier with bug 1557762: 2.596
3-tier base: 2.613

So I'm doubting my conclusion that this is solely due to e10s, since the average is still ~0.02 worse in 3-tier without NS_FREE_PERMANENT_DATA defined, and 3-tier is also ~0.02 worse than 1-tier with bug 1557762 included, so it seems there may be an effect both from 3-tier PGO and from e10s profiling.

Maybe I should still try to land bug 1555974? It looks like it has an effect here -- when I thought it was inconsequential earlier I was using perfherder to do comparisons, which maybe doesn't always catch regressions of this magnitude.

I think this would definitely help. It may be that performance profiles (ie gecko/fx profiler ones, not pgo clang profiles) from these csets would help pinpoint what's happening for the remainder of the change. I assume you've verified that we get roughly the same ~800mb profile data as we do with NS_FREE ?

Otherwise, my remaining suggestion would be to have your patch still call a "full" exit in the PGO generate case. Ideally, we should QuickExit even in the profile generate case, but force clang to dump its profile data before doing so. I don't know if there's a way to force it to do so from the code... it seems :froydnj is on PTO so I'm unsure who else to ask at the moment.

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(mshal)

Thanks for the thorough analysis, Gijs! I agree we should try to get bug 1555974 landed, that looks like it'll help.

Flags: needinfo?(mconley)

Looks like we are shipping 70 with this regression. May still be possible to fix it for 71/72 though.

(In reply to :Gijs (he/him) from comment #16)

I think this would definitely help. It may be that performance profiles (ie gecko/fx profiler ones, not pgo clang profiles) from these csets would help pinpoint what's happening for the remainder of the change. I assume you've verified that we get roughly the same ~800mb profile data as we do with NS_FREE ?

Hmm, the file size looks roughly the same, but unfortunately the number of *.profraw files drops from 64 to 16 when I remove NS_FREE_PERMANENT_DATA, and my proposed fix also has 16 files. So I guess that means it's not profiling e10s properly? I'm currently trying to add MOZ_PROFILE_GENERATE checks to other NS_FREE_PERMANENT_DATA lines that appear to be checking when to shut things down (rather than free data).
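
Roughly, the kind of guard I mean looks like this (a hypothetical sketch with stand-in function names, not the actual patch):

    #include <stdlib.h>

    // Hypothetical stand-ins for the real shutdown paths that the
    // NS_FREE_PERMANENT_DATA checks guard (e.g. around ProcessChild::QuickExit()
    // and KillHard()).
    static void ShutDownNormally() { exit(0); }           // orderly exit, atexit() handlers run
    static void QuickExitWithoutCleanup() { _exit(0); }   // fast exit, no cleanup

    void MaybeFastShutdown() {
    #if defined(NS_FREE_PERMANENT_DATA) || defined(MOZ_PROFILE_GENERATE)
      // Leak-checking builds and instrumented (profile-generate) builds both need
      // the orderly path so each process flushes its profile data at exit.
      ShutDownNormally();
    #else
      // Production behavior: skip shutdown work and exit immediately.
      QuickExitWithoutCleanup();
    #endif
    }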

(In reply to Michael Shal [:mshal] from comment #20)

(In reply to :Gijs (he/him) from comment #16)

I think this would definitely help. It may be that performance profiles (ie gecko/fx profiler ones, not pgo clang profiles) from these csets would help pinpoint what's happening for the remainder of the change. I assume you've verified that we get roughly the same ~800mb profile data as we do with NS_FREE ?

Hmm, the file size looks roughly the same, but unfortunately the number of *.profraw files drops from 64 to 16 when I remove NS_FREE_PERMANENT_DATA, and my proposed fix also has 16 files. So I guess that means it's not profiling e10s properly?

Maybe, yeah... IIRC clang has some tools to dump the data that's in these files, could be useful to do that and grep for something like InitChildProcess before/after ( https://searchfox.org/mozilla-central/rev/05a22d864814cb1e4352faa4004e1f975c7d2eb9/toolkit/xre/Bootstrap.cpp#65 )

I'm currently trying to add MOZ_PROFILE_GENERATE checks to other NS_FREE_PERMANENT_DATA lines that appear to be checking when to shut things down (rather than free data).

Have you verified that MOZ_PROFILE_GENERATE is defined correctly?

(In reply to :Gijs (he/him) from comment #21)

(In reply to Michael Shal [:mshal] from comment #20)

(In reply to :Gijs (he/him) from comment #16)

I think this would definitely help. It may be that performance profiles (ie gecko/fx profiler ones, not pgo clang profiles) from these csets would help pinpoint what's happening for the remainder of the change. I assume you've verified that we get roughly the same ~800mb profile data as we do with NS_FREE ?

Hmm, the file size looks roughly the same, but unfortunately the number of *.profraw files drops from 64 to 16 when I remove NS_FREE_PERMANENT_DATA, and my proposed fix also has 16 files. So I guess that means it's not profiling e10s properly?

Maybe, yeah... IIRC clang has some tools to dump the data that's in these files, could be useful to do that and grep for something like InitChildProcess before/after ( https://searchfox.org/mozilla-central/rev/05a22d864814cb1e4352faa4004e1f975c7d2eb9/toolkit/xre/Bootstrap.cpp#65 )

Good idea; it looks like if I use 'llvm-profdata show -counts -all-functions', I can see a difference when e10s is used:

Block counts: [8, 2, 0, 0, 0, 0, 10, 0, 10, 10, 10, 0, 0, 0, 10, 10, 0, 10, 10, 10, 10, 0, 10, 10, 0, 0, 0, 10, 0, 0, 0, 0, 0, 0, 0, 8, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

and when it's not:

Block counts: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Unfortunately my proposed fix comes up blank (all 0's) as well, so it seems that's not working as intended.

I checked all uses of NS_FREE_PERMANENT_DATA in the code and added MOZ_PROFILE_GENERATE checks to the ones that either call ProcessChild::QuickExit() or KillHard(). Additionally, I made it so ProcessWatcher::EnsureProcessTerminated() doesn't force the shutdown. This appears to make e10s profiling work without all of NS_FREE_PERMANENT_DATA being activated, and I now get the full 68 *.profraw files and see a populated Block counts line. But this results in similar perf regressions:

https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=try&newProject=try&newRevision=a2cdbbe21c91efd8ceac9f9b87ab39ea73176829&originalSignature=1907711&newSignature=1907711&framework=1&originalRevision=ac41c61e182a1504d1f36cd04ccca32923b15c06

So it seems to me that my original attempt to fix bug 1555974 was not in fact doing so, and when I do it properly now (I think?), it doesn't address the perf regression. :Gijs, any suggestions on where to go from here? Can you take a peek at these patches and see if there's anything that looks obviously wrong?

https://hg.mozilla.org/try/rev/1e8c5148978f44c4a9aa9562853f60c276bae202
https://hg.mozilla.org/try/rev/0641e8015cdc56efd5af802a7bf7c15dc1032cac
https://hg.mozilla.org/try/rev/8fd597ce70f3bac10025818709778e4826dfdb10
https://hg.mozilla.org/try/rev/a83c536df2e6f235c44107f219530acebe2f5156

I'm currently trying to add MOZ_PROFILE_GENERATE checks to other NS_FREE_PERMANENT_DATA lines that appear to be checking when to shut things down (rather than free data).

Have you verified that MOZ_PROFILE_GENERATE is defined correctly?

Good question, it looks like MOZ_PROFILE_GENERATE is set in the defines in config.status on the instrumented build (due to --enable-profile-generate being used), so it should be set in mozilla-config.h and able to be consumed globally.

Flags: needinfo?(mshal) → needinfo?(gijskruitbosch+bugs)

(In reply to Michael Shal [:mshal] from comment #22)

I checked all uses of NS_FREE_PERMANENT_DATA in the code and added MOZ_PROFILE_GENERATE checks to the ones that either call ProcessChild::QuickExit() or KillHard(). Additionally, I made it so ProcessWatcher::EnsureProcessTerminated() doesn't force the shutdown. This appears to make e10s profiling work without all of NS_FREE_PERMANENT_DATA being activated, and I now get the full 68 *.profraw files and see a populated Block counts line. But this results in similar perf regressions:

https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=try&newProject=try&newRevision=a2cdbbe21c91efd8ceac9f9b87ab39ea73176829&originalSignature=1907711&newSignature=1907711&framework=1&originalRevision=ac41c61e182a1504d1f36cd04ccca32923b15c06

So it seems to me that my original attempt to fix bug 1555974 was not in fact doing so, and when I do it properly now (I think?), it doesn't address the perf regression. :Gijs, any suggestions on where to go from here? Can you take a peek at these patches and see if there's anything that looks obviously wrong?

So I think we have 2 ways of digging in here:

  • see if, instead of avoiding the quickexit calls, we can make quickexit and any other manual _exit() calls be preceded by a call to __llvm_profile_write_file (should be forward-declared) when using MOZ_PROFILE_GENERATE, which per docs at https://github.com/llvm-mirror/clang/blob/master/docs/SourceBasedCodeCoverage.rst#using-the-profiling-runtime-without-static-initializers should force clang to write out the profiling files. I haven't tested this and I know very little about clang, and I think the docs are for a slightly different way of profiling (-fprofile-instr-generate vs. -fprofile-generate), but the documentation for these args is identical, so unsure if there is a difference and/or if there's a different function you'd want...
  • use gecko profiles of the before/after states to figure out more about what exactly is slower here, to see if that can give us clues about why e10s-ness is making a difference here...

Does that help?

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(mshal)

(In reply to :Gijs (he/him) from comment #23)

  • see if, instead of avoiding the quickexit calls, we can make quickexit and any other manual _exit() calls be preceded by a call to __llvm_profile_write_file (should be forward-declared) when using MOZ_PROFILE_GENERATE, which per docs at https://github.com/llvm-mirror/clang/blob/master/docs/SourceBasedCodeCoverage.rst#using-the-profiling-runtime-without-static-initializers should force clang to write out the profiling files. I haven't tested this and I know very little about clang, and I think the docs are for a slightly different way of profiling (-fprofile-instr-generate vs. -fprofile-generate), but the documentation for these args is identical, so unsure if there is a difference and/or if there's a different function you'd want...

We do something like this for Android, because the standard Android process shutdown doesn't go through a normal exit and skips the profile write: https://searchfox.org/mozilla-central/rev/17756e2a5c180d980a4b08d99f8cc0c97290ae8d/toolkit/xre/Bootstrap.cpp#89

I tried that out here by adding the __llvm_profile_dump() call to ipc/glue/ProcessChild.cpp in QuickExit()[1]. I also had to skip calling KillHard() in a few places [2] and not force process shutdowns [3], otherwise the profile data would be truncated and the merge step would fail.

[1] https://hg.mozilla.org/try/rev/6c889e919a1787b49f9c209686bffdb4e0d60567
[2] https://hg.mozilla.org/try/rev/45ff1d5874409914721fd0502e3b7d09cca690e0
[3] https://hg.mozilla.org/try/rev/5748d44392352414cb439e85a76c719c95105eaf
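
For reference, the shape of the QuickExit() change in [1] is roughly the following (a sketch, not the exact patch; it assumes the instrumented build links the compiler-rt profile runtime via -fprofile-generate):

    #include <stdlib.h>

    #ifdef MOZ_PROFILE_GENERATE
    // Provided by the compiler-rt profile runtime in instrumented builds.
    extern "C" int __llvm_profile_dump(void);
    #endif

    void QuickExit() {
    #ifdef MOZ_PROFILE_GENERATE
      // Write this process's *.profraw now, because _exit() below skips the
      // atexit() handler that would normally flush the counters.
      __llvm_profile_dump();
    #endif
      _exit(0);
    }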

Unfortunately the results are still bad: https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=try&newProject=try&newRevision=009b711db09eb1910e52f33799b4ee36b13e1d85&originalSignature=1907711&newSignature=1907711&framework=1&originalRevision=ac41c61e182a1504d1f36cd04ccca32923b15c06

  • use gecko profiles of the before/after states to figure out more about what exactly is slower here, to see if that can give us clues about why e10s-ness is making a difference here...

This sounds like a reasonable next step. Can one of you provide some guidance on what functions I should start looking at for, say, the icon-open-DPI1.all.TART or newtab-open-preload-no.all.TART tests? I do see a bunch of differences with 'llvm-profdata show -all-functions -counts', including some cases where there are hits in the original 1-tier PGO build but not after e10s was enabled (which is odd, as I would expect that we'd just have more data, not less). However, I don't know if they're in relevant parts of the code.

Flags: needinfo?(mshal)
Flags: needinfo?(mconley)
Flags: needinfo?(gijskruitbosch+bugs)

(In reply to Michael Shal [:mshal] from comment #24)

(In reply to :Gijs (he/him) from comment #23)

  • use gecko profiles of the before/after states to figure out more about what exactly is slower here, to see if that can give us clues about why e10s-ness is making a difference here...

This sounds like a reasonable next step. Can one of you provide some guidance on what functions I should start looking at for, say, the icon-open-DPI1.all.TART or newtab-open-preload-no.all.TART tests? I do see a bunch of differences with 'llvm-profdata show -all-functions -counts', including some cases where there are hits in the original 1-tier PGO build but not after e10s was enabled (which is odd, as I would expect that we'd just have more data, not less). However, I don't know if they're in relevant parts of the code.

If you use --gecko-profile with ./mach try fuzzy, you can push to try and have it generate gecko profiles for any talos jobs it runs (I don't see profiling talos jobs in the retrigger/new task interface, only raptor ones, or I would have used that...). I was planning to just do this for you, but unfortunately it seems ./mach try fuzzy is broken for me on the base rev which you're using (which is now quite old). :-(

In theory, diffing the gecko profiles that those jobs output (either comparing manually or using https://profiler.firefox.com/compare/ ) might help narrow down what the difference is between the two situations.

I would have done the same from a more recent revision, but per comment #14 I don't know if this would get us anything useful...

In fact, having looked at that myself now, it looks like after bug 1586232 these numbers have dropped off a cliff (in a good way) anyway, so I'm not sure how useful it is to keep investigating this. In principle, it'd be satisfying to understand why this happened, but I'm not sure we can get any measurable improvement on some of the metrics that regressed here given how low they now are...
I'll leave the ni for Mike to see what he thinks.

Flags: needinfo?(gijskruitbosch+bugs)

It sounds like this may have to be something we accept, given that backing this out is not what we would want.

We'll continue tracking for 71 for now, but I'd like to get Mike's take on whether we should just close this off.

(In reply to :Gijs (he/him) from comment #25)

In fact, having looked at that myself now, it looks like after bug 1586232 these numbers have dropped off a cliff (in a good way) anyway, so I'm not sure how useful it is to keep investigating this. In principle, it'd be satisfying to understand why this happened, but I'm not sure we can get any measurable improvement on some of the metrics that regressed here given how low they now are...
I'll leave the ni for Mike to see what he thinks.

To be fair, after writing this comment I realized that bug 1586232 is presumably gated on the megabar work and so won't affect 71 anymore as/when it goes to beta...

Catching up on this bug, we seem to have hit the point of diminishing returns. I suspect we should close this as WONTFIX. I don't think it's worth burning more time on 2.44% on TART, given that we have the megabar coming down the pike.

Feel free to re-open if anyone disagrees.

Status: NEW → RESOLVED
Closed: 5 years ago
Flags: needinfo?(mconley)
Resolution: --- → WONTFIX

(In reply to :Gijs (he/him) from comment #25)

If you use --gecko-profile with ./mach try fuzzy, you can push to try and have it generate gecko profiles for any talos jobs it runs (I don't see profiling talos jobs in the retrigger/new task interface, only raptor ones, or I would have used that...). I was planning to just do this for you, but unfortunately it seems ./mach try fuzzy is broken for me on the base rev which you're using (which is now quite old). :-(

In theory, diffing the gecko profiles that those jobs output (either comparing manually or using https://profiler.firefox.com/compare/ ) might help narrow down what the difference is between the two situations.

FWIW I managed to get some try pushes on the old revs today with --gecko-profile (I had to apply fixes for bug 1575014 and bug 1585283)

1-tier base: https://treeherder.mozilla.org/#/jobs?repo=try&revision=59f43c3eec9e5ad536f844ce469f20f282976c3e&selectedJob=271775279
3-tier with attempted fix: https://treeherder.mozilla.org/#/jobs?repo=try&revision=cb0ba750ffa333fea206940a66bed48130f326bc&selectedJob=271780001

I tried loading them up in that profiler.firefox.com/compare/ link but it complains about an "Unexpected data source". Both profiles load up fine individually, though. If you wanted to take a quick look I'd be curious if you notice anything, but I'm also happy to just drop this.

Has Regression Range: --- → yes
Performance Impact: --- → ?
Whiteboard: [qf:tracking70]