Closed Bug 1174715 Opened 9 years ago Closed 5 years ago

3-5% Linux 64/win8 damp regression on Fx-Team-Non-PGO on June 12, 2015 from push b93a5e3d522d

Categories

(DevTools :: Debugger, defect, P4)


Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: jmaher, Unassigned)

References

Details

(Keywords: perf, regression, Whiteboard: [talos_regression])

Talos has detected a Firefox performance regression from your commit b93a5e3d522d in bug 1164564.  We need you to address this regression.

This is a list of all known regressions and improvements related to your bug:
http://alertmanager.allizom.org:8080/alerts.html?rev=b93a5e3d522d&showAll=1

On the page above you can see a Talos alert for each affected platform, as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test, please see: https://wiki.mozilla.org/Buildbot/Talos/Tests#DAMP

Reproducing and debugging the regression:
If you would like to re-run this Talos test on a potential fix, use try with the following syntax:
try: -b o -p linux64 -u none -t g2  # add "mozharness: --spsProfile" to generate profile data

To run the test locally and do a more in-depth investigation, first set up a local Talos environment:
https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code

Then run the following command from the directory where you set up Talos:
talos --develop -e <path>/firefox -a damp

Making a decision:
Since you are the patch author, we need your feedback to help us handle this regression.
*** Please let us know your plans by Thursday, or the offending patch will be backed out! ***

Our wiki page outlines the common responses and expectations:
https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
I did some retriggers:
https://treeherder.mozilla.org/#/jobs?repo=fx-team&fromchange=ac0a2b395a31&tochange=a3399f552e3b&filter-searchStr=Ubuntu%20HW%2012.04%20x64%20fx-team%20talos%20g2

This patch takes the damp values from ~310 -> ~325.  Later, I see a similar regression of 4.8% on Win8 PGO (note: we don't do PGO builds as frequently).

:ejpbruel, can you take a look at this and see why this is the case?  Help us determine whether we need to live with this or, more likely, what our options are for reducing or removing this regression.
Flags: needinfo?(ejpbruel)
Joel, is there a way to determine exactly which measurements are causing the regression?  In this case, I suspect the regression is caused by jsdebugger speed, so the following probes are particularly interesting: 

simple.jsdebugger.open.DAMP, complicated.jsdebugger.open.DAMP, simple.jsdebugger.reload.DAMP, complicated.jsdebugger.reload.DAMP, simple.jsdebugger.close.DAMP, complicated.jsdebugger.close.DAMP
Flags: needinfo?(jmaher)
(In reply to Joel Maher (:jmaher) from comment #1)
> I did some retriggers:
> https://treeherder.mozilla.org/#/jobs?repo=fx-
> team&fromchange=ac0a2b395a31&tochange=a3399f552e3b&filter-
> searchStr=Ubuntu%20HW%2012.04%20x64%20fx-team%20talos%20g2
> 
> This patch takes the damp values from ~310 -> ~325.  later, I see a similar
> regression for 4.8% on Win8 PGO (note, we don't do pgo builds as frequently.
> 
> :ejpbruel, can you take a look at this and see why this is the case.  Help
> us determine if we need to live with this or more likely what our options
> are for reducing or removing this regression.

This patch tries to resolve some of the mess that is promises in the devtools code. In a nutshell: we cannot use native promises in the debugger on worker threads. We have a software implementation of promises called Promise.jsm that we use instead. In order for the server to work on both the main thread and on worker threads, the way we load this module should be the same everywhere.

On worker threads, we load Promise.jsm as a CommonJS module. We have to do this because JSMs are not available on workers. On the main thread, we pre-load Promise.jsm and then provide it as a built-in module. However, that also led to problems, because pre-loaded modules are visible to the debugger where they should not be (this is a problem because the debugger should not be able to debug itself).
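
For context, a rough sketch of the two loading paths described above; the module id and the exact calls are illustrative rather than the actual loader code:

// Main thread (before this patch): Promise.jsm is pre-loaded with the JSM
// machinery and handed to devtools modules as a built-in, so every consumer
// shares the same instance.
const { Promise } = Components.utils.import("resource://gre/modules/Promise.jsm", {});

// Worker threads: there is no JSM machinery, so the same file has to come in
// through the CommonJS loader instead (module id illustrative).
const Promise = require("Promise");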

To resolve these issues, this patch refactors Promise.jsm so that it can be loaded as a CommonJS module on both the main thread and worker threads, and removes the uses of native promises in those code paths needed by the debugger on a worker thread. The loss of preloading is likely to explain some of this performance regression, but is necessary, because of the visibility bug mentioned earlier.
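
As an aside, one generic way a single file can be consumed both through the JSM machinery and through a CommonJS loader is a small dual-export shim; this sketch shows the technique only and is not the actual Promise.jsm code:

// Illustrative dual-export pattern, not the real Promise.jsm source.
// Stand-in for the software promise implementation:
function PromiseImpl(executor) { /* ... */ }

// Export for Cu.import() consumers:
this.EXPORTED_SYMBOLS = ["Promise"];
this.Promise = PromiseImpl;

// Export for CommonJS consumers; `module` only exists under the CommonJS loader:
if (typeof module !== "undefined" && module.exports) {
  module.exports = PromiseImpl;
}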

Things are further complicated by the fact that the module loader uses Promise.jsm internally; it cannot use the same instance of Promise.jsm that is loaded as a module, because that would lead to a circular dependency: Promise.jsm would have to be loaded before the loader could load it. To avoid this, Promise.jsm is now loaded twice: once by the module loader itself, and once as a CommonJS module by other modules. I expect this explains some part of the performance regression as well.
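
To illustrate why the double load happens (the helper names below are hypothetical, not the real loader internals):

// loadPrivateCopy and lazyRequire are made-up names for this sketch.
// The loader needs a promise implementation for its own bookkeeping before
// any CommonJS module exists, so requiring "Promise" through itself would be
// circular; instead it loads a second, private copy up front.
const loaderPromise = loadPrivateCopy("resource://gre/modules/Promise.jsm");

// Hypothetical loader-internal helper that relies on that private copy:
function lazyRequire(id) {
  return loaderPromise.resolve().then(() => require(id));
}

// Ordinary devtools modules still get their own, independently loaded instance:
const Promise = require("Promise");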

We cannot simply revert this patch, because we need it in order to get worker debugging to work. However, there might be ways to mitigate some of the performance regression introduced by this patch. Here's what we could do:

- We could still pre-load Promise.jsm if only we had a way to make it invisible to the debugger. I tried to do this by loading it in a sandbox with the invisibleToDebugger flag set to true (see the sketch below). However, that did not seem to work, which is why I fell back on loading it as a CommonJS module.
- If we could pre-load Promise.jsm, we would not need to load it twice, because the JSM machinery does not depend on the module loader.
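
For reference, the first option refers to something roughly like the following; this is a sketch under assumptions (system principal, resource path, loadSubScript wiring), not the code from the attempted patch:

// Sketch only: create a global that the Debugger API cannot see and load the
// promise implementation into it.
const { Cc, Ci, Cu } = require("chrome");
const { Services } = Cu.import("resource://gre/modules/Services.jsm", {});

const systemPrincipal = Cc["@mozilla.org/systemprincipal;1"]
                          .createInstance(Ci.nsIPrincipal);
const sandbox = Cu.Sandbox(systemPrincipal, {
  invisibleToDebugger: true  // scripts in this global are hidden from the debugger
});

// Evaluate the implementation inside the invisible global (path illustrative):
Services.scriptloader.loadSubScript("resource://gre/modules/Promise.jsm", sandbox);
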
Flags: needinfo?(ejpbruel)
Do some of these options for pre-loading Promise.jsm seem reasonable to do in the coming week or two?
:ejpbruel, can you let me know a timeline when this will be completed?
Flags: needinfo?(ejpbruel)
Eddy is on PTO until Aug 31.

Joe, is there someone else who can work on this task while he is out?
Flags: needinfo?(jwalker)
I'm working on finding someone.

In the short term, unless Brian thinks otherwise, I think we risk doing more bitrot damage by backing this out than by leaving it for a while. I don't think we'll forget. Brian?
Flags: needinfo?(jwalker)
Flags: needinfo?(ejpbruel)
(In reply to Joe Walker [:jwalker] (overloaded - needinfo me or ping on irc) from comment #8)
> I'm working on finding someone.
> 
> In the short term, unless Brian thinks otherwise, I think we risk doing more
> bitrot damage by backing this out than by leaving it for a while. I don't
> think we'll forget. Brian?

I agree that backing it out at this point doesn't make sense.  Based on Comment 4, here are the suggested workarounds:

- We could still pre-load Promise.jsm if only we had a way to make it invisible to the debugger. I tried to do this by loading it in a sandbox with the invisibleToDebugger flag set to true. However, that did not seem to work, which is why I fell back on loading it as a CommonJS module.
- If we could pre-load Promise.jsm, we would not need to load it twice, because the JSM machinery does not depend on the module loader.

Sounds like the first one didn't work out, so unless we can find a way to get that working, we will need another way to pre-load Promise.jsm.
Is there anything I can do here to help out?  Otherwise, I will leave this to you guys as the domain experts!
Assignee: nobody → ejpbruel
Going to mark this as P4 for now. Sure, it's a perf regression, but it doesn't seem to affect people that much. We could try to improve performance in Promise.jsm, but I expect we will be able to replace that with native Promises within a couple of months, in which case this bug becomes moot.
Priority: -- → P4
Assignee: ejpbruel → nobody
Product: Firefox → DevTools
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX