1230215 - 3% Linux/Win damp regression on Fx-Team (v.45) on Dec 02, 2015 from push 15522bc2931a52ec435f22e233f747264b90f647

Reporter

Description

•

9 years ago

Talos has detected a Firefox performance regression from your commit 15522bc2931a52ec435f22e233f747264b90f647 in bug 1215959. We need you to address this regression. This is a list of all known regressions and improvements related to your bug: http://alertmanager.allizom.org:8080/alerts.html?rev=15522bc2931a52ec435f22e233f747264b90f647&showAll=1 On the page above you can see Talos alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format. To learn more about the regressing test, please see: https://wiki.mozilla.org/Buildbot/Talos/Tests#DAMP Reproducing and debugging the regression: If you would like to re-run this Talos test on a potential fix, use try with the following syntax: try: -b o -p linux64 -u none -t g2 # add "mozharness: --spsProfile" to generate profile data To run the test locally and do a more in-depth investigation, first set up a local Talos environment: https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code Then run the following command from the directory where you set up Talos: talos --develop -e <path>/firefox -a damp Making a decision: As the patch author we need your feedback to help us handle this regression. Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 1

•

9 years ago

a compare view of all affected platforms/tests: https://treeherder.allizom.org/perf.html#/compare?originalProject=fx-team&originalRevision=15522bc2931a&newProject=fx-team&newRevision=0e47cb064701&filter=damp&showOnlyConfident=1&showUnreliablePlatforms=1 looking in details (linux64 e10s for example): https://treeherder.allizom.org/perf.html#/comparesubtest?originalProject=fx-team&originalRevision=15522bc2931a&newProject=fx-team&newRevision=0e47cb064701&originalSignature=13dd9b8ab5c9d40a946e35a9dc085970caebce48&newSignature=13dd9b8ab5c9d40a946e35a9dc085970caebce48 you can see the big offenders are: damp complicated.jsdebugger.reload.DAMP opt e10s damp simple.jsdebugger.close.DAMP opt e10s damp simple.jsdebugger.reload.DAMP opt e10s damp simple.netmonitor.close.DAMP opt e10s There is a good win: damp simple.jsdebugger.open.DAMP opt e10s the rest of the data in the compare view is more than likely attributed to noise. :jlong, can you take a look at this and weigh in on why we might be seeing these changes as a result of your patch: https://hg.mozilla.org/integration/fx-team/rev/0e47cb064701?

Flags: needinfo?(jlong)

James Long (:jlongster)

Assignee

Comment 2

•

9 years ago

That change is a big refactoring of the debugger, so I'm not surprised there might be negative effects. I think I can solve the `reload` case, there's actually a very small section of code that affects it greatly and it can be simplified/removed (leftover code from before). I'm not sure about the `close` case. I can take a look though. Hardest part about this is to figure how to reproduce it locally and get familiar with damp. I've had limited experience with it.

Flags: needinfo?(jlong)

James Long (:jlongster)

Assignee

Updated

•

9 years ago

Assignee: nobody → jlong

James Long (:jlongster)

Assignee

Comment 3

•

9 years ago

Oh I think I actually know why `close` is taking longer. We actually wait for the debugger to finish async requests now, while before we didn't. The problem with not waiting was that async requests would come in after the UI is destroyed, and it would error. Previously there were literally just errors. So we sacrificed performance for correctness. I developed a different way to handle this though later in the refactoring. Instead of waiting for them to finish, when they come in we just ignore them. I think I can remove the code that waits for the requests to finish now. I'll look into it.

Brian Grinstead [:bgrins]

Comment 4

•

9 years ago

(In reply to James Long (:jlongster) from comment #3) > Oh I think I actually know why `close` is taking longer. We actually wait > for the debugger to finish async requests now, while before we didn't. The > problem with not waiting was that async requests would come in after the UI > is destroyed, and it would error. Previously there were literally just > errors. So we sacrificed performance for correctness. > > I developed a different way to handle this though later in the refactoring. > Instead of waiting for them to finish, when they come in we just ignore them. > > I think I can remove the code that waits for the requests to finish now. > I'll look into it. Waiting for async requests to finish when it wasn't before seems like a reasonable reason to regress close time. With your other way of handling this, does it work properly when a toolbox is opened/closed a bunch of times rapidly so that there are lots of pending requests?

James Long (:jlongster)

Assignee

Comment 5

•

9 years ago

(In reply to Brian Grinstead [:bgrins] from comment #4) > > Waiting for async requests to finish when it wasn't before seems like a > reasonable reason to regress close time. With your other way of handling > this, does it work properly when a toolbox is opened/closed a bunch of times > rapidly so that there are lots of pending requests? Yep, those are different instances of the debugger so it wouldn't even know about previous async requests. Reloading really fast is actually a harder scenario because it is the same instance, but previous async requests may "bleed through". My solution even works there, though. In fact that's the very use case which made me solve this. See this code: https://github.com/mozilla/gecko-dev/blob/master/devtools/client/debugger/debugger-controller.js#L191-L206 Basically, every async request has an ID, and I keep a list of "open requests" as a simple array of IDs. That array is reset to empty whenever the page reloads or devtools is closed. When future requests finish it'll come through and be ignore because it's not a "current request". I had to do this because you can't block a page navigation, while you can block the debugger shutdown. But it's actually a better solution because we don't have to block on it (we don't really care about the results anyway)

James Long (:jlongster)

Assignee

Comment 6

•

9 years ago

Here's a push without the code to wait on async requests: https://treeherder.mozilla.org/#/jobs?repo=try&revision=10053727eabb I seem to remember trying this before but still ran into some edge cases. We'll see what comes up.

James Long (:jlongster)

Assignee

Comment 7

•

9 years ago

Need to do this before implementing an optimization

Depends on: 1230299

James Long (:jlongster)

Assignee

Comment 8

•

9 years ago

Awesome, according to the try run in comment 6 I can just remove the code the blocks shutdown on async requests because we just ignore them now. I'm going to split off work into individual bugs which are now blocking this bug.

Depends on: 1230556

James Long (:jlongster)

Assignee

Comment 9

•

9 years ago

Here's a try run with patches from both bugs and `-t g2` so we can see if it fixes this: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3cd0dd87d907 Talos: https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=3cd0dd87d907 I'll need some help understanding the data. Should we see a 3-5% improvement now by comparing it with mozilla-central, because it's comparing it with my slower code?

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 10

•

9 years ago

I retriggered jobs on the try run, found the base revision on fx-team and retriggered the g2 jobs there- in due time we will have enough data to have more confidence in what comes out: https://treeherder.mozilla.org/perf.html#/compare?originalProject=fx-team&originalRevision=c9c2015c63c5&newProject=try&newRevision=3cd0dd87d907&filter=damp&showOnlyConfident=1 to answer your question- yes, we should see an improvement on the damp test with your patch! Lets see if that does the trick

James Long (:jlongster)

Assignee

Comment 11

•

9 years ago

Thanks Joel - looks like I need to fix some tests with those patches, so I'll have to make another try run.

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 12

•

9 years ago

if you can keep the parent rev the same, it would be easy to just switch the compare view to point to the newRevision

James Long (:jlongster)

Assignee

Comment 13

•

9 years ago

Ok this should have the same parent rev, tests pass locally and I'm pretty sure this should work: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e292c2b07e38

James Long (:jlongster)

Assignee

Comment 14

•

9 years ago

I'm probably going to go ahead and land those 2 bugs tonight regardless; they clean up the code and implement optimizations I wanted to make anyway.

James Long (:jlongster)

Assignee

Comment 15

•

9 years ago

Unfortunately it looks like all those tests runs had a bad parent commit. So we're going to have to change the parent rev. https://treeherder.mozilla.org/#/jobs?repo=try&revision=17d448ab398e https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=17d448ab398e

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 16

•

9 years ago

I am not seeing much of an improvement here after retriggers; https://treeherder.mozilla.org/perf.html#/compare?originalProject=fx-team&originalRevision=b1821f8cf261&newProject=try&newRevision=17d448ab398e&filter=damp&showOnlyImportant=0 either way, I think we have cleaned up what we can. This sounds like a good cleanup and there might not be much more that can realistically be done.

James Long (:jlongster)

Assignee

Comment 17

•

9 years ago

Ok. Looking at the original benchmarks, it looks like it's a consistent 3-5ms difference. It's really cool that we can detect that! But it's also a really small change, so I agree that we are ok to ignore it. I am surprised that avoiding blocking on pending async requests during shutdown didn't change anything, but maybe the way the DAMP tests run there aren't any async requests going on anyway. The only other thing I can think of is that we are loading more JS files (some bigger ones like React) on startup. I wouldn't be surprised if that added 3-5ms. But that shouldn't affect reload or shutdown. We're going to be making more changes to the debugger so we can profile startup/reload at some point and see if there's anything we can do to optimize it in general.

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 18

•

9 years ago

I would be fine closing this as wontfix or other topics. I prefer to see some form of resolution or a rough plan (we need to do X in 2 months, etc.) That is my preference, this is your bug :) Thanks again for looking into this!

James Long (:jlongster)

Assignee

Comment 19

•

9 years ago

Ok, let's keep it for now. It's a bit hard to give a plan because the debugger is in a state of flux right now, and we are going to continue doing some major frontend refactors. Every pass of refactoring it's easier to rationalize about and profile and change the code to be fast. Note that all of these numbers are about to change quite a bit. Bug 1132501 will attach the thread when the toolbox opens, not when the debugger is initialized, so the debugger open/reload events should be faster but the toolbox opening is going to take a hit. We've actually already don a ton of damp runs in that bug and it seems like opening the toolbox will be ~30ms slower, but it's a sorely needed architectural piece and we should have been doing it like that anyway. That 30ms just moves from the debugger opening to the toolbox opening.

Status: NEW → RESOLVED

Closed: 9 years ago

Resolution: --- → WONTFIX

BMO Automation

Updated

•

7 years ago

Product: Firefox → DevTools

Bugzilla

3% Linux/Win damp regression on Fx-Team (v.45) on Dec 02, 2015 from push 15522bc2931a52ec435f22e233f747264b90f647

Categories

(DevTools :: Debugger, defect)

Tracking

(Not tracked)

People

(Reporter: jmaher, Assigned: jlong)

References

Details

(Keywords: perf, regression, Whiteboard: [talos_regression][e10s])

Crash Data

Security

(public)

User Story

Description

Comment 1

Comment 2

Updated

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Comment 12

Comment 13

Comment 14

Comment 15

Comment 16

Comment 17

Comment 18

Comment 19

Updated