Bug 1162622: Investigate slowdown in mochitest-devtools-chrome-1 since late April
Opened 10 years ago
Closed 10 years ago
Status: RESOLVED FIXED
Categories: DevTools :: General, defect
Tracking: firefox41 fixed
Target Milestone: Firefox 41
Reporter: jgriffin; Assigned: terrence
Attachments: 5 files
The mochitest-devtools-chrome-1 job has been taking increasing amounts of time to run on linux32 debug since around April 27.
This manifests as "command timed out: 4800 seconds elapsed" errors in Treeherder, logged in bug 1137757.
We should use ActiveData to determine whether this increase in runtime is sudden or gradual, and whether it is across the board or affects particular tests more than others.
Kyle, can you generate data for runtimes over time of individual tests in mochitest-devtools-chrome-1 on linux32 debug from about April 21 through May 4, so we can see if there are trends that would point us in the direction of a cause?
Reporter
Updated•10 years ago
Flags: needinfo?(klahnakoski)
Comment 1•10 years ago
Yes, this can be done, but it will take a few days: the cluster collapsed under the weight of the data.
Comment 2•10 years ago
The new cluster is rebuilding, and the most recent days are still missing.
Here is the query I am using to do the pull. The results must be imported, normalized, put into a pivot table, and charted (a sketch of that step follows the query below); I will do this once the rest of the days get indexed.
> {
> "from":"unittest",
> "select":{
> "name":"duration",
> "value":{"sub":["result.end_time","result.start_time"]},
> "aggregate":"min"
> },
> "edges":[
> "result.test",
> {
> "value":"build.date",
> "domain":{
> "type":"time",
> "min":"{{20 APR 2015}}",
> "max":"{{8 MAY 2015}}",
> "interval":"day"
> }
> }
> ],
> "where":{"and":[
> {"eq":{
> "build.branch":"mozilla-inbound",
> "run.suite":"mochitest-devtools-chrome",
> "run.chunk":1,
> "build.platform":"linux",
> "build.type":"debug"
> }},
> {
> "gte":{"build.date":"{{20 APR 2015}}"},
> "lt":{"build.date":"{{8 MAY 2015}}"}
> }
> ]},
> "limit":100000
> }
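For illustration, here is a minimal pandas sketch of the import/pivot/chart step described above, assuming the query results have been exported to a CSV with columns test, date, and duration (the file name, column names, and the choice of pandas/matplotlib are assumptions, not necessarily what Kyle used):

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical export of the ActiveData results: one row per (test, day) pair.
rows = pd.read_csv("dtc1_linux32_debug.csv", parse_dates=["date"])

# One row per test, one column per day, minimum duration in each cell,
# mirroring "aggregate": "min" in the query above.
pivot = rows.pivot_table(index="test", columns="date",
                         values="duration", aggfunc="min")

# Chart a handful of tests to look for a step in the trend.
pivot.head(10).T.plot(figsize=(12, 6), legend=False,
                      title="Per-test duration by day")
plt.ylabel("duration")
plt.show()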
Comment 3•10 years ago
I thought we stored result.duration? Is there some reason we expect that to be unreliable in this case?
Comment 4•10 years ago
Do we? Oh dear, it looks like you are correct. `result.duration` will be a better measure.
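For reference, a minimal sketch of the corrected select clause, written here as a Python dict (the rest of the query is unchanged; this is the form the later win7 query in comment 15 uses):

# Corrected "select": use the stored duration field directly instead of
# subtracting result.start_time from result.end_time.
corrected_select = {
    "name": "duration",
    "value": "result.duration",
    "aggregate": "min",
}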
Reporter
Comment 5•10 years ago
Just eyeballing the data on Treeherder, it looks like a significant regression occurred in the Apr 23-25 timeframe. Average runtimes for the job went from about 70 minutes to about 80 minutes.
Comment 6•10 years ago
Here is the PNG of the slowdown. A couple of days' data is still being indexed, but we can clearly see the slowdown is almost everywhere and impacts all five of the main directories: debugger, commandline, animationinspector, canvasdebugger, and framework. I will zoom in on the day with the problem.
Comment 7•10 years ago
Looking at just one test, browser/devtools/debugger/test/browser_dbg_variables-view-02.js, and its run duration by build.date hour, we get this chart. Since some hours did not see a build, I filled them in with the last known value; this prevents visual noise from the zeros.
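A minimal sketch of that gap-filling, assuming the hourly durations for this one test are held in a pandas Series indexed by build.date (the sample values below are made up for illustration):

import pandas as pd

# Hypothetical hourly samples for one test, with a gap between 03:00 and 06:00.
samples = pd.Series(
    [12.0, 11.5, 13.0],
    index=pd.to_datetime(["2015-04-23 01:00",
                          "2015-04-23 03:00",
                          "2015-04-23 06:00"]),
)

hourly = samples.resample("H").min()  # one bucket per hour; empty hours become NaN
filled = hourly.ffill()               # carry the last known value forward, as in the chart
print(filled)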
Comment 8•10 years ago
Listing the revisions and making a scatter plot of duration by build.date, we can see the same step. Using build.date has known ordering issues, which show up as a little bouncing before the step.
Comment 9•10 years ago
Is this across all platforms? Maybe a machine update is to blame for this? Likewise a mozharness update could cause it. Alternatively a real Firefox bug is causing it! Food for thought.
Reporter
Comment 10•10 years ago
(In reply to Joel Maher (:jmaher) from comment #9)
> is this across all platforms? Maybe a machine update is to blame for this?
> Likewise a mozharness update could cause it. Alternatively a real firefox
> bug is causing it! food for thought
There weren't any mozharness changes on m-c between Apr 17 and Apr 30.
Here are the buildbot changes in that window: https://wiki.mozilla.org/index.php?title=ReleaseEngineering/Maintenance&oldid=1070162
Here are the puppet changes: https://hg.mozilla.org/build/puppet/shortlog
Nothing jumps out as a likely cause, although I could be missing something.
If we wanted to eliminate environment changes, we could check out a test slave and run the tests with an April 20th binary and an April 30th one, and see if it can be reproduced.
Comment 11•10 years ago
Even just running locally and reproducing the regression would be pretty convincing evidence to me that it's a Gecko change.
Reporter
Comment 12•10 years ago
I've retriggered this job several times on a commit from Apr 22, at https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=ef1fb0e0911d
This should tell us whether the problem is related to something in-tree or not. If this confirms an in-tree change, we can push to try from this tip with the latest mozharness to determine whether mozharness might be to blame.
Reporter
Comment 13•10 years ago
(In reply to Jonathan Griffin (:jgriffin) from comment #12)
> I've retriggered this job several times on a commit from Apr 22, at
> https://treeherder.mozilla.org/#/jobs?repo=mozilla-
> central&revision=ef1fb0e0911d
>
> This should tell us whether the problem is related to something in-tree or
> not. If this confirms an in-tree change, we can push to try from this tip
> with the latest mozharness to determine whether mozharness might be to blame.
Retriggers are back and show the lower duration, implying that the cause is in-tree.
Reporter
Comment 14•10 years ago
I'm running a try push from Apr 22 against current mozharness, but I don't expect that to show a problem.
If not, it looks like the culprit is one of these changes, based on Kyle's charts:
Thu Apr 23 20:06:50 2015 +0000 365450b3d3ef Eitan Isaacson — Bug 1155917 - Added volume argument to nsISpeechService.speak(). r=smaug
Thu Apr 23 20:05:45 2015 +0000 95fd54df106b Terrence Cole — Bug 1156390 - Do CheckMarkedThing on internal edges as well as roots; r=sfink
Thu Apr 23 20:01:18 2015 +0000 408faa11152a Nikhil Marathe — Bug 1154494 - Hit network only once. r=baku,bkelly
Eyeballing it, it looks like there was a similar regression on Windows 7 debug. Kyle, can you run the same report for Windows 7 debug?
Flags: needinfo?(klahnakoski)
Comment 15•10 years ago
Win7 debug is not as blatantly clear, but the regression is still there. There is one test that actually improved: browser/devtools/animationinspector/test/browser_animation_playerFronts_are_refreshed.js
In this sample I only looked at browser/devtools/animationinspector/test/*; durations are normalized to help visualization, and missing hours were filled in from the previous hour with data (a sketch of the normalization follows the query below). I used the following query:
> {
> "from":"unittest",
> "select":{
> "name":"duration",
> "value":"result.duration",
> "aggregate":"min"
> },
> "edges":[
> "result.test",
> {
> "value":"build.date",
> "domain":{
> "type":"time",
> "min":"{{22 APR 2015}}",
> "max":"{{25 APR 2015}}",
> "interval":"hour"
> }
> }
> ],
> "where":{"and":[
> {"eq":{
> "build.branch":"mozilla-inbound",
> "run.suite":"mochitest-devtools-chrome",
> "run.chunk":1,
> "build.platform":"win32",
> "build.type":"debug",
> "machine.os":"win7-ix"
> }},
> {"prefix":{"result.test":"browser/devtools/animationinspector/test/"}},
> {
> "gte":{"build.date":"{{22 APR 2015}}"},
> "lt":{"build.date":"{{25 APR 2015}}"}
> }
> ]},
> "limit":100000
> }
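A minimal sketch of the normalization and hour-filling mentioned above, assuming the results have already been pivoted so rows are tests and columns are hourly build.date buckets; the exact scaling is not stated, so dividing each test by its own maximum is just one reasonable choice:

import pandas as pd

# Hypothetical pivot: rows are tests, columns are hourly build.date buckets.
pivot = pd.DataFrame(
    {"2015-04-23 10:00": [30.0, 4.0], "2015-04-23 11:00": [45.0, 3.5]},
    index=["test_a.js", "test_b.js"],
)

# Scale each test by its own maximum so differently sized tests share one chart.
normalized = pivot.div(pivot.max(axis=1), axis=0)

# Fill missing hours from the previous hour that had data.
filled = normalized.ffill(axis=1)
print(filled)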
Flags: needinfo?(klahnakoski)
Reporter
Comment 16•10 years ago
Here are the results of a bunch of retriggers around the suspect range:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=83c9e4db68b3
5563
7907
7830
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=408faa11152a
11330
7771
11137
5055
7596
7759
7605
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=95fd54df106b
11784
11757
11378
11288
13132
11955
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=365450b3d3ef
11617
11426
11102
Although there is some noise here, it looks like https://hg.mozilla.org/integration/mozilla-inbound/rev/95fd54df106b is the culprit.
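As a quick sanity check on the retrigger numbers above (their units are whatever the log reports), averaging the durations per revision shows where the step lands; a small Python sketch:

durations = {
    "83c9e4db68b3": [5563, 7907, 7830],
    "408faa11152a": [11330, 7771, 11137, 5055, 7596, 7759, 7605],
    "95fd54df106b": [11784, 11757, 11378, 11288, 13132, 11955],
    "365450b3d3ef": [11617, 11426, 11102],
}

for rev, times in durations.items():
    print(rev, round(sum(times) / len(times)))

# Averages come out near 7100, 8300, 11900, and 11400; every run at or after
# 95fd54df106b is slow, which is why it looks like the culprit despite the noise.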
Reporter
Comment 17•10 years ago
Terrence, it looks like your patch https://hg.mozilla.org/integration/mozilla-inbound/rev/95fd54df106b may have caused the devtools chrome tests to run slower, at least on debug; see above. Is this a possible side effect?
Flags: needinfo?(terrence)
Flags: needinfo?(sphink)
Assignee
Comment 18•10 years ago
Yes, my patch added some extra debugging checks, which I guess are slower than intended. I'll back out the regressing change.
Flags: needinfo?(terrence)
Flags: needinfo?(sphink)
Assignee
Comment 19•10 years ago
Here is a targeted backout that gets us back to previous behavior. We should find a faster subset of the tests to run on every internal edge.
Updated•10 years ago
Attachment #8604986 - Flags: review?(sphink) → review+
Assignee
Comment 20•10 years ago
A try run so we can test if this solves the problem: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ea61b59bfa85
Assignee
Comment 21•10 years ago
The previous try run was on a busted tip; let's try again.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8c8616116962
Comment 22•10 years ago
Assignee
Comment 23•10 years ago
Finally, the tree opened when I was looking! Sorry it took so long to land this!
Comment 24•10 years ago
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
status-firefox41: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → Firefox 41
Updated•7 years ago
Product: Firefox → DevTools