Closed Bug 1150215 Opened 10 years ago Closed 10 years ago

Add devtools test for talos

Categories

(Testing :: Talos, defect)

Type: defect
Priority: Not set
Severity: normal
Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla40
Tracking Status: firefox40 --- fixed

People

(Reporter: bgrins, Assigned: bgrins)

Attachments

(2 files, 6 obsolete files)

We should add some performance testing for devtools to measure improvements and regressions. This would be a page-load-style test, most likely similar to tart, which can run as a separate addon.

Some ideas of things that could be measured, starting with how quickly various devtools operations happen on a simple page:

* Time until toolbox ready when opening on the console panel
* Time until toolbox ready when opening on the inspector panel
* Time until toolbox ready when opening on the debugger panel
* Toolbox open and close
* etc.

It could possibly include even more specific things:

* Inspector time to update after 1000 mutations
* Console rendering speed for 1000 log messages
* Debugger time to load on a page with 1000 scripts
* etc.
There is an old bug where msucan and jmaher were discussing this, but I can't seem to find it.
Attached patch talos-devtools-WIP-1.patch (obsolete) — Splinter Review
Stashing a first WIP of the Devtools At Maximum Performance (DAMP) test. Here's a summary of what's going on:

It's similar to TART in that it creates an addon, and the manifest file just loads up a single test runner page where you can change options and choose which individual tests to run.

What it does, for each of ["webconsole", "inspector", "jsdebugger", "styleeditor", "netmonitor"]:

1) Open a new tab on a relatively simple page and wait for the load event
2) Measure the time from gDevTools.openToolbox until the toolbox is initialized
3) Measure the time from gDevTools.closeToolbox until the toolbox is fully destroyed
4) Close the tab

So it loops through and does this N times, then reports the results (a rough sketch of the timing step follows below). We could also measure the time it takes to re-open the tools on a single tab (which might give more consistent times, depending on how long the initial connection process takes), but time to open on a fresh tab is probably the more important metric to track.

It would be cool to also load this up with a much more complicated page to see how the toolbox fares with more work to do -- maybe a page from tp5 or something we come up with ourselves.
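(For reference, a minimal sketch of the timing in steps 2 and 3, assuming the chrome-side devtools framework APIs of the time. The comment above says gDevTools.openToolbox; gDevTools.showToolbox plus TargetFactory.forTab is assumed below, and modern async syntax is used for brevity even though the actual addon predates it.)

    // Sketch only: measure how long one panel takes to open and close.
    // `tab` is an already-loaded browser tab; `panelId` is e.g. "webconsole".
    async function timeOpenClose(tab, panelId) {
      const target = devtools.TargetFactory.forTab(tab);

      let start = performance.now();
      const toolbox = await gDevTools.showToolbox(target, panelId);
      const openMs = performance.now() - start;   // toolbox initialized

      start = performance.now();
      await toolbox.destroy();
      const closeMs = performance.now() - start;  // toolbox fully destroyed

      return { openMs, closeMs };
    }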
Assignee: nobody → bgrinstead
Status: NEW → ASSIGNED
I think the tool-specific tests are really useful, especially when they are potentially very common workflows or known bottlenecks in the tool.

In addition to what bgrins listed above:

* Inspector time to update after 1000 mutations
* Console rendering speed for 1000 log messages
* Debugger time to load on a page with 1000 scripts

Some other things that measure tool performance, and also how the tools affect the performance of content:

* Time for the profiler to start/stop (on subsequent profiles this seems way longer, since it has to parse the previous profile's data)
* Page load with the netmonitor open (there was an email about this slowing down pages)
* Overhead of running a profile on content, with or without memory/allocations on
I don't know how hard it is to display/measure more than just one number/type in talos, but I really think it would be helpful to also write down the memory usage while running all these scenarios.

You can retrieve a meaningful number on Linux with this code, which works in both parent and child processes:

    Components.classes["@mozilla.org/memory-reporter-manager;1"]
      .getService(Components.interfaces.nsIMemoryReporterManager)
      .residentUnique

Also I'm wondering: here you are writing yet another kind of test script. But isn't there a way to push such numbers from an existing test harness, like luciddream/mochitests? I imagine that's more a question for the ateam than devtools... but that would be really, really handy! Pushing such markers from luciddream would allow checking performance over various and very different environments/usages.
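(residentUnique is indeed a real attribute on nsIMemoryReporterManager. As a sketch of how per-scenario sampling could work, with withMemorySample being a made-up helper rather than anything from the patch, and chrome-privileged code only:)

    // Sketch: report the resident-unique memory delta around a scenario.
    // This samples only the current process; with e10s the child process
    // would need its own sample.
    var mgr = Components.classes["@mozilla.org/memory-reporter-manager;1"]
      .getService(Components.interfaces.nsIMemoryReporterManager);

    function withMemorySample(label, scenario) {
      var before = mgr.residentUnique;
      return Promise.resolve(scenario()).then(function (result) {
        var deltaMiB = (mgr.residentUnique - before) / (1024 * 1024);
        dump(label + ": residentUnique delta " + deltaMiB.toFixed(1) + " MiB\n");
        return result;
      });
    }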
Attached patch talos-devtools.patch (obsolete) — Splinter Review
Joel, here is a WIP of the devtools talos test. I'd like to make sure things seem generally sane before proceeding much further. I believe you mentioned that there is a way to run this on the try servers to see if it works in that environment. Can you help me get that set up?
Attachment #8588803 - Attachment is obsolete: true
Flags: needinfo?(jmaher)
Here is some example output that the test generates when I run it locally (via `talos -n -d --develop --executablePath PATH --activeTests damp`):

------- Summary: start -------
Number of tests: 24
[#0] simple.webconsole.open.DAMP Cycles:25 Average:223.91 Median:208.21 stddev:67.06 (32.2%) stddev-sans-first:26.23
Values: 521.3 193.4 205.9 212.3 194.9 187.9 192.8 206.2 303.1 203.7 190.2 231.5 208.2 228.8 188.6 189.4 210.5 212.8 222.4 234.3 236.8 208.6 242.3 192.6 179.4
[#1] simple.webconsole.close.DAMP Cycles:25 Average:22.44 Median:22.44 stddev:4.39 (19.6%) stddev-sans-first:3.66
Values: 34.6 16.4 16.7 22.8 20.5 30.7 23.4 24.5 29.0 23.8 23.3 18.9 23.8 18.8 27.6 21.9 20.6 23.2 17.3 19.9 19.7 23.1 19.4 22.4 18.7
[#2] simple.inspector.open.DAMP Cycles:25 Average:314.67 Median:313.09 stddev:23.04 (7.4%) stddev-sans-first:18.36
Values: 383.8 292.5 293.8 317.0 298.7 313.9 320.1 321.4 350.5 333.1 313.1 289.9 305.6 344.3 297.0 325.9 286.9 334.9 283.8 312.8 323.5 300.2 310.2 322.5 291.5
[#3] simple.inspector.close.DAMP Cycles:25 Average:56.79 Median:54.36 stddev:13.07 (24.0%) stddev-sans-first:13.20
Values: 47.6 37.2 37.6 66.2 66.1 49.3 59.8 52.6 68.8 56.4 41.0 82.8 48.6 54.4 54.0 47.6 42.0 67.2 82.7 72.8 73.2 45.5 55.9 62.8 47.8
[#4] simple.jsdebugger.open.DAMP Cycles:25 Average:279.14 Median:274.97 stddev:30.82 (11.2%) stddev-sans-first:25.43
Values: 366.3 265.1 283.6 363.8 275.0 280.5 259.4 268.8 296.6 285.9 276.3 260.3 314.8 276.8 257.5 237.8 263.7 266.1 250.3 249.6 268.0 289.1 252.5 287.3 283.3
[#5] simple.jsdebugger.close.DAMP Cycles:25 Average:28.07 Median:26.64 stddev:6.84 (25.7%) stddev-sans-first:6.87
Values: 22.1 24.0 22.3 53.8 26.5 24.5 23.1 26.6 31.2 32.0 25.1 33.3 36.5 29.7 29.7 24.4 29.6 20.8 28.9 31.9 29.2 28.4 22.9 19.5 25.5
[#6] simple.styleeditor.open.DAMP Cycles:25 Average:175.80 Median:170.74 stddev:27.13 (15.9%) stddev-sans-first:27.60
Values: 164.1 145.0 211.8 205.6 177.4 142.8 150.2 174.4 218.0 180.5 246.0 157.3 154.7 175.1 145.5 151.8 231.4 163.7 160.3 181.9 170.6 175.0 170.7 167.2 173.9
[#7] simple.styleeditor.close.DAMP Cycles:25 Average:32.04 Median:40.32 stddev:18.39 (45.6%) stddev-sans-first:18.68
Values: 41.6 47.8 15.7 11.4 47.2 45.6 34.4 12.6 17.5 49.0 13.4 50.9 66.5 64.4 9.9 21.9 15.3 54.3 9.4 43.2 35.3 12.6 27.9 12.6 40.3
[#8] simple.performance.open.DAMP Cycles:25 Average:145.38 Median:140.61 stddev:15.81 (11.2%) stddev-sans-first:11.98
Values: 196.3 153.8 139.4 153.6 122.2 156.5 149.7 127.0 155.2 134.6 138.8 144.5 158.2 140.6 144.2 164.5 139.2 139.9 167.8 136.5 131.6 136.4 125.5 141.4 137.0
[#9] simple.performance.close.DAMP Cycles:25 Average:22.85 Median:18.41 stddev:10.72 (58.2%) stddev-sans-first:10.90
Values: 27.6 15.3 18.7 20.7 18.3 14.3 18.4 28.2 23.1 14.7 17.7 14.7 22.7 15.4 49.5 40.4 18.2 18.1 51.8 16.5 10.3 24.9 17.3 19.3 35.0
[#10] simple.netmonitor.open.DAMP Cycles:25 Average:225.51 Median:227.48 stddev:35.58 (15.6%) stddev-sans-first:36.26
Values: 214.2 298.4 222.2 218.1 270.5 242.6 240.3 227.5 238.3 240.6 226.3 261.9 266.1 183.0 164.7 143.7 233.1 247.9 166.4 257.3 222.9 227.7 219.4 185.3 219.4
[#11] simple.netmonitor.close.DAMP Cycles:25 Average:39.84 Median:36.99 stddev:9.85 (26.6%) stddev-sans-first:9.92
Values: 31.7 58.2 33.2 46.0 36.0 41.1 29.5 37.5 31.5 37.0 31.7 37.2 30.7 29.7 48.9 68.0 48.9 35.6 49.7 33.9 41.5 43.1 30.7 51.3 33.4
[#12] complicated.webconsole.open.DAMP Cycles:25 Average:220.47 Median:221.53 stddev:19.47 (8.8%) stddev-sans-first:17.59
Values: 176.9 187.8 221.5 207.5 224.1 202.0 224.6 251.9 203.6 207.5 223.7 242.7 237.1 218.2 209.9 232.1 261.7 217.6 221.7 235.1 204.2 225.4 248.5 210.9 215.4
[#13] complicated.webconsole.close.DAMP Cycles:25 Average:27.04 Median:26.52 stddev:6.05 (22.8%) stddev-sans-first:6.18
Values: 26.9 21.4 20.5 22.2 32.7 32.9 41.1 30.9 21.3 22.9 22.4 30.2 38.5 29.4 20.8 23.0 26.3 28.5 33.5 30.2 19.6 24.9 26.5 17.8 32.2
[#14] complicated.inspector.open.DAMP Cycles:25 Average:386.93 Median:379.21 stddev:35.19 (9.3%) stddev-sans-first:35.75
Values: 369.6 386.5 391.2 368.1 349.5 383.5 392.2 443.9 493.4 394.9 367.2 397.5 392.2 358.9 367.1 345.1 416.0 448.5 371.0 372.1 361.8 379.2 352.9 421.0 349.9
[#15] complicated.inspector.close.DAMP Cycles:25 Average:78.74 Median:59.69 stddev:52.28 (87.6%) stddev-sans-first:53.03
Values: 49.1 59.7 58.2 61.5 43.2 69.4 83.1 76.0 300.0 42.1 52.5 65.8 150.2 103.3 110.0 65.2 66.9 70.8 47.0 77.3 43.0 93.3 72.0 42.0 67.0
[#16] complicated.jsdebugger.open.DAMP Cycles:25 Average:354.26 Median:338.79 stddev:97.77 (28.9%) stddev-sans-first:98.55
Values: 278.2 269.8 387.9 248.5 484.0 434.4 369.5 257.8 288.5 351.1 358.8 330.2 278.2 245.4 241.3 436.2 522.1 494.0 324.1 603.1 355.7 239.2 308.5 338.8 411.2
[#17] complicated.jsdebugger.close.DAMP Cycles:25 Average:89.02 Median:47.54 stddev:71.83 (151.1%) stddev-sans-first:72.82
Values: 46.7 43.4 183.9 46.5 163.2 229.0 243.7 76.8 48.2 224.8 223.1 50.8 49.2 51.5 57.9 48.1 51.6 42.2 47.0 48.9 47.5 48.0 48.2 59.6 45.7
[#18] complicated.styleeditor.open.DAMP Cycles:25 Average:483.89 Median:484.73 stddev:25.82 (5.3%) stddev-sans-first:23.92
Values: 536.1 468.0 503.1 522.1 469.2 487.4 501.2 463.1 451.4 461.9 485.6 473.8 517.3 508.5 461.1 501.7 464.4 484.7 485.3 434.4 511.3 477.6 444.8 472.1 511.2
[#19] complicated.styleeditor.close.DAMP Cycles:25 Average:45.52 Median:44.97 stddev:7.51 (16.7%) stddev-sans-first:7.66
Values: 43.4 35.4 48.9 36.1 49.9 41.7 48.8 36.0 47.0 42.8 48.3 42.4 37.8 64.1 35.7 39.4 38.6 45.0 50.1 50.4 50.3 49.2 59.2 41.4 56.2
[#20] complicated.performance.open.DAMP Cycles:25 Average:181.44 Median:181.90 stddev:13.95 (7.7%) stddev-sans-first:14.22
Values: 186.0 194.8 208.8 170.9 182.5 185.6 188.9 167.8 172.9 186.7 166.2 181.9 169.1 170.2 217.2 179.3 171.5 180.6 169.6 197.5 187.9 183.1 180.9 184.7 151.4
[#21] complicated.performance.close.DAMP Cycles:25 Average:22.64 Median:23.53 stddev:8.67 (36.8%) stddev-sans-first:8.48
Values: 34.6 29.2 23.5 19.7 12.4 29.0 27.9 20.5 15.6 28.1 33.8 16.4 22.1 14.1 19.2 9.7 33.2 15.3 26.8 28.5 12.6 13.7 11.9 25.0 43.0
[#22] complicated.netmonitor.open.DAMP Cycles:25 Average:217.64 Median:218.27 stddev:17.76 (8.1%) stddev-sans-first:15.23
Values: 263.9 221.7 208.0 205.8 236.9 201.2 226.5 195.7 232.0 219.2 214.5 229.1 241.8 231.2 191.7 203.5 195.9 191.5 211.2 231.8 227.1 218.3 216.2 198.1 228.3
[#23] complicated.netmonitor.close.DAMP Cycles:25 Average:35.45 Median:33.73 stddev:7.45 (22.1%) stddev-sans-first:6.57
Values: 53.6 30.8 34.8 34.4 41.2 29.9 40.2 29.1 44.1 37.3 28.1 32.9 31.8 35.7 33.7 33.7 33.8 32.6 32.4 37.9 34.9 29.3 57.7 32.3 24.0
-------- Summary: end --------
(In reply to Alexandre Poirot [:ochameau] from comment #5)
> I don't know how hard it is to display/measure more than just one
> number/type in talos,
> but I really think it would be helpful to also write down the memory usage
> while running all these scenarios.
>
> You can retrieve a meaningful number on linux with this code, which works
> both in parent and child processes:
>
> Components.classes["@mozilla.org/memory-reporter-manager;1"].
> getService(Components.interfaces.nsIMemoryReporterManager).residentUnique

I believe that each test suite should be focused on a single metric (time or memory), but that's more of a Joel question.

> Also I'm wondering... here you are writing down yet another kind of test
> scripts.
> But isn't there a way to push such numbers from other existing test harness,
> like luciddream/mochitests?
> I imagine that's more a question for ateam than devtools... but that would
> be really, really handy!
> Pushing such markers from luciddream would allow checking performances over
> various and very different environment/usages.

Yes, that would be very nice
(In reply to Jordan Santell [:jsantell] [@jsantell] from comment #4)
> I think the tool specific tests are really useful, especially when they are
> potentially very common workflows or known bottlenecks in the tool
>
> In addition to what bgrins listed above:
>
> * Inspector time to update after 1000 mutations
> * Console rendering speed for 1000 log messages
> * Debugger time to load on page with 1000 scripts
>
> Some other things that measure tool performance, and also how the tools
> affect performance of content
>
> * Time for profiler to start/stop (and on subsequent profiles, seems way
> longer since it has to parse the previous profiles data)
> * Page load with netmonitor open (was an email about this slowing down pages)
> * Overhead of running a profile on content, with or without
> memory/allocations on

My thinking is that these tool-specific tests would be a separate suite (although probably sharing code in a way similar to how tart/cart do). This way we could keep the measurements and alerts isolated from each other. For example, if we regressed the toolbox startup time but at the same time landed something that sped the console logging time way up, it could end up being a net improvement even though we missed a regression.

From what I understand, the ability to get alerts and track results on individual tests may be upcoming. If that's the case, then I'd be happy to do everything in a single suite. Another Joel question :)
Attached patch talos-devtools.patch (obsolete) — Splinter Review
Setting the options in test.py to more closely match tart
Attachment #8589377 - Attachment is obsolete: true
Attached patch talos-devtools.patch (obsolete) — Splinter Review
Worked with jmaher to figure out why it wasn't running on try servers. This one also includes extra measurements for the time to reload the page with the tools open.
Attachment #8590984 - Attachment is obsolete: true
With the latest patches we are running on try; a few tweaks via IRC chat, but we are getting there.
Flags: needinfo?(jmaher)
Depends on: 1153886
Depends on: 1153903
I'd like a "Step into" performance test - say, load a known script with a debugger statement, send "Step into" commands as fast as possible for, say, one minute and count how many commands ran..
(In reply to Hallvord R. M. Steen [:hallvors] from comment #13)
> I'd like a "Step into" performance test - say, load a known script with a
> debugger statement, send "Step into" commands as fast as possible for, say,
> one minute and count how many commands ran..

This would be a great addition to some of Jordan's ideas in Comment 4. I see those 'task specific' tests as being part of a second suite done in a follow-up to this (damp2?). The reasoning being that (a) they are probably much longer-running, so their results would mask results for basic toolbox open/close, and (b) toolbox open/close time is important to get in ASAP so we can start tracking performance in the short term.
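(For the record, such a throughput test might be structured roughly like the sketch below. stepIn and waitForPause are hypothetical helpers wrapping whatever the debugger client API of the day exposes; nothing here reflects an actual patch.)

    // Sketch: count completed "step into" commands in one minute.
    // stepIn() and waitForPause() are hypothetical wrappers around the
    // debugger client, which varies by Firefox version.
    async function measureStepRate(threadClient) {
      const deadline = Date.now() + 60 * 1000;
      let steps = 0;
      while (Date.now() < deadline) {
        await stepIn(threadClient);        // request a single step
        await waitForPause(threadClient);  // resolves when the thread re-pauses
        steps++;
      }
      return steps; // report steps-per-minute; higher is better
    }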
Attached patch talos-devtools.patch (obsolete) — Splinter Review
A much smaller patch, since I've removed the custom complicated page and am using bild.de from tp5 instead. I've also removed the ASAP mode junk from test.py since I don't think we actually needed it - I had just added it to try and debug why runs weren't working (which just turned out to be because of a bad URL).
Attachment #8591069 - Attachment is obsolete: true
Simple generator to create heavy scripts for performance tests - might be useful here: http://hallvord.com/temp/moz/scriptgen.htm

The heaviest script I've tested with had settings depth: 7, width: 5; it was more than 30 million characters. Gecko itself ran it without problems on my machine, but the debugger was struggling badly, with several "slow script" dialogs when first opened and a total hang on reload. I guess some issues regarding that should be reported separately, but slightly smaller generated scripts might be useful for the work going on in this bug.
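(scriptgen.htm's actual algorithm isn't described here; purely as an illustration of the depth/width idea, a generator could look like this sketch.)

    // Sketch: emit a script where each function defines `width` children
    // down to `depth` levels and sums their results. depth:7/width:5 yields
    // on the order of 100k function definitions.
    function generateScript(depth, width, name) {
      name = name || "f";
      if (depth === 0) {
        return "function " + name + "() { return 1; }\n";
      }
      var src = "";
      var calls = [];
      for (var i = 0; i < width; i++) {
        var child = name + "_" + i;
        src += generateScript(depth - 1, width, child);
        calls.push(child + "()");
      }
      src += "function " + name + "() { return " + calls.join(" + ") + "; }\n";
      return src;
    }

    // e.g. generateScript(7, 5) + "f();" produces a multi-megabyte test script.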
Brian, should we spin the stuff in comment 4 and my suggestions off to one or more separate bugs?
Flags: needinfo?(bgrinstead)
Blocks: 1154874
(In reply to Hallvord R. M. Steen [:hallvors] from comment #17) > Brian, should we spin the stuff in comment 4 and my suggestions off to one > or more separate bugs? Yes, let's do that - I filed Bug 1154874
Flags: needinfo?(bgrinstead)
I started a simple section on the wiki about this test: https://wiki.mozilla.org/Buildbot/Talos/Tests#DAMP
Joel, not sure who the right person to review this is; feel free to redirect. I can also ask a devtools peer to review parts of it if necessary.

Here are two specific things I'm still unsure about with the patch:

1) All the settings in the damp class in test.py. I mostly copied this from tart, but I'm not sure if they make sense in this case (especially tpmozafterpaint, sps_profile_interval, sps_profile_entries, and the lack of ASAP mode).

2) I've been sort of winging it as I go as far as what measurements to take in damp.js. The main flow is (see the sketch below):

for (webconsole, inspector, jsdebugger, etc. as PANEL) {
  Open new tab and switch to it once load event fires
  Wait 100ms
  Time opening for PANEL
  Time page reload
  Time closing for PANEL
  Close tab
}

2a) I'm not sure if we should be waiting after the tab has loaded before proceeding. If so, should it be more/less than 100ms?

2b) Timing the page reload is handy because we could track things like Bug 1143224, where we cause jank during the page load, but it could also be improved/regressed by non-devtools changes (like if the page load time itself changes). Not sure if this is a problem.
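(For concreteness, a sketch of what that loop could look like in damp.js terms. openTab, reloadPage, and closeTab are hypothetical helpers, gDevTools.showToolbox and TargetFactory.forTab are assumed from the devtools framework of the time, and the real patch reports through talos rather than returning an array.)

    // Sketch of one full measurement pass; helpers are hypothetical.
    const PANELS = ["webconsole", "inspector", "jsdebugger",
                    "styleeditor", "netmonitor"];

    async function runPass(url) {
      const results = [];
      for (const panel of PANELS) {
        const tab = await openTab(url);               // waits for load event
        await new Promise(r => setTimeout(r, 100));   // the 100ms settle delay

        const target = devtools.TargetFactory.forTab(tab);
        let t = performance.now();
        const toolbox = await gDevTools.showToolbox(target, panel);
        results.push([panel + ".open", performance.now() - t]);

        t = performance.now();
        await reloadPage(tab);                        // reload with tools open
        results.push([panel + ".reload", performance.now() - t]);

        t = performance.now();
        await toolbox.destroy();
        results.push([panel + ".close", performance.now() - t]);

        await closeTab(tab);
      }
      return results;
    }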
Attachment #8592576 - Attachment is obsolete: true
Attachment #8594162 - Flags: review?(jmaher)
Comment on attachment 8594162 [details] [diff] [review]
talos-devtools.patch

Review of attachment 8594162 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for writing this!

Right now we are hovering around 15 minutes for this to run (probably 12 minutes if we remove the overhead of machine setup). Do we see this growing in the near future? I am tempted to create a new jobtype for this.

Also do we expect this to behave similarly on all platforms? Any platforms we should ignore?
Attachment #8594162 - Flags: review?(jmaher) → review+
(In reply to Joel Maher (:jmaher) from comment #21) > I am tempted to create a new jobtype for this. Just remember we moved to a new generic job naming, the first of which was g1, so a new job should be g2 ;)
(In reply to Joel Maher (:jmaher) from comment #21)
> Comment on attachment 8594162 [details] [diff] [review]
> talos-devtools.patch
>
> Review of attachment 8594162 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> Thanks for writing this!
>
> Right now we are hovering around 15 minutes for this to run (probably 12
> minutes if we remove the overhead of machine setup). Do we see this growing
> in the near future? I am tempted to create a new jobtype for this.

Right now I can't think of anything else we would want to add in a basic toolbox test suite, but it is possible I guess. I filed Bug 1154874 to separate the requests for adding more extensive task-specific tests in another suite, although I'd be interested in your feedback about that idea. If you think it'd be better to lump those into damp, then it would definitely grow.

> Also do we expect this to behave similarly on all platforms? Any platforms
> we should ignore?

Just clicking through the results on https://treeherder.mozilla.org/#/jobs?repo=try&revision=49bb456add97&exclusion_profile=false&filter-searchStr=chromez, it looks like they are all at least a little different. I'd feel better about making that decision after we have some data from it running for a while.
I recommend landing it for now; we will need to create a 'g2' job which can run this and a future second test if there is a need for it. I see it working like so:

* land on talos (you can do this)
* add the suite definition to https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos.json, and update talos.json to have the latest revision and suite
* update buildbot-configs to be aware of this new test; when it deploys (firefox 40) then we will have talos and related configs available
Attached patch talos-definition.patch (obsolete) — Splinter Review
> * add the suite definition to https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos.json, and update talos.json to have the latest revision and suite

I don't know if this is right, it's just a guess. Let me know if other changes are needed.
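(For context, a talos.json suite entry of that era looked roughly like the sketch below. The "g2" name follows the naming suggested above, and the revision is a placeholder; this is a reconstruction from memory, not the actual patch contents.)

    {
      "global": {
        "talos_revision": "<latest talos revision>"
      },
      "suites": {
        "g2": {
          "tests": ["damp"]
        }
      }
    }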
Attachment #8595405 - Flags: review?(jmaher)
Comment on attachment 8595405 [details] [diff] [review]
talos-definition.patch

Review of attachment 8595405 [details] [diff] [review]:
-----------------------------------------------------------------

everything else looks swell.

::: testing/talos/talos.json
@@ +1,3 @@
> {
>   "talos.zip": {
> +   "url": "http://talos-bundles.pvt.build.mozilla.org/zips/talos.85d4f8ef4810.zip",

we can actually leave this alone; this is specifically for Android. Also, your baseline is out of date (I landed a talos update this morning, so talos_revision below will bitrot).
Attachment #8595405 - Flags: review?(jmaher) → review+
Rebased on top of mozilla-inbound and removed the talos.zip change
Attachment #8595405 - Attachment is obsolete: true
Attachment #8595489 - Flags: review+
Attachment #8594162 - Flags: checkin+
Component: Developer Tools: Framework → Talos
Flags: checkin+
Product: Firefox → Testing
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
just need to actually schedule this stuff!
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 1157793
Depends on: 1158197
(In reply to Joel Maher (:jmaher) from comment #31) > just need to actually schedule this stuff! Anything else need to be done for this before we mark this one as resolved?
Flags: needinfo?(jmaher)
this is great, let's close it out! Tests are running and reporting just fine.
Status: REOPENED → RESOLVED
Closed: 10 years ago
Flags: needinfo?(jmaher)
Resolution: --- → FIXED
Blocks: 1239422