Closed Bug 1182595 Opened 4 years ago Closed 4 years ago

Add a common SPS Profiling framework for talos tests

Categories

(Testing :: Talos, defect)

defect
Not set

Tracking

(e10s+, firefox47 fixed)

RESOLVED FIXED
mozilla47
Tracking Status
e10s + ---
firefox47 --- fixed

People

(Reporter: mconley, Unassigned)

References

Details

Attachments

(1 file, 3 obsolete files)

The pageloader extension that is used with talos communicates with the Gecko Profiler to start profiling, add markers, and to dump profiles.

Currently, that support is only for non-e10s. We should add support for multi-process profiling. We'll need that to chase down some of our talos regressions.
There's a patch for this. It would be nice if it could land:
https://hg.mozilla.org/users/mconley_mozilla.com/e10s-talos/rev/3759cad0f5eb
Flags: needinfo?(mconley)
I merged this patch in my repo but it didn't work. Maybe it needs another patch as well? It was missing the subprocess.
I had this working locally at one point, but a number of tests were hanging on try, which is why I never requested review for it - there were some cases I probably hadn't considered.

I'll see if I can figure out why this is broken now.
My current hypothesis is that the initial content process (that knows we're profiling) is being shut down, and a new one is being started which doesn't know we're profiling. Essentially, this is bug 1103094.

Let me just quickly confirm locally.
Flags: needinfo?(mconley)
Yeah, this is exactly what is happening.
Depends on: 1103094
Thanks for doing this Mike :D!
Argh, there's a separate problem here; when the test finishes, we close the last remote tab, which causes the content process to shut down - this occurs before we can get the profile out of it.
Depends on: 1193838
Comment on attachment 8699195 [details]
MozReview Request: Bug 1182595 - Add common, e10s-friendly SPS Profiler scripts that Talos tests can use. r?jmaher

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/28193/diff/1-2/
Attachment #8699195 - Attachment description: MozReview Request: Bug 1182595 - [WIP] Add common Profiler.js library that Talos test content can use. → MozReview Request: Bug 1182595 - Add common, e10s-friendly SPS Profiler scripts that Talos tests can use. r?jmaher
:mconley, is there additional work to do here?  anything you need help with?
Hey Mike, I'm trying to get this running locally but I get various errors:

first: JavaScript error: resource://gre/modules/PerformanceStats.jsm, line 215: NS_ERROR_NOT_AVAILABLE: Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsIPerformanceStatsService.isMonitoringJank]

then: 15:35:36     INFO -  PROCESS | 57289 | pageloader exception: ReferenceError: TalosParentProfiler is not defined

Is there a way to make this work? It would be a huge help to fix our perf bugs.
Flags: needinfo?(mconley)
Comment on attachment 8699195 [details]
MozReview Request: Bug 1182595 - Add common, e10s-friendly SPS Profiler scripts that Talos tests can use. r?jmaher

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/28193/diff/2-3/
Comment on attachment 8711784 [details]
MozReview Request: Bug 1182595 - Refactor tpaint to use new common Profiler scripts. r?jmaher

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/32299/diff/1-2/
Comment on attachment 8711785 [details]
MozReview Request: Bug 1182595 - Make ts_paint talos test use new TalosContentProfiler utility. r?jmaher

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/32301/diff/1-2/
Comment on attachment 8711786 [details]
MozReview Request: Modify pageloader to use TalosParentProfiler

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/32303/diff/1-2/
Thanks Mike, one small thing I had to change is that for non-e10s mode the profiler is paused before the scrolling test starts and then tscroll.js resumes it. For the e10s tscroll.js is in the child as a frame script so it cannot resume it hence, the chrome process is not getting profiled at all. As a quick hack I just resumed the parentProfiler right before we send the message to the child to start running the test. It's not perfect but did the job for me :)
Flags: needinfo?(mconley)
Attachment #8699195 - Flags: review?(jmaher)
Comment on attachment 8699195 [details]
MozReview Request: Bug 1182595 - Add common, e10s-friendly SPS Profiler scripts that Talos tests can use. r?jmaher

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/28193/diff/3-4/
Attachment #8711784 - Attachment description: MozReview Request: Refactor tpaint to use new Profiler scripts → MozReview Request: Bug 1182595 - Refactor tpaint to use new common Profiler scripts. r?jmaher
Attachment #8711784 - Flags: review?(jmaher)
Comment on attachment 8711784 [details]
MozReview Request: Bug 1182595 - Refactor tpaint to use new common Profiler scripts. r?jmaher

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/32299/diff/2-3/
Attachment #8711785 - Attachment description: MozReview Request: Make ts_paint talos test use new TalosContentProfiler utility. r?jmaher → MozReview Request: Bug 1182595 - Make ts_paint talos test use new TalosContentProfiler utility. r?jmaher
Attachment #8711785 - Flags: review?(jmaher)
Comment on attachment 8711785 [details]
MozReview Request: Bug 1182595 - Make ts_paint talos test use new TalosContentProfiler utility. r?jmaher

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/32301/diff/2-3/
Attachment #8711786 - Attachment is obsolete: true
I finally think this might be reviewable. Just going to push to try with and without profiling:

Without profiles: https://treeherder.mozilla.org/#/jobs?repo=try&revision=30170668f54f

With: https://treeherder.mozilla.org/#/jobs?repo=try&revision=97a2036998cd
the without yields ts_paint issues in both e10s and non-e10s, and the with profiles yields even more issues.  I will wait for the try results to be greener before reviewing if that is ok.
Comment on attachment 8699195 [details]
MozReview Request: Bug 1182595 - Add common, e10s-friendly SPS Profiler scripts that Talos tests can use. r?jmaher

https://reviewboard.mozilla.org/r/28193/#review33165

I like where this is going.  Looking at the integration pieces and general code it looks fine.  I have to admit the profiling code is a bit over my head.

::: testing/talos/talos/talos-powers/components/TalosPowersService.js
(Diff revision 4)
> -        break;

why are we removing sessionstore-windows-restored?  is it not used anymore?
Attachment #8699195 - Flags: review?(jmaher)
Comment on attachment 8711784 [details]
MozReview Request: Bug 1182595 - Refactor tpaint to use new common Profiler scripts. r?jmaher

https://reviewboard.mozilla.org/r/32299/#review33169

just a question, this is really simple looking.

::: testing/talos/talos/startup_test/tpaint.html:107
(Diff revision 3)
> +    TalosContentProfiler.contentMarker("tpaint-stopwatch start");

I don't understand the need for tpaint-stopwatch start/stop when we have 'start: tpaint' and 'end: tpaint' already.  is this something specific to the profiler?
Attachment #8711784 - Flags: review?(jmaher) → review+
Comment on attachment 8711785 [details]
MozReview Request: Bug 1182595 - Make ts_paint talos test use new TalosContentProfiler utility. r?jmaher

https://reviewboard.mozilla.org/r/32301/#review33171

this looks good, is there any concern with finishStartupProfile if we are not profiling?  I want to make sure this doesn't impact the test when we are not profiling.
Comment on attachment 8711785 [details]
MozReview Request: Bug 1182595 - Make ts_paint talos test use new TalosContentProfiler utility. r?jmaher

https://reviewboard.mozilla.org/r/32301/#review33173
Attachment #8711785 - Flags: review+
https://reviewboard.mozilla.org/r/28193/#review33165

> why are we removing sessionstore-windows-restored?  is it not used anymore?

Originally, we were initializing talos-powers after sessionstore-windows-restored, but we can't do that anymore because there are some tests that have content signaling for profiling before that notification has fired. Now we fire soon after the user profile has been loaded. The only concern here is that this adds a _tiny_ bit of overhead to load the frame script, which might affect some talos numbers, but if we go in aware of that, we can just write it off, right?
https://reviewboard.mozilla.org/r/28193/#review33165

> Originally, we were initializing talos-powers after sessionstore-windows-restored, but we can't do that anymore because there are some tests that have content signaling for profiling before that notification has fired. Now we fire soon after the user profile has been loaded. The only concern here is that this adds a _tiny_ bit of overhead to load the frame script, which might affect some talos numbers, but if we go in aware of that, we can just write it off, right?

yes, we can easily *rebaseline* the results, happens all the time when we understand what is the cause :)
Comment on attachment 8699195 [details]
MozReview Request: Bug 1182595 - Add common, e10s-friendly SPS Profiler scripts that Talos tests can use. r?jmaher

https://reviewboard.mozilla.org/r/28193/#review33221
Attachment #8699195 - Flags: review+
https://hg.mozilla.org/integration/fx-team/rev/ad4d966f23677027aa0bd08f195f1478dd9d6d8c
Bug 1182595 - Add common, e10s-friendly SPS Profiler scripts that Talos tests can use. r=jmaher
Summary: Make profile dumping work with e10s → Add a common SPS Profiling framework for talos tests
Comment on attachment 8711784 [details]
MozReview Request: Bug 1182595 - Refactor tpaint to use new common Profiler scripts. r?jmaher

Going to move this out to another bug. There's work to do to reduce the number of runs these tests do when SPS profiling, since otherwise we timeout while symbolicating these huge profiles.
Attachment #8711784 - Attachment is obsolete: true
Comment on attachment 8711785 [details]
MozReview Request: Bug 1182595 - Make ts_paint talos test use new TalosContentProfiler utility. r?jmaher

Going to move this out to another bug. There's work to do to reduce the number of runs these tests do when SPS profiling, since otherwise we timeout while symbolicating these huge profiles.
Attachment #8711785 - Attachment is obsolete: true
Blocks: 1251363
https://hg.mozilla.org/mozilla-central/rev/ad4d966f2367
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
You need to log in before you can comment on or make changes to this bug.