Closed Bug 1254087 Opened 7 years ago Closed 5 years ago

[e10s] regression on tp5o % Processor Time

Categories

(Testing :: Talos, defect, P3)

defect

Tracking

(e10s-)

RESOLVED WONTFIX
Tracking Status
e10s - ---

People

(Reporter: gkrizsanits, Unassigned)

References

Details

windows7-32 21.60% 	
windowsxp   16.58% 

turning off APZ: -5.90%

https://treeherder.allizom.org/perf.html#/e10s
Blocks: e10s-perf
tracking-e10s: --- → ?
Priority: -- → P4
Priority: P4 → P3
Flags: needinfo?(jgriffiths)
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #0)
> windows7-32 21.60% 	
> windowsxp   16.58% 
> 
> turning off APZ: -5.90%
> 
> https://treeherder.allizom.org/perf.html#/e10s

I need some prose that explains this bug for me to be able to evaluate it. Right now it seems like something that should block.
Flags: needinfo?(jgriffiths) → needinfo?(gkrizsanits)
(In reply to Jeff Griffiths (:canuckistani) (:⚡︎) from comment #1)
> I need some prose that explains this bug for me to be able to evaluate it.
> Right now it seems like something that should block.

The consensus was during the meeting that increased CPU time with the content process was expected and it's within the acceptance range (if I understood it correctly...).

Anyway, tp5o tests the time it takes Firefox to load the tp5 web pages multiple times and while doing it we measure the CPU usage as well with a windows API (this test is windows only). This test does not run in ASAP mode so it should be quite similar to what a regular user would experience while loading these pages I think.

That being said I tried to look into this probe but could not figure out how it's working. My guess is that it's using PdhAddCounter windows API to measure the CPU usage, what's not clear if it's measuring the child+parent process or only the parent. The later would be more worry some... Joel, do you know how this probe works exactly? Do we measure content process + parent process in the e10s case?

I would personally care less about this bug if that extra CPU usage came with shorter load times on the tp5 pages...
Flags: needinfo?(gkrizsanits) → needinfo?(jmaher)
in talos for the counters, we assume a child process is named 'plugin-container'.  And we try to collect the counters for the main process (which we launch) and the child process.  These collected values are added together and reported.

What I don't know is if we really collect counters for the child process or if plugin-container is really the one and only child process.  Can we confirm process names are only firefox.exe and plugin-container?  We could push to try with a modified talos to print out status of the child process counters:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d8e329cbe267
Flags: needinfo?(jmaher)
(In reply to Joel Maher (:jmaher) from comment #3)
> in talos for the counters, we assume a child process is named
> 'plugin-container'.

That used to be the case up until very recently, we're just changing that in bug 1114647.

> What I don't know is if we really collect counters for the child process or
> if plugin-container is really the one and only child process.  Can we
> confirm process names are only firefox.exe and plugin-container?  We could
> push to try with a modified talos to print out status of the child process
> counters:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=d8e329cbe267

If we collect counter for child process, with bug 1114647 landed we should see a change (and update the test accordingly).
Depends on: e10s-rename
that will be nice if we see the change.  Assuming we are collecting parent+child %cpu, would there be any other concerns with this e10s difference?
Eh, sorry, the patch in bug 1114647 already taking care of the renaming in Talos so it won't show any difference. Anyway, I will test this locally, but it very much looks like we're doing the right thing here and measuring the combined CPU usage for both processes.
% processor time seems to work for both firefox.exe and plugin-container.  The one counter that doesn't exist is 'modified page list bytes'- that seems to fail for plugin-container.
Joel, is there follow up work here related to talos results now that the parent bug is fixed?
No longer blocks: e10s-perf
Component: General → Talos
Flags: needinfo?(jmaher)
Product: Core → Testing
:jimm, I was under the impression that the bug changing the process names would fix talos, the original patches did that.  Can I get a list of processes to expect and which scenarios (platforms/buildtypes) to expect them?
Flags: needinfo?(jmaher)
We're using firefox.exe for content processes on Windows. AFAICT we didn't change anything on mac or linux. Jld can confirm this.
Flags: needinfo?(jld)
Yes, Windows content processes are now firefox.exe and not plugin-container.exe when the parent process is firefox.exe.  However, plugins (NPAPI and GMP) are still plugin-container, and content processes created by xpcshell (or other not-the-browser-app embedders, if any) are also plugin-container; I don't know if Talos needs to deal with those cases.

And, for completeness: Linux might be changing at some point in the future; Mac will stay as plugin-container; and other (non-Tier-1) Unixes that support e10s are as described above for Windows.


Shall I file a bug to add basic self-checks, like whether the content process performance counters aren't all 0, to the automated tests?  If I'd had a way to reproduce this locally and/or on Try, or even confirm that it was in fact broken, I would've tried to fix it before landing instead of resorting to the “land it and see what breaks” approach.
Flags: needinfo?(jld)
I believe the answer here is that we need to collect counters from all firefox.exe processes, not just the one.  We add counters here:
https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/cmanager_win32.py#109

I believe in _addCounters we can ensure we match on all process names.  I would need to look up the windows api for counters to ensure we can do this properly.

if I get time this week, I will hack on this, otherwise next week is reasonable.
a lot has changed since this was last worked on.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.