1250169 - RSS collection in tp5 should collect data from all processes, especially content process in e10s mode

Reporter

Description

•

8 years ago

Currently we measure only the main process RSS, but in e10s mode there are both main and content processes.

We need to sort out the other counters as well.

William Lachance (:wlach)

Comment 1

•

8 years ago

Is tp5rss still relevant with AWSY and friends now reporting to perfherder? Just asking, not implying.

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 2

•

8 years ago

This is a good time to evaluate what is different. I would be happy to remove memory counters that are duplicated in AWSY, as long as we have e10s/non-e10s data on all platforms.

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 3

•

8 years ago

:erahm, do you have thoughts here?

Flags: needinfo?(erahm)

Eric Rahm [:erahm]

Comment 4

•

8 years ago

(In reply to Joel Maher (:jmaher) from comment #3)
> :erahm, do you have thoughts here?

It depends on what tp5rss does. If we're talking about the tp5 test described on the wiki [1], it sounds like it's measuring something different than what AWSY does. The description for RSS and Working Set indicates they're sampled every 20s, so that could be useful for detecting transient memory spikes (well, maybe better than AWSY at least).

AWSY opens 100 tp5 pages in 30 tabs, waiting 10 seconds per tab, then closes the tabs. It does this 5 times and makes several final memory measurements afterwards (immediately, after 30 seconds, and after forcing garbage collection). We then close the tabs and remeasure. We also make an effort to simulate user action to trigger other heuristic-based garbage collection (such as compacting GC).

It should also be noted we're not sending e10s data to treeherder yet. We do have support, but I'm not sure how we want to report that data.

Do we want to separate the data? This gets tricky with multiple content processes (their PID isn't deterministic), and do we want the RSS or USS of the content processes?

Should it be one combined metric as I've been doing for e10s memory analysis? For that case I do |total_memory = RSS(parent) + SUM(USS(children))|.
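The combined metric above can be sketched as follows. This is only a minimal illustration, not the AWSY or Talos implementation; the helper name `combined_memory` and the byte counts are hypothetical.

```python
def combined_memory(parent_rss, children_uss):
    """Return RSS(parent) + SUM(USS(children)), all in bytes.

    parent_rss:   resident set size of the parent (main) process.
    children_uss: iterable of unique set sizes, one per content process.
    Hypothetical helper for illustration; not part of Talos or AWSY.
    """
    return parent_rss + sum(children_uss)


# Made-up numbers: a 300 MiB parent and two content processes.
MiB = 1024 * 1024
total = combined_memory(300 * MiB, [120 * MiB, 80 * MiB])
print(total // MiB)  # 500
```

Because the result is a plain sum over the children, it does not depend on which PID belongs to which content process, and using USS for the children avoids counting pages shared with the parent more than once.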

[1] https://wiki.mozilla.org/Buildbot/Talos/Tests#tp5

Flags: needinfo?(erahm)

Eric Rahm [:erahm]

Comment 5

•

8 years ago

Oh right, also AWSY is only on Linux currently. It works on other platforms but we only have one test machine.

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 6

•

8 years ago

For Talos we only collect counters for tp5 runs, so we are effectively measuring the same thing. We cover e10s, non-e10s, opt, pgo, and linux|win|osx. Maybe we should compare data for Linux, where we think there is duplication, and either stop collecting memory for Linux or accept the duplication for the time being.

Another thing is that we could modify talos to collect similar numbers to AWSY until we get more os/config coverage on AWSY.

I think we need to answer:
* what test makes sense to record memory (tp5 is different than AWSY, but similar)
* what specifically do we want to collect (RSS from the process or system polling)

Answering those should help us determine which system to use and what to record.

Regarding the mention of collecting data on a timer from the OS vs collecting RSS from the process after each pageload, we only report the average data to perfherder and don't store the entire collection.  We had looked at this about 6 months ago and determined that there was not a simple way to determine what was useful other than the average value.
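As a small illustration of what reporting only the average can hide, here is a sketch with made-up samples. The helper name `summarize` and the values are hypothetical; this is not the Talos code.

```python
from statistics import mean


def summarize(samples):
    """Return (average, peak) for a list of periodic memory samples."""
    return mean(samples), max(samples)


# Made-up RSS samples in MiB, with one short spike in the middle.
samples = [200, 205, 210, 600, 208, 207]
average, peak = summarize(samples)
print(round(average), peak)  # 272 600
```

In this made-up run the average moves by roughly 70 MiB while the peak is nearly three times the steady-state value, which is the kind of transient spike a timer-based sample could catch.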

Jim Mathies [:jimm]

Updated

•

8 years ago

Blocks: e10s-tests

tracking-e10s: --- → +

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 7

•

8 years ago

After further examination, it looks as if we take child processes into account for Linux, as well as 'plugin-container' for Windows. Possibly we can close this as worksforme?

Chris H-C :chutten

Comment 8

•

8 years ago

Interesting behaviour: Private Bytes is being reported twice as often for tp5o-e10s than normal tp5. A modified cmanager_linux.py printing out the pidlist shows that the single, low result at the beginning of the test is from before the second process starts, but all the others come from managers managing two processes: http://mozilla-releng-blobs.s3.amazonaws.com/blobs/Try/sha512/656163f081fd2bc74cfd3d8ad31ef75fee529bd3f9503ae3661dd93032860dccbadb84aa78fad5f5aefe62aa64789eb3801d659b75b193b4eb866976ceda988a

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 9

•

8 years ago

I cannot decipher the log file or the comment. Is this what we expected from the IRC discussion?

Chris H-C :chutten

Comment 10

•

8 years ago

Sorry about that.

The log is showing that, as expected, the first sample is from before the second process spins up. 

The manager was modified to output the sample and the pidlist at the same time. Unfortunately, from just that I can only confirm what we already suspected about the one single-process sample. I still don't know the reason for the extra samples.

Eric Rahm [:erahm]

Comment 11

•

8 years ago

Can I get some clarification on what "private bytes" means in this context? Or can you just point me to the code doing the measurements? Are we summing the USS of each process or is it RSS or is it a combo?

Flags: needinfo?(chutten)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 12

•

8 years ago

private bytes is outlined here:
https://bugzilla.mozilla.org/show_bug.cgi?id=1253984#c2

For Windows we add a counter for each process and then sum those counters:
https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/cmanager_win32.py#224

I am not sure if those pdh counters relate to USS or RSS.

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 13

•

8 years ago

Odd though: we started this bug talking about RSS, then turned it into private bytes.

Right now RSS is collected from inside the browser:
https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/pageloader/chrome/memory.js#66

Chris H-C :chutten

Comment 14

•

8 years ago

I blame the :jmaher of 8 days ago for directing me to post here the partial results of my Private Bytes investigation :)

Maybe the Private Bytes counter needs its own bug?

Flags: needinfo?(chutten)

Eric Rahm [:erahm]

Comment 15

•

8 years ago

(In reply to Joel Maher (:jmaher) from comment #13)
> odd though, we started this bug talking about RSS, then turned it into
> private bytes.
> 
> right now RSS is collected from inside the browser:
> https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/
> pageloader/chrome/memory.js#66

Okay, so memory.js is broken. I'm having a hard time understanding the intent of it, but no matter what the intent is, it's not accomplishing its goals. It kind of stumbles into doing the right thing for the single-process case (if the right thing is to print the RSS of the main process).

Let's take a step back and discuss this measurement. Even if it did the "right thing" and measured the RSS of every process, I'm not sure that's a terribly useful measurement. Summing the RSS of the parent and the USS of the children is probably more useful.

I believe we do that in another measurement (someone with more Talos knowledge can confirm that). If that's the case, maybe we should just drop this measurement or keep it intentionally main-process only.

Robert Wood [:rwood]

Updated

•

7 years ago

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → WONTFIX

Categories

(Testing :: Talos, defect)

Tracking

(e10s+)

People

(Reporter: jmaher, Unassigned)

References

(Blocks 1 open bug)

Details

Crash Data

Security

(public)
