Closed Bug 685632 Opened 13 years ago Closed 13 years ago

Please collect RSS (memory) data during Talos testing on Android

Categories

(Testing :: Talos, defect)

x86
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mfinkle, Assigned: jmaher)

References

Details

(Whiteboard: [android_tier_1][MemShrink:P1])

Attachments

(4 files, 6 obsolete files)

10.21 KB, patch
n.nethercote
: review+
Details | Diff | Splinter Review
9.85 KB, patch
wlach
: review+
k0scist
: feedback+
Details | Diff | Splinter Review
1.13 KB, patch
wlach
: review+
Details | Diff | Splinter Review
1.83 KB, patch
bear
: review+
Details | Diff | Splinter Review
We really need to see the memory use during activities like pageloading (Tp) and even startup (Ts)

The Meamo tests support recording RSS during the test run. Can we add that for Android too?
Summary: Please collect RSS (memory) data during Talos testing → Please collect RSS (memory) data during Talos testing on Android
http://hg.mozilla.org/users/mfinkle_mozilla.com/testharness/file/3dd6a3781145/content/overlay.js#l170

One suggestion is to take a snapshot of the memory after receiving the 'load' event in pageloader.  We could use this above code inside of pageloader to do that.
IMO this should be a priority, the numbers from Maemo say that we're horrible here.
Whiteboard: I'm meeting with Clint and Joel on 09/22 and will update this whiteboard with an ETA.
Whiteboard: I'm meeting with Clint and Joel on 09/22 and will update this whiteboard with an ETA.
Whiteboard: [android_tier_1]
Whiteboard: [android_tier_1] → [android_tier_1][MemShrink]
In case it's not obvious, we need to measure the RSS of both the chrome and content processes.  I'd prefer for these two numbers to be reported separately, since that makes debugging regressions easier.
Clint, assigning to you, please reassign to a more appropriate person if there is one!
Assignee: nobody → ctalbert
Whiteboard: [android_tier_1][MemShrink] → [android_tier_1][MemShrink:P1]
this patch adds a -rss flag to pageloader and when used we will dump the rss for each process to the stdout after each page is loaded.  This is different from the traditional counter manager where we collect the rss from a separate process every 20 seconds.  

Please advise on use of collecting memory rss and the change in frequency.  I ran this by :dougt/:mfinkle and they mentioned if we could get the peak memory (as a followup bug) that it would get all the data they care about.

Here is an example of collecting this during tsvg (single cycle vs 5 in production):
i,value
0,55541760
1,62844928
2,63008768
3,63074304
4,63172608
5,62722048
6,63119360
7,63148032
8,63311872
9,63508480
10,63553536
11,64081920
RETURN: tsvg_main_rss: 59.6MB
Assignee: ctalbert → jmaher
Status: NEW → ASSIGNED
Attachment #567995 - Flags: review?(justin.lebar+bug)
Attachment #567995 - Flags: review?(mark.finkle)
this patch adds the ability for us to output rss data from pageloader and then parse the resulting output to build up rss.  I want to get some general feedback on this approach.  Specifically because we need to modify the pageloader args to use -rss, as well as the counters. 

current counters:
  linux_counters : ['Private Bytes', 'RSS', 'XRes']

proposed counters:
  linux_counters : ['Private Bytes', 'Main_RSS', 'XRes']

wlach, can you please advice on how much sense this makes to another person.  Also my results parsing code is rudimentary as I need to test on mobile to get the name of the child process.  Currently it is hacked as 'Main', but I see it being 'Main' and 'Child'.  This is sort of unknown for me since I don't know in the future if we will have child[0-9] or something like that.
Attachment #567997 - Flags: feedback?(wlachance)
Comment on attachment 567995 [details] [diff] [review]
collect rss inside pageloader (1.0)

Review of attachment 567995 [details] [diff] [review]:
-----------------------------------------------------------------

You've copied far too much of aboutMemory.js here.  If you're just getting RSS you should be able to use  nsIMemoryReporterManager::resident and save yourself 90% of the effort.
Attachment #567995 - Flags: feedback-
will that get resident memory for all processes?
Nick, maybe we should add a function in nsIMemoryReporterManager which returns a dictionary of RSS values, one for each process?
Comment on attachment 567995 [details] [diff] [review]
collect rss inside pageloader (1.0)

Let's try to expose the right API for this.
Attachment #567995 - Flags: review?(justin.lebar+bug) → review-
You know, it might be just as easy to do:

  var mgr = Cc["@mozilla.org/memory-reporter-manager;1"].
      getService(Ci.nsIMemoryReporterManager);
  var residents = {};
  var e = mgr.enumerateReporters();
  while (e.hasMoreElements()) {
    var reporter = e.getNext().QueryInterface(Ci.nsIMemoryReporter);
    if (reporter.path == 'resident') {
      residents[reporter.process] = reporter.amount;
    }
  }

This assumes that |resident| isn't in a multi-reporter, but that seems better than querying every multi-reporter unnecessarily.
Comment on attachment 567997 [details] [diff] [review]
allow for talos to output rss from pageloader (WIP)

The basic outline of the patch looks fine, aside from the issue you mention wrt names, and it looks like the logic here could easily be extended to handle that.

Minor nit: The parantheses on the ifs aren't necessary. I know we do this (inconsistently) elsewhere in Talos, but it's generally not considered to be good python coding style and is inconsistent with what we do in other modules. If we're only going to be measuring the memory use of a process named 'Main', I'd probably just do:

if rssmatch and rssmatch.group(1) == 'Main':
Attachment #567997 - Flags: feedback?(wlachance) → feedback+
Comment on attachment 567995 [details] [diff] [review]
collect rss inside pageloader (1.0)

I think you are updating this patch
Attachment #567995 - Flags: review?(mark.finkle)
I just realized that RSS is not what we want to measure here.

The content process shares libxul and other libraries with the chrome process, but libxul and friends are counted once in each process's RSS.  That means that we'll overcount how much memory we're actually using, and as we add more content processes, we'll overcount even more.

The right measure is proportional set size (PSS), not RSS.  PSS counts each shared library proportionally with how many times it's loaded.

We can read PSS in about:memory, but it's kind of tough to get at it from script, since it's hidden in a memory multi-reporter.  We'll need to fix this.
cool, is there a bug on file to track that PSS?  mfinkle/dougt, would PSS be good to measure for you guys?
By the way, if you're reporting the content and chrome processes' memory usage *separately*, RSS is a fine measure.  In fact, it's probably the right one, since in that case, you don't want adding another content process to decrease the Chrome process's memory usage.  It's only when you're *summing* the processes' memory usage that PSS is the right measure.
> cool, is there a bug on file to track that PSS?

We can do that in bug 695676.
updated talos bits...want to restructure the rss regex parsing a bit, but this works for android on main/child process!
Attachment #567997 - Attachment is obsolete: true
updated patch for simpler pageloader.  This works for main/child process, tested on Fennec.  I wanted to get my work up on the bug and I can polish the patches and workflow next week.
Attachment #567995 - Attachment is obsolete: true
Attachment #568812 - Flags: feedback?(justin.lebar+bug)
Comment on attachment 568812 [details] [diff] [review]
collect rss inside pageloader (2.0)

This is pretty good, but I'd like to take another look at it.

> +function memOnLoad() {
> +function memOnUnload() {

You don't call these functions onload and onunload, so let's rename them.

> +function memOnUnload() {
> +  // We need to check if the observer has been added before removing; in some
> +  // circumstances (eg. reloading the page quickly) it might not have because
> +  // onLoad might not fire.

This comment doesn't make sense anymore.  In fact, since you always call memOnLoad before you call memOnUnload, I don't think you need the gAddedObserver check at all.

> +      procName = procName.split(' ')[0];

OOC, what does the name of the child process look like such that you need to only take the first word?

> +/**
> + * Top-level function that does the work of generating the page.
> + */
> +function collectRSS()

This comment is no longer relevant.

> @@ -87,16 +88,17 @@ var gIOS = null;
>  function plInit() {
>    if (running) {
>      return;
>    }
>   running = true;
>  
>    cycle = 0;
> 
> +  memOnLoad();

Do you need this initial call to memOnLoad()?  If you do, it should be in an |if (reportRSS)| block.

> @@ -287,16 +290,21 @@ function plLoadPage() {
>          content.messageManager.removeMessageListener('PageLoader:MozAfterPaint', plPaintHandler);
>      };
>    }
>  
>    if (timeout > 0) {
>      timeoutEvent = setTimeout('loadFail()', timeout);
>    } 
>    start_time = Date.now();
> +  if (reportRSS) {
> +    memOnLoad();
> +    rss = collectRSS();
> +    dumpLine(rss);
> +  }
>    content.loadURI(pageName);
>  }

loadURI(pageName) is where we load the *next* page, right?  Measuring right before we load the next page is fine (actually, it's good, since that means we get a measurement before we load any pages), but then do you ever measure after we load the final page?

Also, why start the timer before we collect RSS?

The collection code is also wrong.  You need to call memOnLoad and then wait for ChildMemoryListener to fire.  That's the point of 

> +    os.notifyObservers(null, "child-memory-reporter-request", null);
> +    os.addObserver(ChildMemoryListener, "child-memory-reporter-update", false);

This says "collect the memory report from the child, and when you're done, fire a 'child-memory-reporter-update' notification, which will call ChildMemoryListener.

If you don't wait for the callback, the memory information you're collecting when you call collectRSS() will be out-of-date.  (Actually, it'll be up-to-date for the main process and out of date for the child process.)

So you need to yield -- stop running JS entirely -- and wait for ChildMemoryListener to be called back.  Only after you've collected RSS there can you continue running script and load the next page.

> +    s = s.replace(/^\s*/, '').replace(/\s*$/, '');

Since the goal of regular expressions is to be as unreadable as possible, how about

  s = s.replace(/^\s+|\s+$/g, '');

?

> +++ b/chrome/pageloader.xul
> @@ -45,12 +45,13 @@
>    onload="plInit()">
>  
>    <script type="application/x-javascript" 
>            src="chrome://global/content/globalOverlay.js"/>
>    <script type="application/x-javascript" src="MozillaFileLogger.js"></script>
>    <script type="application/x-javascript" src="report.js"></script>
>    <script type="application/x-javascript" src="pageloader.js"></script>
>    <script type="application/x-javascript" src="quit.js"></script>
> +  <script type="application/x-javascript" src="memory.js"></script>

Do you still need this?
Attachment #568812 - Flags: feedback?(justin.lebar+bug) → feedback-
In case it's not clear, every time you want to collect a memory report from the child process, you need to call

> +    os.notifyObservers(null, "child-memory-reporter-request", null);

and wait for ChildMemoryListener to be called back.  But you only need to call

> +    os.addObserver(ChildMemoryListener, "child-memory-reporter-update", false);

once.
updated with feedback.  I am much happier with this patch.
Attachment #568812 - Attachment is obsolete: true
Attachment #569785 - Flags: review?(justin.lebar+bug)
Comment on attachment 569785 [details] [diff] [review]
collect rss inside pageloader (3.0)

This is a lot better, I don't think we're quite there yet, principally because of this:

> So [after you call startMemCollector] you need to yield -- stop running JS 
> entirely -- and wait for 
> ChildMemoryListener to be called back.  Only after you've collected RSS there 
> can you continue running script and load the next page.

Does that make sense?  ChildMemoryListener is called back asynchronously.  It doesn't get called right after os.notifyObservers('child-memory-reporter-request').  You need to wait until ChildMemoryListener has been run before you can call content.loadURI().
 
Unless I'm misunderstanding something here?

>    start_time = Date.now();
> +  if (reportRSS) {
> +    if (useBrowserChrome)
> +      startMemCollector();
> +    else
> +      dumpLine(collectRSS());
> +  }
>    content.loadURI(pageName);

> Why start the timer before we collect RSS?

Also, what happens if useBrowserChrome is true but we don't have a separate content process?
Attachment #569785 - Flags: review?(justin.lebar+bug) → review-
- Added a callback to startMemCollector so we only continue the flow once startMemCollector is finished.

- Added a 10 second timer for the first time we collect RSS.  If we time out, then we never received a child-memory-reporter-update.

Tested on desktop linux and android (with/without content)
Attachment #569785 - Attachment is obsolete: true
Attachment #569924 - Flags: review?(justin.lebar+bug)
What repository is this patch against?  This looks right, but I'd like to get some context.
Comment on attachment 569924 [details] [diff] [review]
collect rss inside pageloader (3.1)

r=me on the memory bits!  I'd like njn to take a look, because this is really important to get right.

> function startMemCollector(callback) {

I think the logic flow could be cleaned up here.  The assumption that !gMemoryObserver implies gChildProcess == true is baked in pretty hard, but I think it'd probably be cleaner if it went like:

startMemCollector(callback) {
  if (!gMemoryObserver) {
    // This is our first time here.
  }
  else {
  }
}

even if this resulted in a bit of code duplication.

Also, I might rename some of these functions.  startMemCollector "starts" in a different way than stopMemCollector "stops".  And ChildMemoryListener isn't only a listener.  But I'll leave it to your next reviewer to nit about these things.  :)
Attachment #569924 - Flags: review?(nnethercote)
Attachment #569924 - Flags: review?(justin.lebar+bug)
Attachment #569924 - Flags: review+
I'm curious whether there's a better way to determine whether there's a content process than this 10s timeout.  But using a timeout here doesn't seem so bad, since we assume that the Talos machines, of all machines, don't have long pauses due to other jobs.
I am not a big fan of this 10s thing, but I don't know of a better method...somebody must know.  This only causes a 10s delay on the first collection, so we do not all a lot of overhead.  

Right now this is targeted for Android and OSX, what about Linux and Windows?  Will we be replacing the existing RSS on the desktop or collecting this in addition?
Comment on attachment 569924 [details] [diff] [review]
collect rss inside pageloader (3.1)

Review of attachment 569924 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not going to give r+ or r-.  CollectRSS() looks good, and that's the heart of it.  As for the rest of it... maybe I'm not understanding enough about how pageloader.js works.  But memory.js feels like spaghetti to me, like you chose the wrong place in pageloader.js to add the new functionality and that made things much uglier than they needed to be.  And the presence of dead variables and arguments doesn't fill me with confidence.

::: chrome/memory.js
@@ +1,5 @@
> +
> +var gMemoryObserver = false;
> +var gChildProcess = true;
> +var gMemTimer = null;
> +var gMemCallback = null;

For example, these global variables are pretty gross.  I figure you're calling startMemCollector multiple times, and you want to do things like get the observer service only once?  Instead of checking a boolean that tells you if you've already got it, every time, couldn't you just do some initialization before the first call to startMemCollector and setup a configuration object with the appropriate properties -- |os|, |callback|, |timer|, |childProcess| -- and pass that as an argument to startMemCollector?

@@ +36,5 @@
> +
> +  if (gChildProcess) {
> +    os.notifyObservers(null, "child-memory-reporter-request", null);
> +  } else {
> +    ChildMemoryListener(null, null, null);

It's confusing that you use ChildMemoryListener in two ways -- both as a listener, and then as the fallback case (with null args) for when the child observation fails.  It would be better to have a separate function for the failure case with a different name, maybe |doMeasurements()|.  It would call |dumpLine(collectRSS())| and |gMemCallback()|, and ChildMemoryListener() would call it.

@@ +61,5 @@
> +function collectRSS() {
> +  var mgr = Components.classes["@mozilla.org/memory-reporter-manager;1"].
> +      getService(Components.interfaces.nsIMemoryReporterManager);
> +  var e = mgr.enumerateReporters();
> +  var residents = {};

This variable is unused.

::: chrome/pageloader.js
@@ +67,5 @@
>  var running = false;
>  var forceCC = true;
> +var reportRSS = false;
> +var gNextURI = "";
> +var plForce = false;

The inconsistent naming of globals (some with a 'g' prefix, some without) is weird but that appears to be the prevailing style for this file.  A comment explaining what |plForce| means would be nice, esp. because the variable name itself doesn't give much clue.  (What's being forced?)

@@ +302,5 @@
> +    startAndLoadURI();
> +  }
> +}
> +
> +function startAndLoadURI(delay) {

|delay| is a dead argument and can be removed.

@@ +306,5 @@
> +function startAndLoadURI(delay) {
> +  if (gNextURI != "") {
> +    start_time = Date.now();
> +    content.loadURI(gNextURI);
> +    gNextURI = "";

Why use a global variable to pass in the URI?  Surely this can just be a normal argument?
Attachment #569924 - Flags: review?(nnethercote)
(In reply to Nicholas Nethercote [:njn] from comment #31)
> But memory.js feels like spaghetti to me,
> like you chose the wrong place in pageloader.js to add the new functionality
> and that made things much uglier than they needed to be.

Just to clarify:  it feels to me like you need to do some initialization (setup the observer, etc) before any pages get loaded, and then call a function that does measurements each time a page loads.  But you jammed the initialization code into the measurement function, which makes it complicated.
updated to address comments.  This has separate functions for init, collect, stop.  No doubling up of functionality.  Also I have removed all but 2 of the global variables.  Thanks for the great comments in the previous comments!
Attachment #569924 - Attachment is obsolete: true
Attachment #571211 - Flags: review?(nnethercote)
Comment on attachment 571211 [details] [diff] [review]
collect rss inside pageloader (3.14)

Review of attachment 571211 [details] [diff] [review]:
-----------------------------------------------------------------

Looks much better!  The control flow still makes my head spin, but I can't see how to simplify it further.

::: chrome/memory.js
@@ +5,5 @@
> +
> +/*
> + * Initialize memory collector.  Determine if we have a child process.
> + */
> +function intializeMemoryCollector(callback, args) {

You misspelled "initialize" :)

@@ +6,5 @@
> +/*
> + * Initialize memory collector.  Determine if we have a child process.
> + */
> +function intializeMemoryCollector(callback, args) {
> +    if (callback) {

Is this condition ever false?  Can it be removed?

@@ +47,5 @@
> +/*
> + * Collect memory from all processes and callback when done collecting.
> + */
> +function collectMemory(callback, args) {
> +  if (callback) {

Can this condition ever fail?  Can it be removed?

@@ +78,5 @@
> +      procName = reporter.process;
> +      if (procName == '')
> +        procName = "Main";
> +        
> +      //For content process it is in the format "Content (<PID>)", we just want Content

I'd put the second "Content" in double quotes.
Attachment #571211 - Flags: review?(nnethercote) → review+
Blocks: 699644
this patch allows for collection of rss from the browser_output.txt instead of system counters collected externally.  The one thing I need some consensus on is to either hardcode the -rss into the tp line (this patch adds it via --rss to PerfConfigurator) and adding "remote_counters : ['Main_RSS', 'Content_RSS']".  If we want the command line, we shouldn't have to hardcode stuff in the .config file. Alternatively we might only want to collect certain types of data from the pageloader RSS, so we wouldn't want to specify everything.
Attachment #568808 - Attachment is obsolete: true
Attachment #573777 - Flags: feedback?(jhammel)
Comment on attachment 573777 [details] [diff] [review]
allow for talos to output rss from pageloader (0.9)

The general patch seems fine and consistent with the way talos currently is written.  It would be nice to clean some of this up, but probably not blocking for this bug
Attachment #573777 - Flags: feedback?(jhammel) → feedback+
Attachment #573777 - Flags: review?(wlachance)
Comment on attachment 573777 [details] [diff] [review]
allow for talos to output rss from pageloader (0.9)

In general this looks good, r+ with nits addressed.

>+                if self.rss and (line.find('-tpchrome') > 0): 
>+                    #if mozAfterPaint is True add -tpmozafterpaint option 
>+                    line = line.replace('-tpchrome ','-rss -tpchrome ')
>+                    newline = line

Two things:

1. Conditional would be more clearly written as 'if self.rss and '-tpchrome' in line:'
2. The comment 'if mozAfterPaint is True add -tpmozafterpaint option' looks wrong to me. Either correct it, or go ahead and remove it (I don't think it adds that much value).

>+                if (("Main_RSS" in counters) or ("Content_RSS" in counters)):
>+                    RSS_REGEX = re.compile('RSS:\s+([a-zA-Z0-9]+):\s+([0-9]+)$')
>+                    counter_results['Main_RSS'] = []
>+                    counter_results['Content_RSS'] = []
>+                    for line in results_raw.split('\n'):
>+                        rssmatch = RSS_REGEX.search(line)
>+                        if (rssmatch):
>+                            if (rssmatch.group(1) == 'Main'):
>+                                counter_results['Main_RSS'].append(rssmatch.group(2))
>+                            if (rssmatch.group(1) == 'Content'):
>+                                counter_results['Content_RSS'].append(rssmatch.group(2))

Two more things:

1. I think I've said this before, but IMO we should try to move away from using parentheses inside if statements in new code in Talos. They're not necessary in python and go against PEP8.
2. I think the matching code would be clearer if you did something like:

(type, value) = (rssmatch.group(1), rssmatch.group(2))

and assigned the parts of the tuple to counter_results.
Attachment #573777 - Flags: review?(wlachance) → review+
landed on talos:
http://hg.mozilla.org/build/talos/rev/da1a81e365d4

still need to update the graph server and get this in releng staging.
I overlooked the obvious while developing and testing my other talos patch, this is a simple fix to correct that.
Attachment #574864 - Flags: review?(wlachance)
Comment on attachment 574864 [details] [diff] [review]
quick fix to make talos work without requiring remote_counters (1.0)

The logic here looks fine, but the use of an exception here is a bit heavy handed. Better to just:

counters = test_config.get(self.platform_type + 'counters', [])
Attachment #574864 - Flags: review?(wlachance) → review+
Attachment #575182 - Flags: review?(aki) → review?(bear)
Attachment #575182 - Flags: review?(bear) → review+
Depends on: 703672
we have live data (inbound):
http://graphs-new.mozilla.org/graph.html#tests=[[135,63,20],[134,63,20],[136,63,20]]&sel=none&displayrange=7&datatype=running
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: