Closed Bug 419776 Opened 16 years ago Closed 15 years ago

Create new Talos suite: talos should measure x resources

Categories

(Release Engineering :: General, defect, P2)

x86
macOS
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dougt, Assigned: anodelman)

References

Details

(Keywords: mobile, perf)

Attachments

(3 files, 1 obsolete file)

we should be measuring x resources during the talos run.  Testing between 1.8 and 1.9, i have measured a huge improvement and hope that this stays that way. 


xrestop is the tool needed for measuring what is going on in X.
Is our goal here to measure running size or to detect leaks of graphics resources?  If we want to detect leaks we need to:

1. When we do this run we want to be running with the session history turned off.
2. Same goes for the memory cache.

At first blush it looks like keeping session history and memory cache hangs onto a large amount of X resources and aren't freed for some time.

Might be interesting to do this with stuart's memory pressure stuff as well.

It looks like looking at the Talos code that there might be a place to integrate this.  There's code in there that already reads information out of /proc to get memory size and cpu numbers, yeah?
my goal here was to have something equivalent to tracking RSS on linux, but for X.

Attached file GetXRes function (obsolete) —
This returns the total bytes used by X for a given PID.

I think we should tie this into the cmanager_linux.py as another counter type.  Feel free to improve my greping.

Alice, do you have any thoughts on adding this?
Assignee: nobody → doug.turner
Status: NEW → ASSIGNED
I was hoping to simply do:

Add my function to cmanager_linux.py, then add:

counterDict["XRes"] = GetXRes

to the list of counters in the dictionary.  However, the output file that is created contains only:

1,value

after a tp run.  
And of course, i added the right counter to the sample.config I am running. 
Attachment #315666 - Attachment is obsolete: true
Comment on attachment 315794 [details] [diff] [review]
adds X Resource Tracking to Talos

You also need to add something to your config file like:

    unix_counters : [ 'Private Bytes', 'RSS', 'XRes']


(Note, the addition of XRes in the above list)
We would like to get this on production tboxes soon.
What machines do you want this running on?  All linux/mac?  (I'm assuming here that this wouldn't just be of interest for testing mobile - correct me if I'm wrong).
only matters on linux/gtk machines, so not the mac.  this is for all linux/gtk machines that test tp_RSS.
Oops - yeah, no mac.  Shouldn't have replied before caffeine.

So, you'd like this on all our linux machines?  I'm just trying to scope out what the request is.

I had a quick look and we don't have xrestop installed on our basic talos image, so it would also involve having to install it all over the place.
Oh, and there would also have to be some hacking in talos itself.  At the moment, the counter that we manage (ie, monitoring RSS, Private Bytes, etc) are controlled in two different sets: one for unix style systems and one for windows.  In this case we'd be creating a counter that would only run on linux and not mac and would break the current assumptions in the code.
Can't you just not add the counters to the config files on the mac.
Our system is too smart for that - we share config files across platforms. :)

We probably could do something where we create custom config files for macs, but we are really trying to keep the amount of config files that we use low.  We had an issue in the past with config file explosion where it was very hard to track who was using which and pushing changes to all of them became a finicky nightmare.  

Using mac only configs would double the already large amount of configs.
is it easier to just throw up a new machine?
No - we don't have enough machines for that.

If we want this then we should put in the time to make the changes to talos to allow us to push it to currently up machines.
how can I help?  we are not measuring the dark-matter of memory leaks -- what we allocate inside of X. we think it is pretty important.  
Doug: 

- does this impact runtime or performance of the current talos run? That would help clarify if we can do this on our existing linux talos machines or if we need to start talking about buying more machines and figuring out how to set them up. 

- is this something we need run on every checkin in an automated manner? Or something that could be run manually at major milestones of a release?
1) i am sure there is some cost to running this on talos -- even though it is "in parallel" against the page cycler.  No idea of the actual cost, but I would suspect it would be small.

2) every chickin similar to how we test vmrss.

Attachment #315794 - Flags: review?(anodelman)
Attachment #315794 - Flags: review?(anodelman) → review+
Comment on attachment 315794 [details] [diff] [review]
adds X Resource Tracking to Talos

great alice.  What is next, does talos need another review, or approval, or should I just check it in?
I've r+ed the patch that gets the xres collection code into talos, but this
will not cause it be run anywhere without:

1) seconday patches to have talos maintain different counters for linux/mac 
2) requisite config files updates for all active talos config files
3) xrestop installed on all linux talos boxes

The r+ed patch should be fine for check in as it will not be executed at all, so it is very low risk.
Checking in cmanager_linux.py;
/cvsroot/mozilla/testing/performance/talos/cmanager_linux.py,v  <--  cmanager_linux.py
new revision: 1.2; previous revision: 1.1
done
thanks alice! 
Keywords: mobile, perf
When we have the mobile builds going off hg and talos working for mobile, we can see how it affects talos turning this on and then stage it on the desktop Linux version when we know the effect of it.
Depends on: 426444
In the short term, why can't we stand up a linux gtk/x11 box running this test case (against mozilla-central)?  We are driving blind at the moment wrt x leaks+bloat.
There are currently no spare perf testing machines - as it stands even my staging perf machines have been commandeered for comparison perf testing work.

Comment #22 is the route to having this run in an automated fashion.
(In reply to comment #26)
> In the short term, why can't we stand up a linux gtk/x11 box running this test
> case (against mozilla-central)?  We are driving blind at the moment wrt x
> leaks+bloat.
> 

(In reply to comment #27)
> There are currently no spare perf testing machines - as it stands even my
> staging perf machines have been commandeered for comparison perf testing work.
> 
> Comment #22 is the route to having this run in an automated fashion.
> 

For us to set this up as part of automation framework is non-trivial. If we do it on the same existing talos machines, we have to schedule downtimes for them all, update the talos machines, and then do calibration runs to confirm no perf regressions caused by the change, before we reopen the trees. Alternatively, we could build out a new set of machines just for this new test, avoiding risk of change to existing talos machines, but this needs a set of machines and some setup/maintenance work. Both of these options are not on our plates for this quarter, but we can discuss it with other mobile requests for next quarter.

In the meanwhile, if you have a spare machines, there's nothing to stop you setting up a private machine running this test on a branch of your choosing, and manually triggering it on builds of your choosing as you need.

Hope all that makes sense. I do understand the usefulness of this new test, but I'm trying to juggle this with the other things we already did promise to do. If I'm missing something, maybe we should all have a quick conf call on this?
christian, we should be measuring x resources on mobile.
Assignee: doug.turner → nobody
Status: ASSIGNED → NEW
Blocks: 455755
No longer depends on: 426444
Component: Testing → Release Engineering
Product: Core → mozilla.org
QA Contact: testing → release
Version: Trunk → other
Looks like there's been no activity on this for quite awhile. I'm moving it out to future. If someone is going to grab it to work on, feel free to move it back.
Component: Release Engineering → Release Engineering: Future
There has been no activity because no one on build has been working on it.  There is patch that is ready to land, can we just do that?
Component: Release Engineering: Future → Release Engineering
(In reply to comment #31)
> There has been no activity because no one on build has been working on it. 
> There is patch that is ready to land, can we just do that?

Deferring to Alice on this.
Patch has been checked in and landed.

What's missing there is running this in an automated fashion, as described in comment #22 - that's the future part of the work.
Attachment #315794 - Flags: checked‑in+
I'm moving this back to future as per comment #33
Component: Release Engineering → Release Engineering: Future
Any ETA on when this will happen?
Summary: talos should measure x resources → Create new Talos suite: talos should measure x resources
Resurrected as part of q4 talos test creation push.
Component: Release Engineering: Future → Release Engineering
Priority: -- → P2
Assignee: nobody → anodelman
Some difficulty getting xres onto the talos slaves (mostly due to them now being on a pretty old version of ubuntu - gutsy).  Currently using:

https://launchpad.net/ubuntu/+source/xrestop/0.4-1/+build/212407/+files/xrestop_0.4-1_i386.deb

and 'dpkg -i xrestop_0.4-1_i386.deb'

We'll see what kind of numbers we get out of that on stage.
Attachment #411844 - Flags: review?(lsblakk) → review+
Attachment #411855 - Flags: review?(lsblakk) → review+
Depends on: 528591
Installed xrestop on all production & try talos linux slaves (except
talos-rev2-linux04 which is unreponsive.

Also, graph server changes pushed to production.

This is now ready for roll out.
Depends on: 528599
Depends on: 528642
Comment on attachment 411855 [details] [diff] [review]
[checked in]talos changes to make xres run on linux only

Checking in sample.config;
/cvsroot/mozilla/testing/performance/talos/sample.config,v  <--  sample.config
new revision: 1.38; previous revision: 1.37
done
Checking in ttest.py;
/cvsroot/mozilla/testing/performance/talos/ttest.py,v  <--  ttest.py
new revision: 1.36; previous revision: 1.35
done
Checking in run_tests.py;
/cvsroot/mozilla/testing/performance/talos/run_tests.py,v  <--  run_tests.py
new revision: 1.54; previous revision: 1.53
done
Attachment #411855 - Attachment description: talos changes to make xres run on linux only → [checked in]talos changes to make xres run on linux only
Attachment #411855 - Flags: checked-in+
Configured talos-rev2-linux04 now that it is up and reachable.  That should finish things here.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Depends on: 531690
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: