If you think a bug might affect users in the 57 release, please set the correct tracking and status flags for Release Management.

Create new Talos suite: talos should measure x resources

RESOLVED FIXED

Status

Release Engineering
General
P2
normal
RESOLVED FIXED
10 years ago
4 years ago

People

(Reporter: dougt, Assigned: alice)

Tracking

({mobile, perf})

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments, 1 obsolete attachment)

(Reporter)

Description

10 years ago
we should be measuring x resources during the talos run.  Testing between 1.8 and 1.9, i have measured a huge improvement and hope that this stays that way. 


xrestop is the tool needed for measuring what is going on in X.
Is our goal here to measure running size or to detect leaks of graphics resources?  If we want to detect leaks we need to:

1. When we do this run we want to be running with the session history turned off.
2. Same goes for the memory cache.

At first blush it looks like keeping session history and memory cache hangs onto a large amount of X resources and aren't freed for some time.

Might be interesting to do this with stuart's memory pressure stuff as well.

It looks like looking at the Talos code that there might be a place to integrate this.  There's code in there that already reads information out of /proc to get memory size and cpu numbers, yeah?
(Reporter)

Comment 2

10 years ago
my goal here was to have something equivalent to tracking RSS on linux, but for X.

(Reporter)

Comment 3

10 years ago
Created attachment 315666 [details]
GetXRes function

This returns the total bytes used by X for a given PID.

I think we should tie this into the cmanager_linux.py as another counter type.  Feel free to improve my greping.

Alice, do you have any thoughts on adding this?
Assignee: nobody → doug.turner
Status: NEW → ASSIGNED
(Reporter)

Comment 4

10 years ago
I was hoping to simply do:

Add my function to cmanager_linux.py, then add:

counterDict["XRes"] = GetXRes

to the list of counters in the dictionary.  However, the output file that is created contains only:

1,value

after a tp run.  
(Reporter)

Comment 5

10 years ago
And of course, i added the right counter to the sample.config I am running. 
(Reporter)

Comment 6

10 years ago
Created attachment 315794 [details] [diff] [review]
adds X Resource Tracking to Talos
Attachment #315666 - Attachment is obsolete: true
(Reporter)

Comment 7

10 years ago
Comment on attachment 315794 [details] [diff] [review]
adds X Resource Tracking to Talos

You also need to add something to your config file like:

    unix_counters : [ 'Private Bytes', 'RSS', 'XRes']


(Note, the addition of XRes in the above list)
(Reporter)

Comment 8

10 years ago
http://graphs-stage.mozilla.org/dgraph.html#spst=range&spstart=0&spend=53&bpst=cursor&bpstart=0&bpend=53&name=tp_loadtime&m1tid=216004&m1bl=0&m1avg=0


looks like it is working.
(Reporter)

Comment 9

10 years ago
We would like to get this on production tboxes soon.
(Assignee)

Comment 10

10 years ago
What machines do you want this running on?  All linux/mac?  (I'm assuming here that this wouldn't just be of interest for testing mobile - correct me if I'm wrong).
(Reporter)

Comment 11

10 years ago
only matters on linux/gtk machines, so not the mac.  this is for all linux/gtk machines that test tp_RSS.
(Assignee)

Comment 12

10 years ago
Oops - yeah, no mac.  Shouldn't have replied before caffeine.

So, you'd like this on all our linux machines?  I'm just trying to scope out what the request is.

I had a quick look and we don't have xrestop installed on our basic talos image, so it would also involve having to install it all over the place.
(Assignee)

Comment 13

10 years ago
Oh, and there would also have to be some hacking in talos itself.  At the moment, the counter that we manage (ie, monitoring RSS, Private Bytes, etc) are controlled in two different sets: one for unix style systems and one for windows.  In this case we'd be creating a counter that would only run on linux and not mac and would break the current assumptions in the code.
(Reporter)

Comment 14

10 years ago
Can't you just not add the counters to the config files on the mac.
(Assignee)

Comment 15

10 years ago
Our system is too smart for that - we share config files across platforms. :)

We probably could do something where we create custom config files for macs, but we are really trying to keep the amount of config files that we use low.  We had an issue in the past with config file explosion where it was very hard to track who was using which and pushing changes to all of them became a finicky nightmare.  

Using mac only configs would double the already large amount of configs.
(Reporter)

Comment 16

10 years ago
is it easier to just throw up a new machine?
(Assignee)

Comment 17

10 years ago
No - we don't have enough machines for that.

If we want this then we should put in the time to make the changes to talos to allow us to push it to currently up machines.
(Reporter)

Comment 18

10 years ago
how can I help?  we are not measuring the dark-matter of memory leaks -- what we allocate inside of X. we think it is pretty important.  
Doug: 

- does this impact runtime or performance of the current talos run? That would help clarify if we can do this on our existing linux talos machines or if we need to start talking about buying more machines and figuring out how to set them up. 

- is this something we need run on every checkin in an automated manner? Or something that could be run manually at major milestones of a release?
(Reporter)

Comment 20

10 years ago
1) i am sure there is some cost to running this on talos -- even though it is "in parallel" against the page cycler.  No idea of the actual cost, but I would suspect it would be small.

2) every chickin similar to how we test vmrss.

(Reporter)

Updated

10 years ago
Attachment #315794 - Flags: review?(anodelman)
(Assignee)

Updated

10 years ago
Attachment #315794 - Flags: review?(anodelman) → review+
(Reporter)

Comment 21

10 years ago
Comment on attachment 315794 [details] [diff] [review]
adds X Resource Tracking to Talos

great alice.  What is next, does talos need another review, or approval, or should I just check it in?
(Assignee)

Comment 22

10 years ago
I've r+ed the patch that gets the xres collection code into talos, but this
will not cause it be run anywhere without:

1) seconday patches to have talos maintain different counters for linux/mac 
2) requisite config files updates for all active talos config files
3) xrestop installed on all linux talos boxes

The r+ed patch should be fine for check in as it will not be executed at all, so it is very low risk.
(Assignee)

Comment 23

10 years ago
Checking in cmanager_linux.py;
/cvsroot/mozilla/testing/performance/talos/cmanager_linux.py,v  <--  cmanager_linux.py
new revision: 1.2; previous revision: 1.1
done
(Reporter)

Comment 24

10 years ago
thanks alice! 

Updated

10 years ago
Keywords: mobile, perf

Comment 25

10 years ago
When we have the mobile builds going off hg and talos working for mobile, we can see how it affects talos turning this on and then stage it on the desktop Linux version when we know the effect of it.

Updated

10 years ago
Depends on: 426444
(Reporter)

Comment 26

10 years ago
In the short term, why can't we stand up a linux gtk/x11 box running this test case (against mozilla-central)?  We are driving blind at the moment wrt x leaks+bloat.
(Assignee)

Comment 27

10 years ago
There are currently no spare perf testing machines - as it stands even my staging perf machines have been commandeered for comparison perf testing work.

Comment #22 is the route to having this run in an automated fashion.
(In reply to comment #26)
> In the short term, why can't we stand up a linux gtk/x11 box running this test
> case (against mozilla-central)?  We are driving blind at the moment wrt x
> leaks+bloat.
> 

(In reply to comment #27)
> There are currently no spare perf testing machines - as it stands even my
> staging perf machines have been commandeered for comparison perf testing work.
> 
> Comment #22 is the route to having this run in an automated fashion.
> 

For us to set this up as part of automation framework is non-trivial. If we do it on the same existing talos machines, we have to schedule downtimes for them all, update the talos machines, and then do calibration runs to confirm no perf regressions caused by the change, before we reopen the trees. Alternatively, we could build out a new set of machines just for this new test, avoiding risk of change to existing talos machines, but this needs a set of machines and some setup/maintenance work. Both of these options are not on our plates for this quarter, but we can discuss it with other mobile requests for next quarter.

In the meanwhile, if you have a spare machines, there's nothing to stop you setting up a private machine running this test on a branch of your choosing, and manually triggering it on builds of your choosing as you need.

Hope all that makes sense. I do understand the usefulness of this new test, but I'm trying to juggle this with the other things we already did promise to do. If I'm missing something, maybe we should all have a quick conf call on this?
(Reporter)

Comment 29

9 years ago
christian, we should be measuring x resources on mobile.
Assignee: doug.turner → nobody
Status: ASSIGNED → NEW
(Assignee)

Updated

9 years ago
Blocks: 455755
No longer blocks: 455755
(Assignee)

Updated

9 years ago
No longer depends on: 426444
Component: Testing → Release Engineering
Product: Core → mozilla.org
QA Contact: testing → release
Version: Trunk → other
Looks like there's been no activity on this for quite awhile. I'm moving it out to future. If someone is going to grab it to work on, feel free to move it back.
Component: Release Engineering → Release Engineering: Future

Comment 31

9 years ago
There has been no activity because no one on build has been working on it.  There is patch that is ready to land, can we just do that?
Component: Release Engineering: Future → Release Engineering
(In reply to comment #31)
> There has been no activity because no one on build has been working on it. 
> There is patch that is ready to land, can we just do that?

Deferring to Alice on this.
(Assignee)

Comment 33

9 years ago
Patch has been checked in and landed.

What's missing there is running this in an automated fashion, as described in comment #22 - that's the future part of the work.
Attachment #315794 - Flags: checked‑in+
I'm moving this back to future as per comment #33
Component: Release Engineering → Release Engineering: Future

Comment 35

9 years ago
Any ETA on when this will happen?
Summary: talos should measure x resources → Create new Talos suite: talos should measure x resources
(Assignee)

Comment 36

8 years ago
Resurrected as part of q4 talos test creation push.
Component: Release Engineering: Future → Release Engineering
Priority: -- → P2
(Assignee)

Updated

8 years ago
Assignee: nobody → anodelman
(Assignee)

Comment 37

8 years ago
Created attachment 411844 [details] [diff] [review]
update graph server with new xres counter
Attachment #411844 - Flags: review?(lsblakk)
(Assignee)

Comment 38

8 years ago
Some difficulty getting xres onto the talos slaves (mostly due to them now being on a pretty old version of ubuntu - gutsy).  Currently using:

https://launchpad.net/ubuntu/+source/xrestop/0.4-1/+build/212407/+files/xrestop_0.4-1_i386.deb

and 'dpkg -i xrestop_0.4-1_i386.deb'

We'll see what kind of numbers we get out of that on stage.
Attachment #411844 - Flags: review?(lsblakk) → review+
(Assignee)

Comment 39

8 years ago
Created attachment 411855 [details] [diff] [review]
[checked in]talos changes to make xres run on linux only
Attachment #411855 - Flags: review?(lsblakk)
Attachment #411855 - Flags: review?(lsblakk) → review+
(Assignee)

Updated

8 years ago
Depends on: 528591
(Assignee)

Comment 40

8 years ago
Installed xrestop on all production & try talos linux slaves (except
talos-rev2-linux04 which is unreponsive.

Also, graph server changes pushed to production.

This is now ready for roll out.
(Assignee)

Updated

8 years ago
Depends on: 528599
(Assignee)

Updated

8 years ago
Depends on: 528642
(Assignee)

Comment 41

8 years ago
Comment on attachment 411855 [details] [diff] [review]
[checked in]talos changes to make xres run on linux only

Checking in sample.config;
/cvsroot/mozilla/testing/performance/talos/sample.config,v  <--  sample.config
new revision: 1.38; previous revision: 1.37
done
Checking in ttest.py;
/cvsroot/mozilla/testing/performance/talos/ttest.py,v  <--  ttest.py
new revision: 1.36; previous revision: 1.35
done
Checking in run_tests.py;
/cvsroot/mozilla/testing/performance/talos/run_tests.py,v  <--  run_tests.py
new revision: 1.54; previous revision: 1.53
done
Attachment #411855 - Attachment description: talos changes to make xres run on linux only → [checked in]talos changes to make xres run on linux only
Attachment #411855 - Flags: checked-in+
(Assignee)

Comment 42

8 years ago
Configured talos-rev2-linux04 now that it is up and reachable.  That should finish things here.
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED
Depends on: 531690
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.