Closed Bug 419776 Opened 17 years ago Closed 15 years ago

Create new Talos suite: talos should measure x resources

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: dougt, Assigned: anodelman)

References

Details

(Keywords: mobile, perf)

Attachments

(3 files, 1 obsolete file)

GetXRes function 17 years ago Doug Turner (:dougt) 278 bytes, text/plain		Details
adds X Resource Tracking to Talos 17 years ago Doug Turner (:dougt) 2.08 KB, patch	anodelman : review+ bhearsum : checked-in+	Details \| Diff \| Splinter Review
update graph server with new xres counter 15 years ago alice nodelman [:alice] [:anode] 773 bytes, patch	lsblakk : review+	Details \| Diff \| Splinter Review
[checked in]talos changes to make xres run on linux only 15 years ago alice nodelman [:alice] [:anode] 9.41 KB, patch	lsblakk : review+ anodelman : checked-in+	Details \| Diff \| Splinter Review

Doug Turner (:dougt)

Reporter

Description

•

17 years ago

we should be measuring x resources during the talos run.  Testing between 1.8 and 1.9, i have measured a huge improvement and hope that this stays that way. 


xrestop is the tool needed for measuring what is going on in X.

Christopher Blizzard (:blizzard)

Comment 1

•

17 years ago

Is our goal here to measure running size or to detect leaks of graphics resources?  If we want to detect leaks we need to:

1. When we do this run we want to be running with the session history turned off.
2. Same goes for the memory cache.

At first blush it looks like keeping session history and memory cache hangs onto a large amount of X resources and aren't freed for some time.

Might be interesting to do this with stuart's memory pressure stuff as well.

It looks like looking at the Talos code that there might be a place to integrate this.  There's code in there that already reads information out of /proc to get memory size and cpu numbers, yeah?

Doug Turner (:dougt)

Reporter

Comment 2

•

17 years ago

my goal here was to have something equivalent to tracking RSS on linux, but for X.

Doug Turner (:dougt)

Reporter

Comment 3

•

17 years ago

Attached file GetXRes function (obsolete) — Details

This returns the total bytes used by X for a given PID.

I think we should tie this into the cmanager_linux.py as another counter type.  Feel free to improve my greping.

Alice, do you have any thoughts on adding this?

Assignee: nobody → doug.turner

Status: NEW → ASSIGNED

Doug Turner (:dougt)

Reporter

Comment 4

•

17 years ago

I was hoping to simply do:

Add my function to cmanager_linux.py, then add:

counterDict["XRes"] = GetXRes

to the list of counters in the dictionary.  However, the output file that is created contains only:

1,value

after a tp run.

Doug Turner (:dougt)

Reporter

Comment 5

•

17 years ago

And of course, i added the right counter to the sample.config I am running.

Doug Turner (:dougt)

Reporter

Comment 6

•

17 years ago

Attached patch adds X Resource Tracking to Talos — Details — Splinter Review

Attachment #315666 - Attachment is obsolete: true

Doug Turner (:dougt)

Reporter

Comment 7

•

17 years ago

Comment on attachment 315794 [details] [diff] [review]
adds X Resource Tracking to Talos

You also need to add something to your config file like:

    unix_counters : [ 'Private Bytes', 'RSS', 'XRes']


(Note, the addition of XRes in the above list)

Doug Turner (:dougt)

Reporter

Comment 8

•

17 years ago

http://graphs-stage.mozilla.org/dgraph.html#spst=range&spstart=0&spend=53&bpst=cursor&bpstart=0&bpend=53&name=tp_loadtime&m1tid=216004&m1bl=0&m1avg=0


looks like it is working.

Doug Turner (:dougt)

Reporter

Comment 9

•

16 years ago

We would like to get this on production tboxes soon.

alice nodelman [:alice] [:anode]

Assignee

Comment 10

•

16 years ago

What machines do you want this running on?  All linux/mac?  (I'm assuming here that this wouldn't just be of interest for testing mobile - correct me if I'm wrong).

Doug Turner (:dougt)

Reporter

Comment 11

•

16 years ago

only matters on linux/gtk machines, so not the mac.  this is for all linux/gtk machines that test tp_RSS.

alice nodelman [:alice] [:anode]

Assignee

Comment 12

•

16 years ago

Oops - yeah, no mac.  Shouldn't have replied before caffeine.

So, you'd like this on all our linux machines?  I'm just trying to scope out what the request is.

I had a quick look and we don't have xrestop installed on our basic talos image, so it would also involve having to install it all over the place.

alice nodelman [:alice] [:anode]

Assignee

Comment 13

•

16 years ago

Oh, and there would also have to be some hacking in talos itself.  At the moment, the counter that we manage (ie, monitoring RSS, Private Bytes, etc) are controlled in two different sets: one for unix style systems and one for windows.  In this case we'd be creating a counter that would only run on linux and not mac and would break the current assumptions in the code.

Doug Turner (:dougt)

Reporter

Comment 14

•

16 years ago

Can't you just not add the counters to the config files on the mac.

alice nodelman [:alice] [:anode]

Assignee

Comment 15

•

16 years ago

Our system is too smart for that - we share config files across platforms. :)

We probably could do something where we create custom config files for macs, but we are really trying to keep the amount of config files that we use low.  We had an issue in the past with config file explosion where it was very hard to track who was using which and pushing changes to all of them became a finicky nightmare.  

Using mac only configs would double the already large amount of configs.

Doug Turner (:dougt)

Reporter

Comment 16

•

16 years ago

is it easier to just throw up a new machine?

alice nodelman [:alice] [:anode]

Assignee

Comment 17

•

16 years ago

No - we don't have enough machines for that.

If we want this then we should put in the time to make the changes to talos to allow us to push it to currently up machines.

Doug Turner (:dougt)

Reporter

Comment 18

•

16 years ago

how can I help?  we are not measuring the dark-matter of memory leaks -- what we allocate inside of X. we think it is pretty important.

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Comment 19

•

16 years ago

Doug: 

- does this impact runtime or performance of the current talos run? That would help clarify if we can do this on our existing linux talos machines or if we need to start talking about buying more machines and figuring out how to set them up. 

- is this something we need run on every checkin in an automated manner? Or something that could be run manually at major milestones of a release?

Doug Turner (:dougt)

Reporter

Comment 20

•

16 years ago

1) i am sure there is some cost to running this on talos -- even though it is "in parallel" against the page cycler.  No idea of the actual cost, but I would suspect it would be small.

2) every chickin similar to how we test vmrss.

Doug Turner (:dougt)

Reporter

Updated

•

16 years ago

Attachment #315794 - Flags: review?(anodelman)

alice nodelman [:alice] [:anode]

Assignee

Updated

•

16 years ago

Attachment #315794 - Flags: review?(anodelman) → review+

Doug Turner (:dougt)

Reporter

Comment 21

•

16 years ago

Comment on attachment 315794 [details] [diff] [review]
adds X Resource Tracking to Talos

great alice.  What is next, does talos need another review, or approval, or should I just check it in?

alice nodelman [:alice] [:anode]

Assignee

Comment 22

•

16 years ago

I've r+ed the patch that gets the xres collection code into talos, but this
will not cause it be run anywhere without:

1) seconday patches to have talos maintain different counters for linux/mac 
2) requisite config files updates for all active talos config files
3) xrestop installed on all linux talos boxes

The r+ed patch should be fine for check in as it will not be executed at all, so it is very low risk.

alice nodelman [:alice] [:anode]

Assignee

Comment 23

•

16 years ago

Checking in cmanager_linux.py;
/cvsroot/mozilla/testing/performance/talos/cmanager_linux.py,v  <--  cmanager_linux.py
new revision: 1.2; previous revision: 1.1
done

Doug Turner (:dougt)

Reporter

Comment 24

•

16 years ago

thanks alice!

Christian Sejersen

Updated

•

16 years ago

Keywords: mobile, perf

Christian Sejersen

Comment 25

•

16 years ago

When we have the mobile builds going off hg and talos working for mobile, we can see how it affects talos turning this on and then stage it on the desktop Linux version when we know the effect of it.

Christian Sejersen

Updated

•

16 years ago

Depends on: 426444

Doug Turner (:dougt)

Reporter

Comment 26

•

16 years ago

In the short term, why can't we stand up a linux gtk/x11 box running this test case (against mozilla-central)?  We are driving blind at the moment wrt x leaks+bloat.

alice nodelman [:alice] [:anode]

Assignee

Comment 27

•

16 years ago

There are currently no spare perf testing machines - as it stands even my staging perf machines have been commandeered for comparison perf testing work.

Comment #22 is the route to having this run in an automated fashion.

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Comment 28

•

16 years ago

(In reply to comment #26)
> In the short term, why can't we stand up a linux gtk/x11 box running this test
> case (against mozilla-central)?  We are driving blind at the moment wrt x
> leaks+bloat.
> 

(In reply to comment #27)
> There are currently no spare perf testing machines - as it stands even my
> staging perf machines have been commandeered for comparison perf testing work.
> 
> Comment #22 is the route to having this run in an automated fashion.
> 

For us to set this up as part of automation framework is non-trivial. If we do it on the same existing talos machines, we have to schedule downtimes for them all, update the talos machines, and then do calibration runs to confirm no perf regressions caused by the change, before we reopen the trees. Alternatively, we could build out a new set of machines just for this new test, avoiding risk of change to existing talos machines, but this needs a set of machines and some setup/maintenance work. Both of these options are not on our plates for this quarter, but we can discuss it with other mobile requests for next quarter.

In the meanwhile, if you have a spare machines, there's nothing to stop you setting up a private machine running this test on a branch of your choosing, and manually triggering it on builds of your choosing as you need.

Hope all that makes sense. I do understand the usefulness of this new test, but I'm trying to juggle this with the other things we already did promise to do. If I'm missing something, maybe we should all have a quick conf call on this?

Doug Turner (:dougt)

Reporter

Comment 29

•

16 years ago

christian, we should be measuring x resources on mobile.

Assignee: doug.turner → nobody

Status: ASSIGNED → NEW

alice nodelman [:alice] [:anode]

Assignee

Updated

•

16 years ago

Blocks: 455755

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Updated

•

16 years ago

No longer blocks: 455755

alice nodelman [:alice] [:anode]

Assignee

Updated

•

16 years ago

No longer depends on: 426444

(not currently active) Ted Mielczarek

Updated

•

16 years ago

Component: Testing → Release Engineering

Product: Core → mozilla.org

QA Contact: testing → release

Version: Trunk → other

bhearsum@mozilla.com (:bhearsum)

Comment 30

•

16 years ago

Looks like there's been no activity on this for quite awhile. I'm moving it out to future. If someone is going to grab it to work on, feel free to move it back.

Component: Release Engineering → Release Engineering: Future

Stuart Parmenter

Comment 31

•

16 years ago

There has been no activity because no one on build has been working on it.  There is patch that is ready to land, can we just do that?

Component: Release Engineering: Future → Release Engineering

bhearsum@mozilla.com (:bhearsum)

Comment 32

•

16 years ago

(In reply to comment #31)
> There has been no activity because no one on build has been working on it. 
> There is patch that is ready to land, can we just do that?

Deferring to Alice on this.

alice nodelman [:alice] [:anode]

Assignee

Comment 33

•

16 years ago

Patch has been checked in and landed.

What's missing there is running this in an automated fashion, as described in comment #22 - that's the future part of the work.

bhearsum@mozilla.com (:bhearsum)

Updated

•

16 years ago

Attachment #315794 - Flags: checked‑in+

bhearsum@mozilla.com (:bhearsum)

Comment 34

•

16 years ago

I'm moving this back to future as per comment #33

Component: Release Engineering → Release Engineering: Future

Stuart Parmenter

Comment 35

•

16 years ago

Any ETA on when this will happen?

Lukas Blakk [:lsblakk] use ?needinfo

Updated

•

15 years ago

Summary: talos should measure x resources → Create new Talos suite: talos should measure x resources

alice nodelman [:alice] [:anode]

Assignee

Comment 36

•

15 years ago

Resurrected as part of q4 talos test creation push.

Component: Release Engineering: Future → Release Engineering

Priority: -- → P2

alice nodelman [:alice] [:anode]

Assignee

Updated

•

15 years ago

Assignee: nobody → anodelman

alice nodelman [:alice] [:anode]

Assignee

Comment 37

•

15 years ago

Attached patch update graph server with new xres counter — Details — Splinter Review

Attachment #411844 - Flags: review?(lsblakk)

alice nodelman [:alice] [:anode]

Assignee

Comment 38

•

15 years ago

Some difficulty getting xres onto the talos slaves (mostly due to them now being on a pretty old version of ubuntu - gutsy).  Currently using:

https://launchpad.net/ubuntu/+source/xrestop/0.4-1/+build/212407/+files/xrestop_0.4-1_i386.deb

and 'dpkg -i xrestop_0.4-1_i386.deb'

We'll see what kind of numbers we get out of that on stage.

Lukas Blakk [:lsblakk] use ?needinfo

Updated

•

15 years ago

Attachment #411844 - Flags: review?(lsblakk) → review+

alice nodelman [:alice] [:anode]

Assignee

Comment 39

•

15 years ago

Attached patch [checked in]talos changes to make xres run on linux only — Details — Splinter Review

Attachment #411855 - Flags: review?(lsblakk)

Lukas Blakk [:lsblakk] use ?needinfo

Updated

•

15 years ago

Attachment #411855 - Flags: review?(lsblakk) → review+

alice nodelman [:alice] [:anode]

Assignee

Updated

•

15 years ago

Depends on: 528591

alice nodelman [:alice] [:anode]

Assignee

Comment 40

•

15 years ago

Installed xrestop on all production & try talos linux slaves (except
talos-rev2-linux04 which is unreponsive.

Also, graph server changes pushed to production.

This is now ready for roll out.

alice nodelman [:alice] [:anode]

Assignee

Updated

•

15 years ago

Depends on: 528599

alice nodelman [:alice] [:anode]

Assignee

Updated

•

15 years ago

Depends on: 528642

alice nodelman [:alice] [:anode]

Assignee

Comment 41

•

15 years ago

Comment on attachment 411855 [details] [diff] [review]
[checked in]talos changes to make xres run on linux only

Checking in sample.config;
/cvsroot/mozilla/testing/performance/talos/sample.config,v  <--  sample.config
new revision: 1.38; previous revision: 1.37
done
Checking in ttest.py;
/cvsroot/mozilla/testing/performance/talos/ttest.py,v  <--  ttest.py
new revision: 1.36; previous revision: 1.35
done
Checking in run_tests.py;
/cvsroot/mozilla/testing/performance/talos/run_tests.py,v  <--  run_tests.py
new revision: 1.54; previous revision: 1.53
done

Attachment #411855 - Attachment description: talos changes to make xres run on linux only → [checked in]talos changes to make xres run on linux only

Attachment #411855 - Flags: checked-in+

alice nodelman [:alice] [:anode]

Assignee

Comment 42

•

15 years ago

Configured talos-rev2-linux04 now that it is up and reachable.  That should finish things here.

Status: NEW → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

Karl Tomlinson (:karlt)

Updated

•

15 years ago

Depends on: 531690

Nobody; OK to take it and work on it

Updated

•

11 years ago

Product: mozilla.org → Release Engineering

You need to log in before you can comment on or make changes to this bug.