Closed Bug 609111 Opened 14 years ago Closed 12 years ago

Talos should run xperf during its tests

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: sdwilsh, Assigned: jmaher)

References

(Blocks 1 open bug,
URL
)

Details

Attachments

(6 files, 9 obsolete files)

initial code to allow talos to run with xperf (0.5) 13 years ago Joel Maher ( :jmaher ) (UTC -8) 19.50 KB, patch		Details \| Diff \| Splinter Review
talos patch to run xperf when given the --xperf_path command line (1.0) 13 years ago Joel Maher ( :jmaher ) (UTC -8) 14.28 KB, patch		Details \| Diff \| Splinter Review
xperf tools to be used with talos (1.0) 13 years ago Joel Maher ( :jmaher ) (UTC -8) 15.39 KB, patch	anodelman : review+ cmtalbert : feedback+	Details \| Diff \| Splinter Review
talos patch to run xperf when given the --xperf_path command line (2.0) 13 years ago Joel Maher ( :jmaher ) (UTC -8) 13.86 KB, patch		Details \| Diff \| Splinter Review
talos patch to run xperf when given the --xperf_path command line (3.0) 13 years ago Joel Maher ( :jmaher ) (UTC -8) 11.85 KB, patch		Details \| Diff \| Splinter Review
xperf tools to be used with talos (2.0) 13 years ago Joel Maher ( :jmaher ) (UTC -8) 15.39 KB, patch		Details \| Diff \| Splinter Review
xperf tools to be used with talos (2.1) 13 years ago Joel Maher ( :jmaher ) (UTC -8) 16.18 KB, patch	cmtalbert : review+ anodelman : feedback+	Details \| Diff \| Splinter Review
talos patch to run xperf when given the --xperf_path command line (3.1) 13 years ago Joel Maher ( :jmaher ) (UTC -8) 9.88 KB, patch	anodelman : review+	Details \| Diff \| Splinter Review
add xperf to buildbot-configs (0.5) 13 years ago Joel Maher ( :jmaher ) (UTC -8) 2.40 KB, patch		Details \| Diff \| Splinter Review
talos patch to run xperf when given the --xperf_path command line (4.0) 13 years ago Joel Maher ( :jmaher ) (UTC -8) 18.54 KB, patch		Details \| Diff \| Splinter Review
add xperf to buildbot-configs (1.0) 13 years ago Joel Maher ( :jmaher ) (UTC -8) 18.13 KB, patch	armenzg : review+ anodelman : feedback+	Details \| Diff \| Splinter Review
(checked-in) add xperf to buildbot-configs (2.0) 13 years ago Joel Maher ( :jmaher ) (UTC -8) 3.17 KB, patch	armenzg : review+	Details \| Diff \| Splinter Review
allow xperf binary to have spaces in the path (1.0) 13 years ago Joel Maher ( :jmaher ) (UTC -8) 3.72 KB, patch	anodelman : review+ armenzg : feedback+	Details \| Diff \| Splinter Review
(checked in) fix path to xperf.exe in config.py (1.0) 13 years ago Joel Maher ( :jmaher ) (UTC -8) 1.18 KB, patch	armenzg : review+	Details \| Diff \| Splinter Review
add xperf to try 13 years ago Armen [:armenzg] 1.07 KB, patch	jmaher : review+	Details \| Diff \| Splinter Review

Shawn Wilsher :sdwilsh

Reporter

Description

•

14 years ago

It'd be really awesome if talos ran xperf during its performance tests so we can get data lots of things that we'd like to track.  I'll keep the things we want to track out of this bug for now (we can do those in follow-ups) and keep this just as getting it running for the talos run.

bhearsum@mozilla.com (:bhearsum)

Comment 1

•

14 years ago

New test suites are tracked in Testing: Talos these days. When Talos learns how to run it there may need to be a RelEng bug for a small bit of integration.

Component: Release Engineering → Talos

Product: mozilla.org → Testing

QA Contact: release → talos

Version: other → Trunk

Shawn Wilsher :sdwilsh

Reporter

Comment 2

•

14 years ago

Although, it looks like http://hg.mozilla.org/build/talos/file/ce9b0413fbf1/ttest.py#l283 would make this a bit difficult and not as useful for tests like Ts.

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Comment 3

•

14 years ago

(In reply to comment #3)
> It'd be really awesome if talos ran xperf during its performance tests so we
> can get data lots of things that we'd like to track.  I'll keep the things we
> want to track out of this bug for now (we can do those in follow-ups) and keep
> this just as getting it running for the talos run.

sdwilsh: what impact would running xperf have on Talos perf results?

I ask because I'm trying to understand if you want all Talos suites run with xperf? Or if you want all Talos suites run twice - once with xperf, and once without?

Shawn Wilsher :sdwilsh

Reporter

Comment 4

•

14 years ago

(In reply to comment #3)
> I ask because I'm trying to understand if you want all Talos suites run with
> xperf? Or if you want all Talos suites run twice - once with xperf, and once
> without?
That's a great question that I'd need to collect data to give you a good answer on.  If we want to be paranoid, we'd do things twice, but I suspect we could get away with just one run with xperf.

(dormant account)

Updated

•

14 years ago

Blocks: 572459

Shawn Wilsher :sdwilsh

Reporter

Updated

•

13 years ago

URL: http://etherpad.mozilla.org:9000/xper...

Armen [:armenzg]

Comment 5

•

13 years ago

As per jmaher we might hear more about this closer to the all hands.

Currently UAC on the Windows slaves in on the way for xperf.

For more details about xperf please read https://developer.mozilla.org/En/Profiling_with_Xperf

(dormant account)

Comment 6

•

13 years ago

You can't adjust UAC notification level?

Armen [:armenzg]

Comment 7

•

13 years ago

We can (not easily) but we should look to see if we should want to be testing in different conditions than our users (that would require a wider audience to give their feedback).

(dormant account)

Comment 8

•

13 years ago

(In reply to comment #7)
> We can (not easily) but we should look to see if we should want to be testing
> in different conditions than our users (that would require a wider audience to
> give their feedback).

I'm not sure what wider audience has to do with technical issues in deploying xperf?

bhearsum@mozilla.com (:bhearsum)

Comment 9

•

13 years ago

(In reply to comment #8)
> (In reply to comment #7)
> > We can (not easily) but we should look to see if we should want to be testing
> > in different conditions than our users (that would require a wider audience to
> > give their feedback).
> 
> I'm not sure what wider audience has to do with technical issues in deploying
> xperf?

We do our best to keep our test machine configurations in-line with real-world user configurations, to make the tests as valid as possible.

Armen [:armenzg]

Comment 10

•

13 years ago

I am referring to disabling UAC on the testing machines.

Shawn Wilsher :sdwilsh

Reporter

Comment 11

•

13 years ago

UAC should have zero impact on performance.  If this is what is blocking us, I'm going to go cry in a corner.

Shawn Wilsher :sdwilsh

Reporter

Updated

•

13 years ago

Depends on: 640829

John Ford [:jhford] CET/CEST Berlin Time

Comment 12

•

13 years ago

(In reply to comment #11)
> UAC should have zero impact on performance.  If this is what is blocking us,
> I'm going to go cry in a corner.

its not performance, its that most (all?) users run with UAC on, and if we don't test with it on it might go unnoticed.

John Ford [:jhford] CET/CEST Berlin Time

Comment 13

•

13 years ago

(In reply to comment #12)
> (In reply to comment #11)
> > UAC should have zero impact on performance.  If this is what is blocking us,
> > I'm going to go cry in a corner.
> 
> its not performance, its that most (all?) users run with UAC on, and if we
> don't test with it on it might go unnoticed.

the last "it" should be "UAC related breakage", sorry for the bugspam

Brendan Eich [:brendan]

Comment 14

•

13 years ago

Why is it either/or? Can we run with and without UAC? At least one box without?

/be

Chris AtLee [:catlee]

Comment 15

•

13 years ago

(In reply to comment #14)
> Why is it either/or? Can we run with and without UAC? At least one box without?
> 
> /be

I don't think we really want to have two sets of machines, some with and some without.

If we can disable UAC for a particular test run that would be best.

Shawn Wilsher :sdwilsh

Reporter

Comment 16

•

13 years ago

(In reply to comment #12)
> its not performance, its that most (all?) users run with UAC on, and if we
> don't test with it on it might go unnoticed.
So the concern is for tests that run on these slaves, not talos, correct?

(note that I know of a number of techy people who turn it off because it is too naggy for them)

(In reply to comment #14)
> Why is it either/or? Can we run with and without UAC? At least one box without?
I strongly suspect that would require different slave pools, which is likely undesirable for RelEng (especially since we cannot get more of the hardware that these tests run on).

John Ford [:jhford] CET/CEST Berlin Time

Comment 17

•

13 years ago

(In reply to comment #15)
> If we can disable UAC for a particular test run that would be best.

If I recall correctly, you have to reboot for a change in UAC to take effect.  If this is correct, it would be much more difficult to schedule tests on the slaves as they would have to reboot half way through the buildbot job, and again at the end of the buildbot job.

(dormant account)

Comment 18

•

13 years ago

(In reply to comment #17)
> (In reply to comment #15)
> > If we can disable UAC for a particular test run that would be best.
> 
> If I recall correctly, you have to reboot for a change in UAC to take effect. 
> If this is correct, it would be much more difficult to schedule tests on the
> slaves as they would have to reboot half way through the buildbot job, and
> again at the end of the buildbot job.

If it makes you feel better. I'd like to reboot before running this test anyway. The upside is that it only needs to be run once unlike our other tests.

Shawn Wilsher :sdwilsh

Reporter

Comment 19

•

13 years ago

(In reply to comment #18)
> If it makes you feel better. I'd like to reboot before running this test
> anyway. The upside is that it only needs to be run once unlike our other tests.
Let's do that as a follow-up please.  Scope creep is going to be the death of this project, so getting xperf running on the boxes first should be all we are looking at.

(dormant account)

Comment 20

•

13 years ago

(In reply to comment #19)

> Let's do that as a follow-up please.  Scope creep is going to be the death of
> this project, so getting xperf running on the boxes first should be all we are
> looking at.

ok. I'm all for whatever helps this get deployed sooner.

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 21

•

13 years ago

Attached patch initial code to allow talos to run with xperf (0.5) (obsolete) — Details — Splinter Review

this is a patch to talos (hg.mozilla.org/build/talos) which will work if you do this:
python PerfConfigurator -e <path>\firefox.exe --activeTests ts --xperf_path c:\<path>\xperf.exe --sampleConfig xperf.config --output win7.yml
python run_tests.py win7.yml

I am currently running on talos (tools) staging to see how this works.  The UAC has been turned off before my testrun and the xperf tools have been installed to c:\perftools (I had trouble referencing 'c:\program files\microsoft windows performance toolkit').

currently for ts (20 iterations of starting up the browser), we generate at 140MB .etl file.

Armen [:armenzg]

Comment 22

•

13 years ago

Maybe we can find a way to exempt xperf.exe from UAC with a manifest or something.

(dormant account)

Comment 23

•

13 years ago

(In reply to comment #22)
> Maybe we can find a way to exempt xperf.exe from UAC with a manifest or
> something.

One option is to run everything in a privileged account ie
start xperf
runas talos as a non-priviledged user
stop xperf
runas nonpriviledged user to postprocess xperf.

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 24

•

13 years ago

Attached patch talos patch to run xperf when given the --xperf_path command line (1.0) (obsolete) — Details — Splinter Review

this patch adds a --xperf_path cli to PerfConfigurator.py which requires the full patch to xperf.exe (i.e. --xperf_path c:/perftools/xperf.exe).

This patch doesn't cause any functional regressions on any other platforms or test types.  I have tested this on the ts:txul:tsvg and tp4 tests.

Next steps are to figure out UAC, xperf tool installation, and get this reviewed.

Assignee: nobody → jmaher

Attachment #518789 - Attachment is obsolete: true

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 25

•

13 years ago

in windows 7 there is a way to set the access control to different levels (effectively 0-4).  By default it is on 4 meaning block everything.  If we set it to 3, we can run xperf, but still get prompts and warnings (if firefox does something malicious) for all system changes and third party stuff.

Armen [:armenzg]

Comment 26

•

13 years ago

jmaher could you give me a screenshot of those warnings?
I have managed to disable crash notifications on Win2008 for 64-bit builds and I assume there should be a way to make that happen for Win7.

Shawn Wilsher :sdwilsh

Reporter

Comment 27

•

13 years ago

(In reply to comment #25)
> in windows 7 there is a way to set the access control to different levels
> (effectively 0-4).  By default it is on 4 meaning block everything.  If we set
> it to 3, we can run xperf, but still get prompts and warnings (if firefox does
> something malicious) for all system changes and third party stuff.
Awesome.  3 still warns about most things, so I think that setting it to 3 should work great for us.

(dormant account)

Updated

•

13 years ago

Blocks: 644744

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 28

•

13 years ago

ok, the talos boxes are already set at this level of reporting and I found out that we actually run the talos process elevated to begin with.  So the UAC issue doesn't seem to be a problem.

Remaining items:
 - get xperf tools installed on talos slaves
 - figure out how to upload the .etl file to the directory with the test results

Shawn Wilsher :sdwilsh

Reporter

Comment 29

•

13 years ago

(In reply to comment #28)
>  - get xperf tools installed on talos slaves
That's bug 640829 (which will hopefully get some action post Q1)

>  - figure out how to upload the .etl file to the directory with the test
> results
Is this really needed?  We could easily process a local file and report the numbers we get.

(dormant account)

Comment 30

•

13 years ago

(In reply to comment #29)
> (In reply to comment #28)
> >  - get xperf tools installed on talos slaves
> That's bug 640829 (which will hopefully get some action post Q1)
> 
> >  - figure out how to upload the .etl file to the directory with the test
> > results
> Is this really needed?  We could easily process a local file and report the
> numbers we get.

(In reply to comment #29)
> (In reply to comment #28)
> >  - get xperf tools installed on talos slaves
> That's bug 640829 (which will hopefully get some action post Q1)
> 
> >  - figure out how to upload the .etl file to the directory with the test
> > results
> Is this really needed?  We could easily process a local file and report the
> numbers we get.

I'd like to have the xperf file available for download.

Armen [:armenzg]

Comment 31

•

13 years ago

I have set up a machine on staging with xperf. I will be meeting with jmaher to make sure it runs properly with buildbot.

Getting the .etl file will have to be discussed later on as those machines do not have keys to upload such file to ftp. FWIW I prefer the post-processing approach.

I have got several questions:
* would we be running xperf by default on every talos job?
** for every suite or a subset of all?
** would we be running suites with/without xperf enabled?
** should xperf jobs have the same priority as other test jobs?
* do we want to run talos with xperf on *every* push?
** just some branches?
** just nightly pushes?
** would we prefer the performance team to request xperf runs upon request?
** would giving try syntax give the value needed?

NOTE: Before answering these questions please have in mind that answering "yes" or "all" is easy but the current capacity on the Win7 pool of slaves is always being outrun on peak hours.

Shawn Wilsher :sdwilsh

Reporter

Comment 32

•

13 years ago

(In reply to comment #31)
> * would we be running xperf by default on every talos job?
> ** for every suite or a subset of all?
I think Ts, Txul, and Tp4 are the tests we need this on for now.  We can expand it later.

> ** would we be running suites with/without xperf enabled?
per comment 4, I think we'll be fine with just running under xperf, but this will likely cause a change in numbers (and thus baseline).  Probably a good idea to run these on staging with and without xperf to compare and then make a decision.

> ** should xperf jobs have the same priority as other test jobs?
Absolutely.  This will be tracking data that we've needed to track for a very long time.

> * do we want to run talos with xperf on *every* push?
Regressions in what we want to measure are just as serious as regressing tp4 or ts, so I would assume yes.

> ** just some branches?
At the very least, mozilla-central (and all the new branches that we use to track releases)

> ** just nightly pushes?
I worry that this would not give us enough data points.

> ** would we prefer the performance team to request xperf runs upon request?
We'd have to watch all pushes closely, which feels faily.

> ** would giving try syntax give the value needed?
Would maybe be useful, but if we don't run non-xperf talos, we'd get it for free when running talos, right?

> NOTE: Before answering these questions please have in mind that answering "yes"
> or "all" is easy but the current capacity on the Win7 pool of slaves is always
> being outrun on peak hours.
This is unfortunate.  Is there a plan to add capacity to it?

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 33

•

13 years ago

can we start figuring out what type of numbers we want to harvest out of the .etl files?  the only way we can trend data is to decide on a set of numbers we want to pull out of the .etl files and report.

(dormant account)

Comment 34

•

13 years ago

(In reply to comment #33)
> can we start figuring out what type of numbers we want to harvest out of the
> .etl files?  the only way we can trend data is to decide on a set of numbers we
> want to pull out of the .etl files and report.

list of files accessed and time spent on their io

Shawn Wilsher :sdwilsh

Reporter

Comment 35

•

13 years ago

(In reply to comment #33)
> can we start figuring out what type of numbers we want to harvest out of the
> .etl files?  the only way we can trend data is to decide on a set of numbers we
> want to pull out of the .etl files and report.
The number of calls to the following I/O (must enable PROC_TRHEAD, LOADER, FILE_IO, and FILE_IO_INIT providers) methods:
Create
DirEnum
Flush
Read
Rename
SetInfo
Write
Additionally, information about the following disk I/O (must enabled DISK_IO provider):
Read Size
Write Size
The above numbers should all be very stable in my testing.  Tracking hard faults (must enabled HARD_FAULTS provider) would also be good, but in my testing they were inconsistent so we may decide it's not worth tracking those.

It's unclear to me at this time if you want the "[Read|Write|Flush] Service Time" from the DISK_IO provider or the "Duration Time" for each of the file I/O methods for taras' list of things.


(and all of the above is only for the firefox process that we are testing)

Shawn Wilsher :sdwilsh

Reporter

Comment 36

•

13 years ago

I totally forgot to mention that I/O counts need to be separated from the main thread (identified by thread 0 IIRC) and non-main thread.

cmtalbert

Comment 37

•

13 years ago

(In reply to comment #32)
> (In reply to comment #31)
> > NOTE: Before answering these questions please have in mind that answering "yes"
> > or "all" is easy but the current capacity on the Win7 pool of slaves is always
> > being outrun on peak hours.
> This is unfortunate.  Is there a plan to add capacity to it?
Yes, but it's complicated.  We can no longer buy the version of minis we use for testing.  New minis do not run windows and linux natively.  We (my team + Zandr from IT) are looking into new slave systems that will replace our windows and linux boxes.  The new system will not obsolete itself nearly as fast as minis and will be more rackable, manageable etc from an IT sense.  I believe IT has finally been able to get the two candidate machines in house (their makers were being unusually difficult), Zandr is verifying them for IT functionality, after that we will take a look at putting them in our automation & tools staging environment to see how easy/difficult/etc it is to configure them like a slave mini.  So, adding capacity is probably a quarter away (at best) before we get these things in a state where they can be rolled out en masse.

I can hunt down bugs for all this later if you're interested, but I wanted to quickly answer your question right now while doing my bugmail.

Armen [:armenzg]

Comment 38

•

13 years ago

(In reply to comment #37)
> (In reply to comment #32)
> > (In reply to comment #31)
> > > NOTE: Before answering these questions please have in mind that answering "yes"
> > > or "all" is easy but the current capacity on the Win7 pool of slaves is always
> > > being outrun on peak hours.
> > This is unfortunate.  Is there a plan to add capacity to it?
> Yes, but it's complicated.  We can no longer buy the version of minis we use
> for testing.  New minis do not run windows and linux natively.  We (my team +
> Zandr from IT) are looking into new slave systems that will replace our windows
> and linux boxes.  The new system will not obsolete itself nearly as fast as
> minis and will be more rackable, manageable etc from an IT sense.  I believe IT
> has finally been able to get the two candidate machines in house (their makers
> were being unusually difficult), Zandr is verifying them for IT functionality,
> after that we will take a look at putting them in our automation & tools
> staging environment to see how easy/difficult/etc it is to configure them like
> a slave mini.  So, adding capacity is probably a quarter away (at best) before
> we get these things in a state where they can be rolled out en masse.
> 
> I can hunt down bugs for all this later if you're interested, but I wanted to
> quickly answer your question right now while doing my bugmail.

ctalbert's explanation is correct and it applies to releng's infrastructure as well. I will be doing a blog post today and few other later trying to calculate the needed capacity.

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 39

•

13 years ago

Comment on attachment 519584 [details] [diff] [review]
talos patch to run xperf when given the --xperf_path command line (1.0)

:anode, this is a working version of the xperf changes.  I wanted to get your feedback on this while we develop the .etl parser code and figure out how to get that pumped through the logs, etc...

Attachment #519584 - Flags: feedback?(anodelman)

Shawn Wilsher :sdwilsh

Reporter

Comment 40

•

13 years ago

(In reply to comment #39)
> :anode, this is a working version of the xperf changes.  I wanted to get your
> feedback on this while we develop the .etl parser code and figure out how to
> get that pumped through the logs, etc...
I forgot to comment here; it looks like nothing like this exists yet, sadly.  Looks like we get to roll our own.

alice nodelman [:alice] [:anode]

Comment 41

•

13 years ago

Comment on attachment 519584 [details] [diff] [review]
talos patch to run xperf when given the --xperf_path command line (1.0)

Looks like you have a partial pref set in the sample.config:
+  talos.logfile: 

I'm a little confused at how the deviceManager is used here to check the platform - I was under the impression that the deviceManager would be None on non-mobile tests.  Shouldn't you be checking platform.system() instead?

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 42

•

13 years ago

Attached patch xperf tools to be used with talos (1.0) (obsolete) — Details — Splinter Review

new files to be added in the xtalos/ (newly created) subdirectory.  These will manage start/stop/parsing of xperf when used with talos.

Once these have landed, we can then work on landing the other patch (updated version coming soon) to make this work.

Attachment #531832 - Flags: review?(anodelman)

Attachment #531832 - Flags: feedback?(ctalbert)

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 43

•

13 years ago

Attached patch talos patch to run xperf when given the --xperf_path command line (2.0) (obsolete) — Details — Splinter Review

updated patch to run xperf using new xtalos toolset.  Addressed previous feedback as well.

Attachment #519584 - Attachment is obsolete: true

Attachment #531917 - Flags: review?(anodelman)

Attachment #519584 - Flags: feedback?(anodelman)

alice nodelman [:alice] [:anode]

Comment 44

•

13 years ago

Comment on attachment 531917 [details] [diff] [review]
talos patch to run xperf when given the --xperf_path command line (2.0)

Can we get the 'print' commands in bcontroller.py run through the util.py (either noisy or debug messages)?  utils.py is mostly an attempt to centralize output from talos so that it is easier to format.

alice nodelman [:alice] [:anode]

Updated

•

13 years ago

Attachment #531832 - Flags: review?(anodelman) → review+

cmtalbert

Comment 45

•

13 years ago

Comment on attachment 531832 [details] [diff] [review]
xperf tools to be used with talos (1.0)

I have many more questions before I can really provide any useful feedback here.

* xperf_providers and xperf_stackwalkers seem like they should be preferences we are sending to xperf.  The providers are what we are pulling out of the instrumentation, right?  I'm not sure why we'd want that to be hard coded in start_xperf.py.
* Ideally provider/stackwalker information should bubble all the way up to the config file that perfconfigurator generates, right?  It seems like it's something we'd want to allow the test to specify and that's where the other test options live.
* After we parse the file, you do an open(etlparser.csv).  If something went wrong in etlparser and that file doesn't exist, then this will throw and traceback.  At the very least we need to catch that and output a useful message that can be picked up by the log so that people have a clue as to where they need to start debugging.
* In etlparser.py why do you have two different methods of reading the file?  Why not use the "giant file method" all the time?  It seems it would simplify the code and make things more robust.
* If giant files are the norm with xperf, then why do we believe that we can read the generated etlparser.csv file directly into memory and output it to our log?  Do we need to anticipate that the generated CSV file may also be a giant file and handle it properly in bcontroller?

So that's my feedback so far.  The approach itself looks good, but lots of questions about the implicit assumptions being made here.

Attachment #531832 - Flags: feedback?(ctalbert) → feedback+

Shawn Wilsher :sdwilsh

Reporter

Comment 46

•

13 years ago

(In reply to comment #45)
> * If giant files are the norm with xperf, then why do we believe that we can
> read the generated etlparser.csv file directly into memory and output it to our
> log?  Do we need to anticipate that the generated CSV file may also be a giant
> file and handle it properly in bcontroller?
The csv file is just a summary of the data from the ETL so it's substantially smaller.

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 47

•

13 years ago

the etl and csv sizes are interesting.

a tp4 run had a:
400MB etl file
850MB .csv file (full parser)
70MB  .csv file (filtered just for firefox.exe)

the first file parser doeesn't read .csv format, really it just looks for processname and the headers.  The other file parser is more complete and actually does a csv read.  It would be possible to beef up the parser and then reuse it.  

good idea about the providers and stackwalkers.  Those need to be set when you start xperf.  I will look into options for passing that in from the config file.

the other stuff is simple things I overlooked, thanks for catching it.

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 48

•

13 years ago

Attached patch talos patch to run xperf when given the --xperf_path command line (3.0) (obsolete) — Details — Splinter Review

updated patch as a WIP to address feedback from prior review and to work with bug 657987.  Will update this patch as I get more testing with it.

Attachment #531917 - Attachment is obsolete: true

Attachment #531917 - Flags: review?(anodelman)

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 49

•

13 years ago

Attached patch xperf tools to be used with talos (2.0) (obsolete) — Details — Splinter Review

updated xtalos patch, this resolves the feedback related to the file related handling and parsing, but doesn't fully resolve the hardcoding of the providers and stakwalk variables.  Those are mostly handled in other patches, but I need to finish this and test it.  Updated patch coming before too long.

Attachment #531832 - Attachment is obsolete: true

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 50

•

13 years ago

Attached patch xperf tools to be used with talos (2.1) — Details — Splinter Review

updated patch to get xperf start variables from the .config file, also updated to do a single pass on the .etl generated csv file.

Attachment #534520 - Attachment is obsolete: true

Attachment #535253 - Flags: review?(ctalbert)

Attachment #535253 - Flags: feedback?(anodelman)

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 51

•

13 years ago

Attached patch talos patch to run xperf when given the --xperf_path command line (3.1) (obsolete) — Details — Splinter Review

talos patch to do the xperf_path and run the xtalos toolchain.  Also changes to the sample.config.

Attachment #534519 - Attachment is obsolete: true

Attachment #535254 - Flags: review?(anodelman)

cmtalbert

Comment 52

•

13 years ago

Comment on attachment 535253 [details] [diff] [review]
xperf tools to be used with talos (2.1)

Review of attachment 535253 [details] [diff] [review]:
-----------------------------------------------------------------

Overall, this looks good.  I have a couple of nits, but nothing really major. r+

::: xtalos/etlparser.py
@@ +45,5 @@
> +eventname_index = 0
> +diskbytes_col = "Size"
> +proc_col = "Process Name ( PID)"
> +fname_col = "FileName"
> +

Nit: usually constants like this are in UPPER_CASE_WITH_UNDERSCORES :)
Also, proc_col isn't used anywhere.

@@ +176,5 @@
> +  processing_options = []
> +  xperf_cmd = '%s -i %s -o %s.csv %s' % \
> +              (options.xperf_tool,
> +               options.etl_filename,
> +               options.etl_filename,

given the default options over in xtalos - this is going to result in the output file name being "output.etl.csv" is that what you're intending?

@@ +182,5 @@
> +
> +  if (options.debug_level >= xtalos.DEBUG_INFO):
> +    print "executing '%s'" % xperf_cmd
> +  os.system(xperf_cmd)
> +  return options.etl_filename + ".csv"

Note the above comment.

@@ +199,5 @@
> +  if (options.processName is None):
> +    options.processName = ""
> +
> +  csvname = etl2csv(options)
> +  filterByProcName(csvname, options.processName)

oh I see, the csv is only a temporary file, so maybe that's why it's name doesn't matter.  We should perhaps make that clearer with a comment?

@@ +208,5 @@
> +
> +  if options.outputFile:
> +    print "filename, readcount, readbytes, writecount, writebytes"
> +  else:
> +    print output

Output isn't defined until after the for loop below us.  So this else case is either going to traceback or print nothing.

::: xtalos/start_xperf.py
@@ +54,5 @@
> +
> +  if not options.stackwalk:
> +    print "No xperf stackwalk options given"
> +    sys.exit(1)
> +

Nit: you have these option verification here and more over in etlparser.  Why not do this all in parser.verifyOptions so that all this verification logic is in one place?  There may be some value in having it local to each file, but I dunno.  I think it makes sense to verify them in the verifyOptions method since we call that from everywhere already.

::: xtalos/xtalos.py
@@ +62,5 @@
> +
> +    self.add_option("-e", "--etl_filename",
> +                    action = "store", dest = "etl_filename",
> +                    help = "Name of the .etl file to work with. Defaults to 'output.etl'")
> +    defaults["etl_filename"] = "output.etl"

The code over in etlparser seems to depend on this being a file name both with and without the extension.

It's probably ok as long as we're cool with our csv being named: "output.etl.csv".

Attachment #535253 - Flags: review?(ctalbert) → review+

alice nodelman [:alice] [:anode]

Updated

•

13 years ago

Attachment #535253 - Flags: feedback?(anodelman) → feedback+

alice nodelman [:alice] [:anode]

Comment 53

•

13 years ago

Comment on attachment 535254 [details] [diff] [review]
talos patch to run xperf when given the --xperf_path command line (3.1)

This looks good to me, though I might like having the command line options for start_xperf/stop_eperf/etlparser stored away somewhere so that they can be more easily found and altered - right now you'd have to hunt through the code to see how they are run.

Attachment #535254 - Flags: review?(anodelman) → review+

alice nodelman [:alice] [:anode]

Updated

•

13 years ago

Depends on: 659512

alice nodelman [:alice] [:anode]

Comment 54

•

13 years ago

Setting the lack of current testing master as blocking this bug.

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 55

•

13 years ago

Attached patch add xperf to buildbot-configs (0.5) (obsolete) — Details — Splinter Review

alice nodelman [:alice] [:anode]

Comment 56

•

13 years ago

From a discussion on irc, we are going to block this bug on the rollout of tp5.  Since we are currently in the transition period between tp4 and tp5 upsetting the numbers right now would just cause confusion.  Once tp5 is stable and tp4 is disabled we'll be comfortable altering tp5 to run with xperf - possibly over a similar deployment period as tp5 itself.

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 57

•

13 years ago

but while we are waiting for tp4 to be turned off and tp5 to stabilize, we will run these patches in staging and have the buildbot configs ready to go so we just have to flip a switch.

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 58

•

13 years ago

Attached patch talos patch to run xperf when given the --xperf_path command line (4.0) — Details — Splinter Review

updated patch to create a xperf.config file and only run the tests as a separate test and not upload results to graph server (buildbot config will determine this).  This isolates the changes well and upon landing and turning on xperf we will not need to side by side compare.

Attachment #535254 - Attachment is obsolete: true

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 59

•

13 years ago

Attached patch add xperf to buildbot-configs (1.0) (obsolete) — Details — Splinter Review

add xperf as a new test type to mozilla central and try only for win7 platform.

Attachment #539806 - Attachment is obsolete: true

Attachment #545983 - Flags: review?(armenzg)

Attachment #545983 - Flags: feedback?(anodelman)

Armen [:armenzg]

Comment 60

•

13 years ago

Comment on attachment 545983 [details] [diff] [review]
add xperf to buildbot-configs (1.0)

It looks good.

You should mark the "try" entry to be False as it controls the coalescing (that should mean that paint_tests is set incorrectly).

I assume this will be landed once xperf gets deployed, right?

Attachment #545983 - Flags: review?(armenzg) → review+

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 61

•

13 years ago

I was under the impression all the win7 slaves had c:\perftools\xperf.exe and libraries installed.  Is there a chance we can get the xperf toolchain installed in the next week or so?

One thought I had was to land this patch with xperf turned off on all platforms.  Then when the machines are ready, we just have to change one or two lines.  That would prevent bitrot!

alice nodelman [:alice] [:anode]

Comment 62

•

13 years ago

Comment on attachment 545983 [details] [diff] [review]
add xperf to buildbot-configs (1.0)

Glad that you figured out the win-7 only issue!

Attachment #545983 - Flags: feedback?(anodelman) → feedback+

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 63

•

13 years ago

landed patches on talos tree:
http://hg.mozilla.org/build/talos/rev/f90bce780b65
http://hg.mozilla.org/build/talos/rev/c5e493459dd0

alice nodelman [:alice] [:anode]

Updated

•

13 years ago

Depends on: 672192

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 64

•

13 years ago

Attached patch (checked-in) add xperf to buildbot-configs (2.0) — Details — Splinter Review

updated to the new buildbot config changes.  Still needs some testing, but this is a much smaller and more straightforward patch!

Attachment #545983 - Attachment is obsolete: true

Attachment #554919 - Flags: review?(armenzg)

Armen [:armenzg]

Comment 65

•

13 years ago

Comment on attachment 554919 [details] [diff] [review]
(checked-in) add xperf to buildbot-configs (2.0)

Looks good. I can get it in tomorrow's reconfig.

Attachment #554919 - Flags: review?(armenzg) → review+

Armen [:armenzg]

Comment 66

•

13 years ago

sdwilsh which branches should we be running this against?

Armen [:armenzg]

Comment 67

•

13 years ago

Comment on attachment 554919 [details] [diff] [review]
(checked-in) add xperf to buildbot-configs (2.0)

This has been checked-in and deployed on production masters:
http://hg.mozilla.org/build/buildbot-configs/rev/6d28b5dc34fe

Attachment #554919 - Attachment description: add xperf to buildbot-configs (2.0) → (checked-in) add xperf to buildbot-configs (2.0)

(dormant account)

Comment 68

•

13 years ago

(In reply to Armen Zambrano G. [:armenzg] - Release Engineer from comment #66)
> sdwilsh which branches should we be running this against?

mozilla-central and try

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 69

•

13 years ago

ok, this is running on tinderbox for m-c, so the buildbot changes are working:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1314124879.1314125216.1715.gz&fulltext=1

I am not seeing the data in the autolog server we setup to collect it all.  I suspect I am trying to talk to the server incorrectly:
http://hg.mozilla.org/build/talos/file/ff3b05d67fcb/xtalos/etlparser.py#l82

The server is brasstacks.mozilla.com which lives on MPT, but that IP address is the ip address for the MV-Office network.  

My understanding is the tinderbox slaves live on Build-VPN and can talk to MPT machines?  We might just need to adjust this address to be correct.

Armen [:armenzg]

Comment 70

•

13 years ago

It is showing up on tbpl:
tbpl.mozilla.org/?tree=Firefox&jobname=xperf

jmaher do you have a test case for me to try in one of the slaves?

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 71

•

13 years ago

Attached patch allow xperf binary to have spaces in the path (1.0) — Details — Splinter Review

our previous assumption that xperf.exe would live in c:\perftools\xperf.exe is not valid and we need to allow for it to live in something like: "c:\program files\microsoft windows performance tools\xperf.exe".

This patch adds quotes around the places where we reference xperf.exe.

Attachment #555912 - Flags: review?(anodelman)

Attachment #555912 - Flags: feedback?(armenzg)

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 72

•

13 years ago

Attached patch (checked in) fix path to xperf.exe in config.py (1.0) — Details — Splinter Review

simple patch to adjust the path we pass in for --xperf_path to talos when running the xperf tests.

Attachment #555916 - Flags: review?(armenzg)

alice nodelman [:alice] [:anode]

Updated

•

13 years ago

Attachment #555912 - Flags: review?(anodelman) → review+

Armen [:armenzg]

Updated

•

13 years ago

Attachment #555912 - Flags: feedback?(armenzg) → feedback+

Armen [:armenzg]

Updated

•

13 years ago

Attachment #555916 - Flags: review?(armenzg) → review+

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 73

•

13 years ago

buildbot-configs (expecting this will get picked up in the reconfig on Tuesday Aug 30th):
http://hg.mozilla.org/build/buildbot-configs/rev/1389371a6201

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Updated

•

13 years ago

Depends on: 682301

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 74

•

13 years ago

landed the talos changes:
http://hg.mozilla.org/build/talos/rev/cdead2ea8ba8

When talos.zip (http://people.mozilla.org/~jmaher/taloszips/cdead2ea8ba8/talos.zip, bug 682301) is deployed, we can then make use of the buildbot-configs change and we should start seeing xperf results from buildbot.

Armen [:armenzg]

Updated

•

13 years ago

Attachment #555916 - Attachment description: fix path to xperf.exe in config.py (1.0) → (checked in) fix path to xperf.exe in config.py (1.0)

Armen [:armenzg]

Comment 75

•

13 years ago

The talos.zip has been deployed.
The buildbot-configs change will be picked up tomorrow's scheduled reconfig.

Armen [:armenzg]

Comment 76

•

13 years ago

This has been live since Thursday and is showing up on tbpl.

try support is next.

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 77

•

13 years ago

once we have try support, we can close this bug.

Armen [:armenzg]

Comment 78

•

13 years ago

Attached patch add xperf to try — Details — Splinter Review

Attachment #560266 - Flags: review?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 79

•

13 years ago

Comment on attachment 560266 [details] [diff] [review]
add xperf to try

Review of attachment 560266 [details] [diff] [review]:
-----------------------------------------------------------------

I like small patches!  do we need to do anything to try-chooser?

Attachment #560266 - Flags: review?(jmaher) → review+

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Updated

•

12 years ago

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.