Last Comment Bug 609111 - Talos should run xperf during its tests
: Talos should run xperf during its tests
Status: RESOLVED FIXED
:
Product: Testing
Classification: Components
Component: Talos (show other bugs)
: Trunk
: x86 Windows 7
: -- normal with 1 vote (vote)
: ---
Assigned To: Joel Maher ( :jmaher)
:
:
Mentors:
http://etherpad.mozilla.org:9000/xper...
Depends on: 640829 659512 672192 682301
Blocks: 572459 644744
  Show dependency treegraph
 
Reported: 2010-11-02 14:06 PDT by Shawn Wilsher :sdwilsh
Modified: 2012-05-04 12:13 PDT (History)
17 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
initial code to allow talos to run with xperf (0.5) (19.50 KB, patch)
2011-03-11 12:29 PST, Joel Maher ( :jmaher)
no flags Details | Diff | Splinter Review
talos patch to run xperf when given the --xperf_path command line (1.0) (14.28 KB, patch)
2011-03-15 20:25 PDT, Joel Maher ( :jmaher)
no flags Details | Diff | Splinter Review
xperf tools to be used with talos (1.0) (15.39 KB, patch)
2011-05-11 19:07 PDT, Joel Maher ( :jmaher)
anodelman: review+
cmtalbert: feedback+
Details | Diff | Splinter Review
talos patch to run xperf when given the --xperf_path command line (2.0) (13.86 KB, patch)
2011-05-12 06:31 PDT, Joel Maher ( :jmaher)
no flags Details | Diff | Splinter Review
talos patch to run xperf when given the --xperf_path command line (3.0) (11.85 KB, patch)
2011-05-23 12:41 PDT, Joel Maher ( :jmaher)
no flags Details | Diff | Splinter Review
xperf tools to be used with talos (2.0) (15.39 KB, patch)
2011-05-23 12:43 PDT, Joel Maher ( :jmaher)
no flags Details | Diff | Splinter Review
xperf tools to be used with talos (2.1) (16.18 KB, patch)
2011-05-25 18:56 PDT, Joel Maher ( :jmaher)
cmtalbert: review+
anodelman: feedback+
Details | Diff | Splinter Review
talos patch to run xperf when given the --xperf_path command line (3.1) (9.88 KB, patch)
2011-05-25 18:57 PDT, Joel Maher ( :jmaher)
anodelman: review+
Details | Diff | Splinter Review
add xperf to buildbot-configs (0.5) (2.40 KB, patch)
2011-06-16 09:16 PDT, Joel Maher ( :jmaher)
no flags Details | Diff | Splinter Review
talos patch to run xperf when given the --xperf_path command line (4.0) (18.54 KB, patch)
2011-07-14 13:01 PDT, Joel Maher ( :jmaher)
no flags Details | Diff | Splinter Review
add xperf to buildbot-configs (1.0) (18.13 KB, patch)
2011-07-14 13:12 PDT, Joel Maher ( :jmaher)
armenzg: review+
anodelman: feedback+
Details | Diff | Splinter Review
(checked-in) add xperf to buildbot-configs (2.0) (3.17 KB, patch)
2011-08-22 11:49 PDT, Joel Maher ( :jmaher)
armenzg: review+
Details | Diff | Splinter Review
allow xperf binary to have spaces in the path (1.0) (3.72 KB, patch)
2011-08-25 18:20 PDT, Joel Maher ( :jmaher)
anodelman: review+
armenzg: feedback+
Details | Diff | Splinter Review
(checked in) fix path to xperf.exe in config.py (1.0) (1.18 KB, patch)
2011-08-25 18:28 PDT, Joel Maher ( :jmaher)
armenzg: review+
Details | Diff | Splinter Review
add xperf to try (1.07 KB, patch)
2011-09-14 15:43 PDT, Armen Zambrano [:armenzg] (EDT/UTC-4)
jmaher: review+
Details | Diff | Splinter Review

Description Shawn Wilsher :sdwilsh 2010-11-02 14:06:58 PDT
It'd be really awesome if talos ran xperf during its performance tests so we can get data lots of things that we'd like to track.  I'll keep the things we want to track out of this bug for now (we can do those in follow-ups) and keep this just as getting it running for the talos run.
Comment 1 Ben Hearsum (:bhearsum) 2010-11-02 14:11:52 PDT
New test suites are tracked in Testing: Talos these days. When Talos learns how to run it there may need to be a RelEng bug for a small bit of integration.
Comment 2 Shawn Wilsher :sdwilsh 2010-11-02 14:14:46 PDT
Although, it looks like http://hg.mozilla.org/build/talos/file/ce9b0413fbf1/ttest.py#l283 would make this a bit difficult and not as useful for tests like Ts.
Comment 3 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2010-11-02 15:28:29 PDT
(In reply to comment #3)
> It'd be really awesome if talos ran xperf during its performance tests so we
> can get data lots of things that we'd like to track.  I'll keep the things we
> want to track out of this bug for now (we can do those in follow-ups) and keep
> this just as getting it running for the talos run.

sdwilsh: what impact would running xperf have on Talos perf results?

I ask because I'm trying to understand if you want all Talos suites run with xperf? Or if you want all Talos suites run twice - once with xperf, and once without?
Comment 4 Shawn Wilsher :sdwilsh 2010-11-02 15:33:47 PDT
(In reply to comment #3)
> I ask because I'm trying to understand if you want all Talos suites run with
> xperf? Or if you want all Talos suites run twice - once with xperf, and once
> without?
That's a great question that I'd need to collect data to give you a good answer on.  If we want to be paranoid, we'd do things twice, but I suspect we could get away with just one run with xperf.
Comment 5 Armen Zambrano [:armenzg] (EDT/UTC-4) 2011-03-10 12:37:05 PST
As per jmaher we might hear more about this closer to the all hands.

Currently UAC on the Windows slaves in on the way for xperf.

For more details about xperf please read https://developer.mozilla.org/En/Profiling_with_Xperf
Comment 6 (dormant account) 2011-03-10 12:51:29 PST
You can't adjust UAC notification level?
Comment 7 Armen Zambrano [:armenzg] (EDT/UTC-4) 2011-03-10 12:55:46 PST
We can (not easily) but we should look to see if we should want to be testing in different conditions than our users (that would require a wider audience to give their feedback).
Comment 8 (dormant account) 2011-03-10 12:57:38 PST
(In reply to comment #7)
> We can (not easily) but we should look to see if we should want to be testing
> in different conditions than our users (that would require a wider audience to
> give their feedback).

I'm not sure what wider audience has to do with technical issues in deploying xperf?
Comment 9 Ben Hearsum (:bhearsum) 2011-03-10 13:00:07 PST
(In reply to comment #8)
> (In reply to comment #7)
> > We can (not easily) but we should look to see if we should want to be testing
> > in different conditions than our users (that would require a wider audience to
> > give their feedback).
> 
> I'm not sure what wider audience has to do with technical issues in deploying
> xperf?

We do our best to keep our test machine configurations in-line with real-world user configurations, to make the tests as valid as possible.
Comment 10 Armen Zambrano [:armenzg] (EDT/UTC-4) 2011-03-10 13:08:12 PST
I am referring to disabling UAC on the testing machines.
Comment 11 Shawn Wilsher :sdwilsh 2011-03-10 14:36:15 PST
UAC should have zero impact on performance.  If this is what is blocking us, I'm going to go cry in a corner.
Comment 12 John Ford [:jhford] CET/CEST Berlin Time 2011-03-10 16:54:12 PST
(In reply to comment #11)
> UAC should have zero impact on performance.  If this is what is blocking us,
> I'm going to go cry in a corner.

its not performance, its that most (all?) users run with UAC on, and if we don't test with it on it might go unnoticed.
Comment 13 John Ford [:jhford] CET/CEST Berlin Time 2011-03-10 16:55:18 PST
(In reply to comment #12)
> (In reply to comment #11)
> > UAC should have zero impact on performance.  If this is what is blocking us,
> > I'm going to go cry in a corner.
> 
> its not performance, its that most (all?) users run with UAC on, and if we
> don't test with it on it might go unnoticed.

the last "it" should be "UAC related breakage", sorry for the bugspam
Comment 14 Brendan Eich [:brendan] 2011-03-10 16:58:43 PST
Why is it either/or? Can we run with and without UAC? At least one box without?

/be
Comment 15 Chris AtLee [:catlee] 2011-03-10 17:57:56 PST
(In reply to comment #14)
> Why is it either/or? Can we run with and without UAC? At least one box without?
> 
> /be

I don't think we really want to have two sets of machines, some with and some without.

If we can disable UAC for a particular test run that would be best.
Comment 16 Shawn Wilsher :sdwilsh 2011-03-10 18:01:19 PST
(In reply to comment #12)
> its not performance, its that most (all?) users run with UAC on, and if we
> don't test with it on it might go unnoticed.
So the concern is for tests that run on these slaves, not talos, correct?

(note that I know of a number of techy people who turn it off because it is too naggy for them)

(In reply to comment #14)
> Why is it either/or? Can we run with and without UAC? At least one box without?
I strongly suspect that would require different slave pools, which is likely undesirable for RelEng (especially since we cannot get more of the hardware that these tests run on).
Comment 17 John Ford [:jhford] CET/CEST Berlin Time 2011-03-11 09:41:37 PST
(In reply to comment #15)
> If we can disable UAC for a particular test run that would be best.

If I recall correctly, you have to reboot for a change in UAC to take effect.  If this is correct, it would be much more difficult to schedule tests on the slaves as they would have to reboot half way through the buildbot job, and again at the end of the buildbot job.
Comment 18 (dormant account) 2011-03-11 09:42:46 PST
(In reply to comment #17)
> (In reply to comment #15)
> > If we can disable UAC for a particular test run that would be best.
> 
> If I recall correctly, you have to reboot for a change in UAC to take effect. 
> If this is correct, it would be much more difficult to schedule tests on the
> slaves as they would have to reboot half way through the buildbot job, and
> again at the end of the buildbot job.

If it makes you feel better. I'd like to reboot before running this test anyway. The upside is that it only needs to be run once unlike our other tests.
Comment 19 Shawn Wilsher :sdwilsh 2011-03-11 09:53:29 PST
(In reply to comment #18)
> If it makes you feel better. I'd like to reboot before running this test
> anyway. The upside is that it only needs to be run once unlike our other tests.
Let's do that as a follow-up please.  Scope creep is going to be the death of this project, so getting xperf running on the boxes first should be all we are looking at.
Comment 20 (dormant account) 2011-03-11 09:56:39 PST
(In reply to comment #19)

> Let's do that as a follow-up please.  Scope creep is going to be the death of
> this project, so getting xperf running on the boxes first should be all we are
> looking at.

ok. I'm all for whatever helps this get deployed sooner.
Comment 21 Joel Maher ( :jmaher) 2011-03-11 12:29:50 PST
Created attachment 518789 [details] [diff] [review]
initial code to allow talos to run with xperf (0.5)

this is a patch to talos (hg.mozilla.org/build/talos) which will work if you do this:
python PerfConfigurator -e <path>\firefox.exe --activeTests ts --xperf_path c:\<path>\xperf.exe --sampleConfig xperf.config --output win7.yml
python run_tests.py win7.yml

I am currently running on talos (tools) staging to see how this works.  The UAC has been turned off before my testrun and the xperf tools have been installed to c:\perftools (I had trouble referencing 'c:\program files\microsoft windows performance toolkit').

currently for ts (20 iterations of starting up the browser), we generate at 140MB .etl file.
Comment 22 Armen Zambrano [:armenzg] (EDT/UTC-4) 2011-03-11 13:52:04 PST
Maybe we can find a way to exempt xperf.exe from UAC with a manifest or something.
Comment 23 (dormant account) 2011-03-11 15:40:57 PST
(In reply to comment #22)
> Maybe we can find a way to exempt xperf.exe from UAC with a manifest or
> something.

One option is to run everything in a privileged account ie
start xperf
runas talos as a non-priviledged user
stop xperf
runas nonpriviledged user to postprocess xperf.
Comment 24 Joel Maher ( :jmaher) 2011-03-15 20:25:50 PDT
Created attachment 519584 [details] [diff] [review]
talos patch to run xperf when given the --xperf_path command line (1.0)

this patch adds a --xperf_path cli to PerfConfigurator.py which requires the full patch to xperf.exe (i.e. --xperf_path c:/perftools/xperf.exe).

This patch doesn't cause any functional regressions on any other platforms or test types.  I have tested this on the ts:txul:tsvg and tp4 tests.

Next steps are to figure out UAC, xperf tool installation, and get this reviewed.
Comment 25 Joel Maher ( :jmaher) 2011-03-17 06:17:34 PDT
in windows 7 there is a way to set the access control to different levels (effectively 0-4).  By default it is on 4 meaning block everything.  If we set it to 3, we can run xperf, but still get prompts and warnings (if firefox does something malicious) for all system changes and third party stuff.
Comment 26 Armen Zambrano [:armenzg] (EDT/UTC-4) 2011-03-17 06:29:11 PDT
jmaher could you give me a screenshot of those warnings?
I have managed to disable crash notifications on Win2008 for 64-bit builds and I assume there should be a way to make that happen for Win7.
Comment 27 Shawn Wilsher :sdwilsh 2011-03-17 07:21:58 PDT
(In reply to comment #25)
> in windows 7 there is a way to set the access control to different levels
> (effectively 0-4).  By default it is on 4 meaning block everything.  If we set
> it to 3, we can run xperf, but still get prompts and warnings (if firefox does
> something malicious) for all system changes and third party stuff.
Awesome.  3 still warns about most things, so I think that setting it to 3 should work great for us.
Comment 28 Joel Maher ( :jmaher) 2011-03-29 08:28:08 PDT
ok, the talos boxes are already set at this level of reporting and I found out that we actually run the talos process elevated to begin with.  So the UAC issue doesn't seem to be a problem.

Remaining items:
 - get xperf tools installed on talos slaves
 - figure out how to upload the .etl file to the directory with the test results
Comment 29 Shawn Wilsher :sdwilsh 2011-03-29 09:35:55 PDT
(In reply to comment #28)
>  - get xperf tools installed on talos slaves
That's bug 640829 (which will hopefully get some action post Q1)

>  - figure out how to upload the .etl file to the directory with the test
> results
Is this really needed?  We could easily process a local file and report the numbers we get.
Comment 30 (dormant account) 2011-03-29 10:52:56 PDT
(In reply to comment #29)
> (In reply to comment #28)
> >  - get xperf tools installed on talos slaves
> That's bug 640829 (which will hopefully get some action post Q1)
> 
> >  - figure out how to upload the .etl file to the directory with the test
> > results
> Is this really needed?  We could easily process a local file and report the
> numbers we get.

(In reply to comment #29)
> (In reply to comment #28)
> >  - get xperf tools installed on talos slaves
> That's bug 640829 (which will hopefully get some action post Q1)
> 
> >  - figure out how to upload the .etl file to the directory with the test
> > results
> Is this really needed?  We could easily process a local file and report the
> numbers we get.

I'd like to have the xperf file available for download.
Comment 31 Armen Zambrano [:armenzg] (EDT/UTC-4) 2011-04-06 17:06:44 PDT
I have set up a machine on staging with xperf. I will be meeting with jmaher to make sure it runs properly with buildbot.

Getting the .etl file will have to be discussed later on as those machines do not have keys to upload such file to ftp. FWIW I prefer the post-processing approach.

I have got several questions:
* would we be running xperf by default on every talos job?
** for every suite or a subset of all?
** would we be running suites with/without xperf enabled?
** should xperf jobs have the same priority as other test jobs?
* do we want to run talos with xperf on *every* push?
** just some branches?
** just nightly pushes?
** would we prefer the performance team to request xperf runs upon request?
** would giving try syntax give the value needed?

NOTE: Before answering these questions please have in mind that answering "yes" or "all" is easy but the current capacity on the Win7 pool of slaves is always being outrun on peak hours.
Comment 32 Shawn Wilsher :sdwilsh 2011-04-07 09:30:20 PDT
(In reply to comment #31)
> * would we be running xperf by default on every talos job?
> ** for every suite or a subset of all?
I think Ts, Txul, and Tp4 are the tests we need this on for now.  We can expand it later.

> ** would we be running suites with/without xperf enabled?
per comment 4, I think we'll be fine with just running under xperf, but this will likely cause a change in numbers (and thus baseline).  Probably a good idea to run these on staging with and without xperf to compare and then make a decision.

> ** should xperf jobs have the same priority as other test jobs?
Absolutely.  This will be tracking data that we've needed to track for a very long time.

> * do we want to run talos with xperf on *every* push?
Regressions in what we want to measure are just as serious as regressing tp4 or ts, so I would assume yes.

> ** just some branches?
At the very least, mozilla-central (and all the new branches that we use to track releases)

> ** just nightly pushes?
I worry that this would not give us enough data points.

> ** would we prefer the performance team to request xperf runs upon request?
We'd have to watch all pushes closely, which feels faily.

> ** would giving try syntax give the value needed?
Would maybe be useful, but if we don't run non-xperf talos, we'd get it for free when running talos, right?

> NOTE: Before answering these questions please have in mind that answering "yes"
> or "all" is easy but the current capacity on the Win7 pool of slaves is always
> being outrun on peak hours.
This is unfortunate.  Is there a plan to add capacity to it?
Comment 33 Joel Maher ( :jmaher) 2011-04-07 09:43:50 PDT
can we start figuring out what type of numbers we want to harvest out of the .etl files?  the only way we can trend data is to decide on a set of numbers we want to pull out of the .etl files and report.
Comment 34 (dormant account) 2011-04-07 12:51:26 PDT
(In reply to comment #33)
> can we start figuring out what type of numbers we want to harvest out of the
> .etl files?  the only way we can trend data is to decide on a set of numbers we
> want to pull out of the .etl files and report.

list of files accessed and time spent on their io
Comment 35 Shawn Wilsher :sdwilsh 2011-04-07 13:46:10 PDT
(In reply to comment #33)
> can we start figuring out what type of numbers we want to harvest out of the
> .etl files?  the only way we can trend data is to decide on a set of numbers we
> want to pull out of the .etl files and report.
The number of calls to the following I/O (must enable PROC_TRHEAD, LOADER, FILE_IO, and FILE_IO_INIT providers) methods:
Create
DirEnum
Flush
Read
Rename
SetInfo
Write
Additionally, information about the following disk I/O (must enabled DISK_IO provider):
Read Size
Write Size
The above numbers should all be very stable in my testing.  Tracking hard faults (must enabled HARD_FAULTS provider) would also be good, but in my testing they were inconsistent so we may decide it's not worth tracking those.

It's unclear to me at this time if you want the "[Read|Write|Flush] Service Time" from the DISK_IO provider or the "Duration Time" for each of the file I/O methods for taras' list of things.


(and all of the above is only for the firefox process that we are testing)
Comment 36 Shawn Wilsher :sdwilsh 2011-04-08 10:54:03 PDT
I totally forgot to mention that I/O counts need to be separated from the main thread (identified by thread 0 IIRC) and non-main thread.
Comment 37 cmtalbert 2011-04-12 11:52:56 PDT
(In reply to comment #32)
> (In reply to comment #31)
> > NOTE: Before answering these questions please have in mind that answering "yes"
> > or "all" is easy but the current capacity on the Win7 pool of slaves is always
> > being outrun on peak hours.
> This is unfortunate.  Is there a plan to add capacity to it?
Yes, but it's complicated.  We can no longer buy the version of minis we use for testing.  New minis do not run windows and linux natively.  We (my team + Zandr from IT) are looking into new slave systems that will replace our windows and linux boxes.  The new system will not obsolete itself nearly as fast as minis and will be more rackable, manageable etc from an IT sense.  I believe IT has finally been able to get the two candidate machines in house (their makers were being unusually difficult), Zandr is verifying them for IT functionality, after that we will take a look at putting them in our automation & tools staging environment to see how easy/difficult/etc it is to configure them like a slave mini.  So, adding capacity is probably a quarter away (at best) before we get these things in a state where they can be rolled out en masse.

I can hunt down bugs for all this later if you're interested, but I wanted to quickly answer your question right now while doing my bugmail.
Comment 38 Armen Zambrano [:armenzg] (EDT/UTC-4) 2011-04-12 12:17:35 PDT
(In reply to comment #37)
> (In reply to comment #32)
> > (In reply to comment #31)
> > > NOTE: Before answering these questions please have in mind that answering "yes"
> > > or "all" is easy but the current capacity on the Win7 pool of slaves is always
> > > being outrun on peak hours.
> > This is unfortunate.  Is there a plan to add capacity to it?
> Yes, but it's complicated.  We can no longer buy the version of minis we use
> for testing.  New minis do not run windows and linux natively.  We (my team +
> Zandr from IT) are looking into new slave systems that will replace our windows
> and linux boxes.  The new system will not obsolete itself nearly as fast as
> minis and will be more rackable, manageable etc from an IT sense.  I believe IT
> has finally been able to get the two candidate machines in house (their makers
> were being unusually difficult), Zandr is verifying them for IT functionality,
> after that we will take a look at putting them in our automation & tools
> staging environment to see how easy/difficult/etc it is to configure them like
> a slave mini.  So, adding capacity is probably a quarter away (at best) before
> we get these things in a state where they can be rolled out en masse.
> 
> I can hunt down bugs for all this later if you're interested, but I wanted to
> quickly answer your question right now while doing my bugmail.

ctalbert's explanation is correct and it applies to releng's infrastructure as well. I will be doing a blog post today and few other later trying to calculate the needed capacity.
Comment 39 Joel Maher ( :jmaher) 2011-04-12 18:43:03 PDT
Comment on attachment 519584 [details] [diff] [review]
talos patch to run xperf when given the --xperf_path command line (1.0)

:anode, this is a working version of the xperf changes.  I wanted to get your feedback on this while we develop the .etl parser code and figure out how to get that pumped through the logs, etc...
Comment 40 Shawn Wilsher :sdwilsh 2011-04-12 18:57:07 PDT
(In reply to comment #39)
> :anode, this is a working version of the xperf changes.  I wanted to get your
> feedback on this while we develop the .etl parser code and figure out how to
> get that pumped through the logs, etc...
I forgot to comment here; it looks like nothing like this exists yet, sadly.  Looks like we get to roll our own.
Comment 41 alice nodelman [:alice] [:anode] 2011-04-18 14:53:21 PDT
Comment on attachment 519584 [details] [diff] [review]
talos patch to run xperf when given the --xperf_path command line (1.0)

Looks like you have a partial pref set in the sample.config:
+  talos.logfile: 

I'm a little confused at how the deviceManager is used here to check the platform - I was under the impression that the deviceManager would be None on non-mobile tests.  Shouldn't you be checking platform.system() instead?
Comment 42 Joel Maher ( :jmaher) 2011-05-11 19:07:55 PDT
Created attachment 531832 [details] [diff] [review]
xperf tools to be used with talos (1.0)

new files to be added in the xtalos/ (newly created) subdirectory.  These will manage start/stop/parsing of xperf when used with talos.

Once these have landed, we can then work on landing the other patch (updated version coming soon) to make this work.
Comment 43 Joel Maher ( :jmaher) 2011-05-12 06:31:11 PDT
Created attachment 531917 [details] [diff] [review]
talos patch to run xperf when given the --xperf_path command line (2.0)

updated patch to run xperf using new xtalos toolset.  Addressed previous feedback as well.
Comment 44 alice nodelman [:alice] [:anode] 2011-05-16 10:43:40 PDT
Comment on attachment 531917 [details] [diff] [review]
talos patch to run xperf when given the --xperf_path command line (2.0)

Can we get the 'print' commands in bcontroller.py run through the util.py (either noisy or debug messages)?  utils.py is mostly an attempt to centralize output from talos so that it is easier to format.
Comment 45 cmtalbert 2011-05-16 14:52:09 PDT
Comment on attachment 531832 [details] [diff] [review]
xperf tools to be used with talos (1.0)

I have many more questions before I can really provide any useful feedback here.

* xperf_providers and xperf_stackwalkers seem like they should be preferences we are sending to xperf.  The providers are what we are pulling out of the instrumentation, right?  I'm not sure why we'd want that to be hard coded in start_xperf.py.
* Ideally provider/stackwalker information should bubble all the way up to the config file that perfconfigurator generates, right?  It seems like it's something we'd want to allow the test to specify and that's where the other test options live.
* After we parse the file, you do an open(etlparser.csv).  If something went wrong in etlparser and that file doesn't exist, then this will throw and traceback.  At the very least we need to catch that and output a useful message that can be picked up by the log so that people have a clue as to where they need to start debugging.
* In etlparser.py why do you have two different methods of reading the file?  Why not use the "giant file method" all the time?  It seems it would simplify the code and make things more robust.
* If giant files are the norm with xperf, then why do we believe that we can read the generated etlparser.csv file directly into memory and output it to our log?  Do we need to anticipate that the generated CSV file may also be a giant file and handle it properly in bcontroller?

So that's my feedback so far.  The approach itself looks good, but lots of questions about the implicit assumptions being made here.
Comment 46 Shawn Wilsher :sdwilsh 2011-05-16 15:10:15 PDT
(In reply to comment #45)
> * If giant files are the norm with xperf, then why do we believe that we can
> read the generated etlparser.csv file directly into memory and output it to our
> log?  Do we need to anticipate that the generated CSV file may also be a giant
> file and handle it properly in bcontroller?
The csv file is just a summary of the data from the ETL so it's substantially smaller.
Comment 47 Joel Maher ( :jmaher) 2011-05-16 16:12:10 PDT
the etl and csv sizes are interesting.

a tp4 run had a:
400MB etl file
850MB .csv file (full parser)
70MB  .csv file (filtered just for firefox.exe)

the first file parser doeesn't read .csv format, really it just looks for processname and the headers.  The other file parser is more complete and actually does a csv read.  It would be possible to beef up the parser and then reuse it.  

good idea about the providers and stackwalkers.  Those need to be set when you start xperf.  I will look into options for passing that in from the config file.

the other stuff is simple things I overlooked, thanks for catching it.
Comment 48 Joel Maher ( :jmaher) 2011-05-23 12:41:13 PDT
Created attachment 534519 [details] [diff] [review]
talos patch to run xperf when given the --xperf_path command line (3.0)

updated patch as a WIP to address feedback from prior review and to work with bug 657987.  Will update this patch as I get more testing with it.
Comment 49 Joel Maher ( :jmaher) 2011-05-23 12:43:02 PDT
Created attachment 534520 [details] [diff] [review]
xperf tools to be used with talos (2.0)

updated xtalos patch, this resolves the feedback related to the file related handling and parsing, but doesn't fully resolve the hardcoding of the providers and stakwalk variables.  Those are mostly handled in other patches, but I need to finish this and test it.  Updated patch coming before too long.
Comment 50 Joel Maher ( :jmaher) 2011-05-25 18:56:22 PDT
Created attachment 535253 [details] [diff] [review]
xperf tools to be used with talos (2.1)

updated patch to get xperf start variables from the .config file, also updated to do a single pass on the .etl generated csv file.
Comment 51 Joel Maher ( :jmaher) 2011-05-25 18:57:25 PDT
Created attachment 535254 [details] [diff] [review]
talos patch to run xperf when given the --xperf_path command line (3.1)

talos patch to do the xperf_path and run the xtalos toolchain.  Also changes to the sample.config.
Comment 52 cmtalbert 2011-06-01 14:33:39 PDT
Comment on attachment 535253 [details] [diff] [review]
xperf tools to be used with talos (2.1)

Review of attachment 535253 [details] [diff] [review]:
-----------------------------------------------------------------

Overall, this looks good.  I have a couple of nits, but nothing really major. r+

::: xtalos/etlparser.py
@@ +45,5 @@
> +eventname_index = 0
> +diskbytes_col = "Size"
> +proc_col = "Process Name ( PID)"
> +fname_col = "FileName"
> +

Nit: usually constants like this are in UPPER_CASE_WITH_UNDERSCORES :)
Also, proc_col isn't used anywhere.

@@ +176,5 @@
> +  processing_options = []
> +  xperf_cmd = '%s -i %s -o %s.csv %s' % \
> +              (options.xperf_tool,
> +               options.etl_filename,
> +               options.etl_filename,

given the default options over in xtalos - this is going to result in the output file name being "output.etl.csv" is that what you're intending?

@@ +182,5 @@
> +
> +  if (options.debug_level >= xtalos.DEBUG_INFO):
> +    print "executing '%s'" % xperf_cmd
> +  os.system(xperf_cmd)
> +  return options.etl_filename + ".csv"

Note the above comment.

@@ +199,5 @@
> +  if (options.processName is None):
> +    options.processName = ""
> +
> +  csvname = etl2csv(options)
> +  filterByProcName(csvname, options.processName)

oh I see, the csv is only a temporary file, so maybe that's why it's name doesn't matter.  We should perhaps make that clearer with a comment?

@@ +208,5 @@
> +
> +  if options.outputFile:
> +    print "filename, readcount, readbytes, writecount, writebytes"
> +  else:
> +    print output

Output isn't defined until after the for loop below us.  So this else case is either going to traceback or print nothing.

::: xtalos/start_xperf.py
@@ +54,5 @@
> +
> +  if not options.stackwalk:
> +    print "No xperf stackwalk options given"
> +    sys.exit(1)
> +

Nit: you have these option verification here and more over in etlparser.  Why not do this all in parser.verifyOptions so that all this verification logic is in one place?  There may be some value in having it local to each file, but I dunno.  I think it makes sense to verify them in the verifyOptions method since we call that from everywhere already.

::: xtalos/xtalos.py
@@ +62,5 @@
> +
> +    self.add_option("-e", "--etl_filename",
> +                    action = "store", dest = "etl_filename",
> +                    help = "Name of the .etl file to work with. Defaults to 'output.etl'")
> +    defaults["etl_filename"] = "output.etl"

The code over in etlparser seems to depend on this being a file name both with and without the extension.

It's probably ok as long as we're cool with our csv being named: "output.etl.csv".
Comment 53 alice nodelman [:alice] [:anode] 2011-06-01 15:21:41 PDT
Comment on attachment 535254 [details] [diff] [review]
talos patch to run xperf when given the --xperf_path command line (3.1)

This looks good to me, though I might like having the command line options for start_xperf/stop_eperf/etlparser stored away somewhere so that they can be more easily found and altered - right now you'd have to hunt through the code to see how they are run.
Comment 54 alice nodelman [:alice] [:anode] 2011-06-01 15:28:41 PDT
Setting the lack of current testing master as blocking this bug.
Comment 55 Joel Maher ( :jmaher) 2011-06-16 09:16:03 PDT
Created attachment 539806 [details] [diff] [review]
add xperf to buildbot-configs (0.5)
Comment 56 alice nodelman [:alice] [:anode] 2011-06-16 13:01:32 PDT
From a discussion on irc, we are going to block this bug on the rollout of tp5.  Since we are currently in the transition period between tp4 and tp5 upsetting the numbers right now would just cause confusion.  Once tp5 is stable and tp4 is disabled we'll be comfortable altering tp5 to run with xperf - possibly over a similar deployment period as tp5 itself.
Comment 57 Joel Maher ( :jmaher) 2011-06-16 14:29:15 PDT
but while we are waiting for tp4 to be turned off and tp5 to stabilize, we will run these patches in staging and have the buildbot configs ready to go so we just have to flip a switch.
Comment 58 Joel Maher ( :jmaher) 2011-07-14 13:01:16 PDT
Created attachment 545982 [details] [diff] [review]
talos patch to run xperf when given the --xperf_path command line (4.0)

updated patch to create a xperf.config file and only run the tests as a separate test and not upload results to graph server (buildbot config will determine this).  This isolates the changes well and upon landing and turning on xperf we will not need to side by side compare.
Comment 59 Joel Maher ( :jmaher) 2011-07-14 13:12:24 PDT
Created attachment 545983 [details] [diff] [review]
add xperf to buildbot-configs (1.0)

add xperf as a new test type to mozilla central and try only for win7 platform.
Comment 60 Armen Zambrano [:armenzg] (EDT/UTC-4) 2011-07-14 13:33:46 PDT
Comment on attachment 545983 [details] [diff] [review]
add xperf to buildbot-configs (1.0)

It looks good.

You should mark the "try" entry to be False as it controls the coalescing (that should mean that paint_tests is set incorrectly).

I assume this will be landed once xperf gets deployed, right?
Comment 61 Joel Maher ( :jmaher) 2011-07-14 18:34:53 PDT
I was under the impression all the win7 slaves had c:\perftools\xperf.exe and libraries installed.  Is there a chance we can get the xperf toolchain installed in the next week or so?

One thought I had was to land this patch with xperf turned off on all platforms.  Then when the machines are ready, we just have to change one or two lines.  That would prevent bitrot!
Comment 62 alice nodelman [:alice] [:anode] 2011-07-15 06:29:14 PDT
Comment on attachment 545983 [details] [diff] [review]
add xperf to buildbot-configs (1.0)

Glad that you figured out the win-7 only issue!
Comment 63 Joel Maher ( :jmaher) 2011-07-15 13:29:10 PDT
landed patches on talos tree:
http://hg.mozilla.org/build/talos/rev/f90bce780b65
http://hg.mozilla.org/build/talos/rev/c5e493459dd0
Comment 64 Joel Maher ( :jmaher) 2011-08-22 11:49:36 PDT
Created attachment 554919 [details] [diff] [review]
(checked-in) add xperf to buildbot-configs (2.0)

updated to the new buildbot config changes.  Still needs some testing, but this is a much smaller and more straightforward patch!
Comment 65 Armen Zambrano [:armenzg] (EDT/UTC-4) 2011-08-22 12:49:42 PDT
Comment on attachment 554919 [details] [diff] [review]
(checked-in) add xperf to buildbot-configs (2.0)

Looks good. I can get it in tomorrow's reconfig.
Comment 66 Armen Zambrano [:armenzg] (EDT/UTC-4) 2011-08-23 08:10:50 PDT
sdwilsh which branches should we be running this against?
Comment 67 Armen Zambrano [:armenzg] (EDT/UTC-4) 2011-08-23 08:35:46 PDT
Comment on attachment 554919 [details] [diff] [review]
(checked-in) add xperf to buildbot-configs (2.0)

This has been checked-in and deployed on production masters:
http://hg.mozilla.org/build/buildbot-configs/rev/6d28b5dc34fe
Comment 68 (dormant account) 2011-08-23 08:46:19 PDT
(In reply to Armen Zambrano G. [:armenzg] - Release Engineer from comment #66)
> sdwilsh which branches should we be running this against?

mozilla-central and try
Comment 69 Joel Maher ( :jmaher) 2011-08-23 12:47:39 PDT
ok, this is running on tinderbox for m-c, so the buildbot changes are working:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1314124879.1314125216.1715.gz&fulltext=1

I am not seeing the data in the autolog server we setup to collect it all.  I suspect I am trying to talk to the server incorrectly:
http://hg.mozilla.org/build/talos/file/ff3b05d67fcb/xtalos/etlparser.py#l82

The server is brasstacks.mozilla.com which lives on MPT, but that IP address is the ip address for the MV-Office network.  

My understanding is the tinderbox slaves live on Build-VPN and can talk to MPT machines?  We might just need to adjust this address to be correct.
Comment 70 Armen Zambrano [:armenzg] (EDT/UTC-4) 2011-08-24 08:13:19 PDT
It is showing up on tbpl:
tbpl.mozilla.org/?tree=Firefox&jobname=xperf

jmaher do you have a test case for me to try in one of the slaves?
Comment 71 Joel Maher ( :jmaher) 2011-08-25 18:20:57 PDT
Created attachment 555912 [details] [diff] [review]
allow xperf binary to have spaces in the path (1.0)

our previous assumption that xperf.exe would live in c:\perftools\xperf.exe is not valid and we need to allow for it to live in something like: "c:\program files\microsoft windows performance tools\xperf.exe".

This patch adds quotes around the places where we reference xperf.exe.
Comment 72 Joel Maher ( :jmaher) 2011-08-25 18:28:02 PDT
Created attachment 555916 [details] [diff] [review]
(checked in) fix path to xperf.exe in config.py (1.0)

simple patch to adjust the path we pass in for --xperf_path to talos when running the xperf tests.
Comment 73 Joel Maher ( :jmaher) 2011-08-26 08:31:27 PDT
buildbot-configs (expecting this will get picked up in the reconfig on Tuesday Aug 30th):
http://hg.mozilla.org/build/buildbot-configs/rev/1389371a6201
Comment 74 Joel Maher ( :jmaher) 2011-08-26 08:45:40 PDT
landed the talos changes:
http://hg.mozilla.org/build/talos/rev/cdead2ea8ba8

When talos.zip (http://people.mozilla.org/~jmaher/taloszips/cdead2ea8ba8/talos.zip, bug 682301) is deployed, we can then make use of the buildbot-configs change and we should start seeing xperf results from buildbot.
Comment 75 Armen Zambrano [:armenzg] (EDT/UTC-4) 2011-08-29 07:02:32 PDT
The talos.zip has been deployed.
The buildbot-configs change will be picked up tomorrow's scheduled reconfig.
Comment 76 Armen Zambrano [:armenzg] (EDT/UTC-4) 2011-09-06 12:43:00 PDT
This has been live since Thursday and is showing up on tbpl.

try support is next.
Comment 77 Joel Maher ( :jmaher) 2011-09-14 13:36:23 PDT
once we have try support, we can close this bug.
Comment 78 Armen Zambrano [:armenzg] (EDT/UTC-4) 2011-09-14 15:43:15 PDT
Created attachment 560266 [details] [diff] [review]
add xperf to try
Comment 79 Joel Maher ( :jmaher) 2011-09-14 17:14:51 PDT
Comment on attachment 560266 [details] [diff] [review]
add xperf to try

Review of attachment 560266 [details] [diff] [review]:
-----------------------------------------------------------------

I like small patches!  do we need to do anything to try-chooser?

Note You need to log in before you can comment on or make changes to this bug.