Last Comment Bug 770317 - create a xperf-based mainthread io tracker
: create a xperf-based mainthread io tracker
Status: RESOLVED FIXED
[Snappy:P1]
:
Product: Testing
Classification: Components
Component: General (show other bugs)
: unspecified
: All All
: -- normal (vote)
: mozilla17
Assigned To: Joel Maher ( :jmaher)
:
:
Mentors:
: 559663 (view as bug list)
Depends on: 778856 779272 780681 781319 786308 788235 788354
Blocks: 644744 734737
  Show dependency treegraph
 
Reported: 2012-07-02 13:34 PDT by (dormant account)
Modified: 2012-10-08 12:05 PDT (History)
14 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
xperf File IO tracker (2.44 KB, text/plain)
2012-07-12 17:56 PDT, Nicolás Chaim
no flags Details
Add xperf probe for XPCOM shutdown event (5.00 KB, patch)
2012-07-13 13:38 PDT, Nicolás Chaim
vladan.bugzilla: feedback+
Details | Diff | Splinter Review
xperf File and Network IO tracker (3.13 KB, text/plain)
2012-07-17 13:02 PDT, Nicolás Chaim
vladan.bugzilla: feedback+
Details
Add xperf probe for XPCOM shutdown event (5.03 KB, patch)
2012-07-19 17:16 PDT, Nicolás Chaim
mh+mozilla: review+
dteller: feedback+
Details | Diff | Splinter Review
Add xperf probe for XPCOM shutdown event (5.07 KB, patch)
2012-07-20 09:40 PDT, Nicolás Chaim
no flags Details | Diff | Splinter Review
IO tracker + talos (12.19 KB, patch)
2012-07-20 15:17 PDT, Nicolás Chaim
jmaher: feedback+
Details | Diff | Splinter Review
xperf File and Network IO tracker (3.03 KB, text/plain)
2012-07-20 15:25 PDT, Nicolás Chaim
no flags Details
IO tracker + talos (21.71 KB, patch)
2012-07-24 14:20 PDT, Nicolás Chaim
jmaher: review+
Details | Diff | Splinter Review
IO tracker + talos (21.48 KB, patch)
2012-07-29 14:04 PDT, Nicolás Chaim
jmaher: review+
Details | Diff | Splinter Review
Fix class name in mof file (870 bytes, patch)
2012-07-30 12:26 PDT, Nicolás Chaim
taras.mozilla: review+
Details | Diff | Splinter Review
add the 12 counters we are collecting with xperf to the graph server (1.0) (2.19 KB, patch)
2012-08-03 09:13 PDT, Joel Maher ( :jmaher)
k0scist: review+
Details | Diff | Splinter Review
make the counters report to graph server (1.0) (8.47 KB, patch)
2012-08-27 08:06 PDT, Joel Maher ( :jmaher)
k0scist: feedback+
Details | Diff | Splinter Review
make the counters report to graph server (1.1) (8.47 KB, patch)
2012-08-31 18:03 PDT, Joel Maher ( :jmaher)
k0scist: review+
Details | Diff | Splinter Review
example CSV (1.23 KB, text/plain)
2012-08-31 23:44 PDT, Jeff Hammel
no flags Details
example reading CSV (669 bytes, text/plain)
2012-09-01 00:05 PDT, Jeff Hammel
jmaher: feedback+
Details
fix bug in counters for tp5n on windows, cleanup .csv parsing (1.0) (4.05 KB, patch)
2012-09-04 09:37 PDT, Joel Maher ( :jmaher)
k0scist: review+
Details | Diff | Splinter Review

Description (dormant account) 2012-07-02 13:34:41 PDT
We caught a number of surprisingly silly occurrences of main thread IO in Firefox. These result in unbounded pauses during Firefox execution. We need to eliminate them.

We currently have some xperf support in Firefox. I would like to make use of + extend xperf support in order to output the following:
* Number of disk IOPs 
* Number of Network IOPS

Those should be broke up by main/non-main thread and also segmented into [startup | normal use | shutdown].

Once someone on perf team figures this out, we'll hand this off to someone in Automation to integrate with graph server.
Comment 1 David Teller [:Yoric] (please use "needinfo") 2012-07-04 02:27:46 PDT
There are several ways to do this. I believe that we should first try the following:
- from code, fire an xperf probe at the end of startup (done, as per bug 696033);
- from code, fire an xperf probe at the start of shutdown (which event should we use?);
- launch Firefox monitored by |xperf -on FileIO+Network+"Mozilla Generic Provider"|;
- stop xperf, parse results.
Comment 2 Nicolás Chaim 2012-07-12 17:56:49 PDT
Created attachment 641679 [details]
xperf File IO tracker

Python script that calls xperf with FileIO and "Mozilla Generic Provider" providers, parses the dumpfile, and aggregates I/O operation by main/non-main thread and startup/normal/shutdown stages.

Example: python iotracker.py "C:/firefox-src/mozilla-central/obj-dbg/dist/bin/firefox.exe -no-remote -p dev"

{'non-main': {'startup': defaultdict(<type 'int'>, {'read_bytes': 5313621, 'read_ops': 176}), 'normal': defaultdict(<type 'int'>, {'write_bytes': 15764688, 'read_bytes': 31482145, 'read_ops': 1057, 'write_ops': 770})}, 'main': {'startup': defaultdict(<type 'int'>, {'write_bytes': 765, 'read_bytes': 136616759, 'read_ops': 2516, 'write_ops': 1}), 'shutdown': defaultdict(<type 'int'>, {'write_bytes': 28356, 'read_bytes': 10446, 'read_ops': 265, 'write_ops': 5}), 'normal': defaultdict(<type 'int'>, {'write_bytes': 2550862, 'read_bytes': 2540804, 'read_ops': 250, 'write_ops': 424})}}
Comment 3 Marco Castelluccio [:marco] 2012-07-12 18:01:37 PDT
Is this something we can integrate in telemetry or the slowstartup addon?
Comment 4 Nicolás Chaim 2012-07-13 10:53:11 PDT
(In reply to Marco Castelluccio from comment #3)

I'm not sure about integrating it with the slowstartup addon, as Event Tracing for Windows requires administrative privileges. It also has a non-negligible performance overhead when it's enabled (events from all processes are received and must be filtered).
Comment 5 Joel Maher ( :jmaher) 2012-07-13 12:33:06 PDT
For reference we currently run xperf+firefox via talos.  Here are the launching and parsing scripts:
http://hg.mozilla.org/build/talos/file/807c94aef671/talos/xtalos

I am thinking if we can boil this down to 1 or two numbers we can get those added in graph server and report up to there.  

Questions:
* will we have to install a .mof or other files on the windows machine?
** If so, can this be automated?  We have hundreds of windows 7 clients
* Are we only targeting windows 7, or other flavors of windows now?
Comment 6 Nicolás Chaim 2012-07-13 13:38:32 PDT
Created attachment 642025 [details] [diff] [review]
Add xperf probe for XPCOM shutdown event
Comment 7 Nicolás Chaim 2012-07-17 13:02:18 PDT
Created attachment 643106 [details]
xperf File and Network IO tracker
Comment 8 Nicolás Chaim 2012-07-17 13:07:52 PDT
(In reply to Joel Maher (:jmaher) from comment #5)
Yes, we'll need to install the mof file that includes the XPCOM shutdown event.
Comment 9 Joel Maher ( :jmaher) 2012-07-17 13:12:59 PDT
is there a way to do this installation automatically?  If not, this is a manual process that will need to occur on hundreds of machines.

Also do you have a link to the .mof file?
Comment 10 Vladan Djeric (:vladan) 2012-07-19 11:10:27 PDT
Comment on attachment 642025 [details] [diff] [review]
Add xperf probe for XPCOM shutdown event

Looks good, remember to have a module owner review it next so we can land the patch :)
Comment 11 Vladan Djeric (:vladan) 2012-07-19 11:40:27 PDT
Comment on attachment 643106 [details]
xperf File and Network IO tracker

In your runp() function, you can just do something like subprocess.Popen(..., stdout=sys.stdout, ..) instead of redirecting the subprocess's stdout to your script and manually "print"-ing it

Just a nitpick, but could you also change the indent to 2 spaces instead of 4 and use a bit more space in the code, e.g. put "continue" and "break" statements on separate lines from condition checks
Comment 12 Nicolás Chaim 2012-07-19 17:16:27 PDT
Created attachment 644076 [details] [diff] [review]
Add xperf probe for XPCOM shutdown event
Comment 13 Mike Hommey [:glandium] 2012-07-19 22:18:45 PDT
Comment on attachment 644076 [details] [diff] [review]
Add xperf probe for XPCOM shutdown event

Review of attachment 644076 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/startup/mozprofilerprobe.mof
@@ +21,5 @@
>  {
> +};
> +
> +[dynamic: ToInstance, Description("Mozilla Event: XPCOM Shutdown."): Amended,
> + Guid("{26d1e091-0ae7-4f49-a554-4214445c505c}"),

Please uppercase the guid, like the others above.

::: toolkit/components/startup/nsAppStartup.cpp
@@ +92,5 @@
>    { 0xA7, 0x60, 0x8D, 0x49, 0x02, 0x77, 0x48, 0xAE} }
> +// {26d1e091-0ae7-4f49-a554-4214445c505c}
> +#define NS_XPCOM_SHUTDOWN_EVENT_CID \
> +  { 0x26d1e091, 0x0ae7, 0x4f49, \
> +  { 0xa5, 0x54, 0x42, 0x14, 0x44, 0x5c, 0x50, 0x5c} }

Likewise.
Comment 14 Nicolás Chaim 2012-07-20 09:40:39 PDT
Created attachment 644370 [details] [diff] [review]
Add xperf probe for XPCOM shutdown event

Changed GUIDs to uppercase.
Comment 15 Vladan Djeric (:vladan) 2012-07-20 10:42:22 PDT
Comment on attachment 644370 [details] [diff] [review]
Add xperf probe for XPCOM shutdown event

https://hg.mozilla.org/integration/mozilla-inbound/rev/f0544409e034
Comment 16 Nicolás Chaim 2012-07-20 15:17:50 PDT
Created attachment 644497 [details] [diff] [review]
IO tracker + talos

I have made changes to the way xperf in launched in talos to allow for both user mode logging and kernel logging, as needed for the IO tracker.

- The PID of the firefox instance is also now passed to etlparser.py
- Replaced many os.system calls with subprocess.call since otherwise commands with two sets of double quotes would fail for some reason.

jmaher will do further work to integrate the IO tracker with talos based on the "xperf File and Network IO tracker" python script.
Comment 17 Nicolás Chaim 2012-07-20 15:25:51 PDT
Created attachment 644502 [details]
xperf File and Network IO tracker

Improvements based on vladan's comments.
Comment 18 Ryan VanderMeulen [:RyanVM] 2012-07-20 21:04:51 PDT
https://hg.mozilla.org/mozilla-central/rev/f0544409e034
Comment 19 Joel Maher ( :jmaher) 2012-07-23 08:55:06 PDT
Comment on attachment 644497 [details] [diff] [review]
IO tracker + talos

Review of attachment 644497 [details] [diff] [review]:
-----------------------------------------------------------------

just a couple of questions.

::: talos/bcontroller.py
@@ +133,5 @@
>               '-o', csvname,
>               '-p', self.process,
>               '-e', etlname,
> +             '-c', self.configFile,
> +             '--pid', str(proc.pid)]

do we use --pid anywhere?  I don't see it used in etlparser.py

::: talos/xperf.config
@@ +7,5 @@
>  # base test overrides
>  basetest:
>    tpcycles: 1
>    xperf_providers : ['PROC_THREAD', 'LOADER', 'HARD_FAULTS', 'FILENAME', 'FILE_IO', 'FILE_IO_INIT']
> +  xperf_user_providers: ['"Mozilla Generic Provider"', 'Microsoft-Windows-TCPIP']

I assume we can work with the " properly.
Comment 20 Nicolás Chaim 2012-07-23 09:56:26 PDT
Yes, we should probably always add quotes to each of the providers when building the command line string, instead of putting the quotes in the name of the provider itself when it has spaces.

options.pid is not used anywhere in etlparser.py, but can be used when implementing network/per-thread IO parsing.
Comment 21 (dormant account) 2012-07-24 14:12:49 PDT
Lets not work in a closed bug until the work is done :)
Comment 22 Nicolás Chaim 2012-07-24 14:20:21 PDT
Created attachment 645491 [details] [diff] [review]
IO tracker + talos

Integrated with etlparser; now dumps a etl_output_thread_stats.csv file with aggregated counters.
Comment 23 Nicolás Chaim 2012-07-24 14:24:52 PDT
Comment on attachment 645491 [details] [diff] [review]
IO tracker + talos

- Log entries are no longer pre-filtered by the Firefox process, since incoming network events may occur on any other process. In this case, it's the connection ID (etw:ActivityId) what matters.
- Other events are filtered by the Firefox PID instead of the process name.
Comment 24 Joel Maher ( :jmaher) 2012-07-27 07:09:54 PDT
Comment on attachment 645491 [details] [diff] [review]
IO tracker + talos

Review of attachment 645491 [details] [diff] [review]:
-----------------------------------------------------------------

a couple nits/questions below, but otherwise this is great.

I ran this locally and I ended up with:
> more etl_output_thread_stats.csv
thread, stage, counter, value
main, normal, file_read_bytes, 1188312
main, normal, file_read_ops, 57
main, normal, file_write_bytes, 4143423
main, normal, file_write_ops, 206
main, normal, net_recv_bytes, 26
main, normal, net_send_bytes, 26
main, startup, file_read_bytes, 830415
main, startup, file_read_ops, 85
main, startup, file_write_bytes, 735
main, startup, file_write_ops, 18
main, startup, net_recv_bytes, 4
main, startup, net_send_bytes, 5
non-main, normal, file_read_bytes, 1037622
non-main, normal, file_read_ops, 41
non-main, normal, file_write_bytes, 785162
non-main, normal, file_write_ops, 58
non-main, normal, net_recv_bytes, 2527
non-main, normal, net_send_bytes, 2532
non-main, startup, file_read_bytes, 169236
non-main, startup, file_read_ops, 10
non-main, startup, file_write_bytes, 49704
non-main, startup, file_write_ops, 4

What do we want to report to the graph server?  I think this is too many variable to report (22 total).  Taras mentioned running this against tp5, is that the only test we want to run this against for now?

Ideally I would like to get this down to 6 or 8 counters that we report to the graph server.  I guess it is not big deal if we need to do them all, it will just make displaying and reporting these more difficult.

::: talos/xtalos/start_xperf.py
@@ +66,5 @@
>                (options.xperf_path,
>                 '+'.join(options.xperf_providers),
>                 '+'.join(options.xperf_stackwalk),
> +               options.etl_filename,
> +               '"+"'.join(options.xperf_user_providers),

nice call on the quotes here.

::: talos/xtalos/stop_xperf.py
@@ +48,5 @@
>    if options == None:
>      print "Unable to verify options"
>      sys.exit(1)
>  
> +  xperf_cmd = '"%s" -stop -stop talos_ses' % (options.xperf_path)

do we need two '-stop' commands here?

::: talos/xtalos/xtalos.py
@@ +73,5 @@
>  
>      self.add_option("-d", "--debug",
>                      type="int", dest = "debug_level",
>                      help = "debug level for output from tool (0-5, 5 being everything), defaults to 1")
> +    defaults["debug_level"] = 5

I am not sure we want to make this level=5 by default.
Comment 25 Joel Maher ( :jmaher) 2012-07-27 10:19:12 PDT
Here are the results from tp5n (1 cycle, not 25 cycles):

(talos) c:\mozilla\talos\talos>type etl_output_thread_stats.csv
thread, stage, counter, value
main, normal, file_read_bytes, 61656933
main, normal, file_read_ops, 2759
main, normal, file_write_bytes, 5690149
main, normal, file_write_ops, 769
main, normal, net_recv_bytes, 2076257
main, normal, net_send_bytes, 1689701
main, startup, file_read_bytes, 1303547
main, startup, file_read_ops, 104
main, startup, file_write_bytes, 735
main, startup, file_write_ops, 18
main, startup, net_recv_bytes, 1
main, startup, net_send_bytes, 2
non-main, normal, file_read_bytes, 158177202
non-main, normal, file_read_ops, 6186
non-main, normal, file_write_bytes, 500356217
non-main, normal, file_write_ops, 44076
non-main, normal, net_recv_bytes, 68757627
non-main, normal, net_send_bytes, 68953041
non-main, startup, file_read_bytes, 512320
non-main, startup, file_read_ops, 18
Comment 26 (dormant account) 2012-07-28 18:09:04 PDT
We could get rid of ops(maybe) and merge read/write byte counters.
Comment 27 Nicolás Chaim 2012-07-29 14:04:18 PDT
Created attachment 647008 [details] [diff] [review]
IO tracker + talos

- Restored defaults["debug_level"] to 1
- Added total I/O counters (i.e. net_io_bytes)
Comment 28 Nicolás Chaim 2012-07-29 14:07:58 PDT
(In reply to Joel Maher (:jmaher) from comment #24)
> Comment on attachment 645491 [details] [diff] [review]
> ::: talos/xtalos/stop_xperf.py
> @@ +48,5 @@
> >    if options == None:
> >      print "Unable to verify options"
> >      sys.exit(1)
> >  
> > +  xperf_cmd = '"%s" -stop -stop talos_ses' % (options.xperf_path)
> 
> do we need two '-stop' commands here?
> 

Yes, two stop commands are needed to stop the two trace sessions (user and kernel). "talos_ses" is the name of the user session, the kernel session has no name.
Comment 29 Nicolás Chaim 2012-07-29 14:14:14 PDT
(In reply to Joel Maher (:jmaher) from comment #25)
> Here are the results from tp5n (1 cycle, not 25 cycles):
> 
> (talos) c:\mozilla\talos\talos>type etl_output_thread_stats.csv
> ...

We still need to install the mof file and XPCOM shutdown event patch to get I/O during shutdown separately. Then it will look something like what I get on my machine:

thread, stage, counter, value
main, normal, file_io_bytes, 2714779
main, normal, file_read_bytes, 1122436
main, normal, file_read_ops, 54
main, normal, file_write_bytes, 1592343
main, normal, file_write_ops, 91
main, normal, net_io_bytes, 1684
main, normal, net_recv_bytes, 673
main, normal, net_send_bytes, 1011
main, shutdown, file_io_bytes, 2681987
main, shutdown, file_write_bytes, 2681987
main, shutdown, file_write_ops, 174
main, shutdown, net_io_bytes, 4
main, shutdown, net_recv_bytes, 2
main, shutdown, net_send_bytes, 2
main, startup, file_io_bytes, 4619774
main, startup, file_read_bytes, 4618891
main, startup, file_read_ops, 574
main, startup, file_write_bytes, 883
main, startup, file_write_ops, 18
main, startup, net_io_bytes, 25
main, startup, net_recv_bytes, 12
main, startup, net_send_bytes, 13
non-main, normal, file_io_bytes, 1887236
non-main, normal, file_read_bytes, 1064664
non-main, normal, file_read_ops, 39
non-main, normal, file_write_bytes, 822572
non-main, normal, file_write_ops, 60
non-main, normal, net_io_bytes, 648
non-main, normal, net_recv_bytes, 347
non-main, normal, net_send_bytes, 301
non-main, startup, file_io_bytes, 5489921
non-main, startup, file_read_bytes, 5460697
non-main, startup, file_read_ops, 219
non-main, startup, file_write_bytes, 29224
non-main, startup, file_write_ops, 7
non-main, startup, net_io_bytes, 5055
non-main, startup, net_recv_bytes, 2527
non-main, startup, net_send_bytes, 2528
Comment 30 Nicolás Chaim 2012-07-29 14:21:21 PDT
So I guess we're reporting 12 counters to the graph server:

(main|non-main), (startup|normal|shutdown), (file_io_bytes|net_io_bytes)

Some counters may not always be present. Last time I ran it, there was no I/O in non-main threads during shutdown.
Comment 31 Joel Maher ( :jmaher) 2012-07-29 15:51:03 PDT
we can go with the twelve counters.  If we do, I want to restrict this to a single test, or no more than 2 tests.  Currently we are looking at TP5, are there other tests which would be of use?
Comment 32 Joel Maher ( :jmaher) 2012-07-30 08:44:39 PDT
is it normal to not have all counters present?  i.e. is there a technical reason?  If this is expected and normal, then I can make sure that talos handles that scenario properly.
Comment 33 Nicolás Chaim 2012-07-30 08:48:03 PDT
The script just creates the corresponding items in the array as events are parsed.
Comment 34 Joel Maher ( :jmaher) 2012-07-30 08:48:21 PDT
Comment on attachment 647008 [details] [diff] [review]
IO tracker + talos

Review of attachment 647008 [details] [diff] [review]:
-----------------------------------------------------------------

Overall this is good.  I understand the MOF file.  For the xpcom shutdown event, do we just need to land a patch on mozilla-central and use that?
Comment 35 Nicolás Chaim 2012-07-30 08:52:52 PDT
(In reply to Joel Maher (:jmaher) from comment #34)
> Comment on attachment 647008 [details] [diff] [review]
> IO tracker + talos
> 
> Review of attachment 647008 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Overall this is good.  I understand the MOF file.  For the xpcom shutdown
> event, do we just need to land a patch on mozilla-central and use that?

Yes. The patch is already on mozilla-central:
https://hg.mozilla.org/mozilla-central/rev/f0544409e034
Comment 36 (dormant account) 2012-07-30 10:11:16 PDT
(In reply to Joel Maher (:jmaher) from comment #31)
> we can go with the twelve counters.  If we do, I want to restrict this to a
> single test, or no more than 2 tests.  Currently we are looking at TP5, are
> there other tests which would be of use?

tp5 is good for now.

What's our eta for deployment?
Comment 37 Joel Maher ( :jmaher) 2012-07-30 10:18:18 PDT
the biggest unknown here is getting the .mof file deployed.  

Otherwise I need to spend some time getting graph server definitions defined, we need to deploy this as a new talos.zip and we need to update buildbot-configs.

Most of the known quantities can be handled this week, the unknown could take a few weeks.  If we could get this deployed and just not collect/report data from the .mof file, we could get going and then track the bug to deploy the .mof file.
Comment 38 (dormant account) 2012-07-30 10:30:03 PDT
Why is it hard to deploy the .mof, is there anything we can do to speed that up? Having data partitioned makes it a lot more useful.
Comment 39 Joel Maher ( :jmaher) 2012-07-30 10:31:58 PDT
we just need to deploy the .mof file on all the windows 7 machines which is a few hundred.  I assume this can be scripted as my initial test seemed like it would work fine, but this needs to get scheduled in with releng which wouldn't be a top priority for them.
Comment 40 (dormant account) 2012-07-30 11:56:24 PDT
(In reply to Joel Maher (:jmaher) from comment #39)
> we just need to deploy the .mof file on all the windows 7 machines which is
> a few hundred.  I assume this can be scripted as my initial test seemed like
> it would work fine, but this needs to get scheduled in with releng which
> wouldn't be a top priority for them.

Can you file a bug specifying what needs to be done, I'll followup with making it a priority.
Comment 41 Nicolás Chaim 2012-07-30 12:26:47 PDT
Created attachment 647248 [details] [diff] [review]
Fix class name in mof file

Fixed class name in mof file. Apparently it didn't end up making much difference since classes are referred by guid.
Comment 42 Nicolás Chaim 2012-07-30 18:31:28 PDT
Can we get it checked-in to mozilla-central?
Comment 43 (dormant account) 2012-07-30 21:20:53 PDT
(In reply to Nicolás Chaim from comment #42)
> Can we get it checked-in to mozilla-central?

mark the patch as checkin-needed, point at glandium
Comment 44 Nicolás Chaim 2012-07-31 08:54:35 PDT
Comment on attachment 647248 [details] [diff] [review]
Fix class name in mof file

I can't seem to get the check-in needed flag.
Comment 45 Vladan Djeric (:vladan) 2012-07-31 11:55:48 PDT
Comment on attachment 647248 [details] [diff] [review]
Fix class name in mof file

https://hg.mozilla.org/integration/mozilla-inbound/rev/710a0800bbef
Comment 46 Vladan Djeric (:vladan) 2012-07-31 12:58:09 PDT
https://hg.mozilla.org/mozilla-central/rev/95bd473e7164
Comment 47 Ryan VanderMeulen [:RyanVM] 2012-07-31 19:17:28 PDT
https://hg.mozilla.org/mozilla-central/rev/710a0800bbef
Comment 48 (dormant account) 2012-08-02 11:43:09 PDT
Not done with the xperf part yet. lets stop landing mc-changes in this bug :)
Comment 49 Ed Morley [:emorley] 2012-08-02 11:57:11 PDT
(In reply to Taras Glek (:taras) from comment #48)
> Not done with the xperf part yet. lets stop landing mc-changes in this bug :)

If it helps, you can use [leave open] in the whiteboard to stop the merge tool from closing the bug. (https://groups.google.com/d/topic/mozilla.dev.platform/DQyQ56DgmhA/discussion)
Comment 50 Joel Maher ( :jmaher) 2012-08-03 09:13:31 PDT
Created attachment 648736 [details] [diff] [review]
add the 12 counters we are collecting with xperf to the graph server (1.0)
Comment 51 Joel Maher ( :jmaher) 2012-08-06 11:43:21 PDT
landed graph server definitions:
http://hg.mozilla.org/graphs/rev/b3935f847775
Comment 52 Joel Maher ( :jmaher) 2012-08-09 06:10:10 PDT
One problem I am having getting this patch to land on talos is that it uses python bits from modern talos and we run python 2.4 on the windows talos machines.  We have been trying for the last 2 quarters to bump the python version up and it is much easier to rewrite our code to work with 2.4.

The biggest thing is the 'from collections import defaultdict'.  Is the primary purpose for the default dict to keep things in proper order:
for cntr in sorted(io.iterkeys()):
  for cntr in sorted(io.iterkeys()):

Either we can drop that requirement or work around it with some extra code, I am testing it now with no defaultdict requirement.
Comment 53 Joel Maher ( :jmaher) 2012-08-09 09:43:27 PDT
I had to assign this bug to myself since nicolas @ mozilla.com was not a valid email in bugzilla and I couldn't submit my comment.

I have cleaned up the patch and it works in talos.  I found that Providing a user_provider which doesn't exist on the system will fail to produce anything.  So in the spirit of getting things going we can temporarily remove the "Mozilla Generic Provider" from the listening configs and this generates:

thread, stage, counter, value
main, startup, file_io_bytes, 40790612
main, startup, file_read_bytes, 38018251
main, startup, file_read_ops, 697
main, startup, file_write_bytes, 2772361
main, startup, file_write_ops, 170
main, startup, net_io_bytes, 43
main, startup, net_recv_bytes, 21
main, startup, net_send_bytes, 22


This isn't ideal, but it gets us going and allows us to integrate into talos until the windows 7 machines are all updated with the .mof file.
Comment 54 Nicolás Chaim 2012-08-09 09:52:06 PDT
defaultdict(int) creates a dict that initializes members to the type's default value (zero for int) if they don't exist.

If we're using a regular dict, the equivalent would be:

    key = (th, stg, "net_%s_bytes" % opType)
    if key not in io:
        io[key] = 0
    io[key] += bytes

We could perhaps also include some existing, external implementation for the defaultdict class, or maybe just initialize the dict with all possible keys we're using and zero values.
Comment 55 Jeff Hammel 2012-08-09 10:23:21 PDT
(In reply to Nicolás Chaim from comment #54)
> defaultdict(int) creates a dict that initializes members to the type's
> default value (zero for int) if they don't exist.
> 
> If we're using a regular dict, the equivalent would be:
> 
>     key = (th, stg, "net_%s_bytes" % opType)
>     if key not in io:
>         io[key] = 0
>     io[key] += bytes

Please don't do this.  python dicts have had setdefault forever

    key = (th, stg, "net_%s_bytes" % opType)
    io.setdefault(key, 0) += bytes
Comment 56 Nicolás Chaim 2012-08-09 10:49:31 PDT
(In reply to Jeff Hammel [:jhammel] from comment #55)
> (In reply to Nicolás Chaim from comment #54)
> > defaultdict(int) creates a dict that initializes members to the type's
> > default value (zero for int) if they don't exist.
> > 
> > If we're using a regular dict, the equivalent would be:
> > 
> >     key = (th, stg, "net_%s_bytes" % opType)
> >     if key not in io:
> >         io[key] = 0
> >     io[key] += bytes
> 
> Please don't do this.  python dicts have had setdefault forever
> 
>     key = (th, stg, "net_%s_bytes" % opType)
>     io.setdefault(key, 0) += bytes

Oh, that's perfect. I'm not very familiar with Python.
Comment 57 Nicolás Chaim 2012-08-09 13:54:08 PDT
I guess it would still need to be 
    io[key] = io.get(key, 0) + bytes
since
    io.setdefault(key, 0)
will just return an inmutable int
Comment 58 Jeff Hammel 2012-08-09 14:40:12 PDT
(In reply to Nicolás Chaim from comment #57)
> I guess it would still need to be 
>     io[key] = io.get(key, 0) + bytes
> since
>     io.setdefault(key, 0)
> will just return an inmutable int

Beh, good point. :(
Comment 59 Joel Maher ( :jmaher) 2012-08-10 06:37:15 PDT
landed the talos bits with the slight mods mentioned in the last few comments.  This worked on try server without affecting current results:
http://hg.mozilla.org/build/talos/rev/d3733d055e49

We still need to do a few modifications to talos to bring the xperf providers up from parsing the etl file and get them input into the graph server.  I am working on that with :jhammel.
Comment 60 Joel Maher ( :jmaher) 2012-08-27 08:06:05 PDT
Created attachment 655589 [details] [diff] [review]
make the counters report to graph server (1.0)

This patch does what it says it does by publishing all of the counters we collect from xperf to the graph servers.  The remaining work is to resolve the config files and how this will be run from buildbot.  If this patch is f+, we can start resolving the config file issues and the buildbot configs.
Comment 61 Jeff Hammel 2012-08-27 14:52:32 PDT
Comment on attachment 655589 [details] [diff] [review]
make the counters report to graph server (1.0)

>diff -r 6d79047595a4 talos/bcontroller.py
>--- a/talos/bcontroller.py	Fri Aug 24 20:02:51 2012 -0700
>+++ b/talos/bcontroller.py	Mon Aug 27 09:37:21 2012 -0400
>@@ -109,11 +109,13 @@
>         print "Error running etlparser: %s" % e
>         self.returncode = 1
> 
>-      print "__xperf_data_begin__"
>-      fhandle = open(csvname, 'r')
>-      print fhandle.read()
>+      results_file = open(self.browser_log, "a")
>+      results_file.write("__xperf_data_begin__")
>+      fhandle = open("etl_output_thread_stats.csv", 'r')
>+      results_file.write(fhandle.read())
>       fhandle.close()
>-      print "__xperf_data_end__"
>+      results_file.write("__xperf_data_end__")
>+      results_file.close()
>     else:    #blocking call to system, non-remote device
>       self.returncode = os.system(self.command + " > " + self.browser_log)

So this is just ensuring that we're writing to the browser log(?)  That seems to be a good idea.  I've also filed bug 786055 to get rid of the os.system calls and shell-level redirects

>diff -r 6d79047595a4 talos/cmanager_win32.py
>--- a/talos/cmanager_win32.py	Fri Aug 24 20:02:51 2012 -0700
>+++ b/talos/cmanager_win32.py	Mon Aug 27 09:37:21 2012 -0400
>@@ -21,6 +21,7 @@
>       path = win32pdh.MakeCounterPath((None, 'process', process,
>                                        None, -1, counter))
>       hq = win32pdh.OpenQuery()
>+      hc = None
>       try:
>         hc = win32pdh.AddCounter(hq, path)
>       except:
>@@ -33,8 +34,9 @@
>         except:
>           win32pdh.CloseQuery(hq)
> 
>-      self.registeredCounters[counter] = [hq, [(hc, path)]]
>-      self.updateCounterPathsForChildProcesses(counter)
>+      if hc:
>+        self.registeredCounters[counter] = [hq, [(hc, path)]]
>+        self.updateCounterPathsForChildProcesses(counter)

Not sure I understand this code, but maybe I don't need to.

>diff -r 6d79047595a4 talos/results.py
>--- a/talos/results.py	Fri Aug 24 20:02:51 2012 -0700
>+++ b/talos/results.py	Mon Aug 27 09:37:21 2012 -0400
>@@ -265,6 +265,9 @@
>     # regular expression for responsiveness results
>     RESULTS_RESPONSIVENESS_REGEX = re.compile('MOZ_EVENT_TRACE\ssample\s\d*?\s(\d*?)$', re.DOTALL|re.MULTILINE)
> 
>+    # regular expression for xperf io data
>+    XPERF_REGEX = re.compile('([a-z, \-\_]+)_io_bytes, ([0-9]+)')

As you can probably guess, I'm not a huge fan of regexes like this, but I guess this is okay.  Maybe a comment show example output would help.

Also, isn't the xperf output formatted data anyway?  Can't we just read the .csv and call it a day?

>     # classes for results types
>     classes = {'tsformat': TsResults,
>                'tpformat': PageloaderResults}
>@@ -391,6 +394,36 @@
>                 self.shutdown(global_counters)
>             if 'responsiveness' in global_counters:
>                 global_counters['responsiveness'].extend(self.responsiveness())
>+            self.xperf(global_counters)
>+
>+    def xperf(self, counter_results):
>+        """record xperf counters in counter_results dictionary"""
>+
>+        counters = ['main_startup_fileio', 
>+                    'main_startup_netio',
>+                    'main_normal_fileio',
>+                    'main_normal_netio',
>+                    'main_shutdown_fileio',
>+                    'main_shutdown_netio',
>+                    'nonmain_startup_fileio', 
>+                    'nonmain_startup_netio',
>+                    'nonmain_normal_fileio',
>+                    'nonmain_normal_netio',
>+                    'nonmain_shutdown_fileio',
>+                    'nonmain_shutdown_netio']

Pretty messy but I don't know a way around this.

>+        if not set(counters).intersection(counter_results.keys()):
>+            # no xperf counters to accumulate
>+            return
>+        for line in self.results_raw.split('\n'):
>+            xperfmatch = self.XPERF_REGEX.search(line)

Again, I wouldn't use a regex for this.

>+            if xperfmatch:
>+                (type, value) = (xperfmatch.group(1), xperfmatch.group(2))
>+                # type will be similar to 'main, startup, file'
>+                cname = ('_').join(type.split(', '))
>+                counter_name = '%sio' % cname
>+                if counter_name in counter_results:
>+                    counter_results[counter_name].append(value)

Does this need a setdefault? (not sure).

>     def rss(self, counter_results):
>         """record rss counters in counter_results dictionary"""
>diff -r 6d79047595a4 talos/test.py
>--- a/talos/test.py	Fri Aug 24 20:02:51 2012 -0700
>+++ b/talos/test.py	Mon Aug 27 09:37:21 2012 -0400
>@@ -49,7 +49,7 @@
> 
> class TsBase(Test):
>     """abstract base class for ts-style tests"""
>-    keys = ['url', 'url_timestamp', 'timeout', 'cycles', 'shutdown', 'profile_path', 
>+    keys = ['url', 'url_timestamp', 'timeout', 'cycles', 'shutdown', 'profile_path', 'xperf_counters',
>             'xperf_providers', 'xperf_user_providers', 'xperf_stackwalk']
> 
> class ts(TsBase):
>@@ -62,6 +62,7 @@
> class ts_paint(ts):
>     url = 'startup_test/tspaint_test.html?begin='
>     shutdown = None
>+    xperf_counters = ['main_startup_fileio', 'main_startup_netio', 'main_normal_fileio', 'main_normal_netio', 'main_shutdown_fileio', 'main_shutdown_netio', 'nonmain_startup_fileio', 'nonmain_startup_netio', 'nonmain_normal_fileio', 'nonmain_normal_netio', 'nonmain_shutdown_fileio', 'nonmain_shutdown_netio']

Again, messy, though i can't think of a better way of doing this.  Maybe e.g. fileio implies all of '{main,nonmain}_{startup,normal,shutdown}_fileio'...

> class ts_places_generated_max(ts):
>     profile_path = '${talos}/places_generated_max'
>@@ -145,7 +146,7 @@
>     filters = None
>     keys = ['tpmanifest', 'tpcycles', 'tppagecycles', 'tprender', 'tpchrome', 'tpmozafterpaint',
>             'rss', 'resolution', 'cycles',
>-            'win_counters', 'w7_counters', 'linux_counters', 'mac_counters', 'remote_counters',
>+            'win_counters', 'w7_counters', 'linux_counters', 'mac_counters', 'remote_counters', 'xperf_counters',
>             'timeout', 'shutdown', 'responsiveness', 'profile_path',
>             'xperf_providers', 'xperf_user_providers', 'xperf_stackwalk', 'filters'
>             ]
>@@ -197,6 +198,7 @@
>     w7_counters = ['Main_RSS', 'Content_RSS', 'Private Bytes', '% Processor Time', 'Modified Page List Bytes']
>     linux_counters = ['Private Bytes', 'XRes', 'Main_RSS', 'Content_RSS']
>     mac_counters = ['Private Bytes', 'Main_RSS', 'Content_RSS']
>+    xperf_counters = ['main_startup_fileio', 'main_startup_netio', 'main_normal_fileio', 'main_normal_netio', 'main_shutdown_fileio', 'main_shutdown_netio', 'nonmain_startup_fileio', 'nonmain_startup_netio', 'nonmain_normal_fileio', 'nonmain_normal_netio', 'nonmain_shutdown_fileio', 'nonmain_shutdown_netio']
> 
> class tp_js(PageloaderTest):
>     url = """'"http://localhost/page_load_test/framecycler.html?quit=1&cycles=5"'"""
>diff -r 6d79047595a4 talos/ttest.py
>--- a/talos/ttest.py	Fri Aug 24 20:02:51 2012 -0700
>+++ b/talos/ttest.py	Mon Aug 27 09:37:21 2012 -0400
>@@ -270,6 +270,10 @@
>             # setup global (cross-cycle) counters:
>             # shutdown, responsiveness
>             global_counters = {}
>+            if browser_config.get('xperf_path'):
>+                for c in test_config.get('xperf_counters', []):
>+                    global_counters[c] = []
>+
>             if test_config['shutdown']:
>                 global_counters['shutdown'] = []
>             if test_config.get('responsiveness') and platform.system() != "Linux":
>diff -r 6d79047595a4 talos/xperf.config
>--- a/talos/xperf.config	Fri Aug 24 20:02:51 2012 -0700
>+++ b/talos/xperf.config	Mon Aug 27 09:37:21 2012 -0400
>@@ -5,10 +5,15 @@
>   talos.logfile: browser_output.txt
> 
> # base test overrides
>-# TODO: add "Mozilla Generic Provider" to xperf_user_providers once we deploy the mof file to test slaves
> basetest:
>+  cycles: 1
>   tpcycles: 1
>-  cycles: 1
>+  tppagecycles: 1
>   xperf_providers : ['PROC_THREAD', 'LOADER', 'HARD_FAULTS', 'FILENAME', 'FILE_IO', 'FILE_IO_INIT']
>-  xperf_user_providers: ['Microsoft-Windows-TCPIP']
>+  xperf_user_providers: ['Mozilla Generic Provider', 'Microsoft-Windows-TCPIP']
>   xperf_stackwalk : ['FileRead', 'FileWrite', 'FileFlush']
>+  w7_counters: []
>+  win_counters: []
>+  linux_counters: []
>+  remote_counters: []
>+  mac_counters: []
Comment 62 Joel Maher ( :jmaher) 2012-08-28 07:21:50 PDT
so we could assume there is a known csv (or we could make it .json) file and parse the data that way.  That would remove our dependency on regex.  The idea was to keep this similar to the rest of the system in that we output data in browser_output.txt and then parse it up after the test completes.

Minor details, lets figure out the .config problem and start getting this stuff up for real reviews.
Comment 63 Joel Maher ( :jmaher) 2012-08-31 18:03:32 PDT
Created attachment 657506 [details] [diff] [review]
make the counters report to graph server (1.1)

A few small tweaks to this to make it work.  Lets get this reviewed and talos deployed:)  I chose to stick with all output going to browser_output.txt instead of having the results.py read the .csv file.  I would like to keep it as consistent as possible and put all the filename logic for .etl and .csv in one place.
Comment 64 Jeff Hammel 2012-08-31 23:42:42 PDT
Comment on attachment 657506 [details] [diff] [review]
make the counters report to graph server (1.1)

This looks fine.  Would you mind if I did a follow-up wrt to how we parse the counters in results? (Parsing via CSV vs the regex)  I would also like to put this on two separate lines:

NOISE: __xperf_data_end____startBeforeLaunchTimestamp1346452998207__endBeforeLaunchTimestamp

We have the following data marked by delimeters:
NOISE: __xperf_data_begin__thread, stage, counter, value
NOISE: main, normal, file_io_bytes, 135479381
NOISE: main, normal, file_read_bytes, 129100860
NOISE: main, normal, file_read_ops, 4280
NOISE: main, normal, file_write_bytes, 6378521
NOISE: main, normal, file_write_ops, 558
NOISE: main, normal, net_io_bytes, 11938485
NOISE: main, normal, net_recv_bytes, 1429236
NOISE: main, normal, net_send_bytes, 10509249
NOISE: main, shutdown, file_io_bytes, 26375
NOISE: main, shutdown, file_read_bytes, 7373
NOISE: main, shutdown, file_read_ops, 151
NOISE: main, shutdown, file_write_bytes, 19002
NOISE: main, shutdown, file_write_ops, 3
NOISE: main, startup, file_io_bytes, 24399710
NOISE: main, startup, file_read_bytes, 24398866
NOISE: main, startup, file_read_ops, 420
NOISE: main, startup, file_write_bytes, 844
NOISE: main, startup, file_write_ops, 18
NOISE: main, startup, net_io_bytes, 2
NOISE: main, startup, net_recv_bytes, 1
NOISE: main, startup, net_send_bytes, 1
NOISE: non-main, normal, file_io_bytes, 471162110
NOISE: non-main, normal, file_read_bytes, 96544227
NOISE: non-main, normal, file_read_ops, 3854
NOISE: non-main, normal, file_write_bytes, 374617883
NOISE: non-main, normal, file_write_ops, 37368
NOISE: non-main, normal, net_io_bytes, 150199046
NOISE: non-main, normal, net_recv_bytes, 84117265
NOISE: non-main, normal, net_send_bytes, 66081781
NOISE: non-main, startup, file_io_bytes, 354464
NOISE: non-main, startup, file_read_bytes, 354464
NOISE: non-main, startup, file_read_ops, 12
NOISE: __xperf_data_end__

This is just CSV.  Should be pretty trivial to parse with the csv module.

I'm also inclined to put the list of all counters in e.g. xtalos or somewhere the can be globally accessed.  This way we can check for validity of what is passed in.
Comment 65 Jeff Hammel 2012-08-31 23:44:07 PDT
Created attachment 657532 [details]
example CSV
Comment 66 Jeff Hammel 2012-09-01 00:05:07 PDT
Created attachment 657537 [details]
example reading CSV

this is an example of reading the example output from CSV.  I don't look for the delimeters, __xperf_data_{start,end}__ (which we currently don't use at all) which needs to be done, but these can be easily found a la the other delimeters in results (e.g. http://hg.mozilla.org/build/talos/file/6d1c5ee867e6/talos/results.py#l330)
Comment 67 Jeff Hammel 2012-09-01 00:05:40 PDT
(it would also be nice to have a test for this parsing)
Comment 68 Joel Maher ( :jmaher) 2012-09-01 01:47:26 PDT
Comment on attachment 657537 [details]
example reading CSV

>import csv
>
>contents = file('xperf.csv').read()
we should do:
if not os.path.exists(browser_config['xperf_csvname']):
    return
contents = file(browser_config['xperf_csvname']).read()

>lines = contents.splitlines()
>reader = csv.reader(lines)
>header = None
>for row in reader:
>
>    # read CSV
>    row = [i.strip() for i in row]
>    if not header:
>        header = row
this assumes the first row is the header, we could look for __xperf_start or something

>        continue
>    values = dict(zip(header, row))
>
>    # format for talos
>    thread = values['thread']
>    if thread == 'non-main':
>        thread = 'nonmain'
come to think of it, we can fix this hack in how we generate the csv in the first place

>    counter = values['counter'].rsplit('_io_bytes', 1)[0]
>    counter_name = '%s_%s_%sio' % (thread, values['stage'], counter)
>    value = float(values['value'])
>
>    # accrue counter
>    if counter_name in counter_results:
>        counter_results[counter_name].append(value)
I would innitialize the counter first.  Also make sure we only assign counters that are legit.
if counter_name in xtalos.known_couters:
    counter_results.setdefault(counter_name, []).append(value)
Comment 69 Joel Maher ( :jmaher) 2012-09-01 01:54:47 PDT
landed the results->graphserver patch in talos repo:
http://hg.mozilla.org/build/talos/rev/a632249f5fe9

still left todo:
sanity check try server run
deploy talos.zip
create patch for buildbot, get r+ and a reconfig
Comment 70 Jeff Hammel 2012-09-01 01:57:12 PDT
(In reply to Joel Maher (:jmaher) from comment #68)
> Comment on attachment 657537 [details]
> example reading CSV
> 
> >import csv
> >
> >contents = file('xperf.csv').read()
> we should do:
> if not os.path.exists(browser_config['xperf_csvname']):
>     return
> contents = file(browser_config['xperf_csvname']).read()

Sure, this was just an example for local data pasted from your try run.  In real life:

- we already have the browser_log in memory: http://hg.mozilla.org/build/talos/file/6d1c5ee867e6/talos/results.py#l284

- we would look for __xperf_start__ and __xperf_end__ like we do the other delimeters (e.g. line 330) and take the content between these delimeters

- these become the contents vs the xperf.csv file I read for testing

> >lines = contents.splitlines()
> >reader = csv.reader(lines)
> >header = None
> >for row in reader:
> >
> >    # read CSV
> >    row = [i.strip() for i in row]
> >    if not header:
> >        header = row
> this assumes the first row is the header, we could look for __xperf_start or
> something

Currently, we print the header row right after __xperf_start__

> >        continue
> >    values = dict(zip(header, row))
> >
> >    # format for talos
> >    thread = values['thread']
> >    if thread == 'non-main':
> >        thread = 'nonmain'
> come to think of it, we can fix this hack in how we generate the csv in the
> first place

Even better :)

> >    counter = values['counter'].rsplit('_io_bytes', 1)[0]
> >    counter_name = '%s_%s_%sio' % (thread, values['stage'], counter)
> >    value = float(values['value'])
> >
> >    # accrue counter
> >    if counter_name in counter_results:
> >        counter_results[counter_name].append(value)
> I would innitialize the counter first.  Also make sure we only assign
> counters that are legit.
> if counter_name in xtalos.known_couters:
>     counter_results.setdefault(counter_name, []).append(value)

Yep, even better :)  In any case, I'm fine doing this as a follow up if you want to check in your patch. I mostly wrote this as a proof of concept.
Comment 71 Joel Maher ( :jmaher) 2012-09-01 01:59:46 PDT
I would rather do this as a followup so we can get the xperf tests live as fast as possible.  Although with the long weekend, it might be a couple days before we really do get the tests live.
Comment 72 Jeff Hammel 2012-09-01 02:00:19 PDT
sounds good to me
Comment 73 Joel Maher ( :jmaher) 2012-09-04 09:37:39 PDT
Created attachment 658128 [details] [diff] [review]
fix bug in counters for tp5n on windows, cleanup .csv parsing (1.0)

this incorporates the suggestions :jhammel had for csv parsing of xperf.csv.  No more hacking through browser_output.txt!

There was a bug in the regular tp5 runs where we were trying to access counters which didn't exist.  This is resolved in the patch by better error checking.
Comment 74 Jeff Hammel 2012-09-04 09:46:15 PDT
Comment on attachment 658128 [details] [diff] [review]
fix bug in counters for tp5n on windows, cleanup .csv parsing (1.0)

+    if len(self.registeredCounters[counter]) < 2:
+      return None
+

Not sure I understand this line?

+        filename = 'etl_output_thread_stats.csv'
+        if not os.path.exists(filename):
+            return

AIUI, if you have xperf counters, this file *should* exist.  So maybe we should make a debug statement here?

+        import csv

Personally I'd put this at the top of the file.

Hmmm....there's also no error handling if the file is of a bad format.  I guess that's not a problem until it is.

Other than those nits, looks good to me!
Comment 75 Joel Maher ( :jmaher) 2012-09-04 11:15:02 PDT
updated with feedback from review and landed:
http://hg.mozilla.org/build/talos/rev/2605620aa7fc
Comment 76 Joel Maher ( :jmaher) 2012-09-07 10:43:37 PDT
Displaying all graphs at once:
http://graphs.mozilla.org/graph.html#tests=[[251,94,12],[250,94,12],[248,94,12],[247,94,12],[246,94,12],[244,94,12],[245,94,12],[243,94,12],[242,94,12]]&sel=none&displayrange=7&datatype=running

tp5n_nonmain_normal_netio_paint: 153735839.0:
http://graphs.mozilla.org/graph.html#tests=[[251,94,12]]&sel=none&displayrange=7&datatype=running

tp5n_main_startup_fileio_paint: 24400446.0:
http://graphs.mozilla.org/graph.html#tests=[[242,94,12]]&sel=none&displayrange=7&datatype=running

tp5n_main_shutdown_fileio_paint: 59143.0:
http://graphs.mozilla.org/graph.html#tests=[[246,94,12]]&sel=none&displayrange=7&datatype=running

tp5n_main_normal_netio_paint: 20300.0:
http://graphs.mozilla.org/graph.html#tests=[[245,94,12]]&sel=none&displayrange=7&datatype=running

tp5n_main_startup_netio_paint: 2.0:
http://graphs.mozilla.org/graph.html#tests=[[243,94,12]]&sel=none&displayrange=7&datatype=running

tp5n_nonmain_startup_fileio_paint: 354464.0:
http://graphs.mozilla.org/graph.html#tests=[[248,94,12]]&sel=none&displayrange=7&datatype=running

tp5n_nonmain_normal_fileio_paint: 502931900.0:
http://graphs.mozilla.org/graph.html#tests=[[250,94,12]]&sel=none&displayrange=7&datatype=running

tp5n_main_normal_fileio_paint: 53619254.0:
http://graphs.mozilla.org/graph.html#tests=[[244,94,12]]&sel=none&displayrange=7&datatype=running

tp5n_main_shutdown_netio_paint: 2.0:
http://graphs.mozilla.org/graph.html#tests=[[247,94,12]]&sel=none&displayrange=7&datatype=running

All of these links are for mozilla-central, will be slightly different links when looking at inbound data.
Comment 77 (dormant account) 2012-09-10 04:08:55 PDT
What does the _paint suffix mean?
Comment 78 Joel Maher ( :jmaher) 2012-09-10 04:17:17 PDT
it is a relic of talos.  It means that we are using waiting for mozAfterPaint before going to the next page (what we do by default now in all our tests).
Comment 79 Jeff Hammel 2012-09-10 08:41:53 PDT
(In reply to Taras Glek (:taras) from comment #77)
> What does the _paint suffix mean?

Please see http://k0s.org/mozilla/blog/20120724135349 for the hoary details.  Ultimately, we should not be sending metadata as part of the test name.  But for graphserver these things are ultimately tied.  For datazilla they will not be
Comment 80 (dormant account) 2012-09-13 01:54:51 PDT
Joel, do we retain the csv and/or xperf file after running the test? tp5n_main_shutdown_fileio_paint shows a lot of variation(which may be due to some non-determinism in our code). Would be good to compare io to see where the extra io is going.
Comment 81 Joel Maher ( :jmaher) 2012-09-13 06:14:45 PDT
Unfortunately we do not save off the .etl or generated .csv file.  I could add in a fix to print out the .csv file to stdout and it would show up in the logs.  Lets figure out a few things while we watch the tree for another 20-30 pushes.  That might bring out other hot spots or patterns.
Comment 82 (dormant account) 2012-09-17 02:10:50 PDT
(In reply to Joel Maher (:jmaher) from comment #81)
> Unfortunately we do not save off the .etl or generated .csv file.  I could
> add in a fix to print out the .csv file to stdout and it would show up in
> the logs.  Lets figure out a few things while we watch the tree for another
> 20-30 pushes.  That might bring out other hot spots or patterns.

Can you turn on logging of tp5n_main_shutdown_fileio_paint io?
Comment 83 Joel Maher ( :jmaher) 2012-09-17 06:26:18 PDT
For some reason we are not collecting it.  It could be we have it configured incorrectly (please check my xperf options: http://hg.mozilla.org/build/talos/file/tip/talos/xperf.config#l12), or maybe the parser is mixing it up.

I will be heads down most of this week on mobile stuff.  Whoever can get to it first wins.
Comment 84 (dormant account) 2012-09-26 13:46:35 PDT
*** Bug 559663 has been marked as a duplicate of this bug. ***
Comment 85 Joel Maher ( :jmaher) 2012-10-08 09:56:11 PDT
tp5n_main_shutdown_fileio_paint: 59143.0:
http://graphs.mozilla.org/graph.html#tests=[[246,94,12]]&sel=none&displayrange=7&datatype=running

Let me know if that is useful or not.
Comment 86 (dormant account) 2012-10-08 11:59:08 PDT
(In reply to Joel Maher (:jmaher) from comment #85)
> tp5n_main_shutdown_fileio_paint: 59143.0:
> http://graphs.mozilla.org/graph.html#tests=[[246,94,
> 12]]&sel=none&displayrange=7&datatype=running
> 
> Let me know if that is useful or not.

It's not. Can you add a summary of files read + bytes from them to the logs? This would be handy for atleast main thread startup/shutdown io(we can skip the others for now to save space).
Comment 87 Joel Maher ( :jmaher) 2012-10-08 12:05:52 PDT
we already print the data in the logfile:
RETURN:<a href='http://graphs.mozilla.org/graph.html#tests=[[251,94,12]]'>tp5n_nonmain_normal_netio_paint: 140670519.0</a><br>
RETURN:<a href='http://graphs.mozilla.org/graph.html#tests=[[242,94,12]]'>tp5n_main_startup_fileio_paint: 26365858.0</a><br>
RETURN:<a href='http://graphs.mozilla.org/graph.html#tests=[[246,94,12]]'>tp5n_main_shutdown_fileio_paint: 30886.0</a><br>
RETURN:<a href='http://graphs.mozilla.org/graph.html#tests=[[245,94,12]]'>tp5n_main_normal_netio_paint: 5307030.0</a><br>
RETURN:<a href='http://graphs.mozilla.org/graph.html#tests=[[243,94,12]]'>tp5n_main_startup_netio_paint: 2.0</a><br>
RETURN:<a href='http://graphs.mozilla.org/graph.html#tests=[[248,94,12]]'>tp5n_nonmain_startup_fileio_paint: 477344.0</a><br>
RETURN:<a href='http://graphs.mozilla.org/graph.html#tests=[[250,94,12]]'>tp5n_nonmain_normal_fileio_paint: 471793045.0</a><br>
RETURN:<a href='http://graphs.mozilla.org/graph.html#tests=[[244,94,12]]'>tp5n_main_normal_fileio_paint: 39505092.0</a><br>
RETURN:<a href='http://graphs.mozilla.org/graph.html#tests=[[247,94,12]]'>tp5n_main_shutdown_netio_paint: 3.0</a><br>

Note You need to log in before you can comment on or make changes to this bug.