Closed Bug 770317 Opened 13 years ago Closed 12 years ago

create a xperf-based mainthread io tracker

Categories

(Testing :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
mozilla17

People

(Reporter: taras.mozilla, Assigned: jmaher)

References

Details

(Whiteboard: [Snappy:P1])

Attachments

(9 files, 7 obsolete files)

2.44 KB, text/plain
Details
5.07 KB, patch
Details | Diff | Splinter Review
3.03 KB, text/plain
Details
21.48 KB, patch
jmaher
: review+
Details | Diff | Splinter Review
870 bytes, patch
taras.mozilla
: review+
Details | Diff | Splinter Review
2.19 KB, patch
k0scist
: review+
Details | Diff | Splinter Review
8.47 KB, patch
k0scist
: review+
Details | Diff | Splinter Review
1.23 KB, text/plain
Details
4.05 KB, patch
k0scist
: review+
Details | Diff | Splinter Review
We caught a number of surprisingly silly occurrences of main-thread IO in Firefox. These result in unbounded pauses during Firefox execution. We need to eliminate them. We currently have some xperf support in Firefox. I would like to make use of and extend that xperf support in order to output the following:
* number of disk IOPs
* number of network IOPs
Those should be broken up by main/non-main thread and also segmented into [startup | normal use | shutdown]. Once someone on the perf team figures this out, we'll hand it off to someone in Automation to integrate with graph server.
OS: Mac OS X → Windows XP
There are several ways to do this. I believe that we should first try the following:
- from code, fire an xperf probe at the end of startup (done, as per bug 696033);
- from code, fire an xperf probe at the start of shutdown (which event should we use?);
- launch Firefox monitored by |xperf -on FileIO+Network+"Mozilla Generic Provider"|;
- stop xperf, parse results.
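(Editorial sketch, not the eventual talos implementation: the capture sequence above, driven from Python. The kernel flags are the ones the later xperf.config in this bug uses; paths and filenames are placeholders.)

import subprocess

def trace_browser(firefox_cmd, xperf="xperf.exe", etl="iotracker.etl", txt="iotracker.txt"):
    # Start a kernel trace with the file-I/O flags used later in this bug.
    subprocess.check_call([xperf, "-on", "PROC_THREAD+LOADER+FILE_IO+FILE_IO_INIT"])
    try:
        # Run the browser to completion while the trace is active.
        subprocess.check_call(firefox_cmd)
    finally:
        # Stop the kernel session and merge the trace into an .etl file.
        subprocess.check_call([xperf, "-d", etl])
    # Dump the binary trace to text so a parser can aggregate the I/O events.
    subprocess.check_call([xperf, "-i", etl, "-o", txt, "-a", "dumper"])

# e.g. trace_browser(["firefox.exe", "-no-remote", "-p", "dev"])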
Assignee: nobody → necheverria
Attached file xperf File IO tracker
Python script that calls xperf with the FileIO and "Mozilla Generic Provider" providers, parses the dump file, and aggregates I/O operations by main/non-main thread and startup/normal/shutdown stage. Example:

python iotracker.py "C:/firefox-src/mozilla-central/obj-dbg/dist/bin/firefox.exe -no-remote -p dev"

{'non-main': {'startup': defaultdict(<type 'int'>, {'read_bytes': 5313621, 'read_ops': 176}),
              'normal': defaultdict(<type 'int'>, {'write_bytes': 15764688, 'read_bytes': 31482145, 'read_ops': 1057, 'write_ops': 770})},
 'main': {'startup': defaultdict(<type 'int'>, {'write_bytes': 765, 'read_bytes': 136616759, 'read_ops': 2516, 'write_ops': 1}),
          'shutdown': defaultdict(<type 'int'>, {'write_bytes': 28356, 'read_bytes': 10446, 'read_ops': 265, 'write_ops': 5}),
          'normal': defaultdict(<type 'int'>, {'write_bytes': 2550862, 'read_bytes': 2540804, 'read_ops': 250, 'write_ops': 424})}}
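(Editorial note: the nested shape of that output — thread, then stage, then counter — boils down to a couple of defaultdicts. A sketch of the aggregation idea, not the attached script itself.)

from collections import defaultdict

# stats[thread][stage][counter] accumulates bytes and operation counts,
# mirroring the dump above.
stats = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

def record(is_main_thread, stage, op, nbytes):
    thread = "main" if is_main_thread else "non-main"
    stats[thread][stage]["%s_bytes" % op] += nbytes
    stats[thread][stage]["%s_ops" % op] += 1

record(True, "startup", "read", 4096)  # one 4 KB main-thread read during startup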
Is this something we can integrate into telemetry or the slowstartup addon?
(In reply to Marco Castelluccio from comment #3) I'm not sure about integrating it with the slowstartup addon, as Event Tracing for Windows requires administrative privileges. It also has a non-negligible performance overhead when it's enabled (events from all processes are received and must be filtered).
For reference, we currently run xperf+firefox via talos. Here are the launching and parsing scripts: http://hg.mozilla.org/build/talos/file/807c94aef671/talos/xtalos

If we can boil this down to one or two numbers, we can get those added to graph server and report up there. Questions:
* Will we have to install a .mof or other files on the windows machines?
** If so, can this be automated? We have hundreds of windows 7 clients.
* Are we only targeting windows 7, or other flavors of windows as well?
Attachment #642025 - Flags: feedback?(vdjeric)
Attached file xperf File and Network IO tracker (obsolete) —
Attachment #643106 - Flags: feedback?(vdjeric)
(In reply to Joel Maher (:jmaher) from comment #5) Yes, we'll need to install the mof file that includes the XPCOM shutdown event.
Is there a way to do this installation automatically? If not, this is a manual process that will need to happen on hundreds of machines. Also, do you have a link to the .mof file?
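(Editorial aside: registering a MOF with WMI is normally a mofcomp one-liner, so it should be scriptable across the slaves. The sketch below assumes mofcomp.exe is on PATH and uses a placeholder path; it is not a vetted releng script.)

import subprocess

def install_mof(mof_path=r"c:\talos\mozprofilerprobe.mof"):
    # mofcomp compiles and registers the MOF with the local WMI repository;
    # it needs to run with administrative privileges.
    subprocess.check_call(["mofcomp", mof_path])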
Comment on attachment 642025 [details] [diff] [review] Add xperf probe for XPCOM shutdown event Looks good, remember to have a module owner review it next so we can land the patch :)
Attachment #642025 - Flags: feedback?(vdjeric) → feedback+
Comment on attachment 643106 [details]
xperf File and Network IO tracker

In your runp() function, you can just do something like subprocess.Popen(..., stdout=sys.stdout, ...) instead of redirecting the subprocess's stdout to your script and manually "print"-ing it.

Just a nitpick, but could you also change the indent to 2 spaces instead of 4 and use a bit more whitespace in the code, e.g. put "continue" and "break" statements on separate lines from the condition checks?
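(i.e. something along these lines — runp and its argument handling here are placeholders, not the attached script verbatim:)

import subprocess
import sys

def runp(cmd):
    # Let the child process write straight to our stdout instead of piping
    # its output back and re-printing it line by line.
    proc = subprocess.Popen(cmd, stdout=sys.stdout, stderr=subprocess.STDOUT)
    return proc.wait()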
Attachment #643106 - Flags: feedback?(vdjeric) → feedback+
Attachment #643106 - Attachment mime type: text/x-python → text/plain
Attachment #642025 - Attachment is obsolete: true
Attachment #644076 - Flags: review?(mh+mozilla)
Comment on attachment 644076 [details] [diff] [review] Add xperf probe for XPCOM shutdown event Review of attachment 644076 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/components/startup/mozprofilerprobe.mof @@ +21,5 @@ > { > +}; > + > +[dynamic: ToInstance, Description("Mozilla Event: XPCOM Shutdown."): Amended, > + Guid("{26d1e091-0ae7-4f49-a554-4214445c505c}"), Please uppercase the guid, like the others above. ::: toolkit/components/startup/nsAppStartup.cpp @@ +92,5 @@ > { 0xA7, 0x60, 0x8D, 0x49, 0x02, 0x77, 0x48, 0xAE} } > +// {26d1e091-0ae7-4f49-a554-4214445c505c} > +#define NS_XPCOM_SHUTDOWN_EVENT_CID \ > + { 0x26d1e091, 0x0ae7, 0x4f49, \ > + { 0xa5, 0x54, 0x42, 0x14, 0x44, 0x5c, 0x50, 0x5c} } Likewise.
Attachment #644076 - Flags: review?(mh+mozilla) → review+
Changed GUIDs to uppercase.
Attachment #644076 - Attachment is obsolete: true
Attachment #644370 - Flags: review?(vdjeric)
OS: Windows XP → All
Hardware: x86 → All
Attached patch IO tracker + talos (obsolete) — Splinter Review
I have made changes to the way xperf is launched in talos to allow for both user-mode logging and kernel logging, as needed for the IO tracker.
- The PID of the firefox instance is now also passed to etlparser.py.
- Replaced many os.system calls with subprocess.call, since commands containing two sets of double quotes would otherwise fail for some reason.

jmaher will do further work to integrate the IO tracker with talos, based on the "xperf File and Network IO tracker" python script.
Attachment #644497 - Flags: feedback?(jmaher)
Improvements based on vladan's comments.
Attachment #643106 - Attachment is obsolete: true
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla17
Comment on attachment 644497 [details] [diff] [review]
IO tracker + talos

Review of attachment 644497 [details] [diff] [review]:
-----------------------------------------------------------------

just a couple of questions.

::: talos/bcontroller.py
@@ +133,5 @@
> '-o', csvname,
> '-p', self.process,
> '-e', etlname,
> + '-c', self.configFile,
> + '--pid', str(proc.pid)]

do we use --pid anywhere? I don't see it used in etlparser.py

::: talos/xperf.config
@@ +7,5 @@
> # base test overrides
> basetest:
> tpcycles: 1
> xperf_providers : ['PROC_THREAD', 'LOADER', 'HARD_FAULTS', 'FILENAME', 'FILE_IO', 'FILE_IO_INIT']
> + xperf_user_providers: ['"Mozilla Generic Provider"', 'Microsoft-Windows-TCPIP']

I assume we can work with the " properly.
Attachment #644497 - Flags: feedback?(jmaher) → feedback+
Yes, we should probably always add quotes to each of the providers when building the command line string, instead of putting the quotes in the name of the provider itself when it has spaces. options.pid is not used anywhere in etlparser.py, but can be used when implementing network/per-thread IO parsing.
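(A sketch of that quoting suggestion, not the landed start_xperf.py code:)

def join_providers(providers):
    # Quote every provider name when building the command line, so names with
    # spaces ("Mozilla Generic Provider") survive without pre-quoting them in
    # the config file.
    return "+".join(['"%s"' % p for p in providers])

# join_providers(['Mozilla Generic Provider', 'Microsoft-Windows-TCPIP'])
# -> '"Mozilla Generic Provider"+"Microsoft-Windows-TCPIP"'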
Lets not work in a closed bug until the work is done :)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch IO tracker + talos (obsolete) — Splinter Review
Integrated with etlparser; it now dumps an etl_output_thread_stats.csv file with aggregated counters.
Attachment #644497 - Attachment is obsolete: true
Attachment #645491 - Flags: review?(jmaher)
Comment on attachment 645491 [details] [diff] [review]
IO tracker + talos

- Log entries are no longer pre-filtered by the Firefox process, since incoming network events may occur on any other process. In that case, it's the connection ID (etw:ActivityId) that matters.
- Other events are filtered by the Firefox PID instead of the process name.
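(Editorial sketch of the filtering rule described above; the field names are illustrative, not the exact columns etlparser sees.)

def keep_event(event, firefox_pid, firefox_activity_ids):
    # Network events can arrive on any process, so match them by connection
    # (etw:ActivityId); everything else is matched against the Firefox PID.
    if event["provider"].startswith("Microsoft-Windows-TCPIP"):
        return event["activity_id"] in firefox_activity_ids
    return event["pid"] == firefox_pid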
Comment on attachment 645491 [details] [diff] [review]
IO tracker + talos

Review of attachment 645491 [details] [diff] [review]:
-----------------------------------------------------------------

a couple nits/questions below, but otherwise this is great. I ran this locally and I ended up with:

> more etl_output_thread_stats.csv
thread, stage, counter, value
main, normal, file_read_bytes, 1188312
main, normal, file_read_ops, 57
main, normal, file_write_bytes, 4143423
main, normal, file_write_ops, 206
main, normal, net_recv_bytes, 26
main, normal, net_send_bytes, 26
main, startup, file_read_bytes, 830415
main, startup, file_read_ops, 85
main, startup, file_write_bytes, 735
main, startup, file_write_ops, 18
main, startup, net_recv_bytes, 4
main, startup, net_send_bytes, 5
non-main, normal, file_read_bytes, 1037622
non-main, normal, file_read_ops, 41
non-main, normal, file_write_bytes, 785162
non-main, normal, file_write_ops, 58
non-main, normal, net_recv_bytes, 2527
non-main, normal, net_send_bytes, 2532
non-main, startup, file_read_bytes, 169236
non-main, startup, file_read_ops, 10
non-main, startup, file_write_bytes, 49704
non-main, startup, file_write_ops, 4

What do we want to report to the graph server? I think this is too many variables to report (22 total). Taras mentioned running this against tp5; is that the only test we want to run this against for now? Ideally I would like to get this down to 6 or 8 counters that we report to the graph server. I guess it is not a big deal if we need to do them all, it will just make displaying and reporting these more difficult.

::: talos/xtalos/start_xperf.py
@@ +66,5 @@
> (options.xperf_path,
> '+'.join(options.xperf_providers),
> '+'.join(options.xperf_stackwalk),
> + options.etl_filename,
> + '"+"'.join(options.xperf_user_providers),

nice call on the quotes here.

::: talos/xtalos/stop_xperf.py
@@ +48,5 @@
> if options == None:
> print "Unable to verify options"
> sys.exit(1)
> 
> + xperf_cmd = '"%s" -stop -stop talos_ses' % (options.xperf_path)

do we need two '-stop' commands here?

::: talos/xtalos/xtalos.py
@@ +73,5 @@
> 
> self.add_option("-d", "--debug",
> type="int", dest = "debug_level",
> help = "debug level for output from tool (0-5, 5 being everything), defaults to 1")
> + defaults["debug_level"] = 5

I am not sure we want to make this level=5 by default.
Attachment #645491 - Flags: review?(jmaher) → review+
Here are the results from tp5n (1 cycle, not 25 cycles):

(talos) c:\mozilla\talos\talos>type etl_output_thread_stats.csv
thread, stage, counter, value
main, normal, file_read_bytes, 61656933
main, normal, file_read_ops, 2759
main, normal, file_write_bytes, 5690149
main, normal, file_write_ops, 769
main, normal, net_recv_bytes, 2076257
main, normal, net_send_bytes, 1689701
main, startup, file_read_bytes, 1303547
main, startup, file_read_ops, 104
main, startup, file_write_bytes, 735
main, startup, file_write_ops, 18
main, startup, net_recv_bytes, 1
main, startup, net_send_bytes, 2
non-main, normal, file_read_bytes, 158177202
non-main, normal, file_read_ops, 6186
non-main, normal, file_write_bytes, 500356217
non-main, normal, file_write_ops, 44076
non-main, normal, net_recv_bytes, 68757627
non-main, normal, net_send_bytes, 68953041
non-main, startup, file_read_bytes, 512320
non-main, startup, file_read_ops, 18
We could (maybe) get rid of the ops counters and merge the read/write byte counters.
- Restored defaults["debug_level"] to 1
- Added total I/O counters (e.g. net_io_bytes)
Attachment #645491 - Attachment is obsolete: true
Attachment #647008 - Flags: review?
(In reply to Joel Maher (:jmaher) from comment #24)
> Comment on attachment 645491 [details] [diff] [review]
> ::: talos/xtalos/stop_xperf.py
> @@ +48,5 @@
> > if options == None:
> > print "Unable to verify options"
> > sys.exit(1)
> > 
> > + xperf_cmd = '"%s" -stop -stop talos_ses' % (options.xperf_path)
> 
> do we need two '-stop' commands here?

Yes, two stop commands are needed to stop the two trace sessions (user and kernel). "talos_ses" is the name of the user session; the kernel session has no name.
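(In script form, the stop step amounts to this — a sketch; xperf_path comes from the talos config:)

import subprocess

def stop_xperf(xperf_path, user_session="talos_ses"):
    # The first -stop (no name) ends the kernel session; the second ends the
    # named user session that carries the user-mode providers.
    subprocess.call('"%s" -stop -stop %s' % (xperf_path, user_session))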
(In reply to Joel Maher (:jmaher) from comment #25)
> Here are the results from tp5n (1 cycle, not 25 cycles):
> 
> (talos) c:\mozilla\talos\talos>type etl_output_thread_stats.csv
> ...

We still need to install the mof file and the XPCOM shutdown event patch to get I/O during shutdown separately. Then it will look something like what I get on my machine:

thread, stage, counter, value
main, normal, file_io_bytes, 2714779
main, normal, file_read_bytes, 1122436
main, normal, file_read_ops, 54
main, normal, file_write_bytes, 1592343
main, normal, file_write_ops, 91
main, normal, net_io_bytes, 1684
main, normal, net_recv_bytes, 673
main, normal, net_send_bytes, 1011
main, shutdown, file_io_bytes, 2681987
main, shutdown, file_write_bytes, 2681987
main, shutdown, file_write_ops, 174
main, shutdown, net_io_bytes, 4
main, shutdown, net_recv_bytes, 2
main, shutdown, net_send_bytes, 2
main, startup, file_io_bytes, 4619774
main, startup, file_read_bytes, 4618891
main, startup, file_read_ops, 574
main, startup, file_write_bytes, 883
main, startup, file_write_ops, 18
main, startup, net_io_bytes, 25
main, startup, net_recv_bytes, 12
main, startup, net_send_bytes, 13
non-main, normal, file_io_bytes, 1887236
non-main, normal, file_read_bytes, 1064664
non-main, normal, file_read_ops, 39
non-main, normal, file_write_bytes, 822572
non-main, normal, file_write_ops, 60
non-main, normal, net_io_bytes, 648
non-main, normal, net_recv_bytes, 347
non-main, normal, net_send_bytes, 301
non-main, startup, file_io_bytes, 5489921
non-main, startup, file_read_bytes, 5460697
non-main, startup, file_read_ops, 219
non-main, startup, file_write_bytes, 29224
non-main, startup, file_write_ops, 7
non-main, startup, net_io_bytes, 5055
non-main, startup, net_recv_bytes, 2527
non-main, startup, net_send_bytes, 2528
So I guess we're reporting 12 counters to the graph server: (main|non-main), (startup|normal|shutdown), (file_io_bytes|net_io_bytes) Some counters may not always be present. Last time I ran it, there was no I/O in non-main threads during shutdown.
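(Spelled out, that set is just the cross product — a throwaway snippet:)

combos = [(thread, stage, counter)
          for thread in ("main", "non-main")
          for stage in ("startup", "normal", "shutdown")
          for counter in ("file_io_bytes", "net_io_bytes")]
assert len(combos) == 12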
we can go with the twelve counters. If we do, I want to restrict this to a single test, or no more than 2 tests. Currently we are looking at TP5, are there other tests which would be of use?
is it normal to not have all counters present? i.e. is there a technical reason? If this is expected and normal, then I can make sure that talos handles that scenario properly.
The script just creates the corresponding items in the array as events are parsed.
Comment on attachment 647008 [details] [diff] [review] IO tracker + talos Review of attachment 647008 [details] [diff] [review]: ----------------------------------------------------------------- Overall this is good. I understand the MOF file. For the xpcom shutdown event, do we just need to land a patch on mozilla-central and use that?
Attachment #647008 - Flags: review? → review+
(In reply to Joel Maher (:jmaher) from comment #34)
> Comment on attachment 647008 [details] [diff] [review]
> IO tracker + talos
> 
> Review of attachment 647008 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Overall this is good. I understand the MOF file. For the xpcom shutdown
> event, do we just need to land a patch on mozilla-central and use that?

Yes. The patch is already on mozilla-central: https://hg.mozilla.org/mozilla-central/rev/f0544409e034
(In reply to Joel Maher (:jmaher) from comment #31)
> we can go with the twelve counters. If we do, I want to restrict this to a
> single test, or no more than 2 tests. Currently we are looking at TP5, are
> there other tests which would be of use?

tp5 is good for now. What's our eta for deployment?
the biggest unknown here is getting the .mof file deployed. Otherwise I need to spend some time getting graph server definitions defined, we need to deploy this as a new talos.zip, and we need to update buildbot-configs. Most of the known quantities can be handled this week; the unknown could take a few weeks. If we could get this deployed and just not collect/report data from the .mof file, we could get going and then track the bug to deploy the .mof file.
Why is it hard to deploy the .mof? Is there anything we can do to speed that up? Having the data partitioned makes it a lot more useful.
we just need to deploy the .mof file on all the windows 7 machines which is a few hundred. I assume this can be scripted as my initial test seemed like it would work fine, but this needs to get scheduled in with releng which wouldn't be a top priority for them.
(In reply to Joel Maher (:jmaher) from comment #39)
> we just need to deploy the .mof file on all the windows 7 machines which is
> a few hundred. I assume this can be scripted as my initial test seemed like
> it would work fine, but this needs to get scheduled in with releng which
> wouldn't be a top priority for them.

Can you file a bug specifying what needs to be done? I'll follow up on making it a priority.
Fixed the class name in the mof file. Apparently it didn't end up making much difference, since classes are referred to by GUID.
Attachment #647248 - Flags: review?(mh+mozilla)
Depends on: 778856
Attachment #647248 - Flags: review?(mh+mozilla) → review+
Can we get it checked-in to mozilla-central?
(In reply to Nicolás Chaim from comment #42)
> Can we get it checked-in to mozilla-central?

Mark the patch as checkin-needed and point it at glandium.
Comment on attachment 647248 [details] [diff] [review] Fix class name in mof file I can't seem to get the check-in needed flag.
Attachment #647248 - Flags: review?(mh+mozilla)
Keywords: checkin-needed
Keywords: checkin-needed
Attachment #647248 - Flags: review?(vdjeric)
Attachment #647248 - Flags: review?(vdjeric)
Attachment #647248 - Flags: review?(mh+mozilla)
Depends on: 779272
Status: REOPENED → RESOLVED
Closed: 13 years ago12 years ago
Resolution: --- → FIXED
Not done with the xperf part yet. Let's stop landing m-c changes in this bug :)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Taras Glek (:taras) from comment #48)
> Not done with the xperf part yet. Let's stop landing m-c changes in this bug :)

If it helps, you can use [leave open] in the whiteboard to stop the merge tool from closing the bug. (https://groups.google.com/d/topic/mozilla.dev.platform/DQyQ56DgmhA/discussion)
Attachment #648736 - Flags: review?(jhammel) → review+
landed graph server definitions: http://hg.mozilla.org/graphs/rev/b3935f847775
Depends on: 780681
Depends on: 781319
One problem I am having getting this patch to land on talos is that it uses python bits from modern talos, and we run python 2.4 on the windows talos machines. We have been trying for the last 2 quarters to bump the python version up, and it is much easier to rewrite our code to work with 2.4. The biggest thing is the 'from collections import defaultdict'. Is the primary purpose of the defaultdict to keep things in proper order, as in:

for cntr in sorted(io.iterkeys()):
for cntr in sorted(io.iterkeys()):

Either we can drop that requirement or work around it with some extra code; I am testing it now with no defaultdict requirement.
Assignee: nicolas → jmaher
I had to assign this bug to myself since nicolas @ mozilla.com was not a valid email in bugzilla and I couldn't submit my comment.

I have cleaned up the patch and it works in talos. I found that providing a user_provider which doesn't exist on the system will fail to produce anything. So, in the spirit of getting things going, we can temporarily remove the "Mozilla Generic Provider" from the listening configs, and this generates:

thread, stage, counter, value
main, startup, file_io_bytes, 40790612
main, startup, file_read_bytes, 38018251
main, startup, file_read_ops, 697
main, startup, file_write_bytes, 2772361
main, startup, file_write_ops, 170
main, startup, net_io_bytes, 43
main, startup, net_recv_bytes, 21
main, startup, net_send_bytes, 22

This isn't ideal, but it gets us going and allows us to integrate into talos until the windows 7 machines are all updated with the .mof file.
defaultdict(int) creates a dict that initializes members to the type's default value (zero for int) if they don't exist.

If we're using a regular dict, the equivalent would be:

key = (th, stg, "net_%s_bytes" % opType)
if key not in io:
    io[key] = 0
io[key] += bytes

We could perhaps also include some existing, external implementation for the defaultdict class, or maybe just initialize the dict with all possible keys we're using and zero values.
(In reply to Nicolás Chaim from comment #54)
> defaultdict(int) creates a dict that initializes members to the type's
> default value (zero for int) if they don't exist.
> 
> If we're using a regular dict, the equivalent would be:
> 
> key = (th, stg, "net_%s_bytes" % opType)
> if key not in io:
>     io[key] = 0
> io[key] += bytes

Please don't do this. python dicts have had setdefault forever:

key = (th, stg, "net_%s_bytes" % opType)
io.setdefault(key, 0) += bytes
(In reply to Jeff Hammel [:jhammel] from comment #55)
> (In reply to Nicolás Chaim from comment #54)
> > defaultdict(int) creates a dict that initializes members to the type's
> > default value (zero for int) if they don't exist.
> > 
> > If we're using a regular dict, the equivalent would be:
> > 
> > key = (th, stg, "net_%s_bytes" % opType)
> > if key not in io:
> >     io[key] = 0
> > io[key] += bytes
> 
> Please don't do this. python dicts have had setdefault forever:
> 
> key = (th, stg, "net_%s_bytes" % opType)
> io.setdefault(key, 0) += bytes

Oh, that's perfect. I'm not very familiar with Python.
I guess it would still need to be

io[key] = io.get(key, 0) + bytes

since io.setdefault(key, 0) will just return an immutable int.
(In reply to Nicolás Chaim from comment #57)
> I guess it would still need to be
> io[key] = io.get(key, 0) + bytes
> since io.setdefault(key, 0) will just return an immutable int

Beh, good point. :(
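(So the 2.4-friendly accumulation settled on above ends up looking roughly like this — a sketch of the pattern, not the exact landed code:)

io = {}

def accrue(thread, stage, op_type, nbytes):
    # Plain-dict equivalent of defaultdict(int); works on Python 2.4.
    key = (thread, stage, "net_%s_bytes" % op_type)
    io[key] = io.get(key, 0) + nbytes

accrue("main", "startup", "recv", 1460)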
landed the talos bits with the slight mods mentioned in the last few comments. This worked on try server without affecting current results: http://hg.mozilla.org/build/talos/rev/d3733d055e49 We still need to do a few modifications to talos to bring the xperf providers up from parsing the etl file and get them input into the graph server. I am working on that with :jhammel.
This patch does what it says it does by publishing all of the counters we collect from xperf to the graph servers. The remaining work is to resolve the config files and how this will be run from buildbot. If this patch is f+, we can start resolving the config file issues and the buildbot configs.
Attachment #655589 - Flags: feedback?(jhammel)
Comment on attachment 655589 [details] [diff] [review] make the counters report to graph server (1.0) >diff -r 6d79047595a4 talos/bcontroller.py >--- a/talos/bcontroller.py Fri Aug 24 20:02:51 2012 -0700 >+++ b/talos/bcontroller.py Mon Aug 27 09:37:21 2012 -0400 >@@ -109,11 +109,13 @@ > print "Error running etlparser: %s" % e > self.returncode = 1 > >- print "__xperf_data_begin__" >- fhandle = open(csvname, 'r') >- print fhandle.read() >+ results_file = open(self.browser_log, "a") >+ results_file.write("__xperf_data_begin__") >+ fhandle = open("etl_output_thread_stats.csv", 'r') >+ results_file.write(fhandle.read()) > fhandle.close() >- print "__xperf_data_end__" >+ results_file.write("__xperf_data_end__") >+ results_file.close() > else: #blocking call to system, non-remote device > self.returncode = os.system(self.command + " > " + self.browser_log) So this is just ensuring that we're writing to the browser log(?) That seems to be a good idea. I've also filed bug 786055 to get rid of the os.system calls and shell-level redirects >diff -r 6d79047595a4 talos/cmanager_win32.py >--- a/talos/cmanager_win32.py Fri Aug 24 20:02:51 2012 -0700 >+++ b/talos/cmanager_win32.py Mon Aug 27 09:37:21 2012 -0400 >@@ -21,6 +21,7 @@ > path = win32pdh.MakeCounterPath((None, 'process', process, > None, -1, counter)) > hq = win32pdh.OpenQuery() >+ hc = None > try: > hc = win32pdh.AddCounter(hq, path) > except: >@@ -33,8 +34,9 @@ > except: > win32pdh.CloseQuery(hq) > >- self.registeredCounters[counter] = [hq, [(hc, path)]] >- self.updateCounterPathsForChildProcesses(counter) >+ if hc: >+ self.registeredCounters[counter] = [hq, [(hc, path)]] >+ self.updateCounterPathsForChildProcesses(counter) Not sure I understand this code, but maybe I don't need to. >diff -r 6d79047595a4 talos/results.py >--- a/talos/results.py Fri Aug 24 20:02:51 2012 -0700 >+++ b/talos/results.py Mon Aug 27 09:37:21 2012 -0400 >@@ -265,6 +265,9 @@ > # regular expression for responsiveness results > RESULTS_RESPONSIVENESS_REGEX = re.compile('MOZ_EVENT_TRACE\ssample\s\d*?\s(\d*?)$', re.DOTALL|re.MULTILINE) > >+ # regular expression for xperf io data >+ XPERF_REGEX = re.compile('([a-z, \-\_]+)_io_bytes, ([0-9]+)') As you can probably guess, I'm not a huge fan of regexes like this, but I guess this is okay. Maybe a comment show example output would help. Also, isn't the xperf output formatted data anyway? Can't we just read the .csv and call it a day? > # classes for results types > classes = {'tsformat': TsResults, > 'tpformat': PageloaderResults} >@@ -391,6 +394,36 @@ > self.shutdown(global_counters) > if 'responsiveness' in global_counters: > global_counters['responsiveness'].extend(self.responsiveness()) >+ self.xperf(global_counters) >+ >+ def xperf(self, counter_results): >+ """record xperf counters in counter_results dictionary""" >+ >+ counters = ['main_startup_fileio', >+ 'main_startup_netio', >+ 'main_normal_fileio', >+ 'main_normal_netio', >+ 'main_shutdown_fileio', >+ 'main_shutdown_netio', >+ 'nonmain_startup_fileio', >+ 'nonmain_startup_netio', >+ 'nonmain_normal_fileio', >+ 'nonmain_normal_netio', >+ 'nonmain_shutdown_fileio', >+ 'nonmain_shutdown_netio'] Pretty messy but I don't know a way around this. >+ if not set(counters).intersection(counter_results.keys()): >+ # no xperf counters to accumulate >+ return >+ for line in self.results_raw.split('\n'): >+ xperfmatch = self.XPERF_REGEX.search(line) Again, I wouldn't use a regex for this. 
>+ if xperfmatch: >+ (type, value) = (xperfmatch.group(1), xperfmatch.group(2)) >+ # type will be similar to 'main, startup, file' >+ cname = ('_').join(type.split(', ')) >+ counter_name = '%sio' % cname >+ if counter_name in counter_results: >+ counter_results[counter_name].append(value) Does this need a setdefault? (not sure). > def rss(self, counter_results): > """record rss counters in counter_results dictionary""" >diff -r 6d79047595a4 talos/test.py >--- a/talos/test.py Fri Aug 24 20:02:51 2012 -0700 >+++ b/talos/test.py Mon Aug 27 09:37:21 2012 -0400 >@@ -49,7 +49,7 @@ > > class TsBase(Test): > """abstract base class for ts-style tests""" >- keys = ['url', 'url_timestamp', 'timeout', 'cycles', 'shutdown', 'profile_path', >+ keys = ['url', 'url_timestamp', 'timeout', 'cycles', 'shutdown', 'profile_path', 'xperf_counters', > 'xperf_providers', 'xperf_user_providers', 'xperf_stackwalk'] > > class ts(TsBase): >@@ -62,6 +62,7 @@ > class ts_paint(ts): > url = 'startup_test/tspaint_test.html?begin=' > shutdown = None >+ xperf_counters = ['main_startup_fileio', 'main_startup_netio', 'main_normal_fileio', 'main_normal_netio', 'main_shutdown_fileio', 'main_shutdown_netio', 'nonmain_startup_fileio', 'nonmain_startup_netio', 'nonmain_normal_fileio', 'nonmain_normal_netio', 'nonmain_shutdown_fileio', 'nonmain_shutdown_netio'] Again, messy, though i can't think of a better way of doing this. Maybe e.g. fileio implies all of '{main,nonmain}_{startup,normal,shutdown}_fileio'... > class ts_places_generated_max(ts): > profile_path = '${talos}/places_generated_max' >@@ -145,7 +146,7 @@ > filters = None > keys = ['tpmanifest', 'tpcycles', 'tppagecycles', 'tprender', 'tpchrome', 'tpmozafterpaint', > 'rss', 'resolution', 'cycles', >- 'win_counters', 'w7_counters', 'linux_counters', 'mac_counters', 'remote_counters', >+ 'win_counters', 'w7_counters', 'linux_counters', 'mac_counters', 'remote_counters', 'xperf_counters', > 'timeout', 'shutdown', 'responsiveness', 'profile_path', > 'xperf_providers', 'xperf_user_providers', 'xperf_stackwalk', 'filters' > ] >@@ -197,6 +198,7 @@ > w7_counters = ['Main_RSS', 'Content_RSS', 'Private Bytes', '% Processor Time', 'Modified Page List Bytes'] > linux_counters = ['Private Bytes', 'XRes', 'Main_RSS', 'Content_RSS'] > mac_counters = ['Private Bytes', 'Main_RSS', 'Content_RSS'] >+ xperf_counters = ['main_startup_fileio', 'main_startup_netio', 'main_normal_fileio', 'main_normal_netio', 'main_shutdown_fileio', 'main_shutdown_netio', 'nonmain_startup_fileio', 'nonmain_startup_netio', 'nonmain_normal_fileio', 'nonmain_normal_netio', 'nonmain_shutdown_fileio', 'nonmain_shutdown_netio'] > > class tp_js(PageloaderTest): > url = """'"http://localhost/page_load_test/framecycler.html?quit=1&cycles=5"'""" >diff -r 6d79047595a4 talos/ttest.py >--- a/talos/ttest.py Fri Aug 24 20:02:51 2012 -0700 >+++ b/talos/ttest.py Mon Aug 27 09:37:21 2012 -0400 >@@ -270,6 +270,10 @@ > # setup global (cross-cycle) counters: > # shutdown, responsiveness > global_counters = {} >+ if browser_config.get('xperf_path'): >+ for c in test_config.get('xperf_counters', []): >+ global_counters[c] = [] >+ > if test_config['shutdown']: > global_counters['shutdown'] = [] > if test_config.get('responsiveness') and platform.system() != "Linux": >diff -r 6d79047595a4 talos/xperf.config >--- a/talos/xperf.config Fri Aug 24 20:02:51 2012 -0700 >+++ b/talos/xperf.config Mon Aug 27 09:37:21 2012 -0400 >@@ -5,10 +5,15 @@ > talos.logfile: browser_output.txt > > # base test overrides >-# TODO: add "Mozilla Generic 
Provider" to xperf_user_providers once we deploy the mof file to test slaves > basetest: >+ cycles: 1 > tpcycles: 1 >- cycles: 1 >+ tppagecycles: 1 > xperf_providers : ['PROC_THREAD', 'LOADER', 'HARD_FAULTS', 'FILENAME', 'FILE_IO', 'FILE_IO_INIT'] >- xperf_user_providers: ['Microsoft-Windows-TCPIP'] >+ xperf_user_providers: ['Mozilla Generic Provider', 'Microsoft-Windows-TCPIP'] > xperf_stackwalk : ['FileRead', 'FileWrite', 'FileFlush'] >+ w7_counters: [] >+ win_counters: [] >+ linux_counters: [] >+ remote_counters: [] >+ mac_counters: []
Attachment #655589 - Flags: feedback?(jhammel) → feedback+
so we could assume there is a known csv (or we could make it .json) file and parse the data that way. That would remove our dependency on regexes. The idea was to keep this similar to the rest of the system, in that we output data to browser_output.txt and then parse it up after the test completes. Minor details; let's figure out the .config problem and start getting this stuff up for real reviews.
Depends on: 786308
A few small tweaks to this to make it work. Let's get this reviewed and talos deployed :) I chose to stick with all output going to browser_output.txt instead of having results.py read the .csv file. I would like to keep it as consistent as possible and put all the filename logic for .etl and .csv in one place.
Attachment #655589 - Attachment is obsolete: true
Attachment #657506 - Flags: review?(jhammel)
Comment on attachment 657506 [details] [diff] [review]
make the counters report to graph server (1.1)

This looks fine. Would you mind if I did a follow-up wrt how we parse the counters in results? (Parsing via CSV vs the regex.)

I would also like to put this on two separate lines:

NOISE: __xperf_data_end____startBeforeLaunchTimestamp1346452998207__endBeforeLaunchTimestamp

We have the following data marked by delimiters:

NOISE: __xperf_data_begin__thread, stage, counter, value
NOISE: main, normal, file_io_bytes, 135479381
NOISE: main, normal, file_read_bytes, 129100860
NOISE: main, normal, file_read_ops, 4280
NOISE: main, normal, file_write_bytes, 6378521
NOISE: main, normal, file_write_ops, 558
NOISE: main, normal, net_io_bytes, 11938485
NOISE: main, normal, net_recv_bytes, 1429236
NOISE: main, normal, net_send_bytes, 10509249
NOISE: main, shutdown, file_io_bytes, 26375
NOISE: main, shutdown, file_read_bytes, 7373
NOISE: main, shutdown, file_read_ops, 151
NOISE: main, shutdown, file_write_bytes, 19002
NOISE: main, shutdown, file_write_ops, 3
NOISE: main, startup, file_io_bytes, 24399710
NOISE: main, startup, file_read_bytes, 24398866
NOISE: main, startup, file_read_ops, 420
NOISE: main, startup, file_write_bytes, 844
NOISE: main, startup, file_write_ops, 18
NOISE: main, startup, net_io_bytes, 2
NOISE: main, startup, net_recv_bytes, 1
NOISE: main, startup, net_send_bytes, 1
NOISE: non-main, normal, file_io_bytes, 471162110
NOISE: non-main, normal, file_read_bytes, 96544227
NOISE: non-main, normal, file_read_ops, 3854
NOISE: non-main, normal, file_write_bytes, 374617883
NOISE: non-main, normal, file_write_ops, 37368
NOISE: non-main, normal, net_io_bytes, 150199046
NOISE: non-main, normal, net_recv_bytes, 84117265
NOISE: non-main, normal, net_send_bytes, 66081781
NOISE: non-main, startup, file_io_bytes, 354464
NOISE: non-main, startup, file_read_bytes, 354464
NOISE: non-main, startup, file_read_ops, 12
NOISE: __xperf_data_end__

This is just CSV. It should be pretty trivial to parse with the csv module. I'm also inclined to put the list of all counters in e.g. xtalos or somewhere they can be globally accessed. This way we can check the validity of what is passed in.
Attachment #657506 - Flags: review?(jhammel) → review+
Attached file example CSV
Attached file example reading CSV (obsolete) —
this is an example of reading the example output from CSV. I don't look for the delimiters, __xperf_data_{start,end}__ (which we currently don't use at all), which still needs to be done, but these can easily be found a la the other delimiters in results (e.g. http://hg.mozilla.org/build/talos/file/6d1c5ee867e6/talos/results.py#l330).
Attachment #657537 - Flags: feedback?(jmaher)
(it would also be nice to have a test for this parsing)
Comment on attachment 657537 [details]
example reading CSV

>import csv
>
>contents = file('xperf.csv').read()

we should do:

if not os.path.exists(browser_config['xperf_csvname']):
    return
contents = file(browser_config['xperf_csvname']).read()

>lines = contents.splitlines()
>reader = csv.reader(lines)
>header = None
>for row in reader:
>
>    # read CSV
>    row = [i.strip() for i in row]
>    if not header:
>        header = row

this assumes the first row is the header; we could look for __xperf_start or something

>        continue
>    values = dict(zip(header, row))
>
>    # format for talos
>    thread = values['thread']
>    if thread == 'non-main':
>        thread = 'nonmain'

come to think of it, we can fix this hack in how we generate the csv in the first place

>    counter = values['counter'].rsplit('_io_bytes', 1)[0]
>    counter_name = '%s_%s_%sio' % (thread, values['stage'], counter)
>    value = float(values['value'])
>
>    # accrue counter
>    if counter_name in counter_results:
>        counter_results[counter_name].append(value)

I would initialize the counter first. Also make sure we only assign counters that are legit:

if counter_name in xtalos.known_counters:
    counter_results.setdefault(counter_name, []).append(value)
Attachment #657537 - Flags: feedback?(jmaher) → feedback+
landed the results->graphserver patch in talos repo: http://hg.mozilla.org/build/talos/rev/a632249f5fe9

still left to do:
- sanity check try server run
- deploy talos.zip
- create patch for buildbot, get r+ and a reconfig
(In reply to Joel Maher (:jmaher) from comment #68)
> Comment on attachment 657537 [details]
> example reading CSV
> 
> >import csv
> >
> >contents = file('xperf.csv').read()
> we should do:
> if not os.path.exists(browser_config['xperf_csvname']):
>     return
> contents = file(browser_config['xperf_csvname']).read()

Sure, this was just an example for local data pasted from your try run. In real life:
- we already have the browser_log in memory: http://hg.mozilla.org/build/talos/file/6d1c5ee867e6/talos/results.py#l284
- we would look for __xperf_start__ and __xperf_end__ like we do the other delimiters (e.g. line 330) and take the content between these delimiters
- these become the contents, vs. the xperf.csv file I read for testing

> >lines = contents.splitlines()
> >reader = csv.reader(lines)
> >header = None
> >for row in reader:
> >
> >    # read CSV
> >    row = [i.strip() for i in row]
> >    if not header:
> >        header = row
> this assumes the first row is the header; we could look for __xperf_start or
> something

Currently, we print the header row right after __xperf_start__.

> >        continue
> >    values = dict(zip(header, row))
> >
> >    # format for talos
> >    thread = values['thread']
> >    if thread == 'non-main':
> >        thread = 'nonmain'
> come to think of it, we can fix this hack in how we generate the csv in the
> first place

Even better :)

> >    counter = values['counter'].rsplit('_io_bytes', 1)[0]
> >    counter_name = '%s_%s_%sio' % (thread, values['stage'], counter)
> >    value = float(values['value'])
> >
> >    # accrue counter
> >    if counter_name in counter_results:
> >        counter_results[counter_name].append(value)
> I would initialize the counter first. Also make sure we only assign
> counters that are legit:
> if counter_name in xtalos.known_counters:
>     counter_results.setdefault(counter_name, []).append(value)

Yep, even better :)

In any case, I'm fine doing this as a follow-up if you want to check in your patch. I mostly wrote this as a proof of concept.
I would rather do this as a follow-up so we can get the xperf tests live as fast as possible. Although with the long weekend, it might be a couple of days before we really do get the tests live.
sounds good to me
this incorporates the suggestions :jhammel had for csv parsing of xperf.csv. No more hacking through browser_output.txt! There was a bug in the regular tp5 runs where we were trying to access counters which didn't exist. This is resolved in the patch by better error checking.
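(For reference, the CSV-parsing approach suggested above amounts to roughly the following. The counter names and csv filename are the ones used earlier in this bug; this is a sketch rather than the landed etlparser/results code, and it targets the Python 2 that talos runs on.)

import csv
import os

KNOWN_COUNTERS = set("%s_%s_%s" % (thread, stage, kind)
                     for thread in ("main", "nonmain")
                     for stage in ("startup", "normal", "shutdown")
                     for kind in ("fileio", "netio"))

def read_xperf_counters(csvname, counter_results):
    if not os.path.exists(csvname):
        return  # no xperf data for this run
    reader = csv.reader(open(csvname))
    header = [h.strip() for h in reader.next()]  # "thread, stage, counter, value"
    for row in reader:
        values = dict(zip(header, [v.strip() for v in row]))
        # "file_io_bytes" -> "file", "net_io_bytes" -> "net"; per-op rows like
        # "file_read_bytes" don't form a known counter name and are skipped below.
        kind = values["counter"].rsplit("_io_bytes", 1)[0]
        name = "%s_%s_%sio" % (values["thread"].replace("non-main", "nonmain"),
                               values["stage"], kind)
        if name in KNOWN_COUNTERS:
            counter_results.setdefault(name, []).append(float(values["value"]))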
Attachment #657537 - Attachment is obsolete: true
Attachment #658128 - Flags: review?(jhammel)
Comment on attachment 658128 [details] [diff] [review]
fix bug in counters for tp5n on windows, cleanup .csv parsing (1.0)

+ if len(self.registeredCounters[counter]) < 2:
+     return None
+

Not sure I understand this line?

+ filename = 'etl_output_thread_stats.csv'
+ if not os.path.exists(filename):
+     return

AIUI, if you have xperf counters, this file *should* exist. So maybe we should make a debug statement here?

+ import csv

Personally I'd put this at the top of the file.

Hmmm... there's also no error handling if the file is of a bad format. I guess that's not a problem until it is.

Other than those nits, looks good to me!
Attachment #658128 - Flags: review?(jhammel) → review+
updated with feedback from review and landed: http://hg.mozilla.org/build/talos/rev/2605620aa7fc
Depends on: 788235
Depends on: 788354
Displaying all graphs at once: http://graphs.mozilla.org/graph.html#tests=[[251,94,12],[250,94,12],[248,94,12],[247,94,12],[246,94,12],[244,94,12],[245,94,12],[243,94,12],[242,94,12]]&sel=none&displayrange=7&datatype=running

tp5n_nonmain_normal_netio_paint: 153735839.0: http://graphs.mozilla.org/graph.html#tests=[[251,94,12]]&sel=none&displayrange=7&datatype=running
tp5n_main_startup_fileio_paint: 24400446.0: http://graphs.mozilla.org/graph.html#tests=[[242,94,12]]&sel=none&displayrange=7&datatype=running
tp5n_main_shutdown_fileio_paint: 59143.0: http://graphs.mozilla.org/graph.html#tests=[[246,94,12]]&sel=none&displayrange=7&datatype=running
tp5n_main_normal_netio_paint: 20300.0: http://graphs.mozilla.org/graph.html#tests=[[245,94,12]]&sel=none&displayrange=7&datatype=running
tp5n_main_startup_netio_paint: 2.0: http://graphs.mozilla.org/graph.html#tests=[[243,94,12]]&sel=none&displayrange=7&datatype=running
tp5n_nonmain_startup_fileio_paint: 354464.0: http://graphs.mozilla.org/graph.html#tests=[[248,94,12]]&sel=none&displayrange=7&datatype=running
tp5n_nonmain_normal_fileio_paint: 502931900.0: http://graphs.mozilla.org/graph.html#tests=[[250,94,12]]&sel=none&displayrange=7&datatype=running
tp5n_main_normal_fileio_paint: 53619254.0: http://graphs.mozilla.org/graph.html#tests=[[244,94,12]]&sel=none&displayrange=7&datatype=running
tp5n_main_shutdown_netio_paint: 2.0: http://graphs.mozilla.org/graph.html#tests=[[247,94,12]]&sel=none&displayrange=7&datatype=running

All of these links are for mozilla-central; the links will be slightly different when looking at inbound data.
Status: REOPENED → RESOLVED
Closed: 12 years ago12 years ago
Resolution: --- → FIXED
What does the _paint suffix mean?
it is a relic of talos. It means that we are waiting for mozAfterPaint before going to the next page (which is what we do by default now in all our tests).
(In reply to Taras Glek (:taras) from comment #77)
> What does the _paint suffix mean?

Please see http://k0s.org/mozilla/blog/20120724135349 for the hoary details. Ultimately, we should not be sending metadata as part of the test name, but for graphserver these things are tied together. For datazilla they will not be.
Joel, do we retain the csv and/or xperf file after running the test? tp5n_main_shutdown_fileio_paint shows a lot of variation (which may be due to some non-determinism in our code). It would be good to compare the io to see where the extra io is going.
Unfortunately we do not save off the .etl or generated .csv file. I could add in a fix to print out the .csv file to stdout and it would show up in the logs. Let's figure out a few things while we watch the tree for another 20-30 pushes. That might bring out other hot spots or patterns.
(In reply to Joel Maher (:jmaher) from comment #81)
> Unfortunately we do not save off the .etl or generated .csv file. I could
> add in a fix to print out the .csv file to stdout and it would show up in
> the logs. Let's figure out a few things while we watch the tree for another
> 20-30 pushes. That might bring out other hot spots or patterns.

Can you turn on logging of tp5n_main_shutdown_fileio_paint io?
For some reason we are not collecting it. It could be we have it configured incorrectly (please check my xperf options: http://hg.mozilla.org/build/talos/file/tip/talos/xperf.config#l12), or maybe the parser is mixing it up. I will be heads down most of this week on mobile stuff. Whoever can get to it first wins.
Blocks: 644744
tp5n_main_shutdown_fileio_paint: 59143.0: http://graphs.mozilla.org/graph.html#tests=[[246,94,12]]&sel=none&displayrange=7&datatype=running

Let me know if that is useful or not.
(In reply to Joel Maher (:jmaher) from comment #85)
> tp5n_main_shutdown_fileio_paint: 59143.0:
> http://graphs.mozilla.org/graph.html#tests=[[246,94,12]]&sel=none&displayrange=7&datatype=running
> 
> Let me know if that is useful or not.

It's not. Can you add a summary of the files read + bytes from them to the logs? This would be handy for at least main thread startup/shutdown io (we can skip the others for now to save space).
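(A possible shape for that per-file summary, assuming the parsed etl rows carry a filename column, as the FILENAME kernel provider in the config suggests; purely illustrative.)

def summarize_files(rows, firefox_pid):
    # Aggregate bytes per (stage, filename) for main-thread startup/shutdown
    # I/O, so the log can show which files account for the variation.
    per_file = {}
    for row in rows:
        if (row["pid"] == firefox_pid and row["thread"] == "main"
                and row["stage"] in ("startup", "shutdown")):
            key = (row["stage"], row["filename"])
            per_file[key] = per_file.get(key, 0) + row["bytes"]
    return per_file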
we already print the data in the logfile:

RETURN:<a href='http://graphs.mozilla.org/graph.html#tests=[[251,94,12]]'>tp5n_nonmain_normal_netio_paint: 140670519.0</a><br>
RETURN:<a href='http://graphs.mozilla.org/graph.html#tests=[[242,94,12]]'>tp5n_main_startup_fileio_paint: 26365858.0</a><br>
RETURN:<a href='http://graphs.mozilla.org/graph.html#tests=[[246,94,12]]'>tp5n_main_shutdown_fileio_paint: 30886.0</a><br>
RETURN:<a href='http://graphs.mozilla.org/graph.html#tests=[[245,94,12]]'>tp5n_main_normal_netio_paint: 5307030.0</a><br>
RETURN:<a href='http://graphs.mozilla.org/graph.html#tests=[[243,94,12]]'>tp5n_main_startup_netio_paint: 2.0</a><br>
RETURN:<a href='http://graphs.mozilla.org/graph.html#tests=[[248,94,12]]'>tp5n_nonmain_startup_fileio_paint: 477344.0</a><br>
RETURN:<a href='http://graphs.mozilla.org/graph.html#tests=[[250,94,12]]'>tp5n_nonmain_normal_fileio_paint: 471793045.0</a><br>
RETURN:<a href='http://graphs.mozilla.org/graph.html#tests=[[244,94,12]]'>tp5n_main_normal_fileio_paint: 39505092.0</a><br>
RETURN:<a href='http://graphs.mozilla.org/graph.html#tests=[[247,94,12]]'>tp5n_main_shutdown_netio_paint: 3.0</a><br>