779571 - make xtalos have an api and use it

Reporter

Description

•

12 years ago

We invoke talos from the CLI:

http://hg.mozilla.org/build/talos/file/aca06caab3ee/talos/bcontroller.py#l110

Instead, we should refactor xtalos such that we can just import it and
call it like real programmers.  The reason it is currently a CLI is so
developers can run it independently.  However, there is no reason
that we can't have an API that is called by a (minimal) CLI.

instead of shelling out, we can do, e.g.

import xtalos.start_xperf

start_xperf.collect(args)

See also bug 779565

Jeff Hammel

Reporter

Comment 1

•

12 years ago

Looking at xtalos, I see a few things that I assume are just bugs:

- we use '+'.join() on several things, e.g. http://hg.mozilla.org/build/talos/file/e3ab367914e6/talos/xtalos/start_xperf.py#l67 ; however, in fact these are only strings: http://hg.mozilla.org/build/talos/file/e3ab367914e6/talos/xtalos/xtalos.py#l84  so
'+'.join('FILE_IO') in fact => "F+I+L+E+_+I+O"; I assume we want lists here?

- http://hg.mozilla.org/build/talos/file/e3ab367914e6/talos/xtalos/etlparser.py#l239 :

   239       if img == procName:
   240         gThreads[tid] = "main"
   241       else:
   242         "non-main"

I assume this should be gThreads[tid] = 'non-main' ?

Joel Maher ( :jmaher ) (UTC -8)

Comment 2

•

12 years ago

yeah, that is a good find.

Jeff Hammel

Reporter

Comment 3

•

12 years ago

This is also very cute:

http://hg.mozilla.org/build/talos/file/e3ab367914e6/talos/bcontroller.py#l116

If starting xperf fails, we set the return code to failure.  Then, regardless of that, we launch the browser and get its return code.  Then, if stopping xperf fails we set it to fail.

I'm not sure what we actually want to do here (maybe worth ticketing of its own right, but for this patch I'll just keep this logic even though it is wrong

Joel Maher ( :jmaher ) (UTC -8)

Comment 4

•

12 years ago

we don't need to have backwards compatibility, lets just do it right.

Jeff Hammel

Reporter

Comment 5

•

12 years ago

Sure, I'm just not sure what "right" is here ;)  What should the returncode be in this case?

Jeff Hammel

Reporter

Comment 6

•

12 years ago

Attached patch WIP (obsolete) — Details — Splinter Review

this is a rough draft.  I have not tested or proofread it yet, but wanted to get something up to show progress.

I was tempted to use the configuration module.  However, it lies in talos directory, so when running uninstalled (as buildbot does), you won't have (good) access to the configuration file. (I.e. when installed you can do `from talos import configuration`; when not installed, you can't).  While this isn't insurmountable, I decided the cost isn't worth it for the time being.  xtalos also has the inverse sense of what configuration does: given command line options, it overrides them from the configuration file(!) Again, not insurmountable (and probably not a good idea, TBH), but I decided it wasn't worth futzing with yet.

Attachment #652498 - Flags: feedback?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Comment 7

•

12 years ago

Comment on attachment 652498 [details] [diff] [review]
WIP

Review of attachment 652498 [details] [diff] [review]:
-----------------------------------------------------------------

a lot of changes, but it was spaces for the most part :)

::: talos/bcontroller.py
@@ +99,3 @@
>          self.returncode = 1
>  
> +      # run the xtl parser

etl parser

::: talos/xtalos/etlparser.py
@@ +96,2 @@
>  
>  

please remove the autolog stuff, it is not needed anymore.

@@ +309,3 @@
>      if alog:
> +        # post to autolog
> +        alog.post()

we can remove the alog stuff

Attachment #652498 - Flags: feedback?(jmaher) → feedback+

Mozilla RelEng Bot

Comment 8

•

12 years ago

Try run for 1cab6db86c5d is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=1cab6db86c5d
Results (out of 2 total builds):
    success: 1
    failure: 1
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jhammel@mozilla.com-1cab6db86c5d

Mozilla RelEng Bot

Comment 9

•

12 years ago

Try run for 1cab6db86c5d is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=1cab6db86c5d
Results (out of 3 total builds):
    success: 1
    failure: 2
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jhammel@mozilla.com-1cab6db86c5d

Jeff Hammel

Reporter

Comment 10

•

12 years ago

(In reply to Mozilla RelEng Bot from comment #9)
> Try run for 1cab6db86c5d is complete.
> Detailed breakdown of the results available here:
>     https://tbpl.mozilla.org/?tree=Try&rev=1cab6db86c5d
> Results (out of 3 total builds):
>     success: 1
>     failure: 2
> Builds (or logs if builds failed) available at:
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jhammel@mozilla.
> com-1cab6db86c5d

https://tbpl.mozilla.org/php/getParsedLog.php?id=14546083&tree=Try&full=1

FAIL: Busted: ts_paint
FAIL: initialization timed out
Traceback (most recent call last):
  File "run_tests.py", line 250, in run_tests
    talos_results.add(mytest.runTest(browser_config, test))
  File "c:\talos-slave\talos-data\talos\ttest.py", line 263, in runTest
    self.initializeProfile(profile_dir, browser_config)
  File "c:\talos-slave\talos-data\talos\ttest.py", line 137, in initializeProfile
    if not self._ffsetup.InitializeNewProfile(profile_dir, browser_config):
  File "c:\talos-slave\talos-data\talos\ffsetup.py", line 292, in InitializeNewProfile
    raise talosError("initialization timed out")
talosError: 'initialization timed out'
Traceback (most recent call last):
  File "run_tests.py", line 298, in ?
    main()
  File "run_tests.py", line 295, in main
    run_tests(parser)
  File "run_tests.py", line 259, in run_tests
    raise e
utils.talosError: 'initialization timed out'
Error starting xperf
Error running etlparser
__xperf_data_begin__
Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Python24\lib\threading.py", line 442, in __bootstrap
    self.run()
  File "c:\talos-slave\talos-data\talos\bcontroller.py", line 112, in run
    fhandle = open(csvname, 'r')
IOError: [Errno 2] No such file or directory: 'etl_output.csv'

I'm not sure why initialization is timing out.  It makes perfect sense why there is no .csv file afterwards, as the browser is never started to gather stats on.  I'll investigate this locally and see what is awry.

Jeff Hammel

Reporter

Comment 11

•

12 years ago

Running locally I get:

qm-pxp01:
                Started Wed, 22 Aug 2012 15:42:24
Running test ts_paint:
                Started Wed, 22 Aug 2012 15:42:24
DEBUG: operating with platform_type : w7_
DEBUG: created profile
Error starting xperf
c:/Program Files/Microsoft Windows Performance Toolkit/xperf: error: NT Kernel L
ogger: The instance name passed was not recognized as valid by a WMI data provid
er. (0x1069).
c:/Program Files/Microsoft Windows Performance Toolkit/xperf: error: talos_ses:
The instance name passed was not recognized as valid by a WMI data provider. (0x
1069).
Error running etlparser
__xperf_data_begin__
Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Python27\Lib\threading.py", line 551, in __bootstrap_inner
    self.run()
  File "C:\Users\jhammel\talos\talos\bcontroller.py", line 112, in run
    fhandle = open(csvname, 'r')
IOError: [Errno 2] No such file or directory: 'etl_output.csv'

Jeff Hammel

Reporter

Comment 12

•

12 years ago

Of course I get this even without my patch

DEBUG: operating with platform_type : w7_
DEBUG: created profile
No xperf providers options given
xperf: error: NT Kernel Logger: The instance name passed
alid by a WMI data provider. (0x1069).
xperf: error: talos_ses: The instance name passed was no
 a WMI data provider. (0x1069).
xperf: error: test.etl: The specified path is invalid. (
xperf: error: test.etl: The specified path is invalid. (
in readfile: test.etl.csv
__xperf_data_begin__
filename, readcount, readbytes, writecount, writebytes

__xperf_data_end__
NOISE: Could not find __metrics(.*)__metrics in browser_
NOISE: Raw results:__start_report82|82|83|81|86|111|106|
3|82|106|82|83|106__end_report__startTimestamp1345674758
NOISE: openingTimes=82,83,81,86,111,106,91,106,82,82,104
06
NOISE: avgOpenTime:91
NOISE: minOpenTime:81
NOISE: maxOpenTime:111
NOISE: medOpenTime:83
NOISE: __xulWinOpenTime:83
NOISE: __startBeforeLaunchTimestamp1345674714916__endBef
NOISE: __startAfterTerminationTimestamp1345674758243__en
mp
NOISE: __startBeforeLaunchTimestamp1345675796160__endBef
NOISE: __startAfterTerminationTimestamp1345675799718__en
mp
NOISE: Initialization of new profile failed
NOISE: __start_report82|82|83|81|86|111|106|91|106|82|82
83|106__end_report__startTimestamp1345674758087__endTime
NOISE: openingTimes=82,83,81,86,111,106,91,106,82,82,104
06
NOISE: avgOpenTime:91
NOISE: minOpenTime:81
NOISE: maxOpenTime:111
NOISE: medOpenTime:83
NOISE: __xulWinOpenTime:83
NOISE: __startBeforeLaunchTimestamp1345674714916__endBef
NOISE: __startAfterTerminationTimestamp1345674758243__en
mp
NOISE: __startBeforeLaunchTimestamp1345675796160__endBef
NOISE: __startAfterTerminationTimestamp1345675799718__en
mp
Failed ts_paint:
                Stopped Wed, 22 Aug 2012 15:50:01
FAIL: Busted: ts_paint
FAIL: failed to initialize browser
Traceback (most recent call last):
  File "C:\Users\jhammel\talos\talos\run_tests.py", line
    talos_results.add(mytest.runTest(browser_config, tes
  File "C:\Users\jhammel\talos\talos\ttest.py", line 263
    self.initializeProfile(profile_dir, browser_config)
  File "C:\Users\jhammel\talos\talos\ttest.py", line 138
    raise talosError("failed to initialize browser")
talosError: 'failed to initialize browser'
Traceback (most recent call last):
  File "C:\Users\jhammel\talos\Scripts\talos-script.py",
    load_entry_point('talos==0.0', 'console_scripts', 't
  File "C:\Users\jhammel\talos\talos\run_tests.py", line
    run_tests(parser)
  File "C:\Users\jhammel\talos\talos\run_tests.py", line
    raise e
talos.utils.talosError: 'failed to initialize browser'

Joel Maher ( :jmaher ) (UTC -8)

Comment 13

•

12 years ago

We are currently running ts_paint (not tpaint) for xperf and will be switching to tp5n, makes sense to focus on those tests for now.

Make sure you are not requesting Mozilla specific performance counters, look at the original xperf.config file from a month or two ago.

Jeff Hammel

Reporter

Comment 14

•

12 years ago

Ah, you are correct, sir!  I was forgetting to specify the xperf.conf file.  With this, I can indeed reproduce the problem: works "fine" without my patch (actually, there are errors, but non-fatal ones that probably different from our production xperf setup), and fails with my patch

Jeff Hammel

Reporter

Comment 15

•

12 years ago

Attached patch WIP (obsolete) — Details — Splinter Review

Attachment #652498 - Attachment is obsolete: true

Jeff Hammel

Reporter

Comment 16

•

12 years ago

Attached patch fixed, i hope — Details — Splinter Review

I *think* this resolves all of the important issues.  I've also removed posting to autolog, which is a plus anyway.  None of this is perfect -- partially due to bcontroller's, um "idiom", and partially as it is a straight port of what we have before.  For instance, the first time we run bcontroller for the example launch, we don't have xperf stackwalker options.  Too bad. It doesn't fail, just prints out a stupid error message.

In any case, I think this is much better and something worth building on.

Attachment #655045 - Attachment is obsolete: true

Attachment #655146 - Flags: review?(jmaher)

Jeff Hammel

Reporter

Comment 17

•

12 years ago

(In reply to Jeff Hammel [:jhammel] from comment #16)
> Created attachment 655146 [details] [diff] [review]
> fixed, i hope
> 
> I *think* this resolves all of the important issues.  I've also removed
> posting to autolog, which is a plus anyway.  None of this is perfect --
> partially due to bcontroller's, um "idiom", and partially as it is a
> straight port of what we have before.  For instance, the first time we run
> bcontroller for the example launch, we don't have xperf stackwalker options.
> Too bad. It doesn't fail, just prints out a stupid error message.
> 
> In any case, I think this is much better and something worth building on.

pushed to try: https://tbpl.mozilla.org/?tree=Try&rev=fc9be2298de0

Joel Maher ( :jmaher ) (UTC -8)

Comment 18

•

12 years ago

Comment on attachment 655146 [details] [diff] [review]
fixed, i hope

Review of attachment 655146 [details] [diff] [review]:
-----------------------------------------------------------------

I ran this locally and it produces expected results.  just a little nit.

We will be adjusting this code in a few more iterations, so lets get this in for now and keep iterating on it throughout next week.

::: talos/bcontroller.py
@@ +99,3 @@
>          self.returncode = 1
>  
> +      # run the xtl parser

s/xtl/etl/

Attachment #655146 - Flags: review?(jmaher) → review+

Jeff Hammel

Reporter

Comment 19

•

12 years ago

Attached patch https://bugzilla.mozilla.org/show_bug.cgi?id=779571#c18 — Details — Splinter Review

Jeff Hammel

Reporter

Comment 20

•

12 years ago

So xperf results are in and they're green: https://tbpl.mozilla.org/php/getParsedLog.php?id=14684471&tree=Try&full=1

For the initial run, we do get:
"""
Error starting xperf: No xperf stackwalk options given
etlparser: in readfile: test.etl.csv
__xperf_data_begin__
"""

However, we don't care about xperf data for this run and ABICT it should affect results.  Further runs do should xperf results in the log

Joel Maher ( :jmaher ) (UTC -8)

Comment 21

•

12 years ago

Looking at the xperf log it is all good.  We get real data.

Jeff Hammel

Reporter

Comment 22

•

12 years ago

I'm going to go ahead and push.  The probability that this will affect any pending test (mostly linux opt right now) is fairly negligible.  Feel free to back out if it does cause problems

Jeff Hammel

Reporter

Comment 23

•

12 years ago

pushed: http://hg.mozilla.org/build/talos/rev/6d79047595a4

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

WIP 12 years ago Jeff Hammel 48.29 KB, patch	jmaher : feedback+	Details \| Diff \| Splinter Review
WIP 12 years ago Jeff Hammel 48.28 KB, patch		Details \| Diff \| Splinter Review
fixed, i hope 12 years ago Jeff Hammel 65.24 KB, patch	jmaher : review+	Details \| Diff \| Splinter Review
https://bugzilla.mozilla.org/show_bug.cgi?id=779571#c18 12 years ago Jeff Hammel 65.24 KB, patch		Details \| Diff \| Splinter Review