Closed Bug 779571 Opened 12 years ago Closed 12 years ago

make xtalos have an api and use it

Categories

(Testing :: Talos, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: k0scist, Unassigned)

References

Details

Attachments

(2 files, 2 obsolete files)

We invoke talos from the CLI:

http://hg.mozilla.org/build/talos/file/aca06caab3ee/talos/bcontroller.py#l110

Instead, we should refactor xtalos such that we can just import it and
call it like real programmers.  The reason it is currently a CLI is so
developers can run it independently.  However, there is no reason
that we can't have an API that is called by a (minimal) CLI.

instead of shelling out, we can do, e.g.

import xtalos.start_xperf

start_xperf.collect(args)

See also bug 779565
Looking at xtalos, I see a few things that I assume are just bugs:

- we use '+'.join() on several things, e.g. http://hg.mozilla.org/build/talos/file/e3ab367914e6/talos/xtalos/start_xperf.py#l67 ; however, in fact these are only strings: http://hg.mozilla.org/build/talos/file/e3ab367914e6/talos/xtalos/xtalos.py#l84  so
'+'.join('FILE_IO') in fact => "F+I+L+E+_+I+O"; I assume we want lists here?

- http://hg.mozilla.org/build/talos/file/e3ab367914e6/talos/xtalos/etlparser.py#l239 :

   239       if img == procName:
   240         gThreads[tid] = "main"
   241       else:
   242         "non-main"

I assume this should be gThreads[tid] = 'non-main' ?
yeah, that is a good find.
This is also very cute:

http://hg.mozilla.org/build/talos/file/e3ab367914e6/talos/bcontroller.py#l116

If starting xperf fails, we set the return code to failure.  Then, regardless of that, we launch the browser and get its return code.  Then, if stopping xperf fails we set it to fail.

I'm not sure what we actually want to do here (maybe worth ticketing of its own right, but for this patch I'll just keep this logic even though it is wrong
we don't need to have backwards compatibility, lets just do it right.
Sure, I'm just not sure what "right" is here ;)  What should the returncode be in this case?
Attached patch WIP (obsolete) — Splinter Review
this is a rough draft.  I have not tested or proofread it yet, but wanted to get something up to show progress.

I was tempted to use the configuration module.  However, it lies in talos directory, so when running uninstalled (as buildbot does), you won't have (good) access to the configuration file. (I.e. when installed you can do `from talos import configuration`; when not installed, you can't).  While this isn't insurmountable, I decided the cost isn't worth it for the time being.  xtalos also has the inverse sense of what configuration does: given command line options, it overrides them from the configuration file(!) Again, not insurmountable (and probably not a good idea, TBH), but I decided it wasn't worth futzing with yet.
Attachment #652498 - Flags: feedback?(jmaher)
Comment on attachment 652498 [details] [diff] [review]
WIP

Review of attachment 652498 [details] [diff] [review]:
-----------------------------------------------------------------

a lot of changes, but it was spaces for the most part :)

::: talos/bcontroller.py
@@ +99,3 @@
>          self.returncode = 1
>  
> +      # run the xtl parser

etl parser

::: talos/xtalos/etlparser.py
@@ +96,2 @@
>  
>  

please remove the autolog stuff, it is not needed anymore.

@@ +309,3 @@
>      if alog:
> +        # post to autolog
> +        alog.post()

we can remove the alog stuff
Attachment #652498 - Flags: feedback?(jmaher) → feedback+
Try run for 1cab6db86c5d is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=1cab6db86c5d
Results (out of 2 total builds):
    success: 1
    failure: 1
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jhammel@mozilla.com-1cab6db86c5d
Try run for 1cab6db86c5d is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=1cab6db86c5d
Results (out of 3 total builds):
    success: 1
    failure: 2
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jhammel@mozilla.com-1cab6db86c5d
(In reply to Mozilla RelEng Bot from comment #9)
> Try run for 1cab6db86c5d is complete.
> Detailed breakdown of the results available here:
>     https://tbpl.mozilla.org/?tree=Try&rev=1cab6db86c5d
> Results (out of 3 total builds):
>     success: 1
>     failure: 2
> Builds (or logs if builds failed) available at:
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jhammel@mozilla.
> com-1cab6db86c5d

https://tbpl.mozilla.org/php/getParsedLog.php?id=14546083&tree=Try&full=1

FAIL: Busted: ts_paint
FAIL: initialization timed out
Traceback (most recent call last):
  File "run_tests.py", line 250, in run_tests
    talos_results.add(mytest.runTest(browser_config, test))
  File "c:\talos-slave\talos-data\talos\ttest.py", line 263, in runTest
    self.initializeProfile(profile_dir, browser_config)
  File "c:\talos-slave\talos-data\talos\ttest.py", line 137, in initializeProfile
    if not self._ffsetup.InitializeNewProfile(profile_dir, browser_config):
  File "c:\talos-slave\talos-data\talos\ffsetup.py", line 292, in InitializeNewProfile
    raise talosError("initialization timed out")
talosError: 'initialization timed out'
Traceback (most recent call last):
  File "run_tests.py", line 298, in ?
    main()
  File "run_tests.py", line 295, in main
    run_tests(parser)
  File "run_tests.py", line 259, in run_tests
    raise e
utils.talosError: 'initialization timed out'
Error starting xperf
Error running etlparser
__xperf_data_begin__
Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Python24\lib\threading.py", line 442, in __bootstrap
    self.run()
  File "c:\talos-slave\talos-data\talos\bcontroller.py", line 112, in run
    fhandle = open(csvname, 'r')
IOError: [Errno 2] No such file or directory: 'etl_output.csv'

I'm not sure why initialization is timing out.  It makes perfect sense why there is no .csv file afterwards, as the browser is never started to gather stats on.  I'll investigate this locally and see what is awry.
Running locally I get:

qm-pxp01:
                Started Wed, 22 Aug 2012 15:42:24
Running test ts_paint:
                Started Wed, 22 Aug 2012 15:42:24
DEBUG: operating with platform_type : w7_
DEBUG: created profile
Error starting xperf
c:/Program Files/Microsoft Windows Performance Toolkit/xperf: error: NT Kernel L
ogger: The instance name passed was not recognized as valid by a WMI data provid
er. (0x1069).
c:/Program Files/Microsoft Windows Performance Toolkit/xperf: error: talos_ses:
The instance name passed was not recognized as valid by a WMI data provider. (0x
1069).
Error running etlparser
__xperf_data_begin__
Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Python27\Lib\threading.py", line 551, in __bootstrap_inner
    self.run()
  File "C:\Users\jhammel\talos\talos\bcontroller.py", line 112, in run
    fhandle = open(csvname, 'r')
IOError: [Errno 2] No such file or directory: 'etl_output.csv'
Of course I get this even without my patch

DEBUG: operating with platform_type : w7_
DEBUG: created profile
No xperf providers options given
xperf: error: NT Kernel Logger: The instance name passed
alid by a WMI data provider. (0x1069).
xperf: error: talos_ses: The instance name passed was no
 a WMI data provider. (0x1069).
xperf: error: test.etl: The specified path is invalid. (
xperf: error: test.etl: The specified path is invalid. (
in readfile: test.etl.csv
__xperf_data_begin__
filename, readcount, readbytes, writecount, writebytes

__xperf_data_end__
NOISE: Could not find __metrics(.*)__metrics in browser_
NOISE: Raw results:__start_report82|82|83|81|86|111|106|
3|82|106|82|83|106__end_report__startTimestamp1345674758
NOISE: openingTimes=82,83,81,86,111,106,91,106,82,82,104
06
NOISE: avgOpenTime:91
NOISE: minOpenTime:81
NOISE: maxOpenTime:111
NOISE: medOpenTime:83
NOISE: __xulWinOpenTime:83
NOISE: __startBeforeLaunchTimestamp1345674714916__endBef
NOISE: __startAfterTerminationTimestamp1345674758243__en
mp
NOISE: __startBeforeLaunchTimestamp1345675796160__endBef
NOISE: __startAfterTerminationTimestamp1345675799718__en
mp
NOISE: Initialization of new profile failed
NOISE: __start_report82|82|83|81|86|111|106|91|106|82|82
83|106__end_report__startTimestamp1345674758087__endTime
NOISE: openingTimes=82,83,81,86,111,106,91,106,82,82,104
06
NOISE: avgOpenTime:91
NOISE: minOpenTime:81
NOISE: maxOpenTime:111
NOISE: medOpenTime:83
NOISE: __xulWinOpenTime:83
NOISE: __startBeforeLaunchTimestamp1345674714916__endBef
NOISE: __startAfterTerminationTimestamp1345674758243__en
mp
NOISE: __startBeforeLaunchTimestamp1345675796160__endBef
NOISE: __startAfterTerminationTimestamp1345675799718__en
mp
Failed ts_paint:
                Stopped Wed, 22 Aug 2012 15:50:01
FAIL: Busted: ts_paint
FAIL: failed to initialize browser
Traceback (most recent call last):
  File "C:\Users\jhammel\talos\talos\run_tests.py", line
    talos_results.add(mytest.runTest(browser_config, tes
  File "C:\Users\jhammel\talos\talos\ttest.py", line 263
    self.initializeProfile(profile_dir, browser_config)
  File "C:\Users\jhammel\talos\talos\ttest.py", line 138
    raise talosError("failed to initialize browser")
talosError: 'failed to initialize browser'
Traceback (most recent call last):
  File "C:\Users\jhammel\talos\Scripts\talos-script.py",
    load_entry_point('talos==0.0', 'console_scripts', 't
  File "C:\Users\jhammel\talos\talos\run_tests.py", line
    run_tests(parser)
  File "C:\Users\jhammel\talos\talos\run_tests.py", line
    raise e
talos.utils.talosError: 'failed to initialize browser'
We are currently running ts_paint (not tpaint) for xperf and will be switching to tp5n, makes sense to focus on those tests for now.

Make sure you are not requesting Mozilla specific performance counters, look at the original xperf.config file from a month or two ago.
Ah, you are correct, sir!  I was forgetting to specify the xperf.conf file.  With this, I can indeed reproduce the problem: works "fine" without my patch (actually, there are errors, but non-fatal ones that probably different from our production xperf setup), and fails with my patch
Attached patch WIP (obsolete) — Splinter Review
Attachment #652498 - Attachment is obsolete: true
Attached patch fixed, i hopeSplinter Review
I *think* this resolves all of the important issues.  I've also removed posting to autolog, which is a plus anyway.  None of this is perfect -- partially due to bcontroller's, um "idiom", and partially as it is a straight port of what we have before.  For instance, the first time we run bcontroller for the example launch, we don't have xperf stackwalker options.  Too bad. It doesn't fail, just prints out a stupid error message.

In any case, I think this is much better and something worth building on.
Attachment #655045 - Attachment is obsolete: true
Attachment #655146 - Flags: review?(jmaher)
(In reply to Jeff Hammel [:jhammel] from comment #16)
> Created attachment 655146 [details] [diff] [review]
> fixed, i hope
> 
> I *think* this resolves all of the important issues.  I've also removed
> posting to autolog, which is a plus anyway.  None of this is perfect --
> partially due to bcontroller's, um "idiom", and partially as it is a
> straight port of what we have before.  For instance, the first time we run
> bcontroller for the example launch, we don't have xperf stackwalker options.
> Too bad. It doesn't fail, just prints out a stupid error message.
> 
> In any case, I think this is much better and something worth building on.

pushed to try: https://tbpl.mozilla.org/?tree=Try&rev=fc9be2298de0
Comment on attachment 655146 [details] [diff] [review]
fixed, i hope

Review of attachment 655146 [details] [diff] [review]:
-----------------------------------------------------------------

I ran this locally and it produces expected results.  just a little nit.

We will be adjusting this code in a few more iterations, so lets get this in for now and keep iterating on it throughout next week.

::: talos/bcontroller.py
@@ +99,3 @@
>          self.returncode = 1
>  
> +      # run the xtl parser

s/xtl/etl/
Attachment #655146 - Flags: review?(jmaher) → review+
So xperf results are in and they're green: https://tbpl.mozilla.org/php/getParsedLog.php?id=14684471&tree=Try&full=1

For the initial run, we do get:
"""
Error starting xperf: No xperf stackwalk options given
etlparser: in readfile: test.etl.csv
__xperf_data_begin__
"""

However, we don't care about xperf data for this run and ABICT it should affect results.  Further runs do should xperf results in the log
Looking at the xperf log it is all good.  We get real data.
I'm going to go ahead and push.  The probability that this will affect any pending test (mostly linux opt right now) is fairly negligible.  Feel free to back out if it does cause problems
pushed: http://hg.mozilla.org/build/talos/rev/6d79047595a4
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: