make xtalos have an api and use it

RESOLVED FIXED

Status

Testing
Talos
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: Jeff Hammel, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 2 obsolete attachments)

(Reporter)

Description

5 years ago
We invoke talos from the CLI:

http://hg.mozilla.org/build/talos/file/aca06caab3ee/talos/bcontroller.py#l110

Instead, we should refactor xtalos such that we can just import it and
call it like real programmers.  The reason it is currently a CLI is so
developers can run it independently.  However, there is no reason
that we can't have an API that is called by a (minimal) CLI.

instead of shelling out, we can do, e.g.

import xtalos.start_xperf

start_xperf.collect(args)

See also bug 779565
(Reporter)

Comment 1

5 years ago
Looking at xtalos, I see a few things that I assume are just bugs:

- we use '+'.join() on several things, e.g. http://hg.mozilla.org/build/talos/file/e3ab367914e6/talos/xtalos/start_xperf.py#l67 ; however, in fact these are only strings: http://hg.mozilla.org/build/talos/file/e3ab367914e6/talos/xtalos/xtalos.py#l84  so
'+'.join('FILE_IO') in fact => "F+I+L+E+_+I+O"; I assume we want lists here?

- http://hg.mozilla.org/build/talos/file/e3ab367914e6/talos/xtalos/etlparser.py#l239 :

   239       if img == procName:
   240         gThreads[tid] = "main"
   241       else:
   242         "non-main"

I assume this should be gThreads[tid] = 'non-main' ?
yeah, that is a good find.
(Reporter)

Comment 3

5 years ago
This is also very cute:

http://hg.mozilla.org/build/talos/file/e3ab367914e6/talos/bcontroller.py#l116

If starting xperf fails, we set the return code to failure.  Then, regardless of that, we launch the browser and get its return code.  Then, if stopping xperf fails we set it to fail.

I'm not sure what we actually want to do here (maybe worth ticketing of its own right, but for this patch I'll just keep this logic even though it is wrong
we don't need to have backwards compatibility, lets just do it right.
(Reporter)

Comment 5

5 years ago
Sure, I'm just not sure what "right" is here ;)  What should the returncode be in this case?
(Reporter)

Comment 6

5 years ago
Created attachment 652498 [details] [diff] [review]
WIP

this is a rough draft.  I have not tested or proofread it yet, but wanted to get something up to show progress.

I was tempted to use the configuration module.  However, it lies in talos directory, so when running uninstalled (as buildbot does), you won't have (good) access to the configuration file. (I.e. when installed you can do `from talos import configuration`; when not installed, you can't).  While this isn't insurmountable, I decided the cost isn't worth it for the time being.  xtalos also has the inverse sense of what configuration does: given command line options, it overrides them from the configuration file(!) Again, not insurmountable (and probably not a good idea, TBH), but I decided it wasn't worth futzing with yet.
Attachment #652498 - Flags: feedback?(jmaher)
Comment on attachment 652498 [details] [diff] [review]
WIP

Review of attachment 652498 [details] [diff] [review]:
-----------------------------------------------------------------

a lot of changes, but it was spaces for the most part :)

::: talos/bcontroller.py
@@ +99,3 @@
>          self.returncode = 1
>  
> +      # run the xtl parser

etl parser

::: talos/xtalos/etlparser.py
@@ +96,2 @@
>  
>  

please remove the autolog stuff, it is not needed anymore.

@@ +309,3 @@
>      if alog:
> +        # post to autolog
> +        alog.post()

we can remove the alog stuff
Attachment #652498 - Flags: feedback?(jmaher) → feedback+

Comment 8

5 years ago
Try run for 1cab6db86c5d is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=1cab6db86c5d
Results (out of 2 total builds):
    success: 1
    failure: 1
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jhammel@mozilla.com-1cab6db86c5d

Comment 9

5 years ago
Try run for 1cab6db86c5d is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=1cab6db86c5d
Results (out of 3 total builds):
    success: 1
    failure: 2
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jhammel@mozilla.com-1cab6db86c5d
(Reporter)

Comment 10

5 years ago
(In reply to Mozilla RelEng Bot from comment #9)
> Try run for 1cab6db86c5d is complete.
> Detailed breakdown of the results available here:
>     https://tbpl.mozilla.org/?tree=Try&rev=1cab6db86c5d
> Results (out of 3 total builds):
>     success: 1
>     failure: 2
> Builds (or logs if builds failed) available at:
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jhammel@mozilla.
> com-1cab6db86c5d

https://tbpl.mozilla.org/php/getParsedLog.php?id=14546083&tree=Try&full=1

FAIL: Busted: ts_paint
FAIL: initialization timed out
Traceback (most recent call last):
  File "run_tests.py", line 250, in run_tests
    talos_results.add(mytest.runTest(browser_config, test))
  File "c:\talos-slave\talos-data\talos\ttest.py", line 263, in runTest
    self.initializeProfile(profile_dir, browser_config)
  File "c:\talos-slave\talos-data\talos\ttest.py", line 137, in initializeProfile
    if not self._ffsetup.InitializeNewProfile(profile_dir, browser_config):
  File "c:\talos-slave\talos-data\talos\ffsetup.py", line 292, in InitializeNewProfile
    raise talosError("initialization timed out")
talosError: 'initialization timed out'
Traceback (most recent call last):
  File "run_tests.py", line 298, in ?
    main()
  File "run_tests.py", line 295, in main
    run_tests(parser)
  File "run_tests.py", line 259, in run_tests
    raise e
utils.talosError: 'initialization timed out'
Error starting xperf
Error running etlparser
__xperf_data_begin__
Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Python24\lib\threading.py", line 442, in __bootstrap
    self.run()
  File "c:\talos-slave\talos-data\talos\bcontroller.py", line 112, in run
    fhandle = open(csvname, 'r')
IOError: [Errno 2] No such file or directory: 'etl_output.csv'

I'm not sure why initialization is timing out.  It makes perfect sense why there is no .csv file afterwards, as the browser is never started to gather stats on.  I'll investigate this locally and see what is awry.
(Reporter)

Comment 11

5 years ago
Running locally I get:

qm-pxp01:
                Started Wed, 22 Aug 2012 15:42:24
Running test ts_paint:
                Started Wed, 22 Aug 2012 15:42:24
DEBUG: operating with platform_type : w7_
DEBUG: created profile
Error starting xperf
c:/Program Files/Microsoft Windows Performance Toolkit/xperf: error: NT Kernel L
ogger: The instance name passed was not recognized as valid by a WMI data provid
er. (0x1069).
c:/Program Files/Microsoft Windows Performance Toolkit/xperf: error: talos_ses:
The instance name passed was not recognized as valid by a WMI data provider. (0x
1069).
Error running etlparser
__xperf_data_begin__
Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Python27\Lib\threading.py", line 551, in __bootstrap_inner
    self.run()
  File "C:\Users\jhammel\talos\talos\bcontroller.py", line 112, in run
    fhandle = open(csvname, 'r')
IOError: [Errno 2] No such file or directory: 'etl_output.csv'
(Reporter)

Comment 12

5 years ago
Of course I get this even without my patch

DEBUG: operating with platform_type : w7_
DEBUG: created profile
No xperf providers options given
xperf: error: NT Kernel Logger: The instance name passed
alid by a WMI data provider. (0x1069).
xperf: error: talos_ses: The instance name passed was no
 a WMI data provider. (0x1069).
xperf: error: test.etl: The specified path is invalid. (
xperf: error: test.etl: The specified path is invalid. (
in readfile: test.etl.csv
__xperf_data_begin__
filename, readcount, readbytes, writecount, writebytes

__xperf_data_end__
NOISE: Could not find __metrics(.*)__metrics in browser_
NOISE: Raw results:__start_report82|82|83|81|86|111|106|
3|82|106|82|83|106__end_report__startTimestamp1345674758
NOISE: openingTimes=82,83,81,86,111,106,91,106,82,82,104
06
NOISE: avgOpenTime:91
NOISE: minOpenTime:81
NOISE: maxOpenTime:111
NOISE: medOpenTime:83
NOISE: __xulWinOpenTime:83
NOISE: __startBeforeLaunchTimestamp1345674714916__endBef
NOISE: __startAfterTerminationTimestamp1345674758243__en
mp
NOISE: __startBeforeLaunchTimestamp1345675796160__endBef
NOISE: __startAfterTerminationTimestamp1345675799718__en
mp
NOISE: Initialization of new profile failed
NOISE: __start_report82|82|83|81|86|111|106|91|106|82|82
83|106__end_report__startTimestamp1345674758087__endTime
NOISE: openingTimes=82,83,81,86,111,106,91,106,82,82,104
06
NOISE: avgOpenTime:91
NOISE: minOpenTime:81
NOISE: maxOpenTime:111
NOISE: medOpenTime:83
NOISE: __xulWinOpenTime:83
NOISE: __startBeforeLaunchTimestamp1345674714916__endBef
NOISE: __startAfterTerminationTimestamp1345674758243__en
mp
NOISE: __startBeforeLaunchTimestamp1345675796160__endBef
NOISE: __startAfterTerminationTimestamp1345675799718__en
mp
Failed ts_paint:
                Stopped Wed, 22 Aug 2012 15:50:01
FAIL: Busted: ts_paint
FAIL: failed to initialize browser
Traceback (most recent call last):
  File "C:\Users\jhammel\talos\talos\run_tests.py", line
    talos_results.add(mytest.runTest(browser_config, tes
  File "C:\Users\jhammel\talos\talos\ttest.py", line 263
    self.initializeProfile(profile_dir, browser_config)
  File "C:\Users\jhammel\talos\talos\ttest.py", line 138
    raise talosError("failed to initialize browser")
talosError: 'failed to initialize browser'
Traceback (most recent call last):
  File "C:\Users\jhammel\talos\Scripts\talos-script.py",
    load_entry_point('talos==0.0', 'console_scripts', 't
  File "C:\Users\jhammel\talos\talos\run_tests.py", line
    run_tests(parser)
  File "C:\Users\jhammel\talos\talos\run_tests.py", line
    raise e
talos.utils.talosError: 'failed to initialize browser'
We are currently running ts_paint (not tpaint) for xperf and will be switching to tp5n, makes sense to focus on those tests for now.

Make sure you are not requesting Mozilla specific performance counters, look at the original xperf.config file from a month or two ago.
(Reporter)

Comment 14

5 years ago
Ah, you are correct, sir!  I was forgetting to specify the xperf.conf file.  With this, I can indeed reproduce the problem: works "fine" without my patch (actually, there are errors, but non-fatal ones that probably different from our production xperf setup), and fails with my patch
(Reporter)

Comment 15

5 years ago
Created attachment 655045 [details] [diff] [review]
WIP
Attachment #652498 - Attachment is obsolete: true
(Reporter)

Comment 16

5 years ago
Created attachment 655146 [details] [diff] [review]
fixed, i hope

I *think* this resolves all of the important issues.  I've also removed posting to autolog, which is a plus anyway.  None of this is perfect -- partially due to bcontroller's, um "idiom", and partially as it is a straight port of what we have before.  For instance, the first time we run bcontroller for the example launch, we don't have xperf stackwalker options.  Too bad. It doesn't fail, just prints out a stupid error message.

In any case, I think this is much better and something worth building on.
Attachment #655045 - Attachment is obsolete: true
Attachment #655146 - Flags: review?(jmaher)
(Reporter)

Comment 17

5 years ago
(In reply to Jeff Hammel [:jhammel] from comment #16)
> Created attachment 655146 [details] [diff] [review]
> fixed, i hope
> 
> I *think* this resolves all of the important issues.  I've also removed
> posting to autolog, which is a plus anyway.  None of this is perfect --
> partially due to bcontroller's, um "idiom", and partially as it is a
> straight port of what we have before.  For instance, the first time we run
> bcontroller for the example launch, we don't have xperf stackwalker options.
> Too bad. It doesn't fail, just prints out a stupid error message.
> 
> In any case, I think this is much better and something worth building on.

pushed to try: https://tbpl.mozilla.org/?tree=Try&rev=fc9be2298de0
Comment on attachment 655146 [details] [diff] [review]
fixed, i hope

Review of attachment 655146 [details] [diff] [review]:
-----------------------------------------------------------------

I ran this locally and it produces expected results.  just a little nit.

We will be adjusting this code in a few more iterations, so lets get this in for now and keep iterating on it throughout next week.

::: talos/bcontroller.py
@@ +99,3 @@
>          self.returncode = 1
>  
> +      # run the xtl parser

s/xtl/etl/
Attachment #655146 - Flags: review?(jmaher) → review+
(Reporter)

Comment 19

5 years ago
Created attachment 655194 [details] [diff] [review]
https://bugzilla.mozilla.org/show_bug.cgi?id=779571#c18
(Reporter)

Comment 20

5 years ago
So xperf results are in and they're green: https://tbpl.mozilla.org/php/getParsedLog.php?id=14684471&tree=Try&full=1

For the initial run, we do get:
"""
Error starting xperf: No xperf stackwalk options given
etlparser: in readfile: test.etl.csv
__xperf_data_begin__
"""

However, we don't care about xperf data for this run and ABICT it should affect results.  Further runs do should xperf results in the log
Looking at the xperf log it is all good.  We get real data.
(Reporter)

Comment 22

5 years ago
I'm going to go ahead and push.  The probability that this will affect any pending test (mostly linux opt right now) is fairly negligible.  Feel free to back out if it does cause problems
(Reporter)

Comment 23

5 years ago
pushed: http://hg.mozilla.org/build/talos/rev/6d79047595a4
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
(Reporter)

Updated

5 years ago
Duplicate of this bug: 779565
(Reporter)

Updated

5 years ago
Duplicate of this bug: 766602
(Reporter)

Updated

5 years ago
Duplicate of this bug: 725709
(Reporter)

Updated

5 years ago
Duplicate of this bug: 712674
You need to log in before you can comment on or make changes to this bug.