Closed Bug 897420 Opened 11 years ago Closed 11 years ago

Get a basic set of metrofx talos tests running in automation

Categories

(Release Engineering :: General, defect)

x86
Windows 8
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jimm, Assigned: jlund)

Attachments

(5 files, 9 obsolete files)

41.34 KB, patch (jmaher: review+)
1.67 KB, patch (armenzg: review+)
1.83 KB, patch (armenzg: review+)
4.84 KB, patch (mozilla: review+)
897 bytes, patch (mozilla: review+)
From bug 773817: for starters, we would like to get the following talos tests running in metrofx:

a11yr, ts_paint, tsvgr, tsvgr_opacity,
tscrollx, tscrollr, tsvgx, dromaeo_css,
dromaeo_dom, kraken, v8_7, tp5n, tp5o
jlund will be starting next Monday and I would like him to pick this up.
Assignee: nobody → jlund
Hey Jim, I will be starting on this. I'm just trying to set priorities for my assigned bugs.

I see this blocks metro-talos. What timeline are we looking at for this bug (897420)?
Well, we would like these up and running before we roll metrofx out on a train. Currently we're targeting fx 26.

https://wiki.mozilla.org/RapidRelease/Calendar
Roger!

Just to let you know I am setting this to the top of my list as of now :)

So bear with me as I go through a familiarization period; then you should start to see progress.
So, just looking at metro unittests (specifically mochitests), we trigger metro mode by passing the Mozharness script the mochitest-metro-chrome suite (--mochitest-suite mochitest-metro-chrome):

'c:/mozilla-build/python27/python' '-u' 'scripts/scripts/desktop_unittest.py' '--cfg' 'unittests/win_unittest.py' '--mochitest-suite' 'mochitest-metro-chrome' '--download-symbols' 'ondemand'

http://mxr.mozilla.org/build/source/mozharness/configs/unittests/win_unittest.py#63

This sets up --metro-immersive, an option handled in mc/source/testing/mochitest/mochitest_options.py: http://mxr.mozilla.org/mozilla-central/source/testing/mochitest/mochitest_options.py#303

which sets up runtests.py to use -firefoxPath with immersiveHelperPath. immersiveHelperPath then tells runtests.py to use ‘metrotestharness.exe’:
http://mxr.mozilla.org/mozilla-central/source/testing/mochitest/runtests.py#541
http://mxr.mozilla.org/mozilla-central/source/testing/mozbase/mozrunner/mozrunner/local.py#224

So, iiuc, we want to do a similar thing for Talos. Unfortunately, ‘metrotestharness.exe’ seems to be packaged in tests.zip, and Talos w/ Mozharness does not use tests.zip. Also, Mozinstall (used in Mozharness for installing FF) dictates what the binary_path (--executablePath) is going to be, while Talos’s run_tests.py does not *yet* support a --metro-immersive flag.

My thoughts are to overwrite the binary_path by decorating TestingMixin.install() and to download tests.zip in Mozharness. This means that only the Mozharness repo would be involved. Even if this is not what we ultimately want, hopefully we can trigger some metrofx talos tests from buildbot/mozharness pretty easily without involving much from other teams. This is all untested, but tomorrow I will try running something like the forthcoming attached diff locally on a win8 machine in our build system.
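A minimal sketch of the decorator idea above, assuming the script object stores the installed browser path in self.binary_path and that metrotestharness.exe sits next to firefox.exe. use_metro_harness and FakeTalosScript are hypothetical names for illustration only; they are not the real mozharness TestingMixin API.

```python
import functools
import os


def use_metro_harness(install_method):
    """Decorate an install() method so binary_path points at the harness.

    Hypothetical sketch: assumes the object keeps the installed browser path
    in self.binary_path and that metrotestharness.exe sits next to
    firefox.exe. The real mozharness TestingMixin may differ.
    """
    @functools.wraps(install_method)
    def wrapper(self, *args, **kwargs):
        result = install_method(self, *args, **kwargs)
        # Remember the real browser; the harness still needs to find it.
        self.firefox_path = self.binary_path
        app_dir = os.path.dirname(self.binary_path)
        self.binary_path = os.path.join(app_dir, "metrotestharness.exe")
        return result
    return wrapper


class FakeTalosScript(object):
    """Stand-in for a mozharness script (hypothetical, for illustration)."""

    def __init__(self, install_dir):
        self.install_dir = install_dir
        self.binary_path = None

    @use_metro_harness
    def install(self):
        # A real install() would unpack the build with mozinstall.
        self.binary_path = os.path.join(self.install_dir, "firefox.exe")


script = FakeTalosScript(os.path.join("build", "application"))
script.install()
print(script.binary_path)
```

The decorator keeps the original install() untouched, so only the Mozharness side changes, which matches the goal of not involving other repos.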

Obviously, let me know if you think I am way off on logic/implementation.
Attached patch bug_897420_08042013.diff (obsolete)
No feedback or review requested; just posting a brainstorming implementation. I will figure out errors tomorrow when something like this is actually run/tested.
(In reply to Jordan Lund (:jlund) from comment #5)
> So, iiuc, we want to do a similar thing for Talos. Unfortunately,
> ‘metrotestharness.exe’ seems to be packaged in the tests.zip and Talos w/
> Mozharness does not use the tests.zip. Also, Mozinstall (used in mozharness
> for installing FF) dictates what the binary_path (--executablePath) is
> going to be while Talos’s run_tests.py does not *yet* support a
> --metro-immersive flag.

Where does mozharness install the browser it tests? Hopefully the same place we use for mochitest runs since the browser is registered there on the win8 test slaves automatically.
can we add metrotestharness.exe to the talos repository?  I assume this is a fairly static program.  While this isn't ideal, we could add it to tooltool or some other place.

We added initial support for immersive mode to talos in bug 897417.
(In reply to Joel Maher (:jmaher) from comment #8)
> can we add metrotestharness.exe to the talos repository?  I assume this is a
> fairly static program.  While this isn't ideal, we could add it to tooltool
> or some other place.
> 
> We added initial support for immersive mode to talos in bug 897417.

If we bring down zip builds for testing, we might be able to include it in those as well.
last week we changed talos to run from mozharness instead of talos.zip.  We check out the source tree and install dependencies into the virtual environment that talos is run from.  We have breakpad binaries in talos for crash stack dumping.
> Where does mozharness install the browser it tests? Hopefully the same place
> we use for mochitest runs since the browser is registered there on the win8
> test slaves automatically.

talos and unittests in mozharness rely on this to install firefox:
http://hg.mozilla.org/build/mozharness/file/0fc10eccc784/mozharness/mozilla/testing/testbase.py#l270

where 'mozinstall' dictates and returns the location of the FF executable. On our slave machines this would usually be something like:
C:\slave\test\build\application\firefox.exe
(In reply to Joel Maher (:jmaher) from comment #8)
> can we add metrotestharness.exe to the talos repository?  I assume this is a
> fairly static program.  While this isn't ideal, we could add it to tooltool
> or some other place.
> 
> We added initial support for immersive mode to talos in bug 897417.

I think that would be better. From what I can tell it's static (I ran it in its own dir/env).

See upcoming comment for recent updates.
Sorry for delay in reply; it was a long weekend and I wanted to make sure things worked on my end.

So I was able to get talos “dromaeojs” (dromaeo_css, dromaeo_dom, kraken:v8_7) to run in metro mode. dromaeo_css barfed near the end (I’ll upload error logs in next comment). 

This was done with just Mozharness locally on a slave machine. I will try to integrate it with buildbot tonight/tomorrow morning (along with the other basic talos tests requested).

The solution is not super pretty. I am doing what I mentioned above: downloading tests.zip just for the static metrotestharness.exe and copying it into the application dir. I had to copy it over because, from Mozharness, I could not tell talos/talos/run_tests.py to run metrotestharness.exe with a --firefoxPath option. I then overwrote the binary_path so that --executablePath points to C:\slave\test\build\application\metrotestharness.exe instead of C:\slave\test\build\application\firefox.exe
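A sketch of the tests.zip step, assuming the archive has already been downloaded. Where metrotestharness.exe lives inside tests.zip is not stated in this bug, so the hypothetical extract_metro_harness() helper searches for the first member with a matching basename and writes it flat into the application dir.

```python
import os
import shutil
import zipfile


def extract_metro_harness(tests_zip, app_dir, exe_name="metrotestharness.exe"):
    """Copy one executable out of tests.zip into the application directory.

    Hypothetical helper: searches for the first archive member whose basename
    is exe_name and writes it directly into app_dir (dropping any folder
    prefix it had inside the zip). Returns the destination path.
    """
    with zipfile.ZipFile(tests_zip) as zf:
        for member in zf.namelist():
            # Zip member names use "/", but tolerate "\" just in case.
            if member.replace("\\", "/").rsplit("/", 1)[-1] == exe_name:
                dest = os.path.join(app_dir, exe_name)
                with zf.open(member) as src, open(dest, "wb") as out:
                    shutil.copyfileobj(src, out)
                return dest
    raise KeyError("%s not found in %s" % (exe_name, tests_zip))
```

Extracting a single member avoids unpacking the whole archive, though the download cost of tests.zip remains.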

Jmaher’s idea sounds better: putting ‘metrotestharness.exe’ in something like the talos_repo.

In addition to that, I was thinking that a nice option would be to mimic the mochitest/runtests.py metro mode that I mentioned in comment 5: have Mozharness’s talos script pass a --metro-immersive option to the talos/talos/run_tests.py script. This would set up ‘metrotestharness.exe’ to be called with something like: --firefoxPath C:\slave\test\build\application\firefox.exe

Having metrotestharness.exe inside the talos_repo would also make it easier as the path to it would be known/controlled by talos/talos/run_tests.py, and it would not have to be copied to the application dir (eg: C:\slave\test\build\application\) just to be run.
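A rough sketch of what the proposed --metro-immersive handling in talos/talos/run_tests.py could look like. build_browser_command() is hypothetical; the -firefoxPath spelling follows comment 5 and should be checked against metrotestharness itself, and defaulting the harness location to the firefox.exe directory is an assumption.

```python
import os


def build_browser_command(firefox_path, browser_args, metro_immersive=False,
                          harness_path=None):
    """Return the argv list talos would launch (hypothetical sketch).

    Without metro mode, firefox.exe is launched directly. In metro mode,
    metrotestharness.exe is launched instead and told where firefox.exe
    lives, so nothing has to be copied into the application dir.
    """
    if not metro_immersive:
        return [firefox_path] + list(browser_args)
    if harness_path is None:
        # Assumption: harness sits next to firefox.exe unless told otherwise.
        harness_path = os.path.join(os.path.dirname(firefox_path),
                                    "metrotestharness.exe")
    # Flag spelling taken from comment 5; verify against metrotestharness.
    return [harness_path, "-firefoxPath", firefox_path] + list(browser_args)
```

If metrotestharness.exe lived in the talos repo, harness_path would simply be a path that run_tests.py already knows.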

If not, I can leave it as it is and I *think* everything should be fine with just Mozharness, once there are no errors :)

Sorry if this is all known to everyone and more verbose than needed; it’s all new to me and I am trying to show my reasoning in case my logic is faulty :)
17:01:49  WARNING - Can't copy c:/mozilla-build/python27/python27.dll to C:\slave\test\build\venv\Scripts\python27.dll: [Errno 2] No such file or directory: 'c:/mozilla-build/python27/python27.dll'!
17:02:29  WARNING - Unable to install optional package psutil==0.7.1.
17:02:30  WARNING - Unable to install optional package mozsystemmonitor==0.0.0.
17:41:29    ERROR -  Traceback (most recent call last):
17:41:29 CRITICAL -      raise talosError(str(e))
17:41:29 CRITICAL -  utils.talosError: "[Errno 13] Permission denied: 'browser_output.txt'"
17:41:34    ERROR -  Traceback (most recent call last):
17:41:34 CRITICAL -      raise utils.talosError(message)
17:41:34 CRITICAL -  talosError: "Could not find beforeLaunchTime in browser output: (tokens: ('__startBeforeLaunchTimestamp', '__endBeforeLaunchTimestamp')) [browser_output.txt]"
17:41:34    ERROR - Return code: 2

So I ran into this while running dromaeojs talos tests in metro mode. Looking into it now.
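The "Could not find beforeLaunchTime" error means talos did not find a value wrapped in the two marker tokens in browser_output.txt. The sketch below shows that kind of token lookup with Python's re module; extract_between() is hypothetical and is not talos's actual parser. The sample line matches the format metrotestharness prints in a later log in this bug.

```python
import re


def extract_between(output, start_token, end_token):
    """Return the text wrapped by start_token/end_token, or None if absent."""
    pattern = re.escape(start_token) + r"(.*?)" + re.escape(end_token)
    match = re.search(pattern, output, re.DOTALL)
    return match.group(1) if match else None


line = ("__startBeforeLaunchTimestamp1376946503204"
        "__endBeforeLaunchTimestamp")
print(extract_between(line, "__startBeforeLaunchTimestamp",
                      "__endBeforeLaunchTimestamp"))  # prints 1376946503204
```

If the harness never forwards the browser's output (or the file cannot be read, as the "Permission denied" error suggests), the lookup returns None and talos reports the missing token.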
(In reply to Jordan Lund (:jlund) from comment #12)
> (In reply to Joel Maher (:jmaher) from comment #8)
> > can we add metrotestharness.exe to the talos repository?  I assume this is a
> > fairly static program.  While this isn't ideal, we could add it to tooltool
> > or some other place.
> > 
> > We added initial support for immersive mode to talos in bug 897417.
> 
> I think that would be better. From what I can tell it's static (I ran it in
> its own dir/env)
> 
> See upcoming comment for recent updates.

I suppose this is ok. If the base functionality of launching the browser works, then you should be able to reuse it. However, this exe is checked into the tree and periodically receives bug fixes from normal mc checkins, which means if we change something critical we will have to file talos bugs to update the copy of the exe in the talos repo. Not an optimal solution imo, but then again I don't see much changing in metrotestharness over time either.
(In reply to Jordan Lund (:jlund) from comment #14)
> 17:01:49  WARNING - Can't copy c:/mozilla-build/python27/python27.dll to
> C:\slave\test\build\venv\Scripts\python27.dll: [Errno 2] No such file or
> directory: 'c:/mozilla-build/python27/python27.dll'!
> 17:02:29  WARNING - Unable to install optional package psutil==0.7.1.
> 17:02:30  WARNING - Unable to install optional package
> mozsystemmonitor==0.0.0.
> 17:41:29    ERROR -  Traceback (most recent call last):
> 17:41:29 CRITICAL -      raise talosError(str(e))
> 17:41:29 CRITICAL -  utils.talosError: "[Errno 13] Permission denied:
> 'browser_output.txt'"
> 17:41:34    ERROR -  Traceback (most recent call last):
> 17:41:34 CRITICAL -      raise utils.talosError(message)
> 17:41:34 CRITICAL -  talosError: "Could not find beforeLaunchTime in browser
> output: (tokens: ('__startBeforeLaunchTimestamp',
> '__endBeforeLaunchTimestamp')) [browser_output.txt]"
> 17:41:34    ERROR - Return code: 2
> 
> So I ran into this while running dromaeojs talos tests in metro mode.
> Looking into it now.

We ran into browser output problems with mochitest, so we fixed up metrotestharness such that it would dump the browser's stdout as its own. So you should be getting browser output from the harness.
> I suppose this is ok. If the base functionality of launching the browser
> works, then you should be able to reuse it. However, this exe is checked into
> the tree and periodically receives bug fixes from normal mc checkins, which
> means if we change something critical we will have to file talos bugs to
> update the copy of the exe in the talos repo. Not an optimal solution imo,
> but then again I don't see much changing in metrotestharness over time
> either.

How about I just leave it in tests.zip? I added tests.zip to talos runs in buildbot. I think there will be a performance hit from downloading/extracting it for just one file on one platform, but would it be better to just get these tests running first and then, if we find a new permanent home for metrotestharness or bug fixes become less frequent, point mozharness to that?
(In reply to Jim Mathies [:jimm] from comment #16)
> (In reply to Jordan Lund (:jlund) from comment #14)
> > 17:01:49  WARNING - Can't copy c:/mozilla-build/python27/python27.dll to
> > C:\slave\test\build\venv\Scripts\python27.dll: [Errno 2] No such file or
> > directory: 'c:/mozilla-build/python27/python27.dll'!
> > 17:02:29  WARNING - Unable to install optional package psutil==0.7.1.
> > 17:02:30  WARNING - Unable to install optional package
> > mozsystemmonitor==0.0.0.
> > 17:41:29    ERROR -  Traceback (most recent call last):
> > 17:41:29 CRITICAL -      raise talosError(str(e))
> > 17:41:29 CRITICAL -  utils.talosError: "[Errno 13] Permission denied:
> > 'browser_output.txt'"
> > 17:41:34    ERROR -  Traceback (most recent call last):
> > 17:41:34 CRITICAL -      raise utils.talosError(message)
> > 17:41:34 CRITICAL -  talosError: "Could not find beforeLaunchTime in browser
> > output: (tokens: ('__startBeforeLaunchTimestamp',
> > '__endBeforeLaunchTimestamp')) [browser_output.txt]"
> > 17:41:34    ERROR - Return code: 2
> > 
> > So I ran into this while running dromaeojs talos tests in metro mode.
> > Looking into it now.
> 
> We ran into browser output problems with mochitest, so we fixed up
> metrotestharness such that it would dump the browser's stdout as its own.
> So you should be getting browser output from the harness.

The gap in time in that excerpt is because I only copied the warnings (and worse) from talos_warnings.txt.

However, I have uploaded the entire logs from a mozharness run that was triggered from within buildbot. Here are the logs:

http://people.mozilla.org/~jlund/noMetrotestOutput.txt

Output does not seem to be coming from metrotestharness.exe. I'll try digging in to find out what I'm doing wrong.
More results from other suites. Some seem to have made more progress...

other-metro: tscroll a11yr and ts_paint
logs: http://people.mozilla.org/~jlund/otherMetro.html
Error: 16:39:32 CRITICAL -  talosError: "Could not find report in browser output: [('tsformat', ('__start_report', '__end_report')), ('tpformat', ('__start_tp_report', '__end_tp_report'))] [browser_output.txt]"

svgr-metro: tsvgr, tsvgr_opacity
Logs: http://people.mozilla.org/~jlund/svgrMetro.txt
error: 17:11:08 CRITICAL -  talosError: "Could not find report in browser output: [('tsformat', ('__start_report', '__end_report')), ('tpformat', ('__start_tp_report', '__end_tp_report'))] [browser_output.txt]"

tp5o-metro: tp5o
http://people.mozilla.org/~jlund/tp5oMetro.txt
error: 18:18:48 CRITICAL -  talosError: 'timeout exceeded'
> other-metro: tscroll a11yr and ts_paint
> logs: http://people.mozilla.org/~jlund/otherMetro.html
> Error: 16:39:32 CRITICAL -  talosError: "Could not find report in browser
> output: [('tsformat', ('__start_report', '__end_report')), ('tpformat',
> ('__start_tp_report', '__end_tp_report'))] [browser_output.txt]"

Valid output here, which is good to see. However there was a crash while running, crash bug b05406.

> svgr-metro: tsvgr, tsvgr_opacity
> Logs: http://people.mozilla.org/~jlund/svgrMetro.txt
> error: 17:11:08 CRITICAL -  talosError: "Could not find report in browser
> output: [('tsformat', ('__start_report', '__end_report')), ('tpformat',
> ('__start_tp_report', '__end_tp_report'))] [browser_output.txt]"


Some sort of error running the test? I can try to reproduce:

ERROR -  pageloader exception: TypeError: window.arguments is undefined

> tp5o-metro: tp5o
> http://people.mozilla.org/~jlund/tp5oMetro.txt
> error: 18:18:48 CRITICAL -  talosError: 'timeout exceeded'

18:18:46     INFO -  Cycle 1(24): loaded http://localhost/page_load_test/tp5n/yelp.com/www.yelp.com/biz/alexanders-steakhouse-cupertino.html (next: http://localhost/page_load_test/tp5n/youku.com/www.youku.com/index.html)
18:18:46     INFO -  RSS: Main: 245301248
18:18:46     INFO -  MetroWidget::GetDPI
18:18:46     INFO -  Cycle 1(25): loaded http://localhost/page_load_test/tp5n/yelp.com/www.yelp.com/biz/a
18:18:46     INFO -  DEBUG : Terminating: metrotestharness, plugin-container, crashreporter, dwwim
18:18:48     INFO -  DEBUG : unknown error during cleanup
18:18:48     INFO -  	Screen width/height:1600/1200
18:18:48     INFO -  	colorDepth:24
18:18:48     INFO -  	Browser inner width/height: 1600/1200
18:18:48     INFO -  browser_name:undefined
18:18:48     INFO -  browser_version:26.0a1
18:18:48     INFO -  buildID:20130806172400
18:18:48     INFO -  Failed tp5o:
18:18:48     INFO -  		Stopped Fri, 09 Aug 2013 18:18:48
18:18:48    ERROR -  Traceback (most recent call last):
18:18:48     INFO -    File "C:\slave\test\build\venv\lib\site-packages\talos\run_tests.py", line 277, in run_tests
18:18:48     INFO -      talos_results.add(mytest.runTest(browser_config, test))
18:18:48     INFO -    File "C:\slave\test\build\venv\lib\site-packages\talos\ttest.py", line 406, in runTest
18:18:48 CRITICAL -      raise talosError("timeout exceeded")
18:18:48 CRITICAL -  talosError: 'timeout exceeded'
18:18:48    ERROR - Return code: 2

Something didn't work at the end there; it looks like the page load tests ran fine.
(In reply to Jim Mathies [:jimm] from comment #20)
> running, crash bug b05406.

bug 805406
Product: mozilla.org → Release Engineering
(In reply to Jim Mathies [:jimm] from comment #20)
> > other-metro: tscroll a11yr and ts_paint
> > logs: http://people.mozilla.org/~jlund/otherMetro.html
> > Error: 16:39:32 CRITICAL -  talosError: "Could not find report in browser
> > output: [('tsformat', ('__start_report', '__end_report')), ('tpformat',
> > ('__start_tp_report', '__end_tp_report'))] [browser_output.txt]"
> 
> Valid output here, which is good to see. However there was a crash while
> running, crash bug b05406.
> 

hmm OK, I'm trying a more recent build (one from today) right now. Is there anything I can do from this end?

> 18:18:48 CRITICAL -      raise talosError("timeout exceeded")
> 18:18:48 CRITICAL -  talosError: 'timeout exceeded'
> 18:18:48    ERROR - Return code: 2
> 
> Something didn't work at the end there, looks like the page load tests ran
> fine.

I extended the timeout for this script to 5400s instead of 3600s. It made it past its previous fail but stopped where svgr-metro did:

ERROR -  pageloader exception: TypeError: window.arguments is undefined

So it seems both of those suites are now failing at that point. I'm not sure if this requires the a-team and a change in talos's run_tests.py, or if this failure stems from bbot/mozharness.
(In reply to Jordan Lund (:jlund) from comment #22)
> (In reply to Jim Mathies [:jimm] from comment #20)
> > > other-metro: tscroll a11yr and ts_paint
> > > logs: http://people.mozilla.org/~jlund/otherMetro.html
> > > Error: 16:39:32 CRITICAL -  talosError: "Could not find report in browser
> > > output: [('tsformat', ('__start_report', '__end_report')), ('tpformat',
> > > ('__start_tp_report', '__end_tp_report'))] [browser_output.txt]"
> > 
> > Valid output here, which is good to see. However there was a crash while
> > running, crash bug b05406.
> > 
> 
> hmm OK, I'm trying a more recent build (one from today) right now. Is there
> anything I can do from this end?

cross your fingers and hope it doesn't crash. :)

> > 18:18:48 CRITICAL -      raise talosError("timeout exceeded")
> > 18:18:48 CRITICAL -  talosError: 'timeout exceeded'
> > 18:18:48    ERROR - Return code: 2
> > 
> > Something didn't work at the end there, looks like the page load tests ran
> > fine.
> 
> I extended the timeout for this script to 5400s instead of 3600s. It made it
> past its previous fail but stopped where svgr-metro did:
> 
> ERROR -  pageloader exception: TypeError: window.arguments is undefined
> 

This we need to track down. It's an exception in some of the changes we landed to get talos working in bug 897417. What test suite is this, and are you running with the noChrome or chrome option?
Depends on: 805406
> This we need to track down. It's an exception in some of the changes we
> landed to get talos working in bug 897417. What test suite is this,

this is for tp5o and svgr (tsvgr, tsvgr_opacity).


> and are you running with the noChrome or chrome option?

tl;dr:
I am not specifying --noChrome, which makes me think that the default is to run these tests with chrome?

long explanation:
iiuc, when these talos tests are run in mozharness, they ignore the options/args set for talos suites in buildbot-configs. Instead, mozharness just looks up the keys given (eg: 'tp5o') in a separate talos.json url. I made my own talos.json for the purpose of testing that includes '*-metro' keys. The values (options) for these are exactly the same as their non-metro equivalents:

http://hg.mozilla.org/users/jlund_mozilla.com/talos-json/raw-file/b374e24e2e6f/talos.json

so for example: 
['tp5o-metro']['tests'] == 'tp5o'  # and
['tp5o-metro']['talos_options'] == [
                "--mozAfterPaint",
                "--responsiveness",
                "--filter",
                "ignore_first:5",
                "--filter",
                "median",
                "--test_timeout",
                "3600"
            ]

results in mozharness eventually running:
17:18:26     INFO - Copy/paste: C:\slave\test\build\venv\Scripts\talos --noisy --debug -v --executablePath C:\slave\test\build\application\firefox\metrotestharness --title T-W864-IX-042 --symbolsPath http://stage.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/cedar-win32/1375835040/firefox-26.0a1.en-US.win32.crashreporter-symbols.zip --activeTests tp5o --results_url http://graphs.mozilla.org/server/collect.cgi --output talos.yml --branchName Cedar --datazilla-url https://datazilla.mozilla.org/talos --mozAfterPaint --responsiveness --filter ignore_first:5 --filter median --test_timeout 3600 --webServer localhost

I could add --noChrome to this talos.json for '*-metro' keys if you think I should?
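A sketch of how a '*-metro' talos.json entry could turn into talos arguments, with an optional --noChrome. It assumes the suites sit under a "suites" key with "tests" and "talos_options" lists; talos_args_for_suite() is hypothetical, and real mozharness adds many more arguments (title, symbols path, results URL, and so on). The colon-joined --activeTests value follows the 'tscrollr:a11yr:ts_paint' style used in this bug.

```python
def talos_args_for_suite(talos_json, suite, executable_path, no_chrome=False):
    """Build a talos argument list from a talos.json-style suite entry.

    Hypothetical sketch: assumes suites live under a "suites" key, each with
    a "tests" list and a "talos_options" list.
    """
    entry = talos_json["suites"][suite]
    args = ["--executablePath", executable_path,
            # talos takes several tests as a colon-separated list.
            "--activeTests", ":".join(entry["tests"])]
    args.extend(entry["talos_options"])
    if no_chrome and "--noChrome" not in args:
        args.append("--noChrome")
    return args


suites = {"suites": {"tp5o-metro": {
    "tests": ["tp5o"],
    "talos_options": ["--mozAfterPaint", "--responsiveness",
                      "--filter", "ignore_first:5", "--filter", "median",
                      "--test_timeout", "3600"]}}}
print(" ".join(talos_args_for_suite(
    suites, "tp5o-metro",
    r"C:\slave\test\build\application\firefox\metrotestharness",
    no_chrome=True)))
```

Adding --noChrome in talos.json itself would have the same effect without touching mozharness.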
(In reply to Jim Mathies [:jimm] from comment #23)
> (In reply to Jordan Lund (:jlund) from comment #22)
> > (In reply to Jim Mathies [:jimm] from comment #20)
> > > > other-metro: tscroll a11yr and ts_paint
> > > > logs: http://people.mozilla.org/~jlund/otherMetro.html
> > > > Error: 16:39:32 CRITICAL -  talosError: "Could not find report in browser
> > > > output: [('tsformat', ('__start_report', '__end_report')), ('tpformat',
> > > > ('__start_tp_report', '__end_tp_report'))] [browser_output.txt]"
> > > 
> > > Valid output here, which is good to see. However there was a crash while
> > > running, crash bug b05406.
> > > 
> > 
> > hmm OK, I'm trying a more recent build (one from today) right now. Is there
> > anything I can do from this end?
> 
> cross your fingers and hope it doesn't crash. :)
> 

I crossed them and all, but the computer god still said no. However, this suite ("tscrollr:a11yr:ts_paint") now fails with the same problem as 'tp5o' and 'tsvgr:tsvgr_opacity'. So all of these suites now hit the bug 897417 related error: ERROR -  pageloader exception: TypeError: window.arguments is undefined

BTW, I am running these tests in 32-bit mode. I believe this is how we do talos win8 tests (non-metro) and mochitest-metro-chrome tests. Should I be using 64-bit?
Looking at pageloader code - 

http://mxr.mozilla.org/build/source/talos/talos/pageloader/components/tp-cmdline.js
http://mxr.mozilla.org/build/source/talos/talos/pageloader/chrome/pageloader.js#83

seems like you are trying to run no chrome, since there's no access to window.arguments if a chrome run is launching, afaict. However if it is no chrome then we load up the test in the base window, so window.arguments should be valid.

We should test this in a local run.
> BTW, I am running these tests in 32-bit mode. I believe this is how we do
> talos win8 tests (non-metro) and mochitest-metro-chrome tests. Should I be
> using 64-bit?

We should be testing with 32-bit builds of the browser.
We did some work on running these locally for devs recently, anyone have a pointer to how to do that?
(In reply to Jim Mathies [:jimm] from comment #28)
> We did some work on running these locally for devs recently, anyone have a
> pointer to how to do that?

If it would help, I can loan you the slave that I am working on. If it were just for running/debugging on a build machine, it shouldn't be a big problem. If you wanted to modify the image, we would just have to document everything so that when we re-image the machine, we could mirror any changes done.

This is only if it is something you want or think would be helpful. Or you could just have me run something you want and post the logs for you. Either way, I am still trying to solve the issue. I am running a suite ATM with '--noChrome' explicitly specified, and it is running longer than its usual ~5 min. Weird (fingers crossed).

Have you tried running locally what you did here: https://bugzilla.mozilla.org/show_bug.cgi?id=773817#c29

after changes were implemented here: https://bugzilla.mozilla.org/show_bug.cgi?id=897417#c13

without specifying '--noChrome' ?
hmm so I think with '--noChrome' I get past the window.args undefined error.

this makes me think that I am never entering this condition block:
http://mxr.mozilla.org/build/source/talos/talos/pageloader/chrome/pageloader.js#104

but I am always taking this 'else' block:
http://mxr.mozilla.org/build/source/talos/talos/pageloader/chrome/pageloader.js#112

and you mentioned in comment 26 that window.arguments will be valid if it is 'no chrome', so maybe that's why it's working when I pass '--noChrome'.

I didn't get too far anyway after 'window.arguments'. Looks like I am still missing output from the 'metrotestharness.exe' when doing dromaeojs tests and hitting something like bug 805406 with 'tscrollr, a11yr, and ts_paint' tests
if you need any help debugging talos locally, or have questions about any of it, I would be happy to help. I am not sure why you wouldn't be hitting the if condition you expect, but maybe there is something else going on.
I'll try to get these running locally again to confirm everything is working. I've been wrapped up in some other work.
So I have some progress. I was able to VNC into a machine and run the svgr suite locally, directly with the talos script and then again by wrapping mozharness around it. The tests went through without failing, although there are still issues (logs linked below):

http://people.mozilla.org/~jlund/svgr-metro-log.txt

issues: 
- the profile this talos suite creates (-profile c:\\\users\\\cltbld~1.t-w\\\appdata\\\local\\\temp\\\tmpdwyequ\\\profile) doesn't seem to exist while the metro browser is running its tests (the user 'cltbld~1.t-w' doesn't even exist on this machine).

- :08     INFO -  TinderboxPrint: TalosResult: {"datazilla": {"tsvgr": {"url": "https://datazilla.mozilla.org/talos/summary/Cedar/c8c9bd74cc40?product=undefined&branch_version=26.0a1"} <- 'product=undefined' is not good. 
*NOTE: iiuc, jmaher thinks we should be able to take whatever appInfo.ID this run generates and associate it with a product (e.g. 'Metro-Firefox' or 'Firefox') via: http://hg.mozilla.org/build/talos/file/tip/talos/getInfo.html


The other interesting thing here is that this suite works if I VNC into the machine and run the mozharness script. However, it fails when I run the same script, with the same args and state, over SSH:
http://people.mozilla.org/~jlund/svgr-metro-log-ssh.txt

I have a feeling it's failing here for a similar reason as it fails when running through buildbot.
Differences between the two runs:


SSH log (error):
14:08:23     INFO -  DEBUG : created profile
14:08:25     INFO -  INFO : Could not find __metrics(.*)__metrics in browser_log: browser_output.txt
14:08:25     INFO -  INFO : Raw results:INFO | metrotestharness.exe | Launching browser...
14:08:25     INFO -  TEST-UNEXPECTED-FAIL | metrotestharness.exe | CoCreateInstance CLSID_ApplicationActivationManager failed.
14:08:25     INFO -  __startBeforeLaunchTimestamp1376946503204__endBeforeLaunchTimestamp
14:08:25     INFO -  __startAfterTerminationTimestamp1376946503290__endAfterTerminationTimestamp
14:08:25     INFO -  INFO : Initialization of new profile failed
14:08:25     INFO -  INFO : INFO | metrotestharness.exe | Launching browser...
14:08:25     INFO -  TEST-UNEXPECTED-FAIL | metrotestharness.exe | CoCreateInstance CLSID_ApplicationActivationManager failed.
14:08:25     INFO -  __startBeforeLaunchTimestamp1376946503204__endBeforeLaunchTimestamp
14:08:25     INFO -  __startAfterTerminationTimestamp1376946503290__endAfterTerminationTimestamp
14:08:25     INFO -  DEBUG : Terminating: metrotestharness, plugin-container, crashreporter, dwwim
14:08:25     INFO -  Failed tsvgr:
14:08:25     INFO -         Stopped Mon, 19 Aug 2013 14:08:25
14:08:25    ERROR -  Traceback (most recent call last):
14:08:25     INFO -    File "C:\slave\test\build\venv\lib\site-packages\talos\run_tests.py", line 277, in run_tests
14:08:25     INFO -      talos_results.add(mytest.runTest(browser_config, test))
14:08:25     INFO -    File "C:\slave\test\build\venv\lib\site-packages\talos\ttest.py", line 289, in runTest
14:08:25     INFO -      self.initializeProfile(profile_dir, browser_config)
14:08:25     INFO -    File "C:\slave\test\build\venv\lib\site-packages\talos\ttest.py", line 114, in initializeProfile
14:08:25 CRITICAL -      raise talosError("failed to initialize browser")
14:08:25 CRITICAL -  talosError: 'failed to initialize browser'
14:08:25    ERROR - Return code: 2

VNC log (success):
12:57:33     INFO -  DEBUG : created profile
12:57:37     INFO -  DEBUG : initialized metrotestharness
12:57:42     INFO -  DEBUG : command line: '"C:\slave\test\build\application\firefox\metrotestharness"  -profile c:\\\users\\\cltbld~1.t-w\\\appdata\\\local\\\temp\\\tmpdwyequ\\\profile -tp file:\C:\slave\test\build\venv\lib\site-packages\talos\page_load_test\svg\svg.manifest -tpchrome -tpnoisy -tpcycles 1 -tppagecycles 25'
12:57:48     INFO -  INFO : INFO | metrotestharness.exe | Launching browser...
12:57:48     INFO -  INFO | metrotestharness.exe | App model id='E4CFE2E6B75AA3A3'
12:57:48     INFO -  INFO | metrotestharness.exe | Harness process id: 1632
12:57:48     INFO -  INFO | metrotestharness.exe | Using bin path: 'C:\slave\test\build\application\firefox\firefox.exe'
12:57:48     INFO -  INFO | metrotestharness.exe | Writing out tests.ini to: 'C:\slave\test\build\application\firefox\tests.ini'
12:57:48     INFO -  INFO | metrotestharness.exe | Browser command line args: 'C:\slave\test\build\application\firefox\firefox.exe -profile c:\\\users\\\cltbld~1.t-w\\\appdata\\\local\\\temp\\\tmpdwyequ\\\profile -tp file:\C:\slave\test\build\venv\lib\site-packages\talos\page_load_test\svg\svg.manifest -tpchrome -tpnoisy -tpcycles 1 -tppagecycles 25'
12:57:48     INFO -  INFO | metrotestharness.exe | Activation succeeded. processid=992
12:57:48     INFO -  INFO | metrotestharness.exe | Waiting on child process...
12:58:09     INFO -  INFO : XRE_MetroCoreApplicationRun: IsMainThread:0 ThreadId:F84
On win8 slaves we register the default browser on login at a specific location - C:\slave\test\build\application\firefox\firefox.exe.

14:08:25     INFO -  TEST-UNEXPECTED-FAIL | metrotestharness.exe | CoCreateInstance CLSID_ApplicationActivationManager failed.

This looks like either the login registration never happened, or the browser wasn't unpacked into the right location?
> issues: 
> - the profile this talos suite creates (-profile
> c:\\\users\\\cltbld~1.t-w\\\appdata\\\local\\\temp\\\tmpdwyequ\\\profile)
> doesn't seem to exist while the metro-browser is running its tests (the user
> 'cltbld~1.t-w' doesn't even exist on this machine).

I don't think this makes much difference, as long as the profile exists and is accessible by whatever account talos is running under. mozharness doesn't have any issues with this when running mochitests, although the talos account login might be different?

> The other interesting thing here is that this suite works if I am VNC'n into
> the machine and running the mozharn script. However, it fails when running
> the same script, args, and state when I am SSH'n?:
> http://people.mozilla.org/~jlund/svgr-metro-log-ssh.txt
> 
> I have a feeling it's failing here for a similar reason as it fails when
> running through buildbot.

Probably login-account related: under SSH, you may not get the default browser registration. Q, can you confirm?
Flags: needinfo?(q)
Correct, SSH would not have the browser registrations loaded, since the registration is applied at interactive login time.
Flags: needinfo?(q)
(In reply to Q from comment #37)
> Correct SSH would not have the browser regs loaded since the registration 
> is applied at interactive login time.

Just to catch up with the recent IRC chat:
The failure to launch the metro browser when running from SSH is the same as when running through Buildbot. So I need to modify something to get Buildbot to launch the browser via metrotestharness the way I can through VNC.

Environment diff of the Buildbot slave state (left side of image) vs. VNC (right side of image) when running the mozharness script: http://imm.io/1fIHd
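The same comparison can also be made in text form. Below is a minimal sketch. It assumes each environment was captured as a plain dict, for example by dumping `dict(os.environ)` to JSON from inside the script in each context. `diff_environments` is a hypothetical helper, not part of mozharness. Keys are upper-cased because Windows environment variable names are case-insensitive.

```python
def diff_environments(left, right):
    """Compare two environment snapshots (e.g. Buildbot vs. a VNC session).

    Returns three dicts: variables only in `left`, variables only in
    `right`, and variables present in both with different values.
    """
    # Windows treats variable names case-insensitively, so normalise them.
    left = {k.upper(): v for k, v in left.items()}
    right = {k.upper(): v for k, v in right.items()}
    only_left = {k: left[k] for k in left.keys() - right.keys()}
    only_right = {k: right[k] for k in right.keys() - left.keys()}
    changed = {
        k: (left[k], right[k])
        for k in left.keys() & right.keys()
        if left[k] != right[k]
    }
    return only_left, only_right, changed
```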
I'm currently getting spam due to bug 773817 tracking fx 25. Do you think we'll be able to sort this out by the next uplift or should we kick this out to the next one?
I have been looking at the browser registration and have had no forward movement yet, due to some other priorities causing delays. Will update with more information later today.
(In reply to Jim Mathies [:jimm] from comment #39)
> I'm currently getting spam due to bug 773817 tracking fx 25. Do you think
> we'll be able to sort this out by the next uplift or should we kick this out
> to the next one?

Still coming up with similar results across all the suites. I was talking with jmaher last week and I think we are really close; it should only take a small tweak to get past this hiccup. With that out of the way, I think we are well on our way to getting this landed.

Q, I know you're busy, but if you have a couple of minutes to spare, could I pick your brain tomorrow?
I have a few ideas. Do you have time for a Vidyo discussion today?
(In reply to Q from comment #42)
> I have a few ideas do you have some for a vidyo discussion today?

That would be perfect! I am available all day, and I'm on #releng and #build.
After discussing things with Q: my solution from comment 13 (copying metrotestharness.exe to the Firefox executable path rather than calling it where it is and passing '-firefoxPath') may be a problem.

I have decided to try to implement the same way that metro mochitests are done (comment 5).

I made a patch in the talos repo after emailing jmaher for the go ahead. 

Jmaher, this is rough: it does not check for Win8, and it passes '--metro-immersive-path' with the metrotestharness path rather than having talos's run_tests figure out where it is.

My solution seems to work; however, the problem still persists. As an aside, I discovered that I have been using the following path for my metrotestharness.exe:

'tests/mozbase/mozrunner/mozrunner/resources/metrotestharness.exe'

Mochitests use this metrotestharness.exe file:

'tests/bin/metrotestharness.exe'

I am doing a build with the 'bin' path right now and progress seems to be being made, although I don't think I am getting all the output. Anyway, progress, and now I have some options to try over the weekend.
Attachment #801049 - Flags: feedback?(jmaher)
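Below is a small sketch of making that path choice explicit. It assumes the tests package is unpacked under a single base directory. `find_metro_harness` is a hypothetical helper. Preferring `tests/bin` only mirrors what mochitests use; it is not an established rule.

```python
import os

# Relative locations seen in this bug. tests/bin is the copy mochitests
# use, so it is tried first (an assumption, not an established rule).
HARNESS_CANDIDATES = (
    os.path.join("tests", "bin", "metrotestharness.exe"),
    os.path.join("tests", "mozbase", "mozrunner", "mozrunner",
                 "resources", "metrotestharness.exe"),
)


def find_metro_harness(base_dir, candidates=HARNESS_CANDIDATES):
    """Return the first existing metrotestharness.exe under base_dir."""
    for rel_path in candidates:
        path = os.path.join(base_dir, rel_path)
        if os.path.isfile(path):
            return path
    raise FileNotFoundError(
        "metrotestharness.exe not found under %s" % base_dir)
```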
So I went through a number of suites in different variations today. It seems that the registration/default browser issue is resolved. I have 6 different suites and their respective logs to report:

suites: tscroll, a11yr, and ts_paint
http://people.mozilla.org/~jlund/otherMetro0908.txt

suites: "dromaeo_css", "dromaeo_dom", "kraken:v8_7"
http://people.mozilla.org/~jlund/dromaeoMetro0908.txt

suites: tscrollx
http://people.mozilla.org/~jlund/rafxMetro0908.txt

suites: tsvgr, tsvgr_opacity
http://people.mozilla.org/~jlund/svgrMetro0908.txt

These first 4 suite groups all failed on the same error, which I was also hitting before (comment 20). Could bug 805406 and/or chrome issues be the root of it?

Here are the lines I think give context for all 4:
14:26:41     INFO -  eWindowType_invisible window requested, this doesn't actually exist!
14:26:41    ERROR -  pageloader exception: TypeError: window.arguments is undefined
14:26:41 CRITICAL -  talosError: "Could not find report in browser output: [('tsformat', ('__start_report', '__end_report')), ('tpformat', ('__start_tp_report', '__end_tp_report'))] [browser_output.txt]"
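The CRITICAL line says that talos looked in browser_output.txt for a report delimited by one of the listed token pairs and found none. Presumably that happens because the pageloader exception aborted the run before any report was written. Below is a minimal sketch of such a token search, using the token names from the log. It is not talos's actual parser, and `extract_report` is a hypothetical name.

```python
# Token pairs as listed in the talosError message above.
REPORT_FORMATS = [
    ("tsformat", ("__start_report", "__end_report")),
    ("tpformat", ("__start_tp_report", "__end_tp_report")),
]


def extract_report(output, formats=REPORT_FORMATS):
    """Return (format_name, report_text) for the first complete token
    pair found in output, or None when no report was written."""
    for name, (start, end) in formats:
        i = output.find(start)
        if i == -1:
            continue
        j = output.find(end, i + len(start))
        if j == -1:
            continue
        return name, output[i + len(start):j]
    return None
```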



suites: tp5o
http://people.mozilla.org/~jlund/tp5oMetro0908.txt

suites: tp5n
http://people.mozilla.org/~jlund/tpnMetro0908.txt

tp5o and tp5n both ran for an hour until they hit a timeout. I thought I had extended it, but I guess they hit a different timeout limit. I don’t think they should be taking an hour, though. There seems to be a 45-minute delay in the logs where the metro browser closes and then reopens; it seems to stall on reopening:

16:20:20     INFO -  Cycle 1(7): loaded http://localhost/page_load_test/tp5n/tudou.com/www.tudou.com/index.html (next: http://localhost/page_load_test/tp5n/uol.com.br/www.uol.com.br/index.html)
16:20:20     INFO -  RSS: Main: 278147072
16:20:20     INFO -  M
17:05:21     INFO -  INFO : INFO | metrotestharness.exe | Launching browser...
17:05:21     INFO -  INFO | metrotestharness.exe | App model id='E4CFE2E6B75AA3A3'
17:05:21     INFO -  INFO | metrotestharness.exe | Harness process id: 1708
We don't need to run tp5n anymore; it is only run on much older branches (probably only ESR by now).

It appears tp5o might have hung? The test continued to run just fine, but the output stopped; then, 45 minutes later, we probably killed it, and what we see is the log repeating itself from the buffer?


Looking at the command lines for tp5o and tscroll, the only difference is that the -rss flag is added for tp5o, yet for some reason tscroll hits the window.arguments error.
Comment on attachment 801049 [details] [diff] [review]
talos repo patch to accommodate mozharness passing an --metro-immersive-path

Review of attachment 801049 [details] [diff] [review]:
-----------------------------------------------------------------

::: talos/run_tests.py
@@ +192,4 @@
>    # set browser_config
>    browser_config=configurator.browser_config()
>  
> +  # if immersive-mode: set up metro browser launch

add a bug number to the comment :)

@@ +196,5 @@
> +  if config.get('immersive_mode_path'):
> +      # TODO assert win 8
> +      # mozharness cuts off the exe but metrotestharness needs it?
> +      appPath = '-firefoxpath %s.exe' % (browser_config['browser_path'],)
> +      browser_config['extra_args'] += appPath

we could make one line here, no need to assign appPath?

@@ +197,5 @@
> +      # TODO assert win 8
> +      # mozharness cuts off the exe but metrotestharness needs it?
> +      appPath = '-firefoxpath %s.exe' % (browser_config['browser_path'],)
> +      browser_config['extra_args'] += appPath
> +      browser_config['browser_path'] = config.get('immersive_mode_path')

I would validate immersive_mode_path is a valid file at the very least.
Attachment #801049 - Flags: feedback?(jmaher) → feedback+
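Below is a sketch of the reviewed snippet with both suggestions applied, written as a standalone function so it can be tested. `configure_metro_launch` is a hypothetical name. It assumes `extra_args` is a string, as the `+=` in the patch implies. It also inserts a separating space, which the patch's `+= appPath` lacks. Paths containing spaces would still need quoting.

```python
import os


def configure_metro_launch(browser_config, immersive_mode_path):
    """Launch Firefox through metrotestharness.exe.

    Hypothetical helper reflecting the review comments: validate the
    harness path and append -firefoxpath in a single statement.
    """
    if not os.path.isfile(immersive_mode_path):
        raise ValueError("not a file: %r" % immersive_mode_path)
    firefox = browser_config["browser_path"]
    if not firefox.lower().endswith(".exe"):
        firefox += ".exe"  # the patch notes mozharness may drop the extension
    # Keep a separating space so existing extra_args are not run together.
    browser_config["extra_args"] = (
        browser_config.get("extra_args", "") + " -firefoxpath " + firefox
    ).strip()
    browser_config["browser_path"] = immersive_mode_path
    return browser_config
```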
the only place we access window.arguments is inside of pageloader.js (http://hg.mozilla.org/build/talos/file/tip/talos/pageloader/chrome/pageloader.js#l114):
    let toplevelwin = Services.wm.getMostRecentWindow("navigator:browser");
    if (isImmersive() && toplevelwin.arguments[0].wrappedJSObject) {
      args = toplevelwin.arguments[0].wrappedJSObject;
      if (!args.useBrowserChrome) {
        // Huh? Should never happen.
        throw new Exception("non-browser chrome test requested but we detected a metro immersive in-tab run?");
      }
      // running in a background tab
      metroTabbedChromeRun = true;
    } else {
      args = window.arguments[0].wrappedJSObject;
    }

Why we don't see this on tp5o is odd to me.  Is it possible that we are not detecting isImmersive()?
(In reply to Joel Maher (:jmaher) from comment #48)
> the only place we access window.arguments is inside of pageloader.js
> (http://hg.mozilla.org/build/talos/file/tip/talos/pageloader/chrome/
> pageloader.js#l114):
>     let toplevelwin = Services.wm.getMostRecentWindow("navigator:browser");
>     if (isImmersive() && toplevelwin.arguments[0].wrappedJSObject) {
>       args = toplevelwin.arguments[0].wrappedJSObject;
>       if (!args.useBrowserChrome) {
>         // Huh? Should never happen.
>         throw new Exception("non-browser chrome test requested but we
> detected a metro immersive in-tab run?");
>       }
>       // running in a background tab
>       metroTabbedChromeRun = true;
>     } else {
>       args = window.arguments[0].wrappedJSObject;
>     }
> 
> Why we don't see this on tp5o is odd to me.  Is it possible that we are not
> detecting isImmersive()?

I ran a number of tests today. I am having a lot of inconsistencies:
sometimes the tests start, sometimes the tests run but there is no log output, and sometimes the tests don't start with windows.args error.

However, one interesting thing: after throwing some dumpLine calls into pageloader.js, it seems that isImmersive() is always true but toplevelwin.arguments[0].wrappedJSObject is sometimes (if not always) undefined. I say "sometimes" because, since adding the dumpLine calls, I haven't had a suite start where the output was also captured in the log. It seems to be random?
    dumpLine("isImmersive: " + isImmersive());
    dumpLine("toplevelwin: " + toplevelwin.arguments[0]);
    dumpLine("toplevelwin wrappedJSObject: " + toplevelwin.arguments[0].wrappedJSObject);
    22:31:14     INFO -  isImmersive: true
    22:31:14     INFO -  toplevelwin: about:start
    22:31:14     INFO -  toplevelwin wrappedJSObject: undefined

I removed tp5n from the list and fixed up the talos patch in accordance with jmaher's feedback. Thanks, Joel, for the help so far.

I also extended the time limit: it was already raised in mozharness, but in talos it was set to 3600s. However, the suites still run past 5400s when they do start. I will debug more tomorrow to try to figure out the hang, or why so much time is needed.
there is no reason to extend the timeout for talos, a talos test should never take more than 40 minutes, most take <15 minutes.

I have no idea why toplevelwin.arguments[0].wrappedJSObject is undefined, does about:start not have a wrappedJSObject?  maybe we shouldn't be on about:start, these are things I don't know.
Depends on what type of test you're trying to run - chrome or nochrome.
Here's where those args get set up in the pageloader command line handler - 

http://hg.mozilla.org/build/talos/file/ca2229a32cb6/talos/pageloader/components/tp-cmdline.js#l82
(In reply to Joel Maher (:jmaher) from comment #50)
> there is no reason to extend the timeout for talos, a talos test should
> never take more than 40 minutes, most take <15 minutes.
> 
> I have no idea why toplevelwin.arguments[0].wrappedJSObject is undefined,
> does about:start not have a wrappedJSObject?  maybe we shouldn't be on
> about:start, these are things I don't know.

When toplevelwin.arguments[0] is about:start, it's a string and there is no wrappedJSObject prop that's been attached to it.

However when the tests do start to run (determined by the computer gods), toplevelwin.arguments[0].wrappedJSObject has a value and I believe it is: http://hg.mozilla.org/build/talos/file/ca2229a32cb6/talos/pageloader/components/tp-cmdline.js#l126
(In reply to Jim Mathies [:jimm] from comment #51)
> Depends on what type of test you're trying to run - chrome or nochrome.

I'm specifying '-tpchrome' which I guess triggers http://hg.mozilla.org/build/talos/file/ca2229a32cb6/talos/pageloader/components/tp-cmdline.js#l141
(In reply to Jim Mathies [:jimm] from comment #52)
> Here's where those args get set up in the pageloader command line handler - 
> 
> http://hg.mozilla.org/build/talos/file/ca2229a32cb6/talos/pageloader/
> components/tp-cmdline.js#l82

I threw some dumps into this file. Whether or not the tests fail to start (window.arguments error), in both cases the condition at this line evaluates to true:
http://hg.mozilla.org/build/talos/file/ca2229a32cb6/talos/pageloader/components/tp-cmdline.js#l121


and it makes it this far (it doesn't catch and return prematurely), because I dump lines immediately after this line:
http://hg.mozilla.org/build/talos/file/ca2229a32cb6/talos/pageloader/components/tp-cmdline.js#l131

If I could at least figure out why sometimes the tests start after the metro browser loads, then I can figure out why they take so long/hang...
could it be we are opening the window too fast:
http://hg.mozilla.org/build/talos/file/ca2229a32cb6/talos/pageloader/components/tp-cmdline.js#l127

maybe by opening it before everything is loaded properly we have a chance of losing the args.
(In reply to Joel Maher (:jmaher) from comment #56)
> could it be we are opening the window too fast:
> http://hg.mozilla.org/build/talos/file/ca2229a32cb6/talos/pageloader/
> components/tp-cmdline.js#l127
> 
> maybe by opening it before everything is loaded properly we have a chance of
> losing the args.

So after a number of tests today, setting a timeout (5s for now) prior to http://hg.mozilla.org/build/talos/file/tip/talos/pageloader/chrome/pageloader.js#l104 seems to consistently start the svgr tests, both in our automation and locally (testing more suites tomorrow). Sounds like there is a timing issue.
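The 5s delay works, but a fixed sleep is fragile on slow or loaded slaves. The real change would live in pageloader.js. Purely to illustrate the pattern, here is a Python sketch that polls until the arguments object is present, with an overall deadline. `wait_for` and `probe` are hypothetical names.

```python
import time


def wait_for(probe, timeout=10.0, interval=0.1,
             clock=time.monotonic, sleep=time.sleep):
    """Call probe() until it returns something other than None.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have elapsed. `clock` and `sleep` are injectable for testing.
    """
    deadline = clock() + timeout
    while True:
        value = probe()
        if value is not None:
            return value
        if clock() >= deadline:
            raise TimeoutError("probe not ready after %.1fs" % timeout)
        sleep(interval)
```

Unlike a fixed delay, this returns as soon as the arguments are ready and fails loudly, rather than racing, when they never appear.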
WRT the hang, there seem to be gaps in the logs plus missing output when run in buildbot and locally. However, the logs are better with a local run, and svgr sometimes even completes!

locally run w/ mozharness: http://people.mozilla.org/~jlund/svgr-metro-0911.txt
buildbot run in automation: http://people.mozilla.org/~jlund/svgr-metro-0911-buildbot.txt
locally run w/ just talos (w/o mozharness): http://people.mozilla.org/~jlund/svgr-metro-0911-talos.txt <- this is browser_output but that just seems to show tsvgr_opacity and not tsvgr since it relaunches the browser in between a svgr suite runthrough. It takes ~45 min to run. Unfortunately w/o timestamps it's hard to tell if there is hanging/missing output

Notice the gaps in time getting larger and larger as the tests run. It is also taking way too long compared to the non-metro svgr equivalent. Finally, the automation log seems to re-emit the initial metrotestharness output. Either it was held in a buffer (as jmaher suggested in comment 46), or we are seeing the metro browser relaunch between tsvgr and tsvgr_opacity.

I am not entirely sure how to debug this at this point. I'll try the svgr test tomorrow with only one suite (tsvgr).

Maybe the missing output is from incorrectly propagating the output from the subprocesses all the way up (metro-firefox -> talos -> mozharn -> buildbot).
(In reply to Jordan Lund (:jlund) from comment #58)
> WRT the hang, there seems to be gaps in the logs + missing output when run
> in buildbot and locally. However the logs are better with a local run and
> svgr even sometimes complete!

Jmaher is looking into this. He has a win8 machine and was able to get Mozharness to run the talos script on Friday. He plans to do some debugging early this week.
I have been running a few tests locally using the raw talos command that mozharness would run (using the venv and everything) and have run tsvgx and tp5o successfully. The raw times are noticeably slower than the win8 numbers on tbpl (note: it could be a machine/network issue, but I find it hard to believe it would be off by so much).

Total runtime for tsvgx: 6:32, tp5o: 17:03.

Those numbers look valid; in general, I am seeing each pageload take 200-1000ms longer (we do 25 pageloads for each page we are testing).

Some pages are loading faster than we see in buildbot, but that is only a few.  

Looking at ts_paint (startup time), we hit about 800ms on Windows 8, but in metro I am seeing 2500ms. I assume metrotestharness.exe is adding some overhead here; not sure if this is concerning at all.

Any thoughts on why we would have slower page load times?
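Here is some back-of-the-envelope arithmetic on the figures above. The per-load slowdown is multiplied by the 25 pageloads per page, and ts_paint is compared as a ratio. `extra_seconds_per_page` is just an illustrative name.

```python
PAGELOADS_PER_PAGE = 25  # talos loads each page 25 times (per this comment)


def extra_seconds_per_page(extra_ms_per_load, loads=PAGELOADS_PER_PAGE):
    """Total added time for one page across all of its pageloads."""
    return extra_ms_per_load * loads / 1000.0


print(extra_seconds_per_page(200))   # 5.0
print(extra_seconds_per_page(1000))  # 25.0
print(2500 / 800)                    # 3.125 (ts_paint: metro vs. desktop)
```

So the observed slowdown alone adds roughly 5 to 25 seconds per tested page, and metro startup is about 3x the desktop ts_paint number.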
(In reply to Joel Maher (:jmaher) from comment #60)
> I have been running a few tests locally using the raw talos command that
> mozharness would do (using the venv and everything) and have ran tsvgx and
> tp5o successfully.  The raw times are noticeably slower than the win8
> numbers on tbpl (note: it could be a machine/network issue, but I find that
> hard to believe it would be off so much).
> 
> Total runtime for tsvgx: 6:32, tp5o: 17:03.
> 
> Those are very valid, in general, I am seeing a range of 200-1000ms longer
> for each pageload (we do 25 pageloads for each page we are testing).
> 
> Some pages are loading faster than we see in buildbot, but that is only a
> few.  
> 
> Looking at ts_paint (startup time), we hit about 800ms on windows 8, but in
> metro I am seeing 2500ms.  I assume metroharness.exe is adding some overhead
> here, not sure if this is concerning at all.
> 
> Any thoughts on why we would have slower page load times?

A few ideas: a different gfx backend (metro uses OMTC and loads async pan/zoom), different rendering surfaces (do we run talos fullscreen for desktop?), bugs in the code, bad front-end design, etc.

Once we have these running we can start looking at improving the numbers.
So some progress has been made on this:

svgx/svgr_opacity: http://people.mozilla.org/~jlund/svgr-metro-0923-mozharness.txt
status: looks successful w/ mozharness, output not hanging/delayed. Total time in line with non metro equivalents

tp5o: http://people.mozilla.org/~jlund/tp5o-metro-0923-mozharness.txt
status: also successful w/ mozharness, output not hanging/delayed. Total time normal

other(tscrollr, a11yr, ts_paint): http://people.mozilla.org/~jlund/other-metro-0923-mozharness.txt
status: failed tscrollr. output seemed fine
error: 14:50:33 CRITICAL -  talosError: "Could not find report in browser output: [('tsformat', ('__start_report', '__end_report')), ('tpformat', ('__start_tp_report', '__end_tp_report'))] [browser_output.txt]"

dromaeojs(dromaeo_css, dromaeo_dom, kraken:v8_7):  http://people.mozilla.org/~jlund/dromaeojs-metro-0923-mozharness.txt
status: failed
error:
16:35:59 CRITICAL -  utils.talosError: "[Errno 13] Permission denied: 'browser_output.txt'"
16:36:00 CRITICAL -  talosError: "Could not find beforeLaunchTime in browser output: (tokens: ('__startBeforeLaunchTimestamp', '__endBeforeLaunchTimestamp')) [browser_output.txt]"


So it seems that the hanging output issue has been fixed (see the solution at the bottom for reference). Some suites are still failing. Also, the timeout I added in talos to fix the window.arguments error seems to need extending, or is not working reliably. I occasionally still run into this error.

output/hang solution: this was due to the way mozharness runs subprocesses. Up until now we were using Popen in the mozharness talos script. With the method 'run_command()', you can pass an 'output_timeout' arg that makes the internal handler use mozprocess.ProcessHandler instead of subprocess.Popen. Using mozprocess seemed to fix it.
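The general idea is to read the child's output line by line as it is produced, with a timeout on silence, rather than letting it sit in a pipe buffer. The stdlib sketch below illustrates that idea. It is not mozharness's run_command() or mozprocess, and `run_streaming` is a hypothetical name. The child's own buffering matters too, which is one reason the mozharness command lines use `python -u`.

```python
import queue
import subprocess
import sys
import threading


def run_streaming(cmd, output_timeout):
    """Run cmd, echoing each output line as soon as it arrives.

    If no output appears for `output_timeout` seconds, the child is
    killed. Returns (returncode, lines, timed_out).
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            universal_newlines=True, bufsize=1)
    lines_q = queue.Queue()

    def reader():
        # A separate thread lets the main loop wait with a timeout.
        for line in proc.stdout:
            lines_q.put(line)
        lines_q.put(None)  # EOF sentinel

    threading.Thread(target=reader, daemon=True).start()
    lines, timed_out = [], False
    while True:
        try:
            line = lines_q.get(timeout=output_timeout)
        except queue.Empty:
            timed_out = True
            proc.kill()
            break
        if line is None:
            break
        lines.append(line.rstrip("\n"))
        sys.stdout.write(line)
        sys.stdout.flush()
    proc.wait()
    return proc.returncode, lines, timed_out
```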
The crash on the first suite "other", is odd.  Please try changing tscrollr to tscrollx, that is the latest version of that test.

The dromaeo crash appears to be related to the browser_output.txt file not being available.  I strongly suspect the browser process is still running and has this file open.  This is something we could add to talos, let me look into this.
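If talos does grow handling for this, one option is to retry the read for a bounded time while the browser process releases the file. Below is a sketch under that assumption; `read_when_released` is hypothetical. On Windows, a file held open by another process can surface as PermissionError, matching the [Errno 13] in the dromaeo log. Other errors, such as a missing file, are not retried.

```python
import time


def read_when_released(path, attempts=10, delay=1.0, sleep=time.sleep):
    """Read a text file, retrying while another process holds it open.

    Only PermissionError is retried; `attempts` should be at least 1.
    """
    for attempt in range(attempts):
        try:
            with open(path) as f:
                return f.read()
        except PermissionError:
            if attempt == attempts - 1:
                raise
            sleep(delay)
```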
(In reply to Joel Maher (:jmaher) from comment #63)
> The crash on the first suite "other", is odd.  Please try changing tscrollr
> to tscrollx, that is the latest version of that test.

ran with just tscrollx today -
log: http://people.mozilla.org/~jlund/other-metro-0924-mozharness.txt
status: still errors out. There is an unknown error ~2 pages up from the bottom of the log:
16:54:05    ERROR -  0x73290000 - 0x732defff  Bcp4DEBUG : unknown error during cleanup:

> The dromaeo crash appears to be related to the browser_output.txt file not
> being available. This is something we could add to talos, let me
> look into this.

OK cool. I rebooted and ran it a few times, still with the same result.
================== Status Update

Joel Maher will be looking into this tomorrow morning

The following suites/args are the latest tests run: 
        "dromaeojs-metro": {
            "tests": ["dromaeo_css", "dromaeo_dom", "kraken", "v8_7"]
        },
        "other-metro": {
            "tests": ["a11yr", "ts_paint"]
        },
        "svgr-metro": {
            "tests": ["tsvgx", "tsvgr_opacity", "tscrollx"]
        },
        "tp5o-metro": {
            "tests": ["tp5o"],
            "pagesets_url": "http://talos-bundles.pvt.build.mozilla.org/zips/tp5n.zip",
            "pagesets_parent_dir_path": "talos/page_load_test/",
            "pagesets_manifest_path": "talos/page_load_test/tp5n/tp5o.manifest",
            "plugins": {
                "32": "http://talos-bundles.pvt.build.mozilla.org/zips/flash32_10_3_183_5.zip",
                "64": "http://talos-bundles.pvt.build.mozilla.org/zips/flash64_11_0_d1_98.zip"
            }
        }

The following are the logs from the above suites ran against latest m-c, talos, and mozharness (plus the addition of my patches):

http://people.mozilla.org/~jlund/dromaeoMetro1015.txt
timeout. here it looks like the browser failed to launch or output wasn't captured, or buffering issue

http://people.mozilla.org/~jlund/otherMetro1015.txt
looks to complete without barfing but many JS errors are printed throughout tests

http://people.mozilla.org/~jlund/svgrMetro1015.txt
runs many tests but eventually errors out

http://people.mozilla.org/~jlund/tp5oMetro1015.txt
timeout. either hung or there is an issue with buffering

http://people.mozilla.org/~jlund/window_args1015.txt
this error occasionally still creeps up. It seems there is still a timing issue

This was run in automation. Since there are regressions (again), I am currently running this locally to see if there is any difference. One thing I reverted from the talos repo is the way we called Popen. Previously I had this:
http://hg.mozilla.org/users/jlund_mozilla.com/build-talos/rev/3ded2a2e76e6
But I did not think it was doing much. Maybe this was helping with buffering issue? I am going to try reverting that patch and seeing if it was having a positive effect.

=== how to replicate errors:

Joel, this can be replicated by cloning the following hg repo with the latest rev:
http://hg.mozilla.org/users/jlund_mozilla.com/mozharness2/

then run the script from within mozharness with the following args:
python -u ./scripts/talos_script.py --suite other-metro --add-option --webServer,localhost --branch-name Cedar --system-bits 32 --cfg talos/windows_config.py --download-symbols ondemand --use-talos-json --no-read-buildbot-config --test-url http://stage.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-win32/1381825649/firefox-27.0a1.en-US.win32.tests.zip --installer-url http://stage.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-win32/1381825649/firefox-27.0a1.en-US.win32.zip

**note: other-metro can be replaced with svgr-metro, etc

This will end up using the following talos.json:
http://people.mozilla.org/~jlund/talos.json

and talos.json will use the latest talos repo here:
http://hg.mozilla.org/users/jlund_mozilla.com/build-talos/

The above steps should be the exact same as before but contain the latest merged production changes and my patches. Hopefully it will be easier to play around with now that I am using all hg and not Git. Let me know if you have any questions! :)
also note:

with tp5o there was also the following error:
"20:19:40     INFO -  WindowsError: [Error 32] The process cannot access the file because it is being used by another process: 'c:\\users\\cltbld~1.t-w\\appdata\\local\\temp\\tmpp7i7j6\\profile\\cert8.db'"

with dromaeo there was the following in the logs:
"ProcessManager UNABLE to use job objects to manage child processes" <- which seemed to fall outside of mozharness logging
Joel Maher has this on his radar. He got hold of a machine this morning and is testing his talos repo patches against metro, which may help with the above issues.
I have been able to run the tests on my local win8 laptop.  A few differences to note:
* we are using my private repo (http://hg.mozilla.org/users/jmaher_mozilla.com/tpain) with changes that are preproduction (actually was landed today and backed out, will try again shortly)
* this is a local laptop, probably has different options set and software installed
* custom talos.json (http://people.mozilla.org/~jmaher/talos.json)

In using the mozharness repo that jlund has created with my above changes (really no changes specifically for metro), these tests are able to run pretty well.  I see about 1/10 tests run into problems.  The specific tests which are not run are:
* canvasmark
* tresize
* tscroll
* tpaint
* tart

I am not sure if all of these tests will be supported for metro mode, I suspect tresize and tpaint would not be.  tart probably wouldn't either.

> * tresize

This shouldn't run, metrofx is a single window browser that can't be resized.

> * canvasmark
> * tscroll
> * tpaint
> * tart

These shouldn't have any problem.
I was not able to get tpaint or tart to run, they both rely on chrome specific features, quite possible we need to tweak something.
I am finding similar results with jmaher's suites + mozharness. However, the suite can hang and time out when I leave a VNC session, let the computer display sleep, or run this through automation (buildbot). If I open a VNC session and focus the metro browser during one of these hangs, the tests start running again.

I am also almost certain there is a timing issue. I am still running into the window.arguments error even with Jmaher's talos patch. 

Aside from those two issues, the actual suites: dromaeo_css, dromaeo_dom, kraken, v8_7, a11yr, ts_paint, tsvgx, tsvgr_opacity, and tp5o run through mozharn successfully with applied patches.
(In reply to Joel Maher (:jmaher) from comment #70)
> I was not able to get tpaint or tart to run, they both rely on chrome
> specific features, quite possible we need to tweak something.

I think we have to rule out tpaint as well. If I remember correctly, this test opens a succession of windows to measure first paint.

I did a search for tart and found this in the build repo - 

http://mxr.mozilla.org/build/source/talos/talos/page_load_test/tart/

Not much there. Add-ons, though, are disabled by default (remember we had to jump through some hoops to get pageloader loaded), so maybe it isn't loading up right.
correct, tpaint does a lot of window opening and measuring the times to do that.  TART uses an addon, that is something to look into.

As for the browser pausing and then the test times out, how can we prevent metro mode from going to sleep?  Any tips or thoughts on what could be collected or done to troubleshoot this?
(In reply to Joel Maher (:jmaher) from comment #73)
> correct, tpaint does a lot of window opening and measuring the times to do
> that.  TART uses an addon, that is something to look into.
> 
> As for the browser pausing and then the test times out, how can we prevent
> metro mode from going to sleep?  Any tips or thoughts on what could be
> collected or done to troubleshoot this?

If the browser is in the foreground it shouldn't suspend assuming our slaves are set up right. Two things could cause this - 

1) the browser is running in the background for some reason (definitely a bug in our setup/launch methodology if it is).
2) the machine is sleeping due to config issues (not likely)
More progress. Joel suggested a power setting issue on the build machines. On inspection, the slaves are set to put the display to sleep after 15 min. I changed this to two hours in case it helped matters. Today, after three re-runs of every suite, I did not have any problems.

I am running them all again for a fourth run through but I think we should be close if not ready for the next stage. I propose we roll out metro talos onto the Cedar branch in production with the current suites that are passing. This may not be ideal but at least we can get the following suites being tested against right away:       

        "dromaeojs-metro": ["dromaeo_css", "dromaeo_dom", "kraken", "v8_7"]
        "other-metro": ["a11yr", "ts_paint"]
        "svgr-metro": ["tsvgx", "tsvgr_opacity", "tscrollx"]
        "tp5o-metro": ["tp5o"]

I will submit patches for review tomorrow. Joel, has that Talos revision you sent me (the tpain one) landed with the subprocess changes? I don't think I will need any talos patches. I am moving metrotestharness.exe in the mozharness script before talos's run_tests.py is called, so there is no need for --metro-immersive-path.
all the process management code is landed in talos and will be rolled out to the trees today.  

I would like to see if canvasmark can be added to the list of tests, otherwise this list looks good for now.
(In reply to Jordan Lund (:jlund) from comment #75)
> More progress. Joel suggested a power setting issue on the build machines.
> After inspection the slaves are set to put the display to sleep after 15
> min. I changed to this two hours just in case this helped matters. Today
> after three re-runs of every suite, I did not have any problems.

This might be the source of the other win8 talos problem we have (bug 859571). What releng bug is this filed under?
Depends on: 929473
This is a patch against talos.json. It points to the latest revision in the hg talos repo and adds metro-specific keys for the suites known to work. This talos.json change needs to land before the mozharness and buildbot-configs patches.
Attachment #820806 - Flags: review?(jmaher)
Attached patch bug_897420_mh_10222013-4.diff (obsolete) — Splinter Review
this is for mozharness to accept metro named suites. If a suite that ends with '-metro' is passed to a talos script, then we grab metrotestharness.exe, move it to the installer path (see def install()), and use that as the binary.
Attachment #785615 - Attachment is obsolete: true
Attachment #820809 - Flags: review?(aki)
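Below is a minimal sketch of the behaviour described in that patch summary. A suite ending in '-metro' gets metrotestharness.exe copied next to firefox.exe, and that copy becomes the binary to launch. `prepare_metro_binary` is a hypothetical name, not the patch's actual code.

```python
import os
import shutil

METRO_SUFFIX = "-metro"


def prepare_metro_binary(suite, harness_src, firefox_exe):
    """Return the binary talos should launch for `suite`.

    For '<name>-metro' suites, copy the harness into the Firefox install
    directory and return the copy; otherwise return firefox_exe as-is.
    """
    if not suite.endswith(METRO_SUFFIX):
        return firefox_exe
    dest = os.path.join(os.path.dirname(firefox_exe),
                        os.path.basename(harness_src))
    shutil.copy2(harness_src, dest)  # keeps timestamps/permissions
    return dest
```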
talos needs tests.zip to grab metrotestharness.exe. This is not ideal. However, until metrotestharness.exe becomes more static or ends up in a new home, we will need to add the testsUrl to every win32 talos sendchange. I tried to find a way to narrow this down to strictly win8 but could not find a correct way to do so from here.

I should also note that I believe this will work, although I have been firing a manual sendchange externally for testing.
Attachment #820837 - Flags: review?(armenzg)
Comment on attachment 820809 [details] [diff] [review]
bug_897420_mh_10222013-4.diff

I'm not thrilled with the .endswith('-metro') being the trigger here... we've done that for '-debug' and really, ideally we would have something else other than special name substrings to trigger different behavior.  Can the suite have another key in talos.json or something?

However, we're already doing it for -debug, so I can't really object too loudly.
Attachment #820809 - Flags: review?(aki) → review+
this adds metro talos keys. They are disabled by default, only valid for win8, and (for now) only enabled in the 'cedar' branch.
Attachment #820840 - Flags: review?(armenzg)
Comment on attachment 820806 [details] [diff] [review]
talos.json diff from mozilla-central

Review of attachment 820806 [details] [diff] [review]:
-----------------------------------------------------------------

r- for changing the talos revision, the other is more a question

::: testing/talos/talos.json
@@ +4,5 @@
>          "path": ""
>      },
>      "global": {
>          "talos_repo": "http://hg.mozilla.org/build/talos",
> +        "talos_revision": "58bd513cc54e"

please don't change these, that changeset was backed out yesterday.

@@ +68,5 @@
> +            "pagesets_manifest_path": "talos/page_load_test/tp5n/tp5o.manifest",
> +            "plugins": {
> +                "32": "http://talos-bundles.pvt.build.mozilla.org/zips/flash32_10_3_183_5.zip",
> +                "64": "http://talos-bundles.pvt.build.mozilla.org/zips/flash64_11_0_d1_98.zip"
> +            }

if nothing is different between tp5o and tp5o-metro, do we need a tp5o-metro?
Attachment #820806 - Flags: review?(jmaher) → review-
Attachment #820837 - Flags: review?(armenzg)
Attachment #820840 - Flags: review?(armenzg)
This metroharness version is known to work with many talos suites. Its source lives in m-c and is still being actively worked on. However, carrying it in the talos repo saves us from downloading and extracting tests.zip for every talos run (talos, unlike unittests, does not require tests.zip from the sendchange). metrotestharness.exe may find a new home in the future, and keeping this copy in sync with the tests.zip binary may require some maintenance, but for efficiency reasons this would help a lot.
Attachment #821128 - Flags: review?(jmaher)
Comment on attachment 821128 [details] [diff] [review]
this adds a metrotestharness.exe version to talos repo.

Review of attachment 821128 [details] [diff] [review]:
-----------------------------------------------------------------

looks great, please land this in the talos repo.
Attachment #821128 - Flags: review?(jmaher) → review+
Regarding the question of why 'tp5o' and 'tp5o-metro' hold the same value:

I can change that if you prefer. Right now the {suite}-metro keys match the '--suite {suite}-metro' arg that buildbot passes to mozharness. This works well for all the suites in talos.json except tp5o (since we are running modified suite sets/options in the other ones).

I could in mozharness use the logic:
   if {this_suite} == 'tp5o-metro': use 'tp5o' in talos.json
   else: use {this_suite} in talos.json

I thought that was messy, and until metro uses the same args/options in talos.json as its non-metro equivalent, that was my argument for keeping a duplicate entry in talos.json. I don't love either approach, so I'll do whatever you think is best.
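For comparison, the fallback alternative could look like the sketch below. It generalizes the tp5o-only special case: an exact '{suite}-metro' entry wins, otherwise the lookup strips '-metro' and reuses the non-metro entry. The function name and the shape of the `suites` dict are assumptions for illustration, not the actual mozharness code.

```python
METRO_SUFFIX = "-metro"


def lookup_suite_config(suites, suite):
    """Hypothetical lookup of a suite's options in the parsed talos.json.

    Prefer an exact entry; for '<name>-metro' suites without their own entry,
    fall back to '<name>' so suites with identical options need no duplicate.
    """
    if suite in suites:
        return suites[suite]
    if suite.endswith(METRO_SUFFIX):
        base = suite[: -len(METRO_SUFFIX)]
        if base in suites:
            return suites[base]
    raise KeyError("no talos.json entry for suite %r" % suite)
```

Generalizing the fallback avoids a hard-coded 'tp5o-metro' check, at the cost of making it less obvious from talos.json alone which options a metro job actually uses.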
Attachment #820809 - Attachment is obsolete: true
Attachment #821198 - Flags: review?(jmaher)
Comment on attachment 821198 [details] [diff] [review]
talos.json change without modifying talos revision

Review of attachment 821198 [details] [diff] [review]:
-----------------------------------------------------------------

this looks better. I am fine with metro-specific ones, but I would hope we continue to work on this until we can get rid of the duplicated builder job names.
Attachment #821198 - Flags: review?(jmaher) → review+
Attachment #820806 - Attachment is obsolete: true
this adds metro keys to the talos suites, plus a 'metro_mode' key that tells mozharness to use metrotestharness.exe as binary_path. See the next attachment for the mozharness patch.
Attachment #821198 - Attachment is obsolete: true
Attachment #822155 - Flags: review?(armenzg)
adds support for metro mode in talos runs. If the suite passed to mozharness via '--suite {suite}' has a 'metro_mode' key in talos.json, mozharness will move metrotestharness.exe to the installer dir and make it the binary.
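A sketch of the key-driven variant, assuming `suite_config` is the suite's dict from talos.json and `metro_harness_path` points at metrotestharness.exe in the talos checkout. The function name is hypothetical. It copies rather than moves the harness, a deliberate deviation from the comment above, so that repeated runs from the same checkout still find the source file.

```python
import os
import shutil


def prepare_binary(suite_config, binary_path, metro_harness_path):
    """Hypothetical helper: return the path talos should launch.

    If the suite's talos.json entry has a truthy 'metro_mode' key, place
    metrotestharness.exe in the installer directory (next to the browser)
    and return its new location; otherwise keep the normal browser binary.
    """
    if not suite_config.get("metro_mode"):
        return binary_path
    if not os.path.exists(metro_harness_path):
        raise RuntimeError("Could not find metrotestharness.exe at %s"
                           % metro_harness_path)
    install_dir = os.path.dirname(os.path.abspath(binary_path))
    dest = os.path.join(install_dir, os.path.basename(metro_harness_path))
    shutil.copy(metro_harness_path, dest)  # copy keeps the talos checkout intact
    return dest
```

Compared with the '-metro' suffix check, the explicit key keeps the trigger in talos.json data instead of in the suite name.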
Attachment #801049 - Attachment is obsolete: true
Attachment #822159 - Flags: review?(armenzg)
we no longer require tests.zip for metrotestharness.exe. Note: this patch should land only after the other 3 (talos.json, talos repo, and mozharness patches) have landed.

This will run two sets of talos builders for win8 in Cedar only. One for metro mode, and one for 'normal' mode.
Attachment #820837 - Attachment is obsolete: true
Attachment #820840 - Attachment is obsolete: true
Attachment #822165 - Flags: review?(armenzg)
Armen, for latest result logs see: http://dev-master01.build.scl1.mozilla.com:8038/builders

It will be any of the most recent cedar builds from talos suites that end with '-metro': other-metro, svgr-metro, dromaeojs-metro, and tp5o-metro. I believe the most recent 'other-metro' run failed, but occasional failures are expected.
Attachment #822165 - Flags: review?(armenzg) → review+
Comment on attachment 822159 [details] [diff] [review]
bug_897420_mh_1024.diff <- mozharness patch for metro talos

Review of attachment 822159 [details] [diff] [review]:
-----------------------------------------------------------------

r=+ once you figure out my question.

::: mozharness/mozilla/testing/talos.py
@@ +525,5 @@
> +            if not os.path.exists(dirs.get('abs_metro_path', '')):
> +                unknown_path = 'None: is "metro_harness_path_frmt" in your cfg?'
> +                self.fatal('Could not determine metrotestharness.exe path.'
> +                           'Trying - ' % (dirs.get('abs_metro_path',
> +                                                   unknown_path)))

I'm not used to this string formatting approach.
Normally I would see 'This is what you wanted: %s' % (my_variable)

Are you missing "%s" from your formatted string?
Attachment #822159 - Flags: review?(armenzg) → review+
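For reference, a corrected form of the quoted message, written as a standalone function (the name `metro_path_error` is hypothetical). Without a `%s` placeholder, `'Trying - ' % (value)` raises `TypeError: not all arguments converted during string formatting` at exactly the moment the error should be reported; the two adjacent literals also lacked a separating space. Adjacent string literals are joined at compile time, so the `%` applies to the combined string.

```python
def metro_path_error(dirs):
    """Build the fatal message for a missing metrotestharness.exe path."""
    unknown_path = 'None: is "metro_harness_path_frmt" in your cfg?'
    return ('Could not determine metrotestharness.exe path. '
            'Trying - %s' % dirs.get('abs_metro_path', unknown_path))
```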
Comment on attachment 822155 [details] [diff] [review]
bug_897420_mc_1024.diff <- talos.json change to support metro

Review of attachment 822155 [details] [diff] [review]:
-----------------------------------------------------------------

As long as you have an r+ from jmaher regarding which tests each job will trigger (which it seems you do), I'm happy with it.
Attachment #822155 - Flags: review?(armenzg) → review+
I'm seeing "ProcessManager NOT managing child processes" and "ProcessManager UNABLE to use job objects to manage child processes" in the logs; is that something to worry about?

Perhaps that is a question for jmaher.
I am not familiar with those ProcessManager errors, I do not believe they are talos specific, but they could be metroharness or mozharness specific.
> I'm not used to this string formatting approach.
> Normally I would see 'This is what you wanted: %s' % (my_variable)
> 
> Are you missing "%s" from your formatted string?

You are not used to seeing it because it only appears when some developer neglects to negative-test all his/her code :|

My bad. Will rectify bug now.
Attachment #822159 - Attachment is obsolete: true
Attachment #822420 - Flags: review?(armenzg)
Attachment #822420 - Flags: review?(armenzg) → review+
In production
Comment on attachment 822420 [details] [diff] [review]
bug_897420_mh_1025.diff <- fixes str format typo from previous review

Backed out to see if it fixes windows talos:
http://hg.mozilla.org/build/mozharness/rev/00bb7c5b5488
Attachment #822420 - Flags: checked-in-
Whiteboard: [leave open]
fixes ProcessHandler sometimes returning a long instead of an int. This should be the same patch as before, with an added int() conversion inside script.py.
Attachment #822420 - Attachment is obsolete: true
Attachment #824155 - Flags: review?(aki)
Attachment #824155 - Flags: review?(aki) → review+
mozharness patch landed on default: http://hg.mozilla.org/build/mozharness/rev/b30dea525915
To land the buildbot patch that automates this on cedar, we depend on talos.json in m-c pointing to a talos repo revision that supports metro talos. Once that m-c patch lands, we are good to go.
which depends on the trees being open :)
this re-enables mozprocess for all talos suites and sets a blanket timeout of 3600s (which is in line with the default within talos).

It also makes the timeout configurable, so we can set it per platform or add a new CLI argument if we want Buildbot to set a different timeout.
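One way such a configurable timeout could be resolved is sketched below. The config keys `talos_timeout` and `talos_timeouts` and the precedence order (CLI, then per-platform, then config-wide, then the 3600 s default) are assumptions for illustration, not the keys the patch actually uses.

```python
DEFAULT_TALOS_TIMEOUT = 3600  # seconds; the blanket value from the comment above


def resolve_timeout(config, platform=None, cli_timeout=None):
    """Hypothetical helper: pick the talos run timeout in seconds.

    Precedence (assumed): explicit CLI value, then a per-platform entry,
    then a config-wide value, then the 3600 s default.
    """
    if cli_timeout is not None:
        return int(cli_timeout)
    per_platform = config.get("talos_timeouts", {})
    if platform in per_platform:
        return int(per_platform[platform])
    return int(config.get("talos_timeout", DEFAULT_TALOS_TIMEOUT))
```

Letting a Buildbot-supplied CLI value win means a single slow platform can be tuned without touching the shared config.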
Attachment #826150 - Flags: review?(aki)
Attachment #826150 - Flags: review?(aki) → review+
Patches in place. This should make its way through automation by tomorrow.
in production
Blocks: 935652
Why don't we resolve this as fixed and file the remaining todo items as new bugs? AFAICT the two remaining issues are getting talos to support the retry test directive and enabling these tests on the mc/aurora trees, with follow-ups on enabling them on beta/release as we roll out with 28.
These are running, and a retrigger will usually yield success. On the cedar branch (https://tbpl.mozilla.org/?tree=Cedar) you can see that we run 4 suites and usually have 1 failure per suite (usually a failure to initialize the browser, possibly due to installation issues). I have done some retriggers, and they yield a success rate better than 75%.

I am fine with closing this. jlund?
Sorry, I was on PTO last week for graduation. I should have said so in this bug as well as by email. I'll close this and open follow-up bugs so we can hammer those out too. :)
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Whiteboard: [leave open]
Component: General Automation → General