Closed Bug 920757 Opened 11 years ago Closed 9 years ago

consume talos.json for talos tests on pandas

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

x86
macOS
task
Not set
normal

Tracking

(firefox25 fixed, firefox26 fixed, firefox27 fixed)

RESOLVED WONTFIX
Tracking Status
firefox25 --- fixed
firefox26 --- fixed
firefox27 --- fixed

People

(Reporter: kmoir, Assigned: kmoir)

References

Details

Attachments

(10 files, 4 obsolete files)

17.22 KB, patch
mozilla
: feedback+
Details | Diff | Splinter Review
20.39 KB, patch
mozilla
: review+
kmoir
: checked-in+
Details | Diff | Splinter Review
1.42 KB, patch
kmoir
: review+
Details | Diff | Splinter Review
1.42 KB, patch
kmoir
: review+
Details | Diff | Splinter Review
724 bytes, patch
jmaher
: review+
Details | Diff | Splinter Review
21.22 KB, patch
kmoir
: checked-in+
Details | Diff | Splinter Review
737 bytes, patch
jmaher
: review+
kmoir
: checked-in+
Details | Diff | Splinter Review
20.57 KB, patch
kmoir
: checked-in+
Details | Diff | Splinter Review
23.90 KB, patch
mozilla
: review+
Details | Diff | Splinter Review
23.95 KB, patch
kmoir
: checked-in-
Details | Diff | Splinter Review
jmaher added the mobile tests to the talos.json in bug 914799, we should consume it for panda tests
Assignee: nobody → kmoir
Depends on: 923171
jmaher:

So I fixed all the dependency issues I mentioned at the Summit but am now seeing this error the talos tests.  Not sure how to fix this, if you have suggestions...

14:21:12     INFO -  Running test tp4m:
14:21:12     INFO -  		Started Tue, 08 Oct 2013 14:21:12
14:21:16     INFO -  Exception in thread Thread-1:
14:21:16    ERROR -  Traceback (most recent call last):
14:21:16     INFO -    File "/tools/python27/lib/python2.7/threading.py", line 551, in __bootstrap_inner
14:21:16     INFO -      self.run()
14:21:16     INFO -    File "/builds/panda-0314/test/build/talos_repo/talos/bcontroller.py", line 67, in run
14:21:16     INFO -      retVal = self.remoteProcess.launchProcess(self.command, outputFile=remoteLog, timeout=self.test_timeout)
14:21:16     INFO -    File "/builds/panda-0314/test/build/talos_repo/talos/ffprocess_remote.py", line 183, in launchProcess
14:21:16     INFO -      if (self.testAgent.fireProcess(cmd, maxWaitTime=waitTime) is None):
14:21:16    ERROR -  TypeError: fireProcess() got an unexpected keyword argument 'maxWaitTime'
14:41:17     INFO -  reconnecting socket
14:41:17     INFO -  pushing directory: /tmp/tmppDHjv2/profile to /mnt/sdcard/tests/profile
14:41:17     INFO -  Failed tp4m:
14:41:17     INFO -  		Stopped Tue, 08 Oct 2013 14:41:17
14:41:17    ERROR -  Traceback (most recent call last):
14:41:17     INFO -    File "/builds/panda-0314/test/build/talos_repo/talos/run_tests.py", line 277, in run_tests
14:41:17     INFO -      talos_results.add(mytest.runTest(browser_config, test))
14:41:17     INFO -    File "/builds/panda-0314/test/build/talos_repo/talos/ttest.py", line 289, in runTest
14:41:17     INFO -      self.initializeProfile(profile_dir, browser_config)
14:41:17     INFO -    File "/builds/panda-0314/test/build/talos_repo/talos/ttest.py", line 113, in initializeProfile
14:41:17     INFO -      if not self._ffsetup.InitializeNewProfile(profile_dir, browser_config):
14:41:17     INFO -    File "/builds/panda-0314/test/build/talos_repo/talos/ffsetup.py", line 293, in InitializeNewProfile
14:41:17 CRITICAL -      raise talosError("initialization timed out")
14:41:17 CRITICAL -  talosError: 'initialization timed out'
Flags: needinfo?(jmaher)
First off, super big bonus points for using my favorite panda board!  If the PI god collaborate with the Talos god then we would always run talos on panda-0314, there must be some truth to it.

This appears to be a versioning issue for mozdevice.  Oh, the fun and games continue.  We require mozdevice 0.19 for Talos, the latest and greatest mozdevice also supports maxWaitTime:
https://github.com/mozilla/mozbase/blob/master/mozdevice/mozdevice/devicemanagerSUT.py#L451

This change was added in January 2013:
https://github.com/mozilla/mozbase/commit/85f24245bc36fa662c1f7303409da2211580191c

If you look at the sut_tools, we have a hardcoded version of mozdevice (a copy in fact) in the tree:
https://hg.mozilla.org/build/tools/file/637af87d76a0/sut_tools/mozdevice/devicemanagerSUT.py#l602

I linked to the fireProcess in build/tools/sut_tools/ which shows the old method.  I believe we added mozdevice to sut_tools in September 2012.  We can either update mozdevice in sut_tools, or fix the path in talos to use the intree version vs the sut_tools version.
Flags: needinfo?(jmaher)
Attached patch bug920757.patchSplinter Review
This patch is working in staging with the exception of tspaint which needs to be added to the talos.json in bug 923171
Attachment #815477 - Flags: review?(aki)
Comment on attachment 815477 [details] [diff] [review]
bug920757.patch

I thought of something else I have to fix, let me address that first before I ask for review.
Attachment #815477 - Flags: review?(aki)
Comment on attachment 815477 [details] [diff] [review]
bug920757.patch

So the issue I didn't address yet is that the talos options are both in the talos.json and the mozharness config file.  Right now they are still consumed from the mozharness config file.  I assume the default behaviour should be to consume the options from the talos.json versus the config file?  This would allow them to vary by branch.
Attachment #815477 - Flags: feedback?(aki)
(In reply to Kim Moir [:kmoir] from comment #5)
> Comment on attachment 815477 [details] [diff] [review]
> bug920757.patch
> 
> So the issue I didn't address yet is that the talos options are both in the
> talos.json and the mozharness config file.  Right now they are still
> consumed from the mozharness config file.  I assume the default behaviour
> should be to consume the options from the talos.json versus the config file?
> This would allow them to vary by branch.

That's correct.
mozharness.mozilla.testing.talos has a --use-talos-json option
http://hg.mozilla.org/build/mozharness/file/7fafe7592d9f/mozharness/mozilla/testing/talos.py#l79
that we use here:
http://hg.mozilla.org/build/mozharness/file/7fafe7592d9f/mozharness/mozilla/testing/talos.py#l227

I'm not sure how useful being able to specify the tests in both ways is, or if we should require a talos.json.
Right this patch consumes list of the suites from the talos.json. Some of the options are still in the mozharness config files which I should remove so it's clear they are consumed from the talos.json only.
Comment on attachment 815477 [details] [diff] [review]
bug920757.patch

I don't understand about the specified_talos_suites -- you're getting passed a tuple or list and you want to hardcode to a string?

I would expect this to iterate over the list/tuple if that's what we're setting for the suites to run... or are we explicitly not allowing a list?  If so, silently discarding all but the first might not be as good as only allowing a single suite to be specified.
Attachment #815477 - Flags: feedback?(aki) → feedback+
Attached patch bug920757-2.patch (obsolete) — Splinter Review
Regarding

>>specified_talos_suites -- you're getting passed a tuple or list and you want to hardcode to a string?

There are only one test suite to run for each of the mobile suites

example
http://hg.mozilla.org/projects/cedar/raw-file/d1ca8add49de/testing/talos/talos.json

So I changed it to check that there is only one element in the list and change it to a string. Otherwise it fails with a key error when I try to parse the json.  Not sure if there is a better way to implement this.
Attachment #817463 - Flags: review?(aki)
Comment on attachment 817463 [details] [diff] [review]
bug920757-2.patch

(In reply to Kim Moir [:kmoir] from comment #9)
> >>specified_talos_suites -- you're getting passed a tuple or list and you want to hardcode to a string?
> 
> There are only one test suite to run for each of the mobile suites

If we never will have multiple, we could change this to a string in the json.
If we could possibly someday have multiple, I'm going to guess we want a for loop instead:

    for s in specified_talos_suites:
        ... run that suite

which would allow us to have multiple specified suites in a single run.

However, it looks like mozharness.mozilla.testing.talos only allows for a single suite per run, so this is consistent, at least :)


* Do the env changes work in this patch?  You're doing some self.query_env() manipulation and then not doing anything with the resulting env; I'm not sure if that will get picked up by the subsequent command(s) or not.

Nit: trailing whitespace on line 391 of this patch.
Attachment #817463 - Flags: review?(aki) → review+
I changed it to pass the list of activetests as a string separated by : as they are in the desktop tests.  (For the case if there were more than one in the list).  I also fixed the whitespace.
Attachment #817463 - Attachment is obsolete: true
Attachment #819065 - Flags: review?(aki)
Attachment #819065 - Flags: review?(aki) → review+
Attachment #819065 - Flags: checked-in+
Merged to production
reverted because talos.json was missing on aurora branch
How do I get talos.json on the aurora branch? Is this something that ateam takes care or the sheriffs?
Flags: needinfo?(jmaher)
it sounds like we need to add the new mobile settings into talos.json on aurora.  We have talos.json available, it is just the original format.  I can take care of this tomorrow.
Flags: needinfo?(jmaher)
thanks Joel!
Attachment #821603 - Flags: review?(kmoir) → review+
jmaher the talos.json is missing the mobile definitions on mozilla-release and mozilla-beta too.
Comment on attachment 821726 [details] [diff] [review]
mozilla-release adjustment to talos.json (1.0)

This works in staging but the data is not being written to datazilla, trying to figure out why.  Probably a mozharness issue on my part.
Attachment #821726 - Flags: review?(kmoir) → review+
Attachment #821960 - Flags: review?(jmaher)
Comment on attachment 821960 [details] [diff] [review]
add to talos options for m-r

Should have mentioned that this patch is in addition to the following patch in which isn't currently in production
https://bugzilla.mozilla.org/attachment.cgi?id=819065&action=edit
The datazilla issue I mentioned in comment #21 was just a staging env setup issue because 'mobile_tinderbox_tree': 'MobileTest' in the staging config and didn't reflect it on a branch basis like it does on the real masters.  So no issue.
Comment on attachment 821960 [details] [diff] [review]
add to talos options for m-r

Review of attachment 821960 [details] [diff] [review]:
-----------------------------------------------------------------

::: configs/android/android_panda_talos_releng.py
@@ +47,4 @@
>          #currently disabled
>          "remote-tsvgx":  [],
>          "remote-tcanvasmark":  [],
>          "remote-tsspider":  [],

note tsspider has been disable for a long long time with no plans to reenable it.  add remote-tspaint to the list also.
Attachment #821960 - Flags: review?(jmaher) → review+
https://hg.mozilla.org/releases/mozilla-release/rev/bc5f089773b9

we have m-c, m-a, m-r;  still waiting on m-b but early next week we will merge m-a->m-b
unbitrotten patch that I tested in staging that includes attachment 819065 [details] [diff] [review] and attachment 821960 [details] [diff] [review]
in production
Attachment #824072 - Flags: checked-in+
Attached patch mozprocess.patchSplinter Review
need mozprocess for changes that were also implemented today by jmaher
Attachment #824220 - Flags: review?(jmaher)
Comment on attachment 824220 [details] [diff] [review]
mozprocess.patch

Review of attachment 824220 [details] [diff] [review]:
-----------------------------------------------------------------

::: scripts/android_panda_talos.py
@@ +94,5 @@
>          'mozpoolclient',
>          'mozinfo',
>          'datazilla',
> +        'PyYAML',
> +        'mozprocess', 

nit: trailing whitespace
Attachment #824220 - Flags: review?(jmaher) → review+
Comment on attachment 824220 [details] [diff] [review]
mozprocess.patch

and fixed whitespace
Attachment #824220 - Flags: checked-in+
Depends on: 932640
patch to revert
Attachment #824711 - Flags: checked-in-
Attachment #824711 - Flags: checked-in- → checked-in+
The debugging behind the reason for the revert is described here

https://bugzilla.mozilla.org/show_bug.cgi?id=932640#c3
Attached patch bug920757-apr28.patch (obsolete) — Splinter Review
updated patch and tested on many branches, error on previous landing did not happen
Attached patch bug920757-apr29.patch (obsolete) — Splinter Review
Attachment #8413797 - Attachment is obsolete: true
Attachment #8414641 - Flags: review?(aki)
Comment on attachment 8414641 [details] [diff] [review]
bug920757-apr29.patch

found a slight problem will investigate before asking for a review
Attachment #8414641 - Flags: review?(aki)
Attachment #8414641 - Attachment is obsolete: true
Attachment #8415862 - Flags: review?(aki)
Comment on attachment 8415862 [details] [diff] [review]
bug920757-may1.patch

I haven't done any testing here, but this looks good.

>     def preflight_talos(self, suite_category, suites):
<snip>
>-        self.run_command(cmd, dirs['abs_talosdatatalos_dir'], env=env, halt_on_failure=True, fatal_exit_code=suites.get('fatal_exit_code', 3))
>+        # move fennec ids to repo so we can invoke the scripts from that dir
>+        self.copyfile(os.path.join(dirs['abs_talosdata_dir'], 'fennec_ids.txt'),
>+                      os.path.join(dirs['abs_talosrepo_dir'], 'fennec_ids.txt'),
>+                      error_level=FATAL,
>+                      )
>+
>+        self.run_command(cmd, dirs['abs_preflight_talos_dir'], env=env, halt_on_failure=True)

We're losing the |fatal_exit_code=suites.get('fatal_exit_code', 3)| in this run-command.
Could you add that back?  It will make any failure in this command an infrastructure error (TBPL purple) rather than red, iirc.
Since these preflight commands are things like screensaver or screen resolution, it shouldn't show up as orange or red, but infra.
Attachment #8415862 - Flags: review?(aki) → review+
Attached patch bug920757-may2.patch (obsolete) — Splinter Review
patch with comment in review fixed
Attachment #8416665 - Attachment is obsolete: true
Attachment #8418039 - Flags: checked-in+
Comment on attachment 8418039 [details] [diff] [review]
bug920757-may6.patch

reverted because of errors like this

https://tbpl.mozilla.org/php/getParsedLog.php?id=39296908&tree=Mozilla-Aurora

this did not happen in staging
Attachment #8418039 - Flags: checked-in+ → checked-in-
closing, pandas are on their way out
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: