Closed Bug 794587 Opened 12 years ago Closed 12 years ago

tpaint/twinopen times out when webroot != talos root

Categories

(Testing :: Talos, defect)

(Hardware: x86, OS: macOS, Type: defect, Priority: Not set, Severity: normal)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mozilla, Unassigned)

We're hitting the 300 second timeout here on osx + windows when we're not using the talos zip.

I'm not entirely sure what the details are, but Joel suspects that the different webroot and talos root are the root cause.

I'm going to disable tpaint on mozharness talos until this is resolved.
I am being dense today, but could we put some steps in here on how to get mozharness locally and run the test?  I assume this doesn't have problems on linux?
(In reply to Joel Maher (:jmaher) from comment #1)
> I am being dense today, but could we put some steps in here on how to get
> mozharness locally and run the test?

Hm.

I think this'll be easiest to test once bug 713017 lands on Cedar.  Then to replicate:

1) edit cedar's testing/talos/talos.json to add tpaint back into the 'other' suite.  It might help to point the talos_url at a temporary location as well, maybe people.m.o.
2) make sure you have that new talos_url populated
3) hg push that talos.json to hg.m.o/projects/cedar
4) this will build that revision, then launch mozharness talos jobs for that build.  The talos 'other' suite should go red.
5) update the talos_url tarball with the fixes you want tested
6) click on the red 'o' on https://tbpl.mozilla.org/?tree=Cedar and hit the '+' sign in the footer (hover text will say "Rebuild").  This will re-run the 'other' suite.
7) repeat 5 and 6 til it goes green
8) get review, land talos changes, file a bug for a new talos tarball to be uploaded
9) revert the testing/talos/talos.json talos_url change
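
For steps 1 and 9, the talos.json edit could be scripted roughly as below. This is only a sketch: the JSON layout (a top-level "talos_url" plus a "suites" mapping with a "tests" list), the existing test name, and the example URLs are assumptions for illustration, not the actual contents of cedar's testing/talos/talos.json.

```python
import json


def add_test_to_suite(config, suite, test, talos_url=None):
    """Return a copy of config with `test` added to `suite`.

    If talos_url is given, the copy also points at that tarball.
    The key names used here are an assumed layout, not the real file's.
    """
    updated = json.loads(json.dumps(config))  # deep copy via a JSON round-trip
    tests = updated["suites"][suite]["tests"]
    if test not in tests:
        tests.append(test)
    if talos_url is not None:
        updated["talos_url"] = talos_url
    return updated


# Hypothetical stand-in for the parsed talos.json.
example = {
    "talos_url": "http://example.invalid/talos-original.tar.gz",
    "suites": {"other": {"tests": ["existing_test"]}},
}

patched = add_test_to_suite(
    example, "other", "tpaint", "http://example.invalid/talos-temporary.tar.gz"
)
print(json.dumps(patched, indent=4, sort_keys=True))
```

Since the original dict is left untouched, step 9 amounts to writing the unmodified talos_url back out while keeping the suite change.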

If you want instructions for how to replicate this locally, that'll probably be a longer set of instructions, including setting apache up locally.  Or getting a loaned test slave.

> I assume this doesn't have problems on
> linux?

We're short on linux test slaves in staging, and we tend to hit more issues on osx and windows anyway.  So I've only tested on snow leopard and xp so far; I'll need to look at the other platforms carefully on Cedar before we roll out.
A loaned slave would be preferred then...I was hoping to run this locally to debug, but a slave might be quicker and easier for all.
Ok, I'll change the passwords on my osx box.
Sorry, normal "how to run locally" instructions would include --develop, which would prevent you from replicating this bug.
Instructions sent via email.
Joel said this was related to bug 784834.
See Also: → 784834
Blocks: 713055
No longer blocks: 713017
14:30:24     INFO -  Running test tpaint:
14:30:24     INFO -  		Started Wed, 17 Oct 2012 14:30:24
14:30:24     INFO -  DEBUG: operating with platform_type : mac_
14:30:24     INFO -  dyld: DYLD_ environment variables being ignored because main executable (/bin/ps) is setuid or setgid
14:30:24     INFO -  dyld: DYLD_ environment variables being ignored because main executable (/bin/ps) is setuid or setgid
14:30:24     INFO -  dyld: DYLD_ environment variables being ignored because main executable (/bin/ps) is setuid or setgid
14:30:24     INFO -  dyld: DYLD_ environment variables being ignored because main executable (/bin/ps) is setuid or setgid
14:30:25     INFO -  dyld: DYLD_ environment variables being ignored because main executable (/bin/ps) is setuid or setgid
14:30:25     INFO -  DEBUG: created profile
14:30:25     INFO -  2012-10-17 14:30:25.909 firefox[1289:707] invalid drawable
14:30:28     INFO -  dyld: DYLD_ environment variables being ignored because main executable (/bin/ps) is setuid or setgid
14:30:28     INFO -  dyld: DYLD_ environment variables being ignored because main executable (/bin/ps) is setuid or setgid
14:30:28     INFO -  dyld: DYLD_ environment variables being ignored because main executable (/bin/ps) is setuid or setgid
14:30:28     INFO -  dyld: DYLD_ environment variables being ignored because main executable (/bin/ps) is setuid or setgid
14:30:28     INFO -  dyld: DYLD_ environment variables being ignored because main executable (/bin/ps) is setuid or setgid
14:30:28     INFO -  	Screen width/height:1600/1200
14:30:28     INFO -  	colorDepth:24
14:30:28     INFO -  	Browser inner width/height: 1024/681
14:30:28     INFO -  browser_name:Firefox
14:30:28     INFO -  browser_version:19.0a1
14:30:28     INFO -  buildID:20121017131514
14:30:28     INFO -  DEBUG: initialized firefox
14:30:33     INFO -  DEBUG: command line: /builds/slave/talos-slave/test/build/application/FirefoxNightly.app/Contents/MacOS/firefox -foreground  -profile /var/folders/kp/d3gspp5n44dgvr_5sd53dvhr00000w/T/tmpcO4CWR/profile startup_test/twinopen/winopen.xul?mozafterpaint=1%26phase1=20
14:30:33     INFO -  2012-10-17 14:30:33.717 firefox[1297:707] invalid drawable
14:30:49     INFO -  2012-10-17 14:30:49.009 firefox[1297:b50b] Persistent UI failed to open file file://localhost/Users/cltbld/Library/Saved%20Application%20State/org.mozilla.nightly.savedState/window_1.data: No such file or directory (2)
14:35:38     INFO -  dyld: DYLD_ environment variables being ignored because main executable (/bin/ps) is setuid or setgid
14:35:38     INFO -  dyld: DYLD_ environment variables being ignored because main executable (/bin/ps) is setuid or setgid
14:35:38     INFO -  sh: line 1:  1297 Terminated: 15          /builds/slave/talos-slave/test/build/application/FirefoxNightly.app/Contents/MacOS/firefox -foreground -profile /var/folders/kp/d3gspp5n44dgvr_5sd53dvhr00000w/T/tmpcO4CWR/profile startup_test/twinopen/winopen.xul?mozafterpaint=1%26phase1=20 > browser_output.txt
14:35:43     INFO -  dyld: DYLD_ environment variables being ignored because main executable (/bin/ps) is setuid or setgid
14:35:43     INFO -  dyld: DYLD_ environment variables being ignored because main executable (/bin/ps) is setuid or setgid
14:35:43     INFO -  dyld: DYLD_ environment variables being ignored because main executable (/bin/ps) is setuid or setgid
14:35:43     INFO -  dyld: DYLD_ environment variables being ignored because main executable (/bin/ps) is setuid or setgid
14:35:43     INFO -  dyld: DYLD_ environment variables being ignored because main executable (/bin/ps) is setuid or setgid
14:35:43     INFO -  dyld: DYLD_ environment variables being ignored because main executable (/bin/ps) is setuid or setgid
14:35:43     INFO -  dyld: DYLD_ environment variables being ignored because main executable (/bin/ps) is setuid or setgid
14:35:43     INFO -  dyld: DYLD_ environment variables being ignored because main executable (/bin/ps) is setuid or setgid
14:35:43     INFO -  dyld: DYLD_ environment variables being ignored because main executable (/bin/ps) is setuid or setgid
14:35:43     INFO -  dyld: DYLD_ environment variables being ignored because main executable (/bin/ps) is setuid or setgid
14:35:43     INFO -  Failed tpaint:
14:35:43     INFO -  		Stopped Wed, 17 Oct 2012 14:35:43
14:35:43    ERROR -  Traceback (most recent call last):
14:35:43     INFO -    File "/builds/slave/talos-slave/test/build/venv/lib/python2.7/site-packages/talos/run_tests.py", line 250, in run_tests
14:35:43     INFO -      talos_results.add(mytest.runTest(browser_config, test))
14:35:43     INFO -    File "/builds/slave/talos-slave/test/build/venv/lib/python2.7/site-packages/talos/ttest.py", line 366, in runTest
14:35:43 CRITICAL -      raise talosError("timeout exceeded")
14:35:43 CRITICAL -  talosError: 'timeout exceeded'
14:35:43 CRITICAL -  FAIL: Busted: tpaint
14:35:43     INFO -  FAIL: timeout exceeded
14:35:43    ERROR -  Traceback (most recent call last):
14:35:43     INFO -    File "/builds/slave/talos-slave/test/build/venv/bin/talos", line 9, in <module>
14:35:43     INFO -      load_entry_point('talos==0.0', 'console_scripts', 'talos')()
14:35:43     INFO -    File "/builds/slave/talos-slave/test/build/venv/lib/python2.7/site-packages/talos/run_tests.py", line 295, in main
14:35:43     INFO -      run_tests(parser)
14:35:43     INFO -    File "/builds/slave/talos-slave/test/build/venv/lib/python2.7/site-packages/talos/run_tests.py", line 259, in run_tests
14:35:43     INFO -      raise e
14:35:43 CRITICAL -  talos.utils.talosError: 'timeout exceeded'
14:35:43    ERROR - Return code: 1
I tried upgrading talos on cedar to see if this would be fixed, and ended up in dependency hell.

It turns out that since talos doesn't specify versions for a number of dependencies, it'll take the latest, and even if I switch back to the older (working) talos, it'll stay broken.

Fixed via

find . -newer talos-0e.....tar.gz -type f -exec rm {} \;

which removed all the mozbase packages I uploaded after the newer talos.  Back to green.
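
For anyone repeating that cleanup, a dry run that only lists what the find command would match may be safer than deleting straight away. A minimal sketch, assuming the same "newer than a reference file" rule; the file names and timestamps in the demo are made up and it runs in a throwaway temp directory:

```python
import os
import tempfile


def files_newer_than(root, reference):
    """Yield regular files under root modified more recently than reference."""
    cutoff = os.path.getmtime(reference)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, roughly matching find's -type f.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            if os.path.getmtime(path) > cutoff:
                yield path


# Demo with hypothetical file names and fixed modification times.
with tempfile.TemporaryDirectory() as root:
    ref = os.path.join(root, "reference.tar.gz")
    old = os.path.join(root, "older-package.tar.gz")
    new = os.path.join(root, "newer-package.tar.gz")
    for path, mtime in ((old, 1000), (ref, 2000), (new, 3000)):
        open(path, "w").close()
        os.utime(path, (mtime, mtime))
    candidates = sorted(
        os.path.basename(p) for p in files_newer_than(root, ref)
    )
    print(candidates)
```

Once the list looks right, the same paths can be passed to os.remove.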

This means that we won't be able to easily upgrade talos on a single branch.
So if I understand this correctly, the tests work after updating talos?

I wonder if we could tag versions of talos when we branch?  I am sure we can come up with a solution to make updating and different versions of talos/branch work.
(In reply to Joel Maher (:jmaher) from comment #10)
> So if I understand this correctly, the tests work after updating talos?

Nope, everything's perma-red.
I backed out since just having tpaint broken is better than all t* broken.

> I wonder if we could tag versions of talos when we branch?  I am sure we can
> come up with a solution to make updating and different versions of
> talos/branch work.

We'd have to explicitly list the dependencies, otherwise findlinks will grab the latest package of a type (e.g. mozinfo) that's on the server.
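
To illustrate why unpinned dependencies stay broken after a backout, here is a toy model of the selection. It is a simplification and not the installer's actual resolver; the version numbers are invented.

```python
def parse_version(text):
    """Turn a dotted version such as '0.10' into a comparable tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def pick(available, pinned=None):
    """Choose a version from those on the server.

    With no pin, the highest available version wins, so uploading a newer
    package changes what every consumer installs. With an exact pin, only
    that version is acceptable.
    """
    if pinned is not None:
        if pinned not in available:
            raise LookupError("pinned version %s is not available" % pinned)
        return pinned
    return max(available, key=parse_version)


# Hypothetical versions of one dependency sitting on the find-links page.
available = ["0.3", "0.4", "0.10"]

unpinned_choice = pick(available)            # newest upload is picked up
pinned_choice = pick(available, pinned="0.3")  # stays on the known-good one
print(unpinned_choice, pinned_choice)
```

In this model, pointing talos_url back at the old tarball changes nothing for the unpinned case, which matches what was seen: the newer packages had to be removed from the server.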
I'm not sure exactly what is being seen here, but I have two blind guesses:

- if talos is over-lenient with the versions of dependencies it specifies, this should be corrected

- if the dependencies that talos needs aren't available on the find-links page, this should be corrected

In addition, we should probably give talos a real version.  I'm guessing this isn't the case with how mozharness is being run, but there is another type of dependency hell when you have multiple versions of the same package that require different dependencies.
(In reply to Jeff Hammel [:jhammel] from comment #12)
> I'm not sure exactly what is being seen here, but I have two blind guesses:
> 
> - if talos is over-lenient with the versions of dependencies it specifies,
> this should be corrected

Yes

> - if the dependencies that talos needs aren't available on the find-links
> page, this should be corrected

This caused the first wave of red; adding the new dependencies caused the 2nd-nth waves of red.

> In addition, we should probably give talos a real version.  I'm guessing
> this isn't the case with how mozharness is being run, but there is another
> type of dependency hell when you have multiple versions of the same package
> that require different dependencies

I don't particularly care here.


I think the problems are threefold:

* It wasn't readily apparent to me that I had to update the dependencies when updating.  When I did so, there was a second level of dependencies specified by the original dependencies which I had to upload.
** This isn't great, but I think we can live with it.

* After I got all the dependencies up, Talos was still going red because we were crashing, and minidump stackwalk failed to run.  I'm not sure where the problem is here, but it greened up when I backed everything out, so something about the upgraded python modules was causing something to break.

* After I pointed back to the original talos_url, Talos was still going red because findlinks was finding newer mozbase modules that were still broken.  To back out, I had to point back to the original talos_url and remove all new mozbase modules from the findlinks url.
This appears to be fixed with a more recent talos.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED