Closed Bug 932640 Opened 9 years ago Closed 9 years ago

tpn on Pandas on mozilla-beta broken


(Release Engineering :: General, defect)






(Reporter: philor, Assigned: kmoir)



(Keywords: regression)

I didn't quite believe the first three runs on the tip of beta, because two of the slaves had hardly ever done anything before (2 and 50-odd runs lifetime), so I retriggered a bunch, and retriggered several on the previous push, which had been green. tp4m says that something outside the tree caused us to become unable to run tp on Pandas on beta. Betting people are betting on bug 920757.

mozilla-beta is closed.
I don't understand how this test can run fine on m-c, m-a, project branches etc. with the same releng code and pass, yet time out on m-b.  Is a longer set of websites contacted for m-b, and is that why it's timing out?  It's not failing because it can't parse the json or isn't being invoked properly.  It's failing because it's reaching a timeout, and bug 920757 didn't change anything in that regard.  I'll talk to someone on the ateam about it when they get in for further information.
jmaher and I are debugging this. According to him, we complete the test, get the report at the harness level, and the browser terminates. So something is wrong with closing out the process: we are not uploading results.  He is trying to run the tests locally on a panda with the mozharness script to see if the issue can be reproduced.
I did a lot of testing this morning.  This is what I found.

The m-b talos.json specifies a different revision of talos than the other branches: ca2229a32cb6 vs. 655c5140970c (the revision of talos to clone).

To clarify: in the old way, we unzip the talos zip and run the scripts from there.  In the new way, we clone the specified revision of the talos repo and run the scripts from there.
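The new flow can be sketched roughly as follows. This is an illustrative sketch only: the field names (`talos_repo`, `talos_revision`) and the JSON shape are assumptions based on the discussion (a repo URL plus a revision to clone), not the exact schema mozharness consumes.

```python
import json

# Hypothetical talos.json fragment; field names are assumptions, not the
# exact mozharness schema.  The revision shown is the one mentioned in
# this bug as "the revision of talos to clone".
TALOS_JSON = """
{
  "global": {
    "talos_repo": "https://hg.mozilla.org/build/talos",
    "talos_revision": "655c5140970c"
  }
}
"""

config = json.loads(TALOS_JSON)["global"]

# In the new flow, the harness would clone talos_repo at talos_revision
# and run the talos scripts from that checkout, instead of unzipping a
# pre-built talos zip as the old flow did.
print(config["talos_repo"], config["talos_revision"])
```

The practical consequence is that each branch's in-tree talos.json pins its own talos revision, which is why m-b can end up running different talos code than m-c with the same releng code.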

I ran a staging run where the talos revision matched the one specified in the talos.json at the m-b revision where tpn failed.  The test still timed out.

So I then reverted to the mozharness scripts that do not consume talos.json and ran the tests in staging.  The test ran green on m-b.  At this point, my guess is that there is something in our virtualenv that is causing the problem.  Here is what they look like for comparison.

virtualenv on beta where tpn was green:
20:34:43     INFO - Current package versions:
20:34:43     INFO -   blobuploader == 1.0b
20:34:43     INFO -   mozfile == 0.9
20:34:43     INFO -   distribute == 0.6.24
20:34:43     INFO -   mozsystemmonitor == 0.0
20:34:43     INFO -   wsgiref == 0.1.2
20:34:43     INFO -   mozlog == 1.2
20:34:43     INFO -   mozcrash == 0.9
20:34:43     INFO -   mozpoolclient == 0.1.2
20:34:43     INFO -   requests == 1.2.3
20:34:43     INFO -   docopt == 0.6.1

virtualenv on beta when tpn was red:
12:39:35     INFO - Current package versions:
12:39:35     INFO -   mozfile == 0.9
12:39:35     INFO -   PyYAML == 3.10
12:39:35     INFO -   oauth2 == 1.5.211
12:39:35     INFO -   mozinfo == 0.5
12:39:35     INFO -   blobuploader == 1.0b
12:39:35     INFO -   distribute == 0.6.24
12:39:35     INFO -   mozsystemmonitor == 0.0
12:39:35     INFO -   wsgiref == 0.1.2
12:39:35     INFO -   mozdevice == 0.26
12:39:35     INFO -   mozlog == 1.2
12:39:35     INFO -   mozcrash == 0.9
12:39:35     INFO -   mozpoolclient == 0.1.2
12:39:35     INFO -   datazilla == 1.4
12:39:35     INFO -   requests == 1.2.3
12:39:35     INFO -   docopt == 0.6.1
12:39:35     INFO -   httplib2 == 0.7.4
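Diffing the two "Current package versions" lists programmatically makes the difference obvious: the red run pulls in six extra packages, with no version changes among the shared ones. A quick sketch (the dicts below are just the log output above, transcribed):

```python
# Package lists copied from the green and red log excerpts above.
green = {
    "blobuploader": "1.0b", "mozfile": "0.9", "distribute": "0.6.24",
    "mozsystemmonitor": "0.0", "wsgiref": "0.1.2", "mozlog": "1.2",
    "mozcrash": "0.9", "mozpoolclient": "0.1.2", "requests": "1.2.3",
    "docopt": "0.6.1",
}
red = {
    "mozfile": "0.9", "PyYAML": "3.10", "oauth2": "1.5.211",
    "mozinfo": "0.5", "blobuploader": "1.0b", "distribute": "0.6.24",
    "mozsystemmonitor": "0.0", "wsgiref": "0.1.2", "mozdevice": "0.26",
    "mozlog": "1.2", "mozcrash": "0.9", "mozpoolclient": "0.1.2",
    "datazilla": "1.4", "requests": "1.2.3", "docopt": "0.6.1",
    "httplib2": "0.7.4",
}

only_in_red = sorted(set(red) - set(green))
only_in_green = sorted(set(green) - set(red))
changed = {p: (green[p], red[p]) for p in set(green) & set(red)
           if green[p] != red[p]}

print("only in red:", only_in_red)
# -> only in red: ['PyYAML', 'datazilla', 'httplib2', 'mozdevice',
#                  'mozinfo', 'oauth2']
print("only in green:", only_in_green)   # -> []
print("version changes:", changed)       # -> {}
```

So the suspect set is exactly the additional dependencies the talos.json-driven run installs into the virtualenv.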

I'll revert bug 920757 (again).  It is very frustrating that one test on one branch is causing issues, but that is release engineering life :-)
How often do tpn results on Pandas on beta cause us to back out a patch?

I would guess "never".
tpn on m-b is green again. Reopened :)
Assignee: nobody → kmoir
Closed: 9 years ago
Resolution: --- → FIXED
Component: General Automation → General