Closed Bug 572064 Opened 11 years ago Closed 11 years ago

Create new poller for jetpack-sdk repo to target jetpack-sdk unittest suite

Categories

(Release Engineering :: General, defect, P2)

x86
All
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: joduinn, Assigned: lsblakk)

References

Details

(Whiteboard: [q1 goal 2011])

Attachments

(3 files, 13 obsolete files)

3.44 KB, patch
catlee
: review+
lsblakk
: checked-in+
3.42 KB, patch
lsblakk
: checked-in+
21.20 KB, patch
catlee
: review+
lsblakk
: checked-in+
This new project branch is to monitor the existing jetpack repo at http://hg.mozilla.org/labs/jetpack for changes. Whenever a change happens to the jetpack repo, install a recent (nightly? latest tip-of-tree?) Firefox build from mozilla-central, and run the "jetpack-SDK" test suite on it. The jetpack-SDK can be found in http://hg.mozilla.org/labs/jetpack-sdk.

Results should be posted to new tinderbox waterfall and TBPL page. 

Dependent bug to track creating the jetpack-SDK test suite. Note: this is a Q2 goal, so once the test suite in bug#570248 is runnable, we'll need to move quickly. Is there anything we can do while waiting for the new test suite?
Can you explain a bit more why we need a jetpack-central repository? How is it different than jetpack-sdk? What do you mean by "monitor the existing jetpack repo for changes"?
Can this branch try using the disposable branches now set up and live? 

https://wiki.mozilla.org/DisposableProjectBranches
This looks like something we can do with a poller on the jetpack repo - once a change comes in it will do sendchanges to talos masters and target the platform/branches that have the jetpack-sdk test suite enabled.
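A minimal sketch of that fan-out decision, assuming a hypothetical `BRANCHES` layout in which each branch lists its platforms and the test suites enabled on them. The names `BRANCHES` and `targets_for_change` are illustrative only; the real buildbot poller and sendchange plumbing are not shown.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical layout: branch -> platform -> set of enabled test suites.
BRANCHES: Dict[str, Dict[str, Set[str]]] = {
    "mozilla-central": {
        "linux": {"jetpack", "mochitest"},
        "win32": {"jetpack"},
    },
    "mozilla-1.9.2": {
        "linux": {"mochitest"},
    },
}


def targets_for_change(last_seen: Optional[str], tip: str,
                       branches: Dict[str, Dict[str, Set[str]]],
                       suite: str = "jetpack") -> List[Tuple[str, str]]:
    """Return the (branch, platform) pairs to notify when the polled repo moves.

    If the tip has not changed since the last poll, nothing is triggered.
    """
    if tip == last_seen:
        return []
    return sorted(
        (branch, platform)
        for branch, platforms in branches.items()
        for platform, suites in platforms.items()
        if suite in suites
    )


print(targets_for_change("abc123", "def456", BRANCHES))
# [('mozilla-central', 'linux'), ('mozilla-central', 'win32')]
```

Keying the decision on "which branch/platform has the suite enabled" means turning the jetpack suite on for a new platform only requires a config change, not a poller change.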
Summary: Create new project branch "jetpack-central" → Create new poller for jetpack-central to target jetpack-sdk talos suite
Whiteboard: [q2goal]
http://hg.mozilla.org/labs/jetpack-sdk/ is the repo that needs polling.

First priority is to run it against the most recent 1.9.2 (3.6.x) build to confirm that the tip of jetpack plays well with 3.6.x.

Second priority is to run the most recent m-c build against the (stable) 0.5 jetpack-sdk release to make sure that m-c doesn't break stable jetpack.

Third would be the tip of jetpack-sdk against the tip of m-c.
The jetpack poller should pull nightlies (latest-mozilla-central) for the tip-to-tip builds; for the 3.6 vs. jetpack-sdk tip test runs, it should pull from latest-3.6 in the releases dir, not Minefield.
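The three priorities and the build sources above can be written down as a small test matrix. This is a sketch only: the nightly directory is the one named later in this bug, while the `releases/latest-3.6/` path and the `"0.5"` revision label are assumptions, and `JetpackRun`/`MATRIX` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

# The nightly directory is the one named in this bug; the 3.6 path is an
# assumption based on "latest-3.6 in the releases dir".
NIGHTLY_MC = ("http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/"
              "latest-mozilla-central/")
RELEASE_36 = "http://ftp.mozilla.org/pub/mozilla.org/firefox/releases/latest-3.6/"


@dataclass(frozen=True)
class JetpackRun:
    priority: int
    firefox_dir: str   # directory to fetch the Firefox build from
    sdk_revision: str  # jetpack-sdk revision to test ("0.5" is a hypothetical tag name)


MATRIX: List[JetpackRun] = [
    JetpackRun(1, RELEASE_36, "tip"),  # tip of jetpack-sdk vs. latest 3.6.x
    JetpackRun(2, NIGHTLY_MC, "0.5"),  # latest m-c nightly vs. stable 0.5 SDK
    JetpackRun(3, NIGHTLY_MC, "tip"),  # tip of jetpack-sdk vs. tip of m-c
]

for run in sorted(MATRIX, key=lambda r: r.priority):
    print(run.priority, run.sdk_revision, run.firefox_dir)
```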
Depends on: 570251
Summary: Create new poller for jetpack-central to target jetpack-sdk talos suite → Create new poller for jetpack-sdk repo to target jetpack-sdk unittest suite
Whiteboard: [q2goal] → [q3goal]
Priority: -- → P3
Priority: P3 → P2
I have a very hacky proof-of-concept running now on fedora64:

http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTest/1296692434.1296692534.2417.gz&fulltext=1

This should poll the addon-sdk repo (I have only done forced builds so far) and on a change, it runs a jetpack.sh script which is similar to the run_jetpack.sh except that it downloads both the tip of addon-sdk as well as the latest-mozilla-central build from http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-central/ and then runs the jetpack cfx testall against that binary.
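The sequence that script follows can be sketched as an ordered list of commands. This is not the actual jetpack.sh: it assumes a Linux-style `.tar.bz2` build, uses hgweb's standard `archive/tip.tar.bz2` URL for the SDK tip, and assumes `-b` is the cfx option that selects the Firefox binary. `build_filename`, `binary_path` and `sdk_dir` are hypothetical caller-supplied placeholders, and whether cfx needs the SDK's activate step first is not covered here.

```python
from typing import List

SDK_REPO = "http://hg.mozilla.org/labs/jetpack-sdk"
NIGHTLY_DIR = ("http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/"
               "latest-mozilla-central/")


def jetpack_run_plan(build_filename: str, binary_path: str,
                     sdk_dir: str) -> List[List[str]]:
    """Return, in order, the commands for one poller-triggered test run.

    build_filename, binary_path and sdk_dir are platform-specific placeholders
    supplied by the caller; they are not taken from the real script.
    """
    return [
        # 1. Fetch the tip of the SDK through hgweb's archive URL and unpack it.
        ["wget", "-O", "sdk.tar.bz2", SDK_REPO + "/archive/tip.tar.bz2"],
        ["tar", "-xjf", "sdk.tar.bz2"],
        # 2. Fetch and unpack the latest mozilla-central nightly.
        ["wget", "-O", build_filename, NIGHTLY_DIR + build_filename],
        ["tar", "-xjf", build_filename],
        # 3. Run the full SDK test suite against that Firefox binary.
        ["python", sdk_dir + "/bin/cfx", "testall", "-b", binary_path],
    ]


for argv in jetpack_run_plan("firefox.tar.bz2", "firefox/firefox", "jetpack-sdk"):
    print(" ".join(argv))
```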
Just doing some final testing of my patch for this to make sure it doesn't affect the jetpack unittest suite since they will both be using the same script.

I have the jetpack poller running on all platforms except w764 (Windows 7 64-bit), since I don't have a w764 slave to test against right now; they are on loan for other bugs. But we could go ahead soon with what I have and add w764 as soon as I get a chance to test it.

In the meantime if someone can take a look at the win7 test output and let me know if this is an automation issue or a jetpack test suite issue that would be really helpful: http://pastebin.mozilla.org/1100391
Status: NEW → ASSIGNED
Hmm, I haven't seen that problem on Windows 7, which leads me to suspect it is an automation issue.

But it's hard to say.  Brian: what do you think?
It feels like some test hung. Then, when the parent 'cfx test' process got bored and timed out, it tried to clean up after the browser process (which didn't actually terminate, because it was still hung) and failed to delete all the files (which the browser was still using). Sort of like the Monty Python "I'm not dead yet" skit from Holy Grail.

I'd like to know which test was running last. Currently we're running 'cfx test' in a mode that prints only one dot per test, but for automated environments it'd be better to run it in a more verbose mode. Can we change it to run "cfx test --verbose" instead of "cfx test"? That might be *too* verbose, but it should at least print out the individual test case names.
Working on this every day, I hit this error occasionally.
As Brian said, this is a previous "cfx test" run that failed and, in particular, failed to kill the Firefox instance it used for the test!
The only way to run cfx test again is to manually kill this Firefox instance, because that Firefox process never seems to close itself, nor is it killed by anyone else.
(The big problem on Windows is that, usually, killing a process doesn't necessarily kill its children...)
Hrm, what I've done in other projects is to have the parent process touch a file every couple of seconds, and put the child process into a mode where it checks that file every couple of seconds and terminates unless the file is less than a minute or two old. That clears up the long-term problem of processes hanging around after their parent has given up waiting on them: they terminate themselves. The problems that remained were usually accidental timeouts on loaded systems, and second test runs that started too quickly after the first failure for the cleanup mechanism to trigger (it's a long-term fix, not a short-term one).

Not sure how hard that'd be to implement in this environment. If we were to do it, then 'cfx' should be the parent process, and packages/test-harness/lib/harness.js is basically the child process.
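The heartbeat-file idea above can be sketched in a few lines. This is written in Python purely to illustrate the logic (the real child would be harness.js). The function names are hypothetical, and the 120-second default is one reading of "a minute or two".

```python
import os
import time
from typing import Optional


def touch_heartbeat(path: str) -> None:
    """Parent side: create the file or bump its mtime; call every few seconds."""
    with open(path, "a"):
        pass
    os.utime(path, None)


def parent_is_gone(path: str, max_age: float = 120.0,
                   now: Optional[float] = None) -> bool:
    """Child side: True if the heartbeat is missing or older than max_age seconds.

    A child that sees True should assume its parent gave up, and exit.
    """
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except FileNotFoundError:
        return True
    return age > max_age
```

The child would call `parent_is_gone()` between tests (or on a timer) and shut itself down when it returns True. Treating a missing file as "parent gone" also covers a parent that crashed before its first touch.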
I know that mozrunner, which cfx uses, has a bunch of code that aims to solve this specific problem (children not being killed):
https://github.com/ctalbert/mozrunner/blob/master/mozrunner/killableprocess.py#L161

So this behavior is some kind of bug in how cfx uses mozrunner. Or there is a bug in the mozrunner code, but I don't think so, because then the mozmill/QA team wouldn't be able to run their tests easily!
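For contrast, here is a generic sketch of killing a whole process tree using only the standard library. It is not mozrunner's implementation. On POSIX it starts the child in a new session so the entire group can be signalled at once. On Windows it falls back to `taskkill /F /T`, which walks the child-process tree. The helper names are hypothetical.

```python
import os
import signal
import subprocess
import sys
from typing import List


def start_in_own_group(argv: List[str]) -> subprocess.Popen:
    """Start argv so that it and everything it spawns can be killed together."""
    if sys.platform == "win32":
        return subprocess.Popen(
            argv, creationflags=subprocess.CREATE_NEW_PROCESS_GROUP)
    # POSIX: a new session makes the child a group leader (pgid == pid).
    return subprocess.Popen(argv, start_new_session=True)


def kill_tree(proc: subprocess.Popen) -> None:
    """Kill proc and its descendants, then reap proc."""
    if sys.platform == "win32":
        # /T terminates the process and any child processes it started; /F forces it.
        subprocess.call(["taskkill", "/F", "/T", "/PID", str(proc.pid)],
                        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    else:
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group is already gone
    proc.wait()
```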
Clint: any insight into the hang Lukas is experiencing?
(In reply to comment #17)
> Clint: any insight into the hang Lukas is experiencing?

To be clear, this doesn't 'hang' the test suite, it just fails with warnings (orange).
Ok, this has been tested: not only does it not break the script run by the mozilla-central test suites (I tested this against at least one of each of the 3 platform types), it also gets green runs on: fed, fed64, leopard, snow, xp

As mentioned in comment 11, win7 is giving warnings and going orange when running the test. I've pointed it out to Myk and Brian Warner. I also wonder if there's anything up with that particular test machine; early next week I could try a different slave to see if that changes the results at all.

w764 isn't tested against this yet, since there are no slaves available and we don't even run jetpack against mozilla-central there yet, so I can track turning this on for w764 in another bug.

There's one thing about this patch that I think might be an issue - I had to play with the loading/reloading magic for buildbotcustom and in fact, when testing I was having some reconfig issues and forgot that I had messed with the order.  So on Monday I will test this again quickly in the old order and see if I still need to make that change, if so I'll check in with Catlee about what the options are - like can we import buildbotcustom twice?
Attachment #509288 - Attachment is obsolete: true
Attachment #515261 - Flags: review?(catlee)
Attachment #509287 - Attachment is obsolete: true
Attachment #515262 - Flags: review?(catlee)
Attached patch [tested] jetpack poller tools (obsolete)
So I was in fact able to re-use the same run_jetpack.sh script that we use over in the builder side of things, I tested this against both jetpack-poller builders and mozilla-central unittest jetpack suites and both work with this script on all platform types (linux,win,mac).
Attachment #509289 - Attachment is obsolete: true
Attachment #515263 - Flags: review?(catlee)
Comment on attachment 515261 [details] [diff] [review]
[tested] jetpack poller buildbot-configs

>+buildObjects = {}
>+
> for branch in ACTIVE_BRANCHES:
>-    branchObjects = generateTalosBranchObjects(branch, BRANCHES[branch],
>+    talosObjects = generateTalosBranchObjects(branch, BRANCHES[branch],
>                                                ACTIVE_PLATFORMS, SUITES,
>                                                BRANCH_UNITTEST_VARS['platforms'])
>+    buildObjects = mergeBuildObjects(buildObjects, talosObjects)
>+
>+for project in ACTIVE_PROJECTS:
>+    projectObjects = generateProjectObjects(project, PROJECTS[project], SLAVES)
>+    buildObjects = mergeBuildObjects(buildObjects, projectObjects)
> 
>     # No change sources here please!
>-    c['builders'].extend(branchObjects['builders'])
>-    c['status'].extend(branchObjects['status'])
>+    c['builders'].extend(buildObjects['builders'])
>+    c['status'].extend(buildObjects['status'])
>     # We need just the triggerable schedulers here
>-    for s in branchObjects['schedulers']:
>+    for s in buildObjects['schedulers']:
>         if isinstance(s, Triggerable):
>             c['schedulers'].append(s)
> 

Because you've broken out buildObjects here (and in the other master files),
you can't extend c['builders'] et al. with buildObjects['builders'] at each
iteration through the loop; we'll end up with duplicate builders, status
plugins, etc. otherwise.

Please look at mozilla/builder_master.cfg or mozilla/scheduler_master.cfg to
see how c['builders'] et al. should be updated.
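The pattern the review asks for is to accumulate across both loops and then extend `c` once. Here is a sketch with a hypothetical `merge_build_objects` standing in for the real buildbotcustom helper. The Triggerable filtering of schedulers from the diff is left out for brevity.

```python
from typing import Dict, List

KEYS = ("builders", "status", "schedulers")


def merge_build_objects(a: Dict[str, list], b: Dict[str, list]) -> Dict[str, list]:
    """Hypothetical helper: return a new dict with a's items followed by b's."""
    return {k: list(a.get(k, [])) + list(b.get(k, [])) for k in KEYS}


def assemble(per_branch: List[Dict[str, list]],
             per_project: List[Dict[str, list]]) -> Dict[str, list]:
    c: Dict[str, list] = {k: [] for k in KEYS}
    build_objects: Dict[str, list] = {k: [] for k in KEYS}
    for objs in per_branch:
        build_objects = merge_build_objects(build_objects, objs)
    for objs in per_project:
        build_objects = merge_build_objects(build_objects, objs)
    # Extend c exactly once, after both loops. Extending the accumulated
    # build_objects inside a loop would re-add earlier builders on every pass.
    for k in KEYS:
        c[k].extend(build_objects[k])
    return c


branches = [{"builders": ["b1"], "status": [], "schedulers": []},
            {"builders": ["b2"], "status": [], "schedulers": []}]
projects = [{"builders": ["jetpack"], "status": [], "schedulers": []}]
print(assemble(branches, projects)["builders"])  # ['b1', 'b2', 'jetpack']
```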
Attachment #515261 - Flags: review?(catlee) → review-
as per discussion in IRC
Attachment #515263 - Attachment is obsolete: true
Attachment #516346 - Flags: review?(catlee)
I cleaned up the logic for putting c['builders'] together properly, and at the same time figured out how to avoid reorganizing the loading/reloading magic, so this is a much simpler and cleaner patch. I also found that I had missed adding the PROJECTS staging/preproduction test schedulers, so that's covered now too.
Attachment #515261 - Attachment is obsolete: true
Attachment #516355 - Flags: review?(catlee)
same as before, with preproduction_test_master_localconfig.py updated
Attachment #516355 - Attachment is obsolete: true
Attachment #516357 - Flags: review?(catlee)
Attachment #516355 - Flags: review?(catlee)
two things to say: 

1. We have wayyy too many custom localconfigs and could perhaps put the custom urls and manhole settings in a config file and just have one for each type of master
2. Thank you test-masters.sh
Attachment #516357 - Attachment is obsolete: true
Attachment #516360 - Flags: review?(catlee)
Attachment #516357 - Flags: review?(catlee)
so sorry for the bug spam, one last typo
Attachment #516360 - Attachment is obsolete: true
Attachment #516363 - Flags: review?(catlee)
Attachment #516360 - Flags: review?(catlee)
Whiteboard: [q3goal] → [q1 goal 2011]
(In reply to comment #11)

> In the meantime if someone can take a look at the win7 test output and let me
> know if this is an automation issue or a jetpack test suite issue that would be
> really helpful: http://pastebin.mozilla.org/1100391

That windows error bothers me.  I've tried to repro it for days now.  I can't
repro this at all in our buildbot staging environment, or in a faked buildbot
environment that simulates the nature of how things are launched in the
buildbot shell, even when I tweak things I think ought to cause this problem.

The thing to do might be to get a jetpack poller that reports to the jetpack
branch of TBPL, and then debug this by changing the jetpack code to see if we
can figure out where it is going off the rails.

Is there any way I can supply a jetpack tarball with some debugging code, get
it run on win7 such that this problem reoccurs and then get that output? 
Perhaps the solution is to wait for this bug to land?
(In reply to comment #28)

> That windows error bothers me.  I've tried to repro it for days now.  I can't
> repro this at all in our buildbot staging environment, or in a faked buildbot
> environment that simulates the nature of how things are launched in the
> buildbot shell, even when I tweak things I think ought to cause this problem.

I should have tested it on another windows machine because it's possible that the staging slave I was using got bunged up somehow. We have some available today so I will run it a few times on a couple more machines and confirm that.

> The thing to do might be to get a jetpack poller that reports to the jetpack
> branch of TBPL, and then debug this by changing the jetpack code to see if we
> can figure out where it is going off the rails.

Currently when someone is running something in staging it reports to http://tinderbox.mozilla.org/showbuilds.cgi?tree=MozillaTest so if this does re-occur with the other test boxes, I'll point you there for debugging/logs 

> Is there any way I can supply a jetpack tarball with some debugging code, get
> it run on win7 such that this problem reoccurs and then get that output? 
> Perhaps the solution is to wait for this bug to land?

I would say this is a last resort if the above two options don't work out, but yes: if you can provide me a URL to a different jetpack repo (a user repo, perhaps), I can set up the staging environment to pull the tarball from there and run tests on the changed code.

I'm going to go set up those windows machines now and get back to you soon.
Clint:  here's a run on a different w7 slave and sadly it looks like the same issue.  So I have staging set up right now and am happy to adjust it to poll and pull from your own jetpack tarball if you send me a repo link to point it at.

http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTest/1299266679.1299266702.9385.gz&fulltext=1
(In reply to comment #30)
> Clint:  here's a run on a different w7 slave and sadly it looks like the same
> issue.  So I have staging set up right now and am happy to adjust it to poll
> and pull from your own jetpack tarball if you send me a repo link to point it
> at.
> 
> http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTest/1299266679.1299266702.9385.gz&fulltext=1

Ok, at least we have some way to repro it.  Let me put a tarball together with some logging.  Will send you a link to it shortly.
Lukas, 
I have a jetpack tarball with some logging for you to try up here:
http://people.mozilla.org/~ctalbert/jetpack/tip.tar.bz2

Many thanks!
Attachment #515262 - Flags: review?(catlee) → review+
Comment on attachment 516363 [details] [diff] [review]
[tested] jetpack poller buildbot-configs (caught one more missed include for PROJECTS)

Can you get rid of the "# No change sources here please!" comment in universal_master?  It's not accurate...

Also, I think you need to update scheduler_master.cfg
Attachment #516363 - Flags: review?(catlee) → review-
Attachment #516346 - Flags: review?(catlee) → review+
took out the comment and also updated scheduler_master.cfg
Attachment #516363 - Attachment is obsolete: true
Attachment #518123 - Flags: review?(catlee)
carrying forward the r+ on this, just took out the nextSlave part so that this gets builds on checkin
Attachment #515262 - Attachment is obsolete: true
Attachment #518123 - Attachment is obsolete: true
Attachment #518131 - Flags: review?(catlee)
Attachment #518123 - Flags: review?(catlee)
Comment on attachment 518131 [details] [diff] [review]
jetpack poller buildbot-configs with schedulers and changesources

Looks good except that the tinderbox tree for preproduction shouldn't be the production jetpack tree.  We have a RelEng-Preproduction tree for this.
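One way to keep each environment reporting to its own tree is a single per-environment mapping. This is a sketch only: the dictionary layout and function name are hypothetical. RelEng-Preproduction and MozillaTest are the tree names mentioned in this bug, and "Jetpack" as the production tree name is an assumption.

```python
from typing import Dict

# Hypothetical per-environment settings; "Jetpack" is an assumed tree name.
TINDERBOX_TREES: Dict[str, str] = {
    "production": "Jetpack",
    "preproduction": "RelEng-Preproduction",
    "staging": "MozillaTest",
}


def tinderbox_tree(environment: str) -> str:
    """Return the tree a master reports to; unknown environments raise KeyError."""
    return TINDERBOX_TREES[environment]
```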
Attachment #518131 - Flags: review?(catlee) → review+
carries forward the r+ from previous patch, adjusted the preproduction tinderbox tree
Attachment #518131 - Attachment is obsolete: true
Comment on attachment 518157 [details] [diff] [review]
jetpack poller buildbot-configs with correct preproduction tinderbox tree

http://hg.mozilla.org/build/buildbot-configs/rev/e059958cfa1c

landed on default
Attachment #518157 - Flags: checked-in+
Comment on attachment 518125 [details] [diff] [review]
[tested] jetpack poller buildbotcustom (with idle slave logic removed)

http://hg.mozilla.org/build/buildbotcustom/rev/d0bd4c6e42cf

landed on default
Attachment #518125 - Flags: checked-in+
Comment on attachment 516346 [details] [diff] [review]
[tested] jetpack poller tools with PREP_CMD and no more echo ``

http://hg.mozilla.org/build/tools/rev/c931538c7b1f

landed on default
Attachment #516346 - Flags: checked-in+
Comment on attachment 518157 [details] [diff] [review]
jetpack poller buildbot-configs with correct preproduction tinderbox tree

http://hg.mozilla.org/build/buildbot-configs/rev/f1a4d01c8caa
Attachment #518157 - Flags: checked-in+ → checked-in-
So, after this morning's errors on reconfig (which weren't caught by checkconfig) due to import/reload issues, I ran this on a local test-master and a test-scheduler-master. Both times I started buildbot without the patch, did a reconfig with almost the whole patch but without the reload order change (which reproduced the error), and then did another with the reload order changed, and was able to successfully reconfig both masters. I also ran it against test-masters.sh to be certain.
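Here is a generic illustration of one common way reload order bites (this is not the buildbot-configs code, and the module names are hypothetical). A module that does `from x import NAME` keeps whatever `NAME` was bound to when it last executed, so the module it imports from has to be reloaded first.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def value_after_reconfig(reload_provider_first: bool) -> str:
    """Edit a helper module on disk, reload both modules, and return what the
    importing module ends up seeing."""
    old_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # no .pyc files, so reload always re-reads source
    try:
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            # Hypothetical stand-ins: demo_provider plays a shared helper module,
            # demo_consumer plays a config doing "from demo_provider import ...".
            (root / "demo_provider.py").write_text("VALUE = 'old'\n")
            (root / "demo_consumer.py").write_text("from demo_provider import VALUE\n")
            sys.path.insert(0, tmp)
            try:
                importlib.invalidate_caches()
                import demo_provider
                import demo_consumer

                # Simulate an update landing on disk, followed by a reconfig.
                (root / "demo_provider.py").write_text("VALUE = 'updated'\n")
                order = ([demo_provider, demo_consumer] if reload_provider_first
                         else [demo_consumer, demo_provider])
                for module in order:
                    importlib.reload(module)
                return demo_consumer.VALUE
            finally:
                sys.path.remove(tmp)
                for name in ("demo_provider", "demo_consumer"):
                    sys.modules.pop(name, None)
    finally:
        sys.dont_write_bytecode = old_flag


print(value_after_reconfig(reload_provider_first=True))   # updated
print(value_after_reconfig(reload_provider_first=False))  # old
```

Reloading the consumer first re-executes its `from ... import` against the stale provider module, so it keeps the old value even though the provider is reloaded afterwards.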
Attachment #518157 - Attachment is obsolete: true
Attachment #518554 - Flags: review?(catlee)
Attachment #518554 - Flags: review?(catlee) → review+
Comment on attachment 518554 [details] [diff] [review]
[tested] jetpack poller buildbot-configs that handles the reconfig issue (rearrange import/reload)

http://hg.mozilla.org/build/buildbot-configs/rev/a41616f7cab9
Attachment #518554 - Flags: checked-in+
This is now live in production and no issues have been reported about the polling of repo/triggering of builds. I'm closing this now and any future bugs about this project can be filed separately to get the attention they need.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering