Closed Bug 852253 Opened 11 years ago Closed 11 years ago
SUT tests failing for b2g pandas
This has been keeping any panda from being considered ready for action (see bug 852242). We won't be able to run any tests until this is fixed.
2013-03-18T11:33:22 syslog Imaging complete. Rebooting
2013-03-18T11:33:22 statemachine entering state b2g_rebooting
2013-03-18T11:33:22 syslog Submitting lifeguard event at http://10.12.128.33/api/device/panda-0100/event/b2g_rebooting/
2013-03-18T11:35:25 statemachine entering state b2g_pinging
2013-03-18T11:35:26 sut connecting to SUT agent
2013-03-18T11:35:26 statemachine entering state sut_verifying
2013-03-18T11:36:16 sut connecting to SUT agent
2013-03-18T11:37:06 sut connecting to SUT agent
2013-03-18T11:37:56 sut connecting to SUT agent
2013-03-18T11:38:46 sut connecting to SUT agent
2013-03-18T11:39:36 sut connecting to SUT agent
2013-03-18T11:40:26 sut connecting to SUT agent
2013-03-18T11:41:16 sut connecting to SUT agent
2013-03-18T11:42:06 sut connecting to SUT agent
2013-03-18T11:42:56 sut connecting to SUT agent
2013-03-18T11:43:46 sut connecting to SUT agent
2013-03-18T11:43:53 statemachine ignored event free in state sut_verifying
2013-03-18T11:44:36 sut connecting to SUT agent
2013-03-18T11:45:26 sut connecting to SUT agent
2013-03-18T11:46:16 sut connecting to SUT agent
2013-03-18T11:47:06 statemachine device has failed: failed_sut_verifying
2013-03-18T11:47:06 statemachine entering state failed_sut_verifying
It might have started from services-central but I'm not sure: https://tbpl.mozilla.org/?jobname=b2g_panda&showall=1&rev=e2c97ca79bd6 https://hg.mozilla.org/mozilla-central/rev/e2c97ca79bd6
Right now our Gaia UI test jobs are depleting our panda pool since we don't come back with a working SUT agent. I will try to slow this down on bug 802317 but this needs to be fixed so we can get somewhere.
Severity: normal → blocker
Mark, do you have any time to look at this?
Sure, I don't have a lot of experience with the whole b2g building process, but I'll try out the latest image on my panda and see if I can reproduce the problem.
Okay, my panda has some issues right now, but what I'm seeing is this:
link_image: 2256 could not load needed library 'libplds4.so' for 'sutagent' (load_library: Library 'libplds4.so' not found)CANNOT LINK EXECUTABLE
I think that libplds4.so is no longer included in the builds, so the agent can't start. I've verified that I cannot connect to the agent port (20701) at all on the mozpool pandas. I don't know for sure that this is the issue, but I'm betting that it is.
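For anyone who wants to repeat the port check, here is a minimal sketch in Python. It only tests whether a TCP connection to the agent port can be opened; it does not speak the SUT agent protocol. The function names, the retry count, and the delay are illustrative assumptions, not what mozpool actually does.

```python
import socket
import time


def agent_port_open(host, port=20701, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: 20701 is the agent port mentioned above; the
    timeout value is an assumption.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def wait_for_agent(host, port=20701, attempts=3, delay=1.0, timeout=5.0):
    """Retry the connection up to `attempts` times; True once one succeeds."""
    for attempt in range(attempts):
        if agent_port_open(host, port, timeout):
            return True
        # Sleep only between attempts, not after the last one.
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling wait_for_agent with a panda's address (whatever it is in your setup) returns False when nothing is listening on the port, which is consistent with the agent failing to start.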
The B2G build produces libplds4.so in objdir-gecko/dist/lib, but it doesn't get copied to the device with a current build. In an old build, it gets copied to /system/b2g. :mwu, :tzimmerman, what can we do to get libplds4.so copied to /system/b2g for panda builds, as it used to be? We need this lib for running SUTAgent, and not having it breaks our ability to run tests on pandas in TBPL.
Everything got folded into libnss as part of bug 648407. As I've mentioned before, Negatus should not be using nspr.
Okay I finally got a panda build going here. However I'm going to need some help in figuring out how to link it with libnss, since libnss on Ubuntu doesn't contain the NSPR stuff, so I have to link it with the one built with b2g, somehow.
If you really want to use nspr, build your own and use that. It doesn't make sense to depend on any copy of nspr in /system/b2g since it's part of the code we're interested in testing.
Well, the agent isn't doing the testing; it's just an entry point to controlling the system and is pretty basic--process control, filesystem access, etc. So being integrated into that system isn't a bad thing per se. But your point still stands, since anything could happen to the platform's layout, so we would benefit from being isolated from these changes. We definitely aren't stuck on NSPR; we're evaluating changing away from it in bug 853728. In the short term, however, we'll look into building it ourselves. Thanks for the advice.
To quickly fix this, ted suggested just linking negatus against the static NSPR libs built in b2g. This will let us get the panda sutagent back to a usable state, and we'll look into moving away from NSPR completely at some point.
Oops that last patch was busted.
Attachment #729703 - Flags: review?(ted) → review+
https://github.com/mozilla/Negatus/commit/9c54b3cc545b77f5fade0d10e5b51f1e4cde9d8b I tested this on my own panda here and it worked just fine. I believe the change should be picked up automatically, so we'll see what happens with the new panda builds.
Rail, can you update the snapshot to pick up the above commit (assuming a snapshot update is still required)?
I will create the snapshot.
This has been pushed to try first: https://tbpl.mozilla.org/?tree=Try&rev=7fc78053f2d3 Once the review is positive and the job passes we can land it on inbound.
Comment on attachment 730352: update b2g snapshot. Seems reasonable, although I'm not overly familiar with the way these snapshots are produced.
Attachment #730352 - Flags: review?(jgriffin) → review+
Sorry jgriffin. For some reason I thought you did. In any case, the patch had a little problem and this is the refreshed patch. Let's hope this push works: https://tbpl.mozilla.org/?tree=Try&rev=3f8e327dfccc
I'm getting a build failure. Can anyone look into it? https://tbpl.mozilla.org/php/getParsedLog.php?id=21205483&tree=Try&full=1#error0
Ah bah, I think Negatus is built before the static libs it depends on. I'll have to take another look at this.
Comment on attachment 730640: update b2g snapshot. Rubber stamping, even though it sounds like we'll need another snapshot.
Attachment #730640 - Flags: review?(aki) → review+
I replaced the bundled shared NSPR libraries with static versions and now link against them. I made sure to do a full build this time, and all looks good.
Comment on attachment 730640: update b2g snapshot. Waiting for new Negatus landing.
Attachment #730640 - Attachment is obsolete: true
Attachment #732336 - Flags: review?(ted) → review+
https://github.com/mozilla/Negatus/commit/5009c0738def17e68855b1c7084fd38ac8fd7545 Please update the snapshot.
I have to push this to try.
Comment on attachment 732862: include Negatus change. The build went well: https://tbpl.mozilla.org/?tree=Try&rev=7c63e3be6922
Attachment #732862 - Flags: review?(aki)
Comment on attachment 732862: include Negatus change. http://hg.mozilla.org/integration/mozilla-inbound/rev/483d3d9f852f
We now can see that we have 39 tests failing: https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=483d3d9f852f&showall=1 Thanks Mark!
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED