Closed Bug 852253 Opened 9 years ago Closed 9 years ago

SUT tests failing for b2g pandas

Categories

(Testing :: General, defect)

Hardware: x86
OS: macOS
Type: defect
Priority: Not set
Severity: blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: armenzg, Unassigned)

Attachments

(2 files, 4 obsolete files)

This has been keeping any panda from being considered ready for action (see bug 852242).

We won't be able to run any tests until this is fixed.

2013-03-18T11:33:22 syslog Imaging complete. Rebooting
2013-03-18T11:33:22 statemachine entering state b2g_rebooting
2013-03-18T11:33:22 syslog Submitting lifeguard event at http://10.12.128.33/api/device/panda-0100/event/b2g_rebooting/
2013-03-18T11:35:25 statemachine entering state b2g_pinging
2013-03-18T11:35:26 sut connecting to SUT agent
2013-03-18T11:35:26 statemachine entering state sut_verifying
2013-03-18T11:36:16 sut connecting to SUT agent
2013-03-18T11:37:06 sut connecting to SUT agent
2013-03-18T11:37:56 sut connecting to SUT agent
2013-03-18T11:38:46 sut connecting to SUT agent
2013-03-18T11:39:36 sut connecting to SUT agent
2013-03-18T11:40:26 sut connecting to SUT agent
2013-03-18T11:41:16 sut connecting to SUT agent
2013-03-18T11:42:06 sut connecting to SUT agent
2013-03-18T11:42:56 sut connecting to SUT agent
2013-03-18T11:43:46 sut connecting to SUT agent
2013-03-18T11:43:53 statemachine ignored event free in state sut_verifying
2013-03-18T11:44:36 sut connecting to SUT agent
2013-03-18T11:45:26 sut connecting to SUT agent
2013-03-18T11:46:16 sut connecting to SUT agent
2013-03-18T11:47:06 statemachine device has failed: failed_sut_verifying
2013-03-18T11:47:06 statemachine entering state failed_sut_verifying
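
For anyone reading the log above, here is a minimal sketch (not mozpool's actual code) of how such an excerpt could be summarized: it counts the SUT connection attempts and measures the time between entering sut_verifying and the failure. The line layout (timestamp, component, message) is assumed from the excerpt, and the function names are hypothetical.

```python
from datetime import datetime


def parse_line(line):
    """Split 'TIMESTAMP COMPONENT MESSAGE' into (datetime, component, message)."""
    stamp, component, message = line.split(" ", 2)
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S"), component, message


def summarize_sut_verification(lines):
    """Return (connection attempts, seconds from entering sut_verifying to failure)."""
    attempts = 0
    entered = failed = None
    for line in lines:
        when, component, message = parse_line(line)
        if component == "sut" and message == "connecting to SUT agent":
            attempts += 1
        elif component == "statemachine" and message == "entering state sut_verifying":
            entered = when
        elif component == "statemachine" and message.startswith("device has failed"):
            failed = when
    elapsed = (failed - entered).total_seconds() if entered and failed else None
    return attempts, elapsed


# A shortened sample taken from the log in this bug.
SAMPLE = [
    "2013-03-18T11:35:26 sut connecting to SUT agent",
    "2013-03-18T11:35:26 statemachine entering state sut_verifying",
    "2013-03-18T11:36:16 sut connecting to SUT agent",
    "2013-03-18T11:47:06 statemachine device has failed: failed_sut_verifying",
]

if __name__ == "__main__":
    print(summarize_sut_verification(SAMPLE))
```

In the excerpt the attempts arrive 50 seconds apart, so a summary like this makes it easy to see that the agent never answered before the device was marked failed.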
Right now our Gaia UI test jobs are depleting our panda pool, since the pandas don't come back with a working SUT agent. I will try to slow this down in bug 802317, but this needs to be fixed so we can get somewhere.
Severity: normal → blocker
Mark, do you have any time to look at this?
Sure, I don't have a lot of experience with the whole b2g building process, but I'll try out the latest image on my panda and see if I can reproduce the problem.
Okay, my panda has some issues right now, but what I'm seeing is this:

link_image[1855]:  2256 could not load needed library 'libplds4.so' for 'sutagent' (load_library[1118]: Library 'libplds4.so' not found)CANNOT LINK EXECUTABLE

I think that libplds4.so is no longer included in the builds, so the agent can't start.
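
As an illustration only, the missing library and the executable that needs it could be pulled out of a linker message like the one above with a small regular expression. The function name is hypothetical, and the pattern is assumed from this single log line, so it may not match other linker versions.

```python
import re

# Pattern assumed from the single linker message quoted in this bug.
MISSING_LIB = re.compile(r"could not load needed library '([^']+)' for '([^']+)'")


def missing_library(log_line):
    """Return (library, executable) if the line reports a missing library, else None."""
    match = MISSING_LIB.search(log_line)
    return match.groups() if match else None


if __name__ == "__main__":
    line = (
        "link_image[1855]:  2256 could not load needed library 'libplds4.so' "
        "for 'sutagent' (load_library[1118]: Library 'libplds4.so' not found)"
        "CANNOT LINK EXECUTABLE"
    )
    print(missing_library(line))
```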

I've verified that I cannot connect to the agent port (20701) at all on the mozpool pandas.  I don't know for sure that this is the issue, but I'm betting that it is.
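
A rough sketch of the kind of check described above, assuming only that the agent listens on TCP port 20701: it tests whether a TCP connection can be opened at all and does not speak the SUT protocol. The function name is hypothetical, and the host is whatever panda address is being tested.

```python
import socket

SUT_AGENT_PORT = 20701  # agent port mentioned in this bug


def agent_port_open(host, port=SUT_AGENT_PORT, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A False result here only says that nothing accepted the connection; it does not by itself explain why the agent is not running.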
The B2G build produces libplds4.so in objdir-gecko/dist/lib, but it doesn't get copied to the device with a current build.  In an old build, it gets copied to /system/b2g.

:mwu, :tzimmerman, what can we do to get libplds4.so copied to /system/b2g for panda builds, as it used to be?  We need this lib for running SUTAgent, and not having it breaks our ability to run tests on pandas in TBPL.
Everything got folded into libnss as part of bug 648407.

As I've mentioned before, Negatus should not be using nspr.
Okay, I finally got a panda build going here.  However, I'm going to need some help figuring out how to link it with libnss: libnss on Ubuntu doesn't contain the NSPR stuff, so I have to link it with the one built with b2g, somehow.
If you really want to use nspr, build your own and use that. It doesn't make sense to depend on any copy of nspr in /system/b2g since it's part of the code we're interested in testing.
Well, the agent isn't doing the testing; it's just an entry point to controlling the system and is pretty basic--process control, filesystem access, etc.  So being integrated into that system isn't a bad thing per se.  But your point still stands, since anything could happen to the platform's layout, so we would benefit from being isolated from these changes.

We definitely aren't stuck on NSPR; we're evaluating changing away from it in bug 853728.  In the short term, however, we'll look into building it ourselves.  Thanks for the advice.
Assignee: nobody → mcote
Status: NEW → ASSIGNED
To quickly fix this, ted suggested just linking negatus against the static NSPR libs built in b2g.  This will let us get the panda sutagent back to a usable state, and we'll look into moving away from NSPR completely at some point.
Attachment #729698 - Flags: review?(ted)
Oops, that last patch was busted.
Attachment #729698 - Attachment is obsolete: true
Attachment #729698 - Flags: review?(ted)
Attachment #729703 - Flags: review?(ted)
Attachment #729703 - Flags: review?(ted) → review+
https://github.com/mozilla/Negatus/commit/9c54b3cc545b77f5fade0d10e5b51f1e4cde9d8b

I tested this on my own panda here and it worked just fine.  I believe the change should be picked up automatically, so we'll see what happens with the new panda builds.
Rail, can you update the snapshot to pick up the above commit (assuming a snapshot update is still required)?
Flags: needinfo?(rail)
I will create the snapshot.
Flags: needinfo?(rail)
Attached patch: update b2g snapshot (obsolete)
Attachment #730352 - Flags: review?(jgriffin)
This has been pushed to try first:
https://tbpl.mozilla.org/?tree=Try&rev=7fc78053f2d3

Once the review is positive and the job passes we can land it on inbound.
Comment on attachment 730352
update b2g snapshot

Review of attachment 730352:
-----------------------------------------------------------------

Seems reasonable, although I'm not overly familiar with the way these snapshots are produced.
Attachment #730352 - Flags: review?(jgriffin) → review+
Attached patch: update b2g snapshot (obsolete)
Sorry jgriffin. For some reason I thought you did.

In any case, the patch had a little problem and this is the refreshed patch.

Let's hope this push works:
https://tbpl.mozilla.org/?tree=Try&rev=3f8e327dfccc
Attachment #730352 - Attachment is obsolete: true
Attachment #730640 - Flags: review?(aki)
I'm getting a build failure.
Can anyone look into it?
https://tbpl.mozilla.org/php/getParsedLog.php?id=21205483&tree=Try&full=1#error0
Ah bah, I think Negatus is built before the static libs it depends on.  I'll have to take another look at this.
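
To illustrate the ordering problem in general terms (this is not the b2g build system), a dependency graph can be sorted so that each target is built only after everything it depends on. The target names below are hypothetical stand-ins.

```python
from graphlib import TopologicalSorter

# Hypothetical target names; each target maps to the targets it depends on.
DEPENDENCIES = {
    "negatus": {"nspr_static_libs"},
    "nspr_static_libs": set(),
}


def build_order(dependencies):
    """Return the targets in an order where dependencies come first."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(build_order(DEPENDENCIES))
```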
Comment on attachment 730640
update b2g snapshot

Rubber stamping, even though it sounds like we'll need another snapshot.
Attachment #730640 - Flags: review?(aki) → review+
I replaced the bundled shared NSPR libraries with static versions and now link against them.  I made sure to do a full build this time, and all looks good.
Attachment #729703 - Attachment is obsolete: true
Attachment #732336 - Flags: review?(ted)
Comment on attachment 730640
update b2g snapshot

Waiting for new Negatus landing.
Attachment #730640 - Attachment is obsolete: true
Attachment #732336 - Flags: review?(ted) → review+
I have to push this to try.
Comment on attachment 732862
include Negatus change

The build went well:
https://tbpl.mozilla.org/?tree=Try&rev=7c63e3be6922
Attachment #732862 - Flags: review?(aki)
Attachment #732862 - Flags: review?(aki) → review+
We can now see that we have 39 tests failing:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=483d3d9f852f&showall=1

Thanks Mark!
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED