Closed Bug 855049 Opened 9 years ago Closed 9 years ago

Gaia unit tests fail on buildbot b2g desktop builds

Categories

(Firefox OS Graveyard :: General, defect)

All
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: jgriffin, Unassigned)

References

Details

Gaia unit tests fail on buildbot-produced b2g desktop builds with the error "NS_ERROR_FAILURE" from https://github.com/mozilla-b2g/gaia/blob/master/test_apps/test-agent/common/vendor/mocha/mocha.js#L3856.

This does not happen with a local build.  I used the exact same gaia profile with each, so it doesn't seem to be a Gaia issue.

I'll try to figure out what precisely is failing.
This error gets thrown when trying to enumerate properties on the window object.  Buildbot uses quite a different mozconfig from what one would normally use on a b2g desktop build, so I'm going to start with that and try to narrow it down.
We have the ability to turn these checks off... mocha turns it on to ensure your not exposing unintentional global objects.. for us its less of a concern and probably would make the tests faster... I could probably come up with some simple solution to turn it off only in build bots environment or globally failing that.
I'm unable to reproduce this problem using a local build made with the same mozconfig that buildbot uses, so it's quite puzzling why this occurs.
I'll run a try job with a vastly simpler mozconfig and see if that helps...
This didn't work...even with a very simple mozconfig, the buildbot-produced builds still throw when enumerating properties on the window objects, which is really hard to explain.

jhford, have any ideas why buildbot-produced b2g desktop builds would behave differently than locally built ones, made with the same mozconfig?
Flags: needinfo?(jhford)
(In reply to James Lal [:lightsofapollo] from comment #2)
> We have the ability to turn these checks off... mocha turns it on to ensure
> your not exposing unintentional global objects.. for us its less of a
> concern and probably would make the tests faster... I could probably come up
> with some simple solution to turn it off only in build bots environment or
> globally failing that.

Is there is an easy way to do this locally?  I can try it to see if that's the only problem with this build, or if there are others.
James created a test hack at https://github.com/lightsofapollo/gaia/compare/hack-for-jgriffin which turns off window property enumeration in mocha, and with this hack the tests run correctly.

Since this hack is in a third-party lib, actually getting in production might not be easy.  And, we should really determine why the releng build is behaving differently, as this might impact other tests we want to run on it later.
One significant different between local builds and the buildbot build is that the buildbot builds are l10n builds.  Aki, is there a way to disable the l10n stuff for the per-commit builds, while leaving it intact for nightlies?
(In reply to Jonathan Griffin (:jgriffin) from comment #9)
> One significant different between local builds and the buildbot build is
> that the buildbot builds are l10n builds.  Aki, is there a way to disable
> the l10n stuff for the per-commit builds, while leaving it intact for
> nightlies?

There is, but there was actually a request or reason that we turned it on for per-commit builds.  I'll have to dig that up.
CCing Axel and Stas since we're talking about turning off l10n for desktop per-commit builds.
I don't see any indication that l10n being around would cause these issues. Yet, if unit tests would fail on l10n being around, the tests would be wrong, and would need to be fixed.
(In reply to Axel Hecht (pto - April 2) [:Pike] from comment #13)
> I don't see any indication that l10n being around would cause these issues.
> Yet, if unit tests would fail on l10n being around, the tests would be
> wrong, and would need to be fixed.

It's unlikely that l10n is the cause, and yet I don't know how to approach the problem other than removing variables, and l10n is a big one. 

This isn't a matter of unit tests failing and being wrong, it's a case of the build behaving differently, regardless of any tests.  We need to find out what's causing this difference, so we can fix it, and it must be something in the way that the build is generated.
mrbkap, do you have any idea why this problem occurs?  We're seeing an NS_ERROR_FAILURE error when trying to enumerate the properties of the window object in content JS on B2G desktop builds produced by buildbot, but not on builds people produce locally.  There isn't a mozconfig difference that explains it (see comment #6).
Flags: needinfo?(mrbkap)
Depends on: 860547
It's hard to know where the exception is coming from (though it seems most likely to be coming from XPC_WN_JSOp_Enumerate). jgriffen told me on IRC that he's verified that it's the act of enumerating itself that's causing the exception (that is, it isn't iterating any specific property that's throwing).

It'd be useful to have a debug build to step through with a debugger to get more info here.
Flags: needinfo?(mrbkap)
Blake suggestions seems the best approach. Trying to see if this reproduce in a debug build and if yes this would make it easier to debug this starting from a stack. Can you produce buildbot debug builds?

Also I would suggest to keep this bug open to let someone fix the platform issue once it has been found and open a new bug to land James workaround until this bug is fixed. There is no reasons to block tests integration on this since the workaround affects the *local copy* of a third party lib and we can hack it temporarily.

So I would suggest to:
 - Open a new bug (1) to land James's dirty workaround and do it asap.
 - Have the tests on tbpl! yeah!

 - Open a new bug (2) to remove James's dirty workaround
 - Produce buildbots debug builds and send it to blake or any other JS guys with clear instruction about how to reproduce the issue. (the command line necessary to produce the crash).
 - Fix this bug.
 - Fix (2) by removing the workaround.

How does it sounds?
That sounds like a good approach to me.  I've already filed a bug to request debug builds of these from buildbot as bug 860547.

I'll file a separate bug for landing James' temporary workaround.
Flags: needinfo?(jhford)
(In reply to Jonathan Griffin (:jgriffin) from comment #18)
> That sounds like a good approach to me.  I've already filed a bug to request
> debug builds of these from buildbot as bug 860547.
> 
> I'll file a separate bug for landing James' temporary workaround.

James' hack unfortunately breaks the gaia testrunner for gaia unit tests, but not the buildbot runner.  So we'll have to have him take another look at what needs to be done to work around this problem.  As soon as we can land a temporary hack, we can get these tests turned on in staging.
(In reply to Jonathan Griffin (:jgriffin) from comment #19)
> (In reply to Jonathan Griffin (:jgriffin) from comment #18)
> > That sounds like a good approach to me.  I've already filed a bug to request
> > debug builds of these from buildbot as bug 860547.
> > 
> > I'll file a separate bug for landing James' temporary workaround.
> 
> James' hack unfortunately breaks the gaia testrunner for gaia unit tests,
> but not the buildbot runner.  So we'll have to have him take another look at
> what needs to be done to work around this problem.  As soon as we can land a
> temporary hack, we can get these tests turned on in staging.

James do you have any idea on what's going on?
Sometime in the last week this problem disappeared. Yay!
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: needinfo?(jlal)
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.