Closed Bug 807478 Opened 8 years ago Closed 8 years ago

B2G: turn on jsloader.reuseGlobal

Categories

(Firefox OS Graveyard :: General, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(blocking-basecamp:+, firefox18 fixed, firefox19 fixed)

RESOLVED FIXED
blocking-basecamp +
Tracking Status
firefox18 --- fixed
firefox19 --- fixed

People

(Reporter: philikon, Assigned: philikon)

References

Details

Attachments

(4 files, 2 obsolete files)

jsloader.reuseGlobal was disabled since it breaks Marionette tests. We need to get it re-enabled.
Blocks a blocker.
blocking-basecamp: --- → +
Assignee: nobody → philipp
Even though the pref was turned off for B2G on m-i tip (which has bug 798491), Marionette test harness is stalled during startup at

  waiting for system-message-listener-ready...

I'm investigating what broke it, but it might very well be that the change in bug 798491 has some weird side-effect that breaks Marionette.
Note that we are planning to turn on some B2G tests on m-c/m-i/try on TBPL tomorrow.  We won't be able to do that if Marionette is broken. :(
Ok, I have confirmed that bug 798491 definitely broke Marionette, even with the pref off (commit 553fb59b9ca0 works, commit 67cb43bb8865 doesn't).
(In reply to Philipp von Weitershausen [:philikon] from comment #4)
> Ok, I have confirmed that bug 798491 definitely broke Marionette, even with
> the pref off (commit 553fb59b9ca0 works, commit 67cb43bb8865 doesn't).

In that case, can we back that out until we figure why its busted?  We are planning to roll out mochitests for B2G on m-c tomorrow, but if this patch merges to m-c, we won't be able to - next week would be the earliest we could try again.

Additionally, it will block the work of many people across several teams (a-team, qa, webqa, and rel-eng) who are working hard to stand up automation for B2G.
Unfortunately, that's a 700KB patch that's extremely bitrotty, and khuey already had to waste time rebasing it once :(.

BTW, it would be interesting to test on the first landing.  Maybe the rebase broke things.
Can you bisect by jsm? Lets see if we can fix this in situ.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #6)
> BTW, it would be interesting to test on the first landing.  Maybe the rebase
> broke things.

Yes, that's what I suspect, too, given Vicamo's bug 798491 comment 84:
> 1) checkout head right before this patch from inbound: marionette works fine
> 2) apply this patch: marionette timeout
> 3) apply this patch but don't enable it in b2g/app/b2g.js: marionette works
> fine

Investigating...
Sigh, I confused myself with all the different relanding revisions. cset 5ce71981e005 is the last relanding with it pref'ed off. Rebuilding.
I just built with tip m-i.  It isn't only Marionette that's broken, the emulator doesn't boot...it never gets past the "mozilla" screen.
(In reply to Jonathan Griffin (:jgriffin) from comment #10)
> I just built with tip m-i.  It isn't only Marionette that's broken, the
> emulator doesn't boot...it never gets past the "mozilla" screen.

Yeah, Gecko doesn't seem to come up at all judging from logcat. The interesting thing is that I just built m-i tip for the otoro and it seems to work just fine.

I've also looked at the interdiff of cset 67cb43bb8865 (the one Vicamo backed out, but said emulator tests worked with the pref off) and cset 5ce71981e005 (the last relanding) and can't find anything that sticks out. Unfortunately I ran out of time to rebuild 67cb43bb8865 with the pref off to verify Vicamo's assessment from bug 798491 comment 84.
I'm able to boot and run the marionette tests just fine.  Seeing if I repro the test failure.

However, *qemu* itself doesn't even seem to come up if I run the mochitests.  I don't even get to the point where UI is shown, let alone the mozilla logo.
test_getmessage passes just fine for me.  WTF
I find it extremely hard to believe that bug 798491 did anything but change timing.
test_getmessage passes when it runs as part of the suite, but fails when it runs on its own.  I see

waiting for system-message-listener-ready...
timed out
done

in both the full run and the individual run.  However, the test fails in the individual run because

TEST-START test_getmessage.js
/home/vicamo/WorkSpace/mozilla/b2g/qemu-arm/gecko/dom/sms/tests/marionette/test_getmessage.js, runTest (marionette_test.MarionetteJSTestCase) ... ERROR

======================================================================
ERROR: /home/vicamo/WorkSpace/mozilla/b2g/qemu-arm/gecko/dom/sms/tests/marionette/test_getmessage.js, runTest (marionette_test.MarionetteJSTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/cjones/mozilla/inbound/testing/marionette/client/marionette/marionette_test.py", line 164, in runTest
    f = open(self.jsFile, 'r')
TEST-UNEXPECTED-FAIL : IOError: [Errno 2] No such file or directory: '/home/vicamo/WorkSpace/mozilla/b2g/qemu-arm/gecko/dom/sms/tests/marionette/test_getmessage.js'
One difference is that with current m-i, the system-message-listener-ready event never fires, hence the:

waiting for system-message-listener-ready...
timed out

All the telephony tests fail for me, possibly as a result.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #15)
> TEST-START test_getmessage.js
> /home/vicamo/WorkSpace/mozilla/b2g/qemu-arm/gecko/dom/sms/tests/marionette/
> test_getmessage.js, runTest (marionette_test.MarionetteJSTestCase) ... ERROR

That's my home dir, I don't remember I gave you an access account ;)
(In reply to Jonathan Griffin (:jgriffin) from comment #16)
> One difference is that with current m-i, the system-message-listener-ready
> event never fires, hence the:
> 
> waiting for system-message-listener-ready...
> timed out
> 
> All the telephony tests fail for me, possibly as a result.

Please forget all the "system-message-listener-ready" thing here. I comment out "this._sysMsgListenerReady" nullity checking in dom/system/gonk/RadioInterfaceLayer.js whenever I want to run Marionette tests.
(In reply to Philipp von Weitershausen [:philikon] from comment #4)
> Ok, I have confirmed that bug 798491 definitely broke Marionette, even with
> the pref off (commit 553fb59b9ca0 works, commit 67cb43bb8865 doesn't).

Commit 553fb59b9ca02 reliably *does not* wfm for running run-emulator.sh.  I reliably see our old friend

E/GeckoConsole(   43): [JavaScript Error: "this.webapps[msg.id] is undefined" {file: "resource://gre/modules/Webapps.jsm" line: 602}]

when trying to start up.

./test.sh mochitest doesn't work at all.  qemu doesn't boot.

./test.sh works just fine, same as m-i tip.

ARGGGH
I see all the same results on ad1d720d82b7.
OK, with ad1d720d82b7 and running the mochitest command "manually" (not through test.sh)

$ cd ~/mozilla/emulator-b2g/objdir-gecko/_tests/testing/mochitest
$ python runtestsb2g.py --emulator arm --b2gpath ~/mozilla/emulator-b2g/ --xre-path ~/mozilla/emulator-b2g/gaia/xulrunner-sdk/bin/ --console-level INFO

I get mochitests to start up.  Checking m-i tip ...
On my machine, against m-i tip:

- mochitests can be made to work with a timing hack
- webapi tests are completely broken, but Marionette unit tests pass; Marionette itself is not broken
- the home screen is never shown when launching the emulator manually

None of these problems occur before this patch, with https://hg.mozilla.org/integration/mozilla-inbound/rev/ad1d720d82b7
FTR I see this in logcat on startup

I/GeckoDump(   43): XXX FIXME : Got a mozContentEvent: system-message-listener-ready
The frustrating thing is that all the tests work just fine on my machine running on inbound 01123067aa58 / gaia 6947eedbf2d368e02c1f4dcea6f1f9d958446bb7 .  I tried running deviceapi tests while cpulimit'd to 75% and 50% but same results.  Trying 25% now but don't expect different results, seems like an IO issue.
Tried with a background clobber build going and same thing :(.
FTR, jgriffin was on 74e25211c14db3ca590b7eb7bb64495bc7af2b62, where lots of bad things happen.  This may be reproducing some other known bugs.
If I comment out all the timeout handling code in marionette-simpletest.js, the "waiting for system-message-listener-ready..." timeout goes away.  This seems to be because

I/Gecko   (   43): [BUG] waitFor: default timeout is 0

The python driver does

        marionette.set_script_timeout(45000)

just before setting up this waitFor().  So one possibility is that that feature broke, and another is that it's been broken for a long time and the check here has happened to succeed on the first time because of different timings.
Oh, we forgot to note above that when jgriffin updated to gaia tip, tests started working again.

But the bad news is that the mochitests are failing on cedar with

23:01:16     INFO -  waiting for system-message-listener-ready...
23:01:16     INFO -  Traceback (most recent call last):
23:01:16     INFO -    File "/home/cltbld/talos-slave/test/build/tests/mochitest/runtestsb2g.py", line 504, in <module>
23:01:16     INFO -      main()
23:01:16     INFO -    File "/home/cltbld/talos-slave/test/build/tests/mochitest/runtestsb2g.py", line 457, in main
23:01:16     INFO -      marionette = Marionette(**kwargs)
23:01:16     INFO -    File "/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/marionette/marionette.py", line 147, in __init__
23:01:16     INFO -      self.emulator.install_gecko(self.gecko_path, self)
23:01:16     INFO -    File "/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/marionette/emulator.py", line 405, in install_gecko
23:01:16     INFO -      self.wait_for_system_message(marionette)
23:01:16     INFO -    File "/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/marionette/emulator.py", line 299, in wait_for_system_message
23:01:16     INFO -      """)
23:01:16     INFO -    File "/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/marionette/marionette.py", line 428, in execute_async_script
23:01:16     INFO -      specialPowers=special_powers)
23:01:16     INFO -    File "/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/marionette/marionette.py", line 187, in _send_message
23:01:16     INFO -      self._handle_error(response)
23:01:16     INFO -    File "/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/marionette/marionette.py", line 222, in _handle_error
23:01:16     INFO -      raise JavascriptException(message=message, status=status, stacktrace=stacktrace)
23:01:16     INFO -  marionette.errors.JavascriptException/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/marionette/errors.py:18: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
23:01:16     INFO -    return str(self.message)
23:01:16     INFO -  : uncaught exception: 2147500033 at:  line: 0
23:01:16    ERROR - Return code: 1

which looks like the bug jgriffin was seeing.  So it looks like we need an emulator update :(.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #28)
> just before setting up this waitFor().  So one possibility is that that
> feature broke, and another is that it's been broken for a long time and the
> check here has happened to succeed on the first time because of different
> timings.

I also get

I/Gecko   (   43): [BUG] waitFor: default timeout is 0

on ad1d720d82b7.  So the check has been luckily passing on the first try.
With the pref flipped back on, the problems seem to stem from

 toolkit/devtools/debugger/server/dbg-server.jsm

around here

-----------
// Load the debugging server in a sandbox with its own compartment.
var systemPrincipal = Cc["@mozilla.org/systemprincipal;1"]
                      .createInstance(Ci.nsIPrincipal);

var gGlobal = Cu.Sandbox(systemPrincipal);
gGlobal.importFunction(loadSubScript);
gGlobal.loadSubScript("chrome://global/content/devtools/dbg-server.js");

this.DebuggerServer = gGlobal.DebuggerServer;
-----------

I tried all the hacks I could think of to make this work, but none of them succeeded.  Needs someone who actually understands Gecko-JS to fix, I fear.
What problem are you seeing in the debugger code? I tried khuey's patch yesterday and the debugger seemed to work just fine.
Thanks for the patch.

That fix wasn't quite right, since it removes the ability to have a default timeout, which breaks several tests.  (The default timeout case was working correctly in content, but not chrome.)  This version fixes that, while still including your other logic.  It passes all tests locally, and I've verified it actually waits for the system-message-listener-ready event correctly.
Attachment #677479 - Flags: review?(jones.chris.g)
Attachment #677296 - Attachment is obsolete: true
Attachment #677296 - Flags: review?(jgriffin)
Attachment #677479 - Flags: review?(jones.chris.g) → review+
Comment on attachment 677479 [details] [diff] [review]
fix timeout argument to waitFor(),

>+      var deadline = now + (timeout || this.timeout);

Actually this (||) code is incorrect now, because if timeout == 0 (expired), then we'll reset it to the default timeout ;).

r=me with that fixed
Review comments addressed.
Attachment #677479 - Attachment is obsolete: true
Attachment #677558 - Flags: review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/3ff61da528fa

This doesn't fix the larger issue of Marionette/remote-debugger breaking when the jsloader.reuseGlobal pref is flipped.
Whiteboard: [leave open]
Alright, after some sleuthing I found the problem to be mozIJSSubScriptLoader::loadSubScript(). Like Cu.import(), it defaults to loading the specified script in the global scope. When all JSMs are imported as functions into a shared global scope, the sub scripts are, too. So there's three things that need to be done in such a case:

(1) import the subscript into the local scope (this)
(2) make sure the subscript doesn't depend on anything outside its own script. If it does, it needs to be accessed as a member of `this`.
(3) anything the subscript wants to export needs to be set as a member of `this`

Incidentally, (3) is already taken care of, so this patch just does (1) and (2). Also, it fixes a EXPORTED_SYMBOL in specialpowers.
Attachment #677578 - Flags: review?(past)
Comment on attachment 677578 [details] [diff] [review]
fix debugger protocol imports

Shouldn't this patch include the pref flip or are you going to do that in a second patch?
(In reply to Chris Jones [:cjones] [:warhammer] from comment #39)
> Shouldn't this patch include the pref flip or are you going to do that in a
> second patch?

Oh yeah we can do that. I was going to do some more testing later to make sure the RIL and our JS/JSM-heavy DOM APIs were ok. I just forgot my SIM card in the other phone at home. :/
Comment on attachment 677578 [details] [diff] [review]
fix debugger protocol imports

Review of attachment 677578 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/devtools/debugger/dbg-transport.js
@@ +4,5 @@
>   * License, v. 2.0. If a copy of the MPL was not distributed with this
>   * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
>  
>  "use strict";
> +Components.utils.import("resource://gre/modules/NetUtil.jsm");

That's not necessary, is it?
Attachment #677578 - Flags: review?(past) → review+
(In reply to Panos Astithas [:past] from comment #42)
> Comment on attachment 677578 [details] [diff] [review]
> fix debugger protocol imports
> 
> Review of attachment 677578 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: toolkit/devtools/debugger/dbg-transport.js
> @@ +4,5 @@
> >   * License, v. 2.0. If a copy of the MPL was not distributed with this
> >   * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> >  
> >  "use strict";
> > +Components.utils.import("resource://gre/modules/NetUtil.jsm");
> 
> That's not necessary, is it?

Or is it your (2) above?
(In reply to Panos Astithas [:past] from comment #43)
> > > +Components.utils.import("resource://gre/modules/NetUtil.jsm");
> > 
> > That's not necessary, is it?
> 
> Or is it your (2) above?

It is indeed.
Attached patch flip prefSplinter Review
Phone seems good with the pref flipped.
Attachment #677946 - Flags: review?(jones.chris.g)
Comment on attachment 677946 [details] [diff] [review]
flip pref

\o/ \o/
Attachment #677946 - Flags: review?(jones.chris.g) → review+
https://hg.mozilla.org/releases/mozilla-aurora/rev/71796f03f34e
https://hg.mozilla.org/releases/mozilla-aurora/rev/e4ace841aca8
https://hg.mozilla.org/releases/mozilla-aurora/rev/e319131e61e3

Note that the second patch didn't apply cleanly to Aurora due to it not having MockPermissionPrompt.jsm. Also, the third patch didn't apply cleanly because the jsloader.reuseGlobal pref wasn't set in b2g.js, so I went ahead and added it.
(In reply to Ryan VanderMeulen from comment #49)
> https://hg.mozilla.org/releases/mozilla-aurora/rev/71796f03f34e
> https://hg.mozilla.org/releases/mozilla-aurora/rev/e4ace841aca8
> https://hg.mozilla.org/releases/mozilla-aurora/rev/e319131e61e3
> 
> Note that the second patch didn't apply cleanly to Aurora due to it not
> having MockPermissionPrompt.jsm. Also, the third patch didn't apply cleanly
> because the jsloader.reuseGlobal pref wasn't set in b2g.js, so I went ahead
> and added it.

<3

Thanks, Ryan!
After turnning jsloader.reuseGlobal off on otoro, b2g seemed to fail to start. Did I miss something? The last revision of gecko/ of my working copy is:

commit 87334268876e5519d2dae572bf12c0ccb60697ac
Author: Eric Chou <echou@mozilla.com>
Date:   Mon Nov 5 15:48:34 2012 -0800
Commit https://hg.mozilla.org/mozilla-central/rev/75f67e186249 results in bug 809717.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 809717
re-flip the value back, now ctypes will be undefined on device.
Vicamo, we don't reopen bugs when then there are new regressions found a long time after they land.  If we tried to do that, some bugs would be reopened hundreds of times ;).  We usually only reopen if the original patches are backed out again, which we shouldn't do for bug 809717.

That said, we should fix bug 809717 asap.
Status: REOPENED → RESOLVED
Closed: 8 years ago8 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.