Bug 810719 - B2G: turn on jsloader.reuseGlobal on Beta
Status: RESOLVED FIXED
Whiteboard: [MemShrink:P1]
Product: Firefox OS
Classification: Client Software
Component: General
Version: unspecified
Hardware: ARM Gonk (Firefox OS)
Importance: P1 normal
Target Milestone: B2G C2 (20nov-10dec)
Assigned To: Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary)
Depends on: 798491 807478 807845 809717 811784 813762 813766 814102 814104
Blocks: slim-fast 807104 807698
 
Reported: 2012-11-11 15:43 PST by Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary)
Modified: 2013-02-26 16:45 PST


Attachments
Patch (888 bytes, patch)
2012-11-28 10:49 PST, Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary)
justin.lebar+bug: review+
justin.lebar+bug: approval‑mozilla‑beta+

Description Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2012-11-11 15:43:13 PST
+++ This bug was initially created as a clone of Bug #807478 +++

We still need to do this on Aurora
Comment 1 Justin Lebar (not reading bugmail) 2012-11-20 09:02:34 PST
Well, the good news is we don't need this on Aurora anymore... :-/
Comment 2 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-11-20 12:19:47 PST
This is a MemShrink:P1 right?  It would be an indescribable shame for all this work to die on the vine ...
Comment 3 Justin Lebar (not reading bugmail) 2012-11-20 12:29:51 PST
(In reply to Chris Jones [:cjones] [:warhammer] from comment #2)
> This is a MemShrink:P1 right?  It would be an indescribable shame for all
> this work to die on the vine ...

Yes, absolutely.  It's not triaged because we usually triage bugs only at our bi-weekly meetings.  In cases like this, we hold off triaging not because we expect there to be any disagreement about the priority, but because we use triage as an opportunity to discuss bugs.  If we mark this as P1 right now, this bug won't be on our list of issues to talk about.

I doubt the missing MemShrink:P1 marking is what would keep Kyle from working on this.
Comment 4 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-11-20 12:44:48 PST
As before when we were counting beans over memshrink priorities, my only goal is to ensure that this bug is prioritized above the other slim-fast work on the memshrink radar.
Comment 5 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2012-11-20 13:07:39 PST
Kyle is working on this, and Kyle works faster when he doesn't have to read bugmail about how Kyle is not working on stuff :-P
Comment 6 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-11-20 13:44:34 PST
I didn't mean to imply that Kyle wasn't working on this :).  However, I suspect Kyle also works faster when he doesn't have to read bugmail about how Kyle *is* working on stuff, so let's move on.
Comment 7 Alex Keybl [:akeybl] 2012-11-20 13:53:58 PST
This bug has been called out as likely having risk to non-B2G platforms. Given that, marking as P1, and moving into the C2 milestone. We should prioritize this landing to mozilla-beta as soon as possible, to prevent late-breaking regressions to other platforms.
Comment 8 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2012-11-20 14:07:12 PST
This bug has no risk to non-B2G platforms.  Bugs that this depends on might, but flipping the pref will only affect b2g.
Comment 9 Justin Lebar (not reading bugmail) 2012-11-20 14:09:07 PST
>  Bugs that this depends on might

It's for this reason that I brought this bug to akeybl's attention.
Comment 10 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-11-27 10:01:23 PST
\o/
Comment 11 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2012-11-28 10:49:47 PST
Created attachment 686178
Patch
Comment 12 Justin Lebar (not reading bugmail) 2012-11-28 10:51:06 PST
Comment on attachment 686178
Patch

r+a=me
Comment 13 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2012-11-28 17:33:03 PST
I tried to land this today but I had to back it out because it causes b2g to crash :-(
Comment 14 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-11-28 21:26:58 PST
tl;dr

 - an app process is SIGBUS'ing around the time we push new files
 - this might be racy and the app process might be seeing inconsistent .so's
 - b2g dies and is relaunched twice (?) after that
 - on the fourth launch (fourth time's a charm!) we get a legitimate-looking error

14:41:08     INFO - E/GeckoConsole(  342): [JavaScript Error: "resource://specialpowers/MockPermissionPrompt.jsm - EXPORTED_SYMBOLS is not an array." {file: "chrome://specialpowers/content/specialpowersAPI.js" line: 13}]

 - things seem to proceed normally but then the harness errors out
 - Kyle and I can't seem to reproduce

More precisely, in https://tbpl.mozilla.org/php/getParsedLog.php?id=17426792&full=1&branch=mozilla-beta at least

14:41:08     INFO -  installing gecko binaries...
...
14:41:08     INFO -  restarting B2G
14:41:08     INFO -  'cp' not found, but 'dd' was found as a replacement
14:41:08     INFO -  Traceback (most recent call last):
...
14:41:08     INFO -    File "/usr/lib/python2.6/telnetlib.py", line 395, in read_very_lazy
14:41:08     INFO -      raise EOFError, 'telnet connection closed'
14:41:08     INFO -  EOFError: telnet connection closed
14:41:08    ERROR - Return code: 1

The telnet connection seems to be interrupted when we push the new gecko onto the emulator.  That's strange.  Continuing on

...
14:41:08     INFO - E/GeckoConsole(   43): [JavaScript Error: "NS_ERROR_ILLEGAL_VALUE: Component returned failure code: 0x80070057 (NS_ERROR_ILLEGAL_VALUE) [nsIXPCComponents_Utils.import]" {file: "jar:file:///system/b2g/omni.ja!/components/HealthReportService.js" line: 10}]
14:41:08     INFO - I/Gecko   (   43): 1354142124088	Marionette	INFO	MarionetteComponent loaded
14:41:08     INFO - I/Gecko   (   43): 1354142124139	Marionette	INFO	marionette enabled

The restarted b2g process comes up, and marionette seems to be happy.  Then

14:41:08     INFO - F/libc    (  182): Fatal signal 7 (SIGBUS) at 0x41390cf4 (code=2)
...
14:41:08     INFO - I/DEBUG   (   35): pid: 182, tid: 182  >>> /system/b2g/plugin-container <<<
...
14:41:08     INFO - E/GeckoConsole(   43): Content JS INFO at app://system.gaiamobile.org/js/window_manager.js:996 in createFrame: %%%%% Launching FTU as remote (OOP)
14:41:08     INFO - E/GeckoConsole(   43): Content JS INFO at app://system.gaiamobile.org/js/window_manager.js:996 in createFrame: %%%%% Launching Homescreen as remote (OOP)
...
14:41:08     INFO - I/Gecko   (   43): [Parent 43] WARNING: pipe error (105): Connection reset by peer: file /data/jenkins/jobs/b2g-build/workspace/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 431
14:41:08     INFO - I/Gecko   (   43): [Parent 43] WARNING: pipe error (110): Connection reset by peer: file /data/jenkins/jobs/b2g-build/workspace/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 431
14:41:08     INFO - I/Gecko   (   43): [Parent 43] WARNING: pipe error (112): Connection reset by peer: file /data/jenkins/jobs/b2g-build/workspace/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 431
14:41:08     INFO - F/libc    (   43): Fatal signal 11 (SIGSEGV) at 0x00000047 (code=1)

This is a little hard to interpret, but basically what's happening is
 - b2g process launches FTU and homescreen apps
 - one of those two crashes with SIGBUS (this is backwards in logcat, confusingly)
 - the other doesn't show a crash, but it appears to die too
 - the main b2g process segfaults

This is where things are obviously going wrong.
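To make the crash order easier to read than it is in raw logcat, here is a minimal Python sketch that pulls the fatal-signal lines out of a log. The regular expression is an assumption based only on the line format quoted in this comment, and `fatal_signals` is a hypothetical helper name, not part of any harness.

```python
import re

# Matches logcat lines of the form quoted above, e.g.
#   F/libc    (  182): Fatal signal 7 (SIGBUS) at 0x41390cf4 (code=2)
# The pattern is inferred from these excerpts only (an assumption).
FATAL_RE = re.compile(
    r"F/libc\s*\(\s*(?P<pid>\d+)\):\s*Fatal signal (?P<num>\d+) "
    r"\((?P<name>SIG[A-Z0-9]+)\) at (?P<addr>0x[0-9a-fA-F]+)"
)


def fatal_signals(lines):
    """Return (pid, signal name, fault address) per fatal-signal line, in log order."""
    found = []
    for line in lines:
        match = FATAL_RE.search(line)
        if match:
            found.append(
                (int(match.group("pid")), match.group("name"), match.group("addr"))
            )
    return found


# Lines copied from the log excerpt above.
sample = [
    "14:41:08     INFO - F/libc    (  182): Fatal signal 7 (SIGBUS) at 0x41390cf4 (code=2)",
    "14:41:08     INFO - I/DEBUG   (   35): pid: 182, tid: 182  >>> /system/b2g/plugin-container <<<",
    "14:41:08     INFO - F/libc    (   43): Fatal signal 11 (SIGSEGV) at 0x00000047 (code=1)",
]

for pid, name, addr in fatal_signals(sample):
    print(pid, name, addr)
```

On the excerpt, this reports pid 182 (the plugin-container child) with SIGBUS and pid 43 (the parent b2g process) with SIGSEGV.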

After that,

14:41:08     INFO - E/profiler(  267): Registering start signal
14:41:08     INFO - I/Gecko   (  267): 1354142210634	Marionette	INFO	MarionetteComponent loaded
14:41:08     INFO - I/Gecko   (  267): 1354142210665	Marionette	INFO	marionette enabled

b2g is relaunched by the services manager and marionette comes up again.

14:41:08     INFO - E/profiler(  342): Registering start signal
14:41:08     INFO - I/Gecko   (  342): 1354142229425	Marionette	INFO	MarionetteComponent loaded
14:41:08     INFO - I/Gecko   (  342): 1354142229443	Marionette	INFO	marionette enabled

Hm, so the third b2g dies and a fourth is launched.  That's weird because there are no crashes logged.
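The relaunch count can be cross-checked mechanically. The sketch below assumes, as the excerpts suggest but do not guarantee, that each b2g parent process logs "marionette enabled" once at startup; `b2g_pids` is a hypothetical helper name.

```python
import re

# Matches I/Gecko lines ending in "marionette enabled" and captures the pid.
# Assumption: one such line per b2g parent process start, as in the excerpts.
MARIONETTE_RE = re.compile(r"I/Gecko\s*\(\s*(\d+)\):.*\bmarionette enabled\b")


def b2g_pids(lines):
    """Return the distinct pids that logged 'marionette enabled', in first-seen order."""
    pids = []
    for line in lines:
        match = MARIONETTE_RE.search(line)
        if match:
            pid = int(match.group(1))
            if pid not in pids:
                pids.append(pid)
    return pids


# Lines copied from the log excerpts above (fields are tab-separated).
sample = [
    "14:41:08     INFO - I/Gecko   (   43): 1354142124088\tMarionette\tINFO\tMarionetteComponent loaded",
    "14:41:08     INFO - I/Gecko   (   43): 1354142124139\tMarionette\tINFO\tmarionette enabled",
    "14:41:08     INFO - I/Gecko   (  267): 1354142210665\tMarionette\tINFO\tmarionette enabled",
    "14:41:08     INFO - I/Gecko   (  342): 1354142229443\tMarionette\tINFO\tmarionette enabled",
]

print(b2g_pids(sample))
```

The excerpts yield pids 43, 267 and 342. Together with the instance that was running before the new build was pushed, that makes four b2g processes.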

14:41:08     INFO - E/GeckoConsole(  342): [JavaScript Error: "resource://specialpowers/MockPermissionPrompt.jsm - EXPORTED_SYMBOLS is not an array." {file: "chrome://specialpowers/content/specialpowersAPI.js" line: 13}]

That looks relevant at least!
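For picking lines like this out of a long log, a minimal Python sketch follows. It extracts the message, file and line number from GeckoConsole "JavaScript Error" entries. The pattern is inferred from the two error lines quoted in this comment and may not cover other console formats; `js_errors` and `export_errors` are hypothetical helper names.

```python
import re

# Pattern inferred from the quoted lines (an assumption):
#   [JavaScript Error: "<message>" {file: "<file>" line: <n>}]
JS_ERROR_RE = re.compile(
    r'\[JavaScript Error: "(?P<message>.*)" '
    r'\{file: "(?P<file>[^"]*)" line: (?P<line>\d+)\}\]'
)


def js_errors(lines):
    """Return (message, file, line number) for each JavaScript Error entry."""
    errors = []
    for line in lines:
        match = JS_ERROR_RE.search(line)
        if match:
            errors.append(
                (match.group("message"), match.group("file"), int(match.group("line")))
            )
    return errors


def export_errors(lines):
    """Keep only the errors whose message mentions EXPORTED_SYMBOLS."""
    return [err for err in js_errors(lines) if "EXPORTED_SYMBOLS" in err[0]]


# Lines copied from the log excerpts above.
sample = [
    '14:41:08     INFO - E/GeckoConsole(   43): [JavaScript Error: "NS_ERROR_ILLEGAL_VALUE: Component returned failure code: 0x80070057 (NS_ERROR_ILLEGAL_VALUE) [nsIXPCComponents_Utils.import]" {file: "jar:file:///system/b2g/omni.ja!/components/HealthReportService.js" line: 10}]',
    '14:41:08     INFO - E/GeckoConsole(  342): [JavaScript Error: "resource://specialpowers/MockPermissionPrompt.jsm - EXPORTED_SYMBOLS is not an array." {file: "chrome://specialpowers/content/specialpowersAPI.js" line: 13}]',
]

for message, source, line_no in export_errors(sample):
    print(f"{source}:{line_no}: {message}")
```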

14:41:08     INFO - E/profiler(  380): Registering start signal
14:41:08     INFO - E/GeckoConsole(  342): Content JS INFO at app://system.gaiamobile.org/js/window_manager.js:996 in createFrame: %%%%% Launching FTU as remote (OOP)
14:41:08     INFO - E/GeckoConsole(  342): Content JS INFO at app://system.gaiamobile.org/js/window_manager.js:996 in createFrame: %%%%% Launching Homescreen as remote (OOP)
14:41:08     INFO - E/profiler(  405): Registering start signal

This time, with the fourth b2g process, FTU and homescreen seem to launch fine.  But next

14:41:08  WARNING - # TBPL WARNING #
14:41:08  WARNING - The mochitest suite: mochitest-1 ran with return status: WARNING

and we're done.

I'm not able to reproduce anything like this locally with ./test.sh, although |./test.sh mochitest| is broken.  I can't start up the emulator with ./run-emulator.sh.  I don't think Kyle has been able to get anything running.

jgriffin, how can we most closely duplicate the tbpl environment locally?
Comment 15 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-11-28 21:30:46 PST
Try push to rule out ... some class of weirdness, but I'm not holding my breath for anything meaningful

https://tbpl.mozilla.org/?tree=Try&rev=4eea13894e1d
Comment 16 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-11-28 21:35:37 PST
I see basically the same thing looking through a failing reftest log,

https://tbpl.mozilla.org/php/getParsedLog.php?id=17425077&full=1&branch=mozilla-beta#error0
Comment 17 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-11-28 21:39:14 PST
I may be misinterpreting these log statements

14:41:08     INFO -  waiting for system-message-listener-ready...
14:41:08     INFO -  done
14:41:08     INFO -  installing gecko binaries...
14:41:08     INFO -  pushing /system/b2g/crashreporter-override.ini (attempt 1 of 10)
...
14:41:08     INFO -  pushing /system/b2g/components/binary.manifest (attempt 1 of 10)
14:41:08     INFO -  restarting B2G

but they make it appear like we're pushing the updated build while the previous b2g service is running.  If so, that would explain two of the weird things in comment 14
 - SIGBUS on a build that only has a pref change
 - b2g process dying and restarting without error messages (this happens on SIGKILL, which is what |adb shell stop b2g| delivers)

To (mostly) fix this problem, we need to stop b2g before pushing new code to the emulator.
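A minimal sketch of that ordering, not the actual harness code: build the adb command sequence so that b2g is stopped before any file is pushed and started again only after the last push. It assumes the `adb shell stop b2g` / `adb shell start b2g` form of the commands; `deploy_commands`, `deploy` and the local paths in the example are hypothetical.

```python
import subprocess


def deploy_commands(files):
    """Build the adb commands: stop b2g, push each (local, remote) pair, start b2g."""
    commands = [["adb", "shell", "stop", "b2g"]]
    for local_path, remote_path in files:
        commands.append(["adb", "push", local_path, remote_path])
    commands.append(["adb", "shell", "start", "b2g"])
    return commands


def deploy(files, run=subprocess.check_call):
    """Run the commands in order; check_call raises on the first failing step."""
    for command in deploy_commands(files):
        run(command)


# Remote paths are taken from the log excerpt; local paths are placeholders.
example_files = [
    ("dist/b2g/crashreporter-override.ini", "/system/b2g/crashreporter-override.ini"),
    ("dist/b2g/components/binary.manifest", "/system/b2g/components/binary.manifest"),
]

for command in deploy_commands(example_files):
    print(" ".join(command))
```

Passing `run` as a parameter keeps the ordering testable without a device attached. With the default, a failed push aborts before b2g is restarted on a half-updated build.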

I also forgot to mention above, I don't see any issues when running with this pref change on an otoro.
Comment 18 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-11-28 21:51:30 PST
Sigh.  I was testing with a .userconfig pointing at mozilla-inbound :/.  Here we go again ...
Comment 19 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-11-28 22:15:55 PST
Another interpretation of these logs is

[some weird and possibly bad, but irrelevant, stuff happens on startup]
...
14:41:08     INFO - E/GeckoConsole(  342): [JavaScript Error: "resource://specialpowers/MockPermissionPrompt.jsm - EXPORTED_SYMBOLS is not an array." {file: "chrome://specialpowers/content/specialpowersAPI.js" line: 13}]
...
[harness doesn't start up properly and we fail the tests]

Occam's Razor favors this.  Build almost done, will see.
Comment 20 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-11-28 22:20:00 PST
I can reproduce locally.  Simpler explanation seems to be correct.
Comment 21 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-11-28 22:23:46 PST
Bug 807478 seems not to have been uplifted hard enough.
Comment 22 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-11-28 22:36:33 PST
https://hg.mozilla.org/releases/mozilla-beta/rev/6e6c2d28ffdf
Comment 23 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-11-28 22:39:14 PST
jgriffin, leaving needinfo? outstanding.  Last time we diagnosed these test failures, we discussed the right incantations to run the suites like CI does, but I totally forgot them :).  We should document that somewhere.
