Closed Bug 810719 Opened 12 years ago Closed 12 years ago

B2G: turn on jsloader.reuseGlobal on Beta

Categories

(Firefox OS Graveyard :: General, defect, P1)

Platform: ARM, Gonk (Firefox OS)

Tracking

(blocking-basecamp:+, firefox18 fixed, firefox19 unaffected)

RESOLVED FIXED
Target Milestone: B2G C2 (20nov-10dec)

People

(Reporter: khuey, Assigned: khuey)

Details

(Whiteboard: [MemShrink:P1])

Attachments

(1 file)

+++ This bug was initially created as a clone of Bug #807478 +++

We still need to do this on Aurora.
Whiteboard: [MemShrink]
Assignee: nobody → khuey
Summary: B2G: turn on jsloader.reuseGlobal on Aurora → B2G: turn on jsloader.reuseGlobal on Aurora and Beta

Well, the good news is we don't need this on Aurora anymore... :-/
Summary: B2G: turn on jsloader.reuseGlobal on Aurora and Beta → B2G: turn on jsloader.reuseGlobal on Beta

This is a MemShrink:P1 right?  It would be an indescribable shame for all this work to die on the vine ...

(In reply to Chris Jones [:cjones] [:warhammer] from comment #2)
> This is a MemShrink:P1 right?  It would be an indescribable shame for all
> this work to die on the vine ...

Yes, absolutely.  It's not triaged because we usually triage bugs only at our bi-weekly meetings.  In cases like this, we hold off triaging not because we expect there to be any disagreement about the priority, but because we use triage as an opportunity to discuss bugs.  If we mark this as P1 right now, this bug won't be on our list of issues to talk about.

I doubt Kyle isn't working on this because it's not marked MemShrink:P1.

As before when we were counting beans over MemShrink priorities, my only goal is to ensure that this bug is prioritized above the other slim-fast work on the MemShrink radar.

Kyle is working on this, and Kyle works faster when he doesn't have to read bugmail about how Kyle is not working on stuff :-P

I didn't mean to imply that Kyle wasn't working on this :).  However, I suspect Kyle also works faster when he doesn't have to read bugmail about how Kyle *is* working on stuff, so let's move on.

This bug has been called out as likely having risk to non-B2G platforms. Given that, marking as P1, and moving into the C2 milestone. We should prioritize this landing to mozilla-beta as soon as possible, to prevent late-breaking regressions to other platforms.
Priority: -- → P1
Target Milestone: --- → B2G C2 (20nov-10dec)

This bug has no risk to non-B2G platforms.  Bugs that this depends on might, but flipping the pref will only affect b2g.

>  Bugs that this depends on might

It's for this reason that I brought this bug to akeybl's attention.
Whiteboard: [MemShrink] → [MemShrink:P1]

Comment on attachment 686178
Patch

r+a=me
Attachment #686178 - Flags: review+
Attachment #686178 - Flags: approval-mozilla-beta+

I tried to land this today but I had to back it out because it causes b2g to crash :-(

tl;dr

 - an app process is SIGBUS'ing around the time we push new files
 - this might be racy and the app process might be seeing inconsistent .so's
 - b2g dies and is relaunched twice (?) after that
 - on the fourth relaunching (fourth time's a charm!) we get a legitimate-looking error

14:41:08     INFO - E/GeckoConsole(  342): [JavaScript Error: "resource://specialpowers/MockPermissionPrompt.jsm - EXPORTED_SYMBOLS is not an array." {file: "chrome://specialpowers/content/specialpowersAPI.js" line: 13}]

 - things seem to proceed normally but then the harness errors out
 - Kyle and I can't seem to reproduce

More precisely, in at least https://tbpl.mozilla.org/php/getParsedLog.php?id=17426792&full=1&branch=mozilla-beta:

14:41:08     INFO -  installing gecko binaries...
...
14:41:08     INFO -  restarting B2G
14:41:08     INFO -  'cp' not found, but 'dd' was found as a replacement
14:41:08     INFO -  Traceback (most recent call last):
...
14:41:08     INFO -    File "/usr/lib/python2.6/telnetlib.py", line 395, in read_very_lazy
14:41:08     INFO -      raise EOFError, 'telnet connection closed'
14:41:08     INFO -  EOFError: telnet connection closed
14:41:08    ERROR - Return code: 1

The telnet connection seems to be interrupted when we push the new gecko onto the emulator.  That's strange.  Continuing on:

...
14:41:08     INFO - E/GeckoConsole(   43): [JavaScript Error: "NS_ERROR_ILLEGAL_VALUE: Component returned failure code: 0x80070057 (NS_ERROR_ILLEGAL_VALUE) [nsIXPCComponents_Utils.import]" {file: "jar:file:///system/b2g/omni.ja!/components/HealthReportService.js" line: 10}]
14:41:08     INFO - I/Gecko   (   43): 1354142124088	Marionette	INFO	MarionetteComponent loaded
14:41:08     INFO - I/Gecko   (   43): 1354142124139	Marionette	INFO	marionette enabled

The restarted b2g process comes up, and marionette seems to be happy.  Then

14:41:08     INFO - F/libc    (  182): Fatal signal 7 (SIGBUS) at 0x41390cf4 (code=2)
...
14:41:08     INFO - I/DEBUG   (   35): pid: 182, tid: 182  >>> /system/b2g/plugin-container <<<
...
14:41:08     INFO - E/GeckoConsole(   43): Content JS INFO at app://system.gaiamobile.org/js/window_manager.js:996 in createFrame: %%%%% Launching FTU as remote (OOP)
14:41:08     INFO - E/GeckoConsole(   43): Content JS INFO at app://system.gaiamobile.org/js/window_manager.js:996 in createFrame: %%%%% Launching Homescreen as remote (OOP)
...
14:41:08     INFO - I/Gecko   (   43): [Parent 43] WARNING: pipe error (105): Connection reset by peer: file /data/jenkins/jobs/b2g-build/workspace/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 431
14:41:08     INFO - I/Gecko   (   43): [Parent 43] WARNING: pipe error (110): Connection reset by peer: file /data/jenkins/jobs/b2g-build/workspace/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 431
14:41:08     INFO - I/Gecko   (   43): [Parent 43] WARNING: pipe error (112): Connection reset by peer: file /data/jenkins/jobs/b2g-build/workspace/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 431
14:41:08     INFO - F/libc    (   43): Fatal signal 11 (SIGSEGV) at 0x00000047 (code=1)

This is a little hard to interpret, but basically what's happening is:
 - the b2g process launches the FTU and homescreen apps
 - one of those two crashes with SIGBUS (confusingly, this shows up out of order in logcat)
 - the other doesn't show a crash, but it appears to die too
 - the main b2g process segfaults
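Picking the crashes out of a long logcat dump by eye is error-prone. A minimal sketch, assuming the harness output format quoted above (timestamp, level, then an Android logcat line); the helper `fatal_signals` is hypothetical and not part of any test harness:

```python
import re
from typing import List, NamedTuple

# Matches logcat lines like:
#   F/libc    (  182): Fatal signal 7 (SIGBUS) at 0x41390cf4 (code=2)
FATAL_RE = re.compile(
    r"F/libc\s*\(\s*(?P<pid>\d+)\):\s*Fatal signal (?P<num>\d+) "
    r"\((?P<name>\w+)\) at (?P<addr>0x[0-9a-fA-F]+)"
)


class FatalSignal(NamedTuple):
    pid: int
    signum: int
    name: str
    addr: int


def fatal_signals(lines: List[str]) -> List[FatalSignal]:
    """Return every fatal signal in the log, in the order it was logged."""
    found = []
    for line in lines:
        m = FATAL_RE.search(line)
        if m:
            found.append(FatalSignal(int(m["pid"]), int(m["num"]),
                                     m["name"], int(m["addr"], 16)))
    return found


log = [
    "14:41:08     INFO - F/libc    (  182): Fatal signal 7 (SIGBUS) at 0x41390cf4 (code=2)",
    "14:41:08     INFO - I/DEBUG   (   35): pid: 182, tid: 182  >>> /system/b2g/plugin-container <<<",
    "14:41:08     INFO - F/libc    (   43): Fatal signal 11 (SIGSEGV) at 0x00000047 (code=1)",
]
for sig in fatal_signals(log):
    print(sig.pid, sig.name, hex(sig.addr))
# prints:
# 182 SIGBUS 0x41390cf4
# 43 SIGSEGV 0x47
```

Pairing pid 182 with the `I/DEBUG` line identifies it as a `plugin-container` (app) process, while pid 43 is the `[Parent 43]` b2g process.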

This is where things are obviously going wrong.

After that,

14:41:08     INFO - E/profiler(  267): Registering start signal
14:41:08     INFO - I/Gecko   (  267): 1354142210634	Marionette	INFO	MarionetteComponent loaded
14:41:08     INFO - I/Gecko   (  267): 1354142210665	Marionette	INFO	marionette enabled

b2g is relaunched by the services manager and marionette comes up again.

14:41:08     INFO - E/profiler(  342): Registering start signal
14:41:08     INFO - I/Gecko   (  342): 1354142229425	Marionette	INFO	MarionetteComponent loaded
14:41:08     INFO - I/Gecko   (  342): 1354142229443	Marionette	INFO	marionette enabled

Hm, so the third b2g dies and a fourth is launched.  That's weird because there are no crashes logged.
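Restarts can be counted the same way: each relaunched parent process logs `marionette enabled` under a fresh pid, so the distinct pids in first-seen order give the launch sequence. A minimal sketch, assuming only the parent b2g process prints that line (as in the excerpts above); `b2g_launches` is a hypothetical helper:

```python
import re
from typing import List

# Matches lines like:
#   I/Gecko   (  267): 1354142210665<TAB>Marionette<TAB>INFO<TAB>marionette enabled
ENABLED_RE = re.compile(r"I/Gecko\s*\(\s*(\d+)\):.*\bmarionette enabled\b")


def b2g_launches(lines: List[str]) -> List[int]:
    """Return pids that logged 'marionette enabled', in first-seen order, without duplicates."""
    pids: List[int] = []
    for line in lines:
        m = ENABLED_RE.search(line)
        if m:
            pid = int(m.group(1))
            if pid not in pids:
                pids.append(pid)
    return pids


log = [
    "14:41:08     INFO - I/Gecko   (   43): 1354142124139\tMarionette\tINFO\tmarionette enabled",
    "14:41:08     INFO - I/Gecko   (  267): 1354142210634\tMarionette\tINFO\tMarionetteComponent loaded",
    "14:41:08     INFO - I/Gecko   (  267): 1354142210665\tMarionette\tINFO\tmarionette enabled",
    "14:41:08     INFO - I/Gecko   (  342): 1354142229443\tMarionette\tINFO\tmarionette enabled",
]
print(b2g_launches(log))  # [43, 267, 342]
```

A pid that disappears with no matching `Fatal signal` line is consistent with a process that was killed rather than one that crashed.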

14:41:08     INFO - E/GeckoConsole(  342): [JavaScript Error: "resource://specialpowers/MockPermissionPrompt.jsm - EXPORTED_SYMBOLS is not an array." {file: "chrome://specialpowers/content/specialpowersAPI.js" line: 13}]

That looks relevant at least!

14:41:08     INFO - E/profiler(  380): Registering start signal
14:41:08     INFO - E/GeckoConsole(  342): Content JS INFO at app://system.gaiamobile.org/js/window_manager.js:996 in createFrame: %%%%% Launching FTU as remote (OOP)
14:41:08     INFO - E/GeckoConsole(  342): Content JS INFO at app://system.gaiamobile.org/js/window_manager.js:996 in createFrame: %%%%% Launching Homescreen as remote (OOP)
14:41:08     INFO - E/profiler(  405): Registering start signal

This time, with the fourth b2g process, FTU and homescreen seem to launch fine.  But next

14:41:08  WARNING - # TBPL WARNING #
14:41:08  WARNING - The mochitest suite: mochitest-1 ran with return status: WARNING

and we're done.

I'm not able to reproduce anything like this locally with ./test.sh, although |./test.sh mochitest| is broken.  I can't start up the emulator with ./run-emulator.sh.  I don't think Kyle has been able to get anything running.

jgriffin, how can we most closely duplicate the tbpl environment locally?
Flags: needinfo?(jgriffin)

Try push to rule out ... some class of weirdness, but I'm not holding my breath for anything meaningful:

https://tbpl.mozilla.org/?tree=Try&rev=4eea13894e1d

I may be misinterpreting these log statements:

14:41:08     INFO -  waiting for system-message-listener-ready...
14:41:08     INFO -  done
14:41:08     INFO -  installing gecko binaries...
14:41:08     INFO -  pushing /system/b2g/crashreporter-override.ini (attempt 1 of 10)
...
14:41:08     INFO -  pushing /system/b2g/components/binary.manifest (attempt 1 of 10)
14:41:08     INFO -  restarting B2G

but they make it appear that we're pushing the updated build while the previous b2g service is still running.  If so, that would explain two of the weird things in comment 14:
 - SIGBUS on a build that only has a pref change
 - the b2g process dying and restarting without error messages (this happens on SIGKILL, which is what |adb shell stop b2g| delivers)

To (mostly) fix this problem, we need to stop b2g before pushing new code to the emulator.
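The fix amounts to an ordering constraint: stop b2g, push, then start it again. A minimal sketch, assuming the device exposes b2g as an init service named `b2g` (so `adb shell stop b2g` and `adb shell start b2g` work); `push_gecko_commands` and `run` are hypothetical helpers, not the real harness code, and the local path below is illustrative only:

```python
import subprocess
from typing import List, Sequence, Tuple


def push_gecko_commands(files: Sequence[Tuple[str, str]]) -> List[List[str]]:
    """Build adb commands that push new gecko files while b2g is stopped.

    `files` is a sequence of (local_path, device_path) pairs.
    """
    cmds = [["adb", "shell", "stop", "b2g"]]  # stop b2g before touching its files
    for local, remote in files:
        cmds.append(["adb", "push", local, remote])
    cmds.append(["adb", "shell", "start", "b2g"])  # relaunch on the new binaries
    return cmds


def run(cmds: List[List[str]]) -> None:
    """Run each command in order, raising on the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical local path; the device path is one from the log above.
    files = [("objdir/dist/b2g/components/binary.manifest",
              "/system/b2g/components/binary.manifest")]
    for cmd in push_gecko_commands(files):
        print(" ".join(cmd))
```

As the comment says, this only mostly fixes the problem: it removes the window where a running process maps files that are being replaced, but nothing else about the harness changes.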

I also forgot to mention above, I don't see any issues when running with this pref change on an otoro.

Sigh.  I was testing with a .userconfig pointing at mozilla-inbound :/.  Here we go again ...

Another interpretation of these logs is:

[some weird and possibly bad, but irrelevant, stuff happens on startup]
...
14:41:08     INFO - E/GeckoConsole(  342): [JavaScript Error: "resource://specialpowers/MockPermissionPrompt.jsm - EXPORTED_SYMBOLS is not an array." {file: "chrome://specialpowers/content/specialpowersAPI.js" line: 13}]
...
[harness doesn't start up properly and we fail the tests]

Occam's Razor favors this.  Build almost done, will see.

I can reproduce locally.  The simpler explanation seems to be correct.

Bug 807478 seems not to have been uplifted hard enough.
https://hg.mozilla.org/releases/mozilla-beta/rev/6e6c2d28ffdf
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED

jgriffin, leaving needinfo? outstanding.  Last time we diagnosed these test failures, we discussed the right incantations to run the suites like CI does, but I totally forgot them :).  We should document that somewhere.
Flags: needinfo?(jgriffin)