Closed Bug 825840 Opened 7 years ago Closed 7 years ago

[build] Mac Gecko build and flash are broken

Categories

(Firefox OS Graveyard :: GonkIntegration, defect, P1)

x86_64
macOS
defect

Tracking

(blocking-basecamp:+, firefox19 wontfix, firefox20 fixed, b2g18 fixed)

RESOLVED FIXED
B2G C4 (2jan on)
blocking-basecamp +
Tracking Status
firefox19 --- wontfix
firefox20 --- fixed
b2g18 --- fixed

People

(Reporter: dscravaglieri, Assigned: gwagner)

Attachments

(2 files, 1 obsolete file)

STR:

0- On a Mac
1- ./repo sync
2- ./build.sh
3- ./flash.sh

Expected:
Unagi starts with fresh build

Actual:
The Unagi starts, but the screen stays black.
Assignee: nobody → doug.turner
blocking-basecamp: --- → ?
From Anthony Ricaud on the mailing list:

It seems the root error is this:
$ "/usr/bin/ld" -demangle -dynamic -arch x86_64 -macosx_version_min 10.8.0
-syslibroot /Users/rik24d/code/b2g-unagi/out/target/product/unagi/obj/
-o a.out -L/Users/rik24d/code/b2g-unagi/out/target/product/unagi/obj/lib
-rpath-link=/Users/rik24d/code/b2g-unagi/out/target/product/unagi/obj/lib
-lSystem /usr/bin/../lib/clang/4.0/lib/darwin/libclang_rt.osx.a
ld: unknown option: -rpath-link=/Users/rik24d/code/b2g-unagi/out/target/product/unagi/obj/lib

Does ld on the Mac not have the rpath-link option?
Flags: needinfo?(dscravaglieri)
Andrew: Looks like it doesn't.

Here's the output of `ld -v`:
@(#)PROGRAM:ld  PROJECT:ld64-133.3
configured to support archs: armv6 armv7 i386 x86_64
LTO support using: LLVM version 3.1svn, from Apple Clang 4.0 (build 421.0.60)

I'm on 10.8.2.
Flags: needinfo?(dscravaglieri)
Very important to fix, but not a blocker.

Adding some who may have an idea where to look.
blocking-basecamp: ? → -
I have the same ld. Not sure what to add, but yes, ld64 has -rpath but no -rpath-link. From the manual, it looks like -rpath only changes the runtime behaviour, not the link-time behaviour as it does with ELF.
I am not working on this.
Assignee: doug.turner → nobody
I'd like to renominate this as this seriously affects our development process. Not being able to build affects my productivity.
blocking-basecamp: - → ?
I second Anthony's comment. If this is not a blocking bug, then no one will fix it. And then all Mac-based developers will be screwed.

Bug 825698, for example, is a blocking+ bug assigned to me. I can't work on it because versions of gecko that I build will not boot.  It probably isn't right to say that this bug blocks 825698, but it kind of does.  I am currently the best person to work on 825698, but I can't.
These are the symptoms I see with this bug.  My build process looks like this:

  ./build.sh gecko
  ./flash.sh gecko

Both of these steps run without error. But then the phone won't boot. It shows the FirefoxOS splash screen, but then goes black.

Building m-c from the tip yesterday, the errors in logcat were in Webapps.jsm at line 2290, I think. (It was something about nsIProperties.get, but the code at line 2290 is completely unrelated to that).

Sometimes I would also see repeated errors in OfflineCacheInstaller.jsm at line 68 where it calls outputStream.close(). The error message seems to indicate that output streams don't have close methods anymore, which seems sort of absurd. I've seen other bugs that mention recent necko changes... Could that be affecting anything here?

I also tried building b2g18 from tip last night. I got the very same Webapps.jsm error, but didn't see any errors from OfflineCacheInstaller.jsm.

Then, I reverted b2g18 to right before the most recent (just a few days ago) change to Webapps.jsm.  But I still saw the error in that file, only at a slightly different line number because of the change to the file.

Maybe someone could bisect this?
As a start at the bisection, I'm trying hg checkout -d 2012-12-28 and will see if that builds and boots.
I should add that I'm on MacOS 10.6.8 and am not noticing any link errors when I build. I've got clang 3.1 installed, but I assume I'm using whatever stock version of ld is on the system.
When I've got a working build that someone else built and I see a Webapps.jsm error in the logcat, it looks like this:

E/GeckoConsole(  108): [JavaScript Error: "NS_ERROR_FAILURE: Denied" {file: "jar:file:///system/b2g/omni.ja!/components/Webapps.js" line: 701}]

But when I build myself, the Webapps.jsm error looks like this:

E/GeckoConsole(  108): [JavaScript Error: "NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIProperties.get]" {file: "resource://gre/modules/Webapps.jsm" line: 2290}]

Notice the difference in the file URLs.  Completely different URL schemes.  I have no idea what it means, but it seems like it could be a hint to someone who understands it...
Gecko fails to boot even with my checkout from 12-28.  Here's the OfflineCacheInstaller error I see.

E/GeckoConsole(  553): [JavaScript Error: "[Exception... "Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsIOutputStream.close]"  nsresult: "0x80040111 (NS_ERROR_NOT_AVAILABLE)"  location: "JS frame :: resource://gre/modules/OfflineCacheInstaller.jsm :: storeCache/<.onCacheEntryAvailable :: line 68"  data: no]" {file: "resource://gre/modules/OfflineCacheInstaller.jsm" line: 68}]

That is repeated about 5 times, then the Webapps.jsm error from above appears.

This OfflineCacheInstaller.jsm error only seems to appear right after the ./flash.sh gecko. It does not appear in the logcat for subsequent reboots.

The screen is black. The soft buttons are illuminated. None of the hardware buttons do anything. When I reboot, I see the TurkCell screen, then the FirefoxOS screen, then blank with the soft buttons off, then blank with the soft buttons on.  

I'll attach the complete logcat of the reboot sequence.
I've attached the logcat output of a reboot up until the time the screen goes blank and the soft buttons illuminate.  It shows the Webapps.jsm error, but does not show the OfflineCacheInstaller errors that occur on the restart after flashing.
Doing hg checkout -d 2012-12-26 on my mozilla-central tree and rebuilding doesn't fix the problem either.  That's pretty weird, since I'm almost positive I was building gecko successfully in that time frame.
Duplicate of this bug: 826445
See bug 826445 comment 5.
Assignee: nobody → philipp
Attached patch fix that doesn't (obsolete)
Attachment #697654 - Flags: review?(21)
JP suggested that Brad may be able to take a look.
Comment on attachment 697654 [details] [diff] [review]
fix that doesn't

Ok, never mind, this doesn't work after all. Weird, I'm pretty sure it was working just a minute ago.

I bet we're now running into another weird import failure that simply isn't being logged.
Attachment #697654 - Attachment description: fix that somehow works but I don't know why → fix that doesn't
Attachment #697654 - Flags: review?(21)
(In reply to David Flanagan [:djf] from comment #11)

> Notice the difference in the file URLs.  Completely different URL schemes.  I
> have no idea what it means, but it seems like it could be a hint to someone
> who understands it...

I'd say that's because one is .jsm and the other is .js :) They are not the same files! So you're not looking at the same error at all in both cases.

Also, the line number is wrong because these files are preprocessed; you have to look around the reported line number to find the guilty line.
As all Mac users are impacted, this is now considered as a blocker.
blocking-basecamp: ? → +
Priority: -- → P1
> Also, the line number is wrong because these files are preprocessed, you have to look around the 
> said line number to find the guilty line.

One can just look at the file in the objdir.  Anyway, we got the line in bug 826445.
(In reply to Julien Wajsberg [:julienw] from comment #21)

> 
> I'd say that's because one is .jsm and the other is .js :) not the same
> files ! so you're not looking at the same error at all in both cases.

That's embarrassing!  Sorry for the noise!
Philipp, any news here?
I am currently bisecting. The last good build was around 12/29.
I'll find an owner for this to offload :philikon.
Assignee: philipp → milan
You can even use the releases/b2g18 repository; this might be easier.
Hub, the code change doesn't seem like something that should have caused the build to break, but are you around to take a look?
Flags: needinfo?(hub)
(In reply to Milan Sreckovic [:milan] from comment #30)
> Hub, the code change doesn't seem like something that should have caused the
> build to break, but are you around to take a look?

I don't have a Mac build environment for b2g. I build on Fedora 17 as always. I don't see why my change would wreak havoc like that.
Flags: needinfo?(hub)
We have some problems importing resource://gre/modules/CrashSubmit.jsm and throw an exception here:
https://hg.mozilla.org/mozilla-central/file/38407b98003b/b2g/chrome/content/shell.js#l84
(In reply to Gregor Wagner [:gwagner] from comment #32)
> We have some problems importing resource://gre/modules/CrashSubmit.jsm and
> throw an exception here:
> https://hg.mozilla.org/mozilla-central/file/38407b98003b/b2g/chrome/content/
> shell.js#l84

Yep, that's consistent with my findings:

From bug 826445 comment #5:
> Some digging revealed that the Cu.import() line for Webapps.jsm fails with
> NS_ERROR_FILE_NOT_FOUND. I checked that the file exists in the omni.ja and
> is valid JavaScript. Of course it must somehow be importable because later
> in logcat we see the line that Justin quoted above: an exception that's
> thrown *from* Webapps.jsm. NS_ERROR_FILE_NOT_FOUND is also thrown if there
> are circular imports between JSMs. But I couldn't find any.

So somehow we're failing to Cu.import() things at seemingly random times.
blocking-basecamp: + → ?
Priority: P1 → --
(In reply to Philipp von Weitershausen [:philikon] from comment #33)
> So somehow we're failing to Cu.import() things at seemingly random times.

Err, never mind, just saw comment 29.
Attached patch patch
Assignee: milan → anygregor
This got mistakenly non-plused.
blocking-basecamp: ? → +
Priority: -- → P1
Comment on attachment 698083 [details] [diff] [review]
patch

Review of attachment 698083 [details] [diff] [review]:
-----------------------------------------------------------------

Ship it! We never include CrashSubmit.jsm when not building the crash reporter [1], so this seems like the right fix to me.

[1] https://mxr.mozilla.org/mozilla-central/source/toolkit/Makefile.in#58

::: b2g/chrome/content/shell.js
@@ +85,5 @@
>      Cu.import("resource://gre/modules/CrashSubmit.jsm", this);
>      return this.CrashSubmit;
> +#else
> +    return null;
> +#endif

Micro-nit: move the `delete this.CrashSubmit;` out of the #ifdef and do `return this.CrashSubmit = null;` here, avoiding the getter later.
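The reviewer's micro-nit can be illustrated with a minimal, self-contained sketch. It assumes a plain JavaScript object in place of the `shell` object in shell.js; `makeShell`, `crashReporterBuilt` (standing in for the build-time `#ifdef` in the diff) and `importCrashSubmit` (standing in for the `Cu.import()` call) are hypothetical names, not code from the patch. It shows that deleting the lazy getter in both branches and caching `null` as a plain data property means later reads of `CrashSubmit` no longer run the getter.

```javascript
// Hypothetical sketch: crashReporterBuilt stands in for the build-time #ifdef
// that decides whether CrashSubmit.jsm is part of the build.
function makeShell(crashReporterBuilt) {
  let getterCalls = 0;

  // Hypothetical stand-in for
  // Cu.import("resource://gre/modules/CrashSubmit.jsm", scope).
  function importCrashSubmit(scope) {
    scope.CrashSubmit = { submit: (id) => `submitted ${id}` };
  }

  return {
    get CrashSubmit() {
      getterCalls++;
      // Drop the lazy getter first, in both branches, so the next
      // assignment creates a plain data property on this object.
      delete this.CrashSubmit;
      if (crashReporterBuilt) {
        importCrashSubmit(this);
        return this.CrashSubmit;
      }
      // No crash reporter: cache null so the getter never runs again.
      return (this.CrashSubmit = null);
    },
    callCount() {
      return getterCalls;
    },
  };
}

const withoutReporter = makeShell(false);
console.log(withoutReporter.CrashSubmit); // null
console.log(withoutReporter.CrashSubmit); // null, read from the cached data property
console.log(withoutReporter.callCount()); // 1
```

When the crash reporter is built, the same getter caches the imported module instead, so in either configuration it runs at most once.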
Attachment #698083 - Flags: review+
Attachment #697654 - Attachment is obsolete: true
Blocks: 821498
Re-reading this, here is my conclusion: I doubt it was Mac-specific. It was indeed caused by a bug in the original implementation of the crash reporter, which didn't load the crash submit module conditionally, but it was triggered by my change, which causes the crash submit module to be loaded even if no crash dump has been created.

As usual, I forgot that I always force the inclusion of the crash reporter in my local build, and how could I forget, since every time I do a |./repo sync| I have to reapply the patch.

Sorry about that. Nice work!
hub, how do you explain that it worked on Linux?
(In reply to Julien Wajsberg [:julienw] from comment #40)
> hub, how do you explain that it worked on Linux ?

On Linux in general I don't know, but on MY Linux, as I said, I build with the crash reporter opted in (a patch in gonk-misc is needed for that). Also, the official builds probably don't have this issue either, since they are built with the crash reporter.
But I'm not using this patch to build and I didn't have this problem on my Linux...
https://hg.mozilla.org/mozilla-central/rev/2428a69911d7
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
I'm seeing E/GeckoConsole(  108): [JavaScript Error: "NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIProperties.get]" {file: "resource://gre/modules/Webapps.jsm" line: 2290}]

When I build from b2g-18 tonight.  Can we get the patch uplifted to that tree, too?
(In reply to David Flanagan [:djf] from comment #44)
> I'm seeing E/GeckoConsole(  108): [JavaScript Error: "NS_ERROR_FAILURE:
> Component returned failure code: 0x80004005 (NS_ERROR_FAILURE)
> [nsIProperties.get]" {file: "resource://gre/modules/Webapps.jsm" line: 2290}]
> 
> When I build from b2g-18 tonight.  Can we get the patch uplifted to that
> tree, too?

Pushed it.
https://hg.mozilla.org/releases/mozilla-b2g18/rev/d7dbdec9352e
This must be in Aurora too.
(In reply to Julien Wajsberg [:julienw] from comment #40)
> hub, how do you explain that it worked on Linux ?

The crashreporter code is not very cross-compile friendly (when the host and target OS are not the same, Linux->Android works fine). I believe the default B2G mozconfig disables the crashreporter when building on Mac OS X.
I'm actually still seeing the Webapps.jsm bug when I try to build mozilla-central, too. I'm using the ./build.sh gecko build script. Does this mean that the build script is misconfigured?
You should maybe try a full ./build.sh (without clobbering).
Please disregard comment 48. My .userconfig was messed up, and I was actually trying to build b2g18 again, before this patch was applied to that tree.
Building b2g18 works for me now.
(In reply to Julien Wajsberg [:julienw] from comment #46)
> this must be in aurora too.

Apparently it was decided this morning that uplifts only need to go on b2g18 now, so nothing else needs doing here.
Target Milestone: --- → B2G C4 (2jan on)