Closed Bug 1079655 Opened 10 years ago Closed 10 years ago

Nightly fails to start on Yosemite when run from Terminal

Categories

(Core :: XPCOM, defect)

x86
macOS
defect
Not set
normal

Tracking

RESOLVED FIXED
mozilla35
Tracking Status
firefox34 --- fixed
firefox35 --- fixed

People

(Reporter: jya, Assigned: jya)

References

Details

(Keywords: regression)

Attachments

(2 files, 5 obsolete files)

I did an hg pull earlier today, and since then I've been unable to start Nightly. It always fails with "Couldn't load XPCOM."

I have clobbered, re-pulled, and cleared ccache, to no avail.

It gets stuck for about 30s while looping in static GetFrozenFunctionsFunc
XPCOMGlueLoad(const char* aXPCOMFile) (xpcom/glue/standalone/nsXPCOMGlue.cpp)

* thread #1: tid = 0xad136, 0x000000010000741c firefox`XPCOMGlueLoad(aXPCOMFile=0x00007fff5fbff410) + 1084 at nsXPCOMGlue.cpp:455, queue = 'com.apple.main-thread', stop reason = breakpoint 28.1
  * frame #0: 0x000000010000741c firefox`XPCOMGlueLoad(aXPCOMFile=0x00007fff5fbff410) + 1084 at nsXPCOMGlue.cpp:455
    frame #1: 0x0000000100006f52 firefox`XPCOMGlueStartup(aXPCOMFile=0x00007fff5fbff410) + 66 at nsXPCOMGlue.cpp:520
    frame #2: 0x0000000100001e89 firefox`InitXPCOMGlue(argv0=0x00007fff5fbffb10, xreDirectory=0x00007fff5fbff8d8) + 361 at nsBrowserApp.cpp:559
    frame #3: 0x0000000100001a5f firefox`main(argc=5, argv=0x00007fff5fbff9b0) + 95 at nsBrowserApp.cpp:624
    frame #4: 0x00000001000015a4 firefox`start + 52

and then returns nullptr.
Reverting to commit ef1cd14c8cac resolves the problem.

Going to track which one introduced the problem
The first bad revision is:
changeset:   209160:5884c9f92f3d
user:        Stephen Pohl <spohl.mozilla.bugs@gmail.com>
date:        Tue Oct 07 09:33:09 2014 -0400
summary:     Bug 1078640: Sanitize path used to load XPCOM on OSX. r=smichaud
Depends on: 1078640
Summary: Nightly fail to start → Nightly fails to start on OS X
This is very weird...
Reverting this commit and only adding the line:
    UInt8 tempBuffer[MAXPATHLEN];

alone causes the issue...
I then tried
Replacing:
UInt8 tempBuffer[MAXPATHLEN];

with:
UInt8 tempBuffer[MAXPATHLEN/2];

and it now works !

compiler bug??
$ clang --version
Apple LLVM version 6.0 (clang-600.0.51) (based on LLVM 3.5svn)
Target: x86_64-apple-darwin14.0.0
Thread model: posix

from XCode 6.0.1
replacing with:
    UInt8 tempBuffer[MAXPATHLEN-104]

and it works... subtracting anything less than 104 gives me the "Couldn't load XPCOM" error
Have you checked the value of tempBuffer at that point with a simple printf to see how this compares to the issue described in bug 1078640 comment 0?
Flags: needinfo?(jyavenard)
(In reply to Jean-Yves Avenard [:jya] from comment #3)
> This is very weird...
> Reverting this commit and only adding the line:
>     UInt8 tempBuffer[MAXPATHLEN];
> 
> alone causes the issue...

What's the value of MAXPATHLEN? Does this reproduce when you have no patches applied?

> compiler bug??
> $ clang --version
> Apple LLVM version 6.0 (clang-600.0.51) (based on LLVM 3.5svn)
> Target: x86_64-apple-darwin14.0.0
> Thread model: posix
> 
> from XCode 6.0.1

Same here, so it's doubtful that this is a compiler bug.
Blocks: 1078640
Component: General → XPCOM
No longer depends on: 1078640
Keywords: regression
Product: Firefox → Core
(In reply to :Gijs Kruitbosch from comment #5)
> Have you checked the value of tempBuffer at that point with a simple printf
> to see how this compares to the issue described in bug 1078640 comment 0?

the content of tempBuffer is irrelevant really, because reverting commit 5884c9f92f3d and only leaving the UInt8 tempBuffer[MAXPATHLEN]; declaration is enough to trigger the bug.

MAXPATHLEN has a value of 1024.

Putting tempBuffer[XX] where XX is 920 or less is fine, anything over causes the problem.
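
To show how the two sets of numbers in this bug line up, here is a small sketch of the arithmetic. It assumes MAXPATHLEN is 1024, as reported above. The constant and function names are hypothetical, and the threshold is only what was observed on this one machine, not a general limit.

```cpp
#include <cstddef>

// Assumption: MAXPATHLEN is 1024, the value reported in this bug.
constexpr std::size_t kMaxPathLen = 1024;

// Largest tempBuffer size that still worked on the affected machine:
// MAXPATHLEN - 104, which is the "920 or less" figure.
constexpr std::size_t kLargestWorkingSize = kMaxPathLen - 104;

// Hypothetical helper: true if a local buffer of `size` bytes is larger than
// the biggest size observed to work on that machine.
constexpr bool ExceedsObservedThreshold(std::size_t size) {
  return size > kLargestWorkingSize;
}

static_assert(kLargestWorkingSize == 920, "1024 - 104 should be 920");
static_assert(!ExceedsObservedThreshold(kMaxPathLen / 2), "MAXPATHLEN/2 worked");
static_assert(ExceedsObservedThreshold(kMaxPathLen), "MAXPATHLEN failed");
```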

I was running Yosemite 10.10 beta 4. Have now upgraded to beta 5, same issue.
I have a MacBook Air now running identical code (same OS, same Xcode, and same source/binary) and it works there.

I have no patches applied to that tree.

The machine I'm using is a late-2013 Mac Pro with 8 cores and 32GB RAM.
Flags: needinfo?(jyavenard)
I just tested the firefox-2014-10-08-06-54-30-mozilla-central nightly on Yosemite GM Candidate 2 (build 14A386a).  It runs fine if you double-click on it (and the signature checks out fine).  But I always get the "Couldn't load XPCOM" error when I try to run it from Terminal -- no matter what's my current directory (and how long the path is to the "firefox" command).
Summary: Nightly fails to start on OS X → Nightly fails to start on Yosemite when run from Terminal
(In reply to Steven Michaud from comment #8)
> I just tested the firefox-2014-10-08-06-54-30-mozilla-central nightly on
> Yosemite GM Candidate 2 (build 14A386a).  It runs fine if you double-click
> on it (and the signature checks out fine).  But I always get the "Couldn't
> load XPCOM" error when I try to run it from Terminal -- no matter what's my
> current directory (and how long the path is to the "firefox" command).

that's interesting...
At least it's not just me...

Downloading this image: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-macosx64-debug/latest/

I can start the application just fine, either using open command, or directly doing:
/Volumes/Nightly/FirefoxNightlyDebug.app/Contents/MacOS/firefox -no-remote -foreground -P Development

But doing:
./obj-ff-dbg/dist/NightlyDebug.app/Contents/MacOS/firefox
or:
open obj-ff-dbg/dist/NightlyDebug.app (<--- bounces for about 30s and then dies)

yields the same issue... "Couldn't load XPCOM."

will try a non-unified build out of interest.
(Following up comment #8)

This doesn't happen on OS X 10.9.5, 10.8.5 or 10.7.5.
And things are now even weirder -- this bug appears to be intermittent!  (On Yosemite.)

Just now I tried running firefox-2014-10-07-03-02-02-mozilla-central nightly from Terminal (which doesn't contain the patch for bug 1078640), and it ran fine.  Then I ran the firefox-2014-10-08-06-54-30-mozilla-central nightly again from Terminal ... and it also ran fine!! :-(
I've retested several times with the firefox-2014-10-08-06-54-30-mozilla-central nightly, and it keeps running fine -- even after a reboot.

But just now I tested with an additional parameter on the commandline:

Desktop/Firefox\ Nightly\ 2014-10-08.app/Contents/MacOS/firefox -ProfileManager

instead of

Desktop/Firefox\ Nightly\ 2014-10-08.app/Contents/MacOS/firefox

This triggers the bug ... at least for now.
And now I'm seeing the bug without the patch for bug 1078640!  Using the following commandline with the firefox-2014-10-07-03-02-02-mozilla-central nightly.

Desktop/FirefoxNightly\ 2014-10-07.app/Contents/MacOS/firefox -ProfileManager

> Desktop/Firefox\ Nightly\ 2014-10-08.app/Contents/MacOS/firefox -ProfileManager

This, above, should have been:

Desktop/FirefoxNightly\ 2014-10-08.app/Contents/MacOS/firefox -ProfileManager
Interesting that it is 100% reproducible for me with my own build and only on that particular machine, but not with the one from ftp.mozilla.org... 
Providing argument to firefox or not makes no difference.

Steven, do you think it could be something like the HW acceleration crash, i.e. something related to the stack going over a certain size and triggering an issue?
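
One way to start checking the stack-size theory is to look at the stack limit the process actually runs with. This is a minimal sketch, assuming a POSIX system where getrlimit and RLIMIT_STACK are available. The function name is made up for illustration, and a small limit would not by itself prove that this is the cause.

```cpp
#include <sys/resource.h>

// Hypothetical helper: returns the soft stack limit in bytes, or 0 if the
// limit could not be queried. An unlimited stack is reported as RLIM_INFINITY.
rlim_t SoftStackLimitBytes() {
  struct rlimit limit;
  if (getrlimit(RLIMIT_STACK, &limit) != 0) {
    return 0;  // Query failed; the caller should treat the limit as unknown.
  }
  return limit.rlim_cur;
}
```

Logging this value when launching from Terminal and when double-clicking might show whether the two environments differ.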
At this point I haven't a clue.

Clearly, though, whoever tries to investigate this bug will need to do it on Yosemite -- hopefully on the current developer seed.
Jean-Yves:  Try a nonsensical commandline parameter and see if it makes a difference.

I find I can reliably trigger the bug with either nightly by adding "-shit" :-)
Attached patch wip (obsolete)
Would you be able to apply this patch and tell me if it fixes the issue for you? I've also started a try build, but it'll be a while before it's available:
https://tbpl.mozilla.org/?tree=Try&rev=6cb214915337
Assignee: nobody → spohl.mozilla.bugs
Status: NEW → ASSIGNED
Flags: needinfo?(jyavenard)
Also I now sometimes see the bug even when double-clicking on either nightly.

Let me test with a nightly that doesn't have any v2 signature changes, and see what happens.
(In reply to Steven Michaud from comment #16)
> Jean-Yves:  Try a nonsensical commandline parameter and see if it makes a
> difference.
> 
> I find I can reliably trigger the bug with either nightly by adding "-shit"
> :-)

LOL :) and guess what it does !
That's using the nightly build from http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-macosx64-debug/latest/ ; the error "Couldn't load XPCOM." is then returned instantly.

On my local build, using -shit or my preferred -wtf gives the same 30s+ wait.

And that's with both unified and non-unified builds.
Flags: needinfo?(jyavenard)
I can't repro this bug with the firefox-2014-09-30-03-02-02-mozilla-central nightly, even by adding "-shit".

At some point I'll try to find the shitty regression range.  But not right now, since I need a break :-)
Attached patch Do not allocate memory on stack (obsolete)
Don't use temp buffer
Attachment #8502158 - Flags: review?(smichaud)
Assignee: spohl.mozilla.bugs → jyavenard
Attachment #8502144 - Attachment is obsolete: true
Only sanitize path if the retrieval succeeded. Also cleanup sanitization so it handles cases that will never ever happen
Attachment #8502183 - Flags: review?(smichaud)
Only sanitize path if the retrieval succeeded. Also cleanup sanitization so it handles cases that will never ever happen
Attachment #8502185 - Flags: review?(spohl.mozilla.bugs)
Attachment #8502158 - Attachment is obsolete: true
Attachment #8502158 - Flags: review?(smichaud)
Attachment #8502183 - Attachment is obsolete: true
Attachment #8502183 - Flags: review?(smichaud)
Comment on attachment 8502185
Do not sanitize path should an error occur

Review of attachment 8502185:
-----------------------------------------------------------------

nit: Could you change the commit message to:
Bug 1079655: Do not sanitize the path used to load XPCOM if the path couldn't be retrieved successfully on OSX. r=spohl

r=spohl with that addressed. Thanks!
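
The idea in that commit message can be sketched roughly as follows. This is not the patch itself: the function-pointer types and GetPathForLoading are hypothetical stand-ins for the platform call that retrieves the path and for the sanitization step, used only to show the control flow of skipping sanitization when retrieval fails.

```cpp
#include <cstddef>

// Hypothetical stand-in for the platform call that fills `buffer` with the
// path. Returns false if the path could not be retrieved.
using RetrievePathFn = bool (*)(char* buffer, std::size_t size);

// Hypothetical stand-in for the sanitization step, which edits `path`.
using SanitizePathFn = void (*)(char* path);

// Sanitizes the path only when retrieval succeeded. On failure the buffer
// contents are unspecified, so they are left untouched.
bool GetPathForLoading(RetrievePathFn retrieve, SanitizePathFn sanitize,
                       char* buffer, std::size_t size) {
  if (!retrieve(buffer, size)) {
    return false;
  }
  sanitize(buffer);
  return true;
}
```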
Attachment #8502185 - Flags: review?(spohl.mozilla.bugs) → review+
Attached patch Do not allocate memory on stack (obsolete)
:spohl
Attachment #8502185 - Attachment is obsolete: true
Attachment #8502189 - Flags: review?(spohl.mozilla.bugs)
Attachment #8502190 - Flags: review+
Comment on attachment 8502189
Do not allocate memory on stack

Review of attachment 8502189:
-----------------------------------------------------------------

nit: could you update this commit message to:
Bug 1079655: Ensure that the path used to load XPCOM can be sanitized on OSX 10.10. r=spohl

r=spohl with that. Thanks!
Attachment #8502189 - Flags: review?(spohl.mozilla.bugs) → review+
Attachment #8502189 - Attachment is obsolete: true
Attachment #8502198 - Flags: review+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/6b845af11ff0
https://hg.mozilla.org/mozilla-central/rev/02dd4a900b0e
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla35
Landed on aurora in the Mac V2 signing combined patch in bug 1047584
Can someone explain what the actual bug was here? None of the attachments or comments say what was actually fixed.
Best guess was some kind of stack exhaustion, only reproducible on 10.10. By using an existing buffer rather than allocating a new one on the stack we worked around it. It's possible that we could run into this again somewhere else.
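
To illustrate what "using an existing buffer" can look like, here is a minimal sketch that cleans up a path in place instead of copying it through a second MAXPATHLEN-sized local array. The function name and the exact cleanup rule (collapsing "/./" to "/") are assumptions made for the example. The landed patches may sanitize differently.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: collapses every "/./" in `path` to "/" in place.
// The result is never longer than the input, so the caller's existing
// buffer is enough and no extra stack buffer is needed.
void RemoveDotSegmentsInPlace(char* path) {
  std::size_t readPos = 0;
  std::size_t writePos = 0;
  while (path[readPos] != '\0') {
    if (path[readPos] == '/' && path[readPos + 1] == '.' &&
        path[readPos + 2] == '/') {
      readPos += 2;  // Skip "/." and continue from the following '/'.
      continue;
    }
    path[writePos++] = path[readPos++];
  }
  path[writePos] = '\0';
}
```

For example, calling it on a buffer holding "/Applications/./Nightly.app/Contents/MacOS/./firefox" leaves "/Applications/Nightly.app/Contents/MacOS/firefox" in the same buffer.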
We might have run into it here; however, I am not sure how to confirm it or what troubleshooting is needed:
https://support.mozilla.org/en-US/questions/1052460

"The Dock icon for Firefox goes up once then returns to the dock, nothing else that I can find happens! This is in the console log: 3/16/15 18:05:22.620 com.apple.xpc.launchd[1] (org.mozilla.firefox.182928[11568]) Service exited with abnormal code: 255 This is the console log for v35: 12/13/14 08:06:01.521 com.apple.xpc.launchd[1]: (org.mozilla.firefox.48312[6679]) Service exited with abnormal code: 255 "
Flags: needinfo?(spohl.mozilla.bugs)
There isn't enough evidence (yet) to say that this is the same issue. We should file a separate bug to track it properly.
Flags: needinfo?(spohl.mozilla.bugs)