Symlinked components break everything

RESOLVED FIXED

Status

Product: Core
Component: XPCOM
Status: RESOLVED FIXED
Reported: 8 years ago
Modified: 7 years ago

People

(Reporter: glandium, Assigned: glandium)

Tracking

Version: 1.9.2 Branch
Hardware: All
OS: Linux

Firefox Tracking Flags

(blocking1.9.2 -, status1.9.2 .13-fixed, status1.9.1 unaffected)

Details

Attachments

(2 attachments, 1 obsolete attachment)

(Assignee)

Description

8 years ago
Bug 530196 is probably a symptom of this issue.

Another symptom may occur on linux builds where extensions containing components are symlinked. For example, on Debian, extensions can live in a common directory for all applications and /usr/lib/mozilla/extensions/{app-id}/{ext-id} is a symbolic link to the common directory.

Another trigger was reported by a user who uses symbolic links for his firefox profile.

What happens then is that while you can run firefox once, if you exit firefox and start it again, well, it doesn't start up.

It looks like the xptiInterfaceInfoManager part of bug 491245 is responsible for this issue. In other words, reverting its hunk (http://hg.mozilla.org/mozilla-central/diff/51bafb458d68/xpcom/reflect/xptinfo/src/xptiInterfaceInfoManager.cpp) is enough to "fix" the problem.

Comment 1

8 years ago
So this is basically the same as bug 513736 and bug 530793, right?
Blocks: 491245
(Assignee)

Comment 2

8 years ago
(In reply to comment #1)
> So this is basically the same as bug 513736 and bug 530793, right?

Yes it is. And I think I know what the root problem is (though I am currently building to verify that), and it all boils down to what we want to expect from nsIFile.equals. In other words, the change in bug 491245 is probably very wrong, and the normalizations shouldn't even be needed.

On Unix, and I guess OSX, nsIFile.equals could be a matter of checking st_dev and st_ino in a stat() result.
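
A minimal sketch of the st_dev/st_ino comparison suggested above, written in Python rather than Mozilla's C++ for brevity. The function name same_file and the directory names are illustrative assumptions and do not come from any patch attached to this bug.

```python
import os
import tempfile


def same_file(path_a, path_b):
    """Return True when both paths refer to the same device and inode."""
    st_a = os.stat(path_a)  # os.stat() follows symbolic links
    st_b = os.stat(path_b)
    return (st_a.st_dev, st_a.st_ino) == (st_b.st_dev, st_b.st_ino)


# Demo: one directory reached directly and through a symbolic link.
with tempfile.TemporaryDirectory() as tmp:
    real_dir = os.path.join(tmp, "common-extensions")  # hypothetical name
    link_dir = os.path.join(tmp, "ext-id")             # hypothetical name
    os.mkdir(real_dir)
    os.symlink(real_dir, link_dir)

    direct_vs_link = same_file(real_dir, link_dir)  # compares identity
    string_equal = real_dir == link_dir             # compares spelling only
```

The two paths differ as strings, yet stat() reports the same device and inode for both, so no path normalization is needed to decide that they are equal.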
(Assignee)

Comment 3

8 years ago
Created attachment 431372 [details] [diff] [review]
Possible patch for 1.9.2 branch

I verified this patch works, for both the components directory issue and bug 530196 (tested after reverting it, too).

So the problem is that the normalization is done on the nsIFile contained in the components array, and this breaks some other code using this array.
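
To illustrate the kind of side effect described here, the following sketch contrasts normalizing entries of a shared list in place with normalizing a temporary copy. It is a simplified Python model under stated assumptions, not the actual patch: the helper names and directory names are hypothetical, and os.path.realpath stands in for nsIFile.normalize.

```python
import os
import tempfile


def contains_mutating(components, target):
    """Problematic pattern: every stored entry is replaced by its normalized form."""
    wanted = os.path.realpath(target)
    found = False
    for i in range(len(components)):
        components[i] = os.path.realpath(components[i])  # mutates the shared list
        found = found or components[i] == wanted
    return found


def contains_with_copy(components, target):
    """Safer pattern: normalize a temporary copy and leave the list untouched."""
    wanted = os.path.realpath(target)
    return any(os.path.realpath(entry) == wanted for entry in components)


with tempfile.TemporaryDirectory() as tmp:
    tmp = os.path.realpath(tmp)  # make sure the base directory itself has no symlinks
    real_dir = os.path.join(tmp, "real-components")
    link_dir = os.path.join(tmp, "components")
    os.mkdir(real_dir)
    os.symlink(real_dir, link_dir)

    shared = [link_dir]
    found_copy = contains_with_copy(shared, real_dir)
    unchanged_after_copy = shared == [link_dir]

    found_mutating = contains_mutating(shared, real_dir)
    changed_after_mutation = shared == [real_dir]
```

Both helpers find the directory, but only the in-place version changes what other users of the list see afterwards.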

Now, as I said in comment #2, I think the real issue is that of nsIFile.equals and should be fixed there. Patches for bug 530196 and bug 491245 should be reverted.

I can provide a patch for Unix. I guess the same logic would work on OSX.
(Assignee)

Comment 4

8 years ago
Created attachment 431403 [details] [diff] [review]
Different approach for trunk
(Assignee)

Comment 5

8 years ago
Comment on attachment 431403 [details] [diff] [review]
Different approach for trunk

I'd be interested to know if the test passes on OSX with this patch.
It should, however, break on Windows.
Attachment #431403 - Attachment is patch: true
Attachment #431403 - Attachment mime type: application/octet-stream → text/plain
(Assignee)

Comment 6

8 years ago
Note this would probably be more efficient if the strcmp was done first and the stat()s only done when the strings don't match. But I'd first like to hear what you think before going further.
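
A sketch of that ordering, in Python for brevity: compare the path strings first and only fall back to stat() when they differ. The names same_file_fast and counted_stat are hypothetical, and the call counter exists only to make the saving visible.

```python
import os
import tempfile

stat_calls = 0  # demo-only counter of stat() calls


def counted_stat(path):
    """Wrapper around os.stat() that records how often it is called."""
    global stat_calls
    stat_calls += 1
    return os.stat(path)


def same_file_fast(path_a, path_b):
    """Cheap string comparison first; stat() only when the strings differ."""
    if path_a == path_b:
        return True  # identical spelling, no system call needed
    try:
        st_a = counted_stat(path_a)
        st_b = counted_stat(path_b)
    except OSError:
        return False  # a missing or unreadable path cannot match
    return (st_a.st_dev, st_a.st_ino) == (st_b.st_dev, st_b.st_ino)


with tempfile.TemporaryDirectory() as tmp:
    real_dir = os.path.join(tmp, "profile")
    link_dir = os.path.join(tmp, "profile-link")
    os.mkdir(real_dir)
    os.symlink(real_dir, link_dir)

    identical = same_file_fast(real_dir, real_dir)
    calls_after_identical = stat_calls
    via_link = same_file_fast(real_dir, link_dir)
    calls_after_link = stat_calls
```

One consequence of this ordering is that two identical strings compare equal even if the path does not exist; whether that is acceptable depends on what callers expect from equals.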
(Assignee)

Comment 7

8 years ago
http://msdn.microsoft.com/en-us/library/aa364952%28VS.85%29.aspx This could be used to implement the same test on Windows.
(Assignee)

Updated

8 years ago
Attachment #431372 - Flags: feedback?(benjamin)
(Assignee)

Updated

8 years ago
Attachment #431403 - Flags: feedback?(benjamin)

Comment 8

Is there a reason you requested feedback instead of review from bsmedberg?
Assignee: nobody → mh+mozilla
(Assignee)

Comment 9

8 years ago
(In reply to comment #8)
> Is there a reason you requested feedback instead of review from bsmedberg?

Firstly because I'd like to hear which approach would be considered the right one. Secondly because the second patch is not complete, since it doesn't implement the change on windows and other platforms.
(Assignee)

Updated

7 years ago
Duplicate of this bug: 533535
(Assignee)

Updated

7 years ago
Duplicate of this bug: 530793
(Assignee)

Updated

7 years ago
Duplicate of this bug: 513736

Comment 13

7 years ago
Using the latest nightly build solves the problem. I deduce from that that the incoming version 3.6.3 will solve it as well.

Comment 14

7 years ago
Using the latest nightly 3.6.3pre did /not/ solve the problem for me.
http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-firefox-3.6.x/firefox-3.6.3pre.en-US.linux-i686.tar.bz2
(launched once since there was no compatibility.ini file - tried to re-launch, and it failed)
(Assignee)

Comment 15

7 years ago
Benjamin, could you take a look ?

Comment 16

7 years ago
The patch as in comment 5 (https://bugzilla.mozilla.org/attachment.cgi?id=431403)
works for me with symlinked home dirs.  Otherwise ff fails to start on second
run.  I can't seem to see that this patch has been pushed into the hg repo, though. If it has been, what branch was it pushed to?
(Assignee)

Comment 17

7 years ago
Richard, it hasn't been applied, and hasn't been discussed. Patch in comment 5 also lacks corresponding change for Windows.

Comment 18

7 years ago
This seems to have hit a lot of people and also seems to have
created a lot of unnecessary angst for extension writers. Does 
the orig bug manifest at all under Windows? If not, wouldn't a
few ifdefs do to get this out of the door?  Is a revert of 
51bafb458d68 not acceptable either?  Thanks for the patch, Mike!
(Assignee)

Comment 19

7 years ago
Comment on attachment 431372 [details] [diff] [review]
Possible patch for 1.9.2 branch

Let's try a different approach to get a comment on this.
Attachment #431372 - Attachment description: Possible patch, but fixing the wrong issue IMHO → Possible patch for 1.9.2 branch
Attachment #431372 - Flags: feedback?(benjamin) → review?(benjamin)
(Assignee)

Comment 20

7 years ago
Comment on attachment 431403 [details] [diff] [review]
Different approach for trunk

I would be interested to know if you consider that to be the right approach. If so, I'll implement the win32 alternative and send to the try servers.
Attachment #431403 - Attachment description: Patch → Different approach for trunk
Attachment #431403 - Flags: feedback?(benjamin) → review?(benjamin)

Comment 21

7 years ago
Is there any way of encouraging someone with check-in powers to look at this? This bug is essentially preventing me from using most extensions in Firefox. I've voted for the bug, if that helps.

Comment 22

(In reply to comment #21)
> Is there any way of encouraging someone with check-in powers to look at this?

There's a patch waiting on review. Until the patch is given review, it cannot be checked-in.

Comment 23

I mentioned to mh on IRC, but this is on my list of things to really think about once 3.6.4 blockers and tracking are out of the way, but it might take a week or more yet.

Comment 24

7 years ago
Here is one topic about this bug on Mac OSX:

https://support.mozilla.com/en-US/forum/1/590191

Different solutions were offered though not clear (for me) which of them worked and why.

And one user on Mac OS X just installed Firefox (no add-ons, I guess) and has the same issue.

"I click on the icon on my dock for Firefox and it bounces up like other apps do when they start up, but then it just stops. it doesn't even show up on the activity monitor under active processes."

https://support.mozilla.com/en-US/forum/1/665803

Fix the problem in Firefox. And/or put a large link on support.mozilla.com about how to solve it.

Comment 25

7 years ago
Hi from Germany,

same problem here. MacOS 10.6.3, FF 3.6.3. Very frustrating.

Plugins which will work:
Adblock Plus
Adblock Plus Element Hiding Helper
All-in-One Gestures
Counterpixel
Deutsches Wörterbuch
ebay Toolbar
Feed Filter
Fission
LiveClick
Long URL Please
NoScript
Quick Locale Switcher
QuickJava
ReloadEvery
TinEye Reverse
Trashmail.net
UrlbarExt
User Agent Switcher

Plugins which DONT WORK:
DownloadHelper
DownthemAll!
Greasemonkey
Modify Headers
Stylish
Weave

At least I've seen a couple of problems here with GreaseMonkey, Downthemall and
Weave. So what do these plugins have in common?

Comment 26

This now seems to work as expected on mozilla-central. I've got http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=33ff230a5b78&tochange=0d8bf91aa71e as the fix range, so maybe fixed by bug 570488 ?

Comment 27

7 years ago
(In reply to comment #26)
> This now seems to work as expected on mozilla-central. I've got
> http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=33ff230a5b78&tochange=0d8bf91aa71e
> as the fix range, so maybe fixed by bug 570488 ?

OK, finally some positive news...
Is there a version (beta, rc, ...) available somewhere that we can test?
(Assignee)

Comment 28

7 years ago
Please note that even if the current issue with the components directory is fixed, the PoC for trunk should still be considered as it also decreases the number of stat() calls at startup (normalization does a stat() on every sub directory of the path starting from / to see if it is a symbolic link and possibly resolve it). With this PoC, any call to file.normalize for use with file.equals could be removed.
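
A rough model of the cost argument, in Python: normalization has to inspect every component of the path for symbolic links, while the PoC only needs one stat() per file being compared. The helper names are hypothetical and the example path reuses the Debian layout from the description. A real normalization may perform additional calls whenever it actually finds a link, so the count below is a lower bound.

```python
import os


def prefixes(path):
    """Yield each ancestor prefix of an absolute path, from the top down."""
    current = os.sep
    for part in (p for p in path.split(os.sep) if p):
        current = os.path.join(current, part)
        yield current


def count_symlink_checks(path):
    """Count the per-component link checks needed to walk the path once."""
    checks = 0
    for prefix in prefixes(path):
        os.path.islink(prefix)  # one lstat()-style check per component
        checks += 1
    return checks


example = "/usr/lib/mozilla/extensions/app-id/ext-id"
checks_for_one_path = count_symlink_checks(example)
stats_for_identity_compare = 2  # one stat() per file being compared
```

Normalizing both sides of a comparison doubles the per-component figure, whereas the identity comparison stays at two calls regardless of path depth.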

Updated

7 years ago
Blocks: 574458

Updated

7 years ago
No longer blocks: 574458

Updated

7 years ago
Duplicate of this bug: 576233

Comment 30

7 years ago
> At least i've seen a couple of problems here with GreaseMonkey, Downthemall and
> Weave. So what do these plugins have in common?

Do they use a XPT component? At least, that is what triggers the issue with our iMacros addon: https://bugzilla.mozilla.org/show_bug.cgi?id=574334 
Renaming the xpt "solves" the issue (i. e. Firefox starts again, but the extension is broken)

Updated

7 years ago
Duplicate of this bug: 548927

Comment 32

This was fixed completely on trunk (for Firefox 4) with bug 568691, and I don't think there is a safe patch I'd take on branches, so let's call this WORKSFORME.
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → WORKSFORME
(Assignee)

Comment 33

7 years ago
(In reply to comment #32)
> This was fixed completely on trunk (for Firefox 4) with bug 568691, and I don't
> think there is a safe patch I'd take on branches, so let's call this
> WORKSFORME.

My patch for the 1.9.2 branch is pretty safe. Please consider it.

Other than that, what about comment 28 ? Need I file a new bug, now ?

Updated

7 years ago
Duplicate of this bug: 574458

Comment 35

Comment on attachment 431372 [details] [diff] [review]
Possible patch for 1.9.2 branch

Yeah, that seems safe enough. r-d, this will help Linux (and mac) users whose application or profile is installed in a symlinked location, and is fairly low risk.
Attachment #431372 - Flags: review?(benjamin)
Attachment #431372 - Flags: review+
Attachment #431372 - Flags: approval1.9.2.8?

Comment 36

re: comment 28 followup, I don't think so: I'm pretty sure we aren't normalizing files at all any more.

Comment 37

7 years ago
(In reply to comment #36)
> re: comment 28 followup, I don't think so: I'm pretty sure we aren't
> normalizing files at all any more.

Maybe not in mainline but the bug affects all 3.6.x releases.

Some of the bugs marked as duplicates of this one are reports against 3.6.x versions not trunk.

If every bug reported about this gets marked as a duplicate of this one then it just won't get fixed for 3.6.x...

Comment 38

7 years ago
(In reply to comment #37)
> (In reply to comment #36)
> > re: comment 28 followup, I don't think so: I'm pretty sure we aren't
> > normalizing files at all any more.
> 
> Maybe not in mainline but the bug affects all 3.6.x releases.

Oops I can't read can I.

Updated

7 years ago
Duplicate of this bug: 574458
It probably makes the most sense to reopen this bug as a branch bug if we're going to try to land glandium's patch.
Status: RESOLVED → REOPENED
blocking1.9.2: --- → -
status1.9.2: --- → wanted
Resolution: WORKSFORME → ---
Version: Trunk → 1.9.2 Branch
Duplicate of this bug: 530793
Comment on attachment 431403 [details] [diff] [review]
Different approach for trunk

The trunk patch is obsolete if I'm reading comment 32 correctly.
Attachment #431403 - Attachment is obsolete: true
Attachment #431403 - Flags: review?(benjamin)

Comment 43

7 years ago
This problem seems to be gone in Fedora Thunderbird 3.1.1 - I have both lightning and enigmail installed and enabled, and there are no problems launching. Is that expected?

Comment 44

(In reply to comment #43)
> This problem seems to be gone in Fedora Thunderbird 3.1.1 - I have both
> lightning and enigmail installed and enabled, and there are no problems
> launching. Is that expected?

I can't speak for the Fedora package, but the Thunderbird 3.1.1 from Mozilla still breaks because of this with Lightning.

Comment 45

7 years ago
This quickly neutralizes the bug for me (Linux Firefox, Sync Addon, symlinked profile)

$ firefox                   # runs OK
$ firefox --safe-mode       # just exit in the safe-mode options prompt
$ firefox                   # runs OK

This command line as well
$ firefox --safe-mode  &&  firefox

Comment 46

7 years ago
Comment on attachment 431372 [details] [diff] [review]
Possible patch for 1.9.2 branch

a=LegNeato for 1.9.2.9.
Attachment #431372 - Flags: approval1.9.2.9? → approval1.9.2.9+
Depends on: 584156

Comment 47

This bug broke xpcshell-tests on Mac on the 1.9.2 branch, see bug 584156.
(Assignee)

Comment 48

7 years ago
(In reply to comment #47)
> This bug broke xpcshell-tests on Mac on the 1.9.2 branch, see bug 584156.

This doesn't make sense, because the patch doesn't change how registration is done.

Comment 49

(In reply to comment #48)
> (In reply to comment #47)
> > This bug broke xpcshell-tests on Mac on the 1.9.2 branch, see bug 584156.
> 
> This doesn't make sense, because the patch doesn't change how registration is
> done.

Sorry, this could actually be bug 582012 - I missed checking the pushlog and didn't see it was pushed at the same time. I'm now building 1.9.2 and will confirm either way in a bit.

Comment 50

(In reply to comment #49)
> (In reply to comment #48)
> > (In reply to comment #47)
> > > This bug broke xpcshell-tests on Mac on the 1.9.2 branch, see bug 584156.
> > 
> > This doesn't make sense, because the patch doesn't change how registration is
> > done.
> 
> Sorry, this could actually be bug 582012 - I missed checking the pushlog and
> didn't see it was pushed at the same time. I'm now building 1.9.2 and will
> confirm either way in a bit.

Yep, local backout is showing that bug 582012 does indeed appear to be the culprit.
No longer depends on: 584156
Depends on: 584156

Comment 51

This landed and was backed out for the test failure.

Comment 52

7 years ago
Comment on attachment 431372 [details] [diff] [review]
Possible patch for 1.9.2 branch

Removing .9 approval as this missed landing before freeze. Feel free to nominate again, though the bar for approval will be higher.
Attachment #431372 - Flags: approval1.9.2.9+ → approval1.9.2.9-

Comment 53

7 years ago
Was the backout (of this patch) mentioned in comment 51 because of the "xpcshell-tests on Mac" problem, which *seems* from the comments here to have been caused by a different patch? I.e., did this really break something, despite what comments 48 and 50 seem to imply?

The patch here doesn't seem to check if the Clone() succeeds so I suppose it might cause a problem on some systems assuming that Clone() needs to allocate memory.

Mind you I assume that Normalize() also might need to allocate memory if the resulting string is longer, I don't know if it can/will fail gracefully.

Apart from those the only thing which looks likely (IMHO) is just that it changes the memory layout slightly so maybe exposing a corruption problem somewhere else...

Does anyone have a simple test case showing a failure after applying the patch?

I wonder if changing the patch to call both current->Normalize() and normalized->Normalize() will show the same problem (not that such a patch would fix the problem that this bug was opened for, but it might be a test to see what is actually failing).
(Assignee)

Comment 54

7 years ago
I don't know if Normalize() on mac can change things regarding case (since the fs is case insensitive), but if the failing test can vary depending on components directory case, that could be the source of the problem.

Comment 55

7 years ago
May I just remind you that without this patch, most Linux distributions break, as components are indeed symlinked. So if you do not add it upstream, most distributions will ship it anyway, meaning that not landing it will be a failure. And BTW this clearly indicates that the test plan should include symlinked components!
(Assignee)

Comment 56

7 years ago
So, this is what can be seen on OSX debug builds with the patch applied:

###!!! ASSERTION: This is not supposed to fail!: 'Error', file /builds/slave/tryserver-macosx-debug/build/js/src/xpconnect/src/nsXPConnect.cpp, line 1017
###!!! ASSERTION: Failed to initialize nsScriptSecurityManager: 'NS_SUCCEEDED(rv)', file /builds/slave/tryserver-macosx-debug/build/caps/src/nsScriptSecurityManager.cpp, line 3455
(Assignee)

Comment 57

7 years ago
and what XPCOM_DEBUG_BREAK=stack-and-abort unveils:
DumpJSStack+0x00003B87 [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x0004EF7B]
std::vector<unsigned short, std::allocator<unsigned short> >::resize(unsigned long)+0x0000429C [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x000461B2]
DumpJSStack+0x00039233 [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x00084627]
std::vector<unsigned short, std::allocator<unsigned short> >::resize(unsigned long)+0x00008046 [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x00049F5C]
DumpJSStack+0x0024756A [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x0029295E]
DumpJSStack+0x00251079 [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x0029C46D]
DumpJSStack+0x00251666 [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x0029CA5A]
DumpJSStack+0x002576DA [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x002A2ACE]
JNIEnv_::ThrowNew(_jclass*, char const*)+0x00063D7A [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x01185924]
NS_GetComponentRegistrar_P+0x00004046 [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x011E67AE]
NS_GetComponentRegistrar_P+0x00005D09 [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x011E8471]
JNIEnv_::ThrowNew(_jclass*, char const*)+0x00056CF9 [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x011788A3]
JNIEnv_::ThrowNew(_jclass*, char const*)+0x0005720C [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x01178DB6]
DumpJSStack+0x00004E3E [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x00050232]
DumpJSStack+0x00004E98 [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x0005028C]
DumpJSStack+0x000BCF53 [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x00108347]
DumpJSStack+0x000C096C [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x0010BD60]
NS_GetComponentRegistrar_P+0x00004C3A [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x011E73A2]
NS_GetComponentRegistrar_P+0x0000715B [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x011E98C3]
NS_GetComponentRegistrar_P+0x000072BF [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x011E9A27]
NS_GetComponentRegistrar_P+0x0000780E [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x011E9F76]
NS_InitXPCOM3_P+0x000008D7 [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x0118BEA7]
NS_InitXPCOM2_P+0x0000002F [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x0118BF41]
NS_InitXPCOM2+0x0000001F [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./libxpcom.dylib +0x000019DD]
start+0x00001601 [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./TestRegistrationOrder +0x00001E31]
start+0x00000E7F [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./TestRegistrationOrder +0x000016AF]
start+0x000000FB [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./TestRegistrationOrder +0x0000092B]
start+0x00000029 [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./TestRegistrationOrder +0x00000859]
(Assignee)

Comment 58

7 years ago
Created attachment 472256 [details] [diff] [review]
workaround for mac

So, I haven't found out (yet) what fails and why exactly, but something on mac doesn't like the components directory not being normalized. It's most probably related to bug 530188. The simplest "fix" I found is to normalize in nsDirectoryService::GetCurrentProcessDir (interestingly, the unix codepath uses realpath against MOZILLA_FIVE_HOME, so the patch brings the mac codepath somewhat closer to what the unix codepath does).
This also explains why the test fails in the harness, where it is started as $BUILD_DIR/xpcom/tests/../../dist/bin/TestRegistrationOrder, and not when run by hand with $BUILD_DIR/dist/bin/TestRegistrationOrder. (running ./TestRegistrationOrder naturally fails, too)
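
The difference between the two invocations can be sketched as follows, in Python. The build directory is a hypothetical placeholder, and os.path.normpath is only a lexical stand-in for what Normalize does on mac (it collapses ".." segments but does not resolve symbolic links).

```python
import os

build_dir = "/builds/example"  # hypothetical stand-in for $BUILD_DIR

# Path as started by the test harness versus path as started by hand.
harness_path = os.path.join(
    build_dir, "xpcom/tests/../../dist/bin/TestRegistrationOrder")
manual_path = os.path.join(build_dir, "dist/bin/TestRegistrationOrder")

# Process directory derived from each invocation, without normalization.
harness_dir = os.path.dirname(harness_path)
manual_dir = os.path.dirname(manual_path)
raw_dirs_match = harness_dir == manual_dir

# The same comparison after collapsing the ".." segments.
normalized_dirs_match = os.path.normpath(harness_dir) == os.path.normpath(manual_dir)
```

Without normalization the harness invocation yields a process directory that still contains "xpcom/tests/../..", which differs as a string from the directory seen when the test is run by hand.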
Attachment #472256 - Flags: review?(benjamin)
Attachment #472256 - Flags: review?(benjamin) → review+
(Assignee)

Updated

7 years ago
Attachment #431372 - Flags: approval1.9.2.12?
(Assignee)

Comment 59

7 years ago
Comment on attachment 472256 [details] [diff] [review]
workaround for mac

We're too late for .11 but that could be considered for .12. The other patch attached to this bug was landed and led to bug 584156 on mac only, so it was backed out. This patch is an additional workaround for mac, which avoids bug 584156, according to my testing.  This code is in a mac only part of the code so it doesn't affect anything but mac, and on mac, it may affect the value returned for the current process directory (depending on what exactly Normalize does on mac, and where Firefox is installed). This /shouldn't/ have an impact.
Attachment #472256 - Flags: approval1.9.2.12?

Comment 60

7 years ago
(In reply to comment #54)
> I don't know if Normalize() on mac can change things regarding case (since the
> fs is case insensitive)
...

Yes the default hfs+ setup on a Mac is case-insensitive but it isn't always the case.

If any code assumes that all Mac filesystems are case-insensitive, it will break for those who choose to turn on the case-sensitive feature of hfs+, or, for example, where the files are mounted from a server over a case-sensitive protocol such as smb or nfs.

[ not that this should be (hopefully) relevant to the bug... ]

Comment 61

7 years ago
We're early in the development for .12 so I am willing to get this landed if it lands soon. If there are any issues we'll back it out, likely for good.

Updated

7 years ago
Attachment #472256 - Flags: approval1.9.2.12? → approval1.9.2.12+

Updated

7 years ago
Attachment #431372 - Flags: approval1.9.2.12? → approval1.9.2.12+

Comment 62

7 years ago
a=LegNeato for 1.9.2.12. Please land only on the mozilla-1.9.2 default branch, *not* the relbranch.

Also, please be sure to land both patches (preferably as one patch).
(Assignee)

Comment 63

7 years ago
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/5e114301d046
Status: REOPENED → RESOLVED
Last Resolved: 7 years ago
status1.9.2: wanted → .12-fixed
Resolution: --- → FIXED

Comment 64

7 years ago
Hello,

I had the same problem on SuSE Linux 11.1 with firefox 3.6.8.
After the second start with the add-ons "Firefox Sync",
downthemall or greasemonkey firefox directly terminates.

Here we use amd, not automount, and the home directories are accessed
over a link. (The "bug" can be seen in the file xpti.dat.)

The two patches fixed the problems for me.

regards,

Martin

Comment 65

7 years ago
I built the "default" head of mozilla-1.9.2 yesterday and I can confirm that it fixes the problem for me too. Thanks!

Updated

7 years ago
status1.9.1: --- → wontfix
(Assignee)

Updated

7 years ago
status1.9.1: wontfix → unaffected