Closed Bug 551152 Opened 14 years ago Closed 14 years ago

Symlinked components break everything

Categories

(Core :: XPCOM, defect)

Version: 1.9.2 Branch
Hardware: All
OS: Linux
Type: defect
Priority: Not set
Severity: normal

RESOLVED FIXED
Tracking Status
blocking1.9.2 --- -
status1.9.2 --- .13-fixed
status1.9.1 --- unaffected

People

(Reporter: glandium, Assigned: glandium)

Attachments

(2 files, 1 obsolete file)

Bug 530196 is probably a symptom of this issue.

Another symptom may occur on Linux builds where extensions containing components are symlinked. For example, on Debian, extensions can live in a directory shared by all applications, and /usr/lib/mozilla/extensions/{app-id}/{ext-id} is a symbolic link to that directory.

Another trigger was reported by a user who uses symbolic links for his firefox profile.

What happens then is that you can run Firefox once, but if you exit and start it again, it doesn't start up.

It looks like the xptiInterfaceInfoManager part of bug 491245 is responsible for this issue. In other words, reverting its hunk (http://hg.mozilla.org/mozilla-central/diff/51bafb458d68/xpcom/reflect/xptinfo/src/xptiInterfaceInfoManager.cpp) is enough to "fix" the problem.
So this is basically the same as bug 513736 and bug 530793, right?
Blocks: 491245
(In reply to comment #1)
> So this is basically the same as bug 513736 and bug 530793, right?

Yes it is. And I think I know what the root problem is (though I am currently building to verify that), and it all boils down to what we want to expect from nsIFile.equals. In other words, the change in bug 491245 is probably very wrong, and the normalizations shouldn't even be needed.

On Unix, and I guess OSX, nsIFile.equals could be a matter of checking st_dev and st_ino in a stat() result.
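
For illustration, the st_dev/st_ino check could look roughly like the sketch below. It is not the attached patch and not Mozilla code; same_file is a hypothetical helper name, and the only assumption is a POSIX stat().

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Hypothetical helper, not Mozilla code: two paths name the same file
 * when stat() reports the same device and inode for both. stat()
 * follows symbolic links, so a symlink and its target compare equal
 * without either path being normalized first. */
bool same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    /* A path that is missing or unreadable never matches anything. */
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return false;

    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

With such a check, /usr/lib/mozilla/extensions/{app-id}/{ext-id} and the directory it links to would compare equal even though the two strings differ.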
I verified this patch works, for both the components directory issue and bug 530196 (tested after reverting it, too).

So the problem is that the normalization is done on the nsIFile contained in the components array, and this breaks some other code using this array.

Now, as I said in comment #2, I think the real issue is that of nsIFile.equals and should be fixed there. Patches for bug 530196 and bug 491245 should be reverted.

I can provide a patch for Unix. I guess the same logic would work on OSX.
Attached patch: Different approach for trunk (obsolete)
Comment on attachment 431403
Different approach for trunk

I'd be interested to know if the test passes on OSX with this patch.
It should, however, break on Windows.
Attachment #431403 - Attachment is patch: true
Attachment #431403 - Attachment mime type: application/octet-stream → text/plain
Note this would probably be more efficient if the strcmp was done first and the stat()s only done when the strings don't match. But I'd first like to hear what you think before going further.
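
A rough sketch of that ordering, assuming plain C strings for the paths (paths_equal is a hypothetical name, not something from the patch):

```c
#include <stdbool.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper, not Mozilla code: compare the path strings
 * first and fall back to stat() only when they differ. */
bool paths_equal(const char *a, const char *b)
{
    struct stat sa, sb;

    /* Fast path: byte-identical strings need no system call. */
    if (strcmp(a, b) == 0)
        return true;

    /* Slow path: different spellings may still name the same file,
     * for example when one of them goes through a symbolic link. */
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return false;

    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

One side effect of the fast path is that two identical strings compare equal even when the path does not exist, which a stat()-only comparison would reject.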
http://msdn.microsoft.com/en-us/library/aa364952%28VS.85%29.aspx This could be used to implement the same test on Windows.
Attachment #431372 - Flags: feedback?(benjamin)
Attachment #431403 - Flags: feedback?(benjamin)
Is there a reason you requested feedback instead of review from bsmedberg?
Assignee: nobody → mh+mozilla
(In reply to comment #8)
> Is there a reason you requested feedback instead of review from bsmedberg?

Firstly because I'd like to hear which approach would be considered the right one. Secondly because the second patch is not complete, since it doesn't implement the change on Windows and other platforms.
Using the latest nightly build solves the problem. I deduce from that that the incoming version 3.6.3 will solve it as well.
Using the latest nightly 3.6.3pre did /not/ solve the problem for me.
http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-firefox-3.6.x/firefox-3.6.3pre.en-US.linux-i686.tar.bz2
(launched once since there was no compatibility.ini file - tried to re-launch, and it failed)
Benjamin, could you take a look?
The patch as in comment 5 (https://bugzilla.mozilla.org/attachment.cgi?id=431403)
works for me with symlinked home dirs.  Otherwise ff fails to start on second
run.  I can't seem to see that this patch has been pushed into the hg repo, though. If it has been, what branch was it pushed to?
Richard, it hasn't been applied, and hasn't been discussed. Patch in comment 5 also lacks corresponding change for Windows.
This seems to have hit a lot of people and also seems to have
created a lot of unnecessary angst for extension writers. Does 
the orig bug manifest at all under Windows? If not, wouldn't a
few ifdefs do to get this out of the door?  Is a revert of 
51bafb458d68 not acceptable either?  Thanks for the patch, Mike!
Comment on attachment 431372
Possible patch for 1.9.2 branch

Let's try a different approach to get a comment on this.
Attachment #431372 - Attachment description: Possible patch, but fixing the wrong issue IMHO → Possible patch for 1.9.2 branch
Attachment #431372 - Flags: feedback?(benjamin) → review?(benjamin)
Comment on attachment 431403
Different approach for trunk

I would be interested to know if you consider that to be the right approach. If so, I'll implement the win32 alternative and send to the try servers.
Attachment #431403 - Attachment description: Patch → Different approach for trunk
Attachment #431403 - Flags: feedback?(benjamin) → review?(benjamin)
Is there any way of encouraging someone with check-in powers to look at this? This bug is essentially preventing me from using most extensions in Firefox. I've voted for the bug, if that helps.
(In reply to comment #21)
> Is there any way of encouraging someone with check-in powers to look at this?

There's a patch waiting on review. Until the patch is reviewed, it cannot be checked in.
As I mentioned to mh on IRC, this is on my list of things to really think about once 3.6.4 blockers and tracking are out of the way, but it might take a week or more yet.
Here is one topic about this bug on Mac OSX:

https://support.mozilla.com/en-US/forum/1/590191

Different solutions were offered, though it is not clear (to me) which of them worked and why.

And one user on Mac OS X just installed Firefox (no add-ons, I guess) and has the same issue.

"I click on the icon on my dock for Firefox and it bounces up like other apps do when they start up, but then it just stops. it doesn't even show up on the activity monitor under active processes."

https://support.mozilla.com/en-US/forum/1/665803

Fix the problem in Firefox. And/or put a large link on support.mozilla.com about how to solve it.
Hi from Germany,

same problem here. MacOS 10.6.3, FF 3.6.3. Very frustrating.

Plugins which will work:
Adblock Plus
Adblock Plus Element Hiding Helper
All-in-One Gestures
Counterpixel
Deutsches Wörterbuch
ebay Toolbar
Feed Filter
Fission
LiveClick
Long URL Please
NoScript
Quick Locale Switcher
QuickJava
ReloadEvery
TinEye Reverse
Trashmail.net
UrlbarExt
User Agent Switcher

Plugins which DON'T WORK:
DownloadHelper
DownThemAll!
Greasemonkey
Modify Headers
Stylish
Weave

At least I've seen a couple of problems here with Greasemonkey, DownThemAll and
Weave. So what do these plugins have in common?
This now seems to work as expected on mozilla-central. I've got http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=33ff230a5b78&tochange=0d8bf91aa71e as the fix range, so maybe fixed by bug 570488?
(In reply to comment #26)
> This now seems to work as expected on mozilla-central. I've got
> http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=33ff230a5b78&tochange=0d8bf91aa71e
> as the fix range, so maybe fixed by bug 570488 ?

OK, finally some positive news...
Is there a version (beta, rc, ...) available somewhere that we can test?
Please note that even if the current issue with the components directory is fixed, the PoC for trunk should still be considered, as it also decreases the number of stat() calls at startup (normalization does a stat() on every subdirectory of the path, starting from /, to see if it is a symbolic link and possibly resolve it). With this PoC, any call to file.normalize for use with file.equals could be removed.
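
To give an idea of that cost, here is a small sketch that walks an absolute path prefix by prefix and issues one lstat() per component, the way a naive symlink scan would. It is not the actual Normalize() implementation; count_prefix_lstats and the 4096-byte buffer are assumptions made for the example.

```c
#define _XOPEN_SOURCE 700 /* expose lstat() under strict -std=c99 */
#include <stddef.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper, not Mozilla code: returns how many lstat()
 * calls a prefix-by-prefix symlink scan issues for an absolute path,
 * or -1 when the path is empty, relative or too long. */
int count_prefix_lstats(const char *path)
{
    char prefix[4096]; /* arbitrary limit chosen for this sketch */
    struct stat st;
    size_t len = strlen(path);
    int calls = 0;

    if (len == 0 || len >= sizeof(prefix) || path[0] != '/')
        return -1;

    for (size_t i = 1; i <= len; i++) {
        /* A prefix ends at each '/' and at the end of the string. */
        if (path[i] != '/' && path[i] != '\0')
            continue;
        /* Skip empty components such as "//" or a trailing "/". */
        if (path[i - 1] == '/')
            continue;

        memcpy(prefix, path, i);
        prefix[i] = '\0';
        (void)lstat(prefix, &st); /* result unused: only the count matters */
        calls++;
    }
    return calls;
}
```

For /usr/lib/mozilla/extensions this makes four calls, against two stat() calls in total for a device and inode comparison of two paths.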
Blocks: 574458
No longer blocks: 574458
> At least i've seen a couple of problems here with GreaseMonkey, Downthemall and
> Weave. So what do these plugins have in common?

Do they use an XPT component? At least, that is what triggers the issue with our iMacros addon: https://bugzilla.mozilla.org/show_bug.cgi?id=574334
Renaming the xpt "solves" the issue (i.e. Firefox starts again, but the extension is broken)
This was fixed completely on trunk (for Firefox 4) with bug 568691, and I don't think there is a safe patch I'd take on branches, so let's call this WORKSFORME.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WORKSFORME
(In reply to comment #32)
> This was fixed completely on trunk (for Firefox 4) with bug 568691, and I don't
> think there is a safe patch I'd take on branches, so let's call this
> WORKSFORME.

My patch for the 1.9.2 branch is pretty safe. Please consider it.

Other than that, what about comment 28? Do I need to file a new bug now?
Comment on attachment 431372
Possible patch for 1.9.2 branch

Yeah, that seems safe enough. r-d, this will help Linux (and mac) users whose application or profile is installed in a symlinked location, and is fairly low risk.
Attachment #431372 - Flags: review?(benjamin)
Attachment #431372 - Flags: review+
Attachment #431372 - Flags: approval1.9.2.8?
re: comment 28 followup, I don't think so: I'm pretty sure we aren't normalizing files at all any more.
(In reply to comment #36)
> re: comment 28 followup, I don't think so: I'm pretty sure we aren't
> normalizing files at all any more.

Maybe not in mainline but the bug affects all 3.6.x releases.

Some of the bugs marked as duplicates of this one are reports against 3.6.x versions not trunk.

If every bug reported about this gets marked as a duplicate of this one then it just won't get fixed for 3.6.x...
(In reply to comment #37)
> (In reply to comment #36)
> > re: comment 28 followup, I don't think so: I'm pretty sure we aren't
> > normalizing files at all any more.
> 
> Maybe not in mainline but the bug affects all 3.6.x releases.

Oops I can't read can I.
It probably makes the most sense to reopen this bug as a branch bug if we're going to try to land glandium's patch.
Status: RESOLVED → REOPENED
blocking1.9.2: --- → -
Resolution: WORKSFORME → ---
Version: Trunk → 1.9.2 Branch
Comment on attachment 431403
Different approach for trunk

The trunk patch is obsolete if I'm reading comment 32 correctly.
Attachment #431403 - Attachment is obsolete: true
Attachment #431403 - Flags: review?(benjamin)
This problem seems to be gone in Fedora Thunderbird 3.1.1 - I have both lightning and enigmail installed and enabled, and there are no problems launching. Is that expected?
(In reply to comment #43)
> This problem seems to be gone in Fedora Thunderbird 3.1.1 - I have both
> lightning and enigmail installed and enabled, and there are no problems
> launching. Is that expected?

I can't speak for the Fedora package, but the Thunderbird 3.1.1 from Mozilla still breaks because of this with Lightning.
This quickly neutralizes the bug for me (Linux Firefox, Sync Addon, symlinked profile)

$ firefox                   # runs OK
$ firefox --safe-mode       # just exit in the safe-mode options prompt
$ firefox                   # runs OK

This command line as well
$ firefox --safe-mode  &&  firefox
Comment on attachment 431372
Possible patch for 1.9.2 branch

a=LegNeato for 1.9.2.9.
Attachment #431372 - Flags: approval1.9.2.9? → approval1.9.2.9+
Depends on: 584156
This bug broke xpcshell-tests on Mac on the 1.9.2 branch, see bug 584156.
(In reply to comment #47)
> This bug broke xpcshell-tests on Mac on the 1.9.2 branch, see bug 584156.

This doesn't make sense, because the patch doesn't change how registration is done.
(In reply to comment #48)
> (In reply to comment #47)
> > This bug broke xpcshell-tests on Mac on the 1.9.2 branch, see bug 584156.
> 
> This doesn't make sense, because the patch doesn't change how registration is
> done.

Sorry, this could actually be bug 582012 - I missed checking the pushlog and didn't see it was pushed at the same time. I'm now building 1.9.2 and will confirm either way in a bit.
(In reply to comment #49)
> (In reply to comment #48)
> > (In reply to comment #47)
> > > This bug broke xpcshell-tests on Mac on the 1.9.2 branch, see bug 584156.
> > 
> > This doesn't make sense, because the patch doesn't change how registration is
> > done.
> 
> Sorry, this could actually be bug 582012 - I missed checking the pushlog and
> didn't see it was pushed at the same time. I'm now building 1.9.2 and will
> confirm either way in a bit.

Yep, local backout is showing that bug 582012 does indeed appear to be the culprit.
No longer depends on: 584156
Depends on: 584156
This landed and was backed out for the test failure.
Comment on attachment 431372
Possible patch for 1.9.2 branch

Removing .9 approval as this missed landing before freeze. Feel free to nominate again, though the bar for approval will be higher.
Attachment #431372 - Flags: approval1.9.2.9+ → approval1.9.2.9-
Was the backout of this patch (mentioned in comment 51) because of the "xpcshell-tests on Mac" problem, which *seems* from the comments here to have been caused by a different patch? I.e., did this really break something, despite what comments 48 and 50 seem to imply?

The patch here doesn't seem to check if the Clone() succeeds so I suppose it might cause a problem on some systems assuming that Clone() needs to allocate memory.

Mind you I assume that Normalize() also might need to allocate memory if the resulting string is longer, I don't know if it can/will fail gracefully.

Apart from those the only thing which looks likely (IMHO) is just that it changes the memory layout slightly so maybe exposing a corruption problem somewhere else...

Does anyone have a simple test case showing a failure after applying the patch?

I wonder if changing the patch to call both current->Normalize() and normalized->Normalize() will show the same problem (not that such a patch would fix the problem that this bug was opened for, but it might be a test to see what is actually failing).
I don't know if Normalize() on mac can change things regarding case (since the fs is case insensitive), but if the failing test can vary depending on components directory case, that could be the source of the problem.
May I just remind you that without this patch, most Linux distributions break, as components are indeed symlinked. So if you do not add it upstream, most distributions will ship it, meaning that not landing it will be a failure. And BTW, this clearly indicates that the test plan should include symlinked components!
So, this is what can be seen on OSX debug builds with the patch applied:

###!!! ASSERTION: This is not supposed to fail!: 'Error', file /builds/slave/tryserver-macosx-debug/build/js/src/xpconnect/src/nsXPConnect.cpp, line 1017
###!!! ASSERTION: Failed to initialize nsScriptSecurityManager: 'NS_SUCCEEDED(rv)', file /builds/slave/tryserver-macosx-debug/build/caps/src/nsScriptSecurityManager.cpp, line 3455
and what XPCOM_DEBUG_BREAK=stack-and-abort unveils:
DumpJSStack+0x00003B87 [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x0004EF7B]
std::vector<unsigned short, std::allocator<unsigned short> >::resize(unsigned long)+0x0000429C [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x000461B2]
DumpJSStack+0x00039233 [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x00084627]
std::vector<unsigned short, std::allocator<unsigned short> >::resize(unsigned long)+0x00008046 [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x00049F5C]
DumpJSStack+0x0024756A [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x0029295E]
DumpJSStack+0x00251079 [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x0029C46D]
DumpJSStack+0x00251666 [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x0029CA5A]
DumpJSStack+0x002576DA [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x002A2ACE]
JNIEnv_::ThrowNew(_jclass*, char const*)+0x00063D7A [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x01185924]
NS_GetComponentRegistrar_P+0x00004046 [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x011E67AE]
NS_GetComponentRegistrar_P+0x00005D09 [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x011E8471]
JNIEnv_::ThrowNew(_jclass*, char const*)+0x00056CF9 [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x011788A3]
JNIEnv_::ThrowNew(_jclass*, char const*)+0x0005720C [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x01178DB6]
DumpJSStack+0x00004E3E [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x00050232]
DumpJSStack+0x00004E98 [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x0005028C]
DumpJSStack+0x000BCF53 [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x00108347]
DumpJSStack+0x000C096C [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x0010BD60]
NS_GetComponentRegistrar_P+0x00004C3A [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x011E73A2]
NS_GetComponentRegistrar_P+0x0000715B [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x011E98C3]
NS_GetComponentRegistrar_P+0x000072BF [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x011E9A27]
NS_GetComponentRegistrar_P+0x0000780E [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x011E9F76]
NS_InitXPCOM3_P+0x000008D7 [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x0118BEA7]
NS_InitXPCOM2_P+0x0000002F [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./XUL +0x0118BF41]
NS_InitXPCOM2+0x0000001F [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./libxpcom.dylib +0x000019DD]
start+0x00001601 [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./TestRegistrationOrder +0x00001E31]
start+0x00000E7F [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./TestRegistrationOrder +0x000016AF]
start+0x000000FB [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./TestRegistrationOrder +0x0000092B]
start+0x00000029 [/Volumes/Namoroka/NamorokaDebug.app/Contents/MacOS/./TestRegistrationOrder +0x00000859]
So, I haven't found out (yet) what exactly and why, but what happens is that something on Mac doesn't like that the components directory isn't normalized. It's most probably related to bug 530188. The simplest "fix" I found is to normalize in nsDirectoryService::GetCurrentProcessDir (interestingly, the Unix codepath uses realpath against MOZILLA_FIVE_HOME, so the patch brings the Mac codepath somewhat closer to what the Unix codepath does).
This also explains why the test fails in the harness, where it is started as $BUILD_DIR/xpcom/tests/../../dist/bin/TestRegistrationOrder, and not when run by hand with $BUILD_DIR/dist/bin/TestRegistrationOrder. (running ./TestRegistrationOrder naturally fails, too)
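
As an illustration of why the two spellings behave differently, the sketch below canonicalizes both paths with realpath() before comparing them as strings. This is not the workaround patch itself; same_canonical_path is a hypothetical helper, and passing NULL as the second argument of realpath() assumes POSIX.1-2008 behaviour.

```c
#define _XOPEN_SOURCE 700 /* expose realpath() under strict -std=c99 */
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not Mozilla code: true when both paths resolve
 * to the same canonical string once ".." components and symbolic
 * links have been resolved. */
bool same_canonical_path(const char *a, const char *b)
{
    /* realpath(path, NULL) returns a malloc()ed buffer, or NULL when
     * the path cannot be resolved. */
    char *ra = realpath(a, NULL);
    char *rb = realpath(b, NULL);
    bool equal = ra != NULL && rb != NULL && strcmp(ra, rb) == 0;

    free(ra); /* free(NULL) is a no-op */
    free(rb);
    return equal;
}
```

A plain string comparison of $BUILD_DIR/xpcom/tests/../../dist/bin and $BUILD_DIR/dist/bin reports a mismatch, while the canonicalized comparison treats them as the same directory.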
Attachment #472256 - Flags: review?(benjamin)
Attachment #472256 - Flags: review?(benjamin) → review+
Attachment #431372 - Flags: approval1.9.2.12?
Comment on attachment 472256
workaround for mac

We're too late for .11 but that could be considered for .12. The other patch attached to this bug was landed and led to bug 584156 on mac only, so it was backed out. This patch is an additional workaround for mac, which avoids bug 584156, according to my testing.  This code is in a mac only part of the code so it doesn't affect anything but mac, and on mac, it may affect the value returned for the current process directory (depending on what exactly Normalize does on mac, and where Firefox is installed). This /shouldn't/ have an impact.
Attachment #472256 - Flags: approval1.9.2.12?
(In reply to comment #54)
> I don't know if Normalize() on mac can change things regarding case (since the
> fs is case insensitive)
...

Yes the default hfs+ setup on a Mac is case-insensitive but it isn't always the case.

If any code assumes that all Mac filesystems are case-insensitive, then it will break for those who choose to turn on the case-sensitive feature of hfs+, or for example where the files are mounted from a server with a protocol which is case-sensitive, such as smb or nfs.

[ not that this should be (hopefully) relevant to the bug... ]
We're early in the development for .12 so I am willing to get this landed if it lands soon. If there are any issues we'll back it out, likely for good.
Attachment #472256 - Flags: approval1.9.2.12? → approval1.9.2.12+
Attachment #431372 - Flags: approval1.9.2.12? → approval1.9.2.12+
a=LegNeato for 1.9.2.12. Please land only on the mozilla-1.9.2 default branch, *not* the relbranch.

Also, please be sure to land both patches (preferably as one patch).
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/5e114301d046
Status: REOPENED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Hello,

I had the same problem on SuSE Linux 11.1 with firefox 3.6.8.
After the second start with the add-ons "Firefox Sync",
downthemall or greasemonkey firefox directly terminates.

Here we use amd, not automount, and the home directories are accessed
over a link. (The "bug" can be seen in the file xpti.dat.)

The two patches fixed the problems for me.

regards,

Martin
I built the "default" head of mozilla-1.9.2 yesterday and I can confirm that it fixes the problem for me too. Thanks!