Closed Bug 76431 Opened 21 years ago Closed 19 years ago

Profiles need to be protected from running multiple instances of mozilla

Categories

(Core Graveyard :: Profile: BackEnd, defect)

defect
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED
mozilla1.0

People

(Reporter: gordon, Assigned: ccarlen)

References

(Blocks 1 open bug)

Details

(Keywords: dataloss, topembed+, Whiteboard: [adt3] [ETA 05/05] [driver:brendan])

Attachments

(5 files, 8 obsolete files)

17.39 KB, patch (darin.moz: superreview+)
26.07 KB, patch
26.42 KB, patch (brendan: review+, darin.moz: superreview+)
53.33 KB, patch
27.75 KB, patch (jesup: review+, jesup: approval+)
If multiple instances of mozilla run using the same profile, the cache files can 
become corrupted.

We need some kind of mechanism, like lock files used in 4.x, to prevent cache 
corruption.  This is probably a wider issue than just the cache.  I know we have 
this potential problem on Unix, but we need to ascertain whether it could affect 
Windows or Mac.
Target Milestone: --- → mozilla0.9.1
Gagan, who should fix this?  Profile?
In 4.x we protected the cache, global history, and certificate with some kind of 
lock file (at least on unix).
Component: Networking: Cache → Profile Manager BackEnd
Summary: Cache needs to be protected from running multiple instances of mozilla → Profiles needs to be protected from running multiple instances of mozilla
Whiteboard: [cache]
This is more than cache. We need to address this globally. Taking over till I
find the right owner.
Assignee: gordon → gagan
way more than just cache. I'm sure the mailnews files can't handle simultaneous 
use. If you make pref changes in both instances you will lose the pref changes 
in the one you shut down first. It wouldn't be hard to think of more examples.
per mtg with gagan, move to target milestone 0.9.2 and cc: ccarlen as this is in
profiles.

The mailnews folder locking bug is actually tracked as bug
http://bugzilla.mozilla.org/show_bug.cgi?id=12850.
Target Milestone: mozilla0.9.1 → mozilla0.9.2
over to ccarlen
Assignee: gagan → ccarlen
Whiteboard: [cache]
Making lock files happen on all platforms is going to take some thought. I would
think that this would be a problem on all platforms. I've seen this happen with
the global history file on Mac. Question is: should two instances of mozilla
even be able to run on the same machine? I don't think so. On Windows, there is
code to prevent this. CC'ing law for input on this.
Status: NEW → ASSIGNED
on windows you can lock a file for relatively exclusive use, and i'd expect 
that we use that mechanism.  It was my impression from the silly build config 
flock files that unix does have a real way to lock files instead of littering 
the directory tree w/ lock files.

does NSPR not have some way of asking the os if a file is in use?

how do you manage to get multiple versions of mozilla to run concurrently on 
macos? or do you just run 2 or more of: PPEmbed, Mozilla, Netscape6?

Personally I find the nc4.7x unix cache lock error to be absolutely annoying 
and stupid because i tend to get it constantly when for some reason the lock 
file managed to outlive the browser (i guess i crashed it some time ago).  If 
we're going to give a dialog like the one nc4.7x unix gave then we should have 
a button 'clear locks' w/ help explaining that it's possible for locks to 
outlive the browser instance in the unfortunate event that the browser or 
system crash.

There's supposed to be a readonly Cache implementation so if the cache db file 
is locked for write we should fall back to the readonly mode.

Mail folders should only lock for write for small time intervals when they 
actually need to write, although i'd be satisfied if we locked the currently 
open folder(s) ala nc4.  On w2k advanced server w/ terminal services, I often 
have the console running nc4.7x downloading mail and then use a terminal 
session to read mail from as many folders as the os/nc4.7x let me [which is all 
folders that aren't selected in the console instance -- selecting local mail or 
a newsserver in the console instance means i can read all mail folders in the 
terminal session instance].

outlook2000 really sucks because it aggressively locks all mail folders, i had 
20 'folders' on \\remote\share\*.pst and it locked all of them as soon as it 
launched, preventing all other computers from being able to open any of the 
folders until both it and they quit and then a new instance could run and 
access them...
> does NSPR not have some way of asking the os if a file is in use?

No.

On Unix you can look into fcntl() file locking.
In order to do this easily and XP, it would be nice if, on
nsILocalFile::OpenNSPRFileDesc(), it was possible to specify exclusive r/w
access on the file to be opened. Or, is that possible to do using the mode
parameter of that method. Where is the meaning of that mode parameter
documented? In NSPR?
>There's supposed to be a readonly Cache implementation so if the cache db file
>is locked for write we should fall back to the readonly mode.

If one browser instance had "write access" to the cache, it would likely be 
completely incoherent for another instance to try to read from the cache files.  
We would need to implement some sort of inter-application communication to 
arbitrate access to the cache.  A readonly Cache implementation is not a solution 
to this problem.
The "single instance" behavior on Win32 is primarily a performance thing (it
keeps you from having multiple processes chewing up resources when you
double-click on Internet shortcuts, for example).  The fact that it protects
against such things as cache corruption is just a fringe benefit.  I don't know
much about those issues.

I think we do still have exposure even on Win32 because profiles (and therefore
files) can be shared between Mozilla and Netscape builds.
> We would need to implement some sort of inter-application communication
> to arbitrate access to the cache.

This seems fraught with peril. Would you like access to your cache dependent on
an Apple Event communication between two apps on the Mac? Also, if one process
has exclusive r/w access to a file, what is to be communicated? What's the 2nd
process to do other than wait until the file is closed, fail, or take some other
course of action?

> A readonly Cache implementation is not a solution to this problem.

Right. But this problem is not only with the cache; it applies to anything within
a profile dir which can be opened with write access. Because of the number of
files involved, I say that having interprocess communication to arbitrate
access is complex and fraught with peril. Interprocess communication may be
useful if applied broadly. One process, before selecting a profile, could ask
others if that is its current profile and prevent the user from selecting that one.


Component: Profile Manager BackEnd → ActiveX Wrapper
Target Milestone: mozilla0.9.2 → M1
>undoing random damage to bug fields<

Ok, so lemme quickly enumerate all files in my profile:
bookmarks.html
* nc4 has a simple 'bookmarks have been changed on disk, reload?' prompt
* moz has bugs ~we don't close bookmarks until we quit~ so assuming the file 
locking we shouldn't have a problem (just no bookmarks, and since the reason i 
want multiple instances is mail i don't care if bookmarks are empty and 
disabled).
panels.rdf
* this file shouldn't be changing often and really shouldn't be opened for 
write access.
localstore.rdf
* this one is a big problem because most user clicks that aren't buttons result 
in changes to properties that we persist here.  I'd be perfectly happy if one 
instance locked for writing and all others could open read-only.
search.rdf
* this should be like panels although it should change even less often
mimeTypes.rdf
* you'll only be actively browsing in one instance and therefore only tinkering 
in one instance of this file, so we should be opening readonly until someone 
makes a change at which point we temporarily close, open readwrite, write, 
close, reopen readonly.
prefs.js
* this is a problem. either the arguments for mimeTypes.rdf or localstore.rdf 
apply
history.dat
* same as prefs.js except it changes frequently, so probably lean towards 
localstore.rdf behavior.
cookies.txt
* same as history.dat

someone already addressed these:
cert7.db
key3.db
secmod.db

panacea.dat
* this is some sort of news file, hopefully it changes rarely so ala 
mimeTypes.rdf
abook.mab
* this is probably the addressbook, so like mimeTypes.rdf
history.mab
* this appears to be a structure file, so hopefully it never changes, if it 
does it will have to match the behavior of whatever file it describes -- or it 
might limit the possibilities -- i don't know much about mork except that i get 
assertions from it.

these are directories:
US, contains: bookmarks.html panels.rdf localstore.rdf search.rdf 
mimeTypes.rdf -- all of which were already addressed, although i suspect this 
is for presets.

chrome, contains: user-locales.rdf user-skins.rdf -- users shouldn't 
force changes to these often, ideally they're smart enough to only have one moz 
running while they commit serious tinkerage -- behave like localstore.rdf

NewCache -- other people have better discussions.

there are already bugs filed about ipc arbitration we can either use them or 
just talk about everything here since we have everyone's attention.

fwiw your view of IPC is silly.  The reason i run two instances of nc4 mail is 
because i want one to always be running and downloading my mail, and one 
because i only need it occasionally [specifically when i'm remote] and want to 
read my mail -- yes there should be a risk of dataloss if i happen to lock a 
folder while the main mailer is trying to deliver mail but that's my risk. -- 
oh and the odds that your ipc will catch Terminal Services activity are really 
poor especially when combined w/ UNC and remote files. relying on system 
locking is much wiser than relying on artificial locking.

suppose I decide i don't like mozilla's meddling so i take my ext2 partition 
and make symlinks for all the critical files.  and then you use .lock files in 
the profile directory, but since they aren't the same profile directory you're 
contending for the same data w/o the locks _ever_ working. -- this is really 
not too inconceivable, take terminal service's approach to %temp%:

[console]
qz9w3.slt\NewCache>echo %temp%
J:\DOCUMENT\Josh\LOCALS~1\Temp

[terminal session]
F:\build\mozilla\xpinstall\packager>echo %temp%
J:\DOCUMENT\Josh\LOCALS~1\Temp\1

if you decided to stick locks in %temp% you'd never know there's another user 
logged in -- which is part of the point of terminal services (indeed there's 
another user besides the two of me logged in, and you certainly couldn't tell 
from this, because even dir /s /ad %temp%\.. wouldn't turn it up -- wrong 
username).

however, in the ext2 case assuming ext2 isn't crazy, if you lock the file it 
will appear locked regardless of how someone gets to it.
Component: ActiveX Wrapper → Profile Manager BackEnd
Target Milestone: M1 → mozilla0.9.2
Not gonna make 0.9.2
Target Milestone: mozilla0.9.2 → mozilla1.0
This is really bad.  Can we find someone else that has the time to fix this?
The reason that I pushed this one off is that it looks to be a huge amount of
work. The only short answer I can see is to lock a whole profile dir which would
protect everything in it. I think that would be too restrictive and it probably
should be dealt with on a case-by-case basis. Altering all consumers of files
within a profile dir is huge. Gordon, if you can see a shorter way to profile
safety, let me know.
I just want at least the same protection we had in 4.x.  That protected the disk 
cache, global history, and certificates.  I think making each component design 
and implement their own scheme for locking would needlessly duplicate effort and 
be more difficult to debug and maintain.

I'm not sure what happened here. I tried to move the bug along around the end of 
April, but it doesn't look like you got the bug until just last week.  I 
understand that doesn't leave you much time for it, but I think this could be a 
serious problem for many users.
at the very least, i'd like to see something similar to 4x's lock file.  would
this really be that difficult to implement?  ...not that i know anything at all
about how the profile code works ;-)
The profile mgr knows nothing about the files within a profile dir other than,
in some cases, their names. It implements nsIDirectoryServiceProvider and
provides the locations defined in the 2nd part of
http://lxr.mozilla.org/seamonkey/source/xpcom/io/nsAppDirectoryServiceDefs.h.
Other components request these keys through the nsIProperties iface or they just
get the key for the current profile dir and append from there. I suppose the
profile mgr, when it got the first request for a location that we say should be
locked, could do the lock. The problem is knowing how long to maintain the
lock unless some changes were made to nsIDirectoryService and/or
nsIDirectoryServiceProvider. 
interesting... couldn't the locking/unlocking be triggered using the profile
change observer stuff? 
in fact, we need to try to acquire the lock when the user picks a profile so we
can gracefully fail and allow the user to select a different profile.  even better
than that:  the profile manager dialog should gray out the profiles that are
currently in use and only allow unlocked profiles to be selected.  is this a
possibility or just feature creep?
Darin, yes it could be triggered by the profile change notifications. Actually,
since it is the profile mgr sending those notifications, it would be making a
long distance call to itself :-) It could be done more directly than that if the
profile mgr is maintaining a lock on the whole dir.

Also, that suggests that the lock is in place for the whole time that the
profile is in use. That's certainly easier to do if that's OK with people. I
thought what we were after were short term, only when necessary locks on each
individual file. I think I mentioned a while back that it's possible (easily) to
lock an entire profile dir and that idea was rejected.
i think we should go for the simple "lock the entire profile" approach and then
open a bug to improve it.  at least it would solve the problem... though perhaps
not optimally.  as it is, we risk all sorts of corruption and potential crashes
(i would think) by not locking the profile to one instance of the browser.
I agree.
OK, that's what I'll do - consider the whole dir locked for the time that it's
set to be the current profile. 
*** Bug 98016 has been marked as a duplicate of this bug. ***
*** Bug 98038 has been marked as a duplicate of this bug. ***
This should be given a higher priority. When you run two versions at the same
time, you can do things like read mail in a session and then destroy the entire
mail config in prefs.js.

*** Bug 83283 has been marked as a duplicate of this bug. ***
*** Bug 101902 has been marked as a duplicate of this bug. ***
-> 0.9.5. I can deal with this (at the soonest) in about a week.
Target Milestone: mozilla1.0 → mozilla0.9.5
See bug 102947.
*** Bug 102947 has been marked as a duplicate of this bug. ***
-> 0.9.6
Target Milestone: mozilla0.9.5 → mozilla0.9.6
Mass move to 0.9.7
Target Milestone: mozilla0.9.6 → mozilla0.9.7
mass move to 0.9.8
Target Milestone: mozilla0.9.7 → mozilla0.9.8
Shouldn't that be critical since it can cause data loss? Maybe there should be 
a warning in the release notes about this as long as it is not fixed?
i recently mistakenly launched two instances of mozilla w/ the same profile.  i
closed one and continued to use the other.... thinking that since i hadn't
really done anything it should be ok.  but, as a result, some pages didn't load
completely... and things just generally were bad.  of course, i shouldn't be
running two instances of the browser under the same profile, but i imagine the
scenario i just described has happened to more than a few people out there.  at
any rate, it just makes mozilla look bad.
+dataloss (coming from my experience). any other keywords we need?
Keywords: dataloss
how about nsCatFood, if that still means something.
Keywords: nsCatFood
qa -> ktrina
QA Contact: tever → ktrina
ATM, cache is the big issue ... see bug 105746.
Blocks: 119086
Working on this. I'm using a lockfile in the profile directory when it's in use.
Doing this so that it's impervious to crashing (lingering lockfiles) and on all
platforms is a bit of a chore but is coming along.
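
To illustrate what "impervious to crashing" can mean on Unix, here is a small C sketch, assuming an fcntl()-style lock on the lock file. The function names and the fork/SIGKILL test harness are hypothetical and only meant to show the idea: the lock file stays on disk after the holder dies, but the kernel drops the lock itself, so the next instance is not blocked by a leftover file.

```c
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Take an exclusive, non-blocking fcntl() lock on the whole file. */
static int lock_whole_file(int fd)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;   /* l_start = 0, l_len = 0: entire file */
    return fcntl(fd, F_SETLK, &fl);
}

/* Hypothetical experiment: a child locks the file and is killed without any
 * cleanup. Returns 0 if the parent can lock the file afterwards (no stale
 * lock), 1 if the lock is still held, -1 if the experiment could not run. */
int lock_is_stale_after_crash(const char *path)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        int fd = open(path, O_RDWR | O_CREAT, 0600);
        if (fd == -1 || lock_whole_file(fd) == -1)
            _exit(1);
        raise(SIGKILL);       /* simulate a crash: no handlers, no unlock */
        _exit(2);             /* not reached */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFSIGNALED(status))
        return -1;

    /* Opening without O_CREAT shows the lock file itself is still there. */
    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;
    int stale = (lock_whole_file(fd) == -1);
    close(fd);
    return stale;
}
```

The contrast is with a lock that consists only of the file's existence, which is what leaves users staring at a "profile in use" dialog after a crash.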
In progress, but not gonna make 0.9.8. -> 0.9.9
Target Milestone: mozilla0.9.8 → mozilla0.9.9
*** Bug 122553 has been marked as a duplicate of this bug. ***
This patch provides locking of the entire profile dir on three platforms in a
way that should be impervious to lock'n'crash. It doesn't do anything fancy with
the UI: when you select a profile that is locked you get a dialog saying it's
in use. Tested on Mac CFM, Windows, Linux & Mach-O. I had to hack some code on
Mac & Win to even be able to get in the position of sharing a profile with
another instance. There is code on both those platforms to prevent the app from
running if another instance is running.
It would be nice(tm) if, when Mozilla discovered that a profile was locked,
it would send a remote invocation to the instance already running in that
profile to cause it to open a new window.  (Akin to what happens,
on Unix, when you do "mozilla -remote 'openURL(, new-window)'" - except
that it ought to load the user's home page instead of about:blank.)

This would have the effect of seamlessly mapping "run Mozilla again on
an active profile" to "open a new Mozilla window from that instance",
which is much more user friendly than popping up an error dialog box.
Zack, see comment #24 - #26. What you propose would be nice and it would match
the Windows behavior but, for now, this will prevent data loss. What you want
should be an RFE to XPApps.
I'm not seeing the relevance of comment 24-26, but okay, I'll file
a separate RFE and mark it dependent on this bug.
So now it'll be impossible to test a debug build of mozilla on a machine that's
already running a release build, without making multiple profiles?  Yuck.

Is there really no hope for just getting a readonly instance like 4.x used to do?
Akk: just create multiple ~ dirs.  My computers tend to have at least 2 if not 
3.

otoh, at least one of those also has multiple profiles.

Whatever happened to the idea of readonly profiles?  I should be able to use 
mozilla at a severe performance penalty with the stipulation that it won't 
write anything to my profile dir?
OpenVMS doesn't do flock (byte range locking). Does anyone mind if for OpenVMS 
(builds with XP_UNIX and VMS both set) we do something a little more simple? A 
remove() followed by an open(foo,O_CREAT|O_EXCL,...) will do it. On OpenVMS you 
can't delete a file if someone else has it open, so if the open() succeeds, you 
hold the "lock".
That sounds good - please do. If that's true (the remove will fail) for all
Unixes, might as well do that for all since it's simpler. 
I believe it's normal on UNIX for a remove() to succeed, even if someone else has 
the file open. So unfortunately this solution can't be for all XP_UNIX builds, 
only OpenVMS.
That's my understanding as well: Unix allows files to be unlinked
even if another process has them open.  (It's not really a delete -
the file stays around until the other process closes it; also, there
might be other links to the file.)
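
A short C sketch of why the remove()-then-open(O_EXCL) idea does not give exclusion on a typical Unix. The function name is made up for illustration, and for simplicity the "other instance" is just a second descriptor in the same process; the unlink behaviour is the same when a different process has the file open.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical check. Returns 1 if remove() followed by an exclusive create
 * succeeds while the original file is still open (so the scheme provides no
 * exclusion on this system), 0 if the exclusive create is refused, and -1 if
 * the experiment itself could not be set up. */
int remove_then_excl_open_succeeds(const char *path)
{
    /* The "first instance" creates the lock file and keeps it open. */
    int holder = open(path, O_WRONLY | O_CREAT, 0600);
    if (holder == -1)
        return -1;

    /* The "second instance" tries the remove() + O_EXCL sequence. */
    remove(path);
    int second = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    int result = (second != -1);

    /* The first descriptor is still usable: it refers to the unlinked file,
     * which lives on until its last descriptor is closed. */
    if (write(holder, "x", 1) != 1)
        result = -1;

    if (second != -1)
        close(second);
    close(holder);
    remove(path);
    return result;
}
```

On Linux this returns 1: both "instances" end up holding a file they each believe is the lock. On a system where remove() fails for open files, as described for OpenVMS above, the same sequence would return 0.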
Blocks: 122698
colin, can you update the patch with the OpenVMS changes?
I don't have a current tree I can try this in, in order to submit a revised 
patch. But basically I would need this code BEFORE the "#elif defined(XP_UNIX)" 
line:

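#elif defined(VMS)
    // OpenVMS doesn't do byte-range locking, so rely on exclusive creation.
    nsXPIDLCString filePath;
    rv = lockFile->GetPath(getter_Copies(filePath));
    if (NS_FAILED(rv))
        return rv;

    // remove() fails on OpenVMS while another process has the file open, so
    // the O_EXCL open below only succeeds when nobody else holds the "lock".
    remove(filePath.get());
    mLockFileDesc = open(filePath.get(), O_WRONLY | O_CREAT | O_EXCL, 0666);
    if (mLockFileDesc == -1)
    {
        NS_ERROR("Failed to open lock file.");
        return NS_ERROR_FAILURE;
    }

    mLockFile = lockFile;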
Attached patch with OpenVMS addition (obsolete) — Splinter Review
Attachment #67135 - Attachment is obsolete: true
Wan-Teh, could you review the directory locking scheme?
Micheal, can you see what would need to be done here for OS/2?
I reviewed the directory locking scheme on Unix, Windows, and Mac.
I only found one problem with the design on Unix.

On Unix, the "parent.lock" file should not be deleted.  (I described
a problematic event sequence to Conrad in private communication.)
Since the lock file will be permanent in the profile directory, I
also suggest that it be renamed so that users won't be confused to
see it in the profile directory.

The designs for Windows and Mac are fine.
The OpenVMS part looks good. Thanks for including in the patch. I will test once
all this is checked in.
Gordon, can you review the Mac impl? Wan-Teh went through the Unix & Windows but
deferred the Mac part. Sadly, because of some code in nsNativeAppSupportMac
which prevents the app from even launching if another instance is running, this
code won't get the chance to be used. I had to comment it out to test. After
this is landed, that can be changed.
I don't quite understand.  After this is landed, what can be changed?  I didn't
really think this bug was intended to enable multiple instances to run
simultaneously.  I viewed it more as protecting profiles on those platforms that
didn't already have an existing mechanism of protection.
Attached patch with OS/2 addition (obsolete) — Splinter Review
Attachment #67499 - Attachment is obsolete: true
When specifying a profile from the command line on start (i.e. "mozilla -p
name"), mozilla brings up the profile manager if the profile is locked. 
However, the user does not know why they got the profile manager rather than
starting up into the browser.  Would it be possible to have the alert pop up in
this case also, letting the user know that that profile is locked?
> I didn't really think this bug was intended to enable multiple 
> instances to run simultaneously.

Then is there another bug for that?  4.x and earlier versions always worked fine
in that mode (with everything after the first instance being read-only), so it
would be a shame if we can't do as well as 4.x.  

If it's really that hard in our new cache model to deal with a readonly cache,
how about no-cache for the second instance, and readonly for everything else
(like prefs)?
Gordon, On the Mac anyway, its horrible mechanism of just quitting on detecting
another instance running could be changed. I, for one, want to be able to run
one instance with one profile and still be able to run another one as long as I
use a different profile.
akkana: there's far more than just the disk cache to consider (cert db, cookies
db, etc.), and a readonly disk cache would mean that plugins that require
stream-as-file simply wouldn't work (eg. flash).
Conrad: I think another bug should be opened for enabling multiple instances to
run.  The behavior I would like to see is if a different profile is chosen, you
get a separate instance of the browser, but if a currently used profile is
chosen, you "switch" to that instance and possibly open a new window.  I'm not
sure who should own that bug though.
That seems strange to me. But, even if we did that, we still need this mechanism
to know that a profile is in use.
i'd have to agree with gordon... that does sound like a very user friendly
solution to this problem.  you seamlessly get a browser window w/ the profile
you requested.  most users won't understand what it means for a profile to be in
use or locked or whatever you call it.
Gordon, Darin,

Your proposed solution won't work when the profile is
on a shared file system and the user is simultaneously
logged in on multiple machines, trying to use the same
profile.

I suggest you consider a solution that uses file locking
to protect access to the files that must persist across
sessions and create one-time use copies of the files
that do not need to persist across sessions.  The cert
database and bookmarks would be files that must persist
across sessions and the cache files would be files that
do not need to persist across sessions.
I'm sorry.  I didn't mean to imply this patch was not desired.  I was talking
about *additional* functionality.

In the case where there is a shared file system and the profile is in use by
another machine, then there is no choice but to inform the user of this.

Wan-Teh, why do you think cache files don't need to persist across sessions?
Gordon,

If cache files don't persist across sessions, performance
will suffer, but the browser should still function.  So
I consider cache file persistence a "highly desirable"
but not a "must have".

wtc: tell that to the flash plugin ;-)  unfortunately, the way we currently
implement stream-as-file for plugins requires a disk cache.  this could of
course be changed, but just thought i'd mention it.

ccarlen: likewise, i'm not in any way opposed to the current patch... was also
just thinking about where we might go from here.
Conrad,

I believe that Unix fcntl() file locking only protects
access to a file by processes on the same machine.
Therefore your scheme can't protect the profiles from
running multiple instances of mozilla on different
machines that share the profile directories.
Wan-Teh - It depends on whether the NFS server supports locking. Here's
something I found on the subject: http://www.spinnaker.de/linux/nfs-locking.html
Dear All,

Sorry to butt in on this thread of conversation, but I have been following with
interest the conversation about locking problems (I have experienced race
conditions myself (see bug 124623)).

Locking on a *nix flavour system is not a problem, fcntl works. As an example,
mail programs like Mutt and Pine implement locking for exactly this reason, so
that instances on different machines cannot work against each other. OK, so this
sorts out Linux et al.

The only concern I have is that this is fairly *nix specific. How will this sit
with Windows? I have very little knowledge in this area. Maybe the multi user
scenario cannot exist in the same way? If it can, what solution then?


Paul
fcntl() works on NFS/AFS. it's flock() that's local-only.
> How will this sit with Windows?

On Windows, you're less likely to run into a profile in use by another instance.
There, we use another mechanism to ensure that only one instance runs at a time.
The 2nd, on detecting the 1st, sends a msg to the 1st asking it to open the
requested URL or a new window and then quits. This is not done between instances
of mozilla and NS6. In that case, data corruption due to profile sharing is
still possible and this patch will prevent that.

On Mac, though you didn't ask ;-), it's impossible to run into a profile in use
by another instance of mozilla or NS6. There, the program just puts up an alert
and quits on detecting another instance. While this patch doesn't change that
(horrible) behavior, it at least makes it possible to change it.

conrad: what about when a windows user's profile is located on a SMB share? 
what happens if the user runs mozilla from more than one machine w/ the same
profile?
1. AFS @umcp has reported corruption between pine + procmail iirc.
2. On WindowsNT Terminal Server or W2k w/ Terminal Services, or any NT w/ 
switcher.exe you can easily run multiple instances. I tend to run multiple 
mozillas.
Darin: I don't know. I'm relying on CreateFile() to behave in the same way WRT
shareMode on an SMB share as locally. In the msdn doc for that routine, I don't
see any caveats that would make me think it wouldn't. 
ccarlen: sounds good.
*** Bug 124623 has been marked as a duplicate of this bug. ***
*** Bug 126563 has been marked as a duplicate of this bug. ***
Why not make a copy of the cache as it exists when the second instance starts up
and then make modifications to the local cache copy ? You can also unlink all
the  files that are copied so that they won't stay around if the second instance
exits abnormally. There are of course a whole lot of race conditions possible
with persistent files. What happens if I add a new bookmark in the new instance
and delete a couple of bookmarks in the first instance ? Which one gets to the
disk ? In which order ?

In general it would be prudent that you warn the user (and not follow the
Netscape 4.x behaviour of making the files read-only) and proceed with copying
the files to a local profile directory and unlinking them immediately so that
any crashes are not likely to leave behind stale files. Also file locking is not
very reliable across NFS et al. It would be best to not rely on them as they
tend to work depending on the quality of implementation of the underlying OS. 
copying the disk cache can be prohibitively expensive.
If it only happens when a second instance is opened, I'd think it would be OK.
have you ever tried deleting your disk cache on a mac or a slow machine for that
matter? ;-)
Why not just start with a fresh copy of the cache ? This would be better than
not using any cache at all.
*** Bug 127198 has been marked as a duplicate of this bug. ***
Keywords: mozilla1.0
Target Milestone: mozilla0.9.9 → mozilla1.0
Moving Netscape owned 0.9.9 and 1.0 bugs that don't have an nsbeta1, nsbeta1+,
topembed, topembed+, Mozilla0.9.9+ or Mozilla1.0+ keyword.  Please send any
questions or feedback about this to adt@netscape.com.  You can search for
"Moving bugs not scheduled for a project" to quickly delete this bugmail.
Target Milestone: mozilla1.0 → mozilla1.2
is there any hope this can be pushed back to moz 1.0?
I agree with comment 96. Even a solution that just prevents a second instance 
from starting up is way better than profile/cache/mail corruption.
yup... 1.0 is the right target.
Keywords: topembed+
Target Milestone: mozilla1.2 → mozilla1.0
Alright then, Can I get r=/sr= on the patch? After the current patch was posted,
which at least prevents dataloss, the discussion turned to things way beyond the
summary of this bug. If we can agree that preventing dataloss is acceptable for
now, let's review the patch and take the discussion of sharing profiles, etc. to
another bug. 
I also see the problem with bookmarks, one created in an instance doesn't show
when you close both. build id 20020319 on Linux
*** Bug 134517 has been marked as a duplicate of this bug. ***
So, there's a 2 month old patch here. Does it still apply/work? Who needs to be
poked to review this?

As much as I'd love the ns4 behaviour of just disabling the cache/session
history, this is still better than the current behaviour.
*** Bug 128990 has been marked as a duplicate of this bug. ***
Patch for Linux, OpenVMS, OS/2. Dups with Windows:
OS -> All
OS: Linux → All
*** Bug 136096 has been marked as a duplicate of this bug. ***
*** Bug 108849 has been marked as a duplicate of this bug. ***
*** Bug 136213 has been marked as a duplicate of this bug. ***
This is really, really annoying - I just clobbered my global history again. :(

nominating for nsbeta1 - this is important to get in sooner rather than later,
because the patch can only protect against corruption with other builds with the
patch in (else the lock file won't be created/checked)

(this also means that I have to play around a bit to be able to test this - I'll
try tomorrow)
Keywords: nsbeta1
You probably have to e-mail drivers@mozilla.org to get approval. (I'm not sure
of the exact process.)
Keywords: patch
It was said, on Windows there is already some protection against running
multiple instances, and therefore no risk for profile corruption.

But note, I learned on #mozilla IRC that there is an easy way to run multiple
instances accidentally, even on Windows.

It is possible to install and run Mozilla plus a commercial distribution based
on Mozilla on the same machine. If the distributor uses the default settings of
Mozilla, and stores the user's profiles in the same place, then both versions
will share the profiles.

Therefore, if a user on Windows runs Mozilla and a commercial distribution at
the same time, the profile will become corrupted, too.
Ya know, I asked for review of this patch over two months ago. Instead of
review, the discussion turned to how multiple instances should share the same
profile.

Wan-Teh reviewed the locking scheme. Gordon, Darin, can you do overall review?
Comment on attachment 68207 [details] [diff] [review]
with OS/2 addition

>Index: src/nsProfileAccess.cpp

>+// **********************************************************************
>+// class ProfileStruct
>+// **********************************************************************

class nsProfileLock ??


>+nsresult nsProfileLock::Lock(nsILocalFile* aFile)
>+{

>+    nsCOMPtr<nsILocalFile> lockFile;
>+    rv = aFile->Clone((nsIFile **)((void **)getter_AddRefs(lockFile)));

ugh!  why do we even have nsILocalFile?


looks good to me.  sr=darin
Attachment #68207 - Flags: superreview+
Kaie: you should probably read all of my comments, they describe various issues 
as well as ways to run multiple mozillas on windows nt.
kaie: That issue would be solved by using different profile dirs for the
commercial and non-commercial version (bug 107694). They shouldn't be using the
same profiles anyway due to minor incompatibilities.
Blocks: 138000
So I wrote the MozillaClassic Unix locking in 2.0, and it's still in 4.x.  It
has the advantage of working with NFS servers who have no or broken lock
managers, and it's 4.x compatible (which was the point raised in the C|net
review against Mozilla).  See
http://lxr.mozilla.org/classic/source/cmd/xfe/mozilla.c#3659.

/be
If we're going to ship this in 1.0 then we need to get it in for RC2.
Keywords: mozilla1.0 → mozilla1.0+
Attached patch updated patch
The old patch suffered some bit rot. The new patch has suggestions from
Wan-Teh's review, the nsILocalFile member var in the lock only exists for Mac,
and it was just updated for the file API changes.
Attachment #68207 - Attachment is obsolete: true
When one instance is open, mozilla can just act normally. When 2 or more are 
open, one Mozilla instance (by random or first open) can be chosen as the 
server, and it can negotiate all WRITE access to profiles. Read access could 
be done without any problems or negotiation. The trick would be to keep writes 
to a minimum for the main profile dir from the non-server. This could be done 
by doing things like having a separate cache for the new instance and having 
both instances look at both caches.
But of course, that would be a long-term fix and should probably be a separate 
bug since a quicker fix exists.
*** Bug 141546 has been marked as a duplicate of this bug. ***
Keywords: nsbeta1 → nsbeta1+
Whiteboard: [adt3]
Blocks: 119357
I've trashed data due to dueling profiles.  This is SO easy to hit in Unix.  I
do it all the time if I'm not careful to use -P.  
Whiteboard: [adt3] → [adt3] [ETA 05/05]
Comment on attachment 81285 [details] [diff] [review]
updated patch

sr=darin (still looks good to me)
Attachment #81285 - Flags: superreview+
I don't see the code for unix that checks to see if there's already a valid
process running.  Brendan's code did this.  Conrad, were you going to do work on
that?  I thought you were after our email exchange.
Checking for a running process doesn't make sense if the profile is on a
remotely mounted filesystem, does it?  That's pretty common in the unix world,
at any rate.
Read my 2.x-era code please, then ask questions.  The pid is qualified by IP
address.

/be
Ok, I looked over brendan's old 4.x code extensively, and this patch (looking at
the Unix implementation - some of this might apply to the others, but I don't
know them as well).

The new code depends on file locks (fcntl(...,F_SETLK,...)) being valid across
network mounts.  Almost all network FS's including Win32 servers (via samba)
support fcntl() locks.  When the process dies, the lock will be released.  This
doesn't require our code to know the IP address (or that they be unique, which
may be an issue with multiple NAT subnets sharing IP's), and it's immune to PID
rollover (not a big deal, but possible).  Added bonus: if the lock exists and
was created by another machine, if the process there is dead (crashed, lost
connection, etc), we don't have to manually clear the lock.
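
For readers unfamiliar with fcntl() locks, here is a minimal sketch of the technique described above. It is not the code from the patch: the helper name TryFcntlLock and the choice to lock the whole file are assumptions made for illustration.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper (not from the patch): try to take an exclusive advisory
// lock on the file at `path`. Returns an open descriptor on success; the
// caller keeps it open for as long as the profile is in use. Returns -1 if
// the file cannot be opened or another process already holds the lock.
int TryFcntlLock(const char* path)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0666);
    if (fd < 0)
        return -1;

    struct flock lock = {};
    lock.l_type = F_WRLCK;     // exclusive (write) lock
    lock.l_whence = SEEK_SET;
    lock.l_start = 0;
    lock.l_len = 0;            // length 0 means "to end of file"

    // F_SETLK does not block: it fails with EACCES or EAGAIN if another
    // process holds a conflicting lock.
    if (fcntl(fd, F_SETLK, &lock) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The kernel drops the lock when the owning process exits, which is why no manual cleanup is needed after a crash. One caveat: closing any descriptor the process has open on that file also releases the lock, so the lock file should not be opened and closed elsewhere in the same process.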


The old code is a complete implementation for handling profiles on shared
servers using symlinks, and using the IP address and PID to identify if the old
process is still alive.

The only hole I can find with his old implementation is that not all filesystems
support symlink().  Strange from a Unix perspective, but true.  Among others, I
believe not all shared Windows partitions (which can be mounted from Unix w/
samba) support symlink.  From the smbclient manpage:

>>>>
symlink source destination:
This command depends on the server supporting the CIFS UNIX extensions and will
fail if the server does not. The client requests that the server create a
symbolic hard link between the source and destination files. The source file
must not exist. Note that the server will not create a link to any path that
lies outside the currently connected share. This is enforced by the Samba server.
<<<<

Basically, all Win32 servers mounted with smb would not support the symlink
semantics needed by brendan's code.



Conclusion: the new code is better and will work across a wider set of network
conditions, and avoids several edge-case holes in the old code.
good to hear... i never liked the way 4x handled profile locking.  at least on
linux, it never seemed to clean itself up after a crash.  i always had to go
into my profile and remove the lock "symlink" file manually :(
The ns4 dialog code would tell you what machine/pid was being used, so that you
could find it and kill it if needed.

At my university, programs like netscape run on a random machine in a cluster,
so knowing which one I have to ssh to to kill it off would be very helpful.
The problem is that lots and lots of nfs servers don't support locking.  In
fact, network servers will generally lie and just say "yeah, you've got the
lock" even when you don't.  If I had to choose between that and supporting
filesystems that didn't support symlinks, I would have to choose the former.
It's also possible to get hangs waiting for an NFS lock, in certain cases of
mismatched servers (some combinations of NFS versions between SGI and Sun
machines will cause this).
Ugh.  I was afraid of that.  (I did some searches to try to check, but I'm not
an NFS expert.)  As for 4.x, I've more than a few times had to remove lock files
when power or connectivity was lost, etc.

It is a real problem with the 4.x implementation that (a) there's no cleanup of
lock files on a crash for sure, and (b) it doesn't work with Windows servers. 
For example, IT here doesn't back up individual machines (especially not Unix);
it only backs up important servers plus the shared Windows server which everyone
has a directory on (and Unix machines can mount via samba).  This is not an
unusual case.

The alternative to file locks vs symlinks is to use some combination, or to
emulate symlinks via ordinary files (which might be slightly more complex).

Basically, if the symlink() call returns an error that indicates the operation
isn't allowed, either try an fcntl() lock (samba supports them, and it's the
primary non-symlink FS, though there are others like Linux FAT partitions(!)),
or write a file with the symlink info in it.  symlink() is almost (in this
usage) an atomic way to create and write a small file; there are other ways to
do that.
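
A rough sketch of the symlink-first, fcntl-fallback idea in the paragraph above. The function and enum names are hypothetical, and treating every symlink() error other than EEXIST as "symlinks are not usable here" is an assumption for illustration, not necessarily what a real patch would do.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

enum LockResult { kLockedBySymlink, kLockedByFcntl, kLockBusy, kLockError };

// Hypothetical helper (names are illustrative, not from the patch).
// `signature` is the "ip:pid" string stored as the symlink target.
// On kLockedByFcntl, *outFd holds the descriptor that owns the lock.
LockResult AcquireProfileLock(const char* linkPath, const char* filePath,
                              const char* signature, int* outFd)
{
    *outFd = -1;

    // symlink() creates the name and stores the target in one atomic step,
    // and fails with EEXIST if the name is already there.
    if (symlink(signature, linkPath) == 0)
        return kLockedBySymlink;
    if (errno == EEXIST)
        return kLockBusy;   // caller may still decide the lock is stale

    // Assumption: any other error means symlinks are unusable on this
    // filesystem, so fall back to an fcntl() lock on an ordinary file.
    int fd = open(filePath, O_WRONLY | O_CREAT, 0666);
    if (fd < 0)
        return kLockError;

    struct flock lock = {};
    lock.l_type = F_WRLCK;
    lock.l_whence = SEEK_SET;   // l_start = 0, l_len = 0: whole file

    if (fcntl(fd, F_SETLK, &lock) == -1) {
        int err = errno;
        close(fd);
        return (err == EACCES || err == EAGAIN) ? kLockBusy : kLockError;
    }
    *outFd = fd;
    return kLockedByFcntl;
}
```

With this shape, a 4.x-style symlink lock is used wherever the filesystem allows it, and the file lock only comes into play on mounts where symlink() cannot work.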

BTW, some of these network issues probably affect other platforms in different
ways when they use a shared profile.
The 4.x code does invalidate the host:pid-lock if the contending instance is
running on host and the pid is not valid.  To do better, we'd need an "are you a
netscape instance, and are you alive?" inter-process protocol.  It didn't seem
worth it in 1995, but I was hacking near the end of the 2.0 cycle -- with more
time (1.1alpha), more could be done.

For mozilla1.0, I like the symlink-with-fcntl fallback approach.

/be
New patch soon?  We want this for 1.0, rc2 ideally (but that means today).

/be
Whiteboard: [adt3] [ETA 05/05] → [adt3] [ETA 05/05] [driver:brendan]
This all seems rather late for 1.0. I would have thought given where we were, in 
order to obtain stability, new functionality would NOT be allowed on the 1.0 
branch. Only bug fixes for serious bugs should be allowed in.

I just know the symlink() fcntl() stuff isn't going to work on OpenVMS. If you 
put that code in now and then toss the RC2 release over the wall, I'm not going 
to have any time to fix it.

Release Engineering 101: If you want a stable release, you can't accept new code 
in the end game.
colin: the 4.x code is not "new".  Likewise, we need to fix this 4xp bug for any
mozilla1.0 that wants to succeed 4.x.  If you can't deal with it, then you can
live with the profile corruption platform on your platform -- but why should any
other platform have to, especially when those platforms ran the 4.x code just
fine?

/be
s/profile corruption platform/profile corruption hazard/

I passed release management 101 years ago, but thanks anyway.

/be
Colin, this is a bugfix for a serious bug. I think it's probably in the top 3
worst bugs that we have on Mozilla. Uncountable users are hosed by corrupted
profiles because they ran two mozillas or a mozilla and a netscape at the same
time causing profile corruption.  Feel free to query for all the Worksforme bugs
that have "new corrupt profile" (substrings) in the comments to get a feel for
how widespread this problem is. You'll find two primary categories of
corruption. The first is caused by running an old profile with a new build where
we broke compatibility (this sucks and needs to stop). The other major category
is running two apps on the same profile and this is only going to get worse as
more people use Mozilla and Netscape. This is a bug, and one of our worst bugs.
Categorizing the fix for one of our worst bugs as a feature doesn't do anyone
any good. By your definition not crashing on startup is a feature we shouldn't
respond to on the 1.0 branch. I don't buy that "no new features" line here. 
Asa: to be fair, colin said "no new code in the end game", not "no new
features". My point is that we're taking new code to fix bad bugs, even if the
new code is a few lines in a small patch to old code.

This bug requires more than a few lines, but ccarlen has sunk the cost, and the
patch is testable and reviewable in the time we have left for 1.0.  So I agree
with you, but I think we need to be clear that "new code" is not a disqualifier
at this stage in the end game -- you have to look at the particular new code,
and at the severity of the bug.

/be
SPAM/Stupid comment:
We get many incoming bugs about profile corruption.
The causes:
1. Running 2 Mozillas at the same time
2. Broken compatibility
3. Bugs that caused the profile corruption
(recent bug: downloadmanager.rdf grows and Mozilla gets slower over time)
4. Bugs with installed themes/language packs (should already be fixed with the
version string)

We should start fixing those bugs. I can't count the bugs I resolved because the
profile was corrupt. And a "normal" user doesn't know how to import his mail (we
must fix this bug!) from his old profile.
And many users also don't know that they can fix many "bugs" with a new profile.
(I repaired my profile 6x in 1-2 years)
Mozilla looks really bad with this profile corruption.


Blocks: 143200
No longer blocks: 138000
I worked in the 4.x compatibility as best I could -- I'll test as soon as my
build is done, others please test too and report results, here or via mail.

/be
Thanks. I'll try it out on OS X Mach-O (XP_UNIX)
operator= wasn't complete.  No longer crashes.

Also, set the global mCurrentProfileLock when we lock the default profile dir
(if we succeed), so we don't lock/unlock/relock on startup

Tested lightly; I'll now start testing nasty cases
Attachment #83096 - Attachment is obsolete: true
Attached patch Fixed an oops that slipped in (obsolete) — Splinter Review
Attachment #83132 - Attachment is obsolete: true
Fixed some more problems with handling when the file is locked.

Patch ready for review.  Tested with both 4.x-style symlinks, and with
filelocks if the FS doesn't support symlinks (for example, Win32 fileservers
via smbfs (samba)).  Let's get it in.
Attachment #83136 - Attachment is obsolete: true
Note: I can only test this on Unix; we'll need coverage on other platforms (Mac
OS9/OSX (unix may cover OSX(?)), Win32, OS/2, etc).  I want to land this on the
trunk ASAP, and branch once nsIFile is in branch.
Severity: normal → major
Check-in-able diff -u coming next, don't review it unless you like trying to
see past the indentation and tab-elimination changes.

/be
This is the one to apply to an up-to-date trunk tree, while cd'd to profile
(and using POSIXLY_CORRECT=1 patch -p0 as usual).

/be
Tried it out on OSX Mach-O. Worked well. Tested killing the process which held
the lock and it was released properly.
As far as the impls for Mac CFM and Windows, that was tested when I made the
original patch. It hasn't changed. If somebody else wants to beat on it though,
please do. My only reservation here is that, on Windows, the code relies on the
system closing the file handle when the process terminates (crashes). While I'm
confident of that on NT, 2000, XP, I'd like to see it tested on Win95 or 98. 
The exists var declared at line 540 in nsProfile.cpp is used without
necessarily being set.

/be
Attachment #83179 - Attachment is obsolete: true
Other use of exists variables all follow unconditional sets (via out parameters
of methods that always set their out params on success -- early returns for
XPCOM failure ensure that other exists uses follow only success).

/be
Attachment #83180 - Attachment is obsolete: true
Did you try it with a kill -9?  I'm not sure if atexit handlers are run in that
case. (Are they run for a SIGTERM? I can't remember.)
atexit handlers aren't (and can't be) run in response to SIGKILL.  They can (and
should) be run in a SIGTERM handler, the likes of which we should install if we
haven't already.  Does NSPR already do that, or should we whip something up?

(Having to manually clear out lock files after a kill -9 is a time-honoured Unix
tradition.)
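
One way a SIGTERM handler of the kind suggested above might look, as a sketch only. It swaps in a slightly different technique: instead of running the atexit handlers by calling exit() from the handler (exit() is not async-signal-safe), it removes the lock directly with unlink() and leaves with _exit(). The names gLockPath, RemoveLockOnSignal and InstallLockCleanup are made up for this example.

```cpp
#include <cassert>
#include <cstring>
#include <signal.h>
#include <unistd.h>

// Path of the lock to remove; filled in once, before the handler is installed.
static char gLockPath[1024];

// Signal handler: only async-signal-safe calls (unlink, _exit) are used here.
extern "C" void RemoveLockOnSignal(int signo)
{
    if (gLockPath[0] != '\0')
        unlink(gLockPath);      // removes the symlink itself, not its target
    _exit(128 + signo);         // conventional "killed by signal" exit status
}

// Hypothetical helper: remember the lock path and install the SIGTERM handler.
// Returns false if the path does not fit or sigaction() fails.
bool InstallLockCleanup(const char* lockPath)
{
    if (strlen(lockPath) >= sizeof gLockPath)
        return false;
    strcpy(gLockPath, lockPath);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = RemoveLockOnSignal;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, nullptr) == 0;
}
```

Nothing comparable is possible for SIGKILL, which cannot be caught, so a lock left behind by kill -9 still has to be detected as stale or removed by hand.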
In theory, a stale lock file should be detected, right?
Yes, stale lock files (symlinks) are found via the kill(pid,0) call - that
verifies the process that created the lock is dead (or alive).  For file locks
(F_SETLK), when the process dies the lock dies with it.
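
To make the kill(pid,0) check concrete, here is a small sketch of how a stale symlink lock could be recognised from its "ip:pid" signature. The function name and the rule of never declaring a lock from another host stale are assumptions for illustration; the patch itself may differ.

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <signal.h>
#include <string>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: `signature` is the symlink target ("ip:pid") and
// `localAddr` is this machine's address in the same textual form.
// Returns true only when the lock was made on this host by a process that
// no longer exists. Anything unparseable or remote is treated as still held.
bool IsSymlinkLockStale(const std::string& signature,
                        const std::string& localAddr)
{
    std::string::size_type colon = signature.rfind(':');
    if (colon == std::string::npos || colon + 1 >= signature.size())
        return false;

    const char* pidText = signature.c_str() + colon + 1;
    if (!isdigit(static_cast<unsigned char>(*pidText)))
        return false;

    char* end = nullptr;
    errno = 0;
    long pid = strtol(pidText, &end, 10);
    if (errno != 0 || *end != '\0' || pid <= 0 || pid > INT_MAX)
        return false;

    // A pid is only meaningful on the machine that issued it.
    if (signature.compare(0, colon, localAddr) != 0)
        return false;

    // Signal 0 performs the existence and permission checks without
    // delivering anything to the target process.
    if (kill(static_cast<pid_t>(pid), 0) == 0)
        return false;           // process exists: lock is live
    return errno == ESRCH;      // EPERM would mean it exists but is not ours
}
```

Note that this cannot tell whether the surviving pid is actually a Mozilla process, which is the pid-rollover weakness mentioned earlier in the thread.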
Comment on attachment 83205 [details] [diff] [review]
new diff -wu, only change: init exists at line 540 in nsProfile.cpp

>Index: src/nsProfileAccess.cpp

>+    // First, try the 4.x-compatible symlink technique, which works with NFS
>+    // without depending on (broken or missing, too often) lockd.
>+    struct in_addr inaddr;
>+    inaddr.s_addr = INADDR_LOOPBACK;
>+
>+    char hostname[256];
>+    PRStatus status = PR_GetSystemInfo(PR_SI_HOSTNAME, hostname, sizeof hostname);
>+    if (status == PR_SUCCESS)
>+    {
>+        char netdbbuf[PR_NETDB_BUF_SIZE];
>+        PRHostEnt hostent;
>+        status = PR_GetHostByName(hostname, netdbbuf, sizeof netdbbuf, &hostent);
>+        if (status == PR_SUCCESS)
>+            memcpy(&inaddr, hostent.h_addr, sizeof inaddr);
>+    }
>+
>+    char *signature =
>+        PR_smprintf("%s:%lu", inet_ntoa(inaddr), (unsigned long)getpid());

this is nearly equivalent to the following:

  nsXPIDLCString myIP;
  nsCOMPtr<nsIDNSService> dns(do_GetService("@mozilla.org/network/dns-service;1"));
  if (dns)
      dns->GetMyIPAddress(getter_Copies(myIP));
  nsCAutoString signature;
  signature = myIP + NS_LITERAL_CSTRING(":") + nsPrintfCString("%lu", getpid());

except, it looks like you want to reference inaddr again down below...

at any rate, the patches look good to me (r/sr=darin)
Attachment #83205 - Flags: superreview+
Comment on attachment 83205 [details] [diff] [review]
new diff -wu, only change: init exists at line 540 in nsProfile.cpp

I'm going to r= ccarlen's and rjesup's code and changes -- if they can r= my
4.x-revival code, we're ready for trunk checkin.  Let's do it.  Conrad, can you
do the deed, or should I?

/be
Attachment #83205 - Flags: review+
Brendan, your code is r='d.  Whichever of us is awake and available the next
time the tree is open, let's get it in.
Blocks: 138000
No longer blocks: 143200
Severity: major → normal
ccarlen kindly offered to do the checkin (I hope he applied the diff -u in the
last attachment before this comment), once he verifies that Win95 works.  Or not
if we get antsy, and we'll let the trunk testing community tell us that Win95
and 98 work (we hope).

/be
Checked in. Thanks to everybody who contributed to the patch.
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Grace, Can you do some testing of this, please?

Here's what should happen:
1. Launch one instance of the app
2. launch another
  Even if you have only 1 profile, the profile dialog will come up. When you try
to select the profile, you will get an alert saying the profile is in use.
3. Without dismissing the profile dialog, quit the first instance of the app
which is using that profile.
4. Back in the profile selection dialog in the 2nd instance, you should be able
to select that profile and launch.

Platform Notes:
Windows - you need to have a ns and a mozilla build to test this since you can't
run 2 instances of either on Windows.
Mac - you will never run into the multiple instance problem since there is code
to prevent 2 simultaneous instances of either mozilla or ns. With this in place,
we can get rid of that evil code.

What I need from the testing is to make sure that we don't get stuck by dead
locks - particularly on Win 95 & 98. After running an instance which will lock
the profile dir, crash the machine, pull its power cord from the wall, etc.
So what does this mean for us end users? Can I leave mozilla running on my home
computer for my wife to use and login to it with terminal services remotely for
my email? Do I now need to logout and have her login separately to the machine? 

My work pattern involves leaving many windows open, windows that I am currently
using to peruse large amounts of information, keep an open task list of emails
and general placeholders where I am up to. Closing down all sessions before
leaving for work is extremely unproductive.

How will this ever enable mozilla to run as multiple processes instead of one
large executable, all instances vulnerable to a crash problem, be it website
or huge mail folder sorting, searching or what have you (and yes, software will
always crash).

Would mozilla not be better off with a data store that can handle concurrent
access and per item locking while in use?

As far as cache goes - big deal - I come home and clear the cache and fine. But
email -- nothing new comes in till I shut it all down this morning, whereby it
then goes and downloads over 2000 emails. I leave it running while I go to work
because otherwise whatever came in will come in again (another problem). So I get
in to work and remotely kill it. Mouth off here and hope my wife isn't wondering
what happened to her browser windows.

Microsoft Lookout and Exploiter don't have these problems at all.
looks like this one cost us a small bit (15ms on comet) of Ts :(
You may have a point that the profile stuff should be threaded through a
backend, etc.  It isn't now, however, and running more than one mozilla against
the same profile is just asking for corruption of files or loss of modifications
(like to preferences, sizes, mail data, etc).  The architectural change to the
profile backend should be filed as a separate bug for probably the 1.2 or 1.3
timeframe.  There are bugs on this for things like sharing a bookmarks store.

To avoid corruption, we need to use different profiles for different running
mozillas.  This patch helps make sure of that.

If you need to access your profile from work, you should have a separate profile
for your wife at home and get in the habit of quitting the browser when 
you're done.  You can leave yours up so long as she logs in as another user and
starts her own copy.  In WinXP the default screen after the screensaver's been
up shows the users available for login; you can have several users logged in at
once.  You'll still need to kill your browser to unlock the profile. 
Alternatively, you can use a remote-access program to interact with the
already-running mozilla as if you were at the keyboard/mouse.  (VNC, etc).
Well said, I guess we can live like that. Thanks for the hard work all.
Joe: You may be interested in bug 135137.
darin: Correctness is more important than performance.

I'm sure there are plenty of ways to speed up startup by a larger factor than
this slowdown...
zach: indeed! :-)  it's just somewhat surprising that this change would even
register on the performance meter.
What OS does comet run?  The first thing that comes to mind, is that symlink()
on some systems will block until I/O completes.  I'm not sure whether 15ms is
the right ballpark for that delay, though.
comet is running linux... just visit the tinderbox page and see for yourself. 
click on the L, and view the full log.
According to brab TBox, this checkin has added a warning:

+profile/src/nsProfileAccess.cpp:1857
+ `int symlink_errno' might be used uninitialized in this function

http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/profile/src/nsProfileAccess.cpp&mark=1857#1854
*** Bug 102519 has been marked as a duplicate of this bug. ***
do we think this one is going to be ready to land on the 1.0 branch tomorrow?
note that with terminal services and locks there's nothing preventing you from 
having multiple profiles with mail accounts pointing to the same mail folder.  
doing so of course entails risks, but if you're willing to take them ...

If you do that, there's a bit of risk while your main profile is delivering 
mail if you are reading a mailfolder in your terminal profile, however ideally 
mozilla can be taught to play nicely with locked mailfolders while delivering 
mail (if it isn't now, then please file an enhancement request asking for 
mozilla mail to handle that).
The result of this patch seems to be somewhat confusing.
If you run mozilla with a default profile and that profile is in use and you
start another mozilla instance, you get the profile selection dialog which does
not really indicate why it was shown which may confuse people (easily done on
linux since running multiple instances of mozilla is pretty easy).
You will only realize what is happening if you try to open the profile and see
the error message.
according to comment #174, I have this kind of bug, with quicklaunch enabled.

I closed every mozilla window, and when I right click on mozilla icon in order
to open a new mailnews window, I have profile manager showing instead.

I have to close quicklaunch and launch Mozilla once again !
see bug 144930
> The result of this patch seems to be somewhat confusing.

It could pop up an alert first saying "The selected profile is in use..."
followed by the profile picker. But, that would force a minimum of 2 dialogs in
this situation. As it is, it might be somewhat confusing on first encounter.
Hopefully, after a few encounters, you'll appreciate the minimum number of
dialogs you're forced to deal with in this situation.
testing (win2k machine)
install moz build 2002051608- had QuickLaunch on- single profile
ran from icon 
ran from shortcut- each time ProfileManager came up (see bug 139226)
installed trunk ns 2002051608
launched and got Profile Manager- selected profile
got message- profile in use (expected as mozilla in use)
exit 
launched mozilla from icon and got Profile Manager -single profile unselected
selected profile and got warning message again- but this is the app using that 
profile.

will test more later 
if it isn't prohibitively expensive, i think it'd be nice candy if we styled 
'inuse' profiles (eg graying/italicizing their icon/text).
Blocks: 143200
Made patch for branch including fixes for bug 144930 and bug 145061. Will post
after updated tree builds.
No longer blocks: 143200
It doesn't necessarily have to be an alert...
There's space on the profile manager (on the left) to add something like 'The
Profile Manager has been opened because that profile is already in use.' or
something to that effect.
[readding dependency from ccarlen's midair with chofmann]
Blocks: 143200
Has fixes for the other 2 bugs included. Also, has a last-minute correction
that went into the trunk as well - the alert is put up by
nsIPromptService::alert() rather than alert(). Otherwise, the same.
Comment on attachment 84019 [details] [diff] [review]
MOZILLA_1_0_0 -wu patch

a=rjesup@wgate.com
Merging forward r/sr from this patch and the regression patches from trunk. 
Approving; please try to get this in ASAP
Attachment #84019 - Flags: superreview+
Attachment #84019 - Flags: review+
Attachment #84019 - Flags: approval+
Checked into branch.
Keywords: fixed1.0.0
No longer blocks: 143200
The UI for this is very bad: just bringing up the profile manager dialog
without any indication whatsoever why is confusing. When the user finally
selects the profile that is in use the dialog gives again misleading information:
only a tiny fraction of users will want to use another profile or create
a new one in this situation. Most of them probably are not aware that
another mozilla process is already running.

This should definitely be reopened and changed so that the first 
dialog indicates the problem and possible ways to solve it. The 
profile manager can be shown after this.
Minimizing the number of dialogs is good but not at the cost of such
a bad user interaction.
Perhaps pop up the preferences manager but add some text to the window
explaining why it's there?
I think the first thing most users want here
is an explanation of the situation (and a good one),
and *then* a choice of what can be done (one of which is profile manager).
Note that this situation will most likely occur in circumstances where
the user was not aware that another process is running, i.e. the most
common action will be "exit".
Those who deliberately start another instance will probably specify the
profile or the profile manager directly in the command line anyway.

So popping up an alert first and then showing profile manager will
be an enhancement of UI over the current situation.
If there is an integrated dialog that 
shows both the explanation and the profile manager 
in a user friendly way it might be better, but I 
guess that would be a lot more work and hardly worth the effort.
I read comments 48 and comment 71 - 74.

Would the following be possible?

If the profile is in use, detect whether the application is running on the same
or on a remote machine.

If that is impossible to detect, inform the user, offer to select a profile.

If the application is running on the same machine, communicate with the running
instance, tell it to open a new browser window, and exit the new process.

If the application is running on a different machine, show a message, inform the
user about the situation, and ask "do you want to use a different profile?".
i agree with kaie... that sounds like an excellent solution... but this probably
deserves a separate bug.  this one was just to avoid corruption (albeit at the
cost of user friendliness).
Yes. And, it needs to be made more consistent across platforms. Right now, Mac &
Windows have other mechanisms in place to deal with multiple running instances.

Windows - nsNativeAppSupportWin.cpp uses DDE so that, when you start a 2nd
instance, it just forwards the given command line to the running instance and
then quits. It just forwards to the running instance with whatever profile it's
using. This makes for the simplest user experience. Only works locally and not
between NS and mozilla, though.

Mac - When the 2nd instance starts, it just puts up an alert and quits :-/ See
bug 145127.

Unix - What's been discussed here.

What Kai suggests sounds good - similar to what Windows does but taking the
profile locked state into account. The downside is that the Windows DDE solution
happens before we even init XPCOM so the time taken until the 2nd instance
forwards the request to the first is near zero.

I think to make the UE friendly and consistent across platforms should be an
XPApps bug. This was just to have the back-end locking mechanism. 
This might be a bit off topic, but why in Linux when you click on the Mozilla
icon does it not act like it does in Windows and open it in the same instance.
When I click on the Mozilla icon in Linux, it opens a new profile manager box
(meaning a new instance). Shouldn't it just detect another instance and load the
window in that instance?
Some thoughts regarding communication with an already running Mozilla:
At least 0.9.9 (the one I use) is fragile when it's starting. If you try to use
-remote while another Mozilla hasn't finished loading, it often segfaults.
Whatever communication mechanism you choose needs to take this into account. At
the moment a Mozilla instance creates the lock file it needs to be prepared to
take requests from other instances.
One way to communicate with a running or just starting Mozilla instance would be
to drop a file with the request in ~/.mozilla. Mozilla could monitor this
directory (or there could be a wakeup signal of some kind, a UDP socket for
instance) and process incoming requests. A primitive mechanism, I admit, but it
seems to be easy to implement, works the same on every platform (well, the
directory is different) and is reliable.
The mozilla startup script that's included in the rpms already creates a new
window in an already running instance.
This code intercepts the SIGSEGV signal resulting in Mozilla no longer dumping any
core files when crashing. I've filed bug 148453 on this problem.
Blocks: 148509
Verified for Win and Mac
(see bug 122698 for Linux) 
Status: RESOLVED → VERIFIED
My personal work pattern involves opening several separate instances of Mozilla;
up until rc3, I could do this willy-nilly.  If one of my instances crashed, it
would only take down that instance's windows, and not every copy of Mozilla I
have running.  With the locking now present in rc3, I can no longer do this,
which severely cramps my work style -- I've reverted to using rc2 out of sheer
necessity.

It would be nice if there were at least a prefs option to disable the locking
behavior.  Let ME worry about corrupting my profile data by running multiple
instances; it's what I want to do and Mozilla should give me the option to do
so.  (I don't use any part of Mozilla except the browser anyway, so the only
valuable data is my bookmarks file.)  I don't mind if locking is the default
behavior, and the aforementioned preference is an obscure, undocumented option
that has to be manually entered into prefs.js in Swahili ideographs, as long as
it exists. :)
> so.  (I don't use any part of Mozilla except the browser anyway, so the only
> valuable data is my bookmarks file.)  I don't mind if locking is the default

without profile locking, valuable data such as your bookmarks file could get
completely corrupted.  there simply are no guarantees that any of your profile
data will be safe when more than one executable is accessing it at the same time.

you were really playing with fire each and every time you ran more than one
mozilla with the same profile.  if you care about any of the data in your
profile directory, then do yourself a favor and use mozilla 1.0.
> you were really playing with fire each and every time you ran more than one
> mozilla with the same profile.  if you care about any of the data in your
> profile directory, then do yourself a favor and use mozilla 1.0.

I'm aware of that.  (I thought this was made fairly clear when I asked that I be
allowed to make the decision whether to endanger my data.)  Which is why I have
a variety of scripts in place that archive all my preference data from my
various programs every night.  I'm not worried about losing my bookmarks.  (And
in 15 months of using Mozilla, I never have.)  I'm fully aware of WHY the
locking is there; the fact that I cannot disable it if I so choose (without
recoding) is what bothers me.

Like I said, I don't mind if the locking behavior is the default setup; I
certainly don't want to endanger others' data.  But the locking behavior
prevents me from using Mozilla in a way that I am comfortable with, and I see no
reason why there should not be an option to disable this behavior if desired.
> prevents me from using Mozilla in a way that I am comfortable with, and I see 
> no reason why there should not be an option to disable this behavior if 
> desired.

from my point of view anyways, it comes down to this: no one wants to support
mozilla without profile locking in place.  lack of profile locking has caused
numerous bugs that are usually very difficult or at least tedious to correlate
with profile corruption.  we simply don't have the time or energy to support
such a configuration.  from past experience, we know that users tend to play
with hidden prefs and forget that they've done so, or maybe they observe a bug
that seems unrelated to that hidden pref they set and don't mention that they
set it when they file a bug.  in the end, after much work, we discover that the
bug was due to the fact that they set the hidden pref that is known to do bad
things.  i just think we have enough things to worry about as is... let's not
make more work for ourselves if we can help it.

i know this sucks for you since you are used to using mozilla in a certain way,
but i hope you understand why even a hidden preference is not necessarily a
simple solution to your problem.
And there is a simple workaround if you insist on doing that (at least on linux - I
don't know about windows). Simply add rm ~/.mozilla/yourprofiledir/lock to the
script you use to start a new mozilla instance. But then please don't
report any bugs that you see in your mozilla.
Summary: Profiles needs to be protected from running multiple instances of mozilla → Profiles need to be protected from running multiple instances of mozilla
Using the debian package, I find the current behaviour of mozilla really really
annoying when it comes to running multiple instances. Netscape of old was much
more sane - if a process crashed, you had to manually remove the lock file, but
this was much better than the current practice of
1) if you start mozilla from one computer on a network, you are not stopped from
running mozilla from another computer at the same time, using the same profile
(over an nfs link)
2) if you try to run even netscape from a computer that is running mozilla
(using the same DISPLAY variable, but very much different profile data), it fires
up another window to the mozilla session.
3) if you start mozilla on one computer with one display, ssh to another
computer that has different disks (and hence profile data), start another
mozilla, it even manages to open up another window to the original mozilla
session. This is just stupid!

Tim - 1) is probably a bug, 2) and 3) are bugs in the wrapper script debian uses
(which may be the same as the mozilla one which is in the tree, of course...)
Please provide more details about case (1).  We put in a lot of work to use a
ns4.x-like scheme for NFS and a file-lock scheme for samba and windows.  Please
also provide output of 'ls -la' on the profile directory (where prefs.js lives)
while one copy is running, and then again after the second copy is run, and the
IP addresses of both machines.  Also manually try making a symlink (to anything)
in the profile directory ('ln -s foo bar') and do an 'ls -la'.

Case (2) and (3) look like wrapper script issues.  Perhaps that's causing the
problem with locking too if they're using a wrapper that nukes the symlink.

If an NFS server supports neither symlinks nor file-locks, then there's no
profile locking under unix.

If you're running windows but storing the profile on NFS, all bets are off.
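To make the file-lock half of that concrete, here is a minimal sketch of a non-blocking advisory lock taken with fcntl. It is an illustration under assumptions, not the checked-in code: the file name `profile.lock` and the function name are hypothetical. "Held by someone else" returns None, while any other error (for example a server that does not support locking) is re-raised so a caller could fall back to another scheme.

```python
import errno
import fcntl
import os


def try_fcntl_lock(profile_dir, name="profile.lock"):
    """Return an fd holding an exclusive lock, or None if another process has it."""
    path = os.path.join(profile_dir, name)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        os.close(fd)
        if exc.errno in (errno.EACCES, errno.EAGAIN):
            return None  # another process owns the profile
        raise  # locking unsupported or broken: let the caller decide
    return fd
```

The lock lasts as long as the descriptor stays open in the owning process, so a crash releases it without any cleanup code.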
The NFS protocol from v2 (v1 was unshipped, IIRC) on has had SYMLINK.  There is
no legal way to subset the protocol, so the only question is, does an NFS client
provide a symlink(2) emulation, and an ln -s command, or equivalent.  I join
rjesup in asking for more data about case (1), especially the ls -la outputs.

/be
Ah! Hello bbaetz!

Anyhoo, just checked this out again. I lied. Point 1 was for the SuSE 7.2 +
Tru64 OSF/1 v5.1 (I think) system we have at uni.

(and 2 and 3 were for my debian unstable box)

So, when I start (admittedly, version 0.9.8 ; versions 1.0.0-rc1 and up have
all created huge headaches for me, both on linux and tru64, as documented
elsewhere on bugzilla) mozilla on the tru64 machine, it does not create a
symlink or lockfile. I can create a symlink manually. So moz, when started on
the suse 7.2 box, doesn't know it should bugger off.

More possibly relevant information:
We have had huge headaches on the production NFS server about a month ago - it
kept on oopsing all over the place (This looks like it has been fixed in linux
2.4.19-rc2), and we found out it was caused by someone operating netscape 4.7 to
read their mail - something related to tru64 clients trying to use filelocking.
So we removed netscape in order to not keep getting these oopses, while waiting
for the kernel fix. Now, is file locking just completely broken under tru64+nfs,
and so this breaks the lock file used by mozilla? In this situation, if mozilla
tries to write a lock file, and fails, should it not just write a file anyway,
without using kernel locks, and just hope for the best? After all, the race
condition (if two moz processes started at exactly the same time) isn't _that_
lethal. Better than no "lock" at all?....

Of course, I don't actually know how moz implements the file locks, so I could
be spouting BS....?
Tim Connors: mozilla0.9.8 had no symlink locking, or any other kind of locking,
among processes vying for a profile.  FYI, symlink(2) is atomic and exclusive;
that's why I used it in Netscape 2.0 to implement an NFS-safe lock file signed
by the (primary) IP address and pid of the winning contender.

/be
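A small sketch of the symlink trick described above, for readers who have not seen it. It is an approximation written for this thread, not the Netscape or Mozilla source: the function names are invented, `host_id` stands in for the primary IP address and is supplied by the caller, and the stale-lock check is an added assumption. The key property is that symlink creation either succeeds or fails with "already exists", with no window in between.

```python
import os


def _pid_alive(pid):
    """Best-effort check that a local process with this pid exists."""
    try:
        os.kill(pid, 0)  # signal 0 only performs the existence check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def acquire_symlink_lock(profile_dir, host_id, name="lock"):
    """Try to take the profile lock; the link target is 'host_id:pid'."""
    lock_path = os.path.join(profile_dir, name)
    signature = f"{host_id}:{os.getpid()}"
    for _ in range(2):
        try:
            os.symlink(signature, lock_path)  # atomic: one contender wins
            return True
        except FileExistsError:
            pass
        try:
            owner = os.readlink(lock_path)
        except OSError:
            continue  # lock vanished or is not a symlink; try once more
        host, _, pid_text = owner.rpartition(":")
        # Only break a lock left by a dead process on this same host.
        if host == host_id and pid_text.isdigit() and not _pid_alive(int(pid_text)):
            try:
                os.unlink(lock_path)
            except FileNotFoundError:
                pass
            continue
        return False
    return False
```

A lock signed by a different host is never broken here, since there is no way to check that process from this machine. Removing a stale lock is itself racy between two contenders, so treat this as a sketch only.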
Blocks: 148453
What's the state of this topic today? I would like to share a profile between a physical Linux machine and a Windows virtual machine (using VirtualBox Shared Folders).

Both machines will be using Firefox 3.6
Product: Core → Core Graveyard