Closed Bug 101016 Opened 23 years ago Closed 22 years ago

when the installer runs mozilla, it dies immediately with "getcwd: Function not implemented"

Categories: SeaMonkey :: Installer (defect)
Platform: x86, Linux
Type: defect
Priority: Not set
Severity: normal

Tracking: Not tracked
Status: VERIFIED FIXED
Target Milestone: mozilla1.0

People: Reporter: zwol, Assigned: netscape

Keywords: relnote
Whiteboard: [adt2 rtm] [fixed on trunk and 1.0 branch]

Attachments: 1 file

I do

$ mozilla-installer/mozilla-installer

and it churns away.  Then it tries to run the just-installed mozilla, which
immediately dies like so:

/home/zack/mozilla/run-mozilla.sh /home/zack/mozilla/mozilla-bin -installer
getcwd() failed: Function not implemented
If I run that command myself from a shell window, mozilla comes up normally.

My theory is that the installer is executing mozilla with its current
directory set to a directory that has been deleted, but I have no proof of
this.
QA Contact: bugzilla → ktrina
QA Contact: ktrina → gbush
cannot reproduce here. 
Zack, what build are you using? Have you tried a recent nightly build and a
fresh profile?
Just reproduced the bug with the installer at
<http://ftp.mozilla.org/pub/mozilla/nightly/2001-10-08-08-trunk/mozilla-i686-pc-linux-gnu-installer.tar.gz>
and no profiles directory at all.
I install into $HOME/mozilla, running as myself, not root.
Might that have something to do with it?
OK, I tried that very same installer build and it works fine. I only get this
error message on the console, but mozilla comes up fine.

shell-init: could not get current directory: getcwd: cannot access parent
directories: No such file or directory
Aha, sounds like a difference in shell implementations.
My /bin/sh is "ash" (NetBSD's sh, as hacked up by Debian).
What do you have?
I just have good ole bash (2.05.0), as hacked up by Debian, as well :-)
I just did an install as a normal user and got the same message. Was able to
install fine and run fine. btw, I have had this 'shell-init * getcwd' message
appear on almost all recent releases. It does not seem to interfere with
installation or running for me. (Talkback 0.95 tar.gz install)

Points of interest:
__Error shows up after calls to GTK are made:
  [dunk@skippy mozilla-installer]$ ./mozilla-installer
  Gtk-CRITICAL **: file gtkwidget.c: line 1510 (gtk_widget_hide): assertion \
     `widget != NULL' failed.
  [dunk@skippy mozilla-installer]$ /home/dunk/9.5/run-mozilla.sh  \
     /home/dunk/9.5/mozilla-bin -installer
  shell-init: could not get current directory: getcwd: cannot access parent \
    directories: No such file or directory
  MOZILLA_FIVE_HOME=/home/dunk/9.5
  [Install continues as normal here..]
  

__Doesn't appear to be in the .sh installer scripts:  
  [dunk@skippy mozilla-installer]$ strings --print-file-name * | grep getcwd
  mozilla-installer-bin: getcwd

__Shell version:
  [dunk@skippy mozilla-installer]$ /bin/sh --version
  GNU bash, version 2.05.1(1)-release (i586-mandrake-linux-gnu)
  Copyright 2000 Free Software Foundation, Inc.

__After installation.. (In application's directory):
  [dunk@skippy 9.5]$ strings --print-file-name * | grep getcwd
  libxpcom.so: getcwd
  libxpistub.so: getcwd

__Who I am:
  [dunk@skippy 9.5]$ uname -a
  Linux skippy 2.4.6-5mdk #1 Wed Jul 18 19:59:39 CEST 2001 i686 unknown

I can provide any other information needed of my system. Running somewhere
between RedHat 6.1 and Mandrake 8.1.

What is the process for getting this bug confirmed and fixed?

thanks
dunk
OK, I think it is safe to confirm this bug now, we seem to depend on the shell
implementation. Maybe the bug is with ash, but then we should add a comment to
the release notes.  ---> NEW

Zack, can you try to switch your /bin/sh to bash for a single test and see if
the problem still occurs? Also please do try it with milestone 0.9.5. That way
we should be able to pin down the problem further.

Dunk:
To have a bug confirmed you need to convince somebody with canconfirm privileges
(like myself) that the bug is real. Writing good reports and following
suggestions (like Zack did) does help a lot here.
To have a bug fixed you need to convince somebody with the proper skillset that
it is worth her precious time to fix it. Or you just dive into the code and do
it yourself :-)
Status: UNCONFIRMED → NEW
Ever confirmed: true
I can certainly do these tests.  Here's what I get.

This is with the mozilla0.9.5 installer.  It was instructed to install
Navigator only, into /home/zack/m095/install, which did not exist before
the installation, and to preserve .xpi modules.  I did a dry run first,
then installed repeatedly from the .xpi modules.

With /bin/sh = ash:

~/m095 $ mozilla-installer/mozilla-installer

Gtk-CRITICAL **: file gtkwidget.c: line 1510 (gtk_widget_hide): assertion
`widget != NULL' failed.

~/m095 $ getcwd() failed: Function not implemented

and Mozilla does not load.

With /bin/sh = bash: 

~/m095 $ mozilla-installer/mozilla-installer

Gtk-CRITICAL **: file gtkwidget.c: line 1510 (gtk_widget_hide): assertion
`widget != NULL' failed.

~/m095 $ shell-init: could not get current directory: getcwd: cannot access
parent directories: No such file or directory
/home/zack/m095/install/run-mozilla.sh /home/zack/m095/install/mozilla-bin
-installer
shell-init: could not get current directory: getcwd: cannot access parent
directories: No such file or directory
MOZILLA_FIVE_HOME=/home/zack/m095/install
  LD_LIBRARY_PATH=/home/zack/m095/install:/home/zack/m095/install/plugins:.:
     LIBRARY_PATH=/home/zack/m095/install:/home/zack/m095/install/components
       SHLIB_PATH=/home/zack/m095/install
          LIBPATH=/home/zack/m095/install
       ADDON_PATH=/home/zack/m095/install
      MOZ_PROGRAM=/home/zack/m095/install/mozilla-bin
      MOZ_TOOLKIT=
        moz_debug=0
     moz_debugger=
shell-init: could not get current directory: getcwd: cannot access parent
directories: No such file or directory
*** QfaServices is being registered
 I am inside the initialize
 Hey : You are in QFA Startup 
(QFA)Talkback loaded Ok.

and Mozilla proceeds to load up.

Since mozilla loaded completely when sh = bash, I guessed that it
might have retained the bogus working directory.

~/m095 $ ps xc | grep mozilla
17905 pts/4    S      0:00 run-mozilla.sh
17911 pts/4    S      0:09 mozilla-bin
17912 pts/4    S      0:00 mozilla-bin
17913 pts/4    S      0:00 mozilla-bin
17914 pts/4    S      0:00 mozilla-bin
17916 pts/4    S      0:00 mozilla-bin
~/m095 $ ls -l /proc/17911/cwd
lrwx------    1 zack     zack            0 Oct 15 19:58 cwd -> /var/tmp/.tmp.xi.0/bin (deleted)

Indeed it did.  The installer is running mozilla from inside a directory
that has been deleted.

Linux, and some other Unixes, have no problem whatsoever with deleting a
directory that is empty but still some process's working directory.  The
directory inode is not recycled until the last process using it leaves or
exits - just like when you delete an open file.  However, in order to
maintain file system consistency, you cannot create files in a deleted
directory, the '.' and '..' links are removed, and most important in this
case, getcwd(2) will fail.

$ mkdir x
$ cd x
$ rmdir $HOME/x
$ ls -la
total 0
$ strace -e getcwd /bin/pwd
getcwd(0xbffff73c, 1024)                = -1 ENOENT (No such file or directory)
/bin/pwd: cannot get current directory: No such file or directory

Like that.
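
The same failure can be reproduced from C. What follows is only a minimal sketch, assuming the Linux behaviour described above. The helper name getcwd_errno_in_removed_dir and the scratch directory naming are made up for illustration and do not come from the installer.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: create a scratch directory under scratch_parent,
 * chdir() into it, remove it while it is still the working directory,
 * then call getcwd().
 * Returns 0 if getcwd() still succeeded, the errno value if getcwd()
 * failed, or -1 if the setup or cleanup steps themselves failed.
 * The original working directory is restored before returning. */
int getcwd_errno_in_removed_dir(const char *scratch_parent)
{
    char orig[PATH_MAX];
    char tmpl[PATH_MAX];
    char buf[PATH_MAX];
    int n, result = 0;

    assert(scratch_parent != NULL);

    n = snprintf(tmpl, sizeof tmpl, "%s/getcwd-demo-XXXXXX", scratch_parent);
    if (n < 0 || (size_t) n >= sizeof tmpl)
        return -1;

    if (getcwd(orig, sizeof orig) == NULL)   /* remember where we started */
        return -1;
    if (mkdtemp(tmpl) == NULL)               /* create the scratch directory */
        return -1;
    if (chdir(tmpl) != 0) {
        rmdir(tmpl);
        return -1;
    }

    if (rmdir(tmpl) != 0)                    /* delete our own working directory */
        result = -1;
    else if (getcwd(buf, sizeof buf) == NULL)
        result = errno;                      /* ENOENT in the strace output above */

    if (chdir(orig) != 0)                    /* always go back to a real directory */
        return -1;
    return result;
}
```

On a Linux system like the ones in this report, getcwd_errno_in_removed_dir("/tmp") would be expected to return ENOENT, matching the strace output above. Other systems may refuse the rmdir() or behave differently.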

Now, watch what happens if I try to run shells from that deleted
directory:

$ strace -e getcwd ash -c 'echo hello'
getcwd(0xbffff90c, 256)                 = -1 ENOENT (No such file or directory)
getcwd() failed: No such file or directory
$ strace -e getcwd bash -c 'echo hello'
getcwd(0x80b800c, 4095)                 = -1 ENOENT (No such file or directory)
shell-init: could not get current directory: getcwd: cannot access parent
directories: No such file or directory
hello

Conclusion: Both bash and ash call getcwd() during their
initialization, even if they don't strictly need to (neither of the
above commands makes any use of the result of getcwd).  If it fails,
ash treats that as a fatal error; bash carries on.

/var/tmp/.tmp.xi.0/bin was presumably a scratch directory created by
the installer.  It changed into that directory while unpacking, then
when it was done it deleted the directory - but didn't bother changing
back out again.  Thus, when it spawned run-mozilla.sh, that script got
the deleted directory for its working directory, and (if executed by
ash) it gave up.

Proposed fix: Have the installer remember where it came from, and chdir
back there before deleting the temporary directories.
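
A minimal sketch of that proposed fix, not the installer's actual code: the names with_scratch_dir, scratch_work_fn and count_call are hypothetical, and the callback merely stands in for whatever unpacking the installer does inside its temporary directory.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical callback type standing in for the installer's unpack step. */
typedef int (*scratch_work_fn)(void *ctx);

/* Example callback for illustration: just counts how often it was called. */
int count_call(void *ctx)
{
    ++*(int *) ctx;
    return 0;
}

/* Create a scratch directory from the mkdtemp() template tmpl, run work()
 * inside it, then chdir() back to the starting directory BEFORE removing
 * the scratch directory, so nothing started afterwards inherits a deleted
 * working directory.
 * Returns work()'s result, or -1 if a directory operation failed. */
int with_scratch_dir(char *tmpl, scratch_work_fn work, void *ctx)
{
    char orig[PATH_MAX];
    int rc;

    assert(tmpl != NULL && work != NULL);

    if (getcwd(orig, sizeof orig) == NULL)   /* remember where we came from */
        return -1;
    if (mkdtemp(tmpl) == NULL)
        return -1;
    if (chdir(tmpl) != 0) {
        rmdir(tmpl);
        return -1;
    }

    rc = work(ctx);                          /* runs with cwd = scratch dir */

    if (chdir(orig) != 0)                    /* leave first ... */
        return -1;
    if (rmdir(tmpl) != 0)                    /* ... then delete (must be empty) */
        return -1;
    return rc;
}
```

A caller would pass a writable template such as a char array initialised to "/tmp/xi-demo-XXXXXX" (an assumed path). The point of the ordering is that the process is never left sitting in a directory that no longer exists.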

[p.s. dunk - getcwd is the name of a system library function, not a shell
command. the corresponding shell command is "pwd".  however, as you can
see, ash and bash both call getcwd even if pwd is never used in the script.]
Target Milestone: --- → M1
Target Milestone: M1 → Future
Shouldn't we warn about this problem in the release notes?
Keywords: relnote
*** Bug 113667 has been marked as a duplicate of this bug. ***
I'm also seeing this behaviour using Mandrake's bash2.  Another note - if this is
known to only work correctly with bash, why is it using /bin/sh?  If it's not a
sh script, it shouldn't pretend to be one!

[askwar@teich askwar]$ ls -la /bin/sh /bin/bash ; rpm -qf /bin/bash
-rwxr-xr-x    1 root     root       580940 Nov 19 14:14 /bin/bash*
lrwxrwxrwx    1 root     root            4 Dez  4 18:04 /bin/sh -> bash*
bash-2.05-15mdk
yikes, we're seeing this on donner's RH 7.3 machine.

several bad things happen:  4.x migration fails, you can't create a new profile,
the app won't launch, etc.
dmose tells me that bash is the default shell.  (Is it a sign that I'm old
school that I use /bin/tcsh?)
> Is it a sign that I'm old school that I use /bin/tcsh?

Yes.
zack's suggestion:

"Proposed fix: Have the installer remember where it came from, and chdir
back there before deleting the temporary directories."

I'll see if that fixes it for me.

un-futuring this.
Keywords: nsbeta1
Target Milestone: Future → ---
data point about donner's RH 7.3 machine:

he installed it as a redhat workstation, with no developer tools, if that makes
sense.

ok, some info:

it's not the installer.  I installed mozilla on another machine, zipped it up,
brought it over to donner's machine, unzipped it, and it fails.

he's got /bin/sh -> bash, and his version of bash is 

Bash version 2.05a.0(1) release GNU
Seth, could you compile up the following and tell me if you get any link or
runtime errors on donner's system?


#include <unistd.h>
#include <stdio.h>

/* Print the current working directory, or report that getcwd() failed. */
int main( void )
{
  char buf[1024];

  if ( getcwd( buf, sizeof( buf ) ) != NULL ) {
          printf( "Current working dir is %s\n", buf );
          return 0;
  }
  printf( "getcwd failed\n" );
  return 1;
}

Just cc -o testme testme.c is all I need to compile, link and run it on Linux
here.

getcwd is in section 3 of the manual, so I can't imagine what the shell has to do with it.
syd, it's not that getcwd() fails, it's that doing getcwd() from a deleted
directory fails.

I built and tested your code on donner's machine, and it worked fine.

output:  "Current working dir is /home/stephend"

note, just running /bin/bash in a deleted directory will cause this problem.

[stephend@h-10-169-108-235 stephend]$ mkdir foo
[stephend@h-10-169-108-235 stephend]$ cd foo
[stephend@h-10-169-108-235 foo]$ rm -rf ~/foo
[stephend@h-10-169-108-235 foo]$ bash
shell-init: could not get current directory: getcwd: cannot access parent
directories: No such file or directory

Ok, I misread one of the earlier comments. And this happens in the browser. hrmmmm.

So, can we load a debug build of mozilla into gdb, do a shar libc, set a break
on getcwd, and find out who is calling it?
Seth: How did you run the browser after extracting it on a different machine?  I
have never had any trouble running the browser from the command line once it
gets installed.  This sounds like a different bug from mine.

I looked through the installer code and it's quite clear where the bug I
reported is.  Look at 
http://lxr.mozilla.org/seamonkey/source/xpinstall/wizard/unix/src2/nsXIEngine.cpp.
Search for mOriginalDir and mTmp. You will see that there is code which is
supposed to save the original directory, create a temp directory, chdir() into
it, unpack some zipfiles, chdir() back to the original directory, and delete the
temp directory.  It doesn't work: strace indicates that the final chdir() is to
the directory that the installer is already in.

I am guessing that it doesn't work because mOriginalDir gets set in
nsXIEngine::LoadXPIStub().  That function gets called once for each .xpi module
downloaded.  The first time, it correctly sets mOriginalDir, but the second time
through it clobbers that with the path to the temp directory.  Therefore the
chdir() in nsXIEngine::~nsXIEngine doesn't go anywhere.

If I'm right, this can be fixed by moving the getcwd() call from LoadXPIStub()
into the nsXIEngine constructor.  I don't have a Mozilla checkout or any idea
how to test that change if I did, though.
>Seth: How did you run the browser after extracting it on a different machine? 
>I have never had any trouble running the browser from the command line once it
>gets installed.  This sounds like a different bug from mine.

I installed on another linux box, zipped up the whole beast, ftp'd it
over to the machine with the problem, and unzipped it.

after doing that, doing ./mozilla -installer will fail.  (I can attach the
strace log.)

I was trying to take the actual installer out of it.  based on my zip test,
even once we fix the installer problem, we'll have other problems.

I can get a build with the suggestion you made, and try it out and see if I get
further.
Peculiar.  When I run ./mozilla -installer it works just fine.  I think
we've definitely got two different bugs here.
In my initial strace of the installer run, I saw that we don't chdir() back to
the installer dir until *after* the sub-process has forked.  This patch causes
us to chdir() to the location of the binary (because it was readily available)
before running it.
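
The idea can be sketched roughly as follows. This is not the attached patch: run_from_dir is a hypothetical helper, and waiting for the child is done here only so the example can report an exit status, which the installer may well not do.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: chdir() to dest (a directory known to exist, e.g.
 * where the binary lives) BEFORE fork(), so the child does not inherit a
 * deleted working directory.
 * Returns the child's exit status, or -1 if chdir/fork/waitpid failed or
 * the child did not exit normally. */
int run_from_dir(const char *dest, char *const argv[])
{
    pid_t pid;
    int status;

    assert(dest != NULL && argv != NULL && argv[0] != NULL);

    if (chdir(dest) != 0)          /* must happen before the fork */
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                /* child: cwd is already dest */
        execv(argv[0], argv);
        _exit(127);                /* only reached if execv() failed */
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The chdir() result is checked rather than ignored, since a failure there would put the child back in exactly the situation this bug describes.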
Zack:

try this

./mozilla -ProfileManager

are you able to create a profile manually?
seth: Yeah, it works just fine.

chris: Aha, so _that_'s what's really going on.  That patch looks like the right
idea to me.
in LoadXPIStub /home/syd/machvbeta/netscape-installer
In XIEngine destructor /home/syd/machvbeta/netscape-installer
shell-init: could not get current directory: getcwd: cannot access parent
directories: No such file or directory


So, the above printfs show that the chdirs are matched, it appears no clobbering
is happening before we hit the destructor.
Ok, I think we might do something better than fprintf in handling the error
(however unlikely that is)
Comment on attachment 83463 [details] [diff] [review]
chdir to destination dir before fork()

r=syd
Attachment #83463 - Flags: review+
I just rebuilt with cls' patch and tried it on donner's machine.

now, install works, and running mozilla works, everything is fine.

Is this something we want for 1.0?  I'm not sure how common it is to have a
/bin/bash that has this bug.

re-assign to cls to land on the trunk.
Assignee: syd → seawood
Comment on attachment 83463 [details] [diff] [review]
chdir to destination dir before fork()

sr=sspitzer

It's probably one of our golden rules not to printf / fprintf anything in
optimized builds.

I think you should consider wrapping #include <errno.h> and the "if
(chdir(dest) < 0)" block with #ifdef DEBUG.
Attachment #83463 - Flags: superreview+
> Is this something we want for 1.0?  I'm not sure how common it is to have a
> /bin/bash that has this bug.

Very common.  All recent versions of bash exhibit this behavior.  Nominating for
1.0.
Keywords: mozilla1.0
getting on Mach V rtm radar.
Keywords: nsbeta1 → nsbeta1+
Whiteboard: [adt2 rtm]
fix checked in as is on the trunk.  

thanks to cls for the fix, and zackw@panix.com for all the debugging / testing.

I'll talk to drivers about getting this for 1.0.
Whiteboard: [adt2 rtm] → [adt2 rtm] [fixed on the trunk]
Target Milestone: --- → mozilla1.0
Keywords: adt1.0.0
Comment on attachment 83463 [details] [diff] [review]
chdir to destination dir before fork()

a=blizzard on behalf of drivers for the 1.0 branch
Attachment #83463 - Flags: approval+
The patch has been checked into the moz1.0.0 branch.
Status: NEW → RESOLVED
Closed: 22 years ago
Keywords: adt1.0.0 → fixed1.0.0
Resolution: --- → FIXED
Whiteboard: [adt2 rtm] [fixed on the trunk] → [adt2 rtm]
thanks for landing this on the 1.0 branch, as that means it will be
part of the machv rtm.
Whiteboard: [adt2 rtm] → [adt2 rtm] [fixed on trunk and 1.0 branch]
removing the item for this fixed bug from the mozilla 1.0 rc3 release notes and 
future versions. 
verifying on trunk 2002053108,
unable to verify on branch until bug 145776 lands and I am able to get to
ftp.mozilla.org to get a build!
Status: RESOLVED → VERIFIED
verifying on branch build for Mozilla 6/12
Product: Browser → Seamonkey