Closed Bug 1345413 Opened 3 years ago Closed 3 years ago

Crash in nsDependentCString::nsDependentCString

Categories

(Toolkit :: Startup and Profile System, defect, P1, critical)

52 Branch
Unspecified
Linux
defect

Tracking

()

VERIFIED FIXED
mozilla55
Tracking Status
firefox52 + verified
firefox-esr52 52+ verified
firefox53 + verified
firefox54 + verified
firefox55 + verified

People

(Reporter: jcristau, Assigned: glandium)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is 
report bp-1bd8abb0-1846-4dda-8fd8-8a7292170308.
=============================================================

It looks like we crash in XREMain::XRE_mainStartup if the LOGNAME environment variable is unset.

We got a few of those crashes since yesterday on 52.0 release from Ubuntu users, and I can reproduce locally with "env -u LOGNAME firefox".  Not sure why this triggers now, it doesn't look like a new issue as far as I can tell.  My own crash reports got a "[@ XREMain::XRE_mainStartup ]" (52.0) or "[@ libxul.so@0x25c18db | XREMain::XRE_main ]" (55.0a1) signature.
Keywords: regression
Looks like a regression from bug 921063?
Blocks: 921063
Flags: needinfo?(mh+mozilla)
Flags: needinfo?(lultimouomo)
My Crash Report is linked to this Bug. 

I can start firefox without a problem with option --no-remote.

firefox --no-remote
Assignee: nobody → benjamin
Status: NEW → ASSIGNED
Priority: -- → P1
The code looks like it was written to handle this case, so it's not immediately obvious to me why this broke. For example:

https://dxr.mozilla.org/mozilla-central/rev/58753259bfeb3b818eac7870871b0aae1f8de64a/toolkit/xre/nsAppRunner.cpp#3815
Note profile.

CheckArg also called with null param:
https://dxr.mozilla.org/mozilla-central/rev/58753259bfeb3b818eac7870871b0aae1f8de64a/toolkit/xre/nsAppRunner.cpp#1704

We crash here:
https://dxr.mozilla.org/mozilla-release/rev/dbb35200d46931343fd853cbe1af7688794b0940/toolkit/xre/nsAppRunner.cpp#3822

So we didn't bail out before, despite that being the intention:
https://dxr.mozilla.org/mozilla-release/rev/dbb35200d46931343fd853cbe1af7688794b0940/toolkit/xre/nsAppRunner.cpp#1702

Looks like that CheckArg call isn't doing what we're expecting it to do.
If -u isn't passed, CheckArg won't do anything, AFAICT, what it does is just override LOGNAME if argv contains -u something?
It is not the CheckArg call that is crashing, there is a nsDependentCString lower in the function.
Yes, I understand. When looking at the code I thought CheckArg would return a BAD_ARG in this case, but it doesn't (as pointed out in comment 5), causing the construction of the mutex name to segfault. The mutex is new from bug 921063.
Duplicate of this bug: 1345438
[Tracking Requested - why for this release]: Linux startup crashes
Comment on attachment 8844904 [details]
Bug 1345413 - Don't crash if LOGNAME is not set in the environment,

https://reviewboard.mozilla.org/r/118190/#review120236

If one user doesn't have LOGNAME, it's likely any user on the same system won't have it set. Which means the mutex path would be the same for all users on the same system, which doesn't really sound great. Seems like it would be better to fallback to something like geteuid()/getpwuid() when username is null when creating the mutex path.
Attachment #8844904 - Flags: review?(mh+mozilla) → review-
Flags: needinfo?(mh+mozilla)
I see lines in the comments such as:
tracking-firefox55: --- → ?
This morning the upgrade to FF 52 caused it to consistently crash on start up (Ubuntu 16.04 when using Xrdp).  Nightly, which is FF 55 (55.0a1 (2017-03-08) (64-bit) is running well today.
Tracking as new startup crash in 52.  We should get this in a dot release.
I can't do anything more complicated. Mike can you take it?
Assignee: benjamin → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(mh+mozilla)
Assignee: nobody → mh+mozilla
Flags: needinfo?(mh+mozilla)
Comment on attachment 8846988 [details]
Bug 1345413 - Fallback to getpwuid() info when LOGNAME is not set when creating the XRemote mutex.

https://reviewboard.mozilla.org/r/120030/#review122034

r=me, I guess.

::: toolkit/xre/nsAppRunner.cpp:3826
(Diff revision 1)
> +        // Beware that another call to getpwent/getpwname/getpwuid will overwrite
> +        // pw, but we don't have such another call between here and when username
> +        // is used last.

Well, not lexically in this method, but we might have some other threads (?) running, too?  So we would really want to use `getpwuid_r`...or would that significantly complicate things?

::: toolkit/xre/nsAppRunner.cpp:3837
(Diff revision 1)
> -        program + NS_LITERAL_CSTRING("_") + nsDependentCString(username);
> +      // In the unlikely even that LOGNAME is not set and getpwuid failed, just
> +      // don't put the username in the mutex directory. It will conflict with
> +      // other users mutex, but the worst that can happen is that they wait for
> +      // MOZ_XREMOTE_START_TIMEOUT_SEC during startup in that case.

WDYT about appending some interesting identifier in the case where we don't have a username?  It'd only help in corner cases, but maybe it would help people figure out what's going on?  Maybe?
Attachment #8846988 - Flags: review?(nfroyd) → review+
(In reply to Nathan Froyd [:froydnj] from comment #15)
> Comment on attachment 8846988 [details]
> Bug 1345413 - Fallback to getpwuid() info when LOGNAME is not set when
> creating the XRemote mutex.
> 
> https://reviewboard.mozilla.org/r/120030/#review122034
> 
> r=me, I guess.
> 
> ::: toolkit/xre/nsAppRunner.cpp:3826
> (Diff revision 1)
> > +        // Beware that another call to getpwent/getpwname/getpwuid will overwrite
> > +        // pw, but we don't have such another call between here and when username
> > +        // is used last.
> 
> Well, not lexically in this method, but we might have some other threads (?)
> running, too?  So we would really want to use `getpwuid_r`...or would that
> significantly complicate things?

I'm not sure the _r variants are available on all the systems this code builds on, and I'd rather avoid having to figure out how large of a buffer needs to be allocated beforehand (and possibly retry on ERANGE, etc.).

OTOH, this specific code runs, AFAIK, *before* any other thread starts. XPCOM is not even initialized.

> ::: toolkit/xre/nsAppRunner.cpp:3837
> (Diff revision 1)
> > -        program + NS_LITERAL_CSTRING("_") + nsDependentCString(username);
> > +      // In the unlikely even that LOGNAME is not set and getpwuid failed, just
> > +      // don't put the username in the mutex directory. It will conflict with
> > +      // other users mutex, but the worst that can happen is that they wait for
> > +      // MOZ_XREMOTE_START_TIMEOUT_SEC during startup in that case.
> 
> WDYT about appending some interesting identifier in the case where we don't
> have a username?  It'd only help in corner cases, but maybe it would help
> people figure out what's going on?  Maybe?

wouldn't "firefox_" instead of "firefox_user" be a red flag already?
Flags: needinfo?(nfroyd)
(In reply to Mike Hommey [:glandium] from comment #16)
> (In reply to Nathan Froyd [:froydnj] from comment #15)
> > Comment on attachment 8846988 [details]
> > Bug 1345413 - Fallback to getpwuid() info when LOGNAME is not set when
> > creating the XRemote mutex.
> > 
> > https://reviewboard.mozilla.org/r/120030/#review122034
> > 
> > r=me, I guess.
> > 
> > ::: toolkit/xre/nsAppRunner.cpp:3826
> > (Diff revision 1)
> > > +        // Beware that another call to getpwent/getpwname/getpwuid will overwrite
> > > +        // pw, but we don't have such another call between here and when username
> > > +        // is used last.
> > 
> > Well, not lexically in this method, but we might have some other threads (?)
> > running, too?  So we would really want to use `getpwuid_r`...or would that
> > significantly complicate things?
> 
> I'm not sure the _r variants are available on all the systems this code
> builds on, and I'd rather avoid having to figure out how large of a buffer
> needs to be allocated beforehand (and possibly retry on ERANGE, etc.).

FWIW, the man page for getpwuid says that `sysconf(_SC_GETPW_R_SIZE_MAX)` should be sufficient.

> OTOH, this specific code runs, AFAIK, *before* any other thread starts.
> XPCOM is not even initialized.

Ah, so it does.  OK, let's not bother.

> > ::: toolkit/xre/nsAppRunner.cpp:3837
> > (Diff revision 1)
> > > -        program + NS_LITERAL_CSTRING("_") + nsDependentCString(username);
> > > +      // In the unlikely even that LOGNAME is not set and getpwuid failed, just
> > > +      // don't put the username in the mutex directory. It will conflict with
> > > +      // other users mutex, but the worst that can happen is that they wait for
> > > +      // MOZ_XREMOTE_START_TIMEOUT_SEC during startup in that case.
> > 
> > WDYT about appending some interesting identifier in the case where we don't
> > have a username?  It'd only help in corner cases, but maybe it would help
> > people figure out what's going on?  Maybe?
> 
> wouldn't "firefox_" instead of "firefox_user" be a red flag already?

Fair enough!
Flags: needinfo?(nfroyd)
(In reply to Nathan Froyd [:froydnj] from comment #17)
> FWIW, the man page for getpwuid says that `sysconf(_SC_GETPW_R_SIZE_MAX)`
> should be sufficient.

FWIW, that sentence is also followed with "If this size is too small, the call fails with ERANGE, in which case the caller can retry with a larger buffer". Considering the pw_name, pw_passwd, pw_gecos, pw_dir and pw_shell fields can be of arbitrary length, _SC_GETPW_R_SIZE_MAX can only give you a rule of thumb.
I found another problem with this code, namely, that the mutex directory is never removed as intended:

14434 rmdir("/tmp/firefox_glandium")    = -1 ENOTEMPTY (Directory not empty)
I'll file a separate bug for comment 19.
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/51e13fb7f254
Fallback to getpwuid() info when LOGNAME is not set when creating the XRemote mutex. r=froydnj
https://hg.mozilla.org/mozilla-central/rev/51e13fb7f254
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Comment on attachment 8846988 [details]
Bug 1345413 - Fallback to getpwuid() info when LOGNAME is not set when creating the XRemote mutex.

Approval Request Comment
[Feature/Bug causing the regression]: Regression from bug 921063
[User impact if declined]: Startup crash in some cases
[Is this code covered by automated tests?]: No
[Has the fix been verified in Nightly?]: Yes
[Needs manual test from QE? If yes, steps to reproduce]: unset LOGNAME before starting Firefox
[List of other uplifts needed for the feature/fix]: N/A
[Is the change risky?]: No
[Why is the change risky/not risky?]: The change adds a graceful fallback instead of the hard crash.
[String changes made/needed]: N/A
Attachment #8846988 - Flags: approval-mozilla-release?
Attachment #8846988 - Flags: approval-mozilla-esr52?
Attachment #8846988 - Flags: approval-mozilla-beta?
Attachment #8846988 - Flags: approval-mozilla-aurora?
Comment on attachment 8846988 [details]
Bug 1345413 - Fallback to getpwuid() info when LOGNAME is not set when creating the XRemote mutex.

Fix a crash. Beta53+ & Aurora54+.
Attachment #8846988 - Flags: approval-mozilla-beta?
Attachment #8846988 - Flags: approval-mozilla-beta+
Attachment #8846988 - Flags: approval-mozilla-aurora?
Attachment #8846988 - Flags: approval-mozilla-aurora+
Comment on attachment 8846988 [details]
Bug 1345413 - Fallback to getpwuid() info when LOGNAME is not set when creating the XRemote mutex.

let's get this in place for a possible dot release
Flags: needinfo?(lultimouomo)
Attachment #8846988 - Flags: approval-mozilla-release?
Attachment #8846988 - Flags: approval-mozilla-release+
Attachment #8846988 - Flags: approval-mozilla-esr52?
Attachment #8846988 - Flags: approval-mozilla-esr52+
Duplicate of this bug: 1346789
Flagging this for manual testing, instructions in Comment 0 and Comment 23.
Flags: qe-verify+
https://hg.mozilla.org/releases/mozilla-esr52/rev/b7c1f472aed7 (FIREFOX_ESR_52_0_X_RELBRANCH - 52.0.2)
The production version of firefox (52.0.1) that hit the Ubuntu distribution system this morning, does not have a fix for this issue.
52.0.1 was released last week to address a security issue disclosed by the pwn2own competition. This fix won't ship until 52.0.2 is released later this week (or whenever you distro happens to do so after that).
Reproduced the initial crash using 52.0 Firefox Release on Ubuntu 16.04 64bit, verified that the crash does not occur using Firefox 53beta5, latest Developer Edition 54.0a2 and latest Nightly 55.0a1 on Ubuntu 16.04 64bit.
Status: RESOLVED → VERIFIED
Flags: qe-verify+
Duplicate of this bug: 1348766
Let's make sure this works as intended on 52.0.2 as well.
Flags: qe-verify+
Reproduced the crash on 52.0, Ubuntu 16.04.
Verified fixed Fx 52.0.2, 52.0.2 ESR
You need to log in before you can comment on or make changes to this bug.