Local m-c build on Linux crashes on startup when compiled with clang-6
Categories
(Firefox Build System :: General, defect, P2)
Tracking
(firefox71 wontfix)
| Tracking | Status |
---|---|---|
firefox71 | --- | wontfix |
People
(Reporter: MatsPalmgren_bugz, Unassigned)
References
(Blocks 1 open bug)
Attachments
(1 file)
Starting it under gdb shows that it crashes due to infinite recursion while copying a string:
(gdb) bt
#0 0x00007fffe443ab5d in mozilla::detail::nsTStringRepr<char>::nsTStringRepr(char*, unsigned int, mozilla::detail::StringDataFlags, mozilla::detail::StringClassFlags) (this=<optimized out>, aData=0x7fffea2099c4 <gNullChar> "", aLength=0, aDataFlags=mozilla::detail::StringDataFlags::TERMINATED, aClassFlags=mozilla::detail::StringClassFlags::NULL_TERMINATED) at xpcom/string/nsTStringRepr.h:322
#1 0x00007fffe443ab5d in nsTSubstring<char>::nsTSubstring(mozilla::detail::StringClassFlags) (this=<optimized out>, aClassFlags=mozilla::detail::StringClassFlags::NULL_TERMINATED) at xpcom/string/nsTSubstring.h:1142
#2 0x00007fffe443ab5d in nsTString<char>::nsTString(nsTSubstringTuple<char> const&) (this=<optimized out>, aTuple=...) at xpcom/string/nsTString.h:95
#3 0x00007fffe443ab5d in nsTSubstring<char>::Assign(nsTSubstringTuple<char> const&, std::nothrow_t const&) (this=0x7fffdf6bf040, aTuple=..., aFallible=...) at xpcom/string/nsTSubstring.cpp:556
#4 0x00007fffe443ab82 in nsTSubstring<char>::Assign(nsTSubstringTuple<char> const&) (this=0x7fffea2099c4 <gNullChar>, aTuple=...) at xpcom/string/nsTSubstring.cpp:546
#5 0x00007fffe443ab82 in nsTString<char>::nsTString(nsTSubstringTuple<char> const&) (this=0x7fffea2099c4 <gNullChar>, aTuple=...) at xpcom/string/nsTString.h:96
#6 0x00007fffe443ab82 in nsTSubstring<char>::Assign(nsTSubstringTuple<char> const&, std::nothrow_t const&) (this=0x7fffdf6bf090, aTuple=..., aFallible=...) at xpcom/string/nsTSubstring.cpp:556
#7 0x00007fffe443ab82 in nsTSubstring<char>::Assign(nsTSubstringTuple<char> const&) (this=0x7fffea2099c4 <gNullChar>, aTuple=...) at xpcom/string/nsTSubstring.cpp:546
#8 0x00007fffe443ab82 in nsTString<char>::nsTString(nsTSubstringTuple<char> const&) (this=0x7fffea2099c4 <gNullChar>, aTuple=...) at xpcom/string/nsTString.h:96
#9 0x00007fffe443ab82 in nsTSubstring<char>::Assign(nsTSubstringTuple<char> const&, std::nothrow_t const&) (this=0x7fffdf6bf0e0, aTuple=..., aFallible=...) at xpcom/string/nsTSubstring.cpp:556
#10 0x00007fffe443ab82 in nsTSubstring<char>::Assign(nsTSubstringTuple<char> const&) (this=0x7fffea2099c4 <gNullChar>, aTuple=...) at xpcom/string/nsTSubstring.cpp:546
#11 0x00007fffe443ab82 in nsTString<char>::nsTString(nsTSubstringTuple<char> const&) (this=0x7fffea2099c4 <gNullChar>, aTuple=...) at xpcom/string/nsTString.h:96
#12 0x00007fffe443ab82 in nsTSubstring<char>::Assign(nsTSubstringTuple<char> const&, std::nothrow_t const&) (this=0x7fffdf6bf130, aTuple=..., aFallible=...) at xpcom/string/nsTSubstring.cpp:556
#13 0x00007fffe443ab82 in nsTSubstring<char>::Assign(nsTSubstringTuple<char> const&) (this=0x7fffea2099c4 <gNullChar>, aTuple=...) at xpcom/string/nsTSubstring.cpp:546
#14 0x00007fffe443ab82 in nsTString<char>::nsTString(nsTSubstringTuple<char> const&) (this=0x7fffea2099c4 <gNullChar>, aTuple=...) at xpcom/string/nsTString.h:96
#15 0x00007fffe443ab82 in nsTSubstring<char>::Assign(nsTSubstringTuple<char> const&, std::nothrow_t const&) (this=0x7fffdf6bf180, aTuple=..., aFallible=...) at xpcom/string/nsTSubstring.cpp:556
#16 0x00007fffe443ab82 in nsTSubstring<char>::Assign(nsTSubstringTuple<char> const&) (this=0x7fffea2099c4 <gNullChar>, aTuple=...) at xpcom/string/nsTSubstring.cpp:546
#17 0x00007fffe443ab82 in nsTString<char>::nsTString(nsTSubstringTuple<char> const&) (this=0x7fffea2099c4 <gNullChar>, aTuple=...) at xpcom/string/nsTString.h:96
#18 0x00007fffe443ab82 in nsTSubstring<char>::Assign(nsTSubstringTuple<char> const&, std::nothrow_t const&) (this=0x7fffdf6bf1d0, aTuple=..., aFallible=...) at xpcom/string/nsTSubstring.cpp:556
#19 0x00007fffe443ab82 in nsTSubstring<char>::Assign(nsTSubstringTuple<char> const&) (this=0x7fffea2099c4 <gNullChar>, aTuple=...) at xpcom/string/nsTSubstring.cpp:546
#20 0x00007fffe443ab82 in nsTString<char>::nsTString(nsTSubstringTuple<char> const&) (this=0x7fffea2099c4 <gNullChar>, aTuple=...) at xpcom/string/nsTString.h:96
#21 0x00007fffe443ab82 in nsTSubstring<char>::Assign(nsTSubstringTuple<char> const&, std::nothrow_t const&) (this=0x7fffdf6bf220, aTuple=..., aFallible=...) at xpcom/string/nsTSubstring.cpp:556
#22 0x00007fffe443ab82 in nsTSubstring<char>::Assign(nsTSubstringTuple<char> const&) (this=0x7fffea2099c4 <gNullChar>, aTuple=...) at xpcom/string/nsTSubstring.cpp:546
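The repeating frames in a backtrace like the one above can be spotted mechanically rather than by eye. The following Python is a minimal sketch, not part of any Mozilla tooling: the names `frame_locations` and `find_cycle` are hypothetical, and it assumes every gdb frame line ends with `at file:line`, as in the output above.

```python
import re

# A gdb frame line starts with "#<n>" and, when debug info is available,
# ends with "at <file>:<line>".
_LOCATION_RE = re.compile(r"\bat (\S+:\d+)\s*$")


def frame_locations(backtrace_text):
    """Return the "file:line" location of every frame line that has one."""
    locations = []
    for line in backtrace_text.splitlines():
        if not line.startswith("#"):
            continue
        match = _LOCATION_RE.search(line)
        if match:
            locations.append(match.group(1))
    return locations


def find_cycle(locations, min_repeats=3):
    """Return the shortest sequence of locations that repeats at least
    min_repeats times at the end of the listed frames, or None."""
    for period in range(1, len(locations) // min_repeats + 1):
        tail = locations[-period * min_repeats:]
        if all(tail[i] == tail[i % period] for i in range(len(tail))):
            return tail[:period]
    return None
```

Feeding the text of `bt` to `frame_locations` and the result to `find_cycle` should report a three-frame cycle for the trace above, since the locations `nsTSubstring.cpp:546`, `nsTString.h:96`, and `nsTSubstring.cpp:556` repeat in the same order.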
Comment 1•5 years ago (Reporter)
I'm guessing the above code is miscompiled for some reason.
I use the default clang on my system. Is this version not supported anymore?
# /usr/bin/clang --version
clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Comment 2•5 years ago (Reporter)
This is on "Ubuntu 18.04.3 LTS".
Comment 3•5 years ago (Reporter)
Problem solved by removing:
export CC=/usr/bin/clang
export CXX=/usr/bin/clang++
from my .mozconfig file. The build then defaulted to a newer clang I had under $HOME/.mozbuild/clang/bin/.
I'll leave this bug open anyway in case clang-6 is supposed to be supported.
Comment 4•5 years ago
dmajor pointed out that there are known miscompilation issues with clang-6 that we've patched in the version we use in automation. I'm not sure there's much to do here, but maybe we could add a check for clang-6.0.0 and refuse to build?
If the clang-6 is one from Taskcluster (say, if it's in .mozbuild) then it might still work...
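The check suggested here could look roughly like the following Python. This is a hedged sketch only, not the actual configure code: `parse_clang_version` and `should_refuse_clang` are hypothetical names, the `.mozbuild` location is passed in by the caller, and treating "lives under .mozbuild" as "comes from automation" is an assumption taken from the comment above.

```python
import os
import re


def parse_clang_version(version_output):
    """Return (major, minor, patch) parsed from `clang --version` output,
    or None if no version string is found."""
    match = re.search(r"clang version (\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def should_refuse_clang(version_output, compiler_path, mozbuild_dir):
    """Return True for a clang 6.0.0 that does not live under mozbuild_dir.

    Assumption: a compiler under mozbuild_dir is the patched automation
    build and is allowed through.
    """
    if parse_clang_version(version_output) != (6, 0, 0):
        return False
    compiler = os.path.realpath(compiler_path)
    root = os.path.realpath(mozbuild_dir)
    # commonpath() equals root only when compiler is inside root.
    return os.path.commonpath([compiler, root]) != root
```

With the `--version` output from comment 1 and `/usr/bin/clang` as the compiler path, this sketch would refuse the build, while the same version string from a compiler under `$HOME/.mozbuild` would be accepted.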
Comment 8•5 years ago
Can we add a configure check for this somehow? It seems Ubuntu almost shipped a build with this bug, see bug 1600467... :/
Comment 9•5 years ago
This is being tracked by https://bugs.launchpad.net/ubuntu/+source/firefox/+bug/1850529 in Ubuntu.
Ubuntu 16.04 has clang 6.0, and as pointed out this is known to result in miscompilation issues, which for some reason haven't really surfaced until now (Firefox 71). I have tested backporting clang 6.0.1 to Ubuntu 16.04 with all the patches listed in https://hg.mozilla.org/mozilla-central/file/5f1704e88fa79ad4156497de208c87c58a228ca2/build/build-clang/clang-6-linux64.json (as suggested in https://bugzilla.mozilla.org/show_bug.cgi?id=1592571#c11), but it didn't help.
I am considering attempting a backport of clang 8.
Please note that these Ubuntu builds (such as the one mentioned in https://bugzilla.mozilla.org/show_bug.cgi?id=1600467) are for testing purposes only, and the PPAs containing them (particularly https://launchpad.net/~ubuntu-mozilla-security/+archive/ubuntu/ppa) advertise that they are not meant for end users. Ubuntu developers do not intend to release a broken build of Firefox to the Ubuntu archive.
Comment 10•5 years ago
Ah, thanks for the update Olivier :)
I think it'd still be nice to detect this at configure time if possible, but it may not be worth the churn.
Comment 12•5 years ago
(In reply to Emilio Cobos Álvarez (:emilio) from comment #10)
Ah, thanks for the update Olivier :)
I think it'd still be nice to detect this at configure time if possible, but it may not be worth the churn.
I agree - a GSoC student and I were stumped on this for close to a month, trying to figure out what was wrong with their setup. There's also been a succession of dupes filed.
Can we add a configure check to prevent building with clang 6 unless it is located in .mozbuild?
Comment 13•5 years ago
Oops, wrong needinfo requestee.
(In reply to :Gijs (he/him) from comment #12)
Can we add a configure check to prevent building with clang 6 unless located in .mozbuild?
Comment 15•5 years ago
I'll take this. dmajor, does it seem worthwhile to explicitly check for the clang 6 from automation case? Just disallowing clang-6 seems more straightforward.
Comment 16•5 years ago
(In reply to Chris Manchester (:chmanchester) from comment #15)
I'll take this. dmajor, does it seem worthwhile to explicitly check for the clang 6 from automation case? Just disallowing clang-6 seems more straightforward.
So you're proposing that our clang requirement on Linux would be "5 or later, but not 6"? I don't have any objection, but double check with froydnj.
Comment 17•5 years ago
(In reply to :dmajor from comment #16)
(In reply to Chris Manchester (:chmanchester) from comment #15)
I'll take this. dmajor, does it seem worthwhile to explicitly check for the clang 6 from automation case? Just disallowing clang-6 seems more straightforward.
So you're proposing that our clang requirement on Linux would be "5 or later, but not 6"? I don't have any objection, but double check with froydnj.
Disallowing clang 6 entirely seems OK to me. Anybody who still has our automation-built clang 6 probably needs a kick to upgrade anyway. Please add a toolchain test to ensure that we detect clang 6 properly.
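The "5 or later, but not 6" policy and the requested toolchain test could be sketched as below. This is an illustrative Python outline under stated assumptions, not the patch that was later posted: `MIN_CLANG_MAJOR`, `BLOCKED_CLANG_MAJORS`, and `clang_version_error` are hypothetical names, and the error wording is invented for the example.

```python
# Policy from the discussion above: clang 5 or later, but not any clang 6.
MIN_CLANG_MAJOR = 5
BLOCKED_CLANG_MAJORS = frozenset({6})


def clang_version_error(version):
    """Return an error message if the (major, minor, patch) tuple is not
    acceptable under the policy, or None if the compiler is allowed."""
    text = ".".join(str(part) for part in version)
    major = version[0]
    if major < MIN_CLANG_MAJOR:
        return "clang %s is too old; clang %d or newer is required" % (
            text,
            MIN_CLANG_MAJOR,
        )
    if major in BLOCKED_CLANG_MAJORS:
        return "clang %s is not supported due to known miscompilation" % text
    return None


def check_policy_cases():
    """A small table-driven test in the spirit of a toolchain test."""
    cases = [
        ((4, 0, 1), False),
        ((5, 0, 0), True),
        ((6, 0, 0), False),
        ((6, 0, 1), False),
        ((8, 0, 0), True),
    ]
    for version, allowed in cases:
        assert (clang_version_error(version) is None) == allowed, version
```

Blocking the whole major version, rather than only 6.0.0, keeps the rule simple and also covers 6.0.1, which comment 9 reports did not help even with the automation patches applied.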
Comment 18•5 years ago
It's very possible this is the same as bug 1601707. If it is, we don't need to exclude clang 6.
Comment 19•5 years ago
(In reply to Mike Hommey [:glandium] (high latency) from comment #18)
It's very possible this is the same as bug 1601707. If it is, we don't need to exclude clang 6.
OTOH, I don't want to be rediscovering that people are using a compiler that doesn't implement some finer point of C++17 (bug 1601707 comment 5) every couple of weeks or months, wasting time debugging, and rewriting the code to compensate.
Comment 20•5 years ago
(In reply to Nathan Froyd [:froydnj] from comment #19)
(In reply to Mike Hommey [:glandium] (high latency) from comment #18)
It's very possible this is the same as bug 1601707. If it is, we don't need to exclude clang 6.
OTOH, I don't want to be rediscovering that people are using a compiler that doesn't implement some finer point of C++17 (bug 1601707 comment 5) every couple of weeks or months, wasting time debugging, and rewriting the code to compensate.
I think we should have some gcc build running tests on automation, fwiw. Finding stuff like that or bug 1600735 the hard way is really painful :(
Comment 21•5 years ago
Maybe even tier 2 / running on m-c only, or something?
Comment 22•5 years ago
(In reply to Emilio Cobos Álvarez (:emilio) from comment #20)
(In reply to Nathan Froyd [:froydnj] from comment #19)
(In reply to Mike Hommey [:glandium] (high latency) from comment #18)
It's very possible this is the same as bug 1601707. If it is, we don't need to exclude clang 6.
OTOH, I don't want to be rediscovering that people are using a compiler that doesn't implement some finer point of C++17 (bug 1601707 comment 5) every couple of weeks or months, wasting time debugging, and rewriting the code to compensate.
I think we should have some gcc build running tests on automation, fwiw. Finding stuff like that or bug 1600735 the hard way is really painful :(
I totally agree about this stuff being super-painful to debug. But running GCC and clang base-toolchain tests might be a lot to ask. :(
Comment 23•5 years ago
(In reply to Emilio Cobos Álvarez (:emilio) from comment #21)
Maybe even tier 2 / running on m-c only, or something?
I'll just note that, as somebody who has had multiple patches backed out because of jobs that only run on central and that aren't selectable by default by mach try fuzzy (or are non-obviously selectable), I'd really not like to see us add more "only run on m-c" jobs.
Comment 24•5 years ago
(In reply to Nathan Froyd [:froydnj] from comment #19)
(In reply to Mike Hommey [:glandium] (high latency) from comment #18)
It's very possible this is the same as bug 1601707. If it is, we don't need to exclude clang 6.
OTOH, I don't want to be rediscovering that people are using a compiler that doesn't implement some finer point of C++17 (bug 1601707 comment 5) every couple of weeks or months, wasting time debugging, and rewriting the code to compensate.
That specific comment explicitly says current GCC doesn't implement it. Are we going to require clang only now?
Comment 25•5 years ago
(In reply to Mike Hommey [:glandium] (high latency) from comment #24)
(In reply to Nathan Froyd [:froydnj] from comment #19)
(In reply to Mike Hommey [:glandium] (high latency) from comment #18)
It's very possible this is the same as bug 1601707. If it is, we don't need to exclude clang 6.
OTOH, I don't want to be rediscovering that people are using a compiler that doesn't implement some finer point of C++17 (bug 1601707 comment 5) every couple of weeks or months, wasting time debugging, and rewriting the code to compensate.
That specific comment explicitly says current GCC doesn't implement it. Are we going to require clang only now?
Can we bump the required version of GCC too? :)
I don't think we can drop GCC support just on that. Maybe we should just get a static checker for the string issue?
Comment 26•5 years ago
(In reply to Mike Hommey [:glandium] (high latency) from comment #24)
That specific comment explicitly says current GCC doesn't implement it. Are we going to require clang only now?
Well, in fairness, GCC just fixed it: https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=279069
I think they would've fixed it much earlier if we had found this on automation rather than after busting builds shipped by distros to a bunch of users... :)
Supporting clang only would be sad, IMHO. I use gcc builds from time to time for debugging, as they have better debug info.
(In reply to Nathan Froyd [:froydnj] from comment #23)
I'll just note that, as somebody who has had multiple patches backed out because of jobs that only run on central and that aren't selectable by default by mach try fuzzy (or are non-obviously selectable), I'd really not like to see us add more "only run on m-c" jobs.
To be clear, I'd be more than happy with them running in all pushes :)
But if the automation cost is a concern, Tier2 + central-only shouldn't get anyone backed out, and should be cheaper, if I understand our backout policy correctly.
Comment 27•5 years ago
I ended up writing a patch for this. I'll post it in case we end up wanting to take it.
Comment 28•5 years ago
Comment 29•5 years ago
Someone on #introduction mentioned that https://phabricator.services.mozilla.com/D56873 did fix the problem for them.
Comment 30•5 years ago
(In reply to Olivier Tilloy from comment #9)
I am considering attempting a backport of clang 8.
I ended up backporting clang 8 to Ubuntu 16.04, and rebuilding Firefox 71.0+build5 with it. The startup crash is gone, and that build was published to xenial-security and xenial-updates yesterday. So as far as Ubuntu is concerned, problem solved.
Comment 31•5 years ago
Un-assigning for now. There seem to be arguments for and against here, but if we're sure people aren't hitting this anymore, this should be closed.