Closed Bug 1594686 Opened 5 years ago Closed 5 years ago

Local m-c build on Linux crashes on startup when compiled with clang-6

Categories

(Firefox Build System :: General, defect, P2)

x86_64
Linux
defect

Tracking

(firefox71 wontfix)

RESOLVED WONTFIX

People

(Reporter: MatsPalmgren_bugz, Unassigned)

References

(Blocks 1 open bug)

Attachments

(1 file)

Starting it under gdb shows that it crashes due to infinite recursion copying a string:

(gdb) bt
#0 0x00007fffe443ab5d in mozilla::detail::nsTStringRepr<char>::nsTStringRepr(char*, unsigned int, mozilla::detail::StringDataFlags, mozilla::detail::StringClassFlags) (this=<optimized out>, aData=0x7fffea2099c4 <gNullChar> "", aLength=0, aDataFlags=mozilla::detail::StringDataFlags::TERMINATED, aClassFlags=mozilla::detail::StringClassFlags::NULL_TERMINATED) at xpcom/string/nsTStringRepr.h:322
#1 0x00007fffe443ab5d in nsTSubstring<char>::nsTSubstring(mozilla::detail::StringClassFlags) (this=<optimized out>, aClassFlags=mozilla::detail::StringClassFlags::NULL_TERMINATED) at xpcom/string/nsTSubstring.h:1142
#2 0x00007fffe443ab5d in nsTString<char>::nsTString(nsTSubstringTuple<char> const&) (this=<optimized out>, aTuple=...) at xpcom/string/nsTString.h:95
#3 0x00007fffe443ab5d in nsTSubstring<char>::Assign(nsTSubstringTuple<char> const&, std::nothrow_t const&) (this=0x7fffdf6bf040, aTuple=..., aFallible=...) at xpcom/string/nsTSubstring.cpp:556
#4 0x00007fffe443ab82 in nsTSubstring<char>::Assign(nsTSubstringTuple<char> const&) (this=0x7fffea2099c4 <gNullChar>, aTuple=...) at xpcom/string/nsTSubstring.cpp:546
#5 0x00007fffe443ab82 in nsTString<char>::nsTString(nsTSubstringTuple<char> const&) (this=0x7fffea2099c4 <gNullChar>, aTuple=...) at xpcom/string/nsTString.h:96
#6 0x00007fffe443ab82 in nsTSubstring<char>::Assign(nsTSubstringTuple<char> const&, std::nothrow_t const&) (this=0x7fffdf6bf090, aTuple=..., aFallible=...) at xpcom/string/nsTSubstring.cpp:556
#7 0x00007fffe443ab82 in nsTSubstring<char>::Assign(nsTSubstringTuple<char> const&) (this=0x7fffea2099c4 <gNullChar>, aTuple=...) at xpcom/string/nsTSubstring.cpp:546
#8 0x00007fffe443ab82 in nsTString<char>::nsTString(nsTSubstringTuple<char> const&) (this=0x7fffea2099c4 <gNullChar>, aTuple=...) at xpcom/string/nsTString.h:96
#9 0x00007fffe443ab82 in nsTSubstring<char>::Assign(nsTSubstringTuple<char> const&, std::nothrow_t const&) (this=0x7fffdf6bf0e0, aTuple=..., aFallible=...) at xpcom/string/nsTSubstring.cpp:556
#10 0x00007fffe443ab82 in nsTSubstring<char>::Assign(nsTSubstringTuple<char> const&) (this=0x7fffea2099c4 <gNullChar>, aTuple=...) at xpcom/string/nsTSubstring.cpp:546
#11 0x00007fffe443ab82 in nsTString<char>::nsTString(nsTSubstringTuple<char> const&) (this=0x7fffea2099c4 <gNullChar>, aTuple=...) at xpcom/string/nsTString.h:96
#12 0x00007fffe443ab82 in nsTSubstring<char>::Assign(nsTSubstringTuple<char> const&, std::nothrow_t const&) (this=0x7fffdf6bf130, aTuple=..., aFallible=...) at xpcom/string/nsTSubstring.cpp:556
#13 0x00007fffe443ab82 in nsTSubstring<char>::Assign(nsTSubstringTuple<char> const&) (this=0x7fffea2099c4 <gNullChar>, aTuple=...) at xpcom/string/nsTSubstring.cpp:546
#14 0x00007fffe443ab82 in nsTString<char>::nsTString(nsTSubstringTuple<char> const&) (this=0x7fffea2099c4 <gNullChar>, aTuple=...) at xpcom/string/nsTString.h:96
#15 0x00007fffe443ab82 in nsTSubstring<char>::Assign(nsTSubstringTuple<char> const&, std::nothrow_t const&) (this=0x7fffdf6bf180, aTuple=..., aFallible=...) at xpcom/string/nsTSubstring.cpp:556
#16 0x00007fffe443ab82 in nsTSubstring<char>::Assign(nsTSubstringTuple<char> const&) (this=0x7fffea2099c4 <gNullChar>, aTuple=...) at xpcom/string/nsTSubstring.cpp:546
#17 0x00007fffe443ab82 in nsTString<char>::nsTString(nsTSubstringTuple<char> const&) (this=0x7fffea2099c4 <gNullChar>, aTuple=...) at xpcom/string/nsTString.h:96
#18 0x00007fffe443ab82 in nsTSubstring<char>::Assign(nsTSubstringTuple<char> const&, std::nothrow_t const&) (this=0x7fffdf6bf1d0, aTuple=..., aFallible=...) at xpcom/string/nsTSubstring.cpp:556
#19 0x00007fffe443ab82 in nsTSubstring<char>::Assign(nsTSubstringTuple<char> const&) (this=0x7fffea2099c4 <gNullChar>, aTuple=...) at xpcom/string/nsTSubstring.cpp:546
#20 0x00007fffe443ab82 in nsTString<char>::nsTString(nsTSubstringTuple<char> const&) (this=0x7fffea2099c4 <gNullChar>, aTuple=...) at xpcom/string/nsTString.h:96
#21 0x00007fffe443ab82 in nsTSubstring<char>::Assign(nsTSubstringTuple<char> const&, std::nothrow_t const&) (this=0x7fffdf6bf220, aTuple=..., aFallible=...) at xpcom/string/nsTSubstring.cpp:556
#22 0x00007fffe443ab82 in nsTSubstring<char>::Assign(nsTSubstringTuple<char> const&) (this=0x7fffea2099c4 <gNullChar>, aTuple=...) at xpcom/string/nsTSubstring.cpp:546

I'm guessing the above code is miscompiled for some reason.
I use the default clang on my system. Is this version not supported anymore?

# /usr/bin/clang --version
clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

This is on "Ubuntu 18.04.3 LTS".

Problem solved by removing:

export CC=/usr/bin/clang
export CXX=/usr/bin/clang++

from my .mozconfig file. Then the build defaulted to use a newer clang I had under $HOME/.mozbuild/clang/bin/

I'll leave this bug open anyway in case clang-6 is supposed to be supported.

Severity: blocker → normal
Summary: Local m-c build on Linux crashes on startup with "Exiting due to channel error" message → Local m-c build on Linux crashes on startup when compiled with clang-6

dmajor pointed out that there are known miscompilation issues with clang-6 that we've patched in the version we use in automation. I'm not sure there's much to do here, but maybe we could add a check for clang-6.0.0 and refuse to build?
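
A check like the one suggested here could start from the compiler's version banner. The following is a minimal Python sketch, not the actual Firefox configure code; the names parse_clang_version and is_known_bad_clang are hypothetical, and the banner string is the one reported earlier in this bug.

```python
import re

# Hypothetical pattern: matches the "clang version X.Y.Z" part of the banner.
_VERSION_RE = re.compile(r"clang version (\d+)\.(\d+)\.(\d+)")


def parse_clang_version(banner):
    """Return (major, minor, patch) from `clang --version` output, or None."""
    match = _VERSION_RE.search(banner)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_known_bad_clang(banner):
    """True when the banner reports exactly clang 6.0.0 (assumed policy)."""
    return parse_clang_version(banner) == (6, 0, 0)


if __name__ == "__main__":
    banner = "clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)"
    print(parse_clang_version(banner))
    print(is_known_bad_clang(banner))
```

Matching on the exact 6.0.0 triple is only one possible policy; a distro build and an automation build can share the same version number while carrying different patches, which is why the banner alone may not be enough.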

If the clang-6 is one from taskcluster (say, if it's in .mozbuild) then it might still work...

See Also: → 1595360
Crash Signature: [@ libxul.so@0x8741e6 | libxul.so@0x8741c0 | libxul.so@0x8741c0 | libxul.so@0x8740f7 | libxul.so@0x86d82f | libxul.so@0x299828a | libxul.so@0x5d3b0e3 | libxul.so@0x5d3bc79 | libxul.so@0x5d3ba35 | libxul.so@0x898308 | libxul.so@0x6f33d37 | libxul.so@0x9037…
Crash Signature: libxul.so@0x9037...] → libxul.so@0x9037...] [@ libxul.so@0x8743c6 | libxul.so@0x8743a0 | libxul.so@0x8743a0 | libxul.so@0x8742d7 | libxul.so@0x86da0f | libxul.so@0x2998afa | libxul.so@0x5d39d70 | libxul.so@0x5d3a906 | libxul.so@0x5d3a6c2 | libxul.so@0x8984e8 | libxul.so@0x6f3…
See Also: → 1600467

Can we add a configure check for this somehow? It seems Ubuntu almost shipped a build with this bug, see bug 1600467... :/

This is being tracked by https://bugs.launchpad.net/ubuntu/+source/firefox/+bug/1850529 in Ubuntu.
Ubuntu 16.04 has clang 6.0, and as pointed out, this is known to result in miscompilation issues, which for some reason hadn't really surfaced until now (Firefox 71). I have tested backporting clang 6.0.1 to Ubuntu 16.04 with all the patches listed in https://hg.mozilla.org/mozilla-central/file/5f1704e88fa79ad4156497de208c87c58a228ca2/build/build-clang/clang-6-linux64.json (as suggested in https://bugzilla.mozilla.org/show_bug.cgi?id=1592571#c11), but it didn't help.
I am considering attempting a backport of clang 8.

Please note that these Ubuntu builds (such as the one mentioned in https://bugzilla.mozilla.org/show_bug.cgi?id=1600467) are for testing purposes only, and the PPAs containing them (particularly https://launchpad.net/~ubuntu-mozilla-security/+archive/ubuntu/ppa) advertise that they are not meant for end users. Ubuntu developers do not intend to release a broken build of firefox to the Ubuntu archive.

Ah, thanks for the update Olivier :)

I think it'd still be nice to detect this at configure time if possible, but it may be not worth the churn.

(In reply to Emilio Cobos Álvarez (:emilio) from comment #10)

> Ah, thanks for the update Olivier :)
>
> I think it'd still be nice to detect this at configure time if possible, but it may be not worth the churn.

I agree - a GSoC student and I were stumped on this for close to a month, trying to figure out what was wrong with their setup. There's also been a succession of dupes filed.

Can we add a configure check to prevent building with clang 6 unless located in .mozbuild ?
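
The ".mozbuild" exemption proposed here could be sketched as a path test. This is an illustration only: the helper names is_under_mozbuild and should_reject, and the example paths, are assumptions rather than anything from the Firefox build system.

```python
from pathlib import Path


def is_under_mozbuild(compiler_path, state_dir):
    """True if compiler_path lives somewhere inside state_dir.

    Both paths are resolved first so that symlinks and ".." segments
    cannot make an outside compiler look like it is inside state_dir.
    """
    compiler = Path(compiler_path).resolve()
    root = Path(state_dir).resolve()
    return root in compiler.parents


def should_reject(major_version, compiler_path, state_dir):
    """Reject clang 6 unless it comes from the state directory (assumed policy)."""
    return major_version == 6 and not is_under_mozbuild(compiler_path, state_dir)


if __name__ == "__main__":
    state_dir = "/home/user/.mozbuild"  # hypothetical location
    print(should_reject(6, "/usr/bin/clang", state_dir))
    print(should_reject(6, "/home/user/.mozbuild/clang/bin/clang", state_dir))
```

A path test only says where the compiler was found, not whether it carries the automation patches, so it is a heuristic at best.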

Component: XPCOM → General
Flags: needinfo?(nfroyd)
Product: Core → Firefox Build System

Oops, wrong needinfo requestee.

(In reply to :Gijs (he/him) from comment #12)

> Can we add a configure check to prevent building with clang 6 unless located in .mozbuild ?

Flags: needinfo?(nfroyd) → needinfo?(cmanchester)

Yes, I will find an assignee.

Blocks: mach-busted
Priority: -- → P2

I'll take this. dmajor, does it seem worthwhile to explicitly check for the clang 6 from automation case? Just disallowing clang-6 seems more straightforward.

Assignee: nobody → cmanchester
Flags: needinfo?(cmanchester) → needinfo?(dmajor)

(In reply to Chris Manchester (:chmanchester) from comment #15)

> I'll take this. dmajor, does it seem worthwhile to explicitly check for the clang 6 from automation case? Just disallowing clang-6 seems more straightforward.

So you're proposing that our clang requirement on Linux would be "5 or later, but not 6"? I don't have any objection, but double check with froydnj.

Flags: needinfo?(dmajor) → needinfo?(nfroyd)

(In reply to :dmajor from comment #16)

> (In reply to Chris Manchester (:chmanchester) from comment #15)
>
> > I'll take this. dmajor, does it seem worthwhile to explicitly check for the clang 6 from automation case? Just disallowing clang-6 seems more straightforward.
>
> So you're proposing that our clang requirement on Linux would be "5 or later, but not 6"? I don't have any objection, but double check with froydnj.

Disallowing clang 6 entirely seems OK to me. Anybody who still has our automation-built clang 6 probably needs a kick to upgrade anyway. Please add a toolchain test to ensure that we detect clang 6 properly.
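
The "5 or later, but not 6" rule, together with the kind of test requested here, could look roughly like the Python sketch below. The constants, the function check_clang_major, and the error strings are all hypothetical; this is not the patch that was later posted in this bug.

```python
MIN_CLANG_MAJOR = 5          # "5 or later", per the discussion above
BLOCKED_CLANG_MAJORS = {6}   # "but not 6"


def check_clang_major(major):
    """Return None if the major version is acceptable, else an error string."""
    if major < MIN_CLANG_MAJOR:
        return "clang %d is too old; version %d or later is required" % (
            major, MIN_CLANG_MAJOR)
    if major in BLOCKED_CLANG_MAJORS:
        return "clang %d is not supported due to known miscompilations" % major
    return None


def run_policy_checks():
    """Table-driven check that each version is accepted or rejected as intended."""
    cases = [(4, False), (5, True), (6, False), (7, True), (8, True)]
    for major, expect_ok in cases:
        accepted = check_clang_major(major) is None
        assert accepted == expect_ok, "unexpected result for clang %d" % major


if __name__ == "__main__":
    run_policy_checks()
    print("all policy checks passed")
```

Keeping the blocked versions in a set separate from the minimum makes it cheap to unblock 6 later, for example if the crash turns out to have another cause.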

Flags: needinfo?(nfroyd)

It's very possible this is the same as bug 1601707. If it is, we don't need to exclude clang 6.

(In reply to Mike Hommey [:glandium] (high latency) from comment #18)

> It's very possible this is the same as bug 1601707. If it is, we don't need to exclude clang 6.

OTOH, I don't want to be rediscovering that people are using a compiler that doesn't implement some finer point of C++17 (bug 1601707 comment 5) every couple of weeks or months, wasting time debugging, and rewriting the code to compensate.

(In reply to Nathan Froyd [:froydnj] from comment #19)

> (In reply to Mike Hommey [:glandium] (high latency) from comment #18)
>
> > It's very possible this is the same as bug 1601707. If it is, we don't need to exclude clang 6.
>
> OTOH, I don't want to be rediscovering that people are using a compiler that doesn't implement some finer point of C++17 (bug 1601707 comment 5) every couple of weeks or months, wasting time debugging, and rewriting the code to compensate.

I think we should have some gcc build running tests on automation, fwiw. Finding stuff like that or bug 1600735 the hard way is really painful :(

Maybe even tier 2 / running on m-c only, or something?

(In reply to Emilio Cobos Álvarez (:emilio) from comment #20)

> (In reply to Nathan Froyd [:froydnj] from comment #19)
>
> > (In reply to Mike Hommey [:glandium] (high latency) from comment #18)
> >
> > > It's very possible this is the same as bug 1601707. If it is, we don't need to exclude clang 6.
> >
> > OTOH, I don't want to be rediscovering that people are using a compiler that doesn't implement some finer point of C++17 (bug 1601707 comment 5) every couple of weeks or months, wasting time debugging, and rewriting the code to compensate.
>
> I think we should have some gcc build running tests on automation, fwiw. Finding stuff like that or bug 1600735 the hard way is really painful :(

I totally agree about this stuff being super-painful to debug. But running GCC and clang base-toolchain tests might be a lot to ask. :(

(In reply to Emilio Cobos Álvarez (:emilio) from comment #21)

> Maybe even tier 2 / running on m-c only, or something?

I'll just note that as somebody who has had multiple patches backed out because of jobs that only run on central and those jobs aren't selectable by default by mach try fuzzy (or non-obviously selectable), I'd really not like to see us add more of "only run on m-c" jobs.

(In reply to Nathan Froyd [:froydnj] from comment #19)

> (In reply to Mike Hommey [:glandium] (high latency) from comment #18)
>
> > It's very possible this is the same as bug 1601707. If it is, we don't need to exclude clang 6.
>
> OTOH, I don't want to be rediscovering that people are using a compiler that doesn't implement some finer point of C++17 (bug 1601707 comment 5) every couple of weeks or months, wasting time debugging, and rewriting the code to compensate.

That specific comment explicitly says current GCC doesn't implement it. Are we going to require clang only now?

(In reply to Mike Hommey [:glandium] (high latency) from comment #24)

> (In reply to Nathan Froyd [:froydnj] from comment #19)
>
> > (In reply to Mike Hommey [:glandium] (high latency) from comment #18)
> >
> > > It's very possible this is the same as bug 1601707. If it is, we don't need to exclude clang 6.
> >
> > OTOH, I don't want to be rediscovering that people are using a compiler that doesn't implement some finer point of C++17 (bug 1601707 comment 5) every couple of weeks or months, wasting time debugging, and rewriting the code to compensate.
>
> That specific comment explicitly says current GCC doesn't implement it. Are we going to require clang only now?

Can we bump the required version of GCC too? :)

I don't think we can drop GCC support just on that. Maybe we should just get a static checker for the string issue?

(In reply to Mike Hommey [:glandium] (high latency) from comment #24)

> That specific comment explicitly says current GCC doesn't implement it. Are we going to require clang only now?

Well GCC just fixed it in fairness: https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=279069

I think they would've fixed it much earlier if we had found this on automation rather than after busting builds shipped by distros to a bunch of users... :)

Supporting only clang would be sad, IMHO. I use gcc builds from time to time for debugging, as they have better debug info.

(In reply to Nathan Froyd [:froydnj] from comment #23)

> I'll just note that as somebody who has had multiple patches backed out because of jobs that only run on central and those jobs aren't selectable by default by mach try fuzzy (or non-obviously selectable), I'd really not like to see us add more of "only run on m-c" jobs.

To be clear, I'd be more than happy with them running in all pushes :)

But if the automation cost is a concern, Tier2 + central-only shouldn't get anyone backed out, and should be cheaper, if I understand our backout policy correctly.

I ended up writing a patch for this. I'll post it in case we end up wanting to take it.

Someone on #introduction mentioned that https://phabricator.services.mozilla.com/D56873 did fix the problem for them.

(In reply to Olivier Tilloy from comment #9)

> I am considering attempting a backport of clang 8.

I ended up backporting clang 8 to Ubuntu 16.04, and rebuilding firefox 71.0+build5 with it. The startup crash is gone, and that build was published to xenial-security and xenial-updates yesterday. So as far as Ubuntu is concerned, problem solved.

Un-assigning for now. There seem to be arguments for and against here, but if we're sure people aren't hitting this anymore, this should be closed.

Assignee: cmanchester → nobody

Closing

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX