Closed Bug 1824691 Opened 2 years ago Closed 1 year ago

[gcc-12 issue] g++-12 seems to miscompile code and produces "fatal: STL threw bad_alloc" at runtime during mochitest of C-C TB (DEBUG version)..

Categories

(Thunderbird :: Build Config, defect)

defect

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: ishikawa, Unassigned)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

(I am not sure if I selected the right component. Please feel free to fix it.)

During C-C TB mochitest by running DEBUG version of C-C TB,
I see runtime error, fatal: STL threw bad_alloc

A dozen or so tests experience this error and crashed.

I have not seen this error when I compiled C-C TB with an earlier gcc
compiler suite. g++11 does not have this issue, for example.

My tool and linux versions.

I compiled C-C TB under Debian GNU/Linux using gcc-12, g++-12, etc..

g++ version is:
g++-12 (Debian 12.2.0-14) 12.2.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

linux version is:
uname -a
Linux ip030 6.0.0-2-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.0.5-1 (2022-10-28) x86_64 GNU/Linux

The version of g++-11 that does not cause this runtime errro to happen.
g++-11 (Debian 11.3.0-12) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

I am including a log from a test that failed due to the runtime assertion.

mach mochitest --debuger gdbcomm/calendar/test/browser/browser_calDAV_discovery.js

Short Summary

From the stackdump, I conclude that the following code in /usr/include/c++/7/bits/hashtable.h
produced __bkt_count as
18446744073709551557
which is FFFFFFFFFFFFFFC5 in hex.
This is way too large.
This value is passed to allocation routine and the allocation routine threw the exception and TB blew up.

	auto __nb_elems = __detail::__distance_fw(__f, __l);
	auto __bkt_count =
	  _M_rehash_policy._M_next_bkt(
	    std::max(_M_rehash_policy._M_bkt_for_elements(__nb_elems),
		     __bucket_hint));

I suspect some type of wraparound error, i.e., overflow or something.
I have a suspition that |__nb_elems| was a very funny value because |_f| and |_l| look very similar superficially, but could not verify it because |__nb_elems| is optimised out even at "-Og" (!?) [See the log attached.]
It could be g++-12 produced incorrect code since g++-11 does not have this problem.

The optimization I used was:

ac_add_options --enable-optimize="-g -Og                -fvar-tracking -gdwarf-4 -fvar-tracking-assignments -freorder-blocks"

So actually there was not much optimization and I was a bit disappointed to see that gdb could not print the value of "__nb_elems" saying that it was optimized out.

(Aside from this runtime error, I am a bit curious why the log from the test has strange characters as follows. But I now realize that using g++-11 also produces the same strange characters in the log.

ⰲ겿{"action":"test_status","time":1679894501487,"thread":null,"pid":null,"source":"mochitest","test":"chrome://mochitests/content/browser/comm/calendar/test/browser/browser_calDAV_discovery.js","subtest":"506 > 0","status":"PASS","message":"","js_source":"TestRunner.js"}ⰲ겿
)

Explanation of the log:

The attached log is from running the test using my shell wrapper to
invoke |mach|.

./do-make-tb.sh  mochitest --debugger gdb comm/calendar/test/browser/browser_calDAV_discovery.js

"do-make-tb.sh" is the shell wrapper to set various environment
variables before invoking |mach|.
The shell wrapper produces many additional output so that I can verify which
versions of the tool and source tree I used, and other environment
variable settings.

Also I have modified runtests.py to print out various internal test
information used by the test harness, especially timeout values.
(I have not figured out how to properly increase timeout values to run
mochitest of TB under valgrind, but so far little success. For
|mozmill| test framework, I figured out after may trials and error.s
|mochitest| is a hard nut to crack. But I digress)

So there is way too much information before gdb is finally invoked to
run TB binary.

Skip the additional log output explained above, by searching for
"(gdb) run" to find the execution of TB under gdb with my local debug
output.
Then you will see the segmentation error caught by MOZ_CRASH. The crash happens rather early.

I am reverting to g++-11 for now.
(Except that there is a change/patch for g++-12 I produced which
errs the compilation by g++-11.
I inserted the following pragma line in netwerk/protocol/websocket/WebSocketChannel.cpp.

#pragma GCC diagnostic ignored "-Wuse-after-free"

The reason I inserted this was as follows. g++-12 complained about the use of a pointer value that is passed to realloc. When realloc fails, the previous pointer value is used and
it is a legitimate use case. So I inserted the pragma to aovid the particular check using #pragma GCC diagnostic push/pop pairs.
(Yes, use-after-free catches this use case involving realloc(), too. Amazing.)

It seems -Wuse-after-free is new to g++-12 and not in g++-11. g++-11 complains that it does not know -Wuse-after-free. So I had to change the code to ignore the -Wuse-after-free check only after g++-11 by judicious use of GNUC and its comparison against 11.

LOG(("WebSocketChannel: update read buffer extended to %u\n", mBufferSize));
#pragma GCC diagnostic push
#if defined(__GNUC__) && __GNUC__ > 11
#pragma GCC diagnostic ignored "-Wuse-after-free"
#endif
    /* old, copy of mBuffer,  is used after realloc. But it is intentional.*/

Both g++-12 and gcc-12 seem to have nice additional compile time warning checks
and thus should be used in the future once these runtime problems are weeded
out IMHO.
g++-12 check is nicer, but there ARE places where source patches are necessary.

I have produced a set of patches to compile M-C/C-C TB with -Wall -Werror (except for some warnings) using gcc-12 compiler suite and now I think it covers the major dubious code spots.

EDIT: typos in command line to run test using --debugger were fixed.

The string "GNUC" with two underscores before and after it, _ _ GNUC _ _ without the extra space, appears as bold character GNUC above.

Blocks: build-gcc-12
Summary: g++-12 seems to miscompile code and produces "fatal: STL threw bad_alloc" at runtime during mochitest of C-C TB (DEBUG version).. → [gcc-12 issue] g++-12 seems to miscompile code and produces "fatal: STL threw bad_alloc" at runtime during mochitest of C-C TB (DEBUG version)..

The severity field is not set for this bug.
:Sylvestre, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(sledru)
Component: General → Build Config
Flags: needinfo?(sledru)
Product: Developer Infrastructure → Thunderbird
Assignee: nobody → daniel
Assignee: daniel → nobody

Now I have slightly updated gcc-12 on my PC (Debian GNU/Linux) and updated libstdc++.

$ g++-12 --version
g++-12 (Debian 12.3.0-4) 12.3.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Previously, it was

g++-12 (Debian 12.2.0-14) 12.2.0
With this version, it seems the C-C TB is created and it doesn't show the runtime error. I could run xpcshell test for example.

However, there may be a few issues one needs to be aware of.
For example, I need to use --without-sysroot to correctly configure the headers for new libstdc++, etc. ( Bug 1839324 )
Also, there are many new compile-time warnings and one may need to remove them. (I use -Werror to turn warnings into errors to catch as many subtle source code issues as possible. There ARE warnings that occur too often and for them, I have to use -Wno-error constructs.)

Anyway, it seems that it works now.

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: