Open Bug 1729459 Opened 3 months ago Updated 3 days ago

Floating-Point Normalization breaks build on 32bit Linux

Categories

(Core :: JavaScript Engine, defect)

Firefox 93
defect

Tracking

()

Tracking Status
firefox-esr78 --- unaffected
firefox-esr91 --- fix-optional
firefox92 --- unaffected
firefox93 --- wontfix
firefox94 --- wontfix
firefox95 --- fix-optional

People

(Reporter: me, Unassigned, NeedInfo)

References

(Regression)

Details

(Keywords: regression)

Attachments

(3 files)

(In reply to Tooru Fujisawa [:arai] from comment #48 on Bug 531915)

given that what we want to do there is just use double, regardless of system's definition (double or long double),
I think we'd better workaround the conflict by not using the double_t or __double_t identifiers.

We can replace double_t and __double_t to some other names, or maybe just use double itself.

I don't think that would resolve the issue. In math libraries, for uniformity we want to use 64bit doubles on all systems---on some systems a double might be 32bit---which we why we define a separate double_t.

On Windows, a double is always 64bit so we don't have an issue. However, that seems to be not true on the specified linux as per the crashlog because the math library in glibc is defining double_t to be long double.

We could check if double_t is defined and use that but, from a quick google search, that doesn't seem easy to do in cpp.

Flags: needinfo?(tom)
Flags: needinfo?(arai.unmht)

AFAICT this isn't a runtime crash it's a build break

Summary: Floating-Point Normalization Crashes on 32bit Linux → Floating-Point Normalization breaks build on 32bit Linux

So just to confirm, this didn't break all Firefox x86 builds; we're still green on mozilla-central. Those builds are called 'Linux' but don't have an explicit x86 or 32-bit id in their name. So it's only affecting a subset of Linux builds.

Just to make sure, does sizeof(double) == 4 happen also on C++ ?
the original code is C, but we're compiling it as C++.

Flags: needinfo?(arai.unmht) → needinfo?(me)

One potential solution adapted from your original idea of just using the system supplied double, that sidesteps this issue, is to not define double_t for 32-bit Linux and only define it for other platforms. Since glibc is guaranteed to define double_t (cf. these lines in math.h) it should, in theory, work (and it shouldn't break any existing behaviour on other platforms).

I am thinking something like

#if defined(__linux__) && defined(__i386__)
// do nothing, rely on glibc's double_t
#else
// keep the original code
#endif
Flags: needinfo?(me)

(In reply to Tom Ritter [:tjr] (ni? for response to CVE/sec-approval/advisories/etc) from comment #3)

So just to confirm, this didn't break all Firefox x86 builds; we're still green on mozilla-central.

mozilla-central builds using an old glibc.

(In reply to sanketh from comment #1)

I don't think that would resolve the issue. In math libraries, for uniformity we want to use 64bit doubles on all systems---on some systems a double might be 32bit---which we why we define a separate double_t.

There is some misunderstanding here, double is always 64-bit, whereas long double is at least 64-bit. See also https://en.cppreference.com/w/cpp/language/types#Floating-point_types and https://en.cppreference.com/w/cpp/types/climits/FLT_EVAL_METHOD.

(In reply to André Bargull [:anba] from comment #7)

(In reply to sanketh from comment #1)

I don't think that would resolve the issue. In math libraries, for uniformity we want to use 64bit doubles on all systems---on some systems a double might be 32bit---which we why we define a separate double_t.

There is some misunderstanding here, double is always 64-bit, whereas long double is at least 64-bit. See also https://en.cppreference.com/w/cpp/language/types#Floating-point_types and https://en.cppreference.com/w/cpp/types/climits/FLT_EVAL_METHOD.

Hmm, interesting, why glibc is doing then. 🤔

On 32-bit x86, unless explicitly overridden by someone setting __FLT_EVAL_METHOD__, it sets __GLIBC_FLT_EVAL_METHOD=2 and that setting causes double_t = long double in math.h.

I am really confused now. I don't have access to a 32-bit linux distro right now but I'd be curious to know what sizeof(double) and sizeof(long double) return. (fwiw, on 32-bit and 64-bit Windows, they both return 8, and on 64-bit linux sizeof(double) = 8 and sizeof(long double) = 16 (I got these numbers from the Compiler Explorer)).

sizeof(double) is always 8. (We don't support any exotic platform where sizeof(double) is anything other than 8.)

sizeof(long double) with be most likely:

  • Windows (any platform): 8
  • x86: 12
  • x86-64: 16
  • ARM32: 8
  • ARM64: 16

Looking at our custom patches, https://searchfox.org/mozilla-central/source/modules/fdlibm/patches/20_emulate_freebsd_internal_double_types.patch looks suspicious:

Furthermore the patch links to FreeBSD, but FreeBSD actually differentiates between different platforms, whereas our code simply takes the definitions for __LP64__:

(In reply to André Bargull [:anba] from comment #10)

sizeof(double) is always 8. (We don't support any exotic platform where sizeof(double) is anything other than 8.)

sizeof(long double) with be most likely:

  • Windows (any platform): 8
  • x86: 12
  • x86-64: 16
  • ARM32: 8
  • ARM64: 16

Cool, thank you so much for looking into this. In that case, I guess we can continue with arai's suggestion of using double everywhere and not defining the freebsd internal types, and it would save us a lot of pain in the future.


Looking at our custom patches, https://searchfox.org/mozilla-central/source/modules/fdlibm/patches/20_emulate_freebsd_internal_double_types.patch looks suspicious:

In retrospect, that was a bad idea, at the time, I was just thinking about keeping the patch size small and it didn't hit me that we shouldn't be defining identifiers that start with double-underscore.

That's fair.

Furthermore the patch links to FreeBSD, but FreeBSD actually differentiates between different platforms, whereas our code simply takes the definitions for __LP64__:

Originally, the patch was supposed to differentiate between platforms, but it started to get unwieldy once we got Windows into the mix (since we needed to differentiate between windows 32-bit and non-windows 32-bit), we would have a bunch of cases and in all cases except one (linux 32-bit) it was the same outcome (double_t = double), and the simple version (the current one) passed all the try tests so it seemed like it was working.

I suggest something along these lines: that brings the double_t-related typedefs in line with FreeBSD's definitions with/out LP_64 https://cgit.freebsd.org/src/tree/sys/x86/include/_types.h#n55 and basically reverts modules/fdlibm/patches/19_remove_unneeded_round_to_integer_helpers.patch as on 32bit these helpers are not as unused as they were assumed to be.

Duplicate of this bug: 1734977

This bug affects 32bit (x86) builds on Gentoo as well. That is documented in https://bugs.gentoo.org/816975

I tried Christoph's patch but it would not work as is. The following new issues were introduced:

8 0:48.08 In file included from /var/tmp/portage/www-client/firefox-93.0/work/firefox-93.0/modules/fdlibm/src/e_acos.cpp:44:
8 0:48.08 /var/tmp/portage/www-client/firefox-93.0/work/firefox-93.0/modules/fdlibm/src/math_private.h:638:8: error: unknown type name '__float_t'; did you mean 'float_t'?
8 0:48.08 rnintf(__float_t x)
8 0:48.08 ^~~~~~~~~
8 0:48.09 float_t
8 0:48.09 /usr/include/math.h:155:21: note: 'float_t' declared here
8 0:48.09 typedef long double float_t;
8 0:48.09 ^
8 0:48.09 In file included from /var/tmp/portage/www-client/firefox-93.0/work/firefox-93.0/modules/fdlibm/src/e_acos.cpp:44:
8 0:48.10 /var/tmp/portage/www-client/firefox-93.0/work/firefox-93.0/modules/fdlibm/src/math_private.h:662:14: error: exponent has no digits
8 0:48.10 return (x + __CONCAT(0x1.8p, LDBL_MANT_DIG) / 2 -
8 0:48.10 ^
8 0:48.10 /usr/include/sys/cdefs.h:109:23: note: expanded from macro '__CONCAT'
8 0:48.11 #define __CONCAT(x,y) x ## y
8 0:48.11 ^
8 0:48.11 <scratch space>:283:6: note: expanded from here
8 0:48.11 0x1.8pLDBL_MANT_DIG
8 0:48.12 ^
8 0:48.12 In file included from /var/tmp/portage/www-client/firefox-93.0/work/firefox-93.0/modules/fdlibm/src/e_acos.cpp:44:
8 0:48.12 /var/tmp/portage/www-client/firefox-93.0/work/firefox-93.0/modules/fdlibm/src/math_private.h:663:3: error: exponent has no digits
8 0:48.13 __CONCAT(0x1.8p, LDBL_MANT_DIG) / 2);
8 0:48.13 ^
8 0:48.13 /usr/include/sys/cdefs.h:109:23: note: expanded from macro '__CONCAT'
8 0:48.13 #define __CONCAT(x,y) x ## y
8 0:48.13 ^
8 0:48.14 <scratch space>:284:6: note: expanded from here
8 0:48.14 0x1.8pLDBL_MANT_DIG
8 0:48.14 ^
8 0:48.15 3 errors generated.

I managed to modify Christoph's patch to resolve the above issues (see attachment), although my changes may not be the best.

Working patch for 32bit (x86) builds on Gentoo. My changes very likely can be improved upon, but they do work.

This patch does not work on arm64, ppc64el and s390x and probably also not on mips64el.
I only tested a build for i386 and amd64 locally so I didn't noticed early enough the breakage on the other architectures before I did the upload.

https://buildd.debian.org/status/logs.php?pkg=thunderbird&ver=1%3A91.2.1-1

Carsten, Which patch are you referring too? Christoph's patch or my modified version of that?

I looked at all the build logs for the "Maybe-Failed" builds and they all fail the same way, everyone of them is complaining about the "#define __CONCAT(x,y) ...." like what I saw for x86 on Gentoo (see above). That suggests you didn't actually use my version of the patch, or LDBL_MANT_DIG is not 64 like it is for x86 on Gentoo. Either way, it seems the macro expansion is done differently by the compiler/pre-processor on freebsd then the systems we tried to patch.

The compiler/pre-processor on freebsd seems to replace LDBL_MANT_DIG with its actual value before processing the __CONCAT() macro, whereas the systems we tried it on do not. That is, if LDBL_MANT_DIG is 64, __CONCAT(0x1.8p, LDBL_MANT_DIG) will get properly expanded to 0x1.8p64 on freebsd (or so it seems) and on the systems we used it gets expanded to 0x1.8pLDBL_MANT_DIG, which is invalid. From some test code I coubled together the behavior was the same for both gcc/g++ and clang/clang++. I was not able to resolve this macro expansion issue, that's why my version of the patch used the following ugly hack to avoid the problem, but it only works when LDBL_MANT_DIG is 64:

#if (LDBL_MANT_DIG == 64)
return (x + __CONCAT(0x1.8p, 64) / 2 -
__CONCAT(0x1.8p, 64) / 2);
#else
return (x + __CONCAT(0x1.8p, LDBL_MANT_DIG) / 2 -
__CONCAT(0x1.8p, LDBL_MANT_DIG) / 2);
#endif

I mean and used your patch (modified) patch.

I've tried to play a bit around with the pre-processor variables, but I'm lost in the pool of possible options here. I'm to stupid to dig further into this, it's to long ago I've done some C or C++ hacking.

I'm curious... how does Mozilla themselves manage to build the x86 version available for download below?

https://ftp.mozilla.org/pub/firefox/releases/94.0.1/linux-i686/en-US/firefox-94.0.1.tar.bz2

(In reply to pp from comment #19)

I'm curious... how does Mozilla themselves manage to build the x86 version available for download below?

https://ftp.mozilla.org/pub/firefox/releases/94.0.1/linux-i686/en-US/firefox-94.0.1.tar.bz2

It's built against a relatively old glibc that either doesn't define double_t or defines it to that same type (probably the former).

This new patch will resolve the __CONCAT() issue encountered by Carsten on Debian, which is the same as I previously encountered with x86 on Gentoo (see comments #14 and #16). The original workaround I had for that issue was specific to x86 and was ugly. That was due to a lack of understanding of the preprocessor macros and specifically the __CONCAT() macro.

It turns out that __CONCAT() is defined differently on FreeBSD and non-FreeBSD targets. On FreeBSD __CONCAT() is defined as:

#define __CONCAT1(x,y) x ## y
#define __CONCAT(x,y) __CONCAT1(x,y)

On non-FreeBSD targets it is defined as:

#define __CONCAT(x,y) x ## y

That multi-level implementation on FreeBSD matters in this case due to the hexadecimal floating-point values being created and without it the macro does not get expanded properly as we saw in the build failures.

The new patch uses a new macro to wrap __CONCAT() and ensure the result is always expanded properly.

See the following for more details:

https://gcc.gnu.org/onlinedocs/gcc-3.0.1/cpp_3.html#SEC32
https://sources.debian.org/src/glibc/2.32-3/misc/sys/cdefs.h/
https://github.com/freebsd/freebsd-src/blob/main/sys/sys/cdefs.h

I can confirm that this patch fixes at least the compilation issues I had previously for 91.2.0 on Debian, I've added the above patch to the build chain for 91.3.2 in Debian unstable and all critical platform have build successfully binary packages.

https://buildd.debian.org/status/logs.php?pkg=thunderbird&ver=1%3A91.3.2-1

Thanks for working on the patch!

So here's another funny fact: __FTL_EVAL_METHOD__ is set by the compiler. GCC sets it to 2 on x86, but clang sets it to 0 (it's actually more complicated, clang gives different values on netbsd). Which means depending on the compiler, double_t can be different. So I would actually expect that last patch to still be failing in some situations.

So I would actually expect that last patch to still be failing in some situations.

Confirmed: the patch breaks on Mozilla CI.

The following makes things build on x86 for me with both GCC and clang, but the test added in bug 531915 doesn't pass when compiling with GCC (but does when compiling with clang)

diff --git a/modules/fdlibm/src/math_private.h b/modules/fdlibm/src/math_private.h
index 51d79f9c2ec59..fafd7d6fc1e0d 100644
--- a/modules/fdlibm/src/math_private.h
+++ b/modules/fdlibm/src/math_private.h
@@ -30,7 +30,11 @@
  * Adapted from https://github.com/freebsd/freebsd-src/search?q=__double_t
  */
 
+#if defined __FLT_EVAL_METHOD__ && (__FLT_EVAL_METHOD__ == 2)
+typedef long double      __double_t;
+#else
 typedef double      __double_t;
+#endif
 typedef __double_t  double_t;
 
 /*

(In reply to Mike Hommey [:glandium] from comment #25)

The following makes things build on x86 for me with both GCC and clang, but the test added in bug 531915 doesn't pass when compiling with GCC (but does when compiling with clang)

The same applies to the patch from comment 21 adjusted with the patch from comment 25.

You need to log in before you can comment on or make changes to this bug.