Open Bug 439144 Opened 16 years ago Updated 1 month ago

investigate compiler warning: dereferencing type-punned pointer will break strict-aliasing rules

Categories

(NSPR :: NSPR, defect)

4.7.1
x86
Linux
defect

Tracking

(Not tracked)

People

(Reporter: KaiE, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [build_warning])

Attachments

(5 files, 7 obsolete files)

When compiling NSPR 4.7.1 on RHEL 5.2 I get the following compiler warning, which I've been asked to check, because it seems to considered a serious warning:

dereferencing type-punned pointer will break strict-aliasing rules

We should investigate whether this is indicating a bug.

gcc is gcc-4.1.2-41.el5

I'm attaching a full build log.

Note that I'm getting the same warning when compiling NSS sources, about 300 times.
I suggest a makefile change to suppress that warning.
I simply refuse to put a whole bunch of unscheduled work into NSS just
because the gcc folks decided that gcc needs to be even more pedantic 
than before.  We've got important work to do, and silencing gcc won't
win any new customers.
You do realize that this warning isn't just about "you should write clean code", right? It means "you are doing something that possibly doesn't do what you think it does", i.e. it warns about things that can cause real broken behavior in NSS.

If you don't want to spend time fixing these warnings, the alternative fix is to disable strict aliasing optimizations. This has in the past lead to measurable performance hits, but it is the only alternative to fixing the warnings if you want to avoid bugs.

Simply suppressing the warnings will hide real bugs.
I actually inspected the warnings in NSPR and tried to fix
them, but didn't finish it.

I agree with Jonas that we should disable strict aliasing optimizatons
until we fix these warnings.  We already had to do that for desblapi.c
in NSS:
http://mxr.mozilla.org/security/search?string=-fno-strict-aliasing
That was done in bug 333917, where aliasing resulted in actual test failures. So please take this warning seriously.
nspr-4.7.3 is broken by gcc-4.4 unless -fno-strict-aliasing is specified.  i suggest you fix the code to comply to aliasing rules or force this flag (preferably the former).
We have received a bug report that looks similar to this one.
See https://bugzilla.redhat.com/show_bug.cgi?id=487844

Disabling optimization fixes that bug.
The bug causes fonts and spacing to be completely wrong.
yes, that is the same issue i'm seeing.
We treat prdtoa.c as third-party code (it's based on David
Gay's dtoa.c with only trivial changes) and try not to
modify our copy.  So I'm just going to compile it with
-fno-strict-aliasing.  I also added -ffloat-store, which
is necessary for our dtoa test program to pass in optimized
builds.
Attachment #365335 - Flags: review?(kaie)
Thanks Wan-Teh, but given that this code still requires --float-store for the test program, the code still seems to be ambiguous.
Attached patch Code patch (obsolete) — Splinter Review
In https://bugzilla.redhat.com/show_bug.cgi?id=487844#c20
Jakub Jelinek proposed how to fix the code.

I wrote a patch based on his proposal, which I attach here.

With this patch:
- the dtoa test program still works
- it's not necessary to compile the dtoa test program with -ffloat-store
Attached patch Code patch v2 (obsolete) — Splinter Review
Attachment #365352 - Attachment is obsolete: true
Attachment #365335 - Flags: review?(kaie) → review+
Comment on attachment 365335 [details] [diff] [review]
Compile prdtoa.c with -ffloat-store -fno-strict-aliasing (backed out)

r=kaie

I have actually used this patch as a workaround for the fedora bug. Thanks Wan-Teh!
Attachment #365367 - Flags: review?(wtc)
Comment on attachment 365367 [details] [diff] [review]
Code patch v2

With this patch we require neither of the two compiler options (added with the workaround patch).
The whitespace doesn't match the existing code. The existing code unfortunately uses tabs.
Attached patch wtc's code patch (obsolete) — Splinter Review
Here is my attempt at fixing the code.  I tried to minimize the diffs
because I'd like to ask the author (David Gay) to fix this upstream.
I use the following strategy.

1. Use the dval, word0, and word1 macros only on variables of type U
(the union).  Do not use these macros on doubles.

2. Functions that take a double parameter or return a double value
continue to do so.  I don't change the parameter or return value to
type U.  Instead, I rename a double parameter by prepending 'd'
and the original double parameter becomes a local variable of type
U.  For example, a double parameter 'x' would be renamed 'dx', and
I declare 'x' as a local variable of type U and initialize 'x' with 'dx':
  void foo(double dx)  /* dx used to be named x */
  {
      U x;

      dval(x) = dx;

3. In strtod, the double variable aadj1 is used in a lot of places "naked",
and is only passed to those three macros once (specifically, word0(aadj1)).
If I redeclare aadj1 as type U, I'll need to use macros on a lot of places.
So instead, I just use a local variable 'tmp' to deal with the single instance
of word0(aadj1).

Minimizing diffs may not result in the best code.  I'll leave the fine tuning
to the author of the code.
Jonas: could you tell the JavaScript team to inspect their copy
of dtoa.c (mozilla/js/src/dtoa.c) for these type-punned pointer
warnings?  Thanks.

Kai, could you test my code patch (attachment 365438 [details] [diff] [review])?  Are there
any remaining type-punned pointer warnings after that patch is
applied?  I don't have gcc 4.4, so I only tried to make the code
compile when I wrote that patch, without considering the type-punned
pointer warnings.
As usual it took quite some work to get the development version of fedora rawhide installed.

Wan-Teh, now I have an environment with gcc 4.4 etc.

Both your patch and my patch, they prevent all the aliasing warnings, and both make the dtoa test program succeeded.

(I did not use the build system patch for the tests)
(In reply to comment #16)
> Jonas: could you tell the JavaScript team to inspect their copy
> of dtoa.c (mozilla/js/src/dtoa.c) for these type-punned pointer
> warnings?  Thanks.
> 
> Kai, could you test my code patch (attachment 365438 [details] [diff] [review])?  Are there
> any remaining type-punned pointer warnings after that patch is
> applied?  I don't have gcc 4.4, so I only tried to make the code
> compile when I wrote that patch, without considering the type-punned
> pointer warnings.

testing on gentoo w/ gcc trunk was successful.  it seems to work perfectly, and fixes a segfault i previously saw when building w/ -O3.

i've added this patch to our gcc-porting overlay for wider testing.
Adding crowder to cc so that he's aware of the changes being done here. I think he's been mucking with the JS copy of dtoa.c
Comment on attachment 365335 [details] [diff] [review]
Compile prdtoa.c with -ffloat-store -fno-strict-aliasing (backed out)

I checked in this makefile patch on the NSPR trunk (NSPR 4.8)
as a workaround.

Checking in Makefile.in;
/cvsroot/mozilla/nsprpub/pr/src/misc/Makefile.in,v  <--  Makefile.in
new revision: 1.21; previous revision: 1.20
done
Attachment #365335 - Attachment description: Compile prdtoa.c with -ffloat-store -fno-strict-aliasing → Compile prdtoa.c with -ffloat-store -fno-strict-aliasing (checked in)
I just found that both mozilla/configure.in and mozilla/js/src/configure.in
use the -fno-strict-aliasing flag (bug 414641).  As a result, the JS copy
of dtoa.c is safe.

I have prepared a patch for the latest dtoa.c from upstream and will
send it to the author soon.
Attachment #365367 - Attachment is obsolete: true
Attachment #365367 - Flags: review?(wtc)
Attached patch wtc's code patch v2 (obsolete) — Splinter Review
I made some minor edits (added parentheses around the macro argument 'x' and
changed 'tmp' to 'temp').

I submitted a version of this patch for the latest upstream dtoa.c to the author.

Kai, could you please review this patch?  Thanks.
Attachment #365438 - Attachment is obsolete: true
Attachment #366067 - Flags: review?(kaie)
This bug is already fixed for the JS copy of dtoa.c in bug 450392.
Comment on attachment 366067 [details] [diff] [review]
wtc's code patch v2

> #ifdef YES_ALIAS
>-#define dval(x) x
>+/* NOTE(wtc): I define dval(x) using a typecast, but
>+ *   #define dval(x) (x).d
>+ * would also work.

Good. I tried. it compiles.
I would prefer the "(x).d" approach, because...


>+ */
>+#define dval(x) ((double *)&x)[0]

... I don't like the pointer cast, because we lose type safety.

If my "type safety argument" is convincing, please use my preference.
If you want to keep your preference, 
please explain in the source comment "why" you prefer the typecast.


> #else
> #ifdef IEEE_8087
>-#define word0(x) ((U*)&x)->L[1]
>-#define word1(x) ((U*)&x)->L[0]
>+#define word0(x) (x).L[1]
>+#define word1(x) (x).L[0]
> #else
>-#define word0(x) ((U*)&x)->L[0]
>-#define word1(x) ((U*)&x)->L[1]
>+#define word0(x) (x).L[0]
>+#define word1(x) (x).L[1]
> #endif
>-#define dval(x) ((U*)&x)->d
>+#define dval(x) (x).d
> #endif


This looks very clean and reasonable to me. It keeps type safety. I like it.


r=kaie with my proposed code change or comment clarification
Attachment #366067 - Flags: review?(kaie) → review+
Attached patch wtc's code patch v3 (obsolete) — Splinter Review
Thanks for the review.  I used the typecast in the dval macro for YES_ALIAS because
the word0 and word1 macros for YES_ALIAS also use typecase.  But your question
made me realize that we don't need the YES_ALIAS code now that we only pass
variables of the union U type to the dval, word0, and word1 macros.  So I simply
removed the YES_ALIAS code in this patch.
Attachment #366067 - Attachment is obsolete: true
Attachment #367352 - Flags: review?(kaie)
Comment on attachment 366067 [details] [diff] [review]
wtc's code patch v2

I submitted this patch to the author of dtoa.c on March 5.  He happened
to be dealing with a couple of other issues related to dtoa.c, and switched
to a use of union U similar to this patch.  He released a new version of
dtoa.c on Monday (2009-03-16) that fixes not only the strict aliasing issues
but also the parentheses and some (but not all) "may be used uninitialized"
gcc warnings.  You can get the new dtoa.c from http://www.netlib.org/fp/dtoa.c
and read the change log at http://www.netlib.org/fp/changes.

So I'm going to update and apply this patch to NSPR's prdtoa.c and back out
the -fno-strict-aliasing workaround.  This patch is not exactly a backport
of the author's fix, but this patch is smaller so I think it's more suitable.
I will upgrade to the new dtoa.c later (in bug 482002) because that requires
more testing.

I'm going to submit NSPR's other dtoa.c patch (see
https://bugzilla.mozilla.org/show_bug.cgi?id=362134) to the
author of dtoa.c.  If you have patched your copy of dtoa.c, I
encourage you to submit your patches to the author.
I made minor changes to patch v3 and checked it in on
the NSPR trunk (NSPR 4.8) and NSPR_4_7_BRANCH (NSPR 4.7.4).

Checking in prdtoa.c;
/cvsroot/mozilla/nsprpub/pr/src/misc/prdtoa.c,v  <--  prdtoa.c
new revision: 4.6; previous revision: 4.5
done

Checking in prdtoa.c;
/cvsroot/mozilla/nsprpub/pr/src/misc/prdtoa.c,v  <--  prdtoa.c
new revision: 4.5.36.1; previous revision: 4.5
done
Attachment #367352 - Attachment is obsolete: true
Attachment #367352 - Flags: review?(kaie)
Comment on attachment 365335 [details] [diff] [review]
Compile prdtoa.c with -ffloat-store -fno-strict-aliasing (backed out)

I backed out this workaround from the NSPR trunk (NSPR 4.8).

Checking in Makefile.in;
/cvsroot/mozilla/nsprpub/pr/src/misc/Makefile.in,v  <--  Makefile.in
new revision: 1.22; previous revision: 1.21
done
Attachment #365335 - Attachment description: Compile prdtoa.c with -ffloat-store -fno-strict-aliasing (checked in) → Compile prdtoa.c with -ffloat-store -fno-strict-aliasing (backed out)
Attachment #365335 - Attachment is obsolete: true
FWIW, I'm building NSPR 4.7.4 for RHEL 5.x, and I still get this warning, see attachment.
Kai, could you ask a GCC developer to look at this test file?

This file contains typical Berkeley socket code that casts the
address of a sockaddr_in structure to a struct sockaddr pointer.
But GCC warns about those typecasts.

Save this file as foo.c and compile it with this command line:
$ gcc -c -O2 -Wall -Wstrict-aliasing=2 foo.c
foo.c: In function ‘foo’:
foo.c:19: warning: dereferencing type-punned pointer might break strict-aliasing rules
foo.c:21: warning: dereferencing type-punned pointer might break strict-aliasing rules

What's the right way to fix these warnings?
Attached patch Fix remaining warnings (obsolete) — Splinter Review
This patch fix the warnings shown in attachment 376436 [details],
and a warning in the test program ranfile.c.

I also fixed a bug (caused by a typo) in PR_SendFile for
HP-UX (in ptio.c) while investigating the warnings.

The test program prpoll.c still has two warnings, which
I reduced to the test file foo.c in attachment 376527 [details].
Attachment #376528 - Flags: review?(kaie)
Comment on attachment 376528 [details] [diff] [review]
Fix remaining warnings


>@@ -1012,17 +1012,17 @@ static PRBool pt_hpux_sendfile_cont(pt_C

>-            hdtrl[0].iov_base = ((char *) hdtrl[0].iov_len) + count;
>+            hdtrl[0].iov_base = ((char *) hdtrl[0].iov_base) + count;
>             hdtrl[0].iov_len -= count;

This is a HUGE find!  It really deserves its own separate bug report!
I removed the fix for the unrelated HP-UX PR_SendFile bug, which
is now covered by bug 492170.
Attachment #376528 - Attachment is obsolete: true
Attachment #376583 - Flags: review?(kaie)
Attachment #376528 - Flags: review?(kaie)
What's the status on this bug?
I don't remember what the status is.  There is a patch
(Fix remaining warnings v2, attachment 376583 [details] [diff] [review]) that needs
review.  If that's reviewed and checked in, I'll mark this
bug fixed.
Whiteboard: [build_warning]
Blocks: buildwarning
Could we get a review on the last patch and finally close this bug?
Kaie, see comment 37.
(In reply to Elio Maldonado from comment #37)
> Could we get a review on the last patch and finally close this bug?

Although I'm only the duplicate reporter - 4.8.8 under Gentoo looks fine wrt this issue.
Building optimized on RHEL 5 (gcc 4.1.x) with the latest patch applied:
  we still get lots of "type-punned"

Building optimized on Fedora 17 (gcc 4.7.x) WIHTOUT the latest patch applied:
  no such warnings


Can we conclude that gcc has been improved to be smarter, and that newer gcc understands that our code is safe?

If the answer is yes, then I propose to close this bug as worksforme.

It seems unnecessary to apply a patch that only fixes warnings with an older compiler version, if the suppression of such warnings is still incomplete.
Comment on attachment 376583 [details] [diff] [review]
Fix remaining warnings v2

partial r=kaie on the portions I've reviewed
  mozilla/nsprpub/pr/tests/ranfile.c
  mozilla/nsprpub/pr/src/pthreads/ptio.c

In mozilla/nsprpub/pr/src/misc/prthinfo.c
if we really consider to check this patch in,
then please be more complete, there are two more places where stack_end is being referenced, please remove the typecast everywhere.

Looking at the change to prfdcache, I'm surprised how dirty that construct is. Typecasting between completely unrelated structs whose only shared semtantic is that the first struct member is a pointer. Wow. But it would be good to keep the existing typecast, so future readers will have an easy chance to discover what type is expected, I'm leaning towards r-
Attachment #376583 - Flags: review?(kaie) → review-

The bug assignee is inactive on Bugzilla, so the assignee is being reset.

Assignee: wtc → nobody
Severity: normal → S3

nspr 4.35 seems to compile cleanly today. However, the tests are a different story:

x86_64-pc-linux-gnu-gcc -o ranfile.o -c -fvisibility=hidden   -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -march=native -fstack-protector-all -O2 -pipe -fdiagnostics-color=always -frecord-gcc-switches -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3 -fstack-clash-protection -flto=4 -Werror=odr -Werror=lto-type-mismatch -Werror=strict-aliasing  -Wformat -Werror=format-security -Werror=implicit-function-declaration -Werror=implicit-int -Werror=int-conversion -Werror=incompatible-pointer-types -Wall -pthread -O2 -fPIC  -UDEBUG -DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\" -DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DNDEBUG=1 -DHAVE_VISIBILITY_HIDDEN_ATTRIBUTE=1 -DHAVE_VISIBILITY_PRAGMA=1 -DXP_UNIX=1 -D_GNU_SOURCE=1 -DHAVE_FCNTL_FILE_LOCKING=1 -DHAVE_POINTER_LOCALTIME_R=1 -DLINUX=1 -DHAVE_DLADDR=1 -DHAVE_GETTID=1 -DHAVE_LCHOWN=1 -DHAVE_SETPRIORITY=1 -DHAVE_STRERROR=1 -DHAVE_SYSCALL=1 -DHAVE_SECURE_GETENV=1 -D_REENTRANT=1 -DFORCE_PR_LOG -D_PR_PTHREADS -UHAVE_CVAR_BUILT_ON_SEM -I../../dist/include/nspr -I/var/tmp/portage/dev-libs/nspr-4.35-r2/work/nspr-4.35/nspr/pr/include -I/var/tmp/portage/dev-libs/nspr-4.35-r2/work/nspr-4.35/nspr/pr/include/private  /var/tmp/portage/dev-libs/nspr-4.35-r2/work/nspr-4.35/nspr/pr/tests/ranfile.c
In file included from ../../dist/include/nspr/prio.h:16,
                 from /var/tmp/portage/dev-libs/nspr-4.35-r2/work/nspr-4.35/nspr/pr/tests/ranfile.c:36:
/var/tmp/portage/dev-libs/nspr-4.35-r2/work/nspr-4.35/nspr/pr/tests/ranfile.c: In function ‘RandomNum’:
/var/tmp/portage/dev-libs/nspr-4.35-r2/work/nspr-4.35/nspr/pr/tests/ranfile.c:93:22: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]
   93 |     LL_USHR(shift, *((PRUint64*)&random), 16);
      |                     ~^~~~~~~~~~~~~~~~~~~

See also: https://bugs.gentoo.org/867634

This makes it difficult to run the testsuite on an LTO-enabled build :P so close...

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: