Last Comment Bug 439144 - investigate compiler warning: dereferencing type-punned pointer will break strict-aliasing rules
: investigate compiler warning: dereferencing type-punned pointer will break st...
Status: NEW
[build_warning]
:
Product: NSPR
Classification: Components
Component: NSPR (show other bugs)
: 4.7.1
: x86 Linux
: -- normal with 1 vote (vote)
: ---
Assigned To: Wan-Teh Chang
:
:
Mentors:
: 471187 (view as bug list)
Depends on:
Blocks: buildwarning
  Show dependency treegraph
 
Reported: 2008-06-13 15:40 PDT by Kai Engert (:kaie) (on vacation)
Modified: 2013-03-07 01:24 PST (History)
10 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
gzip compressed build log (6.74 KB, application/octet-stream)
2008-06-13 15:40 PDT, Kai Engert (:kaie) (on vacation)
no flags Details
Compile prdtoa.c with -ffloat-store -fno-strict-aliasing (backed out) (1.25 KB, patch)
2009-03-03 16:55 PST, Wan-Teh Chang
kaie: review+
Details | Diff | Splinter Review
Code patch (16.41 KB, patch)
2009-03-03 17:53 PST, Kai Engert (:kaie) (on vacation)
no flags Details | Diff | Splinter Review
Code patch v2 (23.10 KB, patch)
2009-03-03 19:00 PST, Kai Engert (:kaie) (on vacation)
no flags Details | Diff | Splinter Review
wtc's code patch (6.58 KB, patch)
2009-03-04 08:48 PST, Wan-Teh Chang
no flags Details | Diff | Splinter Review
wtc's code patch v2 (6.59 KB, patch)
2009-03-06 20:46 PST, Wan-Teh Chang
kaie: review+
Details | Diff | Splinter Review
wtc's code patch v3 (7.76 KB, patch)
2009-03-13 22:09 PDT, Wan-Teh Chang
no flags Details | Diff | Splinter Review
wtc's code patch v4 (checked in) (7.82 KB, patch)
2009-03-19 20:34 PDT, Wan-Teh Chang
no flags Details | Diff | Splinter Review
Warnings on i386 with NSPR 4.7.4 (1.12 KB, text/plain)
2009-05-08 11:16 PDT, Kai Engert (:kaie) (on vacation)
no flags Details
Test file foo.c containing typical BSD socket code (522 bytes, text/plain)
2009-05-08 22:22 PDT, Wan-Teh Chang
no flags Details
Fix remaining warnings (8.43 KB, patch)
2009-05-08 22:30 PDT, Wan-Teh Chang
no flags Details | Diff | Splinter Review
Fix remaining warnings v2 (7.74 KB, patch)
2009-05-09 15:02 PDT, Wan-Teh Chang
kaie: review-
Details | Diff | Splinter Review

Description Kai Engert (:kaie) (on vacation) 2008-06-13 15:40:16 PDT
Created attachment 325040 [details]
gzip compressed build log

When compiling NSPR 4.7.1 on RHEL 5.2 I get the following compiler warning, which I've been asked to check, because it seems to considered a serious warning:

dereferencing type-punned pointer will break strict-aliasing rules

We should investigate whether this is indicating a bug.

gcc is gcc-4.1.2-41.el5

I'm attaching a full build log.

Note that I'm getting the same warning when compiling NSS sources, about 300 times.
Comment 1 Nelson Bolyard (seldom reads bugmail) 2008-06-13 17:46:51 PDT
I suggest a makefile change to suppress that warning.
I simply refuse to put a whole bunch of unscheduled work into NSS just
because the gcc folks decided that gcc needs to be even more pedantic 
than before.  We've got important work to do, and silencing gcc won't
win any new customers.
Comment 2 Jonas Sicking (:sicking) No longer reading bugmail consistently 2008-09-04 14:20:31 PDT
You do realize that this warning isn't just about "you should write clean code", right? It means "you are doing something that possibly doesn't do what you think it does", i.e. it warns about things that can cause real broken behavior in NSS.

If you don't want to spend time fixing these warnings, the alternative fix is to disable strict aliasing optimizations. This has in the past lead to measurable performance hits, but it is the only alternative to fixing the warnings if you want to avoid bugs.

Simply suppressing the warnings will hide real bugs.
Comment 3 Wan-Teh Chang 2008-09-05 00:00:37 PDT
I actually inspected the warnings in NSPR and tried to fix
them, but didn't finish it.

I agree with Jonas that we should disable strict aliasing optimizatons
until we fix these warnings.  We already had to do that for desblapi.c
in NSS:
http://mxr.mozilla.org/security/search?string=-fno-strict-aliasing
Comment 4 Jonas Sicking (:sicking) No longer reading bugmail consistently 2008-09-05 00:32:02 PDT
That was done in bug 333917, where aliasing resulted in actual test failures. So please take this warning seriously.
Comment 5 Ryan Hill 2009-03-01 19:29:55 PST
nspr-4.7.3 is broken by gcc-4.4 unless -fno-strict-aliasing is specified.  i suggest you fix the code to comply to aliasing rules or force this flag (preferably the former).
Comment 6 Kai Engert (:kaie) (on vacation) 2009-03-03 09:40:33 PST
We have received a bug report that looks similar to this one.
See https://bugzilla.redhat.com/show_bug.cgi?id=487844

Disabling optimization fixes that bug.
The bug causes fonts and spacing to be completely wrong.
Comment 7 Ryan Hill 2009-03-03 12:42:43 PST
yes, that is the same issue i'm seeing.
Comment 8 Wan-Teh Chang 2009-03-03 16:55:51 PST
Created attachment 365335 [details] [diff] [review]
Compile prdtoa.c with -ffloat-store -fno-strict-aliasing (backed out)

We treat prdtoa.c as third-party code (it's based on David
Gay's dtoa.c with only trivial changes) and try not to
modify our copy.  So I'm just going to compile it with
-fno-strict-aliasing.  I also added -ffloat-store, which
is necessary for our dtoa test program to pass in optimized
builds.
Comment 9 Kai Engert (:kaie) (on vacation) 2009-03-03 17:50:03 PST
Thanks Wan-Teh, but given that this code still requires --float-store for the test program, the code still seems to be ambiguous.
Comment 10 Kai Engert (:kaie) (on vacation) 2009-03-03 17:53:15 PST
Created attachment 365352 [details] [diff] [review]
Code patch

In https://bugzilla.redhat.com/show_bug.cgi?id=487844#c20
Jakub Jelinek proposed how to fix the code.

I wrote a patch based on his proposal, which I attach here.

With this patch:
- the dtoa test program still works
- it's not necessary to compile the dtoa test program with -ffloat-store
Comment 11 Kai Engert (:kaie) (on vacation) 2009-03-03 19:00:55 PST
Created attachment 365367 [details] [diff] [review]
Code patch v2
Comment 12 Kai Engert (:kaie) (on vacation) 2009-03-03 20:08:16 PST
Comment on attachment 365335 [details] [diff] [review]
Compile prdtoa.c with -ffloat-store -fno-strict-aliasing (backed out)

r=kaie

I have actually used this patch as a workaround for the fedora bug. Thanks Wan-Teh!
Comment 13 Kai Engert (:kaie) (on vacation) 2009-03-03 20:10:52 PST
Comment on attachment 365367 [details] [diff] [review]
Code patch v2

With this patch we require neither of the two compiler options (added with the workaround patch).
Comment 14 Jonas Sicking (:sicking) No longer reading bugmail consistently 2009-03-03 21:34:58 PST
The whitespace doesn't match the existing code. The existing code unfortunately uses tabs.
Comment 15 Wan-Teh Chang 2009-03-04 08:48:09 PST
Created attachment 365438 [details] [diff] [review]
wtc's code patch

Here is my attempt at fixing the code.  I tried to minimize the diffs
because I'd like to ask the author (David Gay) to fix this upstream.
I use the following strategy.

1. Use the dval, word0, and word1 macros only on variables of type U
(the union).  Do not use these macros on doubles.

2. Functions that take a double parameter or return a double value
continue to do so.  I don't change the parameter or return value to
type U.  Instead, I rename a double parameter by prepending 'd'
and the original double parameter becomes a local variable of type
U.  For example, a double parameter 'x' would be renamed 'dx', and
I declare 'x' as a local variable of type U and initialize 'x' with 'dx':
  void foo(double dx)  /* dx used to be named x */
  {
      U x;

      dval(x) = dx;

3. In strtod, the double variable aadj1 is used in a lot of places "naked",
and is only passed to those three macros once (specifically, word0(aadj1)).
If I redeclare aadj1 as type U, I'll need to use macros on a lot of places.
So instead, I just use a local variable 'tmp' to deal with the single instance
of word0(aadj1).

Minimizing diffs may not result in the best code.  I'll leave the fine tuning
to the author of the code.
Comment 16 Wan-Teh Chang 2009-03-04 10:06:55 PST
Jonas: could you tell the JavaScript team to inspect their copy
of dtoa.c (mozilla/js/src/dtoa.c) for these type-punned pointer
warnings?  Thanks.

Kai, could you test my code patch (attachment 365438 [details] [diff] [review])?  Are there
any remaining type-punned pointer warnings after that patch is
applied?  I don't have gcc 4.4, so I only tried to make the code
compile when I wrote that patch, without considering the type-punned
pointer warnings.
Comment 17 Kai Engert (:kaie) (on vacation) 2009-03-04 14:12:37 PST
As usual it took quite some work to get the development version of fedora rawhide installed.

Wan-Teh, now I have an environment with gcc 4.4 etc.

Both your patch and my patch, they prevent all the aliasing warnings, and both make the dtoa test program succeeded.

(I did not use the build system patch for the tests)
Comment 18 Ryan Hill 2009-03-04 15:09:15 PST
(In reply to comment #16)
> Jonas: could you tell the JavaScript team to inspect their copy
> of dtoa.c (mozilla/js/src/dtoa.c) for these type-punned pointer
> warnings?  Thanks.
> 
> Kai, could you test my code patch (attachment 365438 [details] [diff] [review])?  Are there
> any remaining type-punned pointer warnings after that patch is
> applied?  I don't have gcc 4.4, so I only tried to make the code
> compile when I wrote that patch, without considering the type-punned
> pointer warnings.

testing on gentoo w/ gcc trunk was successful.  it seems to work perfectly, and fixes a segfault i previously saw when building w/ -O3.

i've added this patch to our gcc-porting overlay for wider testing.
Comment 19 Jonas Sicking (:sicking) No longer reading bugmail consistently 2009-03-04 17:03:49 PST
Adding crowder to cc so that he's aware of the changes being done here. I think he's been mucking with the JS copy of dtoa.c
Comment 20 Wan-Teh Chang 2009-03-04 18:10:02 PST
Comment on attachment 365335 [details] [diff] [review]
Compile prdtoa.c with -ffloat-store -fno-strict-aliasing (backed out)

I checked in this makefile patch on the NSPR trunk (NSPR 4.8)
as a workaround.

Checking in Makefile.in;
/cvsroot/mozilla/nsprpub/pr/src/misc/Makefile.in,v  <--  Makefile.in
new revision: 1.21; previous revision: 1.20
done
Comment 21 Wan-Teh Chang 2009-03-04 22:00:40 PST
I just found that both mozilla/configure.in and mozilla/js/src/configure.in
use the -fno-strict-aliasing flag (bug 414641).  As a result, the JS copy
of dtoa.c is safe.

I have prepared a patch for the latest dtoa.c from upstream and will
send it to the author soon.
Comment 22 Wan-Teh Chang 2009-03-06 20:46:21 PST
Created attachment 366067 [details] [diff] [review]
wtc's code patch v2

I made some minor edits (added parentheses around the macro argument 'x' and
changed 'tmp' to 'temp').

I submitted a version of this patch for the latest upstream dtoa.c to the author.

Kai, could you please review this patch?  Thanks.
Comment 23 Wan-Teh Chang 2009-03-06 22:23:42 PST
This bug is already fixed for the JS copy of dtoa.c in bug 450392.
Comment 24 Kai Engert (:kaie) (on vacation) 2009-03-10 20:43:40 PDT
Comment on attachment 366067 [details] [diff] [review]
wtc's code patch v2

> #ifdef YES_ALIAS
>-#define dval(x) x
>+/* NOTE(wtc): I define dval(x) using a typecast, but
>+ *   #define dval(x) (x).d
>+ * would also work.

Good. I tried. it compiles.
I would prefer the "(x).d" approach, because...


>+ */
>+#define dval(x) ((double *)&x)[0]

... I don't like the pointer cast, because we lose type safety.

If my "type safety argument" is convincing, please use my preference.
If you want to keep your preference, 
please explain in the source comment "why" you prefer the typecast.


> #else
> #ifdef IEEE_8087
>-#define word0(x) ((U*)&x)->L[1]
>-#define word1(x) ((U*)&x)->L[0]
>+#define word0(x) (x).L[1]
>+#define word1(x) (x).L[0]
> #else
>-#define word0(x) ((U*)&x)->L[0]
>-#define word1(x) ((U*)&x)->L[1]
>+#define word0(x) (x).L[0]
>+#define word1(x) (x).L[1]
> #endif
>-#define dval(x) ((U*)&x)->d
>+#define dval(x) (x).d
> #endif


This looks very clean and reasonable to me. It keeps type safety. I like it.


r=kaie with my proposed code change or comment clarification
Comment 25 Wan-Teh Chang 2009-03-13 22:09:26 PDT
Created attachment 367352 [details] [diff] [review]
wtc's code patch v3

Thanks for the review.  I used the typecast in the dval macro for YES_ALIAS because
the word0 and word1 macros for YES_ALIAS also use typecase.  But your question
made me realize that we don't need the YES_ALIAS code now that we only pass
variables of the union U type to the dval, word0, and word1 macros.  So I simply
removed the YES_ALIAS code in this patch.
Comment 26 Wan-Teh Chang 2009-03-19 07:16:23 PDT
Comment on attachment 366067 [details] [diff] [review]
wtc's code patch v2

I submitted this patch to the author of dtoa.c on March 5.  He happened
to be dealing with a couple of other issues related to dtoa.c, and switched
to a use of union U similar to this patch.  He released a new version of
dtoa.c on Monday (2009-03-16) that fixes not only the strict aliasing issues
but also the parentheses and some (but not all) "may be used uninitialized"
gcc warnings.  You can get the new dtoa.c from http://www.netlib.org/fp/dtoa.c
and read the change log at http://www.netlib.org/fp/changes.

So I'm going to update and apply this patch to NSPR's prdtoa.c and back out
the -fno-strict-aliasing workaround.  This patch is not exactly a backport
of the author's fix, but this patch is smaller so I think it's more suitable.
I will upgrade to the new dtoa.c later (in bug 482002) because that requires
more testing.

I'm going to submit NSPR's other dtoa.c patch (see
https://bugzilla.mozilla.org/show_bug.cgi?id=362134) to the
author of dtoa.c.  If you have patched your copy of dtoa.c, I
encourage you to submit your patches to the author.
Comment 27 Wan-Teh Chang 2009-03-19 20:34:25 PDT
Created attachment 368447 [details] [diff] [review]
wtc's code patch v4 (checked in)

I made minor changes to patch v3 and checked it in on
the NSPR trunk (NSPR 4.8) and NSPR_4_7_BRANCH (NSPR 4.7.4).

Checking in prdtoa.c;
/cvsroot/mozilla/nsprpub/pr/src/misc/prdtoa.c,v  <--  prdtoa.c
new revision: 4.6; previous revision: 4.5
done

Checking in prdtoa.c;
/cvsroot/mozilla/nsprpub/pr/src/misc/prdtoa.c,v  <--  prdtoa.c
new revision: 4.5.36.1; previous revision: 4.5
done
Comment 28 Wan-Teh Chang 2009-03-19 20:37:25 PDT
Comment on attachment 365335 [details] [diff] [review]
Compile prdtoa.c with -ffloat-store -fno-strict-aliasing (backed out)

I backed out this workaround from the NSPR trunk (NSPR 4.8).

Checking in Makefile.in;
/cvsroot/mozilla/nsprpub/pr/src/misc/Makefile.in,v  <--  Makefile.in
new revision: 1.22; previous revision: 1.21
done
Comment 29 Kai Engert (:kaie) (on vacation) 2009-05-08 11:16:00 PDT
Created attachment 376436 [details]
Warnings on i386 with NSPR 4.7.4

FWIW, I'm building NSPR 4.7.4 for RHEL 5.x, and I still get this warning, see attachment.
Comment 30 Wan-Teh Chang 2009-05-08 22:22:25 PDT
Created attachment 376527 [details]
Test file foo.c containing typical BSD socket code

Kai, could you ask a GCC developer to look at this test file?

This file contains typical Berkeley socket code that casts the
address of a sockaddr_in structure to a struct sockaddr pointer.
But GCC warns about those typecasts.

Save this file as foo.c and compile it with this command line:
$ gcc -c -O2 -Wall -Wstrict-aliasing=2 foo.c
foo.c: In function ‘foo’:
foo.c:19: warning: dereferencing type-punned pointer might break strict-aliasing rules
foo.c:21: warning: dereferencing type-punned pointer might break strict-aliasing rules

What's the right way to fix these warnings?
Comment 31 Wan-Teh Chang 2009-05-08 22:30:23 PDT
Created attachment 376528 [details] [diff] [review]
Fix remaining warnings

This patch fix the warnings shown in attachment 376436 [details],
and a warning in the test program ranfile.c.

I also fixed a bug (caused by a typo) in PR_SendFile for
HP-UX (in ptio.c) while investigating the warnings.

The test program prpoll.c still has two warnings, which
I reduced to the test file foo.c in attachment 376527 [details].
Comment 32 Wan-Teh Chang 2009-05-08 22:37:06 PDT
*** Bug 471187 has been marked as a duplicate of this bug. ***
Comment 33 Nelson Bolyard (seldom reads bugmail) 2009-05-08 22:46:59 PDT
Comment on attachment 376528 [details] [diff] [review]
Fix remaining warnings


>@@ -1012,17 +1012,17 @@ static PRBool pt_hpux_sendfile_cont(pt_C

>-            hdtrl[0].iov_base = ((char *) hdtrl[0].iov_len) + count;
>+            hdtrl[0].iov_base = ((char *) hdtrl[0].iov_base) + count;
>             hdtrl[0].iov_len -= count;

This is a HUGE find!  It really deserves its own separate bug report!
Comment 34 Wan-Teh Chang 2009-05-09 15:02:29 PDT
Created attachment 376583 [details] [diff] [review]
Fix remaining warnings v2

I removed the fix for the unrelated HP-UX PR_SendFile bug, which
is now covered by bug 492170.
Comment 35 Elio Maldonado 2010-08-31 11:17:33 PDT
What's the status on this bug?
Comment 36 Wan-Teh Chang 2010-08-31 13:21:59 PDT
I don't remember what the status is.  There is a patch
(Fix remaining warnings v2, attachment 376583 [details] [diff] [review]) that needs
review.  If that's reviewed and checked in, I'll mark this
bug fixed.
Comment 37 Elio Maldonado 2011-09-05 10:22:38 PDT
Could we get a review on the last patch and finally close this bug?
Comment 38 Ed Morley [:emorley] 2011-09-05 10:38:03 PDT
Kaie, see comment 37.
Comment 39 Toralf Förster 2011-09-11 06:08:09 PDT
(In reply to Elio Maldonado from comment #37)
> Could we get a review on the last patch and finally close this bug?

Although I'm only the duplicate reporter - 4.8.8 under Gentoo looks fine wrt this issue.
Comment 40 Kai Engert (:kaie) (on vacation) 2012-08-15 00:03:04 PDT
Building optimized on RHEL 5 (gcc 4.1.x) with the latest patch applied:
  we still get lots of "type-punned"

Building optimized on Fedora 17 (gcc 4.7.x) WIHTOUT the latest patch applied:
  no such warnings


Can we conclude that gcc has been improved to be smarter, and that newer gcc understands that our code is safe?

If the answer is yes, then I propose to close this bug as worksforme.

It seems unnecessary to apply a patch that only fixes warnings with an older compiler version, if the suppression of such warnings is still incomplete.
Comment 41 Kai Engert (:kaie) (on vacation) 2012-08-15 00:21:26 PDT
Comment on attachment 376583 [details] [diff] [review]
Fix remaining warnings v2

partial r=kaie on the portions I've reviewed
  mozilla/nsprpub/pr/tests/ranfile.c
  mozilla/nsprpub/pr/src/pthreads/ptio.c

In mozilla/nsprpub/pr/src/misc/prthinfo.c
if we really consider to check this patch in,
then please be more complete, there are two more places where stack_end is being referenced, please remove the typecast everywhere.

Looking at the change to prfdcache, I'm surprised how dirty that construct is. Typecasting between completely unrelated structs whose only shared semtantic is that the first struct member is a pointer. Wow. But it would be good to keep the existing typecast, so future readers will have an easy chance to discover what type is expected, I'm leaning towards r-

Note You need to log in before you can comment on or make changes to this bug.