Closed Bug 572412 Opened 15 years ago Closed 10 years ago

Pointer to 32bit cast is lossy, gives compiler warnings, and apparent potential bugs

Categories

(NSS :: Libraries, defect)

x86_64
All
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bjacob, Assigned: n.nethercote)

References

(Blocks 1 open bug)

Details

(Whiteboard: [build_warning])

Attachments

(1 file, 1 obsolete file)

Here are some warnings I get when building on linux x86-64 with gcc 4.4: secoid.c:1843: warning: cast from pointer to integer of different size pkcs11.c:1947: warning: cast from pointer to integer of different size lginit.c:498: warning: cast from pointer to integer of different size lginit.c:504: warning: cast from pointer to integer of different size lginit.c:505: warning: cast from pointer to integer of different size ... (plus many more such warnings) Here's what the code looks like: for example, in lginit.c: static PLHashNumber lg_HashNumber(const void *key) { return (PLHashNumber) key; } PRIntn lg_CompareValues(const void *v1, const void *v2) { PLHashNumber value1 = (PLHashNumber) v1; PLHashNumber value2 = (PLHashNumber) v2; return (value1 == value2); } On 64bit platforms, this lg_CompareValues() could easily return true for actually different pointers, e.g. if they differ by exactly 2^32 bytes. Indeed, PLHashNumber is uint32. I don't know if this is a bug. If it is, then, since our hash tables use 32bit keys, the only way to fix that is to refrain from using pointers as keys.
Whiteboard: [build_warning]
Blocks: buildwarning
It very much looks like a bug, and the code is still the same.
> It very much looks like a bug, and the code is still the same. Actually, bug 1182667 papered over all of these. In the hash functions that's acceptable; ignoring the high 32-bits of a pointer when computing a hash function gives you a low-quality hash function but not an incorrect one. However, the comparison done in lg_CompareValues *is* defective. It could mistakenly think that two distinct pointers are actually equal. The fix is easy and I will post it shortly.
Attachment #8659042 - Flags: review?(rrelyea)
Assignee: nobody → n.nethercote
Status: NEW → ASSIGNED
Comment on attachment 8659042 [details] [diff] [review] Don't use bogus lg_CompareValues() function Review of attachment 8659042 [details] [diff] [review]: ----------------------------------------------------------------- Yep, this is fine. I've checked it in. https://hg.mozilla.org/projects/nss/rev/ac25fb952b21
Attachment #8659042 - Flags: review?(rrelyea)
Attachment #8659042 - Flags: review+
Attachment #8659042 - Flags: checked-in+
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Thank you for the quick response.
This commit caused build failures on multiple platforms. It appears that it works on 32bit, but fails on 64bit platforms.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Kai Engert (:kaie) from comment #7) > Sample log of one of the failed builds: > https://bot.nss-crypto.org:8011/builders/1-linux-x64-DBG/builds/630/steps/ > shell/logs/stdio Can you help me find where the build failure is? I don't see any compile errors in that file, but there's a lot of output so I may have missed it. Thank you.
It's not a build failure, it's a new failure when executing the NSS test suite. This page explains how to run the test suite: https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/NSS_Sources_Building_Testing When running the test suite, search the output for the string: FAILED
Huh. I ran the tests locally, on a 64-bit build (AFAIK; it's an x86-64 machine) and got the same results before and afterwards. But the run you linked to had many failures. I'm at the limit of my expertise here. Maybe Martin has an idea.
That's distressing. It seems like the first step is to back this out (need to keep the tree green) and then try to figure out what's up. Nicholas, Kai: what compiler are you using? I have noticed some odd behaviors with particular versions of clang.
I can reproduce on my local system, which uses gcc 5.1.1 (from fedora 22). I see that the failure is random, sometimes it works. The test suite fails to create a CA cert. Navigate into a tests_results/security/localhost.*/CA directory. Inside that directory run: rm key3.db cert8.db secmod.db certutil -N -d dbm:../CA -f ../tests.pw certutil -s "CN=NSS Test CA, O=BOGUS NSS, L=Mountain View, ST=California, C=US" -S -n TestCA -t CTu,CTu,CTu -v 600 -x -d dbm:../CA -1 -2 -5 -f ../tests.pw -z ../tests_noise -m 1 < /tmp/input File /tmp/input should contain the following lines: 5 6 9 n y -1 n 5 6 7 9 n (taken from "Creating CA Cert" inside nss/tests/cert/cert.sh) Sometimes the command works. If it works, you must clean up and recreate the db again using above steps. Sometimes the command fails with: certutil: unable to generate key(s) : SEC_ERROR_PKCS11_DEVICE_ERROR: A PKCS #11 module returned CKR_DEVICE_ERROR, indicating that a problem has occurred with the token or slot.
(In reply to Eric Rescorla (:ekr) from comment #11) > It seems like the first step is to back this out (need to keep the tree > green) and > then try to figure out what's up. I've already backed out. One of the failing platforms was 2-linux-x64-opt. Failing build was: https://bot.nss-crypto.org:8011/builders/2-linux-x64-OPT/builds/708/steps/shell/logs/stdio The still running build from after the backout has successfully passed that test. https://bot.nss-crypto.org:8011/builders/2-linux-x64-OPT/builds/709/steps/shell/logs/stdio
I can reproduce on my local machine using gcc 5.1.1 (fedora 22). The build machine mentioned in comment 13 uses gcc 4.4.7 (centos 6).
OK, we found it. There's a nasty sign extension bug in lg_mkHandle that was being masked by the old code. Patch coming.
Attached patch gah.patchSplinter Review
Attachment #8659042 - Attachment is obsolete: true
Attachment #8662625 - Flags: superreview?(n.nethercote)
Attachment #8662625 - Flags: review?(ekr)
Comment on attachment 8662625 [details] [diff] [review] gah.patch Review of attachment 8662625 [details] [diff] [review]: ----------------------------------------------------------------- Nice catch. I wrote a small test program to replicate this and I can see that things go wrong if any of the entries in hashBuf[] have values greater than 0x80, i.e. the topmost bit is 1. But I don't understand why this is happening when both hashBuf[] and handle are unsigned. Could someone explain? Thank you. ::: lib/softoken/legacydb/lgutil.c @@ +303,5 @@ > /* there is only one KRL, use a fixed handle for it */ > if (handle != LG_TOKEN_KRL_HANDLE) { > lg_XORHash(hashBuf,dbKey->data,dbKey->len); > + handle = ((CK_OBJECT_HANDLE)hashBuf[0] << 24) | > + ((CK_OBJECT_HANDLE)hashBuf[1] << 16) | Nit: remove trailing whitespace.
Attachment #8662625 - Flags: superreview?(n.nethercote) → superreview+
The explanation is that C does a bad job when it comes to signed and unsigned values. What I *think* happens (the c99 spec didn't shed much light on this): hashBuf[0] is unsigned char unsigned char << 24 is promoted to a (signed) int assigning an int to an unsigned long causes it to be first converted to a long, which - if the sign bit is set - causes the sign bit to fill the upper 4 bytes.
Comment on attachment 8662625 [details] [diff] [review] gah.patch Review of attachment 8662625 [details] [diff] [review]: ----------------------------------------------------------------- https://hg.mozilla.org/projects/nss/rev/73709fde8f44
Attachment #8662625 - Flags: review?(ekr) → checked-in+
> unsigned char << 24 is promoted to a (signed) int Ok, that's the bit I didn't expect. How uncivilised. Thank you for the explanation.
Status: REOPENED → RESOLVED
Closed: 10 years ago10 years ago
Resolution: --- → FIXED
Target Milestone: --- → 3.21
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: