Closed Bug 353896 Opened 18 years ago Closed 18 years ago

Building tip with NSS_ECC_MORE_THAN_SUITE_B causes crashes in all.sh

Categories

(NSS :: Libraries, defect, P2)

3.11.3
defect

Tracking

(Not tracked)

RESOLVED FIXED
3.11.5

People

(Reporter: julien.pierre, Assigned: wtc)

References

Details

Attachments

(3 files)

Currently, if you pull the NSS tip with NSS_ENABLE_ECC=1 and NSS_ECC_MORE_THAN_SUITE_B=1, then run all.sh, you will get 20 core files in selfserv as follows :

[jp96085@monstre]/export/home/nss/tip/mozilla/tests_results/security/monstre.1 337 % find . | grep core
./ext_client/core.selfserv.monstre.23860
./ext_client/core.selfserv.monstre.25200
./ext_client/core.selfserv.monstre.25296
./ext_client/core.selfserv.monstre.25366
./ext_client/core.selfserv.monstre.25463
./ext_client/core.selfserv.monstre.28745
./ext_client/core.selfserv.monstre.120
./ext_client/core.selfserv.monstre.234
./ext_client/core.selfserv.monstre.312
./ext_client/core.selfserv.monstre.412
./client/core.selfserv.monstre.21442
./client/core.selfserv.monstre.22836
./client/core.selfserv.monstre.22933
./client/core.selfserv.monstre.23003
./client/core.selfserv.monstre.23099
./client/core.selfserv.monstre.26308
./client/core.selfserv.monstre.27717
./client/core.selfserv.monstre.27815
./client/core.selfserv.monstre.27885
./client/core.selfserv.monstre.27981

Most of the stacks are of this type :

(dbx) where
current thread: t@2
  [1] strlen(0x8068a2a, 0xfe869220, 0xfec2b090, 0x0), at 0xfeb84b8c
  [2] _fprintf(0xfec2b090, 0x8068a08, 0x8080ae0, 0x0, 0x0), at 0xfebdccfd
=>[3] errWarn(funcString = 0x8080ae0 "HDX PR_Read"), line 248 in "selfserv.c"
  [4] handle_connection(tcp_sock = 0x817b9f0, model_sock = 0x80836c8, requestCert = 0), line 968 in "selfserv.c"
  [5] jobLoop(a = (nil), b = (nil), c = 0), line 518 in "selfserv.c"
  [6] thread_wrapper(arg = 0x816d2c0), line 486 in "selfserv.c"
  [7] _pt_root(arg = 0x8178688), line 220 in "ptthread.c"
  [8] _thr_setup(0xfec42400), at 0xfebff9be
  [9] _lwp_start(), at 0xfebffca0
(dbx)

This happens because PR_Read returned -1 without setting an NSPR error .

I realize the above build environment isn't supported, but our software still should be more robust. Two things need to be done :
1) selfserv should not crash . An NSPR error should be set if the wrong curve is used or keygen fails
2) we should attempt to detect the build error at build time. If NSS_ECC_MORE_THAN_SUITE_B is set but the source doesn't support that setting, there should be an
#ifdef NSS_ECC_MORE_THAN_SUITE_B
#error Sorry, but you lose !
#endif

I have wasted an incredible amount of time on various machines with build mismatches due to out-of-date checkout scripts and ECC build macros.

Similarly, if the source requires NSS_ECC_MORE_THAN_SUITE_B, it should have

#ifndef NSS_ECC_MORE_THAN_SUITE_B
#error Sorry, but you lose too !
#endif
The crash problem is actually bug 342795, so I'm adding it as a dependency .

I will attach patches for the build problem.
Depends on: 342795
Attachment #239741 - Flags: superreview?(nelson)
Attachment #239741 - Flags: review?(alexei.volkov.bugs)
Attachment #239742 - Flags: superreview?(nelson)
Attachment #239742 - Flags: review?(alexei.volkov.bugs)
Comment on attachment 239742 [details] [diff] [review]
Patch for tip and NSS_3_11_BRANCH

This is fine.  It should save developers a lot of time.
Attachment #239742 - Flags: superreview?(nelson) → review+
Comment on attachment 239741 [details] [diff] [review]
Patch for NSS_3_11_1_BRANCH

My first reaction to this patch is that I would prefer that this version of this file be ifdeffed so that it can build either basic or extended ECC.  
But I won't withhold review for that.
Attachment #239741 - Flags: superreview?(nelson) → review+
Attachment #239741 - Flags: review?(alexei.volkov.bugs) → review+
Attachment #239742 - Flags: review?(alexei.volkov.bugs) → review+
Nelson,

re: comment 5, I would also much prefer that we have a single file, but I think we both know there are some objections to that outside of Sun .

Thanks for the reviews.

I checked in attachment 239741 [details] [diff] [review] to NSS_3_11_1_BRANCH :

Checking in ecl-curve.h;
/cvsroot/mozilla/security/nss/lib/freebl/ecl/ecl-curve.h,v  <--  ecl-curve.h
new revision: 1.2.28.2.2.1; previous revision: 1.2.28.2
done

And attachment 239742 [details] [diff] [review] to NSS_3_11_BRANCH :

Checking in ecl-curve.h;
/cvsroot/mozilla/security/nss/lib/freebl/ecl/ecl-curve.h,v  <--  ecl-curve.h
new revision: 1.2.28.4; previous revision: 1.2.28.3
done

And to the tip :

Checking in ecl-curve.h;
/cvsroot/mozilla/security/nss/lib/freebl/ecl/ecl-curve.h,v  <--  ecl-curve.h
new revision: 1.6; previous revision: 1.5
done

I'm keeping this bug open until its dependencies about the crash are resolved.
Priority: -- → P2
Not sure why Julien set the target milestone to 3.11.4 instead of 3.12.

This patch will fix the crash on Solaris by having SECU_Strerror(0)
return the string "No error." instead of NULL.  It doesn't do what
Julien wanted though:

  1) selfserv should not crash . An NSPR error should be set if the wrong
  curve is used or keygen fails

I am open to other error strings for the error code 0.  Note that
strerror(0) is
- "Success" on Linux
- "Error 0" on HP-UX and Solaris
- "Unknown error: 0" on Mac OS X
- "No error" on Windows
Attachment #245623 - Flags: superreview?(alexei.volkov)
Attachment #245623 - Flags: review?(nelson)
Wan-Teh,

A change was made for 3.11.4 as you can see in comment 6 to attempt to avoid building the wrong level of ECC.

The part about the NSPR error being set needs to be fixed in dependent bugs, bug 342795 and 353899 .

I would almost prefer not to checkin the fix for SECU_Strerror as it will hide this type of error in the future and make them even more difficult to diagnose. 
Nelson, I suggest that we mark this bug fixed.  I have
created a patch for the dependent bug 342795.

I will let you decide whether SECU_Strerror should return
an error message for error code 0 (attachment 245623 [details] [diff] [review]).
Either way is fine by me, but let's not let this bug drag
on for too long.
Assignee: bugzilla → wtchang
Attachment #245623 - Flags: superreview?(alexei.volkov) → superreview?(alexei.volkov.bugs)
The ecl-curve.h patch in this bug was checked in in NSS 3.11.4.

The dependent bug (bug 342795) was fixed in NSS 3.11.5.

Whether you want to change SECU_Strerror(0) or not is fine by me.
Feel free to cancel my review request for that patch.
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Target Milestone: 3.11.4 → 3.11.5
Comment on attachment 245623 [details] [diff] [review]
Make SECU_Strerror(0) return "No error." instead of NULL

I agree with Julien. I think we are better off breaking then hiding errors in the software.
Attachment #245623 - Flags: superreview?(alexei.volkov.bugs)
Regarding SECU_Strerror(0), this function is documented to return NULL for 
unknown error codes.  Some programs don't check and crash.  Changing error 0
to return a non-NULL string fixes one error code out of billions.  

Do we want to change the function to never return a NULL string, but 
instead return an "unknown error" string for all unknown errors?  

Should this function assert that the error code is known?
Comment on attachment 245623 [details] [diff] [review]
Make SECU_Strerror(0) return "No error." instead of NULL

We should open a new bug for this issue.
Attachment #245623 - Flags: review?(nelson)
It's fine by me to consider 0 an unknown error code rather
than the error code for "no error".
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: