Last Comment Bug 353896 - Building tip with NSS_ECC_MORE_THAN_SUITE_B causes crashes in all.sh
: Building tip with NSS_ECC_MORE_THAN_SUITE_B causes crashes in all.sh
Status: RESOLVED FIXED
:
Product: NSS
Classification: Components
Component: Libraries (show other bugs)
: 3.11.3
: All All
: P2 normal (vote)
: 3.11.5
Assigned To: Wan-Teh Chang
:
:
Mentors:
Depends on: 342795
Blocks:
  Show dependency treegraph
 
Reported: 2006-09-22 16:27 PDT by Julien Pierre
Modified: 2007-01-08 15:21 PST (History)
5 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
Patch for NSS_3_11_1_BRANCH (1.04 KB, patch)
2006-09-22 17:38 PDT, Julien Pierre
alvolkov.bgs: review+
nelson: review+
Details | Diff | Splinter Review
Patch for tip and NSS_3_11_BRANCH (1.10 KB, patch)
2006-09-22 17:40 PDT, Julien Pierre
alvolkov.bgs: review+
nelson: review+
Details | Diff | Splinter Review
Make SECU_Strerror(0) return "No error." instead of NULL (729 bytes, patch)
2006-11-14 17:03 PST, Wan-Teh Chang
no flags Details | Diff | Splinter Review

Description Julien Pierre 2006-09-22 16:27:35 PDT
Currently, if you pull the NSS tip with NSS_ENABLE_ECC=1 and NSS_ECC_MORE_THAN_SUITE_B=1, then run all.sh, you will get 20 core files in selfserv as follows :

[jp96085@monstre]/export/home/nss/tip/mozilla/tests_results/security/monstre.1 337 % find . | grep core
./ext_client/core.selfserv.monstre.23860
./ext_client/core.selfserv.monstre.25200
./ext_client/core.selfserv.monstre.25296
./ext_client/core.selfserv.monstre.25366
./ext_client/core.selfserv.monstre.25463
./ext_client/core.selfserv.monstre.28745
./ext_client/core.selfserv.monstre.120
./ext_client/core.selfserv.monstre.234
./ext_client/core.selfserv.monstre.312
./ext_client/core.selfserv.monstre.412
./client/core.selfserv.monstre.21442
./client/core.selfserv.monstre.22836
./client/core.selfserv.monstre.22933
./client/core.selfserv.monstre.23003
./client/core.selfserv.monstre.23099
./client/core.selfserv.monstre.26308
./client/core.selfserv.monstre.27717
./client/core.selfserv.monstre.27815
./client/core.selfserv.monstre.27885
./client/core.selfserv.monstre.27981

Most of the stacks are of this type :

(dbx) where
current thread: t@2
  [1] strlen(0x8068a2a, 0xfe869220, 0xfec2b090, 0x0), at 0xfeb84b8c
  [2] _fprintf(0xfec2b090, 0x8068a08, 0x8080ae0, 0x0, 0x0), at 0xfebdccfd
=>[3] errWarn(funcString = 0x8080ae0 "HDX PR_Read"), line 248 in "selfserv.c"
  [4] handle_connection(tcp_sock = 0x817b9f0, model_sock = 0x80836c8, requestCert = 0), line 968 in "selfserv.c"
  [5] jobLoop(a = (nil), b = (nil), c = 0), line 518 in "selfserv.c"
  [6] thread_wrapper(arg = 0x816d2c0), line 486 in "selfserv.c"
  [7] _pt_root(arg = 0x8178688), line 220 in "ptthread.c"
  [8] _thr_setup(0xfec42400), at 0xfebff9be
  [9] _lwp_start(), at 0xfebffca0
(dbx)

This happens because PR_Read returned -1 without setting an NSPR error .

I realize the above build environment isn't supported, but our software still should be more robust. Two things need to be done :
1) selfserv should not crash . An NSPR error should be set if the wrong curve is used or keygen fails
2) we should attempt to detect the build error at build time. If NSS_ECC_MORE_THAN_SUITE_B is set but the source doesn't support that setting, there should be an
#ifdef NSS_ECC_MORE_THAN_SUITE_B
#error Sorry, but you lose !
#endif

I have wasted an incredible amount of time on various machines with build mismatches due to out-of-date checkout scripts and ECC build macros.

Similarly, if the source requires NSS_ECC_MORE_THAN_SUITE_B, it should have

#ifndef NSS_ECC_MORE_THAN_SUITE_B
#error Sorry, but you lose too !
#endif
Comment 1 Julien Pierre 2006-09-22 16:49:17 PDT
The crash problem is actually bug 342795, so I'm adding it as a dependency .

I will attach patches for the build problem.
Comment 2 Julien Pierre 2006-09-22 17:38:16 PDT
Created attachment 239741 [details] [diff] [review]
Patch for NSS_3_11_1_BRANCH
Comment 3 Julien Pierre 2006-09-22 17:40:49 PDT
Created attachment 239742 [details] [diff] [review]
Patch for tip and NSS_3_11_BRANCH
Comment 4 Nelson Bolyard (seldom reads bugmail) 2006-09-22 23:13:56 PDT
Comment on attachment 239742 [details] [diff] [review]
Patch for tip and NSS_3_11_BRANCH

This is fine.  It should save developers a lot of time.
Comment 5 Nelson Bolyard (seldom reads bugmail) 2006-09-22 23:17:49 PDT
Comment on attachment 239741 [details] [diff] [review]
Patch for NSS_3_11_1_BRANCH

My first reaction to this patch is that I would prefer that this version of this file be ifdeffed so that it can build either basic or extended ECC.  
But I won't withhold review for that.
Comment 6 Julien Pierre 2006-09-25 13:51:42 PDT
Nelson,

re: comment 5, I would also much prefer that we have a single file, but I think we both know there are some objections to that outside of Sun .

Thanks for the reviews.

I checked in attachment 239741 [details] [diff] [review] to NSS_3_11_1_BRANCH :

Checking in ecl-curve.h;
/cvsroot/mozilla/security/nss/lib/freebl/ecl/ecl-curve.h,v  <--  ecl-curve.h
new revision: 1.2.28.2.2.1; previous revision: 1.2.28.2
done

And attachment 239742 [details] [diff] [review] to NSS_3_11_BRANCH :

Checking in ecl-curve.h;
/cvsroot/mozilla/security/nss/lib/freebl/ecl/ecl-curve.h,v  <--  ecl-curve.h
new revision: 1.2.28.4; previous revision: 1.2.28.3
done

And to the tip :

Checking in ecl-curve.h;
/cvsroot/mozilla/security/nss/lib/freebl/ecl/ecl-curve.h,v  <--  ecl-curve.h
new revision: 1.6; previous revision: 1.5
done

I'm keeping this bug open until its dependencies about the crash are resolved.
Comment 7 Wan-Teh Chang 2006-11-14 17:03:56 PST
Created attachment 245623 [details] [diff] [review]
Make SECU_Strerror(0) return "No error." instead of NULL

Not sure why Julien set the target milestone to 3.11.4 instead of 3.12.

This patch will fix the crash on Solaris by having SECU_Strerror(0)
return the string "No error." instead of NULL.  It doesn't do what
Julien wanted though:

  1) selfserv should not crash . An NSPR error should be set if the wrong
  curve is used or keygen fails

I am open to other error strings for the error code 0.  Note that
strerror(0) is
- "Success" on Linux
- "Error 0" on HP-UX and Solaris
- "Unknown error: 0" on Mac OS X
- "No error" on Windows
Comment 8 Julien Pierre 2006-11-14 17:39:07 PST
Wan-Teh,

A change was made for 3.11.4 as you can see in comment 6 to attempt to avoid building the wrong level of ECC.

The part about the NSPR error being set needs to be fixed in dependent bugs, bug 342795 and 353899 .

I would almost prefer not to checkin the fix for SECU_Strerror as it will hide this type of error in the future and make them even more difficult to diagnose. 
Comment 9 Wan-Teh Chang 2006-11-20 10:02:30 PST
Nelson, I suggest that we mark this bug fixed.  I have
created a patch for the dependent bug 342795.

I will let you decide whether SECU_Strerror should return
an error message for error code 0 (attachment 245623 [details] [diff] [review]).
Either way is fine by me, but let's not let this bug drag
on for too long.
Comment 10 Wan-Teh Chang 2006-12-06 15:10:52 PST
The ecl-curve.h patch in this bug was checked in in NSS 3.11.4.

The dependent bug (bug 342795) was fixed in NSS 3.11.5.

Whether you want to change SECU_Strerror(0) or not is fine by me.
Feel free to cancel my review request for that patch.
Comment 11 Alexei Volkov 2007-01-03 14:35:15 PST
Comment on attachment 245623 [details] [diff] [review]
Make SECU_Strerror(0) return "No error." instead of NULL

I agree with Julien. I think we are better off breaking then hiding errors in the software.
Comment 12 Nelson Bolyard (seldom reads bugmail) 2007-01-08 14:48:23 PST
Regarding SECU_Strerror(0), this function is documented to return NULL for 
unknown error codes.  Some programs don't check and crash.  Changing error 0
to return a non-NULL string fixes one error code out of billions.  

Do we want to change the function to never return a NULL string, but 
instead return an "unknown error" string for all unknown errors?  

Should this function assert that the error code is known?
Comment 13 Nelson Bolyard (seldom reads bugmail) 2007-01-08 14:49:10 PST
Comment on attachment 245623 [details] [diff] [review]
Make SECU_Strerror(0) return "No error." instead of NULL

We should open a new bug for this issue.
Comment 14 Wan-Teh Chang 2007-01-08 15:21:27 PST
It's fine by me to consider 0 an unknown error code rather
than the error code for "no error".

Note You need to log in before you can comment on or make changes to this bug.