Please see https://bugzilla.novell.com/show_bug.cgi?id=346256 (it's x86-64 only) shlibsign fails in the build process: Generating DSA Key Pair....Generating PQG Params: An I/O error occurred during That's apparently a gcc bug in upcoming version 4.3. So probably nothing to be resolved here but maybe of interest for others. Current workaround is to undefine NSS_USE_COMBA
Thanks for letting us know that this defect in a recent version of gcc makes it fail to correctly compile NSS code that is correct and has successfully built and worked for years. The solution is to build NSS with an unbroken working version of gcc.
Wolfgang, this could be a compiler optimization bug in gcc 4.3. Can you report this to the gcc team or someone familiar with gcc? You should also find out what exactly went wrong. shlibsign doesn't crash. So we need to find out at which step the generation of PQG params fails. Perhaps it fails to load libfreebl3.so?
Yes, it's something in optimization since building with -O0 works fine. More details are in the referenced bug (which is open for everybody to read) and hopefully tracked by Novell's gcc team.
Wan-Teh, Fedora, Debian, Ubuntu provides gcc 4.3 snapshots. Is there any chance you could test with them?
There is no bug in NSS to be fixed. The bug report is resolved. The evidence in the bug is clear: there is an optimizer problem. The gcc maintainers can fix gcc, and then we'll be willing to use the fixed compiler. A great deal of time was spent making the comba code right, by 3 or 4 people two years ago, and this attitude that seems to assume that we have no idea what our code does is simply insulting and unwelcome. We're not going to bend over every time some maker of an open source build tool has some new whim. Cygwin tried that, and now they're no longer a supported build platform for Mozilla. One of the characteristics of inline assembler is that it tells the compiler what instructions to produce, pretty directly. If the compiler can no longer follow those instructions, and now produces code that does not do what it was told, that's a compiler bug. The instructions it was told to produce were, and still are, correct, and do not need to be fixed. I'm quite happy to continue to use older properly-working versions of gcc to build NSS until they can fix their bug. As long as the tinderbox builds continue to build and pass without errors, there is no bug in NSS's comba code that needs to be fixed. If for some reason we should desire to build with a version of gcc that can no longer correctly build with inline assembler, then I would produce a .s file containing the correct code built by the older version of gcc, and NSS would start using that .s file instead of the .c sources with that broken gcc compiler. We already do that for another vendor's compilers. I'm quite willing to do it for gcc. If someone wants to CONTRIBUTE a patch that solves this problem with the new compiler, but produces no less efficient code and doesn't reduce performance ANY on servers in load tests, that would be considered.
Maybe I should make clear that I reported that issue here just to make sure that people hitting that issue can find a reference in bmo and can find out what's going on. I don't expect NSS people to care about that.
This certainly is a gcc bug, but I think it's worth taking into consideration that the more compilers evolve, the more direct insertion of external inline assembly instruction in the middle of the code they generate perturbs them. AFAIK the way to lower the risks is usually to save and restore as much of the register state as possible before entering/when leaving the inline assembly instruction blocks. And from a very fast look, it doesn't look like http://lxr.mozilla.org/mozilla/source/security/nss/lib/freebl/mpi/mp_comba.c does much of that.
The bug seems fixed. Using the latest development snapshot of the Fedora Rawhide OS, which comes with gcc 4.3.0, I was able to build NSS trunk fine. I also ran the test suite, which completed without failures.
The previous comment was based on a debug build. Wan-Teh reminded me the original bug was reported with an optimized build. And indeed. I still see a failure with an optimized build :-( /home/kaie/moz/nss/head/mozilla/security/nss/cmd/shlibsign/Linux2.6_x86_64_glibc_PTH_64_OPT.OBJ/shlibsign -v -i /home/kaie/moz/nss/head/mozilla/security/nss/cmd/shlibsign/../../../../dist/Linux2.6_x86_64_glibc_PTH_64_OPT.OBJ/lib/libsoftokn3.so Generating DSA Key Pair....Generating PQG Params: An I/O error occurred during security authorization. make: *** [../../../../dist/Linux2.6_x86_64_glibc_PTH_64_OPT.OBJ/lib/libsoftokn3.chk] Error 1 make: *** [libs] Error 2 make: *** [libs] Error 2
(I personally think this bug should be kept open until a real solution is known, even if no fix inside NSS is planned.) I have been asked to rebuild NSS for the next Linux Fedora OS release, where all components are being built with gcc 4.3.0. Unless a better solution is known, I will use the workaround proposed in comment 0.
I think it would be less than a day's work (maybe about an hour's work) to: - build NSS optimized on Linux with some non-broken gcc - disassemble mp_comba.o generating mp_comba_broken_gcc.s - change the Makefile(s) to build mp_comba_broken_gcc.s instead of mp_comba.c on Linux systems which have that broken gcc - commit mp_comba_broken_gcc.s and the modified Makefile. I'm not including the wait-for-review time in that estimate.
There some more analysis of this bug in https://bugzilla.redhat.com/show_bug.cgi?id=432146 and a bug against GCC 4.3 (thanks to Jakub Jelinek) with a reduced testcase in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35160. Jakub also suggested a change (in the RH bug) to mp_comba.c that avoids this bug by simplifying the asm constraints and removing unnecessary variable initialization. Would the NSS team be willing to consider including this change?
In reply to Matthew's comment 12, a question and some comments. a) The gcc.gnu.org bug seems to suggest that a patch has already been made. (Or am I misunderstanding that?) If that bug is fixed, then is there any need or motivation to make any further changes to NSS sources? b) Changing this in NSS 3.11.x has implications on the FIPS validation of this code. We're not planning on any further FIPS validations of the NSS 3.11.c code (as I understand things), but we are considering doing another FIPS evaluation for NSS 3.12. So, this begs the question: if we were to accept a patch to NSS for NSS 3.12.x only (the trunk) and not for NSS 3.11.x would that be helpful to anyone? c) I've seen the patch at https://bugzilla.redhat.com/attachment.cgi?id=294959 and my reaction is that I'd like Tom St. Denis to review it and vouch for it. If he's willing to take it into libTomFast (or whatever it was called), then I'd consider taking it in NSS 3.12, IFF it still needs to be fixed (that is, if GCC hasn't been fixed by the patch for their bug cited above).
(In reply to comment #13) > In reply to Matthew's comment 12, a question and some comments. > > a) The gcc.gnu.org bug seems to suggest that a patch has already been made. > (Or am I misunderstanding that?) If that bug is fixed, then is there any > need or motivation to make any further changes to NSS sources? That is correct. As the reporter version of GCC 4.3 was not final, it is safe to say that there will not be any release versions of GCC 4.3 that will suffer from that bug. However, jakub seems to imply that the current code does unnecessary work, and could be improved regardless as described with the patch. > > b) Changing this in NSS 3.11.x has implications on the FIPS validation of > this code. We're not planning on any further FIPS validations of the NSS > 3.11.c code (as I understand things), but we are considering doing another > FIPS evaluation for NSS 3.12. So, this begs the question: if we were to > accept a patch to NSS for NSS 3.12.x only (the trunk) and not for NSS 3.11.x > would that be helpful to anyone? If it would improve the code, yes. > c) I've seen the patch at https://bugzilla.redhat.com/attachment.cgi?id=294959 > and my reaction is that I'd like Tom St. Denis to review it and vouch for it. > If he's willing to take it into libTomFast (or whatever it was called), > then I'd consider taking it in NSS 3.12, IFF it still needs to be fixed > (that is, if GCC hasn't been fixed by the patch for their bug cited above). > GCC has been fixed but based on Jakub's assertions that the current code is sub-optimal, I think it would be good to consider his change suggestions. Re-opening for consideration.
I've reviewed and tested the patch in the bugzilla.redhat.com bug report. I had to study the GCC inline assembly syntax and the ISO C implementation of those SQRADDxx macros in TomsFastMath. I can vouch for its correctness except for the early clobbers (=&r), which I still don't understand. Nelson, if you want, we can omit the change to SQRADDDB (which has the early clobber changes), because the change to SQRADDSC alone is enough to work around the GCC 4.3pre bug, and I fully understand that change. (That change also makes it unnecessary to initialize sc0, sc1, sc2.) Basically, the patch removes unnecessary constraints on the inputs/outputs of the inline assembly. So as caillon said, the patch should help GCC generate better code. Linux distributions such as Fedora and openSUSE have the ability to apply their own patches to the pristine NSS source code, so Kai doesn't need the patch on the NSS_3_11_BRANCH. But it would be nice to get this patch into the NSS trunk. Taking this change would require you to regenerate mp_comba_amd64_sun.s for Solaris. It is our pleasure to give the patch back to TomsFastMath upstream. That should not block us from taking this patch (at least the subset I fully understand).
Wan-Teh, given that mp_comba_amd64_sun.s is correct as is, a change to the .c source does not necessitate a change to mp_comba_amd64_sun.s I do not want to take any change whose correctness depends on some very new version of gcc. I am unwilling to take a change that means that the .c file will cease to correctly compile with any version of gcc with which it now compiles correctly. As I understand it, the propopsed change to SQRADDSC meets my criteria, but I am not certain that the SQADDDB change does. Basically, my reaction to this whole bug, at this point, is this: Given that there is no bug in the current source that prevents it from being compiled and working correctly on Linux with a released gcc, I think there is no urgency to do anything with this. This bug is now nothing more than a very minor performance enhancement. We have other priorities for NSS 3.12, priorities that are causing a lot of other desirable work to remain undone, such as a lot of SSL work. The change requested in this bug is much lower priority (in my view) than the SSL work that is being delayed by higher priority work. So, if I had more resources to put on NSS at this time, I would put them on SSL, not on this. If you have the time and interest to take this bug and make the changes, and to carry the responsiblity for them thereafter, I won't stand in your way. :)
Understood and agreed. This is why I did all the work in the Red Hat bug report, not here (I didn't want to distract you from the SSL work), and my work on the Red Hat bug report was all done during non-business hours as a pastime.
Nelson, you claim the bug is not present in any released version of gcc. You claim the bug is only present in a not-yet-released version of gcc 4.3.0pre. Well, if we all had refused to work on this bug, this bug would probably have remained undetected, and gcc 4.3.0 would probably have been shipped with this bug. If we all had simply ignored this issue and waited for the final release of gcc 4.3.0, then very likely we would have ended up in a situation where a released version of gcc would show this bug. I'm therefore really glad that Wan-Teh and Jakub have offered their work to investigate and analyze this problem.
> I'm therefore really glad that Wan-Teh and Jakub have offered their work to > investigate and analyze this problem. I am too! I'm glad that their work got the problem fixed at the source. Thanks to all who helped solve the gcc problem. And now that it's fixed, I don't want to spend time to work around it in NSS.
(In reply to comment #13) > > b) Changing this in NSS 3.11.x has implications on the FIPS validation of > this code. We're not planning on any further FIPS validations of the NSS > 3.11.c code (as I understand things), but we are considering doing another > FIPS evaluation for NSS 3.12. So, this begs the question: if we were to > accept a patch to NSS for NSS 3.12.x only (the trunk) and not for NSS 3.11.x > would that be helpful to anyone? From my point of view, changes to 3.11 should be avoided. From my point of view, if possible optimizations have been identified, it's reasonable to apply them to 3.12 before the next FIPS evaluation starts.
Created attachment 303938 [details] [diff] [review] Proposed patch by Jakub Jelinek of Red Hat I submitted this patch to the email list for LibTom Projects in http://groups.google.com/group/libtom/browse_thread/thread/46b7827d4374892c?hl=en
Comment on attachment 303938 [details] [diff] [review] Proposed patch by Jakub Jelinek of Red Hat I forgot to mention that this patch was contributed by Jakub Jelinek of Red Hat. I have reviewed the patch and fully understand it, including the "earlyclobbers". I only added a note to mp_comba_amd64_sun.s because it will get out of sync with mp_comba.c after the patch is applied.
Here are Jakub's arguments, he recommends that we apply this change to NSS, because (quote): 1) you can remove the unnecessary workarounds where you need to initialize sc* = 0 in order to avoid compiler warnings. All the clearing of the vars really has to be done in the generated code 2) by accurrately describing what the inline asm does you give the compiler more freedom to generate better code, especially if register preassure is high.
I don't find those reasons persuasive. The still don't make this more important than (say) any of the pending SSL work.
(In reply to comment #24) > The still don't make this more important than (say) any of the pending SSL work. I don't understand this argument. If I understand correctly, the work is already done. Jakub produced the patch, Wan-Teh reviewed it, I ran the test suites. Wan-Teh adds a comment to the SUN assembler file, in order to follow your wish to keep the SUN code unchanged - in order to avoid any work on SUN's side. Are there arguments to reject the contribution? I'm changing the bug summary, because this is no longer about miscompilation (gcc with fix compiles NSS fine), it's now about a contributed improvement. I'm changing the Version attribute of this bug, to make it clear the improvement is proposed for trunk (only).
Nelson, In the last paragraph of comment 16, you said it's fine for me to take this bug and make the change. During the President Day long weekend, I worked on this bug as a pastime. I attached the patch to this bug to wrap up the work. I did *not* ask anyone to review the patch. Why can't I work on something that fascinates me in my spare time? I didn't waste any NSS resources. Not to mention I also reviewed Julien's SSL patch (bug 403240 comment 33) in the long weekend.
Wan-Teh, I didn't realize you had taken the bug. I was stating that *I* will not work on this bug until more important work is done. You are (of course, as always) both free and WELCOME to work on this if you wish! Sorry for my confusion.
Wan-Teh, If you modify the comba code, please also regenerate the mp_comba_amd64_sun.s to make sure that it is in sync with the C master source. When we made our performance improvements for amd64, we made sure that everything built on both Solaris and Linux, even though we could have chosen to do Solaris only. We expect the code to be maintained on both platforms. The "unnamed compiler" is gcc 3.4.3 on Solaris x64, which is in /usr/sfw/bin/gcc, bundled with s10, which you can DL free . Some testing with rsaperf should also be done to make sure that performance doesn't actually regress due to the patch before checkin. Note that the code was never tested on Intel x64 chips. It would be interesting to see how it performs there. We don't have any machines to test that on at Sun yet.
Julien, do you make it a requirement to update the Sun code at the same time? If I understand correctly: - Nelson has proposed to avoid changes to the Sun code - Wan-Teh might have a hard time to test it on Sun Because of Nelson's preference to avoid changes to the Sun code, I'd conclude, checking in any changes for Sun code might be postponed to a later time? I understand the current state of this bug as follows: - Jakub Jelinek contributed the patch - Wan-Teh has reviewed the patch Is it OK if Wan-Teh marks the attachment as reviewed? I'd think so. Then he should be OK to check it in.
I'm putting on my "NSS project lead for Sun" hat for this comment. Sun does not require ANY change to mp_comba_amd64_sun.s for NSS 3.12. No such change is prerequisite to the integration of Wan-Teh's patch to the .c file. I will write more in private email.
I checked in Jakub Jelinek's patch for mp_comba.c on the NSS trunk for NSS 3.12. Checking in mp_comba.c; /cvsroot/mozilla/security/nss/lib/freebl/mpi/mp_comba.c,v <-- mp_comba.c new revision: 1.3; previous revision: 1.2 done I didn't update the comment in mp_comba_amd64_sun.s as I proposed in attachment 303938 [details] [diff] [review]. Let me know if you want me to do that.
Today, the bug at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35160 was marked as Resolved/Fixed in gcc 4.4.0