Last Comment Bug 407866 - Contributed improvement to security/nss/lib/freebl/mpi/mp_comba.c
: Contributed improvement to security/nss/lib/freebl/mpi/mp_comba.c
Status: RESOLVED FIXED
:
Product: NSS
Classification: Components
Component: Libraries (show other bugs)
: trunk
: x86 Linux
: -- trivial (vote)
: 3.12
Assigned To: Wan-Teh Chang
:
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2007-12-11 00:26 PST by Wolfgang Rosenauer [:wolfiR]
Modified: 2008-12-30 22:18 PST (History)
7 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
Proposed patch by Jakub Jelinek of Red Hat (6.03 KB, patch)
2008-02-17 18:04 PST, Wan-Teh Chang
no flags Details | Diff | Splinter Review

Description Wolfgang Rosenauer [:wolfiR] 2007-12-11 00:26:13 PST
Please see 
https://bugzilla.novell.com/show_bug.cgi?id=346256
(it's x86-64 only)

shlibsign fails in the build process:
Generating DSA Key Pair....Generating PQG Params: An I/O error occurred during

That's apparently a gcc bug in upcoming version 4.3. So probably nothing to be resolved here but maybe of interest for others.

Current workaround is to undefine NSS_USE_COMBA
Comment 1 Nelson Bolyard (seldom reads bugmail) 2007-12-13 18:08:30 PST
Thanks for letting us know that this defect in a recent version of gcc makes 
it fail to correctly compile NSS code that is correct and has successfully 
built and worked for years.  The solution is to build NSS with an unbroken 
working version of gcc.
Comment 2 Wan-Teh Chang 2007-12-13 21:17:51 PST
Wolfgang, this could be a compiler optimization bug in gcc 4.3.
Can you report this to the gcc team or someone familiar with gcc?

You should also find out what exactly went wrong.  shlibsign
doesn't crash.  So we need to find out at which step the generation
of PQG params fails.  Perhaps it fails to load libfreebl3.so?
Comment 3 Wolfgang Rosenauer [:wolfiR] 2007-12-13 23:44:07 PST
Yes, it's something in optimization since building with -O0 works fine.
More details are in the referenced bug (which is open for everybody to read) and hopefully tracked by Novell's gcc team.
Comment 4 İsmail Dönmez 2007-12-16 02:36:13 PST
Wan-Teh,

Fedora, Debian, Ubuntu provides gcc 4.3 snapshots. Is there any chance you could test with them?
Comment 5 Nelson Bolyard (seldom reads bugmail) 2007-12-16 17:24:45 PST
There is no bug in NSS to be fixed.  The bug report is resolved. The evidence in the bug is clear: there is an optimizer problem.  The gcc maintainers can fix gcc, and then we'll be willing to use the fixed compiler.

A great deal of time was spent making the comba code right, by 3 or 4 people two years ago, and this attitude that seems to assume that we have no idea 
what our code does is simply insulting and unwelcome.

We're not going to bend over every time some maker of an open source build tool has some new whim.  Cygwin tried that, and now they're no longer a supported build platform for Mozilla.

One of the characteristics of inline assembler is that it tells the compiler what instructions to produce, pretty directly.  If the compiler can no longer follow those instructions, and now produces code that does not do what it was told, that's a compiler bug. The instructions it was told to produce were, and still are, correct, and do not need to be fixed.

I'm quite happy to continue to use older properly-working versions of gcc to build NSS until they can fix their bug.  As long as the tinderbox builds continue to build and pass without errors, there is no bug in NSS's comba code that needs to be fixed.

If for some reason we should desire to build with a version of gcc that can no longer correctly build with inline assembler, then I would produce a .s file containing the correct code built by the older version of gcc, and NSS would start using that .s file instead of the .c sources with that broken gcc compiler.  We already do that for another vendor's compilers.  I'm quite willing to do it for gcc.

If someone wants to CONTRIBUTE a patch that solves this problem with the new compiler, but produces no less efficient code and doesn't reduce performance ANY on servers in load tests, that would be considered.
Comment 6 Wolfgang Rosenauer [:wolfiR] 2007-12-17 00:04:04 PST
Maybe I should make clear that I reported that issue here just to make sure that people hitting that issue can find a reference in bmo and can find out what's going on. I don't expect NSS people to care about that.
Comment 7 Jean-Marc Desperrier 2007-12-22 02:47:01 PST
This certainly is a gcc bug, but I think it's worth taking into consideration that the more compilers evolve, the more direct insertion  of external inline assembly instruction in the middle of the code they generate perturbs them.
AFAIK the way to lower the risks is usually to save and restore as much of the register state as possible before entering/when leaving the inline assembly instruction blocks. And from a very fast look, it doesn't look like http://lxr.mozilla.org/mozilla/source/security/nss/lib/freebl/mpi/mp_comba.c
does much of that.

Comment 8 Kai Engert (:kaie) (on vacation) 2008-02-08 17:58:29 PST
The bug seems fixed.

Using the latest development snapshot of the Fedora Rawhide OS, which comes with gcc 4.3.0, I was able to build NSS trunk fine. I also ran the test suite, which completed without failures.
Comment 9 Kai Engert (:kaie) (on vacation) 2008-02-08 18:09:37 PST
The previous comment was based on a debug build.
Wan-Teh reminded me the original bug was reported with an optimized build.

And indeed. I still see a failure with an optimized build :-(

/home/kaie/moz/nss/head/mozilla/security/nss/cmd/shlibsign/Linux2.6_x86_64_glibc_PTH_64_OPT.OBJ/shlibsign -v -i /home/kaie/moz/nss/head/mozilla/security/nss/cmd/shlibsign/../../../../dist/Linux2.6_x86_64_glibc_PTH_64_OPT.OBJ/lib/libsoftokn3.so
Generating DSA Key Pair....Generating PQG Params: An I/O error occurred during security authorization.
make[2]: *** [../../../../dist/Linux2.6_x86_64_glibc_PTH_64_OPT.OBJ/lib/libsoftokn3.chk] Error 1
make[1]: *** [libs] Error 2
make: *** [libs] Error 2
Comment 10 Kai Engert (:kaie) (on vacation) 2008-02-08 18:33:51 PST
(I personally think this bug should be kept open until a real solution is known, even if no fix inside NSS is planned.)

I have been asked to rebuild NSS for the next Linux Fedora OS release, where all components are being built with gcc 4.3.0.

Unless a better solution is known, I will use the workaround proposed in comment 0.
Comment 11 Nelson Bolyard (seldom reads bugmail) 2008-02-08 19:23:04 PST
I think it would be less than a day's work (maybe about an hour's work) to:
- build NSS optimized on Linux with some non-broken gcc
- disassemble mp_comba.o generating mp_comba_broken_gcc.s
- change the Makefile(s) to build mp_comba_broken_gcc.s instead of mp_comba.c
on Linux systems which have that broken gcc  
- commit mp_comba_broken_gcc.s and the modified Makefile.  

I'm not including the wait-for-review time in that estimate.
Comment 12 Matthew Gregan [:kinetik] 2008-02-11 01:27:11 PST
There some more analysis of this bug in https://bugzilla.redhat.com/show_bug.cgi?id=432146 and a bug against GCC 4.3 (thanks to Jakub Jelinek) with a reduced testcase in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35160.

Jakub also suggested a change (in the RH bug) to mp_comba.c that avoids this bug by simplifying the asm constraints and removing unnecessary variable initialization.  Would the NSS team be willing to consider including this change?
Comment 13 Nelson Bolyard (seldom reads bugmail) 2008-02-15 14:15:29 PST
In reply to Matthew's comment 12, a question and some comments.

a) The gcc.gnu.org bug seems to suggest that a patch has already been made.
(Or am I misunderstanding that?)  If that bug is fixed, then is there any
need or motivation to make any further changes to NSS sources?

b) Changing this in NSS 3.11.x has implications on the FIPS validation of
this code.  We're not planning on any further FIPS validations of the NSS
3.11.c code (as I understand things), but we are considering doing another 
FIPS evaluation for NSS 3.12.  So, this begs the question: if we were to 
accept a patch to NSS for NSS 3.12.x only (the trunk) and not for NSS 3.11.x
would that be helpful to anyone?  

c) I've seen the patch at https://bugzilla.redhat.com/attachment.cgi?id=294959
and my reaction is that I'd like Tom St. Denis to review it and vouch for it.
If he's willing to take it into libTomFast (or whatever it was called), 
then I'd consider taking it in NSS 3.12, IFF it still needs to be fixed 
(that is, if GCC hasn't been fixed by the patch for their bug cited above).  
Comment 14 Christopher Aillon (sabbatical, not receiving bugmail) 2008-02-15 14:37:01 PST
(In reply to comment #13)
> In reply to Matthew's comment 12, a question and some comments.
> 
> a) The gcc.gnu.org bug seems to suggest that a patch has already been made.
> (Or am I misunderstanding that?)  If that bug is fixed, then is there any
> need or motivation to make any further changes to NSS sources?

That is correct.  As the reporter version of GCC 4.3 was not final, it is safe to say that there will not be any release versions of GCC 4.3 that will suffer from that bug.  However, jakub seems to imply that the current code does unnecessary work, and could be improved regardless as described with the patch.

> 
> b) Changing this in NSS 3.11.x has implications on the FIPS validation of
> this code.  We're not planning on any further FIPS validations of the NSS
> 3.11.c code (as I understand things), but we are considering doing another 
> FIPS evaluation for NSS 3.12.  So, this begs the question: if we were to 
> accept a patch to NSS for NSS 3.12.x only (the trunk) and not for NSS 3.11.x
> would that be helpful to anyone?

If it would improve the code, yes.


> c) I've seen the patch at https://bugzilla.redhat.com/attachment.cgi?id=294959
> and my reaction is that I'd like Tom St. Denis to review it and vouch for it.
> If he's willing to take it into libTomFast (or whatever it was called), 
> then I'd consider taking it in NSS 3.12, IFF it still needs to be fixed 
> (that is, if GCC hasn't been fixed by the patch for their bug cited above).  
> 

GCC has been fixed but based on Jakub's assertions that the current code is sub-optimal, I think it would be good to consider his change suggestions.  Re-opening for consideration.
Comment 15 Wan-Teh Chang 2008-02-15 14:54:52 PST
I've reviewed and tested the patch in the bugzilla.redhat.com bug report.
I had to study the GCC inline assembly syntax and the ISO C implementation
of those SQRADDxx macros in TomsFastMath.  I can vouch for its correctness
except for the early clobbers (=&r), which I still don't understand.

Nelson, if you want, we can omit the change to SQRADDDB (which has the
early clobber changes), because the change to SQRADDSC alone is enough to
work around the GCC 4.3pre bug, and I fully understand that change.  (That
change also makes it unnecessary to initialize sc0, sc1, sc2.)

Basically, the patch removes unnecessary constraints on the inputs/outputs
of the inline assembly.  So as caillon said, the patch should help GCC
generate better code.

Linux distributions such as Fedora and openSUSE have the ability to apply
their own patches to the pristine NSS source code, so Kai doesn't need the
patch on the NSS_3_11_BRANCH.  But it would be nice to get this patch into
the NSS trunk. Taking this change would require you to regenerate
mp_comba_amd64_sun.s for Solaris.

It is our pleasure to give the patch back to TomsFastMath upstream.  That
should not block us from taking this patch (at least the subset I fully
understand).
Comment 16 Nelson Bolyard (seldom reads bugmail) 2008-02-15 15:39:44 PST
Wan-Teh, given that mp_comba_amd64_sun.s is correct as is,  a change to 
the .c source does not necessitate a change to mp_comba_amd64_sun.s 

I do not want to take any change whose correctness depends on some very 
new version of gcc.  I am unwilling to take a change that means that the
.c file will cease to correctly compile with any version of gcc with which
it now compiles correctly.  As I understand it, the propopsed change to 
SQRADDSC meets my criteria, but I am not certain that the SQADDDB 
change does.  

Basically, my reaction to this whole bug, at this point, is this: 
Given that there is no bug in the current source that prevents it from 
being compiled and working correctly on Linux with a released gcc, I think 
there is no urgency to do anything with this.  This bug is now nothing more 
than a very minor performance enhancement.  

We have other priorities for NSS 3.12, priorities that are causing a lot of 
other desirable work to remain undone, such as a lot of SSL work.  The change 
requested in this bug is much lower priority (in my view) than the SSL work 
that is being delayed by higher priority work.  So, if I had more resources 
to put on NSS at this time, I would put them on SSL, not on this.  

If you have the time and interest to take this bug and make the changes, and 
to carry the responsiblity for them thereafter, I won't stand in your way. :)
Comment 17 Wan-Teh Chang 2008-02-15 16:21:14 PST
Understood and agreed.  This is why I did all the work in the Red Hat bug report, not here
(I didn't want to distract you from the SSL work), and my work on the Red Hat bug report
was all done during non-business hours as a pastime.
Comment 18 Kai Engert (:kaie) (on vacation) 2008-02-15 20:13:01 PST
Nelson, you claim the bug is not present in any released version of gcc.

You claim the bug is only present in a not-yet-released version of gcc 4.3.0pre.

Well, if we all had refused to work on this bug, this bug would probably have remained undetected, and gcc 4.3.0 would probably have been shipped with this bug.

If we all had simply ignored this issue and waited for the final release of gcc 4.3.0, then very likely we would have ended up in a situation where a released version of gcc would show this bug.

I'm therefore really glad that Wan-Teh and Jakub have offered their work to investigate and analyze this problem.
Comment 19 Nelson Bolyard (seldom reads bugmail) 2008-02-15 20:31:16 PST
> I'm therefore really glad that Wan-Teh and Jakub have offered their work to
> investigate and analyze this problem.

I am too!  I'm glad that their work got the problem fixed at the source.
Thanks to all who helped solve the gcc problem.

And now that it's fixed, I don't want to spend time to work around it in NSS.

Comment 20 Kai Engert (:kaie) (on vacation) 2008-02-15 21:24:59 PST
(In reply to comment #13)
>
> b) Changing this in NSS 3.11.x has implications on the FIPS validation of
> this code.  We're not planning on any further FIPS validations of the NSS
> 3.11.c code (as I understand things), but we are considering doing another 
> FIPS evaluation for NSS 3.12.  So, this begs the question: if we were to 
> accept a patch to NSS for NSS 3.12.x only (the trunk) and not for NSS 3.11.x
> would that be helpful to anyone?  

From my point of view, changes to 3.11 should be avoided.

From my point of view, if possible optimizations have been identified, it's reasonable to apply them to 3.12 before the next FIPS evaluation starts.
Comment 21 Wan-Teh Chang 2008-02-17 18:04:23 PST
Created attachment 303938 [details] [diff] [review]
Proposed patch by Jakub Jelinek of Red Hat

I submitted this patch to the email list for LibTom Projects in

http://groups.google.com/group/libtom/browse_thread/thread/46b7827d4374892c?hl=en
Comment 22 Wan-Teh Chang 2008-02-17 18:08:09 PST
Comment on attachment 303938 [details] [diff] [review]
Proposed patch by Jakub Jelinek of Red Hat

I forgot to mention that this patch was contributed by
Jakub Jelinek of Red Hat.  I have reviewed the patch and
fully understand it, including the "earlyclobbers".

I only added a note to mp_comba_amd64_sun.s because it
will get out of sync with mp_comba.c after the patch is
applied.
Comment 23 Kai Engert (:kaie) (on vacation) 2008-02-18 02:05:47 PST
Here are Jakub's arguments, he recommends that we apply this change to NSS, because (quote):

1) you can remove the unnecessary workarounds where you need to initialize
sc* = 0 in order to avoid compiler warnings.  All the clearing of the vars
really has to be done in the generated code

2) by accurrately describing what the inline asm does you give the compiler more
freedom to generate better code, especially if register preassure is high.
Comment 24 Nelson Bolyard (seldom reads bugmail) 2008-02-18 14:10:27 PST
I don't find those reasons persuasive. The still don't make this more important
than (say) any of the pending SSL work.
Comment 25 Kai Engert (:kaie) (on vacation) 2008-02-18 23:37:23 PST
(In reply to comment #24)
> The still don't make this more important than (say) any of the pending SSL work.

I don't understand this argument.
If I understand correctly, the work is already done.

Jakub produced the patch, Wan-Teh reviewed it, I ran the test suites.

Wan-Teh adds a comment to the SUN assembler file, in order to follow your wish to keep the SUN code unchanged - in order to avoid any work on SUN's side.

Are there arguments to reject the contribution?


I'm changing the bug summary, because this is no longer about miscompilation (gcc with fix compiles NSS fine), it's now about a contributed improvement.

I'm changing the Version attribute of this bug, to make it clear the improvement is proposed for trunk (only).
Comment 26 Wan-Teh Chang 2008-02-20 07:57:56 PST
Nelson,

In the last paragraph of comment 16, you said it's fine for me to take
this bug and make the change.

During the President Day long weekend, I worked on this bug as a pastime.
I attached the patch to this bug to wrap up the work.  I did *not* ask
anyone to review the patch.

Why can't I work on something that fascinates me in my spare time?  I
didn't waste any NSS resources.

Not to mention I also reviewed Julien's SSL patch (bug 403240 comment 33)
in the long weekend.
Comment 27 Nelson Bolyard (seldom reads bugmail) 2008-02-20 08:08:35 PST
Wan-Teh, I didn't realize you had taken the bug.  I was stating that *I*
will not work on this bug until more important work is done.  You are 
(of course, as always) both free and WELCOME to work on this if you wish!  
Sorry for my confusion.
Comment 28 Julien Pierre 2008-02-25 18:04:31 PST
Wan-Teh,

If you modify the comba code, please also regenerate the mp_comba_amd64_sun.s to make sure that it is in sync with the C master source. When we made our performance improvements for amd64, we made sure that everything built on both Solaris and Linux, even though we could have chosen to do Solaris only. We expect the code to be maintained on both platforms.

The "unnamed compiler" is gcc 3.4.3 on Solaris x64, which is in /usr/sfw/bin/gcc, bundled with s10, which you can DL free . Some testing with rsaperf should also be done to make sure that performance doesn't actually regress due to the patch before checkin. Note that the code was never tested on Intel x64 chips. It would be interesting to see how it performs there. We don't have any machines to test that on at Sun yet.
Comment 29 Kai Engert (:kaie) (on vacation) 2008-02-26 06:45:46 PST
Julien, do you make it a requirement to update the Sun code at the same time?
If I understand correctly:
- Nelson has proposed to avoid changes to the Sun code
- Wan-Teh might have a hard time to test it on Sun

Because of Nelson's preference to avoid changes to the Sun code, I'd conclude, checking in any changes for Sun code might be postponed to a later time?


I understand the current state of this bug as follows:

- Jakub Jelinek contributed the patch
- Wan-Teh has reviewed the patch

Is it OK if Wan-Teh marks the attachment as reviewed?
I'd think so.
Then he should be OK to check it in.
Comment 30 Nelson Bolyard (seldom reads bugmail) 2008-02-26 13:28:42 PST
I'm putting on my "NSS project lead for Sun" hat for this comment.
Sun does not require ANY change to mp_comba_amd64_sun.s for NSS 3.12.
No such change is prerequisite to the integration of Wan-Teh's patch 
to the .c file.  I will write more in private email.
Comment 31 Wan-Teh Chang 2008-03-08 22:10:27 PST
I checked in Jakub Jelinek's patch for mp_comba.c on the NSS trunk
for NSS 3.12.

Checking in mp_comba.c;
/cvsroot/mozilla/security/nss/lib/freebl/mpi/mp_comba.c,v  <--  mp_comba.c
new revision: 1.3; previous revision: 1.2
done

I didn't update the comment in mp_comba_amd64_sun.s as I proposed in
attachment 303938 [details] [diff] [review].  Let me know if you want me to do that.
Comment 32 Nelson Bolyard (seldom reads bugmail) 2008-12-30 22:18:01 PST
Today, the bug at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35160 was 
marked as Resolved/Fixed in gcc 4.4.0

Note You need to log in before you can comment on or make changes to this bug.