Closed Bug 327855 Opened 18 years ago Closed 18 years ago

Crash on entering https sites when security.OCSP.enabled=1

Categories

(NSS :: Libraries, defect, P2)

x86
OS/2
defect

Tracking

(Not tracked)

RESOLVED FIXED
3.11.1

People

(Reporter: mozilla, Assigned: julien.pierre)

Details

(Keywords: crash, fixed1.8.0.4, fixed1.8.1)

Attachments

(1 file)

Mozilla/5.0 (OS/2; U; Warp 4.5; en-US; rv:1.8.0.1) Gecko/20060212 SeaMonkey/1.0
Mozilla/5.0 (OS/2; U; Warp 4.5; en-US; rv:1.9a1) Gecko/20051206 Firefox/1.6a1

James Moe reported in mozilla.dev.ports.os2 that after updating to SeaMonkey 1.0 from Mozilla 1.7.12, SeaMonkey kept crashing when entering secure/https sites. I tested this with https://freemail.web.de and could narrow it down to one line in his prefs.js:
   user_pref("security.OCSP.enabled", 1);

I then ran an older Firefox debug build that I still had around through the debugger and discovered that the crash occurs in PR_Connect when called from ocsp_ConnectToHost.

In ocsp_ConnectToHost sock gets first assigned a pointer by PR_NewTCPSocket() and the sock==NULL test fails but a bit later the call
   hostIndex = PR_EnumerateHostEnt(hostIndex, &hostEntry, port, &addr);
for some reason resets sock to a NULL pointer. Don't understand why. This trivial patch would fix the immediate crash, though:

--- security/nss/lib/certhigh/ocsp.c    30 Jun 2005 20:53:57 -0000      1.21
+++ security/nss/lib/certhigh/ocsp.c    19 Feb 2006 22:28:05 -0000
@@ -1760,7 +1760,7 @@
        hostIndex = 0;
        do {
            hostIndex = PR_EnumerateHostEnt(hostIndex, &hostEntry, port, &addr);

-           if (hostIndex <= 0)
+           if (hostIndex <= 0 || sock == NULL)
                goto loser;
        } while (PR_Connect(sock, &addr, timeout) != PR_SUCCESS);
Peter,

There is nothing in the code in ocsp_ConnectToHost that can explain the value of the sock pointer being reset, so we cannot accept your patch. We need to fix the root cause. There may be a bug in one of the function that it calls that corrupts this stack variable. You should be able to very easily find the source of this problem in the OS/2 IDEBUG .

First set a breakpoint in ocsp_ConnectToHost.
Then run until you hit it.
Then setting a data change breakpoint 4 bytes on the address of sock (&sock).
Then continue.
You will then see the stack of the code that is corrupting this variable.
Julien, you're right.  'sock' shouldn't be NULL
at that point.

The following function calls write to the other
local variables in that function:

PR_StringToNetAddr(host, &addr): writes to 'addr'
PR_GetHostByName(host, netdbbuf, PR_NETDB_BUF_SIZE,
                 &hostEntry): writes to 'hostEntry'
PR_EnumerateHostEnt(hostIndex, &hostEntry, port,
                    &addr): writes to 'addr'

So one of them may be the culprit.
Assignee: wtchang → julien.pierre.bugs
Julien, I tried your trick with the storage change breakpoint on &sock in the debugger that puts me into the _STD(memset) function that belongs to libc05.dll (line 33 of memset.s). memset is the one called from PR_EnumerateHostEnt:
  memset(address, 0, sizeof(PRNetAddr));

Don't know much about these system level things but it seems to me that something goes wrong with the C-Library. CCing Knut, some input from him might be helpful.
Keywords: crash
Sounds like NSS and NSPR have different values for sizeof(PRNetAddr).
Is this a binary compatiblity issue?  Has that size changed recently?
Peter,

It's very unlikely there is anything wrong with the C library memset call itself. This would mess up many other places more likely long before this.

Nelson,

I think you are right about the definitions being out of sync between NSS and NSPR. The definition for the PRNetAddr structure in prio.h includes :

#if defined(XP_UNIX) || defined(XP_OS2_EMX)
    struct {                            /* Unix domain socket address */
        PRUint16 family;                /* address family (AF_UNIX) */
#ifdef XP_OS2
        char path[108];                 /* null-terminated pathname */
                                        /* bind fails if size is not 108. */
#else
        char path[104];                 /* null-terminated pathname */
#endif
    } local;
#endif

However, my reading of NSPR configure and NSS coreconf shows that XP_OS2 and XP_OS2_EMX should be set both during the NSPR build and the NSS build. Peter, can you verify this is the case from your build logs ?

The only other conditionals in the PRNetAddr definition are for BeOS, which is not set in the OS/2 build.
Sorry, was a stupid idea (that's what you get when you add bugzilla comments long after midnight...)

$OBJDIR\nsprpub\config\autoconf.mk lists both -DXP_OS2=1 -DXP_OS2_EMX=1. For NSS I looked at the gcc command line when I just recompiled ocsp.c, and found that no _EMX variable is present.

I put printf outputs in both ocsp_ConnectToHost and PR_EnumerateHostEnt and got this just before the crash:
ocsp_ConnectToHost: sizeof(PRNetAddr)=28
PR_EnumerateHostEnt: sizeof(PRNetAddr)=112



As a sidenote, I also see _lots_ or warnings like 

X:/SM_10/mozilla/security/nss/lib/certhigh/ocsp.c:115: warning: missing initializer
X:/SM_10/mozilla/security/nss/lib/certhigh/ocsp.c:115: warning: (near initialization for `ocsp_OCSPRequestTemplate[1].size')
X:/SM_10/mozilla/security/nss/lib/certhigh/ocsp.c:119: warning: missing initializer
X:/SM_10/mozilla/security/nss/lib/certhigh/ocsp.c:119: warning: (near initialization for `ocsp_OCSPRequestTemplate[2].size')

for many of the files belonging to NSS. OS/2 gcc (3.2.2) fault?
Try to add
DEFINES                 += -DXP_OS2_EMX
in mozilla/security/coreconf/OS2.mk in the ifdef XP_OS2_EMX section .

That should temporarily fix your problem. That's not the best solution though, since XP_OS2_EMX isn't used in the NSS source code at all, only in NSPR and its headers. It would be preferrable for the NSPR header to test for the compiler using a macro that it sets, if possible. If it is, then this bug should be moved over to NSPR.

Also, I'm wondering how NSPR works with VACPP now, if at all, given the current definition for PRNetAddr.
Yes, NSPR public headers should only test macros
that are defined by the compiler or in prcpucfg.h
(for OS/2, that is mozilla/nsprpub/pr/include/md/_os2.cfg).
OK, if I add -DXP_OS2_EMX=1 to XP_DEFINE in mozilla/security/coreconf/OS2.mk then it works as expected (PRNetAddr is 112 bytes long as seen from ocsp_ConnectToHost(), too.

If you say that XP_OS2_EMX should not be defined for NSPR, then we should also remove it from nsprpub/configure.in, and change most (all?) cases of XP_OS2_EMX to plain XP_OS2. The line "AC_DEFINE(XP_OS2_EMX)" in configure.in was added _long_ ago, and the current build is basically still called EMX/GCC build in many places...

So, not sure which way is more "correct". Mike, Javier, any opinions?
Well in the past we definitely had to distinguish between XP_OS2_EMX and XP_OS2 because TCP/IP implementation were different.

Personally I think we should get rid all VACPP code and convert EMX to XP_OS2.

I really don't think VACPP is relevant anymore.

Julien?
Mike,

I agree, you can get rid of VACPP support in NSPR and NSS. Just make sure this is done all at once and document the releases that drop VACPP support.
Attached patch Proposed patchSplinter Review
Does OS/2 have Unix domain sockets (address family AF_UNIX or AF_LOCAL)
regardless of the compiler used?

The proper fix (without removing all VACPP code) is to add -DXP_OS2_EMX
to DEFINES as Julien suggested.

I also found one minor problem: NSS compiles with both -DXP_OS2=1 and
-DXP_OS2 because it adds -DXP_OS2=1 to XP_DEFINE and -DXP_OS2 to DEFINES.
Attachment #213338 - Flags: superreview?(julien.pierre.bugs)
Attachment #213338 - Flags: review?(mozilla)
I wasn't aware that OS/2 supported domain sockets with any compiler. AFAIK the OS/2 stack doesn't support domain sockets, but I could be wrong. I don't have an OS/2 machine in front of me to check. The article at http://www.edm2.com/0308/socket1.html says only the AF_INET is supported, though, but it is from 1998 and the more recent TCP/IP stack updates may have changed that.
Comment on attachment 213338 [details] [diff] [review]
Proposed patch

Looks OK as a temporary fix.
Attachment #213338 - Flags: superreview?(julien.pierre.bugs) → superreview+
Comment on attachment 213338 [details] [diff] [review]
Proposed patch

Yes, the patch is fine for the current problem.

According to the OS/2 Toolkit docs AF_UNIX (equivalent to AF_OS2) is available with any compiler through the sockets.sys and afos2.sys drivers. So it should be available from any compiler.
Attachment #213338 - Flags: review?(mozilla) → review+
I checked in the patch on the NSS trunk (NSS 3.12),
NSS_3_11_BRANCH (3.11.1), NSS_CLIENT_TAG (Mozilla 1.9
alpha), and MOZILLA_1_8_BRANCH (Mozilla 1.8.1).

Based on Peter's comment 15, it is safe to change
defined(XP_OS_EMX) to defined(XP_OS2) in the definition
of PRNetAddr (see comment 5).  I also verified that's
the only place in the entire NSS source tree and NSPR's
public header files where the XP_OS_EMX macro is tested.
Status: NEW → RESOLVED
Closed: 18 years ago
Keywords: fixed1.8.1
Resolution: --- → FIXED
Target Milestone: --- → 3.11.1
What are the compiler predefined macros that identify
the EMX/GCC and VACPP compilers?

In comment 10 and comment 11, Michael and Julien agreed
to get rid of VACPP support in NSPR and NSS.  Until then,
a more robust solution would be for NSPR to define either
XP_OS2_EMX or XP_OS2_VACPP in prcpucfg.h (which is
_os2.cfg renamed) by testing the compiler-identification
macros.  Something like this:

mozilla/nsprpub/pr/include/md/_os2.cfg:

#if defined(__GNUC__) && !defined(XP_OS2_EMX)
#define XP_OS2_EMX 1
#endif

#if (defined(_IBMC__) || defined(__IBMCPP__)) && !defined(XP_OS2_VACPP)
#define XP_OS2_VACPP 1
#endif
Wan-Teh,

I think __IBMC__ is defined by VACPP and __GNUC__ by EMX/gcc.
Wan-Teh, should not configure take care of these variables? If they wouldn't get set according to the compiler (GCC/EMX or VACPP) then there would be something wrong with the compiler setup. So if you change that I wouldn't silently set extra defines in _os2.cfg but instead to an "#error XP_OS2_EMX not defined!" and similarly for VACPP.

Btw, I don't see a prcpucfg.h for OS/2, only for Mac?
Peter:

The prcpucfg.h file for OS/2 is mozilla/nsprpub/pr/include/md/_os2.cfg.
It is renamed prcpucfg.h during NSPR's build process.

NSPR doesn't require its clients to define any macros in order to use
its header files.  The policy is that NSPR's public header files should
be "self-sufficient" and should only test compiler predefined macros or
macros defined in some other NSPR header (most commonly prcpucfg.h).

Since Mozilla may be the only client of NSPR on OS/2, I'm fine with
making an exception of this policy for OS/2.  I verified that all the
four build systems in Mozilla (Mozilla itself, mozilla/directory/c-sdk,
NSPR, and NSS's coreconf) define XP_OS2_EMX now.
Comment on attachment 213338 [details] [diff] [review]
Proposed patch

Would be great to see this simple OS/2 only fix on the 1.8.0, too.

(I understand that it's already too late for 1.8.0.2 so I'm asking for the next one.)
Attachment #213338 - Flags: approval1.8.0.3?
Comment on attachment 213338 [details] [diff] [review]
Proposed patch

a=mkaply and I will try to remember to put this in official 1.8.0.2
Attachment #213338 - Flags: approval1.8.0.3? → approval1.8.0.3+
Checkin reminder: please get this into 1.8.0 branch.
I checked in the patch on the MOZILLA_1_8_0_BRANCH for
Mozilla 1.8.0.3.

Checking in OS2.mk;
/cvsroot/mozilla/security/coreconf/OS2.mk,v  <--  OS2.mk
new revision: 1.20.14.1.2.1; previous revision: 1.20.14.1
done
Keywords: fixed1.8.0.3
Priority: -- → P2
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: