Closed Bug 134099 Opened 22 years ago Closed 22 years ago

Assertion fails on startup in prsocket.c

Categories

(NSPR :: NSPR, defect, P1)

x86
BeOS
defect

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: beos, Assigned: wtc)

Details

Attachments

(1 file, 4 obsolete files)

Well, it seems we keep running into problems in nspr.  Everything builds, but
when trying to run, an assertion fails in prsocket.c at line 466.  The addr sent
into this function from PR_NewTCPSocketPair (peerAddr) never gets initialized. 
I think the _MD_accept in bnet.c should be setting this, but it is not, the the
result it returns is rv=2.

Chris, any ideas?  I don't know if we can look at nightly builds, especially if
they have been failing because of ldap failing.  I'll try looking through cvs to
see if I can notice any changes that might have caused this.
Oops.  I filed this under the wrong product.  Changed to NSPR
Component: Networking → NSPR
Product: Browser → NSPR
I think that's caused by my fix for bug 106496.  PR_NewTCPSocketPair
was not calling PR_Accept with peerAddr.  I changed that so that we
can verify the peer is the other socket we create in PR_NewTCPSocketPair.
Attached patch Proposed patch (obsolete) — Splinter Review
The deleted code is wrong.  I hope that's the problem.
Let me know if this patch fixes the assertion failure.
Priority: -- → P1
Target Milestone: --- → 4.2
_PR_HAVE_SOCKADDR_LEN is only defined for the BONE version, so the code you would be
removing would never be used under net_server anyway.  Sorry, I guess I should have said
that this was for the net_server build.  I have not tried under BONE lately.
Here is a step through of what I see happenning before the failed assertion:

- PR_NewTCPSocketPair from prsocket.c gets called.
- since we are neither unix nor nt, two PRNetAddr are declared, selfAddr and
  peerAddr
- selfAddr gets initialized
- peerAddr is used in the PR_Accept call without being initialized
- PR_Accept calls SocketAccept in prsocket.c
- before the failed assertion is reached, _MD_accept is called with the still, unitialized PRNetAddr

So, where is peerAddr supposed to be initialized?  From what I'v read in the 
BeBook:

>After you've bound a stream listener socket to an interface (through bind()),
>you then tell the socket to start listening for clients that are trying to
>connect. You then pass the socket to accept() which blocks until a client
>connects to the listener (the client does this by calling connect(), passing it
>a description of the interface to which the listener is bound). 

This says to be that the client addr being passed into accept should be a vaild
sockaddr, but in our scenario, it is not.  So, this leads me to ask, what is
peerAddr supposed to be?
The client addr being passed into accept() is to receive
the address of the client.  It is an output argument.
Therefore it does not need to be initialized.

Can you print the value of *addrlen before and after
each call to accept()?  addrlen is an input/output
argument.  It needs to be initialized to the size of
the buffer for the client addr, and on return accept()
sets it to the actual size of the client addr.
Please also print the contents of *addr when
the assertion in prsocket.c at line 466 fails.

Could it be that the accept() function has a
bug and does not return the client address?
If so, you can work around this accept() bug
by modifying the _MD_accept() function in bnet.c
to call getpeername(rv, addr, addrlen) (I omitted
the casts) if rv != -1 and addr is not NULL.
The addr length stays consistently at 12.
The peerAddr raw struct has a family value of 31120.
After the accept call in bnet.c, it remains unchanged.
Wtc, when you want me to print out *addr, what exactly do you want?


Chris, have you tried to build/run on BeOS with the latest code lately?  Do you see similar problems?

I will do an update under BONE tonight, and see if everything is ok there, or not.

I will also try the suggestion above.  However, _MD_accept had to have been called before, why would it not work now?
> The addr length stays consistently at 12.

This value is correct for BeOS.

> The peerAddr raw struct has a family value of 31120.
> After the accept call in bnet.c, it remains unchanged.

31120 is the uninitialized value.  If it remains unchanged
after the accept call in bnet.c, this means accept has a
bug: it does not store the peer's address in the sockaddr
structure pointed to by 'addr'.

> However, _MD_accept had to have been called before, why
> would it not work now?

We were not calling _MD_accept with a *non-NULL* addr
argument before.  After my change to PR_NewTCPSocketPair,
we are now calling _MD_accept with a non-NULL addr argument.

accept with a non-NULL addr argument is equivalent to
accept with a NULL addr argument + getpeername on the new
socket.  This is why I suggested the workaround of adding
getpeername(rv, addr, addrlen) (I omitted the casts).
This fixes things under net_server builds.  BONE was not working last night.  
It's getting late, so I won't be able to check BONE until tomorrow.
Attachment #76750 - Attachment is obsolete: true
default owner and qa.
Assignee: new-network-bugs → wtc
QA Contact: benc → wtc
Ok, under BONE, it is failing with an Assertion at line 1079 in prsocket.c, on the second time that PR_GetSockName is called.  I'm assuming the first time was
with a non-NULL value, which worked, and the second is with a NULL value.

The starting addrlen, in both calls, is 28.  When the call to getsockname in the _MD_getsockname method in bnet.c returns the first time, the addrlen is set
to 12, which is valid.  The second call, it remains 28, but the getsockname
function does not return a failure code either.

Is there something like before, that I can call, if the addr is NULL, so that
the addrlen is set properly?
Arougthopher:

> Is there something like before, that I can call, if
> the addr is NULL, so that the addrlen is set properly?

I don't understand your question.  The second PR_GetSockName()
call in PR_NewTCPSocketPair() is:
    if (PR_GetSockName(f[0], &selfAddr) == PR_FAILURE) {
        goto failed;
    }

The addr is &selfAddr, which is not NULL.

It seems that the second call to getsockname() fails because
addrlen remains 28 after the call.  That just can't be right.
Arougthopher: please try this patch under BONE.  It explicitly
binds the client socket and then calls getsockname on it.
This fixes the problem for both BONE and net_server builds.  

wtc: Your comments above about removing the IS_LITTLE_ENDIAN code were needed
for the BONE build and have been added to this patch.

btw, if you don't want to type out arougthopher, you can just call me paul :)

Thank you so VERY much for your help.
Attachment #77176 - Attachment is obsolete: true
Attachment #77343 - Attachment is obsolete: true
Well, I just tried a nightly build, and the assertion was not thrown?!?
Maybe this problem was happenning because I am running with DEBUG enabled.

I will try removing the patch and building again, and also building without
debug.
Paul,

NSPR assertions (the PR_ASSERT macro) are compiled away in
optimized builds.  By default the assertions only exist in
debug builds.  This is why you won't see the assertion
failure in a nightly build (which is optimized).
oh yeah, just call me stupid :)
I think I was coding a little to long that day.
Anyway, the patch seems to work for me.  r=arougthopher
Chris or Wan-teh, can you give this your r=, so that it can be checked in?  I
don't have access to the NSPR branch, IIRC.  It would be nice to be able to
build a DEBUG build, without having to remember to apply this patch every time,
and, for new developers, which BeOS desparately needs!

thanks
Status: NEW → ASSIGNED
Attached patch New patchSplinter Review
Arougthopher,

Could you try this new patch?  Its change to prsocket.c
is cleaner.  Let me know if it works on BeOS.  Thanks.
Keywords: mozilla1.0.1
Target Milestone: 4.2 → 4.2.1
Arougthopher,

Just a reminder that you don't need to recompile or relink
Mozilla after applying the new NSPR patch.  You only need to
do a clean rebuild in nsprpub.

Here are the steps.
1. Remove previously patched mozilla/nsprpub/pr/src/io/prsocket.c
and mozilla/nsprpub/pr/src/md/beos/bnet.c.  Check out these two
files from cvs.

2. cd to the top of your build tree, then do
   % cd nsprpub
   % gmake clean
   % gmake
Attachment #77355 - Attachment is obsolete: true
Comment on attachment 86682 [details] [diff] [review]
New patch

r=arougthopher
this works for me.

btw, Wan-teh, I know you don't have to recompile everthing, but, if a new
developer, or, if I download a release branch, then it must be, which is what I
meant.	Thanks for the comments, though.
Attachment #86682 - Flags: review+
I checked in the patch on the tip and NSPRPUB_PRE_4_2_CLIENT_BRANCH
of NSPR.

Paul, Chris, please let me know if you want the fix on the
MOZILLA_1_0_BRANCH for inclusion in the mozilla 1.0.1 release.
We will need drivers' approval for that.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
I don't see a need for this in the MOZILLA_1_0_BRANCH, as 1.1.1alpha is supposed
to be released soon, which, in my mind, means the 1_0 branch won't be needed as
much, soon.  Though, I could be completely wrong on this :)  Chris?
Sadly, I haven't been able to get a straight answer from drivers wrt accepting
anything for the 1.0 branch other than fixes for commercial embedding concerns.
 So while I think it's a good fix for our "stable" branch, I have no faith that
drivers would actually accept the patch so that time & effort would be better
spent elsewhere.
Keywords: mozilla1.0.1
Target Milestone: 4.2.1 → 4.2.2
verified
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: