Assertion fails on startup in prsocket.c

VERIFIED FIXED in 4.2.2

Status

NSPR
NSPR
P1
blocker
VERIFIED FIXED
16 years ago
16 years ago

People

(Reporter: Paul, Assigned: Wan-Teh Chang)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 4 obsolete attachments)

(Reporter)

Description

16 years ago
Well, it seems we keep running into problems in nspr.  Everything builds, but
when trying to run, an assertion fails in prsocket.c at line 466.  The addr sent
into this function from PR_NewTCPSocketPair (peerAddr) never gets initialized. 
I think the _MD_accept in bnet.c should be setting this, but it is not, the the
result it returns is rv=2.

Chris, any ideas?  I don't know if we can look at nightly builds, especially if
they have been failing because of ldap failing.  I'll try looking through cvs to
see if I can notice any changes that might have caused this.
(Reporter)

Comment 1

16 years ago
Oops.  I filed this under the wrong product.  Changed to NSPR
Component: Networking → NSPR
Product: Browser → NSPR
(Assignee)

Comment 2

16 years ago
I think that's caused by my fix for bug 106496.  PR_NewTCPSocketPair
was not calling PR_Accept with peerAddr.  I changed that so that we
can verify the peer is the other socket we create in PR_NewTCPSocketPair.
(Assignee)

Comment 3

16 years ago
Created attachment 76750 [details] [diff] [review]
Proposed patch

The deleted code is wrong.  I hope that's the problem.
Let me know if this patch fixes the assertion failure.
(Assignee)

Updated

16 years ago
Priority: -- → P1
Target Milestone: --- → 4.2
(Reporter)

Comment 4

16 years ago
_PR_HAVE_SOCKADDR_LEN is only defined for the BONE version, so the code you would be
removing would never be used under net_server anyway.  Sorry, I guess I should have said
that this was for the net_server build.  I have not tried under BONE lately.
(Reporter)

Comment 5

16 years ago
Here is a step through of what I see happenning before the failed assertion:

- PR_NewTCPSocketPair from prsocket.c gets called.
- since we are neither unix nor nt, two PRNetAddr are declared, selfAddr and
  peerAddr
- selfAddr gets initialized
- peerAddr is used in the PR_Accept call without being initialized
- PR_Accept calls SocketAccept in prsocket.c
- before the failed assertion is reached, _MD_accept is called with the still, unitialized PRNetAddr

So, where is peerAddr supposed to be initialized?  From what I'v read in the 
BeBook:

>After you've bound a stream listener socket to an interface (through bind()),
>you then tell the socket to start listening for clients that are trying to
>connect. You then pass the socket to accept() which blocks until a client
>connects to the listener (the client does this by calling connect(), passing it
>a description of the interface to which the listener is bound). 

This says to be that the client addr being passed into accept should be a vaild
sockaddr, but in our scenario, it is not.  So, this leads me to ask, what is
peerAddr supposed to be?
(Assignee)

Comment 6

16 years ago
The client addr being passed into accept() is to receive
the address of the client.  It is an output argument.
Therefore it does not need to be initialized.

Can you print the value of *addrlen before and after
each call to accept()?  addrlen is an input/output
argument.  It needs to be initialized to the size of
the buffer for the client addr, and on return accept()
sets it to the actual size of the client addr.
(Assignee)

Comment 7

16 years ago
Please also print the contents of *addr when
the assertion in prsocket.c at line 466 fails.

Could it be that the accept() function has a
bug and does not return the client address?
If so, you can work around this accept() bug
by modifying the _MD_accept() function in bnet.c
to call getpeername(rv, addr, addrlen) (I omitted
the casts) if rv != -1 and addr is not NULL.
(Reporter)

Comment 8

16 years ago
The addr length stays consistently at 12.
The peerAddr raw struct has a family value of 31120.
After the accept call in bnet.c, it remains unchanged.
Wtc, when you want me to print out *addr, what exactly do you want?


Chris, have you tried to build/run on BeOS with the latest code lately?  Do you see similar problems?

I will do an update under BONE tonight, and see if everything is ok there, or not.

I will also try the suggestion above.  However, _MD_accept had to have been called before, why would it not work now?
(Assignee)

Comment 9

16 years ago
> The addr length stays consistently at 12.

This value is correct for BeOS.

> The peerAddr raw struct has a family value of 31120.
> After the accept call in bnet.c, it remains unchanged.

31120 is the uninitialized value.  If it remains unchanged
after the accept call in bnet.c, this means accept has a
bug: it does not store the peer's address in the sockaddr
structure pointed to by 'addr'.

> However, _MD_accept had to have been called before, why
> would it not work now?

We were not calling _MD_accept with a *non-NULL* addr
argument before.  After my change to PR_NewTCPSocketPair,
we are now calling _MD_accept with a non-NULL addr argument.

accept with a non-NULL addr argument is equivalent to
accept with a NULL addr argument + getpeername on the new
socket.  This is why I suggested the workaround of adding
getpeername(rv, addr, addrlen) (I omitted the casts).
(Reporter)

Comment 10

16 years ago
Created attachment 77176 [details] [diff] [review]
Patch for bug in BeOS accept call

This fixes things under net_server builds.  BONE was not working last night.  
It's getting late, so I won't be able to check BONE until tomorrow.
Attachment #76750 - Attachment is obsolete: true

Comment 11

16 years ago
default owner and qa.
Assignee: new-network-bugs → wtc
QA Contact: benc → wtc
(Reporter)

Comment 12

16 years ago
Ok, under BONE, it is failing with an Assertion at line 1079 in prsocket.c, on the second time that PR_GetSockName is called.  I'm assuming the first time was
with a non-NULL value, which worked, and the second is with a NULL value.

The starting addrlen, in both calls, is 28.  When the call to getsockname in the _MD_getsockname method in bnet.c returns the first time, the addrlen is set
to 12, which is valid.  The second call, it remains 28, but the getsockname
function does not return a failure code either.

Is there something like before, that I can call, if the addr is NULL, so that
the addrlen is set properly?
(Assignee)

Comment 13

16 years ago
Arougthopher:

> Is there something like before, that I can call, if
> the addr is NULL, so that the addrlen is set properly?

I don't understand your question.  The second PR_GetSockName()
call in PR_NewTCPSocketPair() is:
    if (PR_GetSockName(f[0], &selfAddr) == PR_FAILURE) {
        goto failed;
    }

The addr is &selfAddr, which is not NULL.

It seems that the second call to getsockname() fails because
addrlen remains 28 after the call.  That just can't be right.
(Assignee)

Comment 14

16 years ago
Created attachment 77343 [details] [diff] [review]
Patch for PR_NewTCPSocketPair.  Bind the client socket explicitly.

Arougthopher: please try this patch under BONE.  It explicitly
binds the client socket and then calls getsockname on it.
(Reporter)

Comment 15

16 years ago
Created attachment 77355 [details] [diff] [review]
combined patch + changes from comments above

This fixes the problem for both BONE and net_server builds.  

wtc: Your comments above about removing the IS_LITTLE_ENDIAN code were needed
for the BONE build and have been added to this patch.

btw, if you don't want to type out arougthopher, you can just call me paul :)

Thank you so VERY much for your help.
Attachment #77176 - Attachment is obsolete: true
Attachment #77343 - Attachment is obsolete: true
(Reporter)

Comment 16

16 years ago
Well, I just tried a nightly build, and the assertion was not thrown?!?
Maybe this problem was happenning because I am running with DEBUG enabled.

I will try removing the patch and building again, and also building without
debug.
(Assignee)

Comment 17

16 years ago
Paul,

NSPR assertions (the PR_ASSERT macro) are compiled away in
optimized builds.  By default the assertions only exist in
debug builds.  This is why you won't see the assertion
failure in a nightly build (which is optimized).
(Reporter)

Comment 18

16 years ago
oh yeah, just call me stupid :)
I think I was coding a little to long that day.
Anyway, the patch seems to work for me.  r=arougthopher
(Reporter)

Comment 19

16 years ago
Chris or Wan-teh, can you give this your r=, so that it can be checked in?  I
don't have access to the NSPR branch, IIRC.  It would be nice to be able to
build a DEBUG build, without having to remember to apply this patch every time,
and, for new developers, which BeOS desparately needs!

thanks
Status: NEW → ASSIGNED
(Assignee)

Comment 20

16 years ago
Created attachment 86682 [details] [diff] [review]
New patch

Arougthopher,

Could you try this new patch?  Its change to prsocket.c
is cleaner.  Let me know if it works on BeOS.  Thanks.
(Assignee)

Updated

16 years ago
Keywords: mozilla1.0.1
Target Milestone: 4.2 → 4.2.1
(Assignee)

Comment 21

16 years ago
Arougthopher,

Just a reminder that you don't need to recompile or relink
Mozilla after applying the new NSPR patch.  You only need to
do a clean rebuild in nsprpub.

Here are the steps.
1. Remove previously patched mozilla/nsprpub/pr/src/io/prsocket.c
and mozilla/nsprpub/pr/src/md/beos/bnet.c.  Check out these two
files from cvs.

2. cd to the top of your build tree, then do
   % cd nsprpub
   % gmake clean
   % gmake
(Reporter)

Updated

16 years ago
Attachment #77355 - Attachment is obsolete: true
(Reporter)

Comment 22

16 years ago
Comment on attachment 86682 [details] [diff] [review]
New patch

r=arougthopher
this works for me.

btw, Wan-teh, I know you don't have to recompile everthing, but, if a new
developer, or, if I download a release branch, then it must be, which is what I
meant.	Thanks for the comments, though.
Attachment #86682 - Flags: review+
(Assignee)

Comment 23

16 years ago
I checked in the patch on the tip and NSPRPUB_PRE_4_2_CLIENT_BRANCH
of NSPR.

Paul, Chris, please let me know if you want the fix on the
MOZILLA_1_0_BRANCH for inclusion in the mozilla 1.0.1 release.
We will need drivers' approval for that.
Status: ASSIGNED → RESOLVED
Last Resolved: 16 years ago
Resolution: --- → FIXED
(Reporter)

Comment 24

16 years ago
I don't see a need for this in the MOZILLA_1_0_BRANCH, as 1.1.1alpha is supposed
to be released soon, which, in my mind, means the 1_0 branch won't be needed as
much, soon.  Though, I could be completely wrong on this :)  Chris?
Sadly, I haven't been able to get a straight answer from drivers wrt accepting
anything for the 1.0 branch other than fixes for commercial embedding concerns.
 So while I think it's a good fix for our "stable" branch, I have no faith that
drivers would actually accept the patch so that time & effort would be better
spent elsewhere.
(Assignee)

Updated

16 years ago
Keywords: mozilla1.0.1
(Assignee)

Updated

16 years ago
Target Milestone: 4.2.1 → 4.2.2
(Reporter)

Comment 26

16 years ago
verified
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.