92123 - Linux-Crash on AOL cert enrollment page - Trunk, M092 & N610 [@ libc.so.6 - ssl_DefSend]

Reporter

Description

23 years ago

Build: branch 07/24 linux build
OS: RH 6.2-J

Using 07/24 linux branch build with a new profile, when I tried to get an AOL 
cert from https://certificates.netscape.com, the browser crashes.

Steps to reproduce:
1. Launch browser with a new profile.
2. Go to https://certificates.netscape.com
3. Click on Get the Cert
The browser crashes, a talkback window comes up, but the browser window remains.
However it seems the browser is disconnected from network. You can't go to any 
web site from that browser window.
Callstack to follow.

ji

Reporter

Comment 1

23 years ago

Below is the callstack:
Incident ID 33283866
Stack Signature libc.so.6 + 0xb2c82 (0x404d9c82) de72ad1c
Bug ID
Trigger Time 2001-07-24 10:50:44
User Comments Get AOL cert
Build ID 2001072405
Product ID Netscape6.10
Platform ID LinuxIntel
Stack Trace
libc.so.6 + 0xb2c82 (0x404d9c82)
libnspr4.so + 0x1ce78 (0x401b5e78)
libnspr4.so + 0xc42a (0x401a542a)
ssl_DefSend()
ssl3_SendRecord()
SSL3_SendAlert()
ssl_SecureClose()
ssl_Close()
nsSSLIOLayerClose()
libnspr4.so + 0xb5f7 (0x401a45f7)
nsSocketTransport::CloseConnection()
nsSocketTransport::Process()
nsSocketTransportService::Run()
nsThread::Main()
libnspr4.so + 0x1f3ee (0x401b83ee)
libpthread.so.0 + 0x5b85 (0x401ccb85)

Keywords: crash

ji

Reporter

Updated

23 years ago

Version: 1.01 → 2.0

John Unruh

Comment 2

23 years ago

Verified on Linux. Mac and Win32 are OK.

John Unruh

Comment 3

23 years ago

Adding keywords. I'll download builds going backward until I find the one that 
does not crash.

Keywords: nsbeta1, regression

John Unruh

Comment 4

23 years ago

This was working with the 7/23 Linux commercial build.

John Unruh

Comment 5

23 years ago

My mistake. This bug goes back to at least 7/13.

Jimmy Lee

Comment 6

23 years ago

May 23 works for me.  May 31 fails for me.  I have no other builds in between.

Javier Delgadillo

Comment 7

23 years ago

Is this a Linux and JA build only?

John Unruh

Comment 8

23 years ago

I'm using the normal Linux branch build on Redhat 6.0. I used to not be able to 
reach https://certificates.netscape.com/NSEnroll.html , but now that the browser 
loads it, the crash bug has become visible.

ji

Reporter

Comment 9

23 years ago

The build I used is English linux branch build and system is RH6.2 Japanese.

John Unruh

Comment 10

23 years ago

Getting certs with Linux from any of the sites on this page 
http://junruh.mcom.com/tests.html under "CMS 4.2 testing" works. The crash 
occurs only when clicking on the link at https://certificates.netscape.com

Priority: -- → P1

John Unruh

Comment 11

23 years ago

cc shadow

John Unruh

Comment 12

23 years ago

Changing summary.

URL: https://certificates.netscape.com → https://certificates.netscape.com/NSE...

Keywords: nsdogfood, pp

Summary: Browser crashes when trying to get a cert → Linux-Crash on AOL cert enrollment page

beomsuk

Comment 13

23 years ago

I just finished a simple test of this version of the browser with CMS and the Netscape
root CA.
I didn't see any problem with CMS. With the Netscape root CA it did not crash,
but it hung.
It displays "connecting to certificates.netscape.com" but nothing happens.

Stephane Saux

Comment 14

23 years ago

reproduced on Linux build 0725200105.0.9.2
->javi
P1
t->2.0

Assignee: ssaux → javi

Target Milestone: --- → 2.0

Javier Delgadillo

Comment 15

23 years ago

anyone know if certificates.netscape.com uses Keep-Alive connections?

bill

Comment 16

23 years ago

btw, certificates.netscape.com is still running iCMS 4.1.  I don't know how to
tell if it's using keep_alives but I can check it if someone knows a way to
determine that.

in the CMS.cfg I see:
eeGateway.keepAliveOn=false

I'm running the 0724 build of N6.1 on Windows 2000 and am not seeing any
problems with that site, fwiw.

Stephane Saux

Comment 17

23 years ago

javi:
You may be able to use something like this in a debug build:
setenv NSPR_LOG_MODULES nsHttp:5
setenv NSPR_LOG_FILE foo.log
I just saw that in bug 90196 where the resulting log shows information about
keep alive.
Not sure whether it will work in your case.

Matthew Harmsen

Comment 18

23 years ago

Using the Linux RTM candidate bits (installer) at:

ftp://sweetlou/products/client/seamonkey/unix/linux/2.2/x86/2001-07-24-18-0.9.2/

I was unable to produce a crash on Red Hat Linux 6.2.  However, I did experience
a "hang" when going from "https://certificates.netscape.com" to
"https://certificates.netscape.com/NSEnroll.html".  However, if I go directly to
the URL "https://certificates.netscape.com/NSEnroll.html", I am able to
successfully enroll and obtain a user certificate.  I verified this with Beomsuk
on his machine, and he was able to produce the exact same behaviour.

Stephane Saux

Comment 19

23 years ago

target 2.1

Target Milestone: 2.0 → 2.1

Bob Lord

Comment 20

23 years ago

Summary: this bug does not show up when you use CMS 4.2 SP2 (the current
version).  It happens when you use old versions of CMS.

Stephane Saux

Comment 21

23 years ago

adding nsenterprise to all P1, P2 PSM bugs with target milestone of 2.1

Keywords: nsenterprise

Jay Patel [:jay]

Comment 22

23 years ago

I think this might be a dup of bug 83747 because the stacks look identical. 
Although bug 83747 was logged first, this bug has actually been looked at, so
I'll leave it up to QA to mark that bug a dup of this one.

Adding Trunk, M092 & N610 [@ libc.so.6 - ssl_DefSend] to summary and topcrash
keyword for tracking.  

The final N610 Linux build 2001072504 has been seeing this crash according to
the latest Talkback data. Here are a couple of entries:

ssaux's crash:
Incident ID 33328193
Stack Signature libc.so.6 + 0xb0eb2 (0x404ceeb2) 86d987d7
Bug ID
Trigger Time 2001-07-25 10:43:41
User Comments Reproducing bug 92123
Build ID 2001072504
Product ID Netscape6.10
Platform ID LinuxIntel
Stack Trace
libc.so.6 + 0xb0eb2 (0x404ceeb2)
libnspr4.so + 0x1ce78 (0x401b4e78)
libnspr4.so + 0xc42a (0x401a442a)
ssl_DefSend()
ssl3_SendRecord()
SSL3_SendAlert()
ssl_SecureClose()
ssl_Close()
nsSSLIOLayerClose()
libnspr4.so + 0xb5f7 (0x401a35f7)
nsSocketTransport::CloseConnection()
nsSocketTransport::Process()
nsSocketTransportService::Run()
nsThread::Main()
libnspr4.so + 0x1f3ee (0x401b73ee)
libpthread.so.0 + 0x4eca (0x401caeca) 

and junruh's crash:

Incident ID 33325439
Stack Signature libc.so.6 + 0xdea32 (0x40551a32) f223bb1f
Bug ID
Trigger Time 2001-07-25 09:35:02
User Comments
Build ID 2001072504
Product ID Netscape6.10
Platform ID LinuxIntel
Stack Trace
libc.so.6 + 0xdea32 (0x40551a32)
libnspr4.so + 0x1ce78 (0x401b7e78)
libnspr4.so + 0xc42a (0x401a742a)
ssl_DefSend()
ssl3_SendRecord()
SSL3_SendAlert()
ssl_SecureClose()
ssl_Close()
nsSSLIOLayerClose()
libnspr4.so + 0xb5f7 (0x401a65f7)
nsSocketTransport::CloseConnection()
nsSocketTransport::Process()
nsSocketTransportService::Run()
nsThread::Main()
libnspr4.so + 0x1f3ee (0x401ba3ee)
libpthread.so.0 + 0x760e (0x401d460e)

Keywords: topcrash

Summary: Linux-Crash on AOL cert enrollment page → Linux-Crash on AOL cert enrollment page - Trunk, M092 & N610 [@ libc.so.6 - ssl_DefSend]

ji

Reporter

Comment 23

23 years ago

The Win32 07/27 branch build and the Mac 07/26 branch build hang when clicking the Get
the Cert icon on the https://certificates.netscape.com page; the status bar shows
the browser transferring data from certificates.netscape.com forever. But the Win32
and Mac builds don't crash.

ji

Reporter

Comment 24

23 years ago

For the Win32 and Mac builds, when I see the hang, if I reload the page by clicking
the Stop and Back icons, clicking the Get the Cert icon then reaches the enrollment
page.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 25

23 years ago

My memory from looking at talkback reports was that this crash was a SIGPIPE
(not a SIGSEGV as most crashes are).  I'm not behind the firewall now so I can't
double-check.

It's useful to include the crash reason when filing talkback bugs.

chris hofmann

Comment 26

23 years ago

Trigger Type:   Program Crash 
Trigger Reason:   SIGPIPE: Write on Pipe, with no one to read: (signal 13)

jay/shiva, let's get trigger reason added to the quick search report.

Bradley Baetz (:bbaetz)

Assignee

Comment 27

23 years ago

*** Bug 83747 has been marked as a duplicate of this bug. ***

Bradley Baetz (:bbaetz)

Assignee

Comment 28

•

23 years ago

I got this a couple of times, when reading mail over imap/ssl (see bug 92517).
The first time was at shutdown, and the second time was while reading mail,
after not having touched the computer for a while. In that case, I could keep
using the product after talkback came up.

Are we writing to a closed socket?
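A minimal sketch of the failure mode behind the talkback trigger reason quoted in Comment 26 (SIGPIPE, signal 13), assuming a current Linux/glibc system and a local AF_UNIX socket pair standing in for the SSL connection: with the default SIGPIPE disposition, a write after the peer has closed kills the writing process. The write is done in a forked child so the caller survives; the function name is hypothetical and this is not Mozilla or NSPR code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that writes to a socket whose peer is
 * already closed, with SIGPIPE at its default (fatal) action. Returns the
 * signal that terminated the child, 0 if it exited normally, -1 if fork fails. */
int signal_from_write_to_closed_socket(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        int fds[2];
        signal(SIGPIPE, SIG_DFL);            /* nothing is ignoring SIGPIPE here */
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
            _exit(2);
        close(fds[1]);                       /* the reader goes away */
        ssize_t n = write(fds[0], "x", 1);   /* write with no one to read */
        (void)n;
        _exit(0);                            /* reached only if SIGPIPE was not fatal */
    }

    int status = 0;
    pid_t rv = waitpid(pid, &status, 0);
    assert(rv == pid);
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On Linux the call returns `SIGPIPE`, whose numeric value is 13, matching the talkback report.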

John Unruh

Comment 29

23 years ago

Mass assigning QA to ckritzer.

QA Contact: junruh → ckritzer

Javier Delgadillo

Comment 30

23 years ago

I'm not seeing this crash on Linux anymore.

Marking WORKSFORME.  Please re-open if this still crashes.

Status: NEW → RESOLVED

Closed: 23 years ago

Resolution: --- → WORKSFORME

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 31

23 years ago

Still crashes, according to talkback.  In fact, it's probably our #1 trunk
topcrash on Linux.

With recent NSPR build system changes (bug 88045), we now see the top of the
stack more clearly.  Looking at a report from the 2001-08-15 build, the top of
the stack is:

libc.so.6+ ...
pt_Send()
pl_DefSend()
ssl_DefSend()

Status: RESOLVED → REOPENED

Resolution: WORKSFORME → ---

Javier Delgadillo

Comment 32

23 years ago

(Oops, I added this in the wrong bug.)

Is there a web site that produces this crash? 
https://certificates.netscape.com/ does not anymore.  Without a reproducible
test case, this will *never* get fixed.

(I've tried the 3 sites mentioned in the talkback reports and none of them cause
my build from this morning to crash.)

The fact that the crash on https://certificates.netscape.com/ went away without
changing PSM code leads me to believe we're crashing in NSS because of a bug in
a different part of the code.  Perhaps trying to write to a closed socket that
only happens on slow connections.  

Anyone in QA have a modem they could test with?

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 33

23 years ago

Recent user comments on this crash (some of which have URLs) are below.  Numbers
in parentheses are talkback report numbers.

     (34159147) Comments: clicked on gaim link
     (34156456) URL: https://secure.globalsign.net/cacert/root.cacert
     (34156456) Comments: Adding a cert via
https://secure.globalsign.net/cacert/root.cacert
     (34144893) Comments: crash on file quit
     (34134388) Comments: I wasn't even in the virtual desktop where mozilla was
running. The mail/news app was up  and was checking/filtering mail  so I assume
that's what did it....
     (34130076) Comments: exit mozilla and terminate network connection at about
the same time
     (34104702) URL: my.yahoo.com
     (34104702) Comments: Pressed refresh after leaving Mozilla unused for about
2 hours.
     (34092247) URL: http://dr.dk/licens/vaerdat.htm
     (34092190) URL: http://dr.dk/licens/vaerdat.htm
     (34069916) Comments: 2001-08-13-08 linux 2.2
     (34069457) Comments: 2001-08-13-08 on linux 2.2tried selecting a newsgroup
to download byusing the download and sync window.on linux it hangs  then i guess
after waiting a whileit crashes
     (34030983) URL: https://jsecom11.sun.com:443/ECom/docs/CompleteRegister.jsp
     (34030983) Comments: Submitting the form by clicking on "Register" button.
     (33996736) URL: salon.com
     (33996736) Comments: Just reading an article.  I didn't even click on
everything.  Odd  I don't think I've ever had a totally spontaneous crash before.
     (33994589) URL: register.com
     (33994589) Comments: submitting e-mail on register.com's webmail.
     (33915983) URL: www.salon.com/....
     (33915983) Comments: clicked on a link to read the next page of a story
     (33897868) Comments: 2001080910 linx 2.2idle had browser/mesger up and it
crashed
     (33844811) Comments: i think only one thread crashed or something.  so
maybe that's a dns problem.  mozilla is still alive and a little bit well

ckritzer (gone)

Comment 34

23 years ago

I noticed that nowhere in this bug report does anyone mention the bit about "PSM
or Netscape 6.1 detected: click here first to get a patch" appearing on the
"https://certificates.netscape.com/NSEnroll.html" webpage - is that because this
patch link is a recent addition somewhere?  

If no, why isn't it mentioned - is it inconsequential?  
If yes, has anyone tried it?

ji, could you try this again after installing the patch?

bill

Comment 35

23 years ago

That "patch" was added on July 4, 2001.  The "patch" is actually just the GTE
CyberTrust root certificate, which our internal CA chains to.

Brendan Eich [:brendan]

Comment 36

23 years ago

cc'ing wtc.  dbaron observed that NSPR passes 0 (not MSG_NOSIGNAL) for the send
flags, and does not abstract signal syscalls, so callers are likely to get
SIGPIPE in socket reader abrupt termination situations.  What to do?  At the
least, I think we need to prevent SIGPIPE from being raised, at the NSPR level
if possible.
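For illustration, this is what passing `MSG_NOSIGNAL` instead of 0 as the `send()` flags does on Linux: the kernel reports `EPIPE` rather than raising SIGPIPE, so the caller survives regardless of the process-wide disposition. This is a standalone sketch with a hypothetical function name and an AF_UNIX socket pair in place of a real connection; it is not NSPR's send path, and `MSG_NOSIGNAL` is a Linux flag that was not available on every Unix of the time.

```c
#define _GNU_SOURCE            /* exposes MSG_NOSIGNAL on older glibc */
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: sends one byte on a stream socket whose peer has
 * already closed and returns the resulting errno (0 if the send succeeded).
 * MSG_NOSIGNAL suppresses SIGPIPE for this call only. */
int send_to_closed_peer_errno(void)
{
    int fds[2];
    int rv = socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
    assert(rv == 0);
    close(fds[1]);                               /* peer is gone */

    errno = 0;
    ssize_t n = send(fds[0], "x", 1, MSG_NOSIGNAL);
    int err = (n < 0) ? errno : 0;
    close(fds[0]);
    return err;
}
```

The trade-off is that every send call site has to pass the flag, whereas ignoring SIGPIPE once covers the whole process.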

/be

Wan-Teh Chang

Comment 37

23 years ago

NSPR ignores SIGPIPE.  Maybe that is not happening?

It is possible that we are bitten by the fact that
pthreads on Linux are really process clones.  Maybe
we need to ignore SIGPIPE for each thread.

(Don't use, old account) - Kai Engert

Comment 38

23 years ago

That is what I did in a different pthread application I wrote. In Linux, each
pthread has its own mask of signals being blocked and allowed. It is best to
unblock the signals only in those threads where you want to handle the signal.

However, all threads share the same signal handler code, which makes things
difficult if you want to handle a signal in multiple threads, as you don't have
access to thread local storage from within the signal handler...

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 39

23 years ago

It could be that talkback is trapping the SIGPIPE and sending crash reports even
though Mozilla would otherwise ignore it.

Bradley Baetz (:bbaetz)

Assignee

Comment 40

23 years ago

dbaron: Don't think so. It's being raised, and there's no current signal handler
for SIGPIPE, so we'd crash without talkback, wouldn't we?

Brendan Eich [:brendan]

Comment 41

23 years ago

It sounds like a non-main thread got killed by the SIGPIPE, and that (rightly)
triggered talkback.  What wtc said: do we need to make sure each new thread
ignores SIGPIPE?

/be

Wan-Teh Chang

Comment 42

23 years ago

brendan wrote:
> What wtc said: do we need to make sure each new thread
> ignores SIGPIPE?

According to the LinuxThreads FAQ, this is not necessary.
(http://pauillac.inria.fr/~xleroy/linuxthreads/faq.html#J)
So what NSPR does should be sufficient.
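A small sketch of the behavior wtc is relying on, assuming a current Linux system with NPTL rather than 2001-era LinuxThreads (so it cannot reproduce LinuxThreads quirks): the SIGPIPE disposition is a per-process attribute, so setting SIG_IGN once also covers threads created afterwards; only the blocked-signal mask is per-thread. The function names are hypothetical; older glibc needs `-pthread` to link.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Thread body: write to a socket whose peer is closed and report errno. */
static void *write_to_closed_peer(void *arg)
{
    int fd = *(int *)arg;
    ssize_t n = write(fd, "x", 1);
    return (void *)(intptr_t)(n < 0 ? errno : 0);
}

/* Hypothetical helper: ignores SIGPIPE once in the calling thread, then does
 * the write in a newly created thread. Because the disposition is shared by
 * the whole process, the new thread gets EPIPE instead of being killed.
 * The previous disposition is restored before returning. */
int errno_from_write_in_new_thread(void)
{
    struct sigaction ign, old;
    ign.sa_handler = SIG_IGN;
    sigemptyset(&ign.sa_mask);
    ign.sa_flags = 0;
    int rv = sigaction(SIGPIPE, &ign, &old);
    assert(rv == 0);

    int fds[2];
    rv = socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
    assert(rv == 0);
    close(fds[1]);                       /* peer is gone */

    pthread_t tid;
    void *result = NULL;
    rv = pthread_create(&tid, NULL, write_to_closed_peer, &fds[0]);
    assert(rv == 0);
    pthread_join(tid, &result);

    close(fds[0]);
    sigaction(SIGPIPE, &old, NULL);
    return (int)(intptr_t)result;
}
```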

(Don't use, old account) - Kai Engert

Comment 43

23 years ago

Is it possible that Mozilla in Linux sets a signal handler for SIGCHLD only?

I traced through InstallUnixSignalHandlers; it does essentially nothing on my
system.

Using gdb, I stopped at the first line of main, set breakpoints to functions
sigaction and sigvec. It only stopped twice, in unix_rand.c, setting handlers
for SIGCHLD only.

Did I miss another system function that sets signal handlers, or do you think
gdb was unable to behave correctly?

gdb was unable to work with a breakpoint set at sigprocmask. However, I tried to
set breakpoints on source locations where sigprocmask should be called, but it
didn't stop.

Do you have an idea how we could test whether there is really a signal handler
set?

Wan-Teh Chang

Comment 44

23 years ago

/netwerk/dns/src/unix-dns.c, line 699 -- CATCH_SIGNAL_DFL(SIGPIPE);

Brendan Eich [:brendan]

Comment 45

23 years ago

wtc, that looks like dead code, jwz's old child process dns helper jazz from the
classic days.  Cc'ing gordon and darin, who can best say whether this file
should be cvs removed.

/be

(Don't use, old account) - Kai Engert

Comment 46

23 years ago

Wan-Teh helped me to confirm that I saw a debugger problem. During application
init the following code is executed:

    struct sigaction sigact;
    int rv;
    sigact.sa_handler = SIG_IGN;
    sigemptyset(&sigact.sa_mask);
    sigact.sa_flags = 0;
    rv = sigaction(SIGPIPE, &sigact, 0);
    PR_ASSERT(0 == rv);

Brendan Eich [:brendan]

Comment 47

23 years ago

Ok, is something resetting SIGPIPE to SIG_DFL later on?

/be

Bradley Baetz (:bbaetz)

Assignee

Comment 48

23 years ago

dbaron: you were right. If I take ns/fullsoft/tests/crasher.c, massage it so
that it compiles, and then add the sigaction call from nspr to main(), and
change the crash test to raise(SIGPIPE), then:

a) If we initialise talkback before calling sigaction, then the program
continues after the call to raise.
b) If we call sigaction before initialising talkback, then talkback catches it,
and we appear to crash (I had to hack lots of stuff to get it to run at all, so
I'm not surprised that the dialog didn't show up).

stracing the build shows that talkback calls sigaction for SIGPIPE.

And of course NSPR is initialised before it starts registering components, so we
get case b).

This would also explain why I stopped seeing this - I stopped using the branch
builds to use self-built trunk stuff (without talkback) after 6.1 was released.
talkback people - we need a way to convince talkback not to register a handler
for SIGPIPE. Alternatively we can reset the signal handler for SIGPIPE ourselves
in the QFA component. We don't have to worry about portability in setting the
signal handler because linux is the only unix system to use talkback. I can come
up with the obvious patch for that if that's felt to be the best way to go.

Why did this wait until June to hit us? This should have been happening forever,
shouldn't it?

Wan-Teh Chang

Comment 49

23 years ago

bbaetz: good detective work!  We should report this bug to
the talkback vendor and ask them for a fix or workaround.
(The workaround is probably the obvious patch.)

chris hofmann

Comment 50

23 years ago

shiva, can you check on this?

Bradley Baetz (:bbaetz)

Assignee

Comment 51

•

23 years ago

The workaround is the obvious patch, and when I get in to work I'll generate
one (of course, I can't actually test that it works, but I'll add a
raise(SIGPIPE), and check that talkback doesn't come up)

The question of why this didn't hit us earlier still remains, though. Did
something in PSM change so that they try writing to a closed socket? This should
have hit non-PSM use, in theory.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 52

23 years ago

It does hit non-PSM use.  Out of the last 100 or so libc.so.6 stack signature
crashes that I went through, I'd say about 40-60 were this crash, 14 were
SIGPIPEs in libX11.so.6, and 7 were an IMAP-related SIGPIPE.

Bradley Baetz (:bbaetz)

Assignee

Comment 53

23 years ago

OK. Let's disable the signal, then. This will then be identical to the
non-talkback builds. Patch once I get somewhere I can easily build commercial.

Bradley Baetz (:bbaetz)

Assignee

Comment 54

23 years ago

Taking, patch to commercial tree attached, looking for r, sr

I've compiled the file I've modified - I can't link it, or test it, though,
because I don't have the config stuff for that.

Assignee: javi → bbaetz

Status: REOPENED → NEW

Component: Client Library → Talkback

Product: PSM → Browser

Target Milestone: 2.1 → mozilla0.9.4

Version: 2.0 → other

Bradley Baetz (:bbaetz)

Assignee

Comment 55

23 years ago

Attached patch: patch

Shiva Thirumazhusai

Comment 56

23 years ago

Are we going to ignore SIGPIPE completely, or is there a specific case
we want to ignore?

Bradley Baetz (:bbaetz)

Assignee

Comment 57

23 years ago

Yes, we need to ignore SIGPIPE completely. That code was taken from the NSPR
init code - we're just duplicating that behaviour.

Syd Logan

Comment 58

23 years ago

r=syd

Shiva Thirumazhusai

Comment 59

23 years ago

ccing shannon and jdunn.

Bradley Baetz (:bbaetz)

Assignee

Comment 60

23 years ago

So I've been told that we support talkback on HPUX as well. I'll have to copy
the NSPR code for that as well (which is slightly different). I'll need someone to
test that the file compiles, to check that I've included the correct headers.

Bradley Baetz (:bbaetz)

Assignee

Comment 61

23 years ago

wtc says that the HPUX specific code is not needed, since mozilla doesn't use
that type of threading (it's incompatible with X, apparently). So my original
patch can go in as is.

Can I get an sr from someone please? darin?

Wan-Teh Chang

Comment 62

23 years ago

Your patch is fine for all Unix flavors, including
HP-UX.  Make sure the indentation is right.  (Hard
for me to tell from a patch file.)  r=wtc.

Jim Dunn

Comment 63

23 years ago

fyi I believe we also use talkback on solaris.

Darin Fisher

Comment 64

23 years ago

sr=darin

Bradley Baetz (:bbaetz)

Assignee

Updated

23 years ago

Status: NEW → RESOLVED

Closed: 23 years ago → 23 years ago

QA Contact: ckritzer → chofmann

Resolution: --- → FIXED

Bradley Baetz (:bbaetz)

Assignee

Comment 65

23 years ago

Fix checked into the comm tree yesterday. QA assigning to default talkback QA. 
I guess to verify, just look at the talkback logs.

Jan Carpenter

Comment 66

23 years ago

I'm not finding this exact stack trace in talkback for today.  I'm marking this
verified, reopen if it turns up again.

Status: RESOLVED → VERIFIED

Bradley Baetz (:bbaetz)

Assignee

Comment 67

23 years ago

*** Bug 89518 has been marked as a duplicate of this bug. ***

Stephane Saux

Comment 68

23 years ago

*** Bug 96269 has been marked as a duplicate of this bug. ***

timeless

Updated

15 years ago

Product: Core → Core Graveyard

Nobody; OK to take it and work on it

Updated

13 years ago

Crash Signature: [@ libc.so.6 - ssl_DefSend]