NSS3.4 / Any SSL transaction causes Mac to freeze

VERIFIED WORKSFORME

Status

defect
P1
blocker
VERIFIED WORKSFORME
18 years ago
18 years ago

People

(Reporter: javi, Assigned: wtc)

Tracking

({regression})

PowerPC
Mac System 9.x

Firefox Tracking Flags

(Not tracked)

Attachments

(4 attachments)

I've been seeing this problem since last Thursday.  Whenever I try to access an
SSL site with a Mac build using the trunk of NSS, the Mac freezes.  I have to
force quit the application or re-set the system.

When debugging, I found that the first handshake never returns.  I haven't had
time to debug any further than that.
Blocks: 116334
I feared it could have something to do with check-ins from bug 106188, but I
think you saw that problem before those check-ins were made. If in doubt, and we
don't have other ideas, you could try reverting those patches in your tree.
Summary: Any SSL transaction causes Mac to freeze → NSS3.4 / Any SSL transaction causes Mac to freeze
I just learned that bug 106188 caused a regression for BeOS, so maybe it is
indeed the same problem here?
John, 

Can you connect to an SSL site using a trunk build?  If so, then this is not
caused by kaie's previous patch.
With the Mac 9.1 1/23/08 trunk build, the browser locks up sometimes when 
visiting SSL sites. For example, sometimes it can reach a site, such as 
https://pki/tests.html, and then after restarting, trying to reach that site 
locks up the browser, and if you don't soon force quit, the machine has to be 
rebooted.
I'm seeing the same results with build 2002012403 on MacOS X. When I access an
https server the busy animation starts and just keeps going. Pressing the stop
button causes the browser to freeze, requiring a force quit.

I'd suggest upgrading to major or critical.

John,

In comment #4, you said you saw the same lock-up problem with
the Mac 9.1 1/23/08 trunk build.  Does that build use the trunk
of NSS (aka NSS 3.4)?
Severity: normal → critical
Mac trunk builds still use NSS 3.3.  The only Mac in the world that is using NSS
3.4 is located in my cube.  ;)
Simon, Steve, does this look like a duplicate of the hang
described in bug 99561?  How do we use "Sampler" to get
the thread stacks?
To get a sampler trace, run Sampler (assuming you installed the developer tools).
Choose File->Attach and attach to the mozilla process. When the window comes up,
click 'Start Sampling', wait a few seconds, then click 'Stop'. To get textual
output, use Graph->Generate Report.
Posted file sampler trace
looks like we're hanging in imageLib waiting on a semaphore.
Regression from 99561 -> sdagley
Assignee: wtc → sdagley
um, no, I talked to wtc and this regressed last week before 99561 went in
Then we need to see evidence that an older build can display this problem.

Pink's sampler trace is confusing, and may be bogus. It shows that Thread_0 (a
native thread) is stuck in a PR__Lock, and Thread_1 is in MD_PauseCPU. But in a
CFM build, in which all NSPR threads run on a single pthread, this situation is
not possible to achieve. PR_UserRunThread should always be called on the main
(native) thread.
I'm testing older builds now.
This is a regression caused by #106188 - backing those mods out eliminates the 
problem, even with the fix for #99561 still in
this is a smoketest blocker. 
Severity: critical → blocker
Keywords: smoketest
As noted in <http://bugzilla.mozilla.org/show_bug.cgi?id=121326#c15> this is a 
regression from #106188.  Giving back to module owner.
Assignee: sdagley → wtc
Something bad in mac nspr land:
Assertion failure: lock->owner != me, at prulock.c:268
Assertion failure: thread->md.asyncIOLock->owner == NULL, at macthr.c:301
Assignee: wtc → sfraser
Steve, Simon,

Please give this patch a try on the Mac.
That patch seems to work if the fix for 99561 isn't in
let me qualify that "work" comment - I'm testing the Carbon build.  I don't have 
a classic build handy to try
Assertion failure: lock->owner != me, at prulock.c:268

This assertion happens because the Mac _MD_Poll code is holding the asyncIOLock 
lock, while calling the socket->poll method. ssl_Poll ends up calling 
_MD_getpeername(), which tries to grab the same lock again. Stack:

  0856B3C0    PPC  3CB232F8  _PR_UserRunThread+000C8
  0856B340    PPC  3C2AABE4  nsThread::Main(void*)+000C4
  0856B2C0    PPC  3C127524  nsSocketTransportService::Run()+00094
  0856B260    PPC  3CB11084  PR_Poll+00024
  0856B220    PPC  3CB2DCFC  _MD_poll+0007C
  0856B1C0    PPC  3CB2D8D0  CheckPollDescs+00090
  0856B160    PPC  3CB19B88  pl_DefPoll+00078
  0856B120    PPC  3AA901F0  ssl_Poll+000D0
  0856B0A0    PPC  3AAA89D8  ssl_DefGetpeername+00038
  0856B060    PPC  3CB36F14  Ipv6ToIpv4SocketGetPeerName+00034
  0856AFF0    PPC  3CB15EE0  SocketGetPeerName+00030
  0856AFA0    PPC  3CB2E1D0  _MD_getpeername+000A0
  0856AF20    PPC  3CB290FC  WaitOnThisThread+0003C
  0856AED0    PPC  3CB22240  PR_Lock+00130
  0856AE70    PPC  3CB11BD8  PR_Assert+00048

We come out of this deadlock.
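The self-deadlock in the stack above can be reduced to a toy model: a non-reentrant lock that remembers its owner, taken once on the poll path and then again on the nested getpeername path by the same thread. This is a hypothetical, single-threaded C sketch, not NSPR's code; `toy_lock`, `md_poll_step`, and `getpeername_step` are invented names, and instead of blocking forever the lock reports the re-entry that the debug build catches with the `lock->owner != me` assertion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical owner-tracking lock (not NSPR's PRLock). The re-entry check
 * mirrors the "lock->owner != me" assertion reported from prulock.c. */
typedef struct {
    int owner; /* 0 = unlocked, otherwise the id of the owning thread */
} toy_lock;

typedef enum { LOCK_OK, LOCK_REENTRY, LOCK_BUSY } lock_result;

static lock_result toy_lock_acquire(toy_lock *lock, int me) {
    if (lock->owner == me)
        return LOCK_REENTRY; /* a blocking, non-reentrant lock would hang here */
    if (lock->owner != 0)
        return LOCK_BUSY;    /* a real lock would wait for the other owner */
    lock->owner = me;
    return LOCK_OK;
}

static void toy_lock_release(toy_lock *lock, int me) {
    assert(lock->owner == me); /* only the owner may release */
    lock->owner = 0;
}

/* Stand-in for _MD_getpeername(): it needs the per-thread asyncIOLock. */
static lock_result getpeername_step(toy_lock *async_io_lock, int me) {
    lock_result r = toy_lock_acquire(async_io_lock, me);
    if (r == LOCK_OK)
        toy_lock_release(async_io_lock, me);
    return r;
}

/* Stand-in for the reported call chain: _MD_poll takes the lock, then
 * CheckPollDescs -> ssl_Poll -> ... -> _MD_getpeername tries to take it again. */
static lock_result md_poll_step(toy_lock *async_io_lock, int me) {
    lock_result r = toy_lock_acquire(async_io_lock, me);
    if (r != LOCK_OK)
        return r;
    r = getpeername_step(async_io_lock, me); /* nested acquire by the same thread */
    toy_lock_release(async_io_lock, me);
    return r;
}
```

Called on its own, `getpeername_step` succeeds; called from inside `md_poll_step`, the same thread finds itself already holding the lock.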
Sorry, turns out the patch only works sometimes.
This patch stops _MD_poll from holding the thread->md.asyncIOLock around the
call to CheckPollDescs(), thus making it OK for poll methods to make other
socket calls (like GetPeerName()) that might have to do blocking calls. This
makes the page load OK for me (on Mac OS 9). It's still not ideal, because
_MD_Poll calls PrepareForAsyncCompletion(), which is called again in
_MD_getpeername() (hence the commented out assertion).
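The general rule behind this patch is not to call out to socket methods while holding a non-reentrant lock that those methods may need. The following hypothetical C sketch contrasts holding the lock across the call-out with not holding it. All names are invented, and it does not reproduce the actual Mac NSPR patch; `ssl_like_poll` only stands in for a poll method that, like ssl_Poll, makes another socket call needing the same lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical owner-tracking, non-reentrant lock (not NSPR's PRLock). */
typedef struct {
    int owner; /* 0 = unlocked */
} toy_lock;

static bool toy_try_lock(toy_lock *lock, int me) {
    if (lock->owner != 0)
        return false; /* already held (possibly by me): a blocking lock would hang */
    lock->owner = me;
    return true;
}

static void toy_unlock(toy_lock *lock, int me) {
    assert(lock->owner == me);
    lock->owner = 0;
}

/* Shape of a socket poll method reached through a method table. */
typedef bool (*poll_method)(toy_lock *lock, int me);

/* Hypothetical poll method that needs the lock itself (cf. _MD_getpeername). */
static bool ssl_like_poll(toy_lock *lock, int me) {
    if (!toy_try_lock(lock, me))
        return false;
    toy_unlock(lock, me);
    return true;
}

/* Before: the lock is held across the call-out, so the nested call fails. */
static bool poll_holding_lock(toy_lock *lock, int me, poll_method poll) {
    if (!toy_try_lock(lock, me))
        return false;
    bool ok = poll(lock, me);
    toy_unlock(lock, me);
    return ok;
}

/* After: the lock is not held while the poll methods run
 * (the CheckPollDescs() step), so nested socket calls can take it. */
static bool poll_without_lock(toy_lock *lock, int me, poll_method poll) {
    return poll(lock, me);
}
```

Where real code would hang, the sketch's `poll_holding_lock` simply returns false; `poll_without_lock` succeeds because nothing is held when the nested call runs.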
Note that I was never sure whether we need to turn off interrupts, and lock the 
asyncIOLock in _MD_Poll. I remember asking gordon about that, and him saying 
"Well, it can't hurt".
With sfraser's patch, both Thawte's "try an SSL cert" page and the wellsfargo.com
account sign-in page load for me on OS X.
Comment on attachment 66370
A hack for Mac's PR_ConnectContinue

Thanks, Simon and Steve.

So should I mark my patch obsolete?
Yes, but I'm not totally happy with my patch yet either. I don't like two calls 
to PrepareForAsyncCompletion() on the same thread. Maybe _MD_Poll should only 
call this after the CheckPollDescs() call?
*** Bug 121722 has been marked as a duplicate of this bug. ***
*** Bug 121683 has been marked as a duplicate of this bug. ***
*** Bug 121455 has been marked as a duplicate of this bug. ***
Is this broken on the 0.9.8 branch too?
Status: NEW → ASSIGNED
yes, it was landed on Friday
lowering severity to pull off of sheriff's radar since holding the tree won't
help this get fixed any faster.
Severity: blocker → critical
Status: fixing Mac NSPR is turning out to be difficult, though it needs to be done
in the long run. For the branch, we need to back out some or all of the patch
checked in for bug 106188 to fix this, but I can't do this before Monday. If
anyone can try that and test, that would be great.
I did that yesterday - backing out #106188 definitely eliminates the hang on the 
Carbon build.  I don't have a Classic build to try
*** Bug 121775 has been marked as a duplicate of this bug. ***
back to blocker
Severity: critical → blocker
The patch to fix this is in bug 106188.
any objections to marking this bug a dup of 106188?
Yes :)

That bug is about fixing blocking connects. This bug is about a problem in Mac
NSPR (which we can't fix easily enough to open the tree).
Do not mark this as a dupe.  

This bug was originally filed because the trunk of NSS (i.e., NSS 3.4) was not
working on the Mac.  This is a work in progress that has not landed on the
NS_CLIENT_TAG yet.  Somehow this bug turned into "the SSL implementation of Mac
Mozilla on the trunk of the Mozilla tree is broken."

Even when Mozilla's trunk SSL implementation is fixed, the NSS 3.4 (i.e.,
trunk of NSS) implementation will still need to be fixed.
Javier is right.  This bug is about a Mac freeze problem when
using NSS 3.4.  (Mozilla is using NSS 3.3.2.)  This problem
may or may not be the same as the freeze problem that you guys
ran into with the regular Mozilla build.

I've opened new Mac NSPR bugs to track the underlying bugs for
the freeze of the regular Mozilla build.  They are listed as
the dependencies of bug 106188.  If you are primarily interested
in regular Mozilla builds, you should follow those two Mac NSPR
bugs and remove yourself from the cc list of this bug.
Simon, can we take the part of the fix in the bug you cited that fixes this problem?
This is the last blocker keeping the tree closed today, and I'm trying to get a
feel for when we are going to have something so we can open the tree up.
I checked said patch into the trunk, so this is fixed. Note that bugs exist to 
fix Mac NSPR the right way (bug 121952, bug 121951).
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
We (the security team) originally opened this bug for
a Mac freeze problem when using NSS 3.4, which has not
yet landed.

We still need to investigate the freeze when using NSS
3.4 on the Mac.  I am reopening this bug.
Status: RESOLVED → REOPENED
Priority: -- → P1
Resolution: FIXED → ---
Target Milestone: --- → 3.4
Reassigned the bug to myself.

Removed those who are not working on NSS or Mac NSPR from
the cc list.  This bug is about the not-yet-released NSS 3.4.
Assignee: sfraser → wtc
Status: REOPENED → NEW
Removing smoketest keyword.
Keywords: smoketest
The Carbon, optimized build with NSS 3.4 I did this morning
exhibited a different problem.  When running on OS X, it
crashes if I go to any secure sites with this error message:
    The application Mozilla has unexpectedly quit.

I did not see the freeze problem that Javi and John saw on
Mac OS 9.
wtc: turn on Crash Reporter (run the Console app, and look in its Preferences). 
Then you should get a stack trace.
Here are the Mac build instructions to do a Mozilla build
with NSS 3.4:

1. Pull mozilla/build/mac/build_scripts from the NSS_3_4_LANDING_BRANCH.
2. Follow the normal procedure.

If you can help us debug this, that will be much appreciated.
This crash looks like nsEventStateManager::ShiftFocus blowing out the stack again.
We would really appreciate it if we could get some help from the CPD mac experts
on this one.  Wan-Teh posted NSS3.4 mac builds instruction in comment #52.
Thanks.
That nsEventStateManager crasher should only show up in the Classic theme. Try
switching to the Modern theme and testing again. If it doesn't crash, you're OK.
Simon,

You are right.  Mozilla does not crash after switching to the
modern theme.

So, the crash when using the classic theme is not my fault?

Javi, John, could you test the Classic build with NSS 3.4 on
Mac OS 9.x?  Thanks.
doesn't block 116334. This is not an NSS3.4 issue.
No longer blocks: 116334
The only issue right now is that the Carbon build with NSS 3.4
crashes when using the classic theme.  Based on the crash log
(attachment 67013) and Simon Fraser's comment #53, I don't think
this is an NSS 3.4 issue.

Marked the bug WORKSFORME.
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → WORKSFORME
I am seeing random crashes when I access https pages. It will work for a while
and then quit with a "Mozilla has unexpectedly quit" error in the Finder. I am
on Mac OS 9.2.2 using a recent nightly (not sure which one...I will check when I
get home).

I will also attach a crash report when I get home and repro the problem. I had
seen this behaviour before, but now it is much more rampant. I have tried with a
clean profile. I am using the Modern theme.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Sorry for the spam, I have opened a new bug 127278 with my problems in it. Can
someone else put this back at WFM as I can't.
Marked the bug WORKSFORME.
Status: REOPENED → RESOLVED
Closed: 18 years ago
Resolution: --- → WORKSFORME
Verified.
Status: RESOLVED → VERIFIED