Closed Bug 121326 Opened 24 years ago Closed 24 years ago

NSS3.4 / Any SSL transaction causes Mac to freeze

Categories

(NSS :: Libraries, defect, P1)

PowerPC
Mac System 9.x
defect

Tracking

(Not tracked)

VERIFIED WORKSFORME

People

(Reporter: javi, Assigned: wtc)

References

()

Details

(Keywords: regression)

Attachments

(4 files)

I've been seeing this problem since last Thursday. Whenever I try to access an SSL site with a Mac build using the trunk of NSS, the Mac freezes. I have to force quit the application or reset the system. When debugging, I found that the first handshake never returns. I haven't had time to debug any further than that.
Blocks: 116334
I feared it could have something to do with the check-ins from bug 106188, but I think you saw that problem before the check-ins were made. When in doubt, and if we don't have other ideas, you could try to revert those patches in your tree.
Summary: Any SSL transaction causes Mac to freeze → NSS3.4 / Any SSL transaction causes Mac to freeze
I just learned that bug 106188 caused a regression for BeOS, so maybe it is indeed the same problem here?
John, can you connect to an SSL site using a trunk build? If so, then this is not caused by kaie's previous patch.
With the Mac 9.1 1/23/08 trunk build, the browser locks up sometimes when visiting SSL sites. For example, sometimes it can reach a site, such as https://pki/tests.html, and then after restarting, trying to reach that site locks up the browser, and if you don't soon force quit, the machine has to be rebooted.
I'm seeing the same results with build 2002012403 on MacOS X. When I access an https server the busy animation starts and just keeps going. Pressing the stop button causes the browser to freeze, requiring a force quit. I'd suggest upgrading to major or critical.
John, In comment #4, you said you saw the same lock-up problem with the Mac 9.1 1/23/08 trunk build. Does that build use the trunk of NSS (aka NSS 3.4)?
Severity: normal → critical
Mac trunk builds still use NSS 3.3. The only Mac in the world that is using NSS 3.4 is located in my cube. ;)
Simon, Steve, does this look like a duplicate of the hang described in bug 99561? How do we use "Sampler" to get the thread stacks?
To get a sampler trace, run Sampler (assuming you installed the developer tools). Choose File->Attach and attach to the mozilla process. A window comes up. Click 'Start Sampling', wait for a few seconds, then click 'Stop'. To get textual output, use Graph->Generate Report.
Attached file sampler trace
looks like we're hanging in imageLib waiting on a semaphore.
Regression from 99561 -> sdagley
Assignee: wtc → sdagley
um, no, I talked to wtc and this regressed last week before 99561 went in
We need to see evidence that an older build can display this problem then. Pink's sampler trace is confusing, and may be bogus. It shows that Thread_0 (a native thread) is stuck in a PR_Lock, and Thread_1 is in MD_PauseCPU. But in a CFM build, in which all NSPR threads run on a single pthread, this situation is not possible to achieve. PR_UserRunThread should always be called on the main (native) thread.
I'm testing older builds now.
This is a regression caused by #106188 - backing those mods out eliminates the problem, even with the fix for #99561 still in
this is a smoketest blocker.
Severity: critical → blocker
Keywords: smoketest
As noted in <http://bugzilla.mozilla.org/show_bug.cgi?id=121326#c15> this is a regression from #106188. Giving back to module owner.
Assignee: sdagley → wtc
Something bad in mac nspr land:
Assertion failure: lock->owner != me, at prulock.c:268
Assertion failure: thread->md.asyncIOLock->owner == NULL, at macthr.c:301
Assignee: wtc → sfraser
Steve, Simon, Please give this patch a try on the Mac.
That patch seems to work if the fix for 99561 isn't in
let me qualify that "work" comment - I'm testing the Carbon build. I don't have a classic build handy to try
Assertion failure: lock->owner != me, at prulock.c:268

This assertion happens because the Mac _MD_Poll code is holding the asyncIOLock lock, while calling the socket->poll method. ssl_Poll ends up calling _MD_getpeername(), which tries to grab the same lock again.

Stack:
0856B3C0 PPC 3CB232F8 _PR_UserRunThread+000C8
0856B340 PPC 3C2AABE4 nsThread::Main(void*)+000C4
0856B2C0 PPC 3C127524 nsSocketTransportService::Run()+00094
0856B260 PPC 3CB11084 PR_Poll+00024
0856B220 PPC 3CB2DCFC _MD_poll+0007C
0856B1C0 PPC 3CB2D8D0 CheckPollDescs+00090
0856B160 PPC 3CB19B88 pl_DefPoll+00078
0856B120 PPC 3AA901F0 ssl_Poll+000D0
0856B0A0 PPC 3AAA89D8 ssl_DefGetpeername+00038
0856B060 PPC 3CB36F14 Ipv6ToIpv4SocketGetPeerName+00034
0856AFF0 PPC 3CB15EE0 SocketGetPeerName+00030
0856AFA0 PPC 3CB2E1D0 _MD_getpeername+000A0
0856AF20 PPC 3CB290FC WaitOnThisThread+0003C
0856AED0 PPC 3CB22240 PR_Lock+00130
0856AE70 PPC 3CB11BD8 PR_Assert+00048

We come out of this deadlock.
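The re-entrant locking pattern described in the comment above can be sketched in a few lines of C. This is only an illustration under stated assumptions: ToyLock and every toy_* function are hypothetical stand-ins, not NSPR's PRLock, _MD_poll, ssl_Poll, or _MD_getpeername, and the sketch is single-threaded, so a lock attempt that would block forever simply returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical non-recursive lock that records its owner.
   This is NOT NSPR's PRLock; it only mimics the owner check. */
typedef struct {
    const void *owner; /* NULL when unlocked */
} ToyLock;

/* Returns false where a real non-recursive lock would never return
   (the situation that "lock->owner != me" guards against). */
bool toy_lock(ToyLock *lock, const void *me) {
    if (lock->owner != NULL) {
        /* Held by me (self-deadlock) or by someone else; in this
           single-threaded sketch either case would block forever. */
        return false;
    }
    lock->owner = me;
    return true;
}

void toy_unlock(ToyLock *lock, const void *me) {
    assert(lock->owner == me); /* only the owner may unlock */
    lock->owner = NULL;
}

/* Hypothetical stand-in for the per-thread async I/O lock. */
ToyLock async_io_lock = { NULL };

/* Stand-in for a socket call that needs the I/O lock itself. */
bool toy_getpeername(const void *me) {
    if (!toy_lock(&async_io_lock, me)) {
        return false; /* would deadlock */
    }
    toy_unlock(&async_io_lock, me);
    return true;
}

/* Stand-in for a layered poll method that calls back into sockets. */
bool toy_layered_poll(const void *me) {
    return toy_getpeername(me);
}

/* Stand-in for a poll loop that holds the lock across the callback. */
bool toy_poll_holding_lock(const void *me) {
    bool ok;
    if (!toy_lock(&async_io_lock, me)) {
        return false;
    }
    ok = toy_layered_poll(me); /* callback re-enters the same lock */
    toy_unlock(&async_io_lock, me);
    return ok;
}
```

Calling toy_poll_holding_lock() with any thread token returns false, because the nested toy_getpeername() finds the lock already owned by the caller. Calling toy_getpeername() directly, with no lock held, succeeds.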
Sorry, turns out the patch only works sometimes.
This patch stops _MD_poll from holding the thread->md.asyncIOLock around the call to CheckPollDescs(), thus making it OK for poll methods to make other socket calls (like GetPeerName()) that might have to do blocking calls. This makes the page load OK for me (on Mac OS 9). It's still not ideal, because _MD_Poll calls PrepareForAsyncCompletion(), which is called again in _MD_getpeername() (hence the commented out assertion).
Note that I was never sure whether we need to turn off interrupts, and lock the asyncIOLock in _MD_Poll. I remember asking gordon about that, and him saying "Well, it can't hurt".
With sfraser's patch both thawte's try a SSL cert page and the wellsfargo.com acct sign in page load for me on OS X
Comment on attachment 66370 [details] [diff] [review] A hack for Mac's PR_ConnectContinue Thanks, Simon and Steve. So should I mark my patch obsolete?
Yes, but I'm not totally happy with my patch yet either. I don't like two calls to PrepareForAsyncCompletion() on the same thread. Maybe _MD_Poll should only call this after the CheckPollDescs() call?
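The two orderings under discussion can be compared with a small C sketch. It is an illustration only: ToyThread and the toy_* functions are hypothetical, toy_prepare() merely counts outstanding calls as a stand-in for PrepareForAsyncCompletion(), and nothing here reflects the real Mac NSPR data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread state; not the real NSPR thread record. */
typedef struct {
    const void *io_lock_owner; /* NULL when the I/O lock is free */
    int pending_prepares;      /* outstanding "prepare" calls */
    int max_pending;           /* deepest nesting seen so far */
} ToyThread;

bool toy_lock(ToyThread *t) {
    if (t->io_lock_owner != NULL) {
        return false; /* would block forever in this sketch */
    }
    t->io_lock_owner = t;
    return true;
}

void toy_unlock(ToyThread *t) {
    t->io_lock_owner = NULL;
}

/* Stand-in for preparing the thread for an async completion. */
void toy_prepare(ToyThread *t) {
    t->pending_prepares++;
    if (t->pending_prepares > t->max_pending) {
        t->max_pending = t->pending_prepares;
    }
}

/* Stand-in for the matching completion. */
void toy_complete(ToyThread *t) {
    assert(t->pending_prepares > 0);
    t->pending_prepares--;
}

/* Stand-in for a blocking socket call made from a poll method. */
bool toy_getpeername(ToyThread *t) {
    if (!toy_lock(t)) {
        return false;
    }
    toy_prepare(t);
    toy_unlock(t);
    toy_complete(t);
    return true;
}

/* Patched ordering: prepare up front, no lock held across the callback. */
bool toy_poll_patched(ToyThread *t) {
    bool ok;
    toy_prepare(t);
    ok = toy_getpeername(t); /* free to take the lock, but nests a prepare */
    toy_complete(t);
    return ok;
}

/* Suggested ordering: prepare only after the descriptor check. */
bool toy_poll_prepare_late(ToyThread *t) {
    bool ok = toy_getpeername(t);
    toy_prepare(t);
    toy_complete(t);
    return ok;
}
```

In this toy model both orderings avoid the self-deadlock. The patched ordering reaches a nesting depth of two prepare calls on one thread, while the late ordering never exceeds one. Whether that difference matters in the real code is exactly the open question in the comment above.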
*** Bug 121722 has been marked as a duplicate of this bug. ***
*** Bug 121683 has been marked as a duplicate of this bug. ***
*** Bug 121455 has been marked as a duplicate of this bug. ***
Is this broken on the 0.9.8 branch too?
Status: NEW → ASSIGNED
yes, it was landed on Friday
lowering severity to pull off of sheriff's radar since holding the tree won't help this get fixed any faster.
Severity: blocker → critical
Status: fixing Mac NSPR is turning out to be difficult, though needs to be done in the long run. For the branch, we need to back out some or all of the patch checked in for bug 106188 to fix this, but I can't do this before Monday. If anyone can try that and test, that would be great.
I did that yesterday - backing out #106188 definitely eliminates the hang on the Carbon build. I don't have a Classic build to try
*** Bug 121775 has been marked as a duplicate of this bug. ***
back to blocker
Severity: critical → blocker
The patch to fix this is in bug 106188.
any objections to marking this bug a dup of 106188?
Yes :) That bug is about fixing blocking connects. This bug is about a problem in Mac NSPR (which we can't fix easily enough to open the tree).
Do not mark this as a dupe. This bug was originally filed because the trunk of NSS (i.e. NSS 3.4) was not working on the Mac. This is a work in progress that has not landed on the NS_CLIENT_TAG yet. Somehow this bug turned into "the SSL implementation of Mac Mozilla on the trunk of the Mozilla tree is broken." Even when the trunk of Mozilla's SSL implementation is fixed, the NSS 3.4 (i.e. trunk of NSS) implementation will still need to be fixed.
Javier is right. This bug is about a Mac freeze problem when using NSS 3.4. (Mozilla is using NSS 3.3.2.) This problem may or may not be the same as the freeze problem that you guys ran into with the regular Mozilla build. I've opened new Mac NSPR bugs to track the underlying bugs for the freeze of the regular Mozilla build. They are listed as the dependencies of bug 106188. If you are primarily interested in regular Mozilla builds, you should follow those two Mac NSPR bugs and remove yourself from the cc list of this bug.
Simon, can we take the part of the fix in the bug you cited that fixes this problem? This is the last blocker keeping the tree closed today and I'm trying to get a feel for when we are going to have something so we can open the tree up.
I checked said patch into the trunk, so this is fixed. Note that bugs exist to fix Mac NSPR the right way (bug 121952, bug 121951).
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
We (the security team) originally opened this bug for a Mac freeze problem when using NSS 3.4, which has not yet landed. We still need to investigate the freeze when using NSS 3.4 on the Mac. I am reopening this bug.
Status: RESOLVED → REOPENED
Priority: -- → P1
Resolution: FIXED → ---
Target Milestone: --- → 3.4
Reassigned the bug to myself. Removed those who are not working on NSS or Mac NSPR from the cc list. This bug is about the not-yet-released NSS 3.4.
Assignee: sfraser → wtc
Status: REOPENED → NEW
removing smoketest keyword
Keywords: smoketest
The Carbon, optimized build with NSS 3.4 I did this morning exhibited a different problem. When running on OS X, it crashes if I go to any secure sites with this error message: The application Mozilla has unexpectedly quit. I did not see the freeze problem that Javi and John saw on Mac OS 9.
wtc: turn on Crash Reporter (run the Console app, and look in its Preferences). Then you should get a stack trace.
Here are the Mac build instructions to do a Mozilla build with NSS 3.4:
1. Pull mozilla/build/mac/build_scripts from the NSS_3_4_LANDING_BRANCH.
2. Follow the normal procedure.
If you can help us debug this, that will be much appreciated.
This crash looks like nsEventStateManager::ShiftFocus blowing out the stack again.
We would really appreciate it if we could get some help from the CPD Mac experts on this one. Wan-Teh posted NSS 3.4 Mac build instructions in comment #52. Thanks.
That nsEventStateManager crasher should only show up in the Classic theme. Try switching to the modern theme, and testing again. If it doesn't crash, you're ok.
Simon, You are right. Mozilla does not crash after switching to the modern theme. So, the crash when using the classic theme is not my fault? Javi, John, could you test the Classic build with NSS 3.4 on Mac OS 9.x? Thanks.
doesn't block 116334. This is not an NSS3.4 issue.
No longer blocks: 116334
The only issue right now is that the Carbon build with NSS 3.4 crashes when using the classic theme. Based on the crash log (attachment 67013 [details]) and Simon Fraser's comment #53, I don't think this is an NSS 3.4 issue. Marked the bug WORKSFORME.
Status: NEW → RESOLVED
Closed: 24 years ago → 24 years ago
Resolution: --- → WORKSFORME
I am seeing random crashes when I access https pages. It will work for a while and then quit with a "Mozilla has unexpectedly quit" error in the Finder. I am on Mac OS 9.2.2 using a recent nightly (not sure which one; I will check when I get home). I will also attach a crash report when I get home and repro the problem. I had seen this behaviour before but now it is much more rampant. I have tried with a clean profile. I am using the Modern theme.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Sorry for the spam, I have opened a new bug 127278 with my problems in it. Can someone else put this back at WFM as I can't.
Marked the bug WORKSFORME.
Status: REOPENED → RESOLVED
Closed: 24 years ago → 24 years ago
Resolution: --- → WORKSFORME
Verified.
Status: RESOLVED → VERIFIED