Closed
Bug 121326
Opened 23 years ago
Closed 23 years ago
NSS3.4 / Any SSL transaction causes Mac to freeze
Categories
(NSS :: Libraries, defect, P1)
Tracking
(Not tracked)
VERIFIED
WORKSFORME
3.4
People
(Reporter: javi, Assigned: wtc)
References
()
Details
(Keywords: regression)
Attachments
(4 files)
21.82 KB, text/plain
805 bytes, patch
1.08 KB, patch
20.82 KB, text/plain
I've been seeing this problem since last Thursday. Whenever I try to access an SSL site with a Mac build using the trunk of NSS, the Mac freezes. I have to force quit the application or re-set the system. When debugging, I found that the first handshake never returns. I haven't had time to debug any further than that.
Comment 1•23 years ago
I feared it could have something to do with the check-ins from bug 106188, but I think you saw that problem before the check-ins were made. When in doubt, and we don't have other ideas, you could try to revert those patches in your tree.
Updated•23 years ago
Summary: Any SSL transaction causes Mac to freeze → NSS3.4 / Any SSL transaction causes Mac to freeze
Comment 2•23 years ago
I just learned that bug 106188 caused a regression for BeOS, so maybe it is indeed the same problem here?
Reporter | ||
Comment 3•23 years ago
John, can you connect to an SSL site using a trunk build? If so, then this is not caused by kaie's previous patch.
Comment 4•23 years ago
With the Mac 9.1 1/23/08 trunk build, the browser locks up sometimes when visiting SSL sites. For example, sometimes it can reach a site, such as https://pki/tests.html, and then after restarting, trying to reach that site locks up the browser, and if you don't soon force quit, the machine has to be rebooted.
Comment 5•23 years ago
I'm seeing the same results with build 2002012403 on MacOS X. When I access an https server the busy animation starts and just keeps going. Pressing the stop button causes the browser to freeze, requiring a force quit. I'd suggest upgrading to major or critical.
Assignee | ||
Comment 6•23 years ago
John, In comment #4, you said you saw the same lock-up problem with the Mac 9.1 1/23/08 trunk build. Does that build use the trunk of NSS (aka NSS 3.4)?
Assignee | ||
Updated•23 years ago
Severity: normal → critical
Reporter | ||
Comment 7•23 years ago
Mac trunk builds still use NSS 3.3. The only Mac in the world that is using NSS 3.4 is located in my cube. ;)
Assignee | ||
Comment 8•23 years ago
Simon, Steve, does this look like a duplicate of the hang described in bug 99561? How do we use "Sampler" to get the thread stacks?
Comment 9•23 years ago
To get a sampler trace, run Sampler (assuming you installed the developer tools). File->Attach, attach to the mozilla process. Window comes up. Click 'Start Sampling'. Wait for a few seconds. Click 'Stop'. To get textual output, use Graph->Generate Report.
Comment 10•23 years ago
looks like we're hanging in imageLib waiting on a semaphore.
Updated•23 years ago
Comment 12•23 years ago
um, no, I talked to wtc and this regressed last week before 99561 went in
Comment 13•23 years ago
We need to see evidence that an older build can display this problem then. Pink's sampler trace is confusing, and may be bogus. It shows that Thread_0 (a native thread) is stuck in a PR_Lock, and Thread_1 is in MD_PauseCPU. But in a CFM build, in which all NSPR threads run on a single pthread, this situation is not possible to achieve. PR_UserRunThread should always be called on the main (native) thread.
Comment 14•23 years ago
I'm testing older builds now.
Comment 15•23 years ago
This is a regression caused by #106188 - backing those mods out eliminates the problem, even with the fix for #99561 still in
Comment 16•23 years ago
this is a smoketest blocker.
Severity: critical → blocker
Keywords: smoketest
Comment 17•23 years ago
As noted in <http://bugzilla.mozilla.org/show_bug.cgi?id=121326#c15> this is a regression from #106188. Giving back to module owner.
Assignee: sdagley → wtc
Comment 18•23 years ago
Something bad in mac nspr land:
Assertion failure: lock->owner != me, at prulock.c:268
Assertion failure: thread->md.asyncIOLock->owner == NULL, at macthr.c:301
Assignee: wtc → sfraser
Assignee | ||
Comment 19•23 years ago
Steve, Simon, Please give this patch a try on the Mac.
Comment 20•23 years ago
That patch seems to work if the fix for 99561 isn't in
Comment 21•23 years ago
let me qualify that "work" comment - I'm testing the Carbon build. I don't have a classic build handy to try
Comment 22•23 years ago
Assertion failure: lock->owner != me, at prulock.c:268

This assertion happens because the Mac _MD_Poll code is holding the asyncIOLock lock, while calling the socket->poll method. ssl_Poll ends up calling _MD_getpeername(), which tries to grab the same lock again.

Stack:
0856B3C0 PPC 3CB232F8 _PR_UserRunThread+000C8
0856B340 PPC 3C2AABE4 nsThread::Main(void*)+000C4
0856B2C0 PPC 3C127524 nsSocketTransportService::Run()+00094
0856B260 PPC 3CB11084 PR_Poll+00024
0856B220 PPC 3CB2DCFC _MD_poll+0007C
0856B1C0 PPC 3CB2D8D0 CheckPollDescs+00090
0856B160 PPC 3CB19B88 pl_DefPoll+00078
0856B120 PPC 3AA901F0 ssl_Poll+000D0
0856B0A0 PPC 3AAA89D8 ssl_DefGetpeername+00038
0856B060 PPC 3CB36F14 Ipv6ToIpv4SocketGetPeerName+00034
0856AFF0 PPC 3CB15EE0 SocketGetPeerName+00030
0856AFA0 PPC 3CB2E1D0 _MD_getpeername+000A0
0856AF20 PPC 3CB290FC WaitOnThisThread+0003C
0856AED0 PPC 3CB22240 PR_Lock+00130
0856AE70 PPC 3CB11BD8 PR_Assert+00048

We come out of this deadlock.
Comment 23•23 years ago
Sorry, turns out the patch only works sometimes.
Comment 24•23 years ago
This patch stops _MD_poll from holding the thread->md.asyncIOLock around the call to CheckPollDescs(), thus making it OK for poll methods to make other socket calls (like GetPeerName()) that might have to do blocking calls. This makes the page load OK for me (on Mac OS 9). It's still not ideal, because _MD_Poll calls PrepareForAsyncCompletion(), which is called again in _MD_getpeername() (hence the commented out assertion).
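The re-entrant locking problem described in comment 22, and the general shape of the fix in this patch, can be sketched in a few lines of C. This is a toy model only, not NSPR code: ToyLock, toy_lock, toy_unlock, toy_getpeername and the two toy_poll_* functions are hypothetical names invented for illustration, and the toy lock returns -1 where a real non-recursive lock would block forever or trip the prulock.c assertion.

```c
#include <assert.h>

/* Toy stand-in for a non-recursive lock that records its owner.
 * Hypothetical type for illustration; this is not NSPR's PRLock. */
typedef struct {
    int owner; /* 0 means unowned, otherwise a made-up thread id */
} ToyLock;

/* Try to take the lock. Returns 0 on success and -1 if the lock is
 * already owned (including by the caller itself). A real lock would
 * block here; the toy reports the failure so the sketch terminates. */
static int toy_lock(ToyLock *lock, int me)
{
    if (lock->owner != 0) {
        return -1;
    }
    lock->owner = me;
    return 0;
}

/* Release the lock; only the current owner may do so. */
static void toy_unlock(ToyLock *lock, int me)
{
    assert(lock->owner == me);
    lock->owner = 0;
}

/* Stand-in for a socket call made from a poll method: it needs the
 * same I/O lock that the poll code uses. */
static int toy_getpeername(ToyLock *io_lock, int me)
{
    if (toy_lock(io_lock, me) != 0) {
        return -1; /* nested acquire refused */
    }
    toy_unlock(io_lock, me);
    return 0;
}

/* Problem shape: the poll code calls the poll method with the lock held. */
static int toy_poll_holding_lock(ToyLock *io_lock, int me)
{
    int rv;
    if (toy_lock(io_lock, me) != 0) {
        return -1;
    }
    rv = toy_getpeername(io_lock, me); /* tries to take io_lock again */
    toy_unlock(io_lock, me);
    return rv;
}

/* Patched shape: the lock is released before the poll method runs. */
static int toy_poll_releasing_lock(ToyLock *io_lock, int me)
{
    if (toy_lock(io_lock, me) != 0) {
        return -1;
    }
    /* ... work that really needs the lock would go here ... */
    toy_unlock(io_lock, me);
    return toy_getpeername(io_lock, me); /* free to take io_lock itself */
}
```

With a ToyLock initialised to {0}, toy_poll_holding_lock(&lock, 1) returns -1 because the nested acquire is refused, while toy_poll_releasing_lock(&lock, 1) returns 0.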
Comment 25•23 years ago
Note that I was never sure whether we need to turn off interrupts, and lock the asyncIOLock in _MD_Poll. I remember asking gordon about that, and him saying "Well, it can't hurt".
Comment 26•23 years ago
With sfraser's patch, both Thawte's "try an SSL cert" page and the wellsfargo.com account sign-in page load for me on OS X.
Assignee | ||
Comment 27•23 years ago
Comment on attachment 66370: A hack for Mac's PR_ConnectContinue
Thanks, Simon and Steve. So should I mark my patch obsolete?
Comment 28•23 years ago
Yes, but I'm not totally happy with my patch yet either. I don't like two calls to PrepareForAsyncCompletion() on the same thread. Maybe _MD_Poll should only call this after the CheckPollDescs() call?
Comment 29•23 years ago
*** Bug 121722 has been marked as a duplicate of this bug. ***
Comment 30•23 years ago
*** Bug 121683 has been marked as a duplicate of this bug. ***
Comment 31•23 years ago
*** Bug 121455 has been marked as a duplicate of this bug. ***
Comment 33•23 years ago
yes, it was landed on Friday
Comment 34•23 years ago
lowering severity to pull off of sheriff's radar since holding the tree won't help this get fixed any faster.
Severity: blocker → critical
Comment 35•23 years ago
Status: fixing Mac NSPR is turning out to be difficult, though it needs to be done in the long run. For the branch, we need to back out some or all of the patch checked in for bug 106188 to fix this, but I can't do this before Monday. If anyone can try that and test, that would be great.
Comment 36•23 years ago
I did that yesterday - backing out #106188 definitely eliminates the hang on the Carbon build. I don't have a Classic build to try
Comment 37•23 years ago
*** Bug 121775 has been marked as a duplicate of this bug. ***
Comment 39•23 years ago
The patch to fix this is in bug 106188.
Comment 40•23 years ago
any objections to marking this bug a dup of 106188?
Comment 41•23 years ago
Yes :) That bug is about fixing blocking connects. This bug is about a problem in Mac NSPR (which we can't fix easily enough to open the tree).
Reporter | ||
Comment 42•23 years ago
Do not mark this as a dupe. This bug was originally filed because the trunk of NSS (i.e. NSS 3.4) was not working on the Mac. This is a work in progress that has not landed on the NS_CLIENT_TAG yet. Somehow this bug turned into "the SSL implementation of Mac Mozilla on the trunk of the Mozilla tree is broken." Even when the trunk of Mozilla's SSL implementation is fixed, the NSS 3.4 (i.e. trunk of NSS) implementation will still need to be fixed.
Assignee | ||
Comment 43•23 years ago
Javier is right. This bug is about a Mac freeze problem when using NSS 3.4. (Mozilla is using NSS 3.3.2.) This problem may or may not be the same as the freeze problem that you guys ran into with the regular Mozilla build. I've opened new Mac NSPR bugs to track the underlying bugs for the freeze of the regular Mozilla build. They are listed as the dependencies of bug 106188. If you are primarily interested in regular Mozilla builds, you should follow those two Mac NSPR bugs and remove yourself from the cc list of this bug.
Comment 44•23 years ago
Simon, can we take part of the fix in the bug you cited that fixes this problem? This is the last blocker keeping the tree closed today and I'm trying to get a feel for when we are going to have something so we can open the tree up.
Comment 45•23 years ago
I checked said patch into the trunk, so this is fixed. Note that bugs exist to fix Mac NSPR the right way (bug 121952, bug 121951).
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 46•23 years ago
We (the security team) originally opened this bug for a Mac freeze problem when using NSS 3.4, which has not yet landed. We still need to investigate the freeze when using NSS 3.4 on the Mac. I am reopening this bug.
Status: RESOLVED → REOPENED
Priority: -- → P1
Resolution: FIXED → ---
Target Milestone: --- → 3.4
Assignee | ||
Comment 47•23 years ago
Reassigned the bug to myself. Removed those who are not working on NSS or Mac NSPR from the cc list. This bug is about the not-yet-released NSS 3.4.
Assignee: sfraser → wtc
Status: REOPENED → NEW
Assignee | ||
Comment 49•23 years ago
The Carbon, optimized build with NSS 3.4 I did this morning exhibited a different problem. When running on OS X, it crashes if I go to any secure sites with this error message: The application Mozilla has unexpectedly quit. I did not see the freeze problem that Javi and John saw on Mac OS 9.
Comment 50•23 years ago
wtc: turn on Crash Reporter (run the Console app, and look in its Preferences). Then you should get a stack trace.
Assignee | ||
Comment 51•23 years ago
Assignee | ||
Comment 52•23 years ago
Here are the Mac build instructions to do a Mozilla build with NSS 3.4:
1. Pull mozilla/build/mac/build_scripts from the NSS_3_4_LANDING_BRANCH.
2. Follow the normal procedure.
If you can help us debug this, that will be much appreciated.
Comment 53•23 years ago
This crash looks like nsEventStateManager::ShiftFocus blowing out the stack again.
Comment 54•23 years ago
We would really appreciate it if we could get some help from the CPD Mac experts on this one. Wan-Teh posted the NSS 3.4 Mac build instructions in comment #52. Thanks.
Comment 55•23 years ago
That nsEventStateManager crasher should only show up in the Classic theme. Try switching to the Modern theme, and testing again. If it doesn't crash, you're ok.
Assignee | ||
Comment 56•23 years ago
Simon, You are right. Mozilla does not crash after switching to the modern theme. So, the crash when using the classic theme is not my fault? Javi, John, could you test the Classic build with NSS 3.4 on Mac OS 9.x? Thanks.
Comment 57•23 years ago
doesn't block 116334. This is not an NSS3.4 issue.
No longer blocks: 116334
Assignee | ||
Comment 58•23 years ago
The only issue right now is that the Carbon build with NSS 3.4 crashes when using the Classic theme. Based on the crash log (attachment 67013) and Simon Fraser's comment #53, I don't think this is an NSS 3.4 issue. Marked the bug WORKSFORME.
Status: NEW → RESOLVED
Closed: 23 years ago → 23 years ago
Resolution: --- → WORKSFORME
Comment 59•23 years ago
I am seeing random crashes when I access https pages. It will work for a while and then quit with a "Mozilla has unexpectedly quit" error in the Finder. I am on Mac OS 9.2.2 using a recent nightly (not sure which one; I will check when I get home). I will also attach a crash report when I get home and repro the problem. I had seen this behaviour before but now it is much more rampant. I have tried with a clean profile. I am using the Modern theme.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Comment 60•23 years ago
Sorry for the spam, I have opened a new bug 127278 with my problems in it. Can someone else put this back at WFM as I can't.
Assignee | ||
Comment 61•23 years ago
Marked the bug WORKSFORME.
Status: REOPENED → RESOLVED
Closed: 23 years ago → 23 years ago
Resolution: --- → WORKSFORME