Closed Bug 84572 Opened 24 years ago Closed 24 years ago

networking deadlock while posing security UI and closing window

Categories

(Core :: Networking, defect, P1)

x86
Windows 2000
defect

RESOLVED FIXED
mozilla0.9.2

People

(Reporter: danm.moz, Assigned: darin.moz)

Details

(Whiteboard: r=danm, rs=dougt, a=asa)

Attachments

(3 files)

This'll be fun. We have other problems at the above site, but this deadlock is new within the past week or two, and it's preventing me from getting at another problem (bug 78504).

Steps to reproduce:
1. Go to http://www.ebanking.lu/
2. Click the Accès Membre link in the lower left
3. Mozilla deadlocks

The first attachment is a stack trace in a worker thread doing socket transport. nsSocketTransport::Process (near the bottom) enters a monitor. The stack continues through the HTTP protocol code and on into Security, which proxies a call over to the UI thread, presumably to open a dialog. (It's at nsNSSIOLayer.cpp rev 1.25 line 625, badCertHandler->UnknownIssuer(). At the time the app freezes, it does seem to be trying to pose a dialog.)

Meanwhile, the second attachment is a stack trace in the UI thread. It's fielding a focus event which executes some JS from the site, which apparently calls window.close(). While closing, nsDocShell::Stop tries to shut down IO, and you can see nsSocketTransport trying to enter the same monitor that's currently held by the worker thread from the first attachment.

To be fair, this freeze is the child of some awful things. But Brendan and I believe the networking code is at least partly to blame: it holds a monitor while calling out to a listener interface object, which is free to do pretty much anything, and does. Any takers?
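For readers who want the shape of the problem in miniature, here is a rough sketch of the pattern described above. It is not Mozilla code: ToyTransport, ProcessHoldingLock, TryCancel and UiCanCancelDuringCallout are hypothetical names, and a std::timed_mutex with a timeout stands in for the monitor so that the demo reports the stall instead of hanging.

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <mutex>

// Hypothetical stand-in for a socket transport guarded by a single monitor.
class ToyTransport {
public:
    // Models the worker thread: the monitor is held for the whole callout.
    void ProcessHoldingLock(const std::function<void()>& listener) {
        std::lock_guard<std::timed_mutex> guard(mMonitor);
        listener();  // arbitrary listener code runs while mMonitor is held
    }

    // Models the UI thread shutting down IO. Returns false if the monitor
    // could not be entered within the timeout; a real monitor has no
    // timeout and would simply block here.
    bool TryCancel(std::chrono::milliseconds timeout) {
        std::unique_lock<std::timed_mutex> lock(mMonitor, timeout);
        if (!lock.owns_lock()) {
            return false;
        }
        mCancelled = true;
        return true;
    }

private:
    std::timed_mutex mMonitor;
    bool mCancelled = false;
};

// Returns true if the "UI thread" managed to cancel while the worker was
// still inside its listener callout.
bool UiCanCancelDuringCallout() {
    ToyTransport transport;
    bool cancelled = true;
    transport.ProcessHoldingLock([&] {
        // The listener proxies to the "UI thread" and waits for its answer,
        // while the UI thread wants the monitor the worker is holding.
        auto ui = std::async(std::launch::async, [&] {
            return transport.TryCancel(std::chrono::milliseconds(50));
        });
        cancelled = ui.get();
    });
    return cancelled;
}
```

In this sketch UiCanCancelDuringCallout() comes back false: the worker waits on the UI thread and the UI thread waits on the monitor. Without the timeout, neither side would ever make progress.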
Thinking Doug might want first crack...
Assignee: neeti → dougt
Blocks: 78504
sure. i will look at it for 0.9.2
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.2
darin has a dup.
Assignee: dougt → darin
Status: ASSIGNED → NEW
I wondered when this would finally bite ;-)
Priority: -- → P1
I can't find the bug on this (maybe it got closed?!?)... will carry on here.
Status: NEW → ASSIGNED
after inspecting the stack traces carefully, i realize that this is a bit more interesting than i had originally thought. the real problem is not so much the fact that we call out to OnDataAvailable while holding the lock, but the fact that we call PR_Read while holding the lock!! either nsSocketTransport::Dispatch needs to be rewritten so that it does not need to enter the lock (since it queues up an async event anyway), or we need to find a way to not hold the socket transport lock while calling PR_Read.

at first, i thought this bug resulted from my HTTP branch landing, since i am on the stack, but actually we would have faced this exact same bug regardless. it is really PSM becoming in-process that triggered this bug.

i already have a ref count on the socket PRFileDesc for use with blocking socket i/o... perhaps i can use the same sort of mechanism to allow us to drop the lock while calling out to PR_Read. investigating...
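a rough sketch of the "take a reference, drop the lock, then read" idea, for illustration only. this is not the attached patch: ToySocket, ToyFd, Read, Close and CloseDuringReadIsSafe are hypothetical names, a std::shared_ptr stands in for the ref count on the PRFileDesc, and the reader callback stands in for PR_Read.

```cpp
#include <functional>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical stand-in for a PRFileDesc.
struct ToyFd {
    bool closed = false;
};

class ToySocket {
public:
    ToySocket() : mFd(std::make_shared<ToyFd>()) {}

    // Calls `reader` (the stand-in for PR_Read) WITHOUT holding mLock.
    std::string Read(const std::function<std::string(ToyFd&)>& reader) {
        std::shared_ptr<ToyFd> fd;
        {
            std::lock_guard<std::mutex> guard(mLock);
            fd = mFd;  // take a reference while holding the lock
        }
        if (!fd) {
            return std::string();  // already closed
        }
        // The lock is dropped here, so the reader may block or re-enter us.
        std::string data = reader(*fd);

        std::lock_guard<std::mutex> guard(mLock);
        // If the socket was closed in the meantime, discard the data.
        return mFd ? data : std::string();
    }

    void Close() {
        std::lock_guard<std::mutex> guard(mLock);
        if (mFd) {
            mFd->closed = true;
            mFd.reset();  // readers holding a reference keep the fd alive
        }
    }

    bool IsOpen() {
        std::lock_guard<std::mutex> guard(mLock);
        return mFd != nullptr;
    }

private:
    std::mutex mLock;
    std::shared_ptr<ToyFd> mFd;
};

// Closes the socket from inside the read callout and reports whether the
// fd stayed valid for the reader and the socket ended up closed.
bool CloseDuringReadIsSafe() {
    ToySocket sock;
    bool fdSeenClosed = false;
    sock.Read([&](ToyFd& fd) {
        sock.Close();              // would self-deadlock if Read held mLock
        fdSeenClosed = fd.closed;  // fd is still alive thanks to the reference
        return std::string("x");
    });
    return fdSeenClosed && !sock.IsOpen();
}
```

the cost of this design is that the transport state can change during the read, so anything the read depended on has to be re-checked once the lock is taken again. here that is just the check on mFd after the callout.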
i've only tested this patch at a cursory level... more testing is definitely needed.
dougt says: rs=
Looks good -- with this patch Mozilla no longer locks up at the test site. The patch seems good to me. Exiting the monitor seems like a helpful thing, if you can afford it.
Whiteboard: r=danm, rs=dougt, a=?
Keywords: patch
Blocks: 83989
a= asa@mozilla.org for checkin to the trunk. (on behalf of drivers)
Whiteboard: r=danm, rs=dougt, a=? → r=danm, rs=dougt, a=asa
fix checked in
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Depends on: 77473
No longer depends on: 77473