Closed Bug 181230 Opened 23 years ago Closed 23 years ago

profile-change-net-teardown must work

Categories

(Core :: Networking, defect, P2)

defect

Tracking

()

VERIFIED FIXED
mozilla1.3beta

People

(Reporter: KaiE, Assigned: darin.moz)

References

Details

(Keywords: topembed+)

Attachments

(1 file, 4 obsolete files)

Working on bugs 177260 and 97622 I ran into an additional problem. I thought I had a good workaround, but the workaround requires direct manipulation of the lower NSPR data structures from outside the socket transport thread, and that is possibly dangerous. If, at the time we need to shut down PSM/NSS in order to switch profiles, there are still sockets alive, we have a real problem. Because the SSL library is layering itself on the file descriptor stack, the resources that SSL is holding into NSS can not be freed until the socket is closed. Actual behaviour: After the profile-change-net-teardown phase is over, there are still open sockets. Requested behaviour: During the profile-change-net-teardown phase, all open sockets must be closed synchronously.
Severity: normal → major
Status: NEW → ASSIGNED
Keywords: topembed
Priority: -- → P2
Target Milestone: --- → mozilla1.3alpha
Attached patch test patch (obsolete) — Splinter Review
this patch ensures that any active connections will not be added as idle connections after the net-teardown event fires.
kai says the "test patch" doesn't help.
Darin, in your test patch you try to modificate the http handler. I would have expected that the problem lies somewhere else. I see that mail is also not behaving correctly and stops us from switching profiles, because SSL sockets have not been closed. Are you able to tear down all sockets the hard way in the socket transport code, regardless where they come from?
kai: my assumption was that socket transport shutdown was happening, but that some connections were becoming IDLE (on a background thread) at the same time that we were processing the net-teardown event. i think my patch closes this hole, but obviously there is something else going on as well. i'm hoping to spend time on this today.
Attached patch v1 patch (obsolete) — Splinter Review
this patch fixes the "going offline while security dialog is visible" problem.
Attachment #106996 - Attachment is obsolete: true
kai: can you please give this latest patch a try and see if it fixes all of your testcases? thx!
Sorry, this patch does not change the behaviour, I still see the same problems. I would have assumed there is only one approach than can help us: When we receive the call to SetOffline, go to the socket transport code, synchronously iterate over the list of all sockets known by it, call close on each, and do not return from the notification until that is done. But that is probably dangerous, because other code has still references to the socket. So I suggest, instead of closing the files, which invalidates the file descriptors and makes other components crash: Use the hack that I proposed in bug 177260, but which you found to be unsafe when done from outside the socket thread. So the new approach might be: - SetOffline received - iterate over all sockets - for each socket, do the following hack: - use PR_NewTCPSocket to create a new unconnected socket - swap all the contents of the socket filedescriptor structure between the old and the hack socket - close the fd that is now no longer referenced by others
> I would have assumed there is only one approach than can help us: > When we receive the call to SetOffline, go to the socket transport code, > synchronously iterate over the list of all sockets known by it, call close on > each, and do not return from the notification until that is done. that is what the code tries to do. it cancels all of the socket transport requests, which must be processed on the socket transport thread. the thread which shutdown the socket transport service (the main thread) must wait until the socket transport thread exits (it does a join). this makes the call to shutdown the socket transport synchronous. kai: what i need is a testcase. i used the testcase of going offline while the security dialog is showing to arrive at this patch. is that what you were testing? before my patch i noticed PR_Close being called after the security dialog is closed, but with my patch PR_Close is called right away. please describe the steps you used to repro the problem. thx!
The testcase is: - add printf statements as the first line to nsNSSIOLayerClose and nsSSLIOLayerConnect in file nsNSSIOLayer.cpp - compile in optimized mode (I tested on Linux) - open two browser windows - in one window, open https://meine.db24.de - wait for the security window and leave it open - watch the console, you'll see the output from the open function - in the other window, click the statusbar offline icon Expected behaviour: The SSL close function should have been reached as soon as you clicked the offline icon. As a result, the console log should show the printf output from the close function. Actual behavior: No output on the console. Another testcase: - Configure mail to use IMAP / SSL. - Configure edit/privacy/certificates to "ask every time" - Open a mail and a browser window - in one window, connect to the server - should it prompt to select a certificate, cancel - you should see the prompt to enter your mail password - with the password prompt dialog open, switch to the browser window - click the offline icon. Expected: close is reached immediately. Actual: Nothing happens.
> i used the testcase of going offline while the > security dialog is showing to arrive at this patch. is that what you were > testing? before my patch i noticed PR_Close being called after the security > dialog is closed, but with my patch PR_Close is called right away Hmm, I can not confirm that, on my Linux system, close is not reached right away. It is not reached until I confirm the security warning dialog.
Attached patch v1.1 patch (obsolete) — Splinter Review
OK, this patch is similar to the other patch in that it only attempts to fix the problem for HTTPS.
Attachment #107248 - Attachment is obsolete: true
Attached patch v1.2 patch (obsolete) — Splinter Review
this one should really do the trick for HTTPS.
Attachment #107379 - Attachment is obsolete: true
This latest patch does indeed work for the https case. (Although it does not yet work for mail/ssl, as discussed).
Blocks: 182803
Attachment #107386 - Flags: superreview?(rpotts)
Attachment #107386 - Flags: review?(dougt)
Discussed in edt. Plussing because we need this for profile sharing.
Keywords: topembedtopembed+
Darin, is your intention to implement the general solution (which would include tearing down mail ssl sockets) within the scope of this bug?
Kai: yes, it is. but i want to get this HTTP patch in because i think it is anyways a good idea, and it might even fix a topcrash.
Ok, thanks!
Attachment #107386 - Flags: superreview?(rpotts) → superreview+
Attachment #107386 - Flags: superreview?(rpotts)
Attachment #107386 - Flags: superreview+
Attachment #107386 - Flags: review?(dougt)
Attachment #107386 - Flags: review+
Comment on attachment 107386 [details] [diff] [review] v1.2 patch seeking drivers approval to check this into the trunk for moz 1.3 alpha. this patch not only helps fix this bug for HTTPS connections, but it might also solve topcrash bug 179391. i think this patch has moderate risk.
Attachment #107386 - Flags: superreview?(rpotts)
Attachment #107386 - Flags: superreview+
Attachment #107386 - Flags: approval1.3a?
Comment on attachment 107386 [details] [diff] [review] v1.2 patch a=asa for checkin to 1.3a
Attachment #107386 - Flags: approval1.3a? → approval1.3a+
landed patch on trunk. next, i'll look into how to fix this in general. hopefully this patch will help with the topcrash shutdown bug.
Nominating to block 1.3b, because not having this fix results in crash bug 182803.
Flags: blocking1.3b?
Blocks: 187501
Comment on attachment 107386 [details] [diff] [review] v1.2 patch In the past, when people landed patches on the trunk, but decided to keep the bug open, I saw they marked the landed patches as obsolete to minimize potential confusion. I think that's a good idea. Marking obsolete.
Attachment #107386 - Attachment is obsolete: true
should be fixed by my patch for bug 176919.
Depends on: 176919
Target Milestone: mozilla1.3alpha → mozilla1.3beta
Blocks: 188558
FIXED with patch from bug 176919.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Flags: blocking1.3b?
*** Bug 188558 has been marked as a duplicate of this bug. ***
V/fixed. This has worked for some time. I haven't tested it with a set of unit cases, since profile changing is available only in seamonkey UI. I'd like to do this in the future as an high-level integration test, maybe as an extension of the networking-core functional test I wrote (which pokes a lot of network protocols at once). Bother me if you care.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: