Closed
Bug 30648
Opened 25 years ago
Closed 25 years ago
nsSocketTransports (from nsMsgProtocol.cpp) not getting removed from mActiveTransportList (reaching limit of 50)
Categories
(MailNews Core :: Backend, defect, P1)
Tracking
(Not tracked)
VERIFIED
FIXED
M14
People
(Reporter: trudelle, Assigned: sspitzer)
Details
(Whiteboard: [PDT+]3/15 verified/seems OK win32 b1, but need more soaking)
Attachments
(4 files)
1.44 KB,
patch
|
Details | Diff | Splinter Review | |
2.37 KB,
patch
|
Details | Diff | Splinter Review | |
1.46 KB,
patch
|
Details | Diff | Splinter Review | |
3.40 KB,
patch
|
Details | Diff | Splinter Review |
I don't like to open bugs without a finite set of steps to reproduce, but I've been seeing a very short MTBF while browsing newsgroups using recent (commercial) verification builds. It probably hasn't changed suddenly, but until recently no part of the browser had an MTBF that was any longer, so it didn't stand out so much. I only check a few newsgroups, but they have lots (thousands) of messages. Here's a typical scenario: Launch into mail, or into browser and then open a mail window. Login to IMAP server, read a few messages, respond to a few. load n.p.m.xpfe & read a few messages Load n.p.m.ui, read several messages, respond to a few. After 20-30 minutes, all loads/reads will fail. Up until a few weeks ago, I couldn't read more than 15 articles or so before the operation would fail. Message loading would start out slow, then gradually get much slower until it failed. Now I can read a lot more, but memory use continues to go up quickly with each newsgroup loaded. For instance, loading n.p.m.ui typically eats about 12 MB or RAM. Using MemTurbo, I've noticed that the failure to load newsgroups or read messages corresponds roughly with running out of free RAM.
Comment 1•25 years ago
|
||
May be related to bug 29808. Peter, when loads/reloads fail, is it a network problem or a repaint problem? In otehr words, if you alt-tab away from Mozilla and back, does the message display? Anyway, this is Seth's.
Assignee: phil → sspitzer
Assignee | ||
Comment 2•25 years ago
|
||
zach, I thought #29808 was linux only. a while back I had a bug where I wasn't closing the nntp connection. after the client opened enough (without closing any) the server start refusing, and this sounds similar. but I'll also investigate the memory leaking / running out of memory issue that peter points out.
Status: NEW → ASSIGNED
Reporter | ||
Comment 3•25 years ago
|
||
It is not a refresh problem, the status and throbber show operations in progress, and forced repaints have no effect. The key problem for me is that the message load times seem to start out slow and get progressively longer until they fail altogether. I mention the memory use only because it might be relevant. I'm not claiming it is a leak or even a problem, for all I know it is being put to good use as a cache.
I'm wondering if it could be similar to the bug I filed a while back: http://bugzilla.mozilla.org/show_bug.cgi?id=20653
Reporter | ||
Comment 5•25 years ago
|
||
Yes, it is similar, except this is happening in News, on Win98, I'm not using Biff for anything, it is easy to reproduce (100%). It sounds just like what asadotzler@netscape.net was describing in 20653. I think it is a dogfood issue, adding keyword. I'm very motivated to use this product, but for News it is impractical. One more note: one of the times it happened today was with 40 MB free RAM, so it may have nothing to do with low memory.
Keywords: dogfood
Assignee | ||
Comment 7•25 years ago
|
||
investigating this now. I'll start by making sure we are closing all nntp connections.
Target Milestone: M14
Assignee | ||
Comment 8•25 years ago
|
||
good news, is that on linux, I'm not seeing the nntp connections being left open. so most likely, that isn't the problem on windows either (but I'm going to check.) I was able to post messages and read messages for a while on linux, switching between newsgroups. I'll continue to try to reproduce.
Reporter | ||
Comment 9•25 years ago
|
||
I haven't seen this problem on Linux either, it could be limited to Win98.
Assignee | ||
Comment 10•25 years ago
|
||
updating summary and keywords.
Keywords: beta1
Whiteboard: [PDT+] → [PDT+](trying to reproduce)
Comment 11•25 years ago
|
||
I displayed about 50 mail messages in about 15 minutes with today's debug build on WinNT. No problems, other than the expected memory leaks (approx 10-100K per message.) Maybe the reported problem really is just a low memory problem, even though that doesn't jive with Peter's comment. A debug stack from the crash would sure help.
Reporter | ||
Comment 12•25 years ago
|
||
What crash? This is a failure to load messages while reading news on Windows 98; has anyone tried to reproduce it there? It happens to me all the time, so if anyone is interested I can demonstrate it. I don't think it is low RAM, but why on earth would we expect leaks of 100K per message?
Whiteboard: [PDT+](trying to reproduce) → [PDT+](trying to reproduce on other platforms)
Comment 13•25 years ago
|
||
I see this also. Using today's build. When this happens, I can't load any newsgroup messages (even in a different newsgroup). I can go back to my mail folder and load a mail (IMAP) message. Then, I try to go back to news and still can't load. I'll see if I can narrow down further, but I'm out most of the day Thurs. This is on Win32 system.
Comment 14•25 years ago
|
||
Seth - if you want, you can use my notebook while I'm out tomorrow. Just be sure to let me know so that I can turn off my screen saver password.
Assignee | ||
Comment 15•25 years ago
|
||
I'm getting somewhere. god bless the MSVC++ debugger. deep down in nsSocketTransportService.cpp, the list of active transport sockets keeps growing, until we start failing. (the get filled up with transports to the news server.) something is not cleaning up properly and I'm still looking for it.
Whiteboard: [PDT+](trying to reproduce on other platforms) → [PDT+](reproducable, debugging and working on a fix.)
Assignee | ||
Comment 16•25 years ago
|
||
yep, we start failing because we get 50 connections in the mActiveTransportList MAX_OPEN_CONNECTIONS is set to 50 still trying to figure out why connections aren't getting properly removed.
Assignee | ||
Comment 17•25 years ago
|
||
it may be something bigger than just news. it may be something with nsMsgProtocol.cpp if I do "get new messages" on a pop account, nsSocketTransports get left in the mActiveTransport list. after 50 get new messages, I'll start failing on that too, because the limit has been reached.
Priority: P3 → P1
Assignee | ||
Comment 18•25 years ago
|
||
adding necko team members to see if they have any ideas. I'm still debugging and trying to come up with a fix.
Assignee | ||
Comment 19•25 years ago
|
||
changing summary to reflect the real problem.
Summary: Very short MTBF reading newsgroups → nsSocketTransports (from nsMsgProtocol.cpp) not getting removed from mActiveTransportList (reaching limit of 50)
Assignee | ||
Comment 20•25 years ago
|
||
I have a fix, but is not the final fix. the problem, I think, is that transport never gets removed. we call nsSocketTransport::doWrite() and eventuall, we write out the entire buffer, then we get to line 1070 else if (NS_BASE_STREAM_WOULD_BLOCK == rv) { // // If the buffer is empty, then notify the reader and stop polling // for write until there is data in the buffer. See the OnWrite() // notification... // NS_ASSERTION((0 == totalBytesWritten), "returned NS_BASE_STREAM_WOULD_BLOCK and a writeCount"); if (GetFlag(eSocketWrite_Sync)) { // We can only wait if we created the stream (a pipe -- created in the synchronous // OpenOutputStream case). If we didn't create it, we couldn't have been the buffer // observer, so we won't get any notification when more data becomes available. SetFlag(eSocketWrite_Wait); mSelectFlags &= (~PR_POLL_WRITE); } my solution is to expose a CloseTransport() method and call it from nsMsgProtocol::CloseSocket() you get the idea. I'll attach the patch. I'd like to discuss the correct solution with the necko guys.
Assignee | ||
Comment 21•25 years ago
|
||
Assignee | ||
Comment 22•25 years ago
|
||
the fix I have in hand solves the problem. closed connections get removed from the mActiveTransports array so we never hit 50.
Whiteboard: [PDT+](reproducable, debugging and working on a fix.) → [PDT+](possible fix in hand)
Comment 23•25 years ago
|
||
Adjusted whiteboard based on reading of comment (i.e., the fix "works"). I still assume you need review and testing. IF other folks have provided that... please feel free to add that to the bug report, to help move this along. Thanks, Jim
Whiteboard: [PDT+](possible fix in hand) → [PDT+] fix in hand
Comment 24•25 years ago
|
||
Seth: Why do you add a new CloseTransport method? Why can't you just call Cancel?
Assignee | ||
Comment 25•25 years ago
|
||
Cancel is part of the nsIRequest interface? yes, I could just QI to nsIRequest and call cancel. The reason I didn't was I wasn't sure cancel going to be the right thing to do. Cancel() has the desired side effect (of removing the nsSocketTransport). Is there a better, more proper way to do this?
Comment 26•25 years ago
|
||
You don't even need to QI. Just call Cancel on the transport directly.
Assignee | ||
Comment 27•25 years ago
|
||
in nsMsgProtocol, all I have is a nsIChannel
Assignee | ||
Comment 28•25 years ago
|
||
duh. never mind. I can do what warren suggests, just calling cancel on the nsIChannel.
Assignee | ||
Comment 29•25 years ago
|
||
ok, I've got a cleaned up version of the patch. I'll attach in a second for warren's review.
Assignee | ||
Comment 30•25 years ago
|
||
Comment 31•25 years ago
|
||
Shouldn't that read: if (m_channel) { rv = m_channel->Cancel(); ?
Assignee | ||
Comment 32•25 years ago
|
||
yes, thanks for catching that. that cleaner than calling nsMsgProtocol::Cancel()
Comment 33•25 years ago
|
||
ok then... r=warren
Assignee | ||
Comment 34•25 years ago
|
||
seeking PDT approval now.
Assignee | ||
Comment 35•25 years ago
|
||
I've got the final version of the patch. seeking review from mscott (for the additional nsMsgProtocol.cpp change) now.
Assignee | ||
Comment 36•25 years ago
|
||
Comment 37•25 years ago
|
||
cvs diff -c is your friend ^^ I can't read your diff.
Assignee | ||
Comment 38•25 years ago
|
||
Comment 39•25 years ago
|
||
Looks good.
Assignee | ||
Comment 40•25 years ago
|
||
seeking approval now. mscott and warren have reviewed.
Assignee | ||
Comment 41•25 years ago
|
||
will check in fix this weekend.
Updated•25 years ago
|
Whiteboard: [PDT+] fix in hand → [PDT+]3/12 landing (fix in hand)
Assignee | ||
Comment 42•25 years ago
|
||
fixed. I still need to land this on the beta1 branch, and I'll do that soon.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 43•25 years ago
|
||
the fix for this has also been checked into the beta1 branch.
Comment 44•25 years ago
|
||
I believe this is OK using 2000-03-15-06 nb1b commercial build NT 4.0. I've tried a few times and haven't seen a complete failure, although switching newsgroups or back to mail folders then again to newsgroup is dog slow (seems slower than it was last week). Seems OK, but I'd like to give it some more soak time tomorrow. Won't mark verified until I test more tomorrow.
Whiteboard: [PDT+]3/12 landing (fix in hand) → [PDT+]3/15 verified/seems OK win32 b1, but need more soaking
Comment 45•24 years ago
|
||
OK, looks good to me. Marking verified with: 2000-03-20-06 nb1b commercial build NT 4.0 2000-03-20-06 nb1b commercial build linux rh6.0 2000-03-20-06 nb1b commercial build mac OS 9.0
Status: RESOLVED → VERIFIED
Updated•20 years ago
|
Product: MailNews → Core
Updated•16 years ago
|
Product: Core → MailNews Core
You need to log in
before you can comment on or make changes to this bug.
Description
•