Closed Bug 30648 Opened 25 years ago Closed 25 years ago

nsSocketTransports (from nsMsgProtocol.cpp) not getting removed from mActiveTransportList (reaching limit of 50)

Tracking

(Not tracked)

Status:

VERIFIED FIXED

Milestone:

M14

People

(Reporter: trudelle, Assigned: sspitzer)

Details

(Whiteboard: [PDT+]3/15 verified/seems OK win32 b1, but need more soaking)

Attachments

(4 files)

a first draft at a fix. 25 years ago (not reading, please use seth@sspitzer.org instead) 1.44 KB, patch		Details \| Diff \| Splinter Review
the fix for this bug. 25 years ago (not reading, please use seth@sspitzer.org instead) 2.37 KB, patch		Details \| Diff \| Splinter Review
latest version of the fix. 25 years ago (not reading, please use seth@sspitzer.org instead) 1.46 KB, patch		Details \| Diff \| Splinter Review
the diff again, in -p format. sorry about that. 25 years ago (not reading, please use seth@sspitzer.org instead) 3.40 KB, patch		Details \| Diff \| Splinter Review

Peter Trudelle

Reporter

Description

•

25 years ago

I don't like to open bugs without a finite set of steps to reproduce, but I've
been seeing a very short MTBF while browsing newsgroups using recent
(commercial) verification builds. It probably hasn't changed suddenly, but until
recently no part of the browser had an MTBF that was any longer, so it didn't
stand out so much. I only check a few newsgroups, but they have lots (thousands)
of messages.  Here's a typical scenario:

Launch into mail, or into browser and then open a mail window.
Login to IMAP server, read a few messages, respond to a few.
load n.p.m.xpfe & read a few messages
Load n.p.m.ui, read several messages, respond to a few.
After 20-30 minutes, all loads/reads will fail.

Up until a few weeks ago, I couldn't read more than 15 articles or so before the
operation would fail. Message loading would start out slow, then gradually get
much slower until it failed. Now I can read a lot more, but memory use continues
to go up quickly with each newsgroup loaded.  For instance, loading n.p.m.ui
typically eats about 12 MB or RAM.  Using MemTurbo, I've noticed that the
failure to load newsgroups or read messages corresponds roughly with running out
of free RAM.

Richard Zach

Comment 1

•

25 years ago

May be related to bug 29808. Peter, when loads/reloads fail, is it a network
problem or a repaint problem? In otehr words, if you alt-tab away from Mozilla
and back, does the message display?  Anyway, this is Seth's.

Assignee: phil → sspitzer

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 2

•

25 years ago

zach, I thought #29808 was linux only.

a while back I had a bug where I wasn't closing the nntp connection.

after the client opened enough (without closing any) the server start refusing,
and this sounds similar.

but I'll also investigate the memory leaking / running out of memory issue that
peter points out.

Status: NEW → ASSIGNED

Peter Trudelle

Reporter

Comment 3

•

25 years ago

It is not a refresh problem, the status and throbber show operations in progress, 
and forced repaints have no effect. The key problem for me is that the message 
load times seem to start out slow and get progressively longer until they fail 
altogether.  I mention the memory use only because it might be relevant. I'm not 
claiming it is a leak or even a problem, for all I know it is being put to good 
use as a cache.

lchiang

Comment 4

•

25 years ago

I'm wondering if it could be similar to the bug I filed a while 
back:  http://bugzilla.mozilla.org/show_bug.cgi?id=20653

Peter Trudelle

Reporter

Comment 5

•

25 years ago

Yes, it is similar, except this is happening in News, on Win98, I'm not using
Biff for anything, it is easy to reproduce (100%).  It sounds just like what
asadotzler@netscape.net was describing in 20653.  I think it is a dogfood issue,
adding keyword. I'm very motivated to use this product, but for News it is
impractical.  One more note: one of the times it happened today was with 40 MB
free RAM, so it may have nothing to do with low memory.

Keywords: dogfood

leger

Comment 6

•

25 years ago

Putting on PDT+ for beta1.

Whiteboard: [PDT+]

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 7

•

25 years ago

investigating this now.  I'll start by making sure we are closing all nntp
connections.

Target Milestone: M14

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 8

•

25 years ago

good news, is that on linux, I'm not seeing the nntp connections being left
open.  so most likely, that isn't the problem on windows either (but I'm going
to check.)

I was able to post messages and read messages for a while on linux, switching
between newsgroups.

I'll continue to try to reproduce.

Peter Trudelle

Reporter

Comment 9

•

25 years ago

I haven't seen this problem on Linux either, it could be limited to Win98.

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 10

•

25 years ago

updating summary and keywords.

Keywords: beta1

Whiteboard: [PDT+] → [PDT+](trying to reproduce)

buster

Comment 11

•

25 years ago

I displayed about 50 mail messages in about 15 minutes with today's debug build 
on WinNT.  No problems, other than the expected memory leaks (approx 10-100K 
per message.)  Maybe the reported problem really is just a low memory problem, 
even though that doesn't jive with Peter's comment. A debug stack from the 
crash would sure help.

Peter Trudelle

Reporter

Comment 12

•

25 years ago

What crash? This is a failure to load messages while reading news on Windows 98;
has anyone tried to reproduce it there?
It happens to me all the time, so if anyone is interested I can demonstrate it.
I don't think it is low RAM, but why on earth
would we expect leaks of 100K per message?

Whiteboard: [PDT+](trying to reproduce) → [PDT+](trying to reproduce on other platforms)

lchiang

Comment 13

•

25 years ago

I see this also.  Using today's build.  When this happens, I can't load any 
newsgroup messages (even in a different newsgroup).  I can go back to my mail 
folder and load a mail (IMAP) message.  Then, I try to go back to news and still 
can't load.  I'll see if I can narrow down further, but I'm out most of the day 
Thurs.  This is on Win32 system.

lchiang

Comment 14

•

25 years ago

Seth - if you want, you can use my notebook while I'm out tomorrow.  Just be 
sure to let me know so that I can turn off my screen saver password.

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 15

•

25 years ago

I'm getting somewhere.  god bless the MSVC++ debugger.

deep down in nsSocketTransportService.cpp, the list of active transport sockets 
keeps growing, until we start failing.  (the get filled up with transports to 
the news server.)

something is not cleaning up properly and I'm still looking for it.

Whiteboard: [PDT+](trying to reproduce on other platforms) → [PDT+](reproducable, debugging and working on a fix.)

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 16

•

25 years ago

yep, we start failing because we get 50 connections in the mActiveTransportList

MAX_OPEN_CONNECTIONS is set to 50

still trying to figure out why connections aren't getting properly removed.

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 17

•

25 years ago

it may be something bigger than just news.

it may be something with nsMsgProtocol.cpp

if I do "get new messages" on a pop account, nsSocketTransports get left in the
mActiveTransport list.

after 50 get new messages, I'll start failing on that too, because the limit has 
been reached.

Priority: P3 → P1

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 18

•

25 years ago

adding necko team members to see if they have any ideas.

I'm still debugging and trying to come up with a fix.

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 19

•

25 years ago

changing summary to reflect the real problem.

Summary: Very short MTBF reading newsgroups → nsSocketTransports (from nsMsgProtocol.cpp) not getting removed from mActiveTransportList (reaching limit of 50)

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 20

•

25 years ago

I have a fix, but is not the final fix.

the problem, I think, is that transport never gets removed.

we call nsSocketTransport::doWrite() and eventuall, we write out the entire 
buffer, then we get to line 1070

  else if (NS_BASE_STREAM_WOULD_BLOCK == rv) {
    //
    // If the buffer is empty, then notify the reader and stop polling
    // for write until there is data in the buffer.  See the OnWrite()
    // notification...
    //
    NS_ASSERTION((0 == totalBytesWritten), "returned NS_BASE_STREAM_WOULD_BLOCK 
and a writeCount");
    if (GetFlag(eSocketWrite_Sync)) {
      // We can only wait if we created the stream (a pipe -- created in the 
synchronous 
      // OpenOutputStream case). If we didn't create it, we couldn't have been 
the buffer
      // observer, so we won't get any notification when more data becomes 
available. 
      SetFlag(eSocketWrite_Wait);
      mSelectFlags &= (~PR_POLL_WRITE);
    }

my solution is to expose a CloseTransport() method and call it 
from nsMsgProtocol::CloseSocket()

you get the idea.  I'll attach the patch.

I'd like to discuss the correct solution with the necko guys.

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 21

•

25 years ago

Attached patch a first draft at a fix. — Details — Splinter Review

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 22

•

25 years ago

the fix I have in hand solves the problem.
closed connections get removed from the mActiveTransports array so we never hit 
50.

Whiteboard: [PDT+](reproducable, debugging and working on a fix.) → [PDT+](possible fix in hand)

Jim Roskind

Comment 23

•

25 years ago

Adjusted whiteboard based on reading of comment (i.e., the fix "works").  I
still assume you need review and testing.  IF other folks have provided that...
please feel free to add that to the bug report, to help move this along.
Thanks,
Jim

Whiteboard: [PDT+](possible fix in hand) → [PDT+] fix in hand

Warren Harris

Comment 24

•

25 years ago

Seth: Why do you add a new CloseTransport method? Why can't you just call 
Cancel?

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 25

•

25 years ago

Cancel is part of the nsIRequest interface?  yes, I could just QI to nsIRequest 
and call cancel.

The reason I didn't was I wasn't sure cancel going to be the right thing to do.

Cancel() has the desired side effect (of removing the nsSocketTransport).

Is there a better, more proper way to do this?

Warren Harris

Comment 26

•

25 years ago

You don't even need to QI. Just call Cancel on the transport directly.

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 27

•

25 years ago

in nsMsgProtocol, all I have is a nsIChannel

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 28

•

25 years ago

duh.  never mind.  I can do what warren suggests, just calling cancel on the 
nsIChannel.

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 29

•

25 years ago

ok, I've got a cleaned up version of the patch.

I'll attach in a second for warren's review.

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 30

•

25 years ago

Attached patch the fix for this bug. — Details — Splinter Review

Warren Harris

Comment 31

•

25 years ago

Shouldn't that read:

     if (m_channel) {
        rv = m_channel->Cancel();

?

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 32

•

25 years ago

yes, thanks for catching that.

that cleaner than calling nsMsgProtocol::Cancel()

Warren Harris

Comment 33

•

25 years ago

ok then... r=warren

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 34

•

25 years ago

seeking PDT approval now.

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 35

•

25 years ago

I've got the final version of the patch.  seeking review from mscott (for the 
additional nsMsgProtocol.cpp change) now.

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 36

•

25 years ago

Attached patch latest version of the fix. — Details — Splinter Review

Warren Harris

Comment 37

•

25 years ago

cvs diff -c is your friend
         ^^
I can't read your diff.

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 38

•

25 years ago

Attached patch the diff again, in -p format. sorry about that. — Details — Splinter Review

Warren Harris

Comment 39

•

25 years ago

Looks good.

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 40

•

25 years ago

seeking approval now.  mscott and warren have reviewed.

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 41

•

25 years ago

will check in fix this weekend.

Jim Roskind

Updated

•

25 years ago

Whiteboard: [PDT+] fix in hand → [PDT+]3/12 landing (fix in hand)

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 42

•

25 years ago

fixed.  I still need to land this on the beta1 branch, and I'll do that soon.

Status: ASSIGNED → RESOLVED

Closed: 25 years ago

Resolution: --- → FIXED

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 43

•

25 years ago

the fix for this has also been checked into the beta1 branch.

laurel

Updated

•

25 years ago

QA Contact: lchiang → laurel

laurel

Comment 44

•

25 years ago

I believe this is OK using 2000-03-15-06 nb1b commercial build NT 4.0.  I've
tried a few times and haven't seen a complete failure, although switching
newsgroups or back to mail folders then again to newsgroup is dog slow (seems
slower than it was last week).  Seems OK, but I'd like to give it some more soak
time tomorrow.  Won't mark verified until I test more tomorrow.

Whiteboard: [PDT+]3/12 landing (fix in hand) → [PDT+]3/15 verified/seems OK win32 b1, but need more soaking

laurel

Comment 45

•

24 years ago

OK, looks good to me.
Marking verified with:
2000-03-20-06 nb1b commercial build NT 4.0
2000-03-20-06 nb1b commercial build linux rh6.0
2000-03-20-06 nb1b commercial build mac OS 9.0

Status: RESOLVED → VERIFIED

Myk Melez [:myk] [@mykmelez]

Updated

•

20 years ago

Product: MailNews → Core

Nobody; OK to take it and work on it

Updated

•

16 years ago

Product: Core → MailNews Core

You need to log in before you can comment on or make changes to this bug.