Closed Bug 194402 Opened 22 years ago Closed 22 years ago

publishing doesn't complete; opens multiple FTP control connetctions

Categories

(Core Graveyard :: Networking: FTP, defect, P1)

defect

Tracking

(Not tracked)

VERIFIED FIXED
mozilla1.4beta

People

(Reporter: chrispetersen, Assigned: darin.moz)

References

()

Details

(Keywords: topembed-)

Attachments

(4 files, 1 obsolete file)

Build: 2003-02-20-03(OS X Mach-0), 2003-02-20-04 (Win32)
Platform: All
Expected Results: The publishing process should finish the transfer of image amd
html data.
What I got: During the transfer of images to the server, the Composer
application crashes.

Steps to reproduce:

1) Go to www.apple.com
2) Select Edit page from File menu
3) In Composer, select Publish as
4) In the Settings tab, type a ftp address in the Publishing address field
5) Type a valid user name and password
6) In the Publish tab, type file name with a .html. Click Publish.
7) The file transfer should begin. After a dozen or so images have been
transfered, the application crashes.

Note: This doesn't happen at every site. I can't reproduce , for example,  with
google.com or a simple HTML page created in Composer. It could be related to
number on elements on the page perhaps.
Also, this problem occurs in the 7.02 (OS X and Linux) builds.
brade suggests assigning to darin...
Assignee: dougt → darin
Severity: normal → critical
Keywords: crash
 #0   0x00000000 in 0x0
 #1   0x0087a0c8 in nsSocketTransportService::RememberHost(nsACString const&,
unsigned short, PRIPv6Addr*)

ooh... this is a duplicate of bug 192207.

*** This bug has been marked as a duplicate of 192207 ***
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → DUPLICATE
Attempting to publish this document results in a crash.
interesting, publishing this page seems to cause us to create an endless number
of FTP connections.  as a result we hit the hard limit on the number of allowed
socket connections.  bug 192207 is about the fact that we crash when we reach
that hard limit.  with my patch for bug 192207 in my tree, we no longer crash. 
however, publishing still fails.

i'm going to reopen this bug to track the publishing failure.  i'm removing the
crash keyword and updating the summary to reflect the real problem.
Severity: critical → major
Status: RESOLVED → REOPENED
Depends on: 192207
Keywords: crash
Priority: -- → P2
Resolution: DUPLICATE → ---
Summary: While publishing to a ftp server from Composer, the application suddenly crashes → publishing doesn't complete; keeps opening new FTP connections
Target Milestone: --- → mozilla1.3final
ok, the problem is that we don't do anything to gate or limit the number of
simultaneous FTP connections.  netstat indicates that a bunch of control and
data connections have been opened.  it looks like the old socket transport
service had some code to make things continue to work even when the hard limit
on the number of sockets is reached, but there was a big XXX comment indicating
that this condition was not fully implemented.  perhaps it just barely worked
before.  ultimately, i think we need the socket transport to recover gracefully
when the hard limit on the number of sockets is reached, and maybe we need FTP
to limit how many sockets it tries to open.
ftp should be being limited by the per-host limit in teh socket transport
service, shouldn't it?
both.

we should limit the control to 4-5 per site and a total number of controls to
12-15.  In prefs, of course.
I'm grasping at straws here...

What I don't fully understand at the moment is that the persist code is supposed
to be attempting to serialize publishing of multiple files.  Composer appears to
be setting the flag for non-local files that are published and I don't see any
changes in the past few weeks in the persist code in that vicinity.  Was it
working based on some old assumption for how necko worked?  

Long ago, I did setup my own (linux) ftp server and limit the number of
connections that could be made to 1 and I was able to successfully publish an
html file with several images.

Why are the connections not closing?  Perhaps I have the pref set for immediate
timeout?
-> 1.4
Status: REOPENED → ASSIGNED
Target Milestone: mozilla1.3final → mozilla1.4alpha
brade: I'm going through the current ftp bugs, and I don't see any reports of
excessive ftp control connections from downloading, so perhaps it has only to do
with uploading? If you use different functions than downloading, I could see how
the connection management logic might break somehow.

Any idea when this regressed?
Summary: publishing doesn't complete; keeps opening new FTP connections → publishing doesn't complete; opens multiple FTP control connetctions
Keywords: nsbeta1, topembed
Priority: P2 → P1
Discussed in edt.  Minusing.  Not a currrent concern for our embedding customers.
Keywords: topembedtopembed-
Severity: major → critical
Target Milestone: mozilla1.4alpha → mozilla1.4beta
Attached patch v1 patch (obsolete) — Splinter Review
this patch cleans up a lot of stuff.

1- socket transport now asks the socket transport service (STS) if it is ok to
"attach a new socket".	if the STS says yes, then the socket transport goes
ahead and does its normal stuff.  else, the socket transport asks the STS to
notify it when it can attach a new socket.  this allows us to limit the number
of sockets allocated by mozilla, which is a requirement on some platforms.

2- now, to play nicely with this, FTP also has to limit the number of control
connections it caches.	so, i made nsFtpProtocolHandler::mRootConnectionList
hold no more than 8 connections.  the list is ordered by age, so i just trim
from the head of the list.  well, actually... i look for the eldest connection
to the same host and trim that first.  if no matches, then i trim the eldest
overall.

3- finally, when publishing we don't close the data connection until the
application shuts down.  i fixed this by ensuring that mDPipe is closed when
the channel completes.	so, just like we have the channel telling the state
"DataConnectionEstablished" ... we now have the channel telling the state
"DataConnectionComplete".  i tried to avoid this explicit function call, but i
couldn't come up with anything nearly as clean and simple.

4- finally, i touched up a few things in the async stream copier... mainly
cosmetic.
Attachment #119315 - Flags: review?(dougt)
Comment on attachment 119315 [details] [diff] [review]
v1 patch

I don't understand this comment before or after this change.  Maybe we can
explain why we are cheating?

-    // we'll cheat a little bit here since we know that NS_AsyncCopy does pass
-    // an event queue.
+    // we'll cheat a little bit here since we know that NS_AsyncCopy does not
+    // pass an event queue.



I don't see the nsISocketEventHandler interface file.  

You may want to reorder these so that the more common interface is the first in
the list:

+NS_IMPL_THREADSAFE_ISUPPORTS4(nsSocketTransport,
+			       nsISocketEventHandler,
			       nsISocketTransport,
			       nsITransport,
			       nsIDNSListener)


don't you have a race in InitiateSocket?  Suppose the limit is reached on the
number of sockets created.  A new request comes in, and you return with
|notifyWhenCatAttachSocket|.  Sometime later a socket becomes available and
before the MSG_RETRY_INIT_SOCKET event is handled (but is already has been
posted), a different socket calls CanAttachSocket()


Jud made me to it, i swear! :

-	     (void) mFTPState->DataConnectionEstablished();
+	     mFTPState->DataConnectionEstablished();

I am worried about cycles  with the new ftp handler global.  Can you make that
we do clean up properly.

Extra points for making any references to "ftp" consistent wrt case. :-)
>I don't understand this comment before or after this change.  Maybe we can
>explain why we are cheating?

because we ignore the "eventQ" parameter (assuming it will always be null). 
i'll add a comment to that effect.


>I don't see the nsISocketEventHandler interface file.

it is included at the bottom of nsISocketTransportService.idl


>You may want to reorder these so that the more common interface is the first in
>the list:

i chose that ordering based on the fact that nsISocketEventHandler would
probably be the most commonly QI'd interface (given that it is part of the event
dispatch mechanism for the socket thread).


>don't you have a race in InitiateSocket?

yes, and that is ok.  it is by design.  to avoid that race would have required
re-entrant callbacks.  i would prefer to avoid that.  since each call to
InitiateSocket checks CanAttachSocket, everything will be fine.  sure, an older
socket request will sometimes lose out, but i don't think that will be a big
issue.  i'll document the code.

as for DataConnectionEstablished, it returns |void| now =)


>I am worried about cycles  with the new ftp handler global.

nsFtpProtocolHandler owns no references to the nsFtpState, so i think this is
safe.  plus, my logs confirm that each nsFtpState is destroyed, so i think this
should be safe.


extra points... no thanks! ;-)
It sounds like this patch will also fix bug 193908 (because FTP data connections
will now be closed in a timely fashion). Darin, do you agree?
fix the comment to be more verbose.  
You may want to document the fact that there is a race condition and mention
this bug.
mark: yeah, it should fix that too.
Blocks: 193908
Attached patch v1.1 patchSplinter Review
Attachment #119315 - Attachment is obsolete: true
Attachment #119315 - Flags: review?(dougt)
Attachment #119341 - Flags: superreview?(bzbarsky)
Attachment #119341 - Flags: review?(dougt)
Attachment #119341 - Flags: review?(dougt) → review+
Attachment #119341 - Flags: superreview?(bzbarsky) → superreview?(alecf)
Attachment #119341 - Flags: superreview?(alecf) → superreview+
fixed-on-trunk
Status: ASSIGNED → RESOLVED
Closed: 22 years ago22 years ago
Resolution: --- → FIXED
Talking to brade in #composer yesterday, there was a thought that perhaps this
patch would fix the problem with publishing to a webdav server.  We've tested
this here at OEone, and it doesn't fix the problems I reported in bug 201210.
Publishing to a webdav server is still broken with patch 1.1 applied.
right, this patch has nothing to do with HTTP.
Ok, using the reduced test case this problem is no longer occuring. In previous
builds, using this test case would reproduce the crash. Verified on the
2003-04-10-08 Macho and 2003-04-10-04 Win32 trunk builds.
Status: RESOLVED → VERIFIED
Depends on: 201663
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: