Closed
Bug 1129345
Opened 10 years ago
Closed 10 years ago
Update ftp://ftp.mozqa.com FTP server to accept multiple FTP control connections
Categories
(Infrastructure & Operations Graveyard :: WebOps: Other, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: teodruta, Assigned: cliang)
References
Details
(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/408])
We need to update the ftp://ftp.mozqa.com FTP server to accept multiple (concurrent) control connections from the same host.
We run multiple remote mozmill tests in mozmill-ci which use this ftp server.
For reference please look at Bug 117875.
The connection to the FTP server sometimes persists even after the FTP tab is closed, until Firefox receives a shutdown signal.
What this means for the remote mozmill-tests:
We have a test which uses this ftp server to download a file.
After this test there is another one which navigates through the FTP layout of the mozqa ftp server.
This test may fail to connect to the mozqa ftp server if the connection for the previous test is still maintained.
Steps:
1. Open ftp://ftp.mozqa.com in a firefox tab
2. Open netstat.
3. Observe TIME_WAIT state for the ftp connection to the mozqa server on port 21.
TIME_WAIT - local endpoint has closed the connection.
This state persists for around 1 minute before it changes to CLOSE_WAIT.
CLOSE_WAIT - remote endpoint has closed the connection.
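The netstat check in the steps above could be scripted roughly as follows. This is only a sketch: the function name `ftp_control_states` is made up for illustration, and it assumes `netstat -an` style output where addresses are written as `address:port` and the last three columns are local address, remote address and state (as on Linux and Windows, without extra columns such as PID).

```python
import re

# Matches the ":port" suffix of a netstat address column (assumed "address:port" form).
_PORT_RE = re.compile(r":(\d+)$")


def ftp_control_states(netstat_text, port=21):
    """Return (local, remote, state) for TCP rows whose remote port matches `port`."""
    rows = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers, blank lines and non-TCP rows.
        if len(fields) < 4 or not fields[0].lower().startswith("tcp"):
            continue
        # Assumption: the last three columns are local address, remote address, state.
        local, remote, state = fields[-3], fields[-2], fields[-1]
        match = _PORT_RE.search(remote)
        if match and int(match.group(1)) == port:
            rows.append((local, remote, state))
    return rows
```

Feeding it the captured output of `netstat -an` taken right after closing the FTP tab would list any control connections to port 21 together with their current state.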
Steps (2):
// This requires a stand-alone FTP client, such as FileZilla
1. Open ftp://ftp.mozqa.com in a firefox tab
2. Open a connection with ftp://ftp.mozqa.com in filezilla
3. Navigate anywhere in the open mozqa ftp firefox tab
4. Observe that the connection is lost in the stand-alone ftp client (filezilla)
| Reporter | ||
Updated•10 years ago
|
Component: WebOps: IT-Managed Tools → WebOps: Other
| Assignee | ||
Comment 1•10 years ago
|
||
It is very common practice to not immediately close out connections because it's easier / uses less resources to re-use an existing connection than to constantly build them up and tear them down. I'm assuming that the stand-alone client sends an explicit signal to close the connection while firefox itself does not.
Looking at the load balancer, connections are closed if:
- when the connection is first opened, no additional data comes within 10 seconds and
- an existing connection gets no new data for 40 seconds
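To make the two rules above concrete, here is a small model of them. It is a sketch only: the function and constant names are hypothetical, and whether the load balancer closes a connection exactly at the threshold or only after it is an assumption.

```python
# Thresholds taken from the two rules above, in seconds.
INITIAL_DATA_TIMEOUT = 10
IDLE_TIMEOUT = 40


def lb_closes_connection(seconds_since_open, seconds_since_last_data=None):
    """Return True if, under the rules above, the load balancer would close the connection.

    `seconds_since_last_data` is None when no data has arrived since the
    connection was opened.
    """
    if seconds_since_last_data is None:
        # Rule 1: a freshly opened connection that stays silent.
        return seconds_since_open >= INITIAL_DATA_TIMEOUT
    # Rule 2: an existing connection that has gone idle.
    return seconds_since_last_data >= IDLE_TIMEOUT
```

Under this model, a control connection that Firefox leaves open but idle would be dropped by the load balancer about 40 seconds after its last command.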
Comment 2•10 years ago
|
||
(In reply to Teodor Druta from comment #0)
> We have a test which uses this ftp server to download a file.
> After this test there is another one which navigates through the FTP layout
> of the mozqa ftp server.
> This test may fail to connect to the mozqa ftp server if the connection for
> the previous test is still maintained.
Teodor, would you mind creating a minimized Mozmill test which exercises exactly this situation? I think I am still missing something, especially because we are only talking about Firefox here. It should know about the state and handle it appropriately. I think I would have to investigate that a bit more closely first.
> Steps:
> 1. Open ftp://ftp.mozqa.com in a firefox tab
> 2. Open netstat.
> 3. Observe TIME_WAIT state for the ftp connection to the mozqa server on
> port 21.
>
> Steps (2):
> // For this is required a stand-alone ftp client, like filezilla
> 1. Open ftp://ftp.mozqa.com in a firefox tab
> 2. Open a connection with ftp://ftp.mozqa.com in filezilla
> 3. Navigate anywhere in the open mozqa ftp firefox tab
> 4. Observe that the connection is lost in the stand-alone ftp client
> (filezilla)
Those are not valid steps for us given that two different applications have been used here. Please stay with Firefox only.
Flags: needinfo?(teodor.druta)
| Reporter | ||
Comment 3•10 years ago
|
||
(In reply to Henrik Skupin (:whimboo) from comment #2)
> Teodor, would you mind to create a minimized Mozmill test which exercises
> exactly this situation?
I'm sorry, I won't be able to provide a minimized testcase for this issue.
Flags: needinfo?(teodor.druta)
Comment 4•10 years ago
|
||
(In reply to Teodor Druta from comment #0)
> 2. Open a connection with ftp://ftp.mozqa.com in filezilla
> 3. Navigate anywhere in the open mozqa ftp firefox tab
> 4. Observe that the connection is lost in the stand-alone ftp client
> (filezilla)
Something which is not clear to me here is how this affects a single host when contacting the FTP server. We only have a single instance of Firefox running at a time. There is no other application which is running at the same time and requesting a control connection.
So if we fail in running a couple of tests right after each other, I would assume that something might be broken in Firefox. Each attempt to retrieve a page from the FTP server should cause the control connection to be reused.
I ran our Mozmill test for the FTP navigation about 100 times in a loop and it never failed. So I don't think that this has anything to do with the control connection. If that were the case we would see failures all the time. But it hasn't appeared again since this bug was filed.
cyliang, what is the maximum number of connections we allow for that FTP server?
Flags: needinfo?(cliang)
| Assignee | ||
Comment 5•10 years ago
|
||
So, there are two layers here:
1. the load balancer that accepts connections for ftp.mozqa.com and then forwards them to the server
2. the server, running the ftp software
At the load balancer, there is no limit to the number of connections.
At the server level, according to vsftpd.conf, the max number of clients is 10. Looking at the vsftpd man page, there is a separate setting (max_per_ip) which controls the number of clients from the same ip; that defaults to 0 (no limit).
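For anyone who wants to check these two settings on a server, a minimal sketch follows. The helper name `read_vsftpd_limits` is hypothetical; it assumes the usual `option=value` format of vsftpd.conf with `#` comment lines, and that the "max number of clients" above is the `max_clients` option, which, like `max_per_ip`, defaults to 0 (no limit) when absent.

```python
def read_vsftpd_limits(conf_text):
    """Extract max_clients and max_per_ip from vsftpd.conf text (0 means no limit)."""
    # Both options default to 0 (unlimited) when they are not set.
    limits = {"max_clients": 0, "max_per_ip": 0}
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key in limits:
            limits[key] = int(value)
    return limits
```

With a config that only sets `max_clients=10`, this returns a `max_per_ip` of 0, which matches the situation described above: a global cap of 10 and no per-IP cap.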
Flags: needinfo?(cliang)
Comment 6•10 years ago
|
||
So in our case we have 64 machines which run Mozmill tests at the same time for releases and betas. That means it can happen that more than 10 of them will try to reach the FTP server. Given the max number of 10 clients, I assume the others will be rejected until the control connection is closed?
Updated•10 years ago
|
Flags: needinfo?(cliang)
| Assignee | ||
Comment 7•10 years ago
|
||
Can you run a simple ftp test on 20 or so servers to see what actually happens?
I believe you are correct as long as vsftpd defines a client as any client connection. However, I usually find it useful to actually test because of the gap between theory and reality. =)
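A rough way to run such a test from a single machine might look like the sketch below. It assumes that there is no per-IP limit (so one host can stand in for many), that the server counts a control connection as a client as soon as it is accepted, and that a refused client surfaces as an exception from Python's `ftplib`. The function name and the `connect` parameter (there to allow substituting a fake for dry runs) are illustrative only.

```python
import ftplib


def probe_control_connections(host, attempts, timeout=10, connect=ftplib.FTP):
    """Open `attempts` control connections, hold them all, and return (accepted, rejected)."""
    held = []
    rejected = 0
    try:
        for _ in range(attempts):
            try:
                # ftplib.FTP(host, timeout=...) connects and reads the server greeting;
                # it does not log in.
                held.append(connect(host, timeout=timeout))
            except ftplib.all_errors:
                # Assumption: a refusal shows up as an FTP error reply or a socket error.
                rejected += 1
        return len(held), rejected
    finally:
        # Release every held connection so the server slots are freed again.
        for conn in held:
            try:
                conn.quit()
            except ftplib.all_errors:
                conn.close()
```

Calling `probe_control_connections("ftp.mozqa.com", 20)` would then report how many of 20 simultaneous control connections were accepted. Running it from several hosts at once would be closer to the 20-server scenario suggested above.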
Flags: needinfo?(cliang)
Comment 8•10 years ago
|
||
(In reply to C. Liang [:cyliang] from comment #7)
> Can you run a simple ftp test on 20 or so servers to see what actually
> happens?
I cannot do this easily. It would be best to increase this number and verify with the next beta builds, which we will have later today. In that case all of our 64 machines will be utilized.
> I believe you are correct as long as vsftpd defines as client as any client
> connection. However, I usually find it useful to actually test because of
> the gap between theory and reality. =)
The best test case we can have is a day with at least one beta or release candidate build. We have consistently seen failures with the FTP server over the past months. So if a max client bump is successful, we will see it.
Comment 9•10 years ago
|
||
We kinda fail a lot here. So I would like to go ahead with my proposal if possible. Cyliang, what do you think?
Flags: needinfo?(cliang)
Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/408] → [qa-automation-blocked][kanban:https://webops.kanbanize.com/ctrl_board/2/408]
| Assignee | ||
Comment 10•10 years ago
|
||
I've upped the max clients from 10 to 40 and restarted the FTP server. Let me know how testing goes. =)
Flags: needinfo?(cliang)
Comment 11•10 years ago
|
||
I ran the same tests as this afternoon, and compared to a dozen failures earlier today, there is now only a single failure related to the FTP server. So I can clearly say this is an improvement. Let's wait for the next official beta release, which will run the tests for each and every locale.
So thanks for upping the number to 40.
| Assignee | ||
Updated•10 years ago
|
Assignee: server-ops-webops → cliang
Comment 12•10 years ago
|
||
Today we had tests for the 38.0.5b1 candidate builds and happily none of our FTP-related tests failed! So this was the right thing to do. I wonder if it would hurt to raise it even further, to 60, which would be nearly equivalent to the number of machines we have in total. Also, this host is accessible to the public, and community members could run tests at the same time. So a bit of buffer would be good.
Status: NEW → ASSIGNED
| Assignee | ||
Comment 13•10 years ago
|
||
Number of max clients increased to 60. The FTP server has been restarted.
Comment 14•10 years ago
|
||
Ok, I think we can mark the bug as fixed then. Thanks for your help!
Updated•10 years ago
|
Whiteboard: [qa-automation-blocked][kanban:https://webops.kanbanize.com/ctrl_board/2/408] → [kanban:https://webops.kanbanize.com/ctrl_board/2/408]
| Assignee | ||
Updated•10 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Updated•6 years ago
|
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard