Open Bug 1526974 Opened 5 years ago Updated 2 years ago

Requests blocked despite connection maximum not reached

Categories

(DevTools :: Netmonitor, defect, P3)

65 Branch
defect

Tracking

(Not tracked)

REOPENED

People

(Reporter: Sarkie, Unassigned)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [necko-triaged])

Attachments

(8 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36

Steps to reproduce:

Visited a website with many resource files with Keep-Alive enabled; they also don't have a Content-Length header but are cacheable.

The site is private.

Actual results:

Files are downloaded and cached.

Subsequent XHRs are being blocked by the max connection limit (I think) because the connections are still alive (maybe).

Expected results:

The browser shouldn't block the XHRs once the files have been cached, as it is adding 200ms to all requests.

Component: Untriaged → Networking: HTTP
Product: Firefox → Core

David, I will ask you for an HTTP log. Please send it to my bugzilla email directly, as your site is private and you may not want to publish any URLs or cookies:

https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging#Start_logging_using_command_line_arguments

It will be helpful to also state the URLs that are failing for you, and maybe also a copy of the about:support page, specifically the "Important Modified Preferences" section, to check your settings.

thanks.

Flags: needinfo?(Sarkie)

Email sent.

Flags: needinfo?(Sarkie)

First of all, we don't cache POST responses, ever.

From the log (any private data stripped, looking at the second of the two channels with the provided URL):

2019-02-15 12:27:05.825000 UTC - [Parent 17840: Main Thread]: D/nsHttp nsHttpChannel::ResolveProxy [this=0000019D1083C000]

ZERO delay in resolving the system proxy

2019-02-15 12:27:05.827000 UTC - [Parent 17840: Main Thread]: D/nsHttp nsHttpChannel::OnProxyAvailable [this=0000019D1083C000 pi=0000000000000000 status=0 mStatus=0]

delay 3ms, general processing

2019-02-15 12:27:05.830000 UTC - [Parent 17840: Socket Thread]: D/nsSocketTransport nsSocketTransport::ResolveHost [this=0000019D0D124800 XXXXX]

delay 1ms to resolve the host name (apparently DNS was cached)

2019-02-15 12:27:05.831000 UTC - [Parent 17840: Socket Thread]: D/nsSocketTransport nsSocketTransport::OnLookupComplete: this=0000019D0D124800 status 0.
2019-02-15 12:27:05.832000 UTC - [Parent 17840: Socket Thread]: D/nsSocketTransport   trying address: XXXXX

delay of 28ms to connect to the server

2019-02-15 12:27:05.860000 UTC - [Parent 17840: Socket Thread]: E/nsSocketTransport nsSocketTransport::OnSocketReady [this=0000019D0D124800 outFlags=2]

full TLS negotiation taking 320ms

2019-02-15 12:27:06.180000 UTC - [Parent 17840: Socket Thread]: E/nsSocketTransport nsSocketTransport::OnSocketReady [this=0000019D0D124800 outFlags=2]
...
2019-02-15 12:27:06.180000 UTC - [Parent 17840: Socket Thread]: V/nsHttp nsHttpConnection::EnsureNPNComplete [this=0000019D0F968800] setting complete to true
...
2019-02-15 12:27:06.180000 UTC - [Parent 17840: Socket Thread]: D/nsSocketTransport nsSocketOutputStream::Write [this=0000019D0D124AA8 count=5194]
2019-02-15 12:27:06.180000 UTC - [Parent 17840: Socket Thread]: D/nsSocketTransport   calling PR_Write [count=5194]
2019-02-15 12:27:06.180000 UTC - [Parent 17840: Socket Thread]: D/nsSocketTransport   PR_Write returned [n=5194]

response headers arrive with a delay of 700ms

2019-02-15 12:27:06.860000 UTC - [Parent 17840: Socket Thread]: E/nsSocketTransport nsSocketTransport::OnSocketReady [this=0000019D0D124800 outFlags=1]
...
2019-02-15 12:27:06.869000 UTC - [Parent 17840: Socket Thread]: I/nsHttp http response [
  HTTP/1.1 200 OK

The response body (5 bytes) comes with another 99ms delay

2019-02-15 12:27:06.959000 UTC - [Parent 17840: Socket Thread]: E/nsSocketTransport nsSocketTransport::OnSocketReady [this=0000019D0D124800 outFlags=1]
2019-02-15 12:27:06.959000 UTC - [Parent 17840: Socket Thread]: D/nsSocketTransport nsSocketInputStream::Read [this=0000019D0D124A70 count=32768]
2019-02-15 12:27:06.959000 UTC - [Parent 17840: Socket Thread]: D/nsSocketTransport   calling PR_Read [count=32768]
2019-02-15 12:27:06.959000 UTC - [Parent 17840: Socket Thread]: D/nsSocketTransport   PR_Read returned [n=5]

done in 9ms (general processing delay)

2019-02-15 12:27:06.968000 UTC - [Parent 17840: Main Thread]: D/nsHttp nsHttpChannel::OnStopRequest [this=0000019D1083C000 request=0000019D0C592700 status=0]

The first POST behaves mostly the same on a reused connection; the time the transaction is delayed because of queuing is exactly 2ms:

2019-02-15 12:23:06.574000 UTC - [Parent 17840: Main Thread]: V/nsHttp nsHttpConnectionMgr::AddTransaction [trans=0000019D09208800 0]
...
2019-02-15 12:23:06.576000 UTC - [Parent 17840: Socket Thread]: E/nsHttp nsHttpConnection::Activate [this=0000019D1278B400 trans=0000019D09208800 caps=129]
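The per-step delays above were read off by differencing consecutive log timestamps by hand; a small helper can do the same. This is an illustrative script, not part of any Firefox tooling, and it only assumes the `2019-02-15 12:27:05.825000 UTC` timestamp format visible in the MOZ_LOG excerpts above:

```javascript
// Compute millisecond deltas between consecutive HTTP-log lines.
// Assumes timestamps like "2019-02-15 12:27:05.825000 UTC" at line start,
// as in the MOZ_LOG excerpts quoted in this comment.

function parseLogTimestamp(line) {
  const m = line.match(/^(\d{4}-\d{2}-\d{2}) (\d{2}):(\d{2}):(\d{2})\.(\d{6}) UTC/);
  if (!m) return null;
  const [, date, h, min, s, micros] = m;
  // Whole seconds via Date.parse, sub-second part from the microseconds field.
  return Date.parse(`${date}T${h}:${min}:${s}Z`) + Number(micros) / 1000;
}

function logDeltas(logText) {
  const times = logText
    .split("\n")
    .map(parseLogTimestamp)
    .filter((t) => t !== null);
  const deltas = [];
  for (let i = 1; i < times.length; i++) {
    deltas.push(times[i] - times[i - 1]);
  }
  return deltas;
}
```

Fed the connect and TLS lines quoted earlier, this yields deltas of 28ms and 320ms, matching the numbers called out above.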

Closing as INVALID; the site is simply slow to respond.

Status: UNCONFIRMED → RESOLVED
Closed: 5 years ago
Resolution: --- → INVALID

Reopened per Valentin's request.

Looks like there is a bug in how the backend is calculating the blocked time.

For reference, here is the code that does that:
https://searchfox.org/mozilla-central/rev/92d11a33250a8e368c8ca3e962e15ca67117f765/devtools/server/actors/network-monitor/network-observer.js#852-860
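To make the question below concrete, the shape of the calculation under discussion looks roughly like the following. This is a simplified, hypothetical sketch, not the actual network-observer.js code; only the STATUS_RESOLVING / STATUS_CONNECTING_TO names come from the linked source, and the function name and timing fields are illustrative.

```javascript
// Hypothetical sketch of deriving a "blocked" phase from socket-status
// timestamps. STATUS_RESOLVING / STATUS_CONNECTING_TO are the status names
// discussed for the linked network-observer.js code; everything else here
// (field layout, the function itself) is illustrative.

function computeBlockedTime(timings, requestStart) {
  // If both the resolve and connect phases were observed, everything before
  // the resolve phase counts as time spent queued ("blocked").
  if (timings.STATUS_RESOLVING && timings.STATUS_CONNECTING_TO) {
    // Note the asymmetry questioned below: both statuses are checked,
    // but only STATUS_RESOLVING is used as the end of the blocked phase.
    return timings.STATUS_RESOLVING - requestStart;
  }
  // No resolve/connect observed (e.g. a reused connection): no blocked
  // phase can be derived this way.
  return 0;
}
```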

Valentin, does this seem ok?

Honza

Status: RESOLVED → REOPENED
Ever confirmed: true
Flags: needinfo?(valentin.gosu)
Resolution: INVALID → ---

(In reply to Jan Honza Odvarko [:Honza] (always need-info? me) from comment #4)

Valentin, does this seem ok?

I'm not exactly sure if it's correct or not.
Why are we checking timings.STATUS_RESOLVING && timings.STATUS_CONNECTING_TO and then using only STATUS_RESOLVING?

Also, I wonder if the bug is in _calculateOffsetAndTotalTime instead.

Flags: needinfo?(valentin.gosu)
Priority: -- → P3
Whiteboard: [necko-triaged]

Moving to devtools for investigation.

Component: Networking: HTTP → Netmonitor
Product: Core → DevTools
Attached image requests.png

Hey Honza,
I was looking for a few bugs to fix to get back into DevTools, and coincidentally I found this one (I had worked on fixing the blocking state in the UI). Looking further, I tried to replicate it (based on what I understood); however, I couldn't reproduce this one.

The upper half of the image is the first (non-cached) request, while the lower half is the subsequent request (cache-primed state). In the second half, there seems to be no blocking.

I've used the following code to test:
cache-block.php

<html>
<head>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
</head>
<body>
<script>
for (var i = 0; i < 30; i++) {
    $.get("zero-byte-block.php?" + i);
}
</script>
</body>
</html>

zero-byte-block.php

<?php
sleep(1);
header("Expires: Sat, 09 Mar 2019 05:00:00 GMT");
// header("Content-Type: image/gif");
header("Content-Type: text/javascript");

Flags: needinfo?(odvarko)

My use case also has keep-alive and HTTP chunking, if that is the underlying issue?

@Ruturaj: did the previous comment help?

Honza

Flags: needinfo?(odvarko)
Attached image Selection_011.png

Hey Honza,

No, the previous comment didn't help.

For the parent page, I ensured chunked encoding was in place along with no gzip; this was also applicable for the XHR requests. Yet on the subsequent request, I didn't see any blocking.

  • Rutu
Flags: needinfo?(odvarko)

We are using gzip, and I think POST.

I can verify in a couple of hours, I can also email privately if needed more private information.

(In reply to David Poole from comment #11)

I can verify in a couple of hours, I can also email privately if needed more private information.

Thanks for the help!

Honza

Flags: needinfo?(odvarko)

Hey David,

POST is never cached. Are you sure the POST XHR responses had caching headers, like Expires or Cache-Control?

Also, I saw the user-agent string that you've given - Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36
This isn't a Mozilla Firefox UA.

  • Rutu
Flags: needinfo?(Sarkie)

I think I used Chrome to file the bug, if the user agent is automatically added.

This is not caching related but about blocking new connections; the requests are POSTs, but I can add more info tomorrow.

I'm OOO today.

The issue is that DevTools states the connection is blocked because max connections are reached, but the HTTP log states otherwise.

Flags: needinfo?(Sarkie)

Emailed the logs to you, Rutu.

I can do more on the latest version if needed to show the issue.

I have screenshots from the other browsers too, to show intended behaviour.

Cheers,

Hey David,
Thanks for the email. It seems your case has SSL involved. The test setup that I'd created to recreate the issue was localhost, no SSL. Let me rework the test setup with SSL and get back on this one.

  • Rutu

Hey David,
I went through the SSL use case as well; however, on my test page I wasn't able to replicate the bug.
For the log that you've sent me, could you also send the associated HAR?
thanks,
Rutu

Flags: needinfo?(Sarkie)
Attached image Selection_014.png

David,
I tried again, and this time managed to replicate it. The first time I tried it was on localhost; now I tried it with my own site, and I see what you mean.

Is that the concern?

Attached image Selection_015.png

For the same page I did a Google Chrome test; there, too, there was "queueing" time and not "stalled".

I'm not exactly sure if it's correct or not.
Why are we checking timings.STATUS_RESOLVING && timings.STATUS_CONNECTING_TO and then using only STATUS_RESOLVING?

Also, I wonder if the bug is in _calculateOffsetAndTotalTime instead.

Perhaps in Firefox there is no separation between queueing and stalling. Perhaps this is where Valentin's point could be taken into consideration.

I've setup the issue here - https://ruturaj.net/old-files/cache-block.php

Honza, could you please check this?

I've moved my site back to HTTP/1.1 for this bug; once verified, I want to move it back to HTTP/2.

Flags: needinfo?(odvarko)

I don't have free time today, but can you change the keep-alive to 30 seconds to test something else? Thanks.

Flags: needinfo?(Sarkie)

Hey David,
I tried that with 30s; however, it's still the same. Also, what I got yesterday in attachment #9055155 [details], I'm unable to replicate again :(

Hi,
Sorry, can we go back to this?

It is happening on the latest version of Firefox (69.0.1).

I have a HAR file but can do another log if needed; this is happening on our newer application too, same middleware etc. (OHS / WebLogic).

Hey David,
It'll be great if you can attach the HAR file here.
Thanks.

Can I send it privately?

David,
It would be great if you replace the hostnames / user / pass and put it here, so that it's available as a general record.
Thanks.

Sure Ruturaj,
Is there a tool or is it just done manually?

A HAR file is just a text file; you can open it in a text editor like Sublime Text, Notepad, etc., and replace the values.

Attached file blocked.har

HAR uploaded.

So the Keep-Alive timeout is set to 5 seconds.

For the first part of the HAR, the images are blocked; that's fine, since there are more than 6 of them (IIRC).

Then, once that is completed, 16 seconds later, I load a POST via AJAX that is then blocked for 332ms. Why?
The next one is blocked for 358ms after 5 seconds, then 366ms after 5 seconds.

Then I do a final request within 5 seconds; it isn't blocked and is faster.

As far as I can tell from the documentation, "blocked" means Firefox is holding the connection back to throttle the connection.

This looks to be wrong?
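As a side note for anyone inspecting the attached HAR themselves: HAR 1.2 records the queueing delay in each entry's `timings.blocked` field (with -1 meaning the timing does not apply), so the suspicious entries can be listed with a few lines of script. A minimal sketch; the threshold and the helper name are arbitrary:

```javascript
// List HAR entries whose "blocked" timing exceeds a threshold (ms).
// Per the HAR 1.2 spec, timings.blocked is -1 when it does not apply.

function findBlockedEntries(har, thresholdMs = 100) {
  return har.log.entries
    .filter((e) => (e.timings.blocked ?? -1) > thresholdMs)
    .map((e) => ({
      url: e.request.url,
      method: e.request.method,
      blocked: e.timings.blocked,
    }));
}
```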

Hey David,

Then, once that is completed, 16 seconds later, I load a POST via AJAX that is then blocked for 332ms. Why? The next one is blocked for 358ms after 5 seconds, then 366ms after 5 seconds.

Thanks for the file. Now, could you also generate the corresponding HAR in Google Chrome, so we'll have something to compare against? Perhaps it's adding the connection time to blocked; not sure.

Flags: needinfo?(Sarkie)
Attached file ssd_chrome.har
Flags: needinfo?(Sarkie)

Uploaded the Chrome version; the issue is I can't replicate it in Chrome.

Blocks: 1484005
Flags: needinfo?(odvarko)

Hi,
Just to update this: we are seeing the same issue, but it may be related to FF and the server.

I have a new log where every new connection is blocked, regardless of persistent connections, but I am unsure if the issue is that the keep-alive portion is broken on the server, which is causing Firefox to have issues.

I have a new log / har file to show.

I have also done it at the root level of the server, so a standard html page and still seeing the issue.

I can send the new files via email.

Thanks

Flags: needinfo?(odvarko)

Sarkie, yes please, send it to my email.
You can also attach a screenshot of the Network panel here, so others can have at least that.
Thank you.

Flags: needinfo?(odvarko)

Blocked

Time spent in a queue waiting for a network connection.

The browser imposes a limit on the number of simultaneous connections that can be made to a single server. In Firefox this defaults to 6, but can be changed using the network.http.max-persistent-connections-per-server preference. If all connections are in use, the browser can't download more resources until a connection is released.

I have set network.http.max-persistent-connections-per-server to 1024 and I am still getting blocked.
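The queueing model described in that documentation can be illustrated with a tiny simulation. This is not Firefox code, just a sketch of the described behaviour with the connection limit as a parameter:

```javascript
// Simulate a per-host connection pool: at most `limit` requests run at once;
// the rest wait, accruing "blocked" time, until a connection frees up.
// All requests are assumed to be issued simultaneously at time 0.

function simulateBlockedTimes(durationsMs, limit = 6) {
  const finishTimes = []; // completion time of each in-flight request
  const blocked = [];
  for (const duration of durationsMs) {
    let start = 0;
    if (finishTimes.length >= limit) {
      // All connections busy: wait for the earliest one to finish.
      finishTimes.sort((a, b) => a - b);
      start = finishTimes.shift();
    }
    blocked.push(start); // time spent queued before a connection was free
    finishTimes.push(start + duration);
  }
  return blocked;
}
```

With the default limit of 6, a seventh simultaneous request should be blocked only until the fastest in-flight request finishes, and raising the limit should drive the blocked times to zero, which is why still seeing blocked time with the preference at 1024 points at a reporting bug rather than real queueing.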

Chrome doesn't show the issue; screenshot below.

Application Basics

Name: Firefox
Version: 88.0.1
Build ID: 20210504152106
Distribution ID:
Update Channel: release
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0
OS: Windows_NT 10.0 19041
Launcher Process: Enabled
Multiprocess Windows: 6/6
Fission Windows: 0/6 Disabled by default
Remote Processes: 12
Enterprise Policies: Inactive
Google Location Service Key: Found
Google Safebrowsing Key: Found
Mozilla Location Service Key: Found
Safe Mode: false

The bug has gotten pretty long already. I just wanted to say that, in my humble opinion, this should have the highest priority, as features that supposedly work but don't are very bad (people wonder and spend time on debugging, just to find out it's a bug and it's known not to work).

So, until this is fixed, the request timings feature should be disabled. Otherwise, the remaining web site developers still using Mozilla Firefox are also going to migrate to alternatives.

As other bug reports, like bug 1709703, have been closed as duplicates of this report, the title should probably be updated.

Summary: Connections getting blocked with http.max-persistent-connections-per-proxy → Requests blocked despite connection maximum not reached

Hi, is there an update? This bug is unassigned and has been open for 4 years. Is any further information needed?

Severity: normal → S3
