Closed
Bug 1499307
Opened 6 years ago
Closed 6 years ago
Some images are not request/loaded for pages with a high number of images
Categories
(Core :: Networking, defect, P2)
Tracking
()
RESOLVED
DUPLICATE
of bug 1486046
People
(Reporter: michael.kroell, Assigned: u408661)
Details
(Whiteboard: [necko-triaged])
Attachments
(4 files)
User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.9 Safari/537.36 Steps to reproduce: (1) Browse to https://geizhals.at/?cat=nb&hloc=at&hloc=de (2) Increase the number of products shown per page to 1000 in the select box (right next to "Blättern:") OS: Windows 10 Pro, Version 1803,Build: 17134.345 Firefox 62.0.3 64-Bit But we've seen that behaviour also with OS: Debian Linux (buster/sid) Firefox: 60.2.2esr (64-bit) Actual results: Some images fail to load. Per reload it is different images which are not loaded and the alt-text is shown instead. Looking at the network console shows that those images are actually not requested from the server at all. In out tests it did not make a difference whether "Disable Cache" was ticket or not in the developer console Expected results: All images should be loaded and display all the time.
Reporter | ||
Updated•6 years ago
|
OS: Unspecified → Windows 10
Hardware: Unspecified → x86_64
Updated•6 years ago
|
Component: Untriaged → Networking
Product: Firefox → Core
Comment 1•6 years ago
|
||
Please try to reproduce with Firefox Nightly first. I can't reproduce it locally with the Nightly channel. If this is still persisting even with Nightly, I will ask few more details: - by "fail to load" you mean that the page load indicator (the throbber) has already stopped spinning while some (product) images were still missing? - I will ask you for a log according [1], please add following modules to MOZ_LOG: imgRequest:5 then upload here and also list URL of images that failed to load. thanks. [1] https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging#Start_logging_using_command_line_arguments
Flags: needinfo?(michael.kroell)
Comment 2•6 years ago
|
||
I can also reproduce the issue on Nightly64.0a1(20181016100106) Linux Mint. (VMPlayer, Guest OS: Linux Mint19 64bit, Host OS: Windows10 Ver1803 64bit) I cannot reproduce the issue if set MOZ_LOG to timestamp,rotate:200,nsHttp:5,cache2:5,nsSocketTransport:5,nsHostResolver:5,imgRequest:5 (I tried 2 times but I could not reproduce it.) But, I can reproduce the issue if only set MOZ_LOG to imgRequest:5 (The test was executed after clearing history and cache each time)
Comment 3•6 years ago
|
||
Mozregression https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=cef590a6f946&tochange=17de0f463944
Comment 4•6 years ago
|
||
(In reply to Alice0775 White from comment #2) > But, I can reproduce the issue if only set MOZ_LOG to imgRequest:5 Thanks! Can you please at least do nsHttp:5,imgRequest:5? And please add image URLs that have failed for you. Thanks.
Flags: needinfo?(alice0775)
Comment 5•6 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #4) > (In reply to Alice0775 White from comment #2) > > But, I can reproduce the issue if only set MOZ_LOG to imgRequest:5 > > Thanks! > > Can you please at least do nsHttp:5,imgRequest:5? And please add image URLs > that have failed for you. Thanks. Umm... I can reproduce the issue if set MOZ_LOG to imgRequest:5, but cannot reproduce set to nsHttp:5,imgRequest:5. I tried 3 times each, the former is reproduced 3 of 3, but the latter 0 of 3....
Flags: needinfo?(alice0775)
Comment 6•6 years ago
|
||
and only nsHttp:5 + the list of URLs that failed?
Flags: needinfo?(alice0775)
Comment 7•6 years ago
|
||
I got! (1.544 Mbps using network speed restricrion) export MOZ_LOG=rotate:200,nsHttp:5,imgRequest:5 I attached 2 file(the size is large, I split it into 2). first 5 https://geizhals.at/asus-zenbook-pro-ux550vd-schwarz-90nb0et2-m04100-a1770475.html?hloc=at&hloc=de https://geizhals.at/hp-elitebook-840-g5-5df22es-abd-a1897776.html?hloc=at&hloc=de https://geizhals.at/lenovo-legion-y730-17ich-81hg0021ge-a1904130.html?hloc=at&hloc=de https://geizhals.at/lenovo-thinkpad-l580-20lw000wge-a1783479.html?hloc=at&hloc=de https://geizhals.at/lenovo-thinkpad-l580-20lw000uge-a1783483.html?hloc=at&hloc=de last 5 https://geizhals.at/lenovo-ideapad-320-15ikb-schwarz-80xl03fcge-a1729545.html?hloc=at&hloc=de https://geizhals.at/lenovo-ideapad-320-17ast-schwarz-80xw0016ge-a1813133.html?hloc=at&hloc=de https://geizhals.at/lenovo-ideapad-320s-13ikb-gold-81ak0039ge-a1729616.html?hloc=at&hloc=de https://geizhals.at/lenovo-ideapad-330-15ich-schwarz-81fk0041ge-a1856686.html?hloc=at&hloc=de https://geizhals.at/lenovo-ideapad-330-15igm-schwarz-81d100fyge-a1833499.html?hloc=at&hloc=de
Flags: needinfo?(alice0775)
Comment 8•6 years ago
|
||
Comment 9•6 years ago
|
||
Thanks. What exactly do yo mean with "first 5" and "last 5" ?
Comment 11•6 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #9) > Thanks. What exactly do yo mean with "first 5" and "last 5" ? When I reproduced the issue, Many images are missing. So, I picked up 5 missing image urls from page top and 5 missing image urls from page bottom. Page top ... ... missing1 missing2 ... missing3 missing4 missing5 ... ... ... ... missing5 ... missing4 missing3 missing2 missing1 ... ... Page bottom
Flags: needinfo?(alice0775)
Comment 12•6 years ago
|
||
Hmm.. the URLs you provide are links hrefs to the product pages, not the image URLs that failed to load. Never mind, I was able to repro locally with the same set of modules you have provided as working for you.
Comment 13•6 years ago
|
||
This is an h2 issue or a server issue. Nick, please download the log at https://www.janbambas.cz/moz/logs/1499307-hb-1.tar.xz (at the moment still slowly uploading, 12300322 bytes) Then go to logan (Ctrl-F5 - there were some recent bug fixes!) at https://janbambas.cz/moz/logan/#{"show"%3A[{"name"%3A"nsHttpChannel"%2C"on"%3A2519%2C"clr"%3A"%23ffffb3"}%2C{"name"%3A"nsHttpTransaction"%2C"on"%3A2434%2C"clr"%3A"%23bebada"}%2C{"name"%3A"Http2Stream"%2C"on"%3A2374%2C"clr"%3A"%23fb8072"}]} And load the log in it. The stream is closed with 80004004, because we receive a TCP RST. If you then open the session and scroll up, you'll get to a line `nsHttpConnection::DontReuse 0000027F93A44000 spdysession=0000027F91F14800` because of 804b0014 (net-seset) on the socket. Question is if we did something wrong or not. If yes, this bug should have a patch. If not, then this is another duplicate of bug 1486046. I would slowly tend to reopening it, since this is really frustrating...
Assignee: nobody → hurley
Flags: needinfo?(michael.kroell)
Priority: -- → P2
Whiteboard: [necko-triaged]
Comment 14•6 years ago
|
||
I prepared a simpler document that is also able to trigger the problem initially described by Michael Kroell; you can find it at https://gzhls.at/imgdemo.html (make sure to have the UA's cache cleared after each try to render it) It includes 1500 small images (all with different URIs) that are served by the same HTTPS server the original problem was observed with. I had a peek in the servers' logs while the associated requests were processed - interestingly, we register exactly 1000 (one thousand - while I would expect exactly 1500; maybe there is some obscure and artifical limit at work?) requests with the appropriate "Referer" header set, 100% of which our frontend delivers, apparently successfully, with status code 200. Neither Varnish (the origin for these cached responses, i.e. images) nor nginx (the HTTPS termination point) complain about/log anything abnormal while delivering, while Firefox' developer console shows a large number of requests for image's URIs, which had no responses logged for whatsoever.
Updated•6 years ago
|
Keywords: regression
Assignee | ||
Comment 15•6 years ago
|
||
Thanks for the log, Honza, as well as the pointers to stuff in logan. Made this much easier to make sense of! This is, indeed, a duplicate of bug 1486046. To make things even more convoluted, this exact issue with nginx has been discussed in yet another bug (that was temporarily hijacked from its original purpose) - take a look at the few comments starting at https://bugzilla.mozilla.org/show_bug.cgi?id=1050329#c9 I'll give a quick summary here, though - the way nginx shuts down h2 connections that have reached the request limit (which is, by default, 1000) is fundamentally broken, and results in data loss because of the RST that gets sent. Since we receive a RST without a GOAWAY (after sending all requests, including those past the 1000th), there is no way for us to know if the server did anything or not with our requests, and 7540 says (or at least strongly implies) to not retry in that scenario (see my comments in bug 1486046 for more details). Nginx has a bug on file about the whole broken close thing: https://trac.nginx.org/nginx/ticket/1250#comment:4 Reporter, probably your best thing to do is to significantly increase your http2_max_requests setting in nginx (Firefox, for example, won't roll over a connection until it has created something on the order of 50,000,000 streams - I'm not sure we ever reach that limit in practice). Why they set the default at 1000 is beyond me, it doesn't make sense for a protocol where the connection is designed to be long-lived and carry lots of transactions precisely to avoid the cost of setting up new connections.
Comment 16•6 years ago
|
||
I would still be interested in why this can't be reproduced with default log modules. Seems like it's causing some significant timing difference that prevents us reaching the limit.
Assignee | ||
Comment 17•6 years ago
|
||
I would imagine that our http stack gets slowed down enough on the send side that we are able to read the GOAWAY out of the kernel buffer (that nginx does send) before we send data causing the server to send a RST (because of nginx's broken tcp stream teardown). Since that gets read before the RST clears the kernel buffer, we notice its existence, and can safely retry the appropriate streams on a new session, and the behaviour doesn't manifest.
Comment 18•6 years ago
|
||
> Why they set the default at 1000 is beyond me
I could just add that I've probed nginx about this. The only answer I received was basically "because a customer requested it".
That is of course no explanation to why the default, why the default is so low and why the documentation basically doesn't help nginx users to make an informed decision about what to set the value to. I've seen nginx servers in the wild with a lower max requests value than max allowed concurrent streams!
Comment 19•6 years ago
|
||
For the record: After considerably increasing nginx's http2_max_requests setting, the problem's symptoms went away for us.
You need to log in
before you can comment on or make changes to this bug.
Description
•