I'll attach a packet capture that shows Firefox engaged in dozens (~90) of parallel sessions due to hostname sharding, with the server responding with very aggressive TCP behavior on each of them (an initial congestion window of 10 segments, IW10).
This leads to losses, retransmissions, and ~500ms of queuing-induced link delay.
Generally we want to allow such high levels of parallelism because they help fight high latency when dealing with small objects. It's the big objects combined with the IW10 configuration that get us into trouble here.
I hypothesize that when we reach a high level of parallelism we should restrict our announced TCP receive window so as to effectively disallow IW10. That would still give us good latency on these highly parallel connections but throttle back the bandwidth and avoid over-saturation. In the case of images, that should still bring back the layout information quickly, which is great. As the active parallelism goes down, the window can be increased again.
Low amounts of parallelism could just use normal OS auto-tuning...
I don't yet know what the thresholds for low-vs-high are.
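A minimal sketch of the clamping idea, assuming BSD-style sockets. The function name, the 4-segment window, and the 32-connection threshold are placeholders I made up for illustration; the key detail is that SO_RCVBUF must be shrunk before connect(), since the initial advertised window and window scale are fixed at SYN time:

```python
import socket

MSS = 1460
SMALL_RWIN = 4 * MSS     # ~4 segments, well under the 10-segment IW10 burst
HIGH_PARALLELISM = 32    # made-up threshold; the real low-vs-high cutoff is open

def clamp_rwin(sock, active_parallel):
    # Must run before connect(): the kernel derives the initially
    # announced receive window from the buffer size at handshake time.
    if active_parallel >= HIGH_PARALLELISM:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, SMALL_RWIN)
    return sock

# With ~90 active connections, this socket would announce the clamped window.
sock = clamp_rwin(socket.socket(socket.AF_INET, socket.SOCK_STREAM), 90)
# sock.connect((host, port))  # server's IW10 burst now exceeds our rwin
```

Note that Linux internally doubles the requested SO_RCVBUF value for bookkeeping overhead, so the effective clamp varies by platform, which is part of the cross-platform problem described below.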
Created attachment 683723 [details]
There is significant variance in support for the necessary rwin management primitives between platforms. I detailed most of that here:
Even given those differences, there is a reasonable strategy at least worth trying that would apply to Windows >= Vista and OS X; notably, that leaves XP and Linux/Android out.