URL classification can delay AsyncOpen-to-transaction-pending by 25-26% and visual metrics by 3-10% in raptor pageload tests
Categories
(Core :: Networking, defect, P2)
People
(Reporter: acreskey, Unassigned)
References
(Blocks 3 open bugs)
Details
(Whiteboard: [necko-triaged])
Based on jesup's profiling and investigation of pageload performance issues, I made a performance comparison in which the new revision has URL classification, NS_ShouldClassifyChannel(), disabled.
We can see that URL classification can be a significant bottleneck: multiple high-confidence improvements in AsyncOpen-to-transaction-pending (up to 25-26%) and visual metrics (3-10%) are seen in the raptor pageload tests when we disable classification.
Note that there are also some regressions when disabling URL classification (possibly due to tracking-script tailing), but they are outnumbered by the improvements.
Of course I'm not proposing that we disable URL classification, but rather highlighting that this is an area with potential for large performance improvements.
Reporter
Updated•8 months ago
Reporter
Comment 1•8 months ago
Running this on Fenix here but it may take a couple of days before we have results.
Reporter
Comment 2•8 months ago
I'm trying out two ideas:
1 - dispatch the channel classification events (to background thread and back to main) with a high priority
https://hg.mozilla.org/try/rev/f3036fbb2069ee43e7a20d34b988613cc385b372
2 - don't bother breaking up the classification into events done on background and then main thread: just do it on the current thread
https://hg.mozilla.org/try/rev/00e00f6adcd5494b178e448e544c1e90c48c00eb
Reporter
Comment 3•8 months ago
(In reply to Andrew Creskey [:acreskey] from comment #1)
Running this on Fenix here but it may take a couple of days before we have results.
As expected, we can see that disabling URL classification also has a major impact on network request latency on Android in this Perf compare.
Picking some examples:
• amazon perfstat-HttpChannelAsyncOpenToTransactionPending opt cold webrender, improved about 28.57%, high confidence
• amazon perfstat-HttpChannelAsyncOpenToTransactionPending opt warm webrender, improved about 38.59%, high confidence
• bing perfstat-HttpChannelAsyncOpenToTransactionPending opt cold webrender, improved about 33%, high confidence
• bing perfstat-HttpChannelAsyncOpenToTransactionPending opt warm webrender, improved about 33%, high confidence
One regression:
• amazon perfstat-HttpTransactionWaitTime opt warm webrender, increased 113.07%, high confidence (implying that there is another bottleneck in the system)
I don't see much in terms of visual metric changes, except that LastVisualChange looks to be delayed on some pages, while FirstVisualChange is improved on ebay.com.
Reporter
Comment 4•8 months ago
(In reply to Andrew Creskey [:acreskey] from comment #2)
I'm trying out two ideas:
1 - dispatch the channel classification events (to background thread and back to main) with a high priority
https://hg.mozilla.org/try/rev/f3036fbb2069ee43e7a20d34b988613cc385b372
This did not seem to work out very well -- Perf compare.
Numerous pages of regressions, generally in visual metrics.
But two improvements to the key HttpChannelAsyncOpenToTransactionPending metric on google-slides:
• google-slides perfstat-HttpChannelAsyncOpenToTransactionPending opt cold fission webrender, -5.54%, high confidence
• google-slides perfstat-HttpChannelAsyncOpenToTransactionPending opt fission warm webrender, -7.83%, high confidence
Reporter
Comment 5•8 months ago
(In reply to Andrew Creskey [:acreskey] from comment #2)
2 - don't bother breaking up the classification into events done on background and then main thread: just do it on the current thread
https://hg.mozilla.org/try/rev/00e00f6adcd5494b178e448e544c1e90c48c00eb
And this idea also seems to make things worse -- perf compare.
A few improvements to HttpChannelAsyncOpenToTransactionPending:
• amazon perfstat-HttpChannelAsyncOpenToTransactionPending opt fission warm webrender, 11% improvement, high confidence
• bing-search perfstat-HttpChannelAsyncOpenToTransactionPending opt cold fission webrender, 6% improvement, high confidence
• cnn perfstat-HttpChannelAsyncOpenToTransactionPending opt fission warm webrender, 13% improvement, high confidence
• google-slides perfstat-HttpChannelAsyncOpenToTransactionPending opt fission warm webrender, 9% improvement, high confidence
And at least one visual metric improvement:
• netflix PerceptualSpeedIndex opt fission warm webrender, 8.7% improvement, high confidence
But there are too many regressions to list, across both visual metrics and networking perfstats.
Reporter
Comment 6•8 months ago
Randell is exploring what sounds like a better solution, which I'll paraphrase here: we don't block the cache lookup on the classification, and the classification result goes directly to the socket thread (instead of hopping to the background thread and back to main). Once it arrives on the socket thread we can kick off speculative and actual connections.
Reporter
Comment 7•1 month ago
I tried this on the Fenix startup tests, making NS_ShouldClassifyChannel() return false, and it yielded an 80% regression in the newssite-applink-startup applink_startup test. Interesting in its own right.
Reporter
Comment 8•26 days ago
(In reply to Andrew Creskey [:acreskey] from comment #7)
I tried this on the Fenix startup tests, making NS_ShouldClassifyChannel() return false, and it yielded an 80% regression in the newssite-applink-startup applink_startup test. Interesting in its own right.
The unexpected results are very likely due to profiling-run performance data being mixed into the comparison; see bug 1965803.
I'm running this again outside of ./mach try perf so as not to include the profiling-run data.
Reporter
Comment 9•23 days ago
I compared this patch (i.e. making NS_ShouldClassifyChannel() return false) on the newssite-applink-startup applink_startup test (without profiling runs), but it did not pick up any improvements. (Note that the tests are somewhat bimodal and I see relative standard deviations between 8.5% and 11.5%, which will make picking up small improvements very difficult.)
Reporter
Comment 10•23 days ago
Actually, the P6 results look to be normally distributed and might be particularly useful in testing these changes.