URL classification can delay AsyncOpen-to-transaction-pending by 25-26% and visual metrics by 3-10% in raptor pageload tests
Categories
(Core :: Networking, defect, P2)
People
(Reporter: acreskey, Unassigned)
References
(Blocks 3 open bugs)
Details
(Whiteboard: [necko-triaged])
Based on jesup's profiling and investigation of pageload performance issues, I made a performance comparison in which the new revision has URL classification, NS_ShouldClassifyChannel(), disabled.
We can see that URL classification can be a significant bottleneck: multiple high-confidence improvements in AsyncOpen-to-transaction-pending (up to 25-26%) and visual metrics (3-10%) are seen in the raptor pageload tests when we disable classification.
Note that there are also some regressions when disabling URL classification (possibly due to tracking-script tailing), but they are outnumbered by the improvements.
Of course I'm not proposing that we disable URL classification, but rather highlighting that this is an area with potential for large performance improvements.
Reporter
Updated•8 months ago
Reporter
Comment 1•8 months ago
Running this on Fenix here but it may take a couple of days before we have results.
Reporter
Comment 2•8 months ago
I'm trying out two ideas:
1 - dispatch the channel classification events (to background thread and back to main) with a high priority
https://hg.mozilla.org/try/rev/f3036fbb2069ee43e7a20d34b988613cc385b372
2 - don't bother breaking up the classification into events done on background and then main thread: just do it on the current thread
https://hg.mozilla.org/try/rev/00e00f6adcd5494b178e448e544c1e90c48c00eb
Reporter
Comment 3•8 months ago
(In reply to Andrew Creskey [:acreskey] from comment #1)
Running this on Fenix here but it may take a couple of days before we have results.
As expected, we can see that disabling URL classification also has a major impact on network request latency on Android in this Perf compare.
Picking some examples:
• amazon perfstat-HttpChannelAsyncOpenToTransactionPending opt cold webrender, improved about 28.57%, high confidence
• amazon perfstat-HttpChannelAsyncOpenToTransactionPending opt warm webrender, improved about 38.59%, high confidence
• bing perfstat-HttpChannelAsyncOpenToTransactionPending opt cold webrender, improved about 33%, high confidence
• bing perfstat-HttpChannelAsyncOpenToTransactionPending opt warm webrender, improved about 33%, high confidence
One regression:
• amazon perfstat-HttpTransactionWaitTime opt warm webrender, increased 113.07%, high confidence (implying that there is another bottleneck in the system)
I don't see much in terms of visual metric changes, except that LastVisualChange looks to be delayed on some pages, while FirstVisualChange is improved on ebay.com.
Reporter
Comment 4•8 months ago
(In reply to Andrew Creskey [:acreskey] from comment #2)
I'm trying out two ideas:
1 - dispatch the channel classification events (to background thread and back to main) with a high priority
https://hg.mozilla.org/try/rev/f3036fbb2069ee43e7a20d34b988613cc385b372
This did not seem to work out very well -- Perf compare.
Numerous pages of regressions, generally in visual metrics.
But two improvements to the key HttpChannelAsyncOpenToTransactionPending metric on google-slides:
• google-slides perfstat-HttpChannelAsyncOpenToTransactionPending opt cold fission webrender, -5.54%, high confidence
• google-slides perfstat-HttpChannelAsyncOpenToTransactionPending opt fission warm webrender, -7.83%, high confidence
Reporter
Comment 5•8 months ago
(In reply to Andrew Creskey [:acreskey] from comment #2)
2 - don't bother breaking up the classification into events done on background and then main thread: just do it on the current thread
https://hg.mozilla.org/try/rev/00e00f6adcd5494b178e448e544c1e90c48c00eb
And this idea also seems to make things worse -- perf compare.
A few improvements to HttpChannelAsyncOpenToTransactionPending:
• amazon perfstat-HttpChannelAsyncOpenToTransactionPending opt fission warm webrender, 11% improvement, high confidence
• bing-search perfstat-HttpChannelAsyncOpenToTransactionPending opt cold fission webrender, 6% improvement, high confidence
• cnn perfstat-HttpChannelAsyncOpenToTransactionPending opt fission warm webrender, 13% improvement, high confidence
• google-slides perfstat-HttpChannelAsyncOpenToTransactionPending opt fission warm webrender, 9% improvement, high confidence
And at least one visual metric improvement:
• netflix PerceptualSpeedIndex opt fission warm webrender, 8.7% improvement, high confidence
But there are too many regressions to list, across both visual metrics and networking perfstats.
Reporter
Comment 6•8 months ago
Randell is exploring what sounds like a better solution, which I'll paraphrase here: we don't block the cache lookup on the classification, and the classification result goes directly to the socket thread (instead of hopping to the background thread and back to main). Once it arrives on the socket thread we can kick off speculative and actual connections.
Reporter
Comment 7•1 month ago
I tried this on the Fenix startup tests, making NS_ShouldClassifyChannel() return false, and it yielded an 80% regression in the newssite-applink-startup applink_startup test. Interesting in its own right.
Reporter
Comment 8•26 days ago
(In reply to Andrew Creskey [:acreskey] from comment #7)
I tried this on the Fenix startup tests, making NS_ShouldClassifyChannel() return false, and it yielded an 80% regression in the newssite-applink-startup applink_startup test. Interesting in its own right.
The unexpected results are very likely due to profiling-run performance data being mixed into the comparison; see bug 1965803.
I'm running this again outside of ./mach try perf so as not to include the profiling-run data.
Reporter
Comment 9•23 days ago
I compared this patch (i.e. making NS_ShouldClassifyChannel() return false) on the newssite-applink-startup applink_startup test (without profiling runs), but it did not pick up any improvements. (Note that the tests are somewhat bimodal and I see relative standard deviations between 8.5% and 11.5%, which will make picking up small improvements very difficult.)
Reporter
Comment 10•23 days ago
Actually, the P6 results look to be normally distributed and might be particularly useful in testing these changes.