Closed Bug 1356686 Opened 7 years ago Closed 2 months ago

OMT brotli decompression

Categories

(Core :: Networking: HTTP, enhancement, P3)

enhancement

Tracking


RESOLVED FIXED
126 Branch
Performance Impact high
Tracking Status
firefox126 --- fixed

People

(Reporter: ehsan.akhgari, Assigned: jesup)

References

(Depends on 1 open bug, Blocks 2 open bugs, Regressed 1 open bug)

Details

(4 keywords, Whiteboard: [necko-triaged])

Attachments

(5 files, 1 obsolete file)

See this profile for example: <https://perfht.ml/2og8OSq>  I captured it by loading the facebook homepage.  We spend 53ms on the main thread decompressing the brotli stream.  I bet this is a lot worse on a slow machine.

It seems we shouldn't be running decompression code on the target thread of the Necko data delivery.  Even if the target thread isn't the main thread (for example for the HTML parser) it seems that blocking HTML parsing for data decompression is a pretty bad idea if the machine does have free cores to run more than one thing concurrently.
Marking as [qf:investigate:p1] because it's premature to assume that simply moving this OMT is going to make things faster, of course.
Whiteboard: [qf:p1] → [qf:investigate:p1]
BTW I have another profile as well: https://perfht.ml/2ofWYYD

This is super easy to reproduce, FTR.
Whiteboard: [qf:investigate:p1] → [qf:investigate:p1][necko-next]
nsHTTPCompressConv doesn't implement nsIThreadRetargetableStreamListener so OMT is disabled if decompression is required.
The first thing to try is to re-enable OMT for this scenario, i.e. move the decompression overhead to the parser thread instead of blocking the main thread.

I am not sure if moving decompression to another 'converter' thread will further improve performance, since the HTML parser will still be waiting for the decompressed data; page loading and rendering for HTML will still be blocked anyway. However, this will benefit scenarios where OMT is disabled for other reasons. It'll need further experimentation to understand the benefit.
I'm not confident this will end up helping a lot either, but there are cases where we are fetching multiple compressed resources in parallel (JS, CSS, and HTML are all commonly compressed), so helper threads could at least give us some parallelism. Honestly, though, this processing happens pretty fast, so it might not be a win after the overhead.
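The parallelism argument in the comment above can be sketched outside Gecko: several compressed resources are decoded concurrently on helper threads while the calling thread only joins the results. This is an illustrative stand-in only (a trivial run-length decoder instead of Brotli; none of these names are Necko APIs).

```cpp
#include <cassert>
#include <string>
#include <thread>
#include <vector>

// Toy stand-in for a decompressor: expands "3a2b" into "aaabb".
// Illustrative only; real code would drive BrotliDecoderDecompressStream.
std::string Decode(const std::string& input) {
  std::string out;
  for (size_t i = 0; i + 1 < input.size(); i += 2) {
    int count = input[i] - '0';
    out.append(count, input[i + 1]);
  }
  return out;
}

// Decode each resource (js, css, html, ...) on its own helper thread,
// mirroring how per-stream decompression could overlap on free cores.
std::vector<std::string> DecodeAllParallel(
    const std::vector<std::string>& inputs) {
  std::vector<std::string> results(inputs.size());
  std::vector<std::thread> workers;
  for (size_t i = 0; i < inputs.size(); ++i) {
    // Each worker writes a distinct slot, so no locking is needed here.
    workers.emplace_back([&, i] { results[i] = Decode(inputs[i]); });
  }
  for (auto& t : workers) t.join();
  return results;
}
```

The per-stream independence is what makes this attractive: each compressed response has its own decoder state, so streams never contend with each other, only with the cost of thread dispatch.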
FWIW I have seen this take 1-5 frames quite regularly in various profiles over the past few months, for example on Facebook.  The example profile in bug 1367666 reminded me to comment here.  It would really be nice if we fixed this, this really does show up a lot on real sites.
After bug 1357678 landed, part of the time we spend on brotli decompression is off-main-thread now (for example, for HTML parsing and image decoding). The rest of the brotli decompression on the main thread is mainly for loading JS and CSS. Bug 1380218 and bug 1381425 were filed previously for further OMT JS/CSS loading.
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P2
Keywords: perf
Dragana - is this something you can tackle?
Flags: needinfo?(dd.mozilla)

Junior - is this something you could investigate?

Flags: needinfo?(juhsu)
Flags: needinfo?(dd.mozilla)

(In reply to Selena Deckelmann :selenamarie :selena use ni? pronoun: she from comment #9)

Junior - is this something you could investigate?

Given Comment 4 and Comment 6, IMO it could be a P3 for now. It could go on my backlog, but I'm not familiar with the pb-Http setup.

Flags: needinfo?(juhsu)
Priority: P2 → P3
Whiteboard: [qf:investigate:p1][necko-next] → [qf:investigate:p1][necko-triaged]

QF triage team has reviewed this bug and it seems like a pageload issue.

Whiteboard: [qf:investigate:p1][necko-triaged] → [qf:p1:pageload][necko-triaged]

I think moving decompression off the main thread would help quite a bit.
Here's a more recent profile with 175ms spent inside nsHTTPCompressConv::OnDataAvailable: http://bit.ly/2ShNR8P
This is from loading https://www.facebook.com/DwayneJohnson .

Generally, the main thread is always busy during page load; the more work we can move off of it, the better. And things that are already asynchronous from the main thread's perspective (such as network requests) are a particularly attractive target.

Assignee: nobody → brennie
Status: NEW → ASSIGNED

Hi :dragana,

What all is required to accomplish this? Is it sufficient to read the data from the HTTP stream into a buffer and then pass that off to a thread to do the decoding?

Flags: needinfo?(dd.mozilla)

(In reply to Barret Rennie [:brennie] from comment #13)

Hi :dragana,

What all is required to accomplish this? Is it sufficient to read the data from the HTTP stream into a buffer and then pass that off to a thread to do the decoding?

We have a way to deliver OnDataAvailable off-main-thread, but all listeners must agree.
For listeners that require OnDataAvailable to be delivered on the main thread but whose content is encoded, do the following:

  1. deliver OnDataAvailable to the decoder, nsHTTPCompressConv, off the main thread.
  2. the decoder should dispatch decoded data to the main thread.

We need to be careful to properly cancel data delivery from the decoder if the channel gets canceled.

I think this should work.

Flags: needinfo?(dd.mozilla)
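The cancellation concern in the comment above can be sketched with an atomic flag that the decoder checks before handing each decoded chunk onward; once the channel is "canceled", further deliveries are dropped. Hypothetical names only, not the Necko implementation.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical sketch of cancel-aware delivery from an off-main-thread
// decoder. Deliver() is what the decoder thread would call per chunk.
class CancelableDelivery {
 public:
  explicit CancelableDelivery(std::function<void(std::string)> listener)
      : mListener(std::move(listener)) {}

  // Called when the channel is canceled; later chunks are silently dropped.
  void Cancel() { mCanceled.store(true, std::memory_order_relaxed); }

  // Called by the decoder for each decoded chunk.
  // Returns true if the chunk was forwarded to the listener.
  bool Deliver(std::string chunk) {
    if (mCanceled.load(std::memory_order_relaxed)) {
      return false;  // channel canceled: stop forwarding decoded data
    }
    mListener(std::move(chunk));
    return true;
  }

 private:
  std::atomic<bool> mCanceled{false};
  std::function<void(std::string)> mListener;
};
```

In the real channel-cancellation path there is also a race window (a chunk already dispatched when Cancel() lands), which is exactly why the comment flags this as something to be careful about.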

I ran some try runs of the patch I posted above (and some variants); here are the results:

Always move off-main-thread, all decompression
This is the current version of the patch posted for review. If the listener chain for nsHTTPCompressConv doesn't support off-main-thread OnDataAvailable, nsHTTPCompressConv::OnDataAvailable will dispatch a task to call its listener on the main thread.
Regresses the following:

  • raptor-tp6-imgur-firefox opt 4.06%
  • raptor-tp6-instagram-firefox opt 27.92%
  • raptor-tp6-linkedin-firefox opt 2%
  • raptor-tp6-sheets-firefox opt 3%
  • raptor-tp6-yahoo-news-firefox opt 2.62%

Only move off-main-thread when the entire listener chain supports it, all decompression
Regresses the following:

  • raptor-tp6-imgur-firefox opt 5.18%
  • raptor-tp6-instagram-firefox opt 28.44%

Improves the following:

  • raptor-tp6-wikipedia-firefox opt 2.28%
  • raptor-tp6-google-firefox opt 3.10%

Always move off-main-thread, only Brotli
Like the patch posted for review, this will always re-target off the main thread and dispatch the listener back to the main thread when it doesn't support re-targeting.
Regresses the following:

  • raptor-tp6-facebook-firefox-cold opt 33.20%

Improves the following:

  • raptor-tp6-google-firefox opt 3.15%
  • raptor-tp6-reddit-firefox opt 3.48%
  • raptor-tp6-youtube-firefox opt 3.10%

Only move off-main-thread when the entire listener chain supports it, only Brotli
Regresses the following:

  • raptor-tp6-facebook-firefox-cold opt 49.73%
  • raptor-tp6-imgur-firefox opt 4.44%
  • raptor-tp6-instagram-firefox opt 30.12%
  • raptor-tp6-linkedin-firefox opt 2.21%
Attachment #9068169 - Attachment is obsolete: true
Assignee: brennie → nobody
Status: ASSIGNED → NEW

I'm going to get this compiling again and re-evaluate it with our latest performance tests which include features like visual metrics.

Current nightly still seeing about 65ms in the content process during facebook pageload on a MacBook Pro.
https://share.firefox.dev/3eyoWLl

I've gotten the patches compiling again.

Nazim was kind enough to capture new profiles with these patches:

Baseline, 109ms of Brotli decompression on content process main thread:
https://share.firefox.dev/3ny3zhd

With Brotli moved omt: now only 4ms of Brotli decompression on content process main thread:
https://share.firefox.dev/3sXGHZx

Awaiting visual metric runs on try.

We have some interesting results from visual metrics.

I only rebuilt two of the variations from Comment 16:
Always move off-main-thread, all decompression

While this regresses many sites, there are also numerous improvements of up to 50% in visual metrics.

Always move off-main-thread, only Brotli decompression

This regresses more sites than it improves, which is interesting.

I think a good next step would be to compare these variations on a larger set of pages, ideally using live sites.

Performance Impact: --- → P1
Keywords: perf:pageload
Whiteboard: [qf:p1:pageload][necko-triaged] → [necko-triaged]

The Performance Priority Calculator has determined this bug's performance priority to be P1.

Platforms: macOS
Impact on site: Causes noticeable jank
Page load impact: Some
[x] Affects major website
[x] Able to reproduce locally

While this would be useful right now, to my understanding it will no longer be needed once we can move the network calls off the content process main thread, Bug 1639433.

Severity: normal → S3

The severity field for this bug is set to S3. However, the Performance Impact field flags this bug as having a high impact on the performance.
:jesup, could you consider increasing the severity of this performance-impacting bug? Alternatively, if you think the performance impact is lower than previously assessed, could you request a re-triage from the performance team by setting the Performance Impact flag to ??

For more information, please visit auto_nag documentation.

Flags: needinfo?(rjesup)
Depends on: omt-content
Flags: needinfo?(rjesup)
See Also: → 1851351

(In reply to Andrew Creskey [:acreskey] from comment #23)

While this would be useful right now, to my understanding it will no longer be needed once we can move the network calls off the content process main thread, Bug 1639433.

Why would that be? The Brotli decompression occurs on the parent process main thread right now.

I wonder why there are such regressions from moving this OMT. Also, if we can move it OMT, does it benefit (or perhaps reduce/remove regressions) from OnDataFinished OMT?

Flags: needinfo?(acreskey)

(In reply to Randell Jesup [:jesup] (needinfo me) from comment #25)

(In reply to Andrew Creskey [:acreskey] from comment #23)

While this would be useful right now, to my understanding it will no longer be needed once we can move the network calls off the content process main thread, Bug 1639433.

Why would that be? The Brotli decompression occurs on the parent process main thread right now.

Oh, that's interesting.
I was relaying Dragana's thoughts on this one when we deprioritized it.
But I didn't realize that the decompression was on the parent process main thread.

I think that in that case we should have a closer look at this patch and its implications.

Flags: needinfo?(acreskey)

Content processes will now always retarget delivery of OnDataAvailable for Http channels off the main thread. Consumers that were previously retargeting off-main-thread are not affected and their retargeting will stick, but any Http channel that was not retargeted off the main thread will be retargeted to the nsIStreamTransportService.

If the listener for nsHTTPCompressConv cannot be called off the main thread (i.e. the call to nsIRetargetableRequest::CheckListenerChain would fail), nsHTTPCompressConv will be called off the main thread but will dispatch its decoded data back to the main thread.
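The behavior the patch describes boils down to a per-channel decision: if the whole listener chain is retargetable, decoded chunks are delivered directly on the worker; otherwise the decoder still runs off the main thread but queues its output for the main thread to drain. Below is a simplified model with a plain queue standing in for the main-thread event loop; none of these names are Gecko APIs.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Simplified model: decide once whether decoded chunks go straight to the
// listener (fully retargetable chain) or get bounced to the "main thread".
struct FallbackDispatcher {
  bool chainIsRetargetable;
  std::vector<std::string> deliveredOffMain;  // consumed on the worker
  std::deque<std::string> mainThreadQueue;    // stand-in for main-thread events

  // Called on the decoder's worker thread for each decoded chunk.
  void OnDecoded(std::string chunk) {
    if (chainIsRetargetable) {
      deliveredOffMain.push_back(std::move(chunk));
    } else {
      mainThreadQueue.push_back(std::move(chunk));  // bounce back to main
    }
  }

  // Later, the main-thread event loop drains the queue.
  std::vector<std::string> DrainOnMainThread() {
    std::vector<std::string> out(mainThreadQueue.begin(),
                                 mainThreadQueue.end());
    mainThreadQueue.clear();
    return out;
  }
};
```

Either way, the expensive decompression itself stays on the worker; the only thing that ever crosses back to the main thread in the fallback path is already-decoded data.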

Assignee: nobody → rjesup
Attachment #9359262 - Attachment description: WIP: Bug 1356686 - Do decompression off main thread in content r=#necko-reviewers! → Bug 1356686 - Do decompression off main thread in content r=#necko-reviewers!
Status: NEW → ASSIGNED

SVGs must be decoded on the main thread because they need to access the LoadGroup, etc., while other images can be decoded via the decode pool. This ensures that CheckListenerChain() will return the right value for this instance.

Pushed by rjesup@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/54754a50d992
Make InterceptFailedOnStop support nsIThreadRetargetableStreamListener r=necko-reviewers,valentin
https://hg.mozilla.org/integration/autoland/rev/3b150d676b31
Do decompression off main thread in content r=necko-reviewers,valentin,extension-reviewers,robwu
https://hg.mozilla.org/integration/autoland/rev/6e9726c26d07
Ensure that imgRequest::CheckListenerChain returns the correct value r=tnikkel
https://hg.mozilla.org/integration/autoland/rev/63dc52c69902
Put OMT decompression behind a pref r=necko-reviewers,valentin
See Also: → 1886237
Pushed by rjesup@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/969c8ce344d6
Make InterceptFailedOnStop support nsIThreadRetargetableStreamListener r=necko-reviewers,valentin
https://hg.mozilla.org/integration/autoland/rev/c58710d55a77
Do decompression off main thread in content r=necko-reviewers,valentin,extension-reviewers,robwu
https://hg.mozilla.org/integration/autoland/rev/356510340dd1
Ensure that imgRequest::CheckListenerChain returns the correct value r=tnikkel
https://hg.mozilla.org/integration/autoland/rev/f5ac2d368c7d
Put OMT decompression behind a pref r=necko-reviewers,valentin
See Also: → 1886328
Depends on: 1886734
Regressions: 1888091
Flags: needinfo?(rjesup)
Regressions: 1889577

Capturing Randell's summary of improvements observed in telemetry ( from #performance-wins)

FCP shows a 10% improvement pretty much across the board, with 20% at the 95th percentile, ~100%(!) at the 99th percentile, and ~180% at the 99.9th percentile.
LCP showed a 10% improvement at the 50th percentile, 20% at the 75th, and 95%(!) at the 95th percentile.

Sunil will be conducting an experiment with this pref to get a second view on the impact.

Regressions: 1899112