OMT brotli decompression
Categories
(Core :: Networking: HTTP, enhancement, P3)
Tracking
(Performance Impact: high)
People
(Reporter: ehsan.akhgari, Unassigned)
References
(Depends on 1 open bug, Blocks 2 open bugs)
Details
(4 keywords, Whiteboard: [necko-triaged])
Attachments
(1 obsolete file)
See this profile for example: <https://perfht.ml/2og8OSq> I captured it by loading the facebook homepage. We spend 53ms on the main thread decompressing the brotli stream. I bet this is a lot worse on a slow machine. It seems we shouldn't be running decompression code on the target thread of the Necko data delivery. Even if the target thread isn't the main thread (for example for the HTML parser) it seems that blocking HTML parsing for data decompression is a pretty bad idea if the machine does have free cores to run more than one thing concurrently.
Reporter
Comment 1•7 years ago
Marking as [qf:investigate:p1] because it's premature to assume that simply moving this OMT is going to make things faster, of course.
Reporter
Comment 2•7 years ago
BTW I have another profile as well: https://perfht.ml/2ofWYYD This is super easy to reproduce, FTR.
Updated•7 years ago
Comment 3•7 years ago
nsHTTPCompressConv doesn't implement nsIThreadRetargetableStreamListener, so OMT delivery is disabled whenever decompression is required. The first thing to try is to re-enable OMT delivery for this scenario, i.e. move the decompression overhead to the parser thread instead of blocking the main thread. I am not sure whether moving decompression to a separate 'converter' thread would improve performance further, since the HTML parser would still be waiting for the decompressed data; page loading and rendering for HTML would still be blocked either way. However, it would benefit the scenarios where OMT is disabled for other reasons. Further experiments are needed to understand the benefit.
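For illustration only (this is not the actual patch, and the member name mListener is an assumption), re-enabling OMT delivery would roughly mean having nsHTTPCompressConv implement nsIThreadRetargetableStreamListener and defer the retargeting decision to the next listener in the chain:

```cpp
// Sketch (not the real patch): let nsHTTPCompressConv participate in
// off-main-thread delivery by forwarding the retargeting check to the
// downstream listener. "mListener" is assumed to be the wrapped listener.
NS_IMETHODIMP
nsHTTPCompressConv::CheckListenerChain() {
  // Retargeting is only safe if everything downstream also supports it;
  // otherwise delivery has to stay on the main thread.
  nsCOMPtr<nsIThreadRetargetableStreamListener> retargetable =
      do_QueryInterface(mListener);
  if (!retargetable) {
    return NS_ERROR_NO_INTERFACE;
  }
  return retargetable->CheckListenerChain();
}
```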
Comment 4•7 years ago
I'm not confident this will end up helping a lot either, but there are cases where we fetch multiple compressed resources in parallel (JS, CSS, and HTML are all commonly compressed), so helper threads could at least give some parallelism. Honestly, though, this processing happens pretty fast, so it might not be a win after the overhead.
Reporter
Comment 5•6 years ago
FWIW I have seen this take 1-5 frames quite regularly in various profiles over the past few months, for example on Facebook. The example profile in bug 1367666 reminded me to comment here. It would really be nice if we fixed this; it shows up a lot on real sites.
Comment 6•6 years ago
After bug 1357678 landed, part of the time we spend on brotli decompression is now off the main thread (for example, for HTML parsing and image decoding). The rest of the brotli decompression on the main thread is mainly for loading JS and CSS. Bug 1380218 and bug 1381425 were filed previously for further OMT JS/CSS loading.
Comment 7•6 years ago
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Updated•5 years ago
Comment 10•5 years ago
(In reply to Selena Deckelmann :selenamarie :selena use ni? pronoun: she from comment #9)
> Junior - is this something you could investigate?
Given comment 4 and comment 6, IMO it could be a P3 for now. It could be on my backlog, but I'm not familiar with the pb-Http setup.
Comment 11•5 years ago
QF triage team has reviewed this bug and it seems like a pageload issue.
Comment 12•5 years ago
I think moving decompression off the main thread would help quite a bit.
Here's a more recent profile with 175ms spent inside nsHTTPCompressConv::OnDataAvailable: http://bit.ly/2ShNR8P
This is from loading https://www.facebook.com/DwayneJohnson .
Generally, the main thread is always busy during page load; the more work we can move off of it, the better. And things that are already asynchronous from the main thread's perspective (such as network requests) are a particularly attractive target.
Updated•5 years ago
Comment 13•5 years ago
Hi :dragana,
What all is required to accomplish this? Is it sufficient to read the data from the HTTP stream into a buffer and then pass that off to a thread to do the decoding?
Comment 14•5 years ago
(In reply to Barret Rennie [:brennie] from comment #13)
> Hi :dragana,
> What all is required to accomplish this? Is it sufficient to read the data from the HTTP stream into a buffer and then pass that off to a thread to do the decoding?
We have a way to deliver OnDataAvailable off the main thread, but all listeners must agree.
For listeners that require OnDataAvailable to be delivered on the main thread but whose content is encoded, we could do the following:
- deliver OnDataAvailable to the decoder, i.e. nsHTTPCompressConv, off the main thread;
- have the decoder dispatch the decoded data to the main thread.
We need to be careful to properly cancel data delivery from the decoder if the channel gets cancelled.
I think this should work.
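A rough sketch of that hand-off (the helper, its name, and the members are illustrative assumptions, not the actual patch): after decompressing a chunk off the main thread, check that the channel hasn't been cancelled, then dispatch the decoded bytes back to the main thread for the listener.

```cpp
#include "nsThreadUtils.h"   // NS_DispatchToMainThread, NS_NewRunnableFunction
#include "nsStringStream.h"  // NS_NewCStringInputStream (assumed header)

// Hypothetical helper on nsHTTPCompressConv, called from the off-main-thread
// OnDataAvailable after a chunk has been decompressed.
nsresult nsHTTPCompressConv::DeliverDecodedData(nsIRequest* aRequest,
                                                nsCString&& aDecoded,
                                                uint64_t aOffset) {
  // If the channel was cancelled, drop the data instead of delivering it.
  nsresult status = NS_OK;
  aRequest->GetStatus(&status);
  if (NS_FAILED(status)) {
    return status;
  }

  nsCOMPtr<nsIStreamListener> listener = mListener;
  nsCOMPtr<nsIRequest> request = aRequest;
  return NS_DispatchToMainThread(NS_NewRunnableFunction(
      "nsHTTPCompressConv::DeliverDecodedData",
      [listener, request, decoded = std::move(aDecoded), aOffset]() {
        // Wrap the decoded bytes in a stream, since OnDataAvailable takes an
        // nsIInputStream, and call the main-thread-only listener there.
        nsCOMPtr<nsIInputStream> stream;
        if (NS_SUCCEEDED(NS_NewCStringInputStream(getter_AddRefs(stream),
                                                  decoded))) {
          listener->OnDataAvailable(request, stream, aOffset,
                                    decoded.Length());
        }
      }));
}
```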
Comment 15•4 years ago
Comment 16•4 years ago
So I ran some try runs of the patch I posted above (and some variants), and here are the results:
Always move off-main-thread, all decompression
This is the current version of the patch posted for review. If the listener chain for nsHTTPCompressConv doesn't support off-main-thread OnDataAvailable, nsHTTPCompressConv::OnDataAvailable will dispatch a task to call its listener on the main thread.
Regresses the following:
- raptor-tp6-imgur-firefox opt 4.06%
- raptor-tp6-instagram-firefox opt 27.92%
- raptor-tp6-linkedin-firefox opt 2%
- raptor-tp6-sheets-firefox opt 3%
- raptor-tp6-yahoo-news-firefox opt 2.62%
Only move off-main-thread when entire listener chain supports, all decompression
Regresses the following:
- raptor-tp6-imgur-firefox opt 5.18%
- raptor-tp6-instagram-firefox opt 28.44%
Improves the following:
- raptor-tp6-wikipedia-firefox opt 2.28%
- raptor-tp6-google-firefox opt 3.10%
Always move off-main-thread, only Brotli
Like the patch posted for review, this will always re-target off the main thread and dispatch the listener call back to the main thread when the listener doesn't support re-targeting.
Regresses the following:
- raptor-tp6-facebook-firefox-cold opt 33.20%
Improves the following:
- raptor-tp6-google-firefox opt 3.15%
- raptor-tp6-reddit-firefox opt 3.48%
- raptor-tp6-youtube-firefox opt 3.10%
Only move off-main-thread when entire listener chain supports, only Brotli
Regresses the following:
- raptor-tp6-facebook-firefox-cold opt 49.73%
- raptor-tp6-imgur-firefox opt 4.44%
- raptor-tp6-instagram-firefox opt 30.12%
- raptor-tp6-linkedin-firefox opt 2.21%
Updated•3 years ago
Updated•3 years ago
Comment 17•3 years ago
I'm going to get this compiling again and re-evaluate it with our latest performance tests which include features like visual metrics.
Comment 18•3 years ago
Current nightly still seeing about 65ms in the content process during facebook pageload on a MacBook Pro.
https://share.firefox.dev/3eyoWLl
Comment 19•2 years ago
I've gotten the patches compiling again.
Nazim was kind enough to capture new profiles with these patches:
Baseline, 109ms of Brotli decompression on content process main thread:
https://share.firefox.dev/3ny3zhd
With Brotli moved OMT: now only 4ms of Brotli decompression on the content process main thread:
https://share.firefox.dev/3sXGHZx
Awaiting visual metric runs on try.
Comment 20•2 years ago
We have some interesting results from visual metrics.
I only rebuilt two of the variations from comment 16:
Always move off-main-thread, all decompression
While this regresses many sites, there are also numerous improvements of up to 50% in visual metrics.
Always move off-main-thread, only Brotli decompression
This regresses more sites than it improves, which is interesting.
Comment 21•2 years ago
I think a good next step would be to compare these variations on a larger set of pages, ideally using live sites.
Updated•2 years ago
Comment 22•1 year ago
The Performance Priority Calculator has determined this bug's performance priority to be P1.
Platforms: macOS
Impact on site: Causes noticeable jank
Page load impact: Some
[x] Affects major website
[x] Able to reproduce locally
Comment 23•1 year ago
While this would be useful right now, to my understanding it will no longer be needed once we can move the network calls off the content process main thread, Bug 1639433.
Updated•1 year ago
Comment 24•10 months ago
The severity field for this bug is set to S3. However, the Performance Impact field flags this bug as having a high impact on performance.
:jesup, could you consider increasing the severity of this performance-impacting bug? Alternatively, if you think the performance impact is lower than previously assessed, could you request a re-triage from the performance team by setting the Performance Impact flag to "?"?
For more information, please visit the auto_nag documentation.
Comment 25•13 days ago
(In reply to Andrew Creskey [:acreskey] from comment #23)
> While this would be useful right now, to my understanding it will no longer be needed once we can move the network calls off the content process main thread, Bug 1639433.
Why would that be? The Brotli decompression occurs on the parent process main thread right now.
I wonder why there are such regressions from moving this OMT. Also, if we can move it OMT, does it benefit (or perhaps reduce/remove regressions) from OnDataFinished OMT?
Comment 26•10 days ago
(In reply to Randell Jesup [:jesup] (needinfo me) from comment #25)
> (In reply to Andrew Creskey [:acreskey] from comment #23)
> > While this would be useful right now, to my understanding it will no longer be needed once we can move the network calls off the content process main thread, Bug 1639433.
> Why would that be? The Brotli decompression occurs on the parent process main thread right now.
Oh, that's interesting.
I was relaying Dragana's thoughts on this one when we deprioritized it.
But I didn't realize that the decompression was on the parent process main thread.
I think that in that case we should have a closer look at this patch and its implications.
Updated•10 days ago