Closed Bug 1516121 Opened 5 years ago Closed 2 months ago

Cached data delayed in getting to Content Process on facebook page

Categories

(Core :: Networking, enhancement, P2)

58 Branch
enhancement

Tracking


RESOLVED WORKSFORME

People

(Reporter: jesup, Unassigned)

References

(Depends on 2 open bugs, Blocks 1 open bug)

Details

(Keywords: perf, Whiteboard: [necko-triaged])

When loading https://www.facebook.com/DwayneJohnson/ the Master Process appears to finish the network requests for css files (cached) in ~25ms, but the data doesn't reach the Content process for ~200ms, most of which time the content process is idle waiting for the CSS.

Note that the initial HTML load from facebook is taking 1.3-1.6 seconds; that's not the issue here (or our problem I believe; Chrome sees the same delay).

The Master Process MainThread is quite busy during that delay, including network code and AsyncClassifier.

Resolving this would allow the Content Process to make progress on the rest of the things needed to finish the load.

One issue identified by Honza is that the Master Process is clogged due to many parallel loads resolving at the same time, so they all get delayed.

There may also be a known issue affecting overall load time here with cache reads not being parallelized in the IO thread, but that's likely a smaller impact.

Profile (zoomed in on delay): https://perfht.ml/2CtpeBx

(Linux64, Basic Compositing (OpenGL causes big problems here), dual-XEON SSD)
Depends on: 1516117
Depends on: 1505493
Depends on: 1516133
On the page referred to in comment 0 there are ~150 js files (not all sync, though) and ~25 css files (likely blocking, given their high priority).


I ran the use case on a Backtrack-instrumented build.  Backtracking OnStopRequest of an arbitrarily picked js file [1] on the content process shows the following, read backwards:



Overall the load takes 705ms  (from opening the channel on the content process to finish notification of OnStopRequest on the target consumer, instantiated at [2]).


StopRequestEvent, delay 399.48ms.  Cause: this is a dependent queue event called from CompleteResumeRunnable.
Other blockers on the dependent queue: TransportAndDataEvent, time 1.60ms

  (as a side note, sending Msg_OnStopRequest is delayed by 21.87ms because of 123 pre-scheduled IPC messages we are sending from the parent process main thread)

CompleteResumeRunnable *1, delay 397.87ms.  Cause: there are 934 blockers on the main thread of the content process (roughly 0.43ms per blocker on average), where the biggest offenders are:
- previously scheduled CompleteResumeRunnable events for previously requested js files (1.5 - 6ms)
- dom::NotifyOffThreadScriptLoadCompletedRunnable (3 - 7ms)

HttpBackgroundChannelChild::RecvOnTransportAndData on the socket thread, delay 21.54ms.  Cause: this is a dependent runnable blocked by delivering OnStartRequest to the consumer on the main thread.

PHttpChannel::Msg_OnStartRequest, delay: 43.45ms.  Cause: there are 636 blockers scheduled on Web Content/Main Thread.
Offenders seem to be similar to those for CompleteResumeRunnable

CacheFileInputStream::NotifyListener, delay 53.14ms.  This can be eliminated with bug 1505493.

CacheEntry::AvailableCallbackRunnable, delay 59.06ms, 2208 blockers on the parent process main thread.  We may think of some way to eliminate this, but I'm afraid we must notify various on-http-* observers and do some other main-thread only stuff.

net::ReadEvent on the cache IO thread, delay 13.86ms.  Cause: 124 ReadEvent blockers pre-scheduled on the cache IO thread for previously requested js files; we may improve this slightly with bug 1516117.  Note that all the loads have the same priority, hence they use the same queue on the single cache IO thread.

nsUrlClassifierDBService::AsyncClassifyLocalWithFeatures, delay 42.22ms and 24ms (called twice on the parent process main thread).  Cause: 1054 and 773 respective blockers.  Blockers are arbitrary proxy release events, various network starting events and nsUrlClassifierDBService::AsyncClassifyLocalWithFeatures.

MozPromise::Then @DoAsyncOpen, delay 37.17ms.  This is a dependent promise waiting for the PHttpBck channels to be established, which under the hood happens instantly itself.  Cause of the delay: MozPromise::ThenValueBase::ResolveOrRejectRunnable blocked on the main thread for 37.03ms by 824 blockers originating from previous js file requests.

net::ContinueAsyncOpenRunnable, delay: 19.63ms, blocker count: 508

PNecko::Msg_PHttpChannelConstructor run on the main thread, delay: 19.66ms, blocker count: 502



To sum up: each of the massive number of js loads more or less follows the same path at roughly the same time (all the channels seem to be opened from a single nsIDocument::DispatchContentLoadedEvents run scheduled from nsHtml5ExecutorReflusher).  This means that the critical paths are, as they advance, fighting for the same resources, i.e. thread queues.  The most contended resource is the main thread of the content process, followed by the main thread of the parent process.

The idling content process main thread in the Gecko Profiler seems to be bogus.

One possible improvement (probably a big change) would be to allow processing of js data off the main thread.  Is there a bug?

Logs for inspection are at [3]; use [4] to view them (sorry, no docs for the tool are available at this time).  Search for 9BW9EWzKZgg.js and pick the first objective->label pair if you want to play with this.




[1] https://static.xx.fbcdn.net/rsrc.php/v3ilFk4/yQ/l/en_US/9BW9EWzKZgg.js
[2] https://searchfox.org/mozilla-central/rev/f5e90a1d62f47baa644c28274f8e03f2b8896f2f/dom/script/ScriptLoader.cpp#1146-1151
[3] https://janbambas.cz/dwnlds/backtrack-logs-1545505614.tar.xz
[4] https://mayhemer.github.io/backtrack-analyzer/

*1 - CompleteResumeRunnable is used to deliver the OnStopRequest notification after we have delivered all data to the consumer.  Data are carried through PBackground, while OnStart/OnStop go via main-thread-to-main-thread IPC.
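
To make the footnote concrete, here is a minimal sketch of the dependent-queue idea, with entirely hypothetical names (the real Necko channel event queue is considerably more involved).  The only point illustrated is that the stop notification cannot be delivered until every previously queued data event has been flushed on the consumer's thread:

// Simplified, hypothetical model: data events arrive on one path,
// the stop event on another, and the stop must wait for all queued data.
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

class DependentEventQueue {
 public:
  using Event = std::function<void()>;

  // Data events are queued as they arrive (e.g. from the background channel).
  void EnqueueData(Event aEvent) {
    std::lock_guard<std::mutex> lock(mMutex);
    mPending.push(std::move(aEvent));
  }

  // The stop event is parked until the data queue has drained.
  void EnqueueStop(Event aStop) {
    std::lock_guard<std::mutex> lock(mMutex);
    mStop = std::move(aStop);
  }

  // Called on the consumer's thread: flushes all data first, then the stop.
  void Flush() {
    std::queue<Event> data;
    Event stop;
    {
      std::lock_guard<std::mutex> lock(mMutex);
      std::swap(data, mPending);
      std::swap(stop, mStop);
    }
    while (!data.empty()) {   // every queued "OnDataAvailable" runs first
      data.front()();
      data.pop();
    }
    if (stop) {               // only now may "OnStopRequest" be delivered
      stop();
    }
  }

 private:
  std::mutex mMutex;
  std::queue<Event> mPending;
  Event mStop;
};

If the consumer's thread (here, the content process main thread) is saturated with other runnables, the flush itself is what gets delayed, which matches the CompleteResumeRunnable delay measured above.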
> The idling content process main thread in the Gecko Profiler seems to be
> bogus.

I don't think so... but remember, I said it was idling until the CSS was loaded (around 200ms in the profile I took).
After that the content process is busy, yes, and the JS comes in much later, so I can believe that's blocked on content MT stuff (for part of the time).  IIRC it doesn't even get requested until the CSS is loaded (and so, indirectly, the CSS blocks the JS from loading by blocking it from being requested).

> 
> One of possible improvements (probably a big change) would be to allow
> processing of js data off the main thread.  Is there a bug?

Processing of JS (I assume you mean execution) off main-thread would be a crazy amount of complexity.  Easier would be to move all the non-DOM/JS stuff off MainThread onto some secondary thread (SecondThread ;-) ), and then when that code needs to interact with DOM/etc (unfortunately often) synchronize them in some way.  Making DOM multi-thread safe would be ... really tough.
(In reply to Randell Jesup [:jesup] (Away until Jan 2) from comment #2)
> > The idling content process main thread in the Gecko Profiler seems to be
> > bogus.
> 
> I don't think so... but remember, I said it was idling until the CSS was
> loaded (around 200ms in the profile I took).

Looking at the profile from comment 0 again, I can see that we process the loading of CSS files on the parent process and, according to the Gecko Profiler, we are done with it way sooner than the content process is delivered all the css data (or onstoprequest).  There is a huge gap between finishing the load on the parent process and finishing it on the content process, while the CP MT is doing nothing - according to GP.

This is the "bogus" I was referring to.  But now I know it's not a bogus.  I believe the parent process network timing is collected correctly, but what you see is onstart/ondata/onstop as captured by the http transaction - on the socket thread.  But onstop on the content process depends on httpchannel's onstoprequest being called on the parent process MT - and I can see that the PP MT is really busy at the time this should happen.  so, I believe that PP MT flooding is the cause of the huge network load gap between parent and content.

I looked again at the BT logs and also tried to run the test case again, but apparently I can't reproduce the delay locally.  Probably because the PP MT starvation on your machine was caused by the OpenGL compositing issue you later identified?

> After that the content process is busy, yes, and the JS comes in much later,
> so I can believe that's blocked on content MT stuff (for part of the time). 
> IIRC it doesn't even get requested until the CSS is loaded (and so,
> indirectly, the CSS blocks the JS from loading by blocking it from being
> requested).

This is intentional.  I think what happens is as follows:
- css files are so-called blocking resources, necessary for page rendering
- another type of blocking resource is sync js files
- anything else is blocked from loading (but not from scheduling!) until all the blocking resources are completely loaded (on the parent process); in the h2 world, blockers are considered "leaders" - a high-priority blocking group.
- the ~140 or so js files are added dynamically (apparently, from what I can see in BT) from DOMContentLoaded:

LABEL "HttpChannelChild::AsyncOpen [https://static.xx.fbcdn.net/rsrc.php/v3/yW/r/Y71omDlTi6o.js]" @13700.38ms
  Web Content(18764)/Main Thread  
delay: 12.72ms

>>> INFO "profiler tracing: Navigation [DOMContentLoaded|1]" @13687.66ms
  Web Content(18764)/Main Thread  
delay: 0.02ms

OBJECTIVE "nsIDocument::DispatchContentLoadedEvents [https://www.facebook.com/DwayneJohnson/]" @13687.64ms
  Web Content(18764)/Main Thread  
delay: 0.01ms

EXECUTE_BEGIN "nsIDocument::DispatchContentLoadedEvents" @13687.63ms
  Web Content(18764)/Main Thread  
delay: 8.21ms, blocker count: 508

DISPATCH "" @13679.42ms
  Web Content(18764)/Main Thread  
delay: 0.00ms

OBJECTIVE "timing::NotifyDOMContentFlushedForRootContentDocument [https://www.facebook.com/DwayneJohnson/]" @13679.42ms
  Web Content(18764)/Main Thread  
delay: 0.02ms

- DOMContentLoaded is obviously blocked, in this case by all the referenced css resources

Hence, this is intentional (even if those js files were in the markup as <script async>) and here we simply follow the spec.


> 
> > 
> > One of possible improvements (probably a big change) would be to allow
> > processing of js data off the main thread.  Is there a bug?
> 
> Processing of JS (I assume you mean execution)

No.  I mean buffering the data and, when the load is complete (aka onstoprequest), doing the necessary processing, on MT or off MT, depending (I don't know the code well enough here).  But it seems like we just send the whole data buffer to one of the js compilation threads.  This MT loop could potentially be removed, with some reasonable care?  (AFAIK, we don't support incremental js content compilation, so there is no reason to buffer and consume the data on the main thread.)
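
To illustrate the pattern being suggested (this is not the actual ScriptLoader or nsIIncrementalStreamLoader code; all names below are hypothetical): data is only accumulated as it streams in, and at stop time the whole buffer is handed to a background compile task, so in principle neither step has to run on the main thread:

// Hypothetical sketch of "buffer, then compile off the main thread".
#include <cstddef>
#include <cstdint>
#include <future>
#include <vector>

struct CompiledScript {};  // opaque stand-in for a byte-compilation result

// Stand-in for the JS engine's compilation entry point (hypothetical).
static CompiledScript CompileScript(const std::vector<uint8_t>& /*aSource*/) {
  return CompiledScript{};
}

class ScriptStreamConsumer {
 public:
  // Called repeatedly as chunks arrive; only appends to a private buffer,
  // so nothing here requires the main thread.
  void OnDataAvailable(const uint8_t* aData, size_t aLength) {
    mBuffer.insert(mBuffer.end(), aData, aData + aLength);
  }

  // Called once the load is complete; hands the whole buffer to a
  // background task instead of doing the work on the calling thread.
  void OnStopRequest() {
    mResult = std::async(std::launch::async,
                         [buf = std::move(mBuffer)]() {
                           return CompileScript(buf);
                         });
  }

  // The main thread picks up the compiled script only when it needs it.
  CompiledScript TakeResult() { return mResult.get(); }

 private:
  std::vector<uint8_t> mBuffer;
  std::future<CompiledScript> mResult;
};

Whether the real code can be restructured this way depends on the ScriptLoader details discussed later in this bug; the sketch only shows that buffering and hand-off do not inherently need a main-thread loop.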

> off main-thread would be a
> crazy amount of complexity.  Easier would be to move all the non-DOM/JS
> stuff off MainThread onto some secondary thread (SecondThread ;-) ),

:)

> and
> then when that code needs to interact with DOM/etc (unfortunately often)
> synchronize them in some way.  Making DOM multi-thread safe would be ...
> really tough.

Yep, and that's not what I was suggesting :)
Hmmm.  Re-profiling doesn't seem to show the same offset in time between the Master and Content processes.  I see differences later, in the JS loads, but the content process is very busy then, so I can believe that the MT processing of the JS loads is delayed by that (and wouldn't then delay the overall load).  I'll leave this open for a bit, because I may see the problem again -- the comment 0 profile definitely seemed to show a problem.  Perhaps it was an "immediately after content process is started" issue?  I'll check.
(In reply to Honza Bambas (:mayhemer) from comment #3)
> (In reply to Randell Jesup [:jesup] (Away until Jan 2) from comment #2)
> > > The idling content process main thread in the Gecko Profiler seems to be
> > > bogus.
> > 
> > I don't think so... but remember, I said it was idling until the CSS was
> > loaded (around 200ms in the profile I took).
> 
> Looking at the profile from comment 0 again, I can see that we process
> loading CSS files on the parent process and, according to GeckoProfiler, we
> are done with it way sooner than the content process is delivered all the
> css data (or onstoprequest).  There is a huge gap between finishing load on
> the parent process and finishing it on the content process while the CP MT
> is doing nothing - according to GP.  
> 
> This is the "bogus" I was referring to.  But now I know it's not a bogus.  I
> believe the parent process network timing is collected correctly, but what
> you see is onstart/ondata/onstop as captured by the http transaction - on
> the socket thread.  But onstop on the content process depends on
> httpchannel's onstoprequest being called on the parent process MT - and I
> can see that the PP MT is really busy at the time this should happen.  so, I
> believe that PP MT flooding is the cause of the huge network load gap
> between parent and content.

Is this actually delaying processing, or is it just delaying the marker?  If onstop being delayed delays processing, then it's a problem.  If it doesn't, then I'm much less concerned.

> I looked again at the BT logs and also tried to run the test case again, but
> apparently I can't reproduce the delay locally.  Probably because the PP MT
> starvation on your machine was caused by the OpenGL compositing issue you
> have later identified?

Nope, that profile was with basic compositing.  I'm having trouble reproducing it today...

[snip]
> Hence, this is intentional (even if those js files where in the markup as
> <script async>) and here we simply follow the spec.

So we'll load those JS async files at a high priority, even if we may not need to load them as part of the initial group, right?  We have seen some async scripts get executed "early"; I wonder if this might be involved.  smaug?

> > > One of possible improvements (probably a big change) would be to allow
> > > processing of js data off the main thread.  Is there a bug?
> > 
> > Processing of JS (I assume you mean execution)
> 
> No.  I mean buffering the data and, when the load is complete (aka
> onstoprequest), do the necessary processing, on MT or off MT, depending
> (here I don't know the code well enough).  But it seems like we just send
> the whole data buffer to one of the js compilation threads.  This MT loop
> could potentially be removed, with some reasonable care?  (AFAIK, we don't
> support incremental js content compilation, so there is no reason to buffer
> and consume the data on the main thread.)

Are you talking about parsing/byte-compilation?  Or JIT compilation?  JIT happens off-main-thread (except on 2-core machines), but that only happens once code has been hit some number of times.
Flags: needinfo?(bugs)
Updated profile (initial load, but with a delay): https://perfht.ml/2Tq1yn3 

The CSS loads don't seem as problematic timing-wise as comment 0's.

While loading the JS code (see this zoom-in), the content thread is idle, and the main thread is busy (and the network requests seem to take much longer in Content than Master): https://perfht.ml/2To9Iwg (and scroll down, then flip between Master and Content 2's Network tracks to compare).  Questions about that: what's happening that the content process is idle while we're waiting for this (cached) JS?  What is the master process doing in that time (it's reasonably busy but not running a single block of code)?  Why does the JS data all come in "together" after 300-ish ms on the Master process?  Why are some of them very delayed in the content process (I assume the master process or content process were busy, perhaps with the earlier JS, and delayed processing the onstop event on the content process)?
Thanks
Flags: needinfo?(mpurzynski)
Flags: needinfo?(honzab.moz)
Async scripts, per spec, run when they get a chance to run, but before the load event [1].  We could definitely try to postpone processing them if we know there is more important stuff to do.  Bug 1514011 is related to that.


[1] https://html.spec.whatwg.org/images/asyncdefer.svg
Flags: needinfo?(bugs)
(In reply to Randell Jesup [:jesup] (Away until Jan 2) from comment #5)
> Is this actually delaying processing, or is it just delaying the marker?  If
> onstop being delayed delays processing, then it's a problem.  If it doesn't,
> then I'm much less concerned.

It works this way:
On the parent process, HttpChannelParent is a stream listener of nsHttpChannel; onstart/ondata/onstop are all called on the main thread.  onstart is IPC'ed main thread to main thread to HttpChannelChild.  ondata calls are sent from the parent process main thread to the content process socket thread and _queued_ until onstart returns on the content process main thread.  onstop is sent main thread to main thread and _queued_ until all ondata calls return (finish calling) on the content process socket thread.

The question is what you mean by "processing" here.  For js I can see we just buffer the data in nsIIncrementalStreamLoader.  In onstop we actually do something with the data at [1] (calling [2]); this just hands the data to a js compilation thread.  The result is then expected back on the main thread.  This is how I understand it from roughly looking at the code.

So, I think the processing is delayed.  Incremental compilation on a background thread might help.  Bug 1505493 might help.  Calling onstop on a different thread (to bypass one MT loop) might help.

[1] https://searchfox.org/mozilla-central/rev/0ee0b63732d35d16ba22d5a1120622e2e8d58c29/dom/script/ScriptLoader.cpp#2786
[2] https://searchfox.org/mozilla-central/rev/0ee0b63732d35d16ba22d5a1120622e2e8d58c29/dom/script/ScriptLoader.cpp#1793

> 
> > I looked again at the BT logs and also tried to run the test case again, but
> > apparently I can't reproduce the delay locally.  Probably because the PP MT
> > starvation on your machine was caused by the OpenGL compositing issue you
> > have later identified?
> 
> Nope, that profiles was with basic compositing.  I'm having trouble
> reproducing it today...

Oh, that's interesting.  Can you think of anything that may have changed?

> 
> [snip]
> > Hence, this is intentional (even if those js files where in the markup as
> > <script async>) and here we simply follow the spec.
> 
> So we'll load those JS async files at a high priority, 

Not sure why "high priority" - async js resources are loaded unprioritized.

> even if we may not
> need to load them as part of the initial group, right? 

Can you please tell me what "initial group" means?

> We have seen some
> async scripts get executed "early; I wonder if this might be involved. 
> smaug?
> 
> > > > One of possible improvements (probably a big change) would be to allow
> > > > processing of js data off the main thread.  Is there a bug?
> > > 
> > > Processing of JS (I assume you mean execution)
> > 
> > No.  I mean buffering the data and, when the load is complete (aka
> > onstoprequest), do the necessary processing, on MT or off MT, depending
> > (here I don't know the code well enough).  But it seems like we just send
> > the whole data buffer to one of the js compilation threads.  This MT loop
> > could potentially be removed, with some reasonable care?  (AFAIK, we don't
> > support incremental js content compilation, so there is no reason to buffer
> > and consume the data on the main thread.)
> 
> Are you talking parsing/byte-compilation?  Or JIT compilation?  JIT happens
> off-main-thread (except on 2-core machines), but that only happens once code
> is hit some number of times.

Oh, here my knowledge of the JS code is too limited to answer, sorry.  But as said above, I believe this is off-main-thread byte compilation.
(In reply to Randell Jesup [:jesup] (Away until Jan 2) from comment #6)
> Updated profile (initial load, but with a delay): https://perfht.ml/2Tq1yn3 
> 
> The CSS loads don't seem as problematic timing-wise as comment 0's.
> 
> While loading the JS code (see this zoom-in), the content thread is idle,
> and the mainthread is busy (and the network requests seem to take much
> longer in Content than Master): https://perfht.ml/2To9Iwg (and scroll down,
> then flip between Master and Content 2's Network tracks to compare). 
> Questions about that: what's happening that the content process is idle
> while we're waiting for this (cached) JS?  

I believe I have explained this in comment 3.  

> What is the master process doing
> in that time (it's reasonably busy but not running a single block of code). 

I can't be sure of the details, but as explained in comment 8, all the onstart/ondata/onstop notifications have to go through the master process main thread, one by one, for ~140 different resources.

> Why does the JS data all come in "together" after 300ish ms on the Master
> process?  

Again, I explained this in comment 3 and comment 8.

> Why are some of them very delayed in the content process (I assume
> the master process or content process were busy, perhaps with the earlier
> JS, and delayed processing the onstop event on the content process)? 
> Thanks

Yes, very likely.
Flags: needinfo?(honzab.moz)
Hm, what am I doing here? Hi everyone btw :)
Flags: needinfo?(mpurzynski)
Whiteboard: [qf] → [qf:p1:pageload]
I see this bug more as a meta/investigation bug.  Leaving it for now in the necko triage pool; more investigation is still needed on Randell's side.

(In reply to Michal Purzynski [:michal`] (use NEEDINFO) from comment #10)
> Hm, what am I doing here? Hi everyone btw :)

I think a bugzilla bug makes needinfo-ing :michal (michal.novotny) difficult - it doesn't show up when typing :michal. It may be that the ` in your name is messing it up.

(In reply to Valentin Gosu [:valentin] from comment #12)
> (In reply to Michal Purzynski [:michal`] (use NEEDINFO) from comment #10)
> > Hm, what am I doing here? Hi everyone btw :)
>
> I think a bugzilla bug makes needinfo-ing :michal (michal.novotny) difficult - it doesn't show up when typing :michal. It may be that the ` in your name is messing it up.

Just hit this problem with keeler. Going to file a bug (if there is not one already).

Keywords: meta
Summary: Cached data delayed in getting to Content Process on facebook page → [meta] Cached data delayed in getting to Content Process on facebook page

Giving P2 as this should not be forgotten.

Priority: -- → P2
Whiteboard: [qf:p1:pageload] → [qf:p1:pageload][necko-triaged]

Mass-removing myself from cc; search for 12b9dfe4-ece3-40dc-8d23-60e179f64ac1 or any reasonable part thereof, to mass-delete these notifications (and sorry!)

Whiteboard: [qf:p1:pageload][necko-triaged] → [necko-triaged]

Randell, is this still reproducible?

Flags: needinfo?(rjesup)
Severity: normal → N/A

I'm not able to reproduce this delay, but I think this is important if it's still occurring.

Facebook has changed enough that this isn't directly reproducible; the issues Honza mentions above likely still exist if triggered (pileup on the main process MainThread, etc.).

Flags: needinfo?(rjesup)
Status: NEW → RESOLVED
Closed: 2 months ago
Keywords: meta
Resolution: --- → WORKSFORME
Summary: [meta] Cached data delayed in getting to Content Process on facebook page → Cached data delayed in getting to Content Process on facebook page