Open Bug 603807 Opened 14 years ago Updated 2 years ago

Sync paint buffers during startup blocks the content process and can cause poor startup time

Categories: Core :: Graphics, defect

People: Reporter: azakai, Unassigned

PLayersChild:Msg_Update is synchronous, and causes significant slowdowns on Fennec. After recent patches, this is now the top sync message in terms of latency. Data:

When starting Fennec and loading a page (cnn.com), 0.47 seconds are spent in the child in calls to PLayersChild:Msg_Update, which is 10% of the total time. The parent spends only 0.003 seconds actually processing that message, so the great majority of the latency comes from the parent having other things to do while the child waits for the parent to get around to it.

When loading a page (after Fennec itself is already loaded, and is idle), only 0.012 seconds are spent (0.5% of the total time). That matches the data in the previous paragraph - this only becomes a serious problem when the parent is busy (which could be not only with initial load stuff as in the previous paragraph, but also other tabs, etc., or even just the device having a slow processor).

Making PLayersChild:Msg_Update async would solve this problem.
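A rough sketch of what these numbers imply, using only the figures quoted above. The variable names are invented for illustration and nothing here comes from the Fennec source:

```python
# Figures quoted in the report above, in seconds.
child_wait_startup = 0.47   # child time inside Msg_Update, Fennec startup + cnn.com
parent_processing = 0.003   # parent time actually spent handling the message
child_wait_idle = 0.012     # same page load with Fennec already started and idle

# Share of the child's wait that is real work in the parent.
processing_share = parent_processing / child_wait_startup
# The remainder is time the message spent queued behind other parent work.
queueing_share = 1.0 - processing_share
# How much longer the child waits when the parent is busy.
busy_vs_idle = child_wait_startup / child_wait_idle

print(f"processing: {processing_share:.1%}, queueing: {queueing_share:.1%}")
print(f"busy parent: ~{busy_vs_idle:.0f}x the wait of an idle parent")
```

On these figures, well under 1% of the wait is processing; the rest is queueing delay in the parent.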
Blocks: 588050
tracking-fennec: --- → ?
Making layers updates async is possible, but it's quite nontrivial and deciding whether or not to do it is a complicated question.

How many Update()s are sent in that profile?
> How many Update()s are sent in that profile?

For cnn.com, 9.

That reminds me, this can be much worse on pages with animations. On webhamster.com, after 10 seconds over 300 such messages have been sent out, with 0.89 seconds of latency.
If 9 sync messages take half a second to process (~50msec per) during page load, our chrome-process event loop latency is too high.  Let's find out where the event-loop latency is coming from; that impacts UI responsiveness and panning/zooming animations.  Bug 601268 might be relevant if we're blocked on long runnables, but not if we just have a huge backlog of tasks to process.  Function timers might be a better measuring tool.

Processing 300 Update()s in 0.89 seconds is within a factor of 3 of what I had estimated, I'm not worried about that (that rate can drive a 300 fps animation).
> If 9 sync messages take half a second to process (~50msec per) during
> page load, our chrome-process event loop latency is too high.  Let's find
> out where the event-loop latency is coming from;

Those numbers were from loading Fennec itself, + loading a web page. During load there is a lot of heavy stuff going on, I'm not sure it's surprising the parent process is blocked up.

I think this is important to improve, since (1) we care very much about the time to load Fennec + load the first page, and (2) there are other cases where the parent might be blocked, and it would be good not to start to suffer from the sync messages in those cases.

> Processing 300 Update()s in 0.89 seconds is within a factor of 3
> of what I had estimated, I'm not worried about that (that rate can
> drive a 300 fps animation).

Well, 0.89 seconds of wait during 10 seconds of showing an animation means that the child process is working ~9% slower than it should. For example, if it's also running some JavaScript, or showing a video, those will be negatively impacted.
(In reply to comment #4)
> > If 9 sync messages take half a second to process (~50msec per) during
> > page load, our chrome-process event loop latency is too high.  Let's find
> > out where the event-loop latency is coming from;
> 
> Those numbers were from loading Fennec itself, + loading a web page. During
> load there is a lot of heavy stuff going on, I'm not sure it's surprising the
> parent process is blocked up.
> 

Such as?  That's what I'd like to find out.

> I think this is important to improve, since (1) we care very much about the
> time to load Fennec + load the first page, and (2) there are other cases where
> the parent might be blocked, and it would be good not to start to suffer from
> the sync messages in those cases.
> 

No, if we have to worry about chrome being blocked, we lose.  The content process wouldn't be able to get new pixels to screen synchronously or asynchronously and we wouldn't be able to process input events so perceived responsiveness would go to zilch.

> > Processing 300 Update()s in 0.89 seconds is within a factor of 3
> > of what I had estimated, I'm not worried about that (that rate can
> > drive a 300 fps animation).
> 
> Well, 0.89 seconds of wait during 10 seconds of showing an animation means that
> the child process is working ~9% slower than it should. For example, if it's
> also running some JavaScript, or showing a video, those will be negatively
> impacted.

So, "working 9% slower" is an oversimplification.  Video is decoded off the main thread, so Update() doesn't block it (we have separate plans for video anyway, bug 598868).  300 Update()s in 10s is 30 paints/sec, and if that's the target the page aimed for, Update() isn't a bottleneck.
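The rates being argued about here can be checked with a few lines of arithmetic. This is only a sketch based on the webhamster.com figures quoted in this thread (300 messages, 0.89 seconds of latency, 10 seconds of animation); the variable names are made up for illustration:

```python
# Figures quoted earlier in the thread for webhamster.com.
updates = 300          # Update() messages sent
window_s = 10.0        # length of the measurement window, seconds
total_wait_s = 0.89    # total time the child spent waiting on Update()

paints_per_sec = updates / window_s             # rate the page actually painted at
avg_wait_ms = total_wait_s / updates * 1000.0   # average wait per Update()
wait_fraction = total_wait_s / window_s         # share of child time spent blocked
# Upper bound on the Update() rate if the child did nothing but wait on Update().
max_update_rate = updates / total_wait_s

print(f"{paints_per_sec:.0f} paints/sec, {avg_wait_ms:.2f} ms per Update()")
print(f"{wait_fraction:.1%} of the window blocked, ceiling ~{max_update_rate:.0f} Update()s/sec")
```

Both readings of the data hold at once: the child is blocked for about 9% of the window, and the per-message cost is still low enough to sustain roughly 300 Update()s per second.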

To be clear, I'm not arguing that we shouldn't implement async Update().  We likely will want to.  But, unlike prefs, the system was carefully designed not to send more than 60 messages a second, and unlike prefs, making it async is extremely difficult.  It's also been designed so that processing each Update() is basically free.  Making Update() async isn't a clear win either because it's going to require more memory.

The profile numbers show that the event loop has an unexpectedly high latency.  Let's find out what's causing that.  If it's, say, sync crunching away on the chrome main thread, then fixing sync will make this problem and others go away.  If it's something else that we can't fix, then making Update() async would assume a higher priority.
(In reply to comment #5)
> (In reply to comment #4)
> > > If 9 sync messages take half a second to process (~50msec per) during
> > > page load, our chrome-process event loop latency is too high.  Let's find
> > > out where the event-loop latency is coming from;
> > 
> > Those numbers were from loading Fennec itself, + loading a web page. During
> > load there is a lot of heavy stuff going on, I'm not sure it's surprising the
> > parent process is blocked up.
> > 
> 
> Such as?  That's what I'd like to find out.

I will try to find that out.

> 
> > I think this is important to improve, since (1) we care very much about the
> > time to load Fennec + load the first page, and (2) there are other cases where
> > the parent might be blocked, and it would be good not to start to suffer from
> > the sync messages in those cases.
> > 
> 
> No, if we have to worry about chrome being blocked, we lose.  The content
> process wouldn't be able to get new pixels to screen synchronously or
> asynchronously and we wouldn't be able to process input events so perceived
> responsiveness would go to zilch.

I agree that chrome being backed up is very bad. But there is unavoidable backing up during initial startup - startup does take several seconds, of 100% CPU usage. If the child sends a sync message during that time, it will end up waiting. Startup time for Firefox and Fennec is being worked on very hard, but I think we will never get it close to 0, which is the only situation where a sync message will not block the child.

> 
> > > Processing 300 Update()s in 0.89 seconds is within a factor of 3
> > > of what I had estimated, I'm not worried about that (that rate can
> > > drive a 300 fps animation).
> > 
> > Well, 0.89 seconds of wait during 10 seconds of showing an animation means that
> > the child process is working ~9% slower than it should. For example, if it's
> > also running some JavaScript, or showing a video, those will be negatively
> > impacted.
> 
> So, "working 9% slower" is an oversimplification.  Video is decoded off the
> main thread, so Update() doesn't block it (we have separate plans for video
> anyway, bug 598868).  300 Update()s in 10s is 30 paints/sec, and if that's the
> target the page aimed for, Update() isn't a bottleneck.
> 

Good point. I guess for video this is not important.

However, it is for JavaScript. A JS-heavy site running at full speed will run ~9% slower in the example above.

> To be clear, I'm not arguing that we shouldn't implement async Update().  We
> likely will want to.  But, unlike prefs, the system was carefully designed not
> to send more than 60 messages a second, and unlike prefs, making it async is
> extremely difficult.  It's also been designed so that processing each Update()
> is basically free.  Making Update() async isn't a clear win either because it's
> going to require more memory.
> 

I don't know much about what layers does here (why this is sync, why it's hard to make async, etc.). Is there somewhere I can read about the design?
(In reply to comment #6)
> I agree that chrome being backed up is very bad. But there is unavoidable
> backing up during initial startup - startup does take several seconds, of 100%
> CPU usage. If the child sends a sync message during that time, it will end up
> waiting. Startup time for Firefox and Fennec is being worked on very hard, but
> I think we will never get it close to 0, which is the only situation where a
> sync message will not block the child.
> 

Sure.  But again, it depends on the kind of blockage.  If, say, there are a handful of events stopping things up because of synchronous disk IO (there are several known violators), those are the things we need to fix and then re-evaluate.

> However, it is for JavaScript. A JS-heavy site running at full speed will run
> ~9% slower in the example above.
> 

There are cases where that's possible, but again it's complicated.

> > To be clear, I'm not arguing that we shouldn't implement async Update().  We
> > likely will want to.  But, unlike prefs, the system was carefully designed not
> > to send more than 60 messages a second, and unlike prefs, making it async is
> > extremely difficult.  It's also been designed so that processing each Update()
> > is basically free.  Making Update() async isn't a clear win either because it's
> > going to require more memory.
> > 
> 
> I don't know much about what layers does here (why this is sync, why it's hard
> to make async, etc.). Is there somewhere I can read about the design?

Best bet is https://wiki.mozilla.org/Gecko:CrossProcessLayers.  In a nutshell, child/parent share surfaces with each other.  The parent holds read-only front buffers and the child draws into back buffers.  The child can't draw directly into the parent's front buffers (unsynchronized) or we'd get rendering glitches.  The sync Update() is where backs/fronts are swapped.  If Update() were async, the child would have to give up its back surfaces, i.e. nothing to paint into, for an arbitrary amount of time, until the parent signaled that it was done using the old front surfaces for painting to screen.  During that time, the child would need to suppress all painting without also suppressing other useful work.  That's tricky, but possible, for most things except <canvas>, because script can write to those at any time.  We'd need a third buffer and a more costly update pattern.  Other solutions to the <canvas> problem are possible, but there are tradeoffs.

I'm not interested in taking on that hard work just to, e.g., work around synchronous IO on the chrome main thread.
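The front/back buffer swap described above can be modelled in a few lines. This is a toy sketch, not the layers code: `SharedLayer` and its methods are hypothetical names, surfaces are plain Python lists, and the "blocking" of the child is only implied by the swap happening inside a single call:

```python
class SharedLayer:
    """Toy model of one layer shared between content (child) and chrome (parent)."""

    def __init__(self):
        self.front = []   # read-only for the parent, composited to screen
        self.back = []    # the only surface the child may draw into

    def child_paint(self, pixels):
        # The child never touches the front buffer.
        self.back = list(pixels)

    def sync_update(self):
        # Stand-in for the synchronous Update(): the child is blocked while
        # the parent swaps, so the parent never reads a half-drawn surface.
        self.front, self.back = self.back, self.front

    def parent_composite(self):
        # The parent only ever reads the front buffer.
        return list(self.front)


layer = SharedLayer()
layer.child_paint(["frame 1"])
before_swap = layer.parent_composite()    # child's work is not visible yet
layer.sync_update()
after_swap = layer.parent_composite()     # the parent now sees frame 1
layer.child_paint(["frame 2"])            # drawn into the old front buffer
still_frame_1 = layer.parent_composite()  # unchanged until the next swap
print(before_swap, after_swap, still_frame_1)
```

In this model the child always has exactly one surface to paint into. Making the swap asynchronous removes that guarantee, which is where the suppress-painting or third-buffer question comes from.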
tracking-fennec: ? → 2.0-
Regarding the reason the parent is blocked up: With blassey's help I did some profiling of the events run during startup. There is no synchronous IO on the main thread - just on side threads. So no real fix there. There are simply a lot of things that get done on the main thread during startup, like loading and initializing components (parsing and executing JS ones in particular can take time), etc.

Regarding possible solutions: Thanks for the explanation! Ok, I see the need for a third buffer, and that has obvious disadvantages. I'm not sure I understand why this would be complex and/or problematic otherwise though - so I guess I'm missing something. Can you tell me why the following wouldn't work? (if it would work, then I'd love to try to implement it myself)

* Child renders to back buffer A.
* Child sends Update to parent, asynchronously.
* Child continues doing stuff, now writing to back buffer B instead of A. The child knows not to send further Update messages (which the parent would not be able to handle anyhow).
* Parent gets around to handling the child's message, finishes handling it, and returns a response.
* Child gets the response, and makes a note of that. When it next wants to send an Update, it does so, at which time it switches back to writing to back buffer A and so forth.

The downsides to this seem to be

1. The third buffer.
2. The child may have to suppress an Update and send it later, which is not good. On the other hand, the child is free to continue working instead of waiting on the sync message. And if the child only sends Updates at a reasonable rate (spaced at least 1/60 of a second apart, say), the likelihood of needing to suppress an Update is very small, because the parent's response will have arrived long before the child wants to send the next one. Suppression would only happen when there is momentary lag in the parent, such as during initial load or some other infrequent special event.
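The steps proposed above can be sketched as a small state machine. This is a toy model under stated assumptions, not an implementation: `AsyncUpdateModel` and its method names are hypothetical, surfaces are just labels, and it assumes that when the parent handles an Update the old front surface is handed back to the child, in line with the front/back swap described earlier in this thread. The buffer labels therefore rotate among three surfaces rather than alternating strictly between A and B:

```python
class AsyncUpdateModel:
    """Toy model of async Update() with one extra buffer (hypothetical names)."""

    def __init__(self):
        self.front = "S0"       # surface the parent composites from
        self.drawing = "S1"     # back buffer the child is painting into
        self.spare = "S2"       # the extra (third) buffer
        self.in_flight = None   # surface sent to the parent, not yet acknowledged
        self.suppressed = 0     # Updates the child wanted to send but could not

    def child_wants_update(self):
        if self.in_flight is not None:
            # The parent has not answered yet: keep painting, send nothing.
            self.suppressed += 1
            return False
        # Hand the finished surface to the parent and keep working on the spare.
        self.in_flight = self.drawing
        self.drawing, self.spare = self.spare, None
        return True

    def parent_handles_update(self):
        if self.in_flight is None:
            raise RuntimeError("no Update pending")
        # Swap: the in-flight surface becomes the front, the old front is
        # returned to the child as its new spare.
        self.spare = self.front
        self.front = self.in_flight
        self.in_flight = None


m = AsyncUpdateModel()
first = m.child_wants_update()    # sent; child moves on to S2
second = m.child_wants_update()   # parent still busy: suppressed
m.parent_handles_update()         # parent swaps, S0 becomes the spare
third = m.child_wants_update()    # sent; child moves on to S0
print(first, second, third, m.suppressed, m.front, m.drawing)
```

The child never blocks and never draws into the surface the parent is reading, at the cost of holding three surfaces instead of two and occasionally delaying an Update.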
If this bug is about startup time, the sync-ness of Update isn't going to matter to the *user* at all: the chrome process will get the bits and paint them at the exact same time either way.

My question here is really why we're trying to paint the content before startup is complete. Can we simply avoid painting it until we've basically finished the startup sequence?
I read in previous comments that you were measuring startup-->page-load before, but it didn't really register.  I assume you're doing something like |fennec http://www.cnn.com|?  In absence of evidence to the contrary, I'm going to claim that's an uncommon case, because the user would have to explicitly set a non-default, remote start page to do that on device.  (Or load links from other apps, but again I'll claim that's less common than start->home screen.)

(In reply to comment #8)
> Regarding the reason the parent is blocked up: With blassey's help I did some
> profiling of the events run during startup. There is no synchronous IO on the
> main thread - just on side threads.

How did you measure lack of sync IO?  Fastload cache is known to do sync IO on startup.  It's not just IO, but also too-long-running events, like excessive computation.  What was the longest event?  A timeline would be useful.

> 
> Regarding possible solutions: Thanks for the explanation! Ok, I see the need
> for a third buffer, and that has obvious disadvantages.

There's no absolute need for a third buffer except for <canvas>, and even there other schemes are possible.

Your proposed scheme works, but "1. The third buffer." is a big deal.  We're already using way too much memory on fennec, and this plan increases that by 50%.

In exchange for either (i) very difficult work to suppress painting during async Update() + 50% <canvas> memory overhead or (ii) 50% memory overhead across the board, we potentially reduce the time the content process waits on the chrome process by 0.5 seconds during startup-to-load-cnn.  How long is the total startup-to-load-cnn time you're measuring?  I see ~10s start-to-load-cnn.com on my desktop machine.  So that's a ~5% reduction, with no increase in perceived responsiveness.  And this is on an uncommon startup path, in exchange for what's listed above.  Bad trade IMHO.  If we get full startup event timelines and/or profiler data, I would almost guarantee we can shave more time than that off startup and possibly increase perceived responsiveness, more easily, with no memory tradeoff.
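The tradeoff in this comment can be put into numbers. A minimal sketch, assuming two surfaces per layer today (one front, one back) and three under the async scheme, and using the timing figures quoted in this thread; the variable names are invented for illustration:

```python
# Memory side: one front + one back today, plus one extra buffer if async.
buffers_now = 2
buffers_async = 3
memory_increase = buffers_async / buffers_now - 1.0

# Time side, using the ~10 s desktop measurement from this comment.
wait_saved_s = 0.5
desktop_total_s = 10.0
desktop_saving = wait_saved_s / desktop_total_s

# The original report said 0.47 s was 10% of the total, which implies a
# shorter total run than the desktop measurement.
reported_total_s = 0.47 / 0.10

print(f"memory: +{memory_increase:.0%}")
print(f"time saved: {desktop_saving:.0%} of a {desktop_total_s:.0f} s run")
print(f"total run implied by the original report: {reported_total_s:.1f} s")
```

The 5% and 10% figures differ because they are fractions of different total times, not because the measured wait differs.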

(In reply to comment #9)
> If this bug is about startup time, the sync-ness of Update isn't going to
> matter to the *user* at all: the chrome process will get the bits and paint
> them at the exact same time either way.
> 

Sync Update() can affect the time it takes cnn.com to fully load by reducing the amount of concurrency between content/chrome.  That is, the content process can't continue to parse/reflow/run script/etc. during Update().

> My question here is really why we're trying to paint the content before startup
> is complete. Can we simply avoid painting it until we've basically finished the
> startup sequence?

It's possible, but we do want cnn.com to load as soon as possible, and since most of the process of loading cnn.com and starting up the frontend can be done in parallel, it makes sense to do so.  Except when there are bad tradeoffs needed to make them parallel.
(In reply to comment #10)
> (In reply to comment #9)
> > If this bug is about startup time, the sync-ness of Update isn't going to
> > matter to the *user* at all: the chrome process will get the bits and paint
> > them at the exact same time either way.
> > 
> 
> Sync Update() can affect the time it takes cnn.com to fully load by reducing
> the amount of concurrency between content/chrome.  That is, the content process
> can't continue to parse/reflow/run script/etc. during Update().
> 

In a bit more detail: what you say would be true if we only painted cnn.com once, after it was fully loaded, but we repaint it 9 times (according to Alon's data) before it's fully loaded.  The sync-ness of Update() during those first 8 repaints cuts down on the time the content process could have spent doing other things.  Of course, it might have spent that waiting on network requests, we don't know without better data.
(In reply to comment #9)
> My question here is really why we're trying to paint the content before startup
> is complete. Can we simply avoid painting it until we've basically finished the
> startup sequence?

Sorry, looking back over this I realize I missed your point.  We might be able to do that, but if we can repaint during frontend startup it makes the page appear to load faster, for the same reason we repaint pages multiple times while they're loading.

But again this is on an uncommon path so I don't think it's worth bothering with too much.
(In reply to comment #10)
> I read in previous comments that you were measuring startup-->page-load before,
> but it didn't really register.  I assume you're doing something like |fennec
> http://www.cnn.com|?  In absence of evidence to the contrary, I'm going to
> claim that's an uncommon case, because the user would have to explicitly set a
> non-default, remote start page to do that on device.  (Or load links from other
> apps, but again I'll claim that's less common than start->home screen.)
> 

Actually I think the remote start page will be the common startup case, since Fennec will likely be set up to load a page like

  http://www.google.com/firefox

(or whatever website makes sense for the people setting up that device).

Aside from startup time, I am also concerned about animations. Running the sunspider, v8 or kraken benchmarks on a page with animations will run slower than it should, currently, due to blocking. Likewise it would slow down real-world applications like games that can max out the CPU on JavaScript.

> (In reply to comment #8)
> > Regarding the reason the parent is blocked up: With blassey's help I did some
> > profiling of the events run during startup. There is no synchronous IO on the
> > main thread - just on side threads.
> 
> How did you measure lack of sync IO?  Fastload cache is known to do sync IO on
> startup.  It's not just IO, but also too-long-running events, like excessive
> computation.  What was the longest event?  A timeline would be useful.

I'll do further profiling on that, but for the reasons I mentioned above, I think we already have enough motivation to try to figure out something to improve the blocking situation.

> 
> In exchange for either (i) very difficult work to suppress painting during
> async Update() + 50% <canvas> memory overhead or (ii) 50% memory overhead
> across the board, we potentially reduce the time the content process waits on
> the chrome process by 0.5 seconds during startup-to-load-cnn.  How long is the
> total startup-to-load-cnn time you're measuring?  I see ~10s
> start-to-load-cnn.com on my desktop machine.  So that's a ~5% reduction, with
> no increase in perceived responsiveness.

I saw a 10% difference in my data mentioned before. I agree that responsiveness is not a factor during startup, just absolute page load times. I also agree that a 50% memory increase is very bad (although, maybe on some devices it wouldn't matter).

What about not blocking during startup and page load? That is, to suffer rendering glitches. The partially rendered page can look odd anyhow during those times. I wonder which way would, psychologically, appear faster to the user...
(In reply to comment #13)
> Actually I think the remote start page will be the common startup case, since
> Fennec will likely be set up to load a page like
> 

Hm, I hadn't heard that.  Is there a bug or thread I can read?

> Aside from startup time, I am also concerned about animations. Running the
> sunspider, v8 or kraken benchmarks on a page with animations will run slower
> than it should, currently, due to blocking. Likewise it would slow down
> real-world applications like games that can max out the CPU on JavaScript.
> 

When we get to the point where we care about our scores on gfx benchmarks, we'll want to make this change.  We're not there yet.

> What about not blocking during startup and page load? That is, to suffer
> rendering glitches. The partially rendered page can look odd anyhow during
> those times. I wonder which way would, psychologically, appear faster to the
> user...

I'm not sure what you're suggesting; writing directly to the front buffer from the content process?  I'd rather just reduce the repaint frequency or not paint at all, as bsmedberg proposed in comment 9.

So, to be clear, I don't much care about Update() during startup because I think that'll be uncommon.  However, I do care about Update() during normal page loads.  Data there would be interesting.  Presumably (hopefully!) the chrome process will not be busy so Update() overhead is lower.  But if the chrome process *does* have a high event-loop latency during normal page load, we need to fix that, too.
Ok, I see your point. Let's focus on Update during startup, then, for now. My concern with reducing the # of repaints, or not repainting at all, is that that will give the user no indication that something is happening, and the experience will be of slowness or stalling. Letting the child paint into the front buffer during that time would show progress - with glitches though. So not sure what is better here. How bad would the glitches be?
Yes, there are several tradeoffs here.  Tearing, weirdly rotated content from exposing our "hidden" toroidal surfaces, inconsistent parts of the page like mismatched half-image/half-text, randomly weird-colored pixels, plenty more. These can stay on screen for an arbitrarily long time, until the content process finishes painting.  Those glitches make us look like amateurs, and I've already seen roc r- a patch that was set up to do something similar (bug 483409).  I'd rather not go there.
I see. That does sound very bad.

So, how about a pref value that affects how often we paint during startup?
I suspect (but don't know) that repaint-during-load is affected by interruptible reflow, for which we set magic constants tuned for desktop, where reflow affects UI responsiveness.  With chrome/content process separation, reflow only potentially affects perceived page load time, so it definitely seems worth tickling those constants some for fennec.  That's worth a separate bug (CC roc and bz, plz).
Whiteboard: [fennec-4.1?]
Assignee: nobody → azakai
tracking-fennec: - → 7+
Summary: De-Sync PLayersChild:Msg_Update IPC message → Sync paint buffers during startup blocks the content process and can cause poor startup time
Whiteboard: [fennec-4.1?]
Given the lack of traction here, I'm clearing the tracking flag. Please re-nominate if this is still needed or wanted.
tracking-fennec: 7+ → ---
Assignee: azakai → nobody
Severity: normal → S3