761933 - Gaia homescreen doesn't repaint while panning around when loaded out of process

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Description

•

12 years ago

If the homescreen is loaded in <iframe ... remote="true">, its content process crashes.  Not sure why yet.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Updated

•

12 years ago

Blocks: b2g-e10s-work

JP Rosevear [:jpr]

Updated

•

12 years ago

blocking-basecamp: --- → +

Andreas Gal :gal

Comment 1

•

12 years ago

How are we doing here?

Andreas Gal :gal

Updated

•

12 years ago

Priority: -- → P1

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 2

•

12 years ago

Will see if fixing this ameliorates bug 779968.

Assignee: nobody → jones.chris.g

Blocks: 779968

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 3

•

12 years ago

The crash is gone (maybe I was testing the wrong thing?), but there are some other bugs.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 4

•

12 years ago

Well, the good news (although possibly bad news) is that the layers txn counter and composition fps stay around 60.  Need to retest with the stop-excessive-recomposites patches.

Bad news is that, if I pan around really quickly on the homescreen, the content just stops moving.  It doesn't start moving again until I stop panning around.  We're not repainting unnecessarily or anything silly like that.  First suspicion is that we're not marking the homescreen active/visible properly and it stays at -10 nice or something.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 5

•

12 years ago

Yay, with bug 778036 the not-repainting goes away.

But the fps is still the same :(.

No longer blocks: 779968

Depends on: 778036

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 6

•

12 years ago

Since I'm about 80% sure I was originally testing with the system app, I'll go ahead and morph this into the other blocker that it is, and leave open until bug 778036 lands so that we can verify the fix still works.

Summary: Crash when loading gaia homescreen app out of process → Gaia homescreen doesn't repaint while panning around when loaded out of process

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 8

•

12 years ago

I wasn't seeing this with patches from bug 778036, but now I am again, even after it landed :(.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Updated

•

12 years ago

Blocks: 784435

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 9

•

12 years ago

The things we can try here, in parallel, are
 1. revert m-c to 8/15-era revision, re-apply bug bug 778036 and re-confirm that perf is good.  Bisect from there to tip to see what regressed this.
 2. check that the system app isn't invalidating or processing any new messages that cause the b2g process to do nontrivial work.  Some changes for async pan/zoom might have caused this.
 3. drop the process priority of the b2g process to the same as foreground content processes, and instead just increase the priority of the compositor thread

I'll look at (1).  Anyone want to volunteer to poke at (2) or (3)?

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Updated

•

12 years ago

Blocks: 784441

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 10

•

12 years ago

On http://hg.mozilla.org/mozilla-central/rev/f111e43bc076 and https://github.com/mozilla-b2g/gaia/commit/7ec42eac729ef7e0e464969797c6d3fb56dac2d5 , with bug 778036 applied, I still see the bug.

Time to poke at gaia a bit ...

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 11

•

12 years ago

Can't reproduce my results on https://github.com/mozilla-b2g/gaia/commit/d5f5e024109ce8b788e052fdef8d4f2c7689f4b2 with this patch applied

diff --git a/apps/system/js/bootstrap.js b/apps/system/js/bootstrap.js
index 1cd9052..8dfe405 100644
--- a/apps/system/js/bootstrap.js
+++ b/apps/system/js/bootstrap.js
@@ -51,6 +51,7 @@ function startup() {
   home.src = src;
   home.setAttribute('mozapp', src.replace('index.html', 'manifest.webapp'));
   home.setAttribute('mozbrowser', 'true');
+  home.setAttribute('remote', 'true');
   home.dataset.zIndexLevel = 'homescreen';
   var windows = document.getElementById('windows');
   windows.parentNode.insertBefore(home, windows);

i.e., the pre-homescreen-as-app change to gaia.

So back to the drawing board.  Grrrrr.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 12

•

12 years ago

Transaction counter is still showing that we're not overdrawing or anything like that.  The entire gfx pipeline just shuts down while panning around.

The cuddle animations are still running doubly-interpolated (content main thread along with compositor thread) at 60fps, which is odd ... the content process can repaint at 60fps alongside the compositor resampling at 60fps, but it can't repaint at *all* while the main thread is routing input events.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 13

•

12 years ago

Making the master-process priority equal to the foreground priority (and also equal to background, to eliminate possibility of bugs) doesn't change anything.  Even prioritizing the foreground process above the master process doesn't help.  Aieeeee!

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 14

•

12 years ago

(the setpriority() calls are correct and aren't returning error codes)

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 15

•

12 years ago

Thinking back, I might have been testing on the Nexus S.  A kernel- or event-hub-specific problem might be contributing to this.  Will retest on nxs in a bit to double check.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 16

•

12 years ago

In the first 1 second of fast-panning, we process 39 events in the content process, and they're generally received 10-30ms apart.  There's a 200ms outlier gap though, early in the sequence.  This seems to be when we prerender the adjacent screen.

We're not using input-event compression, which is a known bug, and this is causing the queue to back up a bit.

But the event handling everything on the content-process side looks ok ...

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 17

•

12 years ago

Alright, we're not sending PLayers:Update messages at all while panning.  Something is going badly wrong.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 18

•

12 years ago

I'm seeing 2-3 PuppetWidget::Invalidate/Paint total during panning, but per discussion with mattwoodrow I think this is expected.

Now to see if we're getting to nsViewManager::ProcessPendingUpdatesForView/nsRefreshDriver::Notify ...

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 19

•

12 years ago

nsRefreshDriver::Notify isn't being called.  Matt, any ideas about that?

Checking now whether ProcessPendingUpdatesForView is being called separately ...

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 20

•

12 years ago

No, we don't hit ProcessPendingUpdatesForView either.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 21

•

12 years ago

The symptom here is that while panning around in OOP web content (the gaia homescreen), we don't repaint at all.

What's happening is
 - content receives touch events
 - it sets the -moz-transform of some elements based on the touch events
 - the restyles required by those changes are being posted
 - we're scheduling a view manager flush
 - but there are no refresh observers registered with that refresh driver, so we never hit nsRefreshDriver::Notify() and never repaint

In the steady state of no animation or image refresh observers, what's supposed to observe refresh driver ticks?  For some reason, whatever that is seems not to be running in content processes.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 22

•

12 years ago

When style changes happen, nsCSSFrameConstructor::PostRestyleEventInternal calls mPresShell->GetPresContext()->RefreshDriver()->AddStyleFlushObserver(mPresShell). That's what triggers a new refresh driver tick to flush styles on the presshell.

Andreas Gal :gal

Comment 23

•

12 years ago

I changed this code in my original OMTA patch to no longer request that refresh driver tick. Maybe thats what breaks here? dz probably kept the basic design.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 24

•

12 years ago

This is all restyling from explicit -moz-transform changes, no animations.  It works in-process.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 25

•

12 years ago

Animations work fine OOP too, 60fps (with sidecar interpolation).

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 26

•

12 years ago

Here's approximately what's happening (with lots of logging, which is making things about 4x more costly)
 - the homescreen starts on the page with just the search icon
 - the content process is processing touchmove quickly, keeping up with dispatch from the parent
   * because we're on the page with just the icon, restyles and invalidates are cheap
   * content process is sleeping between events
 - we process the first refresh driver tick, which repaints the second page
 - the second page is more complex (lots more icons).  repainting is relatively expensive
   * while repainting, the event queue continues filling up
 - we being flushing event queue
   * restyles take longer now because we're processing both pages (?)
   * content process no longer sleeping between events
 - for some reason, we don't process more refresh driver ticks

So there are three bad things going on here
 1. not compressing input events.  This is the worst.
 2. restyles taking longer than they should.  Exacerbates (1)
 3. not process refresh-driver timer even though it's fired.  Related to (1), but possibly indicates event priority problem.

bent, do you have any idea why (3) might be occurring?

Depends on: 774988, 779968

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Updated

•

12 years ago

Depends on: 784647

Andreas Gal :gal

Comment 27

•

12 years ago

Should we send timer events from the parent to avoid priority issues?

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 28

•

12 years ago

That will just create different performance problems.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 29

•

12 years ago

Based on some unscientific tests, I think we'll be able free up 8-10MB of locked memory by moving the homescreen OOP.  And that's not counting the system wallpaper, which the system app is still rendering in my test.  That might free up another MB.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 30

•

12 years ago

The panning fps on OOP homescreen has gone down by about 10fps since last I checked.  Grrrrr.  Have some more work to do to move it OOP.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Assignee

Comment 31

•

12 years ago

The work here is done.  Optimizations remain.

Status: NEW → RESOLVED

Closed: 12 years ago

No longer depends on: 779968

Resolution: --- → FIXED