Closed Bug 975831 Opened 7 years ago Closed 7 years ago

virtual-list-demo.html checkerboards on buri with v1.4

Categories

(Core :: Graphics, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking


RESOLVED FIXED
1.4 S3 (14mar)

People

(Reporter: bkelly, Assigned: bkelly)


Details

(Keywords: perf, Whiteboard: [c= p= s= u=])

As part of our effort to improve the efficiency of our app code when scrolling long lists, it's been suggested we move to a recycled node solution with absolute positioning.  This is described in this blog post by roc:

  http://robert.ocallahan.org/2014/02/implementing-virtual-widgets-on-web.html

With this demo list:

  http://people.mozilla.org/~roc/virtual-list-demo.html

We have started to investigate implementing this in contacts over in bug 975680.  I think email is planning to move to absolute positioning in bug 796474.

Before doing that work it would be good to understand:

1) Should this demo work well without checkerboarding on b2g?  If I understand correctly it was an attempt to implement a long list that is both memory efficient and CPU efficient with minimal reflows.

2) If yes, what is going wrong on b2g to trigger checkerboarding in this relatively simple demo?

Currently I see significant checkerboarding on b2g v1.4 on my buri device.  Here is a video at gecko rev 1238ef12b996:

  https://www.dropbox.com/s/a7k68f2x7luiotk/20040223_virtual_list_checkerboard.MOV

I'm taking this for now to investigate.
One potential issue is that it appears to set a large display port, which results in buffers that must be allocated in system memory:

I/Gecko   ( 1450): ### ### SetDisplayPortForElement: x:0.000000 y:-875.239 width:980.000000 height:3175
I/Gecko   ( 1450): ### ### SetDisplayPortForElement: x:0.000000 y:-750.14 width:980.000000 height:3175
I/Gecko   ( 1450): ### ### SetDisplayPortForElement: x:0.000000 y:-154.052 width:980.000000 height:3175
E/memalloc( 1120): /dev/pmem: No more pmem available
W/memalloc( 1120): Falling back to ashmem
Branch with the virtual-list-demo.html as an app:

  https://github.com/wanderview/gaia/tree/virtual-list-demo

This runs better than in the browser, probably since zoom is not a factor.  It still gets some checkerboarding, though.
Profile with hwc:

  http://people.mozilla.org/~bgirard/cleopatra/#report=cf105361e6a557ded9e484c5b00efda5f021da34

Profile without hwc:

  http://people.mozilla.org/~bgirard/cleopatra/#report=55a812b29b42c8c968756864b6b0a701875e7b74

In both cases we see layouts of 80ms to 120ms.  With hwc we see rasterizes of 100+ms, and without hwc rasterizes of 400ms to 500ms.

It also seems the demo's getScrollPos() function can trigger 80ms to 100ms sync reflows.
It seems maybe 50ms of the large layouts is due to building the text runs.
For the hwc case I see some eglGetRenderBufferANDROID calls taking up to 14ms in their lockBuffer.  This is causing the composite to blow its 16ms budget.
Most of the big rasterizes in the hwc case seem to be in RotatedContentBuffer::BeginPaint.
I also see these in the log when hwc is enabled and checkerboarding occurs:

E/msm7627a.hwcomposer(  136): hwc_set: Unable to render by hwc due to non-pmem memory
Updated mozilla-central to 31113754db3b.  Profile with hwc:

  http://people.mozilla.org/~bgirard/cleopatra/#report=d56c241c36fe7b107aa1b9cc3ab9001d40ffdc5c

I should also note that it's much easier to trigger checkerboarding with profiling enabled.
I tried the patches from bug 921858 for font rendering speed, but they did not help.
I updated the list style a bit.  I shrunk the text to avoid overlapping text in case that was some kind of slow path.  I also set the background-color of the body to #fff so that it will scroll with the list.

Unfortunately checkerboarding can still be triggered with fast scrolling.  Another profile:

http://people.mozilla.org/~bgirard/cleopatra/#report=94b4da4d37b2fc7e1819e39d1e84b9d226b70b78

Note, the profiles are definitely worse than normal.  Turning on profiling with:

./profile.sh start -t Compositor -f js,threads,stackwalk -p b2g && ./profile.sh start -f js,stackwalk -e 200000 -p List

causes about 30% CPU usage on my buri:

  PID PR CPU% S  #THR     VSS     RSS PCY UID      Name
 2412  0  16% S    38 176656K  75844K  fg root     /system/b2g/b2g
 2625  0  16% S    13  73588K  27636K  fg app_2625 /system/b2g/plugin-container

And scrolling the list fast pushes CPU usage to ~90% to ~95%.

So the times are not equivalent to normal operation in the profile, but I would think they should still show proportionally where we're spending time.
(In reply to Ben Kelly [:bkelly] from comment #0)
> Before doing that work it would be good to understand:
> 
> 1) Should this demo work well without checkerboarding on b2g?  If I
> understand correctly it was an attempt to implement a long list that is both
> memory efficient and CPU efficient with minimal reflows.

I assume you're testing this by scrolling as fast as you can? There's no guarantee we can avoid checkerboarding in that case. Laying out and rendering screenfuls of new items at 60fps is just really hard and depending on the item content and the device, we may or may not be able to do it. I guess the question is: if we create a static page with the same content, can we scroll that without checkerboarding?

Having said that, if you're scrolling really fast then the code in the demo isn't ideal yet, because it guesses the current displayport --- which the APZC is dynamically adjusting in response to scroll speed. We need to do what I suggested in my blog post, which is to expose the current displayport to script.

(In reply to Ben Kelly [:bkelly] from comment #3)
> Profile with hwc:
> http://people.mozilla.org/~bgirard/cleopatra/
> #report=cf105361e6a557ded9e484c5b00efda5f021da34
> http://people.mozilla.org/~bgirard/cleopatra/
> #report=55a812b29b42c8c968756864b6b0a701875e7b74
> 
> In both cases we get layouts up to 80ms to 120ms.  With hwc we have
> rasterizes 100+ms and without hwc we have rasterizes at 400ms to 500ms.

The slow rasterizations in these profiles are dominated by buffer allocation, which seems wrong. Something is definitely wrong here in the layers system and we should be able to make it go away.

> It also seems the demo's getScrollPos() function can trigger 80ms to 100ms
> sync reflows.

We can probably reduce the cost of reflows a bit by simplifying the layout of the items. For example instead of making each one a flexbox, make each item a block with a single line, putting each chunk of text in an inline-block of fixed size. But I doubt we can do much to reduce the number of reflows; when scrolling at high speed, we basically need to reflow every item every time because they'll all have new content.
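
A minimal sketch of that simplification (the field names and widths here are made up, not the demo's actual markup):

```javascript
// Build each list item as a plain block containing fixed-size
// inline-blocks, instead of a flexbox. Fixed sizes sidestep most of the
// per-item flex layout work. Illustrative only.
function buildSimpleItem(doc, fields) {
  var item = doc.createElement('div'); // a block with a single line
  fields.forEach(function (field) {
    var span = doc.createElement('span');
    span.style.display = 'inline-block';
    span.style.width = field.width; // fixed width avoids reflow-time sizing
    span.textContent = field.text;
    item.appendChild(span);
  });
  return item;
}
```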

I suppose we could make each item a <canvas> and draw it ourselves, bypassing reflow that way. That sounds a bit crazy but it might work. I'd rather not go down that path because some kinds of apps would be very hard to write that way (e.g. apps with editable items).
Will having reflow roots potentially solve this problem?
(In reply to Ben Kelly [:bkelly] from comment #1)
> I/Gecko   ( 1450): ### ### SetDisplayPortForElement: x:0.000000 y:-875.239
> width:980.000000 height:3175

To make the test page more representative of what we'd be doing in Gaia apps you should set a meta-viewport tag on the page so the width isn't 980px.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #11)
> The slow rasterizations in these profiles are dominated by buffer
> allocation, which seems wrong. Something is definitely wrong here in the
> layers system and we should be able to make it go away.

Probably what's happening is that items are toggling between having and not having their own layers based on how frequently the 'top' property is being set.
It would be worth trying to use "transform" instead of positioning, and using "will-change:transform" to ensure the items are permanently layerized.
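
A hedged sketch of what that would look like (positionItem is an illustrative name, not demo code):

```javascript
// Position a recycled item with a transform instead of 'top', and hint
// permanent layerization with will-change, so repeated repositioning does
// not toggle the item in and out of its own layer. Illustrative sketch.
function positionItem(item, y) {
  item.style.willChange = 'transform';
  item.style.transform = 'translateY(' + y + 'px)';
}
```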
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #13)
> (In reply to Ben Kelly [:bkelly] from comment #1)
> > I/Gecko   ( 1450): ### ### SetDisplayPortForElement: x:0.000000 y:-875.239
> > width:980.000000 height:3175
> 
> To make the test page more representative of what we'd be doing in Gaia apps
> you should set a meta-viewport tag on the page so the width isn't 980px.

The app version of the demo effectively accomplishes this.  All the data after comment 2 were collected with this version at an appropriate width.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #14)
> (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #11)
> > The slow rasterizations in these profiles are dominated by buffer
> > allocation, which seems wrong. Something is definitely wrong here in the
> > layers system and we should be able to make it go away.
> 
> Probably what's happening is that items are toggling between having and not
> having their own layers based on how frequently the 'top' property is being
> set.

I can confirm I see the items getting switched to individual layers when checkerboarding appears.
Yeah, Vlad's video confirms it. Using transforms and will-change:transform may fix it. Although we may have another problem where an item goes offscreen temporarily and its layer is destroyed so we need to reallocate the layer when the item becomes visible again.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #11)
> I assume you're testing this by scrolling as fast as you can? There's no
> guarantee we can avoid checkerboarding in that case. Laying out and
> rendering screenfuls of new items at 60fps is just really hard and depending
> on the item content and the device, we may or may not be able to do it. I
> guess the question is: if we create a static page with the same content, can
> we scroll that without checkerboarding?

Regardless of other fixes, I think we should probably cap our velocity at some maximum.  There is going to be some limit to how fast we can reasonably scroll on these devices and we should just avoid exceeding that limit.

The non-APZC code in BrowserElementPanning.js uses 6 px/ms.  Instrumenting the code, I see I am hitting values of 6.5 px/ms to 7 px/ms with APZC enabled.

Hacking in a cap here:

http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/Axis.cpp#92

I see some slight improvement with a limit of 6 px/ms.  To completely get rid of checkerboarding with the current code, however, I have to drop it to between 3.5 px/ms to 4.0 px/ms.  This seems a bit low numerically, but still feels pretty responsive.  Maybe we could consider this as a backup plan to help minimize checkerboarding in v1.4.
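
Conceptually the cap is just a clamp; the real hack lives in Axis.cpp (C++), but in JS terms it amounts to something like this (constant name is mine):

```javascript
// Clamp scroll velocity to an experimental maximum. 6 px/ms matches the
// non-APZC BrowserElementPanning.js value; ~3.5-4 px/ms removed
// checkerboarding entirely in my testing. Illustrative sketch only.
var MAX_VELOCITY_PX_PER_MS = 6;
function clampVelocity(v) {
  return Math.max(-MAX_VELOCITY_PX_PER_MS,
                  Math.min(MAX_VELOCITY_PX_PER_MS, v));
}
```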
(In reply to Ben Kelly [:bkelly] from comment #5)
> For the hwc case I see some eglGetRenderBufferANDROID calls taking up to
> 14ms in their lockBuffer.  This is causing the composite to blow its 16ms
> budget.

This is something I've noticed under other workloads too; once you factor in the time taken for the IPC between the main and content processes, it tends to create pretty large gluts in the profile. In general I've seen the time it takes to generate, and sometimes manipulate, GL contexts/surfaces in the driver vary a *lot* under heavy load. The moral is: if we're creating a lot of layers at the same time we can't really trust the GPU driver to do so with reliable timing.
(In reply to Ben Kelly [:bkelly] from comment #19)
> Regardless of other fixes, I think we should probably cap our velocity at
> some maximum.  There is going to be some limit to how fast we can reasonably
> scroll on these devices and we should just avoid exceeding that limit.

Good idea; filed bug 976035 for this.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #18)
> Yeah, Vlad's video confirms it. Using transforms and will-change:transform
> may fix it. Although we may have another problem where an item goes
> offscreen temporarily and its layer is destroyed so we need to reallocate
> the layer when the item becomes visible again.

For reference, I believe this is the link to that video:

https://dl.dropboxusercontent.com/u/8559503/VID_20140223_220252.mp4
(In reply to Gabriele Svelto [:gsvelto] from comment #20)
> This is something I've noticed under other workloads too, once you factor in
> the time taken for the IPC between the main and content process it tends to
> create pretty large gluts in the profile. In general I've seen the time it
> takes to generate and sometimes manipulate GL contexts/surfaces in the
> driver to vary a *lot* when under heavy load. The moral is: if we're
> creating a lot of layers at the same time we can't really trust the GPU
> driver to do so with reliable timing.

Yea, I get the sense that we are running very close to 100% CPU most of the time while scrolling fast.  Anything that pushes us over the edge seems to cause things to pile up and degrade the entire system quickly.

I feel like we should probably have some target CPU usage while scrolling a reference workload (this demo or another reference list).  We need to have some breathing room so real apps have time to do other work like IDB, etc.

I'm not sure what that CPU target should be, but < 70% would be nice so we can run the profiler without pushing things over the edge.
Here's a better profile of the current virtual-list-demo app:

  http://people.mozilla.org/~bgirard/cleopatra/#report=96854687bd9d16fe6d3581f1168f771002791122

My previous profiles were accidentally run with 1ms sampling which is too fast with stackwalk enabled.

This gives more reasonable, albeit still too slow, times.  Sync reflows are ~30ms, large rasterize ~150ms, large layouts ~60ms.

I'll try the translate approach next.
So I tried changing the code to use translateY() instead of setting top, and putting will-change:transform on the .item class.  Unfortunately it made things worse.  Now each item is in its own layer all the time.

I'll try to force them into a single layer, but any clues would be helpful.
I think we can avoid the layerizing issue by just not recycling nodes.  Here is a profile where I delete the node from the DOM instead of reusing it:

http://people.mozilla.org/~bgirard/cleopatra/#report=b7c22bc93c1f38c938c6e22a7f2bd0647475d635
This patch seems to avoid most checkerboarding, although it still uses 80% to 90% CPU while scrolling.  If we want to do things like access IDB or use more complex CSS, I expect we will need to lower that further.

  https://github.com/wanderview/gaia/commit/8a172fc2153e00ef92a6a34e20f1fae720ca08f1

Here's a profile:

  http://people.mozilla.org/~bgirard/cleopatra/#report=4b836d4a4084e66206370dbaa9920e510bb6c244
Ok, one more update to the demo app.

  https://github.com/wanderview/gaia/commit/477bca542c185982da3d8a8844e9554a3e79f8a1

I changed basically two things:

1) Recycle the nodes, but detach and re-attach them from the DOM to avoid the layerization behavior.
2) Append the nodes to the DOM after they have been populated and had their top position set.
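
Roughly, the two changes combine like this sketch (names are illustrative, not the exact demo code):

```javascript
// Recycle a node by (1) detaching it before reuse, so the reattached node
// is treated as fresh content rather than a moved layer, and (2) populating
// and positioning it while detached, so the DOM only sees one fully-formed
// insertion. Illustrative sketch, not the actual demo code.
function recycleItem(container, item, newTop, populate) {
  if (item.parentNode) {
    item.parentNode.removeChild(item); // (1) detach to avoid layerization
  }
  populate(item);                      // (2) fill content while detached
  item.style.top = newTop + 'px';      //     set position while detached
  container.appendChild(item);         //     then append last
}
```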

This drops CPU usage while scrolling to 70% to 80%.  The profile also looks much better to me:

  http://people.mozilla.org/~bgirard/cleopatra/#report=71981579c39fa0de187c1418b9e9cb1db209723b

Note, it does seem BrowserElementPanning is a bit heavy when we trigger touch events while swiping.  It seems there is often a touch event when there is a gap in the composition pipeline.  The panning stuff is now handled by APZC, but I guess we still need that code to handle mouse/touch events properly for apps?
That last profile in comment 28 was with tiles (as currently in m-c) enabled.  Most of the CPU gain came from the tiling.
I guess my overall takeaways so far are:

1) When recycling make sure you pull the node furthest from the direction you are scrolling toward.
2) Detach the recycled node and then reattach it to avoid layerization issues.
3) Enable tiling.

Anything else I should try?
Thanks for all that information! It's very interesting.

I don't understand why making the items permanently layerized made things worse. What you did is a good workaround to avoid giving items their own layers, but I'm not sure why it's needed.
:bkelly -- thanks for diving deep into this. I have been away from my computer tending to some family things, and did an experiment for email list scrolling, mentioned in bug 796474, comment 17.

So I need to catch up with your work in the app context here some more, as family time allows. In the meantime, sorry for asking instead of just researching:

I am curious how the list app avoids having a scrollbar in the scroll view. I have found the scrollbar creates some dropped animation frames for our card transitions (so card transitions, not scrolling) if the scrollbar is around and fading out, and curious that if I shut it off for email if that also helps the scroll case. Although I do like the sense of size it gives the user on the scroll area. In any case, just curious about the effect of with or without a scrollbar.
Hey Ben - I think your last scroll implementation may not be updating correctly. I've upstreamed your fixes to bug 975680 with a fix for this as well.

I've also uploaded the js and css files of the solution to its own repo here: https://github.com/kevingrandon/recyclist (If we go with this we can move it to mozilla-b2g). I want to add some standalone unit tests/perf tests to this repo if possible.

I'm not sure if it's correct in all cases, but I update last scroll here: https://github.com/KevinGrandon/recyclist/blob/19a63212022b9f9bac24de65630f4b6ff8b9b971/recyclist.js#L138
(In reply to Ben Kelly [:bkelly] from comment #28)
> Note, it does seem BrowserElementPanning is a bit heavy when we trigger
> touch events while swiping.  It seems there is often a touch event when
> there is a gap in the composition pipeline.  The panning stuff is now
> handled by APZC, but I guess we still need that code to handle mouse/touch
> events properly for apps?

AFAIK the only useful thing BrowserElementPanning does at this point (i.e. with APZ enabled) is:
1) Set the active element on touchstart, and reset it as appropriate
2) Handle the Gesture:DoubleTap message to do a zoom-to-rect

Item (1) on the list is probably what is showing up in your profiles, since that listens for touchstart/touchmove events. Bug 970125 is on file for moving the useful functionality of BEP.js into C++ and getting rid of it.
(In reply to Kevin Grandon :kgrandon from comment #33)
> I'm not sure if it's correct in all cases, but I update last scroll here:
> https://github.com/KevinGrandon/recyclist/blob/
> 19a63212022b9f9bac24de65630f4b6ff8b9b971/recyclist.js#L138

Thanks Kevin!  I indeed was not updating lastScrollPos.  I think that should only invalidate my results when scrolling backwards, though.  I pushed a fix to my virtual-list-demo branch.  I just updated it at the end of generate() since we already have the new scrollPos there.
(In reply to James Burke [:jrburke] from comment #32)
> I am curious how the list app avoids having a scrollbar in the scroll view.
> I have found the scrollbar creates some dropped animation frames for our
> card transitions (so card transitions, not scrolling) if the scrollbar is
> around and fading out, and curious that if I shut it off for email if that
> also helps the scroll case. Although I do like the sense of size it gives
> the user on the scroll area. In any case, just curious about the effect of
> with or without a scrollbar.

So maybe because the body is defaulting to overflow:visible instead of overflow:scroll?  I can try playing around with it.

It would be nice to have the scroll bar if we can set it such that it understands the full height of the list.  I think the shrinking/growing behavior is probably not intuitive to the average user.

I guess I would recommend trying to get it to scroll smoothly without the scroll bar first and then add it back in once we know we have enough cycles for that.
(In reply to Ben Kelly [:bkelly] from comment #35)
> 
> Thanks Kevin!  I indeed was not updating lastScrollPos.  I think that should
> only invalidate my results when scrolling backwards, though.  I pushed a fix
> to my virtual-list-demo branch.  I just updated it at the end of generate()
> since we already have the new scrollPos there.

I'm not sure that is going to be ideal as that will mean that the setTimeout will fire after updating the last scroll position, and we will call .pop or .shift incorrectly for the async generation. I think it needs to occur after the async generate.
(In reply to Kevin Grandon :kgrandon from comment #37)
> I'm not sure that is going to be ideal as that will mean that the setTimeout
> will fire after updating the last scroll position, and we will call .pop or
> .shift incorrectly for the async generation. I think it needs to occur after
> the async generate.

It will be set at the end of the sync generate() call in that case.  Since we get the new position in each generate() call, we want to update lastScrollPos when we leave generate().
So I'm having difficulty getting this demo to work with a scrollable div.  The absolute positioned items remain positioned where they are while the div scrolls underneath them.

I'm doing something like this:

  <div id="outer"><div id="inner"></div></div>

  #outer {
    height: 480px;
    width: 100%;
    overflow: auto;
    background-color: #fff;
  }
  #inner {
    height: 100%;
    overflow: hidden;
  }

I updated the list code to append to the scrolledChild instead of the body directly.

Can anyone with more CSS-foo point me in the right direction?
And with these app javascript settings:

  var scrolledChild = document.getElementById("inner");
  var scrollEventNode = document.getElementById("outer");
  function getScrollPos() { return scrollEventNode.scrollTop; }
  function getScrollPortHeight() { return scrollEventNode.clientHeight; }
Hey Ben - I also made an example app within an element, here: https://github.com/KevinGrandon/recyclist/tree/master/example

It's possible to add headers/footers there. Do you just want to use that app instead? I haven't done a diff of the code, but I did change a bunch of stuff to get it to work with nested elements.
I potentially just broke that demo - fixing now.
Thanks Kevin, but I figured out I needed a position:relative on the inner div right when you replied.  Also, I want to keep this minimal demo app as close as possible so the profiles can be more easily compared.

I pushed a change to the demo app that puts the list into a scrollable div and shows the scrollbar:

  https://github.com/wanderview/gaia/commit/54979ed91eb5a977012d7de83b28a780150ed6f9

I left the CSS to hide the scrollbar commented out.

Here is a profile:

  http://people.mozilla.org/~bgirard/cleopatra/#report=c092d70a6b9338f76eb6dc243242afcd9802b56e

The scrollbar doesn't seem to hurt that much.  It appears to get its own, very small layer.

James, if you enable layer borders in the developer menu, do you see one large layer for the list and a tiny, separate layer for your scrollbar?  Or do you have more layers going on?
Flags: needinfo?(jrburke)
I would also mention it seems really easy to accidentally render the entire list without realizing it.  In the library version we should probably add some safety belts that throw if you initialize with a zero template.clientHeight or an obscenely large scrollPortHeight.
(In reply to Ben Kelly [:bkelly] from comment #43)
> Thanks Kevin, but I figured out I needed a position:relative on the inner
> div right when you replied.  Also, I want to keep this minimal demo app as
> close as possible so the profiles can be more easily compared.

Sure thing, just keep posting the diffs in here so I can keep that repo updated as necessary. Thanks!
:bkelly: I just see a tiny separate layer for the scrollbar, thanks for setting me straight on it, does not seem to be a primary factor for this kind of scrolling.
Flags: needinfo?(jrburke)
Because of caffeine-induced insomnia, I have another change:

  https://github.com/wanderview/gaia/commit/8ab40a9d273395731cfb8b8674de35626754919b

This adds a bit of complexity but reduces the number of nodes we need basically down to just the viewport size.

The idea is to switch to an active mode on the first scroll event.  Then every 5th animation frame we call generateItems(1).  Because we are working off of animation frames, we can guarantee the next page of nodes is rendered in time.

Right now it's a constant to do work every 5 frames, but in theory we could calculate it if we could read the max velocity pref: 480 px/screen / [ 16 ms/frame * 6 px/ms ] = 5 frames/screen

Because we are retrieving the scrollPos in generateItems() for our other work anyway, we can detect the end of the scroll manually without scroll events.  So this is independent of scroll event frequency, which I think is nice.
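
The frame-budget arithmetic and the every-Nth-frame driver can be sketched like this (makeFrameDriver is an illustrative wrapper, not the demo's code):

```javascript
// Derive how many 60fps frames it takes to scroll one screenful at the
// assumed max velocity, then generate a screenful of items on every Nth
// animation frame. The constants match the comment above; illustrative only.
var SCREEN_HEIGHT = 480;  // px per screen on the buri
var FRAME_MS = 16;        // ms per frame at 60fps
var MAX_VELOCITY = 6;     // px/ms, the assumed scroll velocity cap
var FRAMES_PER_SCREEN =
  Math.floor(SCREEN_HEIGHT / (FRAME_MS * MAX_VELOCITY)); // = 5

function makeFrameDriver(generateItems) {
  var frame = 0;
  return function onFrame() {
    // Call this from requestAnimationFrame; every Nth frame, generate
    // one screenful so new items are always rendered in time.
    if (++frame % FRAMES_PER_SCREEN === 0) {
      generateItems(1);
    }
  };
}
```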

CPU seems slightly better; maybe 5% or so.  Here's a profile:

  http://people.mozilla.org/~bgirard/cleopatra/#report=179549a862714693441503a26ab69ff31332d180

I guess a downside of this is that we might trigger some extra paints by calling requestAnimationFrame(), but it seems we will be painting while scrolling anyway.

Thoughts?
(In reply to Kevin Grandon :kgrandon from comment #45)
> (In reply to Ben Kelly [:bkelly] from comment #43)
> > Thanks Kevin, but I figured out I needed a position:relative on the inner
> > div right when you replied.  Also, I want to keep this minimal demo app as
> > close as possible so the profiles can be more easily compared.
> 
> Sure thing, just keep posting the diffs in here so I can keep that repo
> updated as necessary. Thanks!

I think one thing we should investigate is how to handle dynamic updates to the list.  It probably makes most sense to do that in the library version.

For example, new items appended when already scrolled to the bottom.  Do we keep our current position and append off screen?  Or do we scroll so we are still at the bottom?

More importantly, how does an app tell the library that an item was inserted into the viewable region or a viewable item changed?  And how does the library handle that?

I think those are probably solvable by clearing itemsInDOM and re-generating, but I wonder how big a penalty we will pay.
Dynamic items are a tough situation; I think generally this needs to be handled by the app, though. I was doing some prototyping and will upload the code soon, but I was only updating the current item count of the list library, and the app code was just in charge of rendering items at an index. So on edit you would need to:

1 - Update local working dataset after edit.
2 - Call recyclist.fix()
3 - Render items at new indexes.

This will be tough for contacts if, for example, we save a contact and need to shift a large dataset.  (Unless the sorting algorithm is easy and we can just Array.splice() where needed.)  One hacky solution is to reload the entire list, but that sucks.
(In reply to Ben Kelly [:bkelly] from comment #48)
> I think one thing we should investigate is how to handle dynamic updates to
> the list.  It probably makes most sense to do that in the library version.

The e-mail app wants this functionality for sure.  See https://bugzilla.mozilla.org/show_bug.cgi?id=796474#c7 for specific details.
 
> For example, new items appended when already scrolled to the bottom.  Do we
> keep our current position and append off screen?  Or do we scroll so we are
> still at the bottom?

In the e-mail app case, there's a concept of "the user is looking at 'now' and wants to see new messages that come in."  Now is at the top of the list for us.  For the contacts case that I think is currently driving this, I don't see why you'd latch to the bottom.  It seems reasonable to pick a single contact as the stable point; in e-mail this is currently the top-most displayed message but we've since gained a concept of the focused message because of our next/previous functionality, so this may change a little bit.

> More importantly, how does an app tell the library that an item was inserted
> into the viewable region or a viewable item changed?  And how does the
> library handle that?

The e-mail app's mechanism is based on Array.splice-style notifications.  It gets a notification like (index, how many to delete, the list of items to add at that index.)  The single protocol covers addition and deletion.  Updates are handled separately, but all are currently coalesced/batched.
 
> I think those are probably solvable by clearing itemsInDOM and
> re-generating, but I wonder how big a penalty we will pay.

Constantly regenerating seems like it could get expensive.  In the e-mail app case with a server that's not great at batching/using Nagle, it's quite feasible for us to have results stream in one-by-one, making constant regeneration very inefficient.
(In reply to Andrew Sutherland (:asuth) from comment #50)
> and wants to see new messages that come in."  Now is at the top of the list
> for us. 
...
> It seems reasonable to pick a
> single contact as the stable point; in e-mail this is currently the top-most
> displayed message but we've since gained a concept of the focused message
> because of our next/previous functionality, so this may change a little bit.

Clarification: I mean top-most currently visible message, not in the entire conceptual message folder.
Yea, I think how to handle changes in the absolute ordering is up to the app.  I just meant we need a way for the app to tell Recyclist to regenerate affected items.

I think the array splice is a good metaphor and we probably want a similar API.  Something like:

  Recyclist.splice({
    index: 1234,
    oldCount: 10,
    newCount: 20,
    autoScroll: false
  });

This would let Recyclist only call populate() again for affected items.  I think this should be fairly straightforward since it has its itemsInDOM array.

Going back to the app side of the problem, I still think the easiest way to deal with absolute positioning is to create an index in IDB.  We can then directly map the populate() index to an item by using IDBCursor.advance().  For scrolling we would pretty much be grabbing the next item every time.  I just don't know if it's fast enough for our maximum scroll velocity and limited CPU budget.

I guess I should prototype a simple IDB solution here to see how fast it is.  Also, it probably suggests a change to the populate() API since we need an async way to populate.  (I don't see any reason to render a placeholder synchronously as it's basically wasting cycles and is equivalent to checkerboarding anyway.)
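
A rough sketch of that mapping, assuming a hypothetical 'contacts' store with a 'byName' index, and using IDBCursor.advance() (the spec's method for skipping a cursor ahead):

```javascript
// Map a populate() index onto an IDB index via a cursor: open a cursor on
// the sorted index, advance() straight to the requested position, then hand
// the record to the callback. Store/index names are hypothetical.
function fetchItemAt(db, index, callback) {
  var store = db.transaction('contacts', 'readonly').objectStore('contacts');
  var request = store.index('byName').openCursor();
  var skipped = index === 0; // advance(0) is invalid, so skip the jump
  request.onsuccess = function (evt) {
    var cursor = evt.target.result;
    if (!cursor) {
      callback(null); // ran past the end of the list
      return;
    }
    if (!skipped) {
      skipped = true;
      cursor.advance(index); // jump ahead; onsuccess fires again at the item
      return;
    }
    callback(cursor.value);
  };
}
```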
(In reply to Andrew Sutherland (:asuth) from comment #50)
> In the e-mail app case, there's a concept of "the user is looking at 'now'
> and wants to see new messagess that come in."  Now is at the top of the list
> for us.  For the contacts case that I think is currently driving this, I
> don't see why you'd latch to the bottom.  It seems reasonable to pick a
> single contact as the stable point; in e-mail this is currently the top-most
> displayed message but we've since gained a concept of the focused message
> because of our next/previous functionality, so this may change a little bit.

In the SMS app we do something similar to the e-mail behavior: in the threads list we stick to the top so if the user is already at the top we scroll in a new thread, if it's somewhere else in the list we don't autoscroll to the top as that would be confusing. In the messages view we do exactly the same but sticking to the bottom where the newest messages are.
I have increased the complexity further to support async preparation of items.  I chose to incorporate the async work into the framework instead of letting the app do it, so that we can avoid adding nodes at all if they scroll out of the displayport before the async prep work completes.

  https://github.com/wanderview/gaia/commit/9eb84c125cf4ba1f946906ea73cc20fdc3649bab

The display port multiplier is now essentially a tunable that indicates how slow you think your async prepare is.  If you have a slow preparation step then you need to increase the multiplier.  If you have everything immediately in memory you can shrink the multiplier quite low.

Here's a profile:

  http://people.mozilla.org/~bgirard/cleopatra/#report=91799a05276e72b06beb7d1f038dd55221c95c18
I have a naive IDB implementation in the app side which uses objectStore.get().  Even with a multiplier of around 4 I get some checkerboarding at max velocity.  I think we may need the tiling fix in order to have enough CPU.  I'll see if maintaining a cursor helps.
Another feature/use-case I realize e-mail wants for recyclist: hints/support for being smart about doing extra work for displayed messages.

The email app fetches snippets for messages on demand based on what's visible in the window.  It's not particularly smart about avoiding generating device load while scrolling is active.  It generates a (numerically capped) request for the set of currently visible messages.

Also, e-mail currently resolves mozContacts before inserting things into the DOM.  While we have a caching layer, when misses happen we hold up the actual DOM insertions until the lookups complete.  The rationale for the hold-up was to minimize visual flashing and gratuitous DOM reflows.  However, if we knew to keep our actions minimal during scrolling, we could defer new lookups until we get the all-clear.


I think there's 2 specific feature requests here:
- Expose the startIndex/endIndex range for what is currently visible
- Explicitly internalize heuristics/knobs to tell the app when it's okay to expend extra effort on what's visible and maybe where to spend the effort.  For example, it seems reasonable to wait for scrolling to quiesce before triggering snippet fetches.  While the app itself can do this, it's something multiple apps could potentially need to do and screw-up when they do it.


A related feature need is to have a counterpart to 'generate' like 'destroy'.  The e-mail app message objects need to be explicitly told to die because we track all live e-mail addresses so we can update display names as mozContacts get added/changed.
Andrew, I think it would be easy to provide an interface like this:

  recyclist.isScrolling attribute that is true/false
  recyclist.onstatechange = function () {
    if (recyclist.isScrolling) {
      // easy there tiger...
    } else {
      // let's burn some CPU!
    }
  };

The app should already be able to tell what's rendered from the populateItem() calls by adding an item.dataset.appSpecificId.  If the item you are populating already has an appSpecificId set, then you know that particular object just got de-rendered.
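A sketch of that dataset-based tracking (the record shape and rendering line are app-specific stand-ins; the items here are plain objects standing in for real DOM nodes):

```javascript
// Sketch: track which app object each recycled node currently shows via
// dataset.appSpecificId.  When populateItem() hands us a node that already
// carries an id, that object just got de-rendered and can be dropped from
// the rendered set.
var renderedIds = new Set();

function populateItem(item, record) {
  var oldId = item.dataset.appSpecificId;
  if (oldId !== undefined) {
    renderedIds.delete(oldId);     // previous occupant is no longer rendered
  }
  item.dataset.appSpecificId = record.id;
  item.textContent = record.title; // app-specific rendering goes here
  renderedIds.add(record.id);
  return true;                     // signal success to the framework
}
```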

Andrew, is that adequate for your needs?  I think it would be best to keep the other internal state like start and end indices private.
Flags: needinfo?(bugmail)
We could also potentially follow the pattern of requestAnimationFrame(), maybe something like recyclist.requestIdleFrame(fn);
(In reply to Kevin Grandon :kgrandon from comment #58)
> We could also potentially follow the pattern of requestAnimationFrame(),
> maybe something like recyclist.requestIdleFrame(fn);

Interesting idea, but I think it might raise some more questions for Recyclist.  Like how often are "idle" frames issued, can we call out to more than one handler in an idle frame, etc?  It also may suggest more knowledge than Recyclist really has about the system.

It might be a cool platform feature, though. :-)

Exposing scrolling/not scrolling on the other hand should be fairly clear.  It's also straightforward as Recyclist already needs to know this.

Just my two cents.
We can always name it something else, but I assumed that it would be nice to try to decouple logic from within a callback. We probably don't want to be polling anything, and the callback probably shouldn't need to check that we're in any specific state. We could potentially pass in scroll state to a callback if desired.

Anyway, there's a lot of different ways we can do this, and I'm not tied to any particular solution.
isScrolling and onstatechange cover the core throttling use-case in a nice simple way.

start/end-index wise, I should re-state.  We want to know what items are currently visible.  Inferring that from what has been requested to be prepared sounds error prone, especially if the display port buffering starts biasing based on apparent scroll direction.

The snippet fetching pseudo-code would be:

  var activeSnippetFetch = false;
  function maybeFillSnippets() {
    if (recyclist.isScrolling || activeSnippetFetch) {
      return;
    }
    activeSnippetFetch = true;
    viewSlice.maybeRequestBodies(
      recyclist.firstVisibleIndex, recyclist.lastVisibleIndex, ourOpts,
      function fetchDoneOrDead(err) {
         activeSnippetFetch = false;
         maybeFillSnippets();
      });
  } 
  recyclist.onstatechange = maybeFillSnippets;


recyclist.requestIdleFrame sounds like a convenience mash-up of isScrolling and requestAnimationFrame where it only fires when isScrolling is false and we get the requestAnimationFrame poke.  This sounds like it would be most aligned with CPU-bound activities, like if e-mail used canvas to dynamically generate gravatars itself.  But I think we are expecting to block on I/O or interprocess comms most of the time.  If gallery has trouble overwhelming the system with its images / thumbnailing, that seems like a good candidate.
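For concreteness, such a mash-up could be built on top of isScrolling with a few lines (this is a hypothetical API; `raf` is injected so the sketch does not depend on a browser's requestAnimationFrame):

```javascript
// Sketch of a requestIdleFrame built from isScrolling + a frame source.
// The callback only runs on a frame tick that arrives while no scroll is
// in progress; while scrolling continues, it keeps re-arming itself.
function makeIdleFrameScheduler(recyclist, raf) {
  return function requestIdleFrame(fn) {
    raf(function tick() {
      if (recyclist.isScrolling) {
        raf(tick);   // still scrolling: try again on the next frame
      } else {
        fn();        // quiesced: safe to burn some CPU
      }
    });
  };
}
```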
Flags: needinfo?(bugmail)
Well, I meant to base it on populateItem() not prepareItem().  If populateItem() is called and returns true, indicating success, then it's guaranteed to be in the DOM.

I don't want to expose the concept of start/stop because then that forces us to maintain that concept.  It seems like we may want to have the flexibility to do more creative things in the list in the future if necessary.

If you base your logic on populateItem() then the app will stay up-to-date even if the underlying heuristics change.

I suppose we could expose an isIndexInDOM() call cleanly.  Not sure if that would help for what you are planning.

In terms of scrolling direction, I think I'm going to add that to generateItem() as a hint.  Not sure yet what that is going to look like.  I think it will be needed for the apps to load data more efficiently from their back end.

I don't think we want to base any state API explicitly on requestAnimationFrame as I believe that might trigger a paint even if one is not needed.

Does any of that help or am I just being stubborn now? :-)
Flags: needinfo?(bugmail)
Or are you saying you want what's just in the viewport vs rendered to the DOM?
(In reply to Ben Kelly [:bkelly] from comment #63)
> Or are you saying you want what's just in the viewport vs rendered to the
> DOM?

In which case I agree.  The indices visible in the viewport are an immutable concept and could be exposed cleanly as you suggest.  It's just not the same as the startIndex/endIndex from the original demo.  Sorry for my confusion!
Yes, viewport.
Flags: needinfo?(bugmail)
Made a few tiny improvements to the recyclist library which you may want to implement in your branch. Going to continue to look for other optimizations in the code.

1 - Changed the array literal to an object literal. We were using delete, which leaves a hole in the array without shrinking its length, so every operation was working on a larger set as you scrolled.

https://github.com/KevinGrandon/recyclist/commit/023ad4f1fa1506aaa38050f8203bfd1b54d7fbe1
http://jsperf.com/object-vs-array-delete-operations
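The difference is easy to demonstrate: delete on an array leaves a hole but keeps the length (so index-based loops keep visiting the dead slots), while deleting an object key actually shrinks the key set:

```javascript
// delete on an array leaves an empty slot; length and loop bounds stay the same
var arr = ['a', 'b', 'c'];
delete arr[1];
// arr is now ['a', <empty>, 'c'], arr.length is still 3, (1 in arr) is false

// delete on an object removes the key entirely
var obj = { 0: 'a', 1: 'b', 2: 'c' };
delete obj[1];
// Object.keys(obj) is now ['0', '2']
```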


2 - Implemented a faster Math.floor and Math.ceil using bitwise operations. This was frowned upon in the gaia meeting, but inside a critical loop and with a comment I think this is fine. This should also be safe as we are only dealing with positive numbers (array indexes).

https://github.com/KevinGrandon/recyclist/commit/bed8fbf4311e77671741d7d3018790c39906269e
http://jsperf.com/math-floor-and-math-ceil-vs-bitwise-operations
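For reference, the usual bitwise forms look like this (function names are mine, not from the commit; `| 0` truncates toward zero and coerces to a 32-bit integer, so this is only safe for non-negative numbers below 2^31, which holds for array indexes):

```javascript
// Bitwise floor/ceil for non-negative numbers within 32-bit integer range.
function fastFloor(x) {
  return x | 0;               // truncation === floor for x >= 0
}
function fastCeil(x) {
  var t = x | 0;
  return t === x ? t : t + 1; // bump up unless x was already an integer
}
```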
Made another optimization. We were calling generate() twice for each scroll event, when it should only be necessary to call it once with the larger window. The smaller generate() should not be needed as viewport items should already be populated by a previous call. This should avoid extra recyclableItems.sort() calls and iterations. I moved the initial sync generate() into the init method.

https://github.com/KevinGrandon/recyclist/commit/e9f8a40ef27e3c1e24c27939fc8ea7dd425d774e
Hmm, I think it's a win to avoid scroll events altogether using the requestAnimationFrame() method I described in comment 47.  We don't need to generate as many DOM nodes.
(In reply to Ben Kelly [:bkelly] from comment #68)
> Hmm, I think its a win to avoid scroll events altogether using the
> requestAnimationFrame() method I described in comment 47.  We don't need to
> generate as many DOM nodes.

Ah right - I haven't upstreamed your latest changes yet. Let's test with both and see what's better :)
So I feel I've gone as far as makes sense with the demo.  We can avoid checkerboarding, but it is still a bit CPU intensive.

Kevin, do you want to run with this now to turn it into the library version?  Or do that in a separate bug?
(In reply to Ben Kelly [:bkelly] from comment #70)
> So I feel I've gone as far as makes sense with the demo.  We can avoid
> checkerboarding, but it is still a bit CPU intensive.
> 
> Kevin, do you want to run with this now to turn it into the library version?
> Or do that in a separate bug?

So I tried implementing the requestAnimationFrame, but I'm still seeing checkerboarding: https://github.com/KevinGrandon/recyclist/pull/1?w=1

I might be missing some patches though, so I will dig through everything in this bug again in the next few days.

Just to verify - you do not have any custom gecko patches implemented?
No gecko patches as far as I recall.  Just m-c the last time I tested this.  I was using a buri.  What device are you on?
I have a gecko from yesterday, and unfortunately don't have my buri with me. I have tested my patched version on a helix, peak, and nexus4 - checkerboarding is present on all of them (less so on the nexus). I'll try your branch as well though.
function fixupItems() {
  // Synchronously generate all the items that are immediately or nearly visible
  generateItems(1);
  // Asynchronously generate the other items for the displayport
  setTimeout(function() {
    generateItems(4);
  }, 0);
}

IMO this is really wrong:

* If you want to split something across frames then you probably want requestAnimationFrame instead of setTimeout. On the first callback you generate what is required for the frame, on the second you can start expanding out.

* Second, and most important, a multiplier of 1x is just going to cover the viewport. Between the viewport and the displayport we're going to render the background without the elements populated, send that off to the GPU, and have it potentially visible during async scrolling. What we need to do is synchronously fill the full -displayport- so that we don't invalidate any content that is already drawn. Then if we receive a requestAnimationFrame after that without a scroll event we can treat it as idle time and grow the list outside the displayport.
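A sketch of that frame policy (the multiplier constants and function names here are my own guesses, not values from the demo):

```javascript
// Sketch: on a frame where a scroll event arrived, synchronously cover the
// whole displayport so the compositor never shows unpopulated background
// during async scrolling.  On a frame with no scroll since the last one,
// treat it as idle time and grow the rendered list beyond the displayport.
var DISPLAYPORT_MULT = 3;  // guess at displayport size, in viewport heights
var IDLE_MULT = 6;         // how far to pre-render once we are idle

function onFrame(state, generateItems) {
  if (state.scrolledSinceLastFrame) {
    generateItems(DISPLAYPORT_MULT);
    state.scrolledSinceLastFrame = false;
  } else {
    generateItems(IDLE_MULT);
  }
}
```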
Flags: needinfo?(roc)
I agree, which is why I suggested we add an API to tell Web content what the displayport is.
Flags: needinfo?(roc)
Alright, but in the meantime x3 of the viewport (maybe even x4) is a better guess.
I think we have figured out the main issue affecting the demo.  I'm closing this so we can focus on our app-specific bugs.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → 1.4 S3 (14mar)