Closed Bug 865750 Opened 7 years ago Closed 7 years ago

Support groups of nodes in visibility_monitor.js

Categories

(Firefox OS Graveyard :: Gaia, defect)

defect
Not set

Tracking

(b2g18 wontfix, b2g-v1.1hd wontfix)

RESOLVED FIXED
Tracking Status
b2g18 --- wontfix
b2g-v1.1hd --- wontfix

People

(Reporter: vingtetun, Assigned: eshapiro)

References

Details

(Keywords: perf, Whiteboard: [c= p=3 s= u=])

Attachments

(4 files)

visibility_monitor.js is written expecting a list of siblings. Having it working correctly with a list containing groups of nodes would be awesome and would help a lot of use cases (call log, contacts, sms, maybe others?).
Not sure how this was missed, it's blocking bug 865741 which is leo+ so setting this as leo+ as well.
Assignee: nobody → hub
Status: NEW → ASSIGNED
blocking-b2g: --- → leo+
Whiteboard: c=
Whiteboard: c= → c= p=3
Keywords: perf
Whiteboard: c= p=3 → c= p=3 ,
Hubert,

The code in visibility_montor.js is kind of hairy. Let me know if you'd like help with this. And please ask me for review.

Vivien: are you still hoping to use a modified visibility_monitor in v1.1?  Is this really something to block on?

Could you explain what you need in more detail?  What do you mean by "a list containing groups of nodes"? Is the issue that the items in the list have variable height?  Are you asking that it be modified to monitor all descendants of the list or just to monitor direct children of the list but handle the case where the children have different sizes.  The latter should be pretty easy, but I worry about the performance implications of the former.
Flags: needinfo?(21)
(In reply to David Flanagan [:djf] from comment #2)
> Vivien: are you still hoping to use a modified visibility_monitor in v1.1? 

I do :)

> Is this really something to block on?

That would make a difference imho.

> 
> Could you explain what you need in more detail?  What do you mean by "a list
> containing groups of nodes"? Is the issue that the items in the list have
> variable height?  Are you asking that it be modified to monitor all
> descendants of the list or just to monitor direct children of the list but
> handle the case where the children have different sizes.  The latter should
> be pretty easy, but I worry about the performance implications of the former.

This is a dom hierarchy story. afaict visbility_monitor.js expect a list to be formatted as:
 <list>
  <row id="1"></row>
  <row id="2"></row>
  <row id="3"></row>
  <row id="4"></row>
  <row id="5"></row>
  ...
 </list>

But in the case of the call loo / contacts  / sms / settings / (not sure about music) the dom hierarchy is based on grouping such as:

<list>
 <container id="today">
  <row id="1"></row>
  <row id="2"></row>
  <row id="3"></row>
  ...
 </container>
 <container id="yesterday">
  <row id="4"></row>
   ...
 </container>
 <container id="2 days ago">
  ...
 </container>
</list>


The issue with with such a hierarchy and visibility_monitor.js is that it does not scale since the show/hide callbacks will be called once a group become visible/hidden. Then if one group contains hundreds of rows we end up with the same issue as before.

Is it clearer?

One potential solution could be to fix those apps and to dump the dom as a suite of dom nodes but since this grouping mechanism as been implemented in different apps by different developers it should be because people find it more natural for some kinds of lists.

What do you think?
Flags: needinfo?(21)
Actually, all the apps are using that nested structure because is defined by the Building Blocks in that way.
Taking this from Hub, with his permission.
Assignee: hub → dflanagan
Vivien: your explanation in comment #3 is very clear. Thank you. What we have now for Gallery is monitorChildVisiblity(). But what most other apps want is monitorGrandchildVisibility().

I think I'll try to handle only the specific use-case of one level of nesting because I suspect that a more general solution is hard to do efficiently.

I see two main difficulties here:

1) I'll have to modify the MutationObserver to listen for subtree mutations instead of just childList mutations.  Or I'll have to listen for childList mutations on the parent list node and then obverve the childList of each container element as it is added.

2) The algorithm is heavily dependent on nextElementSibling and previousElementSibling, and those won't work for grandchildren. I could attempt to maintain my own list of all grandchildren elements. (And that could yield a generalization where I use querySelectorAll() to monitor all descendants that match a given selector.) But it seems heavyweight to duplicate the DOM's lists of elements with an external array of elements. So I think I'll need to refactor so that e.nextElementSibling becomes nextSibling(e).  Possibly I can make these new nextSibling() and previousSibling() functions work for either children or grandchildren.
(In reply to David Flanagan [:djf] from comment #6)
> Vivien: your explanation in comment #3 is very clear. Thank you. What we
> have now for Gallery is monitorChildVisiblity(). But what most other apps
> want is monitorGrandchildVisibility().
> 
> I think I'll try to handle only the specific use-case of one level of
> nesting because I suspect that a more general solution is hard to do
> efficiently.
> 
> I see two main difficulties here:
> 
> 1) I'll have to modify the MutationObserver to listen for subtree mutations
> instead of just childList mutations.  Or I'll have to listen for childList
> mutations on the parent list node and then obverve the childList of each
> container element as it is added.
> 
> 2) The algorithm is heavily dependent on nextElementSibling and
> previousElementSibling, and those won't work for grandchildren. I could
> attempt to maintain my own list of all grandchildren elements. (And that
> could yield a generalization where I use querySelectorAll() to monitor all
> descendants that match a given selector.) But it seems heavyweight to
> duplicate the DOM's lists of elements with an external array of elements. So
> I think I'll need to refactor so that e.nextElementSibling becomes
> nextSibling(e).  Possibly I can make these new nextSibling() and
> previousSibling() functions work for either children or grandchildren.

Could be a stupid idea. But can't we use a live node list (like a NodeList returned by container.getElementsByTagName) and access the previous/next element from here? That would avoid a duplicate of the DOM's list of elements.

The main issue is that NodeList are accessed by an integral index and there is no clean way to get the index of an element without iterating though the list but I don't know how much an issue it will be ?
I'd forgotten about getElementsByTagName as an option. That would be a nice way to agnostic about whether the element were children, grandchildren, or some other level.  I'll think about that.  It might require more refactoring to convert the algorithm based on linked lists to one based on a NodeList, though.
Evan,

I'm going to be tied up for a couple of days on media-related bugs at the San Diego work week, so I'm passing this bug off to you, since you've been working on test cases for it.

The attachment is the code I've written so far. It is completely untested--I've never even tried to run it, so it probably has typos and syntax errors.  

There's new code at the bottom to generalize the basic first/next/previous operations to work with children or grandchildren of the scrolling element.

There is also a bunch of new code in the mutationHandler() function. I've tried to simplify and generalize that code at the expense of sometimes calling the onscreen callback more often than necessary.  I'm much less confident in that code, so look it over carefully, please.

A number of perf improvements in the communcation apps are depending on a fix here, so see what you can do.  Let me know if you have questions about anything.
Assignee: dflanagan → eshapiro
In addition to the attached patch, you can also see my work in progress in this branch of my repo: https://github.com/davidflanagan/gaia/tree/bug865750.  That branch has separate commits which might make it a little bit simpler to understand the changes I've already made.
Evan has taken this bug and has a patch just about ready for review. I'm not going to be able to review for a couple of days.  Its a complete rewrite of the visibility monitor code, and includes tests, but still needs a careful review.

Vivien or Ben, are either of you available to review it?  I'm going to be tied up with last minute keyboard auto-correct stuff next week.
Flags: needinfo?(bkelly)
Flags: needinfo?(21)
A rewrite of visibility_monitor.js able to monitor nodes at any depth, not just direct children.
Attachment #756850 - Flags: review?(21)
Hi Evan. I could't make it work with a list markup, from the Building blocks.

At the other hand, I agree with Vivien that a solution using getElementsByTagName would have a much simpler implementation (not sure about performance).

I created a quick (and probably buggy) implementation here
https://github.com/albertopq/gaia/commit/a8724a23d1bab315ccc428bd142fd00b9561d8eb

Thoughts?

Thanks!
Could you send me what you have that didn't work? I tested quite heavily, and would be surprised if there was anything besides a small issue causing problems.
I got an error on getDistance method, just taking the contacts app and initialising  the monitor as:

https://github.com/albertopq/gaia/commit/a8724a23d1bab315ccc428bd142fd00b9561d8eb#L1R376

Let me know if I can help any further.

Thanks!
Were you calling my version with the following?
    monitorMultilevelChildVisibility(scrollable, 500, 300, 'li', onScreen, offScreen)

The version I submitted has a slightly different API, so it isn't too surprising calling it this way wouldn't work!

To be notified of all changes to children and grandchildren:
    monitorMultilevelChildVisibility(scrollable, 500, 300, 2, onScreen, offScreen)

To be notified of all changes to grandchildren that are li elements:
    monitorGrandchildWithTagVisibility(scrollable, 500, 300, 'li', onScreen, offScreen);

Let me know if that works
Alberto, I think a getElementsByTagName version would risk bad performance. Iterating over live node lists looks like a linear time operation but it is not always.
Flags: needinfo?(alberto.pastor)
Hi Evan,

Sorry, I meant monitorGrandchildWithTagVisibility. I still get this error if I initialise the monitor when the container is empty:

E/GeckoConsole( 5538): [JavaScript Error: "TypeError: curr is null" {file: "app://communications.gaiamobile.org/shared/js/multilevel_visibility_monitor.js" line: 652}]

If I do it when I already rendered 1 chunk, I don't get the error, but onScreen nor offScreen callbacks are never called.

Did you try it with the markup of the building blocks? Did it work for you? I might be not using the monitor correctly... Could you please try to use it in the Contacts app, for exmple?

@djf, yes you might be right. I just was talking about what was the simpler implementation, but not sure about performance :(. Mutations observing are not that cheap either, are they? We might need some performance tests on this. I think we could also stop iterating in some cases we know there are not more changes.
Flags: needinfo?(alberto.pastor)
Huh, okay, I'll see what's going on. I'll have time and I'll get back to you later today.
I haven't been able to reproduce the 'curr is null' error, but I think I know why you aren't getting onscreen or offscreen callbacks, and I have that working on my end.

What you need to do, is instead of using:
    monitorGrandchildWithTagVisibility(scrollable, 500, 300, 'li', onScreen, offScreen);

use:
    monitorChildWithTagVisibility(scrollable, 500, 300, 4, 'li', onScreen, offScreen);

For performance reasons, it is necessary to specify a maximum depth to monitor nodes. With Grandchildren, the monitor looks 2 levels down. For contacts, it appears as though the list items are 4 levels down, so a maxDepth of 4 must be specified for the list items to be found.

I hope that helps! And while I wasn't able to reproduce the 'curr is null' error, if you let me know the exact steps, I'd be glad to try and reproduce the error and fix it.
I have a rough proof of concept integration of multilevel_visibility_monitor with contacts working here:

  https://github.com/wanderview/gaia/tree/contacts-vis-mon

This is largely based on Vivien's original patch for bug 865747.

Note, I am seeing some severe jank with visibility monitor integrated using the reference-workload-heavy data on a Buri.  General sequence of events:

  1) With some debug statements I can see that it renders the first batch of visible contacts cleanly up to Alan Phair.
  2) Wait for app lazy load to reach steady state.
  3) Slowly scroll down. (Do not flick.)
  4) After Abby Lynch enters the bottom of the screen the UI completely freezes.
  5) Freeze remains for 10's of seconds.
  6) Debug statements show that onscreen has suddenly been called for contacts all the way down to Kathleen Sabb.  This is around 500 contacts or half the DB.

Based on this behavior my gut feeling is that the multilevel_visibility_monitor is getting confused some how when the first new contact becomes visible.  Its also likely I've just done something wrong.  Unfortunately I ran out of time to investigate this evening.
Flags: needinfo?(bkelly)
I'm facing the same problems :bkelly pointed out above. Note also that I didn't receive any offScreen callback.

At the other hand, as we are adding/removing elements to the dom in the onScreen/offScreen callbacks, the Mutation observer triggers again. 

I think the most efficient idea would be Evan trying to integrate the visibility monitor in contacts, using :bkelly first approach, and see what's going on.

Evan, what do you think?
Flags: needinfo?(eshapiro)
Minor issue in attachment 756850 [details]:

The monitor handle is not getting returned for all cases.  I patched this in my tree here:

  https://github.com/wanderview/gaia/commit/0caa2d1b0cf7cb7925fc847260d6eb0efe26a598
(In reply to Alberto Pastor [:albertopq] from comment #23)
> At the other hand, as we are adding/removing elements to the dom in the
> onScreen/offScreen callbacks, the Mutation observer triggers again. 

This was spot on.  I don't have a good explanation for why, but modifying the elements in our onscreen() was triggering the severe jank and huge number of onscreen() callbacks.  This is a bit unexpected since our DOM changes should not have changed the size of any list items.

To deal with this situation I added a monitor.ignore() function.  For example:

  monitor.ignore(function() {
    // DOM changes in here do not trigger mutation events
  });

See:

  https://github.com/wanderview/gaia/commit/a96850f438dc6175498c63b73017d277f72af557

With monitor.ignore() we now seem to get the appropriate callbacks and the severe jank is gone.  We do still have some minor jank during scrolling, but it seems likely that is more of an app rendering issue.
(In reply to Ben Kelly [:bkelly] from comment #25)
> With monitor.ignore() we now seem to get the appropriate callbacks and the
> severe jank is gone.

Correction.  We are getting all the onscreen callbacks, but we seem to be missing some offscreen callbacks.

For example, scrolling around I get offscreen() for Alisha Edwards and Amanda Casella, but no callbacks for the intervening contacts (Alisha Walton, Alisha Woodward, Alison Starling, and Allison Barren).

I don't know if its relevant, but this is one offscreen callback for every six contacts.  Six contacts fit on a screen at once.
(In reply to Ben Kelly [:bkelly] from comment #26)
> Correction.  We are getting all the onscreen callbacks, but we seem to be
> missing some offscreen callbacks.

This was corrected when I changed the maxDepth setting to 3 from 4.  So:

    monitor = monitorChildWithTagVisibility(groupsList, 600, 300, 3, 'li',
                                            onscreen, offscreen);

Is it a bug that with maxDepth=4 we get onscreen() callbacks, but not offscreen() callbacks?
From what I understand, things are working as expected with the applied patches? Soon I'll be able to take a look at what's going on.
Flags: needinfo?(eshapiro)
Okay, I think I have the contacts app integrated with the visibility monitor without issue or need for ignoring mutations:

https://github.com/es92/gaia/tree/contacts-test

The important change was I used scrollable as the container instead of groupsList:
    monitor = monitorChildWithTagVisibility(scrollable, 600, 300, 4, 'li', onscreen, offscreen);

It is necessary to provide the scrollable element as the container. The groupsList container had no knowledge that its visible height was less than its absolute height.

Let me know if this implements the expected behavior.
Flags: needinfo?(bkelly)
Awesome, thanks!

I actually just came to the same change in order to get the fixed header functionality working as well.  I'll try removing the ignore() call.

So it sounds like there is no way to detect this condition and fail fast when the monitor is created?  Would be nice just to avoid debugging time.

Thanks again.
Flags: needinfo?(bkelly)
So it worka without the monitor.ignore() as you indicated, but it does seem a bit slower when scrolling.

Would it be possible to include monitor.ignore(), or something like it, to optimize DOM changes we know won't effect visibility?
Flags: needinfo?(eshapiro)
Did you take a look at the branch I posted? While I removed the mutations.ignore, I kept batching the renderContact calls together, that seemed to improve performance a bit.

Link for convenience: https://github.com/es92/gaia/blob/contacts-test/apps/communications/contacts/js/contacts_list.js.
Flags: needinfo?(eshapiro)
(In reply to eshapiro from comment #32)
> Did you take a look at the branch I posted? While I removed the
> mutations.ignore, I kept batching the renderContact calls together, that
> seemed to improve performance a bit.

Yep, that's what I tried too and it does work fairly well.  If you scroll through the list as fast as you can by swiping it just seems a bit more responsive with monitor.ignore() in there.

I don't have numbers, though, so I could be suffering from confirmation bias.

It just seemed to me that if we could guarantee that the DOM was not changing visibility, then it might be worth avoiding the mutation events and recalculations.  Seems like every bit counts on this low end hardware.
I agree, its worth having the option. I'll add something back into the api.
I updated the pull request, let me know if that works for you.

(In reply to Ben Kelly [:bkelly] from comment #33)
> (In reply to eshapiro from comment #32)
> > Did you take a look at the branch I posted? While I removed the
> > mutations.ignore, I kept batching the renderContact calls together, that
> > seemed to improve performance a bit.
> 
> Yep, that's what I tried too and it does work fairly well.  If you scroll
> through the list as fast as you can by swiping it just seems a bit more
> responsive with monitor.ignore() in there.
> 
> I don't have numbers, though, so I could be suffering from confirmation bias.
> 
> It just seemed to me that if we could guarantee that the DOM was not
> changing visibility, then it might be worth avoiding the mutation events and
> recalculations.  Seems like every bit counts on this low end hardware.
(In reply to eshapiro from comment #35)
> I updated the pull request, let me know if that works for you.

Looks good.  Thanks!
No longer blocks: 871823
Glad you made it working I will start the review asap. Thanks for the collaboration guys :)
Flags: needinfo?(21)
While running unit tests on my integration of the multilevel_visibility_monitor with contacts app, I ran into a lot of cases where values were null.  This patch fixes or works around a lot of them and permits the tests to pass.

Its not clear to me if these are legitimate fixes or if the gaia unit tests are end up using the multilevel_visibility_monitor in an illegal way.  Thoughts?
Flags: needinfo?(eshapiro)
My thought is that either the unit tests are manipulating the contacts list in such a way that a requirement is being broken, or there is a bug somewhere in the way I implemented the mutation observer. I'm concerned that adding null checks is hiding the real issue, which could cause problems further down the line. Would you mind posting the sequence of commands causing the null values?
Flags: needinfo?(bkelly)
(In reply to eshapiro from comment #39)
> Would you mind posting the sequence of commands causing the null values?

Sure.  Basically run the communications unit tests in my branch without the attached fix:

  1) Checkout https://github.com/wanderview/gaia/tree/contacts-vis-mon
  2) Revert https://github.com/wanderview/gaia/commit/a9827687f8b44ca85d975305eb58abc9f3cdcd1c
  3) Setup and run the script referenced here:  https://developer.mozilla.org/en-US/docs/Mozilla/Firefox_OS/Platform/Testing/Gaia_unit_tests#Launch_the_server_and_Firefox_Nightly
  4) In your gaia checkout execute:  APP=communications make test-agent-test

The code for the unit tests is mostly here:

  https://github.com/wanderview/gaia/blob/master/apps/communications/contacts/test/unit/contacts_list_test.js

Unfortunately I only just started working with the tests, so I don't know a lot about their internal structure at the moment.  One thought, though, is that perhaps resetDom() in the setup is confusing the monitor.
Flags: needinfo?(bkelly)
thanks. I'll take a look this afternoon.
When I run "APP=communications/contacts make test-agent-test" I'm only getting the following 4 errors:

  1) [communications-contacts] Render contacts list Facebook Contacts List Search non-ASCII (accented characters) with ASCII results:
     TypeError: subject.resetSearch is not a function

  2) [communications-contacts] Render contacts list Facebook Contacts List Search ASCII (non-accented characters) with non-ASCII results:
     TypeError: subject.resetSearch is not a function

  3) [communications-contacts] Render contacts list Facebook Contacts List Order by lastname:
     Error: TypeError: printed is undefined (http://communications.gaiamobile.org:8080/contacts/test/unit/contacts_list_test.js?time=1371600791746:1034)
      at onerror (http://test-agent.gaiamobile.org:8080/common/vendor/mocha/mocha.js:4959)
  
  4) [communications-contacts] Render contacts list Facebook Contacts List NOT order by lastname:
     Error: TypeError: name is undefined (http://communications.gaiamobile.org:8080/contacts/test/unit/contacts_list_test.js?time=1371600791746:1064)

are those the same ones you're getting? The first two errors do not appear to be related to the visibility monitor, the second two may be, I'll look into them more probably tomorrow,
Flags: needinfo?(bkelly)
(In reply to eshapiro from comment #42)
> are those the same ones you're getting? The first two errors do not appear
> to be related to the visibility monitor, the second two may be, I'll look
> into them more probably tomorrow,

Sorry, those were due to a mis-merge when I rebased.  I pushed a fix to my contacts-vis-mon branch.

The branch should now run the tests cleanly.  If you back out my multilevel_visibility_monitor fixes in 22f1103973 then you should see the errors.
Flags: needinfo?(bkelly)
Comment on attachment 756850 [details]
link to github pull request

I'm sorry guys, I would really like to review this code and I don't want to slow down the process but it is big and hairy and I keep beeing distracted by other subjects. So I wonder if David would be a better reviewer (and more reactive) since he has implemented the first visibility_monitor.js. From a very far point of view the code looks good and go to the essential so I'm not really worried about it.

David how do you feel about reviewing this?
Attachment #756850 - Flags: review?(21) → review?(dflanagan)
Vivien,

Yes, I can take this review. I was going to do it, and then was really busy with auto-correct stuff so I suggested Evan ask you. (Also I'm afraid that Evan's code is better than mine, and I don't want my jealousy to come through in the review :-)
No longer blocks any blockers.
blocking-b2g: leo+ → -
Evan,

I've begun a review and there are lots of comments on github. Many are style nits or comment/documentation requests. Some are queries, some are suggestions for ways to do things differently. I think there are just one or two more substantive things.

I'm up to about line 430, where the callbacks stuff begins, and will try to review more tomorrow.

I know you're busy until next week, but plan to spend some time next week finishing this off, if you can.

Ben: I haven't looked at Evan's tests yet. Are you willing to take the review for those?
Flags: needinfo?(bkelly)
Vivien: did you see comment 46? Do you still think we should try to uplift this?
Flags: needinfo?(21)
(In reply to David Flanagan [:djf] from comment #47)
> Ben: I haven't looked at Evan's tests yet. Are you willing to take the
> review for those?

Sure.  I'll have it done by the end of the weekend.

Fair warning, though: I'm new to gaia and our test infrastructure, so I may not be familiar with all our conventions, best practices, etc. :-)
Flags: needinfo?(bkelly)
Attachment #756850 - Flags: review?(bkelly)
(In reply to eshapiro from comment #39)
> My thought is that either the unit tests are manipulating the contacts list
> in such a way that a requirement is being broken, or there is a bug
> somewhere in the way I implemented the mutation observer. I'm concerned that
> adding null checks is hiding the real issue, which could cause problems
> further down the line. Would you mind posting the sequence of commands
> causing the null values?

By the way, I noticed the contacts app unit tests configuration does not currently produce element sizes as you would expect.  For example, the scrollable list that takes the full screen on the device appears to only be a few pixels tall at first in the test case until nodes are added.  I think perhaps the contacts app test cases need to include some of the CSS from the real app.  Not sure if this is related to the null references or not, but thought I would mention it.
David:
Thanks for the review! As you mentioned, I have a lot to get done by tomorrow - I'll take a look early next week at both the null reference errors and your comments.

Ben:
The visibility monitor has some requirements about the css, so that's probably what's causing issues. You'll see when you look at my tests, I set some css properties in javascript so it works properly. I'm not 100% sure that's it, but it definitely seems like a good place to start. I'll fix it in the tests when I get back to this early next week.
(In reply to eshapiro from comment #51)
> Ben:
> The visibility monitor has some requirements about the css, so that's
> probably what's causing issues. You'll see when you look at my tests, I set
> some css properties in javascript so it works properly. I'm not 100% sure
> that's it, but it definitely seems like a good place to start. I'll fix it
> in the tests when I get back to this early next week.

So I think I might have isolated at least what is causing it.  It does not appear to be CSS related.  Sorry for the false trail there.

It appears the "curr is null" problems occur when a node is removed from the DOM in such a way that the corresponding MutationRecord is reported in the same event where the node was added.

For example, in the tests we eseentially do this:

      subject.load(newList);
      subject.remove('2');

This loads some contacts and then removes the contact with ID '2'.  The contacts list represents 34 mutations with additional nodes.

Debug in the mutationHandler shows that we get a single event with 35 mutations.  This is 34 additions and then the removal as the last mutation.

The node for contact '2' is already removed, but the visibility_monitor is just now getting the mutation with the addition of the node.  When it tries to call childAdded() with this node it gets the null reference.

It also gets a null reference when it calls childAdded() for a child of the removed node.

This can also be reproduced in the app running on the device:

  1) Checkout my contacts-vis-mon branch linked in comment 40.
  2) Revert null reference fixes for visibility monitor.
  3) make install-gaia
  4) make reference-workload-heavy
  5) Open contacts app
  6) Open contacts settings
  7) Toggle "Order by last name" and click Done
  8) Observe "curr is null" errors in logcat

This workflow calls resetDom() in contacts_list.js which in turn does:

    utils.dom.removeChildNodes(groupsList);

It appears that the removal of groupsList is reported with some of the trailing additions of the previous load().

It seems one possible solution would be to scan the mutations ahead of time looking for any removals.  Store any removed nodes in a hash and then check each added node to see if its a descendant of the removed nodes.

Thoughts?
(In reply to Ben Kelly [:bkelly] from comment #52)
> It appears that the removal of groupsList is reported with some of the
> trailing additions of the previous load().

Sorry, this should say the removal of children of groupsList.

Anyway, I'll leave this for now until you have time next week.  Thanks!
Comment on attachment 756850 [details]
link to github pull request

Overall the tests look pretty good!

I do have a bit of a concern about test coverage, though.  I think its important that we test adding nodes in a multilevel DOM since that is a core use case for this code.

More comments in the github pull request.  Thanks!

(Note, I'm having trouble getting the integration tests to run at the moment, but the code looked pretty straightforward.)
Attachment #756850 - Flags: review?(bkelly)
Taking a look to it, thanks a lot for the hard work!

F.
(In reply to Francisco Jordano [:arcturus] from comment #55)
> Taking a look to it, thanks a lot for the hard work!
> 
> F.

Upps, sorry wrong bug :)
I've added a few more comments on github, but I still haven't finished my review.
I've been working on the null bug. Ben, as you figured out, it appears to happen when the same element, or related elements, are removed/created on the same run of the mutation handler. 

I'm getting close, I think when I get back to this tomorrow I'll be able to fix it.

After that I'll get to looking at the review of the actual code (and I will add some test cases with adding multilevel nodes).
Flags: needinfo?(eshapiro)
Evan: I've finished a first pass review, but I think I need you to explain to me the need for the complicated nextCousin/prevCousin stuff when calling callbacks before I can understand it all and complete the review.

Overall, the code seems solid. There are a few areas that I'm concerned about, where you might be able to speed things up. But overall, I'm pretty happy with it, and if tests are passing, I think you'll be able to have something that I'll r+ soon.
I've fixed the null bug. It turned out there were a family of bugs centered around the mutation handler receiving 'conflicting' events. What I mean by that is the situation where some events say related nodes are added and others say they're removed. I rewrote the mutation handler and added a bunch of tests.

I'll document the code and add it to the pull request tomorrow.
part of the fix I made doesn't work in one case. I'm working on fixing that.
fixed it, documenting now
pull request updated. I'm going to start looking at comments now
implemented changes based on the review. I changed a lot of documentation, and based on feedback, ended up adding one feature, the option to only notify on changes to the maximum depth being watched. Let me know if there are further questions / concerns.
Flags: needinfo?(dflanagan)
Flags: needinfo?(bkelly)
The updates work well in both the contacts unit tests and on the device.  Very nice!  Thanks!
Flags: needinfo?(bkelly)
Oh, and r+ for my small part in the review.
Comment on attachment 756850 [details]
link to github pull request

Evan,

You didn't address the first part of comment 59 in your updated patch: that is, there is still not enough commentary for me to understand the callCallbacks() logic and why it requires the cousins stuff.

Because of my upcoming vacation, I may not have time to do another round of review. So rather than keeping this open, I'm going to give r- here.  You may need to find another reviewer in order to get this landed before I return on July 19th.

My main concern is the complexity and maintainabiliyt of the code. I don't want Gaia apps to start depending on this ~1000 line module it if you're the only one that understands it.

Before today I was just asking for enough documentation so that I could understand the algorithm.  Now that I see the changes you've had to make to the mutation observer to handle the deletion case, I worry that the whole thing is going to collapse under the weight of its own complexity.

Since this bug is no longer blocking, I think it is worth spending some more time on to get it right.

I'm wondering now whether we should have taken the approach suggested early in this bug of just using container.getElementsByTagName() and monitoring using that live node list.  My concern about that approach was (and is) that the performance might be bad. But I think that an implementaiton based on a live node list would be much, much simpler.  Are you willing to try an implementation based on getElementsByTagName() and test its performance against this version? If the simpler approach is fast enough, for the apps that need it, I'd prefer to go with that.

Failing that, let's think about what simplifications can be made to this implementation.  I don't think you have any clients that actually need to monitor the visibility of multiple levels of nodes, do you? So if you only monitored and reported elements at a fixed depth, I'm pretty sure that would dramatically simplify the callback calling code.  And it might also simplify the mutation handler code.  (And clients that needed to monitor at multiple levels could always create multiple visiblity monitors)

Also, I think that element removal will be an uncommon event in most of your client apps. So it is worth considering a simplification where on deletions you just reset and recompute the visibility of the elements from scratch.  That may mean that some elements do not have their offscreen callback called. But that should be okay.  The onscreen callback is mandatory. But for all of the apps that want to use this, I think that it is okay if the offscreen callback is just a hint and not always called reliably.
Attachment #756850 - Flags: review?(dflanagan) → review-
I agree about the complexity, maintainability could be an issue. getElementsByTagName would certainly be simpler. If you'd like me to attempt a rewrite using tagname, I definitely can do that. 

As for revising the current approach to be simpler there are a few ways to go. Yesterday I added an option for only notifying at the maxdepth level. You can see that when this option is set, a lot of the code in callcallbacks doesn't need to get called. I was considering simplifying the mutation case by recalculating, but I decided against it at the time as nodes would be notified of the same state multiple times.

Let me know what you'd like me to do.
Flags: needinfo?(dflanagan)
My thoughts as a user:

  - Notifying at maxdepth sounds reasonable and a good simplification.

  - Making offscreen() advisory would be acceptable, but it would nice to keep it guaranteed if we can.  I think its easier to reason about an API if its behavior is consistent.

  - Similarly, I would prefer if onscreen() to only notify once, but contacts would be ok with multiple onscreens.  Something to consider here is that while multiple notifications might be cheaper for the monitor, it may trigger heavy code in the app that we're trying to minimize calling.  I guess it would allow the app to decide if de-duplicating is necessary or not, but every app would need to decide that.

David, do you think only notifying at maxdepth would satisfy your complexity concerns?  I'm happy to follow up with review, but I don't have a good gauge of what you consider too complex vs not.

I had hoped to get this and my integration with contacts app landed soon.  You are correct, of course, that its not a blocker so we shouldn't rush, but I would like to move it forward if we can.

Evan, what is your availability to evaluate the getElementsByTagName approach?
Flags: needinfo?(eshapiro)
Flags: needinfo?(dflanagan)
(In reply to eshapiro from comment #68)
> I agree about the complexity, maintainability could be an issue.
> getElementsByTagName would certainly be simpler. If you'd like me to attempt
> a rewrite using tagname, I definitely can do that. 
> 
> As for revising the current approach to be simpler there are a few ways to
> go. Yesterday I added an option for only notifying at the maxdepth level.
> You can see that when this option is set, a lot of the code in callcallbacks
> doesn't need to get called. I was considering simplifying the mutation case
> by recalculating, but I decided against it at the time as nodes would be
> notified of the same state multiple times.
> 
> Let me know what you'd like me to do.

Using tagname seems like the simplest possible implementation, and if performance is good, I don't see any reason not to do that. So if you're willing to experiment with that approach to test performance, I'd try that first.

Then, if performance is not good enough, you could try the simplifications to the current patch.  

But you're the most familiar with the code, so you should make the final decision on this.

(In reply to Ben Kelly [:bkelly] from comment #69)

> 
>   - Notifying at maxdepth sounds reasonable and a good simplification.
> 

I'm glad to hear that.

>   - Making offscreen() advisory would be acceptable, but it would nice to
> keep it guaranteed if we can.  I think its easier to reason about an API if
> its behavior is consistent.
> 
>   - Similarly, I would prefer if onscreen() to only notify once, but
> contacts would be ok with multiple onscreens.  Something to consider here is
> that while multiple notifications might be cheaper for the monitor, it may
> trigger heavy code in the app that we're trying to minimize calling.  I
> guess it would allow the app to decide if de-duplicating is necessary or
> not, but every app would need to decide that.
> 

In Gallery, the way I use the original visibility monitor, it is obvious (to the gallery app) whether an element is in its onscreen state or its offscreen state.  If I got multiple onscreen calls for the same element it would be easy to avoid redoing computation because the app knows what is done when a node goes on or off screen.

I imagine the same is true for any other clients.  But we could actually require the client to pass in an isInOnscreenState predicate so that visibility monitor could test the state of a node before calling its callbacks.  Or, we could just set and clear a data-onscreen property on each node to track the state.

> David, do you think only notifying at maxdepth would satisfy your complexity
> concerns?  I'm happy to follow up with review, but I don't have a good gauge
> of what you consider too complex vs not.

Yes, it probably would.
Flags: needinfo?(dflanagan)
I'll look into using tagname. If that doesn't work well, then I'll make the current version easier to understand.
Flags: needinfo?(eshapiro)
I can confirm perf is acceptable. The code is going to be much simpler, I'll have something soon.
Attachment #769921 - Flags: review?(dflanagan)
Attachment #769921 - Flags: review?(bkelly)
I finished the rewrite using getElementsByTagName. Perf is fine, and the code is much simpler. The scope is decreased, but hopefully that won't matter for the clients that need this code.
(In reply to eshapiro from comment #74)
> I finished the rewrite using getElementsByTagName. Perf is fine, and the
> code is much simpler. The scope is decreased, but hopefully that won't
> matter for the clients that need this code.

Thanks!  I will start reviewing first thing tomorrow.
So the contacts app has the letter group shortcuts we use to jump scroll the list.  When the user hits one of these we set the scrollTop attribute of the container passed to the monitor.

With the previous monitor this seemed to work fine.  With the new tag monitor, however, I get a burst of notifications.  With a list of about 1150 contacts, if I jump to "P" I see 868 onscreen() calls and then immediately 857 offscreen() calls.  Unfortunately this causes a large amount of jank.

Is this expected with the new design?  I'm not sure how I can tell that one of the bursted onscreen() calls is going to be immediately offscreen()'d in order to avoid doing the render work.
Flags: needinfo?(eshapiro)
That's definitely a bug! Do you have a branch I can look at?
Flags: needinfo?(eshapiro) → needinfo?(bkelly)
(In reply to eshapiro from comment #77)
> That's definitely a bug! Do you have a branch I can look at?

I just updated my contacts app pull request to work with the tag monitor:

  https://github.com/mozilla-b2g/gaia/pull/10582

You need to copy the tag monitor file over manually.

Thanks for looking at it!
Flags: needinfo?(bkelly)
Commented in the PR, but it seems like this:

+    else if (
+      g.lastChildIndex < g.firstChildIndex ||
+      g.firstChildIndex > g.lastChildIndex
+    ) { // disjoint
+      notifyOnscreenInRange(g.firstChildIndex, g.lastChildIndex);
+      notifyOffscreenInRange(g.last.firstChildIndex, g.last.lastChildIndex);
+    } 

Should be using g.last in one half of the comparison?
yep, and that does appear to fix the issue you found with the contacts app!
Comment on attachment 769921 [details]
link to github pull request for getElementsByTagName version

r+ if you fix the issues on github and assuming that the tests are passing and bkelly is happy with it.

My github comment in fixIndex() is the only place where I worry that there is a bug. The other comments are nits or minor optimization suggestions.

Thanks for all your work on this Evan.
Attachment #769921 - Flags: review?(dflanagan) → review+
Comment on attachment 769921 [details]
link to github pull request for getElementsByTagName version

I added some relatively minor comments to the Github pull request.  You already addressed the major issues I had while testing with contacts app.  The unit tests work and appear to have good coverage.  I verified that fixIndex() ran in both directions.

Thanks for the awesome work!
Attachment #769921 - Flags: review?(bkelly) → review+
improved documentation and variable names, and squashed!

let me know if there's anything else I should change before it gets merged
Everything looked good to me, so I went ahead and merged:

  https://github.com/mozilla-b2g/gaia/commit/0a2392f2d81fc3ba9a2c5d3c9e3b2296c7b4befb

Thanks again for writing this as it will be a great help to contacts and other apps!
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
blocking-b2g: - → koi?
the v1.1 need this patch;
and the pr 866676 is affected by the pr;
so we set flag 
status-b2g18: 	?  	
status-b2g-v1.1hd: ?
status-b2g18: --- → ?
Flags: needinfo?(bkelly)
Updated dependencies and added some comments in bug 866676 since that is where the leo? nom is.  See:

  https://bugzilla.mozilla.org/show_bug.cgi?id=866676#c7
Flags: needinfo?(bkelly)
Flags: needinfo?(21)
This needs to be set to leo? to have this considered for 1.1. This is koi right now, which means that this is a wontfix for 1.1.
Removing koi? since this was commited to master before 1.2 branched.
blocking-b2g: koi? → ---
Whiteboard: c= p=3 , → [c= p=3 s= u=]
You need to log in before you can comment on or make changes to this bug.