Closed Bug 1373672 Opened 7 years ago Closed 7 years ago

FormDataListener.handleEvent() is extremely expensive when running speedometer

Categories

(Firefox :: Session Restore, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
Firefox 57
Performance Impact high
Tracking Status
firefox57 --- fixed

People

(Reporter: ehsan.akhgari, Assigned: ttaubert)

References

(Blocks 2 open bugs)

Details

Attachments

(6 files, 7 obsolete files)

1.67 KB, patch
smaug
: feedback-
Details | Diff | Splinter Review
1.85 KB, patch
Details | Diff | Splinter Review
3.11 KB, patch
Details | Diff | Splinter Review
61.28 KB, patch
nika
: review+
Details | Diff | Splinter Review
2.55 KB, patch
smaug
: review+
Details | Diff | Splinter Review
4.96 KB, patch
ttaubert
: review+
smaug
: review+
Details | Diff | Splinter Review
See this profile that I captured running about 100 or so tests into a full run of the Speedometer test suite at http://speedometer2.benj.me/:

https://perfht.ml/2sGyu12

The code: <https://searchfox.org/mozilla-central/rev/20d16dadd336e0c6b97e3e19dc4ff907744b5734/browser/components/sessionstore/content/content-sessionStore.js#452>

We are spending a whopping 118 *ms* only in this function.  As far as I can tell, this is all time being spent in cross-compartment wrappers and Xray wrappers code.  The JS code is trying to see whether the event's global is a frame that has been created after the load event by checking to see if it appears in a map (gFrameTree) which is what creates all of this overhead.

Note also that this benchmark has a lot of checkboxes inside it, for which we issue "input" and "change" events immediately after each other, so this code has 2x overhead for that <https://searchfox.org/mozilla-central/rev/20d16dadd336e0c6b97e3e19dc4ff907744b5734/dom/html/HTMLInputElement.cpp#4333>.  I'm not sure why it needs to handle both events at all.

Just to get a rough sense of the amount this is hurting Speedometer, commenting out these two lines on my super fast machine increases our benchmark score by about 4-6 points!

I'd like to discuss what we can do with this code.  We can probably improve things a bit by:

 * Moving FrameTree.jsm into content-sessionStore.js to eliminate the cross-compartment wrappers in between the objects.
 * Not listening on change events (as I think we should fire input events for everything we care about here)
 * Moving FrameTree.jsm logic into C++ somehow (I haven't studied it very closely to figure out what it does yet)
 * Move the entire thing into C++ (my preferred approach)
 * Consider something entirely different, such as not saving this information until the user interacts with the form control or something along those lines.
Flags: needinfo?(ttaubert)
huh, 4-6 points. That is a lot. I had seen this a bit in some profile, but it didn't show up that badly.

And yes, moving to C++, at least some part of the implementation, would be the best option.
Depends on: 1373677
I think some work on improving this code is being done in bug 1362330, where we're trying to make the xpath work be done in C++ code. A significant fraction of the call stacks in your profile seem to be occurring within xpath evaluation, so I imagine that this will help a fair amount.

After that change, I think the next big thing would be re-unifying the sessionstore JSMs back into a single javascript context to avoid the cross context overhead (which is rather unfortunate for code separation, but oh well). We could also move parts of it into C++, but that could take some time. For example, I imagine it wouldn't be too difficult on the platform side to do the form element at load content recording work with almost no overhead, but that would require rewriting the logic, and probably some work in editor.
Depends on: 1362330
(In reply to Michael Layzell [:mystor] from comment #2)
> I think some work on improving this code is being done in bug 1362330, where
> we're trying to make the xpath work be done in C++ code.

Ah, the FormData.collect() part?  I missed that initially...  Nice!

> After that change, I think the next big thing would be re-unifying the
> sessionstore JSMs back into a single javascript context to avoid the cross
> context overhead (which is rather unfortunate for code separation, but oh
> well).

That would be trivial if it weren't for this test, FWIW <https://searchfox.org/mozilla-central/rev/20d16dadd336e0c6b97e3e19dc4ff907744b5734/browser/components/sessionstore/test/content.js#13>

But note that this is a small part of the cost, the bigger part is the Xray wrapper part.  :-(

> We could also move parts of it into C++, but that could take some
> time. For example, I imagine it wouldn't be too difficult on the platform
> side to do the form element at load content recording work with almost no
> overhead, but that would require rewriting the logic, and probably some work
> in editor.

Perhaps we should wait for bug 1362330 to land and remeasure this?  It would also be nice if someone can figure out how to get rid if the CCW overhead here in the mean time!  (Can we use #include in js somehow using the python preprocessor we have?!)
We need to do more work in C++ or have the listener called fewer times.
Could we perhaps have some C++ event listener collecting interesting event targets, and then it would use idle time to pass the data to some JS implemented callback.
Also CCing Will who has been looking at a lot of sessionstore performance issues lately...
Whiteboard: [qf:p1]
(In reply to Olli Pettay [:smaug] from comment #4)
> We need to do more work in C++ or have the listener called fewer times.
> Could we perhaps have some C++ event listener collecting interesting event
> targets, and then it would use idle time to pass the data to some JS
> implemented callback.

I was thinking of something along the lines of this. I don't know if this would be functional, but it seems a lot simpler.
* adding a flag to HTMLInputElement (and similar nodes) which would be set if the node was part of the document during load.
* Whenever the value of this element changes, dispatch an idle event which will notify SessionStorage that the element had its value changed, and what the new value was. Don't dispatch the event if the flag is not set.
* In the JS code, record the values and the xpaths of the elements whenever these events arrive.

This way we never need to periodically check things, or iterate over FrameTrees etc., and we only do events in idle callbacks.

I'm sure there are some fiddly bits which I'm overlooking here, but it might be a nice idea.
Wouldn't that flood idle queue.
Note that this code also needs to work with other types of form controls besides inputs.

Also, based on the profile, I think the part that is really costly is really FrameTree.jsm itself, as it is dealing with JS window objects which means that it will get xray wrappers to them every time that it touches them.  I read the code and as far as I understand it, it is trying to maintain a map of the frames to their parents when they finish loading <https://searchfox.org/mozilla-central/rev/20d16dadd336e0c6b97e3e19dc4ff907744b5734/browser/components/sessionstore/FrameTree.jsm#231>, which mean effectively the code inside handleEvent() is checking whether the frame that the event is coming from is currently loading or not!  There are vastly more efficient ways of checking that in C++ at least.  I tried commenting out the code that uses FrameTree.jsm to check this: <https://searchfox.org/mozilla-central/rev/20d16dadd336e0c6b97e3e19dc4ff907744b5734/browser/components/sessionstore/content/content-sessionStore.js#453,457,459> and with that you get a profile like this: <http://bit.ly/2sHmFHN>

This tells me two things:

 * Bug 1362330 is not going to help here unfortunately due to the existence of this createLazy() call: <https://searchfox.org/mozilla-central/rev/20d16dadd336e0c6b97e3e19dc4ff907744b5734/browser/components/sessionstore/content/content-sessionStore.js#773>
 * Even if we got rid of all of the FrameTree.jsm overhead, the result would still be too slow...
No longer depends on: 1362330
(In reply to Olli Pettay [:smaug] from comment #7)
> Wouldn't that flood idle queue.

Theoretically we could be smart about not double-dispatching these events.

(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #8)
> Note that this code also needs to work with other types of form controls
> besides inputs.

Yeah, we'd have to extend this beyond just input, not sure how tricky that would be.

> Also, based on the profile, I think the part that is really costly is really
> FrameTree.jsm itself, as it is dealing with JS window objects which means
> that it will get xray wrappers to them every time that it touches them.  I
> read the code and as far as I understand it, it is trying to maintain a map
> of the frames to their parents when they finish loading
> <https://searchfox.org/mozilla-central/rev/
> 20d16dadd336e0c6b97e3e19dc4ff907744b5734/browser/components/sessionstore/
> FrameTree.jsm#231>, which mean effectively the code inside handleEvent() is
> checking whether the frame that the event is coming from is currently
> loading or not!  There are vastly more efficient ways of checking that in
> C++ at least.

My goal with my idea was to avoid the need for FrameTree entirely. Would having that be completely replaced with events coming from C++ be sufficient to avoid the performance issues?

For example, a (I think simple-ish, very minimal and wouldn't remove all of the overhead, but) thing which we could do is this: Add a flag to elements as to whether or not they were added to the DOM before the load event (I don't know if we have extra flag space in the Element struct right now - I'm operating under the assumption right now that we do), and expose a chrome only method to check that flag. Check that flag in https://searchfox.org/mozilla-central/rev/20d16dadd336e0c6b97e3e19dc4ff907744b5734/browser/components/sessionstore/content/content-sessionStore.js#410 instead of looking at the FrameTree. That way we don't need to do the whole FrameTree dance synchronously in response to these events.

How do you feel about that possibility?
First, here's what's happening in this function:

* When an input or change event fires, we check if the event fired in a frame that was created after the load event fired (i.e., a dynamically created frame). If it was, the event is discarded. That's what the gFrameTree.contains test does, as the comment explains. I verified that the code actually does this.

* If the event is not for a dynamically generated iframe, then we set a one-second timer. When that timer fires, we go through all the iframes in the tab, collect form data for them, and send it to the parent.

I'm not sure why we listen for both "input" and "change" events. The code for "change" events was added in bug 346337 and I don't see much of an explanation for it. However, we have a test for that bug, so hopefully we can just remove the code and see if it still passes. One odd thing is that MDN claims that the "input" event is not fired for checkboxes when the user clicks the control. As far as I can tell, that's wrong. But it might explain why someone thought we needed the "change" event.

It seems like there could be some easy wins here:

1. Use some sort of idle mechanism for the one second timer. I haven't been following all the idle work you guys have been doing, but I think we want the function to fire >= 1 second in the future, probably less than 30 seconds or so, and when the event loop is idle. That should mean we'd be firing this thing much less often during a benchmark run.

2. The first thing we should do when we get an input event is to check whether the timer is pending. If it is, we can simply return immediately. This will happen in the vast majority of cases. And this check doesn't require any wrapper overhead or anything since all the timer stuff is done in content-sessionStore.js itself.

I think with these two fixes the cost of this would be very low (assuming we can get the idle timer to fire rarely during benchmarking).
(In reply to Bill McCloskey (:billm) from comment #10)
> I'm not sure why we listen for both "input" and "change" events. The code
> for "change" events was added in bug 346337 and I don't see much of an
> explanation for it. However, we have a test for that bug, so hopefully we
> can just remove the code and see if it still passes. One odd thing is that
> MDN claims that the "input" event is not fired for checkboxes when the user
> clicks the control. As far as I can tell, that's wrong. But it might explain
> why someone thought we needed the "change" event.

A common problem with SessionStore, we added something years ago without justification and it's hard to find out if it breaks anyone in the wild, and why we thought that was necessary. I'd say let's give it a try and add a test for checkboxes if there's none yet.

> 1. Use some sort of idle mechanism for the one second timer. I haven't been
> following all the idle work you guys have been doing, but I think we want
> the function to fire >= 1 second in the future, probably less than 30
> seconds or so, and when the event loop is idle. That should mean we'd be
> firing this thing much less often during a benchmark run.

Yeah, we should do that. One of the new idle timers would probably be a good fit.


> 2. The first thing we should do when we get an input event is to check
> whether the timer is pending. If it is, we can simply return immediately.
> This will happen in the vast majority of cases. And this check doesn't
> require any wrapper overhead or anything since all the timer stuff is done
> in content-sessionStore.js itself.

That might be an easy solution indeed. Definitely worth doing.
Flags: needinfo?(ttaubert)
Bill described accurately what FrameTree.jsm does [1]. My gut feeling is that moving the FrameTree to C++ might not be worth it, I think we can speed up things mostly by only using the FrameTree when we really need it, as also Bill suggested in the comment above.

Good to have:

* A way to determine whether an event was fired by a frame that existed at the time the page was loaded. We want to ignore frames that were added dynamically. We might be able to do this without touching a JS window via C++ and/or with extra information on the event.

* Only the PageStyleListener currently needs to touch the window via the observer notification [2]. We can probably find a way to not do that.

* With the two points above we could basically get rid of FrameTree.contains() and make sure we don't call it in a hot path.

The remaining functionality (observers, map, forEach) doesn't seem worth moving to C++. map() and forEach() are only called when we collect data, and that should be done off an idle timer. We need to touch the DOM/window for data collection anyway. The frame tree needed for map() and forEach() is collected by FrameTreeInternal.collect() only once on page load.

[1] http://searchfox.org/mozilla-central/source/browser/components/sessionstore/FrameTree.jsm#17
[2] http://searchfox.org/mozilla-central/source/browser/components/sessionstore/content/content-sessionStore.js#497
The C++ -> JS call does itself show up in the profiles, that is why I'd like to see C++ here.
But sure, we can try first optimize staying in JS, and move to C++ if needed.
RE Xrays, this looks quite similar to bug 613498, which I expect bug 1348099 will be helpful for XrayResolveMethod() and XrayResolveAttribute(). Bug 1368325 that removes js::gc::BarriersAreAllowedOnCurrentThread() helps unwrapping wrapper a bit (only Beta and Nightly), and I am still checking is it possible to improve it further.
(In reply to Olli Pettay [:smaug] from comment #13)
> The C++ -> JS call does itself show up in the profiles, that is why I'd like
> to see C++ here.
> But sure, we can try first optimize staying in JS, and move to C++ if needed.

I guess we could remove the event listeners while the timer is active and then add them back again when it fires. That would only be beneficial if adding/removing listeners is fairly cheap, but hopefully is is cheap.
(In reply to Bill McCloskey (:billm) from comment #15)
> (In reply to Olli Pettay [:smaug] from comment #13)
> > The C++ -> JS call does itself show up in the profiles, that is why I'd like
> > to see C++ here.
> > But sure, we can try first optimize staying in JS, and move to C++ if needed.
> 
> I guess we could remove the event listeners while the timer is active and
> then add them back again when it fires. That would only be beneficial if
> adding/removing listeners is fairly cheap, but hopefully is is cheap.

That's a good idea too. We'd add/remove that at most once per second so I'd hope this wouldn't show up in the profile.
Oh, I didn't realize we don't have to listen all the events and check their targets.
If that really is the case, great.
But don't we miss then some frames, possibly?
(In reply to Olli Pettay [:smaug] from comment #17)
> Oh, I didn't realize we don't have to listen all the events and check their
> targets.
> If that really is the case, great.
> But don't we miss then some frames, possibly?

Once the timer fires, we collect data for all frames, so we wouldn't miss anything. I think we do it this way because we don't have any way to tell the parent about only a particular frame.
(In reply to Bill McCloskey (:billm) from comment #18)
> Once the timer fires, we collect data for all frames, so we wouldn't miss
> anything. I think we do it this way because we don't have any way to tell
> the parent about only a particular frame.

We _could_ tell the parent about only a subset of frames without too much extra coding work IIRC, we just haven't done this in the past. In general other than session history, sessionstore is not super granular in terms of what information it captures. We should perhaps consider recording which ones have actually changed, and only updating the information about those.
(In reply to Bill McCloskey (:billm) from comment #18)
> (In reply to Olli Pettay [:smaug] from comment #17)
> > Oh, I didn't realize we don't have to listen all the events and check their
> > targets.
> > If that really is the case, great.
> > But don't we miss then some frames, possibly?
> 
> Once the timer fires, we collect data for all frames, 
Then the code is rather surprising. Why does it explicitly collect frames?

> so we wouldn't miss
> anything. I think we do it this way because we don't have any way to tell
> the parent about only a particular frame.

But ok, sounds like this could be fixed in a quite simple way.
Need to find someone to do it ;)
This is rather high priority.
(In reply to Bill McCloskey (:billm) from comment #15)
> (In reply to Olli Pettay [:smaug] from comment #13)
> > The C++ -> JS call does itself show up in the profiles, that is why I'd like
> > to see C++ here.
> > But sure, we can try first optimize staying in JS, and move to C++ if needed.
> 
> I guess we could remove the event listeners while the timer is active and
> then add them back again when it fires. That would only be beneficial if
> adding/removing listeners is fairly cheap, but hopefully is is cheap.

I gave this a shot and I can't tell whether it improves the benchmark score. I get similar ratings with and without the patch on my machine... At least it doesn't break any tests.
(In reply to Olli Pettay [:smaug] from comment #20)
> > so we wouldn't miss
> > anything. I think we do it this way because we don't have any way to tell
> > the parent about only a particular frame.
> 
> But ok, sounds like this could be fixed in a quite simple way.
> Need to find someone to do it ;)
> This is rather high priority.

If we do that then we could only stop listening for input events on a single frame at a time to not miss out invalidating form data for others.

We'd also have to merge partial form data with what we collected before, so we'd either have a cache in the content script and still send full data to the parent as we do now, or have a cache in the parent that can handle partial updates.

How useful would this be? Would this improve our benchmark score because that's heavy on non-dynamically created frames? Are we optimizing for sites with many "static" frames where we are slowed down because we collect data for all frames although only a single one changed?

With bug 1362330, I would think that collecting form data ~once a second wouldn't have a huge impact anymore (even if done for all frames in the "tree") but I might be wrong.
Flags: needinfo?(ttaubert)
Talked to Mike last week and he said the either he or a colleague could take a look. Michael took a slightly different approach [1] that might be better than my patch above, he couldn't get the tests to run but we can figure that out.

Over to Mike, happy to talk more about this bug and answer any questions you might have.

[1] https://github.com/mystor/mozilla-central/commit/3e6934ac77ff76bd193e4ed7f16bbdf657af0543
Flags: needinfo?(ttaubert) → needinfo?(mdeboer)
I talked to Quan (our sessionstore perf GSoC intern!) about this bug and he'll take a look at this bug to see if it's a good fit for him.
The patch in bug 1362330 is almost ready to check-in again, so that'll already help somewhat.
Flags: needinfo?(mdeboer) → needinfo?(nnn_bikiu0707)
I tested the patch by Tim. On my machine, the score increased by 2 point, from 81.20 to 83.13.

The PageStyleListener is suggested to be removed by Gijs in bug 1348816. So I guess that shouldn't be a problem.

However, I don't understand why we want to avoid FrameTree.contains. It seems like a simple check to see if a frame is in a map.
Flags: needinfo?(nnn_bikiu0707)
(In reply to Bao Quan [:beekill] from comment #25)
> I tested the patch by Tim. On my machine, the score increased by 2 point,
> from 81.20 to 83.13.

Wow...  That's kind of surprising actually.  Can you please capture a profile similar to the one in comment 0 so that we can compare the before and after cost?  This benchmark is pretty noisy and at least on my machine, a change of 2 points is very well within the noise range of the benchmark, so the only good way to know whether patches help issues like this is by looking at the raw profiles.

> The PageStyleListener is suggested to be removed by Gijs in bug 1348816. So
> I guess that shouldn't be a problem.

That doesn't show up at all in the profile in comment 0, so no I don't think it's a problem at all here.

> However, I don't understand why we want to avoid FrameTree.contains. It
> seems like a simple check to see if a frame is in a map.

Did you read comment 0?  It's not a simple check at all.  If you look at the profile there you will see that as soon as we touch the frame object in the JS code there, an XRay wrapper gets created for it, since the frame object is a content JS object and we're coming from privileged JS code.  Furthermore, the access to the contains property of the FrameTree object goes through a cross compartment wrapper since the FrameTree object currently is imported from a JSM (FrameTree.jsm) so any JS object that we import from it will be wrapped inside an invisible cross compartment wrapper.  This means that this code is *much* slower than if the FrameTree object here was a "normal" JS object that wouldn't get a cross compartment wrappper, and that we wouldn't need to wrap the frame object in an XRay wrapper.  It is those additional costs that slow us down here (in fact, according to the profile in comment 0, these two are the *only* costly thing going on here, which is why I said the only fix to this problem is to port this code to C++ IMO, which would allow us to avoid these two bits of overhead.)

As long as this code keeps existing in JS, our only options for making it fast are:

 * Stop importing FrameTree from a JSM.  Basically delete FrameTree.jsm and copy and paste its code into content-sessionStore.js and find a way to test it that doesn't require importing a JSM.
 * Stop touching the content object from JS in order to avoid creating an XRay wrapper.
(In reply to Tim Taubert [:ttaubert] from comment #23)
> [1]
> https://github.com/mystor/mozilla-central/commit/
> 3e6934ac77ff76bd193e4ed7f16bbdf657af0543

The solution in my patches (linked above) for this problem was to:
a) Switch from using FrameTree.jsm to using the nsIDocShell.dynamicallyCreated (or createdDynamically - can't remember which), and
b) Add a helper chrome-only method in C++ on Event to check if the default view for the given event's target was created dynamically, which could then be used instead of the current implementation to avoid the xrays overhead on input events.

I wasn't trying to make FrameTree.jsm faster (it doesn't show up as much IIRC), rather just to remove the xray and cross-compartment overhead. I figured that using a bit of data which was already in C++ would be the easiest way to do that.
Bao's patch looks pretty good to me. It ensures that we run the event listener at most once per second. Bao, I think it would be good to compare your patch to simply commenting out the entire listener, since that's the best we can possibly do here.

Ideally we would run the listener even less often by running the timer at idle priority. But I don't think the platform supports that right now, and I'm not sure if it would help all that much.

We might also be able to cancel the timer if the user navigates away from the page. We'd have to find a performant way to do that though.
(In reply to Bill McCloskey (:billm) from comment #28)
> Bao's patch looks pretty good to me. It ensures that we run the event
> listener at most once per second. Bao, I think it would be good to compare
> your patch to simply commenting out the entire listener, since that's the
> best we can possibly do here.

Oh, I must be confused then -- I thought the patches mentioned are the ones in bug 1362330, but those do something very different!  A patch that does rate limit the listener should help quite a bit here, I think based on the profile.

Having a profile link captured with the mentioned patch applied would be really helpful here to know exactly how much content-sessionStore.js code shows up in Speedometer after it.

> Ideally we would run the listener even less often by running the timer at
> idle priority. But I don't think the platform supports that right now, and
> I'm not sure if it would help all that much.

Note that the part showing up under the profile is just collecting the data to process lazily later, not the part that runs off of the timer itself...

> We might also be able to cancel the timer if the user navigates away from
> the page. We'd have to find a performant way to do that though.

Canceling nsITimer's is super expensive FWIW since you'd need to gran the timer thread lock.

A much more performant API for this is Window.setTimeout/Window.clearTimeout.  After bug 1363829 was fixed, those APIs no longer use a physical timer under the hood when you set/unset them.
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #29)
> Oh, I must be confused then -- I thought the patches mentioned are the ones
> in bug 1362330, but those do something very different!  A patch that does
> rate limit the listener should help quite a bit here, I think based on the
> profile.

Sorry, I guess Tim is the author of the patch in this bug, which was the one I was referring to. It sounds like that's the one Bao testing in comment 25 though.

> > Ideally we would run the listener even less often by running the timer at
> > idle priority. But I don't think the platform supports that right now, and
> > I'm not sure if it would help all that much.
> 
> Note that the part showing up under the profile is just collecting the data
> to process lazily later, not the part that runs off of the timer itself...

The thing I see in the profile is handleEvent (which doesn't collect any data, it just sets a timer). With Tim's patch, handleEvent is not called while the timer is active (since we remove the event listener when we set the timer). Therefore, making the timer run less often will make handleEvent run less often. That's why I would like to make the timer fire less often.

> > We might also be able to cancel the timer if the user navigates away from
> > the page. We'd have to find a performant way to do that though.
> 
> Canceling nsITimer's is super expensive FWIW since you'd need to gran the
> timer thread lock.
>
> A much more performant API for this is
> Window.setTimeout/Window.clearTimeout.  After bug 1363829 was fixed, those
> APIs no longer use a physical timer under the hood when you set/unset them.

We're running from a content script, so I don't think we can use those here. The timeout manager is associated with an inner window.

Anyway, I'm not sure that this idea makes sense anyhow.
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #26)
> (In reply to Bao Quan [:beekill] from comment #25)
> > I tested the patch by Tim. On my machine, the score increased by 2 point,
> > from 81.20 to 83.13.
> 
> Wow...  That's kind of surprising actually.  Can you please capture a
> profile similar to the one in comment 0 so that we can compare the before
> and after cost?  This benchmark is pretty noisy and at least on my machine,
> a change of 2 points is very well within the noise range of the benchmark,
> so the only good way to know whether patches help issues like this is by
> looking at the raw profiles.

Here is the profile results:
+ Without Tim's patch: https://perfht.ml/2tP4Ffv, the score is 82.36
+ With Tim's patch: https://perfht.ml/2tOTC5U, the score is 83.54
At this point, keeping comment 14 in mind as well, I think it's warranted to re-profile and see if this is still showing up as extremely expensive.
Yeah, one needs to look at the profile and see whether the expensive call is gone.
This isn't expensive enough to show up reliably in speedometer score.
(we need to fix several this kinds of issues to improve speedometer score)
The profile from Nightly 20170710100245: https://perfht.ml/2t8uAKl, the function takes 236ms in the ~60s record, I don't know if this is expensive.
So how do we get this bug fixed? I'm not familiar with this code at all, but in current m-i content-sessionStore.js' event listener shows up pretty badly in profiles.
Flags: needinfo?(ttaubert)
I just figured I'd throw this together.

smaug, this is an even simpler version of :ttaubert's patches, which doesn't
bother unregistering the event listeners, and instead just sets a boolean flag.
I _think_ that it should make FormDataListener.handleEvent() disappear from
profiles if I understand this bug correctly. Is there any chance you could try
applying it and see if it stops appearing in profiles?

MozReview-Commit-ID: 65e4TW6HZPN
Attachment #8885458 - Flags: feedback?(bugs)
I looked at the profile with Tim's patch applied, and it seems like we're still ending up in the listener more often than once per second. I guess that must be because onFrameTreeReset is getting called (although, oddly, FrameTree.jsm doesn't show up in the profile at all). onFrameTreeReset is called whenever we start a new top-level load in the tab, which I imagine could happen a lot in Speedometer.

There are a couple ways I could think to fix this, but we should first verify whether onFrameTreeReset is indeed the problem. Commenting it out should do the trick.
Comment on attachment 8885458 [details] [diff] [review]
Avoid touching ccws or xrays while a FormDataListener collect is pending

If we can void doing the event listener call, then we should.
Is there something wrong with the other patch?
Attachment #8885458 - Flags: feedback?(bugs) → feedback-
I did some logging to see what we're doing inside FormDataListener::handleEvent.

The log from the console shows that:
+ the onFrameTreeReset is called only once, when the speedometer page is loaded.
+ the FrameTree::contains always returns false

It means that the only thing that is executed in handleEvent is FrameTree::contains. So Tim and Michael's patches are not going to help here.

If FrameTree::contains is expensive, I think we could do the following:
+ Move FrameTree::contains into MessageQueue callback
+ Remove the event callback when we're collecting.
This way, we don't have to check for FrameTree::contains every times an input happens.

However, I don't think it's going to help much as I've said in comment 32.

What do you guys think?
How often is handleEvent itself called?
(In reply to Bao Quan [:beekill] from comment #40)
> Created attachment 8887370 [details] [diff] [review]
> Logging inside FormDataListener::handleEvent
> 
> I did some logging to see what we're doing inside
> FormDataListener::handleEvent.
> 
> The log from the console shows that:
> + the onFrameTreeReset is called only once, when the speedometer page is
> loaded.
> + the FrameTree::contains always returns false
> 
> It means that the only thing that is executed in handleEvent is
> FrameTree::contains. So Tim and Michael's patches are not going to help here.
> 
> If FrameTree::contains is expensive, I think we could do the following:
> + Move FrameTree::contains into MessageQueue callback
> + Remove the event callback when we're collecting.
> This way, we don't have to check for FrameTree::contains every times an
> input happens.
> 
> However, I don't think it's going to help much as I've said in comment 32.
> 
> What do you guys think?

My understanding from talking with ehsan in the past was that the expensive code was not FrameTree::contains, but rather the fact that we are in chrome code and doing 2 somewhat expensive things:
1. Calling methods or accessing properties on content JS objects (such as the event's target DOM node) from chrome JS creates and then calls through an XRays Wrapper.
2. Calling methods or accessing properties in a separate JSM (such as FrameTree) creates and then calls through a Cross-Compartment Wrapper.

My understanding from ehsan's discussion of what he saw in profiles was that those were the expensive components. The reasoning behind my approach in my earlier WIP patch (which uses nsIDocShell::dynamicallyCreated) was to avoid creating any CCWs or XRays.

Ehsan, is this correct based on previous profiles you've collected?
Flags: needinfo?(ehsan)
I think Olli's question is the most pertinent here. The goal of the patches in this bug is to have fewer calls to handleEvent. I think we still don't know if the patches accomplish that goal. If we don't call handleEvent very often, then the cost of FrameTree.contains (and the CCWs involved in calling it) doesn't matter.
What Michael said actually matches what I see when I try different things. If I remove the FrameTree.contains() check then we still spend ~250ms in SessionStore. Moving the FrameTree into content-sessionStore.jsm, out of its own JSM, didn't seem to move the needle either.

IIRC, the only patch that was actually promising was Michael's, that introduces nsIDocShell::dynamicallyCreated. I think the time spent in SessionStore dropped by ~50%. I tried a similar approach by extending the EventListenerManager to not call the SessionStore listener for frames with dynamicallyCreated=true, but that didn't help.

Talking out loud now, I wonder if the CCW/XRay overhead is actually worse in FrameTree.map()/forEach()/collect(), i.e. when iterating all frames and subframes. So basically not, or not only, when calling handleEvent().
Flags: needinfo?(ttaubert)
(In reply to Michael Layzell [:mystor] from comment #42)
> My understanding from talking with ehsan in the past was that the expensive
> code was not FrameTree::contains, but rather the fact that we are in chrome
> code and doing 2 somewhat expensive things:
> 1. Calling methods or accessing properties on content JS objects (such as
> the event's target DOM node) from chrome JS creates and then calls through
> an XRays Wrapper.
> 2. Calling methods or accessing properties in a separate JSM (such as
> FrameTree) creates and then calls through a Cross-Compartment Wrapper.
> 
> My understanding from ehsan's discussion of what he saw in profiles was that
> those were the expensive components. The reasoning behind my approach in my
> earlier WIP patch (which uses nsIDocShell::dynamicallyCreated) was to avoid
> creating any CCWs or XRays.
> 
> Ehsan, is this correct based on previous profiles you've collected?

Yes, and please note that the majority of this cost goes through XRay wrappers (just grabbed a new profiled from a recent m-c build after the bugs mentioned in comment 14 and the situation is still the same.)  In other words, if we fix that issue, we may not actually need to worry too much about the CCW issue (or at least we'd need to remeasure.)

One thing that isn't clear to me is how many times handleEvent is being called here before and after attachment 8879440 [details] [diff] [review].  It would be helpful if someone answered this question.

(In reply to Tim Taubert [:ttaubert] from comment #44)
> What Michael said actually matches what I see when I try different things.
> If I remove the FrameTree.contains() check then we still spend ~250ms in
> SessionStore. Moving the FrameTree into content-sessionStore.jsm, out of its
> own JSM, didn't seem to move the needle either.

Both make perfect sense, since:

a) The thing that triggers the XRay wrapper creation is the declaration of the |frame| local variable, not the contains() call.
b) As I mentioned before, profiles show the CCW overhead to be a small part of the problem at best (and in general it's very difficult to understand how much CCW overhead exists just by looking at a profile.)  Since you have investigated this, I think we can ignore the CCW overhead for now.

> Talking out loud now, I wonder if the CCW/XRay overhead is actually worse in
> FrameTree.map()/forEach()/collect(), i.e. when iterating all frames and
> subframes. So basically not, or not only, when calling handleEvent().

I don't see that in the profiles, do you?
Flags: needinfo?(ehsan)
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #45)
> (In reply to Tim Taubert [:ttaubert] from comment #44)
> > What Michael said actually matches what I see when I try different things.
> > If I remove the FrameTree.contains() check then we still spend ~250ms in
> > SessionStore. Moving the FrameTree into content-sessionStore.jsm, out of its
> > own JSM, didn't seem to move the needle either.
> 
> Both make perfect sense, since:
> 
> a) The thing that triggers the XRay wrapper creation is the declaration of
> the |frame| local variable, not the contains() call.

I did remove the declaration too, of course, we don't actually need the frame until we collect data.

> > Talking out loud now, I wonder if the CCW/XRay overhead is actually worse in
> > FrameTree.map()/forEach()/collect(), i.e. when iterating all frames and
> > subframes. So basically not, or not only, when calling handleEvent().
> 
> I don't see that in the profiles, do you?

At least I think I saw it back when I tried different patches and different approaches. Probably when testing the EventListenerManager patch that got rid of the FrameTree.contains() calls. Without any WIP patches comment #0 remains valid.

I'll recheck tomorrow and take notes this time.
(In reply to Olli Pettay [:smaug] from comment #41)
> How often is handleEvent itself called?

I think that every time the checkbox is checked, the handleEvent is called twice since we have two event listener. So if the test has n checkboxes, handleEvent is called n * 2 times.

(In reply to Bill McCloskey (:billm) from comment #43)
> I think Olli's question is the most pertinent here. The goal of the patches
> in this bug is to have fewer calls to handleEvent. I think we still don't
> know if the patches accomplish that goal. If we don't call handleEvent very
> often, then the cost of FrameTree.contains (and the CCWs involved in calling
> it) doesn't matter.

(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #45)
> One thing that isn't clear to me is how many times handleEvent is being
> called here before and after attachment 8879440 [details] [diff] [review]. 
> It would be helpful if someone answered this question.

(*) As I said earlier, the FrameTree::contains [1] always returns false during the test, so the patches from Tim and Michael are not going to reduce the number of times handleEvent is called.


(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #45)
> > Talking out loud now, I wonder if the CCW/XRay overhead is actually worse in
> > FrameTree.map()/forEach()/collect(), i.e. when iterating all frames and
> > subframes. So basically not, or not only, when calling handleEvent().
> 
> I don't see that in the profiles, do you?

Because of (*), FrameTree.map()/forEach()/collect() is never executed and therefore not shown in the profiles.

[1]: http://searchfox.org/mozilla-central/source/browser/components/sessionstore/content/content-sessionStore.js#465
(In reply to Michael Layzell [:mystor] from comment #42)
> My understanding from talking with ehsan in the past was that the expensive
> code was not FrameTree::contains, but rather the fact that we are in chrome
> code and doing 2 somewhat expensive things:
> 1. Calling methods or accessing properties on content JS objects (such as
> the event's target DOM node) from chrome JS creates and then calls through
> an XRays Wrapper.
> 2. Calling methods or accessing properties in a separate JSM (such as
> FrameTree) creates and then calls through a Cross-Compartment Wrapper.

(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #45)
> (In reply to Tim Taubert [:ttaubert] from comment #44)
> > What Michael said actually matches what I see when I try different things.
> > If I remove the FrameTree.contains() check then we still spend ~250ms in
> > SessionStore. Moving the FrameTree into content-sessionStore.jsm, out of its
> > own JSM, didn't seem to move the needle either.
> 
> Both make perfect sense, since:
> 
> a) The thing that triggers the XRay wrapper creation is the declaration of
> the |frame| local variable, not the contains() call.
> b) As I mentioned before, profiles show the CCW overhead to be a small part
> of the problem at best (and in general it's very difficult to understand how
> much CCW overhead exists just by looking at a profile.)  Since you have
> investigated this, I think we can ignore the CCW overhead for now.

So can we change [1] to something like this?
```js
handleEvent(event) {
  MessageQueue.push("formdata", () => {
    let frame = event.target.ownerGlobal;
    if (gFrameTree.contains(frame) {
      return this.collect();
    } else {
      return null;
    }
  });
}
```

Currently, we declare |frame| every time handleEvent is called. And I think handleEvent is going to be called a lot in a second. Even though, this approach isn't going to reduce the number of times handleEvent is called, it's going to reduce the number of |frame| declaration to 1 per second and therefore going to reduce the XRay overhead.

[1]: http://searchfox.org/mozilla-central/source/browser/components/sessionstore/content/content-sessionStore.js#460-468
(In reply to Bao Quan [:beekill] from comment #47)
> (*) As I said earlier, the FrameTree::contains [1] always returns false
> during the test, so the patches from Tim and Michael are not going to reduce
> the number of times handleEvent is called.

I totally forgot about this, and you're totally right.

Michael's patch already gets us quite a speed-up because we're not touching |frame| and avoid CCWs/XRays. However, there are a ton of calls to handleEvent() and so even checking event.targetInDynamicDocShell is more expensive than we'd want.

I experimented with parts of Michael's patch but rewrote it so that the EventListenerManager would not call into event handlers that choose to ignore events from dynamic docShells in the first place.

https://github.com/ttaubert/gecko-dev/commit/4d3aa7ea510464e9c2a47518028cc181bc86cba2

(This btw also gets rid of the PageStyleListener tracked in bug 1348816.)

There's more we can do. We could remove the FrameTree as-is and replace it with a C++ function that returns an array/iterator of non-dynamic frames. But we still need to address the event handler parts.

Happy about any feedback, maybe there's an easier way to reach the EventListenerManagers, or mark certain EventListeners that want to ignore dynamic frames.
Happened to look at that patch and how it changes EventListenerManager.
That is so crazy stuff and I would r- that change.
ELM is super hot code and adding more and more checks there shouldn't be taken light.

Can we try some other approach?
(In reply to Olli Pettay [:smaug] from comment #51)
> Happened to look at that patch and how it changes EventListenerManager.
> That is so crazy stuff and I would r- that change.
> ELM is super hot code and adding more and more checks there shouldn't be
> taken light.

That's good feedback, thanks :)

> Can we try some other approach?

I'll think about another solution.
Sorry for sounding like a broken record here, but is it time to consider rewriting this in C++?
It seems like an easy fix here would be to remove the gFrameTree.contains check and then use Tim's patch to remove the event listener. The downside of this is that we would collect needlessly when the user is typing checking boxes in dynamic iframes. I guess that's unfortunate, but it's not the end of the world.
(In reply to Bao Quan [:beekill] from comment #48)
> So can we change [1] to something like this?
> ```js
> handleEvent(event) {
>   MessageQueue.push("formdata", () => {
>     let frame = event.target.ownerGlobal;
>     if (gFrameTree.contains(frame) {
>       return this.collect();
>     } else {
>       return null;
>     }
>   });
> }
> ```
> 
> Currently, we declare |frame| every time handleEvent is called. And I think
> handleEvent is going to be called a lot in a second. Even though, this
> approach isn't going to reduce the number of times handleEvent is called,
> it's going to reduce the number of |frame| declaration to 1 per second and
> therefore going to reduce the XRay overhead.
> 
> [1]:
> http://searchfox.org/mozilla-central/source/browser/components/sessionstore/
> content/content-sessionStore.js#460-468

Sorry for bother you, Tim, but I want to know what do you think of this approach?
Flags: needinfo?(ttaubert)
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #53)
> Sorry for sounding like a broken record here, but is it time to consider
> rewriting this in C++?

That's what I suggested in comment #50 too, but maybe I misunderstand the scope of the rewrite you have in mind. Are you thinking that all of the FormData serialization/collection code should live in C++? The FrameTree is used by multiple data collection parts, so unless we rewrite every single one of them the "easiest" thing would be to start with a C++ FrameTree implementation.

(In reply to Bao Quan [:beekill] from comment #55)
> Sorry for bother you, Tim, but I want to know what do you think of this
> approach?

No need to be sorry, really, you're not bothering me at all :) 

I think that this approach wouldn't work here (but feel free to prove me wrong and profile it) because that would lead to more form data collection when running the benchmark. So we would remove one bottle neck but increase the total amount of work we have to do.

(In reply to Bill McCloskey (:billm) from comment #54)
> It seems like an easy fix here would be to remove the gFrameTree.contains
> check and then use Tim's patch to remove the event listener. The downside of
> this is that we would collect needlessly when the user is typing checking
> boxes in dynamic iframes. I guess that's unfortunate, but it's not the end
> of the world.

I think that too wouldn't yield an overall improvement of the profile here due to what I wrote in the previous paragraph.
Depends on: 1348816, 1382635, 1365970
Flags: needinfo?(ttaubert)
(In reply to Tim Taubert [:ttaubert] from comment #56)
> (In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from
> comment #53)
> > Sorry for sounding like a broken record here, but is it time to consider
> > rewriting this in C++?
> 
> That's what I suggested in comment #50 too, but maybe I misunderstand the
> scope of the rewrite you have in mind. Are you thinking that all of the
> FormData serialization/collection code should live in C++? The FrameTree is
> used by multiple data collection parts, so unless we rewrite every single
> one of them the "easiest" thing would be to start with a C++ FrameTree
> implementation.

I was mostly thinking of a FrameTree implementation in C++ also.  Or maybe moving this entire event handler to C++, which would be ideal really, and keep as much of the rest of this code untouched as we can to avoid scope creep.  I'm not super familiar with this code, but as far as I can understand, the entire goal of the FrameTree part is to know whether a frame was dynamically created or not, which is equivalent to nsIDocShell::GetCreatedDynamically() in C++ if I understand correctly.  So it seems like if this event handler was registered in C++, it could quickly figure out whether to do the |MessageQueue.push("formdata", () => this.collect());| part or not without doing all of the expensive work that is the subject of the discussion here.

Then the question of how to communicate the MessageQueue part back to JS would still remain.  I don't really have any brilliant ideas on how to do that, the easiest thing that comes to my mind is to put |MessageQueue.push("formdata", () => this.collect());| into a JS callback, and pass it to C++ and call it from there if GetCreatedDynamically() returns false.
This is the same as Tim's original patch here, but with two modifications:

1. I removed the gFrameTree.contains check.
2. There was a bug in the patch where we weren't passing useCapture=true to removeEventListener, so the listeners were not being removed.

With these two changes, the number of handleEvent calls drops from 56080 to 167. I didn't look at the score since my machine had a lot of stuff running during both benchmark runs.

As I said above, change #1 might make us collect more often in some circumstances. I have no idea how often users are likely to check checkboxes in dynamic iframes, but it's certainly possible that it could happen.

It sounds like moving the event listener to C++ wouldn't be too hard, but I just wanted to post this patch since it does in fact work.
Assignee: nobody → wmccloskey
Attachment #8879440 - Attachment is obsolete: true
Bill's latest patch does indeed work and it's not very invasive either.

I wanted to anyway experiment with another idea for dealing with the wrapper overhead, that is an event listener that acts as a native filter, and forwards only events we want. It's again based on Michael's non-dynamic docShells.

The performance impact of it seems even lower than Bill's patch, but it's slightly larger, and introduces a few new things. I personally think it isn't too bad, would love to know what others think.
Comment on attachment 8888775 [details] [diff] [review]
0001-Bug-1373672-Filter-events-from-dynamic-docShells-in-.patch

Review of attachment 8888775 [details] [diff] [review]:
-----------------------------------------------------------------

I think this is generally a good approach, it's certainly nicer than my "targetIsInDynamicDocshell" hack which I added to event objects :P.

That being said, i do think we should change the rest of FrameTree to also call nsIDocShell::GetCreatedDynamically instead of keeping a parallel definition of whether or not a window is dynamically created. nsIDocShell::GetCreatedDynamically uses a different definition of dynamic creation than FrameTree does, and it's very likely that there are edge cases where they don't match, which could lead to some really weird edge-case behavior, unless we uniformly use GetCreatedDynamically everywhere. That property is totally accessible from chrome JS code, so theoretically we could just change FrameTree to use that instead of keeping an internal set.

::: browser/components/sessionstore/nsDynamicFrameEventFilter.h
@@ +24,5 @@
> +
> +private:
> +  bool TargetInDynamicDocShell(nsIDOMEvent* aEvent);
> +
> +  RefPtr<nsIDOMEventListener> mListener;

This should probably be a nsCOMPtr ;-)
(In reply to Michael Layzell [:mystor] from comment #60)
> That being said, i do think we should change the rest of FrameTree to also
> call nsIDocShell::GetCreatedDynamically instead of keeping a parallel
> definition of whether or not a window is dynamically created.

Yeah, that could be a follow-up. We'd just have a method that returns an array/iterator or takes a callback like forEach to get the list of frames we care about collecting data for.

> nsIDocShell::GetCreatedDynamically uses a different definition of dynamic
> creation than FrameTree does, and it's very likely that there are edge cases
> where they don't match, which could lead to some really weird edge-case
> behavior, unless we uniformly use GetCreatedDynamically everywhere. That
> property is totally accessible from chrome JS code, so theoretically we
> could just change FrameTree to use that instead of keeping an internal set.

Yeah, I'm aware that those two definitions don't match exactly. I don't think it's too bad, given that we still collect data for all frames usually, not just dirty frames. But we might of course miss an update... But if that's the "new" definition it wouldn't be too bad, and we can work on the follow-up asap.

> ::: browser/components/sessionstore/nsDynamicFrameEventFilter.h
> @@ +24,5 @@
> > +
> > +private:
> > +  bool TargetInDynamicDocShell(nsIDOMEvent* aEvent);
> > +
> > +  RefPtr<nsIDOMEventListener> mListener;
> 
> This should probably be a nsCOMPtr ;-)

Good point! Thanks for taking a look :)
I think Bill's patch is great!  Let's land it and keep remeasuring.  It solves the core issue of this bug which is the XRay wrapper overhead as far as I can tell from the code changes so I'm fairly confident that we may get away without having to do much more work on this.

I'll be happy to file follow-up bugs if needed.  :-)
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #62)
> I think Bill's patch is great!  Let's land it and keep remeasuring.  It
> solves the core issue of this bug which is the XRay wrapper overhead as far
> as I can tell from the code changes so I'm fairly confident that we may get
> away without having to do much more work on this.
> 
> I'll be happy to file follow-up bugs if needed.  :-)

I'd like to actually make sure that we don't end up causing ourselves even more problems with bill's patch. I imagine it'll get rid of the bulk of the expensive work (which are the xray wrappers being created for every change event), but it also means that every second we will perform a full form data collection pass, which could be very expensive.

The nice thing about tim's changes is that it doesn't cause us to start collecting form data during speedometer (or during any form change in a dynamically created iframe), while still avoiding that xray wrapper overhead.

Perhaps someone could get a profile with bill's patch, and make sure that we don't spend a lot of time in FormDataListener::collect (which should pretty much never have run before, and will be running now) - if it doesn't really show up then we're probably OK for now, but I do worry about it (as it is a fairly expensive operation).
This is a new version of my previous patch that I'd consider something we could/should land. Here's what it does:

* It removes FrameTree.jsm and puts the remaining state change notification observer functionality directly into content-sessionStore.js.

* It introduces a nsISessionStoreUtils service that provides native utility functions. That's also useful to have as a basis for further improvements.

* nsISessionStoreUtils.createDynamicFrameEventFilter(listener) is exactly what the previous patch did, it creates and returns an event listener/filter that skips events from dynamic frames.

* nsISessionStoreUtils.forEachNonDynamicChildFrame(frame, cb) calls |cb| for every non-dynamic child frame of |frame|. This replaces FrameTree.forEach() and .map().

* All tests are passing. I had to fix a few because we're collecting a little less data than before, and I updated browser_frametree.js to test the new code.


Here's a profile with the patch: https://perfht.ml/2tDSxP2
Attachment #8888775 - Attachment is obsolete: true
Attachment #8889462 - Flags: review?(wmccloskey)
Attachment #8889462 - Flags: review?(michael)
Attachment #8889462 - Flags: review?(bugs)
Comment on attachment 8889462 [details] [diff] [review]
0001-Bug-1373672-Filter-events-from-dynamic-docShells-in-.patch, v2

Review of attachment 8889462 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/components/sessionstore/SessionStorage.jsm
@@ +74,4 @@
>  var SessionStorageInternal = {
>    /**
>     * Reads all session storage data from the given docShell.
>     * @param docShell

Update this documentation to reflect the fact that this takes `content` now, instead of a `docShell` :)

::: browser/components/sessionstore/moz.build
@@ +44,5 @@
>      'TabStateCache.jsm',
>      'TabStateFlusher.jsm',
>  ]
>  
> +SOURCES += [

You currently only have one C++ source, but it might make sense to start with UNIFIED_SOURCES so you get the benefits of a unified build if/when more .cpp files end up in this directory.

::: browser/components/sessionstore/nsSessionStoreUtils.cpp
@@ +18,5 @@
> +{
> +  NS_ENSURE_TRUE(aWindow, NS_ERROR_INVALID_ARG);
> +  auto* outer = nsPIDOMWindowOuter::From(aWindow);
> +
> +  nsIDocShell *docShell = outer->GetDocShell();

nit: * on the left

@@ +33,5 @@
> +
> +    nsCOMPtr<nsPIDOMWindowOuter> window = item->GetWindow();
> +    MOZ_ASSERT(window);
> +
> +    nsIDocShell* docShell = window->GetDocShell();

The nsIDocShellTreeItem is the docshell, you can just do_QueryInterface it over.

::: browser/components/sessionstore/nsSessionStoreUtils.h
@@ +24,5 @@
> +private:
> +  ~nsSessionStoreUtils() { }
> +};
> +
> +class nsDynamicFrameEventFilter final : public nsIDOMEventListener

I would probably have declared DynamicFrameEventFilter inside of an anoymous namespace in nsSessionStoreUtils.cpp, as it is an implementation detail of nsSessionStoreUtils rather than an exposed type, but that's not a big deal.
Attachment #8889462 - Flags: review?(michael) → review+
Comment on attachment 8889462 [details] [diff] [review]
0001-Bug-1373672-Filter-events-from-dynamic-docShells-in-.patch, v2

>+class nsSessionStoreUtils final : public nsISessionStoreUtils
>+{
>+public:
>+  NS_DECL_THREADSAFE_ISUPPORTS
Why threadsafe?


>+class nsDynamicFrameEventFilter final : public nsIDOMEventListener
>+{
>+public:
>+  NS_DECL_THREADSAFE_ISUPPORTS
This certainly shouldn't be threadsafe, and this class should be cycle collectable and mListener should 
be traversed/unlinked
Attachment #8889462 - Flags: review?(michael)
Attachment #8889462 - Flags: review?(bugs)
Attachment #8889462 - Flags: review-
Attachment #8889462 - Flags: review+
Attachment #8889462 - Flags: review?(michael)
Blocks: 1376698
(In reply to Michael Layzell [:mystor] from comment #65)
> >  var SessionStorageInternal = {
> >    /**
> >     * Reads all session storage data from the given docShell.
> >     * @param docShell
> 
> Update this documentation to reflect the fact that this takes `content` now,
> instead of a `docShell` :)

Oops, done.

> > +SOURCES += [
> 
> You currently only have one C++ source, but it might make sense to start
> with UNIFIED_SOURCES so you get the benefits of a unified build if/when more
> .cpp files end up in this directory.

Good idea, done.

> > +    nsCOMPtr<nsPIDOMWindowOuter> window = item->GetWindow();
> > +    MOZ_ASSERT(window);
> > +
> > +    nsIDocShell* docShell = window->GetDocShell();
> 
> The nsIDocShellTreeItem is the docshell, you can just do_QueryInterface it
> over.

Done.

> > +class nsDynamicFrameEventFilter final : public nsIDOMEventListener
> 
> I would probably have declared DynamicFrameEventFilter inside of an anoymous
> namespace in nsSessionStoreUtils.cpp, as it is an implementation detail of
> nsSessionStoreUtils rather than an exposed type, but that's not a big deal.

I started with something similar but went the other way. I've put it in an anonymous namespace now.
(In reply to Olli Pettay [:smaug] from comment #66)
> >+class nsSessionStoreUtils final : public nsISessionStoreUtils
> >+{
> >+public:
> >+  NS_DECL_THREADSAFE_ISUPPORTS
> Why threadsafe?

Copy/paste, sorry. That indeed shouldn't be the threadsafe version.

> >+class nsDynamicFrameEventFilter final : public nsIDOMEventListener
> >+{
> >+public:
> >+  NS_DECL_THREADSAFE_ISUPPORTS
> This certainly shouldn't be threadsafe, and this class should be cycle
> collectable and mListener should 
> be traversed/unlinked

Added all the necessary CC macros, hope I got that right.
Attachment #8889462 - Attachment is obsolete: true
Attachment #8889462 - Flags: review?(wmccloskey)
Attachment #8890001 - Flags: review?(wmccloskey)
Attachment #8890001 - Flags: review?(michael)
Attachment #8890001 - Flags: review?(bugs)
Comment on attachment 8890001 [details] [diff] [review]
0001-Bug-1373672-Filter-events-from-dynamic-docShells-in-.patch, v3

Review of attachment 8890001 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/components/sessionstore/nsSessionStoreUtils.cpp
@@ +11,5 @@
> +using namespace mozilla::dom;
> +
> +namespace {
> +
> +  class nsDynamicFrameEventFilter final : public nsIDOMEventListener

nit: Don't indent in namespaces, and drop the ns prefix as we're in an anonymous namespace.

@@ +15,5 @@
> +  class nsDynamicFrameEventFilter final : public nsIDOMEventListener
> +  {
> +  public:
> +    NS_DECL_CYCLE_COLLECTING_ISUPPORTS
> +    NS_DECL_CYCLE_COLLECTION_CLASS_AMBIGUOUS(nsDynamicFrameEventFilter, nsIDOMEventListener)

I don't think you need the AMBIGUOUS here. There shouldn't be an ambiguous nsISupports base class.

::: browser/components/sessionstore/nsSessionStoreUtils.h
@@ +18,5 @@
> +
> +class nsSessionStoreUtils final : public nsISessionStoreUtils
> +{
> +public:
> +  NS_DECL_CYCLE_COLLECTING_ISUPPORTS

This doesn't need to be cycle collecting - it has no members. Just NS_DECL_ISUPPORTS should do it.
Attachment #8890001 - Flags: review?(michael)
(In reply to Michael Layzell [:mystor] from comment #70)
> > +namespace {
> > +
> > +  class nsDynamicFrameEventFilter final : public nsIDOMEventListener
> 
> nit: Don't indent in namespaces, and drop the ns prefix as we're in an
> anonymous namespace.

Done.

> > +  public:
> > +    NS_DECL_CYCLE_COLLECTING_ISUPPORTS
> > +    NS_DECL_CYCLE_COLLECTION_CLASS_AMBIGUOUS(nsDynamicFrameEventFilter, nsIDOMEventListener)
> 
> I don't think you need the AMBIGUOUS here. There shouldn't be an ambiguous
> nsISupports base class.

Done.

> > +public:
> > +  NS_DECL_CYCLE_COLLECTING_ISUPPORTS
> 
> This doesn't need to be cycle collecting - it has no members. Just
> NS_DECL_ISUPPORTS should do it.

This doesn't seem to work, the compiler complains about missing CC members.
Attachment #8890001 - Attachment is obsolete: true
Attachment #8890001 - Flags: review?(wmccloskey)
Attachment #8890001 - Flags: review?(bugs)
Attachment #8890033 - Flags: review?(wmccloskey)
Attachment #8890033 - Flags: review?(michael)
Attachment #8890033 - Flags: review?(bugs)
(In reply to Tim Taubert [:ttaubert] from comment #71)

> > > +public:
> > > +  NS_DECL_CYCLE_COLLECTING_ISUPPORTS
> > 
> > This doesn't need to be cycle collecting - it has no members. Just
> > NS_DECL_ISUPPORTS should do it.
> 
> This doesn't seem to work, the compiler complains about missing CC members.
Complains how? If the class isn't cycle collectable, there shouldn't be any CC members.
Did you forget to remove the CC stuff from .cpp?
(In reply to Olli Pettay [:smaug] from comment #72)
> (In reply to Tim Taubert [:ttaubert] from comment #71)
> > > > +public:
> > > > +  NS_DECL_CYCLE_COLLECTING_ISUPPORTS
> > > 
> > > This doesn't need to be cycle collecting - it has no members. Just
> > > NS_DECL_ISUPPORTS should do it.
> > 
> > This doesn't seem to work, the compiler complains about missing CC members.
> Complains how? If the class isn't cycle collectable, there shouldn't be any
> CC members.
> Did you forget to remove the CC stuff from .cpp?

Yup, sorry :(
Attachment #8890033 - Attachment is obsolete: true
Attachment #8890033 - Flags: review?(wmccloskey)
Attachment #8890033 - Flags: review?(michael)
Attachment #8890033 - Flags: review?(bugs)
Attachment #8890037 - Flags: review?(wmccloskey)
Attachment #8890037 - Flags: review?(michael)
Attachment #8890037 - Flags: review?(bugs)
Comment on attachment 8890037 [details] [diff] [review]
0001-Bug-1373672-Filter-events-from-dynamic-docShells-in-.patch, v5

>+  bool TargetInDynamicDocShell(nsIDOMEvent* aEvent)
Could we call this TargetInNonDynamicDocShell

>+  {
>+    EventTarget* target = aEvent->InternalDOMEvent()->GetTarget();
>+    if (!target) {
>+      return false;
>+    }
>+
>+    nsPIDOMWindowOuter* outer = target->GetOwnerGlobalForBindings();
>+    if (!outer) {
>+      return false;
>+    }
>+
>+    nsIDocShell* docShell = outer->GetDocShell();
>+    if (!docShell) {
>+      return false;
>+    }
...since then these falses would make more sense. No need to handle the event if docshell isn't around at all

>+
>+    bool isDynamic = false;
>+    nsresult rv = docShell->GetCreatedDynamically(&isDynamic);
>+    return NS_SUCCEEDED(rv) && isDynamic;
and then !isDynamic

and caller could drop !


>+NS_INTERFACE_MAP_BEGIN_CYCLE_COLLECTION(DynamicFrameEventFilter)
>+  NS_INTERFACE_MAP_ENTRY_AMBIGUOUS(nsISupports, nsIDOMEventListener)
You can drop _AMBIGUOUS



>+nsSessionStoreUtils::ForEachNonDynamicChildFrame(mozIDOMWindowProxy* aWindow,
>+                                                 nsISessionStoreUtilsFrameCallback* aCallback)
>+{
>+  NS_ENSURE_TRUE(aWindow, NS_ERROR_INVALID_ARG);
>+  auto* outer = nsPIDOMWindowOuter::From(aWindow);
>+
>+  nsIDocShell* docShell = outer->GetDocShell();
Please null check outer before using it.
And make the variable strong.
nsCOMPtr<nsIDocShell> since in calling HandleFrame may delete it.


r+ for the c++
Attachment #8890037 - Flags: review?(bugs) → review+
(In reply to Olli Pettay [:smaug] from comment #75)
> >+  bool TargetInDynamicDocShell(nsIDOMEvent* aEvent)
> Could we call this TargetInNonDynamicDocShell

That's better, good suggestion.

> >+NS_INTERFACE_MAP_BEGIN_CYCLE_COLLECTION(DynamicFrameEventFilter)
> >+  NS_INTERFACE_MAP_ENTRY_AMBIGUOUS(nsISupports, nsIDOMEventListener)
> You can drop _AMBIGUOUS

Done.

> >+  NS_ENSURE_TRUE(aWindow, NS_ERROR_INVALID_ARG);
> >+  auto* outer = nsPIDOMWindowOuter::From(aWindow);
> >+
> >+  nsIDocShell* docShell = outer->GetDocShell();
> Please null check outer before using it.
> And make the variable strong.
> nsCOMPtr<nsIDocShell> since in calling HandleFrame may delete it.

Done.

> r+ for the c++

Thanks!
Attachment #8890037 - Attachment is obsolete: true
Attachment #8890037 - Flags: review?(wmccloskey)
Attachment #8890037 - Flags: review?(michael)
Attachment #8890060 - Flags: review?(wmccloskey)
Attachment #8890060 - Flags: review?(michael)
Comment on attachment 8890060 [details] [diff] [review]
0001-Bug-1373672-Filter-events-from-dynamic-docShells-in-.patch, v6

Review of attachment 8890060 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/components/sessionstore/nsSessionStoreUtils.cpp
@@ +102,5 @@
> +    if (NS_SUCCEEDED(rv) && isDynamic) {
> +      continue;
> +    }
> +
> +    aCallback->HandleFrame(item->GetWindow(), i);

I'm worried that you're using the wrong index here. I think what you want is the original index in the frame tree, rather than the current index. Perhaps you should expose mChildOffset http://searchfox.org/mozilla-central/rev/3a3af33f513071ea829debdfbc628caebcdf6996/docshell/base/nsDocShell.cpp#4073, and use that?
Attachment #8890060 - Flags: review?(michael)
(In reply to Michael Layzell [:mystor] (PTO until July 28) from comment #77)
> > +    aCallback->HandleFrame(item->GetWindow(), i);
> 
> I'm worried that you're using the wrong index here. I think what you want is
> the original index in the frame tree, rather than the current index. Perhaps
> you should expose mChildOffset
> http://searchfox.org/mozilla-central/rev/
> 3a3af33f513071ea829debdfbc628caebcdf6996/docshell/base/nsDocShell.cpp#4073,
> and use that?

This is the same index as in an nsDOMWindowList, i.e. window.frames. SessionStore uses window.frames when restoring data, so I figured that's the right index to use.

However, when inserting a dynamic frame between two "static" ones, mChildOffset would yield indexes 0,1 instead of 0,2. The correct choice here depends on whether the dynamic frame is inserted before or after we try to restore data. Hmm.
(In reply to Michael Layzell [:mystor] (PTO until July 28) from comment #77)
> I'm worried that you're using the wrong index here. I think what you want is
> the original index in the frame tree, rather than the current index. Perhaps
> you should expose mChildOffset

If we skip dynamic frames and increment only for non-dynamic ones, the value should be the same as mChildOffset. We need to change FormData and ScrollPosition to use nsISessionStoreUtils to retrieve the list of frames.
(In reply to Tim Taubert [:ttaubert] from comment #79)
> (In reply to Michael Layzell [:mystor] (PTO until July 28) from comment #77)
> > I'm worried that you're using the wrong index here. I think what you want is
> > the original index in the frame tree, rather than the current index. Perhaps
> > you should expose mChildOffset
> 
> If we skip dynamic frames and increment only for non-dynamic ones, the value
> should be the same as mChildOffset. We need to change FormData and
> ScrollPosition to use nsISessionStoreUtils to retrieve the list of frames.

This was a tad more complicated because FormData.jsm and ScrollPosition.jsm are used by mobile too, and so we wouldn't want to change the API. I simply exposed .restore() functions in addition to the existing .restoreTree() functions and call those by iterating through non-dynamic frames in ContentRestore.jsm.

I extended test/browser_frametree.js to check indexes as well and make sure they're not affected by dynamic frames.
Attachment #8890060 - Attachment is obsolete: true
Attachment #8890060 - Flags: review?(wmccloskey)
Attachment #8890088 - Flags: review?(michael)
Comment on attachment 8890088 [details] [diff] [review]
0001-Bug-1373672-Filter-events-from-dynamic-docShells-in-.patch, v7

Review of attachment 8890088 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/components/sessionstore/nsSessionStoreUtils.cpp
@@ +88,5 @@
> +  int32_t length;
> +  nsresult rv = docShell->GetChildCount(&length);
> +  NS_ENSURE_SUCCESS(rv, rv);
> +
> +  for (int32_t i = 0, idx = 0; i < length; ++i) {

please don't initialize idx in the for loop. I would prefer it if we initialized it outside on a separate line. I also would appreciate it if we had a more clearly distinct name so it was obvious it isn't the loop variable.

Like:
int32_t nonDynamicIdx = 0;
for (int32_t i = 0; i < length; ++i) { ... }

::: browser/components/sessionstore/test/browser_frametree.js
@@ +85,5 @@
> +  });
> +
> +  // The page still has two iframes.
> +  is(await countNonDynamicFrames(browser), 2, "two non-dynamic child frames");
> +  is(await enumerateIndexes(browser), "0,1", "correct indexes 0 and 1");

Can you add a test after this where you remove the first one? I'm worried that if you remove the first non-dynamic iframe it will report index 0 rather than index 1 for the remaining iframe :-(

I think the current FrameTree handles that case, and it would be a shame not to. As I mentioned in the previous review comment we do record this initial index for non-dynamic iframes in the nsDocShell, we just don't expose it through nsIDocShell atm.
Attachment #8890088 - Flags: review?(michael)
I didn't realize I'd assigned this to myself.
Assignee: wmccloskey → ttaubert
Ehsan was asking me how soon we could land the patches on this bug, as
apparently its blocking something important, and I had a pretty good idea of
what I was thinking of asking from you in the last review, so I just wrote up
these 2 small patches to expose the variable I was thinking of from nsIDocShell,
and then use it.

This first patch I'm just asking olli for review, and we should probably land
anyway even if we don't use this (as we're using the wrong type right now), and
the second patch is the relevant one.

----------------------

All consumers of this value expect the passed-in value to be signed, and a
negative value is stored into this variable (-1) when the docshell was
dynamically added. It makes more sense for this to be signed.

MozReview-Commit-ID: 8iKDOAx7O2R
Attachment #8891518 - Flags: review?(bugs)
The reasoning behind this is that with this change, removing a non-dynamic
docshell from the document dynamically shouldn't affect the indexes which we use
for both recording and restoring data in child docshells.

MozReview-Commit-ID: JIK8GBSWDEF
Attachment #8891519 - Flags: review?(ttaubert)
Attachment #8891519 - Flags: review?(bugs)
Comment on attachment 8891519 [details] [diff] [review]
Part 3: Expose childOffset from nsIDocShell to use in nsSessionStoreUtils, r=ttaubert

Review of attachment 8891519 [details] [diff] [review]:
-----------------------------------------------------------------

Great stuff, thanks! We should probably merge the parts here with my WIP patch? Or were you thinking of just landing these in this order?
Attachment #8891519 - Flags: review?(ttaubert) → review+
(In reply to Tim Taubert [:ttaubert] from comment #85)
> Great stuff, thanks! We should probably merge the parts here with my WIP
> patch? Or were you thinking of just landing these in this order?

I'm fine with either way.
Comment on attachment 8891518 [details] [diff] [review]
Part 2: nsDocShell::mChildOffset should be signed

interesting
Attachment #8891518 - Flags: review?(bugs) → review+
Comment on attachment 8891519 [details] [diff] [review]
Part 3: Expose childOffset from nsIDocShell to use in nsSessionStoreUtils, r=ttaubert

r+ for the docshell part, but I have no idea whether this makes sense for sessionstore.
Attachment #8891519 - Flags: review?(bugs) → review+
Comment on attachment 8890088 [details] [diff] [review]
0001-Bug-1373672-Filter-events-from-dynamic-docShells-in-.patch, v7

r=me on your patch with the 2 other ones which I wrote :-).

I'll let you land them.
Attachment #8890088 - Flags: review+
Pushed by ttaubert@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e62c93a8c93b
Part 1: Filter events from dynamic docShells in Gecko before they reach SessionStore event handlers r=smaug,mystor
https://hg.mozilla.org/integration/mozilla-inbound/rev/dc318b0913b9
Part 2: nsDocShell::mChildOffset should be signed, r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/36bb09c4b28e
Part 3: Expose childOffset from nsIDocShell to use in nsSessionStoreUtils, r=ttaubert, r=smaug
Backed out for bustage at nsSessionStoreUtils.cpp:21: bad implicit conversion constructor for 'DynamicFrameEventFilter, and e.g. eslint failures:

https://hg.mozilla.org/integration/mozilla-inbound/rev/0c274a6f7b7b9b41b4b29e34037f4f60244f8d05
https://hg.mozilla.org/integration/mozilla-inbound/rev/6da77209f66bc33af8a22905bb75007ef59e0abb
https://hg.mozilla.org/integration/mozilla-inbound/rev/896709cd1daf82ebe88f444e90ea3f2c995d49b9

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=36bb09c4b28edaefacb7199cd1d73e010753ecb9&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel
Static log: https://treeherder.mozilla.org/logviewer.html#?job_id=119929526&repo=mozilla-inbound

[task 2017-08-01T10:07:28.175952Z] 10:07:28     INFO -  In file included from /home/worker/workspace/build/src/obj-firefox/browser/components/sessionstore/Unified_cpp_sessionstore0.cpp:2:
[task 2017-08-01T10:07:28.176064Z] 10:07:28     INFO -  /home/worker/workspace/build/src/browser/components/sessionstore/nsSessionStoreUtils.cpp:21:3: error: bad implicit conversion constructor for 'DynamicFrameEventFilter'
[task 2017-08-01T10:07:28.176107Z] 10:07:28     INFO -    DynamicFrameEventFilter(nsIDOMEventListener* aListener)
[task 2017-08-01T10:07:28.176128Z] 10:07:28     INFO -    ^
[task 2017-08-01T10:07:28.176190Z] 10:07:28     INFO -  /home/worker/workspace/build/src/browser/components/sessionstore/nsSessionStoreUtils.cpp:21:3: note: consider adding the explicit keyword to the constructor
[task 2017-08-01T10:07:28.177514Z] 10:07:28     INFO -  1 warning generated.
[task 2017-08-01T10:07:28.177920Z] 10:07:28     INFO -  1 error generated.

There are also asserts: https://treeherder.mozilla.org/logviewer.html#?job_id=119941674&repo=mozilla-inbound
> Assertion failure: childOffset > 0, at /home/worker/workspace/build/src/browser/components/sessionstore/nsSessionStoreUtils.cpp:109 

Eslint failures: https://treeherder.mozilla.org/logviewer.html#?job_id=119929498&repo=mozilla-inbound
 TEST-UNEXPECTED-ERROR | /home/worker/checkouts/gecko/browser/components/sessionstore/content/content-sessionStore.js:80:31 | 'cb' is already declared in the upper scope. (no-shadow)
TEST-UNEXPECTED-ERROR | /home/worker/checkouts/gecko/browser/components/sessionstore/test/browser_frametree.js:105:10 | Redundant use of `await` on a return value. (no-return-await)
TEST-UNEXPECTED-ERROR | /home/worker/checkouts/gecko/browser/components/sessionstore/test/browser_frametree.js:116:10 | Redundant use of `await` on a return value. (no-return-await)
TEST-UNEXPECTED-ERROR | /home/worker/checkouts/gecko/toolkit/modules/sessionstore/FormData.jsm:300:7 | Method 'restore' expected no return value. (consistent-return)
Flags: needinfo?(ttaubert)
This was a nice improvement on several Speedometer tests before it was backed out. We need more runs to get reliable data but we got 110 points and some nice improvements on several sub tests. So, this is great :)

https://arewefastyet.com/#machine=35&view=breakdown&suite=speedometer-misc
(In reply to Jan de Mooij [:jandem] from comment #92)
> This was a nice improvement on several Speedometer tests before it was
> backed out. We need more runs to get reliable data but we got 110 points and
> some nice improvements on several sub tests. So, this is great :)
> 
> https://arewefastyet.com/#machine=35&view=breakdown&suite=speedometer-misc

Hurray!  Thanks everyone! \o/
Pushed by ttaubert@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ffc03127712c
Part 1: Filter events from dynamic docShells in Gecko before they reach SessionStore event handlers r=smaug,mystor
https://hg.mozilla.org/integration/mozilla-inbound/rev/cd86fc12a4db
Part 2: nsDocShell::mChildOffset should be signed, r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/58b579b4ef4e
Part 3: Expose childOffset from nsIDocShell to use in nsSessionStoreUtils, r=ttaubert, r=smaug
See Also: → 1391298
Performance Impact: --- → P1
Whiteboard: [qf:p1]
You need to log in before you can comment on or make changes to this bug.