Last Comment Bug 598482 - Hook up invalidation flushing to the refresh driver and make all painting asynchronous
: Hook up invalidation flushing to the refresh driver and make all painting asy...
Status: RESOLVED FIXED
[Snappy:P1][not-fennec-11]
: perf
Product: Core
Classification: Components
Component: Layout: View Rendering (show other bugs)
: Trunk
: All All
: -- normal with 12 votes (vote)
: mozilla13
Assigned To: Boris Zbarsky [:bz] (still a bit busy)
:
: Jet Villegas (:jet)
Mentors:
Depends on: 688128 718316 722641 728738 736152 749966 772048 782107 783073 984010 589442 714168 716549 718150 718287 CVE-2012-4217 807656
Blocks: 318474 554004 600443 638235 693707 718631
  Show dependency treegraph
 
Reported: 2010-09-21 15:50 PDT by Markus Stange [:mstange]
Modified: 2014-03-15 10:52 PDT (History)
55 users (show)
bzbarsky: in‑testsuite?
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
wontfix


Attachments
wip (67.43 KB, patch)
2010-09-21 15:50 PDT, Markus Stange [:mstange]
no flags Details | Diff | Splinter Review
wip (148.10 KB, patch)
2011-08-02 10:57 PDT, Markus Stange [:mstange]
no flags Details | Diff | Splinter Review
part 1, v1: no sync plugin geometry update (3.65 KB, patch)
2011-10-03 10:31 PDT, Markus Stange [:mstange]
roc: review+
Details | Diff | Splinter Review
part 2, v1: remove ForceUpdate() and Composite() and their callers (12.53 KB, patch)
2011-10-03 10:33 PDT, Markus Stange [:mstange]
roc: review+
Details | Diff | Splinter Review
part 3, v1: remove synchronous painting APIs from nsIWidget (35.61 KB, patch)
2011-10-03 10:34 PDT, Markus Stange [:mstange]
roc: review+
Details | Diff | Splinter Review
part 4, v1: remove unused nsIDOMWindowUtils::processUpdates (3.18 KB, patch)
2011-10-03 10:36 PDT, Markus Stange [:mstange]
roc: review+
Details | Diff | Splinter Review
part 5, v1: remove invalidate event stuff (5.84 KB, patch)
2011-10-03 10:39 PDT, Markus Stange [:mstange]
roc: review+
Details | Diff | Splinter Review
part 6, v1: remove futile eEditorUseAsyncUpdatesMask flag (7.85 KB, patch)
2011-10-03 10:41 PDT, Markus Stange [:mstange]
roc: review+
Details | Diff | Splinter Review
part 7, v1: remove update flags (45.12 KB, patch)
2011-10-03 10:42 PDT, Markus Stange [:mstange]
roc: review+
Details | Diff | Splinter Review
part 8, v1: remove unused mUpdateCnt (3.75 KB, patch)
2011-10-03 10:44 PDT, Markus Stange [:mstange]
roc: review+
Details | Diff | Splinter Review
part 9, v1: remove ignored aIgnoreWidgetView argument (4.52 KB, patch)
2011-10-03 10:45 PDT, Markus Stange [:mstange]
roc: review+
Details | Diff | Splinter Review
part 11, v1: set up a connection between view manager and refresh driver (13.63 KB, patch)
2011-10-03 10:47 PDT, Markus Stange [:mstange]
no flags Details | Diff | Splinter Review
part 10, v1: factor out AddDirtyRegion and FlushDirtyRegionToWidget (4.87 KB, patch)
2011-10-03 10:50 PDT, Markus Stange [:mstange]
roc: review+
Details | Diff | Splinter Review
part 12, v1: remove aWidget argument (6.43 KB, patch)
2011-10-03 10:52 PDT, Markus Stange [:mstange]
roc: review+
Details | Diff | Splinter Review
part 13, v1: flush invalidations from the refresh driver instead of from view update batches (17.99 KB, patch)
2011-10-03 11:09 PDT, Markus Stange [:mstange]
roc: review+
Details | Diff | Splinter Review
part 14, v1: reorganize update view batches (26.92 KB, patch)
2011-10-03 11:12 PDT, Markus Stange [:mstange]
no flags Details | Diff | Splinter Review
part 15, v1: remove repaint caused by listbox popup opening (1.16 KB, patch)
2011-10-03 11:14 PDT, Markus Stange [:mstange]
roc: review+
Details | Diff | Splinter Review
part 16, v1: separate NS_WILL_PAINT and NS_PAINT handling; only flush again if no NS_WILL_PAINT event has been sent by the platform (7.73 KB, patch)
2011-10-03 11:16 PDT, Markus Stange [:mstange]
roc: review+
Details | Diff | Splinter Review
part 17, v1: remove unused aViewManager argument (5.15 KB, patch)
2011-10-03 11:17 PDT, Markus Stange [:mstange]
roc: review+
Details | Diff | Splinter Review
part 18, v1: rename Update* to Invalidate* (22.15 KB, patch)
2011-10-03 11:19 PDT, Markus Stange [:mstange]
roc: review+
Details | Diff | Splinter Review
part 19, v1: update the invalidation setup comment (2.10 KB, patch)
2011-10-03 11:20 PDT, Markus Stange [:mstange]
roc: review+
Details | Diff | Splinter Review
part 20, v1: bump nsIViewManager IID (740 bytes, patch)
2011-10-03 11:22 PDT, Markus Stange [:mstange]
roc: review+
Details | Diff | Splinter Review
part 11, v2 (13.19 KB, patch)
2011-10-13 11:11 PDT, Markus Stange [:mstange]
roc: review+
Details | Diff | Splinter Review
part 13, v2 (17.93 KB, patch)
2011-10-13 11:15 PDT, Markus Stange [:mstange]
no flags Details | Diff | Splinter Review
Rename IsRefreshEnabled to IsPaintEnabled. Fits between part 13 and part 14 of Marcus' patch queue (3.08 KB, patch)
2011-12-09 23:09 PST, Boris Zbarsky [:bz] (still a bit busy)
roc: review+
Details | Diff | Splinter Review
Update of part 14 per comment 83 item 2 (27.03 KB, patch)
2011-12-09 23:13 PST, Boris Zbarsky [:bz] (still a bit busy)
roc: review+
Details | Diff | Splinter Review
Part 14.5 in the old numbering. Flush out widget geometry changes when flushing layout. (10.12 KB, patch)
2011-12-09 23:16 PST, Boris Zbarsky [:bz] (still a bit busy)
roc: review+
Details | Diff | Splinter Review
part 23. When the refresh driver runs, update plugin geometry after layout updates. (1.32 KB, patch)
2011-12-09 23:39 PST, Boris Zbarsky [:bz] (still a bit busy)
roc: review+
Details | Diff | Splinter Review
Part 23 with a timer, as discussed (4.47 KB, patch)
2011-12-14 09:51 PST, Boris Zbarsky [:bz] (still a bit busy)
no flags Details | Diff | Splinter Review
Timer to update plugin geometry, actually passing tests (5.22 KB, patch)
2011-12-14 15:52 PST, Boris Zbarsky [:bz] (still a bit busy)
roc: review+
Details | Diff | Splinter Review
Interdiff to that last patch to expose the default interval (2.72 KB, patch)
2011-12-14 23:16 PST, Boris Zbarsky [:bz] (still a bit busy)
roc: review+
Details | Diff | Splinter Review
part 24. Flush on every mousemove, because otherwise we can end up with mouse events (mousemove, mousein, mouseout) dispatched to the wrong elements. (6.70 KB, patch)
2011-12-22 17:14 PST, Boris Zbarsky [:bz] (still a bit busy)
bugs: review+
Details | Diff | Splinter Review

Description Markus Stange [:mstange] 2010-09-21 15:50:06 PDT
Created attachment 477322 [details] [diff] [review]
wip

I want to simplify our view update / invalidation handling. That entails getting rid of view update batches, handling view resizes and invalidation flushes from the refresh driver, and getting rid of all synchronous refreshes.

I'll write up a better description tomorrow and just put up a wip patch for now.
Comment 1 Boris Zbarsky [:bz] (still a bit busy) 2010-09-21 19:44:33 PDT
We probably do still need to keep the Flush_Display stuff, no?  And probably make windowutils use it?
Comment 2 Boris Zbarsky [:bz] (still a bit busy) 2010-09-21 19:50:41 PDT
Also, would we ever have more than one presshell being an invalidation flush observer in practice?  It needs to be the presshell for a root vm, right?
Comment 3 Timothy Nikkel (:tnikkel) 2010-09-21 19:52:58 PDT
What if a non-root presshell has a popup widget?
Comment 4 Boris Zbarsky [:bz] (still a bit busy) 2010-09-21 20:04:56 PDT
What about it?  Popups don't get a separate viewmanager, right?
Comment 5 Timothy Nikkel (:tnikkel) 2010-09-21 20:07:12 PDT
So wouldn't the view manager for that non-root presshell that contains the popup need to do the painting itself?
Comment 6 Boris Zbarsky [:bz] (still a bit busy) 2010-09-21 20:46:36 PDT
Well, maybe.  Right now in the patch all the refresh driver does is call TriggerRefresh on the view managers of all the presshells registered with it, and TriggerRefresh always forwards to the root VM.
Comment 7 Boris Zbarsky [:bz] (still a bit busy) 2010-09-21 20:47:06 PDT
And to be clear, TriggerRefresh just invalidates native widgets; it does NOT do any actual painting.  Again, with this patch.
Comment 8 Markus Stange [:mstange] 2010-09-22 02:47:25 PDT
(In reply to comment #1)
> We probably do still need to keep the Flush_Display stuff, no?

I don't know what the use case for it is. I see that our reftest system uses windowutils.processUpdates, but I don't understand why it needs it yet. I'm going to find out now.

Not sure about the root presshell view manager stuff yet.
Comment 9 Markus Stange [:mstange] 2010-09-22 04:31:26 PDT
(In reply to comment #8)
> I see that our reftest system uses
> windowutils.processUpdates, but I don't understand why it needs it yet.

I think it doesn't need it. MozAfterPaint events and drawWindow results are completely independent from native widget drawing.
Comment 10 Robert O'Callahan (:roc) (email my personal email if necessary) 2010-09-22 19:17:13 PDT
This sounds great!
Comment 11 Boris Zbarsky [:bz] (still a bit busy) 2010-10-01 23:53:21 PDT
We should consider flushing invalidates onload; otherwise there can be a 20ms pause between when onload fires and when things start to paint.
Comment 12 Timothy Nikkel (:tnikkel) 2010-10-01 23:57:13 PDT
But we also want to not flush layout onload? Wouldn't that be a bad combination?
Comment 13 Markus Stange [:mstange] 2010-10-02 04:15:55 PDT
(In reply to comment #11)
> We should consider flushing invalidates onload; otherwise there can be a 20ms
> pause between when onload fires and when things start to paint.

Why do we care about that? If somebody removes an iframe before it had a chance to paint, as is the case in bug 601332, isn't that his fault?
Comment 14 Boris Zbarsky [:bz] (still a bit busy) 2010-10-02 06:14:44 PDT
> But we also want to not flush layout onload?

We don't want to flush layout before firing the onload event.  I think we should in fact flush layout and invalidates right after firing the onload event, so the user gets to see the page right then.  There are perceived performance benefits there, unlike the flush before onload.

> isn't that his fault?

Assigning fault on the web is usually pointless; the upshot in bug 601332 is that we look like we're hung and other browsers don't.  Or that we take too long to paint things, if someone digs a little bit (which happens to be true!).
Comment 15 Markus Stange [:mstange] 2010-10-02 06:59:21 PDT
(In reply to comment #14)
> the upshot in bug 601332 is that we look like we're hung

OK, that's not good. Maybe we can fix that differently without flushing more than 60 times a second... I'll look into it.
Comment 16 Boris Zbarsky [:bz] (still a bit busy) 2010-10-02 11:03:05 PDT
Yeah, I'm not quite sure what the right solution for that situation is...  The case when close() is called there (so onload fires) would be fixed by my proposal in comment 11, but the case when close() isn't called is hard.
Comment 17 Markus Stange [:mstange] 2010-12-21 05:49:04 PST
(In reply to comment #9)
> (In reply to comment #8)
> > I see that our reftest system uses
> > windowutils.processUpdates, but I don't understand why it needs it yet.
> 
> I think it doesn't need it. MozAfterPaint events and drawWindow results are
> completely independent from native widget drawing.

For future reference, roc came to the same conclusion in bug 617152 comment 22.
Comment 18 Markus Stange [:mstange] 2011-08-02 10:57:50 PDT
Created attachment 550127 [details] [diff] [review]
wip

Here's a more up to date patch that also mixes in lots of cleanup. It worked pretty well a few months ago, apart from very few test failures and a ts_shutdown regression. Unfortunately, on current trunk it results in window sizing weirdness and assertions, which I haven't had time to track down yet.
Comment 19 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-08-02 17:32:20 PDT
Markus, if you don't have time to work on this very much, maybe Mats could help.
Comment 20 Markus Stange [:mstange] 2011-10-03 05:49:47 PDT
Patches are almost ready for review; I've sent them to try one last time:
https://tbpl.mozilla.org/?tree=Try&rev=01b32e3f1328
Comment 21 Markus Stange [:mstange] 2011-10-03 10:31:41 PDT
Created attachment 564231 [details] [diff] [review]
part 1, v1: no sync plugin geometry update

I'm not sure if this patch won't break anything.

Plugin geometry updates are currently triggered from two sources: (1) around painting (either from WillPaint or from DidPaint) and (2) from a runnable. The purpose of (2) is to make sure that the update happens even if no painting occurs. But can we even hit that case? Maybe when a plugin moves offscreen or in a background tab? And what happens if we fail to do the plugin geometry update? Would it cause problems to only do the plugin geometry update when the plugin comes back onscreen (which will definitely trigger a paint)?

The problem with triggering a plugin geometry update from a runnable is that the event might fire before the paint happens, so we'd end up with two paints. Another problem with SynchronousPluginGeometryUpdate() is that it uses a synchronous repaint, and I want to remove all sync paint APIs.
Comment 22 Markus Stange [:mstange] 2011-10-03 10:33:23 PDT
Created attachment 564233 [details] [diff] [review]
part 2, v1: remove ForceUpdate() and Composite() and their callers
Comment 23 Markus Stange [:mstange] 2011-10-03 10:34:50 PDT
Created attachment 564234 [details] [diff] [review]
part 3, v1: remove synchronous painting APIs from nsIWidget
Comment 24 Markus Stange [:mstange] 2011-10-03 10:36:56 PDT
Created attachment 564235 [details] [diff] [review]
part 4, v1: remove unused nsIDOMWindowUtils::processUpdates

I think you removed the last caller (reftests) in bug 617152.
Comment 25 Timothy Nikkel (:tnikkel) 2011-10-03 10:38:53 PDT
Have you tested if this regresses things like bug 635465 on Windows?
Comment 26 Markus Stange [:mstange] 2011-10-03 10:39:04 PDT
Created attachment 564237 [details] [diff] [review]
part 5, v1: remove invalidate event stuff
Comment 27 Timothy Nikkel (:tnikkel) 2011-10-03 10:39:32 PDT
Have you checked if this regresses things like bug 635465 on Windows?
Comment 28 Markus Stange [:mstange] 2011-10-03 10:41:44 PDT
Created attachment 564238 [details] [diff] [review]
part 6, v1: remove futile eEditorUseAsyncUpdatesMask flag

All updates now behave like NS_VMREFRESH_NO_SYNC, so the flag has become useless.
Comment 29 Markus Stange [:mstange] 2011-10-03 10:42:56 PDT
Created attachment 564240 [details] [diff] [review]
part 7, v1: remove update flags
Comment 30 Markus Stange [:mstange] 2011-10-03 10:44:15 PDT
Created attachment 564241 [details] [diff] [review]
part 8, v1: remove unused mUpdateCnt
Comment 31 Markus Stange [:mstange] 2011-10-03 10:45:36 PDT
Created attachment 564243 [details] [diff] [review]
part 9, v1: remove ignored aIgnoreWidgetView argument
Comment 32 Markus Stange [:mstange] 2011-10-03 10:47:57 PDT
Created attachment 564247 [details] [diff] [review]
part 11, v1: set up a connection between view manager and refresh driver
Comment 33 Markus Stange [:mstange] 2011-10-03 10:50:36 PDT
Created attachment 564253 [details] [diff] [review]
part 10, v1: factor out AddDirtyRegion and FlushDirtyRegionToWidget

(this needs to be applied before part 11)
Comment 34 Markus Stange [:mstange] 2011-10-03 10:52:54 PDT
Created attachment 564255 [details] [diff] [review]
part 12, v1: remove aWidget argument

There's a comment above UpdateWidgetArea which justifies the existence of this argument, but in reality all callers just call ->GetWidget() on the view they pass in.
Comment 35 Markus Stange [:mstange] 2011-10-03 11:09:31 PDT
Created attachment 564261 [details] [diff] [review]
part 13, v1: flush invalidations from the refresh driver instead of from view update batches

This is the most important part because it completes the connection to the refresh driver.

This patch changes the meaning of several things. Most importantly, the result of IsRefreshEnabled() is now treated completely differently. Before this patch, while refresh was enabled (i.e. outside of any view update batch), invalidations were instantly passed to the widget, and widget geometry was update synchronously. While refresh was disabled (i.e. during a view update batch), widget geometry changes and invalidations were stored on the view, and only synced to the widget at the end of the outermost view update batch.
So IsRefreshEnabled() meant all of "painting is allowed at the moment", "invalidations on the widget are allowed at the moment", and "widget geometry changes are expected to be synchronous".

Now, IsRefreshEnabled() only means "widget geometry changes are expected to be synchronous". Since widget geometry changes can cause synchronous painting, this also means that painting is allowed during that time.

Whether invalidations are stored on the view or on the widget doesn't depend on IsRefreshEnabled() any more. The new rules are: (1) All invalidations are stored on the view, and (2) they're only flushed to the widget when ProcessPendingUpdates() is called (which only happens via the refresh driver).
Comment 36 Markus Stange [:mstange] 2011-10-03 11:12:37 PDT
Created attachment 564262 [details] [diff] [review]
part 14, v1: reorganize update view batches

Most users of update view batches only wanted the paint coalescing feature, which is now the default even outside update view batches. So most uses of them can be removed.
They're only necessary at times when painting is absolutely forbidden, e.g. during reflow.
Comment 37 Markus Stange [:mstange] 2011-10-03 11:14:31 PDT
Created attachment 564263 [details] [diff] [review]
part 15, v1: remove repaint caused by listbox popup opening

This is a behavior change that maybe doesn't belong into this bug, but it's a caller that I didn't want to keep updating. I think it's completely unnecessary.
Comment 38 Markus Stange [:mstange] 2011-10-03 11:16:25 PDT
Created attachment 564265 [details] [diff] [review]
part 16, v1: separate NS_WILL_PAINT and NS_PAINT handling; only flush again if no NS_WILL_PAINT event has been sent by the platform
Comment 39 Markus Stange [:mstange] 2011-10-03 11:17:51 PDT
Created attachment 564266 [details] [diff] [review]
part 17, v1: remove unused aViewManager argument

Unrelated cleanup which I stumbled upon while I was near.
Comment 40 Markus Stange [:mstange] 2011-10-03 11:19:28 PDT
Created attachment 564268 [details] [diff] [review]
part 18, v1: rename Update* to Invalidate*

Updates are now never synchronous, and asynchronous view updates are called invalidations in my book.
Comment 41 Markus Stange [:mstange] 2011-10-03 11:20:46 PDT
Created attachment 564269 [details] [diff] [review]
part 19, v1: update the invalidation setup comment
Comment 42 Markus Stange [:mstange] 2011-10-03 11:22:24 PDT
Created attachment 564271 [details] [diff] [review]
part 20, v1: bump nsIViewManager IID
Comment 43 Markus Stange [:mstange] 2011-10-03 11:29:12 PDT
(In reply to Timothy Nikkel (:tn) from comment #25)
> Have you tested if this regresses things like bug 635465 on Windows?

Not yet, thanks for the pointer.
Comment 44 Markus Stange [:mstange] 2011-10-03 12:16:54 PDT
It does regress bug 635465 on Windows. :(
Comment 45 Timothy Nikkel (:tnikkel) 2011-10-03 12:20:53 PDT
Ok so bug 635465 comment 12 seems like what we want to do to fix that. I just haven't gotten around to doing it yet. Do you want to take a stab at that?
Comment 46 Markus Stange [:mstange] 2011-10-03 12:42:06 PDT
(In reply to Markus Stange from comment #44)
> It does regress bug 635465 on Windows. :(

On OS X, too. Fortunately.

(In reply to Timothy Nikkel (:tn) from comment #45)
> Ok so bug 635465 comment 12 seems like what we want to do to fix that. I
> just haven't gotten around to doing it yet. Do you want to take a stab at
> that?

Sure.
Comment 47 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-03 14:49:30 PDT
Comment on attachment 564231 [details] [diff] [review]
part 1, v1: no sync plugin geometry update

Review of attachment 564231 [details] [diff] [review]:
-----------------------------------------------------------------

Imagine there's a windowed plugin covering the whole content window and it gets moved or removed. The plugin's widget sits there and blocks all repainting of the browser content. Assuming nothing else happens, no Gecko painting occurs, and without this code, we'll never update plugin geometry and update the plugin widget. At least, that's why I wrote this code.

The reason the repaint is synchronous here is because it's important to make the plugin geometry update happen as quickly as possible.

We can take this out if, after moving all painting to the refresh driver, we also do all the plugin geometry updates from the widget driver and they happen every time the refresh driver runs, even if the invalid areas are covered by a plugin widget. So this shouldn't be hard to fix, and it's OK to take this patch *if* we fix things up at the end of the patch queue, like I have suggested.
Comment 48 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-03 14:53:01 PDT
Comment on attachment 564261 [details] [diff] [review]
part 13, v1: flush invalidations from the refresh driver instead of from view update batches

Review of attachment 564261 [details] [diff] [review]:
-----------------------------------------------------------------

We want to get rid of views, so it would actually be good to move storage of invalidations further out, probably to the frames corresponding to displayroots (to the callers of nsFrame::InvalidateRoot, actually).

I'm not sure whether it makes sense to do that in this bug, or defer it. What do you think?
Comment 49 Markus Stange [:mstange] 2011-10-03 15:00:44 PDT
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #47)
> Imagine there's a windowed plugin covering the whole content window and it
> gets moved or removed. The plugin's widget sits there and blocks all
> repainting of the browser content. Assuming nothing else happens, no Gecko
> painting occurs, and without this code, we'll never update plugin geometry
> and update the plugin widget.

I see, thank you.

> We can take this out if, after moving all painting to the refresh driver, we
> also do all the plugin geometry updates from the widget driver and they
> happen every time the refresh driver runs, even if the invalid areas are
> covered by a plugin widget. So this shouldn't be hard to fix, and it's OK to
> take this patch *if* we fix things up at the end of the patch queue, like I
> have suggested.

OK, I'll do that.

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #48)
> I'm not sure whether it makes sense to do that in this bug, or defer it.
> What do you think?

I'd rather defer it. And then ask you to do it. :-)
Comment 50 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-03 15:06:40 PDT
Comment on attachment 564231 [details] [diff] [review]
part 1, v1: no sync plugin geometry update

Review of attachment 564231 [details] [diff] [review]:
-----------------------------------------------------------------

This is OK given you've promised to fix this in another patch :-)
Comment 51 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-03 15:48:25 PDT
One thing I'm a little concerned about is possibly adding latency to user actions where we're removing synchronous painting, like in scrolling. We may need to address this by creating a way to trigger immediate running of the refresh driver at the end of the current event; we'll see.
Comment 52 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-03 16:09:39 PDT
Comment on attachment 564237 [details] [diff] [review]
part 5, v1: remove invalidate event stuff

Review of attachment 564237 [details] [diff] [review]:
-----------------------------------------------------------------

This patch effectively makes VMREFRESH_DEFERRED a no-op. I think you should remove it in this patch.

I believe the uses of VMREFRESH_DEFERRED are fine except possibly for this one:
http://mxr.mozilla.org/mozilla-central/source/editor/libeditor/base/nsEditor.cpp#4231
but I think it's OK to just remove it.

I believe that's the only real use of eEditorUseAsyncUpdatesMask, so we should remove that too.
Comment 53 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-03 16:10:22 PDT
ooops I see you're doing that in later patches!
Comment 54 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-03 16:12:20 PDT
Comment on attachment 564237 [details] [diff] [review]
part 5, v1: remove invalidate event stuff

Review of attachment 564237 [details] [diff] [review]:
-----------------------------------------------------------------

This patch effectively makes VMREFRESH_DEFERRED a no-op. I think you should remove it in this patch.

I believe the uses of VMREFRESH_DEFERRED are fine except possibly for this one:
http://mxr.mozilla.org/mozilla-central/source/editor/libeditor/base/nsEditor.cpp#4231
but I think it's OK to just remove it.

I believe that's the only real use of eEditorUseAsyncUpdatesMask, so we should remove that too.
Comment 55 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-03 16:12:56 PDT
Sorry about that spurious comment, Splinter error.
Comment 56 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-03 16:26:22 PDT
Comment on attachment 564247 [details] [diff] [review]
part 11, v1: set up a connection between view manager and refresh driver

Review of attachment 564247 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/base/nsRefreshDriver.h
@@ +165,5 @@
> +  void RemoveInvalidationFlushObserver(nsIPresShell* aShell) {
> +    if (mInvalidationFlushObserver == aShell) {
> +      mInvalidationFlushObserver = nsnull;
> +    }
> +  }

Instead of treating the presshell as an observer, why not just always call mPresContext->GetShell()->GetViewManager()->ProcessPendingUpdates()? Seems a lot simpler.
Comment 57 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-03 16:41:25 PDT
Comment on attachment 564261 [details] [diff] [review]
part 13, v1: flush invalidations from the refresh driver instead of from view update batches

Review of attachment 564261 [details] [diff] [review]:
-----------------------------------------------------------------

Let's rename IsRefreshEnabled to IsPaintingAllowed.
Comment 58 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-03 16:45:52 PDT
Comment on attachment 564262 [details] [diff] [review]
part 14, v1: reorganize update view batches

Review of attachment 564262 [details] [diff] [review]:
-----------------------------------------------------------------

This is actually a little scary since you're allowing paints in a lot more places, if someone happens to flush layout which used to be inside a view-update-batch.

I wonder what breaks if we always delay widget geometry updates until the refresh driver runs. That would be safest and simplest.
Comment 59 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-03 16:50:13 PDT
Comment on attachment 564265 [details] [diff] [review]
part 16, v1: separate NS_WILL_PAINT and NS_PAINT handling; only flush again if no NS_WILL_PAINT event has been sent by the platform

Review of attachment 564265 [details] [diff] [review]:
-----------------------------------------------------------------

We really need to file bugs to get all platforms to dispatch WILL_PAINT/DID_PAINT...
Comment 60 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-03 17:57:31 PDT
Comment on attachment 564261 [details] [diff] [review]
part 13, v1: flush invalidations from the refresh driver instead of from view update batches

Review of attachment 564261 [details] [diff] [review]:
-----------------------------------------------------------------

::: view/src/nsViewManager.cpp
@@ +458,5 @@
> +  nsViewManager* rootVM = RootViewManager();
> +  rootVM->mHasPendingUpdates = true;
> +  if (rootVM->mObserver) {
> +    nsCOMPtr<nsIPresShell> shell = do_QueryInterface(rootVM->mObserver);
> +    shell->ScheduleInvalidationFlush();

Really shell->ScheduleInvalidationFlush() should be in nsIViewObserver.
Comment 61 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-03 18:02:12 PDT
Also, I realized there's a problem here due to view manager invalidation being global for the entire document tree, whereas we have separate nsRefreshDrivers for each document. I think these patches will *work*, but we could be increasing latency due to a content document's refresh driver causing invalidation, but that invalidation not being applied to the widget until the root document's refresh driver has run (followed by the actual paint).

I'm not sure how serious a problem this is.
Comment 62 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-03 18:13:58 PDT
So, there's a few more things I think we should do in this area. They don't have to be done in this bug, but we should prepare for them.

-- Connect refresh drivers in the same document tree together to avoid the problem in the previous comment. Really we should have only one timer for the entire document tree, have the root document's refresh driver be solely responsible for viewmanager invalidation flushing, and have it propagate notifications down the entire tree of refresh drivers (possibly optimizing to skip refresh driver subtrees that do not need to run).

-- Paint synchronously in the root document refresh driver after it has flushed everything. This should reduce layout flushing because we're less likely to have dirtied the layout again before the painting triggered by the refresh driver happens. (This means we probably don't want patch 3 to land.) It could also reduce latency.

-- When receiving a paint event other than during that synchronous paint in the refresh driver, just trigger an immediate run of the refresh driver and don't do anything else. (This might be a bit tricky if the synchronous paint of the refresh driver interferes with the OS paint event; we may need to make the refresh driver run slightly asynchronously.)

-- Then the refresh driver will take care of all flushing before painting, so we don't need to flush in the view system.

-- Then we may be able to avoid OS-level invalidates completely, by making our synchronous paint just paint directly instead of in response to an OS paint event.
Comment 63 Boris Zbarsky [:bz] (still a bit busy) 2011-10-03 18:19:04 PDT
Note that there isn't really a "tree" of refresh drivers.  There's one for the toplevel chrome window and one for every root of a content docshell tree.  That's it.
Comment 64 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-03 18:22:58 PDT
Oh right, even better, that would make step 1 in comment #62 easier.
Comment 65 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-03 18:52:55 PDT
Hmm, but we don't want all chrome painting to be gated on content document flushing (not that it matters with e10s, but still). hmm.
Comment 66 Markus Stange [:mstange] 2011-10-05 07:28:36 PDT
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #51)
> One thing I'm a little concerned about is possibly adding latency to user
> actions where we're removing synchronous painting, like in scrolling. We may
> need to address this by creating a way to trigger immediate running of the
> refresh driver at the end of the current event; we'll see.

I agree that it would make sense to skip the delay before the first timer firing, if the timer is not running already. (Otherwise the delay between two subsequent refresh driver runs would be unnecessarily short.)

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #56)
> ::: layout/base/nsRefreshDriver.h
> @@ +165,5 @@
> > +  void RemoveInvalidationFlushObserver(nsIPresShell* aShell) {
> > +    if (mInvalidationFlushObserver == aShell) {
> > +      mInvalidationFlushObserver = nsnull;
> > +    }
> > +  }
> 
> Instead of treating the presshell as an observer, why not just always call
> mPresContext->GetShell()->GetViewManager()->ProcessPendingUpdates()? Seems a
> lot simpler.

Good idea. But the refresh driver still needs a method that tells it to start the timer, and a way to remember that something is waiting for an invalidation flush (so that we don't take the early exit in nsRefreshDriver::Notify). So it might not be that much simpler in the end. I'll give it a try.

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #58)
> Comment on attachment 564262 [details] [diff] [review] [diff] [details] [review]
> part 14, v1: reorganize update view batches
> 
> Review of attachment 564262 [details] [diff] [review] [diff] [details] [review]:
> -----------------------------------------------------------------
> 
> This is actually a little scary since you're allowing paints in a lot more
> places, if someone happens to flush layout which used to be inside a
> view-update-batch.

Right. For some reason I only thought of popup widgets here, and not of child widgets (whose geometry changes might also cause sync painting)... I need to think about this again.

> I wonder what breaks if we always delay widget geometry updates until the
> refresh driver runs. That would be safest and simplest.

That's what I tried first, but then I got all kinds of test failures involving popups. I've forgotten all the details (this was a year ago), so I'll fire off another try run with that change. But I think it was due to nsMenuPopupFrame.cpp or nsXULPopupManager.cpp working with widget coordinates instead of view coordinates. But maybe all that has changed in the meantime.

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #62)
> (This might be a bit tricky if the synchronous paint
> of the refresh driver interferes with the OS paint event; we may need to
> make the refresh driver run slightly asynchronously.)

And do nothing during the OS paint? That might work, but it's not very pretty. Or we could just do a refresh driver run without triggering the sync OS paint, and then do the paint outside the refresh driver directly from the OS paint.

In any case we'd need to detect recursion in two places.

With async painting we wouldn't need any recursion detection. We might have two subsequent refresh driver runs (one from the refresh driver timer, and one from inside the triggered OS paint), but if the second run has nothing to flush, that might be cheap enough to not make a difference.

> -- Then we may be able to avoid OS-level invalidates completely, by making
> our synchronous paint just paint directly instead of in response to an OS
> paint event.

But then we couldn't coalesce our own paints with OS-triggered paints, could we? And we'd have two different code paths entering painting.

Do we have evidence of cases where asynchronous painting and going through OS-level invalidates induces noticeable delays?
Comment 67 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-05 16:31:46 PDT
> > -- Then we may be able to avoid OS-level invalidates completely, by making
> > our synchronous paint just paint directly instead of in response to an OS
> > paint event.
> 
> But then we couldn't coalesce our own paints with OS-triggered paints, could
> we? And we'd have two different code paths entering painting.

I believe OS-triggered paints are relatively rare, especially on modern composited systems where the OS is usually retaining the contents of our window anyway.

> Do we have evidence of cases where asynchronous painting and going through
> OS-level invalidates induces noticeable delays?

No, but no-one's looked. We have had a lot of problems with event prioritization involving paint events, which would become non-issues if we simply painted ourselves off a timer.

> With async painting we wouldn't need any recursion detection. We might have two
> subsequent refresh driver runs (one from the refresh driver timer, and one from
> inside the triggered OS paint), but if the second run has nothing to flush, that
> might be cheap enough to not make a difference.

While animations are running, the second run always has something to do.
Comment 68 Andreas Gal :gal 2011-10-13 00:29:21 PDT
Will this fix bug 693707?
Comment 70 Markus Stange [:mstange] 2011-10-13 11:11:17 PDT
Created attachment 566886 [details] [diff] [review]
part 11, v2
Comment 71 Markus Stange [:mstange] 2011-10-13 11:15:26 PDT
Created attachment 566887 [details] [diff] [review]
part 13, v2
Comment 72 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-13 15:57:41 PDT
Comment on attachment 566887 [details] [diff] [review]
part 13, v2

Review of attachment 566887 [details] [diff] [review]:
-----------------------------------------------------------------

::: view/src/nsViewManager.cpp
@@ +790,5 @@
> +          }
> +        }
> +
> +        if (!didResize) {
> +          //NS_ASSERTION(view->IsEffectivelyVisible(), "painting an invisible view");

Is this assertion valid? I think not, it seems to me the flush could change the view visibility. Just take it out.
Comment 73 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-10-24 15:33:30 PDT
(In reply to Markus Stange from comment #66)
> (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #58)
> > Comment on attachment 564262 [details] [diff] [review] [diff] [details] [review] [diff] [details] [review]
> > part 14, v1: reorganize update view batches
> > 
> > Review of attachment 564262 [details] [diff] [review] [diff] [details] [review] [diff] [details] [review]:
> > -----------------------------------------------------------------
> > 
> > This is actually a little scary since you're allowing paints in a lot more
> > places, if someone happens to flush layout which used to be inside a
> > view-update-batch.
> 
> Right. For some reason I only thought of popup widgets here, and not of
> child widgets (whose geometry changes might also cause sync painting)... I
> need to think about this again.
> 
> > I wonder what breaks if we always delay widget geometry updates until the
> > refresh driver runs. That would be safest and simplest.
> 
> That's what I tried first, but then I got all kinds of test failures
> involving popups. I've forgotten all the details (this was a year ago), so
> I'll fire off another try run with that change. But I think it was due to
> nsMenuPopupFrame.cpp or nsXULPopupManager.cpp working with widget
> coordinates instead of view coordinates. But maybe all that has changed in
> the meantime.

These are the only issues remaining to be addressed, AFAIK...
Comment 74 Timothy Nikkel (:tnikkel) 2011-10-24 15:35:42 PDT
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #73)
> > But I think it was due to
> > nsMenuPopupFrame.cpp or nsXULPopupManager.cpp working with widget
> > coordinates instead of view coordinates. But maybe all that has changed in
> > the meantime.

My patches in bug 668437 change this.
Comment 75 Leman Bennett [Omega] 2011-11-13 17:43:04 PST
What's the status on this? Its been very quiet since the 24th.
Comment 76 Andreas Gal :gal 2011-11-13 17:46:56 PST
I still see > 120Hz repaints on mac. We should really fix this.
Comment 77 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-11-14 17:08:12 PST
Giving to Boris to finish addressing comments and get this landed!
Comment 78 Boris Zbarsky [:bz] (still a bit busy) 2011-11-14 20:13:35 PST
> Let's rename IsRefreshEnabled to IsPaintingAllowed.

Done.

> We really need to file bugs to get all platforms to dispatch WILL_PAINT/DID_PAINT...

Which platforms need those bugs?

> I wonder what breaks if we always delay widget geometry updates until the refresh driver
> runs.

Kicked off a try build for that.
Comment 79 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-11-14 20:57:43 PST
(In reply to Boris Zbarsky (:bz) from comment #78)
> > We really need to file bugs to get all platforms to dispatch WILL_PAINT/DID_PAINT...
> 
> Which platforms need those bugs?

Windows, Mac and GTK support WILL_PAINT. So Android, Qt, PuppetWidget and OS/2 need WILL_PAINT.

Windows, GTK and PuppetWidget support DID_PAINT. So Android, Qt, OS/2 and Mac need DID_PAINT.
Comment 80 Mozilla RelEng Bot 2011-11-15 01:50:23 PST
Try run for ff2e449c4945 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=ff2e449c4945
Results (out of 209 total builds):
    success: 171
    warnings: 37
    failure: 1
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bzbarsky@mozilla.com-ff2e449c4945
Comment 81 Markus Stange [:mstange] 2011-11-15 04:49:03 PST
Sorry for the silence. Here's a try push with more up-to-date patches:
https://tbpl.mozilla.org/?tree=Try&rev=51b7bb1988ff

Here's an older one with the same patches but an older mozilla-central base:
https://tbpl.mozilla.org/?tree=Try&rev=7a371ec21c6f

There are some mochitest-other failures on Mac which I can't reproduce. The mochitest-1 failure is tnikkel's test from bug 592954 which unfortunately isn't fixed by the flush I added for bug 635465 in https://hg.mozilla.org/try/rev/965f95d8d123 .

The main changes in that try push compared to the patches that are already attached here are addressing comment 58; i.e. making all widget geometry changes asynchronous. That happens in https://hg.mozilla.org/try/rev/b4ecd98e61de and https://hg.mozilla.org/try/rev/39130b9def6e .

(In reply to Timothy Nikkel (:tn) from comment #74)
> (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #73)
> > > But I think it was due to
> > > nsMenuPopupFrame.cpp or nsXULPopupManager.cpp working with widget
> > > coordinates instead of view coordinates. But maybe all that has changed in
> > > the meantime.
> 
> My patches in bug 668437 change this.

The three places that still needed fixing were nsPopupBoxObject::GetOuterScreenRect, nsResizerFrame::HandleEvent and nsTitleBarFrame::HandleEvent. https://hg.mozilla.org/try/rev/5332a0aa1ded

The remaining popup test failures which I could reproduce locally were caused by popup move and resize events that fire at times when the current code doesn't expect them. https://hg.mozilla.org/try/rev/4c7619d98fce prevents them from firing on Mac for popups without titlebars, but that's probably not the right fix.
In fact I'm not sure what should be the correct behavior here. For example, in one case a test opened a popup, operated on it, and closed it. When a popup is closed, its view is resized to 0x0. When that resize was synced to the widget, the resulting resize event set width="0" height="0" attributes on the popup in nsXULPopupManager::PopupResized, and when the popup was reopened it was too small.
Maybe we shouldn't fire move and resize events while a widget is closed, but I think that wouldn't fix all problems. For example, in the async widget visibility world, when a popup is closed and reopened without a view manager flush in between, the widget is never hidden.

https://hg.mozilla.org/try/rev/2e095cf5e2e1 isn't really necessary if we block popup move events, but it will be if we decide that we don't want the blocking.

Boris, do you want to continue with the old patches or the new ones? Should I attach my new patches here, or should we save the async view geometry work for another bug?
Comment 82 Mozilla RelEng Bot 2011-11-15 08:20:23 PST
Try run for 1c3a590d6f11 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=1c3a590d6f11
Results (out of 211 total builds):
    success: 183
    warnings: 28
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bzbarsky@mozilla.com-1c3a590d6f11
Comment 83 Boris Zbarsky [:bz] (still a bit busy) 2011-11-15 13:36:37 PST
<sigh>.

So what I have locally is a copy of the patches in this bug but merged to tip, with the following changes:

1)  Rename IsRefreshEnabled to IsPaintingAllowed
2)  Make sure that ResetWidgetBounds only resizes widgets when called from ProcessPendingUpdates.  It's a somewhat less invasive change than https://hg.mozilla.org/try/rev/b4ecd98e61de and can be found at https://hg.mozilla.org/try/rev/9719f8c1acc7
3)  To work around the obvious fails that result in Moth popup tests with jut #2, I added some code that flushes out widget resizes when a Flush_Layout happens on the presshell.  That's found in the same changeset as #2.

I'm not making any attempts to do async widget visibility changes.

With those changes, I get the try run in comment 82.  It looks like there's some random orange, but not anything obviously caused by these patches.

So if we think that the flush described above and sync visibility changes are safe, I would tend to think we should land that and worry about async visibility changes and weaning popups off widget coordinates in other bugs.  Thoughts?
Comment 84 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-11-15 15:19:01 PST
I like Boris's approach. It sounds pretty safe.

(In reply to Markus Stange from comment #49)
> (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #47)
> > We can take this out if, after moving all painting to the refresh driver, we
> > also do all the plugin geometry updates from the widget driver and they
> > happen every time the refresh driver runs, even if the invalid areas are
> > covered by a plugin widget. So this shouldn't be hard to fix, and it's OK to
> > take this patch *if* we fix things up at the end of the patch queue, like I
> > have suggested.
> 
> OK, I'll do that.

Has this been done? I don't see it.
Comment 85 Boris Zbarsky [:bz] (still a bit busy) 2011-11-16 02:11:58 PST
So I'm actually still seeing a bunch of orange on my try push that does not in fact look random now that I've respun some builds.  :(
Comment 86 Boris Zbarsky [:bz] (still a bit busy) 2011-11-18 05:04:56 PST
Current progress update:  Most of the orange is fixed.  The remaining issues are a Windows debug accessibility test leak that's caused by part 13 of the patch queue and a Linux reftest fail that roc and I are still debugging.

Next up, looking into addressing comment 47.
Comment 87 Andreas Gal :gal 2011-11-22 21:53:50 PST
bz, I think we are about to remove support for windowed plugins so we get async painting, so comment 47 might be moot.
Comment 88 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-11-23 02:19:41 PST
No we aren't!
Comment 89 Andreas Gal :gal 2011-11-23 07:52:37 PST
jst can you clarify? Maybe I misunderstood our plan for asynchronous painting but my impression was we will force windowless mode for all plugin objects on a page in the future (or well the flash plugin will and we will deprecate and drop support for windowed mode).
Comment 90 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-11-23 13:42:25 PST
Many plugins don't support windowless mode, or won't be updated to use our new async rendering APIs in a timely manner. So we can't drop support for windowed plugins without completely dropping support for lots of plugins (everything except Flash, probably). Tempting as that is, I don't think we can do it. (Making them all click-to-play might be reasonable though.)
Comment 91 Andreas Gal :gal 2011-11-23 13:50:03 PST
In practice anything but flash and maybe silverlight are mostly irrelevant. We can make it click to play and deprecated. It's definitely not something we should focus on supporting or even optimize for.
Comment 92 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-11-23 14:16:07 PST
This discussion belongs in another forum. But I'm confident we won't be completely dropping Java support anytime soon. We won't even be dropping windowed Flash support anytime soon, since that's gated on us shipping our new API, Adobe implementing support for it, and updating the vast majority of Flash users to a version supporting the API. We certainly don't want this bug to be waiting for that!
Comment 93 Johnny Stenback (:jst, jst@mozilla.com) 2011-11-23 22:33:45 PST
(In reply to Andreas Gal :gal from comment #89)
> jst can you clarify? Maybe I misunderstood our plan for asynchronous
> painting but my impression was we will force windowless mode for all plugin
> objects on a page in the future (or well the flash plugin will and we will
> deprecate and drop support for windowed mode).

The long term plan is certainly to drop windowed plugin support, but how quickly we can do that is unclear. What seems clear from discussion with flash engineers is that we can very soon start forcing flash into windowless mode and that way get async painting for all flash no matter what the webpage requests. Whether that's something we do in the browser or if that's something flash does internally is tbd.
Comment 94 IU 2011-12-04 12:48:13 PST
(In reply to Johnny Stenback (:jst, jst@mozilla.com) from comment #93)
> The long term plan is certainly to drop windowed plugin support...

This will increase the visibility of bug 686673, so hopefully it can be fixed before then.
Comment 95 Jet Villegas (:jet) 2011-12-07 16:58:26 PST
Adding [Snappy] search string. Synching invalidates to the refresh driver should improve perceived performance.
Comment 96 Boris Zbarsky [:bz] (still a bit busy) 2011-12-07 17:04:55 PST
Progress update: The linux reftest I mention in comment 86 is fixed, and I've figured out how to work around the a11y fail with a change to the setTimeout value in the test for now.  Planning on comment 47 work tomorrow.
Comment 97 Boris Zbarsky [:bz] (still a bit busy) 2011-12-09 23:09:56 PST
Created attachment 580620 [details] [diff] [review]
Rename IsRefreshEnabled to IsPaintEnabled.  Fits between part 13 and part 14 of Marcus' patch queue
Comment 98 Boris Zbarsky [:bz] (still a bit busy) 2011-12-09 23:13:28 PST
Created attachment 580621 [details] [diff] [review]
Update of part 14 per comment 83 item 2
Comment 99 Boris Zbarsky [:bz] (still a bit busy) 2011-12-09 23:16:29 PST
Created attachment 580622 [details] [diff] [review]
Part 14.5 in the old numbering.  Flush out widget geometry changes when flushing layout.
Comment 100 Boris Zbarsky [:bz] (still a bit busy) 2011-12-09 23:24:56 PST
I also pushed the relevant mq to >http://hg.mozilla.org/users/bzbarsky_mozilla.com/async-painting-queue/> in case people want to look.  My apologies for the extraneous files there...
Comment 101 Boris Zbarsky [:bz] (still a bit busy) 2011-12-09 23:37:41 PST
OK.  So I've been staring at comment 47 for a bit.

Is this just a matter of calling nsRootPresContext::UpdatePluginGeometry on every refresh tick?  I'm not sure whether the "even if the invalid areas are covered by a plugin widget" part from comment 47 means that UpdatePluginGeometry is not OK to use...

If this is just a matter of UpdatePluginGeometry, then I assume that the only reason we need to add it is that refresh driver is doing interruptible layout flushes and PresShell::FlushPendingNotifications only updates plugin geometry on non-interruptible ones?

On the assumption that this is true, going to post a patch to add the UpdatePluginGeometry call....
Comment 102 Boris Zbarsky [:bz] (still a bit busy) 2011-12-09 23:39:24 PST
Created attachment 580625 [details] [diff] [review]
part 23.  When the refresh driver runs, update plugin geometry after layout updates.
Comment 103 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-12-10 02:55:56 PST
(In reply to Boris Zbarsky (:bz) from comment #101)
> If this is just a matter of UpdatePluginGeometry, then I assume that the
> only reason we need to add it is that refresh driver is doing interruptible
> layout flushes and PresShell::FlushPendingNotifications only updates plugin
> geometry on non-interruptible ones?

Currently FlushPendingNotifications, via PresShell::DoReflow, only calls RequestUpdatePluginGeometry, which runs SynchronousPluginGeometryUpdate off an event. And that tries to trigger a repaint which will update the plugin geometry in WillPaint. The goal is to have plugin geometry changes happen as close as possible in time to the actual repainting of the window.

The change in part 1 means that we just wait for the next paint of the presshell to do the geometry changes, which might never happen, as described in comment #47.

Your patch in part 23 will ensure that plugin geometry gets updated. I wonder if it will increase the delay between updating plugin geometry and repainting the window, though. Eventually (or maybe soon) I want the refresh driver to synchronously paint the window, which would probably fix it (especially if we update the layer tree update the plugin geometry, then just composite the layer tree).

Perhaps a better alternative to part 23 would be to just use a timer or something to ensure that the UpdatePluginGeometry eventually happens if no paint is forthcoming.
Comment 104 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-12-10 03:20:37 PST
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #103)
> (In reply to Boris Zbarsky (:bz) from comment #101)
> > If this is just a matter of UpdatePluginGeometry, then I assume that the
> > only reason we need to add it is that refresh driver is doing interruptible
> > layout flushes and PresShell::FlushPendingNotifications only updates plugin
> > geometry on non-interruptible ones?

Oh, I see what you mean now. That UpdatePluginGeometry call was added later, at least in my thought processes, to ensure plugin geometry is up-to-date when plugin tests talk to the plugin after flushing. It's fortunate that the refresh driver currently doesn't trigger this.
Comment 105 Boris Zbarsky [:bz] (still a bit busy) 2011-12-14 09:51:12 PST
Created attachment 581682 [details] [diff] [review]
Part 23 with a timer, as discussed
Comment 106 Boris Zbarsky [:bz] (still a bit busy) 2011-12-14 10:07:59 PST
> Windows, Mac and GTK support WILL_PAINT. So Android, Qt, PuppetWidget and
> OS/2 need WILL_PAINT.

Filed bug 710765, bug 710767, bug 710769, bug 710770.

> Windows, GTK and PuppetWidget support DID_PAINT. So Android, Qt, OS/2 and
> Mac need DID_PAINT.

Filed bug 710777, bug 710774, bug 710772, bug 710773.
Comment 107 Boris Zbarsky [:bz] (still a bit busy) 2011-12-14 15:52:18 PST
Created attachment 581818 [details] [diff] [review]
Timer to update plugin geometry, actually passing tests
Comment 108 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-12-14 18:06:50 PST
Comment on attachment 581818 [details] [diff] [review]
Timer to update plugin geometry, actually passing tests

Review of attachment 581818 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/base/nsPresContext.cpp
@@ +2615,5 @@
> +    // so set it for double the refresh driver interval.
> +    mUpdatePluginGeometryTimer = do_CreateInstance("@mozilla.org/timer;1");
> +    if (mUpdatePluginGeometryTimer) {
> +      mUpdatePluginGeometryTimer->
> +        InitWithFuncCallback(UpdatePluginGeometryCallback, this, 32,

Can we actually expose DEFAULT_FRAME_RATE and use it here instead of having a magic number?
Comment 109 Boris Zbarsky [:bz] (still a bit busy) 2011-12-14 23:16:09 PST
Created attachment 581892 [details] [diff] [review]
Interdiff to that last patch to expose the default interval
Comment 110 Boris Zbarsky [:bz] (still a bit busy) 2011-12-16 22:01:05 PST
Will land after the branchpoint so this has a full cycle of bake time.
Comment 111 Timothy Nikkel (:tnikkel) 2011-12-17 12:00:49 PST
Have we resolved comment 25?
Comment 112 Ryan VanderMeulen [:RyanVM] 2011-12-17 12:10:30 PST
Using the win32 build from what appears to be the latest try push, the bug 635465 testcase doesn't appear to work correctly. When I try to hover over the second link, it disappears. The testcase works on trunk, of course.
https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bzbarsky@mozilla.com-602317873a2c/
Comment 113 Boris Zbarsky [:bz] (still a bit busy) 2011-12-17 19:31:53 PST
> Have we resolved comment 25?

Gah.  I assumed that passing tests would handle that, but apparently not...

I'll look into that, I guess.  :(
Comment 114 Timothy Nikkel (:tnikkel) 2011-12-18 00:19:20 PST
I tried and failed to write a test for that, sorry.
Comment 115 Boris Zbarsky [:bz] (still a bit busy) 2011-12-22 17:14:06 PST
Created attachment 583973 [details] [diff] [review]
part 24.  Flush on every mousemove, because otherwise we can end up with mouse events (mousemove, mousein, mouseout) dispatched to the wrong elements.
Comment 116 Olli Pettay [:smaug] 2011-12-23 03:33:01 PST
Comment on attachment 583973 [details] [diff] [review]
part 24.  Flush on every mousemove, because otherwise we can end up with mouse events (mousemove, mousein, mouseout) dispatched to the wrong elements.

>+    // Flush pending layout changes, so that later mouse move events
>+    // will go to the right nodes.
>+    FlushPendingEvents(aPresContext);

Strange method name, but not your fault.
Comment 117 Boris Zbarsky [:bz] (still a bit busy) 2011-12-23 20:01:17 PST
https://hg.mozilla.org/integration/mozilla-inbound/rev/c631f9c3e9a9
https://hg.mozilla.org/integration/mozilla-inbound/rev/df447fcc75f4
https://hg.mozilla.org/integration/mozilla-inbound/rev/0787c2f0c45f
https://hg.mozilla.org/integration/mozilla-inbound/rev/e9834142dbbd
https://hg.mozilla.org/integration/mozilla-inbound/rev/5b25cab4ad9d
https://hg.mozilla.org/integration/mozilla-inbound/rev/88ebe1a18f0d
https://hg.mozilla.org/integration/mozilla-inbound/rev/aba624563ee7
https://hg.mozilla.org/integration/mozilla-inbound/rev/b3fa93cdc185
https://hg.mozilla.org/integration/mozilla-inbound/rev/583ac6c21fbe
https://hg.mozilla.org/integration/mozilla-inbound/rev/42fde5b5e098
https://hg.mozilla.org/integration/mozilla-inbound/rev/fc0f7ad1de41
https://hg.mozilla.org/integration/mozilla-inbound/rev/84a70e2a270e
https://hg.mozilla.org/integration/mozilla-inbound/rev/4ce03bf986e7
https://hg.mozilla.org/integration/mozilla-inbound/rev/51699a063cea
https://hg.mozilla.org/integration/mozilla-inbound/rev/02cfda93f548
https://hg.mozilla.org/integration/mozilla-inbound/rev/1116117b73cc
https://hg.mozilla.org/integration/mozilla-inbound/rev/2147078591dc
https://hg.mozilla.org/integration/mozilla-inbound/rev/05a23c8d3acb
https://hg.mozilla.org/integration/mozilla-inbound/rev/01db2746f7d2
https://hg.mozilla.org/integration/mozilla-inbound/rev/3e1a744fba1e
https://hg.mozilla.org/integration/mozilla-inbound/rev/2a36941a6991
https://hg.mozilla.org/integration/mozilla-inbound/rev/09f0d4cc58cc
https://hg.mozilla.org/integration/mozilla-inbound/rev/9cb424c5127d
https://hg.mozilla.org/integration/mozilla-inbound/rev/1ac4cb2e7c32
Comment 118 Phil Ringnalda (:philor) 2011-12-23 22:32:27 PST
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/25224a78f895

Remember a try push a few days (or weeks, dunno) ago, that I starred for you, but instead of starring the Android XUL reftest-3, I retriggered it three or four times, and every one was orange? That must have been part of the patches for this, since I remember those failures as having been svg/as-image/canvas-drawImage-simple-1a.html and svg/as-image/canvas-drawImage-origin-clean-1.html, same as this hit on inbound, and that was supposed to be an orange "look at me" flag.

I don't remember it having had the 14 to 19 failures in canvas reftests in reftest-2, though, so those may have been added by later parts that weren't in that try push yet, though they were in today's try push.
Comment 119 Phil Ringnalda (:philor) 2011-12-23 23:35:56 PST
Less of a problem, but you seem to be making the intermittent unexpected pass of bug 636379 much less intermittent - 5 of the 8 runs while you were in unexpectedly passed.
Comment 120 Boris Zbarsky [:bz] (still a bit busy) 2011-12-25 05:03:37 PST
OK, so the Android reftest issue shows up first in the patch that actually moves invalidates to the refresh driver.  The reftest screenshots show that some things are just not being painted, at least by the time the screenshot happens.

I assume this is something to do with out of process rendering.  Chris, any idea what I should be looking for here?
Comment 121 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-12-25 08:47:01 PST
The failures were in canvas drawing?

Hmm, I'm not sure.  The major difference in the reftest harness between the same-process and multi-process is that in multi-process, the content process doesn't snapshot window contents, but rather asks the chrome process to do so.  And before the chrome process can snapshot, the content process has to "flush" all pending layers updates, i.e. force shadow layer updates to propagate to the chrome process.  The code to do that is

http://mxr.mozilla.org/mozilla-central/source/layout/tools/reftest/reftest-content.js#675

I'm not too familiar with the patches here, but maybe something changed wrt CanvasLayer behavior, maybe in incomplete layer transactions?  I can't help more without understanding these patches better, but that's where I would start looking.

Note too that you can run these tests on desktop with |make -C $objdir reftest-ipc|, so you should be able to bisect over the patches here.  You'll likely have the most success on linux-x11.  I can also send a mozconfig that should work there.  (But of course, if there's something timing related here too, that might not repro reliably on desktop :/.)
Comment 122 Boris Zbarsky [:bz] (still a bit busy) 2011-12-25 08:51:26 PST
> The failures were in canvas drawing?

Yes.

The main change in these patches is that invalidates are flushed to the widget layer off the refresh driver, instead of synchronously.

As far as bisecting, I did that on try already.  The patch that does the actual switch to using the refresh driver is what causes the issue.

Will look at that link.  Thanks!
Comment 123 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-12-25 09:03:21 PST
(In reply to Boris Zbarsky (:bz) from comment #122)
> The main change in these patches is that invalidates are flushed to the
> widget layer off the refresh driver, instead of synchronously.
> 

How does that interact with drawWindow(USE_WIDGET_LAYERS)?
Comment 124 Boris Zbarsky [:bz] (still a bit busy) 2011-12-25 09:44:12 PST
> How does that interact with drawWindow(USE_WIDGET_LAYERS)?

Good question.  Will check.
Comment 125 Boris Zbarsky [:bz] (still a bit busy) 2011-12-25 19:15:14 PST
OK, I can reproduce _one_ of the failures involved using reftest-ipc, I think.  Maybe.

I added code in drawWindow to flush out the invalidates by calling presShell->GetViewManager()->ProcessPendingUpdates() when drawing with USE_WIDGET_LAYERs, but that didn't help.

Turning off empty transactions for canvas by both commenting out the check for canvas in nsIFrame::InvalidateLayer and adding "&& 0" to the check for isRetainingManager in PresShell::Paint didn't help.
Comment 126 Boris Zbarsky [:bz] (still a bit busy) 2011-12-25 20:03:06 PST
I tried adding an invalidate flush in RenderFrameParent::ShadowLayersUpdated but I don't see why that would be needed; it's mostly flailing...
Comment 127 Boris Zbarsky [:bz] (still a bit busy) 2011-12-25 20:03:32 PST
Oh, and it doesn't obviously fix things locally, though I pushed it to try just in case.
Comment 128 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-12-27 11:09:14 PST
The flush in RenderFrameParent::ShadowLayersUpdated shouldn't be needed.

In general, what's supposed to happen is
 - SynchronizeForSnapshot(): force content-process paint, in order to force PLayers:Update message to be sent.
   * is PLayers:Update being sent?  (http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/ShadowLayers.cpp#324)
   * if so, are the pixels in the buffer of the CanvasLayer in the layer tree correct?

If PLayers:Update *isn't* being sent, then something is probably bailing out early.  I'm not sure why; the best guess previously was empty transactions.

If PLayers:Update *is* being sent, but the CanvasLayers in these tests have the wrong pixels, then there's likely a bug in some invalidation/repainting logic.  PuppetWidget would seem to be most suspect.

(It's extremely unlikely that PLayers:Update is being sent with the correct CanvasLayer pixels, but there's some bug in the chrome process.)
Comment 129 Boris Zbarsky [:bz] (still a bit busy) 2011-12-27 12:32:23 PST
>   * if so, are the pixels in the buffer of the CanvasLayer in the layer tree correct?

How would I tell?  Keep in mind that I still can't reproduce this locally, at least on Mac, so I'm having to use tryserver for all my debugging....
Comment 130 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-12-27 13:40:39 PST
Easiest would be to use DumpAsDataURL() [1] after the UpdateSurface() call here [2].  I'm not sure that stdout will make it to the tinderbox log though, so you might need to dump it somewhere else :(, and/or with a magical prefix.

[1] http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxASurface.h#242
[2] http://mxr.mozilla.org/mozilla-central/source/gfx/layers/basic/BasicLayers.cpp#2691
Comment 131 Boris Zbarsky [:bz] (still a bit busy) 2011-12-28 22:33:59 PST
OK, so getting things into the log is rocket science.  I can do it if I'm willing to lose part of my data to the reftest logging code itself, but I haven't managed to actually get my data into the log sanely.

I tried building fennec, and I can install it on my tablet, but it seems like running reftest only works on rooted devices or something.  It totally fails with permissions errors on my (non-rooted) tablet.

I'm somewhat at a loss as to where to go from here.  :(
Comment 132 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-12-28 23:29:20 PST
Not being able to run reftests on an unrooted device smells like a bug to me, if that's the problem.  Maybe jmaher can help with that tomorrow?
Comment 133 Boris Zbarsky [:bz] (still a bit busy) 2011-12-29 21:41:57 PST
OK, I finally have reftests running.

At first glance, I'm getting calls to SendUpdate, and the two data: URLs that are dumped in EndTransaction both look correct.  I'll try to dig more into the exact timing of things here if I can, but probably not until Wednesday.
Comment 134 Boris Zbarsky [:bz] (still a bit busy) 2012-01-09 08:06:52 PST
I've spun off part 24 into bug 716549.
Comment 135 CruNcher 2012-01-12 07:33:11 PST
Sorry is this issue described here dependent on this https://bugzilla.mozilla.org/show_bug.cgi?id=702485 compared to the chromeium behavior ?
Comment 136 Boris Zbarsky [:bz] (still a bit busy) 2012-01-12 07:49:20 PST
That's a completely unrelated issue.
Comment 137 Boris Zbarsky [:bz] (still a bit busy) 2012-01-13 20:53:09 PST
After much log-wrangling and such with Chris, we found bug 718150.  Fixing that made at least one of the failing tests pass for me; I'll check the others.
Comment 138 Boris Zbarsky [:bz] (still a bit busy) 2012-01-14 08:47:31 PST
Yep, the patch for bug 718150 fixed the android failures.  Now looking at the other issues that seem to have popped up in the interim...
Comment 139 Boris Zbarsky [:bz] (still a bit busy) 2012-01-14 21:03:09 PST
Second time is the charm, right?

https://hg.mozilla.org/integration/mozilla-inbound/rev/b820017dc738
https://hg.mozilla.org/integration/mozilla-inbound/rev/7dd97ae4452d
https://hg.mozilla.org/integration/mozilla-inbound/rev/487d669f7973
https://hg.mozilla.org/integration/mozilla-inbound/rev/3b7183745bb9
https://hg.mozilla.org/integration/mozilla-inbound/rev/a0238d9206c2
https://hg.mozilla.org/integration/mozilla-inbound/rev/e83f53b3302e
https://hg.mozilla.org/integration/mozilla-inbound/rev/a4e8d5f7c9be
https://hg.mozilla.org/integration/mozilla-inbound/rev/04292f9ff363
https://hg.mozilla.org/integration/mozilla-inbound/rev/61a1f0b28812
https://hg.mozilla.org/integration/mozilla-inbound/rev/ac01aec1a983
https://hg.mozilla.org/integration/mozilla-inbound/rev/4616fa6c1dc7
https://hg.mozilla.org/integration/mozilla-inbound/rev/2d5e31af7e3e
https://hg.mozilla.org/integration/mozilla-inbound/rev/4b492a0e51f5
https://hg.mozilla.org/integration/mozilla-inbound/rev/d7dcde1032ed
https://hg.mozilla.org/integration/mozilla-inbound/rev/a35342556482
https://hg.mozilla.org/integration/mozilla-inbound/rev/6d9721de770c
https://hg.mozilla.org/integration/mozilla-inbound/rev/71d77495b606
https://hg.mozilla.org/integration/mozilla-inbound/rev/b9e630462207
https://hg.mozilla.org/integration/mozilla-inbound/rev/05dff2f69c0c
https://hg.mozilla.org/integration/mozilla-inbound/rev/88b93c358499
https://hg.mozilla.org/integration/mozilla-inbound/rev/0d025e2e0134
https://hg.mozilla.org/integration/mozilla-inbound/rev/8988bb111202
https://hg.mozilla.org/integration/mozilla-inbound/rev/399dace827fa
https://hg.mozilla.org/integration/mozilla-inbound/rev/5e6e63f3aed8
Comment 141 Boris Zbarsky [:bz] (still a bit busy) 2012-02-29 23:01:17 PST
Backed out on Aurora 12 in bug 722641.
Comment 142 Jet Villegas (:jet) 2012-03-20 10:57:31 PDT
Marking this REOPENED until we get the code landed again.
Comment 143 Boris Zbarsky [:bz] (still a bit busy) 2012-03-20 11:07:02 PDT
The code is just fine on m-c.  It was only backed out on Aurora 12.

Note You need to log in before you can comment on or make changes to this bug.