Status

()

defect
REOPENED
6 years ago
3 years ago

People

(Reporter: smaug, Assigned: smaug)

Tracking

(Blocks 5 bugs)

Trunk
mozilla29
x86_64
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox40 fixed)

Details

(Whiteboard: tpi:-)

Attachments

(6 attachments, 12 obsolete attachments)

11.87 KB, patch
roc
: review+
Details | Diff | Splinter Review
16.51 KB, patch
avih
: review+
roc
: review+
Details | Diff | Splinter Review
28.15 KB, patch
Details | Diff | Splinter Review
28.13 KB, patch
Details | Diff | Splinter Review
28.13 KB, patch
Details | Diff | Splinter Review
24.50 KB, patch
Details | Diff | Splinter Review
I can see some use for the perf hint on content processes, but hopefully we could
kill it elsewhere.
Posted patch patch (obsolete) — Splinter Review
https://tbpl.mozilla.org/?tree=Try&rev=9262ddc2e7fb

No idea what all this will break :)
But I think we should just do this (after fixing tons of tests I guess).
Comment on attachment 822830 [details] [diff] [review]
Performance hint only in the content processes

>+++ b/dom/tests/mochitest/general/test_vibrator.html
>@@ -52,17 +52,17 @@ function testSuccesses() {
>   expectSuccess(1000);
>   expectSuccess(1000.1);
>   expectSuccess([0, 0, 0]);
>   expectSuccess(['1000', 1000]);
>   expectSuccess([1000, 1000]);
>   expectSuccess([1000, 1000.1]);
> 
>   // The following loop shouldn't cause us to crash.  See bug 701716.
>-  for (var i = 0; i < 10000; i++) {
>+  for (var i = 0; i < 1000; i++) {
>     navigator.vibrate([100, 100]);
>   }
So we end up posting a runnable for each these, and in case we're not in perf mode we
handle them slower than with perf mode.

>+++ b/dom/tests/mochitest/pointerlock/file_screenClientXYConst.html
>@@ -46,19 +46,19 @@ https://bugzilla.mozilla.org/show_bug.cg
>         is(unLockedCoords.screenY, lockedCoords.screenY,
>            "screenY should be equal to where the mouse was originaly locked");
>       }
> 
>       function moveUnlocked(e) {
>         var firstCall = !unLockedCoords;
>         if (!firstCall) {
>           todo(false, "mousemove is fired twice.");
>+        } else {
>+          isUnlocked = !document.mozPointerLockElement;
>         }
>-
>-        isUnlocked = !document.mozPointerLockElement;
I think the issue here is just racy test. Test expects gecko-only events, but there can be also native events.

But let's see what the tryserver tells.
And the browser_Troubleshoot.js failure is something that happened on m-i too but was fixed there
after I had updated my tree.
Attachment #822024 - Attachment is obsolete: true
Attachment #822830 - Attachment description: +some test fixes → Performance hint only in the content processes
Trying to reduce tp regression by allowing a bit more gecko event processing.
Haven't tested yet how this affects to responsiveness.

https://tbpl.mozilla.org/?tree=Try&rev=e6860ad556bf
Comment on attachment 822901 [details] [diff] [review]
only_content_process_favor_perf_hint_5.diff

Doesn't seem to affect to anything and this was a test anyway.
Attachment #822901 - Attachment is obsolete: true
different kind of approach coming.
Summary: Disable favor performance hint on non-content-processes → Remove favor performance mode
Posted patch remove_favor_perf_2.diff (obsolete) — Splinter Review
https://tbpl.mozilla.org/?tree=Try&rev=1079269a55d5

Locally this behaves well on Linux, but no idea about tp numbers.

Added also some code for windows since it isn't necessarily using omtc.
Other platforms should use it, I was told, so no need for the Update.
(But I don't have windows dev environment, so no idea whether the code even compiles there :) 
 If it does compile, will test trybuilds)


Matt, this is the bug I was talking on #gfx. Do the nsWindow changes look even close to reasonable?
Comment on attachment 831896 [details] [diff] [review]
remove_favor_perf_2.diff

Review of attachment 831896 [details] [diff] [review]:
-----------------------------------------------------------------

::: widget/xpwidgets/nsBaseAppShell.cpp
@@ +253,5 @@
>    mProcessedGeckoEvents = false;
>  
> +  if (start - mLastNativeEventTime > limit) {
> +    mLastNativeEventTime = start;
> +    DoProcessNextNativeEvent(false, recursionDepth);

Didn't you want to use the if part rather than the else? It looks like the else part which you kept is the perf mode where native events are throttled...
No, because this way I can keep good tp results, and the fix to RefreshDriver handling
ensures user visible jank stays still ok.
tp results are not an absolute benchmark to not regress IMO. We do expect some tp regressions because more work will happen before the page is loaded (e.g. painting it few times maybe, handling user inputs if they occur, etc). This is by design - perf mode throttle native events, and we want to un-throttle them.

The part which you kept is clearly throttling native events (forcing |limit| to pass before handling them again), of which OnPaint is one, BTW (on windows), and not handling OnPaint will not update the screen. Same goes for mouse inputs I'd think, so if indeed it's throttled, then it has a potential to hurt responsiveness compared to "non-perf" mode before the patch.

You also modified limit from 20 to 10ms, so maybe 10ms between handling native events is OK, and maybe it isn't. As you mentioned on IRC, we don't currently have a framework to measure responsiveness from input event to paint (hopefully bug 935680 will help with that).

I'm just saying we should understand how that limit condition affects behavior, preferably at the code level together with some actual hands-on tests, and especially since to me it looks like it's a regression compared to "normal" mode before the patch.
tp is not an absolute benchmark, but it is a benchmark.

Also, I didn't actually say we don't have framework to measure from input to
paint, but from invalidate to paint. Currently there are cases when we're processing input and
Invalidating, but we just don't let OS to paint. And we have no tools to detect that.

In practice the patch shouldn't regress behavior, since we force the Update, so user actually
sees something happening on the screen. That is the (or a) real jank we have now, UI not updating.

I'm about to upload a patch which tries to process a native event whenever gecko event is processed.
Depends on: 938030
Depends on: 931641
(In reply to Olli Pettay [:smaug] from comment #11)
> (But I don't have windows dev environment, so no idea whether the code even
> compiles there :) 
>  If it does compile, will test trybuilds)
> 
> 
> Matt, this is the bug I was talking on #gfx. Do the nsWindow changes look
> even close to reasonable?

If it works! :)

Looks sane to me anyway.
Seems to work and fixes bug 822096 and bug 880036.

https://tbpl.mozilla.org/?tree=Try&rev=cf7ea2fa86f6
v3 + some test fixes.
So the original code had 2 modes at nsBaseAppShell::OnProcessNextEvent(..) :

Normal mode:
- On each entry, as long as there are more native events pending, they are processed in a loop of DoProcessNextNativeEvent for up to limit ms.

Perf mode:
- On each entry, call DoProcessNextNativeEvent once, but only if limit ms have elapsed since the last native event was processed.


In patch v2 from comment 10, it's exactly like perf mode before the patches (but with the Update() calls).

In patch v3, it's like perf mode, but DoProcessNextNativeEvent is called once on each entry rather than only if limit ms have elapsed (also with the Update() calls).


Now, the Update() calls help with flushing, but they don't help increase native events processing.

To me it still seems that normal mode before the patch can process much more native events than patch v3 (unless a typical single call to DoProcessNextNativeEvent lasts more than limit ms - which I think is unlikely).

So I think there are two potential results for this:

1. It'll paint more than in perf mode before the patch, because of the added Update() calls and extra processing of Native events (each entry rather than only after limit ms elapsed).

2. Compared to normal mode, it'll potentially paint less and native events might queue up, making the browser less responsive on cases where many native events are queued, because it calls DoProcessNextNativeEvent only once on each entry rather than in a loop of upto limit ms on each entry.
(In reply to Avi Halachmi (:avih) from comment #20)

> Now, the Update() calls help with flushing, but they don't help increase
> native events processing.
Yes they do. Calling Update() explicitly makes non-OMTC platforms to update the
screen asap, instead of waiting native event loop to update UI later.


> To me it still seems that normal mode before the patch can process much more
> native events than patch v3 (unless a typical single call to
> DoProcessNextNativeEvent lasts more than limit ms - which I think is
> unlikely)
there aren't usually many pending native events.


> 
> So I think there are two potential results for this:
> 
> 1. It'll paint more than in perf mode before the patch, because of the added
> Update() calls and extra processing of Native events (each entry rather than
> only after limit ms elapsed).
yes

> 
> 2. Compared to normal mode, it'll potentially paint less and native events
> might queue up, making the browser less responsive on cases where many
> native events are queued, because it calls DoProcessNextNativeEvent only
> once on each entry rather than in a loop of upto limit ms on each entry.
No, it won't paint less, since we Invalidate based on RefreshDriver.
Native events are possibly processed slower _if_ we don't have many Gecko events.
But even then, we want the UI to update. So processing tons of native events, but not showing
the results to the user is effectively jank.
(In reply to Olli Pettay [:smaug] from comment #21)
> > To me it still seems that normal mode before the patch can process much more
> > native events than patch v3 (unless a typical single call to
> > DoProcessNextNativeEvent lasts more than limit ms - which I think is
> > unlikely)
> there aren't usually many pending native events.

So your patch not only removes perf mode, but also modifies the behavior of normal mode to possibly process less native events (compared to gecko events). If we have good data or understanding to support also this change, then fine, otherwise, it has a potential to regress (not on tp tests which never measured in normal mode).
We usually have lots of Gecko events, so we end up processing lots of native events to with v3.
And if we don't have Gecko events, we're idle from Gecko point of view and just process native events.

Just calling DoProcessNextNativeEvent(false, recursionDepth); always is almost enough
to fix the jank-y scrolling I get (bug 822096), but not enough to fix bug 880036 properly.
If the main reason to process native events more is to make painting happen in a timely manner, OMTC will make that unnecessary. And we could make it less necessary without OMTC, by calling UpdateWindow() at the end of nsRefreshDriver::Tick on Windows.
The patch makes us call UpdateWindow() at the end of Tick. 
And AFAIK OMTC is not very close to be enabled on linux, so need to fix bug 822096
and bug 880036 even without OMTC.


I prefer (v3) attachment 832162 [details] [diff] [review] which gives good compromise between gecko and native events and
forces timely updates on non-omtc platforms. That is the patch I'll ask review for once I've
fixed bug 931641 (and possibly some other racy tests).
Blocks: 880036, 822096
Depends on: 939084
Depends on: 944125
Depends on: 944126
Depends on: 944128
Posted patch v3.1 (obsolete) — Splinter Review
https://tbpl.mozilla.org/?tree=Try&rev=ef4badd0355e
Attachment #822830 - Attachment is obsolete: true
Attachment #822840 - Attachment is obsolete: true
Attachment #822841 - Attachment is obsolete: true
Attachment #831896 - Attachment is obsolete: true
Attachment #832162 - Attachment is obsolete: true
Attachment #832166 - Attachment is obsolete: true
Comment on attachment 8340525 [details] [diff] [review]
v3.1

Actually, I think asking review for this now is ok.
I'll need to still look at some random failures on linux, but they all
seem to be racy tests.

Because the patch needs to go in as one, splitting it doesn't make much sense.
Favor-perf-mode removal is just removing code, and reviewer should see the
big picture here anyway.
The basic idea is to remove favor-perf-mode and always try to process one native
event per Gecko event. And to get visual updates fast enough, call Update() from refresh driver. Update() should be in general no-op with omtc.

The patch regresses tp a bit but helps with tp_responsiveness, and 
bug 822096 and bug 880036 are fixed.

This all is scary enough that it should go in early in a cycle.
Attachment #8340525 - Flags: review?(roc)
Comment on attachment 8340525 [details] [diff] [review]
v3.1

Review of attachment 8340525 [details] [diff] [review]:
-----------------------------------------------------------------

Can't the code to update the window during the refresh driver tick be factored out into its own patch?
I can do that.

(not sure it is useful, since I wouldn't land it without the other stuff.)
Attachment #8340525 - Attachment is obsolete: true
Attachment #8340525 - Flags: review?(roc)
Attachment #8341298 - Flags: review?(roc)
Attachment #8341300 - Flags: review?(roc)
Attachment #8341300 - Flags: review?(avihpit)
Comment on attachment 8341298 [details] [diff] [review]
RefreshDriver to Update

Review of attachment 8341298 [details] [diff] [review]:
-----------------------------------------------------------------

::: view/public/nsViewManager.h
@@ +305,5 @@
>    /**
>     * Flush the accumulated dirty region to the widget and update widget
>     * geometry.
>     */
> +  void ProcessPendingUpdates(bool aTryToSyncUpdate);

Please make this a flags word or an enum so we don't have to decipher what ProcessPendingUpdates(true) means.

::: widget/cocoa/nsChildView.h
@@ +469,5 @@
>  
>    virtual int32_t         RoundsWidgetCoordinatesTo() MOZ_OVERRIDE;
>  
>    NS_IMETHOD              Invalidate(const nsIntRect &aRect);
> +  NS_IMETHOD              Update();

just make this virtual void.
Attachment #8341298 - Flags: review?(roc) → review+
Comment on attachment 8341300 [details] [diff] [review]
remove favor perf, 3.1

Review of attachment 8341300 [details] [diff] [review]:
-----------------------------------------------------------------

::: widget/xpwidgets/nsBaseAppShell.cpp
@@ +9,5 @@
>  #include "nsThreadUtils.h"
>  #include "nsIObserverService.h"
>  #include "nsServiceManagerUtils.h"
>  #include "mozilla/Services.h"
> +#include "nsBaseWidget.h"

Still needed?
Attachment #8341300 - Flags: review?(avihpit) → review+
Blocks: 732621
Depends on: 945979
Depends on: 948092
Depends on: 941587
Depends on: 943793
Depends on: 899910
Depends on: 942229
Depends on: 949227
Depends on: 950337
Depends on: 950342
Depends on: 867505
Depends on: 950367
Posted patch 3.1.1Splinter Review
Posted patch v3.1.2Splinter Review
Right kind of NS_OBJC_END_TRY_ABORT*
https://tbpl.mozilla.org/php/getParsedLog.php?id=33033795&tree=Mozilla-Inbound is a nice case
where gdk after resize returns first the correct data and then for some reason the old data.
Depends on: 960309
Depends on: 960351
https://hg.mozilla.org/mozilla-central/rev/80b64a63b524
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
Depends on: 964074
...because trying to fix Android 2.2 regression revealed that we have racy
b2g reftests and what not.
Comment on attachment 8368173 [details] [diff] [review]
patch for backout

backout, not backup
Attachment #8368173 - Attachment description: patch for backup → patch for backout
http://hg.mozilla.org/integration/mozilla-inbound/rev/89f9304ff4ba

I'll reland early next week (and continue fixing the racy b2g tests before that :/ ).
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Olli Pettay [:smaug] from comment #44)
> I'll reland early next week (and continue fixing the racy b2g tests before
> that :/ ).

Did this end up relanding? (or maybe the racy b2g tests were racier than anticipated?)
Flags: needinfo?(bugs)
I haven't found time to fix b2g reftest framework, not some more tests.

But I will..
Flags: needinfo?(bugs)
Note that bug 959281 and bug 996848 also touch heavily the events processing system. It'd probably be wise to not try landing them in parallel.
Posted patch 2015-v1 (obsolete) — Splinter Review
Trying to get this in again

https://tbpl.mozilla.org/?tree=Try&rev=311b4f2720eb
Posted patch 2015-v2 (obsolete) — Splinter Review
https://tbpl.mozilla.org/?tree=Try&rev=893974c6b6a1
Attachment #8574655 - Attachment is obsolete: true
https://hg.mozilla.org/mozilla-central/rev/1b8f36a7ee32
Status: REOPENED → RESOLVED
Closed: 6 years ago4 years ago
Resolution: --- → FIXED
Blocks: 1153879
https://hg.mozilla.org/integration/mozilla-inbound/rev/a5a7035ebf21
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
This also appeared on Autophone Throbber stop for local-blank. The backout confirms it.
http://phonedash.mozilla.org/#/org.mozilla.fennec/throbberstop/local-blank/norejected/2015-04-12/2015-04-13/notcached/errorbars/standarderror/notry

Did not appear/was not significant  for local-twitter or webappstartup though.
No longer blocks: 1153879
Depends on: 1153879
Posted patch 2015-v3 (obsolete) — Splinter Review
Some new tryserserver  try: -b o -p linux,linux64,macosx64,win32,win64 -u none -t all pushes
without the patch, clean m-i: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0ab7f32def79
with the patch, same m-i + patch: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7ce8adeefcea
Attachment #8577494 - Attachment is obsolete: true
Looks like any new native event processing during window open regresses tpaint on linux
(and in fact if native event processing is reduced from what it is now on m-c, we get even better results from that test).

An issue seems to be that we get more OnExposeEvent calls, so, more painting.
(In reply to Olli Pettay [:smaug] from comment #55)
> Created attachment 8627464 [details] [diff] [review]
> 2015-v3.1
> 
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=033c5f3e23b1
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=5231b5e4ee26

After smaug retriggers those tests 5 more times, we ended up with this compare-view:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=033c5f3e23b1&newProject=try&newRevision=5231b5e4ee26

It shows some relatively small ( < 5% ) improvements and regressions, and one supposedly meaningful (25%) tpaint regression on linux only.

smaug thinks this is not an actual regression but rather a weakness of tpaint which got exposed by the patch of this bug (where tpaint measured the first content paint which possibly happened with the first chrome paint before the patch, but now chrome paint happens earlier, but tpaint still measures the first content paint). That's bug 1179377.
Depends on: 1179377
IMO we should try to land this and accept the regressions (and improvements) as they manifest at comment 57 (several up to 5% improvements and regressions).

Including the tpaint linux regression if it ends up to be real (which currently seems like not a real regression).
Well, it is a regression, but somewhat expected one. We don't block Gecko from interacting with the
OS level stuff so much with the patch, so we end up processing certain things faster, which may lead to process Gecko internal things slower.
But the tpaint itself does indeed seem to be a bit too racy, and measuring a bit wrong things.
Blocks: 1187850
Note that phonedash urls have changed with the addition of the new mean vs min display, you need to append /mean to the end of the url. 

The new is:

http://phonedash.mozilla.org/#/org.mozilla.fennec/throbberstop/local-blank/norejected/2015-04-12/2015-04-13/notcached/errorbars/standarderror/notry/mean
Depends on: 1260070
Whiteboard: tpi:-
You need to log in before you can comment on or make changes to this bug.