Closed Bug 1302071 Opened 8 years ago Closed 7 years ago

Webm sample - There is a lag at going on/off Fullscreen

Categories

(Core :: Layout, defect, P3)

49 Branch
defect

Tracking

()

VERIFIED FIXED
mozilla54
Tracking Status
firefox48 --- unaffected
firefox49 --- wontfix
firefox50 --- wontfix
firefox51 --- wontfix
firefox52 --- wontfix
firefox53 --- wontfix
firefox54 --- verified

People

(Reporter: cgeorgiu, Assigned: mattwoodrow)

References

Details

(Keywords: regression)

Attachments

(5 files)

[Notes]:
- Gif attached for Win
- MP4 attached for Ubuntu

[Affected versions]:
- latest Nightly 51.0a1 (2016-09-07)
- latest Aurora 50.0a2 (2016-09-07)
- Firefox Beta 49.0b10

[Affected platforms]:
- Win 10 Pro, x64
- Ubuntu 16.04 LTS, x64
- MAC OS Sierra 10.12 Beta

[Steps to reproduce]:
1. Launch Firefox
2. Go to this *webm video file: http://techslides.com/demos/sample-videos/small.webm
3. Press Fullscreen, exit Fullscreen several times

[Expected result]:
- Going on and off Full screen is smooth, with no lags.

[Actual result]:
- There are lags while going on/off Full screen mode.

[Regression range]:
- I will investigate for a regression range ASAP.

[Additional notes]:

- On Chrome it's instant
- It happens with ESC, Exit Full screen (Esc) notification on top of the video and double click on the video. 
- On Ubuntu 16.04 LTS, there is also glitching on all versions (affected and unaffected)
- On MAC OS Sierra 10.12 Beta (16A313a), the lag appeared only when the browser window was minimized.
Hi Anthony, Chris -- This is believed to be a new regression in Fx49.  Can one of you triage it?  Thanks.
Flags: needinfo?(cpearce)
Flags: needinfo?(ajones)
Version: Trunk → 49 Branch
(In reply to Ciprian Georgiu, QA [:ciprian_georgiu] from comment #0)

> [Regression range]:
> - I will investigate for a regression range ASAP.

Hi Ciprian, any luck getting a regression range?
Flags: needinfo?(ciprian.georgiu)
(In reply to Ciprian Georgiu, QA [:ciprian_georgiu] from comment #0)
> [Notes]:
> - Gif attached for Win
> - MP4 attached for Ubuntu

WFM. I don't see the attachments. What kind of machine are you using?
I investigated on Nightly builds and I came up with following results:

Last good revision: 660b322ef9fd22614359864cb2ebff3943511766
First bad revision: 2f34f0a20c58059be4ccac8f2023ca2a9defedd3
Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=660b322ef9fd22614359864cb2ebff3943511766&tochange=2f34f0a20c58059be4ccac8f2023ca2a9defedd3

Looks like the following bug has the changes which introduced the regression:
https://bugzilla.mozilla.org/show_bug.cgi?id=1264091

(In reply to Anthony Jones (:kentuckyfriedtakahe, :k17e) from comment #3)
> (In reply to Ciprian Georgiu, QA [:ciprian_georgiu] from comment #0)
> > [Notes]:
> > - Gif attached for Win
> > - MP4 attached for Ubuntu
> 
> WFM. I don't see the attachments. What kind of machine are you using?

I have attached again the following files presenting the issues:
Win 10 Pro x64: https://drive.google.com/file/d/0B-B8HwBlbdDXQWozcTV3ZFlSY1k/view?usp=sharing
Ubuntu 16.04 LTS x64: https://drive.google.com/file/d/0B-B8HwBlbdDXMzZXNlZlSGxUUDg/view?usp=sharing

If you have any questions, please let me know.
Thank you, Carmen for your investigation.
Flags: needinfo?(ciprian.georgiu)
Blocks: 1264091
Flags: needinfo?(ajones)
Xidorn - this is a fullscreen issue so I'll leave it with you.
Flags: needinfo?(cpearce) → needinfo?(xidorn+moz)
The blinking on Ubuntu looks pretty weird... Probably worth a different bug.

It seems to me this issue happens when the video is loading (has the loading indicator at the center of the video). I can reproduce the lag when that happens.

Bug 1264091 just adds code to track the transaction id, and only continues fullscreen change when it observes a higher transaction id. Does video loading somehow block the increasement of transaction id?
Flags: needinfo?(xidorn+moz) → needinfo?(ajones)
Too late for 49 but we could still consider a patch for 50.
Component: Audio/Video: Playback → Layout
Flags: needinfo?(ajones)
(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #7)
> The blinking on Ubuntu looks pretty weird... Probably worth a different bug.

Did we get one on file, Ciprian?

> It seems to me this issue happens when the video is loading (has the loading
> indicator at the center of the video). I can reproduce the lag when that
> happens.

I think that warrants a fix-optional for at least 50.

Xidorn, did you get answers to your questions in comment 7?
Flags: needinfo?(xidorn+moz)
Flags: needinfo?(ciprian.georgiu)
(In reply to Andrew Overholt [:overholt] from comment #9)
> (In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #7)
> > The blinking on Ubuntu looks pretty weird... Probably worth a different bug.
> 
> Did we get one on file, Ciprian?

Yes, I filed a separate bug for that issue, a few weeks ago, here: 1305736.
Flags: needinfo?(ciprian.georgiu)
No, I didn't get answer.

If there could be some reliable way to keep the loading indicator displayed, it might be easier to diagnose what happens behind.
Flags: needinfo?(xidorn+moz)
(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #7)
> It seems to me this issue happens when the video is loading (has the loading
> indicator at the center of the video). I can reproduce the lag when that
> happens.
> 
> Bug 1264091 just adds code to track the transaction id, and only continues
> fullscreen change when it observes a higher transaction id. Does video
> loading somehow block the increasement of transaction id?

Anthony, based on comment 11 Xidorn is still looking for an answer here.
Flags: needinfo?(ajones)
Anthony is probably not familiar with that... It seems to be something for Layout people, though I don't know who is the best one to ask.

I can probably investigate that latter myself... But not in my top priority...
Flags: needinfo?(ajones)
Leave an ni? to myself.
Flags: needinfo?(xidorn+moz)
Priority: -- → P3
So it does seem that the transaction id is incorrect. It doesn't increase in the following MozAfterPaint event which flush the rendering. My guess is that the previous pending transaction is revoked somehow, and the same transaction id is then reused for the rendering of fullscreen change.

It seems recording transaction id is not very reliable... Should we probably make it more reliable by, probably never reusing a revoked transaction id? mconley, any idea?
Flags: needinfo?(xidorn+moz) → needinfo?(mconley)
I'm afraid I don't know much about how transaction IDs are assigned. :/

Perhaps mstange knows more? Markus, do you know if transaction IDs can be revoked somehow?
Flags: needinfo?(mconley) → needinfo?(mstange)
Matt knows about transaction IDs. Matt, see the patch in bug 1264091 that added the full screen transition check based on MozAfterPaint event transaction IDs.
Flags: needinfo?(mstange) → needinfo?(matt.woodrow)
(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #15)
> So it does seem that the transaction id is incorrect. It doesn't increase in
> the following MozAfterPaint event which flush the rendering. My guess is
> that the previous pending transaction is revoked somehow, and the same
> transaction id is then reused for the rendering of fullscreen change.
> 
> It seems recording transaction id is not very reliable... Should we probably
> make it more reliable by, probably never reusing a revoked transaction id?
> mconley, any idea?

The only time we revoke a transaction id is if we don't end up painting anything. The allocation of a new id and then revoking it again happens entirely within painting code, so shouldn't be visible to JS.

We shouldn't be firing MozAfterPaint multiple times with the same transaction id, it sounds like a bug if we are.

I don't know the DOMFullscreen code very well, are we sure that there will be a content paint scheduled *after* we recorded the lastTransactionId?

lastTransactionId is the last id used by content to paint, it might not have been through the compositor and returned as MozAfterPaint yet.
Flags: needinfo?(matt.woodrow)
(In reply to Matt Woodrow (:mattwoodrow) from comment #18)
> We shouldn't be firing MozAfterPaint multiple times with the same
> transaction id, it sounds like a bug if we are.

I don't think we are firing multiple MozAfterPaint events with the same transition id. It is just that transition Id returned from lastTransactionId is not consumed for the currently on-going transaction, and the Id is reused by the paint we are actually waiting for.

> I don't know the DOMFullscreen code very well, are we sure that there will
> be a content paint scheduled *after* we recorded the lastTransactionId?

There should be, because there would at least be a resize reflow from fullscreen change.

> lastTransactionId is the last id used by content to paint, it might not have
> been through the compositor and returned as MozAfterPaint yet.

I know that. This detection was added because we don't want to unblock the screen in MozAfterPaint event of a paint triggered before the resize reflow.

Could we never reuse an id even if the transaction is revoked?
Flags: needinfo?(matt.woodrow)
The transaction id detection was added because when we move the document into fullscreen, there could have been an on-going paint which may trigger a MozAfterPaint after we start waiting, however the rendering result with that event is still before the fullscreen change, so we don't want to do anything with that.

The issue here is that, the on-going paint is revoked, and the transaction id is reused for the paint we are actually waiting for, but we expect the transaction id has been advanced since the last on-going transaction.
We only increment the transaction id value during ClientLayerManager::ForwardTransaction (and then possibly revoke it again in the same function).

I don't see how javascript could possibly be seeing the incremented value before it is revoked.
Flags: needinfo?(matt.woodrow)
It seems the issue is not that the transaction id is revoked and reused, it is that the MozAfterPaint event is not dispatched for the transaction after the fullchange.

This is some brief log by code I added to nsRefreshDriver:
> GetTransactionId: 231
> NotifyTransactionCompleted: 231 (230, 231)
> GetTransactionId: 232
> GetTransactionId: 233
> NotifyTransactionCompleted: 232 (231, 233)
> LastTransactionId: 233
> * DOMFullscreen:Entered: 233
> GetTransactionId: 234
> * MozAfterPaint: 233
> NotifyTransactionCompleted: 233 (232, 234)
> GetTransactionId: 235
> RevokeTransactionId: 235
> NotifyTransactionCompleted: 234 (233, 234)
> GetTransactionId: 235
> * MozAfterPaint: 235
> * Removing MozAfterPaint listener
> NotifyTransactionCompleted: 235 (234, 235)
> GetTransactionId: 236
> GetTransactionId: 237
> NotifyTransactionCompleted: 236 (235, 237)

Format is:
* GetTransactionId: mPendingTransaction after ++
* NotifyTransactionCompleted: aTransaction (mCompletedTransaction, mPendingTransaction) before setting mCompletedTransaction
* LastTransactionId: mPendingTransaction

As can be seen above, after "DOMFullscreen:Entered" which is triggered just before we start the fullscreen change, there is a transaction id allocation returns 234, and we then receive MozAfterPaint for the transaction 233 which was started before the fullscreen change.

However, there is no MozAfterPaint for transaction 234, and the listener is removed only when the MozAfterPaint event for transaction 235 is dispatched. But normally, a MozAfterPaint should be dispatched before NotifyTransactionCompleted is called.

In what case may we not dispatch a MozAfterPaint for a transaction? There is some code freezes/thaws refresh driver for fullscreen, which might be related.
Flags: needinfo?(matt.woodrow)
mFirePaintEvents being false on the PresContext?

It looks like that might be racy as we set it immediately when we record an invalidation, but clear it in response to the async message back from the compositor.
Flags: needinfo?(matt.woodrow)
MozAfterPaint/NotifyTransactionCompleted 233 will probably set mFireAfterPaintEvents to false [1].

Transaction 235 *won't* set it to true again (since we didn't paint anything, revoked the transaction id, and didn't call nsPresContext::NotifyInvalidation/NotifySubDocInvalidation).

NotifyTransactionCompleted 234 gets called, but since mFireAfterPaintsEvents is false, then we don't fire MozAfterPaint [2].

That's bad, seems like a bug introduced when we converted MozAfterPaint to be fired when the compositor finishes (instead of when content finishes).


[1] http://searchfox.org/mozilla-central/source/layout/base/nsPresContext.cpp#2482
[2] http://searchfox.org/mozilla-central/source/layout/base/nsPresContext.cpp#2448
Xidorn, is this bug still on your radar? It's definitely a noticeable lag.
Per comment 24, it feels like a graphics issue that MozAfterPaint is not fired as expected when transaction 234 finished, so it is probably a question for Matt.
Flags: needinfo?(xidorn+moz) → needinfo?(matt.woodrow)
Assignee: nobody → matt.woodrow
Flags: needinfo?(matt.woodrow)
So what's happening is:

* JS records the current transaction ID N, and begins waiting for MozAfterPaint to be fired for N+1.

* We start painting transaction N+1 and record the invalidated area in nsPresContext::mInvalidateRequestsSinceLastPaint (and set mFireAfterPaintEvents to true).

* We finish painting transaction N+1 and call nsPresContext::NotifyDidPaintForSubtree(PAINT_LAYERS, N+1). This moves the invalidated rectangles from mInvalidateRequestsSinceLastPaint into mUndeliveredInvalidateRequestsBeforeLastPaint.

* We receive an asynchronous DidComposite message from the compositor for transaction N, and call NotifyDidPaintForSubtree(PAINT_COMPOSITE, N). This fires the MozAfterPaint event for transaction N, using the invalid rectangles computed for N+1. It also clears mFireAfterPaintEvents as both mInvalidateRequestsSinceLastPaint and mUndeliveredInvalidateRequestsBeforeLastPaint are now empty.

* We receive an asynchronous DidComposite message from the compositor for transaction N+1, and call NotifyDidPaintForSubtree(PAINT_COMPOSITE, N+1). This does nothing as mFireAfterPaintEvents is false.

* We wait until something else triggers an invalidation and causes a MozAfterPaint to be sent.
Comment on attachment 8838285 [details]
Bug 1302071 - Part 1: Remove AddPresShellToInvalidateIfHidden since it doesn't appear to be necessary.

https://reviewboard.mozilla.org/r/113234/#review114714
Attachment #8838285 - Flags: review?(tnikkel) → review+
Comment on attachment 8838286 [details]
Bug 1302071 - Part 2: Remove ClearMozAfterPaintEvents since it's unused.

https://reviewboard.mozilla.org/r/113236/#review114718
Attachment #8838286 - Flags: review?(tnikkel) → review+
Comment on attachment 8838287 [details]
Bug 1302071 - Part 3: Remove flags parameter to NotifyInvalidation since it's unused.

https://reviewboard.mozilla.org/r/113238/#review114724
Attachment #8838287 - Flags: review?(tnikkel) → review+
Comment on attachment 8838288 [details]
Bug 1302071 - Part 4: Pass the current transaction ID to NotifyInvalidation.

https://reviewboard.mozilla.org/r/113240/#review114726

::: layout/base/nsPresContext.h:958
(Diff revision 1)
>    // Ensure that it is safe to hand out CSS rules outside the layout
>    // engine by ensuring that all CSS style sheets have unique inners
>    // and, if necessary, synchronously rebuilding all style data.
>    void EnsureSafeToHandOutCSSRules();
>  
> -  void NotifyInvalidation(const nsRect& aRect);
> +  void NotifyInvalidation(uint64_t aTransactionId, const nsRect& aRect);

Can you document what the transactionid means here in a comment?
Attachment #8838288 - Flags: review?(tnikkel) → review+
Comment on attachment 8838289 [details]
Bug 1302071 - Part 5: Bucket PresContext invalidations by transaction ID, and only deliver them when the associated composite has completed.

https://reviewboard.mozilla.org/r/113242/#review114728

::: layout/base/nsPresContext.h:1294
(Diff revision 1)
>  
> -  nsTArray<nsRect> mInvalidateRequestsSinceLastPaint;
> -  nsTArray<nsRect> mUndeliveredInvalidateRequestsBeforeLastPaint;
> +  struct TransactionInvalidations {
> +    uint64_t mTransactionId;
> +    nsTArray<nsRect> mInvalidations;
> +  };
> +  AutoTArray<TransactionInvalidations, 4> mTransactions;

4 seems a bit high. Do we expect to commonly hit 4 inflight transactions?
Attachment #8838289 - Flags: review?(tnikkel) → review+
Pushed by mwoodrow@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/31dab83c688f
Part 1: Remove AddPresShellToInvalidateIfHidden since it doesn't appear to be necessary. r=tnikkel
https://hg.mozilla.org/integration/autoland/rev/d42c9f31c8c5
Part 2: Remove ClearMozAfterPaintEvents since it's unused. r=tnikkel
https://hg.mozilla.org/integration/autoland/rev/57251e932377
Part 3: Remove flags parameter to NotifyInvalidation since it's unused. r=tnikkel
https://hg.mozilla.org/integration/autoland/rev/eaf30368f343
Part 4: Pass the current transaction ID to NotifyInvalidation. r=tnikkel
https://hg.mozilla.org/integration/autoland/rev/a535bf3bfff1
Part 5: Bucket PresContext invalidations by transaction ID, and only deliver them when the associated composite has completed. r=tnikkel
This seems to have improved performance on the tps test:

== Change summary for alert #5176 (as of February 17 2017 08:22 UTC) ==

Improvements:

 10%  tps summary osx-10-10 opt        60.08 -> 53.96
 10%  tps summary linux64 opt          54.78 -> 49.34
  9%  tps summary windows8-64 opt      66.89 -> 61.16
  8%  tps summary windows7-32 opt      67.93 -> 62.32
  7%  tps summary linux64 pgo          44.66 -> 41.7

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=5176
I'm going to go out on a limb here and say that this isn't something we want to try backporting to 52 at this point :). Was there any thought of uplifting to Aurora or just let it ride the trains?
Flags: needinfo?(matt.woodrow)
Agreed, uplifting to 52 is pretty terrifying.

We could probably uplift to Aurora though, I'd want to give it a week on Nightly at least first.
Flags: needinfo?(matt.woodrow)
Satisfied enough with the baking to request uplift on this (and dep) now? :)
Flags: needinfo?(matt.woodrow)
Comment on attachment 8838285 [details]
Bug 1302071 - Part 1: Remove AddPresShellToInvalidateIfHidden since it doesn't appear to be necessary.

Approval Request Comment
[Feature/Bug causing the regression]: Underlying bug has existed for a long time, Bug 1264091 is when frontend code started using the broken code.
[User impact if declined]: Slow transitions to/from fullscreen.
[Is this code covered by automated tests?]: Most of it is, but not the specific interaction that was the issue here.
[Has the fix been verified in Nightly?]: Yes.
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: Bug 1343069.
[Is the change risky?]: Medium?
[Why is the change risky/not risky?]: This introduces fairly large changes to the MozAfterPaint code, but we mainly only use that event for automated testing (which still all pass). Bug 1264091 introduced one of the few real users, and we've tested that it works well there. I think this should be fine, but we don't want to put it onto Beta.
[String changes made/needed]: None
Flags: needinfo?(matt.woodrow)
Attachment #8838285 - Flags: approval-mozilla-aurora?
This would only have the weekend on aurora before it goes to the beta 1 build on Monday. The fix sounds good but I'd rather this stick with 54.
Comment on attachment 8838285 [details]
Bug 1302071 - Part 1: Remove AddPresShellToInvalidateIfHidden since it doesn't appear to be necessary.

I get that it may not be that risky but I think we can wait till 54 here.
Attachment #8838285 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora-
Cipri, could you please verify this on Beta 54?
Flags: needinfo?(ciprian.georgiu)
I can no longer reproduce this issue with 54 beta 2 (20170424215451) under Windows 10 x64, Ubuntu 16.04 x64 LTS and Mac OS X 10.11.6.
Status: RESOLVED → VERIFIED
Flags: needinfo?(ciprian.georgiu)
Hi, is it expected that only non-e10s builds are benefiting from the tps improvements for these patches? In bug 1362920 it came up, that it were nice to be able to extend this improvement for e10s builds as well... since I don't quite understand what is the cause of the improvement exactly I thought I'll just ask you if it's completely non-e10s specific or something similar could be done for e10s as well in theory.
Flags: needinfo?(matt.woodrow)
There wasn't really an expected performance improvement here, just changes in the way we report our timings back to content.

It's probably that e10s didn't suffer from the same issue.
Flags: needinfo?(matt.woodrow)
Depends on: 1395971
You need to log in before you can comment on or make changes to this bug.