Closed Bug 1319653 Opened 7 years ago Closed 7 years ago

Intermittent toolkit/content/tests/widgets/test_videocontrols_audio_direction.html | Rendering of reftest videocontrols_direction-{2c,2d,2e}.html should not be different to the reference

Categories

(Toolkit :: Video/Audio Controls, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla54
Tracking Status
firefox52 --- unaffected
firefox53 --- fixed
firefox54 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: ralin)

References

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell fixed])

Attachments

(1 file)

Summary: Intermittent toolkit/content/tests/widgets/test_videocontrols_audio_direction.html | Rendering of reftest videocontrols_direction-2e.html should not be different to the reference → Intermittent toolkit/content/tests/widgets/test_videocontrols_audio_direction.html | Rendering of reftest videocontrols_direction-{2c,2e}.html should not be different to the reference
Summary: Intermittent toolkit/content/tests/widgets/test_videocontrols_audio_direction.html | Rendering of reftest videocontrols_direction-{2c,2e}.html should not be different to the reference → Intermittent toolkit/content/tests/widgets/test_videocontrols_audio_direction.html | Rendering of reftest videocontrols_direction-{2c,2d,2e}.html should not be different to the reference
jmaher, in your second link above, https://bugzilla.mozilla.org/show_bug.cgi?id=1317779 looks plausible.
Flags: needinfo?(esawin)
The changes in bug 1317779 shouldn't affect non-restricted platforms. Maybe the separation of audio and video decoder allocation policies has some unwanted side effects?
Flags: needinfo?(esawin) → needinfo?(jwwang)
It is easy to change the policy to be per-reader-based which is quite similar to the original ReaderQueue.

So when the limit is 1, there is only one reader is allowed for decoding. The reader could create an audio decoder or a video decoder or both.
Flags: needinfo?(jwwang)
This continues to be one of the top intermittents and is maintaining the same pattern as before (win7 pgo only - odd why this is pgo only).

:esawin, do you need any other information or help to solve this bug?
Flags: needinfo?(esawin)
This seems to be a timing-related issue which got worse with bug 1317779 but wasn't caused by it.
I don't know anything about the Windows-specific media bits, so I'm moving this to the media teams.
Flags: needinfo?(esawin)
Flags: needinfo?(bwu)
Flags: needinfo?(ajones)
ni Alastor to put this on his plate.
Flags: needinfo?(bwu)
Flags: needinfo?(alwu)
Flags: needinfo?(ajones)
After discussed with Ray, we found that this issue is the layout problem, it's caused by loading external CSS file.

---

First, we checked the result [1] and reference page [2] of "videocontrols_direction-2e.html". We can see the different of these two pages, we can't see the play button in the reference page.

Ray said that the reference page is the wrong one, the result page is the correct one. And why we didn't see the play button in the control bar? That's because layout return the wrong width [3], it returns ZERO in this cas. It's incorrect.

And we also found that, if we trigger the reflow for the page which show the wrong control bar, and then the control bar would become correct after reflow. That can also explain why it's intermittent fail, because the capture image would be different before and after reflow.

After further survey, we found that if you remove this line [4] or you didn't load any file (set src=""), the control can get the correct width value.

Therefore, move this bug to CSS related component.

[1] result page : https://pageshot.net/ZoGx7Xe2ACkkhcxf/data
[2] reference page : https://pageshot.net/NcqrZiSUwzS6sDCm/data
[3] return width : https://goo.gl/31DCJo
[4] loading external CSS file : https://goo.gl/Cui0KH
Component: Video/Audio Controls → CSS Parsing and Computation
Flags: needinfo?(alwu)
Product: Toolkit → Core
By the way, this issue isn't Window-specific, we can reproduce that in OSX.
:jet, I see you are the triage owner for this new component, can you help find someone to look at this, for the last 6 weeks this has been one of the top intermittent failures.
Flags: needinfo?(bugs)
Thanks Alastor for summarizing the result we discussed offline. 

In videocontrols initialization process, we intend to make controlBar show and hide in a flash to record the `computed dimension` of each controls for later adjustment[0]. Once we got incorrect computed sizes, we could not adjust controlBar accordingly. 

I've tried but still having a hard time to find out how external css interfere layout computation, so I made a few test pages aside from reftest that might help to clarify the problem:

1. https://people-mozilla.org/~ralin/vid-css/external.html
2. https://people-mozilla.org/~ralin/vid-css/in-page.html 
3. https://people-mozilla.org/~ralin/vid-css/inline.html
4. https://people-mozilla.org/~ralin/vid-css/in-page-external.html
5. https://people-mozilla.org/~ralin/vid-css/inline-external.html
(Any page with external css breaks video layout, even the external css is emtpy.)


[0] http://searchfox.org/mozilla-central/rev/848c29538ab007fb95dc6cff194f0e3d3809613d/toolkit/content/widgets/videocontrols.xml#357-417
(In reply to Alastor Wu [:alwu] from comment #25)
> After discussed with Ray, we found that this issue is the layout problem,
> it's caused by loading external CSS file.

I don't think so. Reverting bug 1271765 fixes it. Please inspect Part 1 of those patches, particularly the removal of the video controls' reflow.
Component: CSS Parsing and Computation → Video/Audio Controls
Depends on: 1271765
Flags: needinfo?(bugs) → needinfo?(ralin)
Product: Core → Toolkit
Target Milestone: --- → mozilla53
Version: unspecified → Trunk
(In reply to Jet Villegas (:jet) from comment #29)
> (In reply to Alastor Wu [:alwu] from comment #25)
> > After discussed with Ray, we found that this issue is the layout problem,
> > it's caused by loading external CSS file.
> 
> I don't think so. Reverting bug 1271765 fixes it. Please inspect Part 1 of
> those patches, particularly the removal of the video controls' reflow.

The Part1 patch's commit message is tiny bit misleading, sorry for that. Since we converted video controls from XUL to HTML, what we did in Part1 is not removal of video controls' reflow code but actually rewriting to HTML reflow. I don't think the issue is all about reflow as layout sill work fine with inline and in-page css rules. (please visit the test pages in comment 28)
Flags: needinfo?(ralin)
Hi Daniel
 
Per comment 25, we can get correct 'clientWidth' in video controls initialization with inline and internal css, but if we imported any external CSS, 'clientWidth' would return 0. I just wonder that from the aspect of layout, does different CSS source(inline,internal and external) affect size computation?

Sorry that I'm not familiar with layout and CSS parsing, it would be great if you can help to look at this issue. Thanks
Flags: needinfo?(dholbert)
(In reply to Ray Lin[:ralin] from comment #31)
> I just wonder that from the aspect of
> layout, does different CSS source(inline,internal and external) affect size
> computation?

Nope, the source doesn't matter.

It looks like in the test linked in comment 25, the <audio> element doesn't have a specified size -- and the external CSS file imposes a size on it. The HTML has:
> <audio controls preload="none" id="av" source="audio.wav" style="direction: rtl;"></audio>

and the external CSS has:
> audio, video {
>   width: 140px;
>   height: 100px;

So if you can work around the problem by removing the reference to the external CSS, then my bet would be that this is just a problem that happens for <audio> elements with a specified size.
Flags: needinfo?(dholbert)
Oh, I'm just seeing this from comment 28:
> (Any page with external css breaks video layout, even the external css is emtpy.)

If that ^^ quote is really accurate, then I'd guess that this is a race condition of some sort, where the pending external CSS file happens to delay our initial reflow a bit, and the state of the world ends up a bit different when we get to that initial reflow.  Something like that.

(Or it *might* be that the external CSS makes us trigger a second reflow after that external CSS file loads, and that second reflow is broken. Though if that were the case, I'd expect the problem to go away when the external CSS is empty -- and it doesn't sound like that's what happens. So this parenthesized idea is probably not the explanation.)
(I'll take a closer look with a debugger on Monday.)
Flags: needinfo?(dholbert)
(In reply to Daniel Holbert [:dholbert] from comment #34)
> (I'll take a closer look with a debugger on Monday.)

Thank you so much Daniel.

Just want to provide more what I've got so far. By listening to `resizevideocontrols` event[0], I have the same guess as yours mentioned in comment 33, a kind of race condition. I can get correct computed size while `resizevideocontrols` being triggered, but unfortunately it happens immediately after video controls' initial process...

Video controls used to work fine with the same logic [1][2], so I think there should be some differences between how we handle CSS rules in XUL namespace and HTML namespace. (maybe timing or priority?)

[0] https://dxr.mozilla.org/mozilla-central/rev/0eef1d5a39366059677c6d7944cfe8a97265a011/layout/generic/nsVideoFrame.cpp#390-393
[1] https://hg.mozilla.org/mozilla-central/file/62c5218b7325/toolkit/content/widgets/videocontrols.xml#l415
[2] http://searchfox.org/mozilla-central/rev/848c29538ab007fb95dc6cff194f0e3d3809613d/toolkit/content/widgets/videocontrols.xml#357-417
Attached file testcase 1
OK, here's a testcase. For now, let's not focus on the initial reflow -- let's focus on the hover-triggered reflows in this testcase (due to the "audio:hover { width: 150px}" rule that I've included).

When you hover, the expectation is that:
 (1) From layout, we detect the size change & dispatch "resizevideocontrols".
 (2) In videocontrols.xml, this "resizevideocontrols" event is caught & its handler re-evaluates which components will fit inside of the widget.

But that's not happening. The components aren't being updated to fit -- we've just got all of the components.

In particular, the problem here is with videocontrols.xml. We're taking the early return in its event handler, here:
 And specifically, the event handler is failing **because it takes this early return**:
>      adjustControlSize() {
>        if (!this.controlBar.minWidth || this.videocontrols.isTouchControls) {
>          return;
>        }
https://dxr.mozilla.org/mozilla-central/rev/0eef1d5a39366059677c6d7944cfe8a97265a011/toolkit/content/widgets/videocontrols.xml

...because this.controlBar.minWidth happens to be 0.  So, we never get to remove any components,  because we return before we see what will fit.
Flags: needinfo?(dholbert) → needinfo?(ralin)
So a partial fix *might* be to replace this "falseyness" check...
  !this.controlBar.minWidth
...with an undefined-ness check like so:
  (typeof this.controlBar.minWidth == 'undefined')

This lets us make some progress inside the handler function, at least. But it still doesn't give us a working layout -- it ends up with a collapsed height, for some reason. I don't really understand how we aim to set up the sizing for these components & handle their resizing/hiddenness, but hopefully ralin can take it from here!  I've been debugging by sprinkling dump("...\n"); statements liberally in videocontrols.xml, BTW. :) That may be the best tool that we have here.


(One more thing: I dug a little bit into "why does this only happen when there's an external stylesheet", and it's basically just because content additions get batched a bit differently between those cases. (Due to different HTML-parser progress from different amounts of content, or maybe from us wanting to avoid flash-of-unstyled-content when there are pending external style loads, or something like that.)  And that matters because it influences *when the <video> XBL constructor (and "setupInitialState()" in particular) gets invoked.  And that matters because setupInitialState has calls to .clientWidth and .clientHeight, which cause layout flushes.  So: depending on how the parser batches its content, our videocontrols XBL constructor may or may not be invoked at a time when there's a pending reflow -- and so it may or may not synchronously flush layout, and hence may or may not produce a nonzero value in this.controlBar.minWidth up-front.)
An alternate partial-fix might be to add something like this towards the bottom of setupInitialState:
         this.controlBar.minHeight = 40;
+        this.controlBar.minWidth = 100;  // [This line is new]
         this.scrubberStack.minWidth = 64;
         this.volumeControl.minWidth = 48;

I don't immediately know what these magic numbers represent, but from a naive incomplete first-glance, it seems like this.controlBar.minWidth is always 100 in the "working" scenario (with no external stylesheet).  So as long as we're setting all these other ones, we might as well set it as well...?  (But at the same time, it feels silly that we're setting these up as a .clientWidth/.clientHeight proxies, if we're just going to hardcode them...)

Anwyay, this has the same downside as my partial-solution in comment 38 -- it gets us into the handler function, but we still end up with a collapsed height, and it looks like videocontrols.xml thinks that the controls are forever hidden, for some reason. So, not a complete solution.

Also: *do not open devtools* when testing out the patches here.  If you have devtools open, it seems to make us re-trigger all the XBL constructors or something, which makes the bug fix itself. (So devtools effectively prevent the bug from being visible.)
(In reply to Daniel Holbert [:dholbert] from comment #39)
> Also: *do not open devtools* when testing out the patches here.

(er, I meant to say "testcases", not "patches" -- sorry)
Depends on: 1339269
So I suppose my debugging here has largely circled back to the question that ralin asked in comment 31 -- where he poiunted out that .clientWidth is returning 0 in the videocontrols setup code. So let me step back a bit and explain why that's happening (I think).

Right now, the XBL video controls' constructor (setupInitialState in particular) uses .clientWidth, which means it effectively assumes that we already have layout information (or that we can flush layout to produce layout information). But I think that's an invalid assumption, on the XBL constructor's part.  The XBL constructor may get invoked *during frame construction*, i.e. while we are building the frame tree for the first time.  This is before we have marked the tree as dirty.  In that case, its clientWidth/clientHeight queries *will not flush layout* (because there's nothing dirty that needs flushing yet) and so those clientWidth/clientHeight reads will just return 0.  So: the videocontrols XML needs to be more robust against that sort of thing, and needs to be able to update once actually do reflow & end up with a nonzero size later on.

Also: even in the absence of this external-stylesheet race-condition complexity, it turns out we can create a version of the same problem by simply using a <video> or <audio> element that initially has explicit "width:0", which we later remove.  I spun off helper bug 1339269 for that scenario. I think we need to fix that bug first (by making videocontrols.xml better-handle changes to/from 0 size).  And then once that's fixed, that'll either fix this bug here or get us closer to fixing this bug.
> An alternate partial-fix might be to add something like this towards the bottom of setupInitialState:
>          this.controlBar.minHeight = 40;
> +        this.controlBar.minWidth = 100;  // [This line is new]
>          this.scrubberStack.minWidth = 64;
>          this.volumeControl.minWidth = 48;
I suspect the hardcoded nsSize we used to have in layout code[0] somehow hid this problem. 

> So: the videocontrols XML needs to be more robust against that sort of thing, and needs to be able to update once actually do reflow & end up with a nonzero size later on.
so I think re-caching clientWidth/clientHeight every time `resizevideocontrols` occurred would help. I'll attach a patch to bug 1339269 and probably hope to fix the problem at once, then see if still any intermittent is reported here.

Thank you Daniel, I'm really grateful for your help.


[0] https://hg.mozilla.org/mozilla-central/file/2be680105fcd/layout/generic/nsVideoFrame.cpp#l601
Flags: needinfo?(ralin)
can we move forward on fixing this?
Flags: needinfo?(ralin)
Bug 1339269 (referenced directly above) landed on m-c today. And yes, it needs an uplift request once we verify on OrangeFactor that it fixed the failures.
Flags: needinfo?(ralin)
Assignee: nobody → ralin
Blocks: 1271765
No longer depends on: 1271765
Target Milestone: mozilla53 → mozilla54
Whiteboard: [stockwell fixed]
Recent Android mochitest-32 failures reported here will be handled in bug 1345350.
I believe we can call this FIXED by Bug 1339269 -- the last legitimate[1] report was on March 1st, from mozilla-aurora -- and that's the day we uplifted the fix (from Bug 1339269) to Aurora, so that's when we'd expect the reports would've stopped.

Here are the orangefactor reports for the past couple weeks:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1319653&startday=2017-02-27&endday=2017-03-15&tree=all

[1] There were two "illegitimate" reports (mis-starrings) on March 8th for mozilla-inbound, shown in the orangefactor link I provided above.  Those were both for an "application ran for longer than allowed maximum time" abort on Android Debug, which just happened to pop up when this was the current test being run.  That has nothing to do with this bug, and probably has nothing to do with this test -- IIRC it just means the testsuite as a whole is taking longer than we're expecting, and it's not something we can do much about aside from increasing the allowed testsuite time and/or sharding the testsuite into more pieces.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
(In reply to Daniel Holbert [:dholbert] from comment #49)
> [1] There were two "illegitimate" reports (mis-starrings) on March 8th for
> mozilla-inbound, shown in the orangefactor link I provided above.  Those
> were both for an "application ran for longer than allowed maximum time"
> abort on Android Debug, which just happened to pop up when this was the
> current test being run.  That has nothing to do with this bug, and probably
> has nothing to do with this test -- IIRC it just means the testsuite as a
> whole is taking longer than we're expecting, and it's not something we can
> do much about aside from increasing the allowed testsuite time and/or
> sharding the testsuite into more pieces.

(Oh, it looks like gbrown's comment 48 was about those two reports -- and we did indeed address that problem by sharding into more pieces over on bug 1345350. Hooray!)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: