1230641 - playback of mp4 in MacOS Firefox Nightly is bad

Reporter

Description

•

10 years ago

https://streamable.com/g5dv https://cdn.streamable.com/video/mp4/g5dv.mp4 Play back of this mp4 is pretty bad in Firefox. If you open it up in VLC or QuickTime, playback is fine.

Jean-Yves Avenard [:jya]

Comment 1

•

10 years ago

What version of FF were you using? IIRC, you mentioned on IRC that you used a 2015 MBPr 13" right?

Jean-Yves Avenard [:jya]

Comment 2

•

10 years ago

We do skip lots of frames on that video (this is FF developer edition 44) nearly half of them 1478 frames, 631 decoded but only 304 painted... Using FFmpeg h264 decoder it plays nicely. Interestingly, on iPhone it shows some pauses, always in the same spot @4s

Jean-Yves Avenard [:jya]

Comment 3

•

10 years ago

Video plays fine with a local central build... Can't think of any changes that would explain the behaviour on 44.

Jean-Yves Avenard [:jya]

Updated

•

10 years ago

Assignee: nobody → jyavenard

Jean-Yves Avenard [:jya]

Comment 4

•

10 years ago

This issue can be easily seen with the video set in full screen. This is a 60fps video. I added logs to find out the time it took for the decoder to decode a frame and how often MediaFormatReader::RequestVideoData is called What it shows is that 99% of the time, the video MediaDataPromise is typically resolved within 1ms. There are times however where the Apple VT PDM will take > 30ms to return a frame, this always happen at around 6.95ms with over 10 frames being queued for decoding. What is more worrying however, is that the time between calls to RequestVideoData is almost always > 16ms ; which is the bare minimum to handle 60fps. Looking at when the MDSM is rescheduled for another run shows that we hit this: http://mxr.mozilla.org/mozilla-central/source/dom/media/MediaDecoderStateMachine.cpp#2513 constntly I see mPlaybackRate being one, so we reschedule the MDSM every AUDIO_DURATION_USECS which is every 40ms. This appears to be a regression from https://hg.mozilla.org/mozilla-central/rev/1dc6e120ebff05eb39ed64d6cbedbd72c0ab5bc9

Flags: needinfo?(kikuo)

Flags: needinfo?(jwwang)

Jean-Yves Avenard [:jya]

Comment 5

•

10 years ago

I have a suspicion on why the mac VT decoder will sometimes takes over 30ms to decode a frame, will investigate that theory

Kilik Kuo [:kikuo] (inactive)

Comment 6

•

10 years ago

(In reply to Jean-Yves Avenard [:jya] from comment #4) > This issue can be easily seen with the video set in full screen. > This is a 60fps video. > > I added logs to find out the time it took for the decoder to decode a frame > and how often MediaFormatReader::RequestVideoData is called > > What it shows is that 99% of the time, the video MediaDataPromise is > typically resolved within 1ms. There are times however where the Apple VT > PDM will take > 30ms to return a frame, this always happen at around 6.95ms > with over 10 frames being queued for decoding. > > What is more worrying however, is that the time between calls to > RequestVideoData is almost always > 16ms ; which is the bare minimum to > handle 60fps. > Looking at when the MDSM is rescheduled for another run shows that we hit > this: > http://mxr.mozilla.org/mozilla-central/source/dom/media/ > MediaDecoderStateMachine.cpp#2513 > constntly > > I see mPlaybackRate being one, so we reschedule the MDSM every > AUDIO_DURATION_USECS which is every 40ms. > > This appears to be a regression from > https://hg.mozilla.org/mozilla-central/rev/ > 1dc6e120ebff05eb39ed64d6cbedbd72c0ab5bc9 VideoSink keeps updating rendered frames here [1]. It's related to the first timestamp(greater than current) in queue. Just like original logic it was. [1] https://hg.mozilla.org/mozilla-central/rev/1dc6e120ebff05eb39ed64d6cbedbd72c0ab5bc9#l4.375 As you said, the number of painted frames is much lower than decoded frames, this should be checked in VideoSink and underlying machinery. RequestVideoData() mostly is triggered when there's a decoded video data pushed into VideoQueue [2]. [2] http://mxr.mozilla.org/mozilla-central/source/dom/media/MediaDecoderStateMachine.cpp#705 That should not be much related to the (40ms) scheduling in MDSM which attempts to update playback position periodically. The problem here to me is the decoder taking too long for decode I think. Sorry I don't have a mac environment for further testing.

Flags: needinfo?(kikuo)

JW Wang [:jwwang] [:jw_wang]

Comment 7

•

10 years ago

It looks like bug 1218311 is not uplifted to 44. Does it happen to 45?

Flags: needinfo?(jwwang)

Jean-Yves Avenard [:jya]

Comment 8

•

10 years ago

(In reply to Kilik Kuo [:kikuo][:kilikkuo] from comment #6) > http://mxr.mozilla.org/mozilla-central/source/dom/media/ > MediaDecoderStateMachine.cpp#705 > That should not be much related to the (40ms) scheduling in MDSM which > attempts to update playback position periodically. The problem here to me is > the decoder taking too long for decode I think. > Sorry I don't have a mac environment for further testing. Then why the very long delay between calls to RequestVideoData? Even when the video promise is resolved in less than 1ms, the next call to RequestVideoData will typically be > 30ms after the previous one. We are mostly slow at decoding because RequestVideoData isn't called quickly enough

Jean-Yves Avenard [:jya]

Comment 9

•

10 years ago

(In reply to JW Wang [:jwwang] from comment #7) > It looks like bug 1218311 is not uplifted to 44. Does it happen to 45? yes, but it's not as bad as in 44. 44 will typically skips more frames, to the point where it's unwatchable

Kilik Kuo [:kikuo] (inactive)

Comment 10

•

10 years ago

(In reply to Jean-Yves Avenard [:jya] from comment #8) > (In reply to Kilik Kuo [:kikuo][:kilikkuo] from comment #6) > > http://mxr.mozilla.org/mozilla-central/source/dom/media/ > > MediaDecoderStateMachine.cpp#705 > > That should not be much related to the (40ms) scheduling in MDSM which > > attempts to update playback position periodically. The problem here to me is > > the decoder taking too long for decode I think. > > Sorry I don't have a mac environment for further testing. > > Then why the very long delay between calls to RequestVideoData? > Even when the video promise is resolved in less than 1ms, the next call to > RequestVideoData will typically be > 30ms after the previous one. > > We are mostly slow at decoding because RequestVideoData isn't called quickly > enough If VideoDataPromise can be resolved in less than 1ms, so that a OnVideoDecoded(or OnVideoNotDecoded) should be called to receive newly decoded data. Then a call to MDSM::Push, then to MDSM::DispatchDecodeTasksIfNeeded, then MDSM::EnsureVideoDecodeTaskQueued, and then a MDSM::RequestVideoData in final to trigger another round of video requesting. Unless there's no need to decode video in certain cases, one of the cases is that there're enough video data. But this should still doesn't make impact on videosink rendering frames.

JW Wang [:jwwang] [:jw_wang]

Comment 11

•

10 years ago

https://www.mozilla.org/en-US/firefox/44.0a2/auroranotes/ It looks like 44.0a2 is released on November 3 which doesn't include bug 1218311.

Jean-Yves Avenard [:jya]

Comment 12

•

10 years ago

You think it's bug 1218311 that caused the regression? I'm not entirely convinced it's a regression though, I've seen it in 44 too, albeit less severe

JW Wang [:jwwang] [:jw_wang]

Comment 13

•

10 years ago

bug 1218311 is about porting the fix of bug 1193614 to VidoeSink.

Ralph Giles (:rillian)

Comment 14

•

10 years ago

Is this still an issue?

Priority: -- → P1

Jean-Yves Avenard [:jya]

Comment 15

•

10 years ago

yes... though I had temporarily given up on it... When timing the apple's VT decoder itself, almost every frames took > 16ms to decode (and this a 60fps video). Which is super weird as FFmpeg has no problem with it. Yet, Apple using the same framework plays that video just fine.

Jean-Yves Avenard [:jya]

Updated

•

10 years ago

Assignee: jyavenard → nobody

LK506

Comment 16

•

10 years ago

I have an iMac and running on the latest OS, i have opened the provided link with the provided link, no problems has been found with the video feedback, tested it out on quickplayer and is running the same. https://cdn.streamable.com/video/mp4/g5dv.mp4

Jean-Yves Avenard [:jya]

Comment 17

•

10 years ago

which imac was that though? the issue can be seen on my MBP 2015 13" retina with an Intel Iris Graphics 6100. it happens consistently there. unfortunately, still no clue on why this would be happening on this laptop and why the intel decoder takes so long for some of the frames

Jean-Yves Avenard [:jya]

Updated

•

10 years ago

Assignee: nobody → jyavenard

Jean-Yves Avenard [:jya]

Comment 18

•

10 years ago

I can reproduce a similar issue using VLC when I enable Apple's VDA hardware decoder. Over 30% of frames are dropped, though the frames are dropped more smoothly than we do so the video is looks better with VLC. If I disable the hardware decoder, playback is still bad. However, disabling both the hardware decoder and the use of GPU surface to instead use software decoder: then it works okay. Looking at the CPU usage and the process running when playing this video with Safari, I can't see the VTDecoderXPCService process kicking in; so it appears Safari isn't using the hardware decoder either. So maybe what we should do is simply blacklist on mac the hardware decoder for certain resolution and framerate like we do on windows.

Jean-Yves Avenard [:jya]

Comment 19

•

10 years ago

When using the VT software decoder with software YUV buffer, playback plays almost smoothly, with very few frames being dropped (and skip to next keyframe logic never kicks in). When using IOSurface however the result is very bad. Seeing that we are always using IOSurface during decoding, this indicates that the decoder would be waiting for an IOSurface to be made available after use (used by the compositor) Also: with the default setting, the average time taken by the decoder to output a frame is 26404us. If after starting playback, I immediately tab out, this time drops to 14ms. The VideoToolbox framework uses a CVPixelBufferPool to allocate frames. VT provides an API to control the minimum number of frames available in this pool: @constant kVTDecompressionPropertyKey_OutputPoolRequestedMinimumBufferCount @abstract Requests that the VTDecompressionSession use the value provided as a minimum buffer count for its output CVPixelBufferPool, not releasing buffers while the number in use is below this level. @discussion This property effectively requests that the kCVPixelBufferPoolMinimumBufferCountKey key be used for the creation of the output CVPixelBufferPool. For general playback cases, standard CVPixelBufferPool age-out behaviour should be sufficient and this property should not be needed. This property should only be used in unusual playback scenarios where a peak pool level is known, and the potential memory overhead is an acceptable tradeoff for avoiding possible buffer reallocation. Setting this property to NULL or passing in the value 0 will clear this setting and remove the minimum buffer count. Setting this property while a VTDecompressionSession is in use will result in the creation of a new CVPixelBufferPool. This will cause new buffers to be allocated, and existing buffers to be deallocated when they are released.

Jean-Yves Avenard [:jya]

Comment 20

•

9 years ago

So still progressing here. I've timed various things. The video attached is made of 1422 samples. If I set the decode ahead value to 1422 (so there is never any wait between the time we feed the decoder with new data). It shows that the average time to decode a frame is 12ms, and the median time is 8ms. This is surprisingly low because the dummy decoder I wrote using identical decoding code takes an average of 8ms with a median time of 4ms. So it's much slower, but I can't explain why. Now I then timed the following: 1- Time between a call to RequestVideoData and the time the video promise is resolved 2- Time between when a video promise is resolved and the next RequestVideoData 3- Time between two call to MDM::OnVideoPopped With a decode ahead settings of 10. 1) min 52.83 max 424460.96 <- this is the time between the first frame decoding and the next, and can be ignored average 3023.913268 median 269.83 So this works quite well, while it may take a while for a video to be decoded from time to time, the majority of the time, it's short and as decode ahead works as expected, it explain the low average. 2) min 136.53 max 7050229.88 average 18543.18263 median 13203.22 now this is bad. 18.5ms average between requests makes it impossible to play a 60fps video properly. 3) min 20 max 91748.69 average 6338.207178 median 2930.62 this is allright, it shows how often the video queue pop a sample. Now, it's puzzling on why time between OnVideoPopped be 6ms, yet the time between RequestVideoData is so high. Logs shows that MDSM::NeedToDecodeVideo() typically returns false. So no video decoding is queued.

Jean-Yves Avenard [:jya]

Comment 21

•

9 years ago

JW, what is the logic on how often RequestVideoData will be called, and what could explain why such delays are seen? thanks

Flags: needinfo?(jwwang)

Jean-Yves Avenard [:jya]

Comment 22

•

9 years ago

ok, found out what's going on. Because it's HW decoding, a new request will only be made if the VideoQueue size is < 3. Additionally, VideoQueue().GetSize() shows at 3 even as OnVideoPopped got called. Sounds like a race between the time the nsdequeue is popped and the time the MediaEventProducer listener is called.

Flags: needinfo?(jwwang)

JW Wang [:jwwang] [:jw_wang]

Comment 23

•

9 years ago

OnVideoPopped() is called asynchronously when video samples are popped from the queue. Therefore it is possible the queue is filled again by the time OnVideoPopped() is called. However, it should be fine since DispatchVideoDecodeTaskIfNeeded() always check if the queue is full before really dispatching decoding tasks. What race are you concerned about?

Jean-Yves Avenard [:jya]

Comment 24

•

9 years ago

(In reply to JW Wang [:jwwang] from comment #23) > What race are you concerned about? it turned out to be something else. Most of the time the VideoSink pop a video frame, it immediately push it back https://dxr.mozilla.org/mozilla-central/source/dom/media/mediasink/VideoSink.cpp#399 so the task that is being dispatched serves no purpose whatsoever. More often than not however, logs show that the reschedule of the media sink is set to be 39ms (basically to coincide when the next frame is about to be popped from the video queue). That means that the VideoSink calls the MDSM that it needs a new frame right when it's already too late to get one (as the video decoder takes sometimes > 16ms). The time the media sink is being put on hold is wasted, when it could have scheduled the MDSM to get a new frame earlier instead. Everything works well if the decoder is very fast and can decode a frame in less than the time it took to display the current one (e.g. < 16ms) However, in this particular video, while the average time to decode is about 12ms, the variance is great (e.g it goes like 2ms, 2ms, 3ms, 38ms, 29ms, 2ms) As when HW accelerated the queue is only 3 frames deep, being slow to decode a single frame is enough to kick the skip to the next key frame logic. If things were done more regularly, decoding in advance, and more importantly, the MDSM requesting the frames more often (requesting frames every 18ms is never going to allow playback of a 60fps video) we would have a chance to play. right now, it fails miserably. I'm afraid however that this bug only reveals a fundamental issue with our entire gecko architecture, we have way too many threads going, so task dispatches are slow (I've seen up to 18ms between the time a task is queued on the reader's task queue and the time it actually runs)

JW Wang [:jwwang] [:jw_wang]

Comment 25

•

9 years ago

(In reply to Jean-Yves Avenard [:jya] from comment #24) > More often than not however, logs show that the reschedule of the media sink > is set to be 39ms (basically to coincide when the next frame is about to be > popped from the video queue). This shouldn't be true. The video sink is scheduled to wake up at the start time of the next frame. E.g. if the current time is 100ms, and the start time of next video frame is 110ms, video sink will wake up 10ms later to render the next frame.

Jean-Yves Avenard [:jya]

Comment 26

•

9 years ago

(In reply to JW Wang [:jwwang] from comment #25) > This shouldn't be true. The video sink is scheduled to wake up at the start > time of the next frame. E.g. if the current time is 100ms, and the start > time of next video frame is 110ms, video sink will wake up 10ms later to > render the next frame. not quite, it is scheduled to wake up after the start of the next frame whose time is past the current time if (nextFrame->mTime > clockTime) { remainingTime = nextFrame->mTime - clockTime; break; } so if frames are dropped because they are too late, the delay will be greater than a frame duration. I added logs in VideoSink doing: TimeDuration next = TimeDuration::FromMicroseconds(remainingTime / mAudioSink->GetPlaybackParams().mPlaybackRate); TimeStamp target = nowTime + next; fprintf(stderr, "rescheduling in:%.02f queuesize:%u\n", next.ToSeconds() * 1000000, (uint32_t)VideoQueue().GetSize()); and in the logs, I see plenty of burst with: rescheduling in:39885.00 queuesize:11 popping image at front pushing image at front rescheduling in:39885.00 queuesize:11 OnVideoPopped, time since last popped: 42965.02 UpdateplaybackPositionPeriodically ScheduleStateMachineIn: 40000 popping image at front pushing image at front rescheduling in:39885.00 queuesize:11 OnVideoPopped, time since last popped: 36108.73 on this run, the average time between calls to RequestVideoData is 18543.18263us

JW Wang [:jwwang] [:jw_wang]

Comment 27

•

9 years ago

If frames are dropped, they don't even exist to the video sink. And the video sink still wake up at the start time of "next" frame, right?

Jean-Yves Avenard [:jya]

Comment 28

•

9 years ago

why wouldn't they exist in the videosink ? https://dxr.mozilla.org/mozilla-central/source/dom/media/mediasink/VideoSink.cpp#385 loops over all frames in the video queue until it finds the first whose start time is > clock, all others are dropped.

Jean-Yves Avenard [:jya]

Comment 29

•

9 years ago

(In reply to JW Wang [:jwwang] from comment #27) > If frames are dropped, they don't even exist to the video sink. And the > video sink still wake up at the start time of "next" frame, right? yes, it will wake up at the start time of the next frame, but it does nothing until then. So OnVideoPopped isn't called, and as such RequestVideoData isn't called. While we obviously have issues on this video on this particular machine speed wise. Decoding in advance rather than waiting until it's too late to decode would allow to play this video.

JW Wang [:jwwang] [:jw_wang]

Comment 30

•

9 years ago

(In reply to Jean-Yves Avenard [:jya] from comment #29) > yes, it will wake up at the start time of the next frame, but it does > nothing until then. So OnVideoPopped isn't called, and as such > RequestVideoData isn't called. https://dxr.mozilla.org/mozilla-central/source/dom/media/MediaDecoderStateMachine.cpp#704 MDSM will also request next samples when a sample is pushed to the queue. In fact, MDSM always tries to decode as much as possible to keep the queues full without intervention from video sink.

Jean-Yves Avenard [:jya]

Comment 31

•

9 years ago

As I wrote, the average time between the moment the video data promise is resolved and the next RequestVideoData is 18ms, so the MDSM does a poor job at keeping the queue full in time :( https://docs.google.com/spreadsheets/d/1gtApsRPRRFhwy1zto3tnAM7ed5SexAFoRJOCsSIli04/edit?usp=sharing on tab 2 you see the various times I mentioned in comment 20

Jean-Yves Avenard [:jya]

Comment 32

•

9 years ago

Now of course, it could just be the MDSM task queue is too busy processing other tasks, or the overall system is too busy to action anything

Jean-Yves Avenard [:jya]

Updated

•

9 years ago

Comment 33

•

9 years ago

Attached file playback60fps.trace.zip — Details

this is a 34s profiler run when playing the video. The biggest chunk of time is spent in The compositor thread takes 17% of CPU time Still the total amount of CPU time is 26.7s, so it's not particularly huge seeing that this machine is a dual core. compositor biggest task is: Running Time Self (ms) Symbol Name 3042.0ms 11.3% 1.0 mozilla::layers::LayerManagerComposite::UpdateAndRender() main thread biggest tasks are: Running Time Self (ms) Symbol Name 2835.0ms 10.5% nsRefreshDriver::Tick(long long, mozilla::TimeStamp) 2541.0ms 9.4% mozilla::dom::HTMLMediaElement::nsAsyncEventRunner::Run()

Jean-Yves Avenard [:jya]

Comment 34

•

9 years ago

Attached file GPUprofiler60fps.trace.zip — Details

JW pointed out on how bad performance was when using full screen. And indeed, decoding time slows to a crawl when playing in full screen. Matt, anything out of the ordinary coming out here? CPU usage is almost the same as when in windowed mode.

Flags: needinfo?(matt.woodrow)

Jeff Muizelaar [:jrmuizel]

Updated

•

9 years ago

Flags: needinfo?(jmuizelaar)

JW Wang [:jwwang] [:jw_wang]

Comment 35

•

9 years ago

FYI. Bug 1250570 comment 3 indicates high GPU usage on https://cdn.streamable.com/video/mp4/g5dv.mp4.

Jean-Yves Avenard [:jya]

Comment 36

•

9 years ago

Attached file MozReview Request: Bug 1230641: P1. Use UYVY (YUV422) format in decoders. r?mattwoodrow — Details

Review commit: https://reviewboard.mozilla.org/r/39921/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/39921/

Attachment #8730480 - Flags: review?(matt.woodrow)

Jean-Yves Avenard [:jya]

Comment 37

•

9 years ago

Attached file MozReview Request: Bug 1230641: P2. Increase the video queue size on mac. r?cpearce — Details

The mac decoder when used on intel GPU gives bad performance due to locking. Increasing the queue size allows to alleviate most problems. Review commit: https://reviewboard.mozilla.org/r/39923/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/39923/

Attachment #8730481 - Flags: review?(cpearce)

Chris Pearce [:cpearce (Not reading bugmail)]

Updated

•

9 years ago

Attachment #8730481 - Flags: review?(cpearce) → review+

Chris Pearce [:cpearce (Not reading bugmail)]

Comment 38

•

9 years ago

Comment on attachment 8730481 [details] MozReview Request: Bug 1230641: P2. Increase the video queue size on mac. r?cpearce https://reviewboard.mozilla.org/r/39923/#review36483

Matt Woodrow (:mattwoodrow)

Assignee

Comment 39

•

9 years ago

Comment on attachment 8730480 [details] MozReview Request: Bug 1230641: P1. Use UYVY (YUV422) format in decoders. r?mattwoodrow https://reviewboard.mozilla.org/r/39921/#review36501

Attachment #8730480 - Flags: review?(matt.woodrow) → review+

JW Wang [:jwwang] [:jw_wang]

Comment 40

•

9 years ago

(In reply to Jean-Yves Avenard [:jya] from comment #36) > Created attachment 8730480 [details] > MozReview Request: Bug 1230641: P1. Use UYVY (YUV422) format in decoders. > r?mattwoodrow > > Review commit: https://reviewboard.mozilla.org/r/39921/diff/#index_header > See other reviews: https://reviewboard.mozilla.org/r/39921/ How does changing format improve playback?

Jean-Yves Avenard [:jya]

Comment 41

•

9 years ago

> How does changing format improve playback? when using YUV420P, we end up having two CVPixelBufferPool, one used by the hardware decoder, and one used to convert the data and pass it to the client (the AppleVTDecoder). By using YUV422, the CVPixelBufferPool is shared internally halving the memory usage of the decoder and shelving some times as the conversion from the format returned by the decoder (UYVY) doesn't need to be converted to NV12. Additionally (and more importantly). UYVY surface can be uploaded to an OGL surface and handled as if it was RGB. So the compositor no longer need a shader to convert UYVY into RGB and composite things (see bug 1248323) Those two patches do not fix the main issue, they just help.

Depends on: 1248323

Jean-Yves Avenard [:jya]

Updated

•

9 years ago

Keywords: leave-open

Pulsebot

Comment 42

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/f8c2b7248428 https://hg.mozilla.org/integration/mozilla-inbound/rev/97571d40a08a

Carsten Book [:Tomcat]

Comment 43

•

9 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/f8c2b7248428 https://hg.mozilla.org/mozilla-central/rev/97571d40a08a

Matt Woodrow (:mattwoodrow)

Assignee

Updated

•

9 years ago

Flags: needinfo?(matt.woodrow)

Anthony Jones (:ajones, :kentuckyfriedtakahe, :k17e)

Updated

•

9 years ago

Comment 44

•

9 years ago

http://people.mozilla.org/~jyavenard/mediatest/g5dvv.mp4

Jean-Yves Avenard [:jya]

Comment 45

•

9 years ago

Attached patch Timing log module AppleTiming (obsolete) — Details — Splinter Review

MozReview-Commit-ID: JPKhjyycpJG

Jean-Yves Avenard [:jya]

Comment 46

•

9 years ago

Attached patch Timing log module AppleTiming — Details — Splinter Review

MozReview-Commit-ID: JPKhjyycpJG

Jean-Yves Avenard [:jya]

Updated

•

9 years ago

Attachment #8762675 - Attachment is obsolete: true

Jean-Yves Avenard [:jya]

Comment 47

•

9 years ago

With the help of :mstange, I created this project https://github.com/jyavenard/DecodeTest It decodes and displays this video in a minimal application. It's very rough, being on PTO, my aim was only to get something quick Decoding is slightly different to how it would occur within gecko, in that the demuxing+feeding of the decoder is performed on the main thread. With the provided shader, the majority of the time is spent in [mContext flushBuffer]; in the function drawme. I can do a more similar threading model as gecko if you think that could help.

Matt Woodrow (:mattwoodrow)

Assignee

Comment 48

•

9 years ago

Attached patch Decode test experiments — Details — Splinter Review

Experimentation based on top of 1bab81ac53da1b755fc83384c7f12b26f5554752. I played around with this for a while, adding a proper frame queue and trying to copy more exactly what gecko does. I found that when NSOpenGLCPSurfaceOpacity was set to 0 (what gecko does unconditionally) and when the NSTitledWindowMask window mask flag was removed (what gecko does during fullscreen) then performance was awful (20fps roughly). mstange noticed the connection between these two and suggested trying [window setOpaque:YES], which unfortunately doesn't help. The slowdown appears to be within [NSOpenGLContext flushBuffer] for a double buffered GL context, and within glFlush for a single buffered one. The time is spent waiting on mach_msg_trap. It's not really clear why having a transparent context affects performance so much, or why removing the window title makes it show up. I believe we have had somewhat poor performance even in windowed mode in gecko, so removing the window title may just exacerbate the opacity issue rather than actually cause it.

Assignee: jyavenard → matt.woodrow

Matt Woodrow (:mattwoodrow)

Assignee

Comment 49

•

9 years ago

Attached patch Make the GLContext opaque when we can — Details — Splinter Review

Unfortunately we can't have an opaque context at all times since we need transparency to render rounded border on our window. The poor video playback seems to be primarily when in fullscreen, so we can switch to using an opaque context during fullscreen and get much improved performance. I'm not sure if there's much we can do for windowed without giving up on rounded corners. We could try asking Apple if there's a solution for this? It would be interesting to see if other performance is improved with opaque contexts, it might be significant enough to accept the loss of rounded corners.

Attachment #8776827 - Flags: review?(mstange)

Markus Stange [:mstange]

Comment 50

•

9 years ago

Comment on attachment 8776827 [details] [diff] [review] Make the GLContext opaque when we can Review of attachment 8776827 [details] [diff] [review]: ----------------------------------------------------------------- Looks good. It's possible that starting with 10.10 we can make the context opaque even for regular windows, because the window server will apply the rounding. We'll need to experiment with that again. pcwalton told me that servo always uses an opaque context, and they get the rounding by using a private API to describe a window shape region to the window server, but that doesn't give them anti-aliasing in the rounded corner. Maybe we can do better. ::: widget/cocoa/nsChildView.mm @@ +2017,5 @@ > CGFloat cornerRadius = [(ChildView*)mView cornerRadius]; > mDevPixelCornerRadius = cornerRadius * BackingScaleFactor(); > mIsCoveringTitlebar = [(ChildView*)mView isCoveringTitlebar]; > NSInteger styleMask = [[mView window] styleMask]; > + bool isFullscreen = (styleMask & NSFullScreenWindowMask) || !(styleMask & NSTitledWindowMask); "!(styleMask & NSTitledWindowMask)"? That works, I guess, but only until we start accelerating popup windows. And it's existing code. I'll have to keep this in mind for when we actually want to start accelerating popup windows.

Attachment #8776827 - Flags: review?(mstange) → review+

Pulsebot

Comment 51

•

9 years ago

Pushed by mwoodrow@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/bfc47d8a87ef Make our NSOpenGLContext opaque when in fullscreen mode. r=mstange

Carsten Book [:Tomcat]

Comment 52

•

9 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/bfc47d8a87ef

Markus Stange [:mstange]

Updated

•

9 years ago

Depends on: 1292288

Markus Stange [:mstange]

Comment 53

•

9 years ago

I think we should back this out due to bug 1292288.

Jean-Yves Avenard [:jya]

Updated

•

9 years ago

Status: NEW → RESOLVED

Closed: 9 years ago

Keywords: leave-open

Resolution: --- → FIXED

Myk Melez [:myk] [@mykmelez]

Updated

•

9 years ago

Depends on: 1295217

Stefan [:stefanh]

Updated

•

9 years ago

Depends on: 1297879

Ryan VanderMeulen [:RyanVM]

Updated

•

9 years ago

status-firefox51: --- → fixed

Target Milestone: --- → mozilla51

playback60fps.trace.zip 9 years ago Jean-Yves Avenard [:jya] 7.01 MB, application/zip		Details
GPUprofiler60fps.trace.zip 9 years ago Jean-Yves Avenard [:jya] 4.87 MB, application/zip		Details
MozReview Request: Bug 1230641: P1. Use UYVY (YUV422) format in decoders. r?mattwoodrow 9 years ago Jean-Yves Avenard [:jya] 58 bytes, text/x-review-board-request	mattwoodrow : review+	Details
MozReview Request: Bug 1230641: P2. Increase the video queue size on mac. r?cpearce 9 years ago Jean-Yves Avenard [:jya] 58 bytes, text/x-review-board-request	cpearce : review+	Details
Timing log module AppleTiming 9 years ago Jean-Yves Avenard [:jya] 4.72 KB, patch		Details \| Diff \| Splinter Review
Timing log module AppleTiming 9 years ago Jean-Yves Avenard [:jya] 5.39 KB, patch		Details \| Diff \| Splinter Review
Decode test experiments 9 years ago Matt Woodrow (:mattwoodrow) 5.57 KB, patch		Details \| Diff \| Splinter Review
Make the GLContext opaque when we can 9 years ago Matt Woodrow (:mattwoodrow) 3.28 KB, patch	mstange : review+	Details \| Diff \| Splinter Review