Closed Bug 1307546 Opened 8 years ago Closed 8 years ago

memory burst and stop responding when playing specific video

Categories

(Core :: Audio/Video: Playback, defect, P1)

50 Branch
All
Windows 10
defect

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox49 --- unaffected
firefox50 + fixed
firefox51 - fixed
firefox52 --- verified

People

(Reporter: yfdyh000, Assigned: cpearce)

References

Details

(Keywords: memory-footprint, regression)

Attachments

(2 files, 1 obsolete file)

Caused by bug 1258870, tested by mozregression. I can reproduce it on Fx50b4 ~ Nightly 52, whatever e10s and 32/64bit. Steps to reproduce: Install the https://greasyfork.org/scripts/22441-bilibili-html5%E6%92%AD%E6%94%BE%E5%99%A8 to activating HTML5 mode for bilibili.com. You can turn off it after the first visit. Open the http://www.bilibili.com/video/av6535800/ (I only found this example). Click the video to play, wait at least ten seconds, as well as observe the Firefox's process footprint. Actual results: Within 10 seconds of the start of the playback, the content process occupies many CPU resources, and the UI stops responding, the memory usage of the content process is growing. Expected results: Smooth playback, low CPU and memory usage.
Has Regression Range: --- → yes
Has STR: --- → yes
You can also get to this HTML5 site by spoofing the Safari user agent. I posted the video file to help isolate the issue: http://media.junglecode.net/test/1307546/10632694-1hd.mp4 Yang: does it still happen for you if you view the URL above in Firefox?
Flags: needinfo?(yfdyh000)
Yes, I can reproduce it with this url in Firefox 50.0b4 with clean profile. Thanks for posting this video, I did not get a working sample before.
Flags: needinfo?(yfdyh000)
Chris: can you fire this up in a debugger and see what comes back? Thx!
Flags: needinfo?(cpearce)
Can't see CPU spikes on Linux by running `./mach run http://media.junglecode.net/test/1307546/10632694-1hd.mp4`. Hi Yang, Can you repro the issue on Mac or Linux?
Flags: needinfo?(yfdyh000)
Please try to reproduce it using Firefox Beta on Windows. I see the 100% CPU during playback and no playback started, buffer and adjust the time is normal, using http://ftp.mozilla.org/pub/firefox/releases/50.0b6/linux-x86_64/en-US/ on Lubuntu 14.04 x64, I'm not see the memory burst on the Linux.
Flags: needinfo?(yfdyh000)
JW, can you try comment 6?
Flags: needinfo?(jwwang)
I can't repro the issue with http://ftp.mozilla.org/pub/firefox/releases/50.0b6/linux-x86_64/en-US/. There is not CPU spikes nor memory burst.
Flags: needinfo?(jwwang)
Hi Alastor, Can you try to repro this issue on Windows?
Flags: needinfo?(alwu)
I didn't see the memory burst in beta 50.0b10 on Windows 7, the video playback is smooth and I can still interact with UI interface. But I observed the CPU usage when using HTML5 player is indeed higher than using flash player. When using the HTML5 player, the CPU usage is 56~60%. When using flash player, the CPU usage is 15~20%. Maybe the problem is their HTML player doesn't be implemented well causing the high CPU usage rate?
Flags: needinfo?(alwu)
It seems to have been alleviated in 50.0b10, I can still reproduce it with https://ftp.mozilla.org/pub/firefox/releases/50.0b4/win32/. Can you try the 50.0b4 to make sure this is not my environmental problems? In 50.0b10, the memory GC looks normal, but CPU usage is still full while playing.
Flags: needinfo?(alwu)
(In reply to YF (Yang) from comment #11) > It seems to have been alleviated in 50.0b10, I can still reproduce it with > https://ftp.mozilla.org/pub/firefox/releases/50.0b4/win32/. > > Can you try the 50.0b4 to make sure this is not my environmental problems? > > > In 50.0b10, the memory GC looks normal, but CPU usage is still full while > playing. Same result as comment10, I didn't see the memory burst in 50.0b4.
Flags: needinfo?(alwu)
So, what should I do to get more debugging information? Added, I can still reproduce it in latest Aurora and Nightly with clear profile and comment 2, it seems be relieved in the latest Beta only.
Hmm. Did something recently change in this area in beta?
Flags: needinfo?(ajones)
(In reply to Andrew Overholt [:overholt] from comment #14) > Hmm. Did something recently change in this area in beta? Not that I'm aware of.
Flags: needinfo?(ajones)
I cannot reproduce this issue with 50b10. Yang: can you please open about:support and "Copy text to clipboard" and post that here?
Flags: needinfo?(cpearce)
We are definitely using more CPU in 50b10 than in 49 release on my machine. I will run a mozregression.
Mozregression points to bug 1258870, as reported by Yang. Bummer.
[Tracking Requested - why for this release]: This causes a spike in CPU usage when playing back video. It's a regression from bug 1258870. I propose we back out bug 1258870 from beta and rebuild the RC, and fix bug 1258870 in 51. That's a shame, as bug 1258870 fixed a nasty A/V sync issue affecting Amazon Prime video.
To help understand the severity, on my i7 4670K desktop, I'm seeing about 20% CPU utilization with the regression, and about 1% without. The reporter of this bug says Firefox is unusable when playing HD video with the regression on his machine.
To back this out cleanly, we would need to backout the following changesets (in top to bottom order) from the 'release' branch: changeset: 350216:7852ac7e74fa user: Chris Pearce <cpearce@mozilla.com> date: Fri Aug 19 21:34:42 2016 +1200 summary: Bug 1295630 - Provide a way for A/V sync to be ruined, so talos can test compositor throughput by painting expired frames. r=mattwoodrow, a=ritu changeset: 350150:f04b44d05fcc user: Jean-Yves Avenard <jyavenard@mozilla.com> date: Mon Aug 29 15:46:56 2016 +1000 summary: Bug 1298594 - P4. Pop the frame when current time is past the end of the current frame. r=jwwang, a=ritu changeset: 349946:21fd3ea62d8c user: Chris Pearce <cpearce@mozilla.com> date: Mon Aug 15 13:35:52 2016 +1200 summary: Bug 1258870 - Don't push late video frames to the compositor, drop them. r=jwwang,a=ritu
The problem is in VideoSink::UpdateRenderedVideoFrames(), here: http://searchfox.org/mozilla-central/rev/f5c9e9a249637c9abd88754c8963ecb3838475cb/dom/media/mediasink/VideoSink.cpp#445 The expression (nextFrameTime - clockTime) is negative. This results in us setting a timeout to run UpdateRenderedVideoFrames() again immediately. The expression is negative because the file is badly muxed. When we first go bad, we hit two consecutive frames with these time ranges: [100000,200000] [67000,133000], that is the frame cover mutually ovelapping time ranges, and don't have monotonically increasing start times. In this example, if clockTime was 100001, clockTime would be greater than the start of the first frame (100000) but the start of the next frame (67000) minus clockTime is negative. So we end up hot-looping. This is fortunate, since it means we only perform badly on badly muxed files. I have no idea how common files with this particular variation of bad muxing are.
I've seen a lot of files with out of order pts. All the sample files used in wpt to start with. Few amazon files are like this too
Comment on attachment 8807414 [details] Bug 1307546 - Ensure we don't set timers with negative intervals in to update A/V sync. https://reviewboard.mozilla.org/r/90550/#review90284
Attachment #8807414 - Flags: review+
Attachment #8807414 - Flags: review?(jwwang)
Assignee: nobody → cpearce
Comment on attachment 8807414 [details] Bug 1307546 - Ensure we don't set timers with negative intervals in to update A/V sync. https://reviewboard.mozilla.org/r/90550/#review90316 ::: dom/media/mediasink/VideoSink.cpp:444 (Diff revision 1) > if (frames.Length() < 2) { > return; > } > > int64_t nextFrameTime = frames[1]->mTime; > + int64_t delta = (nextFrameTime > clockTime) ? (nextFrameTime - clockTime) For small timeouts, the timer might fire immediately without the audio clock being progressed at all (depending on the resolution of the audio clock). In this situation the timer will hog the CPU for a short time and cause CPU usage peaks. We might want to do delta = std::max(nextFrameTime - clockTime, SOME_THRESHOLD); to avoid the situation.
Comment on attachment 8807414 [details] Bug 1307546 - Ensure we don't set timers with negative intervals in to update A/V sync. https://reviewboard.mozilla.org/r/90550/#review90316 > For small timeouts, the timer might fire immediately without the audio clock being progressed at all (depending on the resolution of the audio clock). In this situation the timer will hog the CPU for a short time and cause CPU usage peaks. > > We might want to do > delta = std::max(nextFrameTime - clockTime, SOME_THRESHOLD); > to avoid the situation. if doing so, can't use the 30fps constant defined here!
Pushed by cpearce@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d5a00a0b689c Ensure we don't set timers with negative intervals in to update A/V sync. r=jya
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Hello Yang, could you please verify this issue is fixed as expected on a latest (11/05) Nightly build? Thanks!
Flags: needinfo?(yfdyh000)
Comment on attachment 8807414 [details] Bug 1307546 - Ensure we don't set timers with negative intervals in to update A/V sync. Requesting uplift for Firefox 50 and 51. Approval Request Comment [Feature/regressing bug #]: Bug 1307546, regression from fixing A/V sync when the computer is too slow to playback video in real time. [User impact if declined]: Some users on lower end hardware will find that video playback on some videos which are incorrectly encoded will cause unnecessarily high CPU usage, making Firefox not respond well or even lock up the UI. [Describe test coverage new/current, TreeHerder]: It's very hard to write automated tests for a/v sync. [Risks and why]: Low; this patch detects the degenerate case, and backs off to make it not hurt. The usual case of a video which is correctly encoded will not have its behaviour changed. [String/UUID change made/needed]: None.
Attachment #8807414 - Flags: approval-mozilla-release?
Attachment #8807414 - Flags: approval-mozilla-aurora?
Comment on attachment 8807414 [details] Bug 1307546 - Ensure we don't set timers with negative intervals in to update A/V sync. We discussed this over email, this is a low risk fix in that it does not alter playback for correctly encoded video files. this fixes the case when we encounter a badly encoded file which leads to CPU usage spike. Aurora51+, Beta50+
Attachment #8807414 - Flags: approval-mozilla-release?
Attachment #8807414 - Flags: approval-mozilla-release+
Attachment #8807414 - Flags: approval-mozilla-aurora?
Attachment #8807414 - Flags: approval-mozilla-aurora+
Backed out from central (and autoland and inbound) for increased mda test_BufferingWait_mp4.html failure on Windows 7: https://hg.mozilla.org/mozilla-central/rev/143ac19c32251f3cdbef47c7d1998fc870762b19 Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=d5a00a0b689c385afd53b7b93f41e244976e613f Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=6198971&repo=autoland 07:00:49 INFO - TEST-START | dom/media/mediasource/test/test_BufferingWait_mp4.html 07:06:07 INFO - TEST-INFO | started process screenshot 07:06:07 INFO - TEST-INFO | screenshot: exit 0 07:06:07 INFO - Buffered messages logged at 07:00:49 07:06:07 INFO - TEST-PASS | dom/media/mediasource/test/test_BufferingWait_mp4.html | Receive a sourceopen event 07:06:07 INFO - TEST-PASS | dom/media/mediasource/test/test_BufferingWait_mp4.html | Should only receive one sourceopen for this test 07:06:07 INFO - TEST-PASS | dom/media/mediasource/test/test_BufferingWait_mp4.html | Create a SourceBuffer 07:06:07 INFO - TEST-PASS | dom/media/mediasource/test/test_BufferingWait_mp4.html | fetchWithXHR load uri='bipbop/bipbopinit.mp4' status=200 07:06:07 INFO - Loading buffer: [0, 1395) 07:06:07 INFO - SourceBuffer buffered ranges grew from TimeRanges: to TimeRanges: 07:06:07 INFO - TEST-PASS | dom/media/mediasource/test/test_BufferingWait_mp4.html | fetchWithXHR load uri='bipbop/bipbop1.m4s' status=200 07:06:07 INFO - Loading buffer: [0, 24424) 07:06:07 INFO - SourceBuffer buffered ranges grew from TimeRanges: to TimeRanges: [0.095, 0.835918) 07:06:07 INFO - TEST-PASS | dom/media/mediasource/test/test_BufferingWait_mp4.html | fetchWithXHR load uri='bipbop/bipbop2.m4s' status=200 07:06:07 INFO - Loading buffer: [0, 22205) 07:06:07 INFO - SourceBuffer buffered ranges grew from TimeRanges: [0.095, 0.835918) to TimeRanges: [0.095, 1.625396) 07:06:07 INFO - Buffered messages logged at 07:00:50 07:06:07 INFO - TEST-PASS | dom/media/mediasource/test/test_BufferingWait_mp4.html | fetchWithXHR load uri='bipbop/bipbop5.m4s' status=200 07:06:07 INFO - Loading buffer: [0, 18367) 07:06:07 INFO - SourceBuffer buffered ranges grew from TimeRanges: [0.095, 1.625396) to TimeRanges: [0.095, 1.625396)[3.298333, 4.040272) 07:06:07 INFO - Playing video. It should play for a bit, then fire 'waiting' 07:06:07 INFO - Got a waiting event at 0 07:06:07 INFO - Buffered messages logged at 07:00:51 07:06:07 INFO - Got a waiting event at 1.623945 07:06:07 INFO - TEST-PASS | dom/media/mediasource/test/test_BufferingWait_mp4.html | Reached target time of: 1.57895 07:06:07 INFO - TEST-PASS | dom/media/mediasource/test/test_BufferingWait_mp4.html | fetchWithXHR load uri='bipbop/bipbop3.m4s' status=200 07:06:07 INFO - Loading buffer: [0, 24013) 07:06:07 INFO - SourceBuffer buffered ranges grew from TimeRanges: [0.095, 1.625396)[3.298333, 4.040272) to TimeRanges: [0.095, 2.414875)[3.298333, 4.040272) 07:06:07 INFO - Buffered messages logged at 07:00:52 07:06:07 INFO - Got a waiting event at 2.168117 07:06:07 INFO - Buffered messages finished 07:06:07 INFO - TEST-UNEXPECTED-FAIL | dom/media/mediasource/test/test_BufferingWait_mp4.html | Test timed out. 07:06:07 INFO - reportError@SimpleTest/TestRunner.js:114:7 07:06:07 INFO - TestRunner._checkForHangs@SimpleTest/TestRunner.js:135:7 07:06:07 INFO - setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:156:5 List of failures on OrangeFactor: https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1293170&startday=2016-10-01&endday=2016-11-05&tree=all
Status: VERIFIED → REOPENED
Flags: needinfo?(cpearce)
Resolution: FIXED → ---
Despite that status flag change, it's not backed out on either mozilla-release or mozilla-aurora. firefox50's status in particular is unchanged: I did 50 retriggers, opt and debug e10s and not, in https://treeherder.mozilla.org/#/jobs?repo=mozilla-release&revision=8e32fc90a6aadb62ab464ef56a53da8a5d66091a and I don't see anything there other than the slight possibility of an increase in the rate of bug 1282260, which probably isn't real (50 didn't get whatever magic caused that to go away on 52, and 50 runs is almost three weeks worth of mozilla-beta - in that time on beta we got 2 failures, and I triggered up 7, maybe something, maybe nothing). firefox51 is actually worse than 52, since there it's a timeout in test_BufferingWait_mp4.html and a hang in test_DrainOnMissingData_mp4.html killing the rest of the run, so we frequently aren't actually running all the media tests on 51 anymore.
Target Milestone: mozilla52 → ---
I wouldn't disable the test. It's unreliable due to skip to next keyframe logic being randomly activated. This occurs when a machine is too slow to keep up A/V sync and video frames will be dropped. We have a pending bug opened to disable skip to next keyframe during testing
Given that this is failing on m-c but not on m-a or m-r, it must be interacting with something in m-c. I'll do a regression range bisection search over m-c. On Try, as of course this doesn't fail locally. I think we should proceed to ship this fix in 50 and 51, and we'll fix whatever's regressed it in 52.
(In reply to Chris Pearce (:cpearce) from comment #39) > Given that this is failing on m-c but not on m-a or m-r, it must be > interacting with something in m-c. It is failing on m-a, in fact failing worse since either test_DrainOnMissingData_mp4.html or test_BufferingWait_mp4.html itself hangs for 330 seconds without output after the timeout in test_BufferingWait_mp4.html.
But it's not failing on m-r? I see lots of retriggers of mda-e10s there, but no failures.
Changing FAILOVER_UPDATE_INTERVAL_US to 5ms makes us not fail, however I'm seeing a lot of timeouts in dom/media/test/test_bug1113600.html, which I'm not sure are related: https://treeherder.mozilla.org/#/jobs?repo=try&revision=98090384549e&selectedJob=30529041 This seems to have regressed in 51 in between 23 Aug: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f528ba3076535612efa6d6d8601691df0c373563 .. and 32 Aug: https://treeherder.mozilla.org/#/jobs?repo=try&revision=30dffb464ab0c5da396bc30f105f0020990cf70a&selectedJob=30528927
(In reply to Chris Pearce (:cpearce) from comment #55) > Changing FAILOVER_UPDATE_INTERVAL_US to 5ms makes us not fail, however I'm > seeing a lot of timeouts in dom/media/test/test_bug1113600.html, which I'm > not sure are related: Bug 1315586 should fix the timeout of test_bug1113600.html.
Flags: qe-verify+
Hi Andrei, could you please provide specific details on how much of a CPU spike are you noticing on OS X even with CPearce's latest fix? Thanks!
Flags: needinfo?(andrei.vaida)
(In reply to Ritu Kothari (:ritu) from comment #57) > Hi Andrei, could you please provide specific details on how much of a CPU > spike are you noticing on OS X even with CPearce's latest fix? Thanks! We noticed spikes of around 40% on 50.0-build2, using two different machines with Mac OS X 10.11.6 and the video sample from Comment 2. As a side note, on 49.0.2-build2 with the same machines/platforms, we noticed spikes of about 44%.
Flags: needinfo?(andrei.vaida)
Also, this is verified fixed on Windows 7 x64 and Windows 10 x64. We were unable to replicate this bug on Linux, so I assume that's unaffected across branches. I'm not sure exactly what's the best way to point out these results in this bug. Is this fix specifically for Windows? If so, we can mark it verified fixed and just file a follow up bug for Mac, if need be.
Setting the 51 status back to fixed to reflect that it wasn't ever backed out from Aurora.
Looks like what jwwang suggested in comment 26 is working, provided Bug 1315586 is included as an ancestor: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a5c2f54e435f&selectedJob=30534990
Flags: needinfo?(cpearce)
(In reply to Andrei Vaida, QA [:avaida] – please ni? me from comment #58) > (In reply to Ritu Kothari (:ritu) from comment #57) > > Hi Andrei, could you please provide specific details on how much of a CPU > > spike are you noticing on OS X even with CPearce's latest fix? Thanks! > > We noticed spikes of around 40% on 50.0-build2, using two different machines > with Mac OS X 10.11.6 and the video sample from Comment 2. > > As a side note, on 49.0.2-build2 with the same machines/platforms, we > noticed spikes of about 44%. Given that is not the same regressions as what was introduced in 50, we can afford to not hold the release on this. We've been aware that playing video on MacOS often results in the CPU fan spinning up, which indicates higher CPU usage, so I expect you're seeing that.
[Tracking Requested - why for this release]: Let's try to get this fixed in 51.
Attachment #8807414 - Attachment is obsolete: true
Attachment #8808347 - Flags: review?(jwwang) → review+
Pushed by cpearce@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f9b7b51d1416 Give VideoSink a minimum update interval. r=jwwang
cpearce: does some of this still need to land in aurora? It's unclear here if these patches are for uplift or if you need to uplift from bug 1315586. As long as you have it on your radar and know what to request uplift for, great.
Flags: needinfo?(cpearce)
Yes. We'll need the new patch in this bug, and bug 1315586. I'll request here and in bug 1315586. I was waiting to make sure I wasn't backed out again.
Status: REOPENED → RESOLVED
Closed: 8 years ago8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Track 51- as this was fixed in 51.
Hi :cpearce, The first patch in m-c was backed out in comment #36 but was not backed out in aurora & beta. Now, you have a new patch and re-land in m-c in comment #69, do we need to back out codes in aurora and re-land your new patch in aurora?
I've rebased my second patch on top of the first, it looks green in Aurora: https://treeherder.mozilla.org/#/jobs?repo=try&revision=98574795b210e5ecdd4f050fe8be1706ae4f2474 bug 1315586 doesn't apply to Aurora, and the above Try push is on Aurora without bug 1315586, so I think we can avoid uplifting bug 1315586 as well and just take my second patch here.
Flags: needinfo?(cpearce)
Comment on attachment 8809516 [details] [diff] [review] Give VideoSink a minimum update interval Approval Request Comment [Feature/regressing bug #]: Bug 1307546, and earlier patch in this bug; regression from fixing A/V sync when the computer is too slow to playback video in real time. [User impact if declined]: Intermittent test failures on Aurora. [Describe test coverage new/current, TreeHerder]: This fixes test failures on Aurora. [Risks and why]: Low; this is a further back-off in the degenerate case. So it should be safe. [String/UUID change made/needed]: None.
Attachment #8809516 - Flags: review+
Attachment #8809516 - Flags: approval-mozilla-aurora?
Comment on attachment 8809516 [details] [diff] [review] Give VideoSink a minimum update interval Fix for performance issues with video syncing. OK to uplift to aurora.
Attachment #8809516 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Wes or Tomcat, this may not show up in your queries for 51/aurora uplift because the status flag is already set to fixed. Can you please merge the patch in comment 74 to aurora? thanks!
Flags: needinfo?(wkocher)
Flags: needinfo?(cbook)
Flags: needinfo?(wkocher)
Flags: needinfo?(cbook)
Verified as fixed using Aurora 52.0a2 2016-11-22 under Win 10 64-bit. Considering this and the above comments, I'm removing the qe-verify+ flag.
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: