Closed Bug 791344 Opened 10 years ago Closed 10 years ago
Browser freeze, allocates all memory when viewing a particular type of video with a particular build
1021.96 KB, text/plain
2.23 KB, patch
|Details | Diff | Splinter Review|
1.79 KB, patch
|Details | Diff | Splinter Review|
STR: - Go to http://wrla.ch/eideticker/dashboard/#/lg-g2x/taskjs-scrolling/checkerboard - Click on one of the data point, that is quite recent (bottom right corner), notice that a video appears on the right of the graph. - Click on the video to start the playback Expected: The video plays Actual result: The browser freezes, allocates all the memory, and gets killed by the oom killer. The CPU usage showed by `top` is around 150-200%, indicating that at least two threads are doing things. I managed to run sysprof while my system was dying to get a profile (attached, open it in sysprof on Linux). It shows a couple things (profiles are cumulative, i.e. include time spent by called functions): - nsBuiltinDecoderStateMachine::AdvanceFrame - 17% - nsBuiltinDecoderStateMachine::StartBuffering - 8% - nsBuiltinDecodedStateMachine::UpdateReadyState - 5% - NS_DispatchToMainThread - 3% - nsBuiltinDecoderStateMachine::HasLowUndecodedData - 5% - nsBuiltinDecoderStateMachine::GetUndecodedData - 4.5% - nsBuiltinDecoderStateMachine::GetBuffered - 4% - nsWebMReader::GetBuffered - 4% - nsBuiltinDecoderStateMachine::UpdateReadyState - 6% - NS_DispatchToMainThread - 3.5% - nsBuiltinDecoderStateMachine::GetUndecodedData - 4% - nsWebMReader::GetBuffered - 3.5% Also, there is a _very_ large amount of time spent in system calls related to locking (sys_futex, do_futex, etc.), and thos are called by the Firefox process This happens only on those videos, and only on my main nightly (from the tarballs on http://nightly.mozilla.org, updated to version 2012-09-13), not on my builds, and only on Linux (I tested the same nightly on Mac, works fine). It works fine on release.
If I update this build to a new version (2012-09-14), it works, I can't reproduce. I can also reproduce on another machine, with a build from 2012-09-10. When I updated this machine, the problem goes away as well. Funnily enough, nothing relevant has landed in content/media or media/ in that time range.
Testing against http://wrla.ch/eideticker/dashboard/videos/video-1347531828.23.webm I can reproduce this in a 2012-09-13 build, 2012-09-16 build, and my out-of-date debug build from September 1stish (3e2f50279d90), all on Linux. The video plays fine in Firefox 15, and in all builds when played from a local download. It seems like we're entering a tight loop when running low on data/pausing to buffer. The main thread is spending most of its time in nsBuiltinDecoder::NextFrameAvailable, and NSPR logging shows the state machine furiously switching between BUFFERING and DECODING. Also seem to be a surprisingly larger number of "Decoder discarding video frame" messages logged both when reproducing this and when the video plays normally.
This seems to have regressed between Firefox 15 and Beta 16. which should correspond to the range 26dcd1b1a208:6fdf9985acfe if I've interpreted the tags correctly.
Hardware: x86_64 → All
Version: unspecified → 16 Branch
849da4232313 is good, 45dbfd1debfd is bad. That makes this a regression from bug 760336.
(In reply to Matthew Gregan [:kinetik] from comment #2) > Also seem to be a surprisingly larger number of "Decoder discarding video > frame" messages logged both when reproducing this and when the video plays > normally. This might have to do with the fact that those particular videos are actually 60fps (just checked with mkvinfo). AdvanceFrame _should_ handle that, but apparently does not. I've checked, and we are each time slightly (less that 200 microseconds) late to make the frame we want, so it gets skipped. I'll see if we can get something better by tweaking some constants or something. I'm investigating the other problem, which might be related (here, we have to decode more that twice the amount of frames we usually do, and maybe the buffering logic is making assumptions it should not make).
The logic that logged frame skipping is off, and is reporting that every frame is skipped. This patch actually makes it report the number of frames skipped. With this patch applied, we don't skip frames at all. I'll continue investigating for the other problem.
Attachment #661983 - Flags: review?(kinetik)
Comment on attachment 661983 [details] [diff] [review] Properly log frame skipping. Review of attachment 661983 [details] [diff] [review]: ----------------------------------------------------------------- ::: content/media/nsBuiltinDecoderStateMachine.cpp @@ +2292,5 @@ > int64_t remainingTime = AUDIO_DURATION_USECS; > NS_ASSERTION(clock_time >= mStartTime, "Should have positive clock time."); > nsAutoPtr<VideoData> currentFrame; > +#ifdef PR_LOGGING > + int32_t popedFrames = 0; Call this droppedFrames instead.
Attachment #661983 - Flags: review?(kinetik) → review+
I can reproduce reliably when I follow those steps: - Activate the NSPR log for the media decoder - Load the video using cpearce's HttpMediaServer, setting the network rate to 100k/s or so, that is slower that what would be needed for this particular video to play through (in the canplaythrough sense), because the video is around 350k/s - Start playing the video, that hit the end of the buffered region pretty quickly - Notice that we switch super quickly from buffering to decoding (we see that in the log) - Hit escape so we kill the http connection Then an infinite loop occurs, the |clock_time| variable goes way after then end of the video, and never stops.
Attachment #661983 - Attachment is obsolete: true
This was the problem, I guess, some units were not coherent, messing up the buffering logic. I can't reproduce the bug with this patch applied, and I'm still able to reproduce it with 100% success rate if I qpop the patches. Also, when a user press espace when a video loads, the state machine goes go into shutdown state, and then does something I can't figure out at 11pm, but it seems sketchy. It's still switching back and forth between buffering and decoding state, but of course, no more data are going to come (the resource is closed, there is no decoder, no element, nothing). I'll investigate tomorrow. Or maybe this belongs to another bug.
Attachment #662030 - Flags: review?(kinetik)
Comment on attachment 662030 [details] [diff] [review] Use the right method to build a timestamp from microseconds. r= Thanks for digging in to this!
Attachment #662030 - Flags: review?(kinetik) → review+
Attachment #662030 - Attachment is obsolete: true
Attachment #661992 - Attachment is obsolete: true
Assignee: nobody → paul
Status: NEW → ASSIGNED
Pushed to Try since I don't see any results here. Will checkin if it goes green. https://tbpl.mozilla.org/?tree=Try&rev=f9cf6f65e19f
Looks like this or bug 792274 is causing OSX crashtest failures. https://tbpl.mozilla.org/php/getParsedLog.php?id=15325977&tree=Try TEST-UNEXPECTED-FAIL | file:///builds/slave/talos-slave/test/build/reftest/tests/content/media/test/crashtests/752784-1.html | Exited with code 1 during test run INFO | automation.py | Application ran for: 0:01:23.503429 INFO | automation.py | Reading PID log: /var/folders/3t/f8_jgcpj1_s59lzd72ny3wyh00000w/T/tmputbtvjpidlog Downloading symbols from: http://firstname.lastname@example.org/try-macosx64-debug/firefox-18.0a1.en-US.mac64.crashreporter-symbols.zip PROCESS-CRASH | file:///builds/slave/talos-slave/test/build/reftest/tests/content/media/test/crashtests/752784-1.html | application crashed (minidump found) Crash dump filename: /var/folders/3t/f8_jgcpj1_s59lzd72ny3wyh00000w/T/tmpqjrRQd/minidumps/27D53FF3-B731-4410-B87B-63FB306229E4.dmp Operating system: Mac OS X 10.8.0 12A269 CPU: amd64 family 6 model 42 stepping 7 8 CPUs Crash reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS Crash address: 0x40 Thread 0 (crashed) 0 XUL!nsBuiltinDecoder::ConnectDecodedStreamToOutputStream [nsAutoPtr.h : 1003 + 0x0] rbx = 0x04b643f0 r12 = 0x19a6bb60 r13 = 0x19a6ba80 r14 = 0x19a6ba80 r15 = 0x42334978 rip = 0x01cdb5f5 rsp = 0x5fbf95d0 rbp = 0x5fbf95f0 Found by: given as instruction pointer in context 1 XUL!nsBuiltinDecoder::AddOutputStream [nsBuiltinDecoder.cpp : 181 + 0xa] rip = 0x01cdc56c rsp = 0x5fbf9600 Found by: stack scanning 2 XUL + 0xcdc45f rip = 0x01cdc460 rsp = 0x5fbf9610 Found by: stack scanning 3 XUL!nsHTMLMediaElement::CaptureStreamInternal [nsHTMLMediaElement.cpp : 1551 + 0x13] rip = 0x0188e542 rsp = 0x5fbf9640 Found by: stack scanning 4 XUL!nsHTMLMediaElement::MozCaptureStream [nsHTMLMediaElement.cpp : 1559 + 0x4] rip = 0x0188e590 rsp = 0x5fbf9680 Found by: stack scanning 5 XUL!NS_InvokeByIndex_P [xptcinvoke_x86_64_unix.cpp : 162 + 0x3] rip = 0x0274ebcf rsp = 0x5fbf96a0 Found by: stack scanning 6 XUL!CallMethodHelper::ConvertIndependentParam [XPCWrappedNative.cpp : 2836 + 0x15] rip = 0x01f3bd38 rsp = 0x5fbf96b0 Found by: stack scanning 7 XUL!CallMethodHelper::InitializeDispatchParams [nsTArray.h : 192 + 0x0] rip = 0x01f3ac92 rsp = 0x5fbf9700 Found by: stack scanning 8 XUL!CallMethodHelper::Call [XPCWrappedNative.cpp : 3105 + 0x4] rip = 0x01f39e15 rsp = 0x5fbf9740 Found by: stack scanning 9 XUL!XPCWrappedNative::CallMethod [XPCWrappedNative.cpp : 2405 + 0x7] rip = 0x01f37b22 rsp = 0x5fbf9780 Found by: stack scanning 10 XUL + 0xc1029f rip = 0x01c102a0 rsp = 0x5fbf97b8 Found by: stack scanning Thread 1 0 libsystem_kernel.dylib + 0x12d16 rbx = 0x04704648 r12 = 0x7317ff40 r13 = 0x7317ff88 r14 = 0x731809c0 r15 = 0x73180934 rip = 0x8a972d16 rsp = 0x04704508 rbp = 0x04704680
Pushed to try along with other things: https://tbpl.mozilla.org/?tree=Try&rev=75d8ed08b09c
Status: ASSIGNED → NEW
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla18
Comment on attachment 662240 [details] [diff] [review] Use the right method to build a timestamp from microseconds and properly convert between units [Approval Request Comment] Bug caused by (feature/regressing bug #): 760336 User impact if declined: high CPU and memory use, possible main thread hangs; also makes QA of bug 789075's patch difficult Testing completed (on m-c, etc.): baked on mozilla-central for 3 weeks Risk to taking this patch (and alternatives if risky): very low, corrects time calculation in two places String or UUID changes made by this patch: none
Comment on attachment 662240 [details] [diff] [review] Use the right method to build a timestamp from microseconds and properly convert between units Oops, this is on aurora already. It'd still be nice to take on beta.
Comment on attachment 662240 [details] [diff] [review] Use the right method to build a timestamp from microseconds and properly convert between units We get enough feedback from users about memory use that this seems very worth taking for uplift to Beta, please land this week so it's in Beta 3 and we have time to ensure there are no unintended regressions.
Attachment #662240 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/17.0 Firefox/17.0 Verified fixed with F17 beta 6. Could previously reproduce the issue. Checked with sites in comments 0 and 2.
You need to log in before you can comment on or make changes to this bug.