Closed Bug 1697641 Opened 3 years ago Closed 3 years ago

Assertion failure: mStart <= mEnd (Invalid Interval), at src/dom/media/Intervals.h:48

Categories

(Core :: Audio/Video: Playback, defect, P3)

RESOLVED FIXED
89 Branch
Tracking Status
firefox-esr78 --- wontfix
firefox87 --- wontfix
firefox88 --- wontfix
firefox89 --- fixed

People

(Reporter: bryce, Assigned: bryce)

References

(Blocks 1 open bug)

Details

(Keywords: assertion, crash, testcase)

Attachments

(2 files)

+++ This bug was initially created as a clone of Bug #1530897 +++

Bug 1530897 fixed a fuzz case for this signature, but we have more crashes with the same stack. Although the crashes are logged on this line, I'm fairly confident that the assertion in question cannot happen there, because that line constructs an empty interval. I think the issue actually lies with the buffered variable, and the assertion appearing on this line is a quirk of optimisation.

Fuzzers are hitting this a few times a day but none of the test cases seem to repro.

(In reply to Tyson Smith [:tsmith] from comment #1)
> Fuzzers are hitting this a few times a day but none of the test cases seem to repro.

Do the fuzzers ever hit this in debug builds such that we can get a stack/dump with further symbols? Seeing whether the assertion ever fires on a different line in the WebM demuxer would help test my comment 0 thesis.

Seeing this on both ASan (non-debug) and debug builds. They all seem to be on the same line WebMDemuxer.cpp:999.

Assertion failure: mStart <= mEnd (Invalid Interval), at /builds/worker/workspace/obj-build/dist/include/Intervals.h:49

#0 0x7f94e8935928 in Interval<mozilla::media::TimeUnit &, mozilla::media::TimeUnit &> /builds/worker/workspace/obj-build/dist/include/Intervals.h:49:5
#1 0x7f94e8935928 in mozilla::WebMDemuxer::GetBuffered() /builds/worker/checkouts/gecko/dom/media/webm/WebMDemuxer.cpp:999:19
#2 0x7f94e89395ab in mozilla::WebMTrackDemuxer::GetBuffered() /builds/worker/checkouts/gecko/dom/media/webm/WebMDemuxer.cpp:1257:19
#3 0x7f94e893897a in mozilla::WebMTrackDemuxer::Reset() /builds/worker/checkouts/gecko/dom/media/webm/WebMDemuxer.cpp:1190:35
#4 0x7f94e851410c in operator() /builds/worker/checkouts/gecko/dom/media/MediaFormatReader.cpp:671:41
#5 0x7f94e851410c in mozilla::detail::RunnableFunction<mozilla::MediaFormatReader::DemuxerProxy::Wrapper::Reset()::'lambda'()>::Run() /builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h:534:5
#6 0x7f94e4d30462 in mozilla::TaskQueue::Runner::Run() /builds/worker/checkouts/gecko/xpcom/threads/TaskQueue.cpp:158:20
#7 0x7f94e4d48dc7 in nsThreadPool::Run() /builds/worker/checkouts/gecko/xpcom/threads/nsThreadPool.cpp:303:14
#8 0x7f94e4d3fc73 in nsThread::ProcessNextEvent(bool, bool*) /builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp:1152:16
#9 0x7f94e4d465aa in NS_ProcessNextEvent(nsIThread*, bool) /builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.cpp:548:10
#10 0x7f94e566dc6d in mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) /builds/worker/checkouts/gecko/ipc/glue/MessagePump.cpp:302:20
#11 0x7f94e55d7ee3 in MessageLoop::RunInternal() /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:335:10
#12 0x7f94e55d7dfd in RunHandler /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:328:3
#13 0x7f94e55d7dfd in MessageLoop::Run() /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:310:3
#14 0x7f94e4d3c396 in nsThread::ThreadFunc(void*) /builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp:391:10
#15 0x7f94f9cebcdb in _pt_root /builds/worker/checkouts/gecko/nsprpub/pr/src/pthreads/ptthread.c:201:5
#16 0x7f94fa36a608 in start_thread /build/glibc-eX1tMB/glibc-2.31/nptl/pthread_create.c:477:8
#17 0x7f94f9f33292 in clone /build/glibc-eX1tMB/glibc-2.31/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Perfect, thanks! That helps confirm my comment 0 thoughts.

Attached video testcase.webm
Flags: in-testsuite?
Keywords: testcase

Think I've figured this one out. This is caused by our clamping of the end time to the duration[0] while leaving the start time unclamped. Oddly formed files can leave the algorithm with both a start time and an end time greater than our duration. We then clamp only the end time, so end < start, and we create an invalid interval.

This looks to be caused by the final cluster in the block having a timecode that is inconsistent with other timecodes and number of frames (it jumps ahead by an hour from what you'd expect given previous clusters).

I think we should handle this as in the current malformed cases by skipping the malformed interval. We could try and clamp the interval back into valid ranges, but I don't know if it's worth the time for busted files.
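
For illustration only, and not the actual Gecko code: a minimal C++ sketch of the failure mode described above. The names `Range`, `ClampEndOnly`, and `IsValidInterval` are hypothetical, and times are plain microsecond counts rather than TimeUnit values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a buffered range, in microseconds.
struct Range {
  int64_t start;
  int64_t end;
};

// Mirrors the behaviour described above: only the end is clamped to the
// duration, the start is left untouched.
inline Range ClampEndOnly(Range r, int64_t duration) {
  r.end = std::min(r.end, duration);
  return r;
}

// The condition the Interval constructor asserts (mStart <= mEnd).
inline bool IsValidInterval(const Range& r) { return r.start <= r.end; }
```

With a 10 second duration, a range whose timecode has jumped ahead by an hour (say 3610 s to 3611 s) comes out of `ClampEndOnly` as 3610 s to 10 s, which fails `IsValidInterval`.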

Fix incoming.

[0] https://searchfox.org/mozilla-central/rev/be413c29deeb86be6cdac22445e0d0b035cb9e04/dom/media/webm/WebMDemuxer.cpp#978

I can't get a reliable crash test out of this. I've been able to crash locally with the test file and thought I had a good case where I opened the file and forced a reset of the decoder by triggering a load on another file. However it doesn't reliably repro and I can't get my crashtest to fail. Going to push my fix and if I can't get my crasher crashing I'll rely on the fuzzers.

This expands on existing checks when getting a WebM's buffered intervals. The
additional check ensures we don't end up with end < start due to our code that
clamps end at duration.

This patch moves those checks into their own helper function so as to reduce
clutter in GetBuffered.
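
A rough sketch of the shape such a helper could take, assuming plain integer times. This is not the landed patch: `BufferedRange`, `SanitizeRange`, and `CollectBuffered` are hypothetical names used only to show the "clamp, then skip if still malformed" idea.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one candidate buffered range.
struct BufferedRange {
  int64_t start;
  int64_t end;
};

// Hypothetical helper: clamps r.end to duration in place and returns true if
// the range is usable; returns false if the range should be skipped.
inline bool SanitizeRange(BufferedRange& r, int64_t duration) {
  if (r.start < 0 || r.end < r.start) {
    return false;  // Malformed before any clamping.
  }
  r.end = std::min(r.end, duration);
  // The additional check: clamping the end must not leave end < start.
  return r.start <= r.end;
}

// Keeps the caller free of validation clutter: malformed ranges are dropped.
inline std::vector<BufferedRange> CollectBuffered(
    const std::vector<BufferedRange>& raw, int64_t duration) {
  std::vector<BufferedRange> out;
  for (BufferedRange r : raw) {
    if (SanitizeRange(r, duration)) {
      out.push_back(r);
    }
  }
  return out;
}
```

Skipping rather than repairing the range matches the approach proposed earlier in this bug for busted files.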

If I load the test file from bmo then run

$('video').load()

in the console, I get a reliable repro. My guess would be it's to do with different caching of the resource. I still can't get a reliable local repro or test, so I'm going to give up in the interest of timeboxing the rabbit hole.

I can reproduce it reliably locally with Grizzly.

If you don't have Grizzly installed:

pip install grizzly-framework

To repro:

python3 -m grizzly.replay <path_to_browser>/firefox testcase.webm --relaunch 1 --repeat 10 --time-limit 5

Use -l if you want to save logs and --rr if you also want an rr trace.

(In reply to Tyson Smith [:tsmith] from comment #11)
> I can reproduce it reliably locally with Grizzly.
>
> If you don't have Grizzly installed:
>
> pip install grizzly-framework
>
> To repro:
>
> python3 -m grizzly.replay <path_to_browser>/firefox testcase.webm --relaunch 1 --repeat 10 --time-limit 5
>
> Use -l if you want to save logs and --rr if you also want an rr trace.

I tried several times with a local build and a shippable build and couldn't repro (10 repeats, no fault). Did a rebuild, and now it's faulting :\

My crashtest will now fault if I serve it locally, but won't fault if I run it with ./mach crashtest. I've been trying different combinations of config to try and get it to fall over, but no luck.

Pushed by bvandyk@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/23ff124946a2
Gracefully handle webms with bogus timecodes that conflict other metadata. r=kinetik
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 89 Branch
Flags: in-testsuite? → in-testsuite-