Closed Bug 1059101 Opened 10 years ago Closed 10 years ago

Intermittent 860143.html | application crashed [@ mozilla::TrackUnionStream::CopyTrackData(mozilla::StreamBuffer::Track*, unsigned int, long long, long long, bool*)] | Assertion failure: startTicks == TimeToTicksRoundUp(rate, outputStart) (Samples missing

Categories

(Core :: WebRTC, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla35
Tracking Status
firefox33 --- unaffected
firefox34 --- fixed
firefox35 --- fixed
firefox-esr31 --- unaffected
b2g-v2.0 --- unaffected
b2g-v2.0M --- unaffected
b2g-v2.1 --- fixed
b2g-v2.2 --- fixed

People

(Reporter: cbook, Assigned: jesup)

References

()

Details

(Keywords: assertion, crash, intermittent-failure, Whiteboard: [webrtc-uplift])

Attachments

(1 file, 3 obsolete files)

Rev5 MacOSX Mountain Lion 10.8 mozilla-inbound debug test crashtest on 2014-08-26 21:58:31 PDT for push a99f3b29c4a8

slave: talos-mtnlion-r5-082

https://tbpl.mozilla.org/php/getParsedLog.php?id=46829628&tree=Mozilla-Inbound



Assertion failure: startTicks == TimeToTicksRoundUp(rate, outputStart) (Samples missing), at /builds/slave/m-in-osx64-d-00000000000000000/build/content/media/TrackUnionStream.h:253

22:04:08  WARNING -  PROCESS-CRASH | file:///builds/slave/talos-slave/test/build/tests/reftest/tests/dom/media/tests/crashtests/860143.html | application crashed [@ mozilla::TrackUnionStream::CopyTrackData(mozilla::StreamBuffer::Track*, unsigned int, long long, long long, bool*)]
22:04:08     INFO -  Crash dump filename: /var/folders/m9/3c3mb3ws0y18t531sbfpqzmc00000w/T/tmpQWcW0r.mozrunner/minidumps/3FEC79A8-396C-4D04-A56D-529861434139.dmp
22:04:08     INFO -  Operating system: Mac OS X
22:04:08     INFO -                    10.8.0 12A269
22:04:08     INFO -  CPU: amd64
22:04:08     INFO -       family 6 model 42 stepping 7
22:04:08     INFO -       8 CPUs
22:04:08     INFO -  Crash reason:  EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
22:04:08     INFO -  Crash address: 0x0
22:04:08     INFO -  Thread 55 (crashed)
22:04:08     INFO -   0  XUL!mozilla::TrackUnionStream::CopyTrackData(mozilla::StreamBuffer::Track*, unsigned int, long long, long long, bool*) [nsTArray.h:a99f3b29c4a8 : 866 + 0x0]
22:04:08     INFO -      rbx = 0x00007fff7e795c68   r12 = 0x0000000000000e80
22:04:08     INFO -      r13 = 0x0000000000062d80   r14 = 0x00000001114c3cd0
22:04:08     INFO -      r15 = 0x00000000000f4240   rip = 0x000000010301d608
22:04:08     INFO -      rsp = 0x0000000127e809a0   rbp = 0x0000000127e80a80
22:04:08     INFO -      Found by: given as instruction pointer in context
22:04:08     INFO -   1  XUL!mozilla::TrackUnionStream::ProcessInput(long long, long long, unsigned int) [TrackUnionStream.h:a99f3b29c4a8 : 77 + 0x1f]
22:04:08     INFO -      rbx = 0x000000011282dd60   r12 = 0x0000000000000000
22:04:08     INFO -      r13 = 0x0000000000000000   r14 = 0x000000010c977c80
22:04:08     INFO -      r15 = 0x0000000000000000   rip = 0x000000010301cb87
22:04:08     INFO -      rsp = 0x0000000127e80a90   rbp = 0x0000000127e80b60
22:04:08     INFO -      Found by: call frame info
22:04:08     INFO -   2  XUL!mozilla::MediaStreamGraphImpl::Process(long long, long long) [MediaStreamGraph.cpp:a99f3b29c4a8 : 1302 + 0x1b]
22:04:08     INFO -      rbx = 0x0000000000062f80   r12 = 0x0000000000000000
22:04:08     INFO -      r13 = 0x000000010f23aae0   r14 = 0x00000001114c3cd0
22:04:08     INFO -      r15 = 0x00000001114c3cd0   rip = 0x0000000103007480
22:04:08     INFO -      rsp = 0x0000000127e80b70   rbp = 0x0000000127e80bd0
22:04:08     INFO -      Found by: call frame info
22:04:08     INFO -   3  XUL!mozilla::MediaStreamGraphImpl::OneIteration(long long, long long, long long, long long) [MediaStreamGraph.cpp:a99f3b29c4a8 : 1381 + 0xe]
22:04:08     INFO -      rbx = 0x000000010f23aae0   r12 = 0x000000010f23aae0
22:04:08     INFO -      r13 = 0x00000001066fe508   r14 = 0x0000000000062f80
22:04:08     INFO -      r15 = 0x000000000000ac44   rip = 0x000000010300789b
22:04:08     INFO -      rsp = 0x0000000127e80be0   rbp = 0x0000000127e80c60
22:04:08     INFO -      Found by: call frame info
22:04:08     INFO -   4  XUL!mozilla::ThreadedDriver::RunThread() [GraphDriver.cpp:a99f3b29c4a8 : 255 + 0x4]
22:04:08     INFO -      rbx = 0x000000010c9977b0   r12 = 0x000000010f23aae0
22:04:08     INFO -      r13 = 0x00000001066fe508   r14 = 0x0000000000062a1c
22:04:08     INFO -      r15 = 0x000000000000ac44   rip = 0x0000000102fd1c3f
22:04:08     INFO -      rsp = 0x0000000127e80c70   rbp = 0x0000000127e80cb0
22:04:08     INFO -      Found by: call frame info
https://tbpl.mozilla.org/php/getParsedLog.php?id=46904916&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=46945600&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=47027510&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=46975318&tree=B2g-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=47047038&tree=Fx-Team
https://tbpl.mozilla.org/php/getParsedLog.php?id=47067771&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=47077570&tree=Mozilla-Inbound
OS: Mac OS X → All
Hardware: x86 → All
https://tbpl.mozilla.org/php/getParsedLog.php?id=47101013&tree=Mozilla-Central
https://tbpl.mozilla.org/php/getParsedLog.php?id=47242874&tree=Mozilla-Central
https://tbpl.mozilla.org/php/getParsedLog.php?id=47349911&tree=Mozilla-Central
https://tbpl.mozilla.org/php/getParsedLog.php?id=47891712&tree=Fx-Team
https://tbpl.mozilla.org/php/getParsedLog.php?id=47944859&tree=Mozilla-Aurora
This bug is attracting dupes to it like fruit flies to a rotting banana. Can we please find a way to make some time for this?
Flags: needinfo?(paul)
https://tbpl.mozilla.org/php/getParsedLog.php?id=48058822&tree=B2g-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=48071369&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=48095109&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=48106376&tree=Mozilla-Central
https://tbpl.mozilla.org/php/getParsedLog.php?id=48211590&tree=Fx-Team
https://tbpl.mozilla.org/php/getParsedLog.php?id=48210746&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=48293760&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=48299224&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=48303688&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=48305959&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=48393894&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=48473186&tree=Fx-Team
https://tbpl.mozilla.org/php/getParsedLog.php?id=48545128&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=48532132&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=48610994&tree=Mozilla-Aurora
https://tbpl.mozilla.org/php/getParsedLog.php?id=48690305&tree=B2g-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=48698090&tree=B2g-Inbound

And suddenly we're not getting usable crash stacks for this anymore?
https://tbpl.mozilla.org/php/getParsedLog.php?id=48693208&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=48715534&tree=Mozilla-Inbound
Assignee: nobody → rjesup
 MOZ_ASSERT(startTicks == TimeToTicksRoundUp(rate, outputStart), "Samples missing");

So, every one of these shows an assertion failure at TimeToTicksRoundUp(), preceded by one or more (usually 3-6) WARNING: Slice out of range: 'aStart >= 0 && aEnd <= aSource.mDuration', file /builds/slave/m-in-osx64-d-00000000000000000/build/content/media/MediaSegment.h, line 282
now without an additional assert for debugging
Attachment #8496142 - Attachment is obsolete: true
Comment on attachment 8496145 [details] [diff] [review]
assert only on time going backwards.  If time jumps forwards (an underflow due to hardware/VM/load stalling us too long), just warn about it - underflows on hardware that stalls you too long are unavoidable.

I'll take a review from whomeever can get to it first.  It's a really trivial patch.
Attachment #8496145 - Flags: review?(roc)
Attachment #8496145 - Flags: review?(karlt)
Whiteboard: [webrtc-uplift]
Comment on attachment 8496145 [details] [diff] [review]
assert only on time going backwards.  If time jumps forwards (an underflow due to hardware/VM/load stalling us too long), just warn about it - underflows on hardware that stalls you too long are unavoidable.

Review of attachment 8496145 [details] [diff] [review]:
-----------------------------------------------------------------

I'm skeptical of your explanation. Underflows should not cause the track to get out of sync here. Can you explain what you think is happening in more detail?
Attachment #8496145 - Flags: review?(roc) → review-
https://tbpl.mozilla.org/php/getParsedLog.php?id=49050179&tree=Mozilla-Inbound

mozilla-inbound_snowleopard-debug_test-mochitest-browser-chrome-1
revision: aaca52b09d8f

13:33:49     INFO -  Assertion failure: startTicks == TimeToTicksRoundUp(rate, outputStart) (Samples missing), at /builds/slave/m-in-osx64-d-00000000000000000/build/content/media/TrackUnionStream.h:267
13:33:52     INFO -  601 ERROR TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/browser/base/content/test/general/browser_devices_get_user_media.js | application terminated with exit code 1
https://tbpl.mozilla.org/php/getParsedLog.php?id=49048577&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=49062253&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=49168417&tree=B2g-Inbound
https://treeherder.mozilla.org/ui/logviewer.html#?job_id=564422&repo=b2g-inbound
Attached patch Fix assert (obsolete) β€” β€” Splinter Review
This bug is blocking bug 994912 from landing; you can reproduce 100% of the time with my changes in a debug build in case reproducibility is holding up a resolution.

I was looking at it independent from this and wrote the attached patch to resolve it, thinking I was the only one suffering. But I don't fully understand the module, so I'm not sure if I did it right :).

My understanding is as follows:

content/media/TrackUnionStream.h:283
>        // These are the properties we need to ensure:
>        // 1) Exactly 'ticks' ticks of output are produced, i.e.
>        // inputEndTicks - inputStartTicks = ticks.
>        // 2) inputEndTicks <= aInputTrack->GetSegment()->GetDuration().
>        // 3) In any sequence of intervals where neither stream is blocked,
>        // the content of the input track we use is a contiguous sequence of
>        // ticks with no gaps or overlaps.
>        if (map->mEndOfLastInputIntervalInInputStream != inputStart ||
>            map->mEndOfLastInputIntervalInOutputStream != outputStart) {
>          // Start of a new series of intervals where neither stream is blocked.
>          map->mEndOfConsumedInputTicks = TimeToTicksRoundDown(rate, inputStart) - 1;
>        }
>        TrackTicks inputStartTicks = map->mEndOfConsumedInputTicks;
>        TrackTicks inputEndTicks = inputStartTicks + ticks;

There is a long proof in the comments for property #3, but it hinges on the statement below:

content/media/TrackUnionStream.h:331
>        // (now using the fact that inputEnd <= track->GetEndTimeRoundDown() for a non-ended track)

But in the if statement I quoted above, we actually change the inputEndTicks, such that in the non-trivial case, this assumption is invalidated and it may go past the track ending.. In my testing usually start time is way ahead of the actual track end time, so there are no acceptable samples only stale samples (underrun?). The vast majority of the time that the if is tripped, all of the values of 0, so the math works out (trivial case), I'm guessing that happens at the beginning.
Attachment #8497888 - Flags: feedback?(roc)
Attachment #8496145 - Flags: review?(karlt)
So I'll note that ALL of these crashes are due to the only line of code added to this section since Dec 2013 - the one MOZ_ASSERT() line that's crashing us, which padenot added.  We may have been failing this all along occasionally.
Comment on attachment 8497888 [details] [diff] [review]
Fix assert

Review of attachment 8497888 [details] [diff] [review]:
-----------------------------------------------------------------

Looks reasonable.

Bug 1061046 vastly simplifies all this logic but I won't be able to land that for a couple of weeks.
Attachment #8497888 - Flags: feedback?(roc) → feedback+
https://treeherder.mozilla.org/ui/logviewer.html#?job_id=571078&repo=b2g-inbound
Attached patch Fix assert, v2 β€” β€” Splinter Review
Updated with a brief comment explaining underrun can cause the code documented assumptions to be violated.

try: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=8224896bea5e
rerun try on busted cases I don't think this impacts: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=20dfba0dceba

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) (on vacation Sep 27 to Oct 12) from comment #146)
> Comment on attachment 8497888 [details] [diff] [review]
> Fix assert
> 
> Review of attachment 8497888 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looks reasonable.
> 
> Bug 1061046 vastly simplifies all this logic but I won't be able to land
> that for a couple of weeks.

I reviewed this afterwards, specifically: https://bugzilla.mozilla.org/attachment.cgi?id=8492532&action=diff

I think that is complementary to my change, since it still has some (original) questionable logic:
1) inputTrackEndPoint is only updated if the track is ended, but even if it hasn't we still can't go past the duration boundary
2) if we select fewer samples than "TrackTicks ticks", it will complain at the end indicating the stream is out of sync; this could easily happen if it prefers inputTrackEndPoint (the correct value from 1 above) in the std::min calculations, so we still need to do the null calculations
Attachment #8497888 - Attachment is obsolete: true
Attachment #8498224 - Flags: review?(roc)
https://tbpl.mozilla.org/php/getParsedLog.php?id=49332949&tree=Mozilla-Inbound
Flags: needinfo?(padenot)
Keywords: checkin-needed
Attachment #8496145 - Attachment is obsolete: true
https://hg.mozilla.org/integration/mozilla-inbound/rev/0a8b7e2db5fd
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/0a8b7e2db5fd

So far, so good on inbound, FWIW :)
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla35
Looking good! Please nominate this for Aurora approval when you get a chance :)
Comment on attachment 8498224 [details] [diff] [review]
Fix assert, v2

Approval Request Comment
[Feature/regressing bug #]: bug 848954

[User impact if declined]: lots of starring randomly across the tree (top webrtc/media orange) - or forcing us to disable the MOZ_ASSERT Paul added, which caught this extant bug.

[Describe test coverage new/current, TBPL]: all of tbpl that touches MSG tests it.

[Risks and why]: low risk - tweaks how tick ranges are decided on.  The code survived being off by a tick before we added the assert.

[String/UUID change made/needed]: none.
Attachment #8498224 - Flags: approval-mozilla-aurora?
Flags: needinfo?(rjesup)
Comment on attachment 8498224 [details] [diff] [review]
Fix assert, v2

Aurora+
Attachment #8498224 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
https://hg.mozilla.org/releases/mozilla-aurora/rev/855936995e70
Blocks: 1096078
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: