Closed Bug 1273523 Opened 4 years ago Closed 4 years ago

Crash in std::__ndk1::__deque_base<T>::clear

Categories

(Firefox for Android :: Audio/Video, defect)

48 Branch
All
Android
defect
Not set
critical

Tracking

()

RESOLVED FIXED
Firefox 49
Tracking Status
firefox47 --- fixed
firefox48 --- fixed
firefox49 --- fixed

People

(Reporter: esawin, Assigned: esawin)

References

Details

(Keywords: crash)

Crash Data

Attachments

(1 file, 3 obsolete files)

This bug was filed from the Socorro interface and is 
report bp-a57f78dd-e5df-4d94-9a2d-4f8642160515.
=============================================================

This is the new signature for bug 1262456.
I can (sometimes) reproduce this on the GT-I9060I when watching YouTube videos.

This device has a tendency to fail in QueueSample (probably {Dequeue|QueueInput}Buffer) and we don't handle this case correctly in ProcessOutput, where we expect !mDurations.empty().

We should break out of the decoder loop when QueueSample fails and we should return early from ProcessOutput if mDurations.empty() holds.
We shouldn't just ignore when we fail to queue a sample.
Attachment #8753443 - Flags: review?(snorp)
Calling GetOutputDuration with mDurations.empty() will corrupt the heap by the call to mDurations.pop_front(). We should check for this and return early.
Attachment #8753444 - Flags: review?(snorp)
I've noticed that the GT-I9060I seems to intermittently fail in QueueSample but succeed in subsequent calls to it.

We can give the decoder a few tries before giving up to prevent playback from failing on these devices.
Attachment #8753445 - Flags: review?(snorp)
Attachment #8753443 - Flags: review?(snorp) → review+
Comment on attachment 8753445 [details] [diff] [review]
0003-Bug-1273523-3.1-Attempt-to-queue-sample-a-few-times-.patch

Taking this patch back, retrying doesn't actually help.
Attachment #8753445 - Attachment is obsolete: true
Attachment #8753445 - Flags: review?(snorp)
Comment on attachment 8753444 [details] [diff] [review]
0002-Bug-1273523-2.1-Don-t-process-output-if-no-duration-.patch

Review of attachment 8753444 [details] [diff] [review]:
-----------------------------------------------------------------

Why would we be in this situation? The only way to get here is if we are somehow getting more frames out than we have given as input, no?
Comment on attachment 8753443 [details] [diff] [review]
0001-Bug-1273523-1.1-Stop-decoding-on-sample-queueing-fai.patch

Review of attachment 8753443 [details] [diff] [review]:
-----------------------------------------------------------------

Actually I'm taking back my r+. Failure to queue the sample is a fairly normal error. You can get that just by having all of the slots full. It shouldn't bail out of the decoder loop just for that.
Attachment #8753443 - Flags: review+ → review-
You're right, we only need patch 2, patch 3 doesn't help and patch 1 will only make a lot of playbacks fail for no reason.
Attachment #8753443 - Attachment is obsolete: true
FWIW, I've seen numerous occurrences of bad behaviour from std::deque in terms of memory usage.

For example, consider this crash report:

https://crash-stats.mozilla.com/report/index/9b1773c9-1569-4756-b347-ea86e2160511

We're initializing the deque. Why are we trying to allocate 2,144,532,512 bytes?! That's only 2,951,136 bytes short of 2^31, which is suggestive of an integer overflow of some kind.

Then again, we recently moved away from stlport, so maybe we'll get better behaviour from std::deque from now on?
Comment on attachment 8753444 [details] [diff] [review]
0002-Bug-1273523-2.1-Don-t-process-output-if-no-duration-.patch

Review of attachment 8753444 [details] [diff] [review]:
-----------------------------------------------------------------

Why would we be in this situation? The only way to get here is if we are somehow getting more frames out than we have given as input, no?

EDIT: Eugen and I talked about this, and apparently the problem is that we can fail to queue an input buffer, but the decoder will still give us an output buffer. Eugen, please add a comment describing the issue.
Attachment #8753444 - Flags: review?(snorp) → review+
Added comment.
Attachment #8753444 - Attachment is obsolete: true
Attachment #8753871 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/0ebe261ee2f7
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 49
Depends on: 1274488
Comment on attachment 8753871 [details] [diff] [review]
0001-Bug-1273523-1.2-Don-t-process-output-if-no-duration-.patch

Approval Request Comment
[Feature/regressing bug #]: Media playback
[User impact if declined]: Media playback may crash Fennec on certain popular devices (#1 crash on beta)
[Describe test coverage new/current, TreeHerder]: Due to broken Nightly updates (bug 1274488) this has just landed on today's Nightly
[Risks and why]: Low, it's a simple a safe change to fix heap corruption
[String/UUID change made/needed]: None
Attachment #8753871 - Flags: approval-mozilla-beta?
Attachment #8753871 - Flags: approval-mozilla-aurora?
Comment on attachment 8753871 [details] [diff] [review]
0001-Bug-1273523-1.2-Don-t-process-output-if-no-duration-.patch

This is definitely a medium/high volume crash on Fennec47, ~600 crashes in a week on 47.0b6, let's uplift to Aurora48, Beta47.
Attachment #8753871 - Flags: approval-mozilla-beta?
Attachment #8753871 - Flags: approval-mozilla-beta+
Attachment #8753871 - Flags: approval-mozilla-aurora?
Attachment #8753871 - Flags: approval-mozilla-aurora+
Depends on: 1262456
I could rebase the patch on Beta, but it's much safer (Aurora-tested) to uplift bug 1262456 instead. I've requested it for uplifting.
Flags: needinfo?(esawin)
No crashes have been encountered on the 47.0b10 build in today's testing session. 

The testing took place on the following devices: Samsung Galaxy Note 5(Android 5.1.1), Xiaomi mi i4 (Android 5.0.2), Samsung Galaxy S5 (Android 5.0.2), Nexus 5 (Android 6.0.1), Nexus 6P (Android N Beta), 	Alcatel OneTouch, and on tablets: Xiaomi MI pad2 (Android 5.1.1), Nexus 9 (Android 5.1), Nexus 9 (Android 6.1).

I will leave this issue open in order to gather more information in the Crash-stats reports.
You need to log in before you can comment on or make changes to this bug.