crash in msvcr120.dll@0xf608/memcpy via ResourceQueue::CopyData

VERIFIED FIXED in Firefox 38

Status

()

defect
P1
normal
VERIFIED FIXED
5 years ago
4 years ago

People

(Reporter: rowbot, Assigned: mattwoodrow)

Tracking

(Blocks 1 bug, {crash, topcrash-win})

Trunk
mozilla40
x86_64
Windows 7
Points:
---
Dependency tree / graph
Bug Flags:
qe-verify +

Firefox Tracking Flags

(firefox36+ wontfix, firefox37+ wontfix, firefox38+ verified, firefox39+ verified, firefox40 verified)

Details

(crash signature)

Attachments

(1 attachment)

(Reporter)

Description

5 years ago
While attempting to reproduce the crash in Bug 1100063, I ran into this crash.  I do not have reliable steps to reproduce.  All I can say is that I was doing an excessive amount of seeking in YouTube videos trying to reproduce the crash in Bug 1100063.

bp-c48c2d76-e5d3-4ff7-b5be-7e37a2141120
bp-39dc5ad7-dedb-4274-b2c8-87ec32141120
bp-1a1b1aa7-6bcc-43ef-9cd4-32e692141120
bp-40ee45e3-de9b-4b0d-bb58-b51ba2141120
bp-b806799d-9424-4e8e-984a-197a02141120
The msvcr120 frame is a memcpy. The correct signature should be:
[@ memcpy | mozilla::ResourceQueue::CopyData(unsigned __int64, unsigned int, char*) ]
(which we also see in lower volume on aurora)

Kinetik any interest? This is in the top 10 on nightly.

Ted, what do we need to do to pick up symbols for the new CRT?
Status: UNCONFIRMED → NEW
Crash Signature: [@ msvcr120.dll@0xf608 ] → [@ msvcr120.dll@0xf608 ] [@ memcpy | mozilla::ResourceQueue::CopyData(unsigned __int64, unsigned int, char*) ]
Ever confirmed: true
Flags: needinfo?(ted)
Flags: needinfo?(kinetik)
Keywords: crash
Summary: crash in msvcr120.dll@0xf608 → crash in msvcr120.dll@0xf608/memcpy via ResourceQueue::CopyData
MSE crash, bouncing ni? to Anthony to allocate.
Flags: needinfo?(kinetik)
(In reply to dmajor (away/busy) from comment #1)
> Ted, what do we need to do to pick up symbols for the new CRT?

Sadly my cron job for this doesn't work reliably, so I just kick it manually as needed. I just kicked it, hopefully it will pick these symbols up. (Obviously this won't fix these crash signatures without reprocessing, but it will fix new crashes.)
Flags: needinfo?(ted)
This is now #2 in the list of crashers for Aurora 36.
Crash Signature: [@ msvcr120.dll@0xf608 ] [@ memcpy | mozilla::ResourceQueue::CopyData(unsigned __int64, unsigned int, char*) ] → [@ msvcr120.dll@0xf608 ] [@ msvcr120.dll@0xf20c ] [@ memcpy | mozilla::ResourceQueue::CopyData(unsigned __int64, unsigned int, char*) ]
Keywords: topcrash
[Tracking Requested - why for this release]: Because it's a top crasher.
Looks like a nullptr crash with memcpy().

Bug 1098126 is going to change the allocation path entirely here, so lets wait until that lands and see if it fixes the issue.

Comment 7

4 years ago
This is 20% of our Aurora 36 crashes right now, by far our top crash there.
Keywords: topcrashtopcrash-win
Topcrash, tracking.
Ralph, do you think you could help here? Thanks
Flags: needinfo?(giles)

Comment 9

4 years ago
(In reply to Matt Woodrow (:mattwoodrow) from comment #6)
> Looks like a nullptr crash with memcpy().
> 
> Bug 1098126 is going to change the allocation path entirely here, so lets
> wait until that lands and see if it fixes the issue.

Bug 1112822 and bug 1100260 did rise in Dev Edition data from the 17th slightly while the signatures for this bug are going down slightly (bug 1098126 landed on Dev Edition on the 16th).
I can't debug this directly today. My windows vm is at the office where the internet is down, but I'm not much good on Windows anyway. If you can find a mac or linux reproduction that would help. cpearce, can you take a quick look, just to trace where the bad pointer is coming into ResourceQueue::CopyData()?
Flags: needinfo?(giles) → needinfo?(cpearce)
If Matt is correct in comment #6, this should be decreasing in frequency now that Bug 1098126 has landed on both Nightly and Aurora. Have we seen that yet?
Flags: needinfo?(kairo)
This is deep in the bowels of MSE, so I probably know less about this code than you do! ;)
Flags: needinfo?(cpearce)
Priority: -- → P1

Comment 13

4 years ago
(In reply to Ralph Giles (:rillian) from comment #11)
> If Matt is correct in comment #6, this should be decreasing in frequency now
> that Bug 1098126 has landed on both Nightly and Aurora. Have we seen that
> yet?

The specific signatures of this seem to have gone away, yes.

There's a list of other signatures related to MSE that have been rising in the last few days, though, see the bold font ones in https://crash-analysis.mozilla.com/rkaiser/2014-12-21/2014-12-21.firefox.36.explosiveness.html which all started rising on 2014-12-17, would make sense for your team to look into those and file bugs and fixes as needed so we can hopefully improve a few things even before we hit beta.
Flags: needinfo?(kairo)
I'll close this one which is related to the MPEG4Extractor path.
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
I am still able to reproduce this crash on latest Nightly using Windows Vista 64-bit with this steps:

1. Start Firefox
2. Visit 4k video on youtube (eg: https://www.youtube.com/watch?v=iNJdPyoqt8U)
3. Change the quality to 4k - 2160p
4. Pause the video while the spinner is present on video
5. Hit play button 3 times
6. Change the quality to HD - 720p or FullHD - 1080p

bp-68b07679-719c-4891-8833-766142150323
bp-7e7ff768-fcea-440e-a9ee-a15832150323
bp-62ec6ef1-1389-4ebd-a884-d33dc2150323
bp-2051076b-76fa-4ad3-b60f-2ffc82150323
bp-2c7712bf-5ebe-4ed6-b9ef-638dd2150323
bp-ba27456e-a783-460e-a38c-5aaaf2150323
bp-2a112ffc-3510-403e-a8b1-0cf0f2150323

I did not reproduce this on Windows 7 64-bit from a different machine.

Comment 16

4 years ago
Anthony, why did you close this? We are definitely still seeing crashes like this as comment #15 shows and crash-stats also show.
Flags: needinfo?(ajones)
msvcr120.dll@0xf20c is still the #8 topcrash for Firefox 37.0b with 1.32% of crashes (1435 crashes in 37.0b6). It also still affects Firefox 38.  

Tracking for 38 and 39.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Lawrence, I realize we don't have a fix for this but I requested tracking for 37, since it's a topcrash in beta -- in case we end up doing a second RC build.
Flags: needinfo?(lmandel)
I wonder whether main thread eviction could potentially lead to the readOffset passed to CopyData being less than mInputBuffer.mOffset.
As per Liz, I'm going to track this speculatively but I don't expect that we'll be able to fix this bug in 37.
Flags: needinfo?(lmandel)
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #16)
> Anthony, why did you close this? We are definitely still seeing crashes like
> this as comment #15 shows and crash-stats also show.

At the time I thought it was related to the old code path. Comment #15 shows otherwise.
Flags: needinfo?(ajones)
I confirmed with Sheila and Anthony that this bug will not be fixed in 37.
Assignee: nobody → matt.woodrow
I think the problem is that we evict data while waiting on the monitor in SourceBufferResource::ReadInternal and then try read from evicted data.

ReadAtInternal calls SeekAtInternal which makes sure that our read position is after the start of the resource queue [1]. We then call into ReadInternal which can wait on the monitor until sufficient data is available [2].

It seems plausible that we could get an eviction while waiting that would move the start position of the resource queue to be after the read position. We could then call into ResourceQueue::CopyData with aOffset < mOffset.

ResourceQueue::GetAtOffset has an assert to check for this [3], but that won't hit in opt builds. We'll instead compute the difference between the values (negative in this case) and cast to uint32_t [4].

The end result is an invalid value for 'offset' at the crash line, and memcpy crashes.

[1] http://hg.mozilla.org/mozilla-central/annotate/82ae3b4e2215/dom/media/mediasource/SourceBufferResource.cpp#l151
[2] http://hg.mozilla.org/mozilla-central/annotate/82ae3b4e2215/dom/media/mediasource/SourceBufferResource.cpp#l71
[3] http://hg.mozilla.org/mozilla-central/annotate/82ae3b4e2215/dom/media/mediasource/ResourceQueue.h#l199
[4] http://hg.mozilla.org/mozilla-central/annotate/82ae3b4e2215/dom/media/mediasource/ResourceQueue.h#l207
Attachment #8586529 - Flags: review?(jyavenard)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=212f36da8b52

Bogdan, could you please test the try build above (when it completes) to see if you can still reproduce with that build?
Flags: needinfo?(bogdan.maris)
(Reporter)

Comment 26

4 years ago
Matt, I ran into a crash in SourceBufferResource::ReadInternal the other day.  Here is the crash report if you think it will help you prove your theory in comment #23.

bp-6e8ff90c-30d9-4f8b-a4e4-9125d2150329
Comment on attachment 8586529 [details] [diff] [review]
copy-data-crash

Review of attachment 8586529 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with the minor comment addressed.

should be uplifted too

::: dom/media/mediasource/SourceBufferResource.cpp
@@ +203,5 @@
>      mInputBuffer.EvictBefore(aOffset);
>    }
> +  // Wake up any waiting threads in case a ReadInternal call
> +  // is now invalid.
> +  mon.NotifyAll();

should be within the test above.
Attachment #8586529 - Flags: review?(jyavenard) → review+
(In reply to Matt Woodrow (:mattwoodrow) from comment #25)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=212f36da8b52
> 
> Bogdan, could you please test the try build above (when it completes) to see
> if you can still reproduce with that build?

Sure thing!
Using the try build I was unable to crash Firefox. Videos will not work on 4k though, but that`s not the issue here.
Flags: needinfo?(bogdan.maris)
Duplicate of this bug: 1117881
(In reply to Bogdan Maris, QA [:bogdan_maris] from comment #28) 
> Sure thing!
> Using the try build I was unable to crash Firefox. Videos will not work on
> 4k though, but that`s not the issue here.

Awesome, thank you!
https://hg.mozilla.org/mozilla-central/rev/a7b710c75a02
Status: REOPENED → RESOLVED
Last Resolved: 4 years ago4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Flags: qe-verify+

Comment 33

4 years ago
Hi fellow mozillians,

"Status: REOPENED → RESOLVED
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Last Resolved: 2014-12-22 17:18:10 PST → 2015-04-02 05:11:49 PDT
status-firefox40: --- → fixed"

That's mean it's fixed in Firefox 40?
I just reproduced this bug with Firefox 37.0 and 4K Youtube video in HTML5.

https://crash-stats.mozilla.com/report/index/a84440da-1b59-4b8d-9f9f-910082150403
Matt, could you fill out an uplift request please?
Flags: needinfo?(matt.woodrow)

Comment 35

4 years ago
(In reply to pioruns from comment #33)
> Hi fellow mozillians,
> 
> "Status: REOPENED → RESOLVED
> Resolution: --- → FIXED
> Target Milestone: --- → mozilla40
> Last Resolved: 2014-12-22 17:18:10 PST → 2015-04-02 05:11:49 PDT
> status-firefox40: --- → fixed"
> 
> That's mean it's fixed in Firefox 40?
> I just reproduced this bug with Firefox 37.0 and 4K Youtube video in HTML5.
> 
> https://crash-stats.mozilla.com/report/index/a84440da-1b59-4b8d-9f9f-
> 910082150403

Crash info shows you're on beta Catalyst 15.3 (14.502.1014.0) drivers. Does this also happen on stable Catalyst 12.14 (14.501.1003)?

Comment 36

4 years ago
To be honest, I could not reproduce any crash as of yet on newer Firefox 37.0.1 which I have right now. I will try to mess around with 4K videos if any crash will happen I will let you guys know.
Comment on attachment 8586529 [details] [diff] [review]
copy-data-crash

Approval Request Comment
[Feature/regressing bug #]: MSE
[User impact if declined]: Occasional crashes when changing resolution on youtube.
[Describe test coverage new/current, TreeHerder]: Manually tested and confirmed to fix issue.
[Risks and why]: Very low risk
[String/UUID change made/needed]: None
Flags: needinfo?(matt.woodrow)
Attachment #8586529 - Flags: approval-mozilla-beta?
Attachment #8586529 - Flags: approval-mozilla-aurora?
Comment on attachment 8586529 [details] [diff] [review]
copy-data-crash

Should be in 38 beta 3.
Attachment #8586529 - Flags: approval-mozilla-beta?
Attachment #8586529 - Flags: approval-mozilla-beta+
Attachment #8586529 - Flags: approval-mozilla-aurora?
Attachment #8586529 - Flags: approval-mozilla-aurora+
I can`t reproduce this crash anymore on Windows Vista 64-bit using latest Aurora and Firefox 38 beta 9, but Socorro still recorded crashes in the last 7 days after the fix, on Firefox 38 beta and Nightly:
https://crash-stats.mozilla.com/report/list?product=Firefox&signature=msvcr120.dll%400xf608#tab-reports

Any idea why?
Flags: needinfo?(matt.woodrow)
msvcr120.dll@0xf608 is the 'memcpy' function. It's a generic function and many things can make it crash. As long as there are no more crashes where the second line is 'ResourceQueue::CopyData', then we can call this fixed.
Flags: needinfo?(matt.woodrow)

Updated

4 years ago
Depends on: 1161067
(In reply to David Major [:dmajor] from comment #42)
> msvcr120.dll@0xf608 is the 'memcpy' function. It's a generic function and
> many things can make it crash. As long as there are no more crashes where
> the second line is 'ResourceQueue::CopyData', then we can call this fixed.

I see 0 crashes with '@ msvcr120.dll@0xf608' signature in the last week:
https://crash-stats.mozilla.com/report/list?range_unit=days&range_value=7&signature=msvcr120.dll%400xf608#tab-sigsummary

And 35 crashes with '@ msvcr120.dll@0xf20c' signature in the last week but none with 'ResourceQueue::CopyData':
https://crash-stats.mozilla.com/report/list?range_unit=days&range_value=7&signature=msvcr120.dll%400xf20c#tab-sigsummary

Marking this as verified fixed.
You need to log in before you can comment on or make changes to this bug.