Closed
Bug 1162364
Opened 9 years ago
Closed 9 years ago
detect MF_E_TRANSFORM_STREAM_CHANGE infinite loops
Categories
(Core :: Audio/Video, defect)
Tracking
()
RESOLVED
FIXED
mozilla41
People
(Reporter: karlt, Assigned: karlt)
References
Details
Attachments
(4 files, 3 obsolete files)
6.41 KB,
patch
|
cpearce
:
review+
benjamin
:
feedback+
vladan
:
feedback+
|
Details | Diff | Splinter Review |
3.01 KB,
patch
|
cpearce
:
review+
|
Details | Diff | Splinter Review |
6.42 KB,
patch
|
Sylvestre
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
3.00 KB,
patch
|
lizzard
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
Some stacks (e.g. [1]) tracked in bug 1129455 have the platform decoder thread running in WMFVideoMFTManager::ConfigureVideoFrameGeometry() which indicates that MF_E_TRANSFORM_STREAM_CHANGE is involved. If the transform implementation is returning MF_E_TRANSFORM_STREAM_CHANGE again and again, then we would have an infinite loop, which would seem the simplest explanation for most of the hangs in bug 1129455 We rely on the transform to behave as specified, returning MFT_OUTPUT_DATA_BUFFER_FORMAT_CHANGE only once on each change. We handle MF_E_TRANSFORM_STREAM_CHANGE assuming MFT_OUTPUT_DATA_BUFFER_FORMAT_CHANGE [2]. However, if the transform creates a new stream then we are not handling that properly [3], though I don't know why that would happen. It should not be necessary to repeat ProcessOutput on the transform more than once. We can detect if MF_E_TRANSFORM_STREAM_CHANGE is returned a second time and error out of the decode. [1] https://crash-stats.mozilla.com/report/index/b901bff8-3d02-4fda-bcb8-dec762150427#allthreads [2] https://msdn.microsoft.com/en-us/library/windows/desktop/ms704014%28v=vs.85%29.aspx#stream_changes [3] https://msdn.microsoft.com/en-us/library/windows/desktop/ms699875%28v=vs.85%29.aspx
Assignee | ||
Comment 1•9 years ago
|
||
Attachment #8605514 -
Flags: review?(cpearce)
Assignee | ||
Comment 2•9 years ago
|
||
Attachment #8605531 -
Flags: review?(cpearce)
Updated•9 years ago
|
Attachment #8605514 -
Flags: review?(cpearce) → review+
Comment 3•9 years ago
|
||
Comment on attachment 8605531 [details] [diff] [review] report telemetry on WMFMediaDataDecoder errors Review of attachment 8605531 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/media/fmp4/wmf/WMFMediaDataDecoder.cpp @@ +78,5 @@ > + sample = 0; > + } else if (hr < 0xc00d36b0) { > + sample = 1; // low bucket > + } else if (hr < 0xc00d3700) { > + sample = hr & 0xffU; // MF_E_* What if only one of the last two bits is set? Then couldn't this value be misreported as 1 or 2? Ditto for the MF_E_TRANSFORM_* case below. Or is there something about the layout of the error codes that makes it safe? @@ +89,5 @@ > + } else { > + sample = 3; // high bucket > + } > + > + nsCOMPtr<nsIRunnable> runnable = new WMFTelemeteryRunnable(sample); You could use NS_NewRunnableFunction() instead of defining a new Runnable for this, and either pass a lambda, or (I assume) &Telemetry::Accumulate and the arguments to the call you want to make.
Assignee | ||
Comment 4•9 years ago
|
||
Comment on attachment 8605531 [details] [diff] [review] report telemetry on WMFMediaDataDecoder errors (In reply to Chris Pearce (:cpearce) from comment #3) > ::: dom/media/fmp4/wmf/WMFMediaDataDecoder.cpp > @@ +78,5 @@ > > + sample = 0; > > + } else if (hr < 0xc00d36b0) { > > + sample = 1; // low bucket > > + } else if (hr < 0xc00d3700) { > > + sample = hr & 0xffU; // MF_E_* > > What if only one of the last two bits is set? Then couldn't this value be > misreported as 1 or 2? Ditto for the MF_E_TRANSFORM_* case below. Or is > there something about the layout of the error codes that makes it safe? In this block the last two bytes range from 0xb0 to 0xff. Similarly in the MF_E_TRANSFORM_* case, they range from 0x60 to 0x78. BTW these are useful references for error codes: http://errco.de/win32/mferror-h/ https://github.com/Alexpux/mingw-w64/blob/master/mingw-w64-headers/include/mferror.h But the non-overlapping layout is provided by the "if" blocks rather than the set of possible codes. > You could use NS_NewRunnableFunction() instead of defining a new Runnable > for this, and either pass a lambda, or (I assume) &Telemetry::Accumulate and > the arguments to the call you want to make. Sounds like that could be simpler. Will look thanks.
Attachment #8605531 -
Flags: review?(cpearce)
Assignee | ||
Comment 5•9 years ago
|
||
NS_NewRunnableFunction currently accepts only a single parameter. I like how the lamba associates parameters with the function call anyway.
Attachment #8605587 -
Flags: review?(cpearce)
Assignee | ||
Updated•9 years ago
|
Attachment #8605531 -
Attachment is obsolete: true
Assignee | ||
Comment 6•9 years ago
|
||
(In reply to Karl Tomlinson (ni?:karlt) from comment #4) > (In reply to Chris Pearce (:cpearce) from comment #3) > > ::: dom/media/fmp4/wmf/WMFMediaDataDecoder.cpp > > @@ +78,5 @@ > > > + sample = 0; > > > + } else if (hr < 0xc00d36b0) { > > > + sample = 1; // low bucket > > > + } else if (hr < 0xc00d3700) { > > > + sample = hr & 0xffU; // MF_E_* > > > > What if only one of the last two bits is set? Then couldn't this value be > > misreported as 1 or 2? Ditto for the MF_E_TRANSFORM_* case below. Or is > > there something about the layout of the error codes that makes it safe? > > In this block the last two bytes range from 0xb0 to 0xff. > Similarly in the MF_E_TRANSFORM_* case, they range from 0x60 to 0x78. I mean last to hex digits. i.e. last byte. There is no overlap with 1 or 2 because higher bits are set in this byte.
Comment 7•9 years ago
|
||
Comment on attachment 8605587 [details] [diff] [review] report telemetry on WMFMediaDataDecoder errors (lamba version) Review of attachment 8605587 [details] [diff] [review]: ----------------------------------------------------------------- You also need approval from a Telemetry peer. https://wiki.mozilla.org/Firefox/Data_Collection
Attachment #8605587 -
Flags: review?(cpearce) → review+
Assignee | ||
Comment 8•9 years ago
|
||
Comment on attachment 8605587 [details] [diff] [review] report telemetry on WMFMediaDataDecoder errors (lamba version) https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Adding_a_new_Telemetry_probe#Declaring_a_Histogram "You need approval from a data collection peer to add an opt-out measurement: https://wiki.mozilla.org/Firefox/Data_Collection" seems to imply that default opt-in releaseChannelCollection doesn't require approval, but https://wiki.mozilla.org/Firefox/Data_Collection "For every new measurement, even a simple new Telemetry probe, please request approval by setting the NEEDINFO flag for the module owner or a peer." says otherwise. The intention of this probe is to find out the common causes of Windows Media Framework video decode failure in browser usage. Only one error code corresponds to the MF_E_TRANSFORM_STREAM_CHANGE failure suspected in this bug, but this probe will tell us whether that is one of the common causes.
Attachment #8605587 -
Flags: feedback?(benjamin)
Assignee | ||
Comment 9•9 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3ad9789530c3
Assignee | ||
Comment 10•9 years ago
|
||
Comment on attachment 8605587 [details] [diff] [review] report telemetry on WMFMediaDataDecoder errors (lamba version) I notice Vladan has been looking at some of these approval requests recently.
Attachment #8605587 -
Flags: feedback?(vdjeric)
Comment 11•9 years ago
|
||
Afaik, you don't need approval for opt-in probes, unless you're reporting sensitive/potentially private information. I agree the wording on the Data Collection page is contradictory, I'll ask Benjamin for clarification
Updated•9 years ago
|
Attachment #8605587 -
Flags: feedback?(vdjeric) → feedback+
Updated•9 years ago
|
Attachment #8605587 -
Flags: feedback?(benjamin) → feedback+
Comment 12•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/3f28c4380901 https://hg.mozilla.org/integration/mozilla-inbound/rev/6bdabdc61b68
Comment 13•9 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/3f28c4380901 https://hg.mozilla.org/mozilla-central/rev/6bdabdc61b68
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
status-firefox41:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
Comment 14•9 years ago
|
||
So it turns out all new data collection needs privacy peer approval. For the vast majority of opt-in histograms, you'll receive a quick r+ within 24 hours.
Comment 15•9 years ago
|
||
I updated the docs
Assignee | ||
Comment 16•9 years ago
|
||
https://crash-stats.mozilla.com/report/index/a9c4f9ac-1b1c-4c4f-9175-3d1a02150518#allthreads is a 40.0a2 hang report with MF_E_TRANSFORM_STREAM_CHANGE involved.
Assignee | ||
Comment 17•9 years ago
|
||
3f28c4380901 was backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/a8c823f6e6c5
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 18•9 years ago
|
||
It turns out that sometimes MF_E_TRANSFORM_STREAM_CHANGE is returned on 2 consecutive calls to ProcessOutput(). On Windows 7, try runs with some logging [1][2] show that the video decoder using MFT_OUTPUT_STREAM_PROVIDES_SAMPLES returns MF_E_TRANSFORM_STREAM_CHANGE twice. Sometimes MF_E_TRANSFORM_NEED_MORE_INPUT is returned between the two calls returning MF_E_TRANSFORM_STREAM_CHANGE and sometimes not. When the MF_E_TRANSFORM_STREAM_CHANGE calls are consecutive we hit the loop abort introduced in attachment 8605514 [details] [diff] [review]. cpearce observed a similar situation with the AAC decoder on Windows 8.1. The try logging of reveals that on the first MF_E_TRANSFORM_STREAM_CHANGE from the video decoder the number of available output types changes as well as cbSize and cbAlignment in MFT_OUTPUT_STREAM_INFO. There is no change in MFT_OUTPUT_STREAM_INFO::dwFlags. On the second MF_E_TRANSFORM_STREAM_CHANGE, there is no change in any of these properties. It seems that the transform provides an initial set of output types and then provides an updated list when it has enough input to determine more information about the stream. This is consistent with "To avoid an initial format change, provide as much information in the input type as possible, including:" [3]. "To guard against endless loops, the MFT should not set the MFT_OUTPUT_DATA_BUFFER_FORMAT_CHANGE flag again until there is another change" [4] and the transform has no new information between the 2 calls so I think there should not be another consecutive change, but perhaps an async handling of input or other implementation detail means we are getting the extra notification. If this can happen twice consecutively, then there is no reason why it won't happen more often. MFT_OUTPUT_DATA_BUFFER_FORMAT_CHANGE is being returned in MFT_OUTPUT_DATA_BUFFER::dwStatus with each MF_E_TRANSFORM_STREAM_CHANGE, as expected. What is not expected is that the pdwStatus parameter from ProcessOutput() is returning MFT_PROCESS_OUTPUT_STATUS_NEW_STREAMS. GetStreamIDs() returns E_NOTIMPL which means "The transform has a fixed number of streams" [5]. I suspect someone accidentally tried to return MFT_OUTPUT_DATA_BUFFER_FORMAT_CHANGE in this parameter because MFT_OUTPUT_DATA_BUFFER_FORMAT_CHANGE == MFT_PROCESS_OUTPUT_STATUS_NEW_STREAMS, which should be OK because the values should be returned in different places. The logging reveals some other decoder transforms do not follow the prescribed behavior. MFT_OUTPUT_DATA_BUFFER_NO_SAMPLE is frequently in output.dwStatus when returning MF_E_TRANSFORM_NEED_MORE_INPUT, even though "If no streams are ready to produce output, the MFT does not set this flag. Instead, the ProcessOutput method returns MF_E_TRANSFORM_NEED_MORE_INPUT" [6] and GetStreamCount() is returning one input and one output stream. This transform returns mOutputStreamInfo.dwFlags=1 and reports no stream change return values. Another transform is returning 0x1000007 in MFT_OUTPUT_DATA_BUFFER::dwStatus even though this is not a valid _MFT_OUTPUT_DATA_BUFFER_FLAGS (which are "not bit flags") [7]. This is MFT_OUTPUT_DATA_BUFFER_INCOMPLETE with 3 extra bits set. This transform returns dwFlags=0 and also reports no stream change return values. [1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=0b92a36ffc58 [2] https://treeherder.mozilla.org/#/jobs?repo=try&revision=42d948267019 [3] https://msdn.microsoft.com/en-us/library/windows/desktop/dd797815%28v=vs.85%29.aspx [4] https://msdn.microsoft.com/en-us/library/windows/desktop/ms704014%28v=vs.85%29.aspx#stream_changes [5] https://msdn.microsoft.com/en-us/library/windows/desktop/ms693988%28v=vs.85%29.aspx [6] https://msdn.microsoft.com/en-us/library/windows/desktop/ms702281%28v=vs.85%29.aspx [7] https://msdn.microsoft.com/en-us/library/windows/desktop/ms702281%28v=vs.85%29.aspx
Assignee | ||
Comment 19•9 years ago
|
||
Attachment #8609135 -
Flags: review?(cpearce)
Comment 20•9 years ago
|
||
Comment on attachment 8609135 [details] [diff] [review] detect and abort MF_E_TRANSFORM_STREAM_CHANGE infinite loops v2 Review of attachment 8609135 [details] [diff] [review]: ----------------------------------------------------------------- Don't forget to address my comment please! ::: dom/media/platforms/wmf/WMFAudioMFTManager.cpp @@ +216,5 @@ > NS_ENSURE_TRUE(SUCCEEDED(hr), hr); > + // Catch infinite loops, but some decoders perform at least 2 stream > + // changes on consecutive calls, so be permissive. > + // 100 is arbitrarily > 2. > + NS_ENSURE_TRUE(typeChangeCount < 100, MF_E_TRANSFORM_STREAM_CHANGE); ++typeChangeCount;
Attachment #8609135 -
Flags: review?(cpearce) → review+
Assignee | ||
Comment 22•9 years ago
|
||
(In reply to Chris Pearce (:cpearce) from comment #20) > Don't forget to address my comment please! > ++typeChangeCount; Thank you. Added.
Assignee | ||
Comment 24•9 years ago
|
||
This was backed out due to failures caused by another changeset and relanded.
Assignee | ||
Comment 25•9 years ago
|
||
Of a sample ten 20150521175336/38.0.5b0 shutdown hangs tracked in bug 1129455, half involve MF_E_TRANSFORM_STREAM_CHANGE. 3 with SetDecoderOutputType() calling SetOutputType(): https://crash-stats.mozilla.com/report/index/422ff099-6a1b-4696-85d6-b3c972150523#allthreads https://crash-stats.mozilla.com/report/index/211ef738-c969-4e45-bc1f-1248d2150524#allthreads https://crash-stats.mozilla.com/report/index/0d9142eb-e03b-4877-a3ba-7e8712150524#allthreads 2 in ConfigureVideoFrameGeometry: https://crash-stats.mozilla.com/report/index/11bebae4-344f-48e2-b171-e50af2150524#allthreads https://crash-stats.mozilla.com/report/index/99aee6b8-f6f1-4af8-8dfe-552a12150523#allthreads That doesn't prove that there is an infinite loop here, but it is a scenario worth excluding.
Comment 26•9 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/975a4f9dea3d https://hg.mozilla.org/mozilla-central/rev/4ff2c52aad5c
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
Assignee | ||
Updated•9 years ago
|
Attachment #8605514 -
Attachment is obsolete: true
Assignee | ||
Comment 27•9 years ago
|
||
Approval Request Comment [Feature/regressing bug #]: WMF media playback (several releases old) [User impact if declined]: Potential shutdown hang. See comment 25. I don't know that this will work around the observed shutdown hangs, and so I'm requesting approval only for Aurora right now. We can then see if it changes the shutdown hang reports and perhaps if there is time consider again for Beta. [Describe test coverage new/current, TreeHerder]: Existing test coverage was sufficient to detect that the first version of this patch was too strict. This version is very lenient, bailing only when things are obviously wrong. [Risks and why]: Risks are low given the simplicity of the patch and existing test. [String/UUID change made/needed]: none.
Attachment #8611587 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 28•9 years ago
|
||
Approval Request Comment [Feature/regressing bug #]: Telemetry on WMF decode errors including reporting when these suspected infinite loops are aborted. [User impact if declined]: Less feedback provided to developers. [Describe test coverage new/current, TreeHerder]: On m-c. [Risks and why]: This is a little more complicated than the other patch, but any failure should be limited to problems in reporting this specific telemetry data, and testing coverage provides assurance of that. [String/UUID change made/needed]: none.
Attachment #8611594 -
Flags: approval-mozilla-aurora?
Comment 29•9 years ago
|
||
Comment on attachment 8611587 [details] [diff] [review] 40 branch patch: detect and abort infinite loops Shutdown hangs are a pain. Taking it.
Attachment #8611587 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Updated•9 years ago
|
Attachment #8611594 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 30•9 years ago
|
||
Comment on attachment 8611587 [details] [diff] [review] 40 branch patch: detect and abort infinite loops This doesn't apply.
Flags: needinfo?(karlt)
Assignee | ||
Comment 31•9 years ago
|
||
Attachment #8611587 -
Attachment is obsolete: true
Flags: needinfo?(karlt)
Assignee | ||
Updated•9 years ago
|
Attachment #8611594 -
Attachment description: 40 branch patch: telemetry → 40 branch patch 2: telemetry
Comment 32•9 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/580fbd9d7778 https://hg.mozilla.org/releases/mozilla-aurora/rev/22e1e8e0eb6c
Assignee | ||
Comment 33•9 years ago
|
||
Comment on attachment 8612592 [details] [diff] [review] 40 branch patch 1: detect and abort infinite loops (with adjustments to diff comment) Approval Request Comment I'm requesting beta approval for this patch only. I'm not requesting approval for the telemetry patch because we wouldn't have time to act on data before release anyway, so it is not worth taking that code change. [Feature/regressing bug #]: WMF media playback (several releases old) [User impact if declined]: One thread may spin and cause a shutdown hang. See comment 25. There are no hang reports on aurora like those in comment 25 since this change landed on aurora. The small number of aurora data points, and the apparently different distribution on aurora, means that doesn't prove anything. Only one of the dozen media shutdowns hangs reported on the build prior to this change had such a hang report. However, telemetry on Aurora shows 1 data point out of 35k indicating that the suspected situation here is really happening for some small proportion of users. [Describe test coverage new/current, TreeHerder]: Existing test coverage was sufficient to detect that the first version of this patch was too strict. This version is very lenient, bailing only when things are obviously wrong. [Risks and why]: Risks are low given the simplicity of the patch and existing tests. [String/UUID change made/needed]: none
Attachment #8612592 -
Flags: approval-mozilla-beta?
Updated•9 years ago
|
status-firefox39:
--- → affected
Comment 34•9 years ago
|
||
Comment on attachment 8612592 [details] [diff] [review] 40 branch patch 1: detect and abort infinite loops (with adjustments to diff comment) Seems reasonable, let's uplift it to beta 4.
Attachment #8612592 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in
before you can comment on or make changes to this bug.
Description
•