Closed
Bug 1164480
Opened 9 years ago
Closed 9 years ago
Crashes on startup in AppleVDADecoder::CreateOutputConfiguration() in content process
Categories
(Core :: Audio/Video, defect)
Tracking
()
VERIFIED
FIXED
mozilla41
People
(Reporter: smichaud, Assigned: smichaud)
References
Details
(Keywords: regression, reproducible, topcrash-mac, Whiteboard: [STR in comment #17] )
Crash Data
Attachments
(3 files, 1 obsolete file)
4.09 KB,
patch
|
smichaud
:
feedback-
|
Details | Diff | Splinter Review |
1.17 KB,
patch
|
smichaud
:
review+
|
Details | Diff | Splinter Review |
1.41 KB,
patch
|
jya
:
review+
Sylvestre
:
approval-mozilla-aurora+
lizzard
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
For example bp-5132843b-d0dc-4398-bfe0-c6ad82150512. These started with the 20150509030210 mozilla-central nightly. They are similar to (and may be related to) the crashes at 1153809.
Assignee | ||
Comment 1•9 years ago
|
||
Bug 1153809 turned out to be a problem with the content process sandbox. So, probably, will these. They only happen on OS X 10.9 and up (and the content process sandbox is only enabled (only has a non-empty ruleset) on 10.9 and up). These are a Mac topcrasher on the 41 branch.
Crash Signature: [@ CoreFoundation@0x117d8 ]
[@ CoreFoundation@0x11a5b ]
Keywords: topcrash-mac
Assignee | ||
Updated•9 years ago
|
Keywords: regression
Assignee | ||
Comment 2•9 years ago
|
||
Implied regression range: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=86203ac87a08&tochange=39dc888ce14c Nothing here jumps out at me. And in any case it'd (probably) be the trigger, not the cause.
Assignee | ||
Comment 3•9 years ago
|
||
These appear to all be crashes on startup. The most common URL is about:blank.
Summary: Crashes in AppleVDADecoder::CreateOutputConfiguration() in content process → Crashes on startup in AppleVDADecoder::CreateOutputConfiguration() in content process
Assignee | ||
Comment 4•9 years ago
|
||
Jya, can you identify the trigger in the implied regression range from comment #2? If we knew it, it might be easier to come up with STR. (And of course there's little we can do here without STR.)
Flags: needinfo?(jyavenard)
Assignee | ||
Comment 5•9 years ago
|
||
These crashes may turn out to be hardware-specific (as were those for bug 1153809). There's a strong correlation with Intel HD Graphics 4000 hardware (vendor 0x8086, device 0x0166). But it's by no means the only kind of graphics hardware on which these crashes happen. And Intel isn't the only vendor. Here are some others: Vendor Adapter ID Intel 0x0d26 Intel 0x0a26 Intel 0x0116 https://crash-stats.mozilla.com/search/?signature=~CoreFoundation%400x117d8&version=40.0a1&_facets=signature&_facets=adapter_vendor_id&_facets=adapter_device_id&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-adapter_vendor_id
Assignee | ||
Comment 6•9 years ago
|
||
My (Retina) MacBookPro10,1 has Intel HD Graphics 4000 hardware! Later I'll try to reproduce these crashes ... or at least see if I get any sandbox errors.
Assignee | ||
Comment 7•9 years ago
|
||
I'm unable to reproduce these crashes, on either OS X 10.10.3 or 10.9.5. I also don't see any video-hardware-specific sandbox errors (in the system console). I tested yesterday's m-c nightly with a fresh profile. I tested with and without gfxCardStatus (https://gfx.io/, which I used to force the nightly to always use the "integrated" Intel HD Graphics 4000 hardware on my MacBookPro10,1). So, Jya, please do see if you can identify this bug's trigger. Without that we may never be able to find STR.
Assignee | ||
Comment 8•9 years ago
|
||
I took another look at our crash stacks and noticed that mozilla::CreateTestH264Decoder was added by the patch for bug 1160321, which is in the implied regression range from comment #2. So that's the most likely trigger. Cpearce, do you have any idea how that patch could have caused/triggered these crashes?
Blocks: 1160321
Flags: needinfo?(cpearce)
Assignee | ||
Comment 9•9 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/03dc784d5456 Bug 1160321 - Test whether we can create H.264/AAC decoders before we report we support them. r=mattwoodrow author Chris Pearce <cpearce@mozilla.com> Fri, 08 May 2015 13:36:32 +1200
Comment 10•9 years ago
|
||
Yes, this is the one I was going to point to. It would be interesting to see if applying the patches from bug 1163458 fix the problem, in particular path #2. Currently, the apple PlatformDecoderModule constantly link/unlink the framework (the last decoder when destroyed unlink the framework). Bug 1160321 causes the creation / deletion of a dummy PDM to ensure that we can decode video. A quick fix possibility would be to make the code introduced by bug 1160321 conditional, as i can't foresee a reason creating a PDM wouldn't work on any mac we support (>= 10.6) Jean-Yves
Flags: needinfo?(jyavenard)
Assignee | ||
Comment 11•9 years ago
|
||
[Tracking Requested - why for this release]: This is a Mac topcrasher. Currently only the 41 branch (currently trunk) is affected, because e10s mode is only available on the trunk. But if we allow e10s mode to be turned on in the Developer Edition, the 40 branch would also be affected.
status-firefox40:
--- → affected
status-firefox41:
--- → affected
tracking-firefox40:
--- → ?
tracking-firefox41:
--- → ?
Comment 12•9 years ago
|
||
This JS code should trigger CreateTestH264Decoder: document.createElement("video").canPlayType("video/mp4; codecs=\"avc1.42E01E, mp4a.40.2\""); Following up on what jya was saying; we create a PlatformDecoderModule in CreateTestH264Decoder(), and use that to create a MediaDataDecoder, which we return. However we'll destroy the PDM when we exit CreateTesth264Decoder, which unlinks the system video decoding libraries, so when the caller uses the returned MediaDataDecoder we could end up calling into code in the system libraries that has been removed from our process space. So we may need to return the PDM as well as the MediaDataDecoder to ensure it doesn't die.
Flags: needinfo?(cpearce)
Comment 13•9 years ago
|
||
Steven: Try this patch, it may help.
Attachment #8605606 -
Flags: feedback?(smichaud)
Assignee | ||
Comment 15•9 years ago
|
||
(In reply to comment #13) I'm not (yet) able to reproduce these crashes, so there's no point (yet) in trying your fix, Chris. I've noticed that CreateTestH264Decoder doesn't normally get called on startup (where these crashes happen). I'll try to figure out why that's happening. The code snippet from comment #12 doesn't crash for me when called at "normal" times (on my MacBookPro10,1, on OS X 10.10.3). In that regard it's odd that so many of the crash URLs are "about:blank". You'd think the only way to trigger a call to CreateTestH264Decoder is to load a web page. Does anyone know if there exceptions to this (presumed) rule?
Assignee | ||
Comment 16•9 years ago
|
||
Putting the code snippet from comment #12 in a JavaScript onload handler and reloading the testcase on startup doesn't trigger these crashes. But, in any case, doing that doesn't call CreateTestH264Decoder the way it's called in this bug's crash stacks. That kind of testcase produces a call to CreateTestH264Decoder (indirectly) from nsDocShell::EndPageLoad(). But none of this bug's crash stacks have that call in them. Instead, what all this bug's crash stacks have is a call to nsHtml5TreeOpExecutor::RunFlushLoop(), called (indirectly) from nsTimerImpl::Fire(). Anyone know what that means? Anyone know how to trigger a call to CreateTestH264Decoder (indirectly) from nsHtml5TreeOpExecutor::RunFlushLoop()?
Assignee | ||
Comment 17•9 years ago
|
||
Hooray, I've got STR for this bug! 1) Set a current m-c nightly to "show my windows and tabs from last time" "when Nightly starts". 2) Visit http://rss.slashdot.org/~r/Slashdot/slashdot/~3/GVTelbFuSos/the-decline-of-pixel-art 3) Quit Nightly and restart. I got the URL from crash stats. Interestingly my own crash reports all show the URL as "about:blank" or "about:home". That's a bug, I suppose. bp-51fcefe2-bf23-4435-a492-d4aeb2150514 bp-89feb3c0-82d1-4db2-bae1-64d652150514 bp-e00f9876-8a8e-4be9-b3dc-2e5062150514 bp-6e8549ac-88f8-4a4f-91e3-7fc2c2150514
Keywords: reproducible
Whiteboard: [STR in comment #17]
Assignee | ||
Comment 18•9 years ago
|
||
> bp-51fcefe2-bf23-4435-a492-d4aeb2150514
> bp-89feb3c0-82d1-4db2-bae1-64d652150514
> bp-e00f9876-8a8e-4be9-b3dc-2e5062150514
> bp-6e8549ac-88f8-4a4f-91e3-7fc2c2150514
The interpose.dylib in some of these is my own interpose library that I was using for debug logging. You won't see it in other crash logs ... and no, it doesn't trigger the crashes :-)
Assignee | ||
Comment 19•9 years ago
|
||
My STR doesn't work with e10s off, even though (in that case) CreateTestH264Decoder is still called (indirectly) from nsHtml5TreeOpExecutor::RunFlushLoop().
Assignee | ||
Comment 20•9 years ago
|
||
The content process sandbox isn't involved here. Setting security.sandbox.content.level doesn't make these crashes go away.
Assignee | ||
Comment 21•9 years ago
|
||
> 2) Visit http://rss.slashdot.org/~r/Slashdot/slashdot/~3/GVTelbFuSos/the-decline-of-pixel-art
You need to do this in the current tab, or make it current before you quit and restart.
Assignee | ||
Comment 22•9 years ago
|
||
My STR from comment #17 "works" (I crash) on all supported versions of OS X (back to 10.6.8), and on all video hardware on which I tried it. So this bug is not OS-X-version-specific or hardware-specific.
Assignee | ||
Comment 23•9 years ago
|
||
Comment on attachment 8605606 [details] [diff] [review] Patch: potential fix This patch doesn't help. With it my STR from comment #17 still crashes. My guess is that some kind of necessary initialization doesn't happen, in e10s mode, before the first call to CreateTestH264Decoder (via nsHtml5TreeOpExecutor::RunFlushLoop(), on startup). In any case, it does seem that the call to CreateTestH264Decoder happens "too early" in e10s mode (in the content process).
Attachment #8605606 -
Flags: feedback?(smichaud) → feedback-
Assignee | ||
Comment 24•9 years ago
|
||
> Interestingly my own crash reports all show the URL as "about:blank" > or "about:home". That's a bug, I suppose. I've opened bug 1165017 on this issue.
Assignee | ||
Comment 25•9 years ago
|
||
I wonder if this bug and bug 1165017 are related -- if both bugs aren't caused by some kind of global initialization taking place "too late" in the content process.
Assignee | ||
Comment 26•9 years ago
|
||
I no longer think bug 1165017 is related -- it seems characteristic of almost all content process crashes, on all platforms.
Comment 27•9 years ago
|
||
I was able to reproduce this on my 2015 MBP. I am going to #ifdef the code that causes this out on everything except Windows in bug 1163814, since it causes problems on Android too. At this stage I think we only really need this code on Windows, so Steven I don't think you need to spend more time trying to make it work on MacOSX. I have a patch in bug 1163814 going through Try. I will hopefully be able to land it later on tonight if the tree re-opens, but Steven you're welcome to land it for me tomorrow (which is Saturday for me) if I don't get to it tonight.
Assignee | ||
Comment 28•9 years ago
|
||
I couldn't give up on this, after already having spent so much time on it :-) The fix is *very* simple. The problem is that (with this bug's STR) MacIOSurfaceLib::kPropIsGlobal (a CFStringRef) is getting referenced before it's been initialized (before MacIOSurfaceLib::LoadLibrary() has been called and the IOSurface framework has been dlopened). My patch just uses a copy of the CFStringRef to which MacIOSurfaceLib::kPropIsGlobal will eventually point.
Assignee: nobody → smichaud
Attachment #8606531 -
Flags: review?(jyavenard)
Comment 29•9 years ago
|
||
I have to say, I'm always super impressed on your ability to reproduce a bug and find a fix. This is very impressive! how can MacIOSurfaceLib::LoadLibrary() not be called earlier though? Shouldn't we make the reference to kPropIsGlobal a function getter instead that would load the library as required ?
Comment 30•9 years ago
|
||
Comment on attachment 8606531 [details] [diff] [review] Fix Review of attachment 8606531 [details] [diff] [review]: ----------------------------------------------------------------- A bit unfortunate that work around. We did lots of work to avoid copying those CFSTR and use the system's one.
Attachment #8606531 -
Flags: review?(jyavenard) → review+
Assignee | ||
Comment 31•9 years ago
|
||
> I have to say, I'm always super impressed on your ability to > reproduce a bug and find a fix. This is very impressive! Thanks! :-) > Shouldn't we make the reference to kPropIsGlobal a function getter > instead that would load the library as required? I gave some thought to calling MacIOSurfaceLib::LoadLibrary() at the beginning of the function. If we were going to do this "properly", I think that'd be better than writing a getter for kPropIsGlobal. But I'd want to look at the context before committing to this. In particular I'd want to find out if there is some reason we don't call MacIOSurfaceLib::LoadLibrary() as early in the content process as we do in the main process. The patch I posted has the advantage of being very simple, though, and almost certainly very safe. Apple has long followed the pattern of making its CFStringRef variable names match their values -- so, for example, the value of kCFBooleanTrue is CFSTR("kCFBooleanTrue"), and this is exactly what you'd expect. Let me look further into this on Monday.
Assignee | ||
Comment 32•9 years ago
|
||
> In particular I'd want to find out if there is some reason we don't
> call MacIOSurfaceLib::LoadLibrary() as early in the content process
> as we do in the main process.
As best I can tell it's just an accident: For some reason the calls
to nsHtml5TreeOpExecutor::RunFlushLoop() start much earlier (as
Firefox is being started up) in the content process than in the main
process. Since calls to RunFlushLoop() are supposed to be blocked
while scripts are running, my guess is that in the main process
they're blocked until fairly late by Chrome-priviledged scripts that
run on startup.
So in the main process the MacIOSurfaceLib libraries are never
dlopened from calls to RunFlushLoop() (by the time calls to
RunFlushLoop() start, these librararies have always already been
dlopened by other means). I don't *know* that it's harmful to do this
from RunFlushLoop(), but my gut tells me it's probably safer not to.
So I'm going to go with my original fix ... once mozilla-inbound
reopens.
Assignee | ||
Comment 34•9 years ago
|
||
Oddly, the file I patched seems to have moved very recently to elsewhere in the tree. So I had to unbitrot my patch before landing. Carrying forward r+.
Attachment #8606531 -
Attachment is obsolete: true
Attachment #8607300 -
Flags: review+
Comment 35•9 years ago
|
||
the PlatformDecoderModules have moved to their own directory (dom/media/platforms)
Comment 36•9 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/d9b6cc4ce4cc
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
Regressed video on OS X, see bug 1166394.
Assignee | ||
Comment 38•9 years ago
|
||
I don't know how my patch for this bug could have caused bug 1166394, but it seems to have done so. I'm going to back it out on mozilla-inbound as soon as I can.
Assignee | ||
Comment 39•9 years ago
|
||
Comment on attachment 8607300 [details] [diff] [review] Fix (unbitrot) I've done the backout: https://hg.mozilla.org/integration/mozilla-inbound/rev/a8463ba74546
Assignee | ||
Comment 40•9 years ago
|
||
This bug is now REOPENED. But it's much less urgent than before, since the crashes have been otherwise taken care of by bug 1163814.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Updated•9 years ago
|
Whiteboard: [STR in comment #17] → [STR in comment #17] [keep open]
Assignee | ||
Comment 41•9 years ago
|
||
This has landed on mozilla-central: https://hg.mozilla.org/mozilla-central/rev/a8463ba74546
Assignee | ||
Comment 42•9 years ago
|
||
> This has landed on mozilla-central:
The backout has landed on mozilla-central.
Comment 43•9 years ago
|
||
Alternative fix, just makes sure MacIOSurfaceLib is loaded early in startup.
Attachment #8608445 -
Flags: review?(jyavenard)
Comment 44•9 years ago
|
||
Comment on attachment 8608445 [details] [diff] [review] Load MacIOSurfaceLib early Review of attachment 8608445 [details] [diff] [review]: ----------------------------------------------------------------- I'm really not the best person to review this and I don't know the big picture on when that code is called. But if that works :)
Attachment #8608445 -
Flags: review?(jyavenard) → review+
Assignee | ||
Comment 45•9 years ago
|
||
I've looked around for a way to repair my original patch, but can't find one. So it looks like we'll need to keep the reference to the kIOSurfaceIsGlobal symbol (a CFStringRef) in the IOSurface framework. So Matt's approach (if it works as expected) is probably best. At first (after having recovered from being stunned by the bug 1166394 failures and others), I thought there might be confusion about exactly what kind of indirection we need (for example that we might need a CFStringRef* instead of a CFStringRef, or that we can't use the results of CFSTR() directly). All my variants on using CFSTR() directly or indirectly (or of using other ways to create CFStringRefs) still failed in the same way. Then I noticed that the IOSurface we're creating needs to be passed between processes (bug 1064847 comment #14), and that the failures only occur in e10s mode -- only when the IOSurface actually is being passed between processes. So somehow, for kIOSurfaceIsGlobal we *must* use the symbol in the IOSurface framework. Somehow the OS can flatten/serialize that when passing the IOSurface between processes, but not just any old CFStringRef. At some point I'd like to get to the bottom of this. But right now I don't have the time. And in any case this bug is no longer urgent.
Assignee | ||
Comment 46•9 years ago
|
||
For future reference, here are two IOKit methods that might be useful in getting to the bottom of the problem described in comment #45: CFDataRef IOCFSerialize(CFTypeRef object, CFOptionFlags options); CFTypeRef IOCFUnserialize(const char *buffer, CFAllocatorRef allocator, CFOptionFlags options, CFStringRef *errorString); http://opensource.apple.com/source/IOKitUser/IOKitUser-1050.1.21/
Comment 48•9 years ago
|
||
(In reply to Steven Michaud from comment #45) > I've looked around for a way to repair my original patch, but can't find > one. So it looks like we'll need to keep the reference to the > kIOSurfaceIsGlobal symbol (a CFStringRef) in the IOSurface framework. So > Matt's approach (if it works as expected) is probably best. I think the issue isn't just about having that symbol not defined. CreateOutputConfiguration() just creates the dictionaries. But the callers of CreateOutputConfiguration() in particular VDADecoderCreate() or VTDecompressionSessionCreate() themselves would likely refer to the IOSurface framework internally. So I don't think we could solve the problem just by copying the key.
Assignee | ||
Comment 49•9 years ago
|
||
For CFStringRefs, it isn't supposed to matter if you have two separate copies or two references to one original. CFEqual is supposed to return 'true' for two different but lexically identical CFStringRefs, and CFHash is supposed to return the same value for each. So I still do think serialization/flattening is involved here ... but it would take a long time to figure out exactly how.
Assignee | ||
Comment 50•9 years ago
|
||
For those who are curious, I'd start my investigation by hooking calls to IOCFSerialize and IOCFUnserialize, particularly from the IOSurface framework, using http://people.mozilla.org/~stmichaud/ReverseEngineering/InterposeLibraryTemplate/.
Comment 51•9 years ago
|
||
(In reply to Steven Michaud from comment #49) > For CFStringRefs, it isn't supposed to matter if you have two separate > copies or two references to one original. CFEqual is supposed to return > 'true' for two different but lexically identical CFStringRefs, and CFHash is > supposed to return the same value for each. this is not what I had in mind. Not that VDA or VideoToolbox would use references to that particular CFStringRef (as you said, it doesn't matter), but that it will likely use function from the IOSurface framework, framework that hasn't been loaded yet.
Updated•9 years ago
|
Whiteboard: [STR in comment #17] [keep open] → [STR in comment #17]
Comment 52•9 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/5d6c5b470d38
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 53•9 years ago
|
||
Matt, your patch has been on m-c for about a week, without causing any problems (at least that I'm aware of). Could you request uplift to aurora (the 40 branch)?
Assignee | ||
Updated•9 years ago
|
Flags: needinfo?(matt.woodrow)
Comment 54•9 years ago
|
||
Comment on attachment 8608445 [details] [diff] [review] Load MacIOSurfaceLib early Approval Request Comment [Feature/regressing bug #]: Bug 1160321 [User impact if declined]: Startup crashes [Describe test coverage new/current, TreeHerder]: Manually tested, few weeks on m-c [Risks and why]: Low risk [String/UUID change made/needed]: None.
Flags: needinfo?(matt.woodrow)
Attachment #8608445 -
Flags: approval-mozilla-aurora?
Comment 55•9 years ago
|
||
Comment on attachment 8608445 [details] [diff] [review] Load MacIOSurfaceLib early Thanks. Taking it to remove a startup crash.
Attachment #8608445 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 57•9 years ago
|
||
It looks like Bug 1160321 was uplifted to 39. Should this go along with it? Thanks.
Flags: needinfo?(matt.woodrow)
Updated•9 years ago
|
status-firefox39:
--- → affected
tracking-firefox39:
--- → +
Comment 58•9 years ago
|
||
Comment on attachment 8608445 [details] [diff] [review] Load MacIOSurfaceLib early Approval Request Comment [Feature/regressing bug #]: Bug 1160321 [User impact if declined]: Startup crashes [Describe test coverage new/current, TreeHerder]: Manually tested, few weeks on m-c [Risks and why]: Low risk [String/UUID change made/needed]: None.
Flags: needinfo?(matt.woodrow)
Attachment #8608445 -
Flags: approval-mozilla-beta?
Comment 59•9 years ago
|
||
Comment on attachment 8608445 [details] [diff] [review] Load MacIOSurfaceLib early Approved for uplift to beta.
Attachment #8608445 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Updated•9 years ago
|
Flags: qe-verify+
Comment 61•9 years ago
|
||
Reproduced with Nightly from 2015-05-14 on Mac OS X 10.9.5: bp-9f5b10fb-7269-444e-b42a-fe3e32150619 Verified fixed with 39.0b7 (Build ID: 20150618135210) and latest 41.0a1 (from 2015-06-18), under Mac OS X 10.9.5 and Yosemite 10.10. Also, in Socorro there are no new reports for this signatures: * https://crash-stats.mozilla.com/signature/?signature=CoreFoundation%400x117d8&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&page=1 * https://crash-stats.mozilla.com/signature/?signature=CoreFoundation%400x11a5b&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&page=1
Updated•9 years ago
|
Flags: qe-verify+
You need to log in
before you can comment on or make changes to this bug.
Description
•