Crash in [@ RtlDeleteCriticalSection | mozilla::layers::ImageContainer::~ImageContainer]
Categories
(Core :: Graphics: Layers, defect)
Tracking
()
People
(Reporter: philipp, Assigned: ktaeleman)
References
(Regression)
Details
(Keywords: crash, regression, sec-moderate, Whiteboard: [adv-main82+r][adv-esr78.4+r])
Crash Data
Attachments
(2 files, 2 obsolete files)
47 bytes,
text/x-phabricator-request
|
jcristau
:
approval-mozilla-beta+
jcristau
:
approval-mozilla-esr78+
|
Details | Review |
47 bytes,
text/x-phabricator-request
|
jcristau
:
approval-mozilla-beta+
jcristau
:
approval-mozilla-esr78+
tjr
:
sec-approval+
|
Details | Review |
This bug is for crash report bp-7898d616-3ac6-4e85-b11e-3f0370200803.
Top 10 frames of crashing thread:
0 ntdll.dll RtlDeleteCriticalSection
1 xul.dll mozilla::layers::ImageContainer::~ImageContainer gfx/layers/ImageContainer.cpp:221
2 xul.dll mozilla::layers::ImageLayer::~ImageLayer gfx/layers/ImageLayers.cpp:22
3 xul.dll mozilla::layers::ClientImageLayer::~ClientImageLayer gfx/layers/client/ClientImageLayer.cpp:35
4 xul.dll mozilla::layers::ContainerLayer::RemoveAllChildren gfx/layers/Layers.cpp:846
5 xul.dll mozilla::layers::ClientContainerLayer::~ClientContainerLayer gfx/layers/client/ClientContainerLayer.h:36
6 xul.dll mozilla::layers::ClientContainerLayer::~ClientContainerLayer gfx/layers/client/ClientContainerLayer.h:35
7 xul.dll mozilla::layers::ContainerLayer::RemoveAllChildren gfx/layers/Layers.cpp:846
8 xul.dll mozilla::layers::ClientContainerLayer::~ClientContainerLayer gfx/layers/client/ClientContainerLayer.h:36
9 xul.dll mozilla::layers::ClientContainerLayer::~ClientContainerLayer gfx/layers/client/ClientContainerLayer.h:35
these content crash reports on windows are getting much more frequent in firefox 79.
url correlations are pointing towards various instances of the web game "mahjong dimensions" as the source of the crashing tab:
http://www.mahjong.fr/jouer/mahjongdimensions.php
https://spiele.rtl.de/mahjong-spiele/mahjongg-dimensions-candy/spielen.html
https://www.wildtangent.com/Play/mahjongg-dimensions-html5
curiously german users seem to be overrepresented in this crash signature too:
(75.58% in signature vs 10.19% overall) useragent_locale = de
Comment 1•4 years ago
|
||
:ktaleman Triaging as REO for 79. Could you set a priority and severity for this bug?
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Updated•4 years ago
|
Updated•4 years ago
|
Comment 2•4 years ago
|
||
The volume of this crash on 79 is pretty worrying. Kris, is there anything we can do to help this investigation forward?
Comment 4•4 years ago
|
||
Yes, this slipped under my radar, but it is at the top of the list now.
Comment 5•4 years ago
|
||
So we don't have much to go on besides:
- It is non-WR.
- It is with canvas/WebGL.
- It is Windows only as best I can tell (trying to find signature morphs).
Reviewing the pushlog from 78.0.2 to 79.0:
I believe bug 1632249 or related is the most likely cause of the regression given it was a pretty big change wrt to canvas/WebGL. Jeff, are you able to reproduce and/or have any ideas about this?
Comment 6•4 years ago
|
||
RtlDeleteCriticalSection
could be from the dtor for RecursiveMutex ImageContainer::mRecursiveMutex
.
https://searchfox.org/mozilla-central/source/xpcom/threads/RecursiveMutex.cpp#60
https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-deletecriticalsection
This seems like either double-deletion or delete-while-locked.
There's no WebGL?
in the report I saw, so it can happen without webgl.
Comment 7•4 years ago
|
||
I don't think I touched anything that touches that code, so this is weird.
Comment 8•4 years ago
|
||
Since there are reports from 77 and 78 it's possible this race condition just got 1000x more common due to another change. Over 90% of the reports are unique for their install, so most people aren't running into this more than once, which supports the race condition theory.
Comment 9•4 years ago
|
||
It's NS_INLINE_DECL_THREADSAFE_REFCOUNTING(ImageContainer)
so double-delete seems unlikely.
Maybe we borrow a bare pointer somewhere?
Comment 11•4 years ago
|
||
I took a look at one of the minidumps for this.
It looks like the CRITICAL_SECTION object within the RecursiveMutex has the DebugInfo member as 0xffffffffffffff01.
Valid values for this are valid pointer addresses, 0, and -1 (0xffffffffffffffff).
It is none of these, and we crash attempting to dereference it (0x10 within it, causing a faulting address of 0xffffffffffffff11).
Interestingly, it seems like the vast majority of reports all have the same crashing address, so it doesn't really seem like random corruption.
This value is stored directly within the ImageContainer class, so UAF could be letting someone else write it, but I can't see why it'd always have the same mutation.
Comment 12•4 years ago
|
||
Given the volume of the crash, we need to reprioritize this bug. This is in topcrash territory.
Assignee | ||
Comment 13•4 years ago
•
|
||
I've been unable to reproduce this issue. I've tried stress testing the imagecontainers by minimizing/maximizing which seems to trigger construction and destruction of the instances, but no luck.
Looking at the crash stats it looks like this crash started happening ~73.0.1, but with different callstacks (initially with Webrender), all resulting in a crash in an ImageContainer RtlDeleteCriticalSection crash.
One thing I was wondering about was the potential for a double free or so, but as far as I can see, the memory should have been filled with e5 or e4 if the data was freed and re-used.
I'm not sure how to progress here, one thing we could do is add padding around the mRecursiveMutex and assert on those values to see if it is caused by something writing to the memory at the same offset, but since we haven't been able to reproduce this, it would have to be checked in and verified in the beta population.
Updated•4 years ago
|
Assignee | ||
Comment 14•4 years ago
|
||
Running this through Application Verifier with critical section analysis enabled, I get
=======================================
VERIFIER STOP 0000000000000211: pid 0x653C: Critical section is already initialized.
00007DF42AE05F60 : Critical section address. Run !cs -s <address> to get more information.
0000016C67027FD0 : Critical section debug info address.
0000016C57E666F0 : First initialization stack trace. Use dps to dump it if non-NULL.
0000000000000000 : Not used.
=======================================
here:
vrfcore.dll!VerifierStopMessageEx() Unknown
vfbasics.dll!AVrfpInitializeCriticalSectionCommon() Unknown
KernelBase.dll!InitializeCriticalSectionAndSpinCount() Unknown
mozglue.dll!Mutex::Init() Line 35 C++
mozglue.dll!arena_t::arena_t(arena_params_s * aParams) Line 3516 C++
mozglue.dll!ArenaCollection::CreateArena(bool aIsPrivate, arena_params_s * aParams) Line 3603 C++
> mozglue.dll!Allocator<MozJemallocBase>::moz_create_arena_with_params(arena_params_s * aParams) Line 4444 C++
mozglue.dll!replace_moz_create_arena_with_params(arena_params_s * aParams) Line 1431 C++
mozglue.dll!Allocator<ReplaceMallocBase>::moz_create_arena_with_params(arena_params_s * arg1) Line 111 C++
mozglue.dll!moz_create_arena_with_params(arena_params_s * arg1) Line 111 C++
xul.dll!mozilla::dom::DOMArena::DOMArena() Line 38 C++
xul.dll!mozilla::dom::DocGroup::DocGroup(mozilla::dom::BrowsingContextGroup * aBrowsingContextGroup, const nsTSubstring<char> & aKey) Line 178 C++
xul.dll!mozilla::dom::DocGroup::Create(mozilla::dom::BrowsingContextGroup * aBrowsingContextGroup, const nsTSubstring<char> & aKey) Line 127 C++
xul.dll!mozilla::dom::BrowsingContextGroup::AddDocument(const nsTSubstring<char> & aKey, mozilla::dom::Document * aDocument) Line 324 C++
xul.dll!mozilla::dom::Document::GetDocGroupOrCreate() Line 6760 C++
xul.dll!nsNodeInfoManager::Allocate(unsigned __int64 aSize) Line 275 C++
xul.dll!nsINode::operator new(unsigned __int64 aSize, nsNodeInfoManager * aManager) Line 121 C++
xul.dll!NS_NewHTMLSharedElement(already_AddRefed<mozilla::dom::NodeInfo> && aNodeInfo, mozilla::dom::FromParser aFromParser) Line 25 C++
xul.dll!NS_NewHTMLHtmlElement(already_AddRefed<mozilla::dom::NodeInfo> && aNodeInfo, mozilla::dom::FromParser aFromParser) Line 1264 C++
xul.dll!nsContentDLF::CreateBlankDocument(nsILoadGroup * aLoadGroup, nsIPrincipal * aPrincipal, nsIPrincipal * aPartitionedPrincipal, nsDocShell * aContainer) Line 255 C++
xul.dll!nsDocShell::CreateAboutBlankContentViewer(nsIPrincipal * aPrincipal, nsIPrincipal * aPartitionedPrincipal, nsIContentSecurityPolicy * aCSP, nsIURI * aBaseURI, const mozilla::Maybe<nsILoadInfo::CrossOriginEmbedderPolicy> & aCOEP, bool aTryToSaveOldPresentation, bool aCheckPermitUnload, mozilla::dom::WindowGlobalChild * aActor) Line 6503 C++
xul.dll!nsDocShell::EnsureContentViewer() Line 6354 C++
xul.dll!nsDocShell::GetDocument() Line 3013 C++
xul.dll!nsPIDOMWindowOuter::MaybeCreateDoc() Line 7774 C++
xul.dll!nsPIDOMWindowOuter::GetDoc() Line 797 C++
xul.dll!nsPIDOMWindowOuter::EnsureInnerWindow() Line 687 C++
xul.dll!mozilla::dom::ToJSValue(JSContext * aCx, const mozilla::dom::WindowProxyHolder & aArgument, JS::MutableHandle<JS::Value> aValue) Line 81 C++
xul.dll!mozilla::dom::WrapObject(JSContext * cx, const mozilla::dom::WindowProxyHolder & p, JS::MutableHandle<JS::Value> rval) Line 1220 C++
xul.dll!mozilla::dom::ContentFrameMessageManager_Binding::get_content(JSContext * cx, JS::Handle<JSObject *> obj, void * void_self, JSJitGetterCallArgs args) Line 1706 C++
xul.dll!mozilla::dom::binding_detail::GenericGetter<mozilla::dom::binding_detail::NormalThisPolicy,mozilla::dom::binding_detail::ThrowExceptions>(JSContext * cx, unsigned int argc, JS::Value * vp) Line 3109 C++
xul.dll!CallJSNative(JSContext * cx, bool(*)(JSContext *, unsigned int, JS::Value *) native, js::CallReason reason, const JS::CallArgs & args) Line 507 C++
xul.dll!js::InternalCallOrConstruct(JSContext * cx, const JS::CallArgs & args, js::MaybeConstruct construct, js::CallReason reason) Line 599 C++
xul.dll!InternalCall(JSContext * cx, const js::AnyInvokeArgs & args, js::CallReason reason) Line 664 C++
xul.dll!js::Call(JSContext * cx, JS::Handle<JS::Value> fval, JS::Handle<JS::Value> thisv, const js::AnyInvokeArgs & args, JS::MutableHandle<JS::Value> rval, js::CallReason reason) Line 681 C++
xul.dll!js::CallGetter(JSContext * cx, JS::Handle<JS::Value> thisv, JS::Handle<JS::Value> getter, JS::MutableHandle<JS::Value> rval) Line 805 C++
xul.dll!CallGetter(JSContext * cx, JS::Handle<JSObject *> obj, JS::Handle<JS::Value> receiver, JS::Handle<js::Shape *> shape, JS::MutableHandle<JS::Value> vp) Line 2374 C++
xul.dll!GetExistingProperty<js::CanGC>(JSContext * cx, JS::Handle<JS::Value> receiver, JS::Handle<js::NativeObject *> obj, JS::Handle<js::Shape *> shape, JS::MutableHandle<JS::Value> vp) Line 2426 C++
xul.dll!js::NativeGetExistingProperty(JSContext * cx, JS::Handle<JSObject *> receiver, JS::Handle<js::NativeObject *> obj, JS::Handle<js::Shape *> shape, JS::MutableHandle<JS::Value> vp) Line 2435 C++
xul.dll!js::FetchName<js::GetNameMode::Normal>(JSContext * cx, JS::Handle<JSObject *> receiver, JS::Handle<JSObject *> holder, JS::Handle<js::PropertyName *> name, JS::Handle<JS::PropertyResult> prop, JS::MutableHandle<JS::Value> vp) Line 169 C++
xul.dll!js::GetEnvironmentName<js::GetNameMode::Normal>(JSContext * cx, JS::Handle<JSObject *> envChain, JS::Handle<js::PropertyName *> name, JS::MutableHandle<JS::Value> vp) Line 220 C++
xul.dll!GetNameOperation(JSContext * cx, js::InterpreterFrame * fp, unsigned char * pc, JS::MutableHandle<JS::Value> vp) Line 246 C++
xul.dll!Interpret(JSContext * cx, js::RunState & state) Line 3474 C++
xul.dll!js::RunScript(JSContext * cx, js::RunState & state) Line 468 C++
xul.dll!js::ExecuteKernel(JSContext * cx, JS::Handle<JSScript *> script, JS::Handle<JSObject *> envChainArg, JS::Handle<JS::Value> newTargetValue, js::AbstractFramePtr evalInFrame, JS::MutableHandle<JS::Value> result) Line 856 C++
xul.dll!js::Execute(JSContext * cx, JS::Handle<JSScript *> script, JS::Handle<JSObject *> envChain, JS::MutableHandle<JS::Value> rval) Line 888 C++
xul.dll!ExecuteScript(JSContext * cx, JS::Handle<JSObject *> envChain, JS::Handle<JSScript *> script, JS::MutableHandle<JS::Value> rval) Line 377 C++
xul.dll!ExecuteScript(JSContext * cx, JS::Handle<JS::StackGCVector<JSObject *,js::TempAllocPolicy>> envChain, JS::Handle<JSScript *> scriptArg, JS::MutableHandle<JS::Value> rval) Line 396 C++
xul.dll!JS::CloneAndExecuteScript(JSContext * cx, JS::Handle<JS::StackGCVector<JSObject *,js::TempAllocPolicy>> envChain, JS::Handle<JSScript *> scriptArg, JS::MutableHandle<JS::Value> rval) Line 452 C++
xul.dll!nsMessageManagerScriptExecutor::LoadScriptInternal(JS::Handle<JSObject *> aMessageManager, const nsTSubstring<char16_t> & aURL, bool aRunInUniqueScope) Line 1163 C++
xul.dll!mozilla::dom::InProcessBrowserChildMessageManager::LoadFrameScript(const nsTSubstring<char16_t> & aURL, bool aRunInGlobalScope) Line 301 C++
xul.dll!nsFrameLoader::DoLoadMessageManagerScript(const nsTSubstring<char16_t> & aURL, bool aRunInGlobalScope) Line 2816 C++
xul.dll!nsFrameMessageManager::LoadScript(const nsTSubstring<char16_t> & aURL, bool aAllowDelayedLoad, bool aRunInGlobalScope, mozilla::ErrorResult & aError) Line 315 C++
xul.dll!nsFrameMessageManager::LoadPendingScripts(nsFrameMessageManager * aManager, nsFrameMessageManager * aChildMM) Line 789 C++
xul.dll!mozilla::dom::MessageBroadcaster::AddChildManager(mozilla::dom::MessageListenerManager * aManager) Line 33 C++
xul.dll!mozilla::dom::MessageSender::InitWithCallback(mozilla::dom::ipc::MessageManagerCallback * aCallback) Line 26 C++
xul.dll!nsFrameLoader::ReallyLoadFrameScripts() Line 2954 C++
xul.dll!nsFrameLoader::MaybeCreateDocShell() Line 2222 C++
xul.dll!nsFrameLoader::ReallyStartLoadingInternal() Line 707 C++
xul.dll!nsFrameLoader::ReallyStartLoading() Line 612 C++
xul.dll!mozilla::dom::Document::MaybeInitializeFinalizeFrameLoaders() Line 8580 C++
xul.dll!mozilla::detail::RunnableMethodArguments<>::applyImpl<mozilla::dom::Document,void (mozilla::dom::Document::*)()>(mozilla::dom::Document * o, void(mozilla::dom::Document::*)() m, mozilla::Tuple<> & args, std::integer_sequence<unsigned long long>) Line 1188 C++
xul.dll!mozilla::detail::RunnableMethodArguments<>::apply<mozilla::dom::Document,void (mozilla::dom::Document::*)()>(mozilla::dom::Document * o, void(mozilla::dom::Document::*)() m) Line 1194 C++
xul.dll!mozilla::detail::RunnableMethodImpl<mozilla::dom::Document *,void (mozilla::dom::Document::*)(),1,mozilla::RunnableKind::Standard>::Run() Line 1243 C++
xul.dll!nsContentUtils::RemoveScriptBlocker() Line 5373 C++
xul.dll!nsAutoScriptBlocker::~nsAutoScriptBlocker() Line 3480 C++
xul.dll!nsFrameLoaderOwner::ChangeRemotenessCommon(const nsFrameLoaderOwner::ChangeRemotenessContextType & aContextType, bool aSwitchingInProgressLoad, bool aIsRemote, mozilla::dom::BrowsingContextGroup * aGroup, std::function<void ()> & aFrameLoaderInit, mozilla::ErrorResult & aRv) Line 152 C++
xul.dll!nsFrameLoaderOwner::ChangeRemoteness(const mozilla::dom::RemotenessOptions & aOptions, mozilla::ErrorResult & rv) Line 215 C++
xul.dll!mozilla::dom::XULFrameElement_Binding::changeRemoteness(JSContext * cx_, JS::Handle<JSObject *> obj, void * void_self, const JSJitMethodCallArgs & args) Line 510 C++
xul.dll!mozilla::dom::binding_detail::GenericMethod<mozilla::dom::binding_detail::NormalThisPolicy,mozilla::dom::binding_detail::ThrowExceptions>(JSContext * cx, unsigned int argc, JS::Value * vp) Line 3227 C++
xul.dll!CallJSNative(JSContext * cx, bool(*)(JSContext *, unsigned int, JS::Value *) native, js::CallReason reason, const JS::CallArgs & args) Line 507 C++
xul.dll!js::InternalCallOrConstruct(JSContext * cx, const JS::CallArgs & args, js::MaybeConstruct construct, js::CallReason reason) Line 599 C++
xul.dll!InternalCall(JSContext * cx, const js::AnyInvokeArgs & args, js::CallReason reason) Line 664 C++
xul.dll!js::CallFromStack(JSContext * cx, const JS::CallArgs & args) Line 668 C++
xul.dll!Interpret(JSContext * cx, js::RunState & state) Line 3336 C++
xul.dll!js::RunScript(JSContext * cx, js::RunState & state) Line 468 C++
xul.dll!js::InternalCallOrConstruct(JSContext * cx, const JS::CallArgs & args, js::MaybeConstruct construct, js::CallReason reason) Line 636 C++
xul.dll!InternalCall(JSContext * cx, const js::AnyInvokeArgs & args, js::CallReason reason) Line 664 C++
xul.dll!js::Call(JSContext * cx, JS::Handle<JS::Value> fval, JS::Handle<JS::Value> thisv, const js::AnyInvokeArgs & args, JS::MutableHandle<JS::Value> rval, js::CallReason reason) Line 681 C++
xul.dll!JS_CallFunctionValue(JSContext * cx, JS::Handle<JSObject *> obj, JS::Handle<JS::Value> fval, const JS::HandleValueArray & args, JS::MutableHandle<JS::Value> rval) Line 2768 C++
xul.dll!nsXPCWrappedJS::CallMethod(unsigned short methodIndex, const nsXPTMethodInfo * info, nsXPTCMiniVariant * nativeParams) Line 963 C++
xul.dll!PrepareAndDispatch(nsXPTCStubBase * self, unsigned int methodIndex, unsigned __int64 * args, unsigned __int64 * gprData, double * fprData) Line 168 C++
[External Code]
xul.dll!nsObserverList::NotifyObservers(nsISupports * aSubject, const char * aTopic, const char16_t * someData) Line 69 C++
xul.dll!nsObserverService::NotifyObservers(nsISupports * aSubject, const char * aTopic, const char16_t * aSomeData) Line 290 C++
xul.dll!mozilla::dom::ContentParent::ActorDestroy(mozilla::ipc::IProtocol::ActorDestroyReason why) Line 1888 C++
xul.dll!mozilla::ipc::IProtocol::DestroySubtree(mozilla::ipc::IProtocol::ActorDestroyReason aWhy) Line 602 C++
xul.dll!mozilla::dom::PContentParent::OnChannelError() Line 15999 C++
xul.dll!mozilla::dom::ContentParent::OnChannelError() Line 1770 C++
xul.dll!mozilla::ipc::MessageChannel::NotifyMaybeChannelError() Line 2564 C++
xul.dll!mozilla::ipc::MessageChannel::OnNotifyMaybeChannelError() Line 2594 C++
xul.dll!mozilla::detail::RunnableMethodArguments<>::applyImpl<mozilla::ipc::MessageChannel,void (mozilla::ipc::MessageChannel::*)()>(mozilla::ipc::MessageChannel * o, void(mozilla::ipc::MessageChannel::*)() m, mozilla::Tuple<> & args, std::integer_sequence<unsigned long long>) Line 1188 C++
xul.dll!mozilla::detail::RunnableMethodArguments<>::apply<mozilla::ipc::MessageChannel,void (mozilla::ipc::MessageChannel::*)()>(mozilla::ipc::MessageChannel * o, void(mozilla::ipc::MessageChannel::*)() m) Line 1194 C++
xul.dll!mozilla::detail::RunnableMethodImpl<mozilla::ipc::MessageChannel *,void (mozilla::ipc::MessageChannel::*)(),0,mozilla::RunnableKind::Cancelable>::Run() Line 1243 C++
xul.dll!mozilla::RunnableTask::Run() Line 242 C++
xul.dll!mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(const mozilla::detail::BaseAutoLock<mozilla::Mutex &> & aProofOfLock) Line 512 C++
xul.dll!mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(const mozilla::detail::BaseAutoLock<mozilla::Mutex &> & aProofOfLock) Line 371 C++
xul.dll!mozilla::TaskController::ProcessPendingMTTask(bool aMayWait) Line 168 C++
xul.dll!mozilla::TaskController::InitializeInternal::<unnamed-tag>::operator()() Line 86 C++
xul.dll!mozilla::detail::RunnableFunction<`lambda at c:/Users/krist/code/gecko/xpcom/threads/TaskController.cpp:86:7'>::Run() Line 577 C++
xul.dll!nsThread::ProcessNextEvent(bool aMayWait, bool * aResult) Line 1236 C++
xul.dll!NS_ProcessNextEvent(nsIThread * aThread, bool aMayWait) Line 513 C++
xul.dll!mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate * aDelegate) Line 84 C++
xul.dll!MessageLoop::RunInternal() Line 334 C++
xul.dll!MessageLoop::RunHandler() Line 328 C++
xul.dll!MessageLoop::Run() Line 310 C++
xul.dll!nsBaseAppShell::Run() Line 139 C++
xul.dll!nsAppShell::Run() Line 602 C++
xul.dll!nsAppStartup::Run() Line 270 C++
xul.dll!XREMain::XRE_mainRun() Line 4751 C++
xul.dll!XREMain::XRE_main(int argc, char * * argv, const mozilla::BootstrapConfig & aConfig) Line 4941 C++
xul.dll!XRE_main(int argc, char * * argv, const mozilla::BootstrapConfig & aConfig) Line 4995 C++
xul.dll!mozilla::BootstrapImpl::XRE_main(int argc, char * * argv, const mozilla::BootstrapConfig & aConfig) Line 45 C++
firefox.exe!do_main(int argc, char * * argv, char * * envp) Line 217 C++
firefox.exe!NS_internal_main(int argc, char * * argv, char * * envp) Line 331 C++
firefox.exe!wmain(int argc, wchar_t * * argv) Line 131 C++
[External Code]
I'm still figuring out whether this is a false positive as this could happen when allocations don't get their memory zerod out, but it seems like most of the data has valid members. The arena allocation code is run quite frequently and does not trigger the error in other usecases.
Comment 15•4 years ago
|
||
Hello,
I'd just like to add that some users of my comic reader (https://github.com/chocolatkey/xbreader) are also experiencing what seems to be the same crash, although I cannot reproduce it on my own machine. I asked one to submit the crash report, and it's here: https://crash-stats.mozilla.org/report/index/bf6433ef-f17e-47a3-aae8-470200200912 . I think it definitely canvas-related. An example URL is here: https://xb.yrkz.me/#!/ptbr-ex-1 . Tends to happen when you start stressing it, as in resizing quickly, moving around pages quickly etc.
Comment 16•4 years ago
|
||
(In reply to chocolatkey from comment #15)
Hello,
I'd just like to add that some users of my comic reader (https://github.com/chocolatkey/xbreader) are also experiencing what seems to be the same crash, although I cannot reproduce it on my own machine. I asked one to submit the crash report, and it's here: https://crash-stats.mozilla.org/report/index/bf6433ef-f17e-47a3-aae8-470200200912 . I think it definitely canvas-related. An example URL is here: https://xb.yrkz.me/#!/ptbr-ex-1 . Tends to happen when you start stressing it, as in resizing quickly, moving around pages quickly etc.
Also, to add, I can ask these users (my acquaintances) to perform any special tests you need to help resolve this bug. Here is another crash report: https://crash-stats.mozilla.org/report/index/c7758e8b-a2ff-47bd-a98d-040380200912
Comment 17•4 years ago
|
||
It'd be worth trying to look at what changed in 79 to see if anything stands out.
One thing I noticed is that ImageContainer is threadsafe (NS_INLINE_DECL_THREADSAFE_REFCOUNTING), but also uses SupportsWeakPtr
, which isn't threadafe. I can't find cases where we'd use the two at the same time, but that could be problematic.
Assignee | ||
Comment 18•4 years ago
|
||
I've found what's causing the Application Verifier to throw an exception.
The lock taken here: https://searchfox.org/mozilla-central/source/memory/build/mozjemalloc.cpp#3515 initializes a critical section, but the critical section is never released. Other use cases of this mutex type are all static, causing (I'm guessing) the criticalsection to be released when the process ends, but in this case, every arena construction will leak a critical section.
I'm not sure this is directly related to this issue, but since we reuse memory, there is some undefined behavior in calling a mutex init twice.
Assignee | ||
Comment 19•4 years ago
|
||
Assignee | ||
Comment 20•4 years ago
|
||
(In reply to chocolatkey from comment #16)
(In reply to chocolatkey from comment #15)
Hello,
I'd just like to add that some users of my comic reader (https://github.com/chocolatkey/xbreader) are also experiencing what seems to be the same crash, although I cannot reproduce it on my own machine. I asked one to submit the crash report, and it's here: https://crash-stats.mozilla.org/report/index/bf6433ef-f17e-47a3-aae8-470200200912 . I think it definitely canvas-related. An example URL is here: https://xb.yrkz.me/#!/ptbr-ex-1 . Tends to happen when you start stressing it, as in resizing quickly, moving around pages quickly etc.
Also, to add, I can ask these users (my acquaintances) to perform any special tests you need to help resolve this bug. Here is another crash report: https://crash-stats.mozilla.org/report/index/c7758e8b-a2ff-47bd-a98d-040380200912
Thanks for letting us know! We might take you up on this offer once we identify a potential fix.
Sadly I was also not able to reproduce the issue with your comic reader, but stressing with resizing/minimizing etc is more than we knew about this crash before!
@rares: Would you be able to try and reproduce this with these repro steps?
Assignee | ||
Comment 21•4 years ago
|
||
I'm able to reproduce this with @chocolatkey's comic reader site: https://xb.yrkz.me/#!/ptbr-ex-1.
Preconditions:
- Disable webrender (gfx.webrender.force-disable = true)
- Release build (Not occurring in debug)
It looks like the criticalsection fix attached does not fix the issue.
Updated•4 years ago
|
Comment 22•4 years ago
|
||
The only place the WeakPtr is needed is with ImageLib.
https://searchfox.org/mozilla-central/rev/30e70f2fe80c97bfbfcd975e68538cefd7f58b2a/image/Image.h#451
For testing purposes, we could just make it a strong pointer. If it fixed the problem, we could probably work around how we depend the WeakPtr. Expire old ImageContainers with only one strong reference like we do the surfaces in the SurfaceCache and/or hook it into the memory pressure signal.
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Comment 23•4 years ago
|
||
Removing the WeakPtr seems to fix it (or makes it harder to repro). While I was debugging this I ran into a Skia crash which distracted me a bit trying to get to the bottom of it, but it seems like it might be the same issue (https://bugzilla.mozilla.org/show_bug.cgi?id=1576843)
Assignee | ||
Comment 24•4 years ago
|
||
Comment 25•4 years ago
|
||
Comment 26•4 years ago
|
||
bugherder |
Assignee | ||
Comment 27•4 years ago
|
||
Approval Request Comment
[Feature/Bug causing the regression]:
N/A, bug seems to have been present from before FF70
[User impact if declined]:
Crashes will continue
[Is this code covered by automated tests?]:
Yes, but it is hard to reproduce so it has only shown up as intermittents
[Has the fix been verified in Nightly?]:
Yes
[Needs manual test from QE? If yes, steps to reproduce]:
Yes
STR:
- Go to https://xb.yrkz.me/#!/ptbr-ex-1
- Constantly resize/minimize/maximize window
[Why is the change risky/not risky?]:
It only changes a WeakPtr into a ThreadsafeWeakptr and does not change existing logic beyond that.
[String changes made/needed]:
N/A
Assignee | ||
Comment 28•4 years ago
|
||
@chocolatkey: Thank you so much for the repro case, this really helped us dig into the issue!
Comment 29•4 years ago
|
||
:ktaeleman You're welcome, I look forward to when the fix arrives in mainline FF!
Comment 30•4 years ago
|
||
Comment on attachment 9176656 [details]
Bug 1656987 - Change WeakPtr to ThreadSafeWeakPtr. r=aosmond
approved for 82.0b3
Comment 31•4 years ago
|
||
Comment on attachment 9177435 [details]
Uplift request for FF 82 Beta
moved the approval flag to the actual patch attachment :)
Updated•4 years ago
|
Comment 32•4 years ago
|
||
bugherder uplift |
Updated•4 years ago
|
Comment 33•4 years ago
|
||
Hi, I tried to reproduce this issue in Release, as well as Beta 82.0b2 but without success.. Im running an Nvidia GeForce GT 730 on my Windows 10 machine, does this issue only occur on specific GPUS ? are there any other prefs I should set ? or I could set to help me reproduce this issue on my end ?
Assignee | ||
Comment 34•4 years ago
|
||
(In reply to Rares Doghi from comment #33)
Hi, I tried to reproduce this issue in Release, as well as Beta 82.0b2 but without success.. Im running an Nvidia GeForce GT 730 on my Windows 10 machine, does this issue only occur on specific GPUS ? are there any other prefs I should set ? or I could set to help me reproduce this issue on my end ?
I couldn't reproduce it on my high end desktop, my laptop (still pretty high end) was able to repro quite quickly (~15-30 seconds of resizing). It's a race condition that depends who releases the pointer first, not GPU related and seems to occur more on slower cpus.
Assignee | ||
Comment 35•4 years ago
|
||
(In reply to Kris Taeleman (:ktaeleman) from comment #34)
(In reply to Rares Doghi from comment #33)
Hi, I tried to reproduce this issue in Release, as well as Beta 82.0b2 but without success.. Im running an Nvidia GeForce GT 730 on my Windows 10 machine, does this issue only occur on specific GPUS ? are there any other prefs I should set ? or I could set to help me reproduce this issue on my end ?
I couldn't reproduce it on my high end desktop, my laptop (still pretty high end) was able to repro quite quickly (~15-30 seconds of resizing). It's a race condition that depends who releases the pointer first, not GPU related and seems to occur more on slower cpus.
Oh, forgot to mention, make sure webrender is disabled.
Comment 36•4 years ago
|
||
Hi Kris, I force-disabled WebRender and tried this page https://xb.yrkz.me/#!/ptbr-ex-1 and now for me the tab crashes in Release, in our latest Nightly as well as our Latest Beta builds, I dont even need to stress it too much , with webrender off I just resize the page a few times and it crashes.
Comment 37•4 years ago
|
||
Same signature as before? Can you please link to a crash report?
Assignee | ||
Comment 38•4 years ago
|
||
It took me a while on my laptop (desktop doesn't repro), but you are right, I was able to repro the issue.
https://crash-stats.mozilla.org/report/index/8ac3dcc0-66bb-40b5-b1ca-1a2810200928
Callstack is a bit different resulting in similar crash, but instead of ImageContainer causing the crash, I'm getting mozilla::layers::ShadowableLayer::~ShadowableLayer() causing it this time. Maybe a similar issue is present there OR the fix did not fix the initial issue.
It would be interesting to see your callstack rares.
Assignee | ||
Comment 39•4 years ago
|
||
Second repro has same callstack as previous issue. Seems like the fix did not fix the issue. Re-opening
Updated•4 years ago
|
Comment 40•4 years ago
|
||
I'm attaching the link to one of my crash reports: https://crash-stats.mozilla.org/report/index/cff03138-d7ab-42c4-bca9-e19580200929
Comment 41•4 years ago
|
||
Comment on attachment 9176656 [details]
Bug 1656987 - Change WeakPtr to ThreadSafeWeakPtr. r=aosmond
Clearing approval flag to get this off the needs-uplift radar.
Comment 42•4 years ago
|
||
(In reply to Rares Doghi from comment #40)
I'm attaching the link to one of my crash reports: https://crash-stats.mozilla.org/report/index/cff03138-d7ab-42c4-bca9-e19580200929
I was looking at your crash report and the crashing modules see odd. nvwgf2umx_cfg.dll is NVidia but ownnq.dll, q_izrnukfh.dll and a_rtndic.dll? Are those maybe malware? A Google search comes up empty with respect to those DLLs.
If this is driver / NVidia related, you could update your old drivers (436.30) to 456.55 and see if it helps. Maybe do a malware scan with something like MalwareBytes.
Comment 43•4 years ago
|
||
https://crash-stats.mozilla.org/report/index/98bba450-7eba-4257-9d61-24b050200930
I have updated my Nvidia Drivers but I'm still getting these crashes as soon as I resize the window while on that test page: https://xb.yrkz.me/#!/ptbr-ex-1
Comment 44•4 years ago
|
||
(In reply to Rares Doghi from comment #43)
https://crash-stats.mozilla.org/report/index/98bba450-7eba-4257-9d61-24b050200930
I have updated my Nvidia Drivers but I'm still getting these crashes as soon as I resize the window while on that test page: https://xb.yrkz.me/#!/ptbr-ex-1
Ok, thanks for updating and testing. And you also ran a malware scan as well? I'm seeing three new crashing .DLLs from https://crash-stats.mozilla.org/report/index/98bba450-7eba-4257-9d61-24b050200930#tab-modules: dkndc.dll, v_iybjndmk.dll and r_zxlhti.dll.
Those are very suspect and I would suspect be malware or something in the browser that's not supposed to be there.
Comment 45•4 years ago
|
||
Nevermind. I was able to repro the crash on 82.0b5 x64 just now: https://crash-stats.mozilla.org/report/index/119abf5f-192d-4cf7-b5aa-240340200930
Assignee | ||
Comment 46•4 years ago
|
||
Looking at this further and seems like the location that writes 01 to the 0xffffffffffffffff value is:
[0x0] xul!mozilla::layers::CanvasRenderer::SetDirty
[0x1] xul!mozilla::layers::CanvasLayer::Updated + 0x7
[0x2] xul!mozilla::dom::HTMLCanvasElement::InvalidateCanvasContent + 0x194
[0x3] xul!mozilla::dom::ImageBitmapRenderingContext::Redraw + 0x56
[0x4] xul!mozilla::dom::ImageBitmapRenderingContext::TransferFromImageBitmap + 0xb7
[0x5] xul!mozilla::dom::ImageBitmapRenderingContext_Binding::transferFromImageBitmap + 0xec
[0x6] xul!mozilla::dom::binding_detail::GenericMethod<mozilla::dom::binding_detail::NormalThisPolicy,mozilla::dom::binding_detail::ThrowExceptions> + 0x108
At https://searchfox.org/mozilla-central/rev/222e4f64b769413ac1a1991d2397b13a0acb5d9d/dom/html/HTMLCanvasElement.cpp#1049, the layer gets casted to a CanvasLayer, while according to the debugger it's a ClientImageLayer.
Looking at the nsIFrame::InvalidateLayer(DisplayItemType::TYPE_CANVAS, &invalRect) call, it looks like it is of the correct type in the DisplayItemData.
The plot thickens....
Updated•4 years ago
|
Assignee | ||
Comment 47•4 years ago
|
||
Assignee | ||
Comment 48•4 years ago
|
||
To repro this, it seems to trigger faster if you scroll through the pages on https://xb.yrkz.me/#!/ptbr-ex-1.
Attached patch should fix the issue.
Updated•4 years ago
|
Assignee | ||
Comment 49•4 years ago
|
||
Comment on attachment 9179145 [details]
Bug 1656987 - Only cast and call CanvasLayer::Update() if the layer is a CanvasLayer. r=mattwoodrow
Security Approval Request
- How easily could an exploit be constructed based on the patch?: Unlikely
- Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
- Which older supported branches are affected by this flaw?: esr, release, beta
- If not all supported branches, which bug introduced the flaw?: Bug 1379920
- Do you have backports for the affected branches?: No
- If not, how different, hard to create, and risky will they be?: Should be exactly the same change.
- How likely is this patch to cause regressions; how much testing does it need?: Unlikely, testing the existing repro case should be sufficient.
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Updated•4 years ago
|
Updated•4 years ago
|
Assignee | ||
Comment 50•4 years ago
|
||
Comment on attachment 9179145 [details]
Bug 1656987 - Only cast and call CanvasLayer::Update() if the layer is a CanvasLayer. r=mattwoodrow
Beta/Release Uplift Approval Request
- User impact if declined:
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: Yes
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky):
- String changes made/needed:
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration:
- User impact if declined:
- Fix Landed on Version:
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky):
- String or UUID changes made by this patch:
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Comment 51•4 years ago
|
||
Comment on attachment 9179145 [details]
Bug 1656987 - Only cast and call CanvasLayer::Update() if the layer is a CanvasLayer. r=mattwoodrow
Beta/Release Uplift Approval Request
-
User impact if declined: Frequent crashes when using canvas and resizing window or using certain canvas features
-
Is this code covered by automated tests?: Yes
-
Has the fix been verified in Nightly?: No
-
Needs manual test from QE?: Yes
-
If yes, steps to reproduce: STR:
-
Disable Webrender
-
Use slider to scroll through comic
-
List of other uplifts needed: None
-
Risk to taking this patch: Low
-
Why is the change risky/not risky? (and alternatives if risky): Change only changes codepath causing crash.
-
String changes made/needed:
Assignee | ||
Updated•4 years ago
|
Comment 52•4 years ago
|
||
https://hg.mozilla.org/integration/autoland/rev/fc1a99a319ba316619c77a7c5ef630a63ff18370
https://hg.mozilla.org/mozilla-central/rev/fc1a99a319ba
Comment 53•4 years ago
|
||
Comment on attachment 9179145 [details]
Bug 1656987 - Only cast and call CanvasLayer::Update() if the layer is a CanvasLayer. r=mattwoodrow
approved for 82.0b9
Comment 54•4 years ago
|
||
Comment on attachment 9176656 [details]
Bug 1656987 - Change WeakPtr to ThreadSafeWeakPtr. r=aosmond
this one's already uplifted to 82
Comment 56•4 years ago
|
||
uplift |
Assignee | ||
Comment 57•4 years ago
|
||
@julien: I was also wondering if this should be uplifted for esr78, but the form said it should be sec-high as an uplift requirement. I'd be happy to have this uplifted to esr.
Updated•4 years ago
|
Comment 58•4 years ago
|
||
This issue is Verified as fixed in Beta 82.0b9 as well as our latest Nightly Build 83.0a1 (2020-10-08)
Comment 59•4 years ago
|
||
Tom or Dan, can you weigh in on whether we should get this on esr78?
There's no rating in this bug and nothing recently from esr78 in crash-stats with the known signatures as far as I can tell (but there's bp-e5d643a0-a4cf-47fa-aaa4-a036e0200911 and bp-ca2a1cd1-d0bf-4398-9198-a95440200727 looking further back).
Comment 60•4 years ago
|
||
If we think it's low risk; I'd be inclined to take it.
Comment 61•4 years ago
|
||
Although there are almost no crashes on ESR-78 because whatever changed the frequency in Fx79 (bug 1632249?) hadn't happened yet, the bug does exist. Just because the right preconditions don't happen very often by chance doesn't mean they can't be created once you know the reason for it. The changes look pretty straightforward so we're better off taking them in ESR.
Updated•4 years ago
|
Comment 62•4 years ago
|
||
Comment on attachment 9176656 [details]
Bug 1656987 - Change WeakPtr to ThreadSafeWeakPtr. r=aosmond
approved for 78.4esr
Updated•4 years ago
|
Comment 63•4 years ago
|
||
uplift |
Comment 64•4 years ago
|
||
This bug cannot be reproduced on the ESR channel as it reproduced on the other channels so it cannot be verified.
Maybe the fix in fx79 from bug 1632249 mentioned in comment 5 actually caused the appearance of the crash.
I will consider this bug verified, but I will not set the tag of fx 78 as verified.
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Description
•