Closed Bug 1656987 Opened 4 years ago Closed 4 years ago

Crash in [@ RtlDeleteCriticalSection | mozilla::layers::ImageContainer::~ImageContainer]

Categories

(Core :: Graphics: Layers, defect)

79 Branch
All
Windows
defect

Tracking

()

VERIFIED FIXED
83 Branch
Tracking Status
firefox-esr68 --- unaffected
firefox-esr78 82+ fixed
firefox78 --- unaffected
firefox79 --- wontfix
firefox80 --- wontfix
firefox81 + wontfix
firefox82 + verified
firefox83 + verified

People

(Reporter: philipp, Assigned: ktaeleman)

References

(Regression)

Details

(Keywords: crash, regression, sec-moderate, Whiteboard: [adv-main82+r][adv-esr78.4+r])

Crash Data

Attachments

(2 files, 2 obsolete files)

This bug is for crash report bp-7898d616-3ac6-4e85-b11e-3f0370200803.

Top 10 frames of crashing thread:

0 ntdll.dll RtlDeleteCriticalSection 
1 xul.dll mozilla::layers::ImageContainer::~ImageContainer gfx/layers/ImageContainer.cpp:221
2 xul.dll mozilla::layers::ImageLayer::~ImageLayer gfx/layers/ImageLayers.cpp:22
3 xul.dll mozilla::layers::ClientImageLayer::~ClientImageLayer gfx/layers/client/ClientImageLayer.cpp:35
4 xul.dll mozilla::layers::ContainerLayer::RemoveAllChildren gfx/layers/Layers.cpp:846
5 xul.dll mozilla::layers::ClientContainerLayer::~ClientContainerLayer gfx/layers/client/ClientContainerLayer.h:36
6 xul.dll mozilla::layers::ClientContainerLayer::~ClientContainerLayer gfx/layers/client/ClientContainerLayer.h:35
7 xul.dll mozilla::layers::ContainerLayer::RemoveAllChildren gfx/layers/Layers.cpp:846
8 xul.dll mozilla::layers::ClientContainerLayer::~ClientContainerLayer gfx/layers/client/ClientContainerLayer.h:36
9 xul.dll mozilla::layers::ClientContainerLayer::~ClientContainerLayer gfx/layers/client/ClientContainerLayer.h:35

these content crash reports on windows are getting much more frequent in firefox 79.
url correlations are pointing towards various instances of the web game "mahjong dimensions" as the source of the crashing tab:
http://www.mahjong.fr/jouer/mahjongdimensions.php
https://spiele.rtl.de/mahjong-spiele/mahjongg-dimensions-candy/spielen.html
https://www.wildtangent.com/Play/mahjongg-dimensions-html5

curiously german users seem to be overrepresented in this crash signature too:
(75.58% in signature vs 10.19% overall) useragent_locale = de

:ktaleman Triaging as REO for 79. Could you set a priority and severity for this bug?

Flags: needinfo?(ktaeleman)
Blocks: gfx-triage
Severity: -- → S3
Flags: needinfo?(ktaeleman)
Priority: -- → P3
Blocks: wr-81
No longer blocks: gfx-triage
Flags: needinfo?(aosmond)

The volume of this crash on 79 is pretty worrying. Kris, is there anything we can do to help this investigation forward?

Flags: needinfo?(ktaeleman)

Andrew is going to investigate.

Flags: needinfo?(ktaeleman)

Yes, this slipped under my radar, but it is at the top of the list now.

So we don't have much to go on besides:

  • It is non-WR.
  • It is with canvas/WebGL.
  • It is Windows only as best I can tell (trying to find signature morphs).

Reviewing the pushlog from 78.0.2 to 79.0:

https://hg.mozilla.org/releases/mozilla-release/pushloghtml?fromchange=e56adbbfe01c2443bae35e3d6f34867e36c3828e&tochange=df3ed76cf46b23c9b658cd5be4cdd4162d86f736

I believe bug 1632249 or related is the most likely cause of the regression given it was a pretty big change wrt to canvas/WebGL. Jeff, are you able to reproduce and/or have any ideas about this?

Flags: needinfo?(aosmond) → needinfo?(jgilbert)
Regressed by: 1632249

RtlDeleteCriticalSection could be from the dtor for RecursiveMutex ImageContainer::mRecursiveMutex.
https://searchfox.org/mozilla-central/source/xpcom/threads/RecursiveMutex.cpp#60
https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-deletecriticalsection

This seems like either double-deletion or delete-while-locked.
There's no WebGL? in the report I saw, so it can happen without webgl.

I don't think I touched anything that touches that code, so this is weird.

Flags: needinfo?(jgilbert)

Since there are reports from 77 and 78 it's possible this race condition just got 1000x more common due to another change. Over 90% of the reports are unique for their install, so most people aren't running into this more than once, which supports the race condition theory.

It's NS_INLINE_DECL_THREADSAFE_REFCOUNTING(ImageContainer) so double-delete seems unlikely.
Maybe we borrow a bare pointer somewhere?

@ktaeleman: repro

Flags: needinfo?(ktaeleman)

I took a look at one of the minidumps for this.

It looks like the CRITICAL_SECTION object within the RecursiveMutex has the DebugInfo member as 0xffffffffffffff01.

Valid values for this are valid pointer addresses, 0, and -1 (0xffffffffffffffff).

It is none of these, and we crash attempting to dereference it (0x10 within it, causing a faulting address of 0xffffffffffffff11).

Interestingly, it seems like the vast majority of reports all have the same crashing address, so it doesn't really seem like random corruption.

This value is stored directly within the ImageContainer class, so UAF could be letting someone else write it, but I can't see why it'd always have the same mutation.

Given the volume of the crash, we need to reprioritize this bug. This is in topcrash territory.

Severity: S3 → --
Priority: P3 → --

I've been unable to reproduce this issue. I've tried stress testing the imagecontainers by minimizing/maximizing which seems to trigger construction and destruction of the instances, but no luck.

Looking at the crash stats it looks like this crash started happening ~73.0.1, but with different callstacks (initially with Webrender), all resulting in a crash in an ImageContainer RtlDeleteCriticalSection crash.

One thing I was wondering about was the potential for a double free or so, but as far as I can see, the memory should have been filled with e5 or e4 if the data was freed and re-used.

I'm not sure how to progress here, one thing we could do is add padding around the mRecursiveMutex and assert on those values to see if it is caused by something writing to the memory at the same offset, but since we haven't been able to reproduce this, it would have to be checked in and verified in the beta population.

Running this through Application Verifier with critical section analysis enabled, I get

=======================================
VERIFIER STOP 0000000000000211: pid 0x653C: Critical section is already initialized. 

	00007DF42AE05F60 : Critical section address. Run !cs -s <address> to get more information.
	0000016C67027FD0 : Critical section debug info address.
	0000016C57E666F0 : First initialization stack trace. Use dps to dump it if non-NULL.
	0000000000000000 : Not used.
=======================================

here:

 	vrfcore.dll!VerifierStopMessageEx()	Unknown
 	vfbasics.dll!AVrfpInitializeCriticalSectionCommon()	Unknown
 	KernelBase.dll!InitializeCriticalSectionAndSpinCount()	Unknown
 	mozglue.dll!Mutex::Init() Line 35	C++
 	mozglue.dll!arena_t::arena_t(arena_params_s * aParams) Line 3516	C++
 	mozglue.dll!ArenaCollection::CreateArena(bool aIsPrivate, arena_params_s * aParams) Line 3603	C++
>	mozglue.dll!Allocator<MozJemallocBase>::moz_create_arena_with_params(arena_params_s * aParams) Line 4444	C++
 	mozglue.dll!replace_moz_create_arena_with_params(arena_params_s * aParams) Line 1431	C++
 	mozglue.dll!Allocator<ReplaceMallocBase>::moz_create_arena_with_params(arena_params_s * arg1) Line 111	C++
 	mozglue.dll!moz_create_arena_with_params(arena_params_s * arg1) Line 111	C++
 	xul.dll!mozilla::dom::DOMArena::DOMArena() Line 38	C++
 	xul.dll!mozilla::dom::DocGroup::DocGroup(mozilla::dom::BrowsingContextGroup * aBrowsingContextGroup, const nsTSubstring<char> & aKey) Line 178	C++
 	xul.dll!mozilla::dom::DocGroup::Create(mozilla::dom::BrowsingContextGroup * aBrowsingContextGroup, const nsTSubstring<char> & aKey) Line 127	C++
 	xul.dll!mozilla::dom::BrowsingContextGroup::AddDocument(const nsTSubstring<char> & aKey, mozilla::dom::Document * aDocument) Line 324	C++
 	xul.dll!mozilla::dom::Document::GetDocGroupOrCreate() Line 6760	C++
 	xul.dll!nsNodeInfoManager::Allocate(unsigned __int64 aSize) Line 275	C++
 	xul.dll!nsINode::operator new(unsigned __int64 aSize, nsNodeInfoManager * aManager) Line 121	C++
 	xul.dll!NS_NewHTMLSharedElement(already_AddRefed<mozilla::dom::NodeInfo> && aNodeInfo, mozilla::dom::FromParser aFromParser) Line 25	C++
 	xul.dll!NS_NewHTMLHtmlElement(already_AddRefed<mozilla::dom::NodeInfo> && aNodeInfo, mozilla::dom::FromParser aFromParser) Line 1264	C++
 	xul.dll!nsContentDLF::CreateBlankDocument(nsILoadGroup * aLoadGroup, nsIPrincipal * aPrincipal, nsIPrincipal * aPartitionedPrincipal, nsDocShell * aContainer) Line 255	C++
 	xul.dll!nsDocShell::CreateAboutBlankContentViewer(nsIPrincipal * aPrincipal, nsIPrincipal * aPartitionedPrincipal, nsIContentSecurityPolicy * aCSP, nsIURI * aBaseURI, const mozilla::Maybe<nsILoadInfo::CrossOriginEmbedderPolicy> & aCOEP, bool aTryToSaveOldPresentation, bool aCheckPermitUnload, mozilla::dom::WindowGlobalChild * aActor) Line 6503	C++
 	xul.dll!nsDocShell::EnsureContentViewer() Line 6354	C++
 	xul.dll!nsDocShell::GetDocument() Line 3013	C++
 	xul.dll!nsPIDOMWindowOuter::MaybeCreateDoc() Line 7774	C++
 	xul.dll!nsPIDOMWindowOuter::GetDoc() Line 797	C++
 	xul.dll!nsPIDOMWindowOuter::EnsureInnerWindow() Line 687	C++
 	xul.dll!mozilla::dom::ToJSValue(JSContext * aCx, const mozilla::dom::WindowProxyHolder & aArgument, JS::MutableHandle<JS::Value> aValue) Line 81	C++
 	xul.dll!mozilla::dom::WrapObject(JSContext * cx, const mozilla::dom::WindowProxyHolder & p, JS::MutableHandle<JS::Value> rval) Line 1220	C++
 	xul.dll!mozilla::dom::ContentFrameMessageManager_Binding::get_content(JSContext * cx, JS::Handle<JSObject *> obj, void * void_self, JSJitGetterCallArgs args) Line 1706	C++
 	xul.dll!mozilla::dom::binding_detail::GenericGetter<mozilla::dom::binding_detail::NormalThisPolicy,mozilla::dom::binding_detail::ThrowExceptions>(JSContext * cx, unsigned int argc, JS::Value * vp) Line 3109	C++
 	xul.dll!CallJSNative(JSContext * cx, bool(*)(JSContext *, unsigned int, JS::Value *) native, js::CallReason reason, const JS::CallArgs & args) Line 507	C++
 	xul.dll!js::InternalCallOrConstruct(JSContext * cx, const JS::CallArgs & args, js::MaybeConstruct construct, js::CallReason reason) Line 599	C++
 	xul.dll!InternalCall(JSContext * cx, const js::AnyInvokeArgs & args, js::CallReason reason) Line 664	C++
 	xul.dll!js::Call(JSContext * cx, JS::Handle<JS::Value> fval, JS::Handle<JS::Value> thisv, const js::AnyInvokeArgs & args, JS::MutableHandle<JS::Value> rval, js::CallReason reason) Line 681	C++
 	xul.dll!js::CallGetter(JSContext * cx, JS::Handle<JS::Value> thisv, JS::Handle<JS::Value> getter, JS::MutableHandle<JS::Value> rval) Line 805	C++
 	xul.dll!CallGetter(JSContext * cx, JS::Handle<JSObject *> obj, JS::Handle<JS::Value> receiver, JS::Handle<js::Shape *> shape, JS::MutableHandle<JS::Value> vp) Line 2374	C++
 	xul.dll!GetExistingProperty<js::CanGC>(JSContext * cx, JS::Handle<JS::Value> receiver, JS::Handle<js::NativeObject *> obj, JS::Handle<js::Shape *> shape, JS::MutableHandle<JS::Value> vp) Line 2426	C++
 	xul.dll!js::NativeGetExistingProperty(JSContext * cx, JS::Handle<JSObject *> receiver, JS::Handle<js::NativeObject *> obj, JS::Handle<js::Shape *> shape, JS::MutableHandle<JS::Value> vp) Line 2435	C++
 	xul.dll!js::FetchName<js::GetNameMode::Normal>(JSContext * cx, JS::Handle<JSObject *> receiver, JS::Handle<JSObject *> holder, JS::Handle<js::PropertyName *> name, JS::Handle<JS::PropertyResult> prop, JS::MutableHandle<JS::Value> vp) Line 169	C++
 	xul.dll!js::GetEnvironmentName<js::GetNameMode::Normal>(JSContext * cx, JS::Handle<JSObject *> envChain, JS::Handle<js::PropertyName *> name, JS::MutableHandle<JS::Value> vp) Line 220	C++
 	xul.dll!GetNameOperation(JSContext * cx, js::InterpreterFrame * fp, unsigned char * pc, JS::MutableHandle<JS::Value> vp) Line 246	C++
 	xul.dll!Interpret(JSContext * cx, js::RunState & state) Line 3474	C++
 	xul.dll!js::RunScript(JSContext * cx, js::RunState & state) Line 468	C++
 	xul.dll!js::ExecuteKernel(JSContext * cx, JS::Handle<JSScript *> script, JS::Handle<JSObject *> envChainArg, JS::Handle<JS::Value> newTargetValue, js::AbstractFramePtr evalInFrame, JS::MutableHandle<JS::Value> result) Line 856	C++
 	xul.dll!js::Execute(JSContext * cx, JS::Handle<JSScript *> script, JS::Handle<JSObject *> envChain, JS::MutableHandle<JS::Value> rval) Line 888	C++
 	xul.dll!ExecuteScript(JSContext * cx, JS::Handle<JSObject *> envChain, JS::Handle<JSScript *> script, JS::MutableHandle<JS::Value> rval) Line 377	C++
 	xul.dll!ExecuteScript(JSContext * cx, JS::Handle<JS::StackGCVector<JSObject *,js::TempAllocPolicy>> envChain, JS::Handle<JSScript *> scriptArg, JS::MutableHandle<JS::Value> rval) Line 396	C++
 	xul.dll!JS::CloneAndExecuteScript(JSContext * cx, JS::Handle<JS::StackGCVector<JSObject *,js::TempAllocPolicy>> envChain, JS::Handle<JSScript *> scriptArg, JS::MutableHandle<JS::Value> rval) Line 452	C++
 	xul.dll!nsMessageManagerScriptExecutor::LoadScriptInternal(JS::Handle<JSObject *> aMessageManager, const nsTSubstring<char16_t> & aURL, bool aRunInUniqueScope) Line 1163	C++
 	xul.dll!mozilla::dom::InProcessBrowserChildMessageManager::LoadFrameScript(const nsTSubstring<char16_t> & aURL, bool aRunInGlobalScope) Line 301	C++
 	xul.dll!nsFrameLoader::DoLoadMessageManagerScript(const nsTSubstring<char16_t> & aURL, bool aRunInGlobalScope) Line 2816	C++
 	xul.dll!nsFrameMessageManager::LoadScript(const nsTSubstring<char16_t> & aURL, bool aAllowDelayedLoad, bool aRunInGlobalScope, mozilla::ErrorResult & aError) Line 315	C++
 	xul.dll!nsFrameMessageManager::LoadPendingScripts(nsFrameMessageManager * aManager, nsFrameMessageManager * aChildMM) Line 789	C++
 	xul.dll!mozilla::dom::MessageBroadcaster::AddChildManager(mozilla::dom::MessageListenerManager * aManager) Line 33	C++
 	xul.dll!mozilla::dom::MessageSender::InitWithCallback(mozilla::dom::ipc::MessageManagerCallback * aCallback) Line 26	C++
 	xul.dll!nsFrameLoader::ReallyLoadFrameScripts() Line 2954	C++
 	xul.dll!nsFrameLoader::MaybeCreateDocShell() Line 2222	C++
 	xul.dll!nsFrameLoader::ReallyStartLoadingInternal() Line 707	C++
 	xul.dll!nsFrameLoader::ReallyStartLoading() Line 612	C++
 	xul.dll!mozilla::dom::Document::MaybeInitializeFinalizeFrameLoaders() Line 8580	C++
 	xul.dll!mozilla::detail::RunnableMethodArguments<>::applyImpl<mozilla::dom::Document,void (mozilla::dom::Document::*)()>(mozilla::dom::Document * o, void(mozilla::dom::Document::*)() m, mozilla::Tuple<> & args, std::integer_sequence<unsigned long long>) Line 1188	C++
 	xul.dll!mozilla::detail::RunnableMethodArguments<>::apply<mozilla::dom::Document,void (mozilla::dom::Document::*)()>(mozilla::dom::Document * o, void(mozilla::dom::Document::*)() m) Line 1194	C++
 	xul.dll!mozilla::detail::RunnableMethodImpl<mozilla::dom::Document *,void (mozilla::dom::Document::*)(),1,mozilla::RunnableKind::Standard>::Run() Line 1243	C++
 	xul.dll!nsContentUtils::RemoveScriptBlocker() Line 5373	C++
 	xul.dll!nsAutoScriptBlocker::~nsAutoScriptBlocker() Line 3480	C++
 	xul.dll!nsFrameLoaderOwner::ChangeRemotenessCommon(const nsFrameLoaderOwner::ChangeRemotenessContextType & aContextType, bool aSwitchingInProgressLoad, bool aIsRemote, mozilla::dom::BrowsingContextGroup * aGroup, std::function<void ()> & aFrameLoaderInit, mozilla::ErrorResult & aRv) Line 152	C++
 	xul.dll!nsFrameLoaderOwner::ChangeRemoteness(const mozilla::dom::RemotenessOptions & aOptions, mozilla::ErrorResult & rv) Line 215	C++
 	xul.dll!mozilla::dom::XULFrameElement_Binding::changeRemoteness(JSContext * cx_, JS::Handle<JSObject *> obj, void * void_self, const JSJitMethodCallArgs & args) Line 510	C++
 	xul.dll!mozilla::dom::binding_detail::GenericMethod<mozilla::dom::binding_detail::NormalThisPolicy,mozilla::dom::binding_detail::ThrowExceptions>(JSContext * cx, unsigned int argc, JS::Value * vp) Line 3227	C++
 	xul.dll!CallJSNative(JSContext * cx, bool(*)(JSContext *, unsigned int, JS::Value *) native, js::CallReason reason, const JS::CallArgs & args) Line 507	C++
 	xul.dll!js::InternalCallOrConstruct(JSContext * cx, const JS::CallArgs & args, js::MaybeConstruct construct, js::CallReason reason) Line 599	C++
 	xul.dll!InternalCall(JSContext * cx, const js::AnyInvokeArgs & args, js::CallReason reason) Line 664	C++
 	xul.dll!js::CallFromStack(JSContext * cx, const JS::CallArgs & args) Line 668	C++
 	xul.dll!Interpret(JSContext * cx, js::RunState & state) Line 3336	C++
 	xul.dll!js::RunScript(JSContext * cx, js::RunState & state) Line 468	C++
 	xul.dll!js::InternalCallOrConstruct(JSContext * cx, const JS::CallArgs & args, js::MaybeConstruct construct, js::CallReason reason) Line 636	C++
 	xul.dll!InternalCall(JSContext * cx, const js::AnyInvokeArgs & args, js::CallReason reason) Line 664	C++
 	xul.dll!js::Call(JSContext * cx, JS::Handle<JS::Value> fval, JS::Handle<JS::Value> thisv, const js::AnyInvokeArgs & args, JS::MutableHandle<JS::Value> rval, js::CallReason reason) Line 681	C++
 	xul.dll!JS_CallFunctionValue(JSContext * cx, JS::Handle<JSObject *> obj, JS::Handle<JS::Value> fval, const JS::HandleValueArray & args, JS::MutableHandle<JS::Value> rval) Line 2768	C++
 	xul.dll!nsXPCWrappedJS::CallMethod(unsigned short methodIndex, const nsXPTMethodInfo * info, nsXPTCMiniVariant * nativeParams) Line 963	C++
 	xul.dll!PrepareAndDispatch(nsXPTCStubBase * self, unsigned int methodIndex, unsigned __int64 * args, unsigned __int64 * gprData, double * fprData) Line 168	C++
 	[External Code]	
 	xul.dll!nsObserverList::NotifyObservers(nsISupports * aSubject, const char * aTopic, const char16_t * someData) Line 69	C++
 	xul.dll!nsObserverService::NotifyObservers(nsISupports * aSubject, const char * aTopic, const char16_t * aSomeData) Line 290	C++
 	xul.dll!mozilla::dom::ContentParent::ActorDestroy(mozilla::ipc::IProtocol::ActorDestroyReason why) Line 1888	C++
 	xul.dll!mozilla::ipc::IProtocol::DestroySubtree(mozilla::ipc::IProtocol::ActorDestroyReason aWhy) Line 602	C++
 	xul.dll!mozilla::dom::PContentParent::OnChannelError() Line 15999	C++
 	xul.dll!mozilla::dom::ContentParent::OnChannelError() Line 1770	C++
 	xul.dll!mozilla::ipc::MessageChannel::NotifyMaybeChannelError() Line 2564	C++
 	xul.dll!mozilla::ipc::MessageChannel::OnNotifyMaybeChannelError() Line 2594	C++
 	xul.dll!mozilla::detail::RunnableMethodArguments<>::applyImpl<mozilla::ipc::MessageChannel,void (mozilla::ipc::MessageChannel::*)()>(mozilla::ipc::MessageChannel * o, void(mozilla::ipc::MessageChannel::*)() m, mozilla::Tuple<> & args, std::integer_sequence<unsigned long long>) Line 1188	C++
 	xul.dll!mozilla::detail::RunnableMethodArguments<>::apply<mozilla::ipc::MessageChannel,void (mozilla::ipc::MessageChannel::*)()>(mozilla::ipc::MessageChannel * o, void(mozilla::ipc::MessageChannel::*)() m) Line 1194	C++
 	xul.dll!mozilla::detail::RunnableMethodImpl<mozilla::ipc::MessageChannel *,void (mozilla::ipc::MessageChannel::*)(),0,mozilla::RunnableKind::Cancelable>::Run() Line 1243	C++
 	xul.dll!mozilla::RunnableTask::Run() Line 242	C++
 	xul.dll!mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(const mozilla::detail::BaseAutoLock<mozilla::Mutex &> & aProofOfLock) Line 512	C++
 	xul.dll!mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(const mozilla::detail::BaseAutoLock<mozilla::Mutex &> & aProofOfLock) Line 371	C++
 	xul.dll!mozilla::TaskController::ProcessPendingMTTask(bool aMayWait) Line 168	C++
 	xul.dll!mozilla::TaskController::InitializeInternal::<unnamed-tag>::operator()() Line 86	C++
 	xul.dll!mozilla::detail::RunnableFunction<`lambda at c:/Users/krist/code/gecko/xpcom/threads/TaskController.cpp:86:7'>::Run() Line 577	C++
 	xul.dll!nsThread::ProcessNextEvent(bool aMayWait, bool * aResult) Line 1236	C++
 	xul.dll!NS_ProcessNextEvent(nsIThread * aThread, bool aMayWait) Line 513	C++
 	xul.dll!mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate * aDelegate) Line 84	C++
 	xul.dll!MessageLoop::RunInternal() Line 334	C++
 	xul.dll!MessageLoop::RunHandler() Line 328	C++
 	xul.dll!MessageLoop::Run() Line 310	C++
 	xul.dll!nsBaseAppShell::Run() Line 139	C++
 	xul.dll!nsAppShell::Run() Line 602	C++
 	xul.dll!nsAppStartup::Run() Line 270	C++
 	xul.dll!XREMain::XRE_mainRun() Line 4751	C++
 	xul.dll!XREMain::XRE_main(int argc, char * * argv, const mozilla::BootstrapConfig & aConfig) Line 4941	C++
 	xul.dll!XRE_main(int argc, char * * argv, const mozilla::BootstrapConfig & aConfig) Line 4995	C++
 	xul.dll!mozilla::BootstrapImpl::XRE_main(int argc, char * * argv, const mozilla::BootstrapConfig & aConfig) Line 45	C++
 	firefox.exe!do_main(int argc, char * * argv, char * * envp) Line 217	C++
 	firefox.exe!NS_internal_main(int argc, char * * argv, char * * envp) Line 331	C++
 	firefox.exe!wmain(int argc, wchar_t * * argv) Line 131	C++
 	[External Code]	

I'm still figuring out whether this is a false positive as this could happen when allocations don't get their memory zerod out, but it seems like most of the data has valid members. The arena allocation code is run quite frequently and does not trigger the error in other usecases.

Hello,

I'd just like to add that some users of my comic reader (https://github.com/chocolatkey/xbreader) are also experiencing what seems to be the same crash, although I cannot reproduce it on my own machine. I asked one to submit the crash report, and it's here: https://crash-stats.mozilla.org/report/index/bf6433ef-f17e-47a3-aae8-470200200912 . I think it definitely canvas-related. An example URL is here: https://xb.yrkz.me/#!/ptbr-ex-1 . Tends to happen when you start stressing it, as in resizing quickly, moving around pages quickly etc.

(In reply to chocolatkey from comment #15)

Hello,

I'd just like to add that some users of my comic reader (https://github.com/chocolatkey/xbreader) are also experiencing what seems to be the same crash, although I cannot reproduce it on my own machine. I asked one to submit the crash report, and it's here: https://crash-stats.mozilla.org/report/index/bf6433ef-f17e-47a3-aae8-470200200912 . I think it definitely canvas-related. An example URL is here: https://xb.yrkz.me/#!/ptbr-ex-1 . Tends to happen when you start stressing it, as in resizing quickly, moving around pages quickly etc.

Also, to add, I can ask these users (my acquaintances) to perform any special tests you need to help resolve this bug. Here is another crash report: https://crash-stats.mozilla.org/report/index/c7758e8b-a2ff-47bd-a98d-040380200912

It'd be worth trying to look at what changed in 79 to see if anything stands out.

One thing I noticed is that ImageContainer is threadsafe (NS_INLINE_DECL_THREADSAFE_REFCOUNTING), but also uses SupportsWeakPtr, which isn't threadafe. I can't find cases where we'd use the two at the same time, but that could be problematic.

I've found what's causing the Application Verifier to throw an exception.

The lock taken here: https://searchfox.org/mozilla-central/source/memory/build/mozjemalloc.cpp#3515 initializes a critical section, but the critical section is never released. Other use cases of this mutex type are all static, causing (I'm guessing) the criticalsection to be released when the process ends, but in this case, every arena construction will leak a critical section.

I'm not sure this is directly related to this issue, but since we reuse memory, there is some undefined behavior in calling a mutex init twice.

Assignee: nobody → ktaeleman
Flags: needinfo?(ktaeleman)

(In reply to chocolatkey from comment #16)

(In reply to chocolatkey from comment #15)

Hello,

I'd just like to add that some users of my comic reader (https://github.com/chocolatkey/xbreader) are also experiencing what seems to be the same crash, although I cannot reproduce it on my own machine. I asked one to submit the crash report, and it's here: https://crash-stats.mozilla.org/report/index/bf6433ef-f17e-47a3-aae8-470200200912 . I think it definitely canvas-related. An example URL is here: https://xb.yrkz.me/#!/ptbr-ex-1 . Tends to happen when you start stressing it, as in resizing quickly, moving around pages quickly etc.

Also, to add, I can ask these users (my acquaintances) to perform any special tests you need to help resolve this bug. Here is another crash report: https://crash-stats.mozilla.org/report/index/c7758e8b-a2ff-47bd-a98d-040380200912

Thanks for letting us know! We might take you up on this offer once we identify a potential fix.
Sadly I was also not able to reproduce the issue with your comic reader, but stressing with resizing/minimizing etc is more than we knew about this crash before!

@rares: Would you be able to try and reproduce this with these repro steps?

Flags: needinfo?(rares.doghi)

I'm able to reproduce this with @chocolatkey's comic reader site: https://xb.yrkz.me/#!/ptbr-ex-1.
Preconditions:

  • Disable webrender (gfx.webrender.force-disable = true)
  • Release build (Not occurring in debug)

It looks like the criticalsection fix attached does not fix the issue.

Flags: needinfo?(rares.doghi)
Attachment #9175653 - Attachment is obsolete: true

The only place the WeakPtr is needed is with ImageLib.

https://searchfox.org/mozilla-central/rev/30e70f2fe80c97bfbfcd975e68538cefd7f58b2a/image/Image.h#451

For testing purposes, we could just make it a strong pointer. If it fixed the problem, we could probably work around how we depend the WeakPtr. Expire old ImageContainers with only one strong reference like we do the surfaces in the SurfaceCache and/or hook it into the memory pressure signal.

Blocks: gfx-82
No longer blocks: wr-81

Removing the WeakPtr seems to fix it (or makes it harder to repro). While I was debugging this I ran into a Skia crash which distracted me a bit trying to get to the bottom of it, but it seems like it might be the same issue (https://bugzilla.mozilla.org/show_bug.cgi?id=1576843)

Pushed by ktaeleman@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/22d8b5439a7a
Change WeakPtr to ThreadSafeWeakPtr. r=aosmond
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 83 Branch
Attached file Uplift request for FF 82 Beta (obsolete) —

Approval Request Comment
[Feature/Bug causing the regression]:
N/A, bug seems to have been present from before FF70
[User impact if declined]:
Crashes will continue
[Is this code covered by automated tests?]:
Yes, but it is hard to reproduce so it has only shown up as intermittents
[Has the fix been verified in Nightly?]:
Yes
[Needs manual test from QE? If yes, steps to reproduce]:
Yes
STR:

[Why is the change risky/not risky?]:
It only changes a WeakPtr into a ThreadsafeWeakptr and does not change existing logic beyond that.
[String changes made/needed]:
N/A

Attachment #9177435 - Flags: approval-mozilla-beta?
See Also: → 1576843

@chocolatkey: Thank you so much for the repro case, this really helped us dig into the issue!

:ktaeleman You're welcome, I look forward to when the fix arrives in mainline FF!

Comment on attachment 9176656 [details]
Bug 1656987 - Change WeakPtr to ThreadSafeWeakPtr. r=aosmond

approved for 82.0b3

Attachment #9176656 - Flags: approval-mozilla-beta+

Comment on attachment 9177435 [details]
Uplift request for FF 82 Beta

moved the approval flag to the actual patch attachment :)

Attachment #9177435 - Flags: approval-mozilla-beta?
Attachment #9177435 - Attachment is obsolete: true

Hi, I tried to reproduce this issue in Release, as well as Beta 82.0b2 but without success.. Im running an Nvidia GeForce GT 730 on my Windows 10 machine, does this issue only occur on specific GPUS ? are there any other prefs I should set ? or I could set to help me reproduce this issue on my end ?

Flags: needinfo?(ktaeleman)

(In reply to Rares Doghi from comment #33)

Hi, I tried to reproduce this issue in Release, as well as Beta 82.0b2 but without success.. Im running an Nvidia GeForce GT 730 on my Windows 10 machine, does this issue only occur on specific GPUS ? are there any other prefs I should set ? or I could set to help me reproduce this issue on my end ?

I couldn't reproduce it on my high end desktop, my laptop (still pretty high end) was able to repro quite quickly (~15-30 seconds of resizing). It's a race condition that depends who releases the pointer first, not GPU related and seems to occur more on slower cpus.

Flags: needinfo?(ktaeleman)

(In reply to Kris Taeleman (:ktaeleman) from comment #34)

(In reply to Rares Doghi from comment #33)

Hi, I tried to reproduce this issue in Release, as well as Beta 82.0b2 but without success.. Im running an Nvidia GeForce GT 730 on my Windows 10 machine, does this issue only occur on specific GPUS ? are there any other prefs I should set ? or I could set to help me reproduce this issue on my end ?

I couldn't reproduce it on my high end desktop, my laptop (still pretty high end) was able to repro quite quickly (~15-30 seconds of resizing). It's a race condition that depends who releases the pointer first, not GPU related and seems to occur more on slower cpus.

Oh, forgot to mention, make sure webrender is disabled.

Hi Kris, I force-disabled WebRender and tried this page https://xb.yrkz.me/#!/ptbr-ex-1 and now for me the tab crashes in Release, in our latest Nightly as well as our Latest Beta builds, I dont even need to stress it too much , with webrender off I just resize the page a few times and it crashes.

Flags: needinfo?(ktaeleman)

Same signature as before? Can you please link to a crash report?

Flags: needinfo?(rares.doghi)

It took me a while on my laptop (desktop doesn't repro), but you are right, I was able to repro the issue.
https://crash-stats.mozilla.org/report/index/8ac3dcc0-66bb-40b5-b1ca-1a2810200928

Callstack is a bit different resulting in similar crash, but instead of ImageContainer causing the crash, I'm getting mozilla::layers::ShadowableLayer::~ShadowableLayer() causing it this time. Maybe a similar issue is present there OR the fix did not fix the initial issue.

It would be interesting to see your callstack rares.

Flags: needinfo?(ktaeleman)

Second repro has same callstack as previous issue. Seems like the fix did not fix the issue. Re-opening

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → NEW
Target Milestone: 83 Branch → ---
Flags: needinfo?(rares.doghi)

Comment on attachment 9176656 [details]
Bug 1656987 - Change WeakPtr to ThreadSafeWeakPtr. r=aosmond

Clearing approval flag to get this off the needs-uplift radar.

Attachment #9176656 - Flags: approval-mozilla-beta+

(In reply to Rares Doghi from comment #40)

I'm attaching the link to one of my crash reports: https://crash-stats.mozilla.org/report/index/cff03138-d7ab-42c4-bca9-e19580200929

I was looking at your crash report and the crashing modules see odd. nvwgf2umx_cfg.dll is NVidia but ownnq.dll, q_izrnukfh.dll and a_rtndic.dll? Are those maybe malware? A Google search comes up empty with respect to those DLLs.

If this is driver / NVidia related, you could update your old drivers (436.30) to 456.55 and see if it helps. Maybe do a malware scan with something like MalwareBytes.

https://crash-stats.mozilla.org/report/index/98bba450-7eba-4257-9d61-24b050200930

I have updated my Nvidia Drivers but I'm still getting these crashes as soon as I resize the window while on that test page: https://xb.yrkz.me/#!/ptbr-ex-1

(In reply to Rares Doghi from comment #43)

https://crash-stats.mozilla.org/report/index/98bba450-7eba-4257-9d61-24b050200930

I have updated my Nvidia Drivers but I'm still getting these crashes as soon as I resize the window while on that test page: https://xb.yrkz.me/#!/ptbr-ex-1

Ok, thanks for updating and testing. And you also ran a malware scan as well? I'm seeing three new crashing .DLLs from https://crash-stats.mozilla.org/report/index/98bba450-7eba-4257-9d61-24b050200930#tab-modules: dkndc.dll, v_iybjndmk.dll and r_zxlhti.dll.

Those are very suspect and I would suspect be malware or something in the browser that's not supposed to be there.

Nevermind. I was able to repro the crash on 82.0b5 x64 just now: https://crash-stats.mozilla.org/report/index/119abf5f-192d-4cf7-b5aa-240340200930

Looking at this further and seems like the location that writes 01 to the 0xffffffffffffffff value is:

[0x0]   xul!mozilla::layers::CanvasRenderer::SetDirty   
[0x1]   xul!mozilla::layers::CanvasLayer::Updated + 0x7   
[0x2]   xul!mozilla::dom::HTMLCanvasElement::InvalidateCanvasContent + 0x194   
[0x3]   xul!mozilla::dom::ImageBitmapRenderingContext::Redraw + 0x56   
[0x4]   xul!mozilla::dom::ImageBitmapRenderingContext::TransferFromImageBitmap + 0xb7   
[0x5]   xul!mozilla::dom::ImageBitmapRenderingContext_Binding::transferFromImageBitmap + 0xec   
[0x6]   xul!mozilla::dom::binding_detail::GenericMethod<mozilla::dom::binding_detail::NormalThisPolicy,mozilla::dom::binding_detail::ThrowExceptions> + 0x108   

At https://searchfox.org/mozilla-central/rev/222e4f64b769413ac1a1991d2397b13a0acb5d9d/dom/html/HTMLCanvasElement.cpp#1049, the layer gets casted to a CanvasLayer, while according to the debugger it's a ClientImageLayer.
Looking at the nsIFrame::InvalidateLayer(DisplayItemType::TYPE_CANVAS, &invalRect) call, it looks like it is of the correct type in the DisplayItemData.

The plot thickens....

Group: gfx-core-security, core-security

To repro this, it seems to trigger faster if you scroll through the pages on https://xb.yrkz.me/#!/ptbr-ex-1.
Attached patch should fix the issue.

Group: core-security

Comment on attachment 9179145 [details]
Bug 1656987 - Only cast and call CanvasLayer::Update() if the layer is a CanvasLayer. r=mattwoodrow

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: Unlikely
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
  • Which older supported branches are affected by this flaw?: esr, release, beta
  • If not all supported branches, which bug introduced the flaw?: Bug 1379920
  • Do you have backports for the affected branches?: No
  • If not, how different, hard to create, and risky will they be?: Should be exactly the same change.
  • How likely is this patch to cause regressions; how much testing does it need?: Unlikely, testing the existing repro case should be sufficient.
Attachment #9179145 - Flags: sec-approval?
Crash Signature: [@ RtlDeleteCriticalSection | mozilla::layers::ImageContainer::~ImageContainer] → [@ RtlDeleteCriticalSection | mozilla::layers::ImageContainer::~ImageContainer] [@ RtlDeleteCriticalSection | mozilla::layers::ShadowableLayer::~ShadowableLayer]
Crash Signature: [@ RtlDeleteCriticalSection | mozilla::layers::ImageContainer::~ImageContainer] [@ RtlDeleteCriticalSection | mozilla::layers::ShadowableLayer::~ShadowableLayer] → [@ RtlDeleteCriticalSection | mozilla::layers::ImageContainer::~ImageContainer] [@ RtlDeleteCriticalSection | xul.dll | je_realloc | je_realloc | mozilla::layers::ShadowableLayer::~ShadowableLayer] [@ ntdll.dll | xul.dll | je_realloc | je_realloc | mozi…
Attachment #9179145 - Flags: sec-approval? → sec-approval+

Comment on attachment 9179145 [details]
Bug 1656987 - Only cast and call CanvasLayer::Update() if the layer is a CanvasLayer. r=mattwoodrow

Beta/Release Uplift Approval Request

  • User impact if declined:
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky):
  • String changes made/needed:

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration:
  • User impact if declined:
  • Fix Landed on Version:
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky):
  • String or UUID changes made by this patch:
Attachment #9179145 - Flags: approval-mozilla-esr78?
Attachment #9179145 - Flags: approval-mozilla-beta?
Attachment #9176656 - Flags: approval-mozilla-beta?

Comment on attachment 9179145 [details]
Bug 1656987 - Only cast and call CanvasLayer::Update() if the layer is a CanvasLayer. r=mattwoodrow

Beta/Release Uplift Approval Request

  • User impact if declined: Frequent crashes when using canvas and resizing window or using certain canvas features

  • Is this code covered by automated tests?: Yes

  • Has the fix been verified in Nightly?: No

  • Needs manual test from QE?: Yes

  • If yes, steps to reproduce: STR:

  • Disable Webrender

  • Go to https://xb.yrkz.me/#!/ptbr-ex-1

  • Use slider to scroll through comic

  • List of other uplifts needed: None

  • Risk to taking this patch: Low

  • Why is the change risky/not risky? (and alternatives if risky): Change only changes codepath causing crash.

  • String changes made/needed:

Attachment #9179145 - Flags: approval-mozilla-esr78?
Flags: qe-verify+
Group: gfx-core-security → core-security-release
Status: NEW → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 83 Branch

Comment on attachment 9179145 [details]
Bug 1656987 - Only cast and call CanvasLayer::Update() if the layer is a CanvasLayer. r=mattwoodrow

approved for 82.0b9

Attachment #9179145 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment on attachment 9176656 [details]
Bug 1656987 - Change WeakPtr to ThreadSafeWeakPtr. r=aosmond

this one's already uplifted to 82

Attachment #9176656 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Do we need to do something for esr78 also?

Flags: needinfo?(ktaeleman)

@julien: I was also wondering if this should be uplifted for esr78, but the form said it should be sec-high as an uplift requirement. I'd be happy to have this uplifted to esr.

Flags: needinfo?(ktaeleman) → needinfo?(jcristau)
QA Whiteboard: [qa-triaged]

This issue is Verified as fixed in Beta 82.0b9 as well as our latest Nightly Build 83.0a1 (2020-10-08)

Tom or Dan, can you weigh in on whether we should get this on esr78?

There's no rating in this bug and nothing recently from esr78 in crash-stats with the known signatures as far as I can tell (but there's bp-e5d643a0-a4cf-47fa-aaa4-a036e0200911 and bp-ca2a1cd1-d0bf-4398-9198-a95440200727 looking further back).

Flags: needinfo?(tom)
Flags: needinfo?(dveditz)

If we think it's low risk; I'd be inclined to take it.

Flags: needinfo?(tom)

Although there are almost no crashes on ESR-78 because whatever changed the frequency in Fx79 (bug 1632249?) hadn't happened yet, the bug does exist. Just because the right preconditions don't happen very often by chance doesn't mean they can't be created once you know the reason for it. The changes look pretty straightforward so we're better off taking them in ESR.

Flags: needinfo?(dveditz)
Keywords: sec-moderate
Regressed by: 1379920
Has Regression Range: --- → yes

Comment on attachment 9176656 [details]
Bug 1656987 - Change WeakPtr to ThreadSafeWeakPtr. r=aosmond

approved for 78.4esr

Flags: needinfo?(jcristau)
Attachment #9176656 - Flags: approval-mozilla-esr78+
Attachment #9179145 - Flags: approval-mozilla-esr78+

This bug cannot be reproduced on the ESR channel as it reproduced on the other channels so it cannot be verified.
Maybe the fix in fx79 from bug 1632249 mentioned in comment 5 actually caused the appearance of the crash.

I will consider this bug verified, but I will not set the tag of fx 78 as verified.

Status: RESOLVED → VERIFIED

Removing the qe-verify+ tag.

Flags: qe-verify+
QA Whiteboard: [qa-triaged]
Whiteboard: [adv-main82+r]
Whiteboard: [adv-main82+r] → [adv-main82+r][adv-esr78.4+r]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: