Open Bug 1346421 Opened 3 years ago Updated 2 years ago
Crash in mozilla::css::Style
This bug was filed from the Socorro interface and is report bp-704e4ecc-6b27-4b42-aced-5c05e2170310. ============================================================= Crashing Thread (0) Frame Module Signature Source 0 xul.dll mozilla::css::StyleRule::IsCCLeaf() layout/style/StyleRule.cpp:1281 1 xul.dll mozilla::CSSStyleSheet::cycleCollection::TraverseNative(void*, nsCycleCollectionTraversalCallback&) layout/style/CSSStyleSheet.cpp:547 2 xul.dll MayHaveChild xpcom/base/nsCycleCollector.cpp:2556 3 xul.dll nsPurpleBuffer::PurpleBlock::VisitEntries<RemoveSkippableVisitor>(nsPurpleBuffer&, RemoveSkippableVisitor&) xpcom/base/nsCycleCollector.cpp:1035 4 xul.dll nsPurpleBuffer::RemoveSkippable(nsCycleCollector*, bool, bool, void (*)(void)) xpcom/base/nsCycleCollector.cpp:2815 5 xul.dll nsCycleCollector::ForgetSkippable(bool, bool) xpcom/base/nsCycleCollector.cpp:2863 6 xul.dll nsCycleCollector_forgetSkippable(bool, bool) xpcom/base/nsCycleCollector.cpp:4096 7 xul.dll FireForgetSkippable dom/base/nsJSEnvironment.cpp:1238 8 xul.dll CCTimerFired dom/base/nsJSEnvironment.cpp:1795 9 xul.dll nsTimerImpl::Fire(int) xpcom/threads/nsTimerImpl.cpp:479 10 xul.dll nsTimerEvent::Run() xpcom/threads/TimerThread.cpp:297 11 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1240 12 xul.dll mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:96 13 xul.dll mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:301 14 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:231 15 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:211 16 xul.dll nsBaseAppShell::Run() widget/nsBaseAppShell.cpp:156 17 xul.dll nsAppShell::Run() widget/windows/nsAppShell.cpp:262 18 xul.dll XRE_RunAppShell() toolkit/xre/nsEmbedFunctions.cpp:924 19 xul.dll mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:269 20 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:231 21 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:211 22 xul.dll XRE_InitChildProcess(int, char** const, XREChildData const*) toolkit/xre/nsEmbedFunctions.cpp:756 23 xul.dll mozilla::BootstrapImpl::XRE_InitChildProcess(int, char** const, XREChildData const*) toolkit/xre/Bootstrap.cpp:65 24 firefox.exe content_process_main(mozilla::Bootstrap*, int, char** const) ipc/contentproc/plugin-container.cpp:115 25 firefox.exe wmain toolkit/xre/nsWindowsWMain.cpp:115 26 firefox.exe __scrt_common_main_seh f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:253 27 kernel32.dll BaseThreadInitThunk 28 ntdll.dll __RtlUserThreadStart 29 ntdll.dll _RtlUserThreadStart this crash signature seems to be regressing in firefox 53 & later versions. so far it has rather been a low-volume crash though...
a related signature may be https://crash-stats.mozilla.com/signature/?product=Firefox&signature=mozilla%3A%3Acss%3A%3ARule%3A%3AIsKnownLive
Crash Signature: [@ mozilla::css::StyleRule::IsCCLeaf] → [@ mozilla::css::StyleRule::IsCCLeaf] [@ mozilla::css::Rule::IsKnownLive]
This function was only added in 53, so crashes in it being new in 53 makes sense. The linked crash is at 0x20c on this line: return !mDOMDeclaration || !mDOMDeclaration->PreservingWrapper(); 0x20c isn't _that_ close to 0.... but still pretty close. The crashes linked in comment 1 are mostly at 0x1b4 but not all, and at this line: return nsCCUncollectableMarker::InGeneration( in IsKnownLive. Apart from "memory corruption", I have no idea what's up here. And memory corruption would not be this consistent....
Let's assume this is a regresssion from bug 851892 until proven otherwise.
Happening on mDOMDeclaration line but not Rule::IsCCLeaf line probably means the StyleRule object itself is still in valid memory, but it is filled with some random data. There are two crashes from a same install happen at 0x20c, but the address varies otherwise. There is one 0xd, 0x7a000c, and 0x400c. So the position here probably doesn't have much value.
While this is low volume it does seem like a recent regression and I want to make sure we don't forget it. Tracking for 53 and 54.
dbaron: bz: :astley -- any thoughts? What are our options here? I presume we don't want to backout the CSS in WebIDL
The best place to start would be with someone with a working Windows install to try debugging one of these crashes in a minidump so we get some idea of what pointers are valid and what pointers are garbage. The next best place to start is checking whether there is any information on the crash dumps (e.g. urls) that might enable us to reproduce the crash. Especially under rr....
So, two crashes from 53.0b8: First, bp-22706024-0157-4a9f-913b-8ed2a2170401, which is a crash at 659DF7A0: 659DF7A0 F6 40 0C 01 test byte ptr [eax+0Ch],1 <===== CRASH HERE 659DF7A4 0F 84 F8 44 BC FF je mozilla::css::StyleRule::IsCCLeaf+11h (655A3CA2h) 659DF7AA 33 C0 xor eax,eax 659DF7AC C3 ret with EAX = 1 Second, bp-b16689b4-d9df-4a5f-92c2-1ce082170403, which is a crash at 00007FFCB5C90CDE: 00007FFCB5C90CC4 F6 41 18 01 test byte ptr [rcx+18h],1 00007FFCB5C90CC8 75 11 jne mozilla::css::StyleRule::IsCCLeaf+17h (07FFCB5C90CDBh) 00007FFCB5C90CCA 48 8B 51 58 mov rdx,qword ptr [rcx+58h] 00007FFCB5C90CCE 33 C0 xor eax,eax 00007FFCB5C90CD0 48 85 D2 test rdx,rdx 00007FFCB5C90CD3 75 09 jne mozilla::css::StyleRule::IsCCLeaf+1Ah (07FFCB5C90CDEh) 00007FFCB5C90CD5 B8 01 00 00 00 mov eax,1 00007FFCB5C90CDA C3 ret 00007FFCB5C90CDB 32 C0 xor al,al 00007FFCB5C90CDD C3 ret 00007FFCB5C90CDE F6 42 18 01 test byte ptr [rdx+18h],1 <====== CRASH HERE 00007FFCB5C90CE2 75 F6 jne mozilla::css::StyleRule::IsCCLeaf+16h (07FFCB5C90CDAh) 00007FFCB5C90CE4 EB EF jmp mozilla::css::StyleRule::IsCCLeaf+11h (07FFCB5C90CD5h) with RDX = 0000000020000000.
I think (at a quick look) that in both cases that means we have a bad mDOMDeclaration pointer in mDOMDeclaration->PreservingWrapper().
One thing I'm a little suspicious of is whether it's OK for DOMCSSDeclarationImpl to aggregate the two cycle collection interfaces with mRule, but not aggregate in general with mRule.
My suspicion is that the url field (present in 20 crashes in the past 6 months) isn't particularly useful. 6 are exactly https://www.facebook.com/ , 3 are other urls on facebook, 2 on youtube, 1 on google, 1 https://web.whatsapp.com/ , 1 about:newtab, 1 on etsy, and the remaining 4 on sites I hadn't heard of. Though I suppose that means there's a chance it's Facebook-related. Maybe bz has some ideas based on the comments above, though?
Flags: needinfo?(dbaron) → needinfo?(bzbarsky)
> I think (at a quick look) that in both cases that means we have a bad mDOMDeclaration pointer > in mDOMDeclaration->PreservingWrapper(). Certainly the 64-bit one looks like that. The other has less context, but certainly involves the same sort of "read the thing 3 words in, and check the 1 bit", so is quite plausible. > but not aggregate in general with mRule DOMCSSDeclarationImpl does aggregate with mRule for refcounting: NS_IMPL_ADDREF_USING_AGGREGATOR(DOMCSSDeclarationImpl, mRule) NS_IMPL_RELEASE_USING_AGGREGATOR(DOMCSSDeclarationImpl, mRule) Were you thinking of something else in terms of aggregating in general? Anyway, the basic ownership model here is that mDOMDeclaration is a UniquePtr<DOMCSSDeclarationImpl>, the refcounts are aggregated, and the CC bits are aggregated so that both objects look like a single node in the CC graph. The only real benefit here is that we save a word for the refcount of the DOMCSSDeclarationImpl. We could disaggregate the stuff, give DOMCSSDeclarationImpl its own refcount, and see whether that makes the crashes go away. But I'm not quite sure how we can end up with a bad mDOMDeclaration value in general. It starts life in the UniquePtr ctr, which nulls it out. Then it's only modified when we allocate the DOMCSSDeclarationImpl, and set to the new object. It's never written anywhere else... Looking at reports for this crash, I'm seeing the following crash addresses in the last 14 days, including the two from comment 9: 0x800c 0x8a00000018 0x10 0x2c 0x20000018 0x400018 0x200c 0xd 0xe 0x14 0x10c 0x4c 0x11 0x4000000c 0x10c 0x818 0x8000c All the ones ending in "18" are 64-bit. All the others are 32-bit. A crash for a bad mDOMDeclaration on 64-bit would be at "bad pointer + 0x18" and on 32-bit at "bad pointer + 0xC". So the corresponding bad pointers would be: 0x8000 0x8a00000000 0x04 0x20 0x20000000 0x400000 0x2000 0x1 0x2 0x08 0x100 0x40 0x05 0x40000000 0x100 0x800 0x80000 and all of those except 0x8a00000000 and 0x05 are a single bit from a 0 pointer. The report from comment 0 was crashing at 0x20c which is 0x200 + 0xC. Anyway, those two non-one-bit-from-0 reports are https://crash-stats.mozilla.com/report/index/1ea017ca-7429-49ce-ba52-c48442170404 and https://crash-stats.mozilla.com/report/index/270040d2-913e-44b9-bd51-195972170326 in case we think we can get something useful out of them. The rest really do look like someone bitflipped mDOMDeclaration...
So one question is whether this is low-volume enough that we can in fact chalk this up to memory bitflips...
(In reply to Boris Zbarsky [:bz] (still a bit busy) (if a patch has no decent message, automatic r-) from comment #13) > Were you thinking of something else in terms of aggregating in general? I was thinking of COM identity, i.e., both objects produce the same set of results from QueryInterface.
(In reply to Boris Zbarsky [:bz] (still a bit busy) (if a patch has no decent message, automatic r-) from comment #14) > So one question is whether this is low-volume enough that we can in fact > chalk this up to memory bitflips... Given that the pointer is generally null, I think that's plausible.
The volume of crashes in Beta54 is low. Mark 54 as fix-optional but still happy to take fix in 54.
You need to log in before you can comment on or make changes to this bug.