Open Bug 1346421 Opened 3 years ago Updated 2 years ago

Crash in mozilla::css::StyleRule::IsCCLeaf

Categories

(Core :: DOM: CSS Object Model, defect, P5, critical)

53 Branch
All
Windows
defect

Tracking

()

Tracking Status
firefox52 --- unaffected
firefox53 + fix-optional
firefox54 + fix-optional
firefox55 --- unaffected

People

(Reporter: philipp, Unassigned)

References

Details

(Keywords: crash, regression)

Crash Data

This bug was filed from the Socorro interface and is 
report bp-704e4ecc-6b27-4b42-aced-5c05e2170310.
=============================================================
Crashing Thread (0)
Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::css::StyleRule::IsCCLeaf() 	layout/style/StyleRule.cpp:1281
1 	xul.dll 	mozilla::CSSStyleSheet::cycleCollection::TraverseNative(void*, nsCycleCollectionTraversalCallback&) 	layout/style/CSSStyleSheet.cpp:547
2 	xul.dll 	MayHaveChild 	xpcom/base/nsCycleCollector.cpp:2556
3 	xul.dll 	nsPurpleBuffer::PurpleBlock::VisitEntries<RemoveSkippableVisitor>(nsPurpleBuffer&, RemoveSkippableVisitor&) 	xpcom/base/nsCycleCollector.cpp:1035
4 	xul.dll 	nsPurpleBuffer::RemoveSkippable(nsCycleCollector*, bool, bool, void (*)(void)) 	xpcom/base/nsCycleCollector.cpp:2815
5 	xul.dll 	nsCycleCollector::ForgetSkippable(bool, bool) 	xpcom/base/nsCycleCollector.cpp:2863
6 	xul.dll 	nsCycleCollector_forgetSkippable(bool, bool) 	xpcom/base/nsCycleCollector.cpp:4096
7 	xul.dll 	FireForgetSkippable 	dom/base/nsJSEnvironment.cpp:1238
8 	xul.dll 	CCTimerFired 	dom/base/nsJSEnvironment.cpp:1795
9 	xul.dll 	nsTimerImpl::Fire(int) 	xpcom/threads/nsTimerImpl.cpp:479
10 	xul.dll 	nsTimerEvent::Run() 	xpcom/threads/TimerThread.cpp:297
11 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:1240
12 	xul.dll 	mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:96
13 	xul.dll 	mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:301
14 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc:231
15 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc:211
16 	xul.dll 	nsBaseAppShell::Run() 	widget/nsBaseAppShell.cpp:156
17 	xul.dll 	nsAppShell::Run() 	widget/windows/nsAppShell.cpp:262
18 	xul.dll 	XRE_RunAppShell() 	toolkit/xre/nsEmbedFunctions.cpp:924
19 	xul.dll 	mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:269
20 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc:231
21 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc:211
22 	xul.dll 	XRE_InitChildProcess(int, char** const, XREChildData const*) 	toolkit/xre/nsEmbedFunctions.cpp:756
23 	xul.dll 	mozilla::BootstrapImpl::XRE_InitChildProcess(int, char** const, XREChildData const*) 	toolkit/xre/Bootstrap.cpp:65
24 	firefox.exe 	content_process_main(mozilla::Bootstrap*, int, char** const) 	ipc/contentproc/plugin-container.cpp:115
25 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp:115
26 	firefox.exe 	__scrt_common_main_seh 	f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:253
27 	kernel32.dll 	BaseThreadInitThunk 	
28 	ntdll.dll 	__RtlUserThreadStart 	
29 	ntdll.dll 	_RtlUserThreadStart

this crash signature seems to be regressing in firefox 53 & later versions. so far it has rather been a low-volume crash though...
a related signature may be https://crash-stats.mozilla.com/signature/?product=Firefox&signature=mozilla%3A%3Acss%3A%3ARule%3A%3AIsKnownLive
Crash Signature: [@ mozilla::css::StyleRule::IsCCLeaf] → [@ mozilla::css::StyleRule::IsCCLeaf] [@ mozilla::css::Rule::IsKnownLive]
This function was only added in 53, so crashes in it being new in 53 makes sense.

The linked crash is at 0x20c on this line:

  return !mDOMDeclaration || !mDOMDeclaration->PreservingWrapper();

0x20c isn't _that_ close to 0.... but still pretty close.

The crashes linked in comment 1 are mostly at  0x1b4 but not all, and at this line:

  return nsCCUncollectableMarker::InGeneration(

in IsKnownLive.

Apart from "memory corruption", I have no idea what's up here.  And memory corruption would not be this consistent....
Let's assume this is a regresssion from bug 851892 until proven otherwise.
Blocks: 851892
See Also: 851892
Happening on mDOMDeclaration line but not Rule::IsCCLeaf line probably means the StyleRule object itself is still in valid memory, but it is filled with some random data.

There are two crashes from a same install happen at 0x20c, but the address varies otherwise. There is one 0xd, 0x7a000c, and 0x400c. So the position here probably doesn't have much value.
 52.0b2 was just released out, let's see how the trend is going.
While this is low volume it does seem like a recent regression and I want to make sure we don't forget it. Tracking for 53 and 54.
dbaron: bz: :astley -- any thoughts?  What are our options here?  I presume we don't want to backout the CSS in WebIDL
Flags: needinfo?(dbaron)
Flags: needinfo?(bzbarsky)
Flags: needinfo?(aschen)
The best place to start would be with someone with a working Windows install to try debugging one of these crashes in a minidump so we get some idea of what pointers are valid and what pointers are garbage.

The next best place to start is checking whether there is any information on the crash dumps (e.g. urls) that might enable us to reproduce the crash.  Especially under rr....
Flags: needinfo?(bzbarsky)
So, two crashes from 53.0b8:

First, bp-22706024-0157-4a9f-913b-8ed2a2170401, which is a crash at 659DF7A0:


659DF7A0 F6 40 0C 01          test        byte ptr [eax+0Ch],1                   <===== CRASH HERE
659DF7A4 0F 84 F8 44 BC FF    je          mozilla::css::StyleRule::IsCCLeaf+11h (655A3CA2h)  
659DF7AA 33 C0                xor         eax,eax  
659DF7AC C3                   ret  

with EAX = 1


Second, bp-b16689b4-d9df-4a5f-92c2-1ce082170403, which is a crash at 00007FFCB5C90CDE:


00007FFCB5C90CC4 F6 41 18 01          test        byte ptr [rcx+18h],1  
00007FFCB5C90CC8 75 11                jne         mozilla::css::StyleRule::IsCCLeaf+17h (07FFCB5C90CDBh)  
00007FFCB5C90CCA 48 8B 51 58          mov         rdx,qword ptr [rcx+58h]  
00007FFCB5C90CCE 33 C0                xor         eax,eax  
00007FFCB5C90CD0 48 85 D2             test        rdx,rdx  
00007FFCB5C90CD3 75 09                jne         mozilla::css::StyleRule::IsCCLeaf+1Ah (07FFCB5C90CDEh)  
00007FFCB5C90CD5 B8 01 00 00 00       mov         eax,1  
00007FFCB5C90CDA C3                   ret  
00007FFCB5C90CDB 32 C0                xor         al,al  
00007FFCB5C90CDD C3                   ret  
00007FFCB5C90CDE F6 42 18 01          test        byte ptr [rdx+18h],1    <====== CRASH HERE
00007FFCB5C90CE2 75 F6                jne         mozilla::css::StyleRule::IsCCLeaf+16h (07FFCB5C90CDAh)  
00007FFCB5C90CE4 EB EF                jmp         mozilla::css::StyleRule::IsCCLeaf+11h (07FFCB5C90CD5h)  

with RDX = 0000000020000000.
I think (at a quick look) that in both cases that means we have a bad mDOMDeclaration pointer in mDOMDeclaration->PreservingWrapper().
One thing I'm a little suspicious of is whether it's OK for DOMCSSDeclarationImpl to aggregate the two cycle collection interfaces with mRule, but not aggregate in general with mRule.
My suspicion is that the url field (present in 20 crashes in the past 6 months) isn't particularly useful.  6 are exactly  https://www.facebook.com/ , 3 are other urls on facebook, 2 on youtube, 1 on google, 1  https://web.whatsapp.com/ , 1 about:newtab, 1 on etsy, and the remaining 4 on sites I hadn't heard of.  Though I suppose that means there's a chance it's Facebook-related.

Maybe bz has some ideas based on the comments above, though?
Flags: needinfo?(dbaron) → needinfo?(bzbarsky)
> I think (at a quick look) that in both cases that means we have a bad mDOMDeclaration pointer
> in mDOMDeclaration->PreservingWrapper().

Certainly the 64-bit one looks like that.  The other has less context, but certainly involves the same sort of "read the thing 3 words in, and check the 1 bit", so is quite plausible.

> but not aggregate in general with mRule

DOMCSSDeclarationImpl does aggregate with mRule for refcounting:

  NS_IMPL_ADDREF_USING_AGGREGATOR(DOMCSSDeclarationImpl, mRule)
  NS_IMPL_RELEASE_USING_AGGREGATOR(DOMCSSDeclarationImpl, mRule)

Were you thinking of something else in terms of aggregating in general?

Anyway, the basic ownership model here is that mDOMDeclaration is a UniquePtr<DOMCSSDeclarationImpl>, the refcounts are aggregated, and the CC bits are aggregated so that both objects look like a single node in the CC graph.  The only real benefit here is that we save a word for the refcount of the DOMCSSDeclarationImpl.

We could disaggregate the stuff, give DOMCSSDeclarationImpl its own refcount, and see whether that makes the crashes go away.  But I'm not quite sure how we can end up with a bad mDOMDeclaration value in general.  It starts life in the UniquePtr ctr, which nulls it out.  Then it's only modified when we allocate the DOMCSSDeclarationImpl, and set to the new object.  It's never written anywhere else...

Looking at reports for this crash, I'm seeing the following crash addresses in the last 14 days, including the two from comment 9:

  0x800c
  0x8a00000018 
  0x10
  0x2c 
  0x20000018
  0x400018 
  0x200c
  0xd
  0xe 
  0x14 
  0x10c
  0x4c 
  0x11
  0x4000000c
  0x10c
  0x818
  0x8000c

All the ones ending in "18" are 64-bit.  All the others are 32-bit.  A crash for a bad mDOMDeclaration on 64-bit would be at "bad pointer + 0x18" and on 32-bit at "bad pointer + 0xC".  So the corresponding bad pointers would be:

  0x8000
  0x8a00000000
  0x04
  0x20 
  0x20000000
  0x400000 
  0x2000
  0x1
  0x2
  0x08
  0x100
  0x40 
  0x05
  0x40000000
  0x100
  0x800
  0x80000 

and all of those except 0x8a00000000 and 0x05 are a single bit from a 0 pointer.  The report from comment 0 was crashing at 0x20c which is 0x200 + 0xC.

Anyway, those two non-one-bit-from-0 reports are https://crash-stats.mozilla.com/report/index/1ea017ca-7429-49ce-ba52-c48442170404 and https://crash-stats.mozilla.com/report/index/270040d2-913e-44b9-bd51-195972170326 in case we think we can get something useful out of them.  The rest really do look like someone bitflipped mDOMDeclaration...
Flags: needinfo?(bzbarsky)
So one question is whether this is low-volume enough that we can in fact chalk this up to memory bitflips...
(In reply to Boris Zbarsky [:bz] (still a bit busy) (if a patch has no decent message, automatic r-) from comment #13)
> Were you thinking of something else in terms of aggregating in general?

I was thinking of COM identity, i.e., both objects produce the same set of results from QueryInterface.
(In reply to Boris Zbarsky [:bz] (still a bit busy) (if a patch has no decent message, automatic r-) from comment #14)
> So one question is whether this is low-volume enough that we can in fact
> chalk this up to memory bitflips...

Given that the pointer is generally null, I think that's plausible.
The volume of crashes in Beta54 is low. Mark 54 as fix-optional but still happy to take fix in 54.
Flags: needinfo?(aschen)
Priority: -- → P5
You need to log in before you can comment on or make changes to this bug.