Closed Bug 837370 Opened 7 years ago Closed Last year

crash in TypeConstraintFreeze::newType @ js::types::TypeCompartment::addPendingRecompile

Categories

(Core :: JavaScript Engine, defect, critical)

21 Branch
defect
Not set
critical

Tracking

()

RESOLVED WONTFIX
mozilla22
Tracking Status
firefox20 --- unaffected
firefox21 + fixed
firefox22 --- unaffected
firefox23 --- unaffected

People

(Reporter: scoobidiver, Unassigned)

Details

(Keywords: crash, regression, steps-wanted, Whiteboard: [native-crash][leave open])

Crash Data

Attachments

(3 files)

It started spiking in 21.0a1/20130201. The regression range for the spike is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=20bbf73921f4&tochange=50cf5bbcb180

Signature 	js::types::TypeCompartment::addPendingRecompile(JSContext*, js::types::RecompileInfo const&) More Reports Search
UUID	bb1d4014-843d-4995-9662-2a06b2130202
Date Processed	2013-02-02 07:51:24
Uptime	16673
Last Crash	6.3 weeks before submission
Install Age	16.3 hours since version was first installed.
Install Time	2013-02-01 15:34:29
Product	Firefox
Version	21.0a1
Build ID	20130201030927
Release Channel	nightly
OS	Windows NT
OS Version	6.1.7601 Service Pack 1
Build Architecture	x86
Build Architecture Info	GenuineIntel family 6 model 42 stepping 7
Crash Reason	EXCEPTION_ACCESS_VIOLATION_READ
Crash Address	0x4
App Notes 	
AdapterVendorID: 0x1002, AdapterDeviceID: 0x6760, AdapterSubsysID: 3673103c, AdapterDriverVersion: 8.882.2.0
Has dual GPUs. GPU #2: AdapterVendorID2: 0x8086, AdapterDeviceID2: 0x0116, AdapterSubsysID2: 3673103c, AdapterDriverVersion2: 8.882.2.0D2D? D2D+ DWrite? DWrite+ D3D10 Layers? D3D10 Layers+ 
Processor Notes 	sp-processor09.phx1.mozilla.com_15950:2008
EMCheckCompatibility	True
Adapter Vendor ID	0x1002
Adapter Device ID	0x6760
Total Virtual Memory	4294836224
Available Virtual Memory	3489337344
System Memory Use Percentage	35
Available Page File	10577408000
Available Physical Memory	4386545664

Frame 	Module 	Signature 	Source
0 	mozjs.dll 	js::types::TypeCompartment::addPendingRecompile 	js/src/jsinfer.cpp:2794
1 	mozjs.dll 	TypeConstraintFreeze::newType 	js/src/jsinfer.cpp:1677
2 	mozjs.dll 	js::types::TypeCompartment::resolvePending 	js/src/jsinferinlines.h:1150
3 	mozjs.dll 	js::types::TypeSet::addType 	js/src/jsinferinlines.h:1469
4 	mozjs.dll 	js::types::TypeCompartment::markSetsUnknown 	js/src/jsinfer.cpp:2953
5 	mozjs.dll 	js::SetClassAndProto 	js/src/jsobj.cpp:2878
6 	mozjs.dll 	JSObject::ReserveForTradeGuts 	js/src/jsobj.cpp:1787
7 	mozjs.dll 	JSObject::swap 	js/src/jsobj.cpp:2019
8 	mozjs.dll 	JS_TransplantObject 	js/src/jsapi.cpp:1602
9 	xul.dll 	xpc::TransplantObject 	js/xpconnect/wrappers/WrapperFactory.cpp:639
10 	xul.dll 	mozilla::dom::ReparentWrapper 	dom/bindings/BindingUtils.cpp:1417
11 	xul.dll 	nsNodeUtils::CloneAndAdopt 	content/base/src/nsNodeUtils.cpp:532
12 	xul.dll 	nsNodeUtils::Adopt 	content/base/src/nsNodeUtils.h:185
13 	xul.dll 	nsIDocument::AdoptNode 	content/base/src/nsDocument.cpp:6368
14 	xul.dll 	nsDocument::AdoptNode 	content/base/src/nsDocument.cpp:6250
15 	xul.dll 	AdoptNodeIntoOwnerDoc 	content/base/src/nsINode.cpp:1278
16 	xul.dll 	nsINode::ReplaceOrInsertBefore 	content/base/src/nsINode.cpp:1847
17 	xul.dll 	mozilla::dom::NodeBinding::insertBefore 	obj-firefox/dom/bindings/NodeBinding.cpp:524
18 	xul.dll 	mozilla::dom::NodeBinding::genericMethod 	obj-firefox/dom/bindings/NodeBinding.cpp:1424
19 	mozjs.dll 	js::InvokeKernel 	js/src/jsinterp.cpp:390
20 	mozjs.dll 	js::Invoke 	js/src/jsinterp.cpp:437
21 	mozjs.dll 	js::CrossCompartmentWrapper::call 	js/src/jswrapper.cpp:631
22 	mozjs.dll 	proxy_Call 	js/src/jsproxy.cpp:3015
23 	mozjs.dll 	js::InvokeKernel 	js/src/jsinterp.cpp:383
24 	mozjs.dll 	js::Interpret 	js/src/jsinterp.cpp:2376
25 	mozjs.dll 	js::RunScript 	js/src/jsinterp.cpp:339
26 	mozjs.dll 	js::InvokeKernel 	js/src/jsinterp.cpp:404
27 	mozjs.dll 	js::Invoke 	js/src/jsinterp.h:131
28 	mozjs.dll 	js_fun_apply 	js/src/jsfun.cpp:965
29 	mozjs.dll 	js::InvokeKernel 	js/src/jsinterp.cpp:390
30 	mozjs.dll 	js::Interpret 	js/src/jsinterp.cpp:2376
31 	mozjs.dll 	js::RunScript 	js/src/jsinterp.cpp:339
32 	mozjs.dll 	js::InvokeKernel 	js/src/jsinterp.cpp:404
33 	mozjs.dll 	js::Invoke 	js/src/jsinterp.h:131
34 	mozjs.dll 	mozjs.dll@0xb70d0 	

More reports at:
https://crash-stats.mozilla.com/report/list?signature=js%3A%3Atypes%3A%3ATypeCompartment%3A%3AaddPendingRecompile%28JSContext*%2C+js%3A%3Atypes%3A%3ARecompileInfo+const%26%29
It's #5 top crasher in today's build.
Keywords: topcrash
Tracking since it's a top crasher and assigning to David to go through the regression range to try and find a culprit for this sudden explosion in crash volume.
Assignee: general → dvander
Nothing in the regression window looks suspect, but it's hard to say since this is fairly core code. Soccorro unfortunately isn't great for answering questions like, "what addresses does this crash at the most", but eyeballing the reports available it seems like the common crash is at 0x4, so a NULL deref in RecompileInfo::compileOuput().

Nicolas, do you know what that means or how the compile output information could be NULL?
Flags: needinfo?(nicolas.b.pierron)
Just by looking at the code, it seems that cx->compartment->types.constrainedOutputs is NULL, and when we use the operator [] on it we access the pointer to the beginning which is at te beginning of the Vector (behind the vtable).

The only case where this might happen is in the sweep case where we reset constrainedOutputs.  Maybe the validity is not yet right as we might be waiting for a parallel compilation.  This is still some blind-guess of what is going on.
Flags: needinfo?(nicolas.b.pierron)
(In reply to Nicolas B. Pierron [:pierron] [:nbp] from comment #4)
> Just by looking at the code, it seems that
> cx->compartment->types.constrainedOutputs is NULL, and when we use the
> operator [] on it we access the pointer to the beginning which is at te
> beginning of the Vector (behind the vtable).

In other words.  When we call the operator[] on cx->compartment->types.constrainedOutputs, we have an offset of 4/8 on x86/x64, because it reads the pointer which is stored just after the TempAllocPolicy class (not the vtable) from which it inherits.

Currently, just by looking at offsets of the inlined functions, this is the only location which match the displacement.  So, I am quite confident about the symptoms and we should look on how the list of constraint output can be reset without getting rid of all compiled code.

PS: That's awful how sleeping can disable you ability to read your own writing …
It seems to be related to Google+ or Gmail according to comments.
OS: Windows 7 → All
Whiteboard: [native-crash]
This is now #2 on Aurora 21.
After talking with Brian and Nicolas, we think this may be caused by a cross-compartment mismatch. I don't see anything particularly incriminating in the regression range. Bill, do you?

We're going to try sanity checking against NULL and see what happens.
https://hg.mozilla.org/try/rev/236dd1d066bd

This patch does some NULL checking and cross-compartment release-asserts. If it looks okay on try we should throw it at nightly for a day or two to see what happens
Status: NEW → ASSIGNED
Bug 836312 is in the range. We could try backing it out to see if there's any effect. Backing it out shouldn't have much effect on performance.

However, there's nothing about that patch that seems likely to trigger this problem.
Crash Signature: [@ js::types::TypeCompartment::addPendingRecompile(JSContext*, js::types::RecompileInfo const&)] → [@ js::types::TypeCompartment::addPendingRecompile(JSContext*, js::types::RecompileInfo const&)] [@ js::types::CompilerOutput::isValid() ]
I've been trying, unsuccessfully, to reproduce this crash in Firefox Aurora 21.0a2 on Windows 7. I've been using GMail and Google+ as the comments indicate but no crashes experienced in the last couple of hours. Are they any more specific leads I can follow?
(In reply to Bill McCloskey (:billm) from comment #11)
> Bug 836312 is in the range. We could try backing it out to see if there's
> any effect. Backing it out shouldn't have much effect on performance.
> 
> However, there's nothing about that patch that seems likely to trigger this
> problem.

Considering this is a top-crash & the spike coincides with this landing I think we should go ahead & back out Bug 836312 on Aurora(only) and continue our investigation on central based on :dvander's approach in comment# 10 to see if either helps .

We should be able to easily identify if the backout really helped based on crash-stats.Bill can you or :jonco please help with the backout here?
(In reply to bhavana bajaj [:bajaj] from comment #13)
> Considering this is a top-crash & the spike coincides with this landing I
> think we should go ahead & back out Bug 836312 on Aurora(only) and continue
> our investigation on central based on :dvander's approach in comment# 10 to
> see if either helps .
> 
> We should be able to easily identify if the backout really helped based on
> crash-stats.Bill can you or :jonco please help with the backout here?

Anything QA can do to assist in this effort?
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #14)
> (In reply to bhavana bajaj [:bajaj] from comment #13)
> > Considering this is a top-crash & the spike coincides with this landing I
> > think we should go ahead & back out Bug 836312 on Aurora(only) and continue
> > our investigation on central based on :dvander's approach in comment# 10 to
> > see if either helps .
> > 
> > We should be able to easily identify if the backout really helped based on
> > crash-stats.Bill can you or :jonco please help with the backout here?
> 
> Anything QA can do to assist in this effort?
Keywords: qawanted
(In reply to Bill McCloskey (:billm) from comment #15)
> Sorry, I forgot about this. Backed out on Aurora:
There are still crashes in 21.0a2/20130308.
True, it's still around in current Aurora builds, but crash frequency seems to have decreased.
Attached patch diagnostic patchSplinter Review
Brian, this is the diagnostic patch we talked about. It tries to stop the NULL derefs and also tell us whether cross-compartment violations are happening.
Attachment #723533 - Flags: review?(bhackett1024)
Comment on attachment 723533 [details] [diff] [review]
diagnostic patch

Review of attachment 723533 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsinfer.cpp
@@ +2312,5 @@
> +        if (!co) {
> +            if (script->compartment() != cx->compartment)
> +                MOZ_CRASH();
> +            return;
> +        }

Unfortunately I think there are a bunch more places where cross compartment pointers could cause TI data to be malformed.  To be comprehensive these checks should be at API boundaries.
Attachment #723533 - Flags: review?(bhackett1024) → review+
There's a second improvement between 21.0a2/20130314 and 20130316. The working range is:
http://hg.mozilla.org/releases/mozilla-aurora/pushloghtml?fromchange=a64ef5e8030c&tochange=8f067a1714bf
(In reply to Scoobidiver from comment #23)
> There's a second improvement between 21.0a2/20130314 and 20130316.
It's a red herring because Aurora L10N builds are no longer updated (bug 851720).
(In reply to Brian Hackett (:bhackett) from comment #20)
> Unfortunately I think there are a bunch more places where cross compartment
> pointers could cause TI data to be malformed.  To be comprehensive these
> checks should be at API boundaries.

Yeah, understandable. I was told we can't do that because it crashes too much in practice, or there was some perf cost or something.

https://hg.mozilla.org/integration/mozilla-inbound/rev/f2706052a3bf
https://hg.mozilla.org/mozilla-central/rev/f2706052a3bf
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla22
I think that patch is only diagnostic, so this bug should be kept open, right?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [native-crash] → [native-crash][leave open]
dvander, do the diagnostics help here?
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #28)
> dvander, do the diagnostics help here?

My guess would be that this helps identifying the kind of issues that we have, just by looking at the graph that is reported here:

https://crash-stats.mozilla.com/report/list?signature=js%3A%3Atypes%3A%3ATypeCompartment%3A%3AaddPendingRecompile%28JSContext*%2C+js%3A%3Atypes%3A%3ARecompileInfo+const%26%29
Adding needinfo on :dvander to help with comment# 28/29.
Flags: needinfo?(dvander)
My crash-stats fu isn't good enough to see whether the diagnostics helped. Am I supposed to be able to see my MOZ_CRASH's somewhere?

But these crashes have completely gone away on nightly, so the non-diagnostic part of the patch did something. I'll tear out the diagnostics.
Flags: needinfo?(dvander)
I'm confused, it doesn't look like they've gone away on nightly. The link in comment 29 still shows crashes such as https://crash-stats.mozilla.com/report/index/dda76994-a9cc-44ed-abfd-4f2df2130402

But in this case at least the crash is right at the beginning of the function, at http://hg.mozilla.org/releases/mozilla-aurora/annotate/28ff0e110387/js/src/jsinfer.cpp#l2800 so either `info` or `cx` is null (debugging the minidump directly would show you in the inlined function what was actually being dereferenced).
When I look at the top crashers for 22, addPendingRecompile is at 0.06%
(In reply to David Anderson [:dvander] from comment #33)
> When I look at the top crashers for 22, addPendingRecompile is at 0.06%
For both crash signatures, crashes stopped in 22.0a1/20130320. The working range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=e23e43a2c14e&tochange=8156df33b757
It seems either the diagnostic patch has fixed these crashes or another patch in the working range did it.
How can mere diagnostics fix the crash? Could the signature have shifted, i.e. are we crashing from a different signature instead (bsmedberg said that shouldn't be the case)? Or is the diagnostics stuff changing timing and influencing what's going on?
The patch was a combination of papering over the crash, as well as some diagnostics to try and determine why the crash was happening. Apparently, the diagnostics were inconclusive. I'll make a follow-up patch to remove them shortly.
Ah, makes sense. The papering-over part should get uplifted to 21 as well in that case.
(In reply to David Anderson [:dvander] from comment #36)
> The patch was a combination of papering over the crash
Can you write it without the diagnostic part and uplift it to Beta where it's currently #4 top browser crasher?
Adding needsinfo on :dvander to help with comment #37,38.

:dvander, we have already gone to build with 21.0b2 targeting release today. 21.0b3 will be going to build Tuesday, Apr 16 early morning PT. Can you please help get a patch ready asap so that it can get into our next beta ?
Flags: needinfo?(dvander)
Attachment #736018 - Flags: review?(bhackett1024)
Flags: needinfo?(dvander)
Attachment #736018 - Flags: review?(bhackett1024) → review+
Attachment #736023 - Flags: review?(bhackett1024) → review+
:dvander, can you please request the beta patch for uplift and land by Monday EOD/Tue Morning before we go-to-build with our Fx21 beta 4(targeted Tuesday Morning)?Thanks !
Setting need-info for comment #42
Flags: needinfo?(dvander)
Comment on attachment 736023 [details] [diff] [review]
patch for beta

[Approval Request Comment]
Bug caused by (feature/regressing bug #): unknown
User impact if declined: specific top crash signature
Testing completed (on m-c, etc.): yes
Risk to taking this patch (and alternatives if risky): none
String or IDL/UUID changes made by this patch:
Attachment #736023 - Flags: approval-mozilla-beta?
Flags: needinfo?(dvander)
Comment on attachment 736023 [details] [diff] [review]
patch for beta

low risk patch, lnown to resolve a top-crasher by nightly testing.

Approving uplift on beta. Please make sure to land by Tuesday(tomorrow) Morning to get this into our next beta going to build tomorrow.Thanks!
Attachment #736023 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Do you plan to land the patch to remove the diagnostic part in Aurora and the trunk?
Flags: needinfo?(dvander)
Keywords: topcrash
Crash Signature: [@ js::types::TypeCompartment::addPendingRecompile(JSContext*, js::types::RecompileInfo const&)] [@ js::types::CompilerOutput::isValid() ] → [@ js::types::TypeCompartment::addPendingRecompile(JSContext*, js::types::RecompileInfo const&)] [@ js::types::CompilerOutput::isValid() ] [@ js::types::TypeCompartment::addPendingRecompile] [@ js::types::CompilerOutput::isValid ]
Assignee: dvander → nobody
Closing because no crash reported since 12 weeks.
Status: REOPENED → RESOLVED
Closed: 7 years agoLast year
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.