Closed Bug 1058131 Opened 6 years ago Closed 6 years ago

crash in mozilla::dom::Element::SetAttr(int, nsIAtom*, nsAString_internal const&, bool)

Categories

(Core :: DOM: Core & HTML, defect, critical)

34 Branch
x86
Windows NT
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla35
Tracking Status
firefox32 + verified
firefox33 --- verified
firefox34 --- verified
firefox35 --- verified

People

(Reporter: lizzard, Assigned: dmajor)

Details

(Keywords: crash, topcrash, topcrash-win)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-f925fb18-8071-478d-8a5e-9ded62140822.
=============================================================
This is the #10 topcrasher for Firefox 32.0b9 (2014082202) with 679 crashes in the last 7 days. 

From commments, it appears to happen during the update from Beta 8 to Beta 9 or just afterwards on startup and leads to a continuous cycle of startup crashes. 


0 	xul.dll 	mozilla::dom::Element::SetAttr(int, nsIAtom*, nsAString_internal const&, bool) 	obj-firefox/dist/include/mozilla/dom/Element.h
1 	xul.dll 	nsDOMStringMap::NamedSetter(nsAString_internal const&, nsAString_internal const&, mozilla::ErrorResult&) 	content/html/content/src/nsDOMStringMap.cpp
2 	xul.dll 	mozilla::dom::DOMStringMapBinding::DOMProxyHandler::setCustom(JSContext*, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>, bool*) 	obj-firefox/dom/bindings/DOMStringMapBinding.cpp
3 	xul.dll 	mozilla::dom::DOMProxyHandler::set(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS::Handle<jsid>, bool, JS::MutableHandle<JS::Value>) 	dom/bindings/DOMJSProxyHandler.cpp
4 	mozjs.dll 	js::Proxy::set(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS::Handle<jsid>, bool, JS::MutableHandle<JS::Value>) 	js/src/jsproxy.cpp
5 	mozjs.dll 	js::proxy_SetGeneric(JSContext*, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>, bool) 	js/src/jsproxy.cpp
6 	mozjs.dll 	JSObject::nonNativeSetProperty(JSContext*, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>, bool) 	js/src/jsobj.cpp
7 	mozjs.dll 	JSObject::setGeneric(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>, bool) 	js/src/jsobj.h
8 	mozjs.dll 	SetPropertyOperation 	js/src/vm/Interpreter.cpp
9 	mozjs.dll 	Interpret 	js/src/vm/Interpreter.cpp
10 	mozjs.dll 	js::RunScript(JSContext*, js::RunState&) 	js/src/vm/Interpreter.cpp
11 	mozjs.dll 	js::ExecuteKernel(JSContext*, JS::Handle<JSScript*>, JSObject&, JS::Value const&, js::ExecuteType, js::AbstractFramePtr, JS::Value*) 	js/src/vm/Interpreter.cpp
12 	mozjs.dll 	js::Execute(JSContext*, JS::Handle<JSScript*>, JSObject&, JS::Value*) 	js/src/vm/Interpreter.cpp
13 	mozjs.dll 	Evaluate 	js/src/jsapi.cpp
14 	mozjs.dll 	JS::Evaluate(JSContext*, JS::Handle<JSObject*>, JS::ReadOnlyCompileOptions const&, JS::SourceBufferHolder&) 	js/src/jsapi.cpp
15 	xul.dll 	nsJSUtils::EvaluateString(JSContext*, JS::SourceBufferHolder&, JS::Handle<JSObject*>, JS::CompileOptions&, nsJSUtils::EvaluateOptions const&, JS::MutableHandle<JS::Value>, void**) 	dom/base/nsJSUtils.cpp
16 	xul.dll 	nsJSUtils::EvaluateString(JSContext*, JS::SourceBufferHolder&, JS::Handle<JSObject*>, JS::CompileOptions&, void**) 	dom/base/nsJSUtils.cpp
17 	xul.dll 	nsScriptLoader::EvaluateScript(nsScriptLoadRequest*, JS::SourceBufferHolder&, void**) 	content/base/src/nsScriptLoader.cpp
18 	xul.dll 	nsScriptLoader::ProcessRequest(nsScriptLoadRequest*, void**) 	content/base/src/nsScriptLoader.cpp
19 	xul.dll 	nsScriptRequestProcessor::Run() 	content/base/src/nsScriptLoader.cpp
20 	xul.dll 	nsContentUtils::RemoveScriptBlocker() 	content/base/src/nsContentUtils.cpp
21 	xul.dll 	nsDocument::EndUpdate(unsigned int) 	content/base/src/nsDocument.cpp
22 	xul.dll 	nsHTMLDocument::EndUpdate(unsigned int) 	content/html/document/src/nsHTMLDocument.cpp
23 	xul.dll 	nsINode::ReplaceOrInsertBefore(bool, nsINode*, nsINode*, mozilla::ErrorResult&) 	content/base/src/nsINode.cpp
24 	xul.dll 	mozilla::dom::NodeBinding::replaceChild 	obj-firefox/dom/bindings/NodeBinding.cpp
25 	xul.dll 	mozilla::dom::GenericBindingMethod(JSContext*, unsigned int, JS::Value*) 	dom/bindings/BindingUtils.cpp
26 	mozjs.dll 	js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) 	js/src/vm/Interpreter.cpp
27 	mozjs.dll 	js::EmptyShape::getInitialShape(js::ExclusiveContext*, js::Class const*, js::TaggedProto, JSObject*, JSObject*, unsigned int, unsigned int) 	js/src/vm/Shape.cpp
That's a near-null deref at 0xd4.

At a guess, that means the vtable pointer of "this" in SetAttr is null, which is ... very not good.  The other option is that the nsDOMStringMap's mElement is null, but then I'd expect to crash reading the vtable pointer, which should be at a much smaller offset in mElement.
[Tracking Requested - why for this release]: Currently a top crasher on Beta 9 (not all of Beta), so let's keep an eye on it.
Those are the top correlations for this signature:

100% (251/251) vs.   1% (334/40118) Aavm4h.dll
100% (251/251) vs.   1% (335/40118) aswCommChannel.dll
100% (251/251) vs.   1% (347/40118) ashBase.dll
100% (251/251) vs.   1% (350/40118) avastIP.dll
100% (251/251) vs.   1% (355/40118) ashTask.dll
100% (251/251) vs.   1% (355/40118) aswCmnOS.dll
100% (251/251) vs.   1% (355/40118) aswAux.dll
100% (251/251) vs.   1% (355/40118) aswProperty.dll
100% (251/251) vs.   1% (355/40118) aswEngLdr.dll
100% (251/251) vs.   1% (355/40118) aswCmnBS.dll
100% (251/251) vs.   1% (355/40118) aswCmnIS.dll
100% (251/251) vs.   1% (357/40118) AavmRpch.dll
101% (253/251) vs.   3% (1013/40118) msvcp110.dll
101% (253/251) vs.   3% (1069/40118) msvcr110.dll
100% (251/251) vs.  13% (5241/40118) winhttp.dll
100% (251/251) vs.  15% (5922/40118) aswJsFlt.dll
100% (251/251) vs.  16% (6366/40118) snxhk.dll
100% (251/251) vs.  45% (17999/40118) rasman.dll
100% (251/251) vs.  45% (17999/40118) rasapi32.dll
53% (132/251) vs.   5% (2041/40118) webio.dll
91% (228/251) vs.  55% (22200/40118) FWPUCLNT.DLL
55% (139/251) vs.  21% (8609/40118) d3d11.dll

Sounds a lot to me like it's Avast that makes us run into issues here.
At first look, it's a null vtable pointer, but...

0:000> u .-14
xul!mozilla::dom::Element::SetAttr [c:\builds\moz2_slave\rel-m-beta-w32_bld-00000000000\build\obj-firefox\dist\include\mozilla\dom\element.h @ 432]:
572c85a4 e977dc5918      jmp     aswJsFlt+0x6220 (6f866220)

Lovely.

That's not a dllexport symbol, right? How else could Avast get a hold of that function, is it reachable through vtables that JS hands out? Or are they mining the symbol file?

If they are using the PDB, a potential hazard is that there are two overloads of this name:

0:000> x xul!mozilla::dom::Element::SetAttr
573cea70          xul!mozilla::dom::Element::SetAttr (int, class nsIAtom *, class nsIAtom *, class nsAString_internal *, bool)
572c85a4          xul!mozilla::dom::Element::SetAttr (int, class nsIAtom *, class nsAString_internal *, bool)

Or maybe it's unrelated and their hook is simply doing something bad. Do we have a channel to any Avast engineers?
By the way, this is a recent regression:

    100% (251/251) vs.   1% (304/40118) 10.0.2022.566

    Timestamp:        Fri Aug 15 22:07:10 2014 (53EDDBCE)
    File version:     10.0.2022.566
QA Contact: andrei.vaida
We already have a potential RC2 driver. I would like to understand the source of this bug and if we can fix the top crasher as well. We're going to need a fix today/tomorrow though.

dmajor/kairo - Can you continue to help with the diagnosis? Does this crash impact any other channels?
jst/overholt - Can you find an owner to work on this today?
Flags: needinfo?(overholt)
Flags: needinfo?(kairo)
Flags: needinfo?(jst)
Flags: needinfo?(dmajor)
bz and/or peterv, can you guys look into this one? Note the urgency per the above comment.
Assignee: nobody → bzbarsky
Flags: needinfo?(peterv)
Flags: needinfo?(overholt)
Flags: needinfo?(jst)
Flags: needinfo?(bzbarsky)
There were 4 crashes for 33.0a2, all on a single install and all startup crashes.
So Kyle explained to me that comment 4 is saying that Avast overwrote part of our function with their code...  If so, that sure sounds like an Avast bug.  David, am I understanding that part correctly?

SetAttr is certainly virtual, and I would expect they could get their hands on the vtable it's on.
Flags: needinfo?(bzbarsky)
Sent a message to my contact at Avast.
(In reply to Boris Zbarsky [:bz] from comment #9)
> So Kyle explained to me that comment 4 is saying that Avast overwrote part
> of our function with their code...  If so, that sure sounds like an Avast
> bug.  David, am I understanding that part correctly?

I spoke with dmajor on irc. He confirmed that this is Avast and said he was looking for mitigation/hack on our side. Given that this is Avast, I don't think we need to block 32 on this issue.

Jorge - I have some Avast contacts as well and can follow up if you're not making process. If we're unable to get this issue fixed before release day, will a block of Avast mitigate this issue?
Flags: needinfo?(peterv)
Flags: needinfo?(kairo)
Flags: needinfo?(jorge)
(In reply to Boris Zbarsky [:bz] from comment #9)
> So Kyle explained to me that comment 4 is saying that Avast overwrote part
> of our function with their code...  If so, that sure sounds like an Avast
> bug.  David, am I understanding that part correctly?

Yes, Avast installed a hook on our SetAttr function. When we come back from the hook, |this| is messed up.
Flags: needinfo?(dmajor)
Somewhat good news, according to this forum, the problematic build 10.0.2022.566 is "AVAST 2015 Beta1" https://forum.avast.com/index.php?topic=153511.0

The main download from avast.com stays at version 9 and doesn't offer to update me to 10. I'll try the beta anyway though.
(In reply to Lawrence Mandel [:lmandel] from comment #11)
> Jorge - I have some Avast contacts as well and can follow up if you're not
> making process. If we're unable to get this issue fixed before release day,
> will a block of Avast mitigate this issue?

One of the forum posters on comment #14 mentioned that the problem happens even in Safe Mode, which suggests that this is the Avast software causing the crash, rather than an add-on. If that's true, then a block won't help.
Flags: needinfo?(jorge)
Well, an add-on block wouldn't help. A DLL block might, but that requires a patch.
Confirmed crash on FF startup with FF 32b9 and Avast Beta 10.0.2022.566. I have a VM snapshot from shortly before the crash.

Here's what's happening. Avast is installing the same hook onto (at least) two of our functions, Element::SetAttr and nsGenericHTMLElement::SetAttr. Avast sets up the same jump target for both:

0:000> u 5d4185a4 
xul!mozilla::dom::Element::SetAttr [c:\builds\moz2_slave\rel-m-beta-w32_bld-00000000000\build\obj-firefox\dist\include\mozilla\dom\element.h @ 432]:
5d4185a4 e977dc3916      jmp     aswJsFlt+0x6220 (737b6220)

0:000> u 5de2cf6b
xul!nsGenericHTMLElement::SetAttr [c:\builds\moz2_slave\rel-m-beta-w32_bld-00000000000\build\content\html\content\src\nsgenerichtmlelement.h @ 542]:
5de2cf6b e9b0929815      jmp     aswJsFlt+0x6220 (737b6220)

After the hook, execution jumps to a trampoline that makes up for the bytes 'stolen' from Element::SetAttr, and then continues Element::SetAttr where we left off.

In FF 32b8 this worked, because the two SetAttr's were byte-for-byte identical. In FF 32b9, the functions are different. So we enter nsGenericHTMLElement::SetAttr, then get detoured to Avast and finally come back to Element::SetAttr and start executing unexpected instructions. Oops.

Avast should stop using the same hook for those two functions. In the meantime, as a mitigation, maybe we could get those two functions identical again?

Why did the functions diverge, anyway? Is there a functional difference, or compiler whim (in which case we might get lucky with the real 32 build)? I need to double check, but at first glance I think what happened is the compiler optimized the "push 0" for kNameSpaceID_None by pushing it down into the callee here: http://hg.mozilla.org/mozilla-central/annotate/e30f862fe19e/content/html/content/src/nsDOMStringMap.cpp#l120
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #18)
> Can we blocklist the DLL?

Not in the usual way. They have code to bypass our blocklist. I don't think we can develop something to counter it before the next merge.
I think the Avast issue was exposed by an unlucky FF compilation. Neither 32 beta 8 nor the latest Aurora expose the bug, for example.

Do we still make "beta 99" RC builds? If so, I can look at the disassembly and tell you whether it will be affected (as well as actually test it out). We might simply get lucky.

If we still have a crashy build, we might be able force the affected functions to be identical using optimization pragmas. Or just live with the issue since it's only on the Avast beta.
Flags: needinfo?(lmandel)
Ok, I found the build1 candidate, and it's fine (both from code inspection and an actual test). I heard something about a potential build2 though, so ping me to re-test if/when that's available.
(I think we should figure out easy way to blocklist .dlls which inject themselves into FF process in this kind of error prone way.)
Olli, see comment 19.  This particular injection actually actively works to bypass our blocklist...
And I meant that we should actively try to improve our blocklisting code and actively block
this kind of hostile addons or dlls (or what we should call Avast in this case)
(In reply to David Major [:dmajor] from comment #21)
> Ok, I found the build1 candidate, and it's fine (both from code inspection
> and an actual test). I heard something about a potential build2 though, so
> ping me to re-test if/when that's available.

No build2 as of now. I'll ping you if this changes.

I take it this means that we don't need a fix from Avast before 32 is released. We should still follow up with them to correct their behaviour so that this doesn't reoccur.

Can we resolve this bug as works for me?
Flags: needinfo?(lmandel)
> They have code to bypass our blocklist.

David, is that specific to the beta, or in general?
Flags: needinfo?(dmajor)
That is, specific to the Avast 10 beta, or also shipping in Avast 9?
Version 9 as well.

I am speculating, but since we're not currently blocklisting Avast, I don't think their countermeasure was intended for us. It could just as well be meant to stop malware that tries to block AVs using the same technique.

The DLL blocklist is excellent way to deal with unmaintained, crashy applications. But it's not a good defense against a determined adversary. We have a public codebase, and Avast has root. At best it would be a back-and-forth with each side countering one another in every new release.

We may have better luck with a non-technical solution. Let's talk to Avast and make them aware of the issue. Presumably that's what betas are meant for.
Flags: needinfo?(dmajor)
(In reply to David Major [:dmajor] from comment #28)
> We may have better luck with a non-technical solution. Let's talk to Avast
> and make them aware of the issue. Presumably that's what betas are meant for.

Agreed. Jorge is already working on this and I told him that I can help if he is unsuccessful in making contact.
The crash has returned on the 32.0.1 build1 candidate. This time it's much less consistent, maybe one out of every five to ten launches. It's also a different crash. This time the trampoline for nsGenericHtmlElement::SetAttr isn't mixed up with another SetAttr; it's just plain garbage.

We could try inlining nsGenericHtmlElement::SetAttr so that it can't be hooked. That's an easy, low-risk change that ought to get the browser at least started up. On the other hand, I don't know what other crash we might hit later on after some use -- Avast obviously has some badness going on.
Attached patch Wallpaper patchSplinter Review
Attachment #8487841 - Flags: review?(bzbarsky)
Comment on attachment 8487841 [details] [diff] [review]
Wallpaper patch

r=me assuming it fixes the issue.
Attachment #8487841 - Flags: review?(bzbarsky) → review+
Assignee: bzbarsky → dmajor
(In reply to Boris Zbarsky [:bz] from comment #32)
> r=me assuming it fixes the issue.

I'd like to have things ready for dmajor when he comes online. How should we test whether this fixes the issue? Should we get this into a Nightly build? Should we land this directly on release so that dmajor can verify on 32.0.1 build#2?
Flags: needinfo?(bzbarsky)
I think the only way to do it is the latter; testing in nightly is pointless because the issue seems to depend on the exact binary code generated.
Flags: needinfo?(bzbarsky)
Comment on attachment 8487841 [details] [diff] [review]
Wallpaper patch

You're right. I agree. I'm approving for release. bz, all of the sheriffs are afk. Can you please help land this on mozilla-release so that we can get the build going?
Attachment #8487841 - Flags: approval-mozilla-release+
Flags: needinfo?(bzbarsky)
Alright.  Landed https://hg.mozilla.org/releases/mozilla-release/rev/01a075f70172

Now that this will not affect 32.0.1 build#1, since it landed on default, not the branch for that build, afaict.
Flags: needinfo?(bzbarsky)
32.0.1 build2 looks good.

* Confirmed there is no symbol for nsGenericHTMLElement::SetAttr
* Watched the hook setup - that hook attaches only to mozilla::dom::Element::SetAttr
* Tested several cycles of: startup, load tabs, load Avast sidebar, change settings, navigate
Mozilla/5.0 (Windows NT 6.1; rv:32.0) Gecko/20100101 Firefox/32.0


Crashed with this signature on Windows 7 32-bit using Firefox 32 beta 9 and Avast 10.0.2022.124.

Unable to reproduce it using Firefox 32.0.1 build 2 (build ID: 20140911151253) with the same Avast version.
Let's put this on the other branches as well.
Keywords: checkin-needed
We should probably land this on Aurora and beta too, right?
Yes - but since it's no longer super urgent I figured I'd let it ride the normal uplift process.
Comment on attachment 8487841 [details] [diff] [review]
Wallpaper patch

Approval Request Comment
[Feature/regressing bug #]: Avast 2015 Beta1
[User impact if declined]: Startup crashes
[Describe test coverage new/current, TBPL]: It's in 32.0.1
[Risks and why]: Very low risk
[String/UUID change made/needed]: None
Attachment #8487841 - Flags: approval-mozilla-beta?
Attachment #8487841 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/9c113bb19f42
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla35
Attachment #8487841 - Flags: approval-mozilla-beta?
Attachment #8487841 - Flags: approval-mozilla-beta+
Attachment #8487841 - Flags: approval-mozilla-aurora?
Attachment #8487841 - Flags: approval-mozilla-aurora+
Confirming the fix for this issue on Windows 7 32-bit using:
1. Latest Nightly - build ID: 20140916133140
2. Latest Aurora - build ID: 20140916004001
3. Firefox 33 beta 4 - build ID: 20140915204924

User Agents:
1. Mozilla/5.0 (Windows NT 6.1; rv:35.0) Gecko/20100101 Firefox/35.0
2. Mozilla/5.0 (Windows NT 6.1; rv:34.0) Gecko/20100101 Firefox/34.0
3. Mozilla/5.0 (Windows NT 6.1; rv:33.0) Gecko/20100101 Firefox/33.0

Also marking verified on Firefox 32 based on comment 39.
Confirmed that 32.0.2 build1 does not crash.
QA Contact: andrei.vaida → cornel.ionce
You need to log in before you can comment on or make changes to this bug.