Closed Bug 750019 Opened 8 years ago Closed 7 years ago

crash in nsIDOMHTMLDocument_Write

Categories

(Core :: JavaScript Engine, defect, critical)

14 Branch
x86
Windows XP
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla18
Tracking Status
firefox14 - affected
firefox15 - affected
firefox16 - affected
firefox17 + verified

People

(Reporter: scoobidiver, Assigned: dvander)

References

Details

(Keywords: crash, regression, topcrash)

Crash Data

Attachments

(1 file)

It's #13 top crasher in the first days of 14.0a2.
It first appeared in 14.0a1/20120417. The regression range might be (discontinuous):
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=fd06332733e5&tochange=c61e7c3a232a

The first frames of the stack are:
Frame 	Module 	Signature 	Source
0 	xul.dll 	nsIDOMHTMLDocument_Write 	obj-firefox/js/xpconnect/src/dom_quickstubs.cpp:13527
1 	mozjs.dll 	js::types::TypeSet::addType 	js/src/jsinferinlines.h:1128
2 	mozjs.dll 	js::types::TypeMonitorResult 	js/src/jsinfer.cpp:4971

More reports at:
https://crash-stats.mozilla.com/report/list?signature=nsIDOMHTMLDocument_Write
Crash Signature: [@ nsIDOMHTMLDocument_Write] → [@ nsIDOMHTMLDocument_Write ]
The stacks look funny--I think the last signature might be bogus, but it would take looking at the disassembly around the crash point to know for sure. This has been around since at least 3/13 (on nightly?) but got much more common around 4/28 (on aurora? it's hard to read the graphs).
Whiteboard: [js:p1:fx15][js:investigate][js:dm]
The stack signature is bogus, addType does not call nsIDOMHTMLDocument_Write nor any other VM function.  addType can leave return addresses on the C stack so will show up from time to time during stack scanning, and if nsIDOMHTMLDocument_Write is called directly from jitcode (as can happen) then that old return address is likely to still be around.
Looks more like a DOM bug then.
Assignee: general → nobody
Component: JavaScript Engine → DOM
QA Contact: general → general
Whiteboard: [js:p1:fx15][js:investigate][js:dm]
Summary: crash in js::types::TypeSet::addType @ nsIDOMHTMLDocument_Write → crash in nsIDOMHTMLDocument_Write
Johnny - could somebody on your team take a look? Thanks!
Assignee: nobody → jst
Peter, can you investigate this?
Assignee: jst → peterv
It's #19 top browser crasher in 15.0a2.
14.0b6 is unaffected while the latest build of 14.0a2 is affected, so it's probably a build configuration issue.
(In reply to Johnny Stenback (:jst, jst@mozilla.com) from comment #5)
> Peter, can you investigate this?

Peter - any updates?
Nope, I've looked at minidumps and we're definitely just crashing in nsIDOMHTMLDocument_Write, but without a reproducable testcase there's not much to go on.
Only Nightly and Aurora builds are affected.
(In reply to Scoobidiver from comment #9)
> Only Nightly and Aurora builds are affected.

Thanks Scoobidiver. No need to track for FF14 in that case.
It affects 14.0b8 while 14.0b6 and b7 were unaffected. It makes me think to bug 700288 and co that show up randomly.

It's currently #11 top browser crasher in 14.0b8.
This has moved up to #9 in Beta 8 data, and if you factor out the hang/empty signatures it really ranks as #3.
(In reply to Peter Van der Beken [:peterv] from comment #8)
> Nope, I've looked at minidumps and we're definitely just crashing in
> nsIDOMHTMLDocument_Write, but without a reproducable testcase there's not
> much to go on.

Have you had the opportunity to look at changes in the range of http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=fd06332733e5&tochange=c61e7c3a232a as well?
Yes, I don't immediately see anything to implicate.
There are no crashes in 14.0b9 and b10.
(In reply to Scoobidiver from comment #15)
> There are no crashes in 14.0b9 and b10.

Thanks scoobidiver. Given that, untracking.
It's a signature that shows up depending on builds, likely related to PGO.
(In reply to Scoobidiver from comment #18)
> It's a signature that shows up depending on builds, likely related to PGO.

Still #10 in beta, might have stuck this time.

"I am playing farmville on Facebook and Mozilla has crashed 3 or 4 times in the last 2 days."
"loged onto pier 1 website and gave OK to enable images when it crashed. Page was www.pier1.com"
Keywords: qawanted
After an hour of testing on pier1 and facebook I've been unable to trigger this crash with Firefox 15.0b3 and Flash 11.3.300.268 on Windows XP SP3. Are there any other leads I can pursue? Can we do some user outreach? Are there any engineering insights?

Thanks
After some weekend dogfooding I'm still unable to reproduce this. Are there any other leads QA can follow?
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #21)
> After some weekend dogfooding I'm still unable to reproduce this. Are there
> any other leads QA can follow?

Looking at the comments there are also many mentions of aol mail - can you try fiddling around with AOL mail to try and reproduce?
Yes, Farmville, Cafeworld and other Facebook games are still the prevalent occurrence in comments, but AOL Mail pops up, YouTube is mentioned, and Flash in general.
I gave another go with Farmville, tried Cafeworld, and played around with AOL Mail and have still failed to reproduce with crash using the same environment as comment 20. At this point I think we need to do some user outreach if at all possible. Please re-add qawanted if there is some other leads QA can pursue.
Keywords: qawanted
The stack trace in comment 0 has stopped after 15.0a1/20120502.

From 15.0a1/20120510, the first frames of the stack trace looks like:
Frame 	Module 	Signature 	Source
0 	xul.dll 	nsIDOMHTMLDocument_Write 	obj-firefox/js/xpconnect/src/dom_quickstubs.cpp:13654
1 	mozjs.dll 	js::InvokeKernel 	js/src/jsinterp.cpp:313
2 	mozjs.dll 	js::Invoke 	js/src/jsinterp.cpp:361
3 	mozjs.dll 	js::CrossCompartmentWrapper::call 	js/src/jswrapper.cpp:651
4 	mozjs.dll 	proxy_Call 	js/src/jsproxy.cpp:1649
5 	mozjs.dll 	js::InvokeKernel 	js/src/jsinterp.cpp:306
6 	mozjs.dll 	js::Interpret 	js/src/jsinterp.cpp:2515
7 	mozjs.dll 	js::InvokeKernel 	js/src/jsinterp.cpp:329
8 	mozjs.dll 	js::Invoke 	js/src/jsinterp.cpp:361
9 	mozjs.dll 	JS_CallFunctionValue 	js/src/jsapi.cpp:5537

The regression range for this stack is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=dd29535bac5f&tochange=654ac86492e8
Assignee: peterv → general
Component: DOM → JavaScript Engine
At this point we're about to go to final beta build so bumping tracking-16 so we can keep tracking this in the hopes of a new avenue of insight.
So I tried building a build that corresponds to one of the crashing builds.

If the line number in crash-stats is not lying, the crash is at:

  *vp = JSVAL_VOID;

near the end of the method.  And it's a null-deref.  Which is Very Odd.
Given comment 18 and comment 27, maybe we should look around PGO. (And also comment 25. There is very little JS in the regression range. Would a JS_NEVER_INLINE on nsIDOMHTMLDocument_Write be respected by PGO? 

We could also look in a minidump and see if we can spot a faulty argument-pass or something like that. The methods look big, though, so it might hard to trace through.
We're going to try to get you a crash minidump, dmandelin.
I'll take a look at this.
Assignee: general → dvander
Status: NEW → ASSIGNED
This is starting to look like a PGO bug to me. comment #27 mentions that |vp| should not be NULL, and indeed it's not. |vp| is the third parameter, so it'll be offset by 16 from the EBP register:

> >   *(void **)(ebp + 0x10)
> >       0x03c10060

We first load this argument into EBX here:

>    00EEC538  mov         ebx,dword ptr [vp] 


The only two uses of EBX that follow are to call JS_ARGV() and xpc_qsDOMString's constructor. It seems safe to assume that EBX is valid at that point, since the loop will execute once (argc == 1). Otherwise those calls would probably have crashed.

The line of assembly that's crashing:

>    00EEC5DA  mov         dword ptr [ebx],ecx

EBX is suddenly 0 here. EBX is not written to anywhere else in the function, so some function call is clobbering EBX. The code in between the use of EBX and the crashing line is:

    --- snip ---
    if (!arg0.IsValid())
        return JS_FALSE;
    nsresult rv;

    nsAString &str = arg0;
    for (unsigned i = 1; i < argc; ++i) {
      xpc_qsDOMString next_arg(cx, argv[i], &argv[i],
                               xpc_qsDOMString::eStringify,
                               xpc_qsDOMString::eStringify);
      if (!next_arg.IsValid())
        return JS_FALSE;

      str.Append(next_arg);
    }

    rv = self->Write(arg0, cx);

    if (NS_FAILED(rv))
        return xpc_qsThrowMethodFailed(cx, rv, vp);
    *vp = JSVAL_VOID;

    --- snip ---
Under normal circumstances, the calling convention strictly requires that EBX be preserved by functions that wish to modify it. PGO however will happily slice and dice functions and make up seemingly arbitrary calling conventions. My guess is that there's a compiler bug where one of the callees isn't preserving EBX even though the caller requires it (that or, the caller isn't saving and restoring it).
Err, I misspoke. The loop won't run because argc == 1. So the likely offenders are calls outside the loop:

xpc_qsUnwrapThis<nsHTMLDocument> (1086983h)  
xpc_qsDOMString::xpc_qsDOMString (0F4D660h)

It's also possible that some guard in nsIDOMHTMLDocument_Write is failing, so we go to the deoptimized version of the function, and then we jump back in, and the corruption happened somewhere in between.
Boris, is it sane to try hardcoding a #pragma deoptimize("", off/on) around this particular function in the quickstub generator?
We can certainly try doing that, yes.
If we'd like to land a low risk speculative fix, please land on mozilla-central this week and prepare a patch for branches prior to our b4 go-to-build (Tuesday). Thanks!
Longstanding top crash, and close to FF16's release. We're going to have to resolve in FF17 first, if ready.
Let's get this moving. Here's a patch to disable opts on MSVC for the crashing function.
Attachment #667113 - Flags: review?(bzbarsky)
Comment on attachment 667113 [details] [diff] [review]
Turn off opts for nsIDOMHTMLDocument_Write on MSVC

r=me.  Thank you for picking this up!
Attachment #667113 - Flags: review?(bzbarsky) → review+
http://hg.mozilla.org/integration/mozilla-inbound/rev/30de0459165c
Whiteboard: [leave open]
Target Milestone: --- → mozilla18
I just experienced https://crash-stats.mozilla.com/report/index/bp-9491fdd4-449a-449c-b905-7344d2121003 while scrolling around on failblog.cheezburger.com with 15.0.1 it hasn't happened before, though.
Depends on: 798275
Please nominate for uplift as soon as you all are comfortable with landing on 17. Would be great to get this into our first beta.
Comment on attachment 667113 [details] [diff] [review]
Turn off opts for nsIDOMHTMLDocument_Write on MSVC

[Approval Request Comment]
Bug caused by (feature/regressing bug #): PGO on windows
User impact if declined: this topcrash signature stays around
Testing completed (on m-c, etc.): has run on Nightly for a few days
Risk to taking this patch (and alternatives if risky): Risk seems very low--we are changing anything only on Windows, and all we're doing is turning off optimization. So aside from the usual 'impossible' stuff, the risk is that Document.write becomes slower in a way that matters.
String or UUID changes made by this patch:
Attachment #667113 - Flags: approval-mozilla-beta?
Comment on attachment 667113 [details] [diff] [review]
Turn off opts for nsIDOMHTMLDocument_Write on MSVC

Approving for beta as this is a low risk fix
Attachment #667113 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
http://hg.mozilla.org/releases/mozilla-beta/rev/08aea86dd6af
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Whiteboard: [leave open]
Please verify by checking Socorro given we don't have a reproducible testcase.
Keywords: verifyme
I don't see any crash reports on FF newer than 16.0.1, 17.0a2, 18.0a1.
I guess this could be marked as verified fixed on FF 17b1.
Status: RESOLVED → VERIFIED
Keywords: verifyme
This showed up on the explosive report today, and https://crash-stats.mozilla.com/report/list?signature=nsIDOMHTMLDocument_Write indicates there are crashes happening in 16.0.2.
No longer depends on: 798275
You need to log in before you can comment on or make changes to this bug.