Closed Bug 732343 Opened 12 years ago Closed 12 years ago

Crash in CNavDTD::OpenContainer @ SinkContext::OpenContainer

Categories

(Core :: DOM: HTML Parser, defect)

13 Branch
All
macOS
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla14
Tracking Status
firefox13 + fixed
firefox14 + fixed

People

(Reporter: scoobidiver, Assigned: hsivonen)

References

Details

(5 keywords, Whiteboard: [startupcrash][qa-])

Crash Data

Attachments

(2 files, 1 obsolete file)

It's a new crash signature that first appeared in 13.0a1/20120224. The regression range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=5e756e59a794&tochange=cd120efbe4c6

One comment says: "Crashed when I wanted to catch an image from a Facebook page and save it."

Signature 	MOZ_Assert More Reports Search
UUID	9d7762dd-fb81-491f-a056-6415e2120302
Date Processed	2012-03-02 05:15:20
Uptime	8
Last Crash	5.0 hours before submission
Install Age	5.1 hours since version was first installed.
Install Time	2012-03-02 00:11:46
Product	Firefox
Version	13.0a1
Build ID	20120229031108
Release Channel	nightly
OS	Mac OS X
OS Version	10.6.8 10K549
Build Architecture	amd64
Build Architecture Info	family 6 model 23 stepping 6
Crash Reason	EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
Crash Address	0x0
AdapterVendorID: 0x10de, AdapterDeviceID: 0x 863
EMCheckCompatibility	True

Frame 	Module 	Signature [Expand] 	Source
0 	libmozglue.dylib 	MOZ_Assert 	mfbt/Assertions.cpp:76
1 	XUL 	SinkContext::OpenContainer 	content/html/document/src/nsHTMLContentSink.cpp:704
2 	XUL 	CNavDTD::OpenContainer 	parser/htmlparser/src/CNavDTD.cpp:2511
3 	XUL 	CNavDTD::HandleDefaultStartToken 	parser/htmlparser/src/CNavDTD.cpp:976
4 	XUL 	CNavDTD::HandleStartToken 	parser/htmlparser/src/CNavDTD.cpp:1312
5 	XUL 	CNavDTD::HandleToken 	parser/htmlparser/src/CNavDTD.cpp:654
6 	XUL 	CNavDTD::BuildModel 	parser/htmlparser/src/CNavDTD.cpp:245
7 	XUL 	nsParser::BuildModel 	parser/htmlparser/src/nsParser.cpp:1688
8 	XUL 	nsParser::ResumeParse 	parser/htmlparser/src/nsParser.cpp:1590
9 	XUL 	nsParser::OnDataAvailable 	parser/htmlparser/src/nsParser.cpp:2158
10 	XUL 	nsBaseChannel::OnDataAvailable 	netwerk/base/src/nsBaseChannel.cpp:773
11 	XUL 	nsInputStreamPump::OnStateTransfer 	netwerk/base/src/nsInputStreamPump.cpp:514
12 	XUL 	nsInputStreamPump::OnInputStreamReady 	netwerk/base/src/nsInputStreamPump.cpp:402
13 	XUL 	nsInputStreamReadyEvent::Run 	xpcom/io/nsStreamUtils.cpp:114
14 	XUL 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:657
15 	XUL 	NS_ProcessNextEvent_P 	obj-firefox/x86_64/xpcom/build/nsThreadUtils.cpp:245
16 	XUL 	nsXMLHttpRequest::Send 	content/base/src/nsXMLHttpRequest.cpp:2864
17 	XUL 	nsIXMLHttpRequest_Send 	obj-firefox/x86_64/js/xpconnect/src/dom_quickstubs.cpp:21495
18 	XUL 	js::InvokeKernel 	js/src/jscntxtinlines.h:311
19 	XUL 	js::Interpret 	js/src/jsinterp.cpp:2699
20 	XUL 	js::RunScript 	js/src/jsinterp.cpp:454
21 	XUL 	js::InvokeKernel 	js/src/jsinterp.cpp:517
22 	XUL 	js::Invoke 	js/src/jsinterp.h:157
23 	XUL 	JS_CallFunctionValue 	js/src/jsapi.cpp:5450
....

More reports at:
https://crash-stats.mozilla.com/report/list?signature=MOZ_Assert
Well, this is weird. The stack shows synchronous XHR spinning the event loop so that the old HTML parser processes a document that has stuff that about:blank doesn't have.

Yet, the stated regression range shows no changes to nsHTMLDocument.cpp which is responsible for choosing the parser.
(In reply to Henri Sivonen (:hsivonen) from comment #1)
> Yet, the stated regression range shows no changes to nsHTMLDocument.cpp
> which is responsible for choosing the parser.
It might be related to bug 729142.
(Mid-aired)

Strange indeed. Scoobidiver, how certain is that regression range?

(In reply to Scoobidiver from comment #2)
> (In reply to Henri Sivonen (:hsivonen) from comment #1)
> > Yet, the stated regression range shows no changes to nsHTMLDocument.cpp
> > which is responsible for choosing the parser.
> It might be related to bug 729142.

Nah, MOZ_STATIC_ASSERTs are compile-time, and don't cause crashes, but compile errors.
(In reply to Ms2ger from comment #3)
> Strange indeed. Scoobidiver, how certain is that regression range?
Extend the crash query to 4 weeks and take a look at crash reports. It has been hit by several users from 13.0a1/20120224.
MOZ_Assert is a recent crash signature, so it might be a new form of a pre-existing crash although the comment is related to a common task.
The top MOZ_Assert frame should really be ignored; that's probably a regression from bug 717540.
In all likeliness, the crash preexists with JS_Assert in its signature, although maybe it's filtered out by socorro.
(In reply to Mike Hommey [:glandium] from comment #6)
> In all likeliness, the crash preexists with JS_Assert in its signature,
> although maybe it's filtered out by socorro.
There are no crashes on Mac OS X and Windows with JS_Assert as signature in recent versions.
In 10.0.2, I see only one related crash report:
bp-35b43ce7-d293-4955-bc92-723cb2120221
(In reply to Ms2ger from comment #5)
> The top MOZ_Assert frame should really be ignored

Should a "MOZ_Assert" frame _always_ be ignored in a crash signature? If so, sounds like a skiplist / ignore list bug should be filed.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #8)
> (In reply to Ms2ger from comment #5)
> > The top MOZ_Assert frame should really be ignored
> 
> Should a "MOZ_Assert" frame _always_ be ignored in a crash signature? If so,
> sounds like a skiplist / ignore list bug should be filed.

I would tend to say it would, but i don't know what rules we have wrt skiplist. If JS_Assert is always skiplisted, then definitely yes.
Depends on: 732462
As I didn't find a potential pre-existing signature with the same volume of crashes (see comment 7), it should be considered as a regression in 13.0.
Crash Signature: [@ MOZ_Assert] → [@ MOZ_Assert] [@ SinkContext::OpenContainer]
Summary: Crash in CNavDTD::OpenContainer @ MOZ_Assert → Crash in CNavDTD::OpenContainer @ SinkContext::OpenContainer
This always crashes for me on startup with the current debug build.

This message is in the console:
Must not use HTMLContentSink for styles and scripts.. at e:/builds/moz2_slave/m-cen-w32-dbg/build/content/html/document/src/nsHTMLContentSink.cpp:704
Robert: any extensions that could be messing with about:blank?
Well about:blank is empty when I call it up. so it doesn't look like it.
...on latest nightly which i installed over the debug build and install the debug build over it.
That makes sense, though... I removed the code that would handle the style and script tags. If you can reproduce reliably, could you try confirming and narrowing down the regression range in comment 0?
What's the easiest way you know of to do that?
I ran the debug build before the one cited here and got this crash intermittently. From past experience, I'd say a number of crashes before this with an empty stack trace had been due to this bug. One of the patches just made it more frequent. Hope this helps narrow it down. I will attempt to figure out which one, but if this info can help someone else narrow it down then good.
It would be helpful to set a breakpoint right before where the app crashes and then examine the stack to find out the URL of the crashing document.
No longer depends on: 732462
Depends on: 733235
Er, what code did I change in CNavDTD.cpp or nsHTMLContentSink.cpp?
Thwere seems to be a bug in hg.mozilla.org, but you appear to be listed as the author on those two pages.
Who is mrbkap, jwalden, fred, bent, ehsan... etc
Keywords: topcrash
Whiteboard: [startupcrash]
Robert Claypool, does the problem occur if you start Firefox in the Safe Mode?
http://support.mozilla.org/en-US/kb/Safe%20Mode

Based on the stacks, there are two possible explanations for this bug:
 1) An extension has modified the stream about:blank dereferences to.
 2) There's a bug in the code that limits the old parser to about:blank.

I'll try to write some code that'll make the crash noticeably different in case #2.
Assignee: nobody → hsivonen
Status: NEW → ASSIGNED
When I run a debug build outside of safe mode, it crashes before displaying any windows/loading any web pages. When I run a debug build in safe mode it starts to load web pages, but crashes before it finishes. IT seems to be a low memory sensitive crash.
Looking at the latest crash
https://crash-stats.mozilla.com/report/index/015e2020-0177-4607-86d0-62c452120326
turns up Bug #691271 which was closed with WORKSFORME, which is highly likely to be a dupe of this bug and the reason it worked was because of the lack of memory pressure.
(In reply to Robert Claypool from comment #26)
> When I run a debug build outside of safe mode, it crashes before displaying
> any windows/loading any web pages. When I run a debug build in safe mode it
> starts to load web pages, but crashes before it finishes.

Do you have any extensions installed (in the non-Safe mode)?
Attached patch Potential fix (obsolete) — Splinter Review
Yes, I have extensions installed. Obviously, they take up memory, and don't in safe-mode, which enable the debug-build to start trying to load pages. Non-debug builds use even less memory, so it hits the memory wall less frequently.
(In reply to Robert Claypool from comment #29)
> Yes, I have extensions installed. Obviously, they take up memory, and don't
> in safe-mode, which enable the debug-build to start trying to load pages.
> Non-debug builds use even less memory, so it hits the memory wall less
> frequently.

Which add-ons do you have specifically?
Adblock Plus 2.0.3
Add-on Compatibility Reporter 1.1
ChatZilla 0.9.88.1
Crash Report Helper 1.2
Cycle Collector Analyzer, about:ccdump 0.4.2
Cycle Collector Graph Analyzer, about:cc 0.0.1
DownloadHelper 4.9.8
Nightly Tester Tools 3.2.2
PDF Viewer 0.2.414
Rikaichan 2.04
Rikaichan Japanese Names Dictionary File 2.01.110527
Rikaichan Japanese-English Dictionary File 2.01.110527
Tabhunter 1.0.2
User Agent Switcher 0.7.3
WordWeb one-click lookup 5.7.1701
(In reply to Robert Claypool from comment #31)
> Crash Report Helper 1.2

The author of this extension has withdrawn it:
https://addons.mozilla.org/en-US/firefox/addon/crash-report-helper/

> Cycle Collector Graph Analyzer, about:cc 0.0.1

Can't find this anywhere.

> WordWeb one-click lookup 5.7.1701

Didn't find this exact version.

I didn't see a crash with the other extensions.
Crash Signature: [@ MOZ_Assert] [@ SinkContext::OpenContainer] → [@ MOZ_Assert] [@ SinkContext::OpenContainer] [@ SinkContext::OpenContainer(nsIParserNode const&)]
(In reply to Henri Sivonen (:hsivonen) from comment #33)
> > Cycle Collector Graph Analyzer, about:cc 0.0.1
> 
> Can't find this anywhere.

Olli wrote that.  See the attachment in bug 726346.
Comment on attachment 609629 [details] [diff] [review]
Defend the HTML load path against extensions doing unsupported things

Could you explain what this patch is supposed to do.
Is some addon overriding about:blank or what?


> nsHTMLDocument::StartDocumentLoad(const char* aCommand,
>                                   nsIChannel* aChannel,
>                                   nsILoadGroup* aLoadGroup,
>                                   nsISupports* aContainer,
>                                   nsIStreamListener **aDocListener,
>                                   bool aReset,
>                                   nsIContentSink* aSink)
> {
>+  if (!aCommand) {
>+    MOZ_NOT_REACHED("Command is mandatory");
>+    return NS_ERROR_INVALID_POINTER;
>+  }
>+  if (aSink) {
>+    MOZ_NOT_REACHED("Got a sink override. Should not happen for HTML doc.");
>+    return NS_ERROR_INVALID_ARG;
>+  }
>+  if (!mIsRegularHTML) {
>+    MOZ_NOT_REACHED("Must not set HTML doc to XHTML mode before load start.");
>+    return NS_ERROR_DOM_INVALID_STATE_ERR;
>+  }
Why is this right in general? What if someone has cloned an XHTML document.
(though, I can't now recall any case when after cloning one could actually end up calling StartDocumentLoad)
(In reply to Olli Pettay [:smaug] from comment #35)
> Could you explain what this patch is supposed to do.

It's supposed to do three things:

 1) Fatally assert in debug builds and return early with an error code if nsHTMLDocument::StartDocumentLoad gets arguments it does not know how to handle.

 2) Remove support for a legacy left-over parser command "view delayedContentLoad" that's never used by either mozilla-central or comm-central and that seems to be ancient. Remove support for passing explicit sink, which is also unused in our code and can't be supported going forward anyway.

 3) Make sure about:blank always behaves like an empty stream even if an extension makes about:blank dereference into a non-empty stream.

In general, it's supposed to prevent running the HTML load path in surprising and unsupported ways. The crash stack here suggests that the HTML load path is being run in a surprising and unsupported way in the wild.

> Is some addon overriding about:blank or what?

Possibly, but I haven't been able to reproduce the problem locally.

> Why is this right in general? 

You aren't supposed to call StartDocumentLoad on nsHTMLDocuments that aren't freshly constructed.

> What if someone has cloned an XHTML document.
> (though, I can't now recall any case when after cloning one could actually
> end up calling StartDocumentLoad)

You aren't supposed to do that. The old code also has an assertion along these lines.
Comment on attachment 609629 [details] [diff] [review]
Defend the HTML load path against extensions doing unsupported things


>+
>+  bool html = contentType.EqualsLiteral(TEXT_HTML);
>+  bool xhtml = contentType.Equals("application/xhtml+xml");
Perhaps 
bool xhtml = !html && contentType.Equals("application/xhtml+xml");


>   bool plainText = (contentType.EqualsLiteral(TEXT_PLAIN) ||
>     contentType.EqualsLiteral(TEXT_CSS) ||
>     contentType.EqualsLiteral(APPLICATION_JAVASCRIPT) ||
>     contentType.EqualsLiteral(APPLICATION_XJAVASCRIPT) ||
>     contentType.EqualsLiteral(TEXT_ECMASCRIPT) ||
>     contentType.EqualsLiteral(APPLICATION_ECMASCRIPT) ||
>     contentType.EqualsLiteral(TEXT_JAVASCRIPT));
In which case
bool plainText = !html && !xhtml && <the current EqualsLiteral checks>


>+  bool loadAsHtml5 = true;
>+
>+  if (!viewSource && xhtml) {
>+      // We're parsing XHTML as XML, remember that.
>+      mIsRegularHTML = false;
>+      mCompatMode = eCompatibility_FullStandards;
>+      loadAsHtml5 = false;
>   }
Please use 2 space indentation
Attachment #609629 - Flags: review?(bugs) → review+
(In reply to Olli Pettay [:smaug] from comment #37)
> Comment on attachment 609629 [details] [diff] [review]
> Defend the HTML load path against extensions doing unsupported things
> 
> 
> >+
> >+  bool html = contentType.EqualsLiteral(TEXT_HTML);
> >+  bool xhtml = contentType.Equals("application/xhtml+xml");
> Perhaps 
> bool xhtml = !html && contentType.Equals("application/xhtml+xml");
...
> bool plainText = !html && !xhtml && <the current EqualsLiteral checks>
> 

Why? A sring cannot equal multiple distinct lterals at the same time anyway.
Just to reduce string comparisons. If we know that we have html, contentType can't be xhtml etc.
(In reply to Olli Pettay [:smaug] from comment #39)
> Just to reduce string comparisons.

Oh. Good point. Thanks. Landed with the change:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a6c661655312
Flags: in-testsuite-
Target Milestone: --- → mozilla14
https://hg.mozilla.org/mozilla-central/rev/a6c661655312
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Adding the addon-compat keyword. Although this behavior has been expected of extensions ever since Firefox 4, the constraints haven't been enforced until now (Firefox 14 currently; 13 if this gets branch approval).  Developer documentation should say that extensions shouldn't call nsIDocument::StartDocumentLoad(). Instead, extensions should use XHR when they don't need a docshell and an iframe if they need a docshell.
Keywords: addon-compat
The backport differs from the original by including the fixes for bug 741384 and bug 741218 which were fallout from this bug.  Also, since Firefox 13 still includes another implementation of nsIHTMLContentSink, the backport adds a method to nsIHTMLContentSink for querying whether the instance is backed by HTMLContentSink.
Those crashes look something very different than this one.
Then perhaps Bug #732065 shouldn't have been marked a duplicate of this, because for those crashes that bug is referenced as a "Related Bug" in the crash report.
Ollie, if the crashes look different then is the patch for Aurora backport able to be reviewed and nominated for uplift?  Would like to see this land in Aurora before merge day if possible and risk low enough.
Attachment #611756 - Flags: review?(bugs) → review+
Comment on attachment 611756 [details] [diff] [review]
Backport to Aurora, includes fixes for bug 741384 and bug 741218

[Approval Request Comment]
Regression caused by (bug #): Probably bug 720124, bug 715112 and similar earlier bugs combined with unexpected behavior most likely provoked by an extension. The code paths affected are never taken when running our test suite. The most likely explanation of how those paths are taken in the wild is that some extension makes it happen.
User impact if declined: Crashes. Presumably only with some extension that hasn't been identified.
Testing completed (on m-c, etc.): Has baked on m-c and the fixes for the fallout have been rolled into this backport patch.
Risk to taking this patch (and alternatives if risky): If the crash was really trigged by an extension, this patch probably makes that extension not work as designed (but not crash). An alternative (probably more risky) would be a merge of nsHTMLContentSink as of mozilla-beta and nsHTMLContentSink as of mozilla-aurora. In that case, the extension (whose existence is assumed) would break in the Firefox 14 train anyway, so it doesn't seem particularly useful to avoid landing this patch for 13.
String changes made by this patch: None.
Attachment #611756 - Flags: approval-mozilla-aurora?
Comment on attachment 611756 [details] [diff] [review]
Backport to Aurora, includes fixes for bug 741384 and bug 741218

[Triage Comment]
Should fix a top crash - approved for Aurora 13.
Attachment #611756 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Is there a reproducible testcase for this crash?
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #52)
> Is there a reproducible testcase for this crash?

I'm not aware of one.
Whiteboard: [startupcrash] → [startupcrash][qa-]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: