Last Comment Bug 732343 - Crash in CNavDTD::OpenContainer @ SinkContext::OpenContainer
: Crash in CNavDTD::OpenContainer @ SinkContext::OpenContainer
Status: RESOLVED FIXED
[startupcrash][qa-]
: addon-compat, crash, regression, reproducible, topcrash
Product: Core
Classification: Components
Component: HTML: Parser (show other bugs)
: 13 Branch
: All Mac OS X
: -- critical (vote)
: mozilla14
Assigned To: Henri Sivonen (:hsivonen)
:
Mentors:
: 732065 (view as bug list)
Depends on: 733235
Blocks:
  Show dependency treegraph
 
Reported: 2012-03-02 00:51 PST by Scoobidiver (away)
Modified: 2015-10-07 18:43 PDT (History)
13 users (show)
hsivonen: in‑testsuite-
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
fixed
+
fixed


Attachments
Potential fix (17.54 KB, patch)
2012-03-26 09:15 PDT, Henri Sivonen (:hsivonen)
no flags Details | Diff | Review
Defend the HTML load path against extensions doing unsupported things (19.80 KB, patch)
2012-03-26 23:34 PDT, Henri Sivonen (:hsivonen)
bugs: review+
Details | Diff | Review
Backport to Aurora, includes fixes for bug 741384 and bug 741218 (22.32 KB, patch)
2012-04-03 02:56 PDT, Henri Sivonen (:hsivonen)
bugs: review+
akeybl: approval‑mozilla‑aurora+
Details | Diff | Review

Description Scoobidiver (away) 2012-03-02 00:51:03 PST
It's a new crash signature that first appeared in 13.0a1/20120224. The regression range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=5e756e59a794&tochange=cd120efbe4c6

One comment says: "Crashed when I wanted to catch an image from a Facebook page and save it."

Signature 	MOZ_Assert More Reports Search
UUID	9d7762dd-fb81-491f-a056-6415e2120302
Date Processed	2012-03-02 05:15:20
Uptime	8
Last Crash	5.0 hours before submission
Install Age	5.1 hours since version was first installed.
Install Time	2012-03-02 00:11:46
Product	Firefox
Version	13.0a1
Build ID	20120229031108
Release Channel	nightly
OS	Mac OS X
OS Version	10.6.8 10K549
Build Architecture	amd64
Build Architecture Info	family 6 model 23 stepping 6
Crash Reason	EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
Crash Address	0x0
AdapterVendorID: 0x10de, AdapterDeviceID: 0x 863
EMCheckCompatibility	True

Frame 	Module 	Signature [Expand] 	Source
0 	libmozglue.dylib 	MOZ_Assert 	mfbt/Assertions.cpp:76
1 	XUL 	SinkContext::OpenContainer 	content/html/document/src/nsHTMLContentSink.cpp:704
2 	XUL 	CNavDTD::OpenContainer 	parser/htmlparser/src/CNavDTD.cpp:2511
3 	XUL 	CNavDTD::HandleDefaultStartToken 	parser/htmlparser/src/CNavDTD.cpp:976
4 	XUL 	CNavDTD::HandleStartToken 	parser/htmlparser/src/CNavDTD.cpp:1312
5 	XUL 	CNavDTD::HandleToken 	parser/htmlparser/src/CNavDTD.cpp:654
6 	XUL 	CNavDTD::BuildModel 	parser/htmlparser/src/CNavDTD.cpp:245
7 	XUL 	nsParser::BuildModel 	parser/htmlparser/src/nsParser.cpp:1688
8 	XUL 	nsParser::ResumeParse 	parser/htmlparser/src/nsParser.cpp:1590
9 	XUL 	nsParser::OnDataAvailable 	parser/htmlparser/src/nsParser.cpp:2158
10 	XUL 	nsBaseChannel::OnDataAvailable 	netwerk/base/src/nsBaseChannel.cpp:773
11 	XUL 	nsInputStreamPump::OnStateTransfer 	netwerk/base/src/nsInputStreamPump.cpp:514
12 	XUL 	nsInputStreamPump::OnInputStreamReady 	netwerk/base/src/nsInputStreamPump.cpp:402
13 	XUL 	nsInputStreamReadyEvent::Run 	xpcom/io/nsStreamUtils.cpp:114
14 	XUL 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:657
15 	XUL 	NS_ProcessNextEvent_P 	obj-firefox/x86_64/xpcom/build/nsThreadUtils.cpp:245
16 	XUL 	nsXMLHttpRequest::Send 	content/base/src/nsXMLHttpRequest.cpp:2864
17 	XUL 	nsIXMLHttpRequest_Send 	obj-firefox/x86_64/js/xpconnect/src/dom_quickstubs.cpp:21495
18 	XUL 	js::InvokeKernel 	js/src/jscntxtinlines.h:311
19 	XUL 	js::Interpret 	js/src/jsinterp.cpp:2699
20 	XUL 	js::RunScript 	js/src/jsinterp.cpp:454
21 	XUL 	js::InvokeKernel 	js/src/jsinterp.cpp:517
22 	XUL 	js::Invoke 	js/src/jsinterp.h:157
23 	XUL 	JS_CallFunctionValue 	js/src/jsapi.cpp:5450
....

More reports at:
https://crash-stats.mozilla.com/report/list?signature=MOZ_Assert
Comment 1 Henri Sivonen (:hsivonen) 2012-03-02 02:08:50 PST
Well, this is weird. The stack shows synchronous XHR spinning the event loop so that the old HTML parser processes a document that has stuff that about:blank doesn't have.

Yet, the stated regression range shows no changes to nsHTMLDocument.cpp which is responsible for choosing the parser.
Comment 2 Scoobidiver (away) 2012-03-02 02:38:47 PST
(In reply to Henri Sivonen (:hsivonen) from comment #1)
> Yet, the stated regression range shows no changes to nsHTMLDocument.cpp
> which is responsible for choosing the parser.
It might be related to bug 729142.
Comment 3 :Ms2ger 2012-03-02 02:48:40 PST
(Mid-aired)

Strange indeed. Scoobidiver, how certain is that regression range?

(In reply to Scoobidiver from comment #2)
> (In reply to Henri Sivonen (:hsivonen) from comment #1)
> > Yet, the stated regression range shows no changes to nsHTMLDocument.cpp
> > which is responsible for choosing the parser.
> It might be related to bug 729142.

Nah, MOZ_STATIC_ASSERTs are compile-time, and don't cause crashes, but compile errors.
Comment 4 Scoobidiver (away) 2012-03-02 03:00:03 PST
(In reply to Ms2ger from comment #3)
> Strange indeed. Scoobidiver, how certain is that regression range?
Extend the crash query to 4 weeks and take a look at crash reports. It has been hit by several users from 13.0a1/20120224.
MOZ_Assert is a recent crash signature, so it might be a new form of a pre-existing crash although the comment is related to a common task.
Comment 5 :Ms2ger 2012-03-02 03:09:53 PST
The top MOZ_Assert frame should really be ignored; that's probably a regression from bug 717540.
Comment 6 Mike Hommey [:glandium] 2012-03-02 03:17:28 PST
In all likeliness, the crash preexists with JS_Assert in its signature, although maybe it's filtered out by socorro.
Comment 7 Scoobidiver (away) 2012-03-02 03:27:58 PST
(In reply to Mike Hommey [:glandium] from comment #6)
> In all likeliness, the crash preexists with JS_Assert in its signature,
> although maybe it's filtered out by socorro.
There are no crashes on Mac OS X and Windows with JS_Assert as signature in recent versions.
In 10.0.2, I see only one related crash report:
bp-35b43ce7-d293-4955-bc92-723cb2120221
Comment 8 Robert Kaiser (not working on stability any more) 2012-03-02 05:36:48 PST
(In reply to Ms2ger from comment #5)
> The top MOZ_Assert frame should really be ignored

Should a "MOZ_Assert" frame _always_ be ignored in a crash signature? If so, sounds like a skiplist / ignore list bug should be filed.
Comment 9 Mike Hommey [:glandium] 2012-03-02 08:03:57 PST
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #8)
> (In reply to Ms2ger from comment #5)
> > The top MOZ_Assert frame should really be ignored
> 
> Should a "MOZ_Assert" frame _always_ be ignored in a crash signature? If so,
> sounds like a skiplist / ignore list bug should be filed.

I would tend to say it would, but i don't know what rules we have wrt skiplist. If JS_Assert is always skiplisted, then definitely yes.
Comment 10 Scoobidiver (away) 2012-03-02 09:30:31 PST
As I didn't find a potential pre-existing signature with the same volume of crashes (see comment 7), it should be considered as a regression in 13.0.
Comment 11 :Ms2ger 2012-03-02 09:50:09 PST
*** Bug 732065 has been marked as a duplicate of this bug. ***
Comment 12 Robert Claypool 2012-03-02 09:59:17 PST
This always crashes for me on startup with the current debug build.

This message is in the console:
Must not use HTMLContentSink for styles and scripts.. at e:/builds/moz2_slave/m-cen-w32-dbg/build/content/html/document/src/nsHTMLContentSink.cpp:704
Comment 13 :Ms2ger 2012-03-02 11:40:48 PST
Robert: any extensions that could be messing with about:blank?
Comment 14 Robert Claypool 2012-03-03 12:56:58 PST
Well about:blank is empty when I call it up. so it doesn't look like it.
Comment 15 Robert Claypool 2012-03-03 12:58:04 PST
...on latest nightly which i installed over the debug build and install the debug build over it.
Comment 16 :Ms2ger 2012-03-03 13:00:49 PST
That makes sense, though... I removed the code that would handle the style and script tags. If you can reproduce reliably, could you try confirming and narrowing down the regression range in comment 0?
Comment 17 Robert Claypool 2012-03-03 13:16:24 PST
What's the easiest way you know of to do that?
Comment 18 :Ms2ger 2012-03-03 13:21:11 PST
<http://mercurial.selenic.com/wiki/BisectExtension>, probably
Comment 19 Robert Claypool 2012-03-04 11:04:04 PST
I ran the debug build before the one cited here and got this crash intermittently. From past experience, I'd say a number of crashes before this with an empty stack trace had been due to this bug. One of the patches just made it more frequent. Hope this helps narrow it down. I will attempt to figure out which one, but if this info can help someone else narrow it down then good.
Comment 20 Henri Sivonen (:hsivonen) 2012-03-04 23:23:54 PST
It would be helpful to set a breakpoint right before where the app crashes and then examine the stack to find out the URL of the crashing document.
Comment 22 Olli Pettay [:smaug] (high review load, please consider other reviewers) 2012-03-19 03:46:32 PDT
Er, what code did I change in CNavDTD.cpp or nsHTMLContentSink.cpp?
Comment 23 Robert Claypool 2012-03-19 08:44:03 PDT
Thwere seems to be a bug in hg.mozilla.org, but you appear to be listed as the author on those two pages.
Comment 24 Robert Claypool 2012-03-19 09:04:37 PDT
Who is mrbkap, jwalden, fred, bent, ehsan... etc
Comment 25 Henri Sivonen (:hsivonen) 2012-03-26 01:57:40 PDT
Robert Claypool, does the problem occur if you start Firefox in the Safe Mode?
http://support.mozilla.org/en-US/kb/Safe%20Mode

Based on the stacks, there are two possible explanations for this bug:
 1) An extension has modified the stream about:blank dereferences to.
 2) There's a bug in the code that limits the old parser to about:blank.

I'll try to write some code that'll make the crash noticeably different in case #2.
Comment 26 Robert Claypool 2012-03-26 08:51:31 PDT
When I run a debug build outside of safe mode, it crashes before displaying any windows/loading any web pages. When I run a debug build in safe mode it starts to load web pages, but crashes before it finishes. IT seems to be a low memory sensitive crash.
Looking at the latest crash
https://crash-stats.mozilla.com/report/index/015e2020-0177-4607-86d0-62c452120326
turns up Bug #691271 which was closed with WORKSFORME, which is highly likely to be a dupe of this bug and the reason it worked was because of the lack of memory pressure.
Comment 27 Henri Sivonen (:hsivonen) 2012-03-26 09:14:18 PDT
(In reply to Robert Claypool from comment #26)
> When I run a debug build outside of safe mode, it crashes before displaying
> any windows/loading any web pages. When I run a debug build in safe mode it
> starts to load web pages, but crashes before it finishes.

Do you have any extensions installed (in the non-Safe mode)?
Comment 28 Henri Sivonen (:hsivonen) 2012-03-26 09:15:49 PDT
Created attachment 609344 [details] [diff] [review]
Potential fix
Comment 29 Robert Claypool 2012-03-26 09:46:42 PDT
Yes, I have extensions installed. Obviously, they take up memory, and don't in safe-mode, which enable the debug-build to start trying to load pages. Non-debug builds use even less memory, so it hits the memory wall less frequently.
Comment 30 Henri Sivonen (:hsivonen) 2012-03-26 11:19:17 PDT
(In reply to Robert Claypool from comment #29)
> Yes, I have extensions installed. Obviously, they take up memory, and don't
> in safe-mode, which enable the debug-build to start trying to load pages.
> Non-debug builds use even less memory, so it hits the memory wall less
> frequently.

Which add-ons do you have specifically?
Comment 31 Robert Claypool 2012-03-26 12:32:33 PDT
Adblock Plus 2.0.3
Add-on Compatibility Reporter 1.1
ChatZilla 0.9.88.1
Crash Report Helper 1.2
Cycle Collector Analyzer, about:ccdump 0.4.2
Cycle Collector Graph Analyzer, about:cc 0.0.1
DownloadHelper 4.9.8
Nightly Tester Tools 3.2.2
PDF Viewer 0.2.414
Rikaichan 2.04
Rikaichan Japanese Names Dictionary File 2.01.110527
Rikaichan Japanese-English Dictionary File 2.01.110527
Tabhunter 1.0.2
User Agent Switcher 0.7.3
WordWeb one-click lookup 5.7.1701
Comment 32 Henri Sivonen (:hsivonen) 2012-03-26 23:34:58 PDT
Created attachment 609629 [details] [diff] [review]
Defend the HTML load path against extensions doing unsupported things
Comment 33 Henri Sivonen (:hsivonen) 2012-03-27 02:36:31 PDT
(In reply to Robert Claypool from comment #31)
> Crash Report Helper 1.2

The author of this extension has withdrawn it:
https://addons.mozilla.org/en-US/firefox/addon/crash-report-helper/

> Cycle Collector Graph Analyzer, about:cc 0.0.1

Can't find this anywhere.

> WordWeb one-click lookup 5.7.1701

Didn't find this exact version.

I didn't see a crash with the other extensions.
Comment 34 Andrew McCreight (PTO-ish through 6-29) [:mccr8] 2012-03-27 07:11:06 PDT
(In reply to Henri Sivonen (:hsivonen) from comment #33)
> > Cycle Collector Graph Analyzer, about:cc 0.0.1
> 
> Can't find this anywhere.

Olli wrote that.  See the attachment in bug 726346.
Comment 35 Olli Pettay [:smaug] (high review load, please consider other reviewers) 2012-03-28 08:48:19 PDT
Comment on attachment 609629 [details] [diff] [review]
Defend the HTML load path against extensions doing unsupported things

Could you explain what this patch is supposed to do.
Is some addon overriding about:blank or what?


> nsHTMLDocument::StartDocumentLoad(const char* aCommand,
>                                   nsIChannel* aChannel,
>                                   nsILoadGroup* aLoadGroup,
>                                   nsISupports* aContainer,
>                                   nsIStreamListener **aDocListener,
>                                   bool aReset,
>                                   nsIContentSink* aSink)
> {
>+  if (!aCommand) {
>+    MOZ_NOT_REACHED("Command is mandatory");
>+    return NS_ERROR_INVALID_POINTER;
>+  }
>+  if (aSink) {
>+    MOZ_NOT_REACHED("Got a sink override. Should not happen for HTML doc.");
>+    return NS_ERROR_INVALID_ARG;
>+  }
>+  if (!mIsRegularHTML) {
>+    MOZ_NOT_REACHED("Must not set HTML doc to XHTML mode before load start.");
>+    return NS_ERROR_DOM_INVALID_STATE_ERR;
>+  }
Why is this right in general? What if someone has cloned an XHTML document.
(though, I can't now recall any case when after cloning one could actually end up calling StartDocumentLoad)
Comment 36 Henri Sivonen (:hsivonen) 2012-03-28 23:25:10 PDT
(In reply to Olli Pettay [:smaug] from comment #35)
> Could you explain what this patch is supposed to do.

It's supposed to do three things:

 1) Fatally assert in debug builds and return early with an error code if nsHTMLDocument::StartDocumentLoad gets arguments it does not know how to handle.

 2) Remove support for a legacy left-over parser command "view delayedContentLoad" that's never used by either mozilla-central or comm-central and that seems to be ancient. Remove support for passing explicit sink, which is also unused in our code and can't be supported going forward anyway.

 3) Make sure about:blank always behaves like an empty stream even if an extension makes about:blank dereference into a non-empty stream.

In general, it's supposed to prevent running the HTML load path in surprising and unsupported ways. The crash stack here suggests that the HTML load path is being run in a surprising and unsupported way in the wild.

> Is some addon overriding about:blank or what?

Possibly, but I haven't been able to reproduce the problem locally.

> Why is this right in general? 

You aren't supposed to call StartDocumentLoad on nsHTMLDocuments that aren't freshly constructed.

> What if someone has cloned an XHTML document.
> (though, I can't now recall any case when after cloning one could actually
> end up calling StartDocumentLoad)

You aren't supposed to do that. The old code also has an assertion along these lines.
Comment 37 Olli Pettay [:smaug] (high review load, please consider other reviewers) 2012-03-30 10:48:03 PDT
Comment on attachment 609629 [details] [diff] [review]
Defend the HTML load path against extensions doing unsupported things


>+
>+  bool html = contentType.EqualsLiteral(TEXT_HTML);
>+  bool xhtml = contentType.Equals("application/xhtml+xml");
Perhaps 
bool xhtml = !html && contentType.Equals("application/xhtml+xml");


>   bool plainText = (contentType.EqualsLiteral(TEXT_PLAIN) ||
>     contentType.EqualsLiteral(TEXT_CSS) ||
>     contentType.EqualsLiteral(APPLICATION_JAVASCRIPT) ||
>     contentType.EqualsLiteral(APPLICATION_XJAVASCRIPT) ||
>     contentType.EqualsLiteral(TEXT_ECMASCRIPT) ||
>     contentType.EqualsLiteral(APPLICATION_ECMASCRIPT) ||
>     contentType.EqualsLiteral(TEXT_JAVASCRIPT));
In which case
bool plainText = !html && !xhtml && <the current EqualsLiteral checks>


>+  bool loadAsHtml5 = true;
>+
>+  if (!viewSource && xhtml) {
>+      // We're parsing XHTML as XML, remember that.
>+      mIsRegularHTML = false;
>+      mCompatMode = eCompatibility_FullStandards;
>+      loadAsHtml5 = false;
>   }
Please use 2 space indentation
Comment 38 Henri Sivonen (:hsivonen) 2012-03-30 11:13:37 PDT
(In reply to Olli Pettay [:smaug] from comment #37)
> Comment on attachment 609629 [details] [diff] [review]
> Defend the HTML load path against extensions doing unsupported things
> 
> 
> >+
> >+  bool html = contentType.EqualsLiteral(TEXT_HTML);
> >+  bool xhtml = contentType.Equals("application/xhtml+xml");
> Perhaps 
> bool xhtml = !html && contentType.Equals("application/xhtml+xml");
...
> bool plainText = !html && !xhtml && <the current EqualsLiteral checks>
> 

Why? A sring cannot equal multiple distinct lterals at the same time anyway.
Comment 39 Olli Pettay [:smaug] (high review load, please consider other reviewers) 2012-03-30 11:21:56 PDT
Just to reduce string comparisons. If we know that we have html, contentType can't be xhtml etc.
Comment 40 Henri Sivonen (:hsivonen) 2012-03-31 07:12:07 PDT
(In reply to Olli Pettay [:smaug] from comment #39)
> Just to reduce string comparisons.

Oh. Good point. Thanks. Landed with the change:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a6c661655312
Comment 41 Ed Morley [:emorley] 2012-03-31 19:15:47 PDT
https://hg.mozilla.org/mozilla-central/rev/a6c661655312
Comment 42 Henri Sivonen (:hsivonen) 2012-04-03 02:51:04 PDT
Adding the addon-compat keyword. Although this behavior has been expected of extensions ever since Firefox 4, the constraints haven't been enforced until now (Firefox 14 currently; 13 if this gets branch approval).  Developer documentation should say that extensions shouldn't call nsIDocument::StartDocumentLoad(). Instead, extensions should use XHR when they don't need a docshell and an iframe if they need a docshell.
Comment 43 Henri Sivonen (:hsivonen) 2012-04-03 02:56:29 PDT
Created attachment 611756 [details] [diff] [review]
Backport to Aurora, includes fixes for bug 741384 and bug 741218
Comment 44 Henri Sivonen (:hsivonen) 2012-04-03 03:00:12 PDT
The backport differs from the original by including the fixes for bug 741384 and bug 741218 which were fallout from this bug.  Also, since Firefox 13 still includes another implementation of nsIHTMLContentSink, the backport adds a method to nsIHTMLContentSink for querying whether the instance is backed by HTMLContentSink.
Comment 46 Olli Pettay [:smaug] (high review load, please consider other reviewers) 2012-04-06 01:19:16 PDT
Those crashes look something very different than this one.
Comment 47 Robert Claypool 2012-04-06 06:51:05 PDT
Then perhaps Bug #732065 shouldn't have been marked a duplicate of this, because for those crashes that bug is referenced as a "Related Bug" in the crash report.
Comment 48 Lukas Blakk [:lsblakk] use ?needinfo 2012-04-09 16:05:51 PDT
Ollie, if the crashes look different then is the patch for Aurora backport able to be reviewed and nominated for uplift?  Would like to see this land in Aurora before merge day if possible and risk low enough.
Comment 49 Henri Sivonen (:hsivonen) 2012-04-10 00:24:11 PDT
Comment on attachment 611756 [details] [diff] [review]
Backport to Aurora, includes fixes for bug 741384 and bug 741218

[Approval Request Comment]
Regression caused by (bug #): Probably bug 720124, bug 715112 and similar earlier bugs combined with unexpected behavior most likely provoked by an extension. The code paths affected are never taken when running our test suite. The most likely explanation of how those paths are taken in the wild is that some extension makes it happen.
User impact if declined: Crashes. Presumably only with some extension that hasn't been identified.
Testing completed (on m-c, etc.): Has baked on m-c and the fixes for the fallout have been rolled into this backport patch.
Risk to taking this patch (and alternatives if risky): If the crash was really trigged by an extension, this patch probably makes that extension not work as designed (but not crash). An alternative (probably more risky) would be a merge of nsHTMLContentSink as of mozilla-beta and nsHTMLContentSink as of mozilla-aurora. In that case, the extension (whose existence is assumed) would break in the Firefox 14 train anyway, so it doesn't seem particularly useful to avoid landing this patch for 13.
String changes made by this patch: None.
Comment 50 Alex Keybl [:akeybl] 2012-04-10 12:15:28 PDT
Comment on attachment 611756 [details] [diff] [review]
Backport to Aurora, includes fixes for bug 741384 and bug 741218

[Triage Comment]
Should fix a top crash - approved for Aurora 13.
Comment 51 Henri Sivonen (:hsivonen) 2012-04-10 23:40:57 PDT
Thanks for the approval. Landed.
https://hg.mozilla.org/releases/mozilla-aurora/rev/19ddd5081c9e
Comment 52 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-04-26 13:37:22 PDT
Is there a reproducible testcase for this crash?
Comment 53 Henri Sivonen (:hsivonen) 2012-04-29 05:01:59 PDT
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #52)
> Is there a reproducible testcase for this crash?

I'm not aware of one.

Note You need to log in before you can comment on or make changes to this bug.