Closed Bug 619043 Opened 14 years ago Closed 13 years ago

crash [@ nsHtml5TreeOpExecutor::RunFlushLoop()] mainly with NoScript

Categories

(Core :: DOM: HTML Parser, defect)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla10

People

(Reporter: wsmwk, Assigned: hsivonen)

References

Details

(Keywords: crash)

Crash Data

Attachments

(1 file, 1 obsolete file)

crash [@ nsHtml5TreeOpExecutor::RunFlushLoop()]

4.0 betas
bp-54e3249d-ca51-4083-8505-ee24e2101125
EXCEPTION_ACCESS_VIOLATION_READ
0x4C
0	xul.dll	nsHtml5TreeOpExecutor::RunFlushLoop	parser/html/nsHtml5TreeOpExecutor.cpp:458
1	xul.dll	nsHtml5ExecutorFlusher::Run	parser/html/nsHtml5TreeOpExecutor.cpp:90
2	xul.dll	nsThread::ProcessNextEvent	xpcom/threads/nsThread.cpp:626
3	xul.dll	mozilla::ipc::MessagePump::Run	ipc/glue/MessagePump.cpp:110
4	xul.dll	xul.dll@0xb0bcbb	
5	xul.dll	MessageLoop::RunHandler	ipc/chromium/src/base/message_loop.cc:202
6	mozcrt19.dll	_VEC_memzero	
7	xul.dll	xul.dll@0x35996d	
8	firefox.exe	firefox.exe@0x1bb7	
9	ntdll.dll	WinSqmSetIfMaxDWORD	
10	ntdll.dll	_RtlUserThreadStart
This _literally_ exploded in yesterday's stats on 4.0*, with some very high correlations:

Modules:
90% (253/281) vs.   3% (1751/52073) avgssff4.dll
90% (252/281) vs.   3% (1765/52073) avglngx.dll
90% (253/281) vs.   4% (1832/52073) avgcfgx.dll
90% (253/281) vs.   4% (1878/52073) avgxpl.dll
90% (253/281) vs.   4% (1900/52073) avglogx.dll

Add-Ons:
85% (238/281) vs.   4% (1937/52073) {1E73965B-8B48-48be-9C8D-68B920ABC1C4}

This crash was in the range of 15-25 crashes / 1M ADU from 2011-03-20 to 2011-03-29 and on 2011-03-30 jumped to ~150 crashes / 1M ADU (-29 and -30 numbers roughly "normalized" for throttling of crashes).

This is the #24 topcrash in 4.0* numbers for 2011-03-30 alone.
Those correlations sure make it look like an AVG product is involved.
Kairo, are you sure you pasted the correlations on the right bug? I don't see this signature in the top crashes?
(In reply to comment #4)
> Kairo, are you sure you pasted the correlations on the right bug? I don't see
> this signature in the top crashes?

It's #122 in https://crash-stats.mozilla.com/topcrasher/byversion/Firefox/4.0 right now - note that this page is a 7-day report on 4.0 alone, while my number in comment #2 is a single-day number for 4.0*, as calculated by my experimental report at http://test.kairo.at/socorro/2011-03-30.4.0.explosiveness.html
Summary: crash [@ nsHtml5TreeOpExecutor::RunFlushLoop()] → crash [@ nsHtml5TreeOpExecutor::RunFlushLoop()] mainly with AVG 10.0
Checking today this stack has 1584 Windows crashes for the last 7 days.
Crash Signature: [@ nsHtml5TreeOpExecutor::RunFlushLoop()]
Can someone have a look into whether there is a correlation with the NoScript extension? From what I have seen many people having this crash had this extension (id: {73a6fe31-595d-460b-a920-fcc0f8843232})
Maybe it's some DOM bug esp. triggered by NoScript.
Here are crash reports:
https://crash-stats.mozilla.com/report/list?signature=nsHtml5TreeOpExecutor%3A%3ARunFlushLoop%28%29

They show indeed a correlation with NoScript and no longer with AVG in 6.0.2:
71% (42/59) vs.   1% (1333/114741) {73a6fe31-595d-460b-a920-fcc0f8843232} (NoScript, https://addons.mozilla.org/addon/722)
47% (28/59) vs.   9% (10758/114741) {d10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d} (Adblock Plus, https://addons.mozilla.org/addon/1865)

Stack traces are now different from the ones in comment 0 and seem to be related to IPC:
0 	xul.dll 	nsHtml5TreeOpExecutor::RunFlushLoop 	parser/html/nsHtml5TreeOpExecutor.cpp:458
1 	xul.dll 	nsHtml5ExecutorReflusher::Run 	parser/html/nsHtml5StreamParser.cpp:153
2 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:618
3 	nspr4.dll 	_MD_CURRENT_THREAD 	nsprpub/pr/src/md/windows/w95thred.c:308
4 	xul.dll 	mozilla::ipc::MessagePump::Run 	ipc/glue/MessagePump.cpp:134
5 	xul.dll 	MessageLoop::RunHandler 	ipc/chromium/src/base/message_loop.cc:202
6 	xul.dll 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:176
7 	xul.dll 	nsBaseAppShell::Run 	widget/src/xpwidgets/nsBaseAppShell.cpp:189
8 	xul.dll 	xul.dll@0xb7153f 	
9 	xul.dll 	nsAppStartup::Run 	toolkit/components/startup/nsAppStartup.cpp:222
10 	xul.dll 	XRE_main 	toolkit/xre/nsAppRunner.cpp:3686
11 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp:106
12 	firefox.exe 	__tmainCRTStartup 	obj-firefox/memory/jemalloc/crtsrc/crtexe.c:591
13 	kernel32.dll 	BaseProcessStart
Summary: crash [@ nsHtml5TreeOpExecutor::RunFlushLoop()] mainly with AVG 10.0 → crash [@ nsHtml5TreeOpExecutor::RunFlushLoop()] mainly with NoScript
Giorgio, does noscript stop a document load from an observer that fires synchronously with DOM building?
(In reply to Henri Sivonen (:hsivonen) from comment #9)
> Giorgio, does noscript stop a document load from an observer that fires
> synchronously with DOM building?

Yes it may (the observer is, yet again, for http-on-modify-request).
And this time there's no way to do otherwise without breaking NoScript (actually, my work around for bug 677050 depends on this ability). So, if this is the culprit of this crash, please fix Gecko back, or give me a way to block *and* possibly redirect requests asynchronously at that stage.
should this be tracked or fx7 or 8.  for the last several weeks its been low volume only in final releases. starting around sept 13 it started increasing in daily volume on beta and aurora channels.
         nsHtml5TreeOpExecutor::RunFlushLoop
date     total    breakdown by build
         crashes  count build, count build, ...


20110910 3  	1 7.02011082417, 
        		1 6.0.22011090213, 	1 4.0.12011041322, 
20110911 1 6.0.22011090213 	1 , 
20110912 5  	3 6.0.22011090213, 
        		1 6.0.22011090606, 	1 6.0.22011090517, 
20110913 11  	5 6.0.22011090213, 
        		2 7.02011090813, 	2 6.0.12011083020, 
        		2 6.02011081116, 
20110914 24  	11 6.0.22011090213, 
        		3 7.02011090813, 	3 7.02011082417, 
        		2 7.02011090909, 	1 8.0a22011090804, 
        		1 7.02011091214, 	1 6.0.22011090517, 
        		1 5.0.12011070718, 	1 4.0.12011041322, 
20110915 8  	5 6.0.22011090213, 
        		2 6.0.22011090517, 	1 7.02011090813,
I expect this patch to fix the crash. I didn't try to reproduce the crash or to write a test case, because I expect that to be a similar exercise in futility as with bug 677050.
Assignee: nobody → hsivonen
Status: NEW → ASSIGNED
Attachment #562041 - Flags: review?(bzbarsky)
Comment on attachment 562041 [details] [diff] [review]
Defer termination if nsIParser::Terminate() is called when it's not safe

r=me, but should you have an aTerminated test in the DidBuildModel conditional?
Attachment #562041 - Flags: review?(bzbarsky) → review+
Adding Mac and Linux specific signature, since it is almost the exact same signature. Mac and Linux crashes happen across all versions and I see No Script installed in few of the crashes.
Crash Signature: [@ nsHtml5TreeOpExecutor::RunFlushLoop()] → [@ nsHtml5TreeOpExecutor::RunFlushLoop()] [@ nsHtml5TreeOpExecutor::RunFlushLoop ]
OS: Windows Vista → All
Hardware: x86 → All
Comment on attachment 562041 [details] [diff] [review]
Defer termination if nsIParser::Terminate() is called when it's not safe

Sorry about wasting your time with an over-eager review request. (I was hoping to proceed in time for Firefox 7.) This patch can't be landed, because there are other situations where Terminate() is supposed to work while the flush loop is running.
Attachment #562041 - Attachment is obsolete: true
Attached patch Less bogus fixSplinter Review
Comment on attachment 562368 [details] [diff] [review]
Less bogus fix

Checking mParser for null is the pre-existing termination check and the methods to which I added early returns already had early returns. Also, the tryserver was green with this except for one leak that looked random.
Attachment #562368 - Flags: review?(bzbarsky)
Comment on attachment 562368 [details] [diff] [review]
Less bogus fix

r=me but could you please file a followup bug on reverting this change once bug 694101 is fixed and load start becomes async?
Attachment #562368 - Flags: review?(bzbarsky) → review+
(In reply to Boris Zbarsky (:bz) from comment #19)
> Comment on attachment 562368 [details] [diff] [review] [diff] [details] [review]
> Less bogus fix
> 
> r=me 

Thank you.

> but could you please file a followup bug on reverting this change once
> bug 694101 is fixed and load start becomes async?

Filed bug 694528.
Comment on attachment 562368 [details] [diff] [review]
Less bogus fix

Nominating for aurora and beta, since this would be a relatively simple way to reduce crashiness for NoScript users.
Attachment #562368 - Flags: approval-mozilla-beta?
Attachment #562368 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/fa65bb9a0909
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Whiteboard: [inbound]
Target Milestone: --- → mozilla10
Comment on attachment 562368 [details] [diff] [review]
Less bogus fix

We're going to let this one come up through the normal process:

1. There were only ~500 crashes total in the last week
2. This has existed since Firefox 4
3. Changes to the parser feel like they have more risk in general
Attachment #562368 - Flags: approval-mozilla-beta?
Attachment #562368 - Flags: approval-mozilla-beta-
Attachment #562368 - Flags: approval-mozilla-aurora?
Attachment #562368 - Flags: approval-mozilla-aurora-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: