Closed Bug 616421 Opened 9 years ago Closed 8 months ago

crash [@ nsHtml5TreeOperation::Perform(nsHtml5TreeOpExecutor*, nsIContent**) ]

Categories

(Core :: HTML: Parser, defect, critical)

Tracking

RESOLVED INCOMPLETE
Tracking Status
firefox47 --- affected
firefox48 --- fixed
firefox49 --- affected
firefox-esr45 --- affected
firefox50 --- affected
firefox51 --- affected
firefox52 --- wontfix

People

(Reporter: scoobidiver, Assigned: njn)

References

Details

(Keywords: crash, Whiteboard: [tbird crash])

Crash Data

Attachments

(2 files)

It is a residual crash signature that exists in the trunk builds.
It is the #263 top crasher in 4.0b7 over the last week.

Signature	nsHtml5TreeOperation::Perform(nsHtml5TreeOpExecutor*, nsIContent**)
UUID	b3ce6b53-6ea7-43c4-a5b5-ecc882101202
Time 	2010-12-02 20:17:28.445494
Uptime	1634
Last Crash	458556 seconds (5.3 days) before submission
Install Age	17733 seconds (4.9 hours) since version was first installed.
Product	Firefox
Version	4.0b8pre
Build ID	20101202030316
Branch	2.0
OS	Windows NT
OS Version	5.1.2600 Service Pack 3
CPU	x86
CPU Info	GenuineIntel family 6 model 23 stepping 10
Crash Reason	EXCEPTION_ACCESS_VIOLATION_READ
Crash Address	0x1afcad8
App Notes 	AdapterVendorID: 8086, AdapterDeviceID: 29c2
MSAFD Tcpip [TCP/IP] : 2 : 1 :
MSAFD Tcpip [UDP/IP] : 2 : 2 : %SystemRoot%\system32\mswsock.dll
MSAFD Tcpip [RAW/IP] : 2 : 3 : %SystemRoot%\system32\mswsock.dll
RSVP UDP Service Provider : 6 : 2 : %SystemRoot%\system32\rsvpsp.dll
RSVP TCP Service Provider : 6 : 1 : %SystemRoot%\system32\rsvpsp.dll
MSAFD NetBIOS [\Device\NetBT_Tcpip_{BE5C30CD-B017-4054-AC27-3638F43693E7}] SEQPACKET 0 : 2 : 5 : %SystemRoot%\system32\mswsock.dll
MSAFD NetBIOS [\Device\NetBT_Tcpip_{BE5C30CD-B017-4054-AC27-3638F43693E7}] DATAGRAM 0 : 2 : 2 : %SystemRoot%\system32\mswsock.dll
MSAFD NetBIOS [\Device\NetBT_Tcpip_{8647200B-DB04-4800-AFEA-7A5D232122D5}] SEQPACKET 1 : 2 : 5 : %SystemRoot%\system32\mswsock.dll
MSAFD NetBIOS [\Device\NetBT_Tcpip_{8647200B-DB04-4800-AFEA-7A5D232122D5}] DATAGRAM 1 : 2 : 2 : %SystemRoot%\system32\mswsock.dll
MSAFD NetBIOS [\Device\NetBT_Tcpip_{9230AC61-0AAD-4A13-92F7-02591246DD5D}] SEQPACKET 2 : 2 : 5 : %SystemRoot%\syste

Frame 	Module 	Signature 	Source
0 	xul.dll 	nsHtml5TreeOperation::Perform 	parser/html/nsHtml5TreeOperation.cpp:493
1 	xul.dll 	nsHtml5TreeOpExecutor::RunFlushLoop 	parser/html/nsHtml5TreeOpExecutor.cpp:509
2 	xul.dll 	nsHtml5ExecutorReflusher::Run 	parser/html/nsHtml5TreeOpExecutor.cpp:90
3 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:626
4 	nspr4.dll 	_MD_CURRENT_THREAD 	nsprpub/pr/src/threads/combined/prulock.c:404
5 	nspr4.dll 	_MD_CURRENT_THREAD 	nsprpub/pr/src/threads/combined/prulock.c:404
6 	xul.dll 	mozilla::ipc::MessagePump::Run 	ipc/glue/MessagePump.cpp:110
7 	xul.dll 	MessageLoop::RunHandler 	ipc/chromium/src/base/message_loop.cc:202
8 	xul.dll 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:176
9 	xul.dll 	nsBaseAppShell::Run 	widget/src/xpwidgets/nsBaseAppShell.cpp:192
10 	xul.dll 	xul.dll@0xb0cd0b 	
11 	xul.dll 	nsAppStartup::Run 	toolkit/components/startup/src/nsAppStartup.cpp:191
12 	xul.dll 	XRE_main 	toolkit/xre/nsAppRunner.cpp:3691
13 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp:128
14 	firefox.exe 	__tmainCRTStartup 	obj-firefox/memory/jemalloc/crtsrc/crtexe.c:591
15 	kernel32.dll 	BaseProcessStart 	

More reports at:
http://crash-stats.mozilla.com/report/list?product=Firefox&query_search=signature&query_type=exact&query=&range_value=4&range_unit=weeks&hang_type=any&process_type=any&plugin_field=&plugin_query_type=&plugin_query=&do_query=1&admin=&signature=nsHtml5TreeOperation%3A%3APerform%28nsHtml5TreeOpExecutor*%2C%20nsIContent**%29
I am guessing this is a matter of the nsIContent handle nsTArray failing to allocate a handle. Making this depend on bug 610823. If the problem persists after bug 610823 has been fixed, let's investigate more.
Depends on: 610823
Whiteboard: [waiting for bug 610823]
Crash Signature: [@ nsHtml5TreeOperation::Perform(nsHtml5TreeOpExecutor*, nsIContent**) ]
(In reply to comment #1)
> I am guessing this is a matter of the nsIContent handle nsTArray failing to
> allocate a handle. Making this depend on bug 610823. If the problem persists
> after bug 610823 has been fixed, let's investigate more.

So, bug 610823 has been fixed, but my brother has seen this crash on his computer:

https://crash-stats.mozilla.com/report/index/73984456-422f-4aab-ba2e-571f32110715
0 	xul.dll 	nsHtml5TreeOperation::Perform 	parser/html/nsHtml5TreeOperation.cpp:279
1 	xul.dll 	nsHtml5TreeOpExecutor::RunFlushLoop 	parser/html/nsHtml5TreeOpExecutor.cpp:489
2 	xul.dll 	nsHtml5ExecutorFlusher::Run 	parser/html/nsHtml5TreeOpExecutor.cpp:90
3 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:618
4 	xul.dll 	nsThread::GetObserver 	xpcom/threads/nsThread.cpp:692
5 	xul.dll 	nsThread::Shutdown 	xpcom/threads/nsThread.cpp:481
6 	xul.dll 	NS_InvokeByIndex_P 	xpcom/reflect/xptcall/src/md/win32/xptcinvoke.cpp:102
7 	xul.dll 	nsProxyObjectCallInfo::Run 	xpcom/proxy/src/nsProxyEvent.cpp:182
8 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:618
9 	xul.dll 	nsThread::GetObserver 	xpcom/threads/nsThread.cpp:692
10 	xul.dll 	nsThread::Shutdown 	xpcom/threads/nsThread.cpp:481
11 	xul.dll 	NS_InvokeByIndex_P 	xpcom/reflect/xptcall/src/md/win32/xptcinvoke.cpp:102
12 	xul.dll 	nsProxyObjectCallInfo::Run 	xpcom/proxy/src/nsProxyEvent.cpp:182
13 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:618
14 	xul.dll 	mozilla::ipc::MessagePump::Run 	ipc/glue/MessagePump.cpp:110
15 	xul.dll 	xul.dll@0xb60d87 	
16 	xul.dll 	MessageLoop::RunHandler 	ipc/chromium/src/base/message_loop.cc:202
17 	xul.dll 	xul.dll@0x36a1ff 	
18 	xul.dll 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:176
19 	xul.dll 	nsJARURI::QueryInterface 	modules/libjar/nsJARURI.cpp:75 

Does this mean that the problem still exists?
Can the line number in the stack trace be trusted? The right hg version is http://hg.mozilla.org/releases/mozilla-beta/file/tip/parser/html/nsHtml5TreeOperation.cpp#l279 right? It doesn't make sense for that line to crash. If the method Append() is getting inlined, shouldn't the line numbers point to lines inside Append()?

Does it actually make sense for nsJARURI::QueryInterface to be spinning the main thread event loop?
Whiteboard: [waiting for bug 610823]
I got crashes with this signature using a recent Thunderbird Nightly. The first one occurred right after fetching e-mail, the others right at start-up, even in Safe Mode.

bp-8d54c28f-bda3-4b69-8088-ed3472110829
bp-33ccf474-c0be-457d-b0c7-5ca082110829
bp-881854f3-a750-48ad-9824-66a7b2110829
bp-9aa2e773-5a93-4fc3-a438-086392110829

Unlike the crashes already mentioned, these have
Crash Reason	EXCEPTION_ILLEGAL_INSTRUCTION
and the Stack looks rather short (corrupted? invalid?).
(In reply to XtC4UaLL [:xtc4uall] from comment #4)
> and the Stack looks rather short (corrupted? invalid?).

The stacks look completely bogus to me. If you look at what is supposedly calling what, the source has no such calls.
Are there any workarounds for this issue? We have had two questions about it this week.
[https://support.mozilla.org/en-US/questions/991262]
Whiteboard: [tbird crash]
I get this crash what feels like every few weeks, often when opening links on Twitter.  Unfortunately, I can't reproduce it reliably at all.

c378e70d-84ae-4195-b765-d95302140804
(In reply to Doug Turner (:dougt) from comment #7)
> https://crash-stats.mozilla.com/report/index/6d5a90c9-849b-40bc-90e9-
> 2a78c2140508

(In reply to Andrew McCreight [:mccr8] from comment #8)
> I get this crash what feels like every few weeks, often when opening links
> on Twitter.  Unfortunately, I can't reproduce it reliably at all.
> 
> c378e70d-84ae-4195-b765-d95302140804

Hmm. Unlike the earlier crashes, these two have stacks that look real.

The tree op enum has different values for opt and debug builds, which in retrospect is probably not smart. I guess the next step is removing that distinction and seeing if the crashing line changes when the enum value 0 signals an uninitialized enum. (If the crashes still happen in the "append" op, then the enum is OK but a pointer is bad in its own right.)
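To illustrate the idea of reserving 0 for "uninitialized", here is a minimal sketch, not the actual parser code. The names TreeOpCode, TreeOp, Init and IsInitialized are hypothetical and the opcode list is heavily trimmed; the only point is that the numbering is identical in every build type and that a never-initialized op reads back as the sentinel.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily trimmed opcode list (the real enum is much longer).
// The numbering is the same in debug and optimized builds, and 0 is reserved
// so that zero-filled memory reads back as "uninitialized".
enum class TreeOpCode : std::uint8_t {
  Uninitialized = 0,
  Append = 1,
  Detach = 2,
};

struct TreeOp {
  // Default to the sentinel so a forgotten Init() is detectable later.
  TreeOpCode mOpCode = TreeOpCode::Uninitialized;

  // Assign the real opcode exactly once.
  void Init(TreeOpCode aOpCode) {
    assert(mOpCode == TreeOpCode::Uninitialized && "op initialized twice");
    mOpCode = aOpCode;
  }

  bool IsInitialized() const { return mOpCode != TreeOpCode::Uninitialized; }
};
```

With a layout like this, a crash on the sentinel value points at an op that was never set up, while a crash inside a real op such as "append" points at a bad pointer instead.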
Assignee: nobody → hsivonen
Landing this would help narrow down the cause of this crash (once more crashes are seen), but I guess first I should find out if explicitly zeroing all those tree ops matters for performance.
I just crashed again with this stack.
This was after I tried to submit a form.
Seen on Mac, so not Windows specific.
OS: Windows XP → All
Hardware: x86 → All
Comment on attachment 8475060 [details] [diff] [review]
Distinguish bogus enum crashes

smaug, what do you think about landing a patch like this in order to make crashes due to uninitialized tree ops and crashes due to initialized tree ops with bad pointers have different stacks? (In order to know which case should be investigated.)
Attachment #8475060 - Flags: review?(bugs)
Comment on attachment 8475060 [details] [diff] [review]
Distinguish bogus enum crashes

Oh, we should definitely initialize mOpCode to something.
Attachment #8475060 - Flags: review?(bugs) → review+
(In reply to Olli Pettay [:smaug] from comment #15)
> Comment on attachment 8475060 [details] [diff] [review]
> Distinguish bogus enum crashes
> 
> Oh, we should definitely initialize mOpCode to something.

Thanks.

https://hg.mozilla.org/integration/mozilla-inbound/rev/e8c2cd5bc9e3
Keywords: leave-open
Got a crash after the debugging patch from comment 18 landed:

https://crash-stats.mozilla.com/report/index/09f70a8b-17c8-4d48-a3f6-59b452141006

Looks like I am crashing at the "Bogus tree op" line that was modified.
smaug, do you have ideas how we could end up with zeroed memory as a supposedly valid item in the nsTArray used as the op queue?

I checked that:
 1) Whenever the tree builder appends an element, it always initializes the op code.
 2) The other queues only receive ops by nsTArray move or swap operations.
 3) Removals are by nsTArray operations. (There's only one case that's not Clear() and if that case was bogus, I'd expect many more problems.)

The locking code also seems to be there as designed.

I wonder if I should review all the non-locking cases to see that non-locking code paths are never used when locking is supposed to be used. I haven't audited that angle yet.
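For readers unfamiliar with the pattern being audited here, the following is a rough sketch of a queue that only receives initialized ops and hands them over by swap while holding a lock. It uses std::vector and std::mutex as stand-ins for nsTArray and the real locking; StageQueue, Op, Append and TakeAll are hypothetical names, not the parser's API.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// 0 stands for "uninitialized" in this sketch.
struct Op {
  int mOpCode = 0;
};

// Hypothetical queue shared between a producer (parser) thread and a
// consumer (main) thread.
class StageQueue {
 public:
  // Producer side: an op is fully initialized before it becomes visible.
  void Append(int aOpCode) {
    assert(aOpCode != 0 && "producer must pass a real opcode");
    Op op;
    op.mOpCode = aOpCode;
    std::lock_guard<std::mutex> lock(mMutex);
    mOps.push_back(op);
  }

  // Consumer side: take everything in one swap while holding the lock, so
  // the consumer never observes a half-written element.
  std::vector<Op> TakeAll() {
    std::vector<Op> taken;
    std::lock_guard<std::mutex> lock(mMutex);
    taken.swap(mOps);
    return taken;
  }

 private:
  std::mutex mMutex;
  std::vector<Op> mOps;
};
```

If every code path goes through functions like these, a zeroed element should not be reachable, which is why a path that skips the lock is the remaining suspect.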
Flags: needinfo?(bugs)
Flags: needinfo?(bugs)
Still crashing with this stack, I guess Olli has no ideas either?
Yeah, I was actually looking at crash-stats for this last week hoping that hsivonen's recent-ish
changes would have fixed this, but no.
Crash Signature: [@ nsHtml5TreeOperation::Perform(nsHtml5TreeOpExecutor*, nsIContent**) ] → [@ nsHtml5TreeOperation::Perform(nsHtml5TreeOpExecutor*, nsIContent**) ] [@ nsHtml5TreeOperation::Perform ]
In the past 7 days, 1302 crashes with this signature have occurred. This makes it the #51 topcrash on 46.0.1.

Of those 1302, 491 are due to hitting MOZ_CRASH("Bogus tree op"). We should add some diagnostics to better understand that case -- is the tree op a valid but unexpected one, or is it truly a bogus op?
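One way to separate those cases is sketched below, under stated assumptions: the opcode list, the kOpCodeCount bound and the DiagnoseOpCode helper are all hypothetical, and the returned strings are illustrative rather than the messages used in the tree. The sketch classifies a raw opcode as uninitialized, handled, valid but unexpected, or out of range.

```cpp
#include <cstdint>
#include <string>

// Hypothetical, trimmed opcode list. kOpCodeCount marks the end of the
// valid range; anything at or above it cannot be a real opcode.
enum OpCode : std::uint8_t {
  kUninitialized = 0,
  kAppend = 1,
  kDetach = 2,
  kMarkMalformed = 3,  // valid, but not handled by this particular switch
  kOpCodeCount = 4,
};

// Return a distinct diagnostic string for each way an opcode can be wrong.
inline std::string DiagnoseOpCode(std::uint8_t aRaw) {
  if (aRaw == kUninitialized) {
    return "uninitialized tree op";
  }
  if (aRaw >= kOpCodeCount) {
    return "out-of-range tree op " + std::to_string(static_cast<unsigned>(aRaw));
  }
  switch (aRaw) {
    case kAppend:
    case kDetach:
      return "handled";
    default:
      return "valid but unexpected tree op " +
             std::to_string(static_cast<unsigned>(aRaw));
  }
}
```

Crashing with a different message per branch would let crash-stats show which of the cases actually occurs in the field.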
This may help understand the "Bogus tree op" crashes.
Attachment #8758581 - Flags: review?(hsivonen)
Assignee: hsivonen → n.nethercote
Status: NEW → ASSIGNED
Attachment #8758581 - Flags: review?(hsivonen) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/2313feadbdaf1e2a6a92ef5f6c0d0e6bd34121b2
Bug 616421 - Better distinguish invalid mOpCode values in nsHtml5TreeOperation::Perform. r=hsivonen.
Comment on attachment 8758581 [details] [diff] [review]
Better distinguish invalid mOpCode values in nsHtml5TreeOperation::Perform

Approval Request Comment
[Feature/regressing bug #]: HTML parsing.

[User impact if declined]: This diagnostic patch may give insight into the cause of some MOZ_CRASH aborts.

[Describe test coverage new/current, TreeHerder]: HTML parsing is heavily exercised in many tests.

[Risks and why]: Negligible. Patch is very simple.

[String/UUID change made/needed]: none.
Attachment #8758581 - Flags: approval-mozilla-aurora?
Comment on attachment 8758581 [details] [diff] [review]
Better distinguish invalid mOpCode values in nsHtml5TreeOperation::Perform

diagnostic patch, taking it
Attachment #8758581 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8475060 - Flags: checkin+
Attachment #8758581 - Flags: checkin+
AFAICT we are now getting roughly equal numbers of MOZ_CRASH(Bogus tree op) and MOZ_CRASH(eTreeOpUninitialized) crashes.
Crash volume for signature 'nsHtml5TreeOperation::Perform':
 - nightly(version 50):26 crashes from 2016-06-06.
 - aurora (version 49):73 crashes from 2016-06-07.
 - beta   (version 48):680 crashes from 2016-06-06.
 - release(version 47):7670 crashes from 2016-05-31.
 - esr    (version 45):537 crashes from 2016-04-07.

Crash volume on the last weeks:
            W. N-1  W. N-2  W. N-3  W. N-4  W. N-5  W. N-6  W. N-7
 - nightly       7       6       4       1       2       4       2
 - aurora        8      13      10      12       5       9      14
 - beta         95      87     101      94      83     102      96
 - release     983    1105    1090    1096    1055    1082     949
 - esr          50      50      51      39      53      63      62

Affected platforms: Windows, Mac OS X, Linux
Crash volume for signature 'nsHtml5TreeOperation::Perform':
 - nightly (version 51): 8 crashes from 2016-08-01.
 - aurora  (version 50): 32 crashes from 2016-08-01.
 - beta    (version 49): 190 crashes from 2016-08-02.
 - release (version 48): 795 crashes from 2016-07-25.
 - esr     (version 45): 656 crashes from 2016-05-02.

Crash volume on the last weeks (Week N is from 08-22 to 08-28):
            W. N-1  W. N-2  W. N-3
 - nightly       0       1       2
 - aurora       11       9       8
 - beta         62      63      26
 - release     248     212     108
 - esr          57      45      49

Affected platforms: Windows, Mac OS X, Linux

Crash rank on the last 7 days:
           Browser   Content     Plugin
 - nightly           #127
 - aurora            #67
 - beta    #332      #133
 - release #68       #68
 - esr     #140
Crash volume for signature 'nsHtml5TreeOperation::Perform':
 - nightly (version 52): 2 crashes from 2016-09-19.
 - aurora  (version 51): 5 crashes from 2016-09-19.
 - beta    (version 50): 39 crashes from 2016-09-20.
 - release (version 49): 534 crashes from 2016-09-05.
 - esr     (version 45): 810 crashes from 2016-06-01.

Crash volume on the last weeks (Week N is from 10-03 to 10-09):
            W. N-1  W. N-2
 - nightly       1       1
 - aurora        5       0
 - beta         36       3
 - release     439      95
 - esr          77      66

Affected platforms: Windows, Mac OS X, Linux

Crash rank on the last 7 days:
           Browser   Content     Plugin
 - nightly           #986
 - aurora            #204
 - beta    #833      #162
 - release #249      #46
 - esr     #148
Too late for firefox 52, mass-wontfix.
See Also: → 1478581

The leave-open keyword is there and there is no activity for 6 months.
:njn, maybe it's time to close this bug?

Flags: needinfo?(n.nethercote)

Sure.

Status: ASSIGNED → RESOLVED
Closed: 8 months ago
Flags: needinfo?(n.nethercote)
Resolution: --- → INCOMPLETE