Closed Bug 521668 Opened 15 years ago Closed 15 years ago

content sink re-entry causes crash [@ xul.dll@0x3ec28f ] and [@ SinkContext::DidAddContent(nsIContent*) ]

Categories

(Core :: DOM: HTML Parser, defect, P2)

All
Windows 7
defect

Tracking


RESOLVED FIXED
Tracking Status
status1.9.2 --- beta3-fixed

People

(Reporter: tdowner, Assigned: sicking)

References


Details

(Keywords: crash, dogfood, regression, Whiteboard: [#1 trunk (3.7a1) topcrash][crashkill])

Crash Data

Attachments

(3 files)

Kinda weird, but when trying to go to this page, I get these crashes (did it 3 times; two were [@ SinkContext::DidAddContent(nsIContent*) ], the other was the @xul one).
bp-032188c2-20c6-4d3f-b6b1-9fece2091011
bp-d81fe7bb-56b9-46eb-9359-217fb2091011
bp-2a44d905-cb6c-4fcb-b74c-b5a942091011
Did not crash in safe mode.
Bug 521164 may be related/the same.
This started with the Oct 13 nightly -- http://bit.ly/OSOPj for the crash stats list.  We need a regression range; it makes the nightly highly unstable and generally unusable.  (Note that I can reproducibly crash by loading bug 521164 while logged in, though with a nonstandard bugzilla skin.)
Ignore Oct 13 -- sorting by date doesn't actually seem to work in crash stats.
Just looking at the incoming daily reports, the trunk crashes look like they ramped up around Oct 5th; more reports were seen when that day's crashes went into the report produced on the 6th.

report for the stack signature SinkContext::DidAddContent 
4   total crashes for SinkContext::DidAddContent on 20091004-crashdata.csv
3   total crashes for SinkContext::DidAddContent on 20091005-crashdata.csv
1   total crashes for SinkContext::DidAddContent on 20091006-crashdata.csv
15   total crashes for SinkContext::DidAddContent on 20091007-crashdata.csv
18   total crashes for SinkContext::DidAddContent on 20091008-crashdata.csv
17   total crashes for SinkContext::DidAddContent on 20091009-crashdata.csv
17   total crashes for SinkContext::DidAddContent on 20091010-crashdata.csv
19   total crashes for SinkContext::DidAddContent on 20091011-crashdata.csv
20   total crashes for SinkContext::DidAddContent on 20091012-crashdata.csv
report for the stack signature xul.dll@0x3ec28f 
0   total crashes for xul.dll@0x3ec28f on 20091004-crashdata.csv
0   total crashes for xul.dll@0x3ec28f on 20091005-crashdata.csv
0   total crashes for xul.dll@0x3ec28f on 20091006-crashdata.csv
0   total crashes for xul.dll@0x3ec28f on 20091007-crashdata.csv
0   total crashes for xul.dll@0x3ec28f on 20091008-crashdata.csv
0   total crashes for xul.dll@0x3ec28f on 20091009-crashdata.csv
3   total crashes for xul.dll@0x3ec28f on 20091010-crashdata.csv
5   total crashes for xul.dll@0x3ec28f on 20091011-crashdata.csv
1   total crashes for xul.dll@0x3ec28f on 20091012-crashdata.csv
is bug 521164 also related?
Using the latest nightly, I can reproduce this crash 99% of the time when I visit a crash report link such as http://crash-stats.mozilla.com/report/index/032188c2-20c6-4d3f-b6b1-9fece2091011 and click on one of the related bug links. I went back to both the 10/5 and 10/6 nightly trunk builds and repeated the same STR, but wasn't able to get it to crash.

I also tested the Facebook search link, but I wasn't able to crash on any of the three builds I tested.
Not sure if it's relevant or not, but the one change in the date range that dbaron pointed at which is Windows 7 specific is the fix for bug 501490.
I have this in the debugger.  The first time I hit it, I was at SinkContext::DidAddContent+0xbignum and the rest of the stack was bogus.  But most of the time I hit it with exactly the most common signature in the crash reports.

Note that we have crashes on multiple line numbers, in order of frequency: 706, 729, 707 (based on a very rough sampling of the crash reports).

For the crash in line 706, I dug into this.  mCurrentContext is valid in ProcessLINKTag and is passed in as %esi into AddLeaf.  In AddLeaf, it's moved to %ecx before calling DidAddContent.  It's bogus in DidAddContent, and it's the same |this| as the current value in %esi in AddLeaf.  AddLeaf is simple, except for a call to Node::Add; %esi is supposed to be callee-saved, so my guess is that something inside Node::Add is clobbering %esi and/or not restoring it correctly.
This bug seems to have more activity than the preceding bug 521164, so posting here.

I got this in the debugger as well, second time it has happened for me, both times using Bugzilla. Stack: http://pastebin.mozilla.org/676493

Saved a full process dump in case anyone wants to take a look.
vlad: your analysis sounds distressingly similar to my analysis in bug 475178 comment 20. There, the compiler wasn't saving %eax prior to a function call, despite using it to store a local variable. In that case I was able to convince the compiler to produce correct code by using the return value of the function instead of ignoring it.
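For illustration, the workaround pattern referenced there is roughly the following sketch (DoWork and its use are hypothetical, not the actual bug 475178 code):

  // Before: DoWork(); with the result ignored -- the compiler kept a
  // live local in %eax across the call and miscompiled.
  // After: actually consuming the return value changed the register
  // allocation enough to produce correct code.
  nsresult rv = DoWork();
  NS_ENSURE_SUCCESS(rv, rv);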
... and I just hit this on the 1.9.2 branch.  Same symptoms, same stack, same analysis.
Flags: blocking-firefox3.6?
This is a chrome jit bug; it does not appear if chrome jit is disabled.  This is why safe mode fixes it -- safe mode forcibly disables the jit.  Reproduced with all of my extensions disabled, on both 1.9.2 and trunk.
Assignee: nobody → general
Component: General → JavaScript Engine
Flags: blocking-firefox3.6?
Product: Firefox → Core
QA Contact: general → general
Hardware: x86_64 → All
Summary: Crash [@xul.dll@0x3ec28f ] and [@SinkContext::DidAddContent(nsIContent*) ] → Chrome jit crash [@xul.dll@0x3ec28f ] and [@SinkContext::DidAddContent(nsIContent*) ]
Marking blocking 1.9.2 due to dogfood, as bugzilla very frequently triggers this.
Flags: blocking1.9.2+
Keywords: dogfood
(In reply to comment #14)
> This is a chrome jit bug; it does not appear if chrome jit is disabled.  This
> is why safe mode fixes it -- safe mode forcibly disables the jit.  Reproduced
> with all of my extensions disabled, on both 1.9.2 and trunk.

how can this be a chrome jit bug? there's no js on the stack, is there?
Disabling the JIT might move something around in memory. With no jit code on the stack at all, it's really unlikely we caused this (not impossible, just unlikely). There is a deep call stack of jit-unrelated functions. Someone in there is going to use/clobber EAX, so even if the jit messed up earlier, I don't think the damage would reach this far, and this consistently.
FYI, I manually disabled jit.chrome and went to the test URL, and after several clicks on different tabs from the menu in the upper left corner of the page I got today's trunk to crash. 

unfortunately, the crash report looks like it may be useless:

http://crash-stats.mozilla.com/report/index/fe03e30f-e051-43dd-90f7-745b52091014?p=1
I can now reproduce this with the jit entirely disabled, so the jit's off the hook I think.  But in doing so I've seen at least 5 different crash signatures/stacks, all with the same symptoms -- clobbered/incorrect registers.  I can reproduce this at will, though I haven't been able to set up a similar repro environment on the purify box.

It gets stranger, though; I changed the code in SinkContext::AddLeaf to this:

  PR_LogPrint("AddLeaf this: %p ", this);
  nsIContent *rval = mStack[mStackPos - 1].Add(aContent);
  PR_LogPrint("-> %p\n", this);

  DidAddContent(rval);

in the log file, at the end right before the crash I see:

....
0[83a140]: AddLeaf this 0: 17c618e0
0[83a140]: -> 17c618e0
0[83a140]: AddLeaf this 0: 17c618e0
0[83a140]: -> 17c618e0
0[83a140]: AddLeaf this 0: 17c618e0
0[83a140]: -> 17c618e0
0[83a140]: -> 17d66220

I'm terrified and confused.  How did we hit that second PR_LogPrint twice, without hitting the first one?  That second value is clearly bogus, and the crash happens right after when DidAddContent gets called.

This is also in a local opt build, with no PGO, and with VC9, so PGO is not to blame.
Assignee: general → nobody
Component: JavaScript Engine → General
QA Contact: general → general
Summary: Chrome jit crash [@xul.dll@0x3ec28f ] and [@SinkContext::DidAddContent(nsIContent*) ] → Crash [@xul.dll@0x3ec28f ] and [@SinkContext::DidAddContent(nsIContent*) ]
We're recursing into AddLeaf; is this valid?  Here's the stack showing how it happens.  Sometimes we crash, other times we don't after we return from the outer AddLeaf.  We often have a whole bunch of inner AddLeafs for one Add() call.
The code executing at the first js_Invoke (cx->fp->down->script) is

WindowsPreviewPerTab.jsm:108
WindowsPreviewPerTab.jsm:122
WindowsPreviewPerTab.jsm:127

which is _getFaviconAsImage, which fits, since this is adding a <link> element.  The code there does a synchronous HTTP request to get the image, so we spin up a nested event loop, which processes more data that we got on the parser listener channel before we're actually done processing the last chunk, and then very bad things happen.

I'm guessing that this bug isn't tab preview specific, but that it was very rare to hit this situation before.
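To make the hazard concrete, here is a minimal C++ sketch of the re-entrancy pattern (the real code is JS in WindowsPreviewPerTab.jsm; this helper and its name are hypothetical). A synchronous Open() on an HTTP channel blocks by processing events on the main thread, so more parser data can be delivered while the sink is still mid-AddLeaf:

  #include "nsNetUtil.h"

  // Hypothetical sketch, not the actual taskbar-preview code.
  nsresult FetchFaviconSynchronously(nsIURI* aFaviconURI)
  {
    nsCOMPtr<nsIChannel> channel;
    nsresult rv = NS_NewChannel(getter_AddRefs(channel), aFaviconURI);
    NS_ENSURE_SUCCESS(rv, rv);

    // Synchronous open: spins a nested event loop until data arrives,
    // which can deliver more OnDataAvailable calls to a parser whose
    // SinkContext::AddLeaf for this <link> is still on the stack.
    nsCOMPtr<nsIInputStream> stream;
    return channel->Open(getter_AddRefs(stream));
  }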
Component: General → HTML: Parser
QA Contact: general → parser
Group: core-security
vlad's filed bug 522416 on the tab preview thing doing Open() on an HTTP channel.  But web content could trigger this too, using sync XHR from a DOMLinkAdded handler, as vlad noted.  Or anything else that happened to run off a script runner and trigger content script.  So just making DOMLinkAdded truly async is not a real solution.

The sink clearly can't deal with reentry here, and I guess we can fix that by putting a script blocker higher up on the stack in the sink.  Will that be good enough?  Can the parser deal with reentry in this situation?

Henri, can the HTML5 parser deal with script running under its AppendChildTo/InsertChildAt operations, or whenever the script blockers it puts around those operations come off the stack?
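For reference, a script blocker here means the stack-based nsAutoScriptBlocker helper. A minimal sketch of the idea, assuming it wraps the sink's add-leaf path (this is not the patch that eventually landed):

  // Sketch only: defer script runners -- and thus DOMLinkAdded handlers --
  // until the blocker leaves scope, so nothing can re-enter the sink
  // between Add() and DidAddContent().
  nsresult SinkContext::AddLeaf(nsIContent* aContent)
  {
    nsAutoScriptBlocker scriptBlocker;

    nsIContent* rval = mStack[mStackPos - 1].Add(aContent);
    DidAddContent(rval);
    return NS_OK;
  }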
Do we feel that this issue is a P1 blocker for 3.6, or just the fact that it's being triggered by Aero Peek in bug 522416? My instinct is to move this to a P2.
I think that's correct, yeah.
Priority: P1 → P2
Summary: Crash [@xul.dll@0x3ec28f ] and [@SinkContext::DidAddContent(nsIContent*) ] → content sink re-entry causes crash [@xul.dll@0x3ec28f ] and [@SinkContext::DidAddContent(nsIContent*) ]
I think Blake/Olli/Jonas should comment on whether they think we can fix this in a way they're comfortable shipping without beta baking....  But yes, generally I think this should be OK as P2.
I'll take this.

Recursing into AddLeaf isn't really unexpected, but obviously we're not dealing with it properly.
Assignee: nobody → jonas
Even with taskbar previews disabled, I still see frequent crashes (though less frequent), with this stack:

http://crash-stats.mozilla.com/report/index/1c15c9d1-5f5c-4b8d-8101-e19ef2091016

This is I think the same as in bug 521164, and is probably directly related to this.
Whiteboard: [#1 trunk (3.7a1) topcrash]
Whiteboard: [#1 trunk (3.7a1) topcrash] → [#1 trunk (3.7a1) topcrash][#1 3.6b1 topcrash]
Whiteboard: [#1 trunk (3.7a1) topcrash][#1 3.6b1 topcrash] → [#1 trunk (3.7a1) topcrash]
still early but this is near the top of the 3.6b1 topcrash list.

25 total crashes for SinkContext::DidAddContent on 20091102-crashdata.csv
4 startup crashes within 3 minutes

os breakdown
  22 SinkContext::DidAddContent(nsIContent*) Windows NT 6.1.7600
   3 SinkContext::DidAddContent(nsIContent*) Windows NT 6.1.7100

distribution of all versions where the SinkContext::DidAddContent crash was found on 20091102-crashdata.csv
  25 Firefox 3.6b1

domains of sites
   7 http://www.facebook.com
   3 http://www.youtube.com
   2 http://vkontakte.ru
   1 http://xhamster.com
   1 http://www.teamsolutions.fr
   1 http://www.soundpharmaceuticals.com
Whiteboard: [#1 trunk (3.7a1) topcrash] → [#1 trunk (3.7a1) topcrash][crashkill]
possible dupe at bug 522286, which is not security-closed
The OS breakdown still indicates only windows 7, which is weird, since I thought we disabled all the win7-specific stuff.  I guess we still have the tab list, whatever that's called? (peek?)
Jonas, why aren't we seeing progress here?  Is there anything you need?  You mentioned issues with recursing into AddLeaf above, but have you investigated what's going on?
Summary: content sink re-entry causes crash [@xul.dll@0x3ec28f ] and [@SinkContext::DidAddContent(nsIContent*) ] → content sink re-entry causes crash [ @xul.dll@0x3ec28f ] and [@ SinkContext::DidAddContent(nsIContent*) ]
I know what's going on here and hope to have a patch today.
I'm fairly certain this isn't a regression. What's happened is that some extensions (possibly Weave) have started doing evil things in their DOMLinkAdded listener.

If someone has evidence that this is actually a regression, feel free to re-add the regression keyword.
thunder, any ideas on comment 34 ?
OK, if I look back at crash data from August, I can see traces of this signature on older builds:

grep SinkContext::DidAddContent 200908* | awk -F\t '{print $1,$8}' | sort | uniq -c
   2 20090807-crashdata.csv:SinkContext::DidAddContent(nsIContent*) 3.5.1
   2 20090812-crashdata.csv:SinkContext::DidAddContent(nsIContent*) 3.5
   1 20090814-crashdata.csv:SinkContext::DidAddContent(nsIContent*) 3.0.1
   1 20090815-crashdata.csv:SinkContext::DidAddContent(nsIContent*) 3.5.2
   1 20090821-crashdata.csv:SinkContext::DidAddContent(nsIContent*) 3.5.2
   1 20090825-crashdata.csv:SinkContext::DidAddContent(nsIContent*) 3.5.2
   1 20090830-crashdata.csv:SinkContext::DidAddContent(nsIContent*) 3.0.10

It also looks like 3.5.3, 3.5.4, and 3.0.x are in the 0-2 crashes per day range on this signature, while trunk and 3.6b1 are around 24 to 25 crashes per day.
Can you group by OS?  Would be interesting to know what proportion of the trunk and 3.6b1 crashes are on Win7, since we know that tickled the bug aggressively.
(In reply to comment #35)
> thunder, any ideas on comment 34 ?
From the looks of the stack, I'd say it's because Weave likes to spin the event loop.  We have a nested event loop in that stack.  This is assuming that the JS on that stack is Weave.
There seem to be several issues here. It's unclear to me whether there are just several ways to trigger the same bug, or if it's in fact several different bugs.

I'm currently writing up a testcase which reproduces the spinning of the event loop, à la the stack in attachment 406382 [details], and then a fix for that crash. Then I'll look at whether the fix might cover other ways to crash at the same signature.
(In reply to comment #37)
> Can you group by OS?  Would be interesting to know what proportion of the trunk
> and 3.6b1 crashes are on Win7, since we know that tickled the bug aggressively.

Yeah, it looks like it's pretty uniformly divided.  The 3.6/3.7 crashes are on NT 6.1, and the 3.0.x and 3.5.x crashes are on older versions of Windows.  Here is the data for Oct/Nov:

crash counts - sig - firefox version - os version 

 326 SinkContext::DidAddContent(nsIContent*) 3.6b1 Windows NT 6.1.7600
 167 SinkContext::DidAddContent(nsIContent*) 3.7a1pre Windows NT 6.1.7600
 122 SinkContext::DidAddContent(nsIContent*) 3.6b1 Windows NT 6.1.7100
  83 SinkContext::DidAddContent(nsIContent*) 3.7a1pre Windows NT 6.1.7100
  11 SinkContext::DidAddContent(nsIContent*) 3.6b2pre Windows NT 6.1.7100
  11 SinkContext::DidAddContent(nsIContent*) 3.6b1pre Windows NT 6.1.7100
   6 SinkContext::DidAddContent(nsIContent*) 3.7a1pre Windows NT 6.1.7232
   5 SinkContext::DidAddContent(nsIContent*) 3.6b2pre Windows NT 6.1.7600
   4 SinkContext::DidAddContent(nsIContent*) 3.6b1pre Windows NT 6.1.7600
   3 SinkContext::DidAddContent(nsIContent*) 3.6b2pre Windows NT 6.1.7229 Service Pack 2
   3 SinkContext::DidAddContent(nsIContent*) 3.6b1 Windows NT 6.1.7137
   2 SinkContext::DidAddContent(nsIContent*) 3.7a1pre Windows NT 6.1.7077
   2 SinkContext::DidAddContent(nsIContent*) 3.6b1 Windows NT 6.1.7260
   1 SinkContext::DidAddContent(nsIContent*) 3.7a1pre Windows NT 6.1.7600 Service Pack 2
   1 SinkContext::DidAddContent(nsIContent*) 3.7a1pre Windows NT 6.1.7229
   1 SinkContext::DidAddContent(nsIContent*) 3.7a1pre Windows NT 6.1.7201
   1 SinkContext::DidAddContent(nsIContent*) 3.6b2pre Windows NT 6.1.7057
   1 SinkContext::DidAddContent(nsIContent*) 3.6b1 Windows NT 6.1.7600 Service Pack 2
   1 SinkContext::DidAddContent(nsIContent*) 3.6b1 Windows NT 6.1.7264
   1 SinkContext::DidAddContent(nsIContent*) 3.6b1 Windows NT 6.1.7229
   1 SinkContext::DidAddContent(nsIContent*) 3.6b1 Windows NT 6.1.7201
   1 SinkContext::DidAddContent(nsIContent*) 3.6b1 Windows NT 6.1.7068
   1 SinkContext::DidAddContent(nsIContent*) 3.6b1 Windows NT 6.1.7057
 


  23 SinkContext::DidAddContent(nsIContent*) 3.0.14 Windows NT 5.1.2600 Service Pack 2
  22 SinkContext::DidAddContent(nsIContent*) 3.0.14 Windows NT 5.1.2600 Service Pack 3
   8 SinkContext::DidAddContent(nsIContent*) 3.0.14 Windows NT 5.1.2600 Dodatek Service Pack 3
   4 SinkContext::DidAddContent(nsIContent*) 3.5.3 Windows NT 5.1.2600 Service Pack 2
   1 SinkContext::DidAddContent(nsIContent*) 3.5.4 Windows NT 6.0.6002 Service Pack 2
   1 SinkContext::DidAddContent(nsIContent*) 3.5.4 Windows NT 5.1.2600 Service Pack 3
   1 SinkContext::DidAddContent(nsIContent*) 3.5.3 Windows NT 5.1.2600 Service Pack 3
   1 SinkContext::DidAddContent(nsIContent*) 3.0.14 Windows NT 5.1.2600 Dodatek Service Pack 2
So it appears things are more complicated than I had hoped.

I have a testcase which recurses into AddLeaf, and the code is dealing with it just fine. So maybe it's only failing under certain other conditions, or the recursion was a red herring.

Will have to continue investigation tomorrow.
Oh, and in the meantime, if someone is able to reliably reproduce, do let me know.
I don't think all recursion is broken, just that sometimes we end up in very much the wrong state.  I would suggest going back to one of the previous builds, before we fixed the sync http stuff, and running on Windows 7 -- it's usually trivially reproducible (especially when opening lots of bugzilla pages), and can then be reproduced on demand with session restore.
Actually, I got another idea on the drive home. If the context is going away during the recursion, that should definitely kill us. Should be easy to try using a misplaced-table-content context. Will try this tomorrow.

I'll also try with a build before Shawn fixed the tab preview code.
I still don't understand how this can possibly handle reentry correctly in general.  If we reenter under Add, won't we have mStackPos and such totally unrelated to aContent by the time we get into DidAddContent for the first Add?  In particular, won't the "immediate notification" case be completely screwed?
Jonas: I can reproduce this reliably using the QA x64 Win 7 lab machine (using an older build, Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.3a1pre) Gecko/20091010 Minefield/3.7a1pre GTB6). I also installed Weave on that build.
Marcia, Awesome!

Ah, indeed, we risk notifying every which way. Though the interesting thing is that we only notify if we're in a table-misplaced-content-context. Or rather, we only notify in a way that would break in that case.

Hmm.. In general it'll be hard to write testcases that cover all possibilities here. We really need to fuzz this. Though that turns out to be hard because DOMLinkAdded is only fired on chrome these days. So it really needs to be chrome that is misbehaving. Unless there are other elements that also fire events in the same way, but I can't find any.

I'll chat with Jesse to see if we can figure out a way to fuzz here. In the meantime I'll write up a patch and see if it helps with Marcia's steps to reproduce.

Adding back the regression keyword since it really seems like it's the Windows toolbar code that's breaking us, though the bug is very arguably still in the content sink.
Keywords: regression
(In reply to comment #47)
> Adding back regression keyword since it really seems like it's the windows
> toolbar code that's breaking us, though the bug is very arguably in the
> contentsink still.

Then we should do a regression test now that we can reproduce it reliably.
Wow, this code is mind-boggling; the notification stuff *might* actually be working as it is. It may be that FlushTags will notify on the node properly if script inside the DOMLinkAdded handler does something to the DOM. In that case mNumFlushed will be set to the child count, and DidAddContent won't do anything.

We've known for a long time that this code is too complex, and that this batching crap needs to go, which is happening in the HTML5 parser code.

More investigating coming up...
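To illustrate the invariant described above, here is a heavily simplified sketch of the bookkeeping (member names are from the real sink, but the real DidAddContent has more cases than this):

  // If a re-entrant FlushTags() already notified for this parent,
  // mNumFlushed has caught up with the child count and there is
  // nothing left for DidAddContent to do.
  void SinkContext::DidAddContentSketch(nsIContent* aContent)
  {
    Node& node = mStack[mStackPos - 1];
    PRUint32 childCount = node.mContent->GetChildCount();
    if (node.mNumFlushed < childCount) {
      mSink->NotifyAppend(node.mContent, node.mNumFlushed);
      node.mNumFlushed = childCount;
    }
  }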
This is not a full fix for the bug, but it's something I think we should do. We already prevent the parser from pulling in more network data while we're executing a <script> element; however, that doesn't protect us if we're executing scripts for other reasons. This patch should take care of that.

Blake, since this is parser land it would be great to get your input on whether you think this is useful and safe enough to go on the branch. I have a plan for fixing this bug in other ways, so it might not be strictly needed.
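A minimal sketch of the guard being described, assuming it's an early return in the parser's data pump (the real change is attachment 410728; ScriptIsExecuting() is a hypothetical predicate):

  // Sketch only: refuse to consume more network data whenever script is
  // on the stack, mirroring the existing protection for <script>.
  nsresult MaybeBlockParsing()
  {
    if (ScriptIsExecuting()) {
      // A DOMLinkAdded handler (or any other script runner) is running;
      // feeding the sink more data now would re-enter it mid-operation.
      // NS_ERROR_HTMLPARSER_BLOCK is the parser's existing "resume me
      // later" code.
      return NS_ERROR_HTMLPARSER_BLOCK;
    }
    return NS_OK;
  }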
Attachment #410728 - Flags: review?(mrbkap)
Attached patch: Should fix it (Splinter Review)
This patch should be enough to fix things. I've pushed a patch to the tryserver so I can try a build on the boxes where I am able to reproduce.

I'm a little uneasy about not knowing exactly why recursing into the sink again fails. I'd imagine it's the notification stuff, but I'd like to be sure. In any event, this patch should make it so that the only place where we recurse back into the sink is from <script> elements, which has received significant testing over the years.

Will ask for review once I'm more sure that this patch fixes the bug.

Talked to Jesse about fuzzing the sink (it would be great for the HTML5 sink), but it's non-trivial. Something we should do for sure, though.
Attachment #410728 - Flags: review?(mrbkap) → review+
Jonas, would that patch happen to fix bug 467005 on branches too?  Or is that happening from under a <script> element so still allowed?
Attachment #410732 - Flags: superreview?(jst)
Attachment #410732 - Flags: review?(bnewman)
Comment on attachment 410732 [details] [diff] [review]
Should fix it

Tried this on the machine and profile where we could reliably reproduce this before, and it's no longer crashing with this patch. So asking for reviews.
(In reply to comment #53)
> Jonas, would that patch happen to fix bug 467005 on branches too?  Or is that
> happening from under a <script> element so still allowed?

Hmm.. I'll try, but I don't think so. The document.write there shouldn't be affected. I think. (can't wait to have code we actually fully understand)
Comment on attachment 410732 [details] [diff] [review]
Should fix it

Excellent! sr=jst
Attachment #410732 - Flags: superreview?(jst) → superreview+
Comment on attachment 410732 [details] [diff] [review]
Should fix it

Ben Turner said he might be able to review this today
Attachment #410732 - Flags: review?(bnewman) → review?(bent.mozilla)
Comment on attachment 410732 [details] [diff] [review]
Should fix it

r=me. After talking with sicking more, this seems like the best approach given the fragility of the sink and the impending HTML5 parser landing.
Attachment #410732 - Flags: review?(bent.mozilla) → review+
http://hg.mozilla.org/mozilla-central/rev/6da836add4b9
http://hg.mozilla.org/mozilla-central/rev/8abdde9f628c

Pushed to trunk. I'd lean towards just taking attachment 410732 [details] [diff] [review] on the branch. That one seems safer and should be enough to fix the problem here.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Summary: content sink re-entry causes crash [ @xul.dll@0x3ec28f ] and [@ SinkContext::DidAddContent(nsIContent*) ] → content sink re-entry causes crash [@ xul.dll@0x3ec28f ] and [@ SinkContext::DidAddContent(nsIContent*) ]
Crash Signature: [@ xul.dll@0x3ec28f ] [@ SinkContext::DidAddContent(nsIContent*) ]
Group: core-security