Created attachment 351659 [details]
See testcase, which crashes current trunk build within 100ms.
This regressed between 2008-09-11 and 2008-09-12:
I think a regression from bug 454361 somehow.
0 xul.dll nsCSSFrameConstructor::AdjustParentFrame layout/base/nsCSSFrameConstructor.cpp:3407
1 xul.dll nsCSSFrameConstructor::ConstructFrameInternal layout/base/nsCSSFrameConstructor.cpp:7494
2 xul.dll nsCSSFrameConstructor::ConstructFrame layout/base/nsCSSFrameConstructor.cpp:7406
3 xul.dll nsCSSFrameConstructor::ContentInserted layout/base/nsCSSFrameConstructor.cpp:8997
4 xul.dll nsCSSFrameConstructor::RecreateFramesForContent layout/base/nsCSSFrameConstructor.cpp:11143
5 xul.dll nsCSSFrameConstructor::ProcessRestyledFrames layout/base/nsCSSFrameConstructor.cpp:9868
6 xul.dll nsCSSFrameConstructor::RestyleElement layout/base/nsCSSFrameConstructor.cpp:9942
7 xul.dll nsCSSFrameConstructor::ProcessOneRestyle layout/base/nsCSSFrameConstructor.cpp:13280
8 xul.dll nsCSSFrameConstructor::ProcessPendingRestyles layout/base/nsCSSFrameConstructor.cpp:13378
9 xul.dll PresShell::DoFlushPendingNotifications layout/base/nsPresShell.cpp:4541
10 xul.dll PresShell::FlushPendingNotifications layout/base/nsPresShell.cpp:4505
11 xul.dll nsCSSFrameConstructor::RestyleEvent::Run layout/base/nsCSSFrameConstructor.cpp:13451
12 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:510
13 xul.dll nsBaseAppShell::Run widget/src/xpwidgets/nsBaseAppShell.cpp:170
14 nspr4.dll PR_GetEnv
15 firefox.exe wmain toolkit/xre/nsWindowsWMain.cpp:87
16 firefox.exe firefox.exe@0x2197
17 kernel32.dll BaseProcessStart
Because of the load delay by loading it over the network, there is a chance it doesn't crash. In that case, load the testcase in offline mode or test the testcase locally.
I don't see a crash on Linux, either from the network or from a local file, although I do see some assertions and warnings.
I crash on Windows from a local file, though.
The variable parentFrame in ContentInserted is full of 0xdddddddd, and then we crash when we start accessing it.
And the thing we're calling RecreateFramesForContent on is the tooltip element.
Created attachment 352799 [details]
This is probably the same issue.
Created attachment 356113 [details]
testcase2 crashes reliably on Linux.
Created attachment 357202 [details]
stack of node removal
This is from debugging testcase2.
So I think the problem is related to this stack: we're removing the node here, and then on unwind from here, we're then trying to do frame construction on a node that's no longer in the document, which crashes.
Before this point, we get:
###!!! ASSERTION: killing mutation events: 'nsContentUtils::IsSafeToRunScript()', file ../../../dist/include/content/nsContentUtils.h, line 1528
called from what's frame #13 in this stack (but line 4336), but then we just go on and run the mutation events anyway, at this stack. I'd have though that the assertion would imply that we wouldn't run the mutation events given that some of the script blockers were not removable.
... in particular, we unwind to frame #24, and then get a pile of assertions when it calls ConstructFrameByDisplayType ... ConstructBlock ... ProcessChildren
But we actually don't crash until reflow, where we try to reflow some deleted frames.
Created attachment 357228 [details]
valgrind's explanation for the crash on testcase 2, with preceding assertions
But the crash is a result of having frames for an element with a null document pointer, because of the warning from RecreateFramesForContent:
WARNING: NS_ENSURE_TRUE(aContent->GetDocument()) failed: file /home/dbaron/builds/mozilla-central/mozilla/layout/base/nsCSSFrameConstructor.cpp, line 10984
since this is in the middle of a style context rebuild, we end up crashing as a result of skipping recreation of frames, since we depended on the frame reconstruction to delete the style contexts with dangling pointers.
There are a few things that are fragile here that we could perhaps fix:
1. rule tree reconstruction assuming all pointers into the old rule tree will be destroyed
2. style reresolution relying on frame reconstruction to destroy all the frames with old style context pointers (and the resulting skipping of the style context pointer fixup)
but it does look like the underlying problem comes down to constructing frames for a node that's not in the document.
Created attachment 357249 [details] [diff] [review]
patch to add an assertion in EndReconstruct that makes things clearer
I don't like this; I'm going to rewrite it. But I want to save it here because it does have potentially useful child enumeration code (which I'm going to get rid of for the next version).
(In reply to comment #12)
> There are a few things that are fragile here that we could perhaps fix:
3. Make RecreateFramesForContent destroy the frames but not rebuild them in the case we're hitting.
The problem is somewhere in frame 15-18. We really shouldn't be calling SetAttr with aNotify=1 when it's unsafe to run script. One of the functioncalls in those frames need to be put in a scriptrunner. I'm guessing either the call to SynchronizeBroadcastListener or the calls to SetAttr.
There really is no other way to fully fix the crash here since the script in the mutation observer can change the whole world under the frame-ctor.
Neil, do you have an opinion on what to put off for when it's safe to run scripts? See above comment 15.
Since this can lead to exploitable crashes I'm marking this security sensitive.
I moved the business about asserting on EndReconstruct into bug 473871.
And I've decided I am going to fix the rule tree thing in bug 475128.
However, the script could have done much worse things here that layout can't protect against; this is really a content bug. (Sorry, should have moved this last week.)
I need to test if Bug 471166 fixes this one.
This doesn't crash anymore, at least not in trunk.
I get a "killing mutation events" assertion.
Ah, sorry, the testcase 3 does still crash :( And Bug 471166 doesn't help here.
Created attachment 358815 [details] [diff] [review]
patch, requires bug 471166
Fixes also Bug 472668.
I need to still run all the tests.
FWIW, the patch in bug 475128 didn't fix this. I get a crash in nsCSSFrameConstructor::ConstructFrameInternal when I load the testcase.
Verified fixed, using:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090201 Minefield/3.2a1pre (.NET CLR 3.5.30729)
Shouldn't we take this one on 220.127.116.11 with the others (bug 466057 et al.) you've fixed like this?
Comment on attachment 358815 [details] [diff] [review]
patch, requires bug 471166
Approved for 18.104.22.168, a=dveditz for release-drivers.
The patch depends on 1.9.1/trunk broadcasting changes.
Changing blocking22.214.171.124+ to blocking126.96.36.199?
Can you stop this crash working around the missing trunk changes? Would it be hard to back-port the trunk broadcasting changes?
Verified fix on Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b3pre) Gecko/20090204 Shiretoko/3.1b3pre Ubiquity/0.1.5
These self-referencing testcases no longer work from bugzilla. Is there something we could change on the server end, maybe let the documents be cached so the self-references don't trigger a re-load and therefore a re-redirect?
testcase3 (at least) is crashing in the same place in 188.8.131.52pre as reported here for trunk. We ought to get this fix into 184.108.40.206
(In reply to comment #34)
> These self-referencing testcases no longer work from bugzilla. Is there
> something we could change on the server end, maybe let the documents be cached
> so the self-references don't trigger a re-load and therefore a re-redirect?
I don't think so... the tokens used for security bugs are one-time-use-only.
Still need an answer to comment 32 from Olli, but clearly this isn't going to make tomorrow's code freeze regardless.
(In reply to comment #37)
> but clearly this isn't going to
> make tomorrow's code freeze regardless.
Sorry about that.
I think all the broadcaster changes should be ported to branch. There are quite
a few... :/
Now that 3.5b4 is essentially done you've got time to roll-up all the broadcaster changes you're talking about? Not going to make 220.127.116.11 because that code-freezes next week, but please start working on it now so it doesn't miss another release.
The assertions are fixed now in 18.104.22.168. I can't reliably reproduce the
crash on 1.9.0.x.
Martijn, could you help verifying this and bug 472668.
Testcase 3 crashes reliably for me on Windows XP with 3.0.11. With 3.0.12pre (Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:22.214.171.124pre) Gecko/2009063005 GranParadiso/3.0.12pre), it does not crash. Marking verified126.96.36.199.
Other cases do not reproduce a crash.
Added crashtests, but had to disable one temporarily due to bug 509269.