Closed Bug 328395 Opened 18 years ago Closed 18 years ago

Crash in [@ nsIPresShell::GetPresContext()] closing Print Preview window after automatic selection testcase.

Categories

(Core :: Layout, defect)

x86
Windows XP
defect
Not set
critical

Tracking

()

VERIFIED FIXED

People

(Reporter: stephend, Assigned: sharparrow1)

References

()

Details

(4 keywords)

Crash Data

Attachments

(3 files, 1 obsolete file)

Build ID: 2006-02-23-09, Windows XP SeaMonkey trunk.

Summary: Crash in nsIPresShell::GetPresContext() closing Print Preview window after automatic selection testcase.

Steps to Reproduce:

1. Load https://bugzilla.mozilla.org/attachment.cgi?id=212983&action=view
2. Click the button to start.
3. Invoke Print Preview
4. Change orientation (from Landscape to Portrait, or vice-versa).
5. Close Print Preview

Expected Results:

Orientation changed and the Print Preview window closes.

Actual Results:

We crash in nsIPresShell::GetPresContext().  Stack forthcoming.
#0  0x057165f6 in nsIPresShell::GetPresContext() (this=0xfeeefeee)
    at ../../../../dist/include/layout/nsIPresShell.h:180
#1  0x051565a3 in nsTypedSelection::GetPresContext(nsPresContext**) (
    this=0xf73ffc0, aPresContext=0x22c148)
    at c:/mozilla/mozilla/layout/generic/nsSelection.cpp:6518
#2  0x051538ab in nsTypedSelection::RemoveRange(nsIDOMRange*) (
    this=0xf73ffc0, aRange=0xf4aab88)
    at c:/mozilla/mozilla/layout/generic/nsSelection.cpp:5544
#3  0x6ff4ea11 in _XPTC_InvokeByIndex ()
    at ../../../../../../dist/include/xpcom/xptcall.h:123
etc.
Basically, we're reenabling scripts when quitting print preview, which synchronously (!) runs timeouts while we're in a not very consistent state.  So we crash because we're messing with a dead presshell.

Perhaps we should actually run timeouts off an event, not from inside the SetScriptEnabled call?  Or pass the other boolean from print preview?

That said, perhaps nsSelection should hold an nsWeakPtr to the presshell?  And perhaps nsTypedSelection::GetPresContext should use GetPresShell on |this|?  I really can't tell why the selection setup has all these random presshell pointers, etc...  But I guess in this case what we actually have is a dead nsSelection object, right?

In any case, the crash is in an inlined method, so this isn't obviously exploitable...
Flags: blocking1.9a1?
Flags: blocking1.8.1?
Flags: blocking1.8.0.3?
I've been thinking about this kind of thing lately. It seems we need an API to say "run this callback when it is safe to execute script in this document". This would run off a PLEvent, take account of document destruction, and avoid running while the document is bfcache-hidden (queuing until the document is shown again, if it ever is). We don't have anything like this, do we?
Roc: is there a safe and simple way to fix this on the 1.8.0 branch?
Flags: blocking1.8.0.3? → blocking1.8.0.3+
Maybe, but what I was thinking about is not it. A quick fix for this would be to run the post-Print-Preview timeouts off an event.
I tried implementing what I think was suggested in Comment #5, but was not successful in fixing the bug.

Here's a summary of what I tried:

In nsPrintEngine::TurnScriptingOn, it still calls scx->SetScriptEnabled, but in the case where we are leaving print preview, it passes PR_FALSE for the aFireTimeouts argument. Then it creates a new PLEvent and posts it on the event queue. When the event handler is called, it calls scx->SetScriptEnabled with PR_TRUE for the aFireTimeouts arguments.
Unfortunately, calling scx->SetScriptEnabled during the event handler crashes because we are apparently still in an inconsistent state.
I tried using the current thread event queue (NS_GetCurrentEventQ) and the UI thread event queue (NS_GetMainEventQ); both had the same poor results.

Roc, do you think the event handler should check if the document is shown and if not, re-queue the event? Is that what you were suggesting in Comment #3? I can try that.
What I was suggesting is what I'm working on in 333544, which probably could be used to fix this bug, but is probably overkill for the branch ... unless we decide there are enough bugs of this type that it's justified to backport 333544's work all the way to 1.8.0.x. It's a bit like what you suggested.

What I would do is modify nsGlobalWindow::SetScriptsEnabled to not call RunTimeout(nsnull) directly, but post a PLEvent which does RunTimeout(nsnull) when called. That creates the possibility that the system will fire a timeout between scripts being reenabled and RunTimeout(nsnull) running, but I think RunTimeout can handle that. I don't know why you're crashing in the PLEvent but it may simply be that calling SetScriptsEnabled(PR_TRUE) when they're already enabled is, er, not supported.

Thanks for looking at this.
Depends on: 333544
I think you guys are missing the real issue here.  It really doesn't matter that it executes synchronously.  The JS holds a pointer to the nsTypedSelection object.  The nsTypedSelection holds a weak pointer to the nsSelection object.  The nsSelection object gets freed when the the PresShell gets destroyed.  The PresShell gets destroyed when the print preview is shown.  Therefore, when the nsTypedSelection tries to get a pointer to the PresShell after a print preview, it's trying to get a pointer to a nonexistent PresShell from a nonexistent nsSelection object.  Stupid architecture, but that's the way it is.

By the way, it is possible to cause this crash without print preview; I'll post the testcase if anyone's interested.

The fix here is to have the nsSelection object clear the all the pointers to itself from the nsTypedSelection objects upon destruction.  Messing with when the script gets called isn't going to help.
Wouldn't running the script later mean we get a different nsTypedSelection?  One that's pointing the the galley presshell?  That's what most of comment 2 was predicated on.  I do agree that we should fix the selection weak pointer mess.
The statement "var sel=window.getSelection();" stores a pointer to the nsTypedSelection.
Ah, right.  And print preview usually wipes out the galley presshell, eh?  :(  We should probably fix that too; we have a codepath to keep the galley presentation around and all...
Attached patch Patch (obsolete) — Splinter Review
I was thinking of something like this.  When the nsSelection is going away, clean up the nsTypedSelection so it's no longer dependent on the document.  Also added in some null checks for mFrameSelection to prevent other crashes.
Assignee: nobody → sharparrow1
Status: NEW → ASSIGNED
Attachment #218122 - Flags: review?(bzbarsky)
Comment on attachment 218122 [details] [diff] [review]
Patch

Make those errors NS_ERROR_NOT_INITIALIZED or maybe NS_ERROR_NOT_AVAILABLE and r=bzbarsky.
Attachment #218122 - Flags: review?(bzbarsky) → review+
Attached patch Updated patchSplinter Review
Okay, NS_ERROR_NOT_INITIALIZED it is.

dveditz, this is safe for the branches; it should only change behavior in cases where a script keeps around and uses a pointer to an nsTypedSelection.  Those cases are rare, and likely crash without this patch.  I don't have a branch tree, but this should be easy to port.  I'm pretty sure this code hasn't been touched for a while.
Attachment #218122 - Attachment is obsolete: true
Attachment #218138 - Flags: superreview?(roc)
Attachment #218138 - Flags: superreview?(roc)
Attachment #218138 - Flags: superreview+
Attachment #218138 - Flags: review+
Checked in on trunk.
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Comment on attachment 218138 [details] [diff] [review]
Updated patch

Who do I ask for approval‑branch‑1.8.1?  Also, would someone mind checking this in to the branches when the approvals are done?
Attachment #218138 - Flags: approval1.8.0.3?
Attachment #218138 - Flags: approval-branch-1.8.1?
me or Boris can do 1.8.1 approval, and check it in for you if you like. This branch landing should be pretty easy though.
Verified FIXED using SeaMonkey 1.5a;Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20060413 SeaMonkey/1.5a on the testcase of: https://bugzilla.mozilla.org/attachment.cgi?id=212983&action=view
Status: RESOLVED → VERIFIED
(In reply to comment #18)
> Verified FIXED using SeaMonkey 1.5a;Mozilla/5.0 (Windows; U; Windows NT 5.1;
> en-US; rv:1.9a1) Gecko/20060413 SeaMonkey/1.5a on the testcase of:
> https://bugzilla.mozilla.org/attachment.cgi?id=212983&action=view

Sorry for the spam; just wanted to clarify that it's an -04-13-21 build (a morning 13th build still crashes, as expected). 

Attachment #218138 - Flags: approval-branch-1.8.1? → approval-branch-1.8.1?(roc)
Attachment #218138 - Flags: approval-branch-1.8.1?(roc) → approval-branch-1.8.1+
Please land this on the MOZILLA_1_8_BRANCH ASAP and add the fixed1.8.1 keyword.  We'd like to maximize "bake-time" for this patch before approving it for the MOZILLA_1_8_0_BRANCH.  Thanks!
Fixed on 1.8 branch.
Keywords: fixed1.8.1
Comment on attachment 218138 [details] [diff] [review]
Updated patch

approved for 1.8.0 branch, a=dveditz for drivers.
Attachment #218138 - Flags: approval1.8.0.3? → approval1.8.0.3+
verified fixed on the 1.8.0.4 branch using Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.4) Gecko/20060504 Firefox/1.5.0.4. No crash using stephen's excellent STR. adding keyword.
Flags: blocking1.9a1?
Flags: blocking1.8.1?
Crash Signature: [@ nsIPresShell::GetPresContext()]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: