Closed Bug 818626 Opened 12 years ago Closed 12 years ago

crash in nsStyleContext::~nsStyleContext

Categories

(Core :: Layout, defect)

18 Branch
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla21
Tracking Status
firefox18 - fixed
firefox19 + verified
firefox20 --- verified

People

(Reporter: scoobidiver, Assigned: heycam)

References

Details

(Keywords: crash, reproducible, topcrash)

Crash Data

Attachments

(4 files, 1 obsolete file)

It's #4 top browser crasher in 18.0b2 and #3 in 19.0a2 on Mac OS X. Comments in 18.0 talk about printing a PDF with PDF Viewer (before bug 815718 was fixed). Signature nsStyleContext::~nsStyleContext() More Reports Search UUID 3d284134-db3d-4fa7-a53d-f6e292121205 Date Processed 2012-12-05 04:09:33 Uptime 6560 Last Crash more than 3 months before submission Install Age 1.8 hours since version was first installed. Install Time 2012-12-05 02:17:06 Product Firefox Version 20.0a1 Build ID 20121204030754 Release Channel nightly OS Mac OS X OS Version 10.8.2 12C3006 Build Architecture amd64 Build Architecture Info family 6 model 58 stepping 9 Crash Reason EXC_BAD_ACCESS / KERN_INVALID_ADDRESS Crash Address 0x18 App Notes AdapterVendorID: 0x8086, AdapterDeviceID: 0x 166GL Context? GL Context+ GL Layers? GL Layers+ Processor Notes /data/socorro/stackwalk/bin/exploitable: ERROR: unable to analyze dump EMCheckCompatibility True Adapter Vendor ID 0x8086 Adapter Device ID 0x 166 Frame Module Signature Source 0 XUL nsStyleContext::~nsStyleContext nsIPresShell.h:296 1 XUL nsStyleContext::Destroy nsStyleContext.cpp:80 2 XUL nsFrame::~nsFrame nsStyleContext.h:86 3 XUL nsFrame::DestroyFrom nsFrame.cpp:680 4 XUL nsFrameList::DestroyFramesFrom nsFrameList.cpp:61 5 XUL nsContainerFrame::DestroyFrom nsContainerFrame.cpp:228 6 XUL nsFrameManager::Destroy nsIFrame.h:607 7 XUL PresShell::Destroy nsPresShell.cpp:1079 8 XUL nsPrintObject::~nsPrintObject nsPrintObject.cpp:107 9 XUL nsPrintData::~nsPrintData nsPrintData.cpp:102 10 XUL nsPrintEngine::Destroy nsPrintEngine.cpp:255 11 XUL nsDocumentViewer::OnDonePrinting nsDocumentViewer.cpp:4275 12 XUL nsPrintCompletionEvent::Run nsPrintEngine.cpp:3652 13 XUL nsThread::ProcessNextEvent nsThread.cpp:627 14 XUL NS_ProcessPendingEvents_P nsThreadUtils.cpp:171 15 XUL nsBaseAppShell::NativeEventCallback nsBaseAppShell.cpp:97 16 XUL nsAppShell::ProcessGeckoEvents nsAppShell.mm:387 17 CoreFoundation __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ 18 CoreFoundation __CFRunLoopDoSources0 19 CoreFoundation __CFRunLoopRun 20 CoreFoundation CFRunLoopRunSpecific 21 HIToolbox RunCurrentEventLoopInMode 22 HIToolbox ReceiveNextEventCommon 23 HIToolbox BlockUntilNextEventMatchingListInMode 24 AppKit _DPSNextEvent 25 AppKit -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] 26 XUL -[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] nsAppShell.mm:164 27 AppKit -[NSApplication run] 28 XUL nsAppShell::Run nsAppShell.mm:741 29 XUL nsAppStartup::Run nsAppStartup.cpp:291 30 XUL XREMain::XRE_mainRun nsAppRunner.cpp:3824 31 XUL XREMain::XRE_main nsAppRunner.cpp:3891 32 XUL XRE_main nsAppRunner.cpp:4089 33 firefox main nsBrowserApp.cpp:174 More reports at: https://crash-stats.mozilla.com/report/list?signature=nsStyleContext%3A%3A~nsStyleContext%28%29 https://crash-stats.mozilla.com/report/list?signature=nsStyleContext%3A%3A~nsStyleContext
Looks like this might be pdf.js related but can we get urls and any other info that might help find out if this is a crash caused by a regression?
The URLs look more like they'd lead to Flash content or lots of Flash ads, like adult content or download sites, and lots of Facebook as well as a few webmail services. Still, almost all of those URLs only have one visit and the list is large, I don't want to go through and sanitize all of those, so here's the ones that have more than 1 visit - not too helpful either, I fear: 7 http://www.facebook.com/ 5 https://www.facebook.com/ 4 about:blank 2 https://www.facebook.com/login.php?login_attempt=1 2 about:newtab
Keywords: needURLs
Since Anthony is the QA contact for FF19, and PDF.js will be disabled in FF18b3, assigning to him to reproduce. We'll untrack once this has verifiably dropped off the 18 top crash list. Also including roc and bdahl already.
Keywords: qawanted
QA Contact: anthony.s.hughes
I've been trying several ways to reproduce this but haven't stumbled upon anything yet. Here's a list of a few things I've tried: * loading a PDF from Google.com * loading a PDF from an email in gmail * loading a PDF from local disk * loading a PDF shared via Facebook * Fullscreen mode * Printing * Download * Handing off from PDF.js to Preview.app * Handing off from PDF.js to Acrobat Reader 10 * Testing with PDF Viewer 0.6.39 installed Testing with Firefox Aurora 19.0a2 2012-12-06 on Mac OSX 10.7.4 with Flash 11.5.502.110. I'll try some load testing next to see if having many PDFs and Flash pages open makes a difference. Is there any other guidance you can give me?
I'm not sure if this is related or not, but I encountered a problem printing when load testing. I have 20 tabs open playing different flash demos. I have a couple of gmail tabs, a couple facebook tabs, and 10 more tabs loaded with the same PDF viewable in PDF.js. I drag one of the PDF tabs to the desktop and Firefox hangs for about 30 seconds but eventually moves the tab to it's own window. When I click the Print icon in PDF.js for this PDF I get an error in Error Console and no print dialog appears. Error: NS_ERROR_NOT_AVAILABLE: Component returned failure code 0x8004011 (NS_ERROR_NOT_AVAILABLE) [nsIDOMWindow.print] Source File: resource://pdf.js/web/viewer.js Line: 2708 Again, under load testing I'm experiencing quite a few hangs but no crashes yet.
In 18.0b3, there are no crashes on Mac OS X and one related crash on Windows, bp-91ed2345-6559-4347-b1c4-5a54f2121207, confirming the correlation to PDF Viewer.
Asking for untracking in 18.0 Beta so that it shows up in the query for Aurora tracked bugs in https://wiki.mozilla.org/Platform/2012-12-11
Untracking for FF18 since PDF.js is disabled on that version now. Roc - can you help find somebody to perform a code investigation here? QA hasn't been able to find STR thus far.
Assignee: nobody → roc
Cameron, could you look into the crash stacks from the links in comment #0 and see if you can guess why we might be crashing?
Assignee: roc → cam
Here is a guess: nsPrintEngine creates an nsPresContext for us, but doesn't keep a strong reference to it itself. Strong references are only kept in in the various nsPrintObjects' mPresContext. In http://hg.mozilla.org/releases/mozilla-beta/annotate/6eb5edeb9f52/layout/printing/nsPrintObject.cpp#l101 we release the nsPresContext. If that happens to drop the refcnt to 0, ~nsPresContext will set mShell to null. Then later in ~nsStyleContext we try to call StyleSet() on the presContext->PresShell(), which has returned null. Is that plausible? (If so, maybe we should move the `mPresContext = nullptr;` line to the end of nsPrintObject::DestroyPresentation.)
Attached patch patchSplinter Review
In case my guess is right.
Attachment #693222 - Flags: review?(roc)
Whiteboard: [leave open]
Dropping qawanted from this bug given multiple unsuccessful attemtps to reproduce. Please re-add if there are any new leads we can follow.
Keywords: qawantedsteps-wanted
Comment on attachment 693222 [details] [diff] [review] patch [Approval Request Comment] Bug caused by (feature/regressing bug #): ? User impact if declined: occasional crashes when trying to print Testing completed (on m-c, etc.): been on m-c for a couple of days, haven't seen any crashes since then (though crashes were coming in only once every one or two days for Firefox 20 anyway), still I think it is worth landing this now Risk to taking this patch (and alternatives if risky): low String or UUID changes made by this patch: N/A
Attachment #693222 - Flags: approval-mozilla-aurora?
(In reply to Cameron McCormack (:heycam) from comment #15) > Comment on attachment 693222 [details] [diff] [review] > patch > > [Approval Request Comment] > Bug caused by (feature/regressing bug #): ? > User impact if declined: occasional crashes when trying to print > Testing completed (on m-c, etc.): been on m-c for a couple of days, haven't > seen any crashes since then (though crashes were coming in only once every > one or two days for Firefox 20 anyway), still I think it is worth landing > this now > Risk to taking this patch (and alternatives if risky): low > String or UUID changes made by this patch: N/A I am holding off on approving on aurora at this time as I see a few crash reports for firefox 20 after this patch has landed : https://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A20.0a1&platform=mac&query_search=signature&query_type=contains&reason_type=contains&date=12%2F27%2F2012%2000%3A07%3A29&range_value=1&range_unit=weeks&hang_type=any&process_type=any&do_query=1&signature=nsStyleContext%3A%3A~nsStyleContext%28%29 Please let me know what you think of those reports & if this needs further investigation. Thank you .
Yes, looks like my patch didn't fix the issue.
Comment on attachment 693222 [details] [diff] [review] patch Not approving on aurora at this time as this does not fix the issue per comment# 17.Please renominate once ready.
Attachment #693222 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora-
Attached patch debuggingSplinter Review
I'm pretty sure the pres context is returning null from PresShell(), but not any closer to understanding why that would be. Is it reasonable to add some debugging like with this patch? It adds some notes to the crash report if there is something amiss with the nsPrintObject tree's pres contexts/shells, so we should can at least see if the problem is in there.
Attachment #696489 - Flags: review?(roc)
Depends on: 826195
Is it correct that PresShell::Destroy() should only be called while the nsPresContext has it as its shell? I take it from the comments at https://hg.mozilla.org/mozilla-central/file/tip/layout/base/nsPresShell.cpp#l1087 and the stack trace in the crashes (where ~nsStyleSet expects nsPresContext::PresShell() to return an object) that this is the case. If that's true, would it also be valid to assert in nsPresContext::SetShell() that if mShell starts out non-null, then it must have had Destroy() called on it? If we want to avoid the crash without understanding why it is happening, then we could just avoid the mPresShell->Destroy() call in nsPrintObject::DestroyPresentation() if mPresContext->GetPresShell() doesn't match mPresShell. That'll leak the PresShell but it's probably better than crashing.
When is it valid for a PresShell to have have its mPresContext = null? Why does it make sense for this line https://hg.mozilla.org/mozilla-central/file/tip/layout/base/nsPresShell.cpp#l1092 to be within the `if (mPresContext)` but not for the mFrameConstructor->WillDestroyFrameTree() call just before it to be?
These might be of interest: bug 229080, bug 154199.
(In reply to Cameron McCormack (:heycam) from comment #22) > If we want to avoid the crash without understanding why it is happening, > then we could just avoid the mPresShell->Destroy() call in > nsPrintObject::DestroyPresentation() if mPresContext->GetPresShell() doesn't > match mPresShell. That'll leak the PresShell but it's probably better than > crashing. Not necessarily, a dangling pointer could lead to a security bug which is worse than crashing :-). Generally a presshell and its prescontext should have the same lifetime, unless something keeps a strong pointer to one of them past Destroy().
Attached patch debugging 2Splinter Review
There's been one crash since the previous debugging patch was in a nightly, but it didn't show anything wrong with the pres shells/contexts in the print obejct tree. Let's record a bit more info on the print object tree structure in the crash report.
Attachment #698398 - Flags: review?(roc)
(In reply to Cameron McCormack (:heycam) from comment #10) > In > http://hg.mozilla.org/releases/mozilla-beta/annotate/6eb5edeb9f52/layout/ > printing/nsPrintObject.cpp#l101 we release the nsPresContext. If that > happens to drop the refcnt to 0, ~nsPresContext will set mShell to null. > Then later in ~nsStyleContext we try to call StyleSet() on the > presContext->PresShell(), which has returned null. > > Is that plausible? (If so, maybe we should move the `mPresContext = > nullptr;` line to the end of nsPrintObject::DestroyPresentation.) Well, PresShell has strong ref to prescontext. NS_IF_RELEASE(mPresContext) is called in PresShell::~PresShell()
From the annotations in the crash reports, it looks like the crashes occur when printing fails.
The STR of bug 830236 makes Firefox crash with this signature for me: bp-475c8f5e-ca79-4248-bff0-690282130114.
Attached patch hold on to PresShell (obsolete) — Splinter Review
As you suggest in bug 830236.
Attachment #701736 - Flags: review?(matspal)
Comment on attachment 701736 [details] [diff] [review] hold on to PresShell Mats has his own patch on bug 830236.
Attachment #701736 - Attachment is obsolete: true
Attachment #701736 - Flags: review?(matspal)
Should we uplift the fixes of bug 826195 and bug 830236 to Aurora/Beta?
Depends on: 830236
Yes I think we should uplift the bug 830236. So far I haven't seen any crashes in Firefox 21 since it landed, and I suspect it has fixed this bug. Bug 826195 is a harmless assertion, it is fine to uplift. I'd like to remove some of the debugging I added in this bug too (and uplift).
Attached patch remove debuggingSplinter Review
Attachment #703644 - Flags: review?(roc)
Comment on attachment 703644 [details] [diff] [review] remove debugging Review of attachment 703644 [details] [diff] [review]: ----------------------------------------------------------------- Thanks
Attachment #703644 - Flags: review?(roc) → review+
Comment on attachment 703644 [details] [diff] [review] remove debugging [Approval Request Comment] Bug caused by (feature/regressing bug #): This bug User impact if declined: Crash reports will have some unnecessary information in them Testing completed (on m-c, etc.): Just landed on inbound Risk to taking this patch (and alternatives if risky): Very low String or UUID changes made by this patch: N/A This is just removing the assertions and crash report notes added while debugging this bug.
Attachment #703644 - Flags: approval-mozilla-aurora?
Comment on attachment 703644 [details] [diff] [review] remove debugging low risk, approving for Aurora.
Attachment #703644 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Whiteboard: [leave open]
Fixed in Beta, Aurora and m-c because of the fix in bug 830236.
Target Milestone: --- → mozilla21
(In reply to Anthony Hughes, Mozilla QA (:ashughes) [Away Jan 20-28] from comment #4) I couldn't reproduce this issue following your investigations information, but following STR from Comment 31 added in bug 830236 I could easily reproduce on FF 19b2 on Ubuntu x86 and Windows 7. Crash reports: bp-da4d2194-8d11-482a-853e-413422130124 bp-c54caaa9-d4cf-43ea-a169-fc5082130124 Firefox build: FF 19b2 Ubuntu x86: Mozilla/5.0 (X11; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0(20130116072953) Does FF 19b3 contains fix done in bug 830236 so I can or cannot verified this?
(In reply to MarioMi from comment #44) > Does FF 19b3 contains fix done in bug 830236 so I can or cannot verified > this? Yes, it does. See http://hg.mozilla.org/releases/mozilla-beta/graph
(In reply to Scoobidiver from comment #45) > Yes, it does. See http://hg.mozilla.org/releases/mozilla-beta/graph I confirm this fix is verified on FF 19b3 on Windows 7x64 and Ubuntu x86. FF doesn't crash following STR from Comment 31 added in bug 830236 anymore. Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0 Mozilla/5.0 (X11; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0
I confirm this fix is verified on Latest Aurora too, as I also said oin bug 830236.
Depends on: 900934
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: