Open Bug 1153736 Opened 10 years ago Updated 3 years ago

crash in nsTextFrame::EnsureTextRun(nsTextFrame::TextRunType, gfxContext*, nsIFrame*, nsLineList_iterator const*, unsigned int*)

Categories

(Core :: Layout, defect)

All
macOS
defect

Tracking

()

REOPENED
Tracking Status
firefox48 --- wontfix
firefox-esr45 --- wontfix
thunderbird_esr102 --- affected
firefox-esr60 --- affected
firefox64 --- affected
firefox65 --- affected
firefox66 --- affected
firefox67 --- affected

People

(Reporter: cbook, Assigned: eeejay)

Details

(Keywords: crash, Whiteboard: [tbird crash])

Crash Data

This bug was filed from the Socorro interface and is report bp-0bb425dc-34db-467f-87cb-85c862150413. ============================================================= crashed while doing sheriff work on treeherder (like marking failures etc) Crashing Thread Frame Module Signature Source 0 XUL nsTextFrame::EnsureTextRun(nsTextFrame::TextRunType, gfxContext*, nsIFrame*, nsLineList_iterator const*, unsigned int*) obj-firefox/x86_64/dist/include/nsExpirationTracker.h 1 XUL nsTextFrame::PaintText(nsRenderingContext*, nsPoint, nsRect const&, nsCharClipDisplayItem const&, gfxTextContextPaint*, nsTextFrame::DrawPathCallbacks*) layout/generic/nsTextFrame.cpp 2 XUL nsDisplayText::Paint(nsDisplayListBuilder*, nsRenderingContext*) layout/generic/nsTextFrame.cpp 3 XUL mozilla::FrameLayerBuilder::PaintItems(nsTArray<mozilla::FrameLayerBuilder::ClippedDisplayItem>&, nsIntRect const&, gfxContext*, nsRenderingContext*, nsDisplayListBuilder*, nsPresContext*, nsIntPoint const&, float, float, int) layout/base/FrameLayerBuilder.cpp 4 XUL mozilla::FrameLayerBuilder::DrawPaintedLayer(mozilla::layers::PaintedLayer*, gfxContext*, nsIntRegion const&, mozilla::layers::DrawRegionClip, nsIntRegion const&, void*) layout/base/FrameLayerBuilder.cpp 5 XUL mozilla::layers::ClientTiledLayerBuffer::PostValidate(nsIntRegion const&) gfx/layers/client/TiledContentClient.cpp 6 XUL mozilla::layers::TiledLayerBuffer<mozilla::layers::ClientTiledLayerBuffer, mozilla::layers::TileClient>::Update(nsIntRegion const&, nsIntRegion const&) gfx/layers/TiledLayerBuffer.h 7 XUL mozilla::layers::ClientTiledLayerBuffer::PaintThebes(nsIntRegion const&, nsIntRegion const&, void (*)(mozilla::layers::PaintedLayer*, gfxContext*, nsIntRegion const&, mozilla::layers::DrawRegionClip, nsIntRegion const&, void*), void*) gfx/layers/client/TiledContentClient.cpp 8 XUL mozilla::layers::ClientTiledPaintedLayer::RenderHighPrecision(nsIntRegion&, nsIntRegion const&, void (*)(mozilla::layers::PaintedLayer*, gfxContext*, nsIntRegion const&, mozilla::layers::DrawRegionClip, nsIntRegion const&, void*), void*) gfx/layers/client/ClientTiledPaintedLayer.cpp 9 XUL mozilla::layers::ClientTiledPaintedLayer::RenderLayer() gfx/layers/client/ClientTiledPaintedLayer.cpp 10 XUL mozilla::layers::ClientContainerLayer::RenderLayer() gfx/layers/client/ClientContainerLayer.h 11 XUL mozilla::layers::ClientContainerLayer::RenderLayer() gfx/layers/client/ClientContainerLayer.h 12 XUL mozilla::layers::ClientLayerManager::EndTransactionInternal(void (*)(mozilla::layers::PaintedLayer*, gfxContext*, nsIntRegion const&, mozilla::layers::DrawRegionClip, nsIntRegion const&, void*), void*, mozilla::layers::LayerManager::EndTransactionFlags) gfx/layers/client/ClientLayerManager.cpp 13 XUL mozilla::layers::ClientLayerManager::EndTransaction(void (*)(mozilla::layers::PaintedLayer*, gfxContext*, nsIntRegion const&, mozilla::layers::DrawRegionClip, nsIntRegion const&, void*), void*, mozilla::layers::LayerManager::EndTransactionFlags) gfx/layers/client/ClientLayerManager.cpp 14 XUL nsDisplayList::PaintRoot(nsDisplayListBuilder*, nsRenderingContext*, unsigned int) layout/base/nsDisplayList.cpp 15 XUL nsLayoutUtils::PaintFrame(nsRenderingContext*, nsIFrame*, nsRegion const&, unsigned int, unsigned int) layout/base/nsLayoutUtils.cpp 16 XUL PresShell::Paint(nsView*, nsRegion const&, unsigned int) layout/base/nsPresShell.cpp 17 XUL nsViewManager::ProcessPendingUpdatesPaint(nsIWidget*) view/nsViewManager.cpp 18 XUL nsViewManager::ProcessPendingUpdatesForView(nsView*, bool) view/nsViewManager.cpp 19 XUL nsRefreshDriver::Tick(long long, mozilla::TimeStamp) layout/base/nsRefreshDriver.cpp 20 XUL nsRefreshDriver::FinishedWaitingForTransaction() layout/base/nsRefreshDriver.cpp 21 XUL mozilla::layers::CompositorChild::RecvDidComposite(unsigned long long const&, unsigned long long const&) gfx/layers/ipc/CompositorChild.cpp 22 XUL mozilla::layers::PCompositorChild::OnMessageReceived(IPC::Message const&) obj-firefox/x86_64/ipc/ipdl/PCompositorChild.cpp 23 XUL mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const&) ipc/glue/MessageChannel.cpp 24 XUL mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message const&) ipc/glue/MessageChannel.cpp 25 XUL mozilla::ipc::MessageChannel::OnMaybeDequeueOne() ipc/glue/MessageChannel.cpp 26 XUL MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const&) ipc/chromium/src/base/message_loop.cc 27 XUL MessageLoop::DoWork() ipc/chromium/src/base/message_loop.cc 28 XUL mozilla::ipc::DoWorkRunnable::Run()
Crash Signature: [@ nsTextFrame::EnsureTextRun(nsTextFrame::TextRunType, gfxContext*, nsIFrame*, nsLineList_iterator const*, unsigned int*)] → [@ nsTextFrame::EnsureTextRun(nsTextFrame::TextRunType, gfxContext*, nsIFrame*, nsLineList_iterator const*, unsigned int*)] [@ nsTextFrame::EnsureTextRun]
Crash volume for signature 'nsTextFrame::EnsureTextRun': - nightly (version 50): 0 crashes from 2016-06-06. - aurora (version 49): 0 crashes from 2016-06-07. - beta (version 48): 25 crashes from 2016-06-06. - release (version 47): 43 crashes from 2016-05-31. - esr (version 45): 4 crashes from 2016-04-07. Crash volume on the last weeks: W. N-1 W. N-2 W. N-3 W. N-4 W. N-5 W. N-6 W. N-7 - nightly 0 0 0 0 0 0 0 - aurora 0 0 0 0 0 0 0 - beta 3 2 1 0 5 11 3 - release 3 7 8 3 5 6 4 - esr 0 2 1 1 0 0 0 Affected platforms: Windows, Mac OS X
Closing because no crashes reported for 12 weeks.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Closing because no crashes reported for 12 weeks.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---

Marking 66/67 as affected. Overall, not a very high volume crash. nsTextFrame::EnsureTextRun signature is the only one that is still crashing. High correlation to accessibility being on.

(100.0% in signature vs 24.18% overall) accessibility = Active [100.0% vs 29.81% if startup_crash = 0]
(73.33% in signature vs 03.51% overall) reason = SIGSEGV /SEGV_ACCERR
(53.33% in signature vs 01.93% overall) Module "libcsc.so" = true
(53.33% in signature vs 02.54% overall) Module "libexynosgscaler.so" = true
(53.33% in signature vs 02.78% overall) Module "libexynosscaler.so" = true
(60.00% in signature vs 12.58% overall) Module "libSEC_EGL.so" = true
(60.00% in signature vs 13.06% overall) Module "gralloc.exynos5.so" = true
(60.00% in signature vs 13.06% overall) Module "u:object_r:system_security_prop:s0" = true
(60.00% in signature vs 14.51% overall) Module "esecomm.jar" = true
(60.00% in signature vs 15.24% overall) Module "boot-esecomm.oat" = true
(60.00% in signature vs 15.24% overall) Module "system@framework@boot-esecomm.art" = true

could this spike in crashes on android during 66 be related to bug 1518247?

Flags: needinfo?(eitan)

Good chance it is. I think this is because we calculate accessible names much more often. It looks like it aggrevated something in layout. At least from the accessibility modules perspective, I think we are doing are due diligence and null checking the frames. I'll see if I can somehow trigger this locally.

Flags: needinfo?(eitan)

This is #6 overall top crash in Fennec 66b11. I guess we shall see where it ends up in b13 which just shipped.

I'm hoping the patch in bug 1531346 will remedy this. But I want to wait before calling it a dup.

Assignee: nobody → eitan

It looks as if one signature in 67 has a crash in 20190308093915, the other signature has a crash in 20190307094951. Since then on nightly 67 I haven't seen further crashes.

QA Whiteboard: qa-not-actionable
Severity: critical → S2

Right now, it looks like a substantial portion of our crashes with this signature are stack overflows, without any particularly deep backtrace (i.e. not a recursive death spiral type overflow). e.g. this one: bp-c4fa80d3-88b7-4416-9f85-c24cb0221013

These stack overflows seem to be pointing to our call to BuildTextRuns -- we're exhausting stack memory when we try to allocate space for our local variables when we enter that function. Maybe there's something we can do to make that function a little cheaper in terms of stack-space-usage?

At first glance, it looks like the heavyweight things there are:

Note that BIG_TEXT_NODE_SIZE is 4096, defined here, which goes back to bug 333659 in 2007 in this commit. 4096 feels quite large... That means this one AutoTArray uses at least 8KB of stack space (uint16_t is 2 bytes, and we're storing 4096 2-byte values, which is 8192 i.e. 8KB). That feels substantial. I wonder how often we actually grow to use all 4096 entries there, and if we could start lower and failing over to heap allocation when we do need the space.

jfkthame, do you think we could change our AutoTArray usage there (and maybe BIG_TEXT_NODE_SIZE in general) to be more conservative of stack space, to avoid triggering stack overflows when entering this function?

Crash Signature: [@ nsTextFrame::EnsureTextRun(nsTextFrame::TextRunType, gfxContext*, nsIFrame*, nsLineList_iterator const*, unsigned int*)] [@ nsTextFrame::EnsureTextRun] → [@ nsTextFrame::EnsureTextRun]
Flags: needinfo?(jfkthame)

(note: the crash report I linked in the previous comment was for a 32-bit system. There are also quite a few stack-overflow crashes with this signature on 64-bit systems, but so far I haven't been able to find a stack overflow here for a 64-bit system with a useful backtrace; they all seem to be missing any call-tree info. e.g. bp-c8be21bc-3676-43c5-a2c6-47f980220902 . I wonder if that's indicative of something else going wrong, vs. if we've somehow lost the ability to stackwalk when there's a stack overflow on some 64-bit systems, vs. something else.)

(In reply to Daniel Holbert [:dholbert] from comment #12)

jfkthame, do you think we could change our AutoTArray usage there (and maybe BIG_TEXT_NODE_SIZE in general) to be more conservative of stack space, to avoid triggering stack overflows when entering this function?

(This is just spitballing a little bit; I'm also not entirely sure that reducing this weight would reduce overall crash volume. Our stack limit is on the order of multiple megabytes, according to kmag's recollection -- so if we're reaching that limit, then shaving off a portion of 8KB in just this one function probably won't help us too much; maybe we'd still overshoot a little later on.)

We could certainly try reducing BIG_TEXT_NODE_SIZE, though before picking a new number it might be interesting to do some instrumentation to see what the distribution of sizes is like while browsing various kinds of content. But as you suggest, it's not that big compared to the stack limit, so it's unclear whether it would really help significantly. In many cases, I think the real problem is recursion further up the call stack. EnsureTextRun (and the BuildTextRunsScanner it uses) does have a fairly large stack frame, and so it will often show up as the frame that finally overflows, but if we make that frame smaller it just means we'll get a few more levels of recursion in before it happens.

The other thing I wonder, looking at bp-c4fa80d3-88b7-4416-9f85-c24cb0221013 from comment 12, is what else is already on the main thread's stack, beyond the ~70 frames here which seem to be a callback processing a window-resized message. These stack frames don't look like nearly enough to cause overflow. Is it possible for this callback to be called by Windows while we're deep inside something else already?

Flags: needinfo?(jfkthame)
Whiteboard: [tbird crash]

Since the crash volume is low (less than 15 per week), the severity is downgraded to S3. Feel free to change it back if you think the bug is still critical.

For more information, please visit auto_nag documentation.

Severity: S2 → S3
You need to log in before you can comment on or make changes to this bug.