crash in nsTextFrame::EnsureTextRun(nsTextFrame::TextRunType, gfxContext*, nsIFrame*, nsLineList_iterator const*, unsigned int*)
Categories
(Core :: Layout, defect)
People
(Reporter: cbook, Assigned: eeejay)
Details
(Keywords: crash, Whiteboard: [tbird crash])
Crash Data
Marking 66/67 as affected. Overall, not a very high volume crash. nsTextFrame::EnsureTextRun signature is the only one that is still crashing. High correlation to accessibility being on.
(100.0% in signature vs 24.18% overall) accessibility = Active [100.0% vs 29.81% if startup_crash = 0]
(73.33% in signature vs 03.51% overall) reason = SIGSEGV / SEGV_ACCERR
(53.33% in signature vs 01.93% overall) Module "libcsc.so" = true
(53.33% in signature vs 02.54% overall) Module "libexynosgscaler.so" = true
(53.33% in signature vs 02.78% overall) Module "libexynosscaler.so" = true
(60.00% in signature vs 12.58% overall) Module "libSEC_EGL.so" = true
(60.00% in signature vs 13.06% overall) Module "gralloc.exynos5.so" = true
(60.00% in signature vs 13.06% overall) Module "u:object_r:system_security_prop:s0" = true
(60.00% in signature vs 14.51% overall) Module "esecomm.jar" = true
(60.00% in signature vs 15.24% overall) Module "boot-esecomm.oat" = true
(60.00% in signature vs 15.24% overall) Module "system@framework@boot-esecomm.art" = true
Comment 7•6 years ago
Could this spike in crashes on Android during 66 be related to bug 1518247?
Comment 8•6 years ago
Good chance it is. I think this is because we calculate accessible names much more often. It looks like it aggravated something in layout. At least from the accessibility module's perspective, I think we are doing our due diligence and null-checking the frames. I'll see if I can somehow trigger this locally.
Comment 9•6 years ago
This is the #6 overall top crash in Fennec 66b11. I guess we'll see where it ends up in b13, which just shipped.
Comment 10•6 years ago
I'm hoping the patch in bug 1531346 will remedy this, but I want to wait before calling it a duplicate.
Comment 11•6 years ago
It looks as if one signature on 67 has a crash in build 20190308093915 and the other signature has a crash in build 20190307094951. Since then I haven't seen further crashes on nightly 67.
Comment 12•3 years ago
Right now, it looks like a substantial portion of our crashes with this signature are stack overflows, without any particularly deep backtrace (i.e. not a recursive death spiral type overflow). e.g. this one: bp-c4fa80d3-88b7-4416-9f85-c24cb0221013
These stack overflows seem to be pointing to our call to BuildTextRuns -- we're exhausting stack memory when we try to allocate space for our local variables when we enter that function. Maybe there's something we can do to make that function a little cheaper in terms of stack-space-usage?
At first glance, it looks like the heavyweight things there are:
- `BuildTextRunsScanner scanner`, which brings along some `AutoTArray` instances that each reserve some stack space (on the order of 100 pointers' worth in total, I think? Three arrays with sizes 10, 50, and 10, where the first array stores pointer-pairs and the last two store individual pointers).
- Another `AutoTArray<char16_t, BIG_TEXT_NODE_SIZE> buffer;`
  https://searchfox.org/mozilla-central/rev/76ccfc801e6b736c844cde3fddeab7a748fc8515/layout/generic/nsTextFrame.cpp#1590
Note that BIG_TEXT_NODE_SIZE is 4096, defined here, which goes back to bug 333659 in 2007 in this commit. 4096 feels quite large... That means this one AutoTArray uses at least 8KB of stack space (char16_t is 2 bytes, and we're storing 4096 2-byte values, which is 8192 bytes, i.e. 8KB). That feels substantial. I wonder how often we actually grow to use all 4096 entries there, and whether we could start lower and fail over to heap allocation when we do need the space.
jfkthame, do you think we could change our AutoTArray usage there (and maybe BIG_TEXT_NODE_SIZE in general) to be more conservative of stack space, to avoid triggering stack overflows when entering this function?
Comment 13•3 years ago
(note: the crash report I linked in the previous comment was for a 32-bit system. There are also quite a few stack-overflow crashes with this signature on 64-bit systems, but so far I haven't been able to find a stack overflow here for a 64-bit system with a useful backtrace; they all seem to be missing any call-tree info. e.g. bp-c8be21bc-3676-43c5-a2c6-47f980220902 . I wonder if that's indicative of something else going wrong, vs. if we've somehow lost the ability to stackwalk when there's a stack overflow on some 64-bit systems, vs. something else.)
Comment 14•3 years ago
(In reply to Daniel Holbert [:dholbert] from comment #12)
> jfkthame, do you think we could change our AutoTArray usage there (and maybe BIG_TEXT_NODE_SIZE in general) to be more conservative of stack space, to avoid triggering stack overflows when entering this function?
(This is just spitballing a little bit; I'm also not entirely sure that reducing this weight would reduce overall crash volume. Our stack limit is on the order of multiple megabytes, according to kmag's recollection -- so if we're reaching that limit, then shaving off a portion of 8KB in just this one function probably won't help us too much; maybe we'd still overshoot a little later on.)
Comment 15•3 years ago
We could certainly try reducing BIG_TEXT_NODE_SIZE, though before picking a new number it might be interesting to do some instrumentation to see what the distribution of sizes is like while browsing various kinds of content. But as you suggest, it's not that big compared to the stack limit, so it's unclear whether it would really help significantly. In many cases, I think the real problem is recursion further up the call stack. EnsureTextRun (and the BuildTextRunsScanner it uses) does have a fairly large stack frame, and so it will often show up as the frame that finally overflows, but if we make that frame smaller it just means we'll get a few more levels of recursion in before it happens.
The other thing I wonder, looking at bp-c4fa80d3-88b7-4416-9f85-c24cb0221013 from comment 12, is what else is already on the main thread's stack, beyond the ~70 frames here which seem to be a callback processing a window-resized message. These stack frames don't look like nearly enough to cause overflow. Is it possible for this callback to be called by Windows while we're deep inside something else already?
Comment 16•3 years ago
Since the crash volume is low (less than 15 per week), the severity is downgraded to S3. Feel free to change it back if you think the bug is still critical.
For more information, please visit auto_nag documentation.