Closed
Bug 1324130
Opened 8 years ago
Closed 7 years ago
Crash in _pixman_implementation_lookup_composite
Categories
(Core :: Graphics, defect)
Tracking
()
RESOLVED
FIXED
mozilla54
People
(Reporter: philipp, Assigned: lsalzman)
References
Details
(Keywords: crash, regression, Whiteboard: [gfx-noted])
Crash Data
Attachments
(2 files, 1 obsolete file)
936 bytes,
patch
|
jrmuizel
:
review+
|
Details | Diff | Splinter Review |
2.12 KB,
patch
|
lsalzman
:
review+
jcristau
:
approval-mozilla-aurora+
jcristau
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is report bp-744b786c-4db6-450b-a46c-e03122161216. ============================================================= Crashing Thread (0) Frame Module Signature Source 0 xul.dll _pixman_implementation_lookup_composite gfx/cairo/libpixman/src/pixman-implementation.c:108 1 xul.dll _moz_pixman_image_composite32 gfx/cairo/libpixman/src/pixman.c:683 2 xul.dll _cairo_image_surface_composite gfx/cairo/cairo/src/cairo-image-surface.c:4318 3 xul.dll _cairo_surface_composite gfx/cairo/cairo/src/cairo-surface.c:1889 4 xul.dll _cairo_surface_fallback_composite gfx/cairo/cairo/src/cairo-surface-fallback.c:1431 5 xul.dll _cairo_surface_composite gfx/cairo/cairo/src/cairo-surface.c:1900 6 xul.dll _composite_rectangle gfx/cairo/cairo/src/cairo-win32-surface.c:3176 7 xul.dll _clip_and_composite_trapezoids gfx/cairo/cairo/src/cairo-win32-surface.c:3226 8 xul.dll _cairo_win32_surface_fallback_fill gfx/cairo/cairo/src/cairo-win32-surface.c:3841 9 xul.dll _cairo_path_fixed_fill_rectilinear_to_region gfx/cairo/cairo/src/cairo-path-fill.c:258 10 xul.dll _cairo_gstate_fill gfx/cairo/cairo/src/cairo-gstate.c:1290 11 xul.dll mozilla::gfx::DrawTargetCairo::DrawPattern(mozilla::gfx::Pattern const&, mozilla::gfx::StrokeOptions const&, mozilla::gfx::DrawOptions const&, mozilla::gfx::DrawTargetCairo::DrawPatternType, bool) gfx/2d/DrawTargetCairo.cpp:1032 12 xul.dll mozilla::gfx::DrawTargetCairo::FillRect(mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits, float> const&, mozilla::gfx::Pattern const&, mozilla::gfx::DrawOptions const&) gfx/2d/DrawTargetCairo.cpp:1089 13 xul.dll gfxContext::Fill(mozilla::gfx::Pattern const&) gfx/thebes/gfxContext.cpp:198 14 xul.dll gfxContext::SetMatrix(gfxMatrix const&) gfx/thebes/gfxContext.cpp:272 15 xul.dll nsCSSRendering::PaintGradient(nsPresContext*, nsRenderingContext&, nsStyleGradient*, nsRect const&, nsRect const&, nsRect const&, nsSize const&, mozilla::gfx::IntRectTyped<mozilla::CSSPixel> const&, nsSize const&) layout/base/nsCSSRendering.cpp:3038 16 xul.dll nsImageRenderer::Draw(nsPresContext*, nsRenderingContext&, nsRect const&, nsRect const&, nsRect const&, nsPoint const&, nsSize const&, mozilla::gfx::IntRectTyped<mozilla::CSSPixel> const&) layout/base/nsCSSRendering.cpp:5482 17 xul.dll nsImageRenderer::DrawBackground(nsPresContext*, nsRenderingContext&, nsRect const&, nsRect const&, nsPoint const&, nsRect const&, nsSize const&) layout/base/nsCSSRendering.cpp:5584 crashes with this signature have been for a while but they are increasing in frequency after firefox 50 and again after 51 - they are mostly happening in the content process. just looking at the beta channel, there have been ~1-5 daily crashes with this signature before 50, in 50 there were around 5-20 daily crashes, and in 51 it's rising to a volume of 20-100 daily crashes. currently it's the #10 content crasher on 51.0b7 accounting for 0.9% of content crashes. Correlations for Firefox Beta: (100.0% in signature vs 00.44% overall) address = 0x4 (100.0% in signature vs 13.99% overall) reason = EXCEPTION_ACCESS_VIOLATION_READ (99.49% in signature vs 30.59% overall) Module "wevtapi.dll" = true (99.49% in signature vs 32.20% overall) Module "Wpc.dll" = true (100.0% in signature vs 41.18% overall) Module "pnrpnsp.dll" = true (100.0% in signature vs 41.24% overall) Module "NapiNSP.dll" = true (100.0% in signature vs 41.25% overall) Module "nlaapi.dll" = true (92.89% in signature vs 31.13% overall) Module "samcli.dll" = true (86.80% in signature vs 28.25% overall) Module "RpcRtRemote.dll" = true (86.80% in signature vs 29.33% overall) Module "samlib.dll" = true (86.80% in signature vs 29.51% overall) Module "WSHTCPIP.DLL" = true (85.28% in signature vs 28.63% overall) Module "d3d11.dll" = true (85.28% in signature vs 29.39% overall) Module "dxgi.dll" = true (86.80% in signature vs 31.86% overall) Module "dhcpcsvc.dll" = true (49.75% in signature vs 11.64% overall) os_arch = amd64 (22.84% in signature vs 00.79% overall) adapter_device_id = 0x0042
Comment 1•8 years ago
|
||
From user dump, I found cache in [1] was null and caused problem when access cache->cache[i].fast_path. It looks like the allocation failure of TLS[2] or calloc[3]. I also found the 'Uptime' of these crashes were less than 25 seconds. [1]http://searchfox.org/mozilla-central/source/gfx/cairo/libpixman/src/pixman-implementation.c#91 [2]http://searchfox.org/mozilla-central/source/gfx/cairo/libpixman/src/pixman-compiler.h#179 [3]http://searchfox.org/mozilla-central/source/gfx/cairo/libpixman/src/pixman-compiler.h#149 Jeff, do you have any idea about this bug?
Flags: needinfo?(jmuizelaar)
Updated•8 years ago
|
Whiteboard: [gfx-noted]
Updated•7 years ago
|
Comment 2•7 years ago
|
||
Still alive and well on 52. No idea if 53/54 are affected since this only appears to crop up on Beta and Release. Would be great if we could get someone to investigate this more.
status-firefox54:
--- → ?
Here's a 52 crash - https://crash-stats.mozilla.com/report/index/d8378a6e-1729-4021-870b-3e8352170208. Bas, can this stack happen with the default prefs? 52 should have skia content, this doesn't look like printing, but there is the DrawTargetRecording and basic layers, and it gets a bit confusing.
Flags: needinfo?(milan)
Flags: needinfo?(jmuizelaar)
Flags: needinfo?(bas)
Comment 5•7 years ago
|
||
So the bottom part of that stack makes no sense to me, so I don't know where it's coming from. For the top bit, I agree, it looks like it would have to be printing. Also note the short uptime on this crash. In this case it's actually possible someone was printing, with an uptime of less than 25 seconds it seems a little unlikely. Unless they use Firefox as their PDF handler or something like that? With non-default prefs I don't see how it could happen, but maybe there's parts of the code where we didn't convert to Skia? The people that ran the conversion would probably know better than me about that particular situation.
Flags: needinfo?(bas)
Lee, thoughts? Printing or something going wrong with Skia pref'd Firefox, or somebody flipping the preference to Cairo?
Flags: needinfo?(lsalzman)
Assignee | ||
Comment 7•7 years ago
|
||
(In reply to Milan Sreckovic [:milan] from comment #6) > Lee, thoughts? Printing or something going wrong with Skia pref'd Firefox, > or somebody flipping the preference to Cairo? Looking through some of the 52 bug reports, there are some observations: Sometimes we are just explicitly calling pixman_image_composite32 directly, like in this stack running through imagelib: https://crash-stats.mozilla.com/report/index/bde89733-2ed9-4943-abf5-7bd702170215 Sometimes we are unconditionally creating temporary software Cairo contexts in various places in layout, like the SVG code: https://crash-stats.mozilla.com/report/index/37256978-f04e-4d2a-8355-3c2142170215 Possibly other places too by some negligible counts, but those seem the main ones. So there are sufficient places we still use Cairo DTs, even when we're not even using Cairo as our content or canvas backend. The stacks pretty much all end up here: https://hg.mozilla.org/releases/mozilla-beta/annotate/7b8aa893944b/gfx/cairo/libpixman/src/pixman-implementation.c#l108 The crash address is 0x4, which is strange because the line pegged is info->dest_flags, and dest_flags is stored at way higher offset than 4 within the pixman_fast_path_t struct. However, within the cache_t struct, fast_path itself WOULD be at address 0x4 on x86-32, where the pointer size is 4 bytes, given this cache_t definition: typedef struct { struct { pixman_implementation_t * imp; pixman_fast_path_t fast_path; } cache [N_CACHED_FAST_PATHS]; } cache_t; The report listing indicates this only happens on x86 to back up that evidence. So the only reason accessing &cache[0].fast_path would ever give us 0x4 is if cache is NULL. That would mean this line above is failing: cache = PIXMAN_GET_THREAD_LOCAL (fast_path_cache); Why that is is somewhat of a mystery, but a trivial workaround for it is to just make it bypass the cache if accessing it fails and do the fallback lookup. I can write up a patch for this.
Flags: needinfo?(lsalzman)
Assignee | ||
Comment 8•7 years ago
|
||
The only place pixman actually uses TLS is this cache, and this seems to be intermittently failing for reasons we can't pin down. So for now, this chooses to just make this function more robust against the cache not being available, but still letting pixman work by going through its normal implementation-walking code that does not access the cache, which is not necessary for functioning.
Comment 9•7 years ago
|
||
Comment on attachment 8837764 [details] [diff] [review] verify accessing the TLS fast-path cache succeeded in _pixman_implementation_lookup_composite We need to understand when cache will be null.
Attachment #8837764 -
Flags: review?(jmuizelaar) → review-
Assignee | ||
Comment 10•7 years ago
|
||
(In reply to Jeff Muizelaar [:jrmuizel] from comment #9) > Comment on attachment 8837764 [details] [diff] [review] > verify accessing the TLS fast-path cache succeeded in > _pixman_implementation_lookup_composite > > We need to understand when cache will be null. Okay, I will leave that investigation to someone else who may be interested for now.
Assignee: lsalzman → nobody
Status: ASSIGNED → NEW
Comment 12•7 years ago
|
||
So if this is a TLS problem we can fix it post XP by switching from using the Tls functions and using __declspec(thread). However, this code has not changed in a long time, so if this is a regression the cause came from something else.
Assignee | ||
Comment 13•7 years ago
|
||
Since the last version on which we support Windows XP is 52, and trunk is at 54, we no longer need to use the pixman TLS workaround in trunk. This just removes the enabling of that workaround in our build, so that we now use __declspec(thread), which will get rid of the inherent bugs of setting up the TLS cache. We will uplift a different patch for beta and/or aurora to wallpaper over the problem there unless a better solution can eventually be found.
Updated•7 years ago
|
Attachment #8838143 -
Flags: review?(jmuizelaar) → review+
Assignee | ||
Comment 14•7 years ago
|
||
Jeff, can you r+ the original wallpaper patch on the condition that we will only use it for uplift?
Updated•7 years ago
|
Attachment #8837764 -
Flags: review- → review+
Comment 15•7 years ago
|
||
Comment on attachment 8837764 [details] [diff] [review] verify accessing the TLS fast-path cache succeeded in _pixman_implementation_lookup_composite Review of attachment 8837764 [details] [diff] [review]: ----------------------------------------------------------------- Please add a better comment to this.
Assignee | ||
Comment 16•7 years ago
|
||
Added a comment explaining the situation with Tls functions failing to set up the fast-path cache on Windows XP and how this may cause the fast-path cache to end up null, yet we can never the less function without the fast-path cache, just having to look through the fast-path implementation list via linear search every time.
Attachment #8837764 -
Attachment is obsolete: true
Flags: needinfo?(milan)
Attachment #8838149 -
Flags: review+
Comment 17•7 years ago
|
||
Pushed by lsalzman@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/debb1b927f6a use __declspec(thread) instead of Tls calls in pixman on Windows. r=jrmuizel
Assignee | ||
Comment 18•7 years ago
|
||
Comment on attachment 8838149 [details] [diff] [review] Bug 1324130 - verify accessing the TLS fast-path cache succeeded in _pixman_implementation_lookup_composite. r=jrmuizel Approval Request Comment [Feature/Bug causing the regression]: bug 1007702, so 52+ [User impact if declined]: Crashiness for 32-bit Windows. [Is this code covered by automated tests?]: yes [Has the fix been verified in Nightly?]: No. This patch is only for aurora/beta, as we are going with a different fix for trunk that can't be used for beta 52 due to the requirement that we still support Windows XP in that build. [Needs manual test from QE? If yes, steps to reproduce]: no [List of other uplifts needed for the feature/fix]: aurora (53), beta (52) [Is the change risky?]: No. [Why is the change risky/not risky?]: This code in question can function without the mechanism that would otherwise make it crash. This just bypasses the crash. [String changes made/needed]: None
Attachment #8838149 -
Attachment description: verify accessing the TLS fast-path cache succeeded in _pixman_implementation_lookup_composite → Bug 1324130 - verify accessing the TLS fast-path cache succeeded in _pixman_implementation_lookup_composite. r=jrmuizel
Attachment #8838149 -
Flags: approval-mozilla-beta?
Attachment #8838149 -
Flags: approval-mozilla-aurora?
Comment 19•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/debb1b927f6a
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
Comment 20•7 years ago
|
||
Comment on attachment 8838149 [details] [diff] [review] Bug 1324130 - verify accessing the TLS fast-path cache succeeded in _pixman_implementation_lookup_composite. r=jrmuizel workaround TLS issue in pixman causing crashes on windows, aurora53+, beta52+
Attachment #8838149 -
Flags: approval-mozilla-beta?
Attachment #8838149 -
Flags: approval-mozilla-beta+
Attachment #8838149 -
Flags: approval-mozilla-aurora?
Attachment #8838149 -
Flags: approval-mozilla-aurora+
Comment 21•7 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-aurora/rev/53cd0f71c361
Comment 22•7 years ago
|
||
(In reply to Carsten Book [:Tomcat] from comment #21) > https://hg.mozilla.org/releases/mozilla-aurora/rev/53cd0f71c361 backed this out and landed the correct patch in https://hg.mozilla.org/releases/mozilla-aurora/rev/9a78c5da4fc8855bfe7c3d3234bd3c6e413b4517
Comment 23•7 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/c8b603a9fbd6
Comment 24•7 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-esr52/rev/c8b603a9fbd6
status-firefox-esr52:
--- → fixed
Comment 25•7 years ago
|
||
Setting qe-verify- based on Lee's assessment on manual testing needs (Comment 18) and the fact that this fix has automated coverage.
Flags: qe-verify-
You need to log in
before you can comment on or make changes to this bug.
Description
•