Closed Bug 1200021 Opened 9 years ago Closed 9 years ago

crash in mozilla::layers::ContentClientDoubleBuffered::FinalizeFrame(nsIntRegion const&)

Categories

(Core :: Graphics, defect)

40 Branch
x86
Windows NT
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla45
Tracking Status
firefox40 --- wontfix
firefox41 --- wontfix
firefox42 --- wontfix
firefox43 + fixed
firefox44 - unaffected
firefox45 + fixed
firefox-esr38 --- affected
b2g-v2.5 --- fixed

People

(Reporter: philipp, Assigned: milan)

References

Details

(Keywords: crash, topcrash, Whiteboard: gfx-noted)

Crash Data

Attachments

(4 files, 6 obsolete files)

3.36 KB, patch
milan
: review+
milan
: checkin+
Details | Diff | Splinter Review
3.70 KB, patch
bas.schouten
: review+
cbook
: checkin+
Details | Diff | Splinter Review
18.90 KB, patch
milan
: review+
Details | Diff | Splinter Review
17.67 KB, patch
milan
: review+
Details | Diff | Splinter Review
This bug was filed from the Socorro interface and is report bp-10629dc5-0118-44ca-9ad3-816cf2150829. ============================================================= Crashing Thread Frame Module Signature Source 0 xul.dll mozilla::layers::ContentClientDoubleBuffered::FinalizeFrame(nsetRegion const&) gfx/layers/client/ContentClient.cpp 1 xul.dll mozilla::layers::RotatedContentBuffer::BeginPaint(mozilla::layers::PaintedLayer*, unsigned int) gfx/layers/RotatedBuffer.cpp 2 xul.dll mozilla::layers::ContentClientRemoteBuffer::BeginPaintBuffer(mozilla::layers::PaintedLayer*, unsigned int) gfx/layers/client/ContentClient.h 3 xul.dll mozilla::layers::ClientPaintedLayer::PaintThebes() gfx/layers/client/ClientPaintedLayer.cpp 4 xul.dll mozilla::layers::ClientPaintedLayer::RenderLayerWithReadback(mozilla::layers::ReadbackProcessor*) gfx/layers/client/ClientPaintedLayer.cpp 5 xul.dll mozilla::layers::ClientContainerLayer::RenderLayer() gfx/layers/client/ClientContainerLayer.h 6 xul.dll mozilla::layers::ClientLayer::RenderLayerWithReadback(mozilla::layers::ReadbackProcessor*) gfx/layers/client/ClientLayerManager.h 7 xul.dll mozilla::layers::ClientContainerLayer::RenderLayer() gfx/layers/client/ClientContainerLayer.h 8 xul.dll mozilla::layers::ClientLayerManager::EndTransactionInternal(void (*)(mozilla::layers::PaintedLayer*, gfxContext*, nsetRegion const&, mozilla::layers::DrawRegionClip, nsetRegion const&, void*), void*, mozilla::layers::LayerManager::EndTransactionFlags) gfx/layers/client/ClientLayerManager.cpp 9 xul.dll mozilla::layers::ClientLayerManager::EndTransaction(void (*)(mozilla::layers::PaintedLayer*, gfxContext*, nsetRegion const&, mozilla::layers::DrawRegionClip, nsetRegion const&, void*), void*, mozilla::layers::LayerManager::EndTransactionFlags) gfx/layers/client/ClientLayerManager.cpp 10 xul.dll nsDisplayList::PaintRoot(nsDisplayListBuilder*, nsRenderingContext*, unsigned int) layout/base/nsDisplayList.cpp 11 xul.dll nsLayoutUtils::PaintFrame(nsRenderingContext*, nsIFrame*, nsRegion const&, unsigned int, unsigned int) layout/base/nsLayoutUtils.cpp 12 xul.dll PresShell::Paint(nsView*, nsRegion const&, unsigned int) layout/base/nsPresShell.cpp this is a graphics crash across all versions of windows (xp-win10) and occurring with all gpu vendors. at the moment it ranks on #15 on the top crash scoreboard with firefox release, but this signature has been around for a while already, including in 38esr. some of the comments point to it happening on certain sites (facebook.com, eurosport.com). https://crash-stats.mozilla.com/search/?signature=%3Dmozilla%3A%3Alayers%3A%3AContentClientDoubleBuffered%3A%3AFinalizeFrame%28nsIntRegion+const%26%29&_facets=version&_facets=user_comments&_facets=adapter_vendor_id&_facets=adapter_driver_version&_facets=app_notes&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-version
This is a null crash. This change here looks suspicious: http://hg.mozilla.org/mozilla-central/diff/436cc8abfda3/gfx/layers/client/ContentClient.cpp Bas you're assuming that 'BorrowDrawTarget()' is not null but it can be. Does this need to be fixed?
Flags: needinfo?(bas)
Whiteboard: gfx-noted
I believe I have the same bug too with v2.35: https://crash-stats.mozilla.com/report/index/6f845258-9280-494a-a806-5628c2150906 for a crash dump with a brand new profile and in safe mode. Basically, I see asian characters, and then the crash like on http://google.cn , https://www.youtube.com/watch?v=fJomcfTgNZ8&t=1191 , http://www.seamonkey-project.org/releases/ , etc. I couldn't reproduce it on my other machines: Debian oldstable with 64-bit contrib build, Mac OS X v10.8.5, and a clean Windows XP Pro SP3 VM. :( Since I can easily reproduce this on my very old English Windows XP Pro SP3 machine, I can help about by collecting, trying newer builds (as long as they are stable to use as my primary software! ;)), etc.
Oh, I meant to say SEAMONKEY v2.35 in my above comment. :/
We've recently noticed reports of users reporting consistent crashes when visiting a page on our site (futurelearn.com). At lease some of these crashes look to be related to this bug from looking at the signature of the crash report. All these users are on v40.0.2 and v40.0.3 of Firefox are on Windows 7 (6.1.7601 Service Pack 1). We do have CSS transitions on this page as well as a video player and javascript enhanced links. Our users are reporting when they click one of these links Firefox crashes. Example of these crash reports: https://crash-stats.mozilla.org/report/index/a5b0fe41-e8fe-4bbb-862e-7345c2150823 (submitted to us by a user) https://crash-stats.mozilla.com/report/index/ae9810be-e16a-4df5-a389-cdf5b2150901 (found by searching crash reports for 'futurelearn') We're also seeing crash reports with a different signature. These crash reports are similar in other respects since they occur on the same pages and for the same OS and Firefox version so may be related: Submitted to us from users: https://crash-stats.mozilla.org/report/index/e0cda87d-2d2c-4c3a-89b3-1cdaf2150831 https://crash-stats.mozilla.org/report/index/ab9a24af-8d7e-4f56-9282-3a8312150904 https://crash-stats.mozilla.com/report/index/6686d44c-88e1-4566-a546-38f152150907 Found by searching crash report comments for 'futurelearn' https://crash-stats.mozilla.com/report/index/15ad7717-599b-4ea7-9ef9-3a7ba2150908 https://crash-stats.mozilla.com/report/index/ed78562b-e1c0-46f5-bbca-076a62150908 https://crash-stats.mozilla.com/report/index/d1a6e077-dd90-449c-a271-4f5712150909 If it's helpful, I can share an example page (it's behind a log in). We've tried to reproduce the problem on our machines but have not yet been successful so haven't been able to create a minimal example.
Chris, thank you for that information. Milan, the OOM|small reports in comment 4 are coming from PushNewDT, so I suspect they may be related to the null BorrowDrawTarget() that we see in comment 1. It doesn't seem like a typical OOM though (I see plenty of virtual and physical memory available in these reports) so are they exhausting some graphics resources, perhaps? Chris can share an example if needed.
Flags: needinfo?(milan)
(In reply to Benoit Girard (:BenWa) from comment #1) > This is a null crash. This change here looks suspicious: > http://hg.mozilla.org/mozilla-central/diff/436cc8abfda3/gfx/layers/client/ > ContentClient.cpp > > Bas you're assuming that 'BorrowDrawTarget()' is not null but it can be. > Does this need to be fixed? It shouldn't be, and if it is it's probably worth adding a gfxCriticalError, since I don't understand how it could be in this case, but I suspect the right decision is just aborting the coby when BorrowDrawTarget is nullptr.
Flags: needinfo?(bas)
IIRC BorrowDrawTarget can cause a memory allocation. These should probably always be proper fallible allocation since layers can be quite large. Unless I'm mistaken?
(In reply to Benoit Girard (:BenWa) from comment #7) > IIRC BorrowDrawTarget can cause a memory allocation. These should probably > always be proper fallible allocation since layers can be quite large. Unless > I'm mistaken? David said: 't doesn't seem like a typical OOM though (I see plenty of virtual and physical memory available in these reports) so are they exhausting some graphics resources, perhaps?' On the side, honestly, I don't know if I'm a big believer in fallible allocations in general. Firefox doesn't have to have large transient memory usage, i.e. if we fail once, we're just going to fail more and more at progressively smaller chunks. All I've ever seen fallible allocations do is destroy my session in a slower and more annoying way.
If we were to abort on a failed allocation, we should only do it on an "unreasonably large" ones - I agree with Bas that if we're in trouble we should just crash, but sometimes people just ask for a large chunk and crashing wouldn't be nice (don't think it's the case here, but printing is one scenario where we ask for large things and crashing isn't nice.)
Flags: needinfo?(milan)
I'll get a diagnostic patch to confirm this is OOM situation - just because Cairo returns 1 as the failure status, it doesn't mean the OOM is the actual cause. The code is quite trigger happy to blame "out of memory" for all sorts of other failure reasons.
Ant - are you able to reproduce in Firefox? I could get a custom build of Firefox with some extra diagnostics.
Flags: needinfo?(ant)
(In reply to Milan Sreckovic [:milan] (in Taipei 9/14-9/18) from comment #11) > Ant - are you able to reproduce in Firefox? I could get a custom build of > Firefox with some extra diagnostics. I was able to reproduce it easily with a portable Firefox v40.0.3 (http://iweb.dl.sourceforge.net/project/portableapps/Mozilla%20Firefox%2C%20Portable%20Ed./Mozilla%20Firefox%2C%20Portable%20Edition%2040.0.3/FirefoxPortable_40.0.3_English.paf.exe). For some reason, my crash reports didn't submit. :( Can you build me a portable Firefox with debugging to share? FYI, I'm going to sleep soon.
Flags: needinfo?(ant)
Just some notes: The assumption is that mFrontClient->BorrowDrawTarget() returns null in ContentClientDoubleBuffered::FinalizeFrame. Before this call, we ensure mFrontClient is valid and locked. BufferTextureClient::BorrowDrawTarget can fail if ImageDataSerializer constructor "fails" or its call ImageDataSerializer::GetAsDrawTarget fails (for Cairo.) ImageDataSerializer constructor can "fail" if the TextureClient doesn't have a buffer. We're currently assuming this is not the case (adding diagnostics to confirm). ImageDataSerializer::GetAsDrawTarget fails iff Factory::CreateDrawTargetForData fails (for Cairo.) Factory::CreateDrawTargetForData (for Cairo) fails: * if we have invalid size (not the case here) * if DrawTargetCairo::Init fails - we are assuming this is the case (adding diagnostics to confirm.) DrawTargetCairo::Init fails if DrawTargetCairo::InitAlreadyReferenced fails - we assume this is the case, and we have a error message in the crash supporting this. It fails with cairo error code 1, which is "out of memory", but that code is used for non-out of memory situations as well. The above fails the way it does iff cairo_image_surface_create_for_data creates invalid surface. This can happen if we have null data, but that would have triggered a failure in ImageDataSerializer's constructor, and we wouldn't get this far. Not likely (but possible, as the error message we see could be from another error, thus adding extra diagnostics.) cairo_image_surface_create_for_data failing with out of memory code means _cairo_image_surface_create_with_pixman_format fails with that code. This can happen if pixman_image_create_bits returns null, or if _cairo_image_surface_create_for_pixman_image fails with the out of memory code (which means malloc failed.) pixman_image_create_bits can return null iff create_bits_image_internal returns null. This can happen if the byte stride is not a multiple of 4 or if _pixman_bits_image_init fails. With the sizes we're seeing, this last one means malloc/calloc failed. ----------------------------------------- OOM may be the case, but it is a boring scenario - this is malloc failing inside of Cairo. I like the idea of something going bad with the stride, especially since great majority of the failure messages are with textures of dimensions not multiple of 4 (although some are.)
Attachment #8662226 - Flags: review?(bas)
Attachment #8662226 - Flags: review?(bas) → review+
So, far, only one crash with a nightly with this change in it (https://crash-stats.mozilla.com/report/index/87d5f038-cef4-4a9a-b233-6db592150921). May need more diagnostics.
Review ping
Flags: needinfo?(bas)
I can reproduce the gfx assert 100% of the time if that's useful: [GFX1-]: Invalid draw target type specified: 0 [GFX1-]: Failed GetAsDrawTarget true, 0x121993000 + 16, Size(1564,60), 6256, 0 STR: 1) Open http://people.mozilla.org/~bgirard/cleopatra/?zippedProfile=http://mozilla-releng-blobs.s3.amazonaws.com/blobs/Try/sha512/19b4dd63ca80622e977632224f4205877ea032e263b7b053e9c3b91c36a976d68ba8be86366a6b964e081bb374287a1655b38fed083f10a12678ed78a6da1718 2) Wait for the page to load (the tree view ATM doesn't load but it should reproduce without it)
Reproduces with a smaller window (smaller surfaces) and different formats: [GFX1-]: Invalid draw target type specified: 0 [GFX1-]: Failed GetAsDrawTarget true, 0x125c60000 + 16, Size(413,60), 1652, 0 [GFX1-]: Invalid draw target type specified: 0 [GFX1-]: Failed GetAsDrawTarget true, 0x113621000 + 16, Size(428,60), 1712, 0
Can you reproduce on a local build with the attached patch? Just want to see if there are any additional messages.
Flags: needinfo?(bgirard)
I'm reproducing this on OSX ATM so these warnings wont be relevant. I'll check to see what happens on windows.
The warnings don't reproduce on mac. Perhaps I'm seeing something different then.
Flags: needinfo?(bgirard)
Attachment #8665434 - Flags: review?(bas) → review+
Flags: needinfo?(bas)
Simple diagnostics on error, saving the try run. Let me know if we want one anyway.
Attachment #8665434 - Flags: checkin? → checkin+
Crash Signature: [@ mozilla::layers::ContentClientDoubleBuffered::FinalizeFrame(nsIntRegion const&)] → [@ mozilla::layers::ContentClientDoubleBuffered::FinalizeFrame(nsIntRegion const&)] [@ mozilla::layers::ContentClientDoubleBuffered::FinalizeFrame]
Bughunter has seen mozilla::layers::ContentClientDoubleBuffered::FinalizeFrame crashes on Release, Beta, Aurora, Nightly and NS_ABORT_OOM gfxContext::PushNewDT on Beta and Aurora at least. Manually attempting to reproduce with release builds gives me: http://www.monister.ir/page/MP Firefox 41.0.1 Windows 7 bp-0efe95a2-d516-4a82-9616-dba702151104 [@ mozilla::layers::ContentClientDoubleBuffered::FinalizeFrame ] Firefox 42.0.0 Windows 7 bp-18162508-09cb-477d-af0b-b486f2151104 [@ mozilla::layers::ContentClientDoubleBuffered::FinalizeFrame ] Firefox 43.0b1 Windows 7 bp-d8b4b7d4-226a-497b-af4a-d01b42151104 [@ mozilla::layers::ContentClientDoubleBuffered::FinalizeFrame ] Firefox 44.0a2 Windows 7 no crash Firefox 45.0a1 Windows 7 no crash http://www.cnblogs.com/LonelyShadow/p/4925127.html Windows 7 43.0b1 bp-6960fa10-8a02-454c-a6ba-204d62151104 [@ OOM | small ] Windows 7 43.0b1 bp-2579e4bd-f3e8-4ab3-bc1c-5cebe2151104 [@ OOM | unknown | NS_ABORT_OOM | gfxContext::PushNewDT ] Windows 7 44.0a1 bp-2579e4bd-f3e8-4ab3-bc1c-5cebe2151104 [@ OOM | unknown | NS_ABORT_OOM | gfxContext::PushNewDT ] http://recruit-match.ncsasports.org/clientrms/athletes/967364 I have less luck manually with this url. Nightly 45 tinderbox builds fail with: ??????s: T[GFX2-]: DrawTargetCairo context in error state: out of memory(1) [GFX2-]: ASurface Init failed with Cairo status 1 on 0x000007FEEDA87800 [GFX2-]: DrawTargetCairo context in error state: out of memory(1) ... [GFX2-]: DrawTargetCairo context in error state: out of memory(1) [GFX1]: Attempt to create DrawTarget for invalid surface. Size(7760,2) Cairo Status: 1 Assertion failure: false (An assert from the graphics logger), at c:\builds\moz2_slave\m-cen-w64-d-000000000000000000\build\src\gfx\2d\Logging.h:504
See Also: → 1216909
See Also: → 1175903
using mozregression with http://www.cnblogs.com/LonelyShadow/p/4925127.html narrowed the regression range to --good=2014-12-28 --bad=2015-03-15 but the browser and/or flash began hanging and it was not possible to progress further. http://www.monister.ir/page/MP is being updated now and has an interstitial page atm. :-(
Bob, could you set environment variable MOZ_CAIRO_ERROR_ABORT to 1 and run nightly (or dev edition) and catch that crash? Also, one of the comments mentions http:://Futrelearn.org as "I always crash on it".
Flags: needinfo?(bob)
Nightly Windows 7 32bit w/MOZ_CAIRO_ERROR_ABORT bp-eb885946-95cb-4a1f-9c78-0657d2151105
http://Futrelearn.org did not reproduce in normal automation in Beta/43, Aurora/44, Nightly/45.
Flags: needinfo?(bob)
I think I got it to crash with the URL from comment 31, so I'll see how far I can get.
Assignee: nobody → milan
The OOM message we get from Cairo appears to be a GDI_ERROR returned by GetGlyphOutlineW called here: https://dxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-win32-font.c#1743
Some notes so that I don't forget. DrawTargetCairo::FillGlyphs() eventually calls _cairo_gstate_show_text_glyphs(). In this case, the first attempt to render using _cairo_scaled_font_glyph_path() fails, and sets the error status on the context - using "out of memory" as "unknown error". The fallback is to call _cairo_surface_show_text_glyphs(), which also fails, but it sets the error on the surface instead. Likely not much of a difference, as DrawTargetCairo::Flush() flushes the "oom invalid surface" if the context has that error on it.
This patch may have some controversial pieces to it: There is the addition of "IsValid" to the DrawTarget, but only the Cairo version actually checks - the rest, through the base class, just return true. We leave image surface invalid if the Cairo surface we use in creation is invalid. We now won't make a context for an invalid draw target. The callers are ready for it.
Attachment #8685083 - Flags: review?(bas)
Mentioned in the channel meeting today as a topcrash for 43.0b4. Tracking since it sounds like a high volume crash.
Keywords: topcrash
Review ping.
Flags: needinfo?(bas)
Comment on attachment 8685083 [details] [diff] [review] Watch for failing Cairo draw targets. r=bas Review of attachment 8685083 [details] [diff] [review]: ----------------------------------------------------------------- I'm a little scared of this as long as we don't have a way to get reports for this happening in our release population, this could easily lead to a not crashing but completely dysfunctional browser, which arguably is a lot worse than crashing. ::: gfx/layers/client/ContentClient.cpp @@ +591,5 @@ > surfOnWhite, > mFrontBufferRect, > mFrontBufferRotation); > UpdateDestinationFrom(frontBuffer, updateRegion); > } else { gfxCriticalError() } it will be nice to know where the failing snapshots are coming from even though the Snapshot call itself will log this as well.
Attachment #8685083 - Flags: review?(bas) → review+
Flags: needinfo?(bas)
Added a note so that we can track when this happens. I wasn't worried about it because there are other early returns out of the function, so it seems we are ready for it. https://treeherder.mozilla.org/#/jobs?repo=try&revision=78bf92ee86e5
Attachment #8685083 - Attachment is obsolete: true
Attachment #8691531 - Flags: review+
Checkin-needed for Part 3.
(In reply to Milan Sreckovic [:milan] from comment #44) > Checkin-needed for Part 3. Hi, this failed to apply: applying 1200021.3.p4 patching file gfx/thebes/gfxContext.cpp Hunk #1 FAILED at 82 1 out of 1 hunks FAILED -- saving rejects to file gfx/thebes/gfxContext.cpp.rej patch failed, unable to continue (try -v) patch failed, rejects left in working directory errors during apply, please fix and qrefresh 1200021.3.p4 could you take a look, thanks!
Flags: needinfo?(milan)
Rebase.
Attachment #8691561 - Attachment is obsolete: true
Flags: needinfo?(milan)
Attachment #8692576 - Flags: review+
Keywords: checkin-needed
Looks like part 3 still needs to land here. This is the #2 topcrash on 43 beta 7. Milan, are we any closer to a fix on beta for this, or is this all work to help diagnose what's happening?
Flags: needinfo?(milan)
Flags: needinfo?(cbook)
hm still seems to fail to apply: renamed 1200021 -> 1200021.3.p5 applying 1200021.3.p5 patching file gfx/layers/TextureDIB.cpp Hunk #1 FAILED at 47 Hunk #2 FAILED at 111 Hunk #3 FAILED at 130 3 out of 4 hunks FAILED -- saving rejects to file gfx/layers/TextureDIB.cpp.rej patching file gfx/layers/basic/BasicPaintedLayer.cpp Hunk #1 FAILED at 156 1 out of 2 hunks FAILED -- saving rejects to file gfx/layers/basic/BasicPaintedLayer.cpp.rej patching file gfx/layers/client/ClientPaintedLayer.cpp Hunk #1 FAILED at 74 1 out of 1 hunks FAILED -- saving rejects to file gfx/layers/client/ClientPaintedLayer.cpp.rej patch failed, unable to continue (try -v) patch failed, rejects left in working directory errors during apply, please fix and qrefresh 1200021.3.p5 Tomcats-MacBook-Pro-2:mozilla-inbound Tomcat$
Flags: needinfo?(cbook)
Keywords: checkin-needed
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #47) > Looks like part 3 still needs to land here. > This is the #2 topcrash on 43 beta 7. Milan, are we any closer to a fix on > beta for this, or is this all work to help diagnose what's happening? I'll put up a beta version of the patch (and a rebased patch for nightly.)
Flags: needinfo?(milan)
Rebased. (Should also survive the landing of an unrelated bug 1072501 that changes some of the same files.)
Attachment #8692576 - Attachment is obsolete: true
Attachment #8694258 - Flags: review+
Approval Request Comment See comment 47. We don't seem to have enough patches to see the problem on nightly or aurora, so the idea was to land on beta, then back out if it doesn't fix this, or creates (other) problems.
Attachment #8694259 - Flags: review+
Attachment #8694259 - Flags: approval-mozilla-beta?
Comment on attachment 8694259 [details] [diff] [review] Beta patch - watch for failing Cairo surfaces. Carry r=bas. Speculative fix for a topcrash in beta. Please uplift this to beta only.
Attachment #8694259 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment on attachment 8694258 [details] [diff] [review] Part 3. Watch for failing Cairo draw targets. Carry r=bas This doesn't apply to inbound tip.
Attachment #8694258 - Flags: checkin?
Flags: needinfo?(milan)
Rebased; seems the changes to related files have slowed down, this should hopefully apply.
Attachment #8694258 - Attachment is obsolete: true
Flags: needinfo?(milan)
Attachment #8696068 - Flags: review+
In the meantime, let's see if this goes away in 43b8, or whatever is built after Dec 1st.
yes, this signature has totally ceased with 43.0b8! thank you
See Also: → 1230740
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla45
Just a note in this bug as well - this went to 43 and 45, but not 44. 45 also got the follow up fix in bug 1230740. If we decide that we want this in 44, we need to pick up bug 1230740 at the same time. Put all this together, probably worth a separate bug to track - bug 1233182.
Depends on: 1380431
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: