Closed Bug 1252677 Opened 9 years ago Closed 9 years ago

[e10s] Windows-only texture leaks

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla48

Tracking Flags:

Tracking

Status

e10s

m9+

---

firefox46

---

wontfix

firefox47

---

affected

firefox48

---

fixed

People

(Reporter: jgriffin, Assigned: bas.schouten)

References

(Blocks 1 open bug)

Details

(Keywords: meta, Whiteboard: [gfx-noted][e10s-orangeblockers][MemShrink:P1])

Attachments

(3 files, 1 obsolete file)

Increase Windows e10s texture leak threshold even more. 9 years ago Andrew McCreight [:mccr8] 1.20 KB, patch	jgriffin : feedback+	Details \| Diff \| Splinter Review
Increase Windows e10s texture leak threshold even more. 9 years ago Andrew McCreight [:mccr8] 1.50 KB, patch	erahm : review+	Details \| Diff \| Splinter Review
Log when textures are created and destroyed. DO NOT LAND 9 years ago Andrew McCreight [:mccr8] 1.43 KB, patch		Details \| Diff \| Splinter Review
Enable e10s and canvas tests everywhere, disable some canvas tests. DO NOT LAND. 9 years ago Andrew McCreight [:mccr8] 11.37 KB, patch		Details \| Diff \| Splinter Review

Jonathan Griffin (:jgriffin)

Reporter

Description

•

9 years ago

When canvas mochitests are run on Windows 7 debug in e10s mode, they leak: leakcheck | tab process: 4328 bytes leaked (CondVar, Mutex, PTextureChild, SharedMemory, TextureChild, ...) In order to get the suite enabled on Treeherder, I will disable the canvas tests only on win7-debug-e10s. Full log: https://treeherder.mozilla.org/logviewer.html#?job_id=17240284&repo=try

Andrew McCreight [:mccr8]

Comment 1

•

9 years ago

I think this is just a variation on bug 1232549. We could increase the Windows-specific leak thresholds I added in bug 1219919 so that this is green, too.

Andrew McCreight [:mccr8]

Updated

•

9 years ago

Depends on: 1232549

Milan Sreckovic [:milan] (needinfo for best results)

Comment 2

•

9 years ago

Yes, it would be much better to increase that threshold than to disable the tests completely.

Flags: needinfo?(jgriffin)

Whiteboard: [gfx-noted]

Andrew McCreight [:mccr8]

Comment 3

•

9 years ago

I can put together a patch to do that in this bug. Jonathan, do you have links to additional logs? I need to figure out of the number of leaked objects is entirely consistent or might vary a bit.

Andrew McCreight [:mccr8]

Updated

•

9 years ago

Assignee: nobody → continuation

Andrew McCreight [:mccr8]

Comment 4

•

9 years ago

Attached patch Increase Windows e10s texture leak threshold even more. (obsolete) — Details — Splinter Review

Jonathan, does this make the orange go away?

Attachment #8725774 - Flags: feedback?(jgriffin)

Jonathan Griffin (:jgriffin)

Reporter

Comment 5

•

9 years ago

(In reply to Andrew McCreight [:mccr8] from comment #3) > I can put together a patch to do that in this bug. Jonathan, do you have > links to additional logs? I need to figure out of the number of leaked > objects is entirely consistent or might vary a bit. Here's another log with a different number of bytes leaked: https://treeherder.mozilla.org/logviewer.html#?job_id=17269279&repo=try I have a couple of more jobs pending on try so we'll see what the range is.

Flags: needinfo?(jgriffin)

Jonathan Griffin (:jgriffin)

Reporter

Comment 6

•

9 years ago

Comment on attachment 8725774 [details] [diff] [review] Increase Windows e10s texture leak threshold even more. Review of attachment 8725774 [details] [diff] [review]: ----------------------------------------------------------------- lgtm; I just triggered a try run with this to verify it's effective: https://treeherder.mozilla.org/#/jobs?repo=try&revision=5324da8c8d45

Attachment #8725774 - Flags: feedback?(jgriffin) → feedback+

Tracy Walker [:tracy]

Updated

•

9 years ago

tracking-e10s: --- → +

Ryan VanderMeulen [:RyanVM]

Comment 7

•

9 years ago

I've got a similar-looking leak in dom/html/test on WinXP. https://treeherder.mozilla.org/logviewer.html#?job_id=17794695&repo=try

Blocks: 1253849

Andrew McCreight [:mccr8]

Comment 8

•

9 years ago

It seems like Windows graphics code can just leak an arbitrary number of textures. (Maybe there is some kind of cap, I don't know.)

Andrew McCreight [:mccr8]

Comment 9

•

9 years ago

I'll move forward with this tomorrow, hopefully. I lost track of it.

Flags: needinfo?(continuation)

Andrew McCreight [:mccr8]

Comment 10

•

9 years ago

Attached patch Increase Windows e10s texture leak threshold even more. — Details — Splinter Review

At a certain point, we should just ignore these texture leaks entirely. But for now I'll just increase the limit some more. This also increases it enough to cover Ryan's latest round of even more leaks.

Attachment #8729106 - Flags: review?(erahm)

Andrew McCreight [:mccr8]

Updated

•

9 years ago

Flags: needinfo?(continuation)

Andrew McCreight [:mccr8]

Updated

•

9 years ago

Attachment #8725774 - Attachment is obsolete: true

Eric Rahm [:erahm]

Comment 11

•

9 years ago

Comment on attachment 8729106 [details] [diff] [review] Increase Windows e10s texture leak threshold even more. Review of attachment 8729106 [details] [diff] [review]: ----------------------------------------------------------------- r=me, although I think we should push harder to get the leak (bug 1232549) properly fixed.

Attachment #8729106 - Flags: review?(erahm) → review+

Andrew McCreight [:mccr8]

Updated

•

9 years ago

Keywords: checkin-needed

Andrew McCreight [:mccr8]

Comment 12

•

9 years ago

I'll investigate this a bit to try and figure out if we are really leaking one texture per canvas or something and file a more specific bug and try to get it marked tracking.

Pulsebot

Comment 13

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/021fe8e5b781

Keywords: checkin-needed

Carsten Book [:Tomcat]

Comment 14

•

9 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/021fe8e5b781

Status: NEW → RESOLVED

Closed: 9 years ago

status-firefox48: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla48

Jonathan Griffin (:jgriffin)

Reporter

Comment 15

•

9 years ago

This still occurs, see e.g., https://treeherder.mozilla.org/logviewer.html#?job_id=17962898&repo=try

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Andrew McCreight [:mccr8]

Comment 16

•

9 years ago

WARNING | leakcheck | tab process: leaked too many Mutex (expected 15, got 31) WARNING | leakcheck | tab process: leaked too many PTextureChild (expected 9, got 23) Yeah, that's a lot. I think I've narrowed down at least two of the tests that leak, if you want to try disabling those individually, rather than the whole directory: test_imagebitmap_cropping.html and test_imagebitmap.html

Andrew McCreight [:mccr8]

Comment 17

•

9 years ago

Further work here will need somebody to actually fix the leak. I'll file a blocking bug about the two image bitmap leaks I found.

Assignee: continuation → nobody

Andrew McCreight [:mccr8]

Updated

•

9 years ago

Depends on: 1255891

Jonathan Griffin (:jgriffin)

Reporter

Comment 18

•

9 years ago

(In reply to Andrew McCreight [:mccr8] from comment #16) > WARNING | leakcheck | tab process: leaked too many Mutex (expected 15, got > 31) > WARNING | leakcheck | tab process: leaked too many PTextureChild (expected > 9, got 23) > > Yeah, that's a lot. I think I've narrowed down at least two of the tests > that leak, if you want to try disabling those individually, rather than the > whole directory: test_imagebitmap_cropping.html and test_imagebitmap.html I've disabled just those two tests and the leak persists: https://treeherder.mozilla.org/#/jobs?repo=try&revision=817f0453ed46&selectedJob=17970673

Andrew McCreight [:mccr8]

Comment 19

•

9 years ago

You could also try disabling test_capture.html, which leaks intermittently. I've got a script now for reporting which tests leak textures with my logging added, so I'll try rerunning that a bunch of times over the weekend and see if there are any other tests that leak.

Andrew McCreight [:mccr8]

Comment 20

•

9 years ago

Attached patch Log when textures are created and destroyed. DO NOT LAND — Details — Splinter Review

I did 8 or so retriggers of the canvas tests, and the three tests I mentioned before (test_imagebitmap_cropping.html, test_imagebitmap.html and test_capture.html) seem to be the only ones that leak. I'll try triggering M2 XP to reproduce the leak Ryan mentioned in comment 7. I don't know how common that is. This is the patch I used to log texture creation. The script I used to print out the tests that these leak during is here: https://github.com/amccreight/mochitest-logs/blob/master/leak_matcher.py

Andrew McCreight [:mccr8]

Comment 21

•

9 years ago

Attached patch Enable e10s and canvas tests everywhere, disable some canvas tests. DO NOT LAND. — Details — Splinter Review

This is the various other things you have to bunch in there to get it to run e10s Windows debug tests, based on patches from Jonathan and Ryan.

Andrew McCreight [:mccr8]

Comment 22

•

9 years ago

(In reply to Ryan VanderMeulen [:RyanVM] from comment #7) > I've got a similar-looking leak in dom/html/test on WinXP. > https://treeherder.mozilla.org/logviewer.html#?job_id=17794695&repo=try Are you seeing this leak all of the time or just occasionally? I tried to reproduce and wasn't able to, but maybe I just don't have it set right ( https://treeherder.mozilla.org/#/jobs?repo=try&revision=d89a4b3e3ef0 ).

Flags: needinfo?(ryanvm)

Ryan VanderMeulen [:RyanVM]

Comment 23

•

9 years ago

I don't recall offhand, so I've retriggered more runs on Ash. Otherwise, maybe comment 14 took care of it?

Flags: needinfo?(ryanvm)

Andrew McCreight [:mccr8]

Comment 24

•

9 years ago

(In reply to Ryan VanderMeulen [:RyanVM] from comment #23) > I don't recall offhand, so I've retriggered more runs on Ash. Otherwise, > maybe comment 14 took care of it? Oh, that's a good thought. But looking at the logs I see only 3 textures leaking, so I don't know. But from analyzing the logs, I see that these are leaking during dom/html/test/test_video_wakelock.html so maybe it just sometimes leaks more textures. I'll file a separate bug for that.

Andrew McCreight [:mccr8]

Updated

•

9 years ago

Depends on: 1256946

Ryan VanderMeulen [:RyanVM]

Updated

•

9 years ago

status-firefox48: fixed → ---

Whiteboard: [gfx-noted] → [gfx-noted][e10s-orangeblockers]

Target Milestone: mozilla48 → ---

Andrew McCreight [:mccr8]

Updated

•

9 years ago

Depends on: 1258113

Andrew McCreight [:mccr8]

Comment 25

•

9 years ago

I'm just going to turn this into a meta bug for the various Windows-only e10s texture leaks. I've filed bugs for the various tests that leak.

Keywords: meta

Summary: [e10s] canvas mochitests leak with leakcheck | tab process: 4328 bytes leaked (CondVar, Mutex, PTextureChild, SharedMemory, TextureChild, ...) → [e10s] Windows-only texture leaks

Andrew McCreight [:mccr8]

Updated

•

9 years ago

Component: Canvas: 2D → Graphics: Layers

Jim Mathies [:jimm]

Comment 26

•

9 years ago

this affects debug test runs (currently on Ash, but eventually on mc). Milan, can you fix this in someplace in the near future?

Flags: needinfo?(milan)

Milan Sreckovic [:milan] (needinfo for best results)

Updated

•

9 years ago

Assignee: nobody → bas

Flags: needinfo?(milan)

Jim Mathies [:jimm]

Updated

•

9 years ago

Priority: -- → P1

Chris Peterson [:cpeterson]

Comment 27

•

9 years ago

Adding to e10s-rc list. Andrew says this bug should at least be looked at enough until we can tell if these texture leaks should block e10s release.

Blocks: e10s-rc

status-firefox46: --- → affected

status-firefox47: --- → affected

status-firefox48: --- → affected

Eric Rahm [:erahm]

Comment 28

•

9 years ago

We're going to P1 the meta bug. Blocking bugs are probably of equal priority.

Whiteboard: [gfx-noted][e10s-orangeblockers] → [gfx-noted][e10s-orangeblockers][MemShrink:P1]

Bas Schouten (:bas.schouten)

Assignee

Comment 29

•

9 years ago

It should be noted all the tests that appear to be affected involve drawing video to a Canvas. I don't know this code very well but I will have a look, ni? Nical and Matt Woodrow as they might have an idea how how drawing video into a canvas interacts with hardware video decoding and the imagebridge, so I'll have some idea of where to start looking.

Flags: needinfo?(nical.bugzilla)

Flags: needinfo?(matt.woodrow)

Nicolas Silva [:nical]

Comment 30

•

9 years ago

When the ImageContainer receives an Image, it immediately tries to send the Image to the ImageBridge, which creates a TextureClient if the latter wasn't already created and pass it to an ImageClient on the ImageBridge. When the canvas access the ImageContainer, I assume that it will copy/read-back the content of ImageContainer on the main thread. The code that reads from the ImageContainer is in nsLayoutUtils::SurfaceFromElement: among other things it calls Image::GetAsSourceSurface. So in principle the interactions between ImageContainer+ImageBridge and ImageContainer+Canvas *should* be orthogonal. It could be that the canvas holds the ImageContainer and/or Image longer than usual and we get into a situation that we hadn't thought of as a result, but I don't see what kind of situation that would be. If that was the case I am still confident that TextureClient's destruction is robust enough to be triggered from any thread at any time without problems *as long as it happens before gfx's shutdown*, so I would guess we leak something that holds on to the texture rather than just the texture, maybe. That's all very hand-wavy, just dumping anything that comes to mind as I type.

Flags: needinfo?(nical.bugzilla)

Matt Woodrow (:mattwoodrow)

Comment 32

•

9 years ago

(In reply to Nicolas Silva [:nical] from comment #30) > It could be that the canvas > holds the ImageContainer and/or Image longer than usual and we get into a > situation that we hadn't thought of as a result, but I don't see what kind > of situation that would be. This was my guess too, but I looked over the code and we don't ever hold onto Layers things, only SourceSurfaces. Do we assert when we try release things after gfx has shut down? Would be good to confirm that this is the problem, and ideally get a stack trace to who was holding the last ref.

Flags: needinfo?(matt.woodrow)

Brad Lassey [:blassey] (use needinfo?)

Updated

•

9 years ago

tracking-e10s: + → m9+

Edwin Flores [inactive from 2016-12-01] [:eflores] [:edwin]

Comment 33

•

9 years ago

I might be looking at a similar issue in bug 1244883. What I've observed is that the main process is trying to shut down the content process, and blocks the I/O thread in EnsureProcessTerminated() until the content process has died. All (AFAICT?) IPDL messages go via the I/O thread so no IPC can happen, including __delete__ calls. In the log linked in the original description of this bug we even see a small flood of messages attempting to send once the content process dies, but failing (note the PTexture::Msg___delete__ call failing): 23:51:19 INFO - => Process ID: 1816, Thread ID: 2992 23:51:19 INFO - ###!!! [Parent][OnMaybeDequeueOne] Error: Channel closing: too late to send/recv, messages will be lost 23:51:19 INFO - ###!!! [Parent][OnMaybeDequeueOne] Error: Channel closing: too late to send/recv, messages will be lost 23:51:19 INFO - ###!!! [Parent][OnMaybeDequeueOne] Error: Channel closing: too late to send/recv, messages will be lost 23:51:19 INFO - [Parent 3624] WARNING: pipe error: 109: file c:/builds/moz2_slave/try-w32-d-00000000000000000000/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 333 23:51:19 INFO - ###!!! [Parent][MessageChannel] Error: (msgtype=0xF60001,name=PTexture::Msg___delete__) Channel error: cannot send/recv 23:51:19 INFO - ###!!! [Parent][OnMaybeDequeueOne] Error: Channel error: cannot send/recv 23:51:19 INFO - ###!!! [Parent][OnMaybeDequeueOne] Error: Channel error: cannot send/recv 23:51:19 INFO - ###!!! [Parent][OnMaybeDequeueOne] Error: Channel error: cannot send/recv 23:51:19 INFO - ###!!! [Parent][OnMaybeDequeueOne] Error: Channel error: cannot send/recv 23:51:19 INFO - ###!!! [Parent][OnMaybeDequeueOne] Error: Channel error: cannot send/recv 23:51:19 INFO - ###!!! [Parent][OnMaybeDequeueOne] Error: Channel error: cannot send/recv 23:51:19 INFO - ###!!! [Parent][OnMaybeDequeueOne] Error: Channel error: cannot send/recv 23:51:19 INFO - ###!!! [Parent][OnMaybeDequeueOne] Error: Channel error: cannot send/recv 23:51:19 INFO - ###!!! [Parent][OnMaybeDequeueOne] Error: Channel error: cannot send/recv

Edwin Flores [inactive from 2016-12-01] [:eflores] [:edwin]

Comment 34

•

9 years ago

The offending line in EnsureProcessTerminated: https://dxr.mozilla.org/mozilla-central/rev/68c0b7d6f16ce5bb023e08050102b5f2fe4aacd8/ipc/chromium/src/chrome/common/process_watcher_win.cc#77 I've a try run here with full symbols that crashes when it hits this and makes a minidump of the parent process: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b1f4af293d1d53e8bb8d1d36a702a1c6da6ba3a8

Benjamin Smedberg

Comment 35

•

9 years ago

Edwin, I don't understand what you're saying. In the case of a debug/leak-checking build, all IPDL protocols (including PBackground) should be shut down before we call EnsureProcessTerminated. If there is still any protocol system open at that point, that is a bug in the design of the protocols and should be fixed.

Edwin Flores [inactive from 2016-12-01] [:eflores] [:edwin]

Comment 36

•

9 years ago

(In reply to Benjamin Smedberg [:bsmedberg] from comment #35) > Edwin, I don't understand what you're saying. In the case of a > debug/leak-checking build, all IPDL protocols (including PBackground) should > be shut down before we call EnsureProcessTerminated. If there is still any > protocol system open at that point, that is a bug in the design of the > protocols and should be fixed. Well there are plenty of things shutting down in ShutdownXPCOM that still need IPDL to work (in particular, media shuts down there, and holds on to graphics resources that are shared between processes and need IPDL to die). AFAICT, ShutdownXPCOM is kicked off on the child at around the same time that EnsureProcessTerminated is called on the parent -- both coming from Content*::ActorDestroy. (I would rather like to be wrong about this -- see https://bugzilla.mozilla.org/show_bug.cgi?id=1244883#c21)

Benjamin Smedberg

Comment 37

•

9 years ago

There should not be any things in XPCOM shutdown that require IPC access. All IPC/IPDL protocols must be completely shut down before you start XPCOM shutdown. The protocol ActorDestroy methods should be responsible for cleaning up any open shared resources. If we are starting XPCOM shutdown from within the ActorDestroy method, perhaps we need to move that later?

Edwin Flores [inactive from 2016-12-01] [:eflores] [:edwin]

Comment 38

•

9 years ago

Is there a specific place where IPC-dependent shutdown should be done? ContentChild::ActorDestroy? I see we fire a |content-child-shutdown| notification there, but simply observing for that doesn't afford the ordering guarantees that we rely on (things use graphics -- lots of things -- that need to be dead before we shut down).

Bas Schouten (:bas.schouten)

Assignee

Comment 40

•

9 years ago

(In reply to Edwin Flores [:eflores] [:edwin] from comment #38) > Is there a specific place where IPC-dependent shutdown should be done? > ContentChild::ActorDestroy? I see we fire a |content-child-shutdown| > notification there, but simply observing for that doesn't afford the > ordering guarantees that we rely on (things use graphics -- lots of things > -- that need to be dead before we shut down). Like Javascript for example, since DOM/Javascript has a unidirectional dependency on graphics those really have to shutdown before graphics does.

Jim Mathies [:jimm]

Comment 41

•

9 years ago

(In reply to Edwin Flores [:eflores] [:edwin] from comment #38) > Is there a specific place where IPC-dependent shutdown should be done? > ContentChild::ActorDestroy? I see we fire a |content-child-shutdown| > notification there, but simply observing for that doesn't afford the > ordering guarantees that we rely on (things use graphics -- lots of things > -- that need to be dead before we shut down). ContentChild::ActorDestroy appears to be too late, that calls XRE_ShutdownChildProcess() which breaks the main thread out of it's run loop. What protocol(s) are you having shutdown issues with? http://mxr.mozilla.org/mozilla-central/source/toolkit/xre/nsEmbedFunctions.cpp#812

Flags: needinfo?(edwin)

Bas Schouten (:bas.schouten)

Assignee

Comment 42

•

9 years ago

Let's not confuse the bugs too much. This bug is actually just because we're not shutting down the ImageBridge and CompositorBridge cleanly on windows. I have patches to address this issue (although in a mildly hacky way). Let's have the shutdown discussions elsewhere.

Depends on: 1262898

Edwin Flores [inactive from 2016-12-01] [:eflores] [:edwin]

Updated

•

9 years ago

Flags: needinfo?(edwin)

Bas Schouten (:bas.schouten)

Assignee

Updated

•

9 years ago

Blocks: e10s-gfx

Jim Mathies [:jimm]

Comment 45

•

9 years ago

Should be fixed by bug 1262898. Bas, can we close this out or is there something we should confirm first?

Flags: needinfo?(bas)

Bas Schouten (:bas.schouten)

Assignee

Comment 46

•

9 years ago

Looking good!

Status: REOPENED → RESOLVED

Closed: 9 years ago → 9 years ago

Flags: needinfo?(bas)

Resolution: --- → FIXED

Ryan VanderMeulen [:RyanVM]

Updated

•

9 years ago

status-firefox46: affected → wontfix

status-firefox48: affected → fixed

Target Milestone: --- → mozilla48

Pulsebot

Comment 47

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/93a7d65441a3

Carsten Book [:Tomcat]

Comment 48

•

9 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/93a7d65441a3

Ryan VanderMeulen [:RyanVM]

Comment 49

•

9 years ago

https://hg.mozilla.org/releases/mozilla-aurora/rev/eb158df87111a35aac428c0239ec7567bb92074b

You need to log in before you can comment on or make changes to this bug.