Investigate why we're running out of VM with probably-leaked memory mappings relating to gfx

NEW
Unassigned

Status

()

defect
6 years ago
2 months ago

People

(Reporter: benjamin, Unassigned)

Tracking

(Depends on 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 1 obsolete attachment)

See https://groups.google.com/forum/?fromgroups=#!topic/mozilla.dev.platform/4KuiEQqMUkc for more backstory.

We have compelling evidence that some users are running out of virtual memory (but not actual memory) on Windows.

http://people.mozilla.com/~bsmedberg/graphical-minidump-memoryinfo/ has a chart of the memory allocations from one user, who reported the issue in bug 857030. There are several issues to investigate here, but the most urgent problem is that there are memory mappings near the top of the address space which are consuming most of the virtual memory space. Apparently this problem goes away if that particular user switches from their nvidia graphics card to the integrated graphics card.

I need somebody to help get stack traces from the allocation point of these buffers and figure out why they aren't getting unmapped. The function that is allocating these is probably MapViewOfFile(Ex). We currently hook MapViewOfFile in AvailableMemoryTracker.cpp, but neither jlebar nor I was sure whether we were capable of collecting stack traces in-process from this hook.

The newsgroup thread suggested that perhaps leakdiag could be a useful tool to debug this in the field, although there is precious little information about whether it tracks MapViewOfFile instead of just VirtualAlloc:
* http://blogs.msdn.com/b/slavao/archive/2005/01/27/361678.aspx
* http://blogs.jetbrains.com/yole/archives/000034.html
* http://www.codeproject.com/Articles/108529/LeakDiag-An-Effective-Memory-Leak-Analysis-Tool

It also seems possible that the GetMappedFileName function could also be a useful diagnostic; although it's likely that if this is anonymous shared memory that the function will not show us anything useful.

http://msdn.microsoft.com/en-us/library/windows/desktop/ms683195%28v=vs.85%29.aspx

Some possibilities:
* we have a bug in our compositor use of shared memory
* we have a bug in our plugin-process use of shared memory
* the windows code (d3d or d2d) code has a bug in the use of shared memory
* the nvidia driver has a bug in the use of shared memory
Blocks: 837835, 857030
> but neither jlebar nor I was sure whether we were capable of collecting stack traces in-process 
> from this hook.

The only issue (I think) would occur if getting a stack requires allocating.

But if we're going to do a custom build for this anyway, I'd guess we can build with -fno-omit-frame-pointer, in which case it should be simple to write a stack-walker that doesn't allocate.
Maybe this is something jld can help with.
The pseudostack can be read at any point without allocating.
NS_StackWalk can already be configured for frame pointer walking on x86, I think — someone else might know the ifdefs better than I do.  Obviously that breaks if the stack goes through a closed-source driver, but in that case we'll at least have one PC from it to assign blame.
Jed sure, but does that require allocation? Or can we reuse the existing lock-free profiler infrastructure to do this?

BenWa, I think we need real stacks, not the pseudostacks for these, no?

Another possibility is to do this processing from an external debugger process.
Well the windows version uses DuplicateHandle so that could cause some trouble for us. That's why I suggested the pseudostack. It does work fine in practice with the profiler but I don't know how it will behave when we're out of address space.

http://mxr.mozilla.org/mozilla-central/source/xpcom/base/nsStackWalk.cpp#464
Whiteboard: [MemShrink]
FramePointerStackWalk doesn't allocate, as far as I can see.
Duplicate of this bug: 860882
The most obvious way to avoid allocation, then, is to set aside a large buffer in which to store logging results. I just want the stacks for each call to MapViewOfFile and UnmapViewOfFile with the parameters and the result/address.
Assigning to Benoit for the specified hacking...
Assignee: bugs → bgirard
Alright just to clarify what I plan on doing tomorrow:
Post a try build of:
* Use DllInterceptor to hot-patch MapViewOfFile and UnmapViewOfFile
* Perhaps every 100 calls to each function print a backtrace since we believe it's called very often.
* Experiment with NS_Stackwalk, if not simply use Pseudostacks.

Do we still have a contact with someone who can reproduce this issue who can run the try build above?
Flags: needinfo?(benjamin)
(In reply to Benoit Girard (:BenWa) from comment #11)
> Do we still have a contact with someone who can reproduce this issue who can
> run the try build above?

I mean the build described above which will be posted tomorrow.
Yes, I have several contacts. Do we really believe that MapViewOfFile and UnmapViewOfFile are called so often that we can't log every call? That would surprise me.

But I'm worried about "simply use pseudostacks". I don't understand how the pseudostack would actually help us pinpoint the problem in most cases.
Flags: needinfo?(benjamin)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #13)
> But I'm worried about "simply use pseudostacks". I don't understand how the
> pseudostack would actually help us pinpoint the problem in most cases.

Alright I'll look into just NS_Stackwalk then and report back.
(In reply to comment #14)
> (In reply to Benjamin Smedberg  [:bsmedberg] from comment #13)
> > But I'm worried about "simply use pseudostacks". I don't understand how the
> > pseudostack would actually help us pinpoint the problem in most cases.
> 
> Alright I'll look into just NS_Stackwalk then and report back.

Note that you probably want to turn on --enable-profiling in the mozconfig for the try build if you want to use NS_StackWalk.
Here's a build that can be used to test this:
https://dl.dropboxusercontent.com/u/10523664/firefox-23.0a1.en-US.win32.zip

1) Create the folder 'C:\mozilla'
2) Run the build and reproduce the problem
3) Attach C:\mozilla\alloclog2.txt

The log will include the library mapping follow by a log of each call to MapViewOfFile such as:

MapViewOfFileHook Begin (4, 0, 0, 64):
56541915
76852C95
5BDE57D6
Whiteboard: [MemShrink] → [MemShrink:P1]
I am reliably hitting this bug.  I'm now running with the tryserver build, but I notice that my alloclog2.txt is exactly 8680 bytes for about 20 minutes now, and ends with "MapViewOfFileHook Begin (2, 0, " and no newline.  There are one or two entries before that.  Also, it doesn't seem to log the returned address of the mapping -- that would be helpful to correlate with vmmap info.
I'll add an explicit flush and the return address.
Also note that we want to hook MapViewOfFileEx as well.
> I'll add an explicit flush and the return address.

I always use |fprintf(stderr, ...)| for logs like this, because stderr is unbuffered.
Nope, nothing here.  With the flushes, it only logs about 13k bytes worth of logs, and then there just aren't any more calls.  This is consistent with what I was seeing when I would quit firefox with the previous logging patch without flushing.  I don't think these are coming through those API calls.
Bug 859377 comment 44 suggests this may be caused by "rampant use of hardware surfaces on Windows" and might be fixed by the latest patch in that bug.
Depends on: 859377
I doubt it.  This was happening for me in a very specific configuration: laptop with Optimus, Firefox running on the NVIDIA GPU.  That configuration has the most complex d3d setup (since Firefox is rendering using the NVIDIA GPU's D3D implementation, which is then routed to the Intel GPU for display).
Is it not resolved for you, then? Bug 866526 comment 12 and other comments elsewhere suggest to me that bug 859377 was indeed the problem.
Note that this issue predates both ClippedImage and the subsequent regression and improvements from bug 859377. It may be a different issue altogether.
Hmm, good point. All we can say is that bug 859377 was _a_ problem.
I've discovered that there appears to be a bug in CreateOffscreenSurface that could be the culprit. See bug 869252.
Off chance: I wonder if this was related to the problem we were having on M1 and M3 with OOM errors intermittently (Bug 870002 was M3), causing no stack traceback.  These disappeared when inbound/m-c upgraded to the newer (more memory) ix slaves
Has anything changed here recently?
Not from my end -- I was never able to capture a vmmap log of when the problem was happening, before vmmap ran out of memory itself.
(In reply to Benjamin Smedberg  [:bsmedberg] workweek high latency 19-Aug through 23-Aug from comment #0)
> Some possibilities:
> * we have a bug in our compositor use of shared memory
OMTC/e10s is not enabled on windows so this shouldn't be a factor
> * we have a bug in our plugin-process use of shared memory
This is a good theory. Have we been able to reproduce this bug without any plugins loaded...
> * the windows code (d3d or d2d) code has a bug in the use of shared memory
... AND d3d/d2d disabled?
> * the nvidia driver has a bug in the use of shared memory
which would leave us with this. Perhaps checking if having an GPU intensive background application like Starcraft 2 title screen aggregate the problem might be interesting.

Checking this should narrow down the problem.
Unassigning from BenWa, as he's not actively looking for a fix, thought he can stay involved in the conversations.
Assignee: bgirard → nobody
(In reply to Benoit Girard (:BenWa) from comment #34)
> Have we been able to reproduce this bug without any plugins loaded?
> Have we been able to reproduce this bug with d3d/d2d disabled?
> Perhaps checking if having an GPU intensive background application like Starcraft 2 title screen aggregate the problem might be interesting.
> Checking these should narrow down the problem.

sunset.in.trance: can you help answer the questions above on your affected system? Thanks!
Flags: needinfo?(sunset.in.trance)
Not resolved ia had a PoC to reproduce the crash all times in five or 10 seconds firefox crash, it it in mozalloc.dll.
eax=00000000 ebx=00000000 ecx=62943896 edx=00000003 esi=628f1ec6 edi=6294379c
eip=7335119c esp=0043dac8 ebp=0043db1c iopl=0         nv up ei pl nz na po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00200202
mozalloc!mozalloc_abort+0x2a:
7335119c cc              int     3

it's posible to exploit it with a heap spray to found a free or a RW memory address to return.
I can reproduce it in firefox  and mozalloc.dll 24.0.0.5001
This is possibly happening on TBPL in mochitest-bc. See bug 937997.
So khuey and I were playing with this simple testcase he wrote:

http://people.mozilla.com/~khuey/tests/bigimage.html

If you watch the process with vmmap:

http://technet.microsoft.com/en-us/sysinternals/dd535533.aspx

you can see 32MB chunks slowing filling up the address space as Private Data.

The funny thing is that these 32MB chunks are being allocated via VirtualAlloc and VirtualFree, through jemalloc, and the appropriate allocation calls are balanced.

So the question is why VirtualAlloc thinks those chunks aren't available for future allocations.
Blocks: 937997
(In reply to Nathan Froyd (:froydnj) from comment #39)
> So khuey and I were playing with this simple testcase he wrote:
> 
> http://people.mozilla.com/~khuey/tests/bigimage.html
> 
> If you watch the process with vmmap:
> 
> http://technet.microsoft.com/en-us/sysinternals/dd535533.aspx
> 
> you can see 32MB chunks slowing filling up the address space as Private Data.
>

I think you're looking at the wrong chunks.. I'm seeing this test steadily increase our GPU commited size. I'm investigating this right now.
> The funny thing is that these 32MB chunks are being allocated via
> VirtualAlloc and VirtualFree, through jemalloc, and the appropriate
> allocation calls are balanced.
> 
> So the question is why VirtualAlloc thinks those chunks aren't available for
> future allocations.
(In reply to Bas Schouten (:bas.schouten) from comment #40)
> (In reply to Nathan Froyd (:froydnj) from comment #39)
> > So khuey and I were playing with this simple testcase he wrote:
> > 
> > http://people.mozilla.com/~khuey/tests/bigimage.html
> > 
> > If you watch the process with vmmap:
> > 
> > http://technet.microsoft.com/en-us/sysinternals/dd535533.aspx
> > 
> > you can see 32MB chunks slowing filling up the address space as Private Data.
> >
> 
> I think you're looking at the wrong chunks.. I'm seeing this test steadily
> increase our GPU commited size. I'm investigating this right now.

It did eventually reclaim the memory for me. But it took a very long time.

I suspect this may be some sort of internal cache by D2D.
Running that test on my laptop OOMs the browser.
So what's the difference in environment/configuration between khuey's machine and Bas's machine?
I'm running a pretty stock X220 with Intel graphics.
So yeah, one key difference is, that with my drivers GPU memory does not get mapped into your address space. On stock -intel- drivers it does for some machines. This is something I found out a couple of weeks ago. In reality this should only mean that as long as we don't have ImageLib -only- holding on to the Moz2D surfaces we'll use twice as much address space on these machines. But twice as much for normal usage should still be acceptable. The next question is, why are we using so much GPU memory when doing these big images that we're actually destroying.

I don't have an answer to that yet. I've confirmed that the RefCount of our ID2D1Bitmap objects is 0 when the SourceSurfaceD2D gets destroyed, I've also confirmed that nothing I do with ID2D1Device (like ClearResources or SetMaximumTextureMemory) makes much of a difference.

I'm currently working on creating a stand-alone test in Moz2D, modeled after what Kyle's code ends up doing inside Moz2D, and we'll see if that reproduces it. If it does reproduce it, I need to look at what we can do different to fix it, I have a theory for something we may be able to use as a workaround. It won't be too pretty, but it shouldn't be too bad either.
> Some possibilities:
> * the nvidia driver has a bug in the use of shared memory


> So khuey and I were playing with this simple testcase he wrote:
> 
> http://people.mozilla.com/~khuey/tests/bigimage.html
> 
> If you watch the process with vmmap:
> 
> http://technet.microsoft.com/en-us/sysinternals/dd535533.aspx
> 
> you can see 32MB chunks slowing filling up the address space as Private Data.


Works all fine for me!

- Win7 64bit
- Mozilla/5.0 (Windows NT 6.1; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0 ID:20131111154639 CSet: ab495a0b16a0
- Nvidia Quadro FX 1500
- Nvidia Driver 307.83
More interesting data here, I've done a couple of test cases:

  uint8_t* data = new uint8_t[512 * 512 * 4];

  for (int i = 0; i < 100000; i++) {
    RefPtr<SourceSurface> surf = mDT->CreateSourceSurfaceFromData(data, IntSize(512, 512), 512 * 4, FORMAT_B8G8R8A8);
    mDT->Flush();
  }
  delete [] data;

In this test, GPU committed will grow to 200 MB as the test runs, then reset to +/- 10 MB, and go again, etc. etc.

Change that to 1024x1024, and it will grow to 800 MB as the test runs, then reset back, etc. etc. Some quick math, would then seem to indicate that it grows to +/- 200 times the image size.

Now, because we all love data, I did the following:

  for (int i = 0; i < 100000; i++) {
    RefPtr<SourceSurface> surf = mDT->CreateSourceSurfaceFromData(data, IntSize(1024, 1024), 1024 * 4, FORMAT_B8G8R8A8);
    RefPtr<SourceSurface> surf2 = mDT->CreateSourceSurfaceFromData(data, IntSize(4, 4), 4 * 4, FORMAT_B8G8R8A8);
    mDT->Flush();
  }

Our perceptive readers will probably be able to predict what happens, yes indeed! We grow to 400 MB.

So what this seems to indicate is that D2D just keeps the last 200 textures it used around, just in case it's going to need them again. Without looking at all at the size! If I make these 4096x2048 it will just grow to 6 GB! Now if all those textures end up in your address space, as you might well imagine, we're in a pickle. I have an idea for a workaround, sadly that workaround will also mean we don't get the benefit of this cache, but if that's what's needed to improve the situation, so be it.
So this seems to be a part of D3D, not D2D, this simpler test case using D3D directly reproduces the problem as well:

  for (int i = 0; i < 100000; i++) {

    RefPtr<ID3D10Texture2D> texture;
    CD3D10_TEXTURE2D_DESC desc(DXGI_FORMAT_B8G8R8A8_UNORM, 1024, 1024, 1, 1);
    Factory::GetDirect3D10Device()->CreateTexture2D(&desc, nullptr, byRef(texture));
    CD3D10_TEXTURE2D_DESC desc2(DXGI_FORMAT_B8G8R8A8_UNORM, 2, 2, 1, 1);
    Factory::GetDirect3D10Device()->CreateTexture2D(&desc2, nullptr, byRef(texture));
    Factory::GetDirect3D10Device()->Flush();
  }

Still looking into a workaround, because of this being a D3D issue my previous idea didn't work. The question now also becomes whether this is driver dependent to a larger extent than I'd hoped.
(In reply to Bas Schouten (:bas.schouten) from comment #48)
> So this seems to be a part of D3D, not D2D, this simpler test case using D3D
> directly reproduces the problem as well:
> 
>   for (int i = 0; i < 100000; i++) {
> 
>     RefPtr<ID3D10Texture2D> texture;
>     CD3D10_TEXTURE2D_DESC desc(DXGI_FORMAT_B8G8R8A8_UNORM, 1024, 1024, 1, 1);
>     Factory::GetDirect3D10Device()->CreateTexture2D(&desc, nullptr,
> byRef(texture));
>     CD3D10_TEXTURE2D_DESC desc2(DXGI_FORMAT_B8G8R8A8_UNORM, 2, 2, 1, 1);
>     Factory::GetDirect3D10Device()->CreateTexture2D(&desc2, nullptr,
> byRef(texture));
>     Factory::GetDirect3D10Device()->Flush();
>   }
> 
> Still looking into a workaround, because of this being a D3D issue my
> previous idea didn't work. The question now also becomes whether this is
> driver dependent to a larger extent than I'd hoped.

It looks like this cache is specific to my device, I don't see the same issue on other devices, but I see subtly different problems. On devices where some of my testcases crash (where textures are mapped into the address space), I may be close to a workaround.
I've pushed several try runs with different attempts at differing degrees of mitigation for these problems, the test case in this bug hits a very special case that shouldn't be common, but it should also be addressed by at least some of these patches:

https://tbpl.mozilla.org/?tree=Try&rev=c797cdc08ea0
https://tbpl.mozilla.org/?tree=Try&rev=b12eb295e5cd
https://tbpl.mozilla.org/?tree=Try&rev=e66af2b5c47f
https://tbpl.mozilla.org/?tree=Try&rev=f083753edc0b
(In reply to Bas Schouten (:bas.schouten) from comment #45)
> this should only mean that as long as we don't have ImageLib -only- holding
> on to the Moz2D surfaces we'll use twice as much address space on these
> machines.

This sounds like bug 700545 needs a larger scope.
Fwiw, the test case that was added to this bug works, because it's drawing to a canvas that's never shown or used, which means all those images will get stored in the drawing pipeline which is never flushed. https://tbpl.mozilla.org/?tree=Try&rev=e66af2b5c47f contains a patch that takes care of making sure all canvases are regularly flushed when large amount of uploads are put in the pipeline.

This would be a viable short term workaround and I'm not against landing this, but I don't have -too- high hopes it'll fix bc, let's see though.
So https://tbpl.mozilla.org/?tree=Try&rev=e66af2b5c47f still has an OOM failure in M2. Even though it does fix the test listed in this bug.
Depends on: 941837
(In reply to Jet Villegas (:jet) from comment #36)
> (In reply to Benoit Girard (:BenWa) from comment #34)
>...
> sunset.in.trance: can you help answer the questions above on your affected
> system? Thanks!

:jet, I pm sunset some months ago and no response, do I don't think we're going to get the feedback
Flags: needinfo?(sunset.in.trance)
Depends on: defrag
Did anything positive come out of the tests of comment 50?
Summary: Investigate why we're running out of VM with probably-leaked memory mappings → Investigate why we're running out of VM with probably-leaked memory mappings relating to gfx
Whiteboard: [MemShrink:P1]
(In reply to Bas Schouten (:bas.schouten) from comment #47)
> In this test, GPU committed will grow to 200 MB as the test runs, then reset
> to +/- 10 MB, and go again, etc. etc.
> 
> Change that to 1024x1024, and it will grow to 800 MB as the test runs, then
> reset back, etc. etc. Some quick math, would then seem to indicate that it
> grows to +/- 200 times the image size.
> 
> Our perceptive readers will probably be able to predict what happens, yes
> indeed! We grow to 400 MB.
> 
> So what this seems to indicate is that D2D just keeps the last 200 textures
> it used around, just in case it's going to need them again. Without looking
> at all at the size! If I make these 4096x2048 it will just grow to 6 GB! Now
> if all those textures end up in your address space, as you might well
> imagine, we're in a pickle. I have an idea for a workaround, sadly that
> workaround will also mean we don't get the benefit of this cache, but if
> that's what's needed to improve the situation, so be it.

Can this be a case for gpu-commited leaks like bug 1432086 or bug 1436848 ?
(In reply to Benjamin Smedberg from comment #0)
> * the nvidia driver has a bug in the use of shared memory

Idk if relevent for this thread, but as I remember a lot of nvidia drivers (long before windows 10 release, and few around AU) had slow non-paged memory leak (something like 1 gigabyte for every 150 hours of uptime).

After updating nvidia driver to 390.77 I have also got first BSOD since AU, caused by nvlddmkm. (PAGE_FAULT_IN_NONPAGED_AREA, probably related to bug 1440026)
You need to log in before you can comment on or make changes to this bug.