Closed
Bug 859955
Opened 10 years ago
Closed 5 months ago
Investigate why we're running out of VM with probably-leaked memory mappings relating to gfx
Categories
(Core :: General, defect)
Tracking
RESOLVED
WORKSFORME
People
(Reporter: benjamin, Unassigned)
References
Details
Attachments
(2 files, 1 obsolete file)
63.05 KB, patch
1.11 MB, application/octet-stream
See https://groups.google.com/forum/?fromgroups=#!topic/mozilla.dev.platform/4KuiEQqMUkc for more backstory.

We have compelling evidence that some users are running out of virtual memory (but not actual memory) on Windows. http://people.mozilla.com/~bsmedberg/graphical-minidump-memoryinfo/ has a chart of the memory allocations from one user, who reported the issue in bug 857030.

There are several issues to investigate here, but the most urgent problem is that there are memory mappings near the top of the address space which are consuming most of the virtual memory space. Apparently this problem goes away if that particular user switches from their nvidia graphics card to the integrated graphics card.

I need somebody to help get stack traces from the allocation point of these buffers and figure out why they aren't getting unmapped. The function that is allocating these is probably MapViewOfFile(Ex). We currently hook MapViewOfFile in AvailableMemoryTracker.cpp, but neither jlebar nor I was sure whether we were capable of collecting stack traces in-process from this hook.

The newsgroup thread suggested that leakdiag could be a useful tool to debug this in the field, although there is precious little information about whether it tracks MapViewOfFile or just VirtualAlloc:
* http://blogs.msdn.com/b/slavao/archive/2005/01/27/361678.aspx
* http://blogs.jetbrains.com/yole/archives/000034.html
* http://www.codeproject.com/Articles/108529/LeakDiag-An-Effective-Memory-Leak-Analysis-Tool

The GetMappedFileName function might also be a useful diagnostic, although if this is anonymous shared memory it will likely not show us anything useful: http://msdn.microsoft.com/en-us/library/windows/desktop/ms683195%28v=vs.85%29.aspx

Some possibilities:
* we have a bug in our compositor use of shared memory
* we have a bug in our plugin-process use of shared memory
* the Windows code (d3d or d2d) has a bug in the use of shared memory
* the nvidia driver has a bug in the use of shared memory
Reporter
Updated•10 years ago
Comment 1•10 years ago
> but neither jlebar nor I was sure whether we were capable of collecting stack traces in-process
> from this hook.
The only issue (I think) would occur if getting a stack requires allocating.
But if we're going to do a custom build for this anyway, I'd guess we can build with -fno-omit-frame-pointer, in which case it should be simple to write a stack-walker that doesn't allocate.
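A frame-pointer walk of the kind described above needs no heap at all: with frame pointers kept, each x86 frame starts with the caller's saved frame pointer followed by the return address, so the walker only follows that chain and copies return addresses into a caller-supplied array. A minimal sketch; the `Frame` struct and `WalkFrames` function are hypothetical, and the sketch walks a chain built in ordinary memory rather than the live stack, which a real walker (such as the frame-pointer path of NS_StackWalk) would read directly while also checking the thread's stack bounds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical view of an x86 frame built with frame pointers:
// [ebp] holds the caller's saved ebp and [ebp + 4] the return address.
struct Frame {
  const Frame* callerFrame;  // saved frame pointer of the caller
  uintptr_t returnAddress;   // PC in the caller
};

// Copies up to maxFrames return addresses into pcs, innermost first.
// Only the caller's array is written, so the allocator is never called.
size_t WalkFrames(const Frame* fp, uintptr_t* pcs, size_t maxFrames) {
  assert(pcs != nullptr || maxFrames == 0);
  size_t count = 0;
  while (fp != nullptr && count < maxFrames) {
    pcs[count++] = fp->returnAddress;
    const Frame* next = fp->callerFrame;
    // The stack grows down, so a caller's frame must sit at a higher address.
    // A chain that goes backwards is no longer trustworthy (e.g. a frame
    // compiled without frame pointers, such as one inside a closed-source
    // driver), so stop there.
    if (next != nullptr &&
        reinterpret_cast<uintptr_t>(next) <= reinterpret_cast<uintptr_t>(fp)) {
      break;
    }
    fp = next;
  }
  return count;
}
```

Even when the walk stops early, the PCs collected up to that point still identify the module that broke the chain.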
Comment 2•10 years ago
Maybe this is something jld can help with.
Comment 3•10 years ago
The pseudostack can be read at any point without allocating.
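A pseudostack is a stack of labels that instrumented code pushes and pops itself, so reading it means copying from a fixed array, with no allocation and no unwinding. A minimal sketch with hypothetical `PseudoStack` and `AutoLabel` classes; these are not the profiler's actual types, and a real pseudostack is per-thread and has to be safely readable from another thread.

```cpp
#include <cassert>
#include <cstddef>

// Fixed-capacity stack of static string labels; never allocates.
class PseudoStack {
 public:
  static constexpr size_t kCapacity = 64;

  void Push(const char* label) {
    // Overflowing pushes are still counted so that Pop stays balanced,
    // but only labels that fit are stored.
    if (mDepth < kCapacity) mLabels[mDepth] = label;
    ++mDepth;
  }
  void Pop() {
    assert(mDepth > 0);
    --mDepth;
  }
  // Copies the stored labels (outermost first) into out; returns how many.
  size_t Read(const char** out, size_t maxOut) const {
    size_t n = mDepth;
    if (n > kCapacity) n = kCapacity;
    if (n > maxOut) n = maxOut;
    for (size_t i = 0; i < n; ++i) out[i] = mLabels[i];
    return n;
  }

 private:
  const char* mLabels[kCapacity] = {};
  size_t mDepth = 0;
};

// RAII helper: the label stays on the pseudostack for the enclosing scope.
class AutoLabel {
 public:
  AutoLabel(PseudoStack& stack, const char* label) : mStack(stack) {
    mStack.Push(label);
  }
  ~AutoLabel() { mStack.Pop(); }
  AutoLabel(const AutoLabel&) = delete;
  AutoLabel& operator=(const AutoLabel&) = delete;

 private:
  PseudoStack& mStack;
};
```

The limitation is that only annotated Gecko code shows up, so a mapping created deep inside d3d or a driver is attributed to whichever label was innermost at the time.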
Comment 4•10 years ago
NS_StackWalk can already be configured for frame-pointer walking on x86, I think; someone else might know the ifdefs better than I do. Obviously that breaks if the stack goes through a closed-source driver, but in that case we'll at least have one PC from it to assign blame.
Reporter
Comment 5•10 years ago
Jed: sure, but does that require allocation? Or can we reuse the existing lock-free profiler infrastructure to do this? BenWa, I think we need real stacks for these, not pseudostacks, no? Another possibility is to do this processing from an external debugger process.
Comment 6•10 years ago
Well, the Windows version uses DuplicateHandle, so that could cause some trouble for us. That's why I suggested the pseudostack. It does work fine in practice with the profiler, but I don't know how it will behave when we're out of address space. http://mxr.mozilla.org/mozilla-central/source/xpcom/base/nsStackWalk.cpp#464
Updated•10 years ago
Whiteboard: [MemShrink]
Comment 7•10 years ago
FramePointerStackWalk doesn't allocate, as far as I can see.
Reporter
Comment 9•10 years ago
The most obvious way to avoid allocation, then, is to set aside a large buffer in which to store logging results. I just want the stacks for each call to MapViewOfFile and UnmapViewOfFile with the parameters and the result/address.
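The preallocated-buffer approach can be sketched as a fixed array of plain records: each hook call claims a slot with one atomic increment and fills in the operation, its arguments, the returned or unmapped address, and a few return addresses. This is only a sketch; `MapRecord`, `MapLog`, and the sizes `kMaxFrames` and `kMaxRecords` are hypothetical, and a real hook would fill `frames` from a non-allocating stack walker and dump the buffer afterwards.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class MapOp : uint8_t { Map, Unmap };

// One logged call. Plain data, so the whole log can live in static storage.
struct MapRecord {
  static constexpr size_t kMaxFrames = 16;
  MapOp op;
  uint32_t desiredAccess;  // MapViewOfFile dwDesiredAccess (unused for unmaps)
  uint64_t offset;         // dwFileOffsetHigh:dwFileOffsetLow combined
  size_t bytesToMap;       // dwNumberOfBytesToMap (0 maps to the section end)
  uintptr_t address;       // view returned by the map, or the view unmapped
  uintptr_t frames[kMaxFrames];
  size_t frameCount;
};

// All storage is reserved up front, so logging never calls the allocator.
class MapLog {
 public:
  static constexpr size_t kMaxRecords = size_t(1) << 14;

  // Returns a slot for the caller to fill in, or nullptr once the log is full.
  MapRecord* Claim() {
    size_t i = mNext.fetch_add(1, std::memory_order_relaxed);
    return i < kMaxRecords ? &mRecords[i] : nullptr;
  }
  // Number of claimed slots; may include one another thread is still writing.
  size_t Count() const {
    size_t n = mNext.load(std::memory_order_relaxed);
    if (n > kMaxRecords) n = kMaxRecords;
    return n;
  }
  const MapRecord& At(size_t i) const {
    assert(i < Count());
    return mRecords[i];
  }

 private:
  std::atomic<size_t> mNext{0};
  MapRecord mRecords[kMaxRecords];
};
```

The log object is a few megabytes, so it should be a static or global rather than a stack variable.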
Comment 11•10 years ago
Alright, just to clarify what I plan on doing tomorrow. Post a try build that will:
* Use DllInterceptor to hot-patch MapViewOfFile and UnmapViewOfFile.
* Perhaps print a backtrace every 100 calls to each function, since we believe they're called very often.
* Experiment with NS_StackWalk; if that doesn't work, simply use pseudostacks.
Do we still have a contact who can reproduce this issue and run the try build above?
Flags: needinfo?(benjamin)
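The "every 100 calls" idea boils down to a per-function call counter. A minimal sketch with a hypothetical `CallSampler` helper; note that if map and unmap calls are sampled independently, the sampled maps can no longer be paired with their unmaps by address, which is one argument for logging every call if the volume allows.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Decides whether the current call should get an (expensive) backtrace.
// Every call is counted; only every period-th call returns true.
class CallSampler {
 public:
  explicit CallSampler(uint32_t period) : mPeriod(period) {
    assert(period > 0);
  }
  bool ShouldSample() {
    uint64_t n = mCalls.fetch_add(1, std::memory_order_relaxed) + 1;
    return n % mPeriod == 0;
  }
  uint64_t Calls() const { return mCalls.load(std::memory_order_relaxed); }

 private:
  const uint32_t mPeriod;
  std::atomic<uint64_t> mCalls{0};
};
```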
Comment 12•10 years ago
(In reply to Benoit Girard (:BenWa) from comment #11)
> Do we still have a contact with someone who can reproduce this issue who can
> run the try build above?
I mean the build described above, which will be posted tomorrow.
Reporter
Comment 13•10 years ago
Yes, I have several contacts. Do we really believe that MapViewOfFile and UnmapViewOfFile are called so often that we can't log every call? That would surprise me. But I'm worried about "simply use pseudostacks". I don't understand how the pseudostack would actually help us pinpoint the problem in most cases.
Flags: needinfo?(benjamin)
Comment 14•10 years ago
(In reply to Benjamin Smedberg [:bsmedberg] from comment #13)
> But I'm worried about "simply use pseudostacks". I don't understand how the
> pseudostack would actually help us pinpoint the problem in most cases.
Alright, I'll look into just NS_StackWalk then and report back.
Comment 15•10 years ago
(In reply to comment #14)
> (In reply to Benjamin Smedberg [:bsmedberg] from comment #13)
> > But I'm worried about "simply use pseudostacks". I don't understand how the
> > pseudostack would actually help us pinpoint the problem in most cases.
>
> Alright I'll look into just NS_Stackwalk then and report back.
Note that you probably want to turn on --enable-profiling in the mozconfig for the try build if you want to use NS_StackWalk.
Comment 16•10 years ago
Comment 17•10 years ago
Here's a build that can be used to test this: https://dl.dropboxusercontent.com/u/10523664/firefox-23.0a1.en-US.win32.zip
1) Create the folder 'C:\mozilla'
2) Run the build and reproduce the problem
3) Attach C:\mozilla\alloclog2.txt
The log will include the library mappings, followed by a log of each call to MapViewOfFile, such as:
MapViewOfFileHook Begin (4, 0, 0, 64): 56541915 76852C95 5BDE57D6
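Before symbolication, each log line has to be split into the hook name, its arguments, and the raw PCs. A minimal parser sketch, assuming every line has exactly the shape of the example above (`HookEntry` and `ParseHookLine` are hypothetical). It also assumes the four numbers are the MapViewOfFile arguments after the mapping handle (dwDesiredAccess, dwFileOffsetHigh, dwFileOffsetLow, dwNumberOfBytesToMap); if so, the 4 in the example would be FILE_MAP_READ.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

// One parsed entry from alloclog2.txt (layout assumed from the sample line).
struct HookEntry {
  std::string hook;              // e.g. "MapViewOfFileHook"
  unsigned long args[4];         // the four parenthesized numbers
  std::vector<uint64_t> frames;  // hex PCs, in logged order
};

// Parses "Name Begin (a, b, c, d): PC PC PC ..."; returns false on mismatch.
bool ParseHookLine(const std::string& line, HookEntry* out) {
  assert(out != nullptr);
  char name[128];
  int consumed = 0;  // set by %n only if the whole prefix, up to ':', matched
  int matched = std::sscanf(line.c_str(), "%127s Begin (%lu, %lu, %lu, %lu):%n",
                            name, &out->args[0], &out->args[1], &out->args[2],
                            &out->args[3], &consumed);
  if (matched != 5 || consumed == 0) return false;
  out->hook = name;
  out->frames.clear();
  const char* p = line.c_str() + consumed;
  for (;;) {
    char* end = nullptr;
    uint64_t pc = std::strtoull(p, &end, 16);
    if (end == p) break;  // no more hex numbers on this line
    out->frames.push_back(pc);
    p = end;
  }
  return true;
}
```

A truncated line, such as the unterminated `MapViewOfFileHook Begin (2, 0, ` entry reported later in this bug, is rejected instead of being half-parsed.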
Comment 18•10 years ago
Crash reporter symbols for symbolication can be found here: https://dl.dropboxusercontent.com/u/10523664/firefox-23.0a1.en-US.win32.crashreporter-symbols-full.zip
Updated•10 years ago
Whiteboard: [MemShrink] → [MemShrink:P1]
I am reliably hitting this bug. I'm now running with the tryserver build, but I notice that my alloclog2.txt has been exactly 8680 bytes for about 20 minutes now, and ends with "MapViewOfFileHook Begin (2, 0, " and no newline. There are one or two entries before that. Also, it doesn't seem to log the returned address of the mapping; that would be helpful to correlate with vmmap info.
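With the returned address in the log, finding suspects becomes a replay: every successful map adds a view, every unmap removes it, and whatever is left at the end is a candidate leak to look up in vmmap. A minimal sketch, assuming a simplified hypothetical `MapEvent` record and that a failed MapViewOfFile call (which returns NULL) is logged with address 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// A logged call: either a new view at address, or an unmap of address.
struct MapEvent {
  bool isMap;
  uintptr_t address;
  size_t bytes;  // requested view size for maps; ignored for unmaps
};

// Replays the log in order and returns the views still mapped at the end,
// keyed by base address.
std::map<uintptr_t, size_t> OutstandingViews(const std::vector<MapEvent>& log) {
  std::map<uintptr_t, size_t> live;
  for (const MapEvent& ev : log) {
    if (ev.isMap) {
      if (ev.address != 0) live[ev.address] = ev.bytes;  // 0 = failed map
    } else {
      live.erase(ev.address);
    }
  }
  return live;
}
```

For views mapped with dwNumberOfBytesToMap set to 0 the logged size is not the real view size; vmmap (or VirtualQuery) would be needed for that.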
Comment 20•10 years ago
I'll add an explicit flush and the return address.
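The truncated log above (ending in `MapViewOfFileHook Begin (2, 0, ` with no newline) is the usual symptom of stdio buffering: whatever is still sitting in the FILE buffer when the process dies never reaches the file. A minimal sketch of a logging helper that flushes after every line; `LogLine` is a hypothetical name, and calling `setvbuf(f, nullptr, _IONBF, 0)` once on the log file would be an alternative.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>

// Appends one formatted line and flushes it out of the stdio buffer right
// away, so a crash or OOM abort loses at most the line being written.
// Returns the number of characters written, or -1 on error.
int LogLine(std::FILE* f, const char* fmt, ...) {
  assert(f != nullptr);
  std::va_list args;
  va_start(args, fmt);
  int n = std::vfprintf(f, fmt, args);
  va_end(args);
  if (n < 0 || std::fputc('\n', f) == EOF) return -1;
  if (std::fflush(f) != 0) return -1;
  return n + 1;
}
```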
Comment 21•10 years ago
Also note that we want to hook MapViewOfFileEx as well.
Comment 22•10 years ago
I produced a build that will fflush, look at MapViewOfFileEx and print the return address: https://dl.dropboxusercontent.com/u/10523664/firefox-23-VM-leak.zip https://dl.dropboxusercontent.com/u/10523664/firefox-23-VM-leak.crashreporter-symbols-full.zip
Attachment #738238 - Attachment is obsolete: true
Comment 23•10 years ago
> I'll add an explicit flush and the return address.
I always use |fprintf(stderr, ...)| for logs like this, because stderr is unbuffered.
Nope, nothing here. With the flushes, it only logs about 13k bytes worth of logs, and then there just aren't any more calls. This is consistent with what I was seeing when I would quit firefox with the previous logging patch without flushing. I don't think these are coming through those API calls.
Comment 25•10 years ago
Bug 859377 comment 44 suggests this may be caused by "rampant use of hardware surfaces on Windows" and might be fixed by the latest patch in that bug.
Depends on: 859377
I doubt it. This was happening for me in a very specific configuration: laptop with Optimus, Firefox running on the NVIDIA GPU. That configuration has the most complex d3d setup (since Firefox is rendering using the NVIDIA GPU's D3D implementation, which is then routed to the Intel GPU for display).
Comment 27•10 years ago
Is it not resolved for you, then? Bug 866526 comment 12 and other comments elsewhere suggest to me that bug 859377 was indeed the problem.
Reporter
Comment 28•10 years ago
Note that this issue predates both ClippedImage and the subsequent regression and improvements from bug 859377. It may be a different issue altogether.
Comment 29•10 years ago
Hmm, good point. All we can say is that bug 859377 was _a_ problem.
Comment 30•10 years ago
I've discovered that there appears to be a bug in CreateOffscreenSurface that could be the culprit. See bug 869252.
Comment 31•10 years ago
Off chance: I wonder if this was related to the intermittent OOM errors we were having on M1 and M3 (bug 870002 was M3), which produced no stack trace. These disappeared when inbound/m-c upgraded to the newer (more memory) ix slaves.
Comment 32•10 years ago
Has anything changed here recently?
Not from my end -- I was never able to capture a vmmap log of when the problem was happening, before vmmap ran out of memory itself.
Comment 34•10 years ago
(In reply to Benjamin Smedberg [:bsmedberg] workweek high latency 19-Aug through 23-Aug from comment #0)
> Some possibilities:
> * we have a bug in our compositor use of shared memory
OMTC/e10s is not enabled on Windows, so this shouldn't be a factor.
> * we have a bug in our plugin-process use of shared memory
This is a good theory. Have we been able to reproduce this bug without any plugins loaded...
> * the windows code (d3d or d2d) code has a bug in the use of shared memory
... AND with d3d/d2d disabled?
> * the nvidia driver has a bug in the use of shared memory
...which would leave us with this. It might be interesting to check whether running a GPU-intensive background application, like the Starcraft 2 title screen, aggravates the problem.
Checking this should narrow down the problem.
Comment 35•10 years ago
Unassigning from BenWa, as he's not actively looking for a fix, though he can stay involved in the conversations.
Assignee: bgirard → nobody
Comment 36•10 years ago
(In reply to Benoit Girard (:BenWa) from comment #34)
> Have we been able to reproduce this bug without any plugins loaded?
> Have we been able to reproduce this bug with d3d/d2d disabled?
> Perhaps checking if having an GPU intensive background application like Starcraft 2 title screen aggregate the problem might be interesting.
> Checking these should narrow down the problem.
sunset.in.trance: can you help answer the questions above on your affected system? Thanks!
Flags: needinfo?(sunset.in.trance)
Comment 37•10 years ago
Not resolved. I have a PoC that reproduces the crash every time: within five or 10 seconds Firefox crashes, in mozalloc.dll.
eax=00000000 ebx=00000000 ecx=62943896 edx=00000003 esi=628f1ec6 edi=6294379c
eip=7335119c esp=0043dac8 ebp=0043db1c iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00200202
mozalloc!mozalloc_abort+0x2a:
7335119c cc int 3
It's possible to exploit it with a heap spray to find a free or RW memory address to return to. I can reproduce it in Firefox and mozalloc.dll 24.0.0.5001.
Comment 38•10 years ago
This is possibly happening on TBPL in mochitest-bc. See bug 937997.
Comment 39•10 years ago
So khuey and I were playing with this simple testcase he wrote: http://people.mozilla.com/~khuey/tests/bigimage.html
If you watch the process with vmmap (http://technet.microsoft.com/en-us/sysinternals/dd535533.aspx), you can see 32MB chunks slowly filling up the address space as Private Data. The funny thing is that these 32MB chunks are being allocated via VirtualAlloc and freed via VirtualFree, through jemalloc, and the allocation and free calls are balanced.
So the question is why VirtualAlloc thinks those chunks aren't available for future allocations.
Comment 40•10 years ago
(In reply to Nathan Froyd (:froydnj) from comment #39)
> So khuey and I were playing with this simple testcase he wrote:
>
> http://people.mozilla.com/~khuey/tests/bigimage.html
>
> If you watch the process with vmmap:
>
> http://technet.microsoft.com/en-us/sysinternals/dd535533.aspx
>
> you can see 32MB chunks slowing filling up the address space as Private Data.
>
I think you're looking at the wrong chunks. I'm seeing this test steadily increase our GPU committed size. I'm investigating this right now.
> The funny thing is that these 32MB chunks are being allocated via
> VirtualAlloc and VirtualFree, through jemalloc, and the appropriate
> allocation calls are balanced.
>
> So the question is why VirtualAlloc thinks those chunks aren't available for
> future allocations.
Comment 41•10 years ago
(In reply to Bas Schouten (:bas.schouten) from comment #40)
> (In reply to Nathan Froyd (:froydnj) from comment #39)
> > So khuey and I were playing with this simple testcase he wrote:
> >
> > http://people.mozilla.com/~khuey/tests/bigimage.html
> >
> > If you watch the process with vmmap:
> >
> > http://technet.microsoft.com/en-us/sysinternals/dd535533.aspx
> >
> > you can see 32MB chunks slowing filling up the address space as Private Data.
> >
>
> I think you're looking at the wrong chunks.. I'm seeing this test steadily
> increase our GPU commited size. I'm investigating this right now.
It did eventually reclaim the memory for me, but it took a very long time. I suspect this may be some sort of internal cache in D2D.
Running that test on my laptop OOMs the browser.
Comment 43•10 years ago
So what's the difference in environment/configuration between khuey's machine and Bas's machine?
I'm running a pretty stock X220 with Intel graphics.
Comment 45•10 years ago
So yeah, one key difference is that with my drivers, GPU memory does not get mapped into your address space. On stock -intel- drivers it does for some machines; this is something I found out a couple of weeks ago. In reality this should only mean that as long as we don't have ImageLib -only- holding on to the Moz2D surfaces, we'll use twice as much address space on these machines. But twice as much for normal usage should still be acceptable.

The next question is why we are using so much GPU memory when handling these big images that we're actually destroying. I don't have an answer to that yet. I've confirmed that the RefCount of our ID2D1Bitmap objects is 0 when the SourceSurfaceD2D gets destroyed. I've also confirmed that nothing I do with ID2D1Device (like ClearResources or SetMaximumTextureMemory) makes much of a difference.

I'm currently working on creating a stand-alone test in Moz2D, modeled after what Kyle's code ends up doing inside Moz2D, and we'll see if that reproduces it. If it does, I need to look at what we can do differently to fix it. I have a theory for something we may be able to use as a workaround; it won't be too pretty, but it shouldn't be too bad either.
Comment 46•10 years ago
> Some possibilities:
> * the nvidia driver has a bug in the use of shared memory
> So khuey and I were playing with this simple testcase he wrote:
>
> http://people.mozilla.com/~khuey/tests/bigimage.html
>
> If you watch the process with vmmap:
>
> http://technet.microsoft.com/en-us/sysinternals/dd535533.aspx
>
> you can see 32MB chunks slowing filling up the address space as Private Data.
Works fine for me!
- Win7 64bit
- Mozilla/5.0 (Windows NT 6.1; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0 ID:20131111154639 CSet: ab495a0b16a0
- Nvidia Quadro FX 1500
- Nvidia Driver 307.83
Comment 47•10 years ago
More interesting data here. I've done a couple of test cases:
    uint8_t* data = new uint8_t[512 * 512 * 4];
    for (int i = 0; i < 100000; i++) {
      // Upload a 512x512 BGRA surface; our reference is dropped at the end of each iteration.
      RefPtr<SourceSurface> surf = mDT->CreateSourceSurfaceFromData(data, IntSize(512, 512), 512 * 4, FORMAT_B8G8R8A8);
      mDT->Flush();
    }
    delete [] data;
In this test, GPU committed will grow to 200 MB as the test runs, then reset to +/- 10 MB, and go again, etc.
Change that to 1024x1024, and it will grow to 800 MB as the test runs, then reset back, etc. Some quick math would then seem to indicate that it grows to +/- 200 times the image size.
Now, because we all love data, I did the following:
    // Assumes data now points to a buffer of at least 1024 * 1024 * 4 bytes.
    for (int i = 0; i < 100000; i++) {
      RefPtr<SourceSurface> surf = mDT->CreateSourceSurfaceFromData(data, IntSize(1024, 1024), 1024 * 4, FORMAT_B8G8R8A8);
      // A tiny 4x4 surface alongside each big one.
      RefPtr<SourceSurface> surf2 = mDT->CreateSourceSurfaceFromData(data, IntSize(4, 4), 4 * 4, FORMAT_B8G8R8A8);
      mDT->Flush();
    }
Our perceptive readers will probably be able to predict what happens: yes indeed! We grow to 400 MB.
So what this seems to indicate is that D2D just keeps the last 200 textures it used around, just in case it's going to need them again, without looking at the size at all! If I make these 4096x2048 it will just grow to 6 GB! Now if all those textures end up in your address space, as you might well imagine, we're in a pickle. I have an idea for a workaround; sadly that workaround will also mean we don't get the benefit of this cache, but if that's what's needed to improve the situation, so be it.
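The numbers above fit a simple model in which the driver keeps the last 200 textures alive regardless of their size. A sketch that replays the experiments against such a count-limited cache; `CountLimitedCache` is hypothetical, the 200-entry limit is inferred from the measurements rather than taken from any D3D documentation, and the periodic reset back to about 10 MB is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical model: the most recent `capacity` textures stay alive,
// no matter how large each one is.
class CountLimitedCache {
 public:
  explicit CountLimitedCache(size_t capacity) : mCapacity(capacity) {
    assert(capacity > 0);
  }
  // Records a new texture of `bytes` bytes, evicting the oldest when full.
  void Add(uint64_t bytes) {
    mEntries.push_back(bytes);
    mTotal += bytes;
    if (mEntries.size() > mCapacity) {
      mTotal -= mEntries.front();
      mEntries.pop_front();
    }
    if (mTotal > mPeak) mPeak = mTotal;
  }
  uint64_t PeakBytes() const { return mPeak; }

 private:
  size_t mCapacity;
  std::deque<uint64_t> mEntries;
  uint64_t mTotal = 0;
  uint64_t mPeak = 0;
};

// Size of a 32-bit BGRA texture (FORMAT_B8G8R8A8 in the tests above).
constexpr uint64_t TextureBytes(uint64_t w, uint64_t h) { return w * h * 4; }
```

Under this model the plateau is simply 200 times the texture size: 200 MiB for 512x512, 800 MiB for 1024x1024, about 400 MiB when every other texture is 4x4, and 6400 MiB (the "6 GB") for 4096x2048.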
Comment 48•10 years ago
So this seems to be a part of D3D, not D2D; this simpler test case using D3D directly reproduces the problem as well:
    for (int i = 0; i < 100000; i++) {
      RefPtr<ID3D10Texture2D> texture;
      // A 1024x1024 BGRA texture (array size 1, 1 mip level)...
      CD3D10_TEXTURE2D_DESC desc(DXGI_FORMAT_B8G8R8A8_UNORM, 1024, 1024, 1, 1);
      Factory::GetDirect3D10Device()->CreateTexture2D(&desc, nullptr, byRef(texture));
      // ...then a tiny 2x2 one into the same RefPtr, dropping our reference to the big one.
      CD3D10_TEXTURE2D_DESC desc2(DXGI_FORMAT_B8G8R8A8_UNORM, 2, 2, 1, 1);
      Factory::GetDirect3D10Device()->CreateTexture2D(&desc2, nullptr, byRef(texture));
      Factory::GetDirect3D10Device()->Flush();
    }
Still looking into a workaround; because this is a D3D issue, my previous idea didn't work. The question now also becomes whether this is driver dependent to a larger extent than I'd hoped.
Comment 49•10 years ago
(In reply to Bas Schouten (:bas.schouten) from comment #48)
> [...]
> Still looking into a workaround, because of this being a D3D issue my
> previous idea didn't work. The question now also becomes whether this is
> driver dependent to a larger extent than I'd hoped.
It looks like this cache is specific to my device. I don't see the same issue on other devices, but I do see subtly different problems. On devices where some of my testcases crash (where textures are mapped into the address space), I may be close to a workaround.
Comment 50•10 years ago
I've pushed several try runs with differing degrees of mitigation for these problems. The test case in this bug hits a very special case that shouldn't be common, but it should also be addressed by at least some of these patches:
https://tbpl.mozilla.org/?tree=Try&rev=c797cdc08ea0
https://tbpl.mozilla.org/?tree=Try&rev=b12eb295e5cd
https://tbpl.mozilla.org/?tree=Try&rev=e66af2b5c47f
https://tbpl.mozilla.org/?tree=Try&rev=f083753edc0b
Comment 51•10 years ago
(In reply to Bas Schouten (:bas.schouten) from comment #45)
> this should only mean that as long as we don't have ImageLib -only- holding
> on to the Moz2D surfaces we'll use twice as much address space on these
> machines.
This sounds like bug 700545 needs a larger scope.
Comment 52•10 years ago
FWIW, the test case attached to this bug reproduces the problem because it draws to a canvas that's never shown or used, which means all those images get stored in a drawing pipeline that is never flushed. https://tbpl.mozilla.org/?tree=Try&rev=e66af2b5c47f contains a patch that makes sure all canvases are regularly flushed when a large amount of uploads is put in the pipeline. This would be a viable short-term workaround and I'm not against landing it, but I don't have -too- high hopes it'll fix bc; let's see, though.
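The general shape of "flush when many uploads are pending" can be sketched as a byte budget per draw target. This is a guess at the idea, not the patch in that try push: `UploadFlushPolicy` and the budget value are hypothetical, and the flush callback stands in for whatever actually submits the canvas's pending drawing.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Tracks bytes uploaded into a draw target's pipeline since its last flush
// and forces a flush once they reach a budget, so a canvas that is never
// presented cannot pin an unbounded number of textures.
class UploadFlushPolicy {
 public:
  UploadFlushPolicy(uint64_t budgetBytes, std::function<void()> flush)
      : mBudget(budgetBytes), mFlush(std::move(flush)) {
    assert(mFlush);
  }
  // Call for every surface upload; may trigger a flush.
  void OnUpload(uint64_t bytes) {
    mPending += bytes;
    if (mPending >= mBudget) {
      mFlush();
      mPending = 0;
    }
  }
  // Call when the target is flushed for any other reason (e.g. presented).
  void OnExternalFlush() { mPending = 0; }

 private:
  uint64_t mBudget;
  std::function<void()> mFlush;
  uint64_t mPending = 0;
};
```

With a 64 MiB budget and 4 MiB uploads, an offscreen canvas is flushed every 16 uploads, so its pending uploads never exceed the budget plus one upload.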
Comment 53•10 years ago
So https://tbpl.mozilla.org/?tree=Try&rev=e66af2b5c47f still has an OOM failure in M2, even though it does fix the test case attached to this bug.
Comment 54•9 years ago
(In reply to Jet Villegas (:jet) from comment #36)
> (In reply to Benoit Girard (:BenWa) from comment #34)
>...
> sunset.in.trance: can you help answer the questions above on your affected
> system? Thanks!
:jet, I PMed sunset some months ago and got no response, so I don't think we're going to get the feedback.
Flags: needinfo?(sunset.in.trance)
Comment 55•9 years ago
Did anything positive come out of the tests of comment 50?
Updated•9 years ago
Summary: Investigate why we're running out of VM with probably-leaked memory mappings → Investigate why we're running out of VM with probably-leaked memory mappings relating to gfx
Whiteboard: [MemShrink:P1]
Comment 56•5 years ago
(In reply to Bas Schouten (:bas.schouten) from comment #47)
> [...]
> So what this seems to indicate is that D2D just keeps the last 200 textures
> it used around, just in case it's going to need them again. Without looking
> at all at the size! If I make these 4096x2048 it will just grow to 6 GB! Now
> if all those textures end up in your address space, as you might well
> imagine, we're in a pickle. [...]
Could this be a case of GPU-committed leaks like bug 1432086 or bug 1436848?
Comment 57•5 years ago
(In reply to Benjamin Smedberg from comment #0)
> * the nvidia driver has a bug in the use of shared memory
I don't know if this is relevant for this thread, but as I remember, a lot of nvidia drivers had a slow non-paged memory leak (something like 1 GB for every 150 hours of uptime); this was long before the Windows 10 release, with a few around the Anniversary Update (AU). After updating the nvidia driver to 390.77 I also got my first BSOD since AU, caused by nvlddmkm (PAGE_FAULT_IN_NONPAGED_AREA, probably related to bug 1440026).
Comment 58•5 years ago
Updated•6 months ago
Severity: normal → S3
Comment 59•5 months ago
The severity field for this bug is relatively low, S3. However, the bug has 12 votes and 53 CCs.
:jstutte, could you consider increasing the bug severity?
For more information, please visit auto_nag documentation.
Flags: needinfo?(jstutte)
Comment 60•5 months ago
Hi Andrew, I see some comment from you on the blocked bug 965936, can you help with triaging this better?
Flags: needinfo?(jstutte) → needinfo?(continuation)
Comment 61•5 months ago
A 9 year old performance investigation is not very useful. I'm sure we have fixed (and created) plenty of VM leaks in the meanwhile. If somebody is still experiencing similar issues, they should file a new bug.
Status: NEW → RESOLVED
Closed: 5 months ago
Flags: needinfo?(continuation)
Resolution: --- → WORKSFORME