Closed Bug 930797 Opened 11 years ago Closed 6 years ago

Specific instance of [@ EMPTY: no crashing thread identified; corrupt dump] where the user caught the crash in WinDbg

Categories

(Core :: Graphics: Layers, defect)

24 Branch
x86
Windows 7
defect
Not set
normal

Tracking


RESOLVED WONTFIX

People

(Reporter: kbrosnan, Assigned: cpearce)

References

Details

(Keywords: crash)

Crash Data

Attachments

(5 files)

breaking out from bug 837835

bug 837835 comment 94
"Thanks; I finally managed to catch the crash, though it took two days of slow browsing under the debugger. However, because it took so long, the log file is giant: 565 MB. How much I can cut from it so the log would be light enough for easy upload for developers to see what happened?

(The way it goes is I open a bunch of YouTube videos, which becomes unbearable for FF so it dies.)

Also, just in case it's needed, I have made a "minidump", which is not really mini since it is 3.5 GB. But I would prefer not to upload it -- unless it becomes absolutely necessary -- since it has personal information."


#136  Id: 51e0.74a4 Suspend: 1 Teb: ffe20000 Unfrozen "Media Decode"
ChildEBP RetAddr  
ea1df480 583e1218 mozalloc!mozalloc_abort(char * msg = 0xea1df498 "out of memory: 0x0000000000151800 bytes requested")+0x2a [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\memory\mozalloc\mozalloc_abort.cpp @ 30]
ea1df4d0 583e10a2 mozalloc!mozalloc_handle_oom(unsigned int size = 0x151800)+0x5f [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\memory\mozalloc\mozalloc_oom.cpp @ 50]
ea1df4e0 105e28ab mozalloc!moz_xmalloc(unsigned int size = 0x151800)+0x1b [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\memory\mozalloc\mozalloc.cpp @ 56]
ea1df4f8 106235c9 xul!mozilla::layers::BufferRecycleBin::GetBuffer(unsigned int aSize = 0x9c4cf9a4)+0x52 [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\gfx\layers\imagecontainer.cpp @ 111]
ea1df504 10623536 xul!mozilla::layers::PlanarYCbCrImage::AllocateBuffer(unsigned int aSize = 0x151800)+0x10 [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\gfx\layers\imagecontainer.cpp @ 427]
ea1df518 1064b315 xul!mozilla::layers::PlanarYCbCrImage::CopyData(struct mozilla::layers::PlanarYCbCrImage::Data * aData = 0xea1df53c)+0x2f [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\gfx\layers\imagecontainer.cpp @ 463]
ea1df524 10a44a41 xul!mozilla::layers::PlanarYCbCrImage::SetData(struct mozilla::layers::PlanarYCbCrImage::Data * aData = 0xea1df53c)+0xa [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\gfx\layers\imagecontainer.cpp @ 485]
ea1df598 10a44e5f xul!mozilla::VideoData::Create(class mozilla::VideoInfo * aInfo = 0x3a0feae0, class mozilla::layers::ImageContainer * aContainer = 0x554c6fb0, class mozilla::layers::Image * aImage = 0xea1df5e0, int64 aOffset = 0n487161, int64 aTime = 0n300300, int64 aEndTime = 0n333666, struct mozilla::VideoData::YCbCrBuffer * aBuffer = 0xea1df668, bool aKeyframe = false, int64 aTimecode = 0n-1, struct nsIntRect aPicture = struct nsIntRect)+0x266 [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\content\media\mediadecoderreader.cpp @ 252]
ea1df5e0 1007ad2a xul!mozilla::VideoData::Create(class mozilla::VideoInfo * aInfo = 0x3a0feae0, class mozilla::layers::ImageContainer * aContainer = 0x554c6fb0, int64 aOffset = 0n487161, int64 aTime = 0n300300, int64 aEndTime = 0n333666, struct mozilla::VideoData::YCbCrBuffer * aBuffer = 0xea1df668, bool aKeyframe = false, int64 aTimecode = 0n-1, struct nsIntRect aPicture = struct nsIntRect)+0x3a [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\content\media\mediadecoderreader.cpp @ 266]
ea1df6b4 1007b06f xul!mozilla::WMFReader::CreateBasicVideoFrame(struct IMFSample * aSample = 0x264b0208, int64 aTimestampUsecs = 0n300300, int64 aDurationUsecs = 0n33366, int64 aOffsetBytes = 0n487161, class mozilla::VideoData ** aOutVideoData = 0xea1df724)+0x1d1 [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\content\media\wmf\wmfreader.cpp @ 856]
ea1df714 10a47885 xul!mozilla::WMFReader::DecodeVideoFrame(bool * aKeyframeSkip = 0xea1df7d3, int64 aTimeThreshold = 0n333666)+0x1e1 [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\content\media\wmf\wmfreader.cpp @ 987]
ea1df7dc 10a489e2 xul!mozilla::MediaDecoderStateMachine::DecodeLoop(void)+0x248 [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\content\media\mediadecoderstatemachine.cpp @ 905]
ea1df7f4 10589423 xul!mozilla::MediaDecoderStateMachine::DecodeThreadRun(void)+0x9f [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\content\media\mediadecoderstatemachine.cpp @ 507]
ea1df7f8 0fdc7a51 xul!nsRunnableMethodImpl<void (void)+0xe [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\obj-firefox\dist\include\nsthreadutils.h @ 351]
ea1df86c 0fe1b1b8 xul!nsThread::ProcessNextEvent(bool mayWait = true, bool * result = 0xea1df89c)+0x221 [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\xpcom\threads\nsthread.cpp @ 632]
ea1df894 5176e927 xul!nsThread::ThreadFunc(void * arg = 0x532d0201)+0x98 [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\xpcom\threads\nsthread.cpp @ 264]
ea1df8b4 5177329d nss3!_PR_NativeRunThread(void * arg = 0x223a5860)+0x167 [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\nsprpub\pr\src\threads\combined\pruthr.c @ 419]
ea1df8bc 51d8c6de nss3!pr_root(void * arg = 0x223a5860)+0xd [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\nsprpub\pr\src\md\windows\w95thred.c @ 90]
ea1df8f4 51d8c788 MSVCR100!_callthreadstartex(void)+0x1b [f:\dd\vctools\crt_bld\self_x86\crt\src\threadex.c @ 314]
ea1df900 756f336a MSVCR100!_threadstartex(void * ptd = 0x02e4ec78)+0x64 [f:\dd\vctools\crt_bld\self_x86\crt\src\threadex.c @ 292]
ea1df90c 77569f72 kernel32!BaseThreadInitThunk+0xe
ea1df94c 77569f45 ntdll!__RtlUserThreadStart+0x70
ea1df964 00000000 ntdll!_RtlUserThreadStart+0x1b
Apparently this shows up in crash-stats a little bit:
https://crash-stats.mozilla.com/report/list?signature=mozalloc_abort%28char+const*+const%29+|+mozalloc_handle_oom%28unsigned+int%29+|+moz_xmalloc+|+mozilla%3A%3Alayers%3A%3ABufferRecycleBin%3A%3AGetBuffer%28unsigned+int%29&
Slightly different signature on Android:
https://crash-stats.mozilla.com/report/list?signature=mozalloc_abort+|+moz_xmalloc+|+mozilla%3A%3Alayers%3A%3ABufferRecycleBin%3A%3AGetBuffer&
Crash Signature: [@ EMPTY: no crashing thread identified; corrupt dump] → [@ EMPTY: no crashing thread identified; corrupt dump] [@ mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | mozilla::layers::BufferRecycleBin::GetBuffer(unsigned int) ] [@ mozalloc_abort | moz_xmalloc | mozilla::layer…
I too am experiencing this problem.  I have captured the info as described here:

https://developer.mozilla.org/en-US/docs/How_to_get_a_stacktrace_with_WinDbg

The file is about 7.58 MB.  Can someone tell me how to upload the file?

I'm stuck at a point in WinDbg generating an access violation, and may be stuck in a loop (I repeatedly hit F5).  I haven't gotten to the point of seeing a window saying FF has crashed yet.
Flags: needinfo?
(In reply to tjsepka from comment #2)

An access violation is a type of crash. You can't really continue running after that point, so WinDbg keeps putting you back at the same line. With the debugger standing in the way, other code hasn't noticed the crash, so the crash window won't come up.

Bugzilla ought to be able to handle a small dump, especially if you compress it. Are you able to use the "Add an attachment" link? (I'm not sure if it needs special permission)
Flags: needinfo?
Actually, maybe you might not want to attach it here if you are concerned about others seeing the contents of memory.
Thanks for the feedback.  I opened bug 933453, since the cause might not be the same.
It turns out that this video/core/graphics layers resource leak is still not fixed in FF 25:

http://crash-stats.mozilla.com/report/index/e4fe4044-890a-415f-9333-8f0ee2131109

As it was with FF 24, I just watched lots of YouTube videos and page rendering started to disintegrate: at first it became slower, then it painted pages in tabs black, leaving only the video stream itself, then the video started flickering, then the pull-down menu with the list of currently opened tabs was rendered over and over on top of itself -- obviously layers rendering became corrupted -- then all of FF turned completely white, including the title bar and the minimize/maximize/close window controls.
The video layer/core code in FF 25 is a disaster. Just now it crashed so badly that the Crash Reporter was not only unable to capture the crash and get a memory dump, it ***died itself***: "Unfortunately, the crash reporter is unable to submit a report for this crash. Details: The application did not identify itself."
User Dderss, thank you for the minidump.

So it looks like there are two things that need addressing:

1)  Whatever causes the low-memory condition, perhaps.
2)  BufferRecycleBin::GetBuffer allocates a buffer of size aSize using infallible malloc.  But I expect these buffers can get pretty big; this should be doing a fallible allocation.

The actual size being allocated here is ~1.3MB, so it's a bit odd that we're hitting OOM on that...

Bas, you have blame for the GetBuffer thing. Could you take a look at this, please?
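
For illustration only, here is a minimal standalone C++ sketch of the infallible-versus-fallible distinction being discussed. It uses standard std::nothrow rather than Gecko's actual mozalloc API (which is not reproduced here), so treat it as a conceptual sketch, not the proposed patch:

#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <new>

// "Infallible" style: any allocation failure is treated as fatal, analogous to
// mozalloc_abort firing after an infallible allocation fails. The caller never
// sees a null pointer.
uint8_t* GetBufferInfallible(size_t aSize) {
  uint8_t* buf = new (std::nothrow) uint8_t[aSize];
  if (!buf) {
    std::fprintf(stderr, "out of memory: %zu bytes requested\n", aSize);
    std::abort();  // the whole process dies, as in the crash stack above
  }
  return buf;
}

// "Fallible" style: return null and let the caller degrade gracefully
// (e.g. drop a video frame) instead of crashing the browser.
uint8_t* GetBufferFallible(size_t aSize) {
  return new (std::nothrow) uint8_t[aSize];  // may be nullptr
}

int main() {
  const size_t frameSize = 0x151800;  // the ~1.3MB request from the stack above
  if (uint8_t* buf = GetBufferFallible(frameSize)) {
    std::puts("got buffer; fill it with YCbCr data");
    delete[] buf;
  } else {
    std::puts("allocation failed; skip this frame instead of aborting");
  }
  return 0;
}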
Flags: needinfo?(bas)
Hmm, though bas just moved that.  Anyway, let's start there.  ;)
Keywords: crash
From what I see visually, after some extensive use of FF (including watching YouTube videos in the HTML5 player), page rendering starts to disintegrate gradually, where different parts of FF's graphics layers fall apart.

Parts of pages in tabs black out (as in the attached screenshot above), or a menu being scrolled through gets overlaid/rendered on itself -- until the situation becomes so bad that the crash happens during a malloc attempt. Once, FF's video died so badly that the whole FF window became completely white, including the system window control buttons ("_", "O", "X").

So there are primary errors that never cause a crash by themselves, and the crash only happens at a malloc failure at some point. How can those primary, initial errors -- which do not directly cause a crash, unlike the secondary malloc failures -- be tracked, so the FF source code could be fixed?
Actually, I wonder whether a 1.3MB allocation could fail due to bug 941837...  If so, that might account for the other symptoms too: 1MB+ graphics buffers are pretty common (e.g. a 640x480 video frame), so if other callsites are doing fallible allocation and bailing out on OOM due to fragmentation, that could cause the symptoms you see.  Of course they could be caused by other things too.

Once those symptoms start happening, what does about:memory say?
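
(A side note on where the ~1.3MB figure likely comes from -- this is an inference from the numbers, not something stated in the dump: the requested size 0x151800 is 1,382,400 bytes, which is exactly a 1280x720 planar YCbCr 4:2:0 frame: 1280*720 = 921,600 bytes of Y plus two 640*360 = 230,400-byte chroma planes, i.e. 921,600 + 2*230,400 = 1,382,400. That would be consistent with 720p video frames being the allocations that tip the process over.)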
Dderss, we've just added some extra debugging tools to our Nightly builds to help measure this. Could you try today's nightly build from nightly.mozilla.org, and when things start getting "weird" could you load about:memory and save a memory report? I'm especially interested in  vsize-max-contiguous numbers as we approach the crash point.
Flags: needinfo?(zxspectrum3579)
Hi there, Benjamin.

I just installed the nightly build ("firefox-28.0a1.en-US.win32.installer.exe"), and there is indeed a new parameter, "vsize-max-contiguous". In a freshly loaded session this parameter is:

1,805.94 MB ── vsize-max-contiguous

Usually it takes a day or two to degrade the video rendering, so I might not have any interesting results sooner. I will check how this value changes along the way, and will try to save a memory report as soon as I see visual glitches.
Flags: needinfo?(zxspectrum3579)
Sorry for the off-topic (sort of), but in the FF 28 nightly (firefox-28.0a1.en-US.win32.installer.exe), highlighting of the currently active tab does not happen on my system -- as shown in the screenshot.

Since this drives me crazy, and bringing FF to the point of showing symptoms of video rendering corruption will take some time of active browsing, I have to ask if there is a way to edit the tabs' style properties -- at least in some resource file -- so I could see the currently active tab highlighted, as it is supposed to be?
(In reply to Boris Zbarsky [:bz] from comment #9)
> The actual size being allocated here is ~1.3MB, so it's a bit odd that we're
> hitting OOM on that...

The minidump contains enough information to tell us what the max contiguous VM region is, someone (dmajor? bsmedberg?) could pull that information out fairly easily.
Making FF crash turned out to be much faster than I thought, if the goal is to do so.

What I did was just open and watch a lot of YouTube videos (which, in my case, use the forced HTML5 player instead of Flash; the plugin for the latter is turned off):

Here is how the memory goes from the start of the session to the point where the whole FF window gets completely degraded, disintegrated, corrupted, as the screenshot from the attachment shows:

2,534.81 MB -- vsize
1,121.88 MB -- vsize-max-contiguous

2,825.41 MB -- vsize
  699.88 MB -- vsize-max-contiguous

2,982.71 MB -- vsize
  540.88 MB -- vsize-max-contiguous

3,376.72 MB -- vsize
  238.61 MB -- vsize-max-contiguous

3,588.18 MB -- vsize
  168.07 MB -- vsize-max-contiguous

3,785.73 MB -- vsize
   10.00 MB -- vsize-max-contiguous

_____________________________________

As soon as "vsize-max-contiguous" hits 10 MB, visuals completely die. The stage after this screenshot is completely white application.

I managed to capture the latest memory values by pointing mouse on white screen, like blindfolded: started memory measurement, then saved result.

Thankfully, FF crashed right after that operation, and not before, so we have those values now. The crash report this time has proper memory dump, with stack and everything, it is not empty, so no debugging is needed:

https://crash-stats.mozilla.com/report/index/93a85de0-dd1b-4693-8d3f-9350e2131122
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #16)
> (In reply to Boris Zbarsky [:bz] from comment #9)
> > The actual size being allocated here is ~1.3MB, so it's a bit odd that we're
> > hitting OOM on that...
> 
> The minidump contains enough information to tell us what the max contiguous
> VM region is, someone (dmajor? bsmedberg?) could pull that information out
> fairly easily.

We could extract that if we had a minidump. So far the reports have been empty though.

And if 1.3MB seems low, the report in comment 8 had an OOM size of 40 (bytes!)
(In reply to User Dderss from comment #17)
> https://crash-stats.mozilla.com/report/index/93a85de0-dd1b-4693-8d3f-
> 9350e2131122

Cool, this one had an intact minidump.  This is the same 1.3MB in BufferRecycleBin::GetBuffer.
            msg = 0x9b03f8ac "out of memory: 0x0000000000151800 bytes requested"
mozalloc!mozalloc_abort
mozalloc!mozalloc_handle_oom
mozalloc!moz_xmalloc
xul!mozilla::layers::BufferRecycleBin::GetBuffer

In the minidump I see a maximum contiguous VM region of 2MB at the time of the crash.
Maybe the code for the HTML5 player does not properly release allocated memory, but if FF's engine sees that the vsize value is approaching the theoretical upper limit for 32-bit applications, or that "vsize-max-contiguous" is becoming too small, why are no garbage collection and/or other measures such as memory defragmentation forced?
We have tried that, it turns out to be very hard to get right. In this case, if there's a bug that's fragmenting virtual memory, it's unlikely that trying to garbage collect would actually have any impact.
Dderss, that is very helpful information. Could you try disabling graphics acceleration in options > advanced > general > use hardware acceleration when available?
Just done that: turned hardware acceleration off, and restarted to test how it goes: no difference.

See it in the steps:

1) just opened session:

  960.50 MB -- vsize
1,805.94 MB -- vsize-max-contiguous


2) opened/reloaded some tabs without videos:

1,390.97 MB -- vsize
1,805.94 MB -- vsize-max-contiguous

3) opened/reloaded some tabs **with** videos (YouTube with forced HTML5 player; "avc1/mp4a" stream):

1,686.60 MB -- vsize
1,805.94 MB -- vsize-max-contiguous


4) opened/reloaded more and more tabs **with** videos (including 1h-2h long YouTube videos):

2,115.88 MB -- vsize
1,414.88 MB -- vsize-max-contiguous

5) without a single other tab load/reload, tried to use "Minimize memory usage" on the "about:memory" page -- it did not help:

2,154.77 MB -- vsize
1,351.88 MB -- vsize-max-contiguous

______________________________________________________

It looks like step #4 was too much, and free memory got fragmented. After that, no matter what you do, with every move "vsize-max-contiguous" goes down until the very end at 10 MB or less, where the video layers disintegrate and FF crashes (this time I did not want to continue to that point, as it would be a waste of time).

Maybe step #4 is the point where "explicit" allocated memory plus vsize and free memory come close to the 4 GB (32-bit) limit, after which free memory always ends up fragmented?
I might be getting confused, but isn't vsize the total amount of *reserved* virtual memory space? Whereas vsize-max-contiguous is the maximum contiguous chunk in *unreserved* virtual memory space. So I would expect that for a large address aware 32-bit process like Firefox on a 64-bit OS, vsize-max-contiguous could never be more than |4GB - vsize| - but that says nothing about address space fragmentation without also knowing the *resident* set size, i.e. how much RAM the process is actually using. If resident is close to vsize and the process is just using a lot of RAM, there might be a memory leak somewhere but the problem isn't fragmentation.
At least, fragmentation of the unreserved address space. There might well still be fragmentation at the allocation level :)
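For concreteness, here is a minimal Windows-only sketch of how the largest contiguous block of free address space -- the quantity that vsize-max-contiguous reports -- can be measured by walking the address space with VirtualQuery. This is an illustration only, not Firefox's actual memory-reporter code:

#include <windows.h>
#include <cstdio>

int main() {
  SYSTEM_INFO si;
  GetSystemInfo(&si);

  SIZE_T largestFree = 0;
  const char* p = static_cast<const char*>(si.lpMinimumApplicationAddress);
  const char* end = static_cast<const char*>(si.lpMaximumApplicationAddress);

  MEMORY_BASIC_INFORMATION mbi;
  while (p < end && VirtualQuery(p, &mbi, sizeof(mbi)) != 0) {
    // MEM_FREE regions are address space that is neither reserved nor
    // committed; the largest one is what a big allocation needs to succeed.
    if (mbi.State == MEM_FREE && mbi.RegionSize > largestFree) {
      largestFree = mbi.RegionSize;
    }
    p = static_cast<const char*>(mbi.BaseAddress) + mbi.RegionSize;
  }

  std::printf("largest contiguous free region: %.2f MB\n",
              largestFree / (1024.0 * 1024.0));
  return 0;
}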
I just conducted additional experiments with tabs with videos in them. After I watch a video and close its tab (and even clear the closed tabs list and use the minimize memory button), the "vsize-max-contiguous" value rarely or never goes back up; at most, "vsize" sometimes drops a little. After loading many tabs, I watched and purged three tabs with videos, and the results are:

  1,715.57 MB -->  1,705.62 MB (100.0%) -- explicit
  3,191.90 MB --> 3,175.90 MB -- vsize
  243.88 MB --> 243.88 MB -- vsize-max-contiguous  

If I understand correctly, Firefox does not use memory defragmentation procedures, so "vsize-max-contiguous" can only grow back if you are lucky and you close some fat tab with a giant video that was previously allocated next to the biggest remaining non-fragmented piece. In my experiment I was not lucky, so the 16 MB of memory that was released was not adjacent to those 243.88 MB, and the "vsize-max-contiguous" value did not increase.

Maybe there are no leaks or anything corruptive in the source code, and the solution is to include forced memory defragmentation, which would automatically turn on if "vsize-max-contiguous" gets too small?
Thanks, that confirms that vsize is much bigger than the amount of memory allocated (some of what I said above may have been a bit incorrect, but I think the idea is right). Note that bug 941837, which will hopefully make it to tomorrow's Nightly, should fix a long-standing VM fragmentation bug, so it'd be nice to see if that improves matters.
Thanks.

Could anyone also please answer a couple of questions that are critical (at least for cases like mine)?
___________________________________________

1) I always use "click-to-load" option for tabs, so FF should start immediately in absolutely *skeletal* state (the list of tabs and groups takes only few KBs) and take minimum space in the memory. However, in reality it loads complete fat SessionStore.JSON object and unpacks it into the memory with the following values:

563.23 MB (100.0%) -- explicit
1,028.82 MB -- vsize

before I even click to load any tab (only the currently active tab is displayed, which is "about:memory" this time).

Why so? Is there a plan to implement a **true** click-to-load feature? Those 563.23 MB are absolutely not needed for any purpose whatsoever until I click to load a tab. And even then, only the corresponding fraction of that could be needed, not the whole 563/1028 MB of explicit/vsize that I get now without even clicking on any tab yet.

___________________________________________

2) I found out that SessionStore collects garbage from tabs that I visited and *closed* (even after using the clear-closed-tabs command, and not counting "Back/Forward" pages that could be cacheable).

If I export my tabs into bookmarks, then kill the SessionStore.js file, open FF, and import all of the tabs/groups back into the session, the "just started" session will take 300 MB of explicit memory, not 563 MB (the figure from question #1) as it does now (I did it a few times to confirm).

This means that the garbage that accumulates in SessionStore.js eats hundreds of megabytes of memory. In the past, my SessionStore.js grew to sizes such as 30 MB and even 60 MB because of this.

Is there a fundamental change to SessionStore.js' low-level management that would prevent this corruption/garbage accumulation? This issue has been a plague for *years*. (But maybe it is already resolved in FF 26 or 27 and I just do not know yet.)

Thanks in advance.
It might be possible to load the SessionStore file more lazily, but it really shouldn't be that big in the first place. There are bugs open for this - for instance in bug 934935, it seems that Facebook's chat interface was rapidly generating ad-related data that ended up getting saved - this appears to have been fixed on their end now. In bug 936271 a strategy was proposed to work around bad behavior like this, which should hopefully reduce the problem significantly.
If I understand correctly, with the "click-to-load" option turned on, the only thing that needs to load besides the tab links and tab titles is the favicons. If this were implemented, FF would be lightning fast and the "just opened" state of the session would take something like 10 times less memory.

Thanks to you and other good people of the community, I see that there is movement to deal with the SS JSON bloat of old objects that do not belong to any of the tabs or their ("backward/forward") history. Personally, I like the behaviour where FF remembers how pages looked after a restart, including, for example, comment form data that was not posted yet.

I hope one of the strategies to deal with objects that lost their owners after crashes is a forced garbage collection on FF's exit: a cycle where, for each tab with cached pages, SS.JSON gets rebuilt with all the junk shed.
Let's please keep this bug to concrete things about the observed memory issue. The efficiency of session storage should not be discussed here unless you find very concrete hints that session storage is causing you to run out of memory. Session storage is being completely rewritten for multiple reasons anyhow, as I understand it, so analyzing it here doesn't help us much.
I thought it was connected, because having memory use blown up right from the "just opened" state of the browser might indirectly worsen the situation.

And clicking on just one tab -- for example, Verge.com (though other pages give similar results) -- affects memory usage this way:

682.26 MB (100.0%) -- explicit  
1,109.82 MB -- vsize

+119 MB added to "explicit" and 81 MB added to "vsize" with just one tab loaded. Even if I am not watching any videos, memory use grows by *additional* gigabytes with just 15-20 tabs loaded (is this expected behaviour?), and kills FF anyway, though it takes longer.

Maybe SessionStore's corrupted garbage unfolds in memory in some weird way right from the start of FF, and tampers with both the amount of memory newly allocated for each loaded tab and the release of memory when a tab is closed, which causes fragmentation.

I am not sure whether this is a concrete hint or not, though; sorry, Robert.
I honestly don't think sessionstore has a major part in the memory issues. We know there are issues with session store and those are being worked on, but I'm pretty sure they are not the core of the issues you are seeing here.

Maybe we can get more out of the investigations of David in comment #19 or the question to Bas in comment #9.
(In reply to Boris Zbarsky [:bz] from comment #9)
> User Dderss, thank you for the minidump.
> 
> So looks like there are two things that need addressing:
> 
> 1)  Whatever causes the low-memory condition, perhaps.
> 2)  BufferRecycleBin::GetBuffer allocates a buffer of size aSize using
> infallible malloc.  But I expect these buffers can get pretty big; this
> should be doing a fallible allocation.
> 
> The actual size being allocated here is ~1.3MB, so it's a bit odd that we're
> hitting OOM on that...
> 
> Bas, you have blame for the GetBuffer thing. Could you take a look at this,
> please?

I think I just moved that around between platforms. The original code for this was written by roc. Fwiw, I don't mind this becoming a fallible allocation but I don't know much about how it's used.
Flags: needinfo?(bas)
roc?
Flags: needinfo?(roc)
Yes, we should make this fallible. This will require handling PlanarYCbCrImages whose buffer allocation failed, but that's not going to be hard to do.
Flags: needinfo?(roc)
David Major offered (https://bugzilla.mozilla.org/show_bug.cgi?id=941837#c35) some help with debugging the memory/fragmentation issue of this bug; thanks for the assistance.

The question is, why doesn't the HTML5 player's video cache get trimmed when FF's engine approaches the limit of its 32-bit addressing ability? I can easily kill FF by simply opening a few big YouTube videos (some of which could be two-hour-long films, others 4K 3D videos, and so on). Each big video consumes hundreds of megabytes of virtual memory, "vsize-max-contiguous" goes to zero, and FF dies in no time. (That said, "vsize-max-contiguous" erodes anyway, even if I do not open videos; it just takes two to three days to crash FF.)

Shouldn't trimming the memory allocated for the HTML5 video player cache, plus a memory defragmentation mechanism, be used to free virtual memory and restore contiguous memory -- or am I wrong?

Also, why does FF not have a paging mechanism through which it could surpass the 32-bit limitation on memory addressing? It is not right that on systems with 8 and 16 GB of RAM (my case) FF can take no more than 4 GB. I would prefer that to kick in *first*, with FF cutting allocated memory for earlier opened tabs only when it actually consumes all of the memory. (Say, FF would start cutting allocated memory from the oldest accessed tabs only when it had used *all* of the RAM, not earlier.)

(This is beside the point, but the fact that 64-bit FF was discontinued last year is very strange. Nowadays even smartphones have 64-bit browsers, like Safari for iOS on the Apple A7 SoC.)
Here is the latest crash:

https://crash-stats.mozilla.com/report/index/79ea34ac-191a-41ae-a5a2-f3e052131202

Pages like Verge.com take up to 100-120 megabytes of allocated memory. Considering that the "just opened" state of FF is 500-650 MB (which is wrong, because the "click-to-load" option for tabs is on and FF should not unfold the SessionStore object completely unless I click on a tab), this means that I only have to load about 20 tabs and just a couple of paused videos before FF will crash.

Why does memory management not work even in FF 28? Why do the earliest accessed tabs not get unloaded if "vsize" approaches 4 GB and "vsize-max-contiguous" goes to zero?
User Dderss: let's not go way off track here. There's clearly a bug that you're hitting, and we'd like to get to the bottom of it. Theorizing about what Firefox ought to do in low-memory situations is not terribly helpful here--we simply shouldn't be using this much memory in this situation. We should find and fix the root cause of that, full stop.
Theodore, what I wrote is relevant to the issue.

Especially the fact that the FF engine does not forcefully release memory allocations for older tabs/pages when "vsize-max-contiguous" and "vsize" near crash-borderline values.

You just check the "vsize-max-contiguous"/"vsize" values and launch a procedure to free memory when needed. Then FF will never crash from an out-of-memory issue (unless I load one tab that requires a malloc of around 3 GB right away -- which is not coming until YouTube implements support for 120 FPS 8K stereo video).

Do you know in which module such code could be inserted?
I wrote some instructions for collecting a VirtualAlloc trace in xperf. I need to clean this up and put it in a better place like a wiki page, but for now it's here: https://etherpad.mozilla.org/sOCISHDFSm

If you don't mind using this tool, please try to capture one of your scenarios where FF dies quickly. A trace file will help us see where the memory allocations are coming from.
David, thanks for the instructions. For how long should I have tracing on? Until the crash (the trace file might be huge)? Or will just a few minutes be enough?
Ideally for the whole length, otherwise you might have leaks that we can't see. Xperf should be able to handle it; today I ran a 45 minute trace without overflowing its buffers.
Thanks. David, how do I set things up to limit tracing to the "firefox.exe" process? (I could not find such a limitation either in your general instructions or in the batch command file.)
The VirtualAlloc tracing feature is system-wide. I don't know of a way to select a single process. I recommend not running other intensive applications during the trace, in order to minimize the extra data in the file.
Thanks, but will I be able to cut the data on all processes besides "firefox.exe" out of the tracing log? I am not sure I will be able to do a clean system run of just FF. (My RAM is 16 GB, so usually I am never even close to being out of memory, and there is no issue with other processes somehow tampering with FF. The only issue is that there are a lot of things running besides FF, and I do not want to include them in the log file.)
I don't know of a way to filter the log file. If the other processes are not actively allocating memory, then they won't add much overhead. But I understand the concerns -- if you don't want to send a log with extra data, then maybe xperf is not a good approach.

As an alternative, you could try using VMMap (http://technet.microsoft.com/en-us/sysinternals/dd535533.aspx) to monitor the FF process. I have not had much success with the "launch and trace" feature, so I typically use the "view a running process" mode. If you refresh (F5) every minute or so, and save the log at the end, we might be able to see some patterns. It gives a much less complete picture than xperf, but hopefully better than nothing.
David, I have tried to clean up as much as possible and run xperf. I successfully crashed FF (fully updated FF 28, alpha), but there was an error after I stopped tracing, and the trace log did not appear:

_________________________________________________________________________________

c:\Program Files (x86)\Windows Kits\8.1\Windows Performance Toolkit>xperf_virtualalloc.cmd

Tracing started. Press a key to stop.
Stopping...

xperf: error: NT Kernel Logger: Transferred copy name was not recognized as acceptable by WMI data provider (0x1069).

c:\Program Files (x86)\Windows Kits\8.1\Windows Performance Toolkit>
I did some local experiments, and I think this indicates that the tracing buffers are full. Here are some possible ideas:
* Increase the -MaxFile value, maybe 2048
* Add "-FileMode Circular" after the MaxFile number. The disadvantage of a circular buffer is that if you have too many events, we will lose the events from the beginning.
* Try to get the browser to crash sooner
* Stop tracing earlier, before you crash. Unfortunately this also produces incomplete data.

Sorry for the back-and-forth. I had hoped this script would work more smoothly.
Also, after taking another look at the memory dump from comment 17, I see that there are many external modules (Bluetooth hook, two ASUS hooks, ScreenCapture, several Skydrive) loaded into the FF process. It's possible that one of these is not interacting well with some change in 25.

On the other hand, it's also possible that they are harmless. Now I'm even more curious to see a trace, even if it's incomplete.
David, I have increased the MaxFile value, and the trace log was written successfully.

It takes me just ten minutes to load about twenty-plus big pages such as YT videos (HTML5 player) to crash FF. I cannot do it any faster, and the log file is still a huge 2 GB. Please teach me how to mine this file for only the relevant information, so I could browse the file locally and bring you the chunks that matter.

That said, I am not sure that there is a memory leak. The memory just runs out when the opened/loaded tabs take up the entire 4 GB limit. The earliest loaded/opened tabs just do not get unloaded from memory, though I suppose that should happen so FF would never run out of memory. Also, even if I close all opened tabs and clear the recently closed tabs history, "vsize-max-contiguous" does not grow very much; so if the value is already low and FF has not crashed yet, opening any big tab would still crash FF even if "vsize" is already down from its peak thanks to the user closing and clearing previously opened/loaded tabs. From this I can conclude that a memory defragmentation procedure does not get run either -- though when "vsize-max-contiguous" is close to zero, it should be run after every user action.

(If I am wrong in pointing out those two memory management issues, then I cannot understand why older tabs do not get unloaded and why memory stays fragmented even if the user closes and clears older tabs manually.)
(In reply to User Dderss from comment #51)
> log file is still huge 2 GB. Please teach me how to mine this file for only
> relevant information so I could browse the file locally and take you chunks
> that could matter.

Trace analysis is not always straightforward and I don't want to write a book about it here in this bug. Note that you can get about 7x compression by putting the trace in a .zip file. If that is still too much data to send, then I guess you can try tracing only the first few minutes; that's probably as good as any other chunk.

> defragmentation procedure does not get run either -- though when
> "vsize-max-contiguous" is close to zero, it should be run after every user
> action.

For this bug I want to focus on looking for small/local bugs before we discuss broader architectural changes. For example: imagine that there is a bug in one of the memory collection algorithms; then regardless of how frequently we run that code, it won't make a difference. I would like to eliminate such possibilities before using a larger hammer.
I have tried to upload the compressed file (320 MB); please check it out:

http://www.adrive.com/public/uHZ2F2/trace.zip
Thanks very much for the trace! It looks like you are using HTML5 video for YouTube, and I suspect that we are not very good at having tons of <video> tabs open (details coming soon). 

As an experiment, could you try using Flash for these videos and let us know if the browser survives longer? (You can right-click a video to confirm what it's using, it will either say "About Flash" or "About HTML5". If you have trouble, check your settings at youtube.com/html5)
Flags: needinfo?(zxspectrum3579)
One thing that stands out about this trace is that there are a LOT of threads in the browser process. I see 330 outstanding threads at the point of crash. Most of them were created by media code: mfreadwrite, mfplat, MSMPEG2VDEC, MediaDecoderStateMachine, etc.

These threads use tons of heap, but surprisingly, tons of stack too. On WOW64, Windows creates a 1MB 64-bit stack in addition to whatever 32-bit stack size you request. When a thread requests 256KB stack it actually consumes 5x that, 1.25MB. Across 330 threads that's 412MB of VA used for thread stacks alone.

I did a quick test of youtube in Flash versus HTML5. With Flash, I saw no additional threads created when I opened a new tab containing a video. (Plus the allocations for the video come out of their address space, which also helps) In HTML5 mode, I see 12 to 20 new threads for each new youtube tab I open, and these persist after the video is finished, until I close the tab.

Anthony, is there anything that can be done to improve the many-tab scenario?
Flags: needinfo?(ajones)
As I wrote above, I have been using only the forced HTML5 player for a few months now, since Flash crashes became so bad that FF would crash on its own just from being left alone for a few hours. (I hate Flash with an incredibly powerful passion. This thing should die forever for all the crashes it has caused across multiple platforms on millions of devices.)

On FF 26, Flash-based YouTube tabs take far less memory than HTML5 player pages; on the fully updated FF 28, even less. So it would take much longer to crash FF this way -- instead of about 20 tabs, I would have to open 50 tabs or more.

The issue, however, is not only that HTML5 tabs take more space than Flash, but also, as I wrote, that:

1) FF does not start to unload/release memory for the earliest opened/loaded tabs when "vsize" approaches the theoretical limit;

2) FF does not start a memory defragmentation procedure to increase "vsize-max-contiguous" when it becomes too low (there is always the possibility that the user opens a heavy tab that requires allocating more memory, which would cause a crash);

Dealing with just the excessive HTML5 memory use will not fix the root cause of the crashes. Some heavy pages take 60-100 MB of memory. I only have to open about 50 of them during my session (not hard at all -- over a few days) and FF will crash anyway. FF always crashes; I have over a hundred and fifty crashes in my "about:crashes" history for the last sixteen months.
> In HTML5 mode, I see 12 to 20 new threads for each new youtube tab I
> open, and these persist after the video is finished, until I close the tab.

Good thing nobody ever watches more than one YouTube video in a row, right?
</sarcasm>
There are a lot of things we can do. Unfortunately none of them are easy and most of them are backend specific. We may get some practical improvement by shutting down the decoder when the video gets to the end of playback.
Assignee: nobody → cpearce
Flags: needinfo?(ajones)
(In reply to User Dderss from comment #57)
> 1) FF does not start unload/release memory for earliest opened/loaded tabs
> when "vsize" approaches to theoretical limit;

See bug 675539.

> 2) FF does not start memory defragmentation procedure to increase
> "vsize-max-contiguous" when it becomes too low

Defragmentation of memory is very difficult. Most programs do not even attempt this due to its complexity. However, the JS team has been working towards such a system for a few years now in bug 619558.
Thanks.

It seems that both of these issues have already taken years to be figured out, and might take another few years without resolution.

In the past, there was a "load-on-click" extension -- before it was implemented in FF itself. Maybe there is an extension that would check the "vsize" value and send an unload command to the earliest opened/loaded tab?

If not, I will have to move on to Chrome, because I am super tired of the hundreds of crash reports I have; it is unbearable. (What's more, sometimes the session object corrupts so badly that it gets half-synchronised with its only copy, which results in both copies being corrupted -- which kills the session altogether; I have to use an hourly backup to protect myself from losing my tabs/groups.)
Flags: needinfo?(zxspectrum3579)
Bugzilla works best when bug reports are about specific things.

So let's keep this bug about the video+threads issue identified in comment 56, please.  If anyone wants to propose other ideas for reducing memory consumption, please file new bugs (and add the "[MemShrink]" tag in the whiteboard field) and/or comment in existing bugs relating to those ideas.  Thanks.
bhackett also ran into a stack size issue in bug 943924 (private bytes regression from code intended to *decrease* stack size on Windows) which now sounds like the same problem that dmajor identified in comment #56. Whatever we're doing to set the stack size on Windows, it sounds like either the function isn't working as advertised or we're using it incorrectly.
(In reply to Nicholas Nethercote [:njn] from comment #62)
> Bugzilla works best when bug report are about specific things.
> 
> So let's keep this bug about the video+threads issue identified in comment
> 56, please.  If anyone wants to propose other ideas for reducing memory
> consumption, please file new bugs (and add the "[MemShrink]" tag in the
> whiteboard field) and/or comment in existing bugs relating to those ideas. 
> Thanks.

Nicholas, this bug is about the memory management issues I outlined above. The video+threads issue is just a particular case that only appeared here because of the way I decided to crash FF to get a memory trace log -- to find out if there are leaks (there are not). Instead of opening tabs with YouTube videos, I could have loaded/opened a number of heavy tabs with no HTML5 video player at all, and FF would crash anyway. So resolving the big-HTML5-video-tab issue is basically irrelevant to the big picture of this bug.

The video+threads issue should be filed as a separate bug, not the memory management issues, which are the root of this bug. Nicholas, could you please tell me how to detach comment #56 from this bug and attach it to a new one?

Thanks in advance.
Depends on: 675539, 619558
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #63)
> bhackett also ran into a stack size issue in bug 943924 (private bytes
> regression from code intended to *decrease* stack size on Windows) which now
> sounds like the same problem that dmajor identified in comment #56. Whatever
> we're doing to set the stack size on Windows, it sounds like either the
> function isn't working as advertised or we're using it incorrectly.

I looked into this a bit, and I think it's at least partially a bug in NSPR caused by bad documentation of _beginthreadex.

NSPR calls _beginthreadex here:
https://mxr.mozilla.org/mozilla-central/source/nsprpub/pr/src/md/windows/ntthread.c#180
https://mxr.mozilla.org/mozilla-central/source/nsprpub/pr/src/md/windows/w95thred.c#104

In both cases it passes CREATE_SUSPENDED for the initflag argument. However, as noted in the first comment on the documentation of _beginthreadex [0], initflag is passed through to CreateThread under the hood. In addition to CREATE_SUSPENDED, CreateThread has the flag STACK_SIZE_PARAM_IS_A_RESERVATION [1].

From the documentation: "The dwStackSize parameter specifies the initial reserve size of the stack. If this flag is not specified, dwStackSize specifies the commit size." So I believe NSPR should pass this flag into _beginthreadex or we won't get the right behavior. I'm not entirely sure why we'd get 1MB + stacksize and not just 1MB with the current implementation, though - it should just be rounding up to the default size. Maybe it ends up having to reallocate with the default size somewhere else?

Incidentally, there was also a bug filed on the Java VM about this: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6316878

[0] http://msdn.microsoft.com/en-us/library/kdzttdcb%28v=vs.100%29.aspx
[1] http://msdn.microsoft.com/en-us/library/windows/desktop/ms682453%28v=vs.85%29.aspx
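
For illustration, here is a minimal standalone sketch of the change being proposed -- passing STACK_SIZE_PARAM_IS_A_RESERVATION alongside CREATE_SUSPENDED to _beginthreadex -- outside of NSPR. This is not NSPR's code, just a demonstration of the flag:

#include <windows.h>
#include <process.h>
#include <cstdint>
#include <cstdio>

static unsigned __stdcall ThreadMain(void* /*arg*/) {
  std::puts("worker thread running");
  return 0;
}

int main() {
  unsigned tid = 0;
  // With the flag, 256KB is treated as the *reserved* stack size; without it,
  // 256KB is only the commit size and the reservation falls back to the
  // executable's default (typically 1MB).
  uintptr_t handle = _beginthreadex(
      /*security=*/nullptr,
      /*stack_size=*/256 * 1024,
      ThreadMain,
      /*arglist=*/nullptr,
      CREATE_SUSPENDED | STACK_SIZE_PARAM_IS_A_RESERVATION,
      &tid);
  if (!handle) {
    std::perror("_beginthreadex");
    return 1;
  }
  ResumeThread(reinterpret_cast<HANDLE>(handle));
  WaitForSingleObject(reinterpret_cast<HANDLE>(handle), INFINITE);
  CloseHandle(reinterpret_cast<HANDLE>(handle));
  return 0;
}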
(In reply to User Dderss from comment #61)
> It seems that both of the issues already took years to be figured out, and
> might take another few years in the future without resolution.

They have already taken years, yes, but they probably will not take years to get fixed. We are in a renewed effort to work on out-of-memory issues right now, and as you are seeing from the comments here, we are actively investigating the issues, especially now that we have that trace from you, which is very valuable as it shows us some concrete cases we can work on.

Given that it looks like what you are seeing here is not one single issue, I think it's best to file dependent bugs for the specific fixes coming out of looking at concrete data here.
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #66)
> In both cases it passes CREATE_SUSPENDED in the for the initflag argument.
> However, as noted in the first comment on the documentation of
> _beginthreadex [0], initflag is passed through to CreateThread under the
> hood. In addition to CREATE_SUSPENDED, CreateThread has the flag
> STACK_SIZE_PARAM_IS_A_RESERVATION [1].

Reserve versus commit doesn't matter for many of these OOM cases. Frequently we run out of 32-bit addresses before running out of actual memory. So what worries me is that these stacks reserve so much address space (committed or not), making those addresses unavailable to other code.
To clarify: I pointed out the stack size because it was surprising, but I'm not saying it's the only problem. Outstanding heap allocations from the media threads are more than 1200MB.
How common is this failure?

We're working towards a long term architectural shift in the media pipeline that will fix this and a bunch of other problems properly, but it'll take several months before we're in a position where we can fix this properly. There are some hacks I could do now to shutdown the Windows Media Foundation decoders (and thus the threads they create which we have *no control over*) when we think we don't need them to be active in the meantime, but I don't want to take time away from all the other urgent stuff I'm doing unless this is actively hurting users.
I am not sure how common the problem is, but there is a great number of OOM-related crashes just like mine in the crash report database. How many of them are related to FF simply running out of memory is not clear, but I have had an incredible number of crashes in the last sixteen months; FF *always* crashes, at least every two or three days, and it is really painful.

I understand that by far most users open just a few tabs and that is it, so this cannot be a critical issue that would require a full, immediate effort from Mozilla's software engineers. But from what I see, there is no need for a new effort: if bug #675539 and bug #619558 get fixed, then the crashes will finally go away. Could you please check with colleagues how soon those two bugs can be done? They have been in the works for years, and maybe they are almost ready to be implemented (in FF version 30 or something)?

If there is no hope for a soon-ish fix of those two bugs, then there is a simple thing any programmer could do. Nowadays I avoid the crashes by manually checking the "vsize" and "vsize_max_contiguous" values and restarting FF before it would crash. But it would help if FF's main process checked those values and automatically closed and restarted itself. It is a super simple, tiny procedure which could immediately help avoid those piles of OOM (mostly "empty") crash reports and make the lives of pro users much less unbearable.
Flags: needinfo?(cpearce)
User Dderss, please do not draw conclusions about what will be needed to fix even your own scenario unless you really know what the code there does. It definitely looks like one of the largest problems in the scenario you gave us the trace for is media elements, which is what cpearce is talking about.

We'll actually try to find out how common that problem is by mining data we can get from crash reports sent with Firefox 26, which has just been released and can give us more than empty minidumps in many of those out-of-memory cases. I'm setting a needinfo on Benjamin for that.
Flags: needinfo?(cpearce) → needinfo?(benjamin)
Chris wondered whether it is super-urgent to fix the current media handling, considering that you are working on a different, new system for media handling. As I wrote earlier, even if I do not watch a single video, FF crashes anyway (it just takes longer), so that is my addition to Chris' doubts that fixing the current media system is critical.

Let's see what Benjamin has, but I highly doubt that a lot of the OOM-related crashes are directly caused by fat HTML5 video player tabs, considering that not a lot of people use the forced HTML5 YouTube player (Greasemonkey JS code) instead of Flash.

Why waste time on an issue that does not solve this bug at all? This is an OOM bug, not a fat-HTML5-video bug. Though resolving the latter would postpone the crash, it is conceptually irrelevant to the cause of the bug. So my suggestion to focus on the root issues should not be a surprise. I hope someone could actually look at comment #71 from this perspective. Thanks in advance.
(In reply to Chris Pearce (:cpearce) from comment #70)
> We're working towards a long term architectural shift in the media pipeline
> that will fix this and a bunch of other problems properly, but it'll take
> several months before we're in a position where we can fix this properly.
> There are some hacks I could do now to shutdown the Windows Media Foundation
> decoders (and thus the threads they create which we have *no control over*)
> when we think we don't need them to be active in the meantime, but I don't
> want to take time away from all the other urgent stuff I'm doing unless this
> is actively hurting users.

I don't think we should wait for that. I think we should do some work now to shut down decoders at least when we reach the end of the video. That shouldn't be very hard, should it? Just clear mReader and later recreate it and call ReadMetadata. We know any replay will start at the beginning again unless there's a Seek.
Dderss: most people who have a "lot" of tabs open don't run into this problem. I personally often have 50-100 tabs in multiple windows and rarely experience OOM crashes. So the primary point of this bug is to figure out why you are running out of memory unlike most people and fix that case.

There are already bugs filed for pretty much all of the suggestions you have made, but most of them aren't going to help this case *anyway* because you aren't running out of heap memory that we track in about:memory. Bug 942892 is filed for warning the user that they're close to OOM, but it's not totally simple to set the cutoffs correctly.

So I'd like to keep this bug focused either on using more memory and threads in HTML5 video playback than we really "ought" to be using (more than Flash), or on finding other reasons you are running out of memory.

FWIW, I did run a small query for cpearce about OOM crashes and media thread usage. I'll attach the result set shortly, but I don't see any obvious connection. There are some OOM crashes with an unusual number of threads, but those don't appear to be media threads in general (where "media thread" is defined as "a thread with mfplat.dll, msmpeg2vdec.dll or mfh264dec.dll in its stack").
Flags: needinfo?(benjamin)
OOM crashes from FF26+ from Wednesday.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #75)
> Dderss: most people who have a "lot" of tabs open don't run into this
> problem. I personally often have 50-100 tabs in multiple windows and rarely
> experience OOM crashes. 

A heavy tab adds 60-120 MB to "vsize". If you happen to have 50-100 tabs and 30 of them are heavy, and over the course of your session you happen to click/load those heavy tabs, then they will add 30 * 80-120 MB, i.e. roughly 3 GB, to "vsize". This crashes FF *absolutely* *every* *time*, because FF does not unload earlier opened/loaded tabs, does not free the memory allocated for them, and does not defragment virtual RAM.

Where am I wrong?
We're not saying those aren't problems, or even actionable bugs (e.g. see bug 675539 or the work to give JS a moving/compacting GC), but they're *hard* problems that require a lot of architectural work. Specific instances of use cases that lead to a lot of memory usage are generally much easier to solve in the short term. They don't 'fix' the problem, but they make it less urgent and buy more time for the general mechanisms to be developed. Besides, if reducing the memory usage in cases you commonly run into allows you to keep the same session going for 30 days instead of 3, isn't that a win?
I agree that resolving the fat HTML5 tabs will help, but I just checked: I had 273 (yes, two hundred seventy-three) crashes in the last 16 months.

On average, that is a crash every 1.75 days -- and this is considering that in a significant share of sessions I do not play any videos, that I try to avoid crashes by restarting FF when it starts to fall apart, and that there are cases when FF crashes without registering/sending a report (without an entry on the "about:crashes" page).

What I am saying is that even if I do not open a single video, FF does not survive more than a few days, and this is unbearable. In the last 16 months, the few cases where the gap between crashes grew to a double-digit number of days exist only because crashes were not registered and/or I was not using FF at all. Thirty days of FF survivability is unheard of to me; I literally cannot imagine it (unless the session is just a few tabs).
Under Firefox 27 (beta), in the latest crash at the point where "vsize" got high and "vsize_max_contiguous" got near zero, there is no malloc error anymore; the module is still XUL, though:

http://crash-stats.mozilla.com/report/index/97115403-ff52-415f-b50d-161652131217

Maybe the changes to whether those allocations are fallible/infallible took effect, or is this something different?
Depends on: 951333
I filed bug 951333 for the specific case you're crashing at, but that's not going to solve the "if you've run out of memory you're screwed" problem ;-)

If the problem is really just that you're running out of memory without any leaks or stupid memory usage, then you probably just need to use a 64-bit build of Firefox such as that here: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-central/firefox-29.0a1.en-US.win64-x86_64.installer.exe
I thought that 64-bit version development for Windows was cancelled last year. Good to know that it is still in the works. Indeed, I do not see memory leaks: huge HTML5 video tabs are just excessively huge (for the reasons explained earlier in this thread); they do not cause a leak.

So until FF is able to unload/release memory for the earliest opened/loaded tabs and defragment virtual memory, the 64-bit version of FF is the best possible way to avoid having FF crash that often. Thanks, Benjamin: with the development build of the Tab Mix Plus extension, 64-bit FF feels superb.
So, currently, the observations in comment 56 (threads, stack etc. from the decoder) and/or the rapid decrease of vsize-max-contiguous in comment 17 are actually not problems with regard to out-of-memory or out-of-virtual-memory crashes?
The rapid decrease of vsize-max-contiguous in comment #17 and the observations in comment #56:

1) might still cause significant problems for pro users in 32-bit versions of FF;

2) cause fewer problems with the 64-bit version if there is 8 GB of physical RAM; as session uptime grows (and more fat HTML5 video tabs are opened, among other tabs), more of the virtual memory that FF allocates spills over into the Windows page file, slowing down the system;

3) cause significantly fewer problems with the 64-bit version if there is enough physical memory for the browser not to spill over into the OS page file for quite a significant time -- as in my case (I have 16 GB of RAM, but for the sake of my SSD I have the page file turned off, so the crash is still inevitable).

Eventually FF will crash even in my configuration (since it never frees memory allocated for earlier loaded tabs which are not closed), but for my configuration, variant #3, it would take 3x-4x more time compared to the 32-bit browser, variant #1: for example, the uptime before the latest crash was 36 hours, so I can now expect the crash in 4.5-6 days -- which is still annoying, but much more bearable.
To mitigate the effect of this bug, I have tried to use FF 29, the 64-bit Windows version. But since it is not officially supported (one of the bad decisions on Mozilla's part), no one at Mozilla fixes bugs there, even those which are critical. This makes using that build really painful and difficult.

As it is now, there is no way for power users to work in FF normally on Windows, and it is ridiculous. Mozilla has to either properly support the 64-bit Windows build or make the 32-bit version viable under heavy-duty use.

If there is no proper 64-bit version, my question is: why is it so hard to insert a tiny function in 32-bit FF that would check "vsize_max_contiguous" so that when it approaches zero, FF would automatically restart BEFORE it can crash?

(I hate crashes because the JSON session store is very unreliable and you never know whether a crash will kill your profile in part, in full, or not at all. I have to have Cobian Backup set up to archive SessionStore.js every few minutes or so because the damn thing is so unpredictable. The "bak" file is not a remedy; it still makes FF open an empty session sometimes.)

Why do I (and many others, if you look at the out-of-memory crash statistics) have to deal with those OOM crashes, risking our sessions, when they could be avoided by such a simple change? Whom should I contact so that even this simplest thing could get done?
User Dderss: could you try setting the pref media.video-queue.default-size to a smaller number? The default is 10; you could try setting it to 5 or 3.

This is the number of video frames we pre-decode ahead of the playback position. Reducing this number will reduce memory usage.
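(A rough back-of-the-envelope estimate, not a measured figure: using the ~1.3 MB per 720p frame size seen in the crash stack above, a queue of 10 pre-decoded frames is on the order of 13 MB of buffers per playing video element, so dropping the pref to 3 would save roughly 9-10 MB per video -- noticeable with many video tabs open, but small compared to the multi-gigabyte vsize growth reported here.)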
Thanks, Chris, but it does not help. Every page that I load increases "vsize" and, after some time, decreases "vsize_max_contiguous". Cutting the immediate decoding buffer does not make it better.

Since it seems there is no way Mozilla can add this tiny restart function I wrote about, I think maybe this can be done in an extension?

1. If I understand it correctly, "Ci.nsIMemoryInfoDumper" can return the "vsize_max_contiguous" value, right?
2. If so, can this "Ci.nsIMemoryInfoDumper" method be accessed from Firefox extension code, so it could check the "vsize_max_contiguous" value regularly?
3. If so, can the extension automatically restart Firefox (once it determines that "vsize_max_contiguous" has become too low)?
You could modify "RAM Restart" or "Memory Restart" to use vsize instead of private/explicit/resident, and set them to ask for a restart whenever vsize grows to a certain size.
But this wouldn't solve any underlying problems.
And if such a "feature" were part of the browser, it would be devastating regarding the public opinion when the browser repeatedly asks for a restart because of coming close to a OOM-crash.
I have made "Prevent Out Of Virtual Memory Crashes" extension (extensively changed version of "Memory Restart"):

http://addons.mozilla.org/firefox/addon/prevent-out-of-virtual-memo/

The default settings are such that nothing is asked; the browser will silently restart itself. Since even site form content is restored after a restart, the "silent restart" feature should be switched off only if you are downloading/uploading some big files and it would therefore be bad to interrupt that process.

I have written a detailed description of what the extension does and does not do, and how; see the link.
Crash Signature: mozilla::layers::BufferRecycleBin::GetBuffer ] → mozilla::layers::BufferRecycleBin::GetBuffer ] [@ mozalloc_abort | mozalloc_handle_oom | moz_xmalloc | mozilla::layers::BufferRecycleBin::GetBuffer ]
Closing because no crash has been reported in 12 weeks.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX