On Windows, fire a memory-pressure event when the amount of available virtual address space or physical memory is low

RESOLVED FIXED in mozilla11

Status

()

defect
RESOLVED FIXED
8 years ago
3 years ago

People

(Reporter: justin.lebar+bug, Assigned: justin.lebar+bug)

Tracking

(Blocks 1 bug)

unspecified
mozilla11
x86
Windows XP
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [MemShrink:P1])

Attachments

(6 attachments, 13 obsolete attachments)

27.25 KB, image/gif
Details
24.14 KB, image/gif
Details
4.12 KB, patch
benjamin
: review+
Details | Diff | Splinter Review
2.08 KB, patch
ted
: review+
Details | Diff | Splinter Review
17.08 KB, patch
Details | Diff | Splinter Review
1.65 KB, patch
Details | Diff | Splinter Review
Split off from bug 664291, this bug has a tighter focus: Fire a memory-pressure event when we're running out of virtual address space on Windows 32-bit.

Focusing on Windows, at least as a first step, makes sense because:

 * We generally have less virtual address space available there than on Linux and Mac.
 * I know how to retrieve the amount of virtual address space available to a process on Windows, but not elsewhere.
 * Decoded images are stored in-process (at least, without d2d -- I don't know about otherwise), while decoded images are stored out-of-process on Linux, so it's easier to run out of address space due to having many images open.

In bug 664291, I focused on polling the available address space using a timer.  That worked, but it's hard to figure out how often we should poll, and it's hard to figure out when we're idle and can stop polling.

The alternative I want to explore here is wrapping the Windows virtual memory functions (together the equivalent of mmap) and doing our own accounting of how much vmem is available.  We can periodically adjust our total by calling the function which tells us how much vmem we've actually used, but we don't need to use a timer for that -- we can just do it on every N'th call to a wrapped function.

I think the relevant functions are:

 VirtualAlloc
 VirtualFree
 MapViewOfFile
 UnmapViewOfFile
 VirtualQuery

which are all linked to from [1].

I think we can wrap the relevant functions using nsWindowsDllInterceptor.

[1] http://msdn.microsoft.com/en-us/library/aa366781%28v=vs.85%29.aspx
Assignee: nobody → justin.lebar+bug
Blocks: 664291
Whiteboard: [MemShrink]
This sounds like a heuristic that could break if there are a lot of small allocations and the occasional large one. Would it make sense to always update for large allocations/deallocations (but only do it every nth call for smaller ones)? Or are these allocations similarly sized enough that this shouldn't be an issue?
> The alternative I want to explore here is wrapping the Windows virtual memory 
> functions (together the equivalent of mmap) and doing our own accounting of how 
> much vmem is available.  We can periodically adjust our total by calling the 
> function which tells us how much vmem we've actually used, but we don't need to 
> use a timer for that -- we can just do it on every N'th call to a wrapped 
> function.

> This sounds like a heuristic that could break [...]

I think this might not have been clear.  What I mean is: Whenever any call is made to VirtualAlloc/VirtualFree, we figure out how big the allocation is and update a running counter.  But we don't completely trust this counter -- for instance, other processes can allocate memory into us with VirtualAllocEx (weird, I know) -- so we occasionally ask the operating system how much vmem we're actually using.
Aah, thanks for clarifying. Perhaps it would still make sense to be more cautious on large allocations than on small ones, but that doesn't affect internal accounting.
(In reply to comment #3)
> Aah, thanks for clarifying. Perhaps it would still make sense to be more
> cautious on large allocations than on small ones, but that doesn't affect
> internal accounting.

By "be more cautious on large allocations", I presume you mean that before we make a large allocation, we might call into the OS to check that we actually have enough space?  And if we don't, we might trigger a memory-pressure notification?

That sounds pretty reasonable to me, although we'd have to be careful about re-entrancy.
I did some timing of two syscalls:

  GlobalMemoryStatusEx, and
  VirtualQuery.

GlobalMemoryStatusEx tells us how much virtual memory the whole process is using.  VirtualQuery tells us how much virtual memory the extent which starts at a given pointer takes up.

I think we'll want to use at least one of these functions.  When memory is freed with VirtualFree, we get only a pointer to the memory.  So to figure out how much address space we're freeing, we'd need to either:

 1) free the memory and then call GlobalMemoryStatusEx to get the new amount of vmem used,
 2) call VirtualQuery, free the memory, and then adjust our running total of vmem used, or
 3) remember how much memory was allocated to that pointer when we called VirtualAlloc.

(3) sounds like much more work than it's worth, since the OS has to keep track of this anyway.  All things being equal, (1) is preferable to (2), because if we miss any allocations, we'll immediately fix the problem with (1), but the error will persist with (2).

Also, it's not clear that VirtualQuery always gives the right results.  From the docs [1], it looks like if we have two adjacent allocations with the same protection parameters and call VirtualQuery with a pointer to the beginning of the first allocation, it'll give us the allocations' combined size.

The only reason I can think of to use (2) is if it's faster.  Which brings us to the timing.  I'm using a Windows 7 VMware box.  On each of 50 calls to VirtualFree (generated by starting the browser), I ran each of the two syscalls 10,000 times and timed this using TimeStamp.  I ran this on a debug build.

                      Min  1Qt  Mean  Median   3Qt   Max
GlobalMemoryStatusEx: 4.7  4.7   4.8     4.7   4.7   6.3 us / call
VirtualQuery:         1.5  1.6  19.8    25.0  26.6  61.0 us / call

GlobalMemoryStatusEx seems like the clear winner here; it's much more consistent and faster on average.  (MSDN claims that VirtualQuery walks the page table sequentially, which I imagine might be slow, particularly for large allocations.  But looking at the data,  all calls to VirtualQuery where the page range is small take less than 10us, but many calls where the range is large also take less than 10us.  There seem to be four modes: 1.5us, 3us, 26us, and 60us.  Weird.)

To be sure these numbers are right, I need to run this experiment on bare hardware with a release build.  Since the VM has to emulate page tables, it's conceivable to me that we might get different results when running outside a VM.

[1] http://msdn.microsoft.com/en-us/library/aa366902%28v=vs.85%29.aspx
Oops; all those numbers are too large by a factor of 10, because my loop was up to 100,000, not 10,000.  Anyway, the interpretation is exactly the same.
Attachment #546046 - Attachment is obsolete: true
What I have here doesn't notice when we allocate memory for images using CreateDIBSection.  (At least, I think that's the allocation function.)

The browser crashes on startup when I try to wrap this function, and I don't think I'm up for debugging WindowsDllInterceptor.

It's easy enough to effectively wrap this function in the one or two places where it's called, so I'll do that next.
Aha.  We were crashing on startup when I used WindowsDllInterceptor on CreateDIBSection because I didn't define the function prototype with WINAPI, which affects its calling convention.

(I didn't add WINAPI earlier because MSDN [1] doesn't have it for this function...)

[1] http://msdn.microsoft.com/en-us/library/dd183494%28v=vs.85%29.aspx
Attachment #546044 - Attachment is obsolete: true
Attachment #546169 - Attachment is obsolete: true
Attachment #546237 - Flags: review?(benjamin)
Attachment #546239 - Flags: review?(benjamin)
The real question is: How easily can I get my browser to OOM after building with this patch?  I'll report back once my release build finishes.
Darn, I was too optimistic requesting these reviews.  FF now doesn't crash when I load tons of pictures, but it does hang.  That's not much of an improvement...
Attachment #546237 - Flags: review?(benjamin)
Attachment #546239 - Flags: review?(benjamin)
Word to the wise: You won't get any output on the console with a release build unless you pipe to tee.

  $ dist/bin/firefox.exe | tee
Patch v2 didn't work because I was missing an mb-to-bytes conversion.

The easiest way I've found to OOM FF is to session-restore a few pages with many large images.  In my testcase which consistently OOMs the stock browser, my patched build fires about five low-memory notifications and keeps running.  (Some of the images don't load, presumably because their allocations failed.  But that doesn't seem like much of a problem to me.)
Attachment #546554 - Flags: review?(benjamin)
Attachment #546239 - Attachment is obsolete: true
Attachment #546237 - Flags: review?(benjamin)
Comment on attachment 546554 [details] [diff] [review]
Part 2, v3: Wrap VirtualAlloc and friends.

>+#define LOGGING_ENABLED

Forgot to turn this off, but of course we should turn it off before we push.
I have a patch which I'll upload in a moment to track available physical memory on Windows.  Because it works basically the same way as the other patches in this bug, I'm morphing this bug to encompass both virtual- and physical-memory tracking on Windows.
Summary: Fire a memory-pressure event when the amount of available virtual address space on Windows is low → On Windows, fire a memory-pressure event when the amount of available virtual address space or physical memory is low
Attachment #546624 - Flags: review?(benjamin)
Comment on attachment 546624 [details] [diff] [review]
Part 3, v1 - Track physical memory.

This patch has us track available physical memory on the machine, as given by GetMemoryStatusEx.  [1] defines available physical memory as:

> The amount of physical memory currently available, in bytes. This is the amount 
> of physical memory that can be immediately reused without having to write its 
> contents to disk first. It is the sum of the size of the standby, free, and 
> zero lists.

In my tests, the machine starts to page heavily when available physical memory hits 0, suggesting that this definition is correct.

We check how much memory is available whenever we make a virtual allocation or commit an allocated page.  We seem to make one of these calls at least once a second while browsing around, but we don't make any while we're idle.

If we detect physical memory pressure (less than 32mb of physical memory free) but last fired a memory-pressure notification less than 10s ago, we don't fire another notification.  This is to keep us from spending all our time GC/CC'ing.

Possibly the Right Thing to do would be to fire memory-pressure notifications more often and have the GC/CC notice when there are a lot of memory-pressure notifications.  It could then apply the same ten-second rule, or do something else.  As it is, we restrict any module from responding more than once every ten seconds.

On the other hand, this heuristic seems to work OK, and I'm not sure we want to move from tweaking one dial to setting ~15 dials (one for each memory-pressure observer).  I tried setting the 10s to 1s, and it didn't cause us to page much less in my session-restore a lot of pages with a lot of images test.

[1] http://msdn.microsoft.com/en-us/library/aa366770%28v=vs.85%29.aspx
One consequence of the physical-memory tracking (part 3) is that we'll effectively disable bfcache whenever the system is out of available memory, even if it isn't Firefox's fault.  (bfcache clears itself on every low-memory notification.)  I think this is probably OK, because if your machine is swapping as you try to use FF, we should try to do whatever we can to get the machine back to a usable state.
Pushed to try so we can get some performance numbers.  My hope is that this patch queue won't affect our benchmarks at all.

http://tbpl.mozilla.org/?tree=Try&rev=83aaa669ccef
I just tested this on bare metal (previously, I'd been in a VM).  I used a Core 2 Duo laptop with 2GB of RAM and Windows 7.

I needed to set the physical memory warning level to 64mb (instead of 32mb, as in the patch), but once I did that, it worked pretty well -- it noticed when we were running out of memory and freed it before we started to page.

I couldn't get FF to exhaust virtual memory on this machine, since if I turn off the low physical memory detection, it grinds to a halt long before I approach the vmem limit.

It would be helpful if someone with a Windows XP machine could test this and give me some feedback.  It seems that swapping is much more painful on a physical machine, since I presume my VM's swap file is backed by my host machine's ram.  :)  This leads to at least slightly different behavior.
I should say that in contrast, unpatched nightly was much less pleasant to use under the same workload.  FF would begin swapping once I decoded enough images, and then it'd be basically unusable until the image discard timer fired and the process was swapped back in.
This is great stuff.  Comment 17 and comment 25 sound very promising.  If you can quantify the improvement any further that'd be really helpful!
My desktop has 8 GiB of RAM, so if it's the swapping that's causing it to grind to a halt I should be able to test an out of vmem situation. If you could give some steps to quickly generate an out of vmem situation, and tell me what to look for / what numbers to track, I can give the try build a whirl.
Here's an attempt to make the improvement more concrete.  This is a screenshot of the Windows Activity Monitor on Win7.  We're mostly concerned with the thick red line, available mb of memory (divided by 10 so it fits on the graph), and the thick blue line, pages written to disk per second.  I had the low-memory threshold set to 64mb.

I loaded 6 pages from The Atlantic's In Focus blog.  (I loaded them from disk, since they're big and take a long time to download.)  This was enough to exhaust the memory on my machine from comment 24.

Once the pages were finished loading, I ctrl-tab'ed through all 6 tabs.  This causes us to decode all the images on all the tabs, and causes the red line (mb of free memory) to drop.  Eventually we've exhausted most of the available memory, and FF notices and kicks out the decoded images, causing the available memory to jump up.  I think available memory continues to decrease after the jump because we still have more images to decode (*).

Eventually memory usage levels off, and I repeat the process of ctrl-tab'ing through all the tabs.

Compare the amount of swapping (thick blue line) on this graph to the next one I'll attach.

(*) Notice that the cyan line, CPU usage, is pegged to 50% (one out of two cores) while memory usage decreases, but drops as soon as memory usage flattens.
Here are the results from the same process on a nightly build.

Notice that we swap a lot more (thick blue line).  The slow rise in available memory appears to be due to Windows swapping Firefox out.  Firefox is basically unusable while it's swapping like this, and it takes some time to recover after the blue line falls to 0.  (In fact, the whole machine is pretty difficult to use while we're swapping.)
I need to do some experiments to see whether 64mb is really the right place for this threshold.  It might be the case that we need to do something more complicated, such as fire a notification when 90% of memory is used and we're swapping.  XP (and maybe vista) may behave differently.  (I'm also testing using a 32-bit build on Win7 64-bit -- I wonder if Win7 32-bit would behave differently.)
(In reply to comment #27)
> My desktop has 8 GiB of RAM, so if it's the swapping that's causing it to
> grind to a halt I should be able to test an out of vmem situation. If you
> could give some steps to quickly generate an out of vmem situation, and tell
> me what to look for / what numbers to track, I can give the try build a
> whirl.

Thanks.  My last try build apparently went nowhere, so I'll let you know when I have a build you can try.  I've been able to OOM FF by opening lots of tabs to individual entries on The Atlantic's In Focus [1] and the Boston Globe's Big Picture [2] blogs.

The number to watch is vsize in about:memory.  Since you have 8g of RAM, I presume you're running a 64-bit version of Windows.  I'm not actually sure how much address space a 32-bit program can use on a 64-bit version of Windows; it's probably 3 or 4gb.  Certainly no more than 4.  :)

If you're trying to create an OOM, it might be helpful to set the discard timeout very high, so we don't throw away decoded images you're trying to keep around.  In about:config, it's image.mem.min_discard_timeout_ms.  I don't think you need to restart after changing it, but you might...

[1] http://www.theatlantic.com/infocus/
[2] http://www.boston.com/bigpicture/
I think 32-bit programs get the full 4 GiB on 64-bit Windows - I recall Firefox once using almost that much as a result of a memory leak (this was a while ago, long before about:memory was created).
Whiteboard: [MemShrink] → [MemShrink:P1]
(In reply to comment #32)
> I think 32-bit programs get the full 4 GiB on 64-bit Windows

A large-address-aware 32-bit program (such as Firefox as of several months ago, sorry I can't find the bug#) gets to address:
4 GB on 64 bit Windows
3 GB on 32 bit Windows with /3GB switch
2 GB on regular 32 bit Windows

A "regular" non-LAA program gets 2 GB in all above scenarios.

Hopefully that helps.
Aha, it's bug 556382.  Thanks!  (BTW, you can now search b.m.o with google [1].  So much better than the built-in search.)

[1] http://www.google.com/search?q=site%3Abugzilla.mozilla.org+large+address+aware
Note that I will be going on holiday in about 24 hours, and won't have my desktop available to me for 3 weeks (barring remote desktop via tethering, which doesn't seem like a great idea). I'll be available until then as long as the testing doesn't take too long, however.
I actually just got Windows up and running on my 8gb machine, so I can test myself in a bit.  Thanks for your help!
Alright, that's probably easier anyway :)
Something to note: Direct3D will swap texture memory, as needed, into our address space and I know of no way to track when these happen. This means that the accounting that we do by hooking won't necessarily match reality.
(In reply to comment #38)
> Something to note: Direct3D will swap texture memory, as needed, into our
> address space and I know of no way to track when these happen. This means
> that the accounting that we do by hooking won't necessarily match reality.

Fortunately, the newest patch doesn't do what I'd been calling "accounting" -- it doesn't try to avoid a syscall by calculating how much memory we're allocating with VirtualAlloc and adding that to a total.  Instead, it just calls GetMemoryStatusEx after every VirtualAlloc.  This was easier and more accurate than the arithmetic method, and it doesn't appear to be much slower.

Still, do you have details on this?  We might want to try to hook into these allocations, if they're large enough.
bsmedberg, are you able to estimate when you'll get to these reviews?  This is an important bug (MemShrink:P1).  If you can't estimate or it'll take a long time, can you suggest someone else who might be able to review?
Comment on attachment 546554 [details] [diff] [review]
Part 2, v3: Wrap VirtualAlloc and friends.

I already mentioned concerns about memory-pressure XPCOM notifications in general in IRC: have we decided to just go with the existing mechanism even though it's sucky?

>diff --git a/xpcom/base/Makefile.in b/xpcom/base/Makefile.in

>diff --git a/xpcom/base/VirtualMemoryTracker.cpp b/xpcom/base/VirtualMemoryTracker.cpp

>+// Pull in as little of windows.h as we can get away with.
>+#ifndef WINDOWS_LEAN_AND_MEAN
>+#define WINDOWS_LEAN_AND_MEAN
>+#endif

Why did you need this? This doesn't seem like the right place for this kind of define, especially because configure already defines WIN32_LEAN_AND_MEAN unconditionally.

>+#include "nsIProxyObjectManager.h"

It doesn't appear that you need this header, and if you do there is something wrong...

>+void ScheduleLowMemNotification()
>+{
>+  // If one of these notifications is pending, don't schedule another.
>+  if (PR_ATOMIC_SET(&sLowMemNotificationPending, 1) == 0) {
>+    LOG("Scheduling a low-memory notification.");
>+
>+    nsRefPtr<LowMemNotificationRunnable> runnable =
>+      new LowMemNotificationRunnable();
>+
>+    // Dispatch this event as soon as possible, since we may be about to crash
>+    // and burn.
>+    NS_DispatchToMainThread(runnable, NS_DISPATCH_ASAP);
>+  }
>+  else {
>+    LOG("Not scheduling low-memory notification, "
>+        "because one is already pending.");
>+  }
>+}

Doesn't this function run inside the hook? How can this possibly be safe? you're calling `new` which could reenter, and since the hook can be inside a lock, NS_DispatchToMainThread is also completely unsafe (and can allocated freely).

>+
>+void CheckVmemAvailable()
>+{
>+  MEMORYSTATUSEX stat;
>+  stat.dwLength = sizeof(stat);
>+  bool success = GlobalMemoryStatusEx(&stat);
>+
>+  DEBUG_WARN_IF_FALSE(success, "GlobalMemoryStatusEx failed.");
>+
>+  // sLowVmemThreshold is in MB, but ullAvailVirtual is in bytes.
>+  if (success &&
>+      stat.ullAvailVirtual < sLowVmemThreshold * 1024 * 1024) {
>+    ScheduleLowMemNotification();
>+  }
>+}
>+
>+void* WINAPI
>+VirtualAllocHook(void* aAddress, size_t aSize,
>+                 PRUint32 aAllocationType,
>+                 PRUint32 aProtect)

This should use windows types, not PR types, so LPVOID, SIZE_T, DWORD, DWORD.


>+{
>+  // It's tempting to see whether we have enough free virtual address space for
>+  // this allocation and, if we don't, synchronously fire a low-memory
>+  // notification to free some before we allocate.
>+  //
>+  // Unfortunately that doesn't work, principally because code doesn't expect a
>+  // call to malloc could trigger a GC (or call into the other routines which
>+  // are triggered by a low-memory notification).
>+  //
>+  // I think the best we can do here is try to allocate the memory and check
>+  // afterwards how much free virtual address space we have.  If we're running
>+  // low, we schedule a low-memory notification to run as soon as possible.
>+
>+  void* result = sVirtualAllocOrig(aAddress, aSize, aAllocationType, aProtect);
>+
>+  // Only check whether we're running low on virtual memory if we tried to add
>+  // memory to our virtual address space.  (We might have called VirtualAlloc
>+  // with aAllocationType == MEM_COMMIT, which allocates physical pages to
>+  // memory already in our virtual address space, but a MEM_COMMIT call
>+  // wouldn't affect our available vmem, nor would it fail because we're
>+  // running low on vmem.)
>+  if (aAllocationType & MEM_RESERVE) {
>+    LOG3("VirtualAllocHook(size=", aSize, ")");
>+    CheckVmemAvailable();

I presume you measured the cost of checking on every page allocation? How often does this actually get called (especially during startup)?

>+void* WINAPI
>+MapViewOfFileHook(HANDLE aFileMappingObject,
>+                  PRUint32 aDesiredAccess,
>+                  PRUint32 aFileOffsetHigh,
>+                  PRUint32 aFileOffsetLow,
>+                  size_t aNumBytesToMap)

again, use the documented windows signature

>+HBITMAP WINAPI
>+CreateDIBSectionHook(HDC aDC,
>+                     const BITMAPINFO *aBitmapInfo,
>+                     unsigned int aUsage,
>+                     void **aBits,
>+                     HANDLE aSection,
>+                     PRUint32 aOffset)

ditto

It seems to me that the only safe way to get information from here to the event loop is to set the flag and then have the event loop check the flag at the next safe point. Because all of these can be called from inside of the event loop locks, I don't think there's any way to use the event loop from here.

>diff --git a/xpcom/base/VirtualMemoryTracker.h b/xpcom/base/VirtualMemoryTracker.h

>+#ifndef VirtualMemoryTracker_h
>+#define VirtualMemoryTracker_h

nit, should be mozilla_VirtualMemoryTracker_h

>+
>+namespace mozilla {
>+namespace VirtualMemoryTracker {
>+
>+// The VirtualMemoryTracker is implemented only on 32-bit Windows.  But to make
>+// callers' lives easier, we stub out empty calls on other platforms.  So you
>+// can always call these functions; they just might not do anything.
>+
>+#if defined(WIN32) && defined(_M_IX86)
>+void Init();
>+void CheckVmemAvailable();

Since CheckVmemAvailable() is only called from within the .cpp file, can we make it static/anonymous and avoid declaring it in the header?

>+#else
>+void Init() {}
>+void CheckVmemAvailable() {}
>+#endif
Attachment #546554 - Flags: review?(benjamin) → review-
Thanks for the review!  Regarding the important stuff:

> I already mentioned concerns about memory-pressure XPCOM notifications in 
> general in IRC: have we decided to just go with the existing mechanism even 
> though it's sucky?

As I recall, we concluded in IRC that the existing mechanism isn't so bad.  I recall your concern was that XPCOM would force us to fire the notification asynchronously, but we decided that we'd have to fire asynchronously anyway, because code breaks very hard if, for instance, malloc() triggers a GC.

Is there something I'm forgetting?

> It seems to me that the only safe way to get information from here to the 
> event loop is to set the flag and then have the event loop check the flag at 
> the next safe point. Because all of these can be called from inside of the 
> event loop locks, I don't think there's any way to use the event loop from 
> here.

Ah, yes.  Back to the drawing board on that.
Comment on attachment 546237 [details] [diff] [review]
Part 1, v2: Add DISPATCH_ASAP to nsIEventQueue.

I don't want to take this unless it's necessary, and I think given the rework that will be required this probably won't be necessary.
Attachment #546237 - Flags: review?(benjamin)

Updated

8 years ago
Attachment #546624 - Flags: review?(benjamin)
The patches as they exist won't fire a memory-pressure notification due to low physical memory more than once every 10 seconds.  I'm a little concerned about this heuristic going awry on a computer which, for whatever reason, is in a permanent state of zero available physical memory.  In this case, we'll spam low-memory notifications, which might degrade the user's already poor experience.

A better heuristic might be not to fire a low-memory notification unless we're using X% more memory than we were after we fired the last low-memory notification, on the assumption that if we're not using any more memory, firing another notification is unlikely to free up much.

I think the heuristic would need to be a bit more complicated in practice.  If the last low-memory notification was fired an hour ago, comparing our memory usage now to back then doesn't make much sense.  But the new heuristic might allow us to increase the timer from 10s to 60s or higher, which would be an improvement for people on memory-starved machines.
Pushed new patches to try: http://tbpl.mozilla.org/?tree=Try&rev=76a84944544a

If you liked DISPATCH_ASAP, you're going to *love* nsIThreadInternal::SetASAPCallback.  :)

http://hg.mozilla.org/try/rev/d433881507b1
One of these days, I'm going to get a try build which works.

http://tbpl.mozilla.org/?tree=Try&rev=567b22a900ef

I got a netbook in today, so I'm going to do some experiments on it once the build finishes.
Strange |make package| errors last time.  Third time's a charm?

http://tbpl.mozilla.org/?tree=Try&rev=56cef6923a7d
As a status update, I'm working on two issues here.

* |make package| strangely fails on try.  It doesn't output an error message (just "error 128", which is ERROR_WAIT_NO_CHILDREN), and I can't reproduce the problem on my machine.

I'm going to try to narrow down what code is causing this by pushing the patches one at a time.

* Testing on a netbook.  My netbook won't run the build from my Windows builder, even after I installed the VS2010 CRT.  I'm installing VS2010 on the netbook in the hopes that that will allow me to run the builds.
I just did some tests on my netbook, and I'm now not so convinced that the available physical memory tracking is useful.

Previously, it was easy to gobble up a lot of RAM by loading many image-heavy sites in tabs.  When the discard timer was stepped down from 120s to 10s, it became harder: Now you had to cycle between all your tabs every 10s or so to keep the images alive.

In practice, you could still gobble up a lot of memory by letting all your tabs' images expire, then cycling quickly through all the tabs.  The images on those tabs would start to decode, and they wouldn't be eligible for discarding until the decode finished.

But we changed that too (bug 672578).  Now when a tab loses focus, it cancels any ongoing decodes and throws away the unfinished products.

Now it's very hard for me to use images to eat up memory in a way that would be helped by the available-memory tracking in this patch.  Maybe there's another technique I could use, but afacit images are by far the largest thing which is purged upon memory pressure.

So I'm concerned that the available-memory tracking might not be useful.  I'm also concerned it might be harmful.  We're basing Firefox's actions on the system's state, which we don't control.  A crummy background application (I hear Windows has a few) could eat up all available RAM, and Firefox will react to this by GC'ing more often, dropping caches, and whatnot.

Maybe that's the right thing to do, but it also could lead to weird situations, where some other program's bad behavior causes us to do something the user wasn't expecting (such as drop decoded images).

Anyway, I think the virtual memory tracking, if its overhead is sufficiently low, is still a good idea, since the amount of virtual memory we have available is (mostly) a result only of Firefox's actions.  I'll be able to measure overhead once I figure out what's going on with the |make package| failures.
(In reply to comment #49)
> 
> But we changed that too (bug 672578).  Now when a tab loses focus, it
> cancels any ongoing decodes and throws away the unfinished products.

Oh, you didn't mark that with "MemShrink" and I didn't know about it.  Very nice.

> Anyway, I think the virtual memory tracking, if its overhead is sufficiently
> low, is still a good idea, since the amount of virtual memory we have
> available is (mostly) a result only of Firefox's actions.

Sounds reasonable.
I've bisected the |make package| failure down the line indicated below (i.e., the build completes successfully if the line is commented out:

  ASAPCallbackType asapCallback =
    atomic::PointerGetAndSetToNull(&mASAPCallback);
  if (asapCallback) {
    (*asapCallback)();   <---
  }

in nsThread.  I'm still not sure why this is causing the error, however.  It would appear that this code is being run during |make package| and somehow the function pointer contains an invalid value.

khuey or bsmedberg, do you have any idea why the code in nsThread is being run during make package?
Do you have a link to a log?
http://tbpl.mozilla.org/?tree=Try&rev=e6105c2574f0
http://tinderbox.mozilla.org/showlog.cgi?log=Try/1312645986.1312650806.30270.gz

Once try reopens, I'll push with a printf before the call to the function pointer, and see if that printf gets hit anywhere in the log.
http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/installer/packager.mk#394 is the command being run, xpcshell is prepopulating the startup cache.
Oh...maybe this is a |volatile| issue.  The function pointer member doesn't have volatile...
Oh.  This is a much more mundane "I added an initializer to one constructor but forgot the other" bug.
Depends on: 677520
Depends on: 677685
Posted patch Newer patch from (obsolete) — Splinter Review
No longer depends on: 677685
Alas that last push didn't actually have PGO for some reason.  I'll try again later today.
Try run for 4d2bcd023cbc is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=4d2bcd023cbc
Results (out of 1 total builds):
    failure: 1
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-4d2bcd023cbc
Great, the failure only happens with PGO.  This really could be a volatile thing or something...
Try run for 3e7267140a20 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=3e7267140a20
Results (out of 1 total builds):
    failure: 1
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-3e7267140a20
...so volatile didn't fix this.

Now that I know it's related to PGO, I think I should try to get a releng machine again and debug the crash there.
I got a build to fail on a machine I'm VNC'ed into.  I've been told in the past that the point it's failing at is when we try to run xpcshell.  And indeed, when I run xpcshell, I get

  "This application has failed to start because PGORT80.DLL was not found. Re-installing the application may fix this problem."

The Firefox binary also won't start with the same error.

This looks like bug 626274, which was resolved as a buildbot issue?  Surely that's not what's going on here, since I triggered the build myself.  Nthomas, khuey, can you offer any insight into what this means?
Ted, can you help with the comments above?
As (I think) a separate issue, when I put the DLL in the build directory, it runs and crashes with a stack overflow.

Looking at the disassembly, this is an instrumented build.  In my function which wraps VirtualAlloc, there's a call to __PogoRuntimeVector before and after I invoke the original, unwrapped VirtualAlloc.   But apparently __PogoRuntimeVector can call VirtualAlloc, so around and around we go.

Can I detect at compile-time whether I'm an instrumented PGO build?
Latest strategy is to set an environment variable if we're an instrumented PGO build and not wrap VirtualAlloc if that env var is set.
(In reply to Justin Lebar [:jlebar] from comment #67)
> Latest strategy is to set an environment variable if we're an instrumented
> PGO build and not wrap VirtualAlloc if that env var is set.

This worked!  I got a successful build on the builder I'm VPN'ed into.

I'll prepare a patch for review tomorrow.
Try run for e6b46e2d3039 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=e6b46e2d3039
Results (out of 41 total builds):
    success: 39
    warnings: 2
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-e6b46e2d3039
Attachment #546237 - Attachment is obsolete: true
Attachment #546554 - Attachment is obsolete: true
Attachment #546624 - Attachment is obsolete: true
Attachment #551892 - Attachment is obsolete: true
Attachment #567012 - Flags: review?(benjamin)
Attachment #567011 - Flags: review?(benjamin)
Comment on attachment 567014 [details] [diff] [review]
Part 1, v4: Add atomic pointer operations.

I see no meaningful movement on Talos in the try results above.

That the part 3 patch I pushed knows how to monitor for low physical as well as virtual memory, but the physical memory tracking is pref'ed off.  This lets us easily experiment with turning on physical memory tracking, if we want.  And it's not a lot of extra code.
Attachment #567014 - Flags: review?(benjamin)
Comment on attachment 567014 [details] [diff] [review]
Part 1, v4: Add atomic pointer operations.

I really don't like the #error condition here, there are people who maintain these platforms. NSPR already has 32-bit atomic operations falling back to a lock: can we just extend that for pointer-sized operations, or do the same thing it does in XPCOM?

Are these intrinsics and the associated #defines available in clang?
Attachment #567014 - Flags: review?(benjamin) → review-
Comment on attachment 567011 [details] [diff] [review]
Part 2, v4: Add nsIThreadPrivate::SetASAPCallback

A %{C++ block in the interface will cause the vtable to not match xptcall's understanding if the interface ever gets new methods or inherited. Use this method instead:

outside the interface, near the top of the file:

%{C++
typedef void (*voidfunc)();
%}

[ptr] native voidfunc(voidfunc);

The method declaration:

  [noscript, notxpcom] bool SetASAPCallback(in voidfunc aCallback);

I still think that this is overkill for the specific solution you have in mind: why not just have a method "runLowMemoryNotification" or even just a global bool without interface changes at all. Then you don't have to worry about the atomic swap operations in the other patch at all. It seems like there is too much generality here, because if more than one consumer actually started using this SetASAPCallback method they might trample each other anyway.
Attachment #567011 - Flags: review?(benjamin) → review-
The difficulty is that we need to ensure that the NSPR ops don't virtual alloc.  I'm afraid they might initially, since NSPR has to init the locks.

OTOH, we're certainly not going to write a virtualalloc interceptor on a platform where we don't have good atomic pointer ops, so we could just use NSPR here!

> It seems like there is too much generality here, because if more than one consumer actually started 
> using this SetASAPCallback method they might trample each other anyway.

I think this is a fair point.
Comment on attachment 567012 [details] [diff] [review]
Part 3, v4: Add memory monitoring.

Are the mozconfig/PGO changes meant to be in this patch? I don't think I'm the correct reviewer for those bits and they should probably be separate.

TimeStamp::Now() can lock (and then calls PR_IntervalNow): are you sure that it is safe to call this from within CheckMemAvailable which runs within arbitrary code? The hooks added in this code are never removed, but they can (and probably will) crash if they run after the XPCOM shutdown process which destroys the Timestamp lock. Perhaps it would be better to use PR_IntervalNow directly and deal with overflow directly rather than using Timestamp?

I feel like cjones should maybe look at this also.
Attachment #567012 - Flags: review?(benjamin) → review-
Posted patch Part 2, v5 (WIP) (obsolete) — Splinter Review
Attachment #567011 - Attachment is obsolete: true
> I still think that this is overkill for the specific solution you have in mind: why not just have a 
> method "runLowMemoryNotification" or even just a global bool without interface changes at all.

The global bool is nice, since then we don't have to worry about whether NS_GetMainThread can call malloc or acquire a lock.
Try run for 6215e727cdf4 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=6215e727cdf4
Results (out of 1 total builds):
    failure: 1
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-6215e727cdf4
Try run for af0a9767fc34 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=af0a9767fc34
Results (out of 1 total builds):
    failure: 1
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-af0a9767fc34
Try run for a74caaf7640c is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=a74caaf7640c
Results (out of 24 total builds):
    success: 23
    warnings: 1
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-a74caaf7640c
Attachment #579565 - Attachment is obsolete: true
Attachment #567012 - Attachment is obsolete: true
Attachment #581177 - Flags: review?(benjamin)
Attachment #567014 - Attachment is obsolete: true
Comment on attachment 581186 [details] [diff] [review]
Rev 2, part 3 v1: Pass env var when running PGO-instrumented build

Review of attachment 581186 [details] [diff] [review]:
-----------------------------------------------------------------

::: client.mk
@@ +219,1 @@
>  	OBJDIR=${PGO_OBJDIR} JARLOG_DIR=${PGO_OBJDIR}/jarlog/en-US $(PROFILE_GEN_SCRIPT)

Any reason not to just stick MOZ_PGO_INSTRUMENTED=1 at the beginning of this line and avoid changing profileserver.py?
Attachment #581186 - Flags: review?(ted.mielczarek) → review+
> Any reason not to just stick MOZ_PGO_INSTRUMENTED=1 at the beginning of this line and avoid changing 
> profileserver.py?

>   browserEnv = automation.environment()

I think automation.environment() is constructed without respect for the calling environment.

Probably should put MOZ_PGO_INSTRUMENTED=1 where you suggested too, though, because we're not necessarily running with profileserver.py.
Whiteboard: [MemShrink:P1] → [MemShrink:P1][needs review]
Try run for a74caaf7640c is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=a74caaf7640c
Results (out of 24 total builds):
    success: 23
    warnings: 1
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-a74caaf7640c
Hm...I don't know why trybot reposted that comment.  The rev I pushed (without changes to profileserver.py) is https://tbpl.mozilla.org/?tree=Try&rev=d5a4503d6628
Try run for d5a4503d6628 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=d5a4503d6628
Results (out of 1 total builds):
    failure: 1
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-d5a4503d6628
Ted ^.  Looks like it didn't work.  :(

https://hg.mozilla.org/try/rev/d5a4503d6628
Comment on attachment 581177 [details] [diff] [review]
Rev 2, part 1 v1: Add ScheduleMemoryPressureEvent

Note that on mac this will only run the event when the next *XPCOM* event is processed: if we're in the native event loop and are only running UI events this will be delayed until the next XPCOM event is dispatched. I don't think this will be a problem in practice, though.
Attachment #581177 - Flags: review?(benjamin) → review+
(In reply to Justin Lebar [:jlebar] from comment #92)
> Ted ^.  Looks like it didn't work.  :(
> 
> https://hg.mozilla.org/try/rev/d5a4503d6628

Uh, you didn't actually make the change I suggested. You need to add it to the line that calls $(PROFILE_GEN_SCRIPT), here:
https://hg.mozilla.org/try/rev/d5a4503d6628#l2.16
Oh.  I didn't qref.  :(

Updated

8 years ago
Attachment #581178 - Flags: review?(benjamin) → review+
Whiteboard: [MemShrink:P1][needs review] → [MemShrink:P1][needs landing]
Try run for b7afd340ad8b is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=b7afd340ad8b
Results (out of 24 total builds):
    success: 24
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-b7afd340ad8b
Try run for f2c604e605fa is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=f2c604e605fa
Results (out of 25 total builds):
    exception: 13
    failure: 12
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-f2c604e605fa
Try run for 7e4d4045e0bf is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=7e4d4045e0bf
Results (out of 50 total builds):
    exception: 13
    success: 13
    failure: 24
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-7e4d4045e0bf
This bounced on inbound because Android's GCC isn't pleased with passing a bool* to PR_ATOMIC_SET.  It must be a PRInt32*.  (Even PRUint32* doesn't work.)
Fixing win64 link error.

We used to build the AvailableMemTracker only on 32-bit Windows.  But now we build it on 32- and 64-bit.  But on 64-bit, we do not wrap the syscalls if the prefs are set only to look for low virtual memory (as is the default).  So this change has no effect on 64-bit builds (except to make them compile :).
Attachment #581178 - Attachment is obsolete: true
There's also a mochitest-1 failure I'm on try that I haven't seen before.  Unfortunately I don't have a stack... (bug 711179)
Try run for 7e4d4045e0bf is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=7e4d4045e0bf
Results (out of 51 total builds):
    exception: 13
    success: 14
    failure: 24
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-7e4d4045e0bf
Try run for 7e4d4045e0bf is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=7e4d4045e0bf
Results (out of 51 total builds):
    exception: 13
    success: 14
    failure: 24
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-7e4d4045e0bf
Whiteboard: [MemShrink:P1][needs landing] → [MemShrink:P1]
Try run for 8c6b6dbf1ec0 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=8c6b6dbf1ec0
Results (out of 206 total builds):
    exception: 1
    success: 167
    warnings: 37
    failure: 1
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-8c6b6dbf1ec0
Blocks: 711490
Oh, cool!  Judging from the code you trigger memory pressure on VirtualAlloc if physical memory is full or virtual memory is within 128MB of full, is that right?  Do you have any updated data on how well this works?
> Judging from the code you trigger memory pressure on VirtualAlloc if physical memory is full or 
> virtual memory is within 128MB of full, is that right?

It's tracking only low virtual memory right now (0 means don't track).  IIRC Windows started paging long before available physical mem hit 0.

We can turn physical memory tracking on, but I figured we'd take things slow and start with vmem only.  ATM physical tracking wouldn't help much, since we really don't drop much on memory pressure.
Ok, that's a reasonable (cautious) start.  Only tracking virtual memory avoids your worry about other processes affecting Firefox, too.
Oh geez:

    +pref("memory.low_virtual_memory_threshold_mb", 128);
    +pref("memory.low_physical_mem_threshold_mb", 0);
    +pref("memory.low_physical_memory_notification_interval_ms", 10000);

Inconsistent use of "memory" vs. "mem"?  jlebar, can you please do a follow-up fix to make them all use "memory"?  Thanks.
Depends on: 717092
Depends on: 742491
Depends on: 850957

Updated

6 years ago
Duplicate of this bug: 849577

Updated

4 years ago
Blocks: 1195331
See Also: → 1301667
You need to log in before you can comment on or make changes to this bug.