Closed Bug 1451005 Opened 2 years ago Closed 2 years ago

Implement low-memory detection on 64-bit Windows builds

Categories

(Core :: Memory Allocator, enhancement)

x86_64
Windows

Tracking


RESOLVED FIXED
mozilla62
Tracking Status
firefox-esr52 --- wontfix
firefox-esr60 --- wontfix
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 --- fixed

People

(Reporter: gsvelto, Assigned: gsvelto)

References

(Blocks 1 open bug)

Details

(Whiteboard: [MemShrink:P2])

Attachments

(4 files, 3 obsolete files)

The AvailableMemoryTracker is currently enabled only on 32-bit Windows builds [1], but OOM crashes are currently more frequent on 64-bit Windows than on 32-bit (possibly just because our 64-bit population exceeds the 32-bit one).

We should enable it on 64-bit builds too so that we can react to low-memory situations correctly. However, the tracker currently checks for both physical and virtual memory exhaustion; there's no point in checking the latter since on 64-bit builds we'll never run out of virtual memory.

[1] https://searchfox.org/mozilla-central/rev/f5fb323246bf22a3a3b4185882a1c5d8a2c02996/xpcom/base/AvailableMemoryTracker.cpp#38
Whiteboard: [MemShrink]
Whiteboard: [MemShrink] → [MemShrink:P2]
Assignee: nobody → gsvelto
Status: NEW → ASSIGNED
Attachment #8966563 - Flags: feedback?(dmajor)
Comment on attachment 8966563 [details]
Bug 1451005 - Introduce a timer-based poller for detecting low-memory scenarios;

https://reviewboard.mozilla.org/r/235280/#review241248

Seems ok to me. dmajor, do you have any concerns?
Attachment #8966563 - Flags: review?(n.nethercote) → review+
Comment on attachment 8966563 [details]
Bug 1451005 - Introduce a timer-based poller for detecting low-memory scenarios;

https://reviewboard.mozilla.org/r/235280/#review241398

If physical OOMs are becoming a larger problem, should we add some new triggers to SaveMemoryReportNearOOM? https://searchfox.org/mozilla-central/rev/6bfadf95b4a6aaa8bb3b2a166d6c3545983e179a/xpcom/threads/nsThread.cpp#514

::: xpcom/base/AvailableMemoryTracker.cpp:408
(Diff revision 1)
>  }
>  
>  void
>  Init()
>  {
>    // Do nothing on x86-64, because nsWindowsDllInterceptor is not thread-safe

This comment suggests that 64-bit was deliberately avoided in the past.

I know that we still hook 64-bit functions all over the place, despite the threading danger. Maybe it's an acceptable risk; I don't know.

At the least, this comment block should be reworded.

::: xpcom/base/AvailableMemoryTracker.cpp:423
(Diff revision 1)
>    // Don't register the hooks if we're a build instrumented for PGO: If we're
>    // an instrumented build, the compiler adds function calls all over the place
>    // which may call VirtualAlloc; this makes it hard to prevent
>    // VirtualAllocHook from reentering itself.
>    if (!PR_GetEnv("MOZ_PGO_INSTRUMENTED")) {
>      sKernel32Intercept.Init("Kernel32.dll");

Something to watch out for: VirtualAlloc is the type of commonplace function that injected apps love to hook. And we've many times run into problems when our two hooks don't play nice together. Keep an eye out for new crashes that correlate with injected apps (they may happen in very different stacks from this code).

::: xpcom/base/AvailableMemoryTracker.cpp:424
(Diff revision 1)
>    // an instrumented build, the compiler adds function calls all over the place
>    // which may call VirtualAlloc; this makes it hard to prevent
>    // VirtualAllocHook from reentering itself.
>    if (!PR_GetEnv("MOZ_PGO_INSTRUMENTED")) {
>      sKernel32Intercept.Init("Kernel32.dll");
>      sKernel32Intercept.AddHook("VirtualAlloc",

Please make sure we enable 64-bit coverage for all three of these AddHook calls: https://searchfox.org/mozilla-central/rev/6bfadf95b4a6aaa8bb3b2a166d6c3545983e179a/toolkit/xre/test/win/TestDllInterceptor.cpp#680

And you'll want to run TestDllInterceptor.exe on all our supported OS, manually if necessary -- e.g. we don't run Windows 8 in CI, which results in surprises like bug 1353304.
(In reply to David Major [:dmajor] from comment #3)
> Comment on attachment 8966563 [details]
> Bug 1451005 - Enable the available memory tracker on Windows 64-bit;
> 
> https://reviewboard.mozilla.org/r/235280/#review241398
> 
> If physical OOMs are becoming a larger problem, should we add some new
> triggers to SaveMemoryReportNearOOM?
> https://searchfox.org/mozilla-central/rev/
> 6bfadf95b4a6aaa8bb3b2a166d6c3545983e179a/xpcom/threads/nsThread.cpp#514

Yes, physical OOMs are becoming more commonplace, especially on 64-bit hosts with <4GiB of physical memory. Users that have been upgraded from 32-bit Firefox to the 64-bit version are crashing more often because of the increased memory footprint. I'll file a bug for that.

> ::: xpcom/base/AvailableMemoryTracker.cpp:408
> (Diff revision 1)
> >  }
> >  
> >  void
> >  Init()
> >  {
> >    // Do nothing on x86-64, because nsWindowsDllInterceptor is not thread-safe
> 
> This comment suggests that 64-bit was deliberately avoided in the past.
> 
> I know that we still hook 64-bit functions all over the place, despite the
> threading danger. Maybe it's an acceptable risk; I don't know.
> 
> At the least, this comment block should be reworded.

Good catch, I've run some tests on try and I've already hit some issues:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=2441154c2bd4d487c78b4e9e4d7c171fd5de5669&selectedJob=172859980

This isn't going to be as straightforward as I thought it would be.

> ::: xpcom/base/AvailableMemoryTracker.cpp:423
> (Diff revision 1)
> >    // Don't register the hooks if we're a build instrumented for PGO: If we're
> >    // an instrumented build, the compiler adds function calls all over the place
> >    // which may call VirtualAlloc; this makes it hard to prevent
> >    // VirtualAllocHook from reentering itself.
> >    if (!PR_GetEnv("MOZ_PGO_INSTRUMENTED")) {
> >      sKernel32Intercept.Init("Kernel32.dll");
> 
> Something to watch out for: VirtualAlloc is the type of commonplace function
> that injected apps love to hook. And we've many times run into problems when
> our two hooks don't play nice together. Keep an eye out for new crashes that
> correlate with injected apps (they may happen in very different stacks from
> this code).
> 
> ::: xpcom/base/AvailableMemoryTracker.cpp:424
> (Diff revision 1)
> >    // an instrumented build, the compiler adds function calls all over the place
> >    // which may call VirtualAlloc; this makes it hard to prevent
> >    // VirtualAllocHook from reentering itself.
> >    if (!PR_GetEnv("MOZ_PGO_INSTRUMENTED")) {
> >      sKernel32Intercept.Init("Kernel32.dll");
> >      sKernel32Intercept.AddHook("VirtualAlloc",
> 
> Please make sure we enable 64-bit coverage for all three of these AddHook
> calls:
> https://searchfox.org/mozilla-central/rev/
> 6bfadf95b4a6aaa8bb3b2a166d6c3545983e179a/toolkit/xre/test/win/
> TestDllInterceptor.cpp#680

Sure, thanks.

> And you'll want to run TestDllInterceptor.exe on all our supported OS,
> manually if necessary -- e.g. we don't run Windows 8 in CI, which results in
> surprises like bug 1353304.

I will, time to dust off the Windows 8 VM.
When do you expect to land this: in 61 or 62?

thanks!
Flags: needinfo?(gsvelto)
Component: General → Memory Allocator
Comment on attachment 8966563 [details]
Bug 1451005 - Introduce a timer-based poller for detecting low-memory scenarios;

I won't be landing this as the fix here is more complex than I thought. This requires more work and possibly a different approach. Marking as obsolete.
Attachment #8966563 - Attachment is obsolete: true
Flags: needinfo?(gsvelto)
Attachment #8966563 - Flags: feedback?(dmajor)
OS: Unspecified → Windows
Hardware: Unspecified → x86_64
I'm changing the title of this bug to reflect the fact that the available memory tracker is not usable on 64-bit Windows because of the inability to hook the DLLs in a thread-safe way.

I'm investigating the use of memory resource notifications [1] instead, which would provide a cleaner way to detect low-memory scenarios. I have to test it because I couldn't find any recent documentation describing the thresholds used for the notification. Some old documentation (circa 2012) suggests the threshold is only 32MiB on hosts with 4GiB of memory, up to a maximum of 64MiB on hosts with 8GiB or more.

While those thresholds are low, they would be sufficient to catch most of the situations which lead to an OOM crash, provided they refer to commit space: at least 80% of the OOM crash pings have less than 64MiB of available commit space. However, the documentation doesn't say what this threshold is compared against; if it were available physical memory then it would be useless.

[1] https://msdn.microsoft.com/en-us/library/windows/desktop/aa366541(v=vs.85).aspx
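For reference, the old documentation's numbers describe a threshold that scales with installed RAM. A minimal sketch of that relationship — the linear interpolation between the two documented points is my assumption, not the actual kernel heuristic, and the function name is made up for the example:

```cpp
#include <cstdint>

// Approximate low-memory notification threshold per the old (circa 2012)
// documentation: ~32MiB on a 4GiB host, growing up to a 64MiB cap on hosts
// with 8GiB or more. The linear interpolation between those two points is
// an assumption; the real kernel logic is undocumented.
constexpr uint64_t kMiB = 1024ull * 1024ull;
constexpr uint64_t kGiB = 1024ull * kMiB;

uint64_t ApproxNotificationThreshold(uint64_t physicalBytes)
{
  if (physicalBytes <= 4 * kGiB) return 32 * kMiB;
  if (physicalBytes >= 8 * kGiB) return 64 * kMiB;
  // Interpolate between (4GiB, 32MiB) and (8GiB, 64MiB).
  return 32 * kMiB + (physicalBytes - 4 * kGiB) * (32 * kMiB) / (4 * kGiB);
}
```

Whatever the exact formula, any host would sit well under the 256MiB of available commit space the crash pings mention, which is why the comparison target (commit vs. physical) matters so much.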
Summary: Enable the available memory tracker on 64-bit Windows builds → Implement low-memory detection on 64-bit Windows builds
Just to throw another idea out there: a large fraction of our VirtualAlloc calls are going to be coming from mozjemalloc. If we can't safely hook VirtualAlloc, what if we make mozjemalloc itself call CheckMemAvailable after each VirtualAlloc?
Good point. And in fact, if I squint hard enough, I can see that this is just a polling method that we happen to hook up to memory allocation calls because it has to do with measuring memory consumption. We could use a timer instead, one that would stop when Firefox enters an idle state and possibly tick faster when memory consumption is increasing rapidly.
I've tested the current implementation to figure out how often we call CheckMemAvailable(). Depending on the workload it varies from hundreds of times per second (actively loading a heavy webpage), to tens of times per second (interacting with a light-ish page), to only a few times per second (scrolling / hovering over a mostly static page). When idle the check is never called. I didn't expect active use to invoke it so often, so doing it on a timer is probably a better idea: right now we're clearly over-polling (during startup this is called thousands of times per second, which might even have a measurable performance impact). The key will be to also detect idle states and never poll during those.

Also, I have looked only at the parent process, but IIUC the available memory tracker is enabled in child processes too so we're also doing polling within those. We might keep doing the same or restrict the polling to the main process and forward the memory-pressure events to child processes if we don't mind some delay in responding to low-memory scenarios.

Switching this to a timer also means that we might make this mechanism cross-platform provided we find a way to check for available memory on Linux and Mac.
SaveMemoryReportNearOOM is another operation that's on a pseudo-timer (every trip around the event loop, we check to see if enough time has passed): https://searchfox.org/mozilla-central/rev/ce9ff94ffed34dc17ec0bfa406156d489eaa8ee1/xpcom/threads/nsThread.cpp#479

Perhaps that function could be updated to use your new timer solution.
Indeed, thanks for the pointer! Strangely enough, in all my delving into crash reporting and low-memory scenarios I had never encountered this particular bit of code :-/

We're calling this every time we spin the main thread event loop so it's a very good idea to move it out of there. Additionally this is using a different threshold than the available memory tracker to detect low-memory scenarios and more importantly one that is so low we're unlikely to ever hit it. It also doesn't cover non-virtual low-memory scenarios. I'll file a bug to overhaul this code so as to not pile too much stuff into this one.
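The "pseudo-timer" pattern described above — a cheap elapsed-time check on every trip around the event loop, with the expensive work running only once the interval has passed — can be sketched like this (the class and method names are illustrative, not the actual nsThread code):

```cpp
#include <chrono>

// Illustrative pseudo-timer: call MaybeCheck() from a hot loop and only run
// the expensive work when it returns true, i.e. when at least aIntervalMs
// milliseconds have elapsed since the last run. This mirrors the pattern
// SaveMemoryReportNearOOM uses; all names here are made up for the example.
class PseudoTimer {
public:
  explicit PseudoTimer(unsigned aIntervalMs)
    : mInterval(std::chrono::milliseconds(aIntervalMs))
    , mLast(std::chrono::steady_clock::now() - mInterval) {}

  // Returns true when the caller should run the expensive check now.
  bool MaybeCheck() {
    auto now = std::chrono::steady_clock::now();
    if (now - mLast < mInterval) {
      return false; // Cheap early-out on most loop iterations.
    }
    mLast = now;
    return true;
  }

private:
  std::chrono::steady_clock::duration mInterval;
  std::chrono::steady_clock::time_point mLast;
};
```

The downside, as noted above, is that the check never fires at all if the event loop stops spinning, which is one more argument for a real timer.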
Blocks: 1459212
This is a WIP of my rewrite of this code. Instead of using DLL hooks I'm using a regular timer to poll for memory availability. The timer starts and stops depending on the user interacting with Firefox so as not to poll when we're idle. I've currently implemented it only on Win64 but the upside of this approach is that it can be used on other platforms too.
This patch removes the -no-forward memory pressure events and forwards all the events to the content processes. The -no-forward approach was initially introduced in B2G where every process had kernel-based low-memory detection and needed to react very fast. Since we're using a timer to poll available memory there's no point in keeping this mechanism as there would still inevitably be a delay even if the content processes were polling on their own.
For some reason we had telemetry probes for low virtual memory events and for low physical memory events (which were never activated), but not for low commit-space events. This patch adds the latter.
Attachment #8975203 - Attachment is obsolete: true
Attachment #8975204 - Attachment is obsolete: true
Attachment #8975205 - Attachment is obsolete: true
Comment on attachment 8966563 [details]
Bug 1451005 - Introduce a timer-based poller for detecting low-memory scenarios;

mozreview recycled the old request so clearing the review flag and f? to :dmajor.
Attachment #8966563 - Flags: review?(n.nethercote)
Attachment #8966563 - Flags: review+
Attachment #8966563 - Flags: feedback?(dmajor)
Comment on attachment 8975518 [details]
Bug 1451005 - Add telemetry probes for low commit-space events;

https://reviewboard.mozilla.org/r/243776/#review249634

Good catch. We've had that probe since Histograms.json was created, but as near as I can tell it's been missed since the beginning of TelemetrySession.jsm.

Don't we need LOW_MEMORY* in the content process? I notice we receive quite a few of them from content processes in the wild: https://mzl.la/2Kn8zR6
Attachment #8975518 - Flags: review?(chutten)
Comment on attachment 8975518 [details]
Bug 1451005 - Add telemetry probes for low commit-space events;

https://reviewboard.mozilla.org/r/243776/#review249634

We needed them but I'm changing it in this series, we'll now have them only in the main process. See the second patch description for all the nitty gritty details dating back to the months before the launch of Firefox OS 1.0. Also, about your link, there must be something funny going on because those are low physical memory events and they should be disabled :S
Comment on attachment 8966563 [details]
Bug 1451005 - Introduce a timer-based poller for detecting low-memory scenarios;

https://reviewboard.mozilla.org/r/235280/#review249810

(Only a quick look, I'd like to look closer when I have more time)

::: xpcom/base/AvailableMemoryTracker.cpp:67
(Diff revision 2)
>  
>  Atomic<uint32_t, MemoryOrdering::Relaxed> sNumLowVirtualMemEvents;
>  Atomic<uint32_t, MemoryOrdering::Relaxed> sNumLowCommitSpaceEvents;
>  Atomic<uint32_t, MemoryOrdering::Relaxed> sNumLowPhysicalMemEvents;
>  
> +#if !defined(HAVE_64BIT_BUILD)

Why do Win32 and Win64 take different paths? It leads to a lot of duplication.

Also, under Win64 you've got kLowVirtualMemoryThreshold etc. defined at both top-level and at class-level. Was that intentional?

::: xpcom/base/AvailableMemoryTracker.cpp:424
(Diff revision 2)
> +  stat.dwLength = sizeof(stat);
> +  bool success = GlobalMemoryStatusEx(&stat);
> +
> +  if (success) {
> +    bool lowMemory =
> +      IsVirtualMemoryLow(stat) ||

This will short-circuit and I'm not sure that's what we want. If multiple types of memory are low, we should probably do the sNum..++ on all of them.
Comment on attachment 8966563 [details]
Bug 1451005 - Introduce a timer-based poller for detecting low-memory scenarios;

https://reviewboard.mozilla.org/r/235280/#review249810

> Why do Win32 and Win64 take different paths? It leads to a lot of duplication.
> 
> Also, under Win64 you've got kLowVirtualMemoryThreshold etc. defined at both top-level and at class-level. Was that intentional?

I intend to remove the Win32-specific code once I've proven that the new code does its job. Since both gather telemetry data it should be easy to compare them while they will be both enabled.

> This will short-circuit and I'm not sure that's what we want. If multiple types of memory are low, we should probably do the sNum..++ on all of them.

Actually that was the previous behavior: the counters were incremented only when MaybeSendMemoryPressureEvent() returned true, so if we reached two (or three) thresholds at the same time only the first one would be accounted for. I don't mind changing it though; in fact it's probably more interesting to know if you're running out of more than one resource at a time.
Attachment #8966563 - Flags: review?(n.nethercote)
Attachment #8966563 - Flags: feedback?(dmajor)
Comment on attachment 8975518 [details]
Bug 1451005 - Add telemetry probes for low commit-space events;

https://reviewboard.mozilla.org/r/243776/#review249928

To ensure all our ducks are in a row, please seek Data Collection Review for the addition of the commit space collection.

::: toolkit/components/telemetry/Histograms.json:1081
(Diff revision 2)
>      "n_buckets": 10,
>      "description": "Time(ms) to purge dirty heap pages."
>    },
>    "LOW_MEMORY_EVENTS_VIRTUAL": {
> -    "record_in_processes": ["main", "content"],
> +    "record_in_processes": ["main"],
>      "alert_emails": ["memshrink-telemetry-alerts@mozilla.com"],

Please add bug_numbers fields to these probes (they'll need taking off one of the whitelists in histogram-whitelists.json)
Attachment #8975518 - Flags: review?(chutten) → review+
Comment on attachment 8975517 [details]
Bug 1451005 - Add low commit-space event counts to the memory reporter;

https://reviewboard.mozilla.org/r/243774/#review250166

::: xpcom/base/nsIMemoryReporter.idl:374
(Diff revision 2)
>     * raster images in content.
>     *
>     * |storageSQLite| (UNITS_BYTES)  Memory used by SQLite.
>     *
> -   * |lowMemoryEvents{Virtual,Physical}| (UNITS_COUNT_CUMULATIVE)  The number
> -   * of low-{virtual,physical}-memory events that have occurred since the
> +   * |lowMemoryEvents{Virtual,CommitSpace,Physical}| (UNITS_COUNT_CUMULATIVE)
> +   * The number of low-{virtual,physical,commit-space}-memory events that have

Nit: inconsistent ordering of the event types.
Attachment #8975517 - Flags: review?(n.nethercote) → review+
Comment on attachment 8975516 [details]
Bug 1451005 - Forward all memory-pressure events to the child processes;

https://reviewboard.mozilla.org/r/243772/#review250168
Attachment #8975516 - Flags: review?(n.nethercote) → review+
Comment on attachment 8966563 [details]
Bug 1451005 - Introduce a timer-based poller for detecting low-memory scenarios;

https://reviewboard.mozilla.org/r/235280/#review250170

Seems reasonable, but I'm a bit wary about the timer frequency. See the comment below.

::: xpcom/base/AvailableMemoryTracker.cpp:279
(Diff revision 3)
> +
> +  nsresult Init();
> +
> +private:
> +  // Poll the amount of free memory at this rate.
> +  static const uint32_t kPollingIntervalMS = 250;

How did you choose this interval? Even every 250ms sounds quite frequent to me.

Also, you said the timer "stops whenever the user stops interacting with Firefox" -- what's the mechanism for that?

::: xpcom/base/AvailableMemoryTracker.cpp:385
(Diff revision 3)
> +                                                     : MemPressure_New;
> +    NS_DispatchEventualMemoryPressure(state);
> +}
> +
> +void
> +nsAvailableMemoryWatcher::AdjustPollingInterval(const bool aLowMemory) {

Nit: brace on new line.

::: xpcom/base/AvailableMemoryTracker.cpp:425
(Diff revision 3)
> +
> +  return NS_OK;
> +}
> +
> +// Observer service callback, used to stop the polling timer when the user
> +// stops interacting with Firefox and resuming it when she interacts again.

I'd use "they interact"... but I'll let you decide.

::: xpcom/base/AvailableMemoryTracker.cpp:429
(Diff revision 3)
> +// Observer service callback, used to stop the polling timer when the user
> +// stops interacting with Firefox and resuming it when she interacts again.
> +// Also used to shut down the service if the application is quitting.
> +NS_IMETHODIMP
> +nsAvailableMemoryWatcher::Observe(nsISupports* aSubject, const char* aTopic,
> +                                 const char16_t* aData)

Nit: need to indent by 1 more space.
Attachment #8966563 - Flags: review?(n.nethercote) → review+
Comment on attachment 8966563 [details]
Bug 1451005 - Introduce a timer-based poller for detecting low-memory scenarios;

https://reviewboard.mozilla.org/r/235280/#review250170

> How did you choose this interval? Even every 250ms sounds quite frequent to me.
> 
> Also, you said the timer "stops whenever the user stops interacting with Firefox" -- what's the mechanism for that?

I measured how often we polled before using the VirtualAlloc hook and it happened from a minimum of ~10 times per second to a maximum of several thousands during startup. Loading a page would usually hover around 200-300 times per second. I figured that four times per second should be fast enough to respond to user input (e.g. scrolling a page with huge images, such as on Pinterest) while not consuming too many resources. If you think it's still too fast I can make it slower; maybe once per second might suffice. Either way it's going to be a lot less than before which is one more reason to replace the old code with this one in Win32 too.

As for stopping and starting, the code listens for the user-interaction-active/inactive events. The inactive event is sent 5 seconds after the user stops interacting with the browser; the active one is sent immediately upon user interaction.
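The start/stop behavior described here is essentially a two-state machine driven by observer topics. A reduced, platform-neutral sketch — the interaction topic strings come from the comment above, while the class itself and the shutdown topic name are assumptions for illustration:

```cpp
#include <cstring>

// Hypothetical poller toggled by the user-interaction observer topics the
// real nsAvailableMemoryWatcher listens for. The "active" topic arrives
// immediately on interaction; the "inactive" one ~5 seconds after it stops.
// The "xpcom-shutdown" topic name is an assumption for this sketch.
class InteractionDrivenPoller {
public:
  void Observe(const char* aTopic) {
    if (std::strcmp(aTopic, "user-interaction-active") == 0) {
      mPolling = true;   // Start (or keep) the polling timer.
    } else if (std::strcmp(aTopic, "user-interaction-inactive") == 0) {
      mPolling = false;  // Stop polling while the browser is idle.
    } else if (std::strcmp(aTopic, "xpcom-shutdown") == 0) {
      mPolling = false;  // Shut the watcher down entirely.
      mShutDown = true;
    }
  }

  bool IsPolling() const { return mPolling && !mShutDown; }

private:
  bool mPolling = false;
  bool mShutDown = false;
};
```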
Comment on attachment 8966563 [details]
Bug 1451005 - Introduce a timer-based poller for detecting low-memory scenarios;

https://reviewboard.mozilla.org/r/235280/#review250262

I'm also concerned about the polling interval. Sure, this will be less than what 32-bit is doing today, but it's going to be more than what 64-bit is doing today. Last I checked (admittedly years ago) a GlobalMemoryStatus cost about 7ms. Doing that 4 times a second during user activity would probably horrify the perf team. To be safe I'd recommend doing some profiling of whatever scenarios we're currently interested in to make sure that this code doesn't appear in the profiles.

::: xpcom/base/AvailableMemoryTracker.cpp:282
(Diff revision 3)
> +private:
> +  // Poll the amount of free memory at this rate.
> +  static const uint32_t kPollingIntervalMS = 250;
> +
> +  // Observer topics we subscribe to
> +  static const char *kObserverTopics[];

Nit: this could be `const char* const` or `constexpr const char*`.
Attachment #8966563 - Flags: review+
Comment on attachment 8966563 [details]
Bug 1451005 - Introduce a timer-based poller for detecting low-memory scenarios;

(Not sure what mozreview is doing here...)
Attachment #8966563 - Flags: review+
Attachment #8966563 - Flags: feedback?(dmajor)
Attachment #8966563 - Flags: feedback+
OK, I'll give this a spin and report back. I tried measuring the cost of GlobalMemoryStatusEx() using PR_IntervalNow() calls when I started working on this patch and it didn't even register, but I'll double-check just to be on the safe side. Anyway, I doubt it's above the µs range; otherwise it would show up quite prominently during startup on Win32.
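One way to get a reading below the granularity of a coarse interval timer is to time a batch of calls and divide. A portable sketch of that harness — here timing a placeholder function rather than the actual GlobalMemoryStatusEx(), which is Windows-only:

```cpp
#include <chrono>
#include <cstdint>

// Placeholder for the call being measured; on Windows this would be a
// GlobalMemoryStatusEx(&stat) invocation instead.
volatile uint64_t gSink = 0;
void ProbeCall() { gSink = gSink + 1; }

// Time aIterations invocations and return the mean cost in nanoseconds.
// Batching sidesteps the granularity limits of timing a single call.
uint64_t MeanCostNs(unsigned aIterations)
{
  auto start = std::chrono::steady_clock::now();
  for (unsigned i = 0; i < aIterations; i++) {
    ProbeCall();
  }
  auto end = std::chrono::steady_clock::now();
  auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start);
  return static_cast<uint64_t>(ns.count()) / aIterations;
}
```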
Request for data collection review form

What questions will you answer with this data?

How often user machines enter a state where they're running out of commit space. See here for a description of what committed memory is on Windows: https://msdn.microsoft.com/en-us/library/windows/desktop/aa366803%28v=vs.85%29.aspx

Why does Mozilla need to answer these questions? Are there benefits for users? Do we need this information to address product or business requirements? Some example responses:

We need this information to ensure that our out-of memory crash mitigations are working correctly as well as tracking how often users end up in a low-memory situation.

Can current instrumentation answer these questions?

No, this particular piece of data had already been gathered in the past but was forgotten during a refactoring of this code.

List all proposed measurements and indicate the category of data collection for each measurement, using the Firefox data collection categories on the Mozilla wiki.

Measurement Description: Number of low commit-space events
Data Collection Category: Technical data
Tracking Bug #1451005

How long will this data be collected?

I want to permanently monitor this data.

What populations will you measure?

All users irrespective of channel, country or locale. This applies only to the Windows platform though (both 32- and 64-bit builds).

If this data collection is default on, what is the opt-out mechanism for users?

There is no opt-out mechanism.

Please provide a general description of how you will analyze this data.

I will be monitoring the evolution of these events and comparing it to the OOM crash rate.

Where do you intend to share the results of your analysis?

On bugzilla
Flags: needinfo?(chutten)
There is actually an opt-out mechanism: the Telemetry checkbox in the privacy settings. That opts users out of all of our Telemetry collections.

DATA COLLECTION REVIEW RESPONSE:

    Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

Standard Telemetry mechanisms apply.

    Is there a control mechanism that allows the user to turn the data collection on and off? 

Standard Telemetry mechanisms apply.

    If the request is for permanent data collection, is there someone who will monitor the data over time?**

memshrink will, and gsvelto specifically.

    Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Type 1

    Is the data collection request for default-on or default-off?

Default-on

    Does the instrumentation include the addition of any new identifiers (whether anonymous or otherwise; e.g., username, random IDs, etc. See the appendix for more details)?

No.

    Is the data collection covered by the existing Firefox privacy notice?

Yes.

    Does there need to be a check-in in the future to determine whether to renew the data?

No.
-----
Result: datareview+
Flags: needinfo?(chutten)
I've done some testing on the cost of GlobalMemoryStatusEx() on my machine which is at the same time relatively fast (~4GHz) and relatively old (a Xeon E3 1270v2 so an Ivy Bridge-based core). The machine has full Meltdown and Spectre mitigations applied which I think is something important to keep in mind here. Invocations of the function seem to cost between 4µs and 8µs but they consistently tend to creep up towards the upper bound as time goes by. My guess is that this progressive slowing down of the function is a side-effect of the Meltdown mitigations having to switch in and out a larger page table as Firefox memory consumption increases thus making system calls costlier.

In light of this, and considering that most machines where this will be useful will be old ones, I agree that slowing down the timer is the best course of action. We'll poll no more than once per second. Crash pings also show that when we finally run out of memory we're usually quite deep in the woods (well below the 256MiB threshold we set), so it's probable that we reach that situation slowly.

Anyway it should be easy to see if this works or not from the telemetry data we gather. We can always consider an adaptive approach in the future if need be.
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/dbacc0320046
Introduce a timer-based poller for detecting low-memory scenarios; r=dmajor,njn
https://hg.mozilla.org/integration/autoland/rev/ad6107fa1bba
Forward all memory-pressure events to the child processes; r=njn
https://hg.mozilla.org/integration/autoland/rev/640682f11718
Add low commit-space event counts to the memory reporter; r=njn
https://hg.mozilla.org/integration/autoland/rev/9101f785e646
Add telemetry probes for low commit-space events; r=chutten
Are you able to see a difference in the telemetry results? Can you link to that or give some more follow-up information? Thanks.
Flags: needinfo?(gsvelto)
I'm looking at this chart for now, it counts the number of OOM crashes on Win64 nightly:

https://sql.telemetry.mozilla.org/queries/53931#142743

I think it's too early to tell since there's only 5 days worth of data. Additionally I should probably normalize the number of crashes by usage hours to get a better view of what's happening but my SQL-fu isn't good enough for that (yet).
Flags: needinfo?(gsvelto)
I've updated the query so that I can normalize the number of OOM crashes by the amount of usage hours. The relevant graph is here:

https://sql.telemetry.mozilla.org/queries/53931#142756
I did further tweaks to the query to also display the number of OOM crashes as a percentage of the total crash pings received.
You may wish to use the build_id for your X axis if you want to see before-and-after differences.

You can turn build_ids to dates using `DATE_PARSE(SUBSTR(build_id, 1, 8), '%Y%m%d') AS build_date` or you can use the visualization editor to set your X axis to... I think it's Category (maybe Linear will work too).

Either way the effect may or may not be perceptible in the noise of Nightly.
Thanks, I'll try that! In the meantime I wrote another contraption to compare OOM crashes VS build IDs here:

https://sql.telemetry.mozilla.org/queries/53936#142753
Blocks: 1464773
It seems that we have enough telemetry data to evaluate how this went:

- Low commit-space detection is working fine: we see data coming in via regular telemetry [1] and, more importantly, a large percentage (>70%) of the low-memory crashes now have the LowCommitSpaceEvents annotation present [2]

- Unfortunately, even though we're detecting low-memory conditions, we don't seem to be able to help much with them. While nightly data is quite noisy, it seems that crashes per kilo-usage-hour and the OOM crash percentage haven't gone down significantly [3]

The next steps should be to use this detector for Win32 builds too, so that we can drop the DLL hooks, and to introduce more aggressive techniques to free memory, such as unloading tabs that are not visible, something we already do on Fennec [4].

[1] https://mzl.la/2xfunN1
[2] https://sql.telemetry.mozilla.org/queries/55055#145014
[3] https://sql.telemetry.mozilla.org/queries/53931#142756
[4] https://searchfox.org/mozilla-central/rev/e6a0a192ea8691f7eb466506301da73fabe8fd23/mobile/android/chrome/content/browser.js#6700
Blocks: 1468207