Closed Bug 1558554 Opened 5 years ago Closed 5 years ago

Tabs are suspending above the low memory threshold

Categories

(Firefox :: Tabbed Browser, defect, P2)

67 Branch
defect

Tracking


RESOLVED FIXED
Firefox 69
Tracking Status
firefox-esr60 --- unaffected
firefox67 --- wontfix
firefox68 --- disabled
firefox69 + fixed

People

(Reporter: jeff.delamater, Assigned: gsvelto)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(1 file)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0

Steps to reproduce:

Normal browsing. No special steps to reproduce

Actual results:

Per the feature introduced in Firefox 67, tabs are auto-suspending, but this is only supposed to happen when available memory drops below 400 MB. The problem is that the auto-suspending is happening even when available memory is well above 400 MB: right now I have 2.1 GB available, and tabs are still suspending.

Expected results:

With sufficient available memory, tabs should not auto-suspend.

Component: Untriaged → Tabbed Browser
Regressions: tab-unloading
No longer regressions: tab-unloading

Gabriele, can you take a look / help investigate what's going on here?

Flags: needinfo?(gsvelto)
Keywords: regression
Priority: -- → P2
Regressed by: 675539
See Also: → 1554124

We've seen this happening in a few cases, but I need more information to diagnose it: when you say that there are still 2.1 GB available, is that the value shown by the Task Manager? If it is, could you report what shows up under the "Committed" label when this happens? Also, how much physical memory and swap space do you have? It would also be very helpful if you could attach a memory report when this happens: to get one, browse to about:memory, check the "anonymize" box under "Save memory reports", then press "Measure & Save".

To give you a bit of context as to why this might be happening: keep in mind that tabs can be suspended when the amount of committed memory gets too close to the commit limit, even if the Task Manager still shows available memory. The reason for this behavior is that Windows will not allow an application to use available memory if there is no commit space left. It's a bit counter-intuitive, but it's how Windows manages memory.
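
To make the distinction concrete, here is a minimal sketch (illustrative only, not Firefox code) that queries both figures via GlobalMemoryStatusEx(): ullAvailPhys roughly corresponds to the Task Manager's "Available" number, while ullAvailPageFile reports the commit space this process can still use.

  #include <windows.h>
  #include <cstdio>

  int main() {
    MEMORYSTATUSEX stat = {};
    stat.dwLength = sizeof(stat);
    if (GlobalMemoryStatusEx(&stat)) {
      // Roughly the Task Manager "Available" figure.
      printf("Available physical memory: %llu MiB\n",
             stat.ullAvailPhys / (1024 * 1024));
      // Commit space this process may still use; can be much smaller.
      printf("Available commit space:    %llu MiB\n",
             stat.ullAvailPageFile / (1024 * 1024));
    }
    return 0;
  }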

Flags: needinfo?(gsvelto) → needinfo?(jeff.delamater)

(In reply to Gabriele Svelto [:gsvelto] from comment #2)

I saved the memory report, but I don't see any way of attaching it. Can I provide a Google Drive link? Or is there another preferred way of providing the file?

On to the other questions...
16GB physical ram
4096MB of fixed swap space (100GB free on drive)

Task Manager Memory
In use (compressed): 7.4GB (687MB)
Committed: 13.9/20.0GB
Paged pool: 924MB
Non-paged pool: 434MB
Available: 8.5GB
Cached: 8.4GB

One thing that I just noticed, and maybe this is related: I opened Resource Monitor and went to the Memory tab there, which gives a more granular view of memory use.
Hardware Reserved: 39MB
In Use: 7577MB
Modified: 74MB
Standby: 8557MB
Free: 123MB
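
Reading the "Committed" figure above (an interpretation of the reporter's numbers, not part of the original comment): the commit limit is physical RAM plus the fixed page file, i.e. roughly 16 GB + 4 GB = 20 GB, so with 13.9 GB committed about 6 GB of commit space is still free - far above any 400 MB threshold.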

Flags: needinfo?(jeff.delamater)

You can attach the report by clicking the "Attach file" link under the "Details" pane at the top of the bug; it's small and easy to miss. The free memory shown by Resource Monitor is memory that is really free - i.e. it does not contain anything at all - so don't worry about that. You seem to have plenty of available commit space, which worries me, because tab unloading shouldn't kick in at all. Did tabs get suspended only once, or did it happen multiple times?

Flags: needinfo?(jeff.delamater)

Never mind, I figured out what's going on. It seems that - at least under recent versions of Windows 10 - every process is assigned its own, potentially lower, amount of available commit space. This amount is reported in the MEMORYSTATUSEX.ullAvailPageFile field and is what we use to detect whether we're running out of memory. On my machine I could create a scenario where that value would start around 1GiB, then drop to less than 10MiB (causing tabs to be unloaded), only to jump up again to 4GiB less than a second later.

This means that our low-memory detection logic is flawed. I'm not sure if this is due to a recent change in Windows or if it has always worked this way, but I had never seen this behavior before. David, do you know anything about this?

Anyway, I'll have to switch our low-memory detection logic to PROCESS_MEMORY_COUNTERS_EX, which provides a (presumably reliable) way to measure total commit space irrespective of temporary per-process limits.
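
To illustrate the flaw, here is a simplified sketch of the kind of threshold check described above (not the actual Gecko tracker; the 400 MiB threshold is assumed from the report). If Windows momentarily reports a tiny per-process ullAvailPageFile, the check fires even though global commit space is plentiful.

  #include <windows.h>

  // Simplified illustration only; a hypothetical 400 MiB threshold.
  static const ULONGLONG kLowCommitThreshold = 400ULL * 1024 * 1024;

  bool IsCommitSpaceLow() {
    MEMORYSTATUSEX stat = {};
    stat.dwLength = sizeof(stat);
    if (!GlobalMemoryStatusEx(&stat)) {
      return false;
    }
    // ullAvailPageFile reflects a per-process view of available commit
    // space and can momentarily drop far below the global value,
    // producing a spurious low-memory signal.
    return stat.ullAvailPageFile < kLowCommitThreshold;
  }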

Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(jeff.delamater) → needinfo?(dmajor)
See Also: → 1554547
Blocks: 1554547
See Also: 1554124, 1554547

The fix for this issue needs to happen in three steps:

  • First we need to switch the OOM crash annotations that measure how much memory is available to the global values provided via GetPerformanceInfo (see the sketch below).
  • Then we need to gather enough information to re-evaluate the appropriate threshold for detecting low-memory scenarios with respect to commit space.
  • Once we have enough data we can switch the available memory tracker to also use GetPerformanceInfo to measure free memory, taking the new thresholds into account.

In the meantime I think we should turn off tab unloading on release (it can be done via a pref). I don't know how widespread the issue is but better safe than sorry.
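
For reference, a minimal sketch of the GetPerformanceInfo approach mentioned in the first and third steps (assumptions, not part of the actual patch: psapi.h linked against Psapi.lib, and available commit space computed as CommitLimit minus CommitTotal, both expressed in pages):

  #include <windows.h>
  #include <psapi.h>  // GetPerformanceInfo; link with Psapi.lib

  // Returns the globally available commit space in bytes, or 0 on failure.
  // CommitLimit and CommitTotal are system-wide, unlike the per-process
  // values reported by GlobalMemoryStatusEx().
  ULONGLONG GlobalAvailableCommitSpace() {
    PERFORMANCE_INFORMATION info = {};
    info.cb = sizeof(info);
    if (!GetPerformanceInfo(&info, sizeof(info))) {
      return 0;
    }
    return static_cast<ULONGLONG>(info.CommitLimit - info.CommitTotal) *
           info.PageSize;
  }

The trade-off, discussed further down in this bug, is that CommitLimit is a soft limit when the page file is allowed to grow.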

(In reply to Gabriele Svelto [:gsvelto] from comment #5)

Never mind, I figured out what's going on. It seems that - at least under recent versions of Windows 10 - every process is assigned its own, potentially lower, amount of available commit space. This amount is reported in the MEMORYSTATUSEX.ullAvailPageFile field and is what we use to detect whether we're running out of memory. On my machine I could create a scenario where that value would start around 1GiB, then drop to less than 10MiB (causing tabs to be unloaded), only to jump up again to 4GiB less than a second later.

I haven't heard of this behavior before. Can you reproduce these jumps reliably? When ullAvailPageFile drops, what happens to ullTotalPageFile and dwMemoryLoad?

Flags: needinfo?(dmajor)

[Tracking Requested - why for this release]: new significant regression

(In reply to Gabriele Svelto [:gsvelto] from comment #7)

The fix for this issue needs to happen in three steps:

  • First we need to switch the OOM crash annotations that measure how much memory is available to the global values provided via GetPerformanceInfo.
  • Then we need to gather enough information to re-evaluate the appropriate threshold for detecting low-memory scenarios with respect to commit space.
  • Once we have enough data we can switch the available memory tracker to also use GetPerformanceInfo to measure free memory, taking the new thresholds into account.

In the meantime I think we should turn off tab unloading on release (it can be done via a pref). I don't know how widespread the issue is but better safe than sorry.

Could you file a new bug for preffing off on release? Thanks

Flags: needinfo?(gsvelto)

(In reply to Pascal Chevrel:pascalc from comment #10)

Could you file a new bug for preffing off on release? Thanks

I filed bug 1558930.

Flags: needinfo?(gsvelto)
Blocks: 1554990
Depends on: 1558930

(In reply to David Major [:dmajor] from comment #8)

I haven't heard of this behavior before. Can you reproduce these jumps reliably?

No, I managed to reproduce it only once, where I saw ullTotalPageFile start at 1GiB then jump to 4GiB. In all the other tests I made, ullTotalPageFile was always the maximum amount for my machine (around 40GiB). I'm almost starting to doubt myself here... but besides my memory, there are at least two about:memory reports where we had low-commit-space events recorded on machines with plenty of available commit space.

When ullAvailPageFile drops, what happens to ullTotalPageFile and dwMemoryLoad?

I'm still trying to reproduce the issue; I'll check those out. In the meantime I'll switch the code to use GetPerformanceInfo, because that seems to be the correct way to measure global commit space. The only thing I could establish with certainty from the Windows documentation is that MEMORYSTATUSEX.ullAvailPageFile and MEMORYSTATUSEX.ullTotalPageFile are both per-process rather than global, and that their meaning has changed over time.

Anyway, I'll have to switch our low-memory detection logic to PROCESS_MEMORY_COUNTERS_EX, which provides a (presumably reliable) way to measure total commit space irrespective of temporary per-process limits.

(Since you mentioned GetPerformanceInfo in comment 7, I assume you mean PERFORMANCE_INFORMATION rather than PROCESS_MEMORY_COUNTERS_EX.)

Anyway, I'm looking at the doc for PERFORMANCE_INFORMATION, and for the CommitLimit field it says: "This number can change if memory is added or deleted, or if pagefiles have grown, shrunk, or been added. If the paging file can be extended, this is a soft limit."

That doesn't sound super reliable either. :(

(In reply to David Major [:dmajor] from comment #13)

(Since you mentioned GetPerformanceInfo in comment 7, I assume you mean PERFORMANCE_INFORMATION rather than PROCESS_MEMORY_COUNTERS_EX )

Anyway, I'm looking at the doc for PERFORMANCE_INFORMATION, and for the CommitLimit field it says: "This number can change if memory is added or deleted, or if pagefiles have grown, shrunk, or been added. If the paging file can be extended, this is a soft limit."

That doesn't sound super reliable either. :(

Yes, it's not optimal, but at least it's global. While I couldn't reproduce the behavior I saw yesterday, I confirmed that it is possible to adjust MEMORYSTATUSEX.ullAvailPageFile at runtime (up or down), so it's not unlikely that this is happening to users. I'll try to use CommitLimit instead and augment it with QueryMemoryResourceNotification. With enough annotations in the crash pings I should be able to establish which is the best tool to use.

I just wish Microsoft documentation was a little less light on details.
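
As an aside, a minimal sketch of how QueryMemoryResourceNotification can be queried for the system-wide low-memory signal (illustrative only, not the planned implementation):

  #include <windows.h>

  // Returns true only when Windows currently signals the system-wide
  // low-memory condition.
  bool SystemIsLowOnMemory() {
    HANDLE notification =
        CreateMemoryResourceNotification(LowMemoryResourceNotification);
    if (!notification) {
      return false;
    }
    BOOL lowMemory = FALSE;
    bool result =
        QueryMemoryResourceNotification(notification, &lowMemory) && lowMemory;
    CloseHandle(notification);
    return result;
  }

In practice the notification handle would be created once and then polled or waited on (e.g. with WaitForSingleObject) rather than recreated for every check.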

I guess we won't want to ship 68 with this.

Severity: normal → major

This uses GetPerformanceInfo() to measure how much commit space is available globally. Previously we relied on GlobalMemoryStatusEx(), but in spite of the function name the commit-space values it returns are per-process and thus unreliable. Available physical memory measurement has also been removed since it was never used and we don't plan on using it anytime soon.

This is being disabled in bug 1558930 so clearing 67/68 tracking flags.

Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/82daf65cbe5e
Properly detect low-commit space scenarios on Windows r=dmajor
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 69
Assignee: nobody → gsvelto

I guess the plan here is to just let this fix ride the trains?

(In reply to Ryan VanderMeulen [:RyanVM] from comment #20)

I guess the plan here is to just let this fix ride the trains?

Yes, this will allow me to gather more data about the appropriate low-memory threshold. I will then turn tab unloading back on and we'll let it ride the trains again.

Regressions: 1559760
Regressions: 1559761
Regressions: 1559851

Is there any useful data from nightly so far? Have we seen that this fixes the problem of tabs suspending too soon?

Yes, data from Nightly crash pings suggests that the measurements we get are more precise than before. Generally speaking, the threshold we hit before crashing appears to be lower, so when I re-enable tab unloading I'll bring the threshold down too. Before doing that I also want to gather data on QueryMemoryResourceNotification, in the hope that I can use it instead of our polling tracker.

Regressions: 1575216

Would/should this have either the "PERF" or "MLK" keywords?

Has Regression Range: --- → yes
