Closed Bug 1558554 Opened 5 years ago Closed 5 years ago

Tabs are suspending above the low memory threshold

Categories

(Firefox :: Tabbed Browser, defect, P2)

67 Branch
defect

Tracking


RESOLVED FIXED
Firefox 69
Tracking Status
firefox-esr60 --- unaffected
firefox67 --- wontfix
firefox68 --- disabled
firefox69 + fixed

People

(Reporter: jeff.delamater, Assigned: gsvelto)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(1 file)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0

Steps to reproduce:

Normal browsing. No special steps to reproduce

Actual results:

Per the feature introduced in Firefox 67, tabs are auto-suspending, but this is only supposed to happen when available memory drops below 400 MB. The problem is that the auto-suspending is happening even when available memory is well above 400 MB: right now I have 2.1 GB available, and tabs are still suspending.

Expected results:

With sufficient available memory, tabs should not auto-suspend.

Component: Untriaged → Tabbed Browser
Regressions: tab-unloading
No longer regressions: tab-unloading

Gabriele, can you take a look / help investigate what's going on here?

Flags: needinfo?(gsvelto)
Keywords: regression
Priority: -- → P2
Regressed by: 675539
See Also: → 1554124

We've seen this happening in a few cases, but I need more information to diagnose it: when you say that there are still 2.1 GB available, is that the value shown by the Task Manager? If it is, could you report what shows up under the "Committed" label when this happens? Also, how much physical memory and swap space do you have? It would also be very helpful if you could attach a memory report when this happens: to get one, browse to about:memory, check the "anonymize" box under "Save memory reports", then press "Measure & Save".

To give you a bit of context as to why this might be happening: keep in mind that tabs can be suspended when the amount of committed memory gets too close to the commit limit, even if the Task Manager still shows available memory. The reason for this behavior is that Windows will not allow an application to use available memory if there is no commit space left. It's a bit counter-intuitive, but it's how Windows manages memory.
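
To make the distinction concrete, here is a minimal sketch (illustrative only, not Firefox code) that queries both figures via GlobalMemoryStatusEx(): ullAvailPhys roughly corresponds to the Task Manager's "Available" number, while ullAvailPageFile reports the commit space this process can still use.

  #include <windows.h>
  #include <cstdio>

  int main() {
    MEMORYSTATUSEX stat = {};
    stat.dwLength = sizeof(stat);
    if (GlobalMemoryStatusEx(&stat)) {
      // Roughly the Task Manager "Available" figure.
      printf("Available physical memory: %llu MiB\n",
             stat.ullAvailPhys / (1024 * 1024));
      // Commit space this process may still use; can be much smaller.
      printf("Available commit space:    %llu MiB\n",
             stat.ullAvailPageFile / (1024 * 1024));
    }
    return 0;
  }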

Flags: needinfo?(gsvelto) → needinfo?(jeff.delamater)

(In reply to Gabriele Svelto [:gsvelto] from comment #2)

I saved the memory report, but I don't see any way of attaching it. Can I provide a Google Drive link? Or is there another preferred way of providing the file?

On to the other questions...
16GB physical ram
4096MB of fixed swap space (100GB free on drive)

Task Manager Memory
In use (compressed): 7.4GB (687MB)
Committed: 13.9/20.0GB
Paged pool: 924MB
Non-paged pool: 434MB
Available: 8.5GB
Cached: 8.4GB

One thing that I just noticed, and maybe this is related: I opened Resource Monitor and went to the Memory tab there, which gives a more granular view of memory use.
Hardware Reserved: 39MB
In Use: 7577MB
Modified: 74MB
Standby: 8557MB
Free: 123MB
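
Reading the "Committed" figure above (an interpretation of the reporter's numbers, not part of the original comment): the commit limit is physical RAM plus the fixed page file, i.e. roughly 16 GB + 4 GB = 20 GB, so with 13.9 GB committed about 6 GB of commit space is still free - far above any 400 MB threshold.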

Flags: needinfo?(jeff.delamater)

You can attach the report by clicking the "Attach file" link under the "Details" pane at the top of the bug; it's small and easy to miss. The free memory shown by Resource Monitor is memory that is really free - i.e. it does not contain anything at all - so don't worry about that. You seem to have plenty of available commit space, which worries me, because tab unloading shouldn't kick in at all. Did tabs get suspended only once, or did it happen multiple times?

Flags: needinfo?(jeff.delamater)

Never mind, I figured out what's going on. It seems that - at least under recent versions of Windows 10 - every process is assigned its own, potentially lower, amount of available commit space. This amount is reported in the MEMORYSTATUSEX.ullAvailPageFile field and is what we use to detect whether we're running out of memory. On my machine I could create a scenario where that value would start around 1GiB, then drop to less than 10MiB (causing tabs to be unloaded), only to jump up again to 4GiB less than a second later.

This means that our low-memory detection logic is flawed. I'm not sure if this is due to a recent change in Windows or if it has always worked this way, but I had never seen this behavior before. David, do you know anything about this?

Anyway, I'll have to switch our low-memory detection logic to PROCESS_MEMORY_COUNTERS_EX, which provides a (presumably reliable) way to measure total commit space irrespective of temporary per-process limits.
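
To illustrate the flaw, here is a simplified sketch of the kind of threshold check described above (not the actual Gecko tracker; the 400 MiB threshold is assumed from the report). If Windows momentarily reports a tiny per-process ullAvailPageFile, the check fires even though global commit space is plentiful.

  #include <windows.h>

  // Simplified illustration only; a hypothetical 400 MiB threshold.
  static const ULONGLONG kLowCommitThreshold = 400ULL * 1024 * 1024;

  bool IsCommitSpaceLow() {
    MEMORYSTATUSEX stat = {};
    stat.dwLength = sizeof(stat);
    if (!GlobalMemoryStatusEx(&stat)) {
      return false;
    }
    // ullAvailPageFile reflects a per-process view of available commit
    // space and can momentarily drop far below the global value,
    // producing a spurious low-memory signal.
    return stat.ullAvailPageFile < kLowCommitThreshold;
  }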

Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(jeff.delamater) → needinfo?(dmajor)
See Also: → 1554547
Blocks: 1554547
See Also: 1554124, 1554547

The fix for this issue needs to happen in three steps:

  • First we need to switch the OOM crash annotations that measure how much memory is available to the global values provided via GetPerformanceInfo (see the sketch below).
  • Then we need to gather enough information to re-evaluate the appropriate threshold for detecting low-memory scenarios with respect to commit space.
  • Once we have enough data we can switch the available memory tracker to also use GetPerformanceInfo to measure free memory, taking the new thresholds into account.

In the meantime I think we should turn off tab unloading on release (it can be done via a pref). I don't know how widespread the issue is but better safe than sorry.
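
For reference, a minimal sketch of the GetPerformanceInfo approach mentioned in the first and third steps (assumptions, not part of the actual patch: psapi.h linked against Psapi.lib, and available commit space computed as CommitLimit minus CommitTotal, both expressed in pages):

  #include <windows.h>
  #include <psapi.h>  // GetPerformanceInfo; link with Psapi.lib

  // Returns the globally available commit space in bytes, or 0 on failure.
  // CommitLimit and CommitTotal are system-wide, unlike the per-process
  // values reported by GlobalMemoryStatusEx().
  ULONGLONG GlobalAvailableCommitSpace() {
    PERFORMANCE_INFORMATION info = {};
    info.cb = sizeof(info);
    if (!GetPerformanceInfo(&info, sizeof(info))) {
      return 0;
    }
    return static_cast<ULONGLONG>(info.CommitLimit - info.CommitTotal) *
           info.PageSize;
  }

The trade-off, discussed further down in this bug, is that CommitLimit is a soft limit when the page file is allowed to grow.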

(In reply to Gabriele Svelto [:gsvelto] from comment #5)

Never mind, I figured out what's going on. It seems that - at least under recent versions of Windows 10 - every process is assigned its own, potentially lower, amount of available commit space. This amount is reported in the MEMORYSTATUSEX.ullAvailPageFile field and is what we use to detect whether we're running out of memory. On my machine I could create a scenario where that value would start around 1GiB, then drop to less than 10MiB (causing tabs to be unloaded), only to jump up again to 4GiB less than a second later.

I haven't heard of this behavior before. Can you reproduce these jumps reliably? When ullAvailPageFile drops, what happens to ullTotalPageFile and dwMemoryLoad?

Flags: needinfo?(dmajor)

[Tracking Requested - why for this release]: new significant regression

(In reply to Gabriele Svelto [:gsvelto] from comment #7)

The fix for this issue needs to happen in three steps:

  • First we need to switch the OOM crash annotations that measure how much memory is available to the global values provided via GetPerformanceInfo.
  • Then we need to gather enough information to re-evaluate the appropriate threshold for detecting low-memory scenarios with respect to commit space.
  • Once we have enough data we can switch the available memory tracker to also use GetPerformanceInfo to measure free memory, taking the new thresholds into account.

In the meantime I think we should turn off tab unloading on release (it can be done via a pref). I don't know how widespread the issue is but better safe than sorry.

Could you file a new bug for preffing off on release? Thanks

Flags: needinfo?(gsvelto)

(In reply to Pascal Chevrel:pascalc from comment #10)

Could you file a new bug for preffing off on release? Thanks

I filed bug 1558930.

Flags: needinfo?(gsvelto)
Blocks: 1554990
Depends on: 1558930

(In reply to David Major [:dmajor] from comment #8)

I haven't heard of this behavior before. Can you reproduce these jumps reliably?

No, I managed to reproduce it only once, where I saw ullTotalPageFile start at 1GiB then jump to 4GiB. In all the other tests I made, ullTotalPageFile was always the maximum amount for my machine (around 40GiB). I'm almost starting to doubt myself here... but besides my memory, there are at least two about:memory reports where we had low-commit-space events recorded on machines with plenty of available commit space.

When ullAvailPageFile drops, what happens to ullTotalPageFile and dwMemoryLoad?

I'm still trying to reproduce the issue; I'll check those out. In the meantime I'll switch the code to use GetPerformanceInfo, because that seems to be the correct way to measure global commit space. The only thing I could establish with certainty from the Windows documentation is that MEMORYSTATUSEX.ullAvailPageFile and MEMORYSTATUSEX.ullTotalPageFile are both per-process rather than global, and that their meaning has changed over time.

Anyway, I'll have to switch our low-memory detection logic to PROCESS_MEMORY_COUNTERS_EX, which provides a (presumably reliable) way to measure total commit space irrespective of temporary per-process limits.

(Since you mentioned GetPerformanceInfo in comment 7, I assume you mean PERFORMANCE_INFORMATION rather than PROCESS_MEMORY_COUNTERS_EX.)

Anyway, I'm looking at the doc for PERFORMANCE_INFORMATION, and for the CommitLimit field it says: "This number can change if memory is added or deleted, or if pagefiles have grown, shrunk, or been added. If the paging file can be extended, this is a soft limit."

That doesn't sound super reliable either. :(

(In reply to David Major [:dmajor] from comment #13)

(Since you mentioned GetPerformanceInfo in comment 7, I assume you mean PERFORMANCE_INFORMATION rather than PROCESS_MEMORY_COUNTERS_EX )

Anyway, I'm looking at the doc for PERFORMANCE_INFORMATION, and for the CommitLimit field it says: "This number can change if memory is added or deleted, or if pagefiles have grown, shrunk, or been added. If the paging file can be extended, this is a soft limit."

That doesn't sound super reliable either. :(

Yes, it's not optimal, but at least it's global. While I couldn't reproduce the behavior I saw yesterday, I confirmed that it is possible to adjust MEMORYSTATUSEX.ullAvailPageFile at runtime (up or down), so it's not unlikely that this is happening to users. I'll try to use CommitLimit instead and augment it with QueryMemoryResourceNotification. With enough annotations in the crash pings I should be able to establish which is the best tool to use.

I just wish Microsoft documentation was a little less light on details.
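
As an aside, a minimal sketch of how QueryMemoryResourceNotification can be queried for the system-wide low-memory signal (illustrative only, not the planned implementation):

  #include <windows.h>

  // Returns true only when Windows currently signals the system-wide
  // low-memory condition.
  bool SystemIsLowOnMemory() {
    HANDLE notification =
        CreateMemoryResourceNotification(LowMemoryResourceNotification);
    if (!notification) {
      return false;
    }
    BOOL lowMemory = FALSE;
    bool result =
        QueryMemoryResourceNotification(notification, &lowMemory) && lowMemory;
    CloseHandle(notification);
    return result;
  }

In practice the notification handle would be created once and then polled or waited on (e.g. with WaitForSingleObject) rather than recreated for every check.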

I guess we won't want to ship 68 with this.

Severity: normal → major

This uses GetPerformanceInfo() to measure how much commit space is available globally. Previously we relied on GlobalMemoryStatusEx(), but in spite of the function name the commit-space values it returns are per-process and thus unreliable. Available physical memory measurement has also been removed since it was never used and we don't plan on using it anytime soon.

This is being disabled in bug 1558930 so clearing 67/68 tracking flags.

Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/82daf65cbe5e
Properly detect low-commit space scenarios on Windows r=dmajor
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 69
Assignee: nobody → gsvelto

I guess the plan here is to just let this fix ride the trains?

(In reply to Ryan VanderMeulen [:RyanVM] from comment #20)

I guess the plan here is to just let this fix ride the trains?

Yes, this will allow me to gather more data about the appropriate low-memory threshold. I will then turn tab unloading back on and we'll let it ride the trains again.

Regressions: 1559760
Regressions: 1559761
Regressions: 1559851

Is there any useful data from nightly so far? Have we seen that this fixes the problem of tabs suspending too soon?

Yes, data from Nightly crash pings suggests that the measurements we get are more precise than before. Generally speaking, the threshold we hit before crashing appears to be lower, so when I re-enable tab unloading I'll bring the threshold down too. Before doing that I also want to gather data on QueryMemoryResourceNotification, in the hope that I can use it instead of our polling tracker.

Regressions: 1575216

Would/should this have either the "PERF" or "MLK" keywords?

Has Regression Range: --- → yes
