32-bit Firefox on Win7 uses 30MB more memory than 64-bit Firefox on Win10

Status: NEW (defect, priority P3, severity normal)
Opened: 11 months ago; last updated: 2 months ago
People: Reporter: erahm; Assignee: Unassigned
Tracking: Blocks 1 bug
Version: 3; Platform: All; OS: Windows
Firefox tracking flags: not tracked

(Whiteboard: [overhead:30MB])

Note: Filing under the AWSY component for now until we can track down the root cause.

The AWSY "Base Content Resident Unique" measure clearly shows that a 32-bit Firefox build on a 32-bit Windows 7 image uses ~30MB more memory than a 64-bit Firefox build on a 64-bit Windows 10 image [1]. Conventional wisdom is that a 64-bit build should use more memory than a 32-bit build, so something is quite off (our explicit numbers show this [2]). This discrepancy is somewhat concerning for Fission, where an additional 30MB of overhead is unacceptable.

In this bug we'd like to at least identify what's going on and possibly come up with some mitigations.

Given that this comparison has a few variables (operating system, OS bitness, build bitness), we have a few questions to answer:

#1 - Is this limited to Win7? That is, do we see a similar issue on Win10
     w/ a 32-bit build?
#2 - Is this limited to 32-bit OS builds? That is, can we repro on Windows 7
     64-bit? Can we repro on Windows 10 32-bit (yes, this is a thing)?
#3 - Is this limited to 32-bit Firefox builds? That is, can we repro with a
     64-bit build on Windows 7 64-bit?

These questions can help us identify how important this is (for example if it's limited to Firefox 32-bit on Windows 7 32-bit we might care less).

There are a few snags before we can fully investigate these questions:
  #1 - We don't have a Windows 7 64-bit VM in automation AFAICT
  #2 - We don't have a Windows 10 32-bit VM 

#ateam or #treeherder folks might be able to help out with that.

Once we've isolated where this behavior exists we want to actually figure out what's going on. The current hunch is this is related to how ASLR is implemented on Windows 7 vs Windows 10. Another guess was this has something to do with the way libraries are specifying load addresses.

One suggestion has been to use VMMap [3] to compare mappings between runs.
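To make that comparison concrete, here's a rough Python sketch (not from the bug) that diffs per-region-type committed sizes between two exported VMMap summaries. The column names ("Type", "Committed") and CSV layout are assumptions for illustration; they would need adjusting to match VMMap's actual export format.

```python
# Hypothetical sketch: diff per-type committed sizes between two VMMap
# summary exports, to spot which region type accounts for the extra ~30MB.
# Column names "Type" and "Committed" are assumed, not verified.
import csv
from collections import defaultdict

def load_committed(path):
    """Sum committed bytes per region type from a (hypothetical) CSV export."""
    totals = defaultdict(int)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            totals[row["Type"]] += int(row["Committed"] or 0)
    return totals

def diff_mappings(before_path, after_path):
    """Return {region type: delta in committed bytes} between two runs."""
    before = load_committed(before_path)
    after = load_committed(after_path)
    return {t: after.get(t, 0) - before.get(t, 0)
            for t in set(before) | set(after)}
```

A region type with a large positive delta on the Win7 run would be the first place to look.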

This test can be run in automation with:

|./mach try -b o -p win32,win64 -u awsy-e10s -t none --rebuild 5|

and can be run locally with:

|./mach awsy-test testing/awsy/awsy/test_base_memory_usage.py|

[1] https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-inbound,1684810,1,4&series=mozilla-inbound,1684816,1,4
[2] https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-central,1707041,1,4&series=mozilla-central,1707039,1,4
[3] https://docs.microsoft.com/en-us/sysinternals/downloads/vmmap
(In reply to Eric Rahm [:erahm] from comment #0)
> #1 - Is this limited to Win7, as in do we see a similar issue on Win10
>      w/ a 32-bit build?

We've already answered this. We don't see this with a 32 bit build on (64 bit) Windows 10.
(In reply to Kris Maglione [:kmag] from comment #1)
> (In reply to Eric Rahm [:erahm] from comment #0)
> > #1 - Is this limited to Win7, as in do we see a similar issue on Win10
> >      w/ a 32-bit build?
> 
> We've already answered this. We don't see this with a 32 bit build on (64
> bit) Windows 10.

Sure but I don't have a link to prove that :) Can you dig up the try results?
David, do you think you can look into this?
Flags: needinfo?(dmajor)
(In reply to Eric Rahm [:erahm] from comment #2)
> (In reply to Kris Maglione [:kmag] from comment #1)
> > (In reply to Eric Rahm [:erahm] from comment #0)
> > > #1 - Is this limited to Win7, as in do we see a similar issue on Win10
> > >      w/ a 32-bit build?
> > 
> > We've already answered this. We don't see this with a 32 bit build on (64
> > bit) Windows 10.
> 
> Sure but I don't have a link to prove that :) Can you dig up the try results?

https://treeherder.mozilla.org/#/jobs?repo=try&revision=de4d9ece2c71
Another data point: There was about a week when the 32 bit Windows 7 numbers were bi-modal between reasonable values and the current numbers:

https://treeherder.mozilla.org/perf.html#/graphs?timerange=31536000&series=mozilla-central,1685127,1,4&series=try,1685822,1,4&highlightedRevisions=368ae05266bd&selected=mozilla-central,1685127,349747,532389285,4
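As a rough illustration of quantifying that bi-modality, here's a hedged Python sketch (not part of the bug's tooling) that splits a series of measurements into low and high clusters at the midpoint of the range and reports the gap between the modes:

```python
# Sketch: partition bi-modal measurements into "low" and "high" clusters
# at the midpoint of the range, and report each cluster's mean plus the
# gap between them. Assumes the two modes are well separated, as in the
# graph linked above.
def split_modes(samples):
    threshold = (min(samples) + max(samples)) / 2
    low = [s for s in samples if s <= threshold]
    high = [s for s in samples if s > threshold]
    mean = lambda xs: sum(xs) / len(xs)
    return mean(low), mean(high), mean(high) - mean(low)
```

On the AWSY numbers in question, the gap between the two modes would come out around the ~30MB overhead being tracked here.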
(In reply to Kris Maglione [:kmag] from comment #5)
> Another data point: There was about a week when the 32 bit Windows 7 numbers
> were bi-modal between reasonable values and the current numbers:
> 
> https://treeherder.mozilla.org/perf.html#/graphs?timerange=31536000&series=mozilla-central,1685127,1,4&series=try,1685822,1,4&highlightedRevisions=368ae05266bd&selected=mozilla-central,1685127,349747,532389285,4

That was a test harness issue caused by bug 1395540, and fixed by bug 1470831. TLDR, disabling sandboxing caused a memory weirdness.
I think the methodical thing to do here would be to run this awsy-test locally on four identical-spec VMs on the same host -- {32,64}-bit Win{7,10} -- with both bitnesses of Firefox where available. I'm not entirely confident that we can combine separate deductions made from different machines.

Then when we've identified the minimal-pair that demonstrates the issue, run xperf/WPA with VirtualAlloc tracing on both of them.
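The comparison matrix described above can be enumerated mechanically; this sketch crosses the four Windows VMs with both Firefox bitnesses and drops the one impossible combination (a 64-bit build on a 32-bit OS):

```python
# Enumerate the OS/build comparison matrix from the comment above:
# {32,64}-bit Win{7,10} x {32,64}-bit Firefox, minus the combination
# that can't exist (64-bit Firefox on a 32-bit OS).
from itertools import product

def comparison_matrix():
    combos = []
    for win, os_bits, build_bits in product(("Win7", "Win10"), (32, 64), (32, 64)):
        if build_bits > os_bits:
            continue  # a 64-bit build won't run on a 32-bit OS
        combos.append((win, os_bits, build_bits))
    return combos
```

That yields six runs total, which is the full set needed to isolate the minimal pair.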

I am not opposed to doing this myself, but it may take me longer than you would like.
Flags: needinfo?(dmajor)
(In reply to David Major [:dmajor] from comment #7)
> I am not opposed to doing this myself, but it may take me longer than you
> would like.

It's okay if it takes some time, we need to figure it out but there are plenty of other projects going on as well so it's not blocking anything.
Flagging as 30MB overhead for now; it's not clear whether this is Win7/Win8 specific, 32-bit-on-Win7 specific, or 32-bit-OS specific. Win7 is 43% [1] of our user base; add in Win8 and we're at 50%, so this is pretty important.

30% of our users are using a 32-bit browser (the OS breakdown isn't clear on that number), 18% are using a 32-bit OS (again the OS breakdown isn't clear).

[1] https://data.firefox.com/dashboard/hardware
Whiteboard: [overhead:30MB]
Update: dmajor was unable to repro on a VM by running manually. I tried on a Win7 loaner and couldn't repro manually; I also tested via mozharness and was unable to repro.

Other failed attempts included:
 - reload the process w/ a clean profile and then test
 - reload the process w/ the same profile
 - remove our call to minimize memory (GC and CC)
 - simulate user activity

We discussed adding telemetry to see if this happens in the wild.
Depends on: 1494827

Telemetry has confirmed that this is limited to 32-bit builds. It's most prevalent on Windows 7/8, but there is a small cohort on Windows 10 as well. Both the number of Win7 users (37%) and 32-bit browser users (26%) continue to fall. On beta 67 we're seeing roughly 3.5% of 32-bit samples showing this bump. Combined it looks something like 2% of Windows samples. Telemetry for total memory usage doesn't show a correlating spike which seems to imply we don't see this behavior in all content processes.

Interestingly, at least in our test infrastructure, initial results from the font sharing work in bug 1514869 ended up clearing the large bump. It's possible that by being the first 32-bit program to access font data we take some sort of large hit loading up various windows dependencies, but it's not clear exactly what is going on.

Priority: -- → P3