Closed Bug 1418296 Opened 4 years ago Closed 4 years ago

3.73% Resident Memory (linux32-stylo-disabled and win32) regression on push e74e4ec2cb39 (Wed Nov 15 2017)

Categories

(Core :: JavaScript: Internationalization API, defect, P2)

x86
Linux
defect

Tracking


RESOLVED INCOMPLETE
Tracking Status
firefox-esr52 --- unaffected
firefox58 --- unaffected
firefox59 --- wontfix
firefox60 --- wontfix

People

(Reporter: igoldan, Assigned: m_kato, NeedInfo)

References

(Blocks 1 open bug)

Details

(Keywords: perf, regression, Whiteboard: [MemShrink:P1][stylo])

Attachments

(1 file)

We have detected an awsy regression from push:

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=815b8640da1e3029dd680e4187b68852d36e2a4e&tochange=e74e4ec2cb39d737da0e16d0645e9dbdc564acd1

Because you authored one of the patches included in that push, we need your help to address this regression.

Regressions:

  4%  Resident Memory linux32-stylo-disabled opt stylo-disabled     391,463,779.41 -> 406,046,931.70


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=10587

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/AWSY/Tests
Component: Untriaged → JavaScript: Internationalization API
Product: Firefox → Core
:anba, please look into this regression. Can we fix it, or should we back it out / accept it?
Flags: needinfo?(andrebargull)
ICU is now so widely used in Firefox that a backout is ruled out from my point of view, because we always want to stay on the latest version for security fixes and up-to-date localization data. Fixing the regression is difficult, if not impossible, without any information about which data exactly caused the increased resident memory usage. It could be:
- New or changed localization data in ICU
- New or changed code in ICU
- Data structures in Gecko which cache or use data from ICU and now need to store more data because ICU provides more data
- Data structures in SpiderMonkey which cache or use data from ICU ...
- bug 1405993 also included a Unicode update, maybe additional Unicode data increased the memory usage
- and so forth.


:Waldo, do you think there's anything actionable for this bug?
Flags: needinfo?(andrebargull) → needinfo?(jwalden+bmo)
Priority: -- → P2
(In reply to André Bargull [:anba] from comment #2)
> Fixing the regression is difficult, if not impossible, without any information
> about which data exactly caused the increased resident memory usage.
All our subtests saw at least 3% regressions. These ones are bigger and more consistent:
https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-inbound&originalRevision=907fb4683dfef81fbb746035939ac8154b66dfe5&newProject=mozilla-inbound&newRevision=94111a55e8c3dfc3cb526650c7a18ea9e7f5d85e&originalSignature=31fd0678d8e90fb268261b5df34a7e08897caaf4&newSignature=31fd0678d8e90fb268261b5df34a7e08897caaf4&showOnlyImportant=1&framework=4
Whiteboard: [memshrink]
(In reply to André Bargull [:anba] from comment #2)
> ICU is now so widely used in Firefox that a backout is ruled out from my
> point of view, because we always want to stay on the latest version for
> security fixes and up-to-date localization data. Fixing the regression is
> difficult, if not impossible, without any information about which data
> exactly caused the increased resident memory usage. It could be:
> - New or changed localization data in ICU
> - New or changed code in ICU
> - Data structures in Gecko which cache or use data from ICU and now need to
> store more data because ICU provides more data
> - Data structures in SpiderMonkey which cache or use data from ICU ...
> - bug 1405993 also included a Unicode update, maybe additional Unicode data
> increased the memory usage
> - and so forth.
> 
> 
> :Waldo, do you think there's anything actionable for this bug?

I'd like to separate the need for security updates from large table updates. 

Kato-san: can you suggest a way to strip this down? Thx!
Flags: needinfo?(m_kato)
Separating security updates from ICU bumps overall would, I think, require us to audit every upstream commit to ICU and perhaps also to CLDR (since we could imagine changes in CLDR data tickling bugs in the ICU code that parses that data to build up ICU-specific serialized representations).  This is not something we have historically done.  I'm not sure it's something we reasonably *can* do.  We'd have to become experts in all aspects of ICU to understand which changes might or might not have security implications.  And the mere attempt to do so has its own risks.

Do we have any sorts of tools we could use to divine what precise thing allocated more memory here?  Could we add some sort of logging code to pre/post-update builds to dump ICU memory usage to figure this out maybe?
Flags: needinfo?(jwalden+bmo)
[Tracking Requested - why for this release]: large memory regression that is not understood
We can't just ship large memory regressions because nobody understands the code we're shipping.
> We can't just ship large memory regressions because nobody understands the code we're shipping.

Agree. I believe the solution is to invest in understanding ICU/CLDR better including:
 - build system integration
 - how we select what we build
 - what tables/languages get added/removed between versions of CLDR/ICU
 - what is the package size / memory / perf impact of an update

As far as I know, :m_kato is the only person occasionally investigating those issues, while :waldo, :anba and :jfkthame put in work to maintain our integration and prevent technical debt from accruing by keeping us updated to the latest ICU/CLDR/Unicode.

Steven Loomis of IBM has been working on tooling to improve selection/slicing of data tables when building ICU, and I filed bug 1402379 to get us to a slightly better place on how we handle ICU/CLDR/Intl API integration into SpiderMonkey and Gecko.
Flags: needinfo?(srl)
The ICU data is 600KB+ larger as of 60.1.  Hmm, I will create an en-US-only package to compare memory usage.

BTW, why is this issue linux32-stylo-disabled only?  (Why don't we run it on linux32?)
Flags: needinfo?(m_kato)
Did this regression show up on any other platforms besides Linux32?
Flags: needinfo?(igoldan)
Hmm, does it depend on the OS page size?  "stylo-disabled" isn't a real user configuration.  I wonder why we don't run AWSY on linux32 with stylo enabled.
stylo-disabled is a real config for android users and users who disable it for specific extensions.  I am curious why this isn't run for regular linux32 and it looks like we intentionally shut it off in bug 1365708.
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #13)
> stylo-disabled is a real config for android users and users who disable it
> for specific extensions.  I am curious why this isn't run for regular
> linux32 and it looks like we intentionally shut it off in bug 1365708.

Given that 32-bit Linux is such a small percentage of users, I think we decided it wasn't worth dedicating resources to perf testing. It does appear we regressed the win32 tabs open measures as well [1]. I'll spin up a few more runs to get better numbers.

FWIW we enabled stylo on android in bug 1366049. Eventually we're removing the old style system in bug 1395112, so disabling it won't be an option. The main advantage of testing with stylo disabled is that we can compare memory usage of the new style system to the old one; with 57 shipped, that's probably less important.

[1] https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-inbound&originalRevision=907fb4683dfef81fbb746035939ac8154b66dfe5&newProject=mozilla-inbound&newRevision=94111a55e8c3dfc3cb526650c7a18ea9e7f5d85e&originalSignature=c10a99a5c6a58850b7f9039fc7a4eb7b9ca2fc46&newSignature=c10a99a5c6a58850b7f9039fc7a4eb7b9ca2fc46&framework=4
I can track this for 59, and follow up next week to see if there is progress.
Assignee: nobody → m_kato
Can you reproduce the memory increase using only ICU api calls? We frequently run Valgrind against ICU to make sure all memory is cleaned up. So there should not be wasted memory.
Flags: needinfo?(srl)
(In reply to Steven R. Loomis from comment #16)
> Can you reproduce the memory increase using only ICU api calls? We
> frequently run Valgrind against ICU to make sure all memory is cleaned up.
> So there should not be wasted memory.

This isn't a leak. This is an increase in memory usage while Firefox is running.
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #14)
> (In reply to Joel Maher ( :jmaher) (UTC-5) from comment #13)
> > stylo-disabled is a real config for android users and users who disable it
> > for specific extensions.  I am curious why this isn't run for regular
> > linux32 and it looks like we intentionally shut it off in bug 1365708.
> 
> Given 32-bit Linux is such a small percentage of users I think we decided it
> wasn't worth  dedicating resources to perf testing.

I'm inclined to firefox-59:fix-optional this, then, but given this:

> FWIW we enabled stylo on android in bug 1366049. Eventually we're removing
> the old style system in bug 1395112, so disabling it won't be an option. The
> main advantage of testing with stylo disabled is that we can compare memory
> usage of the new style system to the old one; with 57 shipped, that's
> probably less important.

do we need to keep this open to track regressions with stylo on android?
This doesn't have anything to do with Stylo being enabled or not. We just happen to only run this particular configuration with Stylo disabled. Per comment 13, I think this affects all 32 bit systems.
Summary: 3.73% Resident Memory (linux32-stylo-disabled) regression on push e74e4ec2cb39 (Wed Nov 15 2017) → 3.73% Resident Memory (linux32-stylo-disabled and win32) regression on push e74e4ec2cb39 (Wed Nov 15 2017)
(In reply to Andrew McCreight [:mccr8] from comment #17)
> (In reply to Steven R. Loomis from comment #16)
> > Can you reproduce the memory increase using only ICU api calls? We
> > frequently run Valgrind against ICU to make sure all memory is cleaned up.
> > So there should not be wasted memory.
> 
> This isn't a leak. This is an increase in memory usage while Firefox is
> running.

OK, thanks.  Is this comparing ICU 59.1 to 60.1?  ICU does have configurable caching mechanisms; perhaps there's something that can be tweaked. A standalone test case (just calling ICU code) that simulates your usage and reproduces this difference would be helpful.
FWIW I think we're looking at more like a 45MiB regression in the 'Tabs Open' measure of RSS on Win32. That measure is pretty noisy so it's hard to tell for sure, but this is rather concerning.
I was also pointed to "Smaller files in breakiterator 9954" http://site.icu-project.org/download/60#TOC-Common-Changes — BreakIterator does use more memory in 60 in exchange for smaller data files (1.2 MB to 0.8 MB), because it caches break positions.
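To make the trade-off described above concrete, here is a minimal sketch (plain Python, not ICU code; the class and its behavior are invented for illustration) of an iterator that caches computed break positions: repeated scans get cheaper, but the cache grows with every position queried, which is how smaller data files can be traded for a larger resident footprint.

```python
# Illustrative only: a toy "break iterator" that finds word breaks at
# spaces and memoizes each result.  ICU 60's real BreakIterator caching
# is far more sophisticated; this just shows the space-for-time shape.
class CachingBreaker:
    def __init__(self, text):
        self.text = text
        self._cache = {}  # start index -> next break index

    def next_break(self, pos):
        if pos not in self._cache:            # compute once...
            nxt = self.text.find(" ", pos)
            self._cache[pos] = len(self.text) if nxt == -1 else nxt
        return self._cache[pos]               # ...then serve from cache

b = CachingBreaker("one two three")
assert b.next_break(0) == 3
assert b.next_break(4) == 7
assert b.next_break(8) == 13      # no more spaces: end of text
assert len(b._cache) == 3         # memory grows with positions queried
```

Every lookup after the first is a dictionary hit, but the cache is never evicted, so long-lived iterators keep the memory.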
This was independently filed upstream as https://ssl.icu-project.org/trac/ticket/13532
(In reply to Steven R. Loomis from comment #22)
> I was also pointed to "Smaller files in breakiterator 9954"
> http://site.icu-project.org/download/60#TOC-Common-Changes — BreakIterator
> does use more memory in 60 in exchange for smaller data files (1.2mb to 0.8
> mb), because it caches break positions.

BreakIterator might be used only by Intl.RelativeTimeFormat (bug 1270140), and we don't include the break rule data files in our icudt*.dat.
Since this bug is not caused by Stylo, I am adding a whiteboard tag to remove this bug from the Stylo bug queries.
Whiteboard: [memshrink] → [memshrink][stylo]
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6b8e2af73aa5bd730dd1179eb013f0f44e5bf6f3

Also, when I shrank the data file (en and root only) for testing, it didn't reduce resident memory.
Whiteboard: [memshrink][stylo] → [MemShrink:P1][stylo]
Are we able to do anything about this regression or do we just have to eat it since we want to take the new version of ICU?
Flags: needinfo?(m_kato)
(In reply to Andrew Overholt [:overholt] from comment #27)
> Are we able to do anything about this regression or do we just have to eat
> it since we want to take the new version of ICU?

Has anyone evaluated https://ssl.icu-project.org/trac/ticket/13532 compared to the Mozilla code? For example, are you "creating large numbers of break iterators" where they might be able to be reused? It's possible the caches have some control on them.
(In reply to Steven R. Loomis from comment #28)
> (In reply to Andrew Overholt [:overholt] from comment #27)
> > Are we able to do anything about this regression or do we just have to eat
> > it since we want to take the new version of ICU?
> 
> Has anyone evaluated https://ssl.icu-project.org/trac/ticket/13532 compared
> to the Mozilla code? For example, are you "creating large numbers of break
> iterators" where they might be able to be reused? It's possible the caches
> have some control on them.

There's a linked ticket with an initial patch… it is aimed at speeding up initial time, but may reduce heap also.
According to the try server, this might be a regression from the following ICU landing.

svn log -r r40076
------------------------------------------------------------------------
r40076 | roubert | 2017-04-27 05:12:01 +0900 (Thu, 27 Apr 2017) | 1 line

ticket:13094: Handle empty language subtag in parseTagString().
------------------------------------------------------------------------
(In reply to Steven R. Loomis from comment #28)
> (In reply to Andrew Overholt [:overholt] from comment #27)
> > Are we able to do anything about this regression or do we just have to eat
> > it since we want to take the new version of ICU?
> 
> Has anyone evaluated https://ssl.icu-project.org/trac/ticket/13532 compared
> to the Mozilla code? For example, are you "creating large numbers of break
> iterators" where they might be able to be reused? It's possible the caches
> have some control on them.

We don't use the break rules.  So even if I apply that patch, it won't improve anything.
(In reply to Makoto Kato [:m_kato] from comment #31)
> This might be regression by the following ICU landing according to try
> server.
> 
> svn log -r r40076
> ------------------------------------------------------------------------
> r40076 | roubert | 2017-04-27 05:12:01 +0900 (Thu, 27 Apr 2017) | 1 line
> 
> ticket:13094: Handle empty language subtag in parseTagString().
> ------------------------------------------------------------------------

http://bugs.icu-project.org/trac/changeset/40076 caused a 3.73% increase in memory usage? Anyway, we need information about the content of the used memory, i.e. what you're actually seeing.
Flags: needinfo?(m_kato)
(In reply to Steven R. Loomis from comment #33)
> (In reply to Makoto Kato [:m_kato] from comment #31)
> > This might be regression by the following ICU landing according to try
> > server.
> > 
> > svn log -r r40076
> > ------------------------------------------------------------------------
> > r40076 | roubert | 2017-04-27 05:12:01 +0900 (Thu, 27 Apr 2017) | 1 line
> > 
> > ticket:13094: Handle empty language subtag in parseTagString().
> > ------------------------------------------------------------------------
> 
> http://bugs.icu-project.org/trac/changeset/40076 caused a 3.73% increase in
> memory usage? Anyway, need info about the content of the used memory, what
> you're actually seeing.

As far as our test infra shows, it causes this issue.  When I backed this out from 60.2, the regression was resolved.  Do you know why Resident Memory (see https://wiki.mozilla.org/AWSY/Tests) is increased by r40076?
Flags: needinfo?(srl)
http://people.mozilla.org/%7Ejmaher/taloszips/zips/tp5n.zip is no longer available, so where is the current tp5n.zip?
Flags: needinfo?(jmaher)
ICU 60.2
Resident Memory opt stylo-disabled: 468,086,541.99

ICU 60.2 + backed out r40076
Resident Memory opt stylo-disabled: 457,160,950.85

Although I added an assertion for the condition (*langLength == 0 and _isIDSeparator(*position) is true), the build never stops at that assertion.  This might be a compiled-code issue.
you can find tp5n.zip here:
https://github.com/rwood-moz/talos-pagesets/
Flags: needinfo?(jmaher)
:waldo, according to the try server's AWSY data, this is caused by http://bugs.icu-project.org/trac/changeset/40076 (http://bugs.icu-project.org/trac/ticket/13094).  I added an assertion on (*langLength == 0 and _isIDSeparator(*position) is true), and tp5n never reaches this condition.  So I think this issue depends on the code generated by the compiler.

Of course, when we back out that change, the Resident Memory usage goes back down.  But backing it out is risky, given ICU's ticket.  And AWSY on linux32 is already disabled.  So I recommend this be wontfixed.  What do you think?
Flags: needinfo?(jwalden+bmo)
(In reply to Makoto Kato [:m_kato] from comment #39)
> :waldo, according to the try server's AWSY data, this is caused by
> http://bugs.icu-project.org/trac/changeset/40076
> (http://bugs.icu-project.org/trac/ticket/13094).  I added an assertion on
> (*langLength == 0 and _isIDSeparator(*position) is true), and tp5n never
> reaches this condition.  So I think this issue depends on the code generated
> by the compiler.
> 
> Of course, when we back out that change, the Resident Memory usage goes
> back down.  But backing it out is risky, given ICU's ticket.  And AWSY on
> linux32 is already disabled.  So I recommend this be wontfixed.  What do
> you think?

This is an issue on win32 as well (see comment 21). Given we know the exact patch that is regressing it seems like we can figure out a fix.
We're running out of time in the Fx59 development cycle to fix this. Are there any recent updates?
I grabbed a copy of the try repository plus the two revs tested to diagnose the RSS regression.  This is the full diff.  Nothing in this except the one changed line.

Looking back at the original change in Trac and the discussion of it, the patch logic checks out.  It's definitely the correct and proper change for what the code is trying to do.  Reverting it would actively reintroduce the problem identified in that change.  Cherry-picking a revert of this one change would not be correct.

I don't see a plausible mechanism by which this one-line change leads to a 25-45MB regression.  The statement that was moved from an else-arm to happen all the time should only be hit if a tag is parsed that lacks a language component -- but this shouldn't happen in the browser at all for JS on the web.  Every language tag SpiderMonkey hands off to ICU is first syntax-checked to verify that the tag is structurally valid per BCP47.  BCP47 (in contrast to the structure of the $LANG environment variable, which this function was additionally written to process) requires a language component in language tags.  ICU probably does use this function to parse $LANG on Firefox's behalf -- but this would likely happen only rarely.  Maybe only when getting the default locale in Intl code (and we actually cache that value pretty hard, so even that would not happen regularly).  Besides, I rather doubt treeherder is running with one of these weird language-less $LANG values in the first place.
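For illustration, here is a minimal sketch (Python; the function is hypothetical, and real BCP47 validation is far more involved) of the kind of syntax check described above: a tag with an empty language subtag is rejected before it could ever reach ICU's tag parsing.

```python
import re

# Hypothetical, simplified check: RFC 5646 (BCP47) requires a language
# tag to start with a primary language subtag of 2-8 ASCII letters.
# SpiderMonkey's real validator is much more thorough; this only shows
# why a tag lacking a language component (e.g. "-US", as a $LANG-style
# string can be) never survives to ICU's parseTagString.
def has_language_subtag(tag):
    return re.match(r"[A-Za-z]{2,8}(-|$)", tag) is not None

assert has_language_subtag("en-US")
assert has_language_subtag("ja")
assert not has_language_subtag("-US")   # empty language subtag
assert not has_language_subtag("")
```

Under this shape of check, only $LANG-style inputs that bypass the web-facing validation could exercise the moved statement, matching the argument that the change should be hit rarely in the browser.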

I also repushed the two revs in comment 34 to double-check the RSS stats:

Before:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8a9233cfcfc9186bc336d87bc284a4e6fedd868e&selectedJob=161078449
After:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=35930e7a008de82c0cce926f939772ede5ffc067

The linux32 sy numbers are all over the map.  I don't see a consistent regression in them at all.  Nor do I see one in win32 numbers.

m_kato also tells me he set up tp5 locally to try to find the regression alleged in comment 14 and comment 21 and wasn't seeing it.

Given that https://treeherder.mozilla.org/perf.html#/graphs?timerange=7776000&series=mozilla-inbound,1557588,1,4&series=mozilla-inbound,1557638,1,4&series=mozilla-inbound,1308546,1,4&selected=mozilla-inbound,1308546,284784,380134165,4 isn't really revealing a regression here, and given that the mechanism by which a one-line change would result in such vast memory swing is also unclear -- at worst I would expect cost linear in the number of almost-always-small language tags parsed -- I am increasingly convinced we're just shadowboxing here, and there is not and never was an issue.

So: at this point this whole bug feels like an INCOMPLETE or INVALID to me.  Either the two identified revs are not the cause of the regression, or there isn't a regression at all.  And given the long-term graph doesn't show a regression, that seems much likelier to me.
Flags: needinfo?(jwalden+bmo)
Alright, I guess we'll close this and hope that this is indeed a fluke. Thanks for investigating.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → INCOMPLETE