Closed
Bug 1418296
Opened 7 years ago
Closed 7 years ago
3.73% Resident Memory (linux32-stylo-disabled and win32) regression on push e74e4ec2cb39 (Wed Nov 15 2017)
Categories
(Core :: JavaScript: Internationalization API, defect, P2)
Tracking
RESOLVED INCOMPLETE
Version | Tracking | Status
---|---|---
firefox-esr52 | --- | unaffected
firefox58 | --- | unaffected
firefox59 | --- | wontfix
firefox60 | --- | wontfix
People
(Reporter: igoldan, Assigned: m_kato, NeedInfo)
References
(Blocks 1 open bug)
Details
(Keywords: perf, regression, Whiteboard: [MemShrink:P1][stylo])
Attachments
(1 file)
We have detected an awsy regression from push:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=815b8640da1e3029dd680e4187b68852d36e2a4e&tochange=e74e4ec2cb39d737da0e16d0645e9dbdc564acd1
You are the author of one of the patches included in that push, and we need your help to address this regression.
Regressions:
4% Resident Memory linux32-stylo-disabled opt stylo-disabled 391,463,779.41 -> 406,046,931.70
You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=10587
On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the jobs in a pushlog format.
To learn more about the regressing test(s), please see: https://wiki.mozilla.org/AWSY/Tests
Reporter
Updated•7 years ago
Component: Untriaged → JavaScript: Internationalization API
Product: Firefox → Core
Reporter
Comment 1•7 years ago
:anba, please look into this regression. Can we fix this, or should we back it out or accept it?
Flags: needinfo?(andrebargull)
Comment 2•7 years ago
ICU is nowadays too widely used in Firefox, which rules out a backout from my point of view, because we always want to stay on the latest version for security fixes and to use up-to-date localization data. Fixing the regression is difficult/impossible without having any information about which data exactly caused the increased resident memory usage. It could be:
- New or changed localization data in ICU
- New or changed code in ICU
- Data structures in Gecko which cache or use data from ICU and now need to store more data because ICU provides more data
- Data structures in SpiderMonkey which cache or use data from ICU ...
- bug 1405993 also included a Unicode update, maybe additional Unicode data increased the memory usage
- and so forth.
:Waldo, do you think there's anything actionable for this bug?
Flags: needinfo?(andrebargull) → needinfo?(jwalden+bmo)
Updated•7 years ago
Priority: -- → P2
Reporter
Comment 3•7 years ago
(In reply to André Bargull [:anba] from comment #2)
> Fixing the regression is difficult/impossible without having any information about which data
> exactly caused the increased resident memory usage.
All our subtests saw at least 3% regressions. These ones are bigger and more consistent:
https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-inbound&originalRevision=907fb4683dfef81fbb746035939ac8154b66dfe5&newProject=mozilla-inbound&newRevision=94111a55e8c3dfc3cb526650c7a18ea9e7f5d85e&originalSignature=31fd0678d8e90fb268261b5df34a7e08897caaf4&newSignature=31fd0678d8e90fb268261b5df34a7e08897caaf4&showOnlyImportant=1&framework=4
Updated•7 years ago
Whiteboard: [memshrink]
Comment 4•7 years ago
(In reply to André Bargull [:anba] from comment #2)
> ICU is nowadays too widely used in Firefox which rules out a backout from my
> point of view, because we always want to stay on the latest version for
> security fixes and to use up-to-date localization data. Fixing the
> regression is difficult/impossible without having any information about which data
> exactly caused the increased resident memory usage. It could be:
> - New or changed localization data in ICU
> - New or changed code in ICU
> - Data structures in Gecko which cache or use data from ICU and now need to
> store more data because ICU provides more data
> - Data structures in SpiderMonkey which cache or use data from ICU ...
> - bug 1405993 also included a Unicode update, maybe additional Unicode data
> increased the memory usage
> - and so forth.
>
>
> :Waldo, do you think there's anything actionable for this bug?
I'd like to separate the need for security updates from large table updates.
Kato-san: can you suggest a way to strip this down? Thx!
Flags: needinfo?(m_kato)
Updated•7 years ago
Blocks: MemoryOverhead
Comment 5•7 years ago
Separating out security updates from ICU bumps overall, I think, would require us to audit every upstream commit to ICU and perhaps also to CLDR (for we could imagine changes in CLDR data tickling bugs in ICU code parsing that, to build up ICU-specific serialized representations). This is not something we have historically done. I'm not sure it's something we reasonably *can* do. We'd have to become experts in understanding all aspects of ICU to understand what changes might or might not have security implications. And the mere attempt to do so has its own risks.
Do we have any sorts of tools we could use to divine what precise thing allocated more memory here? Could we add some sort of logging code to pre/post-update builds to dump ICU memory usage to figure this out maybe?
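One possible way to get at that (a hypothetical sketch, not code that exists in the tree; InstallIcuMemoryCounters and ReportIcuHeap are made-up names): ICU lets the embedder install its own allocator through u_setMemoryFunctions() before any other ICU call, so a pre/post-update build could log exactly how many bytes ICU itself is holding at the points AWSY measures.

// Hypothetical instrumentation sketch, not an actual Gecko patch: route
// ICU's heap traffic through counting wrappers so we can log ICU-owned
// bytes before and after the update.
#include <atomic>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

#include "unicode/uclean.h"
#include "unicode/utypes.h"

static std::atomic<size_t> gIcuLiveBytes{0};
static constexpr size_t kHeader = alignof(std::max_align_t);

// Each block is over-allocated by a small header that remembers its size.
static void* U_CALLCONV CountingAlloc(const void*, size_t size) {
  char* raw = static_cast<char*>(malloc(size + kHeader));
  if (!raw) return nullptr;
  *reinterpret_cast<size_t*>(raw) = size;
  gIcuLiveBytes += size;
  return raw + kHeader;
}

static void U_CALLCONV CountingFree(const void*, void* mem) {
  if (!mem) return;
  char* raw = static_cast<char*>(mem) - kHeader;
  gIcuLiveBytes -= *reinterpret_cast<size_t*>(raw);
  free(raw);
}

static void* U_CALLCONV CountingRealloc(const void* ctx, void* mem, size_t size) {
  if (!mem) return CountingAlloc(ctx, size);
  char* raw = static_cast<char*>(mem) - kHeader;
  size_t oldSize = *reinterpret_cast<size_t*>(raw);
  char* grown = static_cast<char*>(realloc(raw, size + kHeader));
  if (!grown) return nullptr;
  *reinterpret_cast<size_t*>(grown) = size;
  gIcuLiveBytes += size;
  gIcuLiveBytes -= oldSize;
  return grown + kHeader;
}

// Must run before any other ICU call (e.g. before u_init()).
void InstallIcuMemoryCounters() {
  UErrorCode status = U_ZERO_ERROR;
  u_setMemoryFunctions(nullptr, CountingAlloc, CountingRealloc, CountingFree,
                       &status);
}

void ReportIcuHeap(const char* label) {
  fprintf(stderr, "ICU heap (%s): %zu bytes\n", label, gIcuLiveBytes.load());
}

Comparing the logged number at the "tabs open" checkpoint between the two ICU revisions would at least tell us whether the extra RSS lives inside ICU's own allocations or elsewhere.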
Flags: needinfo?(jwalden+bmo)
Comment 6•7 years ago
[Tracking Requested - why for this release]: large memory regression that is not understood
tracking-firefox59: --- → ?
Comment 7•7 years ago
We can't just ship large memory regressions because nobody understands the code we're shipping.
Comment 8•7 years ago
> We can't just ship large memory regressions because nobody understands the code we're shipping.
Agree. I believe the solution is to invest in understanding ICU/CLDR better including:
- build system integration
- how we select what we build
- what tables/languages get added/removed between versions of CLDR/ICU
- what is the package size / memory / perf impact of an update
As far as I know, :m_kato is the only person occasionally investigating those issues, while :waldo, :anba and :jfkthame put in work to maintain our integration and prevent technical debt from accruing by updating us to the latest ICU/CLDR/Unicode.
Steven Loomis of IBM has been working on tooling to improve selection/slicing of data tables when building ICU, and I filed bug 1402379 to get us to a slightly better place on how we handle ICU/CLDR/Intl API integration into SpiderMonkey and Gecko.
Flags: needinfo?(srl)
Assignee
Comment 9•7 years ago
ICU data grew by 600KB+ with 60.1. Hmm, I will create an en-US-only package to compare memory usage.
BTW, why does this issue affect linux32-stylo-disabled only? (Why don't we run it on linux32?)
Flags: needinfo?(m_kato)
Comment 10•7 years ago
Did this regression show up on any other platforms besides Linux32?
Flags: needinfo?(igoldan)
Comment 11•7 years ago
Flags: needinfo?(igoldan)
Assignee
Comment 12•7 years ago
Hmm, does it depend on the OS page size? "stylo-disabled" isn't a real user configuration. I wonder why we don't run AWSY on linux (stylo enabled).
Comment 13•7 years ago
stylo-disabled is a real config for android users and users who disable it for specific extensions. I am curious why this isn't run for regular linux32 and it looks like we intentionally shut it off in bug 1365708.
Comment 14•7 years ago
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #13)
> stylo-disabled is a real config for android users and users who disable it
> for specific extensions. I am curious why this isn't run for regular
> linux32 and it looks like we intentionally shut it off in bug 1365708.
Given 32-bit Linux is such a small percentage of users I think we decided it wasn't worth dedicating resources to perf testing. It does appear we regressed the win32 tabs open measures as well [1]. I'll spin up a few more runs to get better numbers.
FWIW we enabled stylo on android in bug 1366049. Eventually we're removing the old style system in bug 1395112, so disabling it won't be an option. The main advantage to testing with stylo disabled is that we can compare memory usage of the new style system to the old one; with 57 shipped, that's probably less important.
[1] https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-inbound&originalRevision=907fb4683dfef81fbb746035939ac8154b66dfe5&newProject=mozilla-inbound&newRevision=94111a55e8c3dfc3cb526650c7a18ea9e7f5d85e&originalSignature=c10a99a5c6a58850b7f9039fc7a4eb7b9ca2fc46&newSignature=c10a99a5c6a58850b7f9039fc7a4eb7b9ca2fc46&framework=4
Comment 15•7 years ago
I can track this for 59, and follow up next week to see if there is progress.
Assignee
Updated•7 years ago
Assignee: nobody → m_kato
Comment 16•7 years ago
Can you reproduce the memory increase using only ICU api calls? We frequently run Valgrind against ICU to make sure all memory is cleaned up. So there should not be wasted memory.
Flags: needinfo?(srl)
Comment 17•7 years ago
(In reply to Steven R. Loomis from comment #16)
> Can you reproduce the memory increase using only ICU api calls? We
> frequently run Valgrind against ICU to make sure all memory is cleaned up.
> So there should not be wasted memory.
This isn't a leak. This is an increase in memory usage while Firefox is running.
Comment 18•7 years ago
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #14)
> (In reply to Joel Maher ( :jmaher) (UTC-5) from comment #13)
> > stylo-disabled is a real config for android users and users who disable it
> > for specific extensions. I am curious why this isn't run for regular
> > linux32 and it looks like we intentionally shut it off in bug 1365708.
>
> Given 32-bit Linux is such a small percentage of users I think we decided it
> wasn't worth dedicating resources to perf testing.
I'm inclined to firefox-59:fix-optional this, then, but given this:
> FWIW we enabled stylo on android in bug 1366049. Eventually we're removing
> the old style system in bug 1395112, so disabling it won't be an option. The
> main advantage to testing with stylo disabled is that we can compare memory
> usage of the new style system to the old one; with 57 shipped, that's probably
> less important.
do we need to keep this open to track regressions with stylo on android?
Comment 19•7 years ago
This doesn't have anything to do with Stylo being enabled or not. We just happen to only run this particular configuration with Stylo disabled. Per comment 13, I think this affects all 32 bit systems.
Updated•7 years ago
Summary: 3.73% Resident Memory (linux32-stylo-disabled) regression on push e74e4ec2cb39 (Wed Nov 15 2017) → 3.73% Resident Memory (linux32-stylo-disabled and win32) regression on push e74e4ec2cb39 (Wed Nov 15 2017)
Comment 20•7 years ago
(In reply to Andrew McCreight [:mccr8] from comment #17)
> (In reply to Steven R. Loomis from comment #16)
> > Can you reproduce the memory increase using only ICU api calls? We
> > frequently run Valgrind against ICU to make sure all memory is cleaned up.
> > So there should not be wasted memory.
>
> This isn't a leak. This is an increase in memory usage while Firefox is
> running.
OK, thanks. This is comparing ICU 59.1 to 60.1 ? ICU does have configurable caching mechanisms, perhaps there's something that can be tweaked. A separate use case (just calling ICU code) that simulates your use case and reproduces this difference would be helpful.
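To sketch what such a separate use case might look like (a hypothetical throwaway harness, not anything in the tree): instantiate the same ICU services the Intl API touches for every available locale, hold them all alive, and compare the process RSS between an ICU 59.1 build and an ICU 60.1 build of the same program.

// Hypothetical standalone reproducer (assumed harness): exercise collation,
// date and number formatting for every locale, then park so RSS can be read.
#include <cstdio>
#include <memory>
#include <vector>

#include "unicode/coll.h"
#include "unicode/datefmt.h"
#include "unicode/locid.h"
#include "unicode/numfmt.h"
#include "unicode/utypes.h"

using icu::Collator;
using icu::DateFormat;
using icu::Locale;
using icu::NumberFormat;

int main() {
  int32_t localeCount = 0;
  const Locale* locales = Locale::getAvailableLocales(localeCount);

  std::vector<std::unique_ptr<Collator>> collators;
  std::vector<std::unique_ptr<DateFormat>> dateFormats;
  std::vector<std::unique_ptr<NumberFormat>> numberFormats;

  // These are the services Intl.Collator / Intl.DateTimeFormat /
  // Intl.NumberFormat ultimately sit on top of.
  for (int32_t i = 0; i < localeCount; ++i) {
    UErrorCode status = U_ZERO_ERROR;
    collators.emplace_back(Collator::createInstance(locales[i], status));
    dateFormats.emplace_back(DateFormat::createDateTimeInstance(
        DateFormat::kFull, DateFormat::kFull, locales[i]));
    numberFormats.emplace_back(NumberFormat::createInstance(locales[i], status));
    if (U_FAILURE(status)) return 1;
  }

  // Park here and read RSS (e.g. /proc/self/status on Linux, the working set
  // on Windows); diff the two ICU versions while everything is held alive.
  getchar();
  return 0;
}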
Comment 21•7 years ago
FWIW I think we're looking at more like a 45MiB regression in the 'Tabs Open' measure of RSS on Win32. That measure is pretty noisy so it's hard to tell for sure, but this is rather concerning.
Comment 22•7 years ago
I was also pointed to "Smaller files in breakiterator 9954" http://site.icu-project.org/download/60#TOC-Common-Changes — BreakIterator does use more memory in 60 in exchange for smaller data files (1.2 MB to 0.8 MB), because it caches break positions.
Updated•7 years ago
status-firefox58: --- → unaffected
Comment 23•7 years ago
This was independently filed upstream as https://ssl.icu-project.org/trac/ticket/13532
Assignee
Comment 24•7 years ago
(In reply to Steven R. Loomis from comment #22)
> I was also pointed to "Smaller files in breakiterator 9954"
> http://site.icu-project.org/download/60#TOC-Common-Changes — BreakIterator
> does use more memory in 60 in exchange for smaller data files (1.2mb to 0.8
> mb), because it caches break positions.
BreakIterator might only be used by Intl.RelativeTimeFormat (bug 1270140), and we don't include the break rule data file in our icudt*.dat.
Comment 25•7 years ago
Since this bug is not caused by Stylo, I am adding a whiteboard tag to remove this bug from the Stylo bug queries.
Whiteboard: [memshrink] → [memshrink][stylo]
Assignee
Comment 26•7 years ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6b8e2af73aa5bd730dd1179eb013f0f44e5bf6f3
Also, when I shrink the data file (en and root only) for testing, it doesn't reduce resident memory.
Updated•7 years ago
Whiteboard: [memshrink][stylo] → [MemShrink:P1][stylo]
Comment 27•7 years ago
Are we able to do anything about this regression or do we just have to eat it since we want to take the new version of ICU?
Flags: needinfo?(m_kato)
Comment 28•7 years ago
(In reply to Andrew Overholt [:overholt] from comment #27)
> Are we able to do anything about this regression or do we just have to eat
> it since we want to take the new version of ICU?
Has anyone evaluated https://ssl.icu-project.org/trac/ticket/13532 compared to the Mozilla code? For example, are you "creating large numbers of break iterators" where they might be able to be reused? It's possible the caches have some control on them.
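For what it's worth, the reuse pattern being suggested would look roughly like this (an illustrative sketch only; CountBreaks is a made-up helper, and this is not how the Gecko/SpiderMonkey code is actually structured): create one BreakIterator per locale and rebind it to each new text with setText(), instead of constructing a fresh iterator, and its break-position cache, per call.

// Illustrative sketch of reusing one BreakIterator across many strings.
#include <cstdio>
#include <memory>

#include "unicode/brkiter.h"
#include "unicode/locid.h"
#include "unicode/unistr.h"
#include "unicode/utypes.h"

using icu::BreakIterator;
using icu::Locale;
using icu::UnicodeString;

// Made-up helper: walk every boundary the iterator reports for this text.
static int32_t CountBreaks(BreakIterator& iter, const UnicodeString& text) {
  iter.setText(text);  // rebinds the existing iterator, reusing its caches
  int32_t breaks = 0;
  for (int32_t pos = iter.first(); pos != BreakIterator::DONE; pos = iter.next()) {
    ++breaks;
  }
  return breaks;
}

int main() {
  UErrorCode status = U_ZERO_ERROR;
  // One iterator is created once per locale...
  std::unique_ptr<BreakIterator> iter(
      BreakIterator::createWordInstance(Locale::getUS(), status));
  if (U_FAILURE(status)) return 1;

  // ...and then services many strings without re-creating rule structures.
  printf("%d\n", CountBreaks(*iter, UnicodeString::fromUTF8("first document")));
  printf("%d\n", CountBreaks(*iter, UnicodeString::fromUTF8("second document")));
  return 0;
}

(This only illustrates the API shape from the question above; per comment 24 we don't ship the break rule data at all, so it may be moot for this regression.)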
Comment 29•7 years ago
(In reply to Steven R. Loomis from comment #28)
> (In reply to Andrew Overholt [:overholt] from comment #27)
> > Are we able to do anything about this regression or do we just have to eat
> > it since we want to take the new version of ICU?
>
> Has anyone evaluated https://ssl.icu-project.org/trac/ticket/13532 compared
> to the Mozilla code? For example, are you "creating large numbers of break
> iterators" where they might be able to be reused? It's possible the caches
> have some control on them.
There's a linked ticket with an initial patch… it is aimed at speeding up initialization time, but it may reduce heap usage as well.
Assignee
Comment 30•7 years ago
Assignee
Comment 31•7 years ago
According to the try server, this might be a regression from the following ICU landing.
svn log -r r40076
------------------------------------------------------------------------
r40076 | roubert | 2017-04-27 05:12:01 +0900 (Thu, 27 Apr 2017) | 1 line
ticket:13094: Handle empty language subtag in parseTagString().
------------------------------------------------------------------------
Assignee
Comment 32•7 years ago
(In reply to Steven R. Loomis from comment #28)
> (In reply to Andrew Overholt [:overholt] from comment #27)
> > Are we able to do anything about this regression or do we just have to eat
> > it since we want to take the new version of ICU?
>
> Has anyone evaluated https://ssl.icu-project.org/trac/ticket/13532 compared
> to the Mozilla code? For example, are you "creating large numbers of break
> iterators" where they might be able to be reused? It's possible the caches
> have some control on them.
We don't use break rules, so even if I apply that patch, it won't improve anything.
Comment 33•7 years ago
(In reply to Makoto Kato [:m_kato] from comment #31)
> According to the try server, this might be a regression from the following
> ICU landing.
>
> svn log -r r40076
> ------------------------------------------------------------------------
> r40076 | roubert | 2017-04-27 05:12:01 +0900 (Thu, 27 Apr 2017) | 1 line
>
> ticket:13094: Handle empty language subtag in parseTagString().
> ------------------------------------------------------------------------
http://bugs.icu-project.org/trac/changeset/40076 caused a 3.73% increase in memory usage? Anyway, need info about the content of the used memory, what you're actually seeing.
Assignee
Comment 34•7 years ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5fc9a55f798aca4eb50218313e00b52805fc56d1 (r40074)
Resident Memory opt stylo-disabled: 434,478,390.42
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d3ed376e726cda213f300217753300bacb01f079 (r40076)
Resident Memory opt stylo-disabled: 460,372,989.86
Assignee
Updated•7 years ago
Flags: needinfo?(m_kato)
Assignee
Comment 35•7 years ago
(In reply to Steven R. Loomis from comment #33)
> (In reply to Makoto Kato [:m_kato] from comment #31)
> > According to the try server, this might be a regression from the following
> > ICU landing.
> >
> > svn log -r r40076
> > ------------------------------------------------------------------------
> > r40076 | roubert | 2017-04-27 05:12:01 +0900 (Thu, 27 Apr 2017) | 1 line
> >
> > ticket:13094: Handle empty language subtag in parseTagString().
> > ------------------------------------------------------------------------
>
> http://bugs.icu-project.org/trac/changeset/40076 caused a 3.73% increase in
> memory usage? Anyway, need info about the content of the used memory, what
> you're actually seeing.
As far as our test infra shows, this change causes the issue. When I backed it out from 60.2, the regression was resolved. Do you know why Resident Memory (see https://wiki.mozilla.org/AWSY/Tests) is increased by r40076?
Flags: needinfo?(srl)
Assignee
Comment 36•7 years ago
http://people.mozilla.org/%7Ejmaher/taloszips/zips/tp5n.zip is no longer available, so where is the current tp5n.zip?
Flags: needinfo?(jmaher)
Assignee
Comment 37•7 years ago
ICU 60.2
Resident Memory opt stylo-disabled: 468,086,541.99
ICU 60.2 + backed out r40076
Resident Memory opt stylo-disabled: 457,160,950.85
Although I added an assertion for the condition (*langLength == 0 and _isIDSeparator(*position) is true), it is never hit. This might be a compiled-code issue.
Comment 38•7 years ago
you can find tp5n.zip here:
https://github.com/rwood-moz/talos-pagesets/
Flags: needinfo?(jmaher)
Assignee
Comment 39•7 years ago
:waldo, according to the try server's AWSY data, this is caused by http://bugs.icu-project.org/trac/changeset/40076 (http://bugs.icu-project.org/trac/ticket/13094). I added an assertion for (*langLength == 0 and _isIDSeparator(*position) is true), and tp5n never hits this condition. So I think this issue depends on the code generated by the compiler.
Of course, when we back out that change, Resident Memory usage goes back down. But backing it out is risky, given ICU's ticket. And AWSY on linux32 is already disabled. So I recommend marking this as wontfix. What do you think?
Flags: needinfo?(jwalden+bmo)
Comment 40•7 years ago
(In reply to Makoto Kato [:m_kato] from comment #39)
> :waldo, according to the try server's AWSY data, this is caused by
> http://bugs.icu-project.org/trac/changeset/40076
> (http://bugs.icu-project.org/trac/ticket/13094). I added an assertion for
> (*langLength == 0 and _isIDSeparator(*position) is true), and tp5n never
> hits this condition. So I think this issue depends on the code generated by
> the compiler.
>
> Of course, when we back out that change, Resident Memory usage goes back
> down. But backing it out is risky, given ICU's ticket. And AWSY on linux32
> is already disabled. So I recommend marking this as wontfix. What do you
> think?
This is an issue on win32 as well (see comment 21). Given we know the exact patch that is regressing it seems like we can figure out a fix.
Comment 41•7 years ago
We're running out of time in the Fx59 development cycle to fix this. Are there any recent updates?
Comment 42•7 years ago
I grabbed a copy of the try repository plus the two revs tested to diagnose the RSS regression. This is the full diff. Nothing in this except the one changed line.
Looking back at the original change in Trac and the discussion of it, the patch logic checks out. It's definitely the correct and proper change for what the code is trying to do. Reverting it would actively reintroduce the problem identified in that change, so cherry-picking a revert of this one change would not be correct.
I don't see a plausible mechanism by which this one-line change leads to a 25-45MB regression. The statement that was moved from an else-arm to happen all the time should only be hit if a tag is parsed that lacks a language component -- but this shouldn't happen in the browser at all for JS on the web. Every language tag SpiderMonkey hands off to ICU is first syntax-checked to verify that the tag is structurally valid per BCP47. BCP47 (in contrast to the structure of the $LANG environment variable, which this function was additionally written to process) requires a language component in language tags. ICU probably does use this function to parse $LANG on Firefox's behalf -- but this would likely happen only rarely, maybe only when getting the default locale in Intl code (and we actually cache that value pretty hard, so even that wouldn't happen regularly). Besides, I rather doubt treeherder is running with one of these weird language-less $LANG values in the first place.
I also repushed the two revs in comment 34 to double-check the RSS stats:
Before:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8a9233cfcfc9186bc336d87bc284a4e6fedd868e&selectedJob=161078449
After:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=35930e7a008de82c0cce926f939772ede5ffc067
The linux32 sy numbers are all over the map. I don't see a consistent regression in them at all. Nor do I see one in win32 numbers.
m_kato also tells me he set up tp5 locally to try to find the regression alleged in comment 14 and comment 21 and wasn't seeing it.
Given that https://treeherder.mozilla.org/perf.html#/graphs?timerange=7776000&series=mozilla-inbound,1557588,1,4&series=mozilla-inbound,1557638,1,4&series=mozilla-inbound,1308546,1,4&selected=mozilla-inbound,1308546,284784,380134165,4 isn't really revealing a regression here, and given that the mechanism by which a one-line change would result in such vast memory swing is also unclear -- at worst I would expect cost linear in the number of almost-always-small language tags parsed -- I am increasingly convinced we're just shadowboxing here, and there is not and never was an issue.
So: at this point this whole bug feels like an INCOMPLETE or INVALID to me. Either the two identified revs are not the cause of the regression, or there isn't a regression at all. And given the long-term graph doesn't show a regression, that seems much likelier to me.
Flags: needinfo?(jwalden+bmo)
Comment 43•7 years ago
Alright, I guess we'll close this and hope that this is indeed a fluke. Thanks for investigating.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE
Updated•7 years ago
tracking-firefox59: + → ---
tracking-firefox60: + → ---