1189896 - 5-14% Linux*/WinXP tp5o private bytes regression on Mozilla-Inbound-Non-PGO on July 30, 2015 from push 59e6b29d756a

Reporter

Description

•

10 years ago

Talos has detected a Firefox performance regression from your commit 59e6b29d756a in bug 1188578. We need you to address this regression. This is a list of all known regressions and improvements related to your bug: http://alertmanager.allizom.org:8080/alerts.html?rev=59e6b29d756a&showAll=1 On the page above you can see Talos alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format. To learn more about the regressing test, please see: https://wiki.mozilla.org/Buildbot/Talos/Tests#tp5 Reproducing and debugging the regression: If you would like to re-run this Talos test on a potential fix, use try with the following syntax: try: -b o -p linux,win32 -u none -t tp5o # add "mozharness: --spsProfile" to generate profile data To run the test locally and do a more in-depth investigation, first set up a local Talos environment: https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code Then run the following command from the directory where you set up Talos: talos --develop -e <path>/firefox -a tp5o Making a decision: As the patch author we need your feedback to help us handle this regression. *** Please let us know your plans by Monday, or the offending patch will be backed out! *** Our wiki page oulines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling

Ryan VanderMeulen [:RyanVM]

Comment 1

•

10 years ago

drh, any ideas where the memory usage increase would be coming from?

D. Richard Hipp

Comment 2

•

10 years ago

Version 3.8.11 adds a new feature in which a block of memory is preallocated for the first few lines of the page cache for each database connection. By default, the amount of memory allocated is the smaller of the configured "PRAGMA cache_size" and 100 pages. This default can be raised or lowered at compile-time or start-time. By allocating multiple page cache lines all at once, instead of individually, some workloads see a 5% performance increase. In the long run, no additional memory is used. And, in fact, less memory might be used, because the bulk page cache allocation reduces the number of clownshoe allocations (aka internal fragmentation). However, the amount of memory *initially* allocated is larger. You can change the initial page cache allocation at compile-time using -DSQLITE_DEFAULT_PCACHE_INITSZ=N. A value of 0 restores the 3.8.10 behavior. Positive values of N mean N pages worth of memory. Negative values of N means -1024*N bytes of memory. The default N is 100. This setting can also be changed at start-time using sqlite3_config(SQLITE_CONFIG_PAGECACHE, 0, 0, N); The 2nd and 3rd parameters must be zero (or, technically NULL for the 2nd) and then the N parameter overrides the compile-time SQLITE_DEFAULT_PCACHE_INITSZ setting. The configured "PRAGMA cache_size" for each database connection is an upper bound on the size of the bulk allocation. If you lower the cache_size of a database connection that is not in use (and hence as no pinned pages of cache) the bulk allocation will resize if necessary. The bulk allocation will be released (if possible - assuming no cache lines are pinned) when you do sqlite3_db_release_memory() on a database connection. If a page cache needs more lines that are contained in the initial bulk allocation, additional cache lines are individually allocated.

Eric Rahm [:erahm]

Comment 3

•

10 years ago

This has caused a 25MiB regression in AWSY [1] on first start, memory report diff clearly points to sqlite: 25.60 MB (100.0%) -- explicit ├──23.18 MB (90.56%) -- storage/sqlite │ ├──23.12 MB (90.30%) ── other │ └───0.07 MB (00.26%) ++ (5 tiny) ├───2.21 MB (08.62%) ── heap-unclassified Additionally there is a 20MiB regression in the 'after TP5' measurement (which is taken after opening 30 tabs multiple times). [1] https://areweslimyet.com/

Whiteboard: [talos_regression] → [talos_regression][MemShrink][AWSY]

Eric Rahm [:erahm]

Comment 4

•

10 years ago

Note: that's about a 22% regression in startup memory. I'm not sure what the behavior is on b2g, but I imagine this would be particularly bad.

Blocks: AWSY

D. Richard Hipp

Comment 5

•

10 years ago

Does adding -DSQLITE_DEFAULT_PCACHE_INITSZ=0 to the compile-time options cure the problem? If that is hard to do, you can also try adding the line: #define SQLITE_DEFAULT_PCACHE_INITSZ 0 To the top of the db/sqlite/src/sqlite3.c file.

Eric Rahm [:erahm]

Comment 6

•

10 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=95f96f869e14

Eric Rahm [:erahm]

Comment 7

•

10 years ago

(In reply to D. Richard Hipp from comment #5) > Does adding -DSQLITE_DEFAULT_PCACHE_INITSZ=0 to the compile-time options > cure the problem? If that is hard to do, you can also try adding the line: > > #define SQLITE_DEFAULT_PCACHE_INITSZ 0 > > To the top of the db/sqlite/src/sqlite3.c file. drh, thanks for the pointer.I pushed a baseline build and a patch with the #define. Results will eventually show up in https://areweslimyet.com/?series=erahm_bug_1189896&evenspacing

Marco Bonardo [:mak]

Assignee

Comment 8

•

10 years ago

I think that feature basically causes all of our connections to alloc their maximum cache size, that is 2MB per connection (apart the location bar connection that uses 6MB). The alternative of 100 pages with our page size (32K) would be 3MB. Indeed the problem is that we use a very large page size. For some databases that size will be the condition at regime, but some dbs are unlikely to hit that limit. Using a smaller limit like 10 or 16 pages could still give some benefits without a big hit. If I look at my current status: ├──23.07 MB (07.43%) ── other ├──9.05 MB (02.92%) -- places.sqlite [4] ├──1.99 MB (00.64%) ++ ublock.sqlite ├──1.45 MB (00.47%) ++ formhistory.sqlite ├──1.22 MB (00.39%) ++ cookies.sqlite ├──0.41 MB (00.13%) ++ healthreport.sqlite ├──0.17 MB (00.06%) ++ content-prefs.sqlite └──0.11 MB (00.04%) ++ permissions.sqlite └──4.05 MB (01.31%) -- webappsstore.sqlite [2] Btw, why is this memory reported as "other" instead of being reported as cache for each connection? I guess it's because it's pre-allocated and then connections alloc from this arena? if I count missing memory for each connection (memory needed to reach 2MB) it's about 9MB, not 23. If instead I count the memory that should be preallocated it's about 25MB Off-hand looks like we might be reporting this memory twice and that's what causes us to see a so huge regression. ni Nicholas that may be interested into evaluating this.

Flags: needinfo?(n.nethercote)

Marco Bonardo [:mak]

Assignee

Comment 9

•

10 years ago

To sum up: - we should use sqlite3_config (cause otherwise we couldn't control memory usage of native sqlite on linux) to reduce the preallocation - we should NOT count this memory twice

D. Richard Hipp

Comment 10

•

10 years ago

(In reply to Marco Bonardo [::mak] from comment #8) > > Btw, why is this memory reported as "other" instead of being reported as > cache for each connection? I think this is a bug in SQLite. FF is probably getting the amount of heap used for cache memory using sqlite3_db_status(SQLITE_DBSTATUS_CACHE_USED) (see https://www.sqlite.org/c3ref/c_dbstatus_options.html for details.) But that API does not seem to include the big initial allocation. Ugh. Is this a problem you can just live with until the next regular release of SQLite? Or do we need to do a patch release to fix this one problem?

Marco Bonardo [:mak]

Assignee

Comment 11

•

10 years ago

for each connection we get: SQLITE_DBSTATUS_STMT_USED, SQLITE_DBSTATUS_CACHE_USED, SQLITE_DBSTATUS_SCHEMA_USED "other" is: sqlite3_memory_used() - (sum of all of the above) So yes, if SQLITE_DBSTATUS_CACHE_USED doesn't include the initial allocation, that explains the huge other number. But if I read that correctly, that means we are not double counting the memory. Part of those 23MB should be spread across the various connections, that means in the end the memory usage is bigger than I'd expect. There I have 12 connections. I'd expect at a maximum they should take about 28MB (11x2MB + 6Mb) plus some space for other stuff, but currently the reported memory is 42MB. Could be sqlite3_memory_used is double counting something? Honestly I think for now we should go with sqlite3_config(SQLITE_CONFIG_PAGECACHE, 0, 0, 0); and then file a separate bug to slightly increase the initial allocation when the memory mistery will be solved. Regardless that seems to be worth its own bug due to the memory usage implications.

Marco Bonardo [:mak]

Assignee

Updated

•

10 years ago

Blocks: 1191405

Comment 12

•

10 years ago

I filed bug 1191405 to evaluate preallocation in future releases. I'm now going to make a patch to disable them for Firefox 42.

D. Richard Hipp

Comment 13

•

10 years ago

It is just an optimization. I have a note to work on improving the situation for the next SQLite release.

Marco Bonardo [:mak]

Assignee

Comment 14

•

10 years ago

Attached patch patch v1 — Details — Splinter Review

I don't know if Andrew is still available to review patches, or even if he's on vacation... In the worst case we can land as rs=ryanVM :) https://treeherder.mozilla.org/#/jobs?repo=try&revision=1520c52163a1 Eric, could you please tell me if your experiment succeded, or eventually check with this Try?

Assignee: nobody → mak77

Status: NEW → ASSIGNED

Flags: needinfo?(erahm)

Attachment #8643793 - Flags: review?(bugmail)

Marco Bonardo [:mak]

Assignee

Updated

•

10 years ago

Flags: needinfo?(n.nethercote)

D. Richard Hipp

Comment 15

•

10 years ago

Another option is to add -DSQLITE_DEFAULT_PCACHE_INITSZ=0 to the right makefile.

Andrew Sutherland [:asuth] (he/him)

Comment 16

•

10 years ago

Comment on attachment 8643793 [details] [diff] [review] patch v1 I'm still here! If there's some bug I've been delinquent in providing feedback on, let me know. My general thing is just I think it would be a win for you to have another peer on storage from the platform team who could maybe be more helpful and maybe proactively help with cleanup than I.

Attachment #8643793 - Flags: review?(bugmail) → review+

Andrew Sutherland [:asuth] (he/him)

Comment 17

•

10 years ago

(In reply to D. Richard Hipp from comment #15) > Another option is to add -DSQLITE_DEFAULT_PCACHE_INITSZ=0 to the right > makefile. The thing with defines is that linux distributions frequently like to use system SQLite, so sqlite3_config has a lot of upsides for us.

Eric Rahm [:erahm]

Comment 18

•

10 years ago

(In reply to Marco Bonardo [::mak] (spotty available until 24 Aug) from comment #14) > Eric, could you please tell me if your experiment succeded, or eventually > check with this Try? The AWSY results [1] are in, we can clearly see the regression is gone for startup when setting 'SQLITE_DEFAULT_PCACHE_INITSZ=0'. The 'after TP5' measurement improved by 16MiB, so that leaves a 4MiB gap. This could just be noise, so I think we can follow up with that later if it sticks around. [1] https://areweslimyet.com/?series=erahm_bug_1189896&evenspacing

Flags: needinfo?(erahm)

Marco Bonardo [:mak]

Assignee

Comment 19

•

10 years ago

(In reply to Andrew Sutherland [:asuth] from comment #16) > If there's some bug I've been delinquent in providing > feedback on, let me know. No worries, you're great! The fact is we are losing peers all around, I don't expect anyone to do cleanup at this time, since we are all extremely busy with priority stuff. I will ask bkelly if he wants to be added as a Storage peer, and I assume khuey is the new king of the indexedDB hill and may be interested in Storage, so we are somehow covered. Thank you for the prompt review!

Pulsebot

Comment 20

•

10 years ago

https://hg.mozilla.org/integration/fx-team/rev/a65727f3f5c4

Nicholas Nethercote [inactive]

Comment 21

•

10 years ago

I agree that setting the initial size to 0 seems like the right thing to do for now. (In reply to D. Richard Hipp from comment #10) > > Is this a problem you can just live with until the next regular release of > SQLite? Or do we need to do a patch release to fix this one problem? I can live with it until the next regular release. Especially since setting the initial size to 0 means that we won't actually observe the bug in practice, IIUC. Thank you all for the fast diagnosis and fix.

Carsten Book [:Tomcat]

Comment 22

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/a65727f3f5c4

Status: ASSIGNED → RESOLVED

Closed: 10 years ago

status-firefox42: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla42

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 23

•

10 years ago

I have verified that this has fixed the regression we were seeing! Thanks for fixing it:)

BMO Automation

Updated

•

7 months ago

Product: Toolkit → Core