Closed Bug 1169753 Opened 6 years ago Closed 6 years ago
Remove favicons GUIDs - 2% Win7 Non-Main Normal File IO regression on Fx-Team-Non-PGO on May 29, 2015 from push 5670c0c09c2c
Talos has detected a Firefox performance regression from commit 5670c0c09c2c in bug 1127763. We need you to address this regression. This is a list of all known regressions and improvements related to your bug: http://alertmanager.allizom.org:8080/alerts.html?rev=5670c0c09c2c&showAll=1 On the page above you can see Talos alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format. To learn more about the regressing test, please see: https://wiki.mozilla.org/Buildbot/Talos/Tests#xperf Reproducing and debugging the regression: If you would like to re-run this Talos test on a potential fix, use try with the following syntax: try: -b o -p win32 -u none -t xperf # add "mozharness: --spsProfile" to generate profile data To run the test locally and do a more in-depth investigation, first set up a local Talos environment: https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code Then run the following command from the directory where you set up Talos: talos --develop -e <path>/firefox -a xperf Making a decision: As the patch author we need your feedback to help us handle this regression. *** Please let us know your plans by Monday, or the offending patch will be backed out! *** Our wiki page oulines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
out of the 4 changesets, lets start with the first one and probably the most suspicious: http://hg.mozilla.org/integration/fx-team/rev/5670c0c09c2c Timur, can you help determine if this is the cause? We are accessing about 10Mb of file IO (not on the main thread) with the set of patches: https://hg.mozilla.org/integration/fx-team/pushloghtml?changeset=5670c0c09c2c
Sure. I will try to investigate. But I can do it only on Monday.
Thanks Timur! Ask questions on IRC if needed or in the bug.
I'd like to better understand what Win7 Non-Main Normal File IO is measuring. In the wiki it states: measuring IO counters from windows (currently, only startup IO is in scope). This is very unclear, since the test seems to be Tp5? Do we measure startup IO in tp5 that is a test loading lots of pages? how is startup timeframe defined? In a tp5 run, the change could request a little bit more reads (as well ass creation of an additional index on startup, but this would end in a "worst" run and thus should be discarded), since the index needs to verify no duplicate exists. Note that most of the users in the wild were already paying this cost, only "recent" profiles did not pay it, Talos was not measuring this since it was creating a new profile. So this is bringing the measure on par with reality of our users. That said, if we know this has an high price, we could evaluate to remove it for ALL the users. I remember favicons guids were added for Sync, but I can't find any code using them... I think it's because Sync never finished implementing favicons sync. I have no idea if it's still planned, and why favicons couldn't become part of History information in Sync. Then we wouldn't need a guid at all. Maybe Richard has insight regarding status of that, or can point us to someone who has.
I think favicons sync bug was bug 675299.
or bug 428378.
Yes, both of those :) We added GUIDs for favicons to support Sync because: 1. We couldn't just jam favicon data into history records or bookmark records: too big for records that change frequently, and sharing between history and bookmarks becomes difficult. 2. We didn't want to leak visited URLs by using them as identifiers, so we added a random GUID. My current thinking is that the right way to do #2 is to sync some stable salts (inside encrypted records, of course), then use those to hash favicon URLs. No need for GUIDs at that point, and you get random access for free. (I would also like to do this for history records, but that's another story.) I think it's unlikely that favicon sync will ever be implemented in a way that uses these GUIDs. We'll either build a specialized non-encrypted service to aggregate favicons (because bulk-fetching synced favicons implicitly leaks your browsing history to the network you're on), or we'll do it without GUIDs, as mentioned above. And of course there's nobody working on this… so drop the GUIDs.
non-main normal file IO is measured with xperf during the startup and entire run of the browser for tp5. There is a marker that firefox generates for xperf to indicate we are done with startup. We calculate all the IO up to that point. Likewise we do it from when we start to shutdown. The markers are defined by a .mof file and used with xperf to look for the guids: https://dxr.mozilla.org/mozilla-central/source/toolkit/components/startup/mozprofilerprobe.mof?from=mozprofilerprobe.mof&case=true#1 These GUID's in the file are found in the source: https://dxr.mozilla.org/mozilla-central/source/toolkit/components/startup/nsAppStartup.cpp#72
Timur, are you interested in taking this, that means, remove the guid column from the moz_favicons table, remove the index (in a new schema version...) . Basically we should undo bug 675996.
links to the checked-in changes we must revert are in https://bugzilla.mozilla.org/show_bug.cgi?id=675996#c47
Whiteboard: [talos_regression] → [talos_regression][good next bug][lang=cpp]
Summary: 2% Win7 Non-Main Normal File IO regression on Fx-Team-Non-PGO on May 29, 2015 from push 5670c0c09c2c → Remove favicons GUIDs - 2% Win7 Non-Main Normal File IO regression on Fx-Team-Non-PGO on May 29, 2015 from push 5670c0c09c2c
Hello Marco, am I right that I should fully remove these changes? https://hg.mozilla.org/mozilla-central/rev/9d2dd9486d41 https://hg.mozilla.org/mozilla-central/rev/8c47ef9419ac
yes, plus a new schema migration to remove the index for everyone and nullify the column contents. Unfortunately we cannot remove a table column without rebuilding the whole table, when in future we'll move the favicons table to a new database, then we'll also drop the column, for now nullifying will be enough.
clearly you can also remove the last migration we added that created the index for databases missing it. It would be pointless to create an index to just drop it immediately.
Sure, it should be removed. Thank you for the information.
Hi Timur, I am checking in with you here. Have you had a chance to look into this?
Hi Joel. Sorry for the delay. Now I'm working on it. So I hope that through several day new patch will be ready.
First version of the patch. I test it with xpcshell-test and it looks ok.
Comment on attachment 8624677 [details] [diff] [review] bug-1169753-fix-0.1.patch Review of attachment 8624677 [details] [diff] [review]: ----------------------------------------------------------------- Sorry for the delay, I was on vacation and currently at a conference. Apart from the below comments, looks like you might have to update some tests, at least some of the tests in tests/migration. Please ensure running ./mach xpcshell-tests toolkit/components/places will be fine. ::: toolkit/components/places/Database.cpp @@ +1915,5 @@ > + > + rv = mMainConn->ExecuteSimpleSQL(NS_LITERAL_CSTRING( > + "UPDATE moz_favicons " > + "SET guid = NULL " > + )); we cannot do this here, cause users already obtained this code, so they are currently on version 29... we must upgrade them to versions 30, remove migrateV29up, and put a comment where 29 would have been called stating that schema version 29 is obsoleted by the new version... And we should do the same for v14. Basically we don't want to add an index, just to remove it a few calls later.
Hello. Am I right that is needed to add MigrateV30Up function, places_v30.sqlite file and fully delete MigrateV14Up, MigrateV29Up functions, places_v29.sqlite file?
yes, that should do.
I created the new patch. Looks like all xpcshell-tests for toolkit/components/places/ tests are ok.
Attachment #8627619 - Flags: review?(mak77)
Comment on attachment 8627619 [details] [diff] [review] bug-1169753-fix-0.2.patch Review of attachment 8627619 [details] [diff] [review]: ----------------------------------------------------------------- Yes, we are almost there! I didn't check the side of places_v30.sqlite, just remember to VACUUM it so it's about 1,5MB instead of 10MB. ::: toolkit/components/places/Database.cpp @@ +1037,2 @@ > NS_ENSURE_SUCCESS(rv, rv); > } Please leave the "// Firefox 41 uses schema version 29." comment, just change it to "30" we'll then ask for uplifting the patch to Firefox 41. ::: toolkit/components/places/tests/migration/test_current_from_v16.js @@ -1,2 @@ > -/* Any copyright is dedicated to the Public Domain. > - http://creativecommons.org/publicdomain/zero/1.0/ */ We must retain test_current_from_v16.js cause it's testing other stuff unrelated to favicons (it's basically testing schema v17) I suppose you might use places_v11.sqlite for the test and rename the test to _from_v11?
Attachment #8627619 - Flags: review?(mak77) → feedback+
As I understand you suggest to retain test_current_from_v16.js and then rename it to test_current_from_v11.js with small changes (like using places_v11.sqlite). Am I right?
Here is the new patch. File places_v30.sqlite is about 1,5MB
Comment on attachment 8627690 [details] [diff] [review] bug-1169753-fix-0.3.patch Review of attachment 8627690 [details] [diff] [review]: ----------------------------------------------------------------- The patch looks good, the only problem is that I think you manually moved test_current_from_v16.js instead of using hg rename, and that loses all of the blame for the file... Could you please attach a patch using hg rename on that file?
Attachment #8627690 - Flags: review?(mak77) → feedback+
Comment on attachment 8628156 [details] [diff] [review] bug-1169753-fix-0.4.patch Review of attachment 8628156 [details] [diff] [review]: ----------------------------------------------------------------- It looks good to me, thank you! https://treeherder.mozilla.org/#/jobs?repo=try&revision=0f5584d8fe46
Attachment #8628156 - Flags: review?(mak77) → review+
if this fixes everything we might want to uplift to aurora as this regression is on aurora now.
[Tracking Requested - why for this release]: this looks great. I would like to see this uplifted to aurora when possible. not sure of the correct flags for that.
Comment on attachment 8628156 [details] [diff] [review] bug-1169753-fix-0.4.patch Approval Request Comment [Feature/regressing bug #]: bug 1127763 [User impact if declined]: we have a disk IO regression that already existed in old profiles and we "fixed" new profiles. We then figured out we could remove this code completely and reduce disk usage for all the users. This is the patch doing that. We'd like to keep profiles in sync between Aurora and Nightly also for maintenance reasons. [Describe test coverage new/current, TreeHerder]: in Nightly [Risks and why]: mostly code removal [String/UUID change made/needed]: none
Attachment #8628156 - Flags: approval-mozilla-aurora?
Comment on attachment 8628156 [details] [diff] [review] bug-1169753-fix-0.4.patch Approving for uplift to Aurora 41, this has been on m-c for about a week with no known issues.
Attachment #8628156 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.