Closed
Bug 1436297
Opened 6 years ago
Closed 6 years ago
100% I/O usage with Firefox 52 ESR (safebrowsing)
Categories
(Toolkit :: Safe Browsing, defect, P1)
Tracking
()
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
firefox-esr52 | 60+ | fixed |
firefox59 | --- | unaffected |
firefox60 | --- | unaffected |
firefox61 | --- | unaffected |
People
(Reporter: luca76, Assigned: francois)
References
Details
Attachments
(1 file)
496.00 KB,
image/png
|
Details |
User Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0 Build ID: 20180130211649 Steps to reproduce: Nothing in particular, this issue appears at random every days on multiple computers. Using Firefox 52 ESR Actual results: 100% I/O disk activity Expected results: Further investigation with found that the SafeBrowsing component writes too much on the HDD file "mozilla in userprofile \ safebrowsing \ goog-phish-shavar-1.sbstore". Performance monitor shows Firefox is writing this file continously at 7 MB/sec for minutes... why?
Assignee | ||
Updated•6 years ago
|
Priority: -- → P5
Comment 1•6 years ago
|
||
A lot of people have been complaining about performances issue in esr lately. I wonder if this is could not be the source of the problem. We had the same issue with android recently (bug 1420641) Ryan, we might want to do something about that for the next esr release in case there is an easy fix.
Flags: needinfo?(ryanvm)
See Also: → 1420641
Comment 2•6 years ago
|
||
ESR52 is on Safebrowsing v2 still I believe. Francois, any thoughts on what we might be able to do to help mitigate this?
tracking-firefox-esr52:
--- → ?
Flags: needinfo?(ryanvm) → needinfo?(francois)
Assignee | ||
Comment 3•6 years ago
|
||
I'm going to try to see if I can reproduce this.
Assignee: nobody → francois
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Flags: needinfo?(francois)
Priority: P5 → P1
@sylvestre Do you have any number on how many users are impacted? and how bad the performance are impacted?
Comment 5•6 years ago
|
||
I don't know the number but a lot of people have been complaining on the enterprise ML over the last one or two months. I am not sure we have telemetry on that (even if we did, as many enterprise users have disabled telemetry). It is bad enough so that we want to act on it... Sysadmin have been putting in place workaround, etc.
Assignee | ||
Comment 7•6 years ago
|
||
I left my ESR52 profile running overnight and ended up with the following phishing list: -rw-r--r-- 1 francois francois 2.9M Mar 6 10:17 goog-phish-shavar.pset -rw-r--r-- 1 francois francois 4.9M Mar 6 10:17 goog-phish-shavar.sbstore With a fully up-to-date database, I was able to get this amount of startup I/O (iotop -ao): TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND 18945 be/4 francois 0.00 B 11.52 M 0.00 % 0.04 % ./firefox --no-remote [URL Classifier] whereas on Nightly 60, I don't even get an entry for the URL Classifier thread. It seems unlikely that this 12MB of writes is responsible for the problems described in comment 0. I also took a profile with the Gecko Profiler and these options: MOZ_PROFILER_STARTUP=1 MOZ_PROFILER_STARTUP_FILTERS="GeckoMain,URL Classifier,Classifier Update" MOZ_PROFILER_STARTUP_FEATURES="mainthreadio" The results are here: https://perfht.ml/2FvnmKI and I couldn't see anything odd in there I/O-wise. I wonder if that's because the Gecko Profiler in 52 doesn't cover the URL Classifier thread. Sylvestre, do you think that Soft Vision would be able to help in creating a test case to reproduce this problem?
Flags: needinfo?(sledru)
Comment 8•6 years ago
|
||
Andrei can probably help finding someone. Luca, the reporter might be able to help also.
Flags: needinfo?(sledru)
Flags: needinfo?(luca76)
Flags: needinfo?(andrei.vaida)
Comment 9•6 years ago
|
||
(In reply to Sylvestre Ledru [:sylvestre] from comment #8) > Andrei can probably help finding someone. This may be harder to crack. We'll try to reproduce it on our Ubuntu and Fedora test machines, but because this involves Safe Browsing v2, we'll need to leave the machines running overnight. We may only have test results sometime next week.
Comment 10•6 years ago
|
||
How can I test this issue on Windows? (other then the profiling extension)
Assignee | ||
Comment 11•6 years ago
|
||
I went through all of the commits to the URL Classifier between 52 and now, filtering out anything that wasn't done by members of the Safe Browsing team: $ hg log --user tnguyen --user hchang --user francois --user dlee --user gcp -r "ancestor(tip,FIREFOX_RELEASE_52_BASE)"..tip toolkit/components/url-classifier --template '{node|short} | {date|isodatesec} | {author|user}: {desc|strip|firstline}\n' | wc -l 174 Of those 174 commits, I shortlisted these 5 commits: > 366897 3d717c8eaa76 2017-01-19 17:48 +0800 hchang > Bug 1332218 - Add boundary checks for array access in nsUrlClassifierPrefixSet::GetPrefixesNative. r=francois. Might be useful since it adds a check for corrupt databases. > 394492 ffdb24073101 2017-04-18 17:00 +0800 tnguyen > Bug 1353853 - Cache preferences when doing channel classify r=francois > 410107 f95be8a71102 2017-07-04 16:36 +0800 tnguyen > Bug 1377559 - Should store value of preference browser.safebrowsing.debug to reuse r=francois Might be useful, but it doesn't appear to be related to disk I/O. > 396372 54ab48204b9e 2017-04-24 17:51 +0800 tnguyen > Bug 1336904 - Add timeouts for Safe Browsing updates r=francois Might be useful since it puts a limit on how long updates can take. This will also need the updated timeout values from bug 1381745. > 402515 8f2b4efc5f0d 2017-05-19 13:17 +0800 tnguyen > Bug 1361699 - Add buffer when writing hashstore to file r=gcp,mcmanus Looks highly relevant since it has to do with lots of small writes to a network-attached disk. That last one sounds especially worthwhile since it's relatively small: https://hg.mozilla.org/mozilla-central/rev/8f2b4efc5f0d Ryan, do you think we should uplift now and hope that it resolves the issues?
Flags: needinfo?(ryanvm)
Assignee | ||
Comment 12•6 years ago
|
||
The new update threading scheme would likely also help here since it almost eliminates all possible deadlocking during database updates, but it's a lot of work and would probably be far too risky for ESR. At a minimum, we'd need all of these commits: 381523 2c1fc455e81e 2017-02-21 14:18 +0800 hchang Bug 1338970 - Introduce 2-step DB update scheme. (background update then swap in) r=francois,gcp 382732 313bfccbbd9f 2017-02-25 00:22 +0800 hchang Bug 1342397 - Properly reset stuff when failing to create LookupCache for update. r=gcp 384381 a1dfb38c49b0 2017-02-22 17:25 +0800 hchang Bug 1339760 - Split update process to background/foreground and run background on update thread **synchronously**. r=francois,gcp 385776 a26b45336e56 2017-03-13 21:16 +0800 hchang Bug 1346757 - Change the downloadError callback timing. r=francois 386089 ffb8aa89b8c1 2017-03-10 00:47 +0800 hchang Bug 1345922 - Avoid concurrent update and take the failed beginUpdate into account. r=francois 391336 baaee2deb3ac 2017-04-06 07:07 +0800 hchang Bug 1339050 - Asynchronously apply safebrowsing DB update. r=francois,gcp
Comment 13•6 years ago
|
||
Luca, if we created a test build with these patches included, would you be easily able to test it to see if it helps?
Flags: needinfo?(ryanvm)
Comment 14•6 years ago
|
||
I managed to test this issue on two machine Ubuntu 14.04 x64 and Ubuntu 16.04 x64, same way as Francois did, I left my ESR52 profile running overnight and ended up with the following phishing list: goog-phish-shavar.pset 3,0 MB goog-phish-shavar.sbstore 5,2 MB There were no IO> 100% entries in terminal for the Firefox command, only: TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND 14004 be/4 timeazso 0.00 B 416.00 K 0.00 % 0.00 % ./firefox -no-remote -p [localStorage DB] Gecko Profiler result: https://perfht.ml/2DotQpz I contacted the reporter, he told me that the problem is on Windows system, so I will test on windows too.
Flags: needinfo?(andrei.vaida)
Comment 15•6 years ago
|
||
Francois, want to try spinning a Try build of those uplifts for now to see how things look? Note that you'll want to include --buildbot in your Try syntax to make sure you get legacy platforms, i.e. |try: -b do -p all -u all -t none --buildbot|.
Flags: needinfo?(francois)
Comment 16•6 years ago
|
||
On Ubuntu 16.04 x64 sometimes the CPU usages was 100%, see attachment Bug 1436297.png. On Windows 8.1 x64 and Windows 10 x64 I can not reproduced the issue.
Comment 17•6 years ago
|
||
Here is the attachment, I am not sure if the 100% CPU is related to this issue.
Assignee | ||
Comment 18•6 years ago
|
||
Here's a Try build with 8f2b4efc5f0d grafted on top of the esr52 branch: remote: View the pushlog for these changes here: remote: https://hg.mozilla.org/try/pushloghtml?changeset=b9dab7ede75888a47882ef313a78d85b8515118d remote: remote: Follow the progress of your build on Treeherder: remote: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b9dab7ede75888a47882ef313a78d85b8515118d The patch applies cleanly as-is.
Flags: needinfo?(francois)
Comment 19•6 years ago
|
||
I reached out to the reporter directly and he said he would look at the Try builds. For anybody else interested in testing them, they're available from the links below. Linux32: https://queue.taskcluster.net/v1/task/QkUfCrLrTxihjPmzt3Yqsg/runs/0/artifacts/public/build/target.tar.bz2 Linux64: https://queue.taskcluster.net/v1/task/R4YF-n1WT02htBB6MqIPhw/runs/0/artifacts/public/build/target.tar.bz2 OSX: https://queue.taskcluster.net/v1/task/VIG4SFzbQpKvEPb3YCw4vQ/runs/0/artifacts/public/build/target.dmg Win32: https://archive.mozilla.org/pub/firefox/try-builds/fmarier@mozilla.com-b9dab7ede75888a47882ef313a78d85b8515118d/try-win32/firefox-52.7.2.en-US.win32.zip Win64: https://archive.mozilla.org/pub/firefox/try-builds/fmarier@mozilla.com-b9dab7ede75888a47882ef313a78d85b8515118d/try-win64/firefox-52.7.2.en-US.win64.zip
Flags: needinfo?(luca76)
Comment 20•6 years ago
|
||
I received positive feedback via email on the Try builds. I think we can go ahead with the approval request on bug 1361699 :)
Flags: needinfo?(francois)
Updated•6 years ago
|
status-firefox59:
--- → unaffected
status-firefox60:
--- → unaffected
status-firefox61:
--- → unaffected
status-firefox-esr52:
--- → affected
Assignee | ||
Comment 21•6 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #20) > I received positive feedback via email on the Try builds. I think we can go > ahead with the approval request on bug 1361699 :) https://bugzilla.mozilla.org/show_bug.cgi?id=1361699#c24
Flags: needinfo?(francois)
Comment 22•6 years ago
|
||
Calling this fixed now that bug 1361699 has been uplifted. We can reopen the bug if safebrowsing still shows up as a significant source of slowdowns in next month's 52.8.0 release.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•