1436297 - 100% I/O usage with Firefox 52 ESR (safebrowsing)

Reporter

Description

•

6 years ago

User Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0
Build ID: 20180130211649

Steps to reproduce:

Nothing in particular, this issue appears at random every days on multiple computers.

Using Firefox 52 ESR


Actual results:

100% I/O disk activity


Expected results:

Further investigation with found that the SafeBrowsing component writes too much on the HDD file "mozilla in userprofile \ safebrowsing \ goog-phish-shavar-1.sbstore". Performance monitor shows Firefox is writing this file continously at 7 MB/sec for minutes... why?

YF (Yang)

Updated

•

6 years ago

Component: Untriaged → Safe Browsing

Product: Firefox → Toolkit

François Marier [:francois]

Assignee

Updated

•

6 years ago

Priority: -- → P5

Sylvestre Ledru [:Sylvestre]

Comment 1

•

6 years ago

A lot of people have been complaining about performances issue in esr lately.
I wonder if this is could not be the source of the problem. We had the same issue with android recently (bug 1420641)

Ryan, we might want to do something about that for the next esr release in case there is an easy fix.

Flags: needinfo?(ryanvm)

Comment 2

•

6 years ago

ESR52 is on Safebrowsing v2 still I believe. Francois, any thoughts on what we might be able to do to help mitigate this?

tracking-firefox-esr52: --- → ?

Flags: needinfo?(ryanvm) → needinfo?(francois)

François Marier [:francois]

Assignee

Comment 3

•

6 years ago

I'm going to try to see if I can reproduce this.

Assignee: nobody → francois

Status: UNCONFIRMED → ASSIGNED

Ever confirmed: true

Flags: needinfo?(francois)

Priority: P5 → P1

Wennie

Comment 4

•

6 years ago

@sylvestre Do you have any number on how many users are impacted? and how bad the performance are impacted?

Sylvestre Ledru [:Sylvestre]

Comment 5

•

6 years ago

I don't know the number but a lot of people have been complaining on the enterprise ML over the last one or two months. I am not sure we have telemetry on that (even if we did, as many enterprise users have disabled telemetry).
It is bad enough so that we want to act on it... Sysadmin have been putting in place workaround, etc.

Wennie

Comment 6

•

6 years ago

Francois, any update on this bug?

François Marier [:francois]

Assignee

Comment 7

•

6 years ago

I left my ESR52 profile running overnight and ended up with the following phishing list:

-rw-r--r-- 1 francois francois 2.9M Mar  6 10:17 goog-phish-shavar.pset
-rw-r--r-- 1 francois francois 4.9M Mar  6 10:17 goog-phish-shavar.sbstore

With a fully up-to-date database, I was able to get this amount of startup I/O (iotop -ao):

  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                                                                                                                                                                                                                                                                        
18945 be/4 francois      0.00 B     11.52 M  0.00 %  0.04 % ./firefox --no-remote [URL Classifier]

whereas on Nightly 60, I don't even get an entry for the URL Classifier thread. It seems unlikely that this 12MB of writes is responsible for the problems described in comment 0.

I also took a profile with the Gecko Profiler and these options:

MOZ_PROFILER_STARTUP=1
MOZ_PROFILER_STARTUP_FILTERS="GeckoMain,URL Classifier,Classifier Update"
MOZ_PROFILER_STARTUP_FEATURES="mainthreadio"

The results are here: https://perfht.ml/2FvnmKI

and I couldn't see anything odd in there I/O-wise. I wonder if that's because the Gecko Profiler in 52 doesn't cover the URL Classifier thread.

Sylvestre, do you think that Soft Vision would be able to help in creating a test case to reproduce this problem?

Flags: needinfo?(sledru)

Sylvestre Ledru [:Sylvestre]

Comment 8

•

6 years ago

Andrei can probably help finding someone.


Luca, the reporter might be able to help also.

Flags: needinfo?(sledru)

Flags: needinfo?(luca76)

Flags: needinfo?(andrei.vaida)

Andrei Vaida [:avaida]

Comment 9

•

6 years ago

(In reply to Sylvestre Ledru [:sylvestre] from comment #8)
> Andrei can probably help finding someone.

This may be harder to crack. We'll try to reproduce it on our Ubuntu and Fedora test machines, but because this involves Safe Browsing v2, we'll need to leave the machines running overnight. We may only have test results sometime next week.

Andrei Boros

Comment 10

•

6 years ago

How can I test this issue on Windows? (other then the profiling extension)

François Marier [:francois]

Assignee

Comment 11

•

6 years ago

I went through all of the commits to the URL Classifier between 52 and now, filtering out anything that wasn't done by members of the Safe Browsing team:

    $ hg log --user tnguyen --user hchang --user francois --user dlee --user gcp -r "ancestor(tip,FIREFOX_RELEASE_52_BASE)"..tip toolkit/components/url-classifier --template '{node|short} | {date|isodatesec} | {author|user}: {desc|strip|firstline}\n' | wc -l
    174

Of those 174 commits, I shortlisted these 5 commits:

> 366897   3d717c8eaa76   2017-01-19 17:48 +0800   hchang
>   Bug 1332218 - Add boundary checks for array access in nsUrlClassifierPrefixSet::GetPrefixesNative. r=francois.

Might be useful since it adds a check for corrupt databases.

> 394492   ffdb24073101   2017-04-18 17:00 +0800   tnguyen
>   Bug 1353853 - Cache preferences when doing channel classify r=francois
> 410107   f95be8a71102   2017-07-04 16:36 +0800   tnguyen
>   Bug 1377559 - Should store value of preference browser.safebrowsing.debug to reuse r=francois

Might be useful, but it doesn't appear to be related to disk I/O.

> 396372   54ab48204b9e   2017-04-24 17:51 +0800   tnguyen
>   Bug 1336904 - Add timeouts for Safe Browsing updates r=francois

Might be useful since it puts a limit on how long updates can take.

This will also need the updated timeout values from bug 1381745.

> 402515   8f2b4efc5f0d   2017-05-19 13:17 +0800   tnguyen
>   Bug 1361699 - Add buffer when writing hashstore to file r=gcp,mcmanus

Looks highly relevant since it has to do with lots of small writes to a network-attached disk.

That last one sounds especially worthwhile since it's relatively small: https://hg.mozilla.org/mozilla-central/rev/8f2b4efc5f0d

Ryan, do you think we should uplift now and hope that it resolves the issues?

Flags: needinfo?(ryanvm)

François Marier [:francois]

Assignee

Comment 12

•

6 years ago

The new update threading scheme would likely also help here since it almost eliminates all possible deadlocking during database updates, but it's a lot of work and would probably be far too risky for ESR.

At a minimum, we'd need all of these commits:

381523   2c1fc455e81e   2017-02-21 14:18 +0800   hchang
  Bug 1338970 - Introduce 2-step DB update scheme. (background update then swap in) r=francois,gcp
382732   313bfccbbd9f   2017-02-25 00:22 +0800   hchang
  Bug 1342397 - Properly reset stuff when failing to create LookupCache for update. r=gcp
384381   a1dfb38c49b0   2017-02-22 17:25 +0800   hchang
  Bug 1339760 - Split update process to background/foreground and run background on update thread **synchronously**. r=francois,gcp
385776   a26b45336e56   2017-03-13 21:16 +0800   hchang
  Bug 1346757 - Change the downloadError callback timing. r=francois
386089   ffb8aa89b8c1   2017-03-10 00:47 +0800   hchang
  Bug 1345922 - Avoid concurrent update and take the failed beginUpdate into account. r=francois
391336   baaee2deb3ac   2017-04-06 07:07 +0800   hchang
  Bug 1339050 - Asynchronously apply safebrowsing DB update. r=francois,gcp

Ryan VanderMeulen [:RyanVM]

Comment 13

•

6 years ago

Luca, if we created a test build with these patches included, would you be easily able to test it to see if it helps?

Flags: needinfo?(ryanvm)

Timea Zsoldos [:zstimi/tzsoldos], Desktop QA

Comment 14

•

6 years ago

I managed to test this issue on two machine Ubuntu 14.04 x64 and Ubuntu 16.04 x64, same way as Francois did, I left my ESR52 profile running overnight and ended up with the following phishing list:
                        goog-phish-shavar.pset 3,0 MB
                        goog-phish-shavar.sbstore 5,2 MB

There were no IO> 100% entries in terminal for the Firefox command, only:

TID  PRIO  USER       DISK READ   DISK WRITE  SWAPIN   IO>    COMMAND   
14004 be/4 timeazso      0.00 B    416.00 K  0.00 %  0.00 % ./firefox -no-remote -p [localStorage DB]

Gecko Profiler result: https://perfht.ml/2DotQpz

I contacted the reporter, he told me that the problem is on Windows system, so I will test on windows too.

Flags: needinfo?(andrei.vaida)

Ryan VanderMeulen [:RyanVM]

Comment 15

•

6 years ago

Francois, want to try spinning a Try build of those uplifts for now to see how things look? Note that you'll want to include --buildbot in your Try syntax to make sure you get legacy platforms, i.e. |try: -b do -p all -u all -t none --buildbot|.

Flags: needinfo?(francois)

Timea Zsoldos [:zstimi/tzsoldos], Desktop QA

Comment 16

•

6 years ago

On Ubuntu 16.04 x64 sometimes the CPU usages was 100%, see attachment Bug 1436297.png.
On Windows 8.1 x64 and Windows 10 x64 I can not reproduced the issue.

Timea Zsoldos [:zstimi/tzsoldos], Desktop QA

Comment 17

•

6 years ago

Attached image Bug1436297.png — Details

Here is the attachment, I am not sure if the 100% CPU is related to this issue.

François Marier [:francois]

Assignee

Comment 18

•

6 years ago

Here's a Try build with 8f2b4efc5f0d grafted on top of the esr52 branch:

remote: View the pushlog for these changes here:
remote:   https://hg.mozilla.org/try/pushloghtml?changeset=b9dab7ede75888a47882ef313a78d85b8515118d
remote: 
remote: Follow the progress of your build on Treeherder:
remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=b9dab7ede75888a47882ef313a78d85b8515118d

The patch applies cleanly as-is.

Flags: needinfo?(francois)

Ryan VanderMeulen [:RyanVM]

Comment 19

•

6 years ago

I reached out to the reporter directly and he said he would look at the Try builds. For anybody else interested in testing them, they're available from the links below.

Linux32:
https://queue.taskcluster.net/v1/task/QkUfCrLrTxihjPmzt3Yqsg/runs/0/artifacts/public/build/target.tar.bz2

Linux64:
https://queue.taskcluster.net/v1/task/R4YF-n1WT02htBB6MqIPhw/runs/0/artifacts/public/build/target.tar.bz2

OSX:
https://queue.taskcluster.net/v1/task/VIG4SFzbQpKvEPb3YCw4vQ/runs/0/artifacts/public/build/target.dmg

Win32:
https://archive.mozilla.org/pub/firefox/try-builds/fmarier@mozilla.com-b9dab7ede75888a47882ef313a78d85b8515118d/try-win32/firefox-52.7.2.en-US.win32.zip

Win64:
https://archive.mozilla.org/pub/firefox/try-builds/fmarier@mozilla.com-b9dab7ede75888a47882ef313a78d85b8515118d/try-win64/firefox-52.7.2.en-US.win64.zip

Flags: needinfo?(luca76)

François Marier [:francois]

Assignee

Updated

•

6 years ago

Comment 20

•

6 years ago

I received positive feedback via email on the Try builds. I think we can go ahead with the approval request on bug 1361699 :)

Flags: needinfo?(francois)

Ryan VanderMeulen [:RyanVM]

Updated

•

6 years ago

status-firefox59: --- → unaffected

status-firefox60: --- → unaffected

status-firefox61: --- → unaffected

status-firefox-esr52: --- → affected

tracking-firefox-esr52: ? → 60+

François Marier [:francois]

Assignee

Comment 21

•

6 years ago

(In reply to Ryan VanderMeulen [:RyanVM] from comment #20)
> I received positive feedback via email on the Try builds. I think we can go
> ahead with the approval request on bug 1361699 :)

https://bugzilla.mozilla.org/show_bug.cgi?id=1361699#c24

Flags: needinfo?(francois)

Ryan VanderMeulen [:RyanVM]

Comment 22

•

6 years ago

Calling this fixed now that bug 1361699 has been uplifted. We can reopen the bug if safebrowsing still shows up as a significant source of slowdowns in next month's 52.8.0 release.

Status: ASSIGNED → RESOLVED

Closed: 6 years ago

status-firefox-esr52: affected → fixed

Resolution: --- → FIXED