Page loading is long paused due to safebrowsing, and Process remains for long term after quit browser while the pause sometimes

RESOLVED FIXED in Firefox 47

Status

()

RESOLVED FIXED
4 years ago
3 years ago

People

(Reporter: alice0775, Assigned: francois)

Tracking

({crash, perf, regression})

33 Branch
mozilla47
crash, perf, regression
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox38 affected, firefox38.0.5 ?, firefox39 affected, firefox40 affected, firefox41 affected, firefox47 fixed, firefox-esr31 unaffected, firefox-esr38 ?)

Details

(crash signature)

Attachments

(13 attachments, 1 obsolete attachment)

1.26 MB, application/x-7z-compressed
Details
707.39 KB, text/plain
Details
319.76 KB, video/mp4
Details
342.40 KB, video/mp4
Details
325.53 KB, application/zip
Details
1.43 MB, text/plain
Details
1.63 MB, application/x-zip
Details
2.59 MB, application/x-zip-compressed
Details
8.97 MB, text/plain
Details
2.82 MB, application/zip
Details
58 bytes, text/x-review-board-request
gcp
: review+
Details
58 bytes, text/x-review-board-request
gcp
: review+
Details
58 bytes, text/x-review-board-request
gcp
: review+
Details
Comment hidden (empty)
Alice0775, is there more info forthcoming?
Flags: needinfo?(alice0775)
(Reporter)

Comment 2

4 years ago
This is initially reported on http://forum.mozilla.gr.jp/cbbs.cgi?mode=all&namber=58541 .

If safebrowsing is enabled(default), page loading is long paused in certain web pages[1],[2].
and Process remains for long term after quit browser while the pause sometimes.

[1] http://www.b-ch.com/  -- login needed
[2] http://tokyo-ame.jwa.or.jp/

Disabling the safebrowsing solves the problem. (Options > Security, Uncheck "Block reported attack site" and "Block reported web forgeries").

The problem does not happen on Chrome and IE.

Steps To Reproduce:
1. Clear everything(cache, cookies and so om) from "Clear Recent History"
2. Restart.
3. Open the site [1] or [2]

Actual Results:
Page loading is long paused for more than 60sec.


Expected Results:
Page loading should be completed within 10sec.

Regression window:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=d0a80ba46f6f&tochange=f3192b2f9195

Via local build
Last Good: 6810b2fa1afe
First Bad: f3192b2f9195
Regressed by: 
f3192b2f9195	Monica Chew — Bug 1029781: Set NS_LOAD_CLASSIFY_URI in image loader (r=seth)
Blocks: 1029781
Keywords: perf, regression
Summary: Page loading is long paused for a while → Page loading is long paused due to safebrowsing, and Process remains for long term after quit browser while the pause sometimes
(Reporter)

Comment 3

4 years ago
Created attachment 8605338 [details]
HTTP log

NSPR_LOG_MODULES=timestamp,nsHttp:5,nsSocketTransport:5,nsStreamPump:5,nsHostResolver:5
Flags: needinfo?(alice0775)
Doug, who owns the safebrowsing stuff now?
Flags: needinfo?(dougt)

Comment 5

4 years ago
I think these pages have the same problem.

http://www.calpis.co.jp/index.html
http://monipla.jp/

Comment 6

4 years ago
The url tokyo-ame.jwa.or.jp/ja/images/button/ seems to match the prefix of the hash in the table goog-malware-shavar, which makes url-classifier get the full hash and update the table. Because there are many images for the page in the directory, url-classifier repeats it many times.

Comment 7

4 years ago
Created attachment 8606704 [details]
debug log

debug log for http://tokyo-ame.jwa.or.jp/
NSPR_LOG_MODULES=UrlClassifierDbService:5

Comment 8

4 years ago
Product should be Toolkit and component should be Safe Browsing.
Component: Security → Safe Browsing
Product: Core → Toolkit
Flags: needinfo?(dougt) → needinfo?(sworkman)
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Flags: needinfo?(sworkman)
Resolution: --- → DUPLICATE
Duplicate of bug: 1165816
(Assignee)

Comment 10

4 years ago
This is similar, but not actually a duplication of bug 1165816 since it's about page loads, not downloads. This is more like bug 1024555.

I can't reproduce with either site 1 or 2.

Can someone who can reproduce this bug test the following work-around:

1. go to about:config
2. set urlclassifier.gethash.timeout_ms to 1
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---

Comment 11

4 years ago
(In reply to François Marier [:francois] from comment #10)
> 1. go to about:config
> 2. set urlclassifier.gethash.timeout_ms to 1

The workaround worked.
(Assignee)

Comment 12

4 years ago
Ok, that means we might need to change the default value of urlclassifier.gethash.timeout_ms then. It's currently at 5000ms as of bug 1024555.

The downside of reducing the timeout is that it will decrease the protection that users on slow connections get from SafeBrowsing.

I will look into how Chrome handles that.
Assignee: nobody → francois
Status: REOPENED → ASSIGNED

Comment 13

4 years ago
(In reply to François Marier [:francois] from comment #12)
> Ok, that means we might need to change the default value of
> urlclassifier.gethash.timeout_ms then. It's currently at 5000ms as of bug
> 1024555.

Although it worked, I suspect it will virtually disable the safe browsing feature. For example, with timeout_ms set to 1, http://www.66fu.com.deletethispart/pk/ (be careful, this is a phishing site) would not be warned.
(Assignee)

Comment 14

4 years ago
(In reply to gnaka from comment #13)
> (In reply to François Marier [:francois] from comment #12)
> Although it worked, I suspect it will virtually disable the safe browsing
> feature. For example, with timeout_ms set to 1,
> http://www.66fu.com.deletethispart/pk/ (be careful, this is a phishing site)
> would not be warned.

Exactly, that's the difficult trade-off we're facing. If we drop it to 1ms, we're essentially turning off Safe Browsing entirely. So we should _not_ do that.

Maybe 5 seconds is too long though. Would 3-4 seconds improve the situation?

I'm surprised that connecting to https://safebrowsing.google.com/safebrowsing/gethash would be that slow. What sort of connection are you using? Mobile? ADSL? Dial-up? Are you based in Japan, China, somewhere else?
(Reporter)

Comment 15

4 years ago
At least, Process hang should be fixed.
(Assignee)

Comment 16

4 years ago
Since I haven't been able to reproduce it, can someone experiencing this problem maybe post a screencast showing what happens when loading one of these pages?

Comment 17

4 years ago
(In reply to François Marier [:francois] from comment #16)
> Since I haven't been able to reproduce it, can someone experiencing this
> problem maybe post a screencast showing what happens when loading one of
> these pages?

https://youtu.be/emhqdAFLmzU

On the machine (Core i5 2520M) which the video was recorded on, Firefox 38.0.1 hardly reproduced. Nightly build did.
On the other hand, on a slower machine (Core 2 Duo E6600), 38.0.1 always reproduces.

Comment 18

4 years ago
(In reply to François Marier [:francois] from comment #14)
> Maybe 5 seconds is too long though. Would 3-4 seconds improve the situation?

I have tried 3000 as timeout_ms, but the problem didn't seem to have been fixed.

> I'm surprised that connecting to
> https://safebrowsing.google.com/safebrowsing/gethash would be that slow.
> What sort of connection are you using? Mobile? ADSL? Dial-up? Are you based
> in Japan, China, somewhere else?

I'm in Japan. The RTT to safebrowsing.google.com(sb.l.google.com[216.58.220.206]) by ping.exe is 6 ms. The web developer tool of Firefox reported 60-100 ms to receive 400 request for the URL.
(Assignee)

Comment 19

4 years ago
(In reply to gnaka from comment #18)
> I'm in Japan. The RTT to
> safebrowsing.google.com(sb.l.google.com[216.58.220.206]) by ping.exe is 6
> ms. The web developer tool of Firefox reported 60-100 ms to receive 400
> request for the URL.

Thanks for that gnaka. It's not as simple as adjusting the timeout then. I will need to look into how these network requests are created. Perhaps the problem is that they are all done synchronously (i.e. serially, not in parallel).

Comment 20

4 years ago
Created attachment 8608606 [details]
tokyo-ame-firefox(100ms).mp4

Comment 21

4 years ago
Created attachment 8608609 [details]
tokyo-ame-chrome.mp4

Comment 22

4 years ago
(In reply to François Marier [:francois] from comment #16)
> Since I haven't been able to reproduce it, can someone experiencing this
> problem maybe post a screencast showing what happens when loading one of
> these pages?

I am the person who made the first report to http://forum.mozilla.gr.jp/cbbs.cgi?mode=all&namber=58541 .

Situation of site 1 (http://www.b-ch.com/) was improved. On April 29: more than 110sec. => Now: 5-10 sec.
Situation of site 2 (http://tokyo-ame.jwa.or.jp/) is still same.

tokyo-ame-chrome.mp4: Loading site 2 using Google Chrome.
tokyo-ame-firefox(100ms).mp4: Loading site 2 using Firefox 38.0.1 (urlclassifier.gethash.timeout_ms=100).
http://tokyo-ame.jwa.or.jp/ is one of the major websites in Japan, especially for the citizens of Tokyo.
Keywords: jp-critical
francois:

The website is very important for Japanese marketing because I see a lot of people who complain about this bug. If we should contact Google Japan, we should move this bug to TechEvangelism. Otherwise, could you take a look?
Flags: needinfo?(francois)

Comment 25

3 years ago
(In reply to yassan from comment #5)
> http://www.calpis.co.jp/index.html
I think that the problem of this page was solved.
(Assignee)

Comment 26

3 years ago
(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #24)
> francois:
> 
> The website is very important for Japanese marketing because I see a lot of
> people who complain about this bug. If we should contact Google Japan, we
> should move this bug to TechEvangelism. Otherwise, could you take a look?

I'm still struggling to reproduce this bug. I will add more debugging info to Nightly to see if we can narrow the problem down a bit that way.

In the meantime, can someone reproduce the problem with Nightly while exporting the following environment variables:

  export NSPR_LOG_MODULES="UrlClassifierDbService:5,nsChannelClassifier:5"

and then attach the log here. The log from comment 7 unfortunately doesn't have enough info.
Flags: needinfo?(francois)
(Assignee)

Updated

3 years ago
Depends on: 1203347

Comment 27

3 years ago
Created attachment 8659164 [details]
debuglog2.zip

Nightly 43.0a1 (20150909030223)
NSPR_LOG_MODULES=timestamp,UrlClassifierDbService:5,nsChannelClassifier:5

tokyo-ame-e10s.log ... e10s
tokyo-ame-Non-e10s.log ... Non-e10s
tokyo-ame-en-e10s.log ... English site(http://tokyo-ame.jwa.or.jp/en/index.html)
(Reporter)

Comment 28

3 years ago
spum
Hmm...
I cannot reproduce the problem anymore......

I think that Google seems to have modified their server/network or something.
(Reporter)

Comment 29

3 years ago
Created attachment 8659309 [details]
log.txt

Reproduced on Nightly43.0a1(2015-09-09) win10 ( http://tokyo-ame.jwa.or.jp/ )
(In reply to Alice0775 White from comment #29)
> Created attachment 8659309 [details]
> log.txt
> 
> Reproduced on Nightly43.0a1(2015-09-09) win10 ( http://tokyo-ame.jwa.or.jp/ )

There is nothing in the log.  You will have to add nsHttp:5 to the list of modules.
(Reporter)

Comment 31

3 years ago
Created attachment 8659361 [details]
log.zip

(In reply to Honza Bambas (not reviewing) (:mayhemer) from comment #30)
> (In reply to Alice0775 White from comment #29)
> > Created attachment 8659309 [details]
> > log.txt
> > 
> > Reproduced on Nightly43.0a1(2015-09-09) win10 ( http://tokyo-ame.jwa.or.jp/ )
> 
> There is nothing in the log.  You will have to add nsHttp:5 to the list of
> modules.

attached
how exactly the symptoms look like?  I can see the top level page html loads in less then a second (from an http channel creation to full content delivery from the network layer).

I only see a one minute delay here:

	Line 118375: 2015-09-10 17:17:49.038000 UTC - 0[caa000]: Preparing to write data into the cache [uri=http://tokyo-ame.jwa.or.jp/favicon.ico]
	Line 124704: 2015-09-10 17:18:57.569000 UTC - 0[caa000]: Preparing to write data into the cache [uri=http://tokyo-ame.jwa.or.jp/scripts/jswidget/SlideController/spacer.gif]


Is this what we are hunting?
Flags: needinfo?(alice0775)
OK, this is the classifier that misbehaves:

	Line 115341: 2015-09-10 17:17:47.733000 UTC - 0[caa000]: nsHttpChannel::Starting nsChannelClassifier bb06140 [this=1152c700]
	Line 115343: 2015-09-10 17:17:47.733000 UTC - 0[caa000]: nsChannelClassifier[bb06140]: suspended channel 1152c72c
	Line 124654: 2015-09-10 17:18:57.569000 UTC - 0[caa000]: nsChannelClassifier[bb06140]:OnClassifyComplete 0 (suspended channel)
	Line 124656: 2015-09-10 17:18:57.569000 UTC - 0[caa000]: nsChannelClassifier[bb06140]: resuming channel 1152c72c from OnClassifyComplete


Unfortunatelly the log implementation for nsChannelClassifier is poor or I don't know how to get info from it.

hopefully identifying at least the classifier ptr will help.
Flags: needinfo?(alice0775)
Is this normal:


	Line 717: 2015-09-10 17:17:39.431000 UTC - 3540[13c1c600]: update took 563ms
	Line 113148: 2015-09-10 17:17:46.648000 UTC - 3540[13c1c600]: update took 2192ms
	Line 117536: 2015-09-10 17:17:48.478000 UTC - 3540[13c1c600]: update took 1830ms
	Line 118755: 2015-09-10 17:17:50.038000 UTC - 3540[13c1c600]: update took 1560ms
	Line 118851: 2015-09-10 17:17:51.600000 UTC - 3540[13c1c600]: update took 1562ms
	Line 118947: 2015-09-10 17:17:53.116000 UTC - 3540[13c1c600]: update took 1516ms
	Line 119043: 2015-09-10 17:17:54.647000 UTC - 3540[13c1c600]: update took 1531ms
	Line 119139: 2015-09-10 17:17:56.178000 UTC - 3540[13c1c600]: update took 1531ms
	Line 119349: 2015-09-10 17:17:57.725000 UTC - 3540[13c1c600]: update took 1547ms
	Line 119455: 2015-09-10 17:17:59.334000 UTC - 3540[13c1c600]: update took 1610ms
	Line 119551: 2015-09-10 17:18:00.881000 UTC - 3540[13c1c600]: update took 1547ms
	Line 119647: 2015-09-10 17:18:02.412000 UTC - 3540[13c1c600]: update took 1531ms
	Line 119746: 2015-09-10 17:18:03.897000 UTC - 3540[13c1c600]: update took 1484ms
	Line 119902: 2015-09-10 17:18:05.351000 UTC - 3540[13c1c600]: update took 1454ms
	Line 119998: 2015-09-10 17:18:06.944000 UTC - 3540[13c1c600]: update took 1593ms
	Line 120094: 2015-09-10 17:18:08.475000 UTC - 3540[13c1c600]: update took 1531ms
	Line 120190: 2015-09-10 17:18:10.178000 UTC - 3540[13c1c600]: update took 1703ms
	Line 120286: 2015-09-10 17:18:11.819000 UTC - 3540[13c1c600]: update took 1641ms
	Line 120382: 2015-09-10 17:18:13.319000 UTC - 3540[13c1c600]: update took 1500ms
	Line 120478: 2015-09-10 17:18:14.788000 UTC - 3540[13c1c600]: update took 1469ms
	Line 120574: 2015-09-10 17:18:16.287000 UTC - 3540[13c1c600]: update took 1500ms
	Line 120670: 2015-09-10 17:18:17.787000 UTC - 3540[13c1c600]: update took 1500ms
	Line 120766: 2015-09-10 17:18:19.257000 UTC - 3540[13c1c600]: update took 1469ms
	Line 120862: 2015-09-10 17:18:20.772000 UTC - 3540[13c1c600]: update took 1515ms
	Line 120958: 2015-09-10 17:18:22.288000 UTC - 3540[13c1c600]: update took 1516ms
	Line 121054: 2015-09-10 17:18:23.772000 UTC - 3540[13c1c600]: update took 1484ms
	Line 121150: 2015-09-10 17:18:25.303000 UTC - 3540[13c1c600]: update took 1531ms
	Line 121246: 2015-09-10 17:18:26.788000 UTC - 3540[13c1c600]: update took 1485ms
	Line 121342: 2015-09-10 17:18:28.319000 UTC - 3540[13c1c600]: update took 1531ms
	Line 121438: 2015-09-10 17:18:29.850000 UTC - 3540[13c1c600]: update took 1531ms
	Line 121534: 2015-09-10 17:18:31.506000 UTC - 3540[13c1c600]: update took 1657ms
	Line 121630: 2015-09-10 17:18:33.069000 UTC - 3540[13c1c600]: update took 1562ms
	Line 121905: 2015-09-10 17:18:34.709000 UTC - 3540[13c1c600]: update took 1641ms
	Line 122624: 2015-09-10 17:18:36.288000 UTC - 3540[13c1c600]: update took 1578ms
	Line 123008: 2015-09-10 17:18:37.866000 UTC - 3540[13c1c600]: update took 1578ms
	Line 123179: 2015-09-10 17:18:39.319000 UTC - 3540[13c1c600]: update took 1453ms
	Line 123295: 2015-09-10 17:18:40.850000 UTC - 3540[13c1c600]: update took 1531ms
	Line 123435: 2015-09-10 17:18:42.303000 UTC - 3540[13c1c600]: update took 1453ms
	Line 123531: 2015-09-10 17:18:43.850000 UTC - 3540[13c1c600]: update took 1547ms
	Line 123740: 2015-09-10 17:18:45.412000 UTC - 3540[13c1c600]: update took 1563ms
	Line 123836: 2015-09-10 17:18:46.944000 UTC - 3540[13c1c600]: update took 1531ms
	Line 123932: 2015-09-10 17:18:48.506000 UTC - 3540[13c1c600]: update took 1563ms
	Line 124028: 2015-09-10 17:18:49.975000 UTC - 3540[13c1c600]: update took 1469ms
	Line 124134: 2015-09-10 17:18:51.475000 UTC - 3540[13c1c600]: update took 1499ms
	Line 124230: 2015-09-10 17:18:53.006000 UTC - 3540[13c1c600]: update took 1531ms
	Line 124326: 2015-09-10 17:18:54.491000 UTC - 3540[13c1c600]: update took 1485ms
	Line 124422: 2015-09-10 17:18:55.991000 UTC - 3540[13c1c600]: update took 1500ms
	Line 124518: 2015-09-10 17:18:57.569000 UTC - 3540[13c1c600]: update took 1578ms
(Assignee)

Comment 35

3 years ago
(In reply to Honza Bambas (not reviewing) (:mayhemer) from comment #33)
> Unfortunatelly the log implementation for nsChannelClassifier is poor or I
> don't know how to get info from it.

I'm working on this in bug 1203347 and should have a patch that makes the logging more useful today.

Comment 36

3 years ago
(In reply to EarlgreyTea from comment #27)

I forgot to write the environment. 
OS: Windows 7 (Windows NT 6.1; Win64; x64) Time: 18:07-18:12 (JST)

I do not know would be helpful, but this problem seems to have started from Firefox 33.0. 
It has been reported in http://skyblues.org/arts/2828 (Japanese blog).
(Assignee)

Comment 37

3 years ago
(In reply to Honza Bambas (not reviewing) (:mayhemer) from comment #34)
> Is this normal:
> 
> 	Line 717: 2015-09-10 17:17:39.431000 UTC - 3540[13c1c600]: update took 563ms
> 	Line 113148: 2015-09-10 17:17:46.648000 UTC - 3540[13c1c600]: update took
> 2192ms

CCing gcp, he might know.
(Assignee)

Comment 38

3 years ago
From the logs provided by EarlgreyTea and Alice0775, I can see that one of the resources on that page hashes to 3D8D92FB which is on the goog-malware-shavar list:

2015-09-10 15:36:10.994000 UTC - 1220[12e8c800]: Probe in goog-malware-shavar: 3D8D92FB, found 1
2015-09-10 15:36:10.994000 UTC - 1220[12e8c800]: Found a result in goog-malware-shavar: Not complete. (Age: 86400s)

Unfortunately, I haven't been able to figure out what URL actually maps to that hash.

I extracted all of the fragments and all of the HTTP URIs from Alice0775's log:

  grep "Checking fragment" log.txt | cut -d' ' -f8 | sort | uniq
  grep ": uri=http://" log-full.txt | cut -d' ' -f 6 | sort | uniq

and then hashed all of the possible fragments using a script I wrote:

  https://github.com/fmarier/safebrowsing-tools/blob/master/regex-lookup.py

but 3D8D82FB is nowhere to be found.

Hopefully the extra logging I'm adding will help identify that resource.

Comment 39

3 years ago
Created attachment 8660757 [details]
safebrowsing.zip

goog-malware-shavar database files of my profile. Replace files in your safebrowsing folder of your profile by these files and it might be reproduced.
(Assignee)

Comment 40

3 years ago
Could someone please re-test with today's Nightly and the following environment variables:

  export NSPR_LOG_MODULES="UrlClassifierDbService:5,nsChannelClassifier:5,nsHttp:5"

and then attach a log to this bug?

It should have more information about what the URL classifier is doing.
(Assignee)

Comment 41

3 years ago
Also, please go into "about:config" and set "browser.safebrowsing.debug" to "true".

Comment 42

3 years ago
Created attachment 8662274 [details]
tokyo-ame-201509171802.log

Nightly 43.0a1 (20150916030203)
NSPR_LOG_MODULES=UrlClassifierDbService:5,nsChannelClassifier:5,nsHttp:5
browser.safebrowsing.debug: true
Time:18:02 (JST)
(Assignee)

Updated

3 years ago
Depends on: 1155646
(Assignee)

Comment 43

3 years ago
The offending resource is:

7000[1a73a900]: Checking fragment tokyo-ame.jwa.or.jp/ja/images/, hash FB928D3D9BB6B30740D5C8BD9D82B50B4CD53692752DB091B3C61C9F4AE3E031 (3D8D92FB)
7000[1a73a900]: Probe in goog-malware-shavar: 3D8D92FB, found 1

Looking at my local goog-malware-shavar DB, I do have that entry:

[goog-malware-shavar] addPrefix[chunk:177585] 3d8d92fb

I will try to reproduce the problem locally.
(In reply to François Marier [:francois] from comment #37)
> (In reply to Honza Bambas (not reviewing) (:mayhemer) from comment #34)
> > Is this normal:
> > 
> > 	Line 717: 2015-09-10 17:17:39.431000 UTC - 3540[13c1c600]: update took 563ms
> > 	Line 113148: 2015-09-10 17:17:46.648000 UTC - 3540[13c1c600]: update took
> > 2192ms
> 
> CCing gcp, he might know.

Yes, that's the time including rebuilding all the DBs and writing them out, so that's fine.
(In reply to Gian-Carlo Pascutto [:gcp] from comment #44)

> > CCing gcp, he might know.
> 
> Yes, that's the time including rebuilding all the DBs and writing them out,
> so that's fine.

	Line 717: 2015-09-10 17:17:39.431000 UTC - 3540[13c1c600]: update took 563ms
	...repeated many times...
	Line 124518: 2015-09-10 17:18:57.569000 UTC - 3540[13c1c600]: update took 1578ms


WAIT, WAIT, why are we completely rebuilding and rewriting the database 20 to 30 times in one minute?

That would certainly make things slow and time out!
(In reply to Gian-Carlo Pascutto [:gcp] from comment #45)
> WAIT, WAIT, ...

Sorry, I forgot to mention it repeats several times in the log, apparently in serial order and probably is the cause of the overall slowdown.
(Assignee)

Comment 47

3 years ago
Created attachment 8663898 [details]
goog-malware-shavar.*

Just attaching a SB DB which has the colliding entry in goog-malware-shavar to help with reproducing this bug.
(Assignee)

Comment 48

3 years ago
Alright, I was finally able to reproduce this problem:

1. create a new profile
2. shutdown Firefox
3. put the contents of goog-malware.* from attachment 8663898 [details] into the safebrowsing cache
4. start firefox
5. wait 10 seconds for the SB updates to finish
6. open http://tokyo-ame.jwa.or.jp/

Note that the second visit to the site in the same session will not be slow, which indicates that we are caching completion misses. Also, this has nothing to do with timeouts while talking to the Google servers.

This is what's happening:

- We load http://tokyo-ame.jwa.or.jp/ and that has several images that collide with partial hash 3D8D92FB (chunk 177585).
- For each image, we start a completion and add three noise entries.
- When all four completions for that image finish, we update the goog-malware-shavar table.
- When the completions for the next image come back, we again update the table.

So we need to figure out a way to put a limit on the frequency of database updates.
>- We load http://tokyo-ame.jwa.or.jp/ and that has several images that collide with partial hash 3D8D92FB (chunk 177585).

How is this possible? Common prefix that hits the normalizations we try to match?

>- For each image, we start a completion and add three noise entries.

For the same prefix? That's a bug - there should be only one. I guess the problem is the new queries come in while the response for the original is still outstanding? IMHO we should remember which queries are outstanding and not launch new ones for the same prefix.

>So we need to figure out a way to put a limit on the frequency of database updates.

Can we short circuit the case where the updates only contain completion updates? Most of the heavy duty work is (un)compressing the PrefixSet and that could be skipped entirely.
(In reply to Gian-Carlo Pascutto [:gcp] from comment #49)
> Can we short circuit the case where the updates only contain completion
> updates? Most of the heavy duty work is (un)compressing the PrefixSet and
> that could be skipped entirely.

I looked into this and the answer is no, because completes are cached in .cache but stored in .sbtore (together with the add chunks) - and .sbtore is the expensive file to process. Even just skipping the PrefixSet rebuild didn't work, probably due to the atomic updates.

I guess we have to try to first fix, i.e. don't allow multiple outstanding queries for the same prefix.

Comment 51

3 years ago
I've got not to be able to reproduce the issue since around yesterday or today on Firefox 41.0. Something in goog-malware-shavar server may be changed.

Comment 52

3 years ago
Today, I also can not reproduce the issue. Loading completed in about two seconds.
(Assignee)

Updated

3 years ago
Keywords: jp-critical
(Assignee)

Updated

3 years ago
Depends on: 1215396
(Assignee)

Updated

3 years ago
Depends on: 1229981
Duplicate of this bug: 1243742

Comment 54

3 years ago
From the dupe:
Another website to test with crash at exit after Firefox stays too long in the process manager.
http://www.pferdekaemper.de/
Crash Signature: [@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_WaitCondVar | nsThread::ProcessNextEvent | NS_ProcessNextEvent | nsThread::Shutdown ]
Keywords: crash
Duplicate of this bug: 1033493
(Assignee)

Comment 56

3 years ago
Created attachment 8725949 [details]
MozReview Request: Bug 1164518 - Better logging of completions. r?gcp

Review commit: https://reviewboard.mozilla.org/r/37703/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/37703/
Attachment #8725949 - Flags: review?(gpascutto)
(Assignee)

Comment 57

3 years ago
Created attachment 8725950 [details]
MozReview Request: Bug 1164518 - Minor optimizations for Safe Browsing completions. r?gcp

- added an early exit to skip table string comparisons on a 204
- removed an unnecessary NS_WARNING on gethash errors (e.g. 503s)
- do the noise check first since it's the majority of completions

Review commit: https://reviewboard.mozilla.org/r/37705/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/37705/
Attachment #8725950 - Flags: review?(gpascutto)
(Assignee)

Comment 58

3 years ago
Created attachment 8725951 [details]
MozReview Request: Bug 1164518 - Avoid unnecessary DB updates when caching Safe Browsing results. r?gcp

Review commit: https://reviewboard.mozilla.org/r/37707/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/37707/
Attachment #8725951 - Flags: review?(gpascutto)
Comment on attachment 8725949 [details]
MozReview Request: Bug 1164518 - Better logging of completions. r?gcp

https://reviewboard.mozilla.org/r/37703/#review34409
Attachment #8725949 - Flags: review?(gpascutto) → review+
Comment on attachment 8725950 [details]
MozReview Request: Bug 1164518 - Minor optimizations for Safe Browsing completions. r?gcp

https://reviewboard.mozilla.org/r/37705/#review34411
Attachment #8725950 - Flags: review?(gpascutto) → review+
Comment on attachment 8725951 [details]
MozReview Request: Bug 1164518 - Avoid unnecessary DB updates when caching Safe Browsing results. r?gcp

https://reviewboard.mozilla.org/r/37707/#review34413

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1473
(Diff revision 1)
> +  mWorker->ClearLastResults();

This needs a lock or proxy to the worker thread. Is it required to pass some tests? It's not clear why because there is supposed to be no functional impact from this change.
Attachment #8725951 - Flags: review?(gpascutto)
(Assignee)

Comment 62

3 years ago
Comment on attachment 8725951 [details]
MozReview Request: Bug 1164518 - Avoid unnecessary DB updates when caching Safe Browsing results. r?gcp

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/37707/diff/1-2/
Attachment #8725951 - Flags: review?(gpascutto)
(Assignee)

Comment 63

3 years ago
https://reviewboard.mozilla.org/r/37707/#review34413

> This needs a lock or proxy to the worker thread. Is it required to pass some tests? It's not clear why because there is supposed to be no functional impact from this change.

Yeah, it's for [a test](https://dxr.mozilla.org/mozilla-central/rev/2b5237c178ea02133a777396c24dd2b713f2b8ee/toolkit/components/url-classifier/tests/unit/test_partial.js#561-599) that does something weird with multiple completers.

I added a proxy to the worker thread.
Attachment #8725951 - Flags: review?(gpascutto) → review+
Comment on attachment 8725951 [details]
MozReview Request: Bug 1164518 - Avoid unnecessary DB updates when caching Safe Browsing results. r?gcp

https://reviewboard.mozilla.org/r/37707/#review34677

::: toolkit/components/url-classifier/Entries.h:185
(Diff revision 2)
> +  bool operator!=(const AddComplete& aOther) const

nit: inconsistent bracing

::: toolkit/components/url-classifier/LookupCache.h:72
(Diff revision 2)
> +  {

nit: inconsitent bracing
Created attachment 8726784 [details]
MozReview Request: Bug 1164518 - Avoid unnecessary DB updates when caching Safe Browsing results. r?gcp

Review commit: https://reviewboard.mozilla.org/r/38235/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/38235/
Attachment #8726784 - Flags: review?(gpascutto)
Attachment #8726784 - Attachment is obsolete: true
Attachment #8726784 - Flags: review?(gpascutto)
(Assignee)

Comment 66

3 years ago
Comment on attachment 8725951 [details]
MozReview Request: Bug 1164518 - Avoid unnecessary DB updates when caching Safe Browsing results. r?gcp

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/37707/diff/2-3/

Comment 68

3 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/3f15dd1859d8
https://hg.mozilla.org/mozilla-central/rev/105eec7abaf0
https://hg.mozilla.org/mozilla-central/rev/7f5b604e1b5a
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago3 years ago
status-firefox47: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
(Assignee)

Updated

3 years ago
Depends on: 1254310
(Assignee)

Updated

3 years ago
Depends on: 1254323
(Assignee)

Updated

3 years ago
QA Contact: mwobensmith

Updated

3 years ago
Blocks: 1248837
I tried to reproduce the original problem in some builds from January and February, pre-checkin, but wasn't able to see it. I followed the steps in comment 48, and also tried the sample URL in comment 54. 

Obviously web sites change, so that could very well be the issue here. But currently, I won't be able to verify this fix. If someone feels strongly that I do so, please provide another example.
You need to log in before you can comment on or make changes to this bug.