Closed Bug 1164518 Opened 9 years ago Closed 8 years ago

Page loading pauses for a long time due to Safe Browsing, and the process sometimes remains running long after quitting the browser during the pause

Categories

(Toolkit :: Safe Browsing, defect)

33 Branch
defect
Not set
normal

Tracking


RESOLVED FIXED
mozilla47
Tracking Status
firefox38 --- wontfix
firefox38.0.5 --- wontfix
firefox39 --- wontfix
firefox40 --- wontfix
firefox41 --- wontfix
firefox47 --- fixed
firefox-esr31 --- unaffected
firefox-esr38 --- wontfix

People

(Reporter: alice0775, Assigned: francois)

References

Details

(Keywords: crash, perf, regression)

Crash Data

Attachments

(13 files, 1 obsolete file)

1.26 MB, application/x-7z-compressed
Details
707.39 KB, text/plain
Details
319.76 KB, video/mp4
Details
342.40 KB, video/mp4
Details
325.53 KB, application/zip
Details
1.43 MB, text/plain
Details
1.63 MB, application/x-zip
Details
2.59 MB, application/x-zip-compressed
Details
8.97 MB, text/plain
Details
2.82 MB, application/zip
Details
58 bytes, text/x-review-board-request
gcp
: review+
Details
58 bytes, text/x-review-board-request
gcp
: review+
Details
58 bytes, text/x-review-board-request
gcp
: review+
Details
      No description provided.
Alice0775, is there more info forthcoming?
Flags: needinfo?(alice0775)
This is initially reported on http://forum.mozilla.gr.jp/cbbs.cgi?mode=all&namber=58541 .

If Safe Browsing is enabled (the default), page loading pauses for a long time on certain web pages [1], [2],
and the process sometimes remains running for a long time after quitting the browser during the pause.

[1] http://www.b-ch.com/  -- login needed
[2] http://tokyo-ame.jwa.or.jp/

Disabling Safe Browsing solves the problem (Options > Security, uncheck "Block reported attack sites" and "Block reported web forgeries").

The problem does not happen in Chrome or IE.

Steps To Reproduce:
1. Clear everything (cache, cookies, and so on) via "Clear Recent History"
2. Restart.
3. Open the site [1] or [2]

Actual Results:
Page loading pauses for more than 60 seconds.


Expected Results:
Page loading should complete within 10 seconds.

Regression window:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=d0a80ba46f6f&tochange=f3192b2f9195

Via local build
Last Good: 6810b2fa1afe
First Bad: f3192b2f9195
Regressed by: 
f3192b2f9195	Monica Chew — Bug 1029781: Set NS_LOAD_CLASSIFY_URI in image loader (r=seth)
Blocks: 1029781
Keywords: perf, regression
Summary: Page loading is long paused for a while → Page loading pauses for a long time due to Safe Browsing, and the process sometimes remains running long after quitting the browser during the pause
Attached file HTTP log
NSPR_LOG_MODULES=timestamp,nsHttp:5,nsSocketTransport:5,nsStreamPump:5,nsHostResolver:5
Flags: needinfo?(alice0775)
Doug, who owns the safebrowsing stuff now?
Flags: needinfo?(dougt)
I think these pages have the same problem.

http://www.calpis.co.jp/index.html
http://monipla.jp/
The URL tokyo-ame.jwa.or.jp/ja/images/button/ seems to match a hash prefix in the goog-malware-shavar table, which makes the URL classifier fetch the full hash and update the table. Because the page loads many images from that directory, the URL classifier repeats this many times.
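For context, a minimal sketch of how one URL expands into the fragments checked against the local prefix table, following the Safe Browsing v2 lookup rules (simplified, no canonicalization; the image URL is a hypothetical example from that directory):

```python
from urllib.parse import urlsplit

def lookup_fragments(url):
    """Expand a URL into host-suffix/path-prefix fragments, per the
    Safe Browsing v2 lookup rules (simplified sketch)."""
    parts = urlsplit(url)
    host = parts.netloc
    path = parts.path or "/"

    # The exact host, plus up to four suffixes (last 2..5 components).
    labels = host.split(".")
    hosts = {host}
    for n in range(2, 6):
        if n < len(labels):
            hosts.add(".".join(labels[-n:]))

    # The exact path (with and without the query string), plus up to
    # four directory prefixes starting at the root.
    paths = {path, "/"}
    if parts.query:
        paths.add(path + "?" + parts.query)
    prefix = "/"
    for seg in path.strip("/").split("/")[:-1][:3]:
        prefix += seg + "/"
        paths.add(prefix)

    return sorted(h + p for h in hosts for p in paths)

# A hypothetical image URL from the page; its fragments include
# "tokyo-ame.jwa.or.jp/ja/images/" among others.
for fragment in lookup_fragments("http://tokyo-ame.jwa.or.jp/ja/images/button/btn.gif"):
    print(fragment)
```

Every image under that directory generates the same colliding fragment, which is why the lookup repeats for each one.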
Attached file debug log
debug log for http://tokyo-ame.jwa.or.jp/
NSPR_LOG_MODULES=UrlClassifierDbService:5
Product should be Toolkit and component should be Safe Browsing.
Component: Security → Safe Browsing
Product: Core → Toolkit
Flags: needinfo?(dougt) → needinfo?(sworkman)
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: needinfo?(sworkman)
Resolution: --- → DUPLICATE
This is similar, but not actually a duplicate of bug 1165816, since this one is about page loads, not downloads. This is more like bug 1024555.

I can't reproduce with either site 1 or 2.

Can someone who can reproduce this bug test the following work-around:

1. go to about:config
2. set urlclassifier.gethash.timeout_ms to 1
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
(In reply to François Marier [:francois] from comment #10)
> 1. go to about:config
> 2. set urlclassifier.gethash.timeout_ms to 1

The workaround worked.
Ok, that means we might need to change the default value of urlclassifier.gethash.timeout_ms then. It's currently at 5000ms as of bug 1024555.

The downside of reducing the timeout is that it will decrease the protection that users on slow connections get from SafeBrowsing.

I will look into how Chrome handles that.
Assignee: nobody → francois
Status: REOPENED → ASSIGNED
(In reply to François Marier [:francois] from comment #12)
> Ok, that means we might need to change the default value of
> urlclassifier.gethash.timeout_ms then. It's currently at 5000ms as of bug
> 1024555.

Although it worked, I suspect it will effectively disable the Safe Browsing feature. For example, with timeout_ms set to 1, http://www.66fu.com.deletethispart/pk/ (be careful, this is a phishing site) would not trigger a warning.
(In reply to gnaka from comment #13)
> (In reply to François Marier [:francois] from comment #12)
> Although it worked, I suspect it will virtually disable the safe browsing
> feature. For example, with timeout_ms set to 1,
> http://www.66fu.com.deletethispart/pk/ (be careful, this is a phishing site)
> would not be warned.

Exactly, that's the difficult trade-off we're facing. If we drop it to 1ms, we're essentially turning off Safe Browsing entirely. So we should _not_ do that.

Maybe 5 seconds is too long though. Would 3-4 seconds improve the situation?

I'm surprised that connecting to https://safebrowsing.google.com/safebrowsing/gethash would be that slow. What sort of connection are you using? Mobile? ADSL? Dial-up? Are you based in Japan, China, somewhere else?
At the very least, the process hang should be fixed.
Since I haven't been able to reproduce it, can someone experiencing this problem maybe post a screencast showing what happens when loading one of these pages?
(In reply to François Marier [:francois] from comment #16)
> Since I haven't been able to reproduce it, can someone experiencing this
> problem maybe post a screencast showing what happens when loading one of
> these pages?

https://youtu.be/emhqdAFLmzU

On the machine the video was recorded on (Core i5 2520M), Firefox 38.0.1 rarely reproduced the problem, but a Nightly build did.
On a slower machine (Core 2 Duo E6600), 38.0.1 always reproduces it.
(In reply to François Marier [:francois] from comment #14)
> Maybe 5 seconds is too long though. Would 3-4 seconds improve the situation?

I tried 3000 as timeout_ms, but it didn't seem to fix the problem.

> I'm surprised that connecting to
> https://safebrowsing.google.com/safebrowsing/gethash would be that slow.
> What sort of connection are you using? Mobile? ADSL? Dial-up? Are you based
> in Japan, China, somewhere else?

I'm in Japan. The RTT to safebrowsing.google.com (sb.l.google.com [216.58.220.206]) measured with ping.exe is 6 ms. Firefox's web developer tools reported 60-100 ms to receive a 400 response for the URL.
(In reply to gnaka from comment #18)
> I'm in Japan. The RTT to
> safebrowsing.google.com(sb.l.google.com[216.58.220.206]) by ping.exe is 6
> ms. The web developer tool of Firefox reported 60-100 ms to receive 400
> request for the URL.

Thanks for that gnaka. It's not as simple as adjusting the timeout then. I will need to look into how these network requests are created. Perhaps the problem is that they are all done synchronously (i.e. serially, not in parallel).
Attached video tokyo-ame-chrome.mp4
(In reply to François Marier [:francois] from comment #16)
> Since I haven't been able to reproduce it, can someone experiencing this
> problem maybe post a screencast showing what happens when loading one of
> these pages?

I am the person who made the first report to http://forum.mozilla.gr.jp/cbbs.cgi?mode=all&namber=58541 .

The situation with site 1 (http://www.b-ch.com/) has improved. On April 29: more than 110 sec => now: 5-10 sec.
The situation with site 2 (http://tokyo-ame.jwa.or.jp/) is still the same.

tokyo-ame-chrome.mp4: Loading site 2 using Google Chrome.
tokyo-ame-firefox(100ms).mp4: Loading site 2 using Firefox 38.0.1 (urlclassifier.gethash.timeout_ms=100).
http://tokyo-ame.jwa.or.jp/ is one of the major websites in Japan, especially for residents of Tokyo.
Keywords: jp-critical
francois:

This website is very important for the Japanese market, and I see a lot of people complaining about this bug. If we need to contact Google Japan, we should move this bug to TechEvangelism. Otherwise, could you take a look?
Flags: needinfo?(francois)
(In reply to yassan from comment #5)
> http://www.calpis.co.jp/index.html
I think the problem with this page has been solved.
(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #24)
> francois:
> 
> The website is very important for Japanese marketing because I see a lot of
> people who complain about this bug. If we should contact Google Japan, we
> should move this bug to TechEvangelism. Otherwise, could you take a look?

I'm still struggling to reproduce this bug. I will add more debugging info to Nightly to see if we can narrow the problem down a bit that way.

In the meantime, can someone reproduce the problem with Nightly while exporting the following environment variables:

  export NSPR_LOG_MODULES="UrlClassifierDbService:5,nsChannelClassifier:5"

and then attach the log here. The log from comment 7 unfortunately doesn't have enough info.
Flags: needinfo?(francois)
Depends on: 1203347
Attached file debuglog2.zip
Nightly 43.0a1 (20150909030223)
NSPR_LOG_MODULES=timestamp,UrlClassifierDbService:5,nsChannelClassifier:5

tokyo-ame-e10s.log ... e10s
tokyo-ame-Non-e10s.log ... Non-e10s
tokyo-ame-en-e10s.log ... English site(http://tokyo-ame.jwa.or.jp/en/index.html)
Hmm...
I cannot reproduce the problem anymore...

I think Google may have modified their server/network or something.
Attached file log.txt
Reproduced on Nightly43.0a1(2015-09-09) win10 ( http://tokyo-ame.jwa.or.jp/ )
(In reply to Alice0775 White from comment #29)
> Created attachment 8659309 [details]
> log.txt
> 
> Reproduced on Nightly43.0a1(2015-09-09) win10 ( http://tokyo-ame.jwa.or.jp/ )

There is nothing in the log.  You will have to add nsHttp:5 to the list of modules.
Attached file log.zip
(In reply to Honza Bambas (not reviewing) (:mayhemer) from comment #30)
> (In reply to Alice0775 White from comment #29)
> > Created attachment 8659309 [details]
> > log.txt
> > 
> > Reproduced on Nightly43.0a1(2015-09-09) win10 ( http://tokyo-ame.jwa.or.jp/ )
> 
> There is nothing in the log.  You will have to add nsHttp:5 to the list of
> modules.

attached
What exactly do the symptoms look like? I can see that the top-level page HTML loads in less than a second (from HTTP channel creation to full content delivery from the network layer).

I only see a one minute delay here:

	Line 118375: 2015-09-10 17:17:49.038000 UTC - 0[caa000]: Preparing to write data into the cache [uri=http://tokyo-ame.jwa.or.jp/favicon.ico]
	Line 124704: 2015-09-10 17:18:57.569000 UTC - 0[caa000]: Preparing to write data into the cache [uri=http://tokyo-ame.jwa.or.jp/scripts/jswidget/SlideController/spacer.gif]


Is this what we are hunting?
Flags: needinfo?(alice0775)
OK, this is the classifier that misbehaves:

	Line 115341: 2015-09-10 17:17:47.733000 UTC - 0[caa000]: nsHttpChannel::Starting nsChannelClassifier bb06140 [this=1152c700]
	Line 115343: 2015-09-10 17:17:47.733000 UTC - 0[caa000]: nsChannelClassifier[bb06140]: suspended channel 1152c72c
	Line 124654: 2015-09-10 17:18:57.569000 UTC - 0[caa000]: nsChannelClassifier[bb06140]:OnClassifyComplete 0 (suspended channel)
	Line 124656: 2015-09-10 17:18:57.569000 UTC - 0[caa000]: nsChannelClassifier[bb06140]: resuming channel 1152c72c from OnClassifyComplete


Unfortunately, the logging for nsChannelClassifier is poor, or I don't know how to get more info out of it.

Hopefully identifying at least the classifier pointer will help.
Flags: needinfo?(alice0775)
Is this normal:


	Line 717: 2015-09-10 17:17:39.431000 UTC - 3540[13c1c600]: update took 563ms
	Line 113148: 2015-09-10 17:17:46.648000 UTC - 3540[13c1c600]: update took 2192ms
	Line 117536: 2015-09-10 17:17:48.478000 UTC - 3540[13c1c600]: update took 1830ms
	Line 118755: 2015-09-10 17:17:50.038000 UTC - 3540[13c1c600]: update took 1560ms
	Line 118851: 2015-09-10 17:17:51.600000 UTC - 3540[13c1c600]: update took 1562ms
	Line 118947: 2015-09-10 17:17:53.116000 UTC - 3540[13c1c600]: update took 1516ms
	Line 119043: 2015-09-10 17:17:54.647000 UTC - 3540[13c1c600]: update took 1531ms
	Line 119139: 2015-09-10 17:17:56.178000 UTC - 3540[13c1c600]: update took 1531ms
	Line 119349: 2015-09-10 17:17:57.725000 UTC - 3540[13c1c600]: update took 1547ms
	Line 119455: 2015-09-10 17:17:59.334000 UTC - 3540[13c1c600]: update took 1610ms
	Line 119551: 2015-09-10 17:18:00.881000 UTC - 3540[13c1c600]: update took 1547ms
	Line 119647: 2015-09-10 17:18:02.412000 UTC - 3540[13c1c600]: update took 1531ms
	Line 119746: 2015-09-10 17:18:03.897000 UTC - 3540[13c1c600]: update took 1484ms
	Line 119902: 2015-09-10 17:18:05.351000 UTC - 3540[13c1c600]: update took 1454ms
	Line 119998: 2015-09-10 17:18:06.944000 UTC - 3540[13c1c600]: update took 1593ms
	Line 120094: 2015-09-10 17:18:08.475000 UTC - 3540[13c1c600]: update took 1531ms
	Line 120190: 2015-09-10 17:18:10.178000 UTC - 3540[13c1c600]: update took 1703ms
	Line 120286: 2015-09-10 17:18:11.819000 UTC - 3540[13c1c600]: update took 1641ms
	Line 120382: 2015-09-10 17:18:13.319000 UTC - 3540[13c1c600]: update took 1500ms
	Line 120478: 2015-09-10 17:18:14.788000 UTC - 3540[13c1c600]: update took 1469ms
	Line 120574: 2015-09-10 17:18:16.287000 UTC - 3540[13c1c600]: update took 1500ms
	Line 120670: 2015-09-10 17:18:17.787000 UTC - 3540[13c1c600]: update took 1500ms
	Line 120766: 2015-09-10 17:18:19.257000 UTC - 3540[13c1c600]: update took 1469ms
	Line 120862: 2015-09-10 17:18:20.772000 UTC - 3540[13c1c600]: update took 1515ms
	Line 120958: 2015-09-10 17:18:22.288000 UTC - 3540[13c1c600]: update took 1516ms
	Line 121054: 2015-09-10 17:18:23.772000 UTC - 3540[13c1c600]: update took 1484ms
	Line 121150: 2015-09-10 17:18:25.303000 UTC - 3540[13c1c600]: update took 1531ms
	Line 121246: 2015-09-10 17:18:26.788000 UTC - 3540[13c1c600]: update took 1485ms
	Line 121342: 2015-09-10 17:18:28.319000 UTC - 3540[13c1c600]: update took 1531ms
	Line 121438: 2015-09-10 17:18:29.850000 UTC - 3540[13c1c600]: update took 1531ms
	Line 121534: 2015-09-10 17:18:31.506000 UTC - 3540[13c1c600]: update took 1657ms
	Line 121630: 2015-09-10 17:18:33.069000 UTC - 3540[13c1c600]: update took 1562ms
	Line 121905: 2015-09-10 17:18:34.709000 UTC - 3540[13c1c600]: update took 1641ms
	Line 122624: 2015-09-10 17:18:36.288000 UTC - 3540[13c1c600]: update took 1578ms
	Line 123008: 2015-09-10 17:18:37.866000 UTC - 3540[13c1c600]: update took 1578ms
	Line 123179: 2015-09-10 17:18:39.319000 UTC - 3540[13c1c600]: update took 1453ms
	Line 123295: 2015-09-10 17:18:40.850000 UTC - 3540[13c1c600]: update took 1531ms
	Line 123435: 2015-09-10 17:18:42.303000 UTC - 3540[13c1c600]: update took 1453ms
	Line 123531: 2015-09-10 17:18:43.850000 UTC - 3540[13c1c600]: update took 1547ms
	Line 123740: 2015-09-10 17:18:45.412000 UTC - 3540[13c1c600]: update took 1563ms
	Line 123836: 2015-09-10 17:18:46.944000 UTC - 3540[13c1c600]: update took 1531ms
	Line 123932: 2015-09-10 17:18:48.506000 UTC - 3540[13c1c600]: update took 1563ms
	Line 124028: 2015-09-10 17:18:49.975000 UTC - 3540[13c1c600]: update took 1469ms
	Line 124134: 2015-09-10 17:18:51.475000 UTC - 3540[13c1c600]: update took 1499ms
	Line 124230: 2015-09-10 17:18:53.006000 UTC - 3540[13c1c600]: update took 1531ms
	Line 124326: 2015-09-10 17:18:54.491000 UTC - 3540[13c1c600]: update took 1485ms
	Line 124422: 2015-09-10 17:18:55.991000 UTC - 3540[13c1c600]: update took 1500ms
	Line 124518: 2015-09-10 17:18:57.569000 UTC - 3540[13c1c600]: update took 1578ms
(In reply to Honza Bambas (not reviewing) (:mayhemer) from comment #33)
> Unfortunatelly the log implementation for nsChannelClassifier is poor or I
> don't know how to get info from it.

I'm working on this in bug 1203347 and should have a patch that makes the logging more useful today.
(In reply to EarlgreyTea from comment #27)

I forgot to mention the environment.
OS: Windows 7 (Windows NT 6.1; Win64; x64) Time: 18:07-18:12 (JST)

I don't know whether this will be helpful, but this problem seems to have started with Firefox 33.0.
It has been reported at http://skyblues.org/arts/2828 (Japanese blog).
(In reply to Honza Bambas (not reviewing) (:mayhemer) from comment #34)
> Is this normal:
> 
> 	Line 717: 2015-09-10 17:17:39.431000 UTC - 3540[13c1c600]: update took 563ms
> 	Line 113148: 2015-09-10 17:17:46.648000 UTC - 3540[13c1c600]: update took
> 2192ms

CCing gcp, he might know.
From the logs provided by EarlgreyTea and Alice0775, I can see that one of the resources on that page hashes to 3D8D92FB which is on the goog-malware-shavar list:

2015-09-10 15:36:10.994000 UTC - 1220[12e8c800]: Probe in goog-malware-shavar: 3D8D92FB, found 1
2015-09-10 15:36:10.994000 UTC - 1220[12e8c800]: Found a result in goog-malware-shavar: Not complete. (Age: 86400s)

Unfortunately, I haven't been able to figure out what URL actually maps to that hash.

I extracted all of the fragments and all of the HTTP URIs from Alice0775's log:

  grep "Checking fragment" log.txt | cut -d' ' -f8 | sort | uniq
  grep ": uri=http://" log-full.txt | cut -d' ' -f 6 | sort | uniq

and then hashed all of the possible fragments using a script I wrote:

  https://github.com/fmarier/safebrowsing-tools/blob/master/regex-lookup.py

but 3D8D92FB is nowhere to be found.

Hopefully the extra logging I'm adding will help identify that resource.
Attached file safebrowsing.zip
The goog-malware-shavar database files from my profile. Replace the files in the safebrowsing folder of your profile with these files and the problem might be reproducible.
Could someone please re-test with today's Nightly and the following environment variables:

  export NSPR_LOG_MODULES="UrlClassifierDbService:5,nsChannelClassifier:5,nsHttp:5"

and then attach a log to this bug?

It should have more information about what the URL classifier is doing.
Also, please go into "about:config" and set "browser.safebrowsing.debug" to "true".
Nightly 43.0a1 (20150916030203)
NSPR_LOG_MODULES=UrlClassifierDbService:5,nsChannelClassifier:5,nsHttp:5
browser.safebrowsing.debug: true
Time:18:02 (JST)
The offending resource is:

7000[1a73a900]: Checking fragment tokyo-ame.jwa.or.jp/ja/images/, hash FB928D3D9BB6B30740D5C8BD9D82B50B4CD53692752DB091B3C61C9F4AE3E031 (3D8D92FB)
7000[1a73a900]: Probe in goog-malware-shavar: 3D8D92FB, found 1

Looking at my local goog-malware-shavar DB, I do have that entry:

[goog-malware-shavar] addPrefix[chunk:177585] 3d8d92fb

I will try to reproduce the problem locally.
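As a sanity check, a minimal sketch of how that fragment maps to the 4-byte prefix (assuming standard SHA-256 hashing per the Safe Browsing v2 protocol; the expected values are taken from the log above):

```python
import hashlib

# The fragment identified above as the offending one.
fragment = "tokyo-ame.jwa.or.jp/ja/images/"

digest = hashlib.sha256(fragment.encode("ascii")).digest()

# Full 32-byte hash; per the log: FB928D3D9BB6...4AE3E031
print(digest.hex().upper())

# The table stores only the first 4 bytes (FB 92 8D 3D); the
# classifier log prints them as a little-endian 32-bit integer,
# which is why the probe appears as 3D8D92FB.
print(digest[:4].hex().upper())        # FB928D3D
print(digest[:4][::-1].hex().upper())  # 3D8D92FB
```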
(In reply to François Marier [:francois] from comment #37)
> (In reply to Honza Bambas (not reviewing) (:mayhemer) from comment #34)
> > Is this normal:
> > 
> > 	Line 717: 2015-09-10 17:17:39.431000 UTC - 3540[13c1c600]: update took 563ms
> > 	Line 113148: 2015-09-10 17:17:46.648000 UTC - 3540[13c1c600]: update took
> > 2192ms
> 
> CCing gcp, he might know.

Yes, that's the time including rebuilding all the DBs and writing them out, so that's fine.
(In reply to Gian-Carlo Pascutto [:gcp] from comment #44)

> > CCing gcp, he might know.
> 
> Yes, that's the time including rebuilding all the DBs and writing them out,
> so that's fine.

	Line 717: 2015-09-10 17:17:39.431000 UTC - 3540[13c1c600]: update took 563ms
	...repeated many times...
	Line 124518: 2015-09-10 17:18:57.569000 UTC - 3540[13c1c600]: update took 1578ms


WAIT, WAIT, why are we completely rebuilding and rewriting the database 20 to 30 times in one minute?

That would certainly make things slow and time out!
(In reply to Gian-Carlo Pascutto [:gcp] from comment #45)
> WAIT, WAIT, ...

Sorry, I forgot to mention that it repeats several times in the log, apparently serially, and is probably the cause of the overall slowdown.
Attached file goog-malware-shavar.*
Just attaching a SB DB which has the colliding entry in goog-malware-shavar to help with reproducing this bug.
Alright, I was finally able to reproduce this problem:

1. create a new profile
2. shutdown Firefox
3. put the contents of goog-malware-shavar.* from attachment 8663898 [details] into the safebrowsing cache
4. start firefox
5. wait 10 seconds for the SB updates to finish
6. open http://tokyo-ame.jwa.or.jp/

Note that the second visit to the site in the same session will not be slow, which indicates that we are caching completion misses. Also, this has nothing to do with timeouts while talking to the Google servers.

This is what's happening:

- We load http://tokyo-ame.jwa.or.jp/ and that has several images that collide with partial hash 3D8D92FB (chunk 177585).
- For each image, we start a completion and add three noise entries.
- When all four completions for that image finish, we update the goog-malware-shavar table.
- When the completions for the next image come back, we again update the table.

So we need to figure out a way to put a limit on the frequency of database updates.
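To put rough numbers on it: the log in comment 34 shows nearly 50 back-to-back "update took ~1500ms" entries between 17:17:39 and 17:18:57, i.e. roughly 50 × 1.5 s ≈ 75 s of serial database rebuilding, which by itself accounts for the observed one-minute stall.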
>- We load http://tokyo-ame.jwa.or.jp/ and that has several images that collide with partial hash 3D8D92FB (chunk 177585).

How is this possible? Common prefix that hits the normalizations we try to match?

>- For each image, we start a completion and add three noise entries.

For the same prefix? That's a bug - there should be only one. I guess the problem is the new queries come in while the response for the original is still outstanding? IMHO we should remember which queries are outstanding and not launch new ones for the same prefix.

>So we need to figure out a way to put a limit on the frequency of database updates.

Can we short circuit the case where the updates only contain completion updates? Most of the heavy duty work is (un)compressing the PrefixSet and that could be skipped entirely.
(In reply to Gian-Carlo Pascutto [:gcp] from comment #49)
> Can we short circuit the case where the updates only contain completion
> updates? Most of the heavy duty work is (un)compressing the PrefixSet and
> that could be skipped entirely.

I looked into this and the answer is no, because completes are cached in .cache but stored in .sbstore (together with the add chunks), and .sbstore is the expensive file to process. Even just skipping the PrefixSet rebuild didn't work, probably due to the atomic updates.

I guess we have to try the first fix, i.e. don't allow multiple outstanding queries for the same prefix.
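A minimal sketch of what that first fix could look like, assuming a completer that deduplicates in-flight gethash requests by prefix (hypothetical names; the real code is the C++ nsUrlClassifierHashCompleter):

```python
class HashCompleter:
    """Sketch: lookups for a 4-byte prefix that already has a gethash
    request in flight wait for that response instead of issuing a
    duplicate request (and triggering a duplicate table update)."""

    def __init__(self, send_gethash):
        self._send = send_gethash  # async: issues one network request per prefix
        self._pending = {}         # prefix -> callbacks awaiting completion

    def complete(self, prefix, callback):
        if prefix in self._pending:
            # A query for this prefix is already outstanding:
            # queue the callback, don't launch a new request.
            self._pending[prefix].append(callback)
            return
        self._pending[prefix] = [callback]
        self._send(prefix)

    def on_gethash_response(self, prefix, full_hashes):
        # One response satisfies every lookup that was waiting on it.
        for cb in self._pending.pop(prefix, []):
            cb(full_hashes)
```

With this, a page with dozens of images colliding on the same prefix issues one completion (plus noise) instead of four per image.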
I haven't been able to reproduce the issue since around yesterday on Firefox 41.0. Something on the goog-malware-shavar server side may have changed.
I also cannot reproduce the issue today. Loading completed in about two seconds.
Keywords: jp-critical
Depends on: 1215396
Depends on: 1229981
From the dupe:
Another website to test with: crash at exit after Firefox stays too long in the process manager.
http://www.pferdekaemper.de/
Crash Signature: [@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_WaitCondVar | nsThread::ProcessNextEvent | NS_ProcessNextEvent | nsThread::Shutdown ]
Keywords: crash
- added an early exit to skip table string comparisons on a 204
- removed an unnecessary NS_WARNING on gethash errors (e.g. 503s)
- do the noise check first since it's the majority of completions

Review commit: https://reviewboard.mozilla.org/r/37705/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/37705/
Attachment #8725950 - Flags: review?(gpascutto)
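For illustration, a minimal sketch of the control flow those three changes describe (hypothetical names and types; the actual patch is against the C++ hash completer):

```python
from collections import namedtuple

# Hypothetical completion record for this sketch.
Completion = namedtuple("Completion", ["table", "prefix", "full_hash"])

def handle_gethash_response(status, completions, noise_prefixes, requested_tables):
    # Early exit on HTTP 204: the server has no completions for any
    # requested prefix, so skip the table string comparisons entirely.
    if status == 204:
        return []

    # Gethash errors (e.g. a 503 when the server throttles us) are
    # expected; return without emitting a warning.
    if status >= 400:
        return []

    results = []
    for c in completions:
        # Noise check first: the randomly-added noise prefixes account
        # for the majority of completions, so reject them before the
        # more expensive table-name comparison.
        if c.prefix in noise_prefixes:
            continue
        if c.table in requested_tables:
            results.append(c)
    return results
```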
Comment on attachment 8725949 [details]
MozReview Request: Bug 1164518 - Better logging of completions. r?gcp

https://reviewboard.mozilla.org/r/37703/#review34409
Attachment #8725949 - Flags: review?(gpascutto) → review+
Comment on attachment 8725950 [details]
MozReview Request: Bug 1164518 - Minor optimizations for Safe Browsing completions. r?gcp

https://reviewboard.mozilla.org/r/37705/#review34411
Attachment #8725950 - Flags: review?(gpascutto) → review+
Comment on attachment 8725951 [details]
MozReview Request: Bug 1164518 - Avoid unnecessary DB updates when caching Safe Browsing results. r?gcp

https://reviewboard.mozilla.org/r/37707/#review34413

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1473
(Diff revision 1)
> +  mWorker->ClearLastResults();

This needs a lock or proxy to the worker thread. Is it required to pass some tests? It's not clear why because there is supposed to be no functional impact from this change.
Attachment #8725951 - Flags: review?(gpascutto)
Comment on attachment 8725951 [details]
MozReview Request: Bug 1164518 - Avoid unnecessary DB updates when caching Safe Browsing results. r?gcp

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/37707/diff/1-2/
Attachment #8725951 - Flags: review?(gpascutto)
https://reviewboard.mozilla.org/r/37707/#review34413

> This needs a lock or proxy to the worker thread. Is it required to pass some tests? It's not clear why because there is supposed to be no functional impact from this change.

Yeah, it's for [a test](https://dxr.mozilla.org/mozilla-central/rev/2b5237c178ea02133a777396c24dd2b713f2b8ee/toolkit/components/url-classifier/tests/unit/test_partial.js#561-599) that does something weird with multiple completers.

I added a proxy to the worker thread.
Attachment #8725951 - Flags: review?(gpascutto) → review+
Comment on attachment 8725951 [details]
MozReview Request: Bug 1164518 - Avoid unnecessary DB updates when caching Safe Browsing results. r?gcp

https://reviewboard.mozilla.org/r/37707/#review34677

::: toolkit/components/url-classifier/Entries.h:185
(Diff revision 2)
> +  bool operator!=(const AddComplete& aOther) const

nit: inconsistent bracing

::: toolkit/components/url-classifier/LookupCache.h:72
(Diff revision 2)
> +  {

nit: inconsistent bracing
Attachment #8726784 - Attachment is obsolete: true
Attachment #8726784 - Flags: review?(gpascutto)
Comment on attachment 8725951 [details]
MozReview Request: Bug 1164518 - Avoid unnecessary DB updates when caching Safe Browsing results. r?gcp

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/37707/diff/2-3/
https://hg.mozilla.org/mozilla-central/rev/3f15dd1859d8
https://hg.mozilla.org/mozilla-central/rev/105eec7abaf0
https://hg.mozilla.org/mozilla-central/rev/7f5b604e1b5a
Status: ASSIGNED → RESOLVED
Closed: 9 years ago → 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
Depends on: 1254310
Depends on: 1254323
QA Contact: mwobensmith
Blocks: 1248837
I tried to reproduce the original problem in some builds from January and February, pre-checkin, but wasn't able to see it. I followed the steps in comment 48, and also tried the sample URL in comment 54. 

Obviously web sites change, so that could very well be the issue here. But currently, I won't be able to verify this fix. If someone feels strongly that I should, please provide another example.
