Closed Bug 525284 Opened 10 years ago Closed 5 years ago

When cancelling a HTTP transaction, its half-open sockets should be abandoned

Categories

(Core :: Networking: HTTP, defect)


Tracking


RESOLVED FIXED
mozilla34

People

(Reporter: Natch, Assigned: ttaubert)

References

Details

Attachments

(2 files)

When you run mochitests like this:

TEST_PATH=browser/components/safebrowsing make -C $OBJDIR mochitest-browser-chrome

browser_bug400731.js will fail, since the actual "fake" attack page will load. My thinking is that the DB doesn't have a chance to update. If run as part of the whole browser/components directory it will pass. Is there no better way to do this? I.e., write to the phishing DB, create your own local page, and navigate to that...
This still happens; the actual attack site ("it's an attack") loads.
Hey, it has been a few more years and I am running into this as well while running the tests as a standalone directory. In fact, I am looking to do that for all browser-chrome tests in bug 992911.

This seems to be limited to Windows debug (though I recall seeing it on OS X as well):
https://tbpl.mozilla.org/?tree=Try&rev=07d9eef3e4f0

Here is a link to a Win7 log:
https://tbpl.mozilla.org/php/getParsedLog.php?id=38287837&tree=Try

We are leaking a docShell until shutdown.

:johnath, you were the original author of this test; any thoughts on why it fails when run as only the safebrowsing subdirectory (effectively a single test)?
Blocks: run-by-dir
Flags: needinfo?(johnath)
I don't know the current state of the safebrowsing code, but I suspect the explanation goes:

* this test checks to ensure we get our warning interstitial pages when navigating to our beacon sites - reference sites guaranteed to be in the safebrowsing databases
* BUT tests run in a fresh profile and we don't ship a pre-populated safebrowsing DB since it would instantly be out of date
* So this test is, it seems, a race. During a normal run we have time to start fetching safebrowsing DBs and the test passes, but when running them in the directory alone, I'm guessing we (often) don't have time to perform the fetch, and so the beacon sites aren't yet blocked.

The most direct solution would be for the test to programmatically add the entries for the beacon sites to the DB before trying to navigate to them. That would let this test pass even when run in isolation. It would have the effect, though, of no longer exercising the background fetching from safebrowsing, so someone closer to the code would have to ensure that was tested elsewhere. Another option would be to block on some indication that the first background fetch had succeeded, but that might get into timeouts and generally be a real nuisance for people running small-subset testing.
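The second option above (blocking on some indication that the first fetch succeeded, with a timeout) could be sketched roughly like this. Everything here is hypothetical: `waitFor` is a generic poll-with-timeout helper, not a real Gecko or test-harness API, and the `isDbReady` check stands in for however the test would actually query the safebrowsing DB.

```javascript
// Hypothetical sketch: poll a condition (e.g. "safebrowsing DB populated")
// until it holds or a timeout elapses, instead of navigating immediately.
function waitFor(condition, { intervalMs = 100, timeoutMs = 5000 } = {}) {
  return new Promise((resolve, reject) => {
    const started = Date.now();
    (function poll() {
      if (condition()) return resolve();
      if (Date.now() - started >= timeoutMs)
        return reject(new Error("timed out waiting for condition"));
      setTimeout(poll, intervalMs);
    })();
  });
}
```

A test would then do something like `await waitFor(isDbReady)` before navigating to the beacon site, failing cleanly on timeout rather than loading the attack page.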

Adding gcp, with my apologies for writing something racy in his module.
Flags: needinfo?(johnath)
Thanks johnath for the explanation here.  

This got me thinking a bit more. We do the best we can to limit outside network traffic; most of this is done with prefs, and we do some safebrowsing pref setting in automation (http://dxr.mozilla.org/mozilla-central/source/testing/profiles/prefs_general.js):
user_pref("browser.safebrowsing.gethashURL", "http://%(server)s/safebrowsing-dummy/gethash");
user_pref("browser.safebrowsing.updateURL", "http://%(server)s/safebrowsing-dummy/update");

%(server)s is replaced with mochi.test:8888, and I don't believe that resolves to anything (possibly a 404). Maybe gcp could help us ensure we have the right preloading, prefs, fake server, etc. set up for testing. Could it be that setting these prefs still puts something in the DB and we just need to wait for it to happen? In that case it would make sense to wait for an event and then perform the actions.
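For illustration, the %(server)s expansion described above can be sketched as a simple string substitution. The function name is made up; the real harness performs this replacement when writing out the prefs file, and "mochi.test:8888" is the value mentioned above.

```javascript
// Hypothetical sketch of the %(server)s placeholder expansion done when
// prefs_general.js is materialized for an automation run.
function expandServerPlaceholder(prefValue, server = "mochi.test:8888") {
  return prefValue.replace(/%\(server\)s/g, server);
}
```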
SafeBrowsing does not, in fact, perform any (outside) network access during automated testing. The database is updated locally with test URLs without involving Google's server. The fact that we don't test against Google's server has led to some very severe stop-the-press bugs, which we're only now starting to address with bug 967568.

The entries are added locally on startup here:
http://mxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/SafeBrowsing.jsm#68

The problem is that updating that DB requires disk IO. Disk IO on startup is...bad: https://bugzilla.mozilla.org/show_bug.cgi?id=778855

As a result, the SafeBrowsing module currently forces a delay of 2 seconds after its own initialization before it inserts the test URLs into the DB: http://mxr.mozilla.org/mozilla-central/source/browser/base/content/browser.js#1022

I'm going to guess that running the tests without accounting for this delay is causing the problem you're seeing.
gcp: I wonder if we could have tests for this in mozmill to account for live network access?

Is there an event I could listen for (i.e., mozaftersafebrowsingupdated)?
>gcp: I wonder if we could have tests for this in mozmill to account for live network access?

That's what the linked bug 967568 is about.

>is there an event I could listen on (i.e. mozaftersafebrowsingupdated)?

No.
:gcp, I have verified on the try server that adding a 5-second delay before starting the test works. I know this is ugly; maybe you have some other advice on how to do this smarter? Could I query a database for values and fail if none show up within 5 seconds? Wait for other events?
Attachment #8411660 - Flags: feedback?(gpascutto)
Comment on attachment 8411660 [details] [diff] [review]
ugly hack to make this work via settimeout

Review of attachment 8411660 [details] [diff] [review]:
-----------------------------------------------------------------

>could I query a database for values and if none show up within 5 seconds fail?  wait for other events?

Querying a database would work, but then you're testing something else entirely, which seems pointless. I think we could add an event for the update; feel free to file a bug for that.

That being said, the xpcshell tests use delays of 3 seconds in various places, probably for the same reasons. So I wouldn't worry too much about it; maybe reduce the delay to 3 if that seems to work. I would guess that as long as it's >2s you'd be fine.
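A minimal sketch of the delay workaround under discussion, using the 3-second value suggested here; `delayedStart` and `runTest` are hypothetical names, not part of the mochitest harness:

```javascript
// Hypothetical sketch: delay test startup until after SafeBrowsing's
// 2-second post-init insertion of the test URLs has had time to run.
const SAFEBROWSING_INIT_DELAY_MS = 3000; // >2s, per the review comment

function delayedStart(runTest, delayMs = SAFEBROWSING_INIT_DELAY_MS) {
  return new Promise(resolve =>
    setTimeout(() => resolve(runTest()), delayMs)
  );
}
```

This is exactly the kind of timing-based hack the thread calls ugly: it trades a race for a fixed delay, which is why an explicit update event would be preferable.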
Attachment #8411660 - Flags: feedback?(gpascutto) → feedback+
Hmm: while this appears to fix the problem, it doesn't in all cases. On Windows XP Debug, we still leak a docShell until shutdown with the 5-second delay:
https://tbpl.mozilla.org/php/getParsedLog.php?id=38479634&tree=Try

Open to any thoughts; maybe there is something I could clean up.
Depends on: 1007687
Product: Firefox → Toolkit
This test is still failing for the OS X and Windows debug bc1 jobs, as can be seen in a recent try push: https://tbpl.mozilla.org/?tree=Try&rev=4d3f23cc8c3c

The goal is to turn these on as --run-by-dir in 2 weeks. We don't have too many tests that are problematic (19 total); if we can fix them now, that will help! :)

Gian-Carlo, can you figure out why this test might be leaking when running tests only in the directory browser/components/safebrowsing/content/test?
Flags: needinfo?(gpascutto)
I don't know offhand what's up aside from what I already said in the comments above.
Flags: needinfo?(gpascutto)
I have not had success with the notifications from bug 1007687.  My initial tests had the test disabled in another patch.

Testing this in more detail, I have found that we always have SafeBrowsing initialized (we never wait for observers).

So given that fact we still have failures:
https://tbpl.mozilla.org/?tree=Try&rev=9e0dc5d6515a (win7 and win8)

I thought, why not wait 3 seconds before starting the test (as mentioned above):
https://tbpl.mozilla.org/?tree=Try&rev=78c857cae3db

Doing that yields WinXP permanent failures.

I guess the question is: what can we do to figure out what is missing? Can we do something like this:
* add observers
* for a db update
* wait for observer to be notified
* run test

That would ensure the database is updated, but I am not sure that is what is causing this failure.
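The observer-based flow in the steps above could look roughly like this. This is a generic model, not Gecko code (the real implementation would go through Services.obs / nsIObserverService), and the topic name "safebrowsing-update-finished" is hypothetical:

```javascript
// Hypothetical minimal observer registry modeling the proposed flow:
// add an observer for a DB update, wait for the notification, run the test.
const observers = new Map();

function addObserver(topic, fn) {
  if (!observers.has(topic)) observers.set(topic, []);
  observers.get(topic).push(fn);
}

function notify(topic, data) {
  (observers.get(topic) || []).forEach(fn => fn(data));
}

// Resolves once the topic fires, so a test can `await` the update.
function waitForNotification(topic) {
  return new Promise(resolve => addObserver(topic, resolve));
}
```

A test would `await waitForNotification("safebrowsing-update-finished")` before navigating, instead of sleeping for a fixed number of seconds.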

:gcp, any further thoughts here?
Flags: needinfo?(jmaher)
Heh, I am awesome at this Bugzilla tool; I ni?'d myself!
Flags: needinfo?(jmaher) → needinfo?(gpascutto)
Unless I'm misunderstanding, those logs show the SafeBrowsing tests themselves are passing, so I'm not sure your docShell issue is because the updates don't work.
Flags: needinfo?(gpascutto)
The tests pass, but the problem is that we leak a docShell until shutdown (see comment 2). I really don't know what to do here, but I am willing to keep hacking if given some direction.
Flags: needinfo?(gpascutto)
I'm not sure why you're asking me for direction :-) SafeBrowsing works. The problem is elsewhere.

I started looking at this (with the big disclaimer I know absolutely nothing about it) and as far as I can tell comment 2 and beyond are probably totally off in the wrong direction. The head.js file for these tests forces a SafeBrowsing.init(), which means the case from comment 0 and comment 1 can't exist because the test itself forces SafeBrowsing to initialize before starting. Looking at the hg blame for that file, ehsan fixed bug 800394 to do exactly this (and at that point this bug should've been closed as well). That's also what you observed in comment 13.

That still leaves the question why the test occasionally leaks a docShell on Windows if you run it in the specific way you indicate. I've looked at the test source, but I don't see any *obvious* ways in which it fails to close a tab or remove a listener.

Is this reproducible locally for you? Maybe you can physically see what's wrong?
Flags: needinfo?(gpascutto)
I have a loaner Win7 machine and I haven't been able to reproduce this on it at all (running it manually, via mozharness, etc.). The problem is that this reproduces so easily on the try server.

The issue is that we leak a docShell window 100% of the time while running per directory.
gcp, I have tried hacking around in the test case; honestly, I am not sure what to do next here. It sounds like we are back to square one with this test case.

should we adjust our listeners for the other two tests like we do here:
http://hg.mozilla.org/mozilla-central/rev/dd2ae94d1e01
Flags: needinfo?(gpascutto)
(In reply to Joel Maher (:jmaher) from comment #19)
> should we adjust our listeners for the other two tests like we do here:
> http://hg.mozilla.org/mozilla-central/rev/dd2ae94d1e01

No, that's specific to dealing with the about:blank page that may or may not have finished loading before the test gets to set the test page (see the comment in the JS file). I don't see any way that the other tests could require it.
Flags: needinfo?(gpascutto)
Thanks for all your replies, gcp! With no progress on actually fixing this bug and nothing left to try, I am tempted to disable it for OS X and Windows debug. This is way out of my knowledge domain; please let me know if there is something else to try, or your thoughts on disabling this test.
Maybe Tim can help? :)
Flags: needinfo?(ttaubert)
When an HTTP request is canceled by the channel classifier because we consider the URL malware, we also cancel the HTTP transaction but do not remove any half-open sockets belonging to that transaction. This seems to be what causes the permanent docShell leaks we're seeing here; abandoning those sockets fixes them locally for me.

I'm not sure whether it's fine to always abandon half-open sockets when a transaction is canceled, or whether we should only do that for blocked requests.
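As an illustration of the fix (the actual patch is C++ in nsHttpConnectionMgr.cpp), here is a hypothetical JavaScript model of the cleanup: when a transaction is canceled, walk the connection entry's half-open sockets and abandon any that belong to it. The class and field names are made up for this sketch.

```javascript
// Hypothetical model of a connection entry holding half-open sockets.
// On transaction cancel, abandon (and drop) the half-opens it owns.
class ConnectionEntry {
  constructor() {
    this.halfOpens = [];
  }
  addHalfOpen(halfOpen) {
    this.halfOpens.push(halfOpen);
  }
  cancelTransaction(trans) {
    // Iterate backwards so removing entries mid-loop is safe.
    for (let i = this.halfOpens.length - 1; i >= 0; i--) {
      if (this.halfOpens[i].transaction === trans) {
        this.halfOpens[i].abandoned = true;
        this.halfOpens.splice(i, 1);
      }
    }
  }
}
```

Without this cleanup, a canceled transaction's half-open sockets linger, keeping references alive, which matches the "leaked docShell until shutdown" symptom described above.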
Attachment #8481765 - Flags: review?(mcmanus)
Attachment #8481765 - Flags: review?(hurley)
Flags: needinfo?(ttaubert)
Summary: browser_bug400731.js is fragile, not always passing. → Permanent "browser_bug400731.js | leaked 1 docShell(s) until shutdown" when running tests only in browser/components/safebrowsing/content/test/
Comment on attachment 8481765 [details] [diff] [review]
0001-Bug-525284-When-cancelling-a-HTTP-transaction-abando.patch

Review of attachment 8481765 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks Tim,

This would probably eventually time out rather than leak, but your patch is the intended behavior.

::: netwerk/protocol/http/nsHttpConnectionMgr.cpp
@@ +2299,5 @@
>              }
>          }
> +
> +        // Abandon all half-open sockets belonging to the given transaction.
> +        for (uint32_t index = 0;

Can you just move this up inside the if (ent) block above and make it the else {} clause of the existing if (index >= 0) bit? They should be mutually exclusive.
Attachment #8481765 - Flags: review?(mcmanus) → review+
(In reply to AFK Aug 22- Sep 01 Patrick McManus [:mcmanus] from comment #24)
> This would probably actually eventually time out and not leak, but your
> patch is the intended behavior.

That probably explains why we're only seeing these leaks in the run-by-dir case, where the browser instance is much shorter-lived than it is now.

Thanks for the investigative work! Run-by-dir is almost ready to land :)
(In reply to AFK Aug 22- Sep 01 Patrick McManus [:mcmanus] from comment #24)
> can you just move this up inside the if (ent) block above and make it the
> else{} clause to the existing if (index >= 0) bit.. they should be mutually
> exclusive.

They don't seem to be mutually exclusive; I still hit the leak when making it the else clause. I will move it inside |if (ent)| and make it iterate regardless of the index.

Thanks for the fast review!
Attachment #8481765 - Flags: review?(hurley)
Assignee: nobody → ttaubert
Status: NEW → ASSIGNED
Component: Phishing Protection → Networking: HTTP
Product: Toolkit → Core
Summary: Permanent "browser_bug400731.js | leaked 1 docShell(s) until shutdown" when running tests only in browser/components/safebrowsing/content/test/ → When cancelling a HTTP transaction, its half-open sockets should be abandoned
https://hg.mozilla.org/mozilla-central/rev/30372695cda3
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
Depends on: 1104993