Closed Bug 1522412 Opened 2 years ago Closed 1 year ago

Use a heuristic rule to determine if a channel should be classified

Categories

(Toolkit :: Safe Browsing, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
mozilla68
Tracking Status
firefox68 --- fixed

People

(Reporter: dimi, Assigned: dimi)

References

(Blocks 3 open bugs)

Details

Attachments

(7 files)

We plan to use a heuristic way to determine if a channel should be classified instead of specifying manually by the caller.

Link to the design document:
https://docs.google.com/document/d/1byFAOCVhg5669AwbU5i0cfs_MrzTojXTiK2EMn_-Kno/edit

Summary: Use a heuristic bypassing rule to determine if a channel should be classified → Use a heuristic rule to determine if a channel should be classified
Depends on: 1530125
Blocks: 1532691

In this patch, we move from a model where URL classification is opt-in
(by specifying LOAD_CLASSIFIER_URI) to a model where it is enforced by
default unless a channel opts out or is deemed to be a critical channel
based on simple heuristics.

The heuristics exempt a channel from classification if it is triggered
by system with an exception when it is a top level load.

To ensure critical channels are never classified, we also exempt
channels which are flagged as "BeConservative" (implemented in bug 1321783).
Another load flag LOAD_BYPASS_URL_CLASSIFIER is also introduced
for the same reason.

nsIChannel.LOAD_CLASSIFY_URI is no longer required so we can remove it from
the codebase.
In the mean time, we add a new LOAD_BYPASS_URL_CLASSIFIER load flag for
channel creator to be able to force channel to bypass URL classifier check.
The use of the new LOAD_BYPASS_URL_CLASSIFIER flag will be addressed in
the other patches.

This patch uses the flag to exempt channels from classification, but it
doesn't include the use cases of this flag.

See Bug 1442496 for the list of the call sites should use this flag.

We want to ensure SafeBrowsing update request is never classified so we
can recover from a bad SafeBrowsing database.

SafeBrowsing lookup request is also critical because if it is blocked,
it means SafeBrowsing is out of function.

Some testcases are chrome scope testcase hence the resources it loads are triggered by
"system principal". This will exempt the resource from classification.

So in this patch, we used a new window to for the tracker test frame.
The window creation code is referenced from test_privatebrowsing_trackingprotection.html.

This patch adds a xpcsehll-test which tests all the combinations of
those parameters used by the heuristic classification algorithm.

The goal of this testcase is to provide an easier way to add callsites
to test if it is correctly classified. This is a first step, more callsites should be added to
the testcase(See Bug 1532691)

Flow of the test case:

  1. setup the server(trackerFrame.sjs) with the expceted number of request it should receive
  2. load the test frame(trackerFrame.html) with cookie restriction off, to ensure all the tracking requests contain cookies
  3. server responses a list of tracker's request with cookie after reciving all the requests
  4. the list should contain all the trackers in the test frame.
  5. enable cookie restriction and load the test frame again.
  6. server responses a list of tracker's request without cookie after reciving all the requests
  7. the list should contain all the trackers in the test frame.
Blocks: 1442496
Priority: P2 → P1

This is just a note about how I verified the new heuristic works, to ensure there is no missing exception we don't consider.

I audited all the call sites where we create a channel and added "function name" + "file line number" to the LoadInfo.
With this information, together with "IsTriggeringPrinciaplSystemPrincipal", "IsLoadingPrinciaplSystemPrincipal", "IsContentPolicyTypeTypDocument" and "IsOldLoadClassifyURIFlagSet" as a signature.
In the places where we want to know if a channel should be classified, record the "channel's url" and the "heuristic result" for each signature.

Then we get a bunch of data likes these while wandering on the web
{.triggerIsSystem=false, .loadingIsSystem=false, .oldLoadFlags=set, .contentPolicy=other, .func="nsresult mozilla::dom::FetchDriver::HttpFetch:514", .url="https://syndication.twitter.com/settings", .heurisitcResult=classify},
{.triggerIsSystem=true, .loadingIsSystem=true, .oldLoadFlags=set, .contentPolicy=document, .func="nsresult mozilla::dom::FetchDriver::HttpFetch:521", .url=""https://googlechrome.github.io/samples/service-worker/basic/index.html"", .heurisitcResult=true},

Then I manualy check is there any suspicous result from the data I collected. By doing this, around 100 call sites are verified.
But there are still call sites I fail to collect data this way(can't trigger the code path), so I examined the code of those (around 30 call sites) to see if they are good with the new approach.

There are still a few call sites I haven't checked yet(pdf.js, webpayment ...etc). But I think the result I have collected already prove that this is a robust approach to use.

Report:
https://docs.google.com/spreadsheets/d/1-sOQAgJqUaaeBww4kdlwkYuj-SRAsPN1C3rnQzK6u8U/edit?usp=sharing

Pushed by dlee@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b80098d0a5c4
P1. Replace LOAD_CLASSIFY_URI flag with a heuristic algorithm. r=Ehsan,mayhemer
https://hg.mozilla.org/integration/autoland/rev/91e5b339fe11
P2. Remove nsIChannel.LOAD_CLASSIFY_URI flag. r=Ehsan
https://hg.mozilla.org/integration/autoland/rev/5399bca472b1
P3. Adopt nsIChannel.LOAD_BYPASS_URL_CLASSIFIER in the algorithm determining if we should classify a channel's URI. r=Ehsan,mayhemer
https://hg.mozilla.org/integration/autoland/rev/fe288a48cecb
P4. Use LOAD_BYPASS_URL_CLASSIFIER flag for SafeBrowsing update/lookup r=Ehsan
https://hg.mozilla.org/integration/autoland/rev/850e1c68d978
P5. Do not use system principal in testcase to test tracking protection. r=Ehsan
https://hg.mozilla.org/integration/autoland/rev/ebb581ddf1d1
P6. Add a xpcshell-test for default classification. r=Ehsan
https://hg.mozilla.org/integration/autoland/rev/7c998456e1fd
P7. A mochitest to test different channel open call sites are classified. r=Ehsan
Backout by rgurzau@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/747a5da93708
Backed out 7 changesets for xpcshell failures at test_shouldclassify.js on a CLOSED TREE.

Backed out 7 changesets (bug 1522412) for xpcshell failures at test_shouldclassify.js on a CLOSED TREE.

Backout link: https://hg.mozilla.org/integration/autoland/rev/747a5da93708d6ad12832272d794b21d825a4bb9

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&revision=7c998456e1fd5aa0271b8508a25a5becec2768e0&selectedJob=235152734

Log link: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=235152734&repo=autoland&lineNumber=10326

Log snippet:

09:35:44 INFO - TEST-START | toolkit/components/url-classifier/tests/unit/test_shouldclassify.js
09:35:45 WARNING - TEST-UNEXPECTED-FAIL | toolkit/components/url-classifier/tests/unit/test_shouldclassify.js | xpcshell return code: 0
09:35:45 INFO - TEST-INFO took 468ms
09:35:45 INFO - >>>>>>>
09:35:45 INFO - PID 13864 | Unable to load \untrusted-startup-test-dll.dll; LoadLibraryW failed: 126Couldn't convert chrome URL: chrome://branding/locale/brand.properties
09:35:45 INFO - PID 13864 | [13864, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80520012: file z:/build/build/src/netwerk/base/nsIOService.cpp, line 941
09:35:45 INFO - PID 13864 | [13864, Main Thread] WARNING: Failed to get directory to cache.: file z:/build/build/src/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp, line 81
09:35:45 INFO - PID 13864 | [13864, Main Thread] WARNING: Failed to get directory to cache.: file z:/build/build/src/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp, line 81
09:35:45 INFO - PID 13864 | [13864, Main Thread] WARNING: Failed to get directory to cache.: file z:/build/build/src/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp, line 81
09:35:45 INFO - PID 13864 | [13864, Main Thread] WARNING: Failed to get directory to cache.: file z:/build/build/src/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp, line 81
09:35:45 INFO - PID 13864 | [13864, Main Thread] WARNING: Failed to get directory to cache.: file z:/build/build/src/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp, line 81
09:35:45 INFO - PID 13864 | [13864, Main Thread] WARNING: Couldn't get the user appdata directory. Crash events may not be produced.: file z:/build/build/src/toolkit/crashreporter/nsExceptionHandler.cpp, line 2528
09:35:45 INFO - PID 13864 | [13864, Main Thread] WARNING: This method is lossy. Use GetCanonicalPath !: file z:/build/build/src/xpcom/io/nsLocalFileWin.cpp, line 3188
09:35:45 INFO - PID 13864 | [13864, Main Thread] WARNING: This method is lossy. Use GetCanonicalPath !: file z:/build/build/src/xpcom/io/nsLocalFileWin.cpp, line 3188
09:35:45 INFO - PID 13864 | [13864, Main Thread] WARNING: This method is lossy. Use GetCanonicalPath !: file z:/build/build/src/xpcom/io/nsLocalFileWin.cpp, line 3188
09:35:45 INFO - PID 13864 | [13864, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, kKnownEsrVersion) failed with result 0x80004002: file z:/build/build/src/toolkit/components/resistfingerprinting/nsRFPService.cpp, line 662
09:35:45 INFO - (xpcshell/head.js) | test MAIN run_test pending (1)
09:35:45 INFO - (xpcshell/head.js) | test run_next_test 0 pending (2)
09:35:45 INFO - (xpcshell/head.js) | test MAIN run_test finished (2)
09:35:45 INFO - running event loop

Flags: needinfo?(dlee)
Pushed by dlee@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/eb789cfccafc
P1. Replace LOAD_CLASSIFY_URI flag with a heuristic algorithm. r=Ehsan,mayhemer
https://hg.mozilla.org/integration/autoland/rev/393b5a62f630
P2. Remove nsIChannel.LOAD_CLASSIFY_URI flag. r=Ehsan
https://hg.mozilla.org/integration/autoland/rev/64664ada91ee
P3. Adopt nsIChannel.LOAD_BYPASS_URL_CLASSIFIER in the algorithm determining if we should classify a channel's URI. r=Ehsan,mayhemer
https://hg.mozilla.org/integration/autoland/rev/9f3f62ed5b24
P4. Use LOAD_BYPASS_URL_CLASSIFIER flag for SafeBrowsing update/lookup r=Ehsan
https://hg.mozilla.org/integration/autoland/rev/63bcf0242e87
P5. Do not use system principal in testcase to test tracking protection. r=Ehsan
https://hg.mozilla.org/integration/autoland/rev/0126685be740
P6. Add a xpcshell-test for default classification. r=Ehsan
https://hg.mozilla.org/integration/autoland/rev/deee64970b40
P7. A mochitest to test different channel open call sites are classified. r=Ehsan
Depends on: 1539150
Flags: needinfo?(dlee)
Blocks: 1453448
You need to log in before you can comment on or make changes to this bug.