Closed
Bug 1285848
Opened 8 years ago
Closed 8 years ago
Supports Rice-encoded table update for v4
Categories
(Toolkit :: Safe Browsing, defect, P2)
Toolkit
Safe Browsing
Tracking
()
RESOLVED
FIXED
mozilla52
Tracking | Status | |
---|---|---|
firefox52 | --- | fixed |
People
(Reporter: hchang, Assigned: hchang)
References
Details
(Whiteboard: #sbv4-m1)
Attachments
(3 files, 4 obsolete files)
In SB v4, Rice Golomb encoding is introduced to reduce the prefix list size. We need a decompression function to decode the compressed prefixes.
Assignee | ||
Updated•8 years ago
|
Assignee: nobody → hchang
Assignee | ||
Comment 1•8 years ago
|
||
Assignee | ||
Comment 2•8 years ago
|
||
Attachment #8770498 -
Attachment is obsolete: true
Assignee | ||
Updated•8 years ago
|
Whiteboard: #sbv4-m1
Updated•8 years ago
|
Summary: [SafeBrowsing] Supports encoded table update for v4 → Supports encoded table update for v4
Updated•8 years ago
|
Blocks: safebrowsingv4
Assignee | ||
Comment 3•8 years ago
|
||
Attachment #8770500 -
Attachment is obsolete: true
Assignee | ||
Comment 4•8 years ago
|
||
Review commit: https://reviewboard.mozilla.org/r/68276/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/68276/
Assignee | ||
Comment 5•8 years ago
|
||
Comment on attachment 8776503 [details] Bug 1285848 - Part 1: Implement Rice Delta Decoding. . Review request updated; see interdiff: https://reviewboard.mozilla.org/r/68276/diff/1-2/
Assignee | ||
Comment 6•8 years ago
|
||
Assignee | ||
Updated•8 years ago
|
Attachment #8776499 -
Attachment is obsolete: true
Assignee | ||
Comment 7•8 years ago
|
||
RICE decoding related codes have landed Chromium a week ago [1]. Reading the code and see we might be able to reuse ... (maybe just the test case) [1] https://chromium.googlesource.com/chromium/src/+/9e71d408737100714f022dffe20dfb346d395bf9
Updated•8 years ago
|
Summary: Supports encoded table update for v4 → Supports Rice-encoded table update for v4
Updated•8 years ago
|
Priority: -- → P2
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Updated•8 years ago
|
Attachment #8777288 -
Attachment is obsolete: true
Comment 10•8 years ago
|
||
mozreview-review |
Comment on attachment 8776503 [details] Bug 1285848 - Part 1: Implement Rice Delta Decoding. . https://reviewboard.mozilla.org/r/68276/#review71496 Clearing review while Henry investigates the Chromium code.
Attachment #8776503 -
Flags: review?(francois)
Comment 11•8 years ago
|
||
mozreview-review |
Comment on attachment 8783460 [details] Bug 1285848 - Part 2: Request and parse RICE encoded prefix by default. https://reviewboard.mozilla.org/r/73252/#review71498 Clearing review while Henry investigates the Chromium code.
Attachment #8783460 -
Flags: review?(francois)
Assignee | ||
Comment 12•8 years ago
|
||
Assignee | ||
Updated•8 years ago
|
Attachment #8776503 -
Flags: review?(francois)
Assignee | ||
Updated•8 years ago
|
Attachment #8776503 -
Flags: review?(francois)
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 17•8 years ago
|
||
mozreview-review |
Comment on attachment 8783460 [details] Bug 1285848 - Part 2: Request and parse RICE encoded prefix by default. https://reviewboard.mozilla.org/r/73252/#review80424 A few things to address. The only thing that's wrong here is the bug that Henry already knew about (`1` v. `4`). ::: toolkit/components/url-classifier/HashStore.cpp:171 (Diff revision 3) > TableUpdateV4::NewPrefixes(int32_t aSize, std::string& aPrefixes) > { > - NS_ENSURE_TRUE_VOID(aPrefixes.size() % aSize == 0); > - NS_ENSURE_TRUE_VOID(!mPrefixesMap.Get(aSize)); > + MOZ_ASSERT(aPrefixes.size() % aSize == 0); > + MOZ_ASSERT(!mPrefixesMap.Get(aSize)); > + > +#if 1 // DEBUG This should be `if (LOG_ENABLED()) { ... }`. Henry says he wants to also dump the last ten elements. ::: toolkit/components/url-classifier/ProtocolParser.cpp:981 (Diff revision 3) > > return NS_OK; > } > > +static void > +ReverseByte(uint8_t& b) If you can find an existing function in gecko, that would be great because we could rely on it and not have to write a test for this one. ::: toolkit/components/url-classifier/ProtocolParser.cpp:1048 (Diff revision 3) > + > + nsTArray<uint32_t> decoded; > + nsresult rv = DoRiceDeltaDecode(aAddition.rice_hashes(), decoded); > + NS_ENSURE_SUCCESS(rv, rv); > + > + // Say we have raw prefixes "Say we have the following raw prefixes" ::: toolkit/components/url-classifier/ProtocolParser.cpp:1060 (Diff revision 3) > + // which can be treated as uint32 (big-endian) sorted in increasing order: > + // > + // [1, 512, 196608, 67108864] > + // > + // According to https://developers.google.com/safe-browsing/v4/compression, > + // they should be done the following operations prior to compression: "the following should be done prior to compression:" ::: toolkit/components/url-classifier/ProtocolParser.cpp:1063 (Diff revision 3) > + // > + // According to https://developers.google.com/safe-browsing/v4/compression, > + // they should be done the following operations prior to compression: > + // > + // 1) re-interpret in little-endian ==> [16777216, 131072, 768, 4] > + // 2) sort in increaing order ==> [4, 768, 131072, 16777216] typo: "increasing" ::: toolkit/components/url-classifier/ProtocolParser.cpp:1072 (Diff revision 3) > + // > + // 1) sort in big-endian order ==> [16777216, 131072, 768, 4] > + // 2) copy each uint32 in little-endian to the result string > + // > + > + // The 4-byte prefixes have to be re-sorted in the lexical order. "in the lexical order" => "in the lexical (Big Endian) order" ::: toolkit/components/url-classifier/ProtocolParser.cpp:1092 (Diff revision 3) > + > + // The encoded prefixes are always 4 bytes. > + std::string prefixes; > + for (size_t i = 0; i < decoded.Length(); i++) { > + char p[4]; > + NativeEndian::copyAndSwapToLittleEndian(p, &decoded[i], 4); Note from Henry: that last argument should be a `1`, not a `4` because it's the number of elements, not the size in bytes. I will also suggest a small comment at the end of that line to say that it's number of elements and not bytes, because everywhere else is in bytes. ::: toolkit/components/url-classifier/nsUrlClassifierUtils.cpp:110 (Diff revision 3) > { > aListUpdateRequest->set_threat_type(aThreatType); > aListUpdateRequest->set_platform_type(GetPlatformType()); > aListUpdateRequest->set_threat_entry_type(URL); > > // Only RAW data is supported for now. We can remove this comment now :)
Attachment #8783460 -
Flags: review?(francois) → review-
Comment 18•8 years ago
|
||
mozreview-review |
Comment on attachment 8776503 [details] Bug 1285848 - Part 1: Implement Rice Delta Decoding. . https://reviewboard.mozilla.org/r/68276/#review80460 A few general comments, as we discussed in person: - Since we're going to fork this code from WebRTC, can we modify the guts of the decoder to simplify the rest of our code (e.g. by removing some bit reversals)? - We should only support test cases in the Safe Browsing format to keep the code simpler. - The decoder returning true/false is fine by me (as opposed to detailed error codes like in Chromium), but it would be good to `LOG()` all errors explicitly to help with our future debugging. ::: toolkit/components/url-classifier/RiceDeltaDecoder.cpp:10 (Diff revision 5) > +#include "RiceDeltaDecoder.h" > + > +namespace { > + > +//////////////////////////////////////////////////////////////////////// > +// BitBuffer is copied and modified from webrtc/base/bitbuffer.h Since you're going to modify this, we should say something like "BitBuffer is based on the equivalent class in webrtc/base/bitbuffer.h" And then include the copyright header from that file: ``` /* * Copyright 2015 The WebRTC Project Authors. All rights reserved. * * Use of this source code is governed by a BSD-style license * that can be found in the LICENSE file in the root of the source * tree (media/webrtc/<whatever>). An additional intellectual property rights grant can be found * in the file PATENTS. All contributing project authors may * be found in the AUTHORS file in the root of the source tree. */ ``` ::: toolkit/components/url-classifier/RiceDeltaDecoder.cpp:93 (Diff revision 5) > + uint32_t aFirstValue, > + uint32_t aNumEntries, > + uint32_t* aDecodedData, > + bool aIsRemainderBigEndian) > +{ > + BitBuffer bitBuffer(mEncodedData, mEncodedDataSize); A comment defining the terminology (which seems to be standard) would be useful: ``` // N = the delta // q = quotient // r = remainder // k = RICE parameter ``` ::: toolkit/components/url-classifier/RiceDeltaDecoder.cpp:96 (Diff revision 5) > + bool aIsRemainderBigEndian) > +{ > + BitBuffer bitBuffer(mEncodedData, mEncodedDataSize); > + > + const uint32_t k = aRiceParameter; > + uint32_t last = aFirstValue; I would suggest naming this `previous` instead of `last` to remove the ambiguity. ::: toolkit/components/url-classifier/RiceDeltaDecoder.cpp:102 (Diff revision 5) > + for (uint32_t i = 0; i < aNumEntries; i++) { > + // Read the quotient of N. > + uint32_t q; > + NS_ENSURE_TRUE(bitBuffer.ReadExponentialGolomb(&q), false); > + > + // Read the remainder of N. `// Read the remainder of N, one bit at a time.` ::: toolkit/components/url-classifier/RiceDeltaDecoder.cpp:107 (Diff revision 5) > + // Read the remainder of N. > + uint32_t r = 0; > + for (uint32_t j = 0; j < k; j++) { > + uint32_t b = 0; > + if (!bitBuffer.ReadBits(&b, 1)) { > + // Insufficient bits. Just remain zero. "Just leave them as zeros." ::: toolkit/components/url-classifier/RiceDeltaDecoder.cpp:110 (Diff revision 5) > + uint32_t b = 0; > + if (!bitBuffer.ReadBits(&b, 1)) { > + // Insufficient bits. Just remain zero. > + break; > + } > + if (aIsRemainderBigEndian) { `// Add the bit to the right position so that it's in Little Endian order.` ::: toolkit/components/url-classifier/RiceDeltaDecoder.cpp:133 (Diff revision 5) > +namespace { > +////////////////////////////////////////////////////////////////////////// > +// The BitBuffer impl is copied and modified from webrtc/base/bitbuffer.cc > +// > + > +#define RTC_DCHECK_LE(v1, v2) Since we're going to modify the code, we can now remove these empty defines and change the corresponding uses of them in the code to use the standard Mozilla assertions. ::: toolkit/components/url-classifier/RiceDeltaDecoder.cpp:153 (Diff revision 5) > + return (byte & mask) >> shift; > +} > + > +BitBuffer::BitBuffer(const uint8_t* bytes, size_t byte_count) > + : bytes_(bytes), byte_count_(byte_count), byte_offset_(), bit_offset_() { > + RTC_DCHECK(static_cast<uint64_t>(byte_count_) <= I think we could simply replace `RTC_DCHECK` with `MOZ_ASSERT` here, no? ::: toolkit/components/url-classifier/tests/gtest/TestRiceDeltaDecoder.cpp:1 (Diff revision 5) > +#include "gtest/gtest.h" Include standard copyright header for tests (I think it's CC0). ::: toolkit/components/url-classifier/tests/gtest/TestRiceDeltaDecoder.cpp:20 (Diff revision 5) > +static bool runOneTest(TestingData& aData, bool aSBV4Order); > + > +TEST(RiceDeltaDecoder, NetworkOrder) > +{ > + static const std::vector<TestingData> TESTING_DATA = { > + // The first batch of testing data, which has the same delta values. "same delta values (1)." ::: toolkit/components/url-classifier/tests/gtest/TestRiceDeltaDecoder.cpp:52 (Diff revision 5) > + {1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ,11, 12, 13, 14, 15, 16, 17, 18, 19, 20}, > + {0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01}, > + 7 > + }, > + > + // The second batch of testing data, which has random delta values. "random" => "different" ::: toolkit/components/url-classifier/tests/gtest/TestRiceDeltaDecoder.cpp:102 (Diff revision 5) > +// in |runOneTest| for more detail. > +TEST(RiceDeltaDecoder, SBV4Order) { > + > + // The following structure and testing data is copied from Chromium source code: > + // > + // http://goo.gl/VgFIAc I don't trust the longevity of URL shorteners. Let's just use https://chromium.googlesource.com/chromium/src.git/+/950f9975599768b6a08c7146cb4befa161be87aa/components/safe_browsing_db/v4_rice_unittest.cc#75 even though it's ugly :) ::: toolkit/components/url-classifier/tests/gtest/TestRiceDeltaDecoder.cpp:104 (Diff revision 5) > + > + // The following structure and testing data is copied from Chromium source code: > + // > + // http://goo.gl/VgFIAc > + // > + // and will be tranlated to our own testing format. typo: translated ::: toolkit/components/url-classifier/tests/gtest/TestRiceDeltaDecoder.cpp:120 (Diff revision 5) > + , mDeltas(aDeltas) > + , mEncoded(aEncoded) > + { > + } > + }; > + We can add `// ----- Start of Chromium test code ----` here along with the Chromium copyright header: ``` // Copyright 2016 The Chromium Authors. All rights reserved. // Use of this source code is governed by a BSD-style license that can be // found in the <wherever that file is in gecko>/LICENSE file. ``` ::: toolkit/components/url-classifier/tests/gtest/TestRiceDeltaDecoder.cpp:187 (Diff revision 5) > + 287633354, 801308671, 424169435, 372520475, 277287849}, > + "\xb2\x2c\x26\x3a\xcd\x66\x9c\xdb\x5f\x07\x2e\x6f\xe6\xf9\x21\x10\x52" > + "\xd5\x94\xf4\x82\x22\x48\xf9\x9d\x24\xf6\xff\x2f\xfc\x6d\x3f\x21\x65" > + "\x1b\x36\x34\x56\xea\xc4\x21\x00"), > + }; > + We can add `// ----- End of Chromium test code ----` here. ::: toolkit/components/url-classifier/tests/gtest/TestRiceDeltaDecoder.cpp:204 (Diff revision 5) > + // set to an arbitrary value, say 7, to avoid any assumption to the > + // first value in the implementation. > + d.mExpectedDecoded.resize(tdc.mDeltas.size() + 1); > + for (size_t i = 0; i < d.mExpectedDecoded.size(); i++) { > + if (0 == i) { > + d.mExpectedDecoded[i] = 7; Add `// "7" is an arbitrary starting value` to the end of that line.
Attachment #8776503 -
Flags: review?(francois) → review-
Assignee | ||
Comment 19•8 years ago
|
||
mozreview-review-reply |
Comment on attachment 8776503 [details] Bug 1285848 - Part 1: Implement Rice Delta Decoding. . https://reviewboard.mozilla.org/r/68276/#review80460 As for "the guts of the decoder", I'd like to confirm with you that it doesn't imply the bitbuffer but the RiceDeltaDecoder.
Updated•8 years ago
|
No longer blocks: safebrowsingv4
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 22•8 years ago
|
||
mozreview-review |
Comment on attachment 8776503 [details] Bug 1285848 - Part 1: Implement Rice Delta Decoding. . https://reviewboard.mozilla.org/r/68276/#review81558 ::: toolkit/components/url-classifier/RiceDeltaDecoder.cpp:18 (Diff revisions 5 - 6) > +/* > + * Copyright 2015 The WebRTC Project Authors. All rights reserved. > + * > + * Use of this source code is governed by a BSD-style license > + * that can be found in the LICENSE file in the root of the source > + * tree (media/webrtc/<whatever>). An additional intellectual property `<whatever>` should be replaced by the actual path :) ::: toolkit/components/url-classifier/RiceDeltaDecoder.cpp:159 (Diff revisions 5 - 6) > namespace { > ////////////////////////////////////////////////////////////////////////// > // The BitBuffer impl is copied and modified from webrtc/base/bitbuffer.cc > // > > -#define RTC_DCHECK_LE(v1, v2) > +#define RTC_DCHECK_LE(v1, v2) MOZ_ASSERT(v1 <= v2) Since we are forking the webrtc version of this code and aren't planning to merge changes in the future, I don't think there's any point in leaving the original macro names. We may as well do a search/replace in the .cpp and use `MOZ_ASSERT()` directly. ::: toolkit/components/url-classifier/tests/gtest/TestRiceDeltaDecoder.cpp:2 (Diff revisions 5 - 6) > +/* -*- Mode: C++; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 2 -*- */ > +/* This Source Code Form is subject to the terms of the Mozilla Public nit: For tests, we can use the Public Domain copyright header (https://www.mozilla.org/en-US/MPL/headers/). This would enable Chromium to take our tests if they wanted to. ::: toolkit/components/url-classifier/tests/gtest/TestRiceDeltaDecoder.cpp:50 (Diff revisions 5 - 6) > } > }; > > + // Copyright 2016 The Chromium Authors. All rights reserved. > + // Use of this source code is governed by a BSD-style license that can be > + // found in the <wherever that file is in gecko>/LICENSE file. Again, the `<whatever...>` placeholder should be replaced with the actual location in the source tree.
Attachment #8776503 -
Flags: review?(francois) → review-
Comment 23•8 years ago
|
||
mozreview-review |
Comment on attachment 8783460 [details] Bug 1285848 - Part 2: Request and parse RICE encoded prefix by default. https://reviewboard.mozilla.org/r/73252/#review81570
Attachment #8783460 -
Flags: review?(francois) → review+
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 26•8 years ago
|
||
mozreview-review |
Comment on attachment 8776503 [details] Bug 1285848 - Part 1: Implement Rice Delta Decoding. . https://reviewboard.mozilla.org/r/68276/#review82282 r+ but please fix this license header before landing. ::: toolkit/components/url-classifier/tests/gtest/TestRiceDeltaDecoder.cpp:48 (Diff revisions 6 - 7) > } > }; > > // Copyright 2016 The Chromium Authors. All rights reserved. > // Use of this source code is governed by a BSD-style license that can be > - // found in the <wherever that file is in gecko>/LICENSE file. > + // found in the TestRiceDeltaDecoder.cpp LICENSE file. Here, I meant to use the path where the LICENSE file is located: `media/webrtc/trunk/webrtc/LICENSE`
Attachment #8776503 -
Flags: review?(francois) → review+
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 29•8 years ago
|
||
Pushed by hchang@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e7410bc7e068 Part 1: Implement Rice Delta Decoding. r=francois. https://hg.mozilla.org/integration/autoland/rev/14a91d6bf939 Part 2: Request and parse RICE encoded prefix by default. r=francois
Comment 30•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/e7410bc7e068 https://hg.mozilla.org/mozilla-central/rev/14a91d6bf939
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox52:
--- → fixed
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
You need to log in
before you can comment on or make changes to this bug.
Description
•