Closed
Bug 607115
Opened 10 years ago
Closed 10 years ago
use a much smaller guid format than we currently use for bookmarks
Categories
(Toolkit :: Places, defect)
Toolkit
Places
Tracking
()
RESOLVED
FIXED
mozilla2.0b9
Tracking | Status | |
---|---|---|
blocking2.0 | --- | betaN+ |
People
(Reporter: mconnor, Assigned: sdwilsh)
References
Details
(Whiteboard: [fixed-in-places])
Attachments
(3 files, 2 obsolete files)
1.80 KB,
patch
|
KaiE
:
review+
|
Details | Diff | Splinter Review |
3.14 KB,
patch
|
mak
:
review+
|
Details | Diff | Splinter Review |
9.67 KB,
patch
|
Details | Diff | Splinter Review |
Sync generates smaller GUIDs for history than the UUID + sequence number that bookmarks currently uses. It's 10 characters from a base70 charset that is URL and command-line safe. Collision probability is 1% after 100 million records. See bug 518146 for details. I'm proposing that we take Utils.makeGUID from Sync and make it an API (nsIProbablyUniqueID anyone?), which we can then for bookmarks and history. Key benefits: * Smaller places DBs (especially in conjunction with bug 607112) * Less overhead to generate. * Sync can stick more of these in a URL for partial sync on mobile.
Assignee | ||
Comment 1•10 years ago
|
||
(In reply to comment #0) > I'm proposing that we take Utils.makeGUID from Sync and make it an API > (nsIProbablyUniqueID anyone?), which we can then for bookmarks and history. Prefer to see this as a SQL function in SQLFunctions.cpp, but yes.
Reporter | ||
Comment 2•10 years ago
|
||
I guess if we're assuming no one outside of sqlite-land will use these, that's fine. I was thinking that we'd make Utils.makeGUID use the built in impl, and also make passwords/forms/other stuff use this method so we don't have to insert our own guids anywhere.
Assignee | ||
Comment 3•10 years ago
|
||
(In reply to comment #2) > I guess if we're assuming no one outside of sqlite-land will use these, that's > fine. I was thinking that we'd make Utils.makeGUID use the built in impl, and > also make passwords/forms/other stuff use this method so we don't have to > insert our own guids anywhere. Well, at least for places, it probably makes sense to generate the guid when we insert it. It's possible we could also just add it as a generic function for all storage consumers too. However, it's also possible I'm over-engineering this without thinking about it much yet ;)
Assignee | ||
Updated•10 years ago
|
blocking2.0: --- → betaN+
Assignee | ||
Comment 4•10 years ago
|
||
I am proposing that we go with this algorithm (which is different from what Mardak did in bug 518146, but has the benefit of being driven by PRNG: Use nsIRandomGenerator::generateRandomBytes, and base64 encode that. Base64 encoding gives us 60 characters (a-z, A-Z, 0-9, /, +), with the last two being not so good for urls. We can replace them with ! and * respectively (or one of the other characters we do not use from the current implementation). The current encoding allows for 70^10 possibilities (2.82 x 10^18). If we keep the length of only ten characters, we allow for 64^10 possibilities (1.15 x 10^18), which is a bit less than half, which I believe doubles the likelihood of collision. Adding one additional character nets us 64^11 possibilities (73.8 x 10^18) which is 25 times the number of possibilities with the current implementation in Sync. I'm a bit leery at implementing something that increases the chance of collisions, so my question is this: is the sync team OK with an 11 character guid?
Comment 5•10 years ago
|
||
(In reply to comment #4) > Use nsIRandomGenerator::generateRandomBytes, and base64 encode that. Base64 > encoding gives us 60 characters (a-z, A-Z, 0-9, /, +), (That's 64 characters, not 60.) > with the last two being > not so good for urls. We can replace them with ! and * respectively (or one of > the other characters we do not use from the current implementation). I suggest using the "base64url" encoding as described in http://tools.ietf.org/html/rfc4648#section-5, basically using - and _ instead of / and +. > The current encoding allows for 70^10 possibilities (2.82 x 10^18). If we keep the > length of only ten characters, we allow for 64^10 possibilities (1.15 x 10^18), > which is a bit less than half, which I believe doubles the likelihood of > collision. Adding one additional character nets us 64^11 possibilities (73.8 x > 10^18) which is 25 times the number of possibilities with the current > implementation in Sync. I'm a bit leery at implementing something that > increases the chance of collisions, so my question is this: is the sync team OK > with an 11 character guid? Let's make it 12 characters base64 (because base64 pads to multiples of 4 characters anyway). That's 9 random bytes as input to base64-encode: guid = btoa(prng.generateRandomBytes(9))
Assignee | ||
Comment 6•10 years ago
|
||
First, we'll need to make nsIRandomGenerator threadsafe so we can use it both on the main thread and the background thread.
Comment 7•10 years ago
|
||
Nod. If we go with base64, better to go with multiples of 4 characters, so 12 is better than 8. Although we could always just truncate to 11 characters, etc.
Status: ASSIGNED → NEW
Assignee | ||
Comment 8•10 years ago
|
||
(In reply to comment #5) > (That's 64 characters, not 60.) Uh, yeah. Luckily everything after that had the right number. > I suggest using the "base64url" encoding as described in > http://tools.ietf.org/html/rfc4648#section-5, basically using - and _ instead > of / and +. Yeah, that's simple enough to tweak. > Let's make it 12 characters base64 (because base64 pads to multiples of 4 > characters anyway). That's 9 random bytes as input to base64-encode: > > guid = btoa(prng.generateRandomBytes(9)) OK, but this is being implemented in c++ land so it's a bit more complicated ;)
Status: NEW → ASSIGNED
Comment 9•10 years ago
|
||
Comment on attachment 492380 [details] [diff] [review] Part 1 - mark and make nsIRandomGenerator threadsafe In my understanding PK11_GenerateRandom should be fully threadsafe, it should be fine to call it from any thread. r=kaie
Attachment #492380 -
Flags: review?(kaie) → review+
Assignee | ||
Comment 10•10 years ago
|
||
Attachment #492442 -
Flags: review?(mak77)
Assignee | ||
Comment 11•10 years ago
|
||
This includes tests for part 3 and part 2. Brute force it by making guids until we've seen every single expected character, and the whole time make sure we never see an illegal one. Also tests that nothing blows up on the background thread.
Attachment #492444 -
Flags: review?(mak77)
Assignee | ||
Comment 12•10 years ago
|
||
Parts 1 - 3 are all we need in this bug. The work to actually use this stuff will be in bug 607117.
Comment 13•10 years ago
|
||
Comment on attachment 492442 [details] [diff] [review] Part 2 - helper method >diff --git a/toolkit/components/places/src/Helpers.cpp b/toolkit/components/places/src/Helpers.cpp >+nsresult >+GenerateGUID(nsCString& _guid) >+{ >+ _guid.Truncate(); >+ >+ // Request raw random bytes and base64 encode it. For each set of three >+ // bytes, we get one character. nit: "encode them" >+ const PRUint32 kGUIDLength = 12; >+ const PRUint32 kRequiredBytesLength = >+ static_cast<PRUint32>((kGUIDLength + 1) / 4 * 3); nit: add parentheses to make operator precedence explicit. What is the +1 saving? comment plz >+ PRUint8* buffer; >+ nsresult rv = rg->GenerateRandomBytes(kRequiredBytesLength, &buffer); >+ NS_ENSURE_SUCCESS(rv, rv); I don't see where you NS_Free(buffer), looks like being leaked As an alternative, you could make it being adopted by a nsCAutoString, and const_cast<char*>(string.get()) later. >+ // SetLength doesn't set aside space for NULL termination, but PL_Base64Encode >+ // will NULL terminate it's string. nit: s/it's/its, but better "PL_Base64Encode generates a NULL terminated string." >+ NS_ENSURE_TRUE(_guid.SetCapacity(kGUIDLength + 1), NS_ERROR_OUT_OF_MEMORY); >+ _guid.SetLength(kGUIDLength); >+ (void)PL_Base64Encode(reinterpret_cast<const char*>(buffer), >+ kRequiredBytesLength, _guid.BeginWriting()); Actually, the documentation for base64encode says that the buffer is not null terminated if the third agument is not null. While it's null terminated if you use the direct return value and pass null as the third argument. http://mxr.mozilla.org/mozilla-central/source/nsprpub/lib/libc/src/base64.c#137 137 * If the destination argument is not NULL, it is assumed to 138 * be of sufficient size, and the contents will not be null- 139 * terminated by this routine.
Attachment #492442 -
Flags: review?(mak77) → review-
Comment 14•10 years ago
|
||
Comment on attachment 492444 [details] [diff] [review] Part 3 - SQL method with tests >diff --git a/toolkit/components/places/src/SQLFunctions.cpp b/toolkit/components/places/src/SQLFunctions.cpp > MatchAutoCompleteFunction::create(mozIStorageConnection *aDBConn) > { > nsRefPtr<MatchAutoCompleteFunction> function(new MatchAutoCompleteFunction); >- NS_ENSURE_TRUE(function, NS_ERROR_OUT_OF_MEMORY); nit: Could you make "function = new" per our code style? >+//////////////////////////////////////////////////////////////////////////////// >+//// GUID Creation Function >+ >+ ////////////////////////////////////////////////////////////////////////////// >+ //// GenerateGUIDFunction nit: I find having both a bit redundant, but it's ok if you think it's worth it. >+ /* static */ nit: I think we usually use // static >diff --git a/toolkit/components/places/tests/unit/test_sql_guid_function.js b/toolkit/components/places/tests/unit/test_sql_guid_function.js >+++ b/toolkit/components/places/tests/unit/test_sql_guid_function.js >@@ -0,0 +1,125 @@ >+/* Any copyright is dedicated to the Public Domain. >+ http://creativecommons.org/publicdomain/zero/1.0/ */ nit and OT: We should ask gerv to figure out the common style for these pd declarations, some file puts them between BEGIN/END LICENSE BLOCK, some file uses a javadoc init, some prefixes with *... >+/** >+ * Checks all our invariants about our guids for a given result. >+ * >+ * @param aGuid >+ * The guid to check. >+ */ >+function check_invariants(aGuid) >+{ >+ print("TEST-INFO | " + tests[index - 1].name + " | Checking guid '" + >+ aGuid + "'"); >+ >+ const kGUIDLength = 12; >+ do_check_eq(aGuid.length, kGUIDLength); >+ do_check_true(/[a-zA-Z0-9\-_]{12}/.test(aGuid)); you could avoid the length check and do /^[a-zA-Z0-9\-_]{12}$/ so the regex will also do length check the output should not be worse since you print the guid before >+function test_guid_on_background() >+{ >+ // We should not assert if we execute this asynchronously. >+ let stmt = DBConn().createAsyncStatement("SELECT GENERATE_GUID()"); >+ let checked = false; >+ stmt.executeAsync({ no handleError? >+let tests = [ >+ test_guid_invariants, >+ test_guid_on_background, >+]; >+let index = 0; nit: using tests.shift() is usually more compact than using a test index mostly nits, nothing major.
Attachment #492444 -
Flags: review?(mak77) → review+
Assignee | ||
Comment 15•10 years ago
|
||
(In reply to comment #14) > nit and OT: We should ask gerv to figure out the common style for these pd > declarations, some file puts them between BEGIN/END LICENSE BLOCK, some file > uses a javadoc init, some prefixes with *... I did this a while ago, and this is why we have boilerplate for it now: http://www.mozilla.org/MPL/boilerplate-1.1/
Comment 16•10 years ago
|
||
ah cool, I don't like it with that trailing */ but my personal taste is not part of a discussion :) thanks for notifying me of the boilerplate.
Assignee | ||
Updated•10 years ago
|
Whiteboard: [needs review mak]
Assignee | ||
Comment 17•10 years ago
|
||
(In reply to comment #13) > >+ const PRUint32 kGUIDLength = 12; > >+ const PRUint32 kRequiredBytesLength = > >+ static_cast<PRUint32>((kGUIDLength + 1) / 4 * 3); > > nit: add parentheses to make operator precedence explicit. operator precedence doesn't matter for / and *. Those operations are commutative, so their order doesn't matter (however, the addition is wrapped because it does matter). > What is the +1 saving? comment plz It's due to integer math, but we don't need to do this as long as we are a multiple of 4. I have an assert at the end that makes sure the guid we return is of the right length, so we should be OK. > I don't see where you NS_Free(buffer), looks like being leaked > As an alternative, you could make it being adopted by a nsCAutoString, and > const_cast<char*>(string.get()) later. Naw, I'll just free it right after I'm done using it. > Actually, the documentation for base64encode says that the buffer is not null > terminated if the third agument is not null. > While it's null terminated if you use the direct return value and pass null as > the third argument. nsCStrings need to be NULL terminated though. I've fixed up the wording to better reflect reality.
Assignee | ||
Comment 18•10 years ago
|
||
I refactored this a bit because of how I'm doing things in bug 607112. These changes will make that bug's patch smaller, which I think is a good thing. This should address the review comments.
Attachment #492442 -
Attachment is obsolete: true
Attachment #492673 -
Flags: review?(mak77)
Assignee | ||
Comment 19•10 years ago
|
||
(In reply to comment #14) > nit: I find having both a bit redundant, but it's ok if you think it's worth > it. Agreed, but it's how this file is. I can change it all over if you'd like, but that means re-indenting the world. > nit: I think we usually use // static not in this file though > you could avoid the length check and do /^[a-zA-Z0-9\-_]{12}$/ so the regex > will also do length check > the output should not be worse since you print the guid before Our regex isn't really any different ;) I kept the length check separate just because I wanted the failure to be very clear. I don't think having the separate check hurts us, so I'm going to leave it in. > no handleError? handleCompletion checks the reason, so we don't need it > nit: using tests.shift() is usually more compact than using a test index While this is true, it makes it less clear when I use the functions name for logging output. Using the index makes all of those call sites clearer, IMO.
Updated•10 years ago
|
Attachment #492673 -
Flags: review?(mak77) → review+
Assignee | ||
Comment 21•10 years ago
|
||
http://hg.mozilla.org/projects/places/rev/4ff3fc8162d8 http://hg.mozilla.org/projects/places/rev/510ee4557bb9 http://hg.mozilla.org/projects/places/rev/952cedf2302e
Whiteboard: [fixed-in-places]
Version: unspecified → Trunk
Comment 22•10 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/952cedf2302e http://hg.mozilla.org/mozilla-central/rev/510ee4557bb9 http://hg.mozilla.org/mozilla-central/rev/4ff3fc8162d8
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla2.0b9
You need to log in
before you can comment on or make changes to this bug.
Description
•