Closed
Bug 1362449
Opened 7 years ago
Closed 7 years ago
make Base64{Encode,Decode} on nsString not require an ASCII-fication
Categories
(Core :: XPCOM, enhancement, P3)
Core
XPCOM
Tracking
()
RESOLVED
FIXED
mozilla57
Tracking | Status | |
---|---|---|
firefox57 | --- | fixed |
People
(Reporter: froydnj, Assigned: froydnj)
References
Details
Attachments
(14 files, 1 obsolete file)
7.98 KB,
patch
|
erahm
:
review+
|
Details | Diff | Splinter Review |
2.13 KB,
patch
|
erahm
:
review+
|
Details | Diff | Splinter Review |
1.28 KB,
patch
|
erahm
:
review+
|
Details | Diff | Splinter Review |
2.77 KB,
patch
|
erahm
:
review+
|
Details | Diff | Splinter Review |
2.20 KB,
patch
|
erahm
:
review+
|
Details | Diff | Splinter Review |
1.65 KB,
patch
|
erahm
:
review+
|
Details | Diff | Splinter Review |
3.41 KB,
patch
|
erahm
:
review+
|
Details | Diff | Splinter Review |
2.21 KB,
patch
|
erahm
:
review+
|
Details | Diff | Splinter Review |
2.40 KB,
patch
|
erahm
:
review+
|
Details | Diff | Splinter Review |
2.47 KB,
patch
|
erahm
:
review+
|
Details | Diff | Splinter Review |
2.12 KB,
patch
|
erahm
:
review+
|
Details | Diff | Splinter Review |
1.78 KB,
patch
|
erahm
:
review+
|
Details | Diff | Splinter Review |
1.49 KB,
patch
|
erahm
:
review+
|
Details | Diff | Splinter Review |
3.02 KB,
patch
|
erahm
:
review+
|
Details | Diff | Splinter Review |
Base64{Encode,Decode}, when given an nsString, first convert the string to ASCII, then do the requested operation, then convert back to nsString. We should skip the conversion steps, and just do the encoding directly on char16_t strings. Crash reports indicate that people do pass large strings through these functions sometimes, and trying to reduce the memory footprint here would be helpful.
Assignee | ||
Comment 1•7 years ago
|
||
I have patches for encoding. Patches for decoding are harder, but very doable.
Assignee: nobody → nfroyd
Priority: -- → P3
Assignee | ||
Comment 2•7 years ago
|
||
We already had testcases for streaming encoding, but we didn't have any for one-shot encoding and decoding. Let's fix that.
Attachment #8903713 -
Flags: review?(erahm)
Assignee | ||
Comment 3•7 years ago
|
||
We'll need this to implement direct encoding from nsString, and the changes are trivial.
Attachment #8903714 -
Flags: review?(erahm)
Assignee | ||
Comment 4•7 years ago
|
||
We need to explicitly reduce the incoming character to 8 bits, rather than assuming that a conversion to uint32_t will do the right thing.
Attachment #8903715 -
Flags: review?(erahm)
Assignee | ||
Comment 5•7 years ago
|
||
One less use of NSPR is a good thing. The failure cases that PL_Base64Encode would have caught for us are: 1. "Truncation". 2. Integer overflow when computing destination string length. 3. Malloc failures. The first one only gets checked if we pass in zero for the source length, which we never do. The latter two only get checked if we pass in a null pointer for the destination, which we never do. So removing the error handling PL_Base64Encode implies here is a good thing.
Attachment #8903716 -
Flags: review?(erahm)
Assignee | ||
Comment 6•7 years ago
|
||
The nsACString -> nsACString encode routine has several checks in it for correct operation, and the nsAString -> nsAString encode routine relies on those checks happening via the nsACString -> nsACString routine. Once we start encoding nsAStrings directly, we'll still need those checks, and the easiest way to ensure they happen is to move the core base64 encode logic for strings into a templated helper.
Attachment #8903717 -
Flags: review?(erahm)
Assignee | ||
Comment 7•7 years ago
|
||
After all the previous work, we can base64 encoding nsString values directly into nsString values, without having to go through intermediate nsCString values. Since this routine backs base64 routines exposed to the web, this change should help with OOMs that we see associated with base64 encoding.
Attachment #8903718 -
Flags: review?(erahm)
Assignee | ||
Comment 8•7 years ago
|
||
These tables are nearly identical to the ones for base64url decoding, but ideally will be slightly more readable, since things are broken up into sets of eight entries at a time.
Attachment #8903719 -
Flags: review?(erahm)
Assignee | ||
Comment 9•7 years ago
|
||
The decoding logic is the same for Base64 and Base64URL; we might as well reuse the routines that we already have for Base64URL decoding so we don't make mistakes in the logic.
Attachment #8903720 -
Flags: review?(erahm)
Assignee | ||
Comment 10•7 years ago
|
||
The existing Base64URL code converts from `const char` to `uint8_t`. We're going to want versions that convert from character types to character types, so make the decode routines accept generic input and output types.
Attachment #8903721 -
Flags: review?(erahm)
Assignee | ||
Comment 11•7 years ago
|
||
Our base64 decoding routines are a little nicer, and using them is a necessary step to templating all the necessary routines.
Attachment #8903722 -
Flags: review?(erahm)
Assignee | ||
Comment 12•7 years ago
|
||
Similar to the work we did for encoding, we're going to want this to be type-generic.
Attachment #8903723 -
Flags: review?(erahm)
Assignee | ||
Comment 13•7 years ago
|
||
The current nsString decoding routine indirectly relies on the various checks this routine performs; making it generic over string types ensures that we can eventually call it directly from the nsString decoding routine.
Attachment #8903724 -
Flags: review?(erahm)
Assignee | ||
Comment 14•7 years ago
|
||
The result of decoding needs to be an explicit 8-bit type before being widened to the output type to avoid undesirable effects like sign extension.
Attachment #8903725 -
Flags: review?(erahm)
Assignee | ||
Comment 15•7 years ago
|
||
After all the previous work, we can now base64 decode nsString types without intermediate conversion steps to nsCString, which is faster and more memory-efficient.
Attachment #8903726 -
Flags: review?(erahm)
Comment 16•7 years ago
|
||
Comment on attachment 8903713 [details] [diff] [review] part 0 - add test cases for base64 string encode/decode Review of attachment 8903713 [details] [diff] [review]: ----------------------------------------------------------------- Nice, thanks for adding this.
Attachment #8903713 -
Flags: review?(erahm) → review+
Comment 17•7 years ago
|
||
Comment on attachment 8903714 [details] [diff] [review] part 1 - make Base64 encode generic over source character types Review of attachment 8903714 [details] [diff] [review]: ----------------------------------------------------------------- ::: xpcom/io/Base64.cpp @@ +45,2 @@ > static void > +Encode2to4(const SrcT* aSrc, DestT* aDest) Do we still want to constrain SrcT to be an unsigned type? I'm thinking mainly in regards to the shifts but maybe it doesn't matter.
Attachment #8903714 -
Flags: review?(erahm) → review+
Comment 18•7 years ago
|
||
Comment on attachment 8903715 [details] [diff] [review] part 1a - fix Encode3to4 for non-8-bit character types Review of attachment 8903715 [details] [diff] [review]: ----------------------------------------------------------------- ::: xpcom/io/Base64.cpp @@ +23,5 @@ > "0123456789+/"; > > +// The Base64 encoder assumes all characters are less than 256; for 16-bit > +// strings, that means assuming that all characters are within range, and > +// masking off high bits if necessary. After discussing with you this explanation makes sense, but I think we need to add a comment to the public interface saying we ignore the high order byte in nsStrings (if it's not already there). This is pretty unexpected.
Attachment #8903715 -
Flags: review?(erahm) → review+
Comment 19•7 years ago
|
||
Comment on attachment 8903716 [details] [diff] [review] part 2 - use our Base64 encode implementation rather than NSPR's Review of attachment 8903716 [details] [diff] [review]: ----------------------------------------------------------------- ::: xpcom/io/Base64.cpp @@ +310,5 @@ > *aBase64 = nullptr; > uint32_t base64Len = ((aBinaryLen + 2) / 3) * 4; > > // Add one byte for null termination. > UniqueFreePtr<char[]> base64((char*)malloc(base64Len + 1)); Huh, I think that type is wrong. Should just be `char` right? Looks like we can just get rid of the UniqueFreePtr now and malloc directly into `aBase64`. I don't feel too strongly about that. @@ +338,5 @@ > > uint32_t base64Len = ((aBinary.Length() + 2) / 3) * 4; > > // Add one byte for null termination. > if (!aBase64.SetCapacity(base64Len + 1, fallible)) { While we're in here we could get rid of the +1, our string classes do that stuff for us.
Attachment #8903716 -
Flags: review?(erahm) → review+
Comment 20•7 years ago
|
||
Comment on attachment 8903717 [details] [diff] [review] part 3 - templatify core nsCString base64 encode routine Review of attachment 8903717 [details] [diff] [review]: ----------------------------------------------------------------- ::: xpcom/io/Base64.cpp @@ +343,5 @@ > if (!aBase64.SetCapacity(base64Len + 1, fallible)) { > return NS_ERROR_OUT_OF_MEMORY; > } > > + typename T::char_type* base64 = aBase64.BeginWriting(); `auto` would be a little less awful here
Attachment #8903717 -
Flags: review?(erahm) → review+
Comment 21•7 years ago
|
||
Comment on attachment 8903718 [details] [diff] [review] part 4 - avoid conversions in nsString Base64 encoding Review of attachment 8903718 [details] [diff] [review]: ----------------------------------------------------------------- \o/
Attachment #8903718 -
Flags: review?(erahm) → review+
Comment 22•7 years ago
|
||
Comment on attachment 8903719 [details] [diff] [review] part 5 - add tables for normal base64 decoding Review of attachment 8903719 [details] [diff] [review]: ----------------------------------------------------------------- r- for now, but that's just minor cleanup (that I care enough for you to fix). ::: xpcom/io/Base64.cpp @@ +238,5 @@ > return NS_OK; > } > > +// Maps an encoded character to a value in the Base64 URL alphabet, per > +// RFC 4648, Table 2. Invalid input characters map to UINT8_MAX. So this *isn't* the URL encoding table and it's *not* using RFC 4648 Table 2. A short blurb saying "these are the same but '+' and '/' are used instead of '-' and '_'" or something to that effect. @@ +257,5 @@ > + > + /* 64 */ 255, 0, 1, 2, 3, 4, 5, 6, > + /* 72 */ 7, 8, 9, 10, 11, 12, 13, 14, > + /* 80 */ 15, 16, 17, 18, 19, 20, 21, 22, > + /* 88 */ 23, 24, 25, /* A - Z */ 255, 255, 255, 255, 255, FWIW I don't find the placement of the comment helpful, it would be me more useful at the start of the sequence but meh. @@ +266,5 @@ > + 255, 255, 255, 255, 255, > +}; > + > +template<typename T> > +bool MUST_USE? @@ +270,5 @@ > +bool > +Base64CharToValue(T aChar, uint8_t* aValue) > +{ > + uint8_t index = static_cast<uint8_t>(aChar); > + *aValue = kBase64DecodeTable[index & 0x7f]; Can we add some asserts on the table size and make 0x7f a const with a decent name? @@ +271,5 @@ > +Base64CharToValue(T aChar, uint8_t* aValue) > +{ > + uint8_t index = static_cast<uint8_t>(aChar); > + *aValue = kBase64DecodeTable[index & 0x7f]; > + return (*aValue != 255) && !(index & ~0x7f); a) Maybe just use UINT8_MAX or std::numeric_limits<uint8_t>::max()? b) This is pretty weird, why do we do a bounds check after? What's the use case of getting the masked value and also returning false?* * I get this is copied from the URL version, but I still think it's weird.
Attachment #8903719 -
Flags: review?(erahm) → review-
Comment 23•7 years ago
|
||
Comment on attachment 8903720 [details] [diff] [review] part 6 - factor out base64url decoding routines Review of attachment 8903720 [details] [diff] [review]: ----------------------------------------------------------------- ::: xpcom/io/Base64.cpp @@ +400,5 @@ > { > return Base64EncodeHelper(aBinary, aBase64); > } > > +template<typename F> Do we need to template here, can we just take a function pointer? If you want/need to keep it a template can you call 'F' something more obvious (CharacterMapping or something, just not F).
Attachment #8903720 -
Flags: review?(erahm) → review+
Updated•7 years ago
|
Attachment #8903721 -
Flags: review?(erahm) → review+
Comment 24•7 years ago
|
||
Comment on attachment 8903722 [details] [diff] [review] part 8 - use our base64 decoding routines instead of NSPR's Review of attachment 8903722 [details] [diff] [review]: ----------------------------------------------------------------- ::: xpcom/io/Base64.cpp @@ +456,5 @@ > + uint32_t inputLength = aBase64Len; > + char* binary = aBinary; > + uint32_t binaryLength = 0; > + > + // Handle trailing '=' characters. As followup it might be nice to share this logic w/ the URL version.
Attachment #8903722 -
Flags: review?(erahm) → review+
Updated•7 years ago
|
Attachment #8903723 -
Flags: review?(erahm) → review+
Comment 25•7 years ago
|
||
Comment on attachment 8903724 [details] [diff] [review] part 10 - templatify CString base64 decode routine Review of attachment 8903724 [details] [diff] [review]: ----------------------------------------------------------------- ::: xpcom/io/Base64.cpp @@ +564,5 @@ > if (!aBinary.SetCapacity(binaryLen + 1, fallible)) { > return NS_ERROR_OUT_OF_MEMORY; > } > > + typename T::char_type* binary = aBinary.BeginWriting(); Again, this could be `auto`, but meh.
Attachment #8903724 -
Flags: review?(erahm) → review+
Updated•7 years ago
|
Attachment #8903725 -
Flags: review?(erahm) → review+
Comment 26•7 years ago
|
||
Comment on attachment 8903726 [details] [diff] [review] part 12 - avoid conversions in nsString decoding Review of attachment 8903726 [details] [diff] [review]: ----------------------------------------------------------------- \o/
Attachment #8903726 -
Flags: review?(erahm) → review+
Assignee | ||
Comment 27•7 years ago
|
||
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #17) > ::: xpcom/io/Base64.cpp > @@ +45,2 @@ > > static void > > +Encode2to4(const SrcT* aSrc, DestT* aDest) > > Do we still want to constrain SrcT to be an unsigned type? I'm thinking > mainly in regards to the shifts but maybe it doesn't matter. I changed part 1a to explicitly truncate to 8 bits in all Encode*to* routines instead. (In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #18) > ::: xpcom/io/Base64.cpp > @@ +23,5 @@ > > "0123456789+/"; > > > > +// The Base64 encoder assumes all characters are less than 256; for 16-bit > > +// strings, that means assuming that all characters are within range, and > > +// masking off high bits if necessary. > > After discussing with you this explanation makes sense, but I think we need > to add a comment to the public interface saying we ignore the high order > byte in nsStrings (if it's not already there). This is pretty unexpected. Comments added. (In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #19) > ::: xpcom/io/Base64.cpp > @@ +310,5 @@ > > *aBase64 = nullptr; > > uint32_t base64Len = ((aBinaryLen + 2) / 3) * 4; > > > > // Add one byte for null termination. > > UniqueFreePtr<char[]> base64((char*)malloc(base64Len + 1)); > > Huh, I think that type is wrong. Should just be `char` right? It should indeed be just `char`. (In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #22) > @@ +271,5 @@ > > +Base64CharToValue(T aChar, uint8_t* aValue) > > +{ > > + uint8_t index = static_cast<uint8_t>(aChar); > > + *aValue = kBase64DecodeTable[index & 0x7f]; > > + return (*aValue != 255) && !(index & ~0x7f); > > a) Maybe just use UINT8_MAX or std::numeric_limits<uint8_t>::max()? I don't know if that clarifies or obscures, since we should really be using the same value in the table above. > b) This is pretty weird, why do we do a bounds check after? What's the use > case of getting the masked value and also returning false?* > > * I get this is copied from the URL version, but I still think it's weird. I do not know why it was written that way in the first place. I don't know if there's a lot of value in changing it to be different just because, but I can change it... (In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #23) > ::: xpcom/io/Base64.cpp > @@ +400,5 @@ > > { > > return Base64EncodeHelper(aBinary, aBase64); > > } > > > > +template<typename F> > > Do we need to template here, can we just take a function pointer? If you > want/need to keep it a template can you call 'F' something more obvious > (CharacterMapping or something, just not F). I thought with taking the template, we might encourage inlining, and writing templates is nicer than writing function pointer types. It seems odd to complain about `typename F` here, but not complain about `typename T` and similar. But I've changed it to `typename Decoder` locally.
Assignee | ||
Comment 28•7 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #27) > (In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment > #19) > > ::: xpcom/io/Base64.cpp > > @@ +310,5 @@ > > > *aBase64 = nullptr; > > > uint32_t base64Len = ((aBinaryLen + 2) / 3) * 4; > > > > > > // Add one byte for null termination. > > > UniqueFreePtr<char[]> base64((char*)malloc(base64Len + 1)); > > > > Huh, I think that type is wrong. Should just be `char` right? > > It should indeed be just `char`. I take that back, it should be `char[]`. We're specifying the deletion policy explicitly with `UniqueFreePtr`, so we don't have to worry about allocator/deallocator mismatches. `char[]` also permits us to use `operator[]` on the `UniqueFreePtr`, rather than having to do something ugly like `base64.get()[...]`. We're storing what is conceptually an array in the smart pointer, rather than just a pointer to a single dynamically-allocated `char`.
Assignee | ||
Comment 29•7 years ago
|
||
With review comments addressed, I think.
Attachment #8905067 -
Flags: review?(erahm)
Assignee | ||
Updated•7 years ago
|
Attachment #8903719 -
Attachment is obsolete: true
Comment 30•7 years ago
|
||
Comment on attachment 8905067 [details] [diff] [review] part 5 - add tables for normal base64 decoding Review of attachment 8905067 [details] [diff] [review]: ----------------------------------------------------------------- r=me ::: xpcom/io/Base64.cpp @@ +280,5 @@ > + if (index & ~mask) { > + return false; > + } > + *aValue = kBase64DecodeTable[index & mask]; > + nit: spaces
Attachment #8905067 -
Flags: review?(erahm) → review+
Comment 31•7 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #28) > (In reply to Nathan Froyd [:froydnj] from comment #27) > > (In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment > > #19) > > > ::: xpcom/io/Base64.cpp > > > @@ +310,5 @@ > > > > *aBase64 = nullptr; > > > > uint32_t base64Len = ((aBinaryLen + 2) / 3) * 4; > > > > > > > > // Add one byte for null termination. > > > > UniqueFreePtr<char[]> base64((char*)malloc(base64Len + 1)); > > > > > > Huh, I think that type is wrong. Should just be `char` right? > > > > It should indeed be just `char`. > > I take that back, it should be `char[]`. We're specifying the deletion > policy explicitly with `UniqueFreePtr`, so we don't have to worry about > allocator/deallocator mismatches. `char[]` also permits us to use > `operator[]` on the `UniqueFreePtr`, rather than having to do something ugly > like `base64.get()[...]`. We're storing what is conceptually an array in > the smart pointer, rather than just a pointer to a single > dynamically-allocated `char`. Oh right, that's pretty bizarre behavior on our part. You either do UniqueFreePtr<char> or UniqueFreePtr<char[]> but *not* UniqueFreePtr<char*>.
Comment 32•7 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #27) > > > +template<typename F> > > > > Do we need to template here, can we just take a function pointer? If you > > want/need to keep it a template can you call 'F' something more obvious > > (CharacterMapping or something, just not F). > > I thought with taking the template, we might encourage inlining, and writing > templates is nicer than writing function pointer types. > > It seems odd to complain about `typename F` here, but not complain about > `typename T` and similar. But I've changed it to `typename Decoder` locally. It's just that templating a function param is a little out of the norm and it wasn't obvious it was a function (to me at least). A comment would have sufficed I guess. With `typename T` it's generally clear that's a character type in it's usage (T* etc).
Comment 33•7 years ago
|
||
Pushed by nfroyd@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/c0d320e5bad1 part 0 - add test cases for base64 string encode/decode; r=erahm https://hg.mozilla.org/integration/mozilla-inbound/rev/959dd58ea6df part 1 - make Base64 encode generic over source character types; r=erahm https://hg.mozilla.org/integration/mozilla-inbound/rev/bde5d6755cf0 part 1a - fix Encode*to* for non-8-bit character types; r=erahm https://hg.mozilla.org/integration/mozilla-inbound/rev/014a075a5a0d part 2 - use our Base64 encode implementation rather than NSPR's; r=erahm https://hg.mozilla.org/integration/mozilla-inbound/rev/53447e7c6c29 part 3 - templatify core nsCString base64 encode routine; r=erahm https://hg.mozilla.org/integration/mozilla-inbound/rev/837aaf616d85 part 4 - avoid conversions in nsString Base64 encoding; r=erahm https://hg.mozilla.org/integration/mozilla-inbound/rev/9080c5fb0769 part 5 - add tables for normal base64 decoding; r=erahm https://hg.mozilla.org/integration/mozilla-inbound/rev/966a99fcb8ce part 6 - factor out base64url decoding routines; r=erahm https://hg.mozilla.org/integration/mozilla-inbound/rev/e364a330eb3e part 7 - make decode routines type-generic; r=erahm https://hg.mozilla.org/integration/mozilla-inbound/rev/6fed9af7f5e2 part 8 - use our base64 decoding routines instead of NSPR's; r=erahm https://hg.mozilla.org/integration/mozilla-inbound/rev/8454297462a3 part 9 - templatify core base64 decode routine; r=erahm https://hg.mozilla.org/integration/mozilla-inbound/rev/c92f1975d936 part 10 - templatify CString base64 decode routine; r=erahm https://hg.mozilla.org/integration/mozilla-inbound/rev/f2669ac73b90 part 11 - fix decoding subroutines to cope with wide output types; r=erahm https://hg.mozilla.org/integration/mozilla-inbound/rev/981709ff412e part 12 - avoid conversions in nsString Base64 decoding; r=erahm
Comment 34•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/c0d320e5bad1 https://hg.mozilla.org/mozilla-central/rev/959dd58ea6df https://hg.mozilla.org/mozilla-central/rev/bde5d6755cf0 https://hg.mozilla.org/mozilla-central/rev/014a075a5a0d https://hg.mozilla.org/mozilla-central/rev/53447e7c6c29 https://hg.mozilla.org/mozilla-central/rev/837aaf616d85 https://hg.mozilla.org/mozilla-central/rev/9080c5fb0769 https://hg.mozilla.org/mozilla-central/rev/966a99fcb8ce https://hg.mozilla.org/mozilla-central/rev/e364a330eb3e https://hg.mozilla.org/mozilla-central/rev/6fed9af7f5e2 https://hg.mozilla.org/mozilla-central/rev/8454297462a3 https://hg.mozilla.org/mozilla-central/rev/c92f1975d936 https://hg.mozilla.org/mozilla-central/rev/f2669ac73b90 https://hg.mozilla.org/mozilla-central/rev/981709ff412e
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox57:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
You need to log in
before you can comment on or make changes to this bug.
Description
•