Closed Bug 711180 Opened 8 years ago Closed 6 years ago

atob should ignore whitespace

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla27

People

(Reporter: Ms2ger, Assigned: mz_mhs-ctb)

Details

(Keywords: dev-doc-complete, site-compat, Whiteboard: [good first bug][mentor=Ms2ger][lang=c++])

Attachments

(1 file, 15 obsolete files)

3.71 KB, patch
Details | Diff | Splinter Review
This requires either changing NSPR behavior (wtc, would that be acceptable?) or reimplementing base64 stuff outside NSPR.
Changing PL_Base64Decode behavior should be OK.  The change is a little
tricky because the 'srclen' input argument may be an over-estimate.
Note that we already have base64 code outside of nspr (xpcom/io/Base64.cpp).
Also, is this really supposed to remove internal space characters?  What's the point of this?
Base64 content, often has spaces inserted to break it up into sane-length lines, especially when going over SMTP.
I found that the base64 decoder in NSS already ignores whitespace:
http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/nss/lib/util/nssb64d.c&rev=1.8&mark=265-279#265

NSS uses its own base64 decoder (rather than PL_Base64Decode) to
decode PEM certificate files, which contain newlines.
I would like to tackle this bug.  How do I get attached to it?
I just assigned it to you.  Past that, just post patches.  And thank you!
Assignee: nobody → trhodson
Can you confirm that you're still working on this bug?
Flags: needinfo?(trhodson)
Assignee: trhodson → nobody
Flags: needinfo?(trhodson)
I'd like to work on this one, please.
OK, I've assigned it to you, and it's all yours. Good luck, and please let us know about your progress, or any questions you have, in this bug.

Happy hunting!
Assignee: nobody → mz_mhs-ctb
Status: NEW → ASSIGNED
Should other characters be ignored as well, or just whitespace?
Oh, and we can say the input to atob must have a even number of characters, correct?
Anyhow, waiting for mach to build me a test.
OK, thanks!
OK, got that; currently waiting for a test build. If it works I'll attach a patch.
Almost done with this.
Attached patch Proposed patch v1 (obsolete) — Splinter Review
Attachment #779212 - Flags: review?
Michael, it would be better to request a review from a specific person. I'm not sure who'd be appropriate though. Maybe :Ms2ger since they're the bug mentor?
OK, trying that.
Attachment #779212 - Flags: review? → review?(Ms2ger)
Comment on attachment 779212 [details] [diff] [review]
Proposed patch v1

For NSPR changes, you'll want wtc.
Attachment #779212 - Flags: review?(Ms2ger) → review?(wtc)
Attached patch Proposed patch v1.1 (obsolete) — Splinter Review
Attachment #779212 - Attachment is obsolete: true
Attachment #779212 - Flags: review?(wtc)
Attachment #780030 - Flags: review?(wtc)
Comment on attachment 780030 [details] [diff] [review]
Proposed patch v1.1

Review of attachment 780030 [details] [diff] [review]:
-----------------------------------------------------------------

Thank you for the patch. I only reviewed nsprpub/lib/libc/src/base64.c.
I didn't review it in detail because I spotted inappropriate use of
global variables.

::: nsprpub/lib/libc/src/base64.c
@@ +10,5 @@
>  #include <string.h> /* for strlen */
>  
>  static unsigned char *base = (unsigned char *)"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
> +static PRUint32 lenm = (PRUint32)0;
> +static PRUint32 dcn = (PRUint32)0;

lenm and dcn are global variables, but the functions modify them.
So two threads cannot call the functions at the same time. We need
a different approach so that the functions remain thread-safe.

Also the variable names "lenm" and "dcn" are not descriptive enough.
I can't guess what they mean.

@@ +184,5 @@
> +        return -2;
> +    }
> +    else if( (unsigned char)' ' == c || (unsigned char)'\r' == c ||
> +        (unsigned char)'\n' == c || (unsigned char)'\t' == c ||
> +        (unsigned char)'\f' == c )

Use the isspace(c) function from <ctype.h>.

@@ +446,5 @@
>          }
> +        for( i = 0; i < strlen(src); i++ ) {
> +            if ( (unsigned char)' ' == src[i] || (unsigned char)'\r' == src[i] ||
> +            (unsigned char)'\n' == src[i] || (unsigned char)'\t' == src[i] ||
> +            (unsigned char)'\f' == src[i] ) {

Use isspace((unsigned char)src[i]) here.
Attachment #780030 - Flags: review?(wtc) → review-
(In reply to Wan-Teh Chang from comment #25)
> @@ +184,5 @@
> > +        return -2;
> > +    }
> > +    else if( (unsigned char)' ' == c || (unsigned char)'\r' == c ||
> > +        (unsigned char)'\n' == c || (unsigned char)'\t' == c ||
> > +        (unsigned char)'\f' == c )
> 
> Use the isspace(c) function from <ctype.h>.

The definition of "white space" is included in the HTML5 spec and we can't assume that isspace(c) matches the definition.

I think we should use xpcom/io/Base64.cpp or something instead of bringing HTML5'ism into NSPR.
Attached patch Proposed patch v2 (obsolete) — Splinter Review
Attachment #780030 - Attachment is obsolete: true
Attachment #782855 - Flags: review?(wtc)
Comment on attachment 782855 [details] [diff] [review]
Proposed patch v2

Did you read comment #26?
This patch will violate the HTML spec.
http://www.whatwg.org/specs/web-apps/current-work/multipage/common-microsyntaxes.html#space-character
Even the current locale will affect the isspace() result. isspace() is not adequate here at all.
But the NSPR owner refused the first patch, so we will have to add or use a base64 decoder outside NSPR.
(In reply to Masatoshi Kimura [:emk] from comment #28)
> Comment on attachment 782855 [details] [diff] [review]
> Proposed patch v2
> 
> Did you read comment #26?
> This patch will violate the HTML spec.
> http://www.whatwg.org/specs/web-apps/current-work/multipage/common-
> microsyntaxes.html#space-character
> Even the current locale will affect the isspace() result. isspace() is not
> adequate here at all.
> But the NSPR owner refused the first patch, so we will have to add or use a
> base64 decoder outside NSPR.

OK
Attachment #782855 - Attachment is obsolete: true
Attachment #782855 - Flags: review?(wtc)
Attached patch Proposed patch v3 (obsolete) — Splinter Review
Attachment #785435 - Flags: review?(bobbyholley+bmo)
Attached patch Proposed patch v3.1 (obsolete) — Splinter Review
Attachment #785435 - Attachment is obsolete: true
Attachment #785435 - Flags: review?(bobbyholley+bmo)
Attachment #785445 - Flags: review?(bobbyholley+bmo)
Attachment #785445 - Attachment is obsolete: true
Attachment #785445 - Flags: review?(bobbyholley+bmo)
Attachment #785435 - Attachment is obsolete: false
Attachment #785435 - Flags: review?(bobbyholley+bmo)
Attached patch Proposed patch v3.2 (obsolete) — Splinter Review
Attachment #785435 - Attachment is obsolete: true
Attachment #785435 - Flags: review?(bobbyholley+bmo)
Attachment #785509 - Flags: review?(bobbyholley+bmo)
Attachment #785509 - Flags: review?(bobbyholley+bmo) → review?(Ms2ger)
Comment on attachment 785509 [details] [diff] [review]
Proposed patch v3.2

>+  in.SetCapacity(aAsciiBase64String.Length());

I have a concern about wasting memory by copying the string. At least a fallible allocator should be used because Web pages can pass arbitrary large string.

>+    if(!(nsCRT::IsAsciiSpace(*(start)) || *(start) == '\f')) {

Please use nsContentUtils::IsHTMLWhitespace.
(In reply to Masatoshi Kimura [:emk] from comment #33)
> Comment on attachment 785509 [details] [diff] [review]
> Proposed patch v3.2
> 
> >+  in.SetCapacity(aAsciiBase64String.Length());
> 
> I have a concern about wasting memory by copying the string. At least a
> fallible allocator should be used because Web pages can pass arbitrary large
> string.
> 
> >+    if(!(nsCRT::IsAsciiSpace(*(start)) || *(start) == '\f')) {
> 
> Please use nsContentUtils::IsHTMLWhitespace.

Addressing now.
Attachment #785509 - Flags: review?(Ms2ger)
Attached patch Proposed patch v4 (obsolete) — Splinter Review
Attachment #785509 - Attachment is obsolete: true
Attachment #788712 - Flags: review?(Ms2ger)
Comment on attachment 788712 [details] [diff] [review]
Proposed patch v4

Review of attachment 788712 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay.

I wonder if this is too much work for the case where there are no spaces in the input. Maybe you could first do a loop over the input to check if the new string is necessary. (Possibly merged with the Is8Bit call somehow... That's probably a premature optimization, though.)

::: content/base/src/nsContentUtils.cpp
@@ +652,5 @@
>      aBinaryData.Truncate();
>      return NS_ERROR_DOM_INVALID_CHARACTER_ERR;
>    }
>  
> +  const PRUnichar *start = aAsciiBase64String.BeginReading();

* to the left

@@ +653,5 @@
>      return NS_ERROR_DOM_INVALID_CHARACTER_ERR;
>    }
>  
> +  const PRUnichar *start = aAsciiBase64String.BeginReading();
> +  const PRUnichar *end = aAsciiBase64String.EndReading();

Same here

@@ +654,5 @@
>    }
>  
> +  const PRUnichar *start = aAsciiBase64String.BeginReading();
> +  const PRUnichar *end = aAsciiBase64String.EndReading();
> +  nsString in;

Maybe 'trimmedBase64String' or something like that?

@@ +655,5 @@
>  
> +  const PRUnichar *start = aAsciiBase64String.BeginReading();
> +  const PRUnichar *end = aAsciiBase64String.EndReading();
> +  nsString in;
> +  if(!in.SetCapacity(aAsciiBase64String.Length(), fallible_t())) {

'if ('

@@ +658,5 @@
> +  nsString in;
> +  if(!in.SetCapacity(aAsciiBase64String.Length(), fallible_t())) {
> +    return NS_ERROR_DOM_INVALID_CHARACTER_ERR;
> +  }
> +  while (start != end) {

Let's use < just to be sure

@@ +659,5 @@
> +  if(!in.SetCapacity(aAsciiBase64String.Length(), fallible_t())) {
> +    return NS_ERROR_DOM_INVALID_CHARACTER_ERR;
> +  }
> +  while (start != end) {
> +    if(!nsContentUtils::IsHTMLWhitespace(*start)) {

'if ('

@@ +660,5 @@
> +    return NS_ERROR_DOM_INVALID_CHARACTER_ERR;
> +  }
> +  while (start != end) {
> +    if(!nsContentUtils::IsHTMLWhitespace(*start)) {
> +      in.Append(start, 1);

Append(*start) should work, I think

@@ +664,5 @@
> +      in.Append(start, 1);
> +    }
> +    start++;
> +  }
> +  nsresult rv = Base64Decode((const nsAString&)in, aBinaryData);

Should not need this cast
Attachment #788712 - Flags: review?(Ms2ger)
And we need a mochitest.
I'll do it tomorrow. Where does the mochitest go in the tree?
Ignore the above.
Attached patch Proposed patch v5 (obsolete) — Splinter Review
Patch with review comments addressed.
Attachment #788712 - Attachment is obsolete: true
Attachment #798143 - Flags: review?(Ms2ger)
Attached patch Mochitest (obsolete) — Splinter Review
Attachment #801270 - Flags: review?(Ms2ger)
Attachment #801270 - Attachment description: test_bug711180.patch → Mochitest
Comment on attachment 801270 [details] [diff] [review]
Mochitest

Review of attachment 801270 [details] [diff] [review]:
-----------------------------------------------------------------

r=me

::: content/base/test/test_bug711180.html
@@ +11,5 @@
> +<body>
> +<a target="_blank" href="https://bugzilla.mozilla.org/show_bug.cgi?id=711180">Mozilla Bug 711180</a>
> +<p id="display"></p>
> +<div id="content" style="display: none">
> +  

Remove the trailing whitespace here.
Attachment #801270 - Flags: review?(Ms2ger) → review+
Comment on attachment 798143 [details] [diff] [review]
Proposed patch v5

Review of attachment 798143 [details] [diff] [review]:
-----------------------------------------------------------------

> Atob should ignore whitespace - 711180

Make that "Bug 711180 - Ignore whitespace in atob; r=Ms2ger"

and put the test in this patch too.
Attachment #798143 - Flags: review?(Ms2ger) → review+
Attached patch Patch for checkin; r=Ms2ger (obsolete) — Splinter Review
Attachment #801270 - Attachment is obsolete: true
Attachment #802658 - Flags: checkin?(Ms2ger)
Attachment #798143 - Attachment is obsolete: true
Comment on attachment 802658 [details] [diff] [review]
Patch for checkin; r=Ms2ger

Thanks for the patch, Michael! In the future, you can just set the checkin-needed bug keyword instead, though. It's a bit easier for us to work with.
Attachment #802658 - Flags: checkin?(Ms2ger) → checkin+
(In reply to Phil Ringnalda (:philor) from comment #47)
> And backed out in
> https://hg.mozilla.org/integration/mozilla-inbound/rev/315cc4fb111a - turns
> out we have some expected failures in
> http://mxr.mozilla.org/mozilla-central/source/dom/imptests/failures/html/
> html/webappapis/atob/test_base64.html.json?force=1 which no longer fail with
> this patch and need to have their annoatations adjusted, see
> https://tbpl.mozilla.org/php/getParsedLog.php?id=27669092&tree=Mozilla-
> Inbound

Should those just be removed, now that they are capable of passing?
Flags: needinfo?(philringnalda)
I presume so, but the reason I backed it out rather than fixing it was mostly because I don't actually know how either those annotations or the entire testsuite works.
Flags: needinfo?(philringnalda) → needinfo?(Ms2ger)
Comment on attachment 802658 [details] [diff] [review]
Patch for checkin; r=Ms2ger

>+  bool needTrimming = false;
>+  while (iter < end) {
>+    if (!nsContentUtils::IsHTMLWhitespace(*iter)) {
>+      if (!needTrimming) {
>+        needTrimming = true;
>+        trimmedString.Append(start, iter - start);
>+      }
>+      trimmedString.Append(*iter);
>+    }
>+    iter++;
>+  }
OK, so the bit I understand is this section:
while (iter < end) {
  if (!nsContentUtils::IsHTMLWhitespace(*iter))
    trimmedString.Append(*iter);
  iter++;
}
What I can't figure out is what, if anything, the needTrimming code gets you. My best guess is that perhaps you wanted the result string to be the original string, but if you find some whitespace then you make a copy of the string so far and start appending non-whitespace to it.
Where does the needTrimming code come from? I didn't see that in the patch I reviewed...

For the test, you should be able to remove the lines in

http://mxr.mozilla.org/mozilla-central/source/dom/imptests/failures/html/html/webappapis/atob/test_base64.html.json?force=1

that test whitespace. Looks like that's 3--12.
Flags: needinfo?(Ms2ger)
OOPS. It helps if I upload the right one. OK, thanks!
Attached patch The right version (obsolete) — Splinter Review
Attachment #802658 - Attachment is obsolete: true
Attachment #803028 - Flags: review?(Ms2ger)
Attached patch Rebased (obsolete) — Splinter Review
Attachment #803028 - Attachment is obsolete: true
Attachment #803028 - Flags: review?(Ms2ger)
Attachment #808336 - Flags: review?(Ms2ger)
Attached patch Rebased (obsolete) — Splinter Review
Attachment #808336 - Attachment is obsolete: true
Attachment #808336 - Flags: review?(Ms2ger)
Attachment #808337 - Flags: review?(Ms2ger)
Attached patch The really real patch (obsolete) — Splinter Review
INI changes
Attachment #808337 - Attachment is obsolete: true
Attachment #808337 - Flags: review?(Ms2ger)
Attachment #813668 - Flags: review?(Ms2ger)
Attached patch Removed duplicate INI entry (obsolete) — Splinter Review
Attachment #813668 - Attachment is obsolete: true
Attachment #813668 - Flags: review?(Ms2ger)
Attachment #814256 - Flags: review?(Ms2ger)
Comment on attachment 814256 [details] [diff] [review]
Removed duplicate INI entry

Review of attachment 814256 [details] [diff] [review]:
-----------------------------------------------------------------

r=me
Attachment #814256 - Flags: review?(Ms2ger) → review+
Keywords: checkin-needed
Attachment #814256 - Attachment is obsolete: true
Pushed this along with a bunch of other checkin-neededs to try:
https://tbpl.mozilla.org/?tree=Try&rev=503271ae46d1
https://hg.mozilla.org/mozilla-central/rev/a151e014c562
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla27
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.