Closed Bug 1419658 Opened 7 years ago Closed 7 years ago

Change Basic authentication request header username and password character encoding to UTF-8 (used to be ISO-8859-1)

Categories

(Core :: Networking: HTTP, enhancement, P3)

57 Branch
enhancement

Tracking

()

RESOLVED FIXED
mozilla59
Tracking Status
firefox59 --- fixed

People

(Reporter: viapanda, Assigned: CuveeHsu)

References

Details

(Keywords: dev-doc-complete, feature, site-compat, Whiteboard: [necko-triaged])

Attachments

(1 file, 2 obsolete files)

Access a website with basic auth, using login "é" and password "é"

Firefox sends this: `Authorization Basic 6Trp`

Curl it:

`curl -k -u é:é -iv http://fubar`

Gives: `Authorization: Basic w6k6w6k=`

Trying with anything outside of latin1 (eg: ∞) will of course give similar results.

Understood there is some (quite old) background discussion over here: https://bugzilla.mozilla.org/show_bug.cgi?id=436033

At this point in time though, the rest of the ecosystem (Chrome, curl, nginx, and likely others) have settled on utf8.
For the record, embedding the login:password into the url (`é:é@fubar`) gives the expected result.

Sounds like the prompt encoding is messed-up.
Component: General → Networking: HTTP
Product: Firefox → Core
I think this is duplicate of bug 41489.
Severity: critical → normal
Mmmm... not sure...

There seem to be two bugs here:
 * the fact that firefox treats url-embedded-credentials and prompt-credentials differently is one issue
 * how firefox should treat the input is another (<- that's probably the "dupe" part?)
About the component (Networking:HTTP), not sure either.
Since url-embedded credentials are working fine, it feels more like a firefox issue with the way the prompt is handled?

Oh well, you know best, I'm just guessing :-)
Hello njn,
Do we have base64 encoding which doesn't drop high bits?
I find [1] only :(


[1] https://searchfox.org/mozilla-central/rev/7a8c667bdd2a4a32746c9862356e199627c0896d/xpcom/io/Base64.h#32-34
Flags: needinfo?(n.nethercote)
Attached patch BaseAuth16bits, WIPv1 (obsolete) — Splinter Review
Assignee: nobody → juhsu
Status: UNCONFIRMED → NEW
Ever confirmed: true
Digest should be with issue, too.
https://searchfox.org/mozilla-central/rev/7a8c667bdd2a4a32746c9862356e199627c0896d/netwerk/protocol/http/nsHttpDigestAuth.cpp#317
Priority: -- → P3
Whiteboard: [necko-triaged]
> Do we have base64 encoding which doesn't drop high bits?
> I find [1] only :(
> 
> [1]
> https://searchfox.org/mozilla-central/rev/
> 7a8c667bdd2a4a32746c9862356e199627c0896d/xpcom/io/Base64.h#32-34

That's the only one I know of. erahm, do you know any more about this?
Flags: needinfo?(n.nethercote) → needinfo?(erahm)
(In reply to mnot from comment #9)
> See also the "charset" param:
>   http://httpwg.org/specs/rfc7617.html#rfc.section.2.1

Thanks for your input.
Obviously we're not supporting charset now.

This reminds me to check the spec for default charset now.
It's 8-bits char set with the common sense ISO-8859-1, including é
(In reply to Nicholas Nethercote [:njn] from comment #8)
> > Do we have base64 encoding which doesn't drop high bits?
> > I find [1] only :(
> > 
> > [1]
> > https://searchfox.org/mozilla-central/rev/
> > 7a8c667bdd2a4a32746c9862356e199627c0896d/xpcom/io/Base64.h#32-34
> 
> That's the only one I know of. erahm, do you know any more about this?

The "binary versions" (input stream and char*) [1] don't mess with high bits. The one you referenced *does* drop the high bits.

[1] https://searchfox.org/mozilla-central/rev/ba2b0cf4d16711d37d4bf4d267b187c9a27f6638/xpcom/io/Base64.h#16-30
Flags: needinfo?(erahm)
(In reply to Junior[:junior] from comment #7)
> Digest should be with issue, too.
> https://searchfox.org/mozilla-central/rev/
> 7a8c667bdd2a4a32746c9862356e199627c0896d/netwerk/protocol/http/
> nsHttpDigestAuth.cpp#317

Spec only specifies that we need to support 8 bits. I guess Digest auth is good here.
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #11)
> (In reply to Nicholas Nethercote [:njn] from comment #8)
> > > Do we have base64 encoding which doesn't drop high bits?
> > > I find [1] only :(
> > > 
> > > [1]
> > > https://searchfox.org/mozilla-central/rev/
> > > 7a8c667bdd2a4a32746c9862356e199627c0896d/xpcom/io/Base64.h#32-34
> > 
> > That's the only one I know of. erahm, do you know any more about this?
> 
> The "binary versions" (input stream and char*) [1] don't mess with high
> bits. The one you referenced *does* drop the high bits.
> 
> [1]
> https://searchfox.org/mozilla-central/rev/
> ba2b0cf4d16711d37d4bf4d267b187c9a27f6638/xpcom/io/Base64.h#16-30

It works well, thanks!
Attached patch basicAuth8bits, v2 (obsolete) — Splinter Review
not sure if we need to bump the kAllKnownFlags
Attachment #8932425 - Attachment is obsolete: true
Attachment #8934882 - Flags: review?(dd.mozilla)
Please rather consult patches in bug 41489.  Those would be nice to get finished.  It's IMO not that simple as to just convert to UTF-8 all the time.  An extension to the request has been specified for this.
Flags: needinfo?(juhsu)
Did you mean the "charset" mentioned by comment 9?
Now we're not support this extension now.
I could also take bug 41489.
Good to know patches are already there.

On the other hand, we should support ISO-8859-1 (i.e., printable character in 128-255)
no matter the charset is specified or not.

The patch in bug 41489 doesn't cover this.
My idea is to land this first and to rebase after.
Flags: needinfo?(juhsu)
(In reply to Junior[:junior] from comment #16)
> Did you mean the "charset" mentioned by comment 9?
> Now we're not support this extension now.

Bug 41489 is trying to impl that.

> I could also take bug 41489.

it would be great!  unfortunately the contributor stopped responding completely (there were always things to fix and me and him both got frustrated from the everlasting pingpong)

> Good to know patches are already there.
> 
> On the other hand, we should support ISO-8859-1 (i.e., printable character
> in 128-255)
> no matter the charset is specified or not.

I think it's backward compat.  OTOH a nice prop of utf8 is that it is 7bit ascii when you have 7bit chars in the source you convert from.

> 
> The patch in bug 41489 doesn't cover this.

why exactly?  I though it was more like a universal thing.

> My idea is to land this first and to rebase after.

maybe this patch will fix most of the cases for basic.  would be good to update digest as well, but it way less used, so lesser impact.

thanks!
Comment on attachment 8934882 [details] [diff] [review]
basicAuth8bits, v2

I will let Honza review this.
Attachment #8934882 - Flags: review?(dd.mozilla) → review?(honzab.moz)
> > Good to know patches are already there.
> > 
> > On the other hand, we should support ISO-8859-1 (i.e., printable character
> > in 128-255)
> > no matter the charset is specified or not.
> 
> I think it's backward compat.  OTOH a nice prop of utf8 is that it is 7bit
> ascii when you have 7bit chars in the source you convert from.
> 
> > 
> > The patch in bug 41489 doesn't cover this.
> 
> why exactly?  I though it was more like a universal thing.
> 
for example:
(1) |WWW-Authenticate: "Basic realm=foo"| indicates ISO-8859-1 charset.
We need to support ascii and those non-ascii in 128-255 like 'é'.

(2) And |WWW-Authenticate: "Basic realm=foo charset=UTF-8" indicates UTF-8 set.
UTF-8 supports >2000 chars like '∞'

This bug implements (1) and bug 41489 implements (2)
i.e., with patches in bug 41489, we still don't support 'é' with |WWW-Authenticate: "Basic realm=foo"| 

> > My idea is to land this first and to rebase after.
> 
> maybe this patch will fix most of the cases for basic.  would be good to
> update digest as well, but it way less used, so lesser impact.
> 
> thanks!

I didn't test, but there's no downgrade from UTF-8 to ascii in [1]
I believe Digest is fine with charset is not specified.

[1] https://searchfox.org/mozilla-central/rev/7a8c667bdd2a4a32746c9862356e199627c0896d/netwerk/protocol/http/nsHttpDigestAuth.cpp#182-393
(In reply to Junior[:junior] from comment #19)
> for example:
> (1) |WWW-Authenticate: "Basic realm=foo"| indicates ISO-8859-1 charset.
> We need to support ascii and those non-ascii in 128-255 like 'é'.
> 
> (2) And |WWW-Authenticate: "Basic realm=foo charset=UTF-8" indicates UTF-8
> set.
> UTF-8 supports >2000 chars like '∞'
> 
> This bug implements (1) 

No.  This presumes "Basic realm=foo charset=UTF-8" has been sent by the server.

> and bug 41489 implements (2)
> i.e., with patches in bug 41489, we still don't support 'é' with
> |WWW-Authenticate: "Basic realm=foo"| 

Yeah.  And that is IMO correct.  That is exactly the reason why servers should send the extension and explicitly say which charset they expect the username and password be at.  The default is, unless I'm mistaken, still ISO-8859-1.

If other major browsers are these days defaulting to UTF-8 without the "charset=UTF-8" response header extension, then we can take this patch and probably close 41489 as invalid (after making the same default change for digest)

(In reply to Olivier from comment #3)
> Mmmm... not sure...
> 
> There seem to be two bugs here:
>  * the fact that firefox treats url-embedded-credentials and
> prompt-credentials differently is one issue
>  * how firefox should treat the input is another (<- that's probably the
> "dupe" part?)

This is a good point.  We accept url specified creds as UTF-8 probably because we parse URLs as UTF-8 by default.  Question is how we will behave when the charset for a URL is different (specified in markup, and use clicking a link with credentials)  Asking Valentin on these bits (we may have removed charset specification from our url APIs and assume utf8).  That would be a good argument to accept patch for this bug.
Flags: needinfo?(valentin.gosu)
Flags: needinfo?(juhsu)
(In reply to Honza Bambas (:mayhemer) from comment #20)
> (In reply to Junior[:junior] from comment #19)
> > for example:
> > (1) |WWW-Authenticate: "Basic realm=foo"| indicates ISO-8859-1 charset.
> > We need to support ascii and those non-ascii in 128-255 like 'é'.
> > 
> > (2) And |WWW-Authenticate: "Basic realm=foo charset=UTF-8" indicates UTF-8
> > set.
> > UTF-8 supports >2000 chars like '∞'
> > 
> > This bug implements (1) 
> 
> No.  This presumes "Basic realm=foo charset=UTF-8" has been sent by the
> server.
> 
Mm... You remind me that this patch supports more than ISO-8859-1 
> > and bug 41489 implements (2)
> > i.e., with patches in bug 41489, we still don't support 'é' with
> > |WWW-Authenticate: "Basic realm=foo"| 
> 
> Yeah.  And that is IMO correct.  That is exactly the reason why servers
> should send the extension and explicitly say which charset they expect the
> username and password be at.  The default is, unless I'm mistaken, still
> ISO-8859-1.
> 
> If other major browsers are these days defaulting to UTF-8 without the
> "charset=UTF-8" response header extension, then we can take this patch and
> probably close 41489 as invalid (after making the same default change for
> digest)

charset only allowed UTF-8 now [1] (we can't specify the charset is default or ISO-8859-1)
If charset is not set, it is ISO-8859-1 (spec specifies 8-bits char, and ISO-8859-1 is kinda consensus)

I'm not an auth master, but I never see the charset :p
Chrome seems use UTF-8 as the default [2]

At least we can exploit the tests in bug 41489 

[1] http://httpwg.org/specs/rfc7617.html#rfc.section.2.1
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=436033#c6
Flags: needinfo?(juhsu)
Thanks for the summary.  Since this bug is going around on and on and we don't do anything about it because we are scared to break the web, let's do it and change the default to UTF-8 and see what happens.  This should be made public on the right forums, tho.
Comment on attachment 8934882 [details] [diff] [review]
basicAuth8bits, v2

Review of attachment 8934882 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/nsHttpBasicAuth.cpp
@@ +86,5 @@
>      // we only know how to deal with Basic auth for http.
>      bool isBasicAuth = !PL_strncasecmp(challenge, "basic", 5);
>      NS_ENSURE_TRUE(isBasicAuth, NS_ERROR_UNEXPECTED);
>  
> +    // we work with UTF8 around here

UTF-8

@@ +93,3 @@
>      userpass.Append(':'); // always send a ':' (see bug 129565)
>      if (password)
> +        AppendUTF16toUTF8(password, userpass);

{ } please

@@ +96,3 @@
>  
> +    nsAutoCString authString;
> +    NS_ENSURE_SUCCESS(Base64Encode(userpass, authString), NS_ERROR_FAILURE);

go via rv please or do if (NS_FAILED(..) ..
Attachment #8934882 - Flags: review?(honzab.moz) → review+
(In reply to Honza Bambas (:mayhemer) from comment #20)
> This is a good point.  We accept url specified creds as UTF-8 probably
> because we parse URLs as UTF-8 by default.  Question is how we will behave
> when the charset for a URL is different (specified in markup, and use
> clicking a link with credentials)  Asking Valentin on these bits (we may
> have removed charset specification from our url APIs and assume utf8).  That
> would be a good argument to accept patch for this bug.

We use UTF-8 for all parts of the URL, except the query - where we use the charset specified by the page.
Flags: needinfo?(valentin.gosu)
Changing the default is very likely to break things as there are existing usages which rely on the ISO-8859-1 default. See discussion in <https://www.greenbytes.de/tech/webdav/rfc7617.html#rfc.section.B.3>.
Keywords: checkin-needed
Pushed by btara@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f6f90940e417
let Basic http auth support ISO-8859-1 user/password, r=honzab
Keywords: checkin-needed
Adding Kohei as owner of fxsitecompat.com!
Keywords: feature
Summary: Basic (probably digest?) authentication broken with non ASCII chars → Change Basic authentication request header username and password character encoding to UTF-8 (used to be ISO-8859-1)
Thanks :) Adding the dev-doc-needed keyword for MDN as well.
https://hg.mozilla.org/mozilla-central/rev/f6f90940e417
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
So I was about to write a site compatibility doc for this, but given that Chrome already uses UTF-8 and IIUC UTF-8 is compatible with ISO-8859-1 within the range of alphanumeric and common symbols traditionally allowed for username/password, I _think_ this change won’t break anything in the wild. If there is a report of any affected site, I’m happy to post a note at that time :)
I've mentioned this in the docs, as I think it is still worth highlighting, just in case:

https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication#Character_encoding_of_HTTP_authentication
https://developer.mozilla.org/en-US/Firefox/Releases/59#Security

Let me know if this sounds OK. Thanks!
Flags: needinfo?(juhsu)
Re-requesting site-compat because this change actually broke some sites. RFC even warned about the user-agent sniffing:
https://tools.ietf.org/html/rfc7617#appendix-B.3
Keywords: site-compat
Could you please check how this change affected the auth per XMLHttpRequest.open()?
A password with special characters is still accepted per Firefox UI, but no longer per XMLHttpRequest (given as UTF-8).
Chrome does in either way.
It was Chrome compatible before V59 (at least in the range of compatibilty between ISO-8859-1 and UTF-8).
Thanks for the heads-up. Posted the site compatibility note. Will tweet on Monday. https://www.fxsitecompat.com/en-CA/docs/2018/basic-auth-credentials-are-now-encoded-in-utf-8-instead-of-iso-8859-1/
Flags: needinfo?(juhsu)
Blocks: 41489
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: