Bug 41489 - HTTP authentication does not support non-ISO-8859-1 characters

Status: ASSIGNED
Whiteboard: [necko-would-take]
Keywords: dev-doc-needed, intl
Product: Core
Classification: Components
Component: Networking: HTTP
Version: Trunk
Hardware: All All
Importance: -- major with 15 votes
Assigned To: for nothing
URL: http://greenbytes.de/tech/webdav/rfc7...
Duplicates: 55738 88432 263576 264044 337130 340438 352953 420668 453342 576949 629962 656213
Depends on: 656503 656213
Blocks: 546330 61681
Reported: 2000-06-04 12:48 PDT by Stephen P. Morse
Modified: 2016-05-17 11:08 PDT
CC: 66 users


Attachments

- Patch for UTF-8 encode (1.45 KB, patch). 2008-04-15 11:30 PDT, Andrey M. No flags.
- Simple http authentication (290 bytes, application/x-php). 2008-04-15 11:42 PDT, Andrey M. No flags.
- Convert basic credentials to UTF-8 prior to base64 encoding them (7.96 KB, patch). 2010-01-09 06:58 PST, Robert Sayre. No flags.
- Allow converting basic creds to UTF-8 before base64 encoding (16.07 KB, patch). 2011-03-24 12:59 PDT, Nicholas Hurley [:nwgh][:hurley]. brian: review-; bzbarsky: superreview+.
- Allow utf-8 encoded username in digest authentication response header (4.17 KB, patch). 2013-04-07 23:58 PDT, ggo. No flags.
- Allow utf-8 encoded username in digest authentication response header (4.17 KB, patch). 2013-04-22 09:23 PDT, ggo. brian: review-.
- to do here as draft-ietf-httpauth-digest-15 (26.67 KB, patch). 2015-04-02 23:09 PDT, for nothing. honzab.moz: review-.
- to do here as draft-ietf-httpauth-digest-15 (refreshed as it should be) (25.95 KB, patch). 2015-04-07 14:45 PDT, Honza Bambas (:mayhemer). honzab.moz: review-.
- to do here as draft-ietf-httpauth-digest (26.84 KB, patch). 2016-02-01 19:34 PST, for nothing. No flags.
- to do here as rfc 7616 (26.02 KB, patch). 2016-02-08 09:46 PST, for nothing. honzab.moz: review-.
- patch for basic auth (2.38 KB, patch). 2016-05-09 23:14 PDT, for nothing. honzab.moz: review-.
- patch for digest auth (22.38 KB, patch). 2016-05-09 23:21 PDT, for nothing. honzab.moz: feedback+.

Description Stephen P. Morse 2000-06-04 12:48:26 PDT
I was just looking over the code for doing http authentication and I 
happened to notice that the username and password are being kept in a char*.  
That won't work with Japanese characters, for example.  You need to either work 
exclusively with PRUnichar* or at some point do a conversion 
from PRUnichar to UTF-8 encoding.

I picked this up inadvertently in a code reading, so I attempted to see what 
would happen if I used Japanese characters for HTTP authentication.  I tried it 
and got an assertion failure at nsBasicAuth.cpp line 60.  The comment preceding that 
line says "we work with ASCII around these here parts."

You can continue to work with ASCII if you do the UTF-8 conversion first.  Take a 
look at wallet.cpp and see how I handle double-byte characters and the UTF-8 
conversion.

BTW, you'll need to be able to enter and display a Japanese character set in 
order to demonstrate this problem.  I questioned the i18n team a long time ago 
about that and was given instructions (see bug 23037) for solving these two 
problems.  Basically you first need to do the following simple steps:

1. Get a Japanese font from http://jazz/users/teruko/publish/fonts/ie3lpkja.exe 
and install it.  You need to reboot the system after installation.

2. Modify the browser prefs to use that font (set the variable-width Japanese 
encoding to use MS Gothic).  Now you can display Japanese characters.

3. In order to enter Japanese characters without a Japanese keyboard, do a 
copy-and-paste from http://babel/testdata/double_byte/Selected_data_sjis.htm.
Comment 1 Stephen P. Morse 2000-08-10 14:14:01 PDT
I know what "future" means -- it means we aren't going to do it.  Is i18n 
comfortable with the fact that a Japanese user will never be able to enter his 
Japanese username in an HTTP authentication box?  Is he able to do so now in 
4.x?

It's not that difficult to continue working with ASCII and do a UTF-8 conversion 
first.  I'll be glad to work with you if you don't know how to do it.
Comment 2 Gagan 2000-08-10 15:18:09 PDT
Steve-- the bugs are not being marked future because we don't know how to do them. 
We are prioritizing. We have much more pressing bugs to look at before getting 
around to these. If you want to spend your free time on this-- go ahead.
Comment 3 Stephen P. Morse 2000-08-18 07:24:14 PDT
Haven't heard any comments from the i18n team about this so I guess they don't 
consider this important after all.  Therefore I'll leave it futured and give it 
back to gagan.
Comment 4 bobj 2000-08-18 15:38:04 PDT
Steve,
Thanks for the pro-active efforts on this.
I don't think non-ASCII usernames/passwords for HTTP authentication are a
high priority for PR3 or maybe even 6.0.  But does this gracefully degrade?
You mention you got an assertion.  In a non-debug build, what will happen?
Comment 5 Stephen P. Morse 2000-08-18 15:49:24 PDT
I don't recall what happened after I got the assertion, it was a while ago that 
I tested this.
Comment 6 bobj 2000-08-21 11:25:42 PDT
Teruko, Can we have someone try this with Japanese characters and add a
comment to this bug about what happens?
Comment 7 Darin Fisher 2000-11-28 15:48:27 PST
This is probably a more serious concern with the realm string.  The realm
will more than likely be sent in a localized form.  However, RFC 2616 states
that the realm string must be of type quoted-string, which means that it
can include any TEXT (including non-ASCII characters) except for " and
certain control characters.  If the TEXT contains characters from character
sets other than ISO-8859-1, then the TEXT must be encoded according to the
rules of RFC 2047.  I haven't had a chance to read RFC 2047, so I don't know
exactly what that means. 
Comment 8 Darin Fisher 2000-11-30 19:57:41 PST
->me
Comment 9 Darin Fisher 2001-11-27 11:44:43 PST
-> upping priority
Comment 10 Darin Fisher 2001-11-27 12:15:00 PST
*** Bug 88432 has been marked as a duplicate of this bug. ***
Comment 11 Darin Fisher 2002-01-23 22:20:57 PST
do we have existing code for handling text strings encoded according to RFC 2047?
Comment 12 Darin Fisher 2002-01-23 22:27:07 PST
nsHttpBasicAuth.cpp still has this problem... here's the problem code:

    // we work with ASCII around here
    nsCAutoString userpass;
    userpass.AssignWithConversion(username);
    if (password) {
        userpass.Append(':');
        userpass.AppendWithConversion(password);
    }

according to RFC2617 section 2, the username and password need to be of type
TEXT before they are base64 encoded, and according to RFC2616 section 2.2:

  Words of *TEXT MAY contain characters from character sets other than
  ISO-8859-1 only when encoded according to the rules of RFC2047.

so, for properly interpreting the realm string, we need to decode using RFC2047,
and for properly constructing an Authorization request header, we need to encode
using RFC2047.

so, it comes down to this: is there any existing support for RFC2047 within mozilla?
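The narrowing conversion in the snippet above can be reproduced outside of necko. A minimal Python sketch (illustrative only, not the Mozilla code) of what keeping just the low byte of each UTF-16 code unit does to Basic credentials:

```python
import base64

def lossy_basic(username: str, password: str) -> str:
    """Mimic the lossy wide-to-narrow conversion: keep only the low
    byte of each UTF-16 code unit, then base64-encode the result."""
    def narrow(s: str) -> bytes:
        return bytes(ord(c) & 0xFF for c in s)
    userpass = narrow(username) + b":" + narrow(password)
    return "Basic " + base64.b64encode(userpass).decode("ascii")

# U+5353 (a CJK character) narrows to 0x53, i.e. plain ASCII "S":
print(lossy_basic("\u5353", "password"))  # Basic UzpwYXNzd29yZA==
```

This silently corrupted credential is exactly what comment 30 later observes on the wire.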
Comment 13 bobj 2002-01-24 04:34:15 PST
> so, it comes down to this: is there any existing support for RFC2047
> within mozilla?

Yes, because we encode mail headers according to rfc2047.
It's probably in source/mailnews/mime/src/

Naoki, can you help?
Comment 14 nhottanscp 2002-01-24 13:01:32 PST
nsIMimeConverter::EncodeMimePartIIStr
http://lxr.mozilla.org/seamonkey/source/mailnews/mime/public/nsIMimeConverter.h#87

Comment 15 Darin Fisher 2002-01-24 20:36:48 PST
oh.. this is bad.  perhaps we need to move some of this mime stuff into a more
generic place.  necko cannot depend on mailnews being installed for proper intl
support.  suggestions?
Comment 16 nhottanscp 2002-01-25 10:57:06 PST
>perhaps we need to move some of this mime stuff into a more generic place
Cc to mail engineers
Comment 17 Darin Fisher 2002-01-28 19:24:21 PST
can someone on the mail team comment on this bug?  i'm wondering if there is a
way to leverage some of the existing mime encoding/decoding facilities for HTTP.
 we need to handle RFC2047 encoding, but necko cannot depend on mailnews.

any suggestions on how best to handle this?
Comment 18 nhottanscp 2002-01-30 10:04:54 PST
To encode, we need to specify a charset name; is that available in the code?
Also, we need to convert the string from UTF-8 to that charset, so we need uconv. Does
necko currently depend on uconv?
Taka is working on MIME encoder, cc to him.
Comment 19 Takayuki Tei 2002-01-30 11:12:38 PST
Do we really want to do this?  I see lots of implications here which are
not easy to get answers.

For instance, RFC-2617 "3.2.2 The Authorization Request Header" has the
following BNF:

       credentials      = "Digest" digest-response
       digest-response  = 1#( username | realm | nonce | digest-uri
                       | response | [ algorithm ] | [cnonce] |
                       [opaque] | [message-qop] |
                           [nonce-count]  | [auth-param] )

       username         = "username" "=" username-value
       username-value   = quoted-string

And, RFC-2047 says:

       + An 'encoded-word' MUST NOT appear within a 'quoted-string'.

I don't know how we can get over this conflict.
Comment 20 Darin Fisher 2002-01-30 12:11:53 PST
RFC2617 depends on the definitions from RFC2616, which specifies that
quoted-string can contain RFC2047 encoded words.

nhotta: necko currently depends on uconv.
Comment 21 Darin Fisher 2002-02-06 18:36:30 PST
-> future
Comment 22 Darin Fisher 2002-02-06 18:37:07 PST
not critical for mozilla 1.0
Comment 23 Andrew Clover 2002-04-10 15:24:11 PDT
[disclaimer: I am unfamiliar with Digest authentication. I just want Basic to work.]

Is the suggestion that user/password values using non-ASCII characters should be
encoded something like:

  =?shift_jis?q?=93=fa?=:=?shift_jis?q?=96=7b?=

Is there any server-side software at all that will understand this? It seems a
break from established practice even if it is to the letter of the standard.

Using an encoding intended for mail-header parts, where there are many
out-of-bounds characters, seems nonsensical for something that's about to be
base64-encoded, i.e. has no out-of-bounds characters. It's especially silly if the
coding is 'b' - as recommended for primarily non-US-ASCII input - as you then
end up base64-encoding everything twice!

It seems Opera6 just submits basic auth values as UTF-8. This seems fairly
reasonable to me. (maybe any charset given in the Content-Type header of the
response including the WWW-Authenticate header could be used as a hint,
otherwise?) On the other hand, IE6 shares Moz's current inability to submit
non-US-ASCII characters in auth.
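The double encoding objected to above can be made concrete. A hypothetical sketch (Python, illustrative only) that builds an RFC 2047 'b'-coded word for the two Shift-JIS characters from the example, then base64-encodes the whole thing again as Basic requires:

```python
import base64

# Hypothetical two-character Shift-JIS username (the same characters
# as the =93=fa / =96=7b bytes quoted above).
raw = "\u65e5\u672c".encode("shift_jis")                 # 4 bytes
word = "=?shift_jis?b?%s?=" % base64.b64encode(raw).decode("ascii")
print(word)        # 24 ASCII characters carrying 4 payload bytes

# Basic auth then base64-encodes the user:pass string *again*:
cred = base64.b64encode((word + ":" + word).encode("ascii")).decode("ascii")
print(len(cred))   # 68 characters on the wire for 9 bytes of credentials
```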
Comment 24 Darin Fisher 2002-04-10 15:32:47 PDT
yeah, i don't know why the spec doesn't say to use base64 encoded UTF-8.  that
would be fairly sensible, but there is a defined standard.  opera's
implementation sounds non-conformant, which is generally a bad thing.  internet
standards exist for a reason, right? ;-)
Comment 25 Andrew Clover 2002-04-11 03:07:48 PDT
/me grudgingly accepts the Internet Standards Existing for a Reason argument.
(Although secretly I'd be quite happy if the Mozilla team said "Hey, look over
there! A dinosaur!" and hacked UTF-8 in whilst no-one was paying attention.)

PRO: as well as allowing non-Latin-1 characters to be used in
usernames/passwords, encoding them could finally allow colons to be used.
Currently all browsers submit an entered username of 'a:b' with password of
'c:d' as 'a:b:c:d', leaving the unlucky server to work out what was meant.

CON: as far as I know, no server software will support this. Certainly a quick
flick through the Apache 1.3 auth code and modules reveals no such handling.

Basic Authentication really isn't very well designed is it?
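The colon ambiguity described above is easy to demonstrate; a small illustrative sketch with made-up credentials:

```python
import base64

# Username "a:b" and password "c:d" are joined with ":" before encoding,
# so the wire form is indistinguishable from other user/password splits.
token = base64.b64encode(b"a:b" + b":" + b"c:d").decode("ascii")

# A server following RFC 2617 splits on the *first* colon:
user, _, pw = base64.b64decode(token).partition(b":")
print(user, pw)  # b'a' b'b:c:d' -- not what the user typed
```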
Comment 26 Dmitry Marochko 2004-10-12 10:27:11 PDT
*** Bug 264044 has been marked as a duplicate of this bug. ***
Comment 27 Darin Fisher 2004-10-15 20:59:30 PDT
*** Bug 263576 has been marked as a duplicate of this bug. ***
Comment 28 benc 2004-11-14 23:04:23 PST
There was a comment in bug 263576 about IE doing this a bit better than us; I
think that referred to displaying the realm.
Comment 29 Simon Montagu :smontagu 2006-06-06 05:27:03 PDT
*** Bug 340438 has been marked as a duplicate of this bug. ***
Comment 30 Simon Montagu :smontagu 2006-06-06 05:43:08 PDT
Bug 340438 contains the headers sent by Firefox and IE when attempting to log in with the user name "卓" (U+5353).

We send Authorization: Basic UzpwYXNzd29yZA==
which decodes as S:password

IE sends Authorization: Basic 1786cGFzc3dvcmQ=
which decodes as <0xd7bf>:password. 0xd7bf is 卓 in cp936
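Both observations can be verified by decoding the tokens; a quick sketch:

```python
import base64

firefox = base64.b64decode("UzpwYXNzd29yZA==")
print(firefox)                 # b'S:password' -- U+5353 truncated to its low byte

ie = base64.b64decode("1786cGFzc3dvcmQ=")
print(ie[:2].hex(), ie[2:])    # d7bf b':password' -- the cp936 bytes
```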
Comment 31 Jungshik Shin 2006-06-08 06:15:24 PDT
(In reply to comment #30)

> We send Authorization: Basic UzpwYXNzd29yZA==
> which decodes as S:password

So, we're just truncating 0x5353 (=U+5353 in UTF-16) to 0x53 as expected.
 
> IE sends Authorization: Basic 1786cGFzc3dvcmQ=
> which decodes as <0xd7bf>:password. 0xd7bf is 卓 in cp936

IE perhaps uses the default character encoding, while Opera uses UTF-8. 

And RFC 2616 stipulates that RFC 2047 be used.  This is a total mess. 
How about major web servers? I wonder whether Apache 2.x has anything better than Apache 1.x.

BTW, bug 295084 is about moving RFC 2047/2231 encoding routines from mail to necko.

Comment 32 Christian :Biesinger (don't email me, ping me on IRC) 2006-06-27 03:56:03 PDT
*** Bug 337130 has been marked as a duplicate of this bug. ***
Comment 33 Christian :Biesinger (don't email me, ping me on IRC) 2006-09-16 10:51:46 PDT
*** Bug 352953 has been marked as a duplicate of this bug. ***
Comment 34 Robert Sayre 2006-10-14 19:08:11 PDT
The IETF is spinning some wheels on RFC2617 internationalization. I'll watch them and fix this when they have a solution.
Comment 35 Robert Sayre 2007-05-03 20:13:53 PDT
OK, according to spec, it's clear that non-encoded fields like realm should be rfc2047-encoded, but it's not clear which character set they should be encoded from. The same encoding conundrum applies to hexdigest- and base64-encoded fields.
Comment 36 David Nesting 2007-05-07 13:16:52 PDT
Ick.

The WWW-Authenticate header is sent in response to a request, so the server may have client cues (Accept-Charset) to go on in formatting this header per RFC2047.  We (the browser) shouldn't need to care what's in the realm, except when we need to display it to the user, but even then, things should "work" even if we don't understand how to display it.  (Is the realm "foo" equivalent to the realm "=?US-ASCII?Q?foo?="?)

Andrew's comment 23 would seem to be the proper approach, according to RFC 2617.  This seems very inelegant, though, and I agree that nothing anywhere is going to support this any time soon, and in the mean time, lots of stuff will break.  I would advocate a change to the standards, moving to UTF-8 in base64-encoded fields.  The real constraint is the need for the headers to remain US-ASCII, and by encoding with base64, you've already accomplished that.
Comment 37 Robert Sayre 2007-05-13 21:48:35 PDT
(In reply to comment #36)
> Ick.
> 
> The WWW-Authenticate header is sent in response to a request, so the server may
> have client cues (Accept-Charset) to go on in formatting this header per
> RFC2047.  We (the browser) shouldn't need to care what's in the realm, except
> when we need to display it to the user, but even then, things should "work"...

There is a general problem here, regarding HTTP authentication headers, but you have made conclusions that apply only to Basic authentication. For example, the Realm is part of the "response" field in Digest. That field is hex-encoded, so there are no immediate problems with it, but it's not clear how to encode the given fields in order to perform the Digest calculations. Authentication needs to work if the username is in Mongolian, the password is in Japanese, and the Realm contains a Euro symbol.

> 
> Andrew's comment 23 would seem to be the proper approach, according to RFC
> 2617.  This seems very inelegant, though, and I agree that nothing anywhere is
> going to support this any time soon, and in the mean time, lots of stuff will
> break.  I would advocate a change to the standards, moving to UTF-8 in
> base64-encoded fields.

Yeah, that is one option. Unfortunately, it is not compatible with IE, afaik.
Comment 38 Zhang Weiwu 2007-12-30 04:55:43 PST
Is there a way to push this? I made a bad mistake: yesterday I announced that each username would be an office name, and only later learned of this bug. When I decided the usernames would be office names, I never imagined Fx would have an i18n problem in this area, because Fx holds the i18n crown IMHO. Now I am frustrated to have to announce something different again, since users come from all different places and any change in this organization is painful. Yes, I should have checked Fx before setting any user-management policy, but who could expect such a bug?

Send the username the way IE does, or the way Opera does, or the way the RFC says; whatever you do, I can modify my web application to cope with it. But if you drop information from the username, that makes it impossible for the web-developer side to do anything to save the situation. ANY CHANGE IS BETTER THAN THE CURRENT SITUATION. However, I know it is probably impossible in this atmosphere; things are like that because they used to be like that, and that is enough to make sure it can hardly be changed.

Please understand that this really breaks things completely for non-ASCII users. Sometimes I think of situations like Opera's or IE's: in a non-democratic organization a manager can make an arbitrary decision, and that happens to help solve problems faster. (Though one can argue it also helps create problems faster.)
Comment 39 Andrew Clover 2008-01-05 08:32:51 PST
Had a further rummage through the RFCs and to be honest none of it makes much sense. RFC2616's references to RFC2047 are in the context of TEXT, which in all the cases that matter to us are completely against the spirit of RFC2047 encoded-words-as-atoms - indeed in the case of the quoted-string realm, as Taka posted, it's also against the letter of RFC2047.

I think this is an error in 2616 (it wouldn't be the only one). HTTP is not explicitly an RFC822-family protocol: it defines its own header structures that are incompatible with 822 in significant ways instead of deferring to the older standard. If it really wanted to say we should be putting 2047 encoded-words inside 2616 quoted-strings, and base64-encoding them before putting them in auth headers, it should have specifically said so. Just referencing 'the rules of RFC2047' is not enough when those rules are incompatible with the rules of 2616.

In summary, the HTTP standard is self-contradictory and nobody has managed to interpret and implement it in any consistent way. So we should simply ignore it and not attempt to put 2047 encoded-words in HTTP headers at all(*).

The remaining options are: (a) continue to simply drop non-ISO-8859-1 characters on the floor (status quo), (b) just use UTF-8 (Opera method), (c) use the system codepage (IE method), and (d) guess.

If (a), we should at least stop non-ISO-8859-1 characters from being entered in 
the UI, to make it clear they're not permissible.

If (b), we potentially break any sites that are currently working with ISO-8859-1 (eg. if I have a username with an acute accent on it), when running on non-Western European machines. Are Opera users significantly affected by this? I haven't seen any reports of it.

If (c), we potentially break any sites not running in the same region as the client, and introduce an obscure dependency the user won't know how to sort out. How important is IE compatibility these days?

(d) might work, perhaps based on the Content-Type;charset of the page served to us with the WWW-Authenticate header, if any, falling back to pref intl.charset.default.
Comment 40 Andrew Clover 2008-01-05 08:45:14 PST
heh, forgot to add the off-topic footnote -

* - in the main entity headers, that is.

When posting eg. a multipart/form-data, the headers in each part are out of the scope of 2616, which explicitly (in section 3.7.2) hands their definition over to RFC2046, making them real 822-family headers, where RFC2047 and RFC2231 apply in the usual way.

That would mean that technically we might be submitting non-ASCII form fields as something like:

  Content-Disposition: form-data; name*=iso-8859-1'en'n=E9e

However, RFC2388, defining multipart/form-data again makes the same mistake as RFC2616 and HTML 4.01 in recommending RFC2047 encoded-words for use inside parameter values, as specifically disallowed in 2047. Sigh, them standards eh...

Of course in practice Mozilla should do neither, as it would solve little and break a lot, and continue with the current strategy of encoding field names to the form submission charset. Whether it is worth correctly following the quoted-pair scheme for backslash-escaping " and \ inside a quoted-string is another matter. (Opera does; IE does not.)
Comment 41 Christian :Biesinger (don't email me, ping me on IRC) 2008-03-03 02:44:18 PST
*** Bug 420668 has been marked as a duplicate of this bug. ***
Comment 42 Alexander 2008-03-03 15:46:55 PST
Eight years of this bug. Maybe it should be fixed? All you need is to encode the data to UTF-8, as I understand it.
Comment 43 Christian :Biesinger (don't email me, ping me on IRC) 2008-03-04 00:42:53 PST
HTTP does not allow non-latin1 headers, see RFC 2616 2.2 (definition of TEXT), they have to be encoded.
Comment 44 Alexander 2008-03-04 00:57:19 PST
Is utf-8 not compatible with this standard?
Comment 45 Zhang Weiwu 2008-03-04 01:05:19 PST
"HTTP does not allow non-latin1 headers, see RFC 2616 2.2 (definition of TEXT),
they have to be encoded."

That doesn't mean we (the users) should only use latin1, which is what's happening now. I use my own language everywhere; why should I be restricted to latin1 because of Fx?

Whatever solution it is, just solve it. (UTF-8 is most attractive for being simple and effective).
Comment 46 Christian :Biesinger (don't email me, ping me on IRC) 2008-03-04 02:43:21 PST
All I was saying is that it's not as simple as comment 42 made it sound.
Comment 47 Zhang Weiwu 2008-03-04 02:51:37 PST
"Is utf-8 not compatible with this standard?"

Unfortunately, no. First, it is an RFC: a regulation or (strong) recommendation, not strictly a standard. However, I am not sure whether it is practical to wait for the IETF to correct it upstream before we do something. On the other hand, perhaps Fx could correct it NOW (say, by using UTF-8); then the IETF would face the fact that NO BROWSER FOLLOWS their RFC and that the method used by most browsers is UTF-8, and it would have to recognize this fact and update the RFC, say, by 2028.

I am completely unsure how the IETF works and whether this is practical, but letting the slow pace of RFC updates hurt the end-user experience is the downside of open source. Commercial software faces pressure from the user side not to wait for a mature standard (e.g. Opera and IE).
Comment 48 Alexander 2008-03-04 19:01:16 PST
I support Zhang Weiwu. Why must non-English users be limited to ASCII characters?
Comment 49 Alexey Gubanov 2008-03-19 21:51:20 PDT
This bug has been added to our (Mozilla Russia) bug bounty program. The bounty is $300. For more details, visit the program page: http://www.mozilla-russia.org/contribute/bounty-en.html
Comment 50 Andrey M. 2008-04-15 11:30:41 PDT
Created attachment 315810 [details] [diff] [review]
Patch for UTF-8 encode

This patch changes the behaviour of HTTP authentication to send the username and password in UTF-8. With this patch, authentication in Mozilla happens exactly as in Opera.
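The Opera-style behaviour this patch adopts can be sketched outside the tree (illustrative Python, not the actual patch):

```python
import base64

def basic_utf8(username: str, password: str) -> str:
    # Encode the credentials as UTF-8 before base64, instead of
    # truncating non-ISO-8859-1 characters to a single byte.
    userpass = (username + ":" + password).encode("utf-8")
    return "Basic " + base64.b64encode(userpass).decode("ascii")

# The U+5353 username from comment 30 now survives intact:
print(basic_utf8("\u5353", "password"))  # Basic 5Y2TOnBhc3N3b3Jk
```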
Comment 51 Andrey M. 2008-04-15 11:42:54 PDT
Created attachment 315813 [details]
Simple http authentication

This PHP script implements simple HTTP authentication for testing this bug.
Comment 52 Zhang Weiwu 2008-04-15 19:50:58 PDT
A user's voice: SUPPORT THIS MOVE!
Comment 53 Julian Reschke 2008-05-27 10:00:05 PDT
It appears that FF indeed uses UTF-8, but only if the request is sent via XMLHttpRequest (otherwise, as discussed above, ISO-8859-1). This really looks bizarre. Does this deserve its own bug?
Comment 54 Wladimir Palant 2008-09-02 11:57:02 PDT
*** Bug 453342 has been marked as a duplicate of this bug. ***
Comment 55 Evgeniy Ivanov 2008-09-28 04:43:45 PDT
(In reply to comment #50)
> Created an attachment (id=315810) [details]
> Patch for UTF-8 encode
> 
> This patch change behaviour of http authentication to send username and
> password in UTF-8. After this patch authentication in Mozilla happen exactly
> like Opera.

Since we should respect standards, I suggest doing this only for non-Latin characters, and leaving ASCII for Latin usernames/passwords. That way we do not change current behavior, but we fix the bug for non-Latin characters.
Comment 56 Julian Reschke 2008-09-28 06:37:00 PDT
(In reply to comment #55)
> Since we should to respect standards I can suggest to do it only for non-Latin
> characters. And leave ASCII for latin usernames/passwords. Doing things in this
> way we will not change current behavior, but fix the bug for non-latin
> characters.

That doesn't compute. USASCII is a subset of UTF-8.

That being said: the spec says it's ISO-8859-1, and changing this would break existing servers.
Comment 57 Evgeniy Ivanov 2008-10-03 07:02:55 PDT
(In reply to comment #56)
> That doesn't compute. USASCII is a subset of UTF-8.
> 
> That being said: the spec says it's ISO-8859-1, and changing this would break
> existing servers.
Oops, I meant leaving ISO-8859-1 for Latin characters and switching to UTF-8 for non-Latin ones. I don't think it can break anything, and it would help with server configurations that understand UTF-8.
Comment 58 Andrew Clover 2008-10-03 10:37:20 PDT
> Oops, I meant leaving ISO-8859-1 for latin characters. And switching to UTF-8
for non-Latin.

I don't think having mixed-encoding strings is a useful option; it makes no sense and is compatible with no other browser.

Having discarded the option of RFC2047 encoding on the grounds that it is a logical contradiction and not compatible with anything, the basic options are:

1. ISO-8859-1. Compatible with the RFC and with IE in the US and Western Europe.

2. The system codepage. Compatible with IE in all locales.

3. UTF-8. Compatible with Opera, never compatible with IE (the system codepage cannot be UTF-8 under Windows.)

In my opinion none of these are really satisfactory.

To 2 we can add 2a: pref intl.charset.default. This is eminently preferable to the system codepage in my view as it is easier for the user to set without negative cross-platform consequences.

Better would be if we could guess an encoding based on any charset given in the Content-Type of the response containing the original WWW-Authenticate header that drove the authentication request. (With fallback to options 1 or 2a where no such parameter is set.) However, this is not a simple hack I can immediately see how to do; we would have to remember a charset everywhere auth details are stored.
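The incompatibility between these options shows up with a single accented character; an illustrative sketch (hypothetical username):

```python
import base64

def basic_token(userpass: str, charset: str) -> str:
    return base64.b64encode(userpass.encode(charset)).decode("ascii")

creds = "ren\u00e9:secret"                  # hypothetical user with U+00E9
latin1 = basic_token(creds, "iso-8859-1")   # option 1
cp1252 = basic_token(creds, "cp1252")       # option 2 on Western Windows
utf8   = basic_token(creds, "utf-8")        # option 3

# ISO-8859-1 and cp1252 agree for U+00E9, but UTF-8 differs, so a server
# provisioned against one cannot transparently accept the other:
print(latin1 == cp1252, latin1 == utf8)  # True False
```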
Comment 59 Julian Reschke 2008-10-03 10:45:42 PDT
Another option is to define "basic2" and "digest2", let them use UTF-8, and try to get them implemented in Apache and IIS as well.
Comment 60 Wladimir Palant 2008-10-03 10:47:11 PDT
(In reply to comment #58)
> Better would be if we could guess an encoding based on any charset given in the
> Content-Type of the response containing the original WWW-Authorize header that
> drove the authentication request.

That would be the charset of the server's 401 page - but this is usually a standard page meaning that its charset isn't related to the charset of the page you are sending the password to.
Comment 61 Magnus Melin 2008-10-03 11:13:01 PDT
> 2. The system codepage. Compatible with IE in all locales.

Though usable only by coincidence, as the server side wouldn't know what to decode to. My 2c would certainly be to go with the UTF-8 approach.
Comment 62 Andrew Clover 2008-10-03 13:05:05 PDT
> this is usually a standard page meaning that its charset isn't related to the charset of the page

It's not necessarily a standard page. If you're writing a web application you'll be returning an error page, probably in the same charset as every other page, which will be in the same charset you want for the authentication (UTF-8 if you're sensible).

If you get the standard page, the web server probably won't be serving it with a charset parameter so it falls through to the default.

> Though usable only by coincidence, as the server side wouldn't know what to
decode to. My 2c would certainly be to go with the UTF-8 approach.

Yes, but compatible by coincidence is still better than compatible never!

Putting UTF-8 in as-is does break existing IE compatibility for the US and Western European locales, giving site authors no way to be compatible with both browsers.

> Another option is to define "basic2" and "digest2"

That would be nice, but it's a lot of work pushing it through the WHATWG and browsers before site authors can use it! Is there any way it could be done retaining backwards compatibility for the cases where no non-ASCII characters are used? e.g.

WWW-Authenticate: basic realm="My realm;charset=utf-8"

(Looks ugly if the browser doesn't strip off the trailer, but at least it would still work.)
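An in-band signal along these lines is what RFC 7617 (the later revision of Basic) eventually standardized as the charset auth-param, whose only permitted value is "UTF-8". A deliberately simplified client-side sketch (real challenge parsing must handle quoting and multiple challenges):

```python
import base64
import re

def charset_for_challenge(www_authenticate: str) -> str:
    # Simplified: look for a charset auth-param; RFC 7617 only allows
    # the value "UTF-8", so anything else falls back to ISO-8859-1.
    m = re.search(r'charset\s*=\s*"?([\w-]+)"?', www_authenticate, re.I)
    if m and m.group(1).upper() == "UTF-8":
        return "utf-8"
    return "iso-8859-1"

challenge = 'Basic realm="My realm", charset="UTF-8"'
enc = charset_for_challenge(challenge)
token = base64.b64encode("ren\u00e9:secret".encode(enc)).decode("ascii")
print(enc, token)
```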
Comment 63 Julian Reschke 2008-10-03 13:42:12 PDT
(In reply to comment #62)
> > Another options is to define "basic2" and "digest2"
> 
> That would be nice, but a lot of work pushing it through WHAT-WG and browsers
> before site authors can use it! Is there any way it could be done retaining
> backwards compatibility for the cases where no non-ASCII characters are used?
> eg.
> 
> WWW-Authenticate: basic realm="My realm;charset=utf-8"
> 
> (Looks ugly if the browser doesn't strip off the trailer, but at least it would
> still work.)

The WHATWG isn't relevant for that. The HTTPbis WG (IETF) could be, although it's not chartered to define new authentication schemes.

That being said, new schemes can easily be introduced (from a technical p.o.v.) - the server can offer several different schemes, and the client can pick what it knows about.
Comment 64 Kyle Huey [:khuey] (khuey@mozilla.com) 2009-05-14 10:49:51 PDT
This bug has been stalled for almost a decade now, and it's certainly not the only HTTP Auth bug that's been around this long.  It would be nice if we could make some progress on this.

My two cents:
Providing some mechanism to allow non-ASCII characters is better than providing no mechanism.
UTF-8 encoding makes the most sense of the options presented.  If the IETF standard is unimplementable, as it seems to be from the above description, we should do what seems most sensible, IMO.

There's already a patch here for this (I haven't checked for bitrot, though), so if we can come to a consensus on the approach to take, this bug can be resolved shortly.
Comment 65 Julian Reschke 2009-05-14 11:03:05 PDT
"This bug has been stalled for almost a decade now, and it's certainly not the
only HTTP Auth bug that's been around this long.  It would be nice if we could
make some progress on this."

Yes.

"Providing some mechanism to allow non-ASCII characters is better than providing
no mechanism."

There is a mechanism, it's ISO-8859-1. There are servers out there that rely on it.

"UTF-8 encoding makes the most sense of the options presented.  If the IETF
standard is unimplementable as it seems to be from the above description, we
should do what seems the most sensible IMO."

The IETF standard is implementable; it "just" doesn't handle non-ISO-8859-1 characters.

"There's already a patch here for this (haven't checked for bitrot though), so
if we can come to a consensus on the approach to take this bug can be resolved
shortly."

Big -1. You will break stuff.

The right fix is to define a new scheme, let's say "Basic2", which differs from "Basic" in the encoding being UTF-8 instead of ISO-8859-1, and get it deployed. For instance, by adding support both in Firefox and Apache httpd.
Comment 66 Andrew Clover 2009-05-21 04:32:27 PDT
Kyle> It would be nice if we could make some progress on this.

Well, yes, but no-one can agree which direction counts as ‘progress’! There is no one direction that improves every use case from what we have at the moment. Whilst I agree that UTF-8 is a more sensible encoding in general (and using it was my original suggestion), we'd lose the accidental-compatibility with IE we currently have for Western European users (other than the differences between cp1252 and 8859-1).

Julian> There is a mechanism, it's ISO-8859-1.

I'd still dispute that RFC2616 says that. Whilst HTTP/1.1 headers are explicitly ISO-8859-1, authentication details are not an HTTP header, even if they are encapsulated in one. The base64 token decodes only to a bunch of bytes; there is no standards-endorsed way of turning that into characters.

Indeed, the current mechanism Moz uses isn't even ISO-8859-1 as such, it's just taking the least-significant byte of a UTF-16 codepoint. I would much rather have the user's configured default encoding used for this; this could retain the IE-partial-compatibility and allow other users to at least get sites to work until such time as a proper charset-signalling method is available.

> The right fix is to define a new scheme, let's say "Basic2"

Well, right or not, I can't see it getting support within another decade! :-(

Most importantly, could it be deployed in a backwards-compatible way? Do UAs even understand multiple ‘WWW-Authenticate’ challenges ATM? If not, do they at least read the first (Basic) challenge from a list of challenges?

If not, could we get away with adding an optional charset auth-param to Basic challenges and/or responses? (RFC2617 implies they should be extensible, but I don't know what real-world UAs and servers do here.) If not, all we're left with is the ugly option of an ‘X-Authorization-Charset’ request header.

Incidentally, the good news is that HTTPbis has recently removed the broken references to RFC2047 that got us all so confused above.
Comment 67 Julian Reschke 2009-05-21 04:53:19 PDT
Andrew> I'd still dispute that RFC2616 says that. Whilst HTTP/1.1 headers are
Andrew> explicitly ISO-8859-1, authentication details are not an HTTP header,
Andrew> even if they are encapsulated in one. The base64 token decodes only to a
Andrew> bunch of bytes; there is no standards-endorsed way of turning that into
Andrew> characters.

I agree it's not obvious, and that the spec sucks with respect to that. But, it *does* say:

   base64-user-pass  = <base64 [4] encoding of user-pass,
                    except not limited to 76 char/line>
   user-pass   = userid ":" password
   userid      = *<TEXT excluding ":">
   password    = *TEXT

So it re-uses the TEXT production, and that is defined in RFC2616 with the words (Section 2.2):

"The TEXT rule is only used for descriptive field contents and values that are not intended to be interpreted by the message parser. Words of *TEXT MAY contain characters from character sets other than ISO-8859-1 [22] only when encoded according to the rules of RFC 2047 [14].

    TEXT           = <any OCTET except CTLs,
                     but including LWS>"

So at least some people read this as defining it.

Andrew> Well, right or not, I can't see it getting support within another
Andrew> decade! :-(

It will never happen if nobody tries.

Andrew> Most importantly, could it be deployed in a backwards-compatible way?
Andrew> Do UAs even understand multiple ‘WWW-Authenticate’ challenges ATM?
Andrew> If not, do they at least read the first (Basic) challenge from a
Andrew> list of challenges?

We discussed that not so long ago in the context of http://tools.ietf.org/html/draft-broyer-http-cookie-auth-00, and my recollection is that only Opera had problems, and their developers signaled it was a known issue that needs to be fixed.

Andrew> If not, could we get away with adding an optional charset auth-param
Andrew> to Basic challenges and/or responses? (RFC2617 implies they should be
Andrew> extensible, but I don't know what real-world UAs and servers do here.)

The extension mechanism is "must ignore", in which case "old" UAs will do whatever they did for Basic, and thus fail to authenticate. If they did *not* ignore, it would be impossible to deploy.

Andrew> Incidentally, the good news is that HTTPbis has recently removed
Andrew> the broken references to RFC2047 that got us all so confused above.

They are gone from Part1, but Part7 (Auth) still delegates most to RFC2617 which still has it (HTTPbis is not chartered to revise RFC2617).
Comment 68 Julian Reschke 2009-05-21 05:29:34 PDT
I just did a quick test with

  WWW-Authenticate: Basic "foo" enc=utf-8

and it seems that all (Windows)-UAs ignore the parameter except for Opera and Chrome, for which I'll open bug reports.
Comment 69 Julian Reschke 2009-05-21 05:33:48 PDT
Of course that should have been

  WWW-Authenticate: Basic realm="foo" enc=utf-8

which makes it work in Opera, but still fails in Chrome.
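What this exercises is the "must ignore unknown auth-params" behaviour discussed above. A minimal sketch (illustrative only, not any browser's actual parser) of a tolerant challenge parser that keeps working whether or not the hypothetical enc parameter is present:

```python
import re

def parse_basic_challenge(header: str) -> dict:
    """Split a Basic challenge into its auth-params, ignoring none of them.

    Unknown parameters (like the proposed enc=utf-8) are simply collected;
    a client that only looks up 'realm' effectively ignores the rest.
    """
    scheme, _, params = header.partition(" ")
    assert scheme.lower() == "basic"
    out = {}
    # Accept both quoted-string and token values, comma- or space-separated.
    for m in re.finditer(r'(\w+)=(?:"([^"]*)"|([^\s,]+))', params):
        key, quoted, bare = m.groups()
        out[key.lower()] = quoted if quoted is not None else bare
    return out

c = parse_basic_challenge('Basic realm="foo", enc=utf-8')
print(c)  # {'realm': 'foo', 'enc': 'utf-8'}
```

A client built this way authenticates against both old servers (realm only) and servers that send the extra parameter.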
Comment 70 Henrik Nordstrom 2009-09-09 13:59:46 PDT
Just noticed this bug when reading the source trying to figure out why Digest auth completely fails with non-ASCII characters in the login even if within ISO-8859-1...
Comment 71 Zhang Weiwu 2009-09-09 18:48:31 PDT
I have been watching this bug for years. I've decided to second Andrew Clover's earlier idea again, to try to bring this 9-year-old bug a little step further.

I recommend adding a config option, "charset for HTTP authentication", defaulting to UTF-8, or, if that's not acceptable to you all, defaulting to ASCII (the current situation). That way admins could at least set the option to a reasonable value for all of their users in their own deployment. E.g. I would change it to UTF-8 as part of our desktop and notebook maintenance policy to make our intranet work, and I can imagine someone at a company whose intranet is Microsoft-based changing it to GB18030 (or whatever charset is the default for the majority of their company) to smooth the switch to Firefox without breaking existing web apps deployed on the MS stack.

I am afraid that if my suggestion is taken, the demand for a clean, final solution (defaulting to UTF-8, or a better RFC) will weaken, and the unreasonable ASCII default would sleep for a few more decades, maybe until Firefox fades away from the market, if there is such a day.

I also wonder how Konqueror and Google Chrome handle this case.
Comment 72 Jason Duell [:jduell] (needinfo? me) 2009-09-10 13:35:23 PDT
It would be very good to have some data on how Safari and/or Chrome are handling this.  Does anyone know?  Also, do we know how well Opera's decision to use UTF-8 has turned out?
Comment 73 Jason Duell [:jduell] (needinfo? me) 2009-10-09 12:09:37 PDT
*** Bug 55738 has been marked as a duplicate of this bug. ***
Comment 74 Andrew Clover 2009-12-01 05:40:50 PST
Chrome uses UTF-8.

Safari uses ISO-8859-1 (proper 8859, not cp1252) and silently fails to send any authorization header at all when there is a non-ISO-8859-1 character present in the username or password. (Safari also has some other fairly bad auth problems.)

It looks like every site is going to have dropped HTTP Basic Authentication in favour of cookies before this sees any resolution.
Comment 75 Robert Sayre 2010-01-08 06:49:34 PST
(In reply to comment #74)
> Chrome uses UTF-8.

Hmm, if we go with that, I think the two implementations can force the issue. Let's do it.
Comment 76 Julian Reschke 2010-01-08 07:39:16 PST
This *will* break with existing servers, unless Firefox retries with ISO-8859-1 upon 401.

Again: the proper solution is either to extend Basic with a parameter, or to define a new authentication scheme.
Comment 77 Robert Sayre 2010-01-09 06:56:52 PST
(In reply to comment #76)
> This *will* break with existing servers, unless Firefox retries with ISO-8859-1
> upon 401.

Firefox has never sent anything other than ASCII, so no servers that work with Firefox now will break. No browser actually follows the spec here, afaik, so utf8 seems like just the thing. We should certainly try the patch I'm about to attach and see how things go.
Comment 78 Robert Sayre 2010-01-09 06:58:40 PST
Created attachment 420886 [details] [diff] [review]
Convert basic credentials to UTF-8 prior to base64 encoding them
Comment 79 Robert Sayre 2010-01-09 07:00:15 PST
Comment on attachment 420886 [details] [diff] [review]
Convert basic credentials to UTF-8 prior to base64 encoding them

This is Andrey M.'s patch with a test added.
Comment 80 Justin Dolske [:Dolske] 2010-01-09 21:20:17 PST
(In reply to comment #77)

> Firefox has never sent anything other than ASCII, so no servers that work with
> Firefox now will break.

Are you sure? From the patch it looks like all we did was send the lower byte of a 2-byte UCS2 character; now any character between U+0080 and U+00FF will be encoded as 2 bytes (for UTF-8). So an existing username/password like "MëtälMøøse" would stop working... Servers that assumed we were sending ISO-8859-1, Windows-1252, or other charsets that happen to overlap with this range would have largely worked fine.
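Dolske's concern can be checked directly. A small sketch (the credentials are hypothetical, taken from the example above) comparing the bytes the old low-byte truncation sends with what a UTF-8 switch would send:

```python
import base64

# Hypothetical credentials from the comment above; not from any real site.
creds = "MëtälMøøse:secret"

# Old Firefox behaviour: keep only the low byte of each UTF-16 code unit,
# which for U+0000-U+00FF is byte-for-byte identical to ISO-8859-1.
low_byte = bytes(ord(c) & 0xFF for c in creds)

# Proposed behaviour: encode the credentials as UTF-8 first.
utf8 = creds.encode("utf-8")

print(base64.b64encode(low_byte).decode())  # what servers receive today
print(base64.b64encode(utf8).decode())      # what they'd receive after the change

# The header value changes, so existing Latin-1 logins would break.
assert low_byte != utf8
assert low_byte == creds.encode("latin-1")
```

Any username or password containing a character in U+0080..U+00FF produces a different Authorization header under the two schemes, which is exactly the regression being debated.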
Comment 81 Robert Sayre 2010-01-10 08:25:21 PST
(In reply to comment #80)
> 
> Are you sure? From the patch it looks like all we did was send the lower byte
> of a 2-byte UCS2 character, now any character between U+0080 and U+00FF will be
> encoded as 2 bytes (for UTF-8). So, an existing username/password like
> "MëtälMøøse" would stop working... 

oh shoot, I think you're right. More testing needed, I guess.
Comment 82 Emil Ivanov 2010-01-10 09:19:40 PST
At the least, is it possible to show the decoded content of these prompts? Popups like the one at http://lib.homelinux.org are unreadable.
Comment 83 Zhang Weiwu 2010-01-10 17:17:06 PST
Oops. Glad to know something other than Opera works with UTF-8. Not that I hate Opera, but our deployed web applications have some bugs with Opera that no one can fix because they are rooted too deep. I am migrating my users (a corporate deployment) to Chrome; changing from HTTP authentication to form-based authentication would create more problems than forcing users to change browsers. Luckily we have an IT department that can decide what users use.
Comment 84 Boris Zbarsky [:bz] (Out June 25-July 6) 2010-01-10 19:38:21 PST
Hmm.  Retrying with byte-truncation upon 401 might actually be reasonable...  and get us out of this impasse.  How hard would that be to do?
Comment 85 Honza Bambas (:mayhemer) 2010-01-11 12:42:17 PST
We could introduce a simple continuation state object for basic auth. It would keep the state of the attempt to send creds in UTF-8 or lossy-convert form. In case we failed with UTF-8, nsHttpBasicAuth::ChallengeReceived would then set *identityInvalid to PR_FALSE and move the state to lossy-convert, and on the following attempt it would set *identityInvalid to PR_TRUE as it does now, breaking the basic auth cycle. nsHttpBasicAuth::GenerateCredentials would then just convert according to the state object. We have to invalidate the state object when the challenge (actually the realm) changes in ChallengeReceived.

Relatively simple. Does that make sense?

Could this new code be potentially exploitable via a UTF-8 and lossy-convert username or password overlap (two usernames that convert to the same byte representation, one via UTF-8 and the other via lossy-convert, for instance)?
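For illustration only, here is a minimal Python sketch of the continuation state machine comment 85 describes (all names are hypothetical; the real implementation would live in nsHttpBasicAuth and use the Gecko interfaces named above):

```python
# Sketch, not Gecko code: try UTF-8 first; on a repeat 401 for the same
# realm, fall back to the lossy low-byte conversion; then give up and
# re-prompt the user (breaking the basic auth cycle).

UTF8, LOSSY, PROMPT = "utf-8", "lossy", "prompt"

class BasicAuthState:
    def __init__(self, realm):
        self.realm = realm
        self.mode = UTF8

    def challenge_received(self, realm):
        """Return True when the cached identity should be treated as invalid
        (the analogue of setting *identityInvalid to PR_TRUE)."""
        if realm != self.realm:
            # Realm changed: invalidate the state object, start over at UTF-8.
            self.realm, self.mode = realm, UTF8
            return False
        if self.mode == UTF8:
            # UTF-8 failed once: silently retry with the lossy conversion.
            self.mode = LOSSY
            return False
        # Both encodings failed: break the cycle and re-prompt.
        self.mode = PROMPT
        return True

    def generate_credentials(self, user_pass):
        if self.mode == LOSSY:
            return bytes(ord(c) & 0xFF for c in user_pass)
        return user_pass.encode("utf-8")
```

With this flow the same typed credentials are sent at most twice per realm, once per encoding, before the user is asked again.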
Comment 86 Robert Sayre 2010-01-13 13:26:39 PST
(In reply to comment #85)
> 
> Could this new code be potentially exploitable with a UTF-8 and lossy-convert
> user name or password overlap

I don't think so, but maybe I'm missing the point. How would the attack work? The credentials are pretty much in the clear, so what does it matter?
Comment 87 Johnny Stenback (:jst, jst@mozilla.com) 2010-01-28 16:40:00 PST
Not blocking 1.9.3 on this.
Comment 88 Boris Zbarsky [:bz] (Out June 25-July 6) 2010-02-15 18:19:47 PST
So one thing confusing me here... The attached patch doesn't change AppendQuotedString, right?  Why not?  See bug 546330.
Comment 89 Julian Reschke 2010-08-03 06:14:51 PDT
I have started work on an Internet Draft defining an auth param expressing the server's expected character encoding, see http://greenbytes.de/tech/webdav/draft-reschke-basicauth-enc-latest.html. Feedback, such as test cases, appreciated. Co-authors as well.
Comment 90 Andrew Clover 2010-08-03 15:57:44 PDT
@Julian: Looks fairly simple and watertight, good. The comma-separated syntax is consistent with Digest auth and of the browsers I have to hand, none have tripped on it -

Already use UTF-8 anyway:
    Opera 10
    Chrome 5

Uses a different encoding but has no problem with req param (so ASCII is still OK):
    Firefox 1-4  (UTF-16 lower bytes)
    Safari 4-5   (ISO-8859-1)
    IE:Mac 5     (MacRoman)
    IE 5-8       (CP_ACP)
    IEMobile 6-8 (CP_ACP)
    Netscape 4   (CP_ACP)
    Opera 5      (CP_ACP)

Assuming 'Deployment Advice' is to be aimed at webapp/server authors (rather than browser vendors), testing incoming credentials against both UTF-8 and ISO-8859-1 is a good start, though it should be stressed that this doesn't fully cover real-world browser behaviour.

The system codepage of Western Windows machines (cp1252) is close enough to ISO-8859-1 for the purposes of the simple Western European accented letters the encodings share, but other marks like smart quotes or the euro won't come through. And of course in other territories sites are left guessing what codepage their users have, like the Japanese sites where you can only log in with IE using cp932.
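The cp1252/ISO-8859-1 near-overlap described above is easy to verify with a quick sketch:

```python
# cp1252 and ISO-8859-1 agree on the accented letters Western European
# passwords typically contain, but diverge on cp1252-only characters
# such as the euro sign and smart quotes.
for ch in "äöüéñ":
    assert ch.encode("cp1252") == ch.encode("latin-1")

# The euro exists in cp1252 (at 0x80)...
assert "€".encode("cp1252") == b"\x80"

# ...but has no ISO-8859-1 representation at all.
try:
    "€".encode("latin-1")
except UnicodeEncodeError:
    print("euro sign is not representable in ISO-8859-1")
```

So a cp1252-typed password "just works" against an ISO-8859-1 server only as long as it stays inside the shared accented-letter range.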
Comment 91 Matthias Versen [:Matti] 2011-01-29 21:15:25 PST
*** Bug 629962 has been marked as a duplicate of this bug. ***
Comment 92 Nicholas Hurley [:nwgh][:hurley] 2011-03-24 12:59:46 PDT
Created attachment 521605 [details] [diff] [review]
Allow converting basic creds to UTF-8 before base64 encoding
Comment 93 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-03-25 10:02:23 PDT
It is more important to be compatible with the servers than it is to be compatible with Opera and Chrome, since the servers are the ones we have to work with. How do the most recent versions of IIS, Apache, Tomcat, and PHP (at least) decode this? If that's UTF-8, then great. If not, then we should provide the server a way to say "use UTF-8" instead. The enc parameter proposed by Julian seems like a reasonable approach, but also it could just be flagged in a separate header. Also, when we're encoding in UTF-8, we should send some indication (e.g. in another header field), besides the User-Agent, that we are using UTF-8, unless we have evidence that servers are already expecting UTF-8.

The idea of trying one encoding and then trying another will work poorly with sites that lock your account after a small number of login failures. Plus, it is added complexity that we don't need.
Comment 94 Julian Reschke 2011-03-25 10:18:41 PDT
(In reply to comment #93)
> It is more important to be compatible with the servers than it is to be
> compatible with Opera and Chrome, since the servers are the ones we have to
> work with. How do the most recent versions of IIS, Apache, Tomcat, and PHP (at
> least) decode this? If that's UTF-8, then great. If not, then we should provide
> the server a way to say "use UTF-8" instead. The enc parameter proposed by
> Julian seems like a reasonable approach, but also it could just be flagged in a
> separate header. Also, when we're encoding in UTF-8, we should send some

I believe we can get away with the new parameter; after all, that's what they are for.

> indication (e.g. in another header field), besides the User-Agent, that we are
> using UTF-8, unless we have evidence that servers are already expecting UTF-8.

See <http://greenbytes.de/tech/webdav/draft-reschke-basicauth-enc-00.html#rfc.section.B.3>.

I'm not convinced it's needed; could you explain a use case where the server could take advantage from that information?

(I'm not strictly against it; I'd just like to avoid information nobody needs).

> The idea of trying one encoding and then trying another will work poorly with
> sites that lock your account after a small number of login failures. Plus, it
> is added complexity that we don't need.

I agree that this might be a problem; on the other hand it might be the way of least resistance, although a hack.

-> http://greenbytes.de/tech/webdav/draft-reschke-basicauth-enc-01.html#rfc.section.A.1.1 (will mention this in the next revision)
Comment 95 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-03-25 12:28:52 PDT
(In reply to comment #94)
> I believe we can get away with the new parameter; after all, that's what they
> are for.

RFC2617 doesn't provide for new parameters for BASIC.

> > indication (e.g. in another header field), besides the User-Agent
> 
> I'm not convinced it's needed; could you explain a use case where the server
> could take advantage from that information?

Some servers will be able to just try to decode using multiple encodings until one works. But, if the account lockout controls are behind the web server (which is the case for some Mozilla services, for example), then this could lead to similar account-lockout problem I mentioned above for the case where clients retry with multiple encodings. If the server knows which encoding the client used then it can choose that encoding on the first try and avoid the issue. User-Agent isn't the appropriate way of conveying this information about the encoding and/or whether the UA ignored the server's indication of the encoding it wants to use.

The nice thing about Julian's parameter-based solution or a new server-sent header field is that it is easy for server admins to configure their servers to work with us without code changes, using mod_headers or similar, as long as we are willing to support encodings other than UTF-8 like cp932 (for Japanese) sites. Any other solution requires server code changes and means that we are still going to be incompatible with some existing sites.

My concern with Julian's parameter-based solution is that it seems like it will break some parsers (like Chrome's), and those parsers are compliant because RFC2617 doesn't require parsers to accept additional parameters. New header fields (one for Proxy-Auth and one for Auth) make more sense, considering maximizing interoperability between new and old implementations is the goal.
Comment 96 Julian Reschke 2011-03-25 12:41:28 PDT
(In reply to comment #95)
> (In reply to comment #94)
> > I believe we can get away with the new parameter; after all, that's what they
> > are for.
> 
> RFC2617 doesn't provide for new parameters for BASIC.

RFC 2617 says that the challenge can contain auth params, it just doesn't define any for Basic.

Anyway, the spec would update 2617 anyway. 

> > > indication (e.g. in another header field), besides the User-Agent
> > 
> > I'm not convinced it's needed; could you explain a use case where the server
> > could take advantage from that information?
> 
> Some servers will be able to just try to decode using multiple encodings until
> one works. But, if the account lockout controls are behind the web server

That sounds like a potential security hole. Do you have evidence of servers behaving like that? And if they do, wouldn't it make sense to always start with UTF-8?

> (which is the case for some Mozilla services, for example), then this could
> lead to similar account-lockout problem I mentioned above for the case where
> clients retry with multiple encodings. If the server knows which encoding the
> client used then it can choose that encoding on the first try and avoid the
> issue. User-Agent isn't the appropriate way of conveying this information about
> the encoding and/or whether the UA ignored the server's indication of the
> encoding it wants to use.
> 
> The nice thing about Julian's parameter-based solution or a new server-sent
> header field is that it is easy for server admins to configure their servers to
> work with us without code changes, using mod_headers or similar, as long as we
> are willing to support encodings other than UTF-8 like cp932 (for Japanese)
> sites. Any other solution requires server code changes and means that we are
> still going to be incompatible with some existing sites.
> 
> My concern with Julian's parameter-based solution is that it seems like
> it will break some parsers (like Chrome's), and those parsers are compliant

If it breaks Chrome, let's raise a bug report.

> because RFC2617 doesn't require parsers to accept additional parameters. New

It does:

  challenge   = auth-scheme 1*SP 1#auth-param

So recipients must accept params; it just doesn't require them to do anything useful with them for Basic.

If you believe there's a spec problem, then let's fix it in draft-ietf-httpbis-p7-auth (which replaces the authentication framework parts of 2617).

> header fields (one for Proxy-Auth and one for Auth) make more sense,
> considering maximizing interoperability between new and old implementations is
> the goal.

Implementations currently update fast. The only thing that would scare me would be a problem in IE8.
Comment 97 Nicholas Hurley [:nwgh][:hurley] 2011-04-06 16:10:18 PDT
The servers themselves don't do anything involving character set encoding/decoding of basic auth. To them, it's just a sequence of bytes that gets hashed using whatever hash they may use (if any) and compared to the stored password (hashed or not). So that brings us to the question of how are the stored hashes computed? In the case of apache with a htpasswd file, it's based entirely on whatever input encoding the user happens to be using when they run "htpasswd". For apache with some other backing store for the passwords, as well as PHP using some random backing store, it will be whatever encoding is used by whatever piece of software reads the password from the user and writes it (in hashed or otherwise form) to the backing store. Tomcat's hash generator program takes a "-e" argument to tell it what encoding to use. By default, IIS is tied to only using windows accounts, so I imagine that's all done using the system codepage (I couldn't find any documentation on using IIS with non-windows passwords in my quick googling).

So where does this leave us? For the most part, we're at the mercy of whoever wrote whatever software is requiring authentication, or worse yet, the user that set his/her own password. Even if we could get all UA vendors and all server vendors to implement a scheme where the UA tells the server what encoding it used, that doesn't do us much good unless everyone rewrites their custom-rolled basic auth systems to honor the server when it tells them "Hey, the UA gave me this password encoded in UTF-8".

Just using UTF-8 has the advantages of (1) not breaking ascii passwords that already work with fx just fine, and (2) giving us SOME sort of compatibility with non-latin character sets that makes some sort of sense, instead of just stripping out all but the lowest byte of whatever characters were input. We still won't be compatible with non-latin IIS installations that use the system codepage for usernames/passwords, but we weren't before. If UTF-8 is made the default, we also lose default compatibility with latin charsets in the case when a username/password contains a non-ascii character, but that is mitigated with a pref to enable or disable utf-8 encoding of basic auth info (see my current patch).
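To make the "sequence of bytes" point concrete, here is a small sketch (using SHA-1 purely for illustration; real htpasswd files use crypt, MD5, or bcrypt variants) of why the stored hash silently pins down a byte encoding the server never recorded:

```python
import hashlib

# A server stores only a hash of whatever bytes the password tool happened
# to receive, so the same characters typed under different locales yield
# different, mutually incompatible stored hashes.
password = "Müller"
stored = {enc: hashlib.sha1(password.encode(enc)).hexdigest()
          for enc in ("utf-8", "latin-1", "cp1252")}

# For this password latin-1 and cp1252 happen to agree; UTF-8 does not.
assert stored["latin-1"] == stored["cp1252"]
assert stored["utf-8"] != stored["latin-1"]

for enc, digest in stored.items():
    print(enc, digest)
```

A browser sending UTF-8 can therefore only log in if the tool that wrote the hash was also fed UTF-8 bytes, which is exactly the "mercy of whoever wrote the software" problem.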
Comment 98 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-04-22 22:20:56 PDT
Comment on attachment 521605 [details] [diff] [review]
Allow converting basic creds to UTF-8 before base64 encoding

One nit.  This pattern:

>+    if (prefs)
>+        if (NS_FAILED(prefs->GetBoolPref(kUseUTF8, &utf8auth)))

is probably better as:

  if (prefs && NS_FAILED(prefs->GetBoolPref(kUseUTF8, &utf8auth)))

(less indentation and such).

sr=me with that fixed for getting this in and then at least asking some users in non-Western locales (and maybe some Western ones) to flip the pref and so we can gather some data on what servers are actually expecting....

We'll need to broadcast the need for such testing somewhat; maybe we can ask the Mozilla Europe and Mozilla Japan folks to help.  Please let me know if you need help with that part once you land the patch, ok?
Comment 99 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-05-11 00:17:58 PDT
(In reply to comment #96)
> RFC 2617 says that the challenge can contain auth params, it just doesn't
> define any for Basic.

>   challenge   = auth-scheme 1*SP 1#auth-param
> 
> So recipients must accept params; it just doesn't require them to do
> anything useful with them for Basic.

But, for BASIC it is restricted to:

    challenge   = "Basic" realm

Anyway, bug 656213 is now the bug about Gecko discovering what encoding the server wants us to use. By default, we will use UTF-8 unless/until there's a different consensus reached between us and the other browsers.
Comment 100 Julian Reschke 2011-05-11 00:20:11 PDT
(In reply to comment #99)
> (In reply to comment #96)
> > RFC 2617 says that the challenge can contain auth params, it just doesn't
> > define any for Basic.
> 
> >   challenge   = auth-scheme 1*SP 1#auth-param
> > 
> > So recipients must accept params; it just doesn't require them to do
> > anything useful with them for Basic.
> 
> But, for BASIC it is restricted to:
> 
>     challenge   = "Basic" realm

No, it's not "restricted". This is an extension point.

> Anyway, bug 656213 is now the bug about Gecko discovering what encoding the
> server wants us to use. By default, we will use UTF-8 unless/until there's a
> different consensus reached between us and the other browsers.

Using UTF-8 will break existing deployments that rely on ISO-8859-1.
Comment 101 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-05-11 00:56:26 PDT
(In reply to comment #100)
> Using UTF-8 will break existing deployments that rely on ISO-8859-1.

These ISO-8859-1 passwords already don't work in Firefox (unless they are already valid UTF-8), so it would not be a regression in our behavior. And, like I said, if/when there's consensus on an alternative solution, we will almost definitely implement that solution. (But, I'm not holding my breath on there being a consensus any time soon.)
Comment 102 Julian Reschke 2011-05-11 01:04:36 PDT
(In reply to comment #101)
> (In reply to comment #100)
> > Using UTF-8 will break existing deployments that rely on ISO-8859-1.
> 
> These ISO-8859-1 passwords already don't work in Firefox (unless they are

As far as I can tell, they do work, because Firefox just drops the top 8 bits, and thus ISO-8859-1 passes through unharmed. The same, IMHO, is true for IE. See the summary in comment 58.

> ...
Comment 103 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-05-11 17:02:39 PDT
(In reply to comment #102)
> As far as I can tell, they do work, because Firefox just drop the top 8
> bits, and thus ISO-8859-1 passes through unharmed. The same IMHO is true for
> IE. See summary in comment 58.

Thanks Julian. The key point that was overlooked previously is that code points 0-255 are the same in ISO-8859-1 and Unicode, so chopping off the high byte works pretty well for sites that are expecting ISO-8859-1, which is probably most of the Americas and Western Europe. Consequently, switching to UTF-8 by default would *not* be a regression-free change, as I think some were assuming.

Our goal here is to get non-ISO-8859-1 characters in usernames and passwords to work as well as the ISO-8859-1 characters already do. We don't have any evidence that switching to UTF-8 actually improves things here. We should avoid switching the default to UTF-8 unless we have some evidence that it actually helps. I filed two bugs blocking this one to improve things here.
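The "chopping off the high byte works for ISO-8859-1" observation can be verified mechanically (a quick sketch):

```python
# For every code point U+0000-U+00FF, keeping only the low byte of the
# UTF-16 code unit produces exactly the ISO-8859-1 encoding, which is why
# the old truncation behaviour "works" for Western European credentials.
for cp in range(0x100):
    ch = chr(cp)
    assert bytes([ord(ch) & 0xFF]) == ch.encode("latin-1")

# Beyond U+00FF, truncation silently corrupts: the euro sign U+20AC
# collapses to the unrelated byte 0xAC.
assert bytes([ord("€") & 0xFF]) == b"\xac"
print("truncation == ISO-8859-1 for U+0000..U+00FF")
```

That corruption above U+00FF is exactly the class of username/password this bug is about.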
Comment 104 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-05-11 17:07:42 PDT
Comment on attachment 521605 [details] [diff] [review]
Allow converting basic creds to UTF-8 before base64 encoding

I am r-'ing the patch because it would cause regressions in current behavior.
Comment 105 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-05-11 18:31:31 PDT
Hold on.  Did you read the patch?  The patch adds a _pref_ to allow the user to switch to UTF-8 for the encoding for auth.  The default value of the pref is set such that the current ISO-8859-1 behavior is used, precisely because a wholesale switch to UTF-8 would not be backwards-compatible, but the patch gives users the _option_ of doing something else.  See comment 98 for why I think such a pref could be useful.
Comment 106 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-05-11 18:56:19 PDT
(In reply to comment #105)
> Hold on.  Did you read the patch?  The patch adds a _pref_ to allow the user
> to switch to UTF-8 for the encoding for auth.  The default value of the pref
> is set such that the current ISO-8859-1 behavior is used, precisely because
> a wholesale switch to UTF-8 would not be backwards-compatible, but the patch
> gives users the _option_ of doing something else.  See comment 98 for why I
> think such a pref could be useful.

All the information we've been given indicates that UTF-8 is the one encoding least likely to work. I expect the IE way (which is never UTF-8) to dominate, especially in Asian markets that are IE-centric like China, Korea, and Japan. The patch would be useful if it let the user select a non-UTF-8 encoding to use, because then the user could experiment with different settings and report back what encoding each of the sites they are testing is actually using, instead of being limited to "ISO-8859-1", "UTF-8", and "not ISO-8859-1 and not UTF-8".
Comment 107 Honza Bambas (:mayhemer) 2011-05-12 07:55:46 PDT
I tried the patch with IIS and an account name with diacritics.  Either way (w/o or w/ the patch and the pref turned on) I could not log in.

I can try the same on my virtual linux box, if someone else doesn't already have a test environment to try.
Comment 108 Jo Hermans 2011-08-16 03:50:20 PDT
*** Bug 576949 has been marked as a duplicate of this bug. ***
Comment 110 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-01-16 16:46:27 PST
Sorry, comment 109 is in the wrong bug.
Comment 111 Julian Reschke 2012-01-29 06:51:50 PST
(In reply to Nick Hurley [:hurley] from comment #97)
> The servers themselves don't do anything involving character set
> encoding/decoding of basic auth. To them, it's just a sequence of bytes that
> gets hashed using whatever hash they may use (if any) and compared to the
> stored password (hashed or not). So that brings us to the question of how
> are the stored hashes computed? In the case of apache with a htpasswd file,
> it's based entirely on whatever input encoding the user happens to be using
> when they run "htpasswd". For apache with some other backing store for the
> passwords, as well as PHP using some random backing store, it will be
> whatever encoding is used by whatever piece of software reads the password
> from the user and writes it (in hashed or otherwise form) to the backing
> store. Tomcat's hash generator program takes a "-e" argument to tell it what
> encoding to use. By default, IIS is tied to only using windows accounts, so
> I imagine that's all done using the system codepage (I couldn't find any
> documentation on using IIS with non-windows passwords in my quick googling).
> 
> So where does this leave us? For the most part, we're at the mercy of
> whoever wrote whatever software is requiring authentication, or worse yet,
> the user that set his/her own password. Even if we could get all UA vendors
> and all server vendors to implement a scheme where the UA tells the server
> what encoding it used, that doesn't do us much good unless everyone rewrites
> their custom-rolled basic auth systems to honor the server when it tells
> them "Hey, the UA gave me this password encoded in UTF-8".
> ...

Point taken; but in these cases the problem essentially is not solvable, right? After all, the *user* enters characters, not octets, in their password prompt.

I don't believe we should reject a fix just because it doesn't fix a problem that by definition can not be solved.
Comment 112 Nicholas Hurley [:nwgh][:hurley] 2012-01-29 14:16:16 PST
(In reply to Julian Reschke from comment #111) 
> Point taken; but in these cases the problem essentially is not solvable,
> right? After all, the *user* enters characters, not octets, in their
> password prompt.
> 
> I don't believe we should reject a fix just because it doesn't fix a problem
> that by definition can not be solved.

For the record, I'm fairly certain that was the point I was trying to make as well, but it's been so long since I made that comment that I'm a little fuzzy on the details :)
Comment 113 ggo 2013-04-05 10:31:29 PDT
My 2 cents: the current version of Firefox (20 under Windows) still does not work with a non-ASCII userid (at least with Digest authentication).
However, a non-ASCII password with an ASCII username is handled "correctly".

Actually, a non-ASCII password with a non-ASCII username is handled correctly until the username is added to the response header.
This is because the nsHttpDigestAuth::AppendQuotedString() method in nsHttpDigestAuth.cpp checks for non-ASCII characters and fails if any are found.

What about having a version of AppendQuotedString() (say, AppendUTF8QuotedString) that is used to append the username to the response header?

Actually, this is what I have done today: I downloaded, built, and modified nsHttpDigestAuth.cpp, rebuilt, and it seems to work very well (at least, it works like some other browsers do...), and it does NOT break anything AFAIU...

BTW, I would like to submit the modified version, but I'm not sure how to do it: can I just check out nsHttpDigestAuth.cpp and then check it in..?

Any comments are welcome.
Comment 114 ggo 2013-04-05 10:36:22 PDT
I've read that I must have this bug assigned to me to submit a modification.

Can someone with the required authority do that please?

Or how do I "take" the bug as suggested by the text beside the "Assigned To" field : "Nobody; OK to take it and work on it" ?

Thank you.
Comment 115 Patrick McManus [:mcmanus] 2013-04-05 10:47:11 PDT
btw, you don't really need to be assigned the bug to attach patches and request reviews. Note that there is already a patch here with an r- attached to it. Best to understand the issues there (by talking to the author, the reviewer, and other commenters) before reliving the experience :)
Comment 116 Nicholas Hurley [:nwgh][:hurley] 2013-04-05 11:09:00 PDT
As the author of that (long ago) r-'d patch, I was just about to chime in :)

I will start by saying this: please don't let any of my notes below discourage you from contributing to this (or any other!) bug. This is all just background information on why (unless people's minds have changed) I would be surprised if we accepted a "switch to UTF-8" approach for this bug. I definitely would like to see this bug fixed, though!

Now that that's been said, here's the info dump:

One concern (if I remember correctly) is that "just" changing auth to be UTF-8 (or having a pref for that, as in my patch) could easily lead to silent breakage from servers that have (over the 12+ years this bug has existed) been configured to expect our broken behavior.

Even worse, sending UTF-8 auth may fix some sites, but break others, so we would need to have some way to detect a server that broke with UTF-8 and change the encoding on the fly to fall back to ASCII.

I had a patch that, at one point, did the fallback, but there was yet another problem: some (many?) sites enforce a lockout after N failed login attempts, with 3 being a (in my experience) common value for N. So imagine the case where a user is logging into a server that breaks with UTF-8 but works with ASCII. Imagine again that the user enters their password incorrectly on the first try. We then have this flow of events:

1. User sends incorrect password, encoded incorrectly (first login failure)
2. We automatically fall back to the expected encoding, but still with the wrong password, and send that (second login failure)
3. User gets prompted for their password again, and enters the correct password
4. We send the correct password, encoded incorrectly (third login failure)

At this point, assuming N=3 for lockout, the user is locked out of their account, even though (in their experience) they've only tried entering their password twice, instead of the 3 chances that they are supposed to get.
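The flow above can be sketched as a toy simulation (all names hypothetical; lockout threshold N=3). After one wrong human attempt plus the silent fallback, the correct password still triggers the lockout:

```python
# Toy model of the lockout scenario: server expects Latin-1 bytes,
# client tries UTF-8 first and silently falls back to Latin-1.
class Server:
    def __init__(self, expected_bytes, lockout=3):
        self.expected = expected_bytes
        self.failures = 0
        self.lockout = lockout

    def check(self, raw_bytes):
        if self.failures >= self.lockout:
            return "locked"
        if raw_bytes == self.expected:
            return "ok"
        self.failures += 1
        return "fail"

def client_attempt(server, password):
    # Each human attempt can burn up to two server-side attempts.
    for enc in ("utf-8", "iso-8859-1"):
        result = server.check(password.encode(enc))
        if result != "fail":
            return result
    return "fail"

server = Server("sécret".encode("iso-8859-1"))
first = client_attempt(server, "sekret")    # wrong password: 2 failures
second = client_attempt(server, "sécret")   # right password: UTF-8 try is failure #3
```

After the second human attempt the account is already locked, matching the four-step flow described above.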

The final issue was one of incompleteness. Adding UTF-8 as an option might make a few more sites work, but the risk of breaking other sites (and the fact that it still would not fix many others) seems to have outweighed the limited gains we could have made with the "simple fix".

Extensions to the HTTP spec to allow the server to say "encode your authentication in encoding X" have been suggested, but I don't believe any of them have gotten anywhere.

Unfortunately a large portion of this discussion happened between me and bsmith in person, so there wasn't a record of it in this bug (or if there was, it's scattered in many different comments, which makes it hard to get at).
Comment 117 ggo 2013-04-06 00:15:45 PDT
Ok, thank you :)
Comment 118 ggo 2013-04-06 12:49:48 PDT
(In reply to Nick Hurley [:hurley] from comment #116)
> As the author of that (long ago) r-'d patch, I was just about to chime in :)
> [...]


Hi Nick,

First, thank you very much for your comments and the valuable information you provide.

I've seen your patch, and that there has been a lot of discussion about this issue, and that some thought it was not a good idea to send back UTF-8 encoded data (the username in this case).

I understand very well the need not to break existing solutions, but note that I am talking about Digest authentication;
I have not checked precisely how Basic authentication is implemented, because it is really unsafe (except over SSL).
Please correct me if I'm wrong, but I believe your patch is for Basic authentication.

What I understand from the existing Digest authentication code is that it already handles the creation of the response digest using UTF-8.
The code "fails" when adding the username to the response header, because only non-control ASCII characters are allowed, which is what the original specification says (once again, I may be wrong, but that is how it looks from what I have read so far).

Now, what will happen if the username is sent in the response header UTF-8 encoded (which is what some other browsers already do)?
1) A username containing only ASCII characters maps to the same byte string in UTF-8, so nothing breaks for existing implementations.
2) A username containing non-ASCII Unicode characters maps to a byte string containing non-ASCII (but not control) characters.

On the server side, the digest authentication module (called "DAM" below) receives a username containing bytes with the highest bit set to 1 (i.e. bytes with a value >= 128).

I think the "DAM" *should* detect this and, if it follows the specification strictly, reject the authentication attempt without any further processing.

But maybe the "DAM" won't do that, and some problems may occur here
(and maybe there are others I'm not aware of yet):

Problem: the "DAM" might just IGNORE the highest bit, and, if we are unlucky enough, the resulting username (called "username7bits" below) might match an existing userid. This leads to two possible consequences:

a) After the usual 3 login attempts, the "username7bits" is locked, as you mentioned.

b) Since MD5 has collisions (I'm no expert on this, I only know it like most of us), the digest answer *might* match what would have been the digest for "username7bits" + its password. In this unlikely case, the original user might be authenticated as "username7bits"...

Now, what is the original problem on the server side in this hypothetical implementation? The "DAM" truncates the original username bytes,
so... this is a problem with the server, not a problem with Firefox!
If a server does this, then it *must* be fixed on the server.
And the fix is simple: if UTF-8 support for usernames is not wanted on the server, then the authentication attempt has to be rejected without any further processing.
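The high-bit truncation above can be sketched in a few lines (illustrative Python; the username is hypothetical): stripping bit 7 from the UTF-8 bytes of a non-ASCII name yields a completely different, but valid, ASCII string:

```python
# Illustrative only: simulate a server-side module that masks off the
# high bit of each byte, producing the "username7bits" discussed above.
utf8_bytes = "rené".encode("utf-8")                  # b'ren\xc3\xa9'
username7bits = bytes(b & 0x7F for b in utf8_bytes)  # b'renC)'

# The result is syntactically valid ASCII, so a naive backend could
# accidentally match it against an unrelated existing account.
```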

Last words: the idea that Firefox would end up being the *only* web browser
restricted to ASCII-only usernames sounds bad to me.

So... I will try to submit the minor modifications to mozilla-central;
if they are rejected, there will probably be good reasons, regarding respect for the standards, or whatever.
Unfortunately, users who wish to use Unicode usernames with Digest authentication (and it does happen) will still not be able to use Firefox...

Maybe I'm completely wrong about this, so I would be glad if you and/or others wish to add comments/remarks.

Have a good day :)
Comment 119 ggo 2013-04-07 23:58:32 PDT
Created attachment 734492 [details] [diff] [review]
Allow utf-8 encoded username in digest authentication response header.

Setting the network.auth.digest-response-header-username-utf8 pref to true (through about:config) enables the new behavior for the username in the digest authentication response header.
Comment 120 Patrick McManus [:mcmanus] 2013-04-08 04:39:03 PDT
Comment on attachment 734492 [details] [diff] [review]
Allow utf-8 encoded username in digest authentication response header.

Hi - Thanks for the patch.

Please see
https://developer.mozilla.org/en-US/docs/Developer_Guide/How_to_Submit_a_Patch
for information on how the review flags work. You need to submit r? to someone to get a review. You added r+, which indicates you have given the code a positive review; you can't review your own code. You also set checkin+, which indicates the patch is ready for check-in, and we can't do that without the right reviews.

So your next step is to request a review. You probably want to request it from bsmith, because he had issues with the prior approach. You should also add to the comment trail in the bug how this effort differs from the last one (or why we should change our minds about the reasons Nick and Brian have mentioned in the comment trail).

Thanks for the work!
Comment 121 ggo 2013-04-08 06:19:37 PDT
(In reply to Patrick McManus [:mcmanus] from comment #120)

Hi, thank you for the information :)
I'll keep your reply in order to remember how to submit a patch correctly.
I will read the "how to submit a patch" documentation again in order to learn how to ask for a review.

Olivier.
Comment 122 ggo 2013-04-08 06:38:11 PDT
Comment on attachment 734492 [details] [diff] [review]
Allow utf-8 encoded username in digest authentication response header.

This patch allows the HTTP digest authentication module to send back a UTF-8 encoded username in the response header, in order to enable support for non-ASCII usernames.
It can be enabled by setting the "network.auth.digest-response-header-username-utf8" flag to true in the FF config (about:config).

The patch could be improved by using the charset parameter as defined in RFC 2831, even if the flag is not set. I can probably do that if asked.
Comment 123 ggo 2013-04-22 09:23:22 PDT
Created attachment 740318 [details] [diff] [review]
Allow utf-8 encoded username in digest authentication response header.

This patch allows the HTTP digest authentication module to send back a UTF-8 encoded username in the response header, in order to enable support for non-ASCII usernames.
It can be enabled by setting the "network.auth.digest-response-header-username-utf8" flag to true in the FF config (about:config).

The patch could be improved by using the charset parameter as defined in RFC 2831, even if the flag is not set. I can probably do that if asked.
Comment 124 Justin Dolske [:Dolske] 2013-04-22 13:09:16 PDT
Comment on attachment 740318 [details] [diff] [review]
Allow utf-8 encoded username in digest authentication response header.

>diff -r fd264d551130 modules/libpref/src/init/all.js
>--- a/modules/libpref/src/init/all.js	Fri Apr 19 07:45:15 2013 -0400
>+++ b/modules/libpref/src/init/all.js	Mon Apr 22 18:13:28 2013 +0200
>@@ -1280,6 +1280,14 @@
> // Specify if the gss lib comes standard with the OS
> pref("network.negotiate-auth.using-native-gsslib", true);
> 
>+// Controls whether to allow sending back UTF8 username 
>+// or ASCII only characters in the digest authentication response header
>+// False: non-ascii usernames will cause the authentication to fail, 
>+// (which is the default behavior up to now).
>+// True: non-ascii usernames will be sent back UTF-8 encoded in the 
>+// in the digest authentication response header .
>+pref("network.auth.digest-response-header-username-utf8", false);
>+
> #ifdef XP_WIN
> 
> // Default to using the SSPI intead of GSSAPI on windows 
>diff -r fd264d551130 netwerk/protocol/http/nsHttpDigestAuth.cpp
>--- a/netwerk/protocol/http/nsHttpDigestAuth.cpp	Fri Apr 19 07:45:15 2013 -0400
>+++ b/netwerk/protocol/http/nsHttpDigestAuth.cpp	Mon Apr 22 18:13:28 2013 +0200
>@@ -156,6 +156,8 @@
>   return NS_OK;
> }
> 
>+static const char kAllowUTF8UserNameInResponseHeader[] = "network.auth.digest-response-header-username-utf8";
>+
> NS_IMETHODIMP
> nsHttpDigestAuth::GenerateCredentials(nsIHttpAuthenticableChannel *authChannel,
>                                       const char *challenge,
>@@ -178,6 +180,13 @@
>   bool isDigestAuth = !PL_strncasecmp(challenge, "digest ", 7);
>   NS_ENSURE_TRUE(isDigestAuth, NS_ERROR_UNEXPECTED);
> 
>+  // we work with ASCII around here
>+  nsCOMPtr<nsIPrefBranch> prefs = do_GetService(NS_PREFSERVICE_CONTRACTID);
>+  bool allowUTF8UserNameInResponseHeader = false; // Default to the old behavior
>+  if (prefs)
>+    if (NS_FAILED(prefs->GetBoolPref(kAllowUTF8UserNameInResponseHeader, &allowUTF8UserNameInResponseHeader)))
>+      allowUTF8UserNameInResponseHeader = false;
>+
>   // IIS implementation requires extra quotes
>   bool requireExtraQuotes = false;
>   {
>@@ -314,7 +323,10 @@
>   nsAutoCString authString;
> 
>   authString.AssignLiteral("Digest username=");
>-  rv = AppendQuotedString(cUser, authString);
>+  if (allowUTF8UserNameInResponseHeader)
>+    rv = AppendUTF8QuotedString(cUser, authString);
>+  else
>+    rv = AppendQuotedString(cUser, authString);
>   NS_ENSURE_SUCCESS(rv, rv);
> 
>   authString.AppendLiteral(", realm=");
>@@ -688,4 +700,40 @@
>   return NS_OK;
> }
> 
>+nsresult
>+nsHttpDigestAuth::AppendUTF8QuotedString(const nsACString & value,
>+                                     nsACString & aHeaderLine)
>+{
>+  nsAutoCString quoted;
>+  nsACString::const_iterator s, e;
>+  value.BeginReading(s);
>+  value.EndReading(e);
>+
>+  //
>+  // Encode string according to RFC 2616 quoted-string production, 
>+  // but with NON-ascii characters allowed.
>+  // (not a standard, but what other browsers already 
>+  // do to support username with unicode characters)
>+  quoted.Append('"');
>+  for ( ; s != e; ++s) {
>+    //
>+    // CTL = <any US-ASCII control character (octets 0 - 31) and DEL (127)>
>+    //
>+    if (((unsigned char)(*s)) <= 31 || *s == 127) {
>+      return NS_ERROR_FAILURE;
>+    }
>+
>+    // Escape two syntactically significant characters
>+    if (*s == '"' || *s == '\\') {
>+      quoted.Append('\\');
>+    }
>+
>+    quoted.Append(*s);
>+  }
>+  quoted.Append('"');
>+  aHeaderLine.Append(quoted);
>+  return NS_OK;
>+}
>+
>+
> // vim: ts=2 sw=2
>diff -r fd264d551130 netwerk/protocol/http/nsHttpDigestAuth.h
>--- a/netwerk/protocol/http/nsHttpDigestAuth.h	Fri Apr 19 07:45:15 2013 -0400
>+++ b/netwerk/protocol/http/nsHttpDigestAuth.h	Mon Apr 22 18:13:28 2013 +0200
>@@ -79,6 +79,11 @@
>     // append the quoted version of value to aHeaderLine
>     nsresult AppendQuotedString(const nsACString & value,
>                                 nsACString & aHeaderLine);
>+    // append the quoted version of value to aHeaderLine, 
>+    // with bytes values from 128 to 255 allowed 
>+    // (to be able to send back the utf8 encoded username)
>+    nsresult AppendUTF8QuotedString(const nsACString & value,
>+                                    nsACString & aHeaderLine);
> 
>   protected:
>     nsCOMPtr<nsICryptoHash>        mVerifier;
Comment 125 Justin Dolske [:Dolske] 2013-04-22 13:17:50 PDT
[Ack! Not sure what I fubared to trigger the last comment, I was just setting the ispatch flag...]

I think a global pref is a really bad idea. It makes the browser behave differently to the server in an undetectable way that breaks the spec, isn't a very good option if the user needs to log in to a different site that needs the opposite behavior, and all-around feels like a big footgun that can lead to the perception that Firefox is broken (perhaps long after the user has flipped the pref and forgotten about it).

Nor does it really address the problem for the vast majority of users, because we can't expect many people to be using this pref (most people don't change defaults, and also the above).
Comment 126 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2013-05-28 20:36:09 PDT
ggo98: Thank you very much for working on this.

I agree with Dolske regarding the non-value of having a pref. This kind of pref tends to make one website work while another website breaks.

Our goals should be to (a) work as well as we can with existing servers, and (b) push the web forward towards a reasonable way to use UTF-8 for HTTP auth. For (b), the new parameter-based mechanism Julian suggested above, tracked in bug 656213, seems like the way to go.

For (a), perhaps a not-completely-and-totally-horrible solution would be to modify the UI of the HTTP auth prompt so that the user can override the choice of encoding, just like we allow the user to override the choice of encoding in the good ol' charset menu. Is it bad UI? Well, it isn't great UI. However, saying that European and American people can use passwords in their languages, but others can't, is a horrible (at best) policy. And, HTTP basic/digest auth is rarely used, so few users would be exposed to the less-than-optimal-but-important UI.

Even if we were to implement (b), we will still need (a) to cope with servers that don't implement that new mechanism. And, I don't think it matters whether we implement (a) before (b) or vice-versa.

I doubt any Mozilla Corp. employees are going to sit down and fix this problem any time soon because of the other things we're busy with. But, luckily, we have some people who have stepped up to write some code to solve this problem. The pref is not the way to do that, but we should steer people towards a solution that is acceptable. So, here's my non-UX-designer suggested UI:

    Username: [                   ]
    Password: [                   ]
    Encoding: [                ][v]

"Encoding" is a drop-down list of encodings. It is disabled by default and its default value is "Automatic (ASCII)" which means ASCII. If the user types a non-ASCII username or a non-ASCII password, then the Encoding drop-down gets enabled and the user can see a list of encodings that looks similar to the good ol' charset menu, sorted and separated in the same way, which they can choose from. When the user chooses a non-ASCII encoding, then we send the auth data using that encoding, exactly like MSIE encodes it.

Bonus points (perhaps in a follow-up bug): filter the Encodings list down to those encodings that can represent the non-ASCII characters the user typed.

If we ever implement the mechanism tracked by bug 656213, or a similar mechanism, then we will disable the Encodings field when the server uses that mechanism, and set its value to the value that the server chose.

dolske: Does this seem like a reasonable approach, in general, to you? Is there somebody on Firefox or UX team that needs to be involved before we would suggest potential contributors take a go at implementing it, for us to review?

Does anybody else see any tragic flaws with changing the UI in this way, or a similar manner? Anybody got a better idea?
Comment 127 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2013-05-28 20:38:13 PDT
Comment on attachment 740318 [details] [diff] [review]
Allow utf-8 encoded username in digest authentication response header.

Review of attachment 740318 [details] [diff] [review]:
-----------------------------------------------------------------

ggo, Thanks for the patch. I am going to r- it, based on the reasons given above. If you disagree and think that I should take a look at this patch, please respond to the comments that dolske and I made above and r?bsmith again. Thanks again!
Comment 128 Justin Dolske [:Dolske] 2013-05-28 21:29:50 PDT
Asking the user to deal with encoding issues in UI is just making things worse. :(

TBH, the only path suggested so far that sounds likely to fly is speccing a new auth method (like comment 59) that's explicitly UTF-8, and getting sites to use that. Everything else breaks stuff to some unknown degree.

Someone just needs to do that. It sounds straightforward, and should be pretty simple for servers and clients to implement.
Comment 129 Honza Bambas (:mayhemer) 2013-05-30 07:45:07 PDT
(In reply to Brian Smith (:bsmith) from comment #126)
>     Encoding: [                ][v]

My half-cent: it should be hidden until we detect non-ASCII chars, or until we get a 401/407.  Explanatory text would be good to show then as well, or we could just tell the user "we are going to try another way if you failed to auth".  Persisting the option or the detected encoding is probably a must.

So, not just a few lines of code anyway...
Comment 130 ggo 2013-06-24 23:39:39 PDT
Thanks, all, for your comments :)
I'll try to work on this again if necessary (unfortunately I don't have time these days..., so I'll "Reset Assignee to default" if I have the right to do it).
Comment 131 ggo 2013-06-24 23:58:50 PDT
(In reply to Brian Smith (:bsmith) from comment #127)

Hi Brian,

I don't disagree with the "r-".
I understand the patch is not perfect, and that a pref is not the best way, for the reasons explained (people usually don't play with prefs).
The only point is that I don't understand how it breaks anything, because:
1) without the pref set, only ASCII chars are supported (ISO-8859-1 chars are currently rejected, at least in the code I know: FF 19 & 20)
2) with the pref set (which may be acceptable under certain circumstances), ASCII chars are supported the same way as they are without the pref set, and chars outside the ASCII charset are too... but it's true that this way is not compliant with a standard method.
The most standard way, I think, would be to check the "charset" authentication directive (already suggested by others and me) in order to see whether the Web server supports the UTF-8 charset, as specified in RFC2831:
"charset
      This directive, if present, specifies that the server supports
      UTF-8 encoding for the username and password. If not present, the
      username and password must be encoded in ISO 8859-1 (of which
      US-ASCII is a subset). The directive is needed for backwards
      compatibility with HTTP Digest, which only supports ISO 8859-1.
      This directive may appear at most once; if multiple instances are
      present, the client should abort the authentication exchange.
"

Thank you,
Olivier.

> Comment on attachment 740318 [details] [diff] [review]
> Allow utf-8 encoded username in digest authentication response header.
> 
> Review of attachment 740318 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ggo, Thanks for the patch. I am going to r- it, based on the reasons given
> above. If you disagree and think that I should take a look at this patch,
> please respond to the comments that dolske and I made above and r?bsmith
> again. Thanks again!
Comment 132 Julian Reschke 2013-06-25 00:43:44 PDT
(In reply to ggo from comment #131)
> ...
> The only point is that I don't understand what it breaks something , because:
> 1) without the pref set, ASCII chars only are supported (iso8859-1 chars are
> currently rejected, at least with the code I know : FF 19 & 20)

As far as I understand, FF currently uses ISO-8859-1. Are you saying it does not?

> 2) with the pref set (which may be asked/acceptable under certain
> circumstances), ASCII chars are supported the same way as they are without
> the pref set, and chars out of the ASCII charset are too... but it's true
> that this way is not compliant with a standard method.

Changing the default from ISO-8859-1 to UTF-8 will break those sites that rely on it being ISO-8859-1.

> The most standard way I think would be to check the "charset" authentication
> header (already suggested by others and me) in order to check if the Web
> server supports the UTF8 charset, as specified in RFC2831:

That parameter does not apply to Basic and/or Digest. That being said, you may want to help fix the issue in the specs. See <http://greenbytes.de/tech/webdav/draft-ietf-httpauth-basicauth-enc-latest.html>.

> ...
Comment 133 ggo 2013-06-25 01:26:15 PDT
(In reply to Julian Reschke from comment #132)
> (In reply to ggo from comment #131)
> > ...
> > The only point is that I don't understand what it breaks something , because:
> > 1) without the pref set, ASCII chars only are supported (iso8859-1 chars are
> > currently rejected, at least with the code I know : FF 19 & 20)
> 
> As far as I understand, FF currently uses ISO-8859-1. Are you saying it does
> not?

Yes, that's what I'm saying, for Digest Authentication.
see nsHttpDigestAuth::AppendQuotedString() in netwerk/protocol/http/nsHttpDigestAuth.cpp:

.
.

    //
    // CTL = <any US-ASCII control character (octets 0 - 31) and DEL (127)>
    //
    if (*s <= 31 || *s == 127) {
      return NS_ERROR_FAILURE;
    }
.
.

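One possible explanation for why this check also blocks ISO-8859-1 characters (an assumption about the build, not something the code excerpt states): on platforms where plain C `char` is signed, bytes 0x80-0xFF compare as negative values and therefore satisfy `*s <= 31`. That comparison can be simulated in Python:

```python
# Simulate the C check `if (*s <= 31 || *s == 127)` under the assumption
# that plain char is signed (common on x86 compilers).
import struct

def rejected(byte_value):
    # Reinterpret the byte as a signed char, as a signed *s would.
    signed = struct.unpack("b", bytes([byte_value]))[0]
    return signed <= 31 or signed == 127

# ASCII letters pass, control chars are rejected as the comment says,
# but so is every byte >= 0x80, e.g. 0xE9 ('é' in ISO-8859-1).
```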

> 
> > 2) with the pref set (which may be asked/acceptable under certain
> > circumstances), ASCII chars are supported the same way as they are without
> > the pref set, and chars out of the ASCII charset are too... but it's true
> > that this way is not compliant with a standard method.
> 
> Changing the default from ISO-8859-1 to UTF-8 will break those sites that
> rely on it being ISO-8859-1.

Agree with that. Except that, according to the code and some tests I have done, it does not seem to work with ISO-8859-1 at this time.

> 
> > The most standard way I think would be to check the "charset" authentication
> > header (already suggested by others and me) in order to check if the Web
> > server supports the UTF8 charset, as specified in RFC2831:
> 
> That parameter does not apply to Basic and/or Digest. That being said, you
> may want to help fixing the issue in the specs. See
> <http://greenbytes.de/tech/webdav/draft-ietf-httpauth-basicauth-enc-latest.
> html>.

Right, it does not apply to basic authentication.
However, if I understand correctly, RFC2831 (which talks about "HTTP Digest Access Authentication") seems to say the "charset" parameter could be used...


> 
> > ...

Thank you,
Olivier.
Comment 134 Julian Reschke 2013-06-25 01:54:20 PDT
(In reply to ggo from comment #133)
> (In reply to Julian Reschke from comment #132)
> > (In reply to ggo from comment #131)
> > > ...
> > > The only point is that I don't understand what it breaks something , because:
> > > 1) without the pref set, ASCII chars only are supported (iso8859-1 chars are
> > > currently rejected, at least with the code I know : FF 19 & 20)
> > 
> > As far as I understand, FF currently uses ISO-8859-1. Are you saying it does
> > not?
> 
> Yes, that's what I'm saying, for Digest Authentication.
> see nsHttpDigestAuth::AppendQuotedString() in
> netwerk/protocol/http/nsHttpDigestAuth.cpp:
> 
> .
> .
> 
>     //
>     // CTL = <any US-ASCII control character (octets 0 - 31) and DEL (127)>
>     //
>     if (*s <= 31 || *s == 127) {
>       return NS_ERROR_FAILURE;
>     }
> .
> .

What does rejecting control characters have to do with ISO-8859-1 support?

> > > 2) with the pref set (which may be asked/acceptable under certain
> > > circumstances), ASCII chars are supported the same way as they are without
> > > the pref set, and chars out of the ASCII charset are too... but it's true
> > > that this way is not compliant with a standard method.
> > 
> > Changing the default from ISO-8859-1 to UTF-8 will break those sites that
> > rely on it being ISO-8859-1.
> 
> Agree with that. Except that according to the code and some tests I have
> done, it seems that it does not work with ISO-8859-1 at this time.

Then it's probably time to write tests first.

> > > The most standard way I think would be to check the "charset" authentication
> > > header (already suggested by others and me) in order to check if the Web
> > > server supports the UTF8 charset, as specified in T:
> > 
> > That parameter does not apply to Basic and/or Digest. That being said, you
> > may want to help fixing the issue in the specs. See
> > <http://greenbytes.de/tech/webdav/draft-ietf-httpauth-basicauth-enc-latest.
> > html>.
> 
> Right, it does not apply to basic authentication.
> However if I understand correctly, RFC2831 (which talks about "HTTP Digest
> Access
>    Authentication") seems to say the "charset" parameter could be used...

But that RFC is not about the Digest HTTP auth mechanism.

Yes, the HTTPAUTH WG might end up adopting this approach for HTTP as well, but we're not there yet.

That being said, *if* we use parameters, we'll have first to fix the code that parses the WWW-Authenticate header field. See test cases at <http://greenbytes.de/tech/tc/httpauth/>.
Comment 135 ggo 2013-06-25 02:31:04 PDT
(In reply to Julian Reschke from comment #134)
> (In reply to ggo from comment #133)
> > (In reply to Julian Reschke from comment #132)
> > > (In reply to ggo from comment #131)
> > > > ...
> > > > The only point is that I don't understand why it would break something, because:
> > > > 1) without the pref set, ASCII chars only are supported (iso8859-1 chars are
> > > > currently rejected, at least with the code I know : FF 19 & 20)
> > > 
> > > As far as I understand, FF currently uses ISO-8859-1. Are you saying it does
> > > not?
> > 
> > Yes, that's what I'm saying, for Digest Authentication.
> > see nsHttpDigestAuth::AppendQuotedString() in
> > netwerk/protocol/http/nsHttpDigestAuth.cpp:
> > 
> > .
> > .
> > 
> >     //
> >     // CTL = <any US-ASCII control character (octets 0 - 31) and DEL (127)>
> >     //
> >     if (*s <= 31 || *s == 127) {
> >       return NS_ERROR_FAILURE;
> >     }
> > .
> > .
> 
> What does rejecting control characters have to do with ISO-8859-1 support?

You're right, that is what this code is supposed to do. But since s comes from an nsACString (i.e. plain chars), and since char is signed (at least by default), the "*s <= 31" test is also true for any char whose value would be >= 128 if it were stored in an unsigned char. So in the end, the only allowed char values are those > 31 and < 127.
(*** at least in the FF 20 code I have; maybe it has been changed in FF 21, I did not check yet...)

> 
> > > > 2) with the pref set (which may be asked/acceptable under certain
> > > > circumstances), ASCII chars are supported the same way as they are without
> > > > the pref set, and chars out of the ASCII charset are too... but it's true
> > > > that this way is not compliant with a standard method.
> > > 
> > > Changing the default from ISO-8859-1 to UTF-8 will break those sites that
> > > rely on it being ISO-8859-1.
> > 
> > Agree with that. Except that according to the code and some tests I have
> > done, it seems that it does not work with ISO-8859-1 at this time.
> 
> Then it's probably time to write tests first.

probably yes.

> 
> > > > The most standard way I think would be to check the "charset" authentication
> > > > header (already suggested by others and me) in order to check if the Web
> > > > server supports the UTF8 charset, as specified in T:
> > > 
> > > That parameter does not apply to Basic and/or Digest. That being said, you
> > > may want to help fixing the issue in the specs. See
> > > <http://greenbytes.de/tech/webdav/draft-ietf-httpauth-basicauth-enc-latest.
> > > html>.
> > 
> > Right, it does not apply to basic authentication.
> > However if I understand correctly, RFC2831 (which talks about "HTTP Digest
> > Access
> >    Authentication") seems to say the "charset" parameter could be used...
> 
> But that RFC is not about the Digest HTTP auth mechanism.
OK.

> 
> Yes, the HTTPAUTH WG might end up adopting this approach for HTTP as well,
> but we're not there yet.
> 
> That being said, *if* we use parameters, we'll have first to fix the code
> that parses the WWW-Authenticate header field. See test cases at
> <http://greenbytes.de/tech/tc/httpauth/>.

OK.

Thanks for the feedback :)
Comment 136 z.reddish.j 2013-10-19 21:24:55 PDT Comment hidden (me-too)
Comment 137 z.reddish.j 2013-10-19 21:35:09 PDT Comment hidden (off-topic)
Comment 138 for nothing 2015-04-02 23:09:22 PDT
Created attachment 8587817 [details] [diff] [review]
to do here as draft-ietf-httpauth-digest-15

This is incompatible with Opera and Chromium (they just send the username as raw UTF-8 bytes without transport encoding), but it follows http://tools.ietf.org/html/draft-ietf-httpauth-digest-15. Even though draft-ietf-httpauth-digest-15 is only a draft, it is unlikely to change significantly at this point, so at least the charset parameter can already be used.
Comment 139 Patrick McManus [:mcmanus] 2015-04-03 05:37:12 PDT
Comment on attachment 8587817 [details] [diff] [review]
to do here as draft-ietf-httpauth-digest-15

Review of attachment 8587817 [details] [diff] [review]:
-----------------------------------------------------------------

honza, I think you have the most domain knowledge here and have shown an interest in bug 656213... but let me know if the review should be rerouted
Comment 140 Honza Bambas (:mayhemer) 2015-04-07 14:45:10 PDT
Created attachment 8589323 [details] [diff] [review]
to do here as draft-ietf-httpauth-digest-15 (refreshed as it should be)
Comment 141 Honza Bambas (:mayhemer) 2015-04-07 14:51:48 PDT
I'll try to get to this within a few days.
Comment 142 Honza Bambas (:mayhemer) 2015-06-22 03:30:35 PDT
FYI: review is in progress!
Comment 143 Honza Bambas (:mayhemer) 2015-07-17 10:33:09 PDT
Review draft lost!!!!  I have to start all over again!
Comment 144 Jason Duell [:jduell] (needinfo? me) 2015-08-04 17:59:10 PDT
*** Bug 656213 has been marked as a duplicate of this bug. ***
Comment 145 Honza Bambas (:mayhemer) 2016-01-06 05:06:03 PST
Comment on attachment 8589323 [details] [diff] [review]
to do here as draft-ietf-httpauth-digest-15 (refreshed as it should be)

Review of attachment 8589323 [details] [diff] [review]:
-----------------------------------------------------------------

There are plenty of formatting issues.  The original patch was not submitted with 8 lines of context and function names.  Please read https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Coding_Style and maybe also https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/How_to_Submit_a_Patch

::: netwerk/protocol/http/nsHttpBasicAuth.cpp
@@ +75,3 @@
>      nsAutoCString userpass;
> +    nsAutoCString charset;
> +    if(strstr(challenge,"charset=")){

please don't use strstr or any of the CRT string functions.  Use mozilla::Tokenizer (see bug 1024056).  The challenge header should be parsed correctly.

@@ +75,4 @@
>      nsAutoCString userpass;
> +    nsAutoCString charset;
> +    if(strstr(challenge,"charset=")){
> +	const char *p = strstr(challenge,"charset=") + 8;

this is very unsafe...

@@ +95,5 @@
> +	if (password)
> +		AppendUTF16toUTF8(password, userpass);
> +    }
> +    else{
> +	//actually it is ISO-8859-1

if charset == "" (or there is no "charset" option in the challenge); otherwise, on unknown values, we should fail the authentication.  Better to be safe.

::: netwerk/protocol/http/nsHttpDigestAuth.cpp
@@ -61,2 @@
>    if (NS_FAILED(rv)) return rv;
> -

please leave the blank lines that make the code more readable.  and add some to your new code as well.

@@ -204,5 @@
>    if (NS_FAILED(rv)) {
>      LOG(("nsHttpDigestAuth::GenerateCredentials [ParseChallenge failed rv=%x]\n", rv));
>      return rv;
>    }
> -

again, why are you unreasonably removing blank lines?

@@ +302,5 @@
> +  else{
> +	//this true lossy convert UTF16 to ASCII
> +	//ASCII is an 7-bit encoding
> +    const char16_t *p = username;
> +		cUser.Assign(*p % 128);

you really can't do this...  who is using real 7-bit ASCII these days?  the correct way is escaping, and that's probably not what you want here, is it?

TODO for me: check the spec

@@ +338,5 @@
> +  if(userhash == 2){
> +	  char hashuser[EXPANDED_DIGEST_SHA256_LENGTH+1];
> +    	  cUser.Append(":");
> +    	  cUser.Append(realm);
> +	  aHash(cUser.get(), cUser.Length(), algorithm);

check result

@@ +345,5 @@
> +	  authString += hashuser;
> +	  authString += '\"';
> +	  goto appendRealm;
> +  }
> +  if(charset.EqualsLiteral("ISO-8859-1") || charset.EqualsLiteral("UTF-8")){

please use else if () instead of a goto here.

@@ +353,5 @@
> +	  else{
> +		authString.AssignLiteral("Digest username*=UTF-8\'\'");
> +	  }
> +	  nsAutoCString escUser;
> +	  NS_Escape(cUser, escUser,  url_XAlphas);

TODO for me: check this escaping

@@ +453,5 @@
> +	  len = 2*EXPANDED_DIGEST_SHA256_LENGTH + nonce.Length() + 2;
> +  }
> +  else{
> +	  len = 2*EXPANDED_DIGEST_LENGTH + nonce.Length() + 2;
> +  }

please have functions DigestLength(algorithm) and ExpadedDigestLength(algorithm).  It will make the code much cleaner and clearer in many places.  When |algorithm| is not one of the expected values, have those functions MOZ_ASSERT(false);

These particular lines should then be:
uint32_t len = nonce.Length() + 2 + ExpadedDigestLength(algorithm) * 2;

@@ +454,5 @@
> +  }
> +  else{
> +	  len = 2*EXPANDED_DIGEST_LENGTH + nonce.Length() + 2;
> +  }
> +  if (qop == QOP_AUTH || qop == QOP_AUTH_INT) {

TODO for me: check this condition

@@ -468,2 @@
>    nsAutoCString contents;
> -  contents.SetCapacity(len + 1);

all these calculations are perf optimizations.  but definitely not critical.

::: netwerk/protocol/http/nsHttpDigestAuth.h
@@ +84,3 @@
>  
>      // result is in mHashBuf
> +    nsresult aHash(const char *buf, uint32_t len, uint16_t algorithm);

Hash or DoHash
Comment 146 Honza Bambas (:mayhemer) 2016-01-06 05:07:37 PST
Comment on attachment 8587817 [details] [diff] [review]
to do here as draft-ietf-httpauth-digest-15

Please see review comments in comment 145.  I'm really sorry for such a huge delay.  This patch simply got lost among all my other reviews and more priority work.  I'll do better next time.
Comment 147 for nothing 2016-02-01 19:34:31 PST
Created attachment 8714628 [details] [diff] [review]
to do here as draft-ietf-httpauth-digest

About it:

> +  else{
> +	//this true lossy convert UTF16 to ASCII
> +	//ASCII is an 7-bit encoding
> +    const char16_t *p = username;
> +		cUser.Assign(*p % 128);

you really can't do this...  who is using real 7-bit ASCII these days?  the correct way is escaping, and that's probably not what you want here, is it?

I cannot do otherwise. Escaping requires "charset" with the extended notation username*=charset''username; without it, we are required to use an ASCII quoted-string, or not use the ABNF at all.
Comment 148 Masatoshi Kimura [:emk] 2016-02-01 22:17:47 PST
First of all, the internet draft is RFC 7616 (for digest auth)/7617 (for basic auth) now.

(In reply to for nothing from comment #147)
> I can not do otherwise. For escaping required "charset" to use extended
> notation username*=charset''username, without it required to use quoted
> string ascii or do not use at all ABNF.

You don't have to care about charsets other than UTF-8. UTF-8 is the only valid value per RFC 7616. For all other charset values (or servers that do not send the charset parameter), just use |username="<username without escape>"| as before (that is, the RFC 2617-compatible header). The quoted-string production can contain non-ASCII octets.

https://tools.ietf.org/html/rfc7230#section-3.2.6
>     quoted-string  = DQUOTE *( qdtext / quoted-pair ) DQUOTE
>     qdtext         = HTAB / SP / %x21 / %x23-5B / %x5D-7E / obs-text
>     obs-text       = %x80-FF

Although obs-text is obsolete, the RFC 2617-compatible header is obsolete by definition anyway. It will be needed for compatibility with old servers; RFC 7616-unaware servers are unlikely to support "username*" or "userhash".
Comment 149 Honza Bambas (:mayhemer) 2016-02-04 06:39:14 PST
Comment on attachment 8714628 [details] [diff] [review]
to do here as draft-ietf-httpauth-digest

The patch is malformed at the binary level and cannot be applied.  Please submit the result of hg diff -U 8 -p (or qdiff).
Comment 150 for nothing 2016-02-08 09:46:29 PST
Created attachment 8717027 [details] [diff] [review]
to do here as rfc 7616
Comment 151 Honza Bambas (:mayhemer) 2016-04-22 06:33:51 PDT
Comment on attachment 8717027 [details] [diff] [review]
to do here as rfc 7616

Review of attachment 8717027 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the update and my apologies for taking that long to respond.

I'll ask you to do one major thing: please split this patch into two pieces, one for each type of authentication.  One for basic auth, which I believe will be easier to bring to the final state, and a second for digest auth.  This will make progress on this bug faster.  Thanks.

::: netwerk/protocol/http/nsHttpBasicAuth.cpp
@@ +78,5 @@
> +    nsAutoCString charset;
> +    Tokenizer p(challenge);
> +    Tokenizer::Token t;
> +    while (p.Next(t)) {
> +      if (t.AsString() == "charset" && p.Next(t) && (t.AsChar() == '=')) {

Note: p.Next(t) && (t.AsChar() == '=') can better be written as p.CheckChar('=')

@@ +80,5 @@
> +    Tokenizer::Token t;
> +    while (p.Next(t)) {
> +      if (t.AsString() == "charset" && p.Next(t) && (t.AsChar() == '=')) {
> +        p.Record();
> +        while (p.Next(t) && !t.Equals(Tokenizer::Token::Char(',')));

Note: now we have a convenient ReadUntil method

@@ +84,5 @@
> +        while (p.Next(t) && !t.Equals(Tokenizer::Token::Char(',')));
> +        p.Claim(charset);
> +        charset.StripChar('"', 0);
> +      }
> +    }

You must respect the grammar here.  This (even if your code were written correctly, by the way) would match any occurrence of "charset" within the string.

The grammar is:

Schema header="value"[,header="value"]*

It also allows white space (including CR/LF) between each element.  I think we are OK with having just a necessary subset of https://tools.ietf.org/html/rfc5234#section-4

Roughly looking at the spec, it seems the header name is what Tokenizer recognizes as a "word", with "-" added as a word char (see the constructor).  The value seems to be anything between the quotes, delimited by ','.  Not sure how quotes are escaped.

You will need to write a better loop for it or even a simple recursive descent (optionally by deriving from Tokenizer).  You have to walk the headers one by one and pick charset value.

@@ +91,5 @@
> +      userpass.Append(':');
> +      if (password) {
> +        AppendUTF16toUTF8(password, userpass);
> +      }
> +    } else {

we must fail when charset is present and is something other than "UTF-8".  The spec seems to allow only that value.

::: netwerk/protocol/http/nsHttpDigestAuth.cpp
@@ +212,5 @@
>  
> +  char ha1_digest[EXPANDED_DIGEST_SHA256_LENGTH+1];
> +  char ha2_digest[EXPANDED_DIGEST_SHA256_LENGTH+1];
> +  char response_digest[EXPANDED_DIGEST_SHA256_LENGTH+1];
> +  char upload_data_digest[EXPANDED_DIGEST_SHA256_LENGTH+1];

I'd slightly prefer to have a LONGEST_DIGEST_LENGTH define, simply defined as the longest of the defined hash lengths, so that when a new one is added we just redefine it (when needed) and don't crash.

@@ +319,5 @@
> +    for (;*p != 0;p++) {
> +      if (*p % 128 >= 32) {
> +        cUser.Append(*p % 128);
> +      }
> +    }

can you please explain what you are doing here?  we might already have some helper functions for this (if I knew what's going on).

@@ +358,5 @@
>    //
>  
>    nsAutoCString authString;
>  
> +  if (userhash == 2) {

please have defines for possible values of |userhash| to make it clear what is happening here.

@@ +374,5 @@
> +    if (charset.EqualsLiteral("ISO-8859-1")) {
> +      authString.AssignLiteral("Digest username*=ISO-8859-1\'\'");
> +    } else {
> +      authString.AssignLiteral("Digest username*=UTF-8\'\'");
> +    }

maybe: 

authString.AssignLiteral("Digest username*=")
authString.Append(charset);

?

@@ +405,2 @@
>    } else {
> +    authString.AppendLiteral("MD5");

We need to know when we don't understand the algorithm.  But here, for unknown values, you simply fall back to MD5, which is the default only when the "algorithm" header is not present at all.  That is the whole purpose of the ALGO_SPECIFIED flag, which your patch effectively removes.

@@ +722,5 @@
>      else if (nameLength == 9 &&
>          nsCRT::strncasecmp(challenge+nameStart, "algorithm", 9) == 0)
>      {
>        // we want to clear the default, so we use = not |= here
>        *algorithm = ALGO_SPECIFIED;

with this usage, this should be renamed to ALGO_SPECIFIED_UNKNOWN

@@ +739,3 @@
>        else if (valueLength == 8 &&
> +               nsCRT::strncasecmp(challenge+valueStart, "MD5-sess", 8) == 0) {
> +        *algorithm = ALGO_MD5_SESS;

should we pick the strongest algo when this header is present more than once?  or should we ignore the challenge completely when it is present more than once?

::: netwerk/protocol/http/nsHttpDigestAuth.h
@@ +24,4 @@
>  #define QOP_AUTH 0x01
>  #define QOP_AUTH_INT 0x02
>  
>  #define DIGEST_LENGTH 16

rename to DIGEST_MD5_LENGTH

@@ +91,5 @@
>      nsresult AppendQuotedString(const nsACString & value,
>                                  nsACString & aHeaderLine);
>  
> +    int16_t DigestLength(int16_t algorithm);
> +    int16_t ExpadedDigestLength(int16_t algorithm);

Expanded? (typo)
Comment 152 for nothing 2016-05-09 23:14:02 PDT
Created attachment 8750616 [details] [diff] [review]
patch for basic auth

I don't know in which bug the patch for basic authentication should go, so I leave it here.
Comment 153 for nothing 2016-05-09 23:21:33 PDT
Created attachment 8750619 [details] [diff] [review]
patch for digest auth

This one is for digest auth.
Comment 154 Honza Bambas (:mayhemer) 2016-05-11 02:05:39 PDT
Comment on attachment 8750616 [details] [diff] [review]
patch for basic auth

Review of attachment 8750616 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/nsHttpBasicAuth.cpp
@@ +78,5 @@
> +    nsAutoCString charset;
> +    Tokenizer p(challenge);
> +    Tokenizer::Token t;
> +    while (p.Next(t)) {
> +      if (t.AsString() == "charset" && p.CheckChar('=')) {

This is still not a grammar-like parsing loop, sorry.  Also, doing t.AsString() on a returned token w/o checking its type first is wrong.  The correct way to check it's a word and has an expected content is to do:

t.Equals(Tokenizer::Token::Word("charset"))

which does the correct type and value checks for you.  Just read the comments in https://mxr.mozilla.org/mozilla-central/source/xpcom/ds/Tokenizer.h to better understand how it works.

@@ +91,5 @@
> +    }
> +    if (!charset.IsEmpty()) {
> +      if (charset.EqualsLiteral("UTF-8")) {
> +        CopyUTF16toUTF8(user, userpass);
> +        userpass.Append(':');

please leave the "// always send a ':' (see bug 129565)" comment
Comment 155 Honza Bambas (:mayhemer) 2016-05-17 10:04:31 PDT
Comment on attachment 8750616 [details] [diff] [review]
patch for basic auth

Oops.
Comment 156 Honza Bambas (:mayhemer) 2016-05-17 11:08:12 PDT
Comment on attachment 8750619 [details] [diff] [review]
patch for digest auth

Review of attachment 8750619 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks.  This looks mostly good, just a few formatting issues and details to be updated.

The other thing we need is a test.  Probably just expand https://mxr.mozilla.org/mozilla-central/source/netwerk/test/unit/test_authentication.js to also exercise the SHA-256 code.  The more corners of your added/modified code you cover, the better - ideally, all of them.

::: netwerk/protocol/http/nsHttpDigestAuth.cpp
@@ +59,5 @@
>    }
>  
> +  if (algorithm & ALGO_SHA256 || algorithm & ALGO_SHA256_SESS) {
> +    rv = mVerifier->Init(nsICryptoHash::SHA256);
> +  } else {

for safety, you should check for the ALGO_MD5* flags here as well.  and NS_ERROR("Algorithm not set")+return NS_ERROR_UNEXPECTED when none of the flags is found.

@@ +296,5 @@
>    // calculate credentials
>    //
>  
> +  if (!charset.IsEmpty()) {
> +    if (!charset.EqualsLiteral("UTF-8")) {

if (!charset.IsEmpty() && !charset.EqualsLiteral("UTF-8"))

@@ +324,5 @@
>    //
>  
>    nsAutoCString authString;
>  
> +  if (userhash & USERHASH_TRUE) {

just == ?

@@ +329,5 @@
> +    char hashuser[EXPANDED_DIGEST_LENGTH+1];
> +    cUser.Append(":");
> +    cUser.Append(realm);
> +    rv = DoHash(cUser.get(), cUser.Length(), algorithm);
> +    if (NS_FAILED(rv)) return rv;

if (NS_FAILED(rv)) {
  return rv;
}

or better

NS_ENSURE_SUCCESS(rv, rv);

@@ +334,5 @@
> +    ExpandToHex(mHashBuf, algorithm, hashuser);
> +    authString.AssignLiteral("Digest username=\"");
> +    authString += hashuser;
> +    authString += '\"';
> +  } else if (charset.EqualsLiteral("UTF-8")) {

would be nice to somehow cache this test

@@ +339,5 @@
> +    authString.AssignLiteral("Digest username*=");
> +    authString.Append(charset);
> +    authString.AppendLiteral("\'\'");
> +    nsAutoCString escUser;
> +    NS_Escape(cUser, escUser,  url_XAlphas);

two spaces after the second comma

@@ +400,5 @@
>    }
>  
> +  if (userhash) {
> +    authString.AppendLiteral(", userhash=");
> +    if (userhash & USERHASH_TRUE) {

userhash == USERHASH_TRUE ?  I didn't really mean these should be flags, just named constants to make it clearer what's going on.  They should not be combined.

@@ +475,3 @@
>  {
> +  int16_t index, value, digestLen;
> +  digestLen = DigestLength(algorithm);

nit, maybe just:

int16_t index, value, digestLen = DigestLength(algorithm);

@@ +489,5 @@
>      else
>        result[(index*2)+1] = value - 10 + 'a';
>    }
>  
> +  result[digestLen * 2] = 0;

ExpandedDigestLength(algorithm) instead of digestLen * 2 please

@@ +601,4 @@
>    *stale = false;
>    *algorithm = ALGO_MD5; // default is MD5
>    *qop = 0;
> +  uint16_t algo = 0;

if you really need this, please declare before first use

@@ +668,5 @@
> +    else if (nameLength == 7 &&
> +	      nsCRT::strncasecmp(challenge+nameStart, "charset", 7) == 0)
> +    {
> +      if (!charset.EqualsLiteral("UTF-8")) {
> +        charset.Assign(challenge+valueStart, valueLength);

please explain what the condition exactly means.  are you just ignoring everything but UTF-8?  a comment is appropriate here.

@@ +713,5 @@
> +        algo = ALGO_MD5_SESS;
> +      }
> +      if (*algorithm < algo){
> +      *algorithm |= algo;
> +      }

isn't it OK to just always do *algorithm |= algo;?  we select strongest algo later in GenerateCredentials anyway.

nit: indention and spaces.

@@ +779,5 @@
> +nsHttpDigestAuth::DigestLength(int16_t algorithm)
> +{
> +  MOZ_ASSERT(algorithm >= ALGO_SPECIFIED && algorithm <= ALGO_SHA256_SESS + 1);
> +  int16_t len;
> +  if (algorithm & ALGO_SHA256 || algorithm & ALGO_SHA256_SESS) {

nit: algorithm & (ALGO_SHA256 | ALGO_SHA256_SESS)

::: netwerk/protocol/http/nsHttpDigestAuth.h
@@ +30,3 @@
>  #define NONCE_COUNT_LENGTH 8
> +#define USERHASH_FALSE 0x01
> +#define USERHASH_TRUE 0x02

We may also want USERHASH_UNKNOWN or _NOT_SET defined as 0 which would be the default value.
